A secure container runtime with OCI interface

Quark Container

Welcome to Quark Container.

This repository is the home of Quark Container's code.

What's Quark Container

Quark Container is a high-performance secure container runtime with the following features:

  1. OCI compatible: Quark Container includes an Open Container Initiative (OCI) interface. Common Docker container images can run in Quark Container.
  2. Secure: It provides Virtual Machine level workload isolation and security.
  3. High Performance: Quark Container is built for high-performance container workload execution. It is developed in Rust.

Performance test

The performance slides are in performance.pdf. The detailed test steps and results are here

Architecture

Quark Container takes the classic Linux Virtual Machine architecture, as shown below. It includes a hypervisor named QVisor and a guest kernel named QKernel. Unlike the common Linux Virtual Machine design, in which a standard OS image such as Linux or Windows can run on QEMU, QVisor and QKernel are tightly coupled: QVisor only supports QKernel.

Architecture

Quark Container's high-level design is shown below. It handles a Container Application request in the following steps.

  1. Container Application System Call: In Quark Container, the Container Application runs as a Guest Application. It sends requests to Quark through Guest System Calls, e.g. X86-64 SysCall/SysRet.
  2. Host System Call: From the Host OS perspective, Quark runs as a common Linux application. When Quark gets a Guest System Call, it interprets the call in the Quark runtime. If it needs to access the host system, e.g. to read a host file, it calls the Host OS through a Host System Call.
  3. QCall: For communication between Guest Space and Host Space, QKernel doesn't call QVisor through HyperCall directly as in the common Virtual Machine design. Instead, it sends requests to QVisor through QCall, which is based on a shared memory queue. A dedicated QCall handling thread waits in Host Space to process QCall requests. As a result, the VCPU thread's high-cost Guest/Host switch is avoided. For host IO data operations, such as socket read/write, QKernel calls the Host Kernel directly with IO-Uring, which bypasses QVisor to achieve better performance. (Note: for security purposes, IO-Uring doesn't handle IO control operations such as Open.)
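
The QCall step above can be pictured as a producer/consumer queue served by a dedicated thread. The following is a minimal sketch using only the Rust standard library. It is a conceptual model, not Quark's implementation: `Request`, `CallQueue` and `run_demo` are hypothetical names, and a mutex-protected `VecDeque` stands in for the shared memory queue, whose real layout this README does not describe.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical request type; Quark's real QCall messages are not shown here.
#[derive(Debug)]
enum Request {
    ReadFile(String),
    Shutdown,
}

/// Toy stand-in for the shared memory queue: a locked deque plus a condition variable.
struct CallQueue {
    pending: Mutex<VecDeque<Request>>,
    ready: Condvar,
}

impl CallQueue {
    fn new() -> Self {
        CallQueue {
            pending: Mutex::new(VecDeque::new()),
            ready: Condvar::new(),
        }
    }

    /// "Guest" side: enqueue a request and return without waiting for it to be handled.
    fn push(&self, req: Request) {
        self.pending.lock().unwrap().push_back(req);
        self.ready.notify_one();
    }

    /// "Host" side: block until a request is available, then take it.
    fn pop(&self) -> Request {
        let mut queue = self.pending.lock().unwrap();
        loop {
            if let Some(req) = queue.pop_front() {
                return req;
            }
            queue = self.ready.wait(queue).unwrap();
        }
    }
}

/// Starts one dedicated handler thread, submits the requests, and
/// returns a description of what the handler processed, in order.
fn run_demo(paths: &[&str]) -> Vec<String> {
    let queue = Arc::new(CallQueue::new());
    let worker_queue = Arc::clone(&queue);

    let handler = thread::spawn(move || {
        let mut handled = Vec::new();
        loop {
            match worker_queue.pop() {
                Request::ReadFile(path) => handled.push(format!("read {}", path)),
                Request::Shutdown => break,
            }
        }
        handled
    });

    for path in paths {
        queue.push(Request::ReadFile(path.to_string()));
    }
    queue.push(Request::Shutdown);
    handler.join().unwrap()
}
```

The submitting side only appends to the queue and continues, which is the property the text attributes to QCall: the caller does not perform a Guest/Host switch per request.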

High Level Design

System Requirement

  1. OS: Linux Kernel > 5.8.0
  2. Processor: X86-64 (Quark only supports 64-bit architecture; so far only Intel CPUs are supported)
  3. Docker: > 17.09.0
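
The kernel requirement can be checked against the output of `uname -r`. The sketch below is one possible way to do that in Rust; `parse_kernel_release` and `meets_kernel_requirement` are hypothetical helper names, not part of Quark. It only compares version numbers and does not account for the kernel-specific io_uring problems reported in the issues further down.

```rust
/// Parses the leading "major.minor.patch" of a kernel release string such as
/// "5.15.0-52-generic". A missing patch component is treated as 0.
fn parse_kernel_release(release: &str) -> Option<(u32, u32, u32)> {
    // Keep only the leading run of digits and dots, e.g. "5.12.10"
    // from "5.12.10-1.el8.elrepo.x86_64".
    let end = release
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(release.len());
    let mut parts = release[..end].split('.').filter(|s| !s.is_empty());

    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next().map_or(Some(0), |s| s.parse().ok())?;
    Some((major, minor, patch))
}

/// True when the release is strictly newer than 5.8.0, as the requirement states.
fn meets_kernel_requirement(release: &str) -> bool {
    matches!(parse_kernel_release(release), Some(v) if v > (5, 8, 0))
}
```

For example, `meets_kernel_requirement("5.15.0-52-generic")` is true, while `"5.8.0"` itself is rejected because the requirement is written as strictly greater than 5.8.0.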

Installing from source

Requirement

Quark builds on X86-64 only. Other architectures will be available in the future.

Quark is developed in Rust. The build OS needs Rust nightly installed.

Build

git clone git@github.com:QuarkContainer/Quark.git
cd Quark
make
make install

Install / Setup / Configuration

  1. Install binary: Quark has 2 binaries: "quark" and "qkernel.bin". Both are copied to the /usr/local/bin/ folder when running make install. "quark" contains the QVisor code and also implements the OCI interface.
  2. Setup Docker: To enable Docker to run containers with Quark Container, "/etc/docker/daemon.json" needs to be updated. An example is given in daemon.json.
  3. Restart Docker: After "/etc/docker/daemon.json" is updated, the Docker daemon needs to be restarted to apply the configuration change:
sudo systemctl restart docker
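
For step 2, Docker's daemon.json registers additional OCI runtimes under a "runtimes" key, each with a "path" to the runtime binary. The sketch below renders such a fragment. It assumes the runtime name `quark` and the install path `/usr/local/bin/quark` mentioned above; the project's own sample daemon.json may contain more entries than this. `render_daemon_json` is a hypothetical helper.

```rust
/// Renders a minimal Docker daemon.json that registers one extra OCI runtime.
/// `name` is what `docker run --runtime=<name>` refers to and `path` is the
/// runtime binary. Assumes neither argument contains quotes or backslashes,
/// since no JSON escaping is performed.
fn render_daemon_json(name: &str, path: &str) -> String {
    format!(
        "{{\n  \"runtimes\": {{\n    \"{}\": {{\n      \"path\": \"{}\"\n    }}\n  }}\n}}\n",
        name, path
    )
}
```

Calling `render_daemon_json("quark", "/usr/local/bin/quark")` yields a JSON object whose "runtimes" map has a single "quark" entry pointing at the installed binary. If daemon.json already has other settings, the entry should be merged into the existing file rather than replacing it.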

Helloworld:

The helloworld docker sample application can be executed as below:

sudo systemctl restart docker
docker run --rm --runtime=quark hello-world

Configuration

Quark Container's configuration file is at /etc/quark/config.json. Configuration detail is TBD...

Debug and Log

Quark Container's debug log is written to /var/log/quark/quark.log. It can be enabled or disabled through "DebugLevel" in /etc/quark/config.json. There are 6 possible values of "DebugLevel", as below.

Off,
Error,
Warn,
Info,
Debug,
Trace,

When logging is enabled, e.g. at the Debug level, the logs are written to /var/log/quark/quark.log after a docker image is run with Quark Container.
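
The six "DebugLevel" values can be modelled as an ordered enum. The sketch below assumes the usual logging convention in which each level also includes the less verbose ones (Off < Error < Warn < Info < Debug < Trace); this README does not state how Quark filters messages, so `should_log` is an illustration of that convention, not Quark's code.

```rust
use std::str::FromStr;

/// The "DebugLevel" values listed above, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DebugLevel {
    Off,
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

impl FromStr for DebugLevel {
    type Err = String;

    /// Parses the exact spellings used in config.json, e.g. "Debug".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Off" => Ok(DebugLevel::Off),
            "Error" => Ok(DebugLevel::Error),
            "Warn" => Ok(DebugLevel::Warn),
            "Info" => Ok(DebugLevel::Info),
            "Debug" => Ok(DebugLevel::Debug),
            "Trace" => Ok(DebugLevel::Trace),
            other => Err(format!("unknown DebugLevel: {}", other)),
        }
    }
}

/// Assumed filtering rule: a message is written when its level is not Off
/// and is no more verbose than the configured level.
fn should_log(configured: DebugLevel, message: DebugLevel) -> bool {
    message != DebugLevel::Off && message <= configured
}
```

Under this assumption, a configuration of "Debug" lets Error, Warn, Info and Debug messages through and drops Trace messages, while "Off" drops everything.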

Comments
  • busybox container uses more than 140MB of memory with the quark runtime

    Hi, I'm trying to reproduce the performance test in the doc, but when I test the memory overhead, I found that the memory cost is too high compared with the doc.

    In the doc, the memory cost is only 11MB for quark, but my result is almost 140MB.

    My host kernel is Linux localhost.localdomain 5.12.10-1.el8.elrepo.x86_64 #1 SMP Wed Jun 9 16:17:47 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

    the config of quark is

    {
      "DebugLevel"    : "Error",
      "KernelMemSize" : 16,
      "SlowPrint"     : false,
      "LogLevel"      : "Simple",
      "TcpBuffIO"     : true,
      "EnableAIO"     : true,
      "PrintException": false
    }
    
    opened by abel-von 9
  • Ask for help: Crash with java runtime.

    Repro steps:

    1. git clone https://github.com/linksgo2011/jmh-reports.git
    2. docker run -it --net=host -v /go/src/github.com/linksgo2011/jmh-reports:/mnt maven:3.8-openjdk-11 /bin/bash
    3. In the docker bash terminal: mvn

    The issue can be reproduced when running "javac". The debug steps are as below. The issue is that one memory page is marked as PROT_NONE, but there is a read access to it later. The code is generated at runtime.

    We need to understand the purpose of the mprotect and when the access is triggered, so that we know which system call implementation bug leads to this crash.

    bug 
    opened by QuarkContainer 6
  • Got `Segmentation fault` when trying to run Node.js

    Hi, I got a Segmentation fault when trying to run Node.js in the ubuntu:22.04 image. Here are the commands I used:

    • Build & install quark runtime:
    git clone git@github.com:QuarkContainer/Quark.git
    cd Quark
    make
    make install
    
    sudo dockerd -D -H unix:///home/dm4/work/quark/docker/docker.sock \
      --data-root /home/dm4/work/quark/docker/root \
      --pidfile /home/dm4/work/quark/docker/docker.pid \
      --add-runtime quark=/usr/local/bin/quark \
      --add-runtime quark_d=/usr/local/bin/quark_d
    
    docker context create dm4-quark --docker host=unix:///home/dm4/work/quark/docker/docker.sock
    
    docker context use dm4-quark
    

    Running the hello-world:latest image works fine:

    docker run --rm hello-world
    docker run --rm --runtime=quark hello-world
    

    hello-world.js

    console.log('Hello, world!');
    

    Dockerfile

    FROM ubuntu:22.04
    
    WORKDIR /app
    RUN apt-get update && apt-get install -y curl sudo && \
      curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash - && \
      sudo apt-get install -y nodejs
    ADD hello-world.js .
    CMD ["node", "hello-world.js"]
    

    Build docker image:

    docker build -t nodejs .
    

    Using the default runtime works fine:

    docker run --rm nodejs
    

    Got Segmentation fault when using quark runtime:

    docker run --rm --runtime=quark nodejs
    

    Environment:

    $ uname -r
    5.15.0-52-generic
    
    $ docker version
    Client:
     Version:           20.10.12
     API version:       1.41
     Go version:        go1.16.2
     Git commit:        20.10.12-0ubuntu2~20.04.1
     Built:             Wed Apr  6 02:14:38 2022
     OS/Arch:           linux/amd64
     Context:           dm4-quark
     Experimental:      true
    
    Server:
     Engine:
      Version:          20.10.12
      API version:      1.41 (minimum version 1.12)
      Go version:       go1.16.2
      Git commit:       20.10.12-0ubuntu2~20.04.1
      Built:            Thu Feb 10 15:03:35 2022
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.5.9-0ubuntu1~20.04.4
      GitCommit:
     runc:
      Version:          1.1.0-0ubuntu1~20.04.1
      GitCommit:
     docker-init:
      Version:          0.19.0
      GitCommit:
    
    opened by dm4 5
  • EOWNERDEAD during IO_URING_WAKE in Kernel version 5.12

    Per our discussion on Discord and by email, I am creating this issue so it is publicly known. Running docker with the quark runtime fails for kernel versions 5.11 and 5.12 (and probably some later patch versions of 5.10) during IO_URING_WAKE, which returns a SysError(130) that maps to EOWNERDEAD in errno.

    I have not tried the latest version of the Rust uring library from the tokio authors to test whether their implementation works (there have been several commits to the io_uring subsystem in the kernel and to tokio-rs/io-uring/master since you adapted their code base).
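
For readers decoding the numeric codes in these reports, here is a small lookup sketch. The values are the standard Linux errno numbers on x86-64; the table is deliberately limited to a few codes, and `errno_name` is a hypothetical helper, not a Quark function.

```rust
/// Maps a few Linux errno numbers (x86-64 numbering) to their symbolic names.
/// Only a handful of codes are covered; anything else returns None.
fn errno_name(code: i32) -> Option<&'static str> {
    match code {
        4 => Some("EINTR"),        // interrupted system call
        11 => Some("EAGAIN"),      // resource temporarily unavailable
        13 => Some("EACCES"),      // permission denied
        130 => Some("EOWNERDEAD"), // owner died
        _ => None,
    }
}
```

With this table, `errno_name(130)` gives `Some("EOWNERDEAD")`, matching the SysError(130) described above.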

    What I have done so far: Tracing the syscalls does not help as this error is thrown from one place anyway.

    Possible further actions: Try tokio-rs/io-uring in the latest version of the kernel and look for the discrepancies between their version and "qlib/uring".

    bug good first issue 
    opened by Akaame 5
  • Stripped gVisor copyright?

    Neat project!

    I work on gVisor. We'd thought about rust in the past and it's pretty amazing that you've got so much plumbing done.

    One issue that I would like to raise, however: I noticed that a lot of the code is directly derived from the gVisor code (same comments, same structure, same directory layout) effectively transliterated to rust. There are many examples, but the ELF loader is a good one ([1], [2]). I noticed that you did preserve the Copyright only in the VDSO [3] for some reason.

    It's awesome and exciting to see the code seeding new things, and I imagine that this is a simple misunderstanding. I'd like to ask that any file that includes content copied or derived from the files of other open source projects should please retain all copyright notices present in the upstream files.

    Thanks, and I'm keen to dig in more to what you've made!

    [1] https://github.com/QuarkContainer/Quark/blob/main/qkernel/src/loader/elf.rs#L47
    [2] https://github.com/google/gvisor/blob/master/pkg/sentry/loader/elf.go#L70
    [3] https://github.com/QuarkContainer/Quark/tree/main/vdso

    opened by amscanne 3
  • After upgrade to 1.53.0-nightly, SetupRootContainerFS crash if print mounts

    After upgrading to 1.53.0-nightly, the qkernel bootstrap crashes in SetupRootContainerFS. This is worked around by commit https://github.com/QuarkContainer/Quark/commit/fe80a71ee461fa8c7dd4b3c642a9ed082760a94d.

    Need to root-cause this issue.

    bug 
    opened by QuarkContainer 3
  • should we update crate x86_64  to version 1.30+

    Rust 1.52.0-nightly has removed the feature const_in_array_repeat_expressions. With the latest nightly Rust, building Quark reports this error:

    error[E0557]: feature has been removed
     --> /home/cordius/.cargo/registry/src/github.com-1ecc6299db9ec823/x86_64-0.12.3/src/lib.rs:9:43
      |
    9 | #![cfg_attr(feature = "const_fn", feature(const_in_array_repeat_expressions))]
      |                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ feature has been removed
      |
      = note: removed due to causing promotable bugs
    

    Updating the x86_64 crate should resolve this problem.

    See https://github.com/rust-osdev/x86_64/issues/233

    opened by Cordius 3
  • Quark container failed to run with error: "quark create start" and cannot be removed

    Repro Steps:

    1. Run the commands below:
    docker pull gcr.io/workload-controller-manager/serverless/benchmark/ml-time
    docker run --runtime=quark --rm --name time-app gcr.io/workload-controller-manager/serverless/benchmark/ml-time
    
    2. Result: no response, and the quark log has an error:
     1 [ERROR] [0/4470552343743] quark create start
    .....
    ......
     [DEBUG] [0/4470552345537] Creating new sandbox for container 6facb7b756f6cec6c2c23da3449dcac61ab9a752c16d58e953871a9873e749ff
      3 [INFO] [0/4470552358780] commandline args is /usr/local/bin/quark boot --pipefd 9
      2 [DEBUG] [0/4470552370651] Creating the sandboxRootDir at /var/lib/quark/6facb7b756f6cec6c2c23da3449dcac61ab9a752c16d58e953871a9873e749ff
      1 [INFO] [0/4470552373045] Save container 6facb7b756f6cec6c2c23da3449dcac61ab9a752c16d58e953871a9873e749ff
      0 [ERROR] [0/4470552373324] EnableNamespace ToEnterNS is []
      1 [ERROR] [0/4470552379536] exit successfully ...
    
    
    
    3. Trying to delete the container with the commands below gets no response:
    $ docker ps -a
    CONTAINER ID   IMAGE                                                             COMMAND                 CREATED        STATUS        PORTS     NAMES
    6b2a26750d80   gcr.io/workload-controller-manager/serverless/benchmark/ml-time   "python3 /app/app.py"   15 hours ago   Up 15 hours             time-app
    
    $ docker rm -f 6b2a26750d80
    
    4. Expected results, from the default docker runtime:
    $ docker run --rm --name time-app gcr.io/workload-controller-manager/serverless/benchmark/ml-time
    Linux-5.15.0-1021-gcp-x86_64-with-glibc2.2.5
    Python 3.8.15 (default, Nov 15 2022, 22:26:27)
    [GCC 8.3.0]
    NumPy 1.23.5
    SciPy 1.9.3
    Shape of the training data
    (1300, 160)
    (1300,)
    /tmp/clf_nn.joblib
    /tmp/clf_lda.joblib
    latency : 24.292011737823486
    1669946718124332038
    
    5. Logs can be found at 10.218.233.96: /home/ubuntu/sonya/quark/issues/quark-issue757.log
    6. Repro build:
    $ git log --oneline
    19faa942 (HEAD) Merge pull request #738 from QuarkContainer/dev7
    15c632de (origin/dev7) free heap when hibernate
    98dc4497 Merge pull request #737 from QuarkContainer/dev7
    b7cf6179 add amd64 support
    56e3eefc Merge pull request #736 from QuarkContainer/dev7
    a9b3d47d clean tlbshootdown
    13b0339f revert cpuenter/cpuleave
    0c805cf6 Merge pull request #735 from QuarkContainer/hc-dev
    a1e3af5f Add ingress dockerfile
    
    opened by sonyafenge 1
  • Linux kernel 5.13 eventfd write doesn't work (regression)

    In Linux kernel 5.13, eventfd write randomly fails to work. The issue doesn't happen in other Linux kernel versions such as 5.12 and 5.14. As Quark's vcpu wake-up depends on the io_uring eventfd wake-up, this blocks Quark execution.

    opened by QuarkContainer 1
  • Linux Kernel 5.14.0-051400 IO-Uring random performance regression issue

    Linux Kernel 5.14.0-051400 IO-Uring performance is unstable. Sometimes its performance drops sharply.

    Repro:

    1. Enable {"DebugLevel" : "Debug"} in config.json
    2. Run docker run -P --runtime=quark --rm -it ubuntu /bin/dd if=/dev/zero of=/test/fio-rand-read bs=4k count=2500
    3. Check quark.log. Sometimes the sys_write performance is very bad. The root cause is that the io_uring write is very slow.

    This issue doesn't reproduce on 5.11.0-38 and 5.11.0-37.

    opened by QuarkContainer 1
  • Uring doesn't work for Linux 5.13 and 5.11.0-34

    Uring still works on 5.11.0-27-generic. The same issue occurs on 5.14.0-051400.

    When the uring SQE (Submission Queue Entry) queue is larger than 4K (64 entries), the part of the SQE memory map beyond 4K can't be accessed from qkernel. When the SQE count is set to <= 64 entries, there is no issue.
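
The "4K (64 entries)" boundary follows from the size of one submission queue entry: `struct io_uring_sqe` is 64 bytes in the default layout, so 64 entries fill exactly one 4096-byte page. A minimal arithmetic sketch, assuming 4 KiB pages and the default (non-SQE128) entry size; the function names are illustrative only:

```rust
/// Size in bytes of one `struct io_uring_sqe` in the default layout
/// (without IORING_SETUP_SQE128).
const SQE_SIZE: usize = 64;
/// Assumed page size: 4 KiB, the usual value on x86-64.
const PAGE_SIZE: usize = 4096;

/// Bytes needed for an SQE array with `entries` entries.
fn sqe_array_bytes(entries: usize) -> usize {
    entries * SQE_SIZE
}

/// Number of pages the SQE array spans, rounding up.
fn sqe_pages(entries: usize) -> usize {
    (sqe_array_bytes(entries) + PAGE_SIZE - 1) / PAGE_SIZE
}
```

So `sqe_pages(64)` is 1, while any larger queue spans at least two pages, which is the point at which the report says the mapping becomes inaccessible from qkernel.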

    opened by QuarkContainer 1
  • Got `io::error is Os { code: 13, kind: PermissionDenied, message: "Permission denied" }` error when I try to deploy `nginx-quark` on a kubeadm cluster

    I got the following error when I tried to deploy a qvisor-based nginx on a kubeadm cluster:

    yaoxin@master:~/Quark$ kubectl describe pod
    Name:                nginx-quark
    Namespace:           default
    Priority:            0
    Runtime Class Name:  quark
    Service Account:     default
    Node:                worker/192.168.122.12
    Start Time:          Thu, 01 Sep 2022 09:36:55 +0000
    Labels:              <none>
    Annotations:         <none>
    Status:              Pending
    IP:                  
    IPs:                 <none>
    Containers:
      nginx:
        Container ID:   
        Image:          nginx
        Image ID:       
        Port:           <none>
        Host Port:      <none>
        State:          Waiting
          Reason:       ContainerCreating
        Ready:          False
        Restart Count:  0
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fs9jj (ro)
    Conditions:
      Type              Status
      Initialized       True 
      Ready             False 
      ContainersReady   False 
      PodScheduled      True 
    Volumes:
      kube-api-access-fs9jj:
        Type:                    Projected (a volume that contains injected data from multiple sources)
        TokenExpirationSeconds:  3607
        ConfigMapName:           kube-root-ca.crt
        ConfigMapOptional:       <nil>
        DownwardAPI:             true
    QoS Class:                   BestEffort
    Node-Selectors:              <none>
    Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason                  Age   From               Message
      ----     ------                  ----  ----               -------
      Normal   Scheduled               26s   default-scheduler  Successfully assigned default/nginx-quark to worker
      Warning  FailedCreatePodSandBox  21s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: Others("Other: Common(\"ttrpc error is Other(\\\"IOError(\\\\\\\"WriteFile \\\\\\\\\\\\\\\"/sys/fs/cgroup/cpu/kubepods-besteffort-podc6afbe65_7732_4026_8075_8b677c2ecbba.slice:cri-containerd:81e21487a6ae2bef9ec479fc1bba62403d0cb914c3b8a10cc271b6477f36d58e/cpu.shares\\\\\\\\\\\\\\\" io::error is Os { code: 13, kind: PermissionDenied, message: \\\\\\\\\\\\\\\"Permission denied\\\\\\\\\\\\\\\" }\\\\\\\")\\\")\")"): unknown
      Warning  FailedCreatePodSandBox  6s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: Others("Other: Common(\"ttrpc error is Other(\\\"IOError(\\\\\\\"WriteFile \\\\\\\\\\\\\\\"/sys/fs/cgroup/cpu/kubepods-besteffort-podc6afbe65_7732_4026_8075_8b677c2ecbba.slice:cri-containerd:db9ab5d94833c4795b7d6029f004056d81b1fc34d42b543b05e03e55350d7796/cpu.shares\\\\\\\\\\\\\\\" io::error is Os { code: 13, kind: PermissionDenied, message: \\\\\\\\\\\\\\\"Permission denied\\\\\\\\\\\\\\\" }\\\\\\\")\\\")\")"): unknown
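
The long backslash runs in the messages above are what repeated string quoting produces: each time an error string is embedded in another one using Rust's `{:?}` formatting, every backslash is doubled and every quote gains a backslash, so the runs grow as 1, 3, 7, 15. The sketch below reproduces the effect; it assumes the layers were formatted with `{:?}`, which the log suggests but does not prove, and both function names are hypothetical.

```rust
/// Applies Rust's `{:?}` string formatting `levels` times,
/// mimicking an error message wrapped inside several other errors.
fn nest_debug(s: &str, levels: usize) -> String {
    let mut out = s.to_string();
    for _ in 0..levels {
        out = format!("{:?}", out);
    }
    out
}

/// Length of the longest run of consecutive backslashes in `s`.
fn longest_backslash_run(s: &str) -> usize {
    let mut best: usize = 0;
    let mut run: usize = 0;
    for c in s.chars() {
        if c == '\\' {
            run += 1;
            best = best.max(run);
        } else {
            run = 0;
        }
    }
    best
}
```

A quote nested four levels deep ends up behind 15 backslashes, which matches the innermost "Permission denied" in the kubelet message. Read this way, the message is a plain permission error on writing cpu.shares, reported through four layers of wrapping.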
    

    Environment

    Kernel version:

    yaoxin@master:~/Quark$ uname -r
    5.15.0-46-generic
    

    Kubeadm version:

    yaoxin@master:~/Quark$ sudo kubeadm version
    [sudo] password for yaoxin: 
    kubeadm version: &version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:43:25Z", GoVersion:"go1.19", Compiler:"gc", Platform:"linux/amd64"}
    

    Check nested virtualization on master VM/Node:

    yaoxin@master:~/Quark$ sudo virt-host-validate
      QEMU: Checking for hardware virtualization                                 : PASS
      QEMU: Checking if device /dev/kvm exists                                   : PASS
      QEMU: Checking if device /dev/kvm is accessible                            : PASS
      QEMU: Checking if device /dev/vhost-net exists                             : PASS
      QEMU: Checking if device /dev/net/tun exists                               : PASS
      QEMU: Checking for cgroup 'cpu' controller support                         : PASS
      QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
      QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
      QEMU: Checking for cgroup 'memory' controller support                      : PASS
      QEMU: Checking for cgroup 'devices' controller support                     : PASS
      QEMU: Checking for cgroup 'blkio' controller support                       : PASS
      QEMU: Checking for device assignment IOMMU support                         : WARN (No ACPI DMAR table found, IOMMU either disabled in BIOS or not supported by this hardware platform)
      QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)
       LXC: Checking for Linux >= 2.6.26                                         : PASS
       LXC: Checking for namespace ipc                                           : PASS
       LXC: Checking for namespace mnt                                           : PASS
       LXC: Checking for namespace pid                                           : PASS
       LXC: Checking for namespace uts                                           : PASS
       LXC: Checking for namespace net                                           : PASS
       LXC: Checking for namespace user                                          : PASS
       LXC: Checking for cgroup 'cpu' controller support                         : PASS
       LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
       LXC: Checking for cgroup 'cpuset' controller support                      : PASS
       LXC: Checking for cgroup 'memory' controller support                      : PASS
       LXC: Checking for cgroup 'devices' controller support                     : PASS
       LXC: Checking for cgroup 'freezer' controller support                     : PASS
       LXC: Checking for cgroup 'blkio' controller support                       : PASS
       LXC: Checking if device /sys/fs/fuse/connections exists                   : PASS
    

    Check nested virtualization on worker VM/Node:

    yaoxin@worker:~/Quark$ sudo virt-host-validate
    [sudo] password for yaoxin: 
      QEMU: Checking for hardware virtualization                                 : PASS
      QEMU: Checking if device /dev/kvm exists                                   : PASS
      QEMU: Checking if device /dev/kvm is accessible                            : PASS
      QEMU: Checking if device /dev/vhost-net exists                             : PASS
      QEMU: Checking if device /dev/net/tun exists                               : PASS
      QEMU: Checking for cgroup 'cpu' controller support                         : PASS
      QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
      QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
      QEMU: Checking for cgroup 'memory' controller support                      : PASS
      QEMU: Checking for cgroup 'devices' controller support                     : PASS
      QEMU: Checking for cgroup 'blkio' controller support                       : PASS
      QEMU: Checking for device assignment IOMMU support                         : WARN (No ACPI DMAR table found, IOMMU either disabled in BIOS or not supported by this hardware platform)
      QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)
       LXC: Checking for Linux >= 2.6.26                                         : PASS
       LXC: Checking for namespace ipc                                           : PASS
       LXC: Checking for namespace mnt                                           : PASS
       LXC: Checking for namespace pid                                           : PASS
       LXC: Checking for namespace uts                                           : PASS
       LXC: Checking for namespace net                                           : PASS
       LXC: Checking for namespace user                                          : PASS
       LXC: Checking for cgroup 'cpu' controller support                         : PASS
       LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
       LXC: Checking for cgroup 'cpuset' controller support                      : PASS
       LXC: Checking for cgroup 'memory' controller support                      : PASS
       LXC: Checking for cgroup 'devices' controller support                     : PASS
       LXC: Checking for cgroup 'freezer' controller support                     : FAIL (Enable 'freezer' in kernel Kconfig file or mount/enable cgroup controller in your system)
       LXC: Checking for cgroup 'blkio' controller support                       : PASS
       LXC: Checking if device /sys/fs/fuse/connections exists                   : PASS
    
    

    Master VM libvirt xml:

    <domain type='kvm'>
    <name>master</name>
    <memory unit='G'>3</memory>
    <currentMemory unit='G'>3</currentMemory>
    <vcpu>2</vcpu>
    <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>     <!-- i.e. hard disk: boot from disk -->
    </os>
    <cpu mode='host-passthrough'/>
    <features>
    <acpi/>
    <apic/>
    <pae/>
    </features>
    <clock offset='localtime'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/home/yaoxin/project/useful_script/test/master_img.qcow2'/> <!-- target image path -->
    <target dev='hda' bus='ide'/>
    </disk>
    <disk type='file' device='cdrom'>
    <source file='/home/yaoxin/project/useful_script/test/ubuntu-22.04.1-live-server-amd64.iso'/> <!-- CD-ROM image path -->
    <target dev='hdb' bus='ide'/>
    </disk>
    <interface type='bridge'>
    <source bridge='virbr0'/>
    <mac address="00:16:3e:5d:aa:a9"/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
    
    </devices>
    </domain>
    

    Worker VM libvirt xml:

    <domain type='kvm'>
    <name>master</name>
    <memory unit='G'>3</memory>
    <currentMemory unit='G'>3</currentMemory>
    <vcpu>2</vcpu>
    <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>     <!-- i.e. hard disk: boot from disk -->
    </os>
    <cpu mode='host-passthrough'/>
    <features>
    <acpi/>
    <apic/>
    <pae/>
    </features>
    <clock offset='localtime'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/home/yaoxin/project/useful_script/test/master_img.qcow2'/> <!-- target image path -->
    <target dev='hda' bus='ide'/>
    </disk>
    <disk type='file' device='cdrom'>
    <source file='/home/yaoxin/project/useful_script/test/ubuntu-22.04.1-live-server-amd64.iso'/> <!-- CD-ROM image path -->
    <target dev='hdb' bus='ide'/>
    </disk>
    <interface type='bridge'>
    <source bridge='virbr0'/>
    <mac address="00:16:3e:5d:aa:a9"/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
    
    </devices>
    </domain>
    

    Containerd version

    yaoxin@master:~/Quark$ containerd version
    INFO[2022-09-01T11:16:48.013602390Z] starting containerd                           revision=9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 version=1.6.8
    

    CPU MODEL on vm

    yaoxin@master:~/Quark$ cat /proc/cpuinfo 
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 69
    model name      : Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
    stepping        : 1
    microcode       : 0x26
    cpu MHz         : 2394.456
    cache size      : 16384 KB
    physical id     : 0
    siblings        : 1
    core id         : 0
    cpu cores       : 1
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear arch_capabilities
    vmx flags       : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest shadow_vmcs pml
    bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs srbds
    bogomips        : 4788.91
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 39 bits physical, 48 bits virtual
    power management:
    
    processor       : 1
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 69
    model name      : Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
    stepping        : 1
    microcode       : 0x26
    cpu MHz         : 2394.456
    cache size      : 16384 KB
    physical id     : 1
    siblings        : 1
    core id         : 0
    cpu cores       : 1
    apicid          : 1
    initial apicid  : 1
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear arch_capabilities
    vmx flags       : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest shadow_vmcs pml
    bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs srbds
    bogomips        : 4788.91
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 39 bits physical, 48 bits virtual
    power management:
    

    How to reproduce?

    After booting the master and worker VMs, I did the following on both:

    1. Downloaded qvisor
    2. Compiled it with "ShimMode": true
    3. Ran make install
    4. Modified /etc/containerd/config.toml:
    cat <<EOF | sudo tee /etc/containerd/config.toml
    version = 2
    [plugins."io.containerd.runtime.v1.linux"]
      shim_debug = true
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
      runtime_type = "io.containerd.runsc.v1"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.quark]
      runtime_type = "io.containerd.quark.v1"
    EOF
    
    5. Start a k8s cluster
    # Execute on master node
    sudo kubeadm init --cri-socket=/var/run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
    
    sudo rm $HOME/.kube/config
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    # Execute on worker node
    sudo kubeadm join 10.218.233.29:6443 --cri-socket=/var/run/containerd/containerd.sock --token qy2r1j.t0y5ekx71t0tcfiq \
            --discovery-token-ca-cert-hash sha256:78a23762652befd90bbcd3506ca9309c5243371360d7a66fc131cb1a4b25555
    
    6. Add a CNI to K8S
    kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
    
    7. Add quark as a Runtime Resource to K8S
    cat <<EOF | kubectl apply -f -
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: quark
    handler: quark
    EOF
    
    8. Use Quark to run nginx
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-quark
    spec:
      runtimeClassName: quark
      containers:
      - name: nginx
        image: nginx
    EOF
    

    Then I checked the pod status:

    yaoxin@master:~/Quark$ kubectl get pod
    NAME          READY   STATUS              RESTARTS   AGE
    nginx-quark   0/1     ContainerCreating   0          109m
    
    Opened by yaoxin1995
Releases(v0.2.0)
  • v0.2.0(Dec 9, 2022)

    Release Summary

    The Quark Container v0.2.0 release adds AMD64 support, UDP over RDMA, and Egress and Ingress gateways for the Kubernetes RDMA network.

    Key Features and Improvements

    1. AMD64 support

    • Add AMD64 support for Quark in addition to X86-64.
    • Test environment: AMD Ryzen 9 6900HX

    2. UDP over RDMA

    • k8s can run pods on the Quark runtime's RDMA network
    • Quark in k8s supports both TCP and UDP

    3. Egress and Ingress networking in k8s RDMA network

    • Add an Egress gateway for containers, so containers can reach external IPs.
    • Inside the k8s cluster, traffic uses RDMA; the Egress gateway converts it to TCP to reach external addresses.
    • Add an Ingress gateway for containers. Users can define ingress rules that map host ports to containers.
    • Ingress traffic arrives at a host IP/port; the Ingress gateway converts it to RDMA and redirects it to the corresponding pods.
    • Add support for traffic from the host machine to the container network. The conversion happens in the Ingress gateway.
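
    The ingress rule described above is essentially a lookup from a host port to a pod endpoint. The sketch below shows that lookup only. `IngressTable`, `add_rule`, `route`, and the pod address are hypothetical names and values chosen for illustration; they are not taken from the Quark code base, and the TCP-to-RDMA conversion is not modelled.

```rust
use std::collections::HashMap;
use std::net::{Ipv4Addr, SocketAddrV4};

/// Hypothetical ingress rule table: host port -> pod endpoint.
/// Illustrative only; not the data structure Quark uses.
#[derive(Debug, Default)]
pub struct IngressTable {
    rules: HashMap<u16, SocketAddrV4>,
}

impl IngressTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registers a rule and returns the endpoint it replaced, if any.
    pub fn add_rule(&mut self, host_port: u16, pod: SocketAddrV4) -> Option<SocketAddrV4> {
        self.rules.insert(host_port, pod)
    }

    /// Looks up the pod endpoint for traffic arriving on `host_port`.
    pub fn route(&self, host_port: u16) -> Option<SocketAddrV4> {
        self.rules.get(&host_port).copied()
    }
}

fn main() {
    let mut table = IngressTable::new();
    // Assumed pod address inside a 10.244.0.0/16 pod network.
    table.add_rule(8080, SocketAddrV4::new(Ipv4Addr::new(10, 244, 1, 5), 80));

    match table.route(8080) {
        Some(pod) => println!("host port 8080 -> pod {pod}"),
        None => println!("no ingress rule for host port 8080"),
    }
}
```

    A gateway built on such a table would consult `route` for each incoming connection and drop or reject traffic for ports that have no rule.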

    4. Reliability/Performance bug fixes

    • Fix an issue where Quark containers intermittently failed to start.
    • Optimize hibernate feature.
    • Add an inner loop inside event processing to improve performance.
  • v0.1.0(Aug 1, 2022)

    Release Summary

    The Quark Container v0.1.0 release adds Container Runtime Interface (CRI) support, an initial version of the TCP Socket over RDMA (TSoR) control plane and data plane, and multiple reliability and performance bug fixes.

    *Note: Setting up k8s with Quark Container depends on quarkcni; please refer to k8s setup for configuration.

    Key Features and Improvements

    1. CRI support

    • Develop a Quark containerd shim based on Containerd Shim V2.
    • Support Sandbox/Subcontainer in Quark runtime
    • Kubernetes integration test with MiniKube and K8S cluster.

    2. TCP over RDMA support control plane

    • k8s can run pods using Quark runtime
    • Quark in k8s supports both the RDMA CNI and other CNIs such as flannel
    • Added Quark Control Manager (quarkcm) and Quark RDMA CNI (quarkcni)
    • Quark Control Manager watches the k8s cluster's nodes and pods and sends changes to the RDMA service
    • The RDMA service sets up connections between nodes automatically

    3. TCP over RDMA support data plane

    • RDMA support for following socket APIs:

      • bind
      • listen
      • connect
      • read* (a group of functions related to read: read, recvfrom, receive)
      • write* (a group of functions related to write: write, sendto, send)
      • getsockname
      • getpeername
    • An RDMA service that runs as a standalone process to route network traffic between local pods and remote nodes

      • Use shared memory and a lockless queue between the RDMA service and Quark containers for best performance
      • Share RDMA Queue Pairs between nodes across multiple Quark containers to make the RDMA NIC more scalable
      • Create an RDMA channel state machine that mirrors the TCP state diagram
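
    To make the state-machine idea concrete, here is a deliberately simplified sketch of a channel whose states loosely follow the TCP state diagram. The state and event names are assumptions made for illustration; the release notes do not list Quark's actual states or transitions, and real TCP has more states (for example the FIN_WAIT and TIME_WAIT family) than are shown here.

```rust
/// Simplified channel states, loosely modelled on the TCP state diagram.
/// Hypothetical: not the state set used by Quark's RDMA service.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChannelState {
    Closed,
    Listening,
    Connecting,
    Established,
    Closing,
}

/// Events that drive the channel (also hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChannelEvent {
    Listen,
    Connect,
    PeerConnected,
    LocalClose,
    PeerClosed,
}

impl ChannelState {
    /// Returns the next state, or `None` if the event is invalid in this state.
    pub fn on_event(self, event: ChannelEvent) -> Option<ChannelState> {
        match (self, event) {
            (Self::Closed, ChannelEvent::Listen) => Some(Self::Listening),
            (Self::Closed, ChannelEvent::Connect) => Some(Self::Connecting),
            (Self::Listening, ChannelEvent::PeerConnected)
            | (Self::Connecting, ChannelEvent::PeerConnected) => Some(Self::Established),
            (Self::Established, ChannelEvent::LocalClose) => Some(Self::Closing),
            (Self::Established, ChannelEvent::PeerClosed)
            | (Self::Closing, ChannelEvent::PeerClosed) => Some(Self::Closed),
            _ => None,
        }
    }
}

fn main() {
    let mut state = ChannelState::Closed;
    let events = [
        ChannelEvent::Connect,
        ChannelEvent::PeerConnected,
        ChannelEvent::LocalClose,
        ChannelEvent::PeerClosed,
    ];
    for event in events {
        match state.on_event(event) {
            Some(next) => {
                println!("{state:?} --{event:?}--> {next:?}");
                state = next;
            }
            None => println!("{event:?} is not valid in {state:?}"),
        }
    }
}
```

    Returning `None` for invalid transitions keeps the table explicit: any event not listed for a state is rejected rather than silently ignored.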

    4. Reliability/Performance bug fixes

    • Fix TLB shootdown handling bug
    • Increase memory page management parallelism
    • Change process memory space lock from a mutex to an upgradable RW lock
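
    The motivation for moving from a mutex to a reader-writer lock is that lookups can proceed in parallel and only modifications need exclusive access. The sketch below illustrates that pattern with `std::sync::RwLock`. The standard library has no upgradable read guard, so this example releases the read lock and re-checks under the write lock instead of upgrading in place; some third-party crates (for example `parking_lot`) offer an upgradable read lock. Which lock implementation Quark uses is not stated here, and `MemorySpace` and `get_or_map` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for a process memory space: page number -> frame id.
pub struct MemorySpace {
    pages: RwLock<HashMap<u64, u64>>,
}

impl MemorySpace {
    pub fn new() -> Self {
        MemorySpace {
            pages: RwLock::new(HashMap::new()),
        }
    }

    /// Returns the frame for `page`, mapping it on first access.
    pub fn get_or_map(&self, page: u64) -> u64 {
        {
            // Fast path: a shared read lock, so concurrent lookups do not block each other.
            let pages = self.pages.read().unwrap();
            if let Some(&frame) = pages.get(&page) {
                return frame;
            }
        } // The read guard is dropped here, before the write lock is requested.

        // Slow path: exclusive write lock. Another thread may have mapped the
        // page in the meantime, so the entry API re-checks before inserting.
        let mut pages = self.pages.write().unwrap();
        let next_frame = pages.len() as u64;
        *pages.entry(page).or_insert(next_frame)
    }
}

fn main() {
    let space = MemorySpace::new();
    println!("page 7 -> frame {}", space.get_or_map(7));
    println!("page 7 again -> frame {}", space.get_or_map(7));
}
```

    An upgradable lock avoids the release-and-recheck step, because the holder can convert its read access to write access without letting another writer in between.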
Easy to use, extendable, OCI-compliant container runtime written in pure Rust

PURA - Lightweight & OCI-compliant container runtime Pura is an experimental Linux container runtime written in pure and dependency-minimal Rust. The

Branimir Malesevic 73 Jan 9, 2023
Shallow Container is a light-weight container tool written in Rust.

Shallow Container is a light-weight container tool written in Rust. It is totally for proof-of-concept and may not suit for production environment.

Rui Li 14 Apr 8, 2022
Experimental implementation of the oci-runtime in Rust

youki Experimental implementation of the oci-runtime in Rust Overview youki is an implementation of runtime-spec in Rust, referring to runc. This proj

utam0k 12 Sep 23, 2022
youki is an implementation of the OCI runtime-spec in Rust, similar to runc.

youki is an implementation of the OCI runtime-spec in Rust, similar to runc.

Containers 4.2k Dec 29, 2022
A tiny minimal container runtime written in Rust.

vas-quod A tiny minimal container runtime written in Rust. The idea is to support a minimal isolated containers without using existing runtimes, vas-q

flouthoc 438 Dec 26, 2022
dedock is a container runtime, with a particular focus on enabling embedded software development across all platforms

dedock is a container runtime, with a particular focus on enabling embedded software development across all platforms. It supports native "containers" on both Linux and macOS.

Daniel Mangum 12 May 27, 2023
VMM-based macOS Native Container Runtime

Akari: VMM-based macOS Native Container Runtime Akari is an experimental OCI runtime aims to run macOS native containers on macOS. This runtime works

Akira Moroo 29 Jul 15, 2024
Runc - CLI tool for spawning and running containers according to the OCI specification

runc Introduction runc is a CLI tool for spawning and running containers on Linux according to the OCI specification. Releases You can find official r

Open Container Initiative 9.9k Jan 5, 2023
Inspect and dump OCI images.

reinlinsen rl is a tool to inspect and dump OCI images or single image layers. Installation From source If you have cargo installed you can just ru

Tobias Brumhard 5 May 11, 2023
A lite tool to make systemd work in any container(Windows Subsystem for Linux 2, Docker, Podman, etc.)

Angea Naming from hydrangea(アジサイ) A lite tool to make systemd work in any container(Windows Subsystem for Linux 2, Docker, Podman, etc.) WSL1 is not s

いんしさくら 16 Dec 5, 2022
Container monitor in Rust

Conmon-rs A pod level OCI container runtime monitor. The goal of this project is to provide a container monitor in Rust. The scope of conmon-rs encomp

Containers 84 Dec 21, 2022
insject is a tool for poking at containers. It enables you to run an arbitrary command in a container or any mix of Linux namespaces.

Insject insject is a tool for poking at containers. It enables you to run an arbitrary command in a container or any mix of Linux namespaces. It suppo

NCC Group Plc 44 Nov 9, 2022
Hot-plug devices into a Docker container as they are plugged.

container-hotplug Hot-plug (and unplug) devices into a Docker container as they are (un)plugged. Description Docker provides the --device flag to give

lowRISC 2 Oct 17, 2022
Rust Kubernetes client and controller runtime

kube-rs Rust client for Kubernetes in the style of a more generic client-go, a runtime abstraction inspired by controller-runtime, and a derive macro

kube-rs 1.8k Jan 8, 2023
Valheim Docker powered by Odin. The Valheim dedicated gameserver manager which is designed with resiliency in mind by providing automatic updates, world backup support, and a user friendly cli interface.

Valheim Docker If you are looking for a guide on how to get started click here Mod Support! It is supported to launch the server with BepInEx but!!!!!

Michael 657 Dec 30, 2022
oci-image and oci-runtime spec in rust.

oci-lib Oci-Spec for your container runtime or container registry. Oci-lib is a rust port for original oci spec written in go. Following crate contain

flouthoc 12 Mar 10, 2022
A high-performance, secure, extensible, and OCI-compliant JavaScript runtime for WasmEdge.

Run JavaScript in WebAssembly Now supporting wasmedge socket for HTTP requests and Tensorflow in JavaScript programs! Prerequisites Install Rust and w

Second State 219 Jan 3, 2023
A container image builder tool for OCI (distrobox/toolbox, also podman/docker)

Distrobox Boost A container image builder tool for Open Container Initiative (distrobox/toolbox, also podman/docker). Distrobox is good enough in runn

xz-dev 6 Aug 15, 2023