H2O open-source Kubernetes operator and command-line tool to ease deployment (and undeployment) of the H2O-3 open-source machine learning platform to Kubernetes.

Overview

H2O Kubernetes

Repository with official tools to aid the deployment of the H2O machine learning platform to Kubernetes. This repository contains two essential tools:

  1. H2O Operator - for first class H2O Kubernetes support (README),
  2. Command Line Interface - to ease deployment of the operator and/or deploy H2O to clusters without the operator (README).

Binaries available: Download for Mac / Linux / Windows. Or build from source.

The operator is an implementation of the Kubernetes operator pattern specifically for H2O. Once it is deployed to a Kubernetes cluster, Kubernetes recognizes a new custom resource named H2O, which makes it easy to create H2O clusters inside the Kubernetes cluster using plain kubectl. The CLI is a binary that usually runs on the client side. It can deploy the operator itself into a Kubernetes cluster, or create H2O clusters in Kubernetes when the operator cannot be used. Helm charts are available as yet another way to deploy H2O into Kubernetes. The recommended approach is to use the operator first and fall back to the CLI or Helm only when needed.
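As a rough illustration of what the H2O custom resource looks like, the sketch below renders a manifest that could be passed to kubectl apply. The field names (nodes, version, resources.cpu, resources.memory, resources.memoryPercentage) and the h2o.ai/v1 API version are taken from the operator log output quoted further down this page; the Rust types and the render_h2o_manifest function are hypothetical and are not part of the operator's code.

```rust
/// Hypothetical mirror of the fields visible in the operator's log output
/// (`H2OSpec { nodes, version, resources: Resources { .. } }`).
struct Resources {
    cpu: u32,
    memory: String,
    memory_percentage: Option<u8>,
}

struct H2OSpec {
    nodes: u32,
    version: Option<String>,
    resources: Resources,
}

/// Renders a YAML manifest for an `H2O` custom resource.
/// Optional fields are omitted from the output when they are `None`.
fn render_h2o_manifest(name: &str, namespace: &str, spec: &H2OSpec) -> String {
    let mut out = String::new();
    out.push_str("apiVersion: h2o.ai/v1\n");
    out.push_str("kind: H2O\n");
    out.push_str("metadata:\n");
    out.push_str(&format!("  name: {}\n", name));
    out.push_str(&format!("  namespace: {}\n", namespace));
    out.push_str("spec:\n");
    out.push_str(&format!("  nodes: {}\n", spec.nodes));
    if let Some(version) = &spec.version {
        out.push_str(&format!("  version: \"{}\"\n", version));
    }
    out.push_str("  resources:\n");
    out.push_str(&format!("    cpu: {}\n", spec.resources.cpu));
    out.push_str(&format!("    memory: \"{}\"\n", spec.resources.memory));
    if let Some(pct) = spec.resources.memory_percentage {
        out.push_str(&format!("    memoryPercentage: {}\n", pct));
    }
    out
}
```

Writing the returned string to a file and applying it with kubectl apply -f should create the cluster, assuming the operator and its CRD are already deployed.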

For detailed instructions on how to use each tool, please refer to the specific user guides:

Contributing

Contributions are welcome and encouraged. Please refer to the contributing guide. If you've encountered a bug, or there is any feature missing, please create an issue on GitHub.

License

This project is licensed under the Apache License 2.0.

Comments
  • Release to Docker Hub

    Each release from master should be published to Docker Hub as well, not only to Red Hat. The bundle with the v1beta1 CRD should go to Red Hat, and Red Hat-specific files should be attached to the GitHub release.

    A "normal" release should use the v1 CRD (the binary should be the same) and should be pushed to Docker Hub after the OpenShift stage is finished. Use the H2O_DOCKER_HUB_LOGIN and H2O_DOCKER_HUB_PASSWORD secrets to push to Docker Hub.

    Both releases will be produced by a single pipeline.

    Include a file for each version of the release with:

    • CRD role
    • Permissions for the role
    area/infra-tools-automation 
    opened by Pscheidl 1
  • [PUBDEV-7847] Kubernetes operator for H2O-3

    https://0xdata.atlassian.net/browse/PUBDEV-7847

    What is not present

    • Release pipeline for the operator. There will probably be an official docker image based on Red Hat's UBI (as this is required for the certification process)
    • Integration with www.operatorhub.io

    These will be done in the next iteration, as including them would make an already complex PR even more complex. Let's build the operator first, then do the integration part.

    What was done & guide to review

    Previously, this repository served only for the command line interface. There was only one binary with multiple modules. Now, there are three projects: cli, operator and deployment. As mentioned in the documentation, the deployment module contains common functions and modules for deployment of H2O into Kubernetes.

    Looking at the diff is not recommended, as basically everything changed. Let me suggest reviewing the whole project from scratch, as if it had just been committed :octocat: One way to do that is to view the repository from this PR's point of view: https://github.com/h2oai/h2o-kubernetes/tree/pubdev-7847 and follow the instructions on how to build and develop the project. This could also reveal any mistakes made in the documentation.

    What was done, its purpose and its means should be sufficiently explained in the in-code documentation and READMEs. Let me put that to the test by NOT repeating it here.

    Logging

    Example of logging template used for you to review:

    ~/.../target/release >>> ./h2o-operator                                                                         
    2020-11-07 19:24:55,748 INFO  [h2o_operator] Kubeconfig found. Using default namespace: default
    2020-11-07 19:24:55,768 INFO  [h2o_operator] Detected H2O CustomResourceDefinition already present in the cluster.
    2020-11-07 19:28:54,869 INFO  [h2o_operator::controller] Deployed H2O 'h2o-test'.
    2020-11-07 19:28:54,869 INFO  [h2o_operator::controller] Reconciled (ObjectRef { kind: (), name: "h2o-test", namespace: Some("default") }, ReconcilerAction { requeue_after: None })
    2020-11-07 19:28:54,873 INFO  [h2o_operator::controller] No action taken for:
    H2O { api_version: "h2o.ai/v1", kind: "H2O", metadata: ObjectMeta { annotations: Some({"kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"h2o.ai/v1\",\"kind\":\"H2O\",\"metadata\":{\"annotations\":{},\"name\":\"h2o-test\",\"namespace\":\"default\"},\"spec\":{\"nodes\":3,\"resources\":{\"cpu\":1,\"memory\":\"512Mi\",\"memoryPercentage\":90},\"version\":\"3.32.0.1\"}}\n"}), cluster_name: None, creation_timestamp: Some(Time(2020-11-07T18:28:54Z)), deletion_grace_period_seconds: None, deletion_timestamp: None, finalizers: Some(["h2o3.h2o.ai"]), generate_name: None, generation: Some(1), labels: None, managed_fields: Some([ManagedFieldsEntry { api_version: Some("h2o.ai/v1"), fields_type: Some("FieldsV1"), fields_v1: Some(FieldsV1(Object({"f:metadata": Object({"f:annotations": Object({".": Object({}), "f:kubectl.kubernetes.io/last-applied-configuration": Object({})})}), "f:spec": Object({".": Object({}), "f:nodes": Object({}), "f:resources": Object({".": Object({}), "f:cpu": Object({}), "f:memory": Object({}), "f:memoryPercentage": Object({})}), "f:version": Object({})})}))), manager: Some("kubectl"), operation: Some("Update"), time: Some(Time(2020-11-07T18:28:54Z)) }, ManagedFieldsEntry { api_version: Some("h2o.ai/v1"), fields_type: Some("FieldsV1"), fields_v1: Some(FieldsV1(Object({"f:metadata": Object({"f:finalizers": Object({".": Object({}), "v:\"h2o3.h2o.ai\"": Object({})})})}))), manager: Some("unknown"), operation: Some("Update"), time: Some(Time(2020-11-07T18:28:54Z)) }]), name: Some("h2o-test"), namespace: Some("default"), owner_references: None, resource_version: Some("19334"), self_link: Some("/apis/h2o.ai/v1/namespaces/default/h2os/h2o-test"), uid: Some("084ea47e-7d8c-439f-abd4-953b521729c1") }, spec: H2OSpec { nodes: 3, version: Some("3.32.0.1"), resources: Resources { cpu: 1, memory: "512Mi", memory_percentage: Some(90) }, custom_image: None } }
    2020-11-07 19:28:54,873 INFO  [h2o_operator::controller] Reconciled (ObjectRef { kind: (), name: "h2o-test", namespace: Some("default") }, ReconcilerAction { requeue_after: None })
    2020-11-07 19:29:30,519 INFO  [h2o_operator::controller] Deleted H2O 'h2o-test'.
    2020-11-07 19:29:30,519 INFO  [h2o_operator::controller] Reconciled (ObjectRef { kind: (), name: "h2o-test", namespace: Some("default") }, ReconcilerAction { requeue_after: None })
    2020-11-07 19:30:21,428 INFO  [h2o_operator::controller] Deployed H2O 'h2o-test'.
    2020-11-07 19:30:21,428 INFO  [h2o_operator::controller] Reconciled (ObjectRef { kind: (), name: "h2o-test", namespace: Some("default") }, ReconcilerAction { requeue_after: None })
    2020-11-07 19:30:21,430 INFO  [h2o_operator::controller] No action taken for:
    H2O { api_version: "h2o.ai/v1", kind: "H2O", metadata: ObjectMeta { annotations: Some({"kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"h2o.ai/v1\",\"kind\":\"H2O\",\"metadata\":{\"annotations\":{},\"name\":\"h2o-test\",\"namespace\":\"default\"},\"spec\":{\"nodes\":3,\"resources\":{\"cpu\":1,\"memory\":\"512Mi\",\"memoryPercentage\":90},\"version\":\"3.32.0.1\"}}\n"}), cluster_name: None, creation_timestamp: Some(Time(2020-11-07T18:30:21Z)), deletion_grace_period_seconds: None, deletion_timestamp: None, finalizers: Some(["h2o3.h2o.ai"]), generate_name: None, generation: Some(1), labels: None, managed_fields: Some([ManagedFieldsEntry { api_version: Some("h2o.ai/v1"), fields_type: Some("FieldsV1"), fields_v1: Some(FieldsV1(Object({"f:metadata": Object({"f:annotations": Object({".": Object({}), "f:kubectl.kubernetes.io/last-applied-configuration": Object({})})}), "f:spec": Object({".": Object({}), "f:nodes": Object({}), "f:resources": Object({".": Object({}), "f:cpu": Object({}), "f:memory": Object({}), "f:memoryPercentage": Object({})}), "f:version": Object({})})}))), manager: Some("kubectl"), operation: Some("Update"), time: Some(Time(2020-11-07T18:30:21Z)) }, ManagedFieldsEntry { api_version: Some("h2o.ai/v1"), fields_type: Some("FieldsV1"), fields_v1: Some(FieldsV1(Object({"f:metadata": Object({"f:finalizers": Object({".": Object({}), "v:\"h2o3.h2o.ai\"": Object({})})})}))), manager: Some("unknown"), operation: Some("Update"), time: Some(Time(2020-11-07T18:30:21Z)) }]), name: Some("h2o-test"), namespace: Some("default"), owner_references: None, resource_version: Some("19501"), self_link: Some("/apis/h2o.ai/v1/namespaces/default/h2os/h2o-test"), uid: Some("7c527027-11ee-4e24-a5b3-292a6085a80d") }, spec: H2OSpec { nodes: 3, version: Some("3.32.0.1"), resources: Resources { cpu: 1, memory: "512Mi", memory_percentage: Some(90) }, custom_image: None } }
    2020-11-07 19:30:21,430 INFO  [h2o_operator::controller] Reconciled (ObjectRef { kind: (), name: "h2o-test", namespace: Some("default") }, ReconcilerAction { requeue_after: None })
    2020-11-07 19:30:39,728 INFO  [h2o_operator::controller] Deleted H2O 'h2o-test'.
    2020-11-07 19:30:39,728 INFO  [h2o_operator::controller] Reconciled (ObjectRef { kind: (), name: "h2o-test", namespace: Some("default") }, ReconcilerAction { requeue_after: None })
    

    :pray: Thank you for reviewing such a big PR. :pray:

    type/feature 
    opened by Pscheidl 1
  • Create Kubernetes operator

    The codebase for deploying H2O into a K8S cluster could easily be reused to create a Kubernetes operator. With Rust, the resulting binary should be thin and performant. Cargo workspaces are an ideal tool for organizing the repository.

    • Define custom H2O resource - the inputs should be the same as for the CLI tools.
    • Parameter check with meaningful errors (1:1 with the CLI unless there is a strong reason for a detour)
    • Working deployment and undeployment first, H2O cluster status watch afterwards, as well as other functionality like Jupyter.
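A parameter check with meaningful errors could be sketched roughly as follows. The concrete rules (at least one node, at least one CPU, a memory percentage between 1 and 100, and either a version or a custom image) are assumptions made for illustration; the issue does not list them, and the SpecError type and validate_spec function are hypothetical names.

```rust
use std::fmt;

/// Hypothetical validation errors; the real CLI and operator may report different ones.
#[derive(Debug, PartialEq)]
enum SpecError {
    ZeroNodes,
    ZeroCpu,
    MemoryPercentageOutOfRange(u8),
    MissingVersionAndImage,
}

impl fmt::Display for SpecError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SpecError::ZeroNodes => write!(f, "'nodes' must be at least 1"),
            SpecError::ZeroCpu => write!(f, "'resources.cpu' must be at least 1"),
            SpecError::MemoryPercentageOutOfRange(p) => write!(
                f,
                "'resources.memoryPercentage' must be between 1 and 100, got {}",
                p
            ),
            SpecError::MissingVersionAndImage => {
                write!(f, "either 'version' or 'customImage' must be set")
            }
        }
    }
}

/// Checks the inputs under the assumed rules and returns the first violation found.
fn validate_spec(
    nodes: u32,
    cpu: u32,
    memory_percentage: Option<u8>,
    version: Option<&str>,
    custom_image: Option<&str>,
) -> Result<(), SpecError> {
    if nodes == 0 {
        return Err(SpecError::ZeroNodes);
    }
    if cpu == 0 {
        return Err(SpecError::ZeroCpu);
    }
    if let Some(p) = memory_percentage {
        if p == 0 || p > 100 {
            return Err(SpecError::MemoryPercentageOutOfRange(p));
        }
    }
    if version.is_none() && custom_image.is_none() {
        return Err(SpecError::MissingVersionAndImage);
    }
    Ok(())
}
```

Keeping the checks in one function shared by the CLI and the operator is one way to get the 1:1 behavior the issue asks for.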
    type/feature 
    opened by Pscheidl 1
  • Automatically detect namespace from Kubeconfig

    A kubeconfig may contain a namespace name. h2ok should replace the default value with the value from the kubeconfig, unless the user specifies the namespace explicitly.

    Example OpenShift Kubeconfig

    apiVersion: v1
    clusters:
    - cluster:
        server: https://api.us-east-1.starter.openshift-online.com:6443
      name: api-us-east-1-starter-openshift-online-com:6443
    contexts:
    - context:
        cluster: api-us-east-1-starter-openshift-online-com:6443
        namespace: pavel-test
        user: <omitted>
      name: pavel-test/api-us-east-1-starter-openshift-online-com:6443/<omitted>
    current-context: <omitted>t/api-us-east-1-starter-openshift-online-com:6443/<omitted>
    kind: Config
    preferences: {}
    users:
    - name: <omitted>
      user:
        token: <omitted>
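The precedence described in this issue can be sketched as a small function: an explicitly given namespace wins, then the namespace from the kubeconfig's current context, then the default. The function name resolve_namespace and the treatment of empty strings as "not set" are assumptions for illustration, not h2ok's actual code. Reading the namespace out of the kubeconfig is left out of the sketch.

```rust
/// Namespace used when neither the user nor the kubeconfig provides one.
const DEFAULT_NAMESPACE: &str = "default";

/// Treats empty or whitespace-only values as "not set".
fn non_empty(ns: &&str) -> bool {
    !ns.trim().is_empty()
}

/// Picks the namespace to deploy into:
/// 1. the value given explicitly by the user,
/// 2. otherwise the namespace of the kubeconfig's current context,
/// 3. otherwise "default".
fn resolve_namespace<'a>(explicit: Option<&'a str>, from_kubeconfig: Option<&'a str>) -> &'a str {
    explicit
        .filter(non_empty)
        .or(from_kubeconfig.filter(non_empty))
        .unwrap_or(DEFAULT_NAMESPACE)
}
```

With the example kubeconfig above, calling resolve_namespace(None, Some("pavel-test")) would select pavel-test, while passing an explicit namespace would override it.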
    
    type/feature 
    opened by Pscheidl 1
  • Ingress deployment

    Make h2ok create ingresses for existing H2O K8S deployments. The deployment should be able to expose H2O and all corresponding services (Jupyter notebooks attached to it and others). Also, the services exposed should be configurable by the user.

    Related: https://github.com/h2oai/h2o-kubernetes/issues/3

    type/feature 
    opened by Pscheidl 1
  • Operator should facilitate reliable log collection

    We have cases where users are not able to provide complete logs of their H2O runs (e.g. after a pod failure). This makes diagnosing exceptional states hard or even impossible.

    We need to relieve users of solving the logging problem themselves, and the k8s operator seems like the right place to do it.

    The operator should facilitate log collection from H2O and report where to get the logs of a particular cluster. Since we are starting to support node restart with fault tolerance, we also need to collect logs from failed nodes.

    The mechanism should not rely on the logs that H2O writes to the log directory (they roll off). Instead, it should collect the standard output and error output to make sure everything is preserved.

    area/core 
    opened by michalkurka 0
  • Fix Custom Resource Definition

    The spec uses camel case:

                spec:
                  type: object
                  properties:
                    nodes:
                      type: integer
                    version:
                      type: string
                    customImage:
                      type: object
                      properties:
                        image:
                          type: string
                        command:
                          type: string
                      ...    
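Assuming the mismatch behind this issue is between snake_case Rust field names (for example custom_image and memory_percentage, as printed in the operator log) and the camelCase keys of the spec (customImage, memoryPercentage), the mapping between the two can be sketched as below. The helper snake_to_camel is hypothetical and only illustrates the naming rule; it is not taken from the project.

```rust
/// Converts a snake_case field name into the camelCase key used in the spec.
/// Each underscore is dropped and the character that follows it is upper-cased.
fn snake_to_camel(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    let mut upper_next = false;
    for ch in field.chars() {
        if ch == '_' {
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}
```

Every property name in the CRD schema would need to match the result of this mapping for the corresponding Rust field.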
    
    opened by mn-mikke 0
  • Automate operator release

    Automate the release of h2o-operator binary.

    Release triggers

    A release will be triggered by a new git tag operator-x.y.z, where x, y and z are SemVer 2 compatible version numbers. The first release will be 0.1.0 with operator API version v1beta. The corresponding tag will be operator-0.1.0.

    • At the beginning, the user creating the tag is responsible for setting the version in the operator's Cargo.toml. Optionally, a check may be set up.
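The optional check mentioned above could look roughly like this: parse the operator-x.y.z tag and compare it with the version string from the operator's Cargo.toml. Both function names are hypothetical, reading the version from Cargo.toml is left out, and the parser only covers plain x.y.z versions without pre-release or build suffixes.

```rust
/// Parses a release tag of the form `operator-x.y.z` into (major, minor, patch).
/// Returns `None` for any other shape.
fn parse_release_tag(tag: &str) -> Option<(u64, u64, u64)> {
    let version = tag.strip_prefix("operator-")?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    // Reject tags with more than three components, such as operator-1.2.3.4.
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Returns true when the tag is well-formed and matches the manifest version.
fn tag_matches_manifest(tag: &str, manifest_version: &str) -> bool {
    match parse_release_tag(tag) {
        Some((major, minor, patch)) => {
            manifest_version == format!("{}.{}.{}", major, minor, patch)
        }
        None => false,
    }
}
```

A release pipeline could fail early when tag_matches_manifest returns false, so that a tag never ships a binary reporting a different version.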

    Release actions

    General

    • All tests (cargo test) will be run before any release. If the tests fail, the release fails.

    Docker images

    • Build a UBI-based docker image with h2o-operator inside. The operator will be started when the container is started with docker run. UBI is required for Red Hat certification.
    • The smallest possible UBI will be used. The operator is AOT compiled, so no special runtime dependencies are required besides basic ones like libssl.
    • The docker image will be pushed to the Docker Hub h2oai space.
    • Optionally, the image will be pushed to Red Hat's docker registry: https://catalog.redhat.com/software/containers/explore - This might be required for Red Hat operator certification.
    • The operator will be built from source in the Dockerfile itself during release, enabling rustc to optimize for the final environment.
    • The final docker image should be cleaned of intermediate dependencies and installations required only to build the operator.

    Binaries

    • The operator binary with source code will be released in the GitHub releases of this repository, in the very same manner as the CLI is released. This means a generic binary for three major platforms: Linux, macOS and Windows. New macOS releases require a bribe to Apple in order to run the application. This is not going to be solved during the first release automation, as it is not a major use case (if any at all).

    Certification

    • The operator will be uploaded to Red Hat for certification. The upload will be done via their REST API: https://connect.redhat.com/api-docs
    • All the required metadata will be generated and sent.
    • Investigate certification response duration - if instant, roll back the release on failure. If instant feedback is not possible, leave certification failures for manual resolution.
    • There is a new certification API due to be released in Q1 next year. Use the current one at the moment.

    Operator Hub

    • NOT MANDATORY - investigate options and ways of publishing the operator on https://operatorhub.io/ - Only if quick. Otherwise delegate to another issue, Red Hat certification is the priority.

    Resources

    These blogs also cover usage of the API:

    • https://connect.redhat.com/blog/getting-started-red-hat-partner-connect-api
    • https://connect.redhat.com/blog/streamlining-projects-red-hat-partner-connect-api

    Here is information for the vulnerability scoring system (Container Health Index): https://redhat-connect.gitbook.io/catalog-help/container-images/container-health

    area/infra-tools-automation priority/blocker reporter/internal state/WIP type/feature 
    opened by Pscheidl 0
  • Log handling

    • Download logs from a running H2O deployment
    • Download logs from a crashed H2O deployment
    • Log turnaround configuration
    • How to create "big enough" volumes?
    type/feature 
    opened by Pscheidl 0
  • Operator projects using the removed APIs in k8s 1.22 require changes.

    Problem Description

    Kubernetes has been deprecating API versions that are removed and no longer available in 1.22. Operator projects using these API versions will not work on Kubernetes 1.22 or on any cluster vendor using this Kubernetes version, such as OpenShift 4.9+. The following APIs are the ones most likely to affect your projects:

    • apiextensions.k8s.io/v1beta1 (used for CRDs; the v1 replacement has been available since v1.16)
    • rbac.authorization.k8s.io/v1beta1 (used for RBAC/rules; the v1 replacement has been available since v1.8)
    • admissionregistration.k8s.io/v1beta1 (used for webhooks; the v1 replacement has been available since v1.16)

    It looks like this project distributes solutions via Red Hat Connect under the package name h2o-operator and does not contain any version compatible with k8s 1.22/OCP 4.9. The following findings come from checking the published distributions:

    • h2o-operator.v0.1.0: this distribution is using APIs which were deprecated and removed in v1.22. More info: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-22. Migrate the API(s) for CRD: (["h2os.h2o.ai"])

    NOTE: The above findings are only about the manifests shipped inside of the distribution. It is not checking the codebase.
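A check of this kind can be approximated by scanning manifest text for apiVersion lines that name one of the three removed API versions listed above. The sketch below is a hypothetical illustration, not the tool that produced these findings. It only inspects lines that start with apiVersion: after trimming, so it would miss keys nested in list items or embedded in other formats.

```rust
/// API versions removed in Kubernetes 1.22, as listed in this issue.
const REMOVED_IN_1_22: [&str; 3] = [
    "apiextensions.k8s.io/v1beta1",
    "rbac.authorization.k8s.io/v1beta1",
    "admissionregistration.k8s.io/v1beta1",
];

/// Returns (1-based line number, API version) for every `apiVersion:` line
/// in the manifest that uses one of the removed API versions.
fn find_removed_api_versions(manifest: &str) -> Vec<(usize, &str)> {
    manifest
        .lines()
        .enumerate()
        .filter_map(|(idx, line)| {
            let value = line.trim().strip_prefix("apiVersion:")?.trim();
            // Tolerate quoted values such as apiVersion: "rbac.authorization.k8s.io/v1beta1".
            let value = value.trim_matches(|c: char| c == '"' || c == '\'');
            REMOVED_IN_1_22
                .iter()
                .find(|api| **api == value)
                .map(|api| (idx + 1, *api))
        })
        .collect()
}
```

Running it over each manifest shipped in a distribution bundle would point at the files that still need migrating to the v1 APIs.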

    How to solve

    It would be very nice to see new distributions of this project that no longer use these APIs, so that they work on Kubernetes 1.22 and newer and can be published in the Red Hat Connect collection. OpenShift 4.9, for example, will no longer ship operators that still use v1beta1 extension APIs.

    Due to the number of options available to build Operators, it is hard to provide direct guidance on updating your operator to support Kubernetes 1.22. Recent versions of the OperatorSDK greater than 1.0.0 and Kubebuilder greater than 3.0.0 scaffold your project with the latest versions of these APIs (all that is generated by tools only). See the guides to upgrade your projects with OperatorSDK Golang, Ansible, Helm or the Kubebuilder one. For APIs other than the ones mentioned above, you will have to check your code for usage of removed API versions and upgrade to newer APIs. The details of this depend on your codebase.

    If this project only needs to migrate the API for CRDs and it was built with an OperatorSDK version lower than 1.0.0, then you may be able to solve it with an OperatorSDK version >= v0.18.x < 1.0.0:

    $ operator-sdk generate crds --crd-version=v1
    INFO[0000] Running CRD generator.
    INFO[0000] CRD generation complete.

    Alternatively, you can try to upgrade your manifests with controller-gen (version >= v0.4.1) :

    If this project does not use Webhooks:

    $ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role paths="./..."

    If this project is using Webhooks:

    1. Add the markers sideEffects and admissionReviewVersions to your webhook (Example with sideEffects=None and admissionReviewVersions={v1,v1beta1}: memcached-operator/api/v1alpha1/memcached_webhook.go):

    2. Run the command:

    $ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role webhook paths="./..."

    For further info and tips see the blog.

    Thank you for your attention.

    opened by camilamacedo86 1
Owner
H2O.ai
Fast Scalable Machine Learning For Smarter Applications