A project management tool for data science and bioinformatics. If you want it, Kerblam it!


Warning

kerblam run and kerblam package are complete but still untested. Please do use them, but be careful. Always have a backup of your data and code! Report any problems in the issues. Thank you kindly!

Kerblam! is a tool that can help you manage data analysis projects.

A Kerblam! project has a kerblam.toml file in its root. Kerblam! allows you to:

  • Access remote data quickly, by just specifying URLs to fetch from;
  • Package and export data in order to share the project with colleagues;
  • Manage and run multiple makefiles for different tasks;
  • Leverage git to isolate, rollback and run the project at a different point in time;
  • Clean up intermediate and output files quickly;
  • Manage Docker environments and run code in them for you;
  • Manage the content of your .gitignore for you, allowing you to add files, directories and even whole languages in one command;
  • Make it easy to use pre-commit by managing .pre-commit-hooks;
  • Specify test data and quickly use it in place of real data.

To turn an existing project into a Kerblam! project, just create the kerblam.toml file yourself. To learn how, look at the sections below.

Overview

Warning

Some commands still lack features that would be nice to have. Please look at the issues to see if what you'd like to do is already proposed or being worked on. If you don't find it, open an issue yourself detailing what you think would be a good addition!

  • kerblam new can be used to create a new kerblam! project. Kerblam! asks you if you want to use some common programming languages and sets up a proper .gitignore and pre-commit hooks for you.
  • kerblam data fetches remote data and saves it locally, manages local data and can clean it up, preserving only files that must be preserved. It also shows you how much local data is on the disk, how much data is remote and how much disk space you can free without losing anything important.
  • kerblam package packages your pipeline and exports a docker image for execution later. It's useful for reproducibility purposes as the docker image is primed for execution, bundling the kerblam! executable, execution files and non-remote data in the blob itself.
  • kerblam run executes the analysis for you, choosing your makefiles and dockerfiles appropriately and building docker containers as needed. Optionally, it allows test data or alternative data to be used instead of real data, in order to test your pipelines.
  • kerblam ignore can edit your .gitignore file by adding files, folders and GitHub's recommended ignores for specific languages in just one command.

Kerblam! is not and does not want to be:

  • A pipeline manager like snakemake and nextflow: It supports and helps you execute make, but it does not interfere from then on;
  • A replacement for any of the tools it leverages (e.g. git, docker, pre-commit);
  • Something that insulates you from the nuances of writing good, correct pipelines and Dockerfiles.
    Specifically, Kerblam! will never:
    • Parse your .gitignore, .dockerignore, pipes or Dockerfiles to check for errors or potential issues;
    • Edit code for you (with the exception of a tiny bit of wrapping to allow kerblam package to work);
    • Handle any errors produced by the pipelines or containers.
  • A tool that covers every edge case. Implementing more features for popular and widespread tasks is perfectly fine, but Kerblam! will never have a wall of options for you to choose from. If you need more advanced control over what is done, you should use the tools that Kerblam! leverages directly.

Tip

Kerblam! works with you, not for you!

Opinions

Tip

If you wish to learn more about why these design choices were made, please take a look at the Kerblam! philosophy.

Kerblam! projects are opinionated:

  • The folder structure of your project adheres to the Kerblam! standard, although you may configure it in kerblam.toml. Read about it below.
  • You use make or bash scripts as your pipeline manager.
  • You use docker as your virtualisation service.
  • You use git as your version control system. Additionally, you create tags with git to record important previous versions of your project.
  • You execute your pipelines in a Docker container, and not in your development environment.
  • Most of your input data is remotely downloadable, especially for large and bulky files.

If you don't like this setup, Kerblam! is not for you.

Folder structure

Kerblam!, by default, requires the following folder structure (relative to the root of the project, ./):

  • ./kerblam.toml: This file contains the options for Kerblam!. It is usually empty.
  • ./data/: This is a directory for the data. Intermediate data files are held here.
  • ./data/in/: Input data files are saved and should be looked for, in here.
  • ./data/out/: Output data files are saved and should be looked for, in here.
  • ./src/: Code you want to be executed should be saved here.
  • ./src/pipes/: Makefiles and bash build scripts should be saved here. They have to be written as if they were saved in ./.
  • ./src/dockerfiles/: Dockerfiles should be saved here.

You can configure all of these paths in kerblam.toml, if you so desire. This is mostly done for compatibility reasons with non-kerblam! projects.

Warning

Please take a look at issue #11 before editing your paths.

Contributing

To contribute, please take a look at the contributing guide.

Code is not the only thing that you can contribute. Written a guide? Considered a new feature? Written some docstrings? Found a bug? All of these are meaningful and important contributions. For this reason, all contributors are listed in the contributing guide.

Thank you for taking an interest in Kerblam! Any help is really appreciated.

Licensing and citation

Kerblam! is licensed under the MIT License. If you wish to cite Kerblam!, please provide a link to this repository.

Naming

This project is named after the fictitious online shop/delivery company in S11E07 of Doctor Who. Kerblam! might be referred to as Kerblam!, Kerblam or Kerb!am, interchangeably, although Kerblam! is preferred. The Kerblam! logo is written in the Kwark Font by tup wanders.

Installation

You can find and download a Kerblam! binary in the releases tab. Download it and drop it somewhere that your $PATH points to.

If you want to install from source, install Rust and cargo, then run:

cargo install --git https://github.com/MrHedmad/kerblam.git

Requirements

Kerblam! requires a Linux (or, more generally, Unix-like) OS. It also calls binaries that it assumes are already installed: git, make, tar and docker.

If you can use all of them from your CLI, you should be good.
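This README does not say whether Kerblam! verifies these requirements itself. If you want to check them yourself, here is a minimal sketch that looks for each binary in the directories listed in $PATH. The function names are hypothetical, and the check only confirms that a file with that name exists, not that it is executable.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Binaries listed as requirements in this README.
pub const REQUIRED: [&str; 4] = ["git", "make", "tar", "docker"];

/// Return the first file named `name` found in the directories of
/// `path_var` (formatted like the PATH environment variable).
/// Only checks that the file exists, not its execute permission.
pub fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// List the required binaries that cannot be found on the current PATH.
pub fn missing_requirements() -> Vec<&'static str> {
    let path_var = env::var_os("PATH").unwrap_or_default();
    REQUIRED
        .iter()
        .copied()
        .filter(|bin| find_in_path(bin, &path_var).is_none())
        .collect()
}
```

An empty result from missing_requirements() means every binary was found somewhere on $PATH.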

Documentation

The Kerblam! documentation is in the /docs folder. Please take a look there for more information on what Kerblam! can do. For example, you might find the tutorial interesting.


And remember! If you want it...

Kerblam it!

Comments
  • `kerblam run` needs not copy the dockerfile

    kerblam run copies the Dockerfile from /data/dockerfiles/ to the root of the repo as Dockerfile, then runs docker build, docker run, etc. Docker supports an -f flag to specify which Dockerfile to use, so the Executor could be changed to accept an env: PathBuf (or similar) and use that path as the Dockerfile directly, instead of needing a FileMover.

    Makes cleanup easier too.
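    As a minimal sketch of this proposal, assuming the Dockerfile path is already known, the Executor could build the docker build invocation directly with std::process::Command and point Docker at the file with -f. The function name docker_build_command and the image tag are hypothetical; this is not Kerblam!'s actual code.

    ```rust
    use std::path::Path;
    use std::process::Command;

    /// Build (but do not run) a `docker build` command that uses the
    /// Dockerfile where it is, via `-f`, instead of copying it to the root.
    /// `context` is the build context (e.g. the project root) and `tag` is
    /// a hypothetical image name chosen by the caller.
    pub fn docker_build_command(dockerfile: &Path, context: &Path, tag: &str) -> Command {
        let mut cmd = Command::new("docker");
        cmd.arg("build")
            .arg("-f")
            .arg(dockerfile)
            .arg("-t")
            .arg(tag)
            .arg(context);
        cmd
    }
    ```

    Since nothing is copied, there is nothing to remove afterwards.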

    refactor 
    opened by MrHedmad 2
  • Killing `kerblam run` with SIGINT does not allow kerblam to clean up

    Title. If the running command receives a SIGINT, the whole program dies, and kerblam cannot clean up its files (e.g. the executor, the dockerfile, the profile renames, etc.).

    I treat this as a bug since cleanup is crucial.

    See this chapter of the CLI tools and this question.

    Command::spawn() returns a Child handle, which we can poll with try_wait() until the child process has finished. If we catch the SIGINT (e.g. with the ctrlc crate), we can signal the child to stop and exit early, while still cleaning up.
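    A minimal, std-only sketch of such a polling loop follows. It assumes that a signal handler (for example one installed with the ctrlc crate, not shown here) sets the shared interrupted flag; supervise and Outcome are hypothetical names, not Kerblam!'s API.

    ```rust
    use std::io;
    use std::process::Child;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::thread;
    use std::time::Duration;

    #[derive(Debug, PartialEq)]
    pub enum Outcome {
        /// The child exited on its own; holds its exit code, if any.
        Finished(Option<i32>),
        /// The interrupt flag was raised and the child was killed.
        Interrupted,
    }

    /// Poll `child` with `try_wait()` until it exits or `interrupted` is set.
    /// `cleanup` runs after either outcome (but not if polling itself errors).
    pub fn supervise<F: FnOnce()>(
        mut child: Child,
        interrupted: &AtomicBool,
        cleanup: F,
    ) -> io::Result<Outcome> {
        let outcome = loop {
            if let Some(status) = child.try_wait()? {
                break Outcome::Finished(status.code());
            }
            if interrupted.load(Ordering::SeqCst) {
                // The child may have exited in the meantime; ignore a failed kill.
                let _ = child.kill(); // SIGKILL on Unix
                child.wait()?; // reap the process
                break Outcome::Interrupted;
            }
            thread::sleep(Duration::from_millis(50));
        };
        cleanup();
        Ok(outcome)
    }
    ```

    Child::kill() sends SIGKILL on Unix; forwarding a gentler SIGINT to the child first would need something beyond std, which only exposes kill().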

    bug 
    opened by MrHedmad 1
  • build(deps): bump openssl from 0.10.59 to 0.10.60

    Bumps openssl from 0.10.59 to 0.10.60.

    Release notes

    Sourced from openssl's releases.

    openssl-v0.10.60

    What's Changed

    Full Changelog: https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.59...openssl-v0.10.60

    Commits
    • 8f4b97a Merge pull request #2104 from alex/bump-for-release
    • df66283 Release openssl v0.10.60 and openssl-sys v0.9.96
    • 1a09dc8 Merge pull request #2102 from sfackler/ex-leak
    • b0a1da5 Merge branch 'master' into ex-leak
    • f456b60 Merge pull request #2099 from alex/deprecate-store-ref-objects
    • a8413b8 Merge pull request #2100 from alex/symm-update-unchecked
    • a92c237 clippy
    • e839496 Don't leak when overwriting ex data
    • 602d38d Added update_unchecked to symm::Crypter
    • cf9681a fixes #2096 -- deprecate X509StoreRef::objects, it is unsound
    • Additional commits viewable in compare view

    opened by dependabot[bot] 1
  • Display should be implemented for paths

    For convenience, I have not yet implemented Display for PathBuf.

    At the very least, I'd like to get rid of the ugly (and useless) dots that arise from how the PathBufs are created:

    /some/path/./to/file.txt would become /some/path/to/file.txt.

    Additionally, displaying paths as relative would also be nice. If the current invocation directory is /test/path, /test/path/to/file.txt would be displayed as ./to/file.txt.
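    A possible sketch, assuming a small wrapper type (the hypothetical PrettyPath), since Rust's orphan rule does not allow implementing the foreign trait Display for the foreign type PathBuf. Path::components() already skips interior . entries, so the cleanup mostly concerns a leading ./; relative display uses Path::strip_prefix.

    ```rust
    use std::fmt;
    use std::path::{Component, Path, PathBuf};

    /// Remove `.` components. `Path::components()` already skips interior
    /// `.` entries, so in practice this drops a leading `./`.
    pub fn strip_cur_dir(path: &Path) -> PathBuf {
        path.components()
            .filter(|c| !matches!(c, Component::CurDir))
            .collect()
    }

    /// Hypothetical wrapper used only for pretty printing.
    pub struct PrettyPath<'a> {
        path: &'a Path,
        base: Option<&'a Path>,
    }

    impl<'a> PrettyPath<'a> {
        /// `base` is the directory to show paths relative to (e.g. the cwd).
        pub fn new(path: &'a Path, base: Option<&'a Path>) -> Self {
            Self { path, base }
        }
    }

    impl fmt::Display for PrettyPath<'_> {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            let clean = strip_cur_dir(self.path);
            if let Some(base) = self.base {
                // Show paths under `base` as `./relative/path`.
                if let Ok(rel) = clean.strip_prefix(strip_cur_dir(base)) {
                    return write!(f, "./{}", rel.display());
                }
            }
            write!(f, "{}", clean.display())
        }
    }
    ```

    Paths outside base fall back to the cleaned absolute form. Note that .. components are left untouched, since resolving them lexically can be wrong when symlinks are involved.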

    feat good first issue 
    opened by MrHedmad 1
  • Add version compatibility check

    This would fix #18.

    The check is placed in the code that generates the KerblamTomlOptions, so it runs only when the options are created (for example, it does not run during kerblam new).

    opened by MrHedmad 0
  • Add a check to see if the Kerblam! version is compatible with the `kerblam.toml` file

    kerblam new saves the version of the tool in a [meta] section. Right now, Kerblam! ignores this field, as it is never deserialized.

    The motive of this check is to warn users when the version of Kerblam! they used to create the project differs from the version that is currently running, since there might be compatibility differences between the two.
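    A minimal sketch of such a check, assuming the recorded version has already been read from the [meta] section as a MAJOR.MINOR.PATCH string (how it is stored there is not shown in this issue). The names parse_version and version_warning are hypothetical. Following the issue, it warns on any difference; a looser policy could compare only some components.

    ```rust
    /// Parse "MAJOR.MINOR.PATCH" into a tuple; `None` on malformed input.
    pub fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
        let mut parts = v.trim().split('.').map(|p| p.parse::<u64>().ok());
        let major = parts.next()??;
        let minor = parts.next()??;
        let patch = parts.next()??;
        if parts.next().is_some() {
            return None; // too many components
        }
        Some((major, minor, patch))
    }

    /// Return a warning if the recorded and running versions differ
    /// or cannot be compared; `None` if they match exactly.
    pub fn version_warning(recorded: &str, running: &str) -> Option<String> {
        match (parse_version(recorded), parse_version(running)) {
            (Some(a), Some(b)) if a == b => None,
            (Some(_), Some(_)) => Some(format!(
                "This project was created with Kerblam! {}, but you are running {}. \
                 There might be compatibility issues.",
                recorded, running
            )),
            _ => Some(format!(
                "Could not compare Kerblam! versions ('{}' vs '{}').",
                recorded, running
            )),
        }
    }
    ```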

    feat 
    opened by MrHedmad 0
  • Rename fragile data to something else

    "fragile" when referring to non-remote, non-precious data is inappropriate.

    It should be replaced with something better. A potential candidate is "replaceable", but it should be short.

    feat RFC 
    opened by MrHedmad 0
  • Allow inserting descriptions of the pipelines

    Right now, the only way to tell pipes apart is their filename. Since you type the filename every time you use kerblam run, it is usually kept short, and a short name does not tell much about what a pipe does. Adding a description is therefore preferable.

    To do this, I propose adding a custom header to the makefiles/shellfiles: the first contiguous block of lines that start with optional whitespace followed by #? is treated as a "docstring". For instance, in the world.sh file:

    set -e
    echo "Hello, world!"
    #? Say hello to the world, then dominate it.
    #?
    #? This script is generally very simple as it just uses
    #? an LLM to produce nonsense.
    
    world --dominate
    #? Disregard me.
    

    The block:

    Say hello to the world, then dominate it.
    
    This script is generally very simple as it just uses
    an LLM to produce nonsense.
    

    will be the description of the file.

    Running kerblam run will then display:

    Available pipes:
        world    Say hello to the world, then dominate it.
        save     Save a copy of this text
        fast     Run faster than normal
    

    As with a git commit message, the first line is the short version of the description (truncated if too long), while the full text is the long description.

    To be decided is how to show the full description to the user (kerblam run world --help? --describe? kerblam describe world?).
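    One way the proposed header could be parsed, as a sketch under the rules stated above (optional leading whitespace, then #?, then at most one space that is dropped). extract_docstring and short_description are hypothetical helpers, and the truncation length is an arbitrary choice left to the caller.

    ```rust
    /// Return the first contiguous block of `#?` lines, with the marker
    /// (and one following space, if any) removed. `None` if there is no block.
    pub fn extract_docstring(content: &str) -> Option<String> {
        let mut block: Vec<&str> = Vec::new();
        for line in content.lines() {
            if let Some(rest) = line.trim_start().strip_prefix("#?") {
                block.push(rest.strip_prefix(' ').unwrap_or(rest));
            } else if !block.is_empty() {
                // The first non-`#?` line after the block ends it.
                break;
            }
        }
        if block.is_empty() {
            None
        } else {
            Some(block.join("\n"))
        }
    }

    /// First line of the docstring, truncated to at most `max_chars`
    /// characters including the trailing "..." (assumes max_chars >= 3).
    pub fn short_description(doc: &str, max_chars: usize) -> String {
        let first = doc.lines().next().unwrap_or("");
        if first.chars().count() <= max_chars {
            first.to_string()
        } else {
            let cut: String = first.chars().take(max_chars.saturating_sub(3)).collect();
            format!("{}...", cut)
        }
    }
    ```

    On the world.sh example, extract_docstring skips the leading set -e and echo lines, stops at the blank line after the block, and never reaches the later "#? Disregard me." line.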

    feat RFC 
    opened by MrHedmad 0
  • Add a `--preserve-remote` option to `kerblam data clean`

    Sometimes you simply want to clean up everything except the input data, to "start fresh". kerblam data clean does this, but it also removes the remote input data, so you then need to re-download it. Avoiding this step with a flag would be very useful for this use case.
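    A sketch of how such a flag could filter the files to delete. The DataKind categories are hypothetical and only mirror the remote/precious terms used in these issues; they are not Kerblam!'s internal types.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical classification of a local data file.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum DataKind {
        /// Declared as remote: can be fetched again.
        Remote,
        /// Must never be deleted.
        Precious,
        /// Anything else (e.g. intermediate or output files).
        Other,
    }

    /// Select the files that a clean would delete.
    /// With `preserve_remote`, remote files are kept as well.
    pub fn files_to_delete(files: &[(PathBuf, DataKind)], preserve_remote: bool) -> Vec<&Path> {
        files
            .iter()
            .filter(|(_, kind)| match kind {
                DataKind::Precious => false,
                DataKind::Remote => !preserve_remote,
                DataKind::Other => true,
            })
            .map(|(path, _)| path.as_path())
            .collect()
    }
    ```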

    feat 
    opened by MrHedmad 0
  • Add a `--local` option to run pipelines locally even if dockerfile exists

    Sometimes, even though a dockerfile exists, you might want to run the pipelines locally.

    Add a flag to kerblam run to do this. Possibilities:

    • --local
    • --no-env
    • --ignore-env
    feat 
    opened by MrHedmad 0
  • feat: Add pipes and env dirs to config

    This fixes #12. The anyhow error has the info:

    Error: Could not find specified runtime 'test4'
    Available pipes:
        test 🐋
        test3
        test2
    

    The 🐋 icon signifies a pipe with a Dockerfile.

    The implementation is a bit clunky. Perhaps we should fuse the logic of finding the paths into a list of objects (e.g. Pipe) with both the path to the pipe and the path to the docker (if any).
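    A sketch of the Pipe idea, assuming (this is not confirmed here) that a pipe and its Dockerfile share the same file stem, e.g. test.makefile and test.dockerfile. It also shows how a --local style switch, as proposed in the issue above, would reduce to a single check. All names are hypothetical.

    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};

    /// A pipe plus the Dockerfile (if any) it should run in.
    #[derive(Debug, PartialEq)]
    pub struct Pipe {
        pub name: String,
        pub path: PathBuf,
        pub env: Option<PathBuf>,
    }

    fn stem(path: &Path) -> Option<String> {
        path.file_stem().map(|s| s.to_string_lossy().into_owned())
    }

    /// Pair every file in `pipes_dir` with a same-stem file in `env_dir`.
    pub fn find_pipes(pipes_dir: &Path, env_dir: &Path) -> io::Result<Vec<Pipe>> {
        // A missing env dir just means that no pipe has a container.
        let envs: Vec<PathBuf> = match fs::read_dir(env_dir) {
            Ok(entries) => entries.filter_map(|e| e.ok().map(|e| e.path())).collect(),
            Err(_) => Vec::new(),
        };
        let mut pipes = Vec::new();
        for entry in fs::read_dir(pipes_dir)? {
            let path = entry?.path();
            if !path.is_file() {
                continue;
            }
            let Some(name) = stem(&path) else { continue };
            let env = envs
                .iter()
                .find(|e| stem(e).as_deref() == Some(name.as_str()))
                .cloned();
            pipes.push(Pipe { name, path, env });
        }
        // read_dir order is unspecified, so sort for stable output.
        pipes.sort_by(|a, b| a.name.cmp(&b.name));
        Ok(pipes)
    }

    /// Run in a container only if a Dockerfile exists and no local run was requested.
    pub fn runs_in_container(pipe: &Pipe, local: bool) -> bool {
        pipe.env.is_some() && !local
    }

    /// Format the list like the error message shown above.
    pub fn format_pipes(pipes: &[Pipe]) -> String {
        let mut out = String::from("Available pipes:\n");
        for p in pipes {
            let whale = if p.env.is_some() { " 🐋" } else { "" };
            out.push_str(&format!("    {}{}\n", p.name, whale));
        }
        out
    }
    ```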

    opened by MrHedmad 0
  • CI does not deploy to many platforms

    We wish to support Linux and macOS, but the CI pipeline only deploys for one Linux target, which is not completely supported.

    See Rust platform support. I'd like to support:

    • aarch64-unknown-linux-gnu
    • i686-unknown-linux-gnu
    • x86_64-unknown-linux-gnu
    • x86_64-apple-darwin
    • aarch64-apple-darwin

    Other Tier 2 targets could be supported too, but anyone can compile from source if they need to; cargo makes it easy enough.

    feat help wanted 
    opened by MrHedmad 2
Releases(v0.1.0)
  • v0.1.0(Dec 7, 2023)

    This is the first release of Kerblam! Expect bugs and missing features. Your feedback is highly appreciated!

    Features:

    • kerblam new can be used to create a new kerblam! project. Kerblam! asks you if you want to use some common programming languages and sets up a proper .gitignore and pre-commit hooks for you.
    • kerblam data fetches remote data and saves it locally, manages local data and can clean it up, preserving only files that must be preserved. It also shows you how much local data is on the disk, how much data is remote and how much disk space you can free without losing anything important.
    • kerblam package packages your pipeline and exports a docker image for execution later. It's useful for reproducibility purposes as the docker image is primed for execution, bundling the kerblam! executable, execution files and non-remote data in the blob itself.
    • kerblam run executes the analysis for you, choosing your makefiles and dockerfiles appropriately and building docker containers as needed. Optionally, it allows test data or alternative data to be used instead of real data, in order to test your pipelines.
    • kerblam ignore can edit your .gitignore file by adding files, folders and GitHub's recommended ignores for specific languages in just one command.

    Known issues: Please see the issues to learn more about missing features and bugs. The most prominent issues are listed here:

    • ❗ #11: Some paths specified in kerblam.toml might not be respected by Kerblam!, especially the ./src, ./src/pipes and ./src/dockerfiles directories.
    • #13 and #8: Default profiles that need no configuration would be nice.
    • #12: kerblam run does not list the available pipes when you specify a pipe that is not available.

    Thank you for checking Kerblam! out! If you'd like to support the project, please leave a star. I really appreciate it.

    Source code(tar.gz)
    Source code(zip)
    kerblam_v0.1.0_x86_64-apple-darwin.zip(2.33 MB)
    kerblam_v0.1.0_x86_64-apple-darwin.zip.sha256sum(103 bytes)
    kerblam_v0.1.0_x86_64-unknown-linux-musl.tar.gz(2.07 MB)
    kerblam_v0.1.0_x86_64-unknown-linux-musl.tar.gz.sha256sum(112 bytes)
Owner
Luca "Hedmad" Visentin
PhD Student in Complex Systems for Life Sciences @ UniTO