High-performance runtime for data analytics applications

Weld

Last update: Dec 28, 2022

Related tags

Data processing rust data machine-learning performance analytics llvm pandas stanford code-generation

Overview

Weld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a common intermediate representation, and optimizing across each framework.

Modern analytics applications combine multiple functions from different libraries and frameworks to build complex workflows. Even though individual functions can achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. Weld addresses this problem by lazily building up a computation for the entire workflow, then optimizing and evaluating it only when a result is needed.
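
To make the lazy-evaluation idea concrete, here is a minimal Python sketch. It is not Weld's API: the `Lazy` class and its `map`, `filter` and `evaluate` methods are hypothetical names chosen for illustration. The sketch records operations instead of running them, then applies all of them in a single pass when a result is requested, so no intermediate list is built between steps.

```python
class Lazy:
    """Hypothetical wrapper that records element-wise operations lazily."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)

    def map(self, fn):
        # Record the operation; nothing is computed yet.
        return Lazy(self.data, self.ops + (("map", fn),))

    def filter(self, pred):
        return Lazy(self.data, self.ops + (("filter", pred),))

    def evaluate(self):
        # Single pass over the input: every recorded operation is applied
        # per element, so no intermediate list exists between the steps.
        out = []
        for x in self.data:
            keep = True
            for kind, fn in self.ops:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out


pipeline = Lazy([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
result = pipeline.evaluate()  # work happens only here
```

A real system would compile the recorded operations rather than interpret them, but the structure (record first, fuse and run later) is the same.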

You can join the discussion on Weld on our Google Group or post on the Weld mailing list.

Building
Documentation
Grizzly (Pandas on Weld)
Tools

Building

To build Weld, you need the latest stable version of Rust and LLVM/Clang++ 6.0.

To install Rust, follow the steps here. You can verify that Rust was installed correctly on your system by typing rustc into your shell. If you already have Rust and rustup installed, you can upgrade to the latest stable version with:

rustup update stable

MacOS LLVM Installation

To install LLVM on macOS, first install Homebrew. Then:

brew install llvm@6

Weld's dependencies require llvm-config on $PATH, so you may need to create a symbolic link so the correct llvm-config is picked up (note that you might need to add sudo at the start of this command):

ln -sf `brew --prefix llvm@6`/bin/llvm-config /usr/local/bin/llvm-config

To make sure this worked correctly, run llvm-config --version. You should see 6.0.x.

Ubuntu LLVM Installation

To install LLVM on Ubuntu, get the LLVM 6.0 sources and then apt-get:

On Ubuntu 16.04 (Xenial):

wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-6.0 main"
sudo apt-get update
sudo apt-get install llvm-6.0-dev clang-6.0

On Ubuntu 14.04 (Trusty):

wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-6.0 main"

# gcc backport is required on 14.04, for libstdc++. See https://apt.llvm.org/
sudo apt-add-repository "deb http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu trusty main"
sudo apt-get update
sudo apt-get install llvm-6.0-dev clang-6.0

Weld's dependencies require llvm-config, so you may need to create a symbolic link so the correct llvm-config is picked up. sudo may be required:

ln -s /usr/bin/llvm-config-6.0 /usr/local/bin/llvm-config

To make sure this worked correctly, run llvm-config --version. You should see 6.0.x or newer.
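
The version check can also be scripted. The following is a small sketch, assuming only that `llvm-config --version` prints a string beginning with `major.minor`; the helper names `parse_llvm_version` and `llvm_config_version` are hypothetical.

```python
import re
import shutil
import subprocess


def parse_llvm_version(text):
    """Return (major, minor) parsed from `llvm-config --version` output."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def llvm_config_version():
    """Return the version of the llvm-config found on $PATH, or None."""
    path = shutil.which("llvm-config")
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_llvm_version(out.stdout)


if __name__ == "__main__":
    version = llvm_config_version()
    if version is None:
        print("llvm-config not found on $PATH")
    else:
        print("llvm-config reports LLVM %d.%d" % version)
```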

You will also need zlib:

sudo apt-get install zlib1g-dev

Building Weld

With LLVM and Rust installed, you can build Weld. Clone this repository, set the WELD_HOME environment variable, and build using cargo:

git clone https://www.github.com/weld-project/weld
cd weld/
export WELD_HOME=`pwd`
cargo build --release

Weld builds two dynamically linked libraries (.so files on Linux and .dylib files on Mac): libweld and libweldrt.
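
As a quick sanity check after the build, a script along these lines can report whether both libraries exist. This is a sketch under one assumption not stated above: that the libraries land in cargo's default release output directory, `target/release`, under `WELD_HOME`. The helper names are hypothetical.

```python
import os
import sys


def weld_library_names(platform=sys.platform):
    """File names of the two libraries for the given platform."""
    ext = ".dylib" if platform == "darwin" else ".so"
    return ["libweld" + ext, "libweldrt" + ext]


def expected_library_paths(weld_home, platform=sys.platform):
    # Assumption: cargo's default output directory for `--release` builds.
    release_dir = os.path.join(weld_home, "target", "release")
    return [os.path.join(release_dir, n) for n in weld_library_names(platform)]


if __name__ == "__main__":
    weld_home = os.environ.get("WELD_HOME")
    if weld_home is None:
        print("WELD_HOME is not set")
    else:
        for path in expected_library_paths(weld_home):
            status = "found" if os.path.exists(path) else "missing"
            print(f"{status}: {path}")
```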

Finally, run the unit and integration tests:

cargo test

Documentation

The Rust Weld crate is documented here.

The docs/ directory contains documentation for the different components of Weld.

language.md describes the syntax of the Weld IR.
api.md describes the low-level C API for interfacing with Weld.
python.md gives an overview of the Python API.
tutorial.md contains a tutorial for how to build a small vector library using Weld.

Python Bindings

Weld's Python bindings are in python, with examples in examples/python.

Grizzly

Grizzly is a subset of Pandas integrated with Weld. Details on how to use Grizzly are in python/grizzly. Some example workloads that make use of Grizzly are in examples/python/grizzly. To run Grizzly, you will also need the WELD_HOME environment variable to be set, because Grizzly needs to find its own native library through this variable.

Testing

cargo test runs unit and integration tests. A test name substring filter can be used to run a subset of the tests:

cargo test <substring to match in test name>

Tools

This repository contains a number of useful command line tools which are built automatically with the main Weld repository, including an interactive REPL for inspecting and debugging programs. More information on those tools can be found under docs/tools.md.

Comments

Codegen cleanup
On leg 2 of my winter break journey across the US, I made a prototype of a new LLVM code generator using llvm-rs. Three major changes:

Code generation through builders. Instead of generating LLVM code strings, builders provide a more type-safe (both at compile time and runtime) and concise means of creating an IR.

New code execution runtime. The llvm-rs JitEngine is pretty comparable to what's already implemented in easy_ll, except it integrates well with the code builders instead of relying on a string intermediary.

Revamped the REPL to actually produce output and use a command line parser.
opened by willcrichton 11
Build refactor
Removes the make commands from build.rs that were used to build the convertor dylib.

Change the package name grizzly to pygrizzly

Added a binary extension for libweld in pyweld/setup.py which somehow allowed auditwheel to stop complaining about libweld.so being part of the python wheel. I could run auditwheel successfully and it changed the platform tag which allowed me to upload to pypi.
opened by rahulpalamuttam 10
String ~ vec[i8] comparisons Python3
Am attempting in baloo to encode strings to Weld for e.g. sr[sr != 'abc'] to work, however there seems to be a bug somewhere. Are vec[i8] <comparison> vec[i8] expected to work correctly at the Weld level?

For example:

// _inp2 here is the index associated with the _inp0 strings data
|_inp0: vec[vec[i8]], _inp1: vec[i8], _inp2: vec[i64]|
let obj100 = (_inp0);
let obj101 = (map(
    obj100,
    |a: vec[i8]| a != _inp1
));
result(
    for(
        zip(_inp2, obj101),
        appender[i64],
        |b: appender[i64], i: i64, e: {i64, bool}|
            if (e.$1, merge(b, e.$0), b)
    )
)

This only seems to work when _inp1 is of length 1. So for:

sr = Series(np.array(['abc', 'Burgermeister', 'b'], dtype=np.bytes_))
sr[sr != 'b']    # will correctly return the first 2 elements
sr[sr != 'abc']  # does not; (returns all elements)
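
For reference, the result the Weld program above is expected to produce can be written as a pure-Python sketch. The function name `filter_index_not_equal` is hypothetical, and the index is assumed to be the positions 0..n-1; the sketch only states the intended semantics and says nothing about where the bug lies.

```python
def filter_index_not_equal(strings, needle, index):
    """Keep index[i] wherever strings[i] != needle (byte-wise comparison)."""
    # Corresponds to: map(obj100, |a: vec[i8]| a != _inp1)
    mask = [s != needle for s in strings]
    # Corresponds to the for/zip/appender loop: merge e.$0 when e.$1 is true.
    return [i for i, keep in zip(index, mask) if keep]


data = [b"abc", b"Burgermeister", b"b"]
index = [0, 1, 2]  # assumed positional index

not_b = filter_index_not_equal(data, b"b", index)
not_abc = filter_index_not_equal(data, b"abc", index)
```

Under these semantics, comparing against `b"abc"` should drop the first element rather than return all three.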

The most likely culprit is the encoding with Python3. The only changes I made are essentially moving from PyString_AsString and PyString_Size to the PyBytes_* equivalents (in the .cpp file) and encoding the str to utf-8, e.g. abc.encode('utf-8') (in the encoders.py file):

extern "C"
weld::vec<uint8_t> str_to_weld_char_arr(PyObject* in) {
    int64_t dimension = (int64_t) PyBytes_Size(in);
    weld::vec<uint8_t> t;
    t.size = dimension;
    t.ptr = (uint8_t*) PyBytes_AsString(in);
    return t;
}
...
if isinstance(obj, str):
    numpy_to_weld = self.utils.str_to_weld_char_arr
    numpy_to_weld.restype = WeldVec(WeldChar()).ctype_class
    numpy_to_weld.argtypes = [py_object]
    return numpy_to_weld(obj.encode('utf-8'))

Note that

En-/decoding numpy arrays of bytes works fine with the grizzly encoders (and using PyBytes_FromStringAndSize instead of PyString_FromStringAndSize).

Also toyed around with modifying WeldChar.ctype_class to c_char_p as opposed to c_wchar_p which seemed more appropriate yet produces the same result.

Encoding as ascii would probably be more appropriate, since Weld can't handle unicode from what I can tell. Nevertheless, the tested data is ascii.

This is with the master branch Weld.

Any feedback/idea on what the issue might be?
opened by radujica 9
Python encoder
@sppalkia: just as an FYI, not ready to merge.

Major issues right now:

Some benchmarks don't yet work correctly

Encoder / decoder much slower than C++ encoder / decoder
opened by deepakn94 8
Use typed null pointers instead of i64 0-values

Fixes #473 when using Weld with an LLVM 6.0 distribution that has LLVM_ENABLE_ASSERTIONS enabled.

Also fixes some README issues.

The issue was that some places in the code generation used an i64 0 literal as a substitute for null, which was okay with LLVM's module verifier but caused certain debug assertions to complain.

opened by sppalkia 7

Rust program using Weld exits with "LLVM ERROR: Program used external function ..." after calling FindFunction

System: Ubuntu 16.04
Rust: stable-x86_64-unknown-linux-gnu (1.28.0)

I can successfully compile the Weld library. I wrote a client application in Rust, similar to the example application at https://www.weld.rs/docs/weld/:

extern crate weld;

use weld::*;

#[repr(C)]
struct MyArgs {
    a: i32,
    b: i32,
}

fn main() {
    set_log_level(WeldLogLevel::Trace);
    let code = "|a: i32, b: i32| a + b";
    let ref mut conf = WeldConf::new();
    conf.set("weld.compile.dumpCode", "true");
    conf.set("weld.compile.dumpCodeDir", "/tmp");
    let mut module = WeldModule::compile(code, conf).unwrap();

    // Weld accepts packed C structs as an argument.
    let ref args = MyArgs { a: 1, b: 50 };
    let ref input = WeldValue::new_from_data(args as *const _ as Data);

    // Running a Weld module and reading a value out of it is unsafe!
    unsafe {
        // Run the module, which returns a wrapper `WeldValue`.
        let result = module.run(conf, input).unwrap();
        // The data is just a pointer: cast it to the expected type
        let data = result.data() as *const i32;

        let result = (*data).clone();
        assert_eq!(args.a + args.b, result);
    }
}

However, the program terminated with the following output:

[debug] 23:56:48.124: Started compiling LLVM
[debug] 23:56:48.124: Done creating LLVM context
[debug] 23:56:48.124: Done parsing module
[debug] 23:56:48.124: Done parsing bytecode file
[debug] 23:56:48.125: Done linking bytecode file
[debug] 23:56:48.125: Done validating module
[debug] 23:56:48.128: Done optimizing module
[debug] 23:56:48.128: Done creating execution engine
[debug] 23:56:48.128: Before Calling FindFunction
LLVM ERROR: Program used external function 'weld_rt_get_run_id' which could not be resolved!

I looked into the Weld source code and found that it calls llvm::execution_engine::LLVMGetFunctionAddress, and that it is LLVM that forcefully terminates the whole program. What is going wrong?

opened by stevenybw 7

Sequential loops

This implements sequential loops using the iterate(initial_value, update_func) construct, where initial_value is of some type T and update_func is of type T => {T, bool}. It works the same way as the sequential loop in NVL. We call update_func repeatedly on values starting from initial_value and stop when the bool it returns is false. Then the final value of the expression is the last T it produced.
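
The semantics described above can be sketched in a few lines of Python. This models only the behavior of the construct as described (call `update_func` until the returned bool is false, then return the last value); it is not the code generator, and the example update function is made up for illustration.

```python
def iterate(initial_value, update_func):
    """Apply update_func repeatedly; stop when it returns False as the flag."""
    value = initial_value
    while True:
        # update_func has type T => {T, bool}
        value, keep_going = update_func(value)
        if not keep_going:
            # The result is the last T that update_func produced.
            return value


def double_until_100(x):
    # Hypothetical update function: double, and continue while below 100.
    nxt = x * 2
    return nxt, nxt < 100


final = iterate(1, double_until_100)
```

Note that `update_func` always runs at least once, because the continuation flag is only known after the first call.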

I made this work slightly differently if the loop body is sequential vs parallel: in the sequential case, it just adds basic blocks in the current function, while in the parallel case, it adds new functions for the continuation, etc. This is the same way If generates code.

opened by mateiz 7
Weld caches the result of UDF calls?

Modifying the Weld program here https://github.com/weld-project/weld/blob/b6ef6748cec3f2740032df164fbeff0aeb0b236a/examples/cpp/udfs_from_library/udfs.cpp#L24 into |x:i64| cudf[add_five,i64](x) + cudf[add_five,i64](x) returns the expected result (2*x + 10) but only calls add_five once.
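
One way such behavior can arise is when structurally identical subexpressions are evaluated only once. The Python sketch below models that effect with a memoizing evaluator over tuple-encoded expressions; the encoding and the `evaluate` helper are hypothetical, and whether this is what happens inside Weld for this program is an assumption, not something this report confirms.

```python
calls = {"add_five": 0}


def add_five(x):
    calls["add_five"] += 1  # count invocations of the UDF
    return x + 5


def evaluate(expr, env, cache):
    """Evaluate a tuple-encoded expression, reusing identical subtrees."""
    if expr in cache:
        return cache[expr]
    kind = expr[0]
    if kind == "var":
        value = env[expr[1]]
    elif kind == "call":
        value = expr[1](evaluate(expr[2], env, cache))
    elif kind == "add":
        value = evaluate(expr[1], env, cache) + evaluate(expr[2], env, cache)
    else:
        raise ValueError(f"unknown expression kind: {kind!r}")
    cache[expr] = value
    return value


x = ("var", "x")
# Models: cudf[add_five,i64](x) + cudf[add_five,i64](x)
program = ("add", ("call", add_five, x), ("call", add_five, x))
result = evaluate(program, {"x": 3}, {})
```

The sum is still 2*x + 10, but the UDF runs once. Reusing the result is only safe when the UDF has no side effects.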

opened by mihai-varga 6
Nditer
The main structural change involves adding a new kind, NdIter, to IterKind:

pub enum IterKind {
    ScalarIter, // A standard scalar iterator.
    SimdIter,   // A vector iterator.
    FringeIter, // A fringe iterator, handling the fringe of a vector iter.
    NdIter,     // multi-dimensional nd-iter
}

and a couple of fields relevant to NdIter in struct Iter and struct ParallelForIter, e.g.:

pub struct ParallelForIter {
    pub data: Symbol,
    pub start: Option,
    pub end: Option,
    pub stride: Option,
    pub kind: IterKind,
    // NdIter specific fields
    pub strides: Option,
    pub shapes: Option,
}

The code follows the same path as for other IterKinds, and the main changes are in the code to generate the index of the next element in llvm.rs (gen_loop_iteration_start, and gen_loop_iteration_end). And changing the bounds/num_iterations based on the shapes parameter also in llvm.rs (gen_loop_bounds_check, and gen_num_iters_and_fringe_start).

Besides this, I added a struct in llvm.rs (not sure if this was the best place for this?):

pub struct VecLLVMInfo {
    pub ty_str: String,
    pub arr_str: String,
    pub prefix: String,
    pub len_str: String,
    pub el_ty_str: String,
}

which was useful for many of the llvm routines I had.

I have only added two tests for it (testing with zip, and a basic op (log)), but I did test it further using the numpy API where it was more natural as I could compare different outputs of non-contig arrays with results numpy produces. Here, I had a simple case that simulates a non-contiguous array using a 1-d rust array.

One issue that still remains is how to set the size of the array correctly: right now it is being done in transforms.rs --> infer_size using the formula: len = end-start / stride. Instead, the new formula should be shapes[0]*shapes[1]*...*shapes[n-1], but I wasn't sure how to emit weld code for this using exprs. In weld it should be something like:

for(shapes, merger[i64, *], |b, i, e| merge(b, e));

And using exprs, as in infer_size, I thought it would be something like:

let b = exprs::newbuilder_expr(Merger(Scalar(I64), BinOpKind::Multiply))?;
let m = exprs::merge_expr(b, ???);
exprs::for_expr(iters[0].data.clone(), b, m, false)?

but I wasn't sure how to do the merge_expr --> in weld code I guess we have access to element "e", but how do we get it here?

If this is resolved, then we won't need to pass in "end", and "stride" to nditer. So far, I was just using stride=1, and end = start + real_len, which is functionally correct, but adds too many parameters to nditer...
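
The two size formulas discussed above can be compared in a short Python sketch. The function names are hypothetical, and the existing formula is read here as (end - start) / stride, which is an assumption about the intended precedence.

```python
from functools import reduce
import operator


def size_from_range(start, end, stride):
    # Existing approach, read as (end - start) / stride (assumed precedence).
    return (end - start) // stride


def size_from_shapes(shapes):
    # Mirrors for(shapes, merger[i64, *], |b, i, e| merge(b, e)):
    # a multiplicative merger starts from the identity 1.
    return reduce(operator.mul, shapes, 1)


shapes = [2, 3, 4]
real_len = size_from_shapes(shapes)
# The workaround described above: stride = 1 and end = start + real_len.
workaround_len = size_from_range(0, 0 + real_len, 1)
```

Both give the same length for this case, which is why the workaround is functionally correct even though it passes redundant parameters.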
opened by parimarjan 6
Reduce small loop overhead
increase outer loop grain size to 4096

eliminate bounds checks for simple single-iter (no explicit start and end) cases

minimize thread ID retrievals

local stack-based mergers with lazy creation of global mergers

performance on a worst-case loop like this

merger[+] m
for v in vs {
    merger[+] n
    for e in v { // only one iteration
        merge(n, e)
    }
    merge(m, result(n))
}
result(m)

is still 4x off the C version ... used to be 500x though
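
For readers unfamiliar with the merger notation, the worst-case loop above computes the same value as this Python sketch, where each `merger[+]` is modeled as a plain accumulator. The function name and input are illustrative only.

```python
def nested_sum(vs):
    """Sum of all elements, built with one inner accumulator per vector."""
    m = 0  # merger[+] m
    for v in vs:
        n = 0  # merger[+] n, created once per outer iteration
        for e in v:
            n += e  # merge(n, e)
        m += n  # merge(m, result(n))
    return m


# Worst case: many inner vectors with a single element each, so the cost of
# creating and finalizing the inner merger dominates the useful work.
worst_case = [[i] for i in range(1000)]
total = nested_sum(worst_case)
```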
opened by jjthomas 6
Eliminated shared runtime library

Compile times may have gone up ... tests seem to take a bit longer to run. @mateiz maybe you can measure this?

weld_module_free and weld_module_mem_free are now used to free memory (they free all memory allocated by the module, which is what weld_value_free did before), and weld_module_memory_usage is used to determine a module's total memory usage. weld_value_free and weld_value_memory_usage now do nothing. I think we should implement these later once we figure out how to correctly free and determine the memory usage of a single value.

opened by jjthomas 6
movielens_grizzly.py code not working.

Hello,

I tried to run "movielens_grizzly.py" but I got the following error:

weld_type = grizzly_impl.numpy_to_weld_type_mapping[dtype]
KeyError: '|S1'

Is this a known issue?

Thanks in advance.

opened by kchasialis 0
Bump numpy from 1.18.1 to 1.22.0 in /weld-python
Bumps numpy from 1.18.1 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
unique() function on weld-capi.
Hello,

I want to implement the grizzly_impl.unique() function using weld-capi.

After looking at the grizzly_impl.py code I found out that this is the code for unique()

map(
    tovec(
        result(
            for(
                map(obj_id, |p: vec[i8]| {p,0}),
                dictmerger[vec[i8],i32,+],
                |b, i, e| merge(b,e)
            )
        )
    ),
    |p: {vec[i8], i32}| p.$0
)

However, obj_id is retrieved at runtime and I do not know how to do that using weld-capi. Basically, my question is how to write a unique() function in Weld IR that can be compiled and called using weld-capi.
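
To clarify what the IR above computes (not how to pass obj_id through the C API, which this sketch does not address), here is a pure-Python equivalent. The function name is hypothetical, and the output order is an artifact of Python dictionaries; no ordering is implied for Weld's tovec.

```python
def unique(values):
    """Distinct values, computed the way the dictmerger-based IR does."""
    merged = {}
    for p in values:
        # dictmerger[vec[i8],i32,+] merging the pair {p, 0}: the value is a
        # dummy, only the key matters.
        merged[p] = merged.get(p, 0) + 0
    # tovec(...) yields {key, value} pairs; the final map keeps p.$0.
    return [key for key, _ in merged.items()]


names = [b"a", b"b", b"a", b"c", b"b"]
distinct = unique(names)
```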

Thanks in advance!
opened by kchasialis 0
Running Python UDFs in Weld.

I am trying to run a UDF pipeline on a dataset using Weld (or grizzly, I suppose).

Grizzly, however, (as far as I know) does not offer an optimized function to apply for example a scalar UDF on a specific column of the dataset.

I found that one way to do it is to access the internal data using to_pandas() which has a function called “apply” and use this function to run a Python UDF on a column.

The problem is that I want to measure Weld’s performance on UDFs and by accessing the internal data and applying the functions just like a normal python program would do is not a fair way to measure Weld’s performance regarding (Python) UDF execution.

How can I apply a python UDF on a column of the dataset in an optimized way using Weld?

Thanks in advance!

opened by kchasialis 0
example udfs run error

When I try to build the udfs example and run it, I get the error: LLVM ERROR: Program used external function 'add_five' which could not be resolved! I figure the gcc flags are missing -rdynamic. I want to submit a PR to fix that, but I found that there are still 2 unmerged PRs. I want to know who is maintaining this project and what the PR standard is.

opened by bakey 0

Releases(v0.4.0)

v0.4.0(Feb 13, 2020)
Changes

Deprecates the old constructors API in lieu of the NewExpr trait on Expr

Adds a new SIR pass that allows for improved performance in some cases

v0.3.1(Aug 23, 2019)

This release is equivalent to the tagged v0.3.0 release, but it increments the Cargo.toml version number. This allows the version of Weld published on Cargo to be in sync with the version tagged in releases on GitHub from now on.
v0.3.0(Aug 23, 2019)
This is the first tagged release of Weld. See the release notes below.

v0.3.0

This release brings a new, optimized LLVM backend and updates to the core Weld APIs. It also removes multi-threading support for the time being, because the old multi-threading design imposed undue overhead on single-threaded programs. In the future, threading will be re-introduced.

Detailed Notes

Introduces the WeldContext API, which provides a context for Weld runs to allocate their memory and to reuse objects allocated by other Weld runs. See the WeldContext documentation (docs/latest/weld/struct.WeldContext.html) for more details on how this is used.

Adds an optlookup operator that performs keyexists and lookup with a single hash operation. The operation returns a value and a boolean indicating whether the value was found. This operator will replace lookup on dictionaries eventually. See the language docs for more detail.

Changes the sort API to take a comparator function similar to libc qsort.

Adds an assert operator, which crashes the program if the assertion fails and evaluates to true otherwise.

Removes support for the nditer iterator. This will be added back in a future release.

Adds new options for dumping code. See the new configuration options (docs/latest/weld/#constants) for details.

Adds the ability to make type aliases:

type mytuple = {i32,i32};

Internally, the backend now uses LLVM's C builder API. This improves both compilation times and execution time (since LLVM's optimizer does a better job overall).

Changes the hash function to CRC32. This is only supported on x86/x64.

Implements common subexpression elimination.

Bug fixes (see the PRs on the Weld repository).
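
To illustrate the optlookup operator from the notes above, the following Python sketch contrasts a single-lookup version with the two-step keyexists-then-lookup pattern it replaces. This only models the described behavior (a value plus a found flag); the `default` parameter is a hypothetical stand-in, since the notes do not say what value is returned when the key is missing.

```python
_MISSING = object()  # sentinel so stored None values are still "found"


def optlookup(d, key, default=None):
    """Return (value, found) using a single dictionary lookup."""
    value = d.get(key, _MISSING)
    if value is _MISSING:
        return default, False
    return value, True


def two_step_lookup(d, key, default=None):
    """keyexists followed by lookup: the key is hashed twice."""
    if key in d:
        return d[key], True
    return default, False


table = {"a": 1, "n": None}
hit = optlookup(table, "a")
miss = optlookup(table, "zzz")
```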


Owner

Weld

The Weld Project

GitHub https://www.weld.rs
