A pure, low-level tensor program representation enabling tensor program optimization via program rewriting

Overview

Glenside

Check out the web demo!

Glenside is a pure, low-level tensor program representation which enables tensor program optimization via program rewriting, using rewriting frameworks such as the egg equality saturation library. If you are interested in transforming and optimizing tensor kernels (e.g. fusing kernels, exploring data layouts, or mapping to custom hardware), then Glenside is of interest to you! See the web demo for concrete examples. See our MAPS 2021 paper to understand why Glenside exists and how it can be used. Finally, see the docs for technical documentation.

Quickstart

The fastest way to verify that this code actually does something is to build the Docker image and run the tests:

git clone <this repo>
cd <this repo>
docker build --tag glenside .
docker run -it glenside cargo test

...and "soon" I will add interactive web demos and pretty visualizations!

Dependencies

Glenside optionally depends on TVM and CPLEX. To disable these optional dependencies, use the --no-default-features flag with cargo, e.g. cargo test --no-default-features.

CPLEX

Glenside uses the CPLEX ILP solver. CPLEX is no longer used in Glenside's core and should eventually be removed or cordoned off, but for now you need it to get Glenside fully working. To set up CPLEX, follow these steps:

  1. Get access to CPLEX. Students and academics can do so by making an account through their academic program. Download and install CPLEX on your machine.
  2. Set environment variables. Set $CPLEX_LIB to the location of the newly-installed CPLEX library on your machine. For me on OSX, it resides in /Applications/CPLEX_Studio1210/cplex/lib/x86-64_osx/static_pic/.

Publications

Comments
  • Use rewrites to convert relay to glenside

    This is a long-needed change which implements the Relay-to-Glenside compiler using egg rewrites.

    In the past, the Relay-to-Glenside compiler would read in Relay programs using the Relay Rust bindings, and then construct Glenside expressions out of those expressions. There were no Relay expressions in Glenside at the time, and so everything needed to be converted directly to Glenside. It was only later that we began adding Relay expressions into Glenside. Now, all Relay constructs of interest are explicitly representable in Glenside using RelayOperatorCall and other constructs. With that in mind, the cleaner way to implement a Relay to Glenside compiler would be by importing a Relay program as RelayOperatorCalls, and then using rewrites to rewrite the various subexpressions to their Glenside equivalents. We can implement the Glenside-to-Relay compiler the same way. This has multiple benefits:

    1. If we implement the Glenside-to-Relay compiler the same way, then we can run Relay-to-Glenside, Glenside-to-Glenside, and Glenside-to-Relay rewrites all at the same time, simplifying the entire pipeline significantly.
    2. We don't have to choose a single Glenside representation for any Relay operator. Previously, the Relay-to-Glenside compiler needed to choose a way to represent, for example, conv2d, which can be expressed in a few ways in Glenside. Now, to cover the multiple ways to represent a complex Relay operator in Glenside, we need only include multiple rewrites.
    3. Equivalent Glenside and Relay nodes are automatically unified. Currently, Mike has some code which manually unifies each Relay node with its equivalent Glenside node after Relay-to-Glenside compilation. This is totally unnecessary if we use rewrites, as rewrites perform exactly this unification. Once Mike had to implement this code, it became immediately obvious that we needed to rethink our compilation method.

    My plan for doing this: for each Relay operator supported in Glenside, add a rewrite which rewrites from a Relay operator invocation to a Glenside expression. How should we test, however?
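
    To make the plan concrete, here's a rough sketch of what one such rewrite might look like with egg's rewrite! macro. The pattern spellings (relay-operator-call, relay-add, elementwise-add, access-pair) and the Language/MyAnalysis imports are illustrative assumptions, not necessarily Glenside's exact names; a real set of rules would presumably live alongside the existing rewrites in src/language/rewrites.rs, with one or more rules per Relay operator in the checklist below.

    use egg::{rewrite, Rewrite};
    // `Language` and `MyAnalysis` are assumed names for Glenside's egg
    // language and analysis types.
    use glenside::language::{Language, MyAnalysis};

    /// Hypothetical sketch of a single Relay-to-Glenside rewrite; the
    /// operator names in the patterns are illustrative, not Glenside's
    /// exact surface syntax.
    fn relay_to_glenside_rewrites() -> Vec<Rewrite<Language, MyAnalysis>> {
        vec![
            // Rewrite an opaque Relay add call into Glenside's
            // elementwise add over a paired access pattern.
            rewrite!("relay-add-to-glenside";
                "(relay-operator-call relay-add ?a ?b)" =>
                "(compute elementwise-add (access-pair ?a ?b))"),
        ]
    }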

    Operators remaining to be implemented:

    • [x] conv2d
    • [x] conv1d
    • [x] softmax
    • [x] relu
    • [x] sqrt
    • [x] negative
    • [x] maxpool2d
    • [x] global_avg_pool2d
    • [x] expand_dims
    • [x] pad
    • [x] dense
    • [x] add
    • [x] multiply
    • [x] divide
    • [ ] ~maximum~ (we only implement this op opaquely in Glenside at the moment)
    • [ ] ~minimum~ (we only implement this op opaquely in Glenside at the moment)
    • [x] batch_flatten
    • [x] bias_add
    • [x] concatenate
    • [x] reshape
    • [x] transpose
    • [x] squeeze
    opened by gussmith23 14
  • Bring some of 3la-pldi-push-main to main

    This PR brings changes from our PLDI 2022 push into main. It does so by starting from our 3la-pldi-push-main branch and reverting/undoing/modifying a bunch of the changes. This allows me to preserve the commit history (giving proper contribution attribution to Vishal and Mike!) while not merging everything from the PLDI push.

    For each of the following files, we need to inspect the diff between 3la-pldi-push-main and main, and revert any changes that we don't want. Additionally, we need to add in any necessary tests.

    I might enlist some help on this cleanup in winter 2022. If so, I'll provide some more detailed instructions.

    Basic workflow:

    1. checkout bring-some-of-3la-pldi-push-main-to-main
    2. choose a file from the list below and run git diff bring-some-of-3la-pldi-push-main-to-main..main
    3. inspect the changes and determine which are appropriate to bring to main
    4. run git checkout --patch main -- <filename>. This lets you revert changes hunk-by-hunk. For changes which shouldn't be merged, hit y to tell git to stage a change reverting that hunk. For changes which should be merged, hit n -- no revert change will be created.
    5. once you finish the checkout, commit all of the reverts for that file. They should already be staged.

    Each major change being brought to main should be tested.

    This will likely also result in changes to TVM, which will hopefully get merged into mainline. We have a similar 3la-pldi-push-main branch on our TVM repo, for which we'll need to do this same process.

    List of files:

    • [X] .github/workflows/build-and-test.yml
    • [x] .github/workflows/rustfmt.yml
    • [ ] Cargo.toml
    • [x] Dockerfile
    • [x] examples/glenside-cli.rs
    • [x] models/lstm-for-pldi-pattern.relay
    • [x] models/lstm-for-pldi.relay
    • [x] models/resmlp.relay
    • [x] models/resnet.relay
    • [x] models/transformer.relay
    • [x] src/codegen.rs
    • [x] src/extraction/ilp.rs
    • [ ] src/extraction/mod.rs
    • [ ] src/language/from_relay/mod.rs
    • [ ] src/language/interpreter.rs
    • [ ] src/language/language.rs
    • [ ] src/language/rewrites.rs
    • [ ] tests/3la-glenside.rs
    • [ ] tests/3la-resnet.rs
    • [ ] tests/codegen-mlp.rs
    • [ ] tests/conv1d-flexmatch.rs
    • [ ] tests/lstm_relay_to_glenside.rs
    • [ ] tests/mobilenet-relay-to-glenside.rs
    • [ ] tests/mobilenet-try-to-run-rewrites.rs
    • [ ] tests/resmlp-linear-rewrites.rs
    • [ ] tests/resnet18_relay_to_glenside.rs
    • [ ] tests/tensorize-conv2d-with-padding-and-splitting-from-command-line.rs
    • [ ] tests/tensorize-conv2d-with-padding-and-splitting-with-blocking-from-command-line.rs
    • [ ] tests/tensorize-conv2d-with-padding-and-splitting.rs
    • [ ] tests/transformer.rs

    specific todos:

    • [x] relay_op_softmax test in codegen shouldn't be ignored
    • [x] from_relay() needs to be updated so that it doesn't return a tuple
    • [x] Compilation needs to use the new relay->glenside rewrites
    • [x] Fix conv2d NHWC typechecking in language.rs (see warn!)
    • [x] conv2d from_relay should respect use_opaque_operators_for. Currently, the glenside impl is always generated.
    • [x] We can't point to 3la-tvm, should point to mainline TVM if possible
    • [x] Update to new egg
    opened by gussmith23 8
  • Glenside expressions have too many common subexpressions; not capturing what's really happening

    The basic problem can be captured with the syntax of access-slice. Each slice slices out a piece of a tensor. For a given slice, it's currently ambiguous whether the sliced tensor is precomputed and resident in memory, or computed just for that slice. Most of the time, the tensor will have already been computed, and we're just reading from memory. That isn't what the language seems to capture, though! Instead, it seems to say that every slice expression recomputes the entire subexpression.

    Now, at first this wasn't a problem, because we could do whatever we wanted with the Glenside expression once we extracted it. So when we compile a program, we make sure each subexpression is computed just once.

    This raises the minor issue that the semantics of the language are vague/contradictory. But we can live with that, if things are working.

    However, there's a much bigger problem now: with real models (i.e. mobilenet), subexpressions show up a LOT. Mobilenet's convolutional layers are currently implemented using slice to slice out parts of the weight/image to be convolved. This happens many times (16x, 32x at least) per convolutional layer. This means that when we go to write a cost function, the cost is going to grow 16x per convolutional layer, as egg cost functions must be monotonic (that is, an expression's cost must be >= the cost of any of its children). Those 16s stack fast and quickly overflow usize::MAX.
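
    To see why the costs compound: an egg cost function computes a node's cost from the already-totaled costs of its child eclasses, so sharing is invisible to it. A minimal sketch mirroring egg's built-in AstSize shows how a subexpression that is sliced 16 times gets paid for 16 times:

    use egg::{CostFunction, Id, Language};

    /// Counts nodes the way egg's AstSize does: each parent pays the full
    /// cost of every child eclass, so a shared subexpression (e.g. the
    /// tensor being sliced 16 times) is counted once per use, not once
    /// overall.
    struct NodeCount;

    impl<L: Language> CostFunction<L> for NodeCount {
        type Cost = usize;

        fn cost<C>(&mut self, enode: &L, mut costs: C) -> Self::Cost
        where
            C: FnMut(Id) -> Self::Cost,
        {
            // saturating_add only papers over the usize overflow the issue
            // describes; the underlying blowup is still there.
            enode.fold(1usize, |sum, id| sum.saturating_add(costs(id)))
        }
    }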

    research 
    opened by gussmith23 5
  • Build Errors from Fresh Clone

    Replication Steps:

    $ git clone git@github.com:gussmith23/glenside.git
    $ cd glenside
    $ cargo run demo \
         conv2d \
         data/conv2d/conv2d.glenside \
         data/conv2d/conv2d-shapes.json \
         conv2d.c \
         conv2d-hw-design.json \
         --iter-limit 40 \
         --time-limit 40 \
         --node-limit 500000 \
         --find-monolithic-designs '(64,64)' \
         --blocking '(64,64)' \
         --prefer-bsg-blocking
    

    Error Message:

        Compiling serde v1.0.116
        Compiling tvm-sys v0.1.0 (https://github.com/mwillsey/incubator-tvm?rev=3b6edf9ec0b6b3ab6a91174e7e2aa321cd8ec9b2#3b6edf9e)
        Compiling tvm-macros v0.1.1 (https://github.com/mwillsey/incubator-tvm?rev=3b6edf9ec0b6b3ab6a91174e7e2aa321cd8ec9b2#3b6edf9e)
     error: failed to run custom build command for `tvm-sys v0.1.0 (https://github.com/mwillsey/incubator-tvm?rev=3b6edf9ec0b6b3ab6a91174e7e2aa321cd8ec9b2#3b6edf9e)`

     Caused by:
       process didn't exit successfully: `/home/stdavids/bsg_ml_atoms/extern/glenside/target/debug/build/tvm-sys-36ae5b38ceb5a8d5/build-script-build` (exit code: 101)
     --- stdout
     cargo:rerun-if-env-changed=TVM_HOME
     cargo:rustc-link-lib=dylib=tvm
     cargo:rustc-link-search=/home/stdavids/.cargo/git/checkouts/incubator-tvm-4920fa2689ba3c60/3b6edf9/build
     cargo:warning=couldn't execute `llvm-config --prefix` (error: No such file or directory (os error 2))
     cargo:warning=set the LLVM_CONFIG_PATH environment variable to a valid `llvm-config` executable

     --- stderr
     thread 'main' panicked at 'Unable to find libclang: "the `libclang` shared library at /usr/lib64/clang-private/libclang.so.7 could not be opened: libclangAST.so.7: cannot open shared object file: No such file or directory"', /home/stdavids/.cargo/registry/src/github.com-1ecc6299db9ec823/bindgen-0.51.1/src/lib.rs:1731:13
     note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

     warning: build failed, waiting for other jobs to finish...
     error: build failed
    
    opened by stdavids 4
  • Add new atom (with rewrites and codegen): systolic array with externally (non-Glenside) managed blocking

    Scott's tools can block up computations just like Glenside can. He wants us to add a new atom that will let him do the blocking himself.

    Tasks:

    • [x] Add atom: systolic-array-automatic-blocking
      • naming isn't great
    • [x] Add rewrite: need to figure out exactly what can be blocked up by Scott's scripts
    • [x] ~Interpret~ we're not actually implementing the systolic arrays in the interpreter (they're not needed, yet...and at that point, the interpreter would become, like, an emulator)
    • [x] Add codegen: call Scott's API
    • [x] Add demo
    opened by gussmith23 4
  • Tuples!

    Included in this PR:

    • new glenside language constructs
      • ConstructTuple
      • TupleGetItem
    • additional Relay Opaque Operators
      • RelayLeakyReLU
      • RelayAvgPool2D
      • RelayUpSampling
      • RelaySigmoid
      • RelayMaximum
      • RelayMinimum
    • edit fn create_worklist in from_relay/mod.rs to efficiently parse larger programs
    • edit run_relay.py to allow tuple output
    opened by hypercubestart 3
  • Implement conv2d-im2col-fc

    Scott can support an im2col'ed conv2d fully-connected systolic array invocation taking NHWC input. This means we can fold a bunch more access patterns into a hardware op, making us generate less code on our side (and hopefully making things faster by relying on his stuff more).

    To be clear, what I mean is this: There are roughly three ways that we've discussed implementing convolutions.

    1. "normal" convolutions. See #45 for a description of these. I have more info in my notes from talking with Scott.
    2. Im2col convolutions, where the data is already in im2col format -- we say that this is a "fully connected" conv2d because, thanks to the im2col trick, computing the conv2d looks like a single fully-connected (i.e. matmul/dense) invocation (see the shape sketch after this list). These are already implemented in Glenside.
    3. Im2col convolutions, where the data is in memory as NHWC and Scott's hardware reads it in as im2col. This is what we want to implement here.
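
    For context, the im2col trick the last two items rely on: assuming stride 1 and no padding (just to keep the shapes simple), the NHWC input is laid out so that each output position's receptive field becomes one row, and the convolution collapses into a single matrix multiply:

    \mathrm{im2col}(I) \in \mathbb{R}^{(N \cdot H_o \cdot W_o) \times (K_h K_w C)},\qquad
    W \in \mathbb{R}^{(K_h K_w C) \times O},\qquad
    \mathrm{conv2d}(I, W) = \mathrm{im2col}(I) \cdot W,
    \qquad H_o = H - K_h + 1,\quad W_o = W - K_w + 1.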
    hardware language rewrites 
    opened by gussmith23 3
  • Opaque relay op codegen

    TODO:

    • [x] implement codegen for relay ops
    • [x] implement add broadcasting
    • [x] add tests
      • [x] RelayGlobalAvgPool2D
      • [x] RelayBatchFlatten
      • [x] RelayBiasAdd
    • [x] implement assertion to check result with out array (need to add functions on NDarray)
      • [x] batchnorm
      • [x] softmax
      • [x] relu
      • [x] maxpool2d

    codegen for #65

    opened by hypercubestart 3
  • Investigate TVM playing nice with Docker

    docker build --tag glenside .
    docker run glenside cargo test --no-default-features --features tvm
    

    On a clean copy of the repository, running the commands above works on the first iteration.

    However, subsequent runs of the test suite produce the following output:

    failures:
    
    ---- codegen::tests::relay_op_softmax stdout ----
    thread 'codegen::tests::relay_op_softmax' panicked at 'Running Relay code failed with code Some(1).
    stdout:
    
    stderr:
    Traceback (most recent call last):
      File "/root/glenside/src/language/from_relay/run_relay.py", line 42, in <module>
        output = relay.create_executor(mod=expr, kind="graph").evaluate()(*inputs)
      File "/root/tvm/python/tvm/relay/backend/interpreter.py", line 172, in evaluate
        return self._make_executor()
      File "/root/tvm/python/tvm/relay/build_module.py", line 395, in _make_executor
        mod = build(self.mod, target=self.target)
      File "/root/tvm/python/tvm/relay/build_module.py", line 277, in build
        tophub_context = autotvm.tophub.context(list(target.values()))
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 116, in context
        if not check_backend(tophub_location, name):
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 158, in check_backend
        download_package(tophub_location, package_name)
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 184, in download_package
        os.mkdir(path)
    FileExistsError: [Errno 17] File exists: '/root/.tvm'
    ', src/codegen.rs:1836:9
    
    ---- codegen::tests::relay_op_relu stdout ----
    thread 'codegen::tests::relay_op_relu' panicked at 'Running Relay code failed with code Some(1).
    stdout:
    
    stderr:
    Traceback (most recent call last):
      File "/root/glenside/src/language/from_relay/run_relay.py", line 42, in <module>
        output = relay.create_executor(mod=expr, kind="graph").evaluate()(*inputs)
      File "/root/tvm/python/tvm/relay/backend/interpreter.py", line 172, in evaluate
        return self._make_executor()
      File "/root/tvm/python/tvm/relay/build_module.py", line 395, in _make_executor
        mod = build(self.mod, target=self.target)
      File "/root/tvm/python/tvm/relay/build_module.py", line 277, in build
        tophub_context = autotvm.tophub.context(list(target.values()))
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 116, in context
        if not check_backend(tophub_location, name):
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 158, in check_backend
        download_package(tophub_location, package_name)
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 184, in download_package
        os.mkdir(path)
    FileExistsError: [Errno 17] File exists: '/root/.tvm/tophub'
    ', src/codegen.rs:1836:9
    
    ---- codegen::tests::relay_op_batchflatten stdout ----
    thread 'codegen::tests::relay_op_batchflatten' panicked at 'Running Relay code failed with code Some(1).
    stdout:
    
    stderr:
    Traceback (most recent call last):
      File "/root/glenside/src/language/from_relay/run_relay.py", line 42, in <module>
        output = relay.create_executor(mod=expr, kind="graph").evaluate()(*inputs)
      File "/root/tvm/python/tvm/relay/backend/interpreter.py", line 172, in evaluate
        return self._make_executor()
      File "/root/tvm/python/tvm/relay/build_module.py", line 395, in _make_executor
        mod = build(self.mod, target=self.target)
      File "/root/tvm/python/tvm/relay/build_module.py", line 277, in build
        tophub_context = autotvm.tophub.context(list(target.values()))
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 116, in context
        if not check_backend(tophub_location, name):
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 158, in check_backend
        download_package(tophub_location, package_name)
      File "/root/tvm/python/tvm/autotvm/tophub.py", line 184, in download_package
        os.mkdir(path)
    FileExistsError: [Errno 17] File exists: '/root/.tvm/tophub'
    ', src/codegen.rs:1836:9
    
    
    failures:
        codegen::tests::relay_op_batchflatten
        codegen::tests::relay_op_relu
        codegen::tests::relay_op_softmax
    
    test result: FAILED. 300 passed; 3 failed; 8 ignored; 0 measured; 0 filtered out; finished in 33.86s
    

    Sometimes clearing the Docker cache and rebuilding the image fixes this issue, but for some reason it doesn't always work.

    We should look into this!

    continuous integration 
    opened by ninehusky 2
  • Stack overflow when running ILP extraction, but only after adding an objective function

    Strange bug I've been hunting all day.

    Originally, I'd set up the most basic set of constraints in an ILP program and pushed it through the solver. I realized, though, that without an objective function, the solver will just set all of the variables to 1.0. So I chose the most basic objective I could: minimize the sum of all of the eclass boolean variables. In some sense, it's like extracting the smallest program, as we do with the normal egg extractor.
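
    For reference, a rough sketch of this kind of formulation, with x_i the boolean for eclass i and y_n the boolean for enode n. The objective is the one described above; the constraints are just the usual eclass/enode selection constraints, written out only to make the setup concrete:

    \min \sum_i x_i
    \quad \text{s.t.} \quad x_{\mathrm{root}} = 1,
    \qquad x_i \le \sum_{n \in \mathrm{nodes}(i)} y_n \ \ \forall i,
    \qquad y_n \le x_j \ \ \forall n,\ \forall j \in \mathrm{children}(n),
    \qquad x_i,\, y_n \in \{0, 1\}.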

    I started running into a bunch of problems, though, with the stack overflowing.

    First, some routines inside of lp-modeler were not happy with the size of the model, it seemed. I was running into this problem: https://github.com/jcavat/rust-lp-modeler/issues/37

    Then, though I think (I'm not sure; I should check) I sidestepped those issues by keeping the problematic lp-modeler routines from running, I'm still getting stack overflows. Strangely, I'm having trouble tracking them down, too. They may also be happening during panic!(), as I've put some panic!()s in for testing. (A solution Jared suggested is to use exit() instead of panic!() to avoid unwinding the stack...there may still be issues with this on Mac.)

    I've tried upping the stack size, but initial testing doesn't seem to indicate that it's helping. Also, doing this in a sustainable way with cargo seems like a pain! I've been avoiding doing it, but may be stuck now.

    I am also going to try running on the Linux machine. Maybe that's the easiest way out.

    bug 
    opened by gussmith23 2
  • Remove `MyAnalysisData::Legacy`

    It's basically only used for Usize now. Should be replaced with Usize.

    • [x] Remove MyAnalysisDataLegacyData struct
    • [ ] change implementation of get_usize()
    • [ ] Remove implementation of get_shape()
    good first issue cleanup 
    opened by gussmith23 2
  • Bump numpy from 1.17.2 to 1.22.0

    Bumps numpy from 1.17.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Relation with TENSAT?

    I'm new to the space and I'm trying to fit all the pieces together conceptually... TENSAT (https://github.com/uwplse/tensat) appears to tackle a closely related task. From my understanding, TENSAT works at a higher level (performing substitutions of subgraphs of operators), whereas Glenside is a lower-level representation that can optimize the individual implementations of these operators.

    Although (I think?) Glenside can process an entire computation graph at once, its rewrite rules focus it on a very different set of optimizations, and trying to shoe-horn in TENSAT's high-level rewrites would, even if it is possible, blow up the search space too much for egg.

    Conceptually, could the two approaches be combined in a hierarchical fashion in which TENSAT rewrites the computation graph, then Glenside could be used to optimize each operator in isolation?

    opened by ZakSingh 1
  • Operator and access fusion

    For 3la work, it seems like it'd be useful to be able to represent fused operators in Glenside.

    Something like

    (compute add
     (pair
      (compute dot product
       (cartprod ?a ?b))
      ?c))
    

    would be nice to represent as

    (compute-fused (list dot-prod pair add)
      (cartprod ?a ?b) ?c)
    

    I think this is doable.

    First step would be to represent access operators as we represent compute operators, i.e. (access op args). This allows us to make the "fusion list" that I showed above, (list dot-prod pair add), of access pattern ops and compute ops.

    Type checking would then go through each compute/access op in sequence. There will be some limitations early on, I think, with how type checking can proceed. To illustrate, let me step through how we'd type-check/do shape inference on the above example (a rough code sketch follows the list).

    • We would iterate through the list of compute/access ops and shape-infer/type-check them in sequence. We start first with dot-prod. Dot prod takes one argument, so we pop the first argument off the list of args, in this case the cart-prod expression.
      • Here we hit our first limitation: if we implement it the above way, then we need to know how many args each op takes. Alternatively, we represent the list of arguments as a list of lists.
      • Another big limitation: I'm not sure how to represent any access patterns that require non-tensor arguments. One way would be to differentiate between data args and attributes, as TVM/Relay does. So for example, the current access operator, which we might rename to at so that it's (access at ...), would take one attribute (the access axis) and one data argument (the tensor to access). Then, to represent it in a fusion pipeline, we'd have (list ... (at 2) ...), where in this case the operator is constructed with its attributes.
    • To type-check, we would just run the same cartprod type checking routine we already have in Glenside.
    • Then we'd store the resulting intermediate type and move on to pair. Pair takes two args, and we have one provided by the intermediate value, so we pop another off the list.
      • Here's another limitation: we have to make an assumption about argument order. So perhaps we assume that the intermediate value is always the first arg. In this case, it doesn't matter, but in other cases it would. This could be fixed by replacing the list with a full AST structure, showing where arguments lie.
    • We'd run type checking on the pair, to produce another intermediate type.
    • Lastly, add takes only a single arg, which is our intermediate. So we just run that through type checking for add and we're done.
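
    Here's the rough code sketch of that sequential checking. Op, Shape, arity, and infer_one are hypothetical stand-ins for illustration, not Glenside's real types or routines:

    /// Hypothetical ops appearing in a fusion list.
    #[derive(Clone, Copy)]
    enum Op {
        DotProd,
        Pair,
        Add,
    }

    #[derive(Clone, Debug)]
    struct Shape(Vec<usize>);

    /// Total argument count for each op, as in the walkthrough above:
    /// dot-prod takes 1, pair takes 2, add takes 1.
    fn arity(op: Op) -> usize {
        match op {
            Op::Pair => 2,
            Op::DotProd | Op::Add => 1,
        }
    }

    /// Stand-in for the per-op shape inference Glenside already has.
    fn infer_one(_op: Op, args: &[Shape]) -> Shape {
        args[0].clone()
    }

    /// Walk the fusion list, threading the intermediate shape through and
    /// popping extra arguments off the front of `args` as each op needs
    /// them (assuming the intermediate is always an op's first argument).
    fn infer_fused(ops: &[Op], mut args: Vec<Shape>) -> Shape {
        // The first op has no intermediate yet, so all of its arguments
        // come from the explicit argument list (e.g. the cartprod).
        let first: Vec<Shape> = args.drain(..arity(ops[0])).collect();
        let mut intermediate = infer_one(ops[0], &first);
        for &op in &ops[1..] {
            let mut op_args = vec![intermediate];
            for _ in 1..arity(op) {
                op_args.push(args.remove(0));
            }
            intermediate = infer_one(op, &op_args);
        }
        intermediate
    }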

    I think this works. We could do this quickly by doing the following:

    • Introduce a new apply-access-op construct: (apply-access-op <access-op> <arg>...)
    • Introduce new access ops for the accesses we need: in the example above, we only need pair.
    • Introduce compute-fused and implement the type checking routine I described above.

    The big question is, does this work for Akash?

    research 
    opened by gussmith23 0
  • integration with Bernstein/Liu's ATL (A Tensor Language)

    ATL is a language developed by Gilbert Bernstein and Amanda Liu at MIT. It is a scheduling-level tensor language. It would be very interesting to compile to ATL from Glenside, or even to incorporate a subset of ATL into Glenside to do rewriting over ATL without having to use Coq.

    research 
    opened by gussmith23 0
  • Import Rosettafold model

    The Rosettafold model (which is implemented in PyTorch, I think) could be a good test example for reducing data transpositions using Glenside.

    https://github.com/RosettaCommons/RoseTTAFold/tree/main/network

    new workload 
    opened by gussmith23 0
Owner
Gus Smith
Computer Architecture and Programming Languages PhD @ University of Washington