Library for the Standoff Text Annotation Model, in Rust

Overview

Project Status: WIP – Initial development is in progress; there is not yet a stable, usable public release.

STAM Library

STAM is a data model for stand-off text annotation and is described in detail here. This is a software library to work with the model, written in Rust.

This is the primary software library for working with the data model. It is currently in a preliminary stage. We aim to implement the full model and most extensions.

Installation

$ cargo add stam

Usage

Import the library

use stam;

Loading a STAM JSON file containing an annotation store:

fn your_function() -> Result<(), stam::StamError> {
    let store = stam::AnnotationStore::from_file("example.stam.json")?;
    // ...
    Ok(())
}

All examples in this section are assumed to be wrapped in some such function returning Result<_, stam::StamError>, so the ? operator can be used.

The annotation store is your workspace: it holds all resources, annotation sets (i.e. keys and annotation data) and, of course, the actual annotations. It is a memory-based store, and you can load as much into it as you like (as long as it fits in memory).

Retrieving anything by ID:

let annotation: &stam::Annotation = store.get_by_id("my-annotation")?;
let resource: &stam::TextResource = store.get_by_id("my-resource")?;
let annotationset: &stam::AnnotationDataSet = store.get_by_id("my-annotationset")?;
let key: &stam::DataKey = annotationset.get_by_id("my-key")?;
let data: &stam::AnnotationData = annotationset.get_by_id("my-data")?;

Note that it is important to specify the return type, as that is how the compiler infers what you want to get. (These methods are provided by the ForStore<T> trait.)
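This reliance on the return type is ordinary Rust type inference at work. As a plain-Rust illustration (not the stam API), a generic function resolves its type parameter from the annotation at the call site:

```rust
// A generic function: the caller's type annotation determines T.
fn parse<T: std::str::FromStr>(s: &str) -> Option<T> {
    s.parse().ok()
}

fn main() {
    let n: i32 = parse("42").unwrap(); // the annotation fixes T = i32
    let f: f64 = parse("2.5").unwrap(); // same function, T = f64
    println!("{} {}", n, f);
}
```

The get_by_id() calls above work the same way: the declared type of the variable tells the store which kind of item to look up.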

Iterating through all annotations in the store, and outputting a simple tab separated format:

for annotation in store.annotations() {
    let id = annotation.id().unwrap_or("");
    for (key, data, dataset) in store.data(annotation) {
        // get the text to which this annotation refers (if any)
        let text: &str = match annotation.target().kind() {
            stam::SelectorKind::TextSelector => {
                store.select(annotation.target())?
            },
            _ => "",
        };
        println!("{}\t{}\t{}\t{}", id, key.id().unwrap(), data.value(), text);
    }
}

Add resources:

let resource_handle = store.insert( stam::TextResource::from_file("my-text.txt") )?;

Many methods return a so-called handle instead of a reference. You can use this handle to obtain a reference, as shown in the next example, in which we obtain a reference to the resource we just inserted:

let resource: &stam::TextResource = store.get(resource_handle)?;

Retrieving items by handle is much faster than retrieval by public ID, as handles encapsulate an internal numeric ID. Passing around handles is also cheap and sometimes easier than passing around references, as it avoids borrowing issues.
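As a plain-Rust sketch of this general pattern (not stam's actual internals), a handle is just a cheap, copyable index into the store, so it can be stored and passed around without borrowing the store itself:

```rust
// A minimal handle-based store: handles are Copy indices, so they remain
// usable while the store is still being mutated.
#[derive(Clone, Copy)]
struct ResourceHandle(usize);

struct Store {
    resources: Vec<String>,
}

impl Store {
    fn insert(&mut self, text: String) -> ResourceHandle {
        self.resources.push(text);
        ResourceHandle(self.resources.len() - 1)
    }

    fn get(&self, handle: ResourceHandle) -> Option<&String> {
        self.resources.get(handle.0)
    }
}

fn main() {
    let mut store = Store { resources: Vec::new() };
    let h = store.insert("Hello world".to_string());
    // the handle stays valid after further mutation; a reference would not
    store.insert("another resource".to_string());
    println!("{}", store.get(h).unwrap());
}
```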

Add annotations:

let annotation_handle = store.annotate( stam::Annotation::builder()
           .target_text( "testres".into(), stam::Offset::simple(6,11)) 
           .with_data("testdataset".into(), "pos".into(), stam::DataValue::String("noun".to_string())) 
)?;

Here we see some builder types that use the builder pattern to construct instances of their respective types. The actual instances are built by the underlying store. Note the heavy use of into() to coerce the parameters to the right type. Rather than passing string parameters referring to public IDs, you may just as well pass references like &Annotation, &AnnotationDataSet, &DataKey, or handles, and coerce them (again with into()). The type of these parameters is called AnyId<T>, and you will encounter it in more places.
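The offset (6,11) passed to Offset::simple() above selects the text span "world": the begin position is inclusive and the end position is exclusive, counted in unicode points from the start of the resource. For this ASCII example that coincides with plain Rust byte indexing:

```rust
fn main() {
    let text = "Hello world";
    // begin 6 (inclusive) to end 11 (exclusive), as in Offset::simple(6, 11)
    let span = &text[6..11];
    println!("{}", span); // prints "world"
}
```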

Create a store and annotations from scratch, with an explicitly filled AnnotationDataSet:

let store = stam::AnnotationStore::new().with_id("test".into())
    .add( stam::TextResource::from_string("testres".into(), "Hello world".into()))?
    .add( stam::AnnotationDataSet::new().with_id("testdataset".into())
           .add( stam::DataKey::new("pos".into()))?
           .with_data("D1".into(), "pos".into() , "noun".into())?
    )?
    .with_annotation( stam::Annotation::builder() 
            .with_id("A1".into())
            .target_text( "testres".into(), stam::Offset::simple(6,11)) 
            .with_data_by_id("testdataset".into(), "D1".into()) )?;

And here is the very same thing, but with the AnnotationDataSet filled implicitly:

let store = stam::AnnotationStore::new().with_id("test".into())
    .add( stam::TextResource::from_string("testres".to_string(),"Hello world".into()))?
    .add( stam::AnnotationDataSet::new().with_id("testdataset".into()))?
    .with_annotation( stam::Annotation::builder()
            .with_id("A1".into())
            .target_text( "testres".into(), stam::Offset::simple(6,11)) 
            .with_data_with_id("testdataset".into(),"pos".into(),"noun".into(),"D1".into())
    )?;

The implementation ensures that any already existing AnnotationData is reused where possible, as avoiding data duplication is one of the core characteristics of the STAM model.
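As a plain-Rust sketch of this deduplication idea (not stam's actual implementation), the store can index data by its content and hand back the existing handle when the same key/value pair is inserted again:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DataHandle(usize);

struct DataSet {
    data: Vec<(String, String)>,                  // (key, value) pairs
    index: HashMap<(String, String), DataHandle>, // content -> existing handle
}

impl DataSet {
    fn insert(&mut self, key: &str, value: &str) -> DataHandle {
        let content = (key.to_string(), value.to_string());
        if let Some(&handle) = self.index.get(&content) {
            return handle; // reuse existing data rather than duplicating it
        }
        let handle = DataHandle(self.data.len());
        self.data.push(content.clone());
        self.index.insert(content, handle);
        handle
    }
}

fn main() {
    let mut set = DataSet { data: Vec::new(), index: HashMap::new() };
    let first = set.insert("pos", "noun");
    let second = set.insert("pos", "noun"); // same content, same handle
    assert_eq!(first, second);
    println!("stored entries: {}", set.data.len()); // still 1
}
```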

You can serialize the entire annotation store (including all sets and annotations) to a STAM JSON file:

store.to_file("example.stam.json")?;

API Reference Documentation

See here

Releases
  • v0.1.0 (Jan 13, 2023)

    Initial release. This library is at an early stage of development and not yet ready for production use.

    Implemented at this stage are the core model and serialisation from/to STAM JSON.