Library for the Standoff Text Annotation Model, in Rust

annotation

Last update: Jan 11, 2023


Overview

STAM Library

STAM is a data model for stand-off text annotation and is described in detail here. This is a software library to work with the model, written in Rust.

This is the primary software library for working with the data model. It is currently in a preliminary stage. We aim to implement the full model and most extensions.

Installation

$ cargo add stam

Usage

Import the library

use stam;

Loading a STAM JSON file containing an annotation store:

fn your_function() -> Result<(),stam::StamError> {
    let store = stam::AnnotationStore::from_file("example.stam.json")?;
    ...
}

We assume some kind of function returning Result<_,stam::StamError> for all examples in this section.

The annotation store is your workspace: it holds all resources, annotation sets (i.e. keys and annotation data) and of course the actual annotations. It is a memory-based store and you can put as much as you like into it (as long as it fits in memory :).

Retrieving anything by ID:

let annotation: &stam::Annotation = store.get_by_id("my-annotation")?;
let resource: &stam::TextResource = store.get_by_id("my-resource")?;
let annotationset: &stam::AnnotationDataSet = store.get_by_id("my-annotationset")?;
let key: &stam::DataKey = annotationset.get_by_id("my-key")?;
let data: &stam::AnnotationData = annotationset.get_by_id("my-data")?;

Note that it is important to specify the return type, as that is how the compiler infers what you want to get. The methods are provided by the ForStore<T> trait.
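To illustrate why the type annotation matters, here is a minimal std-only sketch of return-type-driven lookup. It does not use the stam crate: `Lookup<T>`, `Store`, `demo_store` and the two item structs are hypothetical stand-ins, and the real ForStore<T> trait may be shaped differently.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the library's item types; not the stam API.
#[derive(Debug, PartialEq)]
struct Annotation {
    id: String,
}

#[derive(Debug, PartialEq)]
struct TextResource {
    id: String,
    text: String,
}

#[derive(Default)]
struct Store {
    annotations: HashMap<String, Annotation>,
    resources: HashMap<String, TextResource>,
}

// One generic trait, implemented once per item type the store holds.
trait Lookup<T> {
    fn get_by_id(&self, id: &str) -> Option<&T>;
}

impl Lookup<Annotation> for Store {
    fn get_by_id(&self, id: &str) -> Option<&Annotation> {
        self.annotations.get(id)
    }
}

impl Lookup<TextResource> for Store {
    fn get_by_id(&self, id: &str) -> Option<&TextResource> {
        self.resources.get(id)
    }
}

// Builds a tiny store with one annotation and one resource.
fn demo_store() -> Store {
    let mut store = Store::default();
    store
        .annotations
        .insert("A1".to_string(), Annotation { id: "A1".to_string() });
    store.resources.insert(
        "testres".to_string(),
        TextResource {
            id: "testres".to_string(),
            text: "Hello world".to_string(),
        },
    );
    store
}
```

With this in place, `let a: Option<&Annotation> = demo_store().get_by_id("A1");` would not compile (the store is a temporary), but binding the store first and then writing `let a: Option<&Annotation> = store.get_by_id("A1");` selects the `Annotation` implementation purely from the annotated type. Dropping the annotation leaves the compiler unable to choose between the two implementations.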

Iterating through all annotations in the store, and outputting a simple tab-separated format:

for annotation in store.annotations() {
    let id = annotation.id().unwrap_or("");
    for (key, data, dataset) in store.data(annotation) {
        // get the text to which this annotation refers (if any)
        let text: &str = match annotation.target().kind() {
            stam::SelectorKind::TextSelector => {
                store.select(annotation.target())?
            },
            _ => "",
        };
        // one record per line
        println!("{}\t{}\t{}\t{}", id, key.id().unwrap(), data.value(), text);
    }
}
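The text lookup in the loop above boils down to slicing a resource by a begin and end offset. The following std-only sketch shows one way to do that; it is not the library's implementation. The helper names `byte_offset` and `select` are hypothetical, and the sketch assumes that offsets count Unicode code points (not bytes) with an exclusive end.

```rust
/// Maps a character offset to a byte offset. The position just past the
/// last character is also valid, so that a span can end at the end of the text.
fn byte_offset(text: &str, char_offset: usize) -> Option<usize> {
    text.char_indices()
        .map(|(byte_index, _)| byte_index)
        .chain(std::iter::once(text.len()))
        .nth(char_offset)
}

/// Returns the text between two character offsets (begin inclusive,
/// end exclusive), or None if the span is inverted or out of bounds.
fn select(text: &str, begin: usize, end: usize) -> Option<&str> {
    if begin > end {
        return None;
    }
    let begin_byte = byte_offset(text, begin)?;
    let end_byte = byte_offset(text, end)?;
    Some(&text[begin_byte..end_byte])
}
```

Under these assumptions, `select("Hello world", 6, 11)` yields `Some("world")`, which matches the offsets used in the annotation examples in this section.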

Add resources:

let resource_handle = store.insert( stam::TextResource::from_file("my-text.txt") )?;

Many methods return a so-called handle instead of a reference. You can use this handle to obtain a reference as shown in the next example, in which we obtain a reference to the resource we just inserted:

let resource: &stam::TextResource = store.get(resource_handle)?;

Retrieving items by handle is much faster than retrieval by public ID, as handles encapsulate an internal numeric ID. Passing around handles is also cheap and sometimes easier than passing around references, as it avoids borrowing issues.
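The difference between the two lookups can be sketched with the standard library alone. This is an illustration of the general idea, not the library's internals: `ResourceHandle`, `Store` and their fields are hypothetical names, and the sketch assumes items live in a vector with a separate hash map from public ID to handle.

```rust
use std::collections::HashMap;

/// A handle is a small copyable wrapper around a position in the store.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ResourceHandle(usize);

struct TextResource {
    id: String,
    text: String,
}

#[derive(Default)]
struct Store {
    resources: Vec<TextResource>,
    // public ID -> handle, consulted only for lookups by ID
    index: HashMap<String, ResourceHandle>,
}

impl Store {
    /// Stores the resource and returns its handle.
    fn insert(&mut self, resource: TextResource) -> ResourceHandle {
        let handle = ResourceHandle(self.resources.len());
        self.index.insert(resource.id.clone(), handle);
        self.resources.push(resource);
        handle
    }

    /// Lookup by handle: a single bounds-checked vector access.
    fn get(&self, handle: ResourceHandle) -> Option<&TextResource> {
        self.resources.get(handle.0)
    }

    /// Lookup by public ID: hash the string first, then follow the handle.
    fn get_by_id(&self, id: &str) -> Option<&TextResource> {
        let handle = *self.index.get(id)?;
        self.get(handle)
    }
}
```

Because the handle is `Copy` and borrows nothing from the store, it can be kept around while the store is mutated, which is where references would run into the borrow checker.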

Add annotations:

let annotation_handle = store.annotate( stam::Annotation::builder()
           .target_text( "testres".into(), stam::Offset::simple(6,11)) 
           .with_data("testdataset".into(), "pos".into(), stam::DataValue::String("noun".to_string())) 
)?;

Here we see some Builder types that use a builder pattern to construct instances of their respective types. The actual instances will be built by the underlying store. Note the heavy use of into() to coerce the parameters to the right type. Rather than pass string parameters referring to public IDs, you may just as well pass and coerce (again with into()) references like &Annotation, &AnnotationDataSet, &DataKey or handles. We call the type of these parameters AnyId<T> and you will encounter them in more places.
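The into() coercion relies on Rust's standard `From`/`Into` traits. The sketch below shows the mechanism with a simplified, non-generic enum; it is not the library's AnyId<T> definition. The names `AnyId`, `Handle` and `describe` as written here are hypothetical, and conversion from item references is left out.

```rust
/// Hypothetical handle type standing in for the library's handles.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Handle(usize);

/// Simplified stand-in: either a public ID or an internal handle.
#[derive(Debug, PartialEq)]
enum AnyId {
    Id(String),
    Handle(Handle),
}

impl From<&str> for AnyId {
    fn from(id: &str) -> Self {
        AnyId::Id(id.to_string())
    }
}

impl From<String> for AnyId {
    fn from(id: String) -> Self {
        AnyId::Id(id)
    }
}

impl From<Handle> for AnyId {
    fn from(handle: Handle) -> Self {
        AnyId::Handle(handle)
    }
}

/// Accepts anything convertible into AnyId, so callers can pass a string
/// or a handle interchangeably.
fn describe(target: impl Into<AnyId>) -> String {
    match target.into() {
        AnyId::Id(id) => format!("public ID {}", id),
        AnyId::Handle(Handle(n)) => format!("handle {}", n),
    }
}
```

A function declared this way can be called as `describe("testres")` or `describe(Handle(3))`; when the parameter type is the enum itself rather than `impl Into<...>`, the caller writes `.into()` explicitly, as in the examples in this section.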

Create a store and annotations from scratch, with an explicitly filled AnnotationDataSet:

let store = stam::AnnotationStore::new().with_id("test".into())
    .add( stam::TextResource::from_string("testres".into(), "Hello world".into()))?
    .add( stam::AnnotationDataSet::new().with_id("testdataset".into())
           .add( stam::DataKey::new("pos".into()))?
           .with_data("D1".into(), "pos".into() , "noun".into())?
    )?
    .with_annotation( stam::Annotation::builder() 
            .with_id("A1".into())
            .target_text( "testres".into(), stam::Offset::simple(6,11)) 
            .with_data_by_id("testdataset".into(), "D1".into()) )?;

Here is the very same thing, but this time the AnnotationDataSet is filled implicitly:

let store = stam::AnnotationStore::new().with_id("test".into())
    .add( stam::TextResource::from_string("testres".to_string(),"Hello world".into()))?
    .add( stam::AnnotationDataSet::new().with_id("testdataset".into()))?
    .with_annotation( stam::Annotation::builder()
            .with_id("A1".into())
            .target_text( "testres".into(), stam::Offset::simple(6,11)) 
            .with_data_with_id("testdataset".into(),"pos".into(),"noun".into(),"D1".into())
    )?;

The implementation reuses any already existing AnnotationData where possible, as not duplicating data is one of the core characteristics of the STAM model.
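One common way to get this kind of reuse is to index stored data by its content and hand back the existing handle on a match. The sketch below shows that idea with the standard library only; it is an assumption about how such deduplication can work, not a description of the library's code. `DataSet`, `DataHandle` and `insert_or_reuse` are hypothetical names, and values are plain strings for brevity.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DataHandle(usize);

#[derive(Default)]
struct DataSet {
    // stored (key, value) pairs, addressed by handle
    data: Vec<(String, String)>,
    // content index: (key, value) -> handle of the stored pair
    by_content: HashMap<(String, String), DataHandle>,
}

impl DataSet {
    /// Returns the handle of an identical existing (key, value) pair,
    /// or stores the pair and returns a fresh handle.
    fn insert_or_reuse(&mut self, key: &str, value: &str) -> DataHandle {
        let pair = (key.to_string(), value.to_string());
        if let Some(&handle) = self.by_content.get(&pair) {
            return handle;
        }
        let handle = DataHandle(self.data.len());
        self.data.push(pair.clone());
        self.by_content.insert(pair, handle);
        handle
    }
}
```

Two annotations that both carry pos = noun would then share a single stored entry and differ only in the handle references they hold.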

You can serialize the entire annotation store (including all sets and annotations) to a STAM JSON file:

store.to_file("example.stam.json")?;

API Reference Documentation

See here

Comments

Implement resolution of relative offsets (AnnotationSelector)

Offsets in STAM may be relative to the annotation that is being pointed at (with an AnnotationSelector). These need to be added to the reverse index (textrelationmap)

opened by proycon 0
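The resolution step this issue asks for can be pictured as shifting the relative span by the start of the annotation it points at. The following is a minimal sketch under stated assumptions, not the planned implementation: `Offset` and `resolve` are hypothetical, and only non-negative offsets counted from the start of the parent span are handled.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Offset {
    begin: usize,
    end: usize,
}

/// Resolves `relative`, expressed inside the span `parent`, to absolute
/// coordinates in the underlying resource. Returns None if either span is
/// inverted or the relative span does not fit inside the parent.
fn resolve(parent: Offset, relative: Offset) -> Option<Offset> {
    let parent_len = parent.end.checked_sub(parent.begin)?;
    if relative.begin > relative.end || relative.end > parent_len {
        return None;
    }
    Some(Offset {
        begin: parent.begin + relative.begin,
        end: parent.begin + relative.end,
    })
}
```

For a parent annotation covering 6..11 of "Hello world" ("world"), the relative span 0..3 resolves to 6..9 ("wor"). The resolved absolute span is what a reverse index over the text would need to record.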
Pass user parameters to AnnotationStore

Users may want to pass parameters to the annotation store to configure what indices they want to build (by default all are built) and set some other run-time parameters.

opened by proycon 0
Serialisation/deserialisation to/from stand-off files with @include

Implement deserialisation and serialisation of the '@include' field. It is currently implemented only for TextResource. It also requires some extra bookkeeping to serialize to the same files as items were deserialized from.

opened by proycon 0
W3C Web Annotation export

Implement the https://github.com/annotation/stam/tree/master/extensions/stam-webannotations extension that enables export to W3C web annotations. Also requires a validation component that checks whether all IDs are proper IRIs.

opened by proycon 0
Implement deletion from stores

The current implementation does not do deletion yet.

Implement deletion (very easy), but also implement a mechanism to add subsequent new items at places that have been freed (rather than at the end increasing the store size).

opened by proycon 1
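The slot reuse described in this issue is commonly done with a free list. The sketch below shows that technique using the standard library only; it is one possible approach, not the library's design, and `Slots` with its methods is a hypothetical type.

```rust
/// A vector-backed container that leaves a hole on removal and refills
/// holes before growing.
struct Slots<T> {
    items: Vec<Option<T>>,
    // indices of freed slots, available for reuse
    free: Vec<usize>,
}

impl<T> Slots<T> {
    fn new() -> Self {
        Slots {
            items: Vec::new(),
            free: Vec::new(),
        }
    }

    /// Inserts into a freed slot if one exists, otherwise appends.
    fn insert(&mut self, item: T) -> usize {
        match self.free.pop() {
            Some(index) => {
                self.items[index] = Some(item);
                index
            }
            None => {
                self.items.push(Some(item));
                self.items.len() - 1
            }
        }
    }

    /// Removes the item and remembers the slot; other indices stay valid.
    fn remove(&mut self, index: usize) -> Option<T> {
        let removed = self.items.get_mut(index)?.take();
        if removed.is_some() {
            self.free.push(index);
        }
        removed
    }

    fn get(&self, index: usize) -> Option<&T> {
        self.items.get(index)?.as_ref()
    }
}
```

One caveat of this simple scheme is that an index held from before a removal may later refer to a different item placed in the same slot; a per-slot generation counter is a usual remedy.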

Releases(v0.1.0)

v0.1.0(Jan 13, 2023)

Initial release. This library is at an early stage of development. Not ready for production use yet.

Implemented at this stage is the core model and serialisation from/to STAM JSON.
Source code(tar.gz)
Source code(zip)


