6 Repositories
Rust datasets Libraries
Repository for CinPatent: Datasets for Patent Classification
We release two datasets for patent classification, in English and Japanese, on Google Drive. The data fold
Deduplicating Training Data Makes Language Models Better
This repository contains code to deduplicate language model datasets as described in the paper
Create full-fledged APIs for static datasets without writing a single line of code.
ROAPI automatically spins up read-only APIs for static datasets without requiring you to write a single line of code. It builds on top of Apache
Display ZFS datasets' I/O in real time
ztop is like top, but for ZFS datasets. It displays the real-time activity for datasets. The buil
shavee is a program to automatically decrypt and mount ZFS datasets using Yubikey HMAC or any USB drive as 2FA, with support for PAM to automount home directories.
shavee is a simple program, written in Rust, that decrypts and mounts encrypted ZFS user home directories at login using Yubikey HMAC or a simple USB drive as 2FA.
Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
Cleora is a genus of moths in the family Geometridae. Their scientific name derives from the Ancient Greek geo γῆ or γαῖα "the earth", and metr