Repository for CinPatent: Datasets for Patent Classification

Cinnamon

Last update: Jan 2, 2023

Overview

CinPatent: Datasets for Patent Classification

We release two patent classification datasets, one in English and one in Japanese, on Google Drive. The folder structure of the extracted data is described in the Structure section.

Data description

Each data file is in .ndjson (newline-delimited JSON) format: each line is a JSON object describing one sample, with the following attributes.

Field	Data type	Meaning
id	string	Patent ID
title	string	Patent title
abstract	string	Patent abstract
claim_1	string	First claim of the patent
claims	string	All patent claims
description	string	Patent description
is_train	boolean	Whether the sample is for training
is_dev	boolean	Whether the sample is for development
is_test	boolean	Whether the sample is for testing
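
A data file can be read line by line with the standard json module. The sketch below is a minimal, unofficial reader that checks each sample against the fields and types in the table above. The names read_ndjson, check_fields, and EXPECTED_FIELDS are hypothetical helpers, not part of the release, and keys not listed in the table are passed through untouched.

```python
import json
from typing import Iterator

# Field names and JSON types taken from the Data description table.
EXPECTED_FIELDS = {
    "id": str,
    "title": str,
    "abstract": str,
    "claim_1": str,
    "claims": str,
    "description": str,
    "is_train": bool,
    "is_dev": bool,
    "is_test": bool,
}


def check_fields(record: dict, line_no: int = 0) -> None:
    """Raise if a documented field is missing or has an unexpected type."""
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            raise KeyError(f"line {line_no}: missing field {name!r}")
        if not isinstance(record[name], expected_type):
            raise TypeError(
                f"line {line_no}: field {name!r} is "
                f"{type(record[name]).__name__}, expected {expected_type.__name__}"
            )


def read_ndjson(path: str) -> Iterator[dict]:
    """Yield one sample per non-empty line; undocumented keys are kept as-is."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # skip blank lines such as a trailing newline
                continue
            record = json.loads(line)
            check_fields(record, line_no)
            yield record
```

Reading lazily with a generator keeps memory use low, since the description field can make each line long.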

We partition the data into training, development, and test sets with an 80:10:10 ratio. The following table provides several statistics of our datasets.
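
The split of each sample is given by its is_train, is_dev, and is_test flags. The following is a minimal sketch for grouping samples by split and checking the resulting proportions; split_by_flags and split_ratios are hypothetical helper names, and the check that exactly one flag is true per sample is an assumption based on the splits being a partition.

```python
from typing import Dict, Iterable, List

SPLITS = ("train", "dev", "test")


def split_by_flags(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group samples by their is_train / is_dev / is_test flags."""
    splits: Dict[str, List[dict]] = {name: [] for name in SPLITS}
    for record in records:
        active = [name for name in SPLITS if record[f"is_{name}"]]
        # Assumption: the splits are disjoint, so exactly one flag is true.
        if len(active) != 1:
            raise ValueError(
                f"sample {record.get('id')!r} has flags {active}; expected exactly one"
            )
        splits[active[0]].append(record)
    return splits


def split_ratios(splits: Dict[str, List[dict]]) -> Dict[str, float]:
    """Fraction of all samples that falls into each split."""
    total = sum(len(samples) for samples in splits.values())
    if total == 0:
        raise ValueError("no samples")
    return {name: len(samples) / total for name, samples in splits.items()}
```

On a full data file, the returned fractions should come out close to 0.8, 0.1, and 0.1.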

Statistic	CinPatent-EN	CinPatent-JA
No. samples	45,131	54,657
No. labels	425	523
No. samples per label (mean ± std)	221.69 ± 38.56	226.94 ± 41.74
No. labels per sample (mean ± std)	2.09 ± 1.31	2.17 ± 1.32
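
The per-label and per-sample statistics can be recomputed from one label list per sample. The field that stores labels is not named in the Data description table, so this sketch takes the label lists as input instead of reading a specific key. label_stats is a hypothetical helper, and the use of the population standard deviation (statistics.pstdev) is an assumption; the table does not say which variant was used.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple, Union

Stat = Union[int, Tuple[float, float]]


def label_stats(label_lists: Sequence[Sequence[str]]) -> Dict[str, Stat]:
    """Recompute the statistics table from one label list per sample.

    The caller supplies the label lists because the label field is not
    documented in the data description.
    """
    # Duplicate labels within a sample are counted once.
    labels_per_sample = [len(set(labels)) for labels in label_lists]
    samples_per_label = Counter(
        label for labels in label_lists for label in set(labels)
    )
    counts = list(samples_per_label.values())
    return {
        "samples": len(label_lists),
        "labels": len(samples_per_label),
        # (mean, population standard deviation); the std variant is an assumption.
        "samples_per_label": (mean(counts), pstdev(counts)),
        "labels_per_sample": (mean(labels_per_sample), pstdev(labels_per_sample)),
    }
```

As a consistency check, both means count the same total number of label assignments, so samples × mean labels per sample should equal labels × mean samples per label. For CinPatent-EN, 45,131 × 2.09 ≈ 94,300 and 425 × 221.69 ≈ 94,200, which agree up to the rounding of the reported means.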

Structure

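The datasets are available at multiple ratios: 10%, 25%, 50%, 75%, and 100%. In each of en_patent and ja_patent, the full (100%) dataset is stored as en.ndjson or ja.ndjson, and the smaller subsets are stored in the segmentation folder.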

cinpatent
├── en_patent
│   ├── en.ndjson
│   └── segmentation
│       ├── en_0.1.ndjson
│       ├── en_0.25.ndjson
│       ├── en_0.5.ndjson
│       └── en_0.75.ndjson
└── ja_patent
    ├── ja.ndjson
    └── segmentation
        ├── ja_0.1.ndjson
        ├── ja_0.25.ndjson
        ├── ja_0.5.ndjson
        └── ja_0.75.ndjson
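
To select a file programmatically, the path can be built from the language code and ratio, following the naming pattern in the tree above. This is a minimal sketch; dataset_path and SUBSET_SUFFIXES are hypothetical names, and root is wherever the cinpatent folder was extracted.

```python
from pathlib import Path
from typing import Union

# Ratio -> file-name suffix, mirroring the segmentation/ files in the tree above.
SUBSET_SUFFIXES = {0.1: "0.1", 0.25: "0.25", 0.5: "0.5", 0.75: "0.75"}


def dataset_path(root: Union[str, Path], lang: str, ratio: float = 1.0) -> Path:
    """Return the .ndjson file for a language ("en" or "ja") and data ratio."""
    if lang not in ("en", "ja"):
        raise ValueError(f"unknown language {lang!r}; expected 'en' or 'ja'")
    lang_dir = Path(root) / f"{lang}_patent"
    if ratio == 1.0:  # the full dataset sits directly in the language folder
        return lang_dir / f"{lang}.ndjson"
    if ratio not in SUBSET_SUFFIXES:
        raise ValueError(
            f"unsupported ratio {ratio}; choose 0.1, 0.25, 0.5, 0.75, or 1.0"
        )
    return lang_dir / "segmentation" / f"{lang}_{SUBSET_SUFFIXES[ratio]}.ndjson"
```

For example, dataset_path("cinpatent", "en", 0.25) points to cinpatent/en_patent/segmentation/en_0.25.ndjson. Mapping ratios to fixed suffixes avoids depending on how a float happens to be formatted as a string.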

Contact

For further support, please contact us at [email protected].
