Repository for CinPatent: Datasets for Patent Classification

Overview

CinPatent: Datasets for Patent Classification

We release two datasets for patent classification, one in English and one in Japanese, on Google Drive. The folder structure after extraction is described in the Structure section.

Data description

Each data file is an .ndjson file in which each line describes one sample as a JSON object with the following attributes.

Field        Data type  Meaning
id           string     Patent ID
title        string     Patent title
abstract     string     Patent abstract
claim_1      string     First claim from patent claims
claims       string     All patent claims
description  string     Patent description
is_train     boolean    Whether the sample is for training
is_dev       boolean    Whether the sample is for development
is_test      boolean    Whether the sample is for testing
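
As a minimal sketch of how the files can be consumed, the Python snippet below reads an .ndjson file line by line and groups the samples by the is_train, is_dev, and is_test flags. The function names read_ndjson and split_samples are our own illustrations and are not shipped with the dataset.

```python
import json
from pathlib import Path


def read_ndjson(path):
    """Yield one dict per non-empty line of an NDJSON file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def split_samples(samples):
    """Group samples into train/dev/test lists using the boolean flags."""
    splits = {"train": [], "dev": [], "test": []}
    for sample in samples:
        for name in splits:
            # A missing flag is treated as False (an assumption of this sketch).
            if sample.get(f"is_{name}"):
                splits[name].append(sample)
    return splits
```

For example, split_samples(read_ndjson("cinpatent/en_patent/en.ndjson")) should return the three partitions of the English dataset, assuming the archive was extracted into the current directory.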

We partition the data into training, development, and test sets with a ratio of 80:10:10. The following table provides several statistics of our datasets.

Statistic          CinPatent-EN    CinPatent-JA
no. samples        45,131          54,657
no. labels         425             523
no. samples/label  221.69 ± 38.56  226.94 ± 41.74
no. labels/sample  2.09 ± 1.31     2.17 ± 1.32
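
Statistics of this kind could be recomputed along the following lines. This is only a sketch: the attribute list above does not name the field that holds the labels, so the label_key parameter (defaulting to "labels") is a hypothetical name that should be checked against the actual files. We also assume a population standard deviation, which may differ from how the figures in the table were computed.

```python
from collections import Counter
from statistics import mean, pstdev


def label_statistics(samples, label_key="labels"):
    """Return sample/label counts and (mean, std) pairs for a list of samples.

    label_key is a hypothetical field name; the README does not document it.
    Raises statistics.StatisticsError if samples is empty.
    """
    samples_per_label = Counter()
    labels_per_sample = []
    for sample in samples:
        labels = set(sample[label_key])  # ignore duplicate labels in a sample
        labels_per_sample.append(len(labels))
        samples_per_label.update(labels)
    per_label = list(samples_per_label.values())
    return {
        "num_samples": len(labels_per_sample),
        "num_labels": len(samples_per_label),
        "samples_per_label": (mean(per_label), pstdev(per_label)),
        "labels_per_sample": (mean(labels_per_sample), pstdev(labels_per_sample)),
    }
```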

Structure

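The datasets are available at multiple ratios: 10%, 25%, 50%, 75%, and 100%. As the tree below shows, the full (100%) files are en.ndjson and ja.ndjson in en_patent and ja_patent, and the smaller subsets are in each language's segmentation folder.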

cinpatent
├── en_patent
│   ├── en.ndjson
│   └── segmentation
│       ├── en_0.1.ndjson
│       ├── en_0.25.ndjson
│       ├── en_0.5.ndjson
│       └── en_0.75.ndjson
└── ja_patent
    ├── ja.ndjson
    └── segmentation
        ├── ja_0.1.ndjson
        ├── ja_0.25.ndjson
        ├── ja_0.5.ndjson
        └── ja_0.75.ndjson
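
The layout above can be turned into a small path helper. The sketch below assumes the archive was extracted into a folder named cinpatent as shown; dataset_path and SUBSET_RATIOS are illustrative names of our own, not part of the release.

```python
from pathlib import Path

# Subset ratios that appear as files in the segmentation folders.
SUBSET_RATIOS = ("0.1", "0.25", "0.5", "0.75")


def dataset_path(root, language, ratio=1):
    """Build the path of a dataset file for language "en" or "ja".

    ratio=1 selects the full file; 0.1, 0.25, 0.5, and 0.75 select a subset.
    """
    if language not in ("en", "ja"):
        raise ValueError(f"unknown language: {language!r}")
    folder = Path(root) / f"{language}_patent"
    ratio = str(ratio)
    if ratio in ("1", "1.0"):
        return folder / f"{language}.ndjson"
    if ratio not in SUBSET_RATIOS:
        raise ValueError(f"unknown ratio: {ratio!r}")
    return folder / "segmentation" / f"{language}_{ratio}.ndjson"
```

For instance, dataset_path("cinpatent", "ja", 0.25) points at the 25% Japanese subset, and dataset_path("cinpatent", "en") points at the full English file.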

Contact

For further support, please contact us at [email protected].

Owner
Cinnamon