A small, memory-efficient crawler written in Rust.


Atra - The smaller way to crawl

!!This README will be reworked in a few days. I am currently working on a better version and a wiki for the config files.!!

Atra is a novel web crawling solution, implemented in Rust, designed with the primary goal of scraping websites as comprehensively as possible, while ensuring ease of use and accessibility.

Your first crawl

Download the precompiled executable (coming soon) and run the following command:

  • Windows: ./atra.exe single -s test_crawl -d 2 --absolute https://choosealicense.com
  • Linux: ./atra single -s test_crawl -d 2 --absolute https://choosealicense.com

You will then find a folder named atra_data in the same directory as the binary.

Crawling more

  1. Create a file named seeds.txt
    • Add a single URL per line
    • Put it in the same directory as the atra binary
  2. Call ./atra.exe --generate-example-config or ./atra --generate-example-config
    • Modify the values to meet your needs
    • Rename the generated files to atra.ini and crawl.yaml
  3. Call ./atra.exe multi --log-to-file file:seeds.txt or ./atra multi --log-to-file file:seeds.txt
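
As a rough sketch of step 1 on Linux, the seed file can be written straight from the shell. The second URL (https://example.com) is only a placeholder, and the snippet assumes it is run in the directory that holds the atra binary. The atra calls from steps 2 and 3 are repeated as comments so that the snippet also runs where the binary is not yet available.

```shell
# Write one seed URL per line into seeds.txt.
# https://example.com is a placeholder; replace it with your own seeds.
cat > seeds.txt <<'EOF'
https://choosealicense.com
https://example.com
EOF

# Sanity check: count the non-empty lines, i.e. the seeds the crawler will read.
seed_count=$(grep -c . seeds.txt)
echo "seeds: $seed_count"

# Steps 2 and 3 from the list above, unchanged:
# ./atra --generate-example-config
# ./atra multi --log-to-file file:seeds.txt
```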

How to build?

In order to build Atra you need Rust.

Windows

After installing Rust you need LLVM, with the proper environment paths set.

Linux

After installing Rust you need pkg-config, libssl-dev, clang, and llvm in order to compile Atra. You can also use the Docker container to build a binary.
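
Before starting a build, it can help to check which of these tools are already present. The following is a minimal preflight sketch, not part of Atra itself. It assumes that llvm-config is a usable marker for an LLVM installation and that the OpenSSL development files register themselves with pkg-config under the name openssl. Both can differ between distributions.

```shell
# Preflight check: report which of the build tools listed above are on PATH.
# Assumption: llvm-config stands in for "LLVM is installed".
checked=0
missing=""
for tool in cargo pkg-config clang llvm-config; do
  checked=$((checked + 1))
  if ! command -v "$tool" >/dev/null 2>&1; then
    missing="$missing $tool"
  fi
done

if [ -z "$missing" ]; then
  echo "all build tools found"
else
  echo "missing:$missing"
fi

# Assumption: libssl-dev provides a pkg-config module named openssl.
# Only query it when pkg-config itself is available.
if command -v pkg-config >/dev/null 2>&1 && ! pkg-config --exists openssl; then
  echo "OpenSSL development files not found"
fi
```

The script only reports what it finds and changes nothing on the system, so it is safe to run repeatedly while installing the missing packages.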

Because the binary is dynamically linked, you will need libc6, openssl, and ca-certificates installed on your system.

Why is the crawler named Atra?

The name Atra comes from Erigone atra, a dwarf spider with a body length of 1.8 mm to 2.8 mm. Not only do these spiders play a central role in natural pest control in agriculture (for example, against aphids), but they are also aerial spiders that can travel long distances by ballooning, also known as kiting.

More fun spider facts can be found on Wikipedia.

Owner
Felix Engl