Explore the WWW and find the shortest path between two HTML documents

Overview

explore

Find the shortest path between two web resources.

About

I created this project because one day I started to wonder: how many clicks does it take to get from one Wikipedia article to another?

This app takes two URLs and uses the BFS (breadth-first search) algorithm to solve the problem. The URLs don't have to belong to Wikipedia; the only requirement is that they refer to HTML documents, not media files. Each page is downloaded and the next-level URLs are extracted from it.

At the end of the process, the app prints the number of clicks (the depth of the search tree).
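
The search described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names `shortest_clicks`, `links_of`, and `toy_web` are hypothetical, and `links_of` stands in for "download the page and extract its URLs" so that the sketch runs without network access.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns the minimum number of clicks from `start` to `end`,
/// or `None` if `end` cannot be reached.
/// `links_of` is a hypothetical stand-in for fetching a page and
/// extracting the URLs it links to.
fn shortest_clicks<F>(start: &str, end: &str, mut links_of: F) -> Option<usize>
where
    F: FnMut(&str) -> Vec<String>,
{
    if start == end {
        return Some(0);
    }

    // URLs already queued, so each page is expanded at most once.
    let mut visited: HashSet<String> = HashSet::new();
    // FIFO queue of (url, depth): BFS expands pages level by level.
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();

    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((url, depth)) = queue.pop_front() {
        for next in links_of(&url) {
            if next == end {
                // The first time the target is seen is the shallowest level.
                return Some(depth + 1);
            }
            if visited.insert(next.clone()) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    None
}

/// A tiny in-memory "web" used only for illustration.
fn toy_web() -> HashMap<String, Vec<String>> {
    let mut web = HashMap::new();
    web.insert("a".to_string(), vec!["b".to_string(), "c".to_string()]);
    web.insert("b".to_string(), vec!["d".to_string()]);
    web.insert("c".to_string(), vec!["d".to_string()]);
    web.insert("d".to_string(), vec!["e".to_string()]);
    web
}
```

Because BFS visits all pages at depth n before any page at depth n + 1, the depth at which the end URL first appears is the minimum number of clicks.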

Build

git clone https://github.com/qwercik/explore
cd explore
cargo build --release

# To run the app, type:
./target/release/explore <run options...>

Usage

To use the app, pass two URLs as parameters. The first is the start URL and the second is the end URL.

explore 'https://en.wikipedia.org/wiki/Prolog' 'https://en.wikipedia.org/wiki/Poland'

You can also pass a regex with the --domain option to restrict which domains are visited (e.g. to avoid visiting sites other than Wikipedia).

explore 'https://en.wikipedia.org/wiki/Prolog' 'https://en.wikipedia.org/wiki/Poland' --domain '^en\.wikipedia\.org$'

If you would like to see which pages the app visits, use the --verbose option.

explore 'https://en.wikipedia.org/wiki/Prolog' 'https://en.wikipedia.org/wiki/Poland' --verbose

Unfortunately, at the moment the app doesn't store the URL tree, so it reports only the number of clicks.

To learn more about the supported options, use --help.

explore --help

To do

  • improve performance
  • provide an option for storing the URL tree
  • use threads to run some operations in parallel
  • create a distributed system for solving more complex instances
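
One possible approach to the "storing the URL tree" item is to record, for each discovered URL, the page on which it was first found, and then walk those records back from the end URL. The sketch below only illustrates that idea and is not taken from the project: `shortest_path` and `links_of` are hypothetical names, and `links_of` again stands in for downloading a page and extracting its links.

```rust
use std::collections::{HashMap, VecDeque};

/// Returns the sequence of URLs on a shortest path from `start` to `end`,
/// or `None` if `end` cannot be reached.
/// `links_of` is a hypothetical stand-in for fetching a page and
/// extracting the URLs it links to.
fn shortest_path<F>(start: &str, end: &str, mut links_of: F) -> Option<Vec<String>>
where
    F: FnMut(&str) -> Vec<String>,
{
    // parent[url] = page on which `url` was first discovered.
    // The map doubles as the visited set; the start URL has no parent.
    let mut parent: HashMap<String, Option<String>> = HashMap::new();
    parent.insert(start.to_string(), None);

    let mut queue: VecDeque<String> = VecDeque::from([start.to_string()]);

    while let Some(url) = queue.pop_front() {
        if url == end {
            // Walk the parent links back to the start, then reverse.
            let mut path = vec![url.clone()];
            let mut current = url;
            while let Some(Some(prev)) = parent.get(&current) {
                path.push(prev.clone());
                current = prev.clone();
            }
            path.reverse();
            return Some(path);
        }
        for next in links_of(&url) {
            if !parent.contains_key(&next) {
                parent.insert(next.clone(), Some(url.clone()));
                queue.push_back(next);
            }
        }
    }
    None
}
```

The number of clicks is then the length of the returned path minus one. The cost is one extra map entry per discovered URL.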
Owner
Eryk Andrzejewski
Computer science student and programming geek. Mostly interested in everything