lemmeknow

Overview

The fastest way to identify anything. Identify any mysterious text or analyze strings from a file: just ask lemmeknow!

lemmeknow can be used to identify mysterious text or to analyze hard-coded strings from captured network packets, malware samples, or just about anything. It can identify:

  • All URLs
  • Emails
  • Phone numbers
  • Credit card numbers
  • Cryptocurrency addresses
  • Social Security Numbers
  • and much more.

🧰 Usage

If you have the executable, just pass TEXT or /PATH/TO/FILE as an argument, e.g. lemmeknow secrets.pcap. It will determine whether the argument is a file or plain text and perform the analysis accordingly!

If you want the output in JSON, pass --json, e.g. lemmeknow UC11L3JDgDQMyH8iolKkVZ4w --json
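The file-or-text dispatch described above can be sketched as follows (a minimal illustration; `load_input` is a hypothetical helper, not the crate's actual code):

```rust
use std::fs;
use std::path::Path;

// Hypothetical helper mirroring the CLI's dispatch: if the argument
// names an existing file, read its bytes for analysis; otherwise
// treat the argument itself as the text to identify.
fn load_input(arg: &str) -> Vec<u8> {
    if Path::new(arg).is_file() {
        fs::read(arg).expect("failed to read file")
    } else {
        arg.as_bytes().to_vec()
    }
}

fn main() {
    // "UC11L3JDgDQMyH8iolKkVZ4w" is not a file, so it is treated as text.
    let input = load_input("UC11L3JDgDQMyH8iolKkVZ4w");
    println!("{} bytes to analyze", input.len());
}
```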

🔭 Installation

Download executable 📈

You can directly download the executable and run it. No installation needed.

  • Check releases here.

Using cargo 🦀

  • cargo install lemmeknow

Build it from source 🎯

Clone repository

  • git clone https://github.com/swanandx/lemmeknow && cd lemmeknow

then build and run

  • cargo run e.g. cargo run -- [OPTIONS]

OR

  • cargo build --release
  • cd target/release/
  • ./lemmeknow e.g. ./lemmeknow [OPTIONS]

🙀 API

Want to use this as a crate in your project, or make a web API for it? No worries! Just add an entry to your Cargo.toml:

[dependencies]
lemmeknow = "0.1.0"

OR

[dependencies]
lemmeknow = { git = "https://github.com/swanandx/lemmeknow" }

Refer to documentation for more info.
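As a rough sketch, calling the crate from Rust might look like the snippet below. It is based on the lemmeknow::what_is entry point mentioned in the issues further down; the exact function names and return types have changed between versions, so check the documentation for the version you pin:

```rust
// Illustrative only: the identification entry point and the shape of
// the returned matches differ across crate versions, so treat this as
// a sketch rather than a drop-in example.
fn main() {
    let matches = lemmeknow::what_is("UC11L3JDgDQMyH8iolKkVZ4w");
    for m in &matches {
        println!("{:?}", m);
    }
}
```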

🚧 Contributing

You can contribute by adding new regexes, improving existing regexes, improving code performance, or fixing minor bugs! Just open an issue or submit a PR.

Acknowledgement

This project is inspired by PyWhat! Thanks to its developer for the awesome idea <3 .

Comments
  • Unable to identify base64

Hi, I'm trying lemmeknow and I have to say it works quite well with links (e.g. YouTube channels, wallets and so on), but it fails to detect the easiest things. For example, it is unable to recognize base64 encoded text.

    Nice work though.
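    As an illustration of what a structural pre-check for base64 could look like (a hedged sketch, not lemmeknow's code; real detection would live in the regex database):

```rust
// Rough structural check for standard base64: alphabet characters
// only, at most two trailing '=' padding characters, and a total
// length that is a non-zero multiple of four.
fn looks_like_base64(s: &str) -> bool {
    let trimmed = s.trim_end_matches('=');
    let padding = s.len() - trimmed.len();
    s.len() >= 4
        && s.len() % 4 == 0
        && padding <= 2
        && trimmed
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '+' || c == '/')
}

fn main() {
    // base64 of "hello world"
    println!("{}", looks_like_base64("aGVsbG8gd29ybGQ=")); // prints "true"
}
```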

    opened by thelicato 3
  • Allow querying the online lemmeknow by URL

    When opening the lemmeknow webpage with a URL such as this: https://swanandx.github.io/lemmeknow-frontend/?q=search+term It should use this as input and try to figure out what the search term could be.

    Why?

    Browser Search engines.

    With this feature you could register lemmeknow as a search engine for your browser. You could (for example) use the alias lmk to search lemmeknow.

    Then typing lmk dQw4w9WgXcQ would lead you to find out what the ID stands for.

    opened by DrRuhe 1
  • Use bytes instead of strings, ditch fancy_regex for regex crate

    Currently lemmeknow uses the fancy_regex crate for matching regex. The problem is that it doesn't support bytes. The regex crate, however does: https://docs.rs/regex/1.0.0/regex/bytes/index.html

If there is no reason to use fancy_regex, then we should switch. Both pyWhat and lemmeknow only support ASCII strings; we need to support UTF-8, UTF-16, etc., and raw bytes.

    See this equivalent pyWhat issue: https://github.com/bee-san/pyWhat/issues/34

    opened by SkeletalDemise 1
  • Add some benchmarks!

This project uses the same regex database as PyWhat, so we need some performance benchmarks against it <3 !

    • For identifying single text
    • For analyzing strings from a file
• Calling the function multiple times through the API, i.e. lemmeknow::what_is("text here") for lemmeknow and Identifier.identify("text") for pyWhat.

API documentation is available here 😸

    documentation good first issue 
    opened by swanandx 1
  • Add support for filtering output

We have entries for Rarity and Tags in the database. We need to implement a filter so that users can filter based on rarity and/or tags.

    For example, lemmeknow --rarity 0.2:0.6 --tags Credentials TEXT.

Making it a module would be a nice idea 😄 src/output/filter.rs
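    The proposed rarity/tag filter might be sketched like this (the `Match` type and `keep` function are hypothetical, not the crate's actual types):

```rust
// Hypothetical sketch of the proposed filter: keep a match only if
// its rarity falls in the requested range and, when tags were
// requested, it carries at least one of them.
struct Match {
    rarity: f32,
    tags: Vec<String>,
}

fn keep(m: &Match, min: f32, max: f32, wanted: &[&str]) -> bool {
    m.rarity >= min
        && m.rarity <= max
        && (wanted.is_empty() || m.tags.iter().any(|t| wanted.contains(&t.as_str())))
}

fn main() {
    let m = Match { rarity: 0.4, tags: vec!["Credentials".to_string()] };
    // Equivalent to `--rarity 0.2:0.6 --tags Credentials`.
    println!("{}", keep(&m, 0.2, 0.6, &["Credentials"])); // prints "true"
}
```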

    enhancement 
    opened by swanandx 1
  • Add Nix package instructions

    Nix package is added in https://github.com/NixOS/nixpkgs/pull/194268 and I added myself as a maintainer in https://github.com/NixOS/nixpkgs/pull/196953.

    opened by Br1ght0ne 0
  • Show Exploits in cli output

We have an Exploit field for some identifications. It would be great to show it when the user passes the -v (verbose) flag on the CLI.

    {
        "Name": "Mailchimp API Key",
        ...
        "Description": null,
        "Exploit": "Use the command below to verify that the API key is valid (substitute <dc> for your datacenter, i. e. us5):\n  $ curl --request GET --url 'https://<dc>.api.mailchimp.com/3.0/' --user 'anystring:API_KEY_HERE' --include\n",
        "Rarity": 0.8,
        "URL": null,
        ...
    },
    
    opened by swanandx 0
  • need some tests to validate the regexes from JSON file

The regex.json file also has Examples for some regexes:

    {
        "Name": "Capture The Flag (CTF) Flag",
        "Regex": "(?i)^(flag\\{.*\\}|ctf\\{.*\\}|ctfa\\{.*\\})$",
        "plural_name": false,
        "Description": null,
        "Rarity": 1,
        "URL": null,
        "Tags": [
            "CTF Flag"
        ],
        "Examples": {
            "Valid": [
                "FLAG{hello}"
            ],
            "Invalid": []
        }
    },
    

We need to check that each regex matches its examples correctly in order to validate it! For that, we can create a file under tests, like lemmeknow/tests/validate_regexes.rs, and simply parse the JSON file and validate it there.

    opened by swanandx 0
  • Optimize release builds & add CI checks

Add some more fields to Cargo.toml to further optimize release builds. Reference: https://nnethercote.github.io/perf-book/build-configuration.html

    Also adds CI checks that run cargo check, cargo test, cargo fmt, and cargo clippy

    opened by SkeletalDemise 0
• Identifies JWT as LTC/Ripple/BCH Wallet Address

    Input: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

    Output:

        Found Possible Identifications :)

        | Matched text                 | Identified as                 | Description                                                                |
        |------------------------------|-------------------------------|----------------------------------------------------------------------------|
        | eyJhbGciOiJIUzI1..._adQssw5c | Litecoin (LTC) Wallet Address | URL: https://live.blockcypher.com/ltc/address/eyJhbGciOiJIUzI1..._adQssw5c |

    Expected "Identified as" would be JWT. Shortened the JWT for better visibility.


    Another example:

    Input: eyJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.JTvQIxZOL_-00JdKfTAEmhV-a6KUlB6OUWM8NuN7MN8

    Output: No Possible Identifications :(

    Expected "Identified as" would be JWT.
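    One way to disambiguate before the wallet regexes run is a structural JWT pre-check (a hedged sketch, not the project's fix): three non-empty dot-separated base64url segments, with the header starting with "eyJ" (the base64 encoding of `{"`).

```rust
// Structural sketch of a JWT pre-check: exactly three non-empty
// segments of base64url characters, header beginning with "eyJ".
fn looks_like_jwt(s: &str) -> bool {
    let parts: Vec<&str> = s.split('.').collect();
    parts.len() == 3
        && parts.iter().all(|p| {
            !p.is_empty()
                && p.chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
        })
        && s.starts_with("eyJ")
}

fn main() {
    let jwt = "eyJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.JTvQIxZOL_-00JdKfTAEmhV-a6KUlB6OUWM8NuN7MN8";
    println!("{}", looks_like_jwt(jwt)); // prints "true"
}
```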

    opened by tolik518 1
  • There are no unit tests

I am debugging whether my program is broken or LemmeKnow is broken; since there are no unit tests in LemmeKnow, I cannot prove it works. Please add unit tests for the API :)

    opened by bee-san 1
  • Use `&str` for tags

        /// Only include the Data which have at least one of the specified `tags`
        pub tags: Vec<String>,
        /// Only include Data which doesn't have any of the `excluded_tags`
        pub exclude_tags: Vec<String>,
    

String is a growable buffer; since we do not expect these to grow, we should use &str instead.
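    A sketch of the suggested change (hypothetical shape, assuming the tag values borrow from the embedded regex database and can therefore be 'static):

```rust
// With borrowed tags there is no per-filter heap allocation for the
// tag strings themselves; `&'static str` works when tags come from
// data embedded in the binary.
pub struct Filter {
    /// Only include the Data which have at least one of the specified `tags`
    pub tags: Vec<&'static str>,
    /// Only include Data which doesn't have any of the `excluded_tags`
    pub exclude_tags: Vec<&'static str>,
}

fn main() {
    let f = Filter {
        tags: vec!["Credentials"],
        exclude_tags: vec![],
    };
    println!("{} tag(s)", f.tags.len()); // prints "1 tag(s)"
}
```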

    opened by bee-san 0
  • calculate min and max length for regex, if any

A TryHackMe flag will be at least 3 characters long (as it must contain thm), and there is no upper limit.

A YouTube Video ID will be exactly 11 characters long (e.g. n92YrzELBJU).

    We need a list with this for every regex.

    • if there is no fixed length, put *
    • if min but no max, use 3-*
    • can only be 8 or 10, use 8/10
    • exact length 11, use 11

Many regexes identify text with a fixed size range; if we filter based on length first, we might optimize our algorithm. Suggestions for other ways to optimize are welcome.
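    The proposed length pre-filter could be sketched like this (hypothetical types; `None` stands for the `*` wildcard above):

```rust
// Sketch of the proposed length pre-filter: each pattern carries
// optional minimum and maximum lengths that are checked before its
// (more expensive) regex is run. `None` means unbounded, i.e. `*`.
struct LengthBounds {
    min: Option<usize>,
    max: Option<usize>,
}

fn length_ok(text: &str, b: &LengthBounds) -> bool {
    let n = text.len();
    b.min.map_or(true, |m| n >= m) && b.max.map_or(true, |m| n <= m)
}

fn main() {
    // YouTube Video ID: exactly 11 characters.
    let youtube = LengthBounds { min: Some(11), max: Some(11) };
    println!("{}", length_ok("n92YrzELBJU", &youtube)); // prints "true"
}
```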

    opened by swanandx 0
  • rewrite regex without using look-around

The following regexes won't compile because the regex crate doesn't support look-around.

    If possible, can we rewrite them in such a way that they don't use look-around?

    • [ ] Internet Protocol (IP) Address Version 6
    • [ ] Bitcoin (₿) Wallet Address
    • [ ] American Social Security Number
    • [ ] Date of Birth
    • [ ] JSON Web Token (JWT)
    • [ ] Amazon Web Services Access Key
    • [ ] Amazon Web Services Secret Access Key
    • [ ] YouTube Video ID

    Update them in src/data/regex.json

PS: You can run the following command in the repo if you want to see the exact syntax error of a regex (or just use the regex crate to compile them one by one):

    cargo test validate_regex_examples -- --show-output
    
    opened by swanandx 0
Releases: v0.7.0
Owner: Swanand Mulay