lemmeknow

Overview

The fastest way to identify anything. Identify any mysterious text or analyze strings from a file: just ask lemmeknow!

lemmeknow can be used to identify mysterious text or to analyze hard-coded strings from captured network packets, malware samples, or just about anything. It can identify:

  • All URLs
  • Emails
  • Phone numbers
  • Credit card numbers
  • Cryptocurrency addresses
  • Social Security Numbers
  • and much more.

🧰 Usage

If you have the executable, just pass TEXT or /PATH/TO/FILE as an argument, e.g. lemmeknow secrets.pcap. It will determine whether the argument is a file or plain text and perform the analysis accordingly!

If you want the output in JSON, pass --json, e.g. lemmeknow UC11L3JDgDQMyH8iolKkVZ4w --json
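The file-or-text dispatch described above can be sketched as follows (a minimal illustration; `load_input` is a hypothetical helper, not the crate's actual code):

```rust
use std::fs;
use std::path::Path;

// Hypothetical helper mirroring the CLI's dispatch: if the argument
// names an existing file, read its bytes for analysis; otherwise
// treat the argument itself as the text to identify.
fn load_input(arg: &str) -> Vec<u8> {
    if Path::new(arg).is_file() {
        fs::read(arg).expect("failed to read file")
    } else {
        arg.as_bytes().to_vec()
    }
}

fn main() {
    // "UC11L3JDgDQMyH8iolKkVZ4w" is not a file, so it is treated as text.
    let input = load_input("UC11L3JDgDQMyH8iolKkVZ4w");
    println!("{} bytes to analyze", input.len());
}
```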

🔭 Installation

Download executable 📈

You can directly download the executable and run it. No installation needed.

  • Check releases here.

Using cargo 🦀

  • cargo install lemmeknow

Build it from source 🎯

Clone repository

  • git clone https://github.com/swanandx/lemmeknow && cd lemmeknow

then build and run

  • cargo run e.g. cargo run -- [OPTIONS]

OR

  • cargo build --release
  • cd target/release/
  • ./lemmeknow e.g. ./lemmeknow [OPTIONS]

🙀 API

Want to use this as a crate in your project, or make a web API for it? No worries! Just add an entry to your Cargo.toml:

[dependencies]
lemmeknow = "0.1.0"

OR

[dependencies]
lemmeknow = { git = "https://github.com/swanandx/lemmeknow" }

Refer to documentation for more info.
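As a rough sketch, calling the crate from Rust might look like the snippet below. It is based on the lemmeknow::what_is entry point mentioned in the issues further down; the exact function names and return types have changed between versions, so check the documentation for the version you pin:

```rust
// Illustrative only: the identification entry point and the shape of
// the returned matches differ across crate versions, so treat this as
// a sketch rather than a drop-in example.
fn main() {
    let matches = lemmeknow::what_is("UC11L3JDgDQMyH8iolKkVZ4w");
    for m in &matches {
        println!("{:?}", m);
    }
}
```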

🚧 Contributing

You can contribute by adding new regexes, improving existing regexes, improving code performance, or fixing minor bugs! Just open an issue or submit a PR.

Acknowledgement

This project is inspired by PyWhat! Thanks to its developer for the awesome idea <3 .

Comments
  • Unable to identify base64

Hi, I'm trying lemmeknow and I have to say it works quite well with links (e.g. YouTube channels, wallets and so on), but it fails to detect the easiest things. For example, it is unable to recognize base64 encoded text.

    Nice work though.
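    As an illustration of what a structural pre-check for base64 could look like (a hedged sketch, not lemmeknow's code; real detection would live in the regex database):

```rust
// Rough structural check for standard base64: alphabet characters
// only, at most two trailing '=' padding characters, and a total
// length that is a non-zero multiple of four.
fn looks_like_base64(s: &str) -> bool {
    let trimmed = s.trim_end_matches('=');
    let padding = s.len() - trimmed.len();
    s.len() >= 4
        && s.len() % 4 == 0
        && padding <= 2
        && trimmed
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '+' || c == '/')
}

fn main() {
    // base64 of "hello world"
    println!("{}", looks_like_base64("aGVsbG8gd29ybGQ=")); // prints "true"
}
```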

    opened by thelicato 3
  • Allow querying the online lemmeknow by URL

    When opening the lemmeknow webpage with a URL such as this: https://swanandx.github.io/lemmeknow-frontend/?q=search+term It should use this as input and try to figure out what the search term could be.

    Why?

    Browser Search engines.

    With this feature you could register lemmeknow as a search engine for your browser. You could (for example) use the alias lmk to search lemmeknow.

    Then typing lmk dQw4w9WgXcQ would lead you to find out what the ID stands for.

    opened by DrRuhe 1
  • Use bytes instead of strings, ditch fancy_regex for regex crate

    Currently lemmeknow uses the fancy_regex crate for matching regex. The problem is that it doesn't support bytes. The regex crate, however does: https://docs.rs/regex/1.0.0/regex/bytes/index.html

If there is no reason to use fancy_regex, then we should switch. Both pyWhat and lemmeknow only support ASCII strings; we need to support UTF-8, UTF-16, etc., and raw bytes.

    See this equivalent pyWhat issue: https://github.com/bee-san/pyWhat/issues/34

    opened by SkeletalDemise 1
  • Add some benchmarks!

This project uses the same regex database as PyWhat, so we need some performance benchmarks against it <3 !

    • For identifying single text
    • For analyzing strings from a file
• Calling the function multiple times through the API, i.e. lemmeknow::what_is("text here") for lemmeknow and Identifier.identify("text") for pyWhat.

API documentation is available here 😸

    documentation good first issue 
    opened by swanandx 1
  • Add support for filtering output

We have entries for Rarity and Tags in the database. We need to implement a filter so that users can filter based on rarity and/or tags.

    For example, lemmeknow --rarity 0.2:0.6 --tags Credentials TEXT.

Making it a module would be a nice idea 😄 src/output/filter.rs
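    The proposed rarity/tag filter might be sketched like this (the `Match` type and `keep` function are hypothetical, not the crate's actual types):

```rust
// Hypothetical sketch of the proposed filter: keep a match only if
// its rarity falls in the requested range and, when tags were
// requested, it carries at least one of them.
struct Match {
    rarity: f32,
    tags: Vec<String>,
}

fn keep(m: &Match, min: f32, max: f32, wanted: &[&str]) -> bool {
    m.rarity >= min
        && m.rarity <= max
        && (wanted.is_empty() || m.tags.iter().any(|t| wanted.contains(&t.as_str())))
}

fn main() {
    let m = Match { rarity: 0.4, tags: vec!["Credentials".to_string()] };
    // Equivalent to `--rarity 0.2:0.6 --tags Credentials`.
    println!("{}", keep(&m, 0.2, 0.6, &["Credentials"])); // prints "true"
}
```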

    enhancement 
    opened by swanandx 1
  • Add Nix package instructions

    Nix package is added in https://github.com/NixOS/nixpkgs/pull/194268 and I added myself as a maintainer in https://github.com/NixOS/nixpkgs/pull/196953.

    opened by Br1ght0ne 0
  • Show Exploits in cli output

We have an Exploit field for some identifications. It would be great to show it when the user passes the -v (verbose) flag on the CLI.

    {
        "Name": "Mailchimp API Key",
        ...
        "Description": null,
        "Exploit": "Use the command below to verify that the API key is valid (substitute <dc> for your datacenter, i. e. us5):\n  $ curl --request GET --url 'https://<dc>.api.mailchimp.com/3.0/' --user 'anystring:API_KEY_HERE' --include\n",
        "Rarity": 0.8,
        "URL": null,
        ...
    },
    
    opened by swanandx 0
  • need some tests to validate the regexes from JSON file

The regex.json file also has Examples for some regexes:

    {
        "Name": "Capture The Flag (CTF) Flag",
        "Regex": "(?i)^(flag\\{.*\\}|ctf\\{.*\\}|ctfa\\{.*\\})$",
        "plural_name": false,
        "Description": null,
        "Rarity": 1,
        "URL": null,
        "Tags": [
            "CTF Flag"
        ],
        "Examples": {
            "Valid": [
                "FLAG{hello}"
            ],
            "Invalid": []
        }
    },
    

We need to check that each regex matches its examples correctly in order to validate it! For that, we can create a file under tests, like lemmeknow/tests/validate_regexes.rs, and simply parse the JSON file and validate it there.

    opened by swanandx 0
  • Optimize release builds & add CI checks

Add some more fields to Cargo.toml to further optimize release builds. Reference: https://nnethercote.github.io/perf-book/build-configuration.html

    Also adds CI checks that run cargo check, cargo test, cargo fmt, and cargo clippy

    opened by SkeletalDemise 0
• Identifies JWT as LTC/Ripple/BCH Wallet Address

    Input: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

    Output:

        Found Possible Identifications :)

        | Matched text                 | Identified as                 | Description                                                                |
        |------------------------------|-------------------------------|----------------------------------------------------------------------------|
        | eyJhbGciOiJIUzI1..._adQssw5c | Litecoin (LTC) Wallet Address | URL: https://live.blockcypher.com/ltc/address/eyJhbGciOiJIUzI1..._adQssw5c |

    Expected "Identified as" would be JWT. Shortened the JWT for better visibility.


    Another example:

    Input: eyJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.JTvQIxZOL_-00JdKfTAEmhV-a6KUlB6OUWM8NuN7MN8

    Output: No Possible Identifications :(

    Expected "Identified as" would be JWT.
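    One way to disambiguate before the wallet regexes run is a structural JWT pre-check (a hedged sketch, not the project's fix): three non-empty dot-separated base64url segments, with the header starting with "eyJ" (the base64 encoding of `{"`).

```rust
// Structural sketch of a JWT pre-check: exactly three non-empty
// segments of base64url characters, header beginning with "eyJ".
fn looks_like_jwt(s: &str) -> bool {
    let parts: Vec<&str> = s.split('.').collect();
    parts.len() == 3
        && parts.iter().all(|p| {
            !p.is_empty()
                && p.chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
        })
        && s.starts_with("eyJ")
}

fn main() {
    let jwt = "eyJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.JTvQIxZOL_-00JdKfTAEmhV-a6KUlB6OUWM8NuN7MN8";
    println!("{}", looks_like_jwt(jwt)); // prints "true"
}
```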

    opened by tolik518 1
  • There are no unit tests

I am debugging whether my program is broken or LemmeKnow is broken; since there are no unit tests in LemmeKnow, I cannot prove it works. Please add unit tests for the API :)

    opened by bee-san 1
  • Use `&str` for tags

        /// Only include the Data which have at least one of the specified `tags`
        pub tags: Vec<String>,
        /// Only include Data which doesn't have any of the `excluded_tags`
        pub exclude_tags: Vec<String>,
    

String is a growable buffer; since we do not expect these to grow, we should use &str instead.
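    A sketch of the suggested change (hypothetical shape, assuming the tag values borrow from the embedded regex database and can therefore be 'static):

```rust
// With borrowed tags there is no per-filter heap allocation for the
// tag strings themselves; `&'static str` works when tags come from
// data embedded in the binary.
pub struct Filter {
    /// Only include the Data which have at least one of the specified `tags`
    pub tags: Vec<&'static str>,
    /// Only include Data which doesn't have any of the `excluded_tags`
    pub exclude_tags: Vec<&'static str>,
}

fn main() {
    let f = Filter {
        tags: vec!["Credentials"],
        exclude_tags: vec![],
    };
    println!("{} tag(s)", f.tags.len()); // prints "1 tag(s)"
}
```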

    opened by bee-san 0
  • calculate min and max length for regex, if any

A TryHackMe flag will be at least 3 characters long (as it must contain thm), and there is no upper limit.

A YouTube Video ID will be exactly 11 characters long (e.g. n92YrzELBJU).

    We need a list with this for every regex.

    • if there is no fixed length, put *
    • if min but no max, use 3-*
    • can only be 8 or 10, use 8/10
    • exact length 11, use 11

Many regexes identify text with a fixed size range; if we filter based on length first, we might optimize our algorithm. Suggestions for other ways to optimize are welcome.
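    The proposed length pre-filter could be sketched like this (hypothetical types; `None` stands for the `*` wildcard above):

```rust
// Sketch of the proposed length pre-filter: each pattern carries
// optional minimum and maximum lengths that are checked before its
// (more expensive) regex is run. `None` means unbounded, i.e. `*`.
struct LengthBounds {
    min: Option<usize>,
    max: Option<usize>,
}

fn length_ok(text: &str, b: &LengthBounds) -> bool {
    let n = text.len();
    b.min.map_or(true, |m| n >= m) && b.max.map_or(true, |m| n <= m)
}

fn main() {
    // YouTube Video ID: exactly 11 characters.
    let youtube = LengthBounds { min: Some(11), max: Some(11) };
    println!("{}", length_ok("n92YrzELBJU", &youtube)); // prints "true"
}
```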

    opened by swanandx 0
  • rewrite regex without using look-around

The following regexes won't compile because the regex crate doesn't support look-around.

    If possible, can we rewrite them in such a way that they don't use look-around?

    • [ ] Internet Protocol (IP) Address Version 6
    • [ ] Bitcoin (₿) Wallet Address
    • [ ] American Social Security Number
    • [ ] Date of Birth
    • [ ] JSON Web Token (JWT)
    • [ ] Amazon Web Services Access Key
    • [ ] Amazon Web Services Secret Access Key
    • [ ] YouTube Video ID

    Update them in src/data/regex.json

PS: You can run the following command in the repo if you want to see the exact syntax error of a regex (or just use the regex crate to compile them one by one):

    cargo test validate_regex_examples -- --show-output
    
    opened by swanandx 0
Releases: v0.7.0
Owner: Swanand Mulay