A fast, searchable knowledge engine that uses various machine learning models to aggregate information by importance, association and relevance

NewsAggregator

We live in an era where both the demand for and the quantity of information are enormous. However, the way we store and access that information has remained unchanged: through keywords. With such a direct keyword-to-document association, it is hard to algorithmically determine the relationships between pieces of information.

We therefore want to introduce a knowledge engine that performs data inference within a given search. Wolfram Alpha does just this, but is limited to a mathematical context. We will branch beyond this into many subjects and topics, including news and knowledge databases.

Thus, we are building a knowledge engine that will utilise machine learning models to find and show information related to a query. Users will be able to search for information that is better associated, allowing them to learn and expand on a topic more efficiently.

Overview

Setting up the components of News Aggregator is surprisingly straightforward. We rely on Dockerized services to ensure ease of use, modularity, reliability and configurability. Each of the Redis, Elasticsearch, Frontend and Backend containers has portions of the repository, including configuration, shared into it via mounted volumes.

From the get-go, there aren’t any additional configuration properties to alter; it’s mostly a matter of ensuring the required tooling is installed.

Dependencies

To start up the services and work on them, there are a few bits and pieces to install. Off the bat, you will need the following:

Tool: Download

Git: This should already be installed, but if it isn’t, you can find installation instructions here:
  https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
Docker:
  Mac: https://docs.docker.com/docker-for-mac/install/
  Windows: https://docs.docker.com/docker-for-windows/install/
  Linux: https://docs.docker.com/engine/install/ubuntu/
Python 3.8: https://www.python.org/downloads/release/python-3810/
Node JS: https://nodejs.org/en/download/
An IDE and/or text editor, such as:
  * VSCode: https://code.visualstudio.com/
  * IntelliJ: https://www.jetbrains.com/idea/
  * Sublime Text: https://www.sublimetext.com/
  * Atom: https://atom.io/
  * PyCharm: https://www.jetbrains.com/pycharm/
  * etc.
Ansible [Optional]: We use Ansible for the deployment system. If you are not working on it, you can skip installing it. Otherwise, the installation page is here:
  https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html
Rust + Cargo (Rust CLI) [Optional]: If you are working on the Redis modules, you’ll need to install Rust and its Cargo CLI tool. You can do so with the following instructions:
  https://www.rust-lang.org/tools/install
  https://doc.rust-lang.org/cargo/getting-started/installation.html

Once you have those installed, we can move on to setting up the repo itself.
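
Before moving on, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch rather than part of the repository: the executable names in the two lists are assumptions about how each tool is exposed on your system (for example, python3 may be python on Windows), and missing_tools is a hypothetical helper name.

```python
import shutil

# Executable names are assumptions about how each tool appears on PATH;
# adjust them for your platform (e.g. "python" instead of "python3").
REQUIRED_TOOLS = ["git", "docker", "python3", "node", "npm"]
OPTIONAL_TOOLS = ["ansible", "cargo"]


def missing_tools(tools, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
        missing = missing_tools(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} tools: {status}")
```

The lookup function is passed in as a parameter only so the check can be exercised without touching the real PATH; in normal use the default shutil.which is enough.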

Repository Local setup

You’ll first want to clone down the repo onto your local machine by using the following command:

git clone https://github.com/EngineersBox/NewsAggregator.git

Frontend

If you are looking to work on the frontend portion of the repo, then you’ll want to head over to the frontend directory.

  1. Run npm i to install all the relevant npm packages
  2. In a separate terminal window/tab, run npm run serve to start the local development server
  3. Open up your IDE or text editor of choice in the frontend directory

Backend

Running the backend Flask API is also straightforward. You’ll need to be in the top-level NewsAggregator directory for this.

  1. Run python3 -m pip install -r requirements.txt to set up all the dependencies
  2. Install the spaCy transformer model with python3 -m spacy download en_core_web_trf
  3. Ensure you have execute permissions on the start script with sudo chmod +x ./run_server.sh
  4. Run the start script with ./run_server.sh
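
If the start script fails on import errors, one way to narrow it down is to check whether the packages from the install steps are visible to your interpreter. This is a small sketch under assumptions: is_installed is a hypothetical helper, and the names checked (flask, spacy, en_core_web_trf) are assumed import names inferred from the steps above, not a list taken from requirements.txt.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import names: the Flask API, spaCy, and the spaCy model,
    # which is installed as a regular Python package.
    for name in ("flask", "spacy", "en_core_web_trf"):
        print(f"{name}: {'ok' if is_installed(name) else 'not found'}")
```

Run it with the same python3 you used for pip install, since a different interpreter or virtual environment will see a different set of packages.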

Dockerized Services

Even when developing for Redis or Elasticsearch, we recommend running them in their Dockerized states. This removes the need to configure an environment for them to run in.

The same applies if you are developing the Dockerized platform itself or testing deployment scenarios. Note that the containers are deployed with the host network attached, allowing you to access the ports for services such as Elasticsearch directly, as if they were running in your regular local environment (e.g. localhost:9200).

All of the services can be run with predefined compose files. These can be found in the compose/* directories, with subdirectories for each component.

You can start any of these services as containers by running docker compose up -d or docker-compose up -d, depending on which form your version of Docker supports.
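
Because the containers share the host network, a plain TCP connection to localhost is enough to tell whether a service has come up. The sketch below is illustrative only: port_is_open is a hypothetical helper, port 9200 comes from the example above, and 6379 is Redis's default port, assumed here rather than confirmed from the compose files.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9200 is taken from the text; 6379 is an assumption (Redis default).
    for name, port in (("Elasticsearch", 9200), ("Redis", 6379)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port}: {state}")
```

An open port only shows that something is listening; it does not confirm that the service has finished initialising.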

Comments
  • UIUX 20 Configure OpenSearch for better ingestion

    • Configuration is updated to handle asynchronous interaction (threaded)
    • Node count is increased from 1 to 7
      • 1 Master
      • 1 Coordinator
      • 3 data/ingest
      • 1 Dashboard
    • Network configured to allow access via localhost:<PORT>
    enhancement elasticsearch 
    opened by EngineersBox 2
  • Redis 44 create docker deployment

    Changelog

    • Added Ansible workflow
    • Added Ansible playbook for deploying services
    • Fixed SSL issues
    • Merged in @RongXin code for rate limiter
    • Added missing tests for resolver in KeyedRateLimiter module
    bug documentation enhancement rate-limiter cuckoo-filter deployment 
    opened by EngineersBox 2
  • UIUX-73 Fix API docker configuration with autocomplete init

    ChangeLog

    • Fixed python dependencies
    • Fixed duration wrapper to modify response body rather than object itself
    • Improved gunicorn setup to be more verbose for debugging
    deployment API 
    opened by EngineersBox 1
  • UIUX-74 New box deployment for caddy and frontend

    Changelog

    • Added new ansible inventory, now two exist as inventory/old.ini and inventory/prod.ini
    • Added new GH workflow input to specify which inventory to use (passed to -i with ansible-playbook command)
    • Dockerized caddy deployment with gateway passthrough for API and frontend access
    • Dockerized frontend via npm's http-server lib to serve on port 3002
    • Refactored playbook to target new frontend build instead of old
    • Refactored user pass provide to switch based on workflow target
    • Refactored mentions of elasticsearch to opensearch
    • Added missing containers for each opensearch node
    • Fixed dockerfile contexts
    • Refactored backend dockerfile to have stricter copy-in for reduced size
    enhancement deployment 
    opened by EngineersBox 1
  • Draft:UIUX-53 add route link

    • Added URL Links to be reflective of the search type and the query
    • Make the search input reflect the changes made to the URL query
    • Slight modification of Res.js file to respond to the URL parameters rather than passed values
    frontend 
    opened by WeiXinFam 1
Owner
EngineersBox