AskBend: SQL-based Knowledge Base Search and Completion using Databend

AskBend is a Rust project that utilizes the power of Databend and OpenAI to create a SQL-based knowledge base from Markdown files.

Databend is a cloud-native data warehouse that can store embeddings and perform vector computations on them, which makes it suitable for this use case.

Databend Cloud seamlessly integrates with OpenAI's capabilities, such as embedding generation, cosine distance calculation, and text completion. This integration means you don't need to interact with OpenAI directly; Databend Cloud manages everything.

The project automatically generates document embeddings from the content, so users can search with SQL and retrieve the information most relevant to their queries.

SQL-based means you don't need any OpenAI API knowledge: on the Databend Cloud platform, you perform these tasks in SQL. The SQL AI functions this project relies on include ai_embedding_vector, cosine_distance, and ai_text_completion.

Overview

The project follows this general process:

  1. Read and parse Markdown files from a directory.
  2. Extract the content and store it in the askbend.doc table.
  3. Compute embeddings for the content using Databend Cloud's built-in AI capabilities, including OpenAI's embedding generation, all through SQL.
  4. When a user queries, generate the query embedding using Databend Cloud's SQL-based ai_embedding_vector function.
  5. Perform a vector calculation to find the most relevant doc.content using Databend Cloud's SQL-based cosine_distance function.
  6. Concatenate the retrieved content and use OpenAI's completion capabilities with Databend Cloud's SQL-based ai_text_completion function.
  7. Output the completion result in Markdown format.
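
Steps 4 and 5 amount to a nearest-neighbour lookup over embeddings. The following minimal sketch reproduces that ranking locally in plain Python, assuming cosine distance is defined as 1 minus cosine similarity. The names `cosine_distance` and `top_k` and the toy vectors are hypothetical and only for illustration; AskBend itself delegates this work to Databend Cloud's SQL functions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; a smaller value means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_embedding, docs, k=3):
    """Rank (content, embedding) pairs by distance to the query; keep the k closest contents."""
    ranked = sorted(docs, key=lambda d: cosine_distance(query_embedding, d[1]))
    return [content for content, _ in ranked[:k]]


# Toy 2-dimensional embeddings (real embeddings have many more dimensions).
docs = [
    ("COPY INTO loads data from a stage", [1.0, 0.0]),
    ("CREATE TABLE defines a schema", [0.0, 1.0]),
    ("Stages hold files before loading", [0.8, 0.6]),
]

# Step 6 would pass this concatenated context to the completion function.
context = "\n".join(top_k([1.0, 0.1], docs, k=2))
print(context)
```

The `k` parameter plays the same role as the `top` setting in the configuration file.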

Setup

1. Clone the repository

git clone https://github.com/datafuselabs/askbend
cd askbend

2. Build the project

make setup
make build

3. Create a database and table in your Databend Cloud

CREATE DATABASE askbend;
USE askbend;

CREATE TABLE doc (path VARCHAR, content VARCHAR, embedding ARRAY(FLOAT32));
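
To show how this table is meant to be filled and searched, the sketch below builds the kind of SQL statements the process describes. It is an illustration under assumptions, not AskBend's actual queries: the helper names are hypothetical, the exact argument forms of `ai_embedding_vector` and `cosine_distance` are assumed, and the `length(embedding) = 0` filter assumes rows inserted without an embedding hold an empty array.

```python
def sql_quote(value):
    """Quote a string as a SQL literal by doubling embedded single quotes.

    Assumption: standard SQL quote doubling is enough here. A real client
    would normally rely on its driver's parameter binding instead.
    """
    return "'" + value.replace("'", "''") + "'"


def insert_doc_sql(path, content):
    """Build an INSERT for one Markdown section (embedding is filled in later)."""
    return (
        "INSERT INTO askbend.doc (path, content) "
        f"VALUES ({sql_quote(path)}, {sql_quote(content)})"
    )


def build_embeddings_sql():
    """Build an UPDATE that computes embeddings for rows that lack one (assumed filter)."""
    return (
        "UPDATE askbend.doc SET embedding = ai_embedding_vector(content) "
        "WHERE length(embedding) = 0"
    )


def search_sql(question, top=3):
    """Build a SELECT that returns the `top` sections closest to the question."""
    return (
        "SELECT content, "
        f"cosine_distance(embedding, ai_embedding_vector({sql_quote(question)})) AS distance "
        f"FROM askbend.doc ORDER BY distance ASC LIMIT {int(top)}"
    )


print(insert_doc_sql("data/it's.md", "COPY INTO loads data."))
print(build_embeddings_sql())
print(search_sql("tell me how to do copy"))
```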

4. Modify the configuration file conf/askbend.toml

# Usage:
# askbend -c askbend.toml

[data]
# Path to the directory containing your markdown documents
path = "data/"

[database]
database = "askbend"
table = "doc"
# Data source name (DSN) for connecting to your Databend cloud warehouse
# https://docs.databend.com/using-databend-cloud/warehouses/connecting-a-warehouse
dsn = "databend://<sql-user>:<sql-password>@<your-databend-cloud-warehouse>/default"

[server]
host = "0.0.0.0"
port = 8081

[query]
top = 3
prompt = '''
<your prompt> ... 
Documentation sections:
{{context}}

Question:
{{query}}
'''
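
The `{{context}}` and `{{query}}` placeholders in the prompt are filled in at query time. A minimal sketch of that substitution, assuming the retrieved sections are joined with newlines (the joining rule and the `render_prompt` helper are hypothetical, not taken from AskBend's source):

```python
import re

PROMPT_TEMPLATE = """<your prompt> ...
Documentation sections:
{{context}}

Question:
{{query}}
"""


def render_prompt(template, sections, query):
    """Replace {{context}} and {{query}} in a single pass.

    A single pass means placeholder-like text inside the retrieved
    sections or the question is never substituted a second time.
    """
    values = {"context": "\n".join(sections), "query": query}
    return re.sub(r"\{\{(context|query)\}\}", lambda m: values[m.group(1)], template)


prompt = render_prompt(
    PROMPT_TEMPLATE,
    ["COPY INTO loads data from a stage.", "Stages hold files before loading."],
    "tell me how to do copy",
)
print(prompt)
```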

5. Prepare your Markdown files by copying them to the data/ directory

6. Parse the Markdown files and build embeddings

./target/release/askbend -c conf/askbend.toml --rebuild

[2023-04-01T07:17:13Z INFO ] Step-1: begin parser all markdown files
[2023-04-01T07:17:14Z INFO ] Step-1: finish parser all markdown files:397, sections:969, tokens:117758
[2023-04-01T07:17:14Z INFO ] Step-2: begin insert to table
[2023-04-01T07:17:14Z INFO ] Step-2: finish insert to table
[2023-04-01T07:17:14Z INFO ] Step-3: begin generate embedding, may take some minutes
[2023-04-01T07:26:03Z INFO ] Step-3: finish generate embedding
... ...

The --rebuild flag rebuilds all the embeddings for the data directory. This process may take a few minutes, depending on the number of Markdown files.

7. Start the API server

./target/release/askbend -c conf/askbend.toml

8. Query your Markdown knowledge base using the API

curl -X POST -H "Content-Type: application/json" -d '{"query": "tell me how to do copy"}' http://localhost:8081/query

Response:

{"result":["\n\nYou can use the `COPY INTO <table>` command to copy data from an internal stage, Amazon S3 bucket, or a remote file into a table in Databend. \n\nFor example, to copy data from an internal stage, you can use the following command:\n\n```\nCOPY INTO <table>\nFROM (\n    SELECT <columns>\n    FROM @<stage>\n    FILE_FORMAT = (TYPE = PARQUET)\n)\n```\n\nFor more information, please refer to the [Tutorial: Load from an internal stage](../../12-load-data/00-stage.md) and [Tutorial: Load from an Amazon S3 bucket](../../12-load-data/01-s3.md) sections in the Databend documentation."]}
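
The same request can be sent from a script. The sketch below uses only the Python standard library and assumes the server is running on localhost:8081 as configured above; the function names `build_query_request` and `ask` are hypothetical.

```python
import json
import urllib.request


def build_query_request(question, base_url="http://localhost:8081"):
    """Build the POST request for the /query endpoint without sending it."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8081", timeout=60):
    """Send the question to a running AskBend server and return the decoded JSON."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started as in step 7, `ask("tell me how to do copy")` would return the decoded JSON object.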

AskBend Query API

This API document describes how to use the AskBend query API to submit queries and receive results.

Endpoint

http://<host>:8081/query

Request

The request body should be a JSON object containing a single field query, which is the query string.

Example:

{
    "query": "whats the fast way to load data to databend"
}

Response

On successful query execution, the API will return a 200 OK status code, along with a JSON object containing the field result.

The result field is an array of strings. Only the first string needs to be considered: when the query succeeds, the API treats the first item in the array as the most relevant answer and the final result.
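
A client can apply that rule when decoding the response body. A minimal sketch, with a hypothetical `first_answer` helper; raising an error on an empty array is a choice made for this example, not documented API behaviour:

```python
import json


def first_answer(response_body):
    """Return the first string of the `result` array, stripped of surrounding whitespace."""
    payload = json.loads(response_body)
    results = payload.get("result") or []
    if not results:
        raise ValueError("response has no 'result' entries")
    return results[0].strip()


body = '{"result": ["\\n\\nYou can use the `COPY INTO <table>` command."]}'
print(first_answer(body))
```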
