Serverless search for AWS.

Overview

Pathery 🔥 Serverless Search 🔥


Pathery is a serverless search service built on AWS using Rust, CDK and Tantivy.

🔔 WARNING: This is currently a work in progress and not ready for production usage.

Features

  • 🔥 Fast full-text search - Pathery is built in Rust to limit its AWS Lambda cold start overhead.
  • 🥰 Simple REST API - Pathery exposes a simple REST API to make search as easy as possible.
  • 👍 Easy to install - Pathery ships as a CDK component, making it easy to get started.
  • 💵 Usage based - Pathery has no long-running servers, so you only pay for what you use.
  • 🔼 Built for AWS - Pathery leans on AWS managed services to limit its maintenance burden and maximize its scalability.

Getting Started

Check out the getting started guide to deploy Pathery into your AWS account using CDK.

You might also like...
Rust client for AWS Infinidash service.

AWS Infinidash - Fully featured Rust client Fully featured AWS Infinidash client for Rust applications. You can use the AWS Infinidash client to make

Rusoto is an AWS SDK for Rust

Rusoto is an AWS SDK for Rust You may be looking for: An overview of Rusoto AWS services supported by Rusoto API documentation Getting help with Rusot

Easy switch between AWS Profiles and Regions

AWSP - CLI To Manage your AWS Profiles! AWSP provides an interactive terminal to interact with your AWS Profiles. The aim of this project is to make i

Simple fake AWS Cognito User Pool API server for development.

Fakey Cognito 🏑 Homepage Simple fake AWS Cognito API server for development. ✅ Implemented features AdminXxx on User Pools API. Get Started # run wit

Postgres proxy which allows tools that don't natively support IAM auth to connect to AWS RDS instances.

rds-iamauth-proxy rds-proxy lets you make use of IAM-based authentication to AWS RDS instances from tools that don't natively support that method of a

A tool to run web applications on AWS Lambda without changing code.

AWS Lambda Adapter A tool to run web applications on AWS Lambda without changing code. How does it work? AWS Lambda Adapter supports AWS Lambda functi

📦 🚀 a smooth-talking smuggler of Rust HTTP functions into AWS lambda

lando 🚧 maintenance mode ahead 🚧 As of this announcement AWS now officially supports Rust through this project. As mentioned below this project's goal

cargo-lambda a Cargo subcommand to help you work with AWS Lambda

cargo-lambda cargo-lambda is a Cargo subcommand to help you work with AWS Lambda. This subcommand compiles AWS Lambda functions natively and produces

cargo-lambda is a Cargo subcommand to help you work with AWS Lambda.

cargo-lambda cargo-lambda is a Cargo subcommand to help you work with AWS Lambda. The new subcommand creates a basic Rust package from a well defined

Comments
  • Consider using DynamoDB or S3 for original document storage

    When Tantivy indexes documents, it can optionally store the original text as well. The stored text is used to generate snippets, i.e. highlighted excerpts of the matching text. Tantivy stores these documents on the filesystem in a compressed format. This works fairly well on local storage, but on a networked file system such as EFS, the whole store for a segment needs to be pulled in order to find a given document.

    .store files tend to be considerably larger than the rest of the segment:

    -rw-rw-r--  1 1001 1001  48K Nov 28 22:19 d908e24f73f04e83b85e679acf1d361b.7388140.del
    -rw-rw-r--  1 1001 1001   99 Nov 28 21:44 d908e24f73f04e83b85e679acf1d361b.fast
    -rw-rw-r--  1 1001 1001 2.7M Nov 28 21:44 d908e24f73f04e83b85e679acf1d361b.fieldnorm
    -rw-rw-r--  1 1001 1001  30M Nov 28 21:44 d908e24f73f04e83b85e679acf1d361b.idx
    -rw-rw-r--  1 1001 1001  17M Nov 28 21:44 d908e24f73f04e83b85e679acf1d361b.pos
    -rw-rw-r--  1 1001 1001  91M Nov 28 21:44 d908e24f73f04e83b85e679acf1d361b.store
    -rw-rw-r--  1 1001 1001 5.8M Nov 28 21:44 d908e24f73f04e83b85e679acf1d361b.term
    

    Looking at all the .store files in the test index, it is clear that pulling .store files to look up specific documents by id would be incredibly inefficient once network latency is taken into account:

    -rw-rw-r--  1 1001 1001 149M Nov 28 22:14 03578055b76b45bd961cf3931a0282d9.store
    -rw-rw-r--  1 1001 1001 176M Nov 28 23:29 103552f789714d07a2dff9f7143e001c.store
    -rw-rw-r--  1 1001 1001 162M Nov 29 00:06 1bfb3f7ef08e40b4bd166919c0786769.store
    -rw-rw-r--  1 1001 1001 9.9M Nov 29 00:38 87c0dd2f0577477ba233ee6a1c57c948.store
    -rw-rw-r--  1 1001 1001  66M Nov 29 00:16 92f9621c4f3e41a6938ad65d2c37e969.store
    -rw-rw-r--  1 1001 1001  51M Nov 29 00:32 b208026b097445c4a8eef6ea7dc6754e.store
    -rw-rw-r--  1 1001 1001  91M Nov 28 21:44 d908e24f73f04e83b85e679acf1d361b.store
    -rw-rw-r--  1 1001 1001  58M Nov 29 00:24 dd957117dbf3437ca3cdb552b38cc8c4.store
    -rw-rw-r--  1 1001 1001  38M Nov 29 00:37 ea6445abef914faeb767686e1c054987.store
    

    Instead of using Tantivy's built-in storage, we could use S3 or DynamoDB to store original documents so that they can be retrieved efficiently by id. A beneficial side effect is that this should also be cheaper, since both DynamoDB and S3 have a lower monthly storage cost than EFS.

    enhancement 
    opened by tvanhens 1
  • Query fails sporadically when indexing due to deleted files from merged segments

    It appears that segment merging is the culprit. Merging deletes the segments that were merged, but readers may still hold an out-of-date meta.json that refers to those deleted segments.

    bug 
    opened by tvanhens 1
  • feature: batching and batch index endpoint

    Adds a batch indexing endpoint POST /index/{index_id}/batch to upload batches of documents.

    Batches are serialized to S3 and a message pointing to the S3 object is enqueued to SQS. This allows for large batches without running into SQS payload limitations.

    Closes #2

    opened by tvanhens 0
  • Allow all tantivy schema fields to be used by pathery indexes

    Tantivy schemas include functions for transforming JSON data as well as self-serialization. This means Pathery's custom code for both can be removed, which should also improve coverage of field types.

    • [x] dates
    • [x] integers
    • [ ] floats
    enhancement 
    opened by tvanhens 0
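To put numbers on the first comment above ("Consider using DynamoDB or S3 for original document storage"), the Rust sketch below totals the file sizes from its two listings. The sizes are copied from the `ls -lh` output, which rounds and uses 1024-based units, so the results are approximate; the constant and function names are illustrative only.

```rust
/// Sizes in MiB of the files in segment d908e24f73f04e83b85e679acf1d361b,
/// copied from the `ls -lh` listing (rounded, 1024-based units).
const SEGMENT_FILES_MIB: [(&str, f64); 7] = [
    ("del", 48.0 / 1024.0),             // 48K
    ("fast", 99.0 / (1024.0 * 1024.0)), // 99 bytes
    ("fieldnorm", 2.7),
    ("idx", 30.0),
    ("pos", 17.0),
    ("store", 91.0),
    ("term", 5.8),
];

/// Sizes in MiB of every .store file in the test index.
const STORE_FILES_MIB: [f64; 9] = [149.0, 176.0, 162.0, 9.9, 66.0, 51.0, 91.0, 58.0, 38.0];

/// Fraction of a segment's bytes that live in its .store file.
fn store_fraction(files: &[(&str, f64)]) -> f64 {
    let total: f64 = files.iter().map(|(_, size)| size).sum();
    let store: f64 = files
        .iter()
        .filter(|(ext, _)| *ext == "store")
        .map(|(_, size)| size)
        .sum();
    store / total
}

/// Total MiB a reader would have to pull if it fetched every .store file.
fn total_store_mib(stores: &[f64]) -> f64 {
    stores.iter().sum()
}
```

With these inputs, the .store file accounts for roughly 62% of segment d908e24f73f04e83b85e679acf1d361b, and the nine .store files add up to about 801 MiB.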
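A minimal sketch of that same comment's proposal, assuming a hypothetical `DocumentStore` trait: the search index would keep only each document's id, and the original text would be fetched by id from S3 or DynamoDB. An in-memory `HashMap` stands in for either service here; `DocumentStore`, `InMemoryStore` and `hydrate_hits` are illustrative names, not Pathery or Tantivy APIs.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over an external document store. In the proposal
/// this would be backed by S3 (one object per document) or DynamoDB (one item
/// per document); here an in-memory map stands in for either.
trait DocumentStore {
    fn put(&mut self, id: &str, body: String);
    fn get(&self, id: &str) -> Option<&str>;
}

#[derive(Default)]
struct InMemoryStore {
    docs: HashMap<String, String>,
}

impl DocumentStore for InMemoryStore {
    fn put(&mut self, id: &str, body: String) {
        self.docs.insert(id.to_string(), body);
    }

    fn get(&self, id: &str) -> Option<&str> {
        self.docs.get(id).map(String::as_str)
    }
}

/// Given the ids returned by the search index (which would store only the id
/// field, not the full text), fetch just those documents from the store.
/// Missing documents come back as `None`.
fn hydrate_hits<'a, S: DocumentStore>(
    store: &'a S,
    hit_ids: &[&str],
) -> Vec<(String, Option<&'a str>)> {
    hit_ids
        .iter()
        .map(|id| (id.to_string(), store.get(id)))
        .collect()
}
```

The design choice is that a query pays one point read per returned hit instead of pulling whole .store files over EFS, so the cost scales with the number of hits rather than with segment size.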
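For the second comment ("Query fails sporadically when indexing due to deleted files from merged segments"), one possible mitigation is to defer deleting merged segment files for a grace period instead of removing them as soon as the merge commits. The sketch below assumes, hypothetically, that every reader reloads meta.json at least once per grace period; `MetaVersion` and `deletable_segments` are illustrative names, and this is not necessarily how Pathery resolved the issue.

```rust
use std::collections::HashSet;
use std::time::{Duration, SystemTime};

/// One published version of the index metadata (the segments a meta.json lists).
struct MetaVersion {
    published_at: SystemTime,
    segments: HashSet<String>,
}

/// Returns the segment ids whose files are safe to delete at `now`.
///
/// Assumption: every reader reloads meta.json at least once per `grace`, so a
/// reader can only hold the latest version, or one that was superseded less
/// than `grace` ago.
fn deletable_segments(
    history: &[MetaVersion], // ordered oldest to newest
    on_disk: &HashSet<String>,
    now: SystemTime,
    grace: Duration,
) -> HashSet<String> {
    let mut still_referenced: HashSet<&String> = HashSet::new();
    for (i, meta) in history.iter().enumerate() {
        // A version may still be in use if it is the latest one, or if the
        // version that replaced it was published within the grace period.
        let in_use = match history.get(i + 1) {
            None => true,
            Some(next) => now
                .duration_since(next.published_at)
                .map_or(true, |age| age < grace),
        };
        if in_use {
            still_referenced.extend(meta.segments.iter());
        }
    }
    on_disk
        .iter()
        .filter(|segment| !still_referenced.contains(segment))
        .cloned()
        .collect()
}
```

A segment is only reported as deletable once every meta version that lists it has been superseded for longer than the grace period, so a reader holding a slightly stale meta.json can still open the files it refers to.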
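The third comment (the batch index endpoint) describes storing each batch in S3 and enqueuing only a pointer to it in SQS, a design often called the claim-check pattern. The sketch below models that flow with hypothetical in-memory stand-ins (`ObjectStore` for S3, `Queue` for SQS); the key layout `batches/{index_id}/{batch_id}` and the newline-delimited format are assumptions, and each document is assumed to be a non-empty, single-line JSON string.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory stand-in for an S3 bucket.
#[derive(Default)]
struct ObjectStore {
    objects: HashMap<String, Vec<u8>>,
}

/// Hypothetical in-memory stand-in for an SQS queue.
#[derive(Default)]
struct Queue {
    messages: VecDeque<String>,
}

/// Producer side (for example, the handler behind POST /index/{index_id}/batch):
/// write the whole batch to the object store, then enqueue only its key.
fn enqueue_batch(
    store: &mut ObjectStore,
    queue: &mut Queue,
    index_id: &str,
    batch_id: u64,
    docs: &[String],
) {
    let key = format!("batches/{index_id}/{batch_id}");
    // Newline-delimited documents; this serialization format is an assumption.
    let body = docs.join("\n").into_bytes();
    store.objects.insert(key.clone(), body);
    // The message is just the key, so its size does not grow with the batch.
    queue.messages.push_back(key);
}

/// Consumer side (the indexing worker): pop a pointer, then fetch the batch it
/// refers to. A real worker would also delete the object once it is indexed.
fn next_batch(store: &ObjectStore, queue: &mut Queue) -> Option<Vec<String>> {
    let key = queue.messages.pop_front()?;
    let body = store.objects.get(&key)?;
    let text = std::str::from_utf8(body).ok()?;
    Some(text.lines().map(str::to_string).collect())
}
```

Because the queued message is only an object key, its size stays constant no matter how many documents the batch holds, which is what keeps large batches within SQS's message-size limit.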
Owner
Tyler van Hensbergen
Ref Arch: Serverless GraphQL in Rust on AWS

A Whole Hog Reference Architecture for an Apollo Federation-Ready, Serverless, Rust-Based GraphQL Microservice on AWS using Cloud Development Kit (CDK)

Michael Edelman 3 Jan 12, 2022
Aws-sdk-rust - AWS SDK for the Rust Programming Language

The AWS SDK for Rust This repo contains the new AWS SDK for Rust (the SDK) and its public roadmap. Please Note: The SDK is currently released as a dev

Amazon Web Services - Labs 2k Jan 3, 2023
Rs.aws-login - A command line utility to simplify logging into AWS services.

aws-login A command line utility to simplify logging into AWS accounts and services. $ aws-login use ? Please select a profile to use: › ❯ dev-read

Kevin Herrera 11 Oct 30, 2022
This repo is a sample video search app using AWS services.

Video Search This repo is a sample video search app using AWS services. You can check the demo on this link. Features Transcribing Video and generate

AWS Samples 8 Jan 5, 2023
Continuous Delivery for Declarative Kubernetes, Serverless and Infrastructure Applications

Continuous Delivery for Declarative Kubernetes, Serverless and Infrastructure Applications Explore PipeCD docs » Overview PipeCD provides a unified co

PipeCD 650 Dec 29, 2022
Examples of how to use Rust with Serverless Framework, Lambda, API Gateway v1 and v2, SQS, GraphQL, etc

Rust Serverless Examples All examples live in their own directories: project: there is nothing here, just a simple cargo new project_name with a custo

Fernando Daciuk 9 Dec 17, 2022
🚫📆 Serverless calendar built with shuttle.rs

zerocal Welcome to zerocal, the serverless calendar. It allows you to create calendar invites from the convenience of your terminal! Here's m

Matthias 150 Jan 2, 2023
Serverless setup for activity pub (using lambda+dynamodb) in Rust

Serverless ActivityPub About This is an experiment to have free/cheaper activitypub instances running on AWS (making use of free tiers as much as poss

Conrad Ludgate 3 Dec 30, 2022
Monorepo for fnRPC (high performance serverless rpc framework)

fnrpc Monorepo for fnRPC (high performance serverless rpc framework) cli Cli tool help build and manage functions Create RPC functions Create & Manage

Faasly 3 Dec 21, 2022
Remote Secret Editor for AWS Secrets Manager

Barberousse - Remote Secrets Editor. A project aimed to avoid downloading sec

Mohamed Zenadi 18 Sep 28, 2021