
Mithril Security - BlindAI

Website | LinkedIn | Blog | Twitter | Documentation | Discord

Fast, accessible and privacy friendly AI deployment πŸš€ πŸ”’

BlindAI is a fast, easy-to-use and confidential inference server that lets you deploy your models on sensitive data. Thanks to its end-to-end protection guarantees, data owners can send private data to be analyzed by AI models without fear of exposing it to anyone else.

We reconcile AI and privacy by leveraging Confidential Computing for secure inference. You can learn more about this technology here.

We currently support only Intel SGX, but we plan to cover AMD SEV and AWS Nitro Enclaves in the future. More information about our roadmap will be provided soon.

Our solution comes in two parts:

  • A secure inference solution to serve AI models with privacy guarantees.
  • A client SDK to securely consume the remote AI models.

Getting started

To deploy a model on sensitive data, with end-to-end protection, we provide a Docker image to serve models with confidentiality, and a client SDK to consume this service securely.

Note

Because the server requires specific hardware (currently Intel SGX), we also provide a simulation mode, so that any computer can serve models with our solution. In simulation mode, however, the two key properties of secure enclaves, confidentiality of data in use and code attestation, are not available. It is therefore only meant for testing on your local machine and provides no real guarantees for production.

Our first article, Deploy Transformers with confidentiality, covers deployment in both simulation and hardware modes.

A - Deploying the server

Deploy the inference server, for instance using one of our Docker images. To get started quickly, you can use the simulation image, which does not require any specific hardware.

docker run -p 50051:50051 -p 50052:50052 mithrilsecuritysas/blindai-server-sim

B - Sending data from the client to the server

Our client SDK is rather simple, but a lot happens behind the scenes. When talking to a real enclave (simulation=False), the client verifies that we are indeed talking to an enclave with the right security properties, such as the expected code loaded inside the enclave and up-to-date security patches. Once those checks pass, data or models can be uploaded safely, with end-to-end protection through a TLS tunnel that terminates inside the enclave. Thanks to the enclave's data-in-use protection and the verification of its code, nothing sent remotely is exposed to any third party.

You can learn more about the attestation mechanism for code integrity here.
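
As a reference for hardware mode, connecting to a real enclave only changes the arguments passed to connect_server: instead of simulation=True, you provide the attestation policy and the enclave host's TLS certificate. Here is a minimal sketch, using the policy.toml and host_server.pem file names from the project's examples:

from blindai.client import BlindAiClient

client = BlindAiClient()

# Hardware mode: the client verifies the enclave against the expected policy
# and uses the host's certificate to establish the attested TLS tunnel.
client.connect_server(
    addr="localhost",
    policy="policy.toml",
    certificate="host_server.pem"
)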

i - Upload the model

We now need to load a model inside the secure inference server. First we export our model from PyTorch to ONNX, then we upload it securely to the inference server. Uploading the model through our API keeps the model confidential, for instance when deploying it on foreign infrastructure such as the Cloud or a client's on-premise servers.

from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
from blindai.client import BlindAiClient, ModelDatumType

# Get pretrained model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Create dummy input for export
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
sentence = "I love AI and privacy!"
inputs = tokenizer(sentence, padding="max_length", max_length=8, return_tensors="pt")["input_ids"]

# Export the model
torch.onnx.export(
    model, inputs, "./distilbert-base-uncased.onnx",
    export_params=True, opset_version=11,
    input_names=['input'], output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'},
                  'output': {0: 'batch_size'}})

# Launch client
client = BlindAiClient()
client.connect_server(addr="localhost", simulation=True)
client.upload_model(model="./distilbert-base-uncased.onnx", shape=inputs.shape, dtype=ModelDatumType.I64)

ii - Send data and run model

Upload the data securely to the inference server.

from transformers import DistilBertTokenizer
from blindai.client import BlindAiClient

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

sentence = "I love AI and privacy!"
inputs = tokenizer(sentence, padding="max_length", max_length=8)["input_ids"]

# Load the client
client = BlindAiClient()
client.connect_server("localhost", simulation=True)

# Get prediction
response = client.run_model(inputs)
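
The returned response object carries the model output. Below is a minimal sketch of reading a prediction from it, assuming the output is exposed as a flat list of scores under an output field; the exact attribute name may differ depending on your client version, so check the client documentation.

# Hypothetical attribute: adapt response.output to the field exposed by your
# version of the client SDK.
scores = list(response.output)
predicted_class = scores.index(max(scores))
print("Predicted class:", predicted_class)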

What you can do with BlindAI

  • Easily deploy state-of-the-art models with confidentiality. Run models ranging from BERT for text and ResNet for images to WaveNet for audio.
  • Provide guarantees to third parties, for instance clients or regulators, that you are indeed protecting their data, through code attestation.
  • Explore different scenarios, from confidential speech-to-text and biometric identification to secure document analysis, with our pool of examples.

What you cannot do with BlindAI

  • Our solution aims to be modular, but we have yet to incorporate tools for generic pre/post-processing. Specific pipelines can be covered, but they will require additional manual work for now.
  • We do not cover training and federated learning yet; if this feature interests you, let us know through the roadmap or our Discord channel.
  • The examples we provide are simple and do not take into account complex mechanisms such as secure storage of confidential data with sealing keys, an advanced scheduler for inference requests, or complex key management scenarios. If your use case involves more than what we show, do not hesitate to contact us for more information.

Install

A - Server

Our inference server can easily be deployed through our Docker images. You can pull them from our Docker repository or build them yourself.

B - Client

We advise you to install our client SDK inside a virtual environment. You can simply install the client using pip:

pip install blindai

You can find more details regarding the installation in our documentation here.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Disclaimer

BlindAI is still under development and is provided as is; use it at your own risk.

Comments
  • Export proofs, python tests and some refactor

    Export proofs, python tests and some refactor

    Description

    I am sorry this PR is big :sweat_smile:

    • refactor Policy to a proper class
    • add typings on a lot of functions and classes, add AttestationError error
    • fix python imports (the init hack we used caused conflicts in downstream code)
    • moved generated protobuf files to client/blindai/pb instead of directly in client/blindai
    • add a protobuf file, which contains the file format for proof files
    • change RunModelResponse & UploadModelResponse classes to proper clean classes
    • add SignedResponse which is inherited by RunModelResponse and UploadModelResponse
    • new apis on SignedResponse: is_signed, save_to_file, as_bytes, load_from_file, load_from_bytes, validate (see #37)
    • change some methods and fields from BlindaiClient to private
    • add unittests to the python client for send_model and run_model (they mock the server responses)
    • add unittests to python client for the newly added proof export API surface

    Future work

    • CI integration for these new unittests (which are separate from the e2e tests)
    • create unittests for dcap_attestation.py (verify_dcap_attestation, verify_claims)

    Related Issue

    Closes #37

    Type of change

    • [x] This change requires a documentation update
    • [x] This change affects the client
    • [ ] This change affects the server
    • [ ] This change affects the API
    • [ ] This change only concerns the documentation

    How Has This Been Tested?

    Added unit tests for the new api surface

    Checklist:

    • [x] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [x] My changes generate no new warnings
    • [ ] I have updated the documentation according to my changes
    • [x] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes
    opened by cchudant 8
  • feat/client: configurable ports

    feat/client: configurable ports

    Description

    Configurable port numbers on the client side.

    Related Issue

    None.

    Type of change

    • [X] This change requires a documentation update
    • [X] This change affects the client
    • [X] This change affects the API (The client interface, not the proto)

    How Has This Been Tested?

    Tested locally with default and custom port numbers.

    Checklist:

    • [X] My code follows the style guidelines of this project
    • [X] I have performed a self-review of my code
    • [X] I have commented my code, particularly in hard-to-understand areas
    • [X] My changes generate no new warnings
    • [X] I have updated the documentation according to my changes (The changelog file)
    Info : Client :snake: Type : New Feature :heavy_plus_sign: 
    opened by CerineBnsd 7
  • [Question] How to use multiple inputs for model?

    [Question] How to use multiple inputs for model?

    Hi,

    How can I upload a model with multiple inputs? The distilbert example does not use multiple inputs, but they are quite common with pre-trained models. What should I pass to dtype and shape in this case?

    Thanks.

    opened by liebkne 5
  • server: Optimize docker image sizes

    server: Optimize docker image sizes

    Description

    On dockerhub, the images are very big:

    software => 852.76 MB
    hardware => 929MB
    hardware-dcsv3 => 853.85 MB
    

    This is the compressed size, meaning the actual sizes are even bigger.

    Result of docker images <image> --format "{{.Size}}" currently (uncompressed size)

    software => 2.96GB
    hardware => 3.17GB
    hardware-dcsv3 => 2.98GB
    

    This pull request changes the uncompressed sizes to:

    software => 281MB
    hardware => 532MB (still big, as it contains nodejs)
    hardware-dcsv3 => 286MB
    

    I don't have the numbers for the compressed sizes

    How is that possible?

    Docker works on an overlay filesystem. This means that every time we use an instruction such as RUN during the build, it creates a new filesystem layer. The final image is just every layer overlapped on one another. So if we install a temporary dependency in a RUN command, we have to uninstall it in the same RUN command, otherwise it will still contribute to the image size even after being uninstalled.

    The way this new Dockerfile works is by creating separate images for building the app and running it. Build images are quite big since they have all the build dependencies, while run images are kept as slim as possible and optimized for size.

    Docs: Developer environment

    This PR introduces a base-build stage/image that has almost everything you need for developing on the BlindAI server. This is a good opportunity to document in the docs how to create a proper dev environment for the server, using Docker and VS Code.

    Something like

    DOCKER_BUILDKIT=1 docker build \
      -f ./server/docker/build.dockerfile \
      -t blindai-dev-env \
      --target base-build \
      ./server

    docker run -it \
      --name blindai-dev-env \
      --volume $(pwd):/blindai \
      blindai-dev-env
    #  add --device /dev/sgx/enclave --device /dev/sgx/provision for hardware mode
    

    What do you think? Where in the docs would that fit?

    Related Issue

    None

    Type of change

    • [ ] This change requires a documentation update
    • [ ] This change affects the client
    • [x] This change affects the server
    • [ ] This change affects the API
    • [ ] This change only concerns the documentation

    How Has This Been Tested?

    This has been tested in software mode on my machine. The images compile fine in hardware and hardware-dcsv3 mode, but I will need to test on actual machines to make sure I did not break anything (CI doesn't check hardware and hardware-dcsv3 yet)

    This PR is marked as draft until I do these tests.

    Checklist:

    • [x] My code follows the style guidelines of this project
    • [x] I have performed a self-review of my code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [ ] My changes generate no new warnings
    • [ ] I have updated the documentation according to my changes
    Info : Build :building_construction: 
    opened by cchudant 5
  • Connection Error: client.connect_server(..) - Hardware mode - CovidNet Example

    Connection Error: client.connect_server(..) - Hardware mode - CovidNet Example

    Hi, I encountered some problems while running your framework in hardware mode with the CovidNet example that you provide. Can you think of anything I forgot?

    Description

    I have an error that I cannot explain during the step of connecting the client to the server.

    Blindai Versions

    • BlindAI Client : "0.2.0"
    • BlindAI Server : "0.2.2"

    Additional Information

    • Ubuntu
    • Version: 20.04.1
    • Package Manager Version: pip 22.0.4
    • Language version : Python 3.8.10
    • Kernel : 5.13.0-40-generic

    Screenshots:

    [Screenshot: screen capture from 2022-04-27 16-32-42]

    Type : Bug :lady_beetle: 
    opened by Alexis-CAPON 4
  • Merge hardware and software in notebook examples

    Merge hardware and software in notebook examples

    Description

    Merge hardware and software in notebook examples

    Something like

    client = BlindAiClient()
    
    # Comment this line for hardware mode
    client.connect_server(addr="localhost", simulation=True)
    
    # Comment this line for simulation mode
    client.connect_server(
        addr="localhost",
        policy="policy.toml",
        certificate="host_server.pem"
    )
    

    Why is this modification needed?

    Make the notebooks clearer, and less redundant

    What documents need to be updated

    • [ ] Main README
    • [ ] CONTRIBUTING guidelines
    • [ ] BlindAI Client README
    • [ ] BlindAI Server README
    • [ ] BlindAI Client CHANGELOG
    • [ ] BlindAI Server CHANGELOG
    • [ ] Python Docstrings
    • [x] others => python examples

    Additional Information

    None

    Checklist

    • [x] This issue concerns BlindAI Client
    • [ ] This issue concerns BlindAI Server
    Type : Documentation :memo: Type : Testing :test_tube: 
    opened by cchudant 4
  • Client telemetry

    Client telemetry

    Description

    Here is the info it collects:

    For every event,

    • uid is a unique id created by taking the sha256 of the hostname and username of the machine (see the sketch after this list)
    • platform, composed of os, release, version, arch. Example:
      • os: Linux/Windows
      • release: 5.13.19-2-MANJARO on my machine
      • version: SMP PREEMPT Sun Sep 19 21:31:53 UTC 2021 on my machine (kernel/windows version)
      • arch: x86_64
    • the time
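
    A minimal, hypothetical sketch of how such an anonymous id could be derived (an illustration of the description above, not necessarily the exact code in this PR):

    import hashlib
    import socket
    import getpass

    # Illustrative only: hash the hostname and username together so that only a
    # stable anonymous digest, never a raw identifier, leaves the machine.
    def anonymous_uid() -> str:
        raw = f"{socket.gethostname()}-{getpass.getuser()}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()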

    Event list:

    class ConnectEvent(Event):
        simulation: bool
        policy_allow_debug: Optional[bool]
        server_sgx_enabled: bool
        server_platform: str
        server_version: str
    class SendModelEvent(Event):
        simulation: bool
        policy_allow_debug: Optional[bool]
        server_sgx_enabled: bool
        server_platform: str
        server_version: str
        sign: bool
    class RunModelEvent(Event):
        simulation: bool
        policy_allow_debug: Optional[bool]
        server_sgx_enabled: bool
        server_platform: str
        server_version: str
        sign: bool
    

    server_platform and server_version are sent by the server to the client using the GetServerInfo call.

    This builds upon #48, so I expect conflicts if #48 gets new commits.

    Questions

    • Should we remove the server-side telemetry?
    • Are the infos I collect here enough / too much?
    • I am making a call to Amplitude every time the client connects/send_model/run_model
      • this makes me worry about performance, since we don't batch events; we send them one at a time on every request
      • also, connect/send_model/run_model has to block until the Amplitude request is fully sent; nothing is done in the background and it's fully synchronous
      • maybe there is a better way => an alternative is to send more info to the server when doing a request (like the client uid for example, client version, client platform..)
        • this may mean we don't have to do client-side telemetry, which imo is a better idea? what do you think
    • I then need to update the documentation (docs / readme / everywhere we talk about telemetry)

    These questions will need to get answered before I can make more progress :) @dhuynh95 @JoFrost

    Related Issue

    Closes #46

    Type of change

    • [x] This change requires a documentation update
    • [x] This change affects the client
    • [x] This change affects the server
    • [ ] This change affects the API
    • [ ] This change only concerns the documentation

    How Has This Been Tested?

    Tested locally and using the amplitude dashboard, and it works :)

    Checklist:

    • [ ] My code follows the style guidelines of this project
    • [ ] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] My changes generate no new warnings
    • [ ] I have updated the documentation according to my changes