Xaynet is an agnostic federated machine learning framework for building privacy-preserving AI applications.

XayNet

Last update: Dec 22, 2022

Xaynet: Train on the Edge with Federated Learning

Want a framework that supports federated learning on the edge, in desktop browsers, integrates well with mobile apps, is performant, and preserves privacy? Welcome to XayNet, written entirely in Rust!

Making federated learning easy for developers

Frameworks for machine learning, including those expressly for federated learning, exist already. These frameworks typically target cross-silo use cases, for example collaborative learning across a limited number of hospitals, or across multiple banks working on a common use case without the need to share valuable and sensitive data.

This repository focusses on masked cross-device federated learning to enable the orchestration of machine learning across millions of low-power edge devices, such as smartphones or even cars. By doing this, we hope to increase the pace and scope of adoption of federated learning in practice and, especially, to protect end user data. All data remains on private local premises; only encrypted AI models are automatically and asynchronously aggregated. Thus, we provide a solution to the AI privacy dilemma and bridge the often-existing gap between privacy and convenience. Imagine, for example, a voice assistant that learns new words directly on the device and shares this knowledge with all other instances, without recording and collecting your voice input centrally. Or think about a search engine that learns to personalise search results without collecting your often sensitive search queries centrally. There are thousands of such use cases that still trade privacy for convenience today. We think this shouldn’t be the case and we want to provide an alternative to overcome this dilemma.

Concretely, we provide developers with:

App dev tools: An SDK to integrate federated learning into apps written in Dart or other languages of choice for mobile development, as well as frameworks like Flutter.
Privacy via cross-device federated learning: Train your AI models locally on edge devices such as mobile phones, browsers, or even in cars. Federated learning automatically aggregates the local models into a global model. Thus, all insights inherent in the local models are captured, while the user data stays private on end devices.
Security and privacy via homomorphic encryption: Aggregate models with the highest security and trust. Xayn’s masking protocol encrypts all models homomorphically. This enables you to aggregate encrypted local models into a global one without having to decrypt the local models at all. This protects private and even the most sensitive data.
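To make the idea of aggregating masked models more concrete, here is a minimal sketch of additive masking over integers modulo 2^64. It is a toy illustration, not Xayn's actual masking protocol: the function names `mask`, `aggregate` and `unmask` are made up for this example, and real masks would be derived from cryptographic seeds rather than passed around in the clear.

```rust
/// Hide a local model by adding a mask element-wise (modulo 2^64).
fn mask(model: &[u64], mask_values: &[u64]) -> Vec<u64> {
    model
        .iter()
        .zip(mask_values)
        .map(|(m, k)| m.wrapping_add(*k))
        .collect()
}

/// Sum vectors element-wise (modulo 2^64). Works on masked models
/// and on masks alike, because the masking is additive.
fn aggregate(vectors: &[Vec<u64>]) -> Vec<u64> {
    let len = vectors.first().map_or(0, Vec::len);
    let mut acc = vec![0u64; len];
    for vector in vectors {
        for (a, x) in acc.iter_mut().zip(vector) {
            *a = a.wrapping_add(*x);
        }
    }
    acc
}

/// Remove the summed mask from the summed masked models.
fn unmask(masked_sum: &[u64], mask_sum: &[u64]) -> Vec<u64> {
    masked_sum
        .iter()
        .zip(mask_sum)
        .map(|(s, k)| s.wrapping_sub(*k))
        .collect()
}
```

The aggregator only ever sees masked vectors; subtracting the sum of the masks from the sum of the masked models yields the sum of the plain models without any single model being revealed.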

The case for writing this framework in Rust

Our framework for federated learning is not only a framework for machine learning as such. Rather, it supports the federation of machine learning that takes place on possibly heterogeneous devices and where use cases involve many such devices.

The programming language in which this framework is written should therefore give us strong support for the following:

Runs "everywhere": the language should not require its own runtime and code should compile on a wide range of devices.
Memory and concurrency safety: code that compiles should be both memory safe and free of data races.
Secure communication: state of the art cryptography should be available in vetted implementations.
Asynchronous communication: abstractions for asynchronous communication should exist that make federated learning scale.
Fast and functional: the language should offer functional abstractions but also compile code into fast executables.

Rust is one of the very few choices of modern programming languages that meets these requirements:

its concepts of Ownership and Borrowing make it both memory and thread-safe (hence avoiding many common concurrency issues).
it has a strong and static type discipline and traits, which describe shareable functionality of a type.
it is a modern systems programming language, with some functional style features such as pattern matching, closures and iterators.
its idiomatic code compares favourably to idiomatic C in performance.
it compiles to WASM and can therefore be applied natively in browser settings.
it is widely deployable and doesn't necessarily depend on a runtime, unlike languages such as Java, which need a virtual machine to run their code. Foreign Function Interfaces support calls from other languages/frameworks, including Dart, Python and Flutter.
it compiles via LLVM, and so it can draw from the abundant tool suites for LLVM.
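As a small illustration of the ownership point in the list above, the following sketch sums several chunks of numbers on separate threads using only the standard library. The function name `sum_in_parallel` is invented for this example and is not part of Xaynet. Each chunk is moved into its worker thread, so the compiler rules out any shared mutable access to it.

```rust
use std::sync::mpsc;
use std::thread;

/// Sum every chunk on its own thread and add up the partial results.
fn sum_in_parallel(chunks: Vec<Vec<i64>>) -> i64 {
    let (tx, rx) = mpsc::channel();
    let workers: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let tx = tx.clone();
            // `move` transfers ownership of `chunk` and `tx` to the thread.
            thread::spawn(move || {
                let partial: i64 = chunk.iter().sum();
                tx.send(partial).expect("receiver outlives the workers");
            })
        })
        .collect();
    // Drop the original sender so the receiver stops once all workers are done.
    drop(tx);
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
    rx.iter().sum()
}
```

Trying to use `chunk` after it has been moved into the closure would be rejected at compile time, which is the kind of guarantee the list refers to.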

Getting Started

Minimum supported rust version

rustc 1.48.0

Running the platform

There are a few different ways to run the backend: via Docker, by deploying it to a Kubernetes cluster, or by compiling the code and running the binary manually.

Everything described below assumes your shell's working directory to be the root of the repository.
The following instructions assume you have pre-existing knowledge on some of the referenced software (like docker and docker-compose) and/or a working setup (if you decide to compile the Rust code and run the binary manually).
In case you need help with setting up your system accordingly, we recommend you refer to the official documentation of each tool, as supporting them here would be beyond the scope of this project:
- Rust
- Docker and Docker Compose
- Kubernetes

Note:

With Xaynet v0.11 the coordinator needs a connection to a redis instance in order to save its state.

Don't connect the coordinator to a Redis instance that is used in production!

We recommend connecting the coordinator to its own Redis instance. We have invested a lot of time to make sure that the coordinator only deletes its own data but in the current state of development, we cannot guarantee that this will always be the case.

Using Docker

The convenience of using the Docker setup is that there's no need to set up a working Rust environment on your system, as everything is done inside the container.

Run an image from Docker Hub

Docker images of the latest releases are provided on Docker Hub.

You can try them out with the default configs/docker-dev.toml by running:

Xaynet below v0.11

docker run -v ${PWD}/configs/docker-dev.toml:/app/config.toml -p 8081:8081 xaynetwork/xaynet:v0.10.0 /app/coordinator -c /app/config.toml

Xaynet v0.11+

# don't forget to adjust the Redis url in configs/docker-dev.toml
docker run -v ${PWD}/configs/docker-dev.toml:/app/config.toml -p 8081:8081 xaynetwork/xaynet:v0.11.0

The docker image contains a release build of the coordinator without optional features.

Run a coordinator with additional infrastructure

Start the coordinator by pointing to the docker/docker-compose.yml file. It spins up all infrastructure that is essential to run the coordinator with default or optional features. Keep in mind that this file is used for development only.

docker-compose -f docker/docker-compose.yml up --build

Create a release build

If you would like, you can create an optimized release build of the coordinator, but keep in mind that the compilation will be slower.

docker build --build-arg RELEASE_BUILD=1 -f ./docker/Dockerfile .

Build a coordinator with optional features

Optional features can be specified via the build argument COORDINATOR_FEATURES.

docker build --build-arg COORDINATOR_FEATURES=tls,metrics -f ./docker/Dockerfile .

Using Kubernetes

To deploy an instance of the coordinator to your Kubernetes cluster, use the manifests that are located inside the k8s/coordinator folder. The manifests rely on kustomize to be generated (kustomize is officially supported by kubectl since v1.14). We recommend you thoroughly go through the manifests and adjust them according to your own setup (namespace, ingress, etc.).

Remember to also check (and adjust if necessary) the default configuration for the coordinator, available at k8s/coordinator/development/config.toml.

Please adjust the domain used in the k8s/coordinator/development/ingress.yaml file so it matches your needs (you can also skip ingress altogether, just make sure you remove its reference from k8s/coordinator/development/kustomization.yaml).

Keep in mind that the ingress configuration that is shown on k8s/coordinator/development/ingress.yaml relies on resources that aren't available in this repository, due to their sensitive nature (TLS key and certificate, for instance).

To verify the generated manifests, run:

kubectl kustomize k8s/coordinator/development

To apply them:

kubectl apply -k k8s/coordinator/development

In case you are not exposing your coordinator via ingress, you can still reach it using a port-forward. The example below creates a port-forward at port 8081 assuming the coordinator pod is still using the app=coordinator label:

kubectl port-forward $(kubectl get pods -l "app=coordinator" -o jsonpath="{.items[0].metadata.name}") 8081

Building the project manually

The coordinator without optional features can be built and started with:

cd rust
cargo run --bin coordinator -- -c ../configs/config.toml

Running the example

The example can be found under rust/examples/. It uses a dummy model but is network-capable, so it's a good starting point for checking connectivity with the coordinator.

`test-drive`

Make sure you have a running instance of the coordinator and that the clients you will spawn with the command below are able to reach it through the network.

Here is an example on how to start 20 participants that will connect to a coordinator running on 127.0.0.1:8081:

cd rust
RUST_LOG=info cargo run --example test-drive -- -n 20 -u http://127.0.0.1:8081

For more in-depth details on how to run examples, see the accompanying Getting Started guide under rust/xaynet-server/src/examples.rs.
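If the participants cannot reach the coordinator, a quick TCP-level check can help to rule out basic networking problems before looking at the example itself. The sketch below uses only the Rust standard library; the helper name `is_reachable` is hypothetical and not part of Xaynet, and the check only confirms that something accepts connections on the given address, not that it is a coordinator.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Return true if a TCP connection to `addr` (e.g. "127.0.0.1:8081")
/// can be opened within `timeout`.
fn is_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(socket_addr) => TcpStream::connect_timeout(&socket_addr, timeout).is_ok(),
        // Not an "ip:port" pair; hostnames are not resolved in this sketch.
        Err(_) => false,
    }
}
```

For the setup described above, one would call `is_reachable("127.0.0.1:8081", Duration::from_secs(2))` from the machine that runs the participants.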

Troubleshooting

If you have any difficulties running the project, please reach out to us by opening an issue and describing your setup and the problems you're facing.

Comments

Integrate metrics sender into state machine

The second part of the Implement data collection to InfluxDB task.

Integration of the MetricsSender into the state machine.

My first idea was to put the metrics sender into the CoordinatorState so that we don't have to add another field to the PhaseState. However, I didn't like it because the metrics sender has nothing to do with the coordinator state. Therefore, I ended up putting the metrics sender into the PhaseState.

In my opinion the same applies to the EventPublisher. It may make sense to decouple the EventPublisher from the CoordinatorState in a separate PR. We might have two structs: a state struct and an io struct. Both are fields of a coordinator struct which is then used in the state machine.

pub struct State {
    /// The credentials of the coordinator.
    pub keys: EncryptKeyPair,
    /// Internal ID used to identify a round
    pub round_id: u64,
    /// The round parameters.
    pub round_params: RoundParameters,
    /// The minimum of required sum/sum2 messages.
    pub min_sum_count: usize,
    /// The minimum of required update messages.
    pub min_update_count: usize,
    /// The minimum time (in seconds) reserved for processing sum/sum2 messages.
    pub min_sum_time: u64,
    /// The minimum time (in seconds) reserved for processing update messages.
    pub min_update_time: u64,
    /// The maximum time (in seconds) permitted for processing sum/sum2 messages.
    pub max_sum_time: u64,
    /// The maximum time (in seconds) permitted for processing update messages.
    pub max_update_time: u64,
    /// The number of expected participants.
    pub expected_participants: usize,
    /// The masking configuration.
    pub mask_config: MaskConfig,
    /// The size of the model.
    pub model_size: usize,
}

pub struct IO<R> {
    /// The request receiver half.
    pub request_rx: RequestReceiver<R>,
    /// The event publisher.
    pub events: EventPublisher,
    /// The metrics sender half.
    pub metrics_tx: MetricsSender, 
}

pub struct Coordinator<R> {
    pub state: State,
    pub io: IO<R>
}


pub struct PhaseState<R, S> {
    /// The inner state.
    pub(in crate::state_machine) inner: S,
    /// The Coordinator state.
    pub(in crate::state_machine) coordinator: Coordinator<R>,
}

Future improvements

My original plan was to have something like a log macro, e.g. metric!(round_id, 1), which you can add anywhere in the code without having to carry a metrics sender around. However, I have no clue how to do that. I know that the metrics-rs crate does just that, but it is at a very early stage of development. It could be possible to do this with #[proc_macro_hack].
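One possible direction, sketched here under strong assumptions, is a declarative macro that records into a process-wide buffer. Everything in this block is hypothetical: the `METRICS` static, the exact shape of `metric!` and the `drain_metrics` helper are illustrations only, and a real implementation would forward the values to the metrics sender instead of a `Vec`. It also relies on `Mutex::new` being usable in a `static`, which needs a newer compiler than the minimum supported version listed above.

```rust
use std::sync::Mutex;

/// Process-wide buffer standing in for the real metrics sender (assumption).
static METRICS: Mutex<Vec<(&'static str, u64)>> = Mutex::new(Vec::new());

/// Record a metric from anywhere, without passing a sender around.
macro_rules! metric {
    ($name:ident, $value:expr) => {
        METRICS
            .lock()
            .unwrap()
            .push((stringify!($name), $value as u64))
    };
}

/// Take all metrics recorded so far, leaving the buffer empty.
fn drain_metrics() -> Vec<(&'static str, u64)> {
    std::mem::take(&mut *METRICS.lock().unwrap())
}
```

A background task could call `drain_metrics` periodically and send the batch to InfluxDB, which keeps the call sites as short as `metric!(round_id, 1);`.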

opened by Robert-Steiner 14

wrap libsodium

We'd like to implement some traits on libsodium types, but this is prevented by the orphan rule. To work around this, this commit introduces newtypes around the libsodium crypto primitives that we use in our codebase.
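For readers unfamiliar with the pattern, here is a minimal sketch of the newtype workaround. It uses `Vec<u8>` as a stand-in for a type from another crate, since writing `impl fmt::Display for Vec<u8>` directly is rejected by the orphan rule. The `PublicKey` wrapper below is an illustration, not the actual wrapper from this commit.

```rust
use std::fmt;

/// Newtype around a foreign type. Because `PublicKey` is defined in
/// our own crate, we are free to implement foreign traits on it.
pub struct PublicKey(pub Vec<u8>);

impl fmt::Display for PublicKey {
    /// Print the key bytes as lowercase hex.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

impl From<Vec<u8>> for PublicKey {
    fn from(bytes: Vec<u8>) -> Self {
        PublicKey(bytes)
    }
}
```

The cost of the pattern is that methods of the inner type have to be re-exposed explicitly (or reached through `.0`).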

opened by little-dude 11
Rust coordinator grpc
Adds a GRPC server using the generated GRPC services. Incorporates #98, the issues there should be resolved

The server does:

Client/Server authentication using TLS

signed certificates for localhost are included for testing, do not use in production

password for the CA certificate is 123456

Accepts both coordinator requests and ndarray requests.

A test client is also included
opened by skade 11

Aggregator crashes after connecting to coordinator (macos, cargo)

Running a coordinator with

cargo run --bin coordinator -- -c ../configs/dev-coordinator.toml

followed by an aggregator with

cargo run --bin aggregator -- -c ../configs/dev-aggregator.toml

leads to a crash on macOS due to some python import error:

X-MBP-010:rust jan.petsche$ cargo run --bin aggregator -- -c ../configs/dev-aggregator.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/aggregator -c ../configs/dev-aggregator.toml`
[2020-03-17T13:54:03Z TRACE mio::poll] registering with poller
[2020-03-17T13:54:03Z TRACE mio::poll] registering with poller
[2020-03-17T13:54:03Z TRACE mio::sys::unix::kqueue] registering; token=Token(1); interests=Readable | Writable | Error | Hup
[2020-03-17T13:54:03Z TRACE mio::sys::unix::kqueue] registering; token=Token(0); interests=Readable | Writable | Error | Hup
[2020-03-17T13:54:03Z INFO  stubborn_io::tokio::io] Initial connection succeeded.
[2020-03-17T13:54:03Z TRACE tokio_util::codec::framed_write] flushing framed transport
[2020-03-17T13:54:03Z TRACE tokio_util::codec::framed_write] framed transport flushed
[2020-03-17T13:54:03Z TRACE mio::poll] registering with poller
[2020-03-17T13:54:03Z TRACE mio::sys::unix::kqueue] registering; token=Token(2); interests=Readable | Writable | Error | Hup
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling AggregatorService
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling RPC requests
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::rpc] polling RpcRequestsMux
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::rpc] no RequestStream, polling the RequestStream receiver
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling API requests
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling ApiRx
[2020-03-17T13:54:03Z TRACE mio::poll] registering with poller
[2020-03-17T13:54:03Z TRACE mio::sys::unix::kqueue] registering; token=Token(3); interests=Readable | Writable | Error | Hup
[2020-03-17T13:54:03Z INFO  xain_fl::aggregator::api] starting HTTP server on localhost:8082
[2020-03-17T13:54:03Z INFO  warp::server] listening with custom incoming
Traceback (most recent call last):
  File "/Users/jan.petsche/Documents/AI/FL-RUST/coordinator-rs/python/aggregators/xain_aggregators/weighted_average.py", line 2, in <module>
    import pickle
  File "/Users/jan.petsche/.pyenv/versions/3.7.2/lib/python3.7/pickle.py", line 33, in <module>
    from struct import pack, unpack
  File "/Users/jan.petsche/.pyenv/versions/3.7.2/lib/python3.7/struct.py", line 13, in <module>
    from _struct import *
ImportError: dlopen(/Users/jan.petsche/.pyenv/versions/3.7.2/lib/python3.7/lib-dynload/_struct.cpython-37m-darwin.so, 2): Symbol not found: __PyFloat_Pack2
  Referenced from: /Users/jan.petsche/.pyenv/versions/3.7.2/lib/python3.7/lib-dynload/_struct.cpython-37m-darwin.so
  Expected in: flat namespace
 in /Users/jan.petsche/.pyenv/versions/3.7.2/lib/python3.7/lib-dynload/_struct.cpython-37m-darwin.so
[2020-03-17T13:54:03Z ERROR xain_fl::aggregator::py_aggregator] py_aggregator failure: failed to load python module `xain_aggregators.weighted_average`
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling AggregatorService
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling RPC requests
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::rpc] polling RpcRequestsMux
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::rpc] no RequestStream, polling the RequestStream receiver
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling API requests
[2020-03-17T13:54:03Z TRACE xain_fl::aggregator::service] polling ApiRx
[2020-03-17T13:54:03Z TRACE mio::poll] deregistering handle with poller
[2020-03-17T13:54:03Z INFO  aggregator] shutting down: Aggregator terminated
[2020-03-17T13:54:03Z TRACE tokio_util::codec::framed_write] flushing framed transport
[2020-03-17T13:54:03Z TRACE tokio_util::codec::framed_write] framed transport flushed
[2020-03-17T13:54:03Z INFO  tarpc::rpc::client::channel] Shutdown: write half closed, and no requests in flight.
[2020-03-17T13:54:03Z TRACE mio::poll] deregistering handle with poller
[2020-03-17T13:54:03Z TRACE mio::poll] deregistering handle with poller
[2020-03-17T13:54:03Z TRACE mio::poll] deregistering handle with poller

opened by janpetschexain 10

restore coordinator state
Summary

added the possibility to restore the coordinator state

Behavior

Link to diagram

If the RestoreSettings.no_restore flag is set to true, the current coordinator state will be reset and a new StateMachine is created with the given settings. The new state machine starts with round id 1.

If no coordinator state exists, the current coordinator state will be reset and a new StateMachine is created with the given settings.

If a coordinator state exists but no global model has been created so far, the StateMachine will be restored with the coordinator state but without a global model.

If a coordinator state and a global model exist, the StateMachine will be restored with the coordinator state and the global model.

If a global model has been created but does not exist, the initialization will fail with StateMachineInitializationError::GlobalModelUnavailable.

If a global model exists but its properties do not match the coordinator model settings, the initialization will fail with StateMachineInitializationError::GlobalModelInvalid.

Any network error will cause the initialization to fail.
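The rules above can be summarised as a small decision function. This is a simplified sketch and not the code from this PR: the `Restore`, `InitError` and `StoredModel` types and the `decide` function are invented for illustration, only the two error variant names are taken from the description, and network errors are left out.

```rust
#[derive(Debug, PartialEq)]
enum Restore {
    /// Reset everything and start a new state machine.
    Fresh,
    /// Restore the coordinator state, no global model yet.
    StateOnly,
    /// Restore the coordinator state and the global model.
    StateAndModel,
}

#[derive(Debug, PartialEq)]
enum InitError {
    GlobalModelUnavailable,
    GlobalModelInvalid,
}

/// What the storage knows about the latest global model (hypothetical).
#[derive(Clone, Copy)]
enum StoredModel {
    NeverCreated,
    Missing,
    Mismatch,
    Valid,
}

fn decide(no_restore: bool, state_exists: bool, model: StoredModel) -> Result<Restore, InitError> {
    if no_restore || !state_exists {
        return Ok(Restore::Fresh);
    }
    match model {
        StoredModel::NeverCreated => Ok(Restore::StateOnly),
        StoredModel::Missing => Err(InitError::GlobalModelUnavailable),
        StoredModel::Mismatch => Err(InitError::GlobalModelInvalid),
        StoredModel::Valid => Ok(Restore::StateAndModel),
    }
}
```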

Ideas for future improvements

Hide the redis and s3 client behind a StorageAPI

Currently we use both clients directly in the state machine, which has some disadvantages:

we work with S3Error and RedisError types in the state machine. It would be better if we can hide these errors behind more appropriate error types like GlobalModelUnavailable, CoordinatorStateUnavailable, etc.

we have to carry around the #[cfg(feature = "model-persistence")] annotation if we want to use the s3 client (see StateMachineInitializer). With a StorageAPI we can hide it behind the API. However we will still need to add the annotation in the state machine for functions like upload_global_model.

further advantages of a StorageAPI:

It allows us to swap the storage backend (although I don't see the need in the near future)

we can hide something like

use xaynet_core::crypto::ByteObject;

let round_seed = hex::encode(self.shared.state.round_params.seed.as_slice());
let key = format!("{}_{}", self.shared.state.round_id, round_seed);
self.shared
    .io
    .s3
    .upload_global_model(&key, &global_model)
    .await
    .map_err(PhaseStateError::SaveGlobalModel)?;
let _ = self
    .shared
    .io
    .redis
    .connection()
    .await
    .set_latest_global_model_id(&key)
    .await
    .map_err(|err| warn!("failed to update latest global model id: {}", err));

behind one storage function:

storage.upload_global_model(round_id, round_seed)

it cleans up the state machine
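To make the proposal a bit more tangible, here is a rough, synchronous sketch of what such an abstraction could look like. The `Storage` trait, its method signatures and the `InMemoryStorage` backend are assumptions made for this example; the real API would be async and backed by the S3 and Redis clients.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction hiding the concrete backends.
trait Storage {
    type Error;

    /// Store the model and remember it as the latest one. Returns the key.
    fn upload_global_model(
        &mut self,
        round_id: u64,
        round_seed: &[u8],
        model: &[u8],
    ) -> Result<String, Self::Error>;

    fn latest_global_model_id(&self) -> Option<&str>;
}

/// In-memory backend, useful for tests.
#[derive(Default)]
struct InMemoryStorage {
    models: HashMap<String, Vec<u8>>,
    latest: Option<String>,
}

impl Storage for InMemoryStorage {
    type Error = std::convert::Infallible;

    fn upload_global_model(
        &mut self,
        round_id: u64,
        round_seed: &[u8],
        model: &[u8],
    ) -> Result<String, Self::Error> {
        // Same key layout as in the snippet above: "<round_id>_<hex seed>".
        let seed_hex: String = round_seed.iter().map(|b| format!("{:02x}", b)).collect();
        let key = format!("{}_{}", round_id, seed_hex);
        self.models.insert(key.clone(), model.to_vec());
        self.latest = Some(key.clone());
        Ok(key)
    }

    fn latest_global_model_id(&self) -> Option<&str> {
        self.latest.as_deref()
    }
}
```

With an associated `Error` type, each backend can map its own failures (S3Error, RedisError) to domain errors before they reach the state machine.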

Add global_model_id to restore settings

I think it would make sense if the user has the option to restore the coordinator with a specific global model.

Open questions:

Would you prefer to introduce another feature flag for the restore functionality? I used the "model-persistence" feature flag for now, as it is the only one that a restore feature flag would depend on.
opened by Robert-Steiner 9
xaynet-sdk: allow other HTTP clients

The only client we currently support is reqwest but this opens the door to supporting more clients in the future.

To add a new client NewClient, we just need to implement the HttpClient trait.

opened by little-dude 8
coordinator: introduce type aliases & some renaming
This is based on https://github.com/xainag/xain-fl/pull/380, I split it to make the changes easier to review.

Type aliases help the reader understand the nature of the data being manipulated. For instance in

type UpdateDict = HashMap<SumParticipantPublicKey, HashMap<UpdateParticipantPublicKey, EncryptedMaskingSeed>>;

it is clear that this is a dictionary where the keys identify sum participants, and the values are nested dictionaries where the keys identify update participants. In the following, by contrast, it is not so clear, and one has to look for a place where the map is being manipulated to try to understand what the keys refer to:

HashMap<box_::PublicKey, HashMap<box_::PublicKey, Vec<u8>>>

This commit also renames the Coordinator.dict_seed attribute to Coordinator.dict_update. There are two types of seed dictionaries: the dictionaries sent by the update participant, and the dictionary built by the coordinator from these updates. It is easy to confuse them, so we chose to name the dictionaries sent by the participants "seed dictionary" and the global dictionary built by the coordinator during the update phase "update dictionary".

Finally, the aforementioned renaming led to renaming Coordinator.update_dict_seed to Coordinator.update_dict_update, which sounds weird, so we renamed these methods as well:

update_sum_dict => add_sum_participant

update_dict_seed => add_seed_dict

update_dict_mask => add_mask_hash
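As an illustration of how the aliases read in practice, the sketch below merges one update participant's seed dictionary into the coordinator's update dictionary. The key types are short byte arrays standing in for the real public keys, the `LocalSeedDict` alias name is an assumption, and the body of `add_seed_dict` is a simplification that skips any validation the real method performs.

```rust
use std::collections::HashMap;

// Stand-ins for the real key and seed types (assumptions for this sketch).
type SumParticipantPublicKey = [u8; 2];
type UpdateParticipantPublicKey = [u8; 2];
type EncryptedMaskingSeed = Vec<u8>;

/// Seed dictionary sent by a single update participant.
type LocalSeedDict = HashMap<SumParticipantPublicKey, EncryptedMaskingSeed>;

/// Update dictionary built by the coordinator during the update phase.
type UpdateDict =
    HashMap<SumParticipantPublicKey, HashMap<UpdateParticipantPublicKey, EncryptedMaskingSeed>>;

/// File each seed under the sum participant it was encrypted for.
fn add_seed_dict(
    update_dict: &mut UpdateDict,
    update_pk: UpdateParticipantPublicKey,
    seed_dict: LocalSeedDict,
) {
    for (sum_pk, seed) in seed_dict {
        update_dict.entry(sum_pk).or_default().insert(update_pk, seed);
    }
}
```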
opened by little-dude 8
cleanup README files
use a separate README for the repo and for readthedocs

remove the implementation details from the repo's main README. That part of the documentation belongs to the Rust documentation and was slightly outdated anyway

remove the keras example README, which was outdated and redundant with the instructions in the repo's README
opened by little-dude 8
XN-1460 add model conversion benchmarks

Adds benchmarks for parsing a vector of i32 into a Model, and serialising a Model into a vector of i32.

Performance:

| test | time |
| --- | --- |
| parse model from 4 bytes vector | 351.78 ns |
| parse model (bounded) from 100kB vector | 3.4069 ms |
| parse model from 1MB vector | 41.986 ms |
| serialize 4 bytes model to primitives | 829.05 ns |
| serialize 100kB model into primitives | 9.5551 ms |
| serialize 1MB model to primitives | 103.23 ms |

I tried with 10MB as well but preparing the 1000 iterations was too painfully slow.

Not sure this is the best way, so please don't hold back on comments.

I also get a weird warning that the make_model_XXX functions in the utils mod are unused, but they clearly are, and they work. SO answer was a bit overwhelming.. any word of advice? Thanks!

EDIT: warning disappeared, so that's good.

opened by wilk10 6
implement a service for handling multipart messages

Add a service for handling multipart message in xaynet_server::services::messages::multipart. That service comes after the MessageParser: it takes a Message and returns an Option<Message>.

This PR also changes how the multipart nature is encoded in the message. Previously, this was encoded in the message tag. Therefore, the "real" type of the message wasn't known. This posed multiple problems. First we couldn't parse a multipart message without this information. Second, we rely on the tag information to filter out incoming messages (sum messages are only accepted during the sum phase for instance). Without this information, we'd have to process all the multipart messages we receive.

Therefore we decided to restore the flag field in the message header and use it to indicate whether a message is multipart, and keep the tag field to indicate the type of the message.
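A simplified sketch of why keeping the tag separate from the multipart flag helps: the phase filter can look at the tag alone, and only messages that pass it need to be handed to the multipart service. The `Tag`, `Phase` and `Header` types and both functions below are illustrative stand-ins, not the actual xaynet message types.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tag {
    Sum,
    Update,
    Sum2,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Phase {
    Sum,
    Update,
    Sum2,
}

/// Header with the message type and the multipart flag kept apart.
struct Header {
    tag: Tag,
    is_multipart: bool,
}

/// Phase filter: only the tag is needed, multipart or not.
fn accept(header: &Header, phase: Phase) -> bool {
    matches!(
        (header.tag, phase),
        (Tag::Sum, Phase::Sum) | (Tag::Update, Phase::Update) | (Tag::Sum2, Phase::Sum2)
    )
}

/// Only accepted multipart messages have to be buffered and reassembled.
fn needs_reassembly(header: &Header, phase: Phase) -> bool {
    accept(header, phase) && header.is_multipart
}
```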

opened by little-dude 6

fix to poll round params data

update. This addresses a break in the build introduced in https://github.com/xainag/xain-fl/pull/397

now compatible with the changes around round parameters introduced in https://github.com/xainag/xain-fl/pull/411
PB-783

This compiles, but isn't quite as it should be. In the loop of during_round, we poll for round_params_data. Now the "old" round_params are inside, and also Optional. So really we should be polling until round_params is also available. What we'd really like is something like

// doesn't compile!
let round_params = loop {
    if let Some(round_params_data) = self.handle.get_round_parameters().await {
        if let Some(round_params) = round_params_data.round_parameters {
            break round_params;
        }
    }
    // interval tick...
};

But the Arc around round_params_data makes it problematic to do...

error[E0507]: cannot move out of an `Arc`
   --> src/client.rs:108:45
    |
108 |                 if let Some(round_params) = round_params_data.round_parameters {
    |                             ------------    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider borrowing here: `&round_params_data.round_parameters`
    |                             |
    |                             data moved here
    |                             move occurs because `round_params` has type `coordinator::RoundParameters`, which does not implement the `Copy` trait

opened by finiteprods 6

What does example code do?

Hi, I've been able to successfully run your rust/examples code (coordinator and clients + redis db). In the description you mention that it's a basic federated learning algorithm but it's "network capable". Could you please elaborate on this further?

What exactly is the algorithm the clients are running? How do you aggregate the updates to the global model (if you do)? What data is the test example running on? What do you train the model on?

Thanks

opened by zavalyshyn 2
use of Xaynet with Raspberry pi

Hi @Robert-Steiner, I am currently working on a project about federated learning and came across your framework during exploratory analysis. My project should utilize federated learning in this manner: I have an aggregation server (let's say in a cloud). I want this server to provide a model for my 2 Raspberry Pis. These two RPis would then train the model on local data for x epochs and provide the trained models/gradients back to the global server. On this server, the results would be federated averaged and a new model would be sent to the Pis. Is such a workflow possible with your framework? If so, could you provide me with a hint? Are there any other examples using Xaynet, if possible?

Thank you,

opened by prathapkumarbaratam 10
Distribute through a privacy oriented application store

First of all, thank you for your awesome application and for making part of it libre, it is really awesome to see this kind of project follows these principles.

Following the spirit of conquering more individual and collective privacy in the virtual realm, it would be fantastic if you could offer some other way of installing the application for people who don't want to rely on Google's store, but also want to keep up to date and enjoy your fantastic project. Right now the only way of doing so is by using Aurora Store, which is a third party client to the aforementioned store, and even though it is not that bad, it would be even better if you didn't need to depend on it.

The best option would be to publish it on F-Droid, which is the most well known and has high standards for publication, but at the same time these high standards come with compromises since it requires to not use proprietary software. I don't know exactly how Xayn is constructed or what it has or hasn't but it would be interesting to at least discuss the possibility of liberating what's not libre.

https://f-droid.org/ https://forum.f-droid.org/t/how-to-publish-my-app-on-f-droid/198

Another good option which is less strict would be to publish it on Izzy on Droid, which is also a privacy and libre software oriented repository, but it is less strict than F-Droid.

https://apt.izzysoft.de/fdroid/

opened by LongJohn-Silver 1
Is it possible to have Federated Learning on Cloud-Edge?

Hi everyone,

Currently I am working on a school project about federated learning and came across your framework during exploratory analysis. My project should utilize federated learning in this manner: I have an aggregation server (let's say in a cloud). I want this server to provide a model to my 2 Raspberry Pis. These two RPis would then train the model on local data for x epochs and provide the trained models/gradients back to the global server. On this server, the results would be federated averaged and a new model would be sent to the Pis. Is such a workflow possible with your framework? If so, could you provide me with a hint?

Thank you, Best regards

opened by Martiniann 2

Releases(v0.11.0)

v0.11.0(Jan 18, 2021)
Added

Rust SDK xaynet-sdk

xaynet-sdk contains the basic building blocks required to run the Privacy-Enhancing Technology (PET) Protocol. It consists of a state machine and two I/O interfaces with which specific Xaynet participants can be developed that are adapted to the respective environments/requirements.

If you are interested in building your own Xaynet participant, you can take a look at xaynet-sdk, our Rust participant which we use primarily for testing or at xaynet-mobile our mobile friendly participant.

A Mobile friendly Xaynet participant xaynet-mobile

xaynet-mobile provides a mobile friendly implementation of a Xaynet participant. It gives the user a lot of control on how to drive the participant execution. You can regularly pause the execution of the participant, save it, and later restore it and continue the execution. When running on a device that is low on battery or does not have access to Wi-Fi for instance, it can be useful to be able to pause the participant.

C API

Furthermore, xaynet-mobile offers C bindings that allow xaynet-mobile to be used in other programming languages such as Dart.

Python participant SDK xaynet-sdk-python

We are happy to announce that we finally released xaynet-sdk-python, a Python SDK that consists of two experimental Xaynet participants (ParticipantABC and AsyncParticipant).

The ParticipantABC API is similar to the old one which we introduced in v0.8.0. Aside from some changes to the method signature, the biggest change is that the participant now runs in its own thread. To migrate from v0.8.0 to v0.11.0 please follow the migration guide.

However, we noticed that our Participant API may be difficult to integrate with existing applications, considering the code for the training has to be moved into the train_round method, which can lead to significant changes to the existing code. Therefore, we offer a second API (AsyncParticipant) in which the training of the model is no longer part of the participant.

A more in-depth explanation of the differences between the Participant APIs and examples of how to use them can be found here.

Multi-part messages

Participant messages can get large, possibly too large to be sent successfully in one go. On mobile devices in particular, the internet connection may not be as reliable. In order to make the transmission of messages more robust, we implemented multi-part messages to break a large message into parts and send them sequentially to the coordinator. If the transmission of part of a message fails, only that part will be resent and not the entire message.
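The idea of splitting a message and resending only the failed part can be sketched as follows. This is an illustration in Python, not the Xaynet wire format: the `(index, total, chunk)` tuple layout, the function names and the retry limit are assumptions made for the example.

```python
def split_message(payload: bytes, part_size: int):
    """Split `payload` into numbered parts of at most `part_size` bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    chunks = [payload[i:i + part_size]
              for i in range(0, len(payload), part_size)] or [b""]
    total = len(chunks)
    # Each part carries its index and the total count (assumed layout).
    return [(index, total, chunk) for index, chunk in enumerate(chunks)]


def send_all(parts, send, max_attempts=3):
    """Send parts in order; retry only the part whose transmission failed.

    `send` is any callable returning True on success. Returns the number
    of transmissions that were needed in total.
    """
    attempts = 0
    for part in parts:
        for _ in range(max_attempts):
            attempts += 1
            if send(part):
                break
        else:
            raise ConnectionError(
                f"part {part[0]} failed after {max_attempts} attempts")
    return attempts


def reassemble(parts):
    """Receiver side: rebuild the message once every part has arrived."""
    ordered = sorted(parts, key=lambda p: p[0])
    if not ordered or len(ordered) != ordered[0][1]:
        raise ValueError("missing parts")
    return b"".join(chunk for _, _, chunk in ordered)
```

If one of three parts fails once, `send_all` performs four transmissions rather than resending the whole payload, which is the saving the paragraph above describes.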

Coordinator state managed in Redis

In order to be able to restore the state of the coordinator after a failure or shutdown, the state is managed in Redis and no longer in memory.

The Redis client can be configured via the [redis] setting:

[redis]
url = "redis://127.0.0.1/"

Support for storing global models in S3/Minio

The coordinator is able to save a global model in S3/Minio after a successful round.

The S3 client can be configured via the [s3] setting:

[s3]
access_key = "minio"
secret_access_key = "minio123"
region = ["minio", "http://localhost:9000"]

[s3.buckets]
global_models = "global-models"

xaynet-server must be compiled with the feature flag model-persistence in order to enable this feature.

Restore coordinator state

The state of the coordinator can be restored after a failure or shutdown. You can find more information about the restore behavior here.

Restoring the coordinator can be configured via the [restore] setting:

[restore]
enable = true

xaynet-server must be compiled with the feature flag model-persistence in order to enable this feature.
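The interplay between externally managed state and the restore switch can be illustrated with a toy model. This Python sketch is not how xaynet-server is implemented: `KeyValueStore` is an in-memory stand-in for Redis so that the example runs without a server, and the key name, state fields and `finish_round` method are hypothetical.

```python
import json


class KeyValueStore:
    """In-memory stand-in for Redis (assumption: only get/set are needed)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Coordinator:
    """Toy coordinator whose state lives in the store, not in memory."""

    STATE_KEY = "coordinator_state"  # hypothetical key name

    def __init__(self, store, restore_enabled=True):
        self.store = store
        # Mirrors the idea of `[restore] enable = true`: only read the
        # saved state back when restoring is switched on.
        saved = store.get(self.STATE_KEY) if restore_enabled else None
        self.state = json.loads(saved) if saved else {"round_id": 0}

    def finish_round(self):
        """Advance to the next round and write the new state to the store."""
        self.state["round_id"] += 1
        self.store.set(self.STATE_KEY, json.dumps(self.state))
```

A second `Coordinator` created on the same store picks up at the saved round, while one created with `restore_enabled=False` starts from round 0 again.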

Improved collection of state machine metrics

In v0.10.0 we introduced the collection of metrics that are emitted in the state machine of xaynet-server and sent to an InfluxDB instance. In v0.11.0 we have revised the implementation and improved it further. Metrics are now sent much faster and adding metrics to the code has become much easier.
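For readers unfamiliar with InfluxDB, metrics are typically written in its text-based line protocol (`measurement,tags fields timestamp`). The following Python sketch formats a single data point in that shape. It is only an illustration of the format, not the xaynet-server metrics code, and the measurement, tag and field names used in the usage note are made up.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one data point in InfluxDB line protocol."""
    if not fields:
        raise ValueError("at least one field is required")

    def esc(text, chars):
        # Prefix each special character with a backslash.
        for ch in chars:
            text = text.replace(ch, "\\" + ch)
        return text

    def fmt(value):
        # bool must be checked before int, since bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integers carry an `i` suffix
        if isinstance(value, float):
            return repr(value)
        # Strings are double-quoted; escape backslashes and quotes.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return '"' + text + '"'

    head = esc(measurement, ", ")
    for key in sorted(tags):
        head += "," + esc(key, ",= ") + "=" + esc(str(tags[key]), ",= ")
    body = ",".join(esc(k, ",= ") + "=" + fmt(v) for k, v in fields.items())
    return f"{head} {body} {timestamp_ns}"
```

For example, `to_line_protocol("state_machine", {"phase": "sum"}, {"round_id": 3}, 1600000000000000000)` produces a single line that can be sent to an InfluxDB write endpoint.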

Removed

xaynet_client (was split into xaynet_sdk and xaynet_mobile)

xaynet_ffi (is now part of xaynet_mobile)

xaynet_macro

What's next?

Roadmap 2021
v0.10.0 (Sep 22, 2020)
Added

Preparation for redis support: prepare for xaynet_server to store PET data in redis #416, #515

Add support for multipart messages in the message structure #508, #513, #514

Generalised scalar extension #496, #507

Add server metrics #487, #488, #489, #493

Refactor the client into a state machine, and add a client tailored for mobile devices #471, #497, #506

Changed

Split the xaynet crate into several sub-crates:

xaynet_core (0.1.0 released), re-exported as xaynet::core

xaynet_client (0.1.0 released), re-exported as xaynet::client when compiled with --features client

xaynet_server (0.1.0 released), re-exported as xaynet::server when compiled with --features server

xaynet_macro (0.1.0 released)

xaynet_ffi (not released)

v0.9.0 (Jul 24, 2020)

xain/xain-fl repository was renamed to xaynetwork/xaynet.

The new crate is now published as xaynet: https://crates.io/crates/xaynet

Added

This release introduces the integration of the PET protocol into the platform.

Note: The integration of the PET protocol required a complete rewrite of the codebase and is therefore not compatible with the previous release.
v0.8.0 (Apr 8, 2020)
Added

New tutorial for the Python SDK (https://github.com/xainag/xain-fl/pull/355)

Swagger description of the REST API (https://github.com/xainag/xain-fl/pull/345), published at https://xain-fl.readthedocs.io/en/latest/ (https://github.com/xainag/xain-fl/pull/358)

The Python examples now accept additional parameters (model size, heartbeat period, verbosity, etc.) (https://github.com/xainag/xain-fl/pull/351)

Publish docker images to dockerhub

Security

Stop using pickle for messages serialization (https://github.com/xainag/xain-fl/pull/355). pickle is insecure and can lead to remote code execution. Instead, the default aggregator uses numpy.save().
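The reason for preferring `numpy.save()` over pickle can be seen in a short sketch: with `allow_pickle=False`, NumPy writes and reads plain array data only, so loading untrusted bytes cannot trigger arbitrary object construction. The helper names below are hypothetical and this is not the aggregator's actual code; it assumes NumPy is installed.

```python
import io

import numpy as np


def serialize_weights(weights: np.ndarray) -> bytes:
    """Encode an array in the .npy format without any pickled objects."""
    buffer = io.BytesIO()
    # allow_pickle=False makes NumPy refuse object arrays instead of
    # silently falling back to pickle.
    np.save(buffer, weights, allow_pickle=False)
    return buffer.getvalue()


def deserialize_weights(data: bytes) -> np.ndarray:
    """Decode bytes produced by `serialize_weights`."""
    # allow_pickle=False rejects payloads that would require unpickling.
    return np.load(io.BytesIO(data), allow_pickle=False)
```

The round trip preserves dtype and shape, which is all that is needed for exchanging numeric model weights.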

Fixed

The documentation has been updated at https://xain-fl.readthedocs.io/en/latest/ (https://github.com/xainag/xain-fl/pull/358)

Document aggregator error on Darwin platform (https://github.com/xainag/xain-fl/pull/365/files)

Changed

Simplified the Python SDK API (https://github.com/xainag/xain-fl/pull/355)

Added unit tests for the coordinator and aggregator (https://github.com/xainag/xain-fl/pull/353), (https://github.com/xainag/xain-fl/pull/352)

Refactor the metrics store (https://github.com/xainag/xain-fl/pull/340)

Speed up the docker builds (https://github.com/xainag/xain-fl/pull/348)

v0.7.0 (Mar 25, 2020)
In this release we archived the Python code under the legacy folder and shifted development to Rust. This release has many breaking changes compared with previous versions. More details will be made available through the updated README.md of the repository.

PB-584 Update the manifest Cargo.toml with description, license, keywords, repository URL and project homepage (#343) [Ricardo Saffi Marques]

PB-584 Add LICENSE file at the root of the repository prior to release v0.7.0 (#342) [Ricardo Saffi Marques]

Update Cargo.lock for release. [Ricardo Saffi Marques]

Update versions and authors (#335) [Corentin Henry]

Remove the rustfmt config file (#341) [Corentin Henry]

Remove nix files (#337) [Corentin Henry]

Remove caddy (#338) [Robert Steiner]

Nix-shell: install rust-analyzer (#336) [Corentin Henry]

Re-use the CHANGELOG file from the legacy codebase. [little-dude]

Rewrite xain-fl in Rust. [little-dude]

Merge pull request #64 from xainag/optional-metric-store. [Corentin Henry]

In docker, compile with influx_metrics [little-dude]

Silence "unused variable" warning from rustc. [little-dude]

Add CI jobs for the influx_metrics feature. [little-dude]

Introduce the influx_metrics feature. [little-dude]

Disable metrics when running locally. [little-dude]

Make the metric store optional. [little-dude]

Move the metric store to xain_fl::common. [little-dude]

Merge pull request #68 from xainag/ci-build-all-features. [Corentin Henry]

Fix build for the telemetry feature. [little-dude]

Ci: add matrix to tests various --features flags. [little-dude]

Merge pull request #67 from xainag/example-fixes. [Corentin Henry]

Sdk: participant should exit when a heartbeat fails. [little-dude]

Sdk: crash instead of calling sys.exit when participant errors out. [little-dude]

Fix exit condition in dummy example. [little-dude]

Merge pull request #65 from xainag/less-verbose-logging. [Corentin Henry]

Make logging more finely configurable. [little-dude]

Merge pull request #66 from xainag/fix-crash. [Corentin Henry]

Fix crash after training finishes. [little-dude]

Merge pull request #63 from xainag/misc. [Corentin Henry]

Log an error when a heartbeat is rejected. [little-dude]

Use more reasonable values in docker-release-aggregator.toml. [little-dude]

Sdk: make the heartbeat frequency configurable. [little-dude]

Simplify match statement. [little-dude]

Nix: add some cargo tools. [little-dude]

Merge pull request #61 from xainag/refactor. [Corentin Henry]

Refactor: simplify RPC code by using the ServiceHandle. [little-dude]

Merge pull request #52 from xainag/PB-490-protocol-tests. [Corentin Henry]

Fix protocol test: by default expect two rounds. [little-dude]

Document the protocol tests. [little-dude]

Remove ignore. [Robert Steiner]

Remove unused function. [Robert Steiner]

Fix typo. [Robert Steiner]

Clean up. [Robert Steiner]

Add full training test case. [Robert Steiner]

Clean up tests. [Robert Steiner]

Add end_training and end_aggregation tests. [Robert Steiner]

Add endtraining test. [Robert Steiner]

Remove comments. [Robert Steiner]

Add start training tests. [Robert Steiner]

Add heartbeat test. [Robert Steiner]

Fix tests. [Robert Steiner]

Add heartbeat tests. [Robert Steiner]

PB-490 Add protocol tests. [Robert Steiner]

Merge pull request #59 from xainag/opentelemetry. [Corentin Henry]

Integrate with opentelemetry and jaeger. [little-dude]

Get rid of log and env_logger dependencies. [little-dude]

Small refactoring. [little-dude]

Configure a custom Subscriber for the aggregator. [little-dude]

Split AggregatorService::poll_rpc_requests() [little-dude]

Implement Display for RPC requests. [little-dude]

Switch to tracing for logging. [little-dude]

Update README. [little-dude]

Merge pull request #56 from xainag/PB-491-test-python-ffi. [Corentin Henry]

Add reset, get_global_weights and aggregate tests. [Robert Steiner]

Add add_weights and aggregate test. [Robert Steiner]

Add python setup in rust-test. [Robert Steiner]

Add more tests. [Robert Steiner]

Add load test. [Robert Steiner]

Update README. [little-dude]

Add instructions to run the examples. [little-dude]

Remove more dependencies. [little-dude]

Remove unused dependency. [little-dude]

Bump dependencies. [little-dude]

Merge pull request #54 from xainag/simplify_rpc_client. [Corentin Henry]

Simplify the RPC implementation. [little-dude]

Install docker-compose. [little-dude]

Merge pull request #53 from xainag/error-handling. [Corentin Henry]

Aggregator: properly handle CTRL+C. [little-dude]

Aggregator: when a task finishes cancel the other ones. [little-dude]

Rename channels to reflect whether they are senders or receivers. [little-dude]

Improve error handling in the py_aggregator module. [little-dude]

Add anyhow and thiserror dependencies for error handling. [little-dude]

Merge pull request #49 from xainag/fix-clippy-warnings. [Corentin Henry]

Update ci. [Robert Debug]

Fix clippy warnings. [Robert Debug]

Deny warnings when compiling rust code. [little-dude]

Do not run CI for PRs. [little-dude]

Configure python CI (#50) [Robert Steiner]

Remove dummy aggregator. [little-dude]

Cleanup the keras_house_prices example. [little-dude]

Add vscode workspace settings to gitignore. [Robert Debug]

Clean up write metrics. [Robert Steiner]

Merge pull request #37 from xainag/benchmarks. [Corentin Henry]

Delete dummy tensorflow example. [little-dude]

Cleanup the keras benchmark. [little-dude]

Optimize dummy example. [little-dude]

Fix data_handlers. [little-dude]

Beef up the .dockerignore to speed up docker. [little-dude]

Port keras example from the benchmarks repo. [little-dude]

Remove junk files. [little-dude]

Beef up the .dockerignore to speed up docker. [little-dude]

Merge pull request #44 from xainag/add_metric_queue. [Corentin Henry]

Remove tests. [Robert Steiner]

Fix config. [Robert Steiner]

Add metric queue. [Robert Steiner]

Merge pull request #41 from xainag/valgrind. [Robert Steiner]

Remove valgrind in dev. [Robert Steiner]

Clean up. [Robert Steiner]

Fix rebase. [Robert Steiner]

Add valgrind. [Robert Debug]

Merge pull request #43 from xainag/memory-leak. [Corentin Henry]

Fix python memory leak. [little-dude]

Fix warning about unused import. [little-dude]

Configure logging for the weighted average python aggregator. [little-dude]

In the tf.py example just pretend to train. [little-dude]

Make participant logging less noisy. [little-dude]

Merge pull request #38 from xainag/add_rendez_vous_tests. [Robert Steiner]

Fmt. [Robert Steiner]

Clean up tests. [Robert Steiner]

Add rendez_vous tests. [Robert Steiner]

Rustfmt. [little-dude]

Fix CI. [little-dude]

Set working directory in CI. [little-dude]

Fix typo in github workflow file. [little-dude]

Add CI for rust. [little-dude]

Document how to run release builds. [little-dude]

Update images. [little-dude]

Add top level .gitignore. [little-dude]

Move the rust code into its own directory. [little-dude]

Add instructions for profiling. [little-dude]

TEST CONFIG. [little-dude]

Add instructions to run the tf example. [little-dude]

Sdk: handle start training rejections. [little-dude]

Fix shebang for nixos. [little-dude]

Tweak the log levels. [little-dude]

Use different tags for debug/release docker images. [little-dude]

Require more participants for release builds. [little-dude]

Docker: add release build and move configs out of docker dir. [little-dude]

Fix metric store tests. [Robert Steiner]

Merge pull request #32 from xainag/add-more-metrics. [Robert Steiner]

Delete old dashboard. [Robert Steiner]

Replace String with &'static str. [Robert Steiner]

Add more metrics. [Robert Steiner]

Add run participant script. [Robert Steiner]

Use real aggregator in docker. [little-dude]

Log why a start training request is rejected. [little-dude]

Train on smaller models (1M) [little-dude]

Fix py_aggregator memory leak. [little-dude]

Fix Dockerfile. [little-dude]

More profiling tools. [little-dude]

Nix: add gperftools dependency to debug aggregator memory leak. [little-dude]

Merge pull request #33 from xainag/aggregators. [Corentin Henry]

Add tensorflow example. [little-dude]

Make the sdk more generic. [little-dude]

Formatting and pylint fixes. [little-dude]

Add real aggregators. [little-dude]

Merge pull request #34 from xainag/docker-stats. [Robert Steiner]

Update caddyfile. [Robert Steiner]

Remove alertmanager. [Robert Steiner]

Remove alert manager. [Robert Steiner]

Add docker stats. [Robert Debug]

Repo cleanup. [little-dude]

Make the python aggregator configurable. [little-dude]

Fix aggregator port. [Robert Steiner]

Add numpy. [Robert Steiner]

Merge pull request #30 from xainag/collect_metrics. [Robert Steiner]

Remove clone impl. [Robert Steiner]

Collect Metrics. [Robert Steiner]

Fix ports in config files. [little-dude]

Sdk: various fixes. [little-dude]

Some more logging. [little-dude]

Wait for end aggregation message from the aggregator to enter finish state. [little-dude]

Don't panic upon un-expected end aggregation message from the aggregator. [little-dude]

Handle protocol events after polling the aggregation future. [little-dude]

Move DummyParticipant out of sdk package. [little-dude]

Working full round. [little-dude]

Working upload. [little-dude]

Python run black. [little-dude]

Some renaming. [little-dude]

Nix: fix build. [little-dude]

Basic upload handling. [little-dude]

Delete outdated examples. [little-dude]

Implement downloading weights. [little-dude]

Merge pull request #16 from xainag/metrics_store. [Robert Steiner]

Applied review changes. [Robert Steiner]

Add metrics store tests. [Robert Steiner]

Add metric store. [Robert Debug]

WIP. [Robert Debug]

Call poll. [Robert Debug]

Add metric store. [Robert Debug]

Fix RequestStream and RequestReceiver. [little-dude]

Merge pull request #25 from xainag/expose-rest-api. [Robert Steiner]

Fix docker build. [Robert Steiner]

Expose REST API. [Robert Debug]

Nix: move back to lorri. [little-dude]

Client: implement start_training. [little-dude]

Show milliseconds in log timestamps. [little-dude]

Remove docstring. [little-dude]

Fix build. [little-dude]

Start client implementation + minor fixes. [little-dude]

Quick & dirty json serialization. [little-dude]

Fix undeclared module. [Robert Debug]

Improve settings error handling. [Robert Debug]

Better logging for the python client. [little-dude]

Fix coordinator api paths. [little-dude]

Copy Participant abstract class from xain-sdk. [little-dude]

Rough http client. [little-dude]

Make timeout configurable. [little-dude]

Update config files. [little-dude]

Nix: do not re-install the python client in the shell hook. [little-dude]

More logging in the protocol. [little-dude]

Move settings to their own module. [little-dude]

Merge pull request #23 from xainag/error-handling-improvement. [Robert Steiner]

Replace match with ? [Robert Steiner]

Small improvements. [Robert Steiner]

Create package for a python client. [little-dude]

Start the HTTP server in both service. [little-dude]

Fix path in .dockerignore. [little-dude]

Create a docker directory. [little-dude]

Use specific config files for docker. [little-dude]

Merge pull request #22 from xainag/docker-bin-config. [Corentin Henry]

Add bin configs in docker-compose. [Robert Steiner]

Fix coordinator config file. [little-dude]

Add config files. [little-dude]

Merge pull request #19 from xainag/docker. [Corentin Henry]

Update docker files. [Robert Debug]

Update readme. [Robert Steiner]

Update bin section Cargo (used for cargo vendor) [Robert Steiner]

Add new line. [Robert Steiner]

Small improvements. [Robert Steiner]

Add dockerignore. [Robert Steiner]

Add docker file. [Robert Steiner]

Implement http layer for the coordinator. [little-dude]

Bump dependencies. [little-dude]

Implement http api for the aggregator. [little-dude]

Merge pull request #20 from xainag/repo-setup. [Corentin Henry]

Add new line. [Robert Steiner]

Update gitignore, add rust toolchain. [Robert Steiner]

Implement AggregatorServiceHandle. [little-dude]

Small cleanup. [little-dude]

Very rough implementation of aggregation. [little-dude]

Rename rpc aggregator method: reset->aggregate. [little-dude]

Add Protocol.waiting_for_aggregation. [little-dude]

Add logging. [little-dude]

Use stubborn-io. [little-dude]

Split the aggregator out of the coordinator. [little-dude]

Dummy aggregator::main() implementation. [little-dude]

Start implementing the AggregatorService future. [little-dude]

Document the RPC module. [little-dude]

Implement spawn_rpc() [little-dude]

Documentation, comments and logging for AggregatorService. [little-dude]

More rpc work. [little-dude]

Add commented out code. [little-dude]

Rework crate structure. [little-dude]

Update README. [little-dude]

Implement AggregatorTarpcServiceHandle. [little-dude]

Add diagram of envisioned architecture. [little-dude]

More py_aggregator work. [little-dude]

Move PyAggregator to crate::aggregator. [little-dude]

Start playing around with pyo3. [little-dude]

Bump dependencies. [little-dude]

Split examples and binaries. [little-dude]

Clippy + rustfmt. [little-dude]

First steps with the aggregator service: rpc. [little-dude]

Update sequence diagram. [little-dude]

Add a "common" module. [little-dude]

More logs and use reasonable values for example. [little-dude]

Require T: Debug. [little-dude]

Add readme. [little-dude]

Quick & dirty client implementation. [little-dude]

Start working on an example and fixing issues. [little-dude]

Rename state_machine into protocol + various cleanups. [little-dude]

Add sanity checks for counters. [little-dude]

Add new() methods. [little-dude]

Simplify state machine. [little-dude]

Implement start training and end training. [little-dude]

Get rid of the StateMachinEventHandler trait. [little-dude]

Implement selection. [little-dude]

Add coordinator handle. [little-dude]

Remove unused imports. [little-dude]

State machine cleanups. [little-dude]

Impl StateMachineHandler for CoordinatorService. [little-dude]

Get rid of the Client wrapper. [little-dude]

Initial commit. [little-dude]

Archive the legacy code. [little-dude]

v0.6.0 (Feb 26, 2020)
HOTFIX add disclaimer (#309) [janpetschexain]

PB-314: document the new weight exchange mechanism (#308) [Corentin Henry]

PB-407 add more debug level logging (#303) [janpetschexain]

PB-44 add heartbeat time and timeout to config (#305) [Robert Steiner]

PB-423 lock round access (#304) [kwok]

PB-439 Make thread pool workers configurable (#302) [Robert Steiner]

PB-159: update xain-{proto,sdk} dependencies to the right branch (#301) [Corentin Henry]

PB-159: remove weights from gRPC messages (#298) [Corentin Henry]

PB-431 send participant state to influxdb (#300) [Robert Steiner]

PB-434 separate metrics (#296) [Robert Steiner]

PB-406 :snowflake: Configure mypy (#297) [Anastasiia Tymoshchuk]

PB-428 send coordinator states (#292) [Robert Steiner]

PB-425 split weight init from training (#295) [janpetschexain]

PB-398 Round resumption in Coordinator (#285) [kwok]

Merge pull request #294 from xainag/master. [Daniel Kravetz]

Hotfix: PB-432 :pencil: :books: Update test badge and CI to reflect changes. [Daniel Kravetz]

PB-417 Start new development cycle (#291) [Anastasiia Tymoshchuk, kwok]

v0.5.0 (Feb 12, 2020)
Fix minor issues, update documentation.

PB-402 Add more logs (#281) [Robert Steiner]

DO-76 :whale: non alpine image (#287) [Daniel Kravetz]

PB-401 Add console renderer (#280) [Robert Steiner]

DO-80 :ambulance: Update dev Dockerfile to build gRPC (#286) [Daniel Kravetz]

DO-78 :sparkles: add grafana (#284) [Daniel Kravetz]

DO-66 :sparkles: Add keycloak (#283) [Daniel Kravetz]

PB-400 increment epoch base (#282) [janpetschexain]

PB-397 Simplify write metrics function (#279) [Robert Steiner]

PB-385 Fix xain-sdk test (#278) [Robert Steiner]

PB-352 Add sdk config (#272) [Robert Steiner]

Merge pull request #277 from xainag/master. [Daniel Kravetz]

Hotfix: update ci. [Daniel Kravetz]

DO-72 :art: Make CI name and feature consistent with other repos. [Daniel Kravetz]

DO-47 :newspaper: Build test package on release branch. [Daniel Kravetz]

PB-269: enable reading participants weights from S3 (#254) [Corentin Henry]

PB-363 Start new development cycle (#271) [Anastasiia Tymoshchuk]

PB-119 enable isort diff (#262) [janpetschexain]

PB-363 :gem: Release v0.4.0. [Daniel Kravetz]

DO-73 :green_heart: Disable continue_on_failure for CI jobs. Fix mypy. [Daniel Kravetz]

v0.4.0 (Feb 5, 2020)
Flatten model weights instead of using lists. Fix minor issues, update documentation.

PB-116: pin docutils version (#259) [Corentin Henry]

PB-119 update isort config and calls (#260) [janpetschexain]

PB-351 Store participant metrics (#244) [Robert Steiner]

Adjust isort config (#258) [Robert Steiner]

PB-366 flatten weights (#253) [janpetschexain]

PB-379 Update black setup (#255) [Anastasiia Tymoshchuk]

PB-387 simplify serve module (#251) [Corentin Henry]

PB-104: make the tests fast again (#252) [Corentin Henry]

PB-122: handle sigint properly (#250) [Corentin Henry]

PB-383 write aggregated weights after each round (#246) [Corentin Henry]

PB-104: Fix exception in monitor_hearbeats() (#248) [Corentin Henry]

DO-57 Update docker-compose files for provisioning InfluxDB (#249) [Ricardo Saffi Marques]

DO-59 Provision Redis 5.x for persisting states for the Coordinator (#247) [Ricardo Saffi Marques]

PB-381: make the log level configurable (#243) [Corentin Henry]

PB-382: cleanup storage (#245) [Corentin Henry]

PB-380: split get_logger() (#242) [Corentin Henry]

XP-332: grpc resource exhausted (#238) [Robert Steiner]

XP-456: fix coordinator command (#241) [Corentin Henry]

XP-485 Document revised state machine (#240) [kwok]

XP-456: replace CLI argument with a config file (#221) [Corentin Henry]

DO-48 :snowflake: :rocket: Build stable package on git tag with SemVer (#234) [Daniel Kravetz]

XP-407 update documentation (#239) [janpetschexain]

XP-406 remove numpy file cli (#237) [janpetschexain]

XP-544 fix aggregate module (#235) [janpetschexain]

DO-58: cache xain-fl dependencies in Docker (#232) [Corentin Henry]

XP-479 Start training rounds from 0 (#226) [kwok]
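The weight flattening mentioned in the summary of this release (PB-366) can be illustrated as follows: instead of passing around a list of per-layer arrays, all weights are concatenated into a single one-dimensional vector, and the layer shapes are kept separately so the original structure can be restored. This is a minimal sketch assuming NumPy; the function names are hypothetical and it is not the code from the pull request.

```python
import numpy as np


def flatten_weights(weights):
    """Concatenate per-layer arrays into one flat vector plus their shapes."""
    shapes = [w.shape for w in weights]
    flat = np.concatenate([w.ravel() for w in weights])
    return flat, shapes


def unflatten_weights(flat, shapes):
    """Rebuild the per-layer arrays from the flat vector and the shapes."""
    restored, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        restored.append(flat[offset:offset + size].reshape(shape))
        offset += size
    return restored
```

A flat vector is convenient for aggregation, since averaging models reduces to element-wise arithmetic on arrays of equal length.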

v0.3.0 (Jan 23, 2020)
XP-337 Clean up docs before generation (#188)

XP-229 Update Readme.md (#189)

XP-255 update codeowners and authors in setup (#195)

XP-265 move benchmarks to separate repo (#193)

XP-243 remove flags from xain-fl codebase

XP-255 update codeowners and authors in setup (#195)

XP-257 cleanup cproto dir (#198)

XP-261 move tests to own dir (#197)

XP-168 update setup.py (#191)

DO-35 :newspaper: :sparkles: Add placeholder for e2e. Update CI for gitflow

XP-241 remove legacy participant and sdk dir (#199)

DO-17 :whale: :sparkles: Add Dockerfiles, dockerignore and docs (#202)

XP-354 Remove proto files (#200)

XP-385 Fix docs badge (#204)

XP-273 scripts cleanup (#206)

XP-357 make controller parametrisable (#201)

XP-384 remove unused files (#209)

DO-43 docker compose minio (#208)

XP-374 Clean up docs (#211)

XP-271 fix pylint issues (#210)

XP-424 Remove unused packages (#212)

DO-49 :whale: :sparkles: Create initial buckets (#213)

XP-373 add sdk as dependency in fl (#214)

XP-433 Fix docker headings (#218)

XP-119 Fix gRPC testing setup so that it can run on macOS (#217)

XP-422 ai metrics (#216)

XP-308: store aggregated weights in S3 buckets (#215)

XP-436 Reinstate FINISHED heartbeat from Coordinator (#219)

XP-208 model sum aggregator as a mock for testing

XP-208 add test integrating P and C

XP-208 add identity controller as a mock for test

XP-208 small typo in id-controller docstring

XP-208 add coordinator test fixture with mocked components

XP-208 test for top-level function start_participant

XP-308: print debugging information for spurious failure

XP-208 join monitor_thread to confirm response to terminate_event

XP-208 change mock patch string to object called rather than defined

XP-208 model sum aggregator as a mock for testing

XP-208 add test integrating P and C

XP-208 add identity controller as a mock for test

XP-208 small typo in id-controller docstring

XP-208 add coordinator test fixture with mocked components

XP-208 test for top-level function start_participant

XP-436 fix bug: coordinator not advertising FINISHED state

XP-436 revert logging level

XP-208 join monitor_thread to confirm response to terminate_event

XP-208 change mock patch string to object called rather than defined

XP-436 suggested documentation fixes from review

XP-208 mock the store (following rebase)

XP-208 fix import

XP-208 test_start_participant: mark as slow, comment on aggregation

XP-480 revise message names (#222)

XP-499 Remove conftest, exclude tests folder (#223)

XP-333 Replace numproto with xain-proto (#220)

XP-505: docstrings cleanup (#224)

XP-508 Replace circleci badge (#225)

XP-510 allow for zero epochs on cli (#227)

XP-498: more generic shebangs (#229)

XP-505: cleanup docstrings in xain_fl.coordinator (#228)

Prepare release of v0.3.0 (matching version with xain-sdk and xain-proto) (#230)

Co-authored-by: Anastasiia Tymoshchuk, Daniel Kravetz, Felix Reichel, Anselmo Sampietro, Robert Steiner, Corentin Henry, janpetschexain, kwok
v0.2.0 (Dec 2, 2019)
[0.2.0] - 2019-12-02

Changed

Renamed package from xain to xain-fl

v0.1.0 (Sep 25, 2019)
The first public release of XAIN

Added

FedML implementation on well known benchmarks using a realistic deep learning model structure.
