Border
Border is a reinforcement learning library in Rust. For reusability of both RL environments and agents, the library provides reference implementations of environments and agents that are independent of each other. See the documentation for more information about how to use the library.
Status
Border is currently under development. The API is unstable.
Prerequisites
In order to run the examples, install Python >= 3.7 and gym, for which the library provides a wrapper using PyO3. As the agents used in the examples are based on tch-rs, libtorch also needs to be installed. Some examples require PyBullet Gym.
Examples
Random policy on cartpole environment
The following command runs a random controller (policy) for 5 episodes in CartPole-v0:
$ cargo run --example random_cartpole
It renders the environment during the episodes and generates a CSV file in examples/model containing the sequences of observation and reward values for the episodes.
$ head -n3 examples/model/random_cartpole_eval.csv
0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
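Each row appears to contain the episode index, the step within the episode, the reward, and the four CartPole observation values (cart position, cart velocity, pole angle, pole angular velocity). The snippet below is a small, dependency-free sketch (not part of Border) of how the file could be parsed in Rust; the column layout is an assumption based on the sample output above.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Assumed columns: episode, step, reward, then the observation components.
    let reader = BufReader::new(File::open("examples/model/random_cartpole_eval.csv")?);
    for line in reader.lines() {
        let line = line?;
        let fields: Vec<&str> = line.split(',').collect();
        let (episode, step, reward) = (fields[0], fields[1], fields[2]);
        let obs: Vec<f64> = fields[3..].iter().filter_map(|s| s.parse().ok()).collect();
        println!("episode {episode}, step {step}, reward {reward}, obs {obs:?}");
    }
    Ok(())
}
```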
Deep Q-network (DQN) on cartpole environment
The following command trains a DQN agent:
$ cargo run --example dqn_cartpole
After training, the trained agent runs for 5 episodes. The parameters of the trained Q-network (and the target network) are saved in examples/model/dqn_cartpole.
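For reference, the quantity a DQN agent regresses its Q-network toward is the standard target r + γ max_a Q_target(s', a), computed with the target network whose parameters are saved alongside the Q-network. The sketch below is a dependency-free illustration of that standard update target, not Border's actual code.

```rust
// Compute DQN regression targets for a batch of transitions.
// For terminal transitions the target is just the reward.
fn dqn_targets(
    rewards: &[f32],
    dones: &[bool],
    next_q_target: &[Vec<f32>], // Q_target(s', a) for every action a
    gamma: f32,
) -> Vec<f32> {
    rewards
        .iter()
        .zip(dones)
        .zip(next_q_target)
        .map(|((&r, &done), q_next)| {
            let max_q = q_next.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            if done { r } else { r + gamma * max_q }
        })
        .collect()
}

fn main() {
    let targets = dqn_targets(&[1.0, 1.0], &[false, true], &[vec![0.2, 0.5], vec![0.1, 0.3]], 0.99);
    println!("{targets:?}"); // [1.495, 1.0]
}
```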
Soft actor-critic (SAC) on pendulum environment
The following command trains a SAC agent on Pendulum-v0, which has a continuous action space:
$ cargo run --example sac_pendulum
The code defines an action filter that doubles the torque in the environment.
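To illustrate the idea, an action filter is simply a transformation applied to the agent's action before it reaches the environment; here it scales the torque by a factor of two. The trait and type names below are made up for this sketch and are not Border's API.

```rust
// Illustrative action filter: rescales the policy output before it is
// applied to the environment.
trait ActionFilter {
    fn filt(&self, act: Vec<f32>) -> Vec<f32>;
}

/// Doubles the torque command sent to the pendulum.
struct DoubleTorque;

impl ActionFilter for DoubleTorque {
    fn filt(&self, act: Vec<f32>) -> Vec<f32> {
        act.into_iter().map(|a| 2.0 * a).collect()
    }
}

fn main() {
    let filter = DoubleTorque;
    // A raw action of 0.5 from the agent becomes a torque of 1.0.
    assert_eq!(filter.filt(vec![0.5]), vec![1.0]);
}
```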
Atari games
The following command trains a DQN agent on PongNoFrameskip-v4:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4
During training, the program saves the model parameters when the evaluation reward reaches its best value so far. The agent can be trained on other Atari games (e.g., SeaquestNoFrameskip-v4) by replacing the environment name in the above command.
For Pong, you can download a pretrained agent from my Google Drive and see how it plays with the following command:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4 --play-gdrive
The pretrained agent will be saved locally in $HOME/.border/model.
Vectorized environment for Atari games
(The code might be broken due to recent changes. It will be fixed in the future. The description below is for an older version.)
The following command trains a DQN agent in a vectorized environment of Pong:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_pong_vecenv
The code demonstrates how to use vectorized environments, in which 4 environments run synchronously. Training took about 11 hours for 2M steps (8M transition samples) on a g3s.xlarge instance of EC2. The hyperparameter values, tuned specifically for Pong rather than for Atari games in general, are adapted from the book Deep Reinforcement Learning Hands-On. The learning curve is shown below.
After the training, you can see how the agent plays:
$ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
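For reference, the sketch below shows in plain Rust what a vectorized environment does conceptually: several environment copies are stepped in lockstep and their observations, rewards, and termination flags are returned as batches. It is an illustration only, not Border's implementation; all names in it are made up.

```rust
// Toy environment interface used only to make the sketch runnable.
trait Env {
    fn reset(&mut self) -> f32;
    fn step(&mut self, act: f32) -> (f32, f32, bool);
}

struct CountEnv { t: u32 }

impl Env for CountEnv {
    fn reset(&mut self) -> f32 { self.t = 0; 0.0 }
    fn step(&mut self, act: f32) -> (f32, f32, bool) {
        self.t += 1;
        (self.t as f32 + act, 1.0, self.t >= 10)
    }
}

struct VecEnv<E: Env> { envs: Vec<E> }

impl<E: Env> VecEnv<E> {
    // Steps every worker with its own action, resets finished episodes,
    // and returns batched observations, rewards, and done flags.
    fn step(&mut self, acts: &[f32]) -> (Vec<f32>, Vec<f32>, Vec<bool>) {
        let (mut obs, mut rews, mut dones) = (vec![], vec![], vec![]);
        for (env, &a) in self.envs.iter_mut().zip(acts) {
            let (o, r, d) = env.step(a);
            obs.push(if d { env.reset() } else { o });
            rews.push(r);
            dones.push(d);
        }
        (obs, rews, dones)
    }
}

fn main() {
    let mut venv = VecEnv { envs: (0..4).map(|_| CountEnv { t: 0 }).collect() };
    let (obs, rews, dones) = venv.step(&[0.0; 4]);
    println!("obs {obs:?}, rewards {rews:?}, dones {dones:?}");
}
```

Stepping the workers in lockstep lets the agent evaluate its policy network on a batch of observations at once, which is where most of the speedup over a single environment comes from.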
Features
- Environments which wrap gym using PyO3 and ndarray
- Interfaces to record quantities during training or evaluation
- Tensorboard support using tensorboard-rs
- Vectorized environment using a tweaked atari_wrapper.py, adapted from the RL example in tch
- Agents based on tch
  - Currently includes DQN, SAC, and Implicit Quantile Network (IQN)
Roadmap
- More tests and documentation
- More environments
- More RL algorithms
License
Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).