Border
Border is a reinforcement learning library in Rust. For reusability of both RL environments and agents, the library provides reference implementations of environments and agents that are independent of each other. See the documentation for more information about how to use the library.
Status
Border is currently under development. The API is unstable.
Prerequisites
In order to run the examples, install Python >= 3.7 and gym, for which the library provides a wrapper using PyO3. As the agents used in the examples are based on tch-rs, libtorch also needs to be installed. Some examples require PyBullet Gym.
Examples
Random policy on cartpole environment
The following command runs a random controller (policy) for 5 episodes in CartPole-v0:
$ cargo run --example random_cartpole
It renders the environment during the episodes and generates a CSV file in examples/model containing the sequences of observation and reward values for the episodes.
$ head -n3 examples/model/random_cartpole_eval.csv
0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
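Each row appears to contain the episode index, the step within the episode, the reward, and the four CartPole observation values (cart position, cart velocity, pole angle, pole angular velocity). The snippet below is a small, dependency-free sketch (not part of Border) of how the file could be parsed in Rust; the column layout is an assumption based on the sample output above.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Assumed columns: episode, step, reward, then the observation components.
    let reader = BufReader::new(File::open("examples/model/random_cartpole_eval.csv")?);
    for line in reader.lines() {
        let line = line?;
        let fields: Vec<&str> = line.split(',').collect();
        let (episode, step, reward) = (fields[0], fields[1], fields[2]);
        let obs: Vec<f64> = fields[3..].iter().filter_map(|s| s.parse().ok()).collect();
        println!("episode {episode}, step {step}, reward {reward}, obs {obs:?}");
    }
    Ok(())
}
```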
Deep Q-network (DQN) on cartpole environment
The following command trains a DQN agent:
$ cargo run --example dqn_cartpole
After training, the trained agent runs for 5 episodes. The parameters of the trained Q-network (and the target network) are saved in examples/model/dqn_cartpole.
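For reference, the quantity a DQN agent regresses its Q-network toward is the standard target r + γ max_a Q_target(s', a), computed with the target network whose parameters are saved alongside the Q-network. The sketch below is a dependency-free illustration of that standard update target, not Border's actual code.

```rust
// Compute DQN regression targets for a batch of transitions.
// For terminal transitions the target is just the reward.
fn dqn_targets(
    rewards: &[f32],
    dones: &[bool],
    next_q_target: &[Vec<f32>], // Q_target(s', a) for every action a
    gamma: f32,
) -> Vec<f32> {
    rewards
        .iter()
        .zip(dones)
        .zip(next_q_target)
        .map(|((&r, &done), q_next)| {
            let max_q = q_next.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            if done { r } else { r + gamma * max_q }
        })
        .collect()
}

fn main() {
    let targets = dqn_targets(&[1.0, 1.0], &[false, true], &[vec![0.2, 0.5], vec![0.1, 0.3]], 0.99);
    println!("{targets:?}"); // [1.495, 1.0]
}
```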
Soft actor-critic (SAC) on pendulum environment
The following command trains a SAC agent on Pendulum-v0, which has a continuous action space:
$ cargo run --example sac_pendulum
The code defines an action filter that doubles the torque in the environment.
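To illustrate the idea, an action filter is simply a transformation applied to the agent's action before it reaches the environment; here it scales the torque by a factor of two. The trait and type names below are made up for this sketch and are not Border's API.

```rust
// Illustrative action filter: rescales the policy output before it is
// applied to the environment.
trait ActionFilter {
    fn filt(&self, act: Vec<f32>) -> Vec<f32>;
}

/// Doubles the torque command sent to the pendulum.
struct DoubleTorque;

impl ActionFilter for DoubleTorque {
    fn filt(&self, act: Vec<f32>) -> Vec<f32> {
        act.into_iter().map(|a| 2.0 * a).collect()
    }
}

fn main() {
    let filter = DoubleTorque;
    // A raw action of 0.5 from the agent becomes a torque of 1.0.
    assert_eq!(filter.filt(vec![0.5]), vec![1.0]);
}
```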
Atari games
The following command trains a DQN agent on PongNoFrameskip-v4:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4
During training, the program saves the model parameters when the evaluation reward reaches its best value so far. The agent can be trained on other Atari games (e.g., SeaquestNoFrameskip-v4) by replacing the environment name in the above command.
For Pong, you can download a pretrained agent from my Google Drive and see how it plays with the following command:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4 --play-gdrive
The pretrained agent will be saved locally in $HOME/.border/model.
Vectorized environment for Atari games
(The code might be broken due to recent changes. It will be fixed in the future. The description below is for an older version.)
The following command trains a DQN agent in a vectorized environment of Pong:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_pong_vecenv
The code demonstrates how to use vectorized environments, in which 4 environments run synchronously. Training took about 11 hours for 2M steps (8M transition samples) on a g3s.xlarge instance of EC2. The hyperparameter values, tuned specifically for Pong rather than for Atari games in general, are adapted from the book Deep Reinforcement Learning Hands-On. The learning curve is shown below.
After the training, you can see how the agent plays:
$ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
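For reference, the sketch below shows in plain Rust what a vectorized environment does conceptually: several environment copies are stepped in lockstep and their observations, rewards, and termination flags are returned as batches. It is an illustration only, not Border's implementation; all names in it are made up.

```rust
// Toy environment interface used only to make the sketch runnable.
trait Env {
    fn reset(&mut self) -> f32;
    fn step(&mut self, act: f32) -> (f32, f32, bool);
}

struct CountEnv { t: u32 }

impl Env for CountEnv {
    fn reset(&mut self) -> f32 { self.t = 0; 0.0 }
    fn step(&mut self, act: f32) -> (f32, f32, bool) {
        self.t += 1;
        (self.t as f32 + act, 1.0, self.t >= 10)
    }
}

struct VecEnv<E: Env> { envs: Vec<E> }

impl<E: Env> VecEnv<E> {
    // Steps every worker with its own action, resets finished episodes,
    // and returns batched observations, rewards, and done flags.
    fn step(&mut self, acts: &[f32]) -> (Vec<f32>, Vec<f32>, Vec<bool>) {
        let (mut obs, mut rews, mut dones) = (vec![], vec![], vec![]);
        for (env, &a) in self.envs.iter_mut().zip(acts) {
            let (o, r, d) = env.step(a);
            obs.push(if d { env.reset() } else { o });
            rews.push(r);
            dones.push(d);
        }
        (obs, rews, dones)
    }
}

fn main() {
    let mut venv = VecEnv { envs: (0..4).map(|_| CountEnv { t: 0 }).collect() };
    let (obs, rews, dones) = venv.step(&[0.0; 4]);
    println!("obs {obs:?}, rewards {rews:?}, dones {dones:?}");
}
```

Stepping the workers in lockstep lets the agent evaluate its policy network on a batch of observations at once, which is where most of the speedup over a single environment comes from.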
Features
- Environments which wrap gym using PyO3 and ndarray
- Interfaces to record quantities during training or evaluation
- Tensorboard support using tensorboard-rs
- Vectorized environment using a tweaked atari_wrapper.py, adapted from the RL example in tch
- Agents based on tch
  - Currently includes DQN, SAC, and Implicit Quantile Network (IQN)
Roadmap
- More tests and documentation
- More environments
- More RL algorithms
License
Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).