
# 🏋️‍♀️ bevy_rl

🏗️ Build 🤔 Reinforcement Learning 🏋🏿‍♂️ Gym environments with 🕊 Bevy engine to train 👾 AI agents that 💡 learn from 📺 screen pixels.

## Compatibility

| bevy version | bevy_rl version |
| ------------ | --------------- |
| 0.7          | 0.0.5           |
| 0.8          | 0.8.4           |
| 0.9          | 0.9.1           |

## 📝 Features

- A set of APIs for implementing an OpenAI Gym-style interface
- A REST API to control agents
- Rendering to a RAM memory buffer

## 📋 Changelog

- 0.8.4
  - Added object representation of observation space

## 👩‍💻 Usage

### 1. Define App States


```rust
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
enum AppState {
    InGame,  // where all the game logic is executed
    Control, // a paused state in which bevy_rl waits for agent actions
    Reset,   // a request to reset environment state
}
```
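
For bevy_rl to drive the pause/resume cycle, the enum also has to be registered as the app's state. A minimal fragment, assuming Bevy 0.8/0.9 stack-based states:

```rust
// Start in the InGame state; Control is pushed on top of it
// whenever it is time to collect agent actions.
app.add_state(AppState::InGame);
```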

### 2. Define Action Space and Observation Space

An action space is the set of actions that an agent can take. An observation space is the set of observations that an agent receives. An action space can be discrete or continuous. Observations must be serializable to JSON with the serde_json crate.

```rust
use bitflags::bitflags;
use serde::Serialize;

// Action space
bitflags! {
    #[derive(Default)]
    pub struct PlayerActionFlags: u32 {
        const FORWARD = 1 << 0;
        const BACKWARD = 1 << 1;
        const LEFT = 1 << 2;
        const RIGHT = 1 << 3;
    }
}

// Observation space
#[derive(Default, Serialize, Clone)]
pub struct EnvironmentState {
    pub map: GameMap,
    pub actors: Vec<Actor>,
}
```
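
Since `EnvironmentState` derives `Serialize`, bevy_rl can serve it as JSON. A quick sanity-check sketch, assuming `GameMap` and `Actor` also derive `Serialize` and `Default`:

```rust
// Hypothetical check that the observation space serializes cleanly.
let state = EnvironmentState::default();
let json = serde_json::to_string(&state).expect("observation space must serialize to JSON");
println!("{json}");
```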

### 3. Enable AI Gym Plugin

Width and height should be at least 256; otherwise wgpu will panic.

```rust
let gym_settings = AIGymSettings {
    width: 256,
    height: 256,
    num_agents: 16,
};

app
    .insert_resource(gym_settings.clone())
    .insert_resource(Arc::new(Mutex::new(AIGymState::<
        PlayerActionFlags,
        EnvironmentState,
    >::new(gym_settings.clone()))))
    .add_plugin(AIGymPlugin::<PlayerActionFlags, EnvironmentState>::default());
```
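
The state is wrapped in `Arc<Mutex<...>>`, presumably because it is shared with the layer serving the REST API. Any game system can reach it through the resource; a hedged fragment showing the pattern used throughout the next step (`my_scoring_system` is a hypothetical name):

```rust
// Lock the shared bevy_rl state inside a system to call its methods.
fn my_scoring_system(
    ai_gym_state: ResMut<Arc<Mutex<AIGymState<PlayerActionFlags, EnvironmentState>>>>,
) {
    let mut ai_gym_state = ai_gym_state.lock().unwrap();
    // ... call AIGymState methods here, e.g. ai_gym_state.set_reward(0, 1.0);
}
```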

### 4. Implement Environment Logic

A `DelayedControlTimer` periodically pauses environment execution so that agents can take actions.

```rust
struct DelayedControlTimer(Timer);
```

Register the systems that implement the environment logic:

```rust
app.add_system_set(
    SystemSet::on_update(AppState::InGame)
        .with_system(turnbased_control_system_switch),
);

app.insert_resource(DelayedControlTimer(Timer::from_seconds(0.1, true))); // 10 Hz
app.add_system_set(
    SystemSet::on_update(AppState::Control)
        // Game Systems
        .with_system(turnbased_text_control_system) // System that parses user command
        .with_system(execute_reset_request),        // System that performs environment state reset
);
```
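
The Reset state also needs a system that actually rebuilds the world (a sketch of such a system, named `execute_reset` here for illustration, is shown at the end of this section). Its registration could look like this:

```rust
// Hypothetical: run the game-specific reset system when the environment
// enters the Reset state.
app.add_system_set(
    SystemSet::on_enter(AppState::Reset).with_system(execute_reset),
);
```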

`turnbased_control_system_switch` should pause the game world and signal bevy_rl to poll for agent actions.

```rust
fn turnbased_control_system_switch(
    mut app_state: ResMut<State<AppState>>,
    time: Res<Time>,
    mut timer: ResMut<DelayedControlTimer>,
    ai_gym_state: ResMut<Arc<Mutex<AIGymState<PlayerActionFlags, EnvironmentState>>>>,
    ai_gym_settings: Res<AIGymSettings>,
    mut physics_time: ResMut<PhysicsTime>,
) {
    if timer.0.tick(time.delta()).just_finished() {
        // Pause the game world and switch to the Control state
        app_state.overwrite_push(AppState::Control).unwrap();
        physics_time.pause();

        // Report a completed step for every agent so bevy_rl can answer
        // pending requests
        let ai_gym_state = ai_gym_state.lock().unwrap();
        let results = (0..ai_gym_settings.num_agents).map(|_| true).collect();
        ai_gym_state.send_step_result(results);
    }
}
```

`execute_reset_request` handles environment reset requests. `turnbased_text_control_system` parses agent actions and issues commands to the agents in the environment via `control_agents`.

```rust
pub(crate) fn execute_reset_request(
    mut app_state: ResMut<State<AppState>>,
    ai_gym_state: ResMut<Arc<Mutex<AIGymState<PlayerActionFlags, EnvironmentState>>>>,
) {
    let ai_gym_state = ai_gym_state.lock().unwrap();
    if !ai_gym_state.is_reset_request() {
        return;
    }

    // Acknowledge the reset request and switch to the Reset state
    ai_gym_state.receive_reset_request();
    app_state.set(AppState::Reset).unwrap();
}
```


```rust
pub(crate) fn turnbased_text_control_system(
    agent_movement_q: Query<(&mut heron::prelude::Velocity, &mut Transform, &Actor)>,
    collision_events: EventReader<CollisionEvent>,
    event_gun_shot: EventWriter<EventGunShot>,
    ai_gym_state: ResMut<Arc<Mutex<AIGymState<PlayerActionFlags, EnvironmentState>>>>,
    ai_gym_settings: Res<AIGymSettings>,
    mut app_state: ResMut<State<AppState>>,
    mut physics_time: ResMut<PhysicsTime>,
) {
    let mut ai_gym_state = ai_gym_state.lock().unwrap();

    // Skip this system if the user hasn't sent a request this frame
    if !ai_gym_state.is_next_action() {
        return;
    }

    let unparsed_actions = ai_gym_state.receive_action_strings();
    let mut actions: Vec<Option<PlayerActionFlags>> =
        (0..ai_gym_settings.num_agents).map(|_| None).collect();

    for i in 0..unparsed_actions.len() {
        let unparsed_action = unparsed_actions[i].clone();
        ai_gym_state.set_reward(i, 0.0);

        if unparsed_action.is_none() {
            actions[i] = None;
            continue;
        }

        let action = match unparsed_action.unwrap().as_str() {
            "FORWARD" => Some(PlayerActionFlags::FORWARD),
            "BACKWARD" => Some(PlayerActionFlags::BACKWARD),
            "LEFT" => Some(PlayerActionFlags::LEFT),
            "RIGHT" => Some(PlayerActionFlags::RIGHT),
            _ => None,
        };

        actions[i] = action;
    }

    // Publish the current environment state to bevy_rl (a real
    // implementation would fill in `map` and `actors` here)
    ai_gym_state.set_env_state(EnvironmentState::default());

    physics_time.resume();
    control_agents(actions, agent_movement_q, collision_events, event_gun_shot);

    app_state.pop().unwrap();
}
```
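
Finally, the reset system referenced above. What it despawns and respawns is game-specific; the part that matters to bevy_rl is reporting the finished reset and leaving the Reset state. A sketch:

```rust
pub(crate) fn execute_reset(
    mut app_state: ResMut<State<AppState>>,
    ai_gym_state: ResMut<Arc<Mutex<AIGymState<PlayerActionFlags, EnvironmentState>>>>,
) {
    // ... despawn and respawn game entities here (game-specific) ...

    // Tell bevy_rl the reset is done and return to normal execution
    let ai_gym_state = ai_gym_state.lock().unwrap();
    ai_gym_state.send_reset_result(true);
    app_state.overwrite_set(AppState::InGame).unwrap();
}
```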

## 💻 AIGymState API

| Method | Description |
| ------ | ----------- |
| `send_step_result(results: Vec<bool>)` | Send when agent interactions are complete |
| `send_reset_result(result: bool)` | Send when a reset request has been handled |
| `receive_action_strings() -> Vec<Option<String>>` | Receive raw agent actions as strings |
| `receive_reset_request()` | Acknowledge a pending reset request |
| `is_next_action() -> bool` | Whether agent actions have been supplied |
| `is_reset_request() -> bool` | Whether a reset request was sent |
| `set_reward(agent_index: usize, score: f32)` | Set the reward for an agent |
| `set_terminated(agent_index: usize, result: bool)` | Set the termination status for an agent |
| `reset()` | Reset the bevy_rl state |
| `set_env_state(state: B)` | Set the current environment state (`B` is the observation space type) |
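
These methods are meant to be called from your own game systems. For example, a hit-detection system could credit the shooter and terminate the victim's episode; a hypothetical fragment (`shooter` and `victim` are agent indices owned by the game logic):

```rust
// Reward the shooter and end the victim's episode.
let mut ai_gym_state = ai_gym_state.lock().unwrap();
ai_gym_state.set_reward(shooter, 1.0);
ai_gym_state.set_terminated(victim, true);
```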

## 🌐 REST API

| Method | Verb | URL |
| ------ | ---- | --- |
| Camera Pixels | GET | `http://localhost:7878/visual_observations` |
| State | GET | `http://localhost:7878/state` |
| Reset Environment | POST | `http://localhost:7878/reset` |
| Step | GET | `http://localhost:7878/step` (with `payload=ACTION`) |
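
A training loop drives the environment through these endpoints with plain HTTP requests. A minimal sketch of one interaction from Rust, using the `reqwest` crate with its `blocking` feature (the exact `payload` format for multiple agents depends on your action space; any HTTP client or language works equally well):

```rust
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let client = reqwest::blocking::Client::new();

    // Submit an action and wait for the step to complete.
    let step = client
        .get("http://localhost:7878/step")
        .query(&[("payload", "FORWARD")])
        .send()?
        .text()?;
    println!("step result: {step}");

    // Fetch the JSON-serialized observation space.
    let state = reqwest::blocking::get("http://localhost:7878/state")?.text()?;
    println!("state: {state}");

    Ok(())
}
```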

## ✍️ Examples

- `bevy_rl_shooter`: an example FPS project
