API functions

Based on the CBEngine, we provide APIs that share the similar parameters as the OpenAI Gym environment.

Simulation Initialization

env_config = {
    "simulator_cfg_file": simulator_cfg_file,
    "thread_num": 8,
    "gym_dict": gym_configs,
    "metric_period": 200,
    "vehicle_info_path": "/starter-kit/log/"
}
env = CBEngine_rllib_class(env_config)

simulator_cfg_file:

the path of simulator.cfg
be used for initialize engine

Example

#configuration for simulator

# Time Parameters
start_time_epoch = 0
max_time_epoch = 3600

# Roadnet file and flow file used to simulate
road_file_addr : /starter-kit/data/roadnet_round3.txt
vehicle_file_addr : /starter-kit/data/flow_round3_flow0.txt


# Log configuration
# Don't change the value of report_log_mode
report_log_mode : normal
# Log path
report_log_addr : ./log/
# Log interval
report_log_rate = 10
# Log configuration to track the vehicle. Don't change the value
warning_stop_time_log = 100

thread_num:

the thread number used for engine

gym_dict:

the configuration used for initialize gym
a dict
The meaning of it is clarified at next section.
stored in /agent/gym_cfg.py, as a member variable of class gym_cfg.

Example of gym_dict

gym_dict = {
    'observation_features':['lane_vehicle_num'],
    'observation_dimension':24,
    'custom_observation' : False
}

metric_period:

the interval of scoring
At each intervals, output a score json file

vehicle_info_path:

the path of vehicle informaton log

Environment Configuration: gym_cfg.py

gym_cfg.py in agent folder defines the configuration of gym environment. Currently it contains observation features. There are two options in observation features, namely lane_speed , lane_vehicle_num, which determines the content of observations you get from the env.step() api. You must write at least one of the two features.

class gym_cfg():
    def __init__(self):


        self.cfg = {
            'observation_features':['lane_vehicle_num'],
            'observation_dimension':24,
            'custom_observation' : False
        }

self.cfg:

store the configuration of gym
custom_observation’: If ‘True’, use costom observation feature in CBEngine_round3.py. If ‘False’, use ‘observation_features’
observation_features’ : Same as round2. Add ‘classic’ observation feature, which has dimension of 16. It’s order will be same as the order of observation from env.step()
observation_dimension’ : The dimension of observation. Need to be correct both custom observation and default observation.

Simulation Step

step(actions):

Simulate 10 seconds in engine.
The format of action is specified below.
return observation, reward, info, dones
The format of observations, rewards, infos and dones is specified below.

observation, reward, dones, info = env.step(action)

actions:

Required to be a dict:

``{agent_id_1: phase_1, ... , agent_id_n: phase_n}``

Set agent_id to some phase (The figure below demonstrates the allowed traffic movements in each phase)
The phase is required to be an integer in the range [1, 8] (note there is no 0)
The initial phases of all agents are set to 1
The phase of an agent will remain the same as the last phase if not specified in the dict actions
Attention: If an agent is switched to a different phase, there will be a 5 seconds period of ‘all red’ at this agent, which means all vehicles could not pass this intersection. If continuously switched to different phase, agent would be always ‘all red’.
In final round, agent_id will be str rather than int

observations:

a dict
format:

{
    # agent_id : {'observation' : obs}
    '12647332106' : {'observation': [0, 0, 0, 0, 0, 0, 0, 2, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1]}
}

The key is agent_id (str) , the value is a dict. The dict only contains one key “observation”, and its value is a list catenated by the order in 'observation_features' of gym_cfg.py
Format of the 'lane_speed', 'lane_vehicle_num' and 'classic' observations_values are described below:

# observation values:

# 'lane_speed' sample: [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2]
# There are 24 lanes left. The order of their roads is defined in 'signal' part of roadnet file
# the order is :inroad0lane0, inroad0lane1, inroad0lane2, inroad1lane0 ... inroad3lane2, outroad0lane0, outroad0lane1 ...
# Note that, [lane0, lane1, lane2] indicates the [left_turn lane, approach lane, right_turn lane] repespectively of the corresponding road.
# The order of roads are determined clockwise.
# If there is a -1 in the signal part of roadnet file (which indicates this road doesn't exist), then the returned observation of the corresponding lanes on this road are also 3 -1s.
# -2 indicating there's no vehicle on this lane

# 'lane_vehcile_num' sample [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,]
# There are 24 lanes left. The order of their roads is defined in 'signal' part of roadnet file
# the order is :inroad0lane0, inroad0lane1, inroad0lane2, inroad1lane0 ... inroad3lane2, outroad0lane0, outroad0lane1 ...
# If there is -1 in signal part of roadnet file, then the lane of this road is filled with three -1.


# 'classic' sample: [1, 0, 0, 0, 3, 2, 1, 4, 1, 0, 0, 0, 1, 0, 0, 0]
# the first 8 values are the number of vehicles of left-turing and go-straight lanes ordered by the 'signal' part of roadnet file
# the last 8 values are the one-hot code indicates which lanes are available in last signal phase

rewards:

a dict
key is str
must implement in CBEngine_round3
{agent_id_1: reward_values_1, …, agent_id_n: reward_values_n}
Format of reward_values:
reward in rllib needs to be single values. We provide 2 types of rewards definition as demos, pressure and queue length , along with the old rewards.

# Sample Output
{
0: -0.5
}

Here is an illustration of the lane index in observation and reward.

info:

a dict
key is vehicle ID, values includes ‘distance’, ‘drivable’, ‘road’, ‘speed’ and ‘start_time’
{vehicle_id_1: vehicle_info_1, …, vehicle_id_m: vehicle_info_m}
env.set_info(1) to return a dictionary of vehicle information, otherwise, return an empty dictionary.
“route” and “t_ff” are removed from “vehicle_info” in final phase

0: # 0 is the vehicle ID
{
    "distance": [259.0], # The distance from this vehicle to the start point of current road.
    "drivable": [29301.0], # Current lane of this vehicle. Here 293 is the road segment ID, 01 indicates the middle lane (00 and 02 indicate inner and outer lanes respectively)
    "road": [293.0], # Current road of this vehicle.
    "speed": [0.0], # Current instantaneous speed of this vehicle.
    "start_time": [73.0], # Time of creation of this vehicle.
    },
...
}

dones:

a dict
{agent_id_1: bool_value_1, …, agent_id_n: bool_value_n}
Indicating whether the simulation of an agent is ended.

Simulation Reset

reset:

Reset the simulation
return a dict: observation
reset the engine

observation = env.reset()

Other interface

The following interfaces of simulation environment are also provided:

set_warning(flag):

env.set_warning(0): set flag as False to turn off the warning of invalid phases. The warning will be issued if a green phase to an inexistent lane.

set_log(flag):

env.set_log(0): set flag as False to turn off logs for a faster speed when training. Note that the score function will not work if the logging is turned off.

set_ui(flag):

env.set_ui(0): set flag as False to turn off visualization logs for a faster speed when training.

set_info(flag):

env.set_info(0): set flag as False to make info that returned from env.step to be None, which can make training faster.