SMARTS Quick Start

April 21, 2023 07:56:56


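Summary: This post gives a brief introduction to SMARTS. It covers (1) installing SMARTS; (2) some core SMARTS concepts: SMARTS Env, which provides interfaces for different RL frameworks; the three RL elements in SMARTS, namely the predefined Obs, Action, and Reward; and SMARTS Agent, which defines the information each vehicle can obtain and can be regarded as part of SMARTS Env.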


Introduction

This post gives a brief introduction to SMARTS, covering the following:

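Installing SMARTS;
Some core SMARTS concepts:
- SMARTS Env, which provides interfaces for different RL frameworks;
- The three RL elements in SMARTS: the predefined Obs, Action, and Reward;
- SMARTS Agent, which defines the information each vehicle can obtain; it can be regarded as part of SMARTS Env.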

Finally, an example ties these pieces together.

References

SMARTS GitHub repository: the official repository; make sure to use the latest version.
SMARTS Setup: the official installation guide.

Installing SMARTS

For the latest installation instructions, see SMARTS Setup; here is a brief summary. First, clone the repository:

$ git clone https://github.com/huawei-noah/SMARTS.git
$ cd <path/to/SMARTS>
# For latest stable release
$ git checkout master

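Inside the project, setup.cfg lists the required dependencies as well as the extra libraries pulled in by each optional extra (under options). Install in development (editable) mode so that later updates to the repository take effect without reinstalling: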

pip install -e '.[camera_obs,test]'

After installation, run the following command to test it. The results are written to sanity_test_result.xml, which you can inspect if anything fails.

make sanity-test
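Besides running the sanity test, a quick way to confirm that the editable install is visible to the interpreter you will train with is to check that the top-level `smarts` package can be found. This is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and not part of SMARTS:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "smarts" is the package name used by the imports later in this post.
    print("smarts importable:", is_importable("smarts"))
```

If this prints False, the package was most likely installed into a different environment than the one running the script.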

Environments

This part mainly follows SMARTS--Environment. The SMARTS environment module is defined in the env package. SMARTS currently provides two environments for training:

HiWayEnv utilizing a gymnasium.Env interface
RLlibHiWayEnv customized for RLlib training

HiWayEnv environment

HiWayEnv subclasses gymnasium.Env and supports the gym API, such as reset, step, and close. Below is a HiWayEnv example. The key point is to pass smarts.env:hiway-v0 to gym.make:

import gymnasium as gym
# scenario_path, AGENT_ID and agent_spec are assumed to be defined beforehand.
# Make env
env = gym.make(
    "smarts.env:hiway-v0",  # Env entry name.
    scenarios=[scenario_path],  # List of paths to scenario folders.
    agent_interfaces={AGENT_ID: agent_spec.interface},  # Dictionary mapping agents to agent interfaces.
    headless=False,  # False to enable Envision visualization of the environment.
    seed=42,  # RNG seed. Seeds are set at the start of simulation, and never automatically re-seeded.
)
# Reset env and build agent.
observations = env.reset()
agent = agent_spec.build_agent()
# Step env.
agent_obs = observations[AGENT_ID]
agent_action = agent.act(agent_obs)
observations, rewards, dones, infos = env.step({AGENT_ID: agent_action})
# Close env.
env.close()
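The snippet above performs a single step. A training or evaluation run normally keeps stepping until the episode ends. The sketch below shows that loop against a hypothetical stand-in environment (`FakeHiwayEnv`, not part of SMARTS) that follows the same dict-based protocol: `reset()` returns observations keyed by agent id, and `step()` returns `(observations, rewards, dones, infos)`. The episode-level `"__all__"` key in `dones` is an assumption borrowed from the RLlib multi-agent convention; check the version of SMARTS you use.

```python
from typing import Any, Dict, Tuple

AGENT_ID = "Agent_001"  # hypothetical agent id


class FakeHiwayEnv:
    """Hypothetical stand-in mimicking the dict-based reset/step protocol above.
    It is NOT SMARTS; it simply ends the episode after `max_steps` steps."""

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> Dict[str, Any]:
        self.t = 0
        return {AGENT_ID: {"t": self.t}}

    def step(self, actions: Dict[str, Any]) -> Tuple[dict, dict, dict, dict]:
        self.t += 1
        done = self.t >= self.max_steps
        obs = {aid: {"t": self.t} for aid in actions}
        rewards = {aid: 1.0 for aid in actions}
        dones = {aid: done for aid in actions}
        dones["__all__"] = done  # assumed episode-level flag (RLlib-style)
        return obs, rewards, dones, {aid: {} for aid in actions}

    def close(self) -> None:
        pass


class ConstantAgent:
    """Hypothetical agent that always returns the same (speed, lane_change) action."""

    def act(self, obs):
        return (10.0, 0)


def run_episode(env, agent) -> float:
    """Step until the episode ends and return the agent's accumulated reward."""
    observations = env.reset()
    total = 0.0
    dones = {"__all__": False}
    while not dones["__all__"]:
        action = agent.act(observations[AGENT_ID])
        observations, rewards, dones, infos = env.step({AGENT_ID: action})
        total += rewards[AGENT_ID]
    env.close()
    return total
```

With the real environment, `FakeHiwayEnv()` would be replaced by the `env` created with `gym.make`, and `ConstantAgent()` by `agent_spec.build_agent()`.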

For multi-agent environments, HiWayEnvV1 can be used. When scenarios is an iterable, each call to env.reset switches to the next scenario, so every scenario gets visited.
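The reset behavior described for an iterable of scenarios can be pictured as a round-robin iterator. The sketch below uses `itertools.cycle` to model it; `ScenarioCycler` is a hypothetical helper, and whether SMARTS wraps around in exactly this order (rather than, say, shuffling) is an assumption here, not something stated above:

```python
import itertools
from typing import Iterable, Iterator, Optional


class ScenarioCycler:
    """Hypothetical model of scenario switching: each reset() moves on to the
    next scenario and wraps around after the last one (assumed round-robin)."""

    def __init__(self, scenarios: Iterable[str]):
        self._iter: Iterator[str] = itertools.cycle(list(scenarios))
        self.current: Optional[str] = None

    def reset(self) -> str:
        self.current = next(self._iter)
        return self.current


# Hypothetical scenario folder names, used only for illustration.
cycler = ScenarioCycler(["scenario_a", "scenario_b"])
visited = [cycler.reset() for _ in range(5)]
```

The practical consequence is that running at least as many episodes as there are scenarios visits each of them once.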

RLlibHiWayEnv environment

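RLlibHiWayEnv subclasses MultiAgentEnv (from RLlib). It also supports the usual environment API, such as reset, step, and close. Below is an example. Unlike the previous one, RLlibHiWayEnv is imported directly from rllib_hiway_env, and all settings are passed in a single config dictionary that takes agent_specs instead of agent_interfaces: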

from smarts.env.rllib_hiway_env import RLlibHiWayEnv
env = RLlibHiWayEnv(
    config={
        "scenarios": [scenario_path],  # List of paths to scenario folders.
        "agent_specs": {AGENT_ID: agent_spec},  # Dictionary mapping agents to agent specs.
        "headless": False,  # False to enable Envision visualization of the environment.
        "seed": 42,  # RNG seed. Seeds are set at the start of simulation, and never automatically re-seeded.
    }
)
# Reset env and build agent.
observations = env.reset()
agent = agent_spec.build_agent()
# Step env.
agent_obs = observations[AGENT_ID]
agent_action = agent.act(agent_obs)
observations, rewards, dones, infos = env.step({AGENT_ID: agent_action})
# Close env.
env.close()

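Because agents can start and finish at different times, there may be periods when no agent is active; in that case the environment returns an empty dictionary.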
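Because of this, it is safer to build the action dictionary from whichever agent ids are present in the current observations than from a fixed list of agents. A minimal sketch; `compute_actions` and the `policies` mapping are hypothetical names, with each policy being any callable that maps an observation to an action:

```python
from typing import Any, Callable, Dict


def compute_actions(
    observations: Dict[str, Any],
    policies: Dict[str, Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Return actions only for agents that currently have an observation.

    If no agent is active, `observations` is empty and so is the result,
    which can be passed to env.step unchanged.
    """
    return {agent_id: policies[agent_id](obs) for agent_id, obs in observations.items()}
```

For example, with policies for `Agent_001` and `Agent_007` but observations only for `Agent_007`, the result contains a single entry for `Agent_007`.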

Vehicle types in SMARTS Env

SMARTS environments support the following vehicle types:

ego agents - controlled by RL policy currently in training.
social agents - controlled by (pre-trained) policies from the Agent Zoo (see policies). Like ego agents, social agents also use AgentSpec to register with the environment. They interact by watching the observation and returning action messages. Compared to ego agents, social agents are driven by trained models, hence they can provide the behavioral characteristics we want.
traffic vehicles - controlled by an underlying traffic engine, like SUMO or SMARTS.
dataset vehicles - controlled by replay of traffic history from naturalistic datasets such as Argoverse, NGSIM, and Waymo.

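In other words, an ego agent is the agent currently being trained; a social agent follows an existing (pre-trained) policy; a traffic vehicle is controlled by a traffic engine such as SUMO; and a dataset vehicle replays recorded trajectories from a dataset.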

The three RL elements

The sections above introduced environments in SMARTS. Next come the three elements of RL: Observation, Action, and Reward. This part follows SMARTS-Observation, Action, and Reward.

Observation

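For the full list of observation contents, see SMARTS-Observation, Action, and Reward; only part of it is covered here. Assume the simulation is in the state shown in the figure below, which helps explain some of the state fields:

ego_vehicle_state: information about the controlled vehicle (its position, current lane, and so on);

neighborhood_vehicle_states: information about the surrounding vehicles;

occupancy_grid_map (recommended when pedestrians are present); the result in this case looks as follows:

drivable_area_grid_map: a visualization of the drivable lanes:

top_down_rgb: a top-down (bird's-eye) rendering: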
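As an example of using ego_vehicle_state together with neighborhood_vehicle_states, a common feature is the distance to the closest surrounding vehicle. The sketch below uses a hypothetical `VehicleState` dataclass in place of the real observation types; it assumes each state exposes a `position` and a `speed`, and it uses 2D `(x, y)` tuples for simplicity, whereas SMARTS observations may store positions differently:

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class VehicleState:
    """Hypothetical stand-in for one vehicle state in an observation."""
    id: str
    position: Tuple[float, float]  # (x, y) in meters
    speed: float  # m/s


def nearest_neighbor(ego: VehicleState, neighbors: Sequence[VehicleState]) -> Optional[VehicleState]:
    """Return the neighbor closest to the ego vehicle, or None if there are no neighbors."""
    if not neighbors:
        return None
    return min(neighbors, key=lambda v: math.dist(ego.position, v.position))
```

Returning None for an empty neighborhood matters in practice, because the neighborhood list is empty whenever no vehicle is within the configured radius.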

Action

SMARTS provides several action types. For the full list, see SMARTS-Observation, Action, and Reward. For example:

Continuous, Action=(throttle, brake, steering)
Lane, a discrete action, one of "keep_lane", "slow_down", "change_lane_left", and "change_lane_right";
ActuatorDynamic, Action=(throttle, brake, steering_rate)
LaneWithContinuousSpeed, Action=(target_speed, lane_change).
TargetPose, a continuous action space
RelativeTargetPose
Trajectory
MultiTargetPose
MPC, adaptive control performed on the vehicle model to match the given trajectory, comprising the vehicle's x coordinates, y coordinates, headings, and speeds.
TrajectoryWithTime
Direct
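Since each action type expects a differently shaped action, a lightweight check before calling env.step can catch mismatches early. The sketch below validates the two lane-based types listed above. The Lane command names come from the list; the LaneWithContinuousSpeed rules (`lane_change` in {-1, 0, 1}, non-negative `target_speed`) are assumptions, and both helper functions are hypothetical rather than SMARTS API:

```python
from typing import Any

# Discrete commands accepted by the Lane action type (names from the list above).
LANE_ACTIONS = {"keep_lane", "slow_down", "change_lane_left", "change_lane_right"}


def is_valid_lane_action(action: Any) -> bool:
    """Check a Lane action: one of the four string commands."""
    return isinstance(action, str) and action in LANE_ACTIONS


def is_valid_lane_with_speed_action(action: Any) -> bool:
    """Check a LaneWithContinuousSpeed action: (target_speed, lane_change).

    Assumption: lane_change is -1, 0 or 1, and target_speed is a non-negative number.
    """
    if not (isinstance(action, (tuple, list)) and len(action) == 2):
        return False
    target_speed, lane_change = action
    return (
        isinstance(target_speed, (int, float))
        and target_speed >= 0
        and lane_change in (-1, 0, 1)
    )
```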

Reward

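The default reward is based on the distance traveled. Let x be the distance the vehicle has traveled since the last step that produced a non-zero reward; the reward equals x once the vehicle has covered at least 0.5 m, and 0 otherwise.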
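To make the thresholding concrete, the sketch below models a distance-based reward in which x accumulates over steps and is paid out (then reset) only once its magnitude reaches 0.5 m. The 0.5 m threshold and the exact boundary handling are assumptions for illustration; `distance_reward` and `rewards_from_steps` are hypothetical helpers, not SMARTS code:

```python
from typing import List, Sequence


def distance_reward(x: float, threshold: float = 0.5) -> float:
    """Reward for an accumulated distance x (meters): x if |x| >= threshold, else 0."""
    return x if abs(x) >= threshold else 0.0


def rewards_from_steps(step_distances: Sequence[float], threshold: float = 0.5) -> List[float]:
    """Per-step rewards when distance accumulates until a non-zero reward is given."""
    pending = 0.0  # distance traveled since the last non-zero reward
    rewards = []
    for d in step_distances:
        pending += d
        r = distance_reward(pending, threshold)
        rewards.append(r)
        if r != 0.0:
            pending = 0.0  # start counting again after a payout
    return rewards
```

For a slow vehicle moving 0.2 m per step, this yields zeros for two steps and then a reward of about 0.6 on the third, so the total reward still tracks the distance covered.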

SMARTS Agent

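This section covers the concept of an agent in SMARTS. Although it is called an agent, it feels to me more like part of the env: feature extraction and control are done per vehicle, and each vehicle can extract different information. So in SMARTS, an agent essentially corresponds to one vehicle.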

In SMARTS, an agent must specify two parts:

interface: used to interact with the environment and obtain information from it;
policy: decides the action based on the obs; a model can be loaded here.

For the full details, see SMARTS-Agent.
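The environment examples earlier use an `agent_spec` object with an `interface` attribute and a `build_agent()` method, which is exactly this interface + policy pairing. The sketch below models that pairing with a hypothetical `MiniAgentSpec` dataclass and a hypothetical `KeepLane` policy, so it runs without SMARTS; the field names `agent_builder` and `agent_params` are illustrative assumptions, and the interface is a plain dict standing in for a real `AgentInterface`:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class MiniAgentSpec:
    """Hypothetical stand-in for an agent spec: pairs an interface (what the
    vehicle observes and how it acts) with a builder for the policy."""
    interface: Dict[str, Any]
    agent_builder: Callable[..., Any]
    agent_params: Dict[str, Any] = field(default_factory=dict)

    def build_agent(self):
        """Construct a fresh policy object from the builder and its parameters."""
        return self.agent_builder(**self.agent_params)


class KeepLane:
    """Hypothetical policy for the Lane action type: always keeps the lane."""

    def act(self, obs):
        return "keep_lane"


spec = MiniAgentSpec(
    interface={"action": "Lane", "max_episode_steps": 1000},
    agent_builder=KeepLane,
)
agent = spec.build_agent()
```

Keeping a builder rather than a ready-made agent lets the environment side read the interface without constructing, and possibly loading a model into, the policy.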

Agent Interface

The AgentInterface regulates information flow between the agent and SMARTS environment.

It specifies the observation from the environment to the agent, by selecting the sensors to enable in the vehicle.
It specifies the action from the agent to the environment. Attribute action controls the action type used. There are multiple action types to choose from ActionSpaceType.

We can either use a predefined interface or define our own.

Predefined interfaces

SMARTS provides the following predefined interfaces, through which a vehicle interacts with the environment:

AgentType.Full
AgentType.StandardWithAbsoluteSteering
AgentType.Standard
AgentType.Laner
AgentType.LanerWithSpeed
AgentType.Tracker
AgentType.TrajectoryInterpolator
AgentType.MPCTracker
AgentType.Boid

SMARTS-Agent describes each of these interfaces. The figure below shows the Full and StandardWithAbsoluteSteering interfaces.

We can also add attributes on top of an existing interface. For example, the following adds lidar observations to Standard:

from smarts.core.agent_interface import AgentInterface, AgentType
agent_interface = AgentInterface.from_type(
    requested_type=AgentType.Standard,
    lidar_point_cloud=True,
)

Custom interfaces

Besides the predefined interfaces above, we can combine the options freely. Here is an example:

from smarts.core.agent_interface import AgentInterface, NeighborhoodVehicles, RGB, Waypoints
from smarts.core.controllers import ActionSpaceType
agent_interface = AgentInterface(
    max_episode_steps=1000,
    waypoint_paths=Waypoints(lookahead=50),  # lookahead 50 meters
    neighborhood_vehicle_states=NeighborhoodVehicles(radius=50),  # only get neighborhood info within 50 meters.
    drivable_area_grid_map=True,
    occupancy_grid_map=True,
    top_down_rgb=RGB(height=128, width=128, resolution=100/128),  # 128x128 pixels RGB image representing a 100x100 meters area.
    lidar_point_cloud=False,
    action=ActionSpaceType.Continuous,
)
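The comment on `top_down_rgb` follows from simple arithmetic: `resolution` is the number of meters per pixel, so a 128-pixel side at 100/128 m per pixel covers 128 × 100/128 = 100 m. The two hypothetical helpers below make that relation explicit, which is handy when changing the image size while keeping the same field of view:

```python
def covered_extent_m(pixels: int, resolution_m_per_px: float) -> float:
    """Side length in meters covered by `pixels` pixels at the given resolution."""
    return pixels * resolution_m_per_px


def resolution_for(extent_m: float, pixels: int) -> float:
    """Meters per pixel needed for `pixels` pixels to cover `extent_m` meters."""
    return extent_m / pixels


side = covered_extent_m(128, 100 / 128)  # 100.0 m, matching the comment above
res_256 = resolution_for(100, 256)  # 0.390625 m/px for a 256x256 image of the same area
```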

Policy

A policy defines the action an agent takes after seeing an obs. Every policy must subclass Agent and implement act(self, obs).

act(self, obs) returns an action that matches the action type. For example, if the action space is LaneWithContinuousSpeed, act returns a tuple (speed, lane_change) of type (float, int).

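Below is a simple policy that follows the waypoints: it drives at the speed limit of the first waypoint on the first path and does not change lanes: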

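from smarts.core.agent import Agent

class FollowWaypoints(Agent):
    def __init__(self):
        """Any policy initialization matters, including loading of model,
        may be performed here.
        """
        pass

    def act(self, obs):
        # Speed limit of the first waypoint on the first candidate path.
        speed_limit = obs.waypoint_paths[0][0].speed_limit
        # LaneWithContinuousSpeed action: (target_speed, lane_change), 0 = stay in lane.
        return (speed_limit, 0)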
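To check a policy like this without launching a simulation, it can be fed a hand-built observation. The sketch below repeats the same `act` logic in a hypothetical `FollowWaypointsSketch` class (without the SMARTS `Agent` base class, so it runs anywhere) and builds a fake observation with `types.SimpleNamespace`; only the `waypoint_paths[i][j].speed_limit` access pattern is taken from the code above, and the numbers are made up:

```python
from types import SimpleNamespace


class FollowWaypointsSketch:
    """Same logic as FollowWaypoints above, without the smarts Agent base class."""

    def act(self, obs):
        # First waypoint of the first candidate path carries the speed limit.
        speed_limit = obs.waypoint_paths[0][0].speed_limit
        # (target_speed, lane_change); 0 is assumed to mean "stay in the current lane".
        return (speed_limit, 0)


# Hypothetical observation: two candidate paths, each a list of waypoints.
fake_obs = SimpleNamespace(
    waypoint_paths=[
        [SimpleNamespace(speed_limit=13.89), SimpleNamespace(speed_limit=13.89)],
        [SimpleNamespace(speed_limit=8.33)],
    ]
)
action = FollowWaypointsSketch().act(fake_obs)
```

The returned tuple has the (float, int) shape expected by LaneWithContinuousSpeed, which is an easy property to assert in unit tests.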

A simple SMARTS example

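With the SMARTS Env and SMARTS Agent concepts in place, let's look at an example, SMARTS Quickstart. The SMARTS documentation puts this example first, but I found it hard to follow without the background, so I present it after introducing the concepts above.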

First, define the interfaces between two agents and the env, meaning we control two autonomous vehicles.

agent_interface = {
    agent_id: AgentInterface(
        max_episode_steps=1000,
        waypoint_paths=Waypoints(lookahead=50),  # lookahead 50 meters
        neighborhood_vehicle_states=NeighborhoodVehicles(radius=50),  # only get neighborhood info within 50 meters.
        drivable_area_grid_map=True,
        occupancy_grid_map=True,
        top_down_rgb=RGB(height=128, width=128, resolution=100/128),  # 128x128 pixels RGB image representing a 100x100 meters area.
        lidar_point_cloud=False,
        action=ActionSpaceType.LaneWithContinuousSpeed,
    )
    for agent_id in ['Agent_001', 'Agent_007']
}