首页/知识库/AI 在自动驾驶中的应用

AI 在自动驾驶中的应用

✍️ AI Master📅 创建 2026-04-12📖 18 min 阅读
💡

文章摘要

从感知到决策规划,掌握 AI 在自动驾驶中的核心技术

1自动驾驶系统架构:从模块化到端到端

自动驾驶系统的核心在于如何将传感器输入转化为安全可靠的控制指令。传统的模块化架构将系统分为感知、定位、预测、规划和控制五大模块,每个模块独立优化,通过明确的接口传递信息。这种架构的优势是可解释性强、故障易于定位,但模块间的误差累积和接口瓶颈也限制了系统性能的上限。近年来,端到端学习方法开始兴起,直接从传感器输入映射到控制输出,减少了人工设计模块的信息损失。然而,端到端系统的黑盒特性在安全关键的自动驾驶领域仍面临巨大挑战。当前业界主流方案是 hybrid 架构,保留关键模块的可解释性,同时在感知和预测等子模块中引入深度学习。SAE 将自动驾驶分为 L0 到 L5 六个等级,不同等级对系统架构的要求差异显著。L2 级别只需处理特定场景的纵向和横向控制,而 L4 级别需要在 ODD 内实现完全自主驾驶。

python
# 模块化自动驾驶系统架构概览
import torch
import torch.nn as nn
from typing import Dict, List, Tuple

class ModularAutonomousSystem(nn.Module):
    def __init__(self):
        super().__init__()
        # 感知模块
        self.perception = PerceptionModule()
        # 定位模块
        self.localization = LocalizationModule()
        # 预测模块
        self.prediction = TrajectoryPredictionModule()
        # 规划模块
        self.planning = MotionPlanningModule()
        # 控制模块
        self.control = ControlModule()

    def forward(self, sensor_data: Dict[str, torch.Tensor],
                map_data: torch.Tensor) -> Dict[str, torch.Tensor]:
        # 感知:目标检测 + 语义分割
        objects = self.perception(sensor_data)
        # 定位:车辆位姿估计
        pose = self.localization(sensor_data, map_data)
        # 预测:周围目标轨迹预测
        predictions = self.prediction(objects, pose)
        # 规划:生成行驶轨迹
        trajectory = self.planning(pose, objects, predictions, map_data)
        # 控制:油门、刹车、转向
        commands = self.control(trajectory, pose)
        return {
            "objects": objects,
            "pose": pose,
            "predictions": predictions,
            "trajectory": trajectory,
            "commands": commands
        }
python
# 端到端驾驶模型基础架构
import torch
import torch.nn as nn

class EndToEndDrivingModel(nn.Module):
    def __init__(self, num_cameras: int = 6, img_size: int = 256):
        super().__init__()
        # 多视角特征提取
        self.backbone = nn.Sequential(
            nn.Conv2d(3 * num_cameras, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)
        )
        # BEV 特征转换
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1)
        )
        # 轨迹头
        self.trajectory_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 80)  # 20步 x 4维
        )
        # 控制头
        self.control_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(512, 64),
            nn.ReLU(),
            nn.Linear(64, 3)  # 油门、刹车、转向
        )

    def forward(self, multi_view_images: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        B = multi_view_images.shape[0]
        features = self.backbone(multi_view_images)
        bev_features = self.bev_encoder(features)
        trajectory = self.trajectory_head(bev_features).view(B, 20, 4)
        commands = self.control_head(bev_features)
        return trajectory, commands
架构类型可解释性实时性安全性代表方案

模块化

中等

Apollo, Waymo

端到端

待验证

Tesla FSD, UniAD

Hybrid

中高

Mobileye REM

规则+学习混合

中等

Cruise AV

选择架构时,L2 级别推荐模块化方案便于功能安全认证,L4 级别可考虑 hybrid 架构平衡性能与安全性。

端到端模型在罕见场景下可能产生不可预测的行为,部署前必须在 corner case 数据集上进行充分验证。

2感知:目标检测与语义分割

感知是自动驾驶系统的眼睛,负责从传感器数据中理解周围环境。目标检测用于识别和定位道路上的车辆、行人、自行车等动态障碍物,而语义分割则对每个像素进行分类,理解道路、车道线、交通标志等静态环境。3D 目标检测是当前自动驾驶感知的核心挑战,需要从点云或图像中恢复目标的三维位置和姿态。主流方法包括基于点云的 PointPillars、CenterPoint 和基于多视角图像的 BEVDepth、BEVFormer。语义分割方面,BEVDet 和 Mask2Former 等模型通过引入 BEV 表示,统一了多传感器特征的空间对齐问题。感知的准确性直接决定了下游模块的输入质量,是自动驾驶系统最基础也最关键的环节。KITTI、nuScenes 和 Waymo Open Dataset 是该领域的三大基准数据集。

python
# CenterPoint 3D 目标检测核心
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterPoint3D(nn.Module):
    def __init__(self, num_classes: int = 3, voxel_size: float = 0.1):
        super().__init__()
        self.voxel_encoder = VoxelFeatureEncoder()
        self.backbone = SpMiddleFHD(output_channels=128)
        self.head = nn.ModuleDict({
            "heatmap": nn.Conv2d(128, num_classes, kernel_size=3, padding=1),
            "offset": nn.Conv2d(128, 2, kernel_size=3, padding=1),
            "height": nn.Conv2d(128, 1, kernel_size=3, padding=1),
            "dim": nn.Conv2d(128, 3, kernel_size=3, padding=1),
            "rotation": nn.Conv2d(128, 2, kernel_size=3, padding=1)
        })

    def forward(self, point_cloud: torch.Tensor) -> Dict[str, torch.Tensor]:
        voxels, coords = self.voxel_encoder(point_cloud)
        features = self.backbone(voxels, coords)
        heatmap = torch.sigmoid(self.head["heatmap"](features))
        offset = self.head["offset"](features)
        height = self.head["height"](features)
        dim = self.head["dim"](features)
        rotation = self.head["rotation"](features)
        return {
            "heatmap": heatmap,
            "offset": offset,
            "height": height,
            "dim": dim,
            "rotation": rotation
        }
python
# BEV 语义分割网络
import torch
import torch.nn as nn

class BEVSegmentation(nn.Module):
    def __init__(self, num_classes: int = 10, bev_range: list = [-50, 50, -50, 50]):
        super().__init__()
        self.bev_range = bev_range
        # 多视角图像编码器
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 7, 2, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, 2, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 256, 3, 2, 1)
        )
        # LSS 深度分布估计
        self.depth_net = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 80, 1)  # 80 depth bins
        )
        # BEV 融合与分割
        self.bev_head = nn.Sequential(
            nn.Conv2d(256, 512, 3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, num_classes, 1)
        )

    def forward(self, images: torch.Tensor, cam_params: torch.Tensor) -> torch.Tensor:
        B, N, C, H, W = images.shape
        images = images.view(B * N, C, H, W)
        features = self.image_encoder(images)
        depth_dist = torch.softmax(self.depth_net(features), dim=1)
        # 简化 LSS 体素池化
        bev_features = self._pool_to_bev(features, depth_dist, cam_params, B, N)
        segmentation = self.bev_head(bev_features)
        return segmentation

    def _pool_to_bev(self, features, depth_dist, cam_params, B, N):
        # 实际实现需要完整的 LSS 几何投影
        return torch.zeros(B, 256, 200, 200, device=features.device)
数据集场景数传感器目标类别标注类型

KITTI

7481 帧

相机+激光雷达

8类

3D 边界框

nuScenes

1000 场景

6相机+激光雷达+毫米波

23类

3D 边界框

Waymo Open

1150 场景

5相机+激光雷达

4类

3D 边界框+跟踪

Argoverse 2

1000 场景

7相机+激光雷达

多类

3D 边界框+运动学

感知模型的训练数据必须覆盖目标部署地区的所有典型场景,包括不同的天气、光照和道路类型。

3D 检测的评估指标 AP 计算对 IoU 阈值非常敏感,对比不同模型时务必使用相同的评估配置。

3传感器融合:多源信息的统一

自动驾驶车辆配备了多种传感器,包括摄像头、激光雷达、毫米波雷达和超声波传感器。每种传感器都有其独特的优势和局限性:摄像头提供丰富的纹理和颜色信息但缺乏深度感知,激光雷达提供精确的三维几何信息但在恶劣天气下性能下降,毫米波雷达可以直接测量目标速度但分辨率较低。传感器融合的目标是整合多源信息,发挥各自优势,弥补单一传感器的不足。融合策略按处理阶段分为数据级融合、特征级融合和决策级融合。当前主流的趋势是在 BEV 空间进行特征级融合,如 BEVFusion 和 TransFusion 等方法,将不同传感器的特征投影到统一的鸟瞰图空间,再通过注意力机制进行信息交互。这种融合方式既保留了各传感器的独特信息,又实现了空间对齐,是自动驾驶感知系统的重要发展方向。时间同步和空间标定是传感器融合的前置条件,任何偏差都会导致融合效果显著下降。

python
# BEV 空间多传感器特征融合
import torch
import torch.nn as nn
import torch.nn.functional as F

class BEVFusionModule(nn.Module):
    def __init__(self, lidar_dim: int = 128, camera_dim: int = 256, fusion_dim: int = 256):
        super().__init__()
        # 传感器特征投影到统一维度
        self.lidar_proj = nn.Conv2d(lidar_dim, fusion_dim, 1)
        self.camera_proj = nn.Conv2d(camera_dim, fusion_dim, 1)
        # 交叉注意力融合
        self.lidar_attn = nn.MultiheadAttention(fusion_dim, num_heads=8, batch_first=True)
        self.camera_attn = nn.MultiheadAttention(fusion_dim, num_heads=8, batch_first=True)
        # 融合后处理
        self.fusion_conv = nn.Sequential(
            nn.Conv2d(fusion_dim * 2, fusion_dim, 3, padding=1),
            nn.BatchNorm2d(fusion_dim),
            nn.ReLU(),
            nn.Conv2d(fusion_dim, fusion_dim, 3, padding=1)
        )

    def forward(self, lidar_bev: torch.Tensor, camera_bev: torch.Tensor) -> torch.Tensor:
        B, C, H, W = lidar_bev.shape
        lidar_feat = self.lidar_proj(lidar_bev)
        camera_feat = self.camera_proj(camera_bev)
        # 展平为序列
        lidar_seq = lidar_feat.flatten(2).permute(0, 2, 1)
        camera_seq = camera_feat.flatten(2).permute(0, 2, 1)
        # 交叉注意力
        lidar_enhanced, _ = self.lidar_attn(lidar_seq, camera_seq, camera_seq)
        camera_enhanced, _ = self.camera_attn(camera_seq, lidar_seq, lidar_seq)
        # 恢复空间形状
        lidar_enhanced = lidar_enhanced.permute(0, 2, 1).view(B, -1, H, W)
        camera_enhanced = camera_enhanced.permute(0, 2, 1).view(B, -1, H, W)
        # 拼接融合
        fused = torch.cat([lidar_enhanced, camera_enhanced], dim=1)
        return self.fusion_conv(fused)
python
# 传感器标定与时间同步工具
import numpy as np
from scipy.spatial.transform import Rotation

class SensorCalibration:
    def __init__(self):
        self.extrinsics = {}  # 传感器外参
        self.time_offsets = {}  # 时间偏移

    def set_extrinsic(self, sensor_name: str, R: np.ndarray, t: np.ndarray):
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = t
        self.extrinsics[sensor_name] = T

    def transform_point(self, point: np.ndarray, from_sensor: str, to_sensor: str) -> np.ndarray:
        T_from = self.extrinsics[from_sensor]
        T_to = np.linalg.inv(self.extrinsics[to_sensor])
        T = T_to @ T_from
        point_h = np.append(point, 1.0)
        return (T @ point_h)[:3]

    def synchronize_timestamps(self, timestamps: dict, reference: str = "lidar") -> dict:
        synced = {}
        ref_time = timestamps[reference]
        for sensor, ts in timestamps.items():
            offset = self.time_offsets.get(sensor, 0.0)
            synced[sensor] = ref_time + offset
        return synced

    def calibrate_lidar_camera(self, lidar_points: np.ndarray,
                                camera_points: np.ndarray) -> tuple:
        from scipy.optimize import minimize
        def objective(params):
            R = Rotation.from_euler("xyz", params[:3]).as_matrix()
            t = params[3:]
            T = np.eye(4)
            T[:3, :3] = R
            T[:3, 3] = t
            projected = (T @ np.hstack([lidar_points, np.ones((len(lidar_points), 1))].T).T).T
            return np.mean((projected[:, :3] - camera_points) ** 2)
        result = minimize(objective, np.zeros(6), method="Nelder-Mead")
        R_opt = Rotation.from_euler("xyz", result.x[:3]).as_matrix()
        return R_opt, result.x[3:]
传感器优势劣势有效距离成本

摄像头

丰富纹理、颜色、可读交通标志

无深度、受光照影响大

0-200m

激光雷达

精确 3D 几何、不受光照影响

雨雪天气衰减、无颜色

0-150m

毫米波雷达

直接测速、全天候工作

分辨率低、多径干扰

0-250m

超声波

近距离精度高、成本极低

距离极短、受材质影响

0-5m

极低

传感器融合前必须确保精确的时空同步,否则融合特征会出现空间错位,反而降低检测精度。

不要在仿真环境中假设完美的传感器标定参数,真实车辆每次启动后的振动都会导致微小偏差。

4轨迹预测:理解他车意图

轨迹预测是自动驾驶系统理解周围交通参与者未来行为的关键模块。与目标检测只提供当前时刻的状态不同,轨迹预测需要在给定历史观测的情况下,预测周围车辆、行人和自行车在未来 3 到 8 秒内的运动轨迹。这是一个本质上不确定的问题,同一个历史轨迹可能对应多种合理的未来结果。因此,现代预测模型输出的是多模态分布,而非单一确定性轨迹。主流方法包括基于社会池化(Social Pooling)的 LSTM 网络、基于图神经网络的交互建模、以及基于 Transformer 的序列预测。Waymo Motion Dataset 和 Argoverse 2 Motion 是该领域的标准基准。轨迹预测的难点在于建模车辆之间的复杂交互关系,例如交叉路口的让行行为、高速公路的换道意图等。多模态输出需要配合合理的评分机制,确保规划模块能够选择最安全的预测结果。

python
# 基于 Transformer 的多模态轨迹预测
import torch
import torch.nn as nn
import math

class TrajectoryTransformer(nn.Module):
    def __init__(self, num_modes: int = 6, horizon: int = 80, hidden_dim: int = 128):
        super().__init__()
        self.num_modes = num_modes
        self.horizon = horizon
        # 历史编码
        self.history_encoder = nn.Sequential(
            nn.Linear(6, hidden_dim),  # x, y, vx, vy, heading, type
            nn.LayerNorm(hidden_dim),
            nn.ReLU()
        )
        # Transformer 编码器
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8,
            dim_feedforward=512, dropout=0.1, batch_first=True
        )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        # 多模态解码头
        self.mode_query = nn.Parameter(torch.randn(num_modes, hidden_dim))
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, horizon * 2)  # (x, y) per step
        )
        # 概率评分头
        self.prob_head = nn.Sequential(
            nn.Linear(hidden_dim, 1)
        )

    def forward(self, agent_history: torch.Tensor,
                map_context: torch.Tensor) -> tuple:
        B, N, T, _ = agent_history.shape
        # 编码历史
        agent_features = self.history_encoder(agent_history).view(B * N, T, -1)
        encoded = self.transformer_encoder(agent_features)
        # 聚合
        context = encoded.mean(dim=1).view(B, N, -1).mean(dim=1)
        # 多模态解码
        mode_features = self.mode_query.unsqueeze(0).expand(B, -1, -1)
        mode_features = mode_features + context.unsqueeze(1)
        trajectories = self.decoder(mode_features).view(B, self.num_modes, self.horizon, 2)
        probabilities = torch.softmax(self.prob_head(mode_features).squeeze(-1), dim=-1)
        return trajectories, probabilities
python
# 交互感知图神经网络预测
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractionGraphPredictor(nn.Module):
    def __init__(self, num_agents: int = 20, hidden_dim: int = 64, num_modes: int = 6):
        super().__init__()
        self.num_agents = num_agents
        self.num_modes = num_modes
        self.node_encoder = nn.Linear(5, hidden_dim)
        self.edge_encoder = nn.Linear(3, hidden_dim)
        # 图注意力层
        self.gat_layer1 = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.gat_layer2 = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # 轨迹输出
        self.trajectory_decoder = nn.Sequential(
            nn.Linear(hidden_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 80 * 2)  # 80 timesteps
        )
        self.confidence_decoder = nn.Linear(hidden_dim, num_modes)

    def build_adjacency(self, positions: torch.Tensor, threshold: float = 50.0) -> torch.Tensor:
        B, N, _ = positions.shape
        dist_matrix = torch.cdist(positions, positions)
        adjacency = (dist_matrix < threshold).float()
        # 掩码自连接
        adjacency += torch.eye(N, device=positions.device).unsqueeze(0)
        return adjacency

    def forward(self, agent_states: torch.Tensor) -> tuple:
        B, N, T, _ = agent_states.shape
        # 节点特征
        node_feat = self.node_encoder(agent_states[:, :, -1, :])
        # 交互建模
        enhanced1, _ = self.gat_layer1(node_feat, node_feat, node_feat)
        enhanced2, _ = self.gat_layer2(enhanced1, enhanced1, enhanced1)
        # 输出
        trajectories = self.trajectory_decoder(enhanced2).view(B, N, 80, 2)
        confidence = torch.softmax(self.confidence_decoder(enhanced2.mean(dim=1)), dim=-1)
        return trajectories, confidence
方法交互建模多模态计算复杂度minFDE (nuScenes)

Social-LSTM

社会池化

隐式

2.8m

MultiPath++

场景上下文

显式 6 模态

1.9m

QCNet

Query-Centric

显式 6 模态

中高

1.5m

GameFormer

博弈论交互

显式 6 模态

1.4m

轨迹预测的输出必须包含不确定性估计,规划模块需要知道每条预测轨迹的置信度才能做出安全决策。

不要在训练数据中只包含正常行驶场景,急刹车、紧急避让等罕见行为对预测模型的泛化至关重要。

5决策规划:从规则到学习

决策规划是自动驾驶系统的大脑,负责在理解周围环境的基础上,生成安全、舒适、高效的行驶轨迹。传统的规划方法基于规则和优化,如状态机、行为树和模型预测控制(MPC),这些方法具有良好的可解释性和安全保证,但在复杂交通场景中泛化能力有限。基于学习的规划方法,特别是强化学习和模仿学习,正在为这一领域带来新的可能。强化学习通过与环境交互学习最优策略,能够处理规则方法难以编码的复杂场景。模仿学习则从人类驾驶数据中学习,生成的行为更加自然。当前主流的方案是将学习方法和优化方法结合,用学习模型生成初始轨迹或参考路径,再用优化方法进行安全约束下的微调。这种 hybrid 方法既保留了学习方法处理复杂场景的能力,又确保最终输出满足安全约束。规划模块需要考虑的约束包括碰撞避免、交通规则遵守、乘坐舒适性和通行效率。

python
# 强化学习驾驶策略 - PPO 实现
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

class DrivingPPOAgent(nn.Module):
    def __init__(self, state_dim: int = 64, action_dim: int = 3):
        super().__init__()
        # 共享特征提取
        self.feature_extractor = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU()
        )
        # Actor 网络
        self.actor_mean = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, action_dim),
            nn.Tanh()
        )
        self.actor_logstd = nn.Parameter(torch.zeros(action_dim))
        # Critic 网络
        self.critic = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

    def forward(self, state: torch.Tensor) -> tuple:
        features = self.feature_extractor(state)
        action_mean = self.actor_mean(features)
        action_std = torch.exp(self.actor_logstd).expand_as(action_mean)
        dist = Normal(action_mean, action_std)
        action = dist.sample()
        log_prob = dist.log_prob(action).sum(dim=-1)
        value = self.critic(features).squeeze(-1)
        return action, log_prob, value

    def evaluate(self, state: torch.Tensor, action: torch.Tensor) -> tuple:
        features = self.feature_extractor(state)
        action_mean = self.actor_mean(features)
        action_std = torch.exp(self.actor_logstd).expand_as(action_mean)
        dist = Normal(action_mean, action_std)
        log_prob = dist.log_prob(action).sum(dim=-1)
        entropy = dist.entropy().sum(dim=-1)
        value = self.critic(features).squeeze(-1)
        return log_prob, entropy, value
python
# 模型预测控制 - 轨迹优化
import numpy as np
from scipy.optimize import minimize

class MPCPlanner:
    def __init__(self, horizon: int = 20, dt: float = 0.1):
        self.horizon = horizon
        self.dt = dt
        self.vehicle_length = 4.5
        self.vehicle_width = 1.8

    def bicycle_model(self, state: np.ndarray, control: np.ndarray) -> np.ndarray:
        x, y, v, theta = state
        delta, a = control
        # 运动学自行车模型
        x_dot = v * np.cos(theta)
        y_dot = v * np.sin(theta)
        v_dot = a
        theta_dot = v * np.tan(delta) / 2.7  # 轴距 2.7m
        return np.array([x_dot, y_dot, v_dot, theta_dot])

    def optimize_trajectory(self, initial_state: np.ndarray,
                             reference_path: np.ndarray,
                             obstacles: list) -> np.ndarray:
        n_controls = self.horizon * 2
        def cost(controls):
            controls = controls.reshape(self.horizon, 2)
            state = initial_state.copy()
            total_cost = 0.0
            for t in range(self.horizon):
                state = state + self.bicycle_model(state, controls[t]) * self.dt
                # 跟踪误差
                ref_pos = reference_path[min(t, len(reference_path)-1)]
                tracking_error = (state[0] - ref_pos[0])2 + (state[1] - ref_pos[1])2
                total_cost += 10.0 * tracking_error
                # 控制平滑
                if t > 0:
                    total_cost += 0.1 * np.sum(controls[t] - controls[t-1])2
                # 碰撞惩罚
                for obs_x, obs_y, obs_r in obstacles:
                    dist = np.sqrt((state[0]-obs_x)2 + (state[1]-obs_y)2)
                    if dist < obs_r + 1.0:
                        total_cost += 1000.0 * (obs_r + 1.0 - dist)2
            return total_cost
        result = minimize(cost, np.zeros(n_controls),
                         method="SLSQP",
                         bounds=[(-0.5, 0.5)] * self.horizon + [(-3.0, 3.0)] * self.horizon)
        return result.x.reshape(self.horizon, 2)
规划方法可解释性安全性保证复杂场景实时性

规则+状态机

极快

MPC 优化

强(约束内)

中等

强化学习

快(推理)

Hybrid(学习+优化)

中等

Hybrid 方案中,学习模型负责生成多样化的候选轨迹,优化方法负责从中选择安全且舒适的最优解。

强化学习策略在训练分布之外的场景可能产生危险行为,必须通过安全层进行约束和过滤。

6仿真与测试:安全验证的基石

自动驾驶系统的测试和验证是确保功能安全的关键环节。由于真实道路测试成本极高且存在安全风险,仿真测试成为自动驾驶开发中不可或缺的部分。仿真平台需要在虚拟环境中重建真实的物理规律、传感器特性和交通场景。CARLA、LGSVL 和 NVIDIA DRIVE Sim 是主流的开源和商业仿真平台。场景生成是仿真的核心挑战,包括自然驾驶场景的自动采集和 corner case 的人工设计。基于生成对抗网络和扩散模型的场景生成方法,可以自动创建高价值的测试场景,大幅提升测试效率。功能安全标准 ISO 26262 和预期功能安全标准 ISO 21448(SOTIF)为自动驾驶系统的安全评估提供了框架。仿真测试不能完全替代真实测试,但可以覆盖 99% 以上的测试场景,大幅降低实车测试的风险和成本。数字孪生技术正在将仿真和实车测试的数据闭环打通。

python
# CARLA 自动化测试框架
import carla
import numpy as np

class AutonomousTestRunner:
    def __init__(self, host: str = "localhost", port: int = 2000):
        self.client = carla.Client(host, port)
        self.client.set_timeout(10.0)
        self.world = self.client.get_world()
        self.results = []

    def run_test_scenario(self, scenario_name: str,
                           autopilot_class, num_episodes: int = 100) -> dict:
        scenario = self._load_scenario(scenario_name)
        successes = 0
        infractions = {"collision": 0, "red_light": 0, "route_deviation": 0}
        for ep in range(num_episodes):
            env = self._setup_environment(scenario, ep)
            agent = autopilot_class()
            done = False
            reward_sum = 0.0
            while not done:
                state = self._get_state(env)
                action = agent.get_action(state)
                next_state, reward, done, info = self._step(env, action)
                reward_sum += reward
                if info.get("collision"):
                    infractions["collision"] += 1
                    done = True
                if info.get("red_light"):
                    infractions["red_light"] += 1
            if reward_sum > 0:
                successes += 1
        return {
            "scenario": scenario_name,
            "success_rate": successes / num_episodes,
            "infractions": infractions,
            "total_episodes": num_episodes
        }

    def _load_scenario(self, name):
        # 从场景库加载
        return {"name": name}

    def _setup_environment(self, scenario, seed):
        return {"scenario": scenario, "seed": seed}

    def _get_state(self, env):
        return np.random.randn(64)

    def _step(self, env, action):
        return None, 0.0, True, {}
python
# 基于扩散模型的场景生成
import torch
import torch.nn as nn

class ScenarioDiffusionModel(nn.Module):
    def __init__(self, num_agents: int = 20, horizon: int = 80, hidden_dim: int = 128):
        super().__init__()
        self.num_agents = num_agents
        self.horizon = horizon
        # 场景编码
        self.scene_encoder = nn.Sequential(
            nn.Linear(num_agents * horizon * 4, hidden_dim * 4),
            nn.LayerNorm(hidden_dim * 4),
            nn.ReLU(),
            nn.Linear(hidden_dim * 4, hidden_dim)
        )
        # 扩散去噪网络
        self.denoise_net = nn.Sequential(
            nn.Linear(hidden_dim + num_agents * horizon * 4, hidden_dim * 2),
            nn.ReLU(),
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_agents * horizon * 4)
        )
        self.noise_schedule = torch.linspace(1e-4, 0.02, 1000)

    def forward_diffusion(self, x0: torch.Tensor, t: int) -> tuple:
        beta_t = self.noise_schedule[t]
        alpha_t = 1.0 - beta_t
        noise = torch.randn_like(x0)
        xt = torch.sqrt(torch.tensor(alpha_t)) * x0 + torch.sqrt(torch.tensor(1 - alpha_t)) * noise
        return xt, noise

    def denoise(self, xt: torch.Tensor, t: int, context: torch.Tensor) -> torch.Tensor:
        t_emb = self._timestep_embedding(t, context.shape[0])
        combined = torch.cat([xt, context, t_emb], dim=-1)
        return self.denoise_net(combined)

    def sample(self, context: torch.Tensor, num_steps: int = 50) -> torch.Tensor:
        B = context.shape[0]
        x = torch.randn(B, self.num_agents * self.horizon * 4, device=context.device)
        for t in reversed(range(num_steps)):
            noise_pred = self.denoise(x, t, context)
            x = x - noise_pred * self.noise_schedule[t]
        return x.view(B, self.num_agents, self.horizon, 4)

    def _timestep_embedding(self, t: int, batch_size: int) -> torch.Tensor:
        return torch.full((batch_size, 1), t / 1000.0)
仿真平台传感器模拟物理引擎开源典型用户

CARLA

相机/激光雷达/毫米波

PhysX

学术界

LGSVL

相机/激光雷达

Unity PhysX

Apollo

NVIDIA DRIVE Sim

全传感器

PhysX + Omniverse

NVIDIA 生态

Cognata

全传感器

云原生

Mobileye, Zoox

场景覆盖率比测试里程更重要。1000 个精心设计的 corner case 场景比 10000 公里的随机驾驶更有价值。

仿真中的传感器模型和真实传感器存在 sim2real gap,仿真测试结果只能作为参考,不能替代真实测试。

7实战:端到端驾驶模型开发

本章将整合前面所学的感知、预测和规划知识,从零构建一个端到端驾驶模型。我们选择 UniAD 架构作为基础,这是一种将感知、预测和规划统一在一个框架中的方法,通过共享 BEV 特征表示,各模块之间的信息可以端到端地流动。训练数据来自 nuScenes 数据集,包含多视角图像、激光雷达点云和标注的驾驶轨迹。端到端模型的训练是一个多任务学习过程,需要同时优化检测、跟踪、预测和规划等多个目标的损失函数。损失权重的设计至关重要,不同任务的梯度规模和收敛速度差异可能导致训练不稳定。实际部署中,还需要考虑模型压缩、实时推理优化和安全回退机制。一个完整的端到端自动驾驶系统从数据采集到量产部署,通常需要 2 到 3 年的开发周期和数百万公里的测试数据。

python
# 多任务端到端驾驶模型训练
import torch
import torch.nn as nn
import torch.nn.functional as F

class UniADUnified(nn.Module):
    def __init__(self, num_classes: int = 10, num_modes: int = 6):
        super().__init__()
        # BEV 特征提取(共享)
        self.bev_encoder = BEVEncoder()
        # 检测头
        self.detect_head = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, num_classes + 4, 1)  # cls + bbox
        )
        # 跟踪头
        self.track_head = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 64, 1)  # track embedding
        )
        # 预测头
        self.predict_head = TrajectoryHead(num_modes=num_modes)
        # 规划头
        self.plan_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 80 * 2)  # 规划轨迹
        )

    def forward(self, multi_view_images: torch.Tensor,
                targets: dict = None) -> dict:
        bev_features = self.bev_encoder(multi_view_images)
        # 多任务输出
        detection = self.detect_head(bev_features)
        tracking = self.track_head(bev_features)
        prediction = self.predict_head(bev_features)
        planning = self.plan_head(bev_features).view(-1, 80, 2)
        outputs = {
            "detection": detection,
            "tracking": tracking,
            "prediction": prediction,
            "planning": planning
        }
        if targets is not None:
            loss = self._compute_loss(outputs, targets)
            outputs["loss"] = loss
        return outputs

    def _compute_loss(self, outputs: dict, targets: dict) -> torch.Tensor:
        det_loss = F.cross_entropy(outputs["detection"], targets["detection_labels"]) / 10.0
        track_loss = F.mse_loss(outputs["tracking"], targets["tracking_emb"]) / 100.0
        pred_loss = F.l1_loss(outputs["prediction"], targets["prediction_traj"]) / 5.0
        plan_loss = F.mse_loss(outputs["planning"], targets["planning_traj"]) / 2.0
        return det_loss + track_loss + pred_loss + plan_loss
python
# 部署与推理服务
import torch
import numpy as np
from dataclasses import dataclass
import time

@dataclass
class DrivingCommand:
    steering: float
    throttle: float
    brake: float
    planned_trajectory: np.ndarray

class DrivingInferenceEngine:
    def __init__(self, model_path: str, device: str = "cuda"):
        self.device = torch.device(device)
        self.model = torch.jit.load(model_path)
        self.model.to(self.device)
        self.model.eval()
        self.latency_buffer = []

    def infer(self, camera_images: np.ndarray,
              ego_state: np.ndarray,
              navigation_goal: np.ndarray) -> DrivingCommand:
        start_time = time.time()
        # 预处理
        tensor_images = self._preprocess_images(camera_images)
        tensor_ego = torch.tensor(ego_state, device=self.device).float()
        # 推理
        with torch.no_grad():
            plan_traj = self.model(tensor_images, tensor_ego)
        # 轨迹转控制指令
        command = self._trajectory_to_control(plan_traj.cpu().numpy(), ego_state)
        latency = time.time() - start_time
        self.latency_buffer.append(latency)
        if len(self.latency_buffer) > 100:
            self.latency_buffer.pop(0)
        return command

    def _preprocess_images(self, images: np.ndarray) -> torch.Tensor:
        tensor = torch.tensor(images, device=self.device).float() / 255.0
        return tensor.permute(0, 3, 1, 2).unsqueeze(0)

    def _trajectory_to_control(self, trajectory: np.ndarray,
                                ego_state: np.ndarray) -> DrivingCommand:
        # 纯追踪控制器
        lookahead_idx = 10
        target_point = trajectory[lookahead_idx]
        dx = target_point[0] - ego_state[0]
        dy = target_point[1] - ego_state[1]
        steering = np.arctan2(dy, dx) * 0.5
        speed = np.sqrt(dx2 + dy2) / (lookahead_idx * 0.1)
        throttle = min(max(speed * 0.3, 0.0), 1.0)
        brake = max(0.0, 1.0 - speed * 0.5)
        return DrivingCommand(
            steering=float(np.clip(steering, -1.0, 1.0)),
            throttle=float(throttle),
            brake=float(brake),
            planned_trajectory=trajectory
        )

    @property
    def avg_latency(self) -> float:
        return np.mean(self.latency_buffer) if self.latency_buffer else 0.0
开发阶段核心任务数据需求典型周期关键指标

数据采集

收集多场景驾驶数据

100万+ 帧

6 个月

场景多样性

模型开发

多任务模型训练调优

标注数据

4 个月

各任务精度

仿真测试

大规模虚拟场景测试

10000+ 场景

3 个月

成功率 > 95%

封闭场地

实际道路验证

5万+ 公里

6 个月

MPI > 5000

开放道路

真实场景部署

持续积累

持续

MPI > 10000

多任务训练中,使用动态损失权重(如不确定性加权)比固定权重效果更好,模型会自动平衡各任务的贡献。

端到端模型的规划输出必须经过安全验证层,任何可能碰撞的轨迹都应该被过滤掉,不能直接输出给控制系统。

继续你的 AI 学习之旅

浏览更多 AI 知识库文章,或者探索 GitHub 上的优质 AI 项目