Spatiotemporal fusion-based multimodal road feature extraction for 3D visual perception

Abstract: Three-dimensional visual perception, a core technology of intelligent driving systems, constructs geometrically and semantically rich vectorized scene representations by fusing multimodal sensor data, including LiDAR point clouds, camera images, and radar signals. This paper proposes a spatiotemporal fusion-based multimodal road feature extraction framework whose novelty lies in combining transformer architectures with bird's-eye-view (BEV) representation learning. The system extracts features from heterogeneous sensor data through a multi-scale feature pyramid, uses attention mechanisms to align and transform multi-view features into BEV space, and introduces a spatiotemporal fusion module that adaptively integrates multi-frame observations, improving both precision and recall. The system is well suited to offline automated annotation pipelines that generate training ground truth for on-vehicle perception models. Experimental results on a proprietary autonomous driving dataset show that the framework achieves superior precision and recall in lane marking and road boundary detection compared with conventional approaches.
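
The abstract does not include an implementation, so the sketch below illustrates one plausible form of the attention-based multi-view-to-BEV alignment it describes, in the style of BEVFormer-like models: a grid of learnable BEV queries cross-attends to flattened multi-camera feature maps. The class name, dimensions, and the use of full (rather than deformable) cross-attention are assumptions for illustration, not the authors' design.

```python
# Minimal sketch (assumed, not from the paper): learnable BEV queries
# cross-attend to flattened multi-camera features, aligning multi-view
# observations into BEV space in a single attention pass.
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, bev_h=50, bev_w=50):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # One learnable query per BEV grid cell (small grid for the sketch;
        # real systems use e.g. 200x200).
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, cam_feats):
        # cam_feats: (B, N_cams, C, H, W) -- one pyramid level of the
        # multi-scale features; C must equal embed_dim.
        b, n, c, h, w = cam_feats.shape
        # Flatten all camera views into one key/value token sequence.
        kv = cam_feats.permute(0, 1, 3, 4, 2).reshape(b, n * h * w, c)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        bev, _ = self.attn(q, kv, kv)      # BEV queries attend to all views
        bev = self.norm(bev + q)           # residual connection + norm
        # Fold the query sequence back onto the 2D BEV grid.
        return bev.transpose(1, 2).reshape(b, c, self.bev_h, self.bev_w)

# Example: 6 cameras with 256-channel features -> (1, 256, 50, 50) BEV map.
bev = BEVCrossAttention()(torch.randn(1, 6, 256, 12, 20))
```

On realistic BEV grids, full cross-attention over all camera tokens becomes prohibitively expensive, which is why published BEV models typically substitute deformable attention; full attention is used here only to keep the sketch short and dependency-free.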
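The spatiotemporal fusion step can likewise be sketched as ego-motion-compensated warping of the previous frame's BEV feature followed by a learned per-cell gate. The affine warp, the gating scheme, and the `TemporalBEVFusion` name are hypothetical stand-ins for the adaptive multi-frame fusion the abstract leaves unspecified.

```python
# Hypothetical sketch of adaptive multi-frame BEV fusion: warp the previous
# frame into the current ego frame, then mix the two frames per grid cell
# with a learned gate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalBEVFusion(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # Predict a per-cell mixing weight from the concatenated frames.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, bev_t, bev_prev, ego_motion):
        # bev_t, bev_prev: (B, C, H, W) BEV features of the current and
        # previous frame; ego_motion: (B, 2, 3) affine matrices mapping
        # current-frame BEV coordinates into the previous frame.
        grid = F.affine_grid(ego_motion, list(bev_t.shape), align_corners=False)
        warped = F.grid_sample(bev_prev, grid, align_corners=False)
        g = self.gate(torch.cat([bev_t, warped], dim=1))
        return g * bev_t + (1.0 - g) * warped  # adaptive per-cell fusion

# Example: identity ego-motion (vehicle stationary between frames).
fuse = TemporalBEVFusion()
theta = torch.eye(2, 3).unsqueeze(0)
out = fuse(torch.randn(1, 256, 50, 50), torch.randn(1, 256, 50, 50), theta)
```

The gate lets the network discount the history wherever warping is unreliable (occlusions, dynamic objects), which is one common way to realize the "adaptive" integration of multi-frame observations the abstract claims.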

     
