Research and Application of MLLMs for Container Shipping MaaS Multimodal Transportation
Abstract: Multimodal large language models (MLLMs), a frontier of artificial intelligence, are becoming a core driver of intelligent logistics. In the complex setting of container shipping MaaS (Mobility-as-a-Service) multimodal transport, traditional unimodal methods struggle to fuse heterogeneous multi-source data such as vessel AIS trajectories, port surveillance video, electronic documents, and meteorological text, which leaves cross-modal semantic understanding and decision support inadequate. To address this, the paper proposes an intelligent system architecture built on MLLMs and develops four core application scenarios: intelligent document recognition, cargo anomaly detection, intelligent planning and decision-making, and business-data question answering. The system uses Qwen2.5-VL-32B to parse documents automatically and produce structured output, applies image-text consistency checks to speed up cargo verification, integrates real-time multi-source data to offer customers route-optimization plans for multimodal transport, and relies on Qwen3-235B-A22B for natural-language queries and multi-turn interaction. The study indicates that these applications have significant potential to improve logistics efficiency, optimize operational decisions, and reduce operating costs.
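As an illustration of the "document parsing and structured output" step, the following is a minimal sketch of how a JSON reply from a vision-language model might be validated before entering downstream systems. The schema, the field names (`bill_no`, `container_no`, `port_of_loading`, `port_of_discharge`), and the function `parse_model_reply` are hypothetical and are not taken from the paper. The model call itself is omitted, and the reply is assumed to arrive as a JSON string.

```python
import json
from dataclasses import dataclass

# Hypothetical target schema: these field names are illustrative assumptions,
# not the fields used by the system described in the abstract.
REQUIRED_FIELDS = ("bill_no", "container_no", "port_of_loading", "port_of_discharge")


@dataclass
class ParsedDocument:
    bill_no: str
    container_no: str
    port_of_loading: str
    port_of_discharge: str


def parse_model_reply(reply: str) -> ParsedDocument:
    """Validate a JSON reply (assumed model output) and map it to the schema."""
    data = json.loads(reply)  # raises ValueError (JSONDecodeError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Treat absent, null, or blank values as missing.
    missing = [f for f in REQUIRED_FIELDS if not str(data.get(f) or "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # Keep only the known fields and normalize surrounding whitespace.
    return ParsedDocument(**{f: str(data[f]).strip() for f in REQUIRED_FIELDS})
```

Rejecting incomplete replies at this boundary is one possible design choice: it keeps partially recognized documents out of later steps such as cargo verification, where a missing container number could otherwise go unnoticed.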