Research on the Construction of a Lung Cancer-Specific Knowledge Graph Based on Multi-Source Heterogeneous Information Fusion
Abstract: Lung cancer is a major global public health challenge. Medical knowledge graphs (MKGs) can provide key support for intelligent diagnosis and treatment, but existing graphs often rely on a single information source, have incomplete coverage, and lack real-world cases. To address these limitations, this study integrates multi-source heterogeneous data, including MIMIC-IV electronic medical records, DrugBank, PubMed, and ICD-10, to construct a lung cancer-specific knowledge graph. It adopts a modular subgraph fusion approach: patient, disease, and drug subgraphs are built first and then merged into a single graph through entity alignment. Experiments show that (1) a medical entity recognition model based on fine-tuned BioBERT outperforms the baseline, and (2) graph embeddings generated with TransE/TransH achieve Top-3 and Top-5 hit rates of at least 92% on drug and surgery prediction tasks. The graph provides reliable knowledge support for clinical decision-making in lung cancer, and its construction framework offers a reusable reference for multi-source medical data fusion and knowledge graph construction.
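To make the Top-k hit rate mentioned in the abstract concrete, the following is a minimal sketch of TransE-style scoring and a Hits@k computation. The abstract does not state the distance norm or the ranking protocol, so the L2 norm and plain (unfiltered) ranking used here are assumptions. The entity names, the `treated_with` relation, and the 2-D embedding values are hypothetical toy data, not results from the study.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance between (h + r) and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_tails(h, r, candidates):
    """Return candidate tail names ordered from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(h, r, candidates[name]),
        reverse=True,
    )

def hits_at_k(queries, candidates, k):
    """Fraction of (head, relation, true_tail) queries whose true tail ranks in the top k."""
    hits = 0
    for h, r, true_tail in queries:
        if true_tail in rank_tails(h, r, candidates)[:k]:
            hits += 1
    return hits / len(queries)

# Hypothetical toy embeddings (assumed values for illustration only).
drugs = {
    "drug_a": [1.0, 1.0],
    "drug_b": [0.0, 2.0],
    "drug_c": [3.0, 0.0],
    "drug_d": [-1.0, -1.0],
}
treated_with = [1.0, 0.0]   # hypothetical relation vector
patient_1 = [0.0, 1.0]      # hypothetical patient embedding
patient_2 = [2.0, 0.5]      # hypothetical patient embedding

queries = [
    (patient_1, treated_with, "drug_a"),
    (patient_2, treated_with, "drug_b"),
]

if __name__ == "__main__":
    for k in (1, 3):
        print(f"Hits@{k} = {hits_at_k(queries, drugs, k):.2f}")
```

In this toy setup the second query's true drug is not ranked first but does fall within the top three, which is why a Top-3 or Top-5 hit rate can be much higher than a Top-1 rate on the same embeddings.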
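Cite this article: Luo Zengyue, Yin Pei. Research on the Construction of a Lung Cancer-Specific Knowledge Graph Based on Multi-Source Heterogeneous Information Fusion [J]. Modeling and Simulation (建模与仿真), 2025, 14(10): 28-38. https://doi.org/10.12677/mos.2025.1410603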
