A Recommender via Similarity of Molecule Graphs for Medical Literature
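DOI: 10.12677/CSA.2022.1212290. Supported by research project funding.
Authors: Feng Xianbing (冯贤兵), Tao Tao (陶涛), Lü Xiaoqing (吕肖庆), Wangxuan Institute of Computer Technology, Peking University, Beijing
Keywords: Graph Similarity, Molecule Graph, Paper Recommender, Graph Edit Distance, Bipartite Graph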
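Abstract (translated from Chinese): The rapid growth of literature in biomedicine and related fields promotes scientific exchange, but it also places heavy reading pressure on researchers. Although a number of paper search and recommendation methods have emerged, most rely only on a paper's metadata and text, and leave the article content, especially non-text objects such as figures, largely unexploited. As a result, the results that existing systems return to readers still contain many duplicated or overly generic items. To address this, we explore and build a content-based, document-level recommender system whose main stages are document parsing, text object understanding, content similarity measurement, a multi-level indexing mechanism, and optimization of the recommendation results. For the molecular-formula images that are specific to biomedical literature, we propose a graph similarity measure, the half-branch edit distance (Half-branch GED, abbreviated HB-GED) algorithm, and we also propose graph convolutional models for representing molecular graphs and for representing relationships between documents. Experimental results on real-world datasets show that the proposed paper recommendation method can effectively select candidate papers that better match the querier's intent.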
Abstract: The steady growth of scientific and technical literature puts formidable pressure on medical researchers. Even with search engines and paper recommender systems, researchers still have to spend considerable time keeping up with the trends and directions in their field. Most existing recommender approaches depend mainly on text-based information and ignore non-text objects such as informative figures. To this end, we establish a document-to-document recommender system for medical literature. Specifically, we propose a deep-learning-based segmentation method for extracting molecular graphs, a Half-branch GED algorithm for evaluating the similarity of molecules, and a bipartite-graph-based algorithm for paper similarity. Experimental results on real-world datasets demonstrate the effectiveness of the proposed recommender system.
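The abstract names two components, a Half-branch GED measure for molecule similarity and a bipartite-graph-based algorithm for paper similarity, without defining either here. The following is a minimal Python sketch of the general idea only, and it is not the authors' HB-GED. It assumes a molecule is a labeled graph, takes a "branch" to be an atom together with the orders of its incident bonds (an assumption made for illustration), scores two molecules by the share of exactly matching branches, and scores two papers by the best one-to-one matching between their molecule sets. All function names and the example molecules are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Assumed representation (not from the paper): a molecule is a pair
# (atoms, bonds), where atoms maps node id -> element symbol and bonds
# is a list of (node_a, node_b, bond_order) tuples.


def branches(atoms, bonds):
    """Multiset of branches: an atom label plus its sorted incident bond orders."""
    incident = {node: [] for node in atoms}
    for a, b, order in bonds:
        incident[a].append(order)
        incident[b].append(order)
    return Counter((atoms[n], tuple(sorted(incident[n]))) for n in atoms)


def branch_similarity(mol_a, mol_b):
    """Similarity in [0, 1]: fraction of branches that match exactly."""
    br_a, br_b = branches(*mol_a), branches(*mol_b)
    size = max(sum(br_a.values()), sum(br_b.values()))
    if size == 0:
        return 1.0  # two empty graphs are treated as identical
    common = sum((br_a & br_b).values())  # multiset intersection
    return common / size


def paper_similarity(mols_p, mols_q):
    """Best one-to-one matching between two papers' molecule sets.

    Brute force over assignments, so only suitable for small sets;
    a real system would use a polynomial-time assignment solver.
    """
    small, large = sorted((mols_p, mols_q), key=len)
    if not large:
        return 0.0
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        total = sum(branch_similarity(small[i], large[j])
                    for i, j in enumerate(perm))
        best = max(best, total)
    # Normalize by the larger set so unmatched molecules lower the score.
    return best / len(large)


# Heavy-atom skeletons used only as toy inputs.
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)])
acetaldehyde = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 2)])

mol_score = branch_similarity(ethanol, acetaldehyde)
paper_score = paper_similarity([ethanol, acetaldehyde], [ethanol])
```

Normalizing by the larger molecule set is one possible design choice: it penalizes a paper that shares only one of many molecules with the query, which fits the stated goal of filtering out overly generic recommendations.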
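Cite this article: Feng Xianbing, Tao Tao, Lü Xiaoqing. A Recommender via Similarity of Molecule Graphs for Medical Literature [J]. Computer Science and Application (计算机科学与应用), 2022, 12(12): 2853-2862. https://doi.org/10.12677/CSA.2022.1212290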
