A Map-Reduce-Based Parallel Approach for Geospatial Data Interlinking in a Semantic Web
DOI: 10.12677/GST.2019.72014
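Author: Wenyu Yang (杨雯雨)*: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei, China; Department of Civil, Geo and Environmental Engineering, Technical University of Munich, Munich, Germany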
Keywords: Map-Reduce; Data Interlinking; Geospatial Semantic Data; Hausdorff Distance
Abstract: The Web of Data is an intermediate step towards the Semantic Web, and constructing links among different Resource Description Framework (RDF) datasets is a key issue in building it. Identity links, which match the same entity across different datasets, are an important type of RDF link. Among the many approaches to constructing identity links between geospatial entities, this paper adopts a similarity-based approach that uses the Hausdorff distance to measure the location and shape similarity of two entities. Because computing the Hausdorff distance is expensive and geospatial datasets are intrinsically large, the entire matching process is very time-consuming. This paper therefore proposes a Map-Reduce-based framework that parallelizes the similarity computation and substantially reduces the runtime. The approach was evaluated by interlinking data from the Nomenclature of Territorial Units for Statistics (NUTS) and the Database of Global Administrative Areas (GADM). The matching precision was high: on 1 node the runtime exceeded one day, whereas with the proposed parallel framework on 8 nodes it was only approximately 3 h.
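The abstract combines two ideas. The first is a Hausdorff-distance similarity between entity geometries. The second is a Map-Reduce split of the pairwise comparisons. The following is a minimal, single-process sketch of how the two could fit together; it is not the author's implementation. It rests on several assumptions. Each entity's geometry is taken to be a list of (x, y) vertices, and the distance is the discrete Hausdorff distance between those vertex sets. The similarity formula `1 / (1 + d / scale)`, the `threshold` and `scale` parameters, and the function names `map_phase`, `shuffle`, `reduce_phase` and `interlink` are all hypothetical. On a real Hadoop cluster, each map call would run on a different node. Here, the map, shuffle and reduce steps run one after another so that the data flow is easy to follow.

```python
import math
from collections import defaultdict
from itertools import product

Point = tuple[float, float]


def directed_hausdorff(a: list[Point], b: list[Point]) -> float:
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: list[Point], b: list[Point]) -> float:
    """Symmetric discrete Hausdorff distance between two vertex sets (O(n*m))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def similarity(a: list[Point], b: list[Point], scale: float = 1.0) -> float:
    """Hypothetical mapping of distance to a score in (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + hausdorff(a, b) / scale)


def map_phase(pair, scale: float = 1.0):
    """Map: one candidate (source, target) pair -> (source_id, (target_id, score))."""
    (sid, sgeom), (tid, tgeom) = pair
    return sid, (tid, similarity(sgeom, tgeom, scale))


def shuffle(mapped):
    """Group map output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(sid, candidates, threshold: float = 0.5):
    """Reduce: keep the best-scoring target if it clears the threshold."""
    tid, score = max(candidates, key=lambda c: c[1])
    return (sid, tid, score) if score >= threshold else None


def interlink(source: dict, target: dict, threshold: float = 0.5, scale: float = 1.0):
    """Compare every source entity with every target entity and return identity links."""
    mapped = [map_phase(p, scale) for p in product(source.items(), target.items())]
    links = []
    for sid, candidates in shuffle(mapped).items():
        link = reduce_phase(sid, candidates, threshold)
        if link is not None:
            links.append(link)
    return links


if __name__ == "__main__":
    # Toy data: two "NUTS-like" squares and slightly perturbed "GADM-like" copies.
    nuts = {"nuts:A": [(0, 0), (1, 0), (1, 1), (0, 1)],
            "nuts:B": [(5, 5), (6, 5), (6, 6), (5, 6)]}
    gadm = {"gadm:X": [(0, 0), (1, 0), (1, 1.1), (0, 1)],
            "gadm:Y": [(5, 5), (6, 5), (6, 6), (5, 6.2)]}
    for sid, tid, score in interlink(nuts, gadm):
        print(f"{sid} <-> {tid}  similarity={score:.3f}")
    # prints "nuts:A <-> gadm:X  similarity=0.909" then "nuts:B <-> gadm:Y  similarity=0.833"
```

Comparing every source entity with every target entity, as `interlink` does, is another simplifying assumption. A practical system would usually prune candidate pairs first. The vertex-based `hausdorff` could also be replaced by a geometry-library routine, such as Shapely's `hausdorff_distance`.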
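Cite this article: Yang, W. (2019) A Map-Reduce-Based Parallel Approach for Geospatial Data Interlinking in a Semantic Web. Geomatics Science and Technology, 7(2), 90-100. https://doi.org/10.12677/GST.2019.72014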