Review on Automated Report Generation Methods for Enterprises
DOI: 10.12677/CSA.2022.1210236
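Authors: Feng Canrui, An Jianye*: School of Science, Tianjin University of Commerce, Tianjin; Liu Zhiyong: Tianjin Zhonglian Intelligent Technology Co., Ltd., Tianjin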
Keywords: Deep Learning, Knowledge Graph, Natural Language Processing, Report Generation
Abstract: Report writing is a repetitive and complex task in the daily operation of enterprises. This paper systematically reviews the implementation methods, research priorities, and development directions of automatic report generation. After summarizing the basic report generation methods based on data merging and templates, it first introduces improved models for intelligent report generation, such as sequence generation and knowledge guidance; it then analyzes hybrid report generation methods that combine templates with intelligent models; finally, it compares the evaluation criteria for report generation. The review concludes that report generation methods combining templates with intelligent models are standardized in form, highly accurate, and adaptable, and that they will be the mainstream direction of future research.
Citation: Feng Canrui, An Jianye, Liu Zhiyong. Review on Automated Report Generation Methods for Enterprises [J]. Computer Science and Application, 2022, 12(10): 2307-2317. https://doi.org/10.12677/CSA.2022.1210236
