Chatbot Topic Recommendation Based on Collaborative Filtering
DOI: 10.12677/AIRR.2020.92018
Authors: Li Huan (李欢), Xu Hui (徐慧): China University of Mining and Technology (Beijing), Beijing
Keywords: Chatbots; Collaborative Filtering; Topic Recommendation
Abstract: With the rapid development of deep learning, a growing number of chatbots have appeared on the market. By function they fall roughly into five categories: customer service, chit-chat, education, personal assistant, and question answering. Their main functions differ: question-answering chatbots mainly serve users' information queries, whereas chit-chat chatbots converse with users and offer emotional comfort and companionship. By reply method, chit-chat chatbots are further divided into retrieval-based and generation-based. Both share a common problem: conversations with users easily reach a deadlock, which interrupts the chat and harms the user experience. A retrieval-based chatbot builds topics and their corresponding answers in a knowledge base in advance; when the user raises a topic, it finds the most similar entry in the knowledge base and returns the associated reply. Such chatbots tend to reach a deadlock for two reasons. (1) No entry in the knowledge base matches the user's topic, in which case the chatbot replies with the most probable sentence in the knowledge base, typically a generic reply such as “mm-hmm” or “I see”. (2) The knowledge base does contain an answer to the user's topic, but the reply is of poor quality or does not interest the user. A generation-based chatbot trains a neural network on dialogue corpora to generate replies word by word; its replies are often disfluent, that is, of poor quality, which likewise leads to a deadlock. To address deadlocks caused by poor reply quality and by replies that do not interest the user, this paper proposes the following. (1) User interests are collected by combining collaborative filtering with keyword extraction. Keyword extraction collects the user's interests so that the chatbot can chat around them, which avoids deadlocks caused by topics the user does not care about. Collaborative filtering is used to expand those interests: other users' chat histories may also contain interests of the current user, so collaborative filtering can extend the user's interest set and improve the recommendations. (2) External hot topics (Baidu hot topics and Weibo hot searches) are combined with the user's interests to generate topics the user is likely to find interesting; when the chat reaches a deadlock, these topics are recommended to break it and improve the user experience. Determining whether the chat has reached a deadlock is another focus of this paper, since it decides whether a recommendation can be triggered at all. The paper therefore studies short-text similarity algorithms for deadlock detection: the relevance between the user's utterance and the chatbot's reply is computed, and a deadlock is judged from how high or low that relevance is. Finally, comparative experiments that use the number of sustained dialogue turns as the evaluation metric verify that the proposed method is feasible.
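The pipeline summarized in the abstract (deadlock detection by short-text similarity, interest expansion by collaborative filtering, and hot-topic recommendation) can be illustrated with a minimal Python sketch. The abstract does not specify the algorithms, so everything below is an assumption made for illustration: character-bigram cosine similarity stands in for the short-text similarity measure, user-based collaborative filtering over keyword counts stands in for the interest-expansion step, and the function names, the threshold of 0.1, and the sample data are all hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Set


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def char_bigrams(text: str) -> Counter:
    """Character-bigram counts (assumed stand-in for a short-text representation)."""
    s = "".join(text.lower().split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def is_deadlocked(user_utterance: str, bot_reply: str, threshold: float = 0.1) -> bool:
    """Flag a deadlock when the reply is barely related to the user's utterance.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return cosine(char_bigrams(user_utterance), char_bigrams(bot_reply)) < threshold


def expand_interests(target: str, profiles: Dict[str, Counter], k: int = 2) -> Counter:
    """User-based collaborative filtering over keyword-count profiles.

    Keywords of the k most similar users that the target lacks are added,
    weighted by the similarity between the two users.
    """
    me = profiles[target]
    neighbours = sorted(
        ((cosine(me, p), user) for user, p in profiles.items() if user != target),
        reverse=True,
    )[:k]
    expanded = Counter(me)
    for sim, user in neighbours:
        if sim <= 0:
            continue  # users with no shared keywords contribute nothing
        for keyword, count in profiles[user].items():
            if keyword not in me:
                expanded[keyword] += sim * count
    return expanded


def recommend(interests: Counter, hot_topics: Dict[str, Set[str]], n: int = 1) -> List[str]:
    """Rank hot topics by the summed interest weight of their keywords."""
    scored = [
        (sum(interests.get(kw, 0) for kw in keywords), title)
        for title, keywords in hot_topics.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for score, title in scored if score > 0][:n]


# Hypothetical data: keyword profiles extracted from each user's chat history.
profiles = {
    "alice": Counter({"football": 3, "movies": 1}),
    "bob": Counter({"football": 2, "basketball": 2}),
    "carol": Counter({"cooking": 4}),
}
# Hypothetical hot topics, each tagged with keywords.
hot_topics = {
    "NBA finals tonight": {"basketball"},
    "New hotpot recipe": {"cooking"},
}

if is_deadlocked("Did you watch the football match?", "Mm-hmm, I see."):
    interests = expand_interests("alice", profiles, k=1)
    print(recommend(interests, hot_topics))
```

In this sketch a recommendation is only attempted once the similarity check reports a deadlock, mirroring the order described in the abstract. A production system would likely replace the bigram similarity with a stronger short-text model and use a proper keyword extractor to build the profiles.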
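Cite this article: Li Huan, Xu Hui. Chatbot Topic Recommendation Based on Collaborative Filtering[J]. Artificial Intelligence and Robotics Research, 2020, 9(2): 154-162. https://doi.org/10.12677/AIRR.2020.92018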
