
NSU Vestnik. Series: Linguistics and Intercultural Communication


Text Vectorization Methods for Retrieval-Based Chatbot

https://doi.org/10.25205/1818-7935-2020-18-3-16-34

Abstract

The field of dialogue systems and conversational agents is one of the most rapidly growing research areas in applied artificial intelligence. Business and industry are showing increasing interest in building intelligent conversational agents into their products. Chatbots are already used in industry, banking, healthcare, and education, and the number of applications keeps growing year by year. Many recent studies have focused on the possibility of creating intelligent bots that not only help users accomplish specific tasks (by identifying their intents from text or voice conversations using artificial intelligence), but also capture the user’s identity, attributes, engagement data, and any feedback the user provides, in order to better handle a wide variety of conversational topics while imitating human-like behavior. In this paper, we review recent progress in developing intelligent conversational agents (chatbots) and their current architectures (rule-based, retrieval-based, and generative models), and discuss the main advantages and disadvantages of each approach. Additionally, we conduct a comparative analysis of state-of-the-art text vectorization methods (i.e. word and sentence embeddings), which we apply in an experimental implementation of a retrieval-based chatbot. The results of the experiment are reported as the quality of chatbot response selection, measured with various R10@k metrics. We also focus on the features of open data sources providing dialogues in Russian. The natural language processing (NLP) techniques applied to the collected dialogue data are described. Both the final dataset and the program code are published. We also discuss the issues of assessing the quality of chatbot response selection, emphasizing in particular the importance of choosing a proper evaluation method.
We also demonstrate examples of chatbot dialogues implemented using the text vectorization models that showed the best performance (TF-IDF-weighted Word2Vec embeddings and LASER sentence embeddings). Our future research is also briefly described.
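
To illustrate the kind of evaluation the abstract refers to, the following Python sketch computes a recall-at-k score for response selection. It assumes the common reading of R10@k: each dialogue context comes with 10 candidate responses, exactly one of which is correct, and the score is the share of contexts for which the correct response is ranked among the top k. The abstract does not define the measure, so this reading is an assumption. The function names `cosine` and `recall_at_k`, the cosine-similarity ranking, and the toy vectors are likewise illustrative and are not taken from the authors' published code.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(examples, k):
    """Share of examples whose true response is ranked in the top k.

    examples: list of (context_vec, candidate_vecs, true_index) tuples,
    where candidate_vecs holds the vectors of all candidate responses
    (10 per context in the assumed R10@k setting).
    """
    if not examples:
        return 0.0
    hits = 0
    for context_vec, candidate_vecs, true_index in examples:
        scores = [cosine(context_vec, c) for c in candidate_vecs]
        # Candidate indices ordered from most to least similar to the context.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(examples)


# Toy data with 2 candidates per context (hypothetical vectors, for illustration only).
toy_examples = [
    ([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0),  # true response ranked first
    ([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], 0),  # true response ranked second
]
r_at_1 = recall_at_k(toy_examples, 1)
r_at_2 = recall_at_k(toy_examples, 2)
```

In a real experiment the vectors would come from the embedding model under comparison (for example, averaged word vectors or sentence embeddings), so the same scoring function can be reused across vectorization methods.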

About the Authors

Y. A. Zherebtsova
ITMO University
Russian Federation


A. V. Chizhik
ITMO University; Saint Petersburg State University
Russian Federation


Review

For citations:


Zherebtsova Y.A., Chizhik A.V. Text Vectorization Methods for Retrieval-Based Chatbot. NSU Vestnik. Series: Linguistics and Intercultural Communication. 2020;18(3):16-34. (In Russ.) https://doi.org/10.25205/1818-7935-2020-18-3-16-34



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7935 (Print)