Research Proposal - HongyuJiang/Persona-Chatbot GitHub Wiki

Dialogue systems, or chatbots, have received growing attention in recent years. They are designed to generate human-like responses to a given input, usually text or speech, and so to simulate interaction between people. Chatbots are used mainly by businesses and other organizations, including government, non-profit, and private organizations. Their applications range from customer service, product advice, and product inquiries to personal assistants, and they carry high commercial value. For instance, chatbots (also called intelligent customer service agents or virtual assistants) that support banking and financial services can provide accurate, personalized consultation to users through text or speech dialogue. They significantly reduce the cost of a manual call center and learn user preferences and habits more accurately. They can also help service providers identify customers' potential needs and increase user retention and conversion rates. The development of dialogue systems is generally considered to begin with the classic chatbot ELIZA (the term "Chatbot" itself was first mentioned by Michael Loren Mauldin in 1994). Since the birth of ELIZA, the technical development of dialogue systems can be divided into three main stages:

The 1st generation: based on symbolic rules and templates. It has been in use since the beginning of the 1980s and is still used in many fields. It relies mainly on grammar rules and ontology designs created manually by domain experts. This technique is easy to explain and repair. However, it depends heavily on expert systems and is hard to transfer from one domain to another. Data is used to help design the rules rather than to learn features, so applications are limited to narrow domains;
The 2nd generation: data-driven shallow learning, which started in the 1990s (the study of reinforcement learning for dialogue strategy also started at this time). It is currently the mainstream technique in commercial products, and it designs the dialogue system by learning statistical features of the data. However, it is hard to understand, repair, and extend, and the weak representational ability of its models limits it. Moreover, because it is not end-to-end, it is challenging to scale up;
The 3rd generation: data-driven deep learning, which started in recent years and is the current mainstream in research. Like the 2nd generation, it learns its parameters from data. However, neural models have powerful representational ability, and end-to-end learning makes dialogue systems more feasible to build (that is, the system is completely data-driven and the model is a black box). Unfortunately, there is no successful business case yet.

Response generation in current dialogue systems draws on a range of techniques. In search-based techniques, the chat agent scans the input for query keywords and then retrieves relevant answers based on the query string. This approach relies on keyword similarity. It retrieves text from internal or external data sources, including the WWW or an organization's databases. Some advanced chatbots are developed using natural language processing (NLP) techniques and machine learning algorithms. In addition, many commercial chat engines help build chatbots from the client's inputs. However, only rule-based and retrieval-based approaches have proved usable in real applications; dialogue systems based on simple machine learning do not work well enough. Moreover, the functions of these dialogue systems are strictly limited, and they show little flexibility or diversity in real conversations. Current intelligent personal assistants such as Amazon's Alexa, Microsoft's Cortana, and Google's Google Assistant are developed with search-based methods, which are highly restrictive, so they cannot converse like a human being.
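
The keyword-similarity retrieval described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `FAQ` entries, the `retrieve` function, the Jaccard overlap measure, and the `threshold` value are hypothetical and are not taken from any particular chat engine.

```python
import re

# Hypothetical FAQ store: each entry pairs a stored question with its answer.
FAQ = [
    ("how do i reset my password",
     "Use the 'Forgot password' link on the login page."),
    ("what are your opening hours",
     "We are open from 9:00 to 17:00 on weekdays."),
    ("how can i check my account balance",
     "Your balance is shown on the account overview page."),
]

FALLBACK = "Sorry, I do not understand the question."


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Keyword similarity: size of the overlap divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query, faq=FAQ, threshold=0.2):
    """Return the answer whose stored question best matches the query keywords.

    Falls back to a fixed reply when no stored question is similar enough,
    which is one reason such systems feel rigid in open conversation.
    """
    query_tokens = tokenize(query)
    best_score, best_answer = 0.0, None
    for question, answer in faq:
        score = jaccard(query_tokens, tokenize(question))
        if score > best_score:
            best_score, best_answer = score, answer
    if best_score < threshold:
        return FALLBACK
    return best_answer
```

Calling `retrieve("How can I reset my password?")` selects the password entry because it shares the most keywords with the query, while an unrelated input such as `retrieve("Tell me a joke")` falls through to the fixed fallback reply.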

To make text generation in dialogue systems more flexible, deep learning-based methods are gradually being introduced. These methods learn essential feature representations and response generation strategies from a large corpus. They can be applied to open-domain scenarios and show potential for the future. To build a simple dialogue system, developers only need to set up the network and its parameters; the system can then generate responses without predefined answers. An example is Google's Neural Machine Translation (NMT) model, which uses sequence-to-sequence (Seq2Seq) modeling and an encoder-decoder architecture. The encoder-decoder uses a recurrent neural network with bidirectional LSTM (long short-term memory) units, which is well suited to generating sentences in question-and-answer form. Building on this, to improve the diversity and authenticity of dialogue, this work aims to embed human personality traits into the dialogue system to make it more human-like.

Personality is a set of mental traits or types that explain and predict a person's patterns of thought, feeling, and behavior. It remains relatively stable over time and across environments. It is a source of human diversity and can affect the likelihood of success in relationships, work, and learning. For measuring and describing human personality, the "Big Five" personality traits model (OCEAN) is a standard in personality psychology; it uses multiple indicators to describe differences in personality traits numerically. To personalize the dialogue system, we can calculate the correlation between a wide range of linguistic variables and the Big Five traits, relating different personality traits to a person's preferences in vocabulary, grammar, tone, and even the style of sentence generation, and from this derive a persona language system. Language is influenced not only by personality but also by the speaker's emotions. The PAD (Pleasure, Arousal, Dominance) model is a mature way to describe human emotions. It has also been applied to consumer marketing strategies and to the emotional expression of animated characters in virtual worlds. Hence, this study also aims to describe the speaker's emotions with the PAD model and to calculate the correlation of linguistic variables with personality traits and emotions.
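
As a rough illustration of the correlation step, the sketch below computes the Pearson correlation coefficient between one linguistic variable and one trait score. The proposal does not say which correlation measure will be used, so Pearson's r is an assumption here, and the function name `pearson` and the sample numbers in the usage note are placeholders, not measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For example, given one extraversion score per speaker and that speaker's rate of exclamation marks per 100 words (say `[2.0, 3.5, 4.0, 4.5]` and `[0.5, 1.0, 1.8, 2.1]` as made-up values), `pearson` returns a value between -1 and 1, and a value near 1 would suggest that the linguistic variable rises with the trait. Repeating this for each pair of linguistic variable and trait (or PAD dimension) gives a correlation table.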

How to embed personality traits into a deep learning model (Seq2Seq) is also a hot topic in academia. Researchers are working to make dialogue behavior more human-like, with each agent having its own language system. A popular method takes conversations between people with different personalities as the model's input, so that the model learns each speaker's speech style and word preferences and finds the hidden personality variables that profoundly affect the language system. When the model receives a new question together with personality traits, it can generate an answer that matches the respondent's personality. This method offers both flexibility and diversity, but it cannot adjust the language style flexibly, because deep learning models are hard to understand and explain, because it is weak at dynamically adjusting the dialogue strategy based on the user's information, and because it has difficulty handling ambiguous items in the user's request.

To improve the diversity and flexibility of the dialogue system and to let users control its personality, our work will focus on embedding personality into the dialogue system using deep learning methods. At present, personality is hard to describe mathematically, and personality traits are hard to evaluate objectively; our exploratory work hopes to change this. In this work, the Seq2Seq and attention models will be introduced and trained on a personality corpus (which includes thousands of rounds of conversation from people with various personalities). The attention model helps search the input content, acting as an encoding step used specifically for generating the target words. It removes the limitation of the traditional model, in which the input sequence is forced into a fixed-length vector; at the same time, when the model generates the next target word, it focuses on the related information in the input.
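
The idea of focusing on related input positions when generating each target word can be sketched with dot-product attention. This is a minimal illustration that assumes dot-product scoring and plain Python lists for vectors; the proposal does not specify the attention variant, and the names `softmax` and `attend` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    decoder_state:  vector for the word currently being generated
    encoder_states: one vector per input position, all of the same size
    """
    # Score each input position by its dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context
```

Because the context vector is recomputed at every decoding step from all encoder states, the decoder is not restricted to a single fixed-length summary of the input, and the weights show which input positions each generated word relied on.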

We assume that people with different personalities pay different amounts of attention to the various elements of a dialogue sentence. For example, sensitive people are more concerned with the tone of the dialogue, while straightforward people care more about its results (the nouns and verbs in sentences). In this work, we try to use the attention model to characterize the relationship between personality features and language systems, fitting attention models to different personalities. To address the low diversity and low information content of generated dialogue, this work will introduce the Beam Search method to improve the diversity of the dialogue system and reduce the frequency of meaningless responses.
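
A minimal sketch of beam search follows. It assumes a toy next-word table in place of a trained decoder: `NEXT`, its probabilities, and the `<s>`/`</s>` markers are hypothetical values chosen only to show how keeping several hypotheses differs from greedy decoding.

```python
import math

# Hypothetical toy model: probability of each next word given the last word.
NEXT = {
    "<s>": {"i": 0.6, "yes": 0.4},
    "i": {"see": 0.5, "know": 0.5},
    "yes": {"indeed": 0.9, "</s>": 0.1},
    "see": {"</s>": 1.0},
    "know": {"</s>": 1.0},
    "indeed": {"</s>": 1.0},
}


def beam_search(next_probs, beam_width=2, max_len=5, start="<s>", end="</s>"):
    """Return hypotheses as (tokens, log_probability), best first."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next word.
        candidates = []
        for tokens, score in beams:
            for token, prob in next_probs[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the beam_width most probable extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len
    return sorted(finished, key=lambda c: c[1], reverse=True)
```

With `beam_width=1` the search is greedy: it commits to "i" because it is the most probable first word, and ends with a sentence of probability 0.3. With `beam_width=2` it also keeps "yes" alive and finds "yes indeed", whose overall probability of 0.36 is higher. In a real system the table would be replaced by the decoder's output distribution at each step.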

Building on the above work and taking the dialogue system as a case study, we try to add the personality factor to different intelligent agents (such as robots, mobile phones, and computers), enabling smart devices to proactively match the styles of different users and thereby improve user satisfaction and experience. This work then proposes the concept of ubiquitous personality and discusses the application value of personality for different intelligent agents, its value to society, and a roadmap for future development. Personalized intelligence can make cold machines more human-like and create more possibilities for the future world.

In general, the specific research route of this work is as follows:

Personality detection based on a personal corpus;
Personalized conversation generation using Seq2Seq + attention models;
Embedding the personality traits in smart devices.