Development of a Basic Chemistry Conversational Corpus

Citation: Ling, MHT, Musttakim, S, Lau, PN. 2023. Development of a Basic Chemistry Conversational Corpus. Acta Scientific Nutritional Health 7(2): 48-54.

Links to the [abstract] and [video].

Here are the permanent [PDF] and [Data Set] links to my archive.

Chatbot technology can be an important tool and supplement to education, leading to explorations in this area. Corpus-based chatbot building has a relatively low entry barrier as it only requires a relevant corpus to train a chatbot engine. The corpus is a set of human-readable questions and answers, and may be an amalgamation of existing corpora. However, a suitable chemistry-based chatbot corpus catering for a freshman general chemistry course addressing inorganic and physical chemistry has not been developed. In this study, we present a basic chemistry conversational corpus consisting of 998 pairs of questions and answers, focused on a freshman general chemistry course addressing inorganic and physical chemistry. Ten human raters evaluated the responses of a chatbot trained on the corpus, and their ratings suggest that the corpus resulted in better responses than random (t = 17.4, p-value = 1.86E-53). However, only 20 of the 50 test questions showed better responses compared to random (difference in mean score ≥ 1.9, paired t-test p-value ≤ 0.0324), suggesting that the corpus provides better responses to certain questions rather than better responses overall, with questions related to definitions and computational procedures answered more accurately. Hence, this corpus provides a baseline for future corpus development.
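
To illustrate what a corpus of question and answer pairs can look like in use, here is a minimal sketch of a retrieval-style responder in Python. It is not the chatbot engine used in the paper, which the abstract does not name. It assumes a simple string-similarity lookup using `difflib` from the standard library. The three question and answer pairs, the `normalize` and `answer` helpers, and the `cutoff` value are all hypothetical and are not drawn from the published 998-pair corpus.

```python
import difflib

# Hypothetical question-answer pairs for illustration only;
# these are NOT taken from the published 998-pair corpus.
CORPUS = {
    "what is an atom": (
        "An atom is the smallest unit of an element that retains "
        "the chemical properties of that element."
    ),
    "what is a mole": (
        "A mole is the amount of substance containing about "
        "6.022e23 elementary entities."
    ),
    "how do i calculate molarity": (
        "Divide the moles of solute by the volume of solution in litres."
    ),
}


def normalize(text):
    """Lowercase the text and drop punctuation so near-identical questions match."""
    kept = [ch for ch in text.lower() if ch.isalnum() or ch.isspace()]
    return "".join(kept).strip()


def answer(question, corpus=CORPUS, cutoff=0.6):
    """Return the answer paired with the most similar stored question.

    Returns None when no stored question reaches the similarity cutoff
    (an assumed threshold, not a value from the paper).
    """
    matches = difflib.get_close_matches(
        normalize(question), list(corpus), n=1, cutoff=cutoff
    )
    return corpus[matches[0]] if matches else None
```

With these sample pairs, `answer("What is a mole?")` returns the stored mole answer, and a question that resembles none of the stored ones returns `None`. A real engine would likely use a more robust matching method, but the corpus itself would still be a set of human-readable question and answer pairs.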