Subword Semantic Hashing for Intent Classification on Small Datasets

Document identifier: oai:DiVA.org:ltu-76841
Access full text here: 10.1109/IJCNN.2019.8852420
Keywords: Natural Sciences, Computer and Information Sciences, Computer Sciences, Natural Language Processing, Intent Classification, Chatbots, Semantic Hashing, Machine Learning, State-of-the-art
Publication year: 2019
Abstract:

In this paper, we introduce the use of Semantic Hashing as an embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry, state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome this challenge and learn robust text classification. Current word-embedding-based methods [11], [13], [14] depend on vocabularies. One of their major drawbacks is out-of-vocabulary terms, especially when the training dataset is small and the vocabulary in use is wide. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise with the use of internet communication. First, such datasets lack many of the vocabulary terms needed to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, models for intent classification are not trained with spelling errors, and it is difficult to anticipate the ways in which users will make mistakes. Models that depend on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: Chatbot, Ask Ubuntu, and Web Applications [3]. Our benchmarks are available online.
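The abstract does not spell out the hashing step, so the following is only a minimal sketch of one common subword scheme, assuming each word is wrapped in `#` markers and split into character trigrams. The function names `subword_hashes` and `jaccard`, the trigram size, and the example sentences are illustrative assumptions, not the authors' released code. It shows why a misspelled word still shares most of its features with the correct spelling, whereas a word-level vocabulary would treat it as unknown.

```python
def subword_hashes(text, n=3):
    """Return character n-grams of each token after wrapping it in '#'.

    Assumption: lowercasing and whitespace tokenization stand in for
    whatever preprocessing the original system applies.
    """
    grams = []
    for token in text.lower().split():
        wrapped = f"#{token}#"
        # Slide a window of width n across the wrapped token.
        grams.extend(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    return grams


def jaccard(a, b):
    """Jaccard overlap between two feature collections (as sets)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


correct = "restart ubuntu"
typo = "restrat ubuntu"  # hypothetical user input with a spelling error

# Overlap when whole words are the features versus subword trigrams.
word_overlap = jaccard(correct.split(), typo.split())
subword_overlap = jaccard(subword_hashes(correct), subword_hashes(typo))

print(subword_hashes("hello"))
print(f"word-level overlap:    {word_overlap:.2f}")
print(f"subword-level overlap: {subword_overlap:.2f}")
```

In this sketch the misspelled query keeps a larger share of its features at the subword level than at the word level, which is the property the abstract appeals to when it says a classifier should handle spelling errors inherently. The resulting trigram lists could then be fed to any bag-of-features vectorizer and classifier.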

Authors

Kumar Shridhar

MindGarage, Technical University Kaiserslautern, Germany

Ayushman Dash

MindGarage, Technical University Kaiserslautern, Germany

Amit Sahu

MindGarage, Technical University Kaiserslautern, Germany

Gustav Grund Pihlgren

Luleå tekniska universitet; EISLAB

Pedro Alonso

Luleå tekniska universitet; EISLAB

Vinaychandran Pondenkandath

University of Fribourg, Switzerland

György Kovács

Luleå tekniska universitet; EISLAB

Foteini Simistira

Luleå tekniska universitet; EISLAB; University of Fribourg, Switzerland

Marcus Liwicki

Luleå tekniska universitet; EISLAB; MindGarage, Technical University Kaiserslautern, Germany; University of Fribourg, Switzerland