Context and Sequentiality in Conversational Applications

Contextual memory plays a central role in any type of interaction between a Chatbot and its user. Contextual communication is a bidirectional transfer of information in which the interlocutors are aware of the relational, environmental and cultural context of the exchange. I will show some examples of how a context-based system can improve the flow of a dialog.

Deep Learning in Text Classification

In the Divine Comedy, Minos is the demon appointed to judge souls at the entrance of Hell. He listens to their sins and shows each soul its destination by wrapping his tail around himself as many times as the number of the circle it is assigned to. The figure is emblematic of machine learning classification, where an entity is identified as belonging to one category or another. Rather than condemning souls to endless torment, the harmless tool I am describing judges whether a user's utterance belongs to a specific intention, or to one of a limited range of emotions. In other words, it can serve intention recognition and sentiment analysis.

In the realm of conversational commerce, the examined sentence could be:

I want to buy some apples and pears

The system recognizes the search intention and presents the results.

Intention prediction is far from an untackled problem, and the market offers plenty of services from players such as Google (Api.ai), Facebook (Wit.ai) and Microsoft (Luis.ai), to name just a few. This shouldn't discourage further exploration of the topic, which sometimes brings unexpected positive surprises, as shown in the graph.

The test was performed on the real data used to train the model deployed in the Chatbot system, so the results are relevant to the real working scenario: no cherry-picking in this case. The dataset consists of 300 training samples and 56 test samples spread across 25 classes.
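Whatever metric the comparison uses, a test set this small limits how finely results can be told apart. With 56 test samples, one extra misclassification moves accuracy by 1/56, about 1.8 percentage points. On average there are 12 training samples and just over 2 test samples per class. The `accuracy` helper below is illustrative and is not Minos's evaluation code:

```python
from typing import Sequence

TRAIN_SAMPLES, TEST_SAMPLES, CLASSES = 300, 56, 25

def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of test utterances whose predicted intent matches the gold label (illustrative helper)."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)

# A single test sample changes accuracy by 1/56 of the total.
step_points = 100 / TEST_SAMPLES
print(f"accuracy step per test sample: {step_points:.2f} points")          # 1.79
print(f"mean training samples per class: {TRAIN_SAMPLES / CLASSES:.1f}")  # 12.0
print(f"mean test samples per class: {TEST_SAMPLES / CLASSES:.2f}")       # 2.24
```

Because of this granularity, differences of one or two points between two classifiers on this test set amount to one or two utterances.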

Minos, the text classifier, uses an ensemble of machine learning models: it combines multiple classifiers to get a good prediction for the utterances submitted to Charly. One of the models is based on Convolutional Neural Networks (CNN).
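The article does not describe how Minos combines its models. One common way to combine classifiers is soft voting: add up the class probabilities each model returns (optionally weighted) and pick the top intent. Summing and averaging produce the same winner. The following is a minimal sketch under two assumptions: each model is a callable that returns an intent-to-probability dict, and the toy models and intent names are hypothetical:

```python
from typing import Callable, Dict, List, Optional

# A model maps an utterance to a probability for each intent.
Model = Callable[[str], Dict[str, float]]

def soft_vote(models: List[Model], utterance: str,
              weights: Optional[List[float]] = None) -> str:
    """Sum each model's (weighted) intent probabilities and return the top intent."""
    if weights is None:
        weights = [1.0] * len(models)
    totals: Dict[str, float] = {}
    for model, weight in zip(models, weights):
        for intent, prob in model(utterance).items():
            totals[intent] = totals.get(intent, 0.0) + weight * prob
    return max(totals, key=totals.get)

# Hypothetical stand-ins for real ensemble members (e.g. a CNN and a keyword model).
def cnn_model(text: str) -> Dict[str, float]:
    return {"search": 0.6, "payment": 0.4}

def keyword_model(text: str) -> Dict[str, float]:
    if "buy" in text.lower():
        return {"search": 0.9, "payment": 0.1}
    return {"search": 0.2, "payment": 0.8}

print(soft_vote([cnn_model, keyword_model], "I want to buy some apples and pears"))  # search
```

Weights let a more reliable model count for more. Hard majority voting over the predicted labels is the other common option.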

CNN in NLP

CNNs are mostly applied to image recognition, thanks to their tolerance to translation (and, to some degree, to small distortions) and to the compositionality principle (entities are composed of their constituents). Admittedly, applying a CNN to text might appear counter-intuitive at first, because text looks very different from images:

1. The order of words in a text is not as important as the order of pixels in an image.
2. Humans perceive text sequentially, not in convolutions.

Invariance

Images and text should be compared at the right level of granularity. The smallest atomic element of text is the single character rather than the word, just as the pixel is in images. The proportion is rather:

text : char = image : pixel

From this point of view, the order of characters in sentences is fundamental. Convolutions over text build up features as

single words => bi-grams (two adjacent words) => n-grams

just like graphical features such as

lines, corners => mouths, eyes => faces

emerge from portraits.
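A small helper makes concrete the units that a text convolution slides over. It moves a window of size n along a sequence. Applied to a list of words, it yields word n-grams. Applied to a string, it yields character n-grams. This is a minimal sketch that assumes naive whitespace tokenization; a real pipeline would use a proper tokenizer:

```python
from typing import List, Sequence, Tuple

def ngrams(items: Sequence, n: int) -> List[Tuple]:
    """Every contiguous window of n items, left to right, like a filter's receptive field."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

sentence = "I want to buy some apples and pears"
words = sentence.lower().split()  # naive whitespace tokenization (assumption)

print(ngrams(words, 1)[:3])  # single words: [('i',), ('want',), ('to',)]
print(ngrams(words, 2)[:3])  # bi-grams: [('i', 'want'), ('want', 'to'), ('to', 'buy')]
print(["".join(g) for g in ngrams("apples", 3)])  # character tri-grams: ['app', 'ppl', 'ple', 'les']
```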

In a CNN, an adjective + noun pair, for example, can be recognized regardless of its position, at the beginning or at the end of a sentence, exactly as a face is recognized wherever it is located in the picture.
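In text CNNs, position invariance comes from two steps: the same filter is applied at every position, and the resulting activations are max-pooled over time. This is a pure-Python sketch with a single bi-gram filter. The 2-dimensional word vectors and the filter weights are invented for illustration and are not learned values:

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-d embeddings: dimension 0 ~ "adjective-ness", dimension 1 ~ "noun-ness".
EMB: Dict[str, Vector] = {
    "red": [1.0, 0.0], "apples": [0.0, 1.0],
    "i": [0.1, 0.1], "want": [0.2, 0.0], "some": [0.0, 0.2], "please": [0.1, 0.0],
}

# One bi-gram filter: a weight vector for word t and one for word t+1 (invented values).
FILTER: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def conv_bigram(words: List[str]) -> List[float]:
    """Apply the same filter at every position (weight sharing), followed by ReLU."""
    vecs = [EMB[w] for w in words]
    return [max(0.0, dot(FILTER[0], vecs[t]) + dot(FILTER[1], vecs[t + 1]))
            for t in range(len(vecs) - 1)]

def max_over_time(features: List[float]) -> float:
    """Keep the strongest activation and forget where it occurred."""
    return max(features)

start = "red apples i want some please".split()
end = "i want some please red apples".split()
print(max_over_time(conv_bigram(start)), max_over_time(conv_bigram(end)))  # 2.0 2.0
```

The filter fires at position 0 in one sentence and at position 4 in the other. After pooling, both sentences produce the same feature value.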

Sequentiality

It might seem more intuitive to apply Recurrent Neural Networks (such as LSTMs, possibly combined with attention or arranged as seq2seq models) to text classification, because of the sequential nature of RNN algorithms. I haven't run any tests on them so far, but I would gladly experiment with TreeLSTM. CNNs perform well, and, as George Box put it, “essentially, all models are wrong, but some are useful”: a saying that fits the idea that the final outcome drives decisions and that experimental results play an important role.

Word Embeddings

As in most NLP models, in the CNN words are replaced by their corresponding semantic vectors (word embeddings). The best known are Google's word2vec, GloVe and FastText. I decided to use ConceptNet Numberbatch, which took first place in both subtasks of SemEval 2017 Task 2. Moreover, its vector file is much smaller (~250 MB) than the Google News word2vec one (~1.5 GB), and from an engineering point of view those numbers matter.
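Replacing words with vectors amounts to a dictionary lookup. Numberbatch, like word2vec, is distributed in the plain-text word2vec format: a header line gives the vocabulary size and dimension, and each following line holds one word and its components. This loader is a minimal sketch under two assumptions: the file uses that format, and out-of-vocabulary words map to zero vectors (one simple choice among several). The tiny inline file is made up:

```python
import io
from typing import Dict, List, TextIO

def load_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec text format: a 'count dim' header, then 'word v1 ... vdim' per line."""
    _count, dim = map(int, stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def embed(sentence: str, vectors: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Replace each token with its vector; out-of-vocabulary tokens become zero vectors."""
    return [vectors.get(token, [0.0] * dim) for token in sentence.lower().split()]

# A made-up three-word file with 2-d vectors, standing in for the real (gzipped) download.
sample = io.StringIO("3 2\napples 0.5 0.1\npears 0.4 0.2\nbuy 0.0 0.9\n")
vectors = load_word2vec_text(sample)
print(embed("Buy apples and pears", vectors, dim=2))
```

For the real file, `gzip.open(path, "rt", encoding="utf-8")` returns a text stream that the same function can read. Depending on the release, terms may be plain words or ConceptNet URIs such as `/c/en/apple`, so the lookup key may need adapting.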

Minos is still experimental and not yet well tuned, so there is room for improvement. One aspect that shouldn't be ignored when working with CNNs is catastrophic forgetting, an annoying phenomenon in which training on new data irrevocably wipes out what the network had previously learned.

Automated Question Answering using Semantic Networks

I recently worked on a small prototype that combines NLP analysis and semantic data sources to answer simple generic questions, learning how to retrieve the information from a fairly small number of question/answer pairs.

Conversational interaction is the core of any modern Chatbot, and the ability to manage utterances and conversations is the strongest indicator of user satisfaction. A natural and spontaneous QA dialogue, the kind every Chatbot aims for, has to solve three fundamental issues:

1. Classify utterances and extract dependencies between words.
2. Integrate sources of knowledge.
3. Infer transitive semantics (e.g., reconstruct what is implied but not written).
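Point 3 can be illustrated with a tiny semantic network stored as (subject, relation, object) triples, in the spirit of ConceptNet's IsA edges. Following IsA links transitively answers a question whose answer is never stated directly. The triples and function names are hypothetical:

```python
from collections import deque
from typing import Set, Tuple

# Hypothetical knowledge: no triple states directly that an apple is food.
TRIPLES: Set[Tuple[str, str, str]] = {
    ("apple", "IsA", "fruit"),
    ("fruit", "IsA", "food"),
    ("pear", "IsA", "fruit"),
    ("food", "UsedFor", "eating"),
}

def ancestors(concept: str, relation: str = "IsA") -> Set[str]:
    """All concepts reachable from `concept` by following `relation` edges (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for s, r, o in TRIPLES:
            if s == node and r == relation and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

def is_a(concept: str, category: str) -> bool:
    """Answer 'Is X a Y?' even when only a chain of IsA facts supports it."""
    return category in ancestors(concept)

print(is_a("apple", "food"))  # True, inferred via apple -> fruit -> food
print(is_a("apple", "pear"))  # False
```

The `seen` set keeps the traversal from looping on cyclic graphs, which real semantic networks often contain.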

Machine Comprehension on Chatbots

One of the most requested features in chatbots is the ability to automatically provide helpful information. Users might ask how to pay for purchases online, how to return a defective item, when a purchase will be delivered, or simply what a shop's opening hours are.

One way to implement this feature is to train a sentence classifier on a fixed set of questions the merchant is willing to answer. The system is trained on examples such as: “which credit card do you accept?”, “How do I pay?”, “which payment do you support?” and so on. This simple technique requires a sequence of manual tasks for every conversational agent, such as setting up the training and inference pipeline for questions/answers, or reusing the Natural Language Understanding (NLU) system already adopted by the chatbot, if present.
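For a first prototype, a nearest-example matcher can approximate the per-merchant classifier. Each answer is represented by a few example questions. An incoming question receives the answer of its most similar example, or no answer if the best similarity falls below a threshold. This sketch uses word-overlap (Jaccard) similarity in place of a trained model. The answer keys, the returns examples and the threshold are hypothetical:

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical FAQ: answer key -> example questions.
FAQ: Dict[str, List[str]] = {
    "payment_methods": ["which credit card do you accept?", "How do I pay?",
                        "which payment do you support?"],
    "returns": ["how do I return a defective item?", "can I send an item back?"],
}

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared words divided by all distinct words in either question."""
    return len(a & b) / len(a | b) if a | b else 0.0

def answer_key(question: str, threshold: float = 0.3) -> Optional[str]:
    """Return the FAQ key of the most similar example, or None if nothing is close enough."""
    q = tokens(question)
    best_key, best_score = None, 0.0
    for key, examples in FAQ.items():
        for example in examples:
            score = jaccard(q, tokens(example))
            if score > best_score:
                best_key, best_score = key, score
    return best_key if best_score >= threshold else None

print(answer_key("Which credit cards do you accept?"))  # payment_methods
print(answer_key("What time do you open?"))             # None
```

The threshold lets the bot decline to answer instead of guessing. A trained classifier or embedding-based similarity would handle paraphrases much better than raw word overlap.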

SOA example application

SOA describes a set of patterns for creating loosely coupled, standards-based business-aligned services that, because of the separation of concerns between description, implementation, and binding, provide a new level of flexibility.

Service Oriented Architecture terminology has spread in recent years, at least among people involved in Information Technology. The guidelines suggested by this methodology are regarded as major success factors in many distributed-systems domains. Just as the definition is clear and easy to understand, so is its implementation in a real project: intuitive, concise and elegant.

I have released an application demonstrating how SOA's principles can be applied to a small project, making use of EIP (Enterprise Integration Patterns), IoC (Inversion of Control), a build tool, and a scripting language such as Groovy. I analyzed a simple business case: an entertainment provider that wants to dispatch rewards and bonuses to some of its customers, depending on their service subscriptions. The process sequence is simple:

The task is to provide an implementation of a RewardsService. The service accepts as input a customer account number and a portfolio containing the customer's channel subscriptions. The Customer Status team is currently developing the EligibilityService, which accepts the account number as input.
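In this contract, the service receives its EligibilityService through the constructor (Inversion of Control). That way, a stand-in can replace the real implementation, which is not yet available. The project itself is built on Camel and Spring. This Python version only sketches the shape of the contract. The channel-to-reward table and the reward names are hypothetical:

```python
from typing import Dict, List, Protocol

class EligibilityService(Protocol):
    """The only thing known about the external service: account number in, yes/no out."""
    def is_eligible(self, account_number: str) -> bool: ...

# Hypothetical mapping from channel subscription to reward.
REWARDS_BY_CHANNEL: Dict[str, str] = {
    "SPORTS": "MATCH_TICKET",
    "MUSIC": "CONCERT_VOUCHER",
    "MOVIES": "DVD_COLLECTION",
}

class RewardsService:
    def __init__(self, eligibility: EligibilityService) -> None:
        self._eligibility = eligibility  # injected, never constructed here

    def rewards(self, account_number: str, portfolio: List[str]) -> List[str]:
        """Rewards for the subscribed channels, or none if the customer is not eligible."""
        if not self._eligibility.is_eligible(account_number):
            return []
        return [REWARDS_BY_CHANNEL[c] for c in portfolio if c in REWARDS_BY_CHANNEL]

class AlwaysEligible:
    """Faux eligibility service that answers yes to everyone."""
    def is_eligible(self, account_number: str) -> bool:
        return True

service = RewardsService(AlwaysEligible())
print(service.rewards("ACC-001", ["SPORTS", "NEWS", "MUSIC"]))  # ['MATCH_TICKET', 'CONCERT_VOUCHER']
```

When the Customer Status team delivers the real EligibilityService, only the object passed to the constructor changes.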

I set up an infrastructure to write acceptance tests for this first meaningful feature. This is what could be defined as a “walking skeleton”: a prototype whose essential property is that it can be built, deployed and tested right after being downloaded from GitHub.

RewardService is invoked by the client and, in turn, calls the eligibility service, which in this case is not implemented. Since many real scenarios depend on external services, this proof of concept treats the eligibility service as a black box whose request/response interface is the only thing known.

The unit test simulates the eligibility service's behavior by mocking its endpoint through the Camel Testing Framework. However, if you want to run the application on your local machine, I set up, with a single line of code, a faux eligibility service that merely returns a positive response:
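Replacing a black-box dependency with a test double is not specific to Camel. As a rough Python analogue of the mocked endpoint (not the project's Camel test), `unittest.mock` can stand in for the eligibility service: it returns scripted answers and records the requests it received. The service here is a trimmed, hypothetical version:

```python
import unittest
from typing import List
from unittest.mock import Mock

class RewardsService:
    """Trimmed hypothetical service: grants one reward per channel to eligible customers."""
    def __init__(self, eligibility) -> None:
        self._eligibility = eligibility

    def rewards(self, account_number: str, portfolio: List[str]) -> List[str]:
        if not self._eligibility.is_eligible(account_number):
            return []
        return [f"{channel}_REWARD" for channel in portfolio]

class RewardsServiceTest(unittest.TestCase):
    def test_eligible_customer_gets_rewards(self):
        eligibility = Mock()
        eligibility.is_eligible.return_value = True
        result = RewardsService(eligibility).rewards("ACC-001", ["SPORTS"])
        self.assertEqual(result, ["SPORTS_REWARD"])
        # Only the request/response contract is checked: the account number went in.
        eligibility.is_eligible.assert_called_once_with("ACC-001")

    def test_ineligible_customer_gets_nothing(self):
        eligibility = Mock()
        eligibility.is_eligible.return_value = False
        self.assertEqual(RewardsService(eligibility).rewards("ACC-002", ["SPORTS"]), [])
```

If the code is saved as `test_rewards.py`, `python -m unittest test_rewards` runs both cases.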

The entry point is a RESTful HTTP interface built on Apache CXF, which is easily set up with a few lines of configuration. CXF is initialized by Spring as follows:
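Whatever the CXF/Spring wiring looks like, the entry point's job is to turn an HTTP request into a call on the rewards service and to turn the result into a response. Python's standard WSGI interface is enough to sketch that role. The `/rewards` path, the `account` and `channel` query parameters, the JSON shape and the `rewards_for` stand-in are all assumptions made for illustration:

```python
import json
from typing import Callable, Dict, Iterable, List
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

def rewards_for(account: str, channels: List[str]) -> List[str]:
    """Hypothetical stand-in for the rewards service; a real app would delegate to it."""
    return [f"{c}_REWARD" for c in channels]

def app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
    """GET /rewards?account=...&channel=... -> JSON with the account's rewards."""
    if environ.get("PATH_INFO") != "/rewards" or environ.get("REQUEST_METHOD") != "GET":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    query = parse_qs(environ.get("QUERY_STRING", ""))
    account = query.get("account", [""])[0]
    if not account:
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "missing account"}']
    body = json.dumps({"account": account, "rewards": rewards_for(account, query.get("channel", []))})
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body.encode("utf-8")]

# Call the app in-process, without opening a socket.
environ: Dict = {"PATH_INFO": "/rewards", "QUERY_STRING": "account=ACC-001&channel=SPORTS"}
setup_testing_defaults(environ)  # fills in REQUEST_METHOD=GET and other required keys
captured: Dict[str, str] = {}
response = b"".join(app(environ, lambda status, headers: captured.update(status=status)))
print(captured["status"], response.decode())
```

To serve it for real, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` is enough for local experiments.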

Services are connected by Apache Camel. RewardService holds only a reference to the ESB context, an instance of ProducerTemplate. This solution allows a complete separation between the linking system and the business services. The Camel context represents the SOA's wiring and is configured through a DSL, as in the example below: