Giancarlo Frison Signals from the Noise

Context and Sequentiality in Conversational Applications

Contextual memory in conversational applications plays a central role in any type of interaction between the Chatbot and the user. Contextual communication is a bidirectional transfer of information in which the interlocutors are aware of the relational, environmental, and cultural context of the exchange. I will show some examples of how a context-based system might improve the flow of the dialog.

Consider the following sentence:

If you prefer salty food, tiramisù is a bad dessert to eat.

If we quote only the part that says:

tiramisù is a bad dessert to eat.

The resulting sense is completely misleading. The speaker appears to be saying that tiramisù is a bad dessert for everybody, while it is a bad choice only for those who don’t like sweets and prefer salty food. Even though the quote is taken directly from the original sentence, it omits information you need in order to understand what the speaker is really saying. That omitted information is the context.

Context is the set of circumstances surrounding a message.

In the realm of conversational applications, where users can dialog with Chatbots, context is a fundamental aspect of any type of interaction, from goal-oriented to one-shot tasks. Let’s consider a simple case, where the same sentence appears in different contexts with consequently different outcomes.

Given the sentence Add bread into cart, the system should react according to what the user has previously entered, more precisely:

  • If the user has just searched for a product → drive the user to select the desired product and tap the related ‘Add to Cart’ button.
  • In all other cases → search for bread.
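The rule above can be sketched as a small dispatcher that looks at the previous turn before deciding what “Add bread into cart” means. This is a minimal sketch under assumptions: the `DialogContext` class, the intent labels `search` and `add_to_cart`, and the action strings are hypothetical, not Charly’s actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogContext:
    """Hypothetical per-user dialog state: intents of previous turns, oldest first."""
    history: List[str] = field(default_factory=list)

    def last_intent(self) -> Optional[str]:
        return self.history[-1] if self.history else None


def handle_add_to_cart(product: str, ctx: DialogContext) -> str:
    """Decide what 'Add <product> into cart' means, given the previous turn."""
    if ctx.last_intent() == "search":
        # Search results are already on screen: point the user to the button.
        action = f"select the desired {product} and tap its 'Add to Cart' button"
    else:
        # No search just before: treat the utterance as a search request.
        action = f"search for {product}"
    ctx.history.append("add_to_cart")  # remember this turn for the next one
    return action


print(handle_add_to_cart("bread", DialogContext(history=["search"])))
print(handle_add_to_cart("bread", DialogContext()))  # search for bread
```

Keeping the decision in one function that reads the dialog history makes it possible to add further context rules without touching the intent classifier.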


Context as customer engagement

The sequentiality of the interaction between customer and Chatbot is the key to understanding the user’s intention in any situation. In the online buying process, improving the conversion rate and accelerating checkouts make the real difference between success and failure of e-commerce initiatives. Conversational systems can tie customer engagement and purchase together in a very short cycle. Smart sequencing, when seamlessly embedded in the processes, can lead to new purchase opportunities.

[Figure: Charly context dialog flow]

Exploring contextual repetition

A sequence is a collection of events that might contain repetitions. Repeated inputs might signal unsatisfied or misunderstood requests. In this odd example we simulate a behavioral pattern in which an interlocutor gets upset by an annoying, repetitive progression of unsolicited opinions, as I would rhetorically describe the following conversation:

In this article I summarized why the ability to apply Contextual Intelligence should be an intrinsic skill of any conversational application, in any scenario. It is the ability to automatically adapt responses to what the user is asking for during a conversation. All these example cases are implemented in Charly; how I built them will be the subject of a following article. Stay tuned!

Catalog Entity Extraction for Search

Keyword extraction from search queries is a fundamental aspect of conversational commerce. In this article I illustrate a simple but effective way to get relevant entities from users’ utterances and rank them against an unstructured product catalog and an ontology database.

The primary purpose of a conversational application is to serve user demands, and when a user searches in an e-commerce context, they are mostly looking for products. There is one main distinction between a query performed on a website and one performed in a messaging application. On a website, users submitting a query have already expressed their search intention, so the terms are usually concise and descriptive. Conversely, when querying a Chatbot, users use more expressive forms such as: Could you suggest me pale ale beers and ice creams for my party? While the intention is deduced by a classification task, the terms relevant for search are just a subset of the entire sentence.

A baseline approach would be to take the whole text as the query, returning innumerable hits of everything even remotely relevant and providing little help to customers. Another solution is Named Entity Recognition, a class of algorithms that locate and classify entities, including approaches based on neural networks.

While machine learning techniques can reach high levels of accuracy, they might not be the favorite solution for production usage. They require training data that is hard to obtain, and what works for a specific product segment will not work for another. That is why the following approach could offer the flexibility demanded by real use-case scenarios: it can be plugged into any e-commerce site without any particular adaptation. The method is very simple. I don’t consider structured product features; I take into account only the simple and concise information that can be obtained from the product name.

I want to extract the features that might affect the Chatbot’s answer, based on the quality of the search query.

It is desirable to give straight results when the query is really pertinent to the returned item list, as well as to inform users whenever the query terms do not exactly match what we can offer, or when they ask for something we cannot provide. The desirable features are:

  • Query entities selection. When the query contains more than one entity cluster, the conversational agent detects it and asks the user which entity to search first. In the query above there are two terms, pale ale beers and ice creams, so the Chatbot could answer:

    Are you searching for pale ale beer or ice cream?

  • Partial term matching. The user is told that the exact criterion does not match, and a lower-ranking one is offered instead. Pale ale beer is not in the catalog, but ale beer is.

    We don’t have pale ale beer, but just ale beer. These are our suggestions:…

  • Term out of market segment. Tell the user that the requested item is not sold by this shopping website.

    We do not sell insurance, sorry.
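One way to turn these three behaviors into replies is sketched here, assuming the extraction step has already produced, for each entity cluster, whether it matches the catalog exactly, its best sub-gram match, and whether the ontology knows it. The `EntityMatch` shape, the `choose_reply` function and the two messages not quoted above (the exact-match reply and the no-match reply) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityMatch:
    """Hypothetical result of matching one entity cluster."""
    term: str
    in_catalog: bool                # exact match in the catalog index
    fallback: Optional[str] = None  # longest sub-gram found in the catalog, if any
    in_ontology: bool = False       # known concept outside the shop's segment


def choose_reply(matches: List[EntityMatch]) -> str:
    searchable = [m for m in matches if m.in_catalog or m.fallback]
    if len(searchable) > 1:
        # Query entities selection: let the user pick which cluster to search.
        options = " or ".join(m.term for m in searchable)
        return f"Are you searching for {options}?"
    if searchable:
        m = searchable[0]
        if m.in_catalog:
            return f"Here are our results for {m.term}:"  # hypothetical wording
        # Partial term matching: only a sub-gram of the request is sold.
        return f"We don't have {m.term}, but just {m.fallback}. These are our suggestions:"
    out_of_segment = [m for m in matches if m.in_ontology]
    if out_of_segment:
        # Term out of market segment.
        return f"We do not sell {out_of_segment[0].term}, sorry."
    return "Sorry, I could not find anything matching your request."  # hypothetical


query = [
    EntityMatch("pale ale beer", in_catalog=False, fallback="ale beer"),
    EntityMatch("ice cream", in_catalog=True),
    EntityMatch("party", in_catalog=False),
]
print(choose_reply(query))  # Are you searching for pale ale beer or ice cream?
```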

Indexing and searching tasks

The two fundamental tasks in information retrieval are collecting and storing product information on one side, and retrieving it on the other. The index phase collects features from product names, while the search phase extracts matches from the text query. Both tasks manipulate text in the following ways: entities clusterization, Part Of Speech filtering, and lemmatization.

Entities clusterization

The objective is to isolate every entity, within its own search space (or features), that refines the query. To do that, I use stop words (irrelevant terms such as articles, prepositions, adverbs) and some punctuation marks (full stops, semicolons, exclamation and question marks) to split the entire sentence into word clusters:

could you suggest me pale ale beers and ice creams for my party

In this sentence, originally color-coded by cluster, the words “me”, “and” and “for” are assumed to be stop words, and they split the sentence into the possible entity clusters.
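The splitting step fits in a few lines of Python. This is a minimal sketch, assuming the tiny stop-word set from the example (me, and, for) and lower-casing of the input; a production system would use a full stop-word list.

```python
import re
from typing import List, Set

# Stop words assumed for this example; a real list would contain many more.
STOP_WORDS: Set[str] = {"me", "and", "for"}
# Punctuation that also closes a cluster: full stop, semicolon, ! and ?
SPLIT_PUNCTUATION = re.compile(r"[.;!?]")


def entity_clusters(sentence: str, stop_words: Set[str] = STOP_WORDS) -> List[List[str]]:
    clusters: List[List[str]] = []
    for chunk in SPLIT_PUNCTUATION.split(sentence.lower()):
        current: List[str] = []
        for word in chunk.split():
            if word in stop_words:
                # A stop word ends the current cluster and is itself discarded.
                if current:
                    clusters.append(current)
                current = []
            else:
                current.append(word)
        if current:
            clusters.append(current)
    return clusters


print(entity_clusters("Could you suggest me pale ale beers and ice creams for my party?"))
# [['could', 'you', 'suggest'], ['pale', 'ale', 'beers'], ['ice', 'creams'], ['my', 'party']]
```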

Part Of Speech filtering

The clusters obtained are then filtered by their Part Of Speech (POS) classification. POS tagging labels each word as a noun, verb, adjective, adverb and so on. I explicitly exclude verbs, adverbs and pronouns. This is why “could you suggest” is dropped: it is formed entirely of excluded words. For the same reason, the pronoun “my” disappears from “my party”. The output is:

pale ale beers / ice creams / party
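Once every word has a tag, the filter is straightforward. In this sketch a hand-written dictionary (`TOY_POS`, hypothetical) stands in for a real POS tagger, using Universal-style tags; “could” is tagged as an auxiliary and treated like a verb, and unknown words default to nouns. Both are assumptions made for the example.

```python
from typing import Dict, List

# Hand-written tags standing in for a real POS tagger (hypothetical).
TOY_POS: Dict[str, str] = {
    "could": "AUX", "you": "PRON", "suggest": "VERB", "my": "PRON",
    "pale": "ADJ", "ale": "NOUN", "beers": "NOUN",
    "ice": "NOUN", "creams": "NOUN", "party": "NOUN",
}
# Verbs (including auxiliaries), adverbs and pronouns are dropped.
EXCLUDED_TAGS = {"VERB", "AUX", "ADV", "PRON"}


def pos_filter(clusters: List[List[str]], tags: Dict[str, str] = TOY_POS) -> List[List[str]]:
    filtered = []
    for cluster in clusters:
        kept = [w for w in cluster if tags.get(w, "NOUN") not in EXCLUDED_TAGS]
        if kept:  # a cluster made only of excluded words disappears entirely
            filtered.append(kept)
    return filtered


clusters = [["could", "you", "suggest"], ["pale", "ale", "beers"], ["ice", "creams"], ["my", "party"]]
print(pos_filter(clusters))  # [['pale', 'ale', 'beers'], ['ice', 'creams'], ['party']]
```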

Lemmatization

Lemmatization refers to the process of returning the root form of inflected words, in order to facilitate the analysis and the search of those terms. For example, “Finds” and “found” are grouped together as “find”. In this way, cluster entities are turned into:

pale ale beer(s) / ice cream(s) / party
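A real system would use a dictionary-based lemmatizer (for example NLTK’s `WordNetLemmatizer`). To keep the example self-contained, this sketch swaps in a naive rule-based stand-in: a small table of irregular forms plus plural stripping. It is enough for the example clusters, but it is not a general lemmatizer.

```python
from typing import Dict, List

# Tiny table of irregular forms (hypothetical, far from complete).
IRREGULAR: Dict[str, str] = {"found": "find", "mice": "mouse", "children": "child"}


def lemmatize(word: str) -> str:
    """Naive stand-in for a lemmatizer: irregular forms first, then plural stripping."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # parties -> party
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # beers -> beer, finds -> find
    return word


def lemmatize_clusters(clusters: List[List[str]]) -> List[List[str]]:
    return [[lemmatize(w) for w in cluster] for cluster in clusters]


print(lemmatize_clusters([["pale", "ale", "beers"], ["ice", "creams"], ["party"]]))
# [['pale', 'ale', 'beer'], ['ice', 'cream'], ['party']]
```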

Catalog indexing

Text manipulation, as described above, occurs both when storing the catalog data and when querying.

In the indexing phase, when the whole catalog is scanned, parsed and tokenized, all particles are stored in a Set, a collection of distinct items. To store the presence of a particular cluster efficiently, Bloom filters play a fundamental role.

Bloom Filters

How can we check whether an n-gram is present in the product list? Bloom filters solve the problem of storing a large Set in a vector of fixed, predefined size. An element is converted by hash functions into some numeric values (h), and the bit vector is set to true at each h position.

Bloom filters can compress a large amount of source data in exchange for a degree of uncertainty.

How can the presence of an element in the bit array be validated? Just check whether the vector is true or false at the h positions. If any of them is false, the element is certainly not present; if all are true, the element is present with a certain degree of confidence. The false positive probability depends on the vector length, the number of hash functions and the number of stored elements.
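A minimal Bloom filter in Python, as an illustration: the `num_hashes` positions are derived from salted SHA-256 digests (a choice made for this sketch, not necessarily what the catalog index uses), and membership is answered as “certainly absent” or “probably present”. With m bits, k hash functions and n stored items, the false positive rate is approximately (1 - e^(-kn/m))^k.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size: int = 1024, num_hashes: int = 3) -> None:
        self.size = size              # m: length of the bit vector
        self.num_hashes = num_hashes  # k: positions set per element
        self.bits = [False] * size

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k positions h by salting the digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for h in self._positions(item):
            self.bits[h] = True

    def might_contain(self, item: str) -> bool:
        # False: certainly absent. True: present with high probability.
        return all(self.bits[h] for h in self._positions(item))


catalog = BloomFilter()
for term in ["ale beer", "beer", "ice cream"]:
    catalog.add(term)
print(catalog.might_contain("ice cream"))  # True
```

A stored term is always reported as present; an absent term is usually, but not always, reported as absent.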

N-gram generation

An n-gram is a contiguous sequence of n words. I generate all possible n-grams from each word cluster. The result of this generation, one column per cluster, is:

Beer            Ice cream   Party
pale ale beer   ice cream   party
pale ale        ice
ale beer        cream

Once the n-grams are generated, checking whether one of them is present is fast: just query the catalog Bloom filter. For each entity cluster, we check n-grams starting from the longest, in order to prioritize exactly what the user wants. We also want to know when the exact entity cluster is not present but one of its sub-grams is. Moreover, we need to deal with queries that ask for products or services not offered in the catalog’s market segment.
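The generation and the longest-first lookup can be sketched as follows. The membership test is passed in as a function, so it can be a plain Python set during development or a Bloom filter’s membership check in production; the function names are hypothetical. When two n-grams of the same length are both present, this sketch keeps the left-most one, an arbitrary tie-break.

```python
from typing import Callable, List, Optional


def ngrams_longest_first(words: List[str]) -> List[str]:
    """Every contiguous n-gram of a cluster, from the full cluster down to single words."""
    grams: List[str] = []
    for n in range(len(words), 0, -1):
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    return grams


def best_catalog_match(words: List[str], contains: Callable[[str], bool]) -> Optional[str]:
    """Return the longest n-gram the catalog knows, or None if nothing matches."""
    for gram in ngrams_longest_first(words):
        if contains(gram):
            return gram
    return None


catalog_terms = {"ale beer", "beer", "ice cream"}
print(ngrams_longest_first(["pale", "ale", "beer"]))
# ['pale ale beer', 'pale ale', 'ale beer', 'pale', 'ale', 'beer']
print(best_catalog_match(["pale", "ale", "beer"], catalog_terms.__contains__))  # ale beer
print(best_catalog_match(["party"], catalog_terms.__contains__))  # None
```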

Ontology database

How can we check whether the query contains valid terms that we simply don’t deal with? ConceptNet could be the answer. For this purpose, more than 400K terms from several categories have been collected and indexed, in the same way as the catalog terms, in a separate database.


At the end of this process, the final output looks like this:

entity clusters:
  term: pale ale beer
  catalog: false
    term: ale beer
    catalog: true

  term: ice cream
  catalog: true

  term: party
  catalog: false
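Each cluster could be turned into the structure above given two membership tests: one for the catalog and one for the ontology terms (for example the ConceptNet-derived database). This is a sketch under those assumptions: the function name is hypothetical, `subterm` and `ontology` are hypothetical keys added for the nested match and for the out-of-segment check, and the sub-gram search repeats the longest-first idea so that the block stands alone.

```python
from typing import Callable, Dict, List

Lookup = Callable[[str], bool]


def analyse_cluster(words: List[str], in_catalog: Lookup, in_ontology: Lookup) -> Dict:
    """Describe one entity cluster in the shape of the output above."""
    term = " ".join(words)
    result: Dict = {"term": term, "catalog": in_catalog(term)}
    if result["catalog"]:
        return result
    # Exact cluster missing: look for the longest shorter n-gram the catalog sells.
    for n in range(len(words) - 1, 0, -1):
        for start in range(len(words) - n + 1):
            sub = " ".join(words[start:start + n])
            if in_catalog(sub):
                result["subterm"] = {"term": sub, "catalog": True}
                return result
    # Nothing sold: is it at least a known concept outside our market segment?
    result["ontology"] = in_ontology(term)
    return result


catalog = {"ale beer", "beer", "ice cream"}
ontology = {"party", "insurance"}
for cluster in [["pale", "ale", "beer"], ["ice", "cream"], ["party"]]:
    print(analyse_cluster(cluster, catalog.__contains__, ontology.__contains__))
```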

I have described a simple way of extracting query terms from a raw sentence. This approach provides useful information that a conversational engine can use to accompany search results with meaningful answers. On the other hand, this model doesn’t handle misspellings, which alone account for about 15% of online search failures. Nor does it deal with relatedness or semantic matching: we can’t satisfy the search with relevant and pertinent results whenever customers use terms different from those on the website. I have already solved this problem by means of neural networks, and I will describe it in another article.

Acknowledgment: thank you, Sidi, for the contribution.

Deeplearning in Text Classification

In the Divine Comedy, Minos is a demon appointed to guard the entrance to Hell. He listens to the sins of the souls and indicates their destination by wrapping his tail around himself as many times as the number of the assigned circle. The figure is emblematic of machine learning classification, where an entity is identified as belonging to one category or another. Rather than condemning souls to endless pains, the harmless tool I am describing can judge whether a user’s utterance belongs to a specific intention, or to a limited range of emotions. Namely, it can serve intention recognition and sentiment analysis.

In the realm of conversational commerce, the examined sentence could be:

I want to buy some apples and pears

The system recognizes the search intention and presents the results.

Intention prediction is not an untackled problem, and the market offers plenty of services from players such as Google, Facebook and Microsoft, to mention just a few. But this shouldn’t prevent further exploration of the topic, sometimes with unexpected positive surprises, as shown in the graph.

[Figure: Minos Accuracy]

The test was performed against the real data used to train the deployed model of the Chatbot system, so the results are relevant to the real working scenario: no cherry picking in this case. The dataset consists of 300 training samples and 56 test samples across 25 classes.

Minos, the text classifier, uses an ensemble of machine learning models: it combines multiple classifiers to get good predictions for the utterances submitted to Charly. One of the models is based on Convolutional Neural Networks (CNN).
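The article does not say how Minos combines its classifiers. One common option, shown here purely as an illustration, is weighted soft voting: each model returns a probability per intent, the probabilities are summed with per-model weights, and the highest total wins. The function name, the intent labels and the per-model outputs are hypothetical.

```python
from typing import Dict, List, Optional


def ensemble_predict(probabilities: List[Dict[str, float]],
                     weights: Optional[List[float]] = None) -> str:
    """Weighted soft voting over per-model intent probabilities."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    totals: Dict[str, float] = {}
    for weight, probs in zip(weights, probabilities):
        for intent, p in probs.items():
            totals[intent] = totals.get(intent, 0.0) + weight * p
    return max(totals, key=totals.get)


outputs = [
    {"search": 0.6, "greeting": 0.4},  # hypothetical output of the CNN
    {"search": 0.3, "greeting": 0.7},  # hypothetical second model
    {"search": 0.8, "greeting": 0.2},  # hypothetical third model
]
print(ensemble_predict(outputs))  # search
```

Soft voting lets a confident model outweigh an uncertain one, which hard majority voting on the top label cannot do.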


CNNs are mostly applied to image recognition thanks to their tolerance to translations (and, to some degree, distortions) and to the compositionality principle (entities are composed of their constituents). Admittedly, CNNs might appear counter-intuitive at first, because text looks very different from images:

  1. The order of the words in text is not as important as the order of the pixels in an image.
  2. Humans perceive text sequentially, not in convolutions.


Images and texts should be compared differently. The smallest atomic element in text is the single character rather than the word, just as the pixel is in images. The proportion is more like:

text : char = image : pixel

From this point of view, the order of characters in sentences is fundamental. Convolutions in text build features in the form of:

single word => bi-grams (two adjacent words) => n-grams

just like the graphical features

lines, corners => mouths, eyes => faces

that emerge from portraits.

In a CNN, the pair adjective + object, for example, can be recognized regardless of its position, at the beginning or at the end of a sentence, exactly like a face is recognized wherever it is located in the picture.
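This position invariance can be shown with a toy example: a single width-2 convolution filter (a bigram detector) slides over word vectors, and max-over-time pooling keeps only its strongest response, wherever it occurred. The 2-dimensional embeddings and the filter weights are made up for illustration, and the example works on words rather than characters for readability.

```python
from typing import Dict, List

Vector = List[float]

# Made-up 2-d embeddings: dimension 0 marks adjectives, dimension 1 marks food nouns.
EMB: Dict[str, Vector] = {
    "fresh": [1.0, 0.0], "bread": [0.0, 1.0],
    "i": [0.0, 0.0], "want": [0.0, 0.0], "some": [0.0, 0.0], "please": [0.0, 0.0],
}

# One width-2 filter: responds to an adjective immediately followed by a food noun.
KERNEL: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]


def conv_max_pool(words: List[str], kernel: List[Vector] = KERNEL) -> float:
    vectors = [EMB[w] for w in words]
    width = len(kernel)
    responses = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        # Dot product between the filter and the window of word vectors.
        score = sum(k * x for k_row, x_row in zip(kernel, window) for k, x in zip(k_row, x_row))
        responses.append(max(score, 0.0))  # ReLU
    # Max-over-time pooling: keep the strongest response, forget where it was.
    return max(responses, default=0.0)


print(conv_max_pool("fresh bread please".split()))       # 2.0
print(conv_max_pool("i want some fresh bread".split()))  # 2.0
print(conv_max_pool("bread fresh".split()))              # 0.0
```

Swapping the two words gives no response: the order inside the window still matters, only the window’s position in the sentence does not.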


It might seem more intuitive to apply Recurrent Neural Networks (like LSTM, possibly combined with attention or in Seq2seq architectures) to text classification, due to the sequential nature of RNN algorithms. I haven’t run any tests on them so far, but I would gladly experiment with TreeLSTM. CNN performs well, and, as George Box put it, “Essentially, all models are wrong, but some are useful”: a saying that fits the idea that the final outcome drives the decisions, and that experimental results play an important role.

Word Embeddings

As in most NLP models, in a CNN words are replaced by their corresponding semantic vectors. The most famous embeddings are Google word2vec, GloVe and FastText. I decided to use ConceptNet Numberbatch, which took first place in both subtasks at SemEval 2017 Task 2. Moreover, its vector file is very small (~250M) compared to Google News word2vec (~1.5G), and from an engineering point of view those numbers matter.

Minos is still experimental and not well tuned, so there is room for improvement. One aspect that shouldn’t be ignored when working with CNNs is Catastrophic Forgetting, an annoying phenomenon that can irrevocably ruin the entire training.

Automated Question Answering using Semantic Networks

I recently worked on a small prototype that combines NLP analysis and semantic data sources to answer simple generic questions, learning how to retrieve the information from a fairly small number of question/answer pairs.

Conversational interactions represent the core of any modern Chatbot, and the ability to manage utterances and conversations is the strongest indicator of user satisfaction. A natural and spontaneous QA dialogue, which every Chatbot aims to sustain, has to solve three fundamental issues:

  1. Classify utterances and extract dependencies between words.
  2. Integrate source of knowledge.
  3. Infer transitive semantics (e.g., reconstructing what is implied but not written).

Neural networks are particularly effective in conversational modelling. Architectures like seq2seq have been shown to generate plausible conversational interactions by predicting sentences given the previous sentences in the dialogue. The end-to-end nature of those models is one of their major strengths: they have no hand-crafted rules, since they are trained on large conversational datasets.
Nevertheless, these models can’t incorporate factual information from sources other than the training corpora; the conversational analyzer (1) and the knowledge representation (2) are not distinguishable.

Split language and knowledge domains

The approach I’m going to describe will allow the model to be versatile and applicable to different domains since it can clearly distinguish (1) and (2) as interoperable components of a QA system.

The goal of the prototype is to obtain meaningful answers out of simple questions, by learning how to get the information instead of learning the information. Basically, the machine learning system won’t be instructed merely on which is the right answer, but rather how to find it in a given datasource.

I was inspired by this paper, which uses ConceptNet, an open-source semantic database that gathers information from several sources such as WordNet, DBpedia and Wiktionary. It helps computers understand the meaning of words by relating them to other words. ConceptNet serves as the ontology source and also covers the inference service (3) thanks to its graph-based nature, where entities are connected by semantic relations.

Sentences classification and entities extraction

As can be intuitively deduced, the sentence structure is invariant with respect to the number of entities used for querying the datasource; therefore, the number of training samples needed could be much lower than with the seq2seq approach mentioned previously.

[Figure: NLP dependency tree]

Knowledge crawler

Given the entities extracted from the utterance and the expected answer (or a list of acceptable answers), the crawler should return the shortest navigable path in the knowledge graph that leads to the accepted answer.

Input: nsubj: colour, nmod: sea

Expected answer: blue

Compiled model: [{nsubj}:/r/IsA] and [{nmod}:/r/RelatedTo]
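A strongly simplified sketch of this learning step, limited to one-hop paths: for each extracted entity, it looks for a relation that links the expected answer to that entity in a toy graph. The graph triples and the function name are hypothetical stand-ins for ConceptNet and for the real crawler, which can also follow longer paths with more joins; edges are assumed to point from the answer to the question entity.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (start, relation, end)

# Hypothetical toy graph; edges point from the answer towards the question entity.
GRAPH: List[Triple] = [
    ("blue", "/r/IsA", "colour"),
    ("blue", "/r/RelatedTo", "sea"),
    ("red", "/r/IsA", "colour"),
    ("yellow", "/r/IsA", "colour"),
    ("yellow", "/r/RelatedTo", "sun"),
    ("fish", "/r/RelatedTo", "sea"),
]


def compile_model(entities: Dict[str, str], answer: str,
                  graph: List[Triple] = GRAPH) -> List[Tuple[str, str]]:
    """Map each dependency slot (nsubj, nmod, ...) to a relation reaching the answer."""
    model = []
    for slot, entity in entities.items():
        relations = sorted({r for s, r, e in graph if s == answer and e == entity})
        if not relations:
            raise ValueError(f"no one-hop path between {answer!r} and {entity!r}")
        model.append((slot, relations[0]))  # deterministic choice if several fit
    return model


print(compile_model({"nsubj": "colour", "nmod": "sea"}, "blue"))
# [('nsubj', '/r/IsA'), ('nmod', '/r/RelatedTo')]
```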

The outcome is correlated to the logic of the sentence, and different types of sentences are classified accordingly. However, similar grammar structures can also have different semantics and they should be treated differently. In the case of

Question: what is the capital of germany?

the crawler will find the query with the least number of joins required to get the expected results. It outputs a different model: {nmod}:/r/dbpedia/capital

In this case, a direct relation (dbpedia/capital) univocally describes the expected relation, and it is selected as the best alternative for answering that specific class of questions.

Run the model

Now let’s run the model asking: what is the color of the sun? The inference component first classifies the sentence and associates it with the first model compiled previously. The crawler then searches for entities that fulfill both relations (X IsA color and X RelatedTo sun), and gets the result: yellow.
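Applying a compiled model amounts to intersecting, slot by slot, the sets of nodes that satisfy each relation. Again the toy graph and `run_model` are hypothetical; the model is the one compiled above, written as (slot, relation) pairs, and the extra node “hot” shows why the intersection is needed.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (start, relation, end)

# Hypothetical toy graph standing in for ConceptNet.
GRAPH: List[Triple] = [
    ("blue", "/r/IsA", "color"),
    ("blue", "/r/RelatedTo", "sea"),
    ("yellow", "/r/IsA", "color"),
    ("yellow", "/r/RelatedTo", "sun"),
    ("hot", "/r/RelatedTo", "sun"),
]

# Compiled model: [{nsubj}:/r/IsA] and [{nmod}:/r/RelatedTo]
MODEL: List[Tuple[str, str]] = [("nsubj", "/r/IsA"), ("nmod", "/r/RelatedTo")]


def run_model(model: List[Tuple[str, str]], entities: Dict[str, str],
              graph: List[Triple] = GRAPH) -> Set[str]:
    """Return every node X that satisfies all (X, relation, entity) constraints."""
    candidates: Set[str] = set()
    for i, (slot, relation) in enumerate(model):
        matches = {s for s, r, e in graph if r == relation and e == entities[slot]}
        candidates = matches if i == 0 else candidates & matches
    return candidates


print(run_model(MODEL, {"nsubj": "color", "nmod": "sun"}))  # {'yellow'}
```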

These are some of the possible outcomes:

>Tell me some cities in Italy in front of the sea
Venice, genoa

>What is the capital of Germany?

>Is blue the color of the sea?

>Which lakes can you find in Italy?
Lago como, lago garda

Machine Comprehension on Chatbots

One of the most requested features in chatbots is the ability to automatically provide helpful information. Users might ask how to pay for purchases online, how to return a defective item, when the purchase will be delivered, or just the opening hours of a shop.

One way to implement this feature is to train a sentence classifier on a determined set of questions the merchant is willing to answer. The system should be instructed with examples such as: “which credit card do you accept?”, “How do I pay?”, “which payment do you support?” and so on. This simple technique requires a sequence of manual tasks for every conversational agent, such as setting up the training and inference pipeline for questions/answers, or reusing the Natural Language Understanding (NLU) system already adopted by the chatbot, if present.

A second, more intriguing and sophisticated approach leverages advances in machine comprehension, which is the ability to read text and then answer questions about it automatically. The Stanford NLP Group created SQuAD, a dataset of 107,785 question-answer pairs on 536 articles, for training and evaluating machine comprehension models. Here is an example of an article, a question and the expected answer:

In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity. The main forms of precipitation include drizzle, rain, sleet, snow, graupel and hail…

Question: What causes precipitation to fall?

Answer: gravity
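Answers on SQuAD are usually scored by token overlap with the expected answer. This sketch follows the normalization of the official SQuAD v1.1 evaluation script (lower-casing, removing punctuation and the articles a/an/the), but it is a reimplementation for illustration; the official script also takes the maximum over several reference answers.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def f1_score(prediction: str, ground_truth: str) -> float:
    pred, gold = normalize(prediction), normalize(ground_truth)
    common = Counter(pred) & Counter(gold)  # tokens shared, with multiplicity
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(f1_score("gravity", "gravity"))                  # 1.0
print(round(f1_score("under gravity", "gravity"), 3))  # 0.667
```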

Understanding text is hard. It requires knowledge of the language and a representation of the topic. These challenges can easily be compared with the linguistic and cultural barriers between people. For example, I can hardly understand a paper written in Chinese about the panda’s immune system, essentially because I don’t know the Chinese language and I don’t know anything about immunology. Similarly, a program can’t do better unless it masters these two aspects: language on the one hand, and high-level concepts on the other.

To run an automatic FAQ responder, I used one of the top-ten reading comprehension systems available, BiDAF (Bi-Directional Attention Flow). It performs reasonably well (81.5% F1) compared to human performance, which is 86.8%. I applied BiDAF to Charly, a chatbot for conversational commerce, to serve information extracted from a given text like this:

My phone number is +4911002233. I live in Munich, Germany. You can pay with your credit card, we accept: Visa, Mastercard, American Express, Maestro, Visa Debit. The delivery is twice per week on Tuesday and Saturday. if the purchase or a product is not good or you are unsatisfied please return the product with the receipt within 30 days to the driver or call us on +4911002233.

This is how Charly answers:

[Figure: Charly chatbot]

This approach is much more scalable than a classical question classifier. It enables automatic responses from text that could simply be scraped from the FAQ page of the customer’s website, or from a plain info text submitted by e-mail or through a web form.