“x + y = 4” is true in a world where x is 2 and y is 2, but false in a world where x is 1 and y is 1.
PL is a simple language consisting of proposition symbols and logical connectives. Its syntax defines the allowable sentences, while its semantics defines the rules for determining the truth (just true or false) of PL sentences with respect to a particular model. The semantics of propositional logic must specify how to compute the truth value of any sentence, given a model. This is done recursively: all sentences are constructed from atomic sentences and the five connectives.
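To make the recursive semantics concrete, here is a minimal sketch (not from the original text) of a truth-value evaluator: sentences are nested tuples, a model maps proposition symbols to booleans, and the connective names are my own encoding.

```python
def pl_true(sentence, model):
    """Recursively compute the truth value of a PL sentence in a model."""
    if isinstance(sentence, str):          # atomic sentence: look it up
        return model[sentence]
    op, *args = sentence
    if op == "not":
        return not pl_true(args[0], model)
    if op == "and":
        return pl_true(args[0], model) and pl_true(args[1], model)
    if op == "or":
        return pl_true(args[0], model) or pl_true(args[1], model)
    if op == "implies":
        return (not pl_true(args[0], model)) or pl_true(args[1], model)
    if op == "iff":
        return pl_true(args[0], model) == pl_true(args[1], model)
    raise ValueError(f"unknown connective {op}")

# ("implies", "P", "Q") is false exactly when P is true and Q is false
print(pl_true(("implies", "P", "Q"), {"P": True, "Q": False}))  # False
```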
A model is a truth assignment to the propositions of a knowledge base (KB), which is a set of sentences (axioms) taken as given rather than derived from other sentences. New sentences can be derived from those the KB logically entails. Entailment is the idea that a sentence follows logically from another sentence: in mathematical notation, we write α $\vDash$ β if and only if, in every model in which α is true, β is also true.
Sentences are derived by running an inference algorithm that follows either the modus ponens or the modus tollens paradigm. The former underlies the forward chaining (FC) family of algorithms, the latter the backward chaining (BC) family. FC is an example of the general concept of data-driven reasoning: reasoning in which the focus of attention starts with the known data. BC algorithms, as their name suggests, work backward from the query. If the query $q$ is known to be true, no work is needed; otherwise, the algorithm finds those implications in the knowledge base whose conclusion is $q$.
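The FC loop described above can be sketched for definite clauses; the rule set and symbol names below are illustrative assumptions, not from the text.

```python
def forward_chain(rules, facts, query):
    """Data-driven inference: fire every rule whose premises are known,
    until the query is derived or nothing new can be inferred.
    rules: list of (set_of_premises, conclusion) pairs."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in inferred and all(p in inferred for p in premises):
                inferred.add(conclusion)
                changed = True
                if conclusion == query:
                    return True
    return query in inferred

# A AND B => C, C => D; starting from facts A and B we can derive D.
rules = [({"A", "B"}, "C"), ({"C"}, "D")]
print(forward_chain(rules, {"A", "B"}, "D"))  # True
```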
For an inference algorithm to prove sentences, it should be sound and complete. A sentence is valid if it is true in all models; for example, the sentence P ∨ $\neg$P is valid. Valid sentences are also known as tautologies: they are necessarily true. An inference algorithm is sound if every sentence it derives is actually entailed, and complete if it can derive any sentence that is entailed. A proof can be obtained by reductio ad absurdum (also called proof by refutation, or by contradiction), which relies on the fact that α $\vDash$ β if and only if the sentence (α ∧ $\neg$β) is unsatisfiable. How does resolution obtain such a proof?
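The refutation idea can be illustrated with exhaustive model checking: KB $\vDash$ β holds exactly when KB ∧ ¬β has no satisfying model. This is a hedged sketch with helper names of my own choosing, not the resolution algorithm itself.

```python
from itertools import product

def pl_true(s, model):
    """Tiny evaluator for sentences built from not/and/or as nested tuples."""
    if isinstance(s, str):
        return model[s]
    op, *args = s
    if op == "not":
        return not pl_true(args[0], model)
    if op == "and":
        return pl_true(args[0], model) and pl_true(args[1], model)
    if op == "or":
        return pl_true(args[0], model) or pl_true(args[1], model)

def symbols(s, acc=None):
    """Collect the proposition symbols occurring in a sentence."""
    acc = set() if acc is None else acc
    if isinstance(s, str):
        acc.add(s)
    else:
        for a in s[1:]:
            symbols(a, acc)
    return acc

def unsatisfiable(s):
    """True iff no truth assignment makes s true (truth-table check)."""
    syms = sorted(symbols(s))
    return not any(pl_true(s, dict(zip(syms, vals)))
                   for vals in product([True, False], repeat=len(syms)))

kb = ("and", ("or", ("not", "P"), "Q"), "P")      # (P -> Q) AND P
beta = "Q"
print(unsatisfiable(("and", kb, ("not", beta))))  # True: the KB entails Q
```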
PL lacks the expressive power to concisely describe an environment with many objects.
The language of FOL is built around objects and relations: it assumes that the world consists of objects with certain relations among them that do or do not hold. While propositional logic commits only to the existence of facts, FOL commits to the existence of objects and relations, and thereby gains expressive power. FOL represents objects, predicates and functions. Predicates are $n$-ary relations among objects, or express features of a single object. Functions are expressions that return a single object from a set of arguments.
In FOL it is natural to express properties of entire sets of objects, by adopting the existential quantifier ($\exists$) and the universal quantifier ($\forall$).
An interesting application of FOL is the description of Peano numbers, a simple way of representing the natural numbers ($Nat$) recursively, using only an axiomatic value ($Zero$) and a successor function $succ(Nat)$.
\(Nat(Zero).
\forall x [Nat(x) \rightarrow Nat(succ(x))].\)
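The two Peano axioms above can be mirrored in code; the encoding below (nested tuples as successors) is one possible illustrative choice, not from the text.

```python
# Zero is the axiomatic base value; succ builds each natural number
# from its predecessor, exactly as the two FOL axioms state.
Zero = ()

def succ(n):
    return (n,)

def is_nat(x):
    """Nat(Zero) holds; Nat(succ(x)) holds whenever Nat(x) holds."""
    return x == Zero or (isinstance(x, tuple) and len(x) == 1 and is_nat(x[0]))

def to_int(n):
    """Convert a Peano number back to a Python int, for readability."""
    return 0 if n == Zero else 1 + to_int(n[0])

three = succ(succ(succ(Zero)))
print(is_nat(three), to_int(three))  # True 3
```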
Unification is the process of making different logical expressions look the same, whenever possible, by properly substituting values for variables. With unification it is possible to construct all queries that unify with a given sentence. For example, given $Employes(SAP, Giancarlo)$ and $Male(Giancarlo)$, a query might be: is there a male employee in SAP? In FOL: $\exists x [Male(x) \wedge Employes(SAP,x)]$.
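Here is a compact, hedged sketch of the unification procedure (occurs-check omitted for brevity); the convention that lowercase strings are variables is my own assumption, and the Employes/Male terms mirror the example above.

```python
def is_var(t):
    """Convention (assumed): lowercase strings are variables."""
    return isinstance(t, str) and t[:1].islower()

def substitute(t, subst):
    """Follow variable bindings to the current value of a term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return a substitution unifying a and b, or None on failure."""
    subst = {} if subst is None else subst
    a, b = substitute(a, subst), substitute(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):            # unify argument by argument
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

# Unify the query Employes(SAP, x) against the fact Employes(SAP, Giancarlo).
s = unify(("Employes", "SAP", "x"), ("Employes", "SAP", "Giancarlo"))
print(s)  # {'x': 'Giancarlo'}
```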
Reasoning in FOL works by bringing the formula into Skolem form, removing the existential quantifiers and leaving the universal ones implicit, and transforming it into clause normal form (a formula written as a conjunction of clauses). Then propositional reasoning (resolution, SAT), forward chaining or backward chaining can be applied.
The Herbrand Universe (HU) is the set of all ground terms (terms without variables) occurring in a formula. For example, in the clause $CapitalOf(x,y) \rightarrow IsA(x,City), IsA(y,Country), PartOf(x,y)$ the HU is $\{City, Country\}$; but if a function symbol were present, the HU would be infinite. The HU is useful, for example, to restrict the search space of Prolog programs whenever possible.
The HU plays a part in resolution for FOL through the Herbrand expansion, the set of formulas that results from substituting ground terms into the initial formula in all possible ways. Given a knowledge base: $KB = \forall x [SpecialAgent(x) \rightarrow SpiesOn(x, Danz)] \wedge SpecialAgent(MrSmith)$
it translates as “every special agent spies on Danz, and MrSmith is a special agent”.
If we advance the hypothesis that the formula $\phi=SpiesOn(MrSmith, Danz)$ is entailed by the KB ($KB \vDash \phi$), the Herbrand expansion can be applied to show that $HE(KB \wedge \neg\phi)$ is unsatisfiable.
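Because the example has no function symbols, the expansion is finite and entailment can be settled propositionally. Below is a hedged sketch of that grounding step, with string-encoded atoms as an illustrative choice.

```python
# Herbrand universe for the example: just the two constants.
constants = ["MrSmith", "Danz"]

def expansion():
    """Ground instances of SpecialAgent(x) -> SpiesOn(x, Danz),
    one per element of the Herbrand universe."""
    return [((f"SpecialAgent({c})",), f"SpiesOn({c},Danz)") for c in constants]

# Start from the KB's ground fact and fire every grounded rule.
facts = {"SpecialAgent(MrSmith)"}
for premises, conclusion in expansion():
    if all(p in facts for p in premises):
        facts.add(conclusion)

phi = "SpiesOn(MrSmith,Danz)"
print(phi in facts)  # True, so KB AND not-phi is unsatisfiable
```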
As in the established definition of agent1, sensation is the process of acquiring information from the environment through sensors. The behaviour the agent manifests is the result of pondered actions, activated by strategies that aim to achieve desirable goals. Living creatures, including humans, do not deviate from this general definition. Attention can be seen as a function that helps elaborate successful strategies efficiently, by partitioning what can really affect the goals from what cannot 2.
Sensation takes its meaning directly from the raw acquisition of information through the senses. These include of course the five canonical ones, but are not limited to them: they also involve movement, position of the body, pain and temperature. It can be surprising how sensitive human senses are; for example, it is possible to detect a candle almost 50 km away on a dark and clear night 3. Perception, on the other hand, is a more abstract concept.
When perception is closely tied to sensation, it is defined as bottom-up perception. In this case, perception is the interpretation of information from the environment so that we can identify its meaning 4 and make more or less accurate predictions about what should be there. Perception, however, also emerges from cognitive processes that involve memory and attention, completely detached from sensory circuits. This scenario is described as top-down perception, and it is a process strongly mediated by expectations and by the contextual setup in which those sensations are collected 5. In the image below, the shadow is expected to lower the brightness of the B region, so a kind of innate visual compensation makes us think the region would be brighter without the shadow, generating the famous visual illusion.
Attention and perception are highly correlated, and this is very evident in everyday life. If we are walking through the university department looking for the Cognitive Psychology professor, it won’t be surprising that we pay attention to people’s faces, visually searching the crowd until the target pops out in the hall. Eye movements will be directed to scan the faces of the people 6. If expectations are violated by novel surprises (e.g. an old mate in the classroom), these are explored extensively 7. The other side of the coin is inattentional blindness, by which what is not attended to is not consciously perceived. For example, in the figure below, 60-80% of observers do not notice that the centre point turns into a text while a distracting cross appears on the left 8.
I’ve mentioned popping out in describing the subjective effect that occurs when something catches our attention with no effort. This topic shows particular dynamics depending on the environmental context. If the search task involves picking a simple feature out of a context strewn with distracting details, it is much easier than searching for a complex pattern with a combination of features, the so-called conjunctive search 9. In the experiment below, recognizing a single red dot among all the green ones is automatic, while the case in the right pane involves some degree of attentive scrutiny, which is much slower and more resource demanding.
This might suggest that attention is somehow necessary to recognize non-trivial objects 10. The feature integration theory 10 affirms that automatic feature processing is followed by attentive processes that bind the features into a whole object. Objects are not a mere list of independent features, and perception is not limited to pattern matching: it also identifies higher-order structures, with the support of controlled attention.
Just as the letter N is not simply a casual aggregation of three segments, things must obey some grammar in order to be recognized.
A demonstration of the importance of structural relations is depicted in the image below, where a set of simple forms (geons) can easily compose complex objects 11. This demonstrates that relational information is not only needed, but is even more critical to perception than the features themselves 12.
While some information needs attention to retrieve complex structures from the senses, an equal effort is required to selectively ignore conflicting information while performing certain tasks. This is the case raised by John Stroop in his famous experiment 13, which demonstrates the difficulty of partitioning contrasting information on one side from useful information on the other. In the Stroop task, an observer reports the colour of the words that appear, while the words spell out a different colour’s name. The task turns out to be more difficult than the non-contrasting setting (when the name matches the colour). Observers activate different parts of the brain when processing discordant stimuli, parts in charge of executive control and selective attention. For the same reason, when we stop at a traffic light while driving, we discriminate the light of our own lane from the adjacent ones, and selective attention prevents us from switching from the brake to the accelerator pedal when it is not appropriate.
The selective attention theory states that perception is filtered before being processed by high-level mechanisms, but it clashes with the notable cocktail party effect (CPE). Unfortunately for Broadbent, the CPE suggests that the selective filter is not the only way the brain deals with attention. The CPE describes the capability of being caught by unattended stimuli once they present an important pattern. For example, while we are talking with friends and someone outside the interlocutors’ circle loudly mentions our name, our attention will probably be drawn by this event toward the speaker and his discourse, even though we were previously unaware of that discussion.
An interesting effect of unattended stimuli is that they interfere with attended perceptions. The experiments that shed light on this effect enforce shadowing, in which observers are instructed to follow only one stream of perception among many. Suppose two recorded discourses are played simultaneously, one to the left side and the other to the right side of the observer’s headset:
a: “They were standing near the bank…”
b: “the silicon valley bank has gone bankrupt…” ^[It is not exactly the example from the experiment, since that bank went bankrupt in 2023, but I guess it is equivalent for the purposes of the experiment]
Observers disambiguate the term bank as the financial bank, and not as its other possible meanings 14. This experiment tells us that although attention allocates sufficient resources to spotlight some stimuli, it leaves space for unconscious mechanisms that capture information from the unattended channel and blend it with the attended one.
Giancarlo Frison
Russell, Stuart J.; Norvig, Peter; Artificial Intelligence: A Modern Approach (2003, 2nd ed.); Chapter 2. ↩
There is no clear definition of what attention is: “No one knows what attention is” (Hommel et al., 2019). The first attempt to characterize it comes from William James in The Principles of Psychology (1890): “is the taking possession by the mind, in clear and vivid form, of one out of what may seem several simultaneously possible objects or trains of thought…It implies withdrawal from some things in order to deal effectively with others”. In the Schema Theory (Neisser, 1976), attention is a dynamic process that seeks information consistent with the current situation. I think a good synthesis of the many definitions could be: “attention is the allocation of resources and processing to a particular object, region, dimension”. ↩
Okawa & Sampath, 2007 ↩
William Wozniak. Sensation and Perception ↩
Neisser, 1976 ↩
Yarbus, 1967 ↩
The metaphor of the brain as a predictive machine finds matches, for example, in Jeff Hawkins - On Intelligence (2004) and Karl Friston - The free-energy principle: a unified brain theory? (2010). The latter reduces agents to surprise minimizers; the former describes how automatic processing escalates to higher forms of deliberate decision making (through attention mechanisms) whenever the automatic layer does not know what to do in certain circumstances. ↩
Mack &amp; Rock, 1998 ↩
Treisman & Gelade, 1980; Treisman & Sato, 1990 ↩
Anne Treisman, Garry Gelade; A feature-integration theory of attention (1980). ↩
Biederman, 1987 ↩
Biederman, 1985 ↩
Stroop, John Ridley - Studies of interference in serial verbal reactions (1935) ↩
MacKay, 1973 ↩
“This is the promise of the Semantic Web – it will improve all the areas of your life where you currently use syllogisms. Which is to say, almost nowhere.”
—Clay Shirky
“Fortunately, a large majority of the information we want to express is along the lines of ‘a hex-head bolt is a type of machine bolt.’”
—Berners-Lee
“Unfortunately this is not true. If one considers how humans handle concepts, the class relation structures of the Semantic Web capture only a minute part of our information about concepts”
—Peter Gärdenfors
I guess it doesn’t go unnoticed that the semantic web approach isn’t the favourite of the author of Conceptual Spaces (CS).
Two distinguished views on knowledge representation are the semantic web and vectorized embeddings, belonging to the symbolic and the connectionist schools respectively. The CS theory comes from a very different vision of how knowledge should be encoded. Concepts in CS are, without doubt, close to vectorized embedding representations, though they preserve interpretability, a strength of the symbolic world.
Many of you have probably already heard about word embeddings for knowledge graphs. Modern natural language processing tasks based on neural networks would not exist without vectorized embeddings. We are witnessing immense progress in natural language processing (e.g. GPT and related models) in recent times, and their machine learning algorithms rely on word embeddings. A branch of deep learning named graph neural networks (GNN) has brought a similar advancement to machine learning tasks such as link prediction in ontologies. The intuition behind neural network embeddings is that words or graph nodes can be represented as a series of real numbers that embed their semantics, and programs of a very special kind, neural networks, can use them to accomplish specific tasks. Vectorized embeddings come as a byproduct of processes with no human supervision in the loop. Embeddings, differently from ontologies, do not convey any semantics that is comprehensible, let alone editable, by people.
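As a toy illustration of why vectorized embeddings support similarity (all vectors below are invented, not from any trained model), cosine similarity places semantically close words near each other:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Invented 3-dimensional embeddings for illustration only.
emb = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))  # True
```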
If asked during a conference talk, I can hardly imagine someone raising a hand to report past experience with CS.
In the CS framework, concepts are represented as vectors of numbers that are continuous in their spectrum and convex, which means that similarity scores and distances among concepts are naturally derivable; unlike neural embeddings, though, those vectors are not inscrutable to human inspection. Concepts are built taking into account how we process and categorise them.
Domains represent a single quality. They are convex and differentiable because they can be represented in real values. A concept is qualified via a set of domains: an apple has a round shape, a colour in the range of green, yellow and red, a taste (which is apparently related to the colour), a size, a weight. As with neural embeddings, similar concepts cluster together in regions. In CS, concepts are defined by vectorized properties, each describing a quality in a specific domain. The colour domain, for example, can be encoded as a value ranging from #000000 (black) to #FFFFFF (white), which is numerically consistent with our perception, from nothing to all colours together. Further, a slight change in one parameter implies a small visual change, more or less reddish, bluish or greenish depending on which parameter.

“The things have weight, mass, volume, size, time, shape, colour, position, texture, duration, density, smell, value, consistency, depth, boundaries, temperature, function, appearance, price, fate, age, significance. The things have no peace.”
—Arnaldo Antunes
The main similarity between neural embeddings and CS is that both are differentiable and convex. At the same time, concepts incorporate symbolism with their multi-faceted domains; they naturally enable semantic algebra and computational problem solving. For example, the request ‘show me a movie like Casablanca but as scary as Shining’ will consider the properties of Casablanca but with the scary domain similar to Shining. Put it in the frame of logic programming, and you get it in a line of code.
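The Casablanca/Shining request can be sketched as vector manipulation in a conceptual space: copy one movie’s point, override its scary domain, and rank the catalogue by distance. All domains and numbers below are invented for illustration.

```python
# Invented conceptual-space coordinates, one dimension per domain.
movies = {
    "Casablanca": {"romance": 0.9, "scary": 0.10, "pace": 0.40},
    "Shining":    {"romance": 0.1, "scary": 0.95, "pace": 0.50},
    "Rebecca":    {"romance": 0.8, "scary": 0.70, "pace": 0.45},
    "Mamma Mia":  {"romance": 0.7, "scary": 0.00, "pace": 0.80},
}

def distance(a, b):
    """Euclidean distance between two points in the conceptual space."""
    return sum((a[d] - b[d]) ** 2 for d in a) ** 0.5

# 'Like Casablanca but as scary as Shining': semantic algebra on domains.
target = dict(movies["Casablanca"], scary=movies["Shining"]["scary"])
best = min((m for m in movies if m not in ("Casablanca", "Shining")),
           key=lambda m: distance(movies[m], target))
print(best)  # Rebecca
```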
CS also has some interesting psychological foundations regarding how people deal with inner knowledge representations and how they learn them. It has shown some validity in explaining cognitive aspects, especially those involved in concept learning and understanding. It has been found that once children have assimilated the meaning of a domain, it is then easy to learn concepts that represent a flavoured materialisation of that domain. For example, once they know what the domain of ‘colour’ is, it is easy to learn new colour concepts, such as ‘turquoise’. Grasping a new domain is a much harder step than adding new terms to an already established one. Conceptual domains are mental buckets where we place concepts based on how their properties fit into that domain; we do not have to know how ‘turquoise’ is exactly encoded, we just need to think of it in comparison to other concepts, as somewhere between light blue and light green. This seems to be a provable trick we use in learning, justified by the principle of cognitive economy: our mental capabilities are limited, so we favour simple and efficient ways to position new information.
I’ve written more about cognitive aspects in meeting of minds. Take a look!
👉 Hereby, a brief informal account of the idea behind this patent.
In a behavioural test back in the seventies, people about to use a copying machine were asked to let another person, the experimenter, use it first despite there being a line.
“Excuse me, I have 5 pages. May I use the xerox machine?”. With this request, 60% of the people let the experimenter go first. Then the experimenters changed the call into:
“Excuse me, I have 5 pages. May I use the xerox machine because I have to make copies?” The justification is clearly nonsense. It is technically called ‘placebic information’ because, much like a placebo in pharmacy, the given explanation does not contain any additional information. You might be surprised that in the latter case the rate of success reaches an astonishing 93%.
This experiment teaches something interesting about how we evaluate information from the environment. Even without any significance, just the feeling of a reason can dramatically change how we behave. Most of our daily behaviour is accomplished without paying attention to the informative details. This is obviously not new: advertisers know it very well. But we are not doomed to always be fooled; we become conscious when the decisions are more important.
Differently from the copying machine experiment, argumentation is a bit more interactive than a single-shot justification. We are, as humans, designed to improve our view of the world through confrontation with others. We need to be challenged to defend good ideas or abandon wrong ones. Argumentation involves a confrontation of clashing reasons; sometimes our beliefs turn out to be inconsistent in the light of new objections, and we must give up some of them.
John Stuart Mill once said “Both teachers and learners go to sleep at their post as soon as there is no enemy in the field”, which captures the point of argumentation.
Argumentation is a fervid research topic, and in its basic form it is simple to explain; let me try here. Arguments can be represented as symbols: they are nodes in a graph of concepts and other arguments. An argument can be in two states, active or defeated. It is defeated when it is attacked by an active argument; if it is attacked only by defeated arguments, it is active. If no argument attacks it, it is active and can attack others. That’s not really complicated.
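The active/defeated rule just described can be sketched as a small recursive computation over an attack graph. The graph below mirrors the professionist/inability/learning example developed later; the simplistic cycle guard is an assumption of this sketch, not part of the text.

```python
def statuses(arguments, attacks):
    """attacks: set of (attacker, target) pairs.
    An argument is active iff every argument attacking it is defeated
    (vacuously true when it has no attackers)."""
    status = {}

    def active(a, visiting=frozenset()):
        if a in status:
            return status[a]
        if a in visiting:                 # simplistic cycle guard for this sketch
            return False
        attackers = [x for x, y in attacks if y == a]
        result = all(not active(x, visiting | {a}) for x in attackers)
        status[a] = result
        return result

    for a in arguments:
        active(a)
    return {a: ("active" if v else "defeated") for a, v in status.items()}

args = {"professionist", "inability", "learning"}
attacks = {("inability", "professionist"), ("learning", "inability")}
print(statuses(args, attacks))
# learning is unattacked and active, so inability is defeated
# and professionist is active again
```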
The eStore sells photography items; every product is represented in a concept graph, and the system knows some properties of the articles for sale. We know that the Canon EOS-1D is a high-quality camera and that professionals desire high-quality cameras. John, a user browsing the eStore, after a careful inspection decides to add it to the cart. We can interpret this as John desiring that product, and store this information in his personal knowledge graph. This information enables further reasoning about who John is and what his peculiarities are. We can infer that John may be a professional. Why? Because John and professionals share common desires. Is it enough to be certain of that? Of course not. But it is a hypothesis the system should consider.
But there is a problem. The eStore’s machine learning system is raising some alarms about John. We know that the chances John will finalize the purchase are low (for a variety of reasons), and the system has to do something about that.
Let me now introduce some logic primitives that lay the foundation of logical argumentation. We know, as encoded in our knowledge base, that a professional requires some kind of ability to perform his job, right? Ability is a concept, and its antonym is inability. We can infer that inability prevents anybody from being a professional. So the inability argument attacks the professional one. Here we see a pattern; specifically (sorry for the logic lingo) we can affirm that:
attack(X, Y) ← requires(Y, Z), antonym(Z, X)
The initial hypothesis of John being a professional enables the system to draw a deductive conclusion regarding the abandoned cart. We may say that John is unable to use that camera. Why? Maybe because he does not know that camera; maybe he does not know the full set of features the device can offer. Again, it is a hypothetical yet very plausible argument. What can the system do for him? Well, we have many products in store. One of them is an online course, which is intended for learning, and learning attacks (by the same principle we have seen before) the inability argument. Why not recommend it?
John seems to be caught by our proposal and inspects the course we recommended. What could go wrong here? Could John have some reluctance even about the online course? Maybe John does not have time to attend it; he is a very busy photographer! But the system elaborates some precious information about the course to prompt a valuable argument and convince John that the recommendation is a valid one. From the description we know the course is just 20 minutes long, which is indeed very short compared with the average course on sale. Let’s tell John he has nothing to worry about regarding time, because the course is short.
Or maybe John is worried by the cost of the course? As an inattentive user, he might have overlooked some important information. No worries, John: it is free!
“What do you mean, you’ve broken it off? She was the best thing that ever happened to you. I loved her too, if the truth be known. You’re such an idiot! I have a mind to…”
“I mean I’ve broken the tip off my pen.”
“…Oh.”1
Although we can easily identify when a conversation veers off course, we are often unaware of the inner workings of our mind that silently orchestrates the communication process. This post marks the inception of a brief series, aiming to illuminate these mechanisms through a distilled interpretation of Professor Gärdenfors’ “Geometry of Meaning.” This compelling collection of theories offers valuable insights into the nature of semantics and how our cognitive faculties process it. Exploring these subjects has sparked my curiosity and fascination, particularly in relation to the ultimate objective of developing automated systems capable of emulating our cognitive processes to solve problems on our behalf.
What is that knowledge, and where does it come from? Knowledge stands for a set of justified beliefs, a core of interconnected ideas such that the causal relations among them give us a resemblance of truth. Information can help us gain knowledge, but mere data (aka Big Data) is not sufficient to build the mental affordances we need to function in a complex world. By affordance I mean the possibility of an action that an entity can offer. We continuously encounter strange situations, and we have to figure out which tool from our toolset of knowledge to pick up in order to decide what to do next.
What we think, and what we plan to do today or in the distant future, is corroborated by what we know. We are the climbers of the ever-growing mountain range of understanding that we started to absorb even before we were born. We are surfing the deep pack of knowledge we have gathered about ourselves and the world.
The metaphor of the mountain as knowledge, and the climber as the agent that seeks new mastery, suggests that agents are positioned at different altitudes, different granularities of knowledge, and hence their collected background might differ. In the short dialogue I used to open this post, Bob and Alice align themselves on shared knowledge, but sometimes it is far from simple to establish apparently simple truths. Bob is ambiguous in his ‘I’ve broken it off’: he does not specify what he has actually broken off, and Alice, the interlocutor, wrongly infers he is leaving his girlfriend. To create the basis of reciprocal understanding, Bob pulls the communication level down and raises Alice’s knowledge by specifying that what is broken is the tip of his pen, and not his relationship.
I guess everybody agrees on the importance of sharing a common background of knowledge, and Bob shows us that we shift from a high semantic layer to a lower one to meet our interlocutor and restart the coordination from there. This implies that knowledge lies in hierarchical structures where the first rung of the ladder is populated by fundamental and cognitively irreducible terms; they form new and more abstract concepts, with increasing abstraction as we climb the ladder to the upper layers. Simon Winter explains this idea of layered levels of knowledge by analyzing how a master craftsman teaches a novice to replace a violin’s strings, where non-verbal communication is also part of the game. He summarizes context levels into:
I don’t know whether there is a finite number of layers or whether the hierarchy is open-ended, but at a high level of shared knowledge, communication flows according to ‘the obvious goes without saying’, where implicit understandings are at their maximum. I’m also a bit skeptical of the lower bound of this ranking. If we take for granted that the meaning of a word can be decomposed into a finite set of conditions that are necessary and sufficient to describe it, this clashes with level 0, where concepts are irreducible. If I open a dictionary and every word, even the simplest one, is defined by means of other words, circular loops in meaning references are inevitable, an unequivocal sign of inconsistency.
Language and other forms of communication open the way to various types of engagement. Sophisticated collaborations enable the collective creation of value that cannot be achieved by single individuals; they are fruits of communication. If we could summarise in a single word what communication stands for, I would recommend coordination. Coordination among interlocutors is a practical way to point at objects, places or actions, but also an alignment of intents, ideas, persuasions and even entertainment. More generically, coordination is a convergence of mental representations.
Coordination implies a transfer of information, expressed in several forms, of which language is the richest and most expressive. At the dawn of humanity, when communicative acts became more varied and detached from immediate practical purposes, the value of meanings, or semantics, became more salient in communication. Coordination among participants is an iterative process in which the meeting of minds ultimately converges on an alignment of meanings. Sounds almost romantic! Borrowing from maths, the metaphorical meeting point is also named a fixpoint. A fixpoint is a value for which a function returns exactly the same value, a value on which the function behaves as the identity. Let’s unfold how the process occurs in our dialogue.
More mechanically, Bob applies the expressive function f to his mental idea, and the sentence is passed to Alice; she then applies g’, the interpretative function, and acquires the meaning of that sentence, which is returned to Bob as she has understood it. Alice encodes her understanding into an expression, then Bob applies f’, the inverse function (cofunctor) of f, to acquire Alice’s expression. If the idea generated in the round trip coincides with the original one, the coordination has been successful. If the original and derived ideas do not overlap, the alignment failed, and Bob will attempt to correct Alice.
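The round trip and its fixpoint can be sketched numerically; the expressive and interpretative functions below are invented stand-ins for illustration, not a model of real communication.

```python
def round_trip(idea, express, interpret):
    """One conversational round: Bob expresses, Alice interprets."""
    return interpret(express(idea))

def coordinate(idea, express, interpret, tolerance=1e-6, max_rounds=100):
    """Iterate rounds until the returned idea matches the current one,
    i.e. until a fixpoint of the round trip is reached."""
    for _ in range(max_rounds):
        returned = round_trip(idea, express, interpret)
        if abs(returned - idea) < tolerance:   # fixpoint: alignment succeeded
            return idea
        idea = (idea + returned) / 2           # Bob adjusts toward Alice
    return idea

# A lossy channel whose only fixpoint is 0: the shared value converges there.
value = coordinate(1.0, express=lambda x: x * 0.8, interpret=lambda x: x * 0.9)
print(round(value, 3))  # 0.0
```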
To effectively fix Alice’s misunderstanding, the distance between Bob’s idea and what he derived from Alice’s feedback should be articulated enough to convey rich information: not just a quantitative distance from the successful fixpoint, but also a qualitative account of how the two differ. This is why the discrepancy should be encoded as a vector, so that it is also descriptive of the conversation’s status. This introduces the next topic, conceptual spaces, which places the representation of semantic values between symbolic AI and neural networks.
Continue reading with conceptual space as neurosymbolic representation.
http://fiftywordstories.com/2013/09/11/connell-wayne-regner-oh/ ↩
Search text usually consists of one or more keywords in sequence that identify the item with increasing accuracy, and it will not be a surprise that such utterances can take an extremely simple form. Probably similar to the structure of our ancestors’ early attempts (me Tarzan, you Jane), search requests may lack prepositions and adverbs, and in general may not exhibit any kind of grammar. In an electronics eStore, a text such as mem disk 500gb flash drive would be interpreted as a series of constraints that must all be simultaneously satisfied: mem disk AND 500gb AND flash drive. As a result, it would prompt us with a list of external USB memory storage devices with 500GB of capacity.
What if the utterance included different modalities for discriminating attributes? Take the sample mem disk flash drive with more than 500gb. I would expect to see all USB sticks with at least 500Gb. Unfortunately, this is not the case. The result does not change from the previous query, and we can conclude the search engine can’t handle simple inequalities such as more than or less than. If that limitation is not worrying enough, please pause reading and answer the following quiz: what do you think the query a Nikon cheaper than a Canon EOS-1D will show you as results? Yes, you will get a mixed list of Nikons and EOS-1Ds together, which is not, indeed, what we meant. I think it’s time to evolve search engines and align them with the intricacies of natural language.
The title mentions question answering because of the paradigm shift we want to push forward in this new branch of research. Have you ever heard of the IBM agent that competed in the Jeopardy! quiz show? We treat user utterances as carriers of semantics, closer to our natural language than to a mere search for products. While this approach can help build better search engines, it can also be applied to digital assistants of various natures: IT support, meeting planners, analytics. But for now, let’s focus on searching items in eCommerce.
The set of tools developed for business process modelling is extremely helpful, and it is paving the way to innovative ideas that share the same founding principles:
Traditionally, the bulk of the effort in setting up a proper search service is directed toward database indexation, corroborated by named entity recognition (NER) and information retrieval techniques. QA systems, moreover, present two faces: one dedicated to disentangling the utterance’s logic, the other converting the semantics into an executable query that picks up results from a given information system. The former is responsible for interpreting what the user is texting; it is invariant with respect to the domain context, but dependent on the user language. The latter component, the one closer to databases, is specific to the customer/domain of interest and the data topology. This architecture allows great portability, with customization effort reduced to a minimum.
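As a sketch of the two-faced architecture (every name, the toy grammar, and the SQL target here are my own assumptions, not the actual system), a language-dependent interpreter could extract keywords and inequality constraints into a neutral intent, and a domain-specific translator could turn that intent into an executable query:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Intent:
    keywords: list
    constraints: list = field(default_factory=list)  # (attribute, op, value)

def interpret(utterance: str) -> Intent:
    """Language-dependent face: text -> domain-agnostic intent.
    Only handles a toy 'more/less than N gb' pattern."""
    constraints, text = [], utterance
    m = re.search(r"(more|less) than (\d+)\s*gb", utterance, re.I)
    if m:
        op = ">" if m.group(1).lower() == "more" else "<"
        constraints.append(("capacity_gb", op, int(m.group(2))))
        text = utterance[:m.start()] + utterance[m.end():]
    keywords = [w for w in text.split() if w.lower() != "with"]
    return Intent(keywords, constraints)

def to_sql(intent: Intent) -> str:
    """Domain-specific face: intent -> query for one information system."""
    where = [f"description LIKE '%{k}%'" for k in intent.keywords]
    where += [f"{a} {op} {v}" for a, op, v in intent.constraints]
    return "SELECT * FROM products WHERE " + " AND ".join(where)

print(to_sql(interpret("flash drive with more than 500gb")))
```

Swapping `to_sql` for, say, a SPARQL generator would leave the interpreter untouched, which is where the claimed portability comes from.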
This project does not come out of nowhere; it is rather the continuation of research on semantic technologies and symbolic reasoning applied to a fairly challenging problem. Honestly, I can’t imagine anything more complex than language. As in any innovative endeavor, we revise what we have done in the past and try to improve from the foundations every time we face a new kind of problem, and this time it won’t be different.
“The world has the structure of the language, and the language has the form of mind.” – Eugenio Montale
I strongly believe that to attain better solutions, problems should be tackled from several angles. If the symbolic approach provided by knowledge graphs and logic programming helps to solve a variety of problems, a hybrid system – symbolic plus non-logic – can expand the reach of our technology. Prof. Gärdenfors’ work Geometry of Meaning summarises more than 20 years of research in a fully comprehensive theory (philosophical, cognitive, neuroscientific) of conceptual spaces. They are computable geometrical representations with a lot of affinity with semantic algebra, but also with the interesting feature of being convex, which is a pillar of gradient-based algorithms – basically, all modern machine learning.
Conceptual spaces describe a theory of meaning and how this theory adheres to our cognitive model of learning, which I find extremely interesting. The human-oriented basis of the theory is also inspiring for computer programs that attempt to solve complicated problems for us, such as eCommerce search. This is one of the topics I will write more about, so if you are interested, stay tuned!
This experiment teaches us something interesting about how we evaluate information from the environment. Even without any significance, just the feeling of a reason can dramatically change how we behave. Most of our daily behaviour is carried out without paying attention to the informative details. This is obviously not new - advertisers know it very well. But we are not doomed to be fooled forever: we become attentive when the decisions are more important.
If the experimenter asks to copy 20 pages rather than 5, the results are very different. The more demanding request is evaluated thoroughly, and the rate of success no longer differs between the two cases above: with a placebic reason or no reason at all, the success rate is equally low (20%). Instead, what works is a justification such as:
“Excuse me, I have 20 pages. May I use the xerox machine because I’m in a hurry?”
Admittedly, it is not a strong justification, but some compassionate colleagues let you pass ahead. This justified reason convinced more than double the subjects: 40%. Is sensitivity towards justifications something that should be considered with online retail customers?
When we pose an explanation for changing a decision, we are giving an argument that supports or attacks a thesis. Differently from the copying machine experiment, argumentation is a bit more interactive than a single-shot justification. We humans are designed to improve our view of the world through confrontation with others. John Stuart Mill made the point when he said: “Both teachers and learners go to sleep at their post as soon as there is no enemy in the field”. We need to be challenged to defend good ideas or abandon wrong ones. Argumentation involves a confrontation of clashing reasons, and sometimes beliefs turn out to be inconsistent in the light of new objections, so we must give up some of them.
Argumentation is a fervid research topic, and in its basic form it is simple to explain. Let me try here. Arguments can be represented as symbols; they are nodes in a graph of concepts and other arguments. An argument can be in one of two states: active or defeated. It is defeated when it is attacked by an active argument; if it is attacked only by defeated ones, it is active. If no argument attacks it, it is active and can attack others. That’s not really complicated. Here is an example: the argument “Home Office Working” is usually attacked by the “Employee Disconnection” argument, which in turn is attacked by “Video Conferencing”. Graphs of arguments grow indefinitely, and even with their simple statuses (active/defeated) they can generate complex dynamics. An activation may cause a cascading effect across a large portion of the network: the main claim can be obliterated by a single argument far away in the graph.
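The active/defeated rules above can be computed as a fixpoint over the attack graph. This is a minimal sketch (the argument names reuse the example; the function name is mine), in the spirit of grounded semantics from abstract argumentation:

```python
# An argument is "active" when every attacker is defeated,
# "defeated" when some attacker is active. Unattacked arguments
# are active. We iterate until no status changes.

def grounded_labelling(attacks):
    """attacks: dict mapping each argument to the set of its attackers."""
    status = {}  # argument -> "active" | "defeated"
    changed = True
    while changed:
        changed = False
        for arg, attackers in attacks.items():
            if arg in status:
                continue
            if all(status.get(a) == "defeated" for a in attackers):
                status[arg] = "active"      # all attackers already defeated
                changed = True
            elif any(status.get(a) == "active" for a in attackers):
                status[arg] = "defeated"    # some active attacker wins
                changed = True
    return status

graph = {
    "VideoConferencing": set(),                        # unattacked
    "EmployeeDisconnection": {"VideoConferencing"},
    "HomeOfficeWorking": {"EmployeeDisconnection"},
}
print(grounded_labelling(graph))
# "Video Conferencing" reinstates "Home Office Working"
```

Arguments caught in attack loops are simply left unlabelled by this procedure, which anticipates the complication discussed below.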
Argumentation in online retail
Let’s talk about argumentation in eCommerce with a story. John is a professional photographer, and he needs to learn about an advanced camera. After he adds an online course to the cart, the system detects a high risk of churning: John most probably won’t take that course. Why? It’s difficult to guess, but we can make suppositions by properly using the knowledge system, which expands the catalogue products into a wider semantic network - a knowledge graph of concepts the system can exploit to make sense of what’s going on with John and his reluctance to finalise the checkout.
According to the knowledge base, a learning course requires time, which might be an issue for a busy professional. How does the system know whether this is the case? Semantic algebra is very helpful there, but it is just one hypothesis among many others: the system needs validation. Suppose John confirms; we now know he does not have time for it. The course is just 20 minutes long (a crash course indeed), which is short compared with others in its category. This is a counterargument that invalidates John’s misconception.
John is convinced, but there is another problem: his budget is very tight. But wait, this course is free of charge; therefore the ‘low budget’ argument is defeated, and the main claim (John attends the recommended online course) is active and valid.
I don’t know whether John eventually enrolled in that course, but the system attempted to change his mind. His misbeliefs were neutralised by reasons the system was able to extract from the knowledge system.
We were also lucky: the arguments provided do not attack each other by forming loops in the graph, which would considerably complicate the argumentation. Here is an example of what could happen. Think of three football teams: Italy, Germany and Brazil. Let’s assume Italy wins against Germany (not always, btw), Germany beats Brazil, and Brazil beats Italy. In a knockout stage, who will be the winner? Well, it depends on how the competition is set up. It is the argumentation system that has to set up the proper strategy for achieving the desired goal. There are other cases where things get complicated: assume not all arguments are equally important, or that arguments are not true (or false) with the same intensity.
Unconventional programming systems (PSs) have been explored by industries since the beginning of computer programming. Consider for example spreadsheets – largely used by non-professional programmers – where formulas create views on the data stored in the grid. Think of computable notebooks such as Jupyter for live-coding programming snippets with numerical or graphical output. There is a plethora of PSs that have inspired the exploration of the topics of:
If low-code (LC) platforms improve ROI and productivity by automating a narrow set of business-specific problems, they can hardly fulfil the same promises in domains for which they were not initially conceived. A mobile app builder might accelerate the development of a standardised set of use cases, but when required features are not covered by the platform, developers are forced to fall back into more traditional programming styles.
While it is desirable to clear out the verbosity and the irrelevant code sections that do not bring any value, the complexity of business problems is irreducible. A proper programming system gives skilled programmers full expressivity for solving complex problems, and allows citizen developers to build simple applications autonomously. Our system should enable us to build complex algorithms, but at the same time allow unskilled practitioners to at least understand how routines work, and possibly implement basic programs without involving development teams. It’s possible to be approachable for a ‘hello world’ without sacrificing fundamental programming expressivity.
Let’s make easy things easy, and difficult ones possible
The diversity of solutions for delivering software artefacts requires unprecedented flexibility in how new practices are integrated into existing software development pipelines. The multi-stage process that brings programs from coding to live production might be highly customised and automated; but also, for the diametrically opposite need of simplicity, it should offer fully managed and standardised lifecycles.
To lower the barrier in already established architectures without sacrificing the core features, we should provide SDKs for the language of choice without interfering with the established development cycle. At the same time, it should be available as an on-demand offering such as Logic as a Service (LaaS): an online-only managed platform intended for directly editing services, with near-instantaneous testing, then publishing a cloud API in a completely serverless fashion.
Business applications that do not handle stored information are a narrow niche; in almost all business domains, software services manipulate stored data. Generic PLs incorporate data only as in-memory pointers, while stored data is retrieved through query frameworks, which are very distinct pieces of software from the rest of the programming system. This dichotomy between data and computation does not contribute to the business value of services; it rather makes it harder to think in terms of conceptual logic. An answer to this is to treat data as code and lower the barrier for modelling logic. Accordingly, a function’s interface is no different from data’s: functions take parameters and return new information, like a query in a DB. The difference between programs and data is that a program’s output is derived while data is passively given. Data and functions (predicates) are interchangeable entities.
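A minimal Python sketch of this interchangeability (the fact names echo the cost example used elsewhere in this series; the `query` interface is my own illustration): stored tuples and computed tuples are served through the same access path, so callers never know which kind they got.

```python
# "Data as code": stored tuples and derived tuples share one query interface.

facts = [
    ("deliveryCost", 7),
    ("warehouseCost", 3),
]

def total_cost():
    # A derived tuple: computed like a function, consumed like data.
    d = next(v for (name, v) in facts if name == "deliveryCost")
    w = next(v for (name, v) in facts if name == "warehouseCost")
    return ("totalCost", d + w)

def query(name):
    """Uniform access: the caller doesn't care whether a tuple
    is passively stored or actively derived."""
    derived = {"totalCost": total_cost}
    if name in derived:
        yield derived[name]()
    for fact in facts:
        if fact[0] == name:
            yield fact

print(list(query("totalCost")))      # derived on the fly
print(list(query("deliveryCost")))   # read from storage
```

Replacing `total_cost` with a stored tuple `("totalCost", 10)` would change nothing for any caller, which is the point.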
Data stored in relational DBs is often mapped in an object-oriented style, opening the way to a variety of difficulties due to the tendency to distort the nature of relational data into hierarchical classes. This is even more evident when applied to the loosely structured property-graph DBs known as knowledge graphs. We encourage modelling data as tuples, as they represent the common denominator of almost all data representations. Tuples are abstract enough to convey what is necessary to implement the business logic while omitting the details of the integrated information systems.
Logic: “the study of correct reasoning, especially regarding making inferences.”
LP languages have been the subject of computer science research for decades. Their founding principle was caring for the logic of complex systems, a principle that has never been of secondary importance in software systems. The code/data duality in LP makes it natural to use an LP language as a database language, since relational algebra can be expressed directly, including tabular relations, views and integrity constraints.
One of the pillars of LP is recursion for managing data structures such as graphs, trees, lists, and even natural numbers. Though it is easy to find similarities with functional programming, LP languages like Prolog offer an intuitive programming experience with multi-directionality of computation: once the interface of a predicate is defined, it can be queried in any possible way, thanks to unification algorithms.
Correctness and predictable results are facilitated by a compiled and strongly typed system. Though this is not widely adopted in LP, nothing prevents the programming system from being backed by a type-safe language. Types are not only primitive strings, numbers or booleans, but also complex data types such as the immutable named tuples commonly present in all programming languages.
The declarative nature of LP, combined with multi-directionality, relational algebra, homoiconicity and programming capabilities for a smooth programming experience, actually hides a complexity that resembles a software system in its own right rather than a mere programming language.
One fundamental thing a Prolog-inspired language offers is the idea of finding answers to your questions. Which kind of questions? Anything that can be encoded in a logical form, for example “Where is Venice?”
?(“locatedAt”, “Venice”, x)
What is the aggregated cost (y) of delivery (d) and warehouse (w) for our eCommerce store?
?(“deliveryCost”, d), (“warehouseCost”, w), y := d + w
Is it true that the item does not cost more than 200?
?(“price”, item, x), x <= 200
The engine iterates over the data, and it will return the values you asked for, or it will say whether your statement is true or not. Basically, it implements and runs the routines for you. What you have to do is tell the system what is true and under what conditions; then the system takes on the burden of computation. With it, the model of your business domain is purely abstract and logical, and the representation you define is closer to your mind than to the machine. Ludwig is intended to be a tool that is easy to use by application programmers while at the same time being understandable by analysts.
I would frame Prolog as a programming style for tracing relations between facts and implications. Facts are assertions that we know to be true; they are information, data. Implications are the derived information that can be drawn from facts, either explicitly or, more interestingly, from other implications. What I find interesting from a programming experience perspective is that facts and implications are represented in the same way, lowering the learning curve compared to other approaches. This is the principle of homoiconicity, which is summarised by saying that the language treats “code as data”.
One of the most intriguing ideas in Prolog that I would be extremely happy to have fully supported in Ludwig is the concept of unification. It’s the feature that distinguishes a high-level language like Prolog from all others, and I will show you what it is with an example taken from SWI-Prolog.
`nth0` is an apparently not-so-cool predicate for some operations on lists. It looks like this:

nth0(I, L, E, R)

where the variable `I` is the index of the element `E` in the list `L`, and `R` is the remainder of `L` without `E`. If it sounds cryptic, it is actually extremely simple. I provide a list `[a,b,c]` and I ask the system: please, give me back `I`, `E`, `R`:

? nth0(I, [a,b,c], E, R)
As you might imagine, there are multiple answers: `E` could be `a`, `b` or `c`, right? And the answers will reflect this:

I = 0, E = a, R = [b, c]
I = 1, E = b, R = [a, c]
I = 2, E = c, R = [a, b]
Unification in Prolog allows us not just to use predicates in a single way, as in any other programming language, but to exploit them in any conceivable way. In the following case, if I give the original list and the remainder, what is the missing piece `E`?
?nth0(I, [a, b, c, d], E, [b, c, d]) => I = 0, E = a
If instead I give the remainder and the element, but not the original list, what could the originating list be?
? nth0(I, L, a, [b, c, d])
I = 0, L = [a, b, c, d]
I = 3, L = [b, c, d, a]
Intuitively, we have 2 solutions: in the first, the single element is prepended at the beginning of the remainder; in the second, it is appended to the end of the remainder. In both cases, the definition of the `nth0` predicate is satisfied. The system returns the missing slots that satisfy the logical contract formulated in the predicate. Differently from any other language, a predicate definition is sufficient for an entire set of functionalities.
Unification should not be seen as limited to implications: even independent facts can extend this principle to define entire classes of concepts. What distinguishes a vertical segment from any other in the space? Any segment whose two endpoints share the same x coordinate is vertical. I can express this without any explicit condition; I just translate the definition of being vertical into logical form, by posing the same variable in both edges:
("isVertical", ("point", x, _), ("point", x, _))
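To make the “same variable in both edges” trick concrete, here is a toy unifier in Python. It is only a sketch of the idea, not Ludwig’s algorithm: I represent variables as uppercase strings (Prolog-style, my own convention here) and `"_"` as the wildcard.

```python
# Toy structural unification: a pattern matches a value when constants
# agree and every occurrence of the same variable binds the same value.

def unify(pattern, value, bindings):
    if pattern == "_":                                   # wildcard
        return bindings
    if isinstance(pattern, str) and pattern[:1].isupper():  # a variable
        if pattern in bindings:
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}
    if isinstance(pattern, tuple) and isinstance(value, tuple) \
            and len(pattern) == len(value):
        for p, v in zip(pattern, value):
            bindings = unify(p, v, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == value else None        # constants

# The same variable X in both points forces equal x coordinates.
vertical = ("isVertical", ("point", "X", "_"), ("point", "X", "_"))
print(unify(vertical, ("isVertical", ("point", 2, 0), ("point", 2, 5)), {}))
print(unify(vertical, ("isVertical", ("point", 2, 0), ("point", 3, 5)), {}))
```

The first call succeeds with a binding for `X`; the second fails because `X` cannot be both 2 and 3, which is exactly the verticality constraint, expressed with no explicit comparison.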
While unification stands for unifying variables, like an AND operator, in Ludwig we also have the OR operation which, in logic programming lingo, is named resolution. Resolution is the programming feature with which the system performs pattern matching, predicate calls and recursion. In the factorial example:
object x extends VarInt // logical variable ranging over integers
object y extends VarInt
val model = ludwig(
  ("factorial", 0, 1),                                        // fact: 0! = 1
  ("factorial", x, x * y) :- (x > 0, ("factorial", x - 1, y)) // rule: x! = x * (x-1)!
)
We have one fact and one implication with the same signature: <"factorial", Int, Int>. Then we query:
model ? ("factorial", -1 to 3, x)
we get:
| Check | @3fd8v | x |
| --- | ----- | -- |
| ❌ | -1 | |
| ✅ | 0 | 1 |
| ✅ | 1 | 1 |
| ✅ | 2 | 2 |
| ✅ | 3 | 6 |
When the first argument is -1, none of the predicates matches, so the system simply tells us that the statement is false.
When the first argument is 0, only the static fact matches, and the corresponding paired argument is returned.
When the first argument is greater than 0, the second predicate matches and recursively invokes itself until, like any recursive function, it reaches the base case - the one defined by the fact.
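The evaluation traced above can be mimicked in plain Python, just to make the three cases tangible (this is my sketch of the behaviour, not how Ludwig executes; `None` stands for the ❌ rows):

```python
# Mirror of the Ludwig model: one base fact, one recursive rule,
# and "no predicate matches" for negative input.

def factorial(n):
    if n == 0:            # the fact: ("factorial", 0, 1)
        return 1
    if n > 0:             # the rule: x! = x * (x-1)!
        return n * factorial(n - 1)
    return None           # nothing matches: the statement is false

for n in range(-1, 4):
    print(n, factorial(n))
# -1 None / 0 1 / 1 1 / 2 2 / 3 6, matching the table above
```

The difference, of course, is that here the control flow is hand-written, while in Ludwig resolution derives it from the two declarative clauses.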
At the beginning of this article I compared Ludwig with SQL. I see it as a sophisticated language for both querying systems and programming at the same time, removing the hurdle of having dedicated languages – SQL/SPARQL on one end, Java/Python on the other – in your software application.