“x + y = 4” is true in a world where x is 2 and y is 2, but false in a world where x is 1 and y is 1.
PL is a simple language consisting of proposition symbols and logical connectives. Its syntax defines the allowable sentences, while its semantics defines the rules for determining the truth (just true or false) of PL sentences with respect to a particular model. The semantics of propositional logic must specify how to compute the truth value of any sentence, given a model. This is done recursively: all sentences are constructed from atomic sentences and the five connectives.
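To make the recursive semantics concrete, here is a minimal sketch (not from the original text) of a truth-value evaluator: sentences are nested tuples, a model maps proposition symbols to booleans, and the connective names are my own encoding.

```python
def pl_true(sentence, model):
    """Recursively compute the truth value of a PL sentence in a model."""
    if isinstance(sentence, str):          # atomic sentence: look it up
        return model[sentence]
    op, *args = sentence
    if op == "not":
        return not pl_true(args[0], model)
    if op == "and":
        return pl_true(args[0], model) and pl_true(args[1], model)
    if op == "or":
        return pl_true(args[0], model) or pl_true(args[1], model)
    if op == "implies":
        return (not pl_true(args[0], model)) or pl_true(args[1], model)
    if op == "iff":
        return pl_true(args[0], model) == pl_true(args[1], model)
    raise ValueError(f"unknown connective {op}")

# ("implies", "P", "Q") is false exactly when P is true and Q is false
print(pl_true(("implies", "P", "Q"), {"P": True, "Q": False}))  # False
```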
A model is a truth assignment to the propositions of a knowledge base (KB), which is a set of sentences (axioms) taken as given rather than derived from other sentences. New sentences can be derived from those the KB logically entails. Entailment is the idea that a sentence follows logically from another sentence: in mathematical notation, we write α $\vDash$ β if and only if, in every model in which α is true, β is also true.
Sentences are derived by running an inference algorithm that follows either the modus ponens or the modus tollens paradigm. The former underlies the forward chaining (FC) family of algorithms, the latter the backward chaining (BC) family. FC is an example of the general concept of data-driven reasoning: reasoning in which the focus of attention starts with the known data. BC algorithms, as their name suggests, work backward from the query. If the query $q$ is known to be true, no work is needed; otherwise, the algorithm finds those implications in the knowledge base whose conclusion is $q$.
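The FC loop described above can be sketched for definite clauses; the rule set and symbol names below are illustrative assumptions, not from the text.

```python
def forward_chain(rules, facts, query):
    """Data-driven inference: fire every rule whose premises are known,
    until the query is derived or nothing new can be inferred.
    rules: list of (set_of_premises, conclusion) pairs."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in inferred and all(p in inferred for p in premises):
                inferred.add(conclusion)
                changed = True
                if conclusion == query:
                    return True
    return query in inferred

# A AND B => C, C => D; starting from facts A and B we can derive D.
rules = [({"A", "B"}, "C"), ({"C"}, "D")]
print(forward_chain(rules, {"A", "B"}, "D"))  # True
```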
For an inference algorithm to prove sentences, it should be sound and complete. A sentence is valid if it is true in all models; for example, the sentence P ∨ $\neg$P is valid. Valid sentences are also known as tautologies: they are necessarily true. An inference algorithm is sound if every sentence it derives is actually entailed, and complete if it can derive any sentence that is entailed. A proof can be obtained by reductio ad absurdum (also called proof by refutation, or by contradiction), which relies on the fact that α $\vDash$ β if and only if the sentence (α ∧ $\neg$β) is unsatisfiable. How does resolution obtain such a proof?
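The refutation idea can be illustrated with exhaustive model checking: KB $\vDash$ β holds exactly when KB ∧ ¬β has no satisfying model. This is a hedged sketch with helper names of my own choosing, not the resolution algorithm itself.

```python
from itertools import product

def pl_true(s, model):
    """Tiny evaluator for sentences built from not/and/or as nested tuples."""
    if isinstance(s, str):
        return model[s]
    op, *args = s
    if op == "not":
        return not pl_true(args[0], model)
    if op == "and":
        return pl_true(args[0], model) and pl_true(args[1], model)
    if op == "or":
        return pl_true(args[0], model) or pl_true(args[1], model)

def symbols(s, acc=None):
    """Collect the proposition symbols occurring in a sentence."""
    acc = set() if acc is None else acc
    if isinstance(s, str):
        acc.add(s)
    else:
        for a in s[1:]:
            symbols(a, acc)
    return acc

def unsatisfiable(s):
    """True iff no truth assignment makes s true (truth-table check)."""
    syms = sorted(symbols(s))
    return not any(pl_true(s, dict(zip(syms, vals)))
                   for vals in product([True, False], repeat=len(syms)))

kb = ("and", ("or", ("not", "P"), "Q"), "P")      # (P -> Q) AND P
beta = "Q"
print(unsatisfiable(("and", kb, ("not", beta))))  # True: the KB entails Q
```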
PL lacks the expressive power to concisely describe an environment with many objects.
The language of FOL is built around objects and relations: it assumes that the world consists of objects with certain relations among them that do or do not hold. While propositional logic commits only to the existence of facts, FOL commits to the existence of objects and relations, and thereby gains expressive power. FOL represents objects, predicates and functions. Predicates are $n$-ary relations among objects, or express features of a single object. Functions are expressions that return a single object from a set of arguments.
In FOL it is natural to express properties of entire sets of objects, by adopting the existential quantifier ($\exists$) and the universal quantifier ($\forall$).
An interesting application of FOL is the description of Peano numbers, a simple way of representing the natural numbers ($Nat$) recursively, using only an axiomatic value ($Zero$) and a successor function $succ(Nat)$.
\(Nat(Zero).
\forall x [Nat(x) \rightarrow Nat(succ(x))].\)
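The two Peano axioms above can be mirrored in code; the encoding below (nested tuples as successors) is one possible illustrative choice, not from the text.

```python
# Zero is the axiomatic base value; succ builds each natural number
# from its predecessor, exactly as the two FOL axioms state.
Zero = ()

def succ(n):
    return (n,)

def is_nat(x):
    """Nat(Zero) holds; Nat(succ(x)) holds whenever Nat(x) holds."""
    return x == Zero or (isinstance(x, tuple) and len(x) == 1 and is_nat(x[0]))

def to_int(n):
    """Convert a Peano number back to a Python int, for readability."""
    return 0 if n == Zero else 1 + to_int(n[0])

three = succ(succ(succ(Zero)))
print(is_nat(three), to_int(three))  # True 3
```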
Unification is the process of making different logical expressions look the same, whenever possible, by properly substituting values for variables. With unification it is possible to construct all queries that unify with a given sentence. For example, given $Employes(SAP, Giancarlo)$ and $Male(Giancarlo)$, a query might be: is there a male employee in SAP? In FOL: $\exists x [Male(x) \wedge Employes(SAP,x)]$.
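Here is a compact, hedged sketch of the unification procedure (occurs-check omitted for brevity); the convention that lowercase strings are variables is my own assumption, and the Employes/Male terms mirror the example above.

```python
def is_var(t):
    """Convention (assumed): lowercase strings are variables."""
    return isinstance(t, str) and t[:1].islower()

def substitute(t, subst):
    """Follow variable bindings to the current value of a term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return a substitution unifying a and b, or None on failure."""
    subst = {} if subst is None else subst
    a, b = substitute(a, subst), substitute(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):            # unify argument by argument
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

# Unify the query Employes(SAP, x) against the fact Employes(SAP, Giancarlo).
s = unify(("Employes", "SAP", "x"), ("Employes", "SAP", "Giancarlo"))
print(s)  # {'x': 'Giancarlo'}
```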
Reasoning in FOL works by bringing the formula into Skolem form, removing the existential quantifiers and leaving the universal ones implicit, and transforming it into clause normal form (a formula written as a conjunction of clauses). Then propositional reasoning (resolution, SAT), forward chaining or backward chaining can be applied.
The Herbrand Universe (HU) is the set of all ground terms (terms without variables) occurring in a formula. For example, in the clause $CapitalOf(x,y) \rightarrow IsA(x,City), IsA(y,Country), PartOf(x,y)$ the HU is $\{City, Country\}$; but if a function symbol were present, the HU would be infinite. The HU is useful, for example, to restrict the search space of Prolog programs whenever possible.
The HU plays a part in resolution for FOL through the Herbrand expansion, the set of formulas that results from substituting ground terms into the initial formula in all possible ways. Given a knowledge base: $KB = \forall x [SpecialAgent(x) \rightarrow SpiesOn(x, Danz)] \wedge SpecialAgent(MrSmith)$
it translates as “every special agent spies on Danz, and MrSmith is a special agent”.
If we advance the hypothesis that the formula $\phi=SpiesOn(MrSmith, Danz)$ is entailed by the KB ($KB \vDash \phi$), the Herbrand expansion can be applied to show that $HE(KB \wedge \neg\phi)$ is unsatisfiable.
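Because the example has no function symbols, the expansion is finite and entailment can be settled propositionally. Below is a hedged sketch of that grounding step, with string-encoded atoms as an illustrative choice.

```python
# Herbrand universe for the example: just the two constants.
constants = ["MrSmith", "Danz"]

def expansion():
    """Ground instances of SpecialAgent(x) -> SpiesOn(x, Danz),
    one per element of the Herbrand universe."""
    return [((f"SpecialAgent({c})",), f"SpiesOn({c},Danz)") for c in constants]

# Start from the KB's ground fact and fire every grounded rule.
facts = {"SpecialAgent(MrSmith)"}
for premises, conclusion in expansion():
    if all(p in facts for p in premises):
        facts.add(conclusion)

phi = "SpiesOn(MrSmith,Danz)"
print(phi in facts)  # True, so KB AND not-phi is unsatisfiable
```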
As in the established definition of agent1, sensation is the process of acquiring information from the environment through sensors. The behaviour the agent manifests is the result of pondered actions, activated by strategies that aim to achieve desirable goals. Living creatures, including humans, do not deviate from this general definition. Attention can be seen as a function that helps elaborate successful strategies efficiently, by partitioning what can really affect the goals from what cannot 2.
Sensation takes its meaning directly from the raw acquisition of information through the senses. These include of course the five canonical ones, but are not limited to them: they also involve movement, position of the body, pain and temperature. It can be surprising how sensitive human senses are; for example, it is possible to detect a candle almost 50 km away on a dark and clear night 3. Perception, on the other hand, is a more abstract concept.
When perception is closely tied to sensation, it is defined as bottom-up perception. In this case, perception is the interpretation of information from the environment so that we can identify its meaning 4 and make more or less accurate predictions about what should be there. Perception, however, also emerges from cognitive processes that involve memory and attention, completely detached from sensory circuits. This scenario is described as top-down perception, and it is a process strongly mediated by expectations and by the contextual setup in which those sensations are collected 5. In the image below, the shadow is expected to lower the brightness of the B region, so a kind of innate visual compensation makes us think the region would be brighter without the shadow, generating the famous visual illusion.
Attention and perception are highly correlated, and this is very evident in everyday life. If we are walking through the university department looking for the Cognitive Psychology professor, it won’t be surprising that we pay attention to people’s faces, visually searching the crowd until the target pops out in the hall. Eye movements will be directed to scan the faces of the people 6. If expectations are violated by novel surprises (e.g. an old mate in the classroom), these are explored extensively 7. The other side of the coin is inattentional blindness, by which what is not attended to is not consciously perceived. For example, in the figure below, 60-80% of observers do not notice that the centre point turns into a text while a distracting cross appears on the left 8.
I’ve mentioned popping out in describing the subjective effect that occurs when something catches our attention with no effort. This topic shows particular dynamics depending on the environmental context. If the search task involves picking a simple feature out of a context strewn with distracting details, it is much easier than searching for a complex pattern with a combination of features, the so-called conjunctive search 9. In the experiment below, recognizing a single red dot among all the green ones is automatic, while the case in the right pane involves some degree of attentive scrutiny, which is much slower and more resource demanding.
This might suggest that attention is somehow necessary to recognize non-trivial objects 10. The feature integration theory 10 affirms that automatic feature processing is followed by attentive processes that bind the features into a whole object. Objects are not a mere list of independent features, and perception is not limited to pattern matching: it also identifies higher-order structures, with the support of controlled attention.
Just as the letter N is not simply a casual aggregation of three segments, things must obey some grammar in order to be recognized.
A demonstration of the importance of structural relations is depicted in the image below, where a set of simple forms (geons) can easily compose complex objects 11. This demonstrates that relational information is not only needed, but is even more critical to perception than the features themselves 12.
While some information needs attention to retrieve complex structures from the senses, an equal effort is required to selectively ignore conflicting information while performing certain tasks. This is the case raised by John Stroop in his famous experiment 13, which demonstrates the difficulty of partitioning contrasting information on one side from useful information on the other. In the Stroop task, an observer reports the colour of the words that appear, while the words spell out a different colour’s name. The task turns out to be more difficult than the non-contrasting setting (when the name matches the colour). Observers activate different parts of the brain when processing discordant stimuli, parts in charge of executive control and selective attention. For the same reason, when we stop at a traffic light while driving, we discriminate the light of our own lane from the adjacent ones, and selective attention prevents us from switching from the brake to the accelerator pedal when it is not appropriate.
The selective attention theory states that perception is filtered before being processed by high-level mechanisms, but it clashes with the notable cocktail party effect (CPE). Unfortunately for Broadbent, the CPE suggests that the selective filter is not the only way the brain deals with attention. The CPE describes the capability of being caught by unattended stimuli once they present an important pattern. For example, while we are talking with friends and someone outside the interlocutors’ circle loudly mentions our name, our attention will probably be drawn by this event toward the speaker and his discourse, even though we were previously unaware of that discussion.
An interesting effect of unattended stimuli is that they interfere with attended perceptions. The experiments that shed light on this effect enforce shadowing, in which observers are instructed to follow only one stream of perception among many. Suppose two recorded discourses are played simultaneously, one to the left side and the other to the right side of the observer’s headset:
a: “They were standing near the bank…”
b: “the silicon valley bank has gone bankrupt…” ^[It is not exactly the example from the experiment, since that bank went bankrupt in 2023, but I guess it is equivalent for the purposes of the experiment]
Observers disambiguate the term bank as the financial bank, and not as its other possible meanings 14. This experiment tells us that although attention allocates sufficient resources to spotlight some stimuli, it leaves space for unconscious mechanisms that capture information from the unattended channel and blend it with the attended one.
Giancarlo Frison
Russell, Stuart J.; Norvig, Peter; Artificial Intelligence: A Modern Approach (2003, 2nd ed.); Chapter 2. ↩
There is no clear definition of what attention is: “No one knows what attention is” (Hommel et al., 2019). The first attempt to characterize it comes from William James in The Principles of Psychology (1890): “is the taking possession by the mind, in clear and vivid form, of one out of what may seem several simultaneously possible objects or trains of thought…It implies withdrawal from some things in order to deal effectively with others”. In the Schema Theory (Neisser, 1976), attention is a dynamic process that seeks information consistent with the current situation. I think a good synthesis of the many definitions could be: “attention is the allocation of resources and processing to a particular object, region, dimension”. ↩
Okawa & Sampath, 2007 ↩
William Wozniak. Sensation and Perception ↩
Neisser, 1976 ↩
Yarbus, 1967 ↩
The metaphor of the brain as a predictive machine finds matches, for example, in Jeff Hawkins - On Intelligence (2004) and Karl Friston - The free-energy principle: a unified brain theory? (2010). The latter reduces agents to surprise minimizers; the former describes how automatic processing escalates to higher forms of deliberate decision making (through attention mechanisms) whenever the automatic layer does not know what to do in certain circumstances. ↩
Mack &amp; Rock, 1998 ↩
Treisman & Gelade, 1980; Treisman & Sato, 1990 ↩
Anne Treisman, Garry Gelade; A feature-integration theory of attention (1980). ↩
Biederman, 1987 ↩
Biederman, 1985 ↩
Stroop, John Ridley - Studies of interference in serial verbal reactions (1935) ↩
MacKay, 1973 ↩
“This is the promise of the Semantic Web – it will improve all the areas of your life where you currently use syllogisms. Which is to say, almost nowhere.”
—Clay Shirky
“Fortunately, a large majority of the information we want to express is along the lines of ‘a hex-head bolt is a type of machine bolt.’”
—Berners-Lee
“Unfortunately this is not true. If one considers how humans handle concepts, the class relation structures of the Semantic Web capture only a minute part of our information about concepts”
—Peter Gärdenfors
I guess it doesn’t go unnoticed that the semantic web approach isn’t the favourite of the author of Conceptual Spaces (CS).
Two distinguished views on knowledge representation are the semantic web and vectorized embeddings, belonging to the symbolic and the connectionist schools respectively. The CS theory comes from a very different vision of how knowledge should be encoded. Concepts in CS are, without doubt, close to vectorized embedding representations, though they preserve interpretability, a strength of the symbolic world.
Many of you have probably already heard about word embeddings for knowledge graphs. Modern natural language processing tasks based on neural networks would not exist without vectorized embeddings. We are witnessing immense progress in natural language processing (e.g. GPT and related models) in recent times, and their machine learning algorithms rely on word embeddings. A branch of deep learning named graph neural networks (GNN) has brought a similar advancement to machine learning tasks such as link prediction in ontologies. The intuition behind neural network embeddings is that words or graph nodes can be represented as a series of real numbers that embed their semantics, and programs of a very special kind, neural networks, can use them to accomplish specific tasks. Vectorized embeddings come as a byproduct of processes with no human supervision in the loop. Embeddings, differently from ontologies, do not convey any semantics that is comprehensible, let alone editable, by people.
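As a toy illustration of why vectorized embeddings support similarity (all vectors below are invented, not from any trained model), cosine similarity places semantically close words near each other:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Invented 3-dimensional embeddings for illustration only.
emb = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))  # True
```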
If asked during a conference talk, I can hardly imagine someone raising a hand to report past experience with CS.
In the CS framework, concepts are represented as vectors of numbers that are continuous in their spectrum and convex, which means that similarity scores and distances among concepts are naturally derivable; unlike neural embeddings, though, those vectors are not inscrutable to human inspection. Concepts are built taking into account how we process and categorise them.
Domains represent a single quality. They are convex and differentiable because they can be represented in real values. A concept is qualified via a set of domains: an apple has a round shape, a colour in the range of green, yellow and red, a taste (which is apparently related to the colour), a size, a weight. As with neural embeddings, similar concepts cluster together in regions. In CS, concepts are defined by vectorized properties, each describing a quality in a specific domain. The colour domain, for example, can be encoded as a value ranging from #000000 (black) to #FFFFFF (white), which is numerically consistent with our perception, from nothing to all colours together. Further, a slight change in one parameter implies a small visual change, more or less reddish, bluish or greenish depending on which parameter.

“The things have weight, mass, volume, size, time, shape, colour, position, texture, duration, density, smell, value, consistency, depth, boundaries, temperature, function, appearance, price, fate, age, significance. The things have no peace.”
—Arnaldo Antunes
The main similarity between neural embeddings and CS is that both are differentiable and convex. At the same time, concepts incorporate symbolism with their multi-faceted domains; they naturally enable semantic algebra and computational problem solving. For example, the request ‘show me a movie like Casablanca but as scary as Shining’ will consider the properties of Casablanca but with the scary domain similar to Shining. Put it in the frame of logic programming, and you get it in a line of code.
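The Casablanca/Shining request can be sketched as vector manipulation in a conceptual space: copy one movie’s point, override its scary domain, and rank the catalogue by distance. All domains and numbers below are invented for illustration.

```python
# Invented conceptual-space coordinates, one dimension per domain.
movies = {
    "Casablanca": {"romance": 0.9, "scary": 0.10, "pace": 0.40},
    "Shining":    {"romance": 0.1, "scary": 0.95, "pace": 0.50},
    "Rebecca":    {"romance": 0.8, "scary": 0.70, "pace": 0.45},
    "Mamma Mia":  {"romance": 0.7, "scary": 0.00, "pace": 0.80},
}

def distance(a, b):
    """Euclidean distance between two points in the conceptual space."""
    return sum((a[d] - b[d]) ** 2 for d in a) ** 0.5

# 'Like Casablanca but as scary as Shining': semantic algebra on domains.
target = dict(movies["Casablanca"], scary=movies["Shining"]["scary"])
best = min((m for m in movies if m not in ("Casablanca", "Shining")),
           key=lambda m: distance(movies[m], target))
print(best)  # Rebecca
```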
CS also has some interesting psychological foundations regarding how people deal with inner knowledge representations and how they learn them. It has shown some validity in explaining cognitive aspects, especially those involved in concept learning and understanding. It has been found that once children have assimilated the meaning of a domain, it is then easy to learn concepts that represent a flavoured materialisation of that domain. For example, once they know what the domain of ‘colour’ is, it is easy to learn new colour concepts, such as ‘turquoise’. Grasping a new domain is a much harder step than adding new terms to an already established one. Conceptual domains are mental buckets where we place concepts based on how their properties fit into that domain; we do not have to know how ‘turquoise’ is exactly encoded, we just need to think of it in comparison to other concepts, as somewhere between light blue and light green. This seems to be a provable trick we use in learning, justified by the principle of cognitive economy: our mental capabilities are limited, so we favour simple and efficient ways to position new information.
I’ve written more about cognitive aspects in meeting of minds. Take a look!
👉 Hereby, a brief informal account of the idea behind this patent.
In a behavioural test back in the seventies, people about to use a copying machine were asked to let another person, the experimenter, use it first despite there being a line.
“Excuse me, I have 5 pages. May I use the xerox machine?”. With this request, 60% of the people let the experimenter go first. Then the experimenters changed the call into:
“Excuse me, I have 5 pages. May I use the xerox machine because I have to make copies?” The justification is clearly nonsense. It is technically called ‘placebic information’ because, much like a placebo in pharmacy, the given explanation does not contain any additional information. You might be surprised that in the latter case the rate of success reaches an astonishing 93%.
This experiment teaches something interesting about how we evaluate information from the environment. Even without any significance, just the feeling of a reason can dramatically change how we behave. Most of our daily behaviour is accomplished without paying attention to the informative details. This is obviously not new: advertisers know it very well. But we are not doomed to always be fooled; we become conscious when the decisions are more important.
Differently from the copying machine experiment, argumentation is a bit more interactive than a single-shot justification. We are, as humans, designed to improve our view of the world through confrontation with others. We need to be challenged to defend good ideas or abandon wrong ones. Argumentation involves a confrontation of clashing reasons; sometimes our beliefs turn out to be inconsistent in the light of new objections, and we must give up some of them.
John Stuart Mill once said “Both teachers and learners go to sleep at their post as soon as there is no enemy in the field”, which captures the point of argumentation.
Argumentation is a fervid research topic, and in its basic form it is simple to explain; let me try here. Arguments can be represented as symbols: they are nodes in a graph of concepts and other arguments. An argument can be in two states, active or defeated. It is defeated when it is attacked by an active argument; if it is attacked only by defeated arguments, it is active. If no argument attacks it, it is active and can attack others. That’s not really complicated.
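The active/defeated rule just described can be sketched as a small recursive computation over an attack graph. The graph below mirrors the professionist/inability/learning example developed later; the simplistic cycle guard is an assumption of this sketch, not part of the text.

```python
def statuses(arguments, attacks):
    """attacks: set of (attacker, target) pairs.
    An argument is active iff every argument attacking it is defeated
    (vacuously true when it has no attackers)."""
    status = {}

    def active(a, visiting=frozenset()):
        if a in status:
            return status[a]
        if a in visiting:                 # simplistic cycle guard for this sketch
            return False
        attackers = [x for x, y in attacks if y == a]
        result = all(not active(x, visiting | {a}) for x in attackers)
        status[a] = result
        return result

    for a in arguments:
        active(a)
    return {a: ("active" if v else "defeated") for a, v in status.items()}

args = {"professionist", "inability", "learning"}
attacks = {("inability", "professionist"), ("learning", "inability")}
print(statuses(args, attacks))
# learning is unattacked and active, so inability is defeated
# and professionist is active again
```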
The eStore sells photography items; every product is represented in a concept graph, and the system knows some properties of the articles for sale. We know that the Canon EOS-1D is a high-quality camera and that professionals desire high-quality cameras. John, a user browsing the eStore, after a careful inspection decides to add it to the cart. We can interpret this as John desiring that product, and store this information in his personal knowledge graph. This information enables further reasoning about who John is and what his peculiarities are. We can infer that John may be a professional. Why? Because John and professionals share common desires. Is it enough to be certain of that? Of course not. But it is a hypothesis the system should consider.
But there is a problem. The eStore’s machine learning system is raising some alarms about John. We know that the chances John will finalize the purchase are low (for a variety of reasons), and the system has to do something about that.
Let me now introduce some logic primitives that lay the foundation of logical argumentation. We know, as encoded in our knowledge base, that a professional requires some kind of ability to perform his job, right? Ability is a concept, and its antonym is inability. We can infer that inability prevents anybody from being a professional. So the inability argument attacks the professional one. Here we see a pattern; specifically (sorry for the logic lingo) we can affirm that:
attack(X, Y) ← requires(Y, Z), antonym(Z, X)
The initial hypothesis of John being a professional enables the system to draw a deductive conclusion regarding the abandoned cart. We may say that John is unable to use that camera. Why? Maybe because he does not know that camera; maybe he does not know the full set of features the device can offer. Again, it is a hypothetical yet very plausible argument. What can the system do for him? Well, we have many products in store. One of them is an online course, which is intended for learning, and learning attacks (by the same principle we have seen before) the inability argument. Why not recommend it?
John seems to be caught by our proposal and inspects the course we recommended. What could go wrong here? Could John have some reluctance even about the online course? Maybe John does not have time to attend it; he is a very busy photographer! But the system elaborates some precious information about the course to prompt a valuable argument and convince John that the recommendation is a valid one. From the description we know the course is just 20 minutes long, which is indeed very short compared with the average course on sale. Let’s tell John he has nothing to worry about regarding time, because the course is short.
Or maybe John is worried by the cost of the course? As an inattentive user, he might have overlooked some important information. No worries, John: it is free!
“What do you mean, you’ve broken it off? She was the best thing that ever happened to you. I loved her too, if the truth be known. You’re such an idiot! I have a mind to…”
“I mean I’ve broken the tip off my pen.”
“…Oh.”1
Although we can easily identify when a conversation veers off course, we are often unaware of the inner workings of our mind that silently orchestrates the communication process. This post marks the inception of a brief series, aiming to illuminate these mechanisms through a distilled interpretation of Professor Gärdenfors’ “Geometry of Meaning.” This compelling collection of theories offers valuable insights into the nature of semantics and how our cognitive faculties process it. Exploring these subjects has sparked my curiosity and fascination, particularly in relation to the ultimate objective of developing automated systems capable of emulating our cognitive processes to solve problems on our behalf.
What is that knowledge, and where does it come from? Knowledge stands for a set of justified beliefs, a core of interconnected ideas such that the causal relations among them give us a resemblance of truth. Information can help us gain knowledge, but mere data (aka Big Data) is not sufficient to build the mental affordances we need to function in a complex world. By affordance I mean the possibility of an action that an entity can offer. We continuously encounter strange situations, and we have to figure out which tool from our toolset of knowledge to pick up in order to decide what to do next.
What we think, and what we plan to do today or in the distant future, is corroborated by what we know. We are the climbers of the ever-growing mountain range of understanding that we started to absorb even before we were born. We are surfing the deep pack of knowledge we have gathered about ourselves and the world.
The metaphor of the mountain as knowledge, and the climber as the agent that seeks new mastery, suggests that agents are positioned at different altitudes, different granularities of knowledge, and hence their collected background might differ. In the short dialogue I used to open this post, Bob and Alice align themselves on shared knowledge, but sometimes it is far from simple to establish apparently simple truths. Bob is ambiguous in his ‘I’ve broken it off’: he does not specify what he has actually broken off, and Alice, the interlocutor, wrongly infers he is leaving his girlfriend. To create the basis of reciprocal understanding, Bob pulls the communication level down and raises Alice’s knowledge by specifying that what is broken is the tip of his pen, and not his relationship.
I guess everybody agrees on the importance of sharing a common background of knowledge, and Bob shows us that we shift from a high semantic layer to a lower one to meet our interlocutor and restart the coordination from there. This implies that knowledge lies in hierarchical structures where the first rung of the ladder is populated by fundamental and cognitively irreducible terms; they form new and more abstract concepts, with increasing abstraction as we climb the ladder to the upper layers. Simon Winter explains this idea of layered levels of knowledge by analyzing how a master craftsman teaches a novice to replace a violin’s strings, where non-verbal communication is also part of the game. He summarizes context levels into:
I don’t know whether there is a finite number of layers or whether the hierarchy is open-ended, but at a high level of shared knowledge, communication flows according to ‘the obvious goes without saying’, where implicit understandings are at their maximum. I’m also a bit skeptical of the lower bound of this ranking. If we take for granted that the meaning of a word can be decomposed into a finite set of conditions that are necessary and sufficient to describe it, this clashes with level 0, where concepts are irreducible. If I open a dictionary and every word, even the simplest one, is defined by means of other words, circular loops in meaning references are inevitable, an unequivocal sign of inconsistency.
Language and other forms of communication open the way to various types of engagement. Sophisticated collaborations enable the collective creation of value that cannot be achieved by single individuals; they are fruits of communication. If we could summarise in a single word what communication stands for, I would recommend coordination. Coordination among interlocutors is a practical way to point at objects, places or actions, but also an alignment of intents, ideas, persuasions and even entertainment. More generically, coordination is a convergence of mental representations.
Coordination implies a transfer of information, expressed in several forms, of which language is the richest and most expressive. At the dawn of humanity, when communicative acts became more varied and detached from immediate practical purposes, the value of meanings, or semantics, became more salient in communication. Coordination among participants is an iterative process in which the meeting of minds ultimately converges on an alignment of meanings. Sounds almost romantic! Borrowing from maths, the metaphorical meeting point is also named a fixpoint. A fixpoint is a value for which a function returns exactly the same value, a value on which the function behaves as the identity. Let’s unfold how the process occurs in our dialogue.
More mechanically, Bob applies the expressive function f to his mental idea, and the sentence is passed to Alice; she then applies g’, the interpretative function, and acquires the meaning of that sentence, which is returned to Bob as she has understood it. Alice encodes her understanding into an expression, then Bob applies f’, the inverse function (cofunctor) of f, to acquire Alice’s expression. If the idea generated in the round trip coincides with the original one, the coordination has been successful. If the original and derived ideas do not overlap, the alignment failed, and Bob will attempt to correct Alice.
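The round trip and its fixpoint can be sketched numerically; the expressive and interpretative functions below are invented stand-ins for illustration, not a model of real communication.

```python
def round_trip(idea, express, interpret):
    """One conversational round: Bob expresses, Alice interprets."""
    return interpret(express(idea))

def coordinate(idea, express, interpret, tolerance=1e-6, max_rounds=100):
    """Iterate rounds until the returned idea matches the current one,
    i.e. until a fixpoint of the round trip is reached."""
    for _ in range(max_rounds):
        returned = round_trip(idea, express, interpret)
        if abs(returned - idea) < tolerance:   # fixpoint: alignment succeeded
            return idea
        idea = (idea + returned) / 2           # Bob adjusts toward Alice
    return idea

# A lossy channel whose only fixpoint is 0: the shared value converges there.
value = coordinate(1.0, express=lambda x: x * 0.8, interpret=lambda x: x * 0.9)
print(round(value, 3))  # 0.0
```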
To effectively fix Alice’s misunderstanding, the distance between Bob’s idea and what he derived from Alice’s feedback should be articulated enough to convey rich information: not just a quantitative distance from the successful fixpoint, but also a qualitative account of how the two differ. This is why the discrepancy should be encoded as a vector, so that it is also descriptive of the conversation’s status. This introduces the next topic, conceptual spaces, which places the representation of semantic values between symbolic AI and neural networks.
Continue reading with conceptual space as neurosymbolic representation.
http://fiftywordstories.com/2013/09/11/connell-wayne-regner-oh/ ↩
Search text usually consists of one or more keywords in sequence that identify the item with increasing accuracy, and it will not be a surprise that such utterances can take an extremely simple form. Probably similar to the structure of our ancestors’ early attempts (me Tarzan, you Jane), search requests may lack prepositions and adverbs, and in general may not exhibit any kind of grammar. In an electronics eStore, a text such as mem disk 500gb flash drive would be interpreted as a series of constraints that must all be simultaneously satisfied: mem disk AND 500gb AND flash drive. As a result, it would prompt us with a list of external USB memory storage devices with 500GB of capacity.
What if the utterance included different modalities for discriminating attributes? Take the sample mem disk flash drive with more than 500gb. I would expect to see all USB sticks with at least 500Gb. Unfortunately, this is not the case. The result does not change from the previous query, and we can conclude the search engine can’t handle simple inequalities such as more than or less than. If that limitation is not worrying enough, please pause reading and answer the following quiz: what do you think the query a Nikon cheaper than a Canon EOS-1D will show you as results? Yes, you will get a mixed list of Nikons and EOS-1Ds together, which is not, indeed, what we meant. I think it’s time to evolve search engines and align them with the intricacies of natural language.
The title mentions question answering because of the paradigm shift we want to push forward in this new branch of research. Have you ever heard of the IBM agent that competed in the Jeopardy! quiz show? We treat user utterances as carriers of semantics, closer to our natural language than to a mere search for products. While this approach can help build better search engines, it can also be applied to digital assistants of various natures: IT support, meeting planners, analytics. But for now, let’s focus on searching items in eCommerce.
The set of tools developed for business process modelling is extremely helpful, and it is paving the way to innovative ideas that share the same founding principles:
Traditionally, the bulk of the effort in setting up a proper search service is directed toward database indexation, corroborated by named entity recognition (NER) and information retrieval techniques. QA systems, moreover, present two faces: one dedicated to disentangling the utterance’s logic, the other converting the semantics into an executable query that picks up results from a given information system. The former is responsible for interpreting what the user is texting; it is invariant with respect to the domain context, but dependent on the user language. The latter component, the one closer to databases, is specific to the customer/domain of interest and the data topology. This architecture allows great portability, with customization effort reduced to a minimum.
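As a sketch of the two-faced architecture (every name, the toy grammar, and the SQL target here are my own assumptions, not the actual system), a language-dependent interpreter could extract keywords and inequality constraints into a neutral intent, and a domain-specific translator could turn that intent into an executable query:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Intent:
    keywords: list
    constraints: list = field(default_factory=list)  # (attribute, op, value)

def interpret(utterance: str) -> Intent:
    """Language-dependent face: text -> domain-agnostic intent.
    Only handles a toy 'more/less than N gb' pattern."""
    constraints, text = [], utterance
    m = re.search(r"(more|less) than (\d+)\s*gb", utterance, re.I)
    if m:
        op = ">" if m.group(1).lower() == "more" else "<"
        constraints.append(("capacity_gb", op, int(m.group(2))))
        text = utterance[:m.start()] + utterance[m.end():]
    keywords = [w for w in text.split() if w.lower() != "with"]
    return Intent(keywords, constraints)

def to_sql(intent: Intent) -> str:
    """Domain-specific face: intent -> query for one information system."""
    where = [f"description LIKE '%{k}%'" for k in intent.keywords]
    where += [f"{a} {op} {v}" for a, op, v in intent.constraints]
    return "SELECT * FROM products WHERE " + " AND ".join(where)

print(to_sql(interpret("flash drive with more than 500gb")))
```

Swapping `to_sql` for, say, a SPARQL generator would leave the interpreter untouched, which is where the claimed portability comes from.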
This project does not come out of nowhere; it is rather the continuation of research on semantic technologies and symbolic reasoning applied to a fairly challenging problem. Honestly, I can’t imagine anything more complex than language. As in any innovative endeavor, we revise what we have done in the past and try to improve from the foundations every time we face a new kind of problem, and this time it won’t be different.
“The world has the structure of the language, and the language has the form of mind.” – Eugenio Montale
I strongly believe that to attain better solutions, problems should be tackled from several angles. If the symbolic approach provided by knowledge graphs and logic programming helps to solve a variety of problems, a hybrid system – symbolic plus non-logic – can expand the reach of our technology. Prof. Gärdenfors’ work Geometry of Meaning summarises more than 20 years of research in a fully comprehensive theory (philosophical, cognitive, neuroscientific) of conceptual spaces. They are computable geometrical representations with a lot of affinity with semantic algebra, but also with the interesting feature of being convex, which is a pillar of gradient-based algorithms – basically, all modern machine learning.
Conceptual spaces describe a theory of meaning and how this theory adheres to our cognitive model of learning, which I find extremely interesting. The human-oriented basis of the theory is also inspiring for computer programs that attempt to solve complicated problems for us, such as eCommerce search. This is one of the topics I will write more about, so if you are interested, stay tuned!
This experiment teaches us something interesting about how we evaluate information from the environment. Even without any significance, just the feeling of a reason can dramatically change how we behave. Most of our daily behaviour is carried out without paying attention to the informative details. This is obviously not new - advertisers know it very well. But we are not doomed to be fooled forever: we become attentive when the decisions are more important.
If the experimenter asks to copy 20 pages rather than 5, the results are very different. The more demanding request is evaluated thoroughly, and the rate of success no longer differs between the two cases above: with a placebic reason or no reason at all, the success rate is equally low (20%). Instead, what works is a justification such as:
“Excuse me, I have 20 pages. May I use the xerox machine because I’m in a hurry?”
Admittedly, it is not a strong justification, but some compassionate colleagues let you pass ahead. This justified reason convinced more than double the subjects: 40%. Is sensitivity towards justifications something that should be considered with online retail customers?
When we pose an explanation for changing a decision, we are giving an argument that supports or attacks a thesis. Differently from the copying machine experiment, argumentation is a bit more interactive than a single-shot justification. We humans are designed to improve our view of the world through confrontation with others. John Stuart Mill made the point when he said: “Both teachers and learners go to sleep at their post as soon as there is no enemy in the field”. We need to be challenged to defend good ideas or abandon wrong ones. Argumentation involves a confrontation of clashing reasons, and sometimes beliefs turn out to be inconsistent in the light of new objections, so we must give up some of them.
Argumentation is a fervid research topic, and in its basic form it is simple to explain. Let me try here. Arguments can be represented as symbols; they are nodes in a graph of concepts and other arguments. An argument can be in one of two states: active or defeated. It is defeated when it is attacked by an active argument; if it is attacked only by defeated ones, it is active. If no argument attacks it, it is active and can attack others. That’s not really complicated. Here is an example: the argument “Home Office Working” is usually attacked by the “Employee Disconnection” argument, which in turn is attacked by “Video Conferencing”. Graphs of arguments grow indefinitely, and even with their simple statuses (active/defeated) they can generate complex dynamics. An activation may cause a cascading effect across a large portion of the network: the main claim can be obliterated by a single argument far away in the graph.
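The active/defeated rules above can be computed as a fixpoint over the attack graph. This is a minimal sketch (the argument names reuse the example; the function name is mine), in the spirit of grounded semantics from abstract argumentation:

```python
# An argument is "active" when every attacker is defeated,
# "defeated" when some attacker is active. Unattacked arguments
# are active. We iterate until no status changes.

def grounded_labelling(attacks):
    """attacks: dict mapping each argument to the set of its attackers."""
    status = {}  # argument -> "active" | "defeated"
    changed = True
    while changed:
        changed = False
        for arg, attackers in attacks.items():
            if arg in status:
                continue
            if all(status.get(a) == "defeated" for a in attackers):
                status[arg] = "active"      # all attackers already defeated
                changed = True
            elif any(status.get(a) == "active" for a in attackers):
                status[arg] = "defeated"    # some active attacker wins
                changed = True
    return status

graph = {
    "VideoConferencing": set(),                        # unattacked
    "EmployeeDisconnection": {"VideoConferencing"},
    "HomeOfficeWorking": {"EmployeeDisconnection"},
}
print(grounded_labelling(graph))
# "Video Conferencing" reinstates "Home Office Working"
```

Arguments caught in attack loops are simply left unlabelled by this procedure, which anticipates the complication discussed below.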
Argumentation in online retail
Let’s talk about argumentation in eCommerce with a story. John is a professional photographer, and he needs to learn about an advanced camera. After he adds an online course to the cart, the system detects a high risk of churning: John most probably won’t take that course. Why? It’s difficult to guess, but we can make suppositions by properly using the knowledge system, which expands the catalogue products into a wider semantic network - a knowledge graph of concepts the system can exploit to make sense of what’s going on with John and his reluctance to finalise the checkout.
According to the knowledge base, a learning course requires time, which might be an issue for a busy professional. How does the system know whether this is the case? Semantic algebra is very helpful there, but it is just one hypothesis among many others: the system needs validation. Suppose John confirms; we now know he does not have time for it. The course is just 20 minutes long (a crash course indeed), which is short compared with others in its category. This is a counterargument that invalidates John’s misconception.
John is convinced, but there is another problem: his budget is very tight. But wait, this course is free of charge; therefore the ‘low budget’ argument is defeated, and the main claim (John attends the recommended online course) is active and valid.
I don’t know whether John eventually enrolled in that course, but the system attempted to change his mind. His misbeliefs were neutralised by reasons the system was able to extract from the knowledge system.
We were also lucky: the arguments provided do not attack each other by forming loops in the graph, which would considerably complicate the argumentation. Here is an example of what could happen. Think of three football teams: Italy, Germany and Brazil. Let’s assume Italy wins against Germany (not always, btw), Germany beats Brazil, and Brazil beats Italy. In a knockout stage, who will be the winner? Well, it depends on how the competition is set up. It is the argumentation system that has to set up the proper strategy for achieving the desired goal. There are other cases where things get complicated: assume not all arguments are equally important, or that arguments are not true (or false) with the same intensity.
Unconventional programming systems (PSs) have been explored by industries since the beginning of computer programming. Consider for example spreadsheets – largely used by non-professional programmers – where formulas create views on the data stored in the grid. Think of computable notebooks such as Jupyter for live-coding programming snippets with numerical or graphical output. There is a plethora of PSs that have inspired the exploration of the topics of:
If low-code (LC) platforms improve ROI and productivity by automating a narrow set of business-specific problems, they can hardly fulfil the same promises in domains for which they were not initially conceived. A mobile app builder might accelerate the development of a standardised set of use cases, but when required features are not covered by the platform, developers are forced to fall back into more traditional programming styles.
While it is desirable to clear out the verbosity and the irrelevant code sections that do not bring any value, the complexity of business problems is irreducible. A proper programming system gives skilled programmers full expressivity for solving complex problems, and allows citizen developers to build simple applications autonomously. Our system should enable us to build complex algorithms, but at the same time allow unskilled practitioners to at least understand how routines work, and possibly implement basic programs without involving development teams. It’s possible to be approachable for a ‘hello world’ without sacrificing fundamental programming expressivity.
Let’s make easy things easy, and difficult ones possible
The diversity of solutions for delivering software artefacts requires unprecedented flexibility in how new practices are integrated into existing software development pipelines. The multi-stage process that brings programs from coding to live production might be highly customised and automated; but also, for the diametrically opposite need of simplicity, it should offer fully managed and standardised lifecycles.
To lower the barrier in already established architectures without sacrificing the core features, we should provide SDKs for the language of choice without interfering with the established development cycle. At the same time, it should be available as an on-demand offering such as Logic as a Service (LaaS): an online-only managed platform intended for directly editing services, with near-instantaneous testing, then publishing a cloud API in a completely serverless fashion.
Business applications that do not handle stored information are a narrow niche; in almost all business domains, software services manipulate stored data. Generic PLs incorporate data only as in-memory pointers, while stored data is retrieved through query frameworks, which are very distinct pieces of software from the rest of the programming system. This dichotomy between data and computation does not contribute to the business value of services; it rather makes it harder to think in terms of conceptual logic. An answer to this is to treat data as code and lower the barrier for modelling logic. Accordingly, a function’s interface is no different from data’s: functions take parameters and return new information, like a query in a DB. The difference between programs and data is that a program’s output is derived while data is passively given. Data and functions (predicates) are interchangeable entities.
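A minimal Python sketch of this interchangeability (the fact names echo the cost example used elsewhere in this series; the `query` interface is my own illustration): stored tuples and computed tuples are served through the same access path, so callers never know which kind they got.

```python
# "Data as code": stored tuples and derived tuples share one query interface.

facts = [
    ("deliveryCost", 7),
    ("warehouseCost", 3),
]

def total_cost():
    # A derived tuple: computed like a function, consumed like data.
    d = next(v for (name, v) in facts if name == "deliveryCost")
    w = next(v for (name, v) in facts if name == "warehouseCost")
    return ("totalCost", d + w)

def query(name):
    """Uniform access: the caller doesn't care whether a tuple
    is passively stored or actively derived."""
    derived = {"totalCost": total_cost}
    if name in derived:
        yield derived[name]()
    for fact in facts:
        if fact[0] == name:
            yield fact

print(list(query("totalCost")))      # derived on the fly
print(list(query("deliveryCost")))   # read from storage
```

Replacing `total_cost` with a stored tuple `("totalCost", 10)` would change nothing for any caller, which is the point.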
Data stored in relational DBs is often mapped in an object-oriented style, opening the way to a variety of difficulties due to the tendency to distort the nature of relational data into hierarchical classes. This is even more evident when applied to the loosely structured property-graph DBs known as knowledge graphs. We encourage modelling data as tuples, as they represent the common denominator of almost all data representations. Tuples are abstract enough to convey what is necessary to implement the business logic while omitting the details of the integrated information systems.
Logic: “the study of correct reasoning, especially regarding making inferences.”
LP languages have been the subject of computer science research for decades. Their founding principle was caring for the logic of complex systems, a principle that has never been of secondary importance in software systems. The code/data duality in LP makes it natural to use an LP language as a database language, since relational algebra can be expressed directly, including tabular relations, views and integrity constraints.
One of the pillars of LP is recursion for managing data structures such as graphs, trees, lists, and even natural numbers. Though it is easy to find similarities with functional programming, LP languages like Prolog offer an intuitive programming experience with multi-directionality of computation: once the interface of a predicate is defined, it can be queried in any possible way, thanks to unification algorithms.
Correctness and predictable results are facilitated by a compiled and strongly typed system. Though this is not widely adopted in LP, nothing prevents the programming system from being backed by a type-safe language. Types are not only primitive strings, numbers or booleans, but also complex data types such as the immutable named tuples commonly present in all programming languages.
The declarative nature of LP, combined with multi-directionality, relational algebra, homoiconicity and programming capabilities for a smooth programming experience, actually hides a complexity that resembles a software system in its own right rather than a mere programming language.
One fundamental thing a Prolog-inspired language offers is the idea of finding answers to your questions. Which kind of questions? Anything that can be encoded in a logical form, for example “Where is Venice?”
?(“locatedAt”, “Venice”, x)
What is the aggregated cost (y) of delivery (d) and warehouse (w) for our eCommerce store?
?(“deliveryCost”, d), (“warehouseCost”, w), y := d + w
Is it true that the item does not cost more than 200?
?(“price”, item, x), x <= 200
The engine iterates over the data, and it will return the values you asked for, or it will say whether your statement is true or not. Basically, it implements and runs the routines for you. What you have to do is tell the system what is true and under what conditions; then the system takes on the burden of computation. With it, the model of your business domain is purely abstract and logical, and the representation you define is closer to your mind than to the machine. Ludwig is intended to be a tool that is easy to use by application programmers while at the same time being understandable by analysts.
I would frame Prolog as a programming style for tracing relations between facts and implications. Facts are assertions that we know to be true; they are information, data. Implications are the derived information that can be drawn from facts, either explicitly or, more interestingly, from other implications. What I find interesting from a programming experience perspective is that facts and implications are represented in the same way, lowering the learning curve compared to other approaches. This is the principle of homoiconicity, which is summarised by saying that the language treats “code as data”.
One of the most intriguing ideas in Prolog that I would be extremely happy to have fully supported in Ludwig is the concept of unification. It’s the feature that distinguishes a high-level language like Prolog from all others, and I will show you what it is with an example taken from SWI-Prolog.
`nth0` is an apparently not-so-cool predicate for some operations on lists. It looks like this:

nth0(I, L, E, R)

where the variable `I` is the index of the element `E` in the list `L`, and `R` is the remainder of `L` without `E`. If it sounds cryptic, it is actually extremely simple. I provide a list `[a,b,c]` and I ask the system: please, give me back `I`, `E`, `R`:

? nth0(I, [a,b,c], E, R)
As you might imagine, there are multiple answers: `E` could be `a`, `b` or `c`, right? And the answers will reflect this:

I = 0, E = a, R = [b, c]
I = 1, E = b, R = [a, c]
I = 2, E = c, R = [a, b]
Unification in Prolog allows us not just to use predicates in a single way, as in any other programming language, but to exploit them in any conceivable way. In the following case, if I give the original list and the remainder, what is the missing piece `E`?
?nth0(I, [a, b, c, d], E, [b, c, d]) => I = 0, E = a
If instead I give the remainder and the element, but not the original list, what could the originating list be?
? nth0(I, L, a, [b, c, d])
I = 0, L = [a, b, c, d]
I = 3, L = [b, c, d, a]
Intuitively, we have 2 solutions: in the first, the single element is prepended at the beginning of the remainder; in the second, it is appended to the end of the remainder. In both cases, the definition of the `nth0` predicate is satisfied. The system returns the missing slots that satisfy the logical contract formulated in the predicate. Differently from any other language, a predicate definition is sufficient for an entire set of functionalities.
Unification should not be seen as limited to implications: even independent facts can extend this principle to define entire classes of concepts. What distinguishes a vertical segment from any other in the space? Any segment whose two endpoints share the same x coordinate is vertical. I can express this without any explicit condition; I just translate the definition of being vertical into logical form, by posing the same variable in both edges:
("isVertical", ("point", x, _), ("point", x, _))
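To make the “same variable in both edges” trick concrete, here is a toy unifier in Python. It is only a sketch of the idea, not Ludwig’s algorithm: I represent variables as uppercase strings (Prolog-style, my own convention here) and `"_"` as the wildcard.

```python
# Toy structural unification: a pattern matches a value when constants
# agree and every occurrence of the same variable binds the same value.

def unify(pattern, value, bindings):
    if pattern == "_":                                   # wildcard
        return bindings
    if isinstance(pattern, str) and pattern[:1].isupper():  # a variable
        if pattern in bindings:
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}
    if isinstance(pattern, tuple) and isinstance(value, tuple) \
            and len(pattern) == len(value):
        for p, v in zip(pattern, value):
            bindings = unify(p, v, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == value else None        # constants

# The same variable X in both points forces equal x coordinates.
vertical = ("isVertical", ("point", "X", "_"), ("point", "X", "_"))
print(unify(vertical, ("isVertical", ("point", 2, 0), ("point", 2, 5)), {}))
print(unify(vertical, ("isVertical", ("point", 2, 0), ("point", 3, 5)), {}))
```

The first call succeeds with a binding for `X`; the second fails because `X` cannot be both 2 and 3, which is exactly the verticality constraint, expressed with no explicit comparison.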
While unification stands for unifying variables, like an AND operator, in Ludwig we also have the OR operation which, in logic programming lingo, is named resolution. Resolution is the programming feature with which the system performs pattern matching, predicate calls and recursion. In the factorial example:
object x extends VarInt // logical variable ranging over integers
object y extends VarInt
val model = ludwig(
  ("factorial", 0, 1),                                        // fact: 0! = 1
  ("factorial", x, x * y) :- (x > 0, ("factorial", x - 1, y)) // rule: x! = x * (x-1)!
)
We have one fact and one implication with the same signature: <"factorial", Int, Int>. Then we query:
model ? ("factorial", -1 to 3, x)
we get:
| Check | @3fd8v | x |
| --- | ----- | -- |
| ❌ | -1 | |
| ✅ | 0 | 1 |
| ✅ | 1 | 1 |
| ✅ | 2 | 2 |
| ✅ | 3 | 6 |
When the first argument is -1, none of the predicates matches, so the system simply tells us that the statement is false.
When the first argument is 0, only the static fact matches, and the corresponding paired argument is returned.
When the first argument is greater than 0, the second predicate matches and recursively invokes itself until, like any recursive function, it reaches the base case - the one defined by the fact.
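The evaluation traced above can be mimicked in plain Python, just to make the three cases tangible (this is my sketch of the behaviour, not how Ludwig executes; `None` stands for the ❌ rows):

```python
# Mirror of the Ludwig model: one base fact, one recursive rule,
# and "no predicate matches" for negative input.

def factorial(n):
    if n == 0:            # the fact: ("factorial", 0, 1)
        return 1
    if n > 0:             # the rule: x! = x * (x-1)!
        return n * factorial(n - 1)
    return None           # nothing matches: the statement is false

for n in range(-1, 4):
    print(n, factorial(n))
# -1 None / 0 1 / 1 1 / 2 2 / 3 6, matching the table above
```

The difference, of course, is that here the control flow is hand-written, while in Ludwig resolution derives it from the two declarative clauses.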
At the beginning of this article I compared Ludwig with SQL. I see it as a sophisticated language for both querying systems and programming at the same time, removing the hurdle of having dedicated languages – SQL/SPARQL on one end, Java/Python on the other – in your software application.