Giancarlo Frison Discontinuities

First steps in Reinforcement Learning

Reinforcement learning covers a family of algorithms whose purpose is to maximize the cumulative reward an agent can obtain from an environment. It is like training crows to collect cigarette butts in exchange for peanuts or, to paraphrase an old saying, the carrot-and-stick approach applied to cold algorithms instead of living donkeys.

The agent and the environment are not emphasized in vain: concretely, they may be a vacuum cleaner sweeping your flat, an A/B testing engine for e-commerce, or a driverless car at a crossroads. If you have followed the latest advances in the field, you will have come across DeepMind’s AlphaZero, which learned chess from scratch and, reportedly after just 4 hours of training, became the best chess player in the world.

“Difficulties strengthen the mind, as labor does the body.” ― Lucius Annaeus Seneca

Researchers do not lack challenges in this field. The most important one stems from the intrinsic nature of the RL learning process, which relies solely on the evaluation of the agent’s own actions. Improvements are driven by just one signal: the reward.

Rewards are often very sparse.

They may come only after hundreds or thousands of steps, which exponentially increases the number of action combinations the agent must explore to find a sequence that is barely better than the others. If that does not seem arduous enough, consider also the non-stationary nature of some environments, particularly common in dynamic scenarios, where the learning phase resembles pursuing a moving target.

Non-stationary environments

Non-stationary means that the return of an action, performed under exactly the same conditions as in a past experience, might differ from what is expected. This is particularly intuitive in multi-agent scenarios (MARL), in which the agent plays with one or more other agents that are learning too.
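To make the moving-target idea concrete, here is a minimal sketch, assuming a single hypothetical action whose payout changes halfway through the experience. It compares a plain average of all past rewards with a constant step-size update that weighs recent rewards more heavily. The reward sequence and the step size of 0.1 are illustrative assumptions, not values from any real environment.

```python
def sample_average(rewards):
    """Estimate an action's value as the plain mean of all rewards seen."""
    estimate = 0.0
    for n, r in enumerate(rewards, start=1):
        estimate += (r - estimate) / n
    return estimate


def recency_weighted(rewards, step_size=0.1):
    """Estimate with a constant step size, so recent rewards weigh more."""
    estimate = 0.0
    for r in rewards:
        estimate += step_size * (r - estimate)
    return estimate


# Hypothetical non-stationary action: it pays 1.0 for 100 steps, then 0.0.
rewards = [1.0] * 100 + [0.0] * 100

stale = sample_average(rewards)    # still remembers the old regime
fresh = recency_weighted(rewards)  # has mostly forgotten it
```

The plain average stays anchored to rewards that no longer reflect the environment, while the recency-weighted estimate tracks the change. That is the reason constant step sizes are a common choice when the target keeps moving.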

What distinguishes RL from other optimization methods

RL helps create agents that can make decisions autonomously. Other algorithms attain this goal too, but on different working principles.

Supervised learning is easy to distinguish because it is trained on correct samples rather than vague rewards, which makes it simpler for the loss function to converge to a useful solution.

Mathematical optimization differs from RL in a more subtle way. Like RL, the simplex algorithm finds solutions by iterating over optimization steps, but it works only on perfect-information problems. To see what that means, consider the knapsack or the traveling salesman problem.

All the information needed to work out the optimal solution is readily available.

In other words, there is no exploration. Conversely, an RL agent is like a probe on a celestial body, where assumptions about the environment are nearly absent. The agent must figure out autonomously which actions are good and which are bad, relying only on feedback from the environment: the so-called model-free learning approach.
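A minimal sketch of that idea, assuming a toy environment with three actions and hidden payout probabilities, could look like the following. The agent never reads those probabilities. It only observes the reward that follows each action, and it balances exploration and exploitation with an epsilon-greedy rule. The function name, the probabilities and the parameters are all hypothetical choices for illustration.

```python
import random


def run_epsilon_greedy(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values purely from sampled rewards (no environment model)."""
    rng = random.Random(seed)
    values = [0.0] * len(payout_probs)  # estimated value of each action
    counts = [0] * len(payout_probs)    # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(values))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(values)), key=values.__getitem__)
        # The environment: the agent sees only this reward, never payout_probs.
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


values, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
```

With enough steps the estimates approach the hidden probabilities, and the agent ends up preferring the most rewarding action without ever having been told which one it is.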

Genetic Programming (GP) is an evolutionary optimization method that shares most of the characteristics of RL: it is iterative and, thanks to its stochastic, explorative nature, suitable for imperfect-information systems. Even the terminology is related. For example, what is elsewhere called the objective function is named the fitness function in GP; the two terms are simply synonyms. What differentiates it from RL is its evolutionary method, which I briefly explained in this post.

One More Star

“If I should be the first to die,
do not let grief darken your sky.
Be brave and modest in your mourning.
It is a change, not a farewell.
Thus the dead live on in the living,
and all those little things
gathered along the journey of your life,
those simple and humble words
dictated by the heart of a suffering mother,
are a treasure to be guarded jealously.
In your heart.”

Mirella Bobbo

You faced death time and time again. You were wounded in body, condemned to a bed, to a wheelchair and to loneliness. With humble words and with your stubbornness, you taught me never to give up.

“My suffering is silent within,
within the soul.
Outside, instead, it speaks,
it speaks with the smile of friendship,
and smells of humble plants and wildflowers.
My suffering does not seek pity
but strength of spirit.
And this strength is given to me by people who suffer.
You feel that pain, when shared,
little by little goes away,
leaving infinite joy and happiness.”

Mirella Bobbo

Inexorable was the wearing of time, against which no one is allowed to compete. Little by little it took away your strength, but it never extinguished the flame you had within you. You were a lioness, of sweetness and sensitivity.

“Lord, give me enough tears
to keep me human,
smiles to keep me optimistic,
give me defeats to keep me humble,
enough success to keep me confident,
friends to give me courage,
memories to give me comfort,
enough patience to sustain me in waiting,
hope to accompany me in uncertainty.
Help me discover your messages
in the reality I live,
a sad reality.
Please, never make me so demanding
as to expect what I would want,
but allow me to thank you for what you will wish to give me.”

Mirella Bobbo

You used to tell me about the time when you narrowly, very narrowly, escaped a rest without return, thanks to the doctors’ prompt resuscitation. You remembered the moment of peace, the relief from the excruciating pain in your head. For a moment you believed you were free and serene. Fortunately, instead, you gave us much more time than life wanted to give you.

It almost seemed you were about to wake up at any moment when I saw you again last night. That moment of peace and quiet has come, and last Sunday you dozed off and left this world, in peace.

“If I wake as a blade of grass I will be happy
because the whole sky will be mine.
If I wake as a breath of wind I will be happy
because you will feel my breathing.
If I wake as rain I will be happy
because I will quench the earth’s thirst and the wheat will grow proud
for each day’s bread.
If I wake in a great garden
covered with fragrant flowers, where children run happily,
I will be happy
because I will finally be in paradise.”

Mirella Bobbo

I miss your joy at seeing me again. Your smile will always be with us.

Wherever you are, I love you, mum.

Program Induction and Synthesis at ICML 2018

The International Conference on Machine Learning (ICML) took place this year in Europe, in the beautiful city of Stockholm, from July 10 to 15. It is one of the two premier conferences on Artificial Intelligence research (together with NIPS), and the numbers indicate the magnitude of the event: 612 accepted papers out of 2473 submissions, plus 9 tutorials and 67 workshop sessions on the latest advances in all disciplines of machine learning. One of the most intriguing workshops was about machine intelligence capable of writing software code for complex procedural behavior.

see full article

First Steps on Evolutionary Systems

Image: Nick Youngson, CC BY-SA 3.0, Alpha Stock Images

Goal programming attempts to find solutions that satisfy a set of goals or, failing that, violate them minimally. It has been applied in innumerable domains such as engineering, finance and resource allocation. Solutions may include optimal strategies to maximize, for example, the profit of a sale or, conversely, to keep the cost of a purchase below an acceptable threshold.

An optimized plan can be expressed as a program, defined as an abstract syntax tree (AST).
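As a rough illustration only, such a tree might be encoded as nested tuples and evaluated recursively. The tuple encoding, the operator set and the variable names (`price`, `quantity`, `fixed_cost`) are assumptions made for this sketch; the full article may represent programs differently.

```python
import operator

# Hypothetical operator set; the original article may use a different one.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, variables):
    """Recursively evaluate an AST given as nested tuples (op, left, right)."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPERATORS[op](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]  # leaf: a named variable
    return node                 # leaf: a numeric constant


# profit = price * quantity - fixed_cost
plan = ("-", ("*", "price", "quantity"), "fixed_cost")
profit = evaluate(plan, {"price": 4, "quantity": 10, "fixed_cost": 15})
```

Because the plan is plain data, an evolutionary system can mutate or recombine subtrees and then score each candidate by evaluating it against the goals.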

see full article

The Basic Principles of Language

What is this exhilarating noise that comes out of my mouth when I talk? Surely it is not that this precise sequence of sounds, pops and squeezes is particularly melodic; rather, thanks to the palace of sophistication erected around language, we can talk and afford a wide range of expressions. Since I began, erratically, to explore natural language processing, I have been wondering how language comes so naturally to us while being extremely complicated from a computational perspective. What caught my curiosity is the nature of language and the fundamental aspects that might have shaped the rudimentary ‘Me Tarzan, you Jane’, the sentence that epitomizes the earliest and simplest level of language.

The difficulty of studying the evolution of language is that evidence of its early forms is sparse. Spoken languages don’t leave fossils. Moreover, all existing languages, including those of the most remote tribes, are already sophisticated. Contemporary languages have large vocabularies and refined grammatical structures, and can express almost everything with a remarkable richness of detail. Even in the written records collected so far, dating back 5,000 years or so, things look almost the same as they do now. Linguists have studied how communication changes over time and inferred what it might have looked like when the first rudimentary steps toward language were taken. What are the basic principles of language such that, if they were taken away, the whole towering edifice would immediately collapse like a house of cards? I will introduce them with a simple composition, which could not be taken as an example of eloquence, but which nobody would find difficult to understand:

I supermarket enter      basket bring      pick fresh fruit

I go cashier      pay cashier basket      bring bag      quit

As you may have noticed, there are no grammatical elements (prepositions, conjunctions, adverbs, plurals, tenses, relative clauses, complement clauses) to glue and hold the sentences together, nor any abstract terms. Nonetheless, the proto-sentence remains comprehensible thanks to a very few natural principles that arrange those words together. Those principles crystallized in our brains millions of years before language was even conceived by our ancestors. Evolution wired them into our cortex to facilitate communication. The first lines of distinction in early languages came from the concrete world: actions and things, and how to refer to them in space, the pointing words. The second principle concerns the sequentiality of events and, as one can correctly imagine, it affects the ordering of words. The third is about the economy of communication, achieved by contextualizing meanings and references in the sentence.

Pointing words

Pointing words help to refer to or locate something in space. They are this, that, here and there, and their reference depends on where the actors are. What is this for me could be that for you, due to the relative position of object and subject. These referencing words are compelling not least because children use them as an accompaniment to the pointing gesture, reinforcing the intimate link between the physical world and mental representation in immature brains. Pointing words, unlike other grammatical terms, do not originate from anything other than pointing words. They are root, core concepts.

Things, actions

The sample text should help to show that early languages were restricted to simple words, the ones involving only concrete entities in the here and now. The distinction between things and actions is also part of social intelligence and of the world representation that we share with other primates, so this conceptual distinction was already there. Even metaphors, which account for a large share of the words in our dictionary, turn out to have concrete origins: they evolved from elements of the physical environment.

Order of words

Another basic principle of any language relies on a single strategy: the ordering of words. What belongs together in reality also appears close together in language and follows the same sequence. It is natural to describe an action as the central word between two participants. Between the actor and the patient (on whom the action is performed), the order is the ordinary mapping from reality to language. Consider, for example, Caesar’s Principle: I came, I saw, I conquered (veni vidi vici). This saying is attributed to Julius Caesar after a victory. The order of words is clearly not accidental: it reflects the sequence of actions in the real world.


The third principle is concerned with repetition. What has already been stated, or is not particularly important, does not need to be repeated. What can be understood and inferred from the context may be omitted from the sentence. This follows the principle of least effort, which also applies to language. Had I written the story like this:

I supermarket enter      I bring basket      I pick fruit      I quit

the redundancy of the subject would be truly annoying, in any language. Several ways have been invented to keep track of the participants in a conversation; pronouns are one example.