What is this exhilarating noise that comes out of my mouth when I talk? Surely it is not because that precise sequence of sounds, pops and squeezes is particularly melodic; rather, thanks to the palace of sophistication erected around language, we can talk and command a wide range of expressions. Ever since I began, somewhat erratically, to explore natural language processing, I have wondered how language comes so naturally to us while being extremely complicated from a computational perspective. What has caught my curiosity is the nature of language and the fundamental aspects that might have shaped the rudimentary ‘Me Tarzan, you Jane’, a sentence that captures the earliest and simplest level of language.
The difficulty of studying the evolution of language is that evidence of its early forms is sparse. Spoken languages don’t leave fossils. Moreover, all existing languages, including those of the most remote tribes, are already sophisticated. They have large vocabularies and refined grammatical structures, and they can express almost anything with remarkable richness of detail. Even the oldest written records collected so far, dating back some 5,000 years, look much the same as languages do now. Linguists have studied how languages change over time and, from that, inferred what language might have looked like when the first rudimentary steps toward it were taken. What are the basic principles of language without which the whole towering edifice would immediately collapse like a house of cards? I would like to introduce them through a simple composition, which is no example of eloquence but which nobody would find difficult to understand:
I supermarket enter basket bring pick fresh fruit
I go cashier pay cashier basket bring bag quit
As you may notice, there are no grammatical elements (prepositions, conjunctions, adverbs, plurals, tenses, relative clauses, complement clauses) to glue the sentences together, nor any abstract terms. Nonetheless, the proto-sentence remains comprehensible thanks to a handful of natural principles that arrange the words. These principles crystallized in our brains millions of years before our ancestors even conceived of language, and evolution wired them into our cortex in a way that facilitates communication. The first principle is that the earliest distinctions in language came from the concrete world: actions and things, and how to refer to them in space through pointing words. The second principle concerns the sequence of events, which, as one might imagine, affects the order of words. The third is about economy of communication: meanings and references are left to the context of the sentence.
Pointing words help us refer to or locate something in space. They are words such as this, that, here and there, and what they refer to depends on where the speakers are. What is “this” for me could be “that” for you, depending on the relative positions of object and subject. These words are compelling not only because children use them to accompany the pointing gesture, reinforcing the intimate link between the physical world and mental representation in developing brains. Unlike other grammatical terms, pointing words do not derive from anything but other pointing words: they are root, core concepts.
The sample text should help illustrate that early languages were restricted to simple words, those referring only to concrete entities in the here and now. The distinction between things and actions is also part of the social intelligence and world representation we share with other primates, so this conceptual distinction was already in place. Even metaphors, which account for a large share of the words in our dictionary, turn out to have concrete origins: they evolved from elements of the physical environment.
Order of words
Another basic principle of any language relies on a single strategy: the ordering of words. What belongs together in reality also appears together in language, and follows the same sequence. It is natural to place the word describing an action between its two participants. Putting the action between the actor and the patient (the one on whom the action is performed) is the ordinary mapping from reality to language. Consider, for example, Caesar’s principle: I came, I saw, I conquered (veni, vidi, vici). This saying is attributed to Julius Caesar after a victory. The order of words is clearly not accidental; it reflects the sequence of actions in the real world.
The third principle is concerned with repetition. What has already been stated, or is not particularly important, need not be repeated. Whatever can be understood or inferred from context may be omitted from the sentence. This follows the principle of least effort, which applies to language too. Had I written the story like this:
I supermarket enter I bring basket I pick fruit I quit
the redundant subject would be truly annoying, in any language. Languages have invented several ways of keeping track of the participants in a conversation; pronouns are one example.