Artificial Intelligence and LLMs: An Intro

One can assume that the initial round of reactions to LLMs, artificial intelligence and ever-expanding chatbots is over. Their emergence, the ability to process and generate language and to perform human tasks, was greeted with surprise tinged with shock. The anxieties and fears they triggered hovered chiefly around how they may reconfigure work and labour, and around the risks and harms of deploying AI. Those fears won't be allayed anytime soon. The promise, the lurking possibility of a nearby future of enhanced creativity and scaled-up productivity, has only sharpened them.

Project LLM

At its core, AI is an attempt by software to match (surpass?) a unique human ability: to learn, use and communicate with language. A decade back, tech enthusiasts would have considered modelling and scaling language in software an insurmountable, even impossible task. In terms of audacity and ambition, it is indeed a big project. Humans learn, use and communicate with language; they also build and navigate the world, and AI intends to replicate those skill-sets too. This post's focus, however, is on the linguistic aspect: the LLM.

The fears, anxieties and concerns also reflect what has already been delivered, and should primarily be seen in that light.

Beyond the secrecy, the lab training and the funds that fuelled the emergence and expansion of large language models, their deployment in real-world scenarios has grown and spread rapidly. LLMs have not followed the usual trajectory of selective, gradual beta releases, specs and leaks. Given this, the proverbial black box was invoked and questions were raised. How this ability was achieved is tied to what may come next, and that raises a distinct set of questions which stand on their own. Over this year and the last, more information has been made available about how LLMs work and what has been fed into their models.

Emergence of LLM and available info

Accounts of understanding LLMs and how they work are mostly indicative links to research, or serve as a glossary of the terms and processes that have gone into them. While more information and research is always welcome, a lot more needs to be clarified and explained in terms accessible to willing learners, readers and critics. Take, for example, the abbreviation NLP used in the article (first link, this paragraph) and in many other journal articles and technical conversations about LLMs. NLP is read both as natural language processing and as neural language processing. Neural functions and abilities are natural, not artificial, yet the two readings are not the same.

Reporting and sharing the results of experiments, and explaining how the research was conducted, are distinct activities for scientists. What assumptions underlie well-documented, demonstrated, replicable experiments, and how do scientists build them?

The areas of the brain that become active when language is processed or spoken, and how neurons trigger, signal and 'fire' other neurons, have been subjects of research for many years. Is the LLM built on these insights and this knowledge?

Take, for example, the recent report released by Google about deciphering language processing in the human brain through LLM representations. The report begins by saying that "theoretically, large language models (LLM) and symbolic psycholinguistic models of human language provide a fundamentally different computational framework for coding natural language", and finds that "The study revealed a remarkable alignment between the neural activity in the human brain's speech areas and the model's speech embeddings and between the neural activity in the brain’s language area and the model's language embeddings".

From the Google report, it is unclear whether the LLM follows the symbolic psycholinguistic model or not. Are the speech and language embeddings of the LLM distinct? And what is the large language model actually modelled on? A lot more clarity in reporting is the need of the hour, especially when the subject is the replication of natural linguistic abilities through artificial sequencing and pattern recognition.

One gets many impressions about LLMs, but the black-box tag persists. Is the artificial, sequential and replicable coding of the LLM an incidental discovery, an accidental assembly, or an intuitive mesh of insights? Or has it been put together by modelling available knowledge about how natural language works? The brain areas that are involved (engaged?) when a human processes or actively uses linguistic abilities, their biological basis, cortical structure and neural networks, have been the subject of research in many fields for many, many years. How the embeddings and representations of LLMs relate to natural language processing, or to neural language processing, needs more explication and clarification. And the reporting of research findings cannot be an exception.

LLMs also process and generate images. Do they follow the logic, the track and the processing of the 'brain areas' that go into the processing, cognition and generation of language?

Science, achievements and LLM

In its later part, this post attempts to place LLMs among other scientific achievements and the promises they harboured. It also attempts to explain the evolution and emergence of LLMs in relatively accessible terms, working with, and not around, the limitations of the author of this post.

Two decades back, it was widely reported that scientists (in the neuro and medical sciences) had identified happiness and sadness and offered us a list of the neurochemicals involved. It wasn't clear how the production of the desired chemical, or the elimination of the identified one, could be achieved. The DOSE research (dopamine, oxytocin, serotonin, endorphins) added that the production or elimination of neurochemicals depends on genetic factors. Genetic factors, not just the neurochemicals, can contribute 30 to 50% of one's happiness.

It turned out that neither the neurochemical nor the process leading to its presence was unique. Does a signal (sign) accompany a neurochemical, or hint that a process is underway? Or is it a trigger rather than a signal toward the neurochemical? Is the signal a vibration (or a set of vibrations) or an electric charge? How do we place LLMs among these questions?

The inner workings of an emerging field are best known to its insiders. Within academia, the space for esoteric conversation and critique is, in theory, never closed. But how LLMs have emerged and how they work needs to be explained to more people beyond the insiders. It is about people, and it impacts human beings immediately and directly. Additionally, LLMs depend on reception and feedback: they have learnt, intend to learn and will learn from human interactions and from how humans use language.

Going by scale, complexity and the synergies between fields, are large language models comparable to the Human Genome Project? Perhaps also in terms of architectural design: a computational framework of sequencing and pattern recognition used to map, measure and size the genetic structure, where the genetic material, the genes, carried millions to billions of variables and parameters.

How LLM works

How has the LLM scaled, sequenced and replicated human language? The numbers used here are illustrative. A language's dictionary may have 30,000 words, of which only about 18,000 are in use. For the computational ability of a computer, a word that is usually followed or preceded by only about 40 other words is a manageable data set, and a sentence of 40 words would be unusually long. Language has an order; its grammar lists the operating rules. A random combination of a language's words is probable in statistics but can be meaningless in that language. For example, stringing together the first word of four sentences of the first paragraph is statistically possible but means nothing. So how have the large language model builders filtered such combinations out?
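One way to picture the filtering is to count which word pairs actually occur in text: combinations that never appear in real usage simply receive no weight. Below is a minimal sketch, assuming a tiny invented corpus (every word and count here is illustrative, not taken from any real training set):

```python
from collections import Counter, defaultdict

# A toy corpus standing in for "how words are actually used" (purely illustrative).
corpus = (
    "the model learns language from text . "
    "the model predicts the next word . "
    "humans use language to communicate ."
).split()

# Count which word actually follows which (bigram counts).
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

print(follows["the"])                 # e.g. Counter({'model': 2, 'next': 1})
print(follows["language"]["model"])   # 0: never observed, so effectively filtered out
```

Pairs that never occur get a count of zero, which is how "statistically possible but meaningless" combinations drop out of the picture.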

By relying on how words are actually used in a language. Actual usage, how frequently combinations of words occur and how they are distributed, is the large data set that feeds large language models. A modern, advanced computer can compute billions of parameters and factor in millions of variables. Fed with the most-used combinations and bits of their semantic and syntactic structure, a computer model can learn, retrieve and respond in a language through pattern recognition, and this has been achieved. Co-occurrence of words and semantic projections structure the hierarchy that runs from pre-training to transformers and finally to the generation of language. The builders also have to strike a balance between the most-used words and rare words, and this too is a work in progress.
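A minimal sketch of that step, assuming invented counts for how often words follow "the" in some hypothetical corpus: the raw counts become a frequency distribution, and rare words are kept but given little weight rather than discarded.

```python
from collections import Counter

# Invented usage statistics: how often each word follows "the" (illustrative only).
counts_after_the = Counter({"model": 120, "language": 60, "word": 15, "zebra": 1})

# Relative frequency turns raw co-occurrence counts into a probability distribution,
# which is what a statistical language model learns from.
total = sum(counts_after_the.values())
probs = {word: count / total for word, count in counts_after_the.items()}

# "Prediction" is then just preferring the highest-probability continuation;
# rare words like "zebra" are not thrown away, only given a small probability.
print(probs)                        # {'model': 0.61..., 'language': 0.30..., ...}
print(max(probs, key=probs.get))    # 'model'
```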

Computer programs built on artificial neural networks (ANNs) simulate and mimic natural language processing. They do so with embeddings (a high-dimensional, dynamic space-map of a language) and representations (the locations and connections of words and sentences within it). The space-map acts as a field of embeddings and representations that mimics and simulates the dynamics of a language, and through them large language models comprehend, process, retrieve and generate language. BERT, GloVe, CNNs and SVD are among the methods and models computer programs use to process, retrieve and generate language, images and video. Word2Vec comes in CBOW and Skip-gram variants, with n-gram models as an older relative; the weights learned over n-grams and continuous bags of words guide, anchor and orient (vectors to tensors?) the trained transformer model so it can learn and respond in a language. From machine learning to deep learning is a deep dive in a loop, with ever new layers and areas.
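To make the "space-map" idea concrete, here is a minimal sketch of one crude way to obtain embeddings: applying SVD to a toy co-occurrence table and comparing the resulting vectors with cosine similarity. The words and counts are invented; Word2Vec, GloVe and transformer models learn their embeddings differently, but the end product is the same kind of object, a vector per word whose direction encodes usage.

```python
import numpy as np

# Toy co-occurrence counts (made up for illustration): rows are target words,
# columns are context words they appear near in a hypothetical corpus.
targets  = ["king", "queen", "car", "truck"]
contexts = ["crown", "palace", "road", "engine"]
cooc = np.array([
    [8, 6, 0, 0],   # king
    [7, 7, 0, 0],   # queen
    [0, 0, 9, 2],   # car
    [0, 0, 8, 3],   # truck
], dtype=float)

# SVD compresses the counts into dense vectors: a simple way to produce "embeddings".
U, S, _ = np.linalg.svd(cooc)
embeddings = U[:, :2] * S[:2]        # a 2-dimensional vector per target word

def cosine(a, b):
    """Cosine similarity: how closely two embedding vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words used in similar contexts end up with similar vectors.
print(cosine(embeddings[0], embeddings[1]))   # king vs queen: close to 1
print(cosine(embeddings[0], embeddings[2]))   # king vs car: close to 0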

AI adds generation to automation. It learns by simulating and mimicking natural language processing, and it retrieves and generates language. However, artificial intelligence is not working with biological or genetic material; LLMs work with artificial neural networks and computer applications.

For a longer, detailed presentation of the arduous journey from the mathematical random probabilities of words and their order in a language, to tapping neural networks to train models through embeddings, read Stephen Wolfram.

Issues and path forward?

The development and emergence of LLMs leans on and benefits from advances in various fields, and acknowledgement of this is deficient. The surreptitious, invasive and extra-legal acquisition of existing knowledge and expertise to feed and market a model is also an issue. One must remind readers, again, that the most celebrated language theorist, Noam Chomsky, dubbed ChatGPT 'basically high-tech plagiarism' and 'a way of avoiding learning'. For Chomsky, 'the predictions of machine learning systems will always be superficial and dubious', because the model is based on pattern recognition over the statistical probabilities of words occurring together. Through the softmax function, a normalised distribution over the words of a language is obtained.
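For readers who want to see what "normalised distribution" means here, a minimal sketch of the softmax function follows; the candidate words and raw scores are invented for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability, then normalise so the
    # scores sum to 1 and can be read as probabilities over the vocabulary.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Hypothetical raw scores (logits) a model assigns to four candidate next words.
vocab = ["model", "language", "word", "zebra"]
logits = np.array([3.1, 2.0, 0.5, -1.0])

probs = softmax(logits)
print(dict(zip(vocab, probs.round(3))))   # a normalised distribution summing to 1
```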

Modal, according to the Merriam-Webster dictionary, means 'of or relating to modality in logic', 'containing provisions as to the mode of procedure or the manner of taking effect', 'of or relating to structure as opposed to substance', 'of, relating to, or constituting a grammatical form or category characteristically indicating predication of an action or state in some manner other than as a simple fact', and 'of or relating to a statistical mode'. If the discussion preceding this paragraph is correct, is the M in LLM closer to 'modal' than to 'model'?

Large language models draw from linguistics and the neurosciences and rely on mathematics. The emerging field, though, lies at the intersection and overlap of computer science, electronics and physics.

The 'artificial' component of intelligence, achieved through the replicable sequencing of natural language and the jump from prediction through transformers to generation, has been delivered. The model can generate language on prompt, in a conversational tone designed around a Q&A format. It depends on billions of parameters and vast data sets, we are told. Is this the end of the story, or a hint that the work is still in progress?
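The prompt-to-generation loop itself is simple to sketch: the model repeatedly predicts a distribution over next words and samples from it. Below is a toy version in which a hand-written lookup table stands in for the billions of parameters of a real model; every word and probability is invented.

```python
import random

# A toy next-word table standing in for a trained model's predictions.
next_word_probs = {
    "the":       {"model": 0.6, "prompt": 0.4},
    "model":     {"generates": 0.7, "answers": 0.3},
    "generates": {"language": 1.0},
    "answers":   {"questions": 1.0},
}

def generate(prompt, steps=3):
    """Generate text one word at a time, each choice conditioned on the last word."""
    words = prompt.split()
    for _ in range(steps):
        options = next_word_probs.get(words[-1])
        if not options:          # no known continuation: stop generating
            break
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))   # e.g. "the model generates language"
```

A real LLM conditions on the whole preceding context rather than the last word alone, but the sample-and-append loop is the same shape.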

Note: Anthropic's Claude, ChatGPT, DeepSeek, Gemini, Mistral's Le Chat and Hugging Face, with their user interfaces, are consumer products of LLMs. Familiarity with them, without overly relying on expertise or domain knowledge, is a must. I seek suggestions, feedback, comments and criticism from readers, and I thank you in advance.

# In the world of the internet, this blog post will be found through how AI processes it: its form, organisation, presentation, style and tone.
