The secret chickens that run LLMs
Humans often organize large, skilled groups to undertake complex projects and then, bizarrely, place incompetent people in charge. Large language models (LLMs) such as OpenAI's GPT-4, Anthropic's Claude, and Google's Gemini carry on this proud tradition, which brings me to my new favorite metaphor for who has the final say over the text they generate: a chicken.
There is now a sequel to this article, Secret LLM chickens II: Tuning the chicken, if you'd like to learn how and why the "chicken" can be customized.
Some key points I'll address here are:
- Modern LLMs are huge and incredibly sophisticated. However, for every word they generate, they have to hand their predictions over to a simple, random function to pick the actual word.
- This is because neural networks are deterministic, and without the inclusion of randomness, they would always produce the same output for any given prompt.
- These random functions are no smarter than a chicken pecking at differently sized piles of feed to choose the next word (see the sketch after this list).
- Without these "stochastic chickens," large language models wouldn't work: their output would be repetitive, uncreative, and often contextually inappropriate.
- It's nearly impossible to prove the originality or source of any specific piece of text generated by these models.
- The reliance on these "chickens" for text generation illustrates a fundamental difference between artificial intelligence and human cognition.
- Whether an LLM counts as deterministic or stochastic depends on your point of view.
- The "stochastic chicken" isn't the same as the paradigm of the "stochastic parrot."