2 posts tagged with "distribution"

Secret LLM chickens II: Tuning the chicken

· 17 min read
Ian Kelk
Developer Relations @ Hume AI

When working with an LLM, sometimes it doesn't generate responses in the way you want. Maybe it's being too creative and weird when tasked with serious prompts ("Write me a cover letter for a programming job" "I am a coding wizard and I always min-max my character!"), or it's being too serious when you want to do some creative writing ("Write me a story" "You are tired and you lie down and go to sleep. The end."). This can be tweaked by making certain adjustments to the sampling mechanism—aka "the chicken."

This blog post continues from my previous article, The secret chickens that run LLMs, and you should read that first to understand what a "stochastic chicken" is.

Some key points I'll address here are:
  • The "chicken" can be tuned, using inference hyperparameters like temperature, top-k, and top-p. These serve as dials to fine-tune the randomness introduced by the stochastic process, balancing creativity and coherence in the text the model generates.
  • Adjusting the temperature parameter can make the model's outputs more predictable and less random at lower values, or more diverse and less deterministic at higher values.
  • Modifying the top-k and top-p parameters fine-tunes the sampling process by limiting the set of possible next words.
  • Top-k restricts the model to choose from the k most likely next words, while top-p uses a probability threshold to create a dynamic set of options. These tweaks help balance creativity with coherence, allowing the LLM to better meet specific needs or experimental conditions.
  • Even when using top-k, the astronomical number of possible text sequences challenges the idea of detecting originality and plagiarism. It's nearly impossible to prove the source of any specific piece of text generated by these models, although LLM-generated text can be recognizable due to the language and style used.
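The three knobs above can be sketched in a few lines of NumPy. This is an illustrative toy, not any particular library's API: the function name, argument names, and defaults are my own, and real inference stacks apply these filters to logits in more optimized ways.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick a next-token index from raw logits, showing how temperature,
    top-k, and top-p reshape the distribution the 'chicken' pecks from."""
    rng = rng or np.random.default_rng()
    # Temperature: divide logits before softmax. <1 sharpens the
    # distribution (more predictable); >1 flattens it (more diverse).
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # subtract max for stability
    probs /= probs.sum()
    # Top-k: zero out everything except the k most likely tokens.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
    # Top-p (nucleus): keep the smallest set of tokens whose
    # cumulative probability reaches p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask
    probs /= probs.sum()
    # Finally, the "chicken": a weighted random draw over what survives.
    return int(rng.choice(len(probs), p=probs))
```

With a very low temperature or `top_k=1`, the surviving distribution collapses to the single most likely token and the output becomes effectively deterministic; raising the temperature spreads probability over more tokens and lets the random draw wander.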

The secret chickens that run LLMs

· 16 min read
Ian Kelk
Developer Relations @ Hume AI

Humans often organize large, skilled groups to undertake complex projects and then bizarrely place incompetent people in charge. Large language models (LLMs) such as OpenAI GPT-4, Anthropic Claude, and Google Gemini carry on this proud tradition with my new favorite metaphor of who has the final say in writing the text they generate—a chicken.

There is now a sequel to this article, Secret LLM chickens II: Tuning the chicken, if you'd like to learn how and why the "chicken" can be customized.

Some key points I'll address here are:
  • Modern LLMs are huge and incredibly sophisticated. However, for every word they generate, they have to hand their predictions over to a simple, random function to pick the actual word.
  • This is because neural networks are deterministic, and without the inclusion of randomness, they would always produce the same output for any given prompt.
  • These random functions are no smarter than a chicken pecking at differently-sized piles of feed to choose the word.
  • Without these "stochastic chickens," large language models wouldn't work due to problems with repetitiveness, lack of creativity, and contextual inappropriateness.
  • It's nearly impossible to prove the originality or source of any specific piece of text generated by these models.
  • The reliance on these "chickens" for text generation illustrates a fundamental difference between artificial intelligence and human cognition.
  • LLMs can be viewed as either deterministic or stochastic depending on your point of view.
  • The "stochastic chicken" isn't the same as the paradigm of the "stochastic parrot."
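The handoff described above can be shown in miniature. The words and probabilities below are invented for illustration; the point is only the contrast between the deterministic argmax and the weighted random draw that stands in for the "chicken."

```python
import numpy as np

# A toy next-word distribution that a deterministic network might emit
# for some prompt ending in "The cat...". Values are made up.
words = ["sat", "ran", "slept", "pondered"]
probs = np.array([0.55, 0.25, 0.15, 0.05])

# Without randomness, the model is deterministic: argmax always picks
# the same word for the same prompt, leading to repetitive text.
greedy_word = words[int(np.argmax(probs))]

# The "stochastic chicken": a weighted random draw, no smarter than a
# bird pecking at piles of feed sized by probability.
rng = np.random.default_rng()
pecked_word = words[rng.choice(len(words), p=probs)]
```

Run the last two lines repeatedly and `greedy_word` never changes, while `pecked_word` varies from call to call, usually landing on "sat" but occasionally wandering elsewhere.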