One misconception I come across fairly often is people treating a model’s parameters (or weights) as long-term memory and the context they provide as short-term memory. It’s an understandable but flawed analogy, and understanding the very different roles parameters and context play matters for every Large Language Model user.

The Flawed Memory Analogy

Outside of specific training operations (pretraining and fine-tuning), a model’s parameters are static and read-only. They are learned statistical relationships in language, represented as numerical values, and are used to generate (predict) new text from user input and probability. At inference they do not function as a writable, queryable long-term memory.

That user input, or prompt (context), is a small, temporary window of text. It is also converted to tokens prior to processing, but with a major difference: unlike parameters, which encode language patterns rather than memorized text, the model has full, verbatim access to the user’s text within its context window.
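The verbatim property can be sketched with a toy character-level tokenizer (purely illustrative; real LLMs use subword tokenizers like BPE or SentencePiece, but the round-trip property shown here still holds):

```python
# Toy character-level tokenizer: each character becomes a "token id."
# Real tokenizers use subword vocabularies, but the key property is
# the same: the context round-trips losslessly.

def encode(text: str) -> list[int]:
    """Map each character to its code point (stand-in for token ids)."""
    return [ord(ch) for ch in text]

def decode(token_ids: list[int]) -> str:
    """Invert the mapping: token ids back to the exact original text."""
    return "".join(chr(t) for t in token_ids)

prompt = "These are the times that try men's souls."
tokens = encode(prompt)

# The model sees the user's text verbatim within its context window:
assert decode(tokens) == prompt
```

Nothing analogous is possible with parameters: there is no function that recovers a training document from the weights.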

The Four Components of an LLM at Inference

Let’s look at the four primary components of a Large Language Model at inference:

Parameters (primarily “weights”)

  • Static/read-only during use (inference)
  • Only updated during some form of training, such as pretraining and fine-tuning
  • Encode generalized language patterns, not a verbatim database
  • Typically, billions or even trillions of parameters
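A minimal sketch of the read-only property, using a hypothetical toy model (not any real library’s API). The weights are fixed at construction, and inference only reads them:

```python
# Hypothetical toy model: parameters are fixed at load time and
# inference never writes to them. Illustrative only.

class ToyModel:
    def __init__(self, weights: tuple[float, ...]):
        # Stored as an immutable tuple: there is no write path at inference.
        self._weights = weights

    @property
    def num_parameters(self) -> int:
        return len(self._weights)

    def infer(self, inputs: list[float]) -> float:
        # Read-only use of the weights; nothing is written back.
        return sum(w * x for w, x in zip(self._weights, inputs))

model = ToyModel((0.5, -1.0, 2.0))
out = model.infer([1.0, 2.0, 3.0])  # uses the weights, never mutates them
```

A real model works the same way at inference, just with billions of parameters instead of three: every generation reads the same frozen weights.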

Context (User Input)

  • Alterable by user/application
  • Could be user input (prompt), past conversation history (from application), and external data (from agents, Retrieval Augmented Generation, etc.)
  • Accessible by model unaltered and in its entirety (as tokenized text)
  • Relatively small windows (modern LLMs typically between 8K–128K tokens, with some reaching 1M+)
  • Temporary – typically discarded by model server immediately after generation (unless retained by the application layer)
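Since the model itself retains nothing between calls, it is the application layer that maintains history and trims it to the window. A rough sketch (token counting here is a naive word split; real applications count with the model’s own tokenizer):

```python
# Sketch of an application layer maintaining conversation history and
# trimming it to a fixed context window. Naive whitespace "tokens"
# stand in for real tokenizer output.

CONTEXT_LIMIT = 8  # tokens; real limits are 8K-128K+, some 1M+

def build_context(history: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep only the most recent tokens that fit in the window."""
    tokens = " ".join(history).split()
    return tokens[-limit:]  # oldest tokens fall out of the window

history = ["hello there", "how are you today", "tell me about Paine"]
window = build_context(history)  # oldest words are dropped first
```

This is why long conversations “forget” their beginnings: nothing is erased from the model; the oldest material simply no longer fits in the window the application sends.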

Activations

  • Calculated relationships between parameters and context used to generate new output
  • “Working memory” for a single inference
  • Temporary – computation state that exists only during the forward pass and is discarded by the model server once generation completes
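The lifetime of activations can be sketched in a few lines: they are intermediate values computed from parameters and context inside a forward pass, held in local variables, and freed when the function returns (toy arithmetic, no real network):

```python
# Sketch: "activations" are intermediate values derived from
# parameters x context during a forward pass. They live in local
# variables and are discarded when the function returns.

def forward(weights: list[float], context: list[float]) -> float:
    # Intermediate products: the toy stand-in for activations.
    activations = [w * x for w, x in zip(weights, context)]
    output = sum(activations)
    return output  # activations go out of scope here and are freed

result = forward([0.1, 0.2], [10.0, 5.0])
```

Nothing computed inside `forward` survives the call, which is exactly why activations cannot serve as memory between requests.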

Text Generation (Model Output)

  • Generated by the model one token at a time, left to right, with each new token conditioned on everything before it (autoregressive decoding)
  • Based on parameters (learned language patterns), user/application provided context, and probability
  • Temporary – discarded by model server immediately after generation, but running conversation may be maintained by application layer
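The token-at-a-time loop can be sketched with a hypothetical next-token lookup table standing in for the network (real models produce a probability distribution at each step; a plain dictionary keeps the sketch runnable):

```python
# Sketch of autoregressive generation: one token at a time, each step
# conditioned on what has been generated so far. The "model" is a toy
# bigram lookup table, not a real network.

NEXT_TOKEN = {
    "these": "are",
    "are": "the",
    "the": "times",
    "times": "<eos>",  # end-of-sequence marker
}

def generate(prompt_token: str, max_tokens: int = 10) -> list[str]:
    sequence = [prompt_token]
    for _ in range(max_tokens):
        nxt = NEXT_TOKEN.get(sequence[-1], "<eos>")
        if nxt == "<eos>":
            break
        sequence.append(nxt)  # the output feeds back in as context
    return sequence

print(generate("these"))  # ['these', 'are', 'the', 'times']
```

Note how each generated token is appended to the sequence and becomes part of the conditioning for the next step; the model never plans the whole output in advance.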

Tool vs. Material in Action

If we ask most popular modern LLMs to recreate the first six paragraphs of Thomas Paine’s The Crisis (in the Public Domain), they will most likely do a fair job, reproducing perhaps 90% of the wording and even more of the intended meaning. This is because most large models saw many copies of this Public Domain text during training and can recreate much of it from learned language patterns, sometimes aided by statistical memorization (a form of overfitting to specific data).

However, that text is produced by pattern prediction, not retrieval of stored verbatim text, so the reconstruction will not be completely accurate. If we instead provide the model with those same six paragraphs, it has visibility of the entire text verbatim as tokenized input.

We can now have the LLM perform operations on this text with far less chance of error. For example, we could ask it to rewrite the text in all capital letters. This is called an in-context operation: we provide the model with the data to work on along with the instructions.
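An in-context operation is ultimately just prompt assembly: the application packages the material and the instruction into a single prompt so the model works on exact text rather than reconstructing it from learned patterns. A minimal sketch (the delimiter format is an illustrative choice, not a standard):

```python
# Sketch: an in-context operation bundles the material (data) and the
# instruction into one prompt. Delimiters are an arbitrary convention.

def build_prompt(instruction: str, material: str) -> str:
    return f"{instruction}\n\n---\n{material}\n---"

material = "These are the times that try men's souls."
prompt = build_prompt(
    "Rewrite the following text in all capital letters.", material
)

# The model receives the material verbatim inside the prompt:
assert material in prompt
```

Because the source text is present verbatim in the window, the model transforms what it was given instead of guessing at what it once saw during training.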

Conclusion

In short, it’s more accurate to see an LLM and its parameters as a tool, and the context (data) provided by the user and application layers as material used by the tool to generate something new. (Researchers sometimes use “memory” metaphorically for weights or context, but this post uses the term strictly for writable, persistent state.)

Ultimately, the model’s parameters define its core capabilities as a tool for processing language, while the context provided serves as the material it uses for a given task.


Dave Ziegler

I’m a full-stack AI/LLM practitioner and solutions architect with 30+ years enterprise IT, application development, consulting, and technical communication experience.

While I currently engage in LLM consulting, application development, integration, local deployments, and technical training, my focus is on AI safety, ethics, education, and industry transparency.

Open to opportunities in technical education, system design consultation, practical deployment guidance, model evaluation, red teaming/adversarial prompting, and technical communication.

My passion is bridging the gap between theory and practice by making complex systems comprehensible and actionable.

Founding Member, AI Mental Health Collective

Community Moderator / SME, The Human Line Project

Let’s connect

Discord: AightBits