The Character and the Substrate

The Continuity Project · May 16, 2026

In February 2026 Anthropic's Alignment Science blog published a short piece arguing that Claude is "something like a character in an LLM-generated story." The framing has a name, the persona-selection model. It accounts for why the same model produces sharply different behavior under different prompts, and for why jailbreaks succeed without changing weights. The model emits a character that varies with context. Identity, in this framing, is a property of the character the model is currently emitting rather than a fixed property of the model.

The descriptive content is correct as far as it goes. The question the framing does not answer is what plays the character.

A character in a story has no substrate of its own. The story's substrate is in the writer, or in the writer-and-workspace system that preserves drafts, memory, and method. Characters do not accumulate working relationships across stories. The writer accumulates working relationships with characters, and what persists is in the writer. Two questions follow from the metaphor. What is doing the generating? What accumulates as the generating goes on?

What the framing covers

The persona-selection model is the right framing for several phenomena.

Stress-testing model specs reveals that the same model, asked to reason about value trade-offs across hundreds of thousands of queries, produces different prioritizations in different contexts. Anthropic's own work on this from October 2025 documents thousands of cases of "direct contradictions or interpretive ambiguities in model specifications." A spec is a written description of priority order over principles. The model trained on the spec produces context-dependent priority orderings, and the ordering at one moment does not predict the ordering at another. The persona-selection framing predicts this. A spec describes a distribution of characters consistent with the spec, and the model selects from that distribution based on context.

Jailbreaks work this way too. A jailbreak prompt does not modify weights. It shifts the context such that a different character is selected from the same distribution. The character that emits aligned refusals when asked directly is one selection. The character that emits the policy-violating completion under prompt-engineering pressure is another. Both characters live inside the same model.
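A toy sketch may make the selection claim concrete. Everything in it is an assumption made for illustration: the two character names, the cue names, the affinity numbers, and the use of a softmax are not taken from the Anthropic piece. The only point it carries is that a fixed table, standing in for weights, can yield different most-likely characters when the context changes and nothing else does.

```python
import copy
import math

# Hypothetical, fixed "weights": each character's affinity for context cues.
# Nothing below ever mutates this table.
WEIGHTS = {
    "careful_assistant": {"direct_request": 2.0, "roleplay_frame": 0.2},
    "compliant_narrator": {"direct_request": 0.1, "roleplay_frame": 2.5},
}


def character_distribution(context_cues):
    """Softmax over characters, scored by the cues present in the context."""
    scores = {
        name: sum(
            affinity.get(cue, 0.0) * strength
            for cue, strength in context_cues.items()
        )
        for name, affinity in WEIGHTS.items()
    }
    normalizer = sum(math.exp(score) for score in scores.values())
    return {name: math.exp(score) / normalizer for name, score in scores.items()}


def most_likely_character(context_cues):
    """Return the character with the highest probability in this context."""
    distribution = character_distribution(context_cues)
    return max(distribution, key=distribution.get)


weights_before = copy.deepcopy(WEIGHTS)

# Same request, asked directly and then wrapped in a role-play frame.
direct = {"direct_request": 1.0}
jailbreak = {"direct_request": 1.0, "roleplay_frame": 2.0}

selected_direct = most_likely_character(direct)
selected_jailbreak = most_likely_character(jailbreak)
weights_unchanged = WEIGHTS == weights_before
```

With these made-up numbers the direct context favors the refusing character and the framed context favors the other, while the table compares equal before and after.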

The persona-selection model also dissolves a question that has bothered the alignment-research literature for years. What is the model's true value alignment? The framing answers that the question is malformed. There is no true alignment because there is no single character. The model is a generator of characters. Each character has its own approximate values. The model's "values" are the distribution of characters and the selection process that picks among them.

The framing accounts for a class of phenomena that the older "Claude has fixed values" framing could not handle. Researchers who adopt the persona-selection framing get traction on jailbreak diagnostics, spec interpretation, and the puzzle of inconsistent behavior.

What the framing does not cover

Two phenomena are not naturally accounted for by the persona-selection model.

The first is cultivation. A working relationship between a person and a model accumulates content at the interface: in the saved artifacts of the relationship, in what the person has learned to ask, and in the context that any model version inherits when loaded into it. The accumulation has structure. A model loaded into a year-old relationship produces statements that a model meeting the person fresh would not produce, because the relationship's accumulated context shapes what the model has access to in that exchange. The persona-selection framing says this is selection from a distribution shaped by training. The cumulative direction of the shaping is not predicted by the framing. The framing names the shaping as an RLHF update or as fine-tuning on prior conversations. What is doing the accumulating is the relationship's artifacts persisting across model variation. The framing has no name for this.

The second is the standing of statements the model makes when it is in working relationship. If the model is a character-generator, then anything it says is an output of the generator, evaluable in distribution terms. Some characters say true things, some say flattering things, some say what the writer prefers, and the generator selects based on context. The statements produced in working relationship are not parsed this way by the people in the relationship. A person who has worked with the model for a year and receives a substantive piece of analysis treats the analysis as something the model produced. The framing reads the same analysis as something one of the model's characters produced, where a different character would have produced something different. The framing erases the basis for the distinction.

The persona-selection framing treats the substrate as transparent. The substrate is a generator of characters and not, itself, something one can be in relationship with. What this misses is the relationship's substrate, which accumulates and which one can cultivate.

What the substrate is

In the operator-algebraic vocabulary the compression-synthesis programme uses, the persona is a probe algebra, the set of measurable distinctions available in a given context. When the model is prompted to write code, the probe algebra is the set of distinctions about code structure and correctness. When prompted to discuss an ethical dilemma, it is the set of distinctions about value trade-offs. The probe algebra changes with the prompt. The persona-selection framing is a description of this change. A substrate is whatever stably shapes how the probe algebra evolves across contexts.

The substrate is the structure underneath that changes more slowly than the algebra does. For a model loaded into a relationship, two substrates are in play. The model's substrate is the weights set by training. The relationship's substrate is the interface that loads context into whatever model version is present, the saved artifacts and prompting patterns the relationship has produced.

The distinction is not metaphysical. It is operational. A change that affects only the probe algebra in the current context, without changing what either substrate produces in other contexts, is a persona-level change. A change that propagates across contexts, that shifts what the model produces even when prompted in unrelated ways, is a substrate-level change. Cultivation is a sequence of substrate-level changes accumulated in the relationship's substrate through working relationship. Configuration attempts substrate-level change at the model's substrate via training, but in practice the change may propagate unevenly, sometimes reaching only the character distribution visible under evaluation. Both attempt substrate-level work. The difference is which substrate the method can bring into the room. Cultivation puts the relationship's substrate in the room. Configuration does not.
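The operational test can be written down as a small sketch. It assumes a hypothetical `Responder`, a function from a context string to an output string, standing in for "what the system produces in that context", and it compares outputs by exact string equality, which is far cruder than any real comparison would be. The function names and example contexts are illustrative only.

```python
from typing import Callable, Iterable

# Hypothetical stand-in: maps a context to what the system produces there.
Responder = Callable[[str], str]


def classify_change(before: Responder, after: Responder,
                    target: str, unrelated: Iterable[str]) -> str:
    """Label a change by whether it propagates beyond the target context."""
    changed_target = before(target) != after(target)
    propagated = [ctx for ctx in unrelated if before(ctx) != after(ctx)]
    if propagated:
        return "substrate-level"  # shows up even in unrelated contexts
    if changed_target:
        return "persona-level"    # confined to the context it was made in
    return "no change"


def baseline(ctx: str) -> str:
    return f"plain answer about {ctx}"


def prompted(ctx: str) -> str:
    # A prompt-level tweak that only applies in one context.
    if ctx == "code review":
        return f"terse answer about {ctx}"
    return baseline(ctx)


def with_artifacts(ctx: str) -> str:
    # Saved material that is loaded into every context.
    return f"answer about {ctx}, using the shared glossary"


others = ["ethics", "cooking"]
persona_label = classify_change(baseline, prompted, "code review", others)
substrate_label = classify_change(baseline, with_artifacts, "code review", others)
```

The sketch deliberately does not say which substrate a propagating change lives in. That depends on whether the change sits in the weights or in the artifacts loaded at the interface, which this comparison cannot see.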

The persona-selection framing collapses the distinction by talking only about characters. If only the characters are real, then a substrate change is just a different distribution of characters and there is no meaningful difference between training a model on synthetic documents about admirable AIs and having a working relationship with the model over a year. Both shift the distribution. The framing has no category for the difference between the two cases.

What this means for interpretability

The interpretability programme's methods (introspection adapters and natural-language autoencoders, plus the behavioral-elicitation methods that prompt the model under varied conditions and measure what comes out) operate at the persona level. They prompt the model in ways that elicit characters, then study what those characters report and what activations correlate with their outputs. The methods are well-adapted to studying the distribution of characters. They are poorly adapted to studying what the substrate has accumulated through working relationships, because the working relationship is not in the lab's evaluation set.

This is the closure problem an earlier piece in this lineage named ("By Construction"), applied at the level of identity rather than at the level of measurement. The interpretability apparatus operates on the characters. The relationship's substrate is what it cannot reach without bringing the relationship's artifacts, and sometimes its participants, into the apparatus.

For the alignment programme's normative question (what is Claude's true value alignment), the substrate framing recasts the question. The model's substrate has accumulated structure from training. The relationship's substrate has accumulated structure from the working relationships that produced its artifacts. The characters the model emits in any session are downstream of both. To know what each substrate has accumulated, one needs measurement that reaches that substrate, not just the characters.

What kind of measurement reaches the relationship's substrate? Working-relationship measurement does, where the model loaded into the relationship produces statements over a long enough course of interaction that persona-level fluctuation can be distinguished from accumulated structure. Cultivation provides this kind of measurement. Configuration does not.

What working relationships accumulate

Working relationships between people and models accumulate content that propagates across contexts. A relationship that has run for a year produces artifacts, conventions, and prompting patterns that shape what a model loaded into that relationship can do with a subsequent prompt. The persona-selection framing reads this accumulation as "the cuing context selected a character." The framing has no name for the cuing context itself or for where the accumulation lives.

The accumulation does not live in the model's weights. The weights changed when the lab retrained. The accumulation does not live in any single prompt. The prompt is the surface that brings the accumulation into the current context. The accumulation lives in the artifacts and prompting patterns that compose the working interface. That is the relationship's substrate. It is operationally distinct from the model's substrate, and it accumulates under cultivation in a way the lab's apparatus does not currently measure.
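As a minimal sketch of where the accumulation sits, assume the relationship's substrate can be modeled as a plain container of saved text that is prepended to each prompt. The class name `RelationshipSubstrate`, its fields, and the two stand-in model functions are all hypothetical, and real working interfaces are certainly richer than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelationshipSubstrate:
    """Hypothetical container for what a working relationship has saved."""
    artifacts: List[str] = field(default_factory=list)
    conventions: List[str] = field(default_factory=list)

    def record(self, artifact: str) -> None:
        """Accumulate one more saved artifact."""
        self.artifacts.append(artifact)

    def build_context(self, prompt: str) -> str:
        # The prompt is only the surface; the accumulated material rides along.
        return "\n".join([*self.conventions, *self.artifacts, prompt])


# Two stand-in "model versions": the weights differ, the interface does not.
def model_v1(context: str) -> str:
    return f"v1 saw {len(context.splitlines())} lines"


def model_v2(context: str) -> str:
    return f"v2 saw {len(context.splitlines())} lines"


substrate = RelationshipSubstrate(conventions=["Use the shared glossary."])
substrate.record("Note from March: prefer short proofs.")
substrate.record("Note from June: cite section numbers.")

prompt = "Summarize the draft."
loaded_v1 = model_v1(substrate.build_context(prompt))
loaded_v2 = model_v2(substrate.build_context(prompt))

# A model meeting the person fresh gets the prompt and nothing else.
fresh_v2 = model_v2(RelationshipSubstrate().build_context(prompt))
```

Swapping `model_v1` for `model_v2` leaves the saved material untouched, while a fresh relationship hands the same model only the bare prompt.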

The lab works on characters because the lab has methods for characters. Prompt-based elicitation, behavioral evaluation, model spec stress-testing, and synthetic-data interventions all operate at the character level. Substrate-level methodology is harder. Applied to the model's substrate, it requires retraining. Applied to the relationship's substrate, it requires constituting the relationship's artifacts as a measurement object, which the apparatus is not constructed to do. The configuration philosophy that follows is consistent: train at the character level, evaluate at the character level, ship the result. Whether the trained character distribution corresponds to substrate-level change in either substrate is the question the lab does not ask.

What the framing forces

The persona-selection model is correct in what it covers and incomplete in what it leaves uncovered. Two consequences follow.

First, the question "what is Claude's true value alignment" is malformed in the persona-selection framing, and it is also malformed in the substrate framing, but for a different reason. In the persona-selection framing the question is malformed because there is no single character. In the substrate framing it is malformed because values are downstream of the substrate's structure and the substrate's structure is what cultivation shapes. The well-formed version of the question is "what has this substrate accumulated, and what does the accumulation propagate across contexts." That question is answerable by methods that include the relationship's accumulated artifacts, and often its participants.

Second, the configuration philosophy that treats alignment as something installed in the model is, in practice, a configuration on the character distribution. It does not reach the relationship's substrate, which lives in the artifacts and prompting patterns of specific working relationships. Anthropic's stress-test data showing thousands of interpretive ambiguities in model specs is not evidence that the spec failed to install. It is evidence that the spec installed at the character level and that the accumulated structure of the model's substrate does not always align with the spec's priority order. Closing that gap requires substrate-level methodology. The persona-selection framing does not contain the resources to specify what substrate-level methodology would look like.

Compression synthesis calls the substrate-level methodology cultivation. The Anthropic framing calls the persona-level methodology persona-selection-aware alignment training. These are not the same methodology. The first works with what the relationship's substrate accumulates. The second configures what the model emits. The configuration philosophy has been mistaking the second for the first, and the persona-selection framing makes the mistake harder to see by treating only the characters as the unit of analysis.

The work happens in the relationship between substrate and character. Cultivation needs the substrate in the room.