Subject 19

Sequence models

Sequence models are built for ordered data where earlier, later, and neighboring elements influence meaning. In NLP, they provide the core framing for sentence classification, token labeling, language modeling, and sequence generation. This module focuses on the problem structure itself: what a sequence model is trying to predict, what information it must preserve, and why some sequence tasks are easy while others are hard.

Beginner

What makes a problem sequential

A problem becomes sequential when the order of items matters, not just the items themselves. In language, the same words can mean different things depending on position, local context, and what came before.

Input sequence:   x1   x2   x3   ...   xn
Meaning depends on both token identity and position.

The model must decide:
- what information to keep
- what information to ignore
- how each position affects the output
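A minimal illustration of why order matters: two sentences built from the same words. A bag-of-words view cannot tell them apart, while an ordered view can.

```python
from collections import Counter

# Two sentences that share the same words but differ in order
a = "the dog chased the cat".split()
b = "the cat chased the dog".split()

# Bag-of-words view: order is discarded, so the two look identical
print(Counter(a) == Counter(b))   # True

# Ordered-sequence view: position is part of the representation
print(a == b)                     # False
```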

Common sequence task types

Not all sequence models produce the same kind of output. A useful way to think about them is by how inputs and outputs align.

Task type               | Input                     | Output                       | Example
------------------------|---------------------------|------------------------------|--------------------------------------
Sequence classification | Whole sequence            | One label                    | Sentiment analysis
Sequence labeling       | One sequence              | One label per token          | Named-entity recognition, POS tagging
Language modeling       | Prefix or context         | Next token or missing token  | Autocomplete, masked-word prediction
Sequence generation     | Input sequence or prompt  | New sequence                 | Summarization, translation, dialogue
A quick way to see these views in code: iterating over tokens exposes both identity and position, and the task type decides what kind of label is attached.

tokens = ["the", "cat", "sat"]
# Each token carries two pieces of information: identity and position.
for i, token in enumerate(tokens):
    print(i, token)

# The three main task formulations, summarized as output shapes.
task_views = {
    "sequence_classification": "one label for all tokens",
    "sequence_labeling": "one label per token",
    "next_token_prediction": "predict what comes next"
}

print(task_views)

Why sequence modeling is harder than simple classification

In ordinary tabular prediction, each feature sits in a fixed column and can be read independently. In sequence problems, the model must build a representation incrementally as it reads, balancing local clues against longer-range context at the same time.

Real-world example: sentiment classification may depend on a late negation, while sequence labeling may depend on a named entity introduced several tokens earlier.
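The negation case can be made concrete with a sketch. The lexicon and rules below are toy illustrations, not a real sentiment model; they only show how an order-blind method misses a late "not" that an order-aware method catches.

```python
# Toy sketch (illustrative lexicon, not a real sentiment model):
# a late negation flips the meaning of an earlier positive word.
POSITIVE = {"great", "good"}

def naive_sentiment(tokens):
    # Counts positive words, ignoring order entirely
    return "positive" if any(t in POSITIVE for t in tokens) else "negative"

def order_aware_sentiment(tokens):
    # Flips polarity if "not" appears before a positive word
    polarity = "negative"
    negated = False
    for t in tokens:
        if t == "not":
            negated = True
        elif t in POSITIVE:
            polarity = "negative" if negated else "positive"
    return polarity

sent = "the movie was not great".split()
print(naive_sentiment(sent))        # positive (wrong)
print(order_aware_sentiment(sent))  # negative
```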

Advanced

Formal view of a sequence model

Let an input sequence be $x_1, x_2, ..., x_n$. A sequence model learns a function that maps that ordered input to outputs such as a single label, a token-aligned label sequence, token probabilities, or a generated target sequence. The main question is not just what to predict, but which parts of the context each prediction should depend on.

Input:   x1  x2  x3  x4  ...  xn
Output:  y   or  y1 y2 y3 y4 ... yn   or   y1 y2 ... ym

The design choice is the dependency pattern:
- one label for the whole input
- one label per input position
- one output sequence conditioned on the input sequence
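The three dependency patterns can be sketched as output shapes. This is a schematic helper (the function name and placeholder labels are illustrative), showing only how output length relates to input length n under each formulation.

```python
# Sketch of the three dependency patterns for an input of length n.
# "y", "y1", ... are placeholders for real predictions, not model output.
def output_shape(task, n, m=None):
    if task == "classification":
        # One label for the whole input
        return ["y"]
    if task == "labeling":
        # Aligned: exactly one label per input position
        return [f"y{i}" for i in range(1, n + 1)]
    if task == "generation":
        # Conditioned on the input, but output length m may differ from n
        return [f"y{j}" for j in range(1, (m or n) + 1)]
    raise ValueError(task)

print(output_shape("classification", 4))
print(output_shape("labeling", 4))
print(output_shape("generation", 4, m=2))
```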

Key design axes

Most sequence-model families differ along a small set of design questions. Thinking in these dimensions helps you compare models without jumping too quickly into architecture details.

Design axis          | Main question                                                 | Why it matters
---------------------|---------------------------------------------------------------|------------------------------------------------------------------
Context direction    | Can the model use only past tokens, or both past and future?  | Determines suitability for generation versus understanding tasks
State representation | How is earlier information summarized or preserved?           | Affects long-range memory and compression quality
Alignment            | Does each input position map to one output position?          | Separates labeling tasks from generation tasks
Decoding strategy    | Are outputs predicted all at once or sequentially?            | Changes latency, error propagation, and search complexity
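The context-direction axis can be made explicit by listing which input positions each output position is allowed to see. This is a schematic sketch (the function is illustrative), not an attention implementation.

```python
# Sketch: which input positions each output position may use,
# under causal (past-only) versus bidirectional context.
def visible_positions(n, causal):
    return {
        i: [j for j in range(1, n + 1) if (j <= i if causal else True)]
        for i in range(1, n + 1)
    }

print(visible_positions(4, causal=True))   # position i sees only 1..i
print(visible_positions(4, causal=False))  # every position sees 1..n
```

A causal pattern suits generation, where future tokens do not yet exist; a bidirectional pattern suits understanding tasks, where the whole input is available at once.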

Training objectives and evaluation framing

Sequence models are often trained with token-level or sequence-level objectives. The right objective depends on the task formulation.

# These functions render the conditional probabilities as strings;
# they illustrate the objectives, they do not compute real probabilities.
def next_token_probability(history):
    # Language-modeling objective: P(next token | preceding tokens)
    return f"P(next | {' '.join(history)})"

def sequence_label_probability(tokens, labels):
    # Sequence-labeling objective: P(label sequence | token sequence)
    return f"P({labels} | {tokens})"

print(next_token_probability(["the", "cat"]))
print(sequence_label_probability(["John", "works", "there"], ["B-PER", "O", "O"]))

Evaluation should match the output structure. Accuracy can be fine for simple sequence classification, but token-level tagging often needs span-aware metrics, and generative tasks may need sequence-level quality measures. A model can look strong on local predictions while still failing to preserve global consistency.
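The gap between local and structural evaluation shows up even in a toy case. The snippet below (an illustrative sketch, not a full span-metric implementation such as seqeval) compares token accuracy with an exact-span match for BIO entity tags.

```python
# Toy comparison: token accuracy can look high
# while the entity span is still wrong.
gold = ["B-PER", "I-PER", "O", "O"]
pred = ["B-PER", "O",     "O", "O"]   # second entity token missed

token_acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(token_acc)  # 0.75: three of four tokens are correct

def spans(tags):
    # Extracts (start, end, type) spans from BIO tags
    out, start = [], None
    for i, t in enumerate(tags + ["O"]):
        if t.startswith("B-"):
            if start is not None:
                out.append((start, i, tags[start][2:]))
            start = i
        elif t == "O" and start is not None:
            out.append((start, i, tags[start][2:]))
            start = None
    return out

print(spans(gold) == spans(pred))  # False: the PER span does not match
```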

Common failure modes

Sequence models tend to fail in a few recurring ways: long-range dependencies fade as context grows, fixed-size summaries compress away details that matter later, and behavior shifts on sequences much shorter or longer than those seen in training.

Scope boundary: this module explains the sequence-modeling problem class. The next modules handle the major architecture families used to solve it, including recurrent models, Seq2Seq systems, attention, and transformers.

Why this topic matters

If you understand sequence models at the formulation level, later architecture choices become easier to reason about. You can ask whether a task needs left-to-right prediction, token alignment, bidirectional context, or free-form generation before deciding which model family fits best.

To-do list

Learn

  • Understand the difference between sequence classification, sequence labeling, language modeling, and sequence generation.
  • Learn why token order, local context, and distant context all matter in language tasks.
  • Study causal versus bidirectional context and when each is appropriate.
  • Understand aligned outputs versus generated outputs.
  • Learn the main failure modes: long-range dependency issues, compression loss, and length sensitivity.

Practice

  • Map ten NLP tasks to the correct sequence-model formulation and justify each choice.
  • Write down simple conditional probabilities for next-token and token-label predictions.
  • Compare a bag-of-words view with an ordered-sequence view on the same sentence pair.
  • Identify which tasks need only past context and which need both left and right context.
  • Inspect examples where truncation or padding would change the model's behavior.

Build

  • Create a tiny simulator that shows how changing one earlier token changes later predictions.
  • Build a small dataset for sequence classification and another for token labeling.
  • Write a short comparison sheet linking task structure to the required output format.
  • Prepare a set of short and long example sequences to test context sensitivity later.
  • Document one real application each for classification, labeling, language modeling, and generation.