Subject 24

Machine translation

Machine translation converts text from one language to another while preserving meaning, fluency, and context. Work on the task has driven major advances, from phrase-based statistical systems to attention mechanisms and transformers.

Beginner

Translation is harder than replacing words one by one. Good translation preserves meaning, style, grammar, and sometimes cultural or domain-specific intent.

Real-world example: translating legal or medical content requires more than fluency; terminology must be precise and consistent.

# "bank" has two senses; the correct target word depends on which one is meant.
source = "The bank is closed."
possible_meanings = ["financial institution", "river bank"]
print(source, possible_meanings)
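
To see why replacing words one by one falls short, here is a minimal sketch of a naive word-by-word translator. The lexicon TOY_LEXICON and the helper word_by_word are hypothetical names invented for this illustration, not part of any real system.

```python
# Toy English-to-French lexicon; the entries are illustrative assumptions,
# not taken from any real translation resource.
TOY_LEXICON = {"i": "je", "miss": "manque", "you": "tu"}


def word_by_word(sentence: str, lexicon: dict[str, str]) -> str:
    """Replace each source word independently, ignoring grammar and context."""
    words = sentence.lower().rstrip(".!?").split()
    # Words missing from the lexicon are passed through unchanged.
    return " ".join(lexicon.get(word, word) for word in words)


literal = word_by_word("I miss you.", TOY_LEXICON)
reference = "tu me manques"  # idiomatic French: subject and object swap roles

print(literal)               # je manque tu
print(literal == reference)  # False
```

Every word was "translated", yet the result is not valid French, because the idiomatic rendering reorders the sentence and changes the verb form.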

Advanced

Machine translation systems evolved from statistical phrase-based models to neural sequence-to-sequence (Seq2Seq) models and then to transformers. Engineering concerns include domain adaptation, terminology constraints, low-resource languages, document-level context, and evaluating quality beyond surface-overlap metrics such as BLEU.

Source text -> tokenize -> encoder -> decoder -> beam search / decoding -> target text -> evaluation

# Glossary mapping English source terms to the required French target terms.
terminology = {"claim": "réclamation", "policy": "politique"}
print(terminology)
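
One simple way to use a glossary like this is as a post-hoc check on system output. The sketch below is an assumption about how such a check might look. The function missing_glossary_terms and the two example outputs are hypothetical, and the substring match on the target side is deliberately crude (it ignores inflection and word boundaries).

```python
import re


def missing_glossary_terms(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return source terms whose required target rendering is absent from the output."""
    missing = []
    for src_term, tgt_term in glossary.items():
        # Whole-word, case-insensitive match on the source side.
        in_source = re.search(rf"\b{re.escape(src_term)}\b", source, flags=re.IGNORECASE)
        # Crude substring check on the target side (an assumption of this sketch).
        if in_source and tgt_term.lower() not in target.lower():
            missing.append(src_term)
    return missing


glossary = {"claim": "réclamation", "policy": "politique"}
source = "The claim was filed under the policy."

# Hypothetical system outputs, written by hand for this example.
consistent = "La réclamation a été déposée au titre de la politique."
inconsistent = "La demande a été déposée au titre de la politique."

print(missing_glossary_terms(source, consistent, glossary))    # []
print(missing_glossary_terms(source, inconsistent, glossary))  # ['claim']
```

A check like this only detects violations. Enforcing terminology during decoding is a separate and harder problem.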

Translation is a good case study because it exposes alignment, attention, decoding, and evaluation issues in one task.
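
The decoding issue can be made concrete with a toy beam search. This is a sketch under strong assumptions: TOY_MODEL is a hypothetical table of next-token probabilities keyed only by the previous token, whereas a real decoder conditions on the full prefix and the encoded source sentence.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token only.
TOY_MODEL = {
    "<s>": {"la": 0.6, "le": 0.4},
    "la": {"banque": 0.5, "rive": 0.5},
    "le": {"bord": 0.9, "banc": 0.1},
    "banque": {"</s>": 1.0},
    "rive": {"</s>": 1.0},
    "bord": {"</s>": 1.0},
    "banc": {"</s>": 1.0},
}


def beam_search(model, beam_width, max_len=5):
    """Return (tokens, probability) of the best hypothesis found."""
    beams = [(0.0, ["<s>"])]  # each entry: (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the beam_width highest-scoring expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_width]:
            (finished if cand[1][-1] == "</s>" else beams).append(cand)
        if not beams:
            break
    # Fall back to unfinished beams if nothing reached the end marker.
    best_score, best_tokens = max(finished or beams, key=lambda c: c[0])
    words = [t for t in best_tokens if t not in ("<s>", "</s>")]
    return words, math.exp(best_score)


for width in (1, 2):
    tokens, prob = beam_search(TOY_MODEL, beam_width=width)
    print(width, tokens, round(prob, 2))
# 1 ['la', 'banque'] 0.3
# 2 ['le', 'bord'] 0.36
```

With a beam width of 1 (greedy decoding) the search commits to the locally best first token and ends with a lower-probability sequence. Keeping two hypotheses lets it recover the better one.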

To-do list

Learn

  • Understand why translation needs context and alignment.
  • Learn the historical progression from phrase-based MT to transformers.
  • Study low-resource translation challenges.
  • Understand terminology control and document-level consistency.

Practice

  • Inspect translations where literal word substitution fails.
  • Compare outputs from a generic model and a domain-adapted system.
  • Evaluate short translations with BLEU and human judgment.
  • Test ambiguous source sentences and analyze disambiguation failures.
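
For the BLEU exercise, it helps to see what the metric actually counts. The sketch below is a simplified sentence-level variant (clipped n-gram precision up to bigrams, times a brevity penalty), not the standard corpus-level BLEU with 4-grams. The names ngrams and simple_bleu are hypothetical helpers for this illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU: geometric mean of clipped n-gram precisions times brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the bank is closed today"
print(simple_bleu("the bank is closed today", reference))           # 1.0
print(round(simple_bleu("the bank is shut today", reference), 3))   # 0.632
```

The second candidate means the same thing as the reference but loses over a third of its score for one synonym, which is why overlap metrics need to be paired with human judgment. For scores that will be reported or compared, a maintained implementation such as sacreBLEU is the safer choice.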

Build

  • Create a simple translation demo using a pretrained model.
  • Add a glossary or terminology constraint mechanism.
  • Build an evaluation notebook comparing outputs across domains.
  • Write notes on where translation quality breaks down.