RAPHAEL THYS

AI Literacy

Not all languages are equal for AI

Tokenization is not neutral. French costs more tokens than English for the same idea, and that has direct consequences for cost, latency, and quality.

Raphael Thys · 8 min read
Figure: Comparison of token counts across English, French, and Mandarin for the same sentence.

A counter-intuitive but consequential finding: writing your prompts in English is often cheaper, faster, and slightly more accurate than writing them in French, even when the final output must be in French. This article explains why, and when it matters.

Tokens, not words

Models do not see words. They see tokens: sub-word fragments produced by a tokenizer whose vocabulary was learned from mostly English text. For English, the tokenizer is efficient: roughly one token per four characters. For French, German, or Spanish, the same idea splits into more tokens. For Mandarin or Korean, the ratio is often worse still.
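
One contributing factor can be checked with the standard library alone. Many current tokenizers use byte-level BPE, which starts from UTF-8 bytes before applying its learned merges, so accented and non-Latin characters begin with a larger byte budget. The sketch below compares bytes per character for three sample sentences. The sentences and the helper names `utf8_bytes` and `bytes_per_char` are illustrative assumptions, and byte counts are only a rough proxy, not real token counts.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character; 1.0 for pure ASCII text."""
    return utf8_bytes(text) / len(text)


# Illustrative sample sentences (assumed, not taken from the article).
samples = {
    "English": "The weather is nice today.",
    "French": "Il fait très beau aujourd'hui, même en février.",
    "Mandarin": "今天天气很好。",
}

for language, sentence in samples.items():
    print(
        f"{language:9} chars={len(sentence):3} "
        f"bytes={utf8_bytes(sentence):3} "
        f"bytes/char={bytes_per_char(sentence):.2f}"
    )
```

For real counts, run the same sentences through the tokenizer that belongs to the model you actually call; the learned merges matter more than the raw bytes.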

[Migration in progress — full article body to be brought across from the original Notion source.]

What to do about it

  • For internal prompts, work in English when you can.
  • For client-facing output, write in French (or whatever target language).
  • When cost matters, run the prompt in English and translate the result.
  • When nuance matters, write in the target language and accept the surcharge.
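
The trade-off in the list above comes down to simple arithmetic. Here is a minimal sketch, assuming hypothetical per-token prices and hypothetical token counts; none of these numbers come from this article or from any provider's price list, and the names `Call`, `cost`, and `surcharge` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Token usage for one model call (all figures are hypothetical)."""
    input_tokens: int
    output_tokens: int


def cost(call: Call, price_in: float, price_out: float) -> float:
    """Cost of a call, with prices given per 1,000 tokens."""
    return (call.input_tokens * price_in + call.output_tokens * price_out) / 1000


def surcharge(target: Call, english: Call, price_in: float, price_out: float) -> float:
    """Relative extra cost of the target-language call over the English one."""
    base = cost(english, price_in, price_out)
    return (cost(target, price_in, price_out) - base) / base


# Hypothetical example: the same prompt written in English and in French.
english = Call(input_tokens=1000, output_tokens=500)
french = Call(input_tokens=1300, output_tokens=650)

extra = surcharge(french, english, price_in=1.0, price_out=2.0)
print(f"French surcharge: {extra:.0%}")
```

If you run the prompt in English and translate the result, the translation step consumes tokens of its own, so add its cost to the English side before comparing.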
