Technical

Document preparation: the key performance factor in RAG systems

In a RAG project, 80% of final quality is decided before the model: in how sources are cleaned, segmented, and indexed.

Raphael Thys 25 Jun 2025 10 min read EN

Diagram of a RAG pipeline, from source document to user query

RAG demos are misleading: they work on clean, well-labelled corpora. In companies, documents are messy, contradictory, and poorly structured. This article describes what happens upstream, and why that is where quality is decided.

RAG does not fix your data

A RAG system (Retrieval-Augmented Generation) queries a knowledge base before generating an answer. That is powerful, but it inherits the quality of the base wholesale. A badly scanned PDF, a table exported without headers, an undetected duplicate: the model will faithfully reproduce the problem.

[Migration in progress - full article body to be brought across from the original Notion source.]

The preparation checklist

Cleaning - quality OCR, footer removal, encoding normalization.
Segmentation - semantic chunking, not mechanical slicing.
Metadata - date, source, status (current / outdated).
Deduplication - two versions of the same document means two contradictory votes.
Evaluation - a reference question-and-answer set before any production launch.

Keep reading

Diagram showing a context window filling up with conversation tokens

AI Literacy • 2 Oct 2025 • EN

When should you start a new AI chat, and when should you continue?

The context window is a finite resource. Knowing when to reset and when to carry on is one of the highest-leverage skills in AI literacy.

Magnifying glass over a paragraph of generated text

AI Literacy • 18 Sept 2025 • EN

How to spot a hallucination before it spots you

Five practical tells that an AI answer is fabricated, written for non-technical readers who want to trust their tools without being burned by them.

AI activation workshop, with a team gathered around a wall of sticky notes

AI Adoption • 12 Sept 2025 • EN

5 counter-intuitive lessons from AI training and coaching

Five lessons from supporting dozens of teams through AI activation: the opposite of what the market keeps promising.