What is the problem?
Quality of Web sites and documents
Many authors, frequent updates
Information is redundant (in data and meta-data)
Proof-reading is very difficult
XML, DTDs, and schemas define syntax, not semantics
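
To illustrate the last two points, here is a minimal sketch in Python. It assumes a hypothetical document in which a `count` attribute (meta-data) repeats information already carried by the data (the number of `author` elements). The element and attribute names are invented for this example. The parser accepts the document because it is well-formed, yet the two values disagree; catching that requires a separate, semantic check.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "count" attribute (meta-data) duplicates
# information already present in the data (the number of <author> elements).
DOC = """<report count="3">
  <author>Alice</author>
  <author>Bob</author>
</report>"""


def declared_and_actual(xml_text):
    """Return (count declared in meta-data, count found in the data)."""
    root = ET.fromstring(xml_text)  # succeeds: the document is well-formed
    declared = int(root.get("count"))
    actual = len(root.findall("author"))
    return declared, actual


declared, actual = declared_and_actual(DOC)
# The syntax check passed, but the redundant information is inconsistent.
consistent = declared == actual
print(declared, actual, consistent)
```

A DTD could require that `count` be present and that `report` contain `author` elements, but it has no means to state that the two must agree.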