Knowledge Base Construction

Content

Language models have revolutionized natural language processing. Yet they can state false things in a very convincing way: they hallucinate. One remedy comes from structured data such as knowledge graphs, which can be used to correct and inform the model. In this class, we will see how to bridge the gap between natural language (the sentence “Elvis is alive”) and structured information (the statement alive(Elvis)). We will cover the technical steps of information extraction: named entity recognition, entity disambiguation, and fact extraction. For each of them, we will see different methods: fine-tuning language models, prompt engineering, and training-free procedures. Finally, we will discuss techniques for knowledge cleaning: link prediction, entity alignment, and rule mining.
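To make the gap concrete, here is a minimal toy sketch (not course material) of mapping a sentence to a structured statement. The pattern and the resulting predicate syntax are illustrative assumptions only; the course covers far more robust methods.

```python
import re

def extract_statement(sentence):
    """Toy extractor: turn '<Entity> is <adjective>' into '<adjective>(<Entity>)'.

    This illustrative pattern is an assumption for demonstration; real
    information extraction uses NER, disambiguation, and fact extraction.
    """
    match = re.fullmatch(r"(\w+) is (\w+)\.?", sentence.strip())
    if match is None:
        return None  # sentence does not fit the toy pattern
    entity, predicate = match.groups()
    return f"{predicate}({entity})"

print(extract_statement("Elvis is alive"))  # alive(Elvis)
```

Such pattern-based extraction breaks down quickly on real text, which is exactly why the class studies learned and reasoning-based approaches.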

Grading

The grades are now available on Synapses. If you can't find your grade, or if you would like a breakdown of its component grades, contact the lecturer!

The course is graded on 6 labs and a final exam.

Discussing assignments together is allowed, but each student must write their own solution. Sharing code or plagiarism entails a grade of 0 for the lab/exam. Unless otherwise mentioned, labs are due before the next lecture.

Teachers:

Schedule

The class takes place at Telecom Paris on Monday mornings 9:00-12:15 starting from Monday 15th of September.

Introduction (15/09/2025, 0D19)
  1. Introduction to Knowledge Base Construction
  2. Knowledge Graphs
  3. Knowledge Representation
Named Entity Recognition and Classification (22/09/2025, 0D19)
  1. Continuation of Knowledge Representation
    (We covered all slides except Reification and Reasoning)
  2. Named Entity Recognition and Classification
    (Except Conditional Random Fields and Machine Learning)
Supplementary material: A quick refresher on Regular Expressions
Typing and Disambiguation (29/09/2025, 1D19)
  1. Entity Disambiguation
  2. Prompt Engineering
Fact extraction (06/10/2025, 1D19)
  1. Fact Extraction
    (Without semantic representations)
  2. Constrained Decoding

Lab: Fact extraction with constrained decoding. Resources.

Fact Extraction by Reasoning (13/10/2025, 1C33)
  1. Information Extraction by Reasoning
  2. Weighted MAX SAT

Lab: Weighted MAX SAT

Rule Mining (20/10/2025, 1D19)
  1. Rule Mining (recording)
    (without bottom-up rule mining)

Lab: Rule mining

No course on 27/10/2025 (vacation) and on 03/11/2025 (travel)
Semantic Web (10/11/2025, 1C33)
  1. Semantic Web
    (Without OWL, SPARQL, RDFa)

Lab: Data cleaning

Exam (17/11/2025, 9:00-10:30, 0C01)
The exam is “closed-book”: no materials are allowed except for a pen.