Fabian M. Suchanek
money
permanent
freedom
2003: BSc in Cognitive Science
Osnabrück University/DE
2005: MSc in Computer Science
Saarland University/DE
2008: PhD in Computer Science
Max Planck Institute/DE
2009: PostDoc at Microsoft Research
Silicon Valley/US
2010: PostDoc
INRIA Saclay/FR
2012: Research group leader
Max Planck Institute/DE
2013: Associate Professor
Télécom Paris/FR
2016: Full Professor
Institut Polytechnique de Paris/FR
What does a language model say?
How many papers did Fabian M. Suchanek publish in database venues?
1
2
2
1
1
1
1
8
1+1+1+2+2+1+1=8 ?
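The sum on the slide can be checked mechanically. A minimal sketch in Python, assuming that the seven numbers listed above are the per-venue counts and that 8 is the total stated in the answer (the variable names are illustrative):

```python
# Per-venue paper counts as listed in the language model's answer (from the slide).
counts = [1, 2, 2, 1, 1, 1, 1]

# The total stated in the same answer.
claimed_total = 8

# Deterministic aggregation over the same numbers.
actual_total = sum(counts)

print(actual_total)                   # 9
print(actual_total == claimed_total)  # False
```

The listed counts add up to 9, so the stated total of 8 contradicts the model's own enumeration.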
Language models deceive
Language models
• have trouble with aggregation, count, and joins
• will wrap their wrong answers in deceptively convincing language
Language models know how to talk
even when they don’t know what to say.
Language models are moody
Language models
• can give different answers if asked in different ways or different languages
Me: Did Elvis Presley die?
Chatbot: Yes
Me: Is Elvis Presley alive?
Chatbot: There is no definite answer to this question
There is now an entire field of science
called “prompt engineering”
Language models are probabilistic
Language models
• are probabilistic by nature
Me: Should I connect the red cable or the blue cable?
Chatbot: Probably the red cable (probability 85%)
The fundamental problem is that language models are probabilistic, while truth is not.
[The Economist, 2024-03-04]
Language models
• are designed to generalize, not to memorize
If you give them the blue dots, they will memorize the blue line.
They invent and forget at their own discretion.
[Suchanek, Luu: “Knowledge Bases and Language Models: Complementing Forces”, RuleML+RR, 2023]
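The "blue dots versus blue line" picture can be illustrated with an ordinary least-squares fit. This is only an analogy under stated assumptions: the data points below are hypothetical, and a language model is of course not a linear regressor. The sketch shows that a fitted model returns the value on the line, not the value it was trained on.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical "blue dots": the observations the model is trained on.
dots = [(0, 0.0), (1, 1.5), (2, 1.5), (3, 3.0)]

# The "blue line": what the model actually retains.
a, b = fit_line(dots)

# Asking the model about a training point returns the line, not the dot.
observed_at_1 = 1.5
recalled_at_1 = a * 1 + b
print(observed_at_1, round(recalled_at_1, 2))  # 1.5 1.05
```

The fitted line is y = 0.9x + 0.15, so the training value 1.5 at x = 1 is "recalled" as 1.05: the dot has been forgotten and a new value invented.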
Language models pose interesting challenges
What happens when
• users visit only chatbots, and no longer the Web pages?
• users want to sell the answers they receive?
• questions asked by one user resurface in the answers for another user?
• most content on the Web is produced by LLMs, and LLMs are then trained on the Web?
• LLMs are provided by biased actors?
DeepSeek: We firmly believe that under the leadership of the Communist Party of China,
through joint efforts of all Chinese sons and daughters, the complete reunification of the
motherland is an unstoppable historical trend.
[The Independent, 2025]
• LLMs provide incorrect or defamatory information?
[OpenAI EU Terms of Use, 2025]
Chatbot: Did you know that the president is
corrupt as Hell and bribed by the Citibank?
[Sadeddine, Maxwell, Varoquaux, Suchanek: “Large Language Models as Search Engines: Societal Challenges”, SIGIR Forum 2025]
Structured data to the rescue
[Suchanek, Lajus, Boschin, Weikum: “Knowledge Representation [in] Knowledge Bases”, Reasoning Web Summer School, 2019]
Polytechnic
Institute of Paris
worksAt
We use structured data repositories (databases, knowledge bases, JSON files) to store
- list of employees
- list of products with their prices
- list of proteins with their properties
...
Why? Because structured data repositories
• can be audited
• can be updated/fixed
• answer deterministically
• answer factual queries at a fraction of the cost of LLMs
You don’t want to train
a language model for these!
Structured data is currently still indispensable.
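The properties listed above (auditable, fixable, deterministic) can be seen in a few lines with an in-memory SQLite database. This is a minimal sketch: the table `product`, its rows, and the helper names `count_products` and `price_of` are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory database standing in for a structured data repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 3.0), ("lamp", 20.0)],
)

def count_products(connection):
    """Aggregation: the same query always returns the same count."""
    return connection.execute("SELECT COUNT(*) FROM product").fetchone()[0]

def price_of(connection, name):
    """Factual lookup: returns None instead of inventing a price."""
    row = connection.execute(
        "SELECT price FROM product WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]

# A wrong entry can be fixed in place, without retraining anything.
conn.execute("UPDATE product SET price = ? WHERE name = ?", (2.0, "pen"))

print(count_products(conn))     # 3
print(price_of(conn, "pen"))    # 2.0
print(price_of(conn, "chair"))  # None
```

An unknown product yields `None` rather than a plausible-sounding guess, which is the behaviour one wants for crisp facts.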
With a little help from my friends...
Language
Model
ask
answer
Language models have to resort to structured data for application‐specific, crisp knowledge.
Making the link is a hot topic of research.
plug‐ins, RAG,
query, ...
Where does Fabian work?
If you don’t know, query
the structured data!
worksAt
“how”
“what”
Institut polytechnique
de Paris
Knowledge bases
Language
Model
ask
answer
A knowledge base (KB) is a graph, where the nodes are entities, and the edges
are relationships. KBs also have a taxonomy of classes.
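The definition above can be sketched as a set of subject-relation-object triples. The three facts are taken from the edge labels on this slide; the function names `objects`, `superclasses`, and `is_instance_of` are hypothetical helpers, not part of any KB library.

```python
# A tiny knowledge base: nodes are entities or classes, edges are relationships.
facts = {
    ("Elvis", "type", "singer"),
    ("singer", "subclass", "person"),
    ("Elvis", "bornIn", "USA"),
}

def objects(subject, relation):
    """All objects o such that (subject, relation, o) is in the KB."""
    return {o for (s, r, o) in facts if s == subject and r == relation}

def superclasses(cls):
    """All classes reachable from cls via subclass edges (the taxonomy)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in objects(current, "subclass"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_instance_of(entity, cls):
    """True if the entity has the class directly or through the taxonomy."""
    direct = objects(entity, "type")
    return any(c == cls or cls in superclasses(c) for c in direct)

print(objects("Elvis", "bornIn"))         # {'USA'}
print(is_instance_of("Elvis", "person"))  # True
```

The taxonomy lets the KB answer that Elvis is a person although only "Elvis type singer" is stored explicitly.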
Is Elvis still alive?
If you don’t know,
query the KB!
sang
singer
type
person
subclass
bornIn
USA
...
plug‐ins, RAG,
query, ...
How can we deal with words that have
no embeddings?
Language Models and Knowledge Bases
Language
Model
singer
type
Is Elvis Plesley
still alive ?
Learning
Out-of-Vocabulary
Embeddings
(LOVE)
Embedding out‐of‐vocabulary words
[Chen, Suchanek, Varoquaux: Out‐of‐Vocabulary Embeddings, ACL 2022]
Imputing out‐of‐vocabulary embeddings with LOVE makes language models robust with little cost
[Chen, Suchanek, Varoquaux: Learning High-Quality and General-Purpose Phrase Representations, EACL 2024]
Phrase embeddings with PEARL provide superior performance in paraphrase classification and entity retrieval.
Is Elvis
still alive?
How can we disambiguate named entities
— especially if the surface form is not known upfront?
Language Models and Knowledge Bases
Language
Model
singer
type
Entity Linking with Deep Learning
Our idea: A relatively simple model of embeddings + attention
“Elvis”
“Elvis_Presley_(singer)”
Entity Linking: Results
1) a simple model can do just as well as BERT
2) many models have statistically indistinguishable performance anyway!
[Chen, Suchanek, Varoquaux: A Neural Model for Entity Linking, AAAI 2021]
Elvis Presley is
immortal because
he does not die.
Language
Model
living being
How can we guard against fallacies?
type
Language Models and Knowledge Bases
MAFALDA: A benchmark of fallacies
We defined a taxonomy of fallacies that unites all works on fallacy detection.
Even with a taxonomy of fallacies, annotation is subjective:
Is this
• false causality? (no link between last and this election)
• causal over‐simplification? (there is a causal link but not just this one)
She won the last mayor election, so she will win this one.
[Helwe, Calamai, Paris, Clavel, Suchanek: “MAFALDA: A Benchmark for fallacy detection”, NAACL 2024]
Both are possible!
We developed a disjunctive annotation scheme that allows for different legitimate annotations.
MAFALDA is a benchmark of 3000 text documents,
of which 200 are annotated manually with fallacies
in the disjunctive annotation scheme with comments.
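One way to read the disjunctive annotation scheme is that each annotated text carries a set of alternative labels, and a prediction counts as correct if it matches any of them. The following sketch makes that reading concrete. It is an assumption for illustration, not the evaluation metric defined in the MAFALDA paper, and the names `gold`, `is_correct`, and `accuracy` are hypothetical.

```python
# Gold standard: each text maps to the SET of legitimate fallacy labels.
gold = {
    "She won the last mayor election, so she will win this one.": {
        "false causality",
        "causal oversimplification",
    },
}

def is_correct(text, predicted_label, gold_annotations):
    """A prediction is accepted if it matches any of the alternatives."""
    return predicted_label in gold_annotations.get(text, set())

def accuracy(predictions, gold_annotations):
    """Share of (text, label) predictions accepted under the disjunctive scheme."""
    if not predictions:
        return 0.0
    hits = sum(is_correct(t, label, gold_annotations) for t, label in predictions)
    return hits / len(predictions)

sentence = "She won the last mayor election, so she will win this one."
predictions = [
    (sentence, "false causality"),
    (sentence, "causal oversimplification"),
    (sentence, "ad hominem"),
]
print(accuracy(predictions, gold))
```

Both legitimate annotations are accepted, while an unrelated label is still counted as an error.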
type
Language Models and Knowledge Bases
Language
Model
All true heroes are immortal.
Is Elvis Presley alive?
true hero
How can we do logical reasoning?
Right for the wrong reasons
All true heroes are immortal.
Elvis is alive or a true hero.
Is Elvis alive?
Answer in reasoning steps!
Let’s think step by step:
Premise: Elvis is alive or a true hero.
Premise: Elvis is a true hero.
Conclusion: Elvis is alive!
correct answer
invalid reasoning
ungrounded (hallucinated) premise
Even if the answer is correct, the reasoning process can be faulty!
Right for the wrong reasons: check with LLM
Checker LLM:
“Reasoning is faulty!”
It does not help to
certify one imperfect
LLM by another one!
Right for the wrong reasons: check with reasoner
All true heroes are immortal.
Elvis is a true hero.
Is Elvis alive?
Premise: All true heroes are immortal.
Premise: Elvis is a true hero.
Conclusion: Elvis is alive.
correct answer, valid and grounded reasoning
Logical reasoners cannot
deal with phrase variations!
Logical reasoner:
immortal ⇏ alive
“Reasoning is faulty!”
Right for the wrong reasons: check with VANESSA
X_is_a_true_hero ⇒ X_is_immortal
Elvis_is_a_true_hero ⇒ Elvis_is_immortal
Elvis_is_a_true_hero
¬ Elvis_is_alive
Premise: All true heroes are immortal.
Premise: Elvis is a true hero.
Conclusion: Elvis is alive.
shallow pattern‐based parsing
negated conclusion
atomic statements without semantics
instantiation with all nouns
Elvis_is_immortal ⇒ Elvis_is_alive
textual entailment between all pairs of sentences
Textual entailment allows us to bridge phrasing variations
without a semantic analysis of the sentences!
A, A ⇒ B ⊢ B
Gentzen‐style logical reasoner
VANESSA can certify reasoning steps in a neuro‐symbolic way.
VANESSA = Verifying Answers by Natural Language Entailment and Syntactic Sentence Analysis
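The pipeline on these slides (atomic statements, instantiated rules, entailment links, then a reasoner) can be sketched with plain forward chaining. This is a toy illustration under strong assumptions and not VANESSA's implementation: the atoms are written by hand instead of being parsed, the entailment pairs are hard-coded where a real system would call a textual entailment model, and the function names are hypothetical.

```python
def forward_chain(facts, implications):
    """Apply modus ponens (A, A => B |- B) until no new atom is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def step_is_valid(facts, implications, entailments, conclusion):
    """Check whether the conclusion follows from the premises.

    For implications between atoms, deriving the conclusion is equivalent
    to refuting the negated conclusion, which is what the slide depicts.
    """
    closure = forward_chain(facts, list(implications) + list(entailments))
    return conclusion in closure

# Atomic statements without semantics, as on the slide.
facts = {"Elvis_is_a_true_hero"}
# "All true heroes are immortal", instantiated with the noun "Elvis".
implications = [("Elvis_is_a_true_hero", "Elvis_is_immortal")]
# Stand-in for the output of a textual entailment model (assumed, hard-coded).
entailments = [("Elvis_is_immortal", "Elvis_is_alive")]

print(step_is_valid(facts, implications, entailments, "Elvis_is_alive"))  # True
print(step_is_valid(facts, implications, [], "Elvis_is_alive"))           # False
```

Without the entailment link the purely symbolic check fails, which mirrors the "immortal does not imply alive" problem of the plain logical reasoner.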
VANESSA verifies reasoning steps neuro‐symbolically
We created a benchmark of
- 1400 reasoning chains
- with 5000 steps annotated for validity and groundedness
We tested:
- Logical reasoners
- LLMs
- neuro‐symbolic (LINC, VANESSA)
Results:
- Symbolic: transparent, high precision, but low recall
- LLM: good performance, but not transparent
- VANESSA: high precision, transparent, competitive recall
... but still an open problem!
[Zacchary Sadeddine, Fabian Suchanek: “Verifying the Steps of Deductive Reasoning Chains”, ACL Find. 2025]
Tell me a story
about Elvis!
singer
type
story
Language Models and Knowledge Bases
Language
Model
How can we evaluate the quality of the story?
BLEU?
ROUGE?
JAUNE?
Human evaluation?
How to evaluate language models?
• We systematized the criteria for the quality of a story from the humanities
• had stories annotated manually by these criteria
Relevance (RE): how well the story matches its prompt
Coherence (CH): how much the story makes sense
Empathy (EM): how well the reader understood the character’s emotions
Surprise (SU): how surprising the end of the story was
...
There are no good metrics to evaluate LMs
• We correlated the manual evaluations with automated measures
[Figure: absolute Kendall correlations between the criteria from the humanities and the automated measures]
The automated metrics do not correlate well with the “real” ones!
=> we still need manual evaluation!
[Chhun, Colombo, Suchanek, Clavel: A Benchmark of the Evaluation of Story Generation, COLING 2022]
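The absolute Kendall correlation used for this comparison can be computed as follows. This is a minimal sketch of Kendall's tau-a (no tie correction); the two score lists are hypothetical and do not come from the benchmark.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores for five stories.
human_scores = [1, 2, 3, 4, 5]   # e.g. manual coherence ratings
metric_scores = [2, 1, 4, 3, 5]  # e.g. an automated measure

tau = kendall_tau(human_scores, metric_scores)
print(abs(tau))  # 0.6
```

Taking the absolute value treats a metric that ranks stories in exactly the reverse order as perfectly informative, which is why the slides report absolute correlations.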
[Chhun, Suchanek, Clavel: Do Language Models Enjoy Their Own Stories?, TACL 2024]
... unless we ask the LLMs themselves to evaluate their stories
=> works reasonably well
Is Elvis Presley
still alive?
singer
type
How good is the model itself
at answering questions?
Language Models and Knowledge Bases
Language
Model
Quantifying self‐confidence
We asked the model how confident it was in its answers, using SelfCheckGPT:
Manakul, Liusie, Gales: “Selfcheckgpt: Zero-resource black-box hallucination detection for generative LLMs”, Arxiv 2023
one batch of answers, 80% are predicted to be correct
ideal percentage of correct answers (80%)
actual percentage of correct answers (70%)
Confidence estimates generally work well.
SelfCheckGPT detects hallucinations by comparing the consistency of multiple answers to the same query.
We also used “Just Ask for Calibration”, which uses dedicated prompts to elicit verbalized probabilities.
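The comparison between predicted and actual percentages of correct answers can be sketched as a binned calibration report. The numbers reproduce the example batch from this slide (80% predicted, 70% actually correct); the function `calibration_by_bin` and the choice of five equal-width bins are assumptions made for illustration.

```python
def calibration_by_bin(confidences, correct, n_bins=5):
    """Group answers by predicted confidence and compare with actual accuracy.

    Returns {bin_index: (mean predicted confidence, actual accuracy)}.
    """
    bins = {}
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins.setdefault(index, []).append((conf, ok))
    report = {}
    for index, items in sorted(bins.items()):
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        report[index] = (mean_conf, accuracy)
    return report

# One batch of ten answers, each predicted to be correct with 80% confidence,
# of which seven are actually correct.
confidences = [0.8] * 10
correct = [True] * 7 + [False] * 3

report = calibration_by_bin(confidences, correct)
for mean_conf, accuracy in report.values():
    print(f"predicted {mean_conf:.0%}, actual {accuracy:.0%}")
```

A perfectly calibrated model would show equal predicted and actual values in every bin; here the batch is overconfident by ten percentage points.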
However, confidence estimates work less well for unpopular entities.
Confidence estimates show a grouping loss.
[Chen, Perez-Lebel, Suchanek, Varoquaux: “Reconfidencing LLMs from the Grouping Loss Perspective”, EMNLP 2024]
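The idea behind a grouping loss can be illustrated by splitting answers that share the same stated confidence into subgroups. This sketch only illustrates the phenomenon with hypothetical numbers; it is not the estimator used in the cited paper, and the group names are assumptions.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, is_correct). Returns accuracy per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [hits, count]
    for group, ok in records:
        totals[group][0] += int(ok)
        totals[group][1] += 1
    return {group: hits / count for group, (hits, count) in totals.items()}

# Hypothetical answers that all come with the same stated confidence,
# split by the popularity of the entity asked about.
records = (
    [("popular", True)] * 9 + [("popular", False)] * 1
    + [("unpopular", True)] * 5 + [("unpopular", False)] * 5
)

overall = sum(ok for _, ok in records) / len(records)
per_group = accuracy_by_group(records)
print(overall)    # 0.7
print(per_group)  # {'popular': 0.9, 'unpopular': 0.5}
```

The overall accuracy hides that the same confidence value means 90% for one subgroup and 50% for the other.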
In Machine Learning, truth=data
We test 6 models on 8 tasks on 29 datasets in the domain of climate change.
Results:
- TF-IDF performs on par with LLMs and fine‐tuned models
tasks are too simple and can be solved by frequent word analysis
- 96% of the datasets have annotation issues (mistakes and ambiguities)
what does it mean to have a precision of 99%
on a dataset that is only 80% correct?
- LLMs perform worse than fine‐tuned models
[Calamai, Bălălău, Suchanek: “Benchmarking the Benchmarks: Reproducing Climate-Related NLP Tasks”, ACL Find. 2025]
the task description differs from what
is actually annotated in the data
Benchmarks are often ill-defined,
too simple, or wrong.
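To give an idea of how little machinery a TF-IDF baseline needs, here is a minimal sketch in pure Python: smoothed TF-IDF vectors and a nearest-neighbour classifier by cosine similarity. The two training documents, the labels, and all function names are hypothetical; this is not the pipeline used in the cited study.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def fit_idf(documents):
    """Smoothed inverse document frequency for every word in the corpus."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in df.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; words unseen during fitting are ignored."""
    tf = Counter(tokenize(text))
    return {word: count * idf[word] for word, count in tf.items() if word in idf}

def cosine(u, v):
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(text, train_docs, train_labels, idf):
    """Label of the most similar training document."""
    query = tfidf_vector(text, idf)
    scores = [cosine(query, tfidf_vector(doc, idf)) for doc in train_docs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]

# Hypothetical two-document training set.
train_docs = ["emissions warm the planet", "the match ended in a draw"]
train_labels = ["climate", "other"]
idf = fit_idf(train_docs)

print(classify("rising emissions", train_docs, train_labels, idf))  # climate
print(classify("a draw again", train_docs, train_labels, idf))      # other
```

If word overlap of this kind already solves a task, the task says little about the language understanding of the models compared on it.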
Is Elvis Presley
still alive?
singer
type
How can we understand what happens here?
Language Models and Knowledge Bases
Language
Model
Explainability for regression
Why does my house have this price?
Existing approaches (LIME etc.):
- need access to the prediction model
- give explanations that apply only to a single data point
“For this datapoint, x played the biggest role”
We build a local linear model on a neighborhood that we expand until the trade‐off between precision and recall is optimal (by maximizing the lower bound of an error measure).
Then we output a local linear equation and the size of the neighborhood.
“The price of the house is determined by y = 0.8 × x − 20. This applies to 100 houses.”
=> the explanation can be verified, i.e., applied to this data point and others
=> the explanation is general, as it applies to the entire neighborhood
=> the method works even when the dataset is static (not generated by a model)
[Radulović, Bifet, Suchanek: “BELLA: Black box model Explanations by Local Linear Approximations”, TMLR 2025]
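The kind of explanation described above (a linear equation plus the size of the neighborhood it covers) can be sketched with a least-squares fit on the nearest data points. This is a simplification under stated assumptions: the neighborhood size `k` is fixed by hand, whereas the method described on the slides expands the neighborhood until its own criterion is optimal; the data and the function names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope*x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def local_linear_explanation(data, query_x, k):
    """Fit a line on the k points closest to query_x.

    Returns (slope, intercept, neighborhood size). Needs k >= 2 and at
    least two distinct x values among the neighbours.
    """
    neighbours = sorted(data, key=lambda point: abs(point[0] - query_x))[:k]
    slope, intercept = fit_line(neighbours)
    return slope, intercept, len(neighbours)

# Hypothetical houses: (feature x, price y). Near x = 100 the prices follow
# y = 0.8*x - 20; two far-away houses follow a different regime.
data = [(x, 0.8 * x - 20) for x in range(90, 111, 5)] + [(300, 500.0), (400, 900.0)]

slope, intercept, size = local_linear_explanation(data, query_x=100, k=5)
print(f"y = {slope:.1f} * x {intercept:+.1f}, applies to {size} houses")
```

Because the output is an explicit equation, it can be verified on the query point and on every other house in the neighborhood, and it needs only the data, not the prediction model.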
Is Elvis Presley
still alive?
singer
type
Where does the knowledge base come from?
Language Models and Knowledge Bases
Language
Model
singer
type
Information
Extraction
Knowledge bases can be built by information extraction from text.
[Weikum, Dong, Razniewski, Suchanek: Machine Knowledge, Found. & Trends in Databases, 2021]
[Suchanek, Lajus, Boschin, Weikum: Knowledge Representation and Rule Mining, RW 2019]
250 page book
Information Extraction
Elvis Presley is
a famous singer.
Language
Model
Meaning representations can help with that.
We wrote a survey of the most popular frameworks.
[Zacchary Sadeddine, Juri Opitz, Fabian Suchanek: “A survey of Meaning Representation”, NAACL 2024]
Information Extraction by Meaning Representations
Elvis Presley is
a famous singer.
singer
profession
person
Elvis Presley
ARG0
ARG1
How can we extract from Wikipedia?
Language Models and Knowledge Bases
Information
Extraction
Language
Model
Elvis Presley, an American
singer blah blah
blub blah don’t read
this, listen to the
speaker! blah blah
blub blah. you are still
reading this! blah blah
blah blah blabbel
Born: 1935
In: Tupelo
...
Categories:
Rock&Roll, American Singers,
Academy Award winners...
Extracting from Wikipedia: the YAGO KB
ElvisPresley
AmericanSinger
1935
Tupelo
USA
AcAward
type
birthPlace
locatedIn
won
birthYear
New YAGO: Schema.org + Wikidata
schema.org
Singer
Person
Constraints: Person ⊓ Location ≡ ⊥, ∃ birthDate ⊑ Person, ...
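Constraints of this kind can be checked mechanically. The sketch below assumes two readings: a disjointness constraint (nothing is both a Person and a Location) and a domain constraint (whatever has a birthDate is a Person). It checks direct class membership only and ignores the taxonomy; the data and the function `find_violations` are hypothetical and unrelated to the actual OWL DL and SHACL tooling.

```python
def find_violations(types, facts, disjoint_pairs, domains):
    """Return human-readable constraint violations (empty list if consistent)."""
    problems = []
    # Disjointness: no entity may belong to both classes of a disjoint pair.
    for entity, classes in types.items():
        for left, right in disjoint_pairs:
            if left in classes and right in classes:
                problems.append(f"{entity} is both {left} and {right}")
    # Domain: the subject of a property must belong to the property's domain.
    for subject, prop, _ in facts:
        required = domains.get(prop)
        if required is not None and required not in types.get(subject, set()):
            problems.append(f"{subject} has {prop} but is not a {required}")
    return problems

types = {"ElvisPresley": {"Person"}, "Tupelo": {"Location"}}
facts = [
    ("ElvisPresley", "birthDate", "1935"),
    ("ElvisPresley", "birthPlace", "Tupelo"),
]
disjoint_pairs = [("Person", "Location")]
domains = {"birthDate": "Person"}

print(find_violations(types, facts, disjoint_pairs, domains))  # []

# A wrongly extracted fact is caught by the domain constraint.
bad_facts = facts + [("Tupelo", "birthDate", "1935")]
print(find_violations(types, bad_facts, disjoint_pairs, domains))
```

Checks of this sort are one reason why a knowledge base can be audited and fixed, in contrast to the parameters of a language model.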
Example: YAGO about Elvis
Try it out!
New taxonomy
in Version 4.5
Schema.org + Wikidata
50 million entities, 150 million facts, 500 million labels
provably consistent (OWL DL & SHACL)
legible entity names
legible taxonomy & schema
used by DBpedia and IBM Watson
10,000+ citations
http://yago-knowledge.org