CC-BY
Fabian M. Suchanek
Disambiguation
[Pipeline figure: Source Selection and Preparation → Entity Recognition → Entity Typing → Entity Disambiguation → Fact Extraction → KB construction — you are here: Entity Disambiguation]
Overview
Disambiguation
Basic approach
Local features
Global features
When was Roosevelt born (the one who oversaw
the drafting of the Human Rights Declaration)?
[Language Model]
How can the language model make the link between the entities in the
question and the entities in the KB?
With a little help from my friends...
A language model can resort to a knowledge base (KB) for factual information.
The Problem of Ambiguity
After having recognized and typed entity names, we want to identify the entities.
Roosevelt oversaw the drafting of
the UN Declaration of Human Rights.
[Wikipedia: Eleanor Roosevelt; National Archives]
Def: Disambiguation
(Named Entity) Disambiguation (NED, Entity Linking, EL) is the task of mapping an identified
mention of an entity in a corpus to the intended entity in a knowledge base.
Entity Linking is sometimes understood as the combined task of
entity recognition (NERC) and disambiguation.

Corpus:
Roosevelt oversaw the drafting of
the UN Declaration of Human Rights.
(“Roosevelt” is the entity mention; its surface form is “Roosevelt”)

Knowledge base:
<Anna_Eleanor_Roosevelt, label, “Roosevelt”>
<Anna_Eleanor_Roosevelt, nationality, United_States>
<Anna_Eleanor_Roosevelt, born, 1884>
<Anna_Eleanor_Roosevelt, occupation, diplomat>

The desired mapping links the mention to Anna_Eleanor_Roosevelt.
Anna_Eleanor_Roosevelt is the unique identifier of the entity in the knowledge base;
it can be an illegible string of characters (e.g., Q_89970AC57F7).
“Roosevelt” is the human‐readable label of the entity.
Several entities can have the same label!
Where Disambiguation is needed
Disambiguation is essential for information extraction.
Roosevelt served as the First Lady of the US during the four terms in office of her husband
Franklin D. Roosevelt, making her the longest‐serving First Lady of the United States.
Candidates: Franklin D. Roosevelt (President), Franklin D. Roosevelt (lawyer), Eleanor Roosevelt,
Ellen Roosevelt, Roosevelt, New Jersey — married?
[Postcard at Ebay, rest from Wikipedia]
Where Disambiguation is needed
Who is mentioned in the Panama Papers?
[The Economist, 2018-12-22]
[Ambiverse: G20 economies and the Panama Papers]
Where Disambiguation is needed
How can I insert images in Latex documents?
⇒ Disambiguation problem: “Latex” could mean the typesetting system or the material.
Overview
Disambiguation
Basic approach
Local features
Global features
First run NERC
Usually, we first run Named Entity Recognition and Classification (NERC) to identify the
entity mentions.
[Roosevelt] drafted the [Human Rights Declaration] with representatives from [Chile], [India],
the [UK], [France], the [Soviet Union], [Australia], [China], and [Lebanon].
This process is sometimes called Mention Detection.
Def: Re-Ranking
Re‐ranking is a strategy for disambiguation (and other tasks), which first determines
candidate entities, and then ranks them by a score.
Roosevelt drafted the Human Rights Declaration ...
Candidates:                    Score:
- Franklin D. Roosevelt        0.5
- Franklin D. Roosevelt Jr.    0.4
- Ellen Roosevelt              0.3
- Eleanor Roosevelt            0.8
Candidates for Disambiguation
Which entities are candidates? All entities
- whose label is identical to the surface form
- whose label contains (part of) the surface form
- that have an attribute in which the surface form appears
- ...
<Franklin_Roosevelt, label, “Roosevelt”>
<Anna_Eleanor_Roosevelt, label, “Eleanor Roosevelt”>
<Anna_Boettiger, description, “author, née Anna Roosevelt”>
Roosevelt drafted the Human Rights Declaration ...
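As a minimal sketch, candidate generation by these rules can be implemented over a toy triple store. The KB content and the exact matching rules below are illustrative assumptions:

```python
# Toy KB of (subject, attribute, value) triples, mirroring the slide's examples.
KB = [
    ("Franklin_Roosevelt", "label", "Roosevelt"),
    ("Anna_Eleanor_Roosevelt", "label", "Eleanor Roosevelt"),
    ("Anna_Boettiger", "description", "author, née Anna Roosevelt"),
]

def candidates(surface_form):
    """Entities whose label equals or contains the surface form,
    or whose attributes mention it."""
    result = set()
    for subject, attribute, value in KB:
        if attribute == "label" and value == surface_form:
            result.add(subject)          # label identical to the surface form
        elif attribute == "label" and surface_form in value:
            result.add(subject)          # label contains the surface form
        elif surface_form in value:
            result.add(subject)          # surface form appears in an attribute
    return result
```

Here, candidates("Roosevelt") returns all three entities, since each matches one of the rules.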
Scoring the Candidates
Roosevelt drafted the Human Rights Declaration ...
Candidates:                  Score:
- Franklin_Roosevelt         0.5
- Anna_Eleanor_Roosevelt     0.4
- Anna_Boettiger             0.3
Based on features such as
- string similarity of label
- statistical likelihood
- similarity of the context
- logical constraints (?)
- ...
The NIL Problem
Roosevelt drafted the Human Rights Declaration without Fabian.
Candidates:              Score:
• Fabian (grape)         0.2
• Fabian (hurricane)     0.1
• Fabian (film)          0.2
The NIL Problem appears when the intended entity of a mention is not in the knowledge base.
A common solution is to establish (or learn) a threshold for the score, and to map any mention that
does not score higher than this threshold to a predefined entity “NIL” (for: no mapping).
Vanilla Algorithm for Entity Disambiguation
A very simple algorithm for entity disambiguation is as follows:
INPUT: a knowledge base (KB) and a corpus
• run Named Entity Recognition and Classification (NERC)
• for each entity mention:
• find candidate entities (e.g., all entities whose label equals the surface form)
• score the entities (e.g., by the features we will discuss)
• find the highest‐scoring entity
• output it if its score is above the NIL threshold
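A minimal sketch of this loop in Python; the toy label table, the stand-in scorer, and the threshold value are illustrative assumptions:

```python
# Vanilla disambiguation: candidates by label match, scoring, NIL threshold.
# The KB labels, the scorer, and the threshold are toy stand-ins.
LABELS = {
    "Franklin_Roosevelt": "Franklin D. Roosevelt",
    "Anna_Eleanor_Roosevelt": "Eleanor Roosevelt",
}
NIL_THRESHOLD = 0.3

def score(mention, entity):
    # stand-in scorer: token overlap between mention and entity label
    m = set(mention.lower().split())
    l = set(LABELS[entity].lower().split())
    return len(m & l) / len(m | l)

def disambiguate(mention):
    cands = [e for e, label in LABELS.items() if mention in label]
    if not cands:
        return "NIL"                       # no candidate at all
    best = max(cands, key=lambda e: score(mention, e))
    return best if score(mention, best) > NIL_THRESHOLD else "NIL"
```

For example, disambiguate("Eleanor Roosevelt") yields Anna_Eleanor_Roosevelt, while a mention with no candidate (or only low-scoring ones) yields "NIL".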
Entity Disambiguation by Deep Learning
A very simple deep learning algorithm for entity disambiguation is as follows:
INPUT: a knowledge base (KB), a training corpus, and a testing corpus
• run Named Entity Recognition and Classification (NERC)
• for each entity mention in the training corpus:
• find candidate entities (e.g., all entities whose label equals the surface form)
• score the entities by a neural network
(which is designed to take into account the features we will discuss)
• find the highest‐scoring entity
• compare it to the gold standard entity, and back‐propagate the loss
Do the same on the testing corpus, and output the highest‐scoring entity.
Entity Disambiguation by pretrained models
Entity disambiguation can be done by pretrained language models by concatenating the input text
and the entity candidates, and asking the model to extract the correct one. This is called
extractive entity disambiguation.

input text <SEP> candidates:
Roosevelt drafted the Human Rights Declaration. <SEP> Eleanor Roosevelt; Franklin Roosevelt; Albert Roosevelt.
A BERT/BART/...-style model marks the START and END of the correct candidate (and NONE for the others).
[Barba et al: “ExtEnD: Extractive Entity Disambiguation”, ACL 2022]
Entity Disambiguation by generative models
Entity disambiguation can also be done by a generative language model. We engineer
a prompt that asks the model to select the right entity from a list of candidates.

prompt:
Roosevelt drafted the Human Rights Declaration.              (input text)
Which of the following entities is mentioned in the text?    (instruction)
Eleanor Roosevelt; Franklin Roosevelt; Albert Roosevelt.     (candidates)
model answer: Eleanor Roosevelt
[Barba et al: “ExtEnD: Extractive Entity Disambiguation”, ACL 2022]
Features
For disambiguation, we usually first determine the candidates for a given mention. Then we
use the features of the candidates from the KB to rerank them.
• In symbolic approaches, the features are used to score each candidate:
the KB facts of the candidate (e.g., US, diplomat) are compared to the text
“The American diplomat Roosevelt...”.
• In neural approaches, the features are used to compute embeddings:
the embedding of the mention is scored for similarity with the embedding of the candidate entity.
Features
• In approaches by pretrained language models and generative language models,
the features can appear directly in the text next to the candidate:
Which entity is mentioned in the text?
Eleanor Roosevelt (diplomat, US, wife of Franklin Roosevelt); Albert Roosevelt (...
Overview
Disambiguation
Basic approach
Local features
Global features
Label Similarity
The Declaration of Human Rights prohibits slavery and torture, and stipulates freedom of speech,
freedom of religion, freedom in the choice of partners, and freedom of movement.
<UN_HR, label, “Universal Declaration of Human Rights”>
<FR_HR, label, “1789 Declaration of the Rights of Man and of the Citizen”>
One feature to compute the score of a disambiguation candidate entity is the similarity between
the surface form and the label of the entity.
Possible measures for string comparison are:
• string identity: fast, simple, may be too restrictive
• case-insensitive string identity: as string identity, a bit more permissive
• edit distance: parameterizable, but expensive to compute
• Jaccard similarity: often a good trade‐off
Label Similarity: Jaccard Similarity
The Jaccard similarity of two strings s and t is
jaccard(s, t) = |S ∩ T| / |S ∪ T|
where S is the set of tokens of s, and T the set of tokens of t.
The Declaration of Human Rights prohibits slavery and torture, and stipulates freedom of speech,
freedom of religion, freedom in the choice of partners, and freedom of movement.
<UN_HR, label, “Universal Declaration of Human Rights”>   → 4/5 = 80%
<FR_HR, label, “1789 Declaration of the Rights of Man and of the Citizen”>   → 3/9 = 33%
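The Jaccard similarity is straightforward to compute; this sketch reproduces the two scores from the example:

```python
# jaccard(s, t) = |S ∩ T| / |S ∪ T|, with S and T the token sets of s and t
def jaccard(s, t):
    S, T = set(s.split()), set(t.split())
    return len(S & T) / len(S | T)

surface = "Declaration of Human Rights"
un_label = "Universal Declaration of Human Rights"
fr_label = "1789 Declaration of the Rights of Man and of the Citizen"

print(jaccard(surface, un_label))  # 4/5 = 0.8
print(jaccard(surface, fr_label))  # 3/9 ≈ 0.33
```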
Label Similarity: by Neural Network
<UN_HR, label, “Universal Declaration of Human Rights”>
The surface form and the entity label can be compared by a neural network by embedding both,
and scoring the similarity of the embeddings (embedding → similarity → scoring).
The Declaration of Human Rights prohibits slavery and torture, and stipulates freedom of speech,
freedom of religion, freedom in the choice of partners, and freedom of movement.
Label Similarity: by Neural Network
The Declaration of Human Rights
stipulates freedom of speech
<UN_HR, label, “Universal...”>
The embedding can be
• word embeddings
• character embeddings
• n‐gram or BPE embeddings
Different embeddings can be concatenated.
The output of the embedding layer is a low‐dimensional vector.
Label Similarity: by Neural Network
<UN_HR, label, “Universal...”>
The Declaration of Human Rights
stipulates freedom of speech
The similarity can be
• point‐wise multiplication
• point‐wise subtraction
• the scalar product / cosine
• Euclidean distance
... or a combination thereof.
The output of the similarity layer is a vector.
The scoring can be done by a single fully connected layer.
The output of the scoring is a single number.
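A toy, dependency-free sketch of the embed → compare → score pipeline; the word-seeded pseudo-embeddings and the random weights are stand-ins for parameters that would normally be learned:

```python
import random

DIM = 8  # embedding dimension (illustrative)

def embed(text):
    # stand-in embedding: average of word-seeded pseudo-random vectors,
    # so the same word always gets the same vector
    vec = [0.0] * DIM
    words = text.lower().split()
    for w in words:
        rnd = random.Random(w)
        for i in range(DIM):
            vec[i] += rnd.uniform(-1, 1)
    return [v / len(words) for v in vec]

def sim_features(a, b):
    # "similarity" layer: point-wise product and difference, concatenated
    return [x * y for x, y in zip(a, b)] + [x - y for x, y in zip(a, b)]

# "scoring" layer: one fully connected layer with (here random) weights
W = [random.Random(i).uniform(-1, 1) for i in range(2 * DIM)]

def score(mention, label):
    return sum(w * f for w, f in zip(W, sim_features(embed(mention), embed(label))))
```

With learned embeddings and weights, score would be high exactly when the surface form and the label denote the same entity.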
Label Similarity: by Neural Network (Example)
[Chen, Varoquaux, Suchanek: A Lightweight Neural Model for Biomedical Entity Linking, AAAI 2021]
Context Similarity
Roosevelt was the first First Lady to broadcast her own weekly radio program (My Day)
and she pressed the United States to join and support the United Nations.

Anna_Eleanor_Roosevelt —nationality→ USA (labels: “United States”, “America”)
Anna_Eleanor_Roosevelt —worksAt→ UN (labels: “United Nations”, “UN”)
Anna_Eleanor_Roosevelt —creator→ My_Day_(radio) (label: “My Day”)
Anna_Eleanor_Roosevelt —comment→ “First Lady”

Words in the context of the mention may appear as labels of the entities linked to the candidate.
What is the context of the mention?
Roosevelt was the first First Lady to broadcast her own weekly radio program (My Day)
and she pressed the United States to join and support the United Nations.
The context of the mention can be
- a window of ±n tokens around the mention (example: ±1)
- the entire sentence
- a paragraph
- the entire document
What is the context of the candidate?
Anna_Eleanor_Roosevelt —label→ “Eleanor Roosevelt”, “Anna Eleanor Roosevelt”
Anna_Eleanor_Roosevelt —comment→ “First Lady”
Anna_Eleanor_Roosevelt —nationality→ USA (labels: “United States”, “America”)
The context of a candidate entity can be
- only the <label> of the entity
- also the <description>, <name>, <comment>, or any other hand‐selected attributes
- all attributes
- external resources, such as the Wikipedia page of an entity
- or the labels of the surrounding entities
Context Similarity: Weighted Jaccard
The entity context and the mention context can also be compared by the weighted Jaccard
similarity: compute the TF-IDF vector for the context of the entity
and for the context of the mention,
and compute the similarity as their cosine:
〈Roosevelt: 0.9, United Nations: 0.8, radio: 0.001, first: 0.6, ...〉
〈Roosevelt: 0.9, United States: 0.7, female: 0.01, ...〉
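The cosine between two such sparse TF-IDF vectors can be sketched as follows; the weights are the toy values from the example above:

```python
import math

def cosine(u, v):
    # u, v: sparse TF-IDF vectors as dicts term -> weight
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = lambda x: math.sqrt(sum(w * w for w in x.values()))
    return dot / (norm(u) * norm(v))

entity_context  = {"Roosevelt": 0.9, "United Nations": 0.8, "radio": 0.001, "first": 0.6}
mention_context = {"Roosevelt": 0.9, "United States": 0.7, "female": 0.01}
print(round(cosine(entity_context, mention_context), 2))  # the contexts share "Roosevelt"
```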
Context Similarity with LLMs
In entity disambiguation with pretrained or generative language models,
we can concatenate the entity context for each candidate:
Roosevelt was the first First Lady to broadcast her own weekly radio program (My Day)
and she pressed the United States to join and support the United Nations.
Which entity is mentioned?
Eleanor Roosevelt (born in New York, worked at United Nations, is citizen of USA, ...)
...
(the candidate entity label, followed by the labels of the neighbors of the candidate entity)
Context Similarity with the taxonomy
The context of an entity can include the classes of the entity from the taxonomy:
Roosevelt was the first First Lady to broadcast her own weekly radio program (My Day)
and she pressed the United States to join and support the United Nations.
Which entity is mentioned?
Eleanor Roosevelt (politician, activist, First Lady)
...
(the candidate entity label, followed by the labels of the classes to which the candidate belongs)
Prior Probability
Roosevelt began giving speeches and appearing at campaign events in her husband’s place
when he was stricken with a paralytic illness.
Another feature to compute the score of a disambiguation candidate entity is the a‐priori
probability that the surface form refers to that entity.
Wikipedia links with title “Roosevelt”:
• President Theodore Roosevelt High School, Honolulu, Hawaii: 3
• Roosevelt, Arizona, a census-designated place: 90
• André Roosevelt (1879–1962), filmmaker: 2
• Eleanor Roosevelt (1884–1962), First Lady of the United States: 1000
• Theodore Roosevelt (1858–1919), U.S. president: 3000
Eleanor Roosevelt and Theodore Roosevelt are clearly the most likely candidate entities.
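The prior can be estimated from link counts as sketched below; the counts are the toy numbers from the slide, and the exact count-to-entity pairing is an assumption for illustration:

```python
# Number of Wikipedia links with title "Roosevelt" per target entity
# (toy counts; the pairing of counts to entities is illustrative)
LINK_COUNTS = {
    "Theodore_Roosevelt_High_School": 3,
    "Roosevelt,_Arizona": 90,
    "Andre_Roosevelt": 2,
    "Eleanor_Roosevelt": 1000,
    "Theodore_Roosevelt": 3000,
}

def prior(entity):
    # a-priori probability that "Roosevelt" refers to this entity
    return LINK_COUNTS[entity] / sum(LINK_COUNTS.values())
```

With these counts, Eleanor Roosevelt and Theodore Roosevelt together account for over 97% of the probability mass, which is why they are by far the most likely candidates.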
Logical Constraints
Roosevelt married her husband Franklin Delano in 1905.
(supposing that the NERC did not determine the class)
Candidates:
• Eleanor Roosevelt
• Roosevelt University
• USS Roosevelt
• Franklin D. Roosevelt
Another feature to compute the score of a disambiguation candidate entity is logical constraints,
such as type constraints:
married ⇒ person
¬ married(x, x)
These exclude non-person candidates such as Roosevelt University and the USS Roosevelt.
Combining the features
The feature scores for an entity e and a mention m can be combined
• by a simple linear combination (whose weights can be fixed or learned):
score(e, m) = α × labelSim(e, m) + β × contextSim(e, m) + γ × prior(e, m)
• by a lexicographical order:
if labelSim(e, m) = 1, return e; else decide by the next feature
• by a neural network:
scoring networks compute score1, score2, score3 for e and m,
and a single fully connected layer combines them into finalScore
• or in many different other ways
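Two of these combination strategies can be sketched directly; the weights are illustrative, not learned:

```python
# Linear combination with fixed illustrative weights (normally learned)
ALPHA, BETA, GAMMA = 0.5, 0.3, 0.2

def combined_score(label_sim, context_sim, prior):
    return ALPHA * label_sim + BETA * context_sim + GAMMA * prior

# Lexicographical order: an exact label match wins outright,
# otherwise fall back to a score from the remaining features
def lexicographic_score(label_sim, fallback_score):
    return 1.0 if label_sim == 1.0 else fallback_score
```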
Overview
Disambiguation
Basic approach
Local features
Global features
Coherence
The European Court of Human Rights is for the EU, as well as, e.g., for Turkey,
the UK, Russia, and Armenia.
Candidates:
- EU: European Union? Europinidin? Basque language?
- Turkey: Turkey (Country)? Turkey (Bird)?
- UK: United Kingdom? UK rock band? University of Kentucky?
- Russia: Russia (Country)? Russia (Ohio)? Russia (horse)?
- Armenia: Armenia (Country)? Armenia (album)? Armenia (ship)?
Coherence
The European Court of Human Rights is for the EU, as well as, e.g., for Turkey,
the UK, Russia, and Armenia.
A disambiguation is better if the disambiguated entities are closely related:
the United Kingdom, Russia (Country), Armenia (Country), and Turkey (Country)
are all of type Country and located in Europe in the KB, and thus closely
related to the European Union.
Coherence: Measures
Possible measures for the relatedness of two entities are
- the cosine similarity of the embeddings of the entities
- the normalized number of links between them in a KB or in Wikipedia
- the normalized number of co‐occurrences of the entities in a corpus
- the overlap of their textual attributes
Problem: The relatedness of the disambiguated entities is a global measure,
which cannot be computed per disambiguated entity,
but only for the entire set of disambiguated entities.
Disambiguation as a Graph Problem
A disambiguation problem can be modeled as a bipartite graph of mentions and candidate
entities, with weighted similarity edges between candidates and between candidates
and mentions.
Mentions: “EU”, “UK”, “Turkey”
Candidates: <Europinidin>, <European_Union>, <United_Kingdom>, <UK_band>,
<Turkey_(country)>, <Turkey_(bird)>
Edges between mentions and candidates carry the similarity from the local features
(e.g., “EU”–<European_Union>: 0.8, “EU”–<Europinidin>: 0.2);
edges between candidates carry the entity relatedness (e.g., from the KB).
Disambiguation as a Graph Problem
Find the mapping φ from mentions M to entities E that maximizes
Σ_{m ∈ M} sim(m, φ(m)) + α × Σ_{m ≠ m′} sim(φ(m), φ(m′))
where sim(·, ·) is given by the local features and entity relatedness.
(α is a tunable parameter)
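For a toy instance, this objective can be maximized by brute force over all assignments (real systems approximate, since the search space grows exponentially). The local similarities and relatedness values below are illustrative, in the spirit of the example graph:

```python
from itertools import product

LOCAL = {  # sim(mention, candidate) from local features (illustrative)
    "EU": {"Europinidin": 0.2, "European_Union": 0.8},
    "UK": {"United_Kingdom": 0.6, "UK_band": 0.4},
    "Turkey": {"Turkey_(country)": 0.5, "Turkey_(bird)": 0.5},
}
RELATED = {  # symmetric entity relatedness, e.g., from KB links (illustrative)
    frozenset(["European_Union", "United_Kingdom"]): 0.8,
    frozenset(["European_Union", "Turkey_(country)"]): 0.6,
    frozenset(["United_Kingdom", "Turkey_(country)"]): 0.3,
}
ALPHA = 1.0  # trade-off between local fit and global coherence

def objective(assignment):
    local = sum(LOCAL[m][e] for m, e in assignment.items())
    ents = list(assignment.values())
    coherence = sum(RELATED.get(frozenset([a, b]), 0.0)
                    for i, a in enumerate(ents) for b in ents[i + 1:])
    return local + ALPHA * coherence

mentions = list(LOCAL)
best = max((dict(zip(mentions, combo))
            for combo in product(*(LOCAL[m].keys() for m in mentions))),
           key=objective)
```

Note that the local scores alone cannot decide between Turkey (country) and Turkey (bird), which tie at 0.5; the coherence term breaks the tie in favor of the country.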
Disambiguation by a GCN
The global optimum can be approximated by a graph convolutional network (GCN)
on the graph of candidate entities, where the feature per node is the score given
to that candidate by the local features, and the edges carry the entity relatedness.
Example: Disambiguation by AIDA
AIDA is a system for the disambiguation of entity names, based on YAGO.
Summary: Disambiguation
Disambiguation is the task of mapping a mention of an entity in a corpus
to the intended entity in a knowledge base (KB).
1) Find candidate entities in the KB, e.g., by their label
2) Score the candidates, e.g., by label similarity, context similarity, etc.
3) Compute a global optimum for the disambiguation
“Happiness is not a goal.
It is a by‐product of a life well lived.”
― Eleanor Roosevelt
References
Ledell Wu et al: “Scalable Zero-shot EL with Dense Entity Retrieval”
Mohamed Amir Yosef et al: “AIDA: An Online Tool for Disambiguation”
Gerhard Weikum et al: “Machine Knowledge”