Fabian M. Suchanek
Entity Typing
Semantic IE
[Figure: the Semantic IE pipeline, with the stages Source Selection and Preparation, Entity Recognition, Entity Disambiguation, Fact Extraction, and KB construction. You are here: Entity Typing (example: singer Elvis).]
Overview
• Extractive Entity Typing
• Set Expansion
The weaknesses of NERC
Named Entity Recognition and Classification (NERC) classifies named entities into predefined classes.
Traditional systems work with 5-20 classes,
modern systems with up to 10,000 classes.
However, we are inherently limited by the classes that were known at training time.
Bertrand Russell lived in the UK.
<PER> <PER> <LOC>
Def: Entity Typing
(Extractive) Entity Typing
is the task of extracting named entities and their class from documents.
Hypatia of Alexandria was a philosopher,
astronomer, and mathematician.
[Jules Maurice Gaspard]
<Hypatia, type, philosopher>
<Hypatia, type, astronomer>
<Hypatia, type, mathematician>
Different from NERC, the classes are mentioned verbatim in the text.
Wikipedia: Hypatia
Entity Typing: Example
Hypatia being attacked by the mob
[Charles Mitchell]
Hypatia established herself as a leading mathematician in the city of Alexandria/Egypt. Historian Socrates Scholasticus, a Greek Christian, wrote that her attainments in literature and science by far surpassed all the philosophers of the Eastern Roman Empire. As a life-long pagan, Hypatia was brutally murdered by a mob of Christians under the lead of a lector named Peter. Her role as one of the world’s first female academics, murdered with horrific cruelty, has made her a symbol of female empowerment and a “martyr for philosophy”.
Types extracted from this text:
• Hypatia: mathematician, pagan, dead, academic, female, symbol, martyr
• Alexandria: city
• Eastern Roman Empire: empire
• Peter: lector
• Socrates Scholasticus: historian, Greek, Christian
Why Entity Typing is difficult
Encyclopedias mention the class of an entity explicitly:
Hypatia was a philosopher.
However, that is rare in normal text:
Breaking news: Paris is a city.
Rather, classes are expressed in patterns such as:
• composite noun phrases: “the historian Socrates”
• titles: “Prof. Russell”
• descriptions: “established herself as”
• appositions: “Hypatia, a renowned mathematician”
There is one more frequent pattern to express class membership,
which even appears in the text above. Do you see it?
Def: Hearst Patterns
A Hearst pattern is a simple textual pattern that indicates implicitly that an entity belongs to a class.
"Y such as X"
...women such as Hypatia...
type(Hypatia, woman)
Marti Hearst: Automatic Acquisition of Hyponyms, COLING 1992
Y such as X+
such Y as X+
X+ and other Y
Y including X+
Y, especially X+
...where X+ is a list of entity names of the form “X, ..., X (and|or)? X”.
(In the original paper, the X are noun phrases.)
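The five patterns above can be sketched as regular expressions. This is a minimal illustration, not Hearst's original implementation: it assumes that an entity name is a run of capitalized words, that the class Y is a single word, and that the class is reported as found in the text (so "women" rather than "woman"). The names `NAME`, `NAME_LIST`, `CLASS`, `HEARST_PATTERNS`, and `extract_types` are made up for this sketch.

```python
import re

# Assumption: an entity name is a run of capitalized words (no NERC involved).
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
# X+ is a list of the form "X, ..., X (and|or)? X".
NAME_LIST = rf"{NAME}(?:, {NAME})*(?:,? (?:and|or) {NAME})?"
# Assumption: the class Y is a single word; it is not singularized.
CLASS = r"\b[A-Za-z][a-z]+\b"

HEARST_PATTERNS = [re.compile(p) for p in (
    rf"(?P<cls>{CLASS}) such as (?P<names>{NAME_LIST})",      # Y such as X+
    rf"\bsuch (?P<cls>{CLASS}) as (?P<names>{NAME_LIST})",    # such Y as X+
    rf"(?P<names>{NAME_LIST}),? and other (?P<cls>{CLASS})",  # X+ and other Y
    rf"(?P<cls>{CLASS}),? including (?P<names>{NAME_LIST})",  # Y including X+
    rf"(?P<cls>{CLASS}), especially (?P<names>{NAME_LIST})",  # Y, especially X+
)]


def extract_types(text):
    """Return a set of (entity, class) pairs found by the Hearst patterns."""
    facts = set()
    for pattern in HEARST_PATTERNS:
        for match in pattern.finditer(text):
            cls = match.group("cls").lower()
            # Split the matched list X+ into its individual names.
            for name in re.split(r",? (?:and|or) |, ", match.group("names")):
                facts.add((name, cls))
    return facts


print(extract_types("...women such as Hypatia..."))
print(extract_types("Cities such as Cairo and Alexandria..."))
```

Because the sketch has no linguistic analysis, it splits every "and" inside a list, so a name such as "Procter and Gamble" would be treated as two entities.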
Hearst Patterns: Examples
Hearst Patterns mix the “type” and “subclassOf” relationships!
Misled Hearst Patterns
Hearst patterns may get misled by sentences such as
People who were not Jews, such as Hypatia, ...
Cities such as Hypatia’s birthplace...
One of the best-known companies, Procter and Gamble...
Remedies are:
• Using NERC, POS-tagging, and/or dependency parsing
• Measuring how often a candidate entity appears, or how often it appears with different patterns
• Using doubly-anchored patterns, where at least one known instance appears:
Cities such as Cairo and Alexandria...
(If one of the two names is known to be a city, accept the other one as a city.)
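A doubly-anchored pattern can be sketched as follows. This is only an illustration under simplifying assumptions: the pattern is fixed to "Y such as A and B", entity names are runs of capitalized words, and the function name `doubly_anchored` and its parameters are hypothetical.

```python
import re

# Assumption: an entity name is a run of capitalized words.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"


def doubly_anchored(text, class_plural, known):
    """Accept a new instance only if it co-occurs with a known instance."""
    # The class word is matched case-insensitively, the names are not.
    pattern = re.compile(
        rf"\b(?i:{re.escape(class_plural)}) such as ({NAME}) and ({NAME})"
    )
    accepted = set()
    for first, second in pattern.findall(text):
        if first in known and second not in known:
            accepted.add(second)
        elif second in known and first not in known:
            accepted.add(first)
    return accepted


# Cairo is known to be a city, so Alexandria is accepted as a city.
print(doubly_anchored("Cities such as Cairo and Alexandria...", "cities", {"Cairo"}))
# No known instance appears, so nothing is accepted.
print(doubly_anchored("Cities such as Hypatia’s birthplace...", "cities", {"Cairo"}))
```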
Def: Set Expansion
Set Expansion is the task of, given names of instances of a class (training data, “seeds”), extracting more such instance names from a corpus.
Seeds: mathematicians: {Hypatia, Russell}
After set expansion: mathematicians: {Hypatia, Russell, Fermat, Poincaré, ...}
Def: Learning entity type patterns
If we have a training set of entities with their classes, we can try to learn patterns that express class membership:
Training data:
<Paris, type, city>
<Berlin, type, city>
Corpus:
The mayor of Paris decided to support bicycles.
Berlin has a population of 3m.
Learned patterns:
• “the mayor of X” => X is a city
• “X has a population of” => X is a city
• ...
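One simple way to learn such patterns is to take the words around each occurrence of a training entity. The sketch below is an assumption-laden illustration, not the method of any particular system: it tokenizes on whitespace, uses a context window of up to 4 words on each side, and keeps every context as a candidate pattern with a count. The names `learn_patterns` and `apply_patterns` are made up here.

```python
import re
from collections import Counter

# Assumption: an entity name is a run of capitalized words.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"


def learn_patterns(facts, corpus, window=4):
    """facts maps entity -> class. Returns a Counter of (pattern, class)."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.rstrip(".").split()
        for i, token in enumerate(tokens):
            if token in facts:
                left = " ".join(tokens[max(0, i - window):i]).lower()
                right = " ".join(tokens[i + 1:i + 1 + window]).lower()
                if left:
                    counts[(left + " X", facts[token])] += 1
                if right:
                    counts[("X " + right, facts[token])] += 1
    return counts


def apply_patterns(patterns, sentence):
    """Return (entity, class) pairs that the learned patterns find."""
    found = set()
    for pattern, cls in patterns:
        if pattern.endswith(" X"):
            regex = rf"\b(?i:{re.escape(pattern[:-2])}) ({NAME})"
        else:
            regex = rf"({NAME}) (?i:{re.escape(pattern[2:])})"
        for match in re.finditer(regex, sentence):
            found.add((match.group(1), cls))
    return found


corpus = [
    "The mayor of Paris decided to support bicycles.",
    "Berlin has a population of 3m.",
]
patterns = learn_patterns({"Paris": "city", "Berlin": "city"}, corpus)
print(apply_patterns(patterns, "The mayor of Lyon opened a new park."))
```

The sketch also learns overly specific candidates such as “X decided to support bicycles”. A real system would rank or filter the candidates, for example by how many different training entities support each pattern.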
Example: Hearst in the NELL project
NELL: “Robin”
Def: Recursive Pattern Application
Recursive Pattern Application is the following algorithm for set expansion:
0. Start with the seeds
cities: {Austin, Seattle}
1. Find the pattern “X, Y, and Z” in the corpus.
Seattle, Chicago, and Austin
2. If 2 variables match known instance names, add the match of the 3rd.
cities: {Austin, Seattle, Chicago}
3. Go to 1
Zornitsa Kozareva and Eduard Hovy: Learning Arguments and Supertypes of Semantic Relations, ACL 2010
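The steps above can be sketched in a few lines. This is a minimal illustration of the loop as described here, not the implementation of Kozareva and Hovy: it assumes that entity names are runs of capitalized words, and the names `TRIPLE`, `recursive_pattern_application`, and `max_rounds` are made up for the sketch.

```python
import re

# Assumption: an entity name is a run of capitalized words.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
# The pattern "X, Y, and Z".
TRIPLE = re.compile(rf"({NAME}), ({NAME}),? and ({NAME})")


def recursive_pattern_application(seeds, corpus, max_rounds=10):
    """Expand the seed set until no new instance name is found."""
    known = set(seeds)                        # 0. start with the seeds
    for _ in range(max_rounds):
        added = set()
        for sentence in corpus:
            for match in TRIPLE.finditer(sentence):   # 1. find the pattern
                names = set(match.groups())
                unknown = names - known
                # 2. if 2 variables match known names, add the 3rd
                if len(names) == 3 and len(unknown) == 1:
                    added |= unknown
        if not added:
            break
        known |= added                        # 3. go to 1
    return known


print(recursive_pattern_application(
    {"Austin", "Seattle"}, ["Seattle, Chicago, and Austin"]))
```

The `max_rounds` bound is only a safety net. The loop ends on its own as soon as a full pass over the corpus adds nothing new.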
Task: Recursive Pattern Application
cities: {Springfield, Austin, Seattle}
... Austin, Seattle, and Houston...
cities: {Springfield, Austin, Seattle, Houston}
... Houston, Chicago, and Springfield...
cities: {Springfield, Austin, Seattle, Houston, Chicago}
... Austin, Texas, and Seattle, Washington...
Precision may suffer over time
Def: Semantic Drift
Semantic Drift is the problem in Set Expansion that names of instances of other classes get into the set.
cities: {Chicago, Seattle, ..., Texas}
Def: Table Set Expansion
Table Set Expansion is the following algorithm for set expansion:
0. Start with the seeds
countries: {Russia, China}
1. Find HTML tables or lists where one column contains 2 known instance names
2. Add all column entries to the set
countries: {Russia, China, Canada, United States}
3. Go to 1
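A sketch of this loop, under the assumption that the HTML tables have already been parsed into rectangular lists of rows with the header rows removed (the parsing itself is left out). The function name `table_set_expansion`, the parameters `min_overlap` and `max_rounds`, and the sample table are made up for this illustration.

```python
def table_set_expansion(seeds, tables, min_overlap=2, max_rounds=10):
    """Expand the seed set with the entries of matching table columns."""
    known = set(seeds)                           # 0. start with the seeds
    for _ in range(max_rounds):
        added = set()
        for table in tables:
            for column in zip(*table):           # transpose rows to columns
                entries = set(column)
                # 1. the column must contain 2 known instance names
                if len(entries & known) >= min_overlap:
                    added |= entries - known     # 2. add all column entries
        if not added:
            break
        known |= added                           # 3. go to 1
    return known


# Hypothetical table: one row per country, with its capital.
table = [
    ("Russia", "Moscow"),
    ("China", "Beijing"),
    ("Canada", "Ottawa"),
    ("United States", "Washington"),
]
print(table_set_expansion({"Russia", "China"}, [table]))
```

Only the first column overlaps with the seeds, so the capitals in the second column are not added.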
Table Set Expansion: Example
https://fr.wikipedia.org/wiki/Prix_Nobel_de_physique
https://www.nobelprize.org/nobel_prizes/physics/laureates/
Summary: Entity Typing
Entity typing finds entities with their class in corpora. We saw 2 methods:
1. Hearst Patterns
female academics such as Hypatia
2. Set Expansion
mathematicians: {Hypatia, Russell}
“Reserve your right to think. For even to think wrongly is better than not thinking at all.”
— Hypatia of Alexandria
References
Gerhard Weikum, Luna Dong, Simon Razniewski, Fabian M. Suchanek: Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
Marti Hearst: Automatic Acquisition of Hyponyms, COLING 1992