RDF Ontologies
Singer
plays
type
born
2
Person
An RDF ontology can be seen as a graph of entities
subclassOf
1935
RDF Ontologies
Singer
plays
type
born
3
Person
An RDF ontology can be seen as a graph of entities
subclassOf
classes
relations / properties
instances
literals
1935
RDF Ontologies
instances
relations / properties
classes
literals
Singer
plays
type
born
4
Person
An RDF ontology can be seen as a graph of entities
subclassOf
intelligent search, QA, machine translation...
Ontologies serve all kinds of purposes:
1935
Existing Ontologies
There are literally hundreds of ontologies on the Web
Singer
type
born
plays
5
Person
subclassOf
1935
]>
Overlapping Data
birthDate
1935
type
RockSinger
marriedTo
YAGO
ElvisPedia
born
Singer
type
6
Many ontologies contain similar or overlapping entities and facts.
plays
1935
Problem: Elvis is lonely
ElvisPedia
birthDate
type
Singer
marriedTo
type
YAGO
born
1935
RockSinger
7
plays
Who is the spouse
of the guitar player?
?
1935
marriedTo
Problem: Elvis is lonely
ElvisPedia
birthDate
type
Singer
marriedTo
type
YAGO
born
1935
RockSinger
8
plays
Who is the spouse
of the guitar player?
?
1935
marriedTo
Complementary
information
cannot be used
Solution: Unify Entities
ElvisPedia
birthDate
type
Singer
marriedTo
type
YAGO
born
1935
RockSinger
9
plays
1935
marriedTo
identical
Goal: Merging Ontologies
ElvisPedia
birthDate
type
Singer
marriedTo
type
YAGO
born
1935
RockSinger
10
plays
To merge two ontologies, we have to identify
1935
Goal: Merging Ontologies
ElvisPedia
birthDate
type
Singer
marriedTo
type
YAGO
born
1935
RockSinger
11
To merge two ontologies, we have to identify
• equivalent instances
plays
sameAs
1935
Goal: Merging Ontologies
subClassOf
ElvisPedia
birthDate
type
Singer
marriedTo
type
YAGO
born
1935
RockSinger
12
To merge two ontologies, we have to identify
• equivalent instances
• equivalent or subsuming classes
sameAs
plays
1935
Goal: Merging Ontologies
sameAs
subClassOf
ElvisPedia
birthDate
type
Singer
marriedTo
type
YAGO
1935
born
1935
RockSinger
13
To merge two ontologies, we have to identify
• equivalent instances
• equivalent or subsuming classes
• equivalent or subsuming relations
subPropertyOf
plays
Previous work
Previous work
• uses hard logical constraints, which may be inadequate
• requires parameter tuning
• has not been tried on large ontologies
birthDate
marriedTo
1935
RockSinger
type
14
type
Singer
born
1935
[Gracia 2009, Jean-Mary 2009, Isaac 2007,
Aumueller 2005, Wang 2008, Noessner 2010,
Sais 2007/2009, Arasu 2009, Volz 2009,
Bhattacharya 2007, Hogan 2007/2010,
Hu 2011, Li 2009, Udrea 2007, and more]
Previous work
Previous work
• uses hard logical constraints, which may be inadequate
• requires parameter tuning
• has not been tried on large ontologies
birthDate
marriedTo
1935
RockSinger
type
15
type
Singer
born
(1)
• has mostly focused on
(1) instance matching or
(2) schema alignment,
... but not both
(2)
(1)
(2)
1935
[Gracia 2009, Jean-Mary 2009, Isaac 2007,
Aumueller 2005, Wang 2008, Noessner 2010,
Sais 2007/2009, Arasu 2009, Volz 2009,
Bhattacharya 2007, Hogan 2007/2010,
Hu 2011, Li 2009, Udrea 2007, and more]
PARIS: Aligning Everything At Once
born
There is a synergy between equality of instances, properties and classes!
=> Compute all together!
birthDate
marriedTo
1935
type
RockSinger
type
Singer
16
PARIS
1935
Probabilistic Model
We chose a probabilistic model
marriedTo
1935
type
RockSinger
born
type
birthDate
Singer
17
the probability that
is a subclass of
the probability that
=
the probability that
is a sub-property of
1935
Probabilistic Model
18
...
All possible
alignments
between the
ontologies
...
Worlds:
Probabilistic Model
19
...
All possible
alignments
between the
ontologies
...
Probabilities:
...
Worlds:
Probabilistic Model
20
Events:
We care here mainly about equality and subsumption events.
Each event can be true or false in a particular world.
...
...
Probabilities:
...
Worlds:
Probabilistic Model
21
Marginals:
The marginal probability of an event
is given by the sum of the probabilities
of the worlds where the event holds.
...
...
Probabilities:
...
Worlds:
23
We do not
care about
these values.
We only care
about these
marginals!
(it is the product measure)
We are interested in marginals that fulfill certain properties.
For any set of marginals, there exists a probability distribution.
Probabilistic Model
Marginals:
...
...
Probabilities:
...
Worlds:
Literals
birthDate
RockSinger
type
1935
Singer
type
born
marriedTo
25
?
1936
The probability that two literals are equal
shall reflect the likelyhood that the two literals
are intended to refer to the same thing.
Literals
birthDate
RockSinger
type
1935
Singer
type
born
marriedTo
26
?
1936
The probability that two literals are equal
shall reflect the likelyhood that the two literals
are intended to refer to the same thing.
• for strings:
string distance
• for numbers:
numeric distance
• for other literals:
domain-specific
Equality of Instances
For instances, relations give a hint:
Elvis Presley
label
label
born
1935
born
1935
Elvis Presley
marriedTo
?
28
Equality of Instances
For instances, relations give a hint:
Elvis Presley
label
label
born
1935
born
1935
Elvis Presley
marriedTo
?
29
Not many people are
called Elvis =>
highly indicative
Equality of Instances
For instances, relations give a hint:
Not many people are
called Elvis =>
highly indicative
Elvis Presley
label
label
born
1935
born
1935
Elvis Presley
marriedTo
?
30
Many people are born
in 1935 =>
less indicative
Not many people are
called Elvis =>
highly indicative
Many people are born
in 1935 =>
less indicative
Elvis Presley
label
label
born
1935
born
1935
Elvis Presley
marriedTo
?
31
The local inverse functionality of a relation with an argument is
one over the number of incoming links:
Local Inverse Functionality
Elvis Presley
1935
marriedTo
label
1935
born
label
born
Elvis Presley
Only one person
is called Elvis
?
32
The local inverse functionality of a relation with an argument is
one over the number of incoming links:
Local Inverse Functionality
Elvis Presley
1935
marriedTo
label
1935
born
label
born
Elvis Presley
Only one person
is called Elvis
?
33
10 people are born
in 1935
The local inverse functionality of a relation with an argument is
one over the number of incoming links:
Local Inverse Functionality
Probability of Inverse Functionality
Elvis Presley
Elvis Presley
marriedTo
label
born
1935
1935
label
born
?
34
Many people share
their year of birth
Few people share
the same name
We define the probability of a relation being inverse functional
as the harmonic mean of the local inverse functionalities:
Equality of Instances
marriedTo
Elvis Presley
Elvis Presley
label
label
born
born
1935
1935
?
35
and
shall be matched if
Equality of Instances
marriedTo
Elvis Presley
Elvis Presley
label
label
born
born
1935
1935
?
36
Equality of Instances
marriedTo
Elvis Presley
Elvis Presley
label
label
born
born
1935
1935
?
37
Equality of Instances
marriedTo
Elvis Presley
Elvis Presley
label
label
born
born
1935
1935
?
38
This evaluates to 1 iff
• There is at least one
highly inverse functional r
• There is one shared
argument
Equality of Instances
• Literals: precomputed
• Others: recursive
label
marriedTo
1935
1935
born
born
label
Elvis Presley
Elvis Presley
?
39
Equality of Classes
Singer
type
RockSinger
If all instances of one class are instances of the other
then the former subsumes the latter
40
type
type
41
subclassOf
type
Singer
RockSinger
Equality of Classes
If all instances of one class are instances of the other
then the former subsumes the latter
type
subclassOf
42
type
Singer
RockSinger
Equality of Classes
If all instances of one class are instances of the other
then the former subsumes the latter
type
subclassOf
43
type
Singer
RockSinger
Equality of Classes
If all instances of one class are instances of the other
then the former subsumes the latter
type
subclassOf
44
type
Singer
RockSinger
Equality of Classes
If all instances of one class are instances of the other
then the former subsumes the latter
Equality of Relations
knows
marriedTo
knows
If every pair of one relation is a pair of another relation,
then the first is a sub-property of the second.
45
Equality of Relations
knows
marriedTo
knows
If every pair of one relation is a pair of another relation,
then the first is a sub-property of the second.
46
marriedTo
subpropertyOf
knows
Equality of Relations
knows
marriedTo
knows
If every pair of one relation is a pair of another relation,
then the first is a sub-property of the second.
47
knows
subpropertyOf
marriedTo
Can be solved analogously to the
subsumption of classes.
Algorithm
1. Fix equalities for literals
48
Literals:
Instances:
Relations:
1. Fix equalities for literals
2. Set equalities for relations to a small initial value
3. Iterate the estimations for relations and instances to convergence (*)
(*) We have no proof for convergence, but it seems to happen
50
Iterate
Algorithm
Literals:
Algorithm
Iterate
1. Fix equalities for literals
2. Set equalities for relations to a small initial value
3. Iterate the estimations for relations and instances to convergence (*)
4. Compute the estimations for classes
(*) We have no proof for convergence, but it seems to happen
51
final
step
Classes:
Instances:
Relations:
Literals:
Experiment: YAGO and DBpedia
52
Singer
SuperHuman
1977
1935
1935
died
born
gender
male
born with
stillborn
twin brother
DBpedia
YAGO
1.7 million instances
19 million facts
2.6 million instances
18 million facts
Experiment: YAGO and DBpedia
53
Singer
SuperHuman
1977
1935
1935
died
born
gender
male
born with
stillborn
twin brother
DBpedia
YAGO
Independent
class hierarchies
2.6 million instances
18 million facts
1.7 million instances
19 million facts
Experiment: YAGO and DBpedia
54
Singer
SuperHuman
1977
1935
1935
died
born
gender
male
Independent
class hierarchies
born with
stillborn
twin brother
DBpedia
YAGO
50% overlap
of instances
2.6 million instances
18 million facts
1.7 million instances
19 million facts
Experiment: YAGO and DBpedia
55
Singer
SuperHuman
1977
1935
1935
died
born
gender
male
Independent
class hierarchies
50% overlap
of instances
born with
stillborn
twin brother
DBpedia
YAGO
Independent
properties
2.6 million instances
18 million facts
1.7 million instances
19 million facts
Experiment: YAGO and DBpedia
56
Singer
SuperHuman
1977
1935
1935
died
born
gender
male
Independent
class hierarchies
50% overlap
of instances
Independent
properties
born with
stillborn
twin brother
DBpedia
YAGO
Many
complementary
facts
2.6 million instances
18 million facts
1.7 million instances
19 million facts
Experiment: YAGO and DBpedia
57
Iteration
1
2
3
4
Change
—
12%
1%
0.3%
Time
4:04h
5:06h
5:00h
5:30h
Precision
86%
89%
90%
90%
Recall
69%
73%
73%
73%
Matching the instances:
F-Measure
increases
Experiment: YAGO and DBpedia
58
Iteration
1
2
3
4
Change
—
12%
1%
0.3%
Time
4:04h
5:06h
5:00h
5:30h
Precision
86%
89%
90%
90%
Recall
69%
73%
73%
73%
Matching the instances:
Precision: 74%
Aligns roughly half of DBpedia's classes
Matching the classes:
Experiment: YAGO and DBpedia
59
yago:actedIn = dbpedia:starring-inv
yago:hasChild = dbpedia:child
yago:hasChild = dbpedia:parent-inv
yago:created > dbpedia:writer
yago:diedIn > dbpedia:placeOfBurial
... and many more. Precision: 96%
Matching the relations:
Experiment: YAGO and DBpedia
60
yago:actedIn = dbpedia:starring-
yago:hasChild = dbpedia:child
yago:hasChild = dbpedia:parent-
yago:created > dbpedia:writer
yago:diedIn > dbpedia:placeOfBurial
... and many more. Precision: 96%
Matching the relations:
+ experiments on IMDB
+ experiments on OAEI standard datasets
Experiment: YAGO and DBpedia
61
yago:actedIn = dbpedia:starring-
yago:hasChild = dbpedia:child
yago:hasChild = dbpedia:parent-
yago:created > dbpedia:writer
yago:diedIn > dbpedia:placeOfBurial
... and many more. Precision: 96%
Matching the relations:
All experiments run with the same settings.
No parameter tuning.
+ experiments on IMDB
+ experiments on OAEI standard datasets
Conclusion
62
• PARIS matches relations, instances and schema holistically
• PARIS has no parameters to tune
• PARIS shows high precision and recall
marriedTo
Conclusion
63
• PARIS matches relations, instances and schema holistically
• PARIS has no parameters to tune
• PARIS shows high precision and recall
• PARIS allows the YAGO Elvis to marry the DBpedia Priscilla
marriedTo
equals
Happy End
Slides done with PowerLine, the free graphical SVG slide editor with Latex support
64
Backup Slides
with more details
Equality of Instances
70
and
shall be matched if
Let's compute the probability of this happening,
(under independence assumptions)
Unequality of Instances
1970
1935
born
born
marriedTo
How do we guard against inequalities?
71
?
Elvis Presley
label
Elvis Presley
label
Unequality of Instances
1970
1935
born
born
marriedTo
How do we guard against inequalities?
72
?
Elvis Presley
label
Elvis Presley
label
These cannot be equal,
because they have
a different value
for a functional relation.
Unequality of Instances
1970
1935
born
born
marriedTo
How do we guard against inequalities?
73
?
Elvis Presley
label
Elvis Presley
label
These cannot be equal,
because they have
a different value
for a functional relation.
The functionality is defined analogously to the inverse functionality.
1970
1935
born
born
marriedTo
74
?
Unequality of Instances
Every value in one ontology should have a pendant in the other,
weighted by the functionality:
This factor makes sure that
for every
, there is one
equivalent
Unequality of Instances
1970
1935
born
born
marriedTo
Every value in one ontology should have a pendant in the other,
weighted by the functionality:
75
This factor makes sure that
for every
, there is one
equivalent
sang
Lavender Blue
Twinkle Twinkle
sang
All shook up
Jailhouse Rock
Equality of Relations
knows
marriedTo
knows
marriedTo
If every pair of one relation is a pair of another relation,
then the first is a sub-property of the second.
76
(Those in the intersection)
(Those that have a pendant)
knows
subpropertyOf
marriedTo
Example
77
Elvis
dreamsOf
label
Elvis
name
Elvis
Ontology 1
Ontology 2
Ontology 2
Example
78
Elvis
dreamsOf
label
Elvis
name
Elvis
Ontology 1
Ontology 2
Ontology 2
Madonna
label
Madonna
name
Ontology 1
Ontology 2
Example
79
Elvis
dreamsOf
label
Elvis
name
Elvis
label
Madonna
name
Madonna
Ontology 1
Ontology 2
Ontology 2
Ontology 1
Ontology 2
Ciccone
1958
Frozen
Example
80
Elvis
dreamsOf
label
Elvis
name
Elvis
Ciccone
1958
Frozen
label
Madonna
name
Madonna
Ontology 1
Ontology 2
Ontology 2
Ontology 1
Ontology 2
Example
81
Elvis
dreamsOf
label
Elvis
name
Elvis
Ciccone
1958
Frozen
label
Madonna
name
Madonna
Ontology 1
Ontology 2
Ontology 2
Ontology 1
Ontology 2
Example
82
Elvis
dreamsOf
label
Elvis
name
Elvis
Ciccone
1958
Frozen
label
Madonna
name
Madonna
Ontology 1
Ontology 2
Ontology 2
Ontology 1
Ontology 2
Example
83
Elvis
dreamsOf
label
Elvis
name
Elvis
Ciccone
1958
Frozen
label
Madonna
name
Madonna
Ontology 1
Ontology 2
Ontology 2
Ontology 1
Ontology 2