PARIS: Probabilistic Alignment of Relations, Instances, and Schema Fabian M. Suchanek (Max Planck Institute for Informatics) Serge Abiteboul (INRIA Saclay, Webdam team) Pierre Senellart (Télécom ParisTech) 1 CC-BY   Fabian M. Suchanek . This talk was originally given at VLDB 2012. To obtain a PDF version of these slides, open them in Chrome, and print to PDF.
RDF Ontologies Singer plays type born 2 Person An RDF ontology can be seen as a graph of entities subclassOf 1935
RDF Ontologies Singer plays type born 3 Person An RDF ontology can be seen as a graph of entities subclassOf classes relations / properties instances literals 1935
RDF Ontologies instances relations / properties classes literals Singer plays type born 4 Person An RDF ontology can be seen as a graph of entities subclassOf intelligent search, QA, machine translation... Ontologies serve all kinds of purposes: 1935
Existing Ontologies There are literally hundreds of ontologies on the Web Singer type born plays 5 Person subclassOf 1935 ]>
Overlapping Data birthDate 1935 type RockSinger marriedTo YAGO ElvisPedia born Singer type 6 Many ontologies contain similar or overlapping entities and facts. plays 1935
Problem: Elvis is lonely ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 7 plays Who is the spouse of the guitar player? ? 1935 marriedTo
Problem: Elvis is lonely ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 8 plays Who is the spouse of the guitar player? ? 1935 marriedTo Complementary  information cannot be used
Solution: Unify Entities ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 9 plays 1935 marriedTo identical
Goal: Merging Ontologies ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 10 plays To merge two ontologies, we have to identify 1935
Goal: Merging Ontologies ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 11 To merge two ontologies, we have to identify • equivalent instances plays sameAs 1935
Goal: Merging Ontologies subClassOf ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 12 To merge two ontologies, we have to identify • equivalent instances • equivalent or subsuming classes sameAs plays 1935
Goal: Merging Ontologies sameAs subClassOf ElvisPedia birthDate type Singer marriedTo type YAGO 1935 born 1935 RockSinger 13 To merge two ontologies, we have to identify • equivalent instances • equivalent or subsuming classes • equivalent or subsuming relations subPropertyOf plays
Previous work Previous work • uses hard logical constraints, which may be inadequate • requires parameter tuning • has not been tried on large ontologies birthDate marriedTo 1935 RockSinger type 14 type Singer born 1935 [Gracia 2009, Jean-Mary 2009, Isaac 2007, Aumueller 2005, Wang 2008, Noessner 2010,  Sais 2007/2009, Arasu 2009, Volz 2009,   Bhattacharya 2007, Hogan 2007/2010, Hu 2011, Li 2009, Udrea 2007, and more]
Previous work Previous work • uses hard logical constraints, which may be inadequate • requires parameter tuning • has not been tried on large ontologies birthDate marriedTo 1935 RockSinger type 15 type Singer born (1) • has mostly focused on     (1) instance matching or     (2) schema alignment,    ... but not both (2) (1) (2) 1935 [Gracia 2009, Jean-Mary 2009, Isaac 2007, Aumueller 2005, Wang 2008, Noessner 2010,  Sais 2007/2009, Arasu 2009, Volz 2009,   Bhattacharya 2007, Hogan 2007/2010, Hu 2011, Li 2009, Udrea 2007, and more]
PARIS: Aligning Everything At Once born There is a synergy between equality of instances, properties and classes! => Compute all together! birthDate marriedTo 1935 type RockSinger type Singer 16 PARIS 1935
Probabilistic Model We chose a probabilistic model marriedTo 1935 type RockSinger born type birthDate Singer 17  the probability that   is a subclass of    the probability that   =    the probability that  is a sub-property of  1935
Probabilistic Model 18 ... All possible alignments between the ontologies ... Worlds:
Probabilistic Model 19 ... All possible alignments between the ontologies ... Probabilities: ... Worlds:
Probabilistic Model 20 Events: We care here mainly about equality and subsumption events. Each event can be true or false  in a particular world. ... ... Probabilities: ... Worlds:
Probabilistic Model 21 Marginals: The marginal probability of an event is given by the sum of the probabilities of the worlds where the event holds. ... ... Probabilities: ... Worlds:
23 We do not care about these values. We only care about these marginals! (it is the product measure) We are interested in marginals that fulfill certain properties. For any set of marginals, there exists a probability distribution. Probabilistic Model Marginals: ... ... Probabilities: ... Worlds:
Literals birthDate RockSinger type 1935 Singer type born marriedTo 25 ? 1936 The probability that two literals are equal shall reflect the likelyhood that the two literals are intended to refer to the same thing.
Literals birthDate RockSinger type 1935 Singer type born marriedTo 26 ? 1936 The probability that two literals are equal shall reflect the likelyhood that the two literals are intended to refer to the same thing. • for strings:       string distance • for numbers:       numeric distance • for other literals:       domain-specific
Equality of Instances For instances, relations give a hint: Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 28
Equality of Instances For instances, relations give a hint: Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 29 Not many people are called Elvis => highly indicative
Equality of Instances For instances, relations give a hint: Not many people are called Elvis => highly indicative Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 30 Many people are born in 1935 => less indicative
Not many people are called Elvis => highly indicative Many people are born in 1935 => less indicative Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 31 The local inverse functionality of a relation with an argument is one over the number of incoming links: Local Inverse Functionality
Elvis Presley 1935 marriedTo label 1935 born label born Elvis Presley Only one person is called Elvis ? 32 The local inverse functionality of a relation with an argument is one over the number of incoming links: Local Inverse Functionality
Elvis Presley 1935 marriedTo label 1935 born label born Elvis Presley Only one person is called Elvis ? 33 10 people are born in 1935 The local inverse functionality of a relation with an argument is one over the number of incoming links: Local Inverse Functionality
Probability of Inverse Functionality Elvis Presley Elvis Presley marriedTo label born 1935 1935 label born ? 34 Many people share their year of birth Few people share the same name We define the probability of a relation being inverse functional as the harmonic mean of the local inverse functionalities:
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 35  and   shall be matched if
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 36
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 37
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 38 This evaluates to 1 iff • There is at least one     highly inverse functional r • There is one shared     argument 
Equality of Instances • Literals: precomputed • Others: recursive label marriedTo 1935 1935 born born label Elvis Presley Elvis Presley ? 39
Equality of Classes Singer type RockSinger If all instances of one class are instances of the other then the former subsumes the latter 40 type
type 41 subclassOf type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
type subclassOf 42 type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
type subclassOf 43 type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
type subclassOf 44 type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
Equality of Relations knows marriedTo knows If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. 45
Equality of Relations knows marriedTo knows If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. 46 marriedTo subpropertyOf knows
Equality of Relations knows marriedTo knows If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. 47 knows subpropertyOf marriedTo Can be solved analogously to the subsumption of classes.
Algorithm 1. Fix equalities for literals 48 Literals:
Instances: Relations: 1. Fix equalities for literals 2. Set equalities for relations to a small initial value 3. Iterate the estimations for relations and instances to convergence (*) (*) We have no proof for convergence, but it seems to happen 50 Iterate Algorithm Literals:
Algorithm Iterate 1. Fix equalities for literals 2. Set equalities for relations to a small initial value 3. Iterate the estimations for relations and instances to convergence (*) 4. Compute the estimations for classes (*) We have no proof for convergence, but it seems to happen 51 final step Classes: Instances: Relations: Literals:
Experiment: YAGO and DBpedia 52 Singer SuperHuman 1977 1935 1935 died born gender male born with stillborn twin brother DBpedia YAGO 1.7 million instances 19 million facts 2.6 million instances 18 million facts
Experiment: YAGO and DBpedia 53 Singer SuperHuman 1977 1935 1935 died born gender male born with stillborn twin brother DBpedia YAGO Independent class hierarchies 2.6 million instances 18 million facts 1.7 million instances 19 million facts
Experiment: YAGO and DBpedia 54 Singer SuperHuman 1977 1935 1935 died born gender male Independent class hierarchies born with stillborn twin brother DBpedia YAGO 50% overlap  of instances 2.6 million instances 18 million facts 1.7 million instances 19 million facts
Experiment: YAGO and DBpedia 55 Singer SuperHuman 1977 1935 1935 died born gender male Independent class hierarchies 50% overlap  of instances born with stillborn twin brother DBpedia YAGO Independent properties 2.6 million instances 18 million facts 1.7 million instances 19 million facts
Experiment: YAGO and DBpedia 56 Singer SuperHuman 1977 1935 1935 died born gender male Independent class hierarchies 50% overlap  of instances Independent properties born with stillborn twin brother DBpedia YAGO Many  complementary facts 2.6 million instances 18 million facts 1.7 million instances 19 million facts
Experiment: YAGO and DBpedia 57 Iteration      1      2      3      4 Change 12% 1% 0.3% Time 4:04h 5:06h 5:00h 5:30h Precision 86% 89% 90% 90% Recall 69% 73% 73% 73% Matching the instances: F-Measure increases
Experiment: YAGO and DBpedia 58 Iteration      1      2      3      4 Change 12% 1% 0.3% Time 4:04h 5:06h 5:00h 5:30h Precision 86% 89% 90% 90% Recall 69% 73% 73% 73% Matching the instances: Precision: 74% Aligns roughly half of DBpedia's classes Matching the classes:
Experiment: YAGO and DBpedia 59 yago:actedIn = dbpedia:starring-inv yago:hasChild = dbpedia:child yago:hasChild = dbpedia:parent-inv yago:created > dbpedia:writer yago:diedIn > dbpedia:placeOfBurial ... and many more. Precision: 96% Matching the relations:
Experiment: YAGO and DBpedia 60 yago:actedIn = dbpedia:starring- yago:hasChild = dbpedia:child yago:hasChild = dbpedia:parent- yago:created > dbpedia:writer yago:diedIn > dbpedia:placeOfBurial ... and many more. Precision: 96% Matching the relations: + experiments on IMDB + experiments on OAEI standard datasets
Experiment: YAGO and DBpedia 61 yago:actedIn = dbpedia:starring- yago:hasChild = dbpedia:child yago:hasChild = dbpedia:parent- yago:created > dbpedia:writer yago:diedIn > dbpedia:placeOfBurial ... and many more. Precision: 96% Matching the relations: All experiments run with the same settings. No parameter tuning. + experiments on IMDB + experiments on OAEI standard datasets
Conclusion 62 • PARIS matches relations, instances and schema holistically • PARIS has no parameters to tune • PARIS shows high precision and recall marriedTo
Conclusion 63 • PARIS matches relations, instances and schema holistically • PARIS has no parameters to tune • PARIS shows high precision and recall • PARIS allows the YAGO Elvis to marry the DBpedia Priscilla marriedTo equals Happy End Slides done with PowerLine, the free graphical SVG slide editor with Latex support
64 Backup Slides with more details
Equality of Instances 70  and   shall be matched if Let's compute the probability of this happening, (under independence assumptions)
Unequality of Instances 1970 1935 born born marriedTo How do we guard against inequalities? 71 ? Elvis Presley label Elvis Presley label
Unequality of Instances 1970 1935 born born marriedTo How do we guard against inequalities? 72 ? Elvis Presley label Elvis Presley label These cannot be equal, because they have  a different value for a functional relation.
Unequality of Instances 1970 1935 born born marriedTo How do we guard against inequalities? 73 ? Elvis Presley label Elvis Presley label These cannot be equal, because they have  a different value for a functional relation. The functionality is defined analogously to the inverse functionality.
1970 1935 born born marriedTo 74 ? Unequality of Instances Every value in one ontology should have a pendant in the other, weighted by the functionality: This factor makes sure that for every  , there is one equivalent 
Unequality of Instances 1970 1935 born born marriedTo Every value in one ontology should have a pendant in the other, weighted by the functionality: 75 This factor makes sure that for every  , there is one equivalent  sang Lavender Blue Twinkle Twinkle sang All shook up Jailhouse Rock
Equality of Relations knows marriedTo knows marriedTo If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. 76 (Those in the intersection) (Those that have a pendant) knows subpropertyOf marriedTo
Example 77 Elvis dreamsOf label Elvis name Elvis Ontology 1 Ontology 2 Ontology 2
Example 78 Elvis dreamsOf label Elvis name Elvis Ontology 1 Ontology 2 Ontology 2 Madonna label Madonna name Ontology 1 Ontology 2
Example 79 Elvis dreamsOf label Elvis name Elvis label Madonna name Madonna Ontology 1 Ontology 2 Ontology 2 Ontology 1 Ontology 2 Ciccone 1958 Frozen
Example 80 Elvis dreamsOf label Elvis name Elvis Ciccone 1958 Frozen label Madonna name Madonna Ontology 1 Ontology 2 Ontology 2 Ontology 1 Ontology 2
Example 81 Elvis dreamsOf label Elvis name Elvis Ciccone 1958 Frozen label Madonna name Madonna Ontology 1 Ontology 2 Ontology 2 Ontology 1 Ontology 2
Example 82 Elvis dreamsOf label Elvis name Elvis Ciccone 1958 Frozen label Madonna name Madonna Ontology 1 Ontology 2 Ontology 2 Ontology 1 Ontology 2
Example 83 Elvis dreamsOf label Elvis name Elvis Ciccone 1958 Frozen label Madonna name Madonna Ontology 1 Ontology 2 Ontology 2 Ontology 1 Ontology 2