CC-BY
Fabian M. Suchanek
Knowledge Representation
70
We want to represent knowledge about the world in a computer. No easy feat:
Goal: Make computers understand
2
Knowledge representation is a difficult skill to learn on the job. [...]
The importance of knowledge representation in diverse industry settings [...]
should reinforce the idea that knowledge representation should be a fundamental
part of a computer science curriculum, as fundamental as data structures and algorithms.
Here, we look at the knowledge representation formalism that has evolved as a standard.
—
industry experts from Google, Microsoft, Facebook, Amazon, IBM
Overview
3
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
An
entity
(also: resource) is anything that may be an object of thought.
Entity
4
>digression
Digression: Entities
5
Is this an entity?
How many entities are there?
Or this?
Digression: Entities
6
>digression
How many entities are there?
Isn’t everything just atoms?
Is this an entity?
Or this?
Over time, all parts of a ship are
replaced at some point of time.
Then, is it still the same ship?
[Theseus’s ship on Wikipedia]
Digression: Identity
7
[New York Times]
Digression: Identity
8
Humans replace their cells every 7 years.
Over time, all parts of a ship are
replaced at some point of time.
Then, is it still the same ship?
We consider only a finite set of entities that are of interest, and we assume them to be atomic
and identifiable.
An
identifier
for an entity is a string of characters that represents the entity uniquely.
Def: Identifiers
9
Examples for identifiers:
• unique names, as in YAGO or DBpedia:
Rowan_Atkinson_(actor)
• abstract identifiers, as in Wikidata or Freebase:
/m/02jq1
We sometimes say “entity” when we mean the identifier. We sometimes use images for the identifiers.
Try it out!
An
label
for an entity is a human-readable string that names the entity.
Labels that refer to the same entity are called
synonyms
.
Entities that have a label are called
named entities
.
Def: Labels
10
[Atkinson on Wikidata]
identifier
labels (all synonymous)
A label that refers to several entities is called
ambiguous
.
Def: Ambiguity
11
Example: “Paris” is an ambiguous label, as it can refer to
several cities, a greek hero, or people with that name.
Paris reads about Paris in Paris.
Try it out!
Identifier:
Paris_Hilton
Paris_(Greek_myth)
Paris_(city)
Labels:
“Paris Hilton”
“Paris Whitney Hilton”
“
Paris
”
...
“Paris the Hero”
“
Paris
”
“Alexander”
...
“
Paris
”
“City of Light”
“Parigi”
...
>literals
A
literal
is a fixed value that takes the form of a string of characters.
(It is an entity that is identical to its identifier — in all knowledge bases.)
Def: Literals
12
"1955-01-06"
"Hello world"
"42"
This is the number 42, and any knowledge base
will use the identifier "42" to refer to this entity.
Literals can have a
datatype
"1955-01-06"^^xsd:date
"Hello world"^^xsd:string
"42"^^xsd:int
or a
language tag
"Hello world"@en
see example
Overview
13
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
A
class
(also:
concept
) is a set of similar entities.
Each entity is an
instance
of (also: has the type of, belongs to) the class.
Other classes:
- Scientists
- Cars
- Cities
- Rivers
- Universities
- Theories
- ...
Def: Class
14
(The exact definition of “class” is a philosophical conundrum. See
later
in this lecture.)
Instances
Class “Entertainers”
Rowan Atkinson
Madonna
An instance can belong to several classes.
Multiple Classes
15
“The best way to increase society’s resistance to insulting or offensive speech is to allow a lot
more of it. As with childhood diseases, you can better resist those germs to which you have been
exposed.” — Rowan Atkinson
Class “Free Speech Activists”
Jillian York
Andrei Sakharov
...
Sarah Haider
Liu
Xiao‐
bo
Class “Entertainers”
Rowan Atkinson
Madonna
A
class
is a subclass of another class, if all instances of the first class are also instances of
the second class. A
taxonomy
is a hierarchy of classes.
Def: Subclass, Taxonomy
16
>examples
type
subclass
subclass
subclass
subclass
subclass
subclass
[Queen's College in YAGO 4]
If we can say...
• “a/an X”, “every X”
• “Xs” (plural)
• “This is X”
• “X is a Y”
• “Every X is a Y”
then...
X is a class
X is a class
X is an instance of some class
X is an instance of Y
X is a subclass of Y
Taxonomy: Cheat Sheet
17
>examples
Try it out: city, Elvis, Coli bacteria, Ford, time
Taxonomy: Cheat Sheet
18
If we can say...
• “a/an X”, “every X”
• “Xs” (plural)
• “This is X”
• “X is a Y”
• “Every X is a Y”
>examples
then...
X is a class
X is a class
X is an instance of some class
X is an instance of Y
X is a subclass of Y
Def: Punning
19
Punning
happens when an entity is both an instance and a class.
>examples
iPhone
Brand
CreativeWork
Smartphone
PhysicalObject
subclass
subclass
subclass
type
Punning is used in Wikidata, but avoided in most other knowledge bases, because it makes
reasoning harder.
Example
my-iPhone
type
Taxonomy: Examples
20
iPhone -> smartphone
finger -> hand
apple -> orange
flower -> plant
Paris -> city
fruit -> food
France -> Europe
Paris -> France
city -> country
Taxonomy: Examples
21
iPhone -> smartphone
finger -> hand
apple -> orange
flower -> plant
Paris -> city
fruit -> food
France -> Europe
Paris -> France
city -> country
subclass
partof
sibling
subclass
type
subclass
locatedIn
locatedIn
domain and range of “locatedIn”
>examples
Consider a taxonomy of the animal kingdom.
How do we deal with “male” and “female”?
22
maleAnimal
femaleAnimal
cat
animal
subclassOf?
subclassOf?
subclassOf?
subclassOf?
Taxonomy: Examples
Consider a taxonomy of the animal kingdom.
How do we deal with “male” and “female”?
23
Taxonomy: Examples
maleAnimal
femaleAnimal
cat
animal
femaleCat
Overview
24
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
A relation is like a table.
Intuition: Relations
25
City
Consett/UK
Moscow/Russia
Karachi/Pakistan
Dover/US
Changchun/China
...
Year
1955
1921
1991
1982
1955
...
Relation “born”:
Person
Atkinson
Sakharov
Haider
York
Liu
...
A
relation
(also: predicate, relationship, property) over classes is a subset of their cartesian product.
The classes are called the
domains
of the relation. The number of classes is called the
arity
.
born ⊆ person × city × year
born= {〈Atkinson, Consett, 1955〉,
〈Sakharov, Moscow, 1921〉,
...}
Def:Relation
26
>digression (tough)
A relation is
any
subset of the cartesian product.
It does not have to correspond to a real-world relation.
Its name is arbitrary.
Digression: Semantics
27
born= {〈Atkinson, Consett, 1955〉,
〈Sakharov, Moscow, 1921〉,
...}
The semantics/denotation of a symbol is the “meaning” of the symbol,
i.e., the real world entity or tuples.
D(Atkinson) = Rowan Atkinson
D(born) = { <x,y,z> | x was born in y in year z}
Digression: Semantics
28
Syntax
Semantics
But these are again symbols.
What is their semantics?
Digression: Semantics
29
D(born) = { <x,y,z> | x was born in y in year z}
D(Atkinson was born in Consett)
Digression: Semantics
30
D(Atkinson was born in Consett)
Linguistic parsing
+ Montague semantics
= (λ x y <x,y> ∈ born) Atkinson Consett
Digression: Semantics
31
= <Atkinson, Consett> ∈ born
D(Atkinson was born in Consett)
Linguistic parsing
+ Montague semantics
= (λ x y <x,y> ∈ born) Atkinson Consett
Digression: Semantics
32
born
D(born)
{ People born }
λ born
One man’s semantics is another man’s syntax.
All we do is manipulating symbols.
There is no connection to the real world.
Digression: Semantics
33
A binary relation is a relation of arity 2.
bornInCity ⊆ person × city
The first class is called the
domain
and the second class is called the
range
.
A binary relation whose range are literals is called an
attribute
.
An element of a binary relation is called
fact
(also: triple, statement, claim, belief).
It can be written as an atom, a triple, or a graph edge:
Def: Binary Relation, Fact
34
bornInCity(Atkinson, Consett)
〈Atkinson, bornInCity, Consett〉
bornInCity
The first argument of a fact is called the
subject
, the second the
object
.
>events
n ‐ary to binary
35
Date
2012
2021
1963
...
Speaker
Atkinson
Atkinson
King
...
Title
“
Reform Section 5
”
“
Cancel culture is like a medieval mob
”
“
I have a dream
”
...
Atkinson
“
Reform Section 5
”
“
Cancel culture is like a medieval mob
”
2012
2021
speech
speech
date
date
?
Every n -ary relation can be transformed into binary relations by representing each element by an
event entity
:
Def: Event Entity
36
Atkinson
“
Reform Section 5
” 2012
Section5Speech
CancelCultureSpeech
“
Cancel culture...
” 2021
title
title
date
speaker
speaker
date
[
W3C: Defining N-ary Relations on the Semantic Web
]
Date
2012
2021
1963
...
Speaker
Atkinson
Atkinson
King
...
Title
“
Reform Section 5
”
“
Cancel culture is like a medieval mob
”
“
I have a dream
”
...
n-ary relations enforce the presence of all arguments:
Binary relations don’t:
Binary relations are flexible
37
born
Person
City
Year
Atkinson
Consett
1955
1955
1955
Def: Knowledge Graph
38
A
knowledge graph
(also: Entity-Relationship graph, Knowledge base, KB)
is a set of triples. It can be equivalently seen as a directed labeled multi-graph.
Conventions: “label” is the relation between an entity and its label(s), “type” is the relation between an entity
and its class(es), “subclassOf” is the relation between a class and its superclasses. Details see
later
.
person
subclassOf
entity
subclassOf
bornIn
Consett
city
type
type
Taxonomy
Facts
“Atkinson”
Labels
R_Atkinson
label
>schema
->end
Example: Knowledge Base
39
>schema
->end
[Rowan Atkinson in YAGO]
Conventions: “label” is the relation between an entity and its label(s), “type” is the relation between an entity and its class(es), “subclassOf” is the relation between a class and its superclasses,
“range” is the relation between a relation and its range, similarly “domain”. Details see
later
.
Def: Schema
40
A relation is an entity, and can thus have properties, most notably its domain and its range.
The
schema
of a KB is its subset that describes the relations and the classes.
Schema
->end
person
subclassOf
entity
subclassOf
bornIn
Consett
city
type
type
Taxonomy
Facts
“Atkinson”
Labels
R_Atkinson
label
domain
range
Overview
41
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
Digression: The problem with classes
42
type(class, class)
class ∈ class
class = {cars, frenchPeople, class, ...}
class = {cars, frenchPeople,
{cars, frenchPeople, { cars, ...
“The class “class” is a class”
...in a naïve set-theoretical interpretation
Digression: The problem with classes
43
We distinguish:
1) the class/concept/generalization
2) the extension of the class
The class of French people
The extension of the class of French people
{
Emmanuel Macron,
Alizée,
Napoleon Bonaparte,
Marie Dubois,
Pierre Dupont,
... }
(This means that two classes can have the same extension and still be different. Relations are treated in the same way. See
RDF Semantics
for a discussion.)
>task
Digression: Class entities
44
Example: RDFS
(find “axiomatic triples”)
Draw a knowledge graph with the relations domain and range .
Can domain and range appear as nodes?
->end
The
inverse
of a binary relation r is a relation r' , such that r'(x,y) iff r(y,x) .
livesInCity⊆ person × city
livesInCity(Atkinson, Consett)
hasInhabitant ⊆ city × person
hasInhabitant(Consett, Atkinson)
inverses
of each
other
Def: Inverse
45
KBs usually choose the variant that has least objects per subject (here: livesInCity )
A
function
(also: functional relation) is a binary relation that has at most 1 object for each subject.
hasBirthPlace
hasDeathPlace
hasNationality?
r functional ≡ ∀ x: |{y : r(x,y)}| ≤ 1
Def: Function
46
An
inverse functional relation
is a relation whose inverse is functional.
isBirthPlaceOf
hasCitizen (?)
hasEmailAddress
r inv. functional ≡ ∀ y: |{x : r(x,y)}| ≤ 1
Def: Inverse Functional Relation
47
If two entities share the same object of an inverse functional relation, they are equal.
born(Bean,1955)
∧ born(MrBean,1955)
⇒ (nothing)
hasEmail(Bean,me @bean.com)
∧ hasEmail(MrBean,me @bean.com)
⇒ MrBean = Bean
Digression: Equality
48
>meta
->end
A fact can be modeled as a class or as a relation.
Digression: Classes and Relations
49
woman
type
gender
female
type
singer
job
singer
>reification
->end
In general, we a classes when that class could serve as domain or range of a relation.
Otherwise, we use relations.
>canonicity&reification
Overview
50
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
A
reified statement
is an entity that represents a statement. This phenomenon is called reification.
represents
Reified statements
51
Danse Classique
Dance
School
Alizee
completed
statement = class of reified statements
subject ⊆ statement × entity
predicate ⊆ statement × relation
object ⊆ statement × entity
Dance
School
Reification Vocabulary
52
subject
object
predicate
s41
Alizee
completed
statement
type
hopes Pierre << Alizee type single>>
hopes(Pierre, s42)
subject(s42, Alizee)
predicate(s42, type)
object(s42, single)
Example: Reification
53
Simplified notation (RDF-star):
Pierre
see example
The difference to event entities is that reified statements can be hypothetical (= not part of the KB).
Reification and Event Entities
54
single
subject
object
predicate
s42
Alizee
type
>task
->end
Overview
55
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
An entity is
canonic
in a KB, if it has a single identifier in the KB.
A KB where entities are not canonic is called an
open KB
.
Canonicity is essential for
• counting
• answering queries
• constraint satisfaction
Def: Canonic Entities
56
...
...
...
not canonical
produced
Gourmandises
hasProduced
Psychédélices
Alizee
A. Jacotey
Jacotey is considered one of the “100 Se
women of the world”. The singer said in
that Alizée is married, but she lives sepa
Canonicity is not easy to achieve
57
[TextRunner]
Example: Non-Canonicity
58
not a defined relation
not an entity
near duplicates
Example: Canonicity
59
YAGO
But: only very limited information!
Example: Canonicity
60
YAGO
[John Smith in Wikidata]
• easier to extract
• less easy to use
• more noise
• more data
• difficult to extract
• easy to use
• less noise
• less data
Canonicity as Trade-Off
61
non-canonic
canonic
->end
Overview
62
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
Def: Closed World Assumption
63
The
Closed World Assumption
(CWA) says that a claim that is absent from the KB
is false in the real world.
Rowan_Atkinson
UK
hasNationality
Rowan Atkinson is not Uruguayan
But: KBs do not operate under the CWA!
Def: Closed World Assumption
64
The
Closed World Assumption
(CWA) says that a claim that is absent from the KB
is false in the real world.
Rowan_Atkinson
UK
hasNationality
Rowan Atkinson is not Uruguayan
But: KBs do not operate under the CWA!
This is because KBs are usually highly
incomplete
,
meaning that they do not contain all entities, relations,
and facts of the real world.
Def: Open World Assumption
65
KBs operate under the
Open World Assumption
(OWA), which says that a claim that is absent from
the KB can still be true in the real world.
Is Rowan Atkinson Uruguayan?
KB: Maybe!
Did Elvis co‐author the Bible?
KB: Maybe!
Bible
Paul the Apostle
author
author?
Elvis
Rowan_Atkinson
UK
hasNationality
Negative information in KBs
66
KBs usually do not store negative information.
Rowan_Atkinson
UK
hasNationality
Uruguay
does‐not‐
haveNationality
The lack of negative information and the Open World Assumption
mean that there is no canonical way of knowing from a KB
whether a certain statement is false.
Deducing negative assertions
67
•
functional relations allow only a single object
•
we can assume that KBs are locally complete (contain all objects
for a given subject and relation) if they contain at least one object
•
logical constraints can exclude claims
•
negative information can be mined from text
•
cardinality information can be mined from text
Atkinson born in Paris? -> no
Atkinson was married to Madonna? -> no
Atkinson studied at Madonna? -> no
Atkinson was married only once.
>reasoning
[Simon Razniewski, Hiba Arnaout, Shrestha Ghosh, Fabian M. Suchanek: “Completeness, Recall, and Negation in Open-World Knowledge Bases: A Survey”, ACM Computing Surveys, 2023]
Reasoning on KBs
68
How can we reason in the absence of negative information?
• Purely positive deductive reasoning (RDFS)
• Defining axioms (Description Logics, OWL)
• Finding correlations in the data (Rule Mining)
• Constraining the data (SHACL)
marriedTo(x , y ) ⇒ type(x , human)
human ⊓ organization ⊑ ⊥
(the intersection of humans and organizations is empty)
hasChild(x ,y ) ∧ hasChild(z ,y ) ⇒ marriedTo(x , z ) [60%]
Person property prop1. Prop1 hasName birthDate.
>RDFS
see example
Reasoning on KBs: RDFS
69
RDFS
is a set of classes and relations with predefined semantics, given by the following
deductive rules:
〈P, rdfs:domain, C〉, 〈S, P, O〉 ⇒ 〈S, rdf:type, C〉
〈P, rdfs:range, C〉, 〈S, P, O〉 ⇒ 〈O, rdf:type, C〉
〈S, P, O〉 ⇒ 〈P, rdf:type, rdf:Property〉
〈S, P, O〉 ⇒ 〈S, rdf:type, rdfs:Resource〉
〈S, P, O〉 ⇒ 〈O, rdf:type, rdfs:Resource〉
〈Q, rdfs:subPropertyOf, R〉, 〈P, rdfs:subPropertyOf, Q〉 ⇒ 〈P, rdfs:subPropertyOf, R〉
〈Q, rdf:type, rdf:Property〉 ⇒ 〈Q, rdfs:subPropertyOf, Q〉
〈P, rdfs:subPropertyOf, R〉, 〈S, P, O〉 ⇒ 〈S R O〉
〈C, rdf:type, rdfs:Class〉 ⇒ 〈C, rdfs:subClassOf, rdfs:Resource〉
〈A, rdfs:subClassOf, B〉, 〈S, rdf:type, A〉 ⇒ 〈S, rdf:type, B〉
〈Q, rdf:type, rdfs:Class〉 ⇒ 〈Q, rdfs:subClassOf, Q〉
〈B, rdfs:subClassOf, C〉, 〈A, rdfs:subClassOf, B〉 ⇒ 〈A, rdfs:subClassOf, C〉
〈X, rdf:type, rdfs:ContainerMembershipProperty〉 ⇒ 〈X, rdfs:subPropertyOf, rdfs:member〉
〈X, rdf:type, rdfs:Datatype〉 ⇒ 〈X, rdfs:subClassOf, rdfs:Literal〉
Given a KB, adding all consequences of all rules yields the
deductive closure
of the original KB.
>RDFS
Example: RDFS
70
RDFS
cannot derive contradictions:
〈P, rdfs:domain, C〉, 〈S, P, O〉 ⇒ 〈S, rdf:type, C〉
〈Garfield, rdf:type, cat〉
〈Garfield, marriedTo, Priscilla〉
〈marriedTo, rdfs:domain, person〉
RDFS rule
KB
Example: RDFS
71
RDFS cannot derive contradictions:
〈P, rdfs:domain, C〉, 〈S, P, O〉 ⇒ 〈S, rdf:type, C〉
〈Garfield, rdf:type, cat〉
〈Garfield, marriedTo, Priscilla〉
〈marriedTo, rdfs:domain, person〉
〈Garfield, rdf:type, person〉
Deduction
RDFS cannot say that cats and people are disjoint.
RDFS rule
KB
Overview
72
•
Entities
•
Classes
•
Relations
•
The gory details
•
Reification
•
Canonicalization
•
The Open World Assumption
•
Reality
73
Digression: Reality
Madonna
York
person
female
We model reality by a representation.
74
Digression: Reality
A14
A16
A15
A17
Entities are represented by arbitrary identifiers.
Can we achieve a unique mapping from identifiers to entities?
75
Digression: Reality
?
?
A14
A16
A15
A17
Most likely no: A Chinese dictionary is a model of the world...
...yet, by reading it, you cannot learn Chinese.
76
Digression: Reality
In today’s knowledge bases, knowledge is represented by
•
instances:
Paris, Atkinson, United Nations, ...
with labels:
“Paris”, “city of light”, “Parigi”,...”
•
classes:
people, actors, organizations,...
arranged in a taxonomy (
“actors” is a subclass of “people”
)
•
relations:
livesIn, locatedIn,...
usually limited to binary relations
77
Summary: Knowledge Representation
bornIn
->Disambiguation
->Entity-typing
->Fact-extraction
->NER
->Nerc
->Regexes
References
78
• W3C consortium:
RDF Standard
• Suchanek et al:
Knowledge Representation in Entity‐centric KBs