Summary Artificial Intelligence (c) 2002-01-28 Fabian M. Suchanek http://suchanek.name/texts/summaries/ai.txt This is a summary of the course "Methods of Artificial Intelligence" held by Prof. Claus Rollinger and Dr. Kai-Uwe Kuehnberger in the WS 2001 at the University of Osnabrueck. Since the official script barely gives clear and consistent definitions, most definitions are just developed by induction and made consistent. By reading the following text, you accept that the author does not accept any responsibility forthe correctness or completeness of this text. If you have any corrections or remarks (especially concerning the statements marked with a (?)), please send me a mail. This is the only way to make the publication of this summary useful for me, too. My e-mail address is f.m.suchanek@zweb.de, but the letter 'z' has to be removed from the address. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Problem Solving ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Aritificial Intelligence (AI): ... studies computations that realize cognitive functions. It's human _competence_ which serves as a model for AI. Dimensions of AI: Types of Rationality: Cognitive approach or Engeneering approach Types of Competence: Acting or Reasoning Short history of AI: AI had a strong co-evolution with psychology 1940-1970: Neural networks, heuristic search (Dartmouth conference) 1960-1980: Domain knowledge: microworlds, expert systems 1980-1990: Distributed processing, neural networks 1990-????: Multiagent systems, artificial social organizations Production systems: A production system is a tupel (S,T,C) of * States (all possible states) * Transitions (all operations available) * A Control structure (the unit which executes the transitions (?)) Search: Solving a problem can be seen as searching for the solution by trying out operations that might lead to the goal. Generally speaking, we have an "initial state", lots of possible states and "actions"/ "operators" to transform the current state into another state. We aim to reach a "goal state" by applying a sequence of operators. We have a function (a goal test) which tells us if the current state is a goal state. Graphs: A graph is the drawing of a net. It consists of "nodes" (dots), connected by "edges" (lines). The edges may be "directed" (indicated by arrows), meaning that they can only be passed in one direction, and "weighted" (indicated by numbers), meaning that it costs something to pass this edge. A sequence of connected nodes is called a path. The sum of the numbers on the edges is called the "path cost" of that path. State space: The state space is a graph consisting of: * S: all possible states S (drawn as nodes of the graph) * A: actions leading from one state to another (drawn as the edges connecting the nodes). * i: an initial state * P: a goal test (a function P:S->{true,false}) Solutions to the problem are paths in the state space, beginning at i and ending at any node for which P returns "true". Tree: A tree (in the sense of computer science :-) is a parent node (a dot) with zero, one, two, or more children. Each of the children is a tree again. Nodes without children are called "leaves". In this course, we refer to the average number of children as "b" (breadth) and to the number of generations (levels) as "d" (depth). Search tree: A search tree visualizes the nodes visited by a search algorithm (s.b.). Its nodes are the nodes of the state space. Costs: There are two kinds of costs: * search costs (offline costs): Resources used to find the solution * path costs (online costs): Resources used in carrying out the solution, i.e. path costs as defined in "state space" Complexity (O-calculus): The complexity of a function roughly describes the course of this function. For instance, f(x)=2x^2+3x-10 roughly behaves like x^2. One writes: f E O(x^2). (Here, "E" means "is an element of" and "^" means "to the power of") Mathematically speaking, the complexity of a function f is another function g such that: There is a constant c and there is a starting value x0 such that for all x>x0: f(x)= solution_depth Optimal: no Time complexity: O(b^n) Space complexity: O(b*n) Bidirectional search: Un-informed search, means searching with breadth-first search from the goal and from the initial state until the two searches meet. Advantage: The search depth of each search is d/2, the time and space complexity b^d are thus reduced drastically Disadvantages: The goal has to be known and it has always to be checked whether the two searches have met Complete: yes Optimal: yes Time complexity: O(b^(d/2)) Space complexity: O(b^(d/2)) Best-first search: Informed search which can estimate the costs from a current node to the goal node. It first expands the most promising nodes. Greedy search: A best-first search, which always first expands the node which appears closest (i.e. cheapest) to the goal. It does not take into account the costs of the path from the initial state to the current state. Complete: yes, provided that the tree is finite and that cycles are checked Optimal: no Time complexity: O(b^d) Space complexity: O(b^d) A* search: A best-first search which first expands the node which appears to be on the best path from the initial node to the goal node. A* thus takes into account: The real cost from the initial node to the current node n: g(n) The estimated cost from current node n to the goal node: h(n) A*'s guess about the usefulness of the current node n is thus: f(n)=g(n)+h(n) Complete: yes Optimal: yes Time complexity: O(b^d) (?) Space complextity: O(b^d) (?) Admissible heuristics: In order to work properly, A* needs a heuristic h which fulfills certain conditions. Suppose h*(n) are the real costs from node n to the goal, then it must hold: h(n)==h1(n) for all nodes n. As a result, h2 is more realistic, without overestimating the costs. If A* is run with h2, it will expand less or exactly as many nodes as with h1. Pathmax-search: Sub-type of A* which uses the pathmax lemma. In ordinary A*, it may occur that the costs f(n) decrease along a path, i.e. the total path costs become cheaper the more you walk (it is a non-monotistic heuristic). To avoid this, do not assign f(n)=g(n)+h(n), but use f(n)=max( g(n)+h(n) , f(predecessor(n)) ) (pathmax lemma). As a result, f will always be non-decreasing along the path, taking care that A* focuses on expanding the optimal path. Hill-climbing: Informed search which just always chooses the node which appears best. No stepping back is done. When all nodes around appear worse than the current node, terminate. Analogy: Searching the top of the Mount Everest by just always chosing the direction where it seems to go up. Complete: no Optimal: no, in Osnabrueck, you would end up on the Westerberg Time complexity: O(d) where d is the number of steps to the next hill Space complexity: O(1), nothing is kept in memory Problems: Lots. The algorithm only finds a local maximum (a hill) and can get lost in a plateau Iterative improvement search: Take a single complete but suboptimal solution and try to improve it by local modifications. Space complexity: O(1), just keeps one single path in memory. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Constraints ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Constraint satisfaction problems (CSPs): Unlike in search strategies, something is known about the goal test in CSPs: The goal fulfills certain conditions (constraints) and we can calculate to which extend (fully, partially, not at all) the current state already fulfills these conditions. A CSP consists of a number of variables which have to be assigned values such that the constellation fulfills the goal conditions. In the initial state, all variables are unassigned. Each variable has a set of possible values. When solving the problem, we will decrease the number of possible values for each variable until there is only one possible value for each variable (or more, if there are more solutions to the problem). Constraints: Constraints can be * unary, i.e. of the form predicate(variable). This means that all values which could ever be assigned to this variable must fulfill a constraint. The set of values satisfying these unary constraints is called the "domain" of this variable. * binary, i.e. of the form predicate(var1,var2). This means that for every possible value of var1, there must be a possible value of var2, such that a relation is fulfilled. Consistency: * A variable is node-consistent if its set of possible values only contains those which fulfill the unary constraints for that variable * A pair of variables is arc-consistent if all their binary constraints p are fulfilled, i.e.: For every val1 out of the first variable's set there is a val2 out of the second variable's set such that p(val1,val2). * The problem is path-consistent if for each variable, a value can be chosen out of its set of possible values such that all binary constraints are fulfilled at the same time. This translates to: For all 3-tupels (A,B,C) of variables I can choose, the following holds: If there is a pAC(a,c) with a and c out of the values of A and C, then there is a value b out of B such that pAB(a,b) and pBC(b,c). Consistency algorithm: 1. Make all variables node-consistent by kicking out the values which do not fulfill the unary constraints (domain restriction operation) 2. Make all pairs arc-consistent by kicking out the values which have no correspondant in the other variable such that the binary constraints are fulfilled (arc consistency domain restriction operation) 3. Choose a value for each variable out of the sets of remaining values such that the problem is path-consistent. Path consistency: Establishing path-consistency is non-trivial: There are multiple possible combinations of variable and values, but only some fulfill all binary constraints at the same time. How can we pick the one which does? Here are some observations: * The order of variable assignments does not change the solution * A violated constraint cannot be corrected by further assignments. This leads to a simple cutoff criterion Forward checking: A strategy for establishing path-consistency. Path consistent assignments can be found by keeping track of the possible values for each variable which have not yet been tried out. When all possible values of one variable have been tried out and no solution has been found, there is none. Most constrained variable heuristic strategy: A strategy for establishing path consistency, extending the forward checking strategy. The variable with the fewest possible values is chosen next to have a value assigned. Most constraining variable heuristic strategy: A strategy for establishing path consistency, extending the Most constrained variable heuristic strategy. The variable involved in the largest number of constraints on other unassigned variables is chosen to have a value assigned. Least constraining value heuristic strategy: A strategy for establishing path consistency, extending the Most constraining variable heuristic strategy. Chose a value that kicks out the smallest number of values for the connected variables. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Game Playing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Game playing: A game in sense of computer science is an initial state, a set of legal moves transforming the current state to another state and a valuation function which tells us if the current state is a game over state and who won (?). Playing a game means that two parties alternately perform moves. The game playing problem is finding good moves in order to win the game. This problem differs from classical search since there is an opponent whose moves cannot be predicted. Just for fun, we'll call the starting player MAX and the other one MIN. Types of games: Randomize dimension: Is the game deterministic or is chance involved (as in games with a dice)? Information dimension: Does the player have perfect information or imperfect information (like in card games where the opponent's cards are unknown)? Searching the optimal move: A game tree can be unfolded with the initial game state in its top node and all possible states after MAX's move in the 1st level. The next level is made up by all possible states after MIN's move etc. We just consider legal moves. A leaf is reached when the game is over. MiniMax Algorithm: An algorithm for determining which moves in a deterministic, perfect- information game would be most useful for MAX in order to win. First, the whole tree has to be unfolded for every possible sequence of moves. Then, the final states are examined and given a "utility" (a valuation, eine Bewertung): +1 if MAX won, -1 if MIN won and 0 in case of a remi. Other values could be imagined, depending on how well MAX won or how badly he lost. Now these values are propagated up: Beginning at the last but one level, we go up. When we are on one of MAX's levels, we chose the maximum value of the node's children, when we are on one of MIN's levels, we chose the minimum value of the node's children. Thus, we assume that each player always chooses the optimal move for himself: MIN wants to reach a -1-leaf and MAX heads towards a +1-leaf. Complete: yes, if the tree is finite Optimal: yes, assuming an optimal opponent Time complexity: O(b^d), all nodes are visited Space complexity: O(b*d), since for every level, one node with all its brothers is kept in memory (?) Problem: For really tricky games (chess), it is impossible to unfold the whole game tree since b=35 and d=100. Minimax-Cutoff Algorithm: Simplification of the Minimax Algorithm: The game tree is not unfolded to the leaves, but only to a given level. The nodes of the last expanded level are then given an estimated utility instead of a real one. Alpha-beta principle: If you have an idea that is surely bad, do not take time to see how truely awful it is. (Nilsson) Alpha-Beta Algorithm: Improvement of the Minimax Algorithm, driven by the alpha-beta principle. That is: If a player has the opportunity of choosing a good move (i.e. a high value for MAX or a low value for MIN), it is not necessary to also unfold the worse alternatives. Imagine we are on MAX's level and he can choose between a node with the utility 15 and another node with an unknown utility in the underlying MIN-level. We expand the unknown node and its first child has the utility 3. Then we know that MIN will either choose this 3 or expand the other children. In any case, his choice will be =< 3. Since we already have a good alternative (the 15), there is no use figuring out what MIN might have chosen if we had taken the unknown alternative. Cutting away these unnecessary alternatives is called alpha-beta pruning. We call MAX's choices alpha-values and MIN's choices beta-values. The efficiency of alpha-beta pruning also depends on the order in which we evaluate the alternatives (move ordering). Complete: yes, if the tree is finite Optimal: yes, against an optimal opponent Time complexity: O(b^(d/2)), since the depth is divided by two on average when move ordering is used Space complexity: (?) Nondeterministic Minimax Algorithm: Adjustment of the Minimax Algorithm for non-deterministic games: Each move by MIN or MAX is followed by a "chance": The result state R of a player's move has again n children (possible successive states, with a dice, n would be 6) with a probability assigned to them. The Minimax algorithm now multiplies the utility of each of these successive states with their respective probabilities. The sum of these products is then the utility of node R. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cognitive Search ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Theories on the productiveness of human thought: * Behaviorist: The human mind works by trial and error * Gestaltist: Solutions are found by data-driven perceptual organization (insight) * Newell & Simon (1972): The mind searches for the solution in a state space (problem space hypothesis) Weak strategy: An uninformed, but complete strategy for searching in a state space. Strong strategy: An informed, but incomplete strategy for searching in a state space. Means-End-Analysis: A strategy for solving a problem which works as follows: * If the current state is the goal state, we're done * Choose the operator which covers the largest distance to the goal * In order to make this operator available, the necessary preconditions of this operator need to be fulfilled. Make these preconditions a subgoal and achieve this subgoal by applying Means-End-Analysis again GPS, General Problem Solver: A program written by Newell & Simon in 1972 which solved a problem by means-end-analysis. Expert chess players and novices: Expert chess players do not unfold more nodes of the state space (check out more possible moves) and they do not unfold more levels of the state space (check out more possible concequences) than novice chess players. Experts just choose the better moves and thus cut down their search tree. Visual ability of expert chess players: Expert chess players are better at memorizing and reproducing a constellation of chess pieces on the board. But this only works for chess game constellations -- both experts and novice fail in memorizing and reproducing random constellations. Experts have just learned to memorize common configurations of pieces as one single perceptual unit. Chunk theory: The human working memory can hold about 7 units of information. These units may be words, numbers, letters or concepts. Years of practice establish these units. Sequential access to lists: Sternberg found out in 1969 that when people are to memorize lists, they can access the first element fastest ("anchor effect"). The more remote the required item is from the beginning, the longer it takes to access it. Sequential access to images: Kosslyn found in 1981 that when people memorized a map and are asked to shift their attention from one point in the map to another, the time needed is proportional to the distance in the map. Isomorphic problems: Two problems are isomorphic if they are essentially the same. Example: The two-player game of choosing a number from the set of {1,2,3,4,5,6,7,8,9} alternatedly in order to get three numbers which sum up to 15 is isomorphic to TicTacToe: The numbers can be arranged in a 3x3 matrix and the number- choosing game can be reduced to getting 3 numbers in a row -- like in TicTacToe. Cognitive treatment of isomorphic problems: As analyzed by Kotovsky, Hayes & Simon in 1985, two isomorphic problems are represented differently in the human brain and may be of different cognitive complexity. The working memory load and the ease of learning and applying state operators is highly dependend on the chosen representation of the problem. Modus ponens: The inference that if it holds that p implies q and it holds that p, then it follows q. Example: When it rains, the street gets wet. It is raining. Hence the street gets wet. Modus tollens: Another way of expressing the modus ponens. If it holds that p implies q and it holds that q is false, then it follows that p must be false, too. Example: When it rains, the street gets wet. The street is not wet. Hence it does not rain. (Because if it rained, the street would be wet.) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Knowledge Representation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The most important representation formalisms: * PROLOG * Predicate logic (PL) * KL-ONE language family * Semantic nets * Frames * Terminological systems * Production systems * PROLOG Object, entity, member, instance: Anything. Relation: A relation on two sets is a subset of the cartesian product of these sets. In terms of functions, a relation is a function which gets two objects and returns a truth value. Possible writings for a relation R and two objects a and b: a R b, R(a,b), R(a,b)=true, (a,b) E R, E R, a is a R of b. Concept, category, class: A union of objects which are in some way similar. Membership: The fact that an objects belongs to a category. Possible writings for object o and category C: o is a member of C, o is an instance of C, o realizes C, o is a C, o is in C, o is an element of C, o belongs to C C includes o. Ontology: A system of categories. Importance of an ontology: An ontology can make information tools of the future "understand" with what they are dealing. Domain information would be encoded precisely and queries could be formulated in a more precise way than by keywords. Postulations for an ontology: * It must be extendible to fit other domains and jargons * It must have a simple language to enable users to understand it * It must be HTML compliant in order to be integrated into the Web Artificial world: AI scientists' playground. Agent: An actor in an artificial world. He usually gets sensory inputs from the world and performs actions changing the state of the world. He has (or builds up) knowledge about the world, including the current state, the results of his actions and condition-action rules. Upon this knowledge and sensory inputs he decides which action to take. Properties of an artificial world environment: An artificial world environment is called * Accessible, if the sensors detect all aspects of the environment that are relevant to the choice of action * Deterministic, if the next state is completely determined by current state and the action selected * Static, if the environment does not change while the agent is deliberating * Discrete, if there is a limited number of distinct, clearly defined percepts and actions Wumpus world: An artificial world. Consists of a 2-dimensional matrix with an agent, a monster ("wumpus"), several pits and a bunch of gold, each entity occupying one field of the matrix. Goal: The agent has to find the gold and avoid the pits and the wumpus. Actions: Go, turn, shoot (once) Inputs: A "breeze" when any field around him is a pit, a "smell" when any field around him is a wumpus and a "glitter" when he is on the field with the gold. Environment: * not fully accessible, the agent can only get local inputs * deterministic, all results of actions are specified * discrete, there are no fuzzy states * static, the wumpus and the pits do not move Procedural approach: Telling the agent how to act. This implies a conventional imparative programming style. Declarative approach: Telling the agent what to do. The agent will then infer the way to the goal by his domain-specific knowledge, his sensory inputs and a built-in generic inference engine, which is domain-independent. This implies a logic programming style (PROLOG). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Predicate Logic ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Syntax: A set of rules defining the well-formed expressions of a formal language. Semantics: A mapping from formal language expressions to real world facts. Proposition: An abstract representation of a fact. Propositional logic: A special formal language which can work on propositions. Propositional logic formula: * An uppercase letter (standing for a proposition) OR * A string of the form ~X where X is a formula OR * A string of the form A /\ B where A and B are formulas OR * A string of the form A \/ B where A and B are formulas First order logic, FOL, predicate logic, PL: A special formal language which extends propositional logic. FOL variable: An undetermined component in FOL, written as a lowercase letter (usually x,y,z). FOL function: A function from terms to terms, written as a lowercase letter (usually f,g,h) or with a name. Examples: sin, f, sqrt, max, ... FOL constant: A 0-place FOL function, usually given a proper name. Examples: wumpus, rollinger, 3, 5.0, ... FOL term: * a FOL variable OR * a FOL function call Example: sin(sqrt(max(x,y))) FOL predicate: A relation on FOL terms, written as an uppercase letter (usually P,Q,R). Examples: P(x), Greater(7,sqrt(3)) FOL formula: * A string of the form P(t1,t2,...tN) where P is a predicate and ti are terms OR * A string of the form ~X where X is a formula OR * A string of the form A /\ B where A and B are formulas OR * A string of the form A \/ B where A and B are formulas OR * A string of the form "All x: v" where x is a variable and v is a formula OR * A string of the form "Ex x: v" where x is a variable and v is a formula Bound FOL variable: A variable is bound in a formula Ex x: V if it occurs in V. (Same with All x: V) Free FOL variable: A variable is free if it is not bound. FOL sentence: A formula without free variables. Prenex form: A FOL formula with all its occurring quantifiers at the beginning. For any FOL formula, there exists an equivalent formula in prenex form. A FOL formula can be transformed to prenex form as follows: 1. Rename all bound variables such that they are unique in the formula 2. Move all quantifiers to the beginning. With outer-level AND- and OR-expressions, this can be done by just wiping out the quantifier and adding it at the beginning. All complex formulas have to be transformed to simple AND- and OR- formulas. Example: (Ex x: v) => a ~~~~~~> All x: ~v \/ a because: (Ex x: v)=> a <=> ~(Ex x: v) \/ a <=> (All x: ~v) \/ a Skolem form: A prenex form without any existential quantifiers. For every prenex form there is a Skolem form, which is not semantically equivalent, but equivalent with respect to satisfiability. Existential quantifiers in a prenex form v can be removed as follows: 1. If there is no existential quantifier in v, we're done 2. Scan the formula up to the first existential quantifier, keep track of all universally quantified variables so far 3. Choose a function symbol which does not yet occur in the formula 4. Replace all occurances of the existentially quantified variable by f(x1,x2,...xN), where f is the new function symbol and x1,...xN are the preceding universally quantified variables 5. Wipe out the existential quantifier 6. Go to 1 Literal: A string of the form P(t1, t2, ...tN) where P is a FOL predicate and ti are FOL terms. Conjunctive normal form, CNF: A FOL formula is in conjunctive normal form if it is of the form (A1 \/ A2 \/... AN) /\ (B1 \/ B2 \/... BM) /\ ... where all symbols are literals. Clause: A set of literals. It stands for the disjunction of these literals. Example: (a \/ b \/ c) ~~~~~> {a, b, c} Clause form: A set of clauses. The clause form is another way of writing a conjunctive normal form. Example: ( a \/ b) /\ (c) ~~~~> { {a,b}, {c} } Substitution pair: A pair of a variable and an identifier. The variable is supposed to be replaced by the identifier in a formula. Example: [variable/identifyer] Substitution: Applying a substitution pair to a formula, i.e. replacing the variable indicated in the substitution pair by the identifier throughout the formula. Example: (a /\ b \/ x) [x/c] = (a /\ b \/ c) Unifier: A set of substitutions is a unifier of a set of literals, if applying the substitutions to the literals makes all literals equal. Example: Literals: L1 = P(x,a), L2 = P(b,y), Unifier: { [x/b], [y/a] }, result: P(b,a) Unifiable: A set of literals is unifiable if there is a unifier for it. Most general unifier, MGU: A unifier of a set of literals is the most general unifier if for any other unifier, there is a substitution which turns the result of the MGU into the result of this other unifier. Carelessly speaking, the MGU is the unifier which eliminates the least variables. Example: For { P(x), P(f(y)) }, the unifier { [x/f(y)] } is more general than {[x/f(a)], [y/a] }, because the result of the first (P(f(y)) can be turned into the result of the second (P(f(a))) by applying [y/a]. Warning: This seems to be explained wrongly in the script: The script talks of applying the substitution to the unfiers instead of to the resulting literals! Resolvent: A clause R is the resolvent of two clauses C1' and C2' if * C1' and C2' can be put thru substitutions such that the results C1 and C2 do not contain common variables (Concretely: Rename the variables in C1 and C2 to make the names unique) * There exist literals L1,L2,...LN in C1 and ~L1', ~L2', ...~LN' in C2 such that { L1, L2,...LN, L1', L2', ... LN' } is unifiable. Let s be the MGU of this set. (Concretely: Find literals in C1 that have negated counterparts in C2 or vice versa) * R = (C1 - {L1, L2, ...LN}) U (C2-{~L1', ~L2', ... ~LN'}) s (Concretely: Substitute variables in C1 and C2 to make L1 look like L1' etc., then wipe out all these literals in both C1 and C2, what you get is the resolvent) Resolution: Finding a resolvent for two clauses. If the resolvent is an empty clause, then this indicates that the two clauses are not in all cases both true, i.e. ~ ( All x1: All x2:... All xN: C1 /\ C2). This in turn means that there is an instantiation of the variables such that C1 /\ C2 becomes false. Imagine you knew that C1 is true in all cases and Q=~C2. Since there is a case where C1 /\ C2 is false, in this case C2 must be false, which is equivalent to saying that there is a case where Q is true. If C1 is your knowledge base (true in all cases) and Q is a query, this is the way to prove that there is an answer to Q. Resolution strategy: Heuristic rules which specify the order of resolving the clauses. Can be used to speed up resolution. Unit clause resolution rule: Resolve those clauses which consist of one literal first. Linear resolution proof: A linear resolution proof for a set S of clauses is a sequence of clauses ending in the empty clause where each clause is * a resolvent of the preceding clause and a clause of S OR * a resolvent of the preceding clause and any other preceding clause Restricting resolution: Avoiding certain resolution steps in order to restrict the search space. As a disadvantage, the resolution may become incomplete. Horn clause: A clause which contains at least one positive literal. Example: { ~a, ~b, c } is the same as ~a \/ ~b \/ c, which is a /\ b => c. Definite clause, program clause: A clause which is an entry in a knowledge base. SLD resolution: Linear resolution of horn clauses with a selection function for definite clauses. Strategy of coercion: Chose an action which turns the current state safe no matter what it was before. (?) Diagnostic rule: A rule infering the cause from the effect. Example: The street is wet. Hence it is raining. Causal rule: A rule infering the effect from the cause. Example: It is raining. Hence the street is wet. Situation calculus: An extension of FOL which can represent changes of facts by adding a situation argument to each predicate. Example: unhappy(bob,tue2pm), where tue2pm denotes a situation. Result function: A function receiving an action and a situation and returning another situation, which is the result of applying the action in the original situation. Example: result(eatChocolate,s) = s1, where it holds that happy(agent, s1) Effect axiom: A rule describing the consequence of an action. Example: All s: hasChocolate(agent,s) => happy(agent,result(eatChocolate,s)) Frame axiom: A rule describing the unchanged environment. Example: All s: stillHasToShootBloedenWumpus(agent,s) => stillHasToShootBloedenWumpus(agent,result(eatChocolate,s)) Frame problem: The problem of finding an elegant way to handle non- change. This includes avoiding the writing down of numerous frame axioms and the copying of frame inferences in every step. Qualification problem: The problem of formalizing facts exhaustingly. Ramification problem: The problem of including all possible consequences of an action (?). Successor state axiom: A rule stating the truth conditions of a predicate in a specific situation by listing the ways in which this predicate might have become true or might have remained true. Successor state axioms solve the frame problem. Example: All s: All a: happy(agent,result(a,s)) <=> (hasChocolate(agent,s) /\ a = eatChocolate) \/ happy(agent,s). This states that the agent becomes happy by eating chocolate or by being happy in the preceding situation. Problem solving with a knowledge base: The knowledge base contains the definite clauses about the world. Then a query of the form "goalAchieved(s)" is asked, i.e it is asked whether there exists a situation s in which the predicate "goalAchieved" holds. The knowledge base now finds such an s. Since we always defined situations by the result function applied to a preceding situation, s will be of the form "result( action1, result(action2, result (action3, ........, result(actionN,s0))))....) and thus implicitely specify a sequence of actions. Problem solving by plans: In the above description, the result of the query is quite unhandy to read. That's why one defines a function planResult, which gets a sequence of actions and a situation and returns the situation, which is the result of applying all the actions. //Applying no action results in the same situation All s: planResult([],s) = s //Apply the rest of the plan to the result of the 1st action All s,p,a: planResult([a|p],s) = planResult(p,result(a,s)) Plan: A sequence of action steps. And-Introduction: The inference that "a /\ b" when it holds that "a" and it holds that "b". Example: big(bob), big(bill) ~~~~~~> big(bob) /\ big(bill) Universal elimination: The inference that a universally quantified proposition holds for a specific constant. Example: All x: toBe(x) \/ ~toBe(x) ~~~~~~~~> toBe(cogsci) \/ ~toBe(cogsci) Proof: A sequence of literals for which it holds that * the last literal is the proposition to be proved * every literal has been gained by applying inferences to preceding literals. Proving as searching: A proof can be found by searching a state space. Its nodes are sets of literals and its operators are inferences. The goal test just checks whether the desired literal is present in the current state. Problem: There is a high branching factor, especially for universal elimination. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ KL-ONE ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ KL-ONE: A formalism for representing knowledge. It consists of KL-ONE membership propositions and KL-ONE concepts. KL-ONE membership proposition: A string of the form o E C where o is an object and C is a KL-ONE concept. In PL: C(o) KL-ONE concept: * a concept symbol (mostly an uppercase letter) OR * an intersection of concepts OR * a union of concepts OR * a negation of a concept OR * a value restricted concept OR * an existential value restricted concept Intersection of concepts C1 and C2: The concept which includes all those objects which belong to both C1 and C2. In KL-ONE: C1 AND C2, where AND is a square-shaped logical "and" In PL: The concept of those x for which C1(x) /\ C2(x) Example: Male AND Parent = Father Union of concepts C1 and C2: The concept which includes all objects which belong to C1, to C2 or to both. In KL-ONE: C1 OR C2, where OR is a square-shaped logical "or" In PL: The concept of those x for which C1(x) \/ C2(x) Example: Boy OR Girl = Child Negation of concept C: The concept which includes all objects which are not in C. In KL-ONE: -C In PL: The concept of those x for which ~C(x) Example: -Alive = Dead Value restriced concept: A concept consisting of objects o for which all those objects which fulfill a given relation with o belong to another given category. In KL-ONE: All R:C, meaning those objects o, for which all c with R(o,c) belong to C. In PL: The concept of those x for which All y: R(x,y) => C(y) Example: All Child:Male = Those_with_only_male_children Existential value restricted concept: A concept of objects o for which there is another object of a given category which fulfills a given relation. In KL-ONE: Ex R:C, meaning those objects o, for which there exists a c of C, such that R(o,c). In PL: The concept of those x for which Ex y: R(x,y) /\ C(y) Example: Ex Child:Male = Those_who_have_at_least_one_son Terminology T: A formalism extending KL-ONE. Two more signs are used: * C1 < C2 (a square-shaped less-equal sign), indicating that all objects of C1 also belong to C2 (partial definition) In PL: All x: C1(x) => C2(x) * C1 = C2 (an equal sign with a dot above it), indicating that all objects of C1 also belong to C2 and vice versa (total definition) In PL: All x: C1(x) <=> C2(x) Overall aim: Finding an algorithm which can decide whether one concept is a sub-concept of another concept. In order to do so, we will first simplify T definitions and then transform the problem to a constraint problem. Terminology T*: A restriction of the terminology T, no partial definitions are used. Transforming T into T*: Turning all partial definitions into total definitions. If the partial definition is "C1 < C2", then the total definition would be "C1 = C2 AND C1*", indicating that something C1-specific (namely C1*) picks all C1's out of the C2's. C1* remains undefined. Example: // all CogScis are students CogSci < Student // There is something CogSci specific CogSci*, such that CogSci = Student AND CogSci* Expansion: A KL-ONE concept defined by only undefined concepts. Getting an expansion by the help of T* definitions: Plugging the T* definitions together to get an expansion. The expansion is well-defined if there are no cycles in the T* definitions and each concept occurs only once on the right side of the definitions. Example: // Partial definition of "Student" by undefined concept "Human" Student < Human // Partial definition of "CogSci" CogSci < Student // Transformation to total definitions Student = Human AND Student* CogSci = Student AND CogSci* // Transformation to an expansion E(CogSci) = Human AND Student* AND CogSci* Inconsistency: A concept is inconsistent if no object can belong to it. Subsumption: A relation between two concepts indicating that all objects belonging to the first concept also belong to the second concept (the first is a sub-concept of the second). The relation is written down as an oddly-shaped less-equal sign, here represented by a " S U { (x:C1 AND C2),(x:C1),(x:C2) } intersective constraints are split up * S U { (x:C1 OR C2) } ~~~> S U { (x:C1) } or S U { (x:C2) } union constraints result in two possible constraint systems * S U { (x:(Ex R:C)) } ~~~> S U { (x:(Ex R:C)), (xRy), (y:C) } existential constraints add a new constrained variable * S U { (x:(All R:C)) } ~~~> S U { (x:(All R:C)), (xRy), (y:C) } Satisfiability: A constraint system is satisfiable if there is an instantiation of variables such that all constraints are fulfilled. This translates to: There is no elementary contradiction within the constraint system. Subsumption algorithm: An algorithm deciding whether C1 is a sub-concept of C2. 0. The problem is C1 R(a,c) Symmetric relation: A relation R on a set S for which it holds that All a,b E S: R(a,b) => R(b,a) Asymmetric relation: A relation R on a set S for which it holds that All a,b E S: R(a,b) => ~R(b,a) Antisymmetric relation: A relation R on a set S for which it holds that All a,b E S: R(a,b) /\ R(b,a) => a=b Complete relation: A relation R on a set S for which it holds that All a,b E S: R(a,b) \/ R(b,a) This means that the relation is defined for all pairs in the set. Partial relation: A relation which is not complete. Order relation: A reflexive, antisymmetric and transitive relation. Partial order relation: An order relation which is not complete. See also: statistik.txt at "Eigenschaften von Relationen" for a complete German list of relations and examples. Supremum: A function which receives two elements of a partially ordered set and returns an element of this set, which is the least upperbound of the two. If there is no minimum among the upperbounds, the function fails. Example: If the set is the set of natural numbers, then sup(4,19) = 19 If the set is the set of characters and the relation is the lexical sorting relation, then sup('a','A') fails: Both 'b' and 'B' are "least upperbounds" and we cannot compare them in order to find out which one is the minimum. Infimum: A function which receives two elements of a partially ordered set and returns an element of this set, which is the maximum lowerbound of the two. If there is no maximum among the lowerbounds, the function fails. Lattice: A partially ordered set where for all possible pairs of elements, sup(a,b) and inf(a,b) exist. Complete lattice: A partially ordered set where every subset has an infimum and a supremum. Lattice graph: A graph whose nodes are concepts. It has a top element (standing for a most general concept compromising all objects) and a bottom element (standing for a most restrictive concept with all attributes but no objects). A connection from an upper node to a lower node means that the lower concept is a sub-concept of the higher concept. Such lattices are complete. Formal concept analysis: Determining extensions and intensions of concepts by a lattice graph. Semantic net: A graph whose nodes are concepts, instances and properties. Concepts are connected by "is_a"-connections, indicating that one concept (usually the lower one) is a subconcept of the other. Instances (usually at the bottom) are connected to their concept by an "instance"-connection. Attributes are connected to concepts by a "has_attribute"-connection. Inference in a semantic net: Following the links. There are two ways: Intersection search and inheritance search. Intersection search: Finding the relations between two objects by starting at both nodes in the semantic net and trying to meet the two searches. Advantages: * entity-based organization * fast parallel organization Inheritance search: Finding the properties of an object by following up the concept links and collecting all attributes. Monotonic logic: A reasoning where every concept has the properties of all its super-concepts. Non-monotonic logic: A reasoning where a concept can miss a property which one of its super concepts has. Default-reasoning: A way of handling non-monotonic logic. Properties are attached to concepts in a semantic net as usual, but subconcepts which miss a property of a superconcept are explicitely marked so. When an inference is to be drawn, the links are followed upwards from subconcepts to superconcepts and the first occurance of a property overrides all following occurances of that property. Example: Birds can usually fly. Penguins are birds. Penguins can't fly. Tweety is a Penguin. Hence: Tweety cannot fly. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Conceptual Dependency and Scripts ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Object-Attribute-Value representation: A way of stating that the attribute of an object has a specific value, namely by writing: Object -> Attribute -> Value OAV-representations are more general than representations of the form attribute(object,value) and can be arranged in a net. Event representation: A way of describing a sentence by OAV- representations. The event itself is seen as an instance of the verb (i.e. -> "instance" -> ). All participating instances stand in a certain relation to the event, e.g. "agent", "receiver", ... Surface structure of a sentence: The phrase marker of a sentence. Deep structure of a sentence, case frame: A list of the following variables with their values for that sentence: * Predicate P - the action * Time T - the time of the event * Agent A - the actor of the event * Counter Agent C - force against which the action is carried out * Object O - the entity that moves or changes * Result R - the entity that comes into existence as a result * Instrument I - immediate physical cause of an event * Source S - the place from which something moves * Goal G - the place to which something moves * Experience E - the entity that experiences the effect (invented by Fillmore) Conceptual dependency, CD: A formalism by Roger Schank to represent the meaning of a sentence independent of its surface structure. The event described can be built up from primitive components and can be arranged in a CD graph. CD graph: Mostly a graph of the form <=> <---- | | or of the form <==> [ () ] where is the agent, is a CD primitive act, is an entity, is a CD relation and is an CD action modifier. Several "<---- " constructions may be attached to any node in this graph. The second graph states that the has the , optionally with value . CD Primitive acts: * ATRANS - Transfer of an abstract relationship. e.g. give. * PTRANS - Transfer of the physical location of an object. e.g. go. * PROPEL - Application of a physical force to an object. e.g. push. * MTRANS - Transfer of mental information. e.g. tell. * MBUILD - Construct new information from old. e.g. decide. * SPEAK - Utter a sound. e.g. say. * ATTEND - Focus a sense on a stimulus. e.g. listen, watch. * MOVE - Movement of a body part by owner. e.g. punch, kick. * GRASP - Actor grasping an object. e.g. clutch. * INGEST - Actor ingesting an object. e.g. eat. * EXPEL - Actor getting rid of an object from body. e.g. ... er, ... breathe out CD relations: * o - the relation of being an object to an action * R - recipient-donor relation * I - the relation of being an instrument for an action * D - the source-destination relation CD action modifiers: * p - past * f - future * t - transition * ts - start transition * tf - finished transition * k - continuing * ? - interrogative * / - negative * delta - timeless * c - conditional CD lexical entry: A pattern for a graph with CD placeholders for actual values. CD placeholders: * PP - Real world object (Picture Producer) * ACT - Real world action * PA - Attribute of an object * AA - Attribute of an action * T - Time * LOC - Location Frame: A structure of default information associated with an event. Script: A structure of circumstances which are likely to follow one from the other. Some components of a script are: * Entry Conditions: Conditions which must be satisfied in advance * Results: Conditions that will be true afterwards * Props: Slots representing objects involved in the events * Roles: Persons involved in the events * Track: Variations on the script. * Scenes: The sequence of events that occur. Events are represented in conceptual dependency form. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Basic categories ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Mental concept: A representation of a category in the brain with additional information such as e.g. usage. Three-level concept theory (Rosch): Mental concepts are organized in a three-layer pattern: * superordinate concepts General categories with few attributes (e.g. "vehicle") * basic level concepts The most important categories (e.g. "car"), they * have many attributes in common * have motor movements in common * have a shape in common * can be identified by averaged shapes From cogpsy.txt at "Three-level concept experiments": * are most frequently used * are those categories of which people name objects * have motor movements associated with them * can be captured by a mental image * are recognized fastest * subordinate concepts Very special categories with few additional attributes (e.g. "racing car") Problems occur with concepts of persons and social categories. Common attributes of basic level objects: Common attributes were assessed by asking people to name properties of categories of different levels. Result: * Very few attributes were listed for superordinate categories * The number of basic level category attributes was significantly greater * Subordinate categories did not receive more attributes than their basic category Common motor movements of basic level objects: Common motor movements were found out by asking people to describe the movements with which they normally interact with objects of a category. Result: * few movements for superordinate categories * lots of movements for basic level categories * no more common motor movements for subordinate categories See also: newsigh.txt on "Naming" Common shape of basic level objects: The shapes of several randomly chosen pictures of objects of the same category were normalized and the amount of overlap was taken as a measurement of shape similarity. Result: * hardly any similarity for superordinate category objects * a high similarity for basic level category objects * not so much similarity for subordinate category objects Average shape of basic level objects: The normalized shapes of objects of the same category was presented to subjects. Result: Subjects could best identify basic level categories. Prototype: An instance which exhibits the most important features of a category. Prototype theory (Rosch): Membership of a category is fuzzy. It is determined by the similarity of the object to the prototype of a category. See also: cogpsy.txt on "Prototype concept theory" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Knowlede based systems ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Knowledge level: A level of information, namely the level of external world facts. Symbol level: A level of information, namely the level of internal representations of real world facts. Knowledge-based system: A system which can answer questions in a certain domain. Knowledge-based system architecture: software <---> inference engineer engine <-> User | Interface <--> User domain <-> knledge <---> knledge <-> expert engineer base __________________________ Knowledge based system Domain expert: A person who knows about the domain, but not about the working of the system. Knowledge engineer: A person who designs and debugs the knowledge base in consultation with domain experts. Software engineer: A person who programs the inference engine. User: A person who does not know much about the domain and wants to solve a problem with the help of the system. Finding out what the user wants: Querying the user in order to model his problem and find a solution for it. The users answers are recorded. Querying can be done by ground queries (Yes/No), by menues of choices or by free form text areas. In the latter case, the system needs a large dictionary to map user expressions to the formal expressions expected by the system. Psychological issues play a role in modeling the dialog. Explanation of an answer: The system has to explain why it gave a certain answer, both for debugging purposes and seriousness. One can ask * HOW the system proved a goal or a subgoal and it will output the rule * WHY a subgoal is being proved, i.e. why a specific question is being asked * WHYNOT a goal was derived Knowledge organization: Knowledge can be organized by introducing different knowledge models and thus reducing the complexity. Model of expertise: A knowledge model of the following structure: * Domain knowledge (concepts and relations) * Control knowledge * Inference knowledge (meta classes) Reflects the detailed structure of the domain * Task knowledge (Goals) Applies knowledge sources to achieve goals * Strategic knowledge (Plans) Controls tasks and goals (?) Systematic diagnosis: Splitting up a problem into * Selecting a system model * Generating a hypothesis * Testing the hypothesis (?) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Machine Learning ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The study of Machine Learning: The study concerned with the question of how to construct computer programs that automatically improve with experience. Learning: Given a problem you cannot solve, doing what you need to do in order to solve the problem, is called learning. The learner should not just memorize the data but generalize it, compress it and be able to explain it. Types of learning: * Unsupervised, i.e. by experience * Reinforced, i.e. by getting positive feedback (?) * Supervised, i.e. with help of a teacher * By example, i.e. step by step Principle of multiple explanations: If there are more than one theories consistent with observations, keep all theories! (Epicurus) Falsification can be used to eliminate possible explanations (calculus of model elimination). Occam's razor, Principle of simplest explanation: Among the theories that are consistent with the observations, select the simplest theory! (William of Ockham) Principle of probabilistic explanations: The probability of a theory being true is proportional to the learners initial belief in this theory multiplied by the conditional probability of observations given the theory (Bayes) (?) Sample: A set of pairs of examples x[i] and the desired responses t(x[i]) E {yes,no}. Positive example: An example x, whose t(x) is "yes". The set of all positive examples is E+. Negative example: A non-positive example. The set of negative examples is called E-. Learning system: A learning system has background knowledge and tries to generate a hypothesis about a sample by a learning algorithm. Learning algorithm: An algorithm to deduce a hypothesis from a sample. It receives a sample of the form {,,...}, has background knowledge Sigma and outputs a hypothesis h such that every positive example can be deduced by Sigma U {h} and no negative example can be deduced. Formally, the algorithm can be seen as a function from a sample to a hypothesis: A(s)=h. Properties of the hypothesis: A hypothesis h=A(s) is called * complete, if all positive examples can be derived * correct, if no negative example can be derived * consistent, if it is both correct and complete err: The predictive probability of an error of a learning algorithm. errdelta(h) = P( h(x)<>t(x) ). (?) Properties of the learning algorithm: A:S->H is called * s-correct, if it generates a consistent hypothesis * correct, if it generates a hypothesis which also works with all other examples outside the sample * epsilon-approximately correct, if errdelta(A(s)) < epsilon * probably approximately correct (PAC), if there is a minimum number of examples granting that errdelta will be small. Coverage of a hypothesis: The quotient of positive examples which can be deduced from the hypothesis and all positive examples. cov(h) = | {e E E+ : deduced(e) } | / | E+ | Accuracy of a hypothesis: The quotient of positive examples which can be deduced by the hypothesis and all examples, which can in general be deduced by the hypothesis. acc(h) = | {e E E+ : deduced(e) } | / | { x: deduced(x) } | Finding a hypothesis: A hypothesis can be found by a search in a lattice. Too general hypothesis (which include negative examples) need to be specialized and too special hypothesis (which exclude positive examples) need to be generalized. A representation, a distance measure and a heuristics need to be found in order to search efficiently. Overfitting: The property of a hypothesis of being to much specialized to the sample. Overfitting hypothesis will not be able to explain non-sample examples and they also include those examples, which were taught wrongly. A hypothesis h overfits on its sample s if there is another hypothesis h' which performs worse on s but better on a test sample unknown to h. See also: neuroinf.txt on "Overfitting" Pruning: Establishing biases (restrictions) in order to decrease the state space for a hypothesis search. Establishing biases means deliberately performing suboptimally. Language bias: A restriction on possible hypothesis. Search bias: A mechanism for judging the usefulness and the generalness of a hypothesis (?). Validation bias: A criterion for stopping the hypothesis search. Bayesian network: A graph with input nodes (representing attributes of the examples), intermediate nodes (representing attributes composed of the original attributes) and output nodes (representing attributes as concluded by the network). Bayesian networks are a predecessor of artificial neural networks. Artificial neural network: A directed weighted graph of "neuron"- nodes, an input layer ("indicators"), hidden intermediate layers and an output layer ("presentation mode"). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Planning ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Action, operator: A function mapping one world state to another. Action step: A concrete instance/invoking of an action with a name. Example: Going to university today and going to university tomorrow are two action steps but the same action. Closed World Assumption, CWA: The assumption that every proposition which cannot be derived from the knowledge base is false. The CWA can lead to inconsistencies with OR-expressions: If "a \/ b" is in the knowledge base, then a cannot be proven and is considered false. Also, b cannot be proven and is considered false. Hence a and b are both considered false, which is inconsistent with the proposition "a \/ b". Finding a plan by standard search: Defining the search space as a graph of states. The links between the nodes are operators which transform one state to another. Try to find a way in this graph which connects the initial state with a state which fulfills the goal conditions. The operators along this path constitute the plan. Progression planner: A search which starts at the initial node and continues until it finds a goal state. Regression planner: A search which starts at a goal state and tries to find a way to the initial state. Advantage: Usually, there are many possible operators in the initial state (i.e. a high branching factor for progression planners), but few operators that lead to the goal state (i.e. a much smaller branching factor for regression planners). Disadvantage: Often, a conjunctoion of goals has to be found, not just one. Problems with finding a plan by standard search: * There is a high branching factor of possible actions * There are multiple, inefficient solutions, where one operator undoes what a previous one did * There are multiple, inefficient solutions where operators are applied that do not cause a relevant change * An after-the fact heuristic can only check a state when it has been generated Finding a plan by plan searching: Define the search space as a graph of partial plans. Start with a very raw partial plan and refine it until you get a complete plan. Linear plan: A plan consisting of a determined sequence of action steps. Partial plan, non-linear plan: A tupel of * A set of action steps * An ordering * Bindings * Causal links. As a consequence, there need not be a determined order of action steps. The only conditions are that certain action steps must occur somewhere before other action steps. Binding: A statement about the value of a variable of the form " = " or " = ". Ordering: A partial order on action steps, i.e. a subset of ActionStep x ActionStep. Causal link: A relation between two action steps meaning that the first etablishes the preconditions of the second. Every causal link is also an ordering link. Least-commitment planning: A strategy used with partial plans. Only make choices about things that you currently care about, leaving the other choices to be worked out later. Complete plan: A plan which contains a sequence of action steps from the start to the goal. Partial order plan: A partial plan which is complete. Total order plan, linearization: A linear plan which is complete. STRIPS: Stanford Research Institute Problem Solver, an algorithm for finding plans. The algorithm starts with a partial plan of just the start action step and the end action step and applies plan operators to reach a complete plan. Defining an action: Actions are defined by * The name of the action with its arguments * Preconditions which must be fulfilled in order to execute the action (a conjunction of positive literals) * Effects of the action (a conjunction of positive literals and negative literals) Open condition: A precondition which is not yet fulfulled. Start action: The initial action of every STRIPS plan. * Name: START * Preconditions: TRUE (i.e. no open conditions) * Effects: /\ /\ ... (i.e. all predicates which are true in the initial state) End action: The final action of every STRIPS plan. * Name: END * Preconditions: /\ /\ ... (i.e. all conditions for the goal state) * Effects: FALSE (i.e. no effect is necessary) Operators on plans: Operators which map one plan to another. Possible plan operators are: * Refinement operators * Add a step * Add an ordering * Instantiate a previously unbound variable * Modification operators * Restructure the orderings Threat: An action step which annuls the preconditions of another action step. Blocks world: An artificial world of named blocks standing on a table. The goal is to construct a tower with a certain order of blocks. The following operators are available: PutOn(x,y): Preconditions: clear(x) /\ clear(y) /\ on(x,z) Effects: ~on(x,z) /\ ~clear(y) /\ on(x,y) /\ clear(y) PutOnTable(x): Preconditions: clear(x) /\ on(x,z) Effects: ~on(x,z) /\ clear(z) /\ on(x,table)