HowToPlan
There is a planning element to the compiler that needs to allocate tasks to robots and create a program, which is to say both a plan and a sequence of GCPR rules that implement that plan.
Scoping
It may actually be possible to do most tasks with homogeneous programs on the robots, rather than specialized ones. The potential for specialization should still be there, though.
Notes on planning
-
Russell & Norvig, Artificial Intelligence, a Modern Approach (1995)
Chapter 11
The source of goals for a planning agent is some form of knowledge base. It may be that the user input for my system is itself a goal, e.g. "Move the block to area A" can be restated as the goal state "The block is in area A".
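A minimal sketch of that reading (the encoding below is mine, not the book's): represent both the world state and the user's command as sets of ground predicates, and the goal becomes a subset test.

```python
# A state is a set of ground predicates, each a tuple of strings.
# The predicate and object names here are hypothetical.
current_state = {("at", "block1", "area_b"), ("clear", "area_a")}

# "Move the block to area A" restated as a goal state.
goal = {("at", "block1", "area_a")}

def satisfies(state, goal):
    """A state satisfies a goal if every goal predicate holds in it."""
    return goal <= state  # subset test

print(satisfies(current_state, goal))  # False, so the planner has work to do
```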
A lot of planning work assumes you can check the world, make a plan, execute the plan, and then check the world again. No dynamic world, no failing the plan. I can see why Brooks was regarded as a loon.
Planner has to have some representation of states, goals, and actions
Planner can add actions anywhere, not just at end of plan (so the overall sequence of actions can be built in no particular order, or only ordered by required constraints). If I have to get milk and a hammer, it doesn't matter if I visit the hardware store or the grocery store first, just that I visit both.
Most parts of the world are independent of each other, as with the buying milk and a hammer. Independent pieces can get their own subplans.
Logical calculus/situation calculus (chap 7 of book, apparently) to represent situation. Planning by unguided logical inference is A) very time consuming (have to try every operation in every situation to see if it hits the goal state) and B) only semidecidable.
Original STRIPS had problems with conjunctive goals, because it worked back from the goal to the current state. It also had problems with having to know the full state of the world, although that can usually be worked around by updating the model of the world and then replanning.
Operations can be on plans, rather than on the state of the world. Refinement ops take a partial plan and add constraints to it. Modification ops are anything that isn't a refinement op. Chap 11 of Russell & Norvig only uses refinement ops.
Ordering by least commitment - Only make choices about whatever you are currently trying to do, so if you're solving the "milk" part of "get milk and a hammer", you don't make any action selections dealing with hammers.
Partial order planner orders some parts of the plan and leaves the rest unordered. Total order planner is a straight list. Converting from partial order to total order is called linearization.
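Linearization is just a topological sort of the ordering constraints. A minimal sketch using the milk-and-hammer example (step names hypothetical):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Partial order: start precedes both errands; both precede finish.
# Nothing orders buy_milk relative to buy_hammer.
predecessors = {
    "buy_milk": {"start"},
    "buy_hammer": {"start"},
    "finish": {"buy_milk", "buy_hammer"},
}

# Any order the sorter emits is a valid linearization.
print(list(TopologicalSorter(predecessors).static_order()))
# e.g. ['start', 'buy_milk', 'buy_hammer', 'finish']
```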
Variables in the plan can be bound or unbound, and a planner can leave some things unbound. When getting milk and a hammer, Buy(Hammer, store) can leave store unbound and bind it to the grocery store later, if it is found that they stock hammers for some reason. A plan with no unbound variables is called fully instantiated.
Complete plan is one such that every precondition of every step is achieved by some other step, and remains achieved (as opposed to being cancelled by some other step). Note that some notion of temporality can relax this a bit, where I don't care if the precondition of e.g. Buy(Milk, GroceryStore), which is being at the grocery store, is cancelled after I buy the milk, so the independence of the milk and hammer lets me make a complete milk plan, a complete hammer plan, and then tack them together (but perhaps not interleave them).
Consistent plan has no contradictions of the form Step A is before Step B and Step B is before Step A (ordering) or of the form m=Milk and m=Hammer (multiple use of the same variable). Have(Milk) and NotHave(Milk) is probably also a contradiction.
As planner builds plan, it protects certain variables in state transitions. If I have to finish with milk, and there's a transition between where I get the milk and finishing, then anything that results in my not having milk can't be inserted in that transition.
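This protection is usually done with causal links. A sketch of the check (the representation is mine, not the book's): a causal link records that one step achieves a condition for another, and any step that deletes that condition and could land between them is a threat.

```python
from collections import namedtuple

Step = namedtuple("Step", ["name", "adds", "deletes"])
CausalLink = namedtuple("CausalLink", ["producer", "condition", "consumer"])

def threatens(step, link, before):
    """A step threatens a causal link if it deletes the protected
    condition and the ordering constraints in `before` (a set of
    (earlier, later) pairs) don't already force it outside the
    producer -> consumer interval."""
    if link.condition not in step.deletes:
        return False
    forced_outside = ((step.name, link.producer) in before or
                      (link.consumer, step.name) in before)
    return not forced_outside

# Drinking the milk deletes Have(Milk), threatening the link that
# protects Have(Milk) between Buy(Milk) and Finish.
link = CausalLink("buy_milk", ("have", "milk"), "finish")
drink = Step("drink_milk", adds=set(), deletes={("have", "milk")})
print(threatens(drink, link, before=set()))  # True: must be ordered out
```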
P. 356 of the book has a skeleton of a simple planner, but writing the planner might be out of scope here.
Noninterleaved planners (ones that can't mix steps of subplans in the presence of dependencies) run into the Sussman Anomaly.
Partially ordered plans and task networks (Sacerdoti) are the same thing. Nilsson calls totally ordered plans linear, hence the "linearization" term for making a partially ordered plan (what Nilsson calls "nonlinear") into a totally ordered one. Instead of, you know, calling it totalizing or ordering.
Chapter 12
What STRIPS can't do, continued: Hierarchical plans, where some levels are specified and some are left for later. Having to form a complete plan (and reform it as new things happen) can be too computationally intensive, or impossible with current knowledge.
Complex conditions, such as universal quantification or conditionals. STRIPS can't encode the relation that Launch(Spacecraft) results in At(Orbit) if all systems are go and At(Ocean) if they are not.
Time, such as that actions take time, deadlines, reservation of resources in time, etc.
Resources, like limited money or machines available to do a task. In chapter 11, the Buy(thing, place) never failed, but it can, if you don't have any or enough money.
Hierarchical planning allows the substitution of more concrete operations for more abstract ones, generally adding detail. Go(Store) can be replaced by a sequence of street directions, which could then be replaced by odometry and heading directions. Decompositions are kind of like subroutine calls, have to preserve the pre/post conditions of the higher-level version.
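A toy sketch of decomposition as substitution (the method table is hypothetical): abstract steps expand into sequences of more concrete ones until only primitives remain.

```python
# Hypothetical method table: abstract step -> sequence of substeps.
methods = {
    "go(store)": ["turn(left)", "drive(2_blocks)", "turn(right)"],
    "drive(2_blocks)": ["set_heading(90)", "odometry_until(200m)"],
}

def expand(plan):
    """Substitute abstract steps until only primitives remain."""
    while any(step in methods for step in plan):
        i = next(i for i, step in enumerate(plan) if step in methods)
        plan = plan[:i] + methods[plan[i]] + plan[i + 1:]
    return plan

print(expand(["go(store)"]))
# ['turn(left)', 'set_heading(90)', 'odometry_until(200m)', 'turn(right)']
```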
Decompositions get the variable bindings from the op they replace, which can cause failure. Causal links from the previous operations may go to different operations within the decomposition, as needed by ordering.
Multiple decompositions may have steps in common and require them to be the same steps, e.g. "go on a honeymoon and raise a baby" might have "get married" in the decompositions of both parts, but it should be the same marriage (avoiding bigamy and divorce is a constraint on the planner). Can use available steps from previous decompositions, which involves a choice when doing new decompositions, or have a "critic" handle things like unifying common steps.
Conditional effects, where a postcondition depends on the current state of the world. When picking subgoals, can favor ones that have conditional effects we want, but then we need to set up the state of the world to trigger them. Also need to make sure the conditional effect doesn't trigger as a threat and make something we want to protect stop being true.
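One way to encode a conditional effect (the encoding is mine): each effect carries a condition checked against the state the action is applied in, dodging negation by using an explicit "nogo" predicate.

```python
def apply_effects(state, effects):
    """Apply (condition, adds, deletes) effects. Conditions are all
    evaluated against the pre-action state, then the updates merge."""
    new_state = set(state)
    for condition, adds, deletes in effects:
        if condition <= state:  # an empty condition always fires
            new_state |= adds
            new_state -= deletes
    return new_state

# Launch(Spacecraft): At(Orbit) if all systems are go, At(Ocean) if not.
launch_effects = [
    ({("systems", "go")},   {("at", "orbit")}, {("at", "pad")}),
    ({("systems", "nogo")}, {("at", "ocean")}, {("at", "pad")}),
]
state = {("at", "pad"), ("systems", "go")}
print(apply_effects(state, launch_effects))
# {('systems', 'go'), ('at', 'orbit')}
```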
Disjunctive preconditions (x or y): try one, and use the other as a backtrack point.
Disjunctive effects are a problem: the world becomes non-deterministic. FlipCoin() has the effects Heads OR Tails, but not both, and without a clear way to distinguish them. The planner would have to check both branches, which can produce things like arbitrary-length loops (if e.g. you can't do something until you flip heads) and rapidly grows the state space.
Universal quantification. In blocks world, to say we want no block on any other, we would otherwise have to write "A is not on B and A is not on C and A is not on D..." and so on. Instead, we want "for all b, if b is a block, b has nothing on it". We also want universal effects, like "for all m, if m is in Bag and Bag is Carried, m is Carried".
Static world assumption: Everything is listed in initial listing of universe, nothing is added, nothing destroyed.
Using measures. The amount of money you have matters, but not the identities of the particular bills, and you don't want Have(2.00) by way of two instances of having the same $1 bill. Measures get added as effects that perform operations and assign them to variables used by the planner, e.g. Buy(Milk) has the effect Money = Money - Price(Milk). Can also have constraints, like the amount of money never going negative. As of '95, when the book was written, there wasn't a lot of agreement about how resources should be represented and reasoned about. The general idea from the book is to pick an initial set of steps and then check that the resource requirements can be satisfied, which is usually done with approximate bounds on the expected costs of things. Time is like any other resource, except that nothing can generate more of it or take a negative amount of it; actions in parallel take the maximum of their times, not the sum; and constraints on time have to be consistent with ordering: a step ordered later can't be assigned an earlier time.
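A sketch of a measure as a numeric fluent (encoding and prices are mine, not the book's): the operator updates the quantity and enforces the non-negativity constraint.

```python
PRICES = {"milk": 2.50, "hammer": 12.00}  # hypothetical prices

def buy(state, item):
    """Buy(item): effect Money = Money - Price(item), with the
    constraint that money never goes negative."""
    new_money = state["money"] - PRICES[item]
    if new_money < 0:
        raise ValueError(f"can't afford {item}")  # constraint violated
    return {"money": new_money, "have": state["have"] | {item}}

state = {"money": 10.00, "have": frozenset()}
state = buy(state, "milk")        # fine: money is now 7.50
try:
    state = buy(state, "hammer")  # 7.50 - 12.00 would go negative
except ValueError as err:
    print(err)                    # can't afford hammer
```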
Chapter 13
The real world will fucking cut you.
Conditional or contingency planning: The plan includes plans for what to do if something doesn't work. Some of the operations are sensing the existing state of the world and then the plan has branches for what is sensed.
Execution monitoring: The system monitors the execution of the plan instead of closing its eyes and taking off at a full run. If something goes wrong, it replans from the current state. Essentially, conditional planning, but only doing the conditional plans if they are needed.
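The monitoring loop itself is small. A sketch (all names hypothetical; `observe`, `execute`, and `plan` stand in for whatever the real system provides, and `plan` is assumed to return a non-empty step list whenever the goal is unmet):

```python
def run(goal, observe, execute, plan):
    """Execution monitoring: act, check that each action's
    postconditions actually hold, and replan from wherever we
    really are when they don't."""
    state = observe()
    steps = plan(state, goal)
    while not goal <= state:
        if not steps:  # plan exhausted without reaching the goal
            steps = plan(state, goal)
        action = steps.pop(0)
        execute(action)
        state = observe()
        if not action.postconditions <= state:
            # Something went wrong; replan from the current state.
            steps = plan(state, goal)
    return state
```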
Some of the plans could include adding actions in order to gain information about the environment, such as going to a high place to gain bearings.
Future parts of the plan could be parameterized by values derived from sensing operations. Once you have conditionals, you start to have plans that look like programs, with loops and e.g. maintenance goals (keep the battery above 20% charge, etc.).
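A maintenance goal drops naturally out of guarded rules. A sketch in the spirit of GCPR, written as plain Python (guard, action) pairs rather than actual GCPR syntax; the threshold is hypothetical:

```python
# Guarded rules: (guard, action) pairs; any rule whose guard holds
# on the current state may fire.
rules = [
    (lambda s: s["battery"] < 0.20, lambda s: "go_charge"),  # maintenance
    (lambda s: s["battery"] >= 0.20, lambda s: "do_task"),
]

def step(state):
    for guard, action in rules:
        if guard(state):
            return action(state)

print(step({"battery": 0.15}))  # 'go_charge': the maintenance goal wins
print(step({"battery": 0.80}))  # 'do_task'
```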
Something "going wrong", in a planning context means that after executing an action, its postconditions don't hold. The action didn't complete, or the environment doesn't actually work that way. Bounded indeterminacy: we can enumerate the ways things go wrong and create plans. Unbounded indeterminacy: Anything could happen, replan when it does.
Full integration: the agent is doing the planning all the time, always part of the way through the grand plan of living its life. It can move causal links to reflect updates in the state of the world, or remove steps that have all their causal links removed (as they no longer cause anything that another op needs).
Coercion: Agent forces world into a known state. You can make two things the same color, even without perceiving color, by painting them both from the same can.
Reactive planning, equivalent to policies in an MDP or POMDP. Brooks etc. all took off in the '90s, so maybe that's why all the planning stuff I've found is from before then. A layered FSM can represent the agent function, and can act based on current sensor percepts.
-
AI Planning: Systems and Techniques, a 30-year overview of the field.
Domain independent (general reasoning model, most AI research) vs domain dependent (uses domain-specific heuristics).
Paper chronology goes up to ~1990s, STRIPS had a pretty strong lineage, as did HEARSAY-II and REF-ARF
STRIPS was for robot control, HACKER for program generation
Transition from initial state of the world to known target state of the world
Actions/Operator schemata have preconditions and effects on the world. They are a class of possible actions with parameters that can be set to create particular individual actions. "An action that the planner considers to be directly executable is referred to as a primitive action or, simply, a primitive." STRIPS operators look a LOT like GCPR rules. They have a precondition, and a sequence of things to add and remove from the world state when executed.
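The resemblance is easy to make concrete. A sketch (my encoding, not actual STRIPS or GCPR syntax): the precondition plays the role of a GCPR guard, and the add/delete lists play the role of the rule's action on the world state.

```python
from collections import namedtuple

# A STRIPS-style operator: precondition, add list, delete list.
Operator = namedtuple("Operator", ["name", "pre", "add", "delete"])

move = Operator(
    name="move(block1, a, b)",
    pre={("at", "block1", "a"), ("clear", "b")},
    add={("at", "block1", "b")},
    delete={("at", "block1", "a"), ("clear", "b")},
)

def applicable(op, state):
    """The precondition is exactly a guard on the current state."""
    return op.pre <= state

def apply_op(op, state):
    """Firing the rule edits the world state: delete, then add."""
    return (state - op.delete) | op.add

state = {("at", "block1", "a"), ("clear", "b")}
if applicable(move, state):
    state = apply_op(move, state)
print(state)  # {('at', 'block1', 'b')}
```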
The STRIPS rules look a lot like the generative rules from e.g. Meld too.
World states can be traversed by applying rules whose preconditions are met to the current state, leading to a new state. A sequence of operations that leads to the goal state is the plan, or at least a plan. Combinatorial explosion probably prevents simply trying all operations at all steps and checking if the resulting state is the goal. "finding an optimal plan in even a simple blocks world domain has recently been shown to be NP-hard (Gupta and Nau 1990)." Attempts have been made to minimize the search space with e.g. A* and the graph traverser (Doran and Michie 1966).
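Brute-force forward search makes the combinatorics concrete. A sketch reusing the operator encoding above (breadth-first, so it finds a shortest plan but visits exponentially many states as the domain grows):

```python
from collections import deque

def forward_search(initial, goal, operators):
    """BFS over world states: expand by every applicable operator,
    stop when a state satisfies the goal. Exact, but exponential."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan  # a shortest sequence of operator names
        for op in operators:
            if applicable(op, state):
                nxt = frozenset(apply_op(op, state))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [op.name]))
    return None  # goal unreachable in this (finite) state space

print(forward_search({("at", "block1", "a"), ("clear", "b")},
                     {("at", "block1", "b")}, [move]))
# ['move(block1, a, b)']
```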
Post-1975, most planners work by successive elaboration of the plan using planning transformations such as expansion and ordering to make the plan more and more specific. The plan is done when all the actions in it are primitives. This maps pretty well to the existing work on swarm primitive behaviors.
Plans typically have a temporal ordering on their operations. Action-ordering plans describe plans directly as relations between actions, rather than states and predicates, so the conditions can be incompletely specified.
"Planning involving conditional effects has been shown to be undecidable" (What does this mean?)
Attempt to perform those actions that cause the current state to more closely resemble the goal state. This sounds like it would have problems with local minima and with calculating distances between states. Backtracking during the generation of the plan allows escape from minima, since a minimum ends without reaching the goal. If you don't care about optimality, the system could just do a depth-first search looking for the goal and take whatever result comes up first. Guiding with domain heuristics helps if they're available.
If there are multiple unrelated goals, backtracking can cause undoing one to undo parts of another. Smarter planners are cited by the original paper as "Hayes (1975); Stallman and Sussman (1977); Daniel (1983); and, to some extent, Stefik (1981a, 1981b)."
Metaplanning allows planning to apply to the application of plan transformations, by representing them like operators. Can also reject operations based on heuristics, or because their application leads to a state known to be impossible.
Conjunctive goal planning allows planning on interacting goals, e.g. painting a ceiling and painting a ladder. The ceiling should be painted first, since it interacts with (by using) the ladder, while painting the ladder doesn't need the ceiling, but getting the paint is the same action for both of them, and should only need doing once.
Partial-order planners attempt to reorder steps to avoid interactions (mid 1980s). Deviser took into account temporal requirements for planning so that steps sensitive to time constraints were satisfied.
The "STRIPS assumption" is that operations from the plan are the only thing that changes the world model. This seems like a cue for hollow, mocking laughter from no clear source. By the late '80s planners were taking into account failure and dynamic environments.
"Most of the systems described assume that the planner possesses complete knowledge of the current state of the world and the cause-and-effect relationships that govern changes in this world." More laughter. Shakey could apparently deal with changes through the combination of STRIPS and Planex. Generally, replanning in the face of a mismatch between the state of the world and the state of the model or expected state of the world. Since the swarmbots are not going to be doing replanning on the go, but executing an assigned program, they're kind of running in the 1970s. Having conditional tests can be called "universal planning" as it also includes what to do in the case of failures.
Late-80s planning added reactivity. Shallow planning or planning with sensing and action together, so the plan progressed depending on sensor inputs. "Reactive Action Packages" replaces operators with reactive elements, or processes that have reactive abilities.
Late-80s planners were ahistoric, with no idea of what plans worked in the past. "Macrops" added learning from failures to STRIPS, and condensed successful sets of operations into single macro-operations that included the successful parameters. Case-based reasoning modifies existing plans to meet new situations. Priar and Plexus used these.
Conditionals and iterators. WARPLAN-C used conditionals by evaluating each branch when planning. Plan became tree-structured as a result.
Planning under uncertainty. This was very new when the paper was written. A plan can fail due to planning based on uncertain information.
Distributed planning. Not swarms or distributed systems, but having experts available to consult, or blackboard and executive process.
-
Wikipedia article
MDP or POMDP, extended to possibly multiple agents? The system does have probabilities on actions (robots can fail), but the priors are not known. Actions are fully observable from the system's POV, but not from an individual robot's. Multiagent planning seems to assume some degree of collusion among agents during the planning, rather than during execution of a given overall plan or subplan.
There apparently exist connectionist approaches to planning where inputs feed a unit with an activity and outputs propagate the activity to other units. A layered structure, some behaviors act as inhibitions. I do kind of wonder if a deep network or something similar could be trained with this sort of abstractness layering, where the sensor inputs are the low-abstractness version, and somewhere in the middle there's a representation of goals, and then it goes low-abstractness again to output to motors. The representation change would be the hairy bit, but perhaps with hand-built nodes that do e.g. sensor fusion and reasoning, world-state understanding, etc. rather than just summation and sigmoid functions. ROS functions like this to some degree already.
ACT-R and SOAR are more cognitive architectures than planners.