School of Computer Science, The University of Birmingham

Seeing Possibilities For a Cup And Saucer

(VERY EARLY DRAFT: Still changing rapidly, so saved
copies will soon be out of date. Save links instead!)

Aaron Sloman
School of Computer Science, University of Birmingham.
(Philosopher in a Computer Science department)

Installed: 17 Jul 2014
(Using pictures taken in 2005)
Last updated: Reformatted: 1 Nov 2017
____________________________________________________________________________

This paper is
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/cup-saucer-challenge.html
A PDF version may be added later.
A closely related document, using these pictures, was written during the EU CoSy
robot project, and made available here:
http://www.cs.bham.ac.uk/research/projects/cogaff/07.html#708
     "Perception of structure: Anyone Interested?"

Two other closely related documents written around the same time are
http://www.cs.bham.ac.uk/research/projects/cogaff/07.html#709
     Perception of structure 2: Impossible Objects
http://www.cs.bham.ac.uk/research/projects/cosy/photos/crane/
     Challenge for Vision: Seeing a Toy Crane -- Crane-episodic-memory

A partial index of discussion notes is in
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/AREADME.html
____________________________________________________________________________

Some Challenges for a Visual System

Consider these two scenes:
    Saucer on cup         Cup on saucer
These two pictures were taken using the same three objects arranged differently.
____________________________________________________________________________

Question:
Using a two-finger gripper, what actions can get from the situation on the left (or any situation with partly similar initial relationships) to the situation on the right (or a similar situation), and back again? Think about how you could do that, before reading on.

Discussion:
Notice that no known vision system, computer-based or human, can determine exact directions, distances, curvatures, orientations, thicknesses, and other spatial properties and relations from these two images, partly because they are low resolution images taken in poor light (late at night in a hotel bedroom with one ceiling light not working!), and partly because a single 2-D image cannot in principle provide exact distances and sizes. So anyone who understands the above question and thinks about a possible answer must be using interpretations of the images that abstract from precise metrical details.

Many vision researchers assume that the abstraction has to be done by replacing precise metrical values with probability distributions over such values, but there is another way: using partial orderings to relate parts of the scene rather than absolute values. Orderings can be of many kinds: further away, further apart, wider, more curved, thicker, shallower, sloping more steeply, changing curvature more rapidly in a certain direction. Where processes are involved, again, instead of specifying exact directions, velocities, and accelerations in some coordinate system, for many purposes it may suffice to use partial orderings, based on relations like: moving faster than, changing speed faster than, changing direction faster than, rotating faster than, and many more, including comparisons of acceleration (rates of rates of change).
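
As a rough illustration of the partial-ordering idea (not a description of any existing vision system), the following Python sketch stores qualitative facts for one relation, such as "is wider than", and answers comparisons by following chains of facts. The class name PartialOrder, the part names (saucer, cup_rim, cup_base, cup_handle) and the particular facts are assumptions made up for the example; they are not measurements taken from the pictures.

```python
from collections import defaultdict


class PartialOrder:
    """Qualitative facts for one relation, e.g. 'is wider than'.

    Only orderings are stored, never numeric values. Pairs that are not
    linked by any chain of facts remain undetermined.
    """

    def __init__(self, relation):
        self.relation = relation
        self._below = defaultdict(set)  # item -> items directly known to be "less"

    def add(self, more, less):
        """Record 'more <relation> less', rejecting contradictions."""
        if more == less or self._reaches(less, more):
            raise ValueError(
                f"'{more} {self.relation} {less}' contradicts earlier facts")
        self._below[more].add(less)

    def _reaches(self, start, goal):
        """True if a chain of recorded facts leads from start down to goal."""
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for nxt in self._below.get(node, ()):
                if nxt == goal:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def compare(self, a, b):
        """True if 'a <relation> b' follows from the facts, False if the
        reverse follows, None if the facts leave the pair unordered."""
        if self._reaches(a, b):
            return True
        if self._reaches(b, a):
            return False
        return None


# Hypothetical facts about the crockery; no distances are involved.
wider = PartialOrder("is wider than")
wider.add("saucer", "cup_rim")
wider.add("cup_rim", "cup_base")

print(wider.compare("saucer", "cup_base"))      # True (by transitivity)
print(wider.compare("cup_base", "saucer"))      # False
print(wider.compare("cup_handle", "cup_base"))  # None (undetermined)
```

The third answer is the point of the sketch: where a probability distribution would still assign some value to every pair, a partial ordering can simply leave a pair unordered when the available information does not settle it.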

In some cases, instead of being described in absolute or relative spatial terms, processes can be described at an even higher level of abstraction: in terms of changes in affordances that are produced by motion, either of things perceived or of the viewer. This can include changes in proto-affordances: changes in possibilities for motion or changes in possible interactions between things, with no agents' actions or needs being involved.
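
One very simplified way to picture a change in proto-affordances is to compare the sets of possible rearrangements before and after something moves. The Python sketch below assumes a toy model in which a scene is just a mapping from each object to whatever it rests on, and an object can be moved only if nothing rests on it. The function names, the "table" support, and the restriction to two of the three objects are all assumptions made for the example.

```python
def possible_moves(on):
    """Return the set of (object, destination) moves possible in a scene.

    `on` maps each object to what it currently rests on. An object can be
    moved only if nothing rests on it, and only onto the table or onto
    another object that has nothing resting on it.
    """
    covered = set(on.values())
    free = [obj for obj in on if obj not in covered]
    moves = set()
    for obj in free:
        for dest in free + ["table"]:
            if dest != obj and on[obj] != dest:
                moves.add((obj, dest))
    return moves


def affordance_change(before, after):
    """Possibilities gained and lost between two scenes."""
    old, new = possible_moves(before), possible_moves(after)
    return {"gained": new - old, "lost": old - new}


# Hypothetical scenes: saucer resting on cup, then saucer moved to the table.
before = {"saucer": "cup", "cup": "table"}
after = {"saucer": "table", "cup": "table"}

change = affordance_change(before, after)
print(sorted(change["gained"]))  # [('cup', 'saucer'), ('saucer', 'cup')]
print(sorted(change["lost"]))    # [('saucer', 'table')]
```

Nothing in the sketch refers to an agent's goals or to metrical detail: moving the saucer changes what could happen next, and that change is described purely as a difference between two sets of possibilities.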

An extended discussion of opportunities for using partial orderings instead of probability distributions to deal with uncertainty or poor data can be found in:
   http://www.cs.bham.ac.uk/research/projects/cogaff/07.html#718
   Predicting Affordance Changes
   (Alternative ways to deal with uncertainty)

Further questions
Earlier you were asked to think about how you might rearrange the objects in order to get from a configuration like the first to a configuration like the second. Are you able to describe, not the actions, but how you thought about the actions, including the intermediate stages and linking processes that you thought about? Did you need to consider any exact distances, widths, directions, weights, or other geometric or physical properties or relations?

Did you consider which of your body parts you would use, how the appearance of the scene would change, and what information you would use about the changes when selecting and controlling actions?

Normally we can plan actions without considering those details because we know that we have mastery of the familiar types of sub-task required. This manipulation task is not a difficult test (for a normal adult in our culture), unlike some puzzles that most people find difficult, such as the fisherman's folly puzzle, which requires separating the metal ring from the rest of the object without cutting or breaking anything.

    [Picture of the fisherman's folly puzzle]

Image from:
   Pedro Cabalar and Paulo E. Santos (2011),
   Formalising the Fisherman's Folly puzzle, in
   Artificial Intelligence, 175, 1, pp. 346--377,
   Issue on John McCarthy's Legacy,
   http://www.sciencedirect.com/science/article/pii/S0004370210000408

   (That paper shows (a) how the puzzle can be "translated" into a
   logical problem, which most humans can't do, and (b) how an AI
   planning program can solve it, in the translated form. They make no
   claims or promises about automating the translation of the puzzle
   into a logical form.)

Returning to the Crockery challenge

Consider how, prior to the action, the agent (one who has not discovered the translation in Cabalar and Santos) has to solve several sub-problems.

Could such deliberative premeditation use an action schema (or operator) with approximate, qualitative parameters instead of the more definite actual parameters that would be used (explicitly or implicitly) if the action were performed?
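
To make the question concrete, here is a minimal Python sketch of what an action schema with qualitative parameters might look like. It borrows the familiar precondition/add/delete operator format from classical AI planning, which is an assumption of this example rather than something proposed above. All fact names (such as "can_exceed" and "cup_rim_thickness"), the schema names, and the helper functions are hypothetical, and only the cup and saucer are modelled; the third object is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSchema:
    name: str
    preconditions: frozenset  # qualitative facts that must hold
    add: frozenset            # facts made true by the action
    delete: frozenset         # facts made false by the action

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        if not self.applicable(state):
            raise ValueError(f"cannot apply {self.name!r}")
        return (state - self.delete) | self.add


def make_schemas(objects):
    """Build pick/place schemas whose only 'size' parameter is qualitative."""
    schemas = []
    for x in objects:
        # No numeric gap: the gripper merely has to be able to open wider
        # than the part being grasped.
        grasp_ok = ("gripper_opening", "can_exceed", f"{x}_rim_thickness")
        for y in [o for o in objects if o != x] + ["table"]:
            clear_y = frozenset() if y == "table" else frozenset({(y, "is", "clear")})
            schemas.append(ActionSchema(
                name=f"pick {x} from {y}",
                preconditions=frozenset({(x, "on", y), (x, "is", "clear"),
                                         ("gripper", "is", "empty"), grasp_ok}),
                add=frozenset({("gripper", "holds", x)}) | clear_y,
                delete=frozenset({(x, "on", y), ("gripper", "is", "empty")}),
            ))
            schemas.append(ActionSchema(
                name=f"place {x} on {y}",
                preconditions=frozenset({("gripper", "holds", x)}) | clear_y,
                add=frozenset({(x, "on", y), ("gripper", "is", "empty")}),
                delete=frozenset({("gripper", "holds", x)}) | clear_y,
            ))
    return schemas


def plan(start, goal, schemas):
    """Breadth-first search for a shortest sequence of schema names."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for schema in schemas:
            if schema.applicable(state):
                nxt = schema.apply(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [schema.name]))
    return None


start = frozenset({
    ("saucer", "on", "cup"), ("cup", "on", "table"),
    ("saucer", "is", "clear"), ("gripper", "is", "empty"),
    ("gripper_opening", "can_exceed", "cup_rim_thickness"),
    ("gripper_opening", "can_exceed", "saucer_rim_thickness"),
})
goal = frozenset({("cup", "on", "saucer"), ("saucer", "on", "table")})

steps = plan(start, goal, make_schemas(["cup", "saucer"]))
print(steps)
```

The plan is found without any coordinates, widths or weights: the only "measurement" is the qualitative fact that the gripper can open wider than each rim. If that fact is removed for the saucer, no plan is found, which suggests how a qualitative parameter can do real work during premeditation while leaving the definite values to be settled only when the action is performed.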

NOTE:
There are problems here partly analogous to problems of reference and identification in language, except that the mode of reference is not linguistic and what is referred to typically cannot be expressed in language because it is anchored in non-shared structures and processes.

Internal 'attention' processes are partly like external pointing processes: virtual fingers -- in some cases because they exhibit 'causal indexicality', i.e. implicitly referring to the results of learning, or selective attention, achieved by some internal learning mechanism, as pointed out in:

   http://www.cs.bham.ac.uk/research/projects/cogaff/03.html#200302
   Aaron Sloman and Ron Chrisley,
   Virtual machines and consciousness,
   Journal of Consciousness Studies, 10, 4-5, 2003, pp. 113--172,
   NOTE:
   A detailed commentary (and tutorial) on this paper by Marcel
   Kvassay, comparing and contrasting our ideas with the anti-reductionism
   of David Chalmers, was posted on August 16, 2012:
   http://marcelkvassay.net/machines.php

____________________________________________________________________________

Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham