
== DRAFT == DRAFT ==

HOW CAN WE EVALUATE A VISION SYSTEM FOR A HUMAN-LIKE MOBILE ROBOT?
OR
Evaluating evaluations for AI/Robot vision systems
Talk for the IRLAB group, Friday 4th March 2016
4pm, Room 245, School of Computer Science.

Aaron Sloman
School of Computer Science, University of Birmingham


Installed: 1 Mar 2016
Last updated: 3 Mar 2016
This file is
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/irlab-vision.html

A partial index of discussion notes is in
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/AREADME.html


How should we evaluate a vision system for a robot in a garden?
(A human-like vision system for a human-like robot.)

I shall introduce a problem related to the recent talk "Cognitive relevance of computer vision datasets" by Janez Pers, which criticised some of the benchmarks commonly used by AI/Robotics vision researchers.

The problem is: How can we specify better ways to evaluate proposed human-like AI vision systems?

The following (small) web site presents a few videos of garden scenes, showing different views of a garden taken with a low-quality video camera moved around by hand. For most of the videos the flowers, foliage, etc. are stationary, though a gentle wind was blowing some of the time.

http://www.cs.bham.ac.uk/research/projects/cogaff/misc/vision/plants/

If you watch one of the videos (preferably on a fairly large screen rather than a mobile phone display) you will have visual experiences that are partly (but only partly) similar to what you would experience if you were actually walking around in the garden, peering at flowers, leaves, bushes, branches, etc.

What would have to go on in a machine (e.g. a robot) for it to experience the videos (or the original garden scene) as YOU do (under various conditions, e.g. with one eye open or two, viewing on a small or a large screen, etc.)?

How could you demonstrate that the robot was seeing as a human does, when it views the videos, or when it moves round the garden following trajectories and gaze directions similar to those used for the videos?

This is not a question about the mechanisms required, or how to design a vision system, but about how to evaluate designs for future vision systems, especially designs for general-purpose human-like systems.

Of course there are variations in how humans see things: so there cannot be one correct design. I am colour-blind and my wife is not, so she sees all sorts of details in a garden that I do not. So any set of tests should allow for a variety of different sorts of human-like visual competences.

Apart from physiological differences causing differences in visual competences and experiences, there are differences in levels of expertise that have other causes. For example, my wife is a keen gardener and has studied a lot more biology than I have. She therefore knows a great deal more about types of plant and their appearance, and notices structural details that I do not, though I could probably learn to see some of them.

How could we tell whether a machine is learning to see features, structures, or relationships in the same sort of way as a human might?

E.g. someone might propose that the robot should build a 3-D model of the contents of the garden, plus a time-tagged 3-D trajectory corresponding to the path along which the camera had moved and the direction in which it was facing, with enough detail to project a view of the scene from a different (but not very different) location at any time.

A more ambitious requirement would be the ability to project views from a fly-through along various trajectories with changing poses. (This sort of test is sometimes used to evaluate a vision system's understanding of a static scene.)
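As a concrete illustration of how such a projection test might be scored, here is a minimal Python sketch. Everything in it is hypothetical rather than part of any existing benchmark: it assumes the system under test has already produced a coloured 3-D point cloud of the garden, and that we hold out a camera pose (and the frame actually recorded from that pose) that was not used during reconstruction.

    import numpy as np

    def render_point_cloud(points, colours, R, t, f, cx, cy, h, w):
        """Crude point-splat rendering of an Nx3 world-coordinate point
        cloud from a held-out camera pose.

        R is a 3x3 world-to-camera rotation, t a 3-vector translation,
        f the focal length in pixels, (cx, cy) the principal point,
        (h, w) the image size. Returns an h x w x 3 float image."""
        cam = points @ R.T + t                 # world -> camera frame
        ok = cam[:, 2] > 1e-6                  # keep points in front of the camera
        u = np.round(f * cam[ok, 0] / cam[ok, 2] + cx).astype(int)
        v = np.round(f * cam[ok, 1] / cam[ok, 2] + cy).astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        img = np.zeros((h, w, 3))
        # Paint distant points first, so nearer points tend to overwrite them.
        order = np.argsort(-cam[ok, 2][inside])
        img[v[inside][order], u[inside][order]] = colours[ok][inside][order]
        return img

    def novel_view_error(rendered, actual):
        """Mean absolute per-pixel difference between the projected view
        and the frame actually recorded from that pose."""
        return float(np.abs(rendered - actual).mean())

A low error on held-out poses would show that some geometric information had been captured, but, as argued below, passing such a test would say little about whether the machine sees the garden as a human does.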

Others might propose that the machine should be able to predict what would be sensed if it had a human-like hand with human-like sensors and brought it into various forms of contact with the objects seen (leaves, branches, petals, soil, grass, tree-trunk, etc.).
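Again purely as a sketch: such predictions could be scored by comparing the sensor trace the system predicts for each probe action with the trace actually measured when a (real or simulated) hand performs it. The names below (predict_tactile, execute_and_sense, the probe actions) are all hypothetical placeholders, not part of any existing robot API.

    import numpy as np

    def tactile_prediction_score(system, hand, probe_actions):
        """Mean root-mean-square error, over a set of probe actions
        (e.g. stroking a leaf, pressing soil, gripping a branch),
        between predicted and actually measured tactile readings."""
        errors = []
        for action in probe_actions:
            predicted = np.asarray(system.predict_tactile(action))
            measured = np.asarray(hand.execute_and_sense(action))
            errors.append(np.sqrt(np.mean((predicted - measured) ** 2)))
        return float(np.mean(errors))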

A further requirement might be related to perception of affordances for various types of action by the robot, or affordances for the robot's young friend.

Should it also be able to see which actions are IMpossible in that situation?
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/impossible.html
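At its crudest, a possibility/impossibility test could be scored as agreement with human judgements over a labelled collection of (scene, action) pairs. This sketch assumes hypothetical labels and a hypothetical judges_possible interface, and deliberately ignores the harder question of whether the machine grasps WHY an action is impossible.

    def affordance_agreement(system, labelled_pairs):
        """Fraction of (scene, action, human_label) triples on which the
        system's possible/impossible judgement matches the human one."""
        matches = sum(
            system.judges_possible(scene, action) == human_label
            for scene, action, human_label in labelled_pairs
        )
        return matches / len(labelled_pairs)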

As far as I can tell none of the current techniques for evaluating AI/Robotic vision systems is adequate, and many are irrelevant, which suggests that we may be missing something deep about natural vision. This may indicate serious gaps in current AI/Robotics goals, theories, models, and techniques.

If anyone has proposals I'll be happy to learn from them.

I suspect researchers in vision may need to become much better at introspection in order to devise relevant, deep evaluation procedures.
See Maja Spener, 'Calibrating Introspection', Philosophical Issues, 25 (Normativity), 2015:
http://onlinelibrary.wiley.com/doi/10.1111/phis.12062/abstract
This is related to methods of "conceptual analysis" used (sometimes badly, sometimes well) by philosophers, summarised in Chapter 4 of The Computer Revolution in Philosophy (1978):
http://www.cs.bham.ac.uk/research/projects/cogaff/crp/#chap4

If we solve that problem, the next one will be to explain how a future robot can make discoveries in Euclidean geometry and topology as our ancestors did, long before the development of modern logic and algebra.

Or (equivalently??) how to design a vision system for a crow-like robot that could build crow-like nests from the materials used by crows.




Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham