Background to this web site

I noticed an advert for a post-doc post at MIT:

    Postdoc in material perception at MIT
    A postdoctoral position is available at MIT on the perception of materials and surfaces, under the supervision of Edward Adelson and Ruth Rosenholtz. The goal is to understand, at a computational level, the information in an image that allows a subject to recognize materials (e.g., wood, glass, fabric, etc.) and their properties (e.g., smooth, shiny, translucent, etc.). The ideal candidate will have strong skills in some combination of visual psychophysics, computer graphics, machine vision, and machine learning. Programming in Matlab and C++ is required. Start date is flexible. Please email a CV, a cover letter explaining your research interests, and the names of 3 references, to Edward Adelson (adelson at csail dot mit dot edu).

This caught my attention because I have been thinking and writing about the problems of learning about kinds of stuff for some time, e.g.

    Talk 68: Ontologies for baby animals and robots
    From "baby stuff" to the world of adult science: Developmental AI from a Kantian viewpoint.
    Aaron Sloman
    http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#brown

So I searched Ted Adelson's web site, and found the following paper:

    On Seeing Stuff: The Perception of Materials by Humans and Machines
    Edward H. Adelson
    Proceedings of the SPIE, Vol. 4299, pp. 1-12, Human Vision and Electronic Imaging VI,
    B. E. Rogowitz and T. N. Pappas, Eds. (2001)
    http://web.mit.edu/persci/people/adelson/pub_pdfs/adelson_spie_01.pdf

That prompted me to write him a message saying:

I have been trying (not very successfully) to get researchers in vision and robotics to help me think about the kinds of ontologies that animals and robots need in various environments if they are to interact, as humans and other animals do, with things in those environments, and even more so if they are to *understand* those interactions, e.g. so as to be able to think about them in advance of doing them, or to try retrospectively to understand why something did or did not happen, or in order to think about what others (e.g. babies and toddlers exploring their world) are doing, which might hurt or harm them ("vicarious affordances").

So I was really interested to see this job description. A little googling took me to your 2001 paper:

    On Seeing Stuff: The Perception of Materials by Humans and Machines

I am amazed at the overlap of interest, including use of the word 'stuff'. In several ways your paper goes beyond what I have written, but I think there are important gaps (which you may have addressed elsewhere) concerning the role of motion.

I have been arguing that a major subset of the learning that human infants and toddlers do must be about kinds of stuff, where each kind is largely defined by its relationships to different shapes, and processes involving causal interactions between shapes. E.g. some kinds of stuff resist change of shape, and break if forced. Others resist change, but allow it to happen and restore it if the shape-changing force is removed. Others allow change with mild or strong resistance, but do not attempt to restore shape. Others offer no noticeable resistance.

One of my slide presentations which seems to baffle most roboticists and vision researchers (especially the younger ones) is about what I call 'baby stuff':

    From "baby stuff" to the world of adult science: Developmental AI from a Kantian viewpoint.
    http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#brown
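The four kinds of response to deformation sketched above can be made concrete in a small piece of code. The following is a minimal illustrative sketch in Python (my own addition, not part of the original message): the probe fields and category names are invented labels for the qualitative distinctions just described, not an implemented theory of material concepts.

    from dataclasses import dataclass

    @dataclass
    class DeformationProbe:
        resists_change: bool          # noticeable resistance to the applied force?
        deforms_under_force: bool     # does the shape actually change while forced?
        restores_when_released: bool  # does the shape return when the force is removed?
        breaks_if_forced: bool        # does forcing the change destroy the structure?

    def classify_stuff(p: DeformationProbe) -> str:
        """Map a qualitative probe to one of the kinds of stuff sketched above
        (labels are illustrative, not claimed to be the right ontology)."""
        if not p.deforms_under_force:
            return "rigid, brittle" if p.breaks_if_forced else "rigid"
        if p.restores_when_released:
            return "elastic"          # resists, deforms, restores shape
        return "plastic" if p.resists_change else "unresisting (fluid-like)"

    # Example: modelling clay resists a little, deforms under force, and keeps
    # its new shape when released.
    print(classify_stuff(DeformationProbe(True, True, False, False)))   # -> plastic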
I shall now modify that presentation to refer to your paper... (... now done...)

Your recent advert refers only to the features of materials that might be visible in static scenes ("the information in an image"), which is also the main theme of your 2001 paper. However, I think there's far more that can come from perception of processes, whether produced by the perceiver (pushing, pulling, squeezing, pinching, prodding, stretching, bending, twisting, etc.) or merely observed, e.g. caused by wind, by objects colliding, by actions of others, etc.; and others that are produced by change of viewpoint, change of light source, or change of things seen through transparent objects or reflected in them. In fact, without being able to perceive, produce, or think about processes, an individual will not have the means to grasp the full semantic content of many of our labels for describing kinds of material.

I suspect that the ability to see material properties in static images of the sort you show in your paper may be the result of much learning, in which we first acquire concepts of different kinds of surface and different kinds of material through exploration of the effects of motion, and once we have those concepts we fit them on to static images using constraint-satisfaction mechanisms, which sometimes get the wrong answer. If that suspicion is correct, it will be difficult, or impossible, for a visual system developed to deal only with static images to gain the ability to understand as wide a variety of images showing different kinds of stuff as we can. (I need to see if I can replace that suspicion with an argument based on examples.)

E.g. we can define a concept of smoothness in terms of mathematical properties of surfaces, but for humans (and presumably some other animals and future machines) it will be more important (especially for non-mathematicians!) to understand smoothness in terms of what happens when one surface moves relative to another with which it is in contact. That can include various side-effects of the relative motion, including noise produced, or different kinds of resistance to relative motion potentially caused by tangential forces (friction and stiction). Of course, different frictional properties can be produced by equally smooth surfaces made of different materials. I return to this below. So there's a problem of separating out different causes.

'Shiny' and 'translucent' can be defined in terms of static states (e.g. how much light bounces off the surface and in which directions, or how much light passes through the material and with what kind of information loss or distortion, etc.). But both of them also have implications regarding how appearances change as a result of motion (e.g. moving highlights).

Some researchers I know think that the way to communicate such concepts to machines is to present lots of labelled examples of pictures and use some sort of learning system. But that assumes that data-mining image features will provide all the relevant semantics, which must be false if our concepts have richer links with the causal powers of surfaces and materials. Even moving to labelled examples of movies will not necessarily achieve the required results if the learning engine does not have the right conceptual and representational apparatus to start with.
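The remark that 'shiny' has implications for how appearances change under motion (moving highlights) can be illustrated numerically. The sketch below is my own illustration, assuming a standard Phong-style reflection model rather than anything from Adelson's paper: the diffuse (Lambertian) term ignores the viewpoint, while the specular term depends on it, so a shiny patch brightens and fades as the viewer moves even though nothing else in the scene changes.

    import numpy as np

    def unit(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)

    n = unit([0.0, 0.0, 1.0])           # surface normal
    l = unit([1.0, 0.0, 1.0])           # direction from the patch towards the light
    r = unit(2 * np.dot(n, l) * n - l)  # mirror-reflection direction of the light

    def diffuse():
        # Lambertian term: depends only on normal and light, not on the viewer.
        return max(float(np.dot(n, l)), 0.0)

    def specular(view, shininess=50):
        # Phong-style specular term: depends on where the viewer is.
        return max(float(np.dot(r, unit(view))), 0.0) ** shininess

    for view in ([-1, 0, 1], [-0.5, 0, 1], [0, 0, 1]):  # viewer walking past the patch
        print(view, round(diffuse(), 3), round(specular(view), 5))
    # The diffuse value stays fixed; the specular highlight brightens and fades
    # as the viewpoint changes. That is one way 'shiny' shows up in motion.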
One of the hardest problems, as far as I can tell, is finding what form of representation might be developed by a young explorer learning about different kinds of process and the various interactions between structure, matter, motion and applied forces. I suspect the appropriate representation of space and motion has not yet been found. It may depend on kinds of mathematics that I won't understand!

Note added: 28 May 2010

It is often assumed that information about motion of objects will have to be expressed in terms of (or 'grounded in') the sensory and motor signals of the perceiver. This view is a modern revival of concept empiricism, demolished by Immanuel Kant by 1781. In any case, I suggest that believers in symbol grounding theory who take that line will find it difficult to produce working systems in which all thinking about kinds of stuff and their relationships to shapes and motions has to be expressed in terms of sensory-motor signals. Instead I think we need, and have, a-modal ways of representing information about contents of the environment, along with ways of projecting from those ways of thinking so as to predict or explain the observed sensory-motor statistical relationships. People born blind, or limbless, or suffering some other sensory or motor deficiency do learn many of the same concepts as the rest of us, of kinds of things that can exist or occur in the environment.

[It is not often noticed that robots that use SLAM (Simultaneous Localisation and Mapping) end up with topological and metrical relationships between walls, doors, corridors, spaces of other kinds, obstacles, etc., which are not represented in terms of the sensory and motor signals from which the information was derived. I.e. SLAM often leads to a-modal, exosomatic forms of representation from which it is possible to derive (or project) what will be seen, etc., if the machine moves in certain ways.]
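The bracketed point about SLAM can be illustrated with a toy example. The code below is a minimal sketch, not a particular SLAM implementation: the 'map' consists only of landmark positions in a fixed world frame (no sensory or motor signals appear in it), and expected observations are derived from it by projecting from a hypothetical robot pose.

    import math

    # A-modal 'map': named landmarks with (x, y) positions in a fixed world frame.
    world_map = {"doorway": (4.0, 0.0), "corner": (4.0, 3.0), "pillar": (1.0, 2.0)}

    def predict_observations(pose, fov_deg=90.0, max_range=5.0):
        """Given a hypothetical robot pose (x, y, heading in radians), derive the
        range and bearing at which each mapped landmark would be sensed."""
        x, y, heading = pose
        predictions = {}
        for name, (lx, ly) in world_map.items():
            rng = math.hypot(lx - x, ly - y)
            bearing = math.atan2(ly - y, lx - x) - heading
            bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
            if rng <= max_range and abs(math.degrees(bearing)) <= fov_deg / 2:
                predictions[name] = (round(rng, 2), round(math.degrees(bearing), 1))
        return predictions

    # The same map supports predictions for poses the robot has never occupied:
    print(predict_observations((0.0, 0.0, 0.0)))
    print(predict_observations((2.0, 2.0, 0.0)))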
Kristine S. Bourgeois, Alexa W. Khawar, S. Ashley Neal, and Jeffrey J. Lockman (Department of Psychology, Tulane University),
"Infant Manual Exploration of Objects, Surfaces, and Their Interrelations",
Infancy, 8(3), 233-252.
http://www.silccenter.org/bibliography_pdfs/infancy_manual_exploration.pdf
DOI: 10.1207/s15327078in0803_3
Maintained by
Aaron Sloman
School of Computer Science
The University of Birmingham