
Two faces -- an unnoticed illusion?

Aaron Sloman
http://www.cs.bham.ac.uk/~axs/
School of Computer Science, University of Birmingham.


Installed: 6 Mar 2013
Last updated: 31 Aug 2018 (Separated the pictures)


This paper is available in two formats:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/two-faces.html
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/two-faces.pdf

A partial index of discussion notes is in
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/AREADME.html

Do the eyes in the two faces below, Face1 and Face2, look the same?

Stare at each face in turn for a few seconds at a time. Consider both the geometric contents of your experiences and also any other aspects.

Fig Face1

Fig Face2

This example is based on Figure 5 in Sloman (1986).

Some people find that the eyes in the two figures look different -- not geometrically different, but different in a way that appears to be part of the facial expression. I have not collected systematic data, which would be irrelevant for my purposes, as I am merely drawing attention to what can happen, not what regularly happens.

If it can happen, then an answer to the question "What makes this possible?" is required. Answers can be given at various levels of abstraction and in relation to different requirements for explanatory information.

One interpretation of the question is as a request for enabling conditions: not just conditions for the existence of a particular instance of a phenomenon, but for the existence of the possibility of there being such instances: the possibility existed before instances did.

Moreover, a possibility can exist without being realised: e.g. there are many shapes and sizes of building that have never been constructed and never will be. It is possible for me to read out this paper word by word, starting with the last word and working backwards to the beginning of the paper. That possibility has never been realised and probably never will be, but those two facts are consistent with the existence of the possibility.

The explanation should describe a visual information processing mechanism, such that if we could build a machine with that mechanism then it too would be capable of having the visual experiences described here. However, I shall not attempt to provide such an explanation since the aim of this paper is merely to draw attention to the phenomenon.

I first pointed this out about 30 years ago, and have mentioned it from time to time in talks on requirements for AI systems with human-like visual capabilities. However, I don't know of any other researcher who has thought about this. If you have, please get in touch!

One person recently presented with these pictures at first said that the two pairs of eyes had exactly the same shape. When I asked whether there was any additional respect in which they looked different she said one face had "glaring" eyes. (One meaning of the English verb glare is "stare in an angry or fierce way".) That description also fits my experience of one of the pairs of eyes. Some people also report seeing the other eyes as "smiling" or "happy".

I would be happy to receive comments by email to a.sloman[AT]cs.bham.ac.uk if anyone experiences the differences between the two pictures in some other way.

In some sense, the difference in appearance is an illusion: the only things that are different in the two faces are the mouths. But the appearance of something is not just its geometrical features: it includes its relationships to other things, either in the scene depicted or suggested by what is depicted, e.g. an emotional state in this case.

I think this provides indirect evidence for part of the architectural theory developed in the CogAff project, in particular the claims about "multi-window perception" contrasted with "peephole perception".

I've used "peephole perception" to label the view that there's a one-way "ontologically thin" stream of information from the sensors that triggers central cognitive processes of grouping, segmentation, classification, and interpretation. "Ontologically thin" in this context implies that the information in sensory streams is restricted to sensor values or perhaps some features simply and directly derived from sensor values, e.g. intensity, contrast, optical flow, orientation, connectivity, direction of optical flow, etc.

In the case of haptic perception, "ontologically thin" percepts could include pressure, temperature, motion, and perhaps things like roughness, smoothness, and other simple felt textural features. This would contrast with seeing objects as petals, leaves, trees, conspecifics, predators, dangerous, or useful. In the case of human perception it could include seeing letters, words, or phrases, insofar as the physical marks are seen as parts of a communication system understood by the perceiver. All of these are cases of "ontologically thick" perception.

The "multi-window" view of perception claims that one of the results of evolution was to produce coexisting sensory systems that evolved at different times and can run in parallel, driven by the same sensory data, some of which, in more complex organisms, perform richer forms of processing that use more abstract and in some cases externally referring information contents, but organised in a way that keeps the information in registration either with sensor arrays or, in the case of vision, in registration with the "optic array" that exists in the environment at a viewpoint, and which animals sample by rapidly moving their eyes. So a portion of the optic array (e.g. where a corner of a table appears to be) can be seen to be stationary although its projection onto the retina, or the primary visual cortex changes because of saccades, or because of visual fixation on some nearby moving object.

Note that this means the information obtained from the optic array is not necessarily in registration with the retina or the primary visual cortex: for a possible mechanism see Trehub (1991). (From this viewpoint, area V1 in the human visual system is primarily part of a sophisticated sensory transducer for information in the optic array. Information detected by such a mechanism could be rapidly copied to other parts of the brain and processed in different ways in parallel, as seems to be the case with information that comes from the above pictures.)
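To make the contrast concrete, here is a minimal sketch in Python. It is entirely my own illustration, not a mechanism proposed in the CogAff papers or by Trehub, and all the names (OpticArrayRegion, multi_window, etc.) are invented for the example. The point it illustrates is that in a "multi-window" scheme low-level and more abstract interpreters annotate the same registered structure, so abstract labels stay linked to optic-array locations, whereas a "peephole" pipeline only passes values forward.

    # Minimal sketch (my own illustration, not from the CogAff papers).
    # All names here are invented for the example.
    from dataclasses import dataclass, field

    @dataclass
    class OpticArrayRegion:
        """One location in the optic array. Low-level measurements and
        higher-level labels share the same index: they stay in registration."""
        intensity: float                             # ontologically thin
        labels: set = field(default_factory=set)     # ontologically thick

    def peephole_pipeline(regions):
        """One-way flow: later stages receive only values passed forward,
        with no route back to the locations they came from."""
        values = [r.intensity for r in regions]      # thin sensory stream
        return [v > 0.5 for v in values]             # derived feature, detached

    def multi_window(regions):
        """Parallel 'windows' annotate the SAME registered structure."""
        for r in regions:                            # low-level window
            if r.intensity > 0.5:
                r.labels.add("edge")
        if sum("edge" in r.labels for r in regions) >= 2:
            for r in regions:                        # more abstract window:
                if "edge" in r.labels:               # back-projects a label
                    r.labels.add("part-of-contour")  # onto the same locations

    regions = [OpticArrayRegion(x / 4) for x in range(5)]
    multi_window(regions)
    print([sorted(r.labels) for r in regions])       # labels still indexed by location
    print(peephole_pipeline(regions))                # detached list: no way back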

An implication is that there is no sharp division between perception and cognition: the two sets of mechanisms overlap, or extend into each other. (This is an old idea, expressed by Max Clowes as "Perception is controlled hallucination", partly echoing von Helmholtz: "Perception is unconscious inference".)

The combination of more abstract derived information (e.g. the location of a corner where two edges meet) with more "primitive" sensory information can be useful in driving/guiding the further processing of incoming information, and in some cases also useful in controlling actions (using visual servo-control). Some examples were provided in connection with the description of the Popeye program in Chapter 9 of Sloman 1978, available online here:
http://www.cs.bham.ac.uk/research/projects/cogaff/crp/#chap9
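The following is a hedged Python sketch of the visual servo-control idea, not code from the Popeye program; detect_corner and the toy image format are simplifying assumptions of mine. A derived feature (the corner's image location) directly yields a control signal, while remaining tied to the image from which it was derived.

    # Illustrative sketch of visual servo-control, NOT the Popeye program.
    # detect_corner and the toy image format are invented for this example.

    def detect_corner(image):
        """Stand-in for feature derivation: return (row, col) of the
        strongest response, treated here as the tracked corner."""
        return max(
            ((r, c) for r in range(len(image)) for c in range(len(image[0]))),
            key=lambda rc: image[rc[0]][rc[1]],
        )

    def servo_step(image, target, gain=0.5):
        """One proportional-control step: a velocity command that moves
        the detected corner toward the target image position."""
        r, c = detect_corner(image)
        return (gain * (target[0] - r), gain * (target[1] - c))

    image = [[0, 0, 0],
             [0, 0, 9],                            # corner response at (1, 2)
             [0, 0, 0]]
    print(servo_step(image, target=(1, 1)))        # -> (0.0, -0.5): pan left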

Some simple examples are the "place tokens" that David Marr (1982) suggested were added to the 'Primal Sketch', recording inferred entities such as a line constituted by the collinear edges of a group of line segments, and the "illusory contours" studied by Gaetano Kanizsa (1974).
http://en.wikipedia.org/wiki/Illusory_contours
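As a rough illustration of the place-token idea (my own simplification, not Marr's actual algorithm), the Python sketch below groups collinear segments and adds a single "virtual line" token spanning them, so that later processing can treat the inferred, gap-bridging line as an entity alongside the raw segments.

    # Rough sketch of Marr-style "place tokens" (my simplification,
    # not Marr's algorithm): collinear segments get one virtual-line token.

    def collinear(seg_a, seg_b, tol=1e-6):
        """True if both endpoints of seg_b lie on the line through seg_a."""
        (ax1, ay1), (ax2, ay2) = seg_a
        def off_line(px, py):                  # 2D cross-product test
            return abs((ax2 - ax1) * (py - ay1) - (ay2 - ay1) * (px - ax1))
        return all(off_line(x, y) < tol for x, y in seg_b)

    def add_place_tokens(segments):
        """Return the raw segments plus a token for each collinear group."""
        tokens, used = [], set()
        for i, s in enumerate(segments):
            if i in used:
                continue
            group = [i] + [j for j in range(i + 1, len(segments))
                           if j not in used and collinear(s, segments[j])]
            used.update(group)
            if len(group) > 1:                 # an inferred, gap-bridging line
                pts = [p for j in group for p in segments[j]]
                tokens.append(("virtual-line", min(pts), max(pts)))
        return segments, tokens

    segs = [((0, 0), (1, 0)), ((3, 0), (4, 0)), ((0, 1), (1, 1))]
    print(add_place_tokens(segs)[1])           # [('virtual-line', (0, 0), (4, 0))]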

Fig Illusory contours

From Sloman 1978, Figure 9.2:
http://www.cs.bham.ac.uk/research/projects/cogaff/crp/#fig9.2

A Kanizsa-inspired example from Chapter 9 of Sloman 1978 is shown above. Here the "illusory contours" are curved, rather than straight as in the original triangle example.

Fig Ambiguous (flipping) pictures

What I am claiming is that interpretations of sensory contents at different levels of abstraction, or using different ontologies, normally assumed to be the function of processes remote from the sensors, can contribute "downward" or "backward" information to the low-level structures recording sensory details. Examples include the two 3D views of the Necker cube, and the two animal views of the Duck-Rabbit picture.

There are also many examples of pictures that are ambiguous because their components can be seen as grouped in different ways, e.g. the well-known vase/faces ambiguous image.

Similar "back-projection" of high level interpretations can be seen in the experiences produced by moving light-points attached to humans performing various actions in the dark, studied by Gunnar Johansson.
http://en.wikipedia.org/wiki/Biological_motion
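All these cases have the same computational shape. To show one form such "back-projection" might take, here is a toy Python sketch, entirely my own construction with invented labels: a single high-level reading re-labels the same low-level picture junctions, as when the Necker cube flips.

    # Toy sketch of "back-projection" (my own construction; all labels
    # invented): one high-level reading re-labels shared 2D junctions.

    JUNCTIONS = ["upper-left-vertex", "lower-right-vertex"]   # same 2D data

    def interpret(junctions, reading):
        """Project a 3D reading 'down' onto the 2D junctions: the same
        junction gets a different depth label under each reading."""
        if reading == "cube-seen-from-above":
            depth = {"upper-left-vertex": "far", "lower-right-vertex": "near"}
        else:                                   # "cube-seen-from-below"
            depth = {"upper-left-vertex": "near", "lower-right-vertex": "far"}
        return {j: depth[j] for j in junctions}

    print(interpret(JUNCTIONS, "cube-seen-from-above"))
    print(interpret(JUNCTIONS, "cube-seen-from-below"))   # same input, flipped labels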


References

M. B. Clowes (1973). Man the creative machine: A perspective from Artificial Intelligence research. In The Limits of Human Nature, ed. J. Benthall, Allen Lane, London.

Gaetano Kanizsa (1974). 'Contours Without Gradients or Cognitive Contours?', Italian Journal of Psychology, 1, pp. 93-112.

David Marr (1982). Vision. W. H. Freeman, San Francisco.

Aaron Sloman (1978). The Computer Revolution in Philosophy: Philosophy, Science and Models of Mind. Harvester Press and Humanities Press. Revised edition (2001--) online:
http://www.cs.bham.ac.uk/research/projects/cogaff/crp/

Aaron Sloman (1986). What Are The Purposes Of Vision? Presented at the Fyssen Foundation Vision Workshop, Versailles, France, March 1986 (organiser: M. Imbert).
http://www.cs.bham.ac.uk/research/projects/cogaff/81-95.html#58

Aaron Sloman and colleagues (1991--). Introductory overview of the Birmingham Cognition and Affect Project:
http://www.cs.bham.ac.uk/research/projects/cogaff/#overview

Arnold Trehub (1991). The Cognitive Brain. MIT Press, Cambridge, MA.
http://people.umass.edu/trehub/


Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham