Preprint of invited chapter in a book on Information and
Computation published by World Scientific, 2011
Editors:
Gordana Dodig-Crnkovic (Mälardalen University, Sweden) and
Mark Burgin (UCLA, USA)

Full details:
http://www.cs.bham.ac.uk/research/projects/cogaff/09.html#905
Available also as pdf:
http://www.cs.bham.ac.uk/research/projects/cogaff/sloman-inf-chap.pdf

NOTES:
1. Re-formatted and slightly modified on 16 Jul 2015. Further corrections 13 Nov 2017.
2. Notes on Jane Austen's concept/theory of information, illustrated with examples from Pride and Prejudice, and contrasted with Claude Shannon's concept of information, can be found here:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/austen-info.html
3. After this paper was published some of the ideas were developed much further in the formulation of the Meta-Morphogenesis project, especially the theory of evolved construction kits:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/meta-morphogenesis.html
Contents
1 Introduction
1.1 The need for a theory
1.2 Is biological information-processing special?
1.3 Questions seeking answers
2 Uses of the word "information"
2.1 Confusions
2.2 This is not "information" in Shannon's sense
2.3 Misguided definitions
2.4 The world is NOT the best representation of itself
2.5 Disagreements about information bearers, representations
2.6 Computation and information
2.7 Not all information is true
3 Is "information" as used here definable?
3.1 The inadequacy of explicit definitions
3.2 Concepts implicitly (partially) defined by theories using them
3.3 Evaluating theories, and their concepts
3.4 The failure of concept empiricism and symbol-grounding theory
4 Information-bearers, information contents.
4.1 Users, bearers, contents, contexts - physical and virtual
4.2 Changing technology for information-bearers
4.3 A common error about bit patterns and symbols
4.4 Many forms of representation
4.5 "Self-documenting" entities
5 Aspects of information
5.1 Information content and function
5.2 Medium used for information bearer
5.3 Same content, but different function
5.4 Processing requirements for different media
5.5 Potential information content for a user
5.6 Potential information content for a TYPE of user
5.7 Information content shared between users
5.8 Ambiguity, noise, and layers of processing
5.9 Information content for a user determined partly by context
5.10 Information-using subsystems
5.11 Layers of interpretation in epigenesis
6 Conclusion
6.1 An implicitly defined notion of "information"
6.2 Life and information
6.3 Information processing in virtual machines
6.4 Finally: Is that everything?
Acknowledgements
Footnotes
References
The question "What is information?", like "What is matter?" and "What is energy?", cannot have a simple answer in the form of a non-circular definition. Answering such a question involves answering a host of related questions. Answers to the second and third cannot be given without presenting deep and complex theories about how the physical universe works. The theories, along with links to experimental methods, instruments and observation techniques, provide the only kind of definition possible for many of the concepts used in the physical sciences: implicit definition.
Moreover, the answers are always subject to the possibility of being revised or extended, as the history of physics shows clearly: old concepts may be gradually transformed as the theories in which they are embedded are expanded and modified - sometimes with major discontinuities, as happened to concepts like "matter", "energy" and "force" in the work of Newton and Einstein, for example. Lesser transformations go with improved instruments and techniques for observation, measurement, and testing predictions. So concepts have a continuing identity through many changes, like rivers, growing organisms, nations, and many other things. See [Cohen 1962] and [Schurz 2009].
Information cannot play a role in any process unless there is something that encodes or expresses the information: an "information bearer" (B), and some user (U) that takes B to express information I (i.e. interprets B). The same bearer B may be interpreted differently by different users, and the same user U may interpret B differently in different contexts (C). We need a theory that explains the different ways in which a bearer B can express information I for U in context C, and what that means. I shall henceforth use "representation" to refer to any kind of information bearer, and will later criticise some alternative definitions, in Section 2.3.
Such a theory will have to mention different kinds of information-users and information-bearers (physical and non-physical), as well as different kinds of information content, and the different ways information-bearers can be related to the information they carry, often requiring several layers of interpretation, as we'll see. The theory will also have to survey varieties of information users, with different sorts of information processing architectures, interacting with different sorts of environment, using information-bearers (representations) that have different structures, and use different media (physical and non-physical).
Questions to be addressed include: What are the requirements for U to treat B as expressing a meaning or referring to something? What are the differences between things that merely manipulate symbolic structures and things that also understand and make use of information they associate with those structures, for example, deriving new information from them, or testing the information for consistency? Compare [Searle 1980].
Why is a simple explicit definition for "information" impossible? Is it like some older scientific concepts, not explicitly definable, but implicitly definable by developing powerful explanatory theories that use the concept? Is information something that should be measurable as energy and mass are, or are its features mainly structures to be described not measured (e.g. the structure of this sentence, the structure of a molecule, the structure of an organism)? How does this (centuries old) notion of information (or meaning) relate to the more recent concept of information as something measurable? [Shannon 1948]
Are there conservation laws for information, or is that idea refuted by the fact that one user can give information to another without losing any? Moreover, it is even possible for me to say something that gives you information I did not have. (Compare the role of relay switches in electrical power circuits.)
This document attempts to give partial answers to these questions, and to specify requirements for more complete answers. I shall attempt to sum up what I think many scientists and engineers in many disciplines, and also historians, journalists, and lay people, are talking about when they talk about information, as they increasingly do, even though they don't realise precisely what they are doing. For example the idea of information pervades many excellent books about infant development, such as [Gibson&Pick 2000], without being explicitly defined. I shall try to explain how a good scientific theory can implicitly define its main theoretical concepts, and will sketch some of the main features of a theory of the role of information in our universe. A complete theory would require many volumes. In several other papers and presentations cited below, I have presented some of these ideas in more detail.
Some philosophers talk about "propositional content" but the normal interpretation of that phrase rules out information expressed in non-propositional forms, such as the information in pictures, maps, videos, gestures, and perceptual systems. So I shall stick to the label "information", and attempt to explain how it is used in many everyday contexts and also in scientific (e.g. biological) contexts. The word is also used in this sense in engineering, in addition to being used in Shannon's sense, discussed further in Section 2.2.
The phrase "semantic information" is as pleonastic as the phrase "young youths", since information, in the sense under discussion, is semantic. It is sometimes useful to contrast syntactic information with semantic information, where the former is about the form or structure of something that conveys information, whereas the semantic information would be about the content of what is said. ("Content" is metaphorical here.) For instance, saying that my sentences often have more than eight words gives syntactic information about my habits, whereas saying that I often discuss evolution or that what I say is ambiguous or unoriginal gives semantic information, or, in the latter case, meta-semantic information.
Likewise, we provide syntactic information about a programming language (e.g. how it uses parentheses) or semantic information (e.g. about the kinds of structure and transformations of structure that it can denote). We can distinguish the "internal" semantics of a programming language (the internal structures and processes the programs specify) from its "external" semantics, e.g. its relevance to a robot's environment, or to a company's employees, salaries, jobs, sales, etc.
There is another, more recent, use of the word "information" in the context of Shannon's "information theory" [Shannon 1948]. But that does not refer to what is normally meant by "information" (the topic of this paper), since Shannon's information is a purely syntactic property of something like a bit-string, or other structure that might be transmitted from a sender to a receiver using a mechanism with a fixed repertoire of possible messages. If a communication channel can carry N bits then each string transmitted makes a selection from 2^N possible strings. The larger N is, the more alternative possibilities are excluded by each string actually received. In that syntactic sense longer strings carry more "information".
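To make the arithmetic concrete, here is a minimal sketch in Python (an illustration of mine, not part of the original chapter) of the purely syntactic measure just described: an N-bit channel distinguishes 2^N possible strings, and receiving one of them corresponds to log2 of that number of bits of Shannon information.

import math

def distinct_messages(n_bits):
    # An N-bit channel can carry 2^N distinct bit-strings.
    return 2 ** n_bits

def shannon_bits(n_alternatives):
    # Syntactic "information" of selecting one of n equally likely alternatives.
    return math.log2(n_alternatives)

for n in (1, 8, 16):
    m = distinct_messages(n)
    print(f"{n}-bit channel: {m} possible strings; one received string "
          f"excludes {m - 1} alternatives ({shannon_bits(m):.0f} bits in Shannon's sense)")

Nothing in this calculation refers to what any of the strings mean: it concerns only the number of alternatives excluded.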
Likewise the information capacity of a communication channel can be measured in terms of the number of bits it can transfer in parallel, and the measure can be modified to take account of noise, etc. Shannon was perfectly aware of all this. He wrote
"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem."[Shannon 1948]. [My emphasis.]
It is worth noting that although he is talking about an engineering problem of reproducing a message exactly, doing that is not what most human communication is about. If you ask me a question, my answer may fill a gap in your information, allowing you to make inferences that I could not make. Both of us may know that, and that could be the intention of my answer. On a noisy phone line that could happen if you knew in advance that the answer was either "elephant" or "fly". If I say "fly" and you hear "spy", the fact that my precise message was not transmitted accurately does not matter: you can tell that I did not say "elephant", and proceed accordingly.
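That point can be given a toy illustration (my own sketch, using a standard edit-distance measure, not anything proposed by Shannon or in this chapter): a receiver who already knows that the answer must be either "elephant" or "fly" can recover the intended content even from a corrupted signal such as "spy".

def edit_distance(a, b):
    # Levenshtein distance computed by dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def interpret(received, expected_answers):
    # Pick the expected answer closest to the noisy received string.
    return min(expected_answers, key=lambda ans: edit_distance(received, ans))

print(interpret("spy", ["elephant", "fly"]))   # -> 'fly'

What is recovered here depends on the receiver's prior information and the context of the question, not merely on the fidelity of signal transmission.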
A pupil's questions or comments may give a teacher information that the pupil would not understand, e.g. about how to continue a lesson. So communication in intelligent systems depends on, but is far more than, mere signal transmission. It also uses context, general knowledge of the world, more or less sophisticated interpretation mechanisms, and reasoning capabilities. Shannon's work is summarised, with strong warnings about extending it beyond the context of electromechanical signal transmission in [Ritchie 1986].
Having a measurable amount of information in Shannon's sense does not, in itself, allow a string to express something true or false, or to contradict or imply something else in the ordinary senses of "contradict" or "imply", or to express a question or command. Of course, a bit string used in a particular context could have these functions. E.g. a single bit could express a "yes" or "no" answer to a previously asked question, as could a "continue" or "stop" command. In some contexts, that single bit may indirectly convey a great deal of information. "Is everything Fred wrote in his letter true?" "Yes."
[Bateson 1972] describes "a bit of information" and later "the elementary unit of information" as "a difference that makes a difference".1 This is widely misquoted as offering a definition of "information" rather than a definition of "a bit/unit of information". He seems to be thinking of any item of information as essentially a collection of "differences" that are propagated along channels. This is far too simplistic - and perhaps too influenced by low level descriptions of computers and brains.
An alternative approach is to define "information" implicitly by a complete theory, as happens for many scientific concepts. This paper attempts to present substantial portions of such a theory, though the task is not completed. Section 3.2 explains how theories can implicitly define the concepts they use, and Section 6.1 relates this to defining "information".
What it means for B to express I for U in context C cannot be given any simple definition. Some people try to define this by saying U uses B to "stand for" or "stand in for" I. For instance, Webb writes "The term 'representation' is used in many senses, but is generally understood as a process in which something is used to stand in for something else, as in the use of the symbol 'I' to stand for the author of this article" [Webb 2006]. This sort of definition of "representation" is either circular, if standing for is the same thing as referring to, or else false, if standing in for means "being used in place of". There are all sorts of things you can do with information that you would never do with what it refers to and vice versa. You can eat food, but not information about food.
Even if you choose to eat a piece of paper on which "food" is written that is usually irrelevant to your use of the word to refer to food. Information about X is normally used for quite different purposes from the purposes for which X is used. For example, the information can be used for drawing inferences, specifying something to be prevented, or constructed, and many more. Information about a possible disaster can be very useful and therefore desirable, unlike the disaster itself.
So the notion of standing for, or standing in for, is the wrong notion to use to explain information content. It is a very bad metaphor, even though its use is very common. We can make more progress by considering ways in which information can be used. If I give you the information that wet weather is approaching, you cannot use the information to wet anything. But you can use it to decide to take an umbrella when you go out, or, if you are a farmer, you may use it as a reason for accelerating harvesting. The falling rain cannot be so used: by the time the rain is available it is too late to save the crops.
The same information can be used in different ways in different contexts or at different times. The relationship between information content and information use is not a simple one.
Herbert Simon pointed out long ago [Simon 1969] that sometimes the changes made to the environment while performing a task can serve as reminders or triggers regarding what has to be done next, giving examples from insect behaviours. The use of stigmergy, e.g. leaving tracks or pheromone trails or other indications of travel, which can later be used by other individuals, shows how sometimes changes made to the environment can be useful as means of sharing information with others. Similarly if you cannot be sure whether a chair will fit through a doorway you can try pushing it through, and if it is too large you will fail, or you may discover that it can go through only if it is rotated in some complex way.
The fact that intelligent agents can use the environment as a store of information, as a source of information, or as part of a mechanism for reasoning or inferring does not support the slogan that the world, or any part of it, is always - or even in those cases - the best representation of itself: (a) because the slogan omits the role of the information-processing in the agent making use of the environment, and (b) because it is sometimes better to have specific instructions, a map, a blue-print or some other information structure that decomposes information in a usable way than to have to use the portion of the world represented, as anyone learning to play the violin simply by watching a violinist will discover.
In general, information about X is something different from X itself. Reasons for wanting or for using information about X are different from the reasons for wanting or using X. E.g. you may wish to use information about X in order to ensure that you never get anywhere near X if X is something dangerous. You may wish to use information about Xs to destroy Xs, but if that destroyed the information you would not know how to destroy the next one until you are close to it. It may then be too late to take necessary precautions, about which you had lost information.
[Dreyfus 2002] wrote "The idea of an intentional arc is meant to capture the idea that all past experience is projected back into the world. The best representation of the world is thus the world itself." As far as I can make out he is merely talking about expert servo control, e.g. the kind of visual servoing which I discussed in [Sloman 1982]. But as any roboticist knows, and his own discussion suggests, this kind of continuous action using sensory feedback requires quite sophisticated internal information processing [Grush 2004]. In such cases "the world" is not nearly enough.
Brooks repeatedly emphasises the need to test working systems on the real world and not only in simulation, a point that has some validity but can be over-stressed. (If aircraft designers find it useful to test their designs in simulation, why not robot designers?) Moreover, he disputes the need for representations (information bearers constructed and manipulated by information users), saying: "We hypothesize (following Agre and Chapman) that much of even human level activity is similarly a reflection of the world through very simple mechanisms without detailed representations," and "We believe representations are not necessary and appear only in the eye or mind of the observer." A critique of that general viewpoint is presented in [Sloman 2009c], which mostly deals with [Brooks 1990], in which he goes further:
"The key observation is that the world is its own best model. It is always exactly up to date. It always contains every detail there is to be known. The trick is to sense it appropriately and often enough."
That's impossible when you are planning the construction of a skyscraper using a new design, or working out the best way to build a bridge across a chasm, or even working out the best way to cross a busy road, which you suspect has a pedestrian crossing out of sight around the bend. The important point is that intelligence often requires reasoning about what might be the case, or might happen, and its consequences: and that cannot be done by inspecting the world as it is. Recall that information bearers and things they represent have different uses (Section 2.3).
Some people, for example the philosopher Fred Dretske, in his contribution to [Floridi 2008], claim that what we ordinarily mean by "information" in the semantic sense is something that is true, implying that it is impossible to have, provide or use false information. False information, on that view, can be compared with the decoy ducks used by hunters. The decoys are not really ducks, though some real ducks may be deceived into treating the decoys as real - to their cost! Likewise, argues Dretske, false information is not really information, even though some people can be deceived into treating it as information. It is claimed that truth is what makes information valuable, and that therefore anything false would be of no value.
Whatever the merits of this terminology may be for some philosophers, the restriction of "information" to what is true is such a useless encumbrance that it would force scientists and robot designers (and philosophers like me) to invent a new word or phrase that had the same meaning as "information" but without truth being implied. For example, a phrase something like "information content" might be used to refer to the kind of thing that is common to my belief that the noise outside my window is caused by a lawn-mower, and my belief that the noise in the next room is caused by a vacuum cleaner, when the second belief is true while the first belief is false because the noise outside comes from a hedge trimmer.
The observation that humans, other animals and robots, acquire, manipulate, interpret, combine, analyse, store, use, communicate, and share information, applies equally to false information and to true information, or to what could laboriously be referred to as the "information content" that can occur in false as well as true beliefs, expectations, explanations, and percepts, and moreover, can also occur in questions, goals, desires, fears, imaginings, hypotheses, where it is not known whether the information content is true.
So in constructing the question "Is that noise outside caused by a lawnmower?", a speaker can use the same concepts and the same modes of composition of information as are used in formulating true beliefs like: "Lawnmowers are used to cut grass", "Lawnmowers often make a noise", "Lawnmowers are available in different sizes", as well as many questions, plans, goals, requests, etc. involving lawnmowers. Not only true propositions are valuable: all sorts of additional structures containing information are useful.
Even false beliefs can be useful, because by acting on them you may learn that they are false, why they are false, and gain additional information. That's how science proceeds and much of the learning of young children depends heavily on their ability to construct information contents without being able to tell which are true and which are false. The learning process can then determine the answers. This will also be important for intelligent robots.
For the purposes of cognitive science, neuroscience, biology, AI, robotics and many varieties of engineering, it is important not to restrict the notion of "information" to what is true, or even to whole propositions that are capable of being true or false. There are information fragments of many kinds that can be combined in many ways, some, but not all, of which involve constructing propositions. Information items can be used in many other processes.
The uses of information in control probably evolved before other uses of information in biological organisms, including, for example, microbes. Explaining how and why other uses evolved, such as forming memories, predictions, questions and explanations, along with increasingly sophisticated mechanisms to support them, is a task for another occasion. Some hypotheses are sketched in [Sloman 2007a].
After many years of thinking about this, I have concluded that "information" in this sense cannot be explicitly defined without circularity. The same is true of "mass", "energy" and other deep concepts used in important scientific theories. Attempts to define "information" by writing down an explicit definition of the form "Information is ...." all presuppose some concept that is closely related ("meaning", "content", "reference", "description", etc.). "Information is meaning", "information is semantic content", "information is what something is about" are all inadequate in this sense.
This kind of indefinability is common in concepts needed for deep scientific theories. Attempts to get round this by "operationalising" theoretical concepts fail. For example, there are standard methods of measuring mass and energy, but those do not define the concepts, since the measuring methods change as technology develops, while the meanings of the words remain mostly fixed by their roles in physical theories. The measurement methods define what are sometimes called "bridging rules" or "correspondence rules", which link theories to observations and applications. [Carnap 1947] called some of them "meaning postulates". All this was known to early 20th century philosophers of science, some of whom had tried unsuccessfully to show that scientific concepts are definable in terms of the sensory experiences of scientists, or in terms of "operational definitions" specifying how to detect or measure physical quantities [Bridgman 1927].
The absence of any explicit definition does not mean either that a word is meaningless or that we cannot say anything useful about it. The specific things said about what energy is and how it relates to force, mass, electrical charge, etc., change over time as we learn more, so the concepts evolve. Newton knew about some forms of energy, but what he knew about energy is much less than what we now know about energy, e.g. that matter and energy are interconvertible, and that there are chemical and electromagnetic forms of energy. Growing theoretical knowledge extends and deepens the concepts we use in expressing that knowledge [Cohen 1962], [Schurz 2009]. That is now happening to our concept of information as we learn more about types of information-processing machine, natural and artificial.
If a theory is expressed logically, and is not logically inconsistent, and its undefined concept labels are treated as variables ranging over predicates, relations and functions, then there may be a non-empty set of possible models for the set of statements expressing the theory, where the notion of something being a model is illustrated by lines, points, and relations between them being a model for a set of axioms for Euclidean geometry, and also certain arithmetical entities being a model for the same axioms.
This notion of model was first given a precise recursive definition by Tarski but the idea is much older, as explained in [Sloman 2007c]. I think the core idea can be generalised to theories expressed in natural language and other non-logical forms of representation including non-Fregean forms of representation, but making that idea precise and testing it are research projects (compare [Sloman 1971]). The models that satisfy some theory with undefined terms will include possible portions of reality that the theory could describe.
Insofar as there is more than one model, the meanings of the terms are partly indeterminate, an unavoidable feature of scientific theories. [Sloman 1978,Chap 2] explains why it is not usually possible to completely remove indeterminacy of meaning. Compare [Cohen 1962].
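The point about multiple models can be illustrated with a small sketch (my illustration, not part of the original text): two quite different structures - numbers ordered by "less than", and strings ordered by "is a proper prefix of" - both satisfy the same axioms (irreflexivity and transitivity) when checked over finite samples, so those axioms alone leave the intended interpretation partly indeterminate.

from itertools import product

def satisfies_strict_order(domain, rel):
    # Check irreflexivity and transitivity of rel on a finite sample domain.
    irreflexive = all(not rel(x, x) for x in domain)
    transitive = all(rel(x, z)
                     for x, y, z in product(domain, repeat=3)
                     if rel(x, y) and rel(y, z))
    return irreflexive and transitive

# Model 1: numbers under "less than"
print(satisfies_strict_order(range(5), lambda x, y: x < y))   # True

# Model 2: strings under "is a proper prefix of"
words = ["a", "ab", "abc", "b", "ba"]
print(satisfies_strict_order(words, lambda x, y: y.startswith(x) and x != y))   # True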
Adding new independent postulates using the same undefined terms will further constrain the set of possible models. That is one way to enrich the content of a theory. Another way is to add new undefined concepts and new hypotheses linking them to the old ones. That increases the complexity required of a piece of reality if it is to be a model of the theory. Other changes may alter the set of models and increase the number of things that are derivable from the theory, increasing the variety of predictions.
Some changes will also increase the precision of the derived conclusions, e.g. specifying predicted processes or possible processes in more detail. Adding new "meaning postulates", or "bridging rules", linking undefined terms to methods of measurement or observation, as explained above, can also further constrain the set of possible models, by "tethering" (label suggested in [Chappell & Sloman 2007]) the theory more closely to some portion of reality. As science progresses and we learn more things about energy, the concept becomes more constrained - restricting the possible models of the theory, as explained in [Sloman 2007c]. This gradual increase in understanding would not be possible if the initial concepts were fully determinate. Far from requiring absolutely precise concepts, as normally supposed, some scientific advances depend on (partial) indeterminacy of concepts.
Doubt is cast on the value of a theory and its concepts if the theory does not enhance our practical abilities, if it doesn't explain a variety of observed facts better than alternative theories, if all its predictions are very vague, if it never generates new research questions that lead to new discoveries of things that need to be explained, if its implications are restricted to very rare situations, and if it cannot be used in making predictions, or selecting courses of action to achieve practical goals, or in designing and steadily improving useful kinds of machinery. In such cases, the concepts implicitly defined by the theory will be limited to reference within the hypothetical world postulated by the theory. Concepts like "angel" and "fairy" are examples of such referentially unsuccessful concepts, though they may be used to present myths of various sorts, providing entertainment and, in some cases, social coercion.
These ideas about concepts and theories were elaborated in [Sloman 1978,Chap 2], which pointed out that the deepest advances in science are those that extend our ontology substantively, including new theories that explain possibilities not previously considered. How concepts can be partly defined implicitly by structural relations within a theory is discussed further in [Sloman 1985,Sloman 1987]. These ideas can be extended to non-logical forms of representation, as discussed in [Sloman 2008b].
Unfortunately, the already discredited theory of concept empiricism was recently reinvented and labelled "symbol grounding theory" [Harnad 1990]. This theory seems highly plausible to people who have not studied philosophy, so it has spread widely among AI theorists and cognitive scientists, and is probably still being taught to unsuspecting students. Section 3.2 presented "symbol tethering" theory, according to which meanings of theoretical terms are primarily determined by structural relations within a theory, supplemented by "bridging rules". Designers of intelligent robots will have to produce information-processing architectures in which such theories can be constructed, extended, tested and used, by the robots, in a process of acquiring information about the world, and themselves.
Marvin Minsky [Minsky 2005] also talks about "grounding", but in a context that neither presupposes nor supports symbol-grounding theory. He seems to be making a point I agree with, namely that insofar as complex systems like human minds monitor or control themselves, the subsystem that does the monitoring and controlling needs to observe and intervene at a high level of abstraction instead of having to reason about all the low level details of the physical machine. In some cases, this can imply that the information that such a system has about itself is incomplete or misleading. I.e. self-observation is not infallible, except in the trivial sense in which a voltmeter cannot be misled about what its reading of a voltage is, as explained in [Sloman 2007b].
The rest of this paper attempts to outline some of the main features of a theory about roles information can play in how things work in our world. The theory is still incomplete but we have already learnt a lot and there are many possible lines of development of our understanding of information processing systems in both natural and artificial systems.
The expressed information can be involved in many processes, for instance: acquiring, transforming, decomposing, combining with other information, interpreting, deriving, storing, inferring, asking, testing, using as a premiss, controlling internal or external behaviour, and communicating with other information-users. Such processes usually require U to deploy mechanisms that have access to B, to parts of B, and to other information-bearers (e.g. in U's memory or in the environment).
The existence of information-bearers does not depend on the existence of what they refer to: things can be referred to that do not exist. Mechanisms for this were probably a major advance in biological evolution. Example information-bearers explicitly used by humans include sentences, maps, pictures, bit-strings, video recordings, or other more abstract representations of actual or possible processes.
At present little is known about the variety of information bearers in biological systems, including brains, though known examples include chemical structures and patterns of activation of neurons. In some cases the information-bearers are physical entities, e.g. marks on paper or acoustic signals, or chemicals in the blood stream. But many information-bearers in computing systems, e.g. lists of symbols, the text in a word-processor, are not physical entities but entities in virtual machines (see Section 6.3).
The use of virtual machines in addition to physical machines has many benefits for designers of complex information processing systems. [Sloman 2009f] argues that evolution produced animals that use virtual machines containing information bearers, for similar reasons. The problem of explaining what information is includes the problem of how information can be processed in virtual machines, natural or artificial. (In this context, the word "virtual" does not imply "unreal"2.)
The bearer is a physical or virtual entity (or collection of entities) that encodes or expresses the information, for that user in that context. Many people, in many disciplines, now use the word "representation" to refer to information-bearers of various kinds, though there is no general agreement on usage. Some who argue that representations are not needed proceed to discuss alternatives that are already classified as representations by broad-minded thinkers. Such factional disputes are a waste of time.
These are typically constructed from various primitive entities and relationships available in virtual machines, though they are all ultimately implemented in bit-patterns, which are themselves virtual entities implemented in physical machines using transistors, magnetic mechanisms in disc drives, etc. The use of such things as error-correcting memories and RAID arrays implies that the bits in a bit pattern are virtual entities that do not correspond in any simple way to physical components.
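A minimal sketch (illustrative only, not drawn from the chapter) of why such bits are virtual entities: in a triple-redundancy memory each logical bit is stored as three physical cells and read back by majority vote, so no single physical component is "the" bit, and the logical bit can survive a physical fault.

def write_bit(bit):
    # Store one logical bit as three physical cells (triple redundancy).
    return [bit, bit, bit]

def read_bit(cells):
    # Recover the logical bit by majority vote, masking a single physical fault.
    return 1 if sum(cells) >= 2 else 0

memory = write_bit(1)    # physical state: [1, 1, 1]
memory[0] = 0            # a physical fault flips one cell: [0, 1, 1]
print(read_bit(memory))  # the logical (virtual) bit is still 1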
This use of bit-patterns as a form of representation is relatively recent, although Morse code, which is older, is very close. Long before that, humans were using language, diagrams, gestures, maps, marks in the sand, flashing lights, etc. to express information of various kinds [Dyson 1997]. And before that animal brains used still unknown forms of representation to encode information about the environment, their motives, plans, learnt generalisations, etc.[Sloman 1979, Sloman 2008b]. It is arguable that all living organisms acquire and use information, both in constructing themselves and also in controlling behaviour, repairing damage, detecting infections, etc.3
Information-bearers need not be intentionally constructed to convey information. For example, an animal may hear a sound and derive the information that something is moving nearby. The original information-bearer is a transient acoustic signal in the environment produced unintentionally by whatever moved. The hearer constructs an enduring information-bearer (representation) that may be retained long after the noise has ended. The physical signal does not intrinsically carry that information, though for a particular user it may do so as a result of prior learning. However, in a different context, the same noise may be interpreted differently.
So the association between bearer and information content can depend not only on user but on context: information (or meaning) involves at least a four-termed relation involving B, I, U, and C.
What characterises a form of representation is a collection of primitives, along with ways of modifying them, combining them to form larger structures, transformations that can be applied to the more complex items, mechanisms for storing, matching, searching, and copying them, and particular uses to which instances of the form can be put, e.g. controlling behaviour, searching for plans, explaining, forming generalisations, interpreting sensory input, expressing goals, expressing uncertainty, and communication with others. The representing structures may be physical objects or processes, or objects or processes in virtual machines.
The use of virtual machine forms of representation allows very rapid construction and modification of structures without having to rearrange physical components. In computers, instead of physical rearrangements, there are merely banks of switches that can be turned on and off, thereby changing the virtual network topology and the signals transmitted, in terms of which higher-level virtual machine representations can be implemented.
Humans often use forms that are Fregean [Sloman 1971] insofar as they use application of functions to arguments to combine information items to form larger information items. Examples include sentences, algebraic expressions, logical expressions and many expressions in computer programs. Purely Fregean forms of representation use only function application, whereas impure forms also use spatial or temporal order, and other relationships in the bearer's medium, as [Bateson 1972] noted. For example, the programming language Prolog treats the ordering of symbols, as well as the function-argument relationship, as significant.
The 1971 paper argued, against [McCarthy & Hayes 1969], that non-Fregean forms of representation, e.g. analogical representations, are often useful, and should be used in AI alongside logic and algebra. For example, information may usefully be expressed in continuously changing levels of activation of some internal or external sensing device, in patterns of activation of many units, in geometrical or topological structures analogous to images or maps, in chemical compounds, and many more. Despite some partial successes, this has proved easier said than done.
Exactly how many different forms exist in which information can be encoded, and what their costs and benefits are, is an important question that will not be discussed further here. One of the profound consequences of developments in metamathematics, computer science, artificial intelligence, neuroscience and biology in the last century has been to stretch our understanding of the huge variety of possible forms of representation [Peterson 1996], including some forms that are not decomposable into discrete components, as sentences, logical expressions, and bit strings are, and some which can also change continuously, unlike Fregean representations.
Besides analogical and Fregean forms of representation many others have been explored, including distributed neural representations and forms of genetic encoding. [Minsky 1992] discusses tradeoffs between some symbolic and neural forms. There probably are many more forms of representation (more types of information-bearer) than we have discovered so far. Some philosophers use the misleading expression "non-conceptual content" to refer to some of the non-Fregean forms of representation - misleading because it presupposes that concepts (units of semantic content) can only be used in propositional formats.
We can achieve greater generality by using the label "concept" wherever there are re-usable information components that can be combined with others in different ways whether in propositions, instructions, pictures, goal specifications, action-control signals, or anything else.4
Obviously, a representation may convey different information to different users, and nothing at all to some individuals (e.g. humans listening to a foreign language). Moreover, the very same information-bearer can convey different information to the same user at different times, in different contexts, for example, indexical expressions, marks in the sand, shadows, etc. (Further examples and their implications are discussed below in Section 5.9 and in [Sloman 2006b].)
The continued investigation of the space of possible forms of representation, including the various options for forming more complex information contents from simpler ones, and the tradeoffs between the various options, is a major long term research project. This paper is mostly neutral as regards the precise forms in which information can be encoded.
Different information users can take in and use different subsets or impoverished forms of that information, depending on their sensory apparatus, their information processing architecture, the forms of representation they are able to use, the theories they have, and their location in relation to the twig. (Compare the notion of "intrinsic information" in [Reading 2006].)
Besides the "categorical" information about the parts, relationships, properties, and material constitution of an object or process that can be discovered by an appropriately equipped perceiver, there is also less obvious "dispositional" information about processes it could be part of, processes that it constrains or prevents, and processes that could have produced it. These are causal relationships. Intelligent perceivers make a great deal of use of such information when they perceive affordances of various kinds.
Gibson's notion of "affordance" [J.J. Gibson 1979] focuses on only a subset of possible processes and constraints, namely those relevant to what a perceiver can and cannot do: action-affordances for the perceiver. We need to generalise that idea if we are to describe all the different kinds of information a perceiver can use in the environment, including proto-affordances, concerned with which processes are and are not physically possible in the environment; epistemic affordances, concerned with what information is and is not available; and vicarious affordances, concerned with affordances for other agents, all described in [Sloman 2008a]. Some animals are able to represent meta-affordances: information about ways of producing, modifying, removing, or acquiring information about, affordances of various kinds.
Information-users will typically be restricted in the kinds of information they can obtain or use, and at any time they will only process a subset of the information they could process. They will typically not make use of the majority of kinds of information potentially available. For instance, detailed, transient, metrical information about changing relationships will be relevant during performance of actions such as grasping, placing, catching or avoiding, but only more abstract information will be relevant while future actions are being planned, or while processes not caused by the perceiver are being observed [Sloman 1982].
States of an information-processing system (e.g. the mind of an animal or robot) are generally not just constituted by what is actually occurring in the system but by what would or could occur under various conditions - a point made long ago in [Ryle 1949].
The information-processing mechanisms and forms of representation required for perceivers to acquire and use information about actual and possible processes and causal relationships are not yet understood. Most research on perception has ignored the problem of perceiving processes, and possibilities for and constraints on processes, because of excessive focus on perceiving and learning about objects.
It is possible for the same information content (e.g. that many parents abuse their children by indoctrinating them) to be put to different uses. E.g. it can be stated, hypothesised, denied, remembered, imagined to be the case, inferred from something, used as a premiss, used to explain, used to motivate political action, and many more. Those could all be labelled "declarative" uses of information. An item of declarative information can be true or false, and can imply, contradict, or be derived from, other items of factual information. It can also provide an answer (true or false) to a question, or a description of what needs to be achieved for an item of control information to be successful, e.g. for a command to be obeyed.
The same content can also occur in other information uses, e.g. "interrogative" and "imperative" uses: formulating requests for information and specifying an action to be performed (or modified, terminated, suspended or delayed, etc.), for instance asking whether it is the case or exhorting people to make it false by changing their ways. An important use that is hard to specify is in conditionalising some other information content, which could be a statement, intention, command, question, prediction. Examples: "If it's raining take an umbrella", "If it's raining, why aren't you wet?" There is usually no commitment regarding truth or falsity of the condition, in such uses.
Like questions, imperative uses of information are not true or false, though particular processes can be said to follow or not follow the instructions. Just as some declarative information contents are inconsistent, and therefore incapable of being true, likewise, some instructions are inconsistent, and therefore impossible to execute (e.g. "Put seven balls into an empty box and put red marks on ten of them").
From the earliest days of AI and software engineering it was clear that choice of form of representation could make a large difference to the success of a particular information-processing system. Different expressive media can be used for the various functions: vocal utterances, print, internet sites, use of sign language, political songs, etc. The same content expressed in print could use different fonts, or even entirely different languages. But some information contents cannot be adequately expressed in some media, e.g. because, as J. L. Austin once quipped: "Fact is richer than diction" [Austin 1956]. Some kinds of richness are better represented in a non-Fregean medium, e.g. using static or moving images, or 3-D models.
A pre-verbal child, or a non-human animal, can have percepts whose content specifies a state of affairs in the environment; and can have intentions whose content specifies some state of affairs to be achieved, maintained or prevented. It is unlikely that toddlers, dogs, crows, and apes use only linguistic or Fregean forms of representation, though there are many unanswered questions about exactly which other forms or media are possible.
Many information-bearers use static media, like sentences, pictures, or flowcharts, whereas some use dynamic media, in which processes are information-bearers, e.g. audio or video recordings, gestures, play acting, and others. If the dynamic representation is repeatedly produced it may be represented by some enduring static structure that is used to generate the dynamic process as needed - e.g. a computer program can repeatedly generate processes. I suspect the role of dynamic information-bearers and static encodings of dynamic information-bearers, in animal intelligence, and future intelligent robots, will turn out to be far more important than anyone currently realises, not least because much information about the environment is concerned with processes occurring, and processes that could occur.
Earlier, in 4.5, we mentioned self-documenting entities, which potentially express information for various kinds of information user simply in virtue of their structure, properties and relations. These information bearers do not depend for their existence on users. They can be contrasted with the sensory signals and other transient and enduring information bearers constructed by information users. An element of truth in the view of Brooks criticised above (2.4) is that in some cases the presence of self-documenting entities reduces (but does not eliminate) the need for an information user to construct internal representations. Moreover, during performance of actions, force-feedback and visual feedback can be used to provide fine-grained control information that reduces the reliance on ballistic control, which may be inaccurate.
Another way of putting the point about control using feedback is that the changing relationships to external objects produced when performing physical actions can be useful self-documenting aspects of the environment, helping with control. They can also be useful for other observers (friendly or unfriendly!) who can perceive the actions and draw conclusions about the intentions and motives of the agent - if the viewers have appropriate meta-semantic information-processing capabilities. In that sense, intentional actions can serve as unintended communications, and it is conjectured in [Sloman 2008b] that this fact played a role in the evolution of languages used intentionally.
Items of information with the same declarative content can be given different functional roles in an information user. For example, the same thing can be stated to be true and either asked about or commanded to be made or kept true. It can also be wondered about, hypothesised, imagined regretfully, treated as an ideal, etc.
The philosopher R.M. Hare [Hare 1952] introduced the labels "Phrastic" and "Neustic" to distinguish the semantic content of an utterance and the speech act being performed regarding that content, e.g. asserting it, denying it, enquiring about its truth value, commanding that it be made true, etc. The concept of "information content" used here is close to Hare's notion of a "Phrastic", except that we are not restricting semantic content to what can be expressed in a linguistic or Fregean form: other media, including maps, models, diagrams, route-summaries, flow-charts, builders' blue-prints, moving images, 3-D models, and other things, can all encode information contents usable for different functions.
Moreover, not all uses are concerned with communication between individuals: information is processed in perceiving, learning, wanting, planning, remembering, deciding, etc. [Sloman 1979,Sloman 2008b]. We therefore need to generalise the Phrastic/Neustic distinction to contrast content and function in many different information media, including information expressed in diagrams, maps, charts [Sloman 1971], and also whatever forms are used in animal brains or minds. In many cases the "neustic" is not expressed within the representation but simply by its role in an information processing architecture, as explained in [Sloman 2009a], or in some aspect of the context, e.g. the word "Wanted" above a picture of a human face.
Questions, requests, commands, desires, and intentions, can all be described as examples of "control information", because their information-processing function (the neustic aspect) involves making something happen, unlike factual information, which, in itself, has no implications for action, although it can have implications in combination with motives, conditional plans, etc. Control information (about what should be done) is commonly found in kitchen recipes, computer programs, knitting patterns, legal documents, etc. There must be many forms implemented in animal brains.
Summing up: When information is used we can distinguish the content of the information (phrastic) from the use that is being made of it (neustic). The latter may be explicitly indicated in the medium, or implicitly determined by the subsystem of the user that the bearer is located in, or the context. We can also distinguish different information media, e.g. linguistic, Fregean, pictorial, hybrid, static, dynamic, etc. Each of these can be further subdivided in various ways, only some of which have already been explored in working artificial systems.
One of the achievements of AI research in the last half-century has been the study of different information media, and analysis of different information processing mechanisms required for dealing with them, including sentences, algebraic expressions, logical expressions, program texts, collections of numerical values, probability distributions, and a variety of analogical forms of representation, including pictures, diagrams, acoustic signals, and more. There are many ways in which information media can vary, imposing different demands on the mechanisms that process them.
One of the most important features of certain media is their "generativity". For example, our notations for numbers, sentences, maps, computer programs, chemical formulae, construction blue-prints, are all generative insofar as there is a subset of primitive information bearers along with ways in which those primitives can be combined to form more complex bearers, where the users have systematic ways of interpreting the complex bearers on the basis of the components and their relationships. This is referred to as a use of "compositional semantics", where meanings of wholes depend on meanings of parts and their relationships, and sometimes also the context [Sloman 2006b].
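A toy example of compositional semantics (my own illustration, not from the chapter): the interpretation of a complex bearer is computed from the interpretations of its parts and the way they are combined, so bearers never encountered before can still be understood.

lexicon = {"two": 2, "three": 3, "five": 5}   # meanings of primitive bearers

def meaning(expr):
    # The meaning of a whole is derived from the meanings of its parts
    # and the mode of combination.
    if isinstance(expr, str):        # primitive bearer
        return lexicon[expr]
    op, left, right = expr           # complex bearer: (combinator, part, part)
    a, b = meaning(left), meaning(right)
    if op == "sum":
        return a + b
    if op == "product":
        return a * b
    raise ValueError("unknown mode of combination: " + op)

# A bearer the user has never met before is still interpretable,
# because the scheme is generative:
print(meaning(("sum", "two", ("product", "three", "five"))))   # 17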
If an organism had only six basic actions, and could only process bearers of information about complex actions made up of at most three consecutive basic actions, then it would have restricted generativity, allowing for at most 216 complex actions. Some organisms appear to have sensor arrays that provide a fixed size set of sensor values from which information about the environment at any time can be derived. In contrast, humans, and presumably several other species, do not simply record sensor values but interpret them in terms of configurations of entities and processes in the environment, e.g. visible or tangible surface fragments in various orientations changing their mutual relationships.
If the interpretation allows scale changes (e.g. because of varying distances) and sequential scanning of scenes, both of which are important in human vision, the user can construct and interpret information bearers of different kinds and degrees of complexity. The mechanisms involved may have physical limits without being limited in principle, in which case the animal or machine may have "infinite competence" (explained more fully in [Sloman 2002]). Even when the competence is not infinite, compositionality implies the ability to deal with novelty, a most important feature for animals and robots inhabiting an extremely variable environment. Closely related to this are the ability to plan complex future actions and the ability to construct new explanations of observed phenomena.
A more complete exposition would need to discuss different ways in which information bearers can be combined, with different sorts of compositional semantics. One of the major distinctions mentioned in Section 4.4 is between Fregean and other forms of composition. As explained in [Sloman 1971], the systematic complexity of forms of representation can provide a basis for reasoning with information-bearers: deriving new conclusions from old information by manipulating the bearers, whether Fregean or not. Logical inference and geometric reasoning using diagrams are two special cases among many.
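A minimal sketch (an illustration of mine, using simple forward chaining rather than any particular mechanism discussed in the chapter) of deriving new conclusions by manipulating bearers: rules of the form "if these premises hold, then this conclusion holds" are applied repeatedly to a store of facts until nothing new can be added.

def forward_chain(facts, rules):
    # Repeatedly apply (premises, conclusion) rules to derive new items.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [({"raining", "outside"}, "getting wet"),
         ({"getting wet"}, "seek shelter")]
print(forward_chain({"raining", "outside"}, rules))
# -> {'raining', 'outside', 'getting wet', 'seek shelter'}

In principle the same kind of derivation can be done with non-Fregean bearers, e.g. by rotating or translating parts of a diagram, though the manipulations are then quite different.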
The information in B can be potentially usable by U even though U has never encountered B or anything with similar information content. That's obviously true when U encounters a new sentence, diagram or picture for the first time. Even before U encountered the new item, it was potentially usable as an information-bearer. In some cases, though not all, the potential cannot be realised without U first learning a new language, or notation, or even a new theory within which the information has a place.
You cannot understand the information that is potentially available to others in your environment if you have not yet acquired all the concepts involved in the information. For example, it is likely that a new-born human infant does not have the concept of a metal, i.e. that is not part of its ontology [Sloman 2009b]. So it is incapable of acquiring the information that it is holding something made of metal even if a doting parent says "you are holding a metal object". In humans a lengthy process of development is required for the information-processing mechanisms (forms of representation, algorithms, architectures) to be able to treat things in the environment as made of different kinds of stuff, of which metals are a subset.
An even longer development is required for that ontology to be extended to include the concepts of physics and chemistry. In part that is a result of cultural evolution: not all our ancestors were able to acquire and use such information.
It is possible for information to be potentially available for a TYPE of user even if NO instances of that type exist. For example, long before humans evolved there were things happening on earth that could have been observed by human-like users using the visual apparatus and conceptual apparatus that humans have. But at the time there were no such observers, and perhaps nothing else existed on the planet that was capable of acquiring, manipulating, or using the information, e.g. information about the patterns of behaviours of some of the animals on earth at the time. (This is related to the points made about self-documenting entities in Section 4.5.)
There may also be things going on whose detection and description would require organisms or machines with a combination of capabilities, including perceptual and representational capabilities and an information-processing architecture, that are possible in principle, but have never existed in any organism or machine and never will - since not everything that is possible has actual instances. Of course, I cannot give examples, since everything I can present is necessarily capable of being thought about by at least one human.
Weaker, but still compelling, evidence is simply the fact that the set of things humans are capable of thinking of changes over time as humans acquire more sophisticated concepts, forms of representation and forms of reasoning, as clearly happens in mathematics, physics, and the other sciences. There are thoughts considered by current scientists and engineers that are beyond the semantic competences of any three-year-old child, or any adult human living 3000 years ago. If the earth had been destroyed three thousand years ago, that might have relegated such thoughts to the realm of possible information contents for types of individual that never existed, but could have.
It is sometimes possible for a bearer B to mean the same thing (convey the same information content I) to different users U and U′, and it is also possible for two users who never use the same information-bearers (e.g. they talk different languages) to acquire and use the same information.
This is why relativistic theories of truth are false. It cannot be true for me that my house has burned down but not true for my neighbour. In principle we have access to the same sources of information in the world.
In some cases the medium requires several layers of interpretation, using different ontologies, to be coordinated, e.g. acoustic, phonetic, morphemic, syntactic, semantic and social, in the case of speech understanding systems. Other layers are relevant in visual systems, such as edge features, larger scale 2-D features, 3-D surface fragments, 3-D structures, layers of depth, 3-D processes involving interacting structures, intentions of perceived agents, etc. [Trehub 1991] offers a theory about how such layers might be implemented neurally, but there remain many unknowns about how vision works.
In some cases, the requirement for layers of interpretation is the result of engineering designs making use of compression, encryption, password protection, zipping or tarring several files into one large file, and many more. In other cases, the layers are natural consequences of a biological or engineering information-processing task, e.g. the layers in visual information processing.
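A small illustrative sketch of such engineered layering, using only standard Python library facilities: the same byte string must pass through several interpretation layers, each using a different ontology (bytes, characters, JSON syntax, message fields), before the content becomes usable.

    import gzip, json

    # Layered encoding: structured content -> JSON text -> UTF-8 bytes -> gzip
    message = {"sender": "U1", "content": "the house is on fire"}
    bearer = gzip.compress(json.dumps(message).encode("utf-8"))

    # Interpretation must undo the layers in the right order.
    decoded = json.loads(gzip.decompress(bearer).decode("utf-8"))
    print(decoded["content"])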
Some information-bearers include various amounts and kinds of noise, clutter, and partial occlusion, sometimes causing problems that require collaboration between interpretation processes at different levels of abstraction. Where multiple layers of processing are coordinated, ambiguities in some layers may be resolved by interpretations in other layers, possibly using background knowledge [Sloman 1978,Chap 9]. This is sometimes described as "hierarchical synthesis", or "analysis by synthesis" [Neisser 1967]. A related view of layers of interpretation is presented in [Barrow & Tenenbaum 1978].
Although there has been much research on ways of extracting information from complex information-bearers, it is clear that nothing in AI comes close to matching, for example, the visual competences of a nest-building bird, a tree-climbing ape, a hunting mammal catching prey, a human toddler playing with bricks and other toys. In part, that is because not even the requirements have been understood properly [Sloman 2008a].
Some information-bearing structures express different information for the same user U in different contexts, because they include an explicit indexical element (e.g. "this", "here", "you", "now", or non-local variables in a computer program).
Another factor that makes it possible for U to take a structure B to express different meanings in different contexts is that B has polymorphic semantics: its semantic function (for U, or a class of users) is to express a higher order function which generates semantic content when combined with a parameter provided by the linguistic or non-linguistic context. E.g. consider: "He ran after the smallest pony". Which pony is the smallest can change as new ponies arrive or depart. More subtly, what counts as a tall, big, heavy, or thin X can vary according to the range of heights, sizes, weights, and thicknesses of Xs in the current environment, and in some cases may also depend on why you are looking for something tall, big, heavy, etc.
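A hedged sketch of this kind of polymorphic semantics (a toy example constructed for this chapter, not a claim about how humans actually process such words): "tall" is treated as a higher order function whose hidden argument is a comparison class supplied by context.

    def tall(comparison_class, threshold=0.9):
        """Return a predicate: true of things at or above a cutoff fixed by the comparison class."""
        heights = sorted(comparison_class.values())
        cutoff = heights[int(threshold * (len(heights) - 1))]
        return lambda name: comparison_class[name] >= cutoff

    ponies = {"Dusty": 1.2, "Star": 1.3, "Mist": 1.1}
    horses = {"Dusty": 1.2, "Brutus": 1.8, "Caesar": 1.7}

    print(tall(ponies)("Dusty"))   # True:  tall among these ponies
    print(tall(horses)("Dusty"))   # False: not tall among these horses

The word's semantic function is fixed; what changes with context is the argument it is applied to, and hence the proposition expressed.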
There are many more examples in natural language that lead to incorrect diagnosis of words as vague or ambiguous, when they actually express precise higher order functions applied to sometimes implicit arguments, e.g. "thin", "long", "efficient", "heap". Other examples include spatial prepositions and other constructs, which can be analysed as having a semantics involving higher order functions some of whose arguments are non-linguistic, as discussed in [Sloman 2006b].
A more complex example is: "A motor mower is needed to mow a meadow" which is true only if there is an implicit background assumption about constraints on desirable amounts of effort or time, size of meadow, etc. So a person who utters that to a companion when they are standing in a very large meadow might be saying something true, whereas in a different context, where there are lots of willing helpers, several unpowered lawnmowers available, and the meadow under consideration is not much larger than a typical back lawn, the utterance would be taken to say something different, which is false, even if the utterances themselves are physically indistinguishable. Moreover, where they are standing does not necessarily determine what sort of meadow is being referred to. E.g. they may have been talking about some remote very large or very small meadow.
The influence of context on information expressed is discussed in more detail in relation to Grice's theory of communication, in [Sloman 2006b], along with implications for the evolution of language. The importance of the role of extra-linguistic context in linguistic communication can be developed in connection with indexicals, spatial prepositions, and Gricean semantics, into a theory of linguistic communications as using higher order functions some of whose arguments have to be extracted from non-linguistic sources by creative problem-solving.
This has implications for language learning and the evolution of language. It also requires the common claim that natural languages use compositional semantics to be modified, to allow context to play a role. The use of non-local variables can have a similar effect in programming languages. It seems very likely that brain mechanisms also use context-modulated compositional semantics.
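The programming-language analogue just mentioned can be shown in a few lines (a hypothetical example): a non-local variable plays the role of extra-linguistic context, so the same expression conveys different content as that context changes.

    rate = 1.0

    def in_local_currency(price):
        return price * rate        # rate is looked up in the enclosing context

    print(in_local_currency(10))   # 10.0
    rate = 0.8                     # the context changes ...
    print(in_local_currency(10))   # 8.0 ... and so does the content conveyed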
An information-user can have parts that are themselves information users. This leads to complications, such as a part having and using information that the whole would not be said to have. E.g. your immune system, your digestive system, and various metabolic processes use information and take decisions of many kinds, though we would not say that you have, use or know about the information.
Likewise there are different parts of our brains, which evolved at different times, that use different kinds of information, even information obtained via the same route, e.g. the retina, the ear-drum, or haptic feedback. Input and output devices can be shared between sub-systems that use them for different purposes, possibly after different pre- or post-processing, as explained in [Sloman 1993]. Some sub-systems are evolutionarily old and shared with other species, some are newer, and some are unique to humans.
An example is the information about optical flow that is used in humans to control posture, without individuals being aware of what they are doing [Lee & Lishman 1975]. More generally, it is likely that human information processing architectures include many components that evolved at different times, performing different functions, many of them concurrent, some of them surveyed in [Sloman 2003]. The subsystems need not all use the same forms of representation, and individual subsystems need not all have access to information acquired, derived, constructed or used by others. In particular, some will use transient information that is not transferred to or accessible by other subsystems.
That is why much philosophical, psychological, and social theorising is misguided: it treats humans as unitary and rational information users. That includes Dennett's intentional stance and what Newell refers to as "the Knowledge level". For example, the philosophical claim that only a whole human-like agent can acquire, manipulate and use information is false. To understand biological organisms and design sophisticated artificial systems, we need what [McCarthy 2008] labels "the designer stance". Unfortunately education about how to be a designer of complex working systems is not part of most disciplines that need it.
There is a different kind of use of information: when the user is constructing itself! In that process there are no sensors and motors transferring information and energy between the organism and its environment. The processes by which genetic information is used in organisms are very complex and varied. The use of information provided genetically can be very indirect, involving many stages, several of which are influenced by the environment (e.g. maternal fluids, or soil nutrients), so that the interpretation process required for development of an organism is highly context sensitive.
In many cases, much of the information from which the processes start is encoded in molecular sequences in DNA, specifying, very indirectly, how to construct a particular organism by constructing a very complex collection of self-organising components, which themselves construct more self-organising components. The interpretation of those sequences as instructions depends on complex chemical machinery assembled in a preceding organism (the mother) to kick-start the interpretation process.
The interpreting system builds additional components that continue the assembly, partly influenced by the genetic information and partly by various aspects of the environment. During development, the ability to interpret both genetic and environmental information changes, partly under the influence of the environment.
So the standard concept of information encoded in the genome rests on an over-simple theory. (Many details are discussed in [Jablonka & Lamb 2005]. The importance of cascaded development of layered cognitive mechanisms influenced by the environment is discussed in [Chappell & Sloman 2007]. See also [Dawkins 1982].)
The problems of interpreting and using visual and genetic information show that the role of the user U in obtaining information I from a bearer B in context C may be extremely complex and changeable, in ways that are not yet fully understood. That kind of complexity is largely ignored in most discussions about the nature of information, meaning, and representation, but it cannot be ignored by people trying to design working systems.
What was said above in Section 3.2 about "energy" applies also to "information". We can understand the word "information" insofar as we use it in a rich, deep, precise and widely applicable theory (or collection of theories) in which many things are said about entities and processes involving information. I suspect that we are still at a relatively early stage in the development of a full scientific theory of information, especially as there are many kinds of information processing in organisms that we do not yet understand.
Some of the contents of a theory of information have been outlined in previous sections, elaborating on the proposition that a user U can interpret a bearer B as expressing information I in context C. The topics mentioned include the variety of sources of information, the variety of information-bearing media (about which we still have much to learn), the variety of structures and systems of information-bearers (syntactic forms), the variety of uses to which information can be put (including both communicative and non-communicative uses), the variety of information contents, the variety of ways in which information contents can change (e.g. continuously, discretely, structurally, etc.), the different kinds and degrees of complexity of processes required for interpreting and using the information in particular bearers, the variety of information-using competences different users (or different parts of the same user) can have, the potential information available in objects not yet perceived by information users, and more.
We already have broader and deeper understanding of information in this sense than thinkers had a thousand years ago about force and energy, but there is still a long way to go.
Unlike Shannon's information, the information content we have been discussing does not have a scalar value, although there are partial orderings of information content. One piece of information I1 may contain all the information in I2, and not vice versa. In that case we can say that I1 contains more information. I1 can have more information content than both I2 and I3, neither of which contains the other. So there is at most a partial ordering. The partial ordering may be relative to an individual user, because giving information I1 to a user U1, may allow U1 to derive I2, whereas user U2 may not be able to derive I2, because U2 lacks some additional required information. Even for a given user, the ordering can depend on context.
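A deliberately over-simplified model makes the partial ordering concrete: suppose, as a toy assumption, that an item of information can be represented as a set of atomic propositions, so that containment of content corresponds to the subset relation.

    def contains(i1, i2):
        """True if information item i1 contains all of the information in i2."""
        return i2 <= i1          # subset test

    I1 = frozenset({"door_open", "light_on", "cat_indoors"})
    I2 = frozenset({"door_open"})
    I3 = frozenset({"light_on", "cat_indoors"})

    print(contains(I1, I2), contains(I1, I3))   # True True:   I1 contains both
    print(contains(I2, I3), contains(I3, I2))   # False False: I2 and I3 are incomparable

Real information contents are of course not mere sets of atomic propositions, and, as noted above, what can be derived from an item depends on the user and the context; the sketch only shows why the ordering is partial rather than total.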
Information can vary both discontinuously (e.g. adding an adjective or a parenthetical phrase to a sentence, like this) and continuously (e.g. visually obtained information about a moving physical object). More importantly, individual items of information can have a structure: there are replaceable parts of an item of information such that if those parts are replaced the information changes, but not necessarily the structure.
Because of this, items of information can be extracted from other information, and can be combined with other information to form new information items, including items with new structures. This is connected with the ability of information users to deal with novelty, and to be creative. Moreover, we have seen that such compositional semantics often needs to be context sensitive (or polymorphic), in both human language and other forms of representation.
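A brief sketch of structured information with replaceable parts (a constructed example): replacing one component yields a different information item while preserving the structure.

    def replace(item, old, new):
        """Return a copy of a nested tuple with every occurrence of old replaced by new."""
        if item == old:
            return new
        if isinstance(item, tuple):
            return tuple(replace(part, old, new) for part in item)
        return item

    info = ("chased", ("the", "dog"), ("the", "smallest", "pony"))
    print(replace(info, "pony", "donkey"))
    # ('chased', ('the', 'dog'), ('the', 'smallest', 'donkey'))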
Information can be stored in various forms, can be modified or extended through various kinds of learning, and can influence processes of reasoning and decision making. It can also be transmitted in various ways, both intentionally and unintentionally, using bearers of many kinds.
Some items of information allow infinitely many distinct items of information to be derived from them. (E.g. Peano's axioms for arithmetic, in combination with predicate calculus.) Physically finite, even quite small, objects with information processing powers can therefore have infinite information content. (Like brains and computers.)
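A toy stand-in for this point (not a serious theorem prover): a short program, itself a finite information-bearer, determines an unbounded stream of distinct derivable items.

    from itertools import count

    def derivable_items():
        """Yield an unbounded stream of distinct items, all determined by this short, finite program."""
        for n in count():
            yield str(n) + " + 0 = " + str(n)

    gen = derivable_items()
    for _ in range(3):
        print(next(gen))   # 0 + 0 = 0, then 1 + 0 = 1, then 2 + 0 = 2, ...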
There is a great deal more that could be said about our current theories about information, but that would take several volumes. Many additional points are in papers in the bibliography, and in other books and journals, as well as in human common sense.
For many very narrowly prescribed tasks it is possible to make machines that perform better than humans (e.g. repeatedly assembling items of a certain type from sets of parts arrayed in a particular fashion), but which are easily disrupted by minor variations of the task, the parts, or the starting configuration. Aliens who visited in 1973 and saw what the Edinburgh robot Freddy could do, as described in [Ambler,Barrow,etc. 1973] and shown in this video http://groups.inf.ed.ac.uk/vision/ROBOTICS/FREDDY/Freddy_II_original.wmv might be surprised on returning 36 years later to find how little progress had been made, compared with ambitions expressed at that time.
Every living thing processes information insofar as it uses (internal or external) sensors to detect states of itself or the environment and uses the results of that detection process, either immediately or after further information processing, to select from a behavioural repertoire, where the behaviour may be externally visible physical behaviour or new information processing. (Similar points are made in [Reading 2006] and in Steve Burbeck's web site http://evolutionofcomputing.org/Multicellular/BiologicalInformationProcessing.html)
In the process of using information an organism also uses up stored energy, so that it also needs to use information to acquire more energy, including the energy required for getting energy.
There are huge variations between the ways in which information is used by organisms, including plants, single-celled organisms, and everything else. For example, only a tiny subset of organisms appear to have fully deliberative information-processing competence, as defined in [Sloman 2006a]. As explained in Section 5.10 there can also be major differences between the competences of sub-systems in a single information-user.
Because possible operations on information are much more complex and far more varied than operations on matter and energy, engineers discovered during the last half-century, as evolution appears to have "discovered" much earlier, that relatively unfettered information processing requires use of a virtual machine rather than a physical machine, like using software rather than cog-wheels to perform mathematical calculations. A short tutorial on virtual machines and some common misconceptions about them can be found in [Sloman 2009f]. See also [Pollock 2008].
One of the main reasons for using virtual machines is that they can be rapidly reconfigured to meet changing environments and tasks, whereas rebuilding physical devices as fast and as often is impossible. It is also possible for a physical machine to support types of virtual machine that were never considered by the designer of the physical machine. Similarly, both cultural evolution and individual development can redeploy biological information processing systems in roles for which they did not specifically evolve.
In [Sloman 2009f] I suggested that the label "Non-physically-describable-machine" (NPDM) might have been preferable to "virtual machine" (VM) because the key feature is having states and processes whose best description uses concepts that are not definable in terms of the concepts of the physical sciences. Examples are concepts like "winning", "threat", "rule", "pawn", "checkmate", relevant to virtual machines that play chess. These VMs/NPDMs are nothing like the old philosophical notions characterised by [Ryle 1949] as referring to "The Ghost in the Machine", for we are not talking about mysterious entities that can continue existing after their physical bodies have been completely destroyed.
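A minimal sketch of the point (a toy interpreter invented here, not a model of any real system): the natural descriptions of the running machine's states - "the accumulator holds 7", "the machine has halted" - use the virtual machine's own vocabulary, not the vocabulary of the physical hardware on which the interpreting process happens to run.

    def run(program):
        """Interpret a tiny instruction set; the 'machine' exists only while this runs."""
        acc, pc = 0, 0
        while pc < len(program):
            op, arg = program[pc]
            if op == "load":
                acc = arg            # "the accumulator now holds arg"
            elif op == "add":
                acc += arg
            elif op == "halt":
                break                # "the machine has halted"
            pc += 1
        return acc

    print(run([("load", 3), ("add", 4), ("halt", None)]))   # 7

The same virtual machine could be implemented on physically very different hardware, which is one reason its states resist definition in purely physical terms.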
The crucial point is that the nature of the physical world allows networks of causation to exist that support processes in such virtual machines that not only cause other virtual machine processes to occur but can also influence physical machines, for example when a decision taken by a running chess program causes the display on a computer screen to change [Sloman 2009e]. A crucial step in evolution was the development of causal networks, including sub-systems running in parallel, in virtual machines that could be their own information-users.
This contradicts a number of common mistakes, such as the assumption that information-processing machines have to operate serially, that they have to use only programs installed by a designer, and that they cannot be aware of what they are doing, or why they are doing it, or decide to change their goals. Such mistakes might be overcome if more people studied AI, even if only designing relatively simple agents, as proposed in [Sloman 2009d].
Although we (or at least software engineers and computer scientists, unlike most philosophers in 2009) understand current virtual machines well enough to create, modify, debug, extend and improve them, the virtual machines that have been produced by biological evolution are another matter: their complexity, their modes of operation, the best ways to describe what they do and how they do it, still defeat scientists, though many subscribe to various personal favourite theories of consciousness, or whatever.
Some of them think the known phenomena cannot possibly be explained in terms of information-processing machinery, though in most cases that is because their concept of information-processing is too impoverished - e.g. because it is based on the notion of a Turing machine, whose relevance to this topic was challenged in [Sloman 2002].
For example, Turing machines are limited to discrete operations, whereas there is no reason to assume that all information-processing has to be so limited, though it could turn out to be the case that no physical machine could support truly continuous information manipulation. Others take it for granted that brains are information-processing machines, but do not yet understand what information they process or how they do it. For instance, major features of human and animal vision remain unexplained.
This is just the beginning of an analysis of relationships between information, bearers, users, and contexts. What is written here will probably turn out to be a tiny subset of what needs to be said about information. A hundred years from now the theory may be very much more complex and deep, just as what we know now about information is very much more complex and deep than what we knew 60 years ago, partly because we have begun designing, implementing, testing and using so many new kinds of information-processing machines. The mechanisms produced by evolution remain more subtle and complex, however.
I doubt that anyone has yet produced a clear, complete and definitive list of facts about information that constitute an implicit definition of how we (the current scientific community well-educated in mathematics, logic, psychology, neuroscience, biology, computer science, linguistics, social science, artificial intelligence, physics, cosmology, and philosophy) currently understand and use the word "information". But at least this partial survey indicates how much we have already learnt, especially as concerns the complexity and causal powers of information processing mechanisms in virtual machinery, a topic that is still not understood by many scientists and engineers without personal experience of designing, building, testing and debugging a complex distributed virtual machine causally interacting with a complex external environment through concurrently active sensors and activators.
Some physicists seek a "theory of everything", e.g. [Barrow 1991,Deutsch 1997]. However, it does not seem likely that there can be a theory that is recognisable as a physical theory from which all the phenomena referred to here would be derivable, even though all the information-processing systems I have referred to, whether natural or artificial, must be implemented in physical systems. I suspect that we are in the early stages of understanding how the physical world can support non-physical entities of which simple kinds already exist in running virtual machines in computers, including virtual machines that monitor themselves, and use information about what is happening inside them to take decisions that alter their internal and external behaviours.
My own view has been, for several decades, that as regards information processing our state of knowledge could be compared with Galileo's knowledge of physics. He was making good progress and laying foundations for future developments: including developments he could not possibly imagine.
One of the drivers of progress in science (and philosophy) is improved understanding of what is not yet known. I believe the ideas sketched here help us to focus more clearly on aspects of information processing that are not yet understood. Doing that in far more detail, with far more specific examples, can help to drive advances that will produce new, deeper, more general explanations. But only time will tell whether this is what Lakatos would call a progressive or a degenerating research programme.
Comments and questions by several readers led to major improvements.
Many of the points made here were previously also made piecemeal over several years in contributions to the Psyche-D discussion list, now archived at http://www.archive.org/details/PSYCHE-D, and in papers and presentations on my web site, listed in the bibliography.
Discussions by email and face to face with many colleagues have helped to shape the ideas presented here. It was Max Clowes who first introduced me to computational ways of thinking about philosophical problems. He always stressed that semantics, not just syntax, was crucial to visual processing. See http://www.cs.bham.ac.uk/research/projects/cogaff/sloman-clowestribute.html
1. In at least two of the essays: "The Cybernetics of `Self': A Theory of Alcoholism" and "Form, Substance and Difference".
2. As explained in various papers and presentations available online [Sloman 1985,Sloman 1987,Sloman 2008b,Sloman 2008c,Sloman 2009e].
3. This is discussed in a presentation arguing that there is a sense in which life presupposes mind (informed control): http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#lifemind
4. See also the discussion of alternatives to logical representations in [Sloman 1978,Chap7]. [Sloman 2008b] argues that non-communicative "languages" used for perception, learning, planning, etc., evolved before human languages, some of them using non-Fregean forms of representation.