
Jane Austen's concept of information
(Not Claude Shannon's)

Aaron Sloman
School of Computer Science, University of Birmingham.

This file is available in two formats
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/austen-info.html
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/austen-info.pdf
It is part of the Meta-Morphogenesis project.

Installed: 26 Apr 2013
Updates:
14 Dec 2018; 29 Jun 2019 (reorg); 30 Dec 2020; 22 Aug 2021 (reformat)
12 Jul 2017; 1 Jun 2018; 7 Aug 2018; 29 Aug 2018; 4 Sep 2018
10 Mar 2015; 29 May 2015 (Added introduction); 4 Jun 2015
30 Apr 2013; 16 Dec 2013; 27 Dec 2013; 6 Aug 2014


Introduction: Claude Shannon vs Jane Austen

Many scientists and engineers, including, surprisingly, some psychologists and neuroscientists, seem to think that the word "information" (and its equivalents in other languages) refers to what Claude Shannon's ground-breaking 1948 paper called "information": a measurable property of signals that can be stored, transmitted, compared, compressed, decompressed, corrupted, repaired, encrypted, decrypted, etc. (Shannon, 1948).

Many of Shannon's admirers seem to have forgotten that there is a much older, widely used, theoretically important notion of "information". For example, it was familiar to Jane Austen and used in her novels, over a century before Shannon, as illustrated below. The concept also occurs in non-technical, conversational uses of the word "information". This ancient concept of information, or meaning, is essential for our understanding of biological evolution and its products (including humans) and for attempts to understand what natural intelligence is and how it works, including attempts to model and replicate natural intelligence in machines.

Shannon himself did not make this mistake of conflating the old concept of semantic information with what he called "information". Margaret Boden comments on this in her two-volume survey of cognitive science and its history (2006):

This term was drawn from Shannon's information theory, developed at Bell Labs to measure the reliability or degradation of messages passing down telephone lines (Shannon 1948; Shannon and Weaver 1949). But the "messages" were thought of not as meaningful contents, conveying intelligible information such as that Mary is coming home tomorrow. Rather, they were the more or less predictable physical properties of the sound signal. In Shannon's words: "Frequently the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the information problem. The significant aspect is that the actual message is one selected from the set of possible messages."
     [...]
In short, "information" in Shannon's sense was not a semantic notion, having to do with meaning or truth. It was a technical term, denoting a statistical measure of predictability. What was being predicted was the statistics of a physical signal, not the meaning--if any--being carried by the signal. As a technical term for something measurable, "information" needed a quantitative unit. This new unit was the bit (an abbreviation of "binary unit").

In contrast, Jane Austen frequently used the word "information" to refer to information content, not properties of the information vehicles expressing that content.

https://en.wikipedia.org/wiki/Jane_Austen
"English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century."
Born 1775. Died 1817

I'll summarise Shannon's notion and contrast it with Jane Austen's notion (illustrated using extracts from her novel Pride and Prejudice below). She was primarily concerned with useful information contents of various kinds, whereas Shannon, as illustrated above, was primarily concerned with mathematical properties of information vehicles.

I'll try to explain the differences between their approaches, and contrast both of them with the views of Auletta et al. below who, like Shannon, regard information as something transmitted and received, though they focus more on the sender than the receiver.

The oldest type of information: control information

The work for which Shannon is famous was primarily concerned with information as something that can be stored, transmitted, or transformed (e.g. compressed, decompressed or translated from one notation to another), whereas the older notion of information is a notion of something that can be used for purposes other than such "syntactic" operations, and which can be more or less useful. Moreover its usefulness can depend on other things, such as the state of some part of the world, or a user's current intentions or needs.

The uses are many and varied, including recognising a need, a threat, or a source of something useful, distinguishing known and unknown individuals, selecting a goal (e.g. to meet a need or deflect a threat), finding a means to a goal (e.g. selecting an available action, or an available sequence of actions, or a route through space, to achieve the goal), making predictions, and many more.

There are also many actions that can be performed on information, e.g. deriving new information from old, detecting an inconsistency, detecting an ambiguity, refining information by adding new details, using theoretical information to explain some other information gained from observation or reasoning, checking whether one information item is relevant to another (e.g. whether it answers, or helps to answer, a question), and many more.

The information content of a question is a request for some other information that will answer the question. The information content of a command or instruction or suggestion includes specification of some action or type of action that could be performed.

I suspect none of those statements would have surprised Jane Austen or many other thinkers before and after her, who had never encountered Shannon information.

Evolutionary changes produce new physical structures, capabilities, and behaviours, but they can also extend information-processing abilities, in many different ways, including extending uses of information during growth and development. (E.g. as explained below in connection with Meta-Configured genomes.)

Being of use is a more fundamental feature of information than being physically manipulable, since there would not be any point in storing, manipulating or transmitting information, or even creating information items to be stored or transmitted, if the information could not be used.

Having the potential to be used applies to both true and false information and information contents that are neither true nor false, e.g. questions or imperatives (commandments?). It is possible to use false information inadvertently or deliberately, e.g. in political speeches or commercial advertising.

However, not all control information has the potential to be true or false: e.g. a road sign or traffic light telling you to stop is typically part of a complex traffic control system rather than a piece of factual information that can be true or false.

The most basic use of information, in all forms of life, including the simplest forms of life, is for control -- initiating or modifying an action or process, selecting between things to do, or selecting when to start, stop or modulate processes, e.g. speeding up, slowing down or changing direction, and many more.

Moreover it is often sensible to store things that are never used, e.g. plumbing tools, because situations could arise in which they would need to be used, and that is also true of information. Information that has the potential to be used for control (e.g. in deciding what actions to perform) need not actually be used for control -- but that does not prevent it being potentially useful control information.

Information, in all these cases, is something abstract, potentially but not intrinsically concerned with relationships between some actual needs or goals, situations, and decisions or selections.

It involves structures -- parts, and wholes, and relationships -- but that in itself does not imply that there is any numerical measure. A single number, or point on a scale, does not capture the required useful properties of information.

There may be measures associated with things that are referred to in items of information, but those are not measures of information: e.g. one piece of information could be about a collision between a child and a doorpost, and another about a collision between an asteroid and a planet. The two events have many measures, e.g. amounts of physical matter involved, amounts of kinetic energy transformed into heat energy, but those are not measures of amounts of information.

Information can be about inaccessible or remote entities. There are old philosophical problems about how it is possible to refer to, think about, ask questions about, things that have never been encountered as sensed, or more generally experienced, objects. My own view, initially proposed in Sloman(1985) (and later extended by Ian Wright) is that semantic content beyond what's immediately accessible can be based on the need to fill gaps in causal loops: e.g. if some of the contents of a machine's current sensory experience change spontaneously, then something outside the machine can be postulated as a cause of the change: "loop-closing" semantic content.

If motions of limbs or rotation of a head seem to cause changes in visual contents and there is no direct connection between the relevant muscles and retinal cells, then an animal or machine may use mechanisms for postulating external causal intermediaries, and for inventing theories about what they are and how they work, including, for example, differences between visually perceived changes caused by moving your hand in front of your eyes, and changes caused by rotating your head so that new parts of the environment come into view. Of course, that sketch has to be filled in with a great deal of mechanism, but it is clear that biological evolution combined with features of the environment has produced extremely sophisticated and varied visual mechanisms dealing with such "loop-closing" semantics.
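
Here is a deliberately over-simplified sketch of that idea (my own illustration of "loop-closing", in Python, not a model of any real visual system): an agent that compares the sensory change predicted from its own motor commands with the change it actually senses can attribute the unexplained remainder to something outside itself, and that postulated external cause is a first step towards semantic content referring beyond the sensory surface.

    def classify_change(predicted_change, sensed_change, tolerance=0.1):
        # Attribute a sensory change either to the agent's own action or to a
        # postulated external cause: whatever the agent's own motor commands
        # do not account for is credited to something outside the agent.
        unexplained = sensed_change - predicted_change
        if abs(unexplained) <= tolerance:
            return "self-caused (e.g. my own head or hand movement)"
        return "externally caused: postulate something in the environment"

    # A head rotation was predicted to shift the image by 5 units; it shifted by 5.02.
    print(classify_change(5.0, 5.02))
    # No action was taken (predicted change 0), yet the image shifted by 3 units.
    print(classify_change(0.0, 3.0))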

Biological evolution evidently discovered the importance of loop closing semantics in biological mechanisms used for control on the basis of information outside the organism, or in the organism but outside the control mechanism.

These biological mechanisms can be thought of as precursors of the more sophisticated cases proposed by 20th Century philosophers of science (e.g. Hempel, Pap, Tarski, and many others) who developed anti-empiricist explanations of how scientific theories can meaningfully refer to entities that scientists cannot experience, with properties that cannot be directly measured, e.g. the mass and charge of an electron, or the temperature at the surface of a star light-years away from us at a long-past time.

More on Shannon's notion of information

I'll present a very crude summary of Shannon's ideas, in order to explain how his notion of information differs from the much older notion, which is much more familiar to most people, including people who lived long before Shannon, such as the novelist Jane Austen, whose ideas about information are summarised below.

In Shannon's sense of the word "information", there is a numerical quantity of information associated with a signal, derived from the size of the class of alternative signals possible in that context. In that sense "The caterpillar chewed the leaf" and "The asteroid destroyed the island" might have comparable "amounts" of information as English sentences of the form "The [noun] [verb] the [noun]", since both select from the same word classes, ignoring the structure at the level of the alphabet used.

If each signal in a set of possible signals is constructed by concatenating symbols selected from a fixed set of symbols, then the Shannon information content depends on the size of the set of symbols and the number of symbols in the signal. For example if only two signal elements are used, a dot (".") and a dash ("-"), as in Morse code, then any signal made of four components, e.g. "....", "----", "-.-.", etc. has an amount of information expressible in terms of the number of possible four component signals using only two types of components, namely: 16. Each such four component signal eliminates 15 of the 16 possibilities.

Likewise a five component signal using a two character alphabet eliminates 31 of the 32 possibilities. So it has more Shannon information than a four component signal. (There are different mathematically equivalent ways of defining a measure of information based on this idea, some more generally useful than others.)

If instead of only a choice between two items for each signal component, the code used allows four choices for each component, e.g. one of these: "-", ".", "=", "+", then instead of the information measure being based on 2x2x2x2 = 16, it will be based on 4x4x4x4 = 256. So the measure of how much is excluded increases as the number of options for each item in the string increases and also as the length of the string increases.

For technical reasons, Shannon's measure did not directly use these numbers, 16 and 256, or the numbers of items excluded by each signal, e.g. 15 or 255, but numbers derived from them. The main point is that a signal that excludes 255 of 256 possibilities can be said to have more information, in Shannon's sense, than a signal that excludes 15 of 16 possibilities, a smaller ratio. So two equally nonsensical words for an English user, e.g. "zzxxjalp" and "azbycxxyrk", which convey no information if sent unexplained as a message, will have different amounts of Shannon information. Assuming the same alphabet is in use, the second is longer and excludes a higher proportion of alternatives than the shorter word, and therefore has more Shannon information.
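
As a minimal illustrative sketch (my example in Python, not anything taken from Shannon's paper): when every symbol is equally likely, the derived number is just the base-2 logarithm of the number of possible signals, so the examples above work out as follows.

    import math

    def shannon_bits(alphabet_size, length):
        # Bits of Shannon information in a signal of `length` symbols, each
        # chosen independently, with equal probability, from `alphabet_size`
        # possible symbols.
        possibilities = alphabet_size ** length
        return math.log2(possibilities)   # = length * math.log2(alphabet_size)

    print(shannon_bits(2, 4))   # 4.0 bits: 16 possible signals, e.g. "....", "-.-."
    print(shannon_bits(2, 5))   # 5.0 bits: 32 possible signals
    print(shannon_bits(4, 4))   # 8.0 bits: 256 possible signals using "-", ".", "=", "+"

Taking logarithms is what makes the measure additive: doubling the length of a signal doubles the number of bits, rather than squaring the number of possibilities.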

This is analogous to the way in which saying that an animal in the distance is a bird gives less information than saying it is a crow, because "crow" excludes more possibilities, and therefore supports more inferences, than "bird" does: you can make more inferences from "Tweety is a crow" than from "Tweety is a bird", so intuitively the former has more information. That shows a loose connection between our ordinary concept of information and Shannon information.

Each of the two words "bird" and "crow" contains four letters from the same set of 26 possible letters and therefore, considered purely as signals, they have the same amount of Shannon information. Considered as words of English, however, they each have a smaller information measure than that, because not all combinations of four letters are words of English: "iiii" and "zyww" are not, as most English speakers will (somehow!) know without being told. So the words "crow" and "bird" exclude fewer alternatives than they would if every four-letter sequence were a word of English.
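
To make that concrete, here is a hedged sketch (the figure of 2,400 four-letter English words is a made-up illustrative number; the real count depends on the dictionary used). Considered as an arbitrary string, a four-letter signal selects from 26^4 possibilities; considered as an English word known to both parties, it selects from a far smaller set, and so carries fewer bits, regardless of what the word means.

    import math

    # Any four-letter string over the 26-letter alphabet:
    bits_as_raw_string = 4 * math.log2(26)          # about 18.8 bits

    # Hypothetical shared dictionary containing 2,400 four-letter English words:
    english_four_letter_words = 2400
    bits_as_english_word = math.log2(english_four_letter_words)   # about 11.2 bits

    # "crow" and "bird" get exactly the same value in both cases: the measure
    # depends only on the set of possible signals, not on what the words mean.
    print(bits_as_raw_string, bits_as_english_word)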

A further complication is that there are some four letter words, e.g. "pick", which have (at least) three meanings, all of which are excluded by use of the word "crow", so that increases the number of words excluded. But this has nothing to do with what the word "crow" means, i.e. what information it can be used to convey.

There are many technical details omitted by this summary. The main point to note is that this concept of information measure, expressible as a number, which turned out to have profoundly important applications in science and engineering, refers only to the structure of the signal itself and the size of the set of alternative possibilities with that structure. Shannon's measure of "information" in this sense has nothing to do with what we would normally refer to as "meaning", "content" or what is "denoted", or "referred to".

It is a syntactic measure that is not directly connected with semantic content, though it may be indirectly connected when applied to signals in a known language. Shannon understood all this, as is shown clearly by a video presentation in which he discusses maze-learning by a mechanical mouse he had built, clearly indicating that the mouse acquires information that can later be used to get from anywhere in the maze to the goal point. But his choice of the label "information" in his publications seems to have confused many highly intelligent people. (He apparently later regretted using the label "information" for his concept.)

I have found Shannon's video online in two places:
Flash format:
http://techchannel.att.com/play-video.cfm/2010/3/16/In-Their-Own-Words-Claude-Shannon-Demonstrates-Machine-Learning
Youtube video (highly distorted):
https://www.youtube.com/watch?v=vPKkXibQXGA

This video summary presents some of Shannon's ideas (without going into technical detail) and explains their importance:
     https://www.youtube.com/watch?v=z2Whj_nL-x8
     Claude Shannon - Father of the Information Age

There are many online documents explaining Shannon's ideas in more technical detail and contrasting them with alternative ideas. For a philosopher's overview see Floridi's Stanford Encyclopedia of Philosophy entry.

Another, much older, concept of information: Jane Austen's

In English the word "information" normally has a quite different meaning: it does not refer to a numerical measure of the structure of a signal, or how a particular signal relates to the set of possible signals. Rather, "information" (like related words in other languages) refers to the subject matter conveyed to a listener or reader who understands the signal. Moreover, as we'll see later, not all information is associated with signals sent from a transmitter to a receiver, insofar as perception, reasoning, remembering, planning, deciding, acting, surmising, suspecting, asking, and many more, all involve use of information.

Sometimes the subject matter, or information, identifies an entity of some sort, e.g. London, or the tallest building in Paris, or William Shakespeare. Sometimes it is a fact, or possible fact, e.g. "Humans will be born in spaceships by the year 2250", or even something false, e.g. "The Eiffel Tower is in London", or a question, or an instruction or command (the answer to "what shall I do?", which might be "sit on the mat next to the door and twiddle your thumbs").

These are all examples of semantic content, expressed here in printed English, though in principle the same semantic contents could be expressed in spoken English, hand-written English, or many other languages, using different words, and different textual forms for those words, or in sign languages whose physical instantiations are evanescent body movements.

Pictures and diagrams can also have semantic content though the mechanisms (in brains or computers) required for producing and interpreting them are different from those used for producing and interpreting words, phrases and sentences. Perception and understanding of a picture is related to but different from visual acquisition of information, e.g. about what exists and what's happening in some part of the environment. Visual information acquired in ordinary life typically does not have a sender, and in many environments will have a large number of independent sources, e.g. different plants, paths, walls, boundaries, and insects seen at a moment in a garden. There will be no well defined measure of amount of information in the whole scene though there will typically be many different kinds of information.

However, if a picture, or video, stored in a computer is represented by a computer memory structure composed of bits (symbols chosen from a set of two elements, e.g. '0' and '1') then the number of bits will indicate the information content as measured by Shannon.

There are ways of compressing the signal size required for transmitting or storing such picture elements, because of the amount of repetition they often include: e.g. large regions of an image that are all one colour, or repeated pairings or groupings of information items. So the amount of Shannon information required for storage may differ from the amount required for the physical display mechanism, which has to show all parts of the image, not a mathematically derived summary. Again, the semantic information content that a human acquires by looking at the image, e.g. information about a crow next to its nest, is very different from the Shannon information measure.
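
As a hedged, toy illustration of that point (not a description of how any real image format works): run-length encoding stores a stretch of identically coloured pixels as a single (colour, count) pair, so the stored signal can be much shorter than the signal needed to drive a display pixel by pixel, while the semantic content of the picture is unaffected by which representation is used.

    def run_length_encode(pixels):
        # Compress a row of pixel values into a list of [value, run_length] pairs.
        runs = []
        for p in pixels:
            if runs and runs[-1][0] == p:
                runs[-1][1] += 1
            else:
                runs.append([p, 1])
        return runs

    row = ["blue"] * 95 + ["black"] * 5      # a mostly uniform region, e.g. sky
    print(run_length_encode(row))            # [['blue', 95], ['black', 5]]
    # 100 pixel values are stored as 2 runs, but the display still has to
    # produce all 100 pixels.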

This semantic sense of "information", referring to information content rather than to properties of signals, is the sense in which Jane Austen used the word in her novel Pride and Prejudice, published in 1813, about 135 years before Shannon published his paper, though she was mainly referring to verbally expressed information.

The claim that she often used such a concept of information is substantiated by a collection of examples of her use of the word "information" in the novel, presented in the next section. However, I would not be surprised to learn that she was perfectly well aware that information can be acquired through sensing or perceiving other things than written or spoken words, or even by reasoning, and also aware that information can be used in many forms of action, including, for example, catching a ball, or locating a lost key.

Extracts from Jane Austen's Pride and Prejudice
With thanks to Project Gutenberg:
http://www.gutenberg.org/files/1342/1342-h/1342-h.htm

Jane Austen knew a lot about human information processing as these snippets from Pride and Prejudice (published in 1813 -- over 200 years ago) show:

She was a woman of mean understanding, little information, and uncertain temper.

Catherine and Lydia had information for them of a different sort.

When this information was given, and they had all taken their seats, Mr. Collins was at leisure to look around him and admire,...

You could not have met with a person more capable of giving you certain information on that head than myself, for I have been connected with his family in a particular manner from my infancy.

This information made Elizabeth smile, as she thought of poor Miss Bingley.

This information, however, startled Mrs. Bennet ...

She then read the first sentence aloud, which comprised the information of their having just resolved to follow their brother to town directly,...

She resolved to give her the information herself, and therefore charged Mr. Collins, when he returned to Longbourn to dinner, to drop no hint of what had passed before any of the family.

...and though he begged leave to be positive as to the truth of his information, he listened to all their impertinence with the most forbearing courtesy.

Mrs. Gardiner about this time reminded Elizabeth of her promise concerning that gentleman, and required information; and Elizabeth had such to send as might rather give contentment to her aunt than to herself.

Elizabeth loved absurdities, but she had known Sir William's too long. He could tell her nothing new of the wonders of his presentation and knighthood; and his civilities were worn out, like his information.

I was first made acquainted, by Sir William Lucas's accidental information, that Bingley's attentions to your sister had given rise to a general expectation of their marriage.

As to his real character, had information been in her power, she had never felt a wish of inquiring.

... and at last she was referred for the truth of every particular to Colonel Fitzwilliam himself-from whom she had previously received the information of his near concern in all his cousin's affairs,

When he was gone, they were certain at least of receiving constant information of what was going on,

Mr. Bennet had been to Epsom and Clapham, before his arrival, but without gaining any satisfactory information....

Elizabeth was at no loss to understand from whence this deference to her authority proceeded; but it was not in her power to give any information of so satisfactory a nature as the compliment deserved.

Upon this information, they instantly passed through the hall once more...

She began now to comprehend that he was exactly the man who, in disposition and talents, would most suit her. His understanding and temper, though unlike her own, would have answered all her wishes. It was an union that must have been to the advantage of both; by her ease and liveliness, his mind might have been softened, his manners improved; and from his judgement, information, and knowledge of the world, she must have received benefit of greater importance.

And will you give yourself the trouble of carrying similar assurances to his creditors in Meryton, of whom I shall subjoin a list according to his information?

But to live in ignorance on such a point was impossible; or at least it was impossible not to try for information.

but to her own more extensive information, he was the person to whom the whole family were indebted

Darcy was delighted with their engagement; his friend had given him the earliest information of it.

"Did you speak from your own observation," said she, "when you told him that my sister loved him, or merely from my information last spring?"

Bingley looked at her so expressively, and shook hands with such warmth, as left no doubt of his good information.

The joy which Miss Darcy expressed on receiving similar information, was as sincere as her brother's in sending it.


Exercises for the reader

What did Jane Austen know about information and the processes in which it can play a role?

What sorts of information-processing machinery can account for the phenomena she was interested in?

Does information have to have a sender and a receiver in order to exist? Can information be received, or acquired, without being sent intentionally? (Which of Jane Austen's examples might be of that sort? What if she had written detective stories?)

Do the examples show that she understood the importance of both control information and factual information? What is the difference?

How can information make something happen?

Do an internet search for "loop-closing semantics" -- a theory in which the most basic form of semantic information is concerned with control (e.g. information used by a thermostat in determining when to turn a boiler on or off, or information used by the small fan on a fan-tail windmill to determine how to rotate the main sails to face the wind, or information used by a Watt centrifugal controller to determine whether to increase or decrease the flow of steam from the boiler to the pistons).
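
For readers who prefer a concrete sketch, the following toy thermostat loop (my illustration, in Python, not drawn from any of the cited sources) shows control information in its most basic form: a sensed value is compared with a target, and the only use made of the information is to select between the available actions. Nothing here is naturally described as being sent by one agent to another, or as true or false.

    def thermostat_step(sensed_temp, target_temp, boiler_on, hysteresis=0.5):
        # Decide whether the boiler should be on, given the sensed temperature.
        # The sensed temperature is control information: its only role here is
        # to select between turning the boiler on, turning it off, or leaving
        # it as it is.
        if sensed_temp < target_temp - hysteresis:
            return True           # too cold: boiler on
        if sensed_temp > target_temp + hysteresis:
            return False          # warm enough: boiler off
        return boiler_on          # within the dead band: no change

    # Room at 17.2 degrees, target 20, boiler currently off: switch it on.
    print(thermostat_step(17.2, 20.0, boiler_on=False))   # True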

Exercise: which varieties of control information can you distinguish in organisms, at various stages of development, learning, behaviour, competition, cooperation, reproduction?


Emphasising the sending process, not the receiver
(Added 6 Aug 2018)

Auletta, Ellis and Jaeger (2008) includes the following suggestion:

"One of the biggest misunderstandings in information theory is to have taken Shannon's (1948) theory of communication (in the context of controlled transmission) as a general theory of information. In such a theory, centred on signal/noise discrimination, the message is already selected and well defined from the start, ...(selected by the sender)..., and the problem here is only to faithfully transmit or further process, ... ... the sequence of bits that has been selected (Auletta 2008a). On the contrary, a true information theory (as was Wiener's (1948) original aim) starts with an input as a source of variety and has the selection only at the end of the information processing or exchanging. In other words, a message here is only the message selected by the receiver."

Note that this makes the assumption (of which I was once guilty) that information is only something transmitted and received. That assumption ignores the fact that all that encoding, transmitting, decoding, etc., would be pointless if information could not be used. So a deep theory of information should start with users of information and its uses, which may differ for different kinds of information and different users.

For example there are many important uses of information (understood by novelists) that have nothing to do with senders and receivers, since the information is the content of an intention, a percept, a plan for action, or an internal self-directed question (e.g. "What made that noise?" "Where did I previously find fruit?" "Why did my action that previously succeeded fail this time?"). Although my examples are expressed in English, I suspect that pre-verbal human toddlers and other animals are able to use much older internal languages that evolved not for purposes of communication but for intelligent (self-)control, including perception, deliberation, control of action, reflection on successes and failures, and many more. So in addition to discussing information in relation to senders and receivers, we also need to discuss its relevance to information users.

Then sources, senders, receivers, encoders, decoders, etc. could be discussed as secondary topics, though for Shannon's job the secondary topics were the only, or the main, matters of concern, because his employer was Bell Telephone Laboratories.

In the context of the Meta-Morphogenesis project a major way in which information of various kinds, with various sources, plays central, highly context-sensitive roles, is in individual development, as discussed briefly below (Meta-Configured Genomes).

Is that what Auletta et al. intended to say?


Uses of information in other, especially simpler, organisms
(Added Aug 2018)

I expect nobody knows how Jane Austen might have responded (after hearing an extended report on biological research done long after her death) to a question like:
"Can other animals make use of information, e.g. horses, cats, birds, snails, butterflies, earthworms, or even flowers, trees, fungi in a forest and micro-organisms performing essential functions in human bodies?"

I have a fantasy that one day an Austen scholar, with much deeper acquaintance with her writings, will compose an imaginary dialogue between Jane Austen and a biologist discussing capabilities of a wide range of organisms, perhaps starting with these presentations by Maddie Moate and colleagues:
https://www.youtube.com/watch?v=L7FGxcNPMCE
     Amazing Animal Architects
https://www.youtube.com/watch?v=spMkaJp975s
     Monkeys react to magic
and perhaps going on to much simpler, smaller organisms, or even individual cells in the bodies of living things.

Investigating the varieties of information processing between the very simplest organisms, or proto-organisms, and the most complex, and how and why the relevant types of use of information emerged, is the main goal of the Turing-inspired Meta-Morphogenesis project:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/meta-morphogenesis.html

That project includes the hypothesis that the most basic and most pervasive use of information in living things is for control: receiving, transmitting, storing, retrieving, encoding and decoding are all of secondary importance, insofar as they all contribute directly or indirectly, immediately or with some delay, to the use of information, although some processes whose function is to make information available for use may turn out to have been redundant because the information is never used. But that doesn't stop it being potentially usable information.

Most philosophers who write about information tend to focus on intentional uses or transmissions of information by humans, whose use of information is typically complex and varied, with many subtleties. For much simpler forms of life the uses of information and the types of information are much more restricted though the physical mechanisms may be quite complex, as suggested by the "chemoton" theory of Ganti(2003).


The Meta-Morphogenesis (M-M) project
(using Meta-Configured genomes)

This collection of notes on information is part of the background to the Turing-Inspired Meta-Morphogenesis project, whose aim is to understand the variety of roles of information in all forms of life, including the information processing mechanisms used at various stages of biological evolution, and in various evolutionary lineages, including microbes, plants, precursors of animals, all varieties of animals, and in some cases uses by larger groups, including flocks, shoals, symbiotic collaborators, and uses (including mis-uses) of information by large "virtual" entities, such as cultures, engineering teams, political movements, and academic disciplines.
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/meta-morphogenesis.html

Meta-Configured Genomes

A key idea that has been under development for some time, in collaboration with Jackie Chappell, which turned out to be of great importance to the M-M project, concerns the multi-stage, multi-level contexts in which information from a genome (i.e. genetic information) can be used during development, leading to wildly different effects of a particular part of the genome in different developmental environments.

Our ideas constitute steps toward a theory of "A meta-configured genome", according to which genetic information interacts during its expression both with current aspects of the environment and with products of earlier stages of gene-expression, as illustrated most obviously in connection with the multi-stage processes in language development. This idea (still under development) is outlined in: "The Meta-Configured Genomes" (work in progress).
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/meta-configured-genome.html
I believe this is closely related to Annette Karmiloff-Smith's theory of "representational redescription" during individual development:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/beyond-modularity.html


Frege on Sense & Reference [Sinn/Bedeutung]
(Added: Aug 2014)

Gottlob Frege made use of a distinction between two aspects of meaning, or information content, usually translated using the words "sense" and "reference", echoing what earlier philosophers had referred to by distinguishing "connotation" and "denotation", or "intension" and "extension". The distinction is so pervasive that it has probably been re-invented or re-discovered many times, though using different terminology.

As far as I know this distinction was not discussed by Shannon, although his 1948 paper implicitly makes use of the distinction insofar as he uses the word "sense" several times, e.g. in contexts like
     "...if P is sufficiently large, in the sense of having an entropy power approaching P + N"
and
     "...the evaluation is "reasonable" in the sense that...".

However, problems arise when attempts are made to apply the sense/reference distinction to every possible word or phrase or sign or process that in some sense can be said to convey information or have a meaning.

Examples that cause problems (some of them discussed by Frege) include demonstrative/indexical expressions, e.g. "here", "now", "you", "I", "we"; words that combine sentence fragments to form new larger fragments or whole sentences, or qualify assertions, such as "but", "although", "perhaps", "of course"; proper names; and many others.

A good novelist with a rich and deep command of her language will use all these hard to analyse words and phrases without worrying about philosophers' questions. However, a complete theory of information, covering all the varieties of information contents of portions of human languages, will have to make explicit the roles of the more complex and subtle words and phrases, as philosophers and linguists have attempted to do. (A survey is beyond the scope of this paper. Is there a good tutorial reference?)

I have tried to do this with a little word that causes big problems, namely "self", whose linguistic function is sometimes misconstrued as referring to a special mysterious entity ("the self") by philosophers and others, here:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/the-self.html
    "THE SELF" -- A BOGUS CONCEPT

Without a good theory covering all the obvious and unobvious cases we are unlikely to be able to design robots that have minds like ours.
[A tutorial on conceptual analysis is available in chapter 4 of my 1978 book:
http://www.cs.bham.ac.uk/research/projects/cogaff/crp/#chap4]



MISCELLANEOUS NOTES AND REFERENCES

G. Auletta, G.F.R. Ellis and L. Jaeger, 2008,
Top-down causation by information control: from a philosophical problem to a scientific research programme, Journal of the Royal Society Interface, Vol 5, No 27, Oct 2008, pp. 1159--1172,
http://dx.doi.org/10.1098/rsif.2008.0018

(Discussed in the text, above).


Gregory Bateson's contribution
Steps to an Ecology of Mind: Collected Essays in Anthropology, Psychiatry, Evolution, and Epistemology, 1972,
Chandler Publishing, Suffolk,

Bateson on "difference": discussion note by A.S.
(Added here 29 Aug 2018)
What did Gregory Bateson mean when he wrote: "information" is "a difference that makes a difference"?
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/information-difference.html

Bateson is frequently quoted approvingly by unthinking admirers who seem to ignore the fact that, no matter how memorable the slogan sounds, it is not at all clear what it could possibly mean. Guided by some of Bateson's writings, the "difference" discussion note explains that Bateson was referring to some of the patterns of causal influence produced in brains by information. (With thanks to Olivier Marteaux for correcting my initial interpretation.)


Margaret Boden's account
Margaret Boden's "Magnum Opus" includes much that is relevant.
Mind As Machine: A history of Cognitive Science (Vols 1--2) (2006)
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/boden-mindasmachine.html
The discussion starting in section 15iii (Mathematical Biology Begins), continuing to the end of 15iv (15.iv. Turing's Biological Turn) is specially relevant to the M-M project, summarising work by D'Arcy Thompson, Alan Turing and others. Most of the rest of the book is also relevant, some portions more closely than others. In particular see her discussions of the role of the notions of "information" (as opposed to matter, energy, force, etc.) both in explanations of natural phenomena and in design of new machinery.
4.v. Cybernetic Circularity: From Steam-Engines to Societies (p.198)
"...the focus of cybernetics was on the flow of information, as opposed to the matter or energy involved. Because information is an abstract notion, it could be applied to many different types of system--even including minds."

Jackie Chappell's contribution
Added: 6 Jun 2018
For some time I have been collaborating with Jackie Chappell, a biologist who works on animal cognition, using the concept of (Semantic -- non-Shannon) information attributed here to Jane Austen. Most of her publications are listed here:
https://www.researchgate.net/profile/Jackie_Chappell

For a sample of her work closely related to this topic, see:
Jackie Chappell, 2014, Acting on the world: understanding how agents use information to guide their action, in From Animals to Robots and Back: Reflections on Hard Problems in the Study of Cognition, Eds., Wyatt, J.L. and Petters, D.D. and Hogg, D.C., Springer, pp 51--64, 978-3-319-06614-1,
http://www.cs.bham.ac.uk/research/projects/cogaff/chappell-action-on-the-world.pdf

Chappell's paper uses the word "information" 27 times in 10 pages -- in the sense of Jane Austen, but mainly in the context of non-human intelligence.

Pre-publication summary:
Most animals navigate a dynamic and shifting sea of information provided by their environment, their food or prey and other animals. How do they work out which pieces of information are the most important or of most interest to them, and gather information on those parts to guide their action later? In this essay, I briefly outline what we already know about how animals use information flexibly and efficiently. I then discuss a few of the unsolved problems relating to how animals collect information by directing their attention or exploration selectively, before suggesting some approaches which might be useful in unravelling these problems.
She does not use "information" to refer to bit-patterns. There's no evidence that a crow deciding where and how to add a new twig to a partly built nest has access to, makes use of, or needs an information structure composed of strings of bits.


Luciano Floridi, "Semantic Conceptions of Information", in The Stanford Encyclopedia of Philosophy (Spring 2017 Edition), Edward N. Zalta (ed.).
https://plato.stanford.edu/entries/information-semantic/


George Dyson on Information
Added: 10 Mar 2015
For an excellent historical overview of varieties of information processing (mainly by humans) since ancient times see:
George B. Dyson,
Darwin Among The Machines: The Evolution Of Global Intelligence,
Addison-Wesley, Reading, MA, 1997,
http://www.amazon.co.uk/Darwin-Among-Machines-George-Dyson/dp/0140267441

Gottlob Frege, On Sense and Reference,
(translated 1948), The Philosophical Review, Vol 57, No 3, May 1948, pp. 209--230.


Tibor Ganti, 2003. The Principles of Life,
Eds. E. Szathmáry, & J. Griesemer, (Translation of the 1971 Hungarian edition), OUP, New York.
See the very useful summary/review of this book by Gert Korthof:
http://wasdarwinwrong.com/korthof66.htm


Samuel Johnson on information
Added: 27 Dec 2013
The semantic concept of information is, of course, much older than Jane Austen. Among many others, Samuel Johnson (1709--1784) used the "semantic" concept of information:
     "We know a subject ourselves, or we know where we can find information on it"
quoted in Boswell's Life of Johnson, 1791.


Dennett Interview Nov 2017
(Added 2 Jan 2021)
I recently came across this interview in which Daniel Dennett reflects on information:
https://www.edge.org/conversation/daniel_c_dennett-a-difference-that-makes-a-difference

Here's a sample extract:

"....I've been trying to articulate, with the help of Harvard evolutionary biologist David Haig, just what meaning is, what content is, and ultimately, in terms of biological information and physical information, the information presented in A Mathematical Theory of Communication by Shannon and Weaver. There's a chapter in my latest book called "What is Information?" I stand by it, but it's under revision. I'm already moving beyond it and realizing there's a better way of tackling some of these issues.

The key insight, which I've known for years, is that we have to get away from the idea of there being the pure ultimate fixed proposition that captures the information in any informational state. This goal of capturing the proposition, this attempt at idealization that philosophers have poured their hearts and souls into for a hundred years is just wrong. Don't even try. I'm now coming around to wonder why it had such a hold on us. It's quite obvious once you start thinking this way."

I was surprised by the comment about "why it had such a hold on us". To whom does "us" refer? I don't recognize that "hold on us" in most of the philosophers I have read, though I think many psychologists and neuroscientists have been confused by Shannon's ideas. But not all. When a biologist (such as Jackie Chappell, mentioned above) refers to information used by some animal there's no implication that bit-patterns are relevant.

I remember arguing with fellow students, around 1959, including at least one music student, against confusion of our ordinary information concept and Shannon information. My 1962 DPhil thesis was about relationships between meaning (information content) and truth, and never mentions Shannon, or anything remotely like Shannon information. I had encountered his ideas, but decided they were irrelevant to the problem of understanding relationships between meaning and truth, especially necessary truth, and also irrelevant to Kant's deep ideas published in his Critique of Pure Reason (1781), the main inspiration for my work.
http://www.cs.bham.ac.uk/research/projects/cogaff/62-80.html#1962

Dennett seems not to have noticed that Shannon had simply misused the pre-existing label "information". Although many scientists and engineers have found Shannon's ideas very useful they don't always notice his linguistic misuse.


Schrödinger's contribution
Erwin Schrödinger anticipated some of Shannon's ideas in his wonderful little book
     What is life? CUP, Cambridge, 1944.
I have an annotated version of part of the book here:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/schrodinger-life.html


Claude Shannon, (1948), A mathematical theory of communication, in Bell System Technical Journal, July and October, vol 27, pp. 379--423 and 623--656,
https://archive.org/download/pdfy-nl-WZBa8gJFI8QNh/shannon1948.pdf


Aaron Sloman, 1985, What enables a machine to understand?, in Proceedings 9th IJCAI Los Angeles, pp. 995--1001, http://www.cs.bham.ac.uk/research/projects/cogaff/81-95.html#4
(Introduced the concept of "loop-closing semantics", later developed by Ian Wright, e.g. in this talk:
https://www.youtube.com/watch?v=GOOe2T7VTbo)

What's information
An extended discussion of the older concept of information, and its uses in science and engineering as well as in ordinary life, can be found in:

Aaron Sloman
What's information, for an organism or intelligent machine?
How can a machine or organism mean?,
http://www.cs.bham.ac.uk/research/projects/cogaff/09.html#905
This was an invited contribution to Information and Computation, Eds. Gordana Dodig-Crnkovic and Mark Burgin, World Scientific Publishers, New Jersey, pp.393--438, 2011

A partial index of discussion notes on this and many other topics is in
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/AREADME.html


Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham