Aaron Sloman
Last updated: 25 Jan 2011
Installed: 25 Jan 2011
For more discussion papers see http://www.cs.bham.ac.uk/research/projects/cogaff/misc/AREADME.html
I have a conjecture and a related research proposal concerning energy and email. Neither energy nor support for email or networking services is my research field, though I am concerned as a citizen about what I suspect is a huge amount of waste involved. I apologise to all if the answer is already known, especially if my conjecture is already known to be wrong.

Background:
In the early days of email and the internet, messages were sent in plain text. I believe it was after Microsoft woke up to email, and the use of graphical terminals supporting multiple fonts as well as pictures became widespread, that they decided to make html the default format for messages, presumably thinking of the way corporations wished to impress their members, their rivals and their customers by using fancy logos, colours, special fonts, etc. However, it was also necessary to provide a plain text copy in case a message was received by someone with an old mail reader. So messages were sent in both formats by default. Most Microsoft users had no idea this was happening and did not know how to disable it, even when the option to send plain text was available. After that use of email spread, other designers of email software felt they had to provide what Microsoft had provided.

As a result, the vast majority of the millions of email messages now being sent every day within and between organisations, as well as between organisations and individuals, include both html and plain text. This duplication before transmission now seems to be the default setting for almost all email users in my university, and in most other universities whose members write to me.

For example, the plain text version of a recent administrative message contained 1152 characters (bytes), whereas the html copy of the message, with all the automatically generated html included, contained 7938 characters (bytes). The increase in size of the message body is therefore 7938/1152 = 6.89 times the size of the plain text, i.e.
689%. (It's a smaller increase for the message as a whole, including the message headers shared between both versions of the message.) Most people have no idea that such inflation is happening whenever they send email, including those concerned citizens who include in their messages a polite request to save trees by not printing email, sometimes with an attached image of a tree displayed in the html version, adding significantly to the bulk transmitted.

Why should all this matter?
My conjecture is that worldwide the annual cost of all the extra email bulk caused by use of html will be billions of pounds, and it will probably also be a significant part of this university's budget, and this university will also be adding to the costs of other organisations and individuals to whom, or through whom, it sends email. Others, of course, are doing the same to us. I have not done a comprehensive analysis, but here are some of the extra costs I am aware of:

1. In order to maintain a given level of information transfer, the network bandwidth provided (and paid for) has to be significantly increased to cope with the extra html padding. What proportion of increase is required will depend on what else the network is used for. (Part of the research required.)
2. The various relay stations will require extra memory and extra cpu power, plus extra power supplies and electricity consumption, along with extra backup hardware to provide reliability. This will have both capital costs and running costs, including carbon emission costs.
3. The same is true of the extra processing requirement for spam filters and anti-malware checks at various points.
4. The end users who pay pro rata for downloads and uploads will have to pay significant extra costs to service providers. Those who don't pay according to the amount transmitted will just have slower downloads and uploads, and extra capital and running costs to support the processing and file capacity on their local machines. They may also have to pay higher charges simply because the costs of service providers are increased by the need to cope with all the unnecessary html in email messages.
5. If end users keep significant amounts of mail, and do proper backups, the backup capital and energy costs will be increased. In some cases this will also add human time and inconvenience, e.g. because of occasions when file capacity limits are reached.
6. There may also be extra subsequent costs because of regular anti-malware checks, and the costs of using utilities that search local files by content.

I don't know if that's an exhaustive list.

Proposed research project:
The proposed research project would do a systematic investigation
(a) to identify all the types of cost associated with email processing (networking, local storage, cpu power needed, energy consumed, human time, etc.),
(b) to identify the annual cost to this university, and perhaps to the whole UK educational system,
(c) to estimate the total annual cost to the nation,
(d) to estimate the cost worldwide.

My conjecture is that energy usage will be a significant proportion of the total. (Even capital equipment costs include the energy required to make and transport the equipment, and service engineers, etc.)

If the costs are significant, the project would also come up with recommendations for actions at various levels to reduce the costs. E.g. could that be done by running software to remove html duplicates of plain text sections of email messages coming into or passed around in the university, or would the energy cost of that process be higher than the cost of allowing all the html to pass through?

There is another type of cost that is not the subject of the project I am proposing: the use of html makes transmission of malware easier for criminals because what users see displayed in their messages may not reflect the actual content of the underlying html. (Though some mail interfaces now detect that and warn the user.)
I don't know how much of the global cost of html is due to people being deceived in that way, or how much is due to the other wasted costs of the criminal spam that is sent in vain to very large numbers of people only because html makes it possible to deceive a small subset of ignorant users who part with their money as a result. Does anyone have figures?

There is a little more about this, including advice on how to turn off sending html by default, here: http://www.cs.bham.ac.uk/~axs/textmail (based in part on Uday Reddy's Email tips file here: http://www.cs.bham.ac.uk/~udr/tips/ )

To illustrate what I am talking about, the plain text version of an administrative message received recently had this amount of text.

Dear colleagues,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

In contrast, the TOTALLY UNNECESSARY html copy that came with it (presumably without the sender's knowledge) because of the default mail settings used starts as follows, although most recipients will never see this stuff because it is not displayed.
(To ensure that it is readable even in html I have replaced the "<" and ">" html brackets throughout with "[" and "]" (52 occurrences of each): START OF HTML MESSAGE BODY [html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" 
xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:wf="http://schemas.microsoft.com/sharepoint/soap/workflow/" xmlns:dsss="http://schemas.microsoft.com/office/2006/digsig-setup" xmlns:dssi="http://schemas.microsoft.com/office/2006/digsig" xmlns:mdssi="http://schemas.openxmlformats.org/package/2006/digital-signature" xmlns:mver="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels="http://schemas.openxmlformats.org/package/2006/relationships" xmlns:spwp="http://microsoft.com/sharepoint/webpartpages" xmlns:ex12t="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:pptsl="http://schemas.microsoft.com/sharepoint/soap/SlideLibrary/" xmlns:spsl="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService" xmlns:Z="urn:schemas-microsoft-com:" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40"][head][META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii"][meta name=Generator content="Microsoft Word 12 (filtered medium)"][style][!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0cm; margin-bottom:.0001pt; font-size:11.0pt; font-family:"Calibri","sans-serif";} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {mso-style-priority:99; color:purple; text-decoration:underline;} p {mso-style-priority:99; margin:0cm; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman","serif";} p.MsoAcetate, li.MsoAcetate, div.MsoAcetate {mso-style-priority:99; mso-style-link:"Balloon 
Text Char"; margin:0cm; margin-bottom:.0001pt; font-size:8.0pt; font-family:"Tahoma","sans-serif";} span.BalloonTextChar {mso-style-name:"Balloon Text Char"; mso-style-priority:99; mso-style-link:"Balloon Text"; font-family:"Tahoma","sans-serif";} p.msochpdefault, li.msochpdefault, div.msochpdefault {mso-style-name:msochpdefault; mso-style-priority:99; margin:0cm; margin-bottom:.0001pt; font-size:10.0pt; font-family:"Times New Roman","serif";} span.balloontextchar0 {mso-style-name:balloontextchar; font-family:"Tahoma","sans-serif";} span.emailstyle19 {mso-style-name:emailstyle19; font-family:"Calibri","sans-serif"; color:windowtext;} span.emailstyle20 {mso-style-name:emailstyle20; font-family:"Calibri","sans-serif"; color:#1F497D;} span.EmailStyle24 {mso-style-type:personal; font-family:"Calibri","sans-serif"; color:#1F497D;} span.EmailStyle25 {mso-style-type:personal; font-family:"Calibri","sans-serif"; color:#1F497D;} span.EmailStyle26 {mso-style-type:personal; font-family:"Calibri","sans-serif"; color:#1F497D;} span.EmailStyle27 {mso-style-type:personal; font-family:"Calibri","sans-serif"; color:#1F497D;} span.EmailStyle28 {mso-style-type:personal-reply; font-family:"Calibri","sans-serif"; color:#1F497D;} .MsoChpDefault {mso-style-type:export-only; font-size:10.0pt;} @page WordSection1 {size:612.0pt 792.0pt; margin:72.0pt 72.0pt 72.0pt 72.0pt;} div.WordSection1 {page:WordSection1;} --][/style][!--[if gte mso 9]][xml] [o:shapedefaults v:ext="edit" spidmax="1026" /] [/xml][![endif]--][!--[if gte mso 9]][xml] [o:shapelayout v:ext="edit"] [o:idmap v:ext="edit" data="1" /] [/o:shapelayout][/xml][![endif]--][/head][body lang=EN-GB link=blue vlink=purple][div class=WordSection1][div][div][p class=MsoNormal][span style='color:black']Dear colleagues,[o:p][/o:p][/span][/p][p class=MsoNormal][span style='color:black'][o:p] [/o:p][/span][/p] [p class=MsoNormal][span style='color:black'] ACTUAL MESSAGE STARTS HERE: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXX
AND ENDS HERE
[o:p][/o:p][/span][/p][p class=MsoNormal] [span style='color:black'][o:p] [/o:p][/span][/p]
END OF HTML MESSAGE BODY

The actual message content clearly occupies only a small part of the html body.

Of course, my conjecture that there are substantial costs, including capital costs, running costs for maintenance, etc. and energy costs, as described above, may turn out to be false. However, even then the project may still be worthwhile and interesting, though not so important.

I spent a few minutes using Google to search for reports of such a project but did not find anything (though that may be because I did not spend enough time looking: Google is usually very good at putting things I want near the top of the list, if they exist). It's possible that Prof Andy Hopper (Cambridge) already has the information, since he has done a lot of work, and given public presentations, on energy costs of networking, but my (short) search did not reveal anything by him on the costs of html-inflated email.

If anyone is interested in doing a project like this, or knows of one that has already been done, please let me know. [A.Sloman AT cs.bham.ac.uk]
If you send me a comment, please specify if you would prefer it not to be included in an appendix to this web site, which I shall create later.
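Illustrative sketch (not part of the original note): the size comparison reported above, and the suggestion of running software to remove html duplicates of plain text sections, could be prototyped along the following lines using Python's standard email package. The function names body_sizes, inflation_percent and strip_html_alternatives are hypothetical, and the sketch assumes the common case where the html copy is a direct sibling of the plain text copy inside a multipart/alternative container. It says nothing about whether the energy cost of such filtering would be lower than the cost of letting the html through, which is the open research question.

```python
from email.message import EmailMessage


def body_sizes(msg):
    """Return (plain_bytes, html_bytes), summed over all text parts of msg."""
    plain = html = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body text of their own
        payload = part.get_payload(decode=True) or b""
        ctype = part.get_content_type()
        if ctype == "text/plain":
            plain += len(payload)
        elif ctype == "text/html":
            html += len(payload)
    return plain, html


def inflation_percent(plain_bytes, html_bytes):
    """Extra body size added by the html copy, as a percentage of the plain text."""
    return 100.0 * html_bytes / plain_bytes


def strip_html_alternatives(msg):
    """Remove text/html parts that sit next to a text/plain sibling.

    Assumption: only direct children of multipart/alternative are handled.
    Html wrapped in multipart/related (e.g. with inline images) is left alone.
    A container left with one part is still valid MIME, though slightly untidy.
    Returns the number of parts removed; msg is modified in place.
    """
    removed = 0
    for part in list(msg.walk()):
        if part.get_content_type() != "multipart/alternative":
            continue
        children = part.get_payload()
        if any(c.get_content_type() == "text/plain" for c in children):
            kept = [c for c in children if c.get_content_type() != "text/html"]
            removed += len(children) - len(kept)
            part.set_payload(kept)
    return removed


if __name__ == "__main__":
    # Figures quoted in the note: 1152 bytes of plain text, 7938 bytes of html.
    print(f"inflation: {inflation_percent(1152, 7938):.0f}%")

    # A small made-up message carrying both formats.
    demo = EmailMessage()
    demo.set_content("Dear colleagues,\nThe meeting is at 2pm.\n")
    demo.add_alternative(
        "<html><body><p>Dear colleagues,</p><p>The meeting is at 2pm.</p></body></html>",
        subtype="html",
    )
    print("before:", body_sizes(demo))
    print("parts removed:", strip_html_alternatives(demo))
    print("after:", body_sizes(demo))
```

To try this on real mail, a stored message could be parsed with email.message_from_bytes before being passed to these functions. Any real deployment would also need to check that the plain text part really does carry the same content as the html part before discarding the latter.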
Maintained by
Aaron Sloman
School of Computer Science
The University of Birmingham