School of Computer Science THE UNIVERSITY OF BIRMINGHAM

Energy and Other Costs of HTML in Email

Aaron Sloman
Last updated: 25 Jan 2011
Installed: 25 Jan 2011
For more discussion papers see http://www.cs.bham.ac.uk/research/projects/cogaff/misc/AREADME.html

Conjecture and Proposal

I have a conjecture and a related research proposal concerning energy and email.
Neither energy nor support for email or networking services is my research field,
though I am concerned as a citizen about what I suspect is a huge amount of waste
involved. I apologise to all if the answer is already known, especially if my
conjecture is already known to be wrong.

Background:

In the early days of email and the internet, messages were sent in plain text. I
believe it was after Microsoft woke up to email, and graphical terminals
supporting multiple fonts as well as pictures became widespread, that they decided to
make html the default format for messages, presumably thinking of the way
corporations wished to impress their members, their rivals and their customers by using
fancy logos, colours, special fonts, etc.

However, it was necessary also to provide a plain text copy in case a message was
received by someone with an old mail reader. So messages were sent in both formats by
default. Most Microsoft users had no idea this was happening and did not know how to
disable it, even when the option to send plain text was available.

As the use of email spread, other designers of email software felt they had to
provide what Microsoft had provided. As a result, the vast majority of the millions
of email messages now sent every day, within and between organisations as well
as between organisations and individuals, include both html and plain text.

This duplication before transmission now seems to be the default setting for almost
all email users in my university, and most other universities whose members write to
me.
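
In MIME terms, a message sent in both formats is normally a
multipart/alternative body holding one text/plain part and one text/html
part. The following is a minimal sketch of how such a message is put
together, using Python's standard email package. The function names, the
addresses and the sample text are illustrative assumptions, not taken from
any real mail client.

```python
from email.message import EmailMessage
from typing import List


def build_two_format_message(plain: str, html: str) -> EmailMessage:
    """Build a message carrying the same content as plain text and as html."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"       # placeholder address
    msg["To"] = "recipient@example.org"      # placeholder address
    msg["Subject"] = "Example"
    msg.set_content(plain)                     # the text/plain part
    msg.add_alternative(html, subtype="html")  # adds the text/html part
    return msg


def content_types(msg: EmailMessage) -> List[str]:
    """List the content type of every node in the MIME tree, outermost first."""
    return [part.get_content_type() for part in msg.walk()]


# Made-up sample content, for illustration only.
example = build_two_format_message(
    "Dear colleagues,\n",
    "<html><body><p>Dear colleagues,</p></body></html>\n",
)
print(content_types(example))
```

The printed list should show the multipart/alternative container followed
by its text/plain and text/html parts: both copies travel in every message.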

For example, the plain text version of a recent administrative message contained 1152
characters (bytes), whereas the html copy of the message, with all the automatically
generated html included, contained 7938 characters (bytes).

The html copy is therefore 7938/1152 = 6.89 times the size of the plain text, so
sending both increases the size of the message body by 689%.

(It's a smaller increase for the message as a whole, including the message headers
shared between both versions of the message.)
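
The arithmetic can be checked with a few lines of Python. This is only a
sketch: the two byte counts are the ones quoted above, and the function and
key names are illustrative.

```python
def html_overhead(plain_bytes: int, html_bytes: int) -> dict:
    """Compare a plain text body with the html copy sent alongside it."""
    total = plain_bytes + html_bytes
    return {
        # size of the html copy relative to the plain text
        "html_to_plain_ratio": html_bytes / plain_bytes,
        # growth of the body when the html copy is added to the plain text
        "percent_increase": 100.0 * html_bytes / plain_bytes,
        # bytes in the body when both copies are sent
        "total_body_bytes": total,
        # fraction of the two-format body taken up by html
        "html_share_of_body": html_bytes / total,
    }


result = html_overhead(1152, 7938)
for name, value in result.items():
    print(name, round(value, 2))
```

Header bytes are deliberately left out, so the figures apply to the message
body only.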

Most people have no idea that such inflation is happening whenever they send email,
including those concerned citizens who include in their messages a polite request to
save trees by not printing email, sometimes with an attached image of a tree
displayed in the html version, adding significantly to the bulk transmitted.

Why should all this matter?

My conjecture is that world wide, the annual cost of all the extra email bulk
caused by use of html will be billions of pounds. It will probably also be a
significant fraction of this university's budget, and this university will in turn be
adding to the costs of other organisations and individuals to whom, or through whom,
it sends email.

Others, of course, are doing the same to us.

I have not done a comprehensive analysis but here are some of the extra costs I am
aware of:

 1. In order to maintain a given level of information transfer, the network bandwidth
    provided (and paid for) has to be significantly increased to cope with the extra html
    padding. How large an increase is required will depend on what else the network
    is used for. (Finding out is part of the research required.)

 2. The various relay stations will require extra memory and extra cpu power, plus
    extra power supplies and electricity consumption, along with extra backup hardware to
    provide reliability. This will have both capital costs and running costs, including
    the costs of the associated carbon emissions.

 3. The same is true of the extra processing requirement for spam filters and
    anti-malware checks at various points.

 4. End users who pay pro rata for downloads and uploads will have to pay
    significant extra costs to service providers. Those who don't pay according to
    the amount transmitted will just have slower downloads and uploads, and extra capital
    and running costs to support the processing and file capacity on their local
    machines. They may also have to pay higher charges simply because the costs of
    service providers are increased by the need to cope with all the unnecessary html
    in email messages.

 5. If end users keep significant amounts of mail, and do proper backups, the backup
    capital and energy costs will be increased. In some cases this will also add human
    time and inconvenience, e.g. because of occasions when file capacity limits are
    reached.

 6. There may also be extra subsequent costs because of regular anti-malware checks,
    and the costs of using utilities that search local files by content.

I don't know if that's an exhaustive list.

Proposed research project:

The proposed research project would do a systematic investigation

(a) to identify all the types of cost associated with email processing (networking,
    local storage, cpu power needed, energy consumed, human time, etc.)

(b) to identify the annual cost to this university -- and perhaps to the whole UK
    educational system,

(c) to estimate the total annual cost to the nation,

(d) to estimate the cost world wide.
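
One input to any of these estimates would be the fraction of email text
that is html. The following is a rough sketch of how that could be measured
over a collection of raw messages, using Python's standard email package.
The function name and the sample messages are illustrative assumptions, and
how the raw messages are obtained (mail spool, archive, server log) is left
open.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Iterable


def html_share(raw_messages: Iterable[bytes]) -> float:
    """Return the fraction of text body bytes carried in text/html parts.

    Sizes are measured after decoding; base64 or quoted-printable
    encoding on the wire would add to them.
    """
    plain_total = 0
    html_total = 0
    for raw in raw_messages:
        msg = message_from_bytes(raw, policy=policy.default)
        for part in msg.walk():
            if part.is_multipart():
                continue                      # containers have no body of their own
            size = len(part.get_payload(decode=True) or b"")
            ctype = part.get_content_type()
            if ctype == "text/plain":
                plain_total += size
            elif ctype == "text/html":
                html_total += size
    total = plain_total + html_total
    return html_total / total if total else 0.0


# Two made-up messages: one sent in both formats, one in plain text only.
both = EmailMessage()
both.set_content("Dear colleagues,\n")
both.add_alternative("<html><body><p>Dear colleagues,</p></body></html>\n",
                     subtype="html")
plain_only = EmailMessage()
plain_only.set_content("Dear colleagues,\n")

print(html_share([both.as_bytes(), plain_only.as_bytes()]))
```

Attachments and images are ignored here, so the result says nothing about
their contribution to the total.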

My conjecture is that energy usage will be a significant proportion of the total.
(Even capital equipment costs include the energy required to make and transport the
equipment, and service engineers, etc.)

If the costs are significant, the project would also come up with recommendations for
actions at various levels to reduce the costs.

E.g. could that be done by running software to remove html duplicates of plain text
sections of email messages coming into or passed around in the university, or would
the energy cost of that process be higher than the cost of allowing all the html to
pass through?
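
As a rough illustration of what such software might do, here is a minimal
sketch using Python's standard email package. It handles only the simplest
case, a message whose whole body is a multipart/alternative pair, and
returns everything else unchanged. The function name and the sample message
are assumptions made for the example; this is not a description of any
existing mail filter.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def strip_html_alternative(raw: bytes) -> bytes:
    """Return the message with its html alternative removed, if it has one.

    Only a top-level multipart/alternative body is handled; anything else
    (attachments, nested structures, html-only mail) is returned unchanged.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    if msg.get_content_type() != "multipart/alternative":
        return raw
    plain_parts = [p for p in msg.iter_parts()
                   if p.get_content_type() == "text/plain"]
    if not plain_parts:
        return raw                        # no plain text copy to fall back on
    text = plain_parts[0].get_content()   # decoded plain text
    msg.clear_content()                   # drops the body and Content-* headers
    msg.set_content(text)                 # body is now a single text/plain part
    return msg.as_bytes()


# Demonstration with a made-up message.
demo = EmailMessage()
demo["Subject"] = "Example"
demo.set_content("Dear colleagues,\n")
demo.add_alternative("<html><body><p>Dear colleagues,</p></body></html>\n",
                     subtype="html")
original = demo.as_bytes()
stripped = strip_html_alternative(original)
print(len(original), len(stripped))
```

The sketch says nothing about the energy question: parsing and rewriting
every message has its own processing cost, which would have to be measured.
Rewriting the body would also invalidate any digital signature computed
over it.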

There is another type of cost that is not the subject of the project I am proposing:
the use of html makes transmission of malware easier for criminals because what users
see displayed in their messages may not reflect the actual content of the underlying
html. (Though some mail interfaces now detect that and warn the user.)

I don't know how much of the global cost of html is due to people being deceived in
that way, or the other wasted costs because of the criminal spam that is sent in vain
to very large numbers of people only because html makes it possible to deceive a
small subset of ignorant users who part with their money as a result.

Does anyone have figures?

There is a little more about this, including advice on how to turn off sending html
by default, here:

    http://www.cs.bham.ac.uk/~axs/textmail
    (based in part on Uday Reddy's Email tips
    file here: http://www.cs.bham.ac.uk/~udr/tips/ )

To illustrate what I am talking about, the plain text version of an
administrative message received recently had this amount of text:

    Dear colleagues,

    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

In contrast, the TOTALLY UNNECESSARY html copy that came with it (presumably without
the sender's knowledge) because of the default mail settings used, starts as follows,
although most recipients will never see this stuff because it is not displayed.

(To ensure that it is readable even in html I have replaced the "<"
and ">" html brackets throughout with "[" and "]", 52 occurrences of
each.)

START OF HTML MESSAGE BODY
    [html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word"
    xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40"
    xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:wf="http://schemas.microsoft.com/sharepoint/soap/workflow/" xmlns:dsss="http://schemas.microsoft.com/office/2006/digsig-setup" xmlns:dssi="http://schemas.microsoft.com/office/2006/digsig" xmlns:mdssi="http://schemas.openxmlformats.org/package/2006/digital-signature" xmlns:mver="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels="http://schemas.openxmlformats.org/package/2006/relationships" xmlns:spwp="http://microsoft.com/sharepoint/webpartpages" xmlns:ex12t="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:pptsl="http://schemas.microsoft.com/sharepoint/soap/SlideLibrary/" xmlns:spsl="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService" 
    xmlns:Z="urn:schemas-microsoft-com:" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40"][head][META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii"][meta name=Generator content="Microsoft Word 12 (filtered medium)"][style][!--
    /* Font Definitions */
    @font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
    @font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
    @font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
    /* Style Definitions */
    p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
    a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
    a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
    p
        {mso-style-priority:99;
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
    p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
        {mso-style-priority:99;
        mso-style-link:"Balloon Text Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:8.0pt;
        font-family:"Tahoma","sans-serif";}
    span.BalloonTextChar
        {mso-style-name:"Balloon Text Char";
        mso-style-priority:99;
        mso-style-link:"Balloon Text";
        font-family:"Tahoma","sans-serif";}
    p.msochpdefault, li.msochpdefault, div.msochpdefault
        {mso-style-name:msochpdefault;
        mso-style-priority:99;
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Times New Roman","serif";}
    span.balloontextchar0
        {mso-style-name:balloontextchar;
        font-family:"Tahoma","sans-serif";}
    span.emailstyle19
        {mso-style-name:emailstyle19;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
    span.emailstyle20
        {mso-style-name:emailstyle20;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle24
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle25
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle26
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle27
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle28
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    .MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
    @page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
    div.WordSection1
        {page:WordSection1;}
    --][/style][!--[if gte mso 9]][xml]
    [o:shapedefaults v:ext="edit" spidmax="1026" /]
    [/xml][![endif]--][!--[if gte mso 9]][xml]
    [o:shapelayout v:ext="edit"]
    [o:idmap v:ext="edit" data="1" /]
    [/o:shapelayout][/xml][![endif]--][/head][body lang=EN-GB link=blue vlink=purple][div class=WordSection1][div][div][p class=MsoNormal][span
    style='color:black']Dear colleagues,[o:p][/o:p][/span][/p][p
    class=MsoNormal][span
    style='color:black'][o:p] [/o:p][/span][/p]
    [p class=MsoNormal][span style='color:black']

ACTUAL MESSAGE STARTS HERE:
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXX
AND ENDS HERE

    [o:p][/o:p][/span][/p][p
    class=MsoNormal]
    [span style='color:black'][o:p] [/o:p][/span][/p]
END OF HTML MESSAGE BODY

The actual message content clearly occupies only a small fraction of the html body.

Of course, my conjecture that there are substantial costs, including capital costs,
running costs for maintenance, etc. and energy costs, as described above, may turn
out to be false.

Even then, however, the project may still be worthwhile and interesting, though
less important.

I spent a few minutes using Google to search for reports of such a project but did
not find anything (though that may be because I did not spend enough time looking:
Google is usually very good at putting things I want near the top of the list, if
they exist).

It's possible that Prof Andy Hopper (Cambridge) already has the information, since he
has done a lot of work, and given public presentations, on energy costs of
networking, but my (short) search did not reveal anything by him on the costs of
html-inflated email.

If anyone is interested in doing a project like this, or knows of one that has
already been done, please let me know. [A.Sloman AT cs.bham.ac.uk]

If you send me a comment please specify if you would prefer it not to be included in
an appendix to this web site, which I shall create later.

Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham