The need for HTML5 postMessage API

The postMessage API in HTML5 specification is useful for making cross domain calls across frames. This is typically useful for mashups, Web 2.0 sites (e.g., pageflakes.com) where different widgets might need to communicate with each other.

HTML5 postMessage Demo

Few developers have already started using HTML5 postMessage in their projects, without knowing why they are using. Here are a couple of questions an inquisitive developer might have in mind:

1. How are mashups and rich Web 2.0 applications built even before HTML5 postMessage API came into existence?

2. What is the trust model which Web 2.0 sites have? (Who trusts whom?)

3. Is there really a need for a new API when workarounds met the needs?

This post tries to answer these questions and explains why postMessage API is important. Though the usage of API looks trivial, the birth of this API is the outcome of several insightful research papers, which are also a motivation for this post.

In the screenshot, the web pages loaded in the top window and iframe are from different domains. On clicking the submit button, the message in textbox is sent to iframe and displayed in the last line. Notice that the top window url (http://localhost/postMessage) has a default port number(80), while the iframe has a different port number (81). Hence the site in iframe is treated as that from a cross domain.

HTML5 postMessage API is as simple as the below JavaScript code.

<script language="javascript">
        window.onload = function () {
            win = document.getElementById("ifrDomain2").contentWindow; //get the target iframe window
            frm = document.getElementById("frmPost");  //get the form which needs to post a message
 
            frm.onsubmit = function (e) {
                msg = document.getElementById("txtMessage").value;  //get the message to be posted from the textbox
                win.postMessage(msg, "http://localhost:81/");  //post the message to the destination URL
                e.preventDefault();  //prevent default action (suppress postback)
            };
        };
</script>

Now let us see why such an API is needed in the first place and try to answer the above questions.

Same Origin Policy and Trust Model:

As most of you know, the Same Origin Policy (SOP) of browsers disallows scripts loaded from one origin to access DOM of another origin (Two sites do not belong to same origin if they differ in at least one of the three- protocol, domain name or port number). Due to this, an AJAX call cannot be made from one domain to another domain from a browser. So far so good, since if this policy is not in place, an attacker can make an AJAX call to your site and grab your cookie.

However, the SOP is not applicable to scripts themselves! Developers can always embed script tags which point to different domains (just as we include reference to jQuery or any JS library from CDNs). If there are scripts from multiple sources, the application is not secure. But this is how mashups and most of our web applications are built! Isn’t this ironical? Moreover, the scripts which are loaded from different domains run under the privilege of the host site. So whether it is external script file or JSONP script injection, the developer should have ‘complete trust’ on the scripts being injected.

As Douglas Crockford rightly points, “A mashup is a self-inflicted XSS attack”. It is more of a work around than a standard.

Instead of loading external scripts in an integrated site, an alternative is to use iframes to load external sites. Since iframes provide complete isolation mechanism, aggregating content is secure, but genuine communication between frames goes for a toss.

Most of the modern Web 2.0 sites rely on external JS libraries, AJAX and JSONP techniques for fetching, manipulating content. In this case, communication between widgets (divs) is not a problem since entire DOM is accessible to any script (bad design w.r.t security as discussed above). Sites using iframe for isolating widgets rely on “fragment identifiers” (e.g., yoursite.com#message) for communicating between widgets (has confidentiality but no authentication and integrity). These (flawed?) solutions answer our 1st question.

So the trust model we have is, you as a developer/site owner should either trust all (in case of scripts) or trust none (in case of iframes), but nothing in between. This answers our 2nd question. You may trust the JavaScript provided by Google analytics, maps, facebook widgets etc., but this dependency on ‘trust’ does not scale well.

Browser vs. OS:

The modern web has seen data intensive, rich and interactive web applications, which mimic desktop applications. Mashups, which are applications that combine data from multiple data sources, have changed the boundaries of a web browser. Concepts like Web OS started evolving which guided researchers to draw a parallel between browsers and OS.

  • The “system calls” in OS are analogous to “DOM calls” in browsers
  • “Processes” in OS are analogous to “Frames” in browsers
  • “Disk storage” in OS is analogoius to “cookies, localStorage, IndexedDB etc.” in browsers
  • In an OS, “Users” are the principals (which need to be distinguished), whereas in a browser, “Origins” are the principals.

Browsers, which were designed to handle pages from a single domain at a time are now forced to handle pages/data from multiple domains. In other words, as researchers say, web browsers have evolved from a single-principal platform to multi-principal platform. However, unlike OS which can easily handle multi-user scenarios, web browsers prior to HTML5 postMessage did not have the capability to abstract multi-principal scenarios. Their trust model remained the same as discussed above.

Hence, there is a need for a newer standard supported by browsers, which can securely abstract multiple principals and provide communication between them, thereby improving the trust model (answers 3rd question). There were several recommendations like JSONRequest, Verifiable Origin Policy, CommRequest etc.,as described in the references, for solving these problems and finally, the HTML5 postMessage API came into existence.

//Syntax of HTML5 postMessage
otherwindow.postMessage(message, targetOrigin); //Clearly, the "targetOrigin" parameter improves trust!

The postMessage channel, which is designed for cross site communication, guarantees confidentiality, integrity and authentication and improves trust (A frame can now communicate with a trusted frame by specifying the target). With this standard in place, frames can now be attractive feature to integrate 3rd party content, create widgets with improved trust. It is supported by majority of modern browsers (IE8+, FF3+, Chrome, Safari, Opera 10+).

Hope the article helped in understanding why HTML5 postMessage is needed and possibly pointed out the mistake you are doing by not using it for your requirements. Let us build a more secure and standard compliant web, one website at a time Smile

References:

1. “Securing Frame Communication in Browsers” – by Stanford web security lab

2. “Protection and Communication Abstractions for Web Browsers in MashupOS” – by Microsoft Research & Stanford web sec lab.