The goal of this post is to point out some of the gray areas in the web platform and how browser security is shaping up (some of you might already know parts of this). The post is intended to bring a security focus to web developers and help them follow the cutting-edge research in this area. As part of my M.S. by Research thesis, I have been lucky enough to survey and study the state-of-the-art research in this area.
Same-Origin-Policy:
To understand and define the Same-Origin-Policy (SOP) better, one first needs to understand the term “Origin”.
Origin: The combination “scheme://host:port” is called the “Origin” of a website. Scheme stands for the protocol, such as http or https. Host stands for the domain name, such as example.com. Port is the port number on which the server listens for that protocol. For http, the default port is 80, and a default port is normally omitted from the notation.
As per the above definition, http://example.com is called an origin. Note that http://example.com/profile/kris.php (dummy link) is called a URL whose origin is http://example.com.
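The definition above can be sketched in a few lines of Python. This is only an illustration, not how any browser implements it: the helper name origin_of and the DEFAULT_PORTS table are assumptions made for this example, and only the http and https schemes are handled.

```python
from urllib.parse import urlsplit

# Default ports that are left out of the origin notation (assumed table,
# limited to the two schemes mentioned in the post).
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> str:
    """Return the scheme://host[:port] origin of a URL, dropping default ports."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"


print(origin_of("http://example.com/profile/kris.php"))
print(origin_of("http://example.com:8080/profile/kris.php"))
```

The path, query string, and fragment play no part in the result; only the scheme, host, and port do.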
Same and cross origin interactions
Same-Origin-Policy controls the interactions that can happen between origins. Read, write, and execute interactions on “resources” are permitted within an origin, whereas they are restricted across origins. The table below shows which origins count as same-origin and which as cross-origin.
Note that the sub-domains mail.example.com and chat.example.com are treated as different origins even though they share the same parent domain. This restriction can be relaxed by a feature called “domain relaxation”. By setting the document.domain property to “example.com”, both sub-domains can reduce the origin check to “example.com” instead. There are a lot of intricacies, of course, which I do not want to cover here.
Note: Michal Zalewski’s book “The Tangled Web” gives the best detailed explanation of the intricacies of origins and browser security. My understanding of web security has multiplied after reading this book, and the core concepts in this post are compiled from some of its chapters. I suggest reading the book to deepen your understanding.
So interactions between sites whose origins do not match are known as cross-origin interactions, while interactions between sites whose origins match are called same-origin interactions. These interactions vary across the different resources of a site. Here, the term “resources” refers to the DOM (Document Object Model), cookies, XMLHttpRequest (AJAX), HTML5 local storage, and so on. In general, SOP can be stated as follows: within an origin, all scripts have equal and complete access to the DOM, storage, and network, whereas across origins they do not. In practice, the implementation of Same-Origin-Policy differs between browsers and depends on the resource in question.
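To make the sub-domain case concrete, here is a rough Python model of the origin check with and without domain relaxation. It is a simplified sketch under stated assumptions: the Page class, same_origin, and can_relax_to are names invented for this example, the model assumes both pages must opt in with the same document.domain value, and it ignores the many intricacies mentioned above (for instance, it does not reject overly broad values such as “com”).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass
class Page:
    url: str
    # Value the page assigned to document.domain, or None if it never did.
    document_domain: Optional[str] = None


def origin_tuple(url: str):
    """Split a URL into (scheme, host, port), filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def can_relax_to(host: str, domain: str) -> bool:
    """A page may only relax to its own host or to a parent domain of it."""
    return host == domain or host.endswith("." + domain)


def same_origin(a: Page, b: Page) -> bool:
    scheme_a, host_a, port_a = origin_tuple(a.url)
    scheme_b, host_b, port_b = origin_tuple(b.url)
    if a.document_domain is None and b.document_domain is None:
        # Plain check: scheme, host, and port must all match.
        return (scheme_a, host_a, port_a) == (scheme_b, host_b, port_b)
    if a.document_domain is None or b.document_domain is None:
        # Assumption of this model: relaxation only works if both pages opt in.
        return False
    valid = can_relax_to(host_a, a.document_domain) and can_relax_to(host_b, b.document_domain)
    return valid and scheme_a == scheme_b and a.document_domain == b.document_domain


mail = Page("http://mail.example.com/inbox")
chat = Page("http://chat.example.com/room")
print(same_origin(mail, chat))

mail.document_domain = "example.com"
chat.document_domain = "example.com"
print(same_origin(mail, chat))
```

The first check fails because the hosts differ; the second succeeds because, in this model, both pages have reduced the comparison to “example.com”.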
Now that you understand what same-origin and cross-origin interactions are, try to answer the questions below yourself. To set the context, let A.com and B.com be websites from two different origins. Let A and B be the content of the sites A.com and B.com respectively, as rendered by browsers. Now,
- Can A get resources (images/css/scripts) from B.com?
- Can A execute resources (scripts) from B.com?
- Can A post content to B.com?
- Can A interfere with the DOM of B?
- Can A redirect a browsing context (iframe, window etc) of B?
- Can A read storage (cookies, localStorage etc) of B?
- Can chat.A.com communicate (exchange data) with A.com?
- Can A.com/user1 read/fetch content of A.com/user2?
Though these questions look trivial, there are a lot of concepts behind them. I have written several blog posts related to the above questions this year (2012). Check these: JSONP and Cross-Origin AJAX, AJAX vs. HTML5 CORS, HTML5 Sandbox, The need for HTML5 PostMessage, Frame Navigation Policies.
Why is the web “Uncontrollable”?
Though Same-Origin-Policy enforces certain restrictions on the way scripts interact, it has several bypasses and is not sufficient to meet complex security requirements (mashups are the best example). Below are some of the cases which are beyond the control of SOP.
Moreover, when it comes to scripts, there is a kind of recursion involved: a script can in turn create another script tag in the DOM and load a remote script. So even if a script “X” is trustworthy, it does not follow that the other scripts “Y” and “Z” loaded by “X” are trustworthy. Trust is not transitive and cannot be verified in this model.
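The recursion described above can be pictured as a small graph walk. The sketch below is purely illustrative: the LOADS table is made-up data, the script “W” is a hypothetical addition to show a second level of loading, and the function name scripts_running_in_origin is an assumption for this example.

```python
from collections import deque

# Hypothetical inclusion graph: which script loads which other scripts.
LOADS = {
    "page": ["X"],     # the page includes X directly
    "X": ["Y", "Z"],   # X injects script tags for Y and Z
    "Y": [],
    "Z": ["W"],        # Z in turn loads W (hypothetical extra level)
    "W": [],
}


def scripts_running_in_origin(entry: str, loads: dict) -> list:
    """Return every script that ends up executing, in breadth-first order."""
    seen = []
    queue = deque(loads.get(entry, []))
    while queue:
        script = queue.popleft()
        if script in seen:
            continue
        seen.append(script)
        queue.extend(loads.get(script, []))
    return seen


vetted = {"X"}  # the only script the site owner actually reviewed
running = scripts_running_in_origin("page", LOADS)
unvetted = [s for s in running if s not in vetted]
print(running)
print(unvetted)
```

Every script in the result runs with the full privileges of the page's origin, although only “X” was ever vetted.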
Cross-Site Scripting (XSS):
As most of us know, XSS is a technique in which attackers exploit flaws in web applications to inject malicious code. Typically, code injection happens because inputs are sanitized weakly or not at all, either before they are stored in the database or before they are rendered into the DOM. Based on the way it is triggered, XSS is classified into Stored, Reflected, and DOM XSS. OWASP.org has detailed information on XSS along with a few detection and prevention techniques.
The point here is that once a script is injected, it has the same complete privileges as the genuine scripts in that origin. So DOM access, network access, and storage access are all compromised. SOP has nothing to say about XSS, which is a major problem.
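As a small illustration of the sanitization point, the sketch below renders the same user input with and without escaping, using Python's standard html.escape. The function names and the payload are hypothetical, and escaping of this kind only covers text placed in an HTML body; other contexts (attributes, URLs, inline scripts) need their own handling.

```python
import html


def render_comment_unsafe(comment: str) -> str:
    """Insert user input into markup as-is (vulnerable to injection)."""
    return f"<p>{comment}</p>"


def render_comment_escaped(comment: str) -> str:
    """Escape HTML metacharacters before inserting user input into markup."""
    return f"<p>{html.escape(comment)}</p>"


# Hypothetical attacker input; steal() is a made-up function name.
payload = "<script>steal(document.cookie)</script>"

print(render_comment_unsafe(payload))
print(render_comment_escaped(payload))
```

In the first output the script tag survives intact and would run with the page's privileges; in the second it is rendered as inert text.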
Cross-Site Request Forgery (CSRF):
In this well-known attack, the attacker masquerades as a genuine user by causing the user's browser to submit HTTP requests to the server. The server assumes each request is from the genuine user and executes it. A simple image tag can be used to send cross-origin GET requests, thereby forging a request on the user's behalf. (Check OWASP.org for more info.) SOP places no restrictions on outgoing requests and has no mechanism to tell genuine requests from malicious ones, and hence fails to prevent CSRF.
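The following toy model shows why the server cannot tell the two requests apart. It is a deliberately simplified sketch, not real browser or server code: the hosts bank.example and evil.example, the session value, and the functions browser_get and server_handle are all hypothetical, and the model assumes the browser attaches the destination site's cookies to every request regardless of which page triggered it.

```python
from urllib.parse import urlsplit

# Cookies the victim's browser holds, keyed by destination host (made-up data).
COOKIE_JAR = {"bank.example": {"session": "abc123"}}

# Server-side session table (made-up data).
SESSIONS = {"abc123": "alice"}

executed = []  # actions the server carried out, as (user, url) pairs


def server_handle(url: str, cookies: dict) -> int:
    """Authenticate by session cookie only; the initiating page is never checked."""
    user = SESSIONS.get(cookies.get("session"))
    if user is None:
        return 401
    executed.append((user, url))
    return 200


def browser_get(initiating_page: str, url: str) -> int:
    """Send a GET request, attaching cookies for the destination host.

    initiating_page is deliberately unused: in this model it has no
    influence on which cookies are sent.
    """
    host = urlsplit(url).hostname
    cookies = COOKIE_JAR.get(host, {})
    return server_handle(url, cookies)


# The user clicks a link on the bank's own page.
genuine = browser_get("http://bank.example/account",
                      "http://bank.example/transfer?to=bob&amount=10")

# An image tag on the attacker's page triggers a request to the bank.
forged = browser_get("http://evil.example/kittens",
                     "http://bank.example/transfer?to=mallory&amount=1000")

print(genuine, forged, len(executed))
```

Both requests arrive with the same session cookie, so the server executes both as the same user.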
Data can be sent out to other origins using several techniques. Apart from image, script, and link tags, which can make cross-origin GET requests, HTML forms can make cross-origin POSTs, so sensitive data can be exported to malicious destinations. There are several other hacks which can export data (out of scope for this post); the main problem is the lack of any check on outgoing data. SOP has no mechanism to detect data exfiltration, which is also a major concern.
In principle, data can be pulled in and sent out via several channels without any security checks by browsers (authentication and authorization are part of the application logic at the server, not the client). Due to these factors, the current web is uncontrollable. To fix these problems, security researchers are focusing on designing stricter browser security policies which address the limitations of SOP. The main challenge is that the web has already grown into a huge tree, and any drastic change will break it completely. Backward compatibility with older browsers, older languages, and developers who are not ready to migrate is part of this grand challenge. However, the work of several smart researchers is beginning to pay off, with smarter and stricter policies coming into the picture. I shall discuss them in my upcoming posts. Stay tuned.