Adventures in Cross-Site Scripting


Don’t cross the streams!

One big part of making our search relevancy tool Quepid simple to use is the ability to just paste in a Solr search URL and go. Thankfully, Solr executes searches entirely with HTTP GET requests. Unfortunately, pasting in an arbitrary URL and executing HTTP requests against that arbitrary third party violates the browser’s same-origin policy. The browser wants to keep you sandboxed to the domain you started at (ie

The most seamless way to get around this is to talk to Solr through a method known as JSONP. JSONP is certainly somewhat of a hack, but it’s a fairly well-accepted hack. JSONP leverages the fact that your browser can load resources from any domain. A script tag is a resource, so we can dynamically insert a script tag like so:

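// create the script element that will carry the JSONP request
var jsonpReq = document.createElement('script');
jsonpReq.src = '' +
               '/solr/collection1/select?' +
               'wt=json&json.wrf=loadResults&q=searchquery';
jsonpReq.type = "text/javascript";
// appending the tag makes the browser fetch and execute Solr's response
document.body.appendChild(jsonpReq);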

(see a full example in this jsfiddle)

Solr takes a json.wrf argument (here “loadResults”) which names the JavaScript function that the search results should be passed to. Solr returns executable JavaScript that looks like:

loadResults(/*search results as JavaScript object*/);

So when this dynamic script tag is loaded, a global callback “loadResults” will be executed with our search results as an argument.
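Hand-concatenating the query string gets fragile once the search query contains spaces or ampersands. Here is a minimal sketch of a URL builder for the request above. The helper name buildJsonpUrl and the host in the usage line are my own placeholders, not anything Quepid or Solr ships.

```javascript
// Hypothetical helper: builds the JSONP URL for a Solr select request.
// solrBaseUrl is an assumed placeholder such as 'http://localhost:8983'.
function buildJsonpUrl(solrBaseUrl, query, callbackName) {
  var params = new URLSearchParams({
    wt: 'json',                // ask Solr for JSON output
    'json.wrf': callbackName,  // wrap the JSON in a call to this function
    q: query                   // URL-encoded for us by URLSearchParams
  });
  return solrBaseUrl + '/solr/collection1/select?' + params.toString();
}
```

Calling buildJsonpUrl('http://localhost:8983', 'searchquery', 'loadResults') produces the same parameters as the snippet above, and the result can be assigned straight to the script tag’s src.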

We’ve used JSONP quite a bit in pure JavaScript search applications. It lets us get rid of a middle layer of server-side glue code and focus on a rich, beautiful, client-side application. We love it!

The problem with JSONP

Unfortunately, JSONP has a pretty glaring hole. If your request fails to load, you get the same information you’d get if any script tag failed to load: extremely basic error information. You don’t get an HTTP status code, and you don’t get the data sent back with that error. That hurts us because Solr reports errors via an HTTP error status, with the response body describing the exact error.
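To make that hole concrete, here is a rough sketch of a JSONP loader with an error handler. The function name loadJsonp and its arguments are hypothetical; the document and global object are passed in only so the sketch can be exercised outside a browser. Notice that the error path has nothing to report beyond “it failed”.

```javascript
// Hypothetical JSONP loader. In a browser, pass `document` and `window`.
function loadJsonp(doc, globalObj, url, callbackName, onSuccess, onError) {
  var script = doc.createElement('script');

  // Solr's response calls this global function with the search results.
  globalObj[callbackName] = function (data) {
    delete globalObj[callbackName];
    onSuccess(data);
  };

  // A failed script load fires a bare error event: there is no HTTP
  // status code and no response body to inspect here.
  script.onerror = function () {
    delete globalObj[callbackName];
    onError(new Error('JSONP request failed: ' + url));
  };

  script.src = url;
  script.type = 'text/javascript';
  doc.body.appendChild(script);
  return script;
}
```

In a browser this would be called as loadJsonp(document, window, url, 'loadResults', showResults, showError), where showResults and showError are whatever handlers your application defines. The best showError can do is say that something went wrong.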

For example, screwing up the echoParams parameter in this search request returns HTTP 400 with the following error message:

Solr Query:
Response:
    Invalid value 'foo' for echoParams parameter, use 'EXPLICIT' or 'ALL'
    400

While this turns out not to be a big deal for a user-facing application, it stinks for a search developer workbench like Quepid. An advanced developer using Quepid needs more information than “an error has occurred” with their search. Perhaps in the search developer’s experimentation with the Solr relevancy parameters they mistyped a parameter, and Solr simply couldn’t parse what was sent. As a search developer, getting these errors is a pretty big deal. Missing them is tantamount to a code compiler that gives you no error messages. Not very helpful :).

So knowing that JSONP stinks in this regard, how can we extract the Solr errors even if they come back with an HTTP error?


CORS (Cross-Origin Resource Sharing) is a more standardized way of doing cross-site requests within the HTTP protocol. Perhaps this would be a better way to do cross-domain requests and extract both data and errors?

I briefly looked into this for Quepid, but it quickly became apparent that it wasn’t nearly as seamless as JSONP. CORS requires that the server white-list the domains it will accept cross-domain requests from. Our users would need to white-list Quepid’s domain in their Solr web server’s config. I can just imagine our users’ calls to IT: “Hey, there’s this Quepid thing we’d like to try and it requires this other CORS thing. Could you please reconfigure Solr’s web server and add Quepid’s domain to this list?” How long would that take to get approved? Some users may even be using a hosted Solr solution where changing the web config is impossible. Even scarier, I’ve seen in a number of places that doing CORS in Solr may require a bit of Java code to be inserted, making it even less seamless.
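For a sense of what that white-listing amounts to, here is a minimal sketch of the decision a CORS-aware server makes for each request. This is illustrative logic only, not Solr’s or any particular web server’s configuration; the function name corsHeadersFor and the example origins are made up.

```javascript
// Hypothetical sketch of a server-side CORS allow-list check.
// requestOrigin is the value of the request's Origin header.
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (allowedOrigins.indexOf(requestOrigin) === -1) {
    // No CORS headers: the browser refuses to hand the response to the page.
    return {};
  }
  return {
    // Echo back the white-listed origin so the browser permits the read.
    'Access-Control-Allow-Origin': requestOrigin,
    // Tell caches that the response differs per Origin header.
    'Vary': 'Origin'
  };
}
```

So with an allow-list of ['https://quepid.example'], a request from that origin gets the Access-Control-Allow-Origin header back and any other origin gets nothing. Someone with access to the server has to put Quepid’s real domain on that list, which is exactly the friction I’m worried about.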

A big business goal behind Quepid is for it to be easy to try. Therefore, I don’t want to put an undue burden on users. The less friction in trying Quepid, the better. Still, perhaps I should keep this in mind. Perhaps initially trying out the product would simply involve using JSONP, and more advanced customers could white-list the domain to allow CORS to be used.

Iframes Can Do Cross-Site Requests Too!

One realization I had when playing around with cross-site scripting was that I can also insert iframes into a page using the same method (yes, I know I’m crazy). Simply by doing

Quepid’s JavaScript can communicate with this iframe as follows:

var solrErrorWindow = document.getElementById("solr_errors").contentWindow;
solrErrorWindow.postMessage("", "");
var receiver = function(e) {
   // assumption: log the message payload (e.data)
   console.log(e.data);
};
window.addEventListener('message', receiver, false);
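One thing worth adding to a receiver like this is an origin check, since any window can post messages to yours. Here is a small sketch; makeReceiver and the origin in the usage note are hypothetical names of mine, and the only browser API it relies on is the origin and data properties of the message event.

```javascript
// Hypothetical wrapper: only pass along messages from the expected origin.
function makeReceiver(expectedOrigin, onMessage) {
  return function (e) {
    // e.origin is the origin of the window that sent the message.
    if (e.origin !== expectedOrigin) {
      return; // ignore messages from anywhere else
    }
    onMessage(e.data); // e.data is the payload passed to postMessage
  };
}
```

It would be wired up as window.addEventListener('message', makeReceiver('http://localhost:8983', handleSolrError), false), where the origin is your Solr server’s and handleSolrError is whatever your page does with the error text.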

And guess what, it works! Check out this jsfiddle for a demo.

The downside to this is that it requires users to add an XSLT file to the right place in their Solr config directory. A one-time inconvenience that users can opt in to for better error reporting. An inconvenience I’m imagining is likely easier to deploy than changes to the web server’s configuration.

The upside to this approach is that it plays by the rules. I’m not doing hacky things trying to insert JavaScript into the Solr response. I’m not circumventing any browser protections. And it works rather well.

A Problem That’ll Drive You Insane


Brendan Eich’s favorite graphic to describe the Web. Evolution makes something that’s often not very pretty, but it works!

Reflecting on this problem leaves me wondering. While I’m aware how easy it is to inject cross-site scripts with malicious intent, I’m left wishing the web did a better job here. This is a rough problem to have to solve from an implementation point of view. It feels like if a service returns just data, we ought to feel a little safer about how an application outside the domain consumes the content. For example, the browser could make decisions based on the MIME type coming back from the other end and relax the restrictions a bit. My naive understanding is that it’s cross-domain text/html we fear, not XML or JSON. Sure, a malicious user who can examine my code can inject just the right XML or JSON into a response to exploit my lack of sanity checking. But it’s not in the same ballpark as doing cross-site requests where HTML and executable JavaScript are involved.

Complaining aside, it was kind of a fun problem to work on. Getting cross-domain requests right certainly blurs the line between hacking and, well, “hacking”. I’d be curious whether you have any thoughts on solving this problem, or a better solution. If so, let me know. And of course, shameless plug: check out Quepid! It’s a neat tool if you care about managing your search quality.