Security & XML

After a seminar on threat modelling today, I started thinking some more about how XML web services can be secured. The obviously hard part is that things like SOAP typically run over port 80, with multiple applications on the same webserver. Sure, SOAP clients and servers can easily be configured to run on different ports, with basic authentication, SSL, or the other standard techniques for securing web sessions. The more important issue, though, is that SOAP really falls into the same category as RPC, CORBA, RMI, and so on.

While working on the 'sparkle' web-based email client, I improved security drastically by incorporating cross-site scripting (XSS) protection. Nick Cleaton did some testing against acmemail/sparkle by sending HTML messages (with JavaScript in them) to his account, then reading them through the webmail interface. Read his XSS rant at: Apache has information at:

There are many web-based tools out there (web-based forums, Movable Type, and so on) that allow you to enter freeform HTML. Hey, it's easy to code up in a night, looks good, and is easy to use, so what can go wrong? Well, the server code might be secure... but somebody could easily post an <a href> containing an onClick() event that fires open a new window. The new window could be pointed at Hotmail or some other "secured" site. Many users have logon cookies saved for such sites, so by posting an MT comment containing JavaScript, a malicious user could launch a site (perhaps online banking, or private email), grab the content of a page (like a credit card number or other private info) via DHTML, then post it to their own site. This could even be done via email, since clients like Outlook Express will happily execute the JavaScript.

So what many web sites are doing is blacklisting certain tags such as <body> and <script>, or attributes like onMouseOver and onClick. That's still not perfect, though. The best approach is a detailed whitelist of what is allowed, down to every single tag and each tag's attributes. For examples of this, see
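To make the whitelist idea concrete, here's a minimal sketch of a whitelist-based sanitizer in Python. The allowed tags, the attribute sets, and the URL-scheme check are all illustrative choices for the example, not a vetted policy:

```python
# Whitelist-based HTML sanitizer sketch. Anything not explicitly
# allowed (tags, attributes, URL schemes) is dropped; the text content
# of dropped tags still comes through, but HTML-escaped.
from html.parser import HTMLParser
from html import escape

# Hypothetical policy: which tags are allowed, and which attributes each may carry.
ALLOWED = {
    "b": set(), "i": set(), "p": set(), "br": set(),
    "a": {"href"},  # href values are further checked below
}

class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop unknown tags (<script>, <body>, ...) entirely
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag]:
                continue  # silently drops onClick, onMouseOver, etc.
            if name == "href" and not (value or "").lower().startswith(("http://", "https://")):
                continue  # blocks javascript: and other odd schemes
            kept.append(f' {name}="{escape(value or "")}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(html):
    s = Sanitizer()
    s.feed(html)
    s.close()
    return "".join(s.out)
```

For example, `sanitize('<a href="javascript:alert(1)" onclick="go()">x</a>')` comes back as `<a>x</a>`: the tag survives, the dangerous attribute and URL do not.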

Anyhow, that's my brief rundown on how "simple little innocent applications" can have wider-ranging effects than you'd normally think. Even if your program is closed-source, malicious users can still probe it to see what it is capable of. And as the simple vulnerabilities are blocked, the cracks/hacks around them just get more and more intelligent. Just as they say "there will always be a better idiot" about testing software for bugs and usability issues, there will always be somebody trying to break in. Again, it comes down to weighing the risk, the cost/benefit ratio. There is nothing new under the sun.

Ok, so how can XML requests and their XML responses be filtered? With SOAP, there are three key things required in a request:
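The original list here didn't survive, but for reference, a typical SOAP 1.1 request over HTTP looks something like this (the endpoint, method name, and namespaces are made up for the example):

```xml
POST /StockQuote HTTP/1.1
Host: example.com
Content-Type: text/xml; charset=utf-8
SOAPAction: "http://example.com/GetLastTradePrice"

<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetLastTradePrice xmlns="http://example.com/stock">
      <symbol>IBM</symbol>
    </GetLastTradePrice>
  </soap:Body>
</soap:Envelope>
```

Note that the SOAPAction header, the endpoint URL, and the method element inside the Body are all visible to anything sitting in the HTTP path, which is what makes filtering at a proxy plausible at all.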

So, a filtering system (or shall we say, XML filtering?) would need to, at a minimum, look at:

The problem is that this is such a new concept, and it carries a much higher administrative load: you would have to configure the filter for every single Common Gateway Interface (CGI) based application. More processing power is also required to do in-depth analysis of the proxied streams.

In conclusion, using only regular expressions (as this script does) is probably not the way to go. Instead, some sort of fast DOM parser would be optimal.
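Here's a sketch of what the parser-based approach might look like in Python, validating a SOAP request body against an allowlist of method names. The namespace constant is the real SOAP 1.1 envelope namespace; the allowed method name is a hypothetical stand-in:

```python
# Validate a SOAP request with a real XML parser instead of regexes.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ALLOWED_METHODS = {"GetLastTradePrice"}  # hypothetical per-application allowlist

def request_allowed(xml_bytes):
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False  # malformed XML: reject outright
    if root.tag != f"{{{SOAP_NS}}}Envelope":
        return False  # not a SOAP envelope
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) != 1:
        return False  # expect exactly one call per request
    # The first child of Body names the RPC method being invoked.
    method = body[0].tag.split("}")[-1]  # strip any XML namespace
    return method in ALLOWED_METHODS
```

A real filter would go further (checking SOAPAction, argument types, sizes), but even this much is something a regex can't do reliably against namespace prefixes and creative whitespace.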

I think I could do something like this... but I'm not sure how streaming HTTP/1.1 XML sessions would be handled. For example, Jabber keeps a long-running XML connection open. The final closing XML tag is what tells the server the client has disconnected. Until then, the client just keeps appending XML requests to the socket.
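One way a filter could cope with that is an incremental (pull) parser: feed it bytes as they arrive and act on each completed stanza, without ever waiting for the final closing tag. A minimal sketch in Python, with made-up Jabber-ish element names:

```python
# Handle a long-lived XML stream incrementally: each complete child of
# the root element is processed as soon as its end tag arrives.
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))

# Pretend these chunks arrive over a socket, one recv() at a time;
# note a stanza can be split across chunks.
chunks = [
    b"<stream>",
    b"<message to='nick'>hel",
    b"lo</message>",
    b"<message to='kate'>hi</message>",
    b"</stream>",
]

depth = 0
handled = []
for chunk in chunks:
    parser.feed(chunk)
    for event, elem in parser.read_events():
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # a complete stanza just inside the root
                handled.append((elem.tag, elem.get("to"), elem.text))
```

The depth counter is what lets the filter inspect (or reject) each stanza mid-stream, which is exactly the property a Jabber-style connection needs.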
