There has been a discussion about whether to untaint or not. A string becomes tainted in Ruby when it comes from an external source, for example. The standard Ruby method untaint marks it as untainted. Plugins such as SafeErb do not allow the programmer to output tainted strings (in Erb) in order to protect the application from XSS. This works fine in my experience (even in Rails 2) and won't need you to untaint manually too often.
There is a good comment in the mentioned discussion (and the author has written an interesting article about safe strings):
"What most tainting-based solutions seem to miss is that 'safeness' or 'taintedness' is not a property of a string but a relationship between the string and the contexts in which it is used."
Indeed, SafeErb escapes and untaints the strings according to one context only – HTML (SGML) – and that is the purpose of the plugin. But there are a lot of other contexts you will use strings in:
SQL
Thanks to clever helper methods, this is hardly a problem in most Rails applications. However, you have to follow the rules and remember the problem whenever you use tainted strings in plain SQL or uncommon places. Use sanitize_sql() to escape in this context.
SGML (HTML, XML, RSS…)
CSS
Beware of CSS Injection if you render style sheets using tainted strings. This was a hole the Samy worm exploited.
Update: Textile
Textile (by means of RedCloth) should be used only together with a white-list sanitizer like Rails' sanitize(). In addition to my previous post, it is possible to inject scripts in image titles like this: !bunny.gif(Bunny" onclick="alert(1))!
Will become:<p><img .src="bunny.gif" title="Bunny" onclick="alert(1)" alt="Bunny" onclick="alert(1)" /></p>
JavaScript
Do escape strings in a JS context as well if you render code using tainted strings.
JSON
Rails
Do you use the params hash in Rails methods that accept hashes? How about redirect_to(params) or similar? Imagine what will happen when the hash contains a :host key! This is logic injection and can't be sanitized.
Log files
Someone could fill up your server's disk or manipulate your log files using log file injection. Remember to sanitize, filter (see filter_parameter_logging()) and truncate here.
Second level
Beware of command line injection.
All these different contexts raise the question when to sanitize a string, before or after storing it (if it is stored). Of course for some contexts (SQL, shell), it certainly has to be sanitized before being stored/processed. But what about the most important context in web applications, SGML? Will I save a sanitized string or the raw input? I prefer to sanitize (or escape actually) strings according to a context when using it in that context, because in a large application you will never know in which contexts you will use the input data in the future.
For performance reasons I store one untouched and one escaped (and even textiled and white-listed) version of the input. The raw version will be used if the user wants to edit the data, and the untainted version will be displayed.
There have come up quite a lot of plugins recently trying to automatically escape the user input data.
- Sanitize_params walks the params hash and sanitizes the strings using the white_list plugin.
- XSS_terminate votes for retrospective sanitization by automatically escaping all the fields in a model when loaded.
- CrossSiteSniper seems to do pretty much the same.
What I don't like about these solutions is that it might be seen as plug-n-play security to install and forget, because it does it all automatically. In my opinion security is a process and you have to reconcile for each and every string what it represents and how to escape it in this context. Of course an automatic escaper can be a great help, but only if you remember to sanitize it differently in an other context.