This post is a follow-up to my previous post.
Sending out files is a tricky thing. Rails’ send_file() method can send a file inline or as an attachment. But in order to send a file inline, you have to send the correct content type (the :type parameter) as well. If you call send_file with :disposition => 'inline' AND :type is set to a correct, matching content type, the file will be displayed inside the browser if the browser understands the file format. If :type is not set correctly, or is left at the default (application/octet-stream), Firefox will present the download box, but IE6 and IE7 will still display the file in the browser. If the content type, file extension and file signature do not match, IE6 and IE7 will start MIME sniffing (see the previous post for details). So if the first 256 bytes of the file to be downloaded contain any HTML, these browsers will render it and run any JavaScript it includes.
If you call send_file with :disposition => 'attachment', the browser will display a download box even if it could display this type itself (such as HTML).
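The two dispositions can be combined into a simple rule: send inline only for content types you have verified and want displayed, and fall back to attachment for everything else. Here is a minimal sketch of that rule. The helper disposition_for and the list INLINE_TYPES are hypothetical names chosen for illustration and are not part of Rails:

```ruby
# Hypothetical helper, not part of Rails: pick a disposition based on a
# list of content types we are willing to let the browser display.
INLINE_TYPES = %w[image/png image/jpeg image/gif application/pdf].freeze

def disposition_for(content_type)
  INLINE_TYPES.include?(content_type.to_s.downcase) ? 'inline' : 'attachment'
end

# In a controller action this could be used roughly like this, assuming an
# `attachment` record that responds to `full_filename` and `content_type`:
#
#   send_file attachment.full_filename,
#             :type        => attachment.content_type,
#             :disposition => disposition_for(attachment.content_type)
```

Anything not on the list, including a missing content type, ends up as a download box rather than being rendered.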
To sum it up: if someone uploads a poisoned image with the file extension “.html”, “.pdf” or whatever, and you send out the file with send_file :disposition => 'inline', IE6 and IE7 will run the included JavaScript code. This is because most upload plugins (such as the popular attachment_fu) assume the content type to be “text/html” or “application/pdf” in this case. More precisely, the browser already assumes this content type when uploading the file, and the plugin doesn’t verify it.
The conclusion is that (1) setting the correct content type when sending files with send_file :disposition => 'inline' is crucial, and (2) you very likely want to reject images or other browser-displayable files that have malicious code in them.
1. Setting the correct content type
I use attachment_fu, so I added a before_create :set_content_type_by_content callback to my model. If you allow users to upload another file when editing, you can do the same in before_update. This method uses the shared-mime-info package and its Ruby wrapper of the same name. The package is basically an XML database of known MIME types, their typical file extensions and some magic numbers for determining a file’s real MIME type. Magic numbers are constants found in binary files that identify the file format.
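To illustrate what a magic number check does, here is a small sketch in plain Ruby that compares the first bytes of some data against a handful of well-known signatures. The table MAGIC_NUMBERS and the method mime_type_by_magic are hypothetical names for this example. This is not how the shared-mime-info gem is implemented, and its database is far larger:

```ruby
# Illustrative signatures for a few formats only. The table and the method
# name are assumptions made for this example.
MAGIC_NUMBERS = {
  'image/png'       => "\x89PNG\r\n\x1A\n".b,
  'image/jpeg'      => "\xFF\xD8\xFF".b,
  'image/gif'       => 'GIF8'.b,
  'application/pdf' => '%PDF-'.b
}.freeze

# Returns the MIME type whose signature matches the start of `data`,
# or nil if none of the known signatures match.
def mime_type_by_magic(data)
  head = data.b # compare raw bytes, independent of string encoding
  found = MAGIC_NUMBERS.find { |_type, magic| head.start_with?(magic) }
  found && found.first
end
```

With a file on disk, you could pass in just the first few bytes, for example mime_type_by_magic(File.binread(path, 8)). The file name and extension play no part in the decision.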
You can install the package on OS X using MacPorts:
sudo port install shared-mime-info
and follow the OS X specific instructions here. Use your favourite installer for other systems.
Install the gem like this:
sudo gem install shared-mime-info
Now you can require 'shared-mime-info' and get the MIME type for a file with MIME.check(filename). The check() method actually falls back to the file extension to determine the MIME type in cases where MIME.check_magics(filename) doesn’t work. I overrode this method so that it no longer does that, because trusting the file extension is exactly what we want to avoid.
Here’s the code to override the behaviour. You can put it in lib/core_extensions.rb and require the file in an initializer (under config/initializers). And here is the before_create callback for the model with attachment_fu:
def set_content_type_by_content
  mime = MIME.check_magics(self.temp_path) # try magic numbers first
  mime = MIME.check(self.temp_path) if mime.nil? # do other checks if that failed
  self.content_type = mime.to_s
  # append the typical extension unless the file name already matches the type
  self.filename += mime.typical_file_extension unless mime.match_filename?(self.filename)
end
At the end, the callback appends the typical file extension for the detected MIME type to the file name if the current extension doesn’t match that type.
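The extension step can be sketched on its own, without the gem. The mapping EXTENSIONS and the method filename_matching_type below are hypothetical stand-ins for what the MIME database provides. The first extension in each list is treated as the typical one:

```ruby
# Hypothetical mapping from MIME type to accepted extensions. The first
# entry of each list is treated as the typical extension.
EXTENSIONS = {
  'image/png'  => %w[.png],
  'image/jpeg' => %w[.jpg .jpeg],
  'image/gif'  => %w[.gif]
}.freeze

# Appends the typical extension when the current one does not fit the type.
def filename_matching_type(filename, mime_type)
  extensions = EXTENSIONS[mime_type]
  return filename if extensions.nil? # unknown type: leave the name alone
  return filename if extensions.include?(File.extname(filename).downcase)
  filename + extensions.first
end
```

A PNG uploaded as “evil.html” would be stored as “evil.html.png”, so the content type, the extension and the content all agree.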
2. Reject malicious files
So far so good: now there won’t be any MIME type sniffing in the browser, because the MIME type, the filename and the content match. Nonetheless, you may want to reject images and other files that contain JavaScript in the first place. To check for that, I suggested a regular expression in my previous post. As I said, it was meant as a starting point, because it is not very good. There were suggestions to use a whitelist, but unfortunately binary files contain all kinds of <> characters, so a whitelist filter is not feasible. We need a blacklist filter that recognizes all kinds of injection tricks.
Here is the solution I came up with. The magic happens in the regular expression in the MimeTypeInjection module. The regex is quite complicated, because it has to recognize tags like <sc\0ript>, too (yes, these quirks work in some browsers). The browser_displays_this_type? method determines which content types you want to check for HTML. Of course the “text/html” content type contains HTML, so you have to adjust this method according to your requirements (what types of files users are allowed to upload). Also, this list might not be complete; there might be other content types the browser displays inline (post a comment if you know of other types).
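To give a rough idea of what such a blacklist check looks like, here is a deliberately simplified sketch. It is not the regular expression from the MimeTypeInjection module and it catches far fewer tricks. The tag list SUSPICIOUS_TAGS and the method contains_suspicious_markup? are assumptions made for this example:

```ruby
# A simplified blacklist check, NOT the regex from MimeTypeInjection.
# The tag list and names are assumptions made for this example.
SUSPICIOUS_TAGS = %w[script iframe object embed html body].freeze
SUSPICIOUS_PATTERN = /<(?:#{SUSPICIOUS_TAGS.join('|')})\b/i

# Returns true if `data` contains the start of one of the listed tags.
def contains_suspicious_markup?(data)
  # Work on raw bytes and drop NUL bytes, so "<sc\0ript>" becomes "<script>".
  cleaned = data.b.delete("\x00")
  cleaned.match?(SUSPICIOUS_PATTERN)
end
```

A before_create or validation callback could call this on the uploaded data and reject the file when it returns true. A real filter has to cover many more obfuscations than the NUL byte trick handled here.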