Thursday, March 15, 2007

Comparing Tempfile APIs in Python and Ruby

Much to the detriment of my productivity at work, I've been bouncing back and forth between Python and Ruby. But it makes for some interesting API comparisons. So I'll continue the motif I started with looking at Ruby and XML-RPC APIs by comparing temp file APIs.

First, the security background. Insecure temp file usage is a notorious (but typically local) security flaw that can lead to arbitrary code execution, escalation of privilege, and denial of service, commonly as a result of race conditions. Search Packetstorm and you'll see a bunch of examples of what not to do.
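To make the race condition concrete, here is a minimal sketch (assuming a current Python 3 interpreter; the "report.<pid>" name is made up for illustration) of the exclusive-create idea that safe temp file APIs rely on. The dangerous pattern is checking for a name and then creating it in two steps; O_CREAT together with O_EXCL collapses that into a single step.

```python
import os
import tempfile

# Hypothetical, predictable name of the kind an attacker could guess in advance.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "report.%d" % os.getpid())

# O_CREAT | O_EXCL makes "check that it does not exist" and "create it" one
# atomic step, so nobody can slip a file or symlink in between the two.
fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
os.close(fd)

# A second exclusive create of the same name fails instead of silently
# reusing whatever is already there.
try:
    os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

# Clean up the scratch file and directory used by this sketch.
os.remove(path)
os.rmdir(workdir)
```

If the name already exists, the program finds out immediately rather than writing into a file somebody else planted.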

Ruby
So let's take a look at the tempfile module in the Ruby Standard Library:

new(basename, tmpdir=Dir::tmpdir)

Creates a temporary file of mode 0600 in the temporary directory whose name is basename.pid.n and opens with mode "w+". A Tempfile object works just like a File object.

If tmpdir is omitted, the temporary directory is determined by Dir::tmpdir provided by ‘tmpdir.rb’. When $SAFE > 0 and the given tmpdir is tainted, it uses /tmp. (Note that ENV values are tainted by default)


How would you use this?
franz-g4:~ mdfranz$ irb
irb(main):001:0> require 'tempfile'
=> true
irb(main):002:0> t = Tempfile.new("ruby")
=> #
irb(main):003:0> u = Tempfile.new("ruby")

At this point you have a regular old File object you can do whatever you want with. The permissions are limited to the process owner, and the filename is based on the process ID, which depending on the OS may be more or less predictable. But the good news is that the file is deleted automatically after the script finishes execution.
franz-g4:/tmp mdfranz$ ls -al ruby*
-rw------- 1 mdfranz wheel 0 Mar 15 20:36 ruby655.0
-rw------- 1 mdfranz wheel 0 Mar 15 20:37 ruby655.1

Python
With Python we get (ostensibly) more secure methods as part of the tempfile module:

mkstemp([suffix[, prefix[, dir[, text]]]])
Creates a temporary file in the most secure manner possible. There are no race conditions in the file's creation, assuming that the platform properly implements the O_EXCL flag for os.open(). The file is readable and writable only by the creating user ID. If the platform uses permission bits to indicate whether a file is executable, the file is executable by no one. The file descriptor is not inherited by child processes.

Unlike TemporaryFile(), the user of mkstemp() is responsible for deleting the temporary file when done with it.

If suffix is specified, the file name will end with that suffix, otherwise there will be no suffix. mkstemp() does not put a dot between the file name and the suffix; if you need one, put it at the beginning of suffix.

If prefix is specified, the file name will begin with that prefix; otherwise, a default prefix is used.

If dir is specified, the file will be created in that directory; otherwise, a default directory is used.

If text is specified, it indicates whether to open the file in binary mode (the default) or text mode. On some platforms, this makes no difference.

mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order. New in version 2.3.
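A small sketch of those parameters in action, assuming Python 3 on a POSIX system (the "demo-" prefix and ".log" suffix are just illustrative values). It checks the permission bits on the descriptor that mkstemp() hands back and then does the cleanup the docs say is the caller's job.

```python
import os
import stat
import tempfile

# prefix and suffix are illustrative; note the dot is part of the suffix.
fd, path = tempfile.mkstemp(suffix=".log", prefix="demo-")

name = os.path.basename(path)
mode = stat.S_IMODE(os.fstat(fd).st_mode)  # permission bits only

print(name)       # demo-<random characters>.log
print(oct(mode))  # on a POSIX system: owner read/write only

# mkstemp() leaves cleanup to the caller: close the descriptor, remove the file.
os.close(fd)
os.remove(path)
```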


And for directories:

mkdtemp([suffix[, prefix[, dir]]])
Creates a temporary directory in the most secure manner possible. There are no race conditions in the directory's creation. The directory is readable, writable, and searchable only by the creating user ID.

The user of mkdtemp() is responsible for deleting the temporary directory and its contents when done with it.

The prefix, suffix, and dir arguments are the same as for mkstemp().

mkdtemp() returns the absolute pathname of the new directory. New in version 2.3.

An example combining the two:


>>> import tempfile
>>> td = tempfile.mkdtemp()
>>> td
'/tmp/tmpaJ-M4J'
>>> tf = tempfile.mkstemp(dir=td)
>>> tf
(3, '/tmp/tmpaJ-M4J/tmpb7pxE4')


And even after the script executes, they are still around:

franz-macbook:/tmp mdfranz$ ls -al tmpaJ-M4J/
total 0
drwx------ 3 mdfranz wheel 102 Mar 17 07:08 .
drwxrwxrwt 4 root wheel 136 Mar 17 07:06 ..
-rw------- 1 mdfranz wheel 0 Mar 17 07:08 tmpb7pxE4
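Since nothing removes these for you, the cleanup has to be spelled out. One way to do it, sketched here for Python 3 with shutil.rmtree() from the standard library (the "demo-" prefix is an illustrative value), is to wrap the work in try/finally so the directory and everything in it goes away even if something fails along the way.

```python
import os
import shutil
import stat
import tempfile

td = tempfile.mkdtemp(prefix="demo-")          # prefix is illustrative
dir_mode = stat.S_IMODE(os.stat(td).st_mode)   # owner-only (0o700) on POSIX

try:
    # Create a temp file inside the private directory, as in the session above.
    fd, path = tempfile.mkstemp(dir=td)
    os.close(fd)
    inside = os.path.dirname(path) == td
finally:
    # Remove the directory and its contents in one go.
    shutil.rmtree(td)
```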

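So unlike Ruby, where you get a File/IO object back, mkstemp hands you an OS-level file descriptor plus the absolute pathname. To get a file object you either wrap the descriptor with os.fdopen or call open on the pathname; mkdtemp gives you only the pathname of the new directory.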
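For the record, getting from the mkstemp() tuple to an ordinary file object is short. This is a minimal sketch assuming Python 3; os.fdopen() is the standard library call for wrapping an already-open descriptor.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()

# os.fdopen() wraps the already-open descriptor in a regular file object,
# so the file never has to be re-opened by name.
with os.fdopen(fd, "w+") as f:
    f.write("hello from mkstemp\n")
    f.seek(0)
    contents = f.read()

# Closing the file object also closes fd; the file itself still has to be removed.
os.remove(path)
```

Wrapping the descriptor rather than re-opening the path keeps you on the file that was created exclusively in the first place.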

Conclusions
As usual, the Python API documentation is more complete and (probably more importantly) consistent and in a single place. Going back and forth between The Pragmatic Programmer, the Ruby Standard Library Documentation, and the Ruby Class and Library reference is very annoying. I certainly didn't audit the source for Python or Ruby to determine how difficult race conditions are to exploit, or whether they really are "impossible", but the use of a more pseudo-random value for the filename (and directory name) certainly raises the bar higher for Python. So does being able to combine the directory and file functions. The only real downside of the Python API is that you have to clean up the temporary files and directories from mkstemp and mkdtemp manually, likely a consequence of the Python API not being "truly and consistently OO" (a common critique).
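One footnote on cleanup: the mkstemp() docs quoted earlier mention TemporaryFile(), and the same module also has NamedTemporaryFile(), which behaves more like Ruby's Tempfile in that you get a file object back and the file is removed when it is closed. A rough sketch, assuming Python 3 (the "demo-" prefix is illustrative):

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(mode="w+", prefix="demo-") as tmp:
    tmp.write("scratch data\n")
    tmp.flush()
    path = tmp.name                           # the file has a real name on disk
    existed_while_open = os.path.exists(path)

# Leaving the with block closes the file, which also deletes it.
removed_after_close = not os.path.exists(path)
```

This does not help with directories, so the manual cleanup for mkdtemp() still applies.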