Thursday, February 04, 2010

A Maze of Twisty Fuzzers All Alike

Funny how a single innocent tweet can stir the pot. Not that I'm disappointed or that I mind, because the pot definitely needed to be stirred, but that certainly wasn't my intent on Monday. Really. But let's back up.

So I gave a short, not terribly technical presentation on Open Source fuzzing tools right before lunch at a conference on vulnerability discovery hosted by CERT at their office in Arlington. It was good to be back there, as I'd been to the SEI offices back in 2006 when I was working with them on the disclosure of some SCADA vulns.

Unfortunately, I didn't get to stick around the whole day (I missed Jared DeMott's presentation) and I was in and out during a conference call, but there were interesting talks by CERT, CERT-FI, Secunia, and Codenomicon.

But most interesting, and what led to my innocent tweet, was a talk by Microsoft on how they use fuzzing and what results they got from different tools and approaches.

The conclusion I found surprising was that they found "smart fuzzers" to have a lower ROI than "dumb fuzzers" and their whitebox fuzzing platform called SAGE. Their point was that the time it takes to define, model, and implement the protocol in a smart fuzzer is in most cases better spent having less skilled engineers run dumb fuzzers or whitebox tools.

They mentioned a talk at Blue Hat Security Briefings (I don't think this is the actual talk, but I don't have time to look for it) where they presented the bug results from a previously untested application that was tested with an internally written tool (the smart fuzzer), Peach (the dumb fuzzer?), and SAGE. They mentioned an interesting technique of taking "major hashes" and "minor hashes" of the stack traces to isolate unique bugs. This is interesting because the primary focus has been on reducing the number of unique test cases, but another approach is to look at the results. It may end up being more efficient. Of course this assumes the ability to instrument targets, which may not always be possible, for example with embedded systems.
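The talk didn't spell out exactly how the two hashes are computed, so here is only a rough sketch of the idea under my own assumptions: the "major" hash covers the top few frames of the stack trace and the "minor" hash covers the whole trace. The frame names, the depth of 3, and the helper names are all made up for illustration.

```python
import hashlib
from collections import defaultdict

MAJOR_DEPTH = 3  # assumption: how many top frames feed the major hash


def stack_hashes(frames, major_depth=MAJOR_DEPTH):
    """Return (major, minor) hex digests for a list of frame names."""
    top = "\n".join(frames[:major_depth]).encode("utf-8")
    full = "\n".join(frames).encode("utf-8")
    return hashlib.sha256(top).hexdigest(), hashlib.sha256(full).hexdigest()


def bucket_crashes(crashes):
    """Group crash ids so that buckets[major][minor] is a list of ids."""
    buckets = defaultdict(lambda: defaultdict(list))
    for crash_id, frames in crashes:
        major, minor = stack_hashes(frames)
        buckets[major][minor].append(crash_id)
    return buckets


# Hypothetical crash data: (test case id, stack frames, innermost first)
crashes = [
    ("case-001", ["memcpy", "parse_header", "read_packet", "main"]),
    ("case-002", ["memcpy", "parse_header", "read_packet", "worker", "main"]),
    ("case-003", ["free", "close_session", "main"]),
]

buckets = bucket_crashes(crashes)
for major, minors in buckets.items():
    total = sum(len(ids) for ids in minors.values())
    print(major[:8], len(minors), "minor variant(s),", total, "crash(es)")
```

Crashes that share a major hash are probably the same bug reached by different paths, so you triage per major bucket rather than per test case.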

So Dale picked up on this and tried to apply it to the world of SCADA:
"We have two security vendors that are trying to sell products to the control system market: Wurldtech with their Achilles platform and Mu Dynamics with their Mu Test Suite. [FD: Wurldtech is a past Digital Bond client and advertiser] One of the features of these products is they both send a large number of malformed packets at an interface -- typically crashing protocol stacks that have ignored negative testing."
Mu responded in the comments on the blog and Wurldtech (far more defensively) on their own blog:
"In fact, our CTO Dr. Kube even gave a presentation at Cansecwest almost 2 years ago called “Fuzzing WTF” which was our first attempt to re-educate the community. To bolster the impact, we invited our friends at Codenomicon to help as they also were frustrated with the community discourse. The presentation can be found here."
Well, I guess the "re-education" (which sounds vaguely Maoist; I guess some of us need to be sent to a Wurldtech Fuzzing Re-education program) hasn't exactly worked, although a satisfied Wurldtech customer did chime in on the Digital Bond blog. I actually agree that better descriptions of fuzzing tools' capabilities are needed, and that was the entire point of my talk. I did a survey of the features available in several dozen fuzzing tools and fuzzing frameworks that could be used to test.

I didn't spend as much time on the actual message generation as I should have and I was only focusing on Free and Open Source tools, but I identified a number of attributes for comparison such as target, execution mode, language, transport, template (generation, data model, built-in functions), fault payloads, debugging & instrumentation, and session handling. I'm not sure I completely hit my target, but one of my goals was to develop some criteria to help folks make better choices about which Open Source tools could be used to most efficiently conduct robustness testing of your target. One of my conclusions (which I was pleased to hear echoed in the Microsoft talk) is that no single tool is best, no single approach is adequate--and that there are different types of fuzzing users that will require different feature sets. A QA engineer (who may have little to no security expertise) requires different features from a pen-tester (or perhaps a security analyst as part of a compliance-based engagement), who in turn requires different features from a hard core security researcher.

And the same applies to commercial tools you are paying tens of thousands of dollars for. One size does not fit all, regardless of the marketing (or mathematical) claims of the vendor. It would definitely be good to see a bakeoff of the leading commercial and Open Source fuzzing/protocol robustness tools similar to what Jeff Mercer has been doing for webapp scanners, but I'm not optimistic that we will see that for the commercial tools because they are too expensive and the primary customers for these tools (large vendors) are not going to disclose enough details about the vulnerabilities discovered to provide a rich enough data set for comparison.

It won't be me, but perhaps some aspiring young hacker will take the time to do a thorough comparison of the coverage of the tools that are out there against a reference implementation -- instead of writing yet another incomplete, poorly documented Open Source fuzzer or fuzzing framework.

Wednesday, January 13, 2010

Hello MongoDB (Jython Style)

It has been ages since I've played around with any of the Java scripting languages so I thought I'd give Jython a spin with MongoDB. I have no idea how the performance of the pure Python driver compares with the Java driver, but it would be an interesting benchmark.

This is a very quick code snippet based on the MongoDB Java tutorial.

This was done on Ubuntu 9.10 with OpenJDK in the standard repositories and assumes the jython shell script is in your path. It also assumes the Java MongoDB driver is in your path and I was lazy so I didn't bother with CLASSPATH.

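#!/usr/bin/env jython
import sys

# Make the MongoDB Java driver jar importable (instead of setting CLASSPATH)
sys.path.append("mongo-1.2.jar")
from com.mongodb import *

print "Jython MongoDB Example"
m = Mongo("10.0.0.33")  # connect to the MongoDB server
db = m.getDB("grid_example")

# Print the name of every collection in the database
for c in db.getCollectionNames():
    print c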

And the output is just what you'd expect.

mfranz@karmic-t61:~/Documents/mongo$ ./jymongo.py
Jython MongoDB Example
fs.chunks
fs.files
system.indexes

Avoiding Bracket Hell in MongoDB Queries (Python Style)

To me it wasn't immediately obvious from the MongoDB Advanced Query documentation that you can string together multiple operators to perform existence, membership, and greater-than tests. And since JSON can get very messy (and long!) and the syntax is slightly different from the JavaScript in the documentation, instead of passing a JSON-style literal directly to the find method of your collection, build a dictionary and assign the various conditions to it one at a time.

For example:

myq = {}
myq["batchstamp"] = b # a timestamp
myq["modbus_tcp_reference_num"] = {"$exists": True}
cur = coll.find( myq )

Although it doesn't appear much easier than passing

{'modbus_tcp_reference_num': {'$exists': True}, 'batchstamp': 999999999}

once you start adding additional conditions (which may themselves contain dictionaries), it is much easier and less error prone. Trust me!
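To show where the dictionary approach starts to pay off, here is a small sketch that adds membership and range conditions only when they are needed. The build_query helper and the modbus_tcp_func_code and modbus_tcp_len field names are invented for illustration; $exists, $in, $gt and $lt are standard MongoDB query operators.

```python
def build_query(batchstamp, func_codes=None, min_len=None, max_len=None):
    """Assemble a MongoDB query dictionary one condition at a time."""
    myq = {"batchstamp": batchstamp}

    # Existence test, as in the example above
    myq["modbus_tcp_reference_num"] = {"$exists": True}

    # Membership test (hypothetical field name)
    if func_codes:
        myq["modbus_tcp_func_code"] = {"$in": list(func_codes)}

    # Range test: several operators can share one nested dictionary
    length = {}
    if min_len is not None:
        length["$gt"] = min_len
    if max_len is not None:
        length["$lt"] = max_len
    if length:
        myq["modbus_tcp_len"] = length  # hypothetical field name

    return myq


myq = build_query(999999999, func_codes=[3, 16], min_len=6, max_len=260)
print(myq)
# With PyMongo the result would be passed as: cur = coll.find(myq)
```

Each condition lives on its own line, so a missing bracket shows up as an ordinary Python syntax error near the mistake instead of somewhere inside one long literal.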


Sunday, January 10, 2010

PyMongo for Dummies (using Squid logs, again)

In my last blog I showed some examples from the MongoDB shell. Next, we'll go through the PyMongo API, since only crazy people code in JavaScript.

In [3]: c = pymongo.Connection("192.168.169.62")
In [4]: db = c.mongosquid
In [5]: raw = db.raw
In [6]: raw
Out[6]: Collection(Database(Connection('192.168.169.62', 27017), u'mongosquid'), u'raw')

We could have also referred to our collection as db["raw"] or db[coll] if you needed to define the collection in a variable.

In [7]: raw.count()
Out[7]: 205339

You can find out which collections belong to the database with the collection_names() method.

In [40]: db.collection_names()
Out[40]: [u'raw', u'system.indexes']

The find_one() method allows you to quickly inspect your collection and take a peek at a sample document.

In [10]: raw.find_one()

Out[10]:

{u'_id': ObjectId('4b496cddb15cb004a4000000'), u'format': u'-', u'method': u'GET', u'size': 824477.0, u'source': u'192.168.1.254', u'squidcode': u'TCP_MISS/200', u'stamp': 1263096815.7609999, u'url': u'http://netflix086.as.nflximg.com.edgesuite.net/sa0/166/1680180166.wmv/range/660083845-660907844?'}

The distinct() method does have some limitations, as I discovered the hard way, as you can see from this exception.

In [13]: raw.distinct("stamp")
---------------------------------------------------------------------------
OperationFailure Traceback (most recent call last)

/root/

/usr/lib/python2.4/site-packages/pymongo-1.3-py2.4-linux-i686.egg/pymongo/collection.pyc in distinct(self, key)

/usr/lib/python2.4/site-packages/pymongo-1.3-py2.4-linux-i686.egg/pymongo/cursor.pyc in distinct(self, key)

/usr/lib/python2.4/site-packages/pymongo-1.3-py2.4-linux-i686.egg/pymongo/database.pyc in _command(self, command, allowable_errors, check, sock)

OperationFailure: command SON([('distinct', u'raw'), ('key', 'stamp')]) failed: assertion: distinct too big, 4mb cap
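One possible workaround, if you can afford a full scan, is to collect the unique values on the client instead of asking the server for them. This is only a sketch: distinct_values is a made-up helper, and the sample documents stand in for what iterating over raw.find() would yield.

```python
def distinct_values(docs, key):
    """Return the sorted unique values of key across an iterable of dicts.

    Assumes the values are hashable and mutually comparable (strings or
    numbers); documents that lack the key are skipped.
    """
    seen = set()
    for doc in docs:
        if key in doc:
            seen.add(doc[key])
    return sorted(seen)


# Stand-in documents; with PyMongo you would pass a cursor such as raw.find()
sample_docs = [
    {"method": "GET", "stamp": 1262520969.721},
    {"method": "CONNECT", "stamp": 1262521126.928},
    {"method": "GET", "stamp": 1262521127.654},
    {"squidcode": "TCP_DENIED/403"},  # no "method" or "stamp" field
]

print(distinct_values(sample_docs, "method"))
print(len(distinct_values(sample_docs, "stamp")))
```

This trades the server-side cap for memory on the client, so it only makes sense when the set of unique values fits comfortably in RAM.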

So in my previous blog (using JavaScript) I introduced queries but you really can't do anything useful without using a cursor. If you've ever done any MySQL coding before you should be familiar with the concept. Basically it allows you to iterate through the results of a query.

Here we have the same expressions, but you obviously need to quote the $gt operator in Python.

In [29]: c = raw.find( {'stamp': { "$gt": 1263096815 }})
In [31]: c.count()
Out[31]: 2060


and

In [23]: c = raw.find({'squidcode':'TCP_DENIED/403'})
In [24]: c.count()

Out[24]: 2999


For the sake of this exercise, we only want to see 3 results so we call the limit() method.

In [26]: c.limit(3)

Out[26]:


Now we can iterate through the results of our query.

In [27]: for e in c:
   ....:     print e
   ....:
   ....:

{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262520969.721, u'source': u'192.168.1.254', u'url': u'http://www.bing.com/favicon.ico', u'_id': ObjectId('4b496ea4b15cb004a6000000'), u'method': u'GET', u'size': 1419.0}

{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262521126.928, u'source': u'192.168.1.254', u'url': u'http://www.msn.com/', u'_id': ObjectId('4b496ea4b15cb004a600003e'), u'method': u'GET', u'size': 1395.0}

{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262521127.654, u'source': u'192.168.1.254', u'url': u'http://www.bing.com/favicon.ico', u'_id': ObjectId('4b496ea4b15cb004a600003f'), u'method': u'GET', u'size': 1419.0}

So if we try again, what happens?

In [28]: for e in c:
   ....:     print e
   ....:
   ....:

Nada. We have to rewind the cursor object to be able to iterate again.

In [30]: c.rewind()
Out[30]:
In [31]: for e in c:
   ....:     print e
   ....:
   ....:

{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262520969.721, u'source': u'192.168.1.254', u'url': u'http://www.bing.com/favicon.ico', u'_id': ObjectId('4b496ea4b15cb004a6000000'), u'method': u'GET', u'size': 1419.0}

You can also manually iterate through these by calling next()

In [51]: cr.next()

Out[51]:

{u'_id': ObjectId('4b496ea4b15cb004a6000000'),
u'format': u'-', u'method': u'GET', u'size': 1419.0, u'source': u'192.168.1.254', u'squidcode': u'TCP_DENIED/403', u'stamp': 1262520969.721, u'url': u'http://www.bing.com/favicon.ico'}

In [52]: result = cr.next()


Guess what, your limit will still apply, so if you want to clear it you can do a cr.rewind() and cr.limit(0), and then you can manually iterate through with cr.next().

Dummies Guide to MongoDB Queries using Squid Logs (JavaScript Shell Edition)

So the MongoDB developer documentation is actually pretty decent, but it doesn't really use examples with real data. For me, that made it more difficult for some of the API and shell commands to sink in.

So to generate some real world queries I created a python script that parsed the access.log file[s] generated by squid. I'll follow this blog with one that covers pymongo but I think this will be helpful, and like most of the posts will provide a good reference because when you are rapidly approaching 40 not only your eyes go, but your memory. So here goes...
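The parsing script isn't reproduced in this post, so what follows is only a guess at its shape, assuming Squid's default native access.log format (timestamp, elapsed, client, code/status, bytes, method, URL, ident, hierarchy, content type). The key names match the documents shown further down, but mapping the ident column to "format" is an assumption, and both the parse_line helper and the sample line are invented.

```python
def parse_line(line):
    """Turn one native-format Squid access.log line into a dict, or None."""
    fields = line.split()
    if len(fields) < 8:
        return None  # blank or truncated line
    return {
        "stamp": float(fields[0]),   # Unix time with milliseconds
        "source": fields[2],         # client address
        "squidcode": fields[3],      # e.g. TCP_MISS/200
        "size": float(fields[4]),    # bytes sent to the client
        "method": fields[5],
        "url": fields[6],
        "format": fields[7],         # assumption: the ident column
    }


# Invented sample line in the native log format
sample = ("1263096815.761   1523 192.168.1.254 TCP_MISS/200 824477 GET "
          "http://example.com/movie.wmv - DIRECT/203.0.113.10 video/x-ms-wmv")

doc = parse_line(sample)
print(doc)
# Each dict would then be inserted into the "raw" collection.
```

Storing stamp and size as floats is what makes the numeric $gt queries work later; left as strings, they would not compare as numbers.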

First of all this assumes you are running the mongo JavaScript shell and yeah I know running from root is a bad idea and not even necessary (I don't think) but sue me.

root@opti620:~/mongodb# ./bin/mongo
MongoDB shell version: 1.2.1
url: test
connecting to: test
type "help" for help
> show dbs
admin
local
mongosquid
test
> use mongosquid
switched to db mongosquid
> show collections
raw
system.indexes
>

Now let's have some fun. This was actually when I had just imported a few lines from the log file, so there is a relatively small number of documents. A collection is essentially like a table, but since this is #nosql it really isn't a table. It is just a collection of documents. We'll see those next.

> db.raw.find().count()
1029
> db.raw.find()[1029]
> db.raw.find()[1028]
{
"_id" : ObjectId("4b496cddb15cb004a4000404"),
"squidcode" : "TCP_MISS/200",
"source" : "192.168.1.254",
"stamp" : 1263102993.841,
"format" : "-",
"url" : "agmoviecontrol.netflix.com:443",
"method" : "CONNECT",
"size" : 17499
}

The JSON above is the "document." Something you'll notice is that there are basically two different data types, strings and floating points. The size field and timestamp are obviously floats. That hash-looking thing is an ObjectId, an identifier that is supposed to be unique.

So one of the cool built in queries is to return only the unique values for a given field. This is handled by the distinct method.

So we can see here that there were no HTTP POSTs.

> db.raw.distinct("method")
[ "CONNECT", "GET" ]

And because of my screwed up natting I can't tell which of my kids was going to netflix.

> db.raw.distinct("source")
[ "192.168.1.254" ]

> db.raw.distinct("url")
....
"http://netflix086.as.nflximg.com.edgesuite.net/sa0/725/1985205725.wma/range/9247565-9735184?",
"http://netflix086.as.nflximg.com.edgesuite.net/sa0/725/1985205725.wma/range/9735185-10219794?",
"http://netflix086.as.nflximg.com.edgesuite.net/sa0/725/1985205725.wma/range/985115-1469724?"

So remember when I discussed types above? If we wanted to retrieve all the transactions that were greater than 1MB we could do the following, but there is obviously more to it than that.

> db.raw.find( {size: { $gt:1000000}} )
{ "_id" : ObjectId("4b496cddb15cb004a4000162"), "squidcode" : "TCP_MISS/200", "source" : "192.168.1.254", "stamp" : 1263097489.996, "format" : "-", "url" : "http://netflix086.as.nflximg.com.edgesuite.net/sa0/166/1680180166.wmv/range/143155845-144163844?", "method" : "GET", "size" : 1008478 }
{ "_id" : ObjectId("4b496cddb15cb004a40003b0"), "squidcode" : "TCP_MISS/200", "source" : "192.168.1.254", "stamp" : 1263099100.207, "format" : "-", "url" : "http://netflix086.as.nflximg.com.edgesuite.net/sa0/166/1680180166.wmv/range/400771845-401779844?", "method" : "GET", "size" : 1008478 }

I was pleased to find that you can use regular expressions. The first query tells me there are 3199 documents that have port 443 in them and the 2nd query returns the first document. One of the things I noticed is that retrieving the document based on the "index" is really really slow. But I believe that is because it isn't really an index, but we'll get to them later.

> db.raw.find ( { url: /:443/ }).count()
3199
> db.raw.find ( { url: /:443/ })[0]
{
"_id" : ObjectId("4b496cddb15cb004a4000093"),
"squidcode" : "TCP_MISS/200",
"source" : "192.168.1.254",
"stamp" : 1263096929.091,
"format" : "-",
"url" : "agmoviecontrol.netflix.com:443",
"method" : "CONNECT",
"size" : 96222
}
> db.raw.find ( { url: /:443/ })[0:3]
Sun Jan 10 01:16:11 JS Error: SyntaxError: missing ] in index expression (shell):0

You'll notice that array slices don't work in the shell, but they obviously do in Python, which I'll blog on next.