In [3]: c = pymongo.Connection("192.168.169.62")
In [4]: db = c.mongosquid
In [5]: raw = db.raw
In [6]: raw
Out[6]: Collection(Database(Connection('192.168.169.62', 27017), u'mongosquid'), u'raw')
We could also have referred to our collection as db["raw"], or as db[coll] if the collection name needs to come from a variable.
In [7]: raw.count()
Out[7]: 205339
You can find out which collections belong to the database with the collection_names() method.
In [40]: db.collection_names()
Out[40]: [u'raw', u'system.indexes']
The find_one() method allows you to quickly inspect your collection and take a peek at a sample document.
In [10]: raw.find_one()
Out[10]:
{u'_id': ObjectId('4b496cddb15cb004a4000000'), u'format': u'-', u'method': u'GET', u'size': 824477.0, u'source': u'192.168.1.254', u'squidcode': u'TCP_MISS/200', u'stamp': 1263096815.7609999, u'url': u'http://netflix086.as.nflximg.com.edgesuite.net/sa0/166/1680180166.wmv/range/660083845-660907844?'}
The distinct() method does have some limitations, as I discovered the hard way and as you can see from this exception.
In [13]: raw.distinct("stamp")
---------------------------------------------------------------------------
OperationFailure                          Traceback (most recent call last)
/root/
OperationFailure: command SON([('distinct', u'raw'), ('key', 'stamp')]) failed: assertion: distinct too big, 4mb cap
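One way around the cap, sketched here on the assumption that pulling every document to the client is acceptable for a collection of this size, is to collect the distinct values yourself while iterating. The helper name distinct_values is made up for illustration; it accepts any iterable of dictionaries, so a pymongo cursor such as raw.find() should work as input. The list of sample documents below merely stands in for one.

```python
def distinct_values(docs, key):
    """Return the set of distinct values of `key` across `docs`.

    `docs` can be any iterable of dict-like documents; documents that
    lack the key are skipped.
    """
    seen = set()
    for doc in docs:
        if key in doc:
            seen.add(doc[key])
    return seen


# Stand-in data shaped like the squid log documents (not real query output).
sample_docs = [
    {"squidcode": "TCP_MISS/200", "stamp": 1263096815.76},
    {"squidcode": "TCP_DENIED/403", "stamp": 1262520969.721},
    {"squidcode": "TCP_DENIED/403", "stamp": 1262521126.928},
]

codes = distinct_values(sample_docs, "squidcode")
```

Against the real collection the call would look like distinct_values(raw.find(), "stamp"). Expect it to be slow, since every document has to cross the network.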
In my previous post (using JavaScript) I introduced queries, but you really can't do anything useful with them without a cursor. If you've done any MySQL coding you should be familiar with the concept: a cursor lets you iterate through the results of a query.
Here we have the same expressions, but in Python the $gt operator has to be quoted, because it is a key in an ordinary dictionary.
In [29]: c = raw.find( {'stamp': { "$gt": 1263096815 }})
In [31]: c.count()
Out[31]: 2060
and
In [23]: c = raw.find({'squidcode':'TCP_DENIED/403'})
In [24]: c.count()
Out[24]: 2999
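The stamp values in these queries appear to be Unix timestamps (seconds since the epoch, presumably UTC), which are awkward to type by hand. A minimal sketch of building the same kind of $gt filter from a datetime follows; stamp_filter_since is a hypothetical helper name, and the UTC interpretation is an assumption about how the log data was written.

```python
import calendar
from datetime import datetime


def stamp_filter_since(when):
    """Build a filter matching documents whose `stamp` is later than `when`.

    `when` is a naive datetime interpreted as UTC; `stamp` is assumed to
    hold seconds since the Unix epoch, as in the sample document.
    """
    epoch_seconds = calendar.timegm(when.timetuple())
    return {"stamp": {"$gt": epoch_seconds}}


# 2010-01-10 04:13:35 UTC corresponds to the 1263096815 used in the query above.
query = stamp_filter_since(datetime(2010, 1, 10, 4, 13, 35))
```

The resulting dictionary can be passed straight to raw.find(query).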
For the sake of this exercise, we only want to see 3 results so we call the limit() method.
In [26]: c.limit(3)
Out[26]:
Now we can iterate through the results of our query.
In [27]: for e in c:
   ....:     print e
   ....:
   ....:
{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262520969.721, u'source': u'192.168.1.254', u'url': u'http://www.bing.com/favicon.ico', u'_id': ObjectId('4b496ea4b15cb004a6000000'), u'method': u'GET', u'size': 1419.0}
{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262521126.928, u'source': u'192.168.1.254', u'url': u'http://www.msn.com/', u'_id': ObjectId('4b496ea4b15cb004a600003e'), u'method': u'GET', u'size': 1395.0}
{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262521127.654, u'source': u'192.168.1.254', u'url': u'http://www.bing.com/favicon.ico', u'_id': ObjectId('4b496ea4b15cb004a600003f'), u'method': u'GET', u'size': 1419.0}
So if we try again, what happens?
In [28]: for e in c:
   ....:     print e
   ....:
   ....:
Nada. The cursor is exhausted, so we have to rewind it before the same loop will print anything again.
In [30]: c.rewind()
Out[30]:
   ....:     print e
   ....:
   ....:
{u'squidcode': u'TCP_DENIED/403', u'format': u'-', u'stamp': 1262520969.721, u'source': u'192.168.1.254', u'url': u'http://www.bing.com/favicon.ico', u'_id': ObjectId('4b496ea4b15cb004a6000000'), u'method': u'GET', u'size': 1419.0}
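If the result set is small, an alternative to rewinding is to copy the results into a list once and loop over the list as often as you like. The sketch below uses a plain Python iterator as a stand-in for the cursor, since both are used up after one pass; with pymongo, list(c) would be the corresponding call. The documents are made up for illustration.

```python
# A plain iterator stands in for the cursor; the documents are made up.
fake_cursor = iter([
    {"url": "http://www.bing.com/favicon.ico", "size": 1419.0},
    {"url": "http://www.msn.com/", "size": 1395.0},
])

# Consume the iterator once and keep the documents in memory.
docs = list(fake_cursor)

first_pass = [d["url"] for d in docs]
second_pass = [d["url"] for d in docs]  # the list can be walked again

# The iterator itself is now exhausted, just like the cursor above.
leftover = list(fake_cursor)
```

The trade-off is memory: the list holds every matching document at once, whereas rewind() makes the next loop fetch the results from the server again.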
You can also iterate manually by calling next() on a cursor (here named cr).
In [51]: cr.next()
Out[51]:
{u'_id': ObjectId('4b496ea4b15cb004a6000000'), u'format': u'-', u'method': u'GET', u'size': 1419.0, u'source': u'192.168.1.254', u'squidcode': u'TCP_DENIED/403', u'stamp': 1262520969.721, u'url': u'http://www.bing.com/favicon.ico'}
In [52]: result = cr.next()
Guess what: your limit still applies. If you want to clear it, call cr.rewind() followed by cr.limit(0), and then you can iterate manually with cr.next().
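If you only want a quick peek without touching the cursor's limit at all, itertools.islice from the standard library can take the first few items from any iterator. This is a minimal sketch: peek is a hypothetical helper name, and a plain iterator stands in for the cursor. It should behave the same with a pymongo cursor, since a cursor is iterated like any other Python iterator, but treat that as an assumption rather than something verified here.

```python
from itertools import islice


def peek(cursor, n):
    """Return up to `n` items from `cursor` as a list.

    The items are consumed from the iterator; nothing is reset.
    """
    return list(islice(cursor, n))


# Stand-in for a cursor: an iterator over five made-up documents.
fake_cursor = iter([{"n": i} for i in range(5)])

first_three = peek(fake_cursor, 3)  # takes documents 0, 1 and 2
the_rest = peek(fake_cursor, 10)    # only two documents are left
```

Because islice only stops pulling items, no limit is ever stored on the cursor, so there is nothing to clear afterwards.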