
2009-10-01 #1

Created by arte. Last edited by arte, 161 days ago. Viewed 215 times. #3

NoSQL: MongoDB performance testing (part 2: counting)…

After my insert tests last time, I decided to look at some count queries, as we count a lot at twimpact.com. As a first result I can say that without an index, counting makes no sense with a database of this size.

I used the database left over from my last insert test and added a few indexes, which took around 30-40 minutes per index. I did not measure the indexing time in more detail, as we tend to create indexes while working on the database anyway.

Now for today's results. The queries are quite simple, but practical in our case. I get a cursor for 1,000,000 documents as the result of a simple query and, for each document, count the number of documents that share its value for one of the document's properties:

def cursor = db.find().limit(1000000)
// alternative: query one of the indexed properties
// def cursor = db.find(new BasicDBObject("property", new BasicDBObject("\$ne", null))).limit(1000000)

cursor.each { doc ->
  def value = doc.get("property")
  def count = db.getCount(new BasicDBObject("property", value))
}

[Figure: MongoDB query test]

The time was taken for each of the "db.getCount()" calls, and it turns out that around 40-50% of all queries result in negligible query time (< 1ms), which is the smallest time frame I can measure right now. This needs to be taken into account when evaluating the graphs, as they only show the queries with at least 1ms duration (log scale plot).
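The measurement described above can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not the code used for the test: the class CountTiming and its methods are made up for illustration, and the timed Runnable stands in for a single db.getCount() call. It assumes a clock with millisecond resolution, so calls that finish within one clock tick are recorded as 0 and have to be dropped before plotting on a log scale.

```java
import java.util.ArrayList;
import java.util.List;

public class CountTiming {
    // Times one call with millisecond resolution (assumption: the original
    // test used a comparable clock). Calls finishing within the same clock
    // tick are reported as 0.
    public static long timeMillis(Runnable call) {
        long start = System.currentTimeMillis();
        call.run();
        return System.currentTimeMillis() - start;
    }

    // Share of measurements that fall below the 1 ms resolution.
    public static double subMillisecondShare(long[] durationsMs) {
        if (durationsMs.length == 0) return 0.0;
        int zeros = 0;
        for (long d : durationsMs) {
            if (d < 1) zeros++;
        }
        return (double) zeros / durationsMs.length;
    }

    // Durations that can appear on a log-scale axis (log of 0 is undefined).
    public static List<Long> plottable(long[] durationsMs) {
        List<Long> result = new ArrayList<>();
        for (long d : durationsMs) {
            if (d >= 1) result.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] sample = {0, 0, 3, 12};  // made-up durations in ms
        System.out.println(subMillisecondShare(sample));
        System.out.println(plottable(sample));
    }
}
```

A clock with finer resolution, such as System.nanoTime(), would probably make the sub-millisecond queries visible as well.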

In the plot you see query time versus the result of getCount(). As expected, higher counts may take longer.

Some explanation is necessary for the plots. "random" means that I fetch some documents and count one of the properties (the same property for all documents). I do not know the order in which the documents arrive, so they are unrelated to the property I am counting. "correlated" means that I query the documents using an index and then count the property that was indexed. The assumption here was that it might be easier for the database to count all documents having a certain property value if I had previously queried all documents having a non-null value for that property.

This holds true for the long index but not for the string index. The latter behaves about the same as my random counts.

The results show that count queries are very fast, but only if indexed.
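Why an index matters so much for counting can be illustrated with a small in-memory analogy. The Java sketch below is not how MongoDB works internally (its indexes are B-trees, not hash maps), and the names CountSketch, countByScan, buildIndex and countByIndex are hypothetical. It only shows that a count without an index has to touch every document, while a count with an index does not depend on the collection size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountSketch {
    // Unindexed count: every document is inspected.
    public static int countByScan(List<Map<String, Object>> docs, String key, Object value) {
        int count = 0;
        for (Map<String, Object> doc : docs) {
            if (value.equals(doc.get(key))) count++;
        }
        return count;
    }

    // Builds a "value -> number of documents" map for one property.
    // This is only a stand-in for a real database index.
    public static Map<Object, Integer> buildIndex(List<Map<String, Object>> docs, String key) {
        Map<Object, Integer> index = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            Object v = doc.get(key);
            if (v != null) index.merge(v, 1, Integer::sum);
        }
        return index;
    }

    // Indexed count: a single lookup, independent of the number of documents.
    public static int countByIndex(Map<Object, Integer> index, Object value) {
        return index.getOrDefault(value, 0);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("property", (long) (i % 10));  // made-up data
            docs.add(doc);
        }
        Map<Object, Integer> index = buildIndex(docs, "property");
        System.out.println(countByScan(docs, "property", 3L));
        System.out.println(countByIndex(index, 3L));
    }
}
```

Building the index costs one full pass over the data, which fits the long index creation times mentioned earlier, but after that each count avoids the scan.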

What we also need for twimpact.com are some more advanced queries. I assume that the results for those also depend on how we design our documents to fit our needs. The design will take some time, and I will get back with the results of the design work and the advanced queries at a later date.

    Copyright © 2005-2008 Matthias L. Jugel | SnipSnap 1.0b3-uttoxeter