Support this blog. Buy from Amazon!
2007-09-27
Berkeley DB vs SQLite
Ok, this is my attempt to make this blog a little bit more "educational."
In my latest project, like most projects, I need a library to quickly store large amount of data and perform some basic queries on the data. Since I want to implement the prototype quickly and make it easy to send the prototype to other people, I decided against using any "real" database. I see two options: Berkeley DB or SQLite. Since SQLite is like new and cool and allows me to dust my SQL, I decided to use it.
Obviously, the rest of this entry is about why it was a stupid choice.
I need the library to map some ID number and hash values in both direction. Doing it fast in SQL engines requires the use of indices on both fields. In SQLite, the only supported index is BTree, which means that it would needlessly sort the millions of hash values...
A quick reimplementation in Berkeley DB with hash based indexing reduced the running time from 10 hours into 3.
In my latest project, like most projects, I need a library to quickly store large amount of data and perform some basic queries on the data. Since I want to implement the prototype quickly and make it easy to send the prototype to other people, I decided against using any "real" database. I see two options: Berkeley DB or SQLite. Since SQLite is like new and cool and allows me to dust my SQL, I decided to use it.
Obviously, the rest of this entry is about why it was a stupid choice.
I need the library to map some ID number and hash values in both direction. Doing it fast in SQL engines requires the use of indices on both fields. In SQLite, the only supported index is BTree, which means that it would needlessly sort the millions of hash values...
A quick reimplementation in Berkeley DB with hash based indexing reduced the running time from 10 hours into 3.
Labels: work