Is Bitcask OK for a simple, high-performance file store?

I am looking for a simple way to store and retrieve millions of XML files. Currently everything is stored directly in the filesystem, which has some performance issues.

Our requirements are:

  1. Ability to store millions of XML files in a batch process. XML files may be up to a few megabytes in size, but most are in the 100 KB range.
  2. Very fast random lookup by id (e.g. document URL)
  3. Accessible by both Java and Perl
  4. Available on the major Linux distributions and Windows

I had a look at several NoSQL platforms (e.g. CouchDB, Riak and others), and while those systems look great, they seem almost like overkill for our needs:

  1. No clustering required
  2. No daemon ("service") required
  3. No clever search functionality required

Having delved deeper into Riak, I found Bitcask (see intro), which seems like exactly what I want. The basics described in the intro are really intriguing. Unfortunately, there seems to be no way to access a Bitcask store from Java (or is there?).

So my question boils down to:

  • Is the following assumption right: the Bitcask model (append-only writes, in-memory key management) is the right way to store and retrieve millions of documents?
  • Are there any viable alternatives to Bitcask available via Java? (Berkeley DB comes to mind...)
  • (For Riak specialists) Does Riak add much overhead in terms of implementation, management and resources compared to "naked" Bitcask?
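
To make clear what I mean by "append-only writes, in-memory key management", here is a minimal Java sketch of the idea as I understand it. This is not Bitcask's actual on-disk format or API. The class name `TinyCask` and the record layout (key length, value length, key bytes, value bytes) are my own assumptions for illustration. It also skips everything a real store needs: checksums, rebuilding the index when an existing file is reopened, compaction of overwritten values, and thread safety.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one append-only data file plus an in-memory key index. */
final class TinyCask implements AutoCloseable {
    private final FileChannel channel;
    // In-memory index: key -> { offset of the value in the file, length of the value }.
    // Note: it is NOT rebuilt from an existing file in this sketch.
    private final Map<String, long[]> keydir = new HashMap<>();

    TinyCask(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
    }

    /** Appends a record at the end of the file and points the index at its value. */
    void put(String key, byte[] value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        long recordStart = channel.size();

        // Assumed record layout: [int keyLen][int valueLen][key bytes][value bytes]
        ByteBuffer buf = ByteBuffer.allocate(8 + k.length + value.length);
        buf.putInt(k.length).putInt(value.length).put(k).put(value);
        buf.flip();

        long pos = recordStart;
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
        // An overwrite only moves the index entry; the old bytes stay in the file.
        keydir.put(key, new long[] { recordStart + 8 + k.length, value.length });
    }

    /** Looks up the key in memory, then reads the value with one positional read loop. */
    byte[] get(String key) throws IOException {
        long[] entry = keydir.get(key);
        if (entry == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.allocate((int) entry[1]);
        long pos = entry[0];
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new IOException("unexpected end of data file");
            }
            pos += n;
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

With this layout a lookup by document URL costs one hash lookup plus one positional read, which is the property I am after. The price is that every key has to fit in memory and that stale values take up disk space until something compacts the file.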

asked by KoW, 15 May 2011 at 13:09