FileDict – bug-fixes and updates

In my previous post I introduced FileDict. I did my best to get it right the first time, but as we all know, that is impossible for any non-trivial piece of code.
I want to thank everyone for their comments and remarks. They’ve been very helpful.

The Unreliable Pickle

Special thanks go to the mysterious commenter “R”, for pointing out that pickling identical objects may produce different strings (!), which are therefore unsuitable for use as keys. My FileDict indeed suffered from this bug, as this example shows:

>>> key = (1, u'foo')
>>> d[(1, u'foo')] = 4
>>> d[(1, u'foo')]
4
>>> d[key]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 64, in __getitem__
    raise KeyError(key)
KeyError: (1, u'foo')

And if that’s not bad enough:

>>> d[key] = 5
>>> list(d.items())
[['a', 3], [(1, 2), 3], [(1, u'foo'), 4], [(1, u'foo'), 5]]
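
The same effect can be reproduced in isolation. The following is a minimal sketch for CPython 3, not the Python 2 session above: the names `shared` and `distinct` are made up for illustration. It relies on the pickler memoizing objects by identity, so two equal tuples can serialize to different bytes depending on whether their elements are the same object.

```python
import pickle

# Build two equal strings that are distinct objects at runtime.
s1 = "".join(["fo", "o"])
s2 = "".join(["fo", "o"])

shared = (s1, s1)     # both slots refer to the same object
distinct = (s1, s2)   # equal contents, but two separate objects

# As dictionary keys the two tuples are interchangeable...
same_value = shared == distinct and hash(shared) == hash(distinct)

# ...but the pickler emits a back-reference for the repeated object in
# `shared` and a second full string for `distinct`.
same_pickle = pickle.dumps(shared) == pickle.dumps(distinct)

print("equal as keys:", same_value)
print("equal as pickles:", same_pickle)
```

A store that uses the pickled string itself as the lookup key would treat these two tuples as different keys, which is the kind of duplicate entry shown above.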

I’ve rewritten the entire storage mechanism to look rows up by the key’s hash only, and to compare the actual keys after unpickling. This may be a bit slower, but I don’t (and shouldn’t) expect many colliding hashes anyway.
The bug is fixed.
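
To make the idea concrete, here is a rough sketch of a hash-then-compare lookup in Python 3. It is not FileDict's actual code: the class name `HashKeyedStore`, the table layout and the column names are all assumptions made for illustration. It also stays in memory on purpose, because Python 3 randomizes `str` hashes per process by default, so a store that persists to disk would need a stable hash function instead of the built-in `hash()`.

```python
import pickle
import sqlite3


class HashKeyedStore:
    """Toy store: find candidate rows by hash(key), then compare unpickled keys."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE dict (id INTEGER PRIMARY KEY, hash INTEGER, key BLOB, value BLOB)"
        )
        self._conn.execute("CREATE INDEX dict_hash ON dict (hash)")

    def _find_row(self, key):
        # Several keys may share one hash, so unpickle each candidate and compare.
        rows = self._conn.execute(
            "SELECT id, key FROM dict WHERE hash = ?", (hash(key),)
        ).fetchall()
        for row_id, key_blob in rows:
            if pickle.loads(key_blob) == key:
                return row_id
        return None

    def __setitem__(self, key, value):
        row_id = self._find_row(key)
        value_blob = pickle.dumps(value)
        if row_id is None:
            self._conn.execute(
                "INSERT INTO dict (hash, key, value) VALUES (?, ?, ?)",
                (hash(key), pickle.dumps(key), value_blob),
            )
        else:
            self._conn.execute(
                "UPDATE dict SET value = ? WHERE id = ?", (value_blob, row_id)
            )

    def __getitem__(self, key):
        row_id = self._find_row(key)
        if row_id is None:
            raise KeyError(key)
        (value_blob,) = self._conn.execute(
            "SELECT value FROM dict WHERE id = ?", (row_id,)
        ).fetchone()
        return pickle.loads(value_blob)

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM dict").fetchone()[0]


store = HashKeyedStore()
store[(1, "foo")] = 4
key = (1, "foo")
store[key] = 5          # overwrites the existing row instead of adding a duplicate
```

Because the pickled key is only ever unpickled and compared, never matched byte for byte, it no longer matters whether equal keys pickle to the same string.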


By popular demand, I’m now inheriting from DictMixin. It made my code a bit shorter, and was not at all painful.
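
For readers on Python 3, where `UserDict.DictMixin` no longer exists, the closest equivalent is `collections.abc.MutableMapping`. The sketch below shows the same pattern with a made-up `ListBackedDict` class (it is not part of FileDict): only five methods are written by hand, and the mixin supplies the rest.

```python
from collections.abc import MutableMapping


class ListBackedDict(MutableMapping):
    """Hypothetical toy mapping that keeps (key, value) pairs in a list."""

    def __init__(self):
        self._pairs = []

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))

    def __delitem__(self, key):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                del self._pairs[i]
                return
        raise KeyError(key)

    def __iter__(self):
        return iter([k for k, _ in self._pairs])

    def __len__(self):
        return len(self._pairs)


d = ListBackedDict()
d.update({"a": 3, (1, 2): 3})      # update() is inherited from the mixin
has_a = "a" in d                   # so is __contains__
fallback = d.get("missing", 0)     # and get(), items(), pop(), ...
```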

Copy and Close

I no longer close the database in __del__; instead I rely on the garbage collector. It seems to close the database in time, and it allows one to copy the dictionary (all copies will, of course, always have the same keys, but they don’t have to have the same behavior or attributes).

New Source Code

Is available here

  2. I have a web scraping and analysis project for which I’ve been using compressed pickles. As I’ve gotten more ambitious, the memory usage has increased beyond what I can support, so I thought I’d get to try shelve! Unfortunately, my keys are unicode, so folks recommended sqlite. I’ve never used that though, and what I really want is a dict. So I’m happy to find filedict. I converted my dicts from pickles to filedicts and it seems to be working! I’ll have to figure out file size compression on my own though. In any case, thank you.

  3. You may wish to warn people that your filedict does not behave like a dict with respect to values that are mutable types.

    In [3]: td = {}

    In [7]: a = [1,2]

    In [8]: td['a'] = a

    In [9]: td
    Out[9]: {'a': [1, 2]}

    In [10]: a.append(3)

    In [11]: a
    Out[11]: [1, 2, 3]

    In [12]: td
    Out[12]: {'a': [1, 2, 3]}

    In [14]: from filedict import FileDict

    In [15]: tfd = FileDict(filename = 'test.db')

    In [16]: a = [1,2]

    In [17]: tfd['a'] = a

    In [18]: tfd
    Out[18]: {'a': [1, 2]}

    In [19]: a.append(3)

    In [20]: a
    Out[20]: [1, 2, 3]

    In [21]: tfd
    Out[21]: {'a': [1, 2]}

    • erezsh says:

      Hi Joseph,

      I’m glad that you found FileDict useful.

      I’m sorry if its behavior confused you. FileDict stores a copy of the (keys and) values, and not the actual values, so changes to these values don’t affect the copy. In this regard, shelve behaves the same way.

      It is always a humbling life lesson that what is obvious to me isn’t obvious to others, and vice versa. I’ll add a note about this in the original post.
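
The copy semantics described in this reply can be checked against the standard library's shelve, which the reply says behaves the same way. This is a small Python 3 sketch using shelve only (FileDict itself is not involved), with a temporary directory and a made-up function name `demo_copy_semantics`.

```python
import os
import shelve
import tempfile


def demo_copy_semantics():
    """Store a list, mutate the original, and see what the shelf returns."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test_shelf")
        with shelve.open(path) as shelf:
            a = [1, 2]
            shelf["a"] = a            # the list is pickled here: a snapshot
            a.append(3)               # only the in-memory list changes
            stored_after_append = shelf["a"]
            shelf["a"] = a            # storing again replaces the snapshot
            stored_after_reassign = shelf["a"]
    return a, stored_after_append, stored_after_reassign


original, before, after = demo_copy_semantics()
print(original, before, after)
```

With any store that pickles on assignment, the way to persist a mutation is to assign the value again, as the second `shelf["a"] = a` does.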

  4. Matteo says:

    Hello erezsh,

    I have modified your script a bit (to make it more shelve api compatible). I’d like to publish it, but I’d like to know what kind of license (MIT/BSD/Python?) you are using to do it in the proper way 🙂

    """
    A persistent (sqlite3) python dictionary
    Based on:
    Author: Erez Shinan
    Date: 31-May-2009
    Copyright 2010 Matteo Bertini <>
    Python Software Foundation License (PSFL)
    """
    # Note: indentation and several lines were lost when this comment was
    # posted. Lines marked "assumed" below are reconstructions, not the
    # original code.
    import os
    import UserDict
    import cPickle as pickle
    import sqlite3


    class SqliteDict(UserDict.DictMixin):
        "A dictionary that stores its data persistently in a database"

        def __init__(self, filename, flag='c', protocol=1, writeback=False):
            self.filename = filename
            self.flag = flag
            self.protocol = protocol
            self.writeback = writeback
            # flag as in
            if flag in ('r', 'w'):
                if not os.path.exists(filename):
                    raise IOError("File {0!r} missing!".format(filename))
            sqlite3.register_converter("PICKLE", self._loads)
            self._conn = sqlite3.connect(filename, detect_types=sqlite3.PARSE_DECLTYPES)
            if flag == 'n':
                self._dbdrop()  # assumed
            if flag in ('n', 'c'):
                self._dbcreate()  # assumed

        def _dbdrop(self):
            self._conn.execute("DROP TABLE IF EXISTS dict;")

        def _dbcreate(self):
            # The "key" column declaration is assumed: every query uses it, and
            # INSERT OR REPLACE needs a UNIQUE constraint to overwrite a row.
            self._conn.execute("""CREATE TABLE IF NOT EXISTS dict (idx INTEGER PRIMARY KEY,
                key UNIQUE,
                value PICKLE);""")
            self._conn.execute("CREATE INDEX IF NOT EXISTS dict_index ON dict(key);")

        def _commit(self):
            # Body assumed: with writeback on, defer commits to sync()/close().
            if self.writeback:
                return
            self._conn.commit()

        def _dumps(self, value):
            return buffer(pickle.dumps(value, self.protocol))

        def _loads(self, blob):
            return pickle.loads(blob)

        def __getitem__(self, key):
            cursor = self._conn.execute("SELECT value FROM dict WHERE key=?;", (key,))
            for (value,) in cursor:
                return value
            raise KeyError(key)

        def _setitems(self, items):
            parameters = ((key, self._dumps(value)) for key, value in items)
            self._conn.executemany("INSERT OR REPLACE INTO dict (key, value) values (?, ?);",
                                   parameters)

        def __setitem__(self, key, value):
            self._setitems([(key, value)])

        def __delitem__(self, key):
            cursor = self._conn.execute("DELETE FROM dict WHERE key=?;", (key,))
            if cursor.rowcount <= 0:
                raise KeyError(key)

        def update(self, d):
            self._setitems(d.items())  # assumed

        def iterkeys(self):
            # Cursor rows are 1-tuples, so unwrap them to yield bare keys.
            return (row[0] for row in self._conn.execute("SELECT key FROM dict;"))

        def itervalues(self):
            return (row[0] for row in self._conn.execute("SELECT value FROM dict;"))

        def iteritems(self):
            return self._conn.execute("SELECT key, value FROM dict;")

        def __iter__(self):
            return self.iterkeys()

        def keys(self):
            return list(self.iterkeys())

        def values(self):
            return list(self.itervalues())

        def items(self):
            return list(self.iteritems())

        def __contains__(self, key):
            try:
                self[key]
                return True
            except KeyError:
                return False

        def __len__(self):
            return self._conn.execute("SELECT COUNT(*) FROM dict;").fetchone()[0]

        def close(self):
            # assumed: flush pending changes, then release the connection
            self.sync()
            self._conn.close()

        def sync(self):
            self._conn.commit()  # assumed

        def __del__(self):
            # assumed: best-effort cleanup; close() may already have been called
            try:
                self.close()
            except Exception:
                pass


    def open(filename, flag='c', protocol=1, writeback=False):
        return SqliteDict(filename, flag, protocol, writeback)

    • erezsh says:

      Hi Matteo,

      I didn’t pick a license for this code, and I’m fine with any of the three you mentioned. Please let me know when you’ve published it 🙂

