FileDict – a Persistent Dictionary in Python

May 24, 2009

Python’s dictionary is possibly the most useful construct in the language.  And I argue that for some purposes, mapping it to a file (in real-time) can be even more useful.

*** Update ***

There’s a newer and better version of FileDict, containing bugfixes and corrections, many of which are due to comments on this page.
You can read about it (with explanations) in https://erezsh.wordpress.com/2009/05/31/filedict-bug-fixes-and-updates/

Why?

The dictionary resides in memory, and so has three main “faults”:

  1. It only lasts as long as your program does.
  2. It occupies memory that might be useful for other, more commonly accessed, data.
  3. It is limited to how much memory your machine has.

The first can be solved by pickling and unpickling the dictionary, but will not survive an unexpected shutdown (even putting the pickling in a try-finally block won’t protect it against all errors).

FileDict

FileDict is a dictionary interface I wrote, that saves and loads its data from a file using keys. Current version uses Sqlite3 to provide consistency, and as a by-product, acidity.

The result is a dictionary which at all-times exists as a file, has virtually no size limit, and can be accessed by several processes concurrently.

It is meant as a quick-and-simple general-purpose solution. It is rarely the best solution, but it is usually good enough.

Performance obviously cannot compare to the builtin dictionary, but it is reasonable and of low complexity (refer to sqlite for more details on that).

Uses

FileDict can be used for many purposes, including:

  • Saving important data in a convinient manner
  • Managing large amounts of data in dictionary form, without the mess of implementing paging or other complex solutions
  • Communication between processes (sqlite supports multiple connections and implements ACID)

Examples

$ python
>>> import filedict
>>> d=filedict.FileDict(filename="example.dict")
>>> d['bla'] = 10
>>> d[(2,1)] = ['hello', (1,2) ]
-- exit --
$ python
>>> import filedict
>>> d=filedict.FileDict(filename="example.dict")
>>> print d['bla']
10
>>> print d.items()
[['bla', 10], [(2, 1), ['hello', (1, 2)]]]
>>> print dict(d)
{'bla': 10, (2, 1): ['hello', (1, 2)]}
>>> d=filedict.FileDict(filename="try.dict")
>>> with d.batch:  # using .batch suspend commits, making a batch of changes quicker
>>>    for i in range(100000):
>>>            d[i] = i**2
(takes about 8 seconds on my comp)
>>> print len(d)
100000
>>> del d[103]
>>> print len(d)
99999

Limitations

  • All data (keys and values) must be pickle-able
  • Keys must be hashable (perhaps this should be removed by hashing the pickled key)
  • Keys and values are stored as a copy, so changing them after assignment will not update the dictionary.

Source Code

Is availible in here in here

Future

Additions in the future may include:

  • An LRU-cache for fetching entries
  • A storage strategy different than Sqlite

Other suggestions?

Advertisements

Ribbon

March 12, 2009

Most of you may think that Microsoft’s Ribbon is fairly new.
However, the truth is it was already an optional feature in Word 5.5, as the following screenshot plainly demonstrates:

I salute Microsoft for their foresight.


Raphael and a Prophecy

March 7, 2009

More than once I have wasted my time in attempt to introduce graphics into the webpage, some of these attempts are recorded in this very blog: First were my lines in CSS, followed by rotating rectangles. I say “wasted” because I have now found Raphaël, and I’m in love.

Raphaël

What is it? Their site is too modest, claiming “Raphaël is a small JavaScript library that should simplify your work with vector graphics on the web”.
While true, it is also a most ambitious project to change the way the web looks, whether they mean to or not.
All (popular) browsers support vector graphics: IE supports VML, and all other browsers support SVG. By implementing both languages and switching between them, you can create complex vector graphics that work in almost every browser.
This idea was new to me, at least, and I think it’s great. Now Raphaël comes along and does it for you. Nice.

My Prophecy
Vector graphics (VG) will become more popular. As VG libraries mature (and Raphaël is already in good shape), VG on the web would become so easy that anyone could use them, and many would. Soon enough you will have beautiful interactive GUIs. At fiirst snippets and demos, then complete libraries for everyone to use. What more can it do? Maybe interactive animations, maybe just add some spice to websites.
It probably won’t bury Flash, but I expect it would be a competition.


The Evils Of Positional Arguments

January 16, 2009

A few days ago I found myself writing pure ansi C code, after over two years of not touching it at all. In fact, these two years consisted mostly of Python, and a little of C#, both very far from it. While coding away, trying to get re-accustumed to mallocs and frees, return values instead of exceptions and pointer arithmatic, I had to perform a memcpy. Very easy, of course, but I could not for the life of me remember how to call it. I remembered rather vividly that it had 3 arguments, named ‘dst’, ‘src’ and ‘len’, the first two being void* and the latter.. perhaps plain int? But in which order they came, I could not recall. Now, that is not so evil, of course, because a quick look at the C reference refreshed my memory (until next time). This repeated with other functions (such as memset). But only when it started to happen with my own functions, I realized how unnatural it is for me, to remember things by order and not by name. Names allow us to establish a context, to find reasoning. Also, we are (relatively) good at remembering names.

By using position as a way for transferring parameters, we are essentially calling them “one”, “two”, “three” and etc., which are still names, but are devoid of context or meaning, and are the same for every function. It is not only hard to remember the right order; we can endure it (as reality proves), but it makes our code less readable, less natural.

Also, it complicates changing existing APIs. Move an argument from its position, and you have to change the position everywhere in the code. This can easily lead to subtle bugs (consider switching dst and src, it may be very hard to find where this happened!). So if you append parameters, they must come in the end, which somtimes does not make much sense.

While all programming languages (that I know of) sin in this evil of positionality , one device takes it one step ahead: Regular expressions. Specfically, RE Substitution. Not only do you address its matches using numbers, but actually figuring out which number goes to which match can be confounding. I cannot count how many bugs this must have produced.

In conclusion:

Design your systems differently.
Let computers count. Let humans use names!


Rotating Cube in JavaScript

November 25, 2008

My CSS-Parrot gave me an appetite for more in-browser graphics. This, led to my newest creation: A rotating “3D” cube, written with JavaScript.

Check it out here: http://kanelynchsucks.com/sfsr/rot_cube.html

(glory to Tzahi for hosting it)

Edit: The page is currently unavailible. If you’re interested, email me and I’ll send you a copy.

Please note, this cube is not really 3D.  I just hacked it together using a couple of tricks, which I think are rather clever.

My next post (or more) will explain thoroughly how I made the cube and the tricks that I used. But I highly recommend you, the readers of this blog, to try and figure it out yourself. It can be a fun exercise. The actual code is very readable, and the only mystery is – how did I do it?


Pickling Python Expressions

November 12, 2008

My last post introduced the concept of X, a class which “absorbs” operations and behaves like a function.
As many people pointed out, this was merely a syntactic alternative to lambda. You may like it, you may not.
Now, after a rewrite, X can now be pickled. But let me explain first.

Python lambdas cannot be pickled. In fact, python code cannot be pickled.
Pickling an object, aka serializing, is converting the object’s state (that is, its data) to a string, which can at a later time be unpickled to re-create the object with that state. The unpickling process instanciates that class, assuming it has not changed, and updating the new instance’s state to the pickled one. Python code is never stored.

Trying to pickle a class or a function might appear to work, but it does not really pickle it; it simply pickles the reference to it (and its state). Unpickling the string in a new terminal would prove that (as would a quick analysis of resulting string).

Attempts to pickle methods, nested functions or lambdas fail on the spot. That is because a reference to them cannot be kept (Actually, it can be. But they are rather volatile, so it might not be wise). Eventually, python code, or even expressions, cannot be pickled.

This brings me back to X. X allows you to do just that:

>>> expr = 1 + (X + 3) * 4
>>> s = pickle.dumps(expr)

(destory objects, change objects, switch an interpreter, whatever you wish)

>>> expr2 = pickle.loads(s)
>>> expr2(5)
33

By using X, the programmer can blend dynamic code with his data, and still be able to pickle it.
I believe this removes a very big limitation.

Just to be fair, I will note that there is another way to achieve this: Keep your expressions in a string, and eval it when you need it run. I highly recommend not doing it.

X’s new source code (a bit cryptic, but it’s the best I could do. Suggestions for simplification are welcomed) :

"""
x.py

Author: Erez Sh.
Date  : 11/11/2008
"""

import operator

def identity(x):
	return x

class _Return(object):
	"Pickle-able!"
	def __init__(self, value):
		self._value = value

	def __call__(self, *args):
		return self._value

class _Partial(object):
	"Pickle-able!"
	def __init__(self, callable, *args):
		self._callable = callable
		self._args = args

	def __call__(self, *args, **kwargs):
		args = self._args + args
		return self._callable(*args)

class _X(object):
	def __init__(self, func, *args_to_run):
		self.__func = func
		self.__args_to_run = tuple(args_to_run)

	def __getstate__(self):
		return self.__func, self.__args_to_run
	def __setstate__(self, state):
		self.__func, self.__args_to_run = state
	def __reduce__(self):
		#raise Exception("Deprecated!")
		return object.__reduce__(self)

	def __apply_un_func(self, func ):
		return _X(func, _Partial(self))
	def __apply_bin_func(self, func, arg ):
		return _X(func, _Partial(self), _Return(arg))
	def __apply_rbin_func(self, func, arg ):
		return _X(func, _Return(arg), _Partial(self))
	def __apply_multargs_func(self, func, *args ):
		return _X(func, _Partial(self), *map(_Return,args))

	def __call__(self, arg):
		return self.__func(*[x(arg) for x in self.__args_to_run])

	def __getattr__(self, attr):
		return self.__apply_bin_func( getattr, attr )

	def call(self, *args, **kwargs):
		return self.__apply_multargs_func( apply, args, kwargs)

	# Containers
	def __getitem__(self, other):
		return self.__apply_bin_func( operator.getitem, other )
	def __getslice__(self, a,b=None,c=None):
		return self.__apply_bin_func( operator.getslice, other )
	def in_(self, other):
		return self.__apply_bin_func( operator.contains, other )

	# Arith
	def __add__(self, other):
		return self.__apply_bin_func( operator.add, other )
	def __sub__(self, other):
		return self.__apply_bin_func( operator.sub, other )
	def __mul__(self, other):
		return self.__apply_bin_func( operator.mul, other )
	def __div__(self, other):
		return self.__apply_bin_func( operator.div, other )
	def __floordiv__(self, other):
		return self.__apply_bin_func( operator.floordiv, other )
	def __truediv__(self, other):
		return self.__apply_bin_func( operator.truediv, other )
	def __mod__(self, other):
		return self.__apply_bin_func( operator.mod, other )
	def __pow__(self, other):
		return self.__apply_bin_func( operator.pow, other )

	def __radd__(self, other):
		return self.__apply_rbin_func( operator.add, other )
	def __rsub__(self, other):
		return self.__apply_rbin_func( operator.sub, other )
	def __rmul__(self, other):
		return self.__apply_rbin_func( operator.mul, other )
	def __rdiv__(self, other):
		return self.__apply_rbin_func( operator.div, other )
	def __rfloordiv__(self, other):
		return self.__apply_rbin_func( operator.floordiv, other )
	def __rtruediv__(self, other):
		return self.__apply_rbin_func( operator.truediv, other )
	def __rmod__(self, other):
		return self.__apply_rbin_func( operator.mod, other )
	def __rpow__(self, other):
		return self.__apply_rbin_func( operator.pow, other )

	# bitwise
	def __and__(self, other):
		return self.__apply_bin_func( operator.and_, other )
	def __or__(self, other):
		return self.__apply_bin_func( operator.or_, other )
	def __xor__(self, other):
		return self.__apply_bin_func( operator.xor, other )

	def __rand__(self, other):
		return self.__apply_rbin_func( operator.and_, other )
	def __ror__(self, other):
		return self.__apply_rbin_func( operator.or_, other )
	def __rxor__(self, other):
		return self.__apply_rbin_func( operator.xor, other )
	
	def __rshift__(self, other):
		return self.__apply_bin_func( operator.rshift, other )
	def __lshift__(self, other):
		return self.__apply_bin_func( operator.lshift, other )

	# Comparison
	def __lt__(self, other):
		return self.__apply_bin_func( operator.lt, other )
	def __le__(self, other):
		return self.__apply_bin_func( operator.le, other )
	def __eq__(self, other):
		return self.__apply_bin_func( operator.eq, other )
	def __ne__(self, other):
		return self.__apply_bin_func( operator.ne, other )
	def __ge__(self, other):
		return self.__apply_bin_func( operator.ge, other )
	def __gt__(self, other):
		return self.__apply_bin_func( operator.gt, other )

	def __abs__(self):
		return self.__apply_un_func( abs )
	def __neg__(self):
		return self.__apply_un_func( operator.neg )
	

X = _X(identity, identity)

Fun While Avoiding Lambda (Python)

November 1, 2008

Readers, meet X. X is a class I wrote in Python as an alternative to using lambda. It has two main features:

  1. It acts as an identity function ( so X(3) == 3, etc. )
  2. When performing operations on it, it returns a new class that acts as a corresponding function.

Let me explain. Doing X+2 will return a new class that whenever called with an argument, will return that argument added with 2. So:

>>> map( X+2, [1, 2, 3] )
[3, 4, 5]

>>> filter( X>0, [5, -3, 2, -1, 0, 13] )
[5, 2, 13]

>>> l = ["oh", "brave", "new", "world"]
>>> sorted(l,key=X[-1])
['world', 'brave', 'oh', 'new']

These operations can be chained:

>>> map(2**(X+1), range(10))
[2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

>>> map( "P" + X[3:]*2 + "!", ["Hello", "Suzzy"] )
['Plolo!', 'Pzyzy!']

Caveats

X has a few limitations.

  • Using X twice in the same expression probably won’t work (this can be solved)
  • Since calling X evaluates it, it can’t emulate method calls. For that you have to do use call, like: X.upper.call() (‘hello’) –> ‘HELLO’.
  • Not all operations can be “captured”. For example, the “in” operator. For that you have to use X.in_( … ), like: X.in_(range(10)) (5) –> True
  • Not all attributes will be accessible
  • More problems? Likely.

Conclusion

While not innovative nor a complete solution, I believe X can be a useful replacement for some uses of anonymous functions, providing a shorter and simpler syntax which is easier to read and understand.

It is provided here in full, in hope that it will be useful to my readers (Improvements and fixes are welcome):

class _X(object):
	def __init__(self, func):
		self.__func = func

	def __call__(self, arg):
		return self.__func(arg)

	def __getattr__(self, attr):
		return _X(lambda x: getattr(self(x), attr))
	def call(self, *args, **kwargs):
		return _X(lambda x: self(x)(*args,**kwargs))

	# Containers
	def __getitem__(self, other):
		return _X(lambda x: self(x)[other])
	def __getslice__(self, a,b=None,c=None):
		return _X(lambda x: self(x)[a:b:c])
	def in_(self, other):
		return _X(lambda x: self(x) in other)

	# Arith
	def __add__(self, other):
		return _X(lambda x: self(x) + other)
	def __sub__(self, other):
		return _X(lambda x: self(x) - other)
	def __mul__(self, other):
		return _X(lambda x: self(x) * other)
	def __div__(self, other):
		return _X(lambda x: self(x) / other)
	def __floordiv__(self, other):
		return _X(lambda x: self(x) // other)
	def __mod__(self, other):
		return _X(lambda x: self(x) % other)
	def __pow__(self, other):
		return _X(lambda x: self(x) ** other)

	def __radd__(self, other):
		return _X(lambda x: other + self(x))
	def __rsub__(self, other):
		return _X(lambda x: other - self(x))
	def __rmul__(self, other):
		return _X(lambda x: other * self(x))
	def __rdiv__(self, other):
		return _X(lambda x: other / self(x))
	def __rfloordiv__(self, other):
		return _X(lambda x: other // self(x))
	def __rmod__(self, other):
		return _X(lambda x: other % self(x))
	def __rpow__(self, other):
		return _X(lambda x: other ** self(x))

	# bitwise
	def __and__(self, other):
		return _X(lambda x: self(x) & other)
	def __or__(self, other):
		return _X(lambda x: self(x) | other)
	def __xor__(self, other):
		return _X(lambda x: self(x) ^ other)

	def __rand__(self, other):
		return _X(lambda x: other & self(x))
	def __ror__(self, other):
		return _X(lambda x: other | self(x))
	def __rxor__(self, other):
		return _X(lambda x: other ^ self(x))

	def __rshift__(self, other):
		return _X(lambda x: self(x) >> other)
	def __lshift__(self, other):
		return _X(lambda x: self(x) << other)

	# Comparison
	def __lt__(self, other):
		return _X(lambda x: self(x) < other)
	def __le__(self, other):
		return _X(lambda x: self(x) <= other)
	def __eq__(self, other):
		return _X(lambda x: self(x) == other)
	def __ne__(self, other):
		return _X(lambda x: self(x) != other)
	def __ge__(self, other):
		return _X(lambda x: self(x) >= other)
	def __gt__(self, other):
		return _X(lambda x: self(x) > other)

	def __abs__(self):
		return _X(lambda x: abs(self(x)))

X = _X(lambda x:x)

Put it in x.py and import as:
from x import X