PySnippets – improving code reuse

June 2, 2009

For a long time now, I’ve been hindered by the issue of utilities, or snippets. These are convenience functions and classes that are too small or too incomplete to justify a library, yet are useful enough to be used.
I’ve posted a few on my blog: Namespaces, X and now FileDict. Others, which I didn’t post, include a priority queue, an A* implementation, a lazy-list, an LRU memoizer, etc. You probably have a few of those. I know because I see them on snippet sites.

However, I rarely actually use these in my code. I really want to. But once your code spans more than one file, you usually need to make a proper installation, or at least trouble your “users” a bit more. Usually saving a few lines just isn’t worth the trouble. Copy-pasting the snippet into the file is sometimes the solution, but it really pains me that I’ll have to re-paste it every time I improve my snippet.

I’m sure some of you looked at my snippets, or other people’s, thought “cool”, but never used them, simply because it was too much trouble.

Paradoxically, this is especially true when writing snippets. They are just one small file, and using another snippet would probably make them too hard to distribute. This is a vicious circle, forever limiting our snippets to a low level of sophistication, and discouraging re-use.

I want to break that circle. I want to create an economy of snippets, increasingly building on each other, eventually creating a “standard library” of their own. But how would one do that? I have one suggestion, along with a proof-of-concept, which I will present here.

PySnippets

PySnippets is my attempt at making snippets usable. It consists of two parts – a server and a client.

  1. Server - A website for uploading snippets. Simple enough. You can rate them, tag them, discuss them, offer some documentation and of course post newer versions.
  2. Client - A python module that automagically imports snippets from the web. Essentially, it downloads the snippets you request to a cache, imports them if they’re already there, and periodically searches for updates.

The server is structured in a predictable way, so that the client knows how to fetch a snippet just by its name.

The Client

Here’s a usage example with my current client implementation, which I creatively call “snippets”:

import snippets
antigravity = snippets.get('antigravity')  # "snippet-import"
antigravity.start(mode='xkcd')

Easy as that!

The snippets.get function looks for the module in the local snippets-cache. If it’s there, get just imports it and returns the module. If it’s not, it queries the server for a snippet called “antigravity” (names are unique), stores it in the cache, and then imports it. What the user notices is a 2-second pause the first time they ever import that snippet, and nothing else from then on.
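
To make the cache logic concrete, here is a minimal sketch of it in Python 3. It is not the actual client: the fetch_source callable and the cache_dir parameter are assumptions of mine, standing in for the HTTP request and for wherever the real cache lives.

```python
import importlib.util
import os


def get(name, fetch_source, cache_dir):
    """Return the module for snippet `name`, fetching it on a cache miss.

    `fetch_source` is a hypothetical callable (name -> source text) that
    stands in for the request to the snippet server.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name + ".py")

    if not os.path.exists(path):
        # Cache miss: fetch the source once and store it locally.
        with open(path, "w") as f:
            f.write(fetch_source(name))

    # Cached (or freshly stored): import the module straight from the file.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Only the first call for a given name reaches fetch_source; later calls are served from the cached file.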

You can specify to download a specific version, like this:

filedict = snippets.get('filedict', version=0.1)

Auto-Updating Snippets

The current implementation also includes an “auto-update” feature: Periodically, before importing a module, the client queries the server for a newer version of it. If a newer version exists, it downloads it to the cache and continues with the import.

Auto-updates can be disabled in a parameter to get.
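
The decision itself is small. Below is a sketch of it under my own assumptions: the once-a-day interval, the auto_update parameter name, and the query_latest and download callables are all hypothetical, and versions are compared as tuples.

```python
import time

CHECK_INTERVAL = 24 * 60 * 60  # assumed: ask the server at most once a day


def maybe_update(local_version, last_check, query_latest, download,
                 auto_update=True, now=None):
    """Return the version that should be imported.

    `query_latest` (() -> version) and `download` (version -> None) are
    hypothetical stand-ins for the two server requests.
    """
    now = time.time() if now is None else now
    if not auto_update or now - last_check < CHECK_INTERVAL:
        return local_version      # no server round-trip this time
    latest = query_latest()
    if latest > local_version:
        download(latest)          # refresh the cache before importing
        return latest
    return local_version
```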

The Server

The server is yet another service to upload snippets, however it has a slightly unusual design (which no other snippet site I know of has):

  • A URL to a snippet is easy to deduce given its name.
  • There is deliberate (though simple) support for versions.
  • To increase reliability and trust (more on that later), uploaded snippets cannot be altered (but a new version can be issued)
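
As an illustration of the first two points, a client could build the download address from nothing but the name and an optional version. The host and the path layout below are invented for the example and are not the real site's scheme.

```python
from urllib.parse import quote

BASE_URL = "http://snippets.example.invalid"  # hypothetical host


def snippet_url(name, version=None):
    """Build a download URL from the snippet name (layout is an assumption)."""
    path = quote(name, safe="")
    if version is None:
        # No version requested: point at whatever is newest.
        return "%s/%s/latest" % (BASE_URL, path)
    return "%s/%s/%s" % (BASE_URL, path, quote(str(version), safe=""))
```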

Since I know very little about administration and server-maintenance, I chose wikidot.com to host my POC web-site. They have elaborate support for permissions and most of the features I need, such as the ability to rate, tag and discuss snippets.

Trust

Perhaps the biggest issue with such a system is trust. Since you’re running code which resides online, you have to trust me not to maliciously alter the snippets, and also you have to trust the author of the snippet not to do so.

As a partial solution, uploaded files cannot be altered: Not edited, nor renamed, nor deleted, etc. So if you specify a particular snippet version, it is guaranteed that it will never change (I may commit changes by request, but I will audit them myself).
If you decide to use the latest version of a snippet (that is, not specify a version), please make sure you trust its author.

Perhaps higher-ups in the Python community would like to take some sponsorship of the project, removing the remaining trust-issues with the administrator (that’s me).

Implications

  • To distribute your snippets, all you need is for the receiver to have an internet connection, and the snippets client.
  • If you’re sending someone code, you can attach the client (it’s rather small, too), and just import away. The receiver will benefit from improvements and bugfixes to your snippets.
  • You can use other people’s snippets just as easily, as long as you trust them.
  • Snippets can now build on each other without worrying too much.

What if my user is offline?

Then probably PySnippets isn’t for him.

However, I do have some ideas, and might implement them if there is sufficient demand.

Afterword

PySnippets is my humble attempt at solving the utility/snippet reuse problems. I hope you like it and find it useful.

Please try it!


FileDict – bug-fixes and updates

May 31, 2009

In my previous post I introduced FileDict. I did my best to get it right the first time, but as we all know, this is impossible for any non-trivial piece of code.
I want to thank everyone for their comments and remarks. It’s been very helpful.

The Unreliable Pickle

A special thanks goes to the mysterious commenter “R”, for pointing out that pickling identical objects may produce different strings (!), which are therefore unsuitable for use as keys. And my FileDict indeed suffered from this bug, as this example shows:

>>> key = (1, u'foo')
>>> d[(1, u'foo')] = 4
>>> d[(1, u'foo')]
4
>>> d[key]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "filedict.py", line 64, in __getitem__
    raise KeyError(key)
KeyError: (1, u'foo')

And if that’s not bad enough:

>>> d[key] = 5
>>> list(d.items())
[['a', 3], [(1, 2), 3], [(1, u'foo'), 4], [(1, u'foo'), 5]]

Ouch.
I’ve rewritten the entire storing mechanism to look rows up only by the key’s hash and compare the keys after unpickling. This may be a bit slower, but I don’t (and shouldn’t) expect many colliding hashes anyway.
Bug is fixed.
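
For the curious, the lookup can be sketched roughly as follows. This is a Python 3 illustration of the idea, not the FileDict source; the class, table and column names are mine. Note that Python 3 randomizes string hashes per process, so a store that must survive restarts would need a stable hash function; the sketch sidesteps that by using an in-memory database.

```python
import pickle
import sqlite3


class HashBucketStore:
    """Find rows by hash(key), then compare the unpickled keys for equality."""

    def __init__(self, filename=":memory:"):
        self._conn = sqlite3.connect(filename)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS dict (hash INTEGER, key BLOB, value BLOB)")
        self._conn.execute("CREATE INDEX IF NOT EXISTS dict_hash ON dict (hash)")

    def _find(self, key):
        # Different keys may share a hash, so every candidate row is checked.
        rows = self._conn.execute(
            "SELECT rowid, key, value FROM dict WHERE hash = ?",
            (hash(key),)).fetchall()
        for rowid, pickled_key, pickled_value in rows:
            if pickle.loads(pickled_key) == key:
                return rowid, pickle.loads(pickled_value)
        return None

    def __getitem__(self, key):
        found = self._find(key)
        if found is None:
            raise KeyError(key)
        return found[1]

    def __setitem__(self, key, value):
        found = self._find(key)
        if found is not None:
            # Replace the existing row instead of adding a duplicate key.
            self._conn.execute("DELETE FROM dict WHERE rowid = ?", (found[0],))
        self._conn.execute(
            "INSERT INTO dict (hash, key, value) VALUES (?, ?, ?)",
            (hash(key), pickle.dumps(key), pickle.dumps(value)))
        self._conn.commit()
```

Because the pickled key is never used for matching, two equal keys always land on the same row, however their pickles happen to differ.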

DictMixin

By popular demand, I’m now inheriting from DictMixin. It’s made my code a bit shorter, and was not at all painful.

Copy and Close

I no longer close the database on __del__, and instead I rely on the garbage collector. It seems to close the database on time, and it allows one to copy the dictionary (the copies will, of course, always have the same keys, but they don’t have to have the same behavior or attributes).

New Source Code

Is available here


FileDict – a Persistent Dictionary in Python

May 24, 2009

Python’s dictionary is possibly the most useful construct in the language.  And I argue that for some purposes, mapping it to a file (in real-time) can be even more useful.

Why?

The dictionary resides in memory, and so has three main “faults”:

  1. It only lasts as long as your program does.
  2. It occupies memory that might be useful for other, more commonly accessed, data.
  3. It is limited to how much memory your machine has.

The first can be solved by pickling and unpickling the dictionary, but the data will not survive an unexpected shutdown (even putting the pickling in a try-finally block won’t protect it against all errors).

FileDict

FileDict is a dictionary interface I wrote that saves and loads its data from a file, one key at a time. The current version uses SQLite 3 to provide consistency and, as a by-product, ACID guarantees.

The result is a dictionary which at all-times exists as a file, has virtually no size limit, and can be accessed by several processes concurrently.

It is meant as a quick-and-simple general-purpose solution. It is rarely the best solution, but it is usually good enough.

Performance obviously cannot compare to the builtin dictionary, but it is reasonable and of low complexity (refer to sqlite for more details on that).

Uses

FileDict can be used for many purposes, including:

  • Saving important data in a convenient manner
  • Managing large amounts of data in dictionary form, without the mess of implementing paging or other complex solutions
  • Communication between processes (sqlite supports multiple connections and implements ACID)

Examples

$ python
>>> import filedict
>>> d=filedict.FileDict(filename="example.dict")
>>> d['bla'] = 10
>>> d[(2,1)] = ['hello', (1,2) ]
-- exit --
$ python
>>> import filedict
>>> d=filedict.FileDict(filename="example.dict")
>>> print d['bla']
10
>>> print d.items()
[['bla', 10], [(2, 1), ['hello', (1, 2)]]]
>>> print dict(d)
{'bla': 10, (2, 1): ['hello', (1, 2)]}
>>> d=filedict.FileDict(filename="try.dict")
>>> with d.batch:  # using .batch suspends commits, making a batch of changes quicker
...     for i in range(100000):
...         d[i] = i**2
(takes about 8 seconds on my comp)
>>> print len(d)
100000
>>> del d[103]
>>> print len(d)
99999
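
The .batch trick is simply "commit once at the end instead of after every write". A rough Python 3 sketch of that idea follows; the class name, the table layout and the integer-only keys are simplifications of mine, not the FileDict code.

```python
import sqlite3
from contextlib import contextmanager


class BatchedStore:
    def __init__(self, filename=":memory:"):
        self._conn = sqlite3.connect(filename)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v INTEGER)")
        self._in_batch = False

    def __setitem__(self, k, v):
        self._conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (k, v))
        if not self._in_batch:
            self._conn.commit()   # normal mode: every write is committed

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    @property
    @contextmanager
    def batch(self):
        # Inside the with-block, writes accumulate in a single transaction.
        self._in_batch = True
        try:
            yield self
        finally:
            self._in_batch = False
            self._conn.commit()   # one commit for the whole batch


d = BatchedStore()
with d.batch:
    for i in range(1000):
        d[i] = i ** 2
print(len(d))
```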

Limitations

  • All data (keys and values) must be pickle-able
  • Keys must be hashable (perhaps this should be removed by hashing the pickled key)

Source Code

Is available here

Future

Additions in the future may include:

  • An LRU-cache for fetching entries
  • A storage strategy different than Sqlite

Other suggestions?


Pickling Python Expressions

November 12, 2008

My last post introduced the concept of X, a class which “absorbs” operations and behaves like a function.
As many people pointed out, this was merely a syntactic alternative to lambda. You may like it, you may not.
Now, after a rewrite, X can be pickled. But let me explain first.

Python lambdas cannot be pickled. In fact, python code cannot be pickled.
Pickling an object, aka serializing, is converting the object’s state (that is, its data) to a string, which can at a later time be unpickled to re-create the object with that state. The unpickling process instantiates that class, assuming it has not changed, and updates the new instance’s state to the pickled one. Python code is never stored.

Trying to pickle a class or a function might appear to work, but it does not really pickle it; it simply pickles the reference to it (and its state). Unpickling the string in a new terminal would prove that (as would a quick analysis of resulting string).

Attempts to pickle methods, nested functions or lambdas fail on the spot. That is because a reference to them cannot be kept (Actually, it can be. But they are rather volatile, so it might not be wise). In the end, python code, or even expressions, cannot be pickled.
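
Both claims are easy to check. The snippet below is written for Python 3 and uses json.dumps only as a convenient example of an ordinary module-level function; the exact exception type raised for the lambda can vary between Python versions, so it catches both likely ones.

```python
import json
import pickle

# A module-level function pickles as a reference: module name plus function name.
payload = pickle.dumps(json.dumps)
print(payload)                       # the names appear in it, the code does not
restored = pickle.loads(payload)     # the name is looked up again on unpickling
print(restored is json.dumps)

# A lambda has no importable name, so there is nothing to reference.
lambda_failed = False
try:
    pickle.dumps(lambda x: x + 1)
except (pickle.PicklingError, AttributeError) as exc:
    lambda_failed = True
    print("cannot pickle lambda:", type(exc).__name__)
```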

This brings me back to X. X allows you to do just that:

>>> expr = 1 + (X + 3) * 4
>>> s = pickle.dumps(expr)

(destroy objects, change objects, switch an interpreter, whatever you wish)

>>> expr2 = pickle.loads(s)
>>> expr2(5)
33

By using X, the programmer can blend dynamic code with his data, and still be able to pickle it.
I believe this removes a very big limitation.
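
The reason this can work at all is that an expression built this way consists only of plain data and references to named functions in the operator module, both of which pickle fine. Here is a stripped-down Python 3 sketch of that principle, using nested tuples instead of the X class; the ARG placeholder and the evaluate helper are names I made up for the illustration.

```python
import operator
import pickle

ARG = "<arg>"  # hypothetical placeholder marking where the argument goes


def evaluate(expr, arg):
    """Evaluate a nested (function, operand, ...) tuple for the given argument."""
    if isinstance(expr, tuple):
        func, operands = expr[0], expr[1:]
        return func(*[evaluate(operand, arg) for operand in operands])
    return arg if expr == ARG else expr


# 1 + (X + 3) * 4, spelled out as data plus references into `operator`
expr = (operator.add, 1, (operator.mul, (operator.add, ARG, 3), 4))

s = pickle.dumps(expr)
expr2 = pickle.loads(s)
print(evaluate(expr2, 5))  # 33
```

X does the same bookkeeping behind operator overloading, which is what makes the syntax pleasant.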

Just to be fair, I will note that there is another way to achieve this: Keep your expressions in a string, and eval it when you need it run. I highly recommend not doing it.

X’s new source code (a bit cryptic, but it’s the best I could do. Suggestions for simplification are welcomed) :

"""
x.py

Author: Erez Sh.
Date  : 11/11/2008
"""

import operator

def identity(x):
	return x

class _Return(object):
	"Pickle-able!"
	def __init__(self, value):
		self._value = value

	def __call__(self, *args):
		return self._value

class _Partial(object):
	"Pickle-able!"
	def __init__(self, callable, *args):
		self._callable = callable
		self._args = args

	def __call__(self, *args, **kwargs):
		args = self._args + args
		return self._callable(*args, **kwargs)

class _X(object):
	def __init__(self, func, *args_to_run):
		self.__func = func
		self.__args_to_run = tuple(args_to_run)

	def __getstate__(self):
		return self.__func, self.__args_to_run
	def __setstate__(self, state):
		self.__func, self.__args_to_run = state
	def __reduce__(self):
		#raise Exception("Deprecated!")
		return object.__reduce__(self)

	def __apply_un_func(self, func ):
		return _X(func, _Partial(self))
	def __apply_bin_func(self, func, arg ):
		return _X(func, _Partial(self), _Return(arg))
	def __apply_rbin_func(self, func, arg ):
		return _X(func, _Return(arg), _Partial(self))
	def __apply_multargs_func(self, func, *args ):
		return _X(func, _Partial(self), *map(_Return,args))

	def __call__(self, arg):
		return self.__func(*[x(arg) for x in self.__args_to_run])

	def __getattr__(self, attr):
		return self.__apply_bin_func( getattr, attr )

	def call(self, *args, **kwargs):
		return self.__apply_multargs_func( apply, args, kwargs)

	# Containers
	def __getitem__(self, other):
		return self.__apply_bin_func( operator.getitem, other )
	def __getslice__(self, a, b):
		return self.__apply_multargs_func( operator.getslice, a, b )
	def in_(self, other):
		return self.__apply_bin_func( operator.contains, other )

	# Arith
	def __add__(self, other):
		return self.__apply_bin_func( operator.add, other )
	def __sub__(self, other):
		return self.__apply_bin_func( operator.sub, other )
	def __mul__(self, other):
		return self.__apply_bin_func( operator.mul, other )
	def __div__(self, other):
		return self.__apply_bin_func( operator.div, other )
	def __floordiv__(self, other):
		return self.__apply_bin_func( operator.floordiv, other )
	def __truediv__(self, other):
		return self.__apply_bin_func( operator.truediv, other )
	def __mod__(self, other):
		return self.__apply_bin_func( operator.mod, other )
	def __pow__(self, other):
		return self.__apply_bin_func( operator.pow, other )

	def __radd__(self, other):
		return self.__apply_rbin_func( operator.add, other )
	def __rsub__(self, other):
		return self.__apply_rbin_func( operator.sub, other )
	def __rmul__(self, other):
		return self.__apply_rbin_func( operator.mul, other )
	def __rdiv__(self, other):
		return self.__apply_rbin_func( operator.div, other )
	def __rfloordiv__(self, other):
		return self.__apply_rbin_func( operator.floordiv, other )
	def __rtruediv__(self, other):
		return self.__apply_rbin_func( operator.truediv, other )
	def __rmod__(self, other):
		return self.__apply_rbin_func( operator.mod, other )
	def __rpow__(self, other):
		return self.__apply_rbin_func( operator.pow, other )

	# bitwise
	def __and__(self, other):
		return self.__apply_bin_func( operator.and_, other )
	def __or__(self, other):
		return self.__apply_bin_func( operator.or_, other )
	def __xor__(self, other):
		return self.__apply_bin_func( operator.xor, other )

	def __rand__(self, other):
		return self.__apply_rbin_func( operator.and_, other )
	def __ror__(self, other):
		return self.__apply_rbin_func( operator.or_, other )
	def __rxor__(self, other):
		return self.__apply_rbin_func( operator.xor, other )

	def __rshift__(self, other):
		return self.__apply_bin_func( operator.rshift, other )
	def __lshift__(self, other):
		return self.__apply_bin_func( operator.lshift, other )

	# Comparison
	def __lt__(self, other):
		return self.__apply_bin_func( operator.lt, other )
	def __le__(self, other):
		return self.__apply_bin_func( operator.le, other )
	def __eq__(self, other):
		return self.__apply_bin_func( operator.eq, other )
	def __ne__(self, other):
		return self.__apply_bin_func( operator.ne, other )
	def __ge__(self, other):
		return self.__apply_bin_func( operator.ge, other )
	def __gt__(self, other):
		return self.__apply_bin_func( operator.gt, other )

	def __abs__(self):
		return self.__apply_un_func( abs )
	def __neg__(self):
		return self.__apply_un_func( operator.neg )

X = _X(identity, identity)

Fun While Avoiding Lambda (Python)

November 1, 2008

Readers, meet X. X is a class I wrote in Python as an alternative to using lambda. It has two main features:

  1. It acts as an identity function ( so X(3) == 3, etc. )
  2. When performing operations on it, it returns a new object that acts as a corresponding function.

Let me explain. Doing X+2 will return a new object that, whenever called with an argument, will return that argument plus 2. So:

>>> map( X+2, [1, 2, 3] )
[3, 4, 5]

>>> filter( X>0, [5, -3, 2, -1, 0, 13] )
[5, 2, 13]

>>> l = ["oh", "brave", "new", "world"]
>>> sorted(l,key=X[-1])
['world', 'brave', 'oh', 'new']

These operations can be chained:

>>> map(2**(X+1), range(10))
[2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

>>> map( "P" + X[3:]*2 + "!", ["Hello", "Suzzy"] )
['Plolo!', 'Pzyzy!']

Caveats

X has a few limitations.

  • Using X twice in the same expression probably won’t work (this can be solved)
  • Since calling X evaluates it, it can’t emulate method calls. For that you have to use call, like: X.upper.call() (‘hello’) –> ‘HELLO’.
  • Not all operations can be “captured”. For example, the “in” operator. For that you have to use X.in_( … ), like: X.in_(range(10)) (5) –> True
  • Not all attributes will be accessible
  • More problems? Likely.
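
On the first caveat: one possible fix is to check whether the other operand is itself an X expression and, if so, evaluate both sides with the same argument. The following is a trimmed Python 3 sketch of that approach covering only + and *; the _binary and _rbinary helper names are mine.

```python
import operator


class _X:
    def __init__(self, func):
        self._func = func

    def __call__(self, arg):
        return self._func(arg)

    def _binary(self, op, other):
        if isinstance(other, _X):
            # Both operands depend on the argument: evaluate each, then combine.
            return _X(lambda x: op(self(x), other(x)))
        return _X(lambda x: op(self(x), other))

    def _rbinary(self, op, other):
        # Reflected case: `other` is a plain value on the left-hand side.
        return _X(lambda x: op(other, self(x)))

    def __add__(self, other):
        return self._binary(operator.add, other)

    def __mul__(self, other):
        return self._binary(operator.mul, other)

    def __radd__(self, other):
        return self._rbinary(operator.add, other)

    def __rmul__(self, other):
        return self._rbinary(operator.mul, other)


X = _X(lambda x: x)

print((X * X + X)(3))                     # 3*3 + 3 = 12
print(list(map(2 * X + X, [1, 2, 3])))    # [3, 6, 9]
```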

Conclusion

While not innovative nor a complete solution, I believe X can be a useful replacement for some uses of anonymous functions, providing a shorter and simpler syntax which is easier to read and understand.

It is provided here in full, in hope that it will be useful to my readers (Improvements and fixes are welcome):

class _X(object):
	def __init__(self, func):
		self.__func = func

	def __call__(self, arg):
		return self.__func(arg)

	def __getattr__(self, attr):
		return _X(lambda x: getattr(self(x), attr))
	def call(self, *args, **kwargs):
		return _X(lambda x: self(x)(*args,**kwargs))

	# Containers
	def __getitem__(self, other):
		return _X(lambda x: self(x)[other])
	def __getslice__(self, a,b=None,c=None):
		return _X(lambda x: self(x)[a:b:c])
	def in_(self, other):
		return _X(lambda x: self(x) in other)

	# Arith
	def __add__(self, other):
		return _X(lambda x: self(x) + other)
	def __sub__(self, other):
		return _X(lambda x: self(x) - other)
	def __mul__(self, other):
		return _X(lambda x: self(x) * other)
	def __div__(self, other):
		return _X(lambda x: self(x) / other)
	def __floordiv__(self, other):
		return _X(lambda x: self(x) // other)
	def __mod__(self, other):
		return _X(lambda x: self(x) % other)
	def __pow__(self, other):
		return _X(lambda x: self(x) ** other)

	def __radd__(self, other):
		return _X(lambda x: other + self(x))
	def __rsub__(self, other):
		return _X(lambda x: other - self(x))
	def __rmul__(self, other):
		return _X(lambda x: other * self(x))
	def __rdiv__(self, other):
		return _X(lambda x: other / self(x))
	def __rfloordiv__(self, other):
		return _X(lambda x: other // self(x))
	def __rmod__(self, other):
		return _X(lambda x: other % self(x))
	def __rpow__(self, other):
		return _X(lambda x: other ** self(x))

	# bitwise
	def __and__(self, other):
		return _X(lambda x: self(x) & other)
	def __or__(self, other):
		return _X(lambda x: self(x) | other)
	def __xor__(self, other):
		return _X(lambda x: self(x) ^ other)

	def __rand__(self, other):
		return _X(lambda x: other & self(x))
	def __ror__(self, other):
		return _X(lambda x: other | self(x))
	def __rxor__(self, other):
		return _X(lambda x: other ^ self(x))

	def __rshift__(self, other):
		return _X(lambda x: self(x) >> other)
	def __lshift__(self, other):
		return _X(lambda x: self(x) << other)

	# Comparison
	def __lt__(self, other):
		return _X(lambda x: self(x) < other)
	def __le__(self, other):
		return _X(lambda x: self(x) <= other)
	def __eq__(self, other):
		return _X(lambda x: self(x) == other)
	def __ne__(self, other):
		return _X(lambda x: self(x) != other)
	def __ge__(self, other):
		return _X(lambda x: self(x) >= other)
	def __gt__(self, other):
		return _X(lambda x: self(x) > other)

	def __abs__(self):
		return _X(lambda x: abs(self(x)))

X = _X(lambda x:x)

Put it in x.py and import as:
from x import X