Vlad Ioan Topan

My playground


Funky Python – code snippets


Python is a great programming language for a number of reasons, but one of its best features is that, as per item #13 of the Zen of Python, “There should be one-- and preferably only one --obvious way to do it.” Working with the beast for a number of years, however, does expose one to some less pythonic and somewhat quirky design points (maybe even a few gotchas); here are some of them.

Python quirks & gotchas

1. Boolean values in numeric contexts evaluate to 0/1

It’s very intuitive (especially when coming to Python from C) for 0 values to evaluate to False in boolean contexts and non-zero values to True. Having False evaluate to 0 and True to 1 in numeric contexts, however, is less intuitive (and somewhat useless); it happens because bool is a subclass of int:

>>> a = [1, 2, 3]
>>> a[True], a[False]
(2, 1)
>>> True + True # this one was rather unexpected...
2
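Since bool subclasses int, booleans can be used anywhere an int is expected. A minimal sketch of one arguably useful consequence (counting with sum()) and one genuine gotcha (True and 1 collide as dictionary keys); the variable names are made up for illustration:

```python
# bool is a subclass of int: True behaves as 1 and False as 0
print(issubclass(bool, int))   # True

# useful: summing booleans counts the true ones
flags = [True, False, True, True]
n_true = sum(flags)

# also useful: counting the items that satisfy a condition
scores = [3, 8, 5, 10]
n_passed = sum(s >= 5 for s in scores)

# gotcha: True == 1 and hash(True) == hash(1), so they are the same dict key;
# the first key (1) is kept and only its value is overwritten
mixed = {1: 'int', True: 'bool'}
print(mixed)                   # {1: 'bool'}
```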

2. The default argument values are evaluated at the point of function definition in the defining scope

This is probably one of the most frequent gotchas out there:

>>> def a(b=[]):
...     b.append(3)
...     print(b)
>>> a()
[3]
>>> a()
[3, 3]

The proper way to do this is to use None as b’s default value in the declaration and create a new list inside the function body whenever b is None:

>>> def a(b=None):
...     if b is None:
...         b = []
...     b.append(3)
...     print(b)
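A minimal sketch that makes the “evaluated once” part visible: the default list lives on the function object (inspectable via __defaults__), and a default expression that reads a variable captures that variable’s value at definition time. The function names are hypothetical:

```python
def append_to(item, bucket=[]):
    # bucket defaults to the single list that was created when def ran
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)
print(first is second)          # True: both calls share one list
print(append_to.__defaults__)   # ([1, 2],)

n = 1

def show(x=n):
    # the expression n was evaluated once, when def executed
    return x

n = 2
print(show())                   # 1, not 2
```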

3. *Everything* is an object

Although it’s a fact which may escape even the best of programmers at first, Python is object oriented to the core. Even though it lets you write procedural (and even functional-style) code, everything is an object: functions are objects, types are objects, even integer literals are objects. This:

69.bit_length()

doesn’t work (it is a SyntaxError) because the period is parsed as part of the numeric literal (think of 69.j or 69.e1); this, however:

(69).bit_length()

works. Being objects, functions can also have attributes:

>>> def a():
...     print(a.x)
>>> a.x = 3
>>> a()
3
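A short sketch of “everything is an object” in practice: an integer literal with a method call (the parenthesized form from above, plus the less common space-separated form), and a function attribute used as a per-function call counter. The greet function and its calls attribute are illustrative names, not anything standard:

```python
# an int literal is an object with methods; 69 == 0b1000101
print((69).bit_length())   # 7
print(69 .bit_length())    # 7: the space ends the numeric literal before the period

def greet(name):
    # state kept on the function object itself
    greet.calls += 1
    return 'hello ' + name

greet.calls = 0
greet('a')
greet('b')
print(greet.calls)                 # 2
print(isinstance(greet, object))   # True: functions are objects too
```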

This comes in handy e.g. for giving a decorated function the same internal name (for introspection purposes) as the original function:

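import inspect

def print_call_decorator(fun):
    def replacement(*args, **kwargs):
        res = fun(*args, **kwargs)
        # log the module-qualified name of the wrapped function and its result
        print('Call %s.%s => %s' % (inspect.getmodule(fun).__name__, fun.__name__, res))
        return res
    # copy the original name so introspection sees the decorated function's name
    replacement.__name__ = fun.__name__
    return replacement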
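The standard library already packages this name-copying trick: functools.wraps copies __name__, __doc__, __module__ and a few other attributes from the wrapped function, and records it as __wrapped__. A sketch of the same decorator using it (it also reads fun.__module__ instead of calling inspect.getmodule(); add is just an example function):

```python
import functools

def print_call_decorator(fun):
    @functools.wraps(fun)  # copy name, docstring, module etc. from fun
    def replacement(*args, **kwargs):
        res = fun(*args, **kwargs)
        print('Call %s.%s => %r' % (fun.__module__, fun.__name__, res))
        return res
    return replacement

@print_call_decorator
def add(x, y):
    """Return the sum of x and y."""
    return x + y

print(add(2, 3))       # logs the call, then prints 5
print(add.__name__)    # add
```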

4. Generators, sets & dictionaries also have comprehensions

As you probably know, list comprehensions are a great way to generate a new list from another iterable. But it goes further: generators, sets and dicts have something similar (generator expressions, and set and dict comprehensions, the latter two available since Python 2.7). The basic syntax is this:

gen = (value for ... in ...)
d = {key: value for ... in ...}
s = {value for ... in ...}

E.g.:

>>> a = ['a', 'b', 'c']
>>> d = {x:a.index(x) for x in a}
>>> d
{'a': 0, 'c': 2, 'b': 1}
>>> d_rev = {d[x]:x for x in d}
>>> d_rev
{0: 'a', 1: 'b', 2: 'c'}

This makes reversing (inverting) a dictionary, for example, much cleaner.
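A few more comprehension sketches with throwaway data: a dict built with enumerate() (which avoids the repeated list.index() scan of the example above), its inversion, a set comprehension and a generator expression. Note the caveat at the end: inverting only round-trips when the values are unique.

```python
letters = ['a', 'b', 'c']

# dict comprehension: map each item to its position
positions = {x: i for i, x in enumerate(letters)}
# inverting it back: safe here because all values are distinct
inverted = {i: x for x, i in positions.items()}

# set comprehension: duplicates collapse automatically
lengths = {len(w) for w in ['ab', 'cd', 'efg']}

# generator expression: values are produced lazily, no intermediate list
total = sum(n * n for n in range(4))

# caveat: with repeated values, only the last key seen per value survives
colors = {'apple': 'red', 'cherry': 'red', 'lime': 'green'}
by_color = {v: k for k, v in colors.items()}
```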
What makes comprehensions even more fun is that they can contain any number of nested for clauses; using them to flatten a list of lists looks like this:

flat = [x for y in lst for x in y]

The x/y order in the example is not a mistake: the for clauses appear in the same order as they would in the equivalent nested for loops (outer loop first).
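A small sketch (with made-up data) showing that the comprehension matches the nested loops written out in the same order, that a filter can follow the last for clause, and the standard library alternative, itertools.chain.from_iterable:

```python
from itertools import chain

lst = [[1, 2], [3], [], [4, 5]]

flat = [x for y in lst for x in y]

# the same thing spelled out: the for clauses nest left to right
flat_loops = []
for y in lst:
    for x in y:
        flat_loops.append(x)

# a condition can be added after the for clauses
evens = [x for y in lst for x in y if x % 2 == 0]

# standard library equivalent for one level of nesting
flat_chain = list(chain.from_iterable(lst))
```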

5. GIL: Python threads aren’t

Not on multi-processor machines, at least. Yes, there is a threading module (aptly named), but due to CPython’s Global Interpreter Lock, threads of the same (Python) process can’t actually execute Python code at the same time. The lock is released while a thread blocks on I/O, so I/O-bound threads still overlap, but CPU-bound work gets no benefit from the number of cores installed on the machine. This becomes more of an issue when deploying native-Python servers that do significant processing per request: a single Python process can’t spread that work across cores the way a native server written in C can (the usual workaround is running several processes, e.g. via the multiprocessing module).
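A rough sketch of the difference, assuming a standard CPython build (free-threaded builds behave differently): time.sleep() stands in for a blocking socket call and releases the GIL, so the sleeping threads overlap, while the pure-Python loop holds the GIL and the threads effectively take turns. Timings vary by machine, so the comments describe expectations rather than exact output; run_in_threads, blocking_task and cpu_task are illustrative names.

```python
import threading
import time

def run_in_threads(n_threads, task):
    """Start n_threads threads running task(), wait for all, return elapsed seconds."""
    threads = [threading.Thread(target=task) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def blocking_task():
    # time.sleep releases the GIL, like a blocking socket read would
    time.sleep(0.2)

def cpu_task():
    # pure-Python arithmetic holds the GIL while it runs
    total = 0
    for i in range(200_000):
        total += i * i
    return total

if __name__ == "__main__":
    # expected: close to 0.2s rather than 0.8s, since the sleeps overlap
    print("4 blocking tasks: %.2fs" % run_in_threads(4, blocking_task))
    # expected: no faster than running the four tasks one after another
    print("4 CPU-bound tasks: %.2fs" % run_in_threads(4, cpu_task))
```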

6. for has an else clause

…and so do the try…except and while constructs. In all cases, the else branch is executed if all went well (loops: break wasn’t called; try: no exception occurred). With loops, else is useful when you want something to happen only if the loop wasn’t “broken” (the classic example is handling the case where the loop hasn’t found the value it was looking for). With try, else may look unnecessary, since the latter of the two forms below seems (at least to me) more readable; the two are not equivalent, however: in the second form, an exception raised by call_me_if_it_didnt_crash() is also caught and handled by handle_it(), which is rarely what you want, while else keeps the try block limited to the code that is actually expected to fail:

  • with else:
    try:
        this_may_crash()
    except:
        handle_it()
    else:
        call_me_if_it_didnt_crash()
  • without else:
    try:
        this_may_crash()
        call_me_if_it_didnt_crash()
    except:
        handle_it()
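A minimal sketch of both clauses; find_index and parse_then_process are hypothetical helpers. The for…else part shows the classic “not found” case, and the try…else part shows why else is not just cosmetic: a ValueError raised inside process() propagates instead of being reported as bad input.

```python
def find_index(items, target):
    for i, item in enumerate(items):
        if item == target:
            break
    else:
        # runs only if the loop finished without hitting break (not found)
        return -1
    return i

def parse_then_process(text, process):
    try:
        value = int(text)          # only this line is guarded
    except ValueError:
        return 'bad input'
    else:
        # exceptions raised here are NOT handled by the except clause above
        return process(value)

print(find_index(['a', 'b'], 'b'))    # 1
print(find_index(['a', 'b'], 'z'))    # -1
print(parse_then_process('x', abs))   # bad input
print(parse_then_process('-4', abs))  # 4
```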

7. Tuple assignment order

In tuple assignments, the whole right-hand side is evaluated first; the targets on the left are then evaluated and assigned one at a time, from left to right:

>>> a = [1, 2, 3]
>>> i, a[i] = 1, 5 # i is set to 1 *before* a[i] is evaluated
>>> a
[1, 5, 3]

This happens because, once the right-hand side has been evaluated, a tuple assignment behaves like assigning the unpacked values to the targets in order; the second line above is therefore equivalent to:

>>> i = 1
>>> a[i] = 5
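A sketch of both halves of the rule with made-up lists: evaluating the right-hand side first is what makes the classic swap work, while assigning targets left to right bites when a later target’s index depends on an earlier target (a common bug in in-place “swap into position” loops):

```python
# the right-hand tuple (2, 1) is built before any assignment happens
x, y = 1, 2
x, y = y, x

# intended: swap broken[0] with broken[broken[0] - 1], i.e. positions 0 and 2
broken = [3, 1, 2]
i = 0
broken[i], broken[broken[i] - 1] = broken[broken[i] - 1], broken[i]
# broken[0] is set to 2 first, so the second target becomes broken[1]

# safer: compute the second index once, before the assignment
fixed = [3, 1, 2]
j = fixed[i] - 1
fixed[i], fixed[j] = fixed[j], fixed[i]

print(broken)   # [2, 3, 2]
print(fixed)    # [2, 1, 3]
```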

8. Scope juggling

Inside a function, a name is local if it is directly assigned anywhere in the function body; otherwise it is looked up in the enclosing, global and built-in scopes. This is decided when the function is compiled, not when the assignment runs (making Python look clairvoyant, as it can tell that a variable will be set later on inside the function), so reading such a name before its assignment raises UnboundLocalError. Note that attributes and sequence/dict items of a global variable can still be set without a global statement, just not the “whole” variable…

a1 = [1, 2, 3]
a2 = [1, 2, 3]
b = 3
c = 4

def fun():
    global b
    print(c)   # UnboundLocalError: c is local (assigned below) and not set yet
    print(b)   # works: the global statement makes b refer to the module-level b
    a1[0] = 4  # works: a1 is never assigned directly, so it resolves to the global list
    a2[0] = 5  # UnboundLocalError (if reached): a2 is assigned below, so it is local
    c = 10
    b = 11
    a2 = 'something else'
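A sketch of the same rule with a hypothetical module-level counter: the function that assigns to counter without a global statement fails, and its code object shows the name was classified as local at compile time. The last part uses Python 3’s nonlocal, which covers the analogous case for variables of an enclosing function:

```python
counter = 0

def broken_increment():
    counter += 1          # the assignment makes counter local to this function

def fixed_increment():
    global counter        # rebind the module-level name instead
    counter += 1

# the local/global decision is visible on the compiled code object
print(broken_increment.__code__.co_varnames)   # ('counter',)

try:
    broken_increment()
except UnboundLocalError as e:
    print('UnboundLocalError:', e)

fixed_increment()
print(counter)   # 1

def make_counter():
    count = 0
    def step():
        nonlocal count    # Python 3 only: rebind the enclosing function's variable
        count += 1
        return count
    return step
```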

Bonus facts

As a bonus for making it through to the end, here are some lesser-known / less frequently pondered-upon facts about Python (several of them specific to Python 2):

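  1. a reference to the current list comprehension (from the inside) can be obtained with the following (an undocumented implementation detail of older CPython 2 versions; it does not work in Python 3):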
    locals()['_[1]'].__self__
  2. list comprehensions can contain any number of for…in levels (this is actually documented). Flattening lists:
    flat = [x for y in lst for x in y]
  3. in Python 2, range() actually builds a list, which can be slow and memory-consuming for large values; use xrange() instead (Python 3’s range() is lazy, and xrange() no longer exists)
  4. modules have a dict attribute .__dict__ with all global symbols
  5. the sys.path list can be tampered with before some imports to selectively import modules from dynamically-generated paths
  6. f.flush() only hands Python’s buffered data to the operating system; to make sure it actually reaches the disk, follow it with os.fsync(…):
    f.flush()
    os.fsync(f.fileno())
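  7. bound methods have the read-only attributes .im_self and .im_func, set to the instance the method is bound to and the underlying function respectively (the class is available as .im_class; Python 3 renames the first two to .__self__ and .__func__, names that Python 2.6+ also accepts)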
  8. some_set.discard(x) removes x from the set only if present (without raising an exception otherwise)
  9. when computing actual indexes for sequences, the sequence length is added to negative indexes; if the result is still negative, IndexError is raised (so [1, 2, 3][-2] is 2 and [1, 2, 3][-4] raises IndexError)
  10. strings have the .center(width[, fillchar]) method, which pads them, centered, with fillchar (a space by default) to the length given by width
  11. comparisons can be chained: 1 < x < 2 works, meaning 1 < x and x < 2 (with x evaluated only once)
  12. the minimum number of bits required to represent an integer (or long) can be obtained with the integer’s .bit_length() method
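A few of the bonus facts above, checked under Python 3 (hence __self__/__func__ rather than im_self/im_func); the Greeter class and the temporary file are arbitrary:

```python
import math
import os
import tempfile

# 4. a module's __dict__ holds its global symbols
print(math.__dict__['pi'] == math.pi)    # True

# 6. flush() hands data to the OS; fsync() asks the OS to write it to disk
with tempfile.NamedTemporaryFile('w', delete=False) as f:
    f.write('data')
    f.flush()
    os.fsync(f.fileno())
    path = f.name
os.remove(path)

# 7. bound methods know their instance and function (im_self/im_func in Python 2)
class Greeter:
    def hello(self):
        return 'hi'

g = Greeter()
bound = g.hello
print(bound.__self__ is g, bound.__func__ is Greeter.hello)   # True True

# 8. discard() ignores missing elements, unlike remove()
s = {1, 2, 3}
s.discard(5)

# 9., 10. and 11.
print([1, 2, 3][-2])          # 2
print('py'.center(6, '*'))    # **py**
x = 1.5
print(1 < x < 2)              # True
```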

Written by vtopan

March 17, 2011 at 12:37 AM

Posted in Python, Snippets

Python snippets (1)


Apparently, Python 2.6 supports a maximum of 100 groups in a regular expression (the error message speaks of “100 named groups”, but unnamed capturing groups count too). It also doesn’t support scoped flag constructs such as (?i:…), which makes it impossible to selectively mark groups as case insensitive in a single regex with more than 100 groups. The performance penalty for making the complete regex case insensitive is enormous (roughly double the running time), so the only solution to my problem (which involves searching for a large number of patterns, some of them case insensitive) appears to be linking to an external regex library such as PCRE (which would need Python bindings, since PCRE doesn’t ship any, not any more at least), or implementing my own pattern-searching library. This should be fun.
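For the record, both limitations were lifted in later Python 3 releases: the 100-group limit was removed in 3.5, and scoped inline flags such as (?i:…) arrived in 3.6. A minimal sketch of a combined matcher with per-pattern case sensitivity, assuming Python 3.6+; build_regex and its (text, case_insensitive) input format are made up for illustration:

```python
import re

def build_regex(patterns):
    """patterns: list of (literal_text, case_insensitive) pairs (hypothetical format)."""
    parts = []
    for i, (text, nocase) in enumerate(patterns):
        body = re.escape(text)
        if nocase:
            body = '(?i:%s)' % body   # scoped flag applies to this group only (3.6+)
        parts.append('(?P<p%d>%s)' % (i, body))
    return re.compile('|'.join(parts))

rx = build_regex([('error', True), ('GvR', False), ('Timeout', True)])
print(rx.search('an ERROR occurred').lastgroup)   # p0
print(rx.search('gvr'))                           # None: p1 is case sensitive

# more than 100 groups compile fine on Python 3.5+
big = build_regex([('word%d' % i, i % 2 == 0) for i in range(150)])
print(big.groups)                                 # 150
```

Matching on m.lastgroup tells you which pattern fired without a separate regex per pattern.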

Written by vtopan

February 18, 2009 at 8:29 PM