GFX::Monk Home

Posts tagged programming:

 

Systemd Socket Activation in Python

For those unaware: systemd is a replacement for the traditional unix init process. It is the first process to be brought up by the kernel, and is responsible for all the user-space tasks in booting a system. Conveniently, it also has a --user mode switch that allows you to use it yourself as a session-level init service. As someone who hates being root and loves repeatable configuration, I’m tremendously pleased to be able to offload as much of my session management as I can to something that’s built for the task (and doesn’t rely on hard-coded, root-owned paths).

I’ve heard a lot of complaining from sysadmins who seem to prefer a tangled mess of shell scripts in /etc/init.d/, but I’m a big fan of systemd for its user-friendly features - like restarting stuff when it dies, listing running services, showing me logging output, and being able to reliably kill daemons. There are other ways to accomplish all of these, but systemd is easy to use, and works very well in my experience.

Socket activation

Recently I was wondering how to implement systemd socket activation in python. The idea is much like inetd, in which you tell systemd which port to listen on, and it opens up the port for you. When a request comes in, it immediately spawns your server and hands over the open socket to seamlessly allow your program to service the request. That way, services can be started only as necessary, making for a quicker boot and less resource usage for infrequently accessed services.

The question is, of course, how am I supposed to grab a socket from systemd? It turns out it’s not hard, but there are some tricks. For my app, I am using the BaseHTTPServer module. Specifically, the threaded variant that you can construct using multiple inheritance:

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
	pass

I couldn’t find much existing information on how to do this in python, so I thought I’d write up what I found. One useful resource was this tutorial for ruby, since python and ruby are pretty similar.

Nuts and bolts

Obviously, systemd needs to tell you that it has a socket open for you. The way it does this is by placing your process ID into $LISTEN_PID. So to tell if you should try and use an existing socket, you can check:

if os.environ.get('LISTEN_PID', None) == str(os.getpid()):
	# inherit the socket
else:
	# start the server normally

Given that you normally pass a host and port into the HTTPServer class’ constructor, how can you make it bind to a socket systemd gives you? It turns out to be fairly simple:

class SocketInheritingHTTPServer(ThreadedHTTPServer):
	"""A HttpServer subclass that takes over an inherited socket from systemd"""
	def __init__(self, address_info, handler, fd, bind_and_activate=True):
		ThreadedHTTPServer.__init__(self, address_info, handler, bind_and_activate=False)
		self.socket = socket.fromfd(fd, self.address_family, self.socket_type)
		if bind_and_activate:
			# NOTE: systemd provides ready-bound sockets, so we only need to activate:
			self.server_activate()

You construct this just like you would the normal ThreadedHTTPServer, but with an extra fd keyword argument. It passes bind_and_activate=False to prevent the parent class from binding the socket, overrides the instance’s self.socket, and then activates the server.

The final piece of the puzzle is the somewhat-arbitrary knowledge that systemd passes you in sockets beginning at file descriptor #3. So you can just pass in fd=3 to the SocketInheritingHTTPServer. If you have a server that has multiple ports configured in your .socket file, you can check $LISTEN_FDS

And that’s it! I’ve only just learnt this myself, so I may be missing some detail. But it seems to work just fine. If you want to see the full details, you can have a look at the commit I just made to edit-server, which includes a simple script to simulate what systemd does when it hands you an open socket, for quick testing. You can also have a look at the service files to see how the .socket and .service systemd units are set up.

IPython's new notebook feature

I’ve just finished watching the Pycon-2012 talk “IPython: Python at your fingertips”, and am really impressed by the new notebook feature, new in version 0.12.

I’ve used IPython over python’s standard console since I first learned about it, and think it’s the best REPL around for any language. So I didn’t think I’d learn too much from this talk, but it turns out IPython is even cooler than I thought!

I’m not sure how much I’d use it during development (although it has some neat advantages over the IPython console), but I can see this making for some killer interactive documentation - publish the pre-processed results on the web, with a “play with this example” button allowing the user to grab the script, interactively modify any part of the code they want and see the new results.

It could also be really useful for manual / visual inspection of test scenarios that are too hard, expensive or brittle to completely automate.

Update: OK, I’ve finally got it running1. I take it back, this is awesome for interactive development as well.

  1. I had to make and compile 0install feeds for tornado and ipython since ubuntu’s version was too old, if you want to run it yourself without having to do this you should be able to run 0launch --command=notebook2 http://gfxmonk.net/dist/0install/ipython.xml  

Why Piep

piep (pronounced “pipe”) is a new command line tool for processing text streams with a slightly modified python syntax, inspired by the main characters of a typical unix shell (grep, sed, cut, tr, etc). To pluck a random example. here’s how you might rename all files (but not directories) in the current folder to have a “.bak” extension (because you have a very strange and manual backup scheme, apparently):

$ ls -1 | piep 'not os.path.isdir(p) | sh("mv", p, p + ".bak")'

In this simple example we can see filtering, piping (note that the pipes between expressions are internal to piep’s single argument, and thus not interpreted by the shell), and shelling out to perform useful work.

Here’s another, to print out the size of files in the current directory that are greater than 1024 bytes:

$ ls -l | piep 'pp[1:] | p.splitre(" +", 7) | size=int(p[4]) | size > 1024 | p[7], "is", p[4], "bytes"'

Or, if hacking through the output of ls -l isn’t your thing (it’s most likely a terrible idea), you can do things the pythonic way:

$ ls -1 | piep --import 'stat' 'size=os.stat(p).st_size | size > 1024 | p, "is", size, "bytes"'

For a proper introduction, you should read the online documentation. But I wanted to address one specific point here, about the origins of piep.


Recently I came across pyp, The Pied Piper. It seemed like a great idea, but after I played with it for a little while I uncovered some unfortunate shortcomings, some of which are deal breakers. My list included:

  • stream-based operation: there’s a beta implementation with “turbo” (line-wise) mode, but it seems very limited. I believe it should be the norm, and wanted to see if I could do things in a way that was just as convenient, but with all the benefits of lazy stream-based processing.
  • Command execution: commands are made up by string concatenation, requiring manual effort to escape metacharacters including the humble space 1. Also, errors are silently ignored.
  • Purity of data: things are routinely strip()ed and empty strings are frequently dropped from computations. Breaking up a line into a list of data would (sometimes?) see each list merged back into the input stream, rather than maintained as a list.
  • stream confusion: second stream, file inputs, etc. Not really sure why there are so many special cases
  • a not-very-extensible extension mechanism, which is fairly manual and appears to preclude sharing or combining extensions
  • lots of unnecessary machinery that complicates the code: macros, history, –rerun, three file input types, etc. Some of this may be useful once you use the tool a lot, but it hindered my ability to add the features I wanted to pyp.

I initially tried my hand at modifying pyp to fix some of the things I didn’t like about it, but the last point there really got in my way. History is baked in, and doesn’t really work in the same manner for stream-based operations. The entire pp class had to be rewritten, which is actually what I started doing when I decided to turn it into a separate tool (since it then became difficult to integrate this new pp class with the rest of the system. Anyway, I hope this isn’t taken as an offence by the developers of pyp - I really like the ideas, so much so that I was compelled to write my ideal version of them.

  1. I humbly submit that concatenating strings is the worst possible way to generate shell commands, leading to countless dumb bugs that only rear their heads in certain situations (and often in cascading failures). Observe piep’s method on a filename containing spaces:

    $ ls -1 | piep 'sh("wc", "-c", p)'
    82685610 Getting the Most Out of Python Imports.mp4
    

    Compared to that of pyp (and countless other tools):

    $ ls -1 | pyp 'shell("wc -c " + p)'
    wc: Getting: No such file or directory
    wc: the: No such file or directory
    wc: Most: No such file or directory
    wc: Out: No such file or directory
    wc: of: No such file or directory
    wc: Python: No such file or directory
    wc: Imports.mp4: No such file or directory
    [[0]0 total]
    $ echo $?
    0
    

    It is unacceptable for a language with simple and convenient sequence types to instead rely on complex string escaping rules to prevent data from being misinterpreted. To be honest, this on its own may be reason enough to use piep over alternatives.

Looking for a good javascript mocking library

Lately I’ve been looking for a good mocking library for node.js. It’s not easy.

Here are some (I would have said obvious) features that seem to be missing in most of the libraries I’ve seen:

  1. create an anonymous mock object on which to add expected methods (no need to provide a template object)
  2. create a (named) mock function (i.e a directly callable mock)
  3. argument matchers (at least eq(), given how terrible javascript equality is)
  4. stub a single method of a real object for the duration of the test
  5. verify all expectations and revert all replaced methods (see #4) with a single call (to be called from a common tearDown())

I don’t want it to be tied to my test runner, I’m quite happy with mocha.

I prefer the rspec style of setting expectations on mocks before the code is run and having them verified at the end of the test, but it’s not a requirement.

I plan need it to run in node.js, but would like it to work in the browser (even if I have to use some sort of commonJS-shim).

Here are the libraries I tried to use or looked at, and reasons they will not suffice:

Honestly, I looked at most of the unit testing modules listed on the node.js wiki that sounded like they did mocking.

jasmine’s mocking support seems somewhat reasonable (I’ve used it before), but unfortunately it seems to be tied to the jasmine test runner, which is not acceptable for async tests.

I’m happy to be shown wrong about my conclusions here, or to be pointed to any mocking library that succeeds in most (or at least more) of my requirements. If all else fails I may consider porting my mocktest python library to javascript as best as the language will allow, but it’s probably a lot of effort - surely someone has written a good javascript mocking library somewhere amongst all this? What do other folks use?

Ruby's split() function makes me feel special (in a bad way)

Quick hand count: who knows what String.split() does?

Most developers probably do. Python? easy. Javascript? probably. But if you’re a ruby developer, chances are close to nil. I’m not trying to imply anything about the intelligence or skill of ruby developers, it’s just that the odds are stacked against you.


So, what does String.split() do?

In the simple case, it takes a separator string. It returns an array of substrings, split on the given string. Like so:

py> "one|two|three".split("|")
["one", "two", "three"]

Simple enough. As an extension, some languages allow you to pass in a num_splits option. In python, it splits only this many times, like so:

py> "one|two|three".split("|", 1)
["one", "two|three"]

Ruby is similar, although you have to add one to the second argument (it talks about number of returned components, rather than number of splits performed).

Javascript is a bit odd, in that it will ignore the rest of the string if you limit it:

js> "one|two|three".split("|", 2)
["one", "two"]

I don’t like the javascript way, but these are all valid interpretations of split. So far. And that’s pretty much all you have to know for python and javascript. But ruby? Pull up a seat.

Experiment: Using Zero-Install as a Plugin Manager

So for a little while now I’ve been wanting to try an experiment, using Zero Install as a Plugin Manager. Some background:

  • Zero install is designed to be used for distributing applications and libraries.
  • One of its greatest strengths is that it has no global state, and doesn’t require elevated privileges (i.e root) to run a program (and as the name implies, there is no installation to speak of).

Since there’s no global state, it seems possible to use this as a dependency manager within a program - also known as a plugin system.

Happily, it turns out it’s not so difficult. There are a few pieces required to set it up, but for the most part they are reusable - and much simpler to implement (and more flexible) than most home-rolled plugin systems.

Tame.JS: Async Flow Control

If you are interested in my defer work with async control flow in CofeeScript, you’ll probably be interested in Tame.JS. The guys from OkCupid have a history with this sort of thing, apparently they have been using a similar mechanism they built for C++ for years.

Tame allows for more explicit control over parallelism than defer, and is a pretty simple mechanism. Contrasted to Stratified JS it seems to be simpler and more interoperable with existing javascript codebases, but also has fewer features - Tame.js is at a similar level to defer, while Stratified JS offers additional features like promise values (strata), parallel composition, alternative composition and more.

Regardless of which you prefer, it’s good to see people tackling the problem despite the common wisdom seeming to be that there is no problem (or worse, that it can be adequately addressed with libraries alone).

As usual, there’s some good commentary going on at hacker news. There are even a bunch of people wondering when such a useful mechanism will arrive in CoffeeScript ;)

(view link)

Node.js is backwards

(view link)

Arrows and Haskell

For a little while I’ve been wondering what exactly arrows were and what they’re used for - ever since I came across their syntax in my window manager’s somewhat-indecipherable configuration file (which I don’t claim to understand).

So anyway, this article is probably the best thing I’ve found on the subject. I’m still not entirely sure what I’d use them for (the parser example that everyone seems to give is a little abstract for my likes), but this article was the easiest introduction to arrows that I’ve found, and now I think I finally get what they are, at least…

Scala Investigation: First-Class Functions

I’ve just spent some time learning about the difference between scala’s functions and methods. It’s a surprisingly complicated topic, so I’ll defer to the smart folks on stack overflow for the explanation itself:

Difference between method and function in Scala

Here are some interesting points / examples I took from that topic:

Scala trick: convert java.lang.Object to Option[A]

So let’s say you have a java method:

public Object getSomething(String key) { ... }

This returns an Object, or null. Ideally it would have a generic type so that it at least returned the type you expect (like String), rather than Object, but that’s java for you. What’s a scala chap to do with such an ugly method?

val obj = getSomething("key")
val maybeObj = obj match {
	case s:String => Some(s)
	case _ => None
}
val actualObj = maybeObj.getOrElse("")

Not very nice, is it? We should abstract this (unfortunately) common pattern!

My Ideal Window Manager

I’ve been using xmonad (with a slightly modified bluetile setup) for about a year now, and it’s been pretty great. But I still feel locked in to its grid sometimes, and miss the direct manipulation that a “normal” window manager (like metacity) provides - specifically allowing quick movement and resizing by using alt + mouse dragging. Bluetile has the option of floating windows, but actually moving or resizing them is so cumbersome that it’s not really worth it. I also sometimes wish that my windows could overlap, so that (while still tiled) a window can extend beyond the bounds of its tile if I want it to.

I also am a sucker for shiny things, and xmonad is far from a shiny thing (in terms of graphics). I tried out gnome-shell yesterday, and while buggy, it is exceedingly shiny. And considering that gnome-shell will not allow alternate window managers (that was a surprise to me), I have put some thought into what my ideal window manager would look like.

I’m keen to try and implement this somewhere. It’s unlikely to be xmonad, as I want builtin compositing support (and haskell is a great language, but I can barely figure out how to configure xmonad, let alone extend it). So I’m wondering if the following can be done as a plugin to either gnome-shell or mutter. Hopefully gnome-shell, as I can stomach javascript hacking a lot easier than compiling C extensions.

Also, if people know of an existing project (with compositing!) that has these sorts of features, I’d be interested to know - I don’t want to have to reinvent the wheel, but it seems like most tiling window managers are too rigid and keyboard-based for me, while most “grid” extensions to floating window managers are too manual.

So, here’s the plan:

Four Common (and Broken) Ruby Operations

All of these lines, in ruby, should fail. All of them instead return nil:

@nonexistant_var
{}[:nonexistant_key]
[].first
{}.shift

All of these were encountered by myself in the course of yesterday’s programming. None of them in a good way. And the last two were in published libraries, not even code under development.

All of these, of course, raise errors in python. I refer you to lines 10 and 11 of the zen of python:

Errors should never pass silently.

Unless explicitly silenced

(an Option or Maybe type would be acceptable also, but that’s pretty uncommon to find in a dynamic language)

Also inviting my fury: every single language, tool or function, ever, that makes you check the return code of a system (shell) command to see whether it was nonzero.

Pea

…three days after this post, pea, the tiniest BDD framework is born.

The Best-Named Python Library

I hereby declare to be monocle. To see why, check out the second example.

How I Replaced Cucumber With 65 Lines of Python

Update:

I’ve since cleaned up the code here and published it as a tiny library: pea on github

Aside: why cucumber doesn’t work as well as everyone thinks it should

I’ve used cucumber at work for a reasonably large project, and I wasn’t impressed. Having one canonical language for stories sounds great, until you have enough arguments about how things should be phrased that you eventually come to the realisation that BAs don’t want to write their specifications as tests, and you don’t write your tests as specifications.

This is a test style assertion step:

Then the total of the items should be 42

..and this is the same step in a requirements-style of language:

Then the total of the items should equal the sum of the number of items in each category

To a BA, the first example is a lie. The sum shouldn’t be 42, it should be the correct number! And to an automated program, the second statement is nigh on useless. Saying what something is supposed to be made from is just doing the same calculation twice - there’s nothing stopping you from doing it wrong both times! If you want to check that it’s getting the right answer, you need to tell it what the right answer is, not just tell it (again) how to make it.

So I’m not a huge fan of having cucumber scenarios be the single source of truth for requirements. If the programmers have their way it’s just a series of examples (also known as “tests”), and if the BAs have their way it’s just a series of feeble assertions that don’t necessarily check what they say they’re checking.

But it’s not all bad…

But on programmer-oriented projects, I can see them working quite well. For example, I’ve recently upgraded a large suite of specs to rspec 2, and made heavy use of the browsable cucumber scenarios on relishapp.com as actual, useful documentation.

So I decided to try cucumber on one of my own projects. Since I am obviously a python fiend outside of work, I wasn’t going to use cucumber. So out came the (very young) python port of cucumber, called lettuce (where did this salad theme even come from? o_O). I gave it a go, and of course it’s naturally a bit more awkward than ruby because python doesn’t have blocks. It’s also more than a little buggy, and lacking some useful features that cucumber has (which is to be expected of such a young project).

I started hacking on it to add or improve features, and then got sick of it. It really does seem a little ridiculous. We’re actually inventing a (trivial) language, and parsing it, and using little regex parsers in each of our steps, and mapping each of those regexes to little chunks of code. And all this makes it hard to find usages, hard to track duplication and dead code, and generally just awful to navigate and manage.

The punchline

So, you know what? I just transformed all my steps into valid python code instead. Each regex replaced with a function name, and each matching group an argument (python’s keyword-arguments help here). 65 lines of code later, I have a very similar result using plain-old python.

Here is a comparison. The old feature:

Feature: running indicate-task
	Running a basic, blocking process that
	consumes and produces output.

Scenario: running and cancelling a program
	When I run indicate-task -- cat
	And I enter "input"
	And I press ctrl-c
	And I wait for the task to complete

	Then there should be a "cat" indicator
	And it should have a menu description of "cat: running..."
	And the output should be: input
	And the error output should be empty
	And the return code should not be 0
	And it should display the task's output to the user
	And it should notify the user of the task's completion

And the new, normal, actual-python-code-that-works-just-fine-with-ctags-and-isn’t-built-with-dirty-regexes version:

from makeshift_cucumber import *
from base_test import BaseTest

class TestRunning(BaseTest):
	"""
	Feature: Running a basic, blocking process that
	consumes and produces output.
	"""

	def test_running_and_cancelling_a_program(self):
		When.I_run_indicate_task('--', 'cat')
		And.I_enter("input")
		And.I_press_ctrl_c()
		And.I_wait_for_the_task_to_complete()
		Then.there_should_be_an_indicator_named("cat")
		And.it_should_have_a_menu_description_of("cat: running...")
		And.the_output_should_be('input')
		And.the_error_output_should_be_empty()
		And.the_return_code_should_not_be(0)
		And.it_should_display_the_tasks_output_to_the_user()
		And.it_should_notify_the_user_of_the_tasks_completion()

No, you probably wouldn’t be able to get a businessman to write a scenario. But has that ever actually worked with cucumber either? I find it doubtful. The results are just as readable, and insanely simpler in terms of the complexity of the testing infrastructure. Plus, it’s just a normal test, the functions are just normal functions, and the arguments are just normal arguments.

And if you don’t want to give that to a BA, just show them the test output instead:

terminal output

I’ll try to clean this up some time into a proper library & formatter sometime, because I think the mess of code you end up with cucumber is just too ridiculous for the benefits you get, and this sort of thing is much more developer-friendly while maintaining most of the readability benefits.

The -rubygems Flag

I was always slightly confused that despite rubygems not being part of the ruby language or interpreter, there is nonetheless a -rubygems option you can give to ruby to enable rubygems.

Today when I was delving through some stack traces, I noticed an odd looking filename at the root of it all. As I’m sure many before me have realised (a bunch of my workmates already knew about this), the -rubygems flag is not a real flag at all. It’s just a perverted case of the -r module syntax which tells ruby to require a file by name. Because when you install rubygems, it conveniently installs a file called ubygems.rb whose contents is simply require "rubygems". Very sneaky…

rvm: Manage Your Rubies With an Ill-Managed Manager

rvm is a tool for maintaining multiple versions of ruby, as well as maintaining project-specific sets of gem dependencies. When I first learnt about it this week it sounded like a very useful tool, although it’s unfortunate that gems are so awkward to manage that it should be necessary in the first place.

Yesterday my first task was to update rspec. Which in turn required an update to rubygems before it would install. But who manages rubygems? It could be rvm, or rubygems itself, or apt, or even maybe bundler.

I looked through the documentation, and the most appropriate answer seemed to be that rvm should manage rubygems. I quote from the documentation:

rvm action [interpreter] [flags] [options]

where update is an action, and one of the flags is --rubygems:

--rubygems    - with update, updates rubygems for selected ruby

So I diligently typed

rvm update --rubygems

And what did rvm do? It proceeded to attempt to update itself, instead of rubygems. If you want to upgrade rubygems, you’re supposed to type:

rvm --rubygems update

(note that this is incorrect according to the above documentation, but is how I eventually coerced it into upgrading rubygems (this bug has since been fixed))

The accidental upgrade might have been okay, if its upgrade process were anything but Completely Insane. It goes thusly:

  • download a file from an unsecured HTTP location
  • without verifying any sort of checksum, signature or even HTTP status code, pipe the output directly into a bash shell
  • this script clones a github repository, and proceeds to install the absolute latest revision, whatever that might be

Hilarity ensues. I got a bash syntax error, but evidently not early enough in the process to stop rvm from destroying itself, requiring me to delete everything related to it and install from scratch.

Security? ignored.

Sanity checking? skipped.

Dependencies? get them yourself.

Update management? The website says “make sure you run this command frequently”.

I don’t know that I want such a tool trying to manage my dependencies, thank you very much…

The most painful thing, of course, is that it’s yet another buggy, language-specific implementation of the principals that zero-install does so much better (and simpler). If you don’t have global state, suddenly it’s really not that hard to keep things from interfering with each other.

Oh, and did I mention how rvm integrates with your shell, so that when you cd into a project directory, it automatically sets up your ruby version and gems? Except that when you open a new shell in the same location, you have to cd out of your project directory and then back in or else you’ll see the system version of ruby and your gems, and things will be broken in very odd ways. Splendid.

mocktest 0.5

Over the holidays, I’ve finally had the time to rewrite my mocking library for python, mocktest.

The original version had what turned out to be a confusing distinction between mock anchdors, mock wrappers and raw mocks. You should no longer need to know about that distinction when using mocktest 0.5, as it takes a more traditional approach using global methods like when() and expect() to differentiate between setting up the mock object and actually using it.

Please check out the brand-new documentation if you’re looking into a mocking library for python - it works with the standard unittest infrastructure, so it’ll work just fine with your favourite test runner (nosetests, surely….)

Speaking of documentation, this is my first time using sphinx. I am very impressed, and really quite keen to properly document a lot of my other code, when I get the time.

Jekyll Improvements

I’ve been using Jekyll to generate this blog for over a year now. It’s great. I don’t have to worry about security exploits, or replicating the arcane installation of PHP and associated libraries that my hosting provider happens to have installed. And it’s a blog. I shouldn’t need a CMS for what amounts to very static content.

The good thing about jekyll is that it’s easily hackable if it doesn’t do exactly what you want. The sucky thing is that over 400 people have done just that, making it incredibly difficult to get the right combination of features that actually work together. The franken-jekyll running this blog for the past year was a combination of mojombo’s master, mikewest’s tag_index and master, as well as some of my own fixes for bugs I came across or features I wanted.

So when I eventually got around to merging in new updates, things fell in a heap. Thankfully, in the past year, extending jekyll has become a whole lot cleaner. There is now jekyll_ext, which allows you to add features without changing the jekyll code itself. All of my original modifications have now either been fixed, or implemented in jekyll_ext.

So now I have, thanks to an upgraded markdown engine and some jekyll_ext plugins:

If you notice any post’s formatting is a bit off now due to the upgraded markdown engine, please leave a comment (or shoot me an email) and I’ll fix it up.

  1. Which I would never use spuriously. Except for this post, perhaps.

Background Make for GVim

I haven’t used vim’s :make command much, mostly because I don’t often use compiled languages, and setting :errorformat correctly for nonstandard programs is a dark, dark art.

Recently I got :make and :errorformat working well enough with sbt, so the only remaining problem is that vim completely locks up while the make task is going. That really sucks, as sbt can take a good 20 seconds to even just compile and install an android app.

Enter background-make (for GVim only, sorry terminal freaks). It’s not perfect, but for non-pathological makeprg settings it seems to work very reliably. It adds a :Make command that does exactly what :make does, except it does it in the background.

And it tries its very best to not disrupt you - by default, it’ll send a system notification the moment that make finishes. But it will then wait until you are in either insert or normal mode, at which point it’ll take the opportunity to pop up the error window and restore your cursor position / mode. (It has to wait for normal or insert mode because these are the only ones I can figure out how to restore ;))

It’s implemented by firing off a background make process with the current vim instance’s v:servername so that it knows where to send the results (thus the GVim requirement). Once complete, it uses --remote-send to tell the originating vim instance to open the now-complete errorfile. Oh, and it also requires a python-enabled vim, because vimscript makes me wince.

What The Scala?

Here’s a puzzler for scala fans: What will be the console output of the following program:

def foo1() {
	println("> foo1()")
	return "foo1"
}

def foo2() {
	println("> foo2()")
	"foo2"
}

def foo3() = {
	println("> foo3()")
	"foo3"
}

def foo4():String {
	println("> foo4()")
	"foo4"
}

println(foo1())
println("---------------")
println(foo2())
println("---------------")
println(foo3())
println("---------------")
println(foo4())

Look carefully at each of the def lines, and write down what you think the output will be.

Why Zero-Install Will Succeed

Also known by the longer title: I Sure Hope Zero-Install Succeeds, Or Else We Might All Give Up On Package Managers Entirely.

If you’ve tried to run any of my software lately, you may have noticed that it’s all distributed and packaged via Zero-Install. I’ve posted about how awesome it is, but that was just an initial impression - I’m now familiar enough with the system to expand on those impressions.

After reading that the distros have killed python yesterday, I felt compelled to write in a little more detail how zero-install solves this and many other problems right now, across platforms and languages, and for much less effort than the current packaging practices.

A new edit-server for TextAid and friends

ItsAllText has long been one of my most useful firefox extensions. It allows you to edit the contents of a <textarea> in an external editor (i.e. vim, emacs, etc) and insert the results back into the web page.

I’ve been having trouble with itsalltext, so I scoped out other alternatives. One such extension is TextAid for chrome (Edit with Emacs is another, which thankfully you can use with vim despite the name ;)).

The funny thing about chrome extensions is that they’re not allowed to spawn new processes, which injects a large portion of awkwardness into an extension whose main goal is to spawn your text editor. The workaround is to run a server (locally) that receives a POST request with some text content. The server then spawns your favourite text editor and waits for you to edit its contents. When you’re done, the new text is send back as the response body. It seems a rather roundabout mechanism, but it’s nonetheless kind of neat. And entirely necessary to fit in with chrome’s security model - the chrome extension is just making a long-running ajax call.

So anyway. I took the python server from the emacs_chrome project, cleaned it up, added multithreading so you can edit multiple files at once, and packaged it all up as a 0install package (yep, I still love 0install). You can get it here, if you are ever in the need for such an outrageous piece of software.

(view link)

group_sequential

So the other day I had a list of (html) elements, and I wanted to get an array representing lines of text. The only problem being that some of the elements are displayed inline - so I needed to join those together. But only when they appeared next to each other.

I would call the generic way of doing so group_sequential, where an array is chunked into sub-arrays, and sequential elements satisfying some predicate are included in the same sub-array. That way, my predicate could be :inline?, and I could join the text of each grouped element together to get the lines out.

For example, using even numbers for simplicity:

[1,2,3,4,6,8,5,4,4].group_sequential(&:even?)
=> [[1],[2],[3],[4,6,8],[5],[4,4]]

Here’s the ruby code I came up with:

	class Array
		def group_sequential
			result = []
			group = []
			finish_group = lambda do
				unless group.empty?
					result << group
					group = []
				end
			end

			self.each do |elem|
				if yield elem
					group << elem
				else
					finish_group.call
					result << [elem]
				end
			end
			finish_group.call
			result
		end
	end
	

Things are slightly less noisy, but assignment is subtly awkward in python without using nonlocal scope keyword (only available in python3):

	def group_sequential(predicate, sequence):
		result = []
		group = []
		def finish_group():
			if group:
				result.append(group)
			return []

		for item in sequence:
			if predicate(item):
				group.append(item)
			else:
				group = finish_group()
				result.append([item])
		group = finish_group()
		return result
	

This feels like something that should be doable in a much more concise way than I came up with above. Any ideas? (In either ruby or python)


Update:

My friend Iain has posted a number of interesting solutions over yonder, which got me thinking differently about it (specifically, reminding me of dropwhile and takewhile). I applied python’s itertools to the problem to get this rather satisfactory result in python:

	from itertools import takewhile, tee, dropwhile
	def group_sequential(pred, sequence):
		taker, dropper = tee(iter(sequence))
		while True:
			group = list(takewhile(pred, taker))
			if group: yield group
			yield [dropwhile(pred, dropper).next()]
	

(note: this is a generator which is fine for my purposes - you can always wrap it in a call to list() to force it into an actual list).

The same approach is acceptable when done in ruby, but a bit more verbose because of the need to explicitly check for the end of the sequence, and to collect the results array:

	class Array
		def group_sequential(&pred)
			sequence = self
			results = []
			while true
				group = sequence.take_while &pred
				results << group if group.size > 0
				sequence = sequence.drop_while &pred
				return results if sequence.empty?
				results << [sequence.shift]
			end
		end
	end
	

Update (the second):

While poking around itertools, I managed to miss groupby. I assumed it did the same thing as ruby’s Enumerable#group_by, which is to say not at all what I want (though it’s surely useful at other times). So here is presumably the most concise version I’ll find, for the sake of closure:

	from itertools import groupby
	def group_sequential(pred, sequence):
		return [list(group) for key, group in groupby(sequence, pred)]
	

rspec immediate feedback formatter

I had cause to repurpose this immediate feedback formatter to work with the SpecDoc rspec formatter. So here’s a minimal version that just monkey-patches the SpecdocFormatter class to provide immediate feedback. As long as it’s included somewhere before rspec starts running, it should do its thing…

(view link)

Dealing with non-local control flow in deferred CoffeeScript

…a continuation (hah!) of defer: Taming asynchronous javascript with coffeescript.

In my last post, I outlined the ideas behind defer and its current state. One of the things I mentioned is that how to deal with return statements in asynchronous code is not yet decided. I’d like to explore a few of those ideas here. If you haven’t read the previous post, I suggest that you do.

defer: Taming asynchronous javascript with CoffeeScript

defer is something I’ve been working towards in various fashions for many months now. It’s a way of writing asynchronous javascript in a more straightforward and synchronous-looking way. My current approach has been to modify the CoffeeScript compiler to introduce a defer keyword. There is currently some debate on the issue (parts one and two have even more history) as to whether the functionality is necessary, useful or appropriate. Here, I hope to show the reasons behind the idea, why it does what it does, and how it helps programmers write more concise and readable code.

iPhone OS "multitasking"

So apparently iPhone OS 4 (or iOS, as it’s now awkwardly named), still has no way to sync data in the background. “Multitasking done right” indeed.

To be clear, that means that for every application that syncs data with the web, you must explicitly open it (and maybe tap some buttons) to initiate a sync. If you have a third-party mail client, an RSS reader, a todo list and instapaper, that’s quite a lot of tedium to go through every time you want to sync data. Heaven forbid you have more than a handful of apps that might need to talk to the internet without your explicit direction.

Get over your ego, Apple, and just copy android properly next time.

Scala on Android

As an addition to my last post, I’d like to point out a great resource for getting started with android programming in scala. This article gives you a nice template to go from, and a brief rundown of the sbt build tool you can use to build and deploy your scala-based android app. Good stuff!

(view link)

zero install is great

Zero install is a marvellous system that I’ve known about for a little while, but only just started to use. It’s a package manager (like apt or yum), with two important differences:

  1. Packages are distributed and named by URL - there is no single repository, just the internet. This makes for potential trust issues, but it’s far better than (for example) launchpad PPAs, because…
  2. Packages are installed and run as regular users. No root access required.

It also has some additional perks:

  • Cleanup is trivial - just clean out your zero-install cache.
  • Making packages is pretty simple.

For its convenience, using zero-install puts some restraints on you:

  1. No triggers. You can’t run a script on install / uninstall, because there are no such events - merely “run”. That means your program has to be self-contained, and must deal with any first-run issues in the code itself. Probably not a bad idea though.
  2. No arbitrary placement of files. I’ve got a bunch of customisations that (for example) put things in /etc/profile.d. You can’t do that sort of thing with zero-install, not even in the user’s home directory (e.g ~/.config). This means it’s not a great solution for configuration-based packages that coexist on the filesystem in well-known directories, so I certainly don’t see it replacing APT for that sort of stuff any time soon. But it’s certainly an excellent tool for delivering both programs and libraries.

I’ve put together my first 3 zero-install packages, and hopefully there will be more to come. You can find them here: http://gfxmonk.net/dist/0install/

Two of them are existing software, and the third is a tiny utility for working with zero-install itself. You can click on the xml files in that directory listing for an overview of what each package does.

Battling javascript contortion with lisp (of all things)

I’m pleased to report that my crazy notions of replacing javascript metaprogramming with lisp metaprogramming appear to be headed steadily in the direction of success. It’d take a while to explain exactly what I’m up to (and the reasoning behind it), but essentially I’m trying to solve the same problem that async.js, narrative javascript and strands all try to solve: asynchronous callbacks are ugly, error-prone and downright confusing.

All of the above mentioned tools are unsatisfactory in various ways. Both narrative javascript and strands are complex enough that I uncovered serious bugs in both reasonably quickly. async.js is much more stable (and I actually have a working application with it), but still requires very careful programming and only works on mozilla browsers.

So I set forth with parenscript, which is essentially javascript dressed up as lisp. It doesn’t make for any prettier javascript, but it does make for one hell of a metaprogramming opportunity using lisp macros.

The aim is to convert straightforward, procedural-style code into the contortion required to appease asynchronous callbacks. I’m certainly not done yet, but I do have some compelling proof to show that it’s a plausible thing to do with lisp. Here’s a contrived example for, say, getting all items from whatever feed the supplied item-id belongs to. This involves getting the item to find the feed it belongs to, then returning all items contained within that feed. The “store” objects in this scenario refer to lawnchair stores, which do asynchronous local datastore lookups:

    (asyncfun get-sibling-items (item-id)
      (defer item (item-store.get id))
      (console.log (+ "item belongs to feed: " item.feed-id))
      (defer feed (feed-store.get item.feed-id))
      (ret feed.all-items))

And here is the generated javascript:

    function getSiblingItems(itemId, cb) {
      itemStore.get (id, function (item) {
         console.log('item belongs to feed: ' + item.feedId);
         feedStore.get(item.feedId, function (feed) {
            cb(feed.allItems);
          });
       });
    };

This is just a simple example, but the lisp code still shows a remarkable reduction of both noise and contorted control-flow, and the generated code ought to work in all browsers. Worth pursuing, certainly.

Javascript Object Promotion

So it’s not something you should have to worry about (often), but I just spent an inordinate amount of time figuring out exactly why a particularly innocuous-looking piece of code was failing. It’s also yet another surprising part of the javascript language, and if you often deal with javascript then it’s in your best interests to keep track of such oddities (there are a lot of them!).

To illustrate, consider the following array intersection function:

    var list1 = ["a", "b", "c"];
    var list2 = ["a", "b", "d"];

    function intersect(a,b) {
      // find the intersection of two arrays
      var intersection = [];
      jQuery.each(a, function() {
        if(b.indexOf(this) != -1) {
          intersection.push(this);
        }
      }
      return intersection;
    }

You might expect calling intersect(list1, list2) to return ["a", "b"], since those are the common elements in the two sets. However, what you really get is [].

The problem is that I was a bit lazy, and used jQuery.each() to iterate over my collection (I hate the for x in y construct, and for loops are just so C). The each method calls your provided function on each member of the provided array, setting this in the context of each call to the current array element.

But it turns out that when you call a method on a primitive string object, the javascript (well, ECMAScript really) language specifies that this call takes place not on the string primitive type, but on the String object type (it’s kinda like a boxed value in Java). That is, fun.call("some string") actually ends up as if you had written fun.call(new String("some string")).

The String and string objects will compare equal when using the “==” operator - but since they are of different types, they will not compare as equal using the “===” operator. Evidently that is the type of equality that indexOf() uses, therefore none of the String objects will ever appear in the array of string primitives.

Normally, you will neither notice nor care. However, when extending the String class or calling a function with a primitive type as the subject, you should keep this conversion in mind. Note that the conversion does not apply to function arguments, only to the subject of a function application (i.e this).


I’m continually amazed by the high quality answers found on stackoverflow.com - I posted this question asking why the object promotion occurs (and if that is indeed what’s going on), and was rewarded with an explanation and a link to the precise part in the ECMAScript spec where the behaviour is detailed.

Making sense of async.js

async.js is a fascinating library for taming asynchronous javascript. It’s highly experimental, and as far as I know, it only works in firefox. But the idea is an important (and useful) one, so I think it’s definitely worth knowing about.

There are a few approaches to dealing with asynchronous javascript:

  1. Compile [some language] to javascript This is the approach taken by GWT, pyjamas, and many others. It’s usually extremely heavyweight, so it mainly makes sense for big apps.

  2. Compile [almost-javascript] to javascript Most notably this includes Narrative Javascript and its successor, Strands. This is a reasonable approach, but the lack of maturity / tool support makes debugging extremely hard. Also, I ran into a number of bugs in both these libraries. The thought of finding and fixing more of those bugs is not at all fun.

  3. Write a javascript library This is doable, but typically looks hideous, convoluted, and is usually quite burdensome to try and use.

Async.js is the most plausible attempt at #3 that I’ve seen. It uses a reasonably clever (but not unknown) trick to to turn Javascript 1.7’s generator functionality into an event-based coroutine system. Specifically, this allows for a program to “wait” for a callback, while not actually blocking the javascript interpreter (as a synchronous AJAX call would). Go read the async.js page to learn more, because the following isn’t going to make much sense if you don’t know roughly how to use async.js.

Javascript's fragile "this" statement

Javascript’s this statement must be the most fragile and confusing statement I’ve come across. Many languages have the concept of “this”, but none mess with it to hard as javascript does. As I have recently discovered, there are two fundamental issues with javascript’s this:

  1. this does not get captured when storing an object member into a function object
  2. you can never actually be certain what this will be

For object-oriented programming, these two facts are entirely terrifying. Let me illustrate each one:

1. this does not get captured when storing an object member into a function object

function Obj() {
	this.method = function() {
		return "method() called - I am " + this;
	};

	this.toString = function() {
		return "[Obj instance]";
	}
}

Now, consider the following scenarios:

var obj = new Obj();
obj.method();
// returns "method() called - I am [Obj instance]"

var obj_method = obj.method;
obj_method();
// returns "method() called - I am [object DOMWindow]"

What happened to this? I can find no explanation anywhere as to why it has been lost (and the window object used in its place), but it is consistent. This may not seem like a big deal, but even aside from the awkwardness, it’s completely non-obvious - and therefore a great candidate for sneaky bugs.

2. You can never actually be certain what this will be

Some see this as a feature, and it is in some cases. But the fact remains that the caller of any function can set this to be any object they choose is cause for great suspicion on the part of any callback code.

What’s worse, as a library writer there are cases where it’s impossible to not mess with the value of this. Other languages have the concept of a unsplat operator. Google it if you don’t know this term, but basically it will turn a list of objects into an argument list. That is, func(1,2,3) is the same as func(*[1,2,3]) (where * is the un-splat operator). This is very important for higher-order / functional programming, where you might write a proxy function that wraps a normal function call with some useful behaviour.

Anyways, javascript does have an unsplat operator. Kind of… The following code will work:

function call_other() {
	var _arguments = Array.prototype.slice.call(arguments);
	var func = _arguments[0];
	var func_args = _arguments.slice(1);
	// do whatever proxy stuff you need to do here
	func.appy(null, func_args);
}

Except for that first parameter to the apply function. Whatever you pass in there is what this will be set to in the context of the called function. With no discernible means of extracting what this would normally be for the given function, it becomes impossible not to clobber the otherwise extremely-useful this statement.

Zenity: for the user-friendly scripter

I just discovered zenity, which is a great tool for simple gtk+ interactions with a script.

It has the following types of interactions:

  • calendar (date picker)
  • file / directory selection
  • error / info message
  • list picker
  • question (yes/no)
  • progress dialog
  • systray notification
  • and more… ( see the man page )

The usage is brilliantly simple, and it’s very unixy despite being a GUI tool. I just wrote a pygtk+ app to do some photo importing stuff, but doing it with zenity would have been far simpler.

Well worth keeping in mind for your next user-friendly shell scripts (particularly for the more advanced progress, calendar and list picker dialogs).

Bash trick: indicate whether a session is running under SSH

It’s worrying how often I execute a command on the wrong machine because I don’t notice the hostname in my bash prompt. So I eventually figured out a way to make it more obvious. This is the relevant part of my ~/.bashrc:

# colours
RED=`tput setaf 1`
GREEN=`tput setaf 2`
YELLOW=`tput setaf 3`
BLUE=`tput setaf 4`
MAGENTA=`tput setaf 5`
CYAN=`tput setaf 6`
WHITE=`tput setaf 7`
LIGHT=`tput setaf 9`
GREY=`tput setaf 0`

# make remote sessions stand out
HOSTNAME_COLOR="$YELLOW"
[ "$SSH_CLIENT" ] && HOSTNAME_COLOR="$RED"

# string it all together:
PS1='\[$GREEN\]\u\[$HOSTNAME_COLOR\]@\h \[$CYAN\]\w\[$GREEN\] \$\[$LIGHT\] '

The colours stuff is pretty standard, the crucial part is [ "$SSH_CLIENT" ] && HOSTNAME_COLOR="$RED". You can do anything in here to make a remote session stand out.

Is there such thing as a Snapping Window Manager?

..in which I propose a potentially-new window management feature, and hope that somebody has already done it so that I won’t have to…

Google camera adapter (MagicCam) - osascript errors

Just thought I’d shed some light on the issue (because googling it myself turned up less than useful results). I just had an issue where the following messages were being spewed into the output of every osascript (applescript) command:

$ osascript -e 'tell application "finder" to activate'
[000:035] MagicCam 0: Current process: osascript, Flash is loaded: no
[000:035] Error(magiccammac.cc:276): MagicCam 0: MagicCamOpen: Not an allowed process!
[000:002] MagicCam 0: Current process: osascript, Flash is loaded: no
[000:002] Error(magiccammac.cc:276): MagicCam 0: MagicCamOpen: Not an allowed process!
[000:000] MagicCam 1: Current process: osascript, Flash is loaded: no
[000:000] Error(magiccammac.cc:276): MagicCam 1: MagicCamOpen: Not an allowed process!
[000:002] MagicCam 1: Current process: osascript, Flash is loaded: no
[000:002] Error(magiccammac.cc:276): MagicCam 1: MagicCamOpen: Not an allowed process!

This has the potential to break a lot of scripts that use osascript to get information from (or about) running applications.

It turns out this is due to a google quicktime component, which I believe is related to google gears and video chat. To get rid of the error, you can delete “Google Camera Adapter 0.component” and “Google Camera Adapter 1.component” from /Library/Quicktime/. I make no claims that google video chat will work after you do this (it surely won’t), but I never use it anyways, and I’d rather not have it break anything that relies on osascript output.

navim update

So I’ve kept working on navim, my jQuery plugin for easily adding vim-style keyboard navigation to web pages. New features include:

  • shift+enter to open links in a new window (and the ability to tell if shift is pressed from a custom action callback)
  • fixed a bug that interfered with pressing return to submit a form
  • using focus(), blur() and tab-key navigation to better effect

I’ve also now implemented navim in my “read later” webapp, pagefeed. It was trivial enough to add “d” as an additional keyboard shortcut to delete the currently active item, which serves as a good example for anyone wanting to add their own custom action keys. The code is simply:

$(window).keypress(function(e) {
	if(e.which == 100) { // 'd'
		$(".navim_active").children("form.del").eq(0).submit();
		return false;
	}
});

The rise of vi keybindings on the web

Vi? On the internet? No, not that one. I just mean vi keybindings, not the rest of vi. And I’ve just written a jQuery plugin to help make this happen.

On the UNIX terminal, many commands use common keyboard shortcuts to navigate screenfuls of text. j/k for down/up, h/l for left/right, and so on. As far as I know, vi was the first program to use these shortcuts. vi is famous for its modal interface, where different keys mean different things depending on what mode you’re in. In text insertion mode, “hjkl” means exactly those letters. But in normal (or visual) mode, they are the keys you use for navigation. Using standard letters instead of the arrow keys could have come about for a number of reasons. Firstly, there’s the issue that different terminals send different key-codes for special keys like the arrow keys. That’s generally been solved these days, but the other reason still remains: By using the whole alphabet as control keys, you get a startling amount of “command bandwidth” - that is, commands in vi generally require much less finger-contortion than pretty much every other text editor.

vi keybindings are useful on the internet for a third (but related) reason - keybinding collisions. We already have meanings for what the arrow keys do (scroll), and trying to use control-key combinations for webpage-specific functionality is fraught with user frustration, confusion and technical issues.

I’ve noticed it already with a bunch of google products, especially since I started using gmail’s online version most of the time now (instead of Mail.app). Gmail and Google reader are the two big ones that I use, but I’m sure there are more. Both use vi-style keyboard navigation, and both are a delight to use with the keyboard. I’d be surprised if bespin lasts long before a vi-mode is added.

I guess many people think only data-heavy, webapps are worth learning keyboard shortcuts for. But the real benefit comes when everything (or at least most things you care about) use the same convenient shortcuts. I was absolutely delighted when I noticed that every issue of The Big Picture allows you to use j/k to jump to successive images - incredibly useful in this case, because page up/down rarely manages to line up to image boundaries. It’s an unobtrusive addition, and it won’t hurt regular users. But for those who do use it, it quickly becomes indispensable.

So what I’d love is for all websites with many conceptual “items” on a page to implement the j/k keybinding as a minimum (with horizontal and inter-page navigation coming later, hopefully). To this end, I have written a jQuery plugin that should make the process fairly trivial, and allow for “active item” decoration via CSS. Click that link for a demo you can try out yourself.

If there’s enough interest, maybe someone could turn it into a community-based firefox extension (a-la AutoPager) that allows users to define the navigation items for websites that haven’t supplied their own (google search would be a handy one).

It’s worth mentioning Vimperator, which brings vi keybindings to firefox’s UI. To clarify, I’m not talking about browser functionality. I’m happy to use the existing controls for a browser, and leave the alphabet keys for use by the web-page. That way there’s no overlap, and webpages can provide contextual navigation controls that are much more powerful than the basic “scroll down 30 pixels” that a browser provides.

Recursively Default Dictionaries

Today I was asked if I knew how to make a recursively default dictionary (although not in so many words). What that means is that it’s a dictionary (or hash) which is defaulted to an empty version of itself for every item access. That way, you can throw data into a multi-dimensional dictionary without regard for whether keys already exist, like so:

h["a"]["b"]["c"] = 5

Without having to first initialise h[“a”] and h[“a”][“b”].

A dictionary with a default value of an empty hash sprang to mind, but after trying it out I realised that this only works for one level. Recursion was evidently required.

So, here’s the python solution:

from collections import defaultdict
new_dict = lambda: defaultdict(new_dict)
h = defaultdict(new_dict)

And the ruby, which seems overly noisy:

new_hash = lambda { |hash, key| hash[key] = Hash.new &new_hash }
h = Hash.new(&new_hash)

Understanding git submodules

I had to refer to this today, when I discovered (much to my surprise) that git submodules update does, for the most part, nothing.

You might expect it to update all my git submodules to the latest revision. Nope, I’m supposed to cd into each of those directories myself and run git pull myself.

The only hint that git update will do nothing of use is a single sentence in the help page, which mentions as part of the update command’s summary: “This will make the submodules HEAD be detached”. Which is a fairly unintelligible statement, even to someone who’s been using git for about a year now.

I can’t imagine when you would run git submodule update after the initial checkout - why doesn’t git submodule init just do update’s job (actually fetching the initial content), and then maybe we could have an update that actually pulls updates? I’d even be satisfied to have to use a flag, like --hard, --please or --come-on-old-chap-just-do-it-would-you

I’m pretty new to git’s submodules, and so far they just seem to take far too many manual steps (I don’t understand why they’re not fetched by a clone, for starters). I feel wrong saying so, but I miss svn:externals.

Tell me, how are things in the land of the other DVCS’s?


Update: Looks like my git is outdated, the newest version (1.6) allows a --merge or --rebase flag to update which sounds like it does what I want. Now I just need to sort out my mac’s package management :s

(view link)

OSX-style horizontal mouse scrolling for linux

OSX has this great feature where if you hold down SHIFT at the same time as using your mouse scroll wheel, it’ll scroll horizontally instead of vertically. If you don’t have a laptop, this is an immensely useful trick. Sadly, I couldn’t find any way to get this to happen on linux.

But now, thanks to some direction from stackoverflow, I finally figured out how to do it myself. The world of X11 input hackery is somewhat twisted and full of projects either abandoned or in disrepair, but I finally stumbled across the right set of tools.

If you’d like to get this (rather excellent) feature in linux, you will need the following:

  • Install the packages xbindkeys and xautomation: sudo apt-get install xbindkeys xautomation

  • Save the following file as ~/.xbindkeysrc.scm :

      ; bind shift + vertical scroll to horizontal scroll events
      (xbindkey '(shift "b:4") "xte 'mouseclick 6'")
      (xbindkey '(shift "b:5") "xte 'mouseclick 7'")
    
  • Use your favourite mechanism to ensure that the xbindkeys command is run at the beginning of your xsession (I added it to ubuntu’s “startup items” preference, but you can surely use init.d if you’re comfortable with that).

Autonose: continuous test runner for python's nosetests

Today I’ve put up the first “releaseable” version of autonose. Basically, it analyses your code’s imports, and determines exactly which tests rely on which code. So whenever you change a file, it’ll automatically run the tests that depend on the changed file (be it directly or transitively). Give it a go, and please let me know your feedback.

All you need do is:

$ easy_install autonose
$ cd [project_with_some_tests_in_it]
$ autonose

See the github project page for further information (and a screenshot).

minor metaprogramming

Can anyone tell me why ruby’s instance_variable_set would possibly require the name of a variable to start with an “@”, rather than simply assuming it? It’s a ruddy instance variable, after all…

I can find no decent alternative to python’s setattr in ruby, which surprises me.

The joys of functional programming with side-effects

Quick quiz: what will the following pieces of ostensibly identical python and ruby programs print out?

If you answered “1 2 3 4” or something similar, you’re half right. That’s what you’ll get from ruby. Surprisingly, python will give you “4 4 4 4”. I’m not the first to discover this, but that doesn’t make it any less startling.

The “fix” in python is to replace the lambda line with q.append(lambda i=i: puts(i)), because default function paramaters are evaluated at definition-time, not evaluation-time (another difference from ruby, potentially even related?). It makes sense once it’s explained what’s going on, but it’s hardly obvious.

I don’t know enough about how closures are done to say that python is wrong, but it’s certainly less desirable and more surprising…

(for those keeping score, this makes ruby: 1, python: still heaps ;P)

P.S. I couldn’t resist mentioning the title of a proposed solution to this problem: For-loop variable scope: simultaneous possession and ingestion of cake

update: As matt just pointed out, the ruby port is different in that it uses i as an argument to a block. A more faithful translation would be:

for i in (0..9)
  q << (lambda {puts i})
end

Which has exactly the same outcome as python.

So there you go. It turns out closures are confusing. Who knew? ;P

Javascript: smells like lisp

I was just struck by how lisp-ish javascript is getting (not in the powerful code-as-data way, just the “screw builtin language features, lets just use more brackets for everything” way). Exemplified by this tiny sammy example code:

$.sammy(function() { with(this) {
  get('#/', function() { with(this) {
    $('#main').text('Welcome!');
  }});
}});

Entirely too much investigation into ruby's match operator

(how could a title like that not excite you! ;P)

So yesterday I had this weird regex issue in ruby. I wanted to get a regular expression containing a given string, but didn’t want to have to manually escape all the special characters. Regexp.escape to the rescue! It escapes all regex metacharacters in any given string, and returns it as a regex. In fact, the docs assure me:

For any string, Regexp.escape(str)=~str will be true.

But not so much in practice?

>> str = "123"            
=> "123"
>> Regexp.escape(str)=~str
TypeError: type mismatch: String given


So, problem one: Regexp.escape is broken. It returns not a Regexp, but a string. Oddly enough it also seems to escape spaces and other innocuous characters, but at least you get the right result if you pump it through Regexp.compile(). However, that wasn’t my only discovery.

I mentioned this to Matt, and he couldn’t make much sense of it either. He noticed that the type error is specific to strings - if you use a number it just returns false:

>> 123 =~ 'foo'
=> false

Seems a bit odd, really. Fixnum doesn’t implement =~, nor does Integer.

So I went doc spelunking. I found implementations for =~ in the following three important classes:

Regexp: regexp =~ str: do a regex match, as you might expect

String: str =~ obj: Call obj =~ str (i.e swap the order of your operands). Not mentioned in the docs but clearly apparent in the source (and experimentation): raise a TypeError if both arguments are strings. Without this, matching one string to another would very quickly run out of stack space.

Object: obj =~ other_obj: return false

So the Regexp implementation is fine. The String implementation is a little odd. I guess it’s there to allow people to write matching statements either way, but it seems like a dangerous (and confusing) habit to condone.

But the Object implementation? Why??? What possible reason could one have for doing a match operation against two objects, neither of which implement any matching behaviour? This has the painful side effect of giving every single object I inspect a “=~” method which does nothing. No wonder Object.new() has over 120 methods on it *.

For comparison, python’s object only has 12 methods / attributes. And they’re all special names, so there’s no pollution of regular names going on.

So there you go, two spoonfuls of broken in the one discovery!

(this is ruby 1.8.6, if that matters)

* I exaggerate, over 120 methods on object are just what you get in a rails app. Vanilla ruby only has 41 by my count. but it’s still completely unnecessary, and adds noise to inflate that number

ruby dataflow library

Pretty cool, I worry that such things can’t easily be done so cleanly in python…

(view link)

rednose: coloured output formatting plugin for nosetests

I recently wrote a plugin for nosetests which greatly (imho) improves the output for failed and errored tests. The screenshot explains it best.

To install, just run easy_install rednose. Then you can run nosetests with the –rednose option.

See github.com/gfxmonk/rednose for code and more information.

ruby - longing for some discipline

More and more, I am wishing that there was some sort of strict mode I could enable in ruby to say “you know what? I’m careful with my code. Please don’t assume things behind my back”. And to be honest, this mode would pretty much be synonymous with “just do what python would do”

By default, python is strict. If you index a dict (hash) with a nonexistant key, you get a keyError. If you don’t want to have to deal with that, you can use the get method and provide a default for if the key doesn’t exist. In ruby, if you want to be strict about anything, you generally have to write your own checks to guard against the core library’s forgivingness. Forgivingness sounds nice at first, but goes completely against the idea of failing fast, and frequently delays the manifestation of bugs, making them that much harder to actually track down.

Two examples that I came across within minutes of each other the other day:

Struct.new(:a,:b,:c).new('a','b')

that should NOT go without an exception

"a|b||c".split("|")
=> ["a", "b", "", "c"]

good…. so now:

"a|b||".split("|")
=> ["a", "b"]

argh! what have you done to my third field?

Size matters

On the same codebase, with no changes pending in either system:

$ time git status
# ...
real	0m0.618s

$ time bzr status
# ...
real	0m3.795s

It’s a small thing, but it matters.

Pretty Decorators

Python decorators are cool, but they can become very messy if you sart taking arguments:

def decorate_with_args(arg):
    def the_decorator(func):
        def run_it():
            func(arg)
        return run_it
    return the_decorator

@decorate_with_args('some string')
def messy(s):
    print s

ugh. Three levels of function definitions for a single decorator. And heaven forbid you want the decorator to be useable without supplying any arguments (not even empty brackets).

So then, I present a much cleaner decorator helper class:

class ParamDecorator(object):
    def __init__(self, decorator_function):
        self.func = decorator_function

    def __call__(self, *args, **kwargs):
        if len(args) == 1 and len(kwargs) == 0 and callable(args[0]):
            # we're being called without paramaters
            # (just the decorated function)
            return self.func(args[0])
        return self.decorate_with(args, kwargs)

    def decorate_with(self, args, kwargs):
        def decorator(callable_):
            return self.func(callable_, *args, **kwargs)
        return decorator

All that’s required of you is to take the decorated function as your first argument, and then any additional (optional) arguments. For example, here’s how you might implement a “pending” and “trace” decorator:

@ParamDecorator
def pending(decorated, reason='no reason given'):
    def run(*args, **kwargs):
        print "function '%s' is pending (%s)" % (decorated.__name__, reason)
    return run

@ParamDecorator
def trace(decorated, label=None):
    if label is None:
        label = decorated.__name__
    else:
        label = "%s (%s)" % (decorated.__name__, label)
    def run(*args, **kwargs):
        print "%s: started (args=%s, kwargs=%s)" % (label, args, kwargs)
        ret = decorated(*args, **kwargs)
        print "%s: returning: %s" % (label, ret)
        return ret
    return run

Which can then be used as either standard or paramaterised decorators:

@pending
def a():
    pass

@pending("I haven't done it yet!")
def b():
    pass

@trace
def foo():
    return "blah"

@trace("important function")
def bar():
    return "blech!"

And just to show what this all amounts to:

if __name__ == '__main__':
    a()
    b()
    foo()
    bar()

reveals the following output:

function 'a' is pending (no reason given)
function 'b' is pending (I haven't done it yet!)
foo: started (args=(), kwargs={})
foo: returning: blah
bar (important function): started (args=(), kwargs={})
bar (important function): returning: blech!

Fairly simple stuff, but hopefully useful for anyone who finds themselves tripped up by decorators - particularly when trying to allow for both naked and paramaterised decorators.

P.S: I’ve made a pastie of the code in this post, because my weblog engine is not cool enough to colour-code python ;)

Too magic?

A snippet from the source code of my iPhone app, GRiS:

- (void) tableView: (UITableView*) tableView
willBeginEditingRowAtIndexPath: (NSIndexPath *) indexPath {
	// the mere presence of this method causes a swipe action to be
	// recognised, and a delete button appears. like magic!
}

I’m glad it was so easy, but fairly unnerved at the same time…

Remap shift+space to underscore

So I had this great idea yesterday (for coders, at least): remap [shift+space] to [underscore]. Turns out I am far from the first to think of this, which only enforces its awesomeness as an idea.

Mac: Put this in ~/Library/KeyBindings/DefaultKeyBinding.dict:

{
	/* turn shift + space into underscore */
	"$ " = ("insertText:", "_");
}

Linux: for a PC keyboard, try:

xmodmap -e 'keycode 65 = space underscore'

or on a mac keyboard:

xmodmap -e 'keycode 57 = space underscore'

Or if neither of those work, run:

xmodmap -pk | grep space | awk '{print $1}'

and use that number instead of 65 or 57 above.

You can put this keybinding in ~/.xmodmaprc or somesuch if you like it.

Ruby class methods

Not a very exciting realisation, but an annoying one:

$ irb
>> class A
>>   def self.meth; puts "class method!"; end
>> end

>> A.meth()
class method!

>> A.new().meth()
NoMethodError: undefined method `meth' for #<A:0x5ad160>
	from (irb):11

ick…

(sadly enough, most of my posts tagged “ruby” would be equally well tagged as “things that suck in ruby”)

Package managers

I can feel things getting ugly…

Recently I’ve been dealing with package managers, and debating which one to use. In different projects I’m currently using .deb packages (for cydia) and python .eggs (for the cheese shop / pypi / easy_install).

I’m a big fan of pypi / simple_install for python modules, but it obviously doesn’t make sense for general software. On the other hand, .deb repositories are a big world, and I don’t want to have to release all my python eggs in that global namespace.

But now I am writing a tool which relies on both python packages and installed command-line software. How on earth do you specify dependencies across packaging worlds? Should we release all packages on all packaging systems in order that any future package can depend on the appropriate version of your package in its chosen package distribution format? ugh…

And then we just get into these meta-packagers like easy-deb

What’s the answer? Perhaps if there was some way to delegate dependencies? So for example my .deb package could depend on easy_install and pypi:somelib. Then there would only need to be a single “pypi” package which could ensure the pypi-world dependencies for somelib were satisfied and somelib is installed.

(I don’t know how that would go for uninstalls though. Especially considering that simple_install can’t even do uninstalls…)

Excuse the rant, and please point out if something does in fact exist to ease the headache…

Google Reader List Opacity plugin

This is the plugin version of a script I made a while ago to set the opacity of each feed in google reader’s feed list according to how many unread items are in it.

(view link)

mocktest 0.2

new and shiny! (and almost entirely incompatible with the previous version :s)

(view link)

mocktest

I’ve been working on this a little while now. At work, I use ruby. It has its good points, but my language of choice is still python. I am, however, blown away by rspec. So I’ve tried to bring some of its features into python (namely and should_receive matchers).

Thus was born mocktest. The readme has all the examples you should need to start using it in your own testing code.

It’s also available from the cheese shop, which means you can just run easy_install mocktest to install it on your system.

I should note that this code builds upon the excellent Mock library by Michael Foord.

(view link)

Ruby is friggin weird. And a little messed up.

(if you don’t read my blog for the geeky thrill of it, you may want to give this post a miss ;))

Follow my little IRB session, if you will:

>> nil or "val"
=> "val"

>> puts (nil or "val").inspect
=> "val"

>> x = nil || "val"
=> "val"

>> x
=> "val"

>> y = nil or "val"
=> "val"

(wait for it…)

>> y
=> nil


Seriously, ruby. What the crap?


Okay, so I just figured out what’s going on here. “or” works both as a logic operator and a conditional statement. Just like you can do:

x = something_dangerous() rescue "x failed!"

and

puts "x is greater than 10" if x > 10

It would seem you can also do

x = some_value or puts "i guess the assignment didn't evaluate to true"

Meaning that in my example above:

y = nil or "val"

Ruby evaluates it as:

(y = nil) or ("val")

(i.e. in the second set of brackets, y is not actually assigned to anything)

Of course, || is solely a logic operator. Which is why it looks like you get different behaviour when you use || instead of or.

When I found out about ruby supporting both sets of logic operators (&&, ||, !) and (and, or, not), I thought it was dumb, but just a matter of preference which type you prefer.

When I found out that the symbol-based ones bind tighter than the keywords, I winced a little and noted to myself never to rely on that, because it’s neither readable or obvious.

Now that I’ve stumbled upon this latest gem of knowledge, It just makes me cringe…

@ruby: You're Doing it Wrong.

Pay attention to the output types, kids:

>> { "key" => "val" }.reject { false }
=> {"key" => "val"}

looking good so far...

>> { "key" => "val" }.select { true }
=> [["key", "val"]]

eww... what?

HTML to PDF converter

Given the integration of PDF into Mac OS X, I was surprised to find that there didn’t seem to be any tool to convert HTML files into PDFs. So, like any frustrated coder I wrote my own little script to do it: html2pdf.py

Requires python and OSX 10.5 (Leopard). It uses a WebKit view for the rendering, so in theory it should work with any URL that safari can handle - but I haven’t exactly tested it thoroughly…

Trashy.app

In any OSX “document window”, there is a little icon representing the current document. If you click and hold this icon, it becomes a draggable alias for the file. You can then use that alias much as you would the file itself (as if you had dragged it from the finder) - but one thing you can’t do is delete the file by dropping it in the trash.

Trashy is a simple program to fix that. Put it in your dock, and it will send anything you drag onto it into the trash. Click here to download Trashy.

Here’s the entire source code applescript, if you’re curious (or just naturally suspicious of running random programs you found on internet, as you ought to be):

on open fileList
	tell application "Finder"
		repeat with f in fileList
			set n to 0
			repeat while class of f is alias file
				if n is less than 5 then
					set n to n + 1
					set f to original item of f
				end if
			end repeat
			move f to the trash
		end repeat
	end tell
end open

on run
	tell application "Finder" to open the trash
end run

MetaMonkey.app

[I made this]

A Lightweight Metadata manager for OSX, allowing you to easily tag and rate any type of file (which is then indexed by spotlight). This is basically my iPhoto replacement now, after spending a week trying to fix its broken database, orphan files, duplicates and all that rubbish. This is much simpler, and it can be used to tag any type of file I want.

This is the result of 2 days work, and just 400 lines of code. That’s pretty decent in my book, the combination of python and cocoa is a pretty powerful beast. Once you get past all the runtime errors and pyobjc bridge troubles, that is…

(view link)

Photo.99

According to matt, I just made milk come out of Python’s nose.

[i made this, although I’m hardly proud of that fact]

For those who care (or are thoroughly confused), this is a python function to find out where an alias file (the mac’s version of a shortcut or soft link) actually points to. And no, it really shouldn’t have to be this hard.