it.gen.nz

Writings on technology and society from Wellington, New Zealand

Sunday, October 19, 2008

A little programming project – part 3

A few weeks ago I blogged about writing a little program to make my life easier. (The entries are here and here.) In summary, this program automates the messy but easy administrative task of editing links to the sound files of my radio programs into their respective blog entries.

I’ve been experimenting with this program for a few weeks, but I can only really tell if it’s working properly on a Thursday after my radio slot. And, I’m happy to say, it all went swimmingly last Thursday. After my radio program, I kept checking the Radio NZ site manually to see when the sound files went up on it, and then when I went to my blog the links had been added automagically. Wonderful – just one less thing to worry about.

I’ve posted the listing of the program, which is a Python script, here. Given the messiness of what it’s doing, there seem to be indecently few lines of actual code. I’ll explain below how they work.

The first few lines are just there to set up pointers to the resources we need, like the Radio NZ website and the blog. The first interesting line is the one that begins while – this sets up a loop so we can keep checking whether the links to the sound files are on the RNZ page yet.

The next lines get the contents of the RNZ page and look for the links. See my previous blog entries on this for details. And you can see a sleep statement which makes the loop wait for one minute before checking again. It would be rude to hammer on RNZ’s door too often.
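For anyone wanting to try the same idea, here is a minimal sketch of the fetch, search and sleep loop in Python 3. It is not the script itself: the original used Python 2's urllib.urlopen, which Python 3 moved to urllib.request.urlopen, and the function names, the two-link threshold and the link pattern (double-quoted URLs containing "echnology", so that both "Technology" and "technology" match) are assumptions for illustration.

```python
import re
import time
import urllib.request

# Assumed pattern: a double-quoted URL containing "echnology".
# The capturing group means findall returns the URL without its quotes.
LINK_PATTERN = re.compile(r'"(http\S*?echnology\S*?)"')


def find_audio_links(page):
    """Return every quoted URL in the page text that matches LINK_PATTERN."""
    return LINK_PATTERN.findall(page)


def fetch(url):
    """Download a page and decode it to text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def wait_for_links(url, get_page=fetch, delay=60, needed=2):
    """Poll url every `delay` seconds until at least `needed` links appear."""
    while True:
        links = find_audio_links(get_page(url))
        if len(links) >= needed:
            return links
        time.sleep(delay)  # be polite: wait before asking the server again
```

Passing the page-getter in as a parameter keeps the search logic testable without touching the network; a real run would just call wait_for_links with the RNZ page address.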

You can see how there is no statement in Python ending the loop. All we do is set the indent back to the same level as where the loop started, and Python just knows. Some people hate this aspect of the language because it means that whitespace is significant in some circumstances and can alter the path of your program, something not seen since the heyday of IBM’s supremely rude Job Control Language. I don’t mind this; in fact, I find it quite a natural way of showing program structure. Python forces you to show the program’s structure as you write it, rather than leaving it to the computer to “pretty print” it for you.

And one other wrinkle: you’ll see that the sleep statement (based on the time module, but that’s another story) is further indented, because it depends on the if before it. Going right back to the left margin closes both the if and the loop.
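Here is a toy illustration of that structure. It assumes a hypothetical list of page contents standing in for repeated downloads, and a plain split() standing in for the link search. The sleep call sits inside the if, which sits inside the while, and stepping back out to the function's base indentation ends both.

```python
import time


def poll(pages, delay=0):
    """Count how many 'downloads' it takes before two links turn up.

    `pages` is a hypothetical list of successive page contents.
    """
    attempts = 0
    links = []
    while len(links) < 2:                 # the loop body starts here
        links = pages[attempts].split()   # stand-in for the link search
        attempts += 1
        if len(links) < 2:
            time.sleep(delay)             # indented twice: part of the if
    # Back at the base indentation: both the if and the loop are closed.
    return attempts
```

Note that this shape checks len(links) twice per pass, once in the while and once in the if.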

The line beginning linktext builds the HTML we need to insert into the blog. Then we get the last blog entry and split it at the string “download the audio”. Typically this is the last line of the blog entry, with only a full stop afterwards. The if statement that follows checks that we did indeed find “download the audio”; if we did, we paste in the links and save the entry back to the blog.
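A minimal sketch of that splice step, assuming the links go straight after the phrase "download the audio" and are labelled 1, 2 and so on. The real link format in the entries may differ, and the function names are hypothetical. It uses str.rpartition, which splits at the last occurrence of the phrase and returns an empty separator when the phrase is absent, so the "did we find it" check is a single test.

```python
import html

MARKER = "download the audio"


def build_linktext(links):
    """Build assumed link HTML of the form ' (<a ...>1</a>, <a ...>2</a>)'."""
    anchors = [
        '<a href="%s">%d</a>' % (html.escape(url, quote=True), n)
        for n, url in enumerate(links, start=1)
    ]
    return " (" + ", ".join(anchors) + ")"


def add_links(entry, links, marker=MARKER):
    """Insert the link HTML straight after the last occurrence of marker.

    Returns the edited entry, or None if the marker isn't present, so the
    caller knows not to save anything back to the blog.
    """
    before, found, after = entry.rpartition(marker)
    if not found:
        return None
    return before + found + build_linktext(links) + after
```

With an entry ending "You can download the audio." the full stop stays at the very end, after the inserted links.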

This program will run on pretty much any Internet-connected computer which has a Python interpreter on it. That’s any Mac or *nix box, and Windows boxes where the owner has installed Python.

In practice, because I don’t want to have to trigger this program manually, I am running it on a GNU/Linux server I have semi-employed at home as a file and media server. I set a cron job to run it at 11:15am every Thursday. From then on, it loops until the sound files are posted at RNZ, and then edits my last blog entry.

The program needs a loop limit so that it doesn’t keep trying if the sound files don’t get posted. I’ll add that later, before I go away in a couple of weeks and miss two weeks on the radio.
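One way to add that loop limit, sketched with a hypothetical check() callable that returns the list of links (empty if they aren't up yet). The default of 120 attempts at one-minute intervals, roughly two hours, is an arbitrary assumption.

```python
import time


def wait_for_links_limited(check, max_attempts=120, delay=60):
    """Call check() up to max_attempts times, sleeping between tries.

    Returns the first non-empty result, or None if the limit is reached.
    """
    for attempt in range(max_attempts):
        result = check()
        if result:
            return result
        if attempt < max_attempts - 1:  # no point sleeping after the last try
            time.sleep(delay)
    return None
```

Looping over range() makes the limit explicit, and returning None lets the caller skip editing the blog entirely when the files never appear.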

But it’s good enough for now, and I’m pleased to have managed to get it working.

posted by colin at 6:16 pm  

4 Comments

  1. I am searching for some ideas to write about in my blog… somehow came to your blog. Best of luck. Eugene

    Comment by Eugene — 22 October 2008 @ 12:23 pm

  2. It’s amazing what a little bit of Python can do.

    Just a stylistic matter, I would prefer to write the while-loop as follows:

    while True :
        page = urllib.urlopen(linkspage).read()
        links = re.findall(r'"http\S*?echnology\S*?"', page)
        if len(links) >= 2 :
            break
        time.sleep(60)

    saves checking len(links) twice. Also you have a line that just says

    links

    which I don’t think is doing anything useful, and can be removed.

    Comment by Lawrence D'Oliveiro — 25 October 2008 @ 11:24 am

  3. One subtlety worth mentioning: in the line

    page = urllib.urlopen(linkspage).read()

    the urllib.urlopen call is returning a file object; you call its read() method to obtain the contents, then discard the object, whereupon Python’s memory management will automatically close it. May horrify some people used to having to explicitly close every file they open, but it works!

    Comment by Lawrence D'Oliveiro — 25 October 2008 @ 11:26 am

  4. Lawrence

    You’re completely right about the spare “links” line. And also, of course, about the way I chose to implement getting a web page – I know that a file object is created, but I’m not interested in it and I’m delighted to let Python clean it up for me. All I want is the contents of the web page in a form I can manipulate.

    Cheers

    Colin

    Comment by colin — 27 October 2008 @ 9:44 pm
