
Monthly Archives: January 2010

expand your scope – you can dot-source more than just files

I’m working on a small project that will require me to dot-source some PowerShell files in order to load their functions, aliases, and variables and make them available in a session. Actually, I have to do a little more than dot-source each file, but I’ll keep the example simple to illustrate the wrinkle I ran into.

Suppose I have this file, file-to-load.ps1:

Function Get-MyName
{
    Write-Output "Blair Conrad"
}

I dot-source it from the console, and everything’s great:

PS> . .\file-to-load.ps1
PS> Get-MyName
Blair Conrad

Because I’ll be doing this over and over, and I want to manipulate the .ps1 files a little more, I decide to wrap the dot-sourcing in a function and call that instead:

Function Load-File([string] $filename)
{
    . $filename
}
PS> Load-File('.\file-to-load.ps1')
PS>
PS> Get-MyName
The term 'Get-MyName' is not recognized as the name of a cmdlet, function,
script file, or operable program. Check the spelling of the name, or if
a path was included, verify that the path is correct and try again.
At line:1 char:11
+ Get-MyName <<<<
+ CategoryInfo : ObjectNotFound: (Get-MyName:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException

Not good. The Get-MyName function is loaded inside the scope of the Load-File function. It’s only available as long as I’m inside Load-File.

I thought about modifying all the script files that were to be loaded, scoping each contained function, alias, and variable as global, but that would have been a pain, and I’m not going to be the only one writing these files. Eventually, I hit upon the answer: dot-source the call to Load-File:

PS> . Load-File('.\file-to-load.ps1')
PS>
PS> Get-MyName
Blair Conrad

I’ll admit I didn’t quite understand why this worked at first, but it makes sense: the dot operator runs its operand in the current scope, whether that operand is a script file or a function call. Dot-sourcing the call to Load-File runs the function body in the calling scope, so the dot-source inside the function loads everything into that scope as well.

Brandon Sanderson – Good author and class act

I won’t talk about every book I read here – that’s what Goodreads is for. However, every once in a while something exceptional’s bound to come up, and I’ll be compelled to mention it here. This is one such thing.

Anyone who’s talked to me this year will have heard how much I enjoyed Brandon Sanderson’s Mistborn Trilogy. Individually, they’re very good books, but as a whole, they’re exceptional – the planning Mr. Sanderson put into the books ties them together into a tight, compelling read, the likes of which I rarely see. Anyhow, before Christmas, Mr. Sanderson offered the Gift of Mistborn for 10 days – buy seven signed, hardcover, first-edition copies of Mistborn: The Final Empire for US$70, plus shipping. I thought this was a tremendous deal and convinced a coworker to go in on it with me.

A few days after I ordered, I received an e-mail saying that Mr. Sanderson had accidentally signed and personalized the entire trilogy for me, so he’d be sending me those as well. And sure enough, they arrived just after Christmas, looking absolutely beautiful.

This was a very generous thing to do. He could’ve said, “Oops, I ruined those copies,” and sold them at a discount or ground them up and burnt them to heat his house. Instead, he poured more of his own money into shipping the books to Canada, just to make me happy.

Thanks!

Acronyms of the Day: VOMIT and BARF

I was listening to CBC’s Ideas Podcast today, specifically to You Are “Pre-Diseased”, Part 2 (which aired on 18 January and is available for download until mid-February, in case you want to listen) when I heard a new acronym that’s relevant to my Day Job.

VOMIT – Victim Of Medical Imaging Technology – refers to patients who are operated on after an abnormality is detected in an imaging procedure, but in whom nothing is found during the operation.

Closely related is BARF – Brainlessly Applying Radiological Findings – treating the result of an imaging study, not the patient and her symptoms.

I don’t mean to make light of the plight of patients who undergo operations or treatments that aren’t warranted, but I thought the acronyms themselves were good for a chuckle.

Cookies, Redirects, and Transcripts – Supercharging urlfetch

LibraryHippo’s main function is fetching current library account status for patrons. Since I have no special relationship with any of the libraries involved, LibraryHippo scrapes the libraries’ web interfaces.

The library websites issue cookies and redirects, so I needed to augment the URL Fetch Python API. I’d written a utility class that worked with the urllib2 interface, but urllib2 didn’t allow me to set the deadline argument, and I wanted to increase its value to 10 seconds. I’d resigned myself to writing a version that used urlfetch directly when I found Scott Hillman’s URLOpener, which uses cookielib to follow redirects and handle any cookies met along the way.
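For reference, here’s the kind of bare urlfetch call everything below builds on – a minimal sketch, with a made-up URL, showing the deadline argument I was after:

from google.appengine.api import urlfetch

# Allow up to 10 seconds for slow library sites
response = urlfetch.fetch('http://library.example.org/account', deadline=10)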

URLOpener looked like it would work for me, with a few tweaks – it didn’t support relative URLs in redirects, it didn’t allow one to specify headers in requests, and it lacked one feature that I really wanted: a transcript.

Why a transcript?

The libraries don’t provide a spec for their output, so I built the web scraper by trial and error, sometimes putting books on hold or taking them out just to get test data. Every once in a while something comes up that I haven’t coded for and the application breaks. In these cases, I can’t rely on the problem being reproducible, since the patron could’ve returned (or picked up) the item whose record was troublesome or some other library state might’ve changed. I need to know what the web site looked like when the problem occurred, and since the ultimate cause might be several pages back, I need a history.

I started adding a transcript feature to the URLOpener – recording every request and response including headers. As I worked, I worried about two things:

  • the fetch logic was becoming convoluted, and
  • the approach was inflexible – what if later I didn’t want to follow redirects, or to keep a transcript?

Decorators to the rescue

I decided to separate each bit of functionality – following redirects, tracking cookies, and keeping a transcript – into its own decorator, to be applied as needed. First I teased out the code that followed redirects, with my change to allow relative URLs:

import urlparse

class RedirectFollower():
    def __init__(self, fetcher):
        self.fetcher = fetcher

    def __call__(self, url, payload=None, method='GET', headers={},
                 allow_truncated=False, follow_redirects=False, deadline=None):
        while True:
            # Pass False for follow_redirects so the wrapped fetcher
            # doesn't also try to follow them
            response = self.fetcher(url, payload, method, headers,
                                    allow_truncated, False, deadline)
            new_url = response.headers.get('location')
            if new_url:
                # Join the URLs in case the new location is relative
                url = urlparse.urljoin(url, new_url)

                # Next request should be a GET, with no payload
                method = 'GET'
                payload = None
            else:
                break

        return response
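On its own, the wrapper is already a drop-in replacement for urlfetch.fetch. A quick sketch, with a hypothetical URL:

from google.appengine.api import urlfetch

follow = RedirectFollower(urlfetch.fetch)
# Lands on the final page, however many redirects intervene
response = follow('http://library.example.org/login')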

After that, the cookie-handling code was easy to put in its own class:

import Cookie

class CookieHandler():
    def __init__(self, fetcher):
        self.fetcher = fetcher
        self.cookie_jar = Cookie.SimpleCookie()

    def __call__(self, url, payload=None, method='GET', headers={},
                 allow_truncated=False, follow_redirects=True, deadline=None):
        # Copy the headers so we don't mutate the caller's dict
        # (or the shared default argument)
        headers = dict(headers)
        headers['Cookie'] = self._make_cookie_header()
        response = self.fetcher(url, payload, method, headers,
                                allow_truncated, follow_redirects, deadline)
        self.cookie_jar.load(response.headers.get('set-cookie', ''))
        return response

    def _make_cookie_header(self):
        cookieHeader = ""
        for value in self.cookie_jar.values():
            cookieHeader += "%s=%s; " % (value.key, value.value)
        return cookieHeader

Now I had the URLOpener functionality back, just by creating an object like so:

fetch = RedirectFollower(CookieHandler(urlfetch.fetch))
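With that composition, cookies set on one hop are replayed on later ones. A minimal sketch of how it might be used – the URLs and form fields are hypothetical:

# The login response sets a session cookie; CookieHandler replays it
# on every subsequent request, including each hop of a redirect
fetch('http://library.example.org/login',
      payload='user=me&pin=1234', method='POST')
response = fetch('http://library.example.org/account')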

Implementing transcripts

I still needed one more decorator – the transcriber.

import datetime

class Transcriber():
    def __init__(self, fetcher):
        self.fetcher = fetcher
        self.transactions = []

    def __call__(self, url, payload=None, method='GET', headers={},
                 allow_truncated=False, follow_redirects=True, deadline=None):
        # vars() captures the call's arguments by name;
        # _Request picks out the interesting ones
        self.transactions.append(Transcriber._Request(vars()))
        response = self.fetcher(url, payload, method, headers,
                                allow_truncated, follow_redirects, deadline)
        self.transactions.append(Transcriber._Response(response))
        return response

    class _Request:
        def __init__(self, values):
            self.values = dict((key, values[key])
                               for key in ('url', 'method', 'payload', 'headers'))
            self.values['time'] = datetime.datetime.now()

        def __str__(self):
            return '''Request at %(time)s:
  url = %(url)s
  method = %(method)s
  payload = %(payload)s
  headers = %(headers)s''' % self.values

    class _Response:
        def __init__(self, values):
            self.values = dict(status_code=values.status_code,
                               headers=values.headers,
                               content=values.content,
                               time=datetime.datetime.now())

        def __str__(self):
            return '''Response at %(time)s:
  status_code = %(status_code)d
  headers = %(headers)s
  content = %(content)s''' % self.values

To record all my transactions, all I have to do is wrap my fetcher one more time. When something goes wrong, I can examine the whole chain of calls and have a better shot at fixing the scraper.

import logging

fetch = Transcriber(RedirectFollower(CookieHandler(urlfetch.fetch)))
response = fetch(patron_account_url)
try:
    process(response)
except Exception:
    # Something unexpected in the scraped page - log the whole transcript
    logging.error('error checking account for ' + patron, exc_info=True)
    for action in fetch.transactions:
        logging.debug(action)

Extra-fine logging without rewriting fetch

The exercise of transforming URLOpener into a series of decorators may seem like just that – an exercise that doesn’t provide real value – but it also yields a powerful debugging tool for the other decorators. By moving the Transcriber to the inside of the chain of decorators, you can see each fetch that’s made due to a redirect, and which cookies are set when:

fetch = RedirectFollower(CookieHandler(Transcriber(urlfetch.fetch)))

The only trick is that the Transcriber.transactions attribute isn’t available from the outermost decorator. This is easily solved by extracting a base class that delegates attribute lookups to the wrapped item.

class _BaseWrapper:
    def __init__(self, fetcher):
        self.fetcher = fetcher

    def __getattr__(self, name):
        # Only called when normal lookup fails, so attributes defined
        # on this wrapper win; anything else is fetched from the
        # wrapped object, however deeply it's nested
        return getattr(self.fetcher, name)

Then the other decorators extend _BaseWrapper, either losing their __init__ entirely or having it modified. For example, CookieHandler becomes:

class CookieHandler(_BaseWrapper):
    def __init__(self, fetcher):
        _BaseWrapper.__init__(self, fetcher)
        self.cookie_jar = Cookie.SimpleCookie()
...
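RedirectFollower and Transcriber follow suit – a sketch of the same change applied to them: RedirectFollower has no state of its own, so its __init__ disappears entirely, while Transcriber keeps its transactions list:

class RedirectFollower(_BaseWrapper):
    # No state beyond the wrapped fetcher, so no __init__ is needed
...

class Transcriber(_BaseWrapper):
    def __init__(self, fetcher):
        _BaseWrapper.__init__(self, fetcher)
        self.transactions = []
...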

And then the following code works – it even helped me diagnose a small bug I’d originally had in my RedirectFollower. As a bonus, if I ever need to get at CookieHandler.cookie_jar, it’s right there too.

fetch = RedirectFollower(CookieHandler(Transcriber(urlfetch.fetch)))
fetch(patron_account_url)
for action in fetch.transactions:
    logging.debug(action)

New Year’s Python Meme

It’s a little late, but I’m participating in Tarek Ziadé’s Python Meme (via Richard Jones):

  1. What’s the coolest Python application, framework or library you have discovered in 2009?

    Google App Engine. I’d known of it before, but hadn’t tried it until early this year when I started to work on LibraryHippo.

  2. What new programming technique did you learn in 2009?

    I’m not sure if this counts as a technique, but I recently found (and found a use for) Jean-Paul S. Boodhoo’s Static Gateway Pattern. At the Day Job, we have a lot of hard-coded dependencies and reliance on well-known static methods for authorization. The Static Gateway Pattern made it easy to provide an injectable implementation without rewriting the whole application. I expect it to continue to be useful, at least until we take the time to introduce a full Inversion of Control container.

  3. What’s the name of the open source project you contributed the most in 2009? What did you do?

    I didn’t, really. Unless you count LibraryHippo. I’ve an interest in working on Noda Time, but I haven’t managed to yet.

  4. What was the Python blog or website you read the most in 2009?

    Word Aligned

  5. What are the three top things you want to learn in 2010?

Meet LibraryHippo

I enjoy reading and using my local libraries. My wife and I have four library cards between us – one each for the Waterloo Public Library, one for the Kitchener Public Library, and one for the Region of Waterloo Library. Using our cards, we were able to find all kinds of books to read and DVDs to watch, but organizing our borrowing was a little annoying, since:

  • we had to log into four different library accounts to get an overview of our current borrowings and holds,
  • each account had a long, hard-to-remember ID, and
  • the libraries would send e-mail only after items were overdue, not in time to take them back.

I’d been using Library Elf to manage our cards, but they’d recently moved to a for-pay model, so I combined a sense of frugality with the desire to build something using a new technology and created LibraryHippo, a Google App Engine-powered web application that takes care of my library cards.

LibraryHippo:

  • manages multiple cards per family
  • shows a comprehensive overview of a family’s current library status
  • sends e-mail every morning if
    • a family has items that are nearly due
    • there are items ready to be picked up, or
    • there’s a problem checking an account

Feel free to check out the project, hosted on Google Code. A fair number of my future posts will talk about the adventures I’ve had implementing and improving LibraryHippo.