Monthly Archive for July, 2007
A test instance of the MT4 beta available through Amazon’s EC2
After updating BBQueue for the third time to keep it working with changes to Blockbuster’s site, I decided that there’s got to be a better way than regular expressions for pulling the data from their pages. Regular expressions are super handy and great for a lot of things, but handling HTML whose structure might change at any time without notice isn’t one of them. What I really needed was an HTML parser, and having been greatly impressed with Hpricot for Ruby, I set out to find a similar library for JavaScript. However, I came up empty-handed.

In the process, though, I came across any number of JavaScript XML parsers, and it dawned on me that if I’m lucky and Blockbuster’s using XHTML, one of those would work just fine. In fact, the widget engine itself has had a built-in XML parser and DOM objects as of not too long ago. Sure enough, Blockbuster’s doctype proudly proclaimed that it was XHTML 1.0.

However, upon attempting to parse it, my plans were quickly dampened. At first I was getting errors about a few unsupported entities, like &nbsp;. A little massaging of the HTML cleared that right up, and that’s when things got ugly: mismatched end tags all over the place. A quick pass through the validator confirmed it: 305 errors. I guess I should have figured that if they can’t be bothered to provide RSS feeds of your movie queue in the first place, valid XHTML wouldn’t be high on their priorities either. (In the interest of full disclosure, though, it’s not high on mine either: 22 errors. The shame of it all.)

So now it’s back to the original plan. Anyone know a good HTML parser written in JavaScript?
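
For anyone curious what those two failures look like, here is a minimal sketch that reproduces them. It uses Python's standard-library XML parser rather than the widget engine's, purely because it is easy to run anywhere. The markup strings and the helper names `try_parse` and `massage` are made up for illustration; none of this is BBQueue's actual code or Blockbuster's actual HTML.

```python
import xml.etree.ElementTree as ET


def try_parse(markup):
    """Return (root, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(markup), None
    except ET.ParseError as err:
        return None, str(err)


def massage(markup):
    # &nbsp; is declared by the XHTML DTD, not by XML itself, so a plain
    # XML parser rejects it. The numeric reference &#160; means the same
    # character and needs no DTD.
    return markup.replace("&nbsp;", "&#160;")


# Hypothetical snippets standing in for a scraped page.
well_formed = "<div><p>Queue&nbsp;item</p></div>"
sloppy = "<div><p>Queue&nbsp;item</div></p>"  # end tags in the wrong order

# 1. The raw markup trips over the entity.
root_raw, err_raw = try_parse(well_formed)

# 2. After massaging the entity, well-formed markup parses fine.
root_ok, err_ok = try_parse(massage(well_formed))

# 3. Mismatched end tags are fatal no matter how the entities are handled.
root_bad, err_bad = try_parse(massage(sloppy))

print("raw:     ", err_raw)
print("massaged:", root_ok.find("p").text if root_ok is not None else err_ok)
print("sloppy:  ", err_bad)
```

The entity problem can be patched with a string replace, but a mismatched tag makes a strict XML parser give up on the whole document, which is why a forgiving HTML parser is what's really needed here.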
