I had a really sticky problem. I’m building a web 2.0 application that aggregates data from disparate social media sites. The good news is that many sites have really great REST APIs; the bad news is that many do not. My nemesis was Google Blog Search. Sorry Charlie, no API!
What to do?
I’ve burned several nights trying to work around this problem, with mixed and mostly disappointing results. After plugging away and looking at dozens of different ways to tackle it, I finally settled on an approach suggested by igvita.com. While this got me 90% of the way there, I had to do a bit of twiddling to get it to work properly.
First, the igvita.com article didn’t explain how to submit data. No big deal, you can pass the search criteria in on the requesting URL:
@url = "http://blogsearch.google.com/blogsearch?hl=en&ie=UTF-8&q=ruby+rails"
So far, so good.
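If the search terms come from user input rather than a hard-coded string, they need to be escaped before they go into the URL. Here is a minimal sketch of one way to do that with Ruby's standard `uri` library. The helper name `build_search_url` is my own invention for illustration, and the `hl` and `ie` parameters are simply copied from the URL above.

```ruby
require 'uri'

# Base endpoint taken from the URL above
BASE_URL = 'http://blogsearch.google.com/blogsearch'

# build_search_url is a hypothetical helper name, not part of any library.
# It escapes the query so spaces and symbols survive the trip in the URL.
def build_search_url(query)
  params = [['hl', 'en'], ['ie', 'UTF-8'], ['q', query]]
  "#{BASE_URL}?#{URI.encode_www_form(params)}"
end

puts build_search_url('ruby rails')
```

For the query 'ruby rails' this produces the same URL as the hard-coded one above, with the space encoded as a plus sign.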
The next trick is to use XPath to locate and snag the piece of data you’re interested in. This is where Firebug completely rocks. If you’re not using Firebug for web development, you’re either a glutton for punishment or have been living under a rock for quite a while. In my particular case, the only piece of data that I was interested in was the number of hits on the particular query.

The igvita.com article goes into detail regarding how to find the right XPath by leveraging Firebug. Firebug is pretty slick and the article does a great job explaining how to use it, so I won’t go into the details here. The catch is that the HTML needs to be parsed as XML (via Hpricot.XML) before the XPath will work properly. Below is the Ruby code in its entirety:
require 'rubygems'
require 'open-uri'
require 'hpricot'
@url = "http://blogsearch.google.com/blogsearch?hl=en&ie=UTF-8&q=ruby+rails"
begin
  # Hpricot RDoc: http://code.whytheluckystiff.net/hpricot/
  # Fetch the page and parse it as XML so the XPath resolves properly
  xml = Hpricot.XML(open(@url).read)
  # Retrieve the number of hits
  number_of_hits = (xml/"/html/body/div[5]/table[3]/tbody/tr/td[2]/font/b[3]").inner_html
  puts "Number of hits: #{number_of_hits}"
rescue StandardError => e
  # Network and parse errors end up here; print the message and move on
  print e, "\n"
end
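To try the XPath step without hitting the network, here is a rough sketch that runs an absolute path against a small, well-formed snippet. Two loud assumptions: the sample markup is a made-up, simplified stand-in for the real results page, and I'm using REXML (which ships with Ruby) in place of Hpricot so the example runs on its own. The `extract_hits` helper name is hypothetical as well.

```ruby
require 'rexml/document'

# Hypothetical, simplified stand-in for the results page markup
SAMPLE = <<~HTML
  <html><body>
    <table><tr><td>
      <font>Results <b>1</b> - <b>10</b> of about <b>52,301</b> for <b>ruby rails</b></font>
    </td></tr></table>
  </body></html>
HTML

# extract_hits is a hypothetical helper: it returns the matched text as an
# Integer, or nil when the XPath matches nothing.
def extract_hits(markup, xpath)
  doc = REXML::Document.new(markup)
  node = REXML::XPath.first(doc, xpath)
  return nil unless node

  # Strip thousands separators before converting
  node.text.delete(',').to_i
end

hits = extract_hits(SAMPLE, '/html/body/table/tr/td/font/b[3]')
puts "Number of hits: #{hits}"
```

Returning nil on a miss matters here, because a positional path like this one stops matching as soon as the page layout shifts.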
Happy hacking,
Peter
In general, I’m not a huge fan of code reviews, but they can be an effective tool if they’re kept in check. A co-worker asked me to put together some talking points on what I thought would be worth keeping in mind for a code review process. I thought they might be generally useful, so I’m posting them here.
- In general, I’m not a big fan of invasive, comprehensive code reviews. They tend to have diminished value the more comprehensive they are. I AM a fan of spot checks and quick audits concentrated mainly on those developers who may be unknown quantities, e.g. new contractors, new team members, etc.
- AUTOMATE whenever and wherever possible. If style standards or code test coverage can be checked automatically (which they can), they should be. Tools like CheckStyle and others are invaluable and are huge time savers. They also have the benefit of being impersonal. Nobody gets offended if a tool yells at them. Before anything’s done manually, ask “Can this be automated?” Manual review *should be* the exception, not the rule…
- If there are formal code reviews, the purpose and the criteria that are going to be checked should be explicitly stated up-front. I’ve seen too many code review sessions spiral into arguments that essentially boil down to “this is NOT how *I* would have implemented it!” as opposed to having a constructive discussion that ultimately adds value to the process.
- Code efficiency can be a dicey topic. I’ve seen a lot of developers spend hours arguing about the efficiency of a piece of code that might run once per day. A strong moderator who has good mediation skills is important. The reviewers must be mature enough to know when to pick a fight and when good enough is good enough.
- A good process with well-defined roles goes a long way toward alleviating arguments and maximizing the value of code reviews. In terms of roles, I would suggest: 1) Moderator 2) Peer 3) Developer who is being checked. The peer reviews the code and then the three people get together in a meeting and go over the results. The moderator mediates, keeps the meeting rolling along, and notes any action that is required to be taken. I would suggest a time- and scope-limited review session. Around 1-1.5 hours seems to work, with follow-up sessions as needed. Any longer and eyes glaze over and people’s interest fades…
- Don’t let perfect get in the way of good enough – again there has to be a good balance and generally this discretion falls to the moderator, hence the moderator should be somebody who has good facilitator skills.
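To make the “automate it” point concrete, here is a minimal sketch of the kind of check a script can do so that no human has to. It is not how CheckStyle works; it is just a toy illustration. The `style_violations` name and the 100-character limit are hypothetical and would come from your own team's style guide.

```ruby
# Hypothetical limit; a real team would take this from its own style guide
MAX_LINE_LENGTH = 100

# Returns an array of "line N: message" strings for the given source text
def style_violations(source, max_length: MAX_LINE_LENGTH)
  violations = []
  source.each_line.with_index(1) do |line, number|
    text = line.chomp
    violations << "line #{number}: longer than #{max_length} characters" if text.length > max_length
    violations << "line #{number}: trailing whitespace" if text.match?(/[ \t]+\z/)
  end
  violations
end

# The second line of this sample ends with two stray spaces
sample = "def ok\n  puts 'hi'  \nend\n"
style_violations(sample).each { |violation| puts violation }
```

Hook something like this into the build and the nitpicks get flagged before anyone sits down in a review meeting.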