Sunday, May 23, 2010

The Boot, but not the Heel (Time For Better Search Engines, Part 2)

Well, here we go.

As many people must know, Secretary of the Interior Salazar did say, "Our job basically is to keep the boot on the neck of British Petroleum to carry out the responsibilities they have both under the law and contractually to move forward and stop this spill." I discovered this by throwing words out of the quote (as I should have thought to do); in particular, Rand Paul "improved" it just a bit by adding a heel ("boot heel on the") and changing "neck" to "throat". It is an ugly image that should have been disavowed by the administration instead of being repeated, somewhat sheepishly, by Robert Gibbs, the worst presidential press secretary I can remember. At least, I can honestly say that has always been my gut feeling.

But it affords an excuse for another lesson on search engines. Voila: the Google News Timeline, which is described as "experimental" and which I found by searching for "search engine" "by date". I had already tinkered with the quote, removing the heel and continuing to use "-rand" "-paul" to find only pages with no mention of Rand or Paul. That was turning up Robert Gibbs, and then Gibbs quoting Salazar.

But the Google News Timeline really let me do just what I'd been wanting to do, namely find "who said it first". When I searched for "boot on the neck of bp" using the timeline, I got an array of columns, one per date, with news stories. The bad news is that it seems to be limited to news stories from major sources, but it did give a graphic picture of stories containing the phrase blowing up starting on 5/21, when Rand Paul was quoted slightly mangling the original. Arrows let me walk back in time: little or nothing from May 13-20, then a cluster of references going all the way back to May 2, and then nothing.

I find the interface nicely graphic but slow and cumbersome, and if somebody had used the phrase a year or two ago, it really wouldn't be much help (correct me, Google, if you can). But it did the job, and it nicely shows the value of such a feature if we just improve on it a bit and integrate it into regular Google.

It also suggests another class of improvements we could use in our search engines: something approaching search by meaning. Computers can't really "understand" meaning, but they are getting better and better at translation, which indicates quite a bit of adaptation to the structure of language. So suppose I could have posed a search like this:

"{boot heel}* on the neck ..." where {...}* means "What's in the bracket or something roughly equivalent".

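To make the {...}* idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the EQUIVALENTS table is a hypothetical, hand-made stand-in for whatever thesaurus or language model a real engine would use to decide what counts as "roughly equivalent", the second slot around "neck" is my own addition to the query above, and the function names are made up for this example.

```python
import re

# Hypothetical hand-made table of "rough equivalents" (an assumption for
# illustration; a real engine would need a thesaurus or language model).
EQUIVALENTS = {
    "boot heel": ["boot heel", "boot", "heel", "foot"],
    "neck": ["neck", "throat"],
}

# Matches a slot written as {some words}* in the query.
SLOT = re.compile(r"\{([^{}]+)\}\*")


def compile_query(query):
    """Turn a query such as '{boot heel}* on the {neck}*' into a regex."""
    parts = []
    pos = 0
    for m in SLOT.finditer(query):
        # Literal text between slots must match exactly.
        parts.append(re.escape(query[pos:m.start()]))
        options = EQUIVALENTS.get(m.group(1), [m.group(1)])
        # Try longer options first so "boot heel" wins over "boot".
        options = sorted(options, key=len, reverse=True)
        parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
        pos = m.end()
    parts.append(re.escape(query[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def find_variants(query, text):
    """Return every matching phrase in text, lowercased, in order found."""
    return [m.group(0).lower() for m in compile_query(query).finditer(text)]
```

Calling find_variants("{boot heel}* on the {neck}*", some_text) would then pick up both the original "boot on the neck" and the embellished "boot heel on the throat" in a single pass. The hard part, deciding what belongs in the table, is exactly what this sketch leaves out.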
One problem is how, if you have a million exact matches and a smaller number of modified matches, to give the user some handle on the variation(s). Typical "search by relevance" ranking would probably score the exact match as way better than the inexact match, so the inexact matches would end up way, way down on the list. As either an alternative or an addition to the current type of Google listing, I'd suggest something like:

"boot heel on the throat" 11,707 hits [date range: 5/21-5/23]
"boot heel on the neck..." 1,305 hits [date range: 5/21-5/23] (I'm making the numbers up)
"boot on the neck" 7,222 hits [date range 5/2-5/23]

and then you would click on a variant to see all the specific examples in the format normally used by Google or another search engine.
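A listing like that could be built from individual hits with very little machinery. The Python sketch below is an assumption-laden illustration, not how any real search engine works: it supposes each hit has already been reduced to a (variant, date) pair, and the names summarize and format_row are invented for this example.

```python
from collections import defaultdict
from datetime import date


def summarize(hits):
    """Group (variant, date) pairs by variant.

    Returns rows of (variant, count, first_date, last_date),
    with the most frequent variant first.
    """
    by_variant = defaultdict(list)
    for variant, day in hits:
        by_variant[variant].append(day)
    rows = [(v, len(days), min(days), max(days))
            for v, days in by_variant.items()]
    # Sort by descending count, then alphabetically to break ties.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


def format_row(row):
    """Render one row in the style of the listing sketched above."""
    variant, count, first, last = row
    return (f'"{variant}" {count:,} hits '
            f'[date range: {first.month}/{first.day}-{last.month}/{last.day}]')


# Made-up sample data, purely for illustration.
sample_hits = [
    ("boot on the neck", date(2010, 5, 2)),
    ("boot on the neck", date(2010, 5, 23)),
    ("boot heel on the throat", date(2010, 5, 21)),
    ("boot heel on the throat", date(2010, 5, 22)),
    ("boot heel on the throat", date(2010, 5, 23)),
]

for row in summarize(sample_hits):
    print(format_row(row))
```

The first date in each row also answers the "who said it first" question for that variant, which is what I was after with the timeline in the first place.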

There is much more to be said about search engines, and vastly more that I don't know, I'm sure.

The thing about the internet is that "The truth is out there", but often, like the dinosaur bone in the rock, it can be quite a job to pry it out.

