Sunday, May 23, 2010

The Boot, but not the Heel (Time For Better Search Engines, Part 2)

Well, here we go.

As many people must know by now, Secretary of the Interior Salazar did say "Our job basically is to keep the boot on the neck of British Petroleum to carry out the responsibilities they have both under the law and contractually to move forward and stop this spill." I found this by throwing words out of the quote (as I should have thought to do in the first place). In particular, Rand Paul "improved" it just a bit by adding a heel ("boot heel on the") and changing "neck" to "throat". It is an ugly image that should have been disavowed by the administration instead of being repeated, somewhat sheepishly, by Robert Gibbs, the worst presidential press secretary I can remember (at least, I can honestly say that has always been my gut feeling).

But it affords an excuse for another lesson on search engines. Voila: http://newstimeline.googlelabs.com/, which I found, labeled "experimental", by searching for "search engine" "by date". I had already been tinkering with the quote, removing the heel and continuing to use "-rand" "-paul" to find only pages with no mention of Rand or Paul. I kept coming up with Robert Gibbs, and then it turned out Gibbs was quoting Salazar.

But the Google News Timeline really let me do just what I'd been wanting to do, namely find "who said it first". When I searched the timeline for "boot on the neck of bp", I got an array of columns, one per date, each holding news stories. The bad news is that it seems to be limited to stories from major sources, but it did give a graphic picture of stories containing the phrase blowing up starting on 5/21, when Rand Paul was quoted slightly misquoting the quote. Arrows let me walk back in time: little or nothing from May 13-20, then a cluster of references going all the way back to May 2, and then nothing earlier.
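At bottom, the timeline view amounts to bucketing the matching stories by day and drawing one column per date, so the earliest date with any hits is the candidate for "who said it first". Here is a minimal Python sketch of that idea, assuming we already have the matching stories as (date, headline) pairs; the function names and sample stories are made up for illustration, not how Google's timeline actually works.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Optional

def timeline(stories: list[tuple[date, str]],
             start: date, end: date) -> list[tuple[date, int]]:
    """One (day, story count) column per day from start to end inclusive.
    Days with no matching stories get a count of 0."""
    counts = Counter(day for day, _headline in stories if start <= day <= end)
    columns = []
    day = start
    while day <= end:
        columns.append((day, counts[day]))
        day += timedelta(days=1)
    return columns

def first_appearance(stories: list[tuple[date, str]]) -> Optional[date]:
    """Earliest date on which any matching story appears (None if no stories)."""
    return min((day for day, _headline in stories), default=None)

# Usage with made-up stories: a crude text "graph", one row per day.
stories = [(date(2010, 5, 2), "story A"),
           (date(2010, 5, 21), "story B"),
           (date(2010, 5, 21), "story C")]
for day, n in timeline(stories, date(2010, 5, 20), date(2010, 5, 22)):
    print(day.isoformat(), "#" * n)
print("first seen:", first_appearance(stories))
```

The same limitation applies as with the real thing: "first" only means first among whatever sources the stories were drawn from.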

I find the interface nicely graphic but slow and cumbersome, and if somebody had used the phrase a year or two ago, it really wouldn't be much help (correct me, google, if you can). But it did the job, and it nicely shows the value of such a feature, if we just improve on it a bit and integrate it into regular google.

It also suggests another class of improvements we could use in our search engines: something approaching search by meaning. Computers can't really "understand" meaning, but they are getting better and better at translation, which indicates quite a bit of adaptation to the structure of language. So suppose I could have posed a search like this:

"{boot heel}* on the neck ..." where {...}* means "What's in the bracket or something roughly equivalent".
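Real search by meaning is well beyond a blog sketch, but a crude stand-in for the {...}* idea is approximate phrase matching: score every stretch of words in a page by how similar it is to the query phrase, and keep the close ones. The Python below does that with the standard library's difflib, comparing lists of words rather than characters. It does not parse the {...}* syntax and knows nothing about synonyms, and the 0.75 cutoff is an arbitrary assumption.

```python
import difflib
import re

def words(text: str) -> list[str]:
    """Lowercase word tokens; apostrophes are dropped so "don't" == "dont"."""
    return re.findall(r"[a-z0-9]+", text.lower().replace("'", ""))

def fuzzy_find(phrase: str, text: str,
               threshold: float = 0.75) -> list[tuple[float, str]]:
    """Slide windows of len(phrase) - 1, len(phrase) and len(phrase) + 1 words
    over `text`; return (score, window) pairs scoring at least `threshold`,
    best first. The score is difflib's ratio over word lists (1.0 = identical)."""
    target = words(phrase)
    tokens = words(text)
    found = set()
    for size in (len(target) - 1, len(target), len(target) + 1):
        if size <= 0:
            continue
        for i in range(len(tokens) - size + 1):
            window = tokens[i:i + size]
            score = difflib.SequenceMatcher(None, target, window).ratio()
            if score >= threshold:
                found.add((round(score, 2), " ".join(window)))
    return sorted(found, reverse=True)

# The heel-and-throat version still scores close to the original phrase.
print(fuzzy_find("boot on the neck of BP", "I'll put my boot heel on the throat of BP"))
# [(0.77, 'boot heel on the throat of bp')]
```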

One problem is this: if you have a million exact matches and a smaller number of modified matches, how do you give the user some handle on the variations? Typical "search by relevance" logic would probably rate the exact match as far better than the inexact ones, so the inexact matches would end up way, way down the list. As either an alternative or an addition to the current type of google listing, I'd suggest something like:

VARIATIONS (I'm making the numbers up):
"boot heel on the throat" 11,707 hits [date range: 5/21-5/23]
"boot heel on the neck..." 1,305 hits [date range: 5/21-5/23]
"boot on the neck" 7,222 hits [date range: 5/2-5/23]

and then you would click on a variant to see all the specific examples in the format normally used by google, or another search engine.
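Building that VARIATIONS summary is mostly bookkeeping once each hit is labeled with the wording it matched. A minimal Python sketch, assuming (hypothetically) that the engine hands back (variant, posting date, url) triples; it groups by variant, counts, and takes the earliest and latest dates, most hits first.

```python
from collections import defaultdict
from datetime import date

def summarize_variations(hits: list[tuple[str, date, str]]) -> list[tuple[str, int, date, date]]:
    """Group hits by the wording each page matched. Each hit is a hypothetical
    (variant_phrase, posting_date, url) triple. Returns rows of
    (variant, hit count, earliest date, latest date), most hits first."""
    dates_by_variant = defaultdict(list)
    for variant, posted, _url in hits:
        dates_by_variant[variant].append(posted)
    rows = [(variant, len(ds), min(ds), max(ds))
            for variant, ds in dates_by_variant.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

def format_variations(rows: list[tuple[str, int, date, date]]) -> list[str]:
    """Render rows in the VARIATIONS style; in a real interface each line
    would link to that variant's own list of hits."""
    def md(d: date) -> str:
        return f"{d.month}/{d.day}"
    lines = ["VARIATIONS:"]
    for variant, n, first, last in rows:
        lines.append(f'"{variant}" {n:,} hits [date range: {md(first)}-{md(last)}]')
    return lines

# Usage with made-up hits (example.com URLs are placeholders).
hits = [("boot on the neck", date(2010, 5, 2), "http://example.com/a"),
        ("boot heel on the throat", date(2010, 5, 21), "http://example.com/b"),
        ("boot on the neck", date(2010, 5, 23), "http://example.com/c")]
print("\n".join(format_variations(summarize_variations(hits))))
```

Sorting by hit count is just one choice; sorting by earliest date would put the original wording near the top instead.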

There is much more to be said about search engines, and vastly more that I don't know, I'm sure.

The thing about the internet is that "the truth is out there", but often, like a dinosaur bone in the rock, it can be quite a job to pry it out.

Saturday, May 22, 2010

It's Time for Better Search Engines (Who said: "..put my boot heel on the throat of BP")

Is there a search engine that will let me ask?

Who said: "..put my boot heel on the throat of BP"

OK, Rand Paul said it. Or let's say the answer looks like a sort of summary, "Rand Paul said it", AND a list of pointers to articles quoting Paul as saying it, and maybe a few quite different entries, such as Rand Paul saying he didn't say it. So what if I could say "show me the most atypical entries first"? That sounds like a very generally useful follow-up question when you get 2 million hits and, as far as you can tell, they all say more or less the same thing. Could a computer program do a reasonable approximation of what a human (with a year to wade through the 2 million hits) could do? My guess is yes, and that it wouldn't even be a big stretch.
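One simple way to approximate "most atypical first" (my assumption, not how any real engine does it) is to turn each result into word counts, add them all up into an "average" result, and sort by how little each result resembles that average, measured by cosine similarity. A Python sketch:

```python
import math
import re
from collections import Counter

def bag(text: str) -> Counter:
    """Word counts for one result (lowercased; apostrophes kept in words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_atypical_first(docs: list[str]) -> list[str]:
    """Order results so the ones least like the 'average' result come first.
    The average (centroid) is the summed word counts of all results; with
    millions of hits, including each result in its own centroid hardly matters."""
    bags = [bag(d) for d in docs]
    centroid = Counter()
    for b in bags:
        centroid.update(b)
    ranked = sorted(zip(docs, bags), key=lambda pair: cosine(pair[1], centroid))
    return [doc for doc, _ in ranked]

# Usage with made-up snippets: the denial should float to the top.
docs = ["Rand Paul said put my boot heel on the throat of BP",
        "Rand Paul said put my boot heel on the throat of BP in an interview",
        "Rand Paul said boot heel on the throat of BP",
        "Paul says he never said it and was quoting the administration"]
print(most_atypical_first(docs)[0])
```

Raw word counts let common words like "the" weigh as much as telling ones; a fuller version would down-weight them (e.g. TF-IDF), but the principle is the same.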

I've been skimming so many web pages that I feel like I've seen something somewhere quoting someone in Obama's cabinet actually using a phrase like "..put my boot heel on the throat of BP". Can I confirm that, or be very comfortable in saying it didn't happen (or find out who the Cabinet member was, and see if he or she gets fired the next day)? Well, I can find someone directly attributing the phrase to Obama: "I'll put my boot heel on the throat of BP." Barry Obamma (http://twitter.com/busybrains/status/14497679203).

An important question, and perhaps one of those big stories the news media misses: how many people today, next week, next month, or next November literally believe, or will believe, that Obama did say that? Are any pollsters asking that sort of question? My guess is that it could easily be on the order of the number of people who think Saddam Hussein was directly behind 9/11 (at least some pollsters paid attention to that).

Relying on existing search engines and their limited abilities, how close could I come to answering this sort of question?
Well, if somebody said it before Rand Paul, and Paul picked it up a couple of days later, wouldn't there be some references to it on the web from before there was any association between the phrase and Rand Paul?

Consider this Google search: "put my boot heel on the throat of BP" -paul

The quotes ("") mean I don't want just any combination of the words "put", "my", "boot"... but want that exact phrase. The "-paul" means nothing containing the word "Paul". So I get 4 hits, all of which, judging from context, clearly come from the Rand Paul interview, except for the twitterer directly attributing it to "Barry Obamma".

OK, but what if the quoted secretary was named Paul _____?
I already tried -"rand paul", which let through too many pages in which Rand Paul was simply referred to as Paul. Some other approach? OK, when Paul was putting words in the President's mouth, he started with "What I don't like from the president's administration..."

HOW ABOUT: "put my boot heel on the throat of BP" -paul -"What I don't like from"

That cuts the hit count down quite a bit. There are a couple in which "don't" came out "dont" or "donit", or in which the quote was cut down so the whole phrase "What I don't like from" didn't appear, and finally we are left with the twitterer quoting "Barry Obamma", which I'm inclined to discount.
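To make the rules behind those queries concrete, here is a small Python sketch that applies them to a few page texts held locally: the quoted phrase must appear as an exact word sequence, and any page containing an excluded word or excluded phrase is dropped. It imitates the rules as described here, not google's actual implementation, and the sample pages are made up. The third page shows how a "dont" spelling slips past the phrase exclusion.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, turn every run of characters other than letters, digits and
    apostrophes into a single space, and pad with spaces so that substring
    checks only match whole words ("paul" will not match "pauline")."""
    return " " + re.sub(r"[^a-z0-9']+", " ", text.lower()).strip() + " "

def matches(page: str, phrase: str, exclude: tuple[str, ...] = ()) -> bool:
    """Like  "phrase" -word -"other phrase":  the exact phrase must appear,
    and no excluded word or phrase may appear anywhere in the page."""
    text = normalize(page)
    if normalize(phrase) not in text:
        return False
    return not any(normalize(term) in text for term in exclude)

query = "put my boot heel on the throat of BP"
pages = [
    "Rand Paul: I'll put my boot heel on the throat of BP.",
    "What I don't like from the administration... put my boot heel on the throat of BP",
    "What I dont like from the administration... put my boot heel on the throat of BP",
]
for page in pages:
    print(matches(page, query, exclude=("paul", "What I don't like from")), page)
```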

Suppose I could ask "Who said it first?" Computer logic to approximate that could rely on the fact that every internet page in google's (or another search engine's) vast database will have a date and time of posting. In fact, can't I just tell google "display in order of posting", which would make the question much easier to answer? NO, apparently not; at least I don't see how. I could probably put a front end on google, accessing it through its more computer-friendly interface (or API), and voila, a new and useful search engine.
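The front end itself could be very thin. A minimal Python sketch, assuming each result comes back with a posting time (not every page actually exposes a trustworthy one); `search` is a hypothetical hook standing in for whatever engine's API you would call, and no real google API is shown here.

```python
from datetime import datetime
from typing import Callable, Iterable, NamedTuple, Optional

class Hit(NamedTuple):
    posted: datetime   # assumed posting time reported for the page
    url: str
    snippet: str

# Hypothetical backend hook: takes a query, returns hits in any order
# (typically ranked by relevance).
SearchFn = Callable[[str], Iterable[Hit]]

def in_order_of_posting(search: SearchFn, query: str) -> list[Hit]:
    """'Display in order of posting': re-sort the backend's results by time."""
    return sorted(search(query), key=lambda hit: hit.posted)

def who_said_it_first(search: SearchFn, query: str) -> Optional[Hit]:
    """Earliest-posted hit for the query, or None if there are no hits."""
    ordered = in_order_of_posting(search, query)
    return ordered[0] if ordered else None

# Usage with a fake backend returning made-up results.
def fake_search(query: str) -> list[Hit]:
    return [Hit(datetime(2010, 5, 21, 9, 0), "http://example.com/c", "..."),
            Hit(datetime(2010, 5, 2, 14, 30), "http://example.com/a", "..."),
            Hit(datetime(2010, 5, 20, 8, 0), "http://example.com/b", "...")]

first = who_said_it_first(fake_search, '"boot on the neck"')
print(first.url if first else "no hits")
```

Passing the backend in as a function keeps the sorting logic independent of any particular engine's interface.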

For more (and drier) discussions of search engines, see http://en.wikipedia.org/wiki/List_of_search_engines
or
searchenginewatch.com,

or just google "search engines". You will get, according to google, "About 69,600,000 results". Bon appétit!

Sunday, May 16, 2010

Email from a Friend of a Friend: From Urban Legends to Political Smears

Going back for years, anonymous forwarded emails have been a source of urban legends, jokes, and other frivolous but sometimes entertaining stuff.

Now they facilitate the equivalent of old-fashioned whispering campaigns, but vastly more powerful. That sort of thing used to happen mostly at the state level or smaller (it being hard to do on a bigger scale and not get caught): e.g. the rumor that Ann Richards was a lesbian in the Texas governor's race, and the one spread in just one state during the 2000 Republican primary campaign that McCain had an out-of-wedlock black child.

But email knows no boundaries. A couple of years ago, I started getting forwarded emails from my Mom with claims that could generally be shot down easily with 15 minutes of internet research. They seemed to be really affecting my parents' views, and based on what they told me, they were generally believed by most of their friends. But they were quite simply full of provable lies. They would show signs of having been forwarded a half dozen or so times, with visible 'CC' lists giving them a sort of homey look. When you receive this sort of thing from a friend who forwarded it, you are apt to assume it was written by a friend of that friend, or a friend of a friend of a friend, not that it is being churned out by some sort of under-the-radar political operator. But that is what I think they are, based partly on their consistency of style.

Here are a couple of references:

The New Right-Wing Smear Machine by Christopher Hayes Oct 25, 2007

MyRightWingDad.net: FW: OBAMA DEATH LIST

If the same sort of phenomenon is going on with liberal or ultra-liberal sources, I would be very interested in investigating that as well.