Sunday, May 23, 2010

The Boot, but not the Heel (Time For Better Search Engines, Part 2)

Well, here we go.

As many people must know, Secretary of the Interior Salazar did say, "Our job basically is to keep the boot on the neck of British Petroleum to carry out the responsibilities they have both under the law and contractually to move forward and stop this spill." I discovered this by throwing words out of the quote (as I should have thought to do). Rand Paul "improved" it just a bit by adding a heel ("boot heel on the") and changing "neck" to "throat". It is an ugly image that should have been disavowed by the administration instead of being repeated, somewhat sheepishly, by Robert Gibbs, the worst presidential press secretary I can remember; at least, I can honestly say that has always been my gut feeling.

But it affords an excuse for another lesson on search engines. Voila: the Google News Timeline, which I found, described as "experimental," by searching for "search engine" "by date". I had already tinkered with the quote, removing the heel and continuing to use "-rand" "-paul" to find only pages with no mention of Rand or Paul. That kept coming up with Robert Gibbs, and then it turned out to be Gibbs quoting Salazar.

But the google news timeline really let me do just what I'd been wanting to do, namely find "who said it first". When I did a search for "boot on the neck of bp" using the timeline, I got an array of columns, one per date, with news stories. The bad news is it seems to be limited to news stories from major sources, but it did give a graphic picture of stories containing the phrase blowing up starting on 5/21, when Rand Paul was quoted slightly misquoting the quote. Arrows let me walk back in time -- little or nothing from May 13-20, then a cluster of references going all the way back to May 2, and then stopping.

I find the interface nicely graphic, but slow and cumbersome, and if somebody used the phrase a year or 2 ago, it really wouldn't be much help (correct me, google, if you can). But it did the job, and nicely shows the value of such a feature if we just improve on it a bit, and integrate it into regular google.

It also suggests another class of improvements we could use in our search engines: something approaching search by meaning. Computers can't really "understand" meaning, but they are getting better and better with translation, which indicates quite a bit of adaptation to the structure of language, and so, suppose I could have posed a search like this:

"{boot heel}* on the neck ..." where {...}* means "What's in the bracket or something roughly equivalent".

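Here is a rough sketch of how that {...}* notation might be expanded into concrete phrases to search for. The table of "rough equivalents" is hand-written and purely hypothetical (a real engine would need some learned notion of similarity), and the names EQUIVALENTS and expand are made up for the illustration.

```python
import re
from itertools import product

# Hypothetical, hand-written table of rough equivalents (an assumption,
# not something any search engine actually exposes).
EQUIVALENTS = {
    "boot heel": ["boot heel", "boot", "heel"],
    "neck": ["neck", "throat"],
}

def expand(template):
    """Expand every {...}* slot in the template into its rough equivalents."""
    # With one capture group, re.split alternates: literal, slot, literal, ...
    parts = re.split(r"\{([^}]*)\}\*", template)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # A slot: use the known equivalents, or just the text itself.
            choices.append(EQUIVALENTS.get(part, [part]))
        else:
            # Literal text between slots stays as it is.
            choices.append([part])
    return ["".join(combo) for combo in product(*choices)]

variants = expand("{boot heel}* on the neck")
for v in variants:
    print('"%s"' % v)
```

Each variant could then be sent off as an ordinary exact-phrase search.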
One problem is, if you have a million exact matches and a smaller number of modified matches, how do you give the user some handle on the variation(s)? Typical "search by relevance" ranking would probably score the exact match as way better than the inexact match, so the inexact match would be way, way down on the list. As either an alternative or an addition to the current type of Google listing, I'd suggest something like:

"boot heel on the throat" 11,707 hits [date range: 5/21-5/23]
"boot heel on the neck..." 1,305 hits [date range: 5/21-5/23]
"boot on the neck" 7,222 hits [date range: 5/2-5/23]
(I'm making the numbers up.)

and then you would click on a variant to see all the specific examples in the format normally used by google, or another search engine.
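A listing like that is easy to produce once each hit is tagged with the variant it matched and its posting date. The sketch below assumes exactly that; the sample hits are invented, and summarize_variants is a made-up name, not part of any real search API.

```python
from collections import defaultdict
from datetime import date

def summarize_variants(hits):
    """Given (variant, posting_date) pairs, return one row per variant:
    (variant, hit_count, earliest_date, latest_date)."""
    groups = defaultdict(list)
    for variant, posted in hits:
        groups[variant].append(posted)
    summary = [
        (variant, len(dates), min(dates), max(dates))
        for variant, dates in groups.items()
    ]
    # Biggest group first; rare variants still get a row of their own.
    summary.sort(key=lambda row: row[1], reverse=True)
    return summary

# Invented sample data, standing in for real search results.
sample_hits = [
    ("boot heel on the throat", date(2010, 5, 21)),
    ("boot heel on the throat", date(2010, 5, 23)),
    ("boot on the neck", date(2010, 5, 2)),
    ("boot on the neck", date(2010, 5, 21)),
    ("boot on the neck", date(2010, 5, 23)),
    ("boot heel on the neck", date(2010, 5, 22)),
]

for variant, count, first, last in summarize_variants(sample_hits):
    print('"%s" %d hits [date range: %d/%d-%d/%d]'
          % (variant, count, first.month, first.day, last.month, last.day))
```

The point of grouping first is that a variant with a handful of hits shows up as its own line instead of being buried under the exact matches.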

There is much more to be said about search engines, and vastly much more that I don't know, I'm sure.

The thing about the internet is "The truth is out there", but often, like the dinosaur bone in the rock, it can be quite a job to pry it out of there.

Saturday, May 22, 2010

It's Time for Better Search Engines (Who said: "..put my boot heel on the throat of BP")

Is there a search engine that will let me ask?

Who said: "..put my boot heel on the throat of BP"

OK, Rand Paul said it, or let's say the answer looks like a sort of summary, "Rand Paul said it," AND a list of pointers to articles quoting Paul as saying it, and maybe a few quite different entries, such as Rand Paul saying he didn't say it. So what if I could say "show me the most atypical entries first"? That sounds like a very generally useful followup question when you get 2 million hits, and as far as you can tell they all say more or less the same thing. Could a computer program do a reasonable approximation of what a human (with a year to wade through the 2 million hits) could do? My guess is yes, and it wouldn't even be a big stretch.
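To make "most atypical first" a little more concrete, here is a toy sketch. It assumes we already have the text snippets for the hits, and it uses plain word overlap (Jaccard similarity) as a stand-in for "says more or less the same thing". That choice is mine, not anything a real search engine is known to do, and comparing every pair would be far too slow for 2 million hits without some cleverness.

```python
def word_set(text):
    """Lowercased set of whitespace-separated words in a snippet."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word-overlap similarity between two sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def most_atypical_first(snippets):
    """Order snippets so those least similar to the rest come first."""
    sets = [word_set(s) for s in snippets]

    def average_similarity(i):
        others = [jaccard(sets[i], sets[j])
                  for j in range(len(sets)) if j != i]
        return sum(others) / len(others) if others else 1.0

    order = sorted(range(len(snippets)), key=average_similarity)
    return [snippets[i] for i in order]

# Invented snippets, standing in for search results.
snippets = [
    "rand paul said boot heel on the throat of bp",
    "paul said boot heel on the throat of bp",
    "rand paul said he put boot heel on the throat of bp",
    "salazar said keep the boot on the neck of british petroleum",
]

for s in most_atypical_first(snippets):
    print(s)
```

With these made-up snippets the Salazar line floats to the top, which is just the kind of odd-one-out entry I would want to see first.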

I've been skimming so many web pages, I feel like I've seen something somewhere quoting someone in Obama's cabinet actually using a phrase like: "..put my boot heel on the throat of BP". Can I confirm that? Or be very comfortable in saying it didn't happen (or hear who the Cabinet member was, and see if he/she gets fired the next day)? Well, I can find someone directly attributing the phrase to Obama: "I'll put my boot heel on the throat of BP." Barry Obamma

An important question, and perhaps it represents one of those big stories the news media misses: How many people today, next week, next month, next November literally believe or will believe Obama did say that? Are there any pollsters asking that sort of question? My guess, it could easily be something on the order of as many people as think Saddam Hussein was directly behind 9/11 (at least some pollsters paid attention to that).

Relying on existing search engines and their limited abilities, how close could I come to answering this sort of question?
Well if somebody said it before Rand Paul, and Paul picked it up a couple of days later, wouldn't there be some references to this on the web, before there was any association between the phrase and Rand Paul?

Consider this Google search: "put my boot heel on the throat of BP" -paul

The quotes ("") mean I don't want just any combination of the words "put", "my", "boot"... but want that exact phrase. The "-paul" means nothing containing the word "Paul". So I get 4 hits, all clearly from the context of the Rand Paul interview, except for the twitterer directly attributing it to "Barry Obamma".

OK, but what if the quoted secretary was named Paul ____?_____ ?
I tried already -"rand paul", which picked up too many pages in which Rand Paul was just referred to as Paul. Some other approach? OK, when Paul was putting words in the President's mouth, he started with "What I don't like from the president's administration..."

HOW ABOUT: "put my boot heel on the throat of BP" -paul -"What I don't like from"

That cuts the hit count down quite a bit. There are a couple in which "don't" came out "dont" or "donit", or they just cut the quote down so the whole phrase "What I don't like from" didn't appear, and finally we are left with the twitterer quoting "Barry Obamma", which I'm inclined to discount.

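For anyone who wants to see the logic of that last search spelled out, here is a small sketch that applies the same two rules (an exact phrase must appear, excluded words or phrases must not) to a few pieces of text. It only imitates the idea on local strings; it is not how Google actually implements quotes or the minus sign, and the sample pages are paraphrases I made up.

```python
import re

def contains(text, term):
    """True if term appears in text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None

def matches(page_text, phrase, excluded=()):
    """Mimic a query of the form: "phrase" -term1 -"term two"."""
    if not contains(page_text, phrase):
        return False
    return not any(contains(page_text, term) for term in excluded)

# Made-up sample pages, standing in for real search hits.
pages = [
    "Rand Paul: What I don't like from the president's administration "
    "is this sort of \"I'll put my boot heel on the throat of BP.\"",
    "\"I'll put my boot heel on the throat of BP.\" Barry Obamma",
    "Paul said \"put my boot heel on the throat of BP\" sounds un-American.",
]

kept = [
    p for p in pages
    if matches(p, "put my boot heel on the throat of BP",
               excluded=["paul", "What I don't like from"])
]
for p in kept:
    print(p)
```

Only the "Barry Obamma" line survives both exclusions, which mirrors what happened with the real search.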
Suppose I could say "Who said it first"? Computer logic to approximate that could rely on the fact that every internet page in Google's (or another search engine's) vast database will have a date and time of posting. In fact, can't I just tell Google "display in order of posting", which would make the question much easier to answer? NO, apparently not; at least I don't see how. I could probably put a front end on Google, accessing it via its more computer-friendly interface (or API), and voila, a new and useful search engine.
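The "front end" part is the hard bit, but the ordering itself is simple. The sketch below assumes the hits have already been fetched somehow and that each one carries a posting timestamp we can trust, which is a big assumption. The records, URLs, and the name earliest_mentions are all invented for the example; no real search API is being called.

```python
from datetime import datetime

def earliest_mentions(records, phrase, limit=3):
    """Return up to `limit` records containing the phrase, oldest first."""
    needle = phrase.lower()
    hits = [r for r in records if needle in r["text"].lower()]
    hits.sort(key=lambda r: r["posted"])
    return hits[:limit]

# Invented records standing in for search results with posting times.
records = [
    {"url": "http://example.com/c", "posted": datetime(2010, 5, 21, 9, 0),
     "text": "Paul objects to talk of a boot heel on the throat of BP"},
    {"url": "http://example.com/a", "posted": datetime(2010, 5, 2, 14, 30),
     "text": "Salazar: keep the boot on the neck of British Petroleum"},
    {"url": "http://example.com/b", "posted": datetime(2010, 5, 3, 8, 15),
     "text": "Gibbs repeats the boot on the neck line"},
]

for r in earliest_mentions(records, "boot on the neck"):
    print(r["posted"].date(), r["url"])
```

Whatever comes out first is only a candidate for "who said it first", since posting dates on the web can be missing or wrong.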

For more (and drier) discussions of search engines, see

or just google "search engines". You will get, according to Google, "About 69,600,000 results". Bon appétit!

Sunday, May 16, 2010

Email from a Friend of a Friend: From Urban Legends to Political Smears

Anonymous forwarded emails have for years been a vehicle for circulating jokes, inspirational pictures and poems, urban legends, and other frivolous but entertaining stuff.

Now, camouflaged by the fluff and amateur political commentary, there is a stream of carefully constructed lies and disinformation which does not look like the work of amateurs.

Compared with past, mostly local, whispering campaigns, email is harder to trace and easier to run on a national scale. A couple of years ago, I started getting forwarded emails from my Mom, with claims that could generally be shot down with less than 15 minutes of internet research. They seemed to be really affecting my parents' views, and based on what they told me, they were generally believed by most of their friends. But they were quite simply full of provable lies. They would show signs of having been forwarded a half dozen or so times, with visible 'CC' lists giving them a sort of homey look. When people receive this sort of thing forwarded by a friend or relative, they are apt to trust it as coming from ordinary outraged citizens, in a way they might not trust direct mass email. But many of these emails simply could not have originated as misinformation that the sender honestly believed, which means they can't be anything but deliberately constructed lies. The number of them, and the similar techniques they use, seem to prove that they are mass produced.

Here are a couple of references:

The New Right-Wing Smear Machine by Christopher Hayes, Oct 25, 2007
FW: OBAMA DEATH LIST

If the same sort of phenomenon is going on with Liberal or Ultra Liberal sources, I would be very interested to investigate that as well.