Similpedia: providing related Wikipedia content

Similpedia is a nice idea. It takes a web page or paragraph of text and cross-references key words in it to Wikipedia entries. In this way you can look up terms or follow the Similpedia suggestions for ‘related content.’

Check out their demo page and you’ll see what they can do. They provide scripts and widgets for WordPress, Firefox and websites generally, as well as an RSS feed which you can see at the bottom right corner of this page. More on that later.

What I didn’t see on their website was functionality for passing a page’s URL to the Similpedia engine. This would be useful for users such as myself who cannot use scripts. So, I emailed them. And about an hour later, got a reply. The answer is fairly simple: you just pass [URL here] to it. So, if I wanted to add a link to the bottom of each post, as I currently do anyway with my ‘BlinkList | Blogmarks | Digg’ etc links (I use a Word file and search/replace a keyword in it with the post URL), I could just add that as a link and off we go.
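For anyone who would rather script the search/replace step than do it in Word, here is a rough sketch in Python. It is only an illustration: `SIMILPEDIA_QUERY_URL` is a made-up placeholder (not the real address Similpedia supplied), and the `{QUERY}` and `{POST_URL}` keywords and the `build_footer` name are invented for the example.

```python
from urllib.parse import quote

# Hypothetical placeholder: NOT the real Similpedia address.
SIMILPEDIA_QUERY_URL = "https://similpedia.example/query?url="

# Footer template with keywords that get swapped out for each post.
FOOTER_TEMPLATE = '<a href="{QUERY}{POST_URL}">Related Wikipedia content</a>'


def build_footer(post_url: str) -> str:
    """Return the footer link with the keywords replaced for one post."""
    # Percent-encode the post URL so it survives as a query-string value.
    encoded = quote(post_url, safe="")
    return (FOOTER_TEMPLATE
            .replace("{QUERY}", SIMILPEDIA_QUERY_URL)
            .replace("{POST_URL}", encoded))


if __name__ == "__main__":
    print(build_footer("http://example.com/my post"))
```

The only real design decision is encoding the post URL before substituting it, so that characters such as `:` and `/` don't confuse whatever is reading the query string.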

Almost, but not quite. Here’s one query listing from my recent ‘tecchy’ type post:

As you can see, it lists entries for social networks, blogging and so on. This makes sense.

However, here’s a query on a very different post indeed:

… and you get similar results.

How can this be? The posts are totally different. So, let’s see what happens when I copy the text from that last post and paste it into Similpedia:

Pearly gates
The Pearly gates, in Christian mythology, is an informal name for the gateway to Heaven, inspired by the description of the New Jerusalem in Revelation 21:21: The twelve gates were twelve pearls, each gate being made from a single pearl. The image of the gates in p….

The Wish List
::For other uses of The Wish List , please see The Wish List (disambiguation) The Wish List is a fantasy novel by Eoin Colfer. It chronicles the adventures of Meg Finn, a spirit who has struck a perfect balance between good and evil and as such, is barred from entering ….

James Broadwater
Reverend James S. Broadwater was a Republican candidate for U.S. Congress from the southern state of Mississippi. Broadwater is staunchly conservative and an evangelical Christian. He is unabashed in promoting his personal belief that Christianity is the main source of ….

The Farmer’s Curst Wife
The Farmer’s Curst Wife is Child ballad number 278. Synopsis A farmer had a bad woman for his wife, and one day the devil came for her. They reached Hell, and the gates were shut, so she struck him. She made life in hell so bad that the devil brought her back to her hus….

Heaven & Hell (album)
Heaven & Hell was a compilation album released in 1989. It contains songs performed by Meat Loaf and Bonnie Tyler. Tracklisting 1. Bat out of Hell 2. Faster Than the Speed of Night 3. You Took the Words Right Out of My Mouth 4. Have You Every Seen the Rain 5. Read ‘Em a….

The Marriage of Heaven and Hell
The Marriage of Heaven and Hell is one of William Blake’s prophetic books, a series of texts written in imitation of biblical books of prophecy, but expressing Blake’s own intensely personal Romantic and revolutionary beliefs. Like his other books it was published as pr….

Heaven’s Gates, Hell’s Flames
Heaven’s Gates, Hell’s Flames is a touring evangelistic drama that has been performed worldwide. The tagline on the official website asks, “Where will you be when reality strikes?”. It is based on an evangelical interpretation of the “Gospel”, and presents the message t….

Morgan Pym
Morgan Pym is a character on the television series The Collector, played by Chris Kramer. Story Morgan Pym was a monk in 1348 who sold his soul to the Devil to save the woman he loved, Katrina, who was dying from the plague. After 10 years The Devil came to take Morgan’….

Jane (band)
Band history Jane was formed in October 1970 in Hanover, Germany. Line-up * Peter Panka – Lead Vocals, Drums * Charly Maucher – Lead Vocals, Bass * Werner Nadolny – Keyboards, Vocals * Klaus Walz – Guitars, Vocals Discography Vinyl * Together (1972) * Here we are (1973)….

LAB (band)
LAB is a gothic rock band from Finland. Their single ‘Beat the Boys’ is featured prominently in the PS2/Xbox/PC game, Flatout. Releases Albums * 3/2000: Porn Beautiful * 3/2002: Devil Is A Girl * 3/2005: Where Heaven Ends Singles * 6/1999: Get Me a Name * 9/1999: ‘Til Y….

Very different results, and more the kind of thing I would expect to see, although still not perfect. I’m not sure I’d be interested in Morgan Pym after reading the post, less still the band ‘Jane’.

Is this because the URL approach takes the entire page, including my blogrolls to the left and feeds to the right, I wonder? Whereas just copying and pasting the text uses just that text and nothing else? Another email to Similpedia and lo, again I get a response. Turns out I’m right (I occasionally am), and they’re working on making the algorithm a bit ‘cleverer’ to get around this.
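The difference between "whole page" and "just the post" can be sketched in a few lines of Python. This is not how Similpedia works (I have no idea what their engine does internally); it simply shows the idea of keeping only the text inside the post's container and ignoring blogrolls and feeds. The container id `post` and the names `PostTextExtractor` and `extract_post_text` are assumptions for the example, and the sketch expects reasonably well-formed markup with matching closing tags.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PostTextExtractor(HTMLParser):
    """Collect text found inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id  # hypothetical id of the post body
        self.depth = 0                    # 0 means we are outside the container
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while we are somewhere inside the container.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_post_text(html, container_id="post"):
    """Return the post's text, ignoring everything outside the container."""
    parser = PostTextExtractor(container_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Feed it a page with a sidebar, a post and a feeds column, and only the post's words come back, which is roughly what copying and pasting by hand achieves.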

But if you use Feedburner you have a workaround. Feedburner just takes the content of my posting and, among other things, creates a straight HTML page without all the other stuff. So, let’s point Similpedia at my Feedburner page and see what happens:

Something tells me close but no cigar. We’re still getting a lot of tecchy stuff and I daresay this is because of the content that Feedburner places at the top of the page, listing all sorts of quick links to subscribe via different services.

Let’s look at their RSS feed tool. I don’t quite get the point of their RSS demo because it refers to a static page, and surely you’ll always get the same results from it. So let’s give it a page that changes quite often, in my case, my PR feed.

So I add an RSS widget and point it at:

You can see the results to the bottom of the right-hand column on this page, under ‘Related content’. They’re different, you have to admit, but I’m still not totally convinced they’re useful (which is why the widget is at the bottom, so that I can test it for a while).

So, what next? Until Similpedia develop a ‘clever’ way to strip the non-specific content from a page, we’re left with the method that works but is manual – copying the text and pasting it into Similpedia. I’ve created a Word macro that takes the Similpedia results and converts them to straight HTML but this is a workaround and not, I would say, an ideal one.
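The conversion itself is simple enough to sketch outside Word. The following Python is not my macro, just a minimal illustration that assumes the pasted results look like the listing above: a title on one line, the summary on the next, and a blank line between entries. The function name `results_to_html` and the choice of an unordered list are made up for the example.

```python
import re
from html import escape


def results_to_html(pasted):
    """Turn 'title line, summary line, blank line' entries into an HTML list."""
    items = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", pasted.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        title, summary = lines[0], " ".join(lines[1:])
        # Escape characters such as & so titles like 'Heaven & Hell' stay valid.
        items.append(f"<li><strong>{escape(title)}</strong>: {escape(summary)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping is the one step worth keeping even in a quick hack, because Wikipedia titles are full of ampersands and quotes.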

Still, it’s a nice idea, very like the online report I recently profiled which cross-referenced Google hits with Wikipedia and other source entries. If you can use scripts or widgets then you’re laughing. Ha ha.

I get the feeling they’re still developing it so let’s watch it with interest. Perhaps they’ll tweak the search engine to weed out the slightly odd results that can crop up, and zero in more on the specific content than the stuff around it. I would also like to see a quick and easy way to point people towards ‘copy and paste the text’ query results such as the one listed above.

Meanwhile they deserve a big round of applause for taking the time to answer my queries because that makes me a fan, and perhaps that’s our small ‘PR learning’ for the day (above and beyond the main PR reason for posting this, which is that I tend to find a lot of execs spend time looking for stuff and I try to help them by providing cool tools I come across but they very seldom seem to want to benefit from this).

Looking forwards, Similpedia have a teaser on their site promising news and blog services coming soon, and I can’t wait to see what that’s all about.

EDIT: I just realised, how about pointing my Google Reader feed directly at Similpedia instead? This doesn’t have all the added content at the top which the Feedburner page has. Also, why do I have to point a page? Why not a feed? Surely a feed would work better? Tried it, but get virtually identical results. So, how about pointing it at the public Google Reader page for that feed? At last! We start getting related content. But is it useful? At the time of writing it’s giving me lots of lookups for ‘Johnson’, presumably based on a ‘Johnson And Johnson Suit Against The Red Cross’ story. Hmmm. ‘Johnson’ may have resonance for The Big Lebowski lovers but I’m not so sure…
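On the "why not a feed?" question, here is a hedged sketch of what treating a feed as the source might look like: pull just the title and description of each item out of an RSS 2.0 feed and leave the rest of the page furniture behind. This is an assumption about one way to do it, not a description of what Similpedia or Google Reader do, and `feed_items_as_text` is an invented name.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape


def feed_items_as_text(rss_xml):
    """Return 'title. description' text for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    texts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        description = item.findtext("description") or ""
        # Descriptions often hold HTML; drop the tags, then decode entities.
        plain = unescape(re.sub(r"<[^>]+>", " ", description))
        plain = " ".join(plain.split())
        texts.append(f"{title}. {plain}".strip())
    return texts
```

Each string in the returned list is the sort of clean text that gave the better results when pasted in by hand.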

