Why are there so many Americans online?
It’s probably not that much of a puzzler. There’s a lot of them around anyway, and they’re often several years ahead of Old Europe technologically. And they do like to talk.
But that doesn’t help people like me when I need to monitor UK traffic. Sure, you can use IP addresses to find UK websites, but anyone who blogs on WordPress, Blogger or a number of other platforms will, by that measure, look American. So distinguishing between UK and US bloggers is very much a case of looking at their bios, or their language.
So, I’ve tried to put together a pipe that filters out the Americanisms in an RSS feed, in the hope that what comes out the other end is mostly British English. For example, if there’s mention of ‘ize’ in a post (very American; British English would usually use ‘ise’) it’s filtered out, unless it contains the word ‘size’. Other word roots are filtered out too, such as ‘gram ’ (note the space there), ‘anemi’ (British retains the Latin ‘a’, giving ‘anaemi’ as in ‘anaemia’, as it does for quite a few other word roots) and ‘ior’ (‘behavior’ vs ‘behaviour’), and there are give-away words such as ‘color’, ‘center’, ‘gray’ and ‘jewelry’.
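For anyone who would rather see the idea as code than as a pipe, here is a rough Python sketch of the same filtering logic. It is not the pipe itself: the names `AMERICAN_MARKERS`, `looks_american` and `filter_british` are made up for illustration, and it assumes each post is just a string of text. It also takes the ‘size’ exception literally, so any post containing ‘size’ gets a pass on the ‘ize’ check.

```python
# Substrings treated as American-spelling markers, taken from the list above.
# 'gram ' keeps its trailing space on purpose.
AMERICAN_MARKERS = ["gram ", "anemi", "ior", "color", "center", "gray", "jewelry"]


def looks_american(text: str) -> bool:
    """Return True if the text contains any of the give-away markers."""
    lowered = text.lower()
    # 'ize' counts as American unless the text also contains 'size'
    # (one literal reading of the exception; an assumption of this sketch).
    if "ize" in lowered and "size" not in lowered:
        return True
    return any(marker in lowered for marker in AMERICAN_MARKERS)


def filter_british(posts):
    """Keep only the posts that show none of the markers."""
    return [post for post in posts if not looks_american(post)]
```

Calling `filter_british(["The theatre programme was grand", "The theater program was grand"])` keeps only the first post, because ‘program ’ in the second contains the ‘gram ’ marker. Plain substring matching like this will also throw out perfectly British posts that happen to use words such as ‘prior’ or ‘senior’, since both contain ‘ior’.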
It’s an unsophisticated approach, but then again you could say filtering out keywords is just as sophisticated as keyword matching, which services such as Twendz (probably) do.
I’ve used it in the past and I think it works. So, I’m making it available in case anyone else fancies using it. I would really appreciate it if anyone thinks of a cool way to improve it. And it would be nice if, when you do use it, you cite me.