Tuesday, November 28, 2006

Google, Wikipedia, MySpace: How Teenagers Hijacked the Internet

A recent (late 2006) study by Heather Hopkins from Hitwise demonstrates the existence of a pernicious feedback loop between Google, Wikipedia, MySpace, and Blogspot. Wikipedia gets 54% of its traffic from Google search results. The majority of Wikipedia visitors then proceed to MySpace or Blogspot, both of which use Google as their search service.

Google changed its search algorithm between late 2005 and early 2006. I have been monitoring 154 keywords on Google since 1999. For 128 of these keywords, the number one (#1) search result is now a Wikipedia article. More than a quarter (38 out of 128) of these "articles" are what Wikipedia calls "stubs" (one or two sentences to be expanded by Wikipedians in the future). Between 7 and 10 of the articles that made it to the much-coveted number one spot are ... empty pages, placeholders, yet to be written!
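To put the counts above in proportion, here is a small sketch that converts them into percentages. The input numbers come straight from the paragraph; the variable and function names are illustrative only.

```python
# Counts quoted in the paragraph above.
monitored_keywords = 154
wikipedia_at_top = 128
stubs = 38
empty_low, empty_high = 7, 10  # "between 7 and 10" empty placeholder pages


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


wikipedia_share = share(wikipedia_at_top, monitored_keywords)
stub_share = share(stubs, wikipedia_at_top)
empty_share_range = (
    share(empty_low, wikipedia_at_top),
    share(empty_high, wikipedia_at_top),
)

print(f"Wikipedia at #1: {wikipedia_share}% of monitored keywords")
print(f"Stubs among those: {stub_share}%")
print(f"Empty pages among those: {empty_share_range[0]}% to {empty_share_range[1]}%")
```

On these figures, roughly 83% of the monitored keywords lead to a Wikipedia article, just under 30% of those articles are stubs, and about 5.5% to 7.8% are empty pages.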

This is Google's policy now: Wikipedia articles, regardless of their length or quality or even mere existence, are placed by Google's algorithm high up in the search results. Google even makes a Wikipedia search engine available to Webmasters for their Websites. The relationship between Google and Wikipedia is clearly intimate and mutually reinforcing.

Google's new algorithm, codenamed Big Daddy, still calculates the popularity of Websites by counting incoming links. An incoming link is a link to a given Website placed on an unrelated page somewhere on the Web. The more numerous such links, the higher the placement in Google's search results pages. To counter spamming and link farms, Google now rates Internet "neighborhoods" as "good" or "bad". Not all incoming links are treated equally. Some Internet properties are shunned. Links from such "bad" Websites actually contribute negatively to the overall score.
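The scoring idea described above can be sketched roughly as follows. This is only an illustration: the neighborhood labels, the weights, and the function name are hypothetical and are not Google's actual values or method. It shows just the two properties mentioned, namely that links are not counted equally and that links from "bad" neighborhoods subtract from the total.

```python
from collections import Counter

# Hypothetical weights per neighborhood. Illustrative only: these are NOT
# Google's real values, which are not public.
NEIGHBORHOOD_WEIGHTS = {"good": 1.0, "neutral": 0.5, "bad": -1.0}


def inbound_link_score(links):
    """Sum the neighborhood weights of a site's incoming links.

    `links` is an iterable of neighborhood labels, one label per incoming link.
    """
    counts = Counter(links)
    return sum(NEIGHBORHOOD_WEIGHTS[label] * n for label, n in counts.items())


# Two hypothetical sites with the same "good" and "neutral" links; the second
# one also has six links from a "bad" neighborhood (e.g. a link farm).
clean_site_links = ["good"] * 10 + ["neutral"] * 4
farmed_site_links = ["good"] * 10 + ["neutral"] * 4 + ["bad"] * 6

clean_score = inbound_link_score(clean_site_links)
farmed_score = inbound_link_score(farmed_site_links)

print("clean site:", clean_score)
print("site with link-farm links:", farmed_score)
```

Under these assumed weights, the second site has more incoming links in total but ends up with the lower score, because each "bad" link is counted negatively.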

The top results for all 154 keywords I have been diligently monitoring since 1999 have changed dramatically since April 2006. There is one common thread in all these upheavals: the more incoming links from MySpace a Website has, the higher it is placed in the search results.

In other words: if Website A has 700 incoming links from 700 different Websites and Website B has 700 incoming links, all of them from various pages on MySpace, Website B will be ranked (much) higher in the search results. This holds true even when both Websites A and B sport the same PageRank. It holds true even if the bulk of Website A's incoming links come from "good properties" in "good Internet neighborhoods". Incoming links from MySpace trump every other category of incoming links.

An unsettling pattern emerges:

Wikipedia, the "encyclopedia" whose "editors" are mostly unqualified teenagers and young adults, is touted by Google as an authoritative source of information. In search results, it is placed well ahead of sources of veritable information such as universities, government institutions, the home pages of recognized experts, the online full-text content of peer-reviewed professional and scholarly publications, real encyclopedias (such as Encarta), and so on.

MySpace, whose 110 million users are predominantly preteens and adolescents, now dictates which Websites occupy the top spots in Google's search results pages. It is very easy to spam MySpace. It is considered by some experts to be a vast storehouse of link farms masquerading as "social networks".

Google has vested, though unofficial, unannounced and, therefore, undisclosed interests in both Wikipedia and MySpace. Wikipedia visitors end up on various properties whose search technology is Google's, and Wikipedia would have shriveled into insignificance had it not been for Google's relentless promotion of its content.