What’s really going on with Google

By Brian Livingston

Google.com is a search engine, not a Windows program. But Google is running on so many desktops — and so many computer professionals use Google to look up technical-support information — that it almost seems at times like a built-in Windows applet.

That’s why I’ve taken a very public interest in the quality of search results that Google is providing to Windows users (and everyone else).

The news hasn’t all been good. I published a column in eWeek on Feb. 17 providing several examples of searches on technical subjects that no longer produced very relevant hits in the top 10 results at Google. I followed that by printing several readers’ comments — most of them critical of Google — in the Feb. 26 issue of Brian’s Buzz.

After several weeks of study since then, I’ve learned several little-known details about the ubiquitous search engine that so many of us have come to rely upon. I’d like to share them with you now, in hopes that the art of Web searching can be improved for us all.

The problem with “junk” pages
Google is by far the most popular search engine in the world, handling 35% of all Web searches, according to a recent story citing comScore Media Metrix figures. That compares with 27% of all searches conducted from Yahoo’s network of sites, 16% from AOL/Time Warner sites, and 15% from Microsoft sites, such as MSN.

Google’s dominance in the technology marketplace is even stronger. Citing StatMarket figures from May 2003, the search engine’s Web site flatly states that “Google sends more search traffic to technology sites than all other search engines combined” (graph, left).

The broad reach of Google can send enormous quantities of traffic to whichever sites show up in the top 10 on particular searches. This attracted the interest of thousands of Web site owners with something to sell. An entire cottage industry called “search engine optimization” (SEO) sprouted in the past few years to manipulate Google’s ranking system. SEO techniques usually focused on the fact that Google’s computerized formulas gave extra weight to the words found in a Web site’s title and headings, and the words in links that point to such sites.

Rankings on many search terms became so loaded with “junk” pages — sites with little content but lots of optimization tricks — that even many SEO consultants felt Google was being abused.

“Google has been delivering questionable returns for several months now, with spam and duplicate listings often making it into the Top 10,” wrote Jim Hedger of SEO firm Stepforth.com in a Nov. 2003 PDF report.

When I interviewed Google executives in preparation for my eWeek piece, they denied that any particular problem had arisen with the relevance of the search results. I noted that Google Groups, the index of Usenet postings, often provided better technical links than the main Google index. Peter Norvig, Google’s director of search quality, told me in response, “These are the types of questions that have always been best answered on Google Groups.”

In fact, top Google officials had for months been planning and implementing a major overhaul of the ranking formula to combat the takeover of the listings by the most “optimized” sites.

Google co-founder Sergey Brin told the AP on Feb. 17 that the search engine had made “five significant changes to its algorithmic formulas.” The update, dubbed “Brandy,” was rolled out across Google’s thousands of servers worldwide over a four-day period from Feb. 17 to 20, according to a Sitepoint.com article by Alex Walker.

The Brandy update, Walker explains, allows Google to give more weight to Web pages that bear words similar to but not identical to the terms that a searcher typed in. A person searching for travel insurance, for example, might be shown sites that use other words, such as holidays or medical. This is called latent semantic indexing.

The update also places more weight on anchor text, which is the wording in links that point to a given Web page. Equally important, says Walker, is that Google is now downgrading the importance it previously placed on words that appear in page titles, headings, and other HTML tags.

A major impact on small e-commerce sites

The Brandy algorithm and an earlier change made on Jan. 23, known as “Austin,” were intended to soften the blow that Google’s “Florida” update of Nov. 16, 2003, had dealt to many Mom-and-Pop e-commerce sites.

Just before the crucial Christmas online buying season, the Florida update drastically altered Google’s ranking system. Google’s aim was to cut out “spammy” Web sites that were manipulating the index. The effect, however, almost entirely eliminated many legitimate small businesses from the first several pages of rankings on numerous commercial terms.

A site that is often critical of Google’s weaknesses, Google-Watch.org, published an amazing study of this effect. The organization showed that certain two-word search terms produced an entirely different list of top 100 sites in December 2003 than had appeared in November 2003.

More than 90 of the sites that had previously appeared in the top 100 search results disappeared, according to Google-Watch, when searches were performed on the following 2- and 3-word phrases (among many others):

    airport parking
    apartment finders
    birthday balloons
    car import
    cheap business cards
    cheap glasses
    condo rental
    dental plans
    free movie clips
    hair removal
    homeowner loans
    limo for wedding
    mcse boot camp
    medical transcription jobs
    nanny agency
    payday loan
    satellite dish
    tshirt printing
    ultrasound jobs
    used office furniture
    web designing
    wooden flooring
    work boots

The organization not only published a complete list of Google’s “poisoned phrases.” It also made available a remarkable online tool that allows anyone to see the difference in the top 100 listings that Google produces — with and without the Florida filter in effect.
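The comparison that Google-Watch describes boils down to counting how many sites from one top-100 list are missing from another. Here is a minimal Python sketch of that arithmetic. The function name and the placeholder site names are hypothetical, invented for illustration; this is not Google-Watch's actual tool, and the numbers are not real rankings.

```python
def dropped_listings(before, after):
    """Return the sites in `before` that no longer appear in `after`, in rank order."""
    still_listed = set(after)
    return [site for site in before if site not in still_listed]


# Hypothetical rankings: 100 placeholder sites before the update.
november = [f"shop-{n}.example" for n in range(1, 101)]
# After the update, only the first 5 survive; 95 newcomers fill the rest.
december = november[:5] + [f"directory-{n}.example" for n in range(1, 96)]

dropped = dropped_listings(november, december)
print(f"{len(dropped)} of {len(november)} listings disappeared")
```

Running the same comparison on your own saved search results, before and after an update, gives a quick measure of how much a ranking has shifted.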

A search on airport parking, for example, previously showed ElPaso-Airport-Parking.com, a parking service in Texas, and SeaTacPark.com, a private operator of parking lots near the Seattle-Tacoma airport, in the top 100 listings.

The new algorithm isn’t necessarily an improvement in relevance. The top two Google results on a search for airport parking are now Parking4Less.co.uk and ParkAndGo.co.uk, two private parking operators in Britain — not ideal, “information-rich” sites about airport parking in general.

But the new ranking formula is definitely a big, big shake-up. This has generated plenty of speculation about the motivations for the changes.

A detailed flow chart that shows how this all works
Vaughn Aubuchon, a technical writer who maintains an “Internet mini-encyclopedia,” developed an intricate flow chart of the way the new system penalizes various sites.

The chart itself looks like spaghetti, but Aubuchon’s written explanation that annotates it makes sense. In a nutshell, here’s how he speculates that the penalty system works:

  1. If a Google user’s search terms are in the list of “poisoned phrases,” certain Web sites will be penalized in the search results that appear;

  2. The rating penalty is imposed if any ONE of the following is true:

    • The site is listed in a commercial category of the directory Google uses; or

    • The site is included in Froogle, Google’s e-commerce search engine; or

    • The site has been search engine optimized, with common search terms having been inserted into several HTML tags — such as the site’s title, headings, and alternate image text — as well as the body text; or

    • Links to the site mainly come from “link farms” and other information-poor sites, rather than “expert sites,” as determined by Google’s new Hilltop Algorithm.

The Hilltop Algorithm, which was introduced with the Austin update, is a patented methodology that two researchers provided to Google to help it find “authority” sites, including those in .edu, .gov, and .org domains. These sites — and sites they link to — are reported by Aubuchon to be exempt from the penalties.
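Aubuchon's speculated rules can be written down as a short Python sketch. Everything here is hypothetical: the Site class, its field names, and the sample phrase set are invented for illustration, and nothing below is Google's actual code or a confirmed description of its algorithm.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # All fields are hypothetical stand-ins for the signals Aubuchon describes.
    in_commercial_category: bool = False
    in_froogle: bool = False
    over_optimized: bool = False
    links_mostly_from_link_farms: bool = False
    is_authority_or_linked_by_one: bool = False


def is_penalized(query, site, poisoned_phrases):
    """Apply the speculated rules: phrase check, exemption, then any ONE trigger."""
    if query.lower() not in poisoned_phrases:
        return False  # Step 1: penalties apply only to "poisoned phrases".
    if site.is_authority_or_linked_by_one:
        return False  # Reported exemption for "authority" sites and sites they link to.
    # Step 2: any one of the four conditions is enough.
    return any([
        site.in_commercial_category,
        site.in_froogle,
        site.over_optimized,
        site.links_mostly_from_link_farms,
    ])


poisoned = {"airport parking", "payday loan"}  # two sample phrases from the list above
small_shop = Site(in_froogle=True)
print(is_penalized("airport parking", small_shop, poisoned))   # True
print(is_penalized("windows registry", small_shop, poisoned))  # False
```

The point of the sketch is the "any ONE" logic: under this theory a legitimate shop needs to trip only a single condition to drop out of the rankings for a poisoned phrase.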

How these top-of-the-hill sites are selected has become yet another factor in the speculation about the changes.

The “profit has finally won out” theory
Google-Watch goes so far as to allege that the list of “poisoned phrases” is very similar to the search terms that fetch the highest bids from advertisers in Google’s AdWords program.

Specifically, the site says, many Mom-and-Pop e-commerce sites “feel that they are being deliberately forced to bid on AdWords so as to enhance Google’s profit margins in the months before [Google is] filing an IPO.”

It’s impossible to know whether this is true or what Google’s internal discussions were.

When I asked Nathan Tyler, a Google public relations representative, about the recent upheavals, he replied: “Generally speaking, we can’t get into the specifics about changes to our ranking algorithms.” He added, “Google frequently changes its algorithms to improve the overall quality and accuracy of its search results. This is why it is common to see movement in the ranking of sites on Google search results pages.”

Tyler did not respond to a follow-up question seeking a response to Google-Watch’s specific allegations about e-commerce and Google’s AdWords program.

Will the results really improve?
There’s some evidence that the new Google algorithm is even more open to manipulation by “spammy” sites than it was a few months ago.

On Mar. 25, the principal behind Google-Watch, Daniel Brandt (who goes by the online handle “Everyman”) announced in a forum that he had succeeded in making a particular Web page the No. 1 result at Google on a search for out-of-touch executives.

The joke is that the page he pushed to No. 1 was Google’s corporate information page, which shows pictures and biographies of co-founders Sergey Brin and Larry Page, along with other officials.

This effect is similar to other recent “Google bombs,” in which dozens of Web logs used the same anchor text to link to particular sites. The cumulative effect of all those links was to make searches such as french military victories and weapons of mass destruction go to satirical sites.

But Brandt’s recent demonstration is stunningly different. He was able to manipulate Google’s corporate info page into the No. 1 position by creating anchor text on only eight different Web pages.

Brandt says this proves how easy it is for shady and “spammy” sites to get high rankings in Google by setting up numerous sites that use the same anchor text in their links to each other.

Meanwhile, competing search engines are mimicking Google and showing the same anchor-text vulnerability. Google’s corporate info page was soon the No. 1 result for searches on out-of-touch executives at Yahoo, MSN, AllTheWeb, and AltaVista, Brandt reported.

“Google should not use terms in external links to boost the rank of a page on those terms, unless those terms are on the page itself,” Brandt explained in an interview. “This is a no-brainer. But it means another CPU cycle [increasing the cost] per link, which is why Google won’t do it.”
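Brandt's suggested rule is simple enough to sketch in Python. This is only an illustration of the idea, not any search engine's real ranking code: the function name, the one-point-per-link scoring, and the sample page text are all assumptions made up for this example.

```python
def anchor_text_boost(anchor_texts, page_text, require_terms_on_page):
    """Count inbound links whose anchor text would boost the page.

    With require_terms_on_page=True, a link counts only if every word of
    its anchor text also appears on the target page (Brandt's suggestion).
    """
    page_words = set(page_text.lower().split())
    boost = 0
    for anchor in anchor_texts:
        anchor_words = anchor.lower().split()
        if require_terms_on_page and not all(w in page_words for w in anchor_words):
            continue  # the page never uses these words, so ignore the link
        boost += 1
    return boost


# Hypothetical data: eight links sharing one anchor text, as in Brandt's test.
links = ["out-of-touch executives"] * 8
page = "Corporate information: management team biographies and photos"

print(anchor_text_boost(links, page, require_terms_on_page=False))  # 8
print(anchor_text_boost(links, page, require_terms_on_page=True))   # 0
```

The extra lookup inside the loop is the per-link cost Brandt refers to: every inbound link has to be checked against the words on the target page.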

How you can use this information

1. Small businesses and large corporations. Does your company rely on search engines to send visitors to your site? If so, you owe it to yourself to visit Google-Watch’s demonstration page.

Type in a common 1- or 2-word phrase that’s associated with your business, such as computers or xp professional. The demonstration shows you a “toxicity score” for the search term and lists the sites that, as a result, no longer appear in Google’s top 100 results (perhaps yours!).

You should compare these results with actual searches on Google, to ensure that the ranking algorithms used in the demonstration are still effective. If your site is, in fact, being penalized because of the “poisoned phrases,” try reducing the number of times these words are used in titles and tags on your pages, so they’re not “over-optimized.” Since Google makes major updates to its index only about every 30 days, you may have to wait a month to see if this helps.

2. Individual Web searchers. Do you use Google to search for technical information about Windows? If so, you should familiarize yourself with other search engines that may produce more relevant results.

The biggest alternatives available to you (in my order of preference) are:

You can quickly compare the results from Google and the alternatives by using a metacrawler, such as HotBot. When you perform a search at HotBot, it returns listings from three different search technologies:

  • Clicking the HotBot button displays results from Inktomi;

  • Clicking the Google button returns results from Google; and

  • Clicking the Ask Jeeves button returns results from Teoma.

Another good bet is Dogpile. This metacrawler includes results from Google, LookSmart, Yahoo, and others. You can display the results from the different search engines intermingled on the page or have the results grouped by engine. (Tip: Use the Preferences link to establish this setting.)
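The two display modes a metacrawler offers can be sketched in a few lines of Python. This is a guess at the general idea, not a description of how Dogpile or HotBot actually merges listings: the round-robin interleaving, the duplicate handling, the function names, and the sample URLs are all assumptions.

```python
from itertools import zip_longest


def grouped(results_by_engine):
    """Return (engine, url) pairs with each engine's results kept together."""
    return [(engine, url) for engine, urls in results_by_engine.items() for url in urls]


def intermingled(results_by_engine):
    """Interleave results round-robin by rank, showing each URL only once."""
    seen = set()
    merged = []
    for rank_row in zip_longest(*results_by_engine.values()):
        for engine, url in zip(results_by_engine, rank_row):
            if url is not None and url not in seen:
                seen.add(url)
                merged.append((engine, url))
    return merged


# Hypothetical results for one query from two engines.
results = {
    "Google": ["a.example", "b.example"],
    "Yahoo": ["b.example", "c.example", "d.example"],
}

for engine, url in intermingled(results):
    print(engine, url)
```

In the intermingled view a site that several engines agree on appears only once, credited to the first engine that returned it; the grouped view keeps every engine's full list so you can compare them side by side.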

Search engine technology is rapidly changing. Increased competition among the players can only be good for those of us who depend on these services to find technical information about Windows and other topics. Don’t become dependent upon a single search solution. Make yourself aware of the strengths and weaknesses of each alternative.

To send me more information about this, or to send me a tip on any other subject, visit WindowsSecrets.com/contact. You’ll receive a gift certificate for a book, CD, or DVD of your choice if you send me a comment that I print.

All Windows Secrets articles posted on 2004-04-08: