When we started opening up the Windows Secrets Lounge to Google and other Web indexes a few months ago, we didn’t realize how hard it would be to get the search engine gods to find all our pages.
Finally, we hit the right solution. Google now includes more than 60,000 pages from the Lounge — over half of our total discussion threads — with the rest soon to become available to searchers around the globe.
As you’ll recall from my Jan. 7 Introduction column, the old Woody’s Lounge — founded by WS senior editor Woody Leonhard in 1995 — moved to WindowsSecrets.com in late 2009. One of our goals was to make available to the whole world, via search engines, the more than 125,000 threads Loungers had written since 2001.
For most of the Lounge’s history, the discussion board was hosted on a series of underpowered servers. Years ago, the volunteer admins decided to ban any crawling by search engines to prevent resource overload. In 2009, however, Windows Secrets moved the Lounge to a screaming server and invited search engines to suck down all 700,000 pages at will.
Just making your site visible, however, no longer guarantees that search engines will list all your pages. We had to make several changes to files with names like robots.txt and sitemap.xml to get Google to index more than a few hundred threads. But last month, the search giant got the message and started gulping down 10,000 pages at a whack. (See Figure 1.)
Figure 1. This screen shot taken on March 27 shows that (1) about 60,000 Lounge pages are in Google’s index, (2) the most-recent threads are listed first, and (3) new comments can show up in Google within an hour or two.
In the past week, we’ve seen the page count jump up and down, from 50,000 to 71,000 and back. This variation probably occurs because Google runs thousands of servers, and each one uses a slightly different database.
You can see the latest count yourself by adding site: to the beginning of the Lounge URL in a Google query, as in site:lounge.windowssecrets.com (this trick works with any domain name).
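If you check page counts for several domains, a few lines of Python can build the query links for you. This is only a sketch: the function name is made up for illustration, and the Google search address it uses is an assumption about the public search page, not something the Lounge depends on.

```python
from urllib.parse import urlencode


def site_query_url(domain: str) -> str:
    """Build a Google search URL that limits results to one domain.

    The search address below is an assumption for illustration.
    """
    # urlencode escapes the colon in "site:" so the query survives in a URL.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


print(site_query_url("lounge.windowssecrets.com"))
```

Paste the printed address into a browser, and the result count near the top of the page is the number of indexed pages Google reports for that domain.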
The trend is definitely up. Once all 125,000+ Lounge threads are available to the world’s Windows users through search engines, a dream of the Lounge’s administrators and moderators will be realized.
Make search engines follow your spider trail
If you’d like to see what we did to make search engines ignore the Lounge’s less-important pages and concentrate on our technical content, you can view our robots.txt directives file for yourself. You can also append /robots.txt to the end of Lounge.WindowsSecrets.com or any other domain name to view that site’s directives in a browser. (The file name must be all lowercase, as specified by the Robots Exclusion Protocol.)
Until December 2009, the old Lounge’s robots.txt file excluded search engines from every page. But simply lifting that restriction didn’t make Google suddenly see all of our thousands of pages.
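To see the difference between a blanket ban and a selective set of directives, you can test rules with the robots.txt parser in Python’s standard library. This is a rough sketch: the paths in the second rule set are invented for illustration and are not copied from the Lounge’s actual file.

```python
from urllib.robotparser import RobotFileParser

# A blanket ban: every crawler is barred from every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical selective rules. These paths are made up for this example
# and are NOT taken from the Lounge's real robots.txt.
SELECTIVE = """\
User-agent: *
Disallow: /search
Disallow: /members
"""


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given rules let this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


base = "https://lounge.windowssecrets.com"
thread = base + "/threads/12345"   # hypothetical thread address
search = base + "/search?q=vista"  # hypothetical low-value page

print("ban, thread:      ", allowed(BLOCK_ALL, thread))
print("selective, thread:", allowed(SELECTIVE, thread))
print("selective, search:", allowed(SELECTIVE, search))
```

Under the blanket ban, nothing can be crawled. Under the selective rules, the thread is open to crawlers while the search page stays off limits, which is the general idea behind steering search engines toward the technical content.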
To attract Google — “Hey, over here, big boy!” — we had to perfect the art of sitemaps. These are XML files that list every URL you want search engines to index. A sitemap can contain only 50,000 URLs, so we had to create a sitemap index, which points search engines to our multiple sitemaps. Our server constantly updates the sitemaps as Lounge members create new threads.
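Here is one way the splitting could look in Python. Treat it as a minimal sketch under stated assumptions, not the code our server runs: the function name, the sitemap-N.xml file names, and the thread addresses are all hypothetical. The XML element names and the namespace come from the Sitemap Protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # the per-file limit described above


def build_sitemaps(urls, base_url, max_urls=MAX_URLS):
    """Split urls into sitemap files and build an index that lists them.

    Returns (index_xml, files), where files maps a file name to its XML.
    The sitemap-N.xml naming scheme is an assumption for illustration.
    """
    files = {}
    for start in range(0, len(urls), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        name = f"sitemap-{len(files) + 1}.xml"
        files[name] = ET.tostring(urlset, encoding="unicode")

    # The index holds one <sitemap> entry per file generated above.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{name}"
    return ET.tostring(index, encoding="unicode"), files


base = "https://lounge.windowssecrets.com"
# 125,000 made-up thread addresses, roughly the size of the Lounge archive.
threads = [f"{base}/threads/{n}" for n in range(125_000)]
index_xml, files = build_sitemaps(threads, base)
print(len(files), "sitemap files")  # 3 files: 50,000 + 50,000 + 25,000
```

A real setup would also write each file to disk and regenerate the affected sitemap whenever a new thread appears.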
The big value in writing sitemaps is that we get to tell search engines which pages should be visited most often. We inform Google, for example, that threads with new content should be visited frequently, whereas old threads that haven’t generated new content in months can be checked less often. Search engines treat this information as a hint rather than a command. (This may be why new threads are showing up in Google within an hour or two, as shown above in Figure 1.)
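In the Sitemap Protocol, this kind of hint is carried by the optional lastmod and changefreq elements inside each url entry. The sketch below picks a changefreq value from the age of a thread’s newest post. The cutoffs (one day, thirty days) and the function names are assumptions chosen for illustration, not the thresholds the Lounge actually uses.

```python
from datetime import datetime, timedelta
import xml.etree.ElementTree as ET


def change_frequency(last_post: datetime, now: datetime) -> str:
    """Pick a changefreq value from the age of the newest post.

    The one-day and thirty-day cutoffs are illustrative assumptions.
    """
    age = now - last_post
    if age <= timedelta(days=1):
        return "hourly"   # active thread: ask for frequent visits
    if age <= timedelta(days=30):
        return "weekly"
    return "yearly"       # dormant thread: rarely worth rechecking


def url_entry(loc: str, last_post: datetime, now: datetime) -> str:
    """Return one <url> element with loc, lastmod, and changefreq."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = last_post.strftime("%Y-%m-%d")
    ET.SubElement(url, "changefreq").text = change_frequency(last_post, now)
    return ET.tostring(url, encoding="unicode")


now = datetime(2010, 4, 1, 12, 0)
base = "https://lounge.windowssecrets.com/threads/"  # hypothetical addresses
print(url_entry(base + "1", now - timedelta(hours=2), now))
print(url_entry(base + "2", now - timedelta(days=200), now))
```

The first entry comes out marked hourly and the second yearly, so a crawler that honors the hints would spend its visits on the threads that are still changing.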
If you’d like to create a sitemap for your Web site, see the Sitemap Protocol for more info.
Get your free Lounge membership today
If you haven’t yet done so, get the benefits of a full Lounge membership by registering today. Just visit our quick registration form. It’s free!
Already a member? Take a look at the latest topics in today’s Lounge Life column and jump into the Lounge once more. Have fun!
Have more info on this subject? Post your tip in the WS Columns forum.
Brian Livingston is editorial director of WindowsSecrets.com and co-author of Windows Vista Secrets and 10 other books.