  1. #1
    WS Lounge VIP mrjimphelps's Avatar
    Join Date
    Dec 2009
    Location
    USA
    Posts
    3,915
    Thanks
    506
    Thanked 453 Times in 422 Posts

    Yellow Pages Scraper

    Does anyone have any experience with one of the Yellow Pages scrapers out there?

    A Yellow Pages scraper extracts listings from YellowPages.com and saves them in .CSV format for import into Excel.

    Thanks.
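
For readers new to the idea, here is a minimal sketch of what such a scraper does: it pulls fields out of a page's HTML and writes them as CSV rows. The markup (the `listing`, `name`, and `phone` class names) and the sample data are hypothetical, chosen only for illustration; they are not taken from YellowPages.com, and this is not how any of the tools discussed in this thread are implemented.

```python
import csv
import io
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect text from elements whose class is 'name' or 'phone'.

    The class names are hypothetical, not real YellowPages.com markup.
    """

    FIELDS = ("name", "phone")

    def __init__(self):
        super().__init__()
        self.rows = []        # completed listings
        self._current = {}    # listing being assembled
        self._field = None    # field whose text we expect next

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "listing":
            self._current = {}          # start a fresh listing
        elif cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if len(self._current) == len(self.FIELDS):
                self.rows.append(self._current)
                self._current = {}


def listings_to_csv(html):
    """Return CSV text (with a header row) for the listings found in html."""
    parser = ListingParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=ListingParser.FIELDS,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


sample_html = (
    '<div class="listing"><span class="name">First Church</span>'
    '<span class="phone">555-0100</span></div>'
)
print(listings_to_csv(sample_html))
```

The resulting text can be saved with a `.csv` extension and opened directly in Excel.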

  2. #2
    5 Star Lounger
    Join Date
    Dec 2009
    Location
    Paducah, Kentucky
    Posts
    641
    Thanks
    48
    Thanked 112 Times in 107 Posts
    You can likely find information by doing a search using the term "yellow pages scraping software".
    (I don't think I want to be experienced with one!)
    Clone or Image often! Backup, backup, backup, backup...
    - - - - -
    Home Built System: Windows 10 Home 64-bit, AMD Athlon II X3 435 CPU, 16GB DDR3 RAM, ASUSTeK M4A89GTD-PRO/USB3 (AM3) motherboard, 512GB SanDisk SSD, 3 TB WD HDD, 1024MB ATI AMD RADEON HD 6450 video, ASUS VE278 (1920x1080) display, ATAPI iHAS224 Optical Drive, integrated Realtek High Definition Audio

  3. #3
    Administrator Rick Corbett's Avatar
    Join Date
    Dec 2009
    Location
    South Glos., UK
    Posts
    3,240
    Thanks
    142
    Thanked 847 Times in 682 Posts
    I don't have any experience, but this looked interesting: "How to scrape Yellow Pages with ScreenScraper Chrome Extension"

    Any use?

  4. #4
    WS Lounge VIP mrjimphelps's Avatar
    Rick, I was able to install the above Yellow Pages scraper in Opera. I first installed the Chrome Extension add-on from the Opera Add-On gallery, which lets me install most Chrome extensions in Opera, and then installed the ScreenScraper you linked to above.

    There is some configuration that I need to do on the ScreenScraper. I'll post back once I have gone through the whole process and then tried it out.

  5. #5
    5 Star Lounger Lugh's Avatar
    Join Date
    Jun 2010
    Location
    Indy
    Posts
    848
    Thanks
    211
    Thanked 108 Times in 95 Posts
    Jim, I've looked at scraping over the past couple of years (haven't implemented anything yet), so in case it's useful, here's a note I made a month ago about an interesting-looking general prospect.

    Data Toolbar $24

    • Data collection from HTML5 highly interactive web pages with Async Javascript
    • Multi-tab and multi-window browsing mode with high level of parallelism ideal for multi core PCs
    • Unlimited crawling depth
    • Automatic logins. Repeated submission of forms for all possible input values.
    • Opening menus, switching tabs, accepting alerts and handling pop-up panels
    • Simulating mouse hover effect
    • Multiple lists of data per web page. For example, from a LinkedIn page you can collect all person's jobs, skills and education history.
    • Direct XML, Excel and SQL multi-table output. For example, collecting products catalog with attached table of user reviews.
    • Background data scraping using a headless WebKit browser.
    • Scheduled execution on any interval
    • Simultaneous processing of multiple projects
    • The Data Toolbar for Chrome and Firefox can run side-by-side with Data Toolbar for Internet Explorer.
    Lugh.
    ~
    Dell Alienware Aurora R6 (new 2017)
    Windows 10 Home x64 1703; Office 365 x32
    GeForce GTX 1060; 16GB DDR4 2400
    256G SSD, 1TB HD

  6. #6
    WS Lounge VIP mrjimphelps's Avatar
    Lugh:

    I downloaded the free version of Data Toolbar. In short, this is a really good program for "scraping" data out of the Yellow Pages. It took me a few minutes to figure out the program; but once I figured it out, it worked like a champ.

    You have to do a little cleanup on the results once you export them to Excel. The data isn't always laid out "correctly" in the Yellow Pages, and it exports to Excel exactly as it is laid out there. But it was very easy to clean up the data once I got it into Excel.
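
As a rough illustration of this kind of cleanup, the sketch below trims stray whitespace in each cell and drops rows that are entirely empty. The post does not say which fixes were actually needed, so these two rules and the sample data are assumptions, not a description of Data Toolbar's output.

```python
import csv
import io


def clean_rows(rows):
    """Collapse runs of whitespace in every cell and drop blank rows."""
    cleaned = []
    for row in rows:
        cells = [" ".join(cell.split()) for cell in row]
        if any(cells):  # keep the row only if at least one cell has text
            cleaned.append(cells)
    return cleaned


def clean_csv_text(text):
    """Return cleaned CSV text for the given CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(clean_rows(reader))
    return out.getvalue()


# Hypothetical export with ragged spacing and an empty row.
sample = "Name,Phone\n  First   Church , 555-0100 \n,\n"
print(clean_csv_text(sample))
```

To use it on a real export, read the file's text, pass it through `clean_csv_text`, and write the result to a new file so the original is left untouched.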

    The only limitation I found in the free version versus the paid version is that the free version limits you to scraping no more than 100 listings at a time from the Yellow Pages. I used it to get a listing of the churches in our area, one town at a time, so there were never anywhere near 100 listings in any of the results.

    Thanks for the tip.

    Jim

    Update: I forgot to mention something:

    Data Toolbar is a Windows program; it doesn't run in Linux. Therefore, I had to reboot into Windows to run it. As much as I prefer to work in Linux, this is such a handy program that I don't mind working in Windows while using it.

    Also, Data Toolbar is not a browser add-on or extension. It doesn't run within the browser session, but rather it is a separate program which loads a browser session. Consequently, you load Data Toolbar, and it then loads the browser.
    Last edited by mrjimphelps; 2017-08-28 at 07:58.

  7. #7
    5 Star Lounger Lugh's Avatar
    Quote Originally Posted by mrjimphelps View Post
    Thanks for the tip.
    You're welcome, and thanks for being the guinea pig. Now I know it's worth my time to check out.

    Quote Originally Posted by mrjimphelps View Post
    The only limitation I found with the free version vs the paid version is that with the free version, you are limited to scraping no more than 100 listings at a time
    That's not bad. If I can automate the Excel clean-up, I'll eventually have ~1,000 listings to do, so I'll happily pay the reasonable price.
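
One way to automate part of that clean-up is to merge several exported batches and drop duplicate listings. This is a minimal sketch, assuming each export is a CSV with a header row. The `Name` and `Phone` column names are hypothetical and would need to match whatever fields were chosen in the scraping profile.

```python
import csv
import io


def merge_exports(csv_texts, key_fields=("Name", "Phone")):
    """Merge CSV exports, keeping the first copy of each listing.

    Two rows count as duplicates when their key fields match after
    trimming whitespace and ignoring case.
    """
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple((row.get(f) or "").strip().lower() for f in key_fields)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Two hypothetical batches that overlap on one listing.
batch_a = "Name,Phone\nFirst Church,555-0100\n"
batch_b = "Name,Phone\nfirst church,555-0100\nSecond Church,555-0101\n"

for row in merge_exports([batch_a, batch_b]):
    print(row["Name"], row["Phone"])
```

Matching on name plus phone is only one possible rule; listings that differ in either field are kept as separate rows.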

  8. #8
    WS Lounge VIP mrjimphelps's Avatar
    Here's the key to getting it to work like a champ:

    1. Run the program. It will load the browser that it is set up to work with (e.g. Firefox or Chrome). Go to the web page you want to scrape.
    2. Set up the profile, choosing the fields you want to include. Once you have it working correctly, save it as the default profile.

    Now here's the part that snagged me:

    In the future, when you want to scrape several Yellow Pages listings:

    1. Run the program.
    2. Go to the page you want to scrape.
    3. Click the arrowhead next to the program button in the upper left of your browser and choose "Load the current page with the default profile". It will then scrape the info from the page you are currently on.
    4. When it shows the first listing from that page, click the button to save the data.

    The above steps will scrape only one webpage. To scrape another, it is very important that you first close the program, then change to the new webpage, then reload the program using the current profile with the current webpage. If you leave the program running when you change to another webpage, it won't scrape the new page.

    Perhaps the paid version doesn't have this quirk. But it is very easy to close the program and then reopen it in order to scrape new data, so I don't consider this any sort of problem with this program.

  9. The Following 2 Users Say Thank You to mrjimphelps For This Useful Post:

    Lugh (2017-09-08),Rick Corbett (2017-09-08)

  10. #9
    5 Star Lounger Lugh's Avatar
    Great info Jim, thanks

    Quote Originally Posted by mrjimphelps View Post
    If you leave the program running when you change to another webpage, it won't scrape the new page.
    That certainly goes against their claims for "simultaneous" and "parallelism" so hopefully it's a bug, or some setting you haven't found yet. I won't be doing 1,000 daily opens and closes manually, that's for sure!

    They invite email Qs to support@, if you want to pursue it.

  11. #10
    Administrator Rick Corbett's Avatar
    Very helpful write-up of your experience, Jim. Thank you.

  12. The Following User Says Thank You to Rick Corbett For This Useful Post:

    mrjimphelps (2017-09-08)

  13. #11
    WS Lounge VIP mrjimphelps's Avatar
    Quote Originally Posted by Lugh View Post
    Great info Jim, thanks

    That certainly goes against their claims for "simultaneous" and "parallelism" so hopefully it's a bug, or some setting you haven't found yet. I won't be doing 1,000 daily opens and closes manually, that's for sure!

    They invite email Qs to support@, if you want to pursue it.
    The instructions weren't very clear, so I basically figured it out by trial and error. Maybe it does work as advertised, but they need to do better on the instructions. Perhaps the paid version has better instructions?

    We will use this program maybe once or twice per year, and scrape about 30 to 40 listings each time we run it; and it's working fine for what I need, so I'm not likely to pursue assistance from them. But thanks for the tip.
    Last edited by mrjimphelps; 2017-09-08 at 12:45.

  14. #12
    WS Lounge VIP mrjimphelps's Avatar
    I don't know if this is appropriate for this forum or not, but I do technical documentation on the side (as well as at my full-time job), so if you ever hear of a need for someone to document a process in a simple and clear way, my rates are cheap!
