  1. #1
    Bronze Lounger
    Join Date
    Jan 2001
    Location
    La Jolla, CA
    Posts
    1,470
    Thanks
    30
    Thanked 62 Times in 58 Posts

    Download from website

    A client of mine uses a government website that enables searches by individual last name. The search can take just the first letter of the last name and will return data for every person with a last name beginning with that first letter. There is also an option on the site to download the results to an Excel file.

    Is there a way to automate this process so that Excel can go to the site, run the 26 searches one by one, and automatically select the download-to-Excel option, or is this a 26-search-and-download manual labor of love?

    TIA

  2. #2
    Super Moderator RetiredGeek
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,436
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    Kevin,

    Is it an open website? If so, how about a link so we can test?
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  3. #3
    Bronze Lounger
    Join Date
    Jan 2001
    Location
    La Jolla, CA
    Posts
    1,470
    Thanks
    30
    Thanked 62 Times in 58 Posts
    http://casesearch.courts.state.md.us...s?disclaimer=Y

    If you search for "A" (for example), there's a button at the bottom of the page to download to Excel all records with a last name starting with "A".

  4. #4
    Super Moderator RetiredGeek
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,436
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    Kevin,

    Sorry, I can't see a way to automate this, since the site only shows the first 25 results on screen, so the standard Excel web-page link won't work. As a further note, the search only returns the first 500 records.

    [attached screenshot: Kevin1.PNG]

    So even if you could automate it by somehow getting the code to press the web export button, how would you know whether you got all the results? BTW: I tested the export to Excel and only got 500 records, not to mention a warning message in Excel requiring user action.

    [attached screenshot: Kevin2.PNG]

    With my skill set this looks like a strictly manual wash-rinse-repeat operation. Maybe some of the web gurus will have some insight.

    HTH
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  5. #5
    Gold Lounger Maudibe
    Join Date
    Aug 2010
    Location
    Pa, USA
    Posts
    2,639
    Thanks
    115
    Thanked 650 Times in 592 Posts
    KW,

    I took a look at the site's source code. It is a fairly simple site to initiate the search on, as the form elements are easily identifiable. The resulting "results" page would require extracting its URL so the code could identify the new page and its content. The new page's source is primarily a table with a slew of hyperlinks, one of them being the download of the Excel file. If a search produced a single page of results, this could be doable; a rough sketch of the first step is below.
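
    For what it's worth, here is a minimal VBA sketch of that first step, driving Internet Explorer to fill in and submit the search form. The element IDs ("lastName", "searchButton") are placeholders I made up, not the site's real names; anyone trying this would need to read the actual IDs out of the page source, and paste in the full search-page URL from the link above.

    Code:
    Sub RunOneSearch()
        'Minimal sketch: drive Internet Explorer to submit a single search.
        'NOTE: the URL path and the element IDs below are placeholders, not
        'the site's real names; read them out of the actual page source.
        Dim ie As Object
        Set ie = CreateObject("InternetExplorer.Application")
        ie.Visible = True
        ie.Navigate "http://casesearch.courts.state.md.us/"   'full search URL goes here
        Do While ie.Busy Or ie.ReadyState <> 4                '4 = READYSTATE_COMPLETE
            DoEvents
        Loop
        ie.Document.getElementById("lastName").Value = "A"    'placeholder ID
        ie.Document.getElementById("searchButton").Click      'placeholder ID
    End Sub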

    So, I was trying to think of a way to achieve that. My initial thought was to partition the searches by systematically sending combinations of letters to the last-name search box, e.g. Aa, Ab, Ac, etc. Every combination returned more than 25 hits, making each of them more than one results page; in fact, each two-letter combo was over 500. So I thought, let me try combinations of 3 letters: Aaa produced over 500 by itself, and so did Abb. Going deeper and cycling through all four-letter combinations of the alphabet (26^4 = 456,976 prefixes) would be overwhelming, and my guess is that even then many of the searches would top out over 500. This is an enormous database, no doubt with Oracle running on the server. Capturing all the data would easily exceed Excel's 1,048,576-row capacity.
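
    Just to put a number on the scale, this is the kind of prefix loop I had in mind, a sketch only, since the site itself was never going to cooperate:

    Code:
    Sub ListTwoLetterPrefixes()
        'Enumerate the two-letter last-name prefixes: Aa..Zz = 26 * 26 = 676 searches.
        Dim i As Long, j As Long, prefix As String
        For i = 65 To 90              'Asc("A") to Asc("Z")
            For j = 97 To 122         'Asc("a") to Asc("z")
                prefix = Chr(i) & Chr(j)
                Debug.Print prefix    'in practice, send this to the search box
            Next j
        Next i
    End Sub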

    I think this could have been a feasible venture if the database were not so extensive. I agree with RG: "this looks like a strictly manual wash-rinse-repeat operation."

    Maud

  6. #6
    Bronze Lounger
    Join Date
    Jan 2001
    Location
    La Jolla, CA
    Posts
    1,470
    Thanks
    30
    Thanked 62 Times in 58 Posts
    RG & Maud...thanks for all your effort examining this. Your additional analysis and conclusion about the size of this database, Maud, are quite shocking. Client will have to staff up to wash, rinse, etc. Maybe add bleach.

  7. #7
    New Lounger
    Join Date
    Dec 2009
    Location
    Gordonsville, VA
    Posts
    17
    Thanks
    4
    Thanked 2 Times in 2 Posts
    Due to the large database size, this has to be an all-or-nothing proposition. The following suggestion is subject to two huge assumptions:

    1. The web site will allow thousands of searches from the same IP in a short period of time
    2. The client can find a developer to properly implement the scheme

    Maudibe had the right idea: partitioning the data. I chose to focus on the other filters, since the initial letter really can't be extended. After refining each filter, I checked it against the letter A. Each search returned over 500 results until I got to the last filter: time range. Arbitrarily, I chose 01/01/2006 - 01/31/2006.

    Finally! 3 results!

    [attached screenshot: lounge-md-search.jpg]

    Upon further reflection, this is a good range, as the number returned is less than a page. (If the developer can use sockets, this will be irrelevant.)

    At the worst, nearly 20,000 searches have to be done for each time range. Here is a summary:

    Partitioning Scheme:

    26 Initial: Must use only one letter, due to single-letter entities like A & L
    24 County: Maryland has 23 counties and one independent city
    4 Case Type: Civil, Criminal, Traffic, Civil Citation
    2 Court Type: Circuit, District
    4 Party Type: Attorney, Defendant, Plaintiff, Other
    ?? Time Range: 1 month

    So for each time range, create 26 * 24 * 4 * 2 * 4 = 19,968 searches.


    Optimizations are possible. The OP didn't mention whether the client is interested in the entire database. If only one county is required, for example, the searches per time range would drop to 832! (A sketch of the loop structure is below.)
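
    To make the shape of the job concrete, here is a purely hypothetical sketch of the combination generator. The filter values come from the list above; actually submitting each combination to the site is left as a commented-out stub, since that part depends entirely on the developer's approach.

    Code:
    Sub EnumerateSearches()
        'Walk every filter combination for one time range:
        '26 initials x 24 counties x 4 case types x 2 court types x 4 party types = 19,968
        Dim caseTypes As Variant, courtTypes As Variant, partyTypes As Variant
        caseTypes = Array("Civil", "Criminal", "Traffic", "Civil Citation")
        courtTypes = Array("Circuit", "District")
        partyTypes = Array("Attorney", "Defendant", "Plaintiff", "Other")

        Dim initial As Long, county As Long
        Dim ct As Variant, co As Variant, pt As Variant
        Dim total As Long
        For initial = 65 To 90                    'Asc("A") to Asc("Z")
            For county = 1 To 24                  '23 counties + one independent city
                For Each ct In caseTypes
                    For Each co In courtTypes
                        For Each pt In partyTypes
                            'SubmitSearch Chr(initial), county, ct, co, pt   'hypothetical stub
                            total = total + 1
                        Next pt
                    Next co
                Next ct
            Next county
        Next initial
        Debug.Print total & " searches per time range"    'prints 19968
    End Sub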

  8. The Following User Says Thank You to Anklebuster For This Useful Post:

    zeddy (2016-02-11)

  9. #8
    WS Lounge VIP
    Join Date
    Mar 2002
    Location
    Newcazzle, UK
    Posts
    2,829
    Thanks
    136
    Thanked 482 Times in 459 Posts
    Hi

    This is a good approach. I too managed to get the results filtered down to a manageable number by using a monthly calendar range.
    On a positive note, historical records, i.e. those for past years and months, shouldn't change, so they don't need to be processed again once you have fetched the prior-year data.
    Perhaps a VBA loop over the monthly date ranges, run overnight, might be a possibility; something like the sketch below.
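
    A minimal sketch of that loop, where SearchMonth is a hypothetical routine that would submit one search and grab the export for the given range:

    Code:
    Sub LoopMonths()
        'Step through monthly date ranges, e.g. Jan 2006 through Dec 2015.
        Dim monthStart As Date, monthEnd As Date
        monthStart = DateSerial(2006, 1, 1)
        Do While monthStart < DateSerial(2016, 1, 1)
            monthEnd = DateSerial(Year(monthStart), Month(monthStart) + 1, 0)  'last day of month
            Debug.Print Format(monthStart, "mm/dd/yyyy") & " - " & Format(monthEnd, "mm/dd/yyyy")
            'SearchMonth monthStart, monthEnd    'hypothetical: submit search + export
            monthStart = DateSerial(Year(monthStart), Month(monthStart) + 1, 1)
        Loop
    End Sub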

    zeddy

  10. #9
    5 Star Lounger Lugh
    Join Date
    Jun 2010
    Location
    Indy
    Posts
    620
    Thanks
    166
    Thanked 77 Times in 68 Posts
    Quote Originally Posted by zeddy
    historical records i.e. for past years and months, shouldn't change. So it doesn't need to be processed again
    Right, my first thought was a sufficiently restrictive date filter [even down to one day, whatever makes it a doable repetitive job], assuming that it's only newly added info which needs to be downloaded.

    I wonder whether web-scraping software would help? I plan to try that whenever I get time, and IIRC the software below handles the multi-page results problem. Here are the results of some research I did 3 to 6 months ago:

    The features I'm looking for:
    Desktop software, one-off purchase [+ maybe later upgrades];
    Wizard interface for setting up the queries;
    Run multiple queries in one session;
    Results to Access or Excel;
    Auto scheduling.

    Software for Web Scraping has a nice list of scrapers. Here are the ones catching my eye:

    Easy Web Extract $60
    This has all the features above.

    OutWit Hub $81
    This has all the features above.

    WebHarvy $99

    [I downloaded those 3, but didn't get a chance to play with them much before the trial period ran out]

    Excel - Get external data from a Web page

    Good MS article about how to set up web queries and bring the results into Excel; a quick VBA equivalent is sketched below.
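
    For reference, the same kind of web query can be created from VBA. A minimal sketch, with an invented placeholder URL rather than the real results page:

    Code:
    Sub AddWebQuery()
        'Pull a web page into the active sheet via a web query.
        'The URL below is a placeholder, not the Maryland site's results page.
        With ActiveSheet.QueryTables.Add( _
                Connection:="URL;http://example.com/results", _
                Destination:=ActiveSheet.Range("A1"))
            .WebSelectionType = xlEntirePage
            .Refresh BackgroundQuery:=False
        End With
    End Sub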

    WEBSERVICE() function

    Need to check these new functions out. Excel 2013 added worksheet functions that can retrieve data from a web service directly into a worksheet: ENCODEURL(), WEBSERVICE() and FILTERXML(). A tiny sketch of how they might chain together is below.
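
    Purely illustrative, with an invented URL and XPath (the Maryland site doesn't necessarily expose a web service at all):

    Code:
    Sub TryWebServiceFunctions()
        'Drop the Excel 2013 functions into cells; the URL and XPath are made up.
        Range("A1").Formula = "=WEBSERVICE(""http://example.com/api?name=A"")"
        Range("A2").Formula = "=FILTERXML(A1,""//record/name"")"
    End Sub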
    Lugh.
    ~
    Windows 10 Pro x64 1607; Office 2016 (365 Home) x32; Win Defender, MBAM Pro

    ASRock H97 Anniversary; Xeon E3-1231V3 (like i7)
    Gigabyte GeForce GTX 970; 12GB Crucial DDR3 1600
    Logitech MX Master mouse; Roccat Isku kb

  11. #10
    5 Star Lounger
    Join Date
    Jan 2004
    Location
    Praha
    Posts
    990
    Thanks
    56
    Thanked 105 Times in 90 Posts
    Thinking a little laterally, have you contacted the Maryland Judiciary to see whether, for instance:
    1. They can let you have a copy of the database (or a subset), which is, after all, in the public domain?
    2. They can give you another form of access?
