  1. #1
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Screen scraper (Access 2003 SP1)

    I need to scrape the table of bets as it appears on the screen at the URL http://www.iasbet.com/site/racing/racingwi...?eventid=139597.

    I originally got some help from Jefferson on this. His code used the MSHTML HTMLTable objects to read the table.
    It processes the last 2 columns and then bombs out, which is understandable. However, I want to know why it starts from the 7th column instead of the 1st.
    Attached Files

  2. #2
    Super Moderator jscher2000 (Join Date: Feb 2001; Location: Silicon Valley, USA; Posts: 23,112)

    Re: Screen scraper (Access 2003 SP1)

    (Edited by jscher2000 on 06-Nov-05 21:06.) That page has, quite possibly, the nastiest HTML table ever created. Not nasty looking, of course, but nasty to read.

    Anyway, notice that there are options to switch the center column between 3 sets of numbers, jockeys and trainers. These are separate DIVs embedded in a single cell. To scrape that data successfully, look for the first table in each of the following elements:

    <div id="Detail_Fluc" style="display: block;">
    <div id="Detail_Jockey" style="display: none;">
    <div id="Detail_Trainer" style="display: none;">

    Each of these tables is distinct from the ones that create the rest of those rows. (Ignore the style attribute; the script changes those as you toggle the contents of that column.)
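    To make the "first table in each DIV" idea concrete outside of Access, here is a minimal Python sketch using only the standard library's html.parser. This is a swapped-in technique, not MSHTML and not the VBA approach in this thread. It assumes reasonably well-formed markup with explicit </td>, </tr> and </table> tags; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DivTableExtractor(HTMLParser):
    """Capture the first <table> inside each <div> whose id is in target_ids.

    Each captured table becomes a list of rows; each row is a list of the
    stripped text of its <td>/<th> cells. Tables nested inside the captured
    table are not split out (their text lands in the enclosing cell).
    """

    def __init__(self, target_ids):
        super().__init__()
        self.target_ids = set(target_ids)
        self.tables = {}      # div id -> rows of the first table found in it
        self.div_stack = []   # id (or None) of every currently open <div>
        self.current = None   # div id whose table is being read, if any
        self.depth = 0        # <table> nesting depth within the captured table
        self.row = None
        self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_stack.append(dict(attrs).get("id"))
        elif tag == "table":
            if self.current is not None:
                self.depth += 1  # a table nested inside the captured one
                return
            # innermost enclosing target <div> that has no table captured yet
            for div_id in reversed(self.div_stack):
                if div_id in self.target_ids and div_id not in self.tables:
                    self.current, self.depth = div_id, 1
                    self.tables[div_id] = []
                    break
        elif self.current is not None and self.depth == 1:
            if tag == "tr":
                self.row = []
            elif tag in ("td", "th") and self.row is not None:
                self.cell = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "table" and self.current is not None:
            self.depth -= 1
            if self.depth == 0:
                self.current = None
        elif self.current is not None and self.depth == 1:
            if tag in ("td", "th") and self.cell is not None:
                self.row.append("".join(self.cell).strip())
                self.cell = None
            elif tag == "tr" and self.row is not None:
                self.tables[self.current].append(self.row)
                self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def first_tables_in_divs(html, div_ids):
    """Return {div id: rows of the first table inside that div}."""
    parser = DivTableExtractor(div_ids)
    parser.feed(html)
    parser.close()
    return parser.tables
```

    Calling first_tables_in_divs(html, ["Detail_Fluc", "Detail_Jockey", "Detail_Trainer"]) should return a dictionary keyed by DIV id. Because the style attribute is ignored, the hidden tables are picked up along with the visible one.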

    Excel's Data - Get External - Web Query feature doesn't even seem to recognize them as tables at all...

    Added: After further review... as an illustration of the above point, see the attached. If you parse through the HTML, you see that there are 6 embedded tables, 4 of which are visible at any given time.
    Attached Images

  3. #3
    Super Moderator jscher2000 (Join Date: Feb 2001; Location: Silicon Valley, USA; Posts: 23,112)

    Re: Screen scraper (Access 2003 SP1)

    Now, onwards and upwards. Your code isn't finding the data you want because you are only picking up one of the less consequential tables by referring to the second-to-last in the document:

    Set aTable = colTables.Item(colTables.length - 2)

    In fact, due to sloppy coding (or a desire to mess with your code), there are a number of totally blank tables in that page. So, to get at the "fluc" data, you could try something like the approach in the attached. It creates an object reference to the DIV with the appropriate id, looks through its child nodes until it finds a table, and then creates an object reference to that table for you to parse.

    But this is meaningless without correlating it with the data in the first few columns, which are actually in the next row. So you should probably read all of the tables in the document, build an array of strings, and then write it into coherent records in your database. Or something. Maybe you can get an XML (RSS) feed to parse instead?
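    Picking a table by position, as with colTables.length - 2, breaks as soon as blank tables are mixed in. One way to see what each position actually holds is to survey every table in document order. The Python sketch below uses the standard library's html.parser with hypothetical names, and lists each table's index, row count and whether it contains any text. A DOM tables collection is also in document order, but that is worth confirming against the real page with a breakpoint.

```python
from html.parser import HTMLParser


class TableSurvey(HTMLParser):
    """Record, for every <table> in document order, its row count and text."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one dict per table, in the order the tags open
        self.stack = []   # indexes of currently open tables (innermost last)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append({"rows": 0, "text": ""})
            self.stack.append(len(self.tables) - 1)
        elif tag == "tr" and self.stack:
            # rows are credited to the innermost open table
            self.tables[self.stack[-1]]["rows"] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            self.tables[self.stack[-1]]["text"] += data.strip()


def survey_tables(html):
    """Return (index, row_count, has_text) for each table in document order."""
    parser = TableSurvey()
    parser.feed(html)
    parser.close()
    return [(i, t["rows"], bool(t["text"])) for i, t in enumerate(parser.tables)]
```

    Printing the survey for the live page should show at a glance which positions are the empty tables, and why "second-to-last" lands on something uninteresting.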
    Attached Files

  4. #4
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Re: Screen scraper (Access 2003 SP1)

    I will only be trying to scrape the Detail_Fluc table and the 3 others to the right of it.

    Tell me, where do you get to learn about all this? Are there any books I could purchase?

  5. #5
    Super Moderator jscher2000 (Join Date: Feb 2001; Location: Silicon Valley, USA; Posts: 23,112)

    Re: Screen scraper (Access 2003 SP1)

    > I will only be trying to scrape the Detail_Fluc table and the 3 others to the right of it.

    I suspect you might want the names of the horses, too?

    > Tell me, where do you get to learn about all this? Is there any books I could purchase?

    Well, I'm not sure. Over the past 5 years I have developed, maintained and rewritten a substantial ASP application on our intranet. I have also built various tools in VBA and JavaScript, which is where I learned some of these collections. And of course I spend lots of time on the web and in various forums...

  6. #6
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Re: Screen scraper (Access 2003 SP1)

    As long as you don't mind me asking you questions.

    Thanks for your help.

  7. #7
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Re: Screen scraper (Access 2003 SP1)

    You are right; I need to get the number and name of the horse. Which table is this data in?

    Does any of the code I had originally extract the data I need? If not, would you provide a guide to get me started, if you can afford the time of course.

  8. #8
    Super Moderator jscher2000 (Join Date: Feb 2001; Location: Silicon Valley, USA; Posts: 23,112)

    Re: Screen scraper (Access 2003 SP1)

    If you consider the outer table structure to be "table 1", then the horses are in that table, but that data starts on the row below the embedded tables. That's why I think you might need an intermediate data structure to "reconstruct" records from these different sources. Or I suppose you could stuff the data into the various fields in your recordset after passing through all four tables, and only then update the database. I don't program Access directly (I just use ADO), so I don't know what would be best there.
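    The intermediate data structure could be as simple as pairing lists by position. This Python sketch assumes the horse names are in a list whose element 0 is unused, so that horses[n] is horse number n, and that every other table has exactly one row per horse in the same order with any header rows already removed. Those are assumptions to verify against the live page, and the function name is hypothetical.

```python
def build_records(horses, tables):
    """Combine horse names with per-horse rows from several tables.

    horses: list where horses[n] is the name of horse number n and
            horses[0] is an unused placeholder.
    tables: dict mapping a table name (e.g. a DIV id) to its rows,
            one row per horse, in runner order.
    """
    runners = horses[1:]
    for table_name, rows in tables.items():
        if len(rows) != len(runners):
            raise ValueError(f"{table_name}: {len(rows)} rows for {len(runners)} horses")
    records = []
    for index, name in enumerate(runners):
        record = {"number": index + 1, "name": name}
        for table_name, rows in tables.items():
            record[table_name] = rows[index]  # same position = same horse
        records.append(record)
    return records
```

    Raising an error on a count mismatch is deliberate: with a layout this irregular, a silent off-by-one would pair odds with the wrong horse.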

    All in all, it might be easiest to obtain an alternative data format directly from its proprietor rather than try to "scrape" this data.

  9. #9
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Re: Screen scraper (Access 2003 SP1)

    When you say the outer table, how do you address that table? Does it also include the horse number?

    I have no problem with reconstructing the data once it is extracted; my problem is one of extraction. I really don't care what it looks like; I just need it extracted. Have you any examples of code that would help with the extraction?

    I would have no hope of getting info from the proprietor, so that's not an option.

    Jefferson, I really appreciate your help. After all, the exercise we went through a couple of years ago meant I could scrape data from three different betting sites.

  10. #10
    Super Moderator jscher2000 (Join Date: Feb 2001; Location: Silicon Valley, USA; Posts: 23,112)

    Re: Screen scraper (Access 2003 SP1)

    That table doesn't really have any identifying characteristics, other than 11 columns. However, the rows that list the horses do have id attributes. Accordingly, before you retrieve the data from the other 3 tables, you can get the names of the horses like so. Note that the number in the first column of the table is the same as the array counter (you need to ignore element 0 in the array).
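    <pre>' Find the runner rows (id="Runner_1", "Runner_2", ...) and store the
    ' horse names in a string array via Split().
    ' Assumes ieDocSrc is an MSHTML.HTMLDocument that has finished loading.
    Dim aRow As MSHTML.HTMLTableRow, intHorseCount As Integer
    Dim strConcat As String, strHorses() As String
    'Set aTable = FindTableWithNumColumns(ieDocSrc, 11)
    intHorseCount = 0
    Do
        ' getElementById returns Nothing once we run past the last runner
        Set aRow = ieDocSrc.getElementById("Runner_" & CStr(intHorseCount + 1))
        If aRow Is Nothing Then Exit Do
        ' Cells are zero based: cell 0 holds the horse number, cell 1 the name
        strConcat = strConcat & "|" & aRow.Cells(1).innerText
        intHorseCount = intHorseCount + 1
    Loop
    strHorses = Split(strConcat, "|") ' element 0 is empty, so strHorses(n) is horse n
    </pre>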

    Alternatively, you could put the names directly into a recordset. I will leave that approach to you.
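    For comparison, the same Runner_1, Runner_2, ... walk can be sketched in Python with the standard library's html.parser. This is a hypothetical helper, not a port of MSHTML. It assumes each runner row is a flat <tr id="Runner_n"> with no nested tables and, like aRow.Cells(1) in the VBA above, takes zero-based cell 1 as the horse name.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Map the id of each <tr id="..."> to the stripped text of its cells.

    Sketch only: assumes the id'd rows contain no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = {}          # row id -> list of cell texts
        self.current_id = None  # id of the <tr> being read, if it has one
        self.cell = None        # text fragments of the open <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.current_id = dict(attrs).get("id")
            if self.current_id:
                self.rows[self.current_id] = []
        elif tag in ("td", "th") and self.current_id:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.rows[self.current_id].append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr":
            self.current_id = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def horse_names(html):
    """Return ["", name1, name2, ...] so that names[n] is horse number n."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    names = [""]  # element 0 is unused, like strHorses(0) in the VBA
    n = 1
    while f"Runner_{n}" in parser.rows:  # stop at the first missing Runner_n
        cells = parser.rows[f"Runner_{n}"]
        names.append(cells[1] if len(cells) > 1 else "")  # zero-based cell 1
        n += 1
    return names
```

    Like the VBA loop, this stops at the first gap in the Runner_n numbering, so if the numbering on the real page is ever not contiguous, later runners would be missed.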

  11. #11
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Re: Screen scraper (Access 2003 SP1)

    That's excellent, thank you. I can populate the table with the number and horse name.

    I noticed in an earlier post that you said my existing code would not work. So my next problem is how to get at the other tables and how to extract data from them.

    Sorry for being such a pest, but if I could just get at the other columns, that would just about solve it.

  12. #12
    Super Moderator jscher2000 (Join Date: Feb 2001; Location: Silicon Valley, USA; Posts: 23,112)

    Re: Screen scraper (Access 2003 SP1)

    To find the other tables, first get the id attributes of the <div> elements that contain them. Then see the code in post 532380, which grabs the first table inside a <div>.

  13. #13
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Re: Screen scraper (Access 2003 SP1)

    When you say to get the id attributes of the <div> elements that contain them, do you mean something like the following from post 532380:
    Set aDiv = ieDocSrc.getElementById("Detail_Fluc")
    Then you have a For loop to set the aTable variable for the 1st table. Is this 1st table the Open column?

    After determining the aTable variable what code do I require to get at the Open odds data? I presume I need a loop to pick up these values.

  14. #14
    Super Moderator jscher2000 (Join Date: Feb 2001; Location: Silicon Valley, USA; Posts: 23,112)

    Re: Screen scraper (Access 2003 SP1)

    Yes, inside the <div> whose id is "Detail_Fluc", there is a three column table which you can see in the above illustration.

    As you read through the HTML source of that page, you will next find two <div> elements with hidden tables (Detail_Jockey and Detail_Trainer). Then there are <div> elements for the other three green boxes in the illustration.

    With those <div> id values, you can use the loop in the above code to grab on to the table.

    Once you have the table, you can use the old code you used with the "straightforward" table design that site used before to work your way down the rows and read out the values in the 3 or 2 columns.

    During this process, I recommend using breakpoints or Stop statements in your procedure so that you can inspect the hierarchy of rows and cells in these tables. I think all four of those tables have 7 rows, with the number of cells per row that you see on the screen, but you'll want to check. Remember that the cells collection is zero based, so for the Detail_Fluc table you'll be reading cells 0, 1 and 2.
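    Once a table's rows have been read into plain strings, reading zero-based cells 0, 1 and 2 and checking the row shape might look like the Python sketch below. The function name is hypothetical; the optional row-count check reflects the "7 rows" guess above, and stripping a leading $ is an assumption about how the odds are displayed. A blank cell becomes None rather than raising.

```python
def read_fluc_columns(rows, expected_cells=3, expected_rows=None):
    """Convert cells 0..expected_cells-1 of each row to floats.

    rows: list of rows, each a list of cell strings, indexed from 0 the
    same way as the MSHTML cells collection. Blank cells become None.
    """
    if expected_rows is not None and len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    values = []
    for row_index, cells in enumerate(rows):
        if len(cells) < expected_cells:
            raise ValueError(
                f"row {row_index} has {len(cells)} cells, expected {expected_cells}"
            )
        parsed = []
        for i in range(expected_cells):  # cells 0, 1, 2 for Detail_Fluc
            text = cells[i].strip().lstrip("$")  # assumption: odds may show a leading $
            parsed.append(float(text) if text else None)
        values.append(parsed)
    return values
```

    For the two-column tables, call it with expected_cells=2.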

  15. #15
    Platinum Lounger (Join Date: Dec 2001; Location: Melbourne, Australia; Posts: 4,594)

    Re: Screen scraper (Access 2003 SP1)

    Thanks Jefferson, the way you have described this it should be reasonably straightforward to do this.

    I will do some testing of the code you have supplied, along with the old code you supplied about 2 years back, and I'll get back to you.

    This will save me a lot of time.
