Results 1 to 5 of 5
  1. #1
    Star Lounger
    Join Date
    Sep 2002
    Location
    Honolulu, Hawaii, USA
    Posts
    63
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Aloha,
    I am trying to parse data into an Excel spreadsheet from a textfile that has been captured off of a web page. To creat the textfile, I right-click the page and choose 'View Source', and save the resultant window as a textfile. The textfile has a bunch of html code, and nestled in there somewhere is the data for several records that i want to get at. I'm OK writing code to look for and capture elements of the data records that are within the web page code... but can I please have some advice, or be directed to a helpful resource, on how to best open or access the textfile in the first place? when I open it in Excel, the records I'm looking for are all in one concatenated mass in a few cells with 32,759 characters. I guess I could parse my way through these cells, but I thought it might be easier to read in the entire text into VB and parse it from there. Suggestions?

    Mahalo,

    John Jacobson

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    Are the data you're looking for in table format? If so, you'd be better off using a web query (in Excel 2003 and before, select Data | Import external data | New web query...); you can specify the web page then point to the table that you want to import.

    Otherwise we'd have to know much more about the structure of the web page.

  3. #3
    Star Lounger
    Join Date
    Sep 2002
    Location
    Honolulu, Hawaii, USA
    Posts
    63
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thanks Hans,

    The web query doesn't get me what I'm after -- I get the same results as if I selected and copied off of the web page. I'll give you more information in hopes that will help.

    These are foreclosure listings from a local newspaper. I have some code to parse out the fields I need -- property info, lender, trustee, auction date, etc. -- but what I want is a unique record identifier, an ID for each ad. It's available if I View Source, but not if I copy off of the web page or use a web query.

    The website is http://class2.honadv.com/classified/classi...ll&cid=5755. I made the attached textfile (I had to zip it - wouldn't upload otherwise) by right-clicking the web page and choosing View Source. I made the attached Excel file by selecting and copying the listings from the web page, then pasting into Excel. In the View Source textfile, there is a unique ID number (preceded by "?aid=") for each listing. That's what I want to get at.

    So I was thinking I could just open the View Source textfile, OPEN AS #1 or something like that, and plow through the listings with my code. I just need an example of code that does this, opens and reads a textfile, or I need a resource that will have some instructions I can use.

    Does that clarify my situation a little? Apologies for the confusion and the lengthy explanation! Mahalo,

    John Jacobson
    Attached Files Attached Files

  4. #4
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    If you want to get the first number after each ?aid=, you could use the following code as a starting point:

    Code:
    Sub GetIDs()
      Dim wsh As Worksheet
      Dim r As Long
      Dim strFile As String
      Dim f As Integer
      Dim strLine As String
      Dim lngPos1 As Long
      Dim lngPos2 As Long
    
      On Error GoTo ErrHandler
    
      ' Create sheet for output
      Set wsh = Worksheets.Add
    
      ' Open the text file - substitute the correct path and filename
      strFile = "C:\Test\Adv FC Capture 20090928.txt"
      f = FreeFile
      Open strFile For Input As #f
    
      ' Read the file line by line
      Do While Not EOF(f)
    	Line Input #f, strLine
    	' Search for ?aid=
    	lngPos1 = InStr(strLine, "?aid=")
    	Do While lngPos1 > 0
    	  ' Search for apostrophe
    	  lngPos2 = InStr(lngPos1 + 5, strLine, "'")
    	  If lngPos2 > 0 Then
    		' Write value to next cell
    		r = r + 1
    		wsh.Cells(r, 1) = Mid(strLine, lngPos1 + 5, lngPos2 - lngPos1 - 5)
    	  End If
    	  lngPos1 = InStr(lngPos1 + 1, strLine, "?aid=")
    	Loop
      Loop
    
    ExitHandler:
      On Error Resume Next
      Close #f
      Exit Sub
    
    ErrHandler:
      MsgBox Err.Description, vbExclamation
      Resume ExitHandler
    End Sub

  5. #5
    Star Lounger
    Join Date
    Sep 2002
    Location
    Honolulu, Hawaii, USA
    Posts
    63
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Righteous awesome Hans! I'm not worthy! Thank you, exactly what I was looking for and much more. I hope someone is paying you well.

    Mahalo,
    John Jacobson

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •