Results 1 to 4 of 4
  1. #1
    Lounger
    Join Date
    Oct 2004
    Location
    Indianapolis, Indiana, USA
    Posts
    45
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Accessing HTML Source Code in Program (Excel 2003)

    I may be hoping for the impossible, but here goes: I would like to programmatically create a list of the page titles of a number of Web pages (about 6000 of them). The URL for each Web page has a unique, 4-digit number, but otherwise all URLs are identical. I would like to pull the text that appears in the <title></title> tags for each URL. To make it more complicated, some of the URLs will return a 404 error, so I need to account for that as well.

    What do you think? Impossible?

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Accessing HTML Source Code in Program (Excel 2003)

    Perhaps you can use this code as starting point. See the notes below.

    Sub LoopWebPages()
    Dim ie As Object
    Dim n As Long
    Dim strTitle As String
    Set ie = CreateObject("InternetExplorer.Application")
    On Error GoTo ErrHandler
    With ie
    For n = 1000 To 9999
    .Navigate "http://www.test.com/page" & Format(n, "0000")
    Do While .busy
    DoEvents
    Loop
    Do While .ReadyState <> 4
    DoEvents
    Loop
    If InStr(.Document.Body.InnerHTML, "relevant text") = 0 Then
    strTitle = .Document.Title
    ' Do something with the title
    Debug.Print n, strTitle
    End If
    Next n
    End With

    ExitHandler:
    On Error Resume Next
    ie.Quit
    Set ie = Nothing
    Exit Sub

    ErrHandler:
    MsgBox Err.Description, vbExclamation
    Resume ExitHandler
    End Sub

    Notes:
    1) Change the URL "http://www.test.com/page" & Format(n, "0000") as needed.
    2) Change the lower and upper bounds 1000 and 9999 as needed.
    3) You'll have to find out what the title of a "not found" page is in your browser and change "relevant text" to that title, or a significant part of it.
    4) The code will take a loooong time to execute if you want to process 6,000 web pages!

  3. #3
    WS Lounge VIP rory's Avatar
    Join Date
    Dec 2000
    Location
    Burwash, East Sussex, United Kingdom
    Posts
    6,280
    Thanks
    3
    Thanked 191 Times in 177 Posts

    Re: Accessing HTML Source Code in Program (Excel 2003)

    Here's an alternative:
    <pre>Function GetHTMLTitle(strURL As String) As String
    Dim XML As Object, strHTML As String
    Dim lngStart As Long, lngStop As Long
    Set XML = CreateObject("Microsoft.XMLHTTP")
    If LCase$(Left$(strURL, 4)) <> "http" Then strURL = "http://" & strURL
    XML.Open "GET", strURL, False
    XML.Send
    strHTML = XML.responseText
    lngStart = InStr(strHTML, "<title>") + 7
    lngStop = InStr(strHTML, "</title>")
    If lngStop > 0 And lngStart > 0 Then
    GetHTMLTitle = Mid$(strHTML, lngStart, lngStop - lngStart)
    End If
    End Function

    </pre>


    FWIW
    Regards,
    Rory

    Microsoft MVP - Excel

  4. #4
    Lounger
    Join Date
    Oct 2004
    Location
    Indianapolis, Indiana, USA
    Posts
    45
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Accessing HTML Source Code in Program (Excel 2003)

    Thanks to both of you. I'll likely use a combination of your examples to get the final product.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •