  1. #1
    New Lounger
    Join Date
    Nov 2006
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    FileSearch - PDF (Excel VBA 2003)

    Is it possible to search for text in PDF documents using the FileSearch object? I have tried it and it does not appear to work, presumably because PDF is not an MS Office format.

    If it is not possible, I would be interested to know whether there is a workaround.

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    FileSearch fails because the text isn't stored in a directly readable form in PDF files.
    Desktop search engines (available from Microsoft, Google and others) are able to search many more types of files, including PDF files.

  3. #3
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    Perhaps you can hook into Windows' built-in search features? These seem to have advanced beyond the old FileSearch interface. Probably worth a test first: if you use Windows' Search feature and target your PDF files, are the right ones found?

  4. #4
    New Lounger
    Join Date
    Nov 2006
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    Yes, the standard Windows search facility seems to work most, but not all, of the time. It may depend on the type of PDF.

    Is there any way I can incorporate the functionality of this search tool into VB code? My overall objective is to search several hundred documents for about 70 key words and summarise which key words appear in which file. Most of the documents are in Word format, but I still have about 30 or so in PDF.

  5. #5
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    I've only started looking at this, but perhaps the documentation surrounding this page on MSDN will help get into Windows' native search functionality.
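
    For what it's worth, a rough sketch of what querying the Windows Search index from VBA might look like. It assumes Windows Search (or Windows Desktop Search 3.x or later) is installed, the folder is already indexed, and a PDF IFilter is present so that PDF contents are in the index. The function name FindPdfsContaining and its parameters are made up for illustration, and this is not necessarily what the MSDN page above describes.

    ```vba
    ' Sketch only: returns the paths of indexed PDF files under folderPath
    ' whose contents match keyword. FindPdfsContaining is a hypothetical name.
    Function FindPdfsContaining(ByVal folderPath As String, _
                                ByVal keyword As String) As Collection
        Dim cn As Object            ' ADODB.Connection (late bound)
        Dim rs As Object            ' ADODB.Recordset (late bound)
        Dim sql As String
        Dim hits As New Collection

        ' Windows Search SQL: SCOPE limits the folder, CONTAINS does the full-text match
        sql = "SELECT System.ItemPathDisplay FROM SYSTEMINDEX " & _
              "WHERE SCOPE='file:" & Replace(folderPath, "\", "/") & "' " & _
              "AND System.FileExtension = '.pdf' " & _
              "AND CONTAINS('""" & Replace(keyword, "'", "''") & """')"

        Set cn = CreateObject("ADODB.Connection")
        cn.Open "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"

        Set rs = CreateObject("ADODB.Recordset")
        rs.Open sql, cn

        Do While Not rs.EOF
            hits.Add rs.Fields("System.ItemPathDisplay").Value
            rs.MoveNext
        Loop

        rs.Close
        cn.Close
        Set FindPdfsContaining = hits
    End Function
    ```

    Calling this once per key word and writing the returned paths to a worksheet would give a keyword-by-file summary, but only for files the index actually covers.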

  6. #6
    Silver Lounger GARYPSWANSON's Avatar
    Join Date
    Aug 2001
    Location
    Frederick, Maryland, USA
    Posts
    1,788
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    I have had similar requests recently. As to searching PDF files, this link may be of assistance: http://www.adobe.com/education/pdf/acrobat...t7_lesson11.pdf

    I have found that using the native search feature to search many Word docs is very slow. I put together a system that searches all resumes in one folder and returns the file names and matching results for a keyword search, but I was not happy with the time it took to return data. I found it is much quicker to parse the Word documents word by word into an Access table and then run queries on the table. This is not indexing, more of a brute-force method. Anyway, the code to parse the Word data is as follows:

    <pre>' Fragment from an Access form module; needs references to DAO, Word and Office.
    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim wordApp As Word.Application
    Dim wordDoc As Word.Document
    Dim wordRange As Word.Range
    Dim docName As String
    Dim FName As String
    Dim strWord As String
    Dim strLetter As String
    Dim msg As String
    Dim i As Long

    With Application.FileSearch
        .NewSearch
        .FileType = msoFileTypeWordDocuments
        .LookIn = Me.Text_PathToResumes 'Text box pointing to folder
        .Execute

        For i = 1 To .FoundFiles.Count
            docName = .FoundFiles.Item(i)
            FName = RemovePathName(docName) 'function removes path name and returns only file name
            CurrentDb.Execute "Insert Into TableDocuments (DocName) Values (" _
                & Chr(34) & FName & Chr(34) & ")"

            'Word is started and quit again for every file
            Set wordApp = CreateObject("Word.Application")
            Set wordDoc = wordApp.Documents.Open(FileName:=docName)

            Set db = CurrentDb
            Set rst = db.OpenRecordset("T_Data") 'Put Filename and Words in table T_Data

            For Each wordRange In wordDoc.Words
                'Word includes trailing spaces in each "word", so trim them
                strWord = LCase(Trim(wordRange.Text))
                If Len(strWord) > 0 Then 'Asc raises an error on an empty string
                    strLetter = Left(strWord, 1)
                    'Keep only words starting with a-z; this drops punctuation that Word treats as words
                    If Asc(strLetter) >= 97 And Asc(strLetter) <= 122 Then
                        With rst
                            .AddNew
                            !FileName = FName
                            !DocWords = strWord
                            .Update
                        End With
                    End If
                End If
            Next wordRange

            wordDoc.Close False
            wordApp.Quit False
            Set wordDoc = Nothing
            Set wordApp = Nothing
            rst.Close
            Set rst = Nothing
            Set db = Nothing
        Next i
    End With

    Me.Refresh
    'After the loop i is one past the last file, so i - 1 is the number of files
    msg = "Data Loading Complete. " & i - 1 & " Files Loaded"
    MsgBox msg, vbInformation, "JOB COMPLETE"

    Exit Sub
    </pre>
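
    The query step mentioned above could look something like this. It is only a sketch: T_Data and its fields FileName and DocWords come from the code above, while the T_Keywords table (one row per key word, in a field called Keyword) and the ListKeywordHits name are assumptions made for illustration.

    ```vba
    ' Sketch only: lists each file together with the key words found in it.
    ' T_Keywords / Keyword are hypothetical; T_Data is the table filled above.
    Sub ListKeywordHits()
        Dim rst As DAO.Recordset
        Dim sql As String

        ' Trim guards against words stored with trailing spaces
        sql = "SELECT DISTINCT d.FileName, k.Keyword " & _
              "FROM T_Data AS d, T_Keywords AS k " & _
              "WHERE Trim(d.DocWords) = k.Keyword " & _
              "ORDER BY d.FileName, k.Keyword"

        Set rst = CurrentDb.OpenRecordset(sql, dbOpenSnapshot)
        Do While Not rst.EOF
            Debug.Print rst!FileName, rst!Keyword
            rst.MoveNext
        Loop

        rst.Close
        Set rst = Nothing
    End Sub
    ```

    The same SQL could be saved as an ordinary Access query, or turned into a crosstab, instead of being printed to the Immediate window.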



    Anyway, hope this helps.
    Regards,

    Gary
    (It's been a while!)

  7. #7
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    Wouldn't it be more efficient to create the Word.Application object before you enter the For i = 1 to .FoundFiles.Count loop, and only quit and destroy it after finishing the loop? The overhead of starting and quitting Word each time is considerable.
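
    A minimal sketch of that restructuring, with Word started once and each document closed explicitly before the next one is opened. ProcessAllDocs and its filePaths argument are hypothetical names, and the per-document work is left as a placeholder.

    ```vba
    ' Sketch only: one Word instance for the whole batch.
    ' filePaths is assumed to be a Collection of full file names.
    Sub ProcessAllDocs(ByVal filePaths As Collection)
        Dim wordApp As Object   ' Word.Application (late bound)
        Dim wordDoc As Object   ' Word.Document (late bound)
        Dim p As Variant

        Set wordApp = CreateObject("Word.Application")
        On Error GoTo CleanUp   ' make sure Word is quit even if a file fails

        For Each p In filePaths
            Set wordDoc = wordApp.Documents.Open(FileName:=CStr(p), ReadOnly:=True)
            ' ... read wordDoc.Words here ...
            wordDoc.Close False      ' close without saving
            Set wordDoc = Nothing    ' drop the reference before the next file
        Next p

    CleanUp:
        ' Reached both on normal completion and after an error
        If Not wordDoc Is Nothing Then wordDoc.Close False
        wordApp.Quit False
        Set wordApp = Nothing
    End Sub
    ```

    Opening the files read-only and clearing wordDoc after every Close are guesses at what helps Word let go of each file; neither has been tested against the resume folder discussed here.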

  8. #8
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Weert, Limburg, Netherlands
    Posts
    4,812
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    If I'm not mistaken, FileSearch will NOT be available in Office 2007!

    ###EDITED###

    I just checked and proved myself wrong, sorry if I caused confusion...
    Jan Karel Pieterse
    Microsoft Excel MVP, WMVP
    www.jkp-ads.com
    Professional Office Developers Association

  9. #9
    Silver Lounger GARYPSWANSON's Avatar
    Join Date
    Aug 2001
    Location
    Frederick, Maryland, USA
    Posts
    1,788
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Re: FileSearch - PDF (Excel VBA 2003)

    Hans,

    I agree it would be more efficient. However, for whatever reason, if I do it the way you suggest, Word does not release the documents, and they appear to remain open in memory. If you then run the process again or try to open a doc, Word says the doc is already open; you end up opening it read-only, and it still is not released. It got to the point where the only way to release the docs was to reboot the system. Quitting and destroying the application after every iteration, despite the overhead, is the only way I can get this to work and be sure the docs are closed. Even with the overhead, it still opens and closes very quickly.
    Regards,

    Gary
    (It's been a while!)
