Results 1 to 14 of 14

Thread: Unknown Files

  1. #1
    Star Lounger
    Join Date
    May 2013
    Posts
    59
    Thanks
    34
    Thanked 6 Times in 6 Posts

    Unknown Files

    Hi,

    I have a spreadsheet that will need to utilize files at any given time from a folder. The folder has .pdf, .txt, and various image files. In addition, it has files with no extension similar to below:

    4bd6e3cf-b02f-462c-9e52-53d7228ca9ac

    There are over 1,000 of the above files (no extension) and 2,663 files total. My forensic skills have found that they are a mix of formats that I aforementioned. Does anyone know of a VBA method to determine the file type other than the obvious? Once determined, I would also need a way to change the file names with the correct extension.

    Thanks in advance,
    Alexandra

  2. #2
    Gold Lounger Maudibe's Avatar
    Join Date
    Aug 2010
    Location
    Pa, USA
    Posts
    2,631
    Thanks
    115
    Thanked 645 Times in 589 Posts
    Alexandra,

    This was a real challenge but it can be done. Borrowing some snippets, the following code was pieced together and modified to achieve your task. I took some copies of PDFs and renamed them with no extension to simulate what you are describing. Bare with me because it is a bit wordy and there are quite a few steps to take.

    1. Go to the folder in Windows Explorer where the files are. At the top, click on Type to sort the files by their type. Scroll down to the first file with no extension and click on it to highlight it. Go to the last file with no extension, hold down the shift key, then click on the last file with no extension. All the files with no extension should now be highlighted.

    2. Copy those files to a folder in the root directory folder you created (ex. C:\Junk). You can manipulates these files and not damage your originals

    3. Highlight the copied files as described in step 1 and while holding down the shift key, right click on the selected files and click "Copy as path". Go to a blank spreadsheet and right click in cell A2 and select paste. This is an old IT maneuver described in a recent post by Duthiet that will copy the list of the names of your selected files. You should be displaying the path of all your files in column A with no extension starting at cell A2. (See column A in image)

    4. Run the following code that will append an extension of ".pdf" to the path and place it adjacent in column B.

    In a standard module
    Code:
    Public Sub RenameFileNames()
    'AMEND THE FILE NAME BY ADDING EXTENSION
    Dim OldName
    LastRow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row
    For I = 2 To LastRow
        OldName = Cells(I, 1)
        Cells(I, 2) = OldName & ".pdf"
    Next I
    End Sub
    Column B should look like column A except it has the extension .pdf added. (See column B in image)

    5. Now you need change the actual file names with the amended names in column B. Run the following code:

    In a standard module
    Code:
    Public Sub RenameFiles()
    'RENAME FILES
    On Error Resume Next
    LastRow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row
    For I = 2 To LastRow
        Name Cells(I, 1) As Cells(I, 2)
    Next I
    End Sub
    The trick here was that you had to add the "On Error Resume Next" line for it to work. All the files that previously did not have an extension now have the extension added.

    6. Test to see if they are PDF Files by running the following code. Going down the list in column B, it will launch Adobe and attempt to open the file. The code then checks to see if the file is open. If it is open, it places "PDF" in column C next to the file name. If the file did not open, an error message will be generated and the cell in column C next to the file name will be blank. After 5 seconds, the file will close and move to the next file in the list. Looking down Column C, files that are true PDF files have "PDF" in column C. (See column C in image)

    Code:
    Sub CheckPDF()
    On Error Resume Next
    'CHECK IF FILE OPENED THEN RETURN IF SUCCESSFUL
    Dim strTerminateThis As String
    Dim objWMIcimv2 As Object
    Dim objProcess As Object
    Dim objList As Object
    Dim intError As Integer, I As Integer
    Application.DisplayAlerts = False
    '---------------------------------------------
    'OPEN PDF FILES
    LastRow = ActiveSheet.Cells(Rows.Count, 2).End(xlUp).Row
    For I = 2 To LastRow
        Cells(I, 2).Select
        Shell "C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe " & Cells(I, 2), vbMaximizedFocus
        Application.Wait (Now + TimeValue("0:00:05"))
    '---------------------------------------------
    'VERIFY IF FILE OPENED
        Ret = IsFileOpen(Cells(I, 2))
        If Ret = True Then
            Cells(I, 3) = "PDF"
        End If
    '---------------------------------------------
    'CLOSE FILES
        strTerminateThis = "AcroRd32.exe"
        Set objWMIcimv2 = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\.\root\cimv2")
        Set objList = objWMIcimv2.ExecQuery("select * from win32_process where name='" & strTerminateThis & "'")
        For Each objProcess In objList
            intError = objProcess.Terminate
        Next
    Next I
    '---------------------------------------------
    'CLEANUP
        Set objWMIcimv2 = Nothing
        Set objList = Nothing
        Set objProcess = Nothing
    Application.DisplayAlerts = True
    End Sub
    7. Up to this point, we have isolated the files with no extension in a designated folder, renamed the file with a .pdf extension and then tested as to whether Adobe can open it and marked down which ones it could. Next you have to create a new column (D) with the path of a destination folder + the file name that you want to move the identified PDFs to. Create a new folder in the root directory called C:\NewFolder and run the following code. (See column D in image)

    Code:
    Public Sub MovePDF()
    On Error Resume Next
    'MOVE FILES
    LastRow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row
    For I = 2 To LastRow
        If Cells(I, 3) = "PDF" Then
            s = Split(Cells(I, 2), "\")
            Cells(I, 4) = "C:\NewFolder\" & s(2)
            Name Cells(I, 2) As Cells(I, 4)
        End If
    Next I
    End Sub
    The files in the C:\NewFolder will be all the files that had no extension, now identified and renamed as .pdf, and congregated in a folder.

    Alouso1.png

    The above steps could be repeated for the other file types.

    HTH,
    Maud

  3. The Following User Says Thank You to Maudibe For This Useful Post:

    Alouso (2015-06-12)

  4. #3
    Star Lounger
    Join Date
    May 2013
    Posts
    59
    Thanks
    34
    Thanked 6 Times in 6 Posts
    OK, this is a lot to take in. I am going to play with this and post back. Thank you Maud

  5. #4
    Star Lounger
    Join Date
    May 2013
    Posts
    59
    Thanks
    34
    Thanked 6 Times in 6 Posts
    Maudibe,

    I got an error message on line Ret = IsFileOpen(Cells(I, 2)). I don't see the subroutine. Can you list it?

    Alex

  6. #5
    Gold Lounger Maudibe's Avatar
    Join Date
    Aug 2010
    Location
    Pa, USA
    Posts
    2,631
    Thanks
    115
    Thanked 645 Times in 589 Posts
    My bad!!! Please add the following function to the module.

    Code:
    Function IsFileOpen(FileName As String)
        Dim ff As Long, ErrNo As Long
    
        On Error Resume Next
        ff = FreeFile()
        Open FileName For Input Lock Read As #ff
        Close ff
        ErrNo = Err
        On Error GoTo 0
    
        Select Case ErrNo
        Case 0:    IsFileOpen = False
        Case 70:   IsFileOpen = True
        Case Else: Error ErrNo
        End Select
    End Function

  7. The Following User Says Thank You to Maudibe For This Useful Post:

    Alouso (2015-06-13)

  8. #6
    Star Lounger
    Join Date
    May 2013
    Posts
    59
    Thanks
    34
    Thanked 6 Times in 6 Posts
    Thanks Maudibe. It is working great! Genious!!! I am watching the files open and close one at a time. If adobe open them they are being flagged as "PDF". 10 minutes in to it and about 72 files checked. This will need to run for quite a while. I'll post back

  9. #7
    WS Lounge VIP
    Join Date
    Mar 2002
    Location
    Newcazzle, UK
    Posts
    2,820
    Thanks
    133
    Thanked 481 Times in 458 Posts
    Hi Alex

    ..can you tell us what are your other 'expected' file formats?
    i.e. are the graphics all the same file formats? e.g what file extensions do you expect/want??
    What are your expected/wanted text file formats e.g. .csv .txt .doc .rtf
    This would help us to help you.
    Can you give us any further info on how/where these files originate????

    zeddy

  10. #8
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,433
    Thanks
    371
    Thanked 1,456 Times in 1,325 Posts
    Alouso,

    This problem intrigued me so I did a little googling. You may find this free program TrID - File Identifier to be of use. You can set it to work on a whole directory and automatically add the extension. It should be much faster than using Excel (this is not to take away from Maud's excellent code!). HTH
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  11. #9
    Star Lounger
    Join Date
    May 2013
    Posts
    59
    Thanks
    34
    Thanked 6 Times in 6 Posts
    @RG:
    I found the same program when I first started to investigate how to accomplish this but I was looking for a way using VBA because there are operations in Excel that start prior to and following the determination of the types of the files. Thank you for your input.

    @Zeddy:
    As in my opening post, they are PDFs, text, and image files. They are scanned documents uploaded to a remote database hosted by another company. They are dumped as an export into a folder; many without extensions as described above. Once I know the file type, I have to decrypt the file name then attach it to an account. Thanks also for your input.

    @Maudibe:
    Your code ran for about 3 hours and 50 minutes and documented which files were PDFs. In step 7, instead of sending to another folder, I sent them to a subfolder within C:\Junk. Of the 2600+ files, 1556 were PDFs. Determining the rest of the files was just a process of elimination by renaming the remaining files with a .tiff extension. In Windows explorer I viewed those files as large thumbnails. All those that were image files showed a graphical thumbnail representation (1953) while those that were not showed the standard thumbnail for Windows Picture Viewer (54). As it turns out 52 of the 54 were text files and renamed as such leaving 2 unidentified. I thought I was going to end up doing this manually but you have saved me many hours of work!

    As it stands, I can't thank you enough.
    Alex

    Files.png

  12. #10
    WS Lounge VIP
    Join Date
    Mar 2002
    Location
    Newcazzle, UK
    Posts
    2,820
    Thanks
    133
    Thanked 481 Times in 458 Posts
    Hi Alex

    Do you have to do this on a regular basis? Or is it a one-off?

    The reason I ask is because it could be speeded up dramatically.

    zeddy

  13. #11
    Star Lounger
    Join Date
    May 2013
    Posts
    59
    Thanks
    34
    Thanked 6 Times in 6 Posts
    This question is for Maudibe.

    I have had to run a new batch, much larger than the first (4,044 files). I accidently stopped the code from running when I was part way through the batch. I was able to save my progress but would hate to have to start all over again. Is there a way to start at the same place where I left off?

    This program has been a life saver. Thank you so much!!!!

    Alexandra

  14. #12
    Gold Lounger Maudibe's Avatar
    Join Date
    Aug 2010
    Location
    Pa, USA
    Posts
    2,631
    Thanks
    115
    Thanked 645 Times in 589 Posts
    Alexandra,

    Look at your PDF column for the last listed type as PDF. Let's say you stopped at line 3000 the you simply change the code line in blue.

    Code:
    Sub CheckPDF()
    On Error Resume Next
    'CHECK IF FILE OPENED THEN RETURN IF SUCCESSFUL
    Dim strTerminateThis As String
    Dim objWMIcimv2 As Object
    Dim objProcess As Object
    Dim objList As Object
    Dim intError As Integer, I As Integer
    Application.DisplayAlerts = False
    '---------------------------------------------
    'OPEN PDF FILES
    LastRow = ActiveSheet.Cells(Rows.Count, 2).End(xlUp).Row
    For I = 3001 to 4044
        Cells(I, 2).Select
        Shell "C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe " & Cells(I, 2), vbMaximizedFocus
        Application.Wait (Now + TimeValue("0:00:05"))
    '---------------------------------------------
    'VERIFY IF FILE OPENED
        Ret = IsFileOpen(Cells(I, 2))
        If Ret = True Then
            Cells(I, 3) = "PDF"
        End If
    HTH,
    Maud

  15. The Following User Says Thank You to Maudibe For This Useful Post:

    Alouso (2015-09-02)

  16. #13
    Gold Lounger Maudibe's Avatar
    Join Date
    Aug 2010
    Location
    Pa, USA
    Posts
    2,631
    Thanks
    115
    Thanked 645 Times in 589 Posts
    Alexandra,

    An after thought....

    If you convert your files first to .tiff extension and follow your method using icons to filter out what is not a TIFF file, you can reduce the number of files significantly that the code tests as a .pdf using Adobe.

    Maud

  17. The Following User Says Thank You to Maudibe For This Useful Post:

    Alouso (2015-09-02)

  18. #14
    Star Lounger
    Join Date
    May 2013
    Posts
    59
    Thanks
    34
    Thanked 6 Times in 6 Posts
    Zeddy,

    Sorry I missed your post but it should have been a one-time project however a second unexpected batch was overlooked. I don't think I will be repeating this project

    Thanks Maudibe. Your macro worked perfectly. After reviewing your code, I was going to change that line but didn't want to take the chance of messing things up.

    Alexandra

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •