Results 1 to 15 of 15
  1. #1
    Lounger
    Join Date
    Jan 2008
    Posts
    37
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I am looking for the easiest way to import the attached file into Excel so that analysis can be performed. The file was created by another application and has all the pertinent data. What I am struggling with is how to get the two lines of data onto one line in order to easily import it into Excel using the Text to Data feature. If you look at the copy/paste below, you'll notice the header and the subsequent data is coming across on two lines. I cannot do a clean import into Excel and this is quite a big file. All suggestions are welcome.

    CHECK
    BANK CHECK # CHECK AMT CHECK DATE VENDOR # ADDRS # VENDOR NAME STATUS
    WO # AMOUNT G/L ACCT # DESCRIPTION INVOICE # INV VEND
    ------------------------------------------------------------------------------------------------------------------------------------

    1 1 $38742.79 05/21/09 INT01 0 INTERNAL REVENUE SERVICE OUTSTANDING
    38742.79 01-00000-20600-00000-000 PAYROLL TAXES FOR PE 4/24/09 PPE 4/24/09

    1 2 $552.60 05/21/09 CON06 0 CONEXIS OUTSTANDING
    552.60 90-00000-23327-00000-000 FLEX 125-DEPENDENT CARE REIMBS 051509

    Thanks.
    Attached Files Attached Files

  2. #2
    WS Lounge VIP sdckapr's Avatar
    Join Date
    Jul 2002
    Location
    Pittsburgh, Pennsylvania, USA
    Posts
    11,225
    Thanks
    14
    Thanked 342 Times in 335 Posts
    I think the easiest thing to do would be to have the program exporting this text file, export it in a more appropriate way. Some programs can export into an excel worksheet instead of using text.

    If it is only this file, I would manually work on it, trying to determine what "type" the line of text is. If you are only interested in the data and not the header, I would add first add a column numbering the items (to always be able to sort to the original list) then a column with a 0, 1, 2, 3 (0 for header, 1 for line1, 2 for line2, 3 for blank) then you can sort by type and then original number, delete the rows with 0 and 3, and then do a text to column with type 1 and type 2 and then move type 2 columns into the columns in the same rows as the type 1


    If you have to do this a lot a macro could be written to do it, but how complex will depend on how much it may change from time to time

    Steve

  3. #3
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts
    Hi aapke,

    Before loading the file into Excel, load it into Word and run the following macro against it, then re-save the file and load into Excel. From there you can use the Excel text-to-columns function, with fixed-with splits, to parse the data.

    Code:
    Sub StatementReformat()
    Application.ScreenUpdating = False
    With ActiveDocument.Range.Find
      .ClearFormatting
      .Replacement.ClearFormatting
      .Forward = True
      .Wrap = wdFindContinue
      .Format = False
      .MatchCase = False
      .MatchWholeWord = False
      .MatchAllWordForms = False
      .MatchSoundsLike = False
      .MatchWildcards = True
      'First Page
      .Text = "REPORT*EXTRACT FILE : [A-Z0-9]{1,}^13^12"
      .Replacement.Text = ""
      .Execute Replace:=wdReplaceOne
      'Column Headers (except New First Page)
      .Text = "^12*VEND^13*[-]{1,}^13"
      .Execute Replace:=wdReplaceAll
      'Header Underline on First Page
      .Text = "[-]{1,}^13"
      .Execute Replace:=wdReplaceAll
      'Last Page
      .Text = "          TOTAL #*^12REPORT*{1,255}PAGE*{1,255}FUND TOTALS*[=]{1,}*^13*^13"
      .Execute Replace:=wdReplaceAll
      'Unwanted Header Lines - First Page
      .Text = "REPORT*CHECK^13"
      .Execute Replace:=wdReplaceAll
      'Empty Paragraphs
      .Text = "[ ]{1,}^13"
      .Execute Replace:=wdReplaceAll
      'Header Line Wrap - First Page
      .Text = "(STATUS)^13"
      .Replacement.Text = "\1"
      'Record Line Wraps
      .Execute Replace:=wdReplaceAll
      .Text = "(OUTSTANDING)^13"
      .Execute Replace:=wdReplaceAll
      .Text = "(CLEARED)^13"
      .Replacement.Text = "\1    "
      .Execute Replace:=wdReplaceAll
      .Text = "(VOIDED)^13"
      .Replacement.Text = "\1     "
      .Execute Replace:=wdReplaceAll
      'Pad Records with Multiple Rows
      .Text = "(^13)([!^13]{1,120}^13)"
      .Replacement.Text = "\1                                                                                                                               \2"
      .Execute Replace:=wdReplaceAll
      .Execute Replace:=wdReplaceAll
    End With
    Application.ScreenUpdating = True
    End Sub
    Update:
    I noticed that negative values in the data have the '-' sign trailing the values concerned. Excel doesn't handle those too well. You can convert them to values with a preceding '-' sign by adding the following four lines immediately before the 'End With' statement in the macro:
    Code:
      'Reformat -ve values
      .Text = "( )([0-9]{1,}.[0-9]{2})(-)"
      .Replacement.Text = "\3\2\1"
      .Execute Replace:=wdReplaceAll
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  4. #4
    Lounger
    Join Date
    Jan 2008
    Posts
    37
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Okay, that was pretty cool to just watch. Thank YOU! Do I need to run it more than once to fix the issue with the multiple lines for one record?

    1 3000022 $697.49 04/16/09 COA05 0 COASTAL BLUE, INC OUTSTANDING 24.84 50-00000-62250-07203-000 FILM/MAPS/BLUEPRINTS 423225

    25.91 25-00000-63608-80028-000 FILMS/MAPS/BLUEPRINTS 423272
    12.34 50-00000-62250-09207-000 FILMS/MAPS/BLUEPRINTS 423458

    20.10 25-00000-63608-80058-000 FILMS/MAPS/BLUEPRINTS 423527
    19.22 50-00000-62250-09207-000 FILMS/MAPS/BLUEPRINTS 423950

    59.86 25-00000-63608-80068-000 FILMS/MAPS/BLUEPRINTS 423980
    51.56 25-00000-63608-80064-000 FILMS/MAPS/BLUEPRINTS 424065

    77.85 01-62000-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424076
    30.44 50-00000-62250-07104-000 FILMS/MAPS/BLUEPRINTS 424187

    77.58 50-00000-62250-07107-000 FILMS/MAPS/BLUEPRINTS 424191
    37.34 50-00000-62250-07101-000 FILMS/MAPS/BLUEPRINTS 424320

    47.02 01-62000-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424439
    20.10 60-94030-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424479

    29.56 60-94030-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424480
    53.71 60-94030-63608-00000-000 FILMS/MAPS/BLUEPRINTS 424620

    34.53 50-00000-62250-07104-000 FILMS/MAPS/BLUEPRINTS 424695
    75.53 25-00000-63608-80065-000 FILMS/MAPS/BLUEPRINTS 425067

    Thanks.
    Attached Files Attached Files

  5. #5
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts
    Hi aapke,

    You shouldn't need to. The two consecutive '.Execute Replace:=wdReplaceAll' statements near the end of the first code module I posted should take care of that - it did in my testing.
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  6. #6
    Lounger
    Join Date
    Jan 2008
    Posts
    37
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I just knew I did something incorrectly. Basic steps I used are: Generate the report again in the finance system. Saved the generated file as .TXT. Opened it up in Word. Inserted your totally cool code into a module. Saved the file as a .docm. Ran the macro, sat back, and watched it work it's magic. Saved the file as a .TXT again. Imported it into Excel using the Text to Data feature. And yes, I did incorporate the additional 4 lines.

    Did I miss a step along the way? Either way, the new file I have to work with is so much cleaner than the old one, it won't take more than a few hours to clean it up manually if needed! Thank you so much!

    aapke

  7. #7
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts
    Hi aapke,

    I suspect what you're seeing is line-wrapping effects. Attached is a Word file containing the full macro, with the prcoessed data and the page format set up so that each record takes a single line in the document. As you can see, the multi-entry records are fully padded.
    Attached Files Attached Files
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  8. #8
    Lounger
    Join Date
    Jan 2008
    Posts
    37
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi macropod,

    I've copied the steps exactly (I believe) and am still getting the line wrapping. Can you tell from a quick glance what I am doing incorrectly?

    Thanks again,
    aapke
    Attached Files Attached Files

  9. #9
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts
    Hi aapke,

    I don't understand why your system isn't behaving the same as mine. Is your copy of Word of Word 2007 fully updated? There were some problems with the vba implementation in the earlier releases. I've run the code with both Word 2000 SP3 and Word 2007 SP2 and it works fine. Nevertheless, it seems what you're ending up with is not so much a line wrapping issue as unwanted line breaks in every 2nd padded line. You should be able to fix that by adding the following extra lines to your code:
    Code:
      'Fix Broken Lines
      .Text = "( )(^13)( )"
      .Replacement.Text = "\1\3"
      .Execute Replace:=wdReplaceAll
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  10. #10
    Lounger
    Join Date
    Jan 2008
    Posts
    37
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Ah. That may just be it. My copy of Word 2007 shows as "MS Word 2007 (12.0.6514.5000) SP2 MSO (12.0.6521.5000). Hopefully that'll fix everything!

    Thank you again for ALL the assistance.
    aapke

  11. #11
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts
    Hi aapke,
    My copy of Word 2007 shows as "MS Word 2007 (12.0.6514.5000) SP2 MSO (12.0.6521.5000).
    In that case, you do indeed have Word 2007's SP2 update installed. Strange - perhaps it's just some little glitch lurking somewhere else in your system.
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  12. #12
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts
    Hi aapke,

    Instead of adding the extra lines of code for repairing the broken data lines that I suggested earlier, try changing:
    Code:
      'Pad Records with Multiple Rows
      .Text = "(^13)([!^13]{1,120}^13)"
      .Replacement.Text = "\1                                                                                                                               \2"
      .Execute Replace:=wdReplaceAll
      .Execute Replace:=wdReplaceAll
    to
    Code:
      'Pad Records with Multiple Rows
      .Text = "(^13)([ ]{20})([ ]{1,8}[0-9]{1,8}.[0-9]{2})"
      .Replacement.Text = "\1\2\2\2\2\2\2\2       \3"
      .Execute Replace:=wdReplaceAll
    You'll notice that the new block has only one '.Execute Replace:=wdReplaceAll' statement. That's all I found necessary and it should make the overall execution just that little bit quicker - as should not needing those other extra lines of code for repairing the broken data lines .
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  13. #13
    Lounger
    Join Date
    Jan 2008
    Posts
    37
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Will do!

    Thank you again.
    aapke

  14. #14
    New Lounger
    Join Date
    Oct 2016
    Posts
    8
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by macropod View Post
    Hi aapke,

    Instead of adding the extra lines of code for repairing the broken data lines that I suggested earlier, try changing:
    Code:
      'Pad Records with Multiple Rows
      .Text = "(^13)([!^13]{1,120}^13)"
      .Replacement.Text = "\1                                                                                                                               \2"
      .Execute Replace:=wdReplaceAll
      .Execute Replace:=wdReplaceAll
    to
    Code:
      'Pad Records with Multiple Rows
      .Text = "(^13)([ ]{20})([ ]{1,8}[0-9]{1,8}.[0-9]{2})"
      .Replacement.Text = "\1\2\2\2\2\2\2\2       \3"
      .Execute Replace:=wdReplaceAll
    You'll notice that the new block has only one '.Execute Replace:=wdReplaceAll' statement. That's all I found necessary and it should make the overall execution just that little bit quicker - as should not needing those other extra lines of code for repairing the broken data lines .

    Hi,
    I also tried this code and got really impressed the way you have used regular expressions within the with...end with block. I am an absolute beginner as far as RegEx is concerned. Just want to know if you could allow me to share a similar file with you that I have from our database. That way looking at that code that you will assist me with, will give me a better understanding how RegEx could help me with kind of report I have to deal with on regular basis.

    I would be waiting for your feedback.

    Kind regards
    Syed

  15. #15
    New Lounger
    Join Date
    Oct 2016
    Posts
    8
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Hi Paul,
    OK now I understand this is not RegEx but wildcards within MS Office that are similar to RegEx. However, even though I have looked up some reference material but still need your help. Only if you could explain the logic behind following few lines extracted from your above code. Note that to make my query more understandable and precise I have avoided With...End With block and have re-written it otherwise:

    Code chunk #1:
    ActiveDocument.Range.Find.Text = " TOTAL #*^12REPORT*{1,255}PAGE*{1,255}FUND TOTALS*[=]{1,}*^13*^13"


    Code chunk #2:
    ActiveDocument.Range.Find.Text = "(CLEARED)^13"
    ActiveDocument.Range.Find.Replacement.Text = "\1 "
    ActiveDocument.Range.Find.Execute Replace:=wdReplaceAll


    Code chunk #3:
    'Pad Records with Multiple Rows
    ActiveDocument.Range.Find.Text = "(^13)([ ]{20})([ ]{1,8}[0-9]{1,8}.[0-9]{2})"
    ActiveDocument.Range.Find.Replacement.Text = "\1\2\2\2\2\2\2\2 \3"
    ActiveDocument.Range.Find.Execute Replace:=wdReplaceAll
    Question referring to chunk# 1: I need your help as to what all those curly braces and numbers mean? I am not sure about spaces before word "TOTAL" in your code. Also the CHECKREG.TXT does not have word "FUND TOTALS", so I am not sure what does this part of code line mean?


    Question referring to chunk# 2 : Understand that code looks for word "CLEARED" followed by paragraph mark but I need some clarity on the next two lines. What these last two lines are doing?

    Question referring to chunk# 3: I am totally clueless to these last 3 lines. Can you please explain about their function?

    It is hoped you will appreciate my intent to learn some code from your already created set of code instead of digging into something from scratch.

    I would be waiting for your response.

    REGARDS,
    SMI

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •