Results 1 to 7 of 7
  1. #1
    Lounger
    Join Date
    Jan 2013
    Posts
    41
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Stratify Word Occurrences in a Range

    Hi, I have list of text values in the form of sentences in a specific column. I'd like to list say top 25 occurrences of words within this column and list how many times each word occurred. The values are in column C.

    A sample of this list is:

    Address company witness issues for trial
    Address issues for expert and possible motion regarding expert reports.
    Address issues re: plaintiff s request for corporate representatives depositions.
    Address issues relating to company witness scheduling and options for trial
    Address possible issues for trial concerning witnesses and exhibits and legal arguments for same.
    Communicate (within legal team) regarding expert discovery.
    Communicate (within legal team) regarding revisions to historian presentation.
    Conference call with trial team to review case and discuss projects to prepare for trial.


    Any ideas on how to do this and put the results in another column that would be like: Address, 5
    Communicate 4, etc.

  2. #2
    Bronze Lounger
    Join Date
    Jan 2001
    Location
    La Jolla, CA
    Posts
    1,468
    Thanks
    30
    Thanked 61 Times in 57 Posts
    I'll bet there's a VBA routine that will loop through your A column and do this. I wrote the following and filled down and across and it enables you to select specific words and counts them. Maybe that's a start/help?!
    Attached Images Attached Images
    Last edited by kweaver; 2016-08-16 at 18:03.

  3. #3
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,433
    Thanks
    371
    Thanked 1,456 Times in 1,325 Posts
    Mitch,

    Here's a VBA solution.

    Methodology
    Split sentences up to create word list in Column E.
    Use Advanced Filter to select Unique words in Column I (Col G is the Criteria Range for the advance filter)
    Create SumIFS formulas to count the occurrences of the unique words Column J
    Sort cols I-J in descending order (most frequent word at top of list).

    Code:
    Option Explicit
    
    Sub WordCount()
    
       Dim lCntr      As Long
       Dim lLastRow   As Long
       Dim lSCntr     As Long
       Dim lWordsCntr As Long
       Dim vWords     As Variant
       Dim zSent      As String
          
       
       lSCntr = 1
       lWordsCntr = 2
       Cells(1, "E").Value = "Words"
       Cells(1, "G") = Cells(1, "E")
       Cells(1, "I") = Cells(1, "E")
       Cells(1, "J").Value = "Counts"
       Application.Names.Add Name:="Criteria", RefersTo:=Range(Cells(1, "G"), Cells(2, "G"))
       Application.Names.Add Name:="Extract", RefersTo:=Cells(1, "I")
       
       Do
         zSent = Cells(lSCntr, 3).Value
         zSent = Replace(zSent, "(", "")
         zSent = Replace(zSent, ")", "")
         
         vWords = Split(zSent, " ")
         
         For lCntr = 0 To UBound(vWords)
            Cells(lWordsCntr, 5).Value = vWords(lCntr)
            lWordsCntr = lWordsCntr + 1
         Next lCntr
         
         lSCntr = lSCntr + 1
         
       Loop Until Cells(lSCntr, 3) = ""
       
       Range("E1").Select
       Range(Selection, Selection.End(xlDown)).Select
       lLastRow = Selection.Count
      
       With ActiveWorkbook.Worksheets("Sheet1").Sort
           .SortFields.Clear
           .SortFields.Add Key:=Range("E2:E" & lLastRow), SortOn:=xlSortOnValues, _
                           Order:=xlAscending, DataOption:=xlSortNormal
           .SetRange Range("E1:E" & lLastRow)
           .Header = xlYes
           .MatchCase = False
           .Orientation = xlTopToBottom
           .SortMethod = xlPinYin
           .Apply
        End With
        
    '*** Select Extracted Words and create Name of Database for use in Adv Filter ***
        ActiveWorkbook.Names.Add Name:="Database", RefersToR1C1:= _
            Selection
       
    '*** Extract Unique Words ***
        Range("E1:E" & lLastRow).AdvancedFilter Action:=xlFilterCopy, _
                                                CriteriaRange:=Range("Criteria"), _
                                                CopyToRange:=Range("Extract"), _
                                                Unique:=True
    
    '*** Add formulas to do the word counts ***
    
        lCntr = 2
        
        Do
          Cells(lCntr, "J").Formula = "=CountIfs(Database," & Cells(lCntr, "I").Address(False, True, xlA1) & ")"
          lCntr = lCntr + 1
        Loop Until Cells(lCntr, "I") = ""
        
    '*** Sort Counts by Frequency decending ***
         
         
        Range(Cells(1, "I"), Cells(1, "I").End(xlDown)).Select
        lLastRow = Selection.Count
       
       With ActiveWorkbook.Worksheets("Sheet1").Sort
           .SortFields.Clear
           .SortFields.Add Key:=Range("J2:J" & lLastRow), SortOn:=xlSortOnValues, _
                           Order:=xlDescending, DataOption:=xlSortNormal
           .SetRange Range("I1:J" & lLastRow)
           .Header = xlYes
           .MatchCase = False
           .Orientation = xlTopToBottom
           .SortMethod = xlPinYin
           .Apply
        End With
         
    End Sub     'WordCount()
    Example:
    CountWords.JPG

    Test File: WordCount RG V1.xlsm

    HTH
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  4. #4
    Gold Lounger Maudibe's Avatar
    Join Date
    Aug 2010
    Location
    Pa, USA
    Posts
    2,629
    Thanks
    114
    Thanked 644 Times in 588 Posts
    Here is an alternate approach following the same premise as RG but without using advanced filtering or multiple columns and using different coding. This version that splits up the sentences then displays a sorted list of unique words with the count in the adjacent column. I left RG's results for comparison.

    Click the "Count Word" button to run the code. Add as many sentences as you like. The code will adjust

    HTH,
    Maud

    Unique1.png

    Code:
    Public Sub CountUniqueWords()
    Application.ScreenUpdating = False
    '-----------------------------------------------------
    'DECLARE AND SET VARIABLES
    Dim Row As Long, LastRow As Long, I As Long, J As Long
    LastRow = ActiveSheet.Cells(Rows.Count, 3).End(xlUp).Row
    Row = 1
    '-----------------------------------------------------
    'BUILD LIST OF WORDS AND COUNT
    For I = 1 To LastRow
        s = Split(Cells(I, 3), " ")
        For J = 0 To UBound(s)
            s(J) = Replace(s(J), "(", "", , , vbTextCompare)
            s(J) = Replace(s(J), ")", "", , , vbTextCompare)
            s(J) = Replace(s(J), ":", "", , , vbTextCompare)
            s(J) = Replace(s(J), ";", "", , , vbTextCompare)
            s(J) = Replace(s(J), ",", "", , , vbTextCompare)
            s(J) = Replace(s(J), ".", "", , , vbTextCompare)
            Cells(Row, 4) = Trim(UCase(s(J)))
            Cells(Row, 5).Formula = "=Countif(Words, D" & Row & ")"
            Row = Row + 1
        Next J
    Next I
    '-----------------------------------------------------
    'GET UNIQUES AND SORT
    LastRow = ActiveSheet.Cells(Rows.Count, 4).End(xlUp).Row
    Range("D1:D" & LastRow).Name = "Words"
    Range("E1:E" & LastRow).Copy:: Range("E1").PasteSpecial Paste:=xlPasteValues
    ActiveSheet.Range("D1:E" & LastRow).RemoveDuplicates Columns:=1, Header:=xlNo
    LastRow = ActiveSheet.Cells(Rows.Count, 4).End(xlUp).Row
    With ActiveSheet.Sort
        .SortFields.Clear
        .SortFields.Add Key:=Range("D1:D" & LastRow)
        .SetRange Range("D1:E" & LastRow)
        .Apply
    End With
    Range("C1").Select
    Application.ScreenUpdating = True
    End Sub
    Attached Files Attached Files
    Last edited by Maudibe; 2016-08-16 at 21:53.

  5. #5
    WS Lounge VIP
    Join Date
    Mar 2002
    Location
    Newcazzle, UK
    Posts
    2,815
    Thanks
    132
    Thanked 479 Times in 456 Posts
    I liked Maud's method of using UPPERCASE - it simplifies things!

    Well I thought I'd have a go at this using a dictionary method.
    I also used a function to strip punctuation and digits (but allowing any apostrophes e.g. as in can't, shouldn't etc)
    I added a couple of extra text cells to show the differences.

    see attached

    Note: early binding is used, so requires a reference to Microsoft Scripting Runtime
    Code:
    Sub countWordsInRange()
    Application.ScreenUpdating = False                  'freeze display till ready
    Dim d As New Dictionary                             'use dictionary to hold unique words
    
    zText = Join(Application.Transpose(Selection), " ") 'merge all text in cell selection range
    zText = stripPunctuation(zText)                     'remove all punctuation and digits
    zText = UCase(zText)                                'convert to UPPERCASE
    zArr = Split(zText, " ")                            'put words into array
    
    For i = LBound(zArr) To UBound(zArr)                'loop through array
    If d.Exists(zArr(i)) Then                           'item already exists in dictionary, so..
    d.Item(zArr(i)) = d.Item(zArr(i)) + 1               'increment item counter
    Else                                                'otherwise..
    d(zArr(i)) = 1                                      'add item to dictionary, with count as 1
    End If                                              'end of test for item in dictionary
    Next i                                              'next word in array
    
    For Each k In d.Keys                                'loop through all dictionary entries
    r = r + 1                                           'increment row counter
    Cells(r, "M") = k                                   'write word to column [M]
    Cells(r, "N") = d(k)                                'write word count to column [N]
    Next                                                'process next item in dictionary
    
    Range("M1:N" & d.Count).Sort key1:=[M1]             'sort words alphabetically on sheet
    
    End Sub
    
    Function stripPunctuation(z) As String              'note:removes digits as well
    With CreateObject("vbscript.regexp")
        .Pattern = "[^A-Z ']"                           '<<allow alpha chars, space, apostrophe '
        .IgnoreCase = True
        .Global = True
        stripPunctuation = .Replace(z, "")
    End With
    End Function
    zeddy
    Attached Files Attached Files
    Last edited by zeddy; 2016-08-18 at 05:41.

  6. The Following User Says Thank You to zeddy For This Useful Post:

    RetiredGeek (2016-08-18)

  7. #6
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,433
    Thanks
    371
    Thanked 1,456 Times in 1,325 Posts
    Zeddy,

    Loved the StripPunctuation Function!

    It is now in my code database. I've made some notes and thought I'd post it for those who may not get into the long code above or recognize the genius of the code!

    Code:
    Function StripPunctuation(zSrcStr As String) As String  
    
    '+---------------------------------------------------------------+
    '| Author: Zeddy of the Windows Secrets Lounge                   |
    '| Date  : Aug 18, 2016                                          |
    '|                                                               |
    '| Description: Strips punctuation from the supplied string      |
    '|              ignoring case and returns the modified string    |
    '|              to the caller.                                   |
    '|              See comments for some available adjustments.     |
    '+---------------------------------------------------------------+
    
      With CreateObject("vbscript.regexp")
          .Pattern = "[^A-Z ']"           '<<allow alpha chars, space, apostrophe '
    '      .Pattern = "[^A-Z^0-9 ']"      '<<allow alphanumeric chars, space, apostrophe '
    '      .Pattern = "[^A-Z^a-z^0-9 ']"  '<<allow alphanumeric chars if .IgnoreCase = False!
          .IgnoreCase = True              'Change to False to honor case
          .Global = True                   
          StripPunctuation = .Replace(zSrcStr, "")
      End With
      
    End Function 'StripPunctuation
    Note: This function will work in ANY Office Product!

    HTH
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  8. #7
    WS Lounge VIP
    Join Date
    Mar 2002
    Location
    Newcazzle, UK
    Posts
    2,815
    Thanks
    132
    Thanked 479 Times in 456 Posts
    Hi RG

    ..I like the additional examples you gave, and that Thank You is very appreciated!

    I like seeing the different ways of helping out in this Lounge.
    There are always so many different techniques available.

    zeddy

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •