Results 1 to 7 of 7
  1. #1
    Silver Lounger
    Join Date
    Oct 2002
    Posts
    1,993
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Extracting parts of text strings (2002 and above)

    I am trying to extract different parts from text strings, for the purpose of later using AutoFilter or some other filter etc.

    Among other information in a sheet I have a column with text strings. A basic example could look like this:

    ABC123_XY4567_Name;Another name_Third name_**.xyz

    The information that is interesting is the numbers and names and eventual last codes (**), it can even be some more (_**_##). In this first step I'm working with, the first name ("Name"), is the same for all the text strings, so I'm not extracting that at the moment.

    The problem is that sometimes the string looks a little different, but still with the main separators. Ex. another ABC code in string. At the moment I haven't bothered with those who have two codes, i.e. extracting the second in another column but guess it would be simple and useful.

    ABC123_ABC456_XY4567;1_Name;Another name_Third name_**.xyz

    But as you can see it sometimes also is part of a set ("1_Name") and then it has another first ";" suddenly before "Name".

    So, if we look at the first example and I extract "Another name", I could search for ";" add 1, and get the start position for A in "Another name". Works OK. But since it sometimes is another ";" earlier in the string I must use better approach. IF it's an earlier ";" it's always after XY code, I could then search for XY instead and add 6 or something to get a start position for the real search for ";", i.e. the second. But I get some problem with the 1_.

    The first parts are easy:
    Let's say the first string above is in B10, then I could set up columns for: "ABC" "XY" "Another name" and "Third name".
    ABC
    =MID(B10;4;3) gives 123
    XY
    =MID(B10;SEARCH("XY";B10)+2;4) gives 4567
    Another name
    =MID(B10;SEARCH(";";B10)+1;SEARCH("_";B10;SEARCH(" ;";B10)+1)-SEARCH(";";B10)-1) gives Another name.

    But if the string looks like these:
    ABC789_XY4567;1_Name;Another name_Third name_**.xyz
    ABC123_ABC789_XY4567;1_Name;Another name_Third name_**.xyz

    with or without two ABC codes, but the odd ;1_ before name, I don't get Another name, I get 1. It's the first semicolon that's the problem.

    And last the Third name; many times the string ends with special codes, (not exact but as example) _** or _## or _**_##. Thus when extracting Third name I initially searched for last "_" to get end position for Third name. Worked OK.

    =MID(B10;SEARCH("_";B10;25)+1;SEARCH("_";B10;SEARC H("_";B10;25)+1)-SEARCH("_";B10;25)-1) gives Third name.

    But if the string ends like this, i.e. no end codes after Third name:
    _Third name.xyz
    I get #VALUE error.

    So problems are (at the moment <img src=/S/smile.gif border=0 alt=smile width=15 height=15>): Another name if a first ";" and Third name since string can end differently (code _** or just ".").

    Any help is greatly appreciated.

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Extracting parts of text strings (2002 and above)

    If there is a semicolon before the first name, is the semicolon ALWAYS followed by 1_ or could it be something else?
    Is there any chance that Another name is prefixed with 1_ ?

  3. #3
    Silver Lounger
    Join Date
    Oct 2002
    Posts
    1,993
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting parts of text strings (2002 and abo

    <img src=/S/hello.gif border=0 alt=hello width=25 height=29> Hans,

    Thanks for reply.
    If there is a semicolon before the first name, it is always followed by a number and "_". Such as:
    ;1_Name
    ;2_Name
    ;3_Name

    As you maybe can guess the first name (Name) is for a broader group, so actually, if they had asked me, the 1_ part should probably been connected to Third name. But let's not confuse matters with that. This is how it is.

    Most of the time it is:
    ABC123_XY4567_Name;Another name etc.
    but sometimes
    ABC789_XY4567;1_Name;Another name etc.
    ABC123_ABC789_XY4567;1_Name;Another name etc.

    Another name (silly this, but I can't use real examples), is never prefixed. It starts after first (most cases) or as you've seen after second semicolon, and ends with "_". Ex: ;Another name_Third name_ etc.

    The second name (Another name) can be one word or several, but I have not had any problem with that. That is why I used "Another name" as example (two words with space). Actually all names can be one or several words. And as mentioned Third name many times ends with "_" before last code, but sometimes there are no end code, and I also get problems.

    Again, thanks for your reply.

  4. #4
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Extracting parts of text strings (2002 and abo

    It's probably possible to do this with formulas, but it gets complicated. I wrote custom VBA function that parses the text:

    Function SplitCode(s As String)
    Dim p1 As Integer
    Dim p2 As Integer
    Dim p3 As Integer
    Dim p4 As Integer
    Dim p5 As Integer
    Dim c1 As Integer
    Dim s1 As String
    Dim s2 As String
    Dim s3 As String
    Dim s4 As String
    ' position of first semicolon
    p1 = InStr(1, s, ";")
    c1 = Asc(Mid(s, p1 + 1))
    If c1 > 47 And c1 < 58 Then
    ' semicolon followed by digit
    ' so look for next semicolon
    p1 = InStr(p1 + 1, s, ";")
    End If
    ' position of previous underscore
    p2 = InStrRev(s, "_", p1)
    ' first name
    s1 = Mid(s, p2 + 1, p1 - p2 - 1)
    ' position of next underscore
    p3 = InStr(p1 + 1, s, "_")
    ' second name
    s2 = Mid(s, p1 + 1, p3 - p1 - 1)
    ' position of next underscore
    p4 = InStr(p3 + 1, s, "_")
    If p4 = 0 Then
    ' not found, so no code
    ' position of period
    p5 = InStr(p3 + 1, s, ".")
    ' third name
    s3 = Mid(s, p3 + 1, p5 - p3 - 1)
    ' code
    s4 = ""
    Else
    ' found
    ' position of period
    p5 = InStr(p4 + 1, s, ".")
    ' third name
    s3 = Mid(s, p3 + 1, p4 - p3 - 1)
    ' code
    s4 = Mid(s, p4 + 1, p5 - p4 - 1)
    End If
    SplitCode = Array(s1, s2, s3, s4)
    End Function

    It's an array function that returns 4 values: the three names and the code (which can be an empty string).

    Usage:
    Say that you have a text string in A1.
    Select B1:E1, enter the formula =SplitCode(A1) and press Ctrl+Shift+Enter to confirm as an array formula.

    See the attached workbook.
    Attached Files Attached Files

  5. #5
    Silver Lounger
    Join Date
    Oct 2002
    Posts
    1,993
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting parts of text strings (2002 and abo

    Hi Hans,

    Thanks for reply and everything!

    Looks promising, indeed promising. Singing over here ... I looked at your workbook, tested in a test file I had prepared if needed to post here ( <img src=/S/sorry.gif border=0 alt=sorry width=15 height=15> didn't, but same formulas as I posted) and also tested on real data.

    Since I hadn't bothered to look at first name (Name), at this stage all are the same in my project, but if the solution is "scaled" one level, it could be useful, I had not found an error in a text string.

    Since you included Name in the solution, when I double-clicked to expand the column I found this XY4567Name (example, was something else in real data), in Name column. Obviously the string is wrong.
    ABC123_XY4567Name;Another name_Third name_**.xyz
    No "_" after XY nr. or before Name.

    Thanks!

    I know extremely little about VBA, have only recorded macros some times, or created some simple quick and dirty years ago. But I'm usually good at formulas <img src=/S/smile.gif border=0 alt=smile width=15 height=15>.

    In the e-mail: "If c1 > 47 And c1 < 58 Then" came through as "If c1 > 47 And c1 < 58 Then", didn't work of course. Understood it should be "<", but looked at your file.
    ---
    Oops, a preview tells me I can't even post in html what I found in the mail, it was: "& l t ;" with no spaces between.
    ---
    But I can't say I understand the string
    c1 = Asc(Mid(s, p1 + 1))
    If c1 > 47 And c1 < 58 Then

    or anything else <img src=/S/laugh.gif border=0 alt=laugh width=15 height=15>, it's late. As you can guess I'm a person trying to solve anything with formulas to avoid using VBA.

    In your file, I find "Option Explicit" first in the module, but the code worked when I added a module and pasted it in my files. Again, my knowledge about VBA is nil, but I recall to have heard something about it's good to use "Option Explicit". Sure enough found this:
    Declaring Variables In VBA Code, at Chip's site.

    A preliminary check says it works, works very fine!

    Don't know how it will work using macros in the file. At this stage no problem with me, but don't know about other people. On the other hand at this stage it's my little project.

    Maybe can come back if I find more questions.

  6. #6
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Extracting parts of text strings (2002 and abo

    Sometimes, symbols such as "greater than", "less than", "greater than or equal to", "less than or equal to" and "not equal to" are converted to HTML codes when you view the post in your e-mail program (or sometimes in the browser). The code module in the workbook shows the "real" code.

    The Asc function returns the numeric code for a character. The ASCII code for "0" is 48 and that for "9" is 57. To test whether a character is a digit, I test whether its ASCII code is greater than 47 and less than 58.

    If you don't need the first name for now, we won't bother with it. If you do need it, you'll have to tell us what all possible forms for the first part of the string are.

  7. #7
    Silver Lounger
    Join Date
    Oct 2002
    Posts
    1,993
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting parts of text strings (2002 and abo

    The first name isn't used at this stage since I look at a specific group, and first name is same within that group (it's the name on the group). But there are other data not added to the workbook from other groups. If one collects more (or all) data, then being able to filter on first name is really useful.

    No problem, your code is excellent, apart from working with the data, it also showed a string not correctly named:
    ABC123_XY4567Name;Another name_Third name_**.xyz

    so there probably are no other possible forms than those I mentioned first. It's only good that your code don't extract the Name in the above string, since it's easy to find and it shows an error in the text string naming!

    Since this is an array I can always hide the first name column if I don't want it. But it looks good.

    Another thing; my example for the XY code:
    =MID(B10;SEARCH("XY";B10)+2;4)
    from a string like this
    ABC123_XY4567_Name;Another name_Third name_**.xyz

    results in 4567, format is General, but it is text, so I can't use "is greater than" etc. under "Custom..." when using AutoFilter. Must be a number.

    A thing to add, maybe, would be to have a possibility to filter what I called ABC code, not as I did, simple extracting of first 123, but if two different codes have been used extracting both. Something like:
    ABC123_XY4567_Name; etc. gives "123", but
    ABC123_ABC456_XY4567_Name; etc. gives "123/456"
    or something similar so one could find both codes togeher.

    On the other hand the full text string is always there after filtering, so if my formula only picks the first 123, it is still possible to see the result.

    Lets say I filter on 123 for ABC code,
    ABC123_XY4567_Name; etc. my formula gives 123 and it's shown when filtered.
    ABC987_XY4567_Name; etc. my formula gives 987 and it's NOT shown when filtered.
    ABC123_ABC456_XY4567_Name; etc. my formula gives 123 and it's shown when filtered.

    So even if the last one has also another code, the primary (first) is the one that is shown and the one possible to filter.

    It's getting late for us here in Europe Hans, I'll look more into this tomorrow (or later today ...), together with actually using the data.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •