  1. #1

    How to extract certain text from a string

    Hello,

    How do I filter or extract specific strings from a larger block of text?

    I have converted a PDF file into a String using iTextSharp and displayed the text in RichTextBox1.
    However, it contains a lot of irrelevant text that I don't need.
    Is there a way to display only the text I want, based on keywords, searching through the entire text?

    Example of the text displayed in RichTextBox1 after converting the PDF to text:

    2
    3
    3
    4
    4
    A A
    B B
    SHEET 1 OF 1
    774
    SIZE
    SCALE
    24.000-47.999
    12.000-23.999
    CON BAG
    WIRE
    90in. EX
    Bos00232940
    Bos00320491
    Das1234
    Das3216
    DETAILS
    1 2
    RAGE

    So the keywords would be "Bos", "Das", and "774", and the new text displayed in RichTextBox1 would be the following, instead of the entire text above:

    Bos00232940
    Bos00320491
    Das1234
    Das3216
    774

    Here is what I have so far, but it doesn't work: it still displays the entire PDF text in the RichTextBox.



    Code:
    Public Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim pdffilename As String
        pdffilename = TextBox1.Text
        Dim filepath = "c:\temp\" & TextBox1.Text & ".pdf"
        Dim thetext As String
        thetext = GetTextFromPDF(filepath)
        Dim lines() As String = System.Text.RegularExpressions.Regex.Split(thetext, Environment.NewLine)

        Dim keywords As New List(Of String)
        keywords.Add("Bos")
        keywords.Add("Das")
        keywords.Add("774")
        Dim newTextLines As New List(Of String)
        For Each line As String In lines
            For Each keyw As String In thetext
                If line.Contains(keyw) Then
                    newTextLines.Add(line)
                    Exit For
                End If
            Next
        Next
        RichTextBox1.Text = String.Join(Environment.NewLine, newTextLines.ToArray)
    End Sub

    I am using VB.NET 2010.

    Thanks,

    Steve.

  2. #2
    SpywareDr

    Try this:

    1. Rename your original text file "Richtextbox1.txt".

    2. Open a CMD prompt, change to the folder containing your "Richtextbox1.txt" file, and type:

      for %x in (Bos Das 774) do type Richtextbox1.txt|find "%x">>results.txt

    3. The strings you wanted should now be in the file named "results.txt"

  3. #3
    That's not really a solution for me; I want to do it within my program. Thanks for the idea, though.

    Again, the original is a PDF document that I convert into a string via VB.

    Thanks for the reply.

  4. #4
    Thank you all, but I figured it out.

  5. #5
    RetiredGeek

    Kingpain,

    Please post your solution so others can learn from your work. Thanks!
    May the Forces of good computing be with you!

    RG

  6. #6
    Originally posted by RetiredGeek:
    Kingpain,

    Please post your solution so others can learn from your work. Thanks!

    Sure, buddy.

    Code:
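    ' Requires "Imports System.Text.RegularExpressions" at the top of the file.
    Public Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Build the PDF path from the file name typed into TextBox1.
        Dim pdffilename As String = TextBox1.Text
        Dim filepath As String = "c:\temp\" & pdffilename & ".pdf"
        Dim thetext As String = GetTextFromPDF(filepath)

        ' Match tokens that start with 774, Bos or Das (case-insensitive) and have a
        ' space or tab on both sides; the token itself is captured in the group "w".
        Dim re As New Regex("[\t ](?<w>((774)|(Bos)|(Das))[a-z0-9]*)[\t ]", RegexOptions.ExplicitCapture Or RegexOptions.IgnoreCase Or RegexOptions.Compiled)
        Dim Lines() As String = {thetext} ' the whole text is searched as a single block
        Dim words As New List(Of String)
        For Each s As String In Lines
            Dim mc As MatchCollection = re.Matches(s)
            For Each m As Match In mc
                words.Add(m.Groups("w").Value)
            Next
        Next
        ' Show one extracted token per line.
        RichTextBox1.Text = String.Join(Environment.NewLine, words.ToArray)
    End Sub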
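
For readers wondering why the first attempt in post #1 showed everything: its inner loop iterates over `thetext` rather than `keywords`, so each `keyw` is a single character of the PDF text, and every non-empty line contains at least one of those characters. Splitting only on `Environment.NewLine` may also leave the text as a single line if the extractor emits bare line feeds. The sketch below is one possible line-based fix, plus a word-boundary variant of the regex idea from post #6. The module and function names (`KeywordFilterSketch`, `FilterLines`, `ExtractTokens`) are hypothetical and not from the thread, and the sample text is made up from the values in post #1.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module KeywordFilterSketch
    ' Hypothetical helper: returns the lines of "text" that contain
    ' at least one of the keywords (case-sensitive), in document order.
    Function FilterLines(ByVal text As String, ByVal keywords As IEnumerable(Of String)) As List(Of String)
        ' Split on any common line ending, not only Environment.NewLine.
        Dim lines() As String = text.Split(New String() {vbCrLf, vbLf, vbCr}, StringSplitOptions.RemoveEmptyEntries)
        Dim kept As New List(Of String)
        For Each line As String In lines
            ' Iterate over the keyword list, not over the characters of the text.
            For Each keyw As String In keywords
                If line.Contains(keyw) Then
                    kept.Add(line.Trim())
                    Exit For
                End If
            Next
        Next
        Return kept
    End Function

    ' Hypothetical helper: returns every token that starts with 774, Bos or Das
    ' (case-insensitive), using word boundaries instead of space/tab delimiters
    ' so that tokens at line starts or next to each other are also found.
    Function ExtractTokens(ByVal text As String) As List(Of String)
        Dim tokens As New List(Of String)
        For Each m As Match In Regex.Matches(text, "\b(?:774|Bos|Das)[a-z0-9]*\b", RegexOptions.IgnoreCase)
            tokens.Add(m.Value)
        Next
        Return tokens
    End Function

    Sub Main()
        Dim sample As String = "SHEET 1 OF 1" & vbLf & "774" & vbLf & "WIRE" & vbLf & "Bos00232940" & vbLf & "Das1234"
        ' Prints 774, Bos00232940 and Das1234, one per line.
        Console.WriteLine(String.Join(Environment.NewLine, FilterLines(sample, New String() {"Bos", "Das", "774"}).ToArray()))
    End Sub
End Module
```

Both helpers return matches in the order they occur in the text, not grouped by keyword; sorting or grouping would be an extra step if the order shown in post #1 matters.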
