Results 1 to 5 of 5
  1. #1
    3 Star Lounger
    Join Date
    Mar 2004
    Location
    London, Gtr London, England
    Posts
    298
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I extract comments from students about the course they have just finished, and compile reports for the university administration. the comments are typed online into a box on the web form used for collection feedback, and can be extracted into a .csv file, one row per student. The students can type anything, and text may include for example apostrophes, ampersands, quote marks, line breaks etc. These then show up as or ' type codes in the extracted text. Someone (me) then has to go through and clean up the text, so I am looking for an easy way to identify the dirty text and replace it with the correct punctuation. Find Replace will do this but it's very slow, tedious and error prone. Any ideas about how to improve on it?
    Thanks
    Frank

  2. Get our unique weekly Newsletter with tips and techniques, how to's and critical updates on Windows 7, Windows 8, Windows XP, Firefox, Internet Explorer, Google, etc. Join our 480,000 subscribers!

    Excel 2013: The Missing Manual

    + Get this BONUS — free!

    Get the most of Excel! Learn about new features, basics of creating a new spreadsheet and using the infamous Ribbon in the first chapter of Excel 2013: The Missing Manual - Subscribe and download Chapter 1 for free!

  3. #2
    Star Lounger
    Join Date
    Dec 2009
    Location
    Lansing, MI
    Posts
    65
    Thanks
    1
    Thanked 2 Times in 2 Posts
    I have very limited experience working with macros, and I'm sure someone has a macro out there that will do exactly what you are asking about. So with that in mind, here's how I might handle it to remove the html code (which sounds like is your problem)... round trip through Internet Explorer. Of course, I don't know the extent of what you might be receiving in your CSV file, so this may or may not work for you. It worked in the couple of simple tests that I ran though and I didn't have to do any extensive find and replace for any other html codes.

    Make a back up of your file... just in case.

    Change the extension of your CSV file to html
    Open the file in Notepad
    Do a replace of with either a space or with nothing (depends on how your file actually looks)
    Save and close
    Open your html file in Internet Explorer
    Select File > Save As and save as a text file with csv as the extension
    Open your new file in Excel

  4. #3
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts
    Quote Originally Posted by frankwalsh1962 View Post
    The students can type anything, and text may include for example apostrophes, ampersands, quote marks, line breaks etc. These then show up as or ' type codes in the extracted text. Someone (me) then has to go through and clean up the text, so I am looking for an easy way to identify the dirty text and replace it with the correct punctuation.
    It would be handy to be able to use the .Net HtmlDecode method, which reverses the process of coding special characters into HTML tags and entities. However, I don't know whether Word/VBA can do that. The alternative to using such a pre-built routine is, as you have seen, a huge amount of manual futzing.

    == Edit ==

    One thing you could try, but you must make sure there are no hidden script tags in the user input, is to open the .csv file directly in a browser. This should decode the HTML back to characters for display. Then you could select and copy the entire displayed text and paste it into a text editor. The problem with this would be if the user inserted quotation marks into their comments, then the CSV format would be broken. Hmmm...

  5. #4
    Super Moderator
    Join Date
    Jan 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    3,515
    Thanks
    3
    Thanked 143 Times in 136 Posts
    It would be a pretty simple macro to check the ascii value of every character in the submitted text and if it falls outside the range of a-z and A-Z then it is replaced with something that allows it to be reinstated eg <asc32> if required.

    How would you plan to run this code? Would you plan to manually do it in Word or do you need a solution that acts on the webserver using either vbscript or javascript?
    Andrew Lockton, Chrysalis Design, Melbourne Australia

  6. #5
    3 Star Lounger
    Join Date
    Mar 2004
    Location
    London, Gtr London, England
    Posts
    298
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Wow, thanks guys, I've been away for a little and come back to find several great answers!

    The comments often include quote marks, but these get converted into ampersand hash digit digit semicolon before I get a chance to download as a .csv file (I can't actually put ampersands etc in this text box!)

    These codes are often identifiable as apostrophes, dashes, line breaks, etc. and could be find-and-replaced if you know what to replace with (usually easy if read in context). So I'm currently using a series of FnR operations on each one in turn. I will try out the idea of opening in a browser, if it will handle a .csv file - or note pad after changing the extension.

    I will try things out and let you know what happens. Thanks Frank

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •