  1. #1
    Super Moderator BATcher's Avatar
    Join Date
    Feb 2008
    Location
    A cultural area in SW England
    Posts
    3,414
    Thanks
    33
    Thanked 195 Times in 175 Posts

    Anyone knowledgeable about the use of WGET?

    I've just returned to WGET after a gap of about ten years, since when https has come upon the scene.

    I've set up the WGET parameters correctly, I hope, but my download is of an HTML file not the expected TXT file.

    Here's part of what I'm using - assume the other variables are set appropriately...

    set websource=https://www.thewebsite.org.uk/Documents/WhatIWantToDownload.txt

    set wgetparms="%websource%" -O %newfile% -a %log% --no-check-certificate
    set wgetparms=%wgetparms% --user=username --password=password

    call %wgetpath%\%wgetversion% %wgetparms%


    When I use a browser to do the file download it works fine...
    BATcher

    Time prevents everything happening all at once...

  2. #2
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,434
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    BATcher,

    I'm not familiar with wget but I don't see where you are initializing %wgetpath%, %log%, %newfile% ? Of course this could be in part of the script you didn't share?

    JIC here's a link to the WGET User Manual.

    Probably not much help but I'm trying!

    HTH
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  3. #3
    3 Star Lounger
    Join Date
    Dec 2009
    Location
    Northern California
    Posts
    326
    Thanks
    15
    Thanked 142 Times in 91 Posts
    Quote Originally Posted by RetiredGeek View Post
    I don't see where you are initializing %wgetpath%, %log%, %newfile% ?
    I noticed that, too, but disregarded it as not relevant because BATcher said, "my download is of an HTML file not the expected TXT file." That implies to me that his script is downloading successfully, just not the content he's expecting, so I think we can assume the variables are predefined elsewhere.

    I'm no expert, but in what little experience I've had, wget has always downloaded exactly what the server gave it, no more, no less, which raises the question: have you confirmed that the file on the server really doesn't contain HTML tags? If you use a browser to open "WhatIWantToDownload.txt" and then view source, are there any html tags or is it plain text only?

  4. #4
    Super Moderator BATcher's Avatar
    Thanks, both of you! You missed my "assume the other variables are set appropriately" statement!

    I have done some searching in the hefty HTML document (65KB), and find:
    Code:
     <div class="main-container">         
    <nav role="navigation" class="main-breadcrumb" aria-label="Breadcrumb" data-track-zone="breadcrumb">
        <div class="main-breadcrumb__content breadcrumb">        
        </div>
    </nav>
      
    <form method="post" action="?ReturnUrl=Documents%2fWhatIWantToDownload.txt" id="aspnetForm">
    
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTkyMjIwOTc2NmQYAwUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFVGN0bDAwJGN0bDAwJFJvb3RQbGFjZUhvbGRlciRSb290UGxhY2VIb2xkZXIkTG9naW5Db250cm9sJExvZ2luRm9ybUNvbnRyb2wkUmVtZW1iZXJNZQUjY3RsMDAkY3RsMDAkUmVxdWlyZWRSZXNvdXJjZXNGb290ZXIPBQZGb290ZXJkBSNjdGwwMCRjdGwwMCRSZXF1aXJlZFJlc291cmNlc0hlYWRlcg8FBkhlYWRlcmQFvd+F/+Xce6aPV9wL/r5i9zVfgg==" />
    </div>
    Oh dear...!
    Last edited by BATcher; 2016-04-01 at 02:48. Reason: Add even more incomprehensible stuff up to </div>
    BATcher


  5. #5
    Super Moderator BATcher's Avatar
    And here's the appropriately-edited WGET log information...
    Code:
    --2016-04-01 08:14:39--  https://www.thewebsite.org.uk/Documents/WhatIWantToDownload.txt
    Resolving www.thewebsite.org.uk (www.thewebsite.org.uk)... ppp.qqq.rrr.sss
    Connecting to www.thewebsite.org.uk (www.thewebsite.org.uk)|ppp.qqq.rrr.sss|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: /login/?ReturnUrl=%2fDocuments%2fWhatIWantToDownload.txt [following]
    --2016-04-01 08:14:39--  https://www.thewebsite.org.uk/login/?ReturnUrl=%2fDocuments%2fWhatIWantToDownload.txt
    Reusing existing connection to www.thewebsite.org.uk:443.
    HTTP request sent, awaiting response... 200 OK
    Length: 66118 (65K) [text/html]
    Saving to: 'TargetFile.txt'
    BATcher
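
    Reading the log in #5: the first request gets a 302 to /login/, and the second returns 200 with [text/html], so the 65K file saved as TargetFile.txt is most likely the site's login page rather than the document. A minimal Python sketch of that check follows. The function names are made up for illustration, and the log text is copied from the post, trimmed to the relevant lines.

```python
import re

# Relevant lines copied from the wget log in post #5.
WGET_LOG = """\
--2016-04-01 08:14:39--  https://www.thewebsite.org.uk/Documents/WhatIWantToDownload.txt
HTTP request sent, awaiting response... 302 Found
Location: /login/?ReturnUrl=%2fDocuments%2fWhatIWantToDownload.txt [following]
--2016-04-01 08:14:39--  https://www.thewebsite.org.uk/login/?ReturnUrl=%2fDocuments%2fWhatIWantToDownload.txt
HTTP request sent, awaiting response... 200 OK
Length: 66118 (65K) [text/html]
"""


def summarize_wget_log(log_text):
    """Return (redirect targets, status codes, last reported content type)."""
    redirects = re.findall(r"^\s*Location: (\S+)", log_text, flags=re.MULTILINE)
    statuses = [int(code) for code in
                re.findall(r"awaiting response\.\.\. (\d{3})", log_text)]
    types = re.findall(r"^\s*Length: .*\[([^\]]+)\]", log_text, flags=re.MULTILINE)
    return redirects, statuses, (types[-1] if types else None)


def looks_like_login_redirect(log_text):
    """Heuristic: redirected to a URL containing 'login' and ended on HTML."""
    redirects, _, content_type = summarize_wget_log(log_text)
    bounced = any("login" in target.lower() for target in redirects)
    return bounced and content_type == "text/html"


if __name__ == "__main__":
    print(summarize_wget_log(WGET_LOG))
    print(looks_like_login_redirect(WGET_LOG))
```

    The "login" substring test is only a heuristic; a site could name its sign-in page anything.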


  6. #6
    Super Moderator RetiredGeek's Avatar
    Quote Originally Posted by BATcher View Post
    Thanks, both of you! You missed my "assume the other variables are set appropriately" statement!
    Reminds me of that famous book "With Out A Clue" by:

    ROTFLOL.gif

  7. #7
    3 Star Lounger
    Quote Originally Posted by BATcher View Post
    You missed my "assume the other variables are set appropriately" statement!
    Why would I have seen that? After all, I also missed:
    When I use a browser to do the file download it works fine.
    ...which had already answered my question, "If you use a browser to open "WhatIWantToDownload.txt" and then view source, are there any html tags or is it plain text only?"

    So I missed both.

  8. #8
    3 Star Lounger
    I think one relevant point to consider in this discussion is that a browser will attempt to parse what the server is sending back before displaying it, while wget does not (AFAIK). For example, if you request a page that contains "<img src=..." a browser would make a second request and display the image, while wget would just save the markup, link and all, without fetching the image.

    I'm guessing that's why you're getting different results from a browser vs. your wget command, and I think reply #4 may contain a clue:
    Code:
    <form method="post" action="?ReturnUrl=Documents%2fWhatIWantToDownload.txt"
    The action="? part means the form is designed to return to itself (because there's no URL in front of the "?"), but this time with a hidden parameter--all that gobbledegook. I wonder if that's because of the https, and the gobbledegook might be some sort of SSL key?
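
    To see where that form would actually post, the relative action can be resolved against the login page URL from the log in reply #5. A minimal Python sketch using the standard library's urljoin; the variable names are just for illustration.

```python
from urllib.parse import urljoin

# URL of the page wget ended up on, taken from the log in reply #5.
page_url = ("https://www.thewebsite.org.uk/login/"
            "?ReturnUrl=%2fDocuments%2fWhatIWantToDownload.txt")

# The form's action attribute, taken from the HTML in reply #4.
form_action = "?ReturnUrl=Documents%2fWhatIWantToDownload.txt"

# A query-only reference keeps the base path and replaces the query string.
post_target = urljoin(page_url, form_action)
print(post_target)
```

    The resolved target is still the /login/ page, which supports the "returns to itself" reading: the form posts back to the login page, not to the document.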

    In effect, the server may be saying, "If you want this file, rerequest with this encryption key." Your wget script is showing you what the server is saying, while a browser would parse that and respond back to the server as requested, to which the server would make a second response with the desired data.

    I don't know if that's how it works, but that would be my working theory.

    If you repeat your wget script, does the gobbledegook change? If not, you might be able to edit your wget URL to "WhatIWantToDownload.txt?__VIEWSTATE={gobbledegook}"

    But I suspect it won't be that easy. If that's an SSL key, it should be different each time your computer and the server open an SSL channel.
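
    One way to test the SSL-key idea is to Base64-decode the value. __VIEWSTATE is normally the hidden field in which ASP.NET Web Forms keeps page state, and decoding the start of the value from reply #4 shows readable control names rather than key material. A rough Python sketch follows. Only the first 188 characters of the value are used (cut at a 4-character boundary so it stays valid Base64), and printable_runs is a hypothetical helper.

```python
import base64
import re

# First 188 characters of the __VIEWSTATE value quoted in reply #4.
VIEWSTATE_PREFIX = (
    "/wEPDwUKLTkyMjIwOTc2NmQYAwUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9f"
    "FgEFVGN0bDAwJGN0bDAwJFJvb3RQbGFjZUhvbGRlciRSb290UGxhY2VIb2xkZXIk"
    "TG9naW5Db250cm9sJExvZ2luRm9ybUNvbnRyb2wkUmVtZW1iZXJNZQUj"
)


def printable_runs(data):
    """Return runs of six or more printable ASCII bytes found in data."""
    return [run.decode("ascii") for run in re.findall(rb"[\x20-\x7e]{6,}", data)]


decoded = base64.b64decode(VIEWSTATE_PREFIX)

# Print whatever human-readable fragments the binary blob contains.
for run in printable_runs(decoded):
    print(run)
```

    The readable fragments include names such as LoginControl and RememberMe, which points to serialized state for a login form. That also suggests appending the value to the URL is unlikely to help on its own; the form expects it back in a POST along with the credentials.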

    I don't know, this has just gotten out of my league.

  9. #9
    Super Moderator BATcher's Avatar
    You know much more about it than I do! I suspect I need an ASP.NET developer to elucidate.

    I get the same result with and without the --no-check-certificate parameter, and with and without the logon/password information.
    BATcher
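
    The result in #9 (same output with or without the credentials) would fit a form-based login. As far as I know, wget's --user and --password are for HTTP and FTP authentication, not for filling in an HTML login form, and the 302 to /login/ suggests the latter. The usual approach is to POST the form's fields (hidden ones included) to the login page, keep the session cookie, and then request the file; wget has --post-data, --save-cookies, --keep-session-cookies and --load-cookies for that. Below is a hedged Python sketch of just the first step, building the POST body. The field names UserName and Password and the sample HTML are placeholders, not taken from the real site; the real names have to be read from the login page's source.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_body(login_html, username, password,
                     user_field="UserName", password_field="Password"):
    """Return a URL-encoded POST body: hidden fields plus credentials.

    user_field and password_field are assumed names for illustration only.
    """
    collector = HiddenInputCollector()
    collector.feed(login_html)
    fields = dict(collector.fields)
    fields[user_field] = username
    fields[password_field] = password
    return urlencode(fields)


if __name__ == "__main__":
    # Placeholder HTML shaped like the form in post #4; the value is made up.
    sample = ('<form method="post" action="?ReturnUrl=x" id="aspnetForm">'
              '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" '
              'value="abc/123==" /></form>')
    print(build_login_body(sample, "username", "password"))
```

    An ASP.NET login form may expect further fields (a submit button name, for instance), so the page source is the thing to check before trying this against the real site.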

