A Tutorial on Wget

  1. Basics
  2. Download Entire Website
  3. Fooling the Webmasters
  4. Download all PDFs

Basics

Wget is one of the most powerful tools available for downloading things from the internet. You can do a lot with it, but its basic use is simply downloading files.

To download a file, just type:

wget http://your-url-to/file

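If the download is interrupted and you run the same command again, wget starts over from scratch. Use the -c option to continue a partially downloaded file instead (this only works if the server supports resuming):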

wget -c http://your-link-to/file

You can also disguise wget as a web browser with the -U (--user-agent) option.
This helps when a site does not allow download managers:

wget -c -U Mozilla http://your-link-to/file
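Since -c picks up where the previous attempt stopped, it combines well with a retry loop, for example on a flaky connection. Below is a minimal sketch assuming a POSIX shell; the helper name fetch_until_done, the 5-attempt default and the 10-second pause are my own hypothetical choices, not wget settings.

```shell
# Hypothetical helper (not part of wget): keep resuming a download with -c
# until wget succeeds or MAX_TRIES attempts have failed.
# Usage: fetch_until_done URL [MAX_TRIES]
fetch_until_done() {
    url=$1
    max_tries=${2:-5}   # assumed default, not a wget setting
    try=1
    # -c continues the partial file; -U Mozilla masks wget as a browser
    until wget -c -U Mozilla "$url"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        try=$((try + 1))
        sleep 10        # assumed pause before the next attempt
    done
}
```

Usage: fetch_until_done http://your-link-to/file. Note that wget already retries transient network errors on its own (20 tries by default, see --tries), so a loop like this mainly helps when the whole wget process gives up or gets killed.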

Download Entire Website

You can download an entire website using -r option.

wget -r http://your-site.com

But be careful: this downloads the entire website. Since recursive downloads can put a heavy load on servers, wget obeys robots.txt by default. To mirror a site on your local drive, use the -m option:

wget -m http://your-site.com
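According to the wget manual, -m is currently shorthand for -r -N -l inf --no-remove-listing, i.e. recursion with no depth limit plus timestamping. Here is a minimal sketch of a wrapper, assuming you want the copy in its own folder; mirror_site and the default folder name "mirror" are hypothetical, while -P (--directory-prefix) is the standard wget option for choosing the target directory.

```shell
# Hypothetical helper (not part of wget): mirror a site into its own folder.
# Usage: mirror_site URL [DIR]
mirror_site() {
    url=$1
    dir=${2:-mirror}   # assumed default folder name
    # -m = -r -N -l inf --no-remove-listing (per the wget manual)
    # -P puts the downloaded tree under $dir instead of the current folder
    wget -m -P "$dir" "$url"
}
```

Because -m includes timestamping (-N), running mirror_site http://your-site.com again later should mostly fetch only the files that changed on the server.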

You can limit how many levels deep wget follows links into the site with the -l option:

wget -r -l3 http://your-site.com

This downloads only up to 3 levels deep (the default for -r is 5 levels). If you want to download only a subfolder of a website, use the --no-parent option. With it, wget downloads only that folder and its subfolders and never climbs up into the parent folders:

wget -r --no-parent http://your-site.com/subfldr/subfolder/
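The wget manual notes that for HTTP the trailing slash matters to --no-parent: without it, wget treats "subfolder" as a file inside subfldr/, so --no-parent only keeps it inside subfldr/ and it may also fetch sibling folders. The sketch below, a hypothetical helper grab_subfolder with an assumed default depth of 3, combines -r, -l and --no-parent and adds the slash when it is missing.

```shell
# Hypothetical helper (not part of wget): download one subfolder, N levels deep.
# Usage: grab_subfolder URL [DEPTH]
grab_subfolder() {
    url=$1
    depth=${2:-3}      # assumed default, same as the -l3 example
    # Without a trailing slash, --no-parent would treat the last path
    # component as a file and allow its whole parent folder. Add the slash.
    case $url in
        */) ;;
        *) url="$url/" ;;
    esac
    wget -r -l "$depth" --no-parent "$url"
}
```

Usage: grab_subfolder http://your-site.com/subfldr/subfolder 2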

Now for terrible ideas. To hell with webmasters who don't allow downloading their website: to make wget ignore robots.txt, type:

wget -r -U Mozilla -erobots=off http://url-to-site/ 

P.S. I have heard online that masquerading as a browser is a crime in some countries, or something along those lines.

Fooling the Webmasters

Do you think the webmaster cannot stop you with the above command? A burst of rapid requests from one visitor is easy to spot. To fool him, use:

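wget -r -U Mozilla -erobots=off -w 5 --limit-rate=20k http://url-to-site/

Here -w 5 tells wget to wait 5 seconds between retrievals, and --limit-rate=20k caps the download speed at 20 KB/s. (Without the k suffix, wget reads the number as bytes per second, so --limit-rate=20 would mean just 20 bytes per second.) This way your downloads look more like an ordinary visitor and you can fool the webmaster.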
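wget also has a --random-wait option aimed at exactly this situation: the manual says some sites analyse their logs for the suspiciously regular gaps between requests that give retrieval programs away, and --random-wait varies each pause between 0.5 and 1.5 times the -w value. A minimal sketch wrapping the command above with it; slow_crawl and its defaults are hypothetical, and -e robots=off is the spelled-out form of -erobots=off.

```shell
# Hypothetical helper (not part of wget): slow, irregular recursive crawl.
# Usage: slow_crawl URL [WAIT_SECONDS] [RATE]
slow_crawl() {
    url=$1
    wait_s=${2:-5}      # base pause between requests, as with -w 5
    rate=${3:-20k}      # k suffix = kilobytes per second
    # --random-wait varies each pause between 0.5 and 1.5 times $wait_s
    wget -r -U Mozilla -e robots=off -w "$wait_s" --random-wait \
        --limit-rate="$rate" "$url"
}
```

Usage: slow_crawl http://url-to-site/ 10 50k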

Download all PDFs

You can download all files of a particular format, such as all the PDFs linked from a web page:

wget -r -l1 -A.pdf --no-parent http://url-to-webpage-with-pdfs/

This is most useful for students. When they find a professor's web page with lecture notes, they can use this command to download all the PDFs at once.
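wget accepts several URLs in one command, so the PDF recipe extends naturally to many pages. The sketch below is a hypothetical helper grab_pdfs that also adds -nd (--no-directories), so the PDFs land in one folder instead of a copy of the site's directory tree, and -P to pick that folder; the helper name and its argument order are my own assumptions.

```shell
# Hypothetical helper (not part of wget): fetch every PDF linked from
# one or more pages into a single folder.
# Usage: grab_pdfs DIR URL [URL...]
grab_pdfs() {
    dir=$1
    shift
    # -r -l1: follow links only one level away from each page
    # -A.pdf: keep only files whose names end in .pdf
    # -nd: don't recreate the site's folders; -P: save into $dir
    wget -r -l1 -A.pdf --no-parent -nd -P "$dir" "$@"
}
```

Usage: grab_pdfs lecture-notes http://url-to-webpage-with-pdfs/ http://another-page-with-pdfs/. With -nd, files that share a name are kept side by side with numbered suffixes (.1, .2 and so on).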

Visit the wget man page for more details. wget is also available for Windows, so you can use all of its powerful features there too!

To download URLs that contain UTF-8 characters, add the --restrict-file-names=nocontrol option to your wget command. Otherwise wget may escape parts of the names, and the files will not be saved with their correct names on your file system.
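Why is this needed? According to the wget manual, the default "unix" mode of --restrict-file-names escapes the control characters in the byte ranges 0 to 31 and 128 to 159. Many UTF-8 characters contain a byte in 128 to 159 (Cyrillic "р", for instance, is the byte pair 0xD1 0x80), so those bytes end up as %XX escapes in the saved name; nocontrol turns that escaping off. The hypothetical helper has_c1_bytes below is a small sketch, assuming a POSIX shell with od, that checks whether a name would be affected.

```shell
# Hypothetical helper (not part of wget): succeed if NAME contains a byte
# in 128-159, which wget's default "unix" mode escapes as %XX.
# Usage: has_c1_bytes NAME
has_c1_bytes() {
    # od -An -tu1 prints each byte of the name as an unsigned decimal number
    for b in $(printf '%s' "$1" | od -An -tu1); do
        if [ "$b" -ge 128 ] && [ "$b" -le 159 ]; then
            return 0
        fi
    done
    return 1
}
```

For example, if has_c1_bytes succeeds for a file name you expect, rerun the download as wget --restrict-file-names=nocontrol http://your-link-to/file.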