wget - shawfdong/hyades GitHub Wiki

1. To fetch all materials for UCSC CMPE 118/L(218/L) - Mechatronics - Winter 2013:

$ wget -nH --cut-dirs=1 -r -l1 --no-parent http://classes.soe.ucsc.edu/cmpe118/Winter13/
Note:
  • -nH / --no-host-directories : Disable generation of host-prefixed directories; otherwise wget will create a structure of directories beginning with classes.soe.ucsc.edu/;
  • --cut-dirs=1 : Ignore 1 directory component; so the files will be saved locally under Winter13/, rather than cmpe118/Winter13/;
  • -r / --recursive : Turn on recursive retrieving;
  • -l 1 / --level=1 : Limit the recursion depth to 1 (written as -l1 in the command above);
  • --no-parent : Do not ever ascend to the parent directory when retrieving recursively.
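To see how -nH and --cut-dirs combine, the path mapping can be imitated in plain shell. This is only a sketch of the naming rule, not wget's own code: the function name local_path is made up for illustration, and index.html merely stands in for any file under the directory.

```shell
#!/bin/sh
# local_path URL N: print where a file would land with -nH --cut-dirs=N.
# Hypothetical helper for illustration; it does not call wget.
local_path() {
    path=${1#*://}        # drop the scheme (http://)
    path=${path#*/}       # drop the host name, as -nH does
    n=$2
    while [ "$n" -gt 0 ]; do
        path=${path#*/}   # drop one leading directory per --cut-dirs count
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}

# prints Winter13/index.html
local_path http://classes.soe.ucsc.edu/cmpe118/Winter13/index.html 1
# prints cmpe118/Winter13/index.html
local_path http://classes.soe.ucsc.edu/cmpe118/Winter13/index.html 0
```

With --cut-dirs=2 the same file would land directly in the current directory, which is convenient for a flat download but loses the Winter13/ grouping.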
2. To download the Bolshoi Catalogs[1]:
$ wget -e robots=off -nH -r --no-parent --cut-dirs=1 --reject="index.html*"  http://www.slac.stanford.edu/~behroozi/Bolshoi_Catalogs/
Note:
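  • -e robots=off : By default, wget honors robots.txt, which may disallow fetching the site. This option turns that behavior off;
  • --reject="index.html*" : Do not keep the directory index pages. wget still fetches them to find links, then deletes them.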
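The glob passed to --reject can be tried out on sample file names before starting a long download. The sketch below approximates the matching with a shell case pattern and does not call wget. The helper name is_rejected and the file name catalog.dat are made up for illustration, and the 'index.html?C=N;O=D' name assumes the server produces Apache-style sortable directory listings.

```shell
#!/bin/sh
# is_rejected NAME: succeed if NAME matches the glob index.html*
# Hypothetical helper; it approximates wget's file-name matching with a
# shell `case` pattern and does not call wget.
is_rejected() {
    case $1 in
        index.html*) return 0 ;;
        *)           return 1 ;;
    esac
}

# catalog.dat is a made-up name standing in for a real data file.
for name in 'index.html' 'index.html?C=N;O=D' 'catalog.dat'; do
    if is_rejected "$name"; then
        echo "reject: $name"
    else
        echo "keep:   $name"
    fi
done
```

The trailing * is what catches the sorted variants of the listing page, which a plain --reject="index.html" would presumably leave behind.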
3. To download the Technical Program Posters of SC16:
$ wget --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -nH --cut-dirs=1 -r --no-parent http://sc16.supercomputing.org/sc-archive/tech_poster
Note:
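  • --header="Accept: text/html" : Send an additional Accept header with each request;
  • --user-agent : Identify as a browser. If we don't specify it, we'll get "ERROR 403: Forbidden"[2].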
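Because the browser-like options are long, they can be kept in one place and reused. The following is a minimal sketch, not part of the original recipe: the wrapper name fetch_as_browser and the WGET override variable are made up for illustration, while the options themselves are copied from the command above.

```shell
#!/bin/sh
# Browser-like User-Agent string, copied from the command above.
UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0'

# fetch_as_browser URL: hypothetical wrapper around the same wget options.
# Set WGET=echo to print the arguments instead of downloading (a dry run).
fetch_as_browser() {
    ${WGET:-wget} --header="Accept: text/html" --user-agent="$UA" \
        -nH --cut-dirs=1 -r --no-parent "$1"
}
```

Setting WGET=echo and then running fetch_as_browser http://sc16.supercomputing.org/sc-archive/tech_poster prints the options that would be passed, which makes it easy to check the quoting of the User-Agent string before any request is sent.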

References

  1. Using wget to recursively fetch a directory with arbitrary files in it
  2. Sites not accepting wget user agent header