wget - shawfdong/hyades GitHub Wiki
1. To fetch all materials for UCSC CMPE 118/L(218/L) - Mechatronics - Winter 2013:
$ wget -nH --cut-dirs=1 -r -l1 --no-parent http://classes.soe.ucsc.edu/cmpe118/Winter13/
Note:
- -nH / --no-host-directories : Disable generation of host-prefixed directories; otherwise wget will create a structure of directories beginning with classes.soe.ucsc.edu/;
- --cut-dirs=1 : Ignore 1 directory component; so the files will be saved locally under Winter13/, rather than cmpe118/Winter13/;
- -r / --recursive : Turn on recursive retrieving;
- -l 1 / --level=1 : Limit recursion to a maximum depth of 1 (written as -l1 in the command above);
- --no-parent : Do not ever ascend to the parent directory when retrieving recursively.
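To make the effect of -nH and --cut-dirs concrete, here is a minimal sketch that mimics how the local path is derived from a URL. The function name local_path and the file name index.html are assumptions for illustration only; this is not wget's own code, and it always behaves as if -nH were given.

```shell
# Hypothetical helper: approximate the local path wget would use
# when run with -nH and --cut-dirs=N.
# Usage: local_path URL N
local_path() (
  path=${1#*://}     # drop the scheme (http://)
  path=${path#*/}    # drop the host name, as -nH does
  i=0
  while [ "$i" -lt "$2" ]; do
    path=${path#*/}  # drop one leading directory component per --cut-dirs count
    i=$((i + 1))
  done
  printf '%s\n' "$path"
)

local_path http://classes.soe.ucsc.edu/cmpe118/Winter13/index.html 1
# prints Winter13/index.html
```

With a count of 0 instead of 1, the same call would print cmpe118/Winter13/index.html, which is the layout --cut-dirs=1 avoids.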
$ wget -e robots=off -nH -r --no-parent --cut-dirs=1 --reject="index.html*" http://www.slac.stanford.edu/~behroozi/Bolshoi_Catalogs/
Note:
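- -e robots=off : By default, wget honors robots.txt, which may forbid recursive retrieval of the site; this option turns that behavior off;
- --reject="index.html*" : Do not keep files whose names match index.html* (such as auto-generated directory listings); wget still fetches them to follow their links, then deletes them.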
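The --reject="index.html*" pattern in the command above is a shell-style glob matched against file names. The sketch below approximates that test with a case statement, which uses the same glob syntax. The helper name would_reject and the sample file names are assumptions: index.html?C=N;O=D is the kind of sort link an Apache-style directory listing typically produces, and catalog.tar.gz is a made-up data file.

```shell
# Hypothetical helper: approximates wget's --reject="index.html*" test
# using the shell's own glob matching.
would_reject() {
  case $1 in
    index.html*) return 0 ;;  # name matches the glob, so wget would discard it
    *)           return 1 ;;  # anything else is kept
  esac
}

for name in 'index.html' 'index.html?C=N;O=D' 'catalog.tar.gz'; do
  if would_reject "$name"; then
    echo "reject: $name"
  else
    echo "keep:   $name"
  fi
done
```

The trailing * is what catches the index.html?C=... variants; a pattern of plain index.html would leave those behind.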
$ wget --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -nH --cut-dirs=1 -r --no-parent http://sc16.supercomputing.org/sc-archive/tech_poster
Note:
- If we don't specify --user-agent, the server responds with "ERROR 403: Forbidden" [2].
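Since the browser-like headers make for a long command line, one option is to wrap them in a small function. This is only a sketch: the names UA and fetch_as_browser are hypothetical, and the flags are exactly those of the command above.

```shell
# Browser-like User-Agent string, copied from the command above.
UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0'

# Hypothetical wrapper: recursive fetch that presents browser-like headers.
# Usage: fetch_as_browser URL
fetch_as_browser() {
  wget --header="Accept: text/html" --user-agent="$UA" \
       -nH --cut-dirs=1 -r --no-parent "$1"
}

# Example call (left commented out, since it starts a recursive download):
# fetch_as_browser http://sc16.supercomputing.org/sc-archive/tech_poster
```

Keeping the User-Agent in one variable makes it easy to swap in a different string if a server rejects this one.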