Saturday, July 23, 2011

Wget (or curl) can be a magic command at the right times.

Here are some options I’ve used to retrieve a directory from svn.

I wanted to download everything below the Resources directory:

wget -r --no-parent -nH --cut-dirs=10 --no-check-certificate --http-user=user --http-password=password https://remotehost/svn/Base/Streams/1.3/Projects/ProjectName/trunk/osb/Interface/Resources

-r = recursive
--no-parent: stops wget from climbing back up to the parent directory, should any links point there
-nH: ignore the host when saving any resources (removes the remotehost directory)
--cut-dirs=10: removes 10 directory levels when saving resources. Here that strips /svn/Base/Streams/1.3/Projects/ProjectName/trunk/osb/Interface/Resources, so only the directories and filenames below it are created locally.
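
To check that 10 is the right number before running the download, the effect of -nH plus --cut-dirs can be approximated in plain shell. This is only a rough sketch: saved_path is a made-up helper (not part of wget), and xsd/types.xsd is a hypothetical file below Resources used purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: approximates the local path wget would write to
# when run with -nH and --cut-dirs=N. It is not part of wget.
saved_path() {
    url=$1
    n=$2
    # Drop the scheme (https://) and then the host, as -nH would.
    path=${url#*://}
    path=${path#*/}
    # Drop the first N path components, as --cut-dirs=N would.
    echo "$path" | cut -d/ -f"$((n + 1))"-
}

# xsd/types.xsd is an invented example file below Resources.
saved_path "https://remotehost/svn/Base/Streams/1.3/Projects/ProjectName/trunk/osb/Interface/Resources/xsd/types.xsd" 10
```

With the 10 leading directories removed, only the part below Resources is left. If the count is too low, the leftover upper directories get created locally as well.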

--no-check-certificate: ignores any HTTPS certificate problems. Handy for local sites with untrusted certs.

-m = mirror. Handy replacement for -N -r -l inf --no-remove-listing

-Rindex.html : do not download index.html pages

--http-user, --http-password: specify HTTP basic auth credentials if required
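
Putting the options above together, a small wrapper might look like the following. Treat it as a sketch rather than a tested recipe: fetch_dir and the DRY_RUN switch are names I made up for illustration, and the auth flags are left out to keep it short. With DRY_RUN set, the command is only printed, which is a cheap way to check the flags before hitting the server.

```shell
#!/bin/sh
# Hypothetical wrapper around the wget options described above.
# fetch_dir and DRY_RUN are illustrative names, not wget features.
fetch_dir() {
    url=$1
    cut_dirs=$2
    # When DRY_RUN is non-empty, "echo" is prepended so the command
    # is printed instead of executed.
    ${DRY_RUN:+echo} wget -m --no-parent -nH --cut-dirs="$cut_dirs" \
        -R index.html --no-check-certificate "$url"
}

# Print the command that would run for the Resources directory.
DRY_RUN=1 fetch_dir "https://remotehost/svn/Base/Streams/1.3/Projects/ProjectName/trunk/osb/Interface/Resources" 10
```

Dropping DRY_RUN=1 runs the real download. Note that -m swaps the plain -r of the original command for an unlimited-depth, timestamp-aware mirror.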

--proxy-user=, --proxy-password=: specify some proxy auth if required. The actual proxy itself is set separately.

To specify the proxy, set environment variables,
e.g. export http_proxy=

The variables are:

http_proxy, https_proxy: if set, these should contain the URLs of the proxies for HTTP and HTTPS connections respectively.

ftp_proxy: this should contain the URL of the proxy for FTP connections. It is quite common that http_proxy and ftp_proxy are set to the same URL.

no_proxy: this should contain a comma-separated list of domain extensions the proxy should not be used for. For instance, if the value of no_proxy is ‘’, the proxy will not be used to retrieve documents from mysite.
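
As a concrete illustration of the proxy variables, the exports could look something like this. Every value here is a placeholder I invented (proxy.example.com, port 8080, the .example.com suffix); substitute whatever your network actually uses.

```shell
#!/bin/sh
# Placeholder proxy address: replace with your real proxy host and port.
export http_proxy="http://proxy.example.com:8080"
# Reuse the same proxy for HTTPS and FTP, which is a common setup.
export https_proxy="$http_proxy"
export ftp_proxy="$http_proxy"
# Placeholder list: hosts ending in these suffixes bypass the proxy.
export no_proxy="localhost,.example.com"
```

Any wget run from the same shell afterwards picks these up; --proxy-user and --proxy-password are only needed on top if the proxy asks for a login.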
