Christopher Juckins

SysAdmin Tips, Tricks and other Software Tools

mirror_websites_with_wget

http://www.techrepublic.com/blog/opensource/mirroring-web-sites-with-wget/883

The quickest and easiest way to mirror a remote Web site is to use wget. Wget is similar to cURL (and I'll be the first to admit that I prefer cURL over wget), but wget has some really slick and useful features that aren't found in cURL, such as recursive retrieval, which lets you download an entire Web site for local viewing:

$ wget -rkp -l6 -np -nH -N http://example.com/

This command does a number of things. The combined -rkp flags are three separate options: -r tells wget to download recursively, -k converts the links in downloaded HTML pages so they point to the local copies, and -p fetches the images, stylesheets and other files each page needs to render properly.

The -l6 option limits recursion to a maximum depth of six levels (the default with -r is five), while -np ("no parent") tells wget never to ascend to the parent directory, so only files below the starting URL are retrieved. The -nH option disables host-prefixed directories: the files are saved under the current directory rather than under a directory named after the site's hostname (example.com/ in this case).
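If you run this mirror from a script, wget's long option names make each flag self-documenting. The sketch below wraps the same command in a shell function; the name mirror_site, its optional DEPTH argument and the MIRROR_DRY_RUN variable are hypothetical conveniences, not part of wget, and only the long options themselves come from wget.

```shell
#!/bin/sh
# mirror_site is a hypothetical wrapper, not part of wget.
# Usage: mirror_site URL [DEPTH]   (DEPTH defaults to 6, as in -l6)
# Set MIRROR_DRY_RUN=1 to print the wget command instead of running it.
mirror_site() {
    if [ -z "${1:-}" ]; then
        echo "usage: mirror_site URL [DEPTH]" >&2
        return 2
    fi
    url=$1
    depth=${2:-6}
    # Long-option spelling of: wget -rkp -l"$depth" -np -nH -N "$url"
    set -- wget \
        --recursive \
        --convert-links \
        --page-requisites \
        --level="$depth" \
        --no-parent \
        --no-host-directories \
        --timestamping \
        "$url"
    if [ "${MIRROR_DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `MIRROR_DRY_RUN=1 mirror_site http://example.com/` just prints the expanded command, while `mirror_site http://example.com/docs/ 3` (a hypothetical path) would mirror only what sits below /docs/, three levels deep. Building the argument list with `set --` keeps every option a separate quoted word instead of relying on word splitting of a string.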

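Finally, -N turns on time-stamping: wget compares the local file's timestamp and size with the server's copy (using the Last-Modified header) and downloads the file again only if the remote copy is newer or differs in size. Unfortunately, dynamic sites often omit that header or report the current time on every request, so -N may save little there, but it's worth adding regardless.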
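Whether -N will help on a particular site comes down to whether its pages send a Last-Modified header. A minimal sketch for checking that, assuming a POSIX shell with grep: has_last_modified and check_timestamps are hypothetical helper names, while --spider (check without downloading) and --server-response (print the HTTP headers to stderr) are real wget options.

```shell
#!/bin/sh
# has_last_modified and check_timestamps are hypothetical helpers.
# has_last_modified: exit 0 if the HTTP headers on stdin include a
# Last-Modified line (case-insensitive). wget -S indents each header
# line, so leading whitespace is allowed.
has_last_modified() {
    grep -qi '^[[:space:]]*last-modified:'
}

# check_timestamps URL: request the page without saving it (--spider)
# and print the headers (--server-response writes to stderr, hence 2>&1).
check_timestamps() {
    if wget --spider --server-response "$1" 2>&1 | has_last_modified; then
        echo "$1: Last-Modified present, so -N can compare timestamps"
    else
        echo "$1: no Last-Modified header, so -N cannot skip it by date"
    fi
}
```

Running `check_timestamps http://example.com/` before a large mirror gives a rough idea of whether re-runs with -N will skip unchanged pages. Note also that -k rewrites the saved HTML; the wget manual documents -K (--backup-converted), which keeps each original as a .orig file, as affecting the behavior of -N on later runs.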

mirror_websites_with_wget.txt · Last modified: 2014/05/20 21:14 by juckins