Tuesday, July 6, 2010

wget websites

The wget utility can be used to save websites for offline viewing.
wget --no-parent -rkpU Mozilla http://examplewebsite.com/index.html
--no-parent keeps wget from ascending to the parent directory, so only files at or below the starting directory are fetched.
-r (--recursive) downloads pages recursively. If index.html links to .css files and other pages, those are downloaded too. By default wget follows links up to 5 levels deep.
-k (--convert-links) rewrites the links in the downloaded pages so that they point to the local files.
-p (--page-requisites) gets all the images, stylesheets etc. needed to display the HTML page.
-U Mozilla (--user-agent=Mozilla) makes wget send Mozilla as its User-Agent string. Some sites serve pages only to user agents that identify as browsers, since they don't want the site ripped.
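
The same command can be written with long options, which is easier to read in a script. This is only a sketch: offline_copy and DRY_RUN are made-up names for illustration and are not part of wget, and the URL is the placeholder used above. By default the helper just prints the command it would run.

```shell
#!/bin/sh
# offline_copy is a hypothetical helper name, not part of wget.
# It spells out the short flags -rkpU from the post as long options.
offline_copy() {
    url=$1
    set -- wget \
        --no-parent \
        --recursive \
        --convert-links \
        --page-requisites \
        --user-agent=Mozilla \
        "$url"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"   # dry run (the default here): only print the command
    else
        "$@"        # DRY_RUN=0: actually start the download
    fi
}

offline_copy "http://examplewebsite.com/index.html"
```

Setting DRY_RUN=0 before the call starts the real download. wget then saves the files under a directory named after the host, in the current directory.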