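I would like to download a local copy of a web page and get all of the css, images, javascript, etc.

In previous discussions (for example here and here, both of which are more than two years old), two suggestions are generally put forward: wget -p and httrack. However, both of these suggestions fail. I would be very grateful for help using either of these tools to accomplish the task; alternatives are also welcome.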

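Option 1: wget -p

wget -p successfully downloads all of the web page's prerequisites (css, images, js). However, when I load the local copy in a web browser, the page cannot load the prerequisites, because the paths to them have not been modified from the version on the web.

For example:

In the page's html, <link rel="stylesheet" href="/stylesheets/foo.css" /> will need to be corrected to point to the new relative path of foo.css.
In the css file, background-image: url(/images/bar.png) will similarly need to be adjusted.

Is there a way to modify wget -p so that the paths are correct?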

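Option 2: httrack

httrack seems like a great tool for mirroring entire websites, but it is unclear to me how to use it to create a local copy of a single page. There is a great deal of discussion in the httrack forums about this topic (for example here), but no one seems to have a bullet-proof solution.

Option 3: another tool?

Some people have suggested paid tools, but I just can't believe there isn't a free solution out there.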


wget is capable of doing what you are asking. Try the following:

wget -p -k http://www.example.com/

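-p will get you all the elements required to view the site correctly (css, images, etc). -k will change all links (including those for CSS and images) so that you can view the page offline as it appeared online.

From the Wget docs: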
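For readers who prefer the long option names, here is a minimal sketch of the same command wrapped in a small shell function. The function name mirror_page, the DRY_RUN switch and the "copy" output directory are hypothetical names chosen for illustration; --directory-prefix (-P) is an extra wget option that is not part of the command above and only controls where the files are saved.

```shell
#!/bin/sh
# Sketch only: mirror_page is a hypothetical helper, not part of wget.
mirror_page() {
    url=$1
    dest=${2:-offline-copy}   # assumed default output directory
    # --page-requisites (-p): fetch the css, images and js the page needs
    # --convert-links   (-k): rewrite links so they resolve locally
    # --directory-prefix (-P): save everything under $dest
    set -- wget --page-requisites --convert-links \
        --directory-prefix="$dest" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"             # print the command without touching the network
    else
        "$@"                  # actually run wget
    fi
}

# Dry run: shows the command that would be executed.
DRY_RUN=1 mirror_page http://www.example.com/ copy
```

Dropping DRY_RUN=1 from the last line runs the download for real.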

‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

    The links to files that have been downloaded by Wget will be changed to refer
    to the file they point to as a relative link.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
    downloaded, then the link in doc.html will be modified to point to
    ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
    combinations of directories.

    The links to files that have not been downloaded by Wget will be changed to
    include host name and absolute path of the location they point to.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
    ../bar/img.gif), then the link in doc.html will be modified to point to
    http://hostname/bar/img.gif. 

Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads.
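
The relative link from the first example in the documentation can be reproduced by hand. This is only an illustration of the path arithmetic, not of how wget computes it internally, and it assumes GNU coreutils realpath is installed (other realpath implementations may lack these options). The variable names are made up for the example.

```shell
#!/bin/sh
# Illustration only: assumes GNU realpath (coreutils).
# -m allows paths that do not exist on disk.
doc_dir=/foo            # directory that holds doc.html
target=/bar/img.gif     # file that doc.html links to
rel=$(realpath -m --relative-to="$doc_dir" "$target")
echo "$rel"
```

On a system with GNU realpath this should print the same ../bar/img.gif path that the documentation gives for the converted link.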