如何在不使用第三方库的情况下使用Node.js下载文件?

我不需要什么特别的东西。我只想从给定的URL下载文件,然后将其保存到给定的目录。


当前回答

Gfxmonk的答案在回调和file.close()完成之间有一个非常紧张的数据竞赛。File.close()实际上接受一个回调函数,该函数在close完成时被调用。否则,立即使用文件可能会失败(非常罕见!)。

一个完整的解决方案是:

var http = require('http');
var fs = require('fs');

var download = function(url, dest, cb) {
  var file = fs.createWriteStream(dest);
  var request = http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      file.close(cb);  // close() is async, call cb after close completes.
    });
  });
}

如果不等待finish事件,幼稚的脚本可能最终得到一个不完整的文件。如果不通过close调度cb回调,您可能会在访问文件和文件实际准备就绪之间出现竞争。

其他回答

超时解决方案,防止内存泄漏:

下面的代码是基于Brandon Tilley的回答:

var http = require('http'),
    fs = require('fs');

var request = http.get("http://example12345.com/yourfile.html", function(response) {
    if (response.statusCode === 200) {
        var file = fs.createWriteStream("copy.html");
        response.pipe(file);
    }
    // Add timeout.
    request.setTimeout(12000, function () {
        request.abort();
    });
});

当您得到一个错误时,不要创建文件,并倾向于使用超时在X秒后关闭您的请求。

说到处理错误,监听请求错误甚至更好。我甚至会通过检查响应代码来验证。这里认为只有200个响应代码成功,但其他代码可能很好。

const fs = require('fs');
const http = require('http');

const download = (url, dest, cb) => {
    const file = fs.createWriteStream(dest);

    const request = http.get(url, (response) => {
        // check if response is success
        if (response.statusCode !== 200) {
            return cb('Response status was ' + response.statusCode);
        }

        response.pipe(file);
    });

    // close() is async, call cb after close completes
    file.on('finish', () => file.close(cb));

    // check for request error too
    request.on('error', (err) => {
        fs.unlink(dest, () => cb(err.message)); // delete the (partial) file and then return the error
    });

    file.on('error', (err) => { // Handle errors
        fs.unlink(dest, () => cb(err.message)); // delete the (partial) file and then return the error
    });
};

尽管这段代码相对简单,但我建议使用request模块,因为它处理更多http不支持的协议(你好,HTTPS!)。

可以这样做:

const fs = require('fs');
const request = require('request');

const download = (url, dest, cb) => {
    const file = fs.createWriteStream(dest);
    const sendReq = request.get(url);
    
    // verify response code
    sendReq.on('response', (response) => {
        if (response.statusCode !== 200) {
            return cb('Response status was ' + response.statusCode);
        }

        sendReq.pipe(file);
    });

    // close() is async, call cb after close completes
    file.on('finish', () => file.close(cb));

    // check for request errors
    sendReq.on('error', (err) => {
        fs.unlink(dest, () => cb(err.message)); // delete the (partial) file and then return the error
    });

    file.on('error', (err) => { // Handle errors
        fs.unlink(dest, () => cb(err.message)); // delete the (partial) file and then return the error
    });
};

编辑:

要使它与https兼容,请更改

const http = require('http');

to

const http = require('https');

您可以创建一个HTTP GET请求,并将其响应管道到可写文件流:

const http = require('http'); // or 'https' for https:// URLs
const fs = require('fs');

const file = fs.createWriteStream("file.jpg");
const request = http.get("http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg", function(response) {
   response.pipe(file);

   // after download completed close filestream
   file.on("finish", () => {
       file.close();
       console.log("Download Completed");
   });
});

如果您希望支持在命令行上收集信息(比如指定目标文件或目录或URL),可以使用类似Commander的工具。

更详细的解释在https://sebhastian.com/nodejs-download-file/

编写自己的解决方案,因为现有的不符合我的要求。

包括:

HTTPS下载(http下载时切换包到http) 基于承诺的函数 处理转发路径(状态302) 浏览器头-需要在一些cdn 来自URL的文件名(以及硬编码) 错误处理

打印出来的,更安全。如果你使用的是纯JS(没有Flow,没有TS),可以随意删除类型,或者转换为.d。ts文件

index.js

import httpsDownload from httpsDownload;
httpsDownload('https://example.com/file.zip', './');

httpsDownload.[js|ts]

import https from "https";
import fs from "fs";
import path from "path";

function download(
  url: string,
  folder?: string,
  filename?: string
): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = https
      .request(url, { headers: { "User-Agent": "javascript" } }, (response) => {
        if (response.statusCode === 302 && response.headers.location != null) {
          download(
            buildNextUrl(url, response.headers.location),
            folder,
            filename
          )
            .then(resolve)
            .catch(reject);
          return;
        }

        const file = fs.createWriteStream(
          buildDestinationPath(url, folder, filename)
        );
        response.pipe(file);
        file.on("finish", () => {
          file.close();
          resolve();
        });
      })
      .on("error", reject);
    req.end();
  });
}

function buildNextUrl(current: string, next: string) {
  const isNextUrlAbsolute = RegExp("^(?:[a-z]+:)?//").test(next);
  if (isNextUrlAbsolute) {
    return next;
  } else {
    const currentURL = new URL(current);
    const fullHost = `${currentURL.protocol}//${currentURL.hostname}${
      currentURL.port ? ":" + currentURL.port : ""
    }`;
    return `${fullHost}${next}`;
  }
}

function buildDestinationPath(url: string, folder?: string, filename?: string) {
  return path.join(folder ?? "./", filename ?? generateFilenameFromPath(url));
}

function generateFilenameFromPath(url: string): string {
  const urlParts = url.split("/");
  return urlParts[urlParts.length - 1] ?? "";
}

export default download;

就像Michelle Tilley说的,但是要有适当的控制流:

var http = require('http');
var fs = require('fs');

var download = function(url, dest, cb) {
  var file = fs.createWriteStream(dest);
  http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      file.close(cb);
    });
  });
}

如果不等待finish事件,幼稚的脚本可能最终得到一个不完整的文件。

编辑:感谢@Augusto Roman指出cb应该传递到文件。Close,没有显式调用。