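I've started tinkering with the Node.js HTTP server and really like writing server-side JavaScript, but something is keeping me from starting to use Node.js for my web application.
I understand the whole async I/O concept, but I'm somewhat concerned about the edge cases where procedural code is very CPU intensive, such as image manipulation or sorting large data sets.
As I understand it, the server will be very fast for simple web page requests such as viewing a list of users or viewing a blog post. However, if I want to write very CPU-intensive code (in the admin back end, for example) that generates graphics or resizes thousands of images, the request will be very slow (a few seconds). Since this code is not async, every request that reaches the server during those few seconds will be blocked until my slow request is done.
One suggestion was to use Web Workers for CPU-intensive tasks. However, I'm afraid web workers will make it hard to write clean code, since they work by including a separate JS file. What if the CPU-intensive code lives in a method of an object? It kind of sucks to write a JS file for every method that is CPU intensive.
Another suggestion was to spawn a child process, but that makes the code even less maintainable.
Do you have any suggestions for overcoming this obstacle? How do you write clean object-oriented code with Node.js while making sure that CPU-heavy tasks are executed asynchronously?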
You don't want your CPU-intensive code to execute asynchronously; you want it to execute in parallel. You need to get the processing work out of the thread that's serving HTTP requests. It's the only way to solve this problem. With NodeJS the answer is the cluster module, for spawning child processes to do the heavy lifting. (AFAIK Node doesn't have any concept of threads/shared memory; it's processes or nothing.)
You have two options for how you structure your application. You can get the 80/20 solution by spawning 8 HTTP servers and handling compute-intensive tasks synchronously on the child processes. Doing that is fairly simple. You could take an hour to read about it at that link. In fact, if you just rip off the example code at the top of that link you will get yourself 95% of the way there.
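A rough sketch of that first option might look like the following. It is not the example from the linked documentation; `slowFib`, `startCluster`, the `--serve` flag and port 8000 are made-up names and values for illustration only. Each worker process runs its own HTTP server on the shared port, so a slow request only ties up the one worker that is handling it.

```javascript
const cluster = require('cluster');
const http = require('http');

// Hypothetical CPU-bound task standing in for image resizing or a big sort.
function slowFib(n) {
  return n < 2 ? n : slowFib(n - 1) + slowFib(n - 2);
}

// Hypothetical helper: the primary process forks `workerCount` workers,
// and every worker runs an HTTP server listening on the same port.
function startCluster(workerCount, port) {
  // `isPrimary` is the newer name; `isMaster` covers older Node versions.
  const isPrimary = cluster.isPrimary || cluster.isMaster;
  if (isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    return;
  }
  http.createServer((req, res) => {
    // Computed synchronously: this blocks only the current worker process.
    const result = slowFib(30);
    res.end(`worker ${process.pid} computed ${result}\n`);
  }).listen(port);
}

// Only start serving when asked to, e.g. `node server.js --serve`.
if (process.argv.includes('--serve')) {
  startCluster(8, 8000);
}
```

With 8 workers, up to 8 slow requests can be in flight at once before new requests start to queue behind them, which is why this is an 80/20 solution rather than a complete one.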
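The other way to structure this is to set up a job queue and send big compute tasks over the queue. Note that there is a lot of overhead associated with the IPC for a job queue, so this is only useful when the tasks are appreciably larger than the overhead.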
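As a minimal sketch of the job queue idea, assuming a single worker process and Node's built-in `child_process.fork` IPC channel: the `JobQueue` class, the `sumOfSquares` job and the `--worker` / `--demo` flags are all hypothetical names chosen for this illustration. A real setup would more likely use several workers or an external queue.

```javascript
const { fork } = require('child_process');

// Hypothetical CPU-bound job; stands in for image resizing or a large sort.
function sumOfSquares(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i * i;
  return total;
}

// Worker side: when started with the made-up --worker flag, answer each job
// message with a result message over the IPC channel that fork() sets up.
if (process.argv.includes('--worker')) {
  process.on('message', (job) => {
    process.send({ id: job.id, result: sumOfSquares(job.n) });
  });
}

// Parent side: hands jobs to one child process, one promise per job.
class JobQueue {
  constructor(workerScript) {
    this.nextId = 0;
    this.pending = new Map(); // job id -> resolve callback
    this.worker = fork(workerScript, ['--worker']);
    this.worker.on('message', ({ id, result }) => {
      const resolve = this.pending.get(id);
      this.pending.delete(id);
      if (resolve) resolve(result);
    });
  }

  // Send a job to the worker; the returned promise resolves with its result.
  submit(n) {
    return new Promise((resolve) => {
      const id = this.nextId++;
      this.pending.set(id, resolve);
      this.worker.send({ id, n });
    });
  }

  close() {
    this.worker.kill();
  }
}

// Demo, run with: node jobqueue.js --demo
if (process.argv.includes('--demo')) {
  const queue = new JobQueue(__filename);
  queue.submit(1000000).then((result) => {
    console.log('result from worker:', result);
    queue.close();
  });
}
```

Every `submit` call pays for serializing the job, sending it to the child and sending the result back, so a job as small as the one in this demo costs more in IPC than it saves. The pattern pays off only when each job runs far longer than that round trip.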
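I'm surprised that none of these other answers even mention cluster.
Background:
Asynchronous code is code that suspends until something happens somewhere else, at which point the code wakes up and continues execution. One very common case where something slow must happen somewhere else is I/O.
Asynchronous code isn't useful if it's your processor that is responsible for doing the work. That is precisely the case with "compute-intensive" tasks.
Now, it might seem that asynchronous code is niche, but in fact it's very common. It just happens not to be useful for compute-intensive tasks.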
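To make that concrete, here is a small sketch showing that marking a function `async` does not take CPU work off the main thread. The names `busyWait` and `cpuHeavy` are invented for the example, and the 50 ms spin stands in for real computation.

```javascript
const events = [];

// Spin the CPU for roughly `ms` milliseconds (stand-in for real computation).
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU; nothing else can run on this thread meanwhile
  }
}

// Declaring the function `async` does not move its work elsewhere:
// with no `await` inside, the whole body runs synchronously when called.
async function cpuHeavy() {
  busyWait(50);
  events.push('cpu work finished');
}

setTimeout(() => events.push('timer fired'), 0); // queued on the event loop
cpuHeavy();                                      // blocks right here for ~50 ms
events.push('caller resumed');

// The zero-delay timer has still not fired at this point, because the
// event loop only gets control back once this script has finished running.
console.log(events); // [ 'cpu work finished', 'caller resumed' ]
```

The timer callback plays the role of another client's request: it was ready almost immediately, but it had to wait for the CPU-bound work to finish.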
Waiting on I/O is a pattern that always happens in web servers, for example. Every client who connects to your server gets a socket. Most of the time the sockets are empty. You don't want to do anything until a socket receives some data, at which point you want to handle the request. Under the hood an HTTP server like Node is using an eventing library (libev) to keep track of the thousands of open sockets. The OS notifies libev, and then libev notifies NodeJS when one of the sockets gets data, and then NodeJS puts an event on the event queue, and your http code kicks in at this point and handles the events one after the other. Events don't get put on the queue until the socket has some data, so events are never waiting on data - it's already there for them.
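Single-threaded, event-based web servers make sense as a paradigm when the bottleneck is waiting on a bunch of mostly empty socket connections: you don't want a whole thread or process for every idle connection, and you don't want to poll your 250k sockets to find the next one that has data.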