我不完全明白Node.js是关于什么的。也许是因为我主要是一个基于web的商业应用程序开发人员。它是什么?它有什么用?

目前我的理解是:

编程模型是事件驱动的,特别是它处理I/O的方式。 它使用JavaScript,解析器是V8。 它可以很容易地用于创建并发服务器应用程序。

我的理解正确吗?如果是,那么事件I/O的好处是什么,它只是更多的并发性的东西吗?另外,Node.js的方向是成为一个框架,像基于JavaScript(基于V8)的编程模型吗?


当前回答

I use Node.js at work, and find it to be very powerful. Forced to choose one word to describe Node.js, I'd say "interesting" (which is not a purely positive adjective). The community is vibrant and growing. JavaScript, despite its oddities can be a great language to code in. And you will daily rethink your own understanding of "best practice" and the patterns of well-structured code. There's an enormous energy of ideas flowing into Node.js right now, and working in it exposes you to all this thinking - great mental weightlifting.

Node.js in production is definitely possible, but far from the "turn-key" deployment seemingly promised by the documentation. With Node.js v0.6.x, "cluster" has been integrated into the platform, providing one of the essential building blocks, but my "production.js" script is still ~150 lines of logic to handle stuff like creating the log directory, recycling dead workers, etc. For a "serious" production service, you also need to be prepared to throttle incoming connections and do all the stuff that Apache does for PHP. To be fair, Ruby on Rails has this exact problem. It is solved via two complementary mechanisms: 1) Putting Ruby on Rails/Node.js behind a dedicated webserver (written in C and tested to hell and back) like Nginx (or Apache / Lighttd). The webserver can efficiently serve static content, access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services. For requests that hit the actual node service, the webserver proxies the request through. 2) Using a framework like Unicorn that will manage the worker processes, recycle them periodically, etc. I've yet to find a Node.js serving framework that seems fully baked; it may exist, but I haven't found it yet and still use ~150 lines in my hand-rolled "production.js".

Reading frameworks like Express makes it seem like the standard practice is to just serve everything through one jack-of-all-trades Node.js service ... "app.use(express.static(__dirname + '/public'))". For lower-load services and development, that's probably fine. But as soon as you try to put big time load on your service and have it run 24/7, you'll quickly discover the motivations that push big sites to have well baked, hardened C-code like Nginx fronting their site and handling all of the static content requests (...until you set up a CDN, like Amazon CloudFront)). For a somewhat humorous and unabashedly negative take on this, see this guy.

Node.js is also finding more and more non-service uses. Even if you are using something else to serve web content, you might still use Node.js as a build tool, using npm modules to organize your code, Browserify to stitch it into a single asset, and uglify-js to minify it for deployment. For dealing with the web, JavaScript is a perfect impedance match and frequently that makes it the easiest route of attack. For example, if you want to grovel through a bunch of JSON response payloads, you should use my underscore-CLI module, the utility-belt of structured data.

利与弊:

Pro: For a server guy, writing JavaScript on the backend has been a "gateway drug" to learning modern UI patterns. I no longer dread writing client code. Pro: Tends to encourage proper error checking (err is returned by virtually all callbacks, nagging the programmer to handle it; also, async.js and other libraries handle the "fail if any of these subtasks fails" paradigm much better than typical synchronous code) Pro: Some interesting and normally hard tasks become trivial - like getting status on tasks in flight, communicating between workers, or sharing cache state Pro: Huge community and tons of great libraries based on a solid package manager (npm) Con: JavaScript has no standard library. You get so used to importing functionality that it feels weird when you use JSON.parse or some other build in method that doesn't require adding an npm module. This means that there are five versions of everything. Even the modules included in the Node.js "core" have five more variants should you be unhappy with the default implementation. This leads to rapid evolution, but also some level of confusion.

相对于简单的每个请求一个进程的模型(LAMP):

Pro: Scalable to thousands of active connections. Very fast and very efficient. For a web fleet, this could mean a 10X reduction in the number of boxes required versus PHP or Ruby Pro: Writing parallel patterns is easy. Imagine that you need to fetch three (or N) blobs from Memcached. Do this in PHP ... did you just write code the fetches the first blob, then the second, then the third? Wow, that's slow. There's a special PECL module to fix that specific problem for Memcached, but what if you want to fetch some Memcached data in parallel with your database query? In Node.js, because the paradigm is asynchronous, having a web request do multiple things in parallel is very natural. Con: Asynchronous code is fundamentally more complex than synchronous code, and the up-front learning curve can be hard for developers without a solid understanding of what concurrent execution actually means. Still, it's vastly less difficult than writing any kind of multithreaded code with locking. Con: If a compute-intensive request runs for, for example, 100 ms, it will stall processing of other requests that are being handled in the same Node.js process ... AKA, cooperative-multitasking. This can be mitigated with the Web Workers pattern (spinning off a subprocess to deal with the expensive task). Alternatively, you could use a large number of Node.js workers and only let each one handle a single request concurrently (still fairly efficient because there is no process recycle). Con: Running a production system is MUCH more complicated than a CGI model like Apache + PHP, Perl, Ruby, etc. Unhandled exceptions will bring down the entire process, necessitating logic to restart failed workers (see cluster). Modules with buggy native code can hard-crash the process. Whenever a worker dies, any requests it was handling are dropped, so one buggy API can easily degrade service for other cohosted APIs.

相对于用Java / c# / C (C?真的吗?)

Pro: Doing asynchronous in Node.js is easier than doing thread-safety anywhere else and arguably provides greater benefit. Node.js is by far the least painful asynchronous paradigm I've ever worked in. With good libraries, it is only slightly harder than writing synchronous code. Pro: No multithreading / locking bugs. True, you invest up front in writing more verbose code that expresses a proper asynchronous workflow with no blocking operations. And you need to write some tests and get the thing to work (it is a scripting language and fat fingering variable names is only caught at unit-test time). BUT, once you get it to work, the surface area for heisenbugs -- strange problems that only manifest once in a million runs -- that surface area is just much much lower. The taxes writing Node.js code are heavily front-loaded into the coding phase. Then you tend to end up with stable code. Pro: JavaScript is much more lightweight for expressing functionality. It's hard to prove this with words, but JSON, dynamic typing, lambda notation, prototypal inheritance, lightweight modules, whatever ... it just tends to take less code to express the same ideas. Con: Maybe you really, really like coding services in Java?

要从另一个角度了解JavaScript和Node.js,请查看从Java到Node.js,这是一篇关于Java开发人员学习Node.js的印象和经验的博客文章。


模块 在考虑节点时,请记住JavaScript库的选择将定义您的体验。大多数人至少使用两种,异步模式助手(Step, Futures, Async)和JavaScript糖模块(Underscore.js)。

Helper / JavaScript

下划线。js -使用这个。尽管去做。它使您的代码漂亮和可读的东西像_.isString(),和_.isArray()。我不知道你怎么能写出安全的代码。此外,对于增强的command-line-fu,请查看我自己的Underscore-CLI。

异步模式模块:

Step - a very elegant way to express combinations of serial and parallel actions. My personal reccomendation. See my post on what Step code looks like. Futures - much more flexible (is that really a good thing?) way to express ordering through requirements. Can express things like "start a, b, c in parallel. When A, and B finish, start AB. When A, and C finish, start AC." Such flexibility requires more care to avoid bugs in your workflow (like never calling the callback, or calling it multiple times). See Raynos's post on using futures (this is the post that made me "get" futures). Async - more traditional library with one method for each pattern. I started with this before my religious conversion to step and subsequent realization that all patterns in Async could be expressed in Step with a single more readable paradigm. TameJS - Written by OKCupid, it's a precompiler that adds a new language primative "await" for elegantly writing serial and parallel workflows. The pattern looks amazing, but it does require pre-compilation. I'm still making up my mind on this one. StreamlineJS - competitor to TameJS. I'm leaning toward Tame, but you can make up your own mind.

或者要阅读关于异步库的所有内容,请参阅作者的小组访谈。

Web框架:

Express Great Ruby on Rails-esk framework for organizing web sites. It uses JADE as a XML/HTML templating engine, which makes building HTML far less painful, almost elegant even. jQuery While not technically a node module, jQuery is quickly becoming a de-facto standard for client-side user interface. jQuery provides CSS-like selectors to 'query' for sets of DOM elements that can then be operated on (set handlers, properties, styles, etc). Along the same vein, Twitter's Bootstrap CSS framework, Backbone.js for an MVC pattern, and Browserify.js to stitch all your JavaScript files into a single file. These modules are all becoming de-facto standards so you should at least check them out if you haven't heard of them.

测试:

JSHint - Must use; I didn't use this at first which now seems incomprehensible. JSLint adds back a bunch of the basic verifications you get with a compiled language like Java. Mismatched parenthesis, undeclared variables, typeos of many shapes and sizes. You can also turn on various forms of what I call "anal mode" where you verify style of whitespace and whatnot, which is OK if that's your cup of tea -- but the real value comes from getting instant feedback on the exact line number where you forgot a closing ")" ... without having to run your code and hit the offending line. "JSHint" is a more-configurable variant of Douglas Crockford's JSLint. Mocha competitor to Vows which I'm starting to prefer. Both frameworks handle the basics well enough, but complex patterns tend to be easier to express in Mocha. Vows Vows is really quite elegant. And it prints out a lovely report (--spec) showing you which test cases passed / failed. Spend 30 minutes learning it, and you can create basic tests for your modules with minimal effort. Zombie - Headless testing for HTML and JavaScript using JSDom as a virtual "browser". Very powerful stuff. Combine it with Replay to get lightning fast deterministic tests of in-browser code. A comment on how to "think about" testing: Testing is non-optional. With a dynamic language like JavaScript, there are very few static checks. For example, passing two parameters to a method that expects 4 won't break until the code is executed. Pretty low bar for creating bugs in JavaScript. Basic tests are essential to making up the verification gap with compiled languages. Forget validation, just make your code execute. For every method, my first validation case is "nothing breaks", and that's the case that fires most often. Proving that your code runs without throwing catches 80% of the bugs and will do so much to improve your code confidence that you'll find yourself going back and adding the nuanced validation cases you skipped. Start small and break the inertial barrier. We are all lazy, and pressed for time, and it's easy to see testing as "extra work". So start small. Write test case 0 - load your module and report success. If you force yourself to do just this much, then the inertial barrier to testing is broken. That's <30 min to do it your first time, including reading the documentation. Now write test case 1 - call one of your methods and verify "nothing breaks", that is, that you don't get an error back. Test case 1 should take you less than one minute. With the inertia gone, it becomes easy to incrementally expand your test coverage. Now evolve your tests with your code. Don't get intimidated by what the "correct" end-to-end test would look like with mock servers and all that. Code starts simple and evolves to handle new cases; tests should too. As you add new cases and new complexity to your code, add test cases to exercise the new code. As you find bugs, add verifications and / or new cases to cover the flawed code. When you are debugging and lose confidence in a piece of code, go back and add tests to prove that it is doing what you think it is. Capture strings of example data (from other services you call, websites you scrape, whatever) and feed them to your parsing code. A few cases here, improved validation there, and you will end up with highly reliable code.

另外,请查看推荐的Node.js模块的官方列表。然而,GitHub的节点模块Wiki是一个更完整和很好的资源。


为了理解Node,考虑一些关键的设计选择是有帮助的:

Node.js is EVENT BASED and ASYNCHRONOUS / NON-BLOCKING. Events, like an incoming HTTP connection will fire off a JavaScript function that does a little bit of work and kicks off other asynchronous tasks like connecting to a database or pulling content from another server. Once these tasks have been kicked off, the event function finishes and Node.js goes back to sleep. As soon as something else happens, like the database connection being established or the external server responding with content, the callback functions fire, and more JavaScript code executes, potentially kicking off even more asynchronous tasks (like a database query). In this way, Node.js will happily interleave activities for multiple parallel workflows, running whatever activities are unblocked at any point in time. This is why Node.js does such a great job managing thousands of simultaneous connections.

Why not just use one process/thread per connection like everyone else? In Node.js, a new connection is just a very small heap allocation. Spinning up a new process takes significantly more memory, a megabyte on some platforms. But the real cost is the overhead associated with context-switching. When you have 10^6 kernel threads, the kernel has to do a lot of work figuring out who should execute next. A bunch of work has gone into building an O(1) scheduler for Linux, but in the end, it's just way way more efficient to have a single event-driven process than 10^6 processes competing for CPU time. Also, under overload conditions, the multi-process model behaves very poorly, starving critical administration and management services, especially SSHD (meaning you can't even log into the box to figure out how screwed it really is).

Node.js is SINGLE THREADED and LOCK FREE. Node.js, as a very deliberate design choice only has a single thread per process. Because of this, it's fundamentally impossible for multiple threads to access data simultaneously. Thus, no locks are needed. Threads are hard. Really really hard. If you don't believe that, you haven't done enough threaded programming. Getting locking right is hard and results in bugs that are really hard to track down. Eliminating locks and multi-threading makes one of the nastiest classes of bugs just go away. This might be the single biggest advantage of node.

但是我如何利用我的16核盒子呢?

两种方式:

For big heavy compute tasks like image encoding, Node.js can fire up child processes or send messages to additional worker processes. In this design, you'd have one thread managing the flow of events and N processes doing heavy compute tasks and chewing up the other 15 CPUs. For scaling throughput on a webservice, you should run multiple Node.js servers on one box, one per core, using cluster (With Node.js v0.6.x, the official "cluster" module linked here replaces the learnboost version which has a different API). These local Node.js servers can then compete on a socket to accept new connections, balancing load across them. Once a connection is accepted, it becomes tightly bound to a single one of these shared processes. In theory, this sounds bad, but in practice it works quite well and allows you to avoid the headache of writing thread-safe code. Also, this means that Node.js gets excellent CPU cache affinity, more effectively using memory bandwidth.

Node.js让你毫不费力地做一些非常强大的事情。假设你有一个Node.js程序,它执行各种任务,监听TCP端口的命令,编码一些图像,等等。只需五行代码,就可以添加一个基于HTTP的web管理门户,显示活动任务的当前状态。这很容易做到:

var http = require('http');
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end(myJavascriptObject.getSomeStatusInfo());
}).listen(1337, "127.0.0.1");

现在您可以点击一个URL并检查正在运行的进程的状态。添加几个按钮,你就有了一个“管理门户”。如果您有一个正在运行的Perl / Python / Ruby脚本,那么仅仅“添加一个管理门户”并不简单。

But isn't JavaScript slow / bad / evil / spawn-of-the-devil? JavaScript has some weird oddities, but with "the good parts" there's a very powerful language there, and in any case, JavaScript is THE language on the client (browser). JavaScript is here to stay; other languages are targeting it as an IL, and world class talent is competing to produce the most advanced JavaScript engines. Because of JavaScript's role in the browser, an enormous amount of engineering effort is being thrown at making JavaScript blazing fast. V8 is the latest and greatest javascript engine, at least for this month. It blows away the other scripting languages in both efficiency AND stability (looking at you, Ruby). And it's only going to get better with huge teams working on the problem at Microsoft, Google, and Mozilla, competing to build the best JavaScript engine (It's no longer a JavaScript "interpreter" as all the modern engines do tons of JIT compiling under the hood with interpretation only as a fallback for execute-once code). Yeah, we all wish we could fix a few of the odder JavaScript language choices, but it's really not that bad. And the language is so darn flexible that you really aren't coding JavaScript, you are coding Step or jQuery -- more than any other language, in JavaScript, the libraries define the experience. To build web applications, you pretty much have to know JavaScript anyway, so coding with it on the server has a sort of skill-set synergy. It has made me not dread writing client code.

此外,如果你真的讨厌JavaScript,你可以使用像CoffeeScript这样的语法糖。或者其他任何创建JavaScript代码的工具,比如谷歌Web Toolkit (GWT)。

说到JavaScript,什么是“闭包”?-这是一种很花哨的说法,可以在调用链中保留词法范围内的变量。,)是这样的:

var myData = "foo";
database.connect( 'user:pass', function myCallback( result ) {
    database.query("SELECT * from Foo where id = " + myData);
} );
// Note that doSomethingElse() executes _BEFORE_ "database.query" which is inside a callback
doSomethingElse();

看到了如何使用“myData”而不做任何尴尬的事情,如将它存储到一个对象?与Java不同的是,“myData”变量不必是只读的。这个强大的语言特性使异步编程变得不那么冗长和痛苦。

编写异步代码总是比编写简单的单线程脚本要复杂得多,但是使用Node.js,它并没有那么难,除了数千个并发连接的效率和可伸缩性之外,您还可以获得很多好处……

其他回答

此外,别忘了提到谷歌的V8非常快。它实际上将JavaScript代码转换为机器代码,具有与已编译二进制代码相匹配的性能。所以,连同所有其他伟大的东西,它是疯狂的快。

两个很好的例子是关于如何管理模板和如何使用渐进式增强。您只需要几段轻量级的JavaScript代码就可以使它完美地工作。

我强烈建议你看一下这些文章:

视频玻璃节点 Node.js YUI DOM操作

选择任何一种语言,试着记住如何管理HTML文件模板,以及必须做什么来更新DOM结构中的单个CSS类名称(例如,用户单击菜单项,您希望将其标记为“selected”并更新页面内容)。

使用Node.js就像在客户端JavaScript代码中一样简单。获取DOM节点并将CSS类应用于该节点。获取DOM节点并innerHTML内容(为此需要一些额外的JavaScript代码。阅读这篇文章了解更多)。

另一个很好的例子是,你可以让你的网页兼容JavaScript打开或关闭同一段代码。假设您有一个用JavaScript编写的日期选择,允许用户使用日历选择任何日期。您可以编写(或使用)同一段JavaScript代码,使其在打开或关闭JavaScript时正常工作。

Q:编程模型是事件驱动的,特别是它处理I/O的方式。

正确的。它使用回调,因此任何访问文件系统的请求都会导致一个请求被发送到文件系统,然后Node.js将开始处理它的下一个请求。它只在从文件系统获得响应后才会关心I/O请求,这时它将运行回调代码。但是,可以进行同步I/O请求(即阻塞请求)。由开发人员在异步(回调)或同步(等待)之间进行选择。

Q:它使用JavaScript,解析器是V8的。

Yes

Q:它可以很容易地用于创建并发服务器应用程序。

是的,尽管您需要手工编写相当多的JavaScript。最好查看一个框架,比如http://www.easynodejs.com/,它提供了完整的在线文档和示例应用程序。

闭包是在创建代码的上下文中执行代码的一种方式。

对于并发来说,这意味着您可以定义变量,然后初始化一个非阻塞I/O函数,并为它的回调发送一个匿名函数。

当任务完成时,回调函数将在变量的上下文中执行,这就是闭包。

闭包非常适合使用非阻塞I/O编写应用程序的原因是,它非常容易管理异步执行的函数的上下文。

我认为优点是:

Web development in a dynamic language (JavaScript) on a VM that is incredibly fast (V8). It is much faster than Ruby, Python, or Perl. Ability to handle thousands of concurrent connections with minimal overhead on a single process. JavaScript is perfect for event loops with first class function objects and closures. People already know how to use it this way having used it in the browser to respond to user initiated events. A lot of people already know JavaScript, even people who do not claim to be programmers. It is arguably the most popular programming language. Using JavaScript on a web server as well as the browser reduces the impedance mismatch between the two programming environments which can communicate data structures via JSON that work the same on both sides of the equation. Duplicate form validation code can be shared between server and client, etc.