Python vs. Node vs. PyPy
Tuesday, October 23, 2012 » Science Projects
With Python and Node implementations in hand, I compared the two in terms of performance, community, developer productivity, etc. I also took the opportunity to test-drive PyPy; it seemed appropriate, given that both PyPy and V81 rely on JIT compilation to boost performance.
In this particular post, I'd like to share my discoveries concerning Python vs. Node performance.
Update: See my followup post on Python vs. Node, in which I share my results from a more formal, rigorous round of performance testing.
For each test, I executed 5,000 GETs with ApacheBench (ab). I ran each benchmark 3 times from localhost2 on a 4-core Rackspace Cloud Server3, retaining only the numbers for the best-performing iteration4.
~ Insert obligatory benchmark disclaimer here. ~
CPython (Eventlet Worker)
Python v2.7 + Gunicorn v0.14.6 + Eventlet v0.9.17 + Rawr
PyPy (Sync worker)
PyPy 1.8 + Gunicorn v0.14.6 + Rawr
PyPy (Tornado Worker)
Gunicorn v0.14.6 + Tornado9 + Rawr
Node v0.8.11 + Cluster + Connect10 v2.6.0
In terms of throughput, Node.js is hard to beat. It's almost twice as fast as CPython/Eventlet workers. Interestingly, Node.js spanks Eventlet even while serving a single request at a time. It's obvious that Node's async framework is far more efficient than Eventlet's.
PyPy comes to the rescue, demonstrating the power of a good JIT compiler. The big surprise here is how well the PyPy/Sync workers perform. But there's more to the story...
Not shown in the graph (below) are a few informal tests I ran to see what would happen at even higher levels of concurrency. At 5,000 concurrent requests, PyPy/Tornado and Node.js exhibited the best performance at ~4100 req/sec, followed closely by PyPy/Sync at ~3800 req/sec, then CPython/Sync at ~2200 req/sec. I couldn't get CPython/Eventlet to finish all 5,000 requests without ab giving up due to socket errors.
CPython/Eventlet demonstrates high per-request overhead, turning around requests about 30% slower (in the worst case) than non-evented CPython workers.
With a high number of concurrent connections, PyPy's performance is very consistent. CPython/Sync also does well in this regard.
Node is less consistent than I expected, given its finely-tuned event loop. It's possible that node-mongodb-native is to blame—but that's pure conjecture on my part.
Node's asynchronous subsystem incurs far less overhead than Eventlet, and is slightly faster than Tornado running under PyPy. However, it appears that for moderate numbers of concurrent requests, the overhead inherent in any async framework simply isn't worthwhile11. I.e., async only makes sense in the context of serving several thousand concurrent requests per second.
CPython's lackluster performance makes a strong case for migrating to PyPy for existing projects, and for considering Node.js as a viable alternative to Python for new projects. That being said, PyPy is not nearly as battle-tested as CPython; caveat emptor!
- 2 Running ab on the same box as the server avoids networking anomalies that could bias test results.
- 3 CentOs 5.5, 8 GB RAM, ORD region, first-generation cloud server platform.
- 4 What can I say? I'm an insufferable optimist.
- 5 A lot of FUD has gone around re MongoDB. I'm convinced that most bad experiences with Mongo are due to one or more of the following: (1) Developers consciously (or subconsciously) expecting MongoDB to act like an RDBMS, (2) the perpetuation of the myth that MongoDB is not durable, and (3) trying to run MongoDB on servers with slow disks and/or insufficient RAM.
- 6 The response body was chosen as an example of a typical JSON document that might be returned from the message bus service.
- 7 Gunicorn is a very fast WSGI worker proxy, similar to Node's Cluster module. Gunicorn supports running various worker types, including basic synchronous workers, Eventlet workers, and Tornado workers.
- 8 Based on extensive benchmarking, I ended up writing a custom micro web-services framework. Protip: Don't use webob for parsing requests; it's dog-slow.
- 9 Used in lieu of Eventlet, since Eventlet was extremely slow under PyPy in my benchmarks.
- 10 Connect adds virtually no overhead; I wanted to use a framework that was fast, yet mainstream.
- 11 Warning: Don't expose sync processes to the Internet; run behind a non-blocking server such as Nginx to guard against not only malicious attacks (ala Slowloris), but also against the Thundering Herd.