Lee Braiden's Blog

Random thoughts, categorised

Tuning a small webserver to avoid swapping

A lot of people run into a problem when they first set up a webserver on a VPS, using default settings: the system will run fine at first, but when the owner isn’t around, it’ll start grinding to a halt. The owner comes back, tries to use the site, and it’s unbearably slow. Even logging into the server to see what’s wrong can be unbearably slow.

The usual cause of this is not realising that web / database servers need to be configured carefully for the memory use per connection.

For example, if you’re just running Apache with mod_php, and PHP is configured for 128MB per connection, Apache might default to 32 connections or more (it might even be 128 by default). At 32 connections, that is up to 4GB in the worst case, which is a LOT of memory for a small VPS. Then there is your database on top, and all of its connections, caches, etc.

For a webserver, you should probably:

  • Turn off virtual memory (swap)
  • Carefully calculate your memory requirements for PHP, Ruby, python etc.
  • Run a SMALL number of dedicated fastcgi (or similar, see wsgi etc.) servers for the webapp/languages you need.
  • Tune these as far as reasonably possible. For example, you can configure how much memory PHP-FPM will use, and how many threads mongrel will spawn, etc.
  • Run a carefully tuned webserver, with lots of LIGHTWEIGHT connections: apache stripped down (google that) or nginx, or lighttpd. I’d recommend nginx.
  • Run these without in-process languages like mod_php. Instead, make them serve static files quickly (ideally using the Linux kernel’s sendfile feature), but pass PHP requests etc. to the back-end handlers, like php-fpm via fastcgi or similar.
  • Run mysqltuner or similar tools to tune your database.

What you need to aim for is for the maximum number of processes, under full load, and full memory usage, plus any extra software like firewalls and cron jobs, to never exceed physical memory. If you do need to run heavy cron jobs etc., then enable virtual memory, but only if you’re sure there are quiet times for your server, like 3am, when it can afford to crawl. Otherwise, you need to take the hit and reduce the number of processes/maximum memory, or increase the server memory, to cope without swapping.
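As a rough sketch of that calculation, the snippet below works out how many back-end workers fit into physical memory. All of the figures (1024MB of RAM, 384MB reserved for everything else, 128MB per worker) are made-up assumptions for illustration, and the function names are hypothetical; measure your own server before trusting any of them.

```python
# Hypothetical figures for a small VPS; substitute your own measurements.
TOTAL_RAM_MB = 1024    # physical memory on the server (assumed)
RESERVED_MB = 384      # OS, database, webserver, firewall, cron jobs (assumed)
PER_WORKER_MB = 128    # memory limit per back-end worker, e.g. PHP's limit


def max_workers(total_mb, reserved_mb, per_worker_mb):
    """Largest worker count whose worst-case memory use fits in physical RAM."""
    available = total_mb - reserved_mb
    if available <= 0 or per_worker_mb <= 0:
        return 0
    return available // per_worker_mb


def worst_case_mb(workers, per_worker_mb, reserved_mb):
    """Memory needed if every worker hits its limit at the same time."""
    return workers * per_worker_mb + reserved_mb


print("workers that fit: %d" % max_workers(TOTAL_RAM_MB, RESERVED_MB, PER_WORKER_MB))
print("worst case with 32 workers: %dMB" % worst_case_mb(32, PER_WORKER_MB, RESERVED_MB))
```

If the worst case for your configured worker count is larger than physical memory, the server can end up swapping under load, which is exactly the slow-down described above.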

UK Inflation Rates: Government figures vs. Government figures

Officially, the UK inflation rate is 2.4%. Except that inflation is supposed to be “a measure of the rise in cost of goods and services”.

Based on actual figures from the Office for National Statistics, the same organisation that reports the inflation rate, things are very different:

Gas and electricity: +142%
Car tax & insurance: +108%
Home insurance: +54%
Council tax and rates: +49%
(and so on)

The two are highly out of sync. On the one hand, you have huge actual rises in the cost of living. On the other hand, you have methodologies that make everything look right as rain.

When inflation gets really bad (when hyperinflation occurs), the end result is that the economy collapses, or has to take extreme measures to prevent collapse. We’re not there yet, but this should be starting to sound familiar.

And yet, they have the audacity to claim that inflation is falling, to 2.4%.

This is how you know your country is slipping… not into recession, but into global irrelevance.

As Her Majesty might say, one would do well to learn Chinese soon.

PyPy vs. CPython: Speed and memory usage benchmarks

Following on from part 1 of this article, I’d like to take you through some PyPy vs. CPython benchmarks.

Benchmarks: Many Objects

Note: in all benchmarks, I’m measuring total memory use for the entire interpreter run, but only measuring time taken across the code I’m actually interested in testing. There’s a subtle (depending on your experience) difference, because the interpreters take some time to start up before running the code, and to shut down afterwards. From my perspective, it seems wiser to ignore this startup time, since most applications run for a long time, and users are interested in performance once running, rather than during startup/shutdown. Either way, the startup/shutdown time is negligible for both interpreters, in this case.

Let’s take a quick benchmark of CPython vs. PyPy. This simple script will take a number on the command line, and generate that number of objects, each containing a dictionary of 30 sub-objects, which in turn contain simple string values as fields.
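To give a feel for the shape of such a script, here is a minimal sketch. It is not the original benchmark code: the class names, field names and the default count of 1000 are all assumptions made for illustration. It times only the object creation, in line with the note above.

```python
import sys
import time


class SubObject(object):
    """A small object with simple string values as fields (hypothetical names)."""
    def __init__(self, index):
        self.name = "item-%d" % index
        self.value = "value-%d" % index


class Container(object):
    """An object containing a dictionary of 30 sub-objects."""
    def __init__(self, fields=30):
        self.items = dict((i, SubObject(i)) for i in range(fields))


def build(count):
    """Create `count` containers and keep them all alive in a list."""
    return [Container() for _ in range(count)]


def timed_build(count):
    """Time only the object creation, not interpreter startup/shutdown."""
    start = time.time()
    objects = build(count)
    elapsed = time.time() - start
    return objects, elapsed


if __name__ == "__main__":
    # Take the object count from the command line, falling back to 1000.
    args = sys.argv[1:]
    count = int(args[0]) if args and args[0].isdigit() else 1000
    objects, elapsed = timed_build(count)
    print("created %d objects in %.3f seconds" % (len(objects), elapsed))
```

Running the same file under both interpreters with the same count gives a like-for-like comparison of the timed section.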

So PyPy completes these jobs in around 1/3 of the time, in both cases, and memory usage is about half that of CPython. Not bad.

What’s not so great

Performance Issues

Unfortunately, there are still some issues with PyPy vs. CPython, mainly in

Benchmarks: String concatenation, or CPython hacks

Let’s modify this code to do a little more: concatenating the values from each item into a huge string:
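The pattern being tested looks roughly like the sketch below. Again, this is an illustration rather than the original benchmark code: `make_items` is a hypothetical stand-in for the generated objects, using plain dictionaries of strings.

```python
def make_items(count, fields=30):
    """Stand-in data: `count` dictionaries, each holding `fields` short strings."""
    return [dict((i, "value-%d" % i) for i in range(fields))
            for _ in range(count)]


def concat_in_loop(items):
    """Build one huge string by repeated += inside nested loops."""
    result = ""
    for item in items:
        for key in sorted(item):
            # Each += may copy everything accumulated so far.
            result += item[key]
    return result


text = concat_in_loop(make_items(1000))
print("built a string of %d characters" % len(text))
```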

Look closely: CPython’s results are there; they just don’t register on the scale! CPython is a lot faster in this case. At least the memory usage is still better for PyPy.

OK, something went very wrong here. PyPy clearly does not like being asked to do this: string concatenation in a loop over many objects is a quadratic operation, according to the PyPy performance page.

Why is CPython so much better here? Well, it looks like CPython has a hack to optimise this, relying on certain inner workings of CPython which don’t directly translate to the way that other Python interpreters work.

Then, is this a flaw in PyPy, that it can’t perform the same optimisation, even if it’s really a hack? Apparently not. In fact, the official Python documentation actually warns against concatenating strings in this way:

CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.

Benchmarks: String concatenation, with a simple workaround

So, let’s try this again. Following the performance tuning instructions given above and on the PyPy performance page, we’ll change the code that loops over objects and concatenates strings to first build a list of strings, then join them in one operation.
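In outline, the change looks something like this. The sample data and the function name `join_once` are made up for the example; the point is only that the strings are collected in a list and joined at the end.

```python
def join_once(items):
    """Collect the pieces in a list, then join them in a single operation."""
    parts = []
    for item in items:
        for key in sorted(item):
            parts.append(item[key])
    # One join at the end, instead of growing a string piece by piece.
    return "".join(parts)


# Tiny made-up sample standing in for the generated objects.
sample = [{"a": "x", "b": "y"}, {"a": "z", "b": "w"}]
print(join_once(sample))
```

The result is the same string that the concatenating loop would produce, built with str.join() as the Python documentation recommends.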

And the benchmarks:

Much better. The CPython benchmarks are largely unaffected, and PyPy comes out on top. In other words, even if we don’t care about the ~3x performance gain that PyPy can now provide, running our code through PyPy has helped to identify and improve bad code.

Reproduction details

Benchmarks were produced with the following script:

The specs of the machine this was executed on are:

Conclusions

All things considered, PyPy has been pretty great, for me. I believe it can be great for you, too. I believe it could be great for you right now, so long as you’re using Python 2.x code, or very soon, otherwise.

On the virtues of PyPy as your default interpreter

I get a lot of use out of PyPy. In fact, it’s become my default python interpreter, replacing CPython, at least for Python 2.x code. Python 3.x support in PyPy is coming real soon now; most of the tests are passing, so the next release will probably make it happen.

So, I wanted to write a little about the virtues of PyPy, and its potential to be your default Python interpreter, too. I also want to talk about the main issues that might present roadblocks at the moment, and how you might work around them. Finally, while most benchmarks focus on PyPy’s speed, I’m also going to examine its memory usage.

What’s great

Compatibility

Before we talk about improvements, it’s difficult to overstate how important this is. PyPy is, at least for my use cases, a drop-in replacement for CPython 2.7. It’s so compatible that I’ve symlink’d /usr/local/bin/python2.7 to pypy, and have been using it that way for so long, without incident, that I’d forgotten I still had that set up. There are lots of Python “variants” out there, like Cython, but when Python is the basis for many tools you use daily, that compatibility is the first stumbling block for any would-be replacements.

There are some compatibility issues. From my understanding of this, CPython’s C-level extension API isn’t fully supported yet. PyPy’s preferred C-level interface is CFFI, and ctypes is supported too. So, a few non-Python extension libraries for Python will not work with PyPy as yet. If that sounds bad, the thing is that it’s never really been an issue for me. Without paying attention to which of my libraries are pure Python code, and which are native code, virtually everything I’ve tried to do with PyPy just worked, first time.

PyPy provides replacements for some of these libraries, like NumPy and SciPy, which are actually optimised better by being part of PyPy itself. Would any other major libraries be an issue? Probably. Do you actually use or need those libraries? Possibly not. Also consider that, even if you’re using some native extension library, another library implemented in pure Python may be a valid alternative. That’s because, with PyPy, pure Python code is fast: almost as fast as C code in some cases, or at least in the same ballpark.

For example, I believe PyPy now implements cElementTree as native code just as CPython does, but in PyPy 1.7, it was pure Python code. Benchmarks by Stefan Welts showed that, in one or two of five tests (the 4.5 MB structure and the 274KB hamlet.xml), speed was greatly improved over CPython running the same pure-Python ElementTree code.

Of course, this doesn’t help if you really need a C library’s functionality, for, say, accessing some new piece of hardware.  In that case, it’s a matter of porting the code to CFFI.

There’s a full breakdown of PyPy compatibility on the official website.

Speed

If PyPy is about any one thing, it’s about boosting speed.  There are lots of great performance boosts to be had, just by running your code in PyPy instead of CPython.  Overall, according to the speed.pypy.org site, PyPy is currently, on average, around six times faster than CPython.

Now, a sixfold improvement in performance isn’t always a big deal: not if your application is IO-bound, or gluing other libraries together, at least. Consider that your average Python app might load some stuff from disk, then call a C library to generate a UI, or open a socket to another machine and start talking at WAN speeds. Even the slowest language can probably keep up with relatively simple, lightweight desktop applications like a basic email client. When Python is only used as glue to bind these separate system/C libraries together, you might not care about PyPy’s speed. What might matter to you is the memory footprint of running many Python apps, but we’ll get to memory later.

Does performance matter to Python?

Setting IO-bound apps and “glue apps” aside, there are still many areas where speed is relatively critical.  Python is increasingly popular for data crunching, thanks to its simplicity, power, and wealth of libraries that let you just get on with the actual task at hand, rather than yak shaving.  Did you know that the US Securities and Exchange Commission is pushing Python as the language of choice for processing financial transactions?  High-profile, high-impact libraries like SciPy and NumPy also help a lot.  There are also multimedia applications, processing sound waves, handling many players and events on a game’s battlefield, applications running simulations of many people in crowded buildings during emergencies, etc. The list goes on.

Probably the most frequent and compelling need for high performance in Python, though, is in server-side applications. Consider web apps like those based on Django, or TurboGears: for every single page requested, they’re probably doing many of the following:

  • Handling lots of contextual information in security middleware, for authentication and authorisation
  • Parsing URLs requested, and routing through layers of the application to the right function which can handle that request
  • Talking to a database, parsing the results
  • Compiling multiple template files into a single page
  • Doing subsequent database lookups, merging and filtering lists of objects (if it couldn’t be optimised into a single database query)
  • Parsing and reformatting application-level data
  • Applying business logic
  • Generating forms from widgets
  • Looking up default values and previously posted values
  • Checking for errors on forms
  • Storing the results of updates, by modifying databases etc.
  • Posting errors to the user
  • Merging that all into an output page.

Now consider that this might be happening for thousands of requests per second. It’s all pretty user-interface stuff on the client side of the web, though even there, JavaScript engines and browser rendering engines are having to step up their performance. On the server side, though, performance is pretty important. Sure, you can always throw hardware at the problem, but consider that six times faster may actually mean spending six times less on hardware. Even if you adjust that for realities and estimate, maybe three times less, you’ve still potentially saved yourself a lot of money. Or, maybe you only have one smaller server, and you’ve reduced your rendering time per page from five seconds to under two seconds. That boost in responsiveness could make or break your web app.

So, for me, crunching data, and for lots of others, crunching web requests etc., having Python run 6x faster is a huge improvement.

Bear in mind that PyPy will do this essentially for free, because it’s almost a drop-in replacement.

Memory Overhead

Memory overhead is another issue that doesn’t matter for some applications. Well, if the application is nothing more than a small script, at least. Do you really want your little systray app using 8MB of memory, though, or your server app using 4GB instead of 2GB?

In the C language, when you create a 4-byte, 32-bit integer, that’s exactly the memory you require: 4 bytes, on the stack. The only other overhead is to update the stack pointer: adding 4 to a number, in other words. That’s it: creating a new number means updating a number which counts how much you’ve created so far. It’s very fast, and very lightweight. When you name a variable and say what type it will be, you’re saying, “let me have a little space for this, right here.”

In contrast, in languages like Python and Java, the underlying details can be a lot different: you have “boxed” objects, which contain a lot of hidden, runtime information: what type the thing you just made is, where it is in memory, how many times that new object has been referenced, etc. All of this is overhead: not the information you wanted, but meta-information: information about the information that you wanted.

Now, those boxes of metadata can be great.  You can use them for lots of cool things at run-time, like asking what kind of data someone passed into the function.  You can use them for debugging, to say, “I’ve been passed this object.  It’s an int, and its value is x”.  In languages like C, some of that is possible, on some types of data (like class objects), but not all, and not on the simplest, native data types.  The upshot is that, compared to Python, languages like C can do a lot more in the same amount of system memory.  In Python, all objects require some metadata.  And in Python, because it’s such a powerfully dynamic language, it’s (relatively speaking) a lot of metadata.
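You can see some of this overhead for yourself. The sketch below assumes CPython, since `sys.getsizeof` and `sys.getrefcount` are CPython-oriented and may not behave the same way on other interpreters; the exact numbers depend on the Python version and platform, so none are quoted here.

```python
import struct
import sys

# Size of a native C int on this platform, as reported by the struct module.
c_int_bytes = struct.calcsize("i")

# Size of a boxed Python integer object, including its hidden metadata.
value = 12345
py_int_bytes = sys.getsizeof(value)

print("C int:             %d bytes" % c_int_bytes)
print("Python int object: %d bytes" % py_int_bytes)

# Some of the metadata carried by the box is visible at run-time:
print("type:            %s" % type(value).__name__)
print("reference count: %d" % sys.getrefcount(value))
```

The gap between the two sizes is the per-object metadata described above, and it is paid again for every object you create.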

What does all this have to do with CPython vs. PyPy? Simply that PyPy is better at this. Like for like, PyPy will probably use about half as much memory as CPython. That matters, if you’ve just loaded 1,000,000 lines of text, and parsed them into 1,000,000 objects, for example. It also matters a little for performance, when all that data needs to be created, moved around, copied, updated, etc.

In part 2, I’ll post some code, and the benchmarks which result from it, for comparison.

Welcome to my new blog

It’s been a while. Finding time to blog is tough these days, and it’s so easy to be swept up in sites like Facebook, where all your friends are, or Google Plus, which offers easy access to lots of other posts and readers.

I’ve seen the light, though. Facebook does so much analysis on users for so little in return, and limits your network size, etc. When I last had a blog, I had complex, interesting discussions with people from all over the world, regularly. For all my time on Facebook, I’ve said hi to a few friends, arranged a few outings, checked out a bunch of photos… but the whole experience has been much more shallow. Granted, Google Plus is better at that sort of open discussion. But then there are the privacy issues. It’s so hard to track what Google Plus is sharing, and what it’s not. And it’s closed. There isn’t even an API to post to Google Plus, that I know of. You’d think they’d welcome the content.

In short, I missed the real world-wide web, where people network, communicate, share ideas, and build things together.  So.  A new beginning.

In this blog, I’ll be discussing software engineering projects, mainly in Python, possibly in Rust and a few other modern, interesting languages.  I might get into other projects, like fixing up the new house, garden, sheds, etc.  I might get into ethics, philosophy, and stuff like that, too.  I’ll keep those all neatly categorised though, so people can subscribe to the coding stuff, without listening to me harp on about ethics.  Or vice versa.