technow


A review of PHP compilers and their outputs

16 Feb 2010
By Nicolas Favre-Felix

Introduction

Facebook generated a lot of buzz a few weeks ago when they announced the release of their new PHP compiler, HipHop-PHP. In this article, we will compare existing PHP optimization tools and what they can do to improve the speed of PHP pages. The release re-heated the debate over whether web applications are limited by the speed of PHP or the speed of their database; this article only deals with optimization tools for PHP code.

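We will discuss the following concepts: the Zend Engine and PHP opcodes, opcode caches, PHP extensions, generating extensions with PHC, and two other compilers, Roadsend-PHP and HipHop-PHP.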

Thanks to Philipp Zentner for translating this article into German.

The Zend Engine, PHP opcodes

The Zend Engine is a virtual machine which runs PHP scripts. It is the official implementation of the PHP language. This virtual machine is opcode-based: PHP scripts are compiled into a simpler language which supports a limited number of operations, each with a code. For example: adding values, calling a function, comparing variables with == are such operations; their opcodes are ADD, DO_FCALL_BY_NAME, IS_EQUAL.

When a PHP script is executed, the following happens:

  1. The script is read and split into tokens, which are fed into a parser
  2. If the script is valid PHP, opcodes are generated
  3. The Zend Engine executes the opcodes, using a function for each opcode
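
The first step can be observed from PHP itself, since the tokenizer extension exposes the lexer through token_get_all(). The snippet below is only a sketch for illustration: the helper name list_tokens is made up for this example, and the exact token names may differ slightly between PHP versions.

```php
<?php
// Hypothetical helper: returns a readable token stream for a PHP snippet.
function list_tokens(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are [id, text, line]; token_name() maps the id to a name.
            $out[] = token_name($token[0]) . ' ' . var_export($token[1], true);
        } else {
            // Single-character tokens such as ( ) ; come back as plain strings.
            $out[] = $token;
        }
    }
    return $out;
}

echo implode("\n", list_tokens('<?php echo fib(30);')), "\n";
```

Each line of the output is one token; the parser then consumes this stream to decide whether the script is valid PHP.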

For a more detailed explanation of this process, head over to Sara Golemon’s blog post on Understanding Opcodes.

Let’s examine the generated opcodes on a simple example:

function fib($n) {

    if ($n === 0 || $n === 1) {
        return $n;
    }
    return fib($n - 1) + fib($n - 2);
}

echo fib(30)."\n";

This small script will be used throughout this article to examine its transformations by several compilation tools.
The Zend Engine generates two code blocks: one for the fib function, and one for the top-level call. I used the Vulcan Logic Dumper extension to dump the opcodes, using the following command: php -d vld.active=1 -d vld.execute=0 -f test.php

function name:  fib
number of ops: 20
compiled vars: !0 = $n
line    #   op                    fetch  ext  return  operands
-------------------------------------------------------------------------------
  29    0   RECV                                      1
  31    1   IS_IDENTICAL                      ~0      !0, 0
        2   JMPNZ_EX                          ~0      ~0, ->5
        3   IS_IDENTICAL                      ~1      !0, 1
        4   BOOL                              ~0      ~1
        5   JMPZ                                      ~0, ->8
  32    6   RETURN                                    !0
  33    7*  JMP                                       ->8
  34    8   INIT_FCALL_BY_NAME                        'fib'
        9   SUB                               ~2      !0, 1
       10   SEND_VAL                                  ~2
       11   DO_FCALL_BY_NAME             1
       12   INIT_FCALL_BY_NAME                        'fib'
       13   SUB                               ~4      !0, 2
       14   SEND_VAL                                  ~4
       15   DO_FCALL_BY_NAME             1
       16   ADD                               ~6      $3, $5
       17   RETURN                                    ~6
  35   18*  RETURN                                    null
       19*  ZEND_HANDLE_EXCEPTION

function name:  (null)
number of ops: 7
compiled vars: none
line    #   op                    fetch  ext  return  operands
-------------------------------------------------------------------------------
  29    0   NOP
  37    1   SEND_VAL                                  30
        2   DO_FCALL                     1            'fib'
        3   CONCAT                            ~1      $0, '%0A'
        4   ECHO                                      ~1
  40    5   RETURN                                    1
        6*  ZEND_HANDLE_EXCEPTION

This should be readable by anyone who has ever used assembly language with registers and a stack. I wonder about the NOP, though.
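
To make “a function for each opcode” more concrete, here is a toy interpreter written in PHP. It is only a sketch of the dispatch idea, not how the Zend Engine is actually implemented: the function name run_ops, the instruction layout and the three handlers are all made up for this illustration.

```php
<?php
// Toy interpreter: one handler function per opcode, loosely mirroring the dump above.
// Every instruction is [opcode, result slot, left operand, right operand].
function run_ops(array $ops, array $vars): array
{
    $handlers = [
        'ADD'          => function ($a, $b) { return $a + $b; },
        'SUB'          => function ($a, $b) { return $a - $b; },
        'IS_IDENTICAL' => function ($a, $b) { return $a === $b; },
    ];
    // Operands that start with '!' or '~' name a slot; anything else is a literal.
    $fetch = function ($operand, array $vars) {
        $isSlot = is_string($operand) && $operand !== ''
            && ($operand[0] === '!' || $operand[0] === '~');
        return $isSlot ? $vars[$operand] : $operand;
    };
    foreach ($ops as $op) {
        list($name, $result, $left, $right) = $op;
        $vars[$result] = $handlers[$name]($fetch($left, $vars), $fetch($right, $vars));
    }
    return $vars;
}

// With $n stored in slot !0: ~0 = ($n === 0); ~2 = $n - 1; ~6 = ~2 + $n.
$state = run_ops([
    ['IS_IDENTICAL', '~0', '!0', 0],
    ['SUB',          '~2', '!0', 1],
    ['ADD',          '~6', '~2', '!0'],
], ['!0' => 5]);
var_dump($state['~0'], $state['~2'], $state['~6']);
```

Jumps and function calls are left out on purpose; they would need an instruction pointer and a call stack, which is where the real engine gets more involved.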

Opcode caches

Without an opcode cache, each HTTP request re-reads and re-compiles the file before executing the opcodes. For the majority of PHP pages, each short-lived request spends a significant proportion of its total execution time parsing and generating opcodes, whether the source file has changed or not. Opcode caches are plugins for the Zend Engine that keep a copy of the generated opcodes after the file is read for the first time, and bypass the parsing and generation steps after that. They only check whether the file has been modified, and it is even possible to remove that check to gain more performance (in APC, by setting apc.stat to 0).
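
The idea can be modeled in a few lines of PHP. This is a toy model under stated assumptions and not how APC or any other real cache works internally: the class ToyOpcodeCache is hypothetical, “compilation” is replaced by a call to token_get_all(), and a real cache keeps its entries in shared memory across requests instead of in a PHP array.

```php
<?php
// Toy model of an opcode cache: "compilation" is any expensive callable,
// and its result is reused until the file's modification time changes.
final class ToyOpcodeCache
{
    private $entries = [];
    public $compilations = 0;

    public function get(string $path, callable $compile)
    {
        clearstatcache(true, $path);             // make sure filemtime() is fresh
        $mtime = filemtime($path);
        if (isset($this->entries[$path]) && $this->entries[$path]['mtime'] === $mtime) {
            return $this->entries[$path]['ops']; // cache hit: skip parsing and compiling
        }
        $this->compilations++;
        $ops = $compile(file_get_contents($path));
        $this->entries[$path] = ['mtime' => $mtime, 'ops' => $ops];
        return $ops;
    }
}

$file = tempnam(sys_get_temp_dir(), 'toy');
file_put_contents($file, '<?php echo 1;');

$cache = new ToyOpcodeCache();
// Stand-in for the real parse + compile step.
$compile = function (string $source) { return token_get_all($source); };
$cache->get($file, $compile);
$cache->get($file, $compile);                    // served from the cache
echo $cache->compilations, "\n";
unlink($file);
```

Removing the modification check, as mentioned above, would amount to dropping the filemtime() comparison: faster, but a changed file is no longer picked up until the cache is cleared.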

The most common opcode caches are APC, XCache, and eAccelerator. A full list and history is available on Wikipedia. APC has received contributions from Facebook, where it is used. Facebook engineers have also given talks at web conferences on APC tuning (link to a much recommended PDF slideshow).

Opcode caches are easy to use and often bring “free” performance without having to optimize any code.

PHP extensions: when caching opcodes is not enough

Once you’ve made sure the bottleneck is indeed the execution of PHP code, there is a way to improve the execution speed by re-writing parts of the PHP code in C. This C code is compiled into a .so file, which is loaded by the PHP interpreter at runtime; the compiled modules export functions and classes that PHP scripts can use directly. Whenever you make a call to memcache from PHP, you’re using an extension written in C.

Extensions are loaded inside PHP, and make use of the Zend Engine’s internal data structures and APIs. Writing PHP extensions is tedious: there isn’t a whole lot of documentation, and many functions and macros are inconsistent or confusing. It is an unpleasant experience overall.

The performance gain mostly depends on the application. I’ve seen a speed-up of 60× on certain functions: being able to use pointers and custom data structures in C is much more efficient than having to copy a whole lot of data every time a variable changes.
That said, you can’t just rewrite and expect your functions to perform better; identifying possible bottlenecks and re-writing a few core functions worked for us.

Generating PHP extensions with PHC

PHC is a PHP compiler written by Paul Biggar as part of his PhD. It can convert existing PHP code into C, to be compiled as an extension for PHP. The idea is to keep calling the same classes and functions, only this time they’ll be faster because they’re written in C.

The code that PHC generates is often difficult to follow and there is no simple way to use it as a C base that could be maintained by a human being. Our fib function generated a 2500-line file, the first 1350 being PHC boilerplate used by the rest. A generated comment in the output explained how PHC transformed the code:

function fib($n)
{
$TLE2 = 0;
$TLE0 = ($n === $TLE2);
if (TLE0) goto L16 else goto L17;
L16:
$TEF1 = $TLE0;
goto L18;
L17:
$TLE3 = 1;
$TEF1 = ($n === $TLE3);
goto L18;
L18:
$TLE4 = (bool) $TEF1;
if (TLE4) goto L19 else goto L20;
L19:
return $n;
goto L21;
L20:
goto L21;
L21:
$TLE5 = 1;
$TLE6 = ($n - $TLE5);
$TLE7 = fib($TLE6);
$TLE8 = 2;
$TLE9 = ($n - $TLE8);
$TLE10 = fib($TLE9);
$TLE11 = ($TLE7 + $TLE10);
return $TLE11;
}

Here is an excerpt of the generated C code, corresponding to the statement $TLE11 = ($TLE7 + $TLE10); above (the addition itself is the call to add_function; most of the rest is boilerplate):

// $TLE11 = ($TLE7 + $TLE10);
{
    if (local_TLE11 == NULL)
    {
        local_TLE11 = EG (uninitialized_zval_ptr);
        local_TLE11->refcount++;
    }
    zval** p_lhs = &local_TLE11;
    zval* left;
    if (local_TLE7 == NULL)
    {
        left = EG (uninitialized_zval_ptr);
    }
    else
    {
        left = local_TLE7;
    }
    zval* right;
    if (local_TLE10 == NULL)
    {
        right = EG (uninitialized_zval_ptr);
    }
    else
    {
        right = local_TLE10;
    }
    if (in_copy_on_write (*p_lhs))
    {
        zval_ptr_dtor (p_lhs);
        ALLOC_INIT_ZVAL (*p_lhs);
    }
    zval old = **p_lhs;
    int result_is_operand = (*p_lhs == left || *p_lhs == right);
    add_function(*p_lhs, left, right TSRMLS_CC);
    if (!result_is_operand)
        zval_dtor (&old);
    phc_check_invariants (TSRMLS_C);
}

Note the weird flow of execution, probably due to the lack of type inference: the PHC compiler does its best to cover all possible cases, and this takes time.

How PHC uses PHP’s “embed” mode to generate executables

PHP has different interfaces, called SAPIs (Server APIs). Existing SAPIs include apache, cli, cgi-fcgi… One of them provides a way to embed the PHP parser and virtual machine inside a C program. PHC takes advantage of this feature to bundle the PHP runtime along with the generated C code, producing an executable binary.

Other compilers

Two other compilers are available today, Roadsend-PHP and HipHop-PHP. Roadsend is being rewritten to use LLVM, but the project is apparently still in its infancy so the new version isn’t included here. These compilers are different from PHC as they don’t use the Zend runtime but provide their own execution system instead; both are capable of generating executable files.

A look at Roadsend’s output

Roadsend-PHP transforms PHP code into Scheme, a functional language. The Scheme code is then converted to C and compiled to produce an executable binary. Here is the output for fib:

(define test:test.php/fib
  (lambda ($n)
    #f
    (push-stack 'unset 'fib $n)
    (set! *PHP-LINE* 2)
    (set! *PHP-FILE* "test.php")
    (let ((ret1112
           (begin
             (begin0
               (bind-exit
                 (return)
                 (let ()
                   #t
                   (begin
                     (if (or (identicalp $n #e0) (identicalp $n #e1))
                         (begin (return (copy-php-data $n)))
                         (begin))
                     (return
                       (php-+ (maybe-unbox
                                (begin
                                  (set! *PHP-FILE* "test.php")
                                  (set! *PHP-LINE* 7)
                                  (let ((retval1110
                                         (test:test.php/fib (php-- $n #e1))))
                                    (set! *PHP-FILE* "test.php")
                                    (set! *PHP-LINE* 7)
                                    retval1110)))
                              (maybe-unbox
                                (begin
                                  (set! *PHP-FILE* "test.php")
                                  (set! *PHP-LINE* 7)
                                  (let ((retval1111
                                         (test:test.php/fib (php-- $n #e2))))
                                    (set! *PHP-FILE* "test.php")
                                    (set! *PHP-LINE* 7)
                                    retval1111))))))
                   NULL))))))
      (pop-stack)
      ret1112)))

I used the following command: pcc -v test.php -O --no-clean.

*PHP-FILE* and *PHP-LINE* are global variables that are updated as the code is executed. Apart from a few PHP-related function calls, the code is pretty much straight Scheme code with special operators for PHP variables. That said, the generated C code is absolutely unreadable.
A point of comparison: on my machine, calling fib(30) in PHP takes 0.95 sec, while the Roadsend-generated binary takes 0.44 sec. N.B. this is not a proper benchmark.
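
For readers who want to reproduce the PHP side of this comparison, a timing harness could look like the sketch below. It is not the script used for the numbers above: the helper best_time is hypothetical, it measures wall-clock time only, and it uses a smaller argument than the article’s fib(30) so that it finishes quickly.

```php
<?php
function fib($n)
{
    if ($n === 0 || $n === 1) {
        return $n;
    }
    return fib($n - 1) + fib($n - 2);
}

// Hypothetical helper: best wall-clock time over several runs, in seconds.
function best_time(callable $fn, int $runs = 3): float
{
    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

$n = 25; // the article uses 30; raise this for a longer measurement
printf("fib(%d): %.3f s\n", $n, best_time(function () use ($n) { fib($n); }));
```

Taking the best of several runs reduces noise from other processes, but it still says nothing about startup cost or memory usage, which matter when comparing an interpreter against a compiled binary.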

A look at HipHop’s output

HipHop produces the following output:

Variant f_fib(Numeric v_n) {
    FUNCTION_INJECTION(fib);
    if (same(v_n, 0LL) || same(v_n, 1LL)) {
        return v_n;
    }
    return plus_rev(LINE(7,f_fib(v_n - 2LL)), f_fib(v_n - 1LL));
} /* function */

This is a real C++ function that is very similar to the PHP code. It is called by the following line:

echo(concat(toString(LINE(10,f_fib(30LL))), "\n"));

This is easy to read and to hack on. This code ran on my machine in about the same time as the one generated by Roadsend’s compiler. This is still not a fair comparison, as the program embeds a whole web server: a significant part of this run must be spent in setup and initialization.

Conclusion

Web developers have many options when it comes to optimizing existing PHP code. There is no silver bullet and the first step should always be to write better code; an optimizing compiler won’t save your bubble sort. Reasoning about complexity in Big-O terms is a good start, and using simple data structures instead of a complex hierarchy of objects will often help as well.
Once your code is clean, try optimizing it step by step to scale as your user base grows: first with an opcode cache, and then using a compiler if you actually need it. Chances are, a well-tuned APC will be fast enough and you’ll be able to deploy changes quickly without having to recompile the whole code base.
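
As a small illustration of “write better code first”, the fib function used throughout this article can be rewritten with a loop. The function name fib_iterative is made up for this sketch; it returns the same values as the recursive version for non-negative integers, using a linear number of additions instead of an exponential number of calls.

```php
<?php
// Iterative Fibonacci: keeps only the last two values of the sequence.
function fib_iterative(int $n): int
{
    $a = 0;
    $b = 1;
    for ($i = 0; $i < $n; $i++) {
        // Advance one step: (a, b) becomes (b, a + b).
        list($a, $b) = [$b, $a + $b];
    }
    return $a;
}

echo fib_iterative(30), "\n"; // 832040
```

No compiler discussed here performs this kind of rewrite for you, which is why the algorithm should be looked at before the toolchain.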

Notes on HipHop-PHP: This project looks very promising but is still very young at the moment, and although it is maintained by Facebook I wouldn’t recommend running anything on it before it gets rid of its rough edges.
Comparative benchmarks are going to start appearing soon on forums and blogs: If you rely on them to make a technical decision, make sure they actually compare the same things. HipHop-PHP has its own web server for example, and this has an influence on every HTTP benchmark: don’t compare Apache and libevent when you’re trying to benchmark APC against HipHop.

Switching from MyISAM to InnoDB

30 Nov 2009
By Nicolas Favre-Felix

Context

Historically, we’ve used the MyISAM storage engine that MySQL offers by default. It is a pretty simple engine, which doesn’t support transactions or foreign keys. It seems to lack stability as well, as evidenced by how often tables would crash and lose or corrupt data.

The other major storage engine for MySQL is InnoDB. We have recently switched a few key tables from MyISAM to InnoDB with surprising performance gains, and this post will explain how.

It’s all about locks

MyISAM has an important characteristic: it locks the whole table when performing a write operation (UPDATE or INSERT). This seriously hampers scalability: as the number of clients grows, the table gets slower and spends more time waiting on locks than executing queries.

This table lock is clearly not appropriate in heavy-update applications such as our games, where players send many AJAX requests to update their virtual horses, fish, or babies. These tables often spent 95 to 99% of their time locking, and the rest executing. Adding cores doesn’t help here, and we had to start scaling horizontally by partitioning tables.

How we do “sharding”

In order to limit locking times, we use our own sharding technique: we split a table into many smaller ones and touch only one of them at a time, chosen from the already-known value of one or more fields.

For instance, a “comment” table for a blog-like website would be split into 20 tables: comment_0, comment_1, and so on up to comment_19. The article’s ID determines which table stores the comments, using the formula Article ID modulo 20.
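
In PHP, the table lookup for this example fits in one function. This is a minimal sketch of the comment example only, assuming non-negative integer article IDs; the names COMMENT_SHARDS and comment_table are made up here and are not taken from our code base.

```php
<?php
const COMMENT_SHARDS = 20;

// Picks the table holding the comments of a given article.
function comment_table(int $articleId): string
{
    return 'comment_' . ($articleId % COMMENT_SHARDS);
}

echo comment_table(1234), "\n"; // comment_14
```

Every query on comments has to go through such a function, which is exactly why retrofitting this scheme onto an existing design can mean rewriting a lot of code.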

This technique would guarantee that when a user posts a comment, only one table locks, while another user posting on another article would lock (hopefully!) another table. It works well, except when your design is such that you can’t really partition the table without rewriting a lot of code and completely changing the way the data is accessed, which could mean changing the game too.

In order to simplify our databases and to gain more flexibility, we considered switching to InnoDB. This was decided mainly for one reason: InnoDB doesn’t lock per table, but per modified row. But first, we had to evaluate the possible performance gains.

Configuring InnoDB

We configured InnoDB with an innodb_buffer_pool_size of 1.5 GB, which is enough to hold our InnoDB data in its entirety. Many parameters can affect InnoDB’s performance, and I won’t detail them here. This presentation goes into a lot of detail, and has been very helpful to us.

How not to benchmark

Benchmarking database engines the way SQLite used to is meaningless in a web environment. Their benchmarks used to compare the time needed to insert 1000 rows, for example. This is not at all how we use our tables.

To get a fair comparison of storage engines, we used a specific pattern of queries that corresponds approximately to what we see on our production servers: out of every 24 queries, 16 are SELECTs, 7 are UPDATEs, and 1 is an INSERT. We used sysbench as well as a custom tool to apply this pattern of queries in a benchmark and measured the results.
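
The query mix itself is easy to generate. The sketch below is not our benchmark tool: the function query_batch is hypothetical, and shuffling the batch is an assumption about how reads and writes are interleaved. It only shows how a batch with the 16/7/1 ratio can be produced before being handed to whatever code issues the actual queries.

```php
<?php
// Builds one shuffled batch of 24 operations: 16 SELECT, 7 UPDATE, 1 INSERT.
function query_batch(): array
{
    $batch = array_merge(
        array_fill(0, 16, 'SELECT'),
        array_fill(0, 7, 'UPDATE'),
        array_fill(0, 1, 'INSERT')
    );
    shuffle($batch); // interleave reads and writes in random order
    return $batch;
}

print_r(array_count_values(query_batch()));
```

Running many such batches from several concurrent clients is what makes lock contention visible; a single client replaying them one after the other would hide it.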

Where SQLite used to measure the time spent inserting 1000 rows in a single transaction, we ran our custom benchmark in a highly parallel environment and measured how the different engines reacted. Such a configuration paints a completely different picture, and favors InnoDB over MyISAM.

Parallel benchmarks

These tests were run on a table with more than 2 million rows. Numbers are in queries per second.

[Graph: SELECT using the primary key]

[Graph: UPDATE a field using the primary key]

[Graph: INSERT a row]

As you can see, testing with a single thread like in the old SQLite tests is a big mistake in this case. Doing so would result in MyISAM performing (seemingly) better than InnoDB, when it is certainly not the case in our production environment and with our usage.
The number of queries per second may seem much lower for some operations than for others, but these numbers come from the scenario described above: for every INSERT, 16 SELECTs and 7 UPDATEs are executed.

Performance gains

We converted a few key tables to InnoDB. We selected these tables among the ones with the longest locking time, using MySQL’s SHOW PROFILE command to sort them by locking time. With all this lock contention removed, we experienced a large drop in CPU usage, and much faster response times for our largest and most accessed pages.
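
The conversion itself is a single MySQL statement per table, ALTER TABLE ... ENGINE=InnoDB. The sketch below only builds those statements; the helper to_innodb_sql and the table names are hypothetical, and it does not reflect the exact procedure we followed. Keep in mind that this statement rebuilds the table, which can take a while on large tables.

```php
<?php
// Returns the MySQL statement that converts one table to InnoDB.
function to_innodb_sql(string $table): string
{
    // Backticks quote the identifier; embedded backticks are doubled.
    return 'ALTER TABLE `' . str_replace('`', '``', $table) . '` ENGINE=InnoDB';
}

foreach (['horse', 'comment_0'] as $table) { // hypothetical table names
    echo to_innodb_sql($table), ";\n";
}
```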

Here is a measure of instant page speed, on one of the games. Can you tell on which day we turned to InnoDB? (The spikes are during our maintenance process).

Here is a sliding 7-day window of average page speed, with better detail.

This graph comes from a different game which is why the days don’t match. But even with different tables and different access patterns, the gains are still impressive.

Welcome!

30 Nov 2009
By Nicolas Favre-Felix

Welcome to Owlient’s new “techblog”. This site will present some of the technologies used here at Owlient to build our 3 games.

Our games run on top of Apache, PHP5, MySQL and Memcache, so many of the articles will discuss these tools. The millions of players who visit our websites force us to use special techniques in order to scale to such a large number of page views.

Such techniques include: Database optimisations and Replication, Sharding, Caching, PHP acceleration, Load balancing, Virtualization, Monitoring, and more.