Optimize PHP memory usage: eliminate circular references

Tree-like structures are very common in programming, and you will often see some kind of parent-child relationship between classes. This is exactly where PHP has a problem, and you, as a developer, should care about it.

PHP has a built-in garbage collector, so you do not need to track references to objects, allocate memory for them, or delete them when they are no longer necessary. Things seem so perfect that developers may not even know that their scripts allocate a lot of memory until the server stops processing requests because of an out-of-memory error.

Consider a simple scenario in which we need to build a parent-child relationship many times:

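class Node {
    public $parentNode;
    public $childNodes = array();
    public $nodeValue;
    // __construct() replaces the PHP 4-style Node() constructor, which PHP 8 no longer calls
    function __construct() {
        $this->nodeValue = str_repeat('0123456789', 128);
    }
}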
function createRelationship() {
    $parent = new Node();
    $child = new Node();
    $parent->childNodes[] = $child;
    $child->parentNode = $parent;
}

Now let’s look at how much memory this script allocates after 10,000 calls to the createRelationship() function.

echo 'Initial: ' . number_format(memory_get_usage(), 0, '.', ',') . " bytes\n";
for($i = 0; $i < 10000; $i++) {
    createRelationship();
}
echo 'Peak: ' . number_format(memory_get_peak_usage(), 0, '.', ',') . " bytes\n";
echo 'End: ' . number_format(memory_get_usage(), 0, '.', ',') . " bytes\n";

And the output:

Initial: 327,336 bytes
Peak: 35,824,176 bytes
End: 35,823,328 bytes

So this simple script allocates about 34 MB. A server with 1-2 GB of RAM can run only 30-60 such processes at the same time. In a shared hosting environment, or when the system must support 100 simultaneous connections, that level of memory consumption is unacceptable.

Understanding what happens behind this code is necessary for improving it and for avoiding similar memory problems in the future. The key phrase is circular reference. Detecting circular references is a hard problem for a garbage collector because it has to analyze the objects in memory, which is time-consuming. The PHP engine developers instead chose to rely on reference counting, destroy whatever remains at script shutdown, and share nothing between scripts. So PHP's strategy favors speed over memory awareness. To be honest, this strategy works well, and a developer may never have to write special code for destroying objects. But that is no reason to ignore this aspect of PHP.

Let’s go back to the createRelationship() function. The garbage collector destroys an object once it is out of scope and nothing else refers to it. So you might expect the $child and $parent objects to be destroyed when PHP leaves the function, but the two objects refer to each other: PHP cannot delete $child because $parent refers to it, and vice versa.

The common solution is to write a dedicated cleanup method that removes the references, or to remove them by hand when they are no longer necessary. Putting this code into the “magic” __destruct() method does not work, because PHP calls __destruct() only when it detects that an object can be destroyed, and for $child and $parent that happens too late (on request shutdown).
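The timing of __destruct() can be observed directly. The following is a minimal sketch, not part of the original test: TracedNode and its static $destroyed counter are hypothetical names introduced only to count destructor calls.

```php
<?php
// Hypothetical class for illustration: counts how many destructors have run.
class TracedNode {
    public $parentNode;
    public $childNodes = array();
    public static $destroyed = 0;

    function __destruct() {
        self::$destroyed++;
    }
}

// No references between the two objects: both are freed on return.
function createUnrelated() {
    $a = new TracedNode();
    $b = new TracedNode();
}

// Parent and child refer to each other: neither reference count reaches zero.
function createCycle() {
    $parent = new TracedNode();
    $child = new TracedNode();
    $parent->childNodes[] = $child;
    $child->parentNode = $parent;
}

createUnrelated();
echo 'After unrelated objects: ' . TracedNode::$destroyed . "\n"; // 2

createCycle();
echo 'After circular pair: ' . TracedNode::$destroyed . "\n"; // still 2
```

The destructors of the circular pair have not run by the time the second line is printed, so any cleanup code placed inside __destruct() would not have run either.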

So, let’s add a new method to the Node class and call it on $parent in the createRelationship() function:

class Node {
    public $parentNode;
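    public $childNodes = array();
    public $nodeValue;
    // __construct() replaces the PHP 4-style Node() constructor, which PHP 8 no longer calls
    function __construct() {
        $this->nodeValue = str_repeat('0123456789', 128);
    }
    // Break the links in both directions so reference counts can drop to zero
    function destroy() {
        $this->parentNode = null;
        $this->childNodes = array();
    }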
}
function createRelationship() {
    $parent = new Node();
    $child = new Node();
    $parent->childNodes[] = $child;
    $child->parentNode = $parent;
    $parent->destroy();
}

Run again with the modified version of the createRelationship() function, the test displays the following results:

Initial: 328,416 bytes
Peak: 335,304 bytes
End: 328,520 bytes
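The destroy() method above clears only the links of the node it is called on, which is enough for a two-level relationship. For deeper trees, a child and its own children would still refer to each other. A possible recursive variant is sketched below; TreeNode, addChild() and the $alive counter are hypothetical names used only for this illustration.

```php
<?php
class TreeNode {
    public $parentNode;
    public $childNodes = array();
    public static $alive = 0; // hypothetical counter, for the demo only

    function __construct() {
        self::$alive++;
    }

    function __destruct() {
        self::$alive--;
    }

    function addChild(TreeNode $child) {
        $this->childNodes[] = $child;
        $child->parentNode = $this;
    }

    // Break every parent-child link at and below this node.
    function destroy() {
        foreach ($this->childNodes as $child) {
            $child->destroy();
        }
        $this->parentNode = null;
        $this->childNodes = array();
    }
}

function buildTree() {
    $root = new TreeNode();
    $child = new TreeNode();
    $grandchild = new TreeNode();
    $root->addChild($child);
    $child->addChild($grandchild);
    $root->destroy();
}

buildTree();
echo 'Live nodes: ' . TreeNode::$alive . "\n"; // 0
```

Once all links are cleared, the only remaining references are the local variables in buildTree(), so all three nodes are freed when the function returns.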

Another solution, available since PHP 5.3, is to use the new garbage collector, which analyzes circular references between objects and destroys unused objects more efficiently.
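How the collector is switched on is not shown here. A minimal sketch, assuming PHP 5.3 or later, where gc_enabled(), gc_enable() and gc_disable() are available and the zend.enable_gc ini directive sets the default:

```php
<?php
// Turn the cycle collector on at runtime if the configuration left it off.
if (!gc_enabled()) {
    gc_enable();
}

echo 'Cycle collector enabled: ' . (gc_enabled() ? 'yes' : 'no') . "\n";
```

Calling gc_disable() switches it off again for the rest of the script.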

With the new garbage collector turned on, the first test (without destroy()) shows a much better result:

Initial: 327,136 bytes
Peak: 18,059,504 bytes
End: 825,656 bytes

On one hand, the peak is about half of what earlier versions of PHP use. On the other hand, it is still far from efficient memory usage: roughly 17 MB of extra memory at peak, against less than 7 KB with destroy().

Relying on the garbage collector in a memory-critical application is not a good idea, simply because the garbage collector does not run all the time. It does not scan memory on return from every function because that would be very slow. Instead, as in other languages such as C# and Java, you can explicitly start the garbage collector at the moment you think is best for cleaning up memory by calling gc_collect_cycles(), for example before loading a big file into memory.

The result of the createRelationship() function with gc_collect_cycles() is nearly identical to the result of the createRelationship() function with the destroy() method:

function createRelationship() {
    $parent = new Node();
    $child = new Node();
    $parent->childNodes[] = $child;
    $child->parentNode = $parent;
    gc_collect_cycles();
}

Result:

Initial: 327,264 bytes
Peak: 335,112 bytes
End: 330,816 bytes
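Calling gc_collect_cycles() on every invocation is the most aggressive option. A middle ground is to collect only every N calls. The sketch below assumes PHP 5.3 or later; $collectEvery is a hypothetical tuning value, not something measured in this article.

```php
<?php
class Node {
    public $parentNode;
    public $childNodes = array();
    public $nodeValue;

    function __construct() {
        $this->nodeValue = str_repeat('0123456789', 128);
    }
}

function createRelationship() {
    $parent = new Node();
    $child = new Node();
    $parent->childNodes[] = $child;
    $child->parentNode = $parent;
}

// Hypothetical tuning knob: how many calls to allow between collections.
$collectEvery = 1000;

$before = memory_get_usage();
for ($i = 1; $i <= 10000; $i++) {
    createRelationship();
    if ($i % $collectEvery === 0) {
        gc_collect_cycles(); // free the circular pairs accumulated so far
    }
}

echo 'Peak: ' . number_format(memory_get_peak_usage(), 0, '.', ',') . " bytes\n";
echo 'End: ' . number_format(memory_get_usage(), 0, '.', ',') . " bytes\n";
```

A larger $collectEvery means fewer collections but a higher peak, because more uncollected pairs accumulate between runs.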

The new garbage collector is good, but it comes with a performance cost. The following table shows how turning the garbage collector on or off, and calling gc_collect_cycles() or destroy(), changes the run time and memory usage:

gc    memory cleanup        time (ms)   memory (max/end, MB, rounded)
on    gc_collect_cycles()   43          0/0
on    destroy()             44          0/0
on    -                     74          18/0
off   gc_collect_cycles()   43          0/0
off   destroy()             46          0/0
off   -                     49          35/35
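The measurement code behind the table is not shown. One way to take such timings is sketched below, as an assumption rather than the method used for the numbers above. measure() is a hypothetical helper, and closures require PHP 5.3 or later. Peak memory is tracked for the whole process, so each configuration should be run in a separate process.

```php
<?php
class Node {
    public $parentNode;
    public $childNodes = array();
    public $nodeValue;

    function __construct() {
        $this->nodeValue = str_repeat('0123456789', 128);
    }
}

function createRelationship() {
    $parent = new Node();
    $child = new Node();
    $parent->childNodes[] = $child;
    $child->parentNode = $parent;
}

// Hypothetical helper: runs $work once, prints elapsed time and peak memory,
// and returns the elapsed time in milliseconds.
function measure($label, $work) {
    $start = microtime(true);
    $work();
    $elapsedMs = (microtime(true) - $start) * 1000;
    printf(
        "%-12s %8.1f ms, peak %s bytes\n",
        $label,
        $elapsedMs,
        number_format(memory_get_peak_usage(), 0, '.', ',')
    );
    return $elapsedMs;
}

$elapsed = measure('no cleanup', function () {
    for ($i = 0; $i < 10000; $i++) {
        createRelationship();
    }
});
```

To compare variants, swap the closure body for one that calls destroy() or gc_collect_cycles(), and change the collector setting with gc_enable() or gc_disable() before the call.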

Summary

The garbage collector trades performance for memory; with your help, this trade becomes more profitable for your scripts.