Memory Performance Boosts with Generators and Nikic/Iter

Arrays, and by extension iteration, are fundamental parts of any application. And like the complexity of our applications, how we use them should evolve as we gain access to new tools.

New tools, like generators, for instance. First came arrays. Then we gained the ability to define our own array-like things (called iterators). But since PHP 5.5, we can rapidly create iterator-like structures called generators.

Generators look like functions, but we can use them as iterators. They give us a simple syntax for what are essentially interruptible, repeatable functions. They’re wonderful!
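To make "interruptible" concrete, here's a minimal sketch. The countTo function is a made-up name for illustration; the point is that calling it runs none of its body, and each yield pauses it until the loop asks for the next value:

```php
<?php

// Hypothetical example function: yields the numbers 1..$limit, one at a time.
function countTo($limit) {
    for ($i = 1; $i <= $limit; $i++) {
        // Execution pauses here and resumes when the next value is requested.
        yield $i;
    }
}

// Calling the function returns a Generator object instead of running the loop.
$generator = countTo(3);

var_dump($generator instanceof Generator); // bool(true)
var_dump($generator instanceof Iterator);  // bool(true)

$seen = [];

foreach ($generator as $number) {
    $seen[] = $number;
}

print join(", ", $seen); // "1, 2, 3"
```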

And we’re going to look at a few areas in which we can use them. We’re also going to discover a few problems to be aware of when using them. Finally, we’ll study a brilliant library, created by the talented Nikita Popov.

You can find the example code at https://github.com/sitepoint-editors/generators-and-iter.

The Problems

Imagine you have lots of relational data, and you want to do some eager loading. Perhaps the data is comma-separated, and you need to load each data type, and knit them together.

You could start with something as simple as:

function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);

Then you’d probably try to connect related elements through iteration or higher-order functions:

function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // make other changes to $author

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // make other changes to $category

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}, $posts);

Seems ok, right? Well, what happens when we have huge CSV files to parse? Let’s profile the memory usage a bit…

function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());

The example code includes generate.php, which you can use to make these CSV files…

If you have large CSV files, this code should show just how much memory it takes to link these arrays together. It’s at least the size of the file you have to read, because PHP has to hold it all in memory.

Generators to the Rescue!

One way you could improve this would be to use generators. If you’re unfamiliar with them, now is a good time to learn more.

Generators will allow you to load tiny amounts of the total data at once. There’s not much you need to do to use generators:

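function readCSVGenerator($file) {
    $handle = fopen($file, "r");

    // fgetcsv() returns false at the end of the file, so stop there
    // instead of yielding false as if it were a row
    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }

    fclose($handle);
}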

If you loop over the CSV data, you’ll notice an immediate drop in the amount of memory you need at once:

foreach (readCSVGenerator("posts.csv") as $post) {
    // do something with $post
}

print "memory:" . formatBytes(memory_get_peak_usage());
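If you don't have the example CSV files to hand, here's a self-contained sketch of the same comparison. It writes a throwaway CSV to the system temp directory, then reads it eagerly and lazily. The function names (readAllRows, readRowsLazily) and the row count are assumptions for illustration, and it uses memory_get_usage() around each read rather than the peak figure, so the two numbers can be compared in one run:

```php
<?php

// Build a throwaway CSV so the sketch runs without the article's data files.
$file = tempnam(sys_get_temp_dir(), "csv");
$handle = fopen($file, "w");

for ($i = 1; $i <= 20000; $i++) {
    fputcsv($handle, [$i, "title " . $i, "body " . $i]);
}

fclose($handle);

// Eager: every row is collected before anything is returned.
function readAllRows($file) {
    $rows = [];
    $handle = fopen($file, "r");

    while (($row = fgetcsv($handle)) !== false) {
        $rows[] = $row;
    }

    fclose($handle);

    return $rows;
}

// Lazy: only the current row exists at any moment.
function readRowsLazily($file) {
    $handle = fopen($file, "r");

    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }

    fclose($handle);
}

$before = memory_get_usage();
$rows = readAllRows($file);
$eagerBytes = memory_get_usage() - $before;
$eagerCount = count($rows);
unset($rows);

$before = memory_get_usage();
$lazyCount = 0;

foreach (readRowsLazily($file) as $row) {
    $lazyCount++;
}

$lazyBytes = memory_get_usage() - $before;

unlink($file);

print "eager: " . $eagerCount . " rows, " . $eagerBytes . " bytes\n";
print "lazy: " . $lazyCount . " rows, " . $lazyBytes . " bytes\n";
```

The exact byte counts depend on your PHP version and platform, so treat them as relative rather than absolute.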

If you were seeing megabytes of memory used before, you’ll see kilobytes now. That’s a huge improvement, but it doesn’t come without its share of problems.

For a start, array_filter and array_map don’t work with generators. You’ll have to find other tools to handle that kind of data. Here’s one you can try!

composer require nikic/iter

This library introduces a few functions that work with iterators and generators. So how could you still get all this related data, without keeping any of it in memory?

function getAuthors() {
    $authors = readCSVGenerator("authors.csv");

    foreach ($authors as $author) {
        yield formatAuthor($author);
    }
}

function formatAuthor($author) {
    $author["posts"] = getPostsForAuthor($author);

    // make other changes to $author

    return $author;
}

function getPostsForAuthor($author) {
    $posts = readCSVGenerator("posts.csv");

    foreach ($posts as $post) {
        if ($post[1] == $author[0]) {
            yield formatPost($post);
        }
    }
}

function formatPost($post) {
    foreach (getAuthors() as $author) {
        if ($post[1] == $author[0]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach (getCategories() as $category) {
        if ($post[2] == $category[0]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}

function getCategories() {
    $categories = readCSVGenerator("categories.csv");

    foreach ($categories as $category) {
        yield formatCategory($category);
    }
}

function formatCategory($category) {
    $category["posts"] = getPostsForCategory($category);

    // make other changes to $category

    return $category;
}

function getPostsForCategory($category) {
    $posts = readCSVGenerator("posts.csv");

    foreach ($posts as $post) {
        if ($post[2] == $category[0]) {
            yield formatPost($post);
        }
    }
}

// testing this out...

foreach (getAuthors() as $author) {
    foreach ($author["posts"] as $post) {
        var_dump($post["author"]);
        break 2;
    }
}

This could be less verbose:

function filterGenerator($generator, $column, $value) {
    return iter\filter(
        function($item) use ($column, $value) {
            return $item[$column] == $value;
        },
        $generator
    );
}

function getAuthors() {
    return iter\map(
        "formatAuthor",
        readCSVGenerator("authors.csv")
    );
}

function formatAuthor($author) {
    $author["posts"] = getPostsForAuthor($author);

    // make other changes to $author

    return $author;
}

function getPostsForAuthor($author) {
    return iter\map(
        "formatPost",
        filterGenerator(
            readCSVGenerator("posts.csv"), 1, $author[0]
        )
    );
}

function formatPost($post) {
    foreach (getAuthors() as $author) {
        if ($post[1] == $author[0]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach (getCategories() as $category) {
        if ($post[2] == $category[0]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}

function getCategories() {
    return iter\map(
        "formatCategory",
        readCSVGenerator("categories.csv")
    );
}

function formatCategory($category) {
    $category["posts"] = getPostsForCategory($category);

    // make other changes to $category

    return $category;
}

function getPostsForCategory($category) {
    return iter\map(
        "formatPost",
        filterGenerator(
            readCSVGenerator("posts.csv"), 2, $category[0]
        )
    );
}

It’s a bit wasteful to re-read each data source, every time. Consider keeping smaller related data (like authors and categories) in memory…
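Here's one way that could look, as a rough sketch. The sample generators stand in for readCSVGenerator("authors.csv") and readCSVGenerator("posts.csv") so the code runs without files, and indexByColumn and postsWithAuthors are hypothetical helper names. The small source is read once into an array keyed by ID, while the large one stays lazy:

```php
<?php

// Stand-ins for the CSV generators, so the sketch runs without data files.
function sampleAuthors() {
    yield [1, "Ada"];
    yield [2, "Grace"];
}

function samplePosts() {
    yield [10, 1, "First post"];
    yield [11, 2, "Second post"];
    yield [12, 1, "Third post"];
}

// Hypothetical helper: read a small data source once, keyed by one column.
function indexByColumn($rows, $column) {
    $index = [];

    foreach ($rows as $row) {
        $index[$row[$column]] = $row;
    }

    return $index;
}

// Posts are still streamed; only the author lookup table lives in memory.
function postsWithAuthors($posts, $authorsById) {
    foreach ($posts as $post) {
        // One array lookup replaces a full pass over the authors source.
        $post["author"] = $authorsById[$post[1]] ?? null;

        yield $post;
    }
}

$authorsById = indexByColumn(sampleAuthors(), 0);

$names = [];

foreach (postsWithAuthors(samplePosts(), $authorsById) as $post) {
    $names[] = $post["author"][1];
}

print join(", ", $names); // "Ada, Grace, Ada"
```

The trade-off is deliberate: you spend a little memory on the small tables to avoid re-reading them for every post.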

Other Fun Things

That’s just the tip of the iceberg when it comes to Nikic’s library! Ever wanted to flatten an array (or iterator/generator)?

$array = iter\toArray(
    iter\flatten(
        [1, 2, [3, 4, 5], 6, 7]
    )
);

print join(", ", $array); // "1, 2, 3, 4, 5, 6, 7"

You can return slices of iterable variables, using functions like slice and take:

$array = iter\toArray(
    iter\slice(
        [-3, -2, -1, 0, 1, 2, 3],
        2, 4
    )
);

print join(", ", $array); // "-1, 0, 1, 2"

As you work more with generators, you may come to find that you can’t always reuse them. Consider the following example:

$mapper = iter\map(
    function($item) {
        return $item * 2;
    },
    [1, 2, 3]
);

print join(", ", iter\toArray($mapper));
print join(", ", iter\toArray($mapper));

If you try to run that code, you’ll see an exception saying: “Cannot traverse an already closed generator”. Each iterator function in this library has a rewindable counterpart:

$mapper = iter\rewindable\map(
    function($item) {
        return $item * 2;
    },
    [1, 2, 3]
);

You can use this mapping function many times. You can even make your own generators rewindable:

$rewindable = iter\makeRewindable(function($max = 13) {
    $older = 0;
    $newer = 1;

    do {
        $number = $newer + $older;

        $older = $newer;
        $newer = $number;

        yield $number;
    }
    while($number < $max);
});

print join(", ", iter\toArray($rewindable()));

What you get from this is a reusable generator!
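If you're curious how reuse like this can work, here's a plain-PHP sketch of the idea. It's an illustration, not the library's code: RestartableGenerator is a hypothetical class that keeps a function for building the generator and calls it again for each traversal. The first half reproduces the exception from a finished generator:

```php
<?php

function doubled($numbers) {
    foreach ($numbers as $number) {
        yield $number * 2;
    }
}

// Hypothetical wrapper (not the library's code): it stores a function that
// builds the generator, and calls it again for every new traversal.
class RestartableGenerator implements IteratorAggregate {
    private $factory;

    public function __construct(callable $factory) {
        $this->factory = $factory;
    }

    public function getIterator(): Traversable {
        return ($this->factory)();
    }
}

$generator = doubled([1, 2, 3]);
$first = iterator_to_array($generator, false);

$message = null;

try {
    // The generator already ran to completion, so a second traversal fails.
    foreach ($generator as $value) {
        // never reached
    }
} catch (Exception $exception) {
    $message = $exception->getMessage();
}

print $message . "\n"; // "Cannot traverse an already closed generator"

$restartable = new RestartableGenerator(function() {
    return doubled([1, 2, 3]);
});

// Each traversal gets a fresh generator from the stored function.
$second = iterator_to_array($restartable, false);
$third = iterator_to_array($restartable, false);

print join(", ", $second) . "\n"; // "2, 4, 6"
print join(", ", $third) . "\n";  // "2, 4, 6"
```

Note that "rewinding" here means starting over: the work inside the generator runs again on every pass.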

Conclusion

For every looping thing you need to think about, generators may be an option. They can even be useful for other things, too. And where the language falls short, Nikic’s library steps in with higher-order functions aplenty.

Are you using generators yet? Would you like to see more examples on how to implement them in your own apps to gain some performance upgrades? Let us know!

Frequently Asked Questions (FAQs) on Memory Performance Boosts with Generators and Nikic/Iter

What are the benefits of using generators in PHP?

Generators in PHP are a simple and powerful tool for creating iterators. They are used to simplify the process of writing code that uses both pull and push model iteration. Generators are particularly useful when dealing with large datasets that would be impractical or impossible to load into memory all at once. They allow you to write code that uses a foreach loop to iterate over a set of data without needing to build an array in memory, which can lead to significant performance improvements.
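The "push" side is the less familiar half, so here's a small sketch of it. The runningTotal function is a made-up example: the caller pulls the current total out with current(), and pushes new amounts in with send(), which becomes the value of the yield expression inside the generator:

```php
<?php

// Hypothetical example: keeps a running total of the values sent to it.
function runningTotal() {
    $total = 0;

    while (true) {
        // Pull: the caller receives $total. Push: send() supplies $amount.
        $amount = yield $total;
        $total += $amount;
    }
}

$totals = runningTotal();

$seen = [];
$seen[] = $totals->current(); // runs up to the first yield
$seen[] = $totals->send(5);   // resumes, adds 5, returns the next yielded total
$seen[] = $totals->send(10);

print join(", ", $seen); // "0, 5, 15"
```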

How does Nikic/Iter improve memory performance?

Nikic/Iter is a library that provides a set of primitives for working with PHP iterators. It offers a number of functions that can be used to manipulate and compose iterators, which can lead to more efficient and readable code. By using Nikic/Iter, you can avoid the overhead of creating unnecessary intermediate arrays, which can lead to significant memory savings.
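To see why no intermediate arrays are needed, here's a plain-PHP sketch of the underlying idea. The lazyMap and lazyFilter helpers are hypothetical stand-ins written for this example, not the library's implementation. Each one is a generator wrapped around another iterable, so values flow through the whole chain one at a time:

```php
<?php

// Hypothetical helpers, written in plain PHP to show the idea; they are not
// the library's implementation.
function lazyMap(callable $function, $iterable) {
    foreach ($iterable as $key => $value) {
        yield $key => $function($value);
    }
}

function lazyFilter(callable $predicate, $iterable) {
    foreach ($iterable as $key => $value) {
        if ($predicate($value)) {
            yield $key => $value;
        }
    }
}

function numbers($limit) {
    for ($i = 1; $i <= $limit; $i++) {
        yield $i;
    }
}

// Nothing runs until the foreach below pulls values through the chain.
$pipeline = lazyMap(
    function($number) {
        return $number * 10;
    },
    lazyFilter(
        function($number) {
            return $number % 2 == 0;
        },
        numbers(6)
    )
);

$result = [];

foreach ($pipeline as $value) {
    $result[] = $value;
}

print join(", ", $result); // "20, 40, 60"
```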

How can I read a CSV file in Symfony?

Reading a CSV file in Symfony can be done using the built-in PHP functions for working with CSV files, such as fgetcsv(). For larger files, it may be more efficient to wrap those reads in a generator, so that rows are handled one at a time, and to use a library like Nikic/Iter to map and filter them without loading the whole file into memory at once.

What does the error “Cannot traverse an already closed generator” mean?

This error occurs when you try to iterate over a generator that has already been exhausted. Once a generator has been fully iterated over, it cannot be rewound or reused. To avoid this error, create a new generator for each pass, or use one of the library’s rewindable functions.

How can I work with CSV files in PHP?

PHP provides a number of built-in functions for working with CSV files, including fgetcsv() for reading CSV files and fputcsv() for writing to CSV files. For larger files, it may be more efficient to read rows through a generator and process them with a library like Nikic/Iter, so that the whole dataset is never held in memory at once.

How can I improve the performance of my PHP code?

There are many ways to improve the performance of your PHP code, including using generators to handle large datasets, using libraries like Nikic/Iter to work with iterators more efficiently, and optimizing your code to reduce unnecessary computations. It’s also important to profile your code to identify any bottlenecks and to use caching where appropriate.

What are the alternatives to Nikic/Iter?

There are many libraries available for working with iterators in PHP, including Doctrine’s collections library and the SPL (Standard PHP Library) iterators. However, Nikic/Iter is particularly well-suited for working with large datasets due to its efficient memory usage.

How can I use generators in my PHP code?

Generators in PHP are created using the yield keyword. You can use a generator wherever you would use an iterator, such as in a foreach loop. The main advantage of generators is that they allow you to write code that can handle large datasets without needing to load the entire dataset into memory.

What are the best practices for working with large datasets in PHP?

When working with large datasets in PHP, it’s important to avoid loading the entire dataset into memory at once. Instead, you should use techniques like pagination, streaming, and generators to process the data in chunks. Libraries like Nikic/Iter can also be very helpful for working with large datasets efficiently.
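As a sketch of what "in chunks" can mean with generators, here's a small helper that groups any iterable into fixed-size batches (handy for things like bulk inserts). The inChunks and ids names are assumptions for this example:

```php
<?php

// Hypothetical helper: group any iterable into arrays of at most $size items.
function inChunks($iterable, $size) {
    $chunk = [];

    foreach ($iterable as $item) {
        $chunk[] = $item;

        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }

    // Emit the final, possibly shorter, chunk.
    if ($chunk !== []) {
        yield $chunk;
    }
}

// Stand-in data source for the example.
function ids($limit) {
    for ($i = 1; $i <= $limit; $i++) {
        yield $i;
    }
}

$sizes = [];

foreach (inChunks(ids(7), 3) as $chunk) {
    // Only one chunk of up to 3 items is held in memory at a time.
    $sizes[] = count($chunk);
}

print join(", ", $sizes); // "3, 3, 1"
```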

Christopher Pitt

Christopher is a writer and coder, working at Over. He usually works on application architecture, though sometimes you'll find him building compilers or robots.
