Memory Performance Boosts with Generators and Nikic/Iter

culh2177

Original article: https://www.sitepoint.com/memory-performance-boosts-with-generators-and-nikiciter/

Arrays, and by extension iteration, are fundamental parts of any application. And like the complexity of our applications, how we use them should evolve as we gain access to new tools.

New tools, like generators, for instance. First came arrays. Then we gained the ability to define our own array-like things (called iterators). But since PHP 5.5, we can rapidly create iterator-like structures called generators.

These appear as functions, but we can use them as iterators. They give us a simple syntax for what are essentially interruptible, repeatable functions. They’re wonderful!
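
To make "interruptible" a little more concrete, here's a minimal sketch. The countdown function is made up for illustration and isn't part of the example repository. Each yield pauses the function until the caller asks for the next value:

```php
<?php

// Hypothetical example function; not part of the article's sample code
function countdown(int $from): Generator {
    for ($i = $from; $i > 0; $i--) {
        // Execution pauses here and resumes when the next value is requested
        yield $i;
    }
}

// A generator can be consumed with foreach, like any other iterator
$values = [];

foreach (countdown(3) as $number) {
    $values[] = $number;
}

// It can also be stepped through by hand
$generator = countdown(3);
$first = $generator->current();

$generator->next();
$second = $generator->current();

print join(", ", $values); // "3, 2, 1"
```

Calling countdown(3) doesn't run any of the function body; the loop only starts once something asks for the first value.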

And we’re going to look at a few areas in which we can use them. We’re also going to discover a few problems to be aware of when using them. Finally, we’ll study a brilliant library, created by the talented Nikita Popov.

You can find the example code at https://github.com/sitepoint-editors/generators-and-iter.

The Problems

Imagine you have lots of relational data, and you want to do some eager loading. Perhaps the data is comma-separated, and you need to load each data type, and knit them together.

You could start with something as simple as:

function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);

Then you’d probably try to connect related elements through iteration or higher-order functions:

function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // make other changes to $author

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // make other changes to $category

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        // column 2 of a post holds the category ID (column 1 is the author)
        if ($category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}, $posts);

Seems OK, right? Well, what happens when we have huge CSV files to parse? Let’s profile the memory usage a bit…

function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());

The example code includes generate.php, which you can use to make these CSV files…

If you have large CSV files, this code should show just how much memory it takes to link these arrays together. It’s at least the size of the file you have to read, because PHP has to hold it all in memory.

Generators to the Rescue!

One way you could improve this would be to use generators. If you’re unfamiliar with them, now is a good time to learn more.

Generators will allow you to load tiny amounts of the total data at once. There’s not much you need to do to use generators:

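function readCSVGenerator($file) {
    $handle = fopen($file, "r");

    // fgetcsv() returns false at the end of the file, so stop there
    // instead of yielding a value that isn't a row
    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }

    fclose($handle);
}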

If you loop over the CSV data, you’ll notice an immediate drop in the amount of memory you need at once:

foreach (readCSVGenerator("posts.csv") as $post) {
    // do something with $post
}

print "memory:" . formatBytes(memory_get_peak_usage());
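
You can see the same effect without any CSV files. The sketch below compares an eager array with a generator over the same numbers; countTo is a made-up function, and the exact byte counts will vary with your PHP version and platform:

```php
<?php

// Hypothetical generator standing in for any lazily produced data source
function countTo(int $max): Generator {
    for ($i = 1; $i <= $max; $i++) {
        yield $i;
    }
}

// Eager: the whole list exists in memory before we use any of it
$before = memory_get_usage();
$numbers = range(1, 100000);
$arraySum = array_sum($numbers);
$arrayBytes = memory_get_usage() - $before;
unset($numbers);

// Lazy: only one value exists at a time
$before = memory_get_usage();
$generatorSum = 0;
$generatorBytes = 0;

foreach (countTo(100000) as $number) {
    $generatorSum += $number;

    // track the largest amount of extra memory seen while iterating
    $generatorBytes = max($generatorBytes, memory_get_usage() - $before);
}

print "array: " . $arrayBytes . " bytes, generator: " . $generatorBytes . " bytes";
```

Both loops arrive at the same sum, but only the array version has to pay for all the values up front.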

If you were seeing megabytes of memory used before, you’ll see kilobytes now. That’s a huge improvement, but it doesn’t come without its share of problems.

For a start, array_filter and array_map don’t work with generators. You’ll have to find other tools to handle that kind of data. Here’s one you can try!

composer require nikic/iter

This library introduces a few functions that work with iterators and generators. So how could you still get all this related data, without keeping any of it in memory?

function getAuthors() {
    $authors = readCSVGenerator("authors.csv");

    foreach ($authors as $author) {
        yield formatAuthor($author);
    }
}

function formatAuthor($author) {
    $author["posts"] = getPostsForAuthor($author);

    // make other changes to $author

    return $author;
}

function getPostsForAuthor($author) {
    $posts = readCSVGenerator("posts.csv");

    foreach ($posts as $post) {
        if ($post[1] == $author[0]) {
            yield formatPost($post);
        }
    }
}

function formatPost($post) {
    foreach (getAuthors() as $author) {
        if ($post[1] == $author[0]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach (getCategories() as $category) {
        if ($post[2] == $category[0]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}

function getCategories() {
    $categories = readCSVGenerator("categories.csv");

    foreach ($categories as $category) {
        yield formatCategory($category);
    }
}

function formatCategory($category) {
    $category["posts"] = getPostsForCategory($category);

    // make other changes to $category

    return $category;
}

function getPostsForCategory($category) {
    $posts = readCSVGenerator("posts.csv");

    foreach ($posts as $post) {
        if ($post[2] == $category[0]) {
            yield formatPost($post);
        }
    }
}

// testing this out...

foreach (getAuthors() as $author) {
    foreach ($author["posts"] as $post) {
        var_dump($post["author"]);
        break 2;
    }
}

This could be less verbose:

function filterGenerator($generator, $column, $value) {
    return iter\filter(
        function($item) use ($column, $value) {
            return $item[$column] == $value;
        },
        $generator
    );
}

function getAuthors() {
    return iter\map(
        "formatAuthor",
        readCSVGenerator("authors.csv")
    );
}

function formatAuthor($author) {
    $author["posts"] = getPostsForAuthor($author);

    // make other changes to $author

    return $author;
}

function getPostsForAuthor($author) {
    return iter\map(
        "formatPost",
        filterGenerator(
            readCSVGenerator("posts.csv"), 1, $author[0]
        )
    );
}

function formatPost($post) {
    foreach (getAuthors() as $author) {
        if ($post[1] == $author[0]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach (getCategories() as $category) {
        if ($post[2] == $category[0]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}

function getCategories() {
    return iter\map(
        "formatCategory",
        readCSVGenerator("categories.csv")
    );
}

function formatCategory($category) {
    $category["posts"] = getPostsForCategory($category);

    // make other changes to $category

    return $category;
}

function getPostsForCategory($category) {
    return iter\map(
        "formatPost",
        filterGenerator(
            readCSVGenerator("posts.csv"), 2, $category[0]
        )
    );
}
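
If you're curious what lazy mapping and filtering look like underneath, here's a rough sketch in plain PHP. The names lazyMap and lazyFilter are made up for illustration, and this is not the library's actual implementation; the sample rows assume a column layout of post ID, author ID, and category ID:

```php
<?php

// Hypothetical stand-ins for the idea behind iter\map and iter\filter.
// This is not the library's code, only a sketch of the technique.
function lazyMap(callable $function, iterable $items): Generator {
    foreach ($items as $key => $item) {
        yield $key => $function($item);
    }
}

function lazyFilter(callable $predicate, iterable $items): Generator {
    foreach ($items as $key => $item) {
        if ($predicate($item)) {
            yield $key => $item;
        }
    }
}

// Sample rows in an assumed layout: [post ID, author ID, category ID]
$posts = [
    [1, 10, 100],
    [2, 11, 100],
    [3, 10, 101],
];

$labels = lazyMap(
    function($post) {
        return "post #" . $post[0];
    },
    lazyFilter(
        function($post) {
            return $post[1] == 10;
        },
        $posts
    )
);

// Nothing has run yet; the work happens as the loop pulls each value
$result = [];

foreach ($labels as $label) {
    $result[] = $label;
}

print join(", ", $result); // "post #1, post #3"
```

Because both helpers are generators, chaining them never builds an intermediate array.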

It’s a bit wasteful to re-read each data source, every time. Consider keeping smaller related data (like authors and categories) in memory…
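
One way that might look: load the small data source into an array once, keyed by ID, and keep streaming the large one. Everything named below (sampleAuthors, samplePosts, indexById, postsWithAuthors) is hypothetical, and the column layouts are assumed. The two sample generators stand in for readCSVGenerator calls so the sketch runs without files:

```php
<?php

// Stand-ins for readCSVGenerator("authors.csv") and
// readCSVGenerator("posts.csv"). Assumed layouts:
// authors are [ID, name], posts are [ID, author ID, category ID]
function sampleAuthors(): Generator {
    yield [1, "Ada"];
    yield [2, "Grace"];
}

function samplePosts(): Generator {
    yield [10, 1, 100];
    yield [11, 2, 100];
    yield [12, 1, 101];
}

// Hypothetical helper: read a small source once, keyed by its first column
function indexById(iterable $rows): array {
    $indexed = [];

    foreach ($rows as $row) {
        $indexed[$row[0]] = $row;
    }

    return $indexed;
}

// The large source is still streamed; each lookup is a plain array access
function postsWithAuthors(iterable $posts, array $authorsById): Generator {
    foreach ($posts as $post) {
        $post["author"] = $authorsById[$post[1]] ?? null;

        yield $post;
    }
}

$authorsById = indexById(sampleAuthors());
$lines = [];

foreach (postsWithAuthors(samplePosts(), $authorsById) as $post) {
    $lines[] = "post " . $post[0] . " by " . $post["author"][1];
}

print join("; ", $lines); // "post 10 by Ada; post 11 by Grace; post 12 by Ada"
```

The trade-off is deliberate: you spend a little memory on the small table so the big file is only read once.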

Other Fun Things

That’s just the tip of the iceberg when it comes to Nikic’s library! Ever wanted to flatten an array (or iterator/generator)?

$array = iter\toArray(
    iter\flatten(
        [1, 2, [3, 4, 5], 6, 7]
    )
);

print join(", ", $array); // "1, 2, 3, 4, 5, 6, 7"

You can return slices of iterable variables, using functions like slice and take:

$array = iter\toArray(
    iter\slice(
        [-3, -2, -1, 0, 1, 2, 3],
        2, 4
    )
);

print join(", ", $array); // "-1, 0, 1, 2"
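
For comparison, PHP's built-in LimitIterator can do something similar without the library. This is a different tool from iter\slice and iter\take, swapped in here only so the sketch is self-contained; naturalNumbers is a made-up generator:

```php
<?php

// Hypothetical endless generator; only the values asked for are produced
function naturalNumbers(): Generator {
    $number = 1;

    while (true) {
        yield $number++;
    }
}

// LimitIterator takes an iterator, an offset, and a limit
$slice = new LimitIterator(
    new ArrayIterator([-3, -2, -1, 0, 1, 2, 3]),
    2, 4
);

// Passing false drops the original keys
$sliced = iterator_to_array($slice, false);

// With an offset of 0 this acts like "take the first five"
$firstFive = iterator_to_array(
    new LimitIterator(naturalNumbers(), 0, 5),
    false
);

print join(", ", $sliced) . " | " . join(", ", $firstFive);
// "-1, 0, 1, 2 | 1, 2, 3, 4, 5"
```

Taking a slice of an endless generator only works because values are produced on demand.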

As you work more with generators, you may come to find that you can’t always reuse them. Consider the following example:

$mapper = iter\map(
    function($item) {
        return $item * 2;
    },
    [1, 2, 3]
);

print join(", ", iter\toArray($mapper));
print join(", ", iter\toArray($mapper));

If you try to run that code, you’ll see an exception saying: “Cannot traverse an already closed generator”. Each iterator function in this library has a rewindable counterpart:

$mapper = iter\rewindable\map(
    function($item) {
        return $item * 2;
    },
    [1, 2, 3]
);

You can use this mapping function many times. You can even make your own generators rewindable:

$rewindable = iter\makeRewindable(function($max = 13) {
    $older = 0;
    $newer = 1;

    do {
        $number = $newer + $older;

        $older = $newer;
        $newer = $number;

        yield $number;
    }
    while($number < $max);
});

print join(", ", iter\toArray($rewindable()));

What you get from this is a reusable generator!
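
To get a feel for why that works, here's one way to sketch the idea in plain PHP: instead of holding on to a generator, hold on to the function that creates it. The doubled function and the ReusableSequence class are hypothetical, and this isn't necessarily how iter\makeRewindable is implemented:

```php
<?php

// Hypothetical generator function used only for this sketch
function doubled(array $items): Generator {
    foreach ($items as $item) {
        yield $item * 2;
    }
}

// Hypothetical wrapper: keeps the recipe for a generator, not the generator
// itself, so every traversal starts from a fresh one
final class ReusableSequence implements IteratorAggregate {
    private $factory;

    public function __construct(callable $factory) {
        $this->factory = $factory;
    }

    public function getIterator(): Traversable {
        return ($this->factory)();
    }
}

// A plain generator is spent after one full pass
$generator = doubled([1, 2, 3]);
$firstPass = iterator_to_array($generator, false);
$error = null;

try {
    iterator_to_array($generator, false);
} catch (Exception $exception) {
    $error = $exception->getMessage();
}

// The wrapper can be traversed as often as needed
$reusable = new ReusableSequence(function() {
    return doubled([1, 2, 3]);
});

$passes = [
    iterator_to_array($reusable, false),
    iterator_to_array($reusable, false),
];

print join(", ", $passes[0]) . " / " . join(", ", $passes[1]);
// "2, 4, 6 / 2, 4, 6"
```

Each pass costs a full re-run of the generator function, so this suits cheap or repeatable sources better than expensive ones.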

Conclusion

For every looping thing you need to think about, generators may be an option. They can even be useful for other things, too. And where the language falls short, Nikic’s library steps in with higher-order functions aplenty.

Are you using generators yet? Would you like to see more examples on how to implement them in your own apps to gain some performance upgrades? Let us know!
