GS Collections by Example - Part 1

cunfu6353

Original article: https://www.infoq.com/articles/GS-Collections-by-Example-1/?topicPageSponsorship=c1246725-b0a7-43a6-9ef9-68102c8d48e1

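I am a Java developer, Technical Fellow and Managing Director at Goldman Sachs. I am the creator of the GS Collections framework, which Goldman Sachs open sourced in January 2012. I am also a former Smalltalk developer.

When I started working with Java, I missed two things:

1. Smalltalk's block closures (aka lambdas)
2. The wonderfully feature-rich Smalltalk Collections Framework

I wanted both of these features along with compatibility with the existing Java Collections interfaces. Around 2004, I realized no one was going to give me everything I was looking for in Java. I also knew at that point that I would probably be programming in Java for at least the next 10-15 years of my career. So I decided to start building the thing I wanted.

Fast forward 10 years. I now have almost everything I ever wanted in Java. I have support for lambdas in Java 8, and I can now use lambdas and method references with arguably the most feature-rich Java collections framework available: GS Collections.

Here is a comparison of the features available in GS Collections, Java 8, Guava, Trove and Scala. These may not be all the features you would look for in a collections framework, but they are the features that I and the other GS developers I work with have needed in Java over the past 10 years.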

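Feature	GSC 5.0	Java 8	Guava	Trove	Scala
Rich API	✓	✓	✓		✓
Interfaces	Readable, Mutable, Immutable, FixedSize, Lazy	Mutable, Stream	Mutable, Fluent	Mutable	Readable, Mutable, Immutable, Lazy
Optimized Set & Map	✓ (+Bag)			✓
Immutable Collections	✓		✓		✓
Primitive Collections	✓ (+Bag, +Immutable)			✓
Multimaps	✓ (+Bag, +SortedBag)		✓ (+Linked)		(Multimap trait)
Bags (Multisets)	✓		✓
BiMaps	✓		✓
Iteration Styles	Eager/Lazy, Serial/Parallel	Lazy, Serial/Parallel	Lazy, Serial	Eager, Serial	Eager/Lazy, Serial/Parallel (Lazy Only)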

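In a jClarity interview last year, I described the combination of features that I think makes GS Collections interesting. That interview presents them in their original form.

Why would you use GS Collections now that Java 8 is out and includes the Streams API? While the Streams API is a big improvement to the Java Collections Framework, it does not have all the features you might need. As the table above shows, GS Collections has multimaps, bags, immutable containers and primitive containers. GS Collections has optimized replacements for HashSet and HashMap, and its Bags and Multimaps are built on these optimized types. The GS Collections iteration patterns live on the collections interfaces themselves, so there is no need to "enter" the API with a call to stream() and "exit" it with a call to collect(). In many cases this results in much more succinct code. Finally, GS Collections is compatible back to Java 5. This is a particularly important feature for library developers, who tend to keep supporting their libraries on older versions of Java well after a new major release arrives.

I will show some examples of how you can leverage these features in a variety of ways. These examples are variations of exercises in the GS Collections Kata, a training class we use internally at Goldman Sachs to teach our developers how to use GS Collections. We have open sourced this training as a separate repository on GitHub.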

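Example 1: Filtering a Collection

One of the most common things you will want to do with GS Collections is filter a collection. GS Collections provides several different ways to achieve this.

In the GS Collections Kata, we often start with a list of customers. In one of the exercises, I want to filter the list of customers down to a list that includes only the customers who live in London. The following code shows how I accomplish this using an iteration pattern named "select".

import com.gs.collections.api.list.MutableList;
import com.gs.collections.impl.test.Verify;
import org.junit.Test;

@Test
public void getLondonCustomers() {
    MutableList<Customer> customers = this.company.getCustomers();
    // select eagerly builds a new list of the customers that satisfy the predicate
    MutableList<Customer> londonCustomers = customers.select(c -> c.livesIn("London"));
    Verify.assertSize("Should be 2 London customers", 2, londonCustomers);
}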

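The select method on MutableList returns a MutableList. This code executes eagerly, meaning that all of the computation to select the matching elements from the source list and add them to the target list has been performed by the time the call to select() completes. The name "select" comes from a Smalltalk heritage. Smalltalk has a basic set of collection protocols named select (aka filter), reject (aka filterNot), collect (aka map, transform), detect (aka findOne), detectIfNone, injectInto (aka foldLeft), anySatisfy and allSatisfy.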
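For readers who know the Java 8 Streams vocabulary better than the Smalltalk one, the sketch below restates each protocol name using only JDK calls. It illustrates the naming correspondence and is not GS Collections code; the class name ProtocolNames and the sample city list are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical demo class: JDK-only counterparts of the Smalltalk-style protocol names.
class ProtocolNames {
    static final List<String> CITIES = Arrays.asList("London", "Liverpool", "Tokyo", "London");

    public static void main(String[] args) {
        System.out.println(select());       // [London, Liverpool, London]
        System.out.println(reject());       // [Tokyo]
        System.out.println(collect());      // [6, 9, 5, 6]
        System.out.println(detect());       // Optional[Tokyo]
        System.out.println(detectIfNone()); // none
        System.out.println(injectInto());   // 26
        System.out.println(anySatisfy() + " " + allSatisfy()); // true true
    }

    // select (aka filter): keep the elements that satisfy the predicate
    static List<String> select() {
        return CITIES.stream().filter(c -> c.startsWith("L")).collect(Collectors.toList());
    }

    // reject (aka filterNot): keep the elements that do NOT satisfy the predicate
    static List<String> reject() {
        return CITIES.stream().filter(c -> !c.startsWith("L")).collect(Collectors.toList());
    }

    // collect (aka map, transform): convert each element to something else
    static List<Integer> collect() {
        return CITIES.stream().map(String::length).collect(Collectors.toList());
    }

    // detect (aka findOne): the first element that satisfies the predicate
    static Optional<String> detect() {
        return CITIES.stream().filter(c -> c.startsWith("T")).findFirst();
    }

    // detectIfNone: like detect, but with a fallback when nothing matches
    static String detectIfNone() {
        return CITIES.stream().filter(c -> c.startsWith("X")).findFirst().orElse("none");
    }

    // injectInto (aka foldLeft): fold all elements into one accumulated value
    static int injectInto() {
        return CITIES.stream().reduce(0, (sum, c) -> sum + c.length(), Integer::sum);
    }

    // anySatisfy / allSatisfy: does some / every element satisfy the predicate?
    static boolean anySatisfy() { return CITIES.stream().anyMatch("Tokyo"::equals); }
    static boolean allSatisfy() { return CITIES.stream().allMatch(c -> c.length() > 4); }
}
```

Note that every stream version has to open with stream() and, where a collection is wanted, close with collect(...); the GS Collections calls in this article put the same operations directly on the collection.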

If I wanted to accomplish the same thing using lazy evaluation, I would write it as follows:

MutableList<Customer> customers = this.company.getCustomers();
// asLazy() defers the filtering until the result is actually iterated
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Verify.assertIterableSize(2, londonCustomers);

Here I added a call to a method named asLazy(). All of the other code stayed pretty much the same. The return type of select has changed as a result of the call to asLazy(): instead of a MutableList<Customer>, I now get back a LazyIterable<Customer>. This is pretty much equivalent to the following code, which uses the new Streams API in Java 8:

List<Customer> customers = this.company.getCustomers();
Stream<Customer> stream = customers.stream().filter(c -> c.livesIn("London"));
List<Customer> londonCustomers = stream.collect(Collectors.toList());
Verify.assertSize(2, londonCustomers);

Here, the call to stream() followed by the call to filter() returns a Stream<Customer>. In order to test the size, I either have to convert the Stream to a List as I did above, or I can use the Java 8 Stream.count() method:

List<Customer> customers = this.company.getCustomers();
Stream<Customer> stream = customers.stream().filter(c -> c.livesIn("London"));
Assert.assertEquals(2, stream.count());
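To make the difference between eager and lazy evaluation concrete, here is a minimal sketch that counts how often the predicate runs. It uses only JDK streams as the lazy stand-in; the class name LazyVersusEager and the sample data are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: shows that a lazy pipeline does no work until it is consumed.
class LazyVersusEager {
    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [0, 3, 2]
    }

    // Returns {predicate calls before the terminal operation,
    //          predicate calls after it, number of matches}.
    static int[] run() {
        List<String> cities = Arrays.asList("London", "Liverpool", "London");
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline only records what should happen later.
        Stream<String> lazy = cities.stream().filter(c -> {
            calls.incrementAndGet();
            return c.equals("London");
        });
        int before = calls.get();

        // The terminal operation pulls every element through the predicate.
        List<String> result = lazy.collect(Collectors.toList());
        int after = calls.get();

        return new int[] {before, after, result.size()};
    }
}
```

An eager select, by contrast, has already visited every element by the time the call returns, which is the behaviour the article describes for MutableList.select().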

Both of the GS Collections interfaces, MutableList and LazyIterable, share a common ancestor named RichIterable. In fact, I could write all of this code using only RichIterable. Here is the example using only RichIterable<Customer>, first lazily:

RichIterable<Customer> customers = this.company.getCustomers();
RichIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Verify.assertIterableSize(2, londonCustomers);

And again eagerly:

RichIterable<Customer> customers = this.company.getCustomers();
RichIterable<Customer> londonCustomers = customers.select(c -> c.livesIn("London"));
Verify.assertIterableSize(2, londonCustomers);

As these examples show, RichIterable can be used in place of LazyIterable and MutableList, because it is a root interface for both.

It is possible that the list of customers is immutable. If I have an ImmutableList<Customer>, this is how the types change:

ImmutableList<Customer> customers = this.company.getCustomers().toImmutable();
ImmutableList<Customer> londonCustomers = customers.select(c -> c.livesIn("London"));
Verify.assertIterableSize(2, londonCustomers);

As with other RichIterables, we can iterate over an ImmutableList lazily:

ImmutableList<Customer> customers = this.company.getCustomers().toImmutable();
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

MutableList and ImmutableList share a common parent interface named ListIterable. It can be used in place of either type as a more general type. RichIterable is the parent type of ListIterable. So this code can also be written more generally as follows:

ListIterable<Customer> customers = this.company.getCustomers().toImmutable();
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

Or even more generally:

RichIterable<Customer> customers = this.company.getCustomers().toImmutable();
RichIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

The interface hierarchy of GS Collections follows a very basic pattern. For each type (List, Set, Bag, Map), there is a readable interface (ListIterable, SetIterable, Bag, MapIterable), a mutable interface (MutableList, MutableSet, MutableBag, MutableMap), and an immutable interface (ImmutableList, ImmutableSet, ImmutableBag, ImmutableMap).

Figure 1. Basic GSC container interface hierarchy
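The readable/mutable/immutable pattern can be sketched in a few lines of plain Java. Everything below is hypothetical: the names ReadableSeq, MutableSeq, ImmutableSeq, ArraySeq and FrozenSeq are invented for the illustration and are deliberately not the GS Collections types, whose real interfaces are far richer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of the three-interface pattern (not the GS Collections types).
class HierarchyDemo {
    // Code written against the readable parent accepts either child.
    static int count(ReadableSeq<?> seq) {
        return seq.size();
    }

    public static void main(String[] args) {
        ArraySeq<String> mutable = new ArraySeq<>();
        mutable.add("London");
        ImmutableSeq<String> frozen = mutable.toImmutable();
        mutable.add("Paris");                // does not affect the frozen copy
        System.out.println(count(mutable));  // 2
        System.out.println(count(frozen));   // 1
    }
}

// Readable parent: query methods only.
interface ReadableSeq<T> {
    int size();
    T get(int index);
}

// Mutable child: adds in-place modification.
interface MutableSeq<T> extends ReadableSeq<T> {
    void add(T item);
    ImmutableSeq<T> toImmutable();
}

// Immutable child: "modification" returns a new instance instead.
interface ImmutableSeq<T> extends ReadableSeq<T> {
    ImmutableSeq<T> newWith(T item);
}

class ArraySeq<T> implements MutableSeq<T> {
    private final List<T> items = new ArrayList<>();
    public int size() { return items.size(); }
    public T get(int index) { return items.get(index); }
    public void add(T item) { items.add(item); }
    public ImmutableSeq<T> toImmutable() { return new FrozenSeq<>(items); }
}

class FrozenSeq<T> implements ImmutableSeq<T> {
    private final List<T> items;
    FrozenSeq(List<T> source) {
        // Defensive copy, so later changes to the source cannot leak in.
        this.items = Collections.unmodifiableList(new ArrayList<>(source));
    }
    public int size() { return items.size(); }
    public T get(int index) { return items.get(index); }
    public ImmutableSeq<T> newWith(T item) {
        List<T> copy = new ArrayList<>(items);
        copy.add(item);
        return new FrozenSeq<>(copy);
    }
}
```

The point of the split is that the immutable interface exposes no mutating methods at all, so misuse is caught by the compiler rather than by an exception at run time.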

Here is an example of the same code using a Set instead of a List:

MutableSet<Customer> customers = this.company.getCustomers().toSet();
MutableSet<Customer> londonCustomers = customers.select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

Here is a similar solution written lazily with a Set:

MutableSet<Customer> customers = this.company.getCustomers().toSet();
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

Here is the solution using a Set, written against the most general interface possible:

RichIterable<Customer> customers = this.company.getCustomers().toSet();
RichIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

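Next, I will illustrate the mechanisms that can be used to convert from one container type to another. First, let's convert from a List to a Set while filtering lazily:

MutableList<Customer> customers = this.company.getCustomers();
LazyIterable<Customer> lazyIterable = customers.asLazy().select(c -> c.livesIn("London"));
MutableSet<Customer> londonCustomers = lazyIterable.toSet();
Assert.assertEquals(2, londonCustomers.size());

Because the API is very fluent, I can chain all of these methods together:

MutableSet<Customer> londonCustomers =
    this.company.getCustomers()
        .asLazy()
        .select(c -> c.livesIn("London"))
        .toSet();
Assert.assertEquals(2, londonCustomers.size());

I will leave it to the reader to decide whether this affects readability. I tend to break up fluent calls and introduce intermediate types if I think it will help future readers of the code understand things better. This comes at the cost of more code to read, but it can in turn reduce the cost of understanding, which may matter more to people who read the code less frequently.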

I can accomplish the conversion from List to Set in the select method itself. The select method has an overloaded form that takes a Predicate as the first parameter and a result collection as the second parameter:

MutableSet<Customer> londonCustomers =
    this.company.getCustomers()
        .select(c -> c.livesIn("London"), UnifiedSet.newSet());
Assert.assertEquals(2, londonCustomers.size());

Notice how I can use this same method to return any Collection type I want. In the following case, I get back a MutableBag<Customer>:

MutableBag<Customer> londonCustomers =
    this.company.getCustomers()
        .select(c -> c.livesIn("London"), HashBag.newBag());
Assert.assertEquals(2, londonCustomers.size());

In the following case, I get back a CopyOnWriteArrayList, which is part of the JDK. The point is that this method will return whatever type I specify, as long as it is a class that implements java.util.Collection:

CopyOnWriteArrayList<Customer> londonCustomers =
    this.company.getCustomers()
        .select(c -> c.livesIn("London"), new CopyOnWriteArrayList<>());
Assert.assertEquals(2, londonCustomers.size());
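How one method can hand back "whatever type I specify" comes down to a generic type parameter bounded by java.util.Collection. The sketch below shows the idea with a hypothetical static helper; the class name SelectIntoTarget and the method signature are assumptions for the illustration, not the GS Collections source.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Predicate;

// Hypothetical demo class: a select that fills a caller-supplied target collection.
class SelectIntoTarget {
    public static void main(String[] args) {
        List<String> cities = Arrays.asList("London", "Paris", "London");

        // The static type of the result follows the type of the target argument.
        HashSet<String> asSet =
                select(cities, c -> c.equals("London"), new HashSet<String>());
        CopyOnWriteArrayList<String> asList =
                select(cities, c -> c.equals("London"), new CopyOnWriteArrayList<String>());

        System.out.println(asSet.size());  // 1 (a set drops the duplicate)
        System.out.println(asList.size()); // 2
    }

    // R is inferred from the target, so the caller gets the same type back
    // without a cast. The bound restricts targets to java.util.Collection.
    static <T, R extends Collection<T>> R select(
            Iterable<T> source, Predicate<? super T> predicate, R target) {
        for (T each : source) {
            if (predicate.test(each)) {
                target.add(each);
            }
        }
        return target;
    }
}
```

With JDK streams, the closest counterpart is Collectors.toCollection, which takes a supplier for the target collection instead of the collection itself.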

We have been using a lambda throughout these examples. The select method takes a Predicate, which is a functional interface in GS Collections, defined as follows:

public interface Predicate<T> extends Serializable {
    boolean accept(T each);
}

The lambda I have been using is fairly simple. I will extract it into a separate variable so that it is clearer what this lambda represents in the code:

Predicate<Customer> predicate = c -> c.livesIn("London");
MutableList<Customer> londonCustomers = this.company.getCustomers().select(predicate);
Assert.assertEquals(2, londonCustomers.size());

The livesIn() method defined on Customer is quite simple. It is defined as follows:

public boolean livesIn(String city) {
    return city.equals(this.city);
}

It would be nice if I could use a method reference to livesIn here instead of a lambda. But this code will not compile:

Predicate<Customer> predicate = Customer::livesIn;

The compiler gives me the following error:

Error:(65, 37) java: incompatible types: invalid method reference
      incompatible types: com.gs.collections.kata.Customer cannot be converted to java.lang.String

This happens because this method reference requires two parameters, a Customer and the city String. There is an alternative form of Predicate named Predicate2 that will work here:

Predicate2<Customer, String> predicate = Customer::livesIn;

Notice that Predicate2 takes two generic types, Customer and String. There is a special form of select named selectWith that can use this Predicate2:

Predicate2<Customer, String> predicate = Customer::livesIn;
MutableList<Customer> londonCustomers = this.company.getCustomers().selectWith(predicate, "London");
Assert.assertEquals(2, londonCustomers.size());

By inlining the method reference, this can be simplified as follows:

MutableList<Customer> londonCustomers = this.company.getCustomers().selectWith(Customer::livesIn, "London");
Assert.assertEquals(2, londonCustomers.size());

The String "London" is passed as the second parameter to each call of the method defined on Predicate2. The first parameter is the Customer from the list.
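The same mechanics can be reproduced with the JDK's own two-argument predicate, java.util.function.BiPredicate. The sketch below is an assumption-laden stand-in: the selectWith helper, the SelectWithSketch class, the trimmed-down Customer class and the sample names are all invented for the illustration and are not the kata's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical demo class: a two-argument predicate plus a fixed extra parameter.
class SelectWithSketch {
    // Minimal stand-in for the kata's Customer (only what this example needs).
    static final class Customer {
        private final String name;
        private final String city;

        Customer(String name, String city) {
            this.name = name;
            this.city = city;
        }

        boolean livesIn(String city) {
            return city.equals(this.city);
        }
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("Fred", "London"),
                new Customer("Mary", "Liverpool"),
                new Customer("Bill", "London"));

        // An unbound instance method reference has the shape (Customer, String) -> boolean:
        // the receiver becomes the first argument, the city the second.
        BiPredicate<Customer, String> livesIn = Customer::livesIn;

        System.out.println(selectWith(customers, livesIn, "London").size());           // 2
        System.out.println(selectWith(customers, Customer::livesIn, "Tokyo").size());  // 0
    }

    // The extra parameter is handed to the predicate on every call.
    static <T, P> List<T> selectWith(
            Iterable<T> source, BiPredicate<? super T, ? super P> predicate, P parameter) {
        List<T> result = new ArrayList<>();
        for (T each : source) {
            if (predicate.test(each, parameter)) {
                result.add(each);
            }
        }
        return result;
    }
}
```

Because the method reference captures no variables, the same predicate can be reused with any city value passed as the parameter.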

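The selectWith method, like select, is defined on RichIterable. So everything I have shown previously with select also works with selectWith. This includes support on all of the different mutable and immutable interfaces, support for different covariant types, and support for lazy iteration. There is also a form of selectWith that takes a third parameter. As with the two-parameter form of select, the third parameter of selectWith can be a target collection.

For instance, the following code filters from a List to a Set using selectWith:

MutableSet<Customer> londonCustomers =
    this.company.getCustomers()
        .selectWith(Customer::livesIn, "London", UnifiedSet.newSet());
Assert.assertEquals(2, londonCustomers.size());

This can also be accomplished lazily with the following code:

MutableSet<Customer> londonCustomers =
    this.company.getCustomers()
        .asLazy()
        .selectWith(Customer::livesIn, "London")
        .toSet();
Assert.assertEquals(2, londonCustomers.size());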

The last thing I will show is that the select and selectWith methods can be used with any collection that extends java.lang.Iterable. This includes all of the JDK types as well as any third-party collection libraries. The first class that ever existed in GS Collections was a utility class named Iterate. Here is a code example that shows how to do a select on an Iterable using Iterate:

Iterable<Customer> customers = this.company.getCustomers();
Collection<Customer> londonCustomers = Iterate.select(customers, c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

The selectWith variation is also available:

Iterable<Customer> customers = this.company.getCustomers();
Collection<Customer> londonCustomers = Iterate.selectWith(customers, Customer::livesIn, "London");
Assert.assertEquals(2, londonCustomers.size());

There are also variations that take target collections. All of the basic iteration protocols are available on Iterate. There is also a utility class that covers lazy iteration, named LazyIterate, and it too works with any container that extends java.lang.Iterable. For example:

Iterable<Customer> customers = this.company.getCustomers();
LazyIterable<Customer> londonCustomers = LazyIterate.select(customers, c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());
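For comparison, the JDK can also filter an arbitrary Iterable lazily, by going through StreamSupport and the Iterable's spliterator. The sketch below shows that JDK-only route; the class name IterableFiltering and the helper lazySelect are hypothetical names chosen for the illustration.

```java
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical demo class: lazy filtering of any java.lang.Iterable with the JDK alone.
class IterableFiltering {
    public static void main(String[] args) {
        // An Iterable that is not a java.util.Collection: it only hands out iterators.
        Iterable<String> cities = () -> Arrays.asList("London", "Paris", "London").iterator();

        Stream<String> london = lazySelect(cities, c -> c.equals("London"));
        System.out.println(london.count()); // 2
    }

    // Wraps the Iterable in a sequential Stream (second argument false = not parallel)
    // and attaches the filter; nothing is evaluated until a terminal operation runs.
    static <T> Stream<T> lazySelect(Iterable<T> source, Predicate<? super T> predicate) {
        return StreamSupport.stream(source.spliterator(), false).filter(predicate);
    }
}
```

One practical difference from the LazyIterable in the article's example is that a Stream can be consumed only once, so a second terminal operation on the same stream throws an IllegalStateException.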

A better way to accomplish this, with a more object-oriented API, is to use the adapter classes. Here is an example of using ListAdapter with a java.util.List:

List<Customer> customers = this.company.getCustomers();
MutableList<Customer> londonCustomers =
    ListAdapter.adapt(customers).select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

As you might expect by now, this can also be written lazily:

List<Customer> customers = this.company.getCustomers();
LazyIterable<Customer> londonCustomers =
    ListAdapter.adapt(customers)
        .asLazy()
        .select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

The selectWith() method also works lazily on a ListAdapter:

List<Customer> customers = this.company.getCustomers();
LazyIterable<Customer> londonCustomers =
    ListAdapter.adapt(customers)
        .asLazy()
        .selectWith(Customer::livesIn, "London");
Assert.assertEquals(2, londonCustomers.size());

SetAdapter can be used similarly for any implementation of java.util.Set.
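The adapter idea itself is small enough to sketch: wrap an existing java.util.List and expose the fluent methods on the wrapper. The class FluentList below is a hypothetical illustration of that pattern only; it says nothing about how ListAdapter is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo class: an adapter that adds a fluent select() to a JDK List.
class AdapterSketch {
    static final class FluentList<T> {
        private final List<T> delegate;

        private FluentList(List<T> delegate) {
            this.delegate = delegate;
        }

        // Wraps the given list without copying it.
        static <T> FluentList<T> adapt(List<T> list) {
            return new FluentList<>(list);
        }

        // Eager select: the result is itself a FluentList, so calls can be chained.
        FluentList<T> select(Predicate<? super T> predicate) {
            List<T> result = new ArrayList<>();
            for (T each : delegate) {
                if (predicate.test(each)) {
                    result.add(each);
                }
            }
            return new FluentList<>(result);
        }

        int size() {
            return delegate.size();
        }
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("London", "Liverpool", "Paris", "London");
        FluentList<String> result = FluentList.adapt(cities)
                .select(c -> c.startsWith("L"))
                .select(c -> c.length() == 6);
        System.out.println(result.size()); // 2
    }
}
```

Compared with a static utility call, the adapter keeps the receiver on the left, so successive operations read in the order they are applied.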

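Now, if you have a problem that can benefit from data-level parallelism, you can use one of two approaches to parallelize it. First, I will show how to use the ParallelIterate class to solve this problem with an eager/parallel algorithm:

Iterable<Customer> customers = this.company.getCustomers();
Collection<Customer> londonCustomers = ParallelIterate.select(customers, c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size());

The ParallelIterate class takes any Iterable as a parameter and always returns a java.util.Collection as its result. ParallelIterate has been in GS Collections since 2005. Eager/parallel was the only form of parallelism supported in GS Collections until the 5.0 release, when we added a lazy/parallel API to RichIterable. We do not have an eager/parallel API on RichIterable, because we felt lazy/parallel made more sense as the default case. Depending on the feedback we receive about the usefulness of the lazy/parallel API, we may add an eager/parallel API directly to RichIterable in the future.

If I wanted to solve the same problem using the lazy/parallel API, I would write the code as follows:

FastList<Customer> customers = this.company.getCustomers();
ParallelIterable<Customer> londonCustomers =
    customers.asParallel(Executors.newFixedThreadPool(2), 100)
        .select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.toList().size());

Today, the asParallel() method exists only on a few concrete containers in GS Collections. The API has not yet been promoted to any interfaces such as MutableList, ListIterable or RichIterable. The asParallel() method takes two parameters: an ExecutorService and a batch size. In the future, we may add a version of asParallel() that calculates the batch size automatically.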
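To give a feel for what an ExecutorService and a batch size can be used for, here is a minimal JDK-only sketch that splits a list into fixed-size batches and filters each batch as a separate task. It is an eager illustration of the two parameters under stated assumptions, not a description of how asParallel() works internally; the class BatchedParallelSelect and its select helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical demo class: batch-wise parallel filtering on a caller-supplied executor.
class BatchedParallelSelect {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Integer> numbers = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                numbers.add(i);
            }
            List<Integer> evens = select(numbers, n -> n % 2 == 0, executor, 100);
            System.out.println(evens.size()); // 500
        } finally {
            executor.shutdown();
        }
    }

    static <T> List<T> select(
            List<T> source, Predicate<? super T> predicate,
            ExecutorService executor, int batchSize) {
        // Submit one filtering task per batch of at most batchSize elements.
        List<Future<List<T>>> futures = new ArrayList<>();
        for (int start = 0; start < source.size(); start += batchSize) {
            List<T> batch = source.subList(start, Math.min(start + batchSize, source.size()));
            futures.add(executor.submit(() -> {
                List<T> matches = new ArrayList<>();
                for (T each : batch) {
                    if (predicate.test(each)) {
                        matches.add(each);
                    }
                }
                return matches;
            }));
        }

        // Collect the partial results in submission order, which preserves list order.
        List<T> result = new ArrayList<>();
        for (Future<List<T>> future : futures) {
            try {
                result.addAll(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return result;
    }
}
```

A small batch size creates many short tasks and more scheduling overhead, while a large one can leave threads idle, which is presumably why an automatically calculated batch size is attractive.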

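I can choose to use a more specific type in this example:

FastList<Customer> customers = this.company.getCustomers();
ParallelListIterable<Customer> londonCustomers =
    customers.asParallel(Executors.newFixedThreadPool(2), 100)
        .select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.toList().size());

There is a ParallelIterable hierarchy that includes ParallelListIterable, ParallelSetIterable and ParallelBagIterable.

I have demonstrated several different approaches to filtering a collection in GS Collections using select() and selectWith(). I have shown many combinations of eager, lazy, serial and parallel iteration, using different types from the GS Collections RichIterable hierarchy.

In part 2 of this article, to be published next month, I will cover examples that include collect, groupBy and flatCollect, as well as some primitive containers and the rich API available on them. The examples in part 2 will not go into as much detail or explore as many options, but it is worth noting that those details and options are very likely available.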

Translated from: https://www.infoq.com/articles/GS-Collections-by-Example-1/?topicPageSponsorship=c1246725-b0a7-43a6-9ef9-68102c8d48e1
