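This code removes duplicates from the original list, but I want to extract the duplicates from the original list instead, not remove them (the package name is just part of another project):
Given:
A Person POJO: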
package at.mavila.learn.kafka.kafkaexercises;

import org.apache.commons.lang3.builder.ToStringBuilder;

public class Person {
    private final Long id;
    private final String firstName;
    private final String secondName;

    private Person(final Builder builder) {
        this.id = builder.id;
        this.firstName = builder.firstName;
        this.secondName = builder.secondName;
    }

    public Long getId() {
        return id;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getSecondName() {
        return secondName;
    }

    public static class Builder {
        private Long id;
        private String firstName;
        private String secondName;

        public Builder id(final Long builder) {
            this.id = builder;
            return this;
        }

        public Builder firstName(final String first) {
            this.firstName = first;
            return this;
        }

        public Builder secondName(final String second) {
            this.secondName = second;
            return this;
        }

        public Person build() {
            return new Person(this);
        }
    }

    @Override
    public String toString() {
        return new ToStringBuilder(this)
                .append("id", id)
                .append("firstName", firstName)
                .append("secondName", secondName)
                .toString();
    }
}
Duplicate extraction code.
Note that we filter on id and first name to retrieve the new list. I saw this code elsewhere; it is not mine:
package at.mavila.learn.kafka.kafkaexercises;

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static java.util.Objects.isNull;

public final class DuplicatePersonFilter {

    private DuplicatePersonFilter() {
        // No instances of this class
    }

    public static List<Person> getDuplicates(final List<Person> personList) {
        return personList
                .stream()
                .filter(duplicateByKey(Person::getId))
                .filter(duplicateByKey(Person::getFirstName))
                .collect(Collectors.toList());
    }

    // Returns a stateful predicate that is true only the first time a given key is seen.
    private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));
    }
}
Test code.
If you run this test case, you get [alex, lolita, elpidio, romualdo].
I expected to get [romualdo, otroRomualdo] as the extracted duplicates, given the id and firstName:
package at.mavila.learn.kafka.kafkaexercises;

import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

import static org.junit.Assert.*;

public class DuplicatePersonFilterTest {

    private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);

    @Test
    public void testList() {
        Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
        Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
        Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
        Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
        // Same id and firstName as romualdo, so these two are the expected duplicates.
        Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();

        List<Person> personList = new ArrayList<>();
        personList.add(alex);
        personList.add(lolita);
        personList.add(elpidio);
        personList.add(romualdo);
        personList.add(otroRomualdo);

        final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);
        LOGGER.info("Duplicates: {}", duplicates);
    }
}
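To illustrate why the filter returns the first occurrences rather than the duplicates, here is a minimal sketch on plain strings. The class and method names (SeenPredicateSketch, keepFirstOccurrences, keepRepeatOccurrences) are made up for this illustration and are not part of the original code. It assumes non-null keys and a sequential stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class SeenPredicateSketch {

    // Same logic as duplicateByKey: putIfAbsent returns null only the first
    // time a key is stored, so the predicate is true for first occurrences.
    static <T> Predicate<T> firstOccurrenceByKey(final Function<? super T, Object> keyExtractor) {
        final Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    // Keeps one element per distinct value (what the question observes).
    static List<String> keepFirstOccurrences(final List<String> names) {
        return names.stream()
                .filter(SeenPredicateSketch.<String>firstOccurrenceByKey(name -> name))
                .collect(Collectors.toList());
    }

    // Negating the predicate keeps only the second and later occurrences,
    // so the first member of each duplicate group is still missing.
    static List<String> keepRepeatOccurrences(final List<String> names) {
        return names.stream()
                .filter(SeenPredicateSketch.<String>firstOccurrenceByKey(name -> name).negate())
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        final List<String> names =
                Arrays.asList("alex", "lolita", "elpidio", "romualdo", "romualdo");
        System.out.println(keepFirstOccurrences(names));  // [alex, lolita, elpidio, romualdo]
        System.out.println(keepRepeatOccurrences(names)); // [romualdo]
    }
}
```

Neither variant returns both members of a duplicate pair, because a single pass cannot know that the first element is a duplicate until the second one has been seen.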
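At work I was able to get the desired result by using a Comparator with a TreeMap and an ArrayList, but that meant creating a list, filtering it, and then passing the filter again over the newly created list. This looks like bloated (and probably inefficient) code.
Does anyone have a better idea of how to extract duplicates rather than remove them?
Thanks in advance.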
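One possible way to express the extraction asked about here is to group by the (id, firstName) pair and keep only groups with more than one member. The following is a sketch under stated assumptions, not a definitive implementation: DuplicateExtractionSketch, extractDuplicates and key are hypothetical names, and the nested Person is a minimal stand-in for the POJO above so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class, for illustration only.
final class DuplicateExtractionSketch {

    // Minimal stand-in for the Person POJO (assumption: same three fields).
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(final Long id, final String firstName, final String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        Long getId() { return id; }
        String getFirstName() { return firstName; }
        String getSecondName() { return secondName; }

        @Override
        public String toString() { return firstName + " " + secondName; }
    }

    // Composite key: two lists are equal when their elements are pairwise equal.
    private static List<Object> key(final Person person) {
        return Arrays.<Object>asList(person.getId(), person.getFirstName());
    }

    // Returns every person whose (id, firstName) pair occurs more than once.
    static List<Person> extractDuplicates(final List<Person> personList) {
        // LinkedHashMap keeps the groups in first-seen order.
        final Map<List<Object>, List<Person>> groups = personList.stream()
                .collect(Collectors.groupingBy(
                        DuplicateExtractionSketch::key,
                        LinkedHashMap::new,
                        Collectors.toList()));
        return groups.values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        final List<Person> people = Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(2L, "lolita", "llanero"),
                new Person(3L, "elpidio", "ramirez"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
        System.out.println(extractDuplicates(people)); // [romualdo gomez, romualdo perez]
    }
}
```

This makes a single pass to build the groups and a second pass over the groups only, at the cost of holding the whole grouping in memory.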
Update:
Thanks everyone for your answers.
To remove duplicates using the same approach with uniqueAttributes:
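// Methods of a PersonListFilters class. Needs java.util.ArrayList, Comparator, List,
// Objects, TreeSet, java.util.stream.Collectors and org.apache.commons.lang3.StringUtils.
public static List<Person> removeDuplicates(final List<Person> personList) {
    // The TreeSet keeps one person per distinct key; the result is sorted by that key.
    return personList.stream().collect(Collectors
            .collectingAndThen(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(
                    PersonListFilters::uniqueAttributes))),
                    ArrayList::new));
}

// Builds the comparison key by concatenating id and firstName.
private static String uniqueAttributes(Person person) {
    if (Objects.isNull(person)) {
        return StringUtils.EMPTY;
    }
    return person.getId() + person.getFirstName();
}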