Computing the Union of Two Collections in Java: Efficient Algorithms, Performance Comparison and Analysis

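This article looks at how to quickly obtain the union of two collections in Java. It presents four approaches (set subtraction, iterating over all elements, merging into a Set to deduplicate, and using Stream) and compares their performance; Method 3 (merging directly with a HashSet) performs best.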

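Problem:

Given two collections, each containing some number of elements, how can we quickly obtain their union?

Example:

Given two lists, List<String> list1 and List<String> list2, with m and n elements respectively, compute their union (in effect, merge them and remove duplicates).

Notes:

1. String is used as the element type. For a custom type, you need to override equals (and also hashCode if a hash-based collection such as HashSet is used).

2. Input: the first collection list1 and the second collection list2.

3. Output: a collection holding the union.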
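
To illustrate note 1, here is a minimal sketch with a hypothetical element type Point (it does not appear in the original article, and neither does the class name CustomElementUnion). It assumes a hash-based set is used for deduplication, which is why hashCode is overridden together with equals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class CustomElementUnion {
    // Hypothetical element type, used only for illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Two points are equal when both coordinates match.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        // Must be consistent with equals, otherwise a hash-based set cannot detect duplicates.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Union of two lists of Point; duplicates are detected through equals/hashCode.
    static List<Point> union(List<Point> list1, List<Point> list2) {
        Set<Point> set = new LinkedHashSet<>(list1);
        set.addAll(list2);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Point> a = Arrays.asList(new Point(0, 0), new Point(1, 1));
        List<Point> b = Arrays.asList(new Point(1, 1), new Point(2, 2));
        System.out.println(union(a, b).size()); // prints 3
    }
}
```

Without the hashCode override, the two Point(1, 1) instances would most likely land in different buckets and both survive in the result.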

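Implementation:

Method 1: Set subtraction

Union, intersection and difference are related: once two of them are known, the third follows. Union = intersection + difference (taken in both directions, i.e. A - B and B - A); intersection = union - difference; difference = union - intersection. We can therefore reuse the intersection and difference computed in the previous article, "集合 求交集 求差集 高效 算法 性能对比 与 分析 JAVA" (intersection and difference of collections), to obtain the union directly. The approaches used there for intersection and difference can also be adapted, with small changes, to compute the union.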

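For example, compute the intersection first, remove it from the concatenation of both lists, and then add it back once (this assumes neither list contains duplicates of its own):

    public static List<String> getDiff(List<String> listA, List<String> listB) {
        List<String> dif = new ArrayList<>(); // intersection
        List<String> res = new ArrayList<>(); // result
        dif.addAll(listA);
        // first compute the intersection of the two lists
        dif.retainAll(listB);
        res.addAll(listA);
        res.addAll(listB);
        // removing the intersection from the concatenation leaves only the elements that differ
        res.removeAll(dif);
        // add the intersection back once: union = difference + intersection
        res.addAll(dif);
        return res;
    }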

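Another option, which is usually a poorer choice for unsorted input:

For sorted data we can use a two-pointer scan: keep indices i and j into the two lists and advance them alternately while comparing elements. A single pass yields the union in O(m+n) time with O(1) extra space (not counting the output). Unsorted data can be handled by first sorting list1 and list2 with an efficient algorithm such as quicksort and then running the same scan, although the sort then dominates the cost at O(m log m + n log n).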
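
The two-pointer scan described above could look roughly like the sketch below. The names SortedUnion, unionSorted and union are hypothetical, the lists are assumed to support fast index access (for example ArrayList), and Collections.sort is used for the sorting step purely for convenience; it is a merge-sort variant rather than quicksort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedUnion {
    // Merges two ascending-sorted lists into their union in one pass.
    static List<String> unionSorted(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() || j < b.size()) {
            String next;
            if (j >= b.size()) {
                next = a.get(i++);              // b is exhausted
            } else if (i >= a.size()) {
                next = b.get(j++);              // a is exhausted
            } else {
                int c = a.get(i).compareTo(b.get(j));
                if (c < 0) {
                    next = a.get(i++);
                } else if (c > 0) {
                    next = b.get(j++);
                } else {                        // same element in both lists
                    next = a.get(i++);
                    j++;
                }
            }
            // Skip duplicates that occur inside a single list as well.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(next)) {
                out.add(next);
            }
        }
        return out;
    }

    // For unsorted input: sort copies first, then merge.
    static List<String> union(List<String> list1, List<String> list2) {
        List<String> a = new ArrayList<>(list1);
        List<String> b = new ArrayList<>(list2);
        Collections.sort(a);
        Collections.sort(b);
        return unionSorted(a, b);
    }

    public static void main(String[] args) {
        System.out.println(union(Arrays.asList("c", "a"), Arrays.asList("d", "c", "b")));
        // prints [a, b, c, d]
    }
}
```

Note that the result comes out in sorted order, not in the original order of list1 and list2.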

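Method 2: Iterate over all elements

    private static List<String> getAll(List<String> list1, List<String> list2) {
        // start from a copy of list1
        List<String> all = new ArrayList<String>(list1);
        for (String str : list2) {
            // contains() is a linear scan, so the loop costs O(m*n) overall
            if (!list1.contains(str)) {
                all.add(str);
            }
        }
        return all;
    }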

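Method 3: Merge into a Set to deduplicate

A Set never stores duplicate elements; this property is used to deduplicate the elements and obtain the union.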

    public static List<String> getAll2(List<String> listA, List<String> listB) {
        Set<String> set = new HashSet<>(listA);
        set.addAll(listB);
        List<String> list = new ArrayList<>(set);
        return list;
    }

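The addAll() call here is equivalent to iterating over every element of listB and calling add() on each one.

Alternatively, instead of initializing the set from one of the lists, call addAll() twice:

    Set<String> set = new HashSet<>();
    set.addAll(listA);
    set.addAll(listB);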
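
One property of HashSet worth keeping in mind is that it gives no guarantee about iteration order, so the resulting list may not follow the order of the inputs. If first-occurrence order matters, a LinkedHashSet can be swapped in. The sketch below is a variation on getAll2, not part of the original article, and the names OrderedUnion and getAllOrdered are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedUnion {
    // Same idea as getAll2, but keeps the first-occurrence order of the elements.
    static List<String> getAllOrdered(List<String> listA, List<String> listB) {
        Set<String> set = new LinkedHashSet<>(listA);
        set.addAll(listB);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("b", "a", "c");
        List<String> b = Arrays.asList("c", "d", "a");
        System.out.println(getAllOrdered(a, b)); // prints [b, a, c, d]
    }
}
```

LinkedHashSet maintains an extra linked list across its entries, so it may be slightly slower and use more memory than a plain HashSet.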

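For comparison, here is the source of ArrayList.addAll(), which the list-based code in this article relies on. (HashSet does not use this code; it inherits addAll() from AbstractCollection, which iterates over the argument and calls add() on each element.)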


    /**
     * Appends all of the elements in the specified collection to the end of
     * this list, in the order that they are returned by the
     * specified collection's Iterator.  The behavior of this operation is
     * undefined if the specified collection is modified while the operation
     * is in progress.  (This implies that the behavior of this call is
     * undefined if the specified collection is this list, and this
     * list is nonempty.)
     *
     * @param c collection containing elements to be added to this list
     * @return <tt>true</tt> if this list changed as a result of the call
     * @throws NullPointerException if the specified collection is null
     */
    public boolean addAll(Collection<? extends E> c) {
        Object[] a = c.toArray();
        int numNew = a.length;
        ensureCapacityInternal(size + numNew);  // Increments modCount
        System.arraycopy(a, 0, elementData, size, numNew);
        size += numNew;
        return numNew != 0;
    }

Here, System.arraycopy is a native method:


    /**
     * Copies an array from the specified source array, beginning at the
     * specified position, to the specified position of the destination array.
     * A subsequence of array components are copied from the source
     * array referenced by <code>src</code> to the destination array
     * referenced by <code>dest</code>. The number of components copied is
     * equal to the <code>length</code> argument. The components at
     * positions <code>srcPos</code> through
     * <code>srcPos+length-1</code> in the source array are copied into
     * positions <code>destPos</code> through
     * <code>destPos+length-1</code>, respectively, of the destination
     * array.
     * <p>
     * If the <code>src</code> and <code>dest</code> arguments refer to the
     * same array object, then the copying is performed as if the
     * components at positions <code>srcPos</code> through
     * <code>srcPos+length-1</code> were first copied to a temporary
     * array with <code>length</code> components and then the contents of
     * the temporary array were copied into positions
     * <code>destPos</code> through <code>destPos+length-1</code> of the
     * destination array.
     * <p>
     * If <code>dest</code> is <code>null</code>, then a
     * <code>NullPointerException</code> is thrown.
     * <p>
     * If <code>src</code> is <code>null</code>, then a
     * <code>NullPointerException</code> is thrown and the destination
     * array is not modified.
     * <p>
     * Otherwise, if any of the following is true, an
     * <code>ArrayStoreException</code> is thrown and the destination is
     * not modified:
     * <ul>
     * <li>The <code>src</code> argument refers to an object that is not an
     *     array.
     * <li>The <code>dest</code> argument refers to an object that is not an
     *     array.
     * <li>The <code>src</code> argument and <code>dest</code> argument refer
     *     to arrays whose component types are different primitive types.
     * <li>The <code>src</code> argument refers to an array with a primitive
     *    component type and the <code>dest</code> argument refers to an array
     *     with a reference component type.
     * <li>The <code>src</code> argument refers to an array with a reference
     *    component type and the <code>dest</code> argument refers to an array
     *     with a primitive component type.
     * </ul>
     * <p>
     * Otherwise, if any of the following is true, an
     * <code>IndexOutOfBoundsException</code> is
     * thrown and the destination is not modified:
     * <ul>
     * <li>The <code>srcPos</code> argument is negative.
     * <li>The <code>destPos</code> argument is negative.
     * <li>The <code>length</code> argument is negative.
     * <li><code>srcPos+length</code> is greater than
     *     <code>src.length</code>, the length of the source array.
     * <li><code>destPos+length</code> is greater than
     *     <code>dest.length</code>, the length of the destination array.
     * </ul>
     * <p>
     * Otherwise, if any actual component of the source array from
     * position <code>srcPos</code> through
     * <code>srcPos+length-1</code> cannot be converted to the component
     * type of the destination array by assignment conversion, an
     * <code>ArrayStoreException</code> is thrown. In this case, let
     * <b><i>k</i></b> be the smallest nonnegative integer less than
     * length such that <code>src[srcPos+</code><i>k</i><code>]</code>
     * cannot be converted to the component type of the destination
     * array; when the exception is thrown, source array components from
     * positions <code>srcPos</code> through
     * <code>srcPos+</code><i>k</i><code>-1</code>
     * will already have been copied to destination array positions
     * <code>destPos</code> through
     * <code>destPos+</code><i>k</I><code>-1</code> and no other
     * positions of the destination array will have been modified.
     * (Because of the restrictions already itemized, this
     * paragraph effectively applies only to the situation where both
     * arrays have component types that are reference types.)
     *
     * @param      src      the source array.
     * @param      srcPos   starting position in the source array.
     * @param      dest     the destination array.
     * @param      destPos  starting position in the destination data.
     * @param      length   the number of array elements to be copied.
     * @exception  IndexOutOfBoundsException  if copying would cause
     *               access of data outside array bounds.
     * @exception  ArrayStoreException  if an element in the <code>src</code>
     *               array could not be stored into the <code>dest</code> array
     *               because of a type mismatch.
     * @exception  NullPointerException if either <code>src</code> or
     *               <code>dest</code> is <code>null</code>.
     */
    public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);

Method 4: Merge and deduplicate with Stream

    public static List<String> getAll3(List<String> listA, List<String> listB) {
        List<String> streamList = Stream.of(listA, listB)
                .flatMap(Collection::stream)
                .distinct()
                .collect(Collectors.toList());
        return streamList;
    }

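A Stream treats the collection of elements to be processed as a flow that travels through a pipeline, and operations such as filtering, sorting and aggregation can be applied at each stage. Here the two lists are flattened into a single stream with flatMap(), duplicates are dropped by distinct(), and the result is collected into a List.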
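
For exactly two lists, the same pipeline can also be written with Stream.concat instead of Stream.of plus flatMap. This is only an alternative sketch; the names ConcatUnion and getAllConcat are hypothetical and were not benchmarked in this article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ConcatUnion {
    // Concatenates the two list streams, then removes duplicates.
    static List<String> getAllConcat(List<String> listA, List<String> listB) {
        return Stream.concat(listA.stream(), listB.stream())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b");
        List<String> b = Arrays.asList("b", "c");
        System.out.println(getAllConcat(a, b)); // prints [a, b, c]
    }
}
```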

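Performance comparison

Computing a union only requires putting every element in and then removing duplicates; there is no need to tell apart which list an element came from. The HashMap key/value mapping used in the previous article, "集合 求交集 求差集 高效 算法 性能对比 与 分析 JAVA", therefore brings no performance benefit here, and using a HashSet directly is the better choice.

Test results (elapsed time in nanoseconds, as printed by the test code):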

getDiff total times: 718069800
getAll total times: 295361800
getAll2 total times: 6414800
getAll3 total times: 70735000

Test code:

import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Test {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        List<String> res = getDiff(list1, list2);
        List<String> res2 = getAll(list1, list2);
        List<String> res3 = getAll2(list1, list2);
        List<String> res4 = getAll3(list1, list2);

    }

    public static List<String> getAll3(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        List<String> streamList = Stream.of(listA, listB)
                .flatMap(Collection::stream)
                .distinct()
                .collect(Collectors.toList());
        System.out.println("getAll3 total times: " + (System.nanoTime() - st)); // print elapsed time (ns)
        return streamList;
    }

    public static List<String> getAll2(List<String> listA, List<String> listB) {
        long st = System.nanoTime(); // start timing
        Set<String> set = new HashSet<>(listA);
        set.addAll(listB);
        List<String> list = new ArrayList<>(set);
        System.out.println("getAll2 total times: " + (System.nanoTime() - st)); // print elapsed time (ns)
        return list;
    }

    private static List<String> getAll(List<String> list1, List<String> list2) {
        long st = System.nanoTime(); // start timing
        List<String> all = new ArrayList<String>(list1);
        for (String str : list2) {
            if (!list1.contains(str)) {
                all.add(str);
            }
        }
        System.out.println("getAll total times: " + (System.nanoTime() - st)); // print elapsed time (ns)
        return all;
    }

    public static List<String> getDiff(List<String> listA, List<String> listB) {
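        long st = System.nanoTime(); // start timing
        List<String> dif = new ArrayList<>(); // intersection
        List<String> res = new ArrayList<>(); // result
        dif.addAll(listA);
        // first compute the intersection of the two lists
        dif.retainAll(listB);
        res.addAll(listA);
        res.addAll(listB);
        // removing the intersection from the concatenation leaves only the elements that differ
        res.removeAll(dif);
        // add the intersection back once: union = difference + intersection
        res.addAll(dif);
        System.out.println("getDiff total times: " + (System.nanoTime() - st)); // print elapsed time (ns)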
        return res;
    }
}

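As the results show, Method 3 performs best, followed by Method 4. Method 1 is the slowest because it takes a detour: it first computes an intersection and a difference and only then the union, and retainAll() and removeAll() on an ArrayList scan a list for every element, which is roughly O(m*n) work. Method 2 pays the same quadratic cost through list1.contains(), whereas the HashSet in Method 3 needs only O(m+n) expected time. Each method is timed once and without warm-up, so the absolute figures are indicative only.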
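
A slightly more robust way to time the methods is to run each one several times untimed before measuring and then average over repeated runs. The harness below is a rough sketch under that assumption, not the code that produced the figures above. The names UnionBenchmark, averageNanos, hashSetUnion and streamUnion are hypothetical, and the run counts are arbitrary. A dedicated benchmarking tool such as JMH would be more rigorous.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnionBenchmark {
    // Runs the task untimed for warm-up, then returns the average
    // elapsed time in nanoseconds over the timed runs.
    static long averageNanos(Supplier<List<String>> task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.get();
        }
        long total = 0;
        for (int i = 0; i < timedRuns; i++) {
            long st = System.nanoTime();
            task.get();
            total += System.nanoTime() - st;
        }
        return total / timedRuns;
    }

    // Same logic as getAll2 (Method 3), without the timing output.
    static List<String> hashSetUnion(List<String> a, List<String> b) {
        Set<String> set = new HashSet<>(a);
        set.addAll(b);
        return new ArrayList<>(set);
    }

    // Same logic as getAll3 (Method 4), without the timing output.
    static List<String> streamUnion(List<String> a, List<String> b) {
        return Stream.of(a, b)
                .flatMap(List::stream)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same test data as in the article's test code.
        List<String> list1 = new ArrayList<>();
        List<String> list2 = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        System.out.println("HashSet avg ns: "
                + averageNanos(() -> hashSetUnion(list1, list2), 20, 50));
        System.out.println("Stream avg ns: "
                + averageNanos(() -> streamUnion(list1, list2), 20, 50));
    }
}
```

The quadratic methods can be plugged in the same way, although with 10000 elements per list they take noticeably longer per run.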

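That concludes this summary and performance analysis of collection operations. If you know of other approaches, feel free to discuss them in the comments; the article will be updated accordingly.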
