Java C++ 数据结构对比

最新推荐文章于 2024-05-15 17:22:30 发布

胖胖的大星

最新推荐文章于 2024-05-15 17:22:30 发布

阅读量2.8k

点赞数 6

分类专栏：算法与数据结构文章标签： java c 数据结构

本文链接：https://blog.csdn.net/mymottoissh/article/details/82826709

版权

算法与数据结构专栏收录该内容

2 篇文章 0 订阅

订阅专栏

转载请注明出处：https://blog.csdn.net/mymottoissh/article/details/82826709

目的是总结一下 JAVA 和 C++几种常用的数据结构内存模型，知道了内存结构，才能在实战中更好滴选用数据结构。通过总结，日后也避免把两种语言的结构混为一谈。闲言少叙，开始正文。

关键字：JAVA C++ 数据结构

一、动态数组

二、链表

三、树

首先做一个整体的概括：

底层数据存储只分成两大类，一种是连续存储，如数组。一种是链式存储，如链表。在这基础之上，结合结构和算法，才衍生出如树、图之类的结构。前者的优点：随机寻址快。后者优点：插删速度快。这里也会结合源码做一些剖析。

一、动态数组

代表结构：

JAVA ：ArrayList、Vector、HashMap/HashTable/LinkedHashMap、HashSet

C++ ： vector、deque<特>、stack/queue <特>

开始接触的时候很困惑，Java里明明写的是list，为什么还算作数组。这就是恶心的地方，所以千万不要别外表欺骗了。

Java::ArrayList

ArrayList无疑是项目中最常用的结构，为了查找他底层的存储原理，直接看他的add方法

    /**
     * Appends the specified element to the end of this list.
     *
     * @param e element to be appended to this list
     * @return <tt>true</tt> (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }

显然，存储结构为一数组 elementData。这里有一个关键方法实现动态内存

    private void ensureCapacityInternal(int minCapacity) {
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }

        ensureExplicitCapacity(minCapacity);
    }

    private void ensureExplicitCapacity(int minCapacity) {
        modCount++;

        // overflow-conscious code
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

每次添加一个数据之前，都要保证数组中有比现有数据长度多1的容量，如果保证不了，则会创建一个新的容量为以前1.5倍的数组，并把元素进行复制。可想而知，如果需要反复扩容的话，ArrayList的效率是非常低的。同时，ArrayList也保留了数组本身的特点：如果在某一位置增删数据，需要把后面的数据全都复制一遍。这部分代码就不贴了。

Java::Vector

    /**
     * Appends the specified element to the end of this Vector.
     *
     * @param e element to be appended to this Vector
     * @return {@code true} (as specified by {@link Collection#add})
     * @since 1.2
     */
    public synchronized boolean add(E e) {
        modCount++;
        ensureCapacityHelper(elementCount + 1);
        elementData[elementCount++] = e;
        return true;
    }

在add方法下，可以看到与ArrayList类似的存储：数组。增加之前，首先确保容量，唯一不同的在于在前面加入了synchronized关键字。所以在多线程环境下，应该选用Vector。

JAVA::HashMap/HashTable/LinkedHashMap

这里把HashMap及相关结归结到了动态数组里，这关系到HashMap的实现原理。直接放源码

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

/**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null) //索引下标计算
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

首先可以看到存储结构为一Node数组。在这里有点小疑问，Java里的Node是一个单向链表节点，而这里却用了一个Node数组存储。究其原因，还有分析一下HashMap的原理。当向map中put一个键值对时，首先通过hash方法计算key的哈希值，以 (数组长度-1)&哈希值 作为数组索引下标。如果该下标对应的数组值为空，则new一个新节点。如果已经有值，那该值对应的key一定等于新增的可以吗？当然不是，因为可能会有哈希碰撞。这时候就用到单向链表结构的特性了，如果当前的数组值对应的key与新增key不相等，则会遍历以当前Node值为首节点的链表，直到找到与新key相等的节点为止。其中TREEIFY_THRESHOLD 定义为8，如果超过次链表长度，将转为树存储。更形象的表示，HashMap的结构应该是这样滴。

在put源码中，有两个HashMap未实现的函数，但是在LinkedHashMap中实现了，这也保证了其有序的元素存储。

// Callbacks to allow LinkedHashMap post-actions
    void afterNodeAccess(Node<K,V> p) { }
    void afterNodeInsertion(boolean evict) { }
    void afterNodeRemoval(Node<K,V> p) { }

至于HashTable与HashMap的区别，这里就不造轮子了

引用一段： https://blog.csdn.net/qq_29631809/article/details/72599708

HashMap和Hashtable的主要的区别有：线程安全性，同步(synchronization)，以及速度。

HashMap几乎可以等价于Hashtable，除了HashMap是非synchronized的，并可以接受null(HashMap可以接受为null的键值(key)和值(value)，而Hashtable则不行)。 HashMap是非synchronized，而Hashtable是synchronized，这意味着Hashtable是线程安全的，多个线程可以共享一个Hashtable；而如果没有正确的同步的话，多个线程是不能共享HashMap的。Java 5提供了ConcurrentHashMap，它是HashTable的替代，比HashTable的扩展性更好。
另一个区别是HashMap的迭代器(Iterator)是fail-fast迭代器，而Hashtable的enumerator迭代器不是fail-fast的。所以当有其它线程改变了HashMap的结构（增加或者移除元素），将会抛出ConcurrentModificationException，但迭代器本身的remove()方法移除元素则不会抛出ConcurrentModificationException异常。但这并不是一个一定发生的行为，要看JVM。这条同样也是Enumeration和Iterator的区别。由于Hashtable是线程安全的也是synchronized，所以在单线程环境下它比HashMap要慢。如果你不需要同步，只需要单一线程，那么使用HashMap性能要好过Hashtable。 HashMap不能保证随着时间的推移Map中的元素次序是不变的。

Java::HashSet

废话不多说，贴一个源码就知道了

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

HashSet内部维护了一个HashMap的，对HashSet的操作，实际上就是多HashMap的操作。

C++::vector

void push_back(_Ty&& _Val)
{	// insert element at end
if (_Inside(_STD addressof(_Val)))
	{	// push back an element
	size_type _Idx = _STD addressof(_Val) - this->_Myfirst;
	if (this->_Mylast == this->_Myend)
		_Reserve(1);
	_Orphan_range(this->_Mylast, this->_Mylast);
	_Cons_val(this->_Alval,
		this->_Mylast,
		_STD forward<_Ty>(this->_Myfirst[_Idx]));
	++this->_Mylast;
	}
else
	{	// push back a non-element
	if (this->_Mylast == this->_Myend)
		_Reserve(1);
	_Orphan_range(this->_Mylast, this->_Mylast);
	_Cons_val(this->_Alval,
		this->_Mylast,
		_STD forward<_Ty>(_Val));
	++this->_Mylast;
	}
}

pushback时，首先判断该元素地址是否属于数组内部，如果是，则直接在改地址对应位置更新数据，同时在最末端位置压入数据。如果不是，则在数组末端添加数据。

首先，vector是单端数组，只提供了push_back函数，虽然也提供了insert函数，但是插入时，会导致后续元素进行交换反转，所以插入效率极低。

其次，_Reserve函数用于数组扩容，扩大为原来的1.5倍。在扩容的过程中会new一块新的内存空间。这意味着一旦被扩容，指向原有元素的迭代器将失效。

C++::deque

双端数组，提供前端压入函数push_front和末端压入函数push_back。虽然命名为双端数组，而且从某种意义上来说是内存连续的，但是关于deque有必要重点说明一下。首先来做一个实验。

有如下代码，编译运行

int main() {
    std::deque<int> deq;
    for(int i = 0; i < 10; i++) {
        deq.push_back(i);
    }
    for(int i = 0; i < 10; i++) {
        cout << deq[i] << " " << &deq[i] << endl;
    }
	system("pause");
	return 0;
}

来看一运行结果

0 00000000002CAF10
1 00000000002CAF14
2 00000000002CAF18
3 00000000002CAF1C
4 00000000002CAF60
5 00000000002CAF64
6 00000000002CAF68
7 00000000002CAF6C
8 00000000002DD2D0
9 00000000002DD2D4

奇怪吧，从运行结果可以看到，每四个元素的内存是连续的，从第五个开始，会跳过一段空间，重新组织内存。那关于deque实现到底是怎样的呢。用一句话说，deque是数组和指针（并不能说是链表）的组合实现。上源码。

void push_front(_Ty&& _Val)
{	// insert element at beginning
	this->_Orphan_all();
	_PUSH_FRONT_BEGIN;
	_Cons_val(this->_Alval,
		this->_Map[_Block] + _Newoff % _DEQUESIZ,
		_STD forward<_Ty>(_Val));
	_PUSH_FRONT_END;
}

void push_back(_Ty&& _Val)
{	// insert element at end
	this->_Orphan_all();
	_PUSH_BACK_BEGIN;
	_Cons_val(this->_Alval,
		this->_Map[_Block] + _Newoff % _DEQUESIZ,
		_STD forward<_Ty>(_Val));
	_PUSH_BACK_END;
}

从源码中可以看到，插入的时候，实际上是取的_Map[_Block]所存储的地址。

_Mapptr _Map;		// pointer to array of pointers to blocks

而通过注释看到，_Map实际上是一个二级指针，指向的是一个数组，这个数组存放的是指向block的指针。那么问题又来了，block是什么鬼。

size_type _Getblock(size_type _Off) const
{	// determine block from offset
	// NB: _Mapsize and _DEQUESIZ are guaranteed to be powers of 2
	return ((_Off / _DEQUESIZ) & (_Mapsize - 1));
}

看看这个获取block的函数就大致明白了，block是一块存储内存。其实在deque内部，数组里的每个元素存储的都是一个指向这种block的指针。block是一段连续的存储空间，在block内部的元素，内存都是连续的，但是block与block之间内存不连续。那么block的大小是怎么确定的呢。

#define _DEQUESIZ	(sizeof (value_type) <= 1 ? 16 \
	: sizeof (value_type) <= 2 ? 8 \
	: sizeof (value_type) <= 4 ? 4 \
	: sizeof (value_type) <= 8 ? 2 \
	: 1)	/* elements per block (a power of 2) */

就是通过上述这个算法计算得到。这也解释了为什么我们实验中，针对deque<int>对象，block的大小为4。

另外，关于前插和后插，offset的计算可以区分。在插入数据的前后，分别通过宏定义做了一些操作，相信这就是区分前还是后的关键。来看一下宏定义中都做了什么。

#define _PUSH_FRONT_BEGIN \
	if (this->_Myoff % _DEQUESIZ == 0 \
		&& this->_Mapsize <= (this->_Mysize + _DEQUESIZ) / _DEQUESIZ) \
		_Growmap(1); \
	size_type _Newoff = this->_Myoff != 0 ? this->_Myoff \
		: this->_Mapsize * _DEQUESIZ; \
	size_type _Block = --_Newoff / _DEQUESIZ; \
	if (this->_Map[_Block] == 0) \
		this->_Map[_Block] = this->_Alval.allocate(_DEQUESIZ)

#define _PUSH_BACK_BEGIN \
	if ((this->_Myoff + this->_Mysize) % _DEQUESIZ == 0 \
		&& this->_Mapsize <= (this->_Mysize + _DEQUESIZ) / _DEQUESIZ) \
		_Growmap(1); \
	size_type _Newoff = this->_Myoff + this->_Mysize; \
	size_type _Block = _Newoff / _DEQUESIZ; \
	if (this->_Mapsize <= _Block) \
		_Block -= this->_Mapsize; \
	if (this->_Map[_Block] == 0) \
		this->_Map[_Block] = this->_Alval.allocate(_DEQUESIZ)

对比发现，逻辑基本一致。首先判断容量是否够用，不够用则扩容。然后计算偏移量，即写入数据的位置，最后调用_Cons_val写入数据。不过这个代码是有讲究的，在分析源码之前，我对这个结构也有些误解。

首先是扩容判断条件。既然是双端数组，以前认为首末两端都会预留一些空位用于写入数据，所以如果有一端空间不足，则会扩容。而实际的情况是，容量不分前后，只有在总容量不够用时才会重新分配。

那么第二个问题来了，既然不是首末都有空位，那怎么实现push_back和push_front呢。这里deque维护了几个成员变量

	_Mapptr _Map;		// 指向block指针数组的指针
	size_type _Mapsize;	// _Map的尺寸
	size_type _Myoff;	// 首元素的偏移地址
	size_type _Mysize;	// 当前数组的尺寸

其中，_Myoff记录了首元素的偏移位置，而且从宏_PUSH_FRONT_BEGIN的源码中可以看到，一旦首部空间不足，则会将push_front的数据写加到数据最末端，从这角度来讲，deque类似于一个循环数组。所以这是一个圈圈，而不是一条直线。

同时，与vector一样，一旦内存因扩容而重新分配，指向原有元素的迭代器将失效。

C++::stack/queue

把这两个放在一起说，一个先进后出，一个先进先出。两个结构内部使用的容器其实都是deque，所以不再深入分析了。.

class _Container = deque<_Ty> >

二、链表

代表结构

JAVA ：LinkedList

C++ ： list

Java::LinkedList

    /**
     * Appends the specified element to the end of this list.
     *
     * <p>This method is equivalent to {@link #addLast}.
     *
     * @param e element to be appended to this list
     * @return {@code true} (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        linkLast(e);
        return true;
    }
    /**
     * Links e as last element.
     */
    void linkLast(E e) {
        final Node<E> l = last;
        final Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null)
            first = newNode;
        else
            l.next = newNode;
        size++;
        modCount++;
    }

单向链表。所以在进行LinkedList随机存取时，需要遍历链表。

C++::list

双向链表，结构不多说。这里需要重点说明一下，C++的list到底是不是循环链表呢？是，但不是典型的循环链表，因为在首Node和尾Node之间存在一个"多余"的节点。

_Nodeptr _Myhead;	// pointer to head node

就是它。也是因为有它，当我们用迭代器遍历元素时，才不会遍历个没完没了。原因看这

iterator end()
{	// return iterator for end of mutable sequence
    return (iterator(this->_Myhead, this));
}

end返回的正是他的地址。如果画出来的话，C++中list的数据结构应该是这样子滴。画的糙，凑合看。

三、树

代表结构

JAVA ：TreeMap

C++ ： set/map

Java::TreeMap

    /**
     * The comparator used to maintain order in this tree map, or
     * null if it uses the natural ordering of its keys.
     *
     * @serial
     */
    private final Comparator<? super K> comparator;

    private transient Entry<K,V> root;

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     *
     * @return the previous value associated with {@code key}, or
     *         {@code null} if there was no mapping for {@code key}.
     *         (A {@code null} return can also indicate that the map
     *         previously associated {@code null} with {@code key}.)
     * @throws ClassCastException if the specified key cannot be compared
     *         with the keys currently in the map
     * @throws NullPointerException if the specified key is null
     *         and this map uses natural ordering, or its comparator
     *         does not permit null keys
     */
    public V put(K key, V value) {
        Entry<K,V> t = root;
        if (t == null) {
            compare(key, key); // type (and possibly null) check

            root = new Entry<>(key, value, null);
            size = 1;
            modCount++;
            return null;
        }
        int cmp;
        Entry<K,V> parent;
        // split comparator and comparable paths
        Comparator<? super K> cpr = comparator;
        if (cpr != null) {
            do {
                parent = t;
                cmp = cpr.compare(key, t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        else {
            if (key == null)
                throw new NullPointerException();
            @SuppressWarnings("unchecked")
                Comparable<? super K> k = (Comparable<? super K>) key;
            do {
                parent = t;
                cmp = k.compareTo(t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        Entry<K,V> e = new Entry<>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;
        fixAfterInsertion(e);
        size++;
        modCount++;
        return null;
    }

首先TreeMap是一棵红黑树，类的内部维护了两个主要变量：root节点和比较器（谓词）。root为红黑树的树根，当插入一个数据时，根据比较器判断向左子树还是右子树插入数据。构造TreeMap时，可以传入比较器，如果比较器为空，则使用类型默认的比较器。所如果key为自定义类型，则最好实现Comparable接口。

C++::set/map

C++中的set和map都继承自xtree，也就是说两种结构都是用红黑树存储的。我们知道，这两种存储都是有序的，且存在默认的排序规则，在构造时，可以自定义仿函数来重定义排序规则。大体上看一下xtree插入的逻辑。

_Pairib insert(const value_type& _Val, bool _Leftish)
{	// try to insert node with value _Val, on left if _Leftish
	_Nodeptr _Trynode = _Root();
	_Nodeptr _Wherenode = this->_Myhead;
	bool _Addleft = true;	// add to left of head if tree empty
	while (!this->_Isnil(_Trynode))
	{	// look for leaf to insert before (_Addleft) or after
		_Wherenode = _Trynode;
		if (_Leftish)
			_Addleft = !_DEBUG_LT_PRED(this->comp,
				this->_Key(_Trynode),
				this->_Kfn(_Val));	// favor left end
		else
			_Addleft = _DEBUG_LT_PRED(this->comp,
				this->_Kfn(_Val),
				this->_Key(_Trynode));	// favor right end
		_Trynode = _Addleft ? this->_Left(_Trynode)
			: this->_Right(_Trynode);
	}

	if (this->_Multi)
		return (_Pairib(_Insert(_Addleft, _Wherenode, _Val), true));
	else
	{	// insert only if unique
		iterator _Where = iterator(_Wherenode, this);
		if (!_Addleft)
			;	// need to test if insert after is okay
		else if (_Where == begin())
			return (_Pairib(_Insert(true, _Wherenode, _Val), true));
		else
			--_Where;	// need to test if insert before is okay

		if (_DEBUG_LT_PRED(this->comp,
			this->_Key(_Where._Mynode()),
			this->_Kfn(_Val)))
			return (_Pairib(_Insert(_Addleft, _Wherenode, _Val), true));
		else
			return (_Pairib(_Where, false));
	}
}

插入时，首先通过仿函数判断插入左子树还是右子树。然后判断该位置是否有数据，没有的话才会执行插入。最后调用_Pairlib进行数据的添加。

以上，对Java和C++常用的数据结构做了简要的总结。日后选用结构时可以根据其特性，选却最优的建模方式。

胖胖的大星

关注

6
点赞
踩
22

收藏

觉得还不错? 一键收藏
0
评论
Java C++ 数据结构对比

转载请注明出处：https://blog.csdn.net/mymottoissh/article/details/82826709目的是总结一下 JAVA 和 C++几种常用的数据结构内存模型，知道了内存结构，才能在实战中更好滴选用数据结构。通过总结，日后也避免把两种语言的结构混为一谈。闲言少叙，开始正文。关键字：JAVA C++ 数据结构目录一、动态数组二、链表三、树...
复制链接

扫一扫