Neo4j Source Code Analysis 5: Importing Data

邓子明

Tags: neo4j, source code, knowledge graph

Article link: https://blog.csdn.net/qq_32140547/article/details/82500858

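Summary (generated automatically by CSDN): This article analyzes how Neo4j uses EncodingIdMapper to store node ids during data import and examines its lookup algorithm. Query efficiency is improved by combining binary search, bucket sort, and radix sort. The article also explains the roles that trackerCache and sortBuckets play during sorting, along with the details of encoding and radix computation.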

EncodingIdMapper
The put method:


long eId = encode( inputId );

dataCache.set( nodeId, eId );

groupCache.set( nodeId, group.id() );

candidateHighestSetIndex.offer( nodeId );

Following dataCache.set( nodeId, eId ) into its implementation:

DynamicLongArray.set(long index, long value )

at( index ).set( index, value );

1. at
{DynamicNumberArray<N extends NumberArray<N>>

    @Override
    public N at( long index )
    {
        if ( index >= length() )
        {
            synchronizedAddChunk( index );
            // Grow: new OffHeapLongArray( length, defaultValue, base );
        }

        int chunkIndex = chunkIndex( index );
        return chunks[chunkIndex];
    }
}

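The at method first grows chunks if needed, which in practice means creating a new OffHeapLongArray and adding it to the chunks array. It then returns one chunk, namely the LongArray in which the current node belongs.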
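The grow-then-pick-a-chunk behavior described above can be sketched as follows. This is a minimal illustration, not Neo4j's code: the class name ChunkedLongArray, the on-heap long[] chunks (standing in for OffHeapLongArray), and the chunkCount helper are all assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical on-heap stand-in for a chunked, growable long array. */
final class ChunkedLongArray {
    private final int chunkSize;
    private final long defaultValue;
    private long[][] chunks = new long[0][];

    ChunkedLongArray(int chunkSize, long defaultValue) {
        this.chunkSize = chunkSize;
        this.defaultValue = defaultValue;
    }

    long length() {
        return (long) chunks.length * chunkSize;
    }

    /** Grows until the index is covered, then returns the chunk that owns it. */
    private long[] at(long index) {
        while (index >= length()) {
            addChunk();
        }
        return chunks[(int) (index / chunkSize)];
    }

    /** Appends one chunk pre-filled with the default value. */
    private void addChunk() {
        long[] chunk = new long[chunkSize];
        Arrays.fill(chunk, defaultValue);
        chunks = Arrays.copyOf(chunks, chunks.length + 1);
        chunks[chunks.length - 1] = chunk;
    }

    void set(long index, long value) {
        at(index)[(int) (index % chunkSize)] = value;
    }

    /** Reads never grow the array; uncovered indexes report the default value. */
    long get(long index) {
        if (index >= length()) {
            return defaultValue;
        }
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    int chunkCount() {
        return chunks.length;
    }
}
```

With a chunk size of 4, writing to index 9 forces three chunks to exist, and every slot that was never written still reads back as the default value.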

2. set
{OffHeapLongArray

    @Override
    public void set( long index, long value )
    {
        UnsafeUtil.putLong( addressOf( index ), value );
    }

    addressOf(index)
    {
        index = rebase( index ); // index - base;
        if ( index < 0 || index >= length )
        {
            throw new ArrayIndexOutOfBoundsException( "Requested index " + index + ", but length is " + length );
        }
        // Shift the rebased index left by 3 bits, i.e. multiply by 8, because one long occupies 8 bytes.
        return address + (index << shift);
    }
    putLong:
    unsafe.putLong( address, value );

}

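The set logic of dataCache is now fairly clear. dataCache is a DynamicLongArray that holds chunks = OffHeapLongArray[]; each OffHeapLongArray holds one million entries, and a new OffHeapLongArray is created when more data arrives.
The value is then written through Unsafe. Because Neo4j ids are consecutive, the slot for the next id is simply 8 bytes further along, and putLong stores the value at that address.
This raises a question. The node's value is stored at the position given by its id, so finding the value for an id only requires computing the address. Finding the id for a given value is still not possible this way, so the lookup path is what we examine next.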
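The write path and the open question above can be illustrated with a small sketch. It assumes a direct ByteBuffer as a stand-in for the Unsafe-managed memory; the class DirectLongChunk and its indexOfValue method are hypothetical names introduced only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: one fixed-size chunk of longs held in off-heap memory. */
final class DirectLongChunk {
    private static final int SHIFT = 3; // 1 << 3 == 8 == Long.BYTES

    private final ByteBuffer memory;
    private final long base;
    private final long length;

    DirectLongChunk(int length, long base) {
        // A direct buffer lives outside the Java heap and starts zero-filled.
        this.memory = ByteBuffer.allocateDirect(length << SHIFT).order(ByteOrder.nativeOrder());
        this.base = base;
        this.length = length;
    }

    /** Rebases the index to this chunk and converts it to a byte offset. */
    private int offsetOf(long index) {
        long rebased = index - base;
        if (rebased < 0 || rebased >= length) {
            throw new ArrayIndexOutOfBoundsException(
                    "Requested index " + index + ", but length is " + length);
        }
        return (int) (rebased << SHIFT);
    }

    void set(long index, long value) {
        memory.putLong(offsetOf(index), value);
    }

    long get(long index) {
        return memory.getLong(offsetOf(index));
    }

    /** Without an auxiliary structure, value -> index needs a linear scan. */
    long indexOfValue(long value) {
        for (long i = base; i < base + length; i++) {
            if (get(i) == value) {
                return i;
            }
        }
        return -1;
    }
}
```

Looking up by index is a single address computation, while indexOfValue has to touch every slot. That cost is what motivates sorting the data and searching it with binary search.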

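Now look at the get method. As the name binarySearch suggests, it is a binary search, which requires the data to be sorted; the search repeatedly takes the middle element and compares it with the target value.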
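Since the values sit at the position of their node id, they cannot be physically reordered without losing that mapping. One way to still search them, sketched here under that assumption, is to sort a separate array of indexes and search through it; the code below plays the role that trackerCache plays in the method that follows. The class and method names (IndirectBinarySearch, buildTracker, find) are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: binary search over unsorted data via a sorted index array. */
final class IndirectBinarySearch {
    static final long NOT_FOUND = -1;

    /** Returns the indexes of data, ordered by the unsigned value they point at. */
    static int[] buildTracker(long[] data) {
        Integer[] order = new Integer[data.length];
        for (int i = 0; i < data.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compareUnsigned(data[a], data[b]));
        int[] tracker = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            tracker[i] = order[i];
        }
        return tracker;
    }

    /** Returns the index in data whose value equals x, or NOT_FOUND. */
    static long find(long[] data, int[] tracker, long x) {
        int low = 0;
        int high = tracker.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;   // position in sorted order
            int dataIndex = tracker[mid];       // actual index into data
            int cmp = Long.compareUnsigned(data[dataIndex], x);
            if (cmp == 0) {
                return dataIndex;
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return NOT_FOUND;
    }
}
```

The data array itself is never moved, so the index-to-value lookup stays a direct read, and the extra array supplies the ordering that binary search needs. Duplicate values, which the real code has to handle, are ignored in this sketch.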

private long binarySearch( Object inputId, int groupId )
{
    long low = 0;
    long high = highestSetIndex;
    // Get the encoded value
    long x = encode( inputId );
    // This method returns the radix of x
    int rIndex = radixOf( x );
    int rIndex = radixOf( x );

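    // In the for loop, rIndex is compared with element [0] of each row of the two-dimensional sortBuckets array. Once the condition holds,
    // element [1] of that row and element [1] of the next row are taken as low and high.
    // From this we can infer that sortBuckets holds split points: element [0] is a radix split point and element [1] is the matching index split point.
    // The values those indexes refer to must already be sorted one after another, otherwise binary search would not work.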
    for ( int k = 0; k < sortBuckets.length; k++ )
    {
        if ( rIndex <= sortBuckets[k][0] )//bucketRange[k] > rIndex )
        {
            low = sortBuckets[k][1];
            high = (k == sortBuckets.length - 1) ? highestSetIndex : sortBuckets[k + 1][1];
            break;
        }
    }

    long returnVal = binarySearch( x, inputId, low, high, groupId );
    if ( returnVal == ID_NOT_FOUND )
    {
        low = 0;
        high = highestSetIndex;
        returnVal = binarySearch( x, inputId, low, high, groupId );
    }
    return returnVal;
}
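The bucket narrowing in the method above can be sketched in isolation. This is a simplified illustration under stated assumptions, not Neo4j's actual layout: radixOf is taken to be the top 4 bits of the value, the data is assumed to be already sorted in unsigned order, each bucket row is {highest radix covered, first position of the bucket}, and the upper bound is made tight (next start minus one) rather than copied from the original. All names are hypothetical.

```java
/** Hypothetical sketch: narrow a binary search to one radix bucket. */
final class BucketedSearch {
    static final long NOT_FOUND = -1;

    /** Assumed radix: the top 4 bits of the value, so 16 possible radixes. */
    static int radixOf(long value) {
        return (int) (value >>> 60);
    }

    /** sorted must be ascending in unsigned order; bucketCount must divide 16. */
    static long[][] buildBuckets(long[] sorted, int bucketCount) {
        int radixesPerBucket = 16 / bucketCount;
        long[][] buckets = new long[bucketCount][2];
        int pos = 0;
        for (int k = 0; k < bucketCount; k++) {
            int upper = (k + 1) * radixesPerBucket - 1;
            buckets[k][0] = upper; // highest radix that falls in this bucket
            buckets[k][1] = pos;   // first sorted position of this bucket
            while (pos < sorted.length && radixOf(sorted[pos]) <= upper) {
                pos++;
            }
        }
        return buckets;
    }

    /** Returns the position of x in sorted, or NOT_FOUND. */
    static long find(long[] sorted, long[][] buckets, long x) {
        long low = 0;
        long high = sorted.length - 1;
        int radix = radixOf(x);
        for (int k = 0; k < buckets.length; k++) {
            if (radix <= buckets[k][0]) {
                low = buckets[k][1];
                high = (k == buckets.length - 1) ? sorted.length - 1 : buckets[k + 1][1] - 1;
                break;
            }
        }
        long result = binarySearch(sorted, x, low, high);
        if (result == NOT_FOUND) {
            // Mirror the fallback in the original: retry over the full range.
            result = binarySearch(sorted, x, 0, sorted.length - 1);
        }
        return result;
    }

    static long binarySearch(long[] sorted, long x, long low, long high) {
        while (low <= high) {
            long mid = low + (high - low) / 2;
            int cmp = Long.compareUnsigned(sorted[(int) mid], x);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return NOT_FOUND;
    }
}
```

The buckets only shrink the initial range; correctness still rests on the binary search, which is why falling back to the full range is always safe.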


private long binarySearch( long x, Object inputId, long low, long high, int groupId )
{
    while ( low <= high )
    {
        // Midpoint; low and high are both positions in the sorted order
        long mid = low + (high - low) / 2;//(low + high) / 2;

        // trackerCache appears to map a position in the sorted order to the actual index into dataCache
        long dataIndex = trackerCache.get( mid );
        if ( dataIndex == ID_NOT_FOUND )
        {
            return ID_NOT_FOUND;
        }
        // Look up the value: dataCache is the DynamicLongArray that put wrote into above, so an index yields its value.
        long midValue = dataCache.get( dataIndex );
        switch ( unsignedDifference( clearCollision( midValue ), x ) )
        {
        case EQ:
            // We found the value we were looking for. Question now is whether or not it's the only
            // of its kind. Not all values that there are duplicates of are considered collisions,
            // read more in detectAndMarkCollisions(). So regardless we need to check previous/next
