SpatialHadoop 2.3 Source Code Reading (8): RTree Index Generation (Part 1)

flyhaifeng

Category: spatialhadoop

Article link: https://blog.csdn.net/flyhaifeng/article/details/50374077

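SpatialHadoop builds its indexes in the class edu.umn.cs.spatialHadoop.operations.Repartition. The class's main method, its repartition method, and the first and third parts of repartitionMapReduce are the same as described in "SpatialHadoop 2.3 Source Code Reading (5): Grid Index Generation (Part 1)". This post focuses on the second part of repartitionMapReduce; the relevant code is the packInRectangles method shown below: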

 /**
   * Create rectangles that together pack all points in sample such that
   * each rectangle contains roughly the same number of points. In other words
   * it tries to balance number of points in each rectangle.
   * Works similar to the logic of bulkLoad but does only one level of
   * rectangles.
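   * @param gridInfo - Used as a hint for number of rectangles per row or column
   * @param sample - The sampled points to pack; sorted in place by this method
   * @return one rectangle per grid cell, ordered column by column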
   */
  public static Rectangle[] packInRectangles(GridInfo gridInfo, final Point[] sample) {
    Rectangle[] rectangles = new Rectangle[gridInfo.columns * gridInfo.rows];
    int iRectangle = 0;
    // Sort in x direction
    final IndexedSortable sortableX = new IndexedSortable() {
      @Override
      public void swap(int i, int j) {
        Point temp = sample[i];
        sample[i] = sample[j];
        sample[j] = temp;
      }

      @Override
      public int compare(int i, int j) {
        if (sample[i].x < sample[j].x)
          return -1;
        if (sample[i].x > sample[j].x)
          return 1;
        return 0;
      }
    };

    // Sort in y direction
    final IndexedSortable sortableY = new IndexedSortable() {
      @Override
      public void swap(int i, int j) {
        Point temp = sample[i];
        sample[i] = sample[j];
        sample[j] = temp;
      }

      @Override
      public int compare(int i, int j) {
        if (sample[i].y < sample[j].y)
          return -1;
        if (sample[i].y > sample[j].y)
          return 1;
        return 0;
      }
    };

    final QuickSort quickSort = new QuickSort();
    
    quickSort.sort(sortableX, 0, sample.length);
    for (int i = 0; i < sample.length; i++) { // debug output: print the x-sorted points
      System.out.println(sample[i]);
    }
    int xindex1 = 0;
    double x1 = gridInfo.x1;
    for (int col = 0; col < gridInfo.columns; col++) {
      int xindex2 = sample.length * (col + 1) / gridInfo.columns;
      
      // Determine extents for all rectangles in this column
      double x2 = col == gridInfo.columns - 1 ? 
          gridInfo.x2 : sample[xindex2-1].x;
      
      // Sort all points in this column according to its y-coordinate
      quickSort.sort(sortableY, xindex1, xindex2);
      
      // Create rectangles in this column
      double y1 = gridInfo.y1;
      for (int row = 0; row < gridInfo.rows; row++) {
        int yindex2 = xindex1 + (xindex2 - xindex1) * (row + 1) / gridInfo.rows;
        double y2 = row == gridInfo.rows - 1 ? gridInfo.y2 : sample[yindex2 - 1].y;
        
        rectangles[iRectangle++] = new Rectangle(x1, y1, x2, y2);
        y1 = y2;
      }
      
      xindex1 = xindex2;
      x1 = x2;
    }
    return rectangles;
  }

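The numbers below refer to lines of the listing above, counting its opening /** as line 1.

Line 12: allocates rectangles, the array that is returned at the end; it has columns * rows entries.

Lines 15-50: define the two sort helpers, sortableX and sortableY, which order the sample points by x and by y respectively.

Lines 52-57: sort all sampled points by x coordinate in ascending order (the loop that follows only prints the sorted points).

Line 60: the outer loop visits each column along the x axis.

Line 61: splits the x-sorted points evenly into columns slices, one per iteration. Each iteration handles the points from xindex1 (inclusive) to xindex2 (exclusive), where xindex1 and xindex2 are indices into the sample array.

Line 64: takes the x coordinate of the last point in the slice, sample[xindex2-1].x, as the column's upper bound x2; the last column uses gridInfo.x2 instead.

Line 68: sorts the points between xindex1 and xindex2 by y coordinate in ascending order.

Line 72: the inner loop visits each row along the y axis.

Line 73: splits the points between xindex1 and xindex2 evenly into rows slices, one per iteration. Each slice starts where the previous one ended and stops at yindex2 (exclusive).

Line 74: takes the y coordinate of the last point in the slice, sample[yindex2 - 1].y, as the cell's upper bound y2; the last row uses gridInfo.y2 instead. At this point x1, x2, y1 and y2 are all known.

Line 76: creates the rectangle (x1,y1)-(x2,y2) for the current cell.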
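The index arithmetic on lines 61 and 73 is easy to misread, so here is a minimal sketch of it in isolation. SplitBounds and endIndices are hypothetical names introduced only for this illustration; they are not part of SpatialHadoop. The method applies the same integer-division formula to a range [from, to) and returns the exclusive end index of each slice.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of SpatialHadoop. */
final class SplitBounds {
  /**
   * Splits the index range [from, to) into `parts` slices of nearly equal
   * size and returns the exclusive end index of each slice.
   */
  static int[] endIndices(int from, int to, int parts) {
    int[] ends = new int[parts];
    for (int i = 0; i < parts; i++) {
      // Same formula as xindex2 / yindex2 in the listing (integer division).
      ends[i] = from + (to - from) * (i + 1) / parts;
    }
    return ends;
  }

  public static void main(String[] args) {
    // 10 sample points split into 3 columns: slices [0,3), [3,6), [6,10).
    System.out.println(Arrays.toString(endIndices(0, 10, 3))); // [3, 6, 10]
    // The second column, [3,6), split into 2 rows: slices [3,4), [4,6).
    System.out.println(Arrays.toString(endIndices(3, 6, 2)));  // [4, 6]
  }
}
```

Because of the integer division, slice sizes can differ by one point, which is why the rectangles hold only roughly the same number of points.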

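Overall, the algorithm works roughly as follows: sort all points by x coordinate in ascending order and split them into equal-sized slices (the columns); then sort each slice by y coordinate in ascending order and split it again into equal-sized slices (the rows), so that every cell holds roughly the same number of points. Because the points are sorted, the cell boundaries can be read directly from the points at the slice ends: each cell starts at (x1,y1), where the previous cell ended, and ends at (x2,y2), with the outermost edges taken from gridInfo. The cell is then described by the rectangle (x1,y1)-(x2,y2).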
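To try the procedure without a Hadoop setup, the following is a minimal standalone sketch of the same idea. It is not the SpatialHadoop implementation: PackSketch and pack are hypothetical names, points are plain {x, y} arrays, rectangles are {x1, y1, x2, y2} arrays, and java.util.Arrays.sort stands in for Hadoop's QuickSort with IndexedSortable. It also assumes there are enough points that no slice is empty.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical standalone sketch; not the SpatialHadoop implementation. */
final class PackSketch {
  private static final Comparator<double[]> BY_X = Comparator.comparingDouble(p -> p[0]);
  private static final Comparator<double[]> BY_Y = Comparator.comparingDouble(p -> p[1]);

  /**
   * Packs points (each {x, y}) into columns * rows rectangles, each returned
   * as {x1, y1, x2, y2}, ordered column by column. Sorts pts in place.
   * Assumes every slice is non-empty (enough points for the grid).
   */
  static double[][] pack(double[][] pts, int columns, int rows,
                         double minX, double minY, double maxX, double maxY) {
    double[][] rects = new double[columns * rows][];
    int k = 0;
    Arrays.sort(pts, BY_X);                      // sort everything by x
    int from = 0;
    double x1 = minX;
    for (int col = 0; col < columns; col++) {
      int to = pts.length * (col + 1) / columns; // end of this column's slice
      // Right edge: last point of the slice, or the grid edge for the last column.
      double x2 = (col == columns - 1) ? maxX : pts[to - 1][0];
      Arrays.sort(pts, from, to, BY_Y);          // sort this column by y
      double y1 = minY;
      for (int row = 0; row < rows; row++) {
        int rowEnd = from + (to - from) * (row + 1) / rows;
        double y2 = (row == rows - 1) ? maxY : pts[rowEnd - 1][1];
        rects[k++] = new double[] {x1, y1, x2, y2};
        y1 = y2;                                 // next cell starts where this one ends
      }
      from = to;
      x1 = x2;                                   // next column starts where this one ends
    }
    return rects;
  }

  public static void main(String[] args) {
    double[][] pts = {{1, 1}, {2, 8}, {7, 3}, {9, 9}};
    for (double[] r : pack(pts, 2, 2, 0, 0, 10, 10)) {
      System.out.println(Arrays.toString(r));
    }
  }
}
```

With four points and a 2 x 2 grid over (0,0)-(10,10), each cell receives one point, and the cells of one column share the same x range while their y ranges are chosen per column.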
