Algorithm Tutorials Binary Indexed Trees (树状数组)

Latest recommended article published on 2020-01-31 21:19:07

xueerfei


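When I first started learning binary indexed trees, I went through all kinds of posts and reread page 196 of Liu Rujia's introductory classic (入门经典) countless times, yet I still could not really follow it. Then I stumbled on this tutorial, which I personally think is excellent. Admittedly my own level is fairly average (otherwise I would not have been stuck after reading those posts so many times), but the original author explains things really clearly, so I decided to translate it.

My English is average; if you spot translation mistakes, corrections are welcome.

Original article: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees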

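Introduction:

We often rely on data structures to make our algorithms faster. This article introduces the Binary Indexed Tree. According to Peter M. Fenwick, the structure was first used for data compression; nowadays it is mostly used for storing frequencies and manipulating cumulative frequency tables. [Translator's note: I was unsure how to render "frequency" in Chinese; Chinese tutorials usually present this structure as a tool for dynamic range-sum queries.]

Let us define the following problem. We have n boxes, and the possible operations are:

1. Add marbles to box i.

2. Compute the number of marbles in boxes k through l (inclusive).

The naive solution takes O(1) for operation 1 and O(n) for operation 2. Suppose we perform m operations. In the worst case (when all of them are operation 2) the total time is O(n*m). With a data structure such as the one used for RMQ, we can solve the problem in O(m log n) in the worst case. Another approach is the Binary Indexed Tree: its worst-case time is also O(m log n), but it is much easier to code and needs less memory.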
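As a point of comparison, the naive solution described above might look like the following minimal sketch. The struct name NaiveBoxes and its members are made up for illustration and do not come from the original tutorial.

```cpp
#include <cassert>
#include <vector>

// Naive baseline: keep the raw counts and sum them on demand.
// NaiveBoxes, add and sum are illustrative names, not part of the tutorial.
struct NaiveBoxes {
    std::vector<long long> f;                  // f[i] = marbles in box i (1-based)

    explicit NaiveBoxes(int n) : f(n + 1, 0) {}

    // Operation 1: add val marbles to box i, O(1).
    void add(int i, long long val) {
        assert(i >= 1 && i < static_cast<int>(f.size()));
        f[i] += val;
    }

    // Operation 2: number of marbles in boxes k..l, O(n).
    long long sum(int k, int l) const {
        assert(1 <= k && k <= l && l < static_cast<int>(f.size()));
        long long s = 0;
        for (int i = k; i <= l; ++i) s += f[i];
        return s;
    }
};
```

With m calls to sum on n boxes this costs O(n*m) in total, which is the bound the BIT improves on.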

Notation:

BIT - Binary Indexed Tree (树状数组)

MaxVal - the maximum value that will have a non-zero frequency

f[i] - frequency of the value with index i, i = 1 .. MaxVal [translator's note: this is simply the original data in our problem]
c[i] - cumulative frequency for index i (f[1] + f[2] + ... + f[i])
tree[i] - sum of the frequencies stored in the BIT at index i (what "index" means is described later); we will sometimes write "tree frequency" instead of "sum of frequencies stored in the BIT"
num¯ - complement of the integer num (the integer with every binary digit inverted: 0 -> 1; 1 -> 0) [translator's note: i.e., the bitwise complement]
NOTE: We often set f[0] = 0, c[0] = 0, tree[0] = 0, so index 0 will sometimes simply be ignored.

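Basic idea:
Every integer can be represented as a sum of powers of two. In the same way, a cumulative frequency can be represented as a sum of sets of subfrequencies. In our case, each set contains some number of consecutive, non-overlapping frequencies.

Let idx be an index of the BIT, and let r be the position of the last 1 digit in the binary notation of idx (the last one when reading from left to right, with positions counted from 0 at the right end). tree[idx] is the sum of the frequencies from index (idx - 2^r + 1) to index idx, as Table 1.1 of the original shows. We also say that idx is responsible for the indexes from (idx - 2^r + 1) to idx. (Note that this responsibility is the key to our algorithm and is the way the tree is manipulated.)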
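To make the "responsibility" rule concrete, here is a small sketch that computes the range of indexes covered by tree[idx]. The function name responsibleRange is hypothetical, and the sketch uses the expression idx & -idx for 2^r.

```cpp
#include <cassert>
#include <utility>

// Range of original indexes whose frequencies are accumulated in tree[idx].
// With r the position of the lowest set bit of idx, 2^r == (idx & -idx).
// responsibleRange is an illustrative name, not part of the tutorial.
std::pair<int, int> responsibleRange(int idx) {
    assert(idx >= 1);
    int lowest = idx & -idx;              // 2^r
    return {idx - lowest + 1, idx};       // inclusive range [first, second]
}
```

For example, 12 is 1100 in binary, so r = 2 and tree[12] covers indexes 9 through 12, while any odd index covers only itself.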

Image 1.3 - tree of responsibility for indexes (bar shows range of frequencies accumulated in top element)

Image 1.4 - tree with tree frequencies

Suppose we want the cumulative frequency for index 13. In binary, 13 is 1101, so we compute c[1101] = tree[1101] + tree[1100] + tree[1000].

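Isolating the last digit

NOTE: Instead of "the last non-zero digit", we will simply write "the last digit".

[Translator's note: I did not translate this part, because I feel the author makes it a bit complicated. The goal is just to get the value of the last 1 in the binary representation of a number. For 13, which is 1101, the last 1 is the fourth digit from the left, with value 2^0 = 1. In C++ this is 13 & (-13), and for any positive integer num it is (num & (-num)). That is all this section says; see the original if you want the details.]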
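A short sketch of this trick, written two ways. The function names are made up for illustration; the second version spells out the complement step, relying on the fact that -num equals ~num + 1 in two's complement arithmetic.

```cpp
#include <cassert>

// Value of the lowest set bit of num ("the last digit" in the text).
// lastDigit and lastDigitViaComplement are illustrative names.
int lastDigit(int num) {
    assert(num > 0);
    return num & -num;
}

// The same value, written with the complement: in two's complement
// arithmetic, -num is ~num + 1.
int lastDigitViaComplement(int num) {
    assert(num > 0);
    return num & (~num + 1);
}
```

For 13 (1101) this gives 1, and for 12 (1100) it gives 4.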

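Reading the cumulative frequency

To read the cumulative frequency for some index idx, we add tree[idx] to sum, subtract the last digit from idx (i.e., remove its last 1), and repeat until idx becomes 0. We can use the following function: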

int read(int idx){
	int sum = 0;
	while (idx > 0){
		sum += tree[idx];
		idx -= (idx & -idx);
	}
	return sum;
}

An example for idx = 13, starting with sum = 0:

Image 1.5 - arrows show path from index to zero which we use to get sum (image shows example for index 13)

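So our result is 26. The number of iterations of this function is the number of 1 bits in idx, which is at most log MaxVal.

Time complexity: O(log MaxVal)

Code length: under 10 lines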
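The walk for idx = 13 can be replayed with the sketch below. The frequency table f is an assumption: the figures are missing from this copy, so the values were picked to agree with the numbers quoted in the text (c[13] = 26). The helper buildFromDefinition is also hypothetical; it fills tree[] directly from the definition of responsibility rather than through update.

```cpp
#include <cassert>

const int MaxVal = 16;
int tree[MaxVal + 1];

// Assumed sample frequencies f[1..16] (index 0 unused); not taken from this page.
const int f[MaxVal + 1] = {0, 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2};

// Fill tree[] straight from the definition:
// tree[i] = f[i - 2^r + 1] + ... + f[i], where 2^r == (i & -i).
void buildFromDefinition() {
    for (int i = 1; i <= MaxVal; ++i) {
        tree[i] = 0;
        for (int j = i - (i & -i) + 1; j <= i; ++j) tree[i] += f[j];
    }
}

// Cumulative frequency c[idx], same loop as in the text.
int read(int idx) {
    assert(idx >= 0 && idx <= MaxVal);
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];        // visits 13 (1101), 12 (1100), 8 (1000) for idx = 13
        idx -= (idx & -idx);     // drop the last 1
    }
    return sum;
}
```

With this data, read(13) adds tree[13] = 3, tree[12] = 11 and tree[8] = 12, which gives 26.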

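Changing the frequency at some position and updating the tree

The idea is to update every tree frequency tree[idx] that is responsible for the position whose frequency we change. When reading the cumulative frequency at some index, we removed the last digit and went on. When changing a frequency by val, we add val to the tree frequency at the current index, then add the last digit to the index, and repeat while the index is less than or equal to MaxVal. The C++ function: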

void update(int idx ,int val){
	while (idx <= MaxVal){
		tree[idx] += val;
		idx += (idx & -idx);
	}
}

Let's look at an example for idx = 5:

Image 1.6 - Updating tree (in brackets are tree frequencies before updating); arrows show path while we update tree from index to MaxVal (image shows example for index 5)

Using the algorithm above, or following the arrows in Image 1.6, we can update the BIT.

Time complexity: O(log MaxVal)

Code length: under 10 lines
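To see which entries an update touches, the sketch below runs the same loop as update but also records the visited indexes. The name updateTraced and the choice MaxVal = 16 are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

const int MaxVal = 16;
int tree[MaxVal + 1];    // zero-initialised: all frequencies start at 0

// Same loop as update(), but it also records every index it touches.
// updateTraced is an illustrative name, not part of the tutorial.
std::vector<int> updateTraced(int idx, int val) {
    assert(idx >= 1 && idx <= MaxVal);
    std::vector<int> touched;
    while (idx <= MaxVal) {
        tree[idx] += val;
        touched.push_back(idx);
        idx += (idx & -idx);     // add the last 1 to move to the next responsible index
    }
    return touched;
}
```

For idx = 5 (101) the path is 5, 6, 8, 16: each step adds the last 1 to the index.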

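Reading the actual frequency at a position [translator's note: i.e., recovering f[index] from the tree]

We have described how to read the cumulative frequency for an index. It is obvious that we cannot read the actual frequency f[idx] just from tree[idx]. One approach is to keep one additional array that stores the frequencies separately. Reading and storing then take O(1), and memory use is linear. Sometimes saving memory matters more, so I will show how to get the actual frequency without any additional structure.

Probably everyone can see that the actual frequency at position idx can be computed by calling read twice, f[idx] = read(idx) - read(idx - 1), i.e., by taking the difference of two adjacent cumulative frequencies. This takes 2 * O(log n) time. If we write a new function, we can get a somewhat faster algorithm with a smaller constant.

If the two paths from two indexes to the root share a part, we can accumulate the sum only until the paths meet and subtract the accumulated values; the result is the sum of the frequencies between those two indexes. It is then easy to compute the sum of frequencies between adjacent indexes, that is, to read the actual frequency at a given index.

[Translator's note: this paragraph is hard to render well; please compare it with the original.]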

Mark the given index with x and its predecessor with y. We can represent y in binary notation as a0b, where b consists of all ones. Then x will be a1b¯ (note that b¯ consists of all zeros). Using our algorithm for getting the sum at some index, let it be x, in the first iteration we remove the last digit, so after the first iteration x will be a0b¯; mark this new value with z.

Repeat the same process with y. Using our function for reading the sum, we remove the last digits from the number one by one. After several steps, y becomes a0b¯, which is exactly z. Now we can write our algorithm. Note that the only exception is when x equals 0. The function in C++:

int readSingle(int idx){
	int sum = tree[idx]; // sum will be decreased
	if (idx > 0){ // special case
		int z = idx - (idx & -idx); // make z first
		idx--; // idx is not important any more, so we can use it instead of y
		while (idx != z){ // at some iteration idx (y) will become z
			// subtract the tree frequency that lies between y and "the same path"
			sum -= tree[idx];
			idx -= (idx & -idx);
		}
	}
	return sum;
}

Here is an example of getting the actual frequency at index 12:

First, we calculate z = 12 - (12 & -12) = 8, and sum = 11.

Image 1.7 - read actual frequency at some index in BIT (image shows example for index 12)

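Let's compare calling read twice with the function written above for a given index. Note that for every odd idx the function runs in constant time O(1), without any iteration (idx - 1 is already z). For almost every even idx it takes c * O(log idx), where c is strictly less than 1. By comparison, read(idx) - read(idx - 1) takes c1 * O(log idx), where c1 is always greater than 1.

Time complexity: c * O(log idx), where c is strictly less than 1

Code length: under 15 lines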
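The claim about odd and even indexes can be checked with a sketch like the one below. The out-parameter iterations and the helper loadSample are hypothetical additions, and the sample frequencies are an assumption chosen to agree with the values quoted in the text (tree[12] = 11).

```cpp
#include <cassert>

const int MaxVal = 16;
int tree[MaxVal + 1];

void update(int idx, int val) {
    assert(idx >= 1);
    while (idx <= MaxVal) { tree[idx] += val; idx += (idx & -idx); }
}

int read(int idx) {
    int sum = 0;
    while (idx > 0) { sum += tree[idx]; idx -= (idx & -idx); }
    return sum;
}

// readSingle plus a hypothetical out-parameter that counts loop iterations.
int readSingleCounted(int idx, int &iterations) {
    iterations = 0;
    int sum = tree[idx];
    if (idx > 0) {
        int z = idx - (idx & -idx);
        idx--;
        while (idx != z) {
            sum -= tree[idx];
            idx -= (idx & -idx);
            ++iterations;
        }
    }
    return sum;
}

// Load assumed sample frequencies f[1..16] into an empty tree.
void loadSample() {
    const int f[MaxVal + 1] = {0, 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2};
    for (int i = 1; i <= MaxVal; ++i) tree[i] = 0;
    for (int i = 1; i <= MaxVal; ++i) update(i, f[i]);
}
```

With this data, index 12 needs two iterations (it subtracts tree[11] and tree[10] from 11 and stops at z = 8), while any odd index needs none.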

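Scaling the entire tree by a constant factor

Sometimes we want to scale our tree by some factor. With the procedures described above this is very simple. If we want to scale down (divide) by a factor c, then each index idx should be updated by -(c - 1) * readSingle(idx) / c (because f[idx] - (c - 1) * f[idx] / c = f[idx] / c). A simple function in C++: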

void scale(int c){
	for (int i = 1 ; i <= MaxVal ; i++)
		update(i , -(c - 1) * readSingle(i) / c); // update(idx, val): index first
}

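This can also be done more quickly. Scaling is a linear operation, and each tree frequency is a linear combination of some frequencies. If we scale each frequency by some factor, we also scale each tree frequency by the same factor. The procedure above takes O(MaxVal * log MaxVal); instead, we can write the following, which takes O(MaxVal):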

void scale(int c){
	for (int i = 1 ; i <= MaxVal ; i++)
		tree[i] = tree[i] / c;
}

Time complexity: O(MaxVal)

Code length: just a few lines
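A quick way to convince yourself that the linear version works is the sketch below. It assumes every frequency is divisible by c; with integer division the per-entry version can otherwise round differently from dividing each frequency on its own. The choice MaxVal = 8 and the test data are made up for illustration.

```cpp
#include <cassert>

const int MaxVal = 8;
int tree[MaxVal + 1];

void update(int idx, int val) {
    assert(idx >= 1);
    while (idx <= MaxVal) { tree[idx] += val; idx += (idx & -idx); }
}

int read(int idx) {
    int sum = 0;
    while (idx > 0) { sum += tree[idx]; idx -= (idx & -idx); }
    return sum;
}

// O(MaxVal) scaling: divide every tree frequency by c.
// Exact only when every frequency f[i] is divisible by c (assumption).
void scale(int c) {
    assert(c != 0);
    for (int i = 1; i <= MaxVal; i++)
        tree[i] = tree[i] / c;
}
```

For instance, after setting f[i] = 3 * i for i = 1..8 and calling scale(3), read(i) returns 1 + 2 + ... + i.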

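Finding the index with a given cumulative frequency

The naive and simplest way to find the index with a given cumulative frequency is to iterate through all indexes, compute each cumulative frequency, and check whether it equals the given value [translator's note: in other words, brute force]. However, if all frequencies are non-negative (so the cumulative frequency never decreases as the index grows), we can find a logarithmic algorithm, a modification of binary search. We go through the bits (starting from the highest one), form the index, compare the cumulative frequency of the current index with the given value and, depending on the outcome, take the lower or the upper half of the interval (just as in binary search). The function in C++: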

// if more than one index in the tree has the same
// cumulative frequency, this procedure returns
// one of them (we do not know which one)

// bitMask - initially, it is the greatest bit of MaxVal
// the local copy "mask" stores the interval that is still to be searched
int find(int cumFre){
	int idx = 0; // this variable is the result of the function
	int mask = bitMask; // work on a copy so that bitMask survives repeated calls

	while (mask != 0){
		int tIdx = idx + mask; // midpoint of the current interval
		mask >>= 1; // halve the current interval
		if (tIdx > MaxVal) // nobody likes overflow :)
			continue;
		if (cumFre == tree[tIdx]) // if it is equal, we just return tIdx
			return tIdx;
		else if (cumFre > tree[tIdx]){
			// if the tree frequency "can fit" into cumFre,
			// then include it
			idx = tIdx; // update index
			cumFre -= tree[tIdx]; // set frequency for the next iteration
		}
	}
	if (cumFre != 0) // maybe the given cumulative frequency doesn't exist
		return -1;
	else
		return idx;
}



// if more than one index in the tree has the same
// cumulative frequency, this procedure returns
// the greatest one
int findG(int cumFre){
	int idx = 0;
	int mask = bitMask; // work on a copy so that bitMask survives repeated calls

	while (mask != 0){
		int tIdx = idx + mask;
		mask >>= 1;
		if (tIdx > MaxVal) // stay inside the tree
			continue;
		if (cumFre >= tree[tIdx]){
			// if the current cumulative frequency is equal to cumFre,
			// we are still looking for a higher index (if one exists)
			idx = tIdx;
			cumFre -= tree[tIdx];
		}
	}
	if (cumFre != 0)
		return -1;
	else
		return idx;
}

Example: for a given cumulative frequency of 21, the function find works as follows:

Time complexity: O(log MaxVal)

Code length: under 20 lines
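The search for cumulative frequency 21 can be replayed with the sketch below. The sample frequencies are an assumption (the table is missing from this copy) chosen to agree with the values quoted in the text. Both functions use a local mask and skip midpoints beyond MaxVal, which is a slightly more defensive variant of the loop; bitMask is assumed to be the greatest power of two not exceeding MaxVal.

```cpp
#include <cassert>

const int MaxVal = 16;
const int bitMask = 16;          // greatest power of two not exceeding MaxVal
int tree[MaxVal + 1];

void update(int idx, int val) {
    assert(idx >= 1);
    while (idx <= MaxVal) { tree[idx] += val; idx += (idx & -idx); }
}

// Load assumed sample frequencies f[1..16] into an empty tree.
void loadSample() {
    const int f[MaxVal + 1] = {0, 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2};
    for (int i = 1; i <= MaxVal; ++i) tree[i] = 0;
    for (int i = 1; i <= MaxVal; ++i) update(i, f[i]);
}

// Some index whose cumulative frequency equals cumFre, or -1 if none exists.
int find(int cumFre) {
    int idx = 0;
    for (int mask = bitMask; mask != 0; mask >>= 1) {
        int tIdx = idx + mask;                    // midpoint of the interval
        if (tIdx > MaxVal) continue;              // stay inside the tree
        if (cumFre == tree[tIdx]) return tIdx;
        if (cumFre > tree[tIdx]) { idx = tIdx; cumFre -= tree[tIdx]; }
    }
    return cumFre != 0 ? -1 : idx;
}

// Greatest index whose cumulative frequency equals cumFre, or -1.
int findG(int cumFre) {
    int idx = 0;
    for (int mask = bitMask; mask != 0; mask >>= 1) {
        int tIdx = idx + mask;
        if (tIdx <= MaxVal && cumFre >= tree[tIdx]) { idx = tIdx; cumFre -= tree[tIdx]; }
    }
    return cumFre != 0 ? -1 : idx;
}
```

With this data, find(21) tries 16 (too big), takes 8 (21 - 12 = 9 left), skips 12, takes 10 (9 - 7 = 2 left) and then hits tree[11] = 2, so it returns 11. Indexes 6 and 7 share cumulative frequency 8, so find(8) returns 6 while findG(8) returns 7.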

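2D BIT

A BIT can be used as a multi-dimensional data structure. Suppose you have a plane with points (with non-negative coordinates), and three kinds of queries:

1. Set a point at (x, y).

2. Remove the point at (x, y).

3. Count the number of points in the rectangle ((0, 0), (x, y)).

If m is the number of queries, max_x the maximum x coordinate, and max_y the maximum y coordinate, the problem can be solved in O(m * log(max_x) * log(max_y)). In this case each element of the tree contains an array, i.e., the tree is tree[max_x][max_y]. Updating the indexes along the x coordinate works as before. For example, to set or remove the point (a, b) we call update(a, b, 1) or update(a, b, -1), where update is: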

void update(int x , int y , int val){
	while (x <= max_x){
		updatey(x , y , val); 
		// this function should update array tree[x] 
		x += (x & -x); 
	}
}

The function updatey is the "same" as the one-dimensional update:

void updatey(int x , int y , int val){
	while (y <= max_y){
		tree[x][y] += val;
		y += (y & -y); 
	}
}

The two can also be written as a single function:

void update(int x , int y , int val){
	int y1;
	while (x <= max_x){
		y1 = y;
		while (y1 <= max_y){
			tree[x][y1] += val;
			y1 += (y1 & -y1); 
		}
		x += (x & -x); 
	}
}

Image 1.8 - BIT is array of arrays, so this is two-dimensional BIT (size 16 x 8).
Blue fields are fields which we should update when we are updating index (5 , 3).

The other functions are modified in a similar way. Note also that a BIT can be used as an n-dimensional data structure.
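As one example of such a modification, a two-dimensional read might look like the sketch below. It is not from the original tutorial. It assumes coordinates have been shifted to start at 1, since index 0 would never advance in the update loop, and it sizes the array with one extra row and column so that max_x and max_y are valid indexes.

```cpp
#include <cassert>

const int max_x = 16, max_y = 8;
int tree[max_x + 1][max_y + 1];

// Same combined update as in the text.
void update(int x, int y, int val) {
    assert(x >= 1 && y >= 1);
    while (x <= max_x) {
        int y1 = y;
        while (y1 <= max_y) {
            tree[x][y1] += val;
            y1 += (y1 & -y1);
        }
        x += (x & -x);
    }
}

// Number of points in the rectangle (1, 1)..(x, y), both corners inclusive.
// Sketch of a 2D counterpart to read(); coordinates are assumed 1-based.
int read(int x, int y) {
    assert(x >= 0 && x <= max_x && y >= 0 && y <= max_y);
    int sum = 0;
    while (x > 0) {
        int y1 = y;
        while (y1 > 0) {
            sum += tree[x][y1];
            y1 -= (y1 & -y1);
        }
        x -= (x & -x);
    }
    return sum;
}
```

Each query walks O(log max_x) rows and O(log max_y) columns per row, which matches the bound stated above.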

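That is the end of the translated article.

HDU problem 1394: http://acm.hdu.edu.cn/showproblem.php?pid=1394

This problem asks for a minimum number of inversions, which can be computed with a BIT: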

#include <cstdio>
#include <stdlib.h>
#include <algorithm>
#include <string.h>
using namespace std;
int num[5005] , total[5005];
int n;
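// value of the lowest set bit of x
int lowbit( int x )
{
    return x&(-x);
}

// Note: this BIT is mirrored relative to the tutorial above:
// update walks downward and query walks upward.
// query(x) counts the inserted positions that are greater than x.
int query( int x )
{
    int ret = 0;
    x += 1;
    while( x <= n )
    {
          ret += total[x];
          x += lowbit(x);  
    }
    return ret ;
}

// mark position x (value + 1) as inserted
void update( int x )
{
     while( x > 0 )
     {
            total[x]++;
            x -= lowbit(x);       
     }
}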

int main( )
{
    while( scanf("%d",&n) != EOF )    
    {
           memset( total , 0 , sizeof(total) );
           memset( num , 0 , sizeof(num) );
           int sum = 0;
           for( int i=0 ; i<n ; i++ ){
                scanf("%d",&num[i]);
                update( num[i]+1 );
                sum += query( num[i]+1 ); 
                //printf("%d\n",sum);
           }
          
           int k;
           int s = sum;
           for( int i=0 ; i<n ; i++ ){
                k = sum + n-1 - 2*num[i];
                sum = k;
                if( k < s )
                    s = k;     
           }
           printf("%d\n",s);
    }
    return 0;
}
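The second loop in main relies on a rotation formula: moving the first element a of a permutation of 0..n-1 to the end removes a inversions and creates n - 1 - a new ones. The sketch below checks that formula against a brute-force count. The function names are made up for illustration, and the O(n^2) counter merely stands in for the BIT-based count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// O(n^2) reference count of inversions (stand-in for the BIT-based count).
long long countInversions(const std::vector<int>& a) {
    long long inv = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[i] > a[j]) ++inv;
    return inv;
}

// Minimum inversion count over all rotations of a permutation of 0..n-1.
// Moving the first element v to the end changes the count by (n - 1) - 2 * v.
long long minOverRotations(const std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    long long cur = countInversions(a);
    long long best = cur;
    for (int i = 0; i < n; ++i) {
        assert(a[i] >= 0 && a[i] < n);   // input must be a permutation of 0..n-1
        cur += (n - 1) - 2LL * a[i];
        best = std::min(best, cur);
    }
    return best;
}
```

For the permutation 3 1 0 2 the counts over successive rotations are 4, 1, 2, 5, so the minimum is 1.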

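I started this translation on Mid-Autumn Festival, got held up by other things, and finally finished after a lot of huffing and puffing. My English is limited and, honestly, the translation is not great, but working through this article did finally make binary indexed trees click for me.

Of course, in practice, learning how to abstract a problem into this model still takes a lot of practice.

Xi'an is rainy as ever this Mid-Autumn. Good night!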
