HBase Filters (the filter types and code examples)

weixin_30718391

Original link: http://www.cnblogs.com/EnzoDin/p/9564160.html

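Introduction to filters

HBase filters are a powerful feature that helps users process table data more efficiently.

The two main read operations in HBase are get and scan: get reads a row directly, and scan reads a range bounded by a start and a stop row key. Further restrictions can be added to a query to reduce the amount of data returned. These restrictions can specify column families, columns, timestamps, and version numbers.

All filters take effect on the server side, which is known as predicate pushdown. This ensures that filtered-out data is never sent to the client. Filtering can also be implemented in client code, but that hurts performance because the server has to transfer more data to the client, so it should be avoided where possible.

The filter hierarchy

At the bottom of the filter hierarchy are Filter and the abstract class FilterBase. They provide the empty shell and skeleton of a filter, so that concrete filter classes can avoid a lot of repetitive boilerplate code.

Filters operate on the full rows that users have inserted, so any available information can be used to decide how to handle a row: the row key, column names, actual column values, timestamps, and so on.

Comparison operators

Filters that inherit from CompareFilter have one more method than the base class FilterBase, compare, so they need an operator passed in that defines how the comparison is performed.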

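Operator	Description
LESS	Matches values less than the given value
LESS_OR_EQUAL	Matches values less than or equal to the given value
EQUAL	Matches values equal to the given value
NOT_EQUAL	Matches values not equal to the given value
GREATER_OR_EQUAL	Matches values greater than or equal to the given value
GREATER	Matches values greater than the given value
NO_OP	Excludes everything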
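
As a rough illustration of the operator table, the following plain-Java sketch maps each operator to a test on the result of a byte-wise comparison. The enum Op and the class CompareOpSketch are hypothetical names made up for this example. The code does not use the HBase classes, and it assumes an unsigned lexicographic byte comparison similar to what Bytes.compareTo performs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the comparison operators; this is not the HBase CompareOp enum.
enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, NO_OP }

final class CompareOpSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns true when "stored <op> threshold" holds.
    static boolean matches(Op op, byte[] stored, byte[] threshold) {
        int c = compareBytes(stored, threshold);
        switch (op) {
            case LESS:             return c < 0;
            case LESS_OR_EQUAL:    return c <= 0;
            case EQUAL:            return c == 0;
            case NOT_EQUAL:        return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER:          return c > 0;
            default:               return false; // NO_OP matches nothing
        }
    }

    // Convenience overload for string keys (assumed to be UTF-8 encoded).
    static boolean matches(Op op, String stored, String threshold) {
        return matches(op, stored.getBytes(StandardCharsets.UTF_8),
                threshold.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, matches(Op.LESS_OR_EQUAL, "110115199402265244", "110115199402265244") holds, while the same call with Op.LESS does not.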

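Comparators

The second type that CompareFilter requires is a comparator. Comparators provide several ways of comparing keys and values. In the older API they all inherit from WritableByteArrayComparable, which implements the Writable and Comparable interfaces; the code in this post uses the newer ByteArrayComparable base class instead.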

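Comparator	Description
BinaryComparator	Compares the current value with the threshold using Bytes.compareTo
BinaryPrefixComparator	Also uses Bytes.compareTo, but matches only the prefix, starting from the left
NullComparator	Does no comparison; only checks whether the current value is null
BitComparator	Performs a bitwise comparison using the AND, OR, and XOR operations provided by the BitwiseOp class
RegexStringComparator	Matches table data against a regular expression supplied when the comparator is instantiated
SubstringComparator	Treats the threshold and the table data as String instances and matches with a contains operation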

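Note: the last three comparators (BitComparator, RegexStringComparator, and SubstringComparator) can only be combined with the EQUAL and NOT_EQUAL operators, because their compareTo method returns 0 on a match and 1 otherwise. Combining them with LESS or GREATER produces wrong results.

String-based comparators such as RegexStringComparator and SubstringComparator are slower and more resource-intensive than byte-based ones, because every comparison has to convert the given value to a String first. Substring extraction and regular-expression processing also take extra time.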
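
To see why a comparator that only returns 0 or 1 pairs sensibly with EQUAL and NOT_EQUAL alone, consider this small plain-Java sketch. The class ZeroOneComparatorSketch and its compareTo method are hypothetical and only imitate the 0-on-match, 1-otherwise contract described here; they are not the HBase SubstringComparator.

```java
final class ZeroOneComparatorSketch {

    // Hypothetical 0/1 comparator: 0 when value contains substring, 1 otherwise.
    static int compareTo(String substring, String value) {
        return value.contains(substring) ? 0 : 1;
    }

    public static void main(String[] args) {
        String[] values = { "lisi", "zhangsan" };
        for (String value : values) {
            int c = compareTo("si", value);
            // EQUAL and NOT_EQUAL only ask whether c is 0, which is all this comparator can say.
            System.out.println(value + ": c=" + c
                    + " EQUAL=" + (c == 0)
                    + " NOT_EQUAL=" + (c != 0)
                    + " c<0=" + (c < 0)     // never true: the result carries no ordering
                    + " c>0=" + (c > 0));   // identical to NOT_EQUAL
        }
    }
}
```

Because the result is never negative, an ordering test on it either never holds or merely repeats NOT_EQUAL, so it says nothing about which value is "smaller".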

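Comparison filters

The constructor of every comparison filter has the same signature, inherited from CompareFilter:

public CompareFilter(final CompareOp compareOp,
      final ByteArrayComparable comparator) {
      this.compareOp = compareOp;
      this.comparator = comparator;
  }

Note: filters in HBase were originally meant to screen out useless information, and whatever is filtered out is not sent to the client. In general, a filter does not specify what the user wants; it withholds what the user does not want while the data is being read.

The CompareFilter-based filters work the other way around: they return the values that match. Choose a filter carefully according to its particular rules.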

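HBase connection code

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

// HBaseFilterDemo is a placeholder class name; the test methods in the following sections are its static members.
public class HBaseFilterDemo {

	// ZooKeeper quorum hosts and client port
	private static String addr="node233,node232,node231";
	private static  String port="2181";

	private static Connection connection;
	
	public static void getConnection(){
		Configuration conf = HBaseConfiguration.create();

		conf.set("hbase.zookeeper.quorum",addr);
		conf.set("hbase.zookeeper.property.clientPort", port);
		try {
			connection = ConnectionFactory.createConnection(conf);
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	/*
	 * Close the connection
	 */
	public static void close() {
		if (connection != null) {
			try {
				connection.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	
	public static void main(String[] args) {
		getConnection();
		try {
			columnPaginationFilterTest();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			// Close the connection even if the test method throws
			close();
		}
	}

	// The test methods from the following sections go here.
}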

1. Row filter (RowFilter)

The row filter filters data based on the row key.

/**
	 * Filter by row key
	 * @throws IOException
	 */
	public static void rowFilterTest() throws IOException {
		Table table = connection.getTable(TableName.valueOf("test_hbase"));
		
		Scan scan = new Scan();
		scan.addColumn(Bytes.toBytes("userInfo"), Bytes.toBytes("id"));
		Filter filter = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("110115199402265244")));
		scan.setFilter(filter);
		
		ResultScanner scanner = table.getScanner(scan);
		//keyvalues={110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0}
		for (Result result : scanner) {
			System.out.println(result);
		}
		scanner.close();
		
		System.out.println("==============filter2===============");
		// Select row keys that end with 244; the regex is applied as a search, so "$" anchors it to the end
		Filter filter2 = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(".*244$") );
		scan.setFilter(filter2);
		ResultScanner scanner2 = table.getScanner(scan);
		for (Result result : scanner2) {
			System.out.println(result);
		}
		scanner2.close();
		
		System.out.println("==============filter3===============");
		// Select row keys that contain 244
		Filter filter3 = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("244"));
		scan.setFilter(filter3);
		ResultScanner scanner3 = table.getScanner(scan);
		for (Result result : scanner3) {
			Cell[] cells = result.rawCells();
			for (Cell cell : cells) {
				System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
			}
			System.out.println(result);
		}
		scanner3.close();
	}

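2. Family filter (FamilyFilter)

This filter is similar to the row filter, but it returns results by comparing column families rather than row keys. By combining different operators and comparators, users can select the data they need at the column-family level.

/**
     * Family filter
     * @throws IOException
     */
    public static void familyFilterTest() throws IOException {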
       Table table = connection.getTable(TableName.valueOf("test_hbase"));
       Filter filter = new FamilyFilter(CompareOp.LESS, new BinaryComparator(Bytes.toBytes("userInfo")));
       Scan scan = new Scan();
       scan.setFilter(filter);
       // Scan the table using the filter
       ResultScanner scanner = table.getScanner(scan);
       //keyvalues={110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0}
       //keyvalues={110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0}
       for (Result result : scanner) {
           System.out.println(result);
       }
       scanner.close();
       System.out.println("============get==============");
       // Get a single row using the same filter
       //Result of get():keyvalues={110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0}
       Get get = new Get(Bytes.toBytes("110115199402265244"));
       get.setFilter(filter);
       Result result = table.get(get);
       System.out.println("Result of get():" + result);
       System.out.println("============filter=============");
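       // Create a filter on one column family while the get is restricted to a different family;
       // fetching the same row with the new filter now returns NONE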
       //Result of get():keyvalues=NONE
       Filter filter2 = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("userInfo1")));
       Get get2 = new Get(Bytes.toBytes("110115199402265244"));
       get2.addFamily(Bytes.toBytes("tag"));
       get2.setFilter(filter2);
       Result result2 = table.get(get2);
       System.out.println("Result of get():" + result2);
    }

3. Qualifier filter (QualifierFilter)

Filters cells by comparing the column qualifier (column name).

/**
     * Qualifier filter (QualifierFilter)
     * @throws IOException
     */
    public static void qualifierFilterTest() throws IOException {
       Table table = connection.getTable(TableName.valueOf("test_hbase"));
       Filter filter = new QualifierFilter(CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("basic_222")));
       Scan scan = new Scan();
       scan.setFilter(filter);
       ResultScanner scanner = table.getScanner(scan);
       //keyvalues={110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0}
       for (Result result : scanner) {
           System.out.println(result);
       }
       scanner.close();
       Get get = new Get(Bytes.toBytes("110115199402265244"));
       get.setFilter(filter);
       Result result = table.get(get);
       //Result of get():keyvalues={110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0}
       System.out.println("Result of get():" + result);
    }

4. Value filter (ValueFilter)

Selects cells that have a particular value.

/**
     * Value filter
     * @throws IOException
     */
    public static void valueFilterTest() throws IOException {
       Table table = connection.getTable(TableName.valueOf("test_hbase"));
       Filter filter = new ValueFilter(CompareOp.EQUAL, new SubstringComparator("si"));
       Scan scan = new Scan();
       scan.setFilter(filter);
       ResultScanner scanner = table.getScanner(scan);
       //CELL: 110115199402265245/userInfo:name/1535527529043/Put/vlen=4/seqid=0,Value:lisi
       for (Result result : scanner) {
           for (Cell cell : result.rawCells()) {
              System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
           }
       }
       scanner.close();
       System.out.println("===============get===============");
       Get get = new Get(Bytes.toBytes("110115199402265245"));
       get.setFilter(filter);
       Result result = table.get(get);
       for (Cell cell : result.rawCells()) {
           System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
       }
    }

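5. Dependent column filter (DependentColumnFilter)

This filter lets the user specify a reference column (also called a dependent column) and uses it to control the filtering of the other columns. It takes the timestamp of the reference column and includes every column that has the same timestamp.

Constructors: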

public DependentColumnFilter(final byte [] family, final byte[] qualifier,
      final boolean dropDependentColumn, final CompareOp valueCompareOp,
      final ByteArrayComparable valueComparator)
public DependentColumnFilter(final byte [] family, final byte [] qualifier)
public DependentColumnFilter(final byte [] family, final byte [] qualifier,
      final boolean dropDependentColumn)

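A comparison operator and a reference value can be passed in to enable ValueFilter-style checks. The shorter constructors omit the operator and comparator, which disables value-based filtering, so the filter then selects purely on the timestamp of the reference column.

The dropDependentColumn parameter controls what happens to the reference column itself: false returns it with the results, true drops it.

/**
     * Dependent column filter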
     * @throws IOException
     */
    public static void dependentColumnFilterTest() throws IOException {
       //CELL: 110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0,Value:223
       filter(true, CompareOp.NO_OP, null);
       System.out.println("----------filter(false, CompareOp.NO_OP, null);--------");
       //CELL: 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0,Value:"张三"
       //CELL: 110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0,Value:223
       //CELL: 110115199402265245/userInfo:name/1535527529043/Put/vlen=4/seqid=0,Value:lisi
       filter(false, CompareOp.NO_OP, null);
       System.out.println("----------filter(true, CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes(\"li\")));--------");
       //CELL: 110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0,Value:223
       filter(true, CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("li")));
       System.out.println("----------filter(false, CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes(\"li\")));--------");
       //CELL: 110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0,Value:223
       //CELL: 110115199402265245/userInfo:name/1535527529043/Put/vlen=4/seqid=0,Value:lisi
       filter(false, CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("li")));
       System.out.println("----------filter(true, CompareOp.EQUAL, new RegexStringComparator(\".*.si\"));--------");
       //CELL: 110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0,Value:223
       filter(true, CompareOp.EQUAL, new RegexStringComparator(".*.si"));
       System.out.println("----------filter(false, CompareOp.EQUAL, new RegexStringComparator(\".*.si\"));--------");
       //CELL: 110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0,Value:223
       //CELL: 110115199402265245/userInfo:name/1535527529043/Put/vlen=4/seqid=0,Value:lisi
       filter(false, CompareOp.EQUAL, new RegexStringComparator(".*.si"));
    }
    public static void filter(boolean drop, CompareOp operator, ByteArrayComparable comparator) throws IOException {
       Table table = connection.getTable(TableName.valueOf("test_hbase"));
       Filter filter;
       if (null != comparator) {
           filter = new DependentColumnFilter(Bytes.toBytes("userInfo"), Bytes.toBytes("name"),
                  drop, operator, comparator);
       } else {
           filter = new DependentColumnFilter(Bytes.toBytes("userInfo"), Bytes.toBytes("name"), drop);
       }
       Scan scan = new Scan();
       scan.setFilter(filter);
       ResultScanner scanner = table.getScanner(scan);
       for (Result result : scanner) {
           for (Cell cell : result.rawCells()) {
              System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
           }
       }
       scanner.close();
    }

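This filter is not compatible with the batching feature of scan operations. The filter needs to see the entire row to decide which data to filter out; with batching, the fetched data may not include the reference column, which makes the result wrong.

Dedicated filters

The dedicated filters that HBase provides extend FilterBase directly and target more specific use cases. Some of them can only filter whole rows and are therefore only suitable for scan operations.

1. Single column value filter (SingleColumnValueFilter)

Uses the value of one column to decide whether a row is filtered out.

Constructors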

public SingleColumnValueFilter(final byte [] family, final byte [] qualifier,
      final CompareOp compareOp, final byte[] value)
public SingleColumnValueFilter(final byte [] family, final byte [] qualifier,
      final CompareOp compareOp, final ByteArrayComparable comparator)

Helper methods for fine-tuning the filter

public boolean getFilterIfMissing()
public void setFilterIfMissing (boolean filterIfMissing)
public boolean getLatestVersionOnly()
public void setLatestVersionOnly(boolean latestVersionOnly)

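setFilterIfMissing: controls how a row is handled when the reference column does not exist in it. By default such rows are included in the result. Calling setFilterIfMissing(true) filters out every row that does not contain the reference column.

Note: the scan must include the reference column, for example by adding it with a method such as addColumn. If the scan result does not include the reference column, the result may be empty or may contain all rows, depending on the value passed to setFilterIfMissing.

setLatestVersionOnly: changes which versions the filter checks. The default is true, in which case the filter checks only the latest version of the reference column; when set to false, it checks all versions.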
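
The effect of setFilterIfMissing can be pictured with a simplified plain-Java model. Everything in it is an assumption made for illustration: a row is reduced to a map from "family:qualifier" to a single value, versions are ignored, and the names SingleColumnValueSketch and includeRow are hypothetical rather than part of HBase.

```java
import java.util.Map;
import java.util.function.Predicate;

final class SingleColumnValueSketch {

    // Decides whether a row is kept, given a condition on one reference column.
    // row: simplified row, keyed by "family:qualifier" with one value per column.
    static boolean includeRow(Map<String, String> row, String referenceColumn,
                              Predicate<String> condition, boolean filterIfMissing) {
        String value = row.get(referenceColumn);
        if (value == null) {
            // Reference column absent: keep the row unless filterIfMissing is set.
            return !filterIfMissing;
        }
        // Reference column present: the condition on its value decides.
        return condition.test(value);
    }
}
```

With the condition "name does not contain si", a row whose userInfo:name is "lisi" is dropped, and a row without userInfo:name is kept only when filterIfMissing is false.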

/**
	 * Single column value filter
	 * @throws IOException
	 * Rows whose name contains "si" are excluded (NOT_EQUAL)
	 */
	public static void singleColumnValueFilterTest() throws IOException {
		Table table = connection.getTable(TableName.valueOf("test_hbase"));
		
		SingleColumnValueFilter filter = 
				new SingleColumnValueFilter(Bytes.toBytes("userInfo"), Bytes.toBytes("name"),
						CompareOp.NOT_EQUAL, new SubstringComparator("si"));
		filter.setFilterIfMissing(true);
		
		/**
		 * 	CELL: 110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0,Value:222
			CELL: 110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0,Value:110115199402265244
			CELL: 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0,Value:"张三"
			CELL: 110115199402265244/userInfo:row/1535525614718/Put/vlen=5/seqid=0,Value:row-1
		 */
		Scan scan = new Scan();
		scan.setFilter(filter);
		ResultScanner scanner = table.getScanner(scan);
		for (Result result : scanner) {
			for (Cell cell : result.rawCells()) {
				System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
			}
		}
		scanner.close();
		
		/**
		 * 	Result of get():keyvalues={110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0, 110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0, 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0, 110115199402265244/userInfo:row/1535525614718/Put/vlen=5/seqid=0}
			CELL: 110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0,Value:222
			CELL: 110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0,Value:110115199402265244
			CELL: 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0,Value:"张三"
			CELL: 110115199402265244/userInfo:row/1535525614718/Put/vlen=5/seqid=0,Value:row-1
		 */
		Get get = new Get(Bytes.toBytes("110115199402265244"));
		get.setFilter(filter);
		Result result = table.get(get);
		System.out.println("Result of get():" + result);
		for (Cell cell : result.rawCells()) {
			System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
		}
	}

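2. Single column value exclude filter (SingleColumnValueExcludeFilter)

This filter extends SingleColumnValueFilter, with the difference that the reference column is not included in the result. The same features and methods control how the filter works; the only difference is that the Result instance on the client never contains the reference column that was checked.

3. Prefix filter (PrefixFilter)

A prefix is passed in when the filter is constructed, and all rows whose key matches the prefix are returned to the client. This is very useful in scan operations.

/**
	 * Prefix filter
	 * @throws IOException 
	 * Returns the rows that match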
	 */
	public static void prefixFilterTest() throws IOException {
		Table table = connection.getTable(TableName.valueOf("test_hbase"));
		
		Filter filter = new PrefixFilter(Bytes.toBytes("110"));
		Scan scan = new Scan();
		scan.setFilter(filter);
		ResultScanner scanner = table.getScanner(scan);
		/**
		 * 	CELL: 110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0,Value:222
			CELL: 110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0,Value:110115199402265244
			CELL: 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0,Value:"张三"
			CELL: 110115199402265244/userInfo:row/1535525614718/Put/vlen=5/seqid=0,Value:row-1
			CELL: 110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0,Value:223
			CELL: 110115199402265245/userInfo:id/1535524869900/Put/vlen=18/seqid=0,Value:110115199402265245
			CELL: 110115199402265245/userInfo:name/1535527529043/Put/vlen=4/seqid=0,Value:lisi
			CELL: 110115199402265245/userInfo:row/1535525622873/Put/vlen=5/seqid=0,Value:row-2
		 */
		for (Result result : scanner) {
			for (Cell cell : result.rawCells()) {
				System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
			}
		}
		scanner.close();
	}

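A scan proceeds in lexicographic order, and the scan ends as soon as it encounters a row that sorts after the prefix. Combined with a start row, this greatly improves scan performance, because once the filter sees that the following rows cannot match, it skips all of them.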
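
One way to picture "the first row that sorts after the prefix" is to compute that boundary key directly. The sketch below is plain Java and is only an illustration: the class PrefixRangeSketch and the method firstKeyAfterPrefix are hypothetical names, and the code assumes row keys are ordered as unsigned bytes. Using the prefix itself as the start row and this boundary as an exclusive stop row would bound a prefix scan on both sides.

```java
import java.util.Arrays;

final class PrefixRangeSketch {

    // Returns the smallest key that sorts after every key starting with prefix,
    // or an empty array when no such key exists (empty prefix or all 0xFF bytes).
    static byte[] firstKeyAfterPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] boundary = Arrays.copyOf(prefix, end);
        boundary[end - 1]++; // increment the last remaining byte
        return boundary;
    }
}
```

For the prefix "110" used in the example above, the boundary is "111".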

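4. Page filter (PageFilter)

This filter paginates results by row. The pageSize parameter, given when the filter instance is created, controls how many rows each page returns.

Note: keep the following points in mind when the filter runs in parallel on physically separate servers.

Filters running in parallel on different region servers cannot share their current state or boundaries, so each filter may collect up to pageSize rows before its part of the scan completes. As a result the page filter can return more rows than requested. When merging, the client can either return everything or trim the result to the required size through the API.

The client code records the last row of the current scan and, on the next fetch, uses that recorded row as the start row of the new scan while keeping the same filter. It then iterates in this way.

Paging puts a strict limit on the number of rows returned at a time. A single scan may well cover more rows than the page size, and when that happens the filter has a mechanism to tell the region server to stop scanning.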

/**
	 * Page filter
	 * @throws IOException
	 */
	public static void pageFilterTest() throws IOException {
		Table table = connection.getTable(TableName.valueOf("socialSecurity"));
		int pageCount = 10;
		Filter filter = new PageFilter(pageCount);
		int totalRows = 0;
		byte[] lastRow = null;
		while(true) {
			Scan scan = new Scan();
			scan.setFilter(filter);
			if (lastRow != null) {
				// Append a single 0x00 byte so the scan resumes just after the last row already returned
				byte[] startRow = Bytes.add(lastRow, new byte[] { 0x00 });
				System.out.println("start row:" + Bytes.toStringBinary(startRow));
				scan.setStartRow(startRow);
			}
			ResultScanner scanner = table.getScanner(scan);
			int localRows = 0;
			Result result;
			while ((result = scanner.next()) != null) {
				System.out.println(localRows++ + ": " + result);
				totalRows++;
				lastRow = result.getRow();
			}
			scanner.close();
			if (localRows == 0 || localRows < pageCount) {
				break;
			}
		}
		System.out.println("total rows: " + totalRows);
	}

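Row keys in HBase are sorted lexicographically, so results come back in that order, and the start row is included in the result. To keep the last row of the previous page from being returned again, append a zero byte (a single byte with the value 0x00, not a zero-length array) to the previous row key. A zero byte is the most reliable way to reset the scan boundary because it is the smallest possible increment. Even if a row exists whose key is exactly the previous key plus a zero byte, nothing is lost in the next round, because the start row is inclusive.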
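
The zero-byte trick can be checked without a cluster. The following plain-Java sketch uses hypothetical names (NextRowSketch, nextStartRow, compare) and assumes row keys are ordered as unsigned bytes. It shows that appending one 0x00 byte yields a key that sorts strictly after the previous key and before any other longer key with the same leading bytes, whereas appending a zero-length array leaves the key unchanged.

```java
import java.util.Arrays;

final class NextRowSketch {

    // Returns lastRow followed by one 0x00 byte: the smallest key that sorts strictly after lastRow.
    static byte[] nextStartRow(byte[] lastRow) {
        return Arrays.copyOf(lastRow, lastRow.length + 1); // the added byte is zero-filled
    }

    // Unsigned lexicographic comparison, the order assumed for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

In a paging loop, the key returned by nextStartRow would be passed as the inclusive start row of the next scan.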

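5. Key-only filter (KeyOnlyFilter)

Some applications need only the keys of the KeyValue instances in a result, not the actual data. KeyOnlyFilter modifies the scanned columns and cells accordingly: it uses the KeyValue.convertToKeyOnly(boolean) method to return the key without the value.

public KeyOnlyFilter(boolean lenAsVal)

The constructor takes a boolean parameter named lenAsVal, which is passed on to convertToKeyOnly and controls what happens to the value of each KeyValue instance. The default is false. With false, the value is replaced by a zero-length byte array; with true, the value is replaced by a byte array that holds the length of the original value.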
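
The two outcomes of lenAsVal can be sketched in plain Java. This is a model, not the HBase code: the names KeyOnlyValueSketch and keyOnlyValue are hypothetical, and the sketch assumes the length is written as a 4-byte big-endian integer.

```java
import java.nio.ByteBuffer;

final class KeyOnlyValueSketch {

    // Returns the value that would replace the original one in a key-only result.
    static byte[] keyOnlyValue(byte[] originalValue, boolean lenAsVal) {
        if (!lenAsVal) {
            return new byte[0]; // value dropped entirely
        }
        // Assumption: the original length is stored as a 4-byte big-endian int.
        return ByteBuffer.allocate(Integer.BYTES).putInt(originalValue.length).array();
    }
}
```

For the 4-byte value "lisi", lenAsVal=false gives an empty array and lenAsVal=true gives the bytes 0, 0, 0, 4.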

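6. First-key-only filter (FirstKeyOnlyFilter)

This filter fits cases where only the first column of each row is needed. It is typically used for row counting, where it is enough to check that a row exists: in a column-oriented store, a row that exists must have at least one column.

Because columns are also sorted lexicographically, another use case is column names generated in chronological order: the oldest name sorts first, so the column with the oldest timestamp is retrieved first.

The class also uses another optimization offered by the filter framework: after checking the first column it tells the region server to stop scanning the current row and move on to the next one, which performs better than a full table scan.

7. Inclusive stop filter (InclusiveStopFilter)

In a scan, the start row is included in the result but the stop row is excluded. With this filter, the stop row can be included as well.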
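
The difference between an exclusive and an inclusive stop row can be modelled with a sorted set. The sketch below is plain Java with hypothetical names (InclusiveStopSketch, scan); it stands in for a table whose row keys are ASCII strings, so that String ordering agrees with byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class InclusiveStopSketch {

    // Returns the row keys in [start, stop) or, when includeStop is true, in [start, stop].
    static List<String> scan(NavigableSet<String> rowKeys, String start, String stop,
                             boolean includeStop) {
        // The start row is always inclusive; only the treatment of the stop row changes.
        return new ArrayList<>(rowKeys.subSet(start, true, stop, includeStop));
    }
}
```

With the keys 110115199402265244 and 110115199402265245 as start and stop, includeStop=false returns only the first key and includeStop=true returns both.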

/**
	 * Inclusive stop filter
	 * @throws IOException 
	 */
	public static void inclusiveStopFilterTest() throws IOException {
		Table table = connection.getTable(TableName.valueOf("test_hbase"));
		
		Filter filter = new InclusiveStopFilter(Bytes.toBytes("110115199402265245"));
		Scan scan = new Scan();
		scan.setStartRow(Bytes.toBytes("110115199402265244"));
		//keyvalues={110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0, 110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0, 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0, 110115199402265244/userInfo:row/1535525614718/Put/vlen=5/seqid=0}
		//keyvalues={110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0, 110115199402265245/userInfo:id/1535524869900/Put/vlen=18/seqid=0, 110115199402265245/userInfo:name/1535527529043/Put/vlen=4/seqid=0, 110115199402265245/userInfo:row/1535525622873/Put/vlen=5/seqid=0}
		scan.setFilter(filter);
		
		//keyvalues={110115199402265244/tag:basic_222/1535079521796/Put/vlen=3/seqid=0, 110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0, 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0, 110115199402265244/userInfo:row/1535525614718/Put/vlen=5/seqid=0}
		//scan.setStopRow(Bytes.toBytes("110115199402265245"));
		ResultScanner scanner = table.getScanner(scan);
		for (Result result : scanner) {
			System.out.println(result);
		}
		scanner.close();
	}

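8. Timestamps filter (TimestampsFilter)

This filter gives fine-grained control over which versions appear in a scan result. It takes a List instance holding the timestamps:

public TimestampsFilter(List<Long> timestamps)

A version is the value of a column at a particular time and is therefore identified by a timestamp. Given a list of timestamps, the filter finds the column versions whose timestamps match one of them exactly.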
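
Exact timestamp matching amounts to a set membership test per version. The plain-Java sketch below illustrates this with hypothetical names (TimestampsSketch, matchingVersions); it models the versions of one column as a list of timestamps and does not involve HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TimestampsSketch {

    // Keeps only the versions whose timestamp is exactly one of the wanted timestamps.
    static List<Long> matchingVersions(List<Long> versionTimestamps, Set<Long> wanted) {
        List<Long> matches = new ArrayList<>();
        for (Long ts : versionTimestamps) {
            if (wanted.contains(ts)) {
                matches.add(ts);
            }
        }
        return matches;
    }
}
```

A version stamped 1535527529043 is skipped when the wanted set holds only 1535080434895, 1535527096150, and 1535079590454, because no entry matches it exactly.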

/**
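	 * Timestamps filter
	 * @throws IOException
	 * If several timestamps match the same row, the row is returned only once
	 * If several timestamps match the same column, the newest matching version is returned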
	 */
	public static void timestampsFilterTest() throws IOException {
		Table table = connection.getTable(TableName.valueOf("test_hbase"));
		List<Long> ts = new ArrayList<Long>();
		ts.add(1535080434895L);
		ts.add(1535527096150L);
		ts.add(1535079590454L);
		Filter filter = new TimestampsFilter(ts);
		
		/**
		 * 	keyvalues={110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0}
			keyvalues={110115199402265245/tag:basic_223/1535079590454/Put/vlen=3/seqid=0, 110115199402265245/userInfo:name/1535079590454/Put/vlen=4/seqid=0}
		 */
		Scan scan = new Scan();
		scan.setFilter(filter);
		ResultScanner scanner = table.getScanner(scan);
		for (Result result : scanner) {
			for (Cell cell : result.rawCells()) {
				System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
			}
		}
		scanner.close();
		
		System.out.println("==========scan2==========");
		// Combine the filter with a time range: a version is returned only if its timestamp
		// is in the list and also falls inside [1535080434895, 1535527096149)
		Scan scan2 = new Scan();
		scan2.setFilter(filter);
		scan2.setTimeRange(1535080434895L, 1535527096149L);
		ResultScanner scanner2 = table.getScanner(scan2);
		for (Result result : scanner2) {
			for (Cell cell : result.rawCells()) {
				System.out.println("CELL: " + cell + ",Value:" + Bytes.toString(CellUtil.cloneValue(cell)));
			}
		}
		scanner2.close();
	}

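9. Column count filter (ColumnCountGetFilter)

This filter limits the maximum number of columns retrieved per row.

public ColumnCountGetFilter(final int n)

Once a row reaches the configured maximum number of columns, the filter stops the entire scan, so it is not well suited to scans and is better used with get.

10. Column pagination filter (ColumnPaginationFilter)

Similar to PageFilter, this filter paginates over the columns of a row. The constructor is: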

public ColumnPaginationFilter(final int limit, final int offset)

It skips all columns whose index is lower than offset and then includes at most limit columns from that point on.
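
The limit and offset semantics can be illustrated on an ordinary list. This plain-Java sketch uses hypothetical names (ColumnPageSketch, page), models the sorted columns of a row as a list, and assumes limit and offset are non-negative.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnPageSketch {

    // Skips the first offset columns and returns at most limit of the columns that follow.
    static <T> List<T> page(List<T> columns, int limit, int offset) {
        int from = Math.min(offset, columns.size());
        int to = Math.min(from + limit, columns.size());
        return new ArrayList<>(columns.subList(from, to));
    }
}
```

For a row with the columns tag:basic_222, userInfo:id, userInfo:name, and userInfo:row, page(columns, 2, 1) yields userInfo:id and userInfo:name.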

/**
	 * Column pagination filter
	 * @throws IOException 
	 */
	public static void columnPaginationFilterTest() throws IOException {
		Table table = connection.getTable(TableName.valueOf("test_hbase"));
		// Return two columns, starting at column index 1 (the second column)
		Filter filter = new ColumnPaginationFilter(2, 1);
		/**
		 * 	keyvalues={110115199402265244/userInfo:id/1535524861515/Put/vlen=18/seqid=0, 110115199402265244/userInfo:name/1535527096150/Put/vlen=8/seqid=0}
			keyvalues={110115199402265245/userInfo:id/1535524869900/Put/vlen=18/seqid=0, 110115199402265245/userInfo:name/1535527529043/Put/vlen=4/seqid=0}
		 */
		Scan scan = new Scan();
		scan.setFilter(filter);
		ResultScanner scanner = table.getScanner(scan);
		for (Result result : scanner) {
			System.out.println(result);
		}
		scanner.close();
	}

11. Column prefix filter (ColumnPrefixFilter)

Similar to PrefixFilter, this filter matches on a prefix of the column name.

public ColumnPrefixFilter(final byte [] prefix)

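12. Random row filter (RandomRowFilter)

This filter includes a random selection of rows in the result. The constructor is:

public RandomRowFilter(float chance)

Internally the filter uses the Java method Random.nextFloat() to decide whether a row is filtered, comparing the result with the chance value supplied by the user. A negative chance filters out all rows, whereas a chance greater than 1.0 includes every row in the result set.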
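
The role of chance can be sketched in a few lines of plain Java. The names RandomRowSketch, includeRow, and countIncluded are hypothetical, and the comparison direction (keep the row when the random number is below chance) is an assumption that matches the behaviour described for negative values and values above 1.0.

```java
import java.util.Random;

final class RandomRowSketch {

    // Keeps a row when a uniform random number in [0.0, 1.0) is below chance.
    static boolean includeRow(Random random, float chance) {
        return random.nextFloat() < chance;
    }

    // Counts how many of the given number of rows would be kept.
    static int countIncluded(Random random, float chance, int rows) {
        int included = 0;
        for (int i = 0; i < rows; i++) {
            if (includeRow(random, chance)) {
                included++;
            }
        }
        return included;
    }
}
```

Since nextFloat() never returns a negative number or a number of 1.0 or more, a negative chance keeps no rows and a chance above 1.0 keeps all of them.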

Reposted from: https://www.cnblogs.com/EnzoDin/p/9564160.html
