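Learning CQEngine
0x01 Summary
One of my projects needed to query a collection of entity objects by several different fields. Traditional data structures such as List and HashMap do not handle that well, so after some searching online I settled on CQEngine.
This article records CQEngine's basic concepts and some of its usage, with small examples along the way.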
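0x02 What is CQEngine
CQEngine, short for Collection Query Engine, is, as the name suggests, a query engine for collections. It lets us run SQL-like queries over Java collections efficiently.
Specifically, CQEngine has the following features:
- Query throughput in the millions of queries per second
- Query latency measured in microseconds
- Can offload query traffic from the database
- Outperforms ordinary databases
- Supports on-heap, off-heap, and disk persistence
- Supports MVCC transaction isolation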
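The traditional way to search for objects in a container is to iterate over it, which is very inefficient, and optimizing it requires detailed knowledge of how the container is built.
CQEngine performs far better. In the project's benchmark, which compares CQEngine with iteration on a range-type query, the CQEngine results are:
- 1116071 QPS (single CPU core, 1.8 GHz)
- Average query response time of 0.896 microseconds
- CQEngine is 330187.50% faster than naive iteration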
- CQEngine is 325727.53% faster than optimized iteration
For more on the benchmark, see the CQEngine Benchmark page of the project documentation.
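0x03 Basic concepts of CQEngine
3.1 Index
3.1.1 Index optimizations
CQEngine reduces the time complexity of searches by building indexes on the fields of the objects in a collection and by applying algorithms based on the rules of set theory, which lets it beat iteration on both scalability and latency.
Specifically, CQEngine's indexes and query engine offer the following:
- Simple indexes: can be added to any number of individual fields of the objects in a collection, so that queries on those fields run in O(1) time. This is similar to an ordinary index in a database.
- Multiple indexes: several indexes can be added to the same field, each optimized for a different type of query (equality, range, and so on).
- Compound indexes: span multiple fields, so that queries referencing that combination of fields run in O(1) time. This is similar to a composite index in a database.
- Nested queries: supported, for example "WHERE color = 'blue' AND (NOT (doors = 2 OR price > 53.00))".
- Standing query indexes: serve complex queries or nested query fragments in O(1) time, no matter how many fields they reference.
- Statistical query plan optimization: when several fields have suitable indexes, CQEngine uses statistics from those indexes to choose among them and build the execution plan with the lowest time complexity. The complexity may then be greater than O(1) but is still below O(n).
- Iteration fallback: if no suitable index is available for a query, CQEngine evaluates it lazily by iterating over the collection.
- Full concurrency: objects can be added to or removed from the collection while queries are running, and CQEngine updates all registered indexes in real time, transparently to the user.
- Type safety: nearly all query errors surface at compile time rather than at run time. All indexes and queries are strongly typed with generics at both the object level and the field level, which reduces the chance of mistakes.
- On-heap/off-heap/disk: objects can be stored in any of the following places:
  - On-heap, like a traditional Java collection
  - Off-heap, in native memory outside the Java heap
  - On disk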
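To make the O(1) claim for simple indexes more concrete, here is a minimal sketch in plain Java that contrasts a full scan with a lookup through a hand-built hash index. It does not use CQEngine at all: the class SimpleIndexSketch, its nested Car type, and the method names are hypothetical names of mine, and a HashMap merely stands in for whatever structure a real index uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of CQEngine.
class SimpleIndexSketch {
    // Minimal entity with two queryable fields.
    static final class Car {
        final int carId;
        final String color;

        Car(int carId, String color) {
            this.carId = carId;
            this.color = color;
        }
    }

    private final List<Car> cars = new ArrayList<>();
    // The "index": maps each color to the cars that have it.
    private final Map<String, List<Car>> byColor = new HashMap<>();

    // Adding an object also updates the index, so the two stay in sync.
    void add(Car car) {
        cars.add(car);
        byColor.computeIfAbsent(car.color, k -> new ArrayList<>()).add(car);
    }

    // O(n): inspects every element of the collection.
    List<Car> findByColorScan(String color) {
        List<Car> result = new ArrayList<>();
        for (Car car : cars) {
            if (car.color.equals(color)) {
                result.add(car);
            }
        }
        return result;
    }

    // Expected O(1) to locate the matches: a single hash lookup, no scan.
    List<Car> findByColorIndexed(String color) {
        return byColor.getOrDefault(color, Collections.emptyList());
    }

    public static void main(String[] args) {
        SimpleIndexSketch collection = new SimpleIndexSketch();
        collection.add(new Car(1, "blue"));
        collection.add(new Car(2, "red"));
        collection.add(new Car(3, "blue"));
        System.out.println(collection.findByColorScan("blue").size());    // prints 2
        System.out.println(collection.findByColorIndexed("blue").size()); // prints 2
    }
}
```

Both methods return the same cars. The difference is only in how much of the collection has to be touched to find them, which is the gap an index is meant to close.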
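3.1.2 Implementations of IndexedCollection
CQEngine provides three indexed collection implementations, which differ in their concurrency and transaction isolation support:
3.1.2.1 ConcurrentIndexedCollection
- Implements IndexedCollection
- No transaction isolation
- The addIndex method adds an index for queries to use, which speeds them up
- Indexes are updated automatically when elements are added
- Concurrent reads are always thread-safe
- Concurrent writes (add/remove) are thread-safe as long as they operate on different objects in the collection
- It is not thread-safe when several threads try to add or remove the same element (the same instance, or objects with the same hashCode for which equals returns true). The indexes can then get out of sync and queries can return inconsistent results. In that case use the subclass ObjectLockingIndexedCollection, which makes concurrent writes thread-safe at all times.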
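3.1.2.2 ObjectLockingIndexedCollection
- Extends ConcurrentIndexedCollection
- Lock-free concurrent reads
- Locked writes (object-level transaction isolation and consistency guarantees). In particular, a dedicated locking mechanism keeps concurrent add/remove of the same object thread-safe
- Concurrency control is based on striped locks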
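For readers who have not met lock striping before, the idea can be sketched in plain Java as below. This is only my illustration of the general technique, not CQEngine's actual implementation: the class StripedLockSketch and all of its members are hypothetical names. A fixed array of locks is shared by all objects, and an object's hash code decides which lock guards it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock striping; not CQEngine's implementation.
class StripedLockSketch<T> {
    private final ReentrantLock[] stripes;
    private final Set<T> elements = ConcurrentHashMap.newKeySet();

    StripedLockSketch(int stripeCount) {
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    // Equal objects have equal hash codes, so they always map to the same stripe.
    ReentrantLock stripeFor(Object element) {
        return stripes[Math.floorMod(element.hashCode(), stripes.length)];
    }

    boolean add(T element) {
        ReentrantLock lock = stripeFor(element);
        lock.lock();
        try {
            // A real indexed collection would also update its indexes here,
            // inside the same critical section.
            return elements.add(element);
        } finally {
            lock.unlock();
        }
    }

    boolean remove(T element) {
        ReentrantLock lock = stripeFor(element);
        lock.lock();
        try {
            return elements.remove(element);
        } finally {
            lock.unlock();
        }
    }

    int size() {
        return elements.size();
    }

    public static void main(String[] args) {
        StripedLockSketch<String> cars = new StripedLockSketch<>(16);
        System.out.println(cars.add("ford focus"));             // prints true
        System.out.println(cars.add(new String("ford focus"))); // prints false (equal object, same stripe)
        System.out.println(cars.size());                        // prints 1
    }
}
```

Writes to the same (or an equal) object are serialized because they contend for one stripe, while writes to different objects usually land on different stripes and proceed in parallel. Reads take no lock at all in this sketch.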
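3.1.2.3 TransactionalIndexedCollection
- Extends ConcurrentIndexedCollection
- Lock-free concurrent reads
- Serialized writes (full READ_COMMITTED transaction isolation, implemented with MVCC). As above, a dedicated locking mechanism keeps concurrent add/remove of the same object thread-safe
- Concurrency control is based on MVCC
A version number can be added to the object through a long field and included in equals, as shown below, so that an updated copy of an object does not compare equal to the version it replaces: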
static final AtomicLong VERSION_GENERATOR = new AtomicLong();
final long version = VERSION_GENERATOR.incrementAndGet();
@Override
public boolean equals(Object o) {
    if (this == o) { return true; }
    if (null == o || getClass() != o.getClass()) { return false; }
    Car car = (Car) o;
    if (carId != car.carId) { return false; }
    if (this.version != car.version) { return false; }
    return true;
}
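For more on transaction isolation, see the TransactionIsolation page of the CQEngine documentation.
3.2 Attributes
3.2.1 Concept
CQEngine needs to access the fields of objects in order to build indexes and retrieve values, but it does so through a concept called attributes rather than through reflection.
An attribute is an accessor object that can read the value of a field from a POJO.
Below is an example, the CAR_ID attribute, which reads carId:
public static final Attribute<Car, Integer> CAR_ID = new SimpleAttribute<Car, Integer>("carId") {
    public Integer getValue(Car car, QueryOptions queryOptions) { return car.carId; }
};
Another example, this time built from a lambda expression (here a method reference):
public static final Attribute<Car, String> FEATURES = com.googlecode.cqengine.query.QueryFactory.attribute(String.class, "features", Car::getFeatures);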
3.2.2 Null values
If the data may contain null, use SimpleNullableAttribute or MultiValueNullableAttribute instead of SimpleAttribute or MultiValueAttribute; otherwise a NullPointerException may be thrown.
3.3 Persistence
3.3.1 Persistence of data
By default the data is kept on-heap.
- On-heap:
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
- Off-heap:
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>(OffHeapPersistence.onPrimaryKey(Car.CAR_ID));
- Disk, in a temporary file whose location can be obtained with DiskPersistence.getFile():
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>(DiskPersistence.onPrimaryKey(Car.CAR_ID));
- Disk, in a file at a specified path:
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>(DiskPersistence.onPrimaryKeyInFile(Car.CAR_ID, new File("cars.dat")));
3.3.2 Persistence of indexes
Indexes also support on-heap, off-heap, and disk persistence. The choice is independent of how the data is persisted, and different indexes on the same collection can even use different kinds of persistence.
- On-heap:
cars.addIndex(NavigableIndex.onAttribute(Car.MANUFACTURER));
- Off-heap:
cars.addIndex(OffHeapIndex.onAttribute(Car.MANUFACTURER));
- Disk:
cars.addIndex(DiskIndex.onAttribute(Car.MANUFACTURER));
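Note that a ResultSet over persisted (off-heap or disk) data should be used inside a try block, preferably try-with-resources as in section 4.2, so that it is closed even if an exception is thrown.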
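0x04 Commonly used CQEngine APIs
4.1 IndexedCollection
IndexedCollection is an interface that extends java.util.Set. It adds two further methods:
- addIndex(SomeIndex): adds an index to the collection
- retrieve(Query): takes a Query and returns a ResultSet of the objects that match it, which can be traversed with an iterator or with a Java 8 Stream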
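4.2 ResultSet
- iterator(): iterates over the result set
- size(): returns the size of the result set
- contains(): checks whether a given object is in the result set
- uniqueResult(): expects exactly one result and throws an exception if there are zero results or more than one
- stream(): converts the result set to a Java 8+ Stream, so it can be processed with lambda expressions
- getRetrievalCost(): an estimate of the cost of retrieving the results of the query. If no index is available for a query, it returns Integer.MAX_VALUE (2147483647)
- getMergeCost(): an estimate of the cost of merging the objects that match the query, which CQEngine uses when reordering the elements of a query to reduce time complexity
- close(): releases all resources opened for the query and closes any associated transaction. Whether a ResultSet needs to be closed depends on two things: the type of the IndexedCollection implementation and the types of the indexes added to it.
Specifically, whenever the IndexedCollection or any of its indexes is persisted off-heap or on disk, the ResultSet must be closed when you are done with it. It can be closed like this: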
try (ResultSet<Car> results = cars.retrieve(equal(Car.MANUFACTURER, "Ford"))) {
    results.forEach(System.out::println);
}
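4.3 Grouping and Aggregation
CQEngine integrates with the Java 8+ Stream API, so the CQEngine API does not provide grouping and aggregation itself; they are done with streams and lambda expressions instead.
A ResultSet can be converted to a Java Stream with ResultSet.stream(). For the best query performance, however, express as much of the query as possible with CQEngine's own query operators and fall back to Java streams only where necessary.
Below is an example that converts a ResultSet into a Stream: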
public static void main(String[] args) {
    IndexedCollection<Car> cars = new ConcurrentIndexedCollection<>();
    cars.addAll(CarFactory.createCollectionOfCars(10));
    cars.addIndex(NavigableIndex.onAttribute(Car.MANUFACTURER));
    Set<Car.Color> distinctColorsOfFordCars = cars.retrieve(equal(Car.MANUFACTURER, "Ford"))
            .stream()
            .map(Car::getColor)
            .collect(Collectors.toSet());
    System.out.println(distinctColorsOfFordCars); // prints: [GREEN, RED]
}
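Section 5.3 shows a further aggregation example that groups by a given field (group by) and counts each group.
4.4 On-heap indexes
This section covers some commonly used on-heap index APIs. First, here is the Car.java used in the examples: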
import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.attribute.MultiValueAttribute;
import com.googlecode.cqengine.attribute.SimpleAttribute;
import com.googlecode.cqengine.query.option.QueryOptions;
import java.util.List;
/**
 * @author Niall Gallagher
 */
public class Car {
    public final int carId;
    public final String name;
    public final String description;
    public final List<String> features;
    public Car(int carId, String name, String description, List<String> features) {
        this.carId = carId;
        this.name = name;
        this.description = description;
        this.features = features;
    }
    @Override
    public String toString() {
        return "Car{carId=" + carId + ", name='" + name + "', description='" + description + "', features=" + features + "}";
    }
    // -------------------------- Attributes --------------------------
    public static final Attribute<Car, Integer> CAR_ID = new SimpleAttribute<Car, Integer>("carId") {
        public Integer getValue(Car car, QueryOptions queryOptions) { return car.carId; }
    };
    public static final Attribute<Car, String> NAME = new SimpleAttribute<Car, String>("name") {
        public String getValue(Car car, QueryOptions queryOptions) { return car.name; }
    };
    public static final Attribute<Car, String> DESCRIPTION = new SimpleAttribute<Car, String>("description") {
        public String getValue(Car car, QueryOptions queryOptions) { return car.description; }
    };
    public static final Attribute<Car, String> FEATURES = new MultiValueAttribute<Car, String>("features") {
        public List<String> getValues(Car car, QueryOptions queryOptions) { return car.features; }
    };
}
4.4.1 NavigableIndex
As mentioned earlier, this index is kept on the Java heap. It supports the following query types:
- Equal
- LessThan
- GreaterThan
- Between
Example:
cars.addIndex(NavigableIndex.onAttribute(Car.CAR_ID));
System.out.println("Cars whose id is less than 2:");
Query<Car> query1 = lessThan(Car.CAR_ID, 2);
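To show why a sorted structure can answer all four of these query types, here is a small sketch in plain Java that uses java.util.TreeMap as an analogy for a navigable index. It is my own illustration and does not use CQEngine: the class NavigableIndexSketch and its methods are hypothetical names, and the bounds chosen for between (inclusive on both ends) are an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration class; not part of CQEngine.
class NavigableIndexSketch {
    // Sorted map from attribute value (car id) to the objects holding it (car names here).
    private final NavigableMap<Integer, List<String>> index = new TreeMap<>();

    void add(int carId, String name) {
        index.computeIfAbsent(carId, k -> new ArrayList<>()).add(name);
    }

    // Collects the objects of every entry in a (sub)map view, in key order.
    private static List<String> flatten(Map<Integer, List<String>> view) {
        List<String> result = new ArrayList<>();
        view.values().forEach(result::addAll);
        return result;
    }

    List<String> equal(int value) {
        return new ArrayList<>(index.getOrDefault(value, Collections.emptyList()));
    }

    List<String> lessThan(int bound) {
        return flatten(index.headMap(bound, false));
    }

    List<String> greaterThan(int bound) {
        return flatten(index.tailMap(bound, false));
    }

    // Inclusive on both ends in this sketch.
    List<String> between(int low, int high) {
        return flatten(index.subMap(low, true, high, true));
    }

    public static void main(String[] args) {
        NavigableIndexSketch idx = new NavigableIndexSketch();
        idx.add(1, "ford focus");
        idx.add(2, "ford taurus");
        idx.add(3, "honda civic");
        System.out.println(idx.lessThan(2));   // prints [ford focus]
        System.out.println(idx.between(2, 3)); // prints [ford taurus, honda civic]
    }
}
```

Because the keys are kept in order, each range query only walks the part of the map that matches instead of testing every object.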
4.4.2 ReversedRadixTreeIndex
An index backed by a ConcurrentReversedRadixTree. It supports the following query types:
- Equal
- StringEndsWith
cars.addIndex(ReversedRadixTreeIndex.onAttribute(Car.NAME));
System.out.println("Cars whose name ends with 'vic'");
Query<Car> query1 = endsWith(Car.NAME, "vic");
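The reason a reversed tree helps with ends-with queries is that a suffix of a string is a prefix of the reversed string. The sketch below demonstrates that trick in plain Java, with a TreeMap standing in for the radix tree. It is an analogy of mine rather than CQEngine code, and the class ReversedKeySketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of CQEngine.
class ReversedKeySketch {
    // Keys are stored reversed, so "honda civic" is filed under "civic adnoh".
    private final TreeMap<String, String> reversedToOriginal = new TreeMap<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String name) {
        reversedToOriginal.put(reverse(name), name);
    }

    // endsWith(suffix) becomes a prefix scan over the reversed keys.
    List<String> endsWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> result = new ArrayList<>();
        // In sorted order, all keys sharing a prefix are contiguous and start
        // at the first key that is >= the prefix.
        for (Map.Entry<String, String> e : reversedToOriginal.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the range of keys that share the prefix
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ReversedKeySketch names = new ReversedKeySketch();
        names.add("ford focus");
        names.add("ford taurus");
        names.add("honda civic");
        System.out.println(names.endsWith("vic")); // prints [honda civic]
    }
}
```

Only the keys that actually match are visited, so the cost depends on the number of matches rather than on the size of the collection.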
4.4.3 SuffixTreeIndex
An index backed by a ConcurrentSuffixTree. It supports the following query types:
- Equal
- StringEndsWith
- StringContains
cars.addIndex(SuffixTreeIndex.onAttribute(Car.DESCRIPTION));
System.out.println("Cars whose description contains 'flat tyre'");
Query<Car> query2 = contains(Car.DESCRIPTION, "flat tyre");
4.4.4 HashIndex
An index backed by a ConcurrentHashMap. It supports the following query types:
- Equal
cars.addIndex(HashIndex.onAttribute(Car.FEATURES));
System.out.println("\nCars which have a sunroof or a radio: ");
Query<Car> query3 = in(Car.FEATURES, "sunroof", "radio");
0x05 Examples
5.1 A simple example
The first example is a simple one: it uses ConcurrentIndexedCollection and queries Cars by different conditions.
import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.index.hash.HashIndex;
import com.googlecode.cqengine.index.navigable.NavigableIndex;
import com.googlecode.cqengine.index.radixreversed.ReversedRadixTreeIndex;
import com.googlecode.cqengine.index.suffix.SuffixTreeIndex;
import com.googlecode.cqengine.query.Query;
import java.util.Arrays;
import static com.googlecode.cqengine.query.QueryFactory.*;
/**
 * An introductory example which demonstrates usage using a Car analogy.
 *
 * @author Niall Gallagher
 */
public class Introduction {
    public static void main(String[] args) {
        // Create an indexed collection
        // (it can also be created from an existing collection via CQEngine.copyFrom())
        IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
        // Add some indexes
        cars.addIndex(NavigableIndex.onAttribute(Car.CAR_ID));
        cars.addIndex(ReversedRadixTreeIndex.onAttribute(Car.NAME));
        cars.addIndex(SuffixTreeIndex.onAttribute(Car.DESCRIPTION));
        cars.addIndex(HashIndex.onAttribute(Car.FEATURES));
        // Add some objects to the collection...
        cars.add(new Car(1, "ford focus", "great condition, low mileage", Arrays.asList("spare tyre", "sunroof")));
        cars.add(new Car(2, "ford taurus", "dirty and unreliable, flat tyre", Arrays.asList("spare tyre", "radio")));
        cars.add(new Car(3, "honda civic", "has a flat tyre and high mileage", Arrays.asList("radio")));
        // -------------------------- Queries --------------------------
        System.out.println("Cars whose name ends with 'vic' or whose id is less than 2:");
        Query<Car> query1 = or(endsWith(Car.NAME, "vic"), lessThan(Car.CAR_ID, 2));
        cars.retrieve(query1).forEach(System.out::println);
        System.out.println("\nCars whose flat tyre can be replaced:");
        Query<Car> query2 = and(contains(Car.DESCRIPTION, "flat tyre"), equal(Car.FEATURES, "spare tyre"));
        cars.retrieve(query2).forEach(System.out::println);
        System.out.println("\nCars which have a sunroof or a radio but are not dirty:");
        Query<Car> query3 = and(in(Car.FEATURES, "sunroof", "radio"), not(contains(Car.DESCRIPTION, "dirty")));
        cars.retrieve(query3).forEach(System.out::println);
    }
}
5.2 String-based queries: SQL and CQN dialects
CQEngine supports queries written as strings in SQL or in CQN (CQEngine Native) format.
SQL example:
SQLParser<Car> parser = SQLParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
cars.addAll(CarFactory.createCollectionOfCars(10));
ResultSet<Car> results = parser.retrieve(cars, "SELECT * FROM cars WHERE (" +
        "(manufacturer = 'Ford' OR manufacturer = 'Honda') " +
        "AND price <= 5000.0 " +
        "AND color NOT IN ('GREEN', 'WHITE')) " +
        "ORDER BY manufacturer DESC, price ASC");
results.forEach(System.out::println); // Prints: Honda Accord, Ford Fusion, Ford Focus
CQN example:
CQNParser<Car> parser = CQNParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
cars.addAll(CarFactory.createCollectionOfCars(10));
ResultSet<Car> results = parser.retrieve(cars,
        "and(" +
            "or(equal(\"manufacturer\", \"Ford\"), equal(\"manufacturer\", \"Honda\")), " +
            "lessThanOrEqualTo(\"price\", 5000.0), " +
            "not(in(\"color\", GREEN, WHITE))" +
        ")");
results.forEach(System.out::println); // Prints: Ford Focus, Ford Fusion, Honda Accord
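5.3 Aggregation queries
CQEngine does not implement the count(*) ... group by x syntax directly; the officially recommended approach is to use the Java 8+ Stream API.
Below is a complete example that groups cars by color and counts each group: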
- SQLAggregationDemo.java
import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.query.parser.sql.SQLParser;
import com.googlecode.cqengine.resultset.ResultSet;
import demos.cqengine.testutils.Car;
import demos.cqengine.testutils.CarFactory;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import static com.googlecode.cqengine.codegen.AttributeBytecodeGenerator.createAttributes;
/**
 * Created by chengc on 2018/11/6.
 */
public class SQLAggregationDemo
{
    public static void main(String[] args) {
        SQLParser<Car> parser = SQLParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
        IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
        cars.addAll(CarFactory.createCollectionOfCars(10));
        ResultSet<Car> results = parser.retrieve(cars, "SELECT * FROM cars");
        // Group the cars by the string form of their color
        Map<String, List<Car>> carResultMap = results.stream().collect(Collectors.groupingBy(t -> t.getColor().toString()));
        // Print the number of cars in each group
        for (Map.Entry<String, List<Car>> entry : carResultMap.entrySet()) {
            System.out.println("Car.color=" + entry.getKey() + ", count=" + entry.getValue().size());
        }
    }
}
- Car.java
import java.util.List;
/**
 * @author Niall Gallagher
 */
public class Car {
    public enum Color {RED, GREEN, BLUE, BLACK, WHITE}
    final int carId;
    final String manufacturer;
    final String model;
    final Color color;
    final int doors;
    final double price;
    final List<String> features;
    public Car(int carId, String manufacturer, String model, Color color, int doors, double price, List<String> features) {
        this.carId = carId;
        this.manufacturer = manufacturer;
        this.model = model;
        this.color = color;
        this.doors = doors;
        this.price = price;
        this.features = features;
    }
    public int getCarId() {
        return carId;
    }
    public String getManufacturer() {
        return manufacturer;
    }
    public String getModel() {
        return model;
    }
    public Color getColor() {
        return color;
    }
    public int getDoors() {
        return doors;
    }
    public double getPrice() {
        return price;
    }
    public List<String> getFeatures() {
        return features;
    }
    @Override
    public String toString() {
        return "Car{" +
                "carId=" + carId +
                ", manufacturer='" + manufacturer + '\'' +
                ", model='" + model + '\'' +
                ", color=" + color +
                ", doors=" + doors +
                ", price=" + price +
                ", features=" + features +
                '}';
    }
    @Override
    public boolean equals(Object o) {
        if (this == o) { return true; }
        if (!(o instanceof Car)) { return false; }
        Car car = (Car) o;
        if (carId != car.carId) { return false; }
        return true;
    }
    @Override
    public int hashCode() {
        return carId;
    }
}
- CarFactory.java
import com.googlecode.concurrenttrees.common.LazyIterator;
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import static java.util.Arrays.asList;
/**
 * @author Niall Gallagher
 */
public class CarFactory {
    public static Set<Car> createCollectionOfCars(int numCars) {
        Set<Car> cars = new LinkedHashSet<Car>(numCars);
        for (int carId = 0; carId < numCars; carId++) {
            cars.add(createCar(carId));
        }
        return cars;
    }
    public static Iterable<Car> createIterableOfCars(final int numCars) {
        final AtomicInteger count = new AtomicInteger();
        return new Iterable<Car>() {
            @Override
            public Iterator<Car> iterator() {
                return new LazyIterator<Car>() {
                    @Override
                    protected Car computeNext() {
                        int carId = count.getAndIncrement();
                        return carId < numCars ? createCar(carId) : endOfData();
                    }
                };
            }
        };
    }
    public static Car createCar(int carId) {
        switch (carId % 10) {
            case 0: return new Car(carId, "Ford", "Focus", Car.Color.RED, 5, 5000.00, noFeatures());
            case 1: return new Car(carId, "Ford", "Fusion", Car.Color.RED, 4, 3999.99, asList("hybrid"));
            case 2: return new Car(carId, "Ford", "Taurus", Car.Color.GREEN, 4, 6000.00, asList("grade a"));
            case 3: return new Car(carId, "Honda", "Civic", Car.Color.WHITE, 5, 4000.00, asList("grade b"));
            case 4: return new Car(carId, "Honda", "Accord", Car.Color.BLACK, 5, 3000.00, asList("grade c"));
            case 5: return new Car(carId, "Honda", "Insight", Car.Color.GREEN, 3, 5000.00, noFeatures());
            case 6: return new Car(carId, "Toyota", "Avensis", Car.Color.GREEN, 5, 5999.95, noFeatures());
            case 7: return new Car(carId, "Toyota", "Prius", Car.Color.BLUE, 3, 8500.00, asList("sunroof", "hybrid"));
            case 8: return new Car(carId, "Toyota", "Hilux", Car.Color.RED, 5, 7800.55, noFeatures());
            case 9: return new Car(carId, "BMW", "M6", Car.Color.BLUE, 2, 9000.23, asList("coupe"));
            default: throw new IllegalStateException();
        }
    }
    static List<String> noFeatures() {
        return Collections.<String>emptyList();
    }
}
- Output
Car.color=RED, count=3
Car.color=WHITE, count=1
Car.color=BLUE, count=2
Car.color=BLACK, count=1
Car.color=GREEN, count=3
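If only the counts are needed, the same grouping can be written with Collectors.counting() so that no intermediate lists are built. The sketch below is plain Java with no CQEngine dependency: GroupCountSketch and countByColor are hypothetical names of mine, and the list of colors is copied by hand from the ten cars that CarFactory creates above. With CQEngine, the stream would presumably come from ResultSet.stream() instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of CQEngine.
class GroupCountSketch {
    enum Color { RED, GREEN, BLUE, BLACK, WHITE }

    // Equivalent of "SELECT color, count(*) ... GROUP BY color".
    // A TreeMap keeps the groups in enum declaration order.
    static Map<Color, Long> countByColor(List<Color> colors) {
        return colors.stream()
                .collect(Collectors.groupingBy(c -> c, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Colors of car ids 0..9 as produced by CarFactory.createCar above.
        List<Color> colors = Arrays.asList(
                Color.RED, Color.RED, Color.GREEN, Color.WHITE, Color.BLACK,
                Color.GREEN, Color.GREEN, Color.BLUE, Color.RED, Color.BLUE);
        System.out.println(countByColor(colors)); // prints {RED=3, GREEN=3, BLUE=2, BLACK=1, WHITE=1}
    }
}
```

The counts agree with the output listed above; only the order differs, because the original example iterates a HashMap.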
5.4 SQL + Index
The official example in 5.2 uses no indexes, so you will find that getRetrievalCost() and getMergeCost() both return Integer.MAX_VALUE.
Here is an example that adds indexes:
import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.index.hash.HashIndex;
import com.googlecode.cqengine.index.navigable.NavigableIndex;
import com.googlecode.cqengine.query.parser.sql.SQLParser;
import com.googlecode.cqengine.resultset.ResultSet;
import demos.cqengine.testutils.Car;
import demos.cqengine.testutils.CarFactory;
import java.util.Map;
import static com.googlecode.cqengine.codegen.AttributeBytecodeGenerator.createAttributes;
/**
 * Created by chengc on 2018/11/6.
 */
public class SQLQueryDemo {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, ? extends Attribute<Car, ?>> attributesMap = createAttributes(Car.class);
        SQLParser<Car> parser = SQLParser.forPojoWithAttributes(Car.class, attributesMap);
        IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
        // Index the attributes used in the WHERE clause; each cast names the attribute's actual value type
        cars.addIndex(HashIndex.onAttribute((Attribute<Car, String>) attributesMap.get("manufacturer")));
        cars.addIndex(NavigableIndex.onAttribute((Attribute<Car, Double>) attributesMap.get("price")));
        cars.addIndex(HashIndex.onAttribute((Attribute<Car, Car.Color>) attributesMap.get("color")));
        cars.addAll(CarFactory.createCollectionOfCars(10));
        ResultSet<Car> results = parser.retrieve(cars, "SELECT * FROM cars WHERE (" +
                "(manufacturer = 'Ford' OR manufacturer = 'Honda') " +
                "AND price <= 5000.0 " +
                "AND color NOT IN ('GREEN', 'WHITE')) " +
                "ORDER BY manufacturer DESC, price ASC");
        System.out.println("results.getRetrievalCost()=" + results.getRetrievalCost());
        System.out.println("results.getMergeCost()=" + results.getMergeCost());
        results.forEach(System.out::println); // Prints: Honda Accord, Ford Fusion, Ford Focus
    }
}
Finally, the query cost output from CQEngine is as follows:
results.getRetrievalCost()=40
results.getMergeCost()=25