Apahce Calcite 是一个动态的数据管理框架。它和经典的数据管理系统的主要区别在于它省略了经典数据管理系统的数据存储,数据加工,元数据存储等功能。
正是因为Calcite 有意的回避了这些功能,使得它成为了在应用和数据库之间的很好的一个中间适配软件。
如何引入calcite,以maven方式为例
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-core</artifactId>
<version>1.20.0</version>
</dependency>
下面来创建一个基于内存数据的查询
// 通过java创建一个具有类似表结构的数据
public static class HrSchema {
public final Employee[] emps = new Employee[2];
public final Department[] depts = new Department[2];
}
public class Department {
public Integer deptno;
public Department(Integer deptno) {
this.deptno = deptno;
}
}
public class Employee {
public Integer empid;
public Integer deptno;
public Employee(Integer empid, Integer deptno) {
this.empid = empid;
this.deptno = deptno;
}
}
接下来我们使用calcite 来查询这个结构里的数据
// 通过该方法来添加一些数据,并返回这个集合
public static HrSchema createInmemSchema () {
HrSchema hs = new HrSchema();
hs.depts[0] = new Department(1);
hs.depts[1] = new Department(2);
hs.emps[0] = new Employee(11,2);
hs.emps[1] = new Employee(22,2);
return hs;
}
// Class StudyDoc
public static void TestInMemDatasource () throws ClassNotFoundException, SQLException {
HrSchema hs = StudyDoc.createInmemSchema();
Class.forName("org.apache.calcite.jdbc.Driver");
Properties info = new Properties();
info.setProperty("lex", "JAVA");
Connection connection =
DriverManager.getConnection("jdbc:calcite:", info);
CalciteConnection calciteConnection =
connection.unwrap(CalciteConnection.class);
SchemaPlus rootSchema = calciteConnection.getRootSchema();
Schema schema = new ReflectiveSchema(hs);
rootSchema.add("hr", schema);
Statement statement = calciteConnection.createStatement();
ResultSet resultSet = statement.executeQuery(
"select d.deptno, min(e.empid)\n"
+ "from hr.emps as e\n"
+ "join hr.depts as d\n"
+ " on e.deptno = d.deptno\n"
+ "group by d.deptno\n"
+ "having count(*) > 1");
ShowQueryResult(resultSet);
resultSet.close();
statement.close();
connection.close();
}
// 查询的结果打印为:
deptno=2; EXPR$1=11
下面演示如何通过jdbc 来连接mysql数据源进行查询
public static void TestJDBCDatasource (String sql) throws ClassNotFoundException, SQLException{
if (sql == null) {
sql = "select * from sap.sap_table limit 10";
}
Class.forName("org.apache.calcite.jdbc.Driver");
Properties info = new Properties();
info.setProperty("lex", "JAVA");
Connection connection =
DriverManager.getConnection("jdbc:calcite:", info);
CalciteConnection calciteConnection =
connection.unwrap(CalciteConnection.class);
SchemaPlus rootSchema = calciteConnection.getRootSchema();
Class.forName("com.mysql.jdbc.Driver");
BasicDataSource dataSource = new BasicDataSource();
dataSource.setUrl("jdbc:mysql://localhost/sap");
dataSource.setUsername("root");
dataSource.setPassword("112233");
Schema schema = JdbcSchema.create(rootSchema, "sap", dataSource,
null, "sap");
rootSchema.add("sap", schema);
Statement statement = calciteConnection.createStatement();
ResultSet resultSet = statement.executeQuery(
sql);
ShowQueryResult(resultSet);
resultSet.close();
statement.close();
connection.close();
}
注意:需要提前安装com.mysql.jdbc.Driver对应的jar包
Maven安装配置
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.41</version>
</dependency>
运行后的执行结果
ts_code=000001.SZ; symbol=000001; name=平安银行; area=深圳; industry=银行; list_date=19910403
ts_code=000002.SZ; symbol=000002; name=万科A; area=深圳; industry=全国地产; list_date=19910129
ts_code=000004.SZ; symbol=000004; name=国农科技; area=深圳; industry=生物制药; list_date=19910114
ts_code=000005.SZ; symbol=000005; name=世纪星源; area=深圳; industry=环境保护; list_date=19901210
ts_code=000006.SZ; symbol=000006; name=深振业A; area=深圳; industry=区域地产; list_date=19920427
ts_code=000007.SZ; symbol=000007; name=全新好; area=深圳; industry=酒店餐饮; list_date=19920413
ts_code=000008.SZ; symbol=000008; name=神州高铁; area=北京; industry=运输设备; list_date=19920507
ts_code=000009.SZ; symbol=000009; name=中国宝安; area=深圳; industry=综合类; list_date=19910625
ts_code=000010.SZ; symbol=000010; name=*ST美丽; area=深圳; industry=建筑施工; list_date=19951027
ts_code=000011.SZ; symbol=000011; name=深物业A; area=深圳; industry=区域地产; list_date=19920330
接下来演示如何通过配置json文件的方式来读取jdbc数据源的数据
上面的代码都是函数化的方式,不直观。看下面的配置,同样可以完成上面的功能设置:
//stock.json
{
version: '1.0',
defaultSchema: 'sap',
schemas: [
{
name: 'sap',
type: 'custom',
factory: 'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory',
operand: {
jdbcDriver: 'com.mysql.jdbc.Driver',
jdbcUrl: 'jdbc:mysql://localhost:3306/sap',
jdbcUser: 'root',
jdbcPassword: '112233'
}
}
]
}
public static void TestJDBCDatasourceByJsonCof() throws ClassNotFoundException, SQLException {
Properties info = new Properties();
info.setProperty("lex", "JAVA");
Connection connection =
// 读取上面的配置文件
DriverManager.getConnection("jdbc:calcite:model=/Users/guosheng.lu/Desktop/Java/studyCalcite/src/main/java/stock.json", info);
CalciteConnection calciteConnection =
connection.unwrap(CalciteConnection.class);
Statement statement = calciteConnection.createStatement();
ResultSet resultSet = statement.executeQuery("select * from sap_table limit 10");
ShowQueryResult(resultSet);
resultSet.close();
statement.close();
connection.close();
}
同样可以查询到如下结果。
通过上面的代码演示,我们可以初步感受到Apache Calcite 作为数据源和应用之间的中间件,是如何来配置并读取数据的。我们初步可以感受到,它作为一个中间件,可以适配多种具有一定数据结构的不管是内存中的还是我们传统的RDMS数据源。这对我们大数据数据仓库上对接多路数据的适配起到了很大的作用。