I. Data Preparation
Download the open Qiangsheng (强生) taxi dataset from http://shanghai.sodachallenges.com/data.html. For this test, 30 days of trajectories were downloaded, roughly 60 million records per day. The raw data is stored as txt files, with a directory structure along these lines:
数据\HT160801\01\00\1608010000.txt
where each txt file holds all taxi position records for a given minute.
The record format looks like this:
14082|A|0|1|V|0|0|0|2016-08-01|2016-08-01 00:01:05|121.480657|31.180292|44.0|318.0|0|032
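As a quick sanity check of the record layout (field positions inferred from the parsing code later in this post; `RecordDemo` is just an illustrative name), splitting the sample line on `|` gives:

```java
public class RecordDemo {
    public static void main(String[] args) {
        String line = "14082|A|0|1|V|0|0|0|2016-08-01|2016-08-01 00:01:05|"
                + "121.480657|31.180292|44.0|318.0|0|032";
        // '|' is a regex metacharacter, so it must be escaped in split()
        String[] infos = line.split("\\|");
        System.out.println(infos.length);   // 16 fields
        System.out.println(infos[0]);       // carid:     14082
        System.out.println(infos[9]);       // timestamp: 2016-08-01 00:01:05
        System.out.println(infos[10]);      // lon:       121.480657
        System.out.println(infos[11]);      // lat:       31.180292
        System.out.println(infos[12]);      // speed:     44.0
        System.out.println(infos[13]);      // heading:   318.0
    }
}
```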
II. Data Description
A taxi position record carries many fields; to keep the InfluxDB storage simple, only the main ones are extracted. The SHTaxi entity is built as follows:
private String carid;    // vehicle ID
private String datetime; // position timestamp
private double lon;      // longitude
private double lat;      // latitude
private double speed;    // speed
private double dir;      // heading
private int state;       // occupancy state (parsed from the record as well)
III. Storage Design
1. Create the shtaxi measurement
2. Use the carid attribute as a tag
3. Store lon, lat, speed, etc. as fields
4. Convert datetime to a long timestamp and store it as time
5. Write in batches: here, each txt file is parsed in full, and all of its records are inserted in a single batch
private static void insertData() {
    InfluxdbUtils influxdb = new InfluxdbUtils();
    try {
        influxdb.setUp();
        String basepath = "D:\\Data\\强生taxi\\HT160801\\01\\";
        for (int i = 0; i < 24; i++) {       // hour of day
            for (int j = 0; j < 60; j++) {   // minute of hour
                // Zero-pad hour and minute to match the file naming scheme
                String stri = String.format("%02d", i);
                String strj = "160801" + stri + String.format("%02d", j);
                String path = basepath + stri + "\\" + strj + ".txt";
                System.out.println("Writing " + path);
                BatchPoints pts = setUpBatchPoints(getData(path));
                influxdb.write(pts);
                System.out.println("Batch written");
            }
        }
        System.out.println("All files written");
        influxdb.closeInfluxDBClient();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
private static List<SHTaxi> getData(String path) {
    File file = new File(path);
    if (!file.exists()) {
        System.out.println("File not found: " + path);
        return null;
    }
    List<SHTaxi> taxis = new ArrayList<SHTaxi>();
    // Read the file line by line; each line is one position record
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String tempString;
        while ((tempString = reader.readLine()) != null) {
            // '|' is a regex metacharacter, so it must be escaped
            String[] infos = tempString.split("\\|");
            SHTaxi taxi = new SHTaxi();
            taxi.setCarid(infos[0]);
            taxi.setDatetime(infos[9]);
            taxi.setLon(Double.parseDouble(infos[10]));
            taxi.setLat(Double.parseDouble(infos[11]));
            taxi.setSpeed(Double.parseDouble(infos[12]));
            taxi.setDir(Double.parseDouble(infos[13]));
            taxi.setState(Integer.parseInt(infos[14]));
            taxis.add(taxi);
        }
        return taxis;
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
public static BatchPoints setUpBatchPoints(List<SHTaxi> taxis) {
System.out.println("Building BatchPoints");
BatchPoints batchPoints = BatchPoints.database(InfluxdbUtils.DATABASE).tag("async", "true")
.retentionPolicy(InfluxdbUtils.rpName).consistency(ConsistencyLevel.ALL).build();
for (SHTaxi taxi : taxis) {
Point point = Point.measurement("shtaxi").tag("carid", taxi.getCarid())
.time(DateUtils.formatDate(taxi.getDatetime()).getTime(), TimeUnit.MILLISECONDS)
.addField("lon", taxi.getLon()).addField("lat", taxi.getLat()).addField("head", taxi.getDir())
.addField("speed", taxi.getSpeed()).build();
batchPoints.point(point);
}
return batchPoints;
}
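The code above leans on two helpers that are not shown: `InfluxdbUtils`, a thin wrapper around the influxdb-java client (`setUp`, `write`, `closeInfluxDBClient`, plus the `DATABASE` and `rpName` constants), and `DateUtils.formatDate`, which turns the record's timestamp string into a `java.util.Date`. A minimal sketch of the latter, with the name and signature taken from the call site above and the body an assumption, could be:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateUtils {
    // Matches the "2016-08-01 00:01:05" format seen in the raw records
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    public static Date formatDate(String datetime) {
        try {
            // SimpleDateFormat is not thread-safe, so create one per call
            return new SimpleDateFormat(PATTERN).parse(datetime);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp: " + datetime, e);
        }
    }
}
```

`formatDate(...).getTime()` then yields epoch milliseconds (interpreted in the JVM's default time zone), which the `Point` builder stores with `TimeUnit.MILLISECONDS`.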
IV. Basic Queries
1. Total row count of the measurement
select count(*) from shtaxi
Result:
time count_head count_lat count_lon count_speed
0 64934444 64934444 64934444 64934444
There are over 60 million rows, and this query is very slow, taking about 30 s.
2. Fetch ten rows
SELECT * from shtaxi limit 10
Result:
time async carid head lat lon speed
1222639851000000000 true 18860 716 31.276538 121 0
1262275200000000000 true 00003 0 31.313642 121.427585 0
1262275200000000000 true 00007 0 31.262738 121.581457 0
1262275200000000000 true 00016 0 31.299445 121.11689 0
1262275200000000000 true 00020 0 31.229033 121.279512 0
1262275200000000000 true 00025 0 31.265292 121.511433 0
1262275200000000000 true 00035 0 31.108333 121.44878 0
1262275200000000000 true 00037 0 31.217817 121.406355 0
1262275200000000000 true 00048 0 31.153637 121.547815 0
1262275200000000000 true 00049 0 31.276552 121.35836 0
This query is much faster than the first one, but still takes around 10 s.
3. Query positions within a given time range
SELECT * FROM shtaxi where time >= '2016-07-31T20:33:32Z' AND time <= '2016-07-31T20:33:33Z'
Result:
time async carid head lat lon speed
1469997213000000000 true 13617 96 30.990268 121.496652 33
1469997213000000000 true 13626 333 31.009327 121.72234 0
1469997213000000000 true 13657 347 31.286078 121.550668 0
1469997213000000000 true 13662 108 31.301087 121.454312 0
1469997213000000000 true 13684 242 31.360207 121.417495 0
1469997213000000000 true 13696 240 31.110172 121.258982 0
1469997213000000000 true 13726 301 31.340192 121.455047 0
1469997213000000000 true 13738 134 30.990753 121.501022 0
1469997213000000000 true 13744 325 31.30235 121.528888 0
1469997213000000000 true 13751 325 31.35154 121.396623 0
1469997213000000000 true 13758 32 31.126795 121.393505 0
1469997213000000000 true 13765 115 31.321905 121.418807 0
1469997213000000000 true 13792 60 31.386887 121.360457 0
1469997213000000000 true 13797 81 31.26792 121.599462 0
1469997213000000000 true 13803 244 31.349768 121.3945 0
1469997213000000000 true 13804 105 31.306233 121.392213 0
1469997213000000000 true 13807 309 31.25703 121.566315 0
…
This query is very fast, returning at millisecond level.
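The `time` column in these results is a Unix epoch in nanoseconds. As an illustrative cross-check (`TimeDemo` is just a demo name), the upper bound of the query above converts to exactly the timestamp shown in the output:

```java
import java.time.Instant;

public class TimeDemo {
    public static void main(String[] args) {
        // InfluxDB accepts RFC3339 strings in WHERE clauses and
        // stores timestamps as nanosecond epochs
        Instant upper = Instant.parse("2016-07-31T20:33:33Z");
        long nanos = upper.getEpochSecond() * 1_000_000_000L;
        System.out.println(nanos); // 1469997213000000000
    }
}
```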
V. Summary
The goal of this test was to store a relatively large volume of data and measure query performance. So far, the advantage of time-constrained queries is very clear: a full scan such as count(*) takes tens of seconds, while a time-range query returns in milliseconds.