Cloudera公司已经推出了基于Hadoop平台的查询统计分析工具Impala,只要熟悉SQL,就可以熟练地使用Impala来执行查询与分析的功能。不过Impala的SQL和关系数据库的SQL还是有一点微妙地不同的。 下面,我们设计一个表,通过该表中的数据,来将SQL查询与统计的语句,使用Solr查询的方式来与SQL查询对应。这个翻译的过程,是非常有趣的,你可以看到Solr一些很不错的功能。 用来示例的表结构设计,如图所示: 下面,我们通过给出一些SQL查询统计语句,然后对应翻译成Solr查询语句,然后对比结果。
查询对比
SQL查询语句:
1
SELECT
log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type
3
WHERE
prov_id = 1
AND
net_type = 1
AND
area_id = 10304
AND
time_type = 1
AND
time_id >= 20130801
AND
time_id <= 20130815
4
ORDER
BY
log_id LIMIT 10;
查询结果,如图所示: Solr查询URL:
1
http://slave1:8888/solr-cloud/i_event/select?q= *:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=prov_id:1 AND net_type:1 AND area_id:10304 AND time_type:1 AND time_id:[20130801 TO 20130815]&sort=log_id asc&start=0&rows=10
查询结果,如下所示:
002
<
lst
name
=
"responseHeader"
>
003
<
int
name
=
"status"
>0</
int
>
004
<
int
name
=
"QTime"
>4</
int
>
006
<
result
name
=
"response"
numFound
=
"77"
start
=
"0"
>
008
<
int
name
=
"log_id"
>6827</
int
>
009
<
long
name
=
"start_time"
>1375072117</
long
>
010
<
long
name
=
"end_time"
>1375081683</
long
>
011
<
int
name
=
"prov_id"
>1</
int
>
012
<
int
name
=
"city_id"
>103</
int
>
013
<
int
name
=
"area_id"
>10304</
int
>
014
<
int
name
=
"idt_id"
>11002</
int
>
015
<
int
name
=
"cnt"
>0</
int
>
016
<
int
name
=
"net_type"
>1</
int
>
019
<
int
name
=
"log_id"
>6827</
int
>
020
<
long
name
=
"start_time"
>1375072117</
long
>
021
<
long
name
=
"end_time"
>1375081683</
long
>
022
<
int
name
=
"prov_id"
>1</
int
>
023
<
int
name
=
"city_id"
>103</
int
>
024
<
int
name
=
"area_id"
>10304</
int
>
025
<
int
name
=
"idt_id"
>11000</
int
>
026
<
int
name
=
"cnt"
>0</
int
>
027
<
int
name
=
"net_type"
>1</
int
>
030
<
int
name
=
"log_id"
>6851</
int
>
031
<
long
name
=
"start_time"
>1375142158</
long
>
032
<
long
name
=
"end_time"
>1375146391</
long
>
033
<
int
name
=
"prov_id"
>1</
int
>
034
<
int
name
=
"city_id"
>103</
int
>
035
<
int
name
=
"area_id"
>10304</
int
>
036
<
int
name
=
"idt_id"
>14001</
int
>
037
<
int
name
=
"cnt"
>5</
int
>
038
<
int
name
=
"net_type"
>1</
int
>
041
<
int
name
=
"log_id"
>6851</
int
>
042
<
long
name
=
"start_time"
>1375142158</
long
>
043
<
long
name
=
"end_time"
>1375146391</
long
>
044
<
int
name
=
"prov_id"
>1</
int
>
045
<
int
name
=
"city_id"
>103</
int
>
046
<
int
name
=
"area_id"
>10304</
int
>
047
<
int
name
=
"idt_id"
>11002</
int
>
048
<
int
name
=
"cnt"
>23</
int
>
049
<
int
name
=
"net_type"
>1</
int
>
052
<
int
name
=
"log_id"
>6851</
int
>
053
<
long
name
=
"start_time"
>1375142158</
long
>
054
<
long
name
=
"end_time"
>1375146391</
long
>
055
<
int
name
=
"prov_id"
>1</
int
>
056
<
int
name
=
"city_id"
>103</
int
>
057
<
int
name
=
"area_id"
>10304</
int
>
058
<
int
name
=
"idt_id"
>10200</
int
>
059
<
int
name
=
"cnt"
>55</
int
>
060
<
int
name
=
"net_type"
>1</
int
>
063
<
int
name
=
"log_id"
>6851</
int
>
064
<
long
name
=
"start_time"
>1375142158</
long
>
065
<
long
name
=
"end_time"
>1375146391</
long
>
066
<
int
name
=
"prov_id"
>1</
int
>
067
<
int
name
=
"city_id"
>103</
int
>
068
<
int
name
=
"area_id"
>10304</
int
>
069
<
int
name
=
"idt_id"
>14000</
int
>
070
<
int
name
=
"cnt"
>4</
int
>
071
<
int
name
=
"net_type"
>1</
int
>
074
<
int
name
=
"log_id"
>6851</
int
>
075
<
long
name
=
"start_time"
>1375142158</
long
>
076
<
long
name
=
"end_time"
>1375146391</
long
>
077
<
int
name
=
"prov_id"
>1</
int
>
078
<
int
name
=
"city_id"
>103</
int
>
079
<
int
name
=
"area_id"
>10304</
int
>
080
<
int
name
=
"idt_id"
>11000</
int
>
081
<
int
name
=
"cnt"
>1</
int
>
082
<
int
name
=
"net_type"
>1</
int
>
085
<
int
name
=
"log_id"
>6851</
int
>
086
<
long
name
=
"start_time"
>1375142158</
long
>
087
<
long
name
=
"end_time"
>1375146391</
long
>
088
<
int
name
=
"prov_id"
>1</
int
>
089
<
int
name
=
"city_id"
>103</
int
>
090
<
int
name
=
"area_id"
>10304</
int
>
091
<
int
name
=
"idt_id"
>10201</
int
>
092
<
int
name
=
"cnt"
>31</
int
>
093
<
int
name
=
"net_type"
>1</
int
>
096
<
int
name
=
"log_id"
>6851</
int
>
097
<
long
name
=
"start_time"
>1375142158</
long
>
098
<
long
name
=
"end_time"
>1375146391</
long
>
099
<
int
name
=
"prov_id"
>1</
int
>
100
<
int
name
=
"city_id"
>103</
int
>
101
<
int
name
=
"area_id"
>10304</
int
>
102
<
int
name
=
"idt_id"
>8002</
int
>
103
<
int
name
=
"cnt"
>8</
int
>
104
<
int
name
=
"net_type"
>1</
int
>
107
<
int
name
=
"log_id"
>6851</
int
>
108
<
long
name
=
"start_time"
>1375142158</
long
>
109
<
long
name
=
"end_time"
>1375146391</
long
>
110
<
int
name
=
"prov_id"
>1</
int
>
111
<
int
name
=
"city_id"
>103</
int
>
112
<
int
name
=
"area_id"
>10304</
int
>
113
<
int
name
=
"idt_id"
>8000</
int
>
114
<
int
name
=
"cnt"
>30</
int
>
115
<
int
name
=
"net_type"
>1</
int
>
对比上面结果,除了根据idt_id排序方式不同以外(Impala是升序,Solr是降序),其他是相同的。
SQL查询语句:
1
SELECT
prov_id,
SUM
(cnt)
AS
sum_cnt,
AVG
(cnt)
AS
avg_cnt,
MAX
(cnt)
AS
max_cnt,
MIN
(cnt)
AS
min_cnt,
COUNT
(cnt)
AS
count_cnt
查询结果,如图所示: Solr查询URL:
查询结果,如下所示:
02
<
lst
name
=
"responseHeader"
>
03
<
int
name
=
"status"
>0</
int
>
04
<
int
name
=
"QTime"
>2</
int
>
06
<
result
name
=
"response"
numFound
=
"4088"
start
=
"0"
></
result
>
08
<
lst
name
=
"stats_fields"
>
10
<
double
name
=
"min"
>0.0</
double
>
11
<
double
name
=
"max"
>1258.0</
double
>
12
<
long
name
=
"count"
>4088</
long
>
13
<
long
name
=
"missing"
>0</
long
>
14
<
double
name
=
"sum"
>32587.0</
double
>
15
<
double
name
=
"sumOfSquares"
>9170559.0</
double
>
16
<
double
name
=
"mean"
>7.971379647749511</
double
>
17
<
double
name
=
"stddev"
>46.69344567709268</
double
>
对比查询结果,Solr提供了更多的统计项,如标准差(stddev)等,与SQL查询结果是一致的。
SQL查询语句:
1
SELECT
log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_typ
3
WHERE
prov_id = 1
AND
net_type = 1
AND
city_id
IN
(106,103)
AND
idt_id
IN
(12011,5004,6051,6056,8002)
AND
time_type = 1
AND
time_id >= 20130801
AND
time_id <= 20130815
4
ORDER
BY
log_id, start_time
DESC
LIMIT 10;
查询结果,如图所示: Solr查询URL:
1
http://slave1:8888/solr-cloud/i_event/select?q= *:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id, cnt,net_type&fq=prov_id:1 AND net_type:1 AND (city_id:106 OR city_id:103) AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND time_id:[20130801 TO 20130815]&sort=log_id asc ,start_time desc&start=0&rows=10
或
1
http://slave1:8888/solr-cloud/i_event/select?q= *:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id, cnt ,net_type&fq=prov_id:1&fq=net_type:1&fq=(city_id:106 OR city_id:103)&fq=(idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002)&fq=time_type:1&fq=time_id:[20130801 TO 20130815]&sort=log_id asc,start_time desc&start=0&rows=10
查询结果,如下所示:
002
<
lst
name
=
"responseHeader"
>
003
<
int
name
=
"status"
>0</
int
>
004
<
int
name
=
"QTime"
>6</
int
>
006
<
result
name
=
"response"
numFound
=
"63"
start
=
"0"
>
008
<
int
name
=
"log_id"
>6553</
int
>
009
<
long
name
=
"start_time"
>1374054184</
long
>
010
<
long
name
=
"end_time"
>1374054254</
long
>
011
<
int
name
=
"prov_id"
>1</
int
>
012
<
int
name
=
"city_id"
>103</
int
>
013
<
int
name
=
"area_id"
>10307</
int
>
014
<
int
name
=
"idt_id"
>12011</
int
>
015
<
int
name
=
"cnt"
>0</
int
>
016
<
int
name
=
"net_type"
>1</
int
>
019
<
int
name
=
"log_id"
>6553</
int
>
020
<
long
name
=
"start_time"
>1374054184</
long
>
021
<
long
name
=
"end_time"
>1374054254</
long
>
022
<
int
name
=
"prov_id"
>1</
int
>
023
<
int
name
=
"city_id"
>103</
int
>
024
<
int
name
=
"area_id"
>10307</
int
>
025
<
int
name
=
"idt_id"
>5004</
int
>
026
<
int
name
=
"cnt"
>2</
int
>
027
<
int
name
=
"net_type"
>1</
int
>
030
<
int
name
=
"log_id"
>6555</
int
>
031
<
long
name
=
"start_time"
>1374055060</
long
>
032
<
long
name
=
"end_time"
>1374055158</
long
>
033
<
int
name
=
"prov_id"
>1</
int
>
034
<
int
name
=
"city_id"
>103</
int
>
035
<
int
name
=
"area_id"
>70104</
int
>
036
<
int
name
=
"idt_id"
>5004</
int
>
037
<
int
name
=
"cnt"
>3</
int
>
038
<
int
name
=
"net_type"
>1</
int
>
041
<
int
name
=
"log_id"
>6555</
int
>
042
<
long
name
=
"start_time"
>1374055060</
long
>
043
<
long
name
=
"end_time"
>1374055158</
long
>
044
<
int
name
=
"prov_id"
>1</
int
>
045
<
int
name
=
"city_id"
>103</
int
>
046
<
int
name
=
"area_id"
>70104</
int
>
047
<
int
name
=
"idt_id"
>12011</
int
>
048
<
int
name
=
"cnt"
>0</
int
>
049
<
int
name
=
"net_type"
>1</
int
>
052
<
int
name
=
"log_id"
>6595</
int
>
053
<
long
name
=
"start_time"
>1374292508</
long
>
054
<
long
name
=
"end_time"
>1374292639</
long
>
055
<
int
name
=
"prov_id"
>1</
int
>
056
<
int
name
=
"city_id"
>103</
int
>
057
<
int
name
=
"area_id"
>10307</
int
>
058
<
int
name
=
"idt_id"
>5004</
int
>
059
<
int
name
=
"cnt"
>4</
int
>
060
<
int
name
=
"net_type"
>1</
int
>
063
<
int
name
=
"log_id"
>6611</
int
>
064
<
long
name
=
"start_time"
>1374461233</
long
>
065
<
long
name
=
"end_time"
>1374461245</
long
>
066
<
int
name
=
"prov_id"
>1</
int
>
067
<
int
name
=
"city_id"
>103</
int
>
068
<
int
name
=
"area_id"
>10307</
int
>
069
<
int
name
=
"idt_id"
>5004</
int
>
070
<
int
name
=
"cnt"
>1</
int
>
071
<
int
name
=
"net_type"
>1</
int
>
074
<
int
name
=
"log_id"
>6612</
int
>
075
<
long
name
=
"start_time"
>1374461261</
long
>
076
<
long
name
=
"end_time"
>1374461269</
long
>
077
<
int
name
=
"prov_id"
>1</
int
>
078
<
int
name
=
"city_id"
>103</
int
>
079
<
int
name
=
"area_id"
>10307</
int
>
080
<
int
name
=
"idt_id"
>5004</
int
>
081
<
int
name
=
"cnt"
>1</
int
>
082
<
int
name
=
"net_type"
>1</
int
>
085
<
int
name
=
"log_id"
>6612</
int
>
086
<
long
name
=
"start_time"
>1374461261</
long
>
087
<
long
name
=
"end_time"
>1374461269</
long
>
088
<
int
name
=
"prov_id"
>1</
int
>
089
<
int
name
=
"city_id"
>103</
int
>
090
<
int
name
=
"area_id"
>10307</
int
>
091
<
int
name
=
"idt_id"
>12011</
int
>
092
<
int
name
=
"cnt"
>0</
int
>
093
<
int
name
=
"net_type"
>1</
int
>
096
<
int
name
=
"log_id"
>6613</
int
>
097
<
long
name
=
"start_time"
>1374461422</
long
>
098
<
long
name
=
"end_time"
>1374461489</
long
>
099
<
int
name
=
"prov_id"
>1</
int
>
100
<
int
name
=
"city_id"
>103</
int
>
101
<
int
name
=
"area_id"
>10307</
int
>
102
<
int
name
=
"idt_id"
>6056</
int
>
103
<
int
name
=
"cnt"
>1</
int
>
104
<
int
name
=
"net_type"
>1</
int
>
107
<
int
name
=
"log_id"
>6613</
int
>
108
<
long
name
=
"start_time"
>1374461422</
long
>
109
<
long
name
=
"end_time"
>1374461489</
long
>
110
<
int
name
=
"prov_id"
>1</
int
>
111
<
int
name
=
"city_id"
>103</
int
>
112
<
int
name
=
"area_id"
>10307</
int
>
113
<
int
name
=
"idt_id"
>6051</
int
>
114
<
int
name
=
"cnt"
>1</
int
>
115
<
int
name
=
"net_type"
>1</
int
>
对比查询结果,是一致的。
SQL查询语句:
1
SELECT
log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type
3
WHERE
net_type = 1
AND
idt_id
IN
(12011,5004,6051,6056,8002)
AND
time_type = 1
AND
start_time >= 1373598465
AND
end_time < 1374055254
4
ORDER
BY
log_id, start_time, idt_id
DESC
LIMIT 30;
查询结果,如图所示: Solr查询URL:
1
http://slave1:8888/solr-cloud/i_event/select?q= *:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1 AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND start_time:[1373598465 TO 1374055254]&fq =-start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30
或
1
http://slave1:8888/solr-cloud/i_event/select?q= *:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1 AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND start_time:[1373598465 TO 1374055254] AND -start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30
或
1
http://slave1:8888/solr-cloud/i_event/select?q= *:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1&fq=idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002&fq =time_type:1&fq=start_time:[1373598465 TO 1374055254]&fq =-start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30
查询结果,如下所示:
02
<
lst
name
=
"responseHeader"
>
03
<
int
name
=
"status"
>0</
int
>
04
<
int
name
=
"QTime"
>5</
int
>
06
<
result
name
=
"response"
numFound
=
"4"
start
=
"0"
>
08
<
int
name
=
"log_id"
>6553</
int
>
09
<
long
name
=
"start_time"
>1374054184</
long
>
10
<
long
name
=
"end_time"
>1374054254</
long
>
11
<
int
name
=
"prov_id"
>1</
int
>
12
<
int
name
=
"city_id"
>103</
int
>
13
<
int
name
=
"area_id"
>10307</
int
>
14
<
int
name
=
"idt_id"
>12011</
int
>
15
<
int
name
=
"cnt"
>0</
int
>
16
<
int
name
=
"net_type"
>1</
int
>
19
<
int
name
=
"log_id"
>6553</
int
>
20
<
long
name
=
"start_time"
>1374054184</
long
>
21
<
long
name
=
"end_time"
>1374054254</
long
>
22
<
int
name
=
"prov_id"
>1</
int
>
23
<
int
name
=
"city_id"
>103</
int
>
24
<
int
name
=
"area_id"
>10307</
int
>
25
<
int
name
=
"idt_id"
>5004</
int
>
26
<
int
name
=
"cnt"
>2</
int
>
27
<
int
name
=
"net_type"
>1</
int
>
30
<
int
name
=
"log_id"
>6555</
int
>
31
<
long
name
=
"start_time"
>1374055060</
long
>
32
<
long
name
=
"end_time"
>1374055158</
long
>
33
<
int
name
=
"prov_id"
>1</
int
>
34
<
int
name
=
"city_id"
>103</
int
>
35
<
int
name
=
"area_id"
>70104</
int
>
36
<
int
name
=
"idt_id"
>12011</
int
>
37
<
int
name
=
"cnt"
>0</
int
>
38
<
int
name
=
"net_type"
>1</
int
>
41
<
int
name
=
"log_id"
>6555</
int
>
42
<
long
name
=
"start_time"
>1374055060</
long
>
43
<
long
name
=
"end_time"
>1374055158</
long
>
44
<
int
name
=
"prov_id"
>1</
int
>
45
<
int
name
=
"city_id"
>103</
int
>
46
<
int
name
=
"area_id"
>70104</
int
>
47
<
int
name
=
"idt_id"
>5004</
int
>
48
<
int
name
=
"cnt"
>3</
int
>
49
<
int
name
=
"net_type"
>1</
int
>
SQL查询语句:
1
SELECT
city_id, area_id,
COUNT
(cnt)
AS
count_cnt
3
WHERE
prov_id = 1
AND
net_type = 1
4
GROUP
BY
city_id, area_id;
查询结果,如图所示: Solr查询URL:
查询结果,如下所示:
002
<
lst
name
=
"responseHeader"
>
003
<
int
name
=
"status"
>0</
int
>
004
<
int
name
=
"QTime"
>72</
int
>
006
<
result
name
=
"response"
numFound
=
"1171"
start
=
"0"
></
result
>
007
<
lst
name
=
"facet_counts"
>
008
<
lst
name
=
"facet_queries"
/>
009
<
lst
name
=
"facet_fields"
/>
010
<
lst
name
=
"facet_dates"
/>
011
<
lst
name
=
"facet_ranges"
/>
012
<
lst
name
=
"facet_pivot"
>
013
<
arr
name
=
"city_id,area_id"
>
015
<
str
name
=
"field"
>city_id</
str
>
016
<
int
name
=
"value"
>103</
int
>
017
<
int
name
=
"count"
>678</
int
>
020
<
str
name
=
"field"
>area_id</
str
>
021
<
int
name
=
"value"
>10307</
int
>
022
<
int
name
=
"count"
>298</
int
>
025
<
str
name
=
"field"
>area_id</
str
>
026
<
int
name
=
"value"
>10315</
int
>
027
<
int
name
=
"count"
>120</
int
>
030
<
str
name
=
"field"
>area_id</
str
>
031
<
int
name
=
"value"
>10317</
int
>
032
<
int
name
=
"count"
>86</
int
>
035
<
str
name
=
"field"
>area_id</
str
>
036
<
int
name
=
"value"
>10304</
int
>
037
<
int
name
=
"count"
>67</
int
>
040
<
str
name
=
"field"
>area_id</
str
>
041
<
int
name
=
"value"
>10310</
int
>
042
<
int
name
=
"count"
>49</
int
>
045
<
str
name
=
"field"
>area_id</
str
>
046
<
int
name
=
"value"
>70104</
int
>
047
<
int
name
=
"count"
>48</
int
>
050
<
str
name
=
"field"
>area_id</
str
>
051
<
int
name
=
"value"
>10308</
int
>
052
<
int
name
=
"count"
>6</
int
>
055
<
str
name
=
"field"
>area_id</
str
>
056
<
int
name
=
"value"
>0</
int
>
057
<
int
name
=
"count"
>2</
int
>
060
<
str
name
=
"field"
>area_id</
str
>
061
<
int
name
=
"value"
>10311</
int
>
062
<
int
name
=
"count"
>2</
int
>
067
<
str
name
=
"field"
>city_id</
str
>
068
<
int
name
=
"value"
>0</
int
>
069
<
int
name
=
"count"
>463</
int
>
072
<
str
name
=
"field"
>area_id</
str
>
073
<
int
name
=
"value"
>0</
int
>
074
<
int
name
=
"count"
>395</
int
>
077
<
str
name
=
"field"
>area_id</
str
>
078
<
int
name
=
"value"
>10307</
int
>
079
<
int
name
=
"count"
>68</
int
>
084
<
str
name
=
"field"
>city_id</
str
>
085
<
int
name
=
"value"
>106</
int
>
086
<
int
name
=
"count"
>10</
int
>
089
<
str
name
=
"field"
>area_id</
str
>
090
<
int
name
=
"value"
>10304</
int
>
091
<
int
name
=
"count"
>10</
int
>
096
<
str
name
=
"field"
>city_id</
str
>
097
<
int
name
=
"value"
>110</
int
>
098
<
int
name
=
"count"
>8</
int
>
101
<
str
name
=
"field"
>area_id</
str
>
102
<
int
name
=
"value"
>0</
int
>
103
<
int
name
=
"count"
>8</
int
>
108
<
str
name
=
"field"
>city_id</
str
>
109
<
int
name
=
"value"
>118</
int
>
110
<
int
name
=
"count"
>8</
int
>
113
<
str
name
=
"field"
>area_id</
str
>
114
<
int
name
=
"value"
>10316</
int
>
115
<
int
name
=
"count"
>8</
int
>
120
<
str
name
=
"field"
>city_id</
str
>
121
<
int
name
=
"value"
>105</
int
>
122
<
int
name
=
"count"
>4</
int
>
125
<
str
name
=
"field"
>area_id</
str
>
126
<
int
name
=
"value"
>0</
int
>
127
<
int
name
=
"count"
>4</
int
>
对比上面结果,Solr查询结果,需要从上面的各组中进行合并,得到最终的统计结果,结果和SQL结果是一致的。
多个字段分组统计(支持count、sum、max、min等函数)
一次对多个字段进行独立分组统计,Solr可以很好的支持。这相当于执行两个带有GROUP BY子句的SQL,这两个GROUP BY分别只对一个字段进行汇总统计。 SQL查询语句:
1
SELECT
city_id, area_id,
COUNT
(cnt)
AS
count_cnt
3
WHERE
prov_id = 1
AND
net_type = 1
6
SELECT
city_id, area_id,
COUNT
(cnt)
AS
count_cnt
8
WHERE
prov_id = 1
AND
net_type = 1
查询结果,不再显示。 Solr查询URL:
查询结果,如下所示:
002
<
lst
name
=
"responseHeader"
>
003
<
int
name
=
"status"
>0</
int
>
004
<
int
name
=
"QTime"
>6</
int
>
006
<
result
name
=
"response"
numFound
=
"1171"
start
=
"0"
></
result
>
008
<
lst
name
=
"stats_fields"
>
010
<
double
name
=
"min"
>0.0</
double
>
011
<
double
name
=
"max"
>167.0</
double
>
012
<
long
name
=
"count"
>1171</
long
>
013
<
long
name
=
"missing"
>0</
long
>
014
<
double
name
=
"sum"
>3701.0</
double
>
015
<
double
name
=
"sumOfSquares"
>249641.0</
double
>
016
<
double
name
=
"mean"
>3.1605465414175917</
double
>
017
<
double
name
=
"stddev"
>14.260812879164407</
double
>
021
<
double
name
=
"min"
>0.0</
double
>
022
<
double
name
=
"max"
>167.0</
double
>
023
<
long
name
=
"count"
>463</
long
>
024
<
long
name
=
"missing"
>0</
long
>
025
<
double
name
=
"sum"
>2783.0</
double
>
026
<
double
name
=
"sumOfSquares"
>238819.0</
double
>
027
<
double
name
=
"mean"
>6.010799136069115</
double
>
028
<
double
name
=
"stddev"
>21.92524420257807</
double
>
029
<
lst
name
=
"facets"
/>
032
<
double
name
=
"min"
>0.0</
double
>
033
<
double
name
=
"max"
>1.0</
double
>
034
<
long
name
=
"count"
>8</
long
>
035
<
long
name
=
"missing"
>0</
long
>
036
<
double
name
=
"sum"
>3.0</
double
>
037
<
double
name
=
"sumOfSquares"
>3.0</
double
>
038
<
double
name
=
"mean"
>0.375</
double
>
039
<
double
name
=
"stddev"
>0.5175491695067657</
double
>
040
<
lst
name
=
"facets"
/>
043
<
double
name
=
"min"
>0.0</
double
>
044
<
double
name
=
"max"
>0.0</
double
>
045
<
long
name
=
"count"
>10</
long
>
046
<
long
name
=
"missing"
>0</
long
>
047
<
double
name
=
"sum"
>0.0</
double
>
048
<
double
name
=
"sumOfSquares"
>0.0</
double
>
049
<
double
name
=
"mean"
>0.0</
double
>
050
<
double
name
=
"stddev"
>0.0</
double
>
051
<
lst
name
=
"facets"
/>
054
<
double
name
=
"min"
>0.0</
double
>
055
<
double
name
=
"max"
>0.0</
double
>
056
<
long
name
=
"count"
>4</
long
>
057
<
long
name
=
"missing"
>0</
long
>
058
<
double
name
=
"sum"
>0.0</
double
>
059
<
double
name
=
"sumOfSquares"
>0.0</
double
>
060
<
double
name
=
"mean"
>0.0</
double
>
061
<
double
name
=
"stddev"
>0.0</
double
>
062
<
lst
name
=
"facets"
/>
065
<
double
name
=
"min"
>0.0</
double
>
066
<
double
name
=
"max"
>55.0</
double
>
067
<
long
name
=
"count"
>678</
long
>
068
<
long
name
=
"missing"
>0</
long
>
069
<
double
name
=
"sum"
>915.0</
double
>
070
<
double
name
=
"sumOfSquares"
>10819.0</
double
>
071
<
double
name
=
"mean"
>1.3495575221238938</
double
>
072
<
double
name
=
"stddev"
>3.7625525739676986</
double
>
073
<
lst
name
=
"facets"
/>
076
<
double
name
=
"min"
>0.0</
double
>
077
<
double
name
=
"max"
>0.0</
double
>
078
<
long
name
=
"count"
>8</
long
>
079
<
long
name
=
"missing"
>0</
long
>
080
<
double
name
=
"sum"
>0.0</
double
>
081
<
double
name
=
"sumOfSquares"
>0.0</
double
>
082
<
double
name
=
"mean"
>0.0</
double
>
083
<
double
name
=
"stddev"
>0.0</
double
>
084
<
lst
name
=
"facets"
/>
089
<
double
name
=
"min"
>0.0</
double
>
090
<
double
name
=
"max"
>1.0</
double
>
091
<
long
name
=
"count"
>6</
long
>
092
<
long
name
=
"missing"
>0</
long
>
093
<
double
name
=
"sum"
>1.0</
double
>
094
<
double
name
=
"sumOfSquares"
>1.0</
double
>
095
<
double
name
=
"mean"
>0.16666666666666666</
double
>
096
<
double
name
=
"stddev"
>0.408248290463863</
double
>
097
<
lst
name
=
"facets"
/>
100
<
double
name
=
"min"
>0.0</
double
>
101
<
double
name
=
"max"
>5.0</
double
>
102
<
long
name
=
"count"
>49</
long
>
103
<
long
name
=
"missing"
>0</
long
>
104
<
double
name
=
"sum"
>40.0</
double
>
105
<
double
name
=
"sumOfSquares"
>108.0</
double
>
106
<
double
name
=
"mean"
>0.8163265306122449</
double
>
107
<
double
name
=
"stddev"
>1.2528878206593208</
double
>
108
<
lst
name
=
"facets"
/>
111
<
double
name
=
"min"
>0.0</
double
>
112
<
double
name
=
"max"
>167.0</
double
>
113
<
long
name
=
"count"
>409</
long
>
114
<
long
name
=
"missing"
>0</
long
>
115
<
double
name
=
"sum"
>2722.0</
double
>
116
<
double
name
=
"sumOfSquares"
>238550.0</
double
>
117
<
double
name
=
"mean"
>6.6552567237163816</
double
>
118
<
double
name
=
"stddev"
>23.243931908854</
double
>
119
<
lst
name
=
"facets"
/>
122
<
double
name
=
"min"
>0.0</
double
>
123
<
double
name
=
"max"
>0.0</
double
>
124
<
long
name
=
"count"
>2</
long
>
125
<
long
name
=
"missing"
>0</
long
>
126
<
double
name
=
"sum"
>0.0</
double
>
127
<
double
name
=
"sumOfSquares"
>0.0</
double
>
128
<
double
name
=
"mean"
>0.0</
double
>
129
<
double
name
=
"stddev"
>0.0</
double
>
130
<
lst
name
=
"facets"
/>
133
<
double
name
=
"min"
>0.0</
double
>
134
<
double
name
=
"max"
>55.0</
double
>
135
<
long
name
=
"count"
>77</
long
>
136
<
long
name
=
"missing"
>0</
long
>
137
<
double
name
=
"sum"
>370.0</
double
>
138
<
double
name
=
"sumOfSquares"
>9476.0</
double
>
139
<
double
name
=
"mean"
>4.805194805194805</
double
>
140
<
double
name
=
"stddev"
>10.064318107786017</
double
>
141
<
lst
name
=
"facets"
/>
144
<
double
name
=
"min"
>0.0</
double
>
145
<
double
name
=
"max"
>3.0</
double
>
146
<
long
name
=
"count"
>48</
long
>
147
<
long
name
=
"missing"
>0</
long
>
148
<
double
name
=
"sum"
>51.0</
double
>
149
<
double
name
=
"sumOfSquares"
>117.0</
double
>
150
<
double
name
=
"mean"
>1.0625</
double
>
151
<
double
name
=
"stddev"
>1.1560433254047038</
double
>
152
<
lst
name
=
"facets"
/>
155
<
double
name
=
"min"
>0.0</
double
>
156
<
double
name
=
"max"
>12.0</
double
>
157
<
long
name
=
"count"
>366</
long
>
158
<
long
name
=
"missing"
>0</
long
>
159
<
double
name
=
"sum"
>274.0</
double
>
160
<
double
name
=
"sumOfSquares"
>768.0</
double
>
161
<
double
name
=
"mean"
>0.7486338797814208</
double
>
162
<
double
name
=
"stddev"
>1.2418218134151426</
double
>
163
<
lst
name
=
"facets"
/>
166
<
double
name
=
"min"
>0.0</
double
>
167
<
double
name
=
"max"
>4.0</
double
>
168
<
long
name
=
"count"
>120</
long
>
169
<
long
name
=
"missing"
>0</
long
>
170
<
double
name
=
"sum"
>143.0</
double
>
171
<
double
name
=
"sumOfSquares"
>359.0</
double
>
172
<
double
name
=
"mean"
>1.1916666666666667</
double
>
173
<
double
name
=
"stddev"
>1.2588899560996694</
double
>
174
<
lst
name
=
"facets"
/>
177
<
double
name
=
"min"
>0.0</
double
>
178
<
double
name
=
"max"
>0.0</
double
>
179
<
long
name
=
"count"
>8</
long
>
180
<
long
name
=
"missing"
>0</
long
>
181
<
double
name
=
"sum"
>0.0</
double
>
182
<
double
name
=
"sumOfSquares"
>0.0</
double
>
183
<
double
name
=
"mean"
>0.0</
double
>
184
<
double
name
=
"stddev"
>0.0</
double
>
185
<
lst
name
=
"facets"
/>
188
<
double
name
=
"min"
>0.0</
double
>
189
<
double
name
=
"max"
>5.0</
double
>
190
<
long
name
=
"count"
>86</
long
>
191
<
long
name
=
"missing"
>0</
long
>
192
<
double
name
=
"sum"
>100.0</
double
>
193
<
double
name
=
"sumOfSquares"
>262.0</
double
>
194
<
double
name
=
"mean"
>1.1627906976744187</
double
>
195
<
double
name
=
"stddev"
>1.3093371930442208</
double
>
196
<
lst
name
=
"facets"
/>
多个字段联合分组统计(支持count、sum、max、min等函数)
SQL查询语句:
1
SELECT
city_id, area_id,
SUM
(cnt)
AS
sum_cnt,
AVG
(cnt)
AS
avg_cnt,
MAX
(cnt)
AS
max_cnt,
MIN
(cnt)
AS
min_cnt,
COUNT
(cnt)
AS
count_cnt
3
WHERE
prov_id = 1
AND
net_type = 1
4
GROUP
BY
city_id, area_id;
查询结果,如图所示: Solr目前不能简单的支持这种查询,如果想要满足这种查询统计,需要在schema的设计上,将一个字段设置为多值,然后通过多个值进行分组统计。如果应用中查询统计分析的模式比较固定,预先知道哪些字段会用于联合分组统计,完全可以在设计的时候,考虑设置多值字段来满足这种需求。