GeoLife GPS Trajectories
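This GPS trajectory dataset comes from the Microsoft Research GeoLife project. It was collected from 182 users between April 2007 and August 2012. Each trajectory is a time-ordered sequence of points, and each point carries latitude, longitude, altitude, and related information. The dataset contains 17,621 trajectories, covering more than 1.2 million kilometers and more than 48,000 hours in total. It records not only users' routine movements at home and at work, but also a wide range of outdoor activities such as shopping, sightseeing, hiking, and cycling.
Download: https://www.microsoft.com/en-us/download/details.aspx?id=52367
1. File layout and data format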
├── Data
│   ├── 000
│   │   └── Trajectory
│   │       ├── 20081023025304.plt
│   │       ├── 20081024020959.plt
│   │       ├── 20090521231053.plt
│   │       └── 20090705025307.plt
│   ├── 001
│   │   └── Trajectory
│   │       ├── 20081023055305.plt
│   │       ├── 20081023234104.plt
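Because the user ID only appears as a directory name, it has to be recovered from the file path. The following is a minimal sketch of that step, assuming the layout shown above (Data/<user>/Trajectory/<file>.plt); the object name `GeoLifePath` and the function `userAndFile` are hypothetical helpers, not part of the dataset or of Spark.

```scala
object GeoLifePath {
  // Split ".../<userId>/Trajectory/<file>.plt" into (userId, fileName).
  // Returns None when the path does not follow that layout.
  def userAndFile(path: String): Option[(String, String)] = {
    val parts = path.split("/").filter(_.nonEmpty)
    parts.takeRight(3) match {
      case Array(user, "Trajectory", file) if file.endsWith(".plt") => Some((user, file))
      case _ => None
    }
  }
}
```

Matching on the last three path segments keeps the result independent of where the dataset is unpacked, which a fixed index into the split path would not.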
Data format example:
39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30
39.906554,116.385625,0,492,40097.5865162037,2009-10-11,14:04:35
Lines 1 to 6 of each file are header lines that carry no useful information in this dataset and can be ignored. The points follow, one per line.
Field 1: Latitude in decimal degrees.
Field 2: Longitude in decimal degrees.
Field 3: All set to 0 for this dataset.
Field 4: Altitude in feet (-777 if not valid).
Field 5: Date - number of days (with fractional part) that have passed since 12/30/1899.
Field 6: Date as a string.
Field 7: Time as a string.
Note that field 5 and fields 6 and 7 represent the same date/time in this dataset. You may use either of them.
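The field list above can be turned into a small parser. The sketch below is one possible reading of it, not code shipped with the dataset: `PltPoint`, `Point`, `serialToDateTime`, and `parse` are hypothetical names, the invalid-altitude marker -777 is mapped to `None`, and field 5 is converted by counting days from 12/30/1899 and rounding the fractional part to whole seconds.

```scala
import java.time.{LocalDate, LocalDateTime}

object PltPoint {
  // Day zero of the serial date stored in field 5 (12/30/1899).
  private val Epoch: LocalDateTime = LocalDate.of(1899, 12, 30).atStartOfDay()

  final case class Point(lat: Double, lon: Double, altitudeFeet: Option[Double], time: LocalDateTime)

  // Convert "days since 12/30/1899" (with a fractional part) to a timestamp,
  // rounded to the nearest second.
  def serialToDateTime(days: Double): LocalDateTime = {
    val wholeDays = math.floor(days).toLong
    val seconds = math.round((days - wholeDays) * 86400)
    Epoch.plusDays(wholeDays).plusSeconds(seconds)
  }

  // Parse one data line (not one of the six header lines).
  def parse(line: String): Point = {
    val f = line.trim.split(",")
    val alt = f(3).toDouble
    Point(
      lat = f(0).toDouble,
      lon = f(1).toDouble,
      altitudeFeet = if (alt == -777) None else Some(alt), // -777 marks an invalid altitude
      time = serialToDateTime(f(4).toDouble)
    )
  }
}
```

For the first sample line, field 5 (40097.5864583333) converts to 2009-10-11 14:04:30, the same instant that fields 6 and 7 spell out as strings.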
2. Extract and transform with Spark
2.1 Add the Spark and Elasticsearch dependencies to Maven
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>7.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
2.2 Spark program that writes to Elasticsearch
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

object GeoToES {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .config("es.index.auto.create", "true")
      .config("es.nodes", "127.0.0.1")
      .config("es.port", "9200")
      .appName("log")
      .getOrCreate()
    val sc = spark.sparkContext

    // Each element is (file path, whole file content).
    // This glob expects ./data/geoData/<user>/<file>.plt; if you keep the dataset's
    // original layout, use ".../*/Trajectory/*.plt" and adjust splitContext accordingly.
    val rdd = sc.wholeTextFiles("./data/geoData/*/*.plt")
    val rdd2 = rdd.map(x => FileAndContext(x._1, x._2))
    val rdd3 = rdd2.flatMap(splitContext)

    val dataFrame = spark.createDataFrame(rdd3)
    dataFrame.show(false)
    dataFrame.saveToEs("person_geo_time_location")

    // Stopping the session also stops the underlying SparkContext.
    spark.stop()
  }

  // Turn one .plt file into one record per GPS point.
  def splitContext(x: FileAndContext): List[PersonRealTimePosition] = {
    // With the glob above, the user ID is the file's parent directory.
    val parts = x.file.split("/")
    val personName = parts(parts.length - 2)
    x.context.split("\\r?\\n")
      .drop(6)               // lines 1 to 6 are header lines
      .map(_.split(","))
      .filter(_.length >= 7) // skip blank or truncated lines
      // Sample line: 39.99999,116.327396,0,92,39752.4790277778,2008-10-31,11:29:48
      // A geo_point given as an array must be ordered [longitude, latitude].
      .map(ss => PersonRealTimePosition(personName, Array(ss(1).toDouble, ss(0).toDouble), ss(5) + " " + ss(6)))
      .toList
  }

  // Alternative schema with separate coordinates:
  // case class PersonRealTimePosition(personName: String, latitude: Double, longitude: Double, time: String)
  case class PersonRealTimePosition(personName: String, position: Array[Double], time: String)
  case class FileAndContext(file: String, context: String)
}
3. Elasticsearch
3.1 Create the index
PUT person_geo_time_location
{
  "mappings": {
    "properties": {
      "personName": {
        "type": "keyword"
      },
      "position": {
        "type": "geo_point"
      },
      "time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
# Search the index
GET person_geo_time_location/_search
# Delete the index
DELETE person_geo_time_location
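The `time` field only indexes cleanly if the string built from fields 6 and 7 matches the pattern declared in the mapping. As a rough local check, the sketch below parses the joined string with `java.time` using the same pattern. This is an approximation of the Elasticsearch date parser, not the parser itself, and `TimeField`, `joinDateTime`, and `isValid` are hypothetical names.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object TimeField {
  // Same pattern as the "format" of the time field in the index mapping.
  private val Fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Join the date string (field 6) and the time string (field 7) with a single space.
  def joinDateTime(date: String, time: String): String = date + " " + time

  // True if the value parses under the mapping's pattern.
  def isValid(value: String): Boolean =
    Try(LocalDateTime.parse(value, Fmt)).isSuccess
}
```

Running a check like this over a sample of lines before indexing is a cheap way to catch header lines or malformed rows that would otherwise be rejected by the `date` mapping.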
3.2 Run the Spark program
4. Display in Kibana
Polygon area query
GET person_geo_time_location/_search
{
  "size": 10000,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_polygon": {
          "position": {
            "points": [[120.16087071364484,33.362620916202346],[120.15767787519458,33.36015291341143],[120.15605546269813,33.35866426916683],[120.15490042270382,33.35746054807127],[120.15249817743417,33.355856777638884],[120.14898651015477,33.35369297900035],[120.14769618773781,33.352883360852076],[120.14575278703055,33.35174334958074],[120.14314632497516,33.35029689059761],[120.14528151284478,33.347831792929696],[120.14706608167971,33.34575916170193],[120.15068212527135,33.34159854967904],[120.15261959667367,33.33939803668557],[120.1546651583634,33.337056142664316],[120.15575997491486,33.33580978887047],[120.15723154389163,33.334170489150814],[120.15952897225307,33.33566894835298],[120.16206921453815,33.33730405885229],[120.16803711087628,33.337401056389126],[120.17187876183662,33.339529697144954],[120.17640792235093,33.342381957341786],[120.17436088714622,33.34473709462396],[120.17681950268545,33.34638298783318],[120.17431170780574,33.34964904932173],[120.17267057379742,33.35155695220517],[120.17039041062083,33.353871967775255],[120.16876820486264,33.356331270904334],[120.16723802260562,33.35755302052758],[120.16620351333903,33.35808542840889],[120.16384718081144,33.360070149758],[120.16188625148396,33.36182902735397],[120.16188625148396,33.36182902735397]]
          }
        }
      }
    }
  }
}
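To get a feel for what the polygon filter selects, the same test can be sketched locally with the classic ray-casting rule: a point is inside if a horizontal ray from it crosses the polygon boundary an odd number of times. This is a planar approximation written for illustration and is not how Elasticsearch evaluates `geo_polygon` internally; `GeoPolygon` and `contains` are hypothetical names, and points are (longitude, latitude) pairs in the same order as the arrays in the query.

```scala
object GeoPolygon {
  // A vertex or query point as (longitude, latitude).
  type LonLat = (Double, Double)

  // Ray casting: count the polygon edges crossed by a horizontal ray
  // starting at p; an odd count means p lies inside the polygon.
  def contains(polygon: Seq[LonLat], p: LonLat): Boolean = {
    if (polygon.length < 3) false
    else {
      val (x, y) = p
      // Pair each vertex with the next one, closing the ring back to the first vertex.
      val edges = polygon.zip(polygon.tail :+ polygon.head)
      val crossings = edges.count { case ((x1, y1), (x2, y2)) =>
        // The edge straddles the ray's latitude and the crossing lies to the right of p.
        (y1 > y) != (y2 > y) && x < x1 + (y - y1) * (x2 - x1) / (y2 - y1)
      }
      crossings % 2 == 1
    }
  }
}
```

Repeated vertices, such as the duplicated last point in the query above, form zero-length edges that never straddle the ray, so they do not change the result.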