Extracting and transforming 182 users' trajectory data into Elasticsearch with Spark, visualized in Kibana

GeoLife GPS Trajectories

This GPS trajectory dataset comes from the Microsoft Research GeoLife project. It contains trajectories of 182 users collected from April 2007 to August 2012. Each trajectory is a time-ordered sequence of points, and every point carries latitude, longitude, altitude, and a timestamp. The dataset holds 17,621 trajectories with a total distance of more than 1.2 million kilometers and a total duration of more than 48,000 hours. Beyond routine movements between home and work, it also records a wide range of outdoor activities such as shopping, sightseeing, hiking, and cycling.
Download: https://www.microsoft.com/en-us/download/details.aspx?id=52367


1. File Layout and Record Structure
├── Data
│   ├── 000
│   │   └── Trajectory
│   │       ├── 20081023025304.plt
│   │       ├── 20081024020959.plt
│   │       ├── 20090521231053.plt
│   │       └── 20090705025307.plt
│   ├── 001
│   │   └── Trajectory
│   │       ├── 20081023055305.plt
│   │       ├── 20081023234104.plt

Record example:

39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30
39.906554,116.385625,0,492,40097.5865162037,2009-10-11,14:04:35
Lines 1…6 of each .plt file are header lines and can be ignored. Each subsequent line describes one point.
Field 1: Latitude in decimal degrees.
Field 2: Longitude in decimal degrees.
Field 3: All set to 0 for this dataset.
Field 4: Altitude in feet (-777 if not valid).
Field 5: Date as the number of days (with fractional part) that have passed since 12/30/1899.
Field 6: Date as a string.
Field 7: Time as a string.
Note that field 5 and fields 6–7 represent the same date/time in this dataset. You may use either of them.
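The equivalence of field 5 and fields 6–7 can be checked directly in plain Scala (no Spark needed); a minimal sketch, where the object and helper names are illustrative:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Field 5 is an Excel-style serial date: days (with fraction) since 1899-12-30.
// Converting it back should reproduce the strings in fields 6 and 7.
object PltFieldCheck {
  private val epoch = LocalDateTime.of(1899, 12, 30, 0, 0)
  private val fmt   = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Fractional days -> seconds, rounded to absorb the truncated decimal in the file.
  def serialToString(days: Double): String =
    epoch.plusSeconds(math.round(days * 86400)).format(fmt)

  def main(args: Array[String]): Unit = {
    val line = "39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30"
    val f = line.split(",")
    println(serialToString(f(4).toDouble)) // 2009-10-11 14:04:30
  }
}
```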

2. Spark Extraction and Transformation
2.1 Add the Spark and Elasticsearch dependencies in Maven
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-20_2.11</artifactId>
        <version>7.2.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.3</version>
    </dependency>
import org.apache.spark.sql.SparkSession

import scala.collection.mutable.ArrayBuffer
import org.elasticsearch.spark.sql._

object GeoToES {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .config("es.index.auto.create", "true") // create the index if it does not exist yet
      .config("es.nodes", "127.0.0.1")
      .config("es.port", "9200")
      .appName("log")
      .getOrCreate()

    val sc = spark.sparkContext

    // wholeTextFiles yields one (path, fileContent) pair per .plt file
    val rdd = sc.wholeTextFiles("./data/geoData/*/*.plt")
    val rdd2 = rdd.map(x => FileAndContext(x._1, x._2))
    val rdd3 = rdd2.flatMap(splitContext)

    val dataFrame = spark.createDataFrame(rdd3)
    dataFrame.show(false)
    // saveToEs comes from the org.elasticsearch.spark.sql._ import
    dataFrame.saveToEs("person_geo_time_location")
    sc.stop()
    spark.close()
  }

  // Parse one .plt file into trajectory points.
  def splitContext(x: FileAndContext): List[PersonRealTimePosition] = {
    val space = " "
    val arr = ArrayBuffer[PersonRealTimePosition]()
    // The user id is the 3-digit directory name in the path;
    // the split index depends on where the data directory lives.
    val personName = x.file.split("/")(6)
    // Skip the 6 header lines, then keep only well-formed 7-field records
    // (otherwise toDouble throws on the header text).
    val lines = x.context.split("\r?\n").drop(6)
    for (line <- lines if line.count(_ == ',') == 6) {
      //39.99999,116.327396,0,92,39752.4790277778,2008-10-31,11:29:48
      val ss = line.split(",")
      // geo_point given as an array expects [lon, lat] order
      arr += PersonRealTimePosition(personName, Array(ss(1).toDouble, ss(0).toDouble), ss(5) + space + ss(6))
    }
    arr.toList
  }

  //case class PersonRealTimePosition(personName: String, latitude: Double, longitude: Double, time: String)
  case class PersonRealTimePosition(personName: String, position: Array[Double], time: String)

  case class FileAndContext(file: String, context: String)

}
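The `personName` extraction above relies on a fixed path-segment index (`split("/")(6)`), which silently breaks if the data directory moves or the path depth changes. A regex over the path is more robust; a sketch, where the object name and regex are illustrative:

```scala
// Extract the 3-digit GeoLife user id from a wholeTextFiles path
// by matching the directory directly above "Trajectory", instead of
// relying on a fixed split("/") index.
object UserIdFromPath {
  private val UserDir = """.*/(\d{3})/Trajectory/[^/]+\.plt""".r

  def userId(path: String): Option[String] = path match {
    case UserDir(id) => Some(id)
    case _           => None // path does not look like a GeoLife .plt file
  }

  def main(args: Array[String]): Unit = {
    println(userId("file:/data/geoData/000/Trajectory/20081023025304.plt")) // Some(000)
  }
}
```

Returning an `Option` also gives a natural place to drop files that slipped into the glob but do not follow the GeoLife layout.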

3. Elasticsearch
3.1 Create the index
PUT person_geo_time_location
{
  "mappings": {
    "properties": {
      "personName": {
        "type": "keyword"
      },
      "position": {
        "type": "geo_point"
      },
      "time": {
        "type": "date",
        "format": [
          "yyyy-MM-dd HH:mm:ss"
        ]
      }
    }
  }
}
// search the index
GET person_geo_time_location/_search
// delete the index
DELETE person_geo_time_location
3.2 Run the Spark job

4 Kibana Visualization

In Kibana, create an index pattern for person_geo_time_location and plot the position geo_point field on a map visualization to display the trajectories.



Polygon area query: retrieve all points that fall inside a polygon.

GET person_geo_time_location/_search
{
    "size":10000,
    "query": {

        "bool" : {
            "must" : {
                "match_all" : {}

            },
            "filter" : {
                "geo_polygon" : {
                    "position" : {
                        "points" :[[120.16087071364484,33.362620916202346],[120.15767787519458,33.36015291341143],[120.15605546269813,33.35866426916683],[120.15490042270382,33.35746054807127],[120.15249817743417,33.355856777638884],[120.14898651015477,33.35369297900035],[120.14769618773781,33.352883360852076],[120.14575278703055,33.35174334958074],[120.14314632497516,33.35029689059761],[120.14528151284478,33.347831792929696],[120.14706608167971,33.34575916170193],[120.15068212527135,33.34159854967904],[120.15261959667367,33.33939803668557],[120.1546651583634,33.337056142664316],[120.15575997491486,33.33580978887047],[120.15723154389163,33.334170489150814],[120.15952897225307,33.33566894835298],[120.16206921453815,33.33730405885229],[120.16803711087628,33.337401056389126],[120.17187876183662,33.339529697144954],[120.17640792235093,33.342381957341786],[120.17436088714622,33.34473709462396],[120.17681950268545,33.34638298783318],[120.17431170780574,33.34964904932173],[120.17267057379742,33.35155695220517],[120.17039041062083,33.353871967775255],[120.16876820486264,33.356331270904334],[120.16723802260562,33.35755302052758],[120.16620351333903,33.35808542840889],[120.16384718081144,33.360070149758],[120.16188625148396,33.36182902735397],[120.16188625148396,33.36182902735397]]
                    }
                }
            }
        }
    }
}
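Conceptually, the geo_polygon filter keeps every document whose position falls inside the listed vertices. The same containment test can be sketched locally with ray casting; this helper is purely illustrative and not part of the Spark job:

```scala
// Ray-casting point-in-polygon test over (lon, lat) vertices:
// cast a horizontal ray from the point and count edge crossings;
// an odd count means the point is inside.
object PointInPolygon {
  def contains(poly: Seq[(Double, Double)], lon: Double, lat: Double): Boolean = {
    var inside = false
    var j = poly.length - 1
    for (i <- poly.indices) {
      val (xi, yi) = poly(i)
      val (xj, yj) = poly(j)
      // Does edge (j, i) straddle the point's latitude, and does the
      // ray to the east of the point cross it?
      if ((yi > lat) != (yj > lat) &&
          lon < (xj - xi) * (lat - yi) / (yj - yi) + xi)
        inside = !inside
      j = i
    }
    inside
  }

  def main(args: Array[String]): Unit = {
    val square = Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0))
    println(contains(square, 0.5, 0.5)) // true
    println(contains(square, 2.0, 0.5)) // false
  }
}
```

Note this treats coordinates as a flat plane; Elasticsearch performs the test on the geodetic surface, so results can differ near polygon edges or over large areas.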