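Simple Elasticsearch in Practice (1): Introduction
Simple Elasticsearch in Practice (2): Scraping job listings from a recruitment site with Python
Simple Elasticsearch in Practice (3): Connecting to Elasticsearch from Python
Simple Elasticsearch in Practice (4): Importing cleaned data from MySQL into Elasticsearch
Simple Elasticsearch in Practice (5): Simple data analysis with Kibana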
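OK, we now have the raw data, and the next step is to clean it and load it into ES. Before that, though, let's look at how to connect to Elasticsearch from Python and how an index is set up.
Connecting to ES
Here we use the elasticsearch_dsl library to connect.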
pip install elasticsearch_dsl
Once the library is installed, import what we need:
from elasticsearch_dsl import Document, Date, Text, Keyword, InnerDoc, Object, connections
Creating the connection is easy, just one line of code.
connections.create_connection(hosts=['localhost'])
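Before creating the connection, it can help to confirm that a node is actually answering. The following is a minimal sketch using only the standard library; it assumes Elasticsearch is listening on its default HTTP port 9200 on localhost without authentication or TLS, and the function name es_is_reachable is made up for illustration.

```python
import http.client
import urllib.error
import urllib.request


def es_is_reachable(base_url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET on the node's root URL answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status,
        # or a reply that is not valid HTTP: treat all of these as "not reachable".
        return False
```

If es_is_reachable() returns True, the create_connection call above should be talking to the same node. A secured cluster would answer with an error status here, so a False result does not always mean the node is down.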
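Creating the objects
For our purposes we need three objects. Keeping the three kinds of information separate also makes the model easier to read.
- Address: an address
- Company: a company
- Job: a job listing
The Address class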
class Address(InnerDoc):
    # City
    city = Text(fields={'keyword': Keyword()})
    # District
    area = Text()
    # Street address, analyzed with ik_smart (requires the IK analysis plugin on the ES node)
    detail = Text(analyzer="ik_smart")
The Company class
class Company(InnerDoc):
    # Company name
    company_name = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    # Company size (number of employees)
    number_person = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    # Company type
    company_property = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    # Tags
    company_tag = Keyword()
    # Company address
    company_address = Object(Address)
The Job class
class Job(Document):
    # Salary
    salary = Text(fields={'keyword': Keyword()})
    # Job title
    job_name = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    # Job type
    job_type = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    # Work experience
    work_experience = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    # Education
    education = Text(fields={'keyword': Keyword()})
    # Benefits
    welfare = Keyword()
    # URL of the listing
    url = Text()
    # Date
    date = Date()
    # Job description
    job_description = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    # The hiring company
    company = Object(Company)
    # Work location
    work_address = Object(Address)

    # Index name
    class Index:
        name = 'jobs'

    def save(self, **kwargs):
        # Hook for adding logic before saving; currently just delegates to Document.save
        return super(Job, self).save(**kwargs)
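There is not much to say about these three classes: Address and Company are objects embedded in Job and both inherit from InnerDoc, while Job is the document itself and inherits from Document. Next, we can insert one record as a test.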
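One note: once the classes above are defined, the field types, parameters, and analyzers are all declared in them. Calling Job.init() makes elasticsearch_dsl create the index and its mapping from those declarations. If the first document is saved without that call, Elasticsearch still creates the index, but with a dynamically inferred mapping, so settings such as the ik_smart analyzer are not applied.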
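To make the field declarations concrete, here is a rough sketch, in plain Python dicts, of the mapping that the three classes describe. The helper names text, keyword, and address are made up for illustration, and the exact JSON that elasticsearch_dsl sends may differ in detail.

```python
import json


def keyword():
    """A plain `keyword` field (exact-match, not analyzed)."""
    return {"type": "keyword"}


def text(analyzer=None, with_keyword=False):
    """A `text` field, optionally with an analyzer and a `keyword` sub-field."""
    field = {"type": "text"}
    if analyzer:
        field["analyzer"] = analyzer
    if with_keyword:
        field["fields"] = {"keyword": keyword()}
    return field


def address():
    """Mapping for the embedded Address object."""
    return {
        "type": "object",
        "properties": {
            "city": text(with_keyword=True),
            "area": text(),
            "detail": text("ik_smart"),
        },
    }


company = {
    "type": "object",
    "properties": {
        "company_name": text("ik_smart", with_keyword=True),
        "number_person": text("ik_smart", with_keyword=True),
        "company_property": text("ik_smart", with_keyword=True),
        "company_tag": keyword(),
        "company_address": address(),
    },
}

job_mapping = {
    "properties": {
        "salary": text(with_keyword=True),
        "job_name": text("ik_smart", with_keyword=True),
        "job_type": text("ik_smart", with_keyword=True),
        "work_experience": text("ik_smart", with_keyword=True),
        "education": text(with_keyword=True),
        "welfare": keyword(),
        "url": text(),
        "date": {"type": "date"},
        "job_description": text("ik_smart", with_keyword=True),
        "company": company,
        "work_address": address(),
    }
}

print(json.dumps(job_mapping, indent=2))
```

Each Text field with a keyword sub-field can be searched as analyzed text under its own name and used for exact matching or aggregations under name.keyword, for example job_name.keyword.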
if __name__ == '__main__':
    connections.create_connection(hosts=['localhost'])
    # Create the index and the mapping declared on Job, if needed
    Job.init()
    job = Job(meta={'id': "job1"})
    job.url = "www.baidu.com"
    job.job_name = "测试"
    job.job_description = ["会python", "会Elasticsearch"]
    job.work_experience = "25岁,30年工作经验"
    job.welfare = ["五险一金"]
    job.education = "不限"
    job.job_type = "ES开发"
    job.salary = "1000k"
    # Save to Elasticsearch
    job.save()
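Here is the query result. This was only a simple test insert, so no company or address information was added; the next article, on data cleaning, fills in the complete information.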
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "job52",
        "_type" : "_doc",
        "_id" : "jobb1",
        "_score" : 1.0,
        "_source" : {
          "salary" : "1000k",
          "job_name" : "测试",
          "job_type" : "ES开发",
          "work_experience" : "25岁,30年工作经验",
          "education" : "不限",
          "welfare" : [
            "五险一金"
          ],
          "url" : "www.baidu.com",
          "job_description" : [
            "会python",
            "会Elasticsearch"
          ]
        }
      }
    ]
  }
}
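In a response like this one, the stored documents sit under hits.hits, each in its own _source. As a minimal sketch, the snippet below pulls them out of a response that has already been parsed into a Python dict; sample_response is a trimmed, hand-written copy of the result above, and extract_sources is a name made up for illustration.

```python
def extract_sources(response):
    """Return the `_source` dict of every hit in a parsed search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed copy of the search result, keeping only a few fields for brevity.
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_score": 1.0,
                "_source": {
                    "salary": "1000k",
                    "job_name": "测试",
                    "welfare": ["五险一金"],
                },
            }
        ],
    }
}

sources = extract_sources(sample_response)
for source in sources:
    print(source["job_name"], source["salary"], source["welfare"])
```

The same loop would work on the full response, where each source also carries job_type, work_experience, education, url, and job_description.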