May 2019_平常心19-3-21

Original: Gradient-colored bar charts with ECharts

Source code: option = { dataset: { source: [ ['score','amount','product'], [89,5300,'oppo'], [78,4500,'vivo'], [95,7800,'华为'], [96,8000,'iPhone'], [69,3000,'三星'...

2019-05-24 08:33:13 1003

Original: English spam email classification with TensorFlow

data_helpers.py
import numpy as np
import re
import itertools
from collections import Counter
def clean_str(string):
    """ Tokenization/string cleaning for all datasets except for SST. Ori...

2019-05-17 09:04:29 545

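Original: How Spark works

Spark's architecture is a master/worker design.
Master node (Master): accepts data processing requests from clients and assigns the processing tasks to Workers.
Worker node (Worker): executes the data processing tasks.
To check Spark's running status in a browser: http://<master-hostname>:8080
Spark architecture terminology: (1) Driver (2) SparkContext (3) Cluster Manager (4) Wo...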

2019-05-15 11:00:59 177

Original: Scraping job listings from the 51job (前程无忧) recruitment site with Python (requests + bs4 + xlwt)

import requests
from bs4 import BeautifulSoup
import xlwt
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/53...

2019-05-12 21:56:51 1609 2

Original: Scraping 51job (前程无忧) job postings with Python (urllib + regular expressions + xlwt)

import urllib.request
import re
import xlwt
def get_content(page):
    url='https://search.51job.com/list/120200,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,'+str(page)+'.html'...
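The keyword segment in the URL above looks like the search term 大数据 percent-encoded twice, with the page number appended before `.html`. The sketch below rebuilds that part with the standard library. The helper names `encode_keyword` and `build_search_url` are hypothetical, and the URL stops at `.html` because the excerpt is truncated there; the original may append more.

```python
from urllib.parse import quote

# Prefix copied from the URL in the excerpt above.
BASE = "https://search.51job.com/list/120200,000000,0000,00,9,99,"


def encode_keyword(keyword: str) -> str:
    """Percent-encode the keyword twice, so each '%' becomes '%25'."""
    return quote(quote(keyword, safe=""), safe="")


def build_search_url(keyword: str, page: int) -> str:
    """Assemble the listing URL for one result page (hypothetical helper)."""
    return f"{BASE}{encode_keyword(keyword)},2,{page}.html"


print(build_search_url("大数据", 1))
```

Building the keyword segment from the plain search term makes it easier to swap in another keyword than editing the encoded string by hand.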

2019-05-12 21:53:32 424

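Original: pandas Series basics

Creating a Series
import pandas as pd
s_score = pd.Series([80,50,95,96,98],index=["b","c","d","e","f"])
Access
A Series can be accessed by one or more index labels, by positional slices, and by label slices (note: a label slice includes both endpoints).
print(s_score["b"])
print(s_score[1])
print(s_score[:...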
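A minimal sketch of the access patterns described above, using the same data as the excerpt. It uses `.iloc` and `.loc` explicitly rather than plain `s_score[1]`, on the assumption that recent pandas versions warn when an integer key is treated as a position on a label-indexed Series. The variable names are illustrative.

```python
import pandas as pd

s_score = pd.Series([80, 50, 95, 96, 98], index=["b", "c", "d", "e", "f"])

by_label = s_score["b"]              # single label
by_position = s_score.iloc[1]        # single position (0-based)
several = s_score[["b", "d"]]        # list of labels
pos_slice = s_score.iloc[1:3]        # positional slice: end is excluded
label_slice = s_score.loc["c":"e"]   # label slice: both ends are included

print(list(pos_slice.index), list(label_slice.index))
```

The positional slice stops before position 3, while the label slice keeps "e", which is the closed-interval behavior the note refers to.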

2019-05-10 09:14:21 196

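Original: Automated 12306 ticket booking in Python (Selenium + ChromeDriver, source code included)

Today we implement ticket grabbing on 12306, so let's get straight to it. We break it into six steps: Have the browser open the 12306 login page, then log in manually. Once logged in, have the browser jump to the ticket purchase page. Enter the origin, destination, and date manually. After the code verifies that the input is valid, it clicks Search automatically. Find the train we want, then check whether that train has tickets left. If it does, book automatically. If not, repeat the query in a loop. As soon as a ticket appears, carry out the booking and arrive at the post-booking...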
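The query-until-available loop described above can be sketched independently of the browser. This is a minimal sketch, not the article's code: `check_tickets` and `book` are hypothetical callbacks that would wrap the Selenium actions, and the interval and attempt limit are assumptions added so the loop cannot spin forever.

```python
import time
from typing import Callable, Optional


def poll_until_available(
    check_tickets: Callable[[], bool],
    book: Callable[[], None],
    interval_s: float = 1.0,
    max_attempts: Optional[int] = None,
) -> int:
    """Repeat the query until tickets appear, then book once.

    Returns the number of queries made; raises TimeoutError if
    max_attempts is exhausted first.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if check_tickets():
            book()
            return attempts
        time.sleep(interval_s)  # pause between queries
    raise TimeoutError(f"no tickets after {attempts} queries")


# Stub callbacks standing in for the browser actions.
answers = iter([False, False, True])
booked = []
queries = poll_until_available(
    lambda: next(answers), lambda: booked.append("ok"), interval_s=0
)
print(queries, booked)
```

Keeping the loop separate from the browser code lets it be exercised with stubs before pointing it at a real page.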

2019-05-07 20:33:27 2782 1

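Original: Error on from selenium import webdriver

from selenium import webdriver
The selenium package was already installed, yet on import webdriver was underlined in red. At first I thought the problem was the package or the path, or that the selenium I had downloaded did not match my Python version. I checked every possible cause, but nothing helped. It turned out that a file named selenium.py sat in the same directory as my code, and on import PyCharm gives priority to the selenium I created...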
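One way to catch this kind of name clash is to look for a local file or package with the same name as the module being imported, and to ask Python where an import actually resolves. The helpers `find_local_shadow` and `resolved_origin` below are hypothetical names for this sketch; it relies only on the standard library.

```python
import importlib.util
import os
import tempfile


def find_local_shadow(module_name, directory):
    """Return the path of a local module or package in `directory` that
    would shadow `module_name` on import, or None if there is none."""
    candidates = (
        os.path.join(directory, f"{module_name}.py"),
        os.path.join(directory, module_name, "__init__.py"),
    )
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def resolved_origin(module_name):
    """Return the file an import of `module_name` would load, if any."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None


# Demo: a throwaway project folder containing a stray selenium.py.
with tempfile.TemporaryDirectory() as project_dir:
    open(os.path.join(project_dir, "selenium.py"), "w").close()
    shadow = find_local_shadow("selenium", project_dir)
    print(os.path.basename(shadow))
```

If `resolved_origin("selenium")` points into the project folder instead of site-packages, renaming the local file is the usual fix.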

2019-05-06 22:25:13 3103 7

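Original: Scraping 51job (前程无忧) job postings into a MySQL database (Requests + XPath + PyMySQL)

Today we scrape the job title, company name, work location, and salary, and save them to a database.
1. First, connect to the database and create the table
import pymysql
def create_table():
    db = pymysql.connect(host='localhost',db='qianchengwuyou',user='root',password='wgy@666666',charset='utf8') ...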
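The create-table-then-insert flow can be sketched as follows. To keep the example runnable without a MySQL server, it swaps in the standard-library `sqlite3` module for PyMySQL; the table name, column names, and the sample row are assumptions for illustration, since the excerpt is cut off before the schema. With PyMySQL the placeholders would be `%s` rather than `?`.

```python
import sqlite3

# Column names are illustrative; the excerpt names only the four fields.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS jobs (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    job_title TEXT,
    company   TEXT,
    location  TEXT,
    salary    TEXT
)
"""


def create_table(conn):
    """Create the jobs table if it does not exist yet."""
    conn.execute(CREATE_SQL)
    conn.commit()


def save_job(conn, job_title, company, location, salary):
    """Insert one scraped posting using a parameterized query."""
    conn.execute(
        "INSERT INTO jobs (job_title, company, location, salary) "
        "VALUES (?, ?, ?, ?)",
        (job_title, company, location, salary),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
create_table(conn)
save_job(conn, "Data Engineer", "Example Co.", "Tianjin", "10-15k/month")  # made-up row
rows = conn.execute(
    "SELECT job_title, company, location, salary FROM jobs"
).fetchall()
print(rows)
```

Passing the scraped values as parameters, instead of formatting them into the SQL string, avoids breakage when a job title contains quotes.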

2019-05-05 13:33:01 1735 2

Original: Simulating a GitHub login with Scrapy

# -*- coding: utf-8 -*-
import scrapy
import re
class GithubSpider(scrapy.Spider):
    name = 'github'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']
    def pars...

2019-05-05 09:15:45 346

Original: Scraping Suning (苏宁) book information with the Scrapy framework

# -*- coding: utf-8 -*-
import scrapy
from SNBook.items import SnbookItem
import re
class SnBookSpider(scrapy.Spider):
    name = 'sn_book'
    allowed_domains = ['suning.com']
    start_urls = ['ht...

2019-05-05 09:11:47 604

Original: Scraping data from the Sunshine government affairs platform (阳光政务平台) with the Scrapy framework

# -*- coding: utf-8 -*-
import scrapy
from yangguang.items import YangguangItem
from yangguang.settings import MONGO_HOST
class YgSpider(scrapy.Spider):
    name = 'yg'
    allowed_domains = ['sun076...

2019-05-05 09:10:19 544

Original: Scraping Tencent job postings with the Scrapy framework

# -*- coding: utf-8 -*-
import scrapy
from tencent.items import TencentItem
class HrSpider(scrapy.Spider):
    name = 'hr'
    allowed_domains = ['tencent.com']
    start_urls = ['http://hr.tencent.c...

2019-05-05 09:07:42 252

Self-discipline