I'm_Jenson - CSDN Blog

Original: Python - Built-in Functions, String Methods, and Reserved Words

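List the names in Python's builtins module: import builtins for b in dir(builtins): print(b) abs(x) returns the absolute value of a number. all(iterable) returns True only if every element of the iterable is truthy; if any element is falsy (0, empty, None, False) it returns False, and it returns True for an empty iterable. any(iterable) returns True if at least one element is truthy; it returns False if every element is falsy, and also for an empty iterable. ascii(int/str) takes the input argument (for example a string or character argument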
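The behaviour described in this preview can be checked with a short script. This is a minimal sketch using only the standard library; the variable name `builtin_callables` and the sample values are illustrative choices, not taken from the original post.

```python
import builtins

# dir(builtins) lists every name in the module. Filtering on callable()
# keeps functions and classes (including exception types) and drops
# constants such as True, False and None.
builtin_callables = [
    name for name in dir(builtins)
    if callable(getattr(builtins, name)) and not name.startswith("_")
]
print(len(builtin_callables), "callable built-in names")

print(abs(-3.5))            # absolute value of a number -> 3.5
print(all([1, "a", [0]]))   # every element is truthy -> True
print(all([1, 0, "a"]))     # 0 is falsy -> False
print(all([]))              # empty iterable -> True
print(any([0, "", None]))   # every element is falsy -> False
print(any([0, "", 7]))      # one truthy element is enough -> True
print(any([]))              # empty iterable -> False
print(ascii("café"))        # non-ASCII characters are escaped -> 'caf\xe9'
```

Note that `[0]` counts as truthy because it is a non-empty list, even though the element it contains is falsy.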

2020-07-19 15:26:53 348

Original: Linux - nginx Server

Basic nginx information and operations: 1. /var/log/nginx/access.log --> access log 2. /etc/nginx/nginx.conf --> main configuration file 3. /etc/nginx/conf.d --> virtual host configuration files 4. service nginx start/stop/restart/reload --> start/stop/restart/reload 1) Installing Nginx (method 1) 1. yum install yum-utils 2. vim /etc/yum.repos.d/

2020-07-19 15:23:41 116

Original: Linux - Setting Up a yum Repository Server

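This article uses CentOS 7 as an example. yum configuration directory: cd /etc/yum.repos.d List the yum repositories currently in use: yum repolist 1) Create a directory to hold all the yum repositories: mkdir [file name] 2) Sync the yum repositories from the current mirror server: 1. yum -y install yum-utils (installs the reposync sync tool) 2. reposync -r base -p /dir (-r selects the repository to download, -p sets the download destination) 3) Build the repository index: 1. yum -y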

2020-07-19 15:18:35 473

Original: Linux - Disk Management with fdisk

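df -h shows disk usage 1) fdisk -l --> list disks and partitions 2) fdisk /dev/newdisk --> partition a disk: m show command help, d delete a partition, p print the partition table, n create a new partition, q quit without saving, w write changes and exit 3) mkfs.ext4 /dev/sdb5 (the specific partition) --> format the partition 4) mount /dev/sdb5 /mnt --> mount it on the given directory ...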

2020-07-19 15:12:11 118

Original: 51job Crawler Job Data Analysis in Practice

As usual, import the modules first: import pandas as pd import numpy as np import matplotlib.pyplot as plt import pymysql,re import pyecharts.charts as pc from pyecharts import options as opts from pyecharts.globals import ThemeType %matplotlib inline Load the data # Create the MySQL database connection object conn = py

2020-07-19 14:51:37 1614 3

Original: Scrapy Crawler in Practice - Crawling 51job Job Postings

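Approach: first crawl the links to all content pages and store them in the database, then create a second scrapy spider to crawl those links. Modules needed: scrapy, urllib, pymysql. Crawling the content-page links: this uses scrapy's generic crawl spider template. Creation command: scrapy genspider -t crawl [name] [domains] # -*- coding: utf-8 -*- import scrapy from scrapy.linkextractors import LinkExtractor from scrapy.spid..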

2020-07-18 16:05:15 1217

Original: Python - Distributed Crawling of Baidu Tieba

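Environment configuration: Scrapy settings.py middlewares.py tieba.py Selenium Redis MongoDB Linux. Step 1: scrapy startproject name. If a spider project written on Windows is copied to Linux as a whole, scrapy cannot tell which spider settings.py belongs to. Create the scrapy project on Linux, then overwrite its files with the corresponding spider files written on Windows. Step 2: settings.py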

2020-07-16 16:14:33 158 1

Original: Douban Books Data Analysis in Practice

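This analysis covers: rating distribution across all books; top 20 popular books; high-frequency words in book titles; top 20 authors by number of books published; number of books published per year; top 20 most-reviewed authors; most popular category for each publication year; top 20 categories by number of books; top 20 most-reviewed categories. Import the modules: import pandas as pd import numpy as np import pymysql,re import matplotlib.pyplot as plt %matplotlib inline Read the data from the database: conn = pymysql.connect("l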

2020-07-13 16:13:13 3538 2

Original: Python Distributed Crawler in Practice - Douban Books

01. Check whether a string consists only of digits >>> s = "1234567890" >>> s.isdigit() True 02. Check whether a string consists only of letters >>> s.isalpha() False
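The two checks in this preview can be combined into a small helper. The sketch below is illustrative only: the function names `classify` and `is_integer_literal` are hypothetical and do not come from the original post. It also shows that `str.isdigit()` is a character-level test, so it rejects signed numbers that `int()` would accept.

```python
def classify(text: str) -> str:
    """Label a string using the same str methods as the preview above."""
    if text.isdigit():
        return "digits"     # every character is a digit, at least one character
    if text.isalpha():
        return "letters"    # every character is a letter (Unicode-aware)
    return "mixed"          # anything else, including the empty string


def is_integer_literal(text: str) -> bool:
    """Return True if int() can parse the text, e.g. signed values like '-12'."""
    try:
        int(text)
    except ValueError:
        return False
    return True


print(classify("1234567890"))      # digits
print(classify("Jenson"))          # letters
print(classify("abc123"))          # mixed
print(classify("-12"))             # mixed: the sign is not a digit
print(is_integer_literal("-12"))   # True
```

If the goal is to validate integer input rather than inspect characters, parsing with `int()` is usually the more reliable test.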

2019-08-06 15:26:55 934 1
