- Blog (15 posts)
Original — Continued: filtering pandas rows by the values in a time column
I scraped Zhilian (智联) job listings over several days while job hunting, but found that postings more than a few days old never respond to your resume, so I used pandas to filter by posting time. It is a simple bit of code, but in this line of work you learn a little, note a little, and master a little at a time. Tech companies generally do not allow you to copy your past work out on a USB drive; getting caught has serious consequences, and there are cameras anyway.
import pandas as pd
import numpy as np
f = open("智联大连pyt...
2018-09-14 10:00:04 13653 10
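The post above filters scraped rows by a date column. A minimal sketch of the idea, assuming a text column of posting dates (the column names 职位/发布时间 and the cutoff date are illustrative, not from the original post):

```python
import pandas as pd

# Sample data standing in for the scraped Zhilian listings (names invented).
df = pd.DataFrame({
    "职位": ["A", "B", "C"],
    "发布时间": ["2018-09-14", "2018-09-10", "2018-09-13"],
})

# Parse the text column into datetimes, then keep only recent rows.
df["发布时间"] = pd.to_datetime(df["发布时间"])
cutoff = pd.Timestamp("2018-09-12")
recent = df[df["发布时间"] >= cutoff]
print(recent["职位"].tolist())  # rows posted on or after the cutoff
```

The boolean mask `df["发布时间"] >= cutoff` is the whole trick: comparing a datetime column against a `Timestamp` selects the rows still worth applying to.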
Original: Google Translate
import logging
#from nose.tools import (eq_, with_setup)
from fake_useragent import UserAgent
import requests
#import threading
ua = UserAgent()
#import re
#import requests_cache
#requests_cache.confi...
2018-09-13 08:30:07 350
Original: China News Service (中国新闻网)
# -*- coding: utf-8 -*-
import os
import re
import threading
import time, datetime
from lxml import etree
import pandas as pd
import pyodbc
import requests
from pybloom_live import BloomFilter
from fa...
2018-09-13 08:29:29 294
Reposted: Cleaning data
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 7 14:36:45 2018

@author: 33
"""
import sys
#reload(sys)
#sys.setdefaultencoding('utf-8')
import pandas as pd
import os
import re
import xml.etree.Ele..
2018-09-13 08:28:49 305
Original: IP (proxy) pool
182.99.253.180:61234
39.137.77.67:8080
118.212.137.135:31288
221.182.133.161:375
59.44.16.6:8000
117.127.0.197:8080
39.135.35.19:80
47.89.18.87:80
221.182.133.175:9999
218.65.67.15:61234
121...
2018-09-13 08:28:03 80222 1
Original: 51job
# import urllib.request
import re
import xlwt  # used to create an Excel file and write data into it
import requests
import csv
from lxml import etree
from fake_useragent import UserAgent
COUNT = 3
def parse(COUNT, header, url):
    while...
2018-09-13 08:27:48 466
Reposted: Zhilian (智联)
import re, requests
from lxml import etree
#import pymysql, sys
import csv
import time, random
from fake_useragent import UserAgent
COUNT = 3
def parse(COUNT, header, url):
    while COUNT:
        try:...
2018-09-13 08:27:27 465
Original: Downloading tables
# -*- coding: utf-8 -*-
import csv
import time
import threading
from bs4 import BeautifulSoup as bs
import pandas as pd
import requests
# country dictionary
country_dict = {'AFGHANISTAN TIS': '1', 'ALBANIA': '3', '...
2018-09-12 08:23:33 1539
Original: jieba word segmentation
import jieba
import sys
import importlib
#importlib.reload(sys)
#sys.setdefaultencoding( "utf-8" )
file = 'jiebatest.txt'
fn = open(file, 'r')
print(fn.read())
fn.close()  # the original wrote fn.close without parentheses, which never closes the file
import jieba.posseg as pseg
#im...
2018-09-11 13:24:16 128
Original: Baidu public-opinion monitoring (百度舆情)
import requests
import json
import codecs
import time
import random
import csv
from fake_useragent import UserAgent
import pandas as pd
from selenium import webdriver
from lxml import etree
from bs4 i...
2018-09-11 13:23:05 1211
Original: Amazon books
import os
import re
import threading
import time, datetime
import csv
from lxml import etree
import pandas as pd
import pyodbc
import requests
from pybloom_live import BloomFilter
from fake_useragent i...
2018-09-11 13:22:26 393
Original: Scraping Weibo
import requests
import json
import codecs
import time
import random
import csv
from fake_useragent import UserAgent
import pandas as pd
from selenium import webdriver
from lxml import etree
...
2018-09-11 13:17:35 203
Original: Loading data into SQL from Python
Importing CSVs of country trade data into SQL Server:
conn = pyodbc.connect(r'DRIVER={SQL Server Native Client 10.0};SERVER=192.168.2.188;DATABASE=india1;UID=sa;PWD=123456')
cursor = conn.cursor()
# create the database table
for type1 in ["export"]:
    f...
2018-09-11 13:10:00 273
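The snippet above opens a pyodbc connection to SQL Server and inserts CSV rows in a loop. The same connect/cursor/execute pattern, sketched here with the standard-library sqlite3 module standing in for pyodbc so it runs anywhere (the table name, columns, and sample CSV are invented for illustration):

```python
import csv
import io
import sqlite3

# A tiny in-memory CSV standing in for the trade-data files (contents invented).
csv_text = "country,value\nINDIA,100\nALBANIA,3\n"

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE trade (country TEXT, value INTEGER)")

# Read the CSV and insert row by row, as the original loop does with pyodbc.
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    cursor.execute("INSERT INTO trade VALUES (?, ?)",
                   (row["country"], int(row["value"])))
conn.commit()

count = cursor.execute("SELECT COUNT(*) FROM trade").fetchone()[0]
print(count)  # 2
```

With pyodbc the parameter placeholder is also `?`, so the insert loop carries over almost unchanged; only the `connect()` string differs.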
Original: A CSV-writing function
def save_to_csv(...):  # parameters: the column values you want to store
    row = [...]  # the same column values, in order
    with open(r'path/filename.csv', 'a', newline='', encoding='utf-8') as file:
        f = csv.writer(file)
        f.writerow(row)...
2018-09-11 13:06:28 666
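A complete, runnable version of the helper above, with the column placeholders turned into `*columns` (the demo file path and row values are placeholders; `newline=''` prevents blank lines between rows on Windows):

```python
import csv
import os
import tempfile

def save_to_csv(path, *columns):
    """Append one row of column values to a CSV file."""
    with open(path, "a", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(columns)

# Demo against a temporary file (invented values).
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
save_to_csv(path, "title", "salary", "city")
save_to_csv(path, "engineer", "10k", "Dalian")

with open(path, encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Opening in `'a'` mode means each scraped record appends a row, so the function can be called once per item inside a crawl loop.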
Original: Some pandas operations
1. Deletion: removing the last row of a CSV file
import os
import numpy as np
import pandas as pd
def main():
    df = pd.read_csv('filename', encoding='gb2312')
    df.drop(df.index[[-1]], inplace=True)
    df.to_csv('filenam...
2018-09-11 10:00:52 177
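The drop-last-row trick above, as a self-contained sketch on an in-memory frame (the sample data is invented): `df.index[[-1]]` selects the last index label, and `inplace=True` mutates the frame rather than returning a copy.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.drop(df.index[[-1]], inplace=True)  # drop the row with the last index label
print(df["a"].tolist())  # [1, 2]
```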