January 2017_星空永恒&&卡利达

Original: Reading and Writing JSON with pandas

Creating a pandas Series from a JSON string. pandas provides the read_json() function, which can create either a pandas Series or a pandas DataFrame.
In [1]: import pandas as pd
In [2]: json_str = '{"country":"Netherlands"}'
In [3]: data = pd.read_json(json_
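A minimal round-trip sketch, assuming a recent pandas: newer releases (2.1 and later) deprecate passing a literal JSON string to read_json(), so the string is wrapped in io.StringIO here. typ="series" asks for a Series instead of the default DataFrame; the second JSON string with country codes is made up for the example.

```python
from io import StringIO

import pandas as pd

json_str = '{"country":"Netherlands"}'

# typ="series" returns a Series; the JSON keys become the index.
data = pd.read_json(StringIO(json_str), typ="series")
print(data)

# Writing back: Series.to_json() defaults to orient="index", i.e. {index: value}.
out = data.to_json()
print(out)

# A JSON array of records becomes a DataFrame (the default typ="frame").
records = '[{"country":"Netherlands","code":"NL"},{"country":"France","code":"FR"}]'
df = pd.read_json(StringIO(records))
print(df)
```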

2017-01-16 23:09:18 25576

Original: Reading and Writing Excel Files with pandas

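Reading and writing Excel files with pandas. Install the modules:
pip install openpyxl (openpyxl, which derives from PHPExcel, reads and writes .xlsx files)
pip install xlsxwriter (XlsxWriter writes .xlsx files; it cannot read them)
pip install xlrd (xlrd extracts data from .xls files; since xlrd 2.0 it no longer reads .xlsx)
Next, we first generate what will be used to populate pandas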
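A minimal write/read sketch, assuming openpyxl is installed (recent pandas versions use it as the default .xlsx engine). The file name, sheet name and random data are made up for the example, and a temporary directory keeps it self-contained.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small DataFrame of random numbers, seeded so the run is reproducible.
np.random.seed(42)
df = pd.DataFrame(np.random.rand(4, 3), columns=["a", "b", "c"])

path = os.path.join(tempfile.mkdtemp(), "example.xlsx")  # hypothetical file name

# Write: engine="openpyxl" is explicit here; "xlsxwriter" would also work for writing.
df.to_excel(path, sheet_name="data", index=False, engine="openpyxl")

# Read back: openpyxl handles .xlsx (xlrd >= 2.0 only handles legacy .xls).
back = pd.read_excel(path, sheet_name="data", engine="openpyxl")
print(back)
```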

2017-01-16 18:39:00 51548

Original: NumPy .npy and pandas DataFrame

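Saving files in CSV format is a good idea, because most programming languages and applications can handle it, which makes exchanging data easy. However, the format is not very storage-efficient: CSV and other plain-text formats store numbers as characters and contain many delimiters and whitespace characters. Compression formats invented later, such as zip, bzip and gzip, achieve much better compression ratios. First, import the modules:
In [1]: import numpy as np
In [2]: import pandas as pd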
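A small sketch of the comparison this post sets up, assuming random float data stands in for a real dataset (the 365×4 shape and the file names are arbitrary). It writes the same array once as CSV through pandas and once as binary .npy through np.save(), then loads the .npy back. The exact byte counts depend on the data, so treat the printed sizes as illustrative.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Random data standing in for a real dataset.
np.random.seed(42)
a = np.random.randn(365, 4)

tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "data.csv")
npy_path = os.path.join(tmpdir, "data.npy")

# Text format: every float is written out as characters plus delimiters.
pd.DataFrame(a).to_csv(csv_path)
# Binary .npy format: a small header followed by the raw 8-byte float64 values.
np.save(npy_path, a)

csv_size = os.path.getsize(csv_path)
npy_size = os.path.getsize(npy_path)
print("CSV size:", csv_size)
print("NPY size:", npy_size)

# Loading the .npy file restores the array unchanged, dtype and shape included.
loaded = np.load(npy_path)
df = pd.DataFrame(loaded)
print(df.head())
```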

2017-01-16 14:39:08 6801

Original: Writing CSV Files with NumPy and pandas

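Saving an array to a delimited file such as CSV: the code below seeds the random number generator, generates a 3×4 NumPy array, and sets one of its elements to NaN:
In [26]: import numpy as np
In [27]: np.random.seed(42)
In [28]: a = np.random.randn(3,4)
In [29]: a[2][2] = np.nan
In [30]: pri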
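One way to continue the example, sketched with the same seeded 3×4 array: np.savetxt() writes the array as delimited text, and DataFrame.to_csv() also writes index and column labels. The column names a to d, the "%.2f" format and the "NA" marker are choices made for this sketch; writing to io.StringIO instead of a file just keeps it self-contained.

```python
import io

import numpy as np
import pandas as pd

np.random.seed(42)
a = np.random.randn(3, 4)
a[2, 2] = np.nan

# NumPy: one row per line; fmt controls the number format, NaN is written as "nan".
buf = io.StringIO()
np.savetxt(buf, a, fmt="%.2f", delimiter=",", header="a,b,c,d", comments="")
csv_text = buf.getvalue()
print(csv_text)

# pandas: to_csv() adds the index and column labels; na_rep sets the NaN marker.
df = pd.DataFrame(a, columns=["a", "b", "c", "d"])
df_text = df.to_csv(float_format="%.2f", na_rep="NA")
print(df_text)
```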

2017-01-16 00:12:10 15056

The flat property gives back a numpy.flatiter object. This is the only means to get a flatiter object; we do not have access to a flatiter constructor. The flat iterator enables us to loop through an arr
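A short sketch of ndarray.flat on a small 2×2 array (the array is an arbitrary example): the flatiter can be looped over, indexed like a 1-D array, and assigned to, and assignments write through to the original array.

```python
import numpy as np

b = np.arange(4).reshape(2, 2)

f = b.flat
print(type(f))  # <class 'numpy.flatiter'>

# Loop over every element in row-major (C) order, whatever b's shape is.
elements = [int(x) for x in f]

# Index the flat view like a 1-D array.
third = b.flat[2]
picked = b.flat[[1, 3]]

# Assignment through .flat modifies b itself.
b.flat = 7            # set every element to 7
b.flat[[1, 3]] = 1    # then set flat positions 1 and 3 to 1
print(b)
```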

2017-01-06 18:23:56 497

Original: NumPy Study Notes 4

Splitting NumPy arrays:
Horizontal splitting: The following code splits a 3×3 array along its horizontal axis into three parts of the same size and shape:
In [26]: np.hsplit(a,3)
Out[26]: [array([[0
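Assuming a is np.arange(9).reshape(3, 3) (consistent with the Out[26] line starting at 0), here is a sketch of the related splitting functions: np.split() with an explicit axis does the same job as hsplit() and vsplit(), and dsplit() needs an array with at least three dimensions.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Horizontal split: cut along axis 1 into 3 equal column blocks.
h = np.hsplit(a, 3)
# Equivalent call with an explicit axis.
h2 = np.split(a, 3, axis=1)

# Vertical split: cut along axis 0 into 3 equal row blocks.
v = np.vsplit(a, 3)

# Depth split: cut along axis 2, so the input needs 3 dimensions.
c = np.arange(27).reshape(3, 3, 3)
d = np.dsplit(c, 3)

print(h)
print(v)
print(d[0].shape)
```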

2017-01-06 17:08:26 307

Original: Configuring Scrapy in a virtualenv

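For Debian and Ubuntu, the following command will ensure that the required dependencies are installed:
1. Do this step first. It runs in the global environment (not inside the virtualenv) and installs the system packages that cryptography depends on:
$ sudo apt-get install build-essential libssl-d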

2017-01-06 16:37:54 1358

Original: NumPy Study Notes 3

Arrays can be stacked horizontally, depth-wise, or vertically. For this we can use the vstack(), dstack(), hstack(), column_stack(), row_stack(), and concatenate() functions. To start with, let's set up som
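A short sketch of these functions on two arbitrary 3×3 arrays (a and 2*a are choices made for the example). row_stack() is left out because it is an alias of vstack() and was deprecated in NumPy 2.0.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = 2 * a

h = np.hstack((a, b))   # side by side: shape (3, 6)
v = np.vstack((a, b))   # one on top of the other: shape (6, 3)
d = np.dstack((a, b))   # paired along a new third axis: shape (3, 3, 2)
print(h.shape, v.shape, d.shape)

# concatenate() does the same jobs when given an explicit axis.
print(np.array_equal(h, np.concatenate((a, b), axis=1)))
print(np.array_equal(v, np.concatenate((a, b), axis=0)))

# column_stack() turns 1-D arrays into columns; for 2-D input it acts like hstack().
x = np.arange(3)
cols = np.column_stack((x, 2 * x))  # shape (3, 2)
print(cols)
```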

2017-01-06 16:29:55 362

Original: NumPy Study Notes 2

In [1]: import numpy as np
In [2]: a = np.arange(9)
In [3]: a[3:7]
Out[3]: array([3, 4, 5, 6])
In [4]: a[:7:2]
Out[4]: array([0, 2, 4, 6])
In [5]: a[::-1]
Out[5]: array([8, 7, 6, 5, 4, 3, 2, 1, 0]
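A few more slicing cases in the same spirit, sketched on the same np.arange(9) plus an arbitrary 2×3×4 array: a negative step with explicit bounds, one slice per axis on a multi-dimensional array, and the fact that basic slices are views rather than copies.

```python
import numpy as np

a = np.arange(9)

# Negative step with explicit start and stop.
print(a[7:2:-2])  # [7 5 3]

# Multi-dimensional slicing: one index or slice per axis.
b = np.arange(24).reshape(2, 3, 4)
first_col = b[0, :, 0]    # block 0, every row, column 0
second_elems = b[..., 1]  # "..." stands for all leading axes

# Basic slices are views: writing to the slice changes the original array.
s = a[3:7]
s[0] = 100
print(a)
```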

2017-01-06 12:05:58 348

Original: NumPy Study Notes 1

Converting a complex number to an integer is not allowed. Converting a complex number to a floating-point number is not allowed. You can convert a floating-point number to a complex number, for example, com
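A quick check of these rules, assuming they refer to Python's built-in int() and float(), which is where the TypeError comes from. NumPy's array astype() is more lenient: casting a complex array to float or int discards the imaginary part with a ComplexWarning instead of raising. The helper conversion_fails() is a name made up for this sketch.

```python
import numpy as np


def conversion_fails(func, value):
    """Return True if func(value) raises TypeError (hypothetical helper)."""
    try:
        func(value)
    except TypeError:
        return True
    return False


z = 42.0 + 1.0j

print(conversion_fails(int, z))    # True: complex -> int is refused
print(conversion_fails(float, z))  # True: complex -> float is refused

# The other direction works: the float becomes a complex with zero imaginary part.
w = complex(1.0)
print(w)  # (1+0j)

# NumPy's scalar types also act as conversion functions.
f = np.float64(42)   # 42.0
i = np.int8(42.0)    # 42
t = np.bool_(42)     # True
```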

2017-01-05 21:11:05 340

Original: Using Python virtualenv

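For me, the most useful thing about virtualenv right now is that it creates isolated virtual environments, which resolves conflicts between Python versions or package versions, and the learning curve is small. Install virtualenv:
pip3 install virtualenv
(if pip3 is not installed yet, run sudo apt-get install python3-pip)
(sudo apt-get install python-pip installs the Python 2.7 version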
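Once an environment is created and activated (for example virtualenv ENV followed by source ENV/bin/activate, where ENV is any directory name), a quick way to confirm that Python is really running inside it is to compare sys.prefix with sys.base_prefix. A minimal sketch: in_virtualenv() is a hypothetical helper, and the real_prefix check only matters for virtualenv versions older than 20.

```python
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs in a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment and keep
    # the original installation in sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    # Legacy virtualenv (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base


print("inside a virtual environment:", in_virtualenv())
print("sys.prefix:", sys.prefix)
```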

2017-01-04 11:17:11 452

Original: Getting Started with Git

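Version control systems (VCS) for graphic and web designers: If you are a graphic or web designer and want to keep every revision of an image or page layout (which you would very likely want), using a version control system (VCS) is a wise choice. With it you can roll a file back to a previous state, or even revert the entire project to its state at some point in the past. You can compare changes in detail and find out who last modified which part, and so track down the cause of a strange problem, as well as who reported a certain feature, and when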

2017-01-03 19:39:48 384

Original: A Close Reading of Python's random Functions

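A passage from a book on Python data collection: "Most random-number algorithms strive to produce a sequence of numbers that is uniformly distributed and hard to predict, but they all need a random 'seed' when the algorithm is initialized. The exact same seed produces the same 'random' sequence every time, so I use the system time as the starting point for generating the random sequence."
random.seed(a=None, version=2) Initialize the random number ge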
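A small sketch of the seed behaviour described above, using only the standard-library random module (seed 42 is arbitrary). Calling random.seed() with no argument (a=None) uses operating-system randomness when available and falls back to the current time otherwise.

```python
import random
import time

# Same seed -> same "random" sequence, every time.
random.seed(42)
first = [random.random() for _ in range(3)]
random.seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True

# Seeding from the system clock, as the quoted passage describes.
random.seed(time.time())

# a=None (the default): OS randomness if available, otherwise the current time.
random.seed()

# An independent generator with its own state, useful for reproducible runs.
rng = random.Random(42)
print(rng.random() == first[0])  # True
```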

2017-01-03 16:44:17 383

Original: Scheduled Scrapy Tasks

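1. sudo crontab -e
2. I chose vim as the editor.
3. Append this line at the end (*/1 in the minute field runs the job every minute): */1 * * * * sh /home/maoxianxin/scrape/scrapeIP/cron.sh
4. Create a cron.sh file under /home/maoxianxin/scrape/scrapeIP/ with the following contents:
#! /bin/sh
export PATH=$PATH:/u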

2017-01-02 18:26:36 2247 1

Original: Scraping Xici (西刺) Proxy IPs + Checking Whether They Work + Storing Them in MongoDB

The spider file's code:
import scrapy
import requests  # used to test whether the scraped IPs are usable

class XiciSpider(scrapy.Spider):
    name = "xici"
    allowed_domains = ["xicidaili.com",]

    def start_requests(self):
        urls = ["http://www.xic
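The spider imports requests to test the scraped IPs. As a stand-in, here is a minimal availability check that swaps in the standard library's urllib.request (with ProxyHandler) so it runs without extra packages. is_proxy_alive(), proxy_url() and the default test URL are hypothetical choices, not part of the original spider; a proxy counts as usable if a request through it returns HTTP 200 within the timeout.

```python
import http.client
import urllib.error
import urllib.request


def proxy_url(ip, port, scheme="http"):
    """Build a proxy URL such as 'http://1.2.3.4:8080' (hypothetical helper)."""
    return f"{scheme}://{ip}:{port}"


def is_proxy_alive(ip, port, test_url="http://httpbin.org/ip", timeout=5.0):
    """Return True if test_url answers with HTTP 200 through the given proxy.

    test_url is a placeholder; any stable page you are allowed to fetch will do.
    """
    proxy = proxy_url(ip, port)
    # Route both http and https traffic through the proxy under test.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException, ValueError):
        # Refused connections, timeouts and HTTP errors all mean "not usable".
        return False
```

In the spider, each scraped ip/port pair would go through a check like this before being saved to MongoDB.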

2017-01-02 18:06:01 2085

原创 downloader middleware 研读(1)

It affects requests and responses; proxy IPs and the like are handled at this layer. The downloader middleware is a framework of hooks into Scrapy’s request/response processing. It’s a light, low-level system for globally altering Scrapy’s requ
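A minimal sketch of the proxy case mentioned above, assuming the proxies have already been collected (the list here is a placeholder). process_request(request, spider) is the downloader-middleware hook; Scrapy's built-in HttpProxyMiddleware reads request.meta["proxy"], and returning None lets the request continue through the chain. RandomProxyMiddleware is a hypothetical name, and the class needs no scrapy import because Scrapy only calls its methods.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware that assigns a random proxy to each request."""

    # Placeholder proxies; in practice these would come from wherever the
    # checked IPs are stored (for example a MongoDB collection).
    DEFAULT_PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]

    def __init__(self, proxies=None):
        self.proxies = list(proxies or self.DEFAULT_PROXIES)

    def process_request(self, request, spider):
        # HttpProxyMiddleware picks up the proxy from request.meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        # None means: keep processing this request normally.
        return None
```

To enable it, settings.py would contain something like DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomProxyMiddleware": 543}, where myproject is a placeholder module path. An order value below 750 (HttpProxyMiddleware's default position) makes process_request run before the built-in proxy middleware reads the meta key.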

2017-01-02 16:48:26 836

星空永恒