Python Data Analysis (Working with Databases)

This article covers the following topics:

  • Lightweight access with sqlite3
  • Accessing databases through pandas
  • Installing and setting up SQLAlchemy
  • Populating a database with SQLAlchemy
  • Querying a database with SQLAlchemy
  • Pony ORM
  • Dataset: databases for lazy people
  • PyMongo and MongoDB
  • Storing data in Redis
  • Apache Cassandra

1. Lightweight Access with sqlite3

SQLite is a very popular relational database. Because it is so lightweight, it is embedded in a huge number of applications. The sqlite3 module ships with the standard Python distribution and can be used to work with SQLite databases. A database can be stored in a file or kept in memory; here we keep it in memory.
Code:
import sqlite3

with sqlite3.connect(":memory:") as con:
    c = con.cursor()  # create a cursor
    # create a table; text and real are the string and numeric column types
    c.execute('''CREATE TABLE sensors(date text, city text, code text, sensor_id real, temperature real)''')
    for table in c.execute("SELECT name FROM sqlite_master WHERE type='table'"):
        print("Table", table[0])
    c.execute("INSERT INTO sensors VALUES ('2016-11-05','Utrecht','Red',42,15.14)")
    c.execute("SELECT * FROM sensors")
    print(c.fetchone())  # print the inserted record
    con.execute("DROP TABLE sensors")  # drop the table
    print("# of tables", c.execute("SELECT COUNT(*) FROM sqlite_master WHERE type='table'").fetchone()[0])
c.close()
Run result:
Table sensors
('2016-11-05', 'Utrecht', 'Red', 42.0, 15.14)
# of tables 0
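The database above lives only in memory and disappears when the program exits. Passing a file path instead of ":memory:" persists it on disk. A minimal sketch (the file name sensors_demo.db is arbitrary), also using a parameterized INSERT, which is the safer way to splice values into SQL:

```python
import os
import sqlite3

DB_FILE = "sensors_demo.db"  # arbitrary file name; created on first connect

con = sqlite3.connect(DB_FILE)
con.execute("CREATE TABLE sensors(date text, city text, code text, sensor_id real, temperature real)")
# parameterized INSERT: the ? placeholders handle quoting and prevent SQL injection
con.execute("INSERT INTO sensors VALUES (?, ?, ?, ?, ?)",
            ("2016-11-05", "Utrecht", "Red", 42, 15.14))
con.commit()
con.close()

# the data survives in the file and is visible to a fresh connection
con = sqlite3.connect(DB_FILE)
print(con.execute("SELECT city, temperature FROM sensors").fetchone())  # ('Utrecht', 15.14)
con.close()

os.remove(DB_FILE)  # clean up
```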

2. Accessing Databases Through pandas

First install the statsmodels library:
pip install statsmodels
This library includes the sunspot cycle dataset.
Code:
import sqlite3

import statsmodels.api as sm
from pandas import read_sql

with sqlite3.connect(":memory:") as con:
    c = con.cursor()
    data_loader = sm.datasets.sunspots.load_pandas()  # load the sunspots dataset
    df = data_loader.data
    rows = [tuple(x) for x in df.values]

    con.execute("CREATE TABLE sunspots(year, sunactivity)")
    con.executemany("INSERT INTO sunspots(year, sunactivity) VALUES (?, ?)", rows)  # run the INSERT once per row
    c.execute("SELECT COUNT(*) FROM sunspots")  # count the rows in the table
    print(c.fetchone())
    # rowcount is the number of affected rows
    print("Deleted", con.execute("DELETE FROM sunspots WHERE sunactivity > 20").rowcount, "rows")
    # given a database connection, pandas' read_sql runs a query and returns a DataFrame
    print(read_sql("SELECT * FROM sunspots WHERE year < 1732", con))
    con.execute("DROP TABLE sunspots")
c.close()
Run result:
(309,)
Deleted 217 rows
      year  sunactivity
0   1700.0          5.0
1   1701.0         11.0
2   1702.0         16.0
3   1707.0         20.0
4   1708.0         10.0
5   1709.0          8.0
6   1710.0          3.0
7   1711.0          0.0
8   1712.0          0.0
9   1713.0          2.0
10  1714.0         11.0
11  1723.0         11.0
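The manual CREATE TABLE / executemany steps can also be delegated to pandas itself: DataFrame.to_sql writes a frame straight into a table, and read_sql brings it back. A minimal sketch with a small hand-made frame (the sunspots DataFrame would work the same way):

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"year": [1700.0, 1701.0, 1702.0],
                   "sunactivity": [5.0, 11.0, 16.0]})

with sqlite3.connect(":memory:") as con:
    # write the DataFrame to a new table; index=False skips the row-index column
    df.to_sql("sunspots", con, index=False)
    out = pd.read_sql("SELECT * FROM sunspots WHERE sunactivity > 10", con)
    print(out)
```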

3. Installing and Setting Up SQLAlchemy

SQLAlchemy is known for its object-relational mapping (ORM), which is based on a design pattern that maps Python classes to database tables. In practice this adds an extra abstraction layer: we talk to the database through the SQLAlchemy API instead of issuing SQL commands. The benefit is that SQLAlchemy takes care of many details behind the scenes; the trade-off is that we have to learn its API, and performance can suffer.
Install SQLAlchemy with the following command:
pip install SQLAlchemy
We will use a small database with two tables: the first describes weather stations, and the second describes the sensors inside those stations. The code goes in the file alchemy_entities.py. You do not need to run this file directly, because it is imported by other scripts. Code:
from sqlalchemy import Column, ForeignKey, Integer, Float, String
from sqlalchemy.orm import declarative_base, relationship

# declarative base class
Base = declarative_base()

# weather station table
class Station(Base):
    __tablename__ = 'station'  # table name
    id = Column(Integer, primary_key=True)
    name = Column(String(14), nullable=False, unique=True)  # station name

    def __repr__(self):
        return "Id=%d name=%s" % (self.id, self.name)

# sensor table
class Sensor(Base):
    __tablename__ = 'sensor'  # table name
    id = Column(Integer, primary_key=True)
    last = Column(Integer)
    multiplier = Column(Float)
    station_id = Column(Integer, ForeignKey('station.id'))  # foreign key
    station = relationship(Station)

    def __repr__(self):
        return "Id=%d last=%d multiplier=%.1f station_id=%d" % (self.id, self.last, self.multiplier, self.station_id)

if __name__ == "__main__":
    print("This script is used by another script. Run python alchemy_query.py")
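To see exactly what DDL such a mapped class turns into, SQLAlchemy can compile a table definition to its CREATE TABLE statement. A small stand-alone sketch (it defines its own copy of the Station model rather than importing alchemy_entities.py):

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateTable

Base = declarative_base()

class Station(Base):
    __tablename__ = 'station'
    id = Column(Integer, primary_key=True)
    name = Column(String(14), nullable=False, unique=True)

# compile the mapped table to the CREATE TABLE statement the ORM would emit
ddl = str(CreateTable(Station.__table__))
print(ddl)
```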

4. Populating a Database with SQLAlchemy

Creating the tables is covered in the next section; here we prepare a script that populates the database. Don't run this script directly either, because it is imported by the script in the next section. Rows are inserted through a DBSession object, which in turn needs an engine; creating the engine is also covered in the next section. File: populate_db.py
Code:
from sqlalchemy.orm import sessionmaker
from alchemy_entities import Base, Sensor, Station

def populate(engine):
    # create a session bound to the engine
    DBSession = sessionmaker(bind=engine)
    session = DBSession()

    # create two stations
    de_bilt = Station(name='De Bilt')
    session.add(de_bilt)
    session.add(Station(name='Utrecht'))
    session.commit()
    print('Station', de_bilt)

    # add a sensor record
    temp_sensor = Sensor(last=20, multiplier=.1, station=de_bilt)
    session.add(temp_sensor)
    session.commit()
    print("Sensor", temp_sensor)

if __name__ == "__main__":
    print("This script is used by another script. Run python alchemy_query.py")

5. Querying a Database with SQLAlchemy

Put the following code in the file alchemy_query.py:

import os

from pandas import read_sql
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from alchemy_entities import Base, Station, Sensor
from populate_db import populate

# create the engine
engine = create_engine('sqlite:///demo.db')
# drop the tables if they exist
Base.metadata.drop_all(engine)
# create the tables
Base.metadata.create_all(engine)
populate(engine)
DBSession = sessionmaker(bind=engine)
session = DBSession()

# query the first row of the station table
station = session.query(Station).first()

# query all stations
print('all station', session.query(Station).all())
# query all sensors
print('all sensor', session.query(Sensor).all())
# query the first station's sensor
print('query sensor by station', session.query(Sensor).filter(Sensor.station == station).one())
# query with pandas' read_sql
print('read_sql all station', read_sql("SELECT * FROM station", engine))

# remove the database file; this may fail if the file is still open
try:
    os.remove('demo.db')
    print('Delete demo.db')
except OSError as e:
    # e.g. [WinError 32] The process cannot access the file because it is being used by another process
    print(e)
Run result:
Station Id=1 name=De Bilt
Sensor Id=1 last=20 multiplier=0.1 station_id=1
all station [Id=1 name=De Bilt, Id=2 name=Utrecht]
all sensor [Id=1 last=20 multiplier=0.1 station_id=1]
query sensor by station Id=1 last=20 multiplier=0.1 station_id=1
read_sql all station    id     name
0   1  De Bilt
1   2  Utrecht
[WinError 32] The process cannot access the file because it is being used by another process: 'demo.db'
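Beyond first(), all(), and one(), the session supports filtered and aggregated queries. A self-contained sketch (it defines its own tiny, illustrative model instead of importing alchemy_entities.py):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class City(Base):
    __tablename__ = 'city'
    id = Column(Integer, primary_key=True)
    name = Column(String(14))

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add_all([City(name='Utrecht'), City(name='De Bilt')])
session.commit()

# filter_by matches keyword arguments against columns; count() issues SELECT COUNT(*)
utrecht = session.query(City).filter_by(name='Utrecht').one()
n = session.query(City).count()
print(utrecht.name, n)
```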

6. Pony ORM

Code:
import statsmodels.api as sm
from pony.orm import Database, db_session

db = Database('sqlite', ':memory:')  # an SQLite database kept in memory
with db_session:
    data_loader = sm.datasets.sunspots.load_pandas()
    df = data_loader.data
    df.to_sql("sunspots", db.get_connection())  # write the DataFrame to a new table
    print(db.select("count(*) FROM sunspots"))
Run result:
[309]

7. Dataset: Databases for Lazy People

dataset is a Python library that is essentially a wrapper around SQLAlchemy. It was designed to be as easy to use as possible, that is, to keep lazy people happy.
Code:
import dataset
import statsmodels.api as sm
from pandas import read_sql

db = dataset.connect('sqlite:///:memory:')
table = db["books"]  # tables are created on first use
table.insert(dict(title="Numpy Beginner's Guide", author="Ivan Idris"))
table.insert(dict(title="Numpy Cookbook", author="Ivan Idris"))
table.insert(dict(title="Learning Numpy", author="Ivan Idris"))
print(read_sql('SELECT * FROM books', db.executable))

data_loader = sm.datasets.sunspots.load_pandas()
df = data_loader.data
df.to_sql("sunspots", db.executable)  # db.executable is the underlying SQLAlchemy engine
table = db["sunspots"]

for row in table.find(_limit=5):
    print(row)
print("Tables", db.tables)


Run result:

   id      author                   title
0   1  Ivan Idris  Numpy Beginner's Guide
1   2  Ivan Idris          Numpy Cookbook
2   3  Ivan Idris          Learning Numpy
OrderedDict([('index', 0), ('YEAR', 1700.0), ('SUNACTIVITY', 5.0)])
OrderedDict([('index', 1), ('YEAR', 1701.0), ('SUNACTIVITY', 11.0)])
OrderedDict([('index', 2), ('YEAR', 1702.0), ('SUNACTIVITY', 16.0)])
OrderedDict([('index', 3), ('YEAR', 1703.0), ('SUNACTIVITY', 23.0)])
OrderedDict([('index', 4), ('YEAR', 1704.0), ('SUNACTIVITY', 36.0)])
Tables ['books', 'sunspots']

8. PyMongo and MongoDB

MongoDB is a document-oriented NoSQL database that stores documents in BSON, a JSON-like binary format. On Windows you can download the installer from the official website. For example, I installed it under D:\MongoDB and created the folders data, etc, and logs there. Then open a command prompt in D:\MongoDB\Server\3.4\bin and run `mongod --dbpath D:\MongoDB\data` to start the database server. PyMongo is the Python driver for MongoDB. Test code:
import json

import pandas as pd
import statsmodels.api as sm
from pymongo import MongoClient

client = MongoClient()  # connect to the local server
db = client.test_database
data_loader = sm.datasets.sunspots.load_pandas()
df = data_loader.data
# round-trip through JSON to get one dict per row
rows = list(json.loads(df.T.to_json()).values())
db.sunspots.insert_many(rows)
cursor = db['sunspots'].find({})
df = pd.DataFrame(list(cursor))
print(df)
db.drop_collection('sunspots')
Run result:
     SUNACTIVITY    YEAR                       _id
0           57.1  1916.0  5a1e979cb2faff280d618a1f
1          103.9  1917.0  5a1e979cb2faff280d618a20
2            9.6  1914.0  5a1e979cb2faff280d618a21
3           47.4  1915.0  5a1e979cb2faff280d618a22
4            3.6  1912.0  5a1e979cb2faff280d618a23
5            1.4  1913.0  5a1e979cb2faff280d618a24
6           18.6  1910.0  5a1e979cb2faff280d618a25
7            5.7  1911.0  5a1e979cb2faff280d618a26
8           30.5  1865.0  5a1e979cb2faff280d618a27
9           10.2  1964.0  5a1e979cb2faff280d618a28
10          15.1  1965.0  5a1e979cb2faff280d618a29
11          80.6  1918.0  5a1e979cb2faff280d618a2a
12          63.6  1919.0  5a1e979cb2faff280d618a2b
13           8.5  1833.0  5a1e979cb2faff280d618a2c
14          27.5  1832.0  5a1e979cb2faff280d618a2d
15          47.8  1831.0  5a1e979cb2faff280d618a2e
16          70.9  1830.0  5a1e979cb2faff280d618a2f
17         138.3  1837.0  5a1e979cb2faff280d618a30
18         121.5  1836.0  5a1e979cb2faff280d618a31
19          56.9  1835.0  5a1e979cb2faff280d618a32
20          13.2  1834.0  5a1e979cb2faff280d618a33
21          85.7  1839.0  5a1e979cb2faff280d618a34
22         103.2  1838.0  5a1e979cb2faff280d618a35
23          16.3  1866.0  5a1e979cb2faff280d618a36
24          21.0  1724.0  5a1e979cb2faff280d618a37
25          40.0  1725.0  5a1e979cb2faff280d618a38
26          78.0  1726.0  5a1e979cb2faff280d618a39
27         122.0  1727.0  5a1e979cb2faff280d618a3a
28          28.0  1720.0  5a1e979cb2faff280d618a3b
29          26.0  1721.0  5a1e979cb2faff280d618a3c
..           ...     ...                       ...
279         98.5  1847.0  5a1e979cb2faff280d618b36
280         15.0  1844.0  5a1e979cb2faff280d618b37
281         40.1  1845.0  5a1e979cb2faff280d618b38
282         24.2  1842.0  5a1e979cb2faff280d618b39
283         10.7  1843.0  5a1e979cb2faff280d618b3a
284         64.6  1840.0  5a1e979cb2faff280d618b3b
285         36.7  1841.0  5a1e979cb2faff280d618b3c
286         43.9  1909.0  5a1e979cb2faff280d618b3d
287         48.5  1908.0  5a1e979cb2faff280d618b3e
288        124.7  1848.0  5a1e979cb2faff280d618b3f
289         96.3  1849.0  5a1e979cb2faff280d618b40
290         92.5  1777.0  5a1e979cb2faff280d618b41
291         19.8  1776.0  5a1e979cb2faff280d618b42
292          7.0  1775.0  5a1e979cb2faff280d618b43
293         30.6  1774.0  5a1e979cb2faff280d618b44
294         34.8  1773.0  5a1e979cb2faff280d618b45
295         66.5  1772.0  5a1e979cb2faff280d618b46
296         81.6  1771.0  5a1e979cb2faff280d618b47
297        100.8  1770.0  5a1e979cb2faff280d618b48
298         92.5  1978.0  5a1e979cb2faff280d618b49
299        125.9  1779.0  5a1e979cb2faff280d618b4a
300        154.4  1778.0  5a1e979cb2faff280d618b4b
301         27.9  1963.0  5a1e979cb2faff280d618b4c
302        134.7  1949.0  5a1e979cb2faff280d618b4d
303         37.6  1962.0  5a1e979cb2faff280d618b4e
304         53.9  1961.0  5a1e979cb2faff280d618b4f
305         83.9  1950.0  5a1e979cb2faff280d618b50
306        112.3  1960.0  5a1e979cb2faff280d618b51
307         93.8  1967.0  5a1e979cb2faff280d618b52
308         47.0  1966.0  5a1e979cb2faff280d618b53
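The JSON round trip (df.T.to_json() then json.loads) produces one dict per row, but on the Python 2 interpreter used for the output above the intermediate dict did not preserve row order, which is why the _id column is out of year order. DataFrame.to_dict('records') yields the rows in order directly, without the JSON detour, and feeds insert_many just as well. A pandas-only sketch with a small illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"YEAR": [1700.0, 1701.0, 1702.0],
                   "SUNACTIVITY": [5.0, 11.0, 16.0]})

# one plain dict per row, in row order; ready for pymongo's insert_many
rows = df.to_dict('records')
print(rows[0])  # {'YEAR': 1700.0, 'SUNACTIVITY': 5.0}
```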



