English original: 08 - Lesson
How to pull data from a Microsoft SQL database.
import pandas as pd
import sys
from sqlalchemy import create_engine, MetaData, Table
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
Python version 3.6.1 | packaged by conda-forge | (default, Mar 23 2017, 21:57:00)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]
Pandas version 0.19.2

# Version 1

In this section we use the ***sqlalchemy*** library to pull data from a SQL database. Be sure to substitute your own ***ServerName***, ***Database***, and ***TableName***.
TableName = "data"
DB = {
'drivername': 'mssql+pyodbc',
'servername': 'DAVID-THINK',
'database': 'BizIntel',
'driver': 'SQL Server Native Client 11.0',
'trusted_connection': 'yes',
'legacy_schema_aliasing': False
}
# Build the engine URL; query parameters are separated by '&'
engine = create_engine(
    DB['drivername'] + '://' + DB['servername'] + '/' + DB['database']
    + '?driver=' + DB['driver']
    + '&trusted_connection=' + DB['trusted_connection'],
    legacy_schema_aliasing=DB['legacy_schema_aliasing']
)
conn = engine.connect()
# Reflect the table definition from the database
metadata = MetaData(conn)
tbl = Table(TableName, metadata, autoload=True, schema="dbo")
# Select every row and load the result into a dataframe
sql = tbl.select()
result = conn.execute(sql)
df = pd.DataFrame(data=list(result), columns=result.keys())
# Close the connection
conn.close()
print('Done')
Done

Take a look at the contents of the dataframe.
df.head()
| | Date | Symbol | Volume |
|---|---|---|---|
| 0 | 2013-01-01 | A | 0.00 |
| 1 | 2013-01-02 | A | 200.00 |
| 2 | 2013-01-03 | A | 1200.00 |
| 3 | 2013-01-04 | A | 1001.00 |
| 4 | 2013-01-05 | A | 1300.00 |
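The engine URL above is assembled by plain string concatenation, which leaves the spaces in the driver name unescaped. As a hedged alternative, the sketch below builds an ODBC connection string and URL-encodes it for SQLAlchemy's `odbc_connect` pass-through. The helper name `build_mssql_url` and its parameters are hypothetical, introduced only for illustration; the server, database, and driver values are the ones used in this lesson.

```python
from urllib.parse import quote_plus, unquote_plus


def build_mssql_url(server, database,
                    driver="SQL Server Native Client 11.0", trusted=True):
    """Hypothetical helper: return a mssql+pyodbc URL with an encoded ODBC string."""
    # Plain ODBC connection string, the same shape pyodbc.connect() accepts
    odbc = "DRIVER={{{}}};SERVER={};DATABASE={}".format(driver, server, database)
    if trusted:
        odbc += ";Trusted_Connection=yes"
    # quote_plus escapes spaces, braces, semicolons and equals signs
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)


url = build_mssql_url("DAVID-THINK", "BizIntel")
print(url)

# Decoding the query value gives back the original ODBC string
print(unquote_plus(url.split("odbc_connect=", 1)[1]))
```

The resulting string could then be passed to `create_engine(url)`; that step is left out here because it needs a reachable SQL Server instance.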
df.dtypes
Date      datetime64[ns]
Symbol            object
Volume            object
dtype: object

Convert to specific data types. The code below will need to be modified to match your own table.

# Version 2
import pandas.io.sql
import pyodbc
server = 'DAVID-THINK'
db = 'BizIntel'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
sql = """
SELECT top 5 *
FROM data
"""
df = pandas.io.sql.read_sql(sql, conn)
df.head()
| | Date | Symbol | Volume |
|---|---|---|---|
| 0 | 2013-01-01 | A | 0.0 |
| 1 | 2013-01-02 | A | 200.0 |
| 2 | 2013-01-03 | A | 1200.0 |
| 3 | 2013-01-04 | A | 1001.0 |
| 4 | 2013-01-05 | A | 1300.0 |
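The query above is a fixed string. When a filter value such as the symbol comes from outside the script, passing it through `params` is generally safer than concatenating it into the SQL text. The sketch below is an illustration under stated assumptions: an in-memory SQLite database stands in for SQL Server, the rows are sample values (the `B` row is invented for the example), and `TOP 5` is omitted because SQLite does not support it. pyodbc uses the same `?` placeholder style.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite replaces the SQL Server connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Date TEXT, Symbol TEXT, Volume REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("2013-01-01", "A", 0.0),
        ("2013-01-02", "A", 200.0),
        ("2013-01-01", "B", 50.0),  # hypothetical row, not from the lesson
    ],
)
conn.commit()

# The '?' placeholder is filled in by the driver, not by string formatting
sql = "SELECT * FROM data WHERE Symbol = ? ORDER BY Date"
df_a = pd.read_sql(sql, conn, params=("A",), parse_dates=["Date"])
conn.close()

print(df_a)
```

Only the rows for symbol `A` come back, and `parse_dates` turns the text dates into datetime values.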
# Version 3
from sqlalchemy import create_engine
ServerName = "DAVID-THINK"
Database = "BizIntel"
Driver = "driver=SQL Server Native Client 11.0"
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database + "?" + Driver)
df = pd.read_sql_query("SELECT top 5 * FROM data", engine)
df
| | Date | Symbol | Volume |
|---|---|---|---|
| 0 | 2013-01-01 | A | 0.0 |
| 1 | 2013-01-02 | A | 200.0 |
| 2 | 2013-01-03 | A | 1200.0 |
| 3 | 2013-01-04 | A | 1001.0 |
| 4 | 2013-01-05 | A | 1300.0 |
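Version 1 reports `Volume` as `object` and mentions converting to specific data types without showing how. A minimal sketch follows, assuming the driver returned `decimal.Decimal` values (a common cause of an `object` column, but an assumption here) and using a small hand-built frame in place of the database result:

```python
from decimal import Decimal

import pandas as pd

# Hand-built stand-in for the Version 1 result
# (assumption: Volume arrived as decimal.Decimal objects)
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"]),
    "Symbol": ["A", "A", "A"],
    "Volume": [Decimal("0.00"), Decimal("200.00"), Decimal("1200.00")],
})
print(raw.dtypes)

# Convert the Decimal column to a native float column
converted = raw.copy()
converted["Volume"] = converted["Volume"].astype(float)
print(converted.dtypes)
```

Versions 2 and 3 already show `Volume` as floats, since `read_sql` and `read_sql_query` coerce decimal values to float by default (`coerce_float=True`).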