notebook python 内嵌 数据库_python数据分析:在jupyter notebook上使用python&SQL做数据分析...

python数据分析:在jupyter notebook上使用python&SQL做数据分析

发布时间:2019-01-14 21:14,

浏览次数:1143

, 标签:

python

jupyter

notebook

SQL

类似于在jupyter上使用R语言,同样可以使用SQL语句:

详细见github项目:https://github.com/catherinedevlin/ipython-sql

<>安装ipython-sql

pip install ipython-sql

<>载入

%load_ext sql

<>连接数据库 同 SQLAlchemy

* postgresql://will:[email protected]/shakes

* mysql+pymysql://scott:[email protected]/foo

* oracle://scott:[email protected]:1521/sidname

* sqlite://

* sqlite:///foo.db

*

mssql+pyodbc://username:[email protected]/databasedriver=SQL+Server+Native+Client+11.0

我是使用的是mysql,本地链接,用户名ffzs,密码666666,test数据库:

%sql mysql+pymysql://ffzs:[email protected]/test

<>简单使用

%matplotlib inline import matplotlib.pyplot as plt plt.style.use('bmh')

<>1.显示表

%%sql show tables;

<>2.选取steam_users表的前5行

df = %sql select * from steam_users limit 5 df.DataFrame()

<>3.计算表中包含多少游戏数和玩家数

%%sql select count(distinct Game) gameCount, count(distinct UserID) userCount

from steam_users

<>4.筛选出拥有用户前十的游戏

%%sql data << select Game , count(1) as count from steam_users where Action=

'play' group by Game order by count desc limit 10

data.DataFrame()[::-1].plot.barh("Game","count")

<>5.筛选出被玩总时长前十的游戏

%%sql playHour << select Game,sum(Hours) as playHour from steam_users where

Action="play" group by Game order by playHour desc limit 10

playHour.DataFrame()[::-1].plot.barh('Game', 'playHour')

<>6.筛选出被玩平均时长前十的游戏

%%sql avgHour << select Game, avg(Hours) as avgHour from steam_users where

Action='play' group by Game order by avgHour desc limit 10

avgHour.DataFrame()[::-1].plot.barh('Game','avgHour')

<>7.平均时长前十的游戏的游戏人数

%%sql select Game, avg(Hours) as avgHour, count(1) as count from steam_users

where Action='play' group by Game order by avgHour desc limit 10

联系join on:

%%sql select a.Game, avgHour, count from (select Game, avg(Hours) as avgHour

from steam_users where Action='play' group by Game order by avgHour desc limit

10) a left join (select Game ,count(1) as count from steam_users where Action=

'play' group by Game) b on a.Game=b.Game order by avgHour desc

可见平均时长长的游戏大多是小众游戏

<>8.玩家人数大于500人的游戏的个数(having使用)

%%sql select count(1) as count from (select Game, count(1) as count from

steam_userswhere Action='play' group by Game having count > 500) a

<>9.拥有游戏数量前十用户

%%sql games << select UserID, count(1) count from steam_users where Action=

'play' group by UserID order by count desc limit 10

games.DataFrame()[::-1].plot.barh('UserID','count')

<>10.游戏总时长最多5个用户和最少5个用户(union使用)

%%sql (select UserID, sum(Hours) as allHour from steam_users where Action=

'play' group by UserID order by allHour desc limit 5) union (select UserID, sum(

Hours) as allHour from steam_users where Action='play' group by UserID order by

allHourlimit 5)

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值