20200823【入门笔记】


一、软件安装问题和解决

工具:my sql, sequel pro入门使用的问题和解决方法
课程来源:《秦路七周数据分析》、《必知必会》…
解决方法:百度,感谢前人种树

1. csv无法导入

问题原因:mac和excel兼容问题
解决方法:用number打开csv,另存为utf 8

2. sequel pro无法链接数据库

step 1: 终端

mysql -u root -p

step 2:

load data local infile '位置' into table 数据库名字
fields terminated by ',';

step 3:应该就可以了,但是一直报错,应该是环境未配置好,但是至今没有解决

3. 加载sql

问题原因:版本兼容问题
解决方法:下载测试版squel pro

二、基础查询

1. 排序检索

语句:select, order by, limit

SELECT * FROM `test data 01`.dataanalyst
ORDER BY companyId DESC
LIMIT 10;

2. 过滤数据

语句:where

  • 简单过滤
SELECT * FROM `test data 01`.dataanalyst
WHERE city = "成都";
SELECT * FROM `test data 01`.dataanalyst
WHERE city IN ("成都", "杭州"); #杭州或者成都
SELECT * FROM `test data 01`.dataanalyst
WHERE companyId > 20000; #比较运算:>, <, >=, !=, not in
  • 操作符:and, or
SELECT * FROM `test data 01`.dataanalyst
WHERE city IN ("成都", "杭州")
AND education = "本科"
SELECT * FROM `test data 01`.dataanalyst
WHERE city IN ("成都", "杭州")
OR city = "上海";

(1). and:具有优先级
(2). 示例:在上海的本科生,或者在上海的工作经验1-3年的

SELECT * FROM `test data 01`.dataanalyst
WHERE city IN ("上海")
AND (education = "本科"
	OR workyear="1-3年")
  • 模糊查找
SELECT * FROM `test data 01`.dataanalyst
WHERE secondType LIKE "%后端%"; # "%后端", "后段%"

3. 分组数据

语句:group by
(1). 示例:按城市分组,每个城市的职位数量

SELECT city, COUNT(positionId) AS "position num" FROM `test data 01`.dataanalyst
GROUP BY city

(2). 示例:按城市分组,每个城市招聘公司的数量(去重:distinct

SELECT city, COUNT(distinct companyId) AS "position num" FROM `test data 01`.dataanalyst #count(*):按照默认字段计数
GROUP BY city;

(3). 多条件分组:每个城市不同学历招聘的岗位数

SELECT city, education, COUNT(*) FROM `test data 01`.dataanalyst
GROUP BY city, education;
  • 分组过滤
    语句:having
    (1). 示例:招聘职位数量大于等于100的城市
SELECT city, COUNT(*) FROM `test data 01`.dataanalyst
GROUP BY city
HAVING COUNT(positionId) >= 100

(2). 示例:招聘电子商务岗位的,数量大于等于50的城市
思路:1st. 选出包含电子商务的岗位; 2nd. 按城市进行分组,然后过滤

SELECT city, COUNT(*) FROM `test data 01`.dataanalyst
WHERE industryField LIKE "%电子商务%"
GROUP BY city
HAVING COUNT(positionId) >= 50

(3). 用if解决(示例):招聘电子商务岗位的,数量大于等于50的城市
思路:1st. 用if判断,如果名称包含“电子商务”,就返回1,不然返回null; 2nd. count计数不计null

SELECT city, COUNT(*) FROM `test data 01`.dataanalyst
GROUP BY city
HAVING COUNT(IF(industryfield LIKE "%电子商务%",1,null)) >= 50

(4). 示例:不同城市中,电子商务岗位占该城市所有职位的比重
思路:1st. 先计算城市所有的职位数量; 2nd. 电子商务的职位数量; 3rd. 按照城市分组

SELECT 
	city,
	#每个城市的岗位数
	COUNT(*) AS total,
	#每个城市电子商务的岗位数
	COUNT(IF (industryfield LIKE "%电子商务%", industryfield, NULL)) AS emarket,
    COUNT(IF (industryfield LIKE "%电子商务%", industryfield, NULL))/COUNT(*)  #select设置别名,但不能直接用于计算等
	FROM `test data 01`.dataanalyst
#按城市分组
GROUP BY city
#只要电子商务招聘岗位数大于等于10的城市
HAVING emarket >=10  
ORDER BY emarket

4. 截取数据

语句:left, locate

  • left
left (salary, 1)  #从salary中,从左边开始,截取1个
  • locate
locate ('k', salary, 4)  #在salary中,从第4个位置开始,'k'的位置是第几个

(1). 示例:8k-12k,薪资的下限是多少
思路:salary中,从左开始,截取到第一个k所在的位置

SELECT LEFT(salary,LOCATE('k', salary)-1), salary FROM `test data 01`.dataanalyst; 

(2). 示例:8k-12k,薪资的上限是多少
方法一:1st. salary中,从右开始,截取到 ‘-’ 所在的位置; 2nd. 要去掉取出的上限的’k’

SELECT 
	LEFT(salary,LOCATE("k", salary)-1), 
    locate("-", salary),
    RIGHT(salary,length(salary)-locate("-",salary)), #8k-12k的总长度,减去'-'所在的位置
    LEFT(RIGHT(salary,length(salary)-locate("-",salary)),LOCATE('k',salary)-1), #截取出的上限,再从左取到'k'之前
    salary FROM `test data 01`.dataanalyst;
  • substr
substr (salary, 2,3)  #substr(字符串,从哪个位置开始,截取长度),好处是可以从中间截取

示例:8k-12k,薪资的上限是多少
方法二:

SELECT 
    SUBSTR(salary,LOCATE('-',salary)+1,length(salary)-locate("-",salary)-1) AS top,
    salary FROM `test data 01`.dataanalyst;

三、子查询

注释:select子句可以作为一个临时的库,在其中直接查找

SELECT (bottom+top)/2 FROM(
	SELECT 
		LEFT(salary,LOCATE("k", salary)-1) AS bottom, 
		SUBSTR(salary,LOCATE('-',salary)+1,length(salary)-locate("-",salary)-1) AS top,
		salary 
	FROM `test data 01`.dataanalyst
)AS t  #别名一般用t

1. 条件分组

语句:case when……then……else……end

SELECT 
	CASE
		WHEN (bottom+top)/2 <= 10 THEN '0-10'
		WHEN (bottom+top)/2 <= 20 THEN '10-20'
		WHEN (bottom+top)/2 <= 30 THEN '20-30'
	ELSE '30+'
    END,
    salary 
FROM (
	SELECT 
		LEFT(salary,LOCATE("k", salary)-1) AS bottom, 
		SUBSTR(salary,LOCATE('-',salary)+1,length(salary)-locate("-",salary)-1) AS top,
		salary 
	FROM `test data 01`.dataanalyst
	)AS t

2. 过滤功能

语句:where……in
示例:城市职位数量大于100的城市,职位详情

SELECT * FROM `test data 01`.dataanalyst
	WHERE city IN(
		SELECT city FROM `test data 01`.dataanalyst
		GROUP BY city HAVING COUNT(positionid) >=100
		)  #where和select中的字段要一致,都是city

四、表的连结

语句:join
示例:查询“唯医网”的招聘岗位,其中公司名称在company表中,职位在dataanalyst中,共同字段是companyid
(1). 方法一:用where子查询

SELECT * FROM `test data 01`.dataanalyst
WHERE companyId = (
	SELECT companyId FROM `test data 01`.company
	WHERE companyshortname = "唯医网")  #选出来的信息只包含datananlyst的部分

(2). 方法二:用join……on

SELECT * FROM `test data 01`.dataanalyst AS d
JOIN `test data 01`.company AS c
ON d.companyId = c.companyid
where companyshortname = '唯医网'
  • left join
    语句:left join……on
    作用:左链接,包含join左边的全部,没有的部分用null
    应用:可以进行排除法统计
    在这里插入图片描述

示例:人数在150-500人公司的占比

SELECT COUNT(t.companyid)/COUNT(1)
FROM `test data 01`.company AS d
LEFT JOIN (SELECT * FROM `test data 01`.company
	WHERE companysize= '150-500人') AS t
ON t.companyId = d.companyid

五、时间函数

语句:date
(1)date():年月日
(2)date_format(列名,’%Y-%m’):把列的时间格式改为Y-m

SELECT paidTime, date(paidTime), date_format(paidTime,'%Y-%m') 
FROM `test data 01`.orderInfo

(3)date_sub(列名, interval 1 day/week/month) :减去时间
(4)date_add(列名, interval 1 day):增加时间
(5)datediff:时间差值

示例:不同月份成交的订单笔数

SELECT COUNT(*), date_format(paidTime,'%Y-%m') as monthly
FROM `test data 01`.orderInfo
GROUP BY monthly

六、例题

例题1:统计不同月份的下单人数

SELECT COUNT(distinct userId),  #去重
	date_format(paidTime,'%Y-%m') as monthly  #也可以用month()
FROM `test data 01`.orderInfo
WHERE isPaid = '已支付'  #有未支付或者空值的脏数据
GROUP BY monthly

例题2:统计用户三月份的回购率和复购率

#复购率:在三月份购买多次的人,占三月份购买人数的比率
SELECT COUNT( if ( ct >1, 1, null) )/COUNT(ct)  
FROM (
	SELECT  userId,count(userid) as ct
	FROM `test data 01`.orderInfo
	WHERE month(paidTime)= 3
	GROUP BY userId
) as t
#回购率:三月份购买且后一个月又购买的人,占三月份购买人数的比率
SELECT t1.m, COUNT(t2.m)/COUNT(t1.m) FROM(
	SELECT userId, date_format(paidtime,'%Y-%m-01') as m FROM `test data 01`.orderInfo
	WHERE isPaid='已支付'
	GROUP BY userId, date_format(paidtime,'%Y-%m-01') )as t1
LEFT JOIN (
	SELECT userId, date_format(paidtime,'%Y-%m-01') as m FROM `test data 01`.orderInfo
	WHERE isPaid='已支付'
	GROUP BY userId, date_format(paidtime,'%Y-%m-01') )as t2
ON t1.userId=t2.userId and t1.m = date_sub(t2.m, interval 1 month)
GROUP BY t1.m

例题3:统计男女用户的消费频次是否有差异
思路:先求出不同id的消费次数,再求平均数

SELECT sex, AVG(ct) FROM (	
    SELECT o.userId, sex, COUNT(1) as ct FROM (
		SELECT * FROM `test data 01`.userinfo
		WHERE sex !='') u
	JOIN `test data 01`.orderinfo o
	ON u.userId=o.userId
	GROUP BY userId, sex) t2
GROUP BY sex

例题4:统计多次消费的用户,第一次消费和最后一次消费的间隔是多少

SELECT userId, max(paidtime), min(paidtime), 
	datediff(max(paidtime), min(paidtime)), #天数差值
    max(paidtime)-min(paidtime) #秒数差值
FROM `test data 01`.orderInfo
where ispaid="已支付"
group by userId having count(1) >1

例题5:统计不同年龄段,用户的消费频次是否有差异

select age, avg(ct) from(	
    SELECT u.userid, 
	ceil((date_format(now(),'%Y')-year(birth) )/10 )as age, #/10是指年龄分10段,ceil向上取整,=1就是0-10岁
	#或者 year(now())-year(birth),
	count(u.userid) as ct
	FROM `test data 01`.userinfo u
	JOIN orderinfo o
	ON u.userid = o.userid
	group by userid) t #报错1248,每个派生表都要加一个别名
group by age

例题6:统计消费的二八法则,消费的top 20%用户,贡献了多少额度

SELECT count(userid), sum(total) FROM (
    SELECT userid, sum(price) as total FROM `test data 01`.orderInfo
	where ispaid = '已支付'
	group by userid
	order by total desc
    limit 1 ) t#count(userid)*0.2,前20%的消费者
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值