Week 4
Common Patterns
- t is a Table instance
t.apply(func, column_label)  # column_label: label of the column whose values are passed to func one by one
- xticks: short for x-axis ticks
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Draw a scatter plot
plt.scatter(x, y)
# Set the x-axis ticks
custom_ticks = [1, 2, 3, 4, 5]  # custom tick positions
custom_labels = ['A', 'B', 'C', 'D', 'E']  # custom tick labels
plt.xticks(custom_ticks, custom_labels)
# Show the figure
plt.show()
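The same custom ticks can also be set through matplotlib's object-oriented Axes API, which becomes handy once a figure has several subplots. A minimal sketch: the helper name `scatter_with_custom_ticks` is hypothetical, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def scatter_with_custom_ticks(x, y, positions, labels):
    """Draw a scatter plot whose x-axis ticks carry custom labels.

    Hypothetical helper: object-oriented counterpart of plt.xticks(positions, labels).
    """
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xticks(positions)    # where the ticks are placed
    ax.set_xticklabels(labels)  # the text shown at each tick
    return fig, ax


fig, ax = scatter_with_custom_ticks([1, 2, 3, 4, 5], [2, 4, 6, 8, 10],
                                    [1, 2, 3, 4, 5], ['A', 'B', 'C', 'D', 'E'])
```

Working on `ax` instead of the global `plt` state means each subplot can get its own ticks without affecting the others.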
Knowledge through Quiz
Set up simple quiz questions to prompt thinking and guide understanding.
Area
Area of bar = percent in bin = height × bin width
“How many individuals in the bin?” - Use ____ ?
“How crowded (dense) is the bin?” - Use ____ ?
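Rearranging the formula gives height = percent in bin / bin width, so height is measured in percent per unit of the x-axis. A small numeric check with made-up counts and unequal bin widths (all numbers are illustrative only):

```python
import numpy as np

# Illustrative data: bin edges (unequal widths) and the number of individuals per bin
bin_edges = np.array([0, 10, 20, 50])
counts = np.array([20, 30, 50])

widths = np.diff(bin_edges)              # width of each bin: [10, 10, 30]
percents = counts / counts.sum() * 100   # percent of all individuals in each bin
heights = percents / widths              # percent per unit of x: how crowded each bin is

# Area of each bar = height × width, which recovers the percent in the bin
areas = heights * widths
print(percents, heights, areas)
```

Here the last bin holds the most individuals (50%) yet has the lowest bar (about 1.67 percent per unit), because those individuals are spread over a width of 30.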
Features of different graphs
Line graph: _____ data (over time, etc.)
Scatter plot: relation between two ____ variables
Bar chart: distribution of one _____ variable or relation between a ______ and a ______ variable
Histogram: distribution of one _____ variable
Answers
- Area
  - area
  - height
- Features
  - sequential
  - numerical
  - categorical; categorical, numerical
  - numerical
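The four chart types from the quiz map onto matplotlib calls roughly as sketched below. All variable names and values are made up for illustration; `barh` is used for the bar chart, and `hist(..., density=True)` is used so that the bar areas of the histogram add up to 1 (that is, 100%).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

years = [2019, 2020, 2021, 2022]          # sequential data
sales = [3, 5, 4, 6]
heights_cm = [160, 172, 168, 181, 175]    # numerical variable
weights_kg = [55, 70, 62, 80, 72]         # numerical variable
fruits = ['apple', 'banana', 'cherry']    # categorical variable
fruit_counts = [10, 7, 3]                 # numerical value per category

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].plot(years, sales)                 # line graph: sequential data
axes[0, 0].set_title("Line graph")
axes[0, 1].scatter(heights_cm, weights_kg)    # scatter plot: two numerical variables
axes[0, 1].set_title("Scatter plot")
axes[1, 0].barh(fruits, fruit_counts)         # bar chart: categorical vs numerical
axes[1, 0].set_title("Bar chart")
# Histogram: one numerical variable; with density=True, bar area = fraction in bin
counts, edges, _ = axes[1, 1].hist(heights_cm, bins=[150, 165, 175, 190], density=True)
axes[1, 1].set_title("Histogram")
fig.tight_layout()
```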
Other
Good habits
- Use docstrings to document functions (I've known about this for a long time, but I'm still far from actually doing it)
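A minimal example of the habit, using a hypothetical helper `to_percent`: a one-line summary in triple quotes (the PEP 257 convention), followed by optional detail. The docstring is what `help()` prints and is stored in the function's `__doc__` attribute.

```python
def to_percent(count, total):
    """Return count as a percentage of total.

    For example, 20 out of 80 individuals is 25.0 percent.
    """
    return count / total * 100


print(to_percent.__doc__.splitlines()[0])  # the one-line summary
print(to_percent(20, 80))
```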
Graph of averages in a scatter plot: for roughly linear data it is closely followed by the regression line
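A sketch of the relationship: for each distinct x, average the matching y values (the graph of averages), then compare with the least-squares line from `np.polyfit`. The data are made up and chosen so the two coincide exactly; with real data the graph of averages only roughly follows the regression line, and mainly when the scatter is roughly linear.

```python
import numpy as np

x = np.array([1, 1, 2, 2, 3, 3])
y = np.array([1, 3, 2, 4, 3, 5])

# Graph of averages: one point per distinct x, at the mean of the matching y values
xs = np.unique(x)
avg_y = np.array([y[x == v].mean() for v in xs])

# Least-squares regression line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * xs + intercept

print(xs, avg_y, fitted)
```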
apply
# Check how datascience's apply differs from the "real" apply (pandas) used later on
>>> import pandas as pd
>>> help(pd.DataFrame.apply)
EXAMPLE
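def convert_pay_string_to_number(pay_string):
    """Converts a pay string like '$100' (in millions) to a number of dollars."""
    result = float(pay_string.strip("$")) * 1e6  # drop the '$', then millions -> dollars
    return result
###
# compensation: a Table defined earlier (not shown in these notes); apply returns an array
arr_total_pay = compensation.apply(convert_pay_string_to_number, "Total Pay")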
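For comparison, a sketch of the same conversion in pandas, where the column (a Series) calls apply itself. The DataFrame `compensation_df` is a hypothetical stand-in for the `compensation` table, with made-up values.

```python
import pandas as pd


def convert_pay_string_to_number(pay_string):
    """Converts a pay string like '$100' (in millions) to a number of dollars."""
    return float(pay_string.strip("$")) * 1e6


# Hypothetical stand-in for the compensation table
compensation_df = pd.DataFrame({"Total Pay": ["$100", "$2.5", "$53.25"]})

# pandas: select the column (a Series), then call .apply on it
total_pay = compensation_df["Total Pay"].apply(convert_pay_string_to_number)
print(total_pay.tolist())
```

Unlike the datascience version, which returns an array, `Series.apply` returns a Series that keeps the DataFrame's index.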
Afterword
If you are just starting out, note that pd.DataFrame is essentially the pandas counterpart of Table!
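An ndarray supports many operations directly, element by element: abs(arr) returns an ndarray of absolute values, and new_arr = arr > 10 returns an ndarray of booleans. In both cases a function is already being applied to every element, with no apply needed.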
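A short sketch of these elementwise operations on a made-up array, with an explicit loop alongside to show that the same function is being applied to every element:

```python
import numpy as np

arr = np.array([-12, 5, -3, 20])

abs_arr = abs(arr)   # elementwise absolute value (same result as np.abs(arr))
new_arr = arr > 10   # elementwise comparison -> ndarray of bools

# The loop that abs(arr) saves us from writing
looped = np.array([abs(v) for v in arr])

print(abs_arr, new_arr, looped)
```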
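But with datascience, apply is always called on the table itself: t.apply(func, column_label). In pandas a column is called a Series, and the equivalent call is df[column_label].apply(func); the second positional argument of DataFrame.apply is axis, not a column label.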
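A sketch of why that second argument matters in pandas: `Series.apply` passes each value to the function, while `DataFrame.apply(func, axis=1)` passes each row (as a Series). The column names and values below are made up.

```python
import pandas as pd

df = pd.DataFrame({"base": [1.0, 2.0], "bonus": [0.5, 0.25]})

# Series.apply: func receives one value at a time from the chosen column
doubled_base = df["base"].apply(lambda v: v * 2)

# DataFrame.apply with axis=1: func receives one row (a Series) at a time
row_totals = df.apply(lambda row: row["base"] + row["bonus"], axis=1)

print(doubled_base.tolist(), row_totals.tolist())
```

Writing `df.apply(func, "base")` in the datascience style would pass "base" as the axis, which pandas rejects with an error.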