Course summary: the basics of Python for statistics

书香门第

Category: Learning Python. Tags: python, pandas

Article link: https://blog.csdn.net/weixin_42325834/article/details/109685955

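Summary (generated automatically by CSDN): This article summarizes a course on the basics of Python for statistics. It covers functions, tuples, lists, dictionaries, string formatting, reading CSV files, dates and times, objects, the map and lambda functions, numpy, and the pandas Series and DataFrame. It emphasizes core topics such as operations on Python data types, file reading, time handling, regular expressions, data manipulation, and data structure conversion.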

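I spent two full days on an introductory course about using Python for statistics, and this is my summary. The basic approach is to list code examples that show the basic features I learned in Python, including numpy and pandas. Where a single function is enough to make a point, I use one function rather than several, because that is easier to follow. I have only just finished the course and feel mentally tired, so this summary will probably come together slowly. I will share it once it is done.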

Functions

def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('flag is true')
    if z is None:
        return x + y
    else:
        return x + y + z

print(add_numbers(1, 2, flag=True))

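Key points:

1) Python function parameters can have default values.

2) When calling a function that has default parameter values, you can name the parameter whose default you want to override.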
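
As a small sketch of point 2, the calls below override a different default each time by naming the parameter. The variable names `two_args`, `with_z`, and `with_flag` are only for illustration and are not from the course.

```python
def add_numbers(x, y, z=None, flag=False):
    # z and flag have defaults, so callers may leave them out
    if flag:
        print('flag is true')
    if z is None:
        return x + y
    return x + y + z

two_args = add_numbers(1, 2)              # both defaults are used
with_z = add_numbers(1, 2, z=3)           # only z is overridden
with_flag = add_numbers(1, 2, flag=True)  # only flag is overridden, z is skipped

print(two_args, with_z, with_flag)
```

Naming the argument is what makes it possible to skip `z` and still set `flag`.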

def add_numbers(x, y):
    return x + y
f = add_numbers
f(1, 2)

Key points:

1) A function can be assigned to a variable.
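
Because a function is an ordinary value, it can also be stored in a dictionary or passed to another function. The sketch below assumes two hypothetical helpers, `multiply_numbers` and `apply`, that are not part of the course material.

```python
def add_numbers(x, y):
    return x + y

def multiply_numbers(x, y):  # hypothetical helper for this sketch
    return x * y

def apply(func, x, y):  # hypothetical helper: calls whatever function it is given
    return func(x, y)

# Functions stored as dictionary values, just like any other object
operations = {'add': add_numbers, 'mul': multiply_numbers}
results = {name: apply(func, 3, 4) for name, func in operations.items()}
print(results)
```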

Tuples, lists, dictionaries

x = (1, 'a', 2, 'b')
type(x)
#tuple

x = [1, 'a', 2, 'b']
x.append(3.3)

[1]*3
# [1, 1, 1]

[1,2] + [3,4]
# [1, 2, 3, 4]

print(x[1])
print(x[1:2])
print(x[-3:-1])

x = {'a':'b', 'c':'d'}
print(x['a'])

for e in x:
    print(x[e])

for v in x.values():
    print(v)

for e,v in x.items():
    print(e)
    print(v)

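Key points:

1) Python tuples, lists, and dictionaries can hold values of different types.

2) Some basic list operations, such as append(), *, and +.

3) How to iterate over a dictionary's keys, values, and items.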
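
The difference between indexing and slicing is easy to miss, so here is a short sketch that collects the results into variables instead of printing them. The variable names are chosen only for this example.

```python
x = [1, 'a', 2, 'b']
x.append(3.3)        # x is now [1, 'a', 2, 'b', 3.3]

second = x[1]        # indexing returns a single element
one_slice = x[1:2]   # slicing returns a list, even with one element
tail = x[-3:-1]      # from the third-from-last element up to, not including, the last

d = {'a': 'b', 'c': 'd'}
keys = list(d)             # iterating a dict yields its keys
values = list(d.values())
pairs = list(d.items())    # (key, value) tuples

print(second, one_slice, tail)
print(keys, values, pairs)
```

Note that a slice uses a colon between the bounds; writing a comma there would pass a tuple as the index, which a list does not accept.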

String formatting

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'
print(sales_statement.format('Chris', 4, 3.24, 4 * 3.24))

Key points:

1) How to print variable values inside a string, as you would in Java or C++.
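
The course example uses `str.format`. As an alternative sketch, assuming Python 3.6 or later, an f-string puts the expressions directly into the string. The `:.2f` format specifier is my addition to fix the number of decimals.

```python
name, quantity, price = 'Chris', 4, 3.24

# Expressions inside {} are evaluated in place; :.2f rounds to two decimals
line = (f'{name} bought {quantity} item(s) at a price of {price:.2f} each '
        f'for a total of {quantity * price:.2f}')
print(line)
```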

Reading CSV files

import csv

# IPython/Jupyter magic, not plain Python: display floats with 2 decimal places
%precision 2

with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))

mpg[0].keys()
# dict_keys(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'name'])

sum(float(d['acceleration']) for d in mpg)

# 6196.10

Key points:

1) How to read each row of a CSV file as a dictionary.

2) How to access the contents of the dictionaries that were read.
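
To try the same technique without the `mpg.csv` file, the sketch below reads a few made-up rows from an in-memory string. The rows and the column values are invented for illustration and are not taken from the course data set.

```python
import csv
import io

# Made-up sample data standing in for a CSV file on disk
sample = io.StringIO(
    "name,cylinders,acceleration\n"
    "car_a,4,15.5\n"
    "car_b,8,12.0\n"
    "car_c,4,16.5\n"
)
rows = list(csv.DictReader(sample))  # one dict per row, keyed by the header line

# DictReader returns every value as a string, so convert before doing arithmetic
mean_acceleration = sum(float(r['acceleration']) for r in rows) / len(rows)

# Group the accelerations by the 'cylinders' column and average each group
by_cylinders = {}
for r in rows:
    by_cylinders.setdefault(r['cylinders'], []).append(float(r['acceleration']))
avg_by_cylinders = {c: sum(v) / len(v) for c, v in by_cylinders.items()}

print(mean_acceleration)
print(avg_by_cylinders)
```

The group keys stay strings (`'4'`, `'8'`) because nothing converted them.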

Dates and times

import datetime as dt
import time as tm

tm.time()
# 1605049825.93

dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
# datetime.datetime(2020, 11, 10, 15, 11, 18, 593011)

dtnow.year, dtnow.month
# (2020, 11)

today = dt.date.today()
today
# datetime.date(2020, 11, 10)

Key points:

1) How to work with timestamps and dates.

2) How to convert between timestamps and dates.
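
As a sketch of the conversion in both directions, the code below turns a fixed timestamp into a `datetime` and back, and then does date arithmetic with `timedelta`. The fixed values are only there to make the example repeatable.

```python
import datetime as dt

stamp = 1605049825.93                      # seconds since the epoch
moment = dt.datetime.fromtimestamp(stamp)  # datetime in the local time zone
round_trip = moment.timestamp()            # back to seconds since the epoch

# Date arithmetic: subtracting a timedelta from a date gives another date
day = dt.date(2020, 11, 10)
earlier = day - dt.timedelta(days=100)

print(moment, round_trip)
print(earlier)
```

`fromtimestamp` gives local time, so the printed `moment` depends on the machine's time zone, while `earlier` does not.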

Objects

class Person:
    department = 'school of information'
    
    def set_name(self, new_name):
        self.name = new_name

Key points:

1) How to define a class, along with its attributes and methods.
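
The class definition on its own does not show how it is used, so here is a minimal usage sketch. The name passed to `set_name` is just an example value.

```python
class Person:
    department = 'school of information'  # class attribute, shared by all instances

    def set_name(self, new_name):
        self.name = new_name  # instance attribute, created when set_name is called

person = Person()
has_name_before = hasattr(person, 'name')  # no __init__, so no name yet
person.set_name('Christopher Brooks')

print(has_name_before, person.name, person.department)
```

Since the class has no `__init__`, the `name` attribute only exists after `set_name` has been called.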

The map function

store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]
cheapest = map(min, store1, store2) # lazy execution

list(cheapest)
# [9.00, 11.00, 12.34, 2.01]

Key points:

1) map is lazily evaluated: nothing is computed until the result is consumed.
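
One consequence of lazy evaluation is that the `map` object is an iterator and can be consumed only once. A short sketch, with `first_pass` and `second_pass` as illustrative names:

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)  # nothing has been computed yet

first_pass = list(cheapest)   # consuming the iterator runs min on each pair
second_pass = list(cheapest)  # the iterator is now exhausted

print(first_pass)
print(second_pass)
```

If the values are needed more than once, keep the list rather than the `map` object.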

lambda functions

my_function = lambda a, b, c : a + b
my_function(1, 2, 3)
# 3

people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    return person.split()[0] + ' ' + person.split()[-1]

#option 1
for person in people:
    print(split_title_and_name(person) == (lambda person:person.split()[0] + ' ' + person.split()[-1])(person))

Key points:

1) How to write a lambda function.
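
A lambda is most useful when passed straight to another function. The sketch below combines it with `map` and compares the result with an equivalent list comprehension; the names `via_map` and `via_comprehension` are only for illustration.

```python
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson',
          'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

# Anonymous function passed directly to map: keep the title and the last name
via_map = list(map(lambda person: person.split()[0] + ' ' + person.split()[-1], people))

# The same transformation written as a list comprehension
via_comprehension = [p.split()[0] + ' ' + p.split()[-1] for p in people]

print(via_map)
```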

numpy

import numpy as np
import math

#array creation
a = np.array([1, 2, 3])
print(a) # [1 2 3]
print(a.ndim) # 1

b = np.array([[1, 2, 3], [4, 5, 6]])
b
# array([[1, 2, 3],
#       [4, 5, 6]])

b.shape
# (2, 3)

b.dtype
# dtype('int32'), platform dependent; int64 on most Linux and macOS builds

d = np.zeros((2, 3))
print(d)
# [[0. 0. 0.]
#  [0. 0. 0.]]

np.random.rand(2,3)
# array([[0.74956529, 0.33373396, 0.19966149],
#       [0.35464706, 0.24963549, 0.31511589]])

f = np.arange(10, 50, 2)
f
# array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
#       44, 46, 48])

np.linspace(0, 2, 15)
# array([0.        , 0.14285714, 0.28571429, 0.42857143, 0.57142857,
#        0.71428571, 0.85714286, 1.        , 1.14285714, 1.28571429,
#        1.42857143, 1.57142857, 1.71428571, 1.85714286, 2.        ])

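Key points:

1) How to create a numpy array from a list.

2) How to show a numpy array's number of dimensions, its shape, and its data type.

3) How to use numpy to create an all-zeros array, a random array, a stepped array (arange), and an evenly spaced array (linspace).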
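
The difference between `np.arange` and `np.linspace` is worth checking directly: `arange` takes a step and excludes the stop value, while `linspace` takes a count and includes it by default. A small sketch, assuming numpy is installed:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
zeros = np.zeros((2, 3))          # float64 by default
stepped = np.arange(10, 50, 2)    # start, stop (excluded), step
spaced = np.linspace(0, 2, 15)    # start, stop (included), number of points

print(b.ndim, b.shape)
print(zeros.dtype)
print(len(stepped), stepped[-1])
print(len(spaced), spaced[-1])
```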

a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])
d = a * b
print(d)
# [ 10  40  90 160]

# boolean array
d > 50
# array([False, False,  True,  True])

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
print(A*B)
# [[2 0]
#  [0 4]]

print(A.sum())
print(A.max())
print(A.min())
print(A.mean())
# 3
# 1
# 0
# 0.75

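Key points:

1) Array multiplication with * (elementwise).

2) Boolean arrays produced by comparing an array with a value.

3) The sum, max, min, and mean of an array.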
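
Since `*` multiplies element by element, it can help to see it next to the matrix product, which numpy writes as `@`. The sketch below also uses the boolean array as a filter and sums along one axis; these extra operations are my additions, assuming numpy is installed.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

elementwise = A * B   # multiplies matching positions
matrix = A @ B        # matrix product (rows times columns)

d = np.array([10, 20, 30, 40]) * np.array([1, 2, 3, 4])
large = d[d > 50]     # a boolean array used as a mask keeps the True positions

column_sums = A.sum(axis=0)  # sum down each column
row_sums = A.sum(axis=1)     # sum across each row

print(elementwise)
print(matrix)
print(large, column_sums, row_sums)
```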

b = np.arange(1, 16, 1).reshape(3, 5)
print(b)

# [[ 1  2  3  4  5]
#  [ 6  7  8  9 10]
#  [11 12 13 14 15]]

from PIL import Image
from IPython.display import display
im = Image.open('momo.JPG')
display