Study Notes on 《Python数据分析基础》 (Foundations for Analytics with Python), Chapter 1

Bryant chen

Last modified: 2023-11-25 21:52:08

First published: 2023-11-21 17:55:06

Article link: https://blog.csdn.net/chenchenojbk/article/details/134537171

Creating a Python script

(1) Open a text editor.
(2) Write the following two lines of code in a text file:

#!/Users/chenbryant/anaconda3/bin/python
print("Output #1: I'm excited to learn Python.")

(3) Save the file with a .py extension (for example, "first_script.py").

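Running the Python script

To run the script from the macOS Terminal:
(1) Open a Terminal window.
(2) Change to the Desktop, where the Python script created above is saved.

cd /Users/[Your Name]/Desktop

(3) Make the Python script executable.

chmod +x first_script.py

(4) Run the Python script.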

./first_script.py

Basic building blocks of Python

Numbers

Integers

x = 9
print("Output #4: {0}".format(x))
print("Output #5: {0}".format(3**4))
print("Output #6: {0}".format(int(8.3)/int(2.7)))

Output:
Output #4: 9
Output #5: 81
Output #6: 4.0
# Floating-point numbers
print("Output #7: {0:.3f}".format(8.3/2.7)) # .3f -> keep three decimal places
y = 2.5*4.8
print("Output #8: {0:.1f}".format(y))
r = 8/float(3)
print("Output #9: {0:.2f}".format(r))
print("Output #10: {0:.4f}".format(8.0/3))

# math 
from math import exp,log,sqrt
print("Output #11: {0:.4f}".format(exp(3)))
print("Output #12: {0:.2f}".format(log(4)))
print("Output #13: {0:.1f}".format(sqrt(81)))

Strings

A string can be enclosed in single quotes, double quotes, three single quotes, or three double quotes. Here are a few examples:
print("Output #14: {0:s}".format('I\'m enjoying learning Python.')) # If this string were enclosed in double quotes, the backslash before the apostrophe in "I'm" would not be needed
print("Output #15: {0:s}".format("This is a long string. Without the backslash\
 it would run off of the page on the right in the text editor and be very\
 difficult to read and edit. By using the backslash you can split the long\
 string into smaller strings on separate lines so that the whole string is easy\
 to view in the text editor."))
print("Output #16: {0:s}".format('''You can use triple single quotes for multi-line comment strings.'''))
print("Output #17: {0:s}".format("""You can also use triple double quotes for multi-line comment strings."""))

split (splitting strings)

Example:
string1 = "My deliverable is due in May"
string1_list1 = string1.split()
string1_list2 = string1.split(" ",2) # The first argument, " ", says to split on the space character. The second argument, 2, says to split on only the first two spaces.
print("Output #21: {0}".format(string1_list1))
print("Output #22: FIRST PIECE:{0} SECOND PIECE:{1} THIRD PIECE:{2}"\
.format(string1_list2[0],string1_list2[1],string1_list2[2]))
string2 = "Your,deliverable,is,due,in,June"
string2_list = string2.split(',')
print("Output #23: {0}".format(string2_list))
print("Output #24: {0} {1} {2}".format(string2_list[1],string2_list[5],string2_list[-1]))
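
To make the effect of the second argument to split more concrete, here is a minimal sketch. The variable names sentence, all_words, and first_two_splits are illustrative and do not come from the book.

```python
# Illustrative sketch: how the maxsplit argument limits the number of splits.
sentence = "My deliverable is due in May"

# No arguments: split on any run of whitespace, as many times as possible.
all_words = sentence.split()

# Split on a single space at most twice; the remainder stays in one piece.
first_two_splits = sentence.split(" ", 2)

print(all_words)
print(first_two_splits)
```

With a limit of 2, the result has three pieces, which is why the example above can safely index positions 0, 1, and 2.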

join (joining strings)

print("Output #25: {0}".format(','.join(string2_list)))

lstrip, rstrip, strip (remove spaces, tabs, and newlines from the left side, the right side, and both sides of a string, respectively)

string3 = "Remove unwanted characters    from this string.\t\t    \n"
print("Output #26: string3: {0:s}".format(string3))
string3_lstrip = string3.lstrip()
print("Output #27: lstrip: {0:s}".format(string3_lstrip))
string3_rstrip = string3.rstrip()
print("Output #28: rstrip: {0:s}".format(string3_rstrip))
string3_strip = string3.strip()
print("Output #29: strip: {0:s}".format(string3_strip))
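
Because whitespace is invisible when printed, the difference between the three functions is easier to see with repr. The following is a small sketch under that assumption; the string padded is a made-up example with whitespace on both sides.

```python
# Illustrative sketch: repr() makes leading and trailing whitespace visible.
padded = "  \tRemove unwanted characters.\t \n"

left_stripped = padded.lstrip()    # whitespace removed on the left only
right_stripped = padded.rstrip()   # whitespace removed on the right only
both_stripped = padded.strip()     # whitespace removed on both sides

print(repr(left_stripped))
print(repr(right_stripped))
print(repr(both_stripped))
```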

replace (substitution)

string5 = "Let's replace the spaces in this sentence with other characters."
string5_replace = string5.replace(" ","!@!")
print("Output #32 (with !@!): {0:s}".format(string5_replace))
string5_replace = string5.replace(" ",",")
print("Output #33 (with commas): {0:s}".format(string5_replace))

lower, upper, capitalize (lower and upper convert the letters in a string to lowercase and uppercase, respectively; capitalize applies upper to the first letter of the string and lower to the remaining letters)

string6 = "Here's WHAT Happens WHEN You Use lower."
print("Output #34: {0:s}".format(string6.lower()))
string7 = "Here's what Happens WHEN you use Capitalize."
print("Output #35: {0:s}".format(string7.upper()))
string5 = "here's WHAT Happens WHEN you use Capitalize."
print("Output #36: {0:s}".format(string5.capitalize()))
string5_list = string5.split()
print("Output #37 (on each word):")
for word in string5_list:
    print("{0:s}".format(word.capitalize()))

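Regular expressions and pattern matching (re)

import re

# Count the number of times a pattern appears in a string
string = "The quick brown fox jumps over the lazy dog."
string_list = string.split()
pattern = re.compile(r"The",re.I) # re.compile compiles the text pattern into a compiled regular expression; the re.I flag makes the pattern case-insensitive.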
count = 0
for word in string_list:
    if pattern.search(word):
        count += 1
print("Output #38: {0:d}".format(count))


# Print the pattern each time it is found in the string
string = "The quick brown fox jumps over the lazy dog."
string_list = string.split()
pattern = re.compile(r"(?P<match_word>The)",re.I)
print("Output #39:")
for word in string_list:
    if pattern.search(word):
        print("{:s}".format(pattern.search(word).group('match_word')))
        
# Replace the word "the" in the string with the letter "a"
string = "The quick brown fox jumps over the lazy dog."
string_to_find = r"The"
pattern = re.compile(string_to_find,re.I)
print("Output #40: {:s}".format(pattern.sub("a",string)))
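
One caveat about the pattern above: without word boundaries, "The" also matches inside longer words such as "Then" or "bathe". The sketch below assumes you only want whole words; the sample sentence and the names whole_word, matches, and replaced are illustrative.

```python
import re

# Illustrative text containing words that merely contain "the".
text = "The quick brown fox jumps over the lazy dog. Then they bathe."

# \b marks a word boundary, so "Then", "they", and "bathe" are not matched.
whole_word = re.compile(r"\bthe\b", re.IGNORECASE)

# findall returns every non-overlapping match, so len() gives the count.
matches = whole_word.findall(text)
print(len(matches))

# sub replaces only the whole-word matches.
replaced = whole_word.sub("a", text)
print(replaced)
```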

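Dates

from datetime import date, datetime, timedelta

# Print today's date and its components
today = date.today()
print("Output #41: today: {0!s}".format(today)) # The !s in {0!s} means the value passed to format should be formatted as a string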
print("Output #42: {0!s}".format(today.year))
print("Output #43: {0!s}".format(today.month))
print("Output #44: {0!s}".format(today.day))
current_datetime = datetime.today()
print("Output #45: {0!s}".format(current_datetime))

# Use timedelta to compute a new date
one_day = timedelta(days=-1)
today = date.today()
yesterday = today + one_day
print("Output #46: yesterday: {0!s}".format(yesterday))
eight_hours = timedelta(hours=-8)
print("Output #47: {0!s} {1!s}".format(eight_hours.days,eight_hours.seconds))
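
The days and seconds printed for a negative timedelta can look surprising. A short sketch of what happens, using the illustrative name eight_hours_ago:

```python
from datetime import timedelta

eight_hours_ago = timedelta(hours=-8)

# timedelta normalizes its fields so that only days can be negative:
# -8 hours is stored as -1 day plus 16 hours (57600 seconds).
print(eight_hours_ago.days, eight_hours_ago.seconds)

# total_seconds() gives the signed duration as a single number.
print(eight_hours_ago.total_seconds())
```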

# Compute the number of days between two dates
today = date.today()
one_day = timedelta(days=-1)
yesterday = today + one_day
date_diff = today - yesterday
print("Output #48: {0!s}".format(date_diff))
print("Output #49: {0!s}".format(str(date_diff).split()[0]))

Lists

Creating a list

# Create a list with square brackets
# Use len to count the elements in a list
# Use max and min to find the largest and smallest values
# Use count to count how many times a value appears in a list
a_list = [1, 2, 3]
print("Output #58: {}".format(a_list))
print("Output #59: a_list has {} elements.".format(len(a_list)))
print("Output #60: the maximum value in a_list is  {}".format(max(a_list)))
print("Output #61: the minimum value in a_list is  {}".format(min(a_list)))
another_list = ['printer',5,['star','circle',9]]
print("Output #62: {}".format(another_list))
print("Output #63: another_list also has {} elements.".format(len(another_list)))
print("Output #64: 5 is in another_list {} time.".format(another_list.count(5))) # count returns the number of times an element appears in the list.

Copying a list

a_new_list = a_list[:]

Concatenating lists

# Use + to concatenate two or more lists
a_longer_list = a_list + another_list

Using in and not in

# Use in and not in to check whether a specific element is in a list
a_list = [1, 2, 3]
a = 2 in a_list
print("Output #79: {}".format(a))
if 2 in a_list:
    print("Output #80: 2 is in {}.".format(a_list))
b = 6 not in a_list
print("Output #81: {}".format(b))
if 6 not in a_list:
    print("Output #82: 6 is not in {}.".format(a_list))

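Appending, removing, and popping elements

# Use append() to add a new element to the end of a list
# Use remove() to delete a specific element from a list
# Use pop() to remove an element from the end of a list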
a_list.append(4)
a_list.append(5)
a_list.append(6)
print("Output #83: {}".format(a_list)) 
a_list.remove(5)
print("Output #84: {}".format(a_list)) 
a_list.pop()
a_list.pop()
print("Output #85: {}".format(a_list))

Reversing a list

# reverse() reverses a list in place, which modifies the original list
# To reverse a list without modifying the original, make a copy first
a_list.reverse()
print("Output #86: {}".format(a_list)) 
a_list.reverse()
print("Output #87: {}".format(a_list))
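
As an alternative to copying and then calling reverse(), Python can build a reversed copy directly. This is a minimal sketch; the names original, reversed_copy, and also_reversed are illustrative.

```python
original = [1, 2, 3, 4]

# A slice with step -1 builds a new list in reverse order.
reversed_copy = original[::-1]

# reversed() returns an iterator; list() turns it into a new list.
also_reversed = list(reversed(original))

print(reversed_copy)
print(also_reversed)
print(original)  # the original list is unchanged
```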

Sorting a list

unordered_list = [3, 5, 1, 7, 2, 8, 4, 9, 0, 6] 
print("Output #88: {}".format(unordered_list)) 
list_copy = unordered_list[:]
list_copy.sort()
print("Output #89: {}".format(list_copy)) 
print("Output #90: {}".format(unordered_list))

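The sorted function

# Use sorted to sort a collection of lists by the element at a given position in each list
my_lists = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 1, 3]]
my_lists_sorted_by_index_3 = sorted(my_lists, key=lambda index_value:index_value[3]) 
# Sort the lists by the value at index position 3 (that is, the fourth element of each list).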
print("Output #91: {}".format(my_lists_sorted_by_index_3))

Using itemgetter to sort a collection of lists by two index positions

from operator import itemgetter

my_lists = [[123, 2, 2, 444], [22, 6, 6, 444], [354, 4, 4, 678],[235, 4, 4, 678],[236, 5, 5, 678],[578, 1, 1, 290],[461, 1, 1, 290]]
my_lists_sorted_by_index_3_and_0 = sorted(my_lists, key=itemgetter(3,0)) # Sort by index 3 first, then by index 0.
print("Output #92: {}".format(my_lists_sorted_by_index_3_and_0))
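
One way to read itemgetter(3, 0) is as a function that returns the tuple (row[3], row[0]), so rows are compared on index 3 first and ties are broken on index 0. The sketch below checks this against an equivalent lambda on a shortened, illustrative data set.

```python
from operator import itemgetter

rows = [[123, 2, 2, 444], [22, 6, 6, 444], [354, 4, 4, 678], [235, 4, 4, 678]]

# itemgetter(3, 0) returns the tuple (row[3], row[0]) for each row.
by_itemgetter = sorted(rows, key=itemgetter(3, 0))

# The same ordering, written out explicitly with a lambda.
by_lambda = sorted(rows, key=lambda row: (row[3], row[0]))

print(by_itemgetter)
print(by_itemgetter == by_lambda)
```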

Tuples

Creating a tuple

# Create a tuple with parentheses
my_tuple = ('x', 'y', 'z')
print("Output #93: {}".format(my_tuple))
print("Output #94: my_tuple has {} elements".format(len(my_tuple)))
print("Output #95: {}".format(my_tuple[1]))
longer_tuple = my_tuple + my_tuple
print("Output #96: {}".format(longer_tuple))

Unpacking tuples

# Unpack a tuple into the variables on the left-hand side of the assignment operator
one, two, three = my_tuple
print("Output #97: {0} {1} {2}".format(one, two, three)) 
var1 = 'red'
var2 = 'robin'
print("Output #98: {} {}".format(var1, var2))

# Swap the values of two variables
var1, var2 = var2, var1
print("Output #99: {} {}".format(var1, var2))

Converting between tuples and lists

my_list = [1, 2, 3]
my_tuple = ('x', 'y', 'z')
print("Output #100: {}".format(tuple(my_list))) 
print("Output #101: {}".format(list(my_tuple)))

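Dictionaries

Creating a dictionary

# Create a dictionary with curly braces
# Use a colon to separate each key from its value
# Use len to count the key-value pairs in a dictionary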
empty_dict = { }
a_dict = {'one':1, 'two':2, 'three':3}
print(a_dict)
print(len(a_dict))
another_dict = {'x':'printer', 'y':5, 'z':['star', 'circle', 9]}
print(another_dict)
print(len(another_dict))

# Use a key to reference a specific value in a dictionary
print(a_dict['two'])
print(another_dict['z'])

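Copying

# Use copy to make a copy of a dictionary
a_new_dict = a_dict.copy()

Keys, values, and items

# Use keys, values, and items to access
# a dictionary's keys, values, and key-value pairs, respectively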
a_dict = {'one':1, 'two':2, 'three':3}
print("Output #109: {}".format(a_dict.keys()))
a_dict_keys = a_dict.keys()
print("Output #110: {}".format(a_dict_keys))
print("Output #111: {}".format(a_dict.values()))
print("Output #112: {}".format(a_dict.items()))

Using in, not in, and get

if 'y' in another_dict:
    print("Output #114: y is a key in another_dict: {}.".format(another_dict.keys()))
if 'c' not in another_dict:
    print("Output #115: c is not a key in another_dict: {}.".format(another_dict.keys()))
print("Output #116: {!s}".format(a_dict.get('three')))
print("Output #117: {!s}".format(a_dict.get('four')))
print("Output #118: {!s}".format(a_dict.get('four','Not in dict')))
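
The reason get is useful is that indexing with a missing key raises KeyError, while get returns None or a default of your choice. A minimal sketch, with illustrative variable names:

```python
a_dict = {'one': 1, 'two': 2, 'three': 3}

# Square-bracket access raises KeyError when the key is missing.
try:
    value_from_index = a_dict['four']
except KeyError:
    value_from_index = 'KeyError raised'

# get returns None for a missing key, or the supplied default.
value_from_get = a_dict.get('four')
value_with_default = a_dict.get('four', 'Not in dict')

print(value_from_index)
print(value_from_get)
print(value_with_default)
```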

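Sorting

The sorted function orders the list of key-value tuples produced by the items function according to a rule. That rule is the key argument, which here is a simple lambda function. In the lambda function, item is the only parameter and stands for each key-value tuple returned by items. The expression after the colon is the value to return: item[0], the first element of the tuple (that is, the dictionary key), which sorted uses as the sort key. In short, the first sorted call sorts the dictionary's key-value pairs in ascending order by key. The next sorted call uses item[1] instead of item[0], so it sorts the key-value pairs in ascending order by value. Passing reverse=True sorts in descending order.

# Use sorted to sort a dictionary
# To sort a dictionary without modifying the original,
# make a copy first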
print("Output #119: {}".format(a_dict))
dict_copy = a_dict.copy()
ordered_dict1 = sorted(dict_copy.items(), key=lambda item:item[0])
print("Output #120 (order by keys): {}".format(ordered_dict1))
ordered_dict2 = sorted(dict_copy.items(), key=lambda item:item[1])
print("Output #121 (order by values): {}".format(ordered_dict2))
ordered_dict3 = sorted(dict_copy.items(), key=lambda x: x[1],reverse=True)
print("Output #122 (order by values, descending): {}".format(ordered_dict3))
ordered_dict4 = sorted(dict_copy.items(), key=lambda x: x[1],reverse=False)
print("Output #123 (order by values, ascending): {}".format(ordered_dict4))
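
Note that sorted returns a list of tuples, not a dictionary. If a dictionary is needed afterwards, the sorted pairs can be passed to dict, which keeps insertion order in Python 3.7 and later. A small sketch with illustrative names:

```python
scores = {'one': 1, 'two': 2, 'three': 3}

# sorted() returns a list of (key, value) tuples, here ordered by value, descending.
pairs_desc = sorted(scores.items(), key=lambda item: item[1], reverse=True)

# dict() rebuilds a dictionary; insertion order is preserved in Python 3.7+.
dict_desc = dict(pairs_desc)

print(pairs_desc)
print(dict_desc)
```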

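Control flow

# for loops
# Assumed sample data: these notes do not show how y and z are defined
y = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
z = ['Annual', 'Yearly', 'Zero', 'Decade']
print("Output #126:")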
for month in y:
    print("{!s}".format(month))
    
print("Output #127: (index value: name in list)")
for i in range(len(z)):
    print("{0!s}: {1:s}".format(i, z[i]))
    
print("Output #128: (access elements in y with z's index values)")
for j in range(len(z)):
    if y[j].startswith('J'):
        print("{!s}".format(y[j]))
print("Output #129:")
for key, value in another_dict.items():
    print("{0:s}, {1}".format(key, value))

Simplifying for loops: list, set, and dictionary comprehensions

# List comprehension
# Use a list comprehension to select specific rows
my_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows_to_keep = [row for row in my_data if row[2]>5]
print("Output #130 (list comprehension): {}".format(rows_to_keep))
# Set comprehension
my_data = [(1, 2, 3),(4, 5, 6),(7, 8, 9),(7, 8, 9)]
set_of_tuples1 = {x for x in my_data}
print("Output #131 (set comprehension): {}".format(set_of_tuples1))
set_of_tuples2 = set(my_data)
print("Output #132 (set function): {}".format(set_of_tuples2))    
# Use a dictionary comprehension to select specific key-value pairs
my_dictionary = {'customer1': 7, 'customer2': 9, 'customer3': 11}
my_results = {key : value for key, value in my_dictionary.items() if value > 10}
print("Output #133 (dictionary comprehension): {}".format(my_results))

Functions

# Compute the mean of a sequence of numeric values
def getMean(numericValues):
    return sum(numericValues)/len(numericValues) if len(numericValues)>0 else float('nan')
my_list = [2, 2, 4, 4, 6, 6, 8, 8]
print("Output #135 (mean): {!s}".format(getMean(my_list)))

try-except

def getMean(numericValues):
    return sum(numericValues)/len(numericValues)
my_list2 = [ ]
try :
    print("Output #138: {}".format(getMean(my_list2)))
except ZeroDivisionError as detail:
    print("Output #138 (Error): {}".format(float('nan')))
    print("Output #138 (Error): {}".format(detail))
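
A try block can also carry else and finally clauses: else runs only when no exception was raised, and finally runs in every case. The following is a sketch of that fuller form; get_mean and safe_mean are hypothetical helper names, not functions from the notes.

```python
def get_mean(values):
    # Raises ZeroDivisionError when values is empty.
    return sum(values) / len(values)

def safe_mean(values):
    try:
        result = get_mean(values)
    except ZeroDivisionError as detail:
        # Runs only if the division failed.
        print("Error: {}".format(detail))
        result = float('nan')
    else:
        # Runs only if no exception was raised.
        print("Mean computed without errors.")
    finally:
        # Runs in every case.
        print("Finished processing {} value(s).".format(len(values)))
    return result

print(safe_mean([2, 4, 6]))
print(safe_mean([]))
```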

Reading text files

Creating a text file

import sys

# Read a single text file
input_file = sys.argv[1] # path to the file, supplied on the command line
print("Output #143: ")
filereader = open(input_file, 'r')
for row in filereader:
    print(row.strip())
filereader.close()

Command: python first_script.py file_to_read.txt
If the script and the text file to read are not in the same directory:
python first_script.py "path/file_to_read.txt"

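# Use glob to read multiple text files
import glob
import os
import sys

inputPath = sys.argv[1] # path to the directory, supplied on the command line
for input_file in glob.glob(os.path.join(inputPath,'*.txt')):
    with open(input_file, 'r', newline='') as filereader:
        for row in filereader:
            print("{}".format(row.strip()))

The os.path.join and glob.glob functions together find all files in a folder that match a specific pattern. The path to the folder is stored in the variable inputPath, which is supplied on the command line. os.path.join joins the folder path with the file-name pattern '*.txt', and glob.glob expands that pattern into the list of matching file paths in the folder.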
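
To try the os.path.join and glob.glob combination without preparing a folder by hand, the sketch below builds a temporary folder with a few made-up files. The file names and contents are assumptions for illustration only.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create two .txt files and one .csv file (illustrative names and contents).
    for name, text in [("a.txt", "alpha\n"), ("b.txt", "beta\n"), ("c.csv", "skip\n")]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(text)

    # Join the folder path with the pattern, then let glob expand it.
    pattern = os.path.join(folder, "*.txt")
    matched = sorted(os.path.basename(path) for path in glob.glob(pattern))

    # Read every matching file, collecting stripped lines.
    lines = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r") as reader:
            lines.extend(row.strip() for row in reader)

print(matched)
print(lines)
```

The results are sorted here because glob.glob does not guarantee any particular order.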

Writing text files

import sys

my_letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
max_index = len(my_letters)
output_file = sys.argv[1]
filewriter = open(output_file, 'w') # 'w' opens the file in write mode
for index_value in range(len(my_letters)):
    if index_value < (max_index-1):
        filewriter.write(my_letters[index_value]+'\t')
    else:
        filewriter.write(my_letters[index_value]+'\n')
filewriter.close()
# Write to a CSV file
my_numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
max_index = len(my_numbers)
output_file = sys.argv[1]
filewriter = open(output_file, 'a') # 'a' opens the file in append mode
for index_value in range(len(my_numbers)):
    if index_value < (max_index-1):
        filewriter.write(str(my_numbers[index_value])+',')
    else:
        filewriter.write(str(my_numbers[index_value])+'\n')
filewriter.close()
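
The same two rows can be written more compactly with str.join and a with statement, which closes the file automatically. This is an alternative sketch, not the book's version; it writes to a temporary file (the name output.txt is made up) instead of a path taken from sys.argv.

```python
import os
import tempfile

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

with tempfile.TemporaryDirectory() as folder:
    output_path = os.path.join(folder, "output.txt")

    # 'w' creates or overwrites the file; join puts tabs between the letters.
    with open(output_path, "w") as writer:
        writer.write("\t".join(letters) + "\n")

    # 'a' appends to the existing file; numbers must be converted to str first.
    with open(output_path, "a") as writer:
        writer.write(",".join(str(number) for number in numbers) + "\n")

    # Read the file back to check what was written.
    with open(output_path, "r") as reader:
        written = reader.read()

print(written)
```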