Notes: Intermediate Python

Daisy Lee

Article link: https://blog.csdn.net/weixin_42871941/article/details/104808388

Matplotlib

Case: Demographics

The World Bank has estimates of the world population for the years 1950 through 2100. The years are loaded in your workspace as a list called year, and the corresponding populations as a list called pop.

print(year[-1])
print(pop[-1])

2100
10.85

from matplotlib import pyplot as plt
plt.plot(year, pop)
plt.show()

Let’s start working on the data that Professor Hans Rosling used to build his beautiful bubble chart. It was collected in 2007. Two lists are available for you:

life_exp, which contains the life expectancy for each country
gdp_cap, which contains the GDP per capita (i.e. per person) for each country, expressed in US dollars

print(gdp_cap[-1])
print(life_exp[-1])

469.70929810000007
43.487

plt.plot(gdp_cap, life_exp)
plt.show()

When you’re trying to assess whether there is a correlation between two variables, a scatter plot is the better choice.

plt.scatter(gdp_cap, life_exp)
plt.xscale('log') # With GDP per capita on a log scale, the correlation becomes much clearer
plt.show()

You saw that a higher GDP per capita usually corresponds to a higher life expectancy. In other words, there is a positive correlation. Do you think there’s a relationship between the population and the life expectancy of a country?

import matplotlib.pyplot as plt
plt.scatter(pop, life_exp)
plt.show()

To see how life expectancy in different countries is distributed, let’s create a histogram of life_exp.

import matplotlib.pyplot as plt
plt.hist(life_exp)
plt.show()

In the previous exercise, you didn’t specify the number of bins. By default, Matplotlib uses 10 bins in that case. The number of bins is pretty important. Too few bins will oversimplify reality and won’t show you the details. Too many bins will overcomplicate reality and won’t show the bigger picture.

import matplotlib.pyplot as plt
plt.hist(life_exp, 5)
plt.show()
plt.clf() # Clear the current figure

import matplotlib.pyplot as plt
plt.hist(life_exp, 20)
plt.show()
plt.clf()
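
The effect of the bin count can also be checked numerically, without drawing anything. The sketch below is only an illustration: it calls np.histogram() on a randomly generated sample (the name sample and its parameters are made up here and are not the course data) and prints the bin width and the counts for 5, 10 and 20 bins.

```python
import numpy as np

# Hypothetical stand-in for life_exp: 142 random values, NOT the course data
rng = np.random.default_rng(0)
sample = rng.normal(loc=67, scale=12, size=142)

results = {}
for bins in (5, 10, 20):
    # np.histogram returns the count per bin and the bin edges (bins + 1 values)
    counts, edges = np.histogram(sample, bins=bins)
    results[bins] = (counts, edges)
    width = round(edges[1] - edges[0], 2)
    print(bins, "bins, width", width, "counts", counts.tolist())
```

Fewer bins give wider intervals with larger counts; more bins give narrower intervals, some of which may be empty.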

Let’s do a similar comparison. life_exp contains life expectancy data for different countries in 2007. You also have access to a second list now, life_exp1950, containing similar data for 1950. Can you make a histogram for both datasets?

import matplotlib.pyplot as plt
plt.hist(life_exp, 15)
plt.show()
plt.clf()

plt.hist(life_exp1950, 15)
plt.show()
plt.clf()

You’re going to work on the scatter plot with world development data: GDP per capita on the x-axis (logarithmic scale), life expectancy on the y-axis. As a first step, let’s add axis labels and a title to the plot.

import matplotlib.pyplot as plt
plt.scatter(gdp_cap, life_exp)
plt.xscale('log') 
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.show()

Let’s customize the x-axis of your world development chart with the xticks() function. The tick values 1000, 10000 and 100000 should be replaced by 1k, 10k and 100k.

import matplotlib.pyplot as plt
plt.scatter(gdp_cap, life_exp)
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']
plt.xticks(tick_val, tick_lab)
plt.show()

Right now, the scatter plot is just a cloud of blue dots, indistinguishable from each other. Let’s change this. Wouldn’t it be nice if the size of the dots corresponded to the population?

import numpy as np
np_pop = np.array(pop) # Store pop as a NumPy array: np_pop
np_pop = np_pop * 2 # Double the values so the bubbles are larger
plt.scatter(gdp_cap, life_exp, s = np_pop) # Set the size argument s to np_pop
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
plt.show()

The next step is making the plot more colorful! To do this, a list col has been created for you. It’s a list with a color for each corresponding country, depending on the continent the country is part of. The continent-to-color mapping is given by the following dictionary:

continent_colors = { # Named continent_colors to avoid shadowing the built-in dict
    'Asia':'red',
    'Europe':'green',
    'Africa':'blue',
    'Americas':'yellow',
    'Oceania':'black'
}

plt.scatter(x = gdp_cap, y = life_exp, 
            s = np.array(pop) * 2,
            c = col,
            alpha = 0.8)
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
plt.show()

Additional customizations: text annotations and gridlines.

plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')
plt.grid(True)
plt.show()

Dictionaries

Find the index that corresponds to a country name:

countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
ind_ger = countries.index('germany') # Index of Germany
print(capitals[ind_ger])

<script.py> output:
    berlin

Creating a dictionary:

countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
europe = { 'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
print(europe)

<script.py> output:
    {'spain': 'madrid', 'germany': 'berlin', 'norway': 'oslo', 'france': 'paris'}
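
Since countries and capitals are parallel lists, the same dictionary can also be built without retyping every pair. The following is a minimal sketch using the built-in zip() and dict(); it is an alternative to the literal above, not the course’s solution.

```python
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# zip() pairs the i-th country with the i-th capital; dict() turns the pairs into key-value entries
europe = dict(zip(countries, capitals))
print(europe['germany'])
```

On Python 3.7 and later, dictionaries preserve insertion order, so printing europe may show the keys in a different order than the recorded output above.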

Accessing the value that corresponds to a key:

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
print(europe.keys())
print(europe['norway'])

<script.py> output:
    dict_keys(['spain', 'germany', 'norway', 'france'])
    oslo

Adding a new key-value pair to a dictionary:

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
europe['italy'] = 'rome'
print('italy' in europe)
europe['poland'] = 'warsaw'
print(europe)

<script.py> output:
    True
    {'spain': 'madrid', 'germany': 'berlin', 'italy': 'rome', 'norway': 'oslo', 'france': 'paris', 'poland': 'warsaw'}

Updating and deleting existing key-value pairs:

europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw',
          'australia':'vienna' }
europe['germany'] = 'berlin'
del europe['australia']
print(europe)

<script.py> output:
    {'spain': 'madrid', 'germany': 'berlin', 'italy': 'rome', 'norway': 'oslo', 'france': 'paris', 'poland': 'warsaw'}

Nested dictionaries:

europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }
print(europe['france'])
data = {'capital':'rome', 'population':59.83} # Data for the new entry, Italy
europe['italy'] = data
print(europe)

<script.py> output:
    {'capital': 'paris', 'population': 66.03}
    {'spain': {'capital': 'madrid', 'population': 46.77}, 'germany': {'capital': 'berlin', 'population': 80.62}, 'italy': {'capital': 'rome', 'population': 59.83}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'france': {'capital': 'paris', 'population': 66.03}}
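
To reach a single value inside a nested dictionary, square brackets can be chained. The sketch below uses a shortened copy of europe; the updated population figure and the lookup of a missing key are added purely for illustration.

```python
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03}}

# Chain two pairs of square brackets: outer key first, then inner key
france_capital = europe['france']['capital']
print(france_capital)

# Update a value inside a sub-dictionary (the new value is illustrative only)
europe['spain']['population'] = 47.0

# .get() returns None instead of raising KeyError when the key is missing
missing = europe.get('italy')
print(missing)
```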

Pandas

The DataFrame is one of Pandas’ most important data structures. It’s basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

import pandas as pd
my_dict = {'country':names, 'drives_right':dr, 'cars_per_cap':cpc} # Create the dictionary
cars = pd.DataFrame(my_dict) # Build a DataFrame from it
print(cars)

<script.py> output:
       cars_per_cap        country  drives_right
    0           809  United States          True
    1           731      Australia         False
    2           588          Japan         False
				...

Specify the row labels of a DataFrame by setting its index attribute:

import pandas as pd
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)

row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
cars.index = row_labels # Set the row labels
print(cars)

<script.py> output:
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False
				...

Putting data in a dictionary and then building a DataFrame works, but it’s not very efficient. What if you’re dealing with millions of observations? In those cases, the data is typically available as files with a regular structure. One of those file types is the CSV file, which is short for “comma-separated values”.

import pandas as pd
cars = pd.read_csv('cars.csv')
print(cars)

<script.py> output:
      Unnamed: 0  cars_per_cap        country  drives_right
    0         US           809  United States          True
    1        AUS           731      Australia         False
    2        JPN           588          Japan         False
				...

Your read_csv() call to import the CSV data didn’t generate an error, but the output is not entirely what we wanted. The row labels were imported as another column without a name. To fix this, pass index_col=0 so that the first column is used as the row labels.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)
print(cars)

<script.py> output:
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False
				...
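
The difference can be reproduced without the file. The sketch below assumes that cars.csv starts with an empty header cell followed by the column names; the csv_text string is a made-up three-row stand-in, since the real file is not shown in these notes.

```python
import io
import pandas as pd

# Hypothetical stand-in for cars.csv: the first header cell is empty
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# Without index_col, the label column becomes a regular column named 'Unnamed: 0'
without_index = pd.read_csv(io.StringIO(csv_text))
print(without_index.columns.tolist())

# With index_col=0, the first column is used as the row labels
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index.index.tolist())
```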

Square Brackets

Selecting columns with square brackets:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars['country']) # Single brackets print a Pandas Series

<script.py> output:
    US     United States
    AUS        Australia
    JPN            Japan
				...
    Name: country, dtype: object

print(cars[['country']]) # Double brackets print a Pandas DataFrame

<script.py> output:
               country
    US   United States
    AUS      Australia
    JPN          Japan
				...

print(cars[['country', 'drives_right']])

<script.py> output:
               country  drives_right
    US   United States          True
    AUS      Australia         False
    JPN          Japan         False

Besides columns, square brackets can also select rows (observations) by slicing:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars[0:3])
print(cars[3:6])

<script.py> output:
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False
         cars_per_cap  country  drives_right
    IN             18    India         False
    RU            200   Russia          True
    MOR            70  Morocco          True

loc and iloc

With loc and iloc you can do practically any data selection operation on DataFrames you can think of. loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer-position based, so you have to specify rows and columns by their integer index, like you did in the previous exercise.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars.loc[['JPN']]) # Print the row for Japan as a DataFrame

<script.py> output:
         cars_per_cap country  drives_right
    JPN           588   Japan         False

print(cars.iloc[2]) 

<script.py> output:
    cars_per_cap      588
    country         Japan
    drives_right    False
    Name: JPN, dtype: object

print(cars.loc[['AUS', 'EG']])

<script.py> output:
         cars_per_cap    country  drives_right
    AUS           731  Australia         False
    EG             45      Egypt          True

loc and iloc also allow you to select both rows and columns from a DataFrame.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars.loc['MOR', 'drives_right']) # Print the drives_right value for Morocco
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])

<script.py> output:
    True
    
         country  drives_right
    RU    Russia          True
    MOR  Morocco          True

It’s also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end (a bare colon, :) in front of the comma.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars.loc[:, 'drives_right'])

<script.py> output:
    US      True
    AUS    False
    JPN    False
		...
    Name: drives_right, dtype: bool

print(cars.loc[:, ['drives_right']])

<script.py> output:
         drives_right
    US           True
    AUS         False
    JPN         False

print(cars.loc[:, ['cars_per_cap', 'drives_right']])

<script.py> output:
         cars_per_cap  drives_right
    US            809          True
    AUS           731         False
    JPN           588         False
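
The examples in this section mostly use loc. As a rough sketch of the iloc counterparts, the code below rebuilds the cars table from the values shown earlier (instead of reading cars.csv) and checks that label-based and position-based selections return the same data. The column positions 0, 1 and 2 assume the column order cars_per_cap, country, drives_right.

```python
import pandas as pd

cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'])

# Rows RU and MOR are at positions 4 and 5; country and drives_right at positions 1 and 2
by_label = cars.loc[['RU', 'MOR'], ['country', 'drives_right']]
by_position = cars.iloc[[4, 5], [1, 2]]
print(by_label.equals(by_position))

# Label slices include the end label, position slices exclude the end position
first_three_by_label = cars.loc['US':'JPN']
first_three_by_position = cars.iloc[0:3]
print(first_three_by_label.equals(first_three_by_position))
```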

Logic

Boolean operators with NumPy: np.logical_and(), np.logical_or() and np.logical_not()

import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

print(np.logical_or(my_house > 18.5, my_house < 10))
print(np.logical_and(my_house < 11, your_house < 11))

<script.py> output:
    [False  True False  True]
    [False False False  True]
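
On boolean NumPy arrays, the operators |, & and ~ give the same element-wise results as these functions. The sketch below is a side note rather than part of the course exercise; the parentheses are needed because | and & bind more tightly than the comparison operators.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Function form and operator form of the same element-wise tests
or_func = np.logical_or(my_house > 18.5, my_house < 10)
or_op = (my_house > 18.5) | (my_house < 10)

and_func = np.logical_and(my_house < 11, your_house < 11)
and_op = (my_house < 11) & (your_house < 11)

not_func = np.logical_not(my_house < 11)
not_op = ~(my_house < 11)

print(np.array_equal(or_func, or_op))
print(np.array_equal(and_func, and_op))
print(np.array_equal(not_func, not_op))
```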

Control Flow

Inspecting a room with if statements:

room = "kit"
area = 14.0

if room == "kit" :
    print("looking around in the kitchen.")
if area > 15:
    print("big place!")

<script.py> output:
    looking around in the kitchen.

Extending the if statements with else:

room = "kit"
area = 14.0

if room == "kit" :
    print("looking around in the kitchen.")
else :
    print("looking around elsewhere.")

if area > 15 :
    print("big place!")
else:
    print("pretty small.")
    
<script.py> output:
    looking around in the kitchen.
    pretty small.

Going further with elif:

room = "bed"
area = 14.0

if room == "kit" :
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else :
    print("looking around elsewhere.")

if area > 15 :
    print("big place!")
elif area > 10:
    print("medium size, nice!")
else :
    print("pretty small.")

<script.py> output:
    looking around in the bedroom.
    medium size, nice!

Filtering

Select the rows where drives_right is True:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
dr = cars['drives_right'] # Extract drives_right as a boolean Series
sel = cars.loc[dr]
print(sel)

<script.py> output:
         cars_per_cap        country  drives_right
    US            809  United States          True
    RU            200         Russia          True
    MOR            70        Morocco          True
    EG             45          Egypt          True

The code above can be condensed into one line:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
sel = cars[cars['drives_right']]
print(sel)

This time you want to find out which countries have a high cars per capita figure (more than 500 here).

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

cpc = cars['cars_per_cap']
many_cars = cpc > 500
car_maniac = cars[many_cars]
print(car_maniac)

<script.py> output:
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False

Find the observations with cars_per_cap between 100 and 500:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
import numpy as np

cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
print(medium)

<script.py> output:
        cars_per_cap country  drives_right
    RU           200  Russia          True

Loops

A while loop:

offset = 3
while offset != 0:
    print("correcting...")
    offset = offset - 1
    print(offset)

<script.py> output:
    correcting...
    2
    correcting...
    1
    correcting...
    0

A while loop with a negative offset:

offset = -3
while offset != 0 :
    print("correcting...")
    if offset > 0 :
      offset = offset - 1
    else : 
      offset = offset + 1  
    print(offset)

<script.py> output:
    correcting...
    -2
    correcting...
    -1
    correcting...
    0

Looping over a list:

areas = [11.25, 18.0, 20.0, 10.75, 9.50]
for element in areas:
    print(element)

<script.py> output:
    11.25
    18.0
    20.0
    10.75
    9.5

Using a for loop to iterate over a list only gives you access to the list elements themselves, one after the other. If you also want the index of each element, that is, its position in the list, you can use enumerate().

areas = [11.25, 18.0, 20.0, 10.75, 9.50]
for index, a in enumerate(areas) :
    print("room " + str(index) + ": " + str(a))

<script.py> output:
    room 0: 11.25
    room 1: 18.0
    room 2: 20.0
    room 3: 10.75
    room 4: 9.5

For non-programmers, room 0: 11.25 looks strange. Wouldn’t it be better if the count started at 1?

areas = [11.25, 18.0, 20.0, 10.75, 9.50]
for index, area in enumerate(areas) :
    print("room " + str(index+1) + ": " + str(area))

<script.py> output:
    room 1: 11.25
    room 2: 18.0
    room 3: 20.0
    room 4: 10.75
    room 5: 9.5
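
Instead of adding 1 inside the loop, enumerate() can be told where to start counting through its start argument. The sketch below is an alternative to the code above; the lines list is introduced here only to collect the output.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

lines = []
# start=1 makes the counter begin at 1 instead of the default 0
for index, area in enumerate(areas, start=1):
    lines.append("room " + str(index) + ": " + str(area))

print("\n".join(lines))
```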

Looping over a list of lists:

house = [["hallway", 11.25], 
         ["kitchen", 18.0], 
         ["living room", 20.0], 
         ["bedroom", 10.75], 
         ["bathroom", 9.50]] 
for x, y in house:
    print("the " + str(x) + " is " + str(y) + " sqm")

<script.py> output:
    the hallway is 11.25 sqm
    the kitchen is 18.0 sqm
    the living room is 20.0 sqm
    the bedroom is 10.75 sqm
    the bathroom is 9.5 sqm

Looping over a dictionary:

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
for key, value in europe.items():
    print("the capital of " + key + " is " + str(value))

<script.py> output:
    the capital of austria is vienna
    the capital of norway is oslo
    the capital of spain is madrid
    			...

Looping over a NumPy array:

import numpy as np
for x in np_height: # Iterate over a 1D array
    print(str(x) + " inches")

<script.py> output:
74 inches
74 inches
72 inches

import numpy as np
for x in np.nditer(np_baseball): # Iterate over every element of a 2D (or higher-dimensional) array
    print(x)

<script.py> output:
74
74
...
180
215
...
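
The arrays np_height and np_baseball are preloaded in the course and not defined in these notes. The sketch below uses a small made-up array, np_small, to show why np.nditer() is needed: a plain for loop over a 2D array yields whole rows, while np.nditer() yields the individual elements (here in memory order, which is row by row for this array).

```python
import numpy as np

# Hypothetical 3x2 array of [height, weight] pairs, not the course data
np_small = np.array([[74, 180],
                     [74, 215],
                     [72, 210]])

# A plain for loop over a 2D array gives one row per iteration
rows = [row.tolist() for row in np_small]
print(rows)

# np.nditer visits every single element
flat = [int(x) for x in np.nditer(np_small)]
print(flat)
```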

Looping over a DataFrame:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
for lab, row in cars.iterrows():
    print(lab)
    print(row)

<script.py> output:
    US
    cars_per_cap              809
    country         United States
    drives_right             True
    Name: US, dtype: object
    ...

The row data that’s generated by iterrows() on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
for lab, row in cars.iterrows() :
    print(lab + ": " + str(row['cars_per_cap']))
    
<script.py> output:
    US: 809
    AUS: 731
    JPN: 588
    ...

Adding a column to a DataFrame:

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

for lab, row in cars.iterrows(): # Loop that adds the COUNTRY column row by row
    cars.loc[lab, "COUNTRY"] = row['country'].upper()
print(cars)

<script.py> output:
         cars_per_cap        country  drives_right        COUNTRY
    US            809  United States          True  UNITED STATES
    AUS           731      Australia         False      AUSTRALIA
    JPN           588          Japan         False          JAPAN
    ...

If you want to add a column to a DataFrame by calling a function on another column, the iterrows() method in combination with a for loop is not the preferred way to go. Instead, you’ll want to use apply().

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Apply str.upper to every value of the country column; no for loop is needed
cars["COUNTRY"] = cars["country"].apply(str.upper)
print(cars)
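
For string columns, pandas also offers vectorized string methods through the .str accessor, which produce the same result as apply(str.upper). The sketch below compares the two on a three-row table rebuilt from the values shown earlier; the column name COUNTRY_STR is made up for the comparison.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan']},
    index=['US', 'AUS', 'JPN'])

# apply() calls str.upper once per value
cars['COUNTRY'] = cars['country'].apply(str.upper)

# The .str accessor applies the string method to the whole column
cars['COUNTRY_STR'] = cars['country'].str.upper()

print(cars)
```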

Case: Hacker Statistics

Randomness has many uses in science, art, statistics, cryptography, gaming, gambling, and other fields. You’re going to use randomness to simulate a game.

All the functionality you need is contained in the random package, a sub-package of numpy. In this exercise, you’ll be using two functions from this package:

seed(): sets the random seed, so that your results are reproducible between simulations. As an argument, it takes an integer of your choosing. Calling the function produces no output.
rand(): if you don’t specify any arguments, it generates a random float between zero and one (zero included, one excluded).

import numpy as np
np.random.seed(123) # Set the seed
print(np.random.rand())

<script.py> output:
    0.6964691855978616

Generate a random integer with randint():

import numpy as np
np.random.seed(123)
print(np.random.randint(1, 7)) # Use randint() to simulate a die; the upper bound 7 is exclusive
print(np.random.randint(1, 7))

<script.py> output:
    6
    3
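
To see what the seed does, the sketch below rolls five dice twice with the same seed and compares the two sequences. The names first and second are introduced only for this illustration.

```python
import numpy as np

np.random.seed(123)
first = [int(np.random.randint(1, 7)) for _ in range(5)]

# Resetting the same seed restarts the same pseudo-random sequence
np.random.seed(123)
second = [int(np.random.randint(1, 7)) for _ in range(5)]

print(first == second)
```

Every value lies between 1 and 6, because randint(1, 7) includes the lower bound and excludes the upper bound.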

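Whether you win the Empire State Building bet depends on the dice roll at each step. A roll of 1 or 2 moves you one step down, a roll of 3 to 5 moves you one step up, and a 6 lets you roll again and move up by that many steps. Use an if-elif-else construct to turn one dice roll into the next step: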

import numpy as np
np.random.seed(123)
step = 50
dice = np.random.randint(1, 7)

if dice <= 2 :
    step = step - 1
elif dice <= 5 :
    step = step + 1
else :
    step = step + np.random.randint(1,7)
    
print(dice)
print(step)

<script.py> output:
    6
    53

You have already written Python code that determines the next step based on the previous step. Now it’s time to put this code inside a for loop so that we can simulate a random walk.

import numpy as np
np.random.seed(123)
random_walk = [0]

for x in range(100) :
    step = random_walk[-1] # Last element of random_walk
    dice = np.random.randint(1,7)
    if dice <= 2:
        step = step - 1
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    random_walk.append(step) # Append the next step to random_walk
print(random_walk)

<script.py> output:
    [0, 3, 4, 5, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1, 0, -1, ..., 57, 58, 59]

Things are shaping up nicely! You already have code that calculates your location in the Empire State Building after 100 dice throws. However, there’s something we haven’t thought about: you can’t go below 0! A typical way to solve problems like this is to use max().

import numpy as np
np.random.seed(123)
random_walk = [0]

for x in range(100) :
    step = random_walk[-1]
    dice = np.random.randint(1,7)
    if dice <= 2:
        step = max(0, step - 1) # Use max() so that step never goes below 0
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    random_walk.append(step)
print(random_walk)

<script.py> output:
    [0, 3, 4, 5, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1, 0, 0, ..., 58, 59, 60]

Plotting the random walk as a line chart:

import numpy as np
np.random.seed(123)
random_walk = [0]

for x in range(100) :
    step = random_walk[-1]
    dice = np.random.randint(1,7)
    if dice <= 2:
        step = max(0, step - 1)
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    random_walk.append(step)

import matplotlib.pyplot as plt
plt.plot(random_walk)
plt.show()

A single random walk is one thing, but that doesn’t tell you if you have a good chance at winning the bet. To get an idea about how big your chances are of reaching 60 steps, you can repeatedly simulate the random walk and collect the results. That’s exactly what you’ll do in this exercise.

import numpy as np
np.random.seed(123)
all_walks = []

for i in range(10) : # Simulate the random walk 10 times
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        random_walk.append(step)
    all_walks.append(random_walk)
    
import matplotlib.pyplot as plt    
np_aw = np.array(all_walks)
plt.plot(np_aw)
plt.show()
plt.clf()
np_aw_t = np.transpose(np_aw) # Transpose np_aw so that each column is one walk
plt.plot(np_aw_t)
plt.show()

You’re a bit clumsy and you have a 0.1% chance of falling down. That calls for another random number generation. Basically, you can generate a random float between 0 and 1. If this value is less than or equal to 0.001, you should reset step to 0.

import numpy as np
np.random.seed(123)
all_walks = []

for i in range(250) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        if np.random.rand() <= 0.001 : # Implement clumsiness
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)

import matplotlib.pyplot as plt   
np_aw_t = np.transpose(np.array(all_walks))
plt.plot(np_aw_t)
plt.show()

All these fancy visualizations have put us on a sidetrack. We still have to solve the million-dollar problem: What are the odds that you’ll reach 60 steps high on the Empire State Building?

Basically, you want to know about the end points of all the random walks you’ve simulated. These end points have a certain distribution that you can visualize with a histogram.

import numpy as np
np.random.seed(123)
all_walks = []
for i in range(500) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        if np.random.rand() <= 0.001 :
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)

import matplotlib.pyplot as plt   
np_aw_t = np.transpose(np.array(all_walks))
ends = np_aw_t[-1,:] # End points: the last row of np_aw_t
plt.hist(ends)
plt.show()

print(np.mean(ends >= 60)) # Fraction of walks that end at step 60 or higher

<script.py> output:
    0.784
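
The last line works because ends >= 60 is a boolean array, and NumPy treats True as 1 and False as 0 when averaging, so the mean is the fraction of True values. The sketch below shows this on five made-up end points (not simulation results).

```python
import numpy as np

# Hypothetical end points of five walks, for illustration only
ends = np.array([55, 60, 72, 48, 91])

reached = ends >= 60          # Boolean array: True where the walk reached step 60
n_reached = int(reached.sum())  # True counts as 1, so the sum is the number of successes
fraction = float(reached.mean())  # The mean is successes divided by the number of walks

print(n_reached, fraction)
```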
