Visualization of COVID-19 New Cases Over Time

weixin_26746401

Original article: https://towardsdatascience.com/visualization-of-covid-19-new-cases-over-time-in-python-8c6ac4620c88

This heat map shows the progression of the COVID-19 pandemic in the United States over time. The map is read from left to right, and it is color-coded to show the relative numbers of new cases by state, adjusted for population.

This visualization was inspired by a similar heat map that I saw on a discussion forum thread. I could never locate the source, because it was only a pasted image with no link. The original version was also crafted to make a political point by separating states according to their predominant party affiliation, which interested me less. I was fascinated by how concisely it showed the progression of the pandemic, so I decided to create a similar visualization myself that I could update regularly.

Source code is hosted on my GitHub repo. If you are only interested in seeing updated versions of this heat map, I publish them weekly on my Twitter feed. Be careful when comparing graphs from different weeks, because the color map may change as new data is included. Comparisons are only valid within a single heat map.

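Comparisons across weeks break down because `seaborn.heatmap` derives its color limits from the data when `vmin` and `vmax` are not given. Once a new week raises the maximum, the same value can land on a different color. The minimal sketch below uses `matplotlib.colors.Normalize`, the value-to-[0, 1] mapping a colormap relies on, and made-up numbers to show the shift. Pinning the limits is one possible remedy, but the cap of 3.0 used here is an assumed choice rather than something from the original script.

```python
import numpy as np
from matplotlib.colors import Normalize

# Log-scaled values, log10(x + 1), for two hypothetical weekly snapshots;
# the second snapshot contains a new, higher peak.
week1 = np.array([0.0, 0.8, 1.6])
week2 = np.array([0.0, 0.8, 1.6, 2.4])

value = 0.8  # the same cell value appearing in both images

# Auto-scaled: limits follow each dataset's min and max, which is what
# seaborn.heatmap does when vmin/vmax are left unset.
auto1 = Normalize(vmin=week1.min(), vmax=week1.max())(value)
auto2 = Normalize(vmin=week2.min(), vmax=week2.max())(value)
print(f"auto-scaled position: week 1 = {auto1:.2f}, week 2 = {auto2:.2f}")

# Fixed limits (assumed cap of 3.0 on the log scale): the same value
# always maps to the same position in the colormap.
fixed = Normalize(vmin=0.0, vmax=3.0)
print(f"fixed position: week 1 = {fixed(value):.2f}, week 2 = {fixed(value):.2f}")
```

In the plotting call, the equivalent would be passing `vmin=0, vmax=3.0` to `sns.heatmap`. The trade-off is that a week exceeding the cap saturates at the top color.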

The script relies on pandas, numpy, matplotlib, and seaborn.

The data comes from the New York Times COVID-19 GitHub repo. A simple launcher script clones the latest copy of the repository, copies the required file, and then launches the Python script that creates the heat map. Only one file is really needed, so the script could certainly be tightened up, but this works.

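#!/bin/sh
# Stop at the first failing command
set -e

echo "Clearing old data..."
rm -rf covid-19-data/
rm -f us-states.csv   # -f: don't fail if the file doesn't exist yet
echo "Getting new data..."
# A shallow clone is enough: only the latest snapshot is needed
git clone --depth 1 https://github.com/nytimes/covid-19-data
echo "Done."

# us-states.csv is the only file the Python script reads
cp covid-19-data/us-states.csv .
echo "Starting..."

python3 heatmap-newcases.py
echo "Done."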
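Because only `us-states.csv` is used, the clone-and-copy steps could be replaced with a single-file download. The following is a minimal sketch under two assumptions: that the raw-file URL below (default branch `master`) still serves the file, and that the hypothetical helper `fetch_us_states` is called before `heatmap-newcases.py` runs. It is not part of the original project.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed raw-file URL for us-states.csv at the root of the NYT repo's
# default branch ("master"); verify before relying on it.
US_STATES_URL = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
)


def fetch_us_states(url: str = US_STATES_URL, dest: str = "us-states.csv") -> Path:
    """Hypothetical helper: download one CSV to `dest`, replacing any old copy."""
    dest_path = Path(dest)
    tmp_path = dest_path.with_suffix(".part")
    # Write to a temporary name first so that a failed download never
    # leaves a truncated us-states.csv behind.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp_path.replace(dest_path)
    return dest_path
```

Calling `fetch_us_states()` and then `python3 heatmap-newcases.py` would give the same input file without cloning the whole repository history.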

The script first loads a CSV file containing the state populations into a dictionary, which is used to scale daily new case results. The new cases are computed for each day from the running total in the NY Times data, and then scaled to new cases per 100,000 people in the population.

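The two steps described above can be sketched without a per-state loop: first difference the running totals, then scale by population. This minimal example uses made-up numbers and assumes the NYT column names `date`, `state`, and `cases` and a plain dict of populations. It swaps in `groupby(...).diff()` for the script's merge loop. It also clips negative daily changes (data corrections) to zero, a choice the text above does not discuss.

```python
import pandas as pd

# Made-up cumulative totals in the NYT long format (one row per state per day)
cumulative = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "state": ["Ohio"] * 3 + ["Utah"] * 3,
    "cases": [10, 30, 25, 5, 5, 20],
})

# Hypothetical populations; the script reads these from StatePopulations.csv
populations = {"Ohio": 1_000_000, "Utah": 500_000}

cumulative = cumulative.sort_values(["state", "date"])

# New cases = change in the running total within each state.
# The first day has no previous total (NaN -> 0); negative corrections -> 0.
new_cases = cumulative.groupby("state")["cases"].diff().fillna(0).clip(lower=0)

# Scale to new cases per 100,000 residents
cumulative["new_per_100k"] = new_cases / cumulative["state"].map(populations) * 100_000
print(cumulative)
```

For example, Ohio's jump from 10 to 30 cumulative cases is 20 new cases, which is 2.0 per 100,000 in a population of 1,000,000.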

The heat map could be displayed at that point, but states with very high numbers of cases per 100,000 people would then dominate the color scale and wash out the detail for states with lower numbers of cases. Applying a log(x+1) transform (base 10 in the script) compresses that range and improves contrast and readability significantly.

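The effect of the transform can be illustrated on a few made-up values spanning several orders of magnitude. With linear scaling, everything except the largest value is squeezed into the bottom of the color range. With log10(x + 1), each factor of ten becomes one equal step. The "+1" keeps zero-case days at zero and avoids taking the log of 0.

```python
import numpy as np

# Made-up per-100k daily values spanning several orders of magnitude
per_100k = np.array([0.0, 9.0, 99.0, 999.0])

# Linear scaling relative to the maximum: small values nearly vanish
linear = per_100k / per_100k.max()

# log10(x + 1): 0 -> 0, 9 -> 1, 99 -> 2, 999 -> 3
logged = np.log10(per_100k + 1.0)
log_scaled = logged / logged.max()

# Same values via the natural-log helper: log10(1 + x) = ln(1 + x) / ln(10)
same = np.log1p(per_100k) / np.log(10)

for raw, lin, lg in zip(per_100k, linear, log_scaled):
    print(f"{raw:6.0f}  linear={lin:.3f}  log={lg:.3f}")
```

The choice of base only changes the values by a constant factor, so it does not affect the relative contrast in the heat map.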

Finally, Seaborn and Matplotlib are used to generate the heatmap and save it to an image file.

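The final step reshapes the data into a state-by-date grid and draws it. The minimal sketch below uses made-up data and builds the grid with `DataFrame.pivot` instead of the script's merge-and-transpose approach. It applies the same log10(x + 1) scaling and uses matplotlib's non-interactive Agg backend so it runs without a display. The output filename `heatmap-sketch.png` is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # render to files; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up long-format data: one row per state per day, already per 100k
long_df = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02"] * 3,
    "state": ["Ohio", "Ohio", "Texas", "Texas", "Utah", "Utah"],
    "new_per_100k": [0.0, 4.0, 1.0, 9.0, 0.0, 99.0],
})

# Rows = states, columns = dates, cells = log10(x + 1)
matrix = long_df.pivot(index="state", columns="date", values="new_per_100k")
matrix = np.log10(matrix.fillna(0) + 1.0)

fig, ax = plt.subplots(figsize=(8, 3))
sns.heatmap(matrix, cmap="coolwarm", linewidths=0.05, linecolor="lightgrey", ax=ax)
ax.set_xlabel("")
ax.set_ylabel("")
ax.set_title("Daily New Cases Per 100k (log10(x + 1))")
fig.savefig("heatmap-sketch.png", bbox_inches="tight")
plt.close(fig)
```

Pivoting directly yields states as rows, so no transpose is needed before plotting.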

That’s it! Feel free to use this as a framework for your own visualization. You can customize it to zero in on areas of interest.

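One way to zero in on an area of interest is to slice the state-by-date grid before plotting it. This minimal sketch uses a made-up grid. The state list and date window are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Made-up state-by-date grid shaped like the one passed to sns.heatmap
dates = pd.date_range("2020-06-01", periods=5).strftime("%Y-%m-%d")
grid = pd.DataFrame(
    np.arange(15, dtype=float).reshape(3, 5),
    index=["Arizona", "Florida", "Ohio"],
    columns=dates,
)

# Hypothetical region of interest: two states over a three-day window
# (label-based slicing with .loc includes both endpoints)
states_of_interest = ["Arizona", "Florida"]
subset = grid.loc[states_of_interest, "2020-06-02":"2020-06-04"]
print(subset)
```

Passing `subset` to `sns.heatmap` instead of the full grid would draw only that region.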

Full source code is below. Thanks for reading, and I hope you found it useful.

import csv
import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


# Load state populations: column 0 is the state name, column 1 the population
statePopulations = {}
with open('StatePopulations.csv', newline='') as f:
    for row in csv.reader(f):
        statePopulations[row[0]] = row[1:]


fullTable = pd.read_csv("us-states.csv")
fullTable = fullTable.drop(['fips', 'deaths'], axis=1)


# Territories are excluded; the remaining names are sorted alphabetically
excluded = {'Northern Mariana Islands', 'Puerto Rico', 'Virgin Islands', 'Guam'}
states = sorted(s for s in fullTable['state'].unique() if s not in excluded)


# One row per date; each state's series is merged in as its own column
result = pd.DataFrame({'date': fullTable['date'].unique()})

for state in states:
    population = int(statePopulations[state][0])
    print(state + ": " + str(population))

    # Rows for the current state only; .copy() avoids SettingWithCopyWarning
    stateData = fullTable[fullTable['state'] == state].copy()

    # Daily new cases = day-over-day change in the cumulative total.
    # The first day has no previous total (NaN -> 0), and negative values
    # from data corrections are clipped to 0 so log10(x + 1) stays defined.
    newCases = stateData['cases'].diff().fillna(0).clip(lower=0)

    # Scale to new cases per 100,000 residents
    stateData[state] = newCases / population * 100000.0
    stateData = stateData.drop(['state', 'cases'], axis=1)

    result = pd.merge(result, stateData, how='left', on='date')


# Dates on which a state had no report count as zero new cases
result = result.fillna(0)

# log10(x + 1) compresses the wide range of values; zero stays zero
for state in states:
    result[state] = np.log10(result[state] + 1.0)


# Keep only dates from 2020-02-15 onward
result['date'] = pd.to_datetime(result['date'])
result = result[result['date'] >= '2020-02-15'].copy()
result['date'] = result['date'].dt.strftime('%Y-%m-%d')

result = result.set_index('date')
result.to_csv("result.csv")
# Transpose so that states become rows and dates become columns
result = result.transpose()


plt.figure(figsize=(16, 10))
sns.heatmap(result, cmap="coolwarm", linewidths=0.05, linecolor='lightgrey')
plt.xlabel('')
plt.ylabel('')
plt.title("Daily New Covid-19 Cases Per 100k Of Population", fontsize=20)

updateText = ("Updated " + str(datetime.date.today()) +
              ". Scaled with Log(x+1) for improved contrast due to wide range of values."
              " Data source: NY Times Github. Visualization by @JRBowling")
plt.suptitle(updateText, fontsize=8)

# One y tick per state, centered in its row; rotate the date labels
plt.yticks(np.arange(len(states)) + 0.5, states, fontsize=8, rotation=0)
plt.xticks(fontsize=8, rotation=90)
plt.savefig("covidNewCasesper100K.png")