COVID-19 Analysis with Python


张_伟_杰


Original article: https://medium.com/swlh/covid-19-analysis-with-python-b898181ea627


Python is a powerful, general-purpose programming language that is easy to learn and offers data scientists a wide variety of tools and packages. Amid the pandemic, I decided to analyze data on the novel coronavirus.

In this article, I walk you through the steps of this analysis, with visuals and code snippets.

Steps involved in Data Analysis:

1. Importing required packages

2. Gathering Data

3. Transforming Data to suit our needs (Data Wrangling)

4. Exploratory Data Analysis (EDA) and Visualization

Step 1: Importing Required Packages

Importing the required packages is the starting point of any data analysis in Python. As I said, Python offers data scientists a wide variety of packages. For data wrangling and EDA in this analysis, I used Pandas and NumPy, the most popular data science packages in Python. For data visualization, I used Plotly for interactive charts and Matplotlib for static plots. Importing packages in Python is very simple:

These are the primary packages for the analysis. We will still need a few more, starting in Step 2. Yay! We have finished our first step.

Step 2: Gathering Data

The most important ingredient of a clean, reliable data analysis is quality data. For this analysis, I collected data from several sources for better accuracy.

Our primary dataset is extracted from Esri, whose ArcGIS platform provides regularly updated coronavirus data, using a query URL. The following code snippet extracts the data from Esri:
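The original extraction snippet is missing from this copy. Below is a minimal sketch. It assumes the query returns ArcGIS-style JSON, in which each record's fields sit under features[i]["attributes"]. QUERY_URL is a hypothetical placeholder, because the article's exact URL was not preserved. The helper names fetch_esri_json and features_to_frame are also hypothetical.

```python
import pandas as pd
import requests

# Hypothetical placeholder: substitute the actual Esri/ArcGIS feature-service query URL.
QUERY_URL = "https://<your-arcgis-feature-service>/query?where=1%3D1&outFields=*&f=json"


def fetch_esri_json(url: str, timeout: float = 30.0) -> dict:
    """Send an HTTP GET to the query URL and decode the JSON response body."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.json()


def features_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten an ArcGIS-style payload: one row per feature, built from its 'attributes' dict."""
    rows = [feature["attributes"] for feature in payload.get("features", [])]
    return pd.DataFrame(rows)


# Usage (needs network access and a real URL):
# data = features_to_frame(fetch_esri_json(QUERY_URL))
```

The download and the flattening are kept in separate functions so that the flattening step can be checked on a small sample payload without touching the network.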

Requests is a Python package for sending HTTP requests. In this code, I used it to fetch the JSON data returned by Esri's query URL. We are now ready to do some data wrangling! (Note: we will import more data in Step 4 of the analysis.)

Step 3: Data Wrangling

Data wrangling is the process of cleaning and transforming data into the shape an analysis needs. The raw extracted data is not ready for analysis, so we first have to transform it. Here's the code for our data wrangling:
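The wrangling code is also missing from this copy. The sketch below rests on several assumptions:

- The raw columns are named Province_State, Country_Region, Last_Update, Confirmed, Deaths, Recovered and Active. These names are hypothetical; adjust them to what your query returns.
- Last_Update holds epoch milliseconds, the usual encoding for ArcGIS date fields.
- The names RENAME, epoch_ms_to_datetime and wrangle are hypothetical helpers.

The code renames the columns so that later code can group by 'Country'. It also fills missing values and converts the timestamps with the datetime module:

```python
from datetime import datetime, timezone

import pandas as pd

# Hypothetical raw column names -> friendlier names used in the rest of the analysis.
RENAME = {
    "Province_State": "State",
    "Country_Region": "Country",
    "Last_Update": "Last Update",
    "Confirmed": "Confirmed",
    "Deaths": "Deaths",
    "Recovered": "Recovered",
    "Active": "Active",
}
COUNTS = ["Confirmed", "Deaths", "Recovered", "Active"]


def epoch_ms_to_datetime(ms):
    """Convert an epoch timestamp in milliseconds to a UTC datetime (None if missing)."""
    if pd.isna(ms):
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def wrangle(raw: pd.DataFrame) -> pd.DataFrame:
    # Keep only the columns we need, in a fixed order, under their new names.
    data = raw[list(RENAME)].rename(columns=RENAME)
    # Countries reported without provinces have no State; use "" instead of NaN/None.
    data["State"] = data["State"].fillna("")
    # Treat missing case counts as 0 and store them as integers.
    data[COUNTS] = data[COUNTS].fillna(0).astype(int)
    # Turn epoch-millisecond timestamps into timezone-aware datetimes.
    data["Last Update"] = data["Last Update"].apply(epoch_ms_to_datetime)
    return data
```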

Note that we have imported a new Python module, 'datetime', which helps us work with dates and times in the dataset. Now, get ready to see the big picture of our analysis: EDA and data visualization.

Step 4: Exploratory Data Analysis and Data Visualization

This is the longest step, as it is the heart and soul of data analysis, so I've divided it into three parts:

a. Ranking countries and provinces (by confirmed, death, recovered, and active cases)

b. Time series of COVID-19 cases

c. Classification and distribution of cases

Ranking countries and provinces

Using the data extracted earlier, we will rank countries and provinces by confirmed, death, recovered, and active cases through EDA and visualization. Follow the code snippets for the visuals below. (Note: every visualization is interactive; hover over it to see its data points.)
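Each ranking comes down to a group-by, a sum and nlargest. The pandas-only sketch below produces one ranking per metric, for either countries or provinces. It assumes the wrangled column names State, Country, Confirmed, Deaths, Recovered and Active. The helper names top_n and rank_all are hypothetical.

```python
import pandas as pd

METRICS = ["Confirmed", "Deaths", "Recovered", "Active"]


def top_n(data: pd.DataFrame, by: str, metric: str, n: int = 10) -> pd.Series:
    """Total `metric` per value of column `by`, largest first (nlargest sorts descending)."""
    frame = data
    if by == "State":
        # Rows with no province name would otherwise collapse into a single "" group.
        frame = data[data["State"] != ""]
    return frame.groupby(by)[metric].sum().nlargest(n)


def rank_all(data: pd.DataFrame, by: str = "Country", n: int = 10) -> dict:
    """One top-n ranking per metric, keyed by metric name."""
    return {metric: top_n(data, by, metric, n) for metric in METRICS}
```

Each returned Series can then be fed to a plotting call, as the bubble plot below does for confirmed cases.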

Part 1: Ranking the Most Affected Countries

i) Top 10 Confirmed Cases Countries:

The following code produces a bubble plot ranking the top 10 countries by confirmed cases.

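# a. Top 10 confirmed countries (Bubble plot)
import plotly.express as px

# 'data' is the wrangled DataFrame from Step 3. Sum confirmed cases per country
# (countries reported by province are aggregated) and keep the 10 largest;
# nlargest already returns them in descending order, so no extra sort is needed.
top10_confirmed = data.groupby('Country')['Confirmed'].sum().nlargest(10).to_frame()
fig1 = px.scatter(top10_confirmed, x=top10_confirmed.index, y='Confirmed', size='Confirmed', size_max=120,
                  color=top10_confirmed.index, title='Top 10 Confirmed Cases Countries')
fig1.show()  # interactive figure: hover over a bubble to see its value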