Using Excel with pandas

Excel is one of the most popular and widely used data tools; it’s hard to find an organization that doesn’t work with it in some way. From analysts to sales VPs to CEOs, professionals of all kinds use Excel for both quick stats and serious data crunching.

With Excel being so pervasive, data professionals must be familiar with it. You’ll also want a tool that can easily read and write Excel files, and pandas is well suited to this.

Pandas has excellent methods for reading all kinds of data from Excel files. You can also export your results from pandas back to Excel, if that’s preferred by your intended audience. Pandas is great for other routine data analysis tasks, such as:

  • quick Exploratory Data Analysis (EDA)
  • drawing attractive plots
  • feeding data into machine learning tools like scikit-learn
  • building machine learning models on your data
  • taking cleaned and processed data to any number of data tools

Pandas is better at automating data processing tasks than Excel, including processing Excel files.

In this tutorial, we are going to show you how to work with Excel files in pandas. We will cover the following concepts.

  • setting up your computer with the necessary software
  • reading in data from Excel files into pandas
  • data exploration in pandas
  • visualizing data in pandas using the matplotlib visualization library
  • manipulating and reshaping data in pandas
  • moving data from pandas into Excel

Note that this tutorial does not provide a deep dive into pandas. To explore pandas more, check out our course.

System prerequisites

We will use Python 3 and Jupyter Notebook to demonstrate the code in this tutorial. In addition to Python and Jupyter Notebook, you will need the following Python modules:

  • matplotlib – data visualization
  • NumPy – numerical data functionality
  • OpenPyXL – read/write Excel 2010 xlsx/xlsm files
  • pandas – data import, clean-up, exploration, and analysis
  • xlrd – read Excel data (recent versions support only legacy .xls files)
  • xlwt – write to legacy Excel (.xls) files
  • XlsxWriter – write to Excel (xlsx) files

There are multiple ways to get set up with all the modules. We cover three of the most common scenarios below.

  • If you have Python installed via Anaconda package manager, you can install the required modules using the command conda install. For example, to install pandas, you would execute the command – conda install pandas.

  • If you already have a regular, non-Anaconda Python installed on the computer, you can install the required modules using pip. Open your command line program and execute command pip install <module name> to install a module. You should replace <module name> with the actual name of the module you are trying to install. For example, to install pandas, you would execute command – pip install pandas.

  • If you don’t have Python already installed, you should get it through the Anaconda package manager. Anaconda provides installers for Windows, Mac, and Linux Computers. If you choose the full installer, you will get all the modules you need, along with Python and pandas within a single package. This is the easiest and fastest way to get started.
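Whichever route you take, it can help to confirm that the modules are actually visible to your Python interpreter. The following is a minimal sketch using only the standard library; the helper name check_modules and the list REQUIRED_MODULES are made up for illustration, and the entries are the import names of the modules listed above (which are lowercase even where the project name is not).

```python
import importlib.util

# Import names for the modules listed in this tutorial.
# REQUIRED_MODULES and check_modules are hypothetical names used for illustration.
REQUIRED_MODULES = [
    "matplotlib", "numpy", "openpyxl", "pandas", "xlrd", "xlwt", "xlsxwriter",
]


def check_modules(names):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(REQUIRED_MODULES)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any module reported as missing can then be installed with conda install or pip install as described above.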

The data set

In this tutorial, we will use a multi-sheet Excel file we created from Kaggle’s IMDB Scores data. You can download the file here.

Our Excel file has three sheets: ‘1900s,’ ‘2000s,’ and ‘2010s.’ Each sheet has data for movies from those years.

We will use this data set to find the ratings distribution for the movies, visualize the movies with the highest ratings and net earnings, and calculate statistical information about the movies. We will analyze and explore this data using pandas, demonstrating pandas’ capabilities for working with Excel data.

Read data from the Excel file

We need to first import the data from the Excel file into pandas. To do that, we start by importing the pandas module.

import pandas as pd

We then use pandas’ read_excel method to read in data from the Excel file. The easiest way to call this method is to pass the file name. If no sheet name is specified, it reads the first sheet in the workbook.
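A minimal sketch of this call follows. The tutorial's workbook is not reproduced here, so the snippet first writes a tiny stand-in file; the file name movies_demo.xlsx, the made-up sample rows, and the choice of the openpyxl engine are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the tutorial's workbook: a tiny two-sheet file with made-up rows.
excel_file = Path(tempfile.mkdtemp()) / "movies_demo.xlsx"  # hypothetical file name
with pd.ExcelWriter(excel_file, engine="openpyxl") as writer:
    pd.DataFrame({"Title": ["Film A", "Film B"], "Year": [1927, 1929]}).to_excel(
        writer, sheet_name="1900s", index=False
    )
    pd.DataFrame({"Title": ["Film C"], "Year": [2004]}).to_excel(
        writer, sheet_name="2000s", index=False
    )

# Passing only the file name reads the first sheet of the workbook.
movies = pd.read_excel(excel_file)
print(movies)
```

With your own data, you would skip the writing step and pass the path of your Excel file to read_excel directly.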

Here, the read_excel method reads the data from the Excel file into a pandas DataFrame object, the structure pandas uses by default for tabular data. We store this DataFrame in a variable called movies.

Pandas has a built-in DataFrame.head() method that we can use to easily display the first few rows of our DataFrame. If no argument is passed, it displays the first five rows. If a number is passed, it displays that many rows from the top.

movies.head()

   Title                                             Year  Genres               Language  Country  Content Rating
0  Intolerance: Love’s Struggle Throughout the Ages  1916  Drama|History|War    NaN       USA      Not Rated
1  Over the Hill to the Poorhouse                    1920  Crime|Drama          NaN       USA      NaN
2  The Big Parade                                    1925  Drama|Romance|War    NaN       USA      Not Rated
3  Metropolis                                        1927  Drama|Sci-Fi         German    Germany  Not Rated
4  Pandora’s Box                                     1929  Crime|Drama|Romance  German    Germany  Not Rated

   Duration  Aspect Ratio  Budget     Gross Earnings  Facebook Likes – Actor 1  Facebook Likes – Actor 2  Facebook Likes – Actor 3
0  123       1.33          385907.0   NaN             436                       22                        9.0
1  110       1.33          100000.0   3000000.0       2                         2                         0.0
2  151       1.33          245000.0   NaN             81                        12                        6.0
3  145       1.33          6000000.0  26435.0         136                       23                        18.0
4  110       1.33          NaN        9950.0          426                       20                        3.0

   Facebook Likes – cast Total  Facebook likes – Movie  Facenumber in posters  User Votes  Reviews by Users  Reviews by Crtiics  IMDB Score
0  481                          691                     1                      10718       88                69.0                8.0
1  4                            0                       1                      5           1                 1.0                 4.8
2  108                          226                     0                      4849        45                48.0                8.3
3  203                          12000                   1                      111841      413               260.0               8.3
4  455                          926                     1                      7431        84                71.0                8.0

5 rows × 25 columns
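To see the effect of passing a number to head(), here is a small sketch. It uses a made-up eight-row DataFrame built in memory rather than the movie data, so the column values are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in rows; in the tutorial the DataFrame comes from the Excel file.
movies = pd.DataFrame(
    {
        "Title": [f"Film {i}" for i in range(8)],
        "Year": list(range(1920, 1928)),
    }
)

first_five = movies.head()    # no argument: the first five rows
first_three = movies.head(3)  # a number: that many rows from the top

print(first_five)
print(first_three)
```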

Excel files often have multiple sheets, so the ability to read a specific sheet, or all of them, is important. To make this easy, the pandas read_excel method takes an argument called sheet_name (spelled sheetname in older pandas releases) that tells pandas which sheet to read the data from. You can pass either the sheet name or the sheet number; sheet numbers start at zero. If the sheet_name argument is not given, it defaults to zero and pandas imports the first sheet.
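The sketch below shows sheet selection by name and by number, plus reading every sheet at once. It uses the sheet_name spelling that current pandas accepts and, since the tutorial's workbook is not reproduced here, writes a small stand-in file first; the file name, the sample rows, and the openpyxl engine are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a small stand-in workbook with the three sheet names from the tutorial.
# The rows are made up for illustration.
workbook = Path(tempfile.mkdtemp()) / "movies_demo.xlsx"  # hypothetical file name
demo_sheets = {
    "1900s": pd.DataFrame({"Title": ["Film A"], "Year": [1927]}),
    "2000s": pd.DataFrame({"Title": ["Film B"], "Year": [2004]}),
    "2010s": pd.DataFrame({"Title": ["Film C"], "Year": [2012]}),
}
with pd.ExcelWriter(workbook, engine="openpyxl") as writer:
    for name, frame in demo_sheets.items():
        frame.to_excel(writer, sheet_name=name, index=False)

by_name = pd.read_excel(workbook, sheet_name="2000s")  # select by sheet name
by_number = pd.read_excel(workbook, sheet_name=2)      # select by position, counting from zero
all_sheets = pd.read_excel(workbook, sheet_name=None)  # dict of all sheets, keyed by sheet name

print(by_name)
print(by_number)
print(list(all_sheets))
```

Passing sheet_name=None is convenient when you want to loop over every sheet, for example to concatenate them into a single DataFrame.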

By default, pandas will automatically assign a numeric index or row label starting with zero. You may want to leave the default index as such if your data doesn’t have a column with unique values that can serve as a better index.
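As a small illustration of the default index, the sketch below reads a stand-in file twice: once as-is, and once using read_excel's index_col parameter to use the first column as the index. The index_col step is an addition here rather than something the text above describes, and the file name, sample rows, and openpyxl engine are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a tiny stand-in file with made-up rows.
workbook = Path(tempfile.mkdtemp()) / "index_demo.xlsx"  # hypothetical file name
pd.DataFrame({"Title": ["Film A", "Film B"], "Year": [1927, 1929]}).to_excel(
    workbook, index=False, engine="openpyxl"
)

default_index = pd.read_excel(workbook)             # numeric index starting at zero
title_index = pd.read_excel(workbook, index_col=0)  # first column becomes the index

print(default_index.index.tolist())
print(title_index.index.tolist())
```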
