How to Boost Your Data Analysis Skills with Python

If you're learning Python, you've likely heard of scikit-learn, NumPy, and Pandas. These are all important libraries to learn, but there is more to them than you might initially realize.

There are numerous tips and tricks in the Python world that can help you speed up data science tasks, improve your code, and write it more efficiently.

So I decided to compile some of the most valuable data analysis tips in this article for you.

Profile dataframes in Pandas

Profiling gives you a clear understanding of your data, and that is what the Python package Pandas Profiling does. It is a straightforward and fast way to run exploratory analysis on Pandas dataframes.

Exploratory data analysis usually starts with the Pandas functions df.info() and df.describe(). But these only give you a basic overview of the data, which might not be very helpful if you're dealing with a large data set.
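
As a minimal sketch of those two first steps, the snippet below builds a small dataframe and calls both functions. The column names and values are made up for illustration and are not from the original article.

```python
import io

import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "price": [10.0, 20.0, 30.0],
        "city": ["Oslo", "Lima", None],
    }
)

# df.info() prints to stdout by default; capture it in a buffer instead.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

# df.describe() summarizes the numeric columns (count, mean, std, quartiles).
summary = df.describe()

print(info_text)
print(summary)
```

Even on this tiny example, the output is limited to column types, non-null counts, and basic statistics for the numeric column.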

Pandas Profiling extends the Pandas dataframe with df.profile_report(), which helps you analyze data quickly. With just one line of code, it generates an interactive HTML report that displays plenty of information.

For a set of data, Pandas profiling computes these statistics:

Make Pandas plots more interactive

The built-in plot() function is a method of the Pandas DataFrame class. However, the visualizations it produces are not very interactive, and so do not appeal much to a data science audience.

On the other hand, it is easy to plot a chart with the pandas.DataFrame.plot() function. The question then is, how do we plot interactive charts like Plotly's using Pandas, without making significant changes to the code?

You can do this with the Cufflinks library, which binds Plotly’s power with Pandas's flexibility for plotting quickly.

You can see the result in the images below.

Both visualizations show the same things. The first visualization is a static chart, while the second one is a more interactive chart (and it also provides more details than the first one). Yet, we got this without making any significant changes to the syntax.

Magic commands

The term "magic commands" refers to a set of convenience functions in Jupyter Notebooks. They were created to solve many of the common problems that come up in standard data analysis.

There are two kinds of magic commands. The first are line magics, which are prefixed with a single % character and operate on one line of input.

The second kind are cell magics, denoted by a double %% prefix. They work on more than one line of input. If the automagic setting is set to 1, you can call line magics without typing the initial %.

Some of these commands might come in handy when you're doing everyday tasks in data analysis. Some of them are:

%pastebin

This function uploads code to Pastebin and returns the URL. Pastebin is an online content hosting service where you can store plain text (such as source code snippets) and then share the URL with other people.

In fact, a GitHub gist is very similar to Pastebin, but it has version control.

%matplotlib notebook

The %matplotlib inline magic renders static Matplotlib plots within Jupyter notebooks. If you replace inline with notebook, you quickly get resizable and zoomable plots instead.

But make sure you call the function before you start to import the Matplotlib library.

%run

You can use this function to run a Python script in a notebook.
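
Outside a notebook, the standard library's runpy module gives a rough approximation of the same idea: it executes a script file and hands back the variables the script defined. The sketch below is an illustration under that assumption, not how the magic is implemented. The script contents are hypothetical, and the file name hello.py is borrowed from later in this article.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical script contents, written to a temporary directory for the demo.
script_source = "values = [1, 2, 3]\ntotal = sum(values)\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "hello.py"
    script_path.write_text(script_source)

    # In a notebook cell you would write:  %run hello.py
    # run_path executes the file as __main__ and returns its globals.
    namespace = runpy.run_path(str(script_path), run_name="__main__")

# The script's variables are available after it finishes.
print(namespace["total"])
```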

%%writefile

This function writes the cell content to a file. For example, a cell that starts with %%writefile foo.py saves the rest of its content to a file named foo.py in the current directory.
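
The sketch below shows what such a cell might look like (in the comments) together with a plain-Python equivalent that can run outside a notebook. The greet function is a made-up example, and a temporary directory stands in for the notebook's current directory.

```python
import tempfile
from pathlib import Path

# In a notebook, the cell would look like this:
#
#   %%writefile foo.py
#   def greet(name):
#       return f"Hello, {name}!"
#
# The plain-Python equivalent writes the same text to foo.py.
cell_body = 'def greet(name):\n    return f"Hello, {name}!"\n'

# A temporary directory stands in for the notebook's current directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "foo.py"
    target.write_text(cell_body)
    saved_text = target.read_text()

print(saved_text)
```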

%%latex

This function makes the cell content appear as LaTeX. It comes in handy when writing mathematical equations and formulae in a cell.
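
As a small illustration, a cell like the following should render as a typeset equation. The formula (a sample mean) is chosen here only as an example and does not come from the original article.

```latex
%%latex
\begin{equation}
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\end{equation}
```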

Find and remove errors

The interactive debugger is also a magic function, but in this article it gets a section of its own.

If you run a code cell and get an exception, type %debug in a new cell and run it. This opens an interactive debugging environment that takes you back to the point where the exception happened.

You can also check the values of the variables assigned in the program and perform operations there. When you want to exit the debugger, press q.
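
To give a feel for what the debugger shows you, here is a non-interactive sketch that uses only the standard library. It follows the traceback to the frame where the exception was raised and reads that frame's local variables, which is the same information you would inspect by hand at the debugger prompt. The average function is a hypothetical example.

```python
import sys


def average(values):
    total = sum(values)
    count = len(values)
    return total / count  # raises ZeroDivisionError for an empty list


try:
    average([])
except ZeroDivisionError:
    # Walk the traceback to its last entry: the frame where the error was raised.
    tb = sys.exc_info()[2]
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = dict(tb.tb_frame.f_locals)

# Name of the function that failed and the variables it held at that moment.
print(failing_function)
print(failing_locals)
```

In an interactive session you would not write this yourself. The debugger drops you into the failing frame and lets you print the variables directly.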

Use the -i option when running Python scripts

The typical way to run a Python script from the command line is python hello.py. But if you add the -i option and run the same script (python -i hello.py), you get more benefits. How?

First of all, when the program ends, Python does not close the interpreter. This means that we can check the values of the different variables and the correctness of the functions defined in the program.

Second, since the interpreter is still available, it is easy to invoke the Python debugger with:

  • import pdb
  • pdb.pm()

From here, we can quickly get to the point where the exception happened and then work on the code.
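
The first benefit, that variables stay available after the script ends, can be sketched without a terminal by launching the interpreter as a subprocess and piping in the command you would otherwise type at the prompt. The script contents below are a hypothetical stand-in for hello.py.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for hello.py.
script_source = "values = [1, 2, 3]\ntotal = sum(values)\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "hello.py"
    script_path.write_text(script_source)

    # Equivalent to running `python -i hello.py` and then typing
    # `print(total)` at the prompt; the piped input replaces the keyboard.
    result = subprocess.run(
        [sys.executable, "-i", str(script_path)],
        input="print(total)\n",
        capture_output=True,
        text=True,
        timeout=30,
    )

# The variable defined by the script was still available after it finished.
print(result.stdout)
```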

Delete and restore

So what do you do when you mistakenly delete one cell within your Jupyter Notebook? Luckily there is a shortcut for you to undo that action.

If you deleted content inside a cell, you can recover it by pressing CTRL/CMD+Z.

If you have deleted an entire cell that you want to recover, press ESC+Z, or EDIT > Undo Delete Cells.

Conclusion

This article shared some tips to boost your data analysis skills with Python. These hacks should come in handy for you at some point in your Python data analysis journey.

Translated from: https://www.freecodecamp.org/news/how-to-boost-your-data-analysis-skills-with-python/
