Python Pandas: Tricks & Features You May Not Know

Pandas is a foundational library for analytics, data processing, and data science. It’s a huge project with tons of optionality and depth.

This tutorial will cover some lesser-used but idiomatic Pandas capabilities that lend your code better readability, versatility, and speed, à la the Buzzfeed listicle.

If you feel comfortable with the core concepts of Python’s Pandas library, hopefully you’ll find a trick or two in this article that you haven’t stumbled across previously. (If you’re just starting out with the library, 10 Minutes to Pandas is a good place to start.)

Note: The examples in this article are tested with Pandas version 0.23.2 and Python 3.6.6. However, they should also be valid in older versions.

1. Configure Options & Settings at Interpreter Startup

You may have run across Pandas’ rich options and settings system before.

It’s a huge productivity saver to set customized Pandas options at interpreter startup, especially if you work in a scripting environment. You can use pd.set_option() to configure to your heart’s content with a Python or IPython startup file.

The options use a dot notation such as pd.set_option('display.max_colwidth', 25), which lends itself well to a nested dictionary of options:

import pandas as pd

def start():
    options = {
        'display': {
            'max_columns': None,
            'max_colwidth': 25,
            'expand_frame_repr': False,  # Don't wrap to multiple pages
            'max_rows': 14,
            'max_seq_items': 50,         # Max length of printed sequence
            'precision': 4,
            'show_dimensions': False
        },
        'mode': {
            'chained_assignment': None   # Controls SettingWithCopyWarning
        }
    }

    for category, option in options.items():
        for op, value in option.items():
            pd.set_option(f'{category}.{op}', value)  # Python 3.6+

if __name__ == '__main__':
    start()
    del start  # Clean up namespace in the interpreter

If you launch an interpreter session, you’ll see that everything in the startup script has been executed, and Pandas is imported for you automatically with your suite of options:
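One way to check this for yourself is to read the options back with pd.get_option(). The snippet below is a minimal sketch rather than a real startup file: it sets two of the options directly so that it runs on its own, whereas in practice they would already be in place, for example because Python ran the file named by the PYTHONSTARTUP environment variable.

```python
import pandas as pd

# Stand-in for the startup file: set two of the display options directly.
pd.set_option('display.max_rows', 14)
pd.set_option('display.precision', 4)

# Pandas is available under its usual alias, and the options read back as set.
print(pd.__name__)
print(pd.get_option('display.max_rows'))
print(pd.get_option('display.precision'))
```

The same dotted names are used for reading and writing, so any key from the nested dictionary can be checked this way.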

Let’s use some data on abalone hosted by the UCI Machine Learning Repository to demonstrate the formatting that was set in the startup file. The data will truncate at 14 rows with 4 digits of precision for floats:

>>> url = ('https://archive.ics.uci.edu/ml/'
...        'machine-learning-databases/abalone/abalone.data')
>>> cols = ['sex', 'length', 'diam', 'height', 'weight', 'rings']
>>> abalone = pd.read_csv(url, usecols=[0, 1, 2, 3, 4, 8], names=cols)

>>> abalone
     sex  length   diam  height  weight  rings
0      M   0.455  0.365   0.095  0.5140     15
1      M   0.350  0.265   0.090  0.2255      7
2      F   0.530  0.420   0.135  0.6770      9
3      M   0.440  0.365   0.125  0.5160     10
4      I   0.330  0.255   0.080  0.2050      7
5      I   0.425  0.300   0.095  0.3515      8
6      F   0.530  0.415   0.150  0.7775     20
...   ..     ...    ...     ...     ...    ...
4170   M   0.550  0.430   0.130  0.8395     10
4171   M   0.560  0.430   0.155  0.8675      8
4172   F   0.565  0.450   0.165  0.8870     11
4173   M   0.590  0.440   0.135  0.9660     10
4174   M   0.600  0.475   0.205  1.1760      9
4175   F   0.625  0.485   0.150  1.0945     10
4176   M   0.710  0.555   0.195  1.9485     12

You’ll see this dataset pop up in other examples later as well.

2. Make Toy Data Structures With Pandas’ Testing Module

Hidden way down in Pandas’ testing module (pandas.util.testing in version 0.23, conventionally imported as tm) are a number of convenient functions for quickly building quasi-realistic Series and DataFrames.

There are around 30 of these, and you can see the full list by calling dir() on the module object. Here are a few:

>>> import pandas.util.testing as tm  # Module path as of Pandas 0.23
>>> [i for i in dir(tm) if i.startswith('make')]
['makeBoolIndex',
 'makeCategoricalIndex',
 'makeCustomDataframe',
 'makeCustomIndex',
 # ...,
 'makeTimeSeries',
 'makeTimedeltaIndex',
 'makeUIntIndex',
 'makeUnicodeIndex']

These can be useful for benchmarking, testing assertions, and experimenting with Pandas methods that you are less familiar with.
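The make* helpers live in a semi-private module, and newer Pandas releases have deprecated and later removed them, so they may not exist in your installation. As a rough stand-in, here is a minimal sketch of a hand-rolled equivalent built only on public APIs. The function name make_toy_time_frame, its defaults, and the column labels are hypothetical choices for illustration and are not part of Pandas.

```python
import numpy as np
import pandas as pd

def make_toy_time_frame(n_rows=15, n_cols=3, freq='D', seed=444):
    """Build a small DataFrame of random floats on a DatetimeIndex.

    Hypothetical helper: a stand-in for the testing module's toy builders,
    not a reimplementation of any specific Pandas function.
    """
    rng = np.random.RandomState(seed)            # Seeded for reproducibility
    index = pd.date_range('2000-01-01', periods=n_rows, freq=freq)
    columns = list('ABCDEFGHIJ'[:n_cols])        # Simple letter labels
    data = rng.randn(n_rows, n_cols)
    return pd.DataFrame(data, index=index, columns=columns)

toy = make_toy_time_frame()
print(toy.head())
```

Seeding the generator keeps the toy data identical between runs, which helps when you use it in benchmarks or test assertions.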

3. Take Advantage of Accessor Methods

Perhaps you’ve heard of the term accessor, which is somewhat like a getter (although getters and setters are used infrequently in Python). For our purposes here, you can think of a Pandas accessor as a property that serves as an interface to additional methods.

Pandas Series have three of them:
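The three accessors are .cat, .str, and .dt. As a quick illustration of the "property that exposes more methods" idea, the sketch below calls one attribute through each of them. The tiny Series and their variable names are made up for this example.

```python
import pandas as pd

# One small Series per accessor; the values are arbitrary sample data.
s_str = pd.Series(['a', 'b'])
s_dt = pd.Series(pd.date_range('2018-01-01', periods=2, freq='D'))
s_cat = pd.Series(['x', 'y', 'x'], dtype='category')

upper = s_str.str.upper()            # String methods live under .str
years = s_dt.dt.year                 # Datetime components live under .dt
categories = s_cat.cat.categories    # Category metadata lives under .cat

print(upper.tolist())
print(years.tolist())
print(categories.tolist())
```

Each accessor only makes sense for a matching dtype, which is why the example uses a separate Series for each one.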

Yes, that definition above is a mouthful, so let’s take a look at a few examples before discussing the internals.

.cat is for categorical data, .str is for string (object) data, and .dt is for datetime-like data. Let’s start off with .str: imagine that you have some raw city/state/ZIP data as a single field within a Pandas Series.
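Here is a minimal sketch of what working with such a field through .str might look like. The sample addresses and the regular expression are illustrative assumptions, not data from a real source, and the pattern only covers the simple "City, ST 12345" shape used here.

```python
import pandas as pd

# Hypothetical raw city/state/ZIP strings held in a single Series
addr = pd.Series([
    'Washington, D.C. 20003',
    'Brooklyn, NY 11211-1755',
    'Omaha, NE 68154',
    'Pittsburgh, PA 15211',
])

# Vectorized string methods are reached through the .str accessor
digits = addr.str.count(r'\d')       # Number of digits in each entry

# Named groups become column names in the extracted DataFrame
regex = (r'(?P<city>[A-Za-z ]+), '      # One or more letters or spaces
         r'(?P<state>[A-Z]{2}) '        # Two capital letters
         r'(?P<zip>\d{5}(?:-\d{4})?)')  # Optional 4-digit extension

# Drop the periods in "D.C." first so the state group can match
parts = addr.str.replace('.', '', regex=False).str.extract(regex)
print(parts)
```

Chaining .str calls like this keeps the whole cleanup in one expression, with no explicit Python loop over the rows.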
