Learning pandas in Depth (1): 10 minutes to pandas

  • pandas

    pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

  • 10 minutes to pandas

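import numpy as np
import pandas as pd

# Build a DataFrame from a dict; scalar values are broadcast to the length of the longest column
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})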

# filter: keep the rows whose 'E' value is in the given list
df2[df2['E'].isin(['test'])]


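# ts is not defined in these notes; assumed here to be a Series with 1000 daily timestamps
ts = pd.Series(np.random.randn(1000), index=pd.date_range('2000-01-01', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                   columns=['A', 'B', 'C', 'D'])
# plot the cumulative sum of each column (requires matplotlib)
df.cumsum().plot()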

  • Intro to data structures

    A basic tenet to keep in mind: data alignment is intrinsic.

    The link between labels and data will not be broken unless done so explicitly by you.
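
    To make the alignment rule concrete, here is a minimal sketch (the Series names and values are made up for illustration). Adding two Series matches values by label rather than by position, and labels present in only one operand yield NaN:

```python
import pandas as pd

# Two Series that share only some labels, listed in different orders
a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0, 30.0], index=['z', 'y', 'w'])

# Values are paired by label: 'y' with 'y', 'z' with 'z'.
# 'x' and 'w' each exist in only one operand, so their results are NaN.
total = a + b
print(total)
```

    The result covers the union of both indexes, so no label is silently dropped.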

  • Series

    Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

    The axis labels are collectively referred to as the index.

    The basic method to create a Series is to call:

    >>> s = pd.Series(data, index=index)
    

    data can be many different things:

    1. a Python dict
    2. an ndarray
    3. a scalar value

    The passed index is a list of axis labels. The behavior therefore depends on what data is:

    # 1. From ndarray
    # If data is an ndarray, index must be the same length as data.
    # If no index is passed, one will be created with values [0, ..., len(data) - 1].
    s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    
    # 2. From dict
    # With Python >= 3.6 and pandas >= 0.23, the Series index follows the dict's insertion order;
    # with older versions, the index is the lexically ordered list of dict keys.
    d = {'b': 1, 'a': 0, 'c': 2}
    s = pd.Series(d)
    # If an index is passed, the values in data corresponding to the labels in the index will be pulled out;
    # labels that are not keys of the dict (here 'd') get NaN.
    s = pd.Series(d, index=['b', 'c', 'd', 'a'])
    
    # 3. From scalar value
    # If data is a scalar value, an index must be provided. The value will be repeated to match the length of index
    s = pd.Series(5.0, index=['b', 'c', 'd', 'a'])
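
    As a quick check of the three cases above, the following sketch builds one Series per case and inspects the result. The variable names are chosen here for illustration only:

```python
import numpy as np
import pandas as pd

# 1. ndarray without an index: labels default to 0 .. len(data) - 1
from_array = pd.Series(np.arange(3))
print(list(from_array.index))

# 2. dict with an explicit index: 'd' is not a key of the dict, so it maps to NaN
from_dict = pd.Series({'b': 1, 'a': 0, 'c': 2}, index=['b', 'c', 'd', 'a'])
print(from_dict)

# 3. scalar: the value is repeated once per label
from_scalar = pd.Series(5.0, index=['b', 'c', 'd', 'a'])
print(from_scalar)
```

    Note that the dict case ends up with a floating-point dtype, because NaN cannot be stored in an integer column.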
    

    pandas supports non-unique index values. An exception is raised only when an operation that does not support duplicate index values is actually attempted.
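
    A small sketch of this lazy behavior, using made-up data: creating and selecting from a Series with duplicate labels works, while reindex, one operation that needs unique labels, raises a ValueError when called.

```python
import pandas as pd

# Duplicate labels are accepted at construction time
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
print(s.index.is_unique)

# Selecting a duplicated label returns every matching row
dup = s['a']
print(dup)

# reindex requires unique labels, so the error appears only here
raised = False
try:
    s.reindex(['a', 'b', 'c'])
except ValueError as exc:
    raised = True
    print("reindex failed:", exc)
```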
