One-Hot Encoding

pingzishinee

Categories: Kaggle, Machine Learning. Tags: One_Hot Encoding, one-hot encoding (独热编码)

Article link: https://blog.csdn.net/u013317445/article/details/84983569

One-hot encoding: the standard approach for categorical features

Categorical feature: a feature whose value comes from a fixed set of categories, for example the color of a flower: yellow, red, or green.

One-hot encoding: a coding scheme that uses as many bits as there are states (category values). Exactly one bit is 1 and all the others are 0.
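To make the definition concrete, here is a minimal sketch in plain Python that encodes the three flower colors by hand. The helper `one_hot` is hypothetical and written only for illustration; it is not a pandas function.

```python
# Hypothetical helper for illustration only; not part of pandas.
def one_hot(value, categories):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


colors = ["yellow", "red", "green"]  # three states, so three bits per value
encoded = {c: one_hot(c, colors) for c in colors}

print(encoded["yellow"])  # [1, 0, 0]
print(encoded["red"])     # [0, 1, 0]
print(encoded["green"])   # [0, 0, 1]
```

Each encoded value has length 3 (one bit per category) and contains exactly one 1.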

Pandas offers a convenient function called get_dummies to get one-hot encodings. Call it like this:

import pandas as pd
one_hot_encoded_data = pd.get_dummies(data)
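As a small runnable sketch of that call, the example below builds a made-up DataFrame (the column names `color` and `petal_length` are assumptions, not from the original data) and encodes it. `dtype=int` is passed explicitly because the default dtype of the dummy columns differs between pandas versions.

```python
import pandas as pd

# Made-up sample data; column names are assumptions for illustration.
data = pd.DataFrame({
    "color": ["yellow", "red", "green", "red"],
    "petal_length": [1.4, 4.7, 5.1, 4.5],
})

# Only the object-dtype column "color" is encoded; numeric columns pass through.
# dtype=int forces 0/1 integers regardless of the pandas version's default.
one_hot_encoded_data = pd.get_dummies(data, dtype=int)

print(list(one_hot_encoded_data.columns))
# ['petal_length', 'color_green', 'color_red', 'color_yellow']
print(one_hot_encoded_data["color_red"].tolist())
# [0, 1, 0, 1]
```

The new column names are built from the original column name, the separator `_`, and the category value.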

help(pd.get_dummies)

Help on function get_dummies in module pandas.core.reshape.reshape:

get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
    Convert categorical variable into dummy/indicator variables
    
    Parameters
    ----------
    data : array-like, Series, or DataFrame
    prefix : string, list of strings, or dict of strings, default None
        String to append DataFrame column names.
        Pass a list with length equal to the number of columns
        when calling get_dummies on a DataFrame. Alternatively, `prefix`
        can be a dictionary mapping column names to prefixes.
    prefix_sep : string, default '_'
        If appending prefix, separator/delimiter to use. Or pass a
        list or dictionary as with `prefix.`
    dummy_na : bool, default False
        Add a column to indicate NaNs, if False NaNs are ignored.
    columns : list-like, default None
        Column names in the DataFrame to be encoded.
        If `columns` is None then all the columns with
        `object` or `category` dtype will be converted.
    sparse : bool, default False
        Whether the dummy columns should be sparse or not.  Returns
        SparseDataFrame if `data` is a Series or if all columns are included.
        Otherwise returns a DataFrame with some SparseBlocks.
    drop_first : bool, default False
        Whether to get k-1 dummies out of k categorical levels by removing the
        first level.
    
        .. versionadded:: 0.18.0
    
    dtype : dtype, default np.uint8
        Data type for new columns. Only a single dtype is allowed.
    
        .. versionadded:: 0.23.0
    
    Returns
    -------
    dummies : DataFrame or SparseDataFrame
    
    Examples
    --------
    >>> import pandas as pd
    >>> s = pd.Series(list('abca'))
    
    >>> pd.get_dummies(s)
       a  b  c
    0  1  0  0
    1  0  1  0
    2  0  0  1
    3  1  0  0
    
    >>> s1 = ['a', 'b', np.nan]
    
    >>> pd.get_dummies(s1)
       a  b
    0  1  0
    1  0  1
    2  0  0
    
    >>> pd.get_dummies(s1, dummy_na=True)
       a  b  NaN
    0  1  0    0
    1  0  1    0
    2  0  0    1
    
    >>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
    ...                    'C': [1, 2, 3]})
    
    >>> pd.get_dummies(df, prefix=['col1', 'col2'])
       C  col1_a  col1_b  col2_a  col2_b  col2_c
    0  1       1       0       0       1       0
    1  2       0       1       1       0       0
    2  3       1       0       0       0       1
    
    >>> pd.get_dummies(pd.Series(list('abcaa')))
       a  b  c
    0  1  0  0
    1  0  1  0
    2  0  0  1
    3  1  0  0
    4  1  0  0
    
    >>> pd.get_dummies(pd.Series(list('abcaa')), drop_first=True)
       b  c
    0  0  0
    1  1  0
    2  0  1
    3  0  0
    4  0  0
    
    >>> pd.get_dummies(pd.Series(list('abc')), dtype=float)
         a    b    c
    0  1.0  0.0  0.0
    1  0.0  1.0  0.0
    2  0.0  0.0  1.0
    
    See Also
    --------
    Series.str.get_dummies

align: give the encoded training and test data the same columns

final_train_predictors, final_test_predictors = one_hot_encoded_training_data_predictors.align(
    one_hot_encoded_test_data_predictors, join='left', axis=1, fill_value=0)
# axis=1: align on columns
# join='left': keep exactly the columns from our training data
# fill_value=0: fill positions that have no value after alignment with 0 (the default is NaN)
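The following sketch shows why the alignment step matters. It uses made-up data in which the test set lacks one training category and contains one that training never saw; the frame and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up data: the test set has no "green" and contains an unseen "blue".
train = pd.DataFrame({"color": ["yellow", "red", "green"]})
test = pd.DataFrame({"color": ["red", "blue"]})

train_encoded = pd.get_dummies(train, dtype=int)  # color_green, color_red, color_yellow
test_encoded = pd.get_dummies(test, dtype=int)    # color_blue, color_red

# join="left" keeps the training columns; columns missing from test are filled with 0,
# and test-only columns such as color_blue are dropped.
final_train, final_test = train_encoded.align(
    test_encoded, join="left", axis=1, fill_value=0
)

print(list(final_test.columns))
# ['color_green', 'color_red', 'color_yellow']
```

After alignment both frames have identical columns in the same order. The "blue" row in the test set becomes all zeros, since its category has no column in the training data.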

#align
help(one_hot_encoded_X.align)

Help on method align in module pandas.core.frame:

align(self, other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None) method of pandas.core.frame.DataFrame instance
    Align two objects on their axes with the
    specified join method for each axis Index
    
    Parameters
    ----------
    other : DataFrame or Series
    join : {'outer', 'inner', 'left', 'right'}, default 'outer'
    axis : allowed axis of the other object, default None
        Align on index (0), columns (1), or both (None)
    level : int or level name, default None
        Broadcast across a level, matching Index values on the
        passed MultiIndex level
    copy : boolean, default True
        Always returns new objects. If copy=False and no reindexing is
        required then original objects are returned.
    fill_value : scalar, default np.NaN
        Value to use for missing values. Defaults to NaN, but can be any
        "compatible" value
    method : str, default None
    limit : int, default None
    fill_axis : {0 or 'index', 1 or 'columns'}
