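Feature Engineering: An Introduction to Featuretools, a Tool for Automated Feature Generation (Automated Feature Derivation)

Original documentation: https://docs.featuretools.com/

Reference: https://blog.csdn.net/q337100/article/details/80804887

Featuretools is a framework for automated feature engineering. It transforms temporal and relational datasets into feature matrices that can be used for machine learning.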

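5-Minute Quick Start

Below is an example of automated feature engineering with Deep Feature Synthesis (DFS). In this example, we apply DFS to a timestamped customer transaction dataset made up of several tables.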

In [1]: import featuretools as ft

Load the mock data

In [2]: data = ft.demo.load_mock_customer()

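Prepare the data

The dataset used in this example contains three tables. Featuretools calls a table an entity. The three entities in this example are:

  • customers: one record per customer; a customer can have many sessions
  • sessions: one record per session, each with several attributes
  • transactions: one record per transaction event; a session can contain many transactions
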
In [3]: customers_df = data["customers"]

In [4]: customers_df
Out[4]: 
   customer_id zip_code           join_date date_of_birth
0            1    60091 2011-04-17 10:48:33    1994-07-18
1            2    13244 2012-04-15 23:31:04    1986-08-18
2            3    13244 2011-08-13 15:42:34    2003-11-21
3            4    60091 2011-04-08 20:08:14    2006-08-15
4            5    60091 2010-07-17 05:27:50    1984-07-28

In [5]: sessions_df = data["sessions"]

In [6]: sessions_df.sample(5)
Out[6]: 
    session_id  customer_id   device       session_start
13          14            1   tablet 2014-01-01 03:28:00
6            7            3   tablet 2014-01-01 01:39:40
1            2            5   mobile 2014-01-01 00:17:20
28          29            1   mobile 2014-01-01 07:10:05
24          25            3  desktop 2014-01-01 05:59:40

In [7]: transactions_df = data["transactions"]

In [8]: transactions_df.sample(5)
Out[8]: 
     transaction_id  session_id    transaction_time product_id  amount
74              232           5 2014-01-01 01:20:10          1  139.20
231              27          17 2014-01-01 04:10:15          2   90.79
434              36          31 2014-01-01 07:50:10          3   62.35
420              56          30 2014-01-01 07:35:00          3   72.70
54              444           4 2014-01-01 00:58:30          4   43.59

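First, we build a dictionary containing every entity in the dataset. Each value is a tuple of the DataFrame, the name of its index column, and, where the entity has one, the name of its time index column.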

In [9]: entities = {
   ...:    "customers" : (customers_df, "customer_id"),
   ...:    "sessions" : (sessions_df, "session_id", "session_start"),
   ...:    "transactions" : (transactions_df, "transaction_id", "transaction_time")
   ...: }
   ...: 

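Second, we specify how the entities are related. When two entities have a one-to-many relationship, they form a parent-child pair: one record in the parent entity corresponds to multiple records in the child entity. For example, the customers entity (customer_id, zip_code, join_date, date_of_birth) is the parent of the sessions entity (session_id, customer_id, device, session_start), because one customer can have many session records. A parent-child relationship is written as a tuple of the form: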

(parent_entity, parent_variable, child_entity, child_variable)

The example dataset has the following relationships:

In [10]: relationships = [("sessions", "session_id", "transactions", "session_id"),
   ....:                  ("customers", "customer_id", "sessions", "customer_id")]
   ....: 
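A relationship tuple only makes sense if the parent key is unique and every child row points at an existing parent. The sketch below checks those two conditions on hypothetical toy rows in plain Python. The helper check_relationship is an illustrative name, not a Featuretools function, and the rows are not taken from the mock dataset.

```python
# Hypothetical toy rows; only the key columns matter for this check.
customers = [{"customer_id": 1}, {"customer_id": 2}]
sessions = [
    {"session_id": 1, "customer_id": 2},
    {"session_id": 2, "customer_id": 1},
    {"session_id": 3, "customer_id": 2},
]


def check_relationship(parent_rows, parent_key, child_rows, child_key):
    """Return a list of problems that would break a one-to-many link."""
    problems = []
    parent_ids = [row[parent_key] for row in parent_rows]
    # The "one" side must identify each parent record uniquely.
    if len(parent_ids) != len(set(parent_ids)):
        problems.append(f"{parent_key} is not unique in the parent table")
    # Every child record must reference a parent that exists.
    orphans = sorted({row[child_key] for row in child_rows} - set(parent_ids))
    if orphans:
        problems.append(f"child rows reference unknown parent ids: {orphans}")
    return problems


# Same argument order as the tuple format:
# (parent_entity, parent_variable, child_entity, child_variable)
ok = check_relationship(customers, "customer_id", sessions, "customer_id")
bad = check_relationship(
    customers,
    "customer_id",
    sessions + [{"session_id": 4, "customer_id": 9}],
    "customer_id",
)
print(ok)
print(bad)
```

Running a check like this before calling DFS can surface key problems early, while the offending table is still easy to identify.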

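Run Deep Feature Synthesis

At minimum, DFS takes a set of entities, a list of relationships, and the target_entity for which to compute features. It returns a feature matrix and the corresponding list of feature definitions. The entities and target_entity arguments belong to the pre-1.0 Featuretools API used throughout this example; releases from 1.0 onward name them dataframes and target_dataframe_name.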

In [11]: feature_matrix_customers, features_defs = ft.dfs(entities=entities,
   ....:                                                  relationships=relationships,
   ....:                                                  target_entity="customers")
   ....: 

In [12]: feature_matrix_customers
Out[12]: 
            zip_code  COUNT(sessions)  NUM_UNIQUE(sessions.device) MODE(sessions.device)  SUM(transactions.amount)  STD(transactions.amount)  MAX(transactions.amount)  SKEW(transactions.amount)  MIN(transactions.amount)  MEAN(transactions.amount)  COUNT(transactions)  NUM_UNIQUE(transactions.product_id)  MODE(transactions.product_id)  DAY(join_date)  DAY(date_of_birth)  YEAR(join_date)  YEAR(date_of_birth)  MONTH(join_date)  MONTH(date_of_birth)  WEEKDAY(join_date)  WEEKDAY(date_of_birth)  SUM(sessions.STD(transactions.amount))  SUM(sessions.MAX(transactions.amount))  SUM(sessions.SKEW(transactions.amount))  SUM(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  SUM(sessions.NUM_UNIQUE(transactions.product_id))  STD(sessions.SUM(transactions.amount))  STD(sessions.MAX(transactions.amount))  STD(sessions.SKEW(transactions.amount))  STD(sessions.MIN(transactions.amount))  STD(sessions.MEAN(transactions.amount))  STD(sessions.COUNT(transactions))  STD(sessions.NUM_UNIQUE(transactions.product_id))  MAX(sessions.SUM(transactions.amount))  MAX(sessions.STD(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  MAX(sessions.MEAN(transactions.amount))  MAX(sessions.COUNT(transactions))  MAX(sessions.NUM_UNIQUE(transactions.product_id))  SKEW(sessions.SUM(transactions.amount))  SKEW(sessions.STD(transactions.amount))  SKEW(sessions.MAX(transactions.amount))  SKEW(sessions.MIN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  SKEW(sessions.COUNT(transactions))  SKEW(sessions.NUM_UNIQUE(transactions.product_id))  MIN(sessions.SUM(transactions.amount))  MIN(sessions.STD(transactions.amount))  MIN(sessions.MAX(transactions.amount))  MIN(sessions.SKEW(transactions.amount))  MIN(sessions.MEAN(transactions.amount))  MIN(sessions.COUNT(transactions))  MIN(sessions.NUM_UNIQUE(transactions.product_id))  MEAN(sessions.SUM(transactions.amount))  MEAN(sessions.STD(transactions.amount))  
MEAN(sessions.MAX(transactions.amount))  MEAN(sessions.SKEW(transactions.amount))  MEAN(sessions.MIN(transactions.amount))  MEAN(sessions.MEAN(transactions.amount))  MEAN(sessions.COUNT(transactions))  MEAN(sessions.NUM_UNIQUE(transactions.product_id))  NUM_UNIQUE(sessions.MODE(transactions.product_id))  NUM_UNIQUE(sessions.DAY(session_start))  NUM_UNIQUE(sessions.YEAR(session_start))  NUM_UNIQUE(sessions.MONTH(session_start))  NUM_UNIQUE(sessions.WEEKDAY(session_start))  MODE(sessions.MODE(transactions.product_id))  MODE(sessions.DAY(session_start))  MODE(sessions.YEAR(session_start))  MODE(sessions.MONTH(session_start))  MODE(sessions.WEEKDAY(session_start))
customer_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
1              60091                8                            3                mobile                   9025.62                 40.442059                    139.43                   0.019698                      5.81                  71.631905                  126                                    5                              4              17                  18             2011                 1994                 4                     7                   6                       0                              312.745952                                 1057.97                                -0.476122                                   78.59                               582.193117                                                 40                              279.510713                                7.322191                                 0.589386                                6.954507                                13.759314                           4.062019                                           0.000000                                 1613.93                               46.905665                                 0.640252                                   26.36                                88.755625                                 25                                                  5                                 0.778170                                -0.312355                                -0.780493                                 2.440005                                 -0.424949                            1.946018                                           0.000000                                   809.97                               30.450261                                  118.90                                -1.038434                                50.623125                                 12                                                  5                              1128.202500                                39.093244              
                 132.246250                                 -0.059515                                 9.823750                                 72.774140                           15.750000                                           5.000000                                                   4                                         1                                         1                                          1                                            1                                             4                                  1                                2014                                    1                                      2
2              13244                7                            3               desktop                   7200.28                 37.705178                    146.81                   0.098259                      8.73                  77.422366                   93                                    5                              4              15                  18             2012                 1986                 4                     8                   6                       0                              258.700528                                  931.63                                -0.277640                                  154.60                               548.905851                                                 35                              251.609234                               17.221593                                 0.509798                               15.874374                                11.477071                           3.450328                                           0.000000                                 1320.64                               47.935920                                 0.755711                                   56.46                                96.581000                                 18                                                  5                                -0.440929                                 0.013087                                -1.539467                                 2.154929                                  0.235296                           -0.303276                                           0.000000                                   634.84                               27.839228                                  100.04                                -0.763603                                61.910000                                  8                                                  5                              1028.611429                                36.957218              
                 133.090000                                 -0.039663                                22.085714                                 78.415122                           13.285714                                           5.000000                                                   4                                         1                                         1                                          1                                            1                                             3                                  1                                2014                                    1                                      2
3              13244                6                            3               desktop                   6236.62                 43.683296                    149.15                   0.418230                      5.89                  67.060430                   93                                    5                              1              13                  21             2011                 2003                 8                    11                   5                       4                              257.299895                                  847.63                                 2.286086                                   66.21                               405.237462                                                 29                              219.021420                               10.724241                                 0.429374                                5.424407                                11.174282                           2.428992                                           0.408248                                 1477.97                               50.110120                                 0.854976                                   20.06                                82.109444                                 18                                                  5                                 2.246479                                -0.245703                                -0.941078                                 1.000771                                  0.678544                           -1.507217                                          -2.449490                                   889.21                               35.704680                                  126.74                                -0.289466                                55.579412                                 11                                                  4                              1039.436667                                42.883316              
                 141.271667                                  0.381014                                11.035000                                 67.539577                           15.500000                                           4.833333                                                   4                                         1                                         1                                          1                                            1                                             1                                  1                                2014                                    1                                      2
4              60091                8                            3                mobile                   8727.68                 45.068765                    149.95                  -0.036348                      5.73                  80.070459                  109                                    5                              2               8                  15             2011                 2006                 4                     8                   4                       1                              356.125829                                 1157.99                                 0.002764                                  131.51                               649.657515                                                 37                              235.992478                                3.514421                                 0.387884                               16.960575                                13.027258                           3.335416                                           0.517549                                 1351.46                               54.293903                                 0.382868                                   54.83                               110.450000                                 18                                                  5                                -0.391805                                -1.065663                                 0.027256                                 2.103510                                  1.980948                            0.282488                                          -0.644061                                   771.68                               29.026424                                  139.20                                -0.711744                                70.638182                                 10                                                  4                              1090.960000                                44.515729              
                 144.748750                                  0.000346                                16.438750                                 81.207189                           13.625000                                           4.625000                                                   5                                         1                                         1                                          1                                            1                                             1                                  1                                2014                                    1                                      2
5              60091                6                            3                mobile                   6349.66                 44.095630                    149.02                  -0.025941                      7.55                  80.375443                   79                                    5                              5              17                  28             2010                 1984                 7                     7                   5                       5                              259.873954                                  839.76                                 0.014384                                   86.49                               472.231119                                                 30                              402.775486                                7.928001                                 0.415426                                4.961414                                11.007471                           3.600926                                           0.000000                                 1700.67                               51.149250                                 0.602209                                   20.65                                94.481667                                 18                                                  5                                 0.472342                                 0.204548                                -0.333796                                -0.470410                                  0.335175                           -0.317685                                           0.000000                                   543.18                               36.734681                                  128.51                                -0.539060                                66.666667                                  8                                                  5                              1058.276667                                43.312326              
                 139.960000                                  0.002397                                14.415000                                 78.705187                           13.166667                                           5.000000                                                   5                                         1                                         1                                          1                                            1                                             3                                  1                                2014                                    1                                      2

As the output above shows, we now have dozens of features describing each customer's behavior.
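Column names such as MEAN(sessions.SUM(transactions.amount)) are stacked aggregations: the inner one is computed per session, the outer one per customer. In the output above, customer 1 has SUM(transactions.amount) = 9025.62 over 8 sessions, and 9025.62 / 8 = 1128.2025 is exactly the reported MEAN(sessions.SUM(transactions.amount)). The sketch below reproduces that stacking by hand in plain Python. It is an illustration of the idea rather than how Featuretools is implemented, and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data, not the mock dataset used above.
session_to_customer = {1: 1, 2: 1, 3: 2}  # session_id -> customer_id
transactions = [
    # (transaction_id, session_id, amount)
    (1, 1, 10.0),
    (2, 1, 20.0),
    (3, 2, 40.0),
    (4, 3, 5.0),
    (5, 3, 15.0),
]

# Depth 1, target = sessions: SUM(transactions.amount) per session.
session_sum = defaultdict(float)
for _transaction_id, session_id, amount in transactions:
    session_sum[session_id] += amount

# Collect each customer's per-session sums.
sums_by_customer = defaultdict(list)
for session_id, customer_id in session_to_customer.items():
    sums_by_customer[customer_id].append(session_sum[session_id])

# Depth 2, target = customers: aggregate the per-session values again.
features = {
    customer_id: {
        "COUNT(sessions)": len(sums),
        "SUM(transactions.amount)": sum(sums),
        "MEAN(sessions.SUM(transactions.amount))": mean(sums),
    }
    for customer_id, sums in sums_by_customer.items()
}

for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

Swapping sum or mean for another aggregation such as max or a standard deviation gives the other column families in the feature matrix, which is why the number of columns grows so quickly.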

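Change the target entity

One reason DFS is so powerful is that it can build a feature matrix for any entity in the data. For example, we can just as easily build features for sessions: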

In [13]: feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
   ....:                                                 relationships=relationships,
   ....:                                                 target_entity="sessions")
   ....: 

In [14]: feature_matrix_sessions.head(5)
Out[14]: 
            customer_id   device  SUM(transactions.amount)  STD(transactions.amount)  MAX(transactions.amount)  SKEW(transactions.amount)  MIN(transactions.amount)  MEAN(transactions.amount)  COUNT(transactions)  NUM_UNIQUE(transactions.product_id)  MODE(transactions.product_id)  DAY(session_start)  YEAR(session_start)  MONTH(session_start)  WEEKDAY(session_start) customers.zip_code  NUM_UNIQUE(transactions.DAY(transaction_time))  NUM_UNIQUE(transactions.YEAR(transaction_time))  NUM_UNIQUE(transactions.MONTH(transaction_time))  NUM_UNIQUE(transactions.WEEKDAY(transaction_time))  MODE(transactions.DAY(transaction_time))  MODE(transactions.YEAR(transaction_time))  MODE(transactions.MONTH(transaction_time))  MODE(transactions.WEEKDAY(transaction_time))  customers.COUNT(sessions)  customers.NUM_UNIQUE(sessions.device) customers.MODE(sessions.device)  customers.SUM(transactions.amount)  customers.STD(transactions.amount)  customers.MAX(transactions.amount)  customers.SKEW(transactions.amount)  customers.MIN(transactions.amount)  customers.MEAN(transactions.amount)  customers.COUNT(transactions)  customers.NUM_UNIQUE(transactions.product_id)  customers.MODE(transactions.product_id)  customers.DAY(join_date)  customers.DAY(date_of_birth)  customers.YEAR(join_date)  customers.YEAR(date_of_birth)  customers.MONTH(join_date)  customers.MONTH(date_of_birth)  customers.WEEKDAY(join_date)  customers.WEEKDAY(date_of_birth)
session_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
1                     2  desktop                   1229.01                 41.600976                    141.66                   0.295458                     20.91                  76.813125                   16                                    5                              3                   1                 2014                     1                       2              13244                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          7                                      3                         desktop                             7200.28                           37.705178                              146.81                             0.098259                                8.73                            77.422366                             93                                              5                                        4                        15                            18                       2012                           1986                           4                               8                             6                                 0
2                     5   mobile                    746.96                 45.893591                    135.25                  -0.160550                      9.32                  74.696000                   10                                    5                              5                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          6                                      3                          mobile                             6349.66                           44.095630                              149.02                            -0.025941                                7.55                            80.375443                             79                                              5                                        5                        17                            28                       2010                           1984                           7                               7                             5                                 5
3                     4   mobile                   1329.00                 46.240016                    147.73                  -0.324012                      8.70                  88.600000                   15                                    5                              1                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          8                                      3                          mobile                             8727.68                           45.068765                              149.95                            -0.036348                                5.73                            80.070459                            109                                              5                                        2                         8                            15                       2011                           2006                           4                               8                             4                                 1
4                     1   mobile                   1613.93                 40.187205                    129.00                   0.234349                      6.29                  64.557200                   25                                    5                              5                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          8                                      3                          mobile                             9025.62                           40.442059                              139.43                             0.019698                                5.81                            71.631905                            126                                              5                                        4                        17                            18                       2011                           1994                           4                               7                             6                                 0
5                     4   mobile                    777.02                 48.918663                    139.20                   0.336381                      7.43                  70.638182                   11                                    5                              5                   1                 2014                     1                       2              60091                                               1                                                1                                                 1                                                  1                                          1                                       2014                                           1                                             2                          8                                      3                          mobile                             8727.68                           45.068765                              149.95                            -0.036348                                5.73                            80.070459                            109                                              5                                        2                         8                            15                       2011                           2006                           4                               8                             4                                 1
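Columns prefixed with customers. in the sessions matrix carry a value from the parent customer down to each of its sessions, and names like customers.YEAR(join_date) first apply a date transform on the customers side. The sketch below recomputes a few of those columns for sessions 1 and 2 in plain Python, using rows copied from the tables and output above. The function customer_features is a hypothetical helper, and the code illustrates the idea rather than Featuretools internals.

```python
from datetime import datetime

# Rows copied from the customers table and the sessions output above.
customers = {
    2: {"zip_code": "13244", "join_date": datetime(2012, 4, 15, 23, 31, 4)},
    5: {"zip_code": "60091", "join_date": datetime(2010, 7, 17, 5, 27, 50)},
}
session_to_customer = {1: 2, 2: 5}  # session_id -> customer_id


def customer_features(row):
    """Identity and date-transform features for one customers row."""
    joined = row["join_date"]
    return {
        "zip_code": row["zip_code"],
        "DAY(join_date)": joined.day,
        "YEAR(join_date)": joined.year,
        "MONTH(join_date)": joined.month,
        "WEEKDAY(join_date)": joined.weekday(),  # Monday is 0, Sunday is 6
    }


# Copy each parent value onto the child row, prefixed with the
# parent entity's name.
session_features = {
    session_id: {
        f"customers.{name}": value
        for name, value in customer_features(customers[customer_id]).items()
    }
    for session_id, customer_id in session_to_customer.items()
}

for session_id, row in sorted(session_features.items()):
    print(session_id, row)
```

The values for session 1 (15, 2012, 4, 6) and session 2 (17, 2010, 7, 5) agree with the customers.DAY, YEAR, MONTH and WEEKDAY columns of join_date in the first two rows of the output above.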

 
