Learning Data Mining with Python, Chapter 1: Affinity Analysis

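Affinity analysis: a data mining method that measures the similarity between given samples. Typical applications include services, advertising, product recommendation, and tracing shared genetic ancestry.

A simple rule of the form "if a person buys product X, then they are also likely to buy product Y" captures what affinity analysis looks for.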

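Measures used in affinity analysis:

Support: the number of times the rule occurs in the dataset. It is sometimes normalized by dividing by the number of times the rule's premise is valid.

Confidence: the number of times the rule occurs in the dataset divided by the number of times the premise occurs.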

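Put simply:

rule: buys A → buys B

Support = number of times A and B were bought together

Confidence = (number of times A and B were bought together) / (number of times A was bought)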
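The two definitions above can be sketched in a few lines of Python. This is only an illustration, not the book's code: the function name `support_and_confidence` and the toy `transactions` list are made up for the example, and each transaction is assumed to be a set of item names.

```python
def support_and_confidence(transactions, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion.

    support    = number of transactions containing both items
    confidence = support / number of transactions containing the premise
    """
    premise_count = sum(1 for t in transactions if premise in t)
    support = sum(1 for t in transactions if premise in t and conclusion in t)
    # Confidence is undefined when the premise never occurs; return None then.
    confidence = support / premise_count if premise_count else None
    return support, confidence


# Hypothetical toy data: each set is one shopping basket.
transactions = [{"A", "B"}, {"A"}, {"A", "B", "C"}, {"B"}]
support, confidence = support_and_confidence(transactions, "A", "B")
print(support, confidence)
```

Here A appears in three baskets and A together with B in two, so the support is 2 and the confidence is 2/3.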

Affinity analysis is a type of data mining that gives similarity between samples (objects). This could be the similarity between the following:

  • users on a website, in order to provide varied services or targeted advertising

  • items to sell to those users, in order to provide recommended movies or products

  • human genes, in order to find people that share the same ancestors 

We want to find simple rules of the form: if a person buys product X, then they are likely to purchase product Y.

Rules of this type can be measured in many ways, of which we will focus on two: support and confidence. 

  • Support is the number of times that a rule occurs in a dataset, which is computed by simply counting the number of samples that the rule is valid for. It can sometimes be normalized by dividing by the total number of times the premise of the rule is valid, but we will simply count the total for this implementation.
  • Confidence measures how accurate rules are when they can be used. It can be computed by determining the percentage of times the rule applies when the premise applies. We first count how many times a rule applies in our dataset and divide it by the number of samples where the premise (the if statement) occurs.

For example, take a sample matrix with 5 samples, each with 5 features indicating whether bread, milk, cheese, apples, and bananas were bought:
[
 [ 0.  0.  1.  1.  1.]
 [ 1.  1.  0.  1.  0.]
 [ 1.  0.  1.  1.  0.]
 [ 0.  0.  1.  1.  1.]
 [ 0.  1.  0.  0.  1.]
]

Feature names: features = ["bread", "milk", "cheese", "apples", "bananas"]
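As a rough sketch, the matrix and feature names above can be held in plain Python lists (the book itself works with NumPy arrays; lists are used here only to keep the example dependency-free). The helper names `feature_index` and `purchase_counts` are introduced for illustration.

```python
# Sample matrix from the text: one row per transaction, one column per item.
X = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
]
features = ["bread", "milk", "cheese", "apples", "bananas"]

# Map each feature name to its zero-based column index, e.g. "apples" -> 3.
feature_index = {name: i for i, name in enumerate(features)}

# Count how many transactions include each item (column sums).
purchase_counts = {
    name: sum(row[i] for row in X) for name, i in feature_index.items()
}
print(purchase_counts)
```

The column sums show, for instance, that apples were bought in 4 of the 5 transactions.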

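For example, take the rule "if a person buys apples, they also buy bananas":

Number of times the rule holds: rule_valid = 2

Number of times the rule fails: rule_invalid = 2

Support = rule_valid(3,4) = 2, where (3,4) are the zero-based column indices: the 4th column (apples) equals 1 and the 5th column (bananas) equals 1, i.e. apples and bananas were bought together.

Number of times the premise occurs: num_apple_purchases = 4 (apples were bought 4 times)

Confidence = rule_valid / num_apple_purchases = 2 / 4 = 50%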
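The counts for this rule can be reproduced with a short loop over the sample matrix. This is a minimal sketch using plain lists rather than NumPy; the constants `APPLES` and `BANANAS` are names introduced here for the zero-based column indices 3 and 4.

```python
# Sample matrix from the text; columns are
# bread, milk, cheese, apples, bananas.
X = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
]
APPLES, BANANAS = 3, 4  # zero-based column indices

rule_valid = 0           # bought apples and bananas
rule_invalid = 0         # bought apples but not bananas
num_apple_purchases = 0  # premise occurred (bought apples)

for sample in X:
    if sample[APPLES] == 0:
        continue  # premise does not apply to this sample
    num_apple_purchases += 1
    if sample[BANANAS] == 1:
        rule_valid += 1
    else:
        rule_invalid += 1

support = rule_valid
confidence = rule_valid / num_apple_purchases
print(f"Support: {support}, confidence: {confidence:.0%}")
# prints "Support: 2, confidence: 50%"
```

Samples where the premise does not apply (no apples bought) are skipped entirely, which is why the fifth row affects neither count.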

 

                
Learning Data Mining with Python - Second Edition by Robert Layton
English | 4 May 2017 | ASIN: B01MRP7VFV | 358 Pages | AZW3 | 2.85 MB

Key Features

  • Use a wide variety of Python libraries for practical data mining purposes.
  • Learn how to find, manipulate, analyze, and visualize data using Python.
  • Step-by-step instructions on data mining techniques with Python that have real-world applications.

Book Description

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. This book covers a large number of libraries available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands-on experience with complex data types including text, images, and graphs. You will also discover object detection using Deep Neural Networks, which is one of the big, difficult areas of machine learning right now. With restructured examples and code samples updated for the latest edition of Python, each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have great insights into using Python for data mining and an understanding of the algorithms as well as implementations.

What you will learn

  • Apply data mining concepts to real-world problems
  • Predict the outcome of sports matches based on past results
  • Determine the author of a document based on their writing style
  • Use APIs to download datasets from social media and other online services
  • Find and extract good features from difficult datasets
  • Create models that solve real-world problems
  • Design and develop data mining applications using a variety of datasets
  • Perform object detection in images using Deep Neural Networks
  • Find meaningful insights from your data through intuitive visualizations
  • Compute on big data, including real-time data from the internet

About the Author

Robert Layton is a data scientist working mainly on text mining problems for industries including the finance, information security, and transport sectors. He runs dataPipeline to build algorithms for practical use, and Eurekative, helping bring start-ups to life in regional Australia. He has presented at the last four PyCon AU conferences, at multiple international research conferences, and has been training in some capacity for five years. He has a PhD in cybercrime analytics from the Internet Commerce Security Laboratory at Federation University Australia, where he was the Inaugural Young Alumni of the Year in 2014 and is currently an Honorary Research Fellow. You can find him on LinkedIn at https://www.linkedin.com/in/drrobertlayton and on Twitter at @robertlayton. Robert writes regularly on data mining and cybercrime, in a private, consultancy, and research capacity. Robert is an Official Member of the Ballarat Hackerspace, where he helps grow the future-tech sector in regional Victoria.