An Introduction to the Apriori Algorithm

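Foreword:

Association analysis in data mining can be split into two steps: mining frequent itemsets and generating association rules. Apriori is one of the most commonly used algorithms for finding frequent itemsets.
For background on association analysis and frequent itemsets, see: What Is Association Analysis?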


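Main text:

We will again use the shopping-basket example to walk through the idea behind the Apriori algorithm.

The shopping-basket data is as follows: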

TID    Items
001    Cola, Egg, Ham
002    Cola, Diaper, Beer
003    Cola, Diaper, Beer, Ham
004    Diaper, Beer

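TID is the transaction ID, and Items lists the goods bought in that transaction.

The ultimate goal of the Apriori algorithm is to find the frequent itemsets in the dataset. With the minimum support threshold set to 50% (that is, a support count of at least 2 out of the 4 transactions), the final mining result is as follows (the number after each itemset is its support count):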

Frequent 1-itemsets:
{Cola} 3
{Diaper} 3
{Beer} 3
{Ham} 2

Frequent 2-itemsets:
{Cola, Diaper} 2
{Cola, Beer} 2
{Cola, Ham} 2
{Diaper, Beer} 3

Frequent 3-itemsets:
{Cola, Diaper, Beer} 2
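
The result above can be double-checked by brute force. The following is only a minimal sketch, not the Apriori algorithm itself: it enumerates every possible itemset and counts its support directly. The names `support_count` and `brute_force_frequent` are made up for this illustration.

```python
from itertools import combinations

# The four transactions from the shopping-basket table.
transactions = [
    {"Cola", "Egg", "Ham"},
    {"Cola", "Diaper", "Beer"},
    {"Cola", "Diaper", "Beer", "Ham"},
    {"Diaper", "Beer"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def brute_force_frequent(transactions, min_sup_ratio):
    """Enumerate every itemset and keep the frequent ones (no pruning at all)."""
    min_count = min_sup_ratio * len(transactions)  # 50% of 4 transactions -> 2
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            count = support_count(candidate, transactions)
            if count >= min_count:
                result[candidate] = count
    return result

frequent = brute_force_frequent(transactions, 0.5)
# Print the itemsets grouped by size, each followed by its support count.
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print("{" + ", ".join(itemset) + "}", frequent[itemset])
```

Brute force works for five distinct items, but the number of possible itemsets doubles with every additional item, which is exactly the cost Apriori tries to avoid.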



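The idea of the Apriori algorithm is to generate candidate k-itemsets from the frequent (k-1)-itemsets, and then use the minimum support to decide whether each candidate k-itemset is in fact a frequent k-itemset.
For example, first find all 1-itemsets and filter out the frequent 1-itemsets among them; then generate candidate 2-itemsets from the frequent 1-itemsets and filter out the frequent 2-itemsets; then generate candidate 3-itemsets from the frequent 2-itemsets and filter out the frequent 3-itemsets; and so on.

So here is the question: how do we generate candidate k-itemsets from the frequent (k-1)-itemsets?
The answer is to use the Apriori property: every subset of a frequent itemset must itself be frequent (this is easy to prove by contradiction; the proof is omitted). Consequently, if an itemset is infrequent, every superset of it must be infrequent as well.
For example, {Cola, Diaper} is a frequent itemset, so {Cola} and {Diaper} must also be frequent itemsets. Because {Egg} is infrequent, {Cola, Egg} is infrequent too.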
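
The Apriori property can be checked on the example data. The sketch below assumes the four transactions and the 50% threshold from above; the helper names (`support_count`, `is_frequent`, `nonempty_proper_subsets`) are made up for this illustration.

```python
from itertools import combinations

transactions = [
    {"Cola", "Egg", "Ham"},
    {"Cola", "Diaper", "Beer"},
    {"Cola", "Diaper", "Beer", "Ham"},
    {"Diaper", "Beer"},
]
MIN_COUNT = 2  # 50% of 4 transactions

def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def is_frequent(itemset):
    return support_count(itemset) >= MIN_COUNT

def nonempty_proper_subsets(itemset):
    """Yield every non-empty proper subset of `itemset` as a tuple."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        yield from combinations(items, k)

# A superset can never be contained in more transactions than its subset.
assert support_count({"Cola", "Egg"}) <= support_count({"Egg"})

# {Cola, Diaper} is frequent, so all of its subsets are frequent.
assert is_frequent({"Cola", "Diaper"})
assert all(is_frequent(s) for s in nonempty_proper_subsets({"Cola", "Diaper"}))

# {Egg} is infrequent, so its superset {Cola, Egg} is infrequent too.
assert not is_frequent({"Egg"})
assert not is_frequent({"Cola", "Egg"})
```

The first assertion is the heart of the omitted proof: any transaction that contains the superset also contains the subset, so support can only shrink as an itemset grows.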


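The steps for generating candidate 2-itemsets from the frequent 1-itemsets are: combine the frequent 1-itemsets with one another to form 2-itemsets, then discard any 2-itemset that contains an infrequent subset. What remains are the candidate 2-itemsets.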
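
As a rough sketch of this step on the example data (the variable names are made up for this illustration): pairing up the four frequent items gives six candidate 2-itemsets, and counting their support against the transactions leaves four frequent 2-itemsets.

```python
from itertools import combinations

transactions = [
    {"Cola", "Egg", "Ham"},
    {"Cola", "Diaper", "Beer"},
    {"Cola", "Diaper", "Beer", "Ham"},
    {"Diaper", "Beer"},
]
MIN_COUNT = 2  # 50% of 4 transactions

# Items of the frequent 1-itemsets found in the first pass (Egg is excluded).
frequent_1_items = ["Cola", "Diaper", "Beer", "Ham"]

# Every pair is built only from frequent items, so no pair has an
# infrequent 1-item subset and nothing is pruned at this level.
candidates_2 = [frozenset(pair) for pair in combinations(sorted(frequent_1_items), 2)]

# Scan the transactions to count each candidate's support.
counts = {c: sum(1 for t in transactions if c <= t) for c in candidates_2}
frequent_2 = {c: n for c, n in counts.items() if n >= MIN_COUNT}

for c in candidates_2:
    status = "frequent" if c in frequent_2 else "infrequent"
    print(sorted(c), counts[c], status)
```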

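The steps for generating candidate 3-itemsets from the frequent 2-itemsets are: combine the frequent 2-itemsets with the frequent 1-itemsets to form 3-itemsets: {Cola, Diaper, Beer}, {Cola, Diaper, Ham}, {Cola, Beer, Ham}, and {Diaper, Beer, Ham}.
Because {Diaper, Ham} is not a frequent 2-itemset, {Cola, Diaper, Ham}, which contains it, is not a candidate 3-itemset and is discarded. Because {Beer, Ham} is not a frequent 2-itemset, {Cola, Beer, Ham} and {Diaper, Beer, Ham}, which contain it, are not candidate 3-itemsets and are discarded.
So the only candidate 3-itemset is {Cola, Diaper, Beer}.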
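
The join and prune just described can be sketched as follows. This follows the prose description (frequent 2-itemsets combined with frequent 1-itemsets), not the prefix-based join used in the textbook pseudocode; the function name `has_infrequent_subset` mirrors that pseudocode, and the other names are made up for this illustration.

```python
from itertools import combinations

frequent_1 = [frozenset({item}) for item in ("Cola", "Diaper", "Beer", "Ham")]
frequent_2 = {
    frozenset({"Cola", "Diaper"}),
    frozenset({"Cola", "Beer"}),
    frozenset({"Cola", "Ham"}),
    frozenset({"Diaper", "Beer"}),
}

# Join: extend each frequent 2-itemset with one frequent item it lacks.
# Using a set removes the duplicates produced by different pairs.
joined = {
    pair | single
    for pair in frequent_2
    for single in frequent_1
    if not single <= pair
}

def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of `candidate` is not a known frequent itemset."""
    k = len(candidate)
    return any(
        frozenset(sub) not in frequent_prev
        for sub in combinations(candidate, k - 1)
    )

# Prune: keep only the 3-itemsets whose 2-item subsets are all frequent.
candidates_3 = {c for c in joined if not has_infrequent_subset(c, frequent_2)}

print(len(joined), "joined 3-itemsets")
print([sorted(c) for c in candidates_3])
```

Note that pruning needs no scan of the transactions: it only consults the frequent 2-itemsets already found. The data is scanned afterwards, and only for the surviving candidates.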

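The process of mining the frequent itemsets of the shopping-basket data is as follows:
[Figure: the frequent-itemset mining process for the shopping-basket example]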


The Apriori algorithm is described below (pseudocode taken from the book 《数据挖掘原理与实践》):

Algorithm: frequent-itemset generation with Apriori
Input: dataset D; minimum support threshold min_sup
Output: the frequent itemsets L in D
L_1 = find_frequent_1-itemsets(D);
for (k = 2; L_{k-1} ≠ Φ; k++)
{
    C_k = apriori_gen(L_{k-1});        // generate the candidate itemsets
    for all transactions t ∈ D
    {
        C_t = subset(C_k, t);          // identify all candidates contained in t
        for all candidates c ∈ C_t
        {
            c.count++;                 // increment the support count
        }
    }
    L_k = { c ∈ C_k | c.count ≥ min_sup }      // extract the frequent k-itemsets
}
return L = ∪_k L_k;


procedure apriori_gen(L_{k-1})
for each itemset l1 ∈ L_{k-1}
    for each itemset l2 ∈ L_{k-1}
        if (l1[1] = l2[1]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then
        {
            c = join(l1, l2);          // join: generate a candidate
            if has_infrequent_subset(c, L_{k-1}) then
                delete c;              // prune: remove a candidate that has an infrequent subset
            else
                add c to C_k;
        }
return C_k;


procedure has_infrequent_subset(c, L_{k-1})
// use prior knowledge to decide whether a candidate itemset can be frequent
for each (k-1)-subset s of c
    if s ∉ L_{k-1} then
        return TRUE;
return FALSE;
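
For readers who prefer something runnable, here is one possible Python sketch of the pseudocode above. It is not the book's code. It assumes that itemsets are stored as sorted tuples, that items are mutually comparable (plain strings here), and that `min_sup_ratio` is a fraction of the number of transactions. The function names follow the pseudocode; everything else is made up for this illustration.

```python
from itertools import combinations

def has_infrequent_subset(candidate, prev_frequent):
    """True if some (k-1)-subset of `candidate` is not in L_{k-1}."""
    return any(
        sub not in prev_frequent
        for sub in combinations(candidate, len(candidate) - 1)
    )

def apriori_gen(prev_frequent):
    """Join L_{k-1} with itself on a shared (k-2)-item prefix, then prune."""
    candidates = set()
    for l1 in prev_frequent:
        for l2 in prev_frequent:
            # Same first k-2 items, and l1's last item sorts before l2's.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)  # join: still a sorted tuple
                if not has_infrequent_subset(c, prev_frequent):
                    candidates.add(c)
    return candidates

def apriori(transactions, min_sup_ratio):
    """Return {itemset as sorted tuple: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_sup_ratio * len(transactions)

    # L_1: count single items in one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {c: n for c, n in counts.items() if n >= min_count}

    frequent = {}
    while current:  # loop until L_{k-1} is empty
        frequent.update(current)
        candidates = apriori_gen(set(current))
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:  # one scan of D per level
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_count}
    return frequent

baskets = [
    ["Cola", "Egg", "Ham"],
    ["Cola", "Diaper", "Beer"],
    ["Cola", "Diaper", "Beer", "Ham"],
    ["Diaper", "Beer"],
]
result = apriori(baskets, 0.5)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

Keeping each itemset as a sorted tuple is what makes the prefix comparison in `apriori_gen` valid and ensures that every candidate is generated only once. The sketch tests every candidate against every transaction, which is simple but slow; real implementations usually speed up the `subset(C_k, t)` step with a hash tree or a similar structure.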


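Afterword:

As we have seen, Apriori finds the frequent k-itemsets by way of the frequent (k-1)-itemsets. The Apriori property lets it prune candidates that have an infrequent subset, but it still has to scan the dataset once per level and count the support of every remaining candidate to decide whether it is frequent. When the dataset is large, the algorithm becomes painfully slow.

This is where the FP-Growth algorithm makes its grand entrance. See: An Introduction to the FP-Growth Algorithm, and A Python Implementation of the FP-Growth Algorithm.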


Please credit the source when reposting. Thank you! (Original link: http://blog.csdn.net/bone_ace/article/details/46660819)