Data Mining Courseware: chap6-basic-association-analysis.ppt

Data Mining
Association Analysis: Basic Concepts and Algorithms

Lecture Notes for Chapter 6, Introduction to Data Mining, by Tan, Steinbach, Kumar

Association Rule Mining

- Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
- Market-basket transactions, with example association rules:
    {Diaper} -> {Beer}
    {Milk, Bread} -> {Eggs, Coke}
    {Beer, Bread} -> {Milk}
- Implication means co-occurrence, not causality!

Definition: Frequent Itemset

- Itemset: a collection of one or more items.
  - Example: {Milk, Bread, Diaper}
  - k-itemset: an itemset that contains k items.
- Support count (σ): frequency of occurrence of an itemset.
  - E.g. σ({Milk, Bread, Diaper}) = 2
- Support (s): fraction of transactions that contain an itemset.
  - E.g. s({Milk, Bread, Diaper}) = 2/5
- Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold.

Definition: Association Rule

- Association rule: an implication expression of the form X -> Y, where X and Y are itemsets.
  - Example: {Milk, Diaper} -> {Beer}
- Rule evaluation metrics:
  - Support (s): fraction of transactions that contain both X and Y.
  - Confidence (c): measures how often items in Y appear in transactions that contain X.
- Example: {Milk, Diaper} -> {Beer}
    s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
    c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67

Association Rule Mining Task

- Given a set of transactions T, the goal of association rule mining is to find all rules having
  - support >= minsup threshold
  - confidence >= minconf threshold
- Brute-force approach:
  - List all possible association rules.
  - Compute the support and confidence for each rule.
  - Prune rules that fail the minsup and minconf thresholds.
  => Computationally prohibitive!

Mining Association Rules

Example rules:
    {Milk, Diaper} -> {Beer}   (s=0.4, c=0.67)
    {Milk, Beer} -> {Diaper}   (s=0.4, c=1.0)
    {Diaper, Beer} -> {Milk}   (s=0.4, c=0.67)
    {Beer} -> {Milk, Diaper}   (s=0.4, c=0.67)
    {Diaper} -> {Milk, Beer}   (s=0.4, c=0.5)
    {Milk} -> {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
- All the above rules are binary partitions of the same itemset {Milk, Diaper, Beer}.
- Rules originating from the same itemset have identical support but can have different confidence.
- Thus, we may decouple the support and confidence requirements.

Mining Association Rules: Two-Step Approach

1. Frequent itemset generation: generate all itemsets whose support >= minsup.
2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.

Frequent itemset generation is still computationally expensive.

Frequent Itemset Generation

[Figure: the itemset lattice over items A-E, from the empty set (null) through all itemsets up to ABCDE.]

Given d items, there are 2^d possible candidate itemsets.

Frequent Itemset Generation: Brute-Force Approach

- Each itemset in the lattice is a candidate frequent itemset.
- Count the support of each candidate by scanning the database: match each transaction against every candidate.
- Complexity ~ O(NMw), for N transactions, M candidates and average transaction width w.
  => Expensive, since M = 2^d!
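The brute-force counting just described can be rendered directly; a sketch, assuming transactions are Python sets (and deliberately naive, which is the point of this slide):

  from itertools import chain, combinations

  # Enumerate all 2^d - 1 non-empty itemsets over the item universe, then
  # scan every transaction for each one: O(N * M * w) work overall.
  def all_itemsets(items):
      return chain.from_iterable(combinations(items, k) for k in range(1, len(items) + 1))

  def brute_force_frequent(transactions, minsup):
      items = sorted(set().union(*transactions))
      frequent = {}
      for cand in all_itemsets(items):          # M = 2^d - 1 candidates
          cand = frozenset(cand)
          count = sum(1 for t in transactions if cand <= t)   # scan all N transactions
          if count / len(transactions) >= minsup:
              frequent[cand] = count
      return frequent

  # e.g. brute_force_frequent([{"A","B"}, {"B","C"}, {"A","B","C"}], 0.5)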

Computational Complexity

Given d unique items:
- Total number of itemsets = 2^d
- Total number of possible association rules:

    R = Σ_{k=1}^{d-1} [ C(d,k) × Σ_{j=1}^{d-k} C(d-k,j) ] = 3^d − 2^(d+1) + 1

  If d = 6, R = 602 rules.
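A quick check of this closed form; a sketch using Python's math.comb:

  from math import comb

  # Count candidate rules X -> Y (X, Y non-empty and disjoint) over d items,
  # and compare against the closed form 3^d - 2^(d+1) + 1.
  def num_rules(d):
      return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
                 for k in range(1, d))

  print(num_rules(6))        # 602
  print(3**6 - 2**7 + 1)     # 602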

Frequent Itemset Generation Strategies

- Reduce the number of candidates (M):
  - Complete search: M = 2^d.
  - Use pruning techniques to reduce M.
- Reduce the number of transactions (N):
  - Reduce the size of N as the size of the itemset increases.
  - Used by DHP and vertical-based mining algorithms.
- Reduce the number of comparisons (NM):
  - Use efficient data structures to store the candidates or transactions.
  - No need to match every candidate against every transaction.

Reducing Number of Candidates

- Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent.
- The Apriori principle holds due to the following property of the support measure:

    ∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

- The support of an itemset never exceeds the support of its subsets. This is known as the anti-monotone property of support.

Illustrating Apriori Principle

[Figure: the itemset lattice with one itemset found to be infrequent; all of its supersets are pruned.]

Illustrating Apriori Principle (Minimum Support = 3)

Items (1-itemsets):
    Item    Count
    Bread   4
    Coke    2
    Milk    4
    Beer    3
    Diaper  4
    Eggs    1

Pairs (2-itemsets), with no need to generate candidates involving Coke or Eggs:
    Itemset           Count
    {Bread, Milk}     3
    {Bread, Beer}     2
    {Bread, Diaper}   3
    {Milk, Beer}      2
    {Milk, Diaper}    3
    {Beer, Diaper}    3

Triplets (3-itemsets):
    Itemset                 Count
    {Bread, Milk, Diaper}   3

If every subset is considered: C(6,1) + C(6,2) + C(6,3) = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.

Apriori Algorithm

Method:
1. Let k = 1.
2. Generate frequent itemsets of length 1.
3. Repeat until no new frequent itemsets are identified:
   - Generate length-(k+1) candidate itemsets from the length-k frequent itemsets.
   - Prune candidate itemsets containing subsets of length k that are infrequent.
   - Count the support of each candidate by scanning the DB.
   - Eliminate candidates that are infrequent, leaving only those that are frequent.
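The loop above maps directly to code. A compact sketch (function names illustrative), assuming transactions are Python sets and minsup is an absolute count; candidate generation here joins frequent k-itemsets that share their first k-1 items:

  from itertools import combinations

  def apriori(transactions, minsup_count):
      # Level 1: frequent single items.
      counts = {}
      for t in transactions:
          for item in t:
              key = frozenset([item])
              counts[key] = counts.get(key, 0) + 1
      frequent = {s for s, c in counts.items() if c >= minsup_count}
      all_frequent = set(frequent)
      k = 1
      while frequent:
          # Join step: merge itemsets sharing a (k-1)-prefix (lexicographic order).
          sorted_sets = sorted(tuple(sorted(s)) for s in frequent)
          candidates = set()
          for i, a in enumerate(sorted_sets):
              for b in sorted_sets[i + 1:]:
                  if a[:k - 1] == b[:k - 1]:
                      candidates.add(frozenset(a) | frozenset(b))
                  else:
                      break
          # Prune step: every k-subset of a candidate must itself be frequent.
          candidates = {c for c in candidates
                        if all(frozenset(s) in frequent for s in combinations(c, k))}
          # Count step: one scan of the database per level.
          counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
          frequent = {c for c, n in counts.items() if n >= minsup_count}
          all_frequent |= frequent
          k += 1
      return all_frequent

On the five market-basket transactions with minsup count 3, the level-1 pass keeps the four frequent items (Bread, Milk, Beer, Diaper) and the join step generates exactly the six candidate pairs counted on the previous slide.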

Reducing Number of Comparisons

- Candidate counting: scan the database of transactions to determine the support of each candidate itemset.
- To reduce the number of comparisons, store the candidates in a hash structure: instead of matching each transaction against every candidate, match it against the candidates contained in the hashed buckets.

Generate Hash Tree

Suppose you have 15 candidate itemsets of length 3:
    {1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4},
    {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

You need:
- A hash function (here, buckets {1,4,7}, {2,5,8}, {3,6,9}, i.e. hash on item mod 3).
- A max leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds the max leaf size, split the node).
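A simplified hash-tree sketch for these 15 candidates (the class layout and names are my own illustration; the slides specify only the hash function and the max leaf size):

  MAX_LEAF = 3

  def h(item):
      return item % 3   # groups {1,4,7}, {2,5,8}, {3,6,9}

  class Node:
      def __init__(self, depth=0):
          self.depth = depth
          self.children = None      # bucket -> Node, once this node is split
          self.itemsets = []        # leaf contents

      def insert(self, itemset):
          if self.children is not None:                 # interior node: route down
              self.children[h(itemset[self.depth])].insert(itemset)
          else:
              self.itemsets.append(itemset)
              # Split an overflowing leaf, hashing on the next item position.
              if len(self.itemsets) > MAX_LEAF and self.depth < len(itemset):
                  self.children = {b: Node(self.depth + 1) for b in (0, 1, 2)}
                  for s in self.itemsets:
                      self.children[h(s[self.depth])].insert(s)
                  self.itemsets = []

  candidates = [(1,4,5), (1,2,4), (4,5,7), (1,2,5), (4,5,8), (1,5,9), (1,3,6),
                (2,3,4), (5,6,7), (3,4,5), (3,5,6), (3,5,7), (6,8,9), (3,6,7), (3,6,8)]
  root = Node()
  for c in candidates:
      root.insert(c)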

Association Rule Discovery: Hash Tree

[Figure, three slides: the candidate hash tree built from the 15 itemsets, showing how a node hashes on items 1, 4 or 7; on 2, 5 or 8; and on 3, 6 or 9.]

Subset Operation

Given a transaction t, what are the possible subsets of size 3?

Subset Operation Using Hash Tree

[Figure, three slides: the transaction {1 2 3 5 6} is recursively split (1 + {2 3 5 6}, 2 + {3 5 6}, 3 + {5 6}; then 1 2 + {3 5 6}, 1 3 + {5 6}, 1 5 + {6}, and so on) and routed down the candidate hash tree.]

By following only the hashed branches, the transaction is matched against 11 out of the 15 candidates.
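For reference, the size-3 subsets being routed can be enumerated directly; a sketch:

  from itertools import combinations

  # All size-3 subsets of the transaction {1, 2, 3, 5, 6}: C(5,3) = 10 subsets.
  # The hash tree visits only the buckets these subsets can hash into, instead
  # of comparing the transaction against all 15 candidates.
  t = (1, 2, 3, 5, 6)
  print(list(combinations(t, 3)))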

Factors Affecting Complexity

- Choice of minimum support threshold: lowering the support threshold results in more frequent itemsets; this may increase the number of candidates and the max length of frequent itemsets.
- Dimensionality (number of items) of the data set: more space is needed to store the support count of each item; if the number of frequent items also increases, both computation and I/O costs may increase.
- Size of database: since Apriori makes multiple passes, the run time of the algorithm may increase with the number of transactions.
- Average transaction width: transaction width increases with denser data sets; this may increase the max length of frequent itemsets and the number of hash-tree traversals (the number of subsets in a transaction increases with its width).

Compact Representation of Frequent Itemsets

- Some itemsets are redundant because they have identical support as their supersets.
- The number of frequent itemsets can explode, e.g. 3 × Σ_{k=1}^{10} C(10,k) in the slide's example.
- We therefore need a compact representation.

Maximal Frequent Itemset

An itemset is maximal frequent if none of its immediate supersets is frequent.

[Figure: the itemset lattice with the border between frequent and infrequent itemsets; the maximal itemsets sit just inside the border.]

Closed Itemset

An itemset is closed if none of its immediate supersets has the same support as the itemset.

    Itemset      Support
    {A,B,C}      2
    {A,B,D}      3
    {A,C,D}      2
    {B,C,D}      3
    {A,B,C,D}    2

Maximal vs Closed Itemsets

    TID   Items
    1     A,B,C
    2     A,B,C,D
    3     B,C,E
    4     A,C,D,E
    5     D,E

[Figure: the itemset lattice over A-E annotated with the transaction IDs supporting each itemset; itemsets supported by no transaction are marked.]

Maximal vs Closed Frequent Itemsets

With minimum support = 2: # closed = 9, # maximal = 4.

[Figure: the same lattice with the closed itemsets highlighted; some are closed and maximal, others closed but not maximal.]

Maximal vs Closed Itemsets

[Figure: maximal frequent itemsets are a subset of closed frequent itemsets, which are a subset of all frequent itemsets.]
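A sketch that recomputes the closed and maximal counts for this five-transaction example (helper names are illustrative):

  from itertools import combinations

  TRANSACTIONS = [{"A","B","C"}, {"A","B","C","D"}, {"B","C","E"},
                  {"A","C","D","E"}, {"D","E"}]
  ITEMS = sorted(set().union(*TRANSACTIONS))

  def sup(itemset):
      return sum(1 for t in TRANSACTIONS if itemset <= t)

  # All frequent itemsets at minsup count = 2, with their supports.
  frequent = {frozenset(c): sup(frozenset(c))
              for k in range(1, len(ITEMS) + 1)
              for c in combinations(ITEMS, k)
              if sup(frozenset(c)) >= 2}

  def immediate_supersets(s):
      return [s | {i} for i in ITEMS if i not in s]

  # Closed: no immediate superset has the same support.
  closed = {s for s in frequent
            if all(sup(t) < frequent[s] for t in immediate_supersets(s))}
  # Maximal: no immediate superset is frequent.
  maximal = {s for s in frequent
             if all(t not in frequent for t in immediate_supersets(s))}
  print(len(closed), len(maximal))   # 9 4, matching the slide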

Alternative Methods for Frequent Itemset Generation

- Traversal of the itemset lattice:
  - General-to-specific vs specific-to-general
  - Equivalence classes
  - Breadth-first vs depth-first
- Representation of the database:
  - Horizontal vs vertical data layout

FP-growth Algorithm

- Use a compressed representation of the database: an FP-tree.
- Once an FP-tree has been constructed, use a recursive divide-and-conquer approach to mine the frequent itemsets.

FP-tree Construction

Transaction database:
    TID   Items
    1     A,B
    2     B,C,D
    3     A,C,D,E
    4     A,D,E
    5     A,B,C
    6     A,B,C,D
    7     B,C
    8     A,B,C
    9     A,B,D
    10    B,C,E

[Figure: after reading TID=1 the tree is null -> A:1 -> B:1; after reading TID=2 a second branch null -> B:1 -> C:1 -> D:1 is added.]

FP-Tree Construction

[Figure: the complete FP-tree for the database, rooted at null with main branches A:7 -> B:5 and B:3 -> C:3. A header table (items A through E) links all nodes carrying the same item via pointers; these pointers are used to assist frequent itemset generation.]

FP-growth

Conditional pattern base for D:
    P = {(A:1, B:1, C:1), (A:1, B:1), (A:1, C:1), (A:1), (B:1, C:1)}

Recursively apply FP-growth on P.
Frequent itemsets found (with sup > 1): AD, BD, CD, ACD, BCD.
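A minimal FP-tree construction sketch (class and function names are my own; the slides give no code). Items are inserted in alphabetical order, which is the fixed order these slides use; classic FP-growth would instead order items by decreasing global support:

  from collections import defaultdict

  class FPNode:
      def __init__(self, item, parent):
          self.item, self.count, self.parent = item, 0, parent
          self.children = {}

  def build_fp_tree(transactions):
      root, header = FPNode(None, None), defaultdict(list)
      for t in transactions:
          node = root
          for item in sorted(t):                  # fixed item order, as in the slides
              if item not in node.children:
                  node.children[item] = FPNode(item, node)
                  header[item].append(node.children[item])
              node = node.children[item]
              node.count += 1                     # shared prefixes are compressed
      return root, header

  TRANSACTIONS = [{"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"},
                  {"A","B","C"}, {"A","B","C","D"}, {"B","C"}, {"A","B","C"},
                  {"A","B","D"}, {"B","C","E"}]
  root, header = build_fp_tree(TRANSACTIONS)
  # The branch null -> A -> B accumulates A:7 and B:5, matching the figure.
  print(root.children["A"].count, root.children["A"].children["B"].count)  # 7 5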

Tree Projection

[Figure: the set enumeration tree over A-E. Possible extensions: E(A) = {B,C,D,E}; E(ABC) = {D,E}.]

- Items are listed in lexicographic order.
- Each node P stores the following information:
  - The itemset for node P.
  - The list of possible lexicographic extensions of P: E(P).
  - A pointer to the projected database of its ancestor node.
  - A bitvector recording which transactions in the projected database contain the itemset.

Projected Database

Original database:
    TID   Items
    1     A,B
    2     B,C,D
    3     A,C,D,E
    4     A,D,E
    5     A,B,C
    6     A,B,C,D
    7     B,C
    8     A,B,C
    9     A,B,D
    10    B,C,E

For each transaction T, the projected transaction at node A is T ∩ E(A) (transactions that do not contain A project to the empty set):

    TID   Items
    1     B
    2     -
    3     C,D,E
    4     D,E
    5     B,C
    6     B,C,D
    7     -
    8     B,C
    9     B,D
    10    -
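The projection step as a sketch:

  # Keep only transactions containing A, then intersect with A's possible
  # extensions E(A) = {B, C, D, E}; everything else projects to the empty set.
  E_A = {"B", "C", "D", "E"}
  db = {1: {"A","B"}, 2: {"B","C","D"}, 3: {"A","C","D","E"}, 4: {"A","D","E"},
        5: {"A","B","C"}, 6: {"A","B","C","D"}, 7: {"B","C"}, 8: {"A","B","C"},
        9: {"A","B","D"}, 10: {"B","C","E"}}
  projected = {tid: (t & E_A if "A" in t else set()) for tid, t in db.items()}
  print(projected[3])   # {'C', 'D', 'E'}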

ECLAT

For each item, store a list of transaction ids (tids).

Horizontal data layout:
    TID   Items
    1     A,B,E
    2     B,C,D
    3     C,E
    4     A,C,D
    5     A,B,C,D
    6     A,E
    7     A,B
    8     A,B,C
    9     A,C,D
    10    B

Vertical data layout (tid-lists):
    A: 1,4,5,6,7,8,9
    B: 1,2,5,7,8,10
    C: 2,3,4,8,9
    D: 2,4,5,9
    E: 1,3,6

- Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets, e.g.
    A: {1,4,5,6,7,8,9} ∩ B: {1,2,5,7,8,10} -> AB: {1,5,7,8}
- Three traversal approaches: top-down, bottom-up and hybrid.
- Advantage: very fast support counting.
- Disadvantage: intermediate tid-lists may become too large for memory.
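The tid-list intersection as a sketch:

  # ECLAT-style support counting: the tid-list of a k-itemset is the
  # intersection of the tid-lists of two of its (k-1)-subsets.
  tidlists = {
      "A": {1, 4, 5, 6, 7, 8, 9},
      "B": {1, 2, 5, 7, 8, 10},
  }
  tid_AB = tidlists["A"] & tidlists["B"]
  print(sorted(tid_AB), len(tid_AB))   # [1, 5, 7, 8] -> support count 4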

Rule Generation

- Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f -> L − f satisfies the minimum confidence requirement.
- If {A,B,C,D} is a frequent itemset, the candidate rules are:
    ABC -> D,  ABD -> C,  ACD -> B,  BCD -> A,
    A -> BCD,  B -> ACD,  C -> ABD,  D -> ABC,
    AB -> CD,  AC -> BD,  AD -> BC,  BC -> AD,  BD -> AC,  CD -> AB
- If |L| = k, then there are 2^k − 2 candidate association rules (ignoring L -> ∅ and ∅ -> L).

Rule Generation

- How to efficiently generate rules from frequent itemsets?
- In general, confidence does not have an anti-monotone property: c(ABC -> D) can be larger or smaller than c(AB -> D).
- But the confidence of rules generated from the same itemset has an anti-monotone property, e.g. for L = {A,B,C,D}:
    c(ABC -> D) >= c(AB -> CD) >= c(A -> BCD)
- Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.

Rule Generation for Apriori Algorithm

[Figure: the lattice of rules generated from one frequent itemset; once a low-confidence rule is found, the rules below it in the lattice are pruned.]

- A candidate rule is generated by merging two rules that share the same prefix in the rule consequent.
- Example: join(CD => AB, BD => AC) produces the candidate rule D => ABC.
- Prune rule D => ABC if its subset AD => BC does not have high confidence.
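A sketch of rule generation from one frequent itemset by plain enumeration of its 2^k − 2 binary partitions (the Apriori-specific anti-monotone pruning described above is omitted here; names are illustrative):

  from itertools import combinations

  TRANSACTIONS = [
      {"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"},
  ]

  def support_count(itemset):
      return sum(1 for t in TRANSACTIONS if itemset <= t)

  def rules_from_itemset(L, minconf):
      # Enumerate every non-empty proper subset of L as a rule antecedent.
      L = frozenset(L)
      rules = []
      for r in range(1, len(L)):
          for lhs in map(frozenset, combinations(sorted(L), r)):
              conf = support_count(L) / support_count(lhs)
              if conf >= minconf:
                  rules.append((set(lhs), set(L - lhs), round(conf, 2)))
      return rules

  # Keeps the four rules with confidence >= 0.67 from the earlier
  # 'Mining Association Rules' slide; the two c=0.5 rules are rejected.
  for rule in rules_from_itemset({"Milk", "Diaper", "Beer"}, 0.6):
      print(rule)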
