CHAMELEON-A-Hierarchical-Clustering-Algorithm-:变色龙的层次聚类算法课件.ppt

Slide 1. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
- Paper presentation in data mining class
- Presenters: 許明壽, 蘇建仲
- Date: 2001/12/18

Slide 2. About this paper
- Department of Computer Science and Engineering, University of Minnesota
- Authors: George Karypis, Eui-Hong (Sam) Han, Vipin Kumar
- Published in IEEE Computer, August 1999

Slide 3. Outline
- Problem definition
- Main algorithm
- Key features of CHAMELEON
- Experiment and related work
- Conclusion and discussion

Slide 4. Problem definition
- Clustering: intracluster similarity is maximized, intercluster similarity is minimized
- Problems of existing clustering algorithms:
  - Static model constraints
  - Break down when clusters are of diverse shapes, densities, and sizes
  - Susceptible to noise, outliers, and artifacts

Slide 5. Static model constraints
- Data space constraint (K-means, PAM, etc.): suitable only for data in metric spaces
- Cluster shape constraint (K-means, PAM, CLARANS): assume clusters are ellipsoidal or globular and of similar sizes
- Cluster density constraint (DBSCAN): points within a genuine cluster are density-reachable, and points across different clusters are not
- Similarity-determination constraint (CURE, ROCK): use a static model to determine the most similar clusters to merge

Slide 6. Problem with partitioning techniques (figure)
- (a) Clusters of widely different sizes
- (b) Clusters with convex shapes

Slide 7. Problem with hierarchical techniques (1/2) (figure)
- (c) and (d) will be chosen for merging when we consider only closeness
Slide 8. Problem with hierarchical techniques (2/2) (figure)
- (a) and (c) will be chosen for merging when we consider only inter-connectivity

Slide 9. Main algorithm
- A two-phase algorithm:
  - PHASE I: use a graph partitioning algorithm to cluster the data items into a large number of relatively small sub-clusters.
  - PHASE II: use an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining these sub-clusters.

Slide 10. Framework
- Data set → construct sparse graph → partition the graph → merge partitions → final clusters

Slide 11. Key features of CHAMELEON
- Modeling the data
- Modeling the cluster similarity
- Partition algorithm
- Merge schemes

Slide 12. Terms (arguments needed)
- K: number of neighbors in the K-nearest-neighbor graph
- MINSIZE: size threshold for the initial sub-clusters (splitting stops once every sub-cluster is smaller than MINSIZE)
- T_RI: threshold on relative inter-connectivity
- T_RC: threshold on relative closeness
- alpha: coefficient weighting RI against RC
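
As a reading aid, here is a minimal sketch of how the two phases and the parameters above might fit together. The helper names (build_knn_graph, partition_phase, merge_phase) and the default values are assumptions made for this outline, sketched next to the slides that explain them below; this is not the authors' implementation, and bisect_graph is a caller-supplied stand-in for hMETIS.

```python
# Hypothetical skeleton of the two-phase pipeline described above.  The helpers
# build_knn_graph, partition_phase and merge_phase are sketched later in this
# outline; `bisect_graph` is a caller-supplied stand-in for the hMETIS bisection.

def chameleon(points, bisect_graph, k=10, min_size=25, alpha=2.0):
    """Phase I: split a k-nearest-neighbor graph into many small sub-clusters.
    Phase II: agglomeratively merge them by relative interconnectivity (RI)
    and relative closeness (RC), weighted by alpha."""
    graph = build_knn_graph(points, k)                            # sparse graph G_k
    subclusters = partition_phase(graph, min_size, bisect_graph)  # PHASE I
    return merge_phase(graph, subclusters, alpha)                 # PHASE II
```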
Slide 13. Modeling the data
- K-nearest-neighbor graph approach. Advantages:
  - Data points that are far apart are completely disconnected in G_k
  - G_k captures the concept of neighborhood dynamically
  - The edge weights of dense regions in G_k tend to be large, and the edge weights of sparse regions tend to be small

Slide 14. Example of a k-nearest-neighbor graph (figure)
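
A small sketch of building such a k-nearest-neighbor graph. The edge weight 1/(1 + distance) is an assumed similarity, chosen only so that dense regions get heavy edges as the slide describes, and the brute-force distance matrix mirrors the O(n^2) high-dimensional case noted later in the performance analysis.

```python
import numpy as np

def build_knn_graph(points, k):
    """Return an adjacency dict {i: {j: weight}} linking each point to its
    k nearest neighbors; points that are far apart end up disconnected."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # never link a point to itself
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in np.argsort(dists[i])[:k]:   # k nearest neighbors of i
            w = 1.0 / (1.0 + dists[i, j])    # dense regions -> heavy edges
            graph[i][int(j)] = w
            graph[int(j)][i] = w             # keep G_k symmetric
    return graph
```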
Slide 15. Modeling the cluster similarity (1/2)
- Relative interconnectivity (RI) and relative closeness (RC) between sub-clusters C_i and C_j:

  RI(C_i, C_j) = \frac{|EC_{\{C_i, C_j\}}|}{\frac{1}{2}\bigl(|EC_{C_i}| + |EC_{C_j}|\bigr)}

  RC(C_i, C_j) = \frac{\bar{S}_{EC_{\{C_i, C_j\}}}}{\frac{|C_i|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_i}} + \frac{|C_j|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_j}}}

- Here EC_{\{C_i, C_j\}} is the set of edges connecting C_i and C_j, EC_{C_i} is the edge cut that bisects C_i into two roughly equal parts, |\cdot| denotes the sum of edge weights, and \bar{S} denotes the average edge weight.
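
A direct transcription of the two formulas into code. The min-cut edge weights of each sub-cluster (cut_i, cut_j) are taken as inputs here rather than computed, since the paper obtains them with a graph bisection; singleton or disconnected clusters simply score zero in this sketch.

```python
def ri_rc(graph, ci, cj, cut_i, cut_j):
    """Relative interconnectivity RI and relative closeness RC of sub-clusters
    Ci, Cj on the k-NN graph.  `cut_i` / `cut_j` are the edge weights of the
    min-cut bisection of each sub-cluster, passed in precomputed."""
    ci_set, cj_set = set(ci), set(cj)
    cross = [w for u in ci_set for v, w in graph[u].items() if v in cj_set]
    if not cross or not cut_i or not cut_j:
        return 0.0, 0.0
    ec_ij = sum(cross)                               # |EC_{Ci,Cj}|
    ri = ec_ij / ((sum(cut_i) + sum(cut_j)) / 2.0)
    s_ij = ec_ij / len(cross)                        # mean weight of connecting edges
    s_i, s_j = sum(cut_i) / len(cut_i), sum(cut_j) / len(cut_j)
    ni, nj = len(ci_set), len(cj_set)
    rc = s_ij / (ni / (ni + nj) * s_i + nj / (ni + nj) * s_j)
    return ri, rc
```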
Slide 16. Modeling the cluster similarity (2/2) (figure)
- If both relative interconnectivity and relative closeness are considered, (c) and (d) will be merged

Slide 17. Partition algorithm (PHASE I)
- What: finding the initial sub-clusters
- Why: RI and RC cannot be accurately calculated for clusters containing only a few data points
- How: utilize a multilevel graph partitioning algorithm (hMETIS)
  - Coarsening phase
  - Partitioning phase
  - Uncoarsening phase

Slide 18. Partition algorithm (cont.)
- Initially, all points belong to the same cluster
- Repeat until every cluster is smaller than MINSIZE:
  - Select the largest cluster and use hMETIS to bisect it
- Balance constraint: split C_i into C_iA and C_iB such that each sub-cluster contains at least 25% of the nodes of C_i
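
A sketch of this Phase I loop. Here bisect_graph stands in for the hMETIS bisection (no hMETIS binding is assumed) and is expected to honor the 25% balance constraint from the slide.

```python
def partition_phase(graph, min_size, bisect_graph):
    """Phase I: start with all points in one cluster and repeatedly bisect the
    largest cluster until every cluster has fewer than `min_size` points.
    `bisect_graph(graph, nodes)` stands in for the hMETIS bisection and should
    return two node lists, each holding at least 25% of `nodes`."""
    clusters = [list(graph)]
    while max(len(c) for c in clusters) >= min_size:
        clusters.sort(key=len, reverse=True)
        largest = clusters.pop(0)
        part_a, part_b = bisect_graph(graph, largest)
        clusters += [list(part_a), list(part_b)]
    return clusters
```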
Slide 19. (figure only)

Slide 20. Merge schemes (PHASE II)
- What: merging sub-clusters using a dynamic framework
- How: find and merge the pair of sub-clusters that are the most similar
  - Scheme 1: merge every pair with RI(C_i, C_j) \ge T_{RI} and RC(C_i, C_j) \ge T_{RC}
  - Scheme 2: merge the pair that maximizes RI(C_i, C_j) \cdot RC(C_i, C_j)^{\alpha}
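
A sketch of the Phase II loop under scheme 2; scheme 1 would instead merge every pair whose RI and RC exceed the thresholds T_RI and T_RC. It reuses the ri_rc sketch given after slide 15 and approximates each cluster's internal edge cut by all of its internal edges rather than a true min-cut bisection; the heap-based priority queue mentioned on slide 32 is also omitted, so the scan here is quadratic per merge.

```python
def merge_phase(graph, subclusters, alpha):
    """Phase II, scheme 2: repeatedly merge the pair of sub-clusters with the
    highest RI(Ci,Cj) * RC(Ci,Cj)**alpha, stopping when no two remaining
    clusters are connected.  Uses the `ri_rc` sketch above, with each
    cluster's internal cut approximated by all of its internal edges."""
    def internal_edges(c):
        cs = set(c)
        return [w for u in cs for v, w in graph[u].items() if v in cs and u < v]

    clusters = [list(c) for c in subclusters]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ri, rc = ri_rc(graph, clusters[i], clusters[j],
                               internal_edges(clusters[i]),
                               internal_edges(clusters[j]))
                score = ri * rc ** alpha
                if score > best:
                    best, pair = score, (i, j)
        if pair is None:                     # only disconnected clusters remain
            break
        i, j = pair
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```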
Slide 21. Experiment and related work
- Introduction of CURE
- Introduction of DBSCAN
- Results of experiment
- Performance analysis
Slide 22. Introduction of CURE (1/n)
- CURE: Clustering Using REpresentatives
- 1. Properties:
  - Fits non-spherical shapes
  - Shrinking can help to dampen the effects of outliers
  - Multiple representative points are chosen for non-spherical clusters
  - In each iteration, scattered points are chosen as representatives and shrunk toward the centroid by a given ratio before being used in the merge procedure
  - Random sampling of the data set makes it suitable for large databases

Slide 23. Introduction of CURE (2/n)
- 2. Drawbacks:
  - The partitioning method cannot guarantee that the chosen data points are good representatives
  - Clustering accuracy depends on the parameters below:
    - (1) Shrink factor s: CURE always finds the right clusters for s values from 0.2 to 0.7
    - (2) Number of representative points c: CURE always found the right clusters for values of c greater than 10
    - (3) Number of partitions p: with as many as 50 partitions, CURE still discovered the desired clusters
    - (4) Random sample size r: (a) for sample sizes up to 2000, the clusters found were of poor quality; (b) from 2500 sample points and above (about 2.5% of the data set size), CURE always finds the correct clusters
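
To make the shrinking of representative points above concrete, here is a toy sketch: up to c scattered points are picked by a simple farthest-point heuristic (an assumption; CURE specifies "well scattered" points) and pulled toward the cluster centroid by the shrink factor s whose 0.2 to 0.7 range is quoted above. This is an illustration, not the CURE implementation.

```python
import numpy as np

def shrink_representatives(cluster_points, c=10, s=0.5):
    """Pick up to `c` scattered points of one cluster and shrink them toward
    the centroid by factor `s`, illustrating CURE's representative points.
    The farthest-point selection is a stand-in for CURE's scatter step."""
    pts = np.asarray(cluster_points, dtype=float)
    centroid = pts.mean(axis=0)
    # start from the point farthest from the centroid, then greedily add
    # points far away from everything already chosen
    chosen = [pts[int(np.argmax(np.linalg.norm(pts - centroid, axis=1)))]]
    while len(chosen) < min(c, len(pts)):
        d = np.min([np.linalg.norm(pts - r, axis=1) for r in chosen], axis=0)
        chosen.append(pts[int(np.argmax(d))])
    reps = np.asarray(chosen)
    return reps + s * (centroid - reps)      # move each rep toward the centroid
```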
Slide 24. 3. Clustering algorithm: representative points (figure)

Slide 25. Merge procedure (figure)

Slide 26. Introduction of DBSCAN (1/n)
- DBSCAN: Density-Based Spatial Clustering of Applications with Noise
- 1. Properties:
  - Can discover clusters of arbitrary shape
  - Each cluster has a typical density of points which is higher than outside the cluster
  - The density within the areas of noise is lower than the density in any of the clusters
  - Only the parameter MinPts needs to be supplied as input
  - Easy to implement in C++ using an R*-tree
  - Runtime grows almost linearly with the number of points; time complexity is O(n log n)
Slide 27. Introduction of DBSCAN (2/n)
- 2. Drawbacks:
  - Cannot be applied to polygons
  - Cannot be applied to high-dimensional feature spaces
  - Cannot process the shape of the k-dist graph when there are multiple features
  - Not suited to large databases, because no method is applied to reduce the spatial database
- 3. Definitions:
  - Eps-neighborhood of a point p: N_{Eps}(p) = \{\, q \in D \mid dist(p, q) \le Eps \,\}

Slide 28. Introduction of DBSCAN (3/n)
- 4. p is directly density-reachable from q if p \in N_{Eps}(q) and |N_{Eps}(q)| \ge MinPts (core point condition)
  - Directly density-reachable is symmetric when p and q are both core points; it is asymmetric when one is a core point and the other a border point
- 5. p is density-reachable from q if there is a chain of points between p and q, each directly density-reachable from the previous one
  - Density-reachable is transitive, but not symmetric in general
  - Density-reachable is symmetric for core points

Slide 29. Introduction of DBSCAN (4/n)
- 6. A point p is density-connected to a point q if there is a point s such that both p and q are density-reachable from s
  - Density-connected is a symmetric and reflexive relation
  - A cluster is defined to be a maximal set of density-connected points with respect to density-reachability
  - Noise is the set of points that do not belong to any cluster
- 7. How to find a cluster C?
  - Maximality: \forall p, q: if p \in C and q is density-reachable from p, then q \in C
  - Connectivity: \forall p, q \in C: p is density-connected to q
- 8. How to find noise?
  - \forall p: if p does not belong to any cluster, then p is a noise point
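
A compact, didactic rendering of the definitions above, assuming Euclidean distance and brute-force Eps-neighborhood queries rather than the R*-tree the slides mention.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Return a label per point: 0, 1, ... for clusters, -1 for noise,
    following the definitions above (core points grow clusters through
    directly density-reachable neighbors)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    labels = [None] * n                        # None = not yet visited

    def neighborhood(i):                       # N_Eps(p), brute force
        return [j for j in range(n)
                if np.linalg.norm(pts[i] - pts[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighborhood(i)
        if len(seeds) < min_pts:               # not a core point: noise for now
            labels[i] = -1                     # (may be relabeled as border)
            continue
        cluster_id += 1
        labels[i] = cluster_id
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:                # former noise is really a border
                labels[j] = cluster_id         # point of this cluster
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighborhood(j)
            if len(j_neighbors) >= min_pts:    # j is a core point: keep growing
                seeds.extend(j_neighbors)
    return labels
```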
Slide 30. Results of experiment (figures)

Slide 31. Performance analysis (1/2)
- Time to construct the k-nearest-neighbor graph:
  - Low-dimensional data sets: based on k-d trees, overall complexity O(n log n)
  - High-dimensional data sets: k-d trees are not applicable, overall complexity O(n^2)
- Finding the initial sub-clusters:
  - Obtaining m clusters by repeatedly partitioning successively smaller graphs has overall computational complexity O(n log(n/m)), which is bounded by O(n log n)
  - A faster partitioning algorithm can obtain the initial m clusters in time O(n + m log m) using a multilevel m-way partitioning algorithm

Slide 32. Performance analysis (2/2)
- Merging sub-clusters using the dynamic framework:
  - The time to compute the internal inter-connectivity and internal closeness of each initial cluster is O(nm)
  - The time to find the most similar pair of clusters to merge is O(m^2 log m), using a heap-based priority queue
  - So the overall complexity of CHAMELEON is O(n log n + nm + m^2 log m)

Slide 33. Conclusion and discussion
- A dynamic model based on relative interconnectivity and relative closeness
- The paper ignores the issue of scaling to large data sets
- Open questions: other graph representation methodologies? other partitioning algorithms?
