    Artificial Intelligence and Data Mining: Course Slides (lect513.ppt)

    Slide 1
    Chapter 3 Basic Data Mining Techniques
    3.3 The K-Means Algorithm (for cluster analysis)
    5/12/2023, AI&DM BUPT

    Slide 2: 1. What is Cluster Analysis (clustering)?
      • Cluster: a collection of data objects
        - Similar to one another within the same cluster
        - Dissimilar to the objects in other clusters
      • High-quality clusters:
        - high intra-class similarity
        - low inter-class similarity
      • Cluster analysis: grouping a set of data objects into clusters
      • Clustering is unsupervised learning (unsupervised classification): there are no predefined classes. It is a form of learning by observation rather than learning by examples.
      • Typical applications
        - As a stand-alone tool to get insight into data distribution
        - As a preprocessing step for other algorithms

    Slide 3: Examples of Clustering Applications (I)
      • Cluster analysis in customer segmentation
      • When consuming the same kind of goods or services, different customers show different consumption characteristics. By studying these characteristics, a company can design different marketing mixes and so capture the maximum consumer surplus. This is the main purpose of customer segmentation.
      • Commonly used customer classification methods fall into three groups:
        - Experience-based description: decision makers divide customers into classes based on experience
        - Traditional statistics: customers are divided into classes using simple statistics of their attributes
        - Non-traditional statistical methods: clustering, an approach based on artificial intelligence techniques

    Slide 4: Examples of Clustering Applications (II)
      • Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
      • Insurance: identify groups of motor insurance policy holders with a high average claim cost
      • City planning: identify groups of houses according to their house types, values, and geographical locations

    Slides 5-8: Example
      [figures not recovered]

    Slide 9: 2. The K-Means Algorithm
      1. Choose a value for K, the total number of clusters.
      2. Randomly choose K points as cluster centers.
      3. Assign the remaining instances to their closest cluster center.
      4. Calculate a new cluster center for each cluster.
      5. Repeat steps 3-4 until the cluster centers do not change.

    Slide 10: The K-Means Clustering Algorithm: Example
      [figure not recovered]

    Slides 11-12
      [no text recovered]

    Slide 13
      • Problem: we may see a different final cluster configuration for each alternative choice of the initial centers.
      • Solution: try different initial centers, but set a maximum acceptable squared error.

    Slide 14: 3. General Considerations
      • Requires real-valued data.
      • The number of clusters present in the data must be chosen by the user.
      • Works best when the clusters in the data are of approximately equal size.
      • Attribute significance cannot be determined.
      • Lacks explanation capabilities.

    Slide 15: 4. Types of data in clustering analysis
      • Distance is normally used to measure the similarity or dissimilarity between two data objects.
      • 4.1 Interval-scaled variables
      • 4.2 Binary variables
      • 4.3 Nominal variables
      • 4.4 Ordinal variables
      • 4.5 Variables of mixed types

    Slide 16: 4.1 Interval-valued variables
      • For the Minkowski distance with parameter q (defined on slide 17):
        - If q = 1, d is the Manhattan distance: $d(i,j) = |x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + \cdots + |x_{ip}-x_{jp}|$
        - If q = 2, d is the Euclidean distance: $d(i,j) = \sqrt{|x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + \cdots + |x_{ip}-x_{jp}|^2}$
      • Requirements for a distance function:
        - $d(i,j) \ge 0$
        - $d(i,i) = 0$
        - $d(i,j) = d(j,i)$
        - $d(i,j) \le d(i,k) + d(k,j)$

    Slide 17: 4.1 Interval-valued variables (cont. 1)
      • Some popular measures include the Minkowski distance:
        $d(i,j) = \left(|x_{i1}-x_{j1}|^q + |x_{i2}-x_{j2}|^q + \cdots + |x_{ip}-x_{jp}|^q\right)^{1/q}$
        where $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ are two p-dimensional data objects, and q is a positive integer.

    Slide 18: 4.1 Interval-valued variables (cont. 2)
      • Standardize the data:
        - Find the mean: $m_f = \frac{1}{n}(x_{1f} + x_{2f} + \cdots + x_{nf})$
        - Calculate the mean absolute deviation: $s_f = \frac{1}{n}(|x_{1f}-m_f| + |x_{2f}-m_f| + \cdots + |x_{nf}-m_f|)$
        - Calculate the standardized measurement (z-score): $z_{if} = \frac{x_{if}-m_f}{s_f}$

    Slide 19: 4.2 Binary Variables
      • A contingency table for binary data (Object i against Object j) [table not recovered]
      • Simple matching coefficient (if the binary variable is symmetric): [formula not recovered]
      • Jaccard coefficient (if the binary variable is asymmetric): [formula not recovered]

    Slide 20: 4.2 Binary Variables (cont.)
      • Example [table not recovered]
        - gender is a symmetric attribute
        - the remaining attributes are asymmetric binary attributes
        - let the values Y and P be set to 1, and the value N be set to 0

    Slide 21: 4.3 Nominal Variables
      • A nominal variable can be treated as a generalization of the binary variable in that it can take more than 2 states, e.g., red, yellow, blue, green.
      • Method: simple matching (symmetric)
        $d(i,j) = \frac{p-m}{p}$
        where m is the number of matches and p is the total number of variables.

    Slide 22: 4.4 Ordinal Variables
      • Variables for which order is important, e.g., rank.
      • Can be treated like interval-scaled variables:
        - replace $x_{if}$ by its rank $r_{if} \in \{1, \ldots, M_f\}$, where $M_f$ is the number of ordered states of variable f
        - map the range of each variable onto [0, 1] by replacing the i-th object in the f-th variable by $z_{if} = \frac{r_{if}-1}{M_f-1}$
        - compute the dissimilarity using methods for interval-scaled variables

    Slide 23: 4.5 Variables of Mixed Types
      • A database may contain all types of variables: symmetric binary, asymmetric binary, nominal, ordinal, interval-valued.
      • One may use a weighted formula to combine their effects:
        $d(i,j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$
        - f is binary or nominal: $d_{ij}^{(f)} = 0$ if $x_{if} = x_{jf}$; $d_{ij}^{(f)} = 1$ otherwise
        - f is interval-based: use the normalized distance
        - f is ordinal: compute the ranks $r_{if}$ and $z_{if}$, then treat $z_{if}$ as interval-scaled

    Slide 24: 4.5 Variables of Mixed Types (cont.)
      • In the weighted formula, the indicator $\delta_{ij}^{(f)}$ is:
        - 0 if $x_{if}$ or $x_{jf}$ is missing
        - 0 if $x_{if} = x_{jf} = 0$ and variable f is asymmetric
        - 1 otherwise

    Slide 25: 5. More about clustering algorithms: K-means & K-medoids
      • Partitioning method: construct a partition of n objects into a set of k clusters
      • Similarity function: usually a distance
      • k-means (MacQueen 67): each cluster is represented by the center of the cluster
      • k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw 87): each cluster is represented by one of the objects in the cluster

    Slide 26: Comments on the K-Means Method
      • Strength
        - Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
      • Weakness
        - Applicable only when the mean is defined; what about categorical data?
        - Need to specify k, the number of clusters, in advance
        - Unable to handle noisy data and outliers
        - Not suitable for discovering clusters with non-convex shapes

    Slide 27: The K-Medoids Clustering Method
      • Find representative objects, called medoids, in clusters
      • PAM (Partitioning Around Medoids, 1987) starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering

    Slide 28: PAM (Partitioning Around Medoids)
      • PAM (Kaufman and Rousseeuw, 1987): use a real object to represent the cluster
        1. Select k representative objects arbitrarily
        2. For each pair of non-selected object h and selected object i, calculate the total swapping cost $TC_{ih}$
        3. For each pair of i and h, if $TC_{ih} < 0$, i is replaced by h; then assign each non-selected object to the most similar representative object
        4. Repeat steps 2-3 until there is no change
      • PAM works effectively for small data sets, but does not scale well for large data sets

    Slide 29: 6. Agglomerative Clustering (10.4)
      1. Place each instance into a separate partition.
      2. Until all instances are part of a single cluster:
         a. Determine the two most similar clusters.
         b. Merge the clusters chosen into a single cluster.
      3. Choose a clustering formed by one of the step 2 iterations as a final result.

    Slide 30: Agglomerative Clustering: An Example
      [figure not recovered]

    Slide 31
      [no text recovered]

    Slide 32: Summary: Requirements of Clustering Algorithm
      • Scalability
      • Ability to deal with different types of attributes
      • Discovery of clusters with arbitrary shape
      • Minimal requirements for domain knowledge to determine input parameters
      • Ability to deal with noisy data
      • Insensitivity to order of input records
      • High dimensionality
      • Incorporation of user-specified constraints
      • Interpretability and usability

    Slide 33: Challenges & Further Research
      • Considerable progress has been made in scalable clustering methods
        - Partitioning: k-means, k-medoids, CLARANS
        - Hierarchical: BIRCH, CURE
        - Density-based: DBSCAN, CLIQUE, OPTICS
        - Grid-based: STING, WaveCluster
        - Model-based: Autoclass, Denclue, Cobweb
      • Current clustering techniques do not address all the requirements adequately

    Slide 34: Homework
      1. Perform the third iteration of the k-means algorithm for the example given in the section "An Example Using K-Means". What are the new cluster centers?
      2. Suppose that the data mining task is to cluster the following 8 points (with (x, y) representing location) into 3 clusters:
         A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9)
         The distance function is Manhattan distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show only:
         (a) the three cluster centers after the first round of execution;
         (b) the final three clusters.
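The K-means steps listed in section 2 of the slides (choose K, pick initial centers, assign instances to the closest center, recompute centers, repeat until the centers stop changing) can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation. The function names `k_means` and `squared_error`, the explicit `initial_centers` argument (used instead of a random choice so the run is repeatable), and the six sample points are all assumptions made for this sketch. Euclidean distance is assumed.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def k_means(points, initial_centers, max_iter=100):
    """Assign points to the nearest center, recompute means, repeat until stable."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # empty cluster: keep the old center
        if new_centers == centers:  # centers did not change: stop
            break
        centers = new_centers
    return centers, labels


def squared_error(points, centers, labels):
    """Total squared distance from each point to its assigned center."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Illustrative data only (not taken from the slides).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = k_means(points, initial_centers=[(1, 1), (8, 8)])
sse = squared_error(points, centers, labels)
print(centers, labels, sse)
```

Because the result depends on the initial centers, one way to follow the advice on slide 13 is to call `k_means` with several different `initial_centers` and keep the run with the lowest `squared_error`, stopping once it falls below whatever threshold is considered acceptable.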

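The distance measures from section 4 of the slides can be illustrated with a short Python sketch. It covers the Minkowski distance (q = 1 gives Manhattan, q = 2 gives Euclidean), standardization by the mean absolute deviation, and dissimilarity for binary variables. The function names are hypothetical. The binary formulas use the common textbook counts a, b, c, d (both 1; 1 and 0; 0 and 1; both 0), which is an assumption here because the slides' contingency table did not survive extraction.

```python
def minkowski(x, y, q):
    """Minkowski distance between two equal-length numeric vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)


def standardize(column):
    """z-scores using the mean absolute deviation (not the standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    mad = sum(abs(v - mean) for v in column) / n  # zero if all values are equal
    return [(v - mean) / mad for v in column]


def binary_dissimilarity(x, y, asymmetric=False):
    """Dissimilarity of two 0/1 vectors.

    Symmetric variables: (b + c) / (a + b + c + d), based on simple matching.
    Asymmetric variables: (b + c) / (a + b + c), based on Jaccard;
    joint absences (d) are ignored.
    """
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    denom = a + b + c if asymmetric else a + b + c + d
    return (b + c) / denom if denom else 0.0


manhattan = minkowski((0, 0), (3, 4), q=1)
euclidean = minkowski((0, 0), (3, 4), q=2)
z = standardize([1, 2, 3, 6])
sym = binary_dissimilarity((1, 0, 1, 0, 0, 0), (1, 0, 1, 0, 1, 0))
asym = binary_dissimilarity((1, 0, 1, 0, 0, 0), (1, 0, 1, 0, 1, 0), asymmetric=True)
print(manhattan, euclidean, z, sym, asym)
```

Standardizing each attribute before computing distances keeps an attribute with a large numeric range from dominating the others. The mean absolute deviation is less sensitive to outliers than the standard deviation, since deviations are not squared.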