FAFU机器学习 5-1-2中文.pptx

Evaluation Methods
- Holdout method (留出法)
- K-fold cross-validation (K折交叉验证法)
- Bootstrapping (自助法)

Holdout method
- Arguably the simplest model-evaluation technique: split the dataset into two disjoint parts, a training set and a test set.
- Keep in mind: there are many ways to split a dataset, and different splits lead to different performance estimates. Variation of the underlying sample statistics along the feature axes also remains an issue, and it becomes more pronounced with small datasets.
- Stratified sampling (分层抽样) keeps the class proportions the same in both parts.
- Repeat the holdout k times with different random seeds and report the average performance over the k repetitions:
  Acc_avg = (1/k) * Σ_{j=1..k} Acc_j
- Keep in mind: the size of the training set affects performance; using roughly 2/3 to 4/5 of the dataset as training data is a common choice. A minimal split is sketched below.
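The slides stop at the description, so the following is only a sketch of a repeated, stratified holdout, assuming scikit-learn is available; the iris dataset and the logistic-regression model are placeholders, not part of the original material.

```python
# Repeated stratified holdout sketch (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

accuracies = []
for seed in range(5):                      # repeat the holdout k = 5 times
    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=1 / 3,                   # keep ~2/3 of the data for training
        stratify=y,                        # stratified sampling: preserve class ratios
        random_state=seed,                 # a different random seed per repetition
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))

print("mean holdout accuracy:", np.mean(accuracies))
```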

K-fold cross-validation (K折交叉验证法)
- Probably the most common, but more computationally intensive, approach.
- Split the dataset into k disjoint parts, called folds; typical choices for k are 5, 10 or 20.
- K-fold cross-validation is a special case of cross-validation in which we iterate over the dataset k times: in each round, one fold is used for validation and the remaining k-1 folds are merged into a training subset for model evaluation.
- Keep in mind: the larger the number of folds, the better the error estimate, but the longer the program takes to run. A practical rule is to use at least 10 folds when you can.
- Leave-One-Out (留一法, LOO) is the special case where k equals the number of samples; LOOCV can be useful for small datasets. A sketch of both follows.
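A k-fold loop can be written by hand, but the sketch below leans on scikit-learn (an assumed dependency); cross_val_score handles the fold bookkeeping, and LeaveOneOut is the k = n special case. The dataset and model are again placeholders.

```python
# K-fold cross-validation sketch (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 10-fold CV: each sample is used for validation exactly once.
scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=10, shuffle=True, random_state=0))
print("10-fold mean accuracy:", scores.mean())

# Leave-One-Out: the special case where k equals the number of samples.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOO mean accuracy:", loo_scores.mean())
```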

Bootstrapping (自助法)
- A bootstrap sampling technique for estimating a sampling distribution: the idea is to generate new data from a population by repeatedly sampling from the original dataset with replacement.
- In each iteration, approximately 0.632n samples are selected as the bootstrap training set, and the remaining ~0.368n out-of-bag samples are reserved for testing. A NumPy sketch follows.
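A single bootstrap iteration only needs NumPy: draw n indices with replacement for training and keep the indices that were never drawn as the out-of-bag test set. The data here is synthetic, purely for illustration.

```python
# Bootstrap / out-of-bag sketch (NumPy only, synthetic data).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
data = rng.normal(size=n)                  # stand-in dataset

boot_idx = rng.integers(0, n, size=n)      # sample n indices with replacement
oob_mask = np.ones(n, dtype=bool)
oob_mask[boot_idx] = False                 # indices never drawn are out-of-bag

train, test = data[boot_idx], data[oob_mask]
print("out-of-bag fraction:", oob_mask.mean())   # ~0.368 on average
```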

Evaluation Metrics

Metrics for binary classification: accuracy
- Measuring model performance with accuracy: the fraction of correctly classified samples,
  Acc(y, ŷ) = (1/n) * Σ_{i=1..n} 1(ŷ_i = y_i)
- Accuracy is really only suitable when there is an equal number of observations in each class (which is rarely the case) and when all predictions and prediction errors are equally important, which is often not the case either.
- It is therefore not always a useful metric and can be misleading. Example: spam classification. If 99% of emails are real and 1% are spam, we could build a model that predicts every email is real; its accuracy is 99%, but it is horrible at actually classifying spam and fails at its original purpose, as the sketch below shows.
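The spam example can be reproduced in a few lines; the labels below are synthetic, chosen only to match the 99%/1% split described above.

```python
# Why accuracy misleads on imbalanced data (synthetic labels).
import numpy as np

y_true = np.array([0] * 990 + [1] * 10)   # 99% real (0), 1% spam (1)
y_pred = np.zeros_like(y_true)            # a "model" that predicts everything as real

accuracy = (y_pred == y_true).mean()
spam_caught = ((y_pred == 1) & (y_true == 1)).sum()
print(accuracy)      # 0.99 -- looks great
print(spam_caught)   # 0    -- but no spam is ever detected
```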

Confusion matrix
- One of the most comprehensive ways to represent the result of evaluating a binary classifier: it tabulates true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN).

Error rate and accuracy
- The error rate is the sum of all false predictions divided by the total number of predictions, and the accuracy is the sum of correct predictions divided by the total number of predictions:
  Err = (FP + FN) / (TP + FP + FN + TN)
  Acc = (TP + TN) / (TP + FP + FN + TN)

Metrics from the confusion matrix: Precision (查准率)
- Precision measures how many of the samples predicted as positive are actually positive:
  P = TP / (TP + FP)
- Precision is used as a performance metric when the goal is to limit the number of false positives. Example: predicting whether a new drug will be effective in treating a disease in clinical trials.

Metrics from the confusion matrix: Recall (查全率, 召回率)
- Recall measures how many of the positive samples are captured by the positive predictions:
  R = TP / (TP + FN)
- Recall is used as a performance metric when we need to identify all positive samples. Example: finding people who are sick. A short computation of these quantities follows.
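These quantities can be read off scikit-learn's confusion_matrix, or computed with precision_score and recall_score; the labels below are made up for illustration and are not from the slides.

```python
# Confusion matrix, precision and recall sketch (assumes scikit-learn).
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                      # 5 1 2 2
print(precision_score(y_true, y_pred))     # TP / (TP + FP) = 2/3
print(recall_score(y_true, y_pred))        # TP / (TP + FN) = 2/4
```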

16、lTo get higher precision by increasing threshold lTo get higher recall by reducing threshold 2023-11-4Model EvaluationLesson 4-19Metrics from the confusion matrixTradeoff between Precision and RecallTradeoff between Precision and Recall2023-11-4Model EvaluationLesson 4-20Metrics from the confusion m

17、atrixTradeoff between Precision and RecallTradeoff between Precision and RecallF1:F-score or F-measurelF-score:is with the harmonic mean(调和平均数)of precision and recallRPRPF 21AlgorithmPRAverageF1A10.50.40.450.444A20.70.10.40.175A30.0210.510.03922023-11-4Model EvaluationLesson 4-21Metrics from the con

18、fusion matrixGeneral F-measure:FFPFNTPTPRPRPF22222)1()1()1(lWhen=1,becoming F1lWhen 1,placing more emphasis on false negative,and weighing recall higher than precisionlWhen 1,attenuating the influence of false negative,and weighing recall lower than precision2023-11-4Model EvaluationLesson 4-22Metri
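The F1 column of the table can be verified numerically. The small helper below is not from the slides; it simply evaluates the Fβ formula above from precision and recall, so it matches the table rows directly.

```python
# F1 and general F-beta from precision and recall (checks the table above).
def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    return (1 + beta**2) * p * r / (beta**2 * p + r)

for name, p, r in [("A1", 0.5, 0.4), ("A2", 0.7, 0.1), ("A3", 0.02, 1.0)]:
    print(name, round(f_beta(p, r), 3))        # 0.444, 0.175, 0.039

# beta > 1 weighs recall more heavily; beta < 1 weighs precision more heavily.
print(round(f_beta(0.5, 0.4, beta=2.0), 3))    # recall-oriented F2 for row A1
```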

Receiver operating characteristic (ROC, 受试者工作特征)
- The ROC curve considers all possible thresholds for a given classifier and plots the true positive rate (TPR) against the false positive rate (FPR):
  TPR = TP / (TP + FN)
  FPR = FP / (FP + TN)
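Given predicted scores rather than hard labels, the curve and its area can be obtained with scikit-learn (an assumed dependency): roc_curve sweeps every threshold and roc_auc_score summarizes the curve. The scores below are synthetic.

```python
# ROC curve and AUC sketch (assumes scikit-learn, synthetic scores).
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.4, 0.8, 0.35, 0.6, 0.7, 0.9]   # classifier scores, not labels

fpr, tpr, thresholds = roc_curve(y_true, scores)     # FPR/TPR at every threshold
print(list(zip(fpr, tpr)))
print("AUC:", roc_auc_score(y_true, scores))
```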

Area under the ROC curve (AUC)
- The AUC summarizes classifier performance across all thresholds in a single number.

Model Selection
- With an evaluation method and performance metrics in hand, it looks as if we can now evaluate and compare learners: measure some performance metric with an experimental evaluation method, then compare the results. However:
  - First, what we want to compare is generalization performance, while an experimental evaluation method gives us performance on a test set; the two comparisons may not agree.
  - Second, test-set performance depends heavily on the choice of the test set itself: not only do test sets of different sizes give different results, even test sets of the same size give different results if they contain different samples.
  - Third, many machine learning algorithms have inherent randomness; even with the same parameter settings on the same test set, repeated runs can produce different results.
- Statistical hypothesis testing (hypothesis test) gives us an important basis for comparing learner performance. Based on the result of a hypothesis test we can:
  - test a hypothesis about the generalization performance of a single learner;
  - compare the performance of multiple learners: if learner A looks better than learner B on the test set, is A's generalization performance statistically significantly better than B's, and how much confidence can we place in that conclusion?

A hypothesis-testing problem
- Consider a model evaluated with the holdout method. Suppose the evaluation was performed 5 times and the accuracies are 0.99, 0.98, 0.99, 0.94, 0.95.

- Can we say that the mean accuracy is different from 0.97?
- Consider the grades of two models: A had 15, 10, 12, 19, 5, 7 and B had 14, 11, 11, 12, 6, 7. Can we say A had better grades than B?
- A statistical test aims to answer such questions; a first sketch is given below.
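One way to frame the first question is a one-sample t-test of the five accuracies against the hypothesized mean 0.97. This is an illustration using SciPy, not something shown on the slides, and it treats the five repetitions as independent samples.

```python
# One-sample t-test sketch: is the mean accuracy different from 0.97? (assumes SciPy)
from scipy import stats

accuracies = [0.99, 0.98, 0.99, 0.94, 0.95]
t_stat, p_value = stats.ttest_1samp(accuracies, popmean=0.97)
print(t_stat, p_value)   # a large p-value means we cannot reject the null hypothesis
```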

Confidence interval (置信区间): point estimation vs. interval estimation
- Point estimation uses a sample statistic to estimate a population parameter; since the statistic is a single value on the number line, the result is also expressed as a single point, hence the name. A point estimate gives a value for the unknown parameter but says nothing about how reliable that value is, i.e. how far the estimate may deviate from the parameter's true value.
- Interval estimation: given a confidence level, determine from the estimate a range in which the true value is likely to lie; the range is usually centered on the estimate and is called the confidence interval.
- Key notions: standard deviation (标准差) vs. standard error (标准误差), and the 95% confidence interval.
- Suppose X follows a normal distribution, X ~ N(μ, σ²). Repeatedly draw samples of size n; the sample mean is M = (X1 + X2 + ... + Xn) / n, and by the law of large numbers and the central limit theorem M follows M ~ N(μ, σ²/n). A sketch of the interval computation follows.
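A 95% confidence interval for the mean can be built from the sample mean and the standard error, M ± z · s/√n. The sketch below uses the normal approximation (z ≈ 1.96) implied by the slide, and also shows the t-based interval that is more usual for small samples; the data reuses the five accuracies from the earlier example.

```python
# 95% confidence interval for a mean (NumPy/SciPy assumed).
import numpy as np
from scipy import stats

sample = np.array([0.99, 0.98, 0.99, 0.94, 0.95])
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(len(sample))    # standard error of the mean

z = stats.norm.ppf(0.975)                          # ~1.96 for a 95% interval
print(mean - z * sem, mean + z * sem)

# For small samples, the t distribution is the safer choice:
t = stats.t.ppf(0.975, df=len(sample) - 1)
print(mean - t * sem, mean + t * sem)
```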

Hypothesis testing and statistical significance
- The process of hypothesis testing:
  - Null hypothesis: the null hypothesis is a model of the system based on the assumption that the apparent effect was actually due to chance.
  - p-value: the p-value is the probability of the apparent effect under the null hypothesis.
  - Interpretation: based on the p-value, we conclude that the effect is either statistically significant or not.

Paired t-test
- The t-test is an example of a parametric test. It is applicable when the null hypothesis states that the difference between two responses has mean zero and unknown variance. The t-test assumes that the data are distributed according to a Gaussian distribution. A sketch is given below.
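If two models are scored on the same folds, their per-fold scores are paired and scipy.stats.ttest_rel applies (scipy.stats.ttest_ind is for independent samples). The grades from the earlier A/B example are reused here under the assumption that they are paired measurements.

```python
# Paired t-test sketch on per-fold scores of two models (assumes SciPy).
from scipy import stats

model_a = [15, 10, 12, 19, 5, 7]   # per-fold grades of model A
model_b = [14, 11, 11, 12, 6, 7]   # the same folds scored by model B

t_stat, p_value = stats.ttest_rel(model_a, model_b)
print(t_stat, p_value)   # reject the null of "no difference" only if p is small
```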

- For example, we might run 5-fold cross-validation and compute an F-score for every fold. Perhaps the F-scores are 92.4, 93.9, 96.1, 92.2 and 94.4, which gives an average F-score of 93.8 over the 5 folds. We can compute the standard deviation of this set of F-scores, assume that the distribution of scores is approximately Gaussian, and calculate the 95% confidence interval.
- In Python, the relevant tests are scipy.stats.ttest_ind and scipy.stats.ttest_rel:
  https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind

Further reading on the t-test:
- How the t-test works and how to implement it in Python (https:/)
- Running a t-test with Python (https:/)

Summary
- Evaluation methods: Holdout method (留出法), K-fold cross-validation (K折交叉验证法), Bootstrapping (自助法)
- Evaluation metrics: Accuracy, Precision, Recall, F-score, AUC
- Model selection
