Gradient Descent (v2) (lecture slides)

Review: Gradient Descent

In step 3 (of the machine learning framework), we have to solve the following optimization problem:

$\theta^\ast = \arg\min_\theta L(\theta)$, where $L$ is the loss function and $\theta$ the parameters.

Suppose that $\theta$ has two variables $\{\theta_1, \theta_2\}$. Starting from an initial $\theta^0$, each update moves the parameters against the gradient:

$\theta^i = \theta^{i-1} - \eta \nabla L(\theta^{i-1})$

The movement at each step is $-\eta \nabla L(\theta)$. The gradient is the normal direction of the contour lines of the loss (translated from the original: Loss 的等高線的法線方向), so each step moves perpendicular to the contour, downhill.

Tip 1: Tuning your learning rates

Set the learning rate $\eta$ in $\theta^i = \theta^{i-1} - \eta \nabla L(\theta^{i-1})$ carefully. Plotting the loss against the number of parameter updates: a very large $\eta$ makes the loss blow up; a large one makes the loss bounce around without reaching the minimum; a small one decreases the loss but very slowly; one that is "just right" ("just make" in the slides) decreases it quickly to the minimum. If there are more than three parameters you cannot visualize the loss surface itself, but you can always visualize this loss-vs-updates curve.
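To make the update rule concrete, here is a minimal Python sketch of vanilla gradient descent. The quadratic toy loss, the function names, and the hyperparameter values are assumptions for illustration, not from the slides.

```python
import numpy as np

def gradient_descent(grad_L, theta0, eta=0.1, num_updates=100):
    """Vanilla gradient descent: theta^i = theta^(i-1) - eta * grad L(theta^(i-1))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_updates):
        theta = theta - eta * grad_L(theta)
    return theta

# Toy example (assumed): L(theta) = theta1^2 + theta2^2, so grad L = 2 * theta.
grad_L = lambda theta: 2.0 * theta
print(gradient_descent(grad_L, theta0=[3.0, -4.0]))  # approaches [0, 0]
```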

Adaptive Learning Rates

Adagrad: divide the learning rate of each parameter by the root mean square (RMS) of its previous derivatives, so the effective learning rate becomes parameter dependent. Let $w$ be one parameter and $g^t = \partial L(\theta^t)/\partial w$.

Vanilla gradient descent (with 1/t decay $\eta^t = \eta/\sqrt{t+1}$): $w^{t+1} \leftarrow w^t - \eta^t g^t$.

Adagrad: $w^{t+1} \leftarrow w^t - \dfrac{\eta^t}{\sigma^t} g^t$, where $\sigma^t$ is the RMS of the previous derivatives of $w$:

$\sigma^t = \sqrt{\dfrac{1}{t+1}\sum_{i=0}^{t}(g^i)^2}$

The $\sqrt{t+1}$ factors in $\eta^t$ and $\sigma^t$ cancel, so the update simplifies to

$w^{t+1} \leftarrow w^t - \dfrac{\eta}{\sqrt{\sum_{i=0}^{t}(g^i)^2}}\, g^t$

Contradiction?

Vanilla gradient descent: larger gradient, larger step. In Adagrad, the gradient in the numerator says larger gradient, larger step, while the denominator says larger gradient, smaller step.

Intuitive Reason: how surprising it is (creating a contrast effect; translated from 造成反差的效果). Compare two gradient histories:

    g^0     g^1     g^2     g^3     g^4
    0.001   0.001   0.003   0.002   0.1    (the contrast is especially large)
    10.8    20.9    31.7    12.1    0.1    (the contrast is especially small)

The same gradient 0.1 is surprisingly large after the first history and surprisingly small after the second. Dividing by the RMS of past derivatives measures each new gradient against its own history.
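A minimal sketch of the Adagrad update in Python, following the simplified formula above. The toy loss, the default $\eta$, and the eps term are assumptions for illustration.

```python
import numpy as np

def adagrad(grad_L, theta0, eta=1.0, num_updates=100, eps=1e-8):
    """Adagrad: divide each parameter's learning rate by the root of the
    sum of squares of its previous derivatives."""
    theta = np.asarray(theta0, dtype=float)
    sum_sq_grad = np.zeros_like(theta)  # per-parameter sum_{i=0..t} (g^i)^2
    for _ in range(num_updates):
        g = grad_L(theta)
        sum_sq_grad += g ** 2
        theta = theta - eta * g / (np.sqrt(sum_sq_grad) + eps)  # eps avoids division by zero (assumption)
    return theta

# Toy example (assumed): an elongated bowl L = theta1^2 + 100 * theta2^2.
grad_L = lambda theta: np.array([2.0 * theta[0], 200.0 * theta[1]])
print(adagrad(grad_L, theta0=[3.0, 1.0]))  # moves toward the minimum at [0, 0]
```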

Larger gradient, larger steps?

Within one parameter, a larger first-order derivative does mean being farther from the minimum. For a quadratic $y = ax^2 + bx + c$ the minimum is at $x = -\frac{b}{2a}$, so the best step from a point $x_0$ is

$\left|x_0 + \frac{b}{2a}\right| = \frac{|2ax_0 + b|}{2a} = \frac{|\text{First derivative}|}{\text{Second derivative}}$

Comparison between different parameters: take points $a$, $b$ along one parameter and $c$, $d$ along another. Within each parameter the point with the larger gradient ($a$ over $b$, $c$ over $d$) is indeed farther from the minimum, but across parameters the comparison breaks down: $c$ can have a smaller gradient than $a$ yet be farther from its own minimum, because its direction curves more gently. Do not cross parameters.

Second Derivative

The best step is $\frac{|\text{First derivative}|}{\text{Second derivative}}$: for the same gradient, a larger second derivative means the minimum is closer and the step should be smaller; a smaller second derivative means the opposite. Adagrad uses first derivatives to estimate the second derivative: sampling the gradient at many points, a direction with a larger second derivative yields larger first derivatives on average, so the denominator $\sqrt{\sum_i (g^i)^2}$ acts as a cheap proxy for curvature without ever computing it.
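A quick numeric check of the best-step formula; the quadratic coefficients and starting points below are assumptions for illustration.

```python
# Check: for a quadratic y = a*x^2 + b*x + c, stepping by
# |first derivative| / second derivative from any x0 lands at the minimum.
a, b = 2.0, -3.0                      # assumed coefficients; minimum at x = -b/(2a) = 0.75
for x0 in (-5.0, 0.0, 10.0):
    g = 2 * a * x0 + b                # first derivative at x0
    h = 2 * a                         # second derivative (constant for a quadratic)
    x1 = x0 - g / h                   # signed step of size |g|/h toward the minimum
    print(x0, "->", x1)               # always prints 0.75
```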

Tip 2: Stochastic Gradient Descent

Make the training faster. The loss is a summation over all training examples:

$L = \sum_n \left(\hat{y}^n - \left(b + \sum_i w_i x_i^n\right)\right)^2$

Gradient descent: $\theta^i = \theta^{i-1} - \eta \nabla L(\theta^{i-1})$. It sees all examples before making one update.

Stochastic gradient descent: pick one example $x^n$ and take the loss for only that example,

$L^n = \left(\hat{y}^n - \left(b + \sum_i w_i x_i^n\right)\right)^2, \qquad \theta^i = \theta^{i-1} - \eta \nabla L^n(\theta^{i-1})$

Faster! Gradient descent updates after seeing all examples; stochastic gradient descent updates for each example, so if there are 20 examples it makes 20 updates in the time gradient descent makes one.
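A minimal sketch of stochastic gradient descent for the linear model and squared loss above. The synthetic data, names, and hyperparameters are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))              # 20 examples (assumed toy data)
y = X @ np.array([1.5, -2.0]) + 0.5       # targets from a known linear model

def grad_example(w, b, x_n, y_n):
    """Gradient of L^n = (y^n - (b + w . x^n))^2 for a single example."""
    err = y_n - (b + x_n @ w)
    return -2 * err * x_n, -2 * err       # dL^n/dw, dL^n/db

w, b, eta = np.zeros(2), 0.0, 0.01
for epoch in range(100):
    # Gradient descent would sum these gradients over all 20 examples,
    # then update once; SGD updates after each example instead:
    for n in rng.permutation(len(X)):
        gw, gb = grad_example(w, b, X[n], y[n])
        w, b = w - eta * gw, b - eta * gb
print(w, b)  # approaches [1.5, -2.0] and 0.5
```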

Tip 3: Feature Scaling

Make different features have the same scaling. (Source of figure: http://cs231n.github.io/neural-networks-2/)

Example: $y = b + w_1 x_1 + w_2 x_2$. If $x_1$ takes values like 1, 2, ... while $x_2$ takes values like 100, 200, ..., the loss surface $L(w_1, w_2)$ has elongated elliptical contours: changing $w_2$ affects the loss far more than changing $w_1$, so the gradient does not point at the minimum and the two directions would need different learning rates. After scaling the features to the same range, the contours are closer to circles and each update heads toward the minimum, making gradient descent easier.

Standardization: for each dimension $i$, compute the mean $m_i$ and standard deviation $\sigma_i$ over the training examples and set $x_i^r \leftarrow \frac{x_i^r - m_i}{\sigma_i}$. Afterwards the means of all dimensions are 0, and the variances are all 1.
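A minimal sketch of this standardization in Python; the function name and toy data are assumptions for the example.

```python
import numpy as np

def standardize(X):
    """For each dimension i: subtract the mean m_i and divide by the standard
    deviation sigma_i, giving every dimension mean 0 and variance 1."""
    m = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - m) / sigma

# Toy data (assumed): x1 on the order of 1-2, x2 on the order of 100-200.
X = np.array([[1.0, 100.0], [2.0, 200.0], [1.5, 120.0], [1.2, 180.0]])
Xs = standardize(X)
print(Xs.mean(axis=0))  # ~[0, 0]
print(Xs.std(axis=0))   # [1, 1]
```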

Gradient Descent Theory

Question: when we solve $\theta^\ast = \arg\min_\theta L(\theta)$ by gradient descent, $\theta^0 \to \theta^1 \to \theta^2 \to \cdots$ with $L(\theta^0) > L(\theta^1) > L(\theta^2) > \cdots$. Is this statement correct? (Not necessarily; see the end of the derivation below.)

Warning of Math

Formal Derivation. Suppose that $\theta$ has two variables $\{\theta_1, \theta_2\}$. Given a point, we can easily find the point with the smallest value of $L(\theta)$ nearby. How?

Taylor Series. Let $h(x)$ be any function infinitely differentiable around $x = x_0$:

$h(x) = \sum_{k=0}^{\infty} \frac{h^{(k)}(x_0)}{k!}(x - x_0)^k = h(x_0) + h'(x_0)(x - x_0) + \frac{h''(x_0)}{2!}(x - x_0)^2 + \cdots$

When $x$ is close to $x_0$: $h(x) \approx h(x_0) + h'(x_0)(x - x_0)$.

E.g., for the Taylor series of $h(x) = \sin(x)$ around $x_0 = \pi/4$, the first-order approximation is good around $\pi/4$ and degrades away from it.

Multivariable Taylor Series.

$h(x, y) = h(x_0, y_0) + \frac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial h(x_0, y_0)}{\partial y}(y - y_0) + \text{something related to } (x - x_0)^2 \text{ and } (y - y_0)^2 + \cdots$

When $x$ and $y$ are close to $x_0$ and $y_0$:

$h(x, y) \approx h(x_0, y_0) + \frac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial h(x_0, y_0)}{\partial y}(y - y_0)$
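A quick numeric check of the first-order approximation for $\sin(x)$ around $\pi/4$; the sample offsets are assumptions for illustration.

```python
import numpy as np

x0 = np.pi / 4
h, dh = np.sin(x0), np.cos(x0)        # h(x0) and h'(x0)

for x in (x0 + 0.01, x0 + 0.1, x0 + 1.0):
    approx = h + dh * (x - x0)        # first-order Taylor approximation
    print(f"x-x0={x - x0:5.2f}  sin(x)={np.sin(x):.6f}  approx={approx:.6f}")
# The approximation is good near pi/4 and degrades as x moves away.
```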

Back to Formal Derivation

Based on the Taylor series: if the red circle (a small neighbourhood around the current point $(a, b)$, with radius $d$) is small enough, then inside the red circle

$L(\theta) \approx s + u(\theta_1 - a) + v(\theta_2 - b)$

where $s = L(a, b)$, $u = \frac{\partial L(a, b)}{\partial \theta_1}$, $v = \frac{\partial L(a, b)}{\partial \theta_2}$.

Find $\theta_1$ and $\theta_2$ in the red circle minimizing $L(\theta)$, subject to $(\theta_1 - a)^2 + (\theta_2 - b)^2 \le d^2$. Since $s$ is constant, this means minimizing the inner product of $(u, v)$ with $(\theta_1 - a, \theta_2 - b)$ over a disk; the minimum is attained by pointing exactly opposite $(u, v)$ and going to the boundary. Simple, right?

$\begin{bmatrix}\theta_1 - a\\ \theta_2 - b\end{bmatrix} = -\eta \begin{bmatrix}u\\ v\end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix}\theta_1\\ \theta_2\end{bmatrix} = \begin{bmatrix}a\\ b\end{bmatrix} - \eta \begin{bmatrix}u\\ v\end{bmatrix}$

This is gradient descent. The approximation is not satisfied if the red circle (the learning rate) is not small enough, so a step is not guaranteed to reduce the loss, which answers the earlier question. You can also consider the second-order term, e.g., Newton's method.

End of Warning

More Limitation of Gradient Descent

Plotting the loss against the value of a parameter $w$: gradient descent is very slow at a plateau, can get stuck at a saddle point (where the gradient is zero but the point is not a minimum), and can get stuck at a local minimum.
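To connect the derivation back to practice, here is a small check that the "each step reduces the loss" claim fails when $\eta$ is too large; the quadratic loss and step sizes are assumptions for illustration.

```python
# L(w) = w^2: the gradient step w <- w - eta * 2w scales w by (1 - 2*eta).
for eta in (0.1, 0.45, 1.1):          # assumed step sizes
    w = 3.0
    losses = []
    for _ in range(5):
        w = w - eta * 2 * w           # gradient descent step
        losses.append(w * w)
    print(f"eta={eta}: {losses}")
# eta=0.1 and 0.45 decrease the loss at every step; eta=1.1 makes it grow,
# because the first-order approximation no longer holds at that radius.
```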
