Structured Support Vector Machine
Hung-yi Lee

Structured Learning
We need a more powerful function f.
Input and output are both objects with structures.
Object: sequence, list, tree, bounding box, ...
X is the space of one kind of object; Y is the space of another kind of object.
$f: X \rightarrow Y$

Unified Framework
Find a function F, $F: X \times Y \rightarrow R$.
F(x, y) evaluates how compatible the objects x and y are.
Step 1 (Training): learn F from data.
Step 2 (Inference / Testing): given an object x, output $\tilde{y} = \arg\max_{y \in Y} F(x, y)$.

Three Problems
Problem 1 (Evaluation): what does F(x, y) look like?
Problem 2 (Inference): how to solve the "arg max" problem $\tilde{y} = \arg\max_{y \in Y} F(x, y)$?
Problem 3 (Training): given training data, how to find F(x, y)?

Example Task: Object Detection
Source of image: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.295.6007&rep=rep1&type=pdf
http://www.vision.ee.ethz.ch/hpedemo/gallery.php
Keep in mind that what you will learn today can be applied to other tasks.
Problem 1: Evaluation
F(x, y) is linear: $F(x, y) = w \cdot \phi(x, y)$, where $\phi(x, y)$ is a feature vector describing the pair (x, y).
Open question: what if F(x, y) is not linear?

Problem 2: Inference
$\tilde{y} = \arg\max_{y \in Y} w \cdot \phi(x, y)$
In the object-detection example: score every candidate bounding box (e.g. 1.1, 8.2, 0.3, 10.1, -1.5, 5.6) and output the box with the maximum score.

Problem 3: Training
Training data: $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots, (x^N, \hat{y}^N)\}$, where $\hat{y}^n$ is the correct output for $x^n$.
Principle: we should find F(x, y) such that
$F(x^1, \hat{y}^1) > F(x^1, y)$ for all $y \neq \hat{y}^1$
$F(x^2, \hat{y}^2) > F(x^2, y)$ for all $y \neq \hat{y}^2$
...
$F(x^N, \hat{y}^N) > F(x^N, y)$ for all $y \neq \hat{y}^N$
Let's ignore problems 1 and 2 and only focus on problem 3 today.
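Even though problems 1 and 2 are set aside for today, here is a minimal sketch (not from the slides) of what they amount to with a linear F and brute-force inference; the feature map `phi` and the candidate list are hypothetical placeholders that a real task would have to supply.

```python
import numpy as np

def F(w, phi, x, y):
    """Problem 1: linear evaluation, F(x, y) = w . phi(x, y)."""
    return np.dot(w, phi(x, y))

def inference(w, phi, x, candidates):
    """Problem 2: brute-force arg max over a finite list of candidate outputs."""
    return max(candidates, key=lambda y: F(w, phi, x, y))
```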
Outline
- Separable case
- Non-separable case
- Considering Errors
- Regularization
- Structured SVM
- Cutting Plane Algorithm for Structured SVM
- Multi-class and binary SVM
- Beyond Structured SVM (open question)
Separable Case
Assumption: Separable
There exists a weight vector $\hat{w}$ such that, for every example and every incorrect label y,
$\hat{w} \cdot \phi(x^1, \hat{y}^1) \ge \hat{w} \cdot \phi(x^1, y) + \delta$
$\hat{w} \cdot \phi(x^2, \hat{y}^2) \ge \hat{w} \cdot \phi(x^2, y) + \delta$
...
i.e. the correct feature vector beats every incorrect one by at least a margin $\delta > 0$.

Structured Perceptron
Input: training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots, (x^N, \hat{y}^N)\}$
Output: weight vector w
Algorithm:
  Initialize w = 0.
  Repeat until w is no longer updated:
    For each example $(x^n, \hat{y}^n)$:
      Find $\tilde{y}^n = \arg\max_{y \in Y} w \cdot \phi(x^n, y)$   (this is problem 2)
      If $\tilde{y}^n \neq \hat{y}^n$: $w \leftarrow w + \phi(x^n, \hat{y}^n) - \phi(x^n, \tilde{y}^n)$
We are done!
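A minimal Python sketch of the structured perceptron above. It assumes a finite candidate set per input and a user-supplied feature map; `phi`, `candidates`, `dim` and `max_epochs` are hypothetical names, and the outputs y are assumed to be hashable and comparable (e.g. ints or tuples).

```python
import numpy as np

def structured_perceptron(data, phi, candidates, dim, max_epochs=100):
    """data: list of (x, y_hat) pairs; phi(x, y) -> np.ndarray of length dim;
    candidates(x) -> iterable of every possible y for x (problem 2 by enumeration)."""
    w = np.zeros(dim)
    for _ in range(max_epochs):
        updated = False
        for x, y_hat in data:
            # Problem 2: the highest-scoring y under the current w.
            y_tilde = max(candidates(x), key=lambda y: np.dot(w, phi(x, y)))
            if y_tilde != y_hat:
                # Mistake: move w toward the correct feature vector and
                # away from the wrongly preferred one.
                w += phi(x, y_hat) - phi(x, y_tilde)
                updated = True
        if not updated:   # in the separable case this is guaranteed to happen
            break
    return w
```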
Warning of Math
Claim: in the separable case, the structured perceptron updates w at most $(R / \delta)^2$ times.
$\delta$: the margin. R: the largest distance between feature vectors. Note that the bound is not related to the size of the space Y!

Proof of Termination
w is updated once it sees a mistake (the relation of $w^k$ and $w^{k-1}$):
$w^k = w^{k-1} + \phi(x^n, \hat{y}^n) - \phi(x^n, \tilde{y}^n)$,  $w^0 = 0 \rightarrow w^1 \rightarrow w^2 \rightarrow \dots$
Remember that we are considering the separable case, so there exists $\hat{w}$ with
$\hat{w} \cdot \phi(x^n, \hat{y}^n) \ge \hat{w} \cdot \phi(x^n, y) + \delta$ for all $y \in Y - \{\hat{y}^n\}$ (all incorrect labels for an example) and for all n (all training examples).
We show that the angle $\rho_k$ between $\hat{w}$ and $w^k$ becomes smaller as k increases, by analyzing $\cos \rho_k = \frac{\hat{w}}{\|\hat{w}\|} \cdot \frac{w^k}{\|w^k\|}$ (does it get larger and larger?).

Numerator:
$\hat{w} \cdot w^k = \hat{w} \cdot \left( w^{k-1} + \phi(x^n, \hat{y}^n) - \phi(x^n, \tilde{y}^n) \right) = \hat{w} \cdot w^{k-1} + \hat{w} \cdot \phi(x^n, \hat{y}^n) - \hat{w} \cdot \phi(x^n, \tilde{y}^n) \ge \hat{w} \cdot w^{k-1} + \delta$ (separable).
Hence $\hat{w} \cdot w^1 \ge \hat{w} \cdot w^0 + \delta = \delta$ (since $w^0 = 0$), $\hat{w} \cdot w^2 \ge \hat{w} \cdot w^1 + \delta \ge 2\delta$, ..., $\hat{w} \cdot w^k \ge k\delta$. (So what?)

Denominator:
$\|w^k\|^2 = \|w^{k-1} + \phi(x^n, \hat{y}^n) - \phi(x^n, \tilde{y}^n)\|^2 = \|w^{k-1}\|^2 + 2\, w^{k-1} \cdot \left( \phi(x^n, \hat{y}^n) - \phi(x^n, \tilde{y}^n) \right) + \|\phi(x^n, \hat{y}^n) - \phi(x^n, \tilde{y}^n)\|^2$
The cross term is $< 0$, because a mistake means $w^{k-1}$ scored $\tilde{y}^n$ higher than $\hat{y}^n$.
Assume the distance between any two feature vectors is smaller than R; then the last term is $\le R^2$, so $\|w^k\|^2 \le \|w^{k-1}\|^2 + R^2$.
Hence $\|w^1\|^2 \le \|w^0\|^2 + R^2 = R^2$, $\|w^2\|^2 \le \|w^1\|^2 + R^2 \le 2R^2$, ..., $\|w^k\|^2 \le kR^2$.

Putting the two together (take $\|\hat{w}\| = 1$ without loss of generality):
$\cos \rho_k = \frac{\hat{w}}{\|\hat{w}\|} \cdot \frac{w^k}{\|w^k\|} \ge \frac{k\delta}{\sqrt{kR^2}} = \sqrt{k}\, \frac{\delta}{R}$
Since $\cos \rho_k \le 1$, we need $\sqrt{k}\, \delta / R \le 1$, i.e. $k \le (R / \delta)^2$.
End of Warning
How to make training fast?
$k \le (R / \delta)^2$
$\delta$ (margin): how easy it is to separate the correct feature vectors (the red points in the figure) from the incorrect ones (the blue points).
R: the largest distance between feature vectors.
Normalization: multiply all features by 2? The margin doubles (larger margin, fewer updates?), but R also doubles, so the bound $(R / \delta)^2$ does not change.

Non-separable Case
When the data is non-separable, some weight vectors are still better than the others.

Defining a Cost Function
Define a cost C to evaluate how bad a w is, and then pick the w minimizing the cost C:
$C^n = \max_{y} \left[ w \cdot \phi(x^n, y) \right] - w \cdot \phi(x^n, \hat{y}^n)$,  $C = \sum_{n=1}^{N} C^n$
What is the minimum value? It is 0, reached when the correct $\hat{y}^n$ already gets the highest score (the max includes $y = \hat{y}^n$). Other alternatives are possible.

(Stochastic) Gradient Descent
$C^n$ contains a max, but we can still do gradient descent: when w is different, the y attaining the max can be different, so the space of w is cut into regions; inside each region the max is attained by one fixed y and $C^n$ is linear in w.

(Stochastic) Gradient Descent
For t = 1 to T (update the parameters T times):
  (stochastic) pick an example $(x^n, \hat{y}^n)$
  (locate the region) find $\tilde{y}^n = \arg\max_{y} w \cdot \phi(x^n, y)$
  (simple) update $w \leftarrow w - \eta \left[ \phi(x^n, \tilde{y}^n) - \phi(x^n, \hat{y}^n) \right]$
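A minimal sketch of this stochastic (sub)gradient training loop; `phi`, `candidates`, `dim`, `eta` and `epochs` are hypothetical placeholders.

```python
import numpy as np

def sgd_train(data, phi, candidates, dim, eta=0.1, epochs=10):
    """Minimize C = sum_n ( max_y w.phi(x^n, y) - w.phi(x^n, y_hat^n) ) by SGD."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y_hat in data:
            # "Locate the region": the y attaining the max under the current w.
            y_tilde = max(candidates(x), key=lambda y: np.dot(w, phi(x, y)))
            # Inside that region the subgradient of C^n is phi(x, y_tilde) - phi(x, y_hat).
            w -= eta * (phi(x, y_tilde) - phi(x, y_hat))
    return w
```

Note that with $\eta = 1$ each step is exactly the structured-perceptron update (and the step is zero whenever the arg max already returns $\hat{y}^n$).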
Considering Errors
Based on what we have considered so far, all incorrect y are treated equally, but some mistakes matter more than others: for object detection, an output box very different from the correct one is very bad, while a box close to the correct one is acceptable, so the right-hand case is better.
Considering the incorrect ones: an output close to the correct box should only need a smaller margin, one very different from the correct box a larger margin.
How to measure the difference? Define an error function $\Delta(\hat{y}^n, y) \ge 0$ that measures how different y is from the ground truth $\hat{y}^n$ (for detection, one natural choice is 1 minus the intersection-over-union of the two boxes).

Another Cost Function
$C^n = \max_{y} \left[ \Delta(\hat{y}^n, y) + w \cdot \phi(x^n, y) \right] - w \cdot \phi(x^n, \hat{y}^n)$
The correct answer now has to beat each incorrect y by a margin that grows with its error $\Delta(\hat{y}^n, y)$.

Gradient Descent
In each iteration we need $\bar{y}^n = \arg\max_{y} \left[ \Delta(\hat{y}^n, y) + w \cdot \phi(x^n, y) \right]$. Oh no! This is a new inference problem (problem 2.1): the arg max now involves $\Delta$ as well.

Another Viewpoint
Minimizing the new cost function is minimizing an upper bound of the errors on the training set: minimizing $\sum_n \Delta(\hat{y}^n, \tilde{y}^n)$ directly is hard, but $\Delta(\hat{y}^n, \tilde{y}^n) \le C^n$, so pushing C down pushes the training error down.

More Cost Functions
Margin rescaling: $C^n = \max_{y} \left[ \Delta(\hat{y}^n, y) + w \cdot \phi(x^n, y) \right] - w \cdot \phi(x^n, \hat{y}^n)$
Slack variable rescaling: $C^n = \max_{y} \Delta(\hat{y}^n, y) \left[ 1 + w \cdot \phi(x^n, y) - w \cdot \phi(x^n, \hat{y}^n) \right]$
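A minimal sketch of the error-aware (margin-rescaling) cost and the loss-augmented arg max it requires (problem 2.1); `delta`, `phi` and `candidates` are hypothetical user-supplied callables.

```python
import numpy as np

def loss_augmented_inference(w, phi, x, y_hat, candidates, delta):
    """Problem 2.1: arg max_y [ Delta(y_hat, y) + w . phi(x, y) ]."""
    return max(candidates(x), key=lambda y: delta(y_hat, y) + np.dot(w, phi(x, y)))

def cost_margin_rescaling(w, phi, x, y_hat, candidates, delta):
    """C^n = max_y [ Delta(y_hat, y) + w.phi(x, y) ] - w.phi(x, y_hat),
    an upper bound on the error Delta(y_hat, y_tilde) of the predicted output."""
    y_bar = loss_augmented_inference(w, phi, x, y_hat, candidates, delta)
    return delta(y_hat, y_bar) + np.dot(w, phi(x, y_bar)) - np.dot(w, phi(x, y_hat))
```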
Regularization
So far the cost only asks that the correct answer be kept ahead of each incorrect one by a margin depending on its error.
In addition, training data and testing data can have different distributions, and a w close to zero can minimize the influence of the mismatch.
Regularization: also prefer a w close to zero, i.e. minimize
$C = \frac{1}{2} \|w\|^2 + \lambda \sum_{n=1}^{N} C^n$
In each iteration of (stochastic) gradient descent, pick an example $(x^n, \hat{y}^n)$, find $\bar{y}^n = \arg\max_{y} \left[ \Delta(\hat{y}^n, y) + w \cdot \phi(x^n, y) \right]$, and update
$w \leftarrow w - \eta \left[ w + \lambda \left( \phi(x^n, \bar{y}^n) - \phi(x^n, \hat{y}^n) \right) \right] = (1 - \eta)\, w - \eta \lambda \left( \phi(x^n, \bar{y}^n) - \phi(x^n, \hat{y}^n) \right)$
The $(1 - \eta)\, w$ part shrinks the weights toward zero: weight decay as in DNN training.
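Combining the two previous sketches, one regularized step looks like this (again with hypothetical `phi`, `candidates`, `delta`, `eta` and `lam`):

```python
import numpy as np

def regularized_step(w, x, y_hat, phi, candidates, delta, eta=0.1, lam=1.0):
    """One SGD step on (1/2)||w||^2 + lam * C^n with the error-aware cost;
    the (1/2)||w||^2 term contributes the weight-decay factor (1 - eta)."""
    y_bar = max(candidates(x), key=lambda y: delta(y_hat, y) + np.dot(w, phi(x, y)))
    return (1.0 - eta) * w - eta * lam * (phi(x, y_bar) - phi(x, y_hat))
```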
Structured SVM
We want to minimize
$C = \frac{1}{2} \|w\|^2 + \lambda \sum_{n=1}^{N} C^n$, with $C^n = \max_{y} \left[ \Delta(\hat{y}^n, y) + w \cdot \phi(x^n, y) \right] - w \cdot \phi(x^n, \hat{y}^n)$.
This can be rewritten with slack variables: find $w, \varepsilon^1, \dots, \varepsilon^N$ minimizing
$C = \frac{1}{2} \|w\|^2 + \lambda \sum_{n=1}^{N} \varepsilon^n$
subject to, for every n and every y,
$w \cdot \phi(x^n, \hat{y}^n) - w \cdot \phi(x^n, y) \ge \Delta(\hat{y}^n, y) - \varepsilon^n$,  $\varepsilon^n \ge 0$.
Are they equivalent? The argument is spelled out below.
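A short argument for the equivalence: for a fixed w, the constraints of example n say
$$\varepsilon^n \ge \Delta(\hat{y}^n, y) + w \cdot \phi(x^n, y) - w \cdot \phi(x^n, \hat{y}^n) \quad \text{for every } y,$$
so the smallest feasible slack is
$$\varepsilon^n = \max_{y} \left[ \Delta(\hat{y}^n, y) + w \cdot \phi(x^n, y) \right] - w \cdot \phi(x^n, \hat{y}^n) = C^n$$
(taking $y = \hat{y}^n$ already forces $\varepsilon^n \ge 0$, since $\Delta(\hat{y}^n, \hat{y}^n) = 0$). Minimizing $\frac{1}{2}\|w\|^2 + \lambda \sum_n \varepsilon^n$ under the constraints is therefore the same as minimizing $\frac{1}{2}\|w\|^2 + \lambda \sum_n C^n$.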
Structured SVM - Intuition
Ideally we would like, for every example and every incorrect y, $w \cdot \phi(x^n, \hat{y}^n) - w \cdot \phi(x^n, y) \ge \Delta(\hat{y}^n, y)$: lots of inequalities, one for every possible output of every example. It is possible that no w can achieve all of them, so each example gets a slack variable $\varepsilon^n \ge 0$ that relaxes its margins, and we minimize $\frac{1}{2} \|w\|^2 + \lambda \sum_{n} \varepsilon^n$ subject to the relaxed inequalities over the whole training set.

Structured SVM
Too many constraints! It is a Quadratic Programming (QP) problem, so in principle we could solve it with the solver in an SVM package, but the number of constraints (one per example per possible y) is huge.
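For a toy problem where the output space is small enough to enumerate, the QP above can be handed to a generic solver. Below is a minimal sketch using cvxpy (just one possible solver, not necessarily the "SVM package" the slide refers to); `phi`, `candidates`, `delta`, `dim` and `lam` are hypothetical placeholders, and every constraint is listed explicitly, which is exactly what the cutting plane algorithm of the next section avoids.

```python
import numpy as np
import cvxpy as cp

def structured_svm_qp(data, phi, candidates, delta, dim, lam=1.0):
    """Solve min (1/2)||w||^2 + lam * sum_n eps^n subject to the margin constraints,
    enumerating every (example, y) constraint; only feasible for tiny problems."""
    w = cp.Variable(dim)
    eps = cp.Variable(len(data), nonneg=True)
    constraints = []
    for n, (x, y_hat) in enumerate(data):
        for y in candidates(x):
            # w . phi(x^n, y_hat^n) - w . phi(x^n, y) >= Delta(y_hat^n, y) - eps^n
            constraints.append(
                w @ (phi(x, y_hat) - phi(x, y)) >= delta(y_hat, y) - eps[n]
            )
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + lam * cp.sum(eps))
    cp.Problem(objective, constraints).solve()
    return w.value, eps.value
```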
Cutting Plane Algorithm for Structured SVM

Cutting Plane Algorithm
Picture the parameter space of w and the slack variables; the color is the value of C which is going to be minimized. The solution without constraints and the solution with constraints are in general different points. (Image credit for the parameter-space figures: Yisong Yue.)

Cutting Plane Algorithm
Although there are lots of constraints, most of them do not influence the solution: removing a constraint like the green line would not change the solution, while only a few constraints (the red lines) determine it. It is enough to keep this small working set of constraints.

Cutting Plane Algorithm
Repeatedly: solve a QP problem using only the constraints in the working set to obtain a solution w, then update the working set.
Start with no constraint at all in the working set; solving the QP then gives the unconstrained solution (the blue point in the figure). Among the many constraints of the full problem, some are violated by this solution: find the most violated one, say the constraint coming from some output y', and extend the working set with y'. Then solve the QP again with the enlarged working set, and so on.
Find the most violated one
A constraint: $w \cdot \phi(x, \hat{y}) - w \cdot \phi(x, y) \ge \Delta(\hat{y}, y) - \varepsilon$
Violating the constraint: $w \cdot \phi(x, \hat{y}) - w \cdot \phi(x, y) < \Delta(\hat{y}, y) - \varepsilon$
Degree of violation: $\Delta(\hat{y}, y) - \varepsilon - w \cdot \phi(x, \hat{y}) + w \cdot \phi(x, y)$
The most violated one: $\bar{y} = \arg\max_{y} \left[ \Delta(\hat{y}, y) + w \cdot \phi(x, y) \right]$ (the terms $\varepsilon$ and $w \cdot \phi(x, \hat{y})$ do not depend on y).

Cutting Plane Algorithm
Given training data $\{(x^1, \hat{y}^1), \dots, (x^N, \hat{y}^N)\}$, keep a working set of constraints for each example, initially empty.
Repeat:
  Find $w, \varepsilon^1, \dots, \varepsilon^N$ minimizing the QP using only the constraints in the working sets.
  For each example, find the most violated constraint under the current solution and add it to that example's working set.
See the code sketch below.
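A minimal sketch of this loop, reusing the cvxpy formulation from the previous sketch but restricted to the working sets; as before, `phi`, `candidates`, `delta`, `dim`, `lam` and `iters` are hypothetical placeholders and the outputs y are assumed hashable.

```python
import numpy as np
import cvxpy as cp

def most_violated(w, phi, x, y_hat, candidates, delta):
    """Most violated constraint for one example: arg max_y [Delta(y_hat, y) + w.phi(x, y)]."""
    return max(candidates(x), key=lambda y: delta(y_hat, y) + np.dot(w, phi(x, y)))

def solve_restricted_qp(data, working, phi, delta, dim, lam):
    """Solve the structured-SVM QP using only the constraints in the working sets."""
    w = cp.Variable(dim)
    eps = cp.Variable(len(data), nonneg=True)
    cons = [w @ (phi(x, y_hat) - phi(x, y)) >= delta(y_hat, y) - eps[n]
            for n, (x, y_hat) in enumerate(data) for y in working[n]]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + lam * cp.sum(eps)), cons).solve()
    return w.value

def cutting_plane(data, phi, candidates, delta, dim, lam=1.0, iters=20):
    """Alternate between adding each example's most violated constraint and re-solving the QP."""
    working = [set() for _ in data]   # working sets: initially no constraint at all
    w = np.zeros(dim)                 # with no constraints the QP solution is w = 0
    for _ in range(iters):
        for n, (x, y_hat) in enumerate(data):
            working[n].add(most_violated(w, phi, x, y_hat, candidates, delta))
        w = solve_restricted_qp(data, working, phi, delta, dim, lam)
    return w
```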
Cutting Plane Algorithm - Example
With two training examples: start with no constraint, find $w, \varepsilon^1, \varepsilon^2$ minimizing the QP, then for each example add the most violated constraint under the current solution to its working set, find $w, \varepsilon^1, \varepsilon^2$ minimizing the QP again, and so on. The process repeats iteratively.
Multi-class and Binary SVM

Multi-class SVM
If there are K classes, multi-class classification fits the structured-learning framework with $F(x, y) = w \cdot \phi(x, y)$, where y ranges over the K classes.

Multi-class SVM - Problem 2: Inference
The number of classes is usually small, so we can just enumerate them:
$\tilde{y} = \arg\max_{y \in \{1, \dots, K\}} w \cdot \phi(x, y)$
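One standard way to realize $\phi(x, y)$ for the multi-class case (a sketch consistent with the framework above, not necessarily the exact notation of the slides) is to stack K class-specific weight vectors into one long w and copy x into the y-th block of the feature vector, so that $w \cdot \phi(x, y) = w^y \cdot x$:

```python
import numpy as np

def phi_multiclass(x, y, K):
    """Block feature map: x is copied into the y-th block of a K*d vector,
    so that w . phi(x, y) = w^y . x when w = [w^1; ...; w^K]."""
    d = len(x)
    v = np.zeros(K * d)
    v[y * d:(y + 1) * d] = x
    return v

def predict(w, x, K):
    """Problem 2 for multi-class SVM: K is small, so enumerate the classes."""
    return max(range(K), key=lambda y: np.dot(w, phi_multiclass(x, y, K)))
```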
Multi-class SVM - Problem 3: Training
For every example n and every incorrect class y, require
$w \cdot \phi(x^n, \hat{y}^n) - w \cdot \phi(x^n, y) \ge \Delta(\hat{y}^n, y) - \varepsilon^n$.
Some types of misclassification may be worse than others, and $\Delta(\hat{y}^n, y)$ can encode this (defined as you wish).
There are only N(K-1) constraints, so we can simply list them all.

Binary SVM
Set K = 2 and $\Delta(\hat{y}^n, y) = 1$ for $y \neq \hat{y}^n$:
If $\hat{y}^n = 1$: $w \cdot \phi(x^n, 1) - w \cdot \phi(x^n, 2) \ge 1 - \varepsilon^n$
If $\hat{y}^n = 2$: $w \cdot \phi(x^n, 2) - w \cdot \phi(x^n, 1) \ge 1 - \varepsilon^n$
These are the familiar binary SVM margin constraints.
Beyond Structured SVM (open question)
Use a DNN to generate the features fed into the structured SVM (DNN, then structured SVM).
Ref: Hao Tang, Chao-hong Meng, Lin-shan Lee, "An initial attempt for phoneme recognition using Structured Support Vector Machine (SVM)", ICASSP, 2010.
Ref: Shi-Xiong Zhang, M. J. F. Gales, "Structured SVMs for Automatic Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 3, pp. 544-555, March 2013.

Beyond Structured SVM
Jointly training the structured SVM and the DNN (DNN and structured SVM, jointly trained).
Ref: Shi-Xiong Zhang, Chaojun Liu, Kaisheng Yao, and Yifan Gong, "Deep Neural Support Vector Machines for Speech Recognition", Interspeech 2015.

Beyond Structured SVM
Replacing the structured SVM with another DNN (two DNNs, jointly trained).
Ref: Yi-Hsiu Liao, Hung-yi Lee, Lin-shan Lee, "Towards Structured Deep Neural Network for Automatic Speech Recognition", ASRU, 2015. http://speech.ee.ntu.edu.tw/tlkagk/paper/DNN_ASRU15.pdf

Concluding Remarks
- Separable case
- Non-separable case
- Considering Errors
- Regularization
- Structured SVM
- Cutting Plane Algorithm for Structured SVM
- Multi-class and binary SVM
- Beyond Structured SVM (open question)

Acknowledgement
Thanks to 盧柏儒 for pointing out errors on the slides during the lecture.
Thanks to 徐翊祥 for pointing out errors on the slides during the lecture.