Introduction to Deep Learning: Overview and Discussion (deepLearning slides)

Introduction to Deep Learning
Huihui Liu
Mar. 1, 2017

Outline
- Concept of deep learning
- Development history
- Deep learning frameworks
- Deep neural network architectures
- Convolutional neural networks
  - Introduction
  - Network structure
  - Training tricks
- Application in Aesthetic Image Evaluation
- Idea

Deep Learning (Hinton, 2006)
Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data.
The advantage of deep learning is that features are extracted automatically instead of being designed by hand.
- Computer vision
- Speech recognition
- Natural language processing

Development History
[Timeline figure]
- 1943: MP model
- 1958: Single-layer perceptron
- 1969: XOR problem
- 1986: BP algorithm
- 1989: CNN-LeNet
- 1991: Vanishing gradient problem
- 1995: SVM
- 1997: LSTM
- 2006: DBN
- 2011: ReLU
- 2012: Dropout, AlexNet
- 2015: BN, Faster R-CNN, ResidualNet
People named on the timeline: W. S. McCulloch, W. Pitts, Rosenblatt, Marvin Minsky, Geoffrey Hinton, Yann LeCun, Bengio.

Deep Learning Frameworks

Deep neural network architectures
- Deep Belief Networks (DBN)
- Recurrent Neural Networks (RNN)
- Generative Adversarial Networks (GANs)
- Convolutional Neural Networks (CNN)
- Long Short-Term Memory (LSTM)

DBN (Deep Belief Network, 2006)
Hidden units and visible units:
- Each unit is binary (0 or 1).
- Every visible unit connects to all the hidden units.
- Every hidden unit connects to all the visible units.
- There are no visible-visible or hidden-hidden connections.
Fig. 1. RBM (restricted Boltzmann machine) structure.
Fig. 2. DBN (deep belief network) structure.
Idea? A DBN is composed of multiple layers of RBMs.
How do we train these additional layers? With an unsupervised, greedy approach.
Hinton G E. Deep belief networks[J]. Scholarpedia, 2009, 4(6): 5947.

RNN (Recurrent Neural Network, 2013)
What? An RNN is designed to process sequence data. It remembers previous information and applies it when computing the current output. That is, the nodes of the hidden layer are connected to each other, and the input of the hidden layer includes not only the output of the input layer but also the hidden layer's own previous output.
Applications?
- Machine translation
- Generating image descriptions
- Speech recognition
How to train? BPTT (backpropagation through time).
Marhon S A, Cameron C J F, Kremer S C. Recurrent Neural Networks[M]//Handbook on Neural Information Processing. Springer Berlin Heidelberg, 2013: 29-65.

GANs (Generative Adversarial Networks, 2014)
GANs are inspired by the zero-sum game in game theory. A GAN consists of a pair of networks: a generator network and a discriminator network. The generator network generates a sample from a random vector; the discriminator network decides whether a given sample is natural or counterfeit. Both networks are trained together to improve their performance until they reach a point where counterfeit and real samples cannot be distinguished.
Applications:
- Image editing
- Image-to-image translation
- Generating text
- Generating images based on text
- Combination with reinforcement learning
- And more
Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Advances in neural information processing systems. 2014: 2672-2680.

Long Short-Term Memory (LSTM, 1997)

Neural Networks
[Figures: Neuron; Neural network]

Convolutional Neural Networks (CNN)
A convolutional neural network is a kind of feedforward neural network characterized by a simple structure, few trainable parameters and strong adaptability.
A CNN avoids complex image pre-processing (e.g. extracting hand-crafted features); the original image can be fed in directly.
Basic components: convolution layers, pooling layers, fully connected layers.

Convolution layer
The convolution kernel slides over a 2-dimensional plane. At each position, every element of the kernel is multiplied by the element at the corresponding position of the image, and all the products are summed. Moving the kernel across the image produces a new image made up of these sums, one per kernel position.
- Local receptive field
- Weight sharing
- Reduced number of parameters

Pooling layer
The pooling layer compresses the input feature map, which reduces the number of parameters in training and the degree of over-fitting of the model.
- Max-pooling: select the maximum value in the pooling window.
- Mean-pooling: calculate the average of all values in the pooling window.

Fully connected layer and Softmax layer
Each node of the fully connected layer is connected to all the nodes of the previous layer; the layer combines the features extracted by the preceding layers.
Fig. 1. Fully connected layer.
Fig. 2. Complete CNN structure.
Fig. 3. Softmax layer.

Training and Testing
Training stage:
- Forward propagation
  - Take a sample (X, Yp) from the sample set and feed X into the network;
  - Calculate the corresponding actual output Op.
- Back propagation
  - Calculate the difference between the actual output Op and the corresponding ideal output Yp;
  - Adjust the weight matrix by minimizing the error.
Testing stage: feed different images and labels into the trained convolutional neural network and compare the output with the actual value of each sample.
Before the training stage, the weights should be initialized with different small random numbers.

CNN Structure Evolution
[Diagram]
- Early work: Hinton (BP), Neocognitron (1980), LeCun (1989), LeNet (1998)
- Historical breakthrough: AlexNet (2012), with ReLU, Dropout, GPU + big data
- Deeper networks: VGG16, VGG19, MSRA-Net
- Enhanced functionality of the convolution module: NIN, GoogLeNet, Inception V2 (BN), Inception V3, Inception V4
- From classification task to detection task: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN
- Added new functional units: FCN, FCN+CRF, STNet, CNN+RNN/LSTM
- Integration: ResNet
Abbreviations: ILSVRC (ImageNet Large Scale Visual Recognition Challenge), BN (Batch Normalization), RPN.

LeNet (LeCun, 1998)
LeNet is a convolutional neural network designed by Yann LeCun in 1998 for handwritten numeral recognition. It is one of the most representative experimental systems among early convolutional neural networks.
LeNet includes the convolution layer, the pooling layer and the fully connected layer, which are the basic components of modern CNNs. LeNet is considered to be the beginning of the CNN.
Network structure: 3 convolution layers + 2 pooling layers + 1 fully connected layer + 1 output layer.
Haykin S, Kosko B. Gradient-Based Learning Applied to Document Recognition[D]. Wiley-IEEE Press, 2009.

AlexNet (Alex, 2012)
- Network structure: 5 convolution layers + 3 fully connected layers
- Nonlinear activation function: ReLU (rectified linear unit)
- Methods to prevent overfitting: Dropout, data augmentation
- Big-data training: ImageNet, an image database on the order of millions of images
- Others: GPU, LRN (local response normalization) layer
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//International Conference on Neural Information Processing Systems. Curran Associates Inc. 2012: 1097-1105.

OverFeat (2013)
Sermanet P, Eigen D, Zhang X, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks[J]. Eprint Arxiv, 2013.

VGG-Net (Oxford University, 2014)
- Input: a fixed-size 224*224 RGB image
- Filters: a very small receptive field, 3*3, with stride 1
- Max-pooling: 2*2 pixel window, with stride 2
Fig. 1. Architecture of VGG16.
Table 1: ConvNet configurations (shown in columns). The convolutional layer parameters are denoted as "conv<receptive field size>-<number of channels>".
Why 3*3 filters?
- Stacked conv. layers have a large receptive field
- More non-linearity
- Fewer parameters to learn
Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[J]. Computer Science, 2014.

Network-in-Network (NIN, Shuicheng Yan, 2013)
Network structure: 4 mlpconv layers + global average pooling layer.
Fig. 1. Linear convolution and MLP convolution.
Fig. 2. Fully connected layer and global average pooling layer.
Fig. 3. NIN structure.
- Linear combination of multiple feature maps
- Cross-channel information integration
- Fewer parameters
- Smaller network
- Less over-fitting
Min Lin et al, Network in Network, Arxiv 2013.

GoogLeNet (Inception V1, 2014)
- Proposed the inception architecture and optimized it
- Removed the fully connected layer
- Used auxiliary classifiers to accelerate network convergence
Fig. 1. Inception module, naive version.
Fig. 2. Inception module with dimension reductions.
Fig. 3. GoogLeNet network (22 layers).
Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1-9.

Inception V2 (2015)
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

Inception V3 (2015)
Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.

ResNet (Kaiming He, 2015)
A simple and clean framework for training "very" deep networks.
State-of-the-art performance for:
- Image classification
- Object detection
- Semantic segmentation
- and more
Fig. 1. Shortcut connections.
Fig. 2. ResNet structure (152 layers).
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[J]. 2015: 770-778.

FractalNet

Inception V4 (2016)
Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[J]. arXiv preprint arXiv:1602.07261, 2016.

Inception-ResNet
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[J]. 2015: 770-778.

Comparison

SqueezeNet
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.

Xception

R-CNN (2014)
R-CNN: region proposals + CNN.
- Region proposals: Selective Search [1]
- Resize the region proposals: warp all region proposals to the required size (227*227, the AlexNet input)
- Compute CNN features: extract a 4096-dimensional feature vector from each region proposal using AlexNet
- Classify: train a linear SVM classifier for each class
[1] Uijlings J R R, Sande K E A V D, Gevers T, et al. Selective Search for Object Recognition[J]. International Journal of Computer Vision, 2013, 104(2): 154-171.
[2] Girshick R, Donahue J, Darrell T, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation[J]. 2014: 580-587.

SPP-Net (Spatial pyramid pooling network, 2015)
The SPP-Net method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map.
Advantages:
- Computes the feature map of the entire image once, which saves much time.
- Outputs a fixed-length feature vector for inputs of arbitrary sizes.
- Extracts features at different scales and can express more spatial information.
Fig. 1. Top: a conventional CNN. Bottom: spatial pyramid pooling network structure.
Fig. 2. A network structure with a spatial pyramid pooling layer.
He K, Zhang X, Ren S, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 37(9): 1904-1916.

Fast R-CNN (2015)
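SPP-Net's pyramid pooling and the RoI pooling layer of Fast R-CNN rest on the same idea: a region of a feature map, whatever its size, is divided into a fixed grid of bins and each bin is max-pooled, so the length of the output vector does not depend on the input size. The following is a minimal sketch of that idea in plain Python for a single-channel feature map. The function names and the 4x4, 2x2, 1x1 pyramid are illustrative assumptions, not code from either paper.

```python
def pool_to_grid(fmap, n):
    """Max-pool a 2-D feature map (a list of rows) into an n x n grid of bins."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(n):
        # Bin edges: floor for the start, ceil for the end, so the bins
        # cover the whole map and no bin is ever empty.
        r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
        for j in range(n):
            c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
            out.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
    return out


def spatial_pyramid_pool(fmap, levels=(4, 2, 1)):
    """Concatenate the pooled grids of every pyramid level into one vector.

    The output length is sum(n * n for n in levels), independent of the
    size of fmap. With a single level this reduces to RoI-style pooling.
    """
    vec = []
    for n in levels:
        vec.extend(pool_to_grid(fmap, n))
    return vec


# Two feature maps of different sizes give vectors of the same length.
small = [[r * 8 + c for c in range(8)] for r in range(6)]
large = [[r * 10 + c for c in range(10)] for r in range(13)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

A real network applies this per channel, so the final vector has (number of channels) x (number of bins) entries.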

A Fast R-CNN network takes an entire image and a set of object proposals as input. The network processes the entire image with several convolutional (conv) and max pooling layers to produce a conv feature map. For each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers.
Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.

Faster R-CNN (2015)
Faster R-CNN = RPN + Fast R-CNN
A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score.
Figure 1. Faster R-CNN is a single, unified network for object detection.
Figure 2. Region Proposal Network (RPN).
Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in neural information processing systems. 2015: 91-99.

Training tricks
- Data Augmentation
- Dropout
- ReLU
- Batch Normalization

Data Augmentation
- rotation
- flip
- zoom
- shift
- scale
- contrast
- noise disturbance
- color
- ...

Dropout (2012)
Dropout consists of setting to zero the output of each hidden neuron with probability p. The neurons which are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation.

ReLU (Rectified Linear Unit)
[Figure label: rectified]
Advantages:
- Simplified calculation
- Avoids vanishing gradients

Batch Normalization (2015)
Insert a normalization layer at the input of each layer of the network. For a layer with d-dimensional input x = (x^{(1)}, ..., x^{(d)}), each dimension is normalized:
\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}
Internal Covariate Shift
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

Application in Aesthetic Image Evaluation
Dong Z, Shen X, Li H, et al. Photo Quality Assessment with DCNN that Understands Image Well[M]//MultiMedia Modeling. Springer International Publishing, 2015: 524-535.
Lu X, Lin Z, Jin H, et al. Rating image aesthetics using deep learning[J]. IEEE Transactions on Multimedia, 2015, 17(11): 2021-2034.
Wang W, Zhao M, Wang L, et al. A multi-scene deep learning model for image aesthetic evaluation[J]. Signal Processing Image Communication, 2016, 47: 511-518.

Photo Quality Assessment with DCNN that Understands Image Well
[Diagram labels]
- DCNN_Aesth: well-trained network, a two-class SVM classifier
- DCNN_Aesth_SP: original images, segmented images, spatial pyramid
- Datasets: ImageNet, CUHK, AVA
Dong Z, Shen X, Li H, et al. Photo Quality Assessment with DCNN that Understands Image Well[M]//MultiMedia Modeling. Springer International Publishing, 2015: 524-535.

Rating image aesthetics using deep learning
- SCNN
- DCNN: supports heterogeneous inputs, i.e., global and local views. All parameters in the DCNN are jointly trained. This enables the network to judge image aesthetics while simultaneously considering both the global and local views of an image.
Fig. 1. Global views and local views of an image.
Fig. 2. SCNN architecture.
Fig. 3. DCNN architecture.
Lu X, Lin Z, Jin H, et al. Rating image aesthetics using deep learning[J]. IEEE Transactions on Multimedia, 2015, 17(11): 2021-2034.

A multi-scene deep learning model for image aesthetic evaluation
- Design a scene convolutional layer consisting of multi-group descriptors in the network.
- Design a pre-training procedure to initialize the model.
Fig. 1. The architecture of the multi-scene deep learning model (MSDLM).
Fig. 2. Overview of the proposed MSDLM.
Architecture of MSDLM: 4 convolutional layers + 1 scene convolutional layer + 3 fully connected layers.
Wang W, Zhao M, Wang L, et al. A multi-scene deep learning model for image aesthetic evaluation[J]. Signal Processing Image Communication, 2016, 47: 511-518.

Example - Load the dataset

    import gzip
    import os
    import pickle
    from urllib.request import urlretrieve  # Python 3 location of urlretrieve

    import numpy as np

    def load_dataset():
        url = 'http:/'  # the URL is truncated in the source
        filename = 'E:/DeepLearning_Library/mnist.pkl.gz'
        # Download the archive only if it is not already on disk.
        if not os.path.exists(filename):
            print('Downloading MNIST dataset.')
            urlretrieve(url, filename)
        with gzip.open(filename, 'rb') as f:
            data = pickle.load(f)
        # The pickle holds the training, validation and test splits.
        X_train, y_train = data[0]
        X_val, y_val = data[1]
        X_test, y_test = data[2]
        # Reshape flat images to (batch, channels, height, width).
        X_train = X_train.reshape(-1, 1, 28, 28)
        X_val = X_val.reshape(-1, 1, 28, 28)
        X_test = X_test.reshape(-1, 1, 28, 28)
        # Store labels as unsigned 8-bit integers.
        y_train = y_train.astype(np.uint8)
        y_val = y_val.astype(np.uint8)
        y_t  # the source text is cut off here
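
As a further example, the convolution layer and pooling layer described in the CNN part of the deck can be sketched in plain Python. This is a minimal illustration under stated assumptions: a single-channel image stored as a list of rows, no padding, and function names (`conv2d_valid`, `pool2d`) chosen here for illustration. As on the slide, the kernel is multiplied element-wise with the image patch and summed, without flipping the kernel.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image; at each position sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every position
            # (weight sharing) and only see a kh x kw patch
            # (local receptive field).
            row.append(sum(kernel[a][b] * image[i * stride + a][j * stride + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max-pooling or mean-pooling over size x size windows."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [fmap[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_valid(image, kernel))   # [[7, 9, 11], [15, 17, 19], [23, 25, 27]]
print(pool2d(image, mode="max"))     # [[6, 8], [14, 16]]
print(pool2d(image, mode="mean"))    # [[3.5, 5.5], [11.5, 13.5]]
```

The 2x2 kernel has 4 weights regardless of the image size, which is the parameter saving the slide attributes to weight sharing.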

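Three of the training tricks covered in the deck (ReLU, Dropout and Batch Normalization) can also be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation from the cited papers: the function names are chosen here, `eps` is a small constant assumed for numerical stability, the learned scale and shift of batch normalization are omitted, and dropout is shown for training time only.

```python
import random


def relu(xs):
    """Rectified linear unit: negative activations become zero."""
    return [max(0.0, x) for x in xs]


def dropout(xs, p, rng):
    """Set each activation to zero independently with probability p."""
    return [0.0 if rng.random() < p else x for x in xs]


def batch_norm(batch, eps=1e-5):
    """Normalize every dimension over the mini-batch.

    batch is a list of d-dimensional samples. Each dimension k is
    shifted by its batch mean and divided by its batch standard
    deviation, so it ends up with roughly zero mean and unit variance.
    """
    n, d = len(batch), len(batch[0])
    means = [sum(row[k] for row in batch) / n for k in range(d)]
    variances = [sum((row[k] - means[k]) ** 2 for row in batch) / n
                 for k in range(d)]
    return [[(row[k] - means[k]) / (variances[k] + eps) ** 0.5
             for k in range(d)]
            for row in batch]


rng = random.Random(0)  # fixed seed so repeated runs drop the same units
hidden = relu([-1.0, 0.0, 2.5, 4.0])
print(hidden)
print(dropout(hidden, p=0.5, rng=rng))
print(batch_norm([[1.0, 10.0], [3.0, 30.0]]))
```

Passing the random generator explicitly keeps the dropout mask reproducible, which is convenient when comparing runs.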