Visual Analytics of Deep Learning (slide deck, 62 pages, PPTX, 6.73 MB; uploaded 2022-04-29)

Slide 1: Visual Analytics of Deep Learning

Slide 2: Motivation
- A domain expert inspects the pipeline: training data, training process, DL model, model output, e.g., "Cat (prob = 0.93)".
- Typical questions: Why is the output like this? When does the model fail? When does the model work? When can I trust the model?
- The development of high-quality deep models typically relies on a substantial amount of trial-and-error.

Slide 3: Explainable Deep Learning
- Explainable deep learning gives the expert a reason, e.g., "Cat, because: fur, ear".
- Goals: understand the model output; understand when it works; understand when it fails; understand why it fails.
- Two routes:
  - Explainable models: explain the model outputs by statistical approaches (e.g., the ICML 2017 best paper).
  - Visual analytics of deep models: visualize the workings of the model and support interactive exploration (e.g., the VAST 2017 best paper).

Slide 4: Overview: In Academia
A hot topic. Paper counts (data collected from IEEE VIS only):

                            2016  2017  2018
    Deep-learning-specific    2    7*    7**
    Other models / generic    2    9     6
    Total                     4   16    13

    *  includes the VAST 2017 best paper
    ** includes a VAST 2018 honorable mention

Slide 5: Overview: In Industry
- Google: TensorBoard, AutoML
- Microsoft: CustomVision, Machine Learning Service, Machine Learning Studio
- IBM: Visual Recognition
- Facebook: FBLearner

Slide 6: TensorBoard
- Visualizing learning; graph visualization; histogram dashboard.

Slide 7: Outline
- Part 1: Model
- Part 2: Training

- Part 3: Dataset
- Part 4: Cost function

"Nearly all deep learning algorithms can be described as particular instances of a fairly simple recipe: combine a specification of a dataset, a cost function, an optimization procedure, and a model." (from the "Deep Learning" book)

Slide 8: Part 1: Model
- Models are deeper and deeper: AlexNet (7 layers, 2013), VGG (19 layers, 2015), ResNet (101 layers, 2017).
- Structures are more complex: no longer chain-like, but with shortcuts.
- Challenges: an efficient mining approach to support real-time interaction; visual clutter when showing the model structure and the neurons in each layer.

Slide 9: Analyzing the Noise Robustness of Deep Neural Networks
Mengchen Liu, Shixia Liu, Hang Su, Kelei Cao, Jun Zhu. VAST 2018.

Slide 10: Adversarial Examples
- Inputs intentionally designed to mislead a deep neural network (DNN) into making an incorrect prediction, e.g., a giant panda classified as a guenon monkey.

Slide 11: Approach: datapath extraction and datapath visualization.

Slide 12: Datapath Extraction: Motivation
- Current method: inspect the most activated neurons (neuron 1, neuron 2) and their learned features.
- Problem: the most activated neurons give misleading results when the image contains a highly recognizable secondary object.
- Reasons: neurons have complex interactions; there is a gap between activation and prediction.

Slide 13: Datapath Extraction: Formulation
- The critical neurons for a prediction: the neurons that highly contributed to the final prediction.
- Subset selection: keep the original prediction while selecting a minimized subset of neurons (N: all neurons; Ns: the neuron subset; p(·): the prediction).
- Extended to a set of images X.

Slide 14: Datapath Extraction: Solution
- Directly solving it is time-consuming: the problem is NP-complete, and the search space is large due to the large number of layers and neurons in a CNN.
- Approach: divide-and-conquer-based search-space reduction, then a quadratic approximation, which is an accurate approximation in the smaller search space.

Slide 15: Datapath Extraction: Search-Space Reduction
- Original problem: 57.78 million dimensions (network: ResNet-101).
- Split into layers: 2k to 1.44 million dimensions each (neurons in a layer).
- Split into feature maps: 64 to 2k dimensions each (neurons in a feature map).

Slide 16: Datapath Extraction: Quadratic Approximation
- Still NP-complete, so reformulate: relax the discrete question of whether the j-th feature map in layer i is critical to a continuous one.
- Taylor decomposition yields a quadratic form Q whose element Q_{j,k} is built from (∂p/∂a_j)·a_j and (∂p/∂a_k)·a_k, where a_j is the activation vector of the j-th feature map; the derivatives are computed by backpropagation in each iteration, followed by quadratic optimization.
- This (1) bridges the gap between activation and prediction, and (2) lets each element of Q approximately model the interaction between feature map j and feature map k.

Slide 17: Datapath Extraction: Motivation, revisited
- The most activated neurons give misleading results when the image contains a highly recognizable secondary object.
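The formulation fragmented across slides 13 and 16 can be written out in full. This is a reconstruction from the slide fragments (the symbols N, Ns, p, X, and a_j are the ones defined there); the exact constraint and the precise form of Q should be checked against the VAST 2018 paper:

```latex
% Critical-neuron (datapath) selection: keep the original prediction
% while selecting as few neurons as possible.
\min_{N_s \subseteq N} \; |N_s|
\quad \text{s.t.} \quad
\arg\max p(x; N_s) = \arg\max p(x; N)
\qquad \text{for all } x \in X

% Quadratic relaxation (slide 16): Q_{j,k} approximately models the
% interaction between feature maps j and k.
Q_{j,k} =
\Bigl( \tfrac{\partial p}{\partial a_j} \cdot a_j \Bigr)
\Bigl( \tfrac{\partial p}{\partial a_k} \cdot a_k \Bigr)
```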

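The slides do not say how the panda-to-monkey adversarial example of slide 10 was generated. A common one-step construction is the fast gradient sign method (FGSM), sketched here on a toy linear softmax classifier; the model, data, and epsilon are hypothetical stand-ins, not anything from the paper:

```python
import numpy as np

# Hypothetical toy classifier: logits = W x + b (3 classes, 8 features).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
b = np.zeros(3)

def predict(x):
    return int(np.argmax(W @ x + b))

def loss_grad(x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x."""
    z = W @ x + b
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y] -= 1.0            # dL/dz for cross-entropy with true class y
    return W.T @ p         # chain rule back to the input

x = rng.normal(size=8)
y = predict(x)             # use the model's own label as the target to attack

# FGSM: take one step in the sign direction of the input gradient.
eps = 0.5
x_adv = x + eps * np.sign(loss_grad(x, y))
```

For a real DNN the only change is that the input gradient comes from automatic differentiation instead of the closed-form chain rule used here.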
- Reason: neurons have complex interactions, and there is a gap between activation and prediction.

Slides 18-21: Why? Datapath visualization
- Feature-map clusters, colored by activation: A(normal) vs. A(adversarial).
- An Euler-diagram-based layout presents the feature maps in a layer: shared feature maps vs. unique feature maps.

Slide 22: Research Opportunities
- Dynamic interactions between layers: for each image, the model decides whether to use the shortcuts or the main path.

Slide 23: Part 2: Training
- Challenge 1: handle a large amount of time-series data, i.e., activation/gradient/weight changes over time; often millions of activations/gradients/weights in a deep model.
- Challenge 2: identify the root cause of a failed training process; it is often difficult to locate the specific neurons leading to the training failure, because neurons influence each other.

Slide 24: Analyzing the Training Processes of Deep Generative Models

Mengchen Liu, Jiaxin Shi, Kelei Cao, Jun Zhu, Shixia Liu. VAST 2017.

Slide 25: DGMTracker
- Multi-level visualization for the debugging process: snapshot level (hybrid data-flow visualization), network level, and neuron level.
- Techniques: blue-noise polyline sampling; credit assignment.

Slide 26: Blue-Noise Polyline Sampling: Motivation
- A line chart is employed to visually convey the training dynamics: activation/gradient/weight changes over time.
- Challenge: visual clutter caused by a large number of time series.
- Solution: blue-noise polyline sampling, which both reduces visual clutter and preserves outliers.

Slide 27: Blue-Noise Sampling
- The selected samples have blue-noise properties: they are located randomly and uniformly in the space.
- Compared with traditional random sampling: a low sampling rate in high-density regions (reduces visual clutter) and a high sampling rate in low-density regions (preserves outliers).

Slide 28: Blue-Noise Polyline Sampling
- State of the art: blue-noise line-segment sampling [Sun et al., 2013]: sample a line segment; if its distance to the others is large enough, accept it, otherwise reject it.
- An intuitive method: select line-segment samples with blue-noise properties, then take the polylines that contain the selected line segments as samples.
- Solution: select "complete" polylines instead. Two questions: how to select a polyline sample, and how to compute the distance between two polylines.

Slide 29: Blue-Noise Polyline Sampling (cont'd)
- How to select a polyline sample: select the polyline that makes the samples the most balanced in direction (isotropy).
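The accept/reject rule on slides 27 and 28 can be sketched as dart throwing. The paper applies it to whole polylines with a polyline distance; this hypothetical sketch uses 2-D points and Euclidean distance only to keep the example short:

```python
import numpy as np

def blue_noise(candidates, r):
    """Dart throwing: keep a candidate only if it is at least r away
    from every sample kept so far (the accept/reject rule)."""
    kept = []
    for c in candidates:
        if all(np.linalg.norm(c - k) >= r for k in kept):
            kept.append(c)
    return np.array(kept)

rng = np.random.default_rng(1)
pts = rng.random((500, 2))       # 500 candidates in the unit square
samples = blue_noise(pts, r=0.1)
# Dense regions are thinned (less visual clutter), while isolated
# candidates (the outliers) are almost always kept.
```

Replacing the points with polylines and the Euclidean distance with a polyline distance gives the structure of the sampling described on slide 28.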

- How to compute the distance between two polylines.

Slide 30: Case Study: Diagnosing a Failed Variational Autoencoder
- Blue: components of the loss function; red: sampling of random variables (mean, variance, training data).
- Dataset: CIFAR-10. Symptom: Loss = NaN (between 10k and 30k iterations); the example run fails at iteration 24,397.

Slide 31: Loss changes over time: an abnormal snapshot; the run fails at iteration 24,397.

Slide 32: Focus snapshot
- Maximum, average, and minimum activation over time; source: the 2nd Gaussian, i.e., the logarithmic variance of the Gaussian sampling layer, aggregated over height and width.
- A sudden increase: some activations showed unusual behavior, while most of the activations of this layer remained stable (abnormal image vs. normal image).
- Credit assignment for calculating contributions; blue-noise polyline sampling.

Slide 33: Training: Research Opportunities
- Multi-core training processes: analyze core interactions, e.g., the 128 to 512 cores of a Google TPU v3 Pod reported in "Large Scale GAN Training for High Fidelity Natural Image Synthesis" (with generated images).
- Long training processes: online analysis. A failed run wastes time (dozens of hours per training process) and funding (about ¥8 per core per hour).

Slide 34: Part 3: Dataset
- Challenge: a large number of classes and images, e.g., 1,000 classes in ImageNet.
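For the many-class datasets of slide 34, the pixel-based confusion-matrix view used in this part is built from a simple count matrix. A minimal sketch, with hypothetical labels for a 3-class problem:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i is predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
# Off-diagonal cells reveal which classes the model confuses.
```

At ImageNet scale the result is a 1000 x 1000 matrix, which is why the slides render it as a pixel-based image rather than a table.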

- Visualization ideas: the class hierarchy as an icicle plot; the confusion matrix as a pixel-based visualization.

Slide 35: Do Convolutional Neural Networks Learn Class Hierarchy? (Example class: horse.)

Dataset: Research Opportunities
- More complex datasets beyond pure classification, e.g., Visual Genome (https://visualgenome.org/): 108,077 images; 5.4 million region descriptions; 1.7 million visual question answers; 3.8 million object instances; 2.8 million attributes; 2.3 million relationships; mapped to WordNet synsets.

Slide 36: Part 4: Loss Functions
- Loss: the difference between the model output and the ground truth. Loss = L(f(x, θ), t), where f is the model, θ its parameters, t the ground truth, and L specifies how to calculate the difference, e.g., the L2 norm.
- Challenges: the loss is a multi-dimensional function with millions of dimensions, and its mathematical properties are not fully understood.

Slide 37: Visualizing the Loss Landscape of Neural Nets.

Slide 38: Conclusion
- Visual analytics of deep learning is hot in both academia and industry.
- We have made some efforts to help users better analyze deep models.
- Research opportunities: visualizing dynamic interactions between layers; visualizing the multi-core training process; visualizing online training processes; dealing with more complex datasets beyond classification; visualizing the high-dimensional loss function.

Slide 39: Thank you!
