Visualization and Data Mining

Outline
- Graphical excellence and lie factor
- Representing data in 1, 2, and 3-D
- Representing data in 4+ dimensions
  - Parallel coordinates
  - Scatterplots
  - Stick figures

Napoleon Invasion of Russia, 1812
[Image: Napoleon]

Marley, 1885
[Figure: www.odt.org, from http://www.odt.org/Pictures/minard.jpg, used by permission]

Snow's Cholera Map, 1855

Asia at night

South and North Korea at night
- Seoul, South Korea
- North Korea: notice how dark it is

Visualization Role
- Support interactive exploration
- Help in result presentation
- Disadvantage: requires human eyes
- Can be misleading

Bad Visualization: Spreadsheet

Year   Sales
1999   2,110
2000   2,105
2001   2,120
2002   2,121
2003   2,124

[Chart: Sales by year, 1999 to 2003; Y axis runs from 2095 to 2130]

What is wrong with this graph?

Bad Visualization: Spreadsheet with misleading Y axis
(same table and chart as the previous slide)
The Y-axis scale gives the WRONG impression of a big change.

Better Visualization

Year   Sales
1999   2,110
2000   2,105
2001   2,120
2002   2,121
2003   2,124

[Chart: Sales by year, 1999 to 2003; Y axis runs from 0 to 3000]

A Y axis that starts at 0 gives the correct impression of a small change.

Lie Factor

$$
\text{Lie Factor}
= \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}
= \frac{(5.3 - 0.6)/0.6}{(27.5 - 18.0)/18.0}
= \frac{7.833}{0.528}
= 14.8
$$

Tufte requirement: 0.95 < Lie Factor < 1.05

Tufte's Principles of Graphical Excellence
- Give the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
- Tell the truth about the data!

Visualization Methods
- Visualizing in 1-D, 2-D and 3-D
  - well-known visualization methods
- Visualizing more dimensions
  - Parallel Coordinates
  - Other ideas

1-D (Univariate) Data
Representations:
- Tukey box plot (labels: low, middle 50%, mean, high)
- Histogram

2-D (Bivariate) Data
- Scatter plot (axes: price, mileage)

3-D Data (projection)
[Figure: 3-D projection; axis label: price]

3-D image (requires 3-D blue and red glasses)
Taken by Mars Rover Spirit, Jan 2004

Visualizing in 4+ Dimensions
- Scatterplots
- Parallel Coordinates
- Chernoff faces
- Stick Figures

Multiple Views
Give each variable its own display.

     A   B   C   D   E
1    4   1   8   3   5
2    6   3   4   2   1
3    5   7   2   4   3
4    2   6   3   1   5

[Figure: one display per variable A to E, each showing records 1 to 4]

Problem: does not show correlations.

Scatterplot Matrix
Represent each possible pair of variables in its own 2-D scatterplot (car data).
Q: Useful for what?
A: Linear correlations (e.g. horsepower & weight)
Q: Misses what?
A: Multivariate effects

Parallel Coordinates
- Encode variables along a horizontal row
- Vertical line specifies values
[Figure: a dataset in Cartesian coordinates; the same dataset in parallel coordinates]
Invented by Alfred Inselberg while at IBM, 1985

Example: Visualizing Iris Data
- Iris setosa
- Iris versicolor
- Iris virginica

Flower Parts
- Petal, a non-reproductive part of the flower
- Sepal, a non-reproductive part of the flower

Parallel Coordinates
Sepal Length: 5.1

Parallel Coordinates: 2 D
Sepal Length: 5.1
Sepal Width: 3.5

Parallel Coordinates: 4 D
Sepal Length: 5.1
Sepal Width: 3.5
Petal Length: 1.4
Petal Width: 0.2

Parallel Visualization of Iris data
[Figure: parallel coordinates plot of the Iris data]

Parallel Visualization Summary
- Each data point is a line
- Similar points correspond to similar lines
- Lines crossing over correspond to negatively correlated attributes
- Interactive exploration and clustering
- Problems: order of axes, limit of about 20 dimensions

Chernoff Faces
Encode the values of different variables in characteristics of a human face.
- http://www.cs.uchicago.edu/wiseman/chernoff/
- http:/
Applets: Interactive Face

Chernoff faces, example

Stick Figures
- Two variables are mapped to the X and Y axes
- Other variables are mapped to limb lengths and angles
- Texture patterns can show data characteristics

Stick figures, example
Census data showing age, income, sex, education, etc.
Closed figures correspond to women, and we can see more of them on the left.
Note also a young woman with high income.

Visualization software
Free and open-source:
- Ggobi
- Xmdv
- Many more: see www.KD

Summary
- Many methods
- Visualization is possible in more than 3-D
- Aim for graphical excellence
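
Worked example: lie factor

As a rough check of the lie factor arithmetic from the Lie Factor slide, the sketch below recomputes it in Python. It uses the slide's numbers: sizes in the graphic of 0.6 and 5.3, and data values of 18.0 and 27.5. The function names `relative_change`, `lie_factor`, and `within_tufte_bounds` are illustrative assumptions, not part of any library or of the original slides.

```python
def relative_change(first, second):
    """Relative size of an effect: (second - first) / first."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


def within_tufte_bounds(factor, low=0.95, high=1.05):
    """Tufte requirement from the slides: 0.95 < Lie Factor < 1.05."""
    return low < factor < high


# Numbers taken from the Lie Factor slide.
factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(factor, 1))             # 14.8
print(within_tufte_bounds(factor))  # False
```

A graphic whose sizes change in proportion to the data has a lie factor of 1.0, which falls inside the 0.95 to 1.05 band.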