Numerical Descriptive Measures
Chapter 3

Objectives
In this chapter, you learn to:
- Describe the properties of central tendency, variation, and shape in numerical data
- Construct and interpret a boxplot
- Compute descriptive summary measures for a population
- Calculate the covariance and the coefficient of correlation

Summary Definitions
- The central tendency is the extent to which the values of a numerical variable group around a typical or central value.
- The variation is the amount of dispersion or scattering away from a central value that the values of a numerical variable show.
- The shape is the pattern of the distribution of values from the lowest value to the highest value.

DCOVA

Measures of Central Tendency: The Mean
- The arithmetic mean (often just called the "mean") is the most common measure of central tendency.
- For a sample of size n:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

where \(\bar{X}\) is pronounced "x-bar", \(n\) is the sample size, \(X_i\) is the ith value, and \(X_1, X_2, \ldots, X_n\) are the observed values.

Measures of Central Tendency: The Mean (cont)
- The most common measure of central tendency
- Mean = sum of values divided by the number of values
- Affected by extreme values (outliers)

\[ \frac{11 + 12 + 13 + 14 + 15}{5} = \frac{65}{5} = 13 \qquad \frac{11 + 12 + 13 + 14 + 20}{5} = \frac{70}{5} = 14 \]

[Figure: two dot plots on a scale from 11 to 20, one with Mean = 13 and one with Mean = 14]

Measures of Central Tendency: The Median
- In an ordered array, the median is the "middle" number (50% above, 50% below)
- Less sensitive than the mean to extreme values

[Figure: the same two dot plots on a scale from 11 to 20; Median = 13 in both]

Measures of Central Tendency: Locating the Median
- The location of the
median when the values are in numerical order (smallest to largest):

\[ \text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data} \]

- If the number of values is odd, the median is the middle number
- If the number of values is even, the median is the average of the two middle numbers

Note that \((n + 1)/2\) is not the value of the median, only the position of the median in the ranked data.

Measures of Central Tendency: The Mode
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes

[Figure: two dot plots, one on a scale from 0 to 14 with Mode = 9 and one on a scale from 0 to 6 with no mode]

Measures of Central Tendency: Review Example
House Prices:
  $2,000,000
  $500,000
  $300,000
  $100,000
  $100,000
  Sum: $3,000,000
- Mean: ($3,000,000 / 5) = $600,000
- Median: middle value of ranked data = $300,000
- Mode: most frequent value = $100,000

Measures of Central Tendency: Which Measure to Choose?
- The mean is generally used, unless extreme values (outliers) exist.
- The median is often used, since the median is not sensitive to extreme values. For example, median home prices may be reported for a region because the median is less sensitive to outliers.
- In some situations it makes sense to report both the mean and the median.

Measures of Central Tendency: Summary
Central Tendency
- Arithmetic Mean: \(\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}\)
- Median: middle value in the ordered array
- Mode: most frequently observed value

Measures of Variation
[Figure: two distributions with the same center and different variation]
- Measures of variation give information on the spread or variability or dispersion of the data
values.

Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Variation: The Range
- Simplest measure of variation
- Difference between the largest and the smallest values:

\[ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \]

Example: [Figure: dot plot on a scale from 0 to 14] Range = 13 - 1 = 12

Measures of Variation: Why The Range Can Be Misleading
- Does not account for how the data are distributed
  [Figure: two dot plots on a scale from 7 to 12 with different distributions; both have Range = 12 - 7 = 5]
- Sensitive to outliers
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5      Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120    Range = 120 - 1 = 119

Measures of Variation: The Sample Variance
- Average (approximately) of squared deviations of values from the mean
- Sample variance:

\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

where \(\bar{X}\) = arithmetic mean, \(n\) = sample size, and \(X_i\) = ith value of the variable \(X\).

Measures of Variation: The Sample Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- Is the square root of the variance
- Has the same units as the original data
- Sample standard deviation:

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

Measures of Variation: The Standard Deviation
Steps for Computing Standard Deviation
1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get the sample standard deviation.

Measures of Variation: Sample Standard Deviation: Calculation Example
Sample Data (\(X_i\)): 10 12 14 15 17 18 18 24
n = 8, Mean = \(\bar{X}\) = 16

\[ S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095 \]

A measure of the "average" scatter around the mean

Measures of Variation: Comparing Standard Deviations
[Figure: three dot plots on a scale from 11 to 21, each with Mean = 15.5: Data A (S = 3.338), Data B (S = 0.926), Data C (S = 4.567)]

Measures of Variation: Comparing Standard Deviations
[Figure: two distributions, one with a smaller standard deviation and one with a larger standard deviation]

Measures of Variation: Summary Characteristics
- The more the data are spread out, the greater the range, variance, and standard deviation.
- The more the data are
concentrated, the smaller the range, variance, and standard deviation.
- If the values are all the same (no variation), all these measures will be zero.
- None of these measures are ever negative.

Measures of Variation: The Coefficient of Variation
- Measures relative variation
- Always in percentage (%)
- Shows variation relative to mean
- Can be used to compare the variability of two or more sets of data measured in different units

\[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]

Measures of Variation: Comparing Coefficients of Variation
- Stock A:
  - Average price last year = $50
  - Standard deviation = $5
- Stock B:
  - Average price last year = $100
  - Standard deviation = $5

\[ CV_A = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

\[ CV_B = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]

Both stocks have the same standard deviation, but stock B is less variable relative to its price.

Measures of Variation: Comparing Coefficients of Variation (cont)
- Stock A:
  - Average price last year = $50
  - Standard deviation = $5
- Stock C:
  - Average price last year = $8
  - Standard deviation = $2

\[ CV_A = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

\[ CV_C = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$2}{\$8} \cdot 100\% = 25\% \]

Stock C has a much smaller standard deviation but a much higher coefficient of variation.

Locating Extreme Outliers: Z-Score
- To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
- The Z-score is the number of standard deviations a data value is from the mean.
- A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0.
- The larger the absolute value of the Z-score, the farther the data value is from the mean.

Locating Extreme Outliers: Z-Score
\[ Z = \frac{X - \bar{X}}{S} \]

where \(X\) represents the data value, \(\bar{X}\) is the sample mean, and \(S\) is the sample standard deviation.

Locating Extreme Outliers: Z-Score
- Suppose the mean math SAT score is 490, with a standard deviation of 100.
- Compute the Z-score for a test score of 620.

\[ Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3 \]

A score of 620 is 1.3 standard
deviations above the mean and would not be considered an outlier.

Shape of a Distribution
- Describes how data are distributed
- Two useful shape related statistics are:
  - Skewness: measures the extent to which data values are not symmetrical
  - Kurtosis: describes the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the distribution

Shape of a Distribution (Skewness)
- Measures the extent to which data is not symmetrical

  Left-Skewed        Symmetric         Right-Skewed
  Mean < Median      Mean = Median     Median < Mean
  Skewness < 0       Skewness = 0      Skewness > 0

Shape of a Distribution (Kurtosis)
- Kurtosis measures how sharply the curve rises approaching the center of the distribution
  - Sharper peak than bell-shaped (Kurtosis > 0)
  - Flatter than bell-shaped (Kurtosis < 0)

[Table: comparisons of Median - Xsmallest with Xlargest - Median, Q1 - Xsmallest with Xlargest - Q3, and Median - Q1 with Q3 - Median]

Chebyshev Rule
- Regardless of how the data are distributed, at least \((1 - 1/k^2) \times 100\%\) of the values will fall within k standard deviations of the mean (for k > 1)
- Examples:

  At least                                  Within
  \((1 - 1/2^2) \times 100\% = 75\%\)       k = 2 (\(\mu \pm 2\sigma\))
  \((1 - 1/3^2) \times 100\% = 88.89\%\)    k = 3 (\(\mu \pm 3\sigma\))

We Discuss Two Measures Of The Relationship Between Two Numerical Variables
- Scatter plots allow you to visually examine the relationship between two numerical variables and now we will discuss
two quantitative measures of such relationships:
  - The Covariance
  - The Coefficient of Correlation

The Covariance
- The covariance measures the strength of the linear relationship between two numerical variables (X & Y)
- The sample covariance:

\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \]

- Only concerned with the strength of the relationship
- No causal effect is implied

Interpreting Covariance
- Covariance between two variables:
  - cov(X,Y) > 0: X and Y tend to move in the same direction
  - cov(X,Y) < 0: X and Y tend to move in opposite directions
  - cov(X,Y) = 0: X and Y show no linear relationship (zero covariance alone does not establish independence)
- The covariance has a major flaw:
  - It is not possible to determine the relative strength of the relationship from the size of the covariance

Coefficient of Correlation
- Measures the relative strength of the linear relationship between two numerical variables
- Sample coefficient of correlation:

\[ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \]

where

\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \qquad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \]
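The sample covariance and the sample coefficient of correlation defined above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the paired data are hypothetical and invented for the example. It also shows the flaw noted for the covariance: rescaling one variable changes the covariance but leaves r unchanged.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance: sum of (Xi - X-bar)(Yi - Y-bar), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Sample coefficient of correlation: r = cov(X, Y) / (S_X * S_Y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical paired observations, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = correlation(x, y)
print(cov_xy)           # 1.5
print(round(r_xy, 4))   # 0.7746

# Expressing X in different units (here, multiplied by 100) changes the
# covariance but not r, which is unit free.
x_rescaled = [100 * v for v in x]
cov_rescaled = sample_covariance(x_rescaled, y)
r_rescaled = correlation(x_rescaled, y)
print(cov_rescaled)     # 150.0
```

Because the covariance grew by a factor of 100 while r stayed the same, the size of the covariance by itself says little about how strong the relationship is.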
Features of the Coefficient of Correlation
- The population coefficient of correlation is referred to as ρ.
- The sample coefficient of correlation is referred to as r.
- Either ρ or r has the following features:
  - Unit free
  - Ranges between -1 and 1
  - The closer to -1, the stronger the negative linear relationship
  - The closer to 1, the stronger the positive linear relationship
  - The closer to 0, the weaker the linear relationship

Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, and r = +1]

The Coefficient of Correlation Using Microsoft Excel Function

The Coefficient of Correlation Using Microsoft Excel Data Analysis Tool
1. Select Data
2. Choose Data Analysis
3. Choose Correlation & Click OK

The Coefficient of Correlation Using Microsoft Excel
4. Input data range and select appropriate options
5. Click OK to get output

Interpreting the Coefficient of Correlation Using Microsoft Excel
- r = .733
- There is a relatively strong positive linear relationship between test score #1 and test score #2.
- Students who scored high on the first test tended to score high on the second test.

Pitfalls in Numerical Descriptive Measures
- Data analysis is objective
  - Should report the summary measures that best describe and communicate the important aspects of the data set
- Data interpretation is subjective
  - Should be done in a fair, neutral and clear manner

Ethical Considerations
Numerical descriptive measures:
- Should document both good and bad results
- Should be presented in a fair, objective and neutral manner
- Should not use inappropriate summary measures to distort facts

Chapter Summary
In this chapter we have discussed:
- Describing the properties of central tendency, variation, and shape in numerical data
- Constructing and interpreting a boxplot
- Computing descriptive summary measures for a population
- Calculating the covariance and the coefficient of correlation
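As a recap of the measures of central tendency and variation covered in this chapter, the following Python sketch recomputes three of the worked examples: the standard deviation calculation for the sample 10 12 14 15 17 18 18 24, the house price review example, and the SAT Z-score. The helper names (`describe`, `z_score`) are illustrative assumptions and do not come from the slides; `statistics.median` and `statistics.multimode` are standard-library functions (the latter requires Python 3.8 or later).

```python
import math
import statistics


def describe(values):
    """Summary measures for a sample, using the n - 1 divisor for variance."""
    n = len(values)
    mean = sum(values) / n
    # Steps 1-3: difference from the mean, squared, then summed.
    sum_sq = sum((x - mean) ** 2 for x in values)
    variance = sum_sq / (n - 1)        # Step 4: sample variance
    std_dev = math.sqrt(variance)      # Step 5: sample standard deviation
    return {
        "mean": mean,
        "median": statistics.median(values),
        "modes": statistics.multimode(values),   # list: may hold several modes
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": std_dev / mean * 100,      # coefficient of variation
    }


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Standard deviation calculation example.
sample = [10, 12, 14, 15, 17, 18, 18, 24]
summary = describe(sample)
print(summary["mean"])                 # 16.0
print(round(summary["std_dev"], 4))    # 4.3095

# House price review example.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
prices = describe(house_prices)
print(prices["mean"], prices["median"], prices["modes"])

# SAT example: mean 490, standard deviation 100, test score 620.
z = z_score(620, 490, 100)
print(z)                               # 1.3
is_extreme_outlier = z < -3.0 or z > 3.0
```

In the house price example the single $2,000,000 value pulls the mean up to $600,000 while the median stays at $300,000, which is why the median is often the measure reported when outliers are present.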