Defining and Collecting Data
Chapter 1

Objectives

In this chapter you learn:
- To understand issues that arise when defining variables
- How to define variables
- How to collect data
- To identify different ways to collect a sample
- To understand the types of survey errors

Classifying Variables By Type

- Categorical (qualitative) variables take categories as their values, such as “yes”, “no”, or “blue”, “brown”, “green”.
- Numerical (quantitative) variables have values that represent a counted or measured quantity.
  - Discrete variables arise from a counting process.
  - Continuous variables arise from a measuring process.

Examples of Types of Variables

Question                                                       Responses   Variable Type
Do you have a Facebook profile?                                Yes or No   Categorical (qualitative)
How many text messages have you sent in the past three days?   -           Numerical (discrete)
How long did the mobile app update take to download?           -           Numerical (continuous)

Types of Variables

Variables
- Categorical. Examples: marital status, political party, eye color (defined categories)
- Numerical
  - Discrete. Examples: number of children, defects per hour (counted items)
  - Continuous. Examples: weight, voltage (measured characteristics)

Collecting Data Correctly Is A Critical Task

- Need to avoid data flawed by biases, ambiguities, or other types of errors.
- Results from flawed data will be suspect or in error.
- Even the most sophisticated statistical methods are not very useful when the data is flawed.

Developing Operational Definitions Is Crucial To Avoid Confusion/Errors

- An operational definition is a clear and precise statement that provides a common understanding of meaning.
- In the absence of an operational definition, miscommunications and errors are likely to occur.
- Arriving at operational definition(s) is a key part of the Define step of DCOVA.

Establishing A Business Objective Focuses Data Collection

Examples of business objectives:
- A marketing research analyst needs to assess the effectiveness of a new television advertisement.
- A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
- An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being manufactured is conforming to company standards.
- An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.

Sources of Data

Primary sources: the data collector is the one using the data for analysis.
- Data from a political survey
- Data collected from an experiment
- Observed data

Secondary sources: the person performing data analysis is not the data collector.
- Analyzing census data
- Examining data from print journals or data published on the internet

Sources of data fall into five categories

- Data distributed by an organization or an individual
- The outcomes of a designed experiment
- The responses from a survey
- The results of conducting an observational study
- Data collected by ongoing business activities

Examples Of Data Distributed By Organizations or Individuals

- Financial data on a company provided by investment services.
- Industry or market data from market research firms and trade associations.
- Stock prices, weather conditions, and sports statistics in daily newspapers.

Examples of Data From A Designed Experiment

- Consumer testing of different versions of a product to help determine which product should be pursued further.
- Material testing to determine which supplier's material should be used in a product.
- Market testing on alternative product promotions to determine which promotion to use more broadly.

Examples of Survey Data

- A survey asking people which laundry detergent has the best stain-removing abilities.
- Political polls of registered voters during political campaigns.
- People being surveyed to determine their satisfaction with a recent product or service experience.

Examples of Data Collected From Observational Studies

- Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.
- Measuring the time it takes for customers to be served in a fast food establishment.
- Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified.

Examples of Data Collected From Ongoing Business Activities

- A bank studies years of financial transactions to help it identify patterns of fraud.
- Economists utilize data on searches done via Google to help forecast future economic conditions.
- Marketing companies use tracking data to evaluate the effectiveness of a web site.

Data Is Collected From Either A Population or A Sample

POPULATION: A population consists of all the items or individuals about which you want to draw a conclusion. The population is the “large group”.

SAMPLE: A sample is the portion of a population selected for analysis. The sample is the “small group”.

Population vs. Sample

Population: all the items or individuals about which you want to draw conclusion(s).
Sample: a portion of the population of items or individuals.

Collecting Data Via Sampling Is Used
When Selecting A Sample Is

- Less time consuming than selecting every item in the population.
- Less costly than selecting every item in the population.
- Less cumbersome and more practical than analyzing the entire population.

Things To Consider/Deal With In Potential Sources Of Data

- Is the source of data structured or unstructured?
- How is electronic data formatted?
- How is data encoded?

Structured Data Follows An Organizing Principle & Unstructured Data Does Not

- A stock ticker provides structured data:
  - The stock ticker repeatedly reports a company name, the number of shares last traded, the bid price, and the percent change in the stock price.
- Due to their inherent structure, data from tables and forms are structured data.
- E-mails from five people concerning stock trades are an example of unstructured data.
  - In these e-mails you cannot count on the information being shared in a specific order or format.
- This book deals exclusively with structured data.

All Of The Methods In This Book Deal With Structured Data

- To use the techniques in this book on unstructured data, you need to convert the unstructured data into structured data.
- For many of the questions you might want to answer, the starting point can/will be tabular data.

Data Can Be Formatted and/or Encoded In More Than One Way

- Some electronic formats are more readily usable than others.
- Different encodings can impact the precision of numerical variables and can also impact data compatibility.
- As you identify and choose sources of data, you need to consider/deal with these issues.

Data Cleaning Is Often A Necessary Activity When Collecting Data

- You will often find “irregularities” in the data:
  - Typographical or data entry errors
  - Values that are impossible or undefined
  - Missing values
  - Outliers
- When found, these irregularities should be reviewed/addressed.
- Both Excel & Minitab can be used to address irregularities.

After Collection It Is Often Helpful To Recode Some Variables

- Recoding a variable can either supplement or replace the original variable.
- Recoding a categorical variable involves redefining categories.
- Recoding a quantitative variable involves changing this variable into a categorical variable.
- When recoding, be sure that the new categories are mutually exclusive (categories do not overlap) and collectively exhaustive (categories cover all possible values).

A Sampling Process Begins With A Sampling Frame

- The sampling frame is a listing of items that make up the population.
- Frames are data sources such as population lists, directories, or maps.
- Inaccurate or biased results can occur if a frame excludes certain portions of the population.
- Using different frames to generate data can lead to dissimilar conclusions.

Types of Samples

Samples
- Non-Probability Samples
  - Judgment
  - Convenience
- Probability Samples
  - Simple Random
  - Systematic
  - Stratified
  - Cluster

Types of Samples: Nonprobability Sample

- In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
  - In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
  - In a judgment sample, you get the opinions of pre-selected experts in the subject matter.

Types of Samples: Probability Sample

- In a probability sample, items in the sample are chosen on the basis of known probabilities.
- Probability samples: simple random, systematic, stratified, cluster.

Probability Sample: Simple Random Sample

- Every individual or item from the frame has an equal chance of being selected.
- Selection may be with replacement (selected individual is returned to the frame for possible reselection) or without replacement (selected individual isn't returned to the frame).
- Samples are obtained from a table of random numbers or computer random number generators.

Selecting a Simple Random Sample Using A Random Number Table

Sampling Frame For Population With 850 Items

Item Name   Item #
Bev R.      001
Ulan X.     002
...         ...
Joann P.    849
Paul F.     850

Portion Of A Random Number Table

49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

The First 5 Items in a simple random sample

Item # 492
Item # 808
Item # 892 - does not exist so ignore
Item # 435
Item # 779
Item # 002

Probability Sample: Systematic Sample

- Decide on sample size: n
- Divide frame of N individuals into groups of k individuals: k = N/n
- Randomly select one individual from the 1st group
- Select every kth individual thereafter

Example: N = 40, n = 4, k = 10 [figure label: First Group]

Probability Sample: Stratified Sample

- Divide population into two or more subgroups (called strata) according to some common characteristic.
- A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
- Samples from subgroups are combined into one.
- This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.

[Figure: population divided into 4 strata]

Probability Sample: Cluster Sample

- Population is divided into several “clusters,” each representative of the population.
- A simple random sample of clusters is selected.
- All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
- A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.

[Figure: population divided into 16 clusters; randomly selected clusters for sample]

Probability Sample: Comparing Sampling Methods

- Simple random sample and systematic sample
  - Simple to use
  - May not be a good representation of the population's underlying characteristics
- Stratified sample
  - Ensures representation of individuals across the entire population
- Cluster sample
  - More cost effective
  - Less efficient (need larger sample to acquire the same level of precision)

Evaluating Survey Worthiness

- What is the purpose of the survey?
- Is the survey based on a probability sample?
- Coverage error: appropriate frame?
- Nonresponse error: follow up
- Measurement error: good questions elicit good responses
- Sampling error: always exists

Types of Survey Errors

- Coverage error or selection bias
  - Exists if some groups are excluded from the frame and have no chance of being selected
- Nonresponse error or bias
  - People who do not respond may be different from those who do respond
- Sampling error
  - Variation from sample to sample will always exist
- Measurement error
  - Due to weaknesses in question design and/or respondent error

Types of Survey Errors (continued)

- Coverage error: excluded from frame
- Nonresponse error: follow up on nonresponses
- Sampling error: random differences from sample to sample
- Measurement error: bad or leading question

Chapter Summary

In this chapter we have discussed:
- The types of variables used in statistics
- How to collect data
- The different ways to collect a sample
- The types of survey errors
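
Worked sketch: recoding a numerical variable

The recoding guidance in this chapter asks for new categories that are mutually exclusive and collectively exhaustive. The sketch below is one possible way to get both properties by construction, using sorted cut points so that every value falls into exactly one category. The variable (age), the cut points, and the labels are assumptions made up for illustration; they do not come from the chapter.

```python
from bisect import bisect_right

# Hypothetical cut points and labels, chosen only for illustration.
# CUTS holds the lower bound of the 2nd, 3rd, and 4th categories.
CUTS = [18, 35, 65]
LABELS = ["under 18", "18 to under 35", "35 to under 65", "65 or older"]


def recode_age(age):
    """Map a non-negative age to exactly one category label.

    Each value lands in one and only one interval (mutually exclusive),
    and the first and last intervals are open-ended, so every valid
    age is covered (collectively exhaustive).
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    # bisect_right counts how many cut points are <= age,
    # which is the index of the category the age belongs to.
    return LABELS[bisect_right(CUTS, age)]


ages = [3, 17, 18, 34.5, 35, 64, 65, 101]
recoded = [recode_age(a) for a in ages]
print(recoded)
```

Keeping the original `ages` list alongside `recoded` matches the point that a recoded variable can supplement the original instead of replacing it.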
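
Worked sketch: reading a simple random sample from the random number table

The random number table example in this chapter can be traced in a few lines of code. The sketch below reads the table digits in groups of three (because the frame has 850 items, numbered 001 to 850) and ignores any group that does not match an item in the frame. The function name `sample_from_table` is hypothetical, and skipping repeated item numbers (sampling without replacement) is an assumption; the slide does not say which variant it uses.

```python
# The portion of the random number table shown in the chapter.
TABLE_ROWS = [
    "49280 88924 35779 00283 81163 07275",
    "11100 02340 12860 74697 96644 89439",
    "09893 23997 20048 49420 88872 08401",
]


def sample_from_table(rows, frame_size, n):
    """Pick up to n item numbers by reading table digits left to right."""
    # Treat the table as one continuous stream of digits.
    digits = "".join("".join(row.split()) for row in rows)
    width = len(str(frame_size))  # 850 items -> read 3 digits at a time
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        item = int(digits[start:start + width])
        if item < 1 or item > frame_size:
            continue  # e.g. 892 does not exist in an 850-item frame
        if item in chosen:
            continue  # assumption: sample without replacement
        chosen.append(item)
        if len(chosen) == n:
            break
    return chosen


print(sample_from_table(TABLE_ROWS, frame_size=850, n=5))
# → [492, 808, 435, 779, 2]
```

The result matches the chapter's list: 492 and 808 are kept, 892 is ignored, then 435, 779, and 002 follow. If the table runs out of digits before n items are found, the function returns a shorter list.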
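
Worked sketch: systematic sample

The systematic sampling steps in this chapter (choose n, compute k = N/n, pick one item at random from the first group, then take every kth item) can be sketched as follows. The function name `systematic_sample` is hypothetical, and rounding k down when N is not an exact multiple of n is an assumption; the chapter only shows the case N = 40, n = 4, k = 10.

```python
import random


def systematic_sample(frame, n, rng=None):
    """Return n items from frame: a random start, then every kth item."""
    rng = rng or random.Random()
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = N // n                  # group size; assumption: round down
    start = rng.randrange(k)    # random position within the first group
    return [frame[start + i * k] for i in range(n)]


# The chapter's example sizes: N = 40, n = 4, so k = 10.
frame = list(range(1, 41))
sample = systematic_sample(frame, 4, random.Random(0))
print(sample)
```

Whatever the random start, the four selected item numbers are always 10 apart, with the first one falling in items 1 to 10.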
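
Worked sketch: stratified sample with proportional sizes

The stratified sampling description in this chapter (a simple random sample from each stratum, with sample sizes proportional to strata sizes, then combined) might look like the sketch below. The function name `stratified_sample`, the three strata, and their sizes are all made up for illustration. Rounding each stratum's share to the nearest whole number is also an assumption, and with other sizes the rounded shares may not add up to exactly n.

```python
import random


def stratified_sample(strata, n, rng=None):
    """Draw a simple random sample from each stratum, sized in proportion."""
    rng = rng or random.Random()
    total = sum(len(items) for items in strata.values())
    combined = []
    for items in strata.values():
        # Proportional allocation; rounding is an assumption and can
        # make the overall total differ slightly from n.
        size = round(n * len(items) / total)
        # random.Random.sample draws without replacement.
        combined.extend(rng.sample(items, size))
    return combined


# Hypothetical population of 100 items split into three strata.
strata = {
    "A": [("A", i) for i in range(50)],
    "B": [("B", i) for i in range(30)],
    "C": [("C", i) for i in range(20)],
}
sample = stratified_sample(strata, 10, random.Random(1))
counts = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(counts)
# → {'A': 5, 'B': 3, 'C': 2}
```

Every stratum is represented in the combined sample, which is the property the chapter's comparison of sampling methods attributes to stratified sampling.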