[Title-slide figures: "Simple Use of Color In a Plot"; "Just a Whisper of a Label"; "sin and cos" (x-axis: Phase Angle); "Edgar Anderson's Iris Data" (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width); "Math can be beautiful"]

R: Statistics? Programme? And Who Are You?
An ABC introduction to R
Presented by Guohui Ding, R&D, SIBS, CAS
For Fudan University

Main Topics Today
What is R?
How to administer R?
How does R work?
How to apply R to statistical problems?
How to program your own R functions?
[Figure: "Maunga Whau Volcano"; "A Topographic Map of Maunga Whau"]

What is R?
A brief history of R

The legend of R
R started in the early 1990s as a project by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, intended to provide a statistical environment in their teaching lab. The lab had Macintosh computers, for which no suitable commercial environment was available.
[Photos: Robert Gentleman, Ross Ihaka]

R's Parents (1)
The S language
S: an interactive environment for data analysis developed at Bell Laboratories since 1976.
Exclusively licensed by AT&T/Lucent to Insightful Corporation, Seattle WA. Product name: "S-plus".
You can learn more from: http://cm.bell-
My father is S, my mother is Scheme, but why is my name "R"?

R's Parents (2)
The Scheme language
Scheme is a statically scoped and properly tail-recursive dialect of the Lisp programming language invented by Guy Lewis Steele Jr. and Gerald Jay Sussman.
Learn more: http://swiss.csail.mit.edu/projects/scheme/
Scheme's underlying semantics + S syntax = R

"We have named our language R in part to acknowledge the influence of S and in part to celebrate our own efforts."
- Ihaka R. & Gentleman R., 1996

R Now
Since mid-1997 there has been a core group who can modify the R source code CVS archive.
The R package system
CRAN (the Comprehensive R Archive Network)
http://www.r-project.org

The characteristics of R
R is "GNU S": a language and environment for data manipulation, calculation and graphical display.
R is Free Software (or open source software). (Here, "free" refers to freedom, not price, although R is free in that sense as well.)
The core of R is an interpreted computer language.
A mosaic of procedure-based programming and object-oriented programming.
Good interfaces to procedures written in C, C++, FORTRAN and other languages.
A flexible data exchange mechanism for accessing relational databases: ODBC, PostgreSQL, MySQL and so on.

A negotiation between a thief and a robber

R and Statistics
Most packages deal with statistics and data analysis.
Powerful statistical graphics.
Works well together with other statistical software.
Most R users are statistical experts. You can learn modern analysis methods from them by email.
When you come across a problem that nobody has handled before, you can still do it yourself in R.

Install and administer R
Focus on Windows (MS)
[Figure: volcano data; axes: row, column, volcano]

How do I get R?
The informational web site: http://www.r-project.org/
CRAN, the Comprehensive R Archive Network. The primary site is http://cran.r-project.org/. Mirror sites are available for many countries.
CRAN sites have binary distributions for Windows 95, 98, ME, NT4, 2000 and XP on Intel, for the Macintosh (System 8.6 to 9.1 and MacOS X), and for several Linux distributions.
New releases occur frequently, about every 3 months. Be prepared to re-install frequently.
You can also get it from your friends, teachers, etc.
Download it! It is about 20.6 MB in size.

Using Precompiled Binary Distributions
Installing R
Double-click "rw1091.exe" with your mouse.
That is all. You can install it like any other standard MS Windows software.

R Console/RGui in Windows (MS)
[Screenshot labels: Command box, Graphics box, Menu, Icons]

Several concepts in administering R
Workspace: xxx.RData
History: xxx.Rhistory
Package
Object
Session
Console
[Screenshot labels: Run your R code; Load/save workspace; Load/save history; Change your working directory]

Add a new package
Commands:
library(xxx)   attach a package from the library
detach(package:xxx)   detach a package
Everything can also be done in the GUI (except detach()).
[Menu labels: Load a local package; Install packages from the internet or from local files; Update the local packages from the internet]

Packages in R Environment
Basic packages: package:methods, package:stats, package:graphics, package:utils, package:base
Recommended packages: grid; lattice; e1071
Contributed packages (more than 366 packages nowadays)
You can see which packages are loaded now with the command search().

Don't lose your way!
Three useful system commands:
getwd()   Get Working Directory
setwd()   Set Working Directory
list.files()   List the Files in a Directory/Folder

Show the Demonstrations of the Packages/Functions
Commands:
demo()   Demonstrations of R Functionality
example()   Run an Examples Section from the Online Help

Getting Help
Several commands:
help.start()
help() or ?
help.search()
apropos()
Internet searching: I like it very much. It seems omnipotent.

Quit R
Command:
q()   Terminate an R Session

How does R work?
Basic R Structure and data manipulation
[Figure: clusplot(clara(x = xclara, k = 3, keep.data = FALSE)); axes: Component 1, Component 2]
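The administration commands listed earlier (library(), detach(), search(), getwd(), setwd(), list.files(), apropos()) can be tried in one short session. This is only a minimal sketch: the file name demo.txt and the choice of the tools package are assumptions made for illustration, and the help commands are shown as comments because they open a viewer.

```r
# Working directory: remember it, move to a scratch folder, look around.
old_wd <- getwd()                      # where am I?
setwd(tempdir())                       # tempdir() always exists in a session
created <- file.create("demo.txt")     # hypothetical file, for illustration only
files <- list.files()                  # list files in the working directory
print("demo.txt" %in% files)

# Packages: attach one, check the search path, detach it again.
library(tools)                         # 'tools' ships with every R installation
attached_after_library <- "package:tools" %in% search()
detach("package:tools")                # note the form: "package:<name>"
attached_after_detach <- "package:tools" %in% search()

# Finding things by name.
hits <- apropos("^read\\.")            # objects whose names start with "read."
print(head(hits))

# Help (interactive, so not run here):
# help(mean); ?mean; help.search("regression"); example(mean); demo()

unlink("demo.txt")                     # clean up the illustration file
setwd(old_wd)                          # go back to the original directory
```

Restoring the working directory at the end keeps the rest of the session unaffected.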
[Figure caption: These two components explain 100% of the point variability.]

Basic R working flow (Object orientation)
[Diagram from: R for Beginners. Emmanuel Paradis]

Object orientation
Object: a collection of atomic variables and/or other objects that belong together.
Parlance:
class: the "abstract" definition of it
object: a concrete instance
method: another word for function
slot: a component of an object

Types of Data in R
The basic data object is a vector of elements of type:
numeric: numbers, either floating point or integer
character: each element is a character string
logical: each element is TRUE or FALSE
list: elements can be any type of object, including other lists
Components of the S language, such as functions, are also vectors.
Any vector can include the missing data marker NA as an element.
All vectors have a length and a mode. The functions length and mode return this information, as does the str function.
A structure consists of a data object plus additional information. Matrices (or arrays, in general) and time series are examples of structures.

Operators

Vectors, Matrices and Arrays
Commands:
array(data = NA, dim = length(data), dimnames = NULL)
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

Lists
List vs. Vector
list: an ordered collection of data of arbitrary types.
vector: an ordered collection of data of the same type.
Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.

Factors
Factors: classification variables.
If the levels of a factor are numeric (e.g. the treatments are labelled "1", "2", and "3") it is important to ensure that the data are actually stored as a factor and not as numeric data. Always check this by using summary.

Data frames
data frame: represents the typical data table that researchers come up with, like a spreadsheet. It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. (It is actually a list.)

Subsetting
Individual elements of a vector, matrix, array or data frame are accessed with "[ ]" by specifying their index or their name.

Using R on Windows (MS)
Basic statistical analysis by R
[Figure: "Death Rates in Virginia", with faked 95 percent error bars; age groups 50-54 to 70-74; Rural Male, Rural Female, Urban Male, Urban Female]
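The data structures described earlier (vectors, matrices, lists, factors, data frames) and their subsetting with "[ ]" can be tried out as follows. This is a minimal sketch with made-up values; all object names (v, m, person, treat, df) are hypothetical.

```r
# Vector: every element has the same type; NA marks a missing value.
v <- c(1.5, 2, NA)
v_len  <- length(v)                    # 3
v_mode <- mode(v)                      # "numeric"
str(v)                                 # compact description of the object

# Matrix: a vector plus dimension information.
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
by_name  <- m["r2", "b"]               # access by name  -> 5
by_index <- m[2, 2]                    # access by index -> 5

# List: elements of arbitrary types, reachable by name or by index.
person <- list(name = "Ann", scores = c(90, 85))
first_by_name  <- person$name
first_by_index <- person[[1]]

# Factor: numeric-looking labels should be stored as a factor.
treat_num <- c(1, 2, 3, 1, 2, 3)
treat <- factor(treat_num)
print(summary(treat_num))              # numeric summary (min, mean, ...)
print(summary(treat))                  # counts per level: what we want

# Data frame: columns of different types, each column of one type.
df <- data.frame(id = 1:3,
                 group = c("a", "b", "a"),
                 ok = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
cell    <- df[2, "group"]              # one cell        -> "b"
ok_rows <- df[df$ok, ]                 # rows where ok is TRUE
```

Comparing the two summary() outputs shows why the check recommended under Factors matters: the numeric version reports a mean of the treatment labels, which is meaningless.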
[Figure annotations: Mean 60.35, Mean 40.4, Mean 25.88, Mean 16.93, Mean 11.05]

Data Input
From the keyboard, one by one: c(); scan()
From a file: read.table(); read.csv(); read.csv2(); read.dta(); read.spss() (the last two are in package foreign)
By a spreadsheet: data.entry(); edit(); fix()

Data Edit
Commands: edit(); fix()
Tip: edit() can invoke a notepad in the RGui!

Data Description
Commands: summary(); mean(); sd(); hist(); boxplot()

Probability Distribution
Four useful prefixes for the probability distribution functions:
dxxx for the density
pxxx for the CDF
qxxx for the quantile function
rxxx for simulation (random deviates)
The random deviates are different on every run, because the seed is set by the system. You can set the seed yourself with set.seed().

Statistical Inference
Commands:
qxxx() for the quantile function
t.test()
wilcox.test() (stats)
kruskal.test() (stats)
var.test(); shapiro.test(); qqnorm(); qqline()

Analysis of Variance and Regression Analysis
Commands: anova(); lm()

Experiment Design
Commands: sample(); power.t.test()

Save Object/Data
Every R object can be stored into and restored from a file with the commands "save" and "load".
save(x, file="x.Rdata")
load("x.Rdata")
Importing and exporting data with rectangular tables in the form of tab-delimited text files:
write.table(x, file="x.txt", sep="\t")

Graphics with R

A Friendly R Environment - Rcmdr
If you don't like a command line environment, package Rcmdr may be a good choice!
[Figure: scatterplot panels; Cube Root Ozone (cube root ppb), Wind Speed (mph), Temperature (F), radiation]
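The analysis commands listed earlier (the d/p/q/r prefixes, set.seed(), t.test(), lm(), anova(), save(), load(), write.table()) can be chained into one short session. This is a sketch on simulated data only: the seed, the sample sizes, the variable names and the temporary file names are all assumptions chosen for illustration, not results from the talk.

```r
set.seed(42)                           # arbitrary seed, only for reproducibility

# The four prefixes, shown for the normal distribution.
dens <- dnorm(0)                       # density at 0, about 0.399
prob <- pnorm(1.96)                    # CDF at 1.96, about 0.975
quan <- qnorm(0.975)                   # 97.5% quantile, about 1.96
x <- rnorm(30, mean = 10, sd = 2)      # 30 random deviates

# Description.
print(summary(x))
print(c(mean = mean(x), sd = sd(x)))

# Inference: two simulated groups compared with a t test.
y <- rnorm(30, mean = 11, sd = 2)
tt <- t.test(x, y)
print(tt$p.value)

# Regression and its ANOVA table on simulated dose/response data.
d <- data.frame(dose = 1:20)
d$resp <- 3 + 0.5 * d$dose + rnorm(20, sd = 0.3)
fit <- lm(resp ~ dose, data = d)
print(coef(fit))
print(anova(fit))

# Save an object, remove it, and restore it.
rdata_file <- file.path(tempdir(), "d.RData")
save(d, file = rdata_file)
rm(d)
load(rdata_file)                       # 'd' is back in the workspace

# Export to a tab-delimited text file and read it back.
txt_file <- file.path(tempdir(), "d.txt")
write.table(d, file = txt_file, sep = "\t", row.names = FALSE)
d2 <- read.table(txt_file, header = TRUE, sep = "\t")
```

Calling set.seed() with the same value before rnorm() reproduces the same deviates; without it, each run gives different numbers.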
R programming (.R)
Program your own R code
[Figure: "sines"]

Control Flow
if(cond) expr
if(cond) cons.expr else alt.expr
for(var in seq) expr
while(cond) expr
repeat expr
break
next

Loops
The main loop construct in R is for. The commonest use, as in C and other languages, is to count from 1 to n.
for (i in 1:n) {
  # do something
}

Leaving loops
The break and next commands allow the flow of a loop to be altered:
break jumps out of the loop
next jumps to the next iteration of the loop
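The control-flow constructs above can be seen working together in a small example. This is a minimal sketch; the variable names and the numbers are made up for illustration.

```r
# for with next and break: add up the odd numbers, stopping after 7.
n <- 10
total <- 0
for (i in 1:n) {
  if (i %% 2 == 0) next                # even: skip to the next iteration
  if (i > 7) break                     # past 7: jump out of the loop
  total <- total + i                   # adds 1, 3, 5 and 7
}
print(total)                           # 16

# while: smallest k with 2^k >= 100.
k <- 0
while (2^k < 100) k <- k + 1
print(k)                               # 7, since 2^7 = 128

# repeat needs an explicit break, otherwise it never stops.
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}

# if / else choosing between two expressions.
label <- if (total > 10) "big" else "small"
```

In the for loop, next skips the even values and break stops at i = 9, so the body never runs for 8, 9 or 10.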
Avoiding Iteration
The canonical bad R program looks like this:
# multiply two vectors
d <- numeric(n)
for (i in 1:n) d[i] <- a[i] * b[i]
# compute the inner product
s <- 0
for (i in 1:n) s <- s + d[i]
The right way to do this is
s <- sum(a * b)
See also: apply(); lapply(); sapply()

Write R function
A function definition looks like
median <- function(x, na.rm = FALSE) {
  # ... lots of code ...
  # a return value
}

More
Packages
Objects and methods
Debugging and optimisation
Connecting to other packages
Interface to other programming languages or databases

R+? +R!

Some Resources
A Course (the PPT is shown with the R Development Core Group)
http://faculty.washington.edu/tlumley/Rcourse/
A Paper (for citing R in a publication)
Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299-314.
Two URLs
http://www.r-project.org
http://www.ats.ucla.edu/stat/
Several Books
Using R for Data Analysis and Graphics: An Introduction. J.H. Maindonald
An Introduction to R. The R Development Core Team
simpleR: Using R for Introductory Statistics. John Verzani
R for Beginners. Emmanuel Paradis
The R Reference Manual: Base Package. The R Development Core Team

Acknowledgements
Qi Liu, PhD
Prof. Naiqing Zhao
Prof. Gang Pei
Prof. Yixue Li
Everyone Here

Any Questions?
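As a follow-up to the "Avoiding Iteration" and "Write R function" slides, the sketch below compares the loop version of the inner product with the vectorised one and wraps the latter in a small function. The function name inner_product and the sample vectors are hypothetical, chosen only for illustration.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
n <- length(a)

# The loop version: works, but is slow and verbose in R.
d <- numeric(n)
for (i in seq_len(n)) d[i] <- a[i] * b[i]
s_loop <- 0
for (i in seq_len(n)) s_loop <- s_loop + d[i]

# The vectorised version: one line, no explicit loop.
s_vec <- sum(a * b)                    # 1*4 + 2*5 + 3*6 = 32

# A small function with a default argument, in the style of the
# median() skeleton; the value of the last expression is returned.
inner_product <- function(x, y, na.rm = FALSE) {
  stopifnot(length(x) == length(y))    # refuse vectors of unequal length
  sum(x * y, na.rm = na.rm)
}
s_fun <- inner_product(a, b)

# sapply() and apply() replace many loops over elements or rows.
squares  <- sapply(1:4, function(i) i^2)   # 1 4 9 16
m        <- matrix(1:6, nrow = 2)          # filled column by column
row_sums <- apply(m, 1, sum)               # 9 12
```

The function uses a new name on purpose: defining your own median() would mask the one in package stats.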