1、1Chapter 9 (Semi)Variogram modelsGiven a geostatistical model,Z(s),its variogram g(h)is formally defined aswhere f(s,u)is the joint probability density function of Z(s)and Z(u).For an intrinsic random field,the variogram can be estimated using the method of moments estimator,as follows:where h is th
2、e distance separating sample locations si and si+h,N(h)is the number of distinct data pairs.In some circumstances,it may be desirable to consider direction in addition to distance.In isotropic case,h should be written as a scalar h,representing magnitude.Note:In literature the terms variogram and se
3、mivariogram are often used interchangeably.By definition g(h)is semivariogram and the variogram is 2g(h).usususushddfZZZZ),()()(21)()(var21)(2g2)(1)()()(21)(hhsshhNiiizzNg2Robust variogram estimatorVariogram provides an important tool for describing how the spatial data are related with distance.As
4、we have seen it is defined in terms of dissimilarity in data values between two locations separated by a distance h.It is noted that the moment estimator given in the previous page is sensitive to outliers in the data.Thus,sometimes robust estimators are used.The widely used robust estimator is give
5、n by Cressie and Hawkins(1980):The motivation behind this estimator is that for a Gaussian process,we haveBased on the Box-Cox transformation,it is found that the fourth-root of 12 is more normally distributed.*Cressie,N.and Hawkins,D.M.1980.Robust estimation of the variogram,I.Journal of the Intern
6、ational Association for Mathematical Geology 12:115-125.)(/494.0457.0)()()(21)(42/1hNhszszhNhiig212)(2)()(ghsZhsZ3Variogram parametersThe main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process.A typical var
7、iogram can be described using three parameters:Nugget effect represents micro-scale variation or measurement error.It is estimated from the empirical variogram at h=0.Range is the distance at which the variogramreaches the plateau,i.e.,the distance(if any)at which data are no longer correlated.Sill
8、is the variance of the random field V(Z),disregarding the spatial structure.It is theplateau where the variogram reaches at therange,g(range).hg(h)02468100.00.40.81.2range h=5nugget =0.2sill =1.0)(hg45 m20200.5 m0.5 mSetting variogram parametersConstruction of a variogram requires consideration of a
9、 few things:An appropriate lag increment for h It defines the distance at which the variogram is calculated.A tolerance for the lag increment It establishes distance bins for the lag increments to accommodate unevenly spaced observations.The number of lags over which the variogram will be calculated
10、 The number of lags in conjunction with the size of the lag increment will define the total distance over which a variogram is calculated.A tolerance for angle It determines how wide the bins will span.Two practical rules:It is recommended that h is chosen as suchthat the number of pairs is greater
11、than 30.2.The distance of reliability for anexperimental variogram is h D/2,where1.D is the maximum distance over the field of data.5Computing variogramsAn experimental variogram is calculated using the R function(in package gstat):variogrm(pH1,locgx+gy,soil87.dat)#gx:list or vector of x-coordinates
12、#gy:list or vector of y-coordinates#pH:list or vector of a response variable 6Covariogram and CorrelogramCovariogram(analogous to covariance)and correlogram(analogous to correlation coefficient)are another two useful methods for measuring spatial correlation.They describe similarity in values betwee
13、n two locations.Covariogram:Its estimator:where is the sample mean.At h=0,(0)is simply the finite variance of the random field.It is straightforward to establish the relationship:The correlogram is defined as)(),(cov)(hszszhCii)(1)()()()(1)(hNiiizhszzszhNhCz.)0()(1)0()()(ChChChg).()0()(hCChg7Propert
14、ies of the moment estimator for variogramIt is unbiased:If Z(s)is ergodic,as n .This means that the moment estimator approaches the true value for the variogram as the size of the region increases.The estimator is consistent.The moment estimator converges in distribution to a normal distribution as
15、n ,i.e.,it is approximately normally distributed for large samples.For Gaussian processes,the approximate variance-covariance matrix of is available(Cressie 1985).*Cressie,N.1985.Fitting variogram models by weighted least squares.Mathematical Geology 17:563-586.)()(hhEgg)()(hhgg)(hg)(hg8Properties o
16、f the moment estimator for covarianceThe covariance:C(h)=cov(Z(si),Z(si+h)The moment estimator:Properties:The moment estimator for the covariance is biased.The bias arises because the covariance function for the residuals,is not the same as the covariance function for the errors,For a second-order s
17、tationary random field,the moment estimator for the covariance is consistent:(h)C(h)almost surely as n .However,the convergence is slower than the varigogram.For a second-order stationary random field,the moment estimator is approximately normally distributed.1.Properties 1 and 2 are the reasons why
18、 the variogram is preferred over the covariance function(and correlogram)in modeling geostatistical data.,)()(zsZsii.)()(iisZs)(1)()()()(1)(hNiiizhszzszhNhC9pHFsrfxpyp01002003004000.00.10.20.30.4ypxp01002003004000.00.050.100.150.200.25Variogram before detrending Variogram after detrending01002003004
19、0050002004006008000100200300400500020040060080010Variogram modelsThere are two reasons we need to fit a model to the empirical variogram:Spatial prediction(kriging)requires estimates of the variogram g(h)for those hs which are not available in the data.The empirical variogram cannot guarantee the va
20、riance of predicted values to be positive.A variogram model can ensure a positive variance.Various parametric variogram models have been used in the literature.The follows are some of the most popular ones.Linear model where c0 is the nugget effect.The linear variogramhas no sill,and so the variance
21、 of the process is infinite.The existence of a linear variogram suggests a trend inthe data,so you should consider fitting a trend to the1.data,modeling the data as a function of the coordinates(trend surface analysis).bhch0)(ghg(h)11Power model-where c0 is the nugget effect.The power variogram has
22、no sill,so the variance of the process is infinite.The linear variogram is a special case of the power model.Similarly,the existence of a linear variogram suggests a trend in the data,so you should consider fitting a trend to the data,modeling the data as a function of the coordinates(trend surface
23、analysis).gbhch0)(hg(h)112Exponential model-where c0 is the nugget effect.The sill is c0+c1.The range for the exponential model is defined to be 3 at which the variogram is of 95%of the sill.Gaussian model-where c0 is the nugget effect.c0+c1 is the sill.The range is 3.This model describes a random f
24、ield that is considered to be too smooth and possesses the peculiar property that Z(s)can be predicted without error for any s on the plane.g/101)(hecchhg(h)hg(h)2)/(101)(ghecch13Logistic model(rational quadratic model)-where c0 is the nugget effect.The sill is c0+a/b.The range for the exponential m
25、odel isSpherical model-where c0 is the nugget effect.The sill is c0+c1.The range for the spherical model can be computed by setting g(h)=0.95(c0+c1).2201)(bhahchg.)(1900bcabbcafor 0 h afor h a310)(2123)(ahahcchg10)(cchghg(h)hg(h)14Parameter estimationThere are commonly two ways to fit a variogram mo
26、del to an empirical variogram.Assume the variogram model g(h;q q),where q q is an unknown parameter vector.For example,for the exponential variogram model q q=(c0,c1,).Ordinary least squares method The OLS estimator for q q is obtained by finding that minimizesThe OLS estimation can be easily implem
27、ented in R using function optim or nls.Initial values for q q are required,these values can be obtained from the empirical variogram.Notes:OLS estimation assumes that-does not depend on the lag distance hi-for all pairs of lag distances hi hi.Both assumptions are violated.The variance and the covari
28、ance depend on the number of pairs of sites used to compute the empirical variogram(see Cressie 1985).1.These violations do not contribute significantly to the bias of the parameter estimation.q q.);()()(2iiihhQgg)(var(ihg0)(),(cov(jihhgg15Weighted least squares estimatorThe WLS estimator for q q is
29、 obtained by finding that minimizeswhereSo that,To note that the WLS estimator is more precise(has a smaller variance)than the OLS estimator.Model selection criteria:Select a model with the smallest residual sum of squares or AIC or log-likelihood ratio,but pay a particularly attention to the goodne
30、ss-of-fit at short distance lags(important for efficient spatial prediction).q q.)(var();()()(2iiiihhhQggg.)(),(2)(var(2iiihNhhgg.1);()()(21);()();(2)()(222iiiiiiiiihhhNhhhhNQggggg16Splus implementationFitvar.s(dt,c,sill,range,model,wt=F)#dt:list of distances and sample variograms obtained from the
31、function variogrm#c:initial estimate of the nugget effect#sill:initial estimate of the sill#range:initial estimate of the range#model:=exp,fits an exponential model to the sample variogram#=gau,fits a Gaussian model to the sample variogram#=sph,fits a spherical model to the sample variogram#=lin,fit
32、s a linear model to the sample variogrampH.variog_variogrm(soil87.dat$gx,soil87.dat$gy,soil87.dat,5,nint=30,dmax=400)pH.exp_fitvar.s(pH.variog,0.12,0.22,300,model=exp)x_seq(0,380,1)lines(expvar(x,pH.exp),col=5)Note:In the function fitvar,wt=F(i.e.,OLS)each sample equally contributes to the objective
33、 function Q(q q),while wt=T(i.e.,WLS)Q(q q)is weighted in proportion to the number of obs.used in computing the sample variance.Thus,locations based on a few obs.will not carry as much weight compared to the one based on a large number of obs.17Fractals The concept of dimension Geometric objects are
34、 traditionally viewed and measured in the Euclidean space,e.g.,line,rectangle and cube,with dimension D=1,2,and 3,respectively.However,many phenomena in nature(e.g.,clouds,snow flakes,tree architecture)cannot be satisfactorily described using Euclidean dimensions.To describe the irregularity of such
35、 geometric objects(irregular geometric objectsare called fractals),we need to generalize theconcept of Euclidean dimension.The Hausdorff Dimension If we take an objectresiding in Euclidean dimension D and reduce itslinear size by 1/r in each spatial direction,thenumber of replicas of the original ob
36、ject wouldincrease to N=rD times.D=log(N)/log(r),is theHausdorff dimension,named after the Germanmathematician,Felix Hausdorff.The importantpoint is that in fractal dimension D need not be aninteger,it could be a fraction.It has proved usefulfor describing natural objects.D=1D=2D=3r=1r=2r=3N=1N=1N=1
37、N=4N=8N=2N=3N=9N=2718Examples of geometric objects with non-integer dimensions1.Cantor set(dust)Begin with a line of length 1,called initiator.Then remove the middle third of the line,this step is called the generator,because it specifies a rule that is used to generate a new form.The generator coul
38、d iterativelyinfinitely be applied to the remaining segmentsso that to generate a set of“dust”.The dustsare obviously neither points nor lines,but laysomewhere between them,thus has a dimensionbetween 0 and 1:D=log(N)/log(r)=log(2)/log(3)=0.6309.2.Koch curve D=log(4)/log(3)=1.2618.InitiatorGenerator
39、3.Sierpinski triangle D=log(3)/log(2)=1.5850 19Self-similarity and smoothnessAn important property of a fractal is self-similarity,which refers to an infinite nesting of structure on all scales.It means that a substructure resembles the form of its superstructure,e.g.,leaf shape resembles branch sha
40、pe,whereas branch resembles tree shape.Another important way to understand fractal dimension is that D is a smoothness measure of a spatial process/object(e.g.,surface smoothness/roughness).When D=1(a line),or=2(a plane),the objects are smooth.For those objects whose Ds are between 1 or 2(e.g.,Koch
41、curve or Sierpinski triangle),their smoothness varies between a line and a plane.Study surface growth and smoothness is increasingly becoming an important physic and biological subjects.It has much to do with fractal geometry and spatial statistics.An example is a technology,called molecular beam ep
42、itaxy,used to manufacture thin films for computer chips and other semiconductor devices.It is a process to deposit silicon molecules to create a very smooth si surface.*Manderlbrot,B.B.1982.The fractal geometry of Nature.Freeman,San Francisco.*Meakin,P.2019.Fractals,scaling and growth far from equil
43、ibrium.Cambridge U.Press.*Barabsi,A.-L.and Stanley,H.E.2019.Fractal concepts in surface growth.Cambridge U.Press.20Calculating fractal dimension from a variogramBecause the smoothness of a spatial process is directly related to the smoothness of the covariance function at h 0,a fractal dimension can
44、 be calculated from a variogram.Ifthen we say the process Z(s)is continuous.For a continuous covariance,we haveorWhere o(h)is a term of smaller order than h for h at neighborhood 0.The fractal dimension of the surface is D=2 /2.can be estimated from an empirical variogram as follows:log(g(h)=log(b)+log(h).*Davies,S.&Hall,P.2019.Fractal analysis of surface roughness by using spatial data(with Discussion).JRSS,B.61:3-37.*Palmer,M.W.1988.Fractal geometry:a tool for describing spatial patterns of plant communities.Vegetation 75:91-102.,)()0(bhhCC)()0()(bhohChCbghh)(