1、天文数据分析天文数据分析国家天文台 赵永恒2015年4月数据分析方法 累积概率分布任意分布的随机数常用概率分布数据插值内插和外推插值(scipy.interpolate)线性1d插值(interp1d)1d样条插值(interpolate.splXXX)2d样条插值(bisplrep)RBF(radial basis function)插值/平滑import numpy as npfrom scipy.interpolate import Rbfimport matplotlib.pyplot as pltfrom matplotlib import cm#2-d tests-setup sc
2、attered datax=np.random.rand(100)*4.0-2.0y=np.random.rand(100)*4.0-2.0z=x*np.exp(-x*2-y*2)ti=np.linspace(-2.0,2.0,100)XI,YI=np.meshgrid(ti,ti)#use RBFrbf=Rbf(x,y,z,epsilon=2)ZI=rbf(XI,YI)#plot the resultn=plt.Normalize(-2.,2.)plt.subplot(1,1,1)plt.pcolor(XI,YI,ZI,cmap=cm.jet)plt.scatter(x,y,100,z,cm
3、ap=cm.jet)plt.title(RBF interpolation-multiquadrics)plt.xlim(-2,2)plt.ylim(-2,2)plt.colorbar()plt.show()数据拟合拟合模型 O=T+e最小二乘法import numpy as npimport scipy.linalg as splinimport matplotlib.pyplot as pltc1,c2=5.0,2.0i=np.r_1:11xi=0.1*iyi=c1*np.exp(-xi)+c2*xizi=yi+0.05*np.max(yi)*np.random.randn(len(yi)
4、A=np.c_np.exp(-xi):,np.newaxis,xi:,np.newaxisc,resid,rank,sigma=splin.lstsq(A,zi)print c,resid,rank,sigmaxi2=np.r_0.1:1.0:100jyi2=c0*np.exp(-xi2)+c1*xi2plt.plot(xi,zi,x,xi2,yi2)plt.axis(0,1.1,3.0,5.5)plt.xlabel($x_i$)plt.title(Data fitting with linalg.lstsq)plt.show()非线性拟合 拟合系数有非线性函数关系from numpy imp
5、ort x=arange(0,6e-2,6e-2/30)A,k,theta=10,1.0/3e-2,pi/6y_true=A*sin(2*pi*k*x+theta)y_meas=y_true+2*random.randn(len(x)def residuals(p,y,x):A,k,theta=p err=y-A*sin(2*pi*k*x+theta)return errdef peval(x,p):return p0*sin(2*pi*p1*x+p2)p0=8,1/2.3e-2,pi/3print array(p0)#8.43.4783 1.0472from scipy.optimize i
6、mport leastsqplsq=leastsq(residuals,p0,args=(y_meas,x)print plsq0#10.9437 33.3605 0.5834print array(A,k,theta)#10.33.3333 0.5236import matplotlib.pyplot as pltplt.plot(x,peval(x,plsq0),x,y_meas,o,x,y_true)plt.title(Least-squares fit to noisy data)plt.legend(Fit,Noisy,True)plt.show()Maximum Likelihoo
7、d Estimation(MLE)最大似然估计 一致收敛 渐近于正态分布:有极值等精度高斯分布非等精度高斯分布上下限分布求极值scipy.optimizeA collection of general-purpose optimization routines.fmin -Nelder-Mead Simplex algorithm (uses only function calls)fmin_powell-Powells(modified)level set method(uses only function calls)fmin_cg -Non-linear(Polak-Ribiere)co
8、njugate gradient algorithm (can use function and gradient).fmin_bfgs -Quasi-Newton method(Broydon-Fletcher-Goldfarb-Shanno);(can use function and gradient)fmin_ncg -Line-search Newton Conjugate Gradient(can use function,gradient and Hessian).leastsq -Minimize the sum of squares of M equations in N u
9、nknowns given a starting estimate.Constrained Optimizers(multivariate)fmin_l_bfgs_b-Zhu,Byrd,and Nocedals L-BFGS-B constrained optimizer(if you use this please quote their papers-see help)fmin_tnc -Truncated Newton Code originally written by Stephen Nash and adapted to C by Jean-Sebastien Roy.fmin_c
10、obyla -Constrained Optimization BY Linear ApproximationEM算法 Expect-Maximum算法 EM算法的目标是找出有隐性变量的概率模型的最大可能性解 分为2个步骤,E-step和M-step E-step根据最初假设的模型参数值或者上一步的模型参数计算出隐性变量的后验概率,其实就是隐性变量的期望 M-step根据这个E-step的后验概率重新计算出模型参数,然后再重复这两个步骤,直至目标函数收敛。拟合度好模型精度估计 Bootstrap方法“杰克刀”方法Bootstrap方法 用已知数据建立分布 可有N!组新数据SN1987A中微子事
11、件 9个中微子“杰克刀”方法 每次丢掉1或多个数据KolmogorovSmirnov(K-S)检验高斯分布?AndersonDarling检验 ShapiroWilk检验Bayes统计推断高斯分布问题:问题:直方图Markov chain Monte Carlo(MCMC)https:/ GX 5-1LombScargle 周期图 不等间隔相位图注意:倍频、差频注意:倍频、差频相关分析小波变换课后作业课后作业II1、一组有确定均值和误差的测量值中,即有N个值xi,又有M个上下限值xmini,xmaxi,请写出其似然函数ln L。2、随机写出20个整数(可重复),组成xi。以xi的分布产生出1000个随机数,构成yi。并用KS检验说明xi和yi的分布是否一致。3、设计具有一个或数个频率(周期)的正弦曲线,并加上误差。用Fourier谱分析确定其频率(周期)。