Understanding Social Media with Machine Learning
Xiaojin Zhu, jerryzhu@cs.wisc.edu
Department of Computer Sciences, University of Wisconsin-Madison, USA
CCF/ADL Beijing 2013

Outline
1. Spatio-Temporal Signal Recovery from Social Media
2. Machine Learning Basics: Probability; Statistical Estimation; Decision Theory; Graphical Models; Regularization; Stochastic Processes
3. Socioscope: A Probabilistic Model for Social Media
4. Case Study: Roadkill

Spatio-Temporal Signal Recovery from Social Media

Spatio-temporal Signal: When, Where, How Much
- Direct instrumental sensing is difficult and expensive.

Humans as Sensors
- Not "hot trend" discovery: we know what event we want to monitor.
- Not natural language processing for social media: we are given a reliable text classifier for "hit".
- Our task: precisely estimating a spatio-temporal intensity function f_st of a pre-defined target phenomenon.

Challenges of Using Humans as Sensors
- A keyword doesn't always mean the event:
  - "I was just told I look like dead crow."
  - "Don't blame me if one day I treat you like a dead crow."
- Human sensors aren't under our control.
- Location stamps may be erroneous or missing:
  - 3% have GPS coordinates: (-98.24, 23.22)
  - 47% have a valid user profile location: "Bristol, UK", "New York"
  - 50% don't have valid location information: "Hogwarts", "In the traffic. blah", "Sitting On A Taco"

Problem Definition
- Input: a list of time and location stamps of the target posts.
- Output: f_st, the intensity of the target phenomenon at location s (e.g., New York) and time t (e.g., 0-1am).

Why Simple Estimation is Bad
- Simple estimate: f_st = x_st, the count of target posts in bin (s, t).
- Justification: this is the MLE of the model x ~ Poisson(f).
- However:
  - Population bias: even if f_st = f_s't', more users in (s, t) means x_st > x_s't'.
  - Imprecise location: posts without a location stamp, noisy user profile locations.
  - Zero/low counts: if we don't see tweets from Antarctica, are there no penguins there?

Machine Learning Basics
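The claim that the raw count is the MLE under x ~ Poisson(f) can be checked numerically; a minimal sketch (the observed count x = 7 and the search grid are illustrative):

```python
import numpy as np

# Poisson log-likelihood of rate f given one observed count x,
# dropping the constant -log(x!) term.
def poisson_loglik(f, x):
    return x * np.log(f) - f

x = 7                                 # observed count in one (s, t) bin
grid = np.linspace(0.1, 20.0, 2000)   # candidate rates f
f_mle = grid[np.argmax(poisson_loglik(grid, x))]

print(f_mle)  # close to x = 7
```

The maximizer of x log f - f is exactly f = x, which is why the naive estimate equals the count; the slide's point is that this MLE is only as good as the Poisson(f) model, which ignores population bias and location noise.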
Probability

Probability
- The probability of a discrete random variable A taking the value a is P(A = a) ∈ [0, 1]. Sometimes written as P(a) when there is no danger of confusion.
- Normalization: Σ_a P(A = a) = 1.
- Joint probability P(A = a, B = b) = P(a, b): the two events both happen at the same time.
- Marginalization: P(A = a) = Σ_b P(a, b).
- The product rule: P(a, b) = P(a) P(b|a) = P(b) P(a|b).

Bayes Rule
- Bayes rule: P(a|b) = P(b|a) P(a) / P(b). In general, P(a|b, C) = P(b|a, C) P(a|C) / P(b|C), where C can be one or more random variables.
- Bayesian approach: when θ is the model parameter and D is the observed data,
  p(θ|D) = p(D|θ) p(θ) / p(D)
  where
  - p(θ) is the prior,
  - p(D|θ) is the likelihood function (of θ; not normalized: ∫ p(D|θ) dθ ≠ 1),
  - p(D) = ∫ p(D|θ) p(θ) dθ is the evidence,
  - p(θ|D) is the posterior.
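Bayes rule on a small discrete example in the spirit of the keyword-detection problem (all numbers hypothetical): prior P(event) = 0.01, likelihoods P(keyword | event) = 0.9 and P(keyword | no event) = 0.08.

```python
# Posterior P(event | keyword) via Bayes rule:
# P(a|b) = P(b|a) P(a) / P(b), with P(b) obtained by marginalization.
p_event = 0.01               # prior P(a)
p_kw_given_event = 0.90      # likelihood P(b|a)
p_kw_given_noevent = 0.08    # P(b | not a)

p_kw = p_kw_given_event * p_event + p_kw_given_noevent * (1 - p_event)
posterior = p_kw_given_event * p_event / p_kw

print(round(posterior, 4))  # ~0.102
```

Even with a fairly accurate classifier, a rare event stays improbable after one "hit" — the posterior is pulled strongly toward the prior.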
Independence
- The product rule simplifies to P(a, b) = P(a) P(b) iff A and B are independent.
- Equivalently, P(a|b) = P(a) and P(b|a) = P(b).

Continuous Random Variables
- Probability density p(x): P(x1 ≤ X ≤ x2) = ∫_{x1}^{x2} p(x) dx.
- p(x) > 1 is possible! The density integrates to 1: ∫_{-∞}^{∞} p(x) dx = 1.
- Marginalization: p(x) = ∫ p(x, y) dy.

Expectation and Variance
- The expectation ("mean" or "average") of a function f under the probability distribution P is
  E_P[f] = Σ_a P(a) f(a)   (discrete),   E_p[f] = ∫ p(x) f(x) dx   (continuous).
- In particular, if f(x) = x, this is the mean of the random variable x.
- The variance of f is Var(f) = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - E[f(x)]^2.
- The standard deviation is std(f) = sqrt(Var(f)).

Multivariate Statistics
- When x, y are vectors, E[x] is the mean vector.
- Cov(x, y) is the covariance matrix with (i, j)-th entry Cov(x_i, y_j):
  Cov(x, y) = E_{x,y}[(x - E[x])(y - E[y])^T] = E_{x,y}[x y^T] - E[x] E[y]^T.

Multinomial Distribution (d-sided die)
- f(x) = n! / (x_1! ··· x_d!) Π_{k=1}^d p_k^{x_k}.

Marginals and Conditionals of a Multivariate Gaussian
- Let (x, y) be jointly Gaussian with mean (μ_x, μ_y) and covariance blocks A (for x), B (for y), and cross-covariance C. Then:
  - Marginal: x ~ N(μ_x, A)
  - Conditional: y|x ~ N(μ_y + C^T A^{-1} (x - μ_x), B - C^T A^{-1} C)
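The moment identities Var(f) = E[f^2] - (E f)^2 and Cov(x, y) = E[x y^T] - E[x] E[y]^T hold exactly for empirical distributions as well, so they can be checked numerically (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Var(f) = E[f^2] - (E[f])^2, with f(x) = x
x = rng.normal(size=1000)
var_direct = np.mean((x - x.mean()) ** 2)
var_moment = np.mean(x ** 2) - x.mean() ** 2
assert np.isclose(var_direct, var_moment)

# Cov(x, x) = E[x x^T] - E[x] E[x]^T for a 2-D sample
X = rng.normal(size=(1000, 2))
mu = X.mean(axis=0)
cov_direct = (X - mu).T @ (X - mu) / len(X)
cov_moment = X.T @ X / len(X) - np.outer(mu, mu)
assert np.allclose(cov_direct, cov_moment)
```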
More Continuous Distributions
- Gamma function: Γ(α) = ∫_0^∞ x^{α-1} e^{-x} dx, with α > 0.
  - Generalizes the factorial: Γ(n) = (n-1)! when n is a positive integer.
  - Γ(α + 1) = α Γ(α) for α > 0.
- Gamma distribution with shape parameter α > 0 and scale parameter θ > 0:
  f(x) = 1 / (Γ(α) θ^α) · x^{α-1} e^{-x/θ},   x ≥ 0.
  It is the conjugate prior for the Poisson rate.

Statistical Estimation

Parametric Models
- A statistical model H is a set of distributions. In machine learning, we call H the hypothesis space.
- A parametric model can be parametrized by a finite number of parameters: f(x) ≡ f(x; θ) with parameter θ ∈ R^d:
  H = { f(x; θ) : θ ∈ Θ ⊆ R^d }, where Θ is the parameter space.
- We denote the expectation E_θ(g) = ∫ g(x) f(x; θ) dx. E_θ means E_{x ~ f(x;θ)}, not an expectation over different θ's.
- All (parametric) models are wrong. Some are more useful than others.

Nonparametric Models
- A nonparametric model cannot be parametrized by a fixed number of parameters; model complexity grows indefinitely with sample size.
- Example: H = { P : Var_P(X) < ∞ }.
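The gamma-function identities above can be checked with Python's standard library (math.gamma); the value Γ(1/2) = √π is an additional well-known fact, not from the slides:

```python
import math

# Gamma(n) = (n-1)! for positive integers n
assert math.isclose(math.gamma(5), math.factorial(4))  # Gamma(5) = 4! = 24

# Gamma(alpha + 1) = alpha * Gamma(alpha) for alpha > 0
alpha = 2.5
assert math.isclose(math.gamma(alpha + 1), alpha * math.gamma(alpha))

# A well-known non-integer value: Gamma(1/2) = sqrt(pi)
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))
```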
Markov Random Fields: Multivariate Gaussian
- In a chain MRF, an edge potential with parameter a > 0 favors homogeneous chains.
- For a multivariate Gaussian, x_i and x_j are conditionally independent given all other variables if and only if Ω_ij = 0, where Ω = Σ^{-1} is the precision matrix.
- When Ω_ij ≠ 0, there is an edge between x_i and x_j.

Conditional Independence in Markov Random Fields
- Two groups of variables A, B are conditionally independent given another group C if A and B become disconnected by removing C and all edges involving C.
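The Gaussian MRF criterion (edge iff a nonzero entry in the precision matrix) can be illustrated on a 4-node chain x1 - x2 - x3 - x4 with a hand-picked tridiagonal precision matrix (values illustrative):

```python
import numpy as np

# Tridiagonal precision matrix Omega = inv(Sigma) for a chain MRF:
# a zero at (i, j) <=> x_i, x_j conditionally independent given the rest.
Omega = np.array([[ 2., -1.,  0.,  0.],
                  [-1.,  2., -1.,  0.],
                  [ 0., -1.,  2., -1.],
                  [ 0.,  0., -1.,  2.]])
Sigma = np.linalg.inv(Omega)

# x1 and x3 are conditionally independent given the other variables...
assert Omega[0, 2] == 0.0
# ...yet marginally dependent: the covariance entry is nonzero.
assert abs(Sigma[0, 2]) > 1e-6
```

This is the standard contrast between the precision matrix (sparse, encoding the graph) and the covariance matrix (generally dense).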
Regularization

Regularization for Maximum Likelihood
- The MLE can overfit. Regularized likelihood:
  max_θ  ℓ_n(θ) - λ Ω(θ)
- Ω(θ) is the regularizer, for example Ω(θ) = ||θ||^2.
- Coincides with the MAP estimate under the prior distribution p(θ) ∝ exp(-λ Ω(θ)).

Graph-based Regularization
- Nodes: x_1, ..., x_n; f = (f(x_1), ..., f(x_n))^T.
- Edges: similarity weights computed from features, e.g.,
  - k-nearest-neighbor graph, unweighted (0/1 weights)
  - fully connected graph, weight decaying with distance: w_ij = exp(-||x_i - x_j||^2 / σ^2)
  - ε-radius graph
- Assumption: nodes connected by a heavy edge tend to have the same value.

Graph Energy
- f incurs the energy Σ_{i~j} w_ij (f(x_i) - f(x_j))^2.
- A smooth f has small energy; a constant f has zero energy.

An Electric Network Interpretation
- Edges are resistors with conductance w_ij (resistance R_ij = 1/w_ij).
- Nodes are clamped at the voltages specified by f (e.g., +1 volt and 0).
- The energy is the heat generated by the network in unit time.

The Graph Laplacian
- We can express the energy of f in closed form using the graph Laplacian.
- Diagonal degree matrix D: D_ii = Σ_j W_ij.
- Graph Laplacian matrix Δ = D - W; symmetric and positive semi-definite.
- The energy: Σ_{i~j} w_ij (f(x_i) - f(x_j))^2 = f^T Δ f.
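A quick numerical check of the identity Σ_{i~j} w_ij (f(x_i) - f(x_j))^2 = f^T Δ f (the weight matrix and f are arbitrary illustrative values):

```python
import numpy as np

# Symmetric weight matrix of a small graph (w_ij = 0 means no edge)
W = np.array([[0., 2., 1., 0.],
              [2., 0., 0., 3.],
              [1., 0., 0., 1.],
              [0., 3., 1., 0.]])
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian

f = np.array([0.5, -1.0, 2.0, 0.0])

# Energy summed over unordered edge pairs i < j
energy = sum(W[i, j] * (f[i] - f[j]) ** 2
             for i in range(4) for j in range(i + 1, 4))
assert np.isclose(energy, f @ L @ f)

# A constant f has zero energy
c = np.ones(4)
assert np.isclose(c @ L @ c, 0.0)
```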
Graph Laplacian as a Regularizer
- Regression problem with training data x_i ∈ R^d, y_i ∈ R, i = 1, ..., n.
- Gaussian likelihood: y_i = f(x_i) + ε_i, where ε_i ~ N(0, σ^2); allow f(x_i) to differ from y_i, but penalize the difference with the Gaussian log likelihood.
- Regularizer: Ω(f) = f^T Δ f.
  min_f  Σ_{i=1}^n (f(x_i) - y_i)^2 + λ f^T Δ f
- Equivalent to the MAP estimate with the Gaussian random field prior p(f) = (1/Z) exp(-λ f^T Δ f).

Graph Spectrum and Regularization
- Assumption: labels are "smooth" on the graph, as characterized by the graph spectrum, i.e., the eigenvalues/eigenvectors (λ_i, φ_i)_{i=1}^n of the Laplacian L:
  L = Σ_{i=1}^n λ_i φ_i φ_i^T
- A graph has k connected components if and only if λ_1 = ... = λ_k = 0; the corresponding eigenvectors are constant on individual connected components, and zero elsewhere.
- Any f on the graph can be represented as f = Σ_{i=1}^n a_i φ_i, and the graph regularizer is f^T L f = Σ_{i=1}^n a_i^2 λ_i.
- A smooth function f uses the smooth basis vectors (those with small λ_i).

Example Graph Spectrum
[Figure: an example graph and the eigenvalues/eigenvectors of its Laplacian, sorted: λ1 = λ2 = 0.00, λ3 = 0.04, λ4 = 0.17, λ5 = λ6 = 0.38, λ7 = 0.66, λ8 = 1.00, λ9 = λ10 = 1.38, λ11 = 1.79, λ12 = 2.21, λ13 = λ14 = 2.62, λ15 = 3.00, λ16 = 3.34, λ17 = λ18 = 3.62, λ19 = 3.83, λ20 = 3.96]
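The spectral facts above — k connected components give exactly k zero eigenvalues, with null-space eigenvectors constant on each component — can be checked on a small graph with two components (a triangle plus a disconnected edge; an illustrative sketch):

```python
import numpy as np

# Two components: triangle {0, 1, 2} and edge {3, 4}
W = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order

# Exactly two (near-)zero eigenvalues: one per connected component
assert np.allclose(vals[:2], 0.0, atol=1e-10)
assert vals[2] > 1e-8

# Null-space eigenvectors are constant on each component
assert np.allclose(vecs[:3, 0], vecs[0, 0])
assert np.allclose(vecs[3:, 0], vecs[3, 0])
```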
Stochastic Processes

Stochastic Process
- An infinite collection of random variables indexed by a set {x}.
- x ∈ R for "time"; more generally, x ∈ R^d (e.g., space and time).

Gaussian Process
- A stochastic process where any finite number of random variables have a joint Gaussian distribution.
- Index x ∈ R^d; f(x) ∈ R is the random variable indexed by x.
- Mean function m(x) = E[f(x)], e.g., m(x) = 0 for all x.
- Covariance function (kernel) k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))], e.g., the RBF kernel k(x, x') = exp(-||x - x'||^2 / (2σ^2)).
- Gaussian process (GP): f(x) ~ GP(m(x), k(x, x')).
- A draw from a GP is a function f over R^d.

Gaussian Process vs. Gaussian Distribution
- For any x_1, ..., x_n: (f(x_1), ..., f(x_n)) ~ N(μ, Σ).
- Mean vector μ = (m(x_1), ..., m(x_n)).
- Covariance matrix Σ = [k(x_i, x_j)]_{i,j=1}^n.

Poisson Process
- Rate function λ(x) ≥ 0:
  - homogeneous Poisson process: λ(x) = constant
  - inhomogeneous Poisson process: otherwise
- Expected number of events in a region T ⊆ R^d: λ_T = ∫_T λ(x) dx.
- The number of events in T follows a Poisson distribution:
  P(N(T) = k) = e^{-λ_T} λ_T^k / k!

Cox Process
- Doubly stochastic process: the rate λ(x) is itself sampled from another stochastic process.
- E.g., the sigmoidal Gaussian Cox process [Adams et al., ICML 2009]:
  λ(x) = λ* σ(f(x)), where λ* is an upper-bound intensity, σ(z) = 1/(1 + e^{-z}) is the logistic "squashing" function, and f(x) ~ GP.
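A sigmoidal-Cox-style rate can be simulated by thinning: draw a homogeneous Poisson process at the upper-bound rate λ*, then keep each point x with probability λ(x)/λ*. A minimal 1-D sketch (the constants are illustrative, and f is a fixed function rather than an actual GP draw):

```python
import numpy as np

rng = np.random.default_rng(0)

lam_star = 5.0    # upper-bound intensity
T = 10.0          # observe on the interval [0, T]

def lam(x):
    # lambda(x) = lam_star * sigmoid(f(x)), with a fixed f(x) = sin(3x)
    return lam_star / (1.0 + np.exp(-np.sin(3.0 * x)))

# Step 1: homogeneous Poisson process at rate lam_star on [0, T]
n = rng.poisson(lam_star * T)
candidates = rng.uniform(0.0, T, size=n)

# Step 2: thin -- keep each x with probability lam(x) / lam_star
keep = rng.uniform(size=n) < lam(candidates) / lam_star
events = candidates[keep]

print(len(candidates), len(events))
```

Thinning is valid because λ(x) ≤ λ* everywhere; this is also the construction Adams et al. exploit for exact inference.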
Socioscope: A Probabilistic Model for Social Media

Correcting Population Bias
- Social media user activity intensity g_st; the count of all posts z ~ Poisson(g), so g_st can be accurately recovered.
- Target post count x ~ Poisson(η(f, g)), with link function η(f, g) = f · g (the target post intensity).

Handling Imprecise Location
[Figure reproduced from Vardi et al. (1985), "A statistical model for positron emission tomography"]

Handling Imprecise Location: Transition
[figure-only slide]
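The population-bias correction can be seen in a tiny simulation: three bins share the same target intensity f but have very different user activity g, so the raw counts x differ wildly while x/g (the MLE of f under x ~ Poisson(f·g) with g known) recovers f. All numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

f_true = 0.02                       # same target intensity in every bin
g = np.array([1e4, 4e4, 1.6e5])     # user activity per bin (population bias)

x = rng.poisson(f_true * g)         # observed target-post counts

# Naive estimate f_st = x_st tracks population, not the phenomenon;
# dividing by the activity intensity corrects the bias.
f_hat = x / g

print(x, f_hat)
```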
Handling Zero/Low Counts
[figure-only slide]

The Graphical Model
[figure-only slide]

Optimization and Parameter Tuning
- Objective (regularized negative Poisson log likelihood):
  min_{θ ∈ R^{nm}}  Σ_i (h_i - x_i log h_i) + λ Ω(θ),
  where h_i = Σ_{j=1}^{nm} P_ij η(f_j, g_j) and θ_j = log f_j.
- Solved with a quasi-Newton method (BFGS).
- Cross-validation: a data-based, objective approach to choosing the regularization; sub-sample events from the total observations.

Theoretical Considerations
- How many posts do we need to obtain reliable recovery?
- If x ~ Poisson(h), then E[((x - h)/h)^2] = h^{-1} ≈ x^{-1}: more counts, less error.
- Theorem: Le
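The counting argument E[((x - h)/h)^2] = 1/h can be checked by Monte Carlo (sample sizes and rates illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

rates = [10.0, 100.0, 1000.0]
rel_errors = []
for h in rates:
    x = rng.poisson(h, size=200_000)
    # Empirical mean of the squared relative error ((x - h) / h)^2
    rel_errors.append(np.mean(((x - h) / h) ** 2))

# Expected squared relative error is 1/h: more counts, less error.
for h, r in zip(rates, rel_errors):
    assert abs(r - 1.0 / h) < 0.2 / h
assert rel_errors[0] > rel_errors[1] > rel_errors[2]
```

This is just Var(x) = h for a Poisson variable, rescaled by h^2, and it quantifies why sparse bins (low h) yield unreliable naive estimates.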