Big Data Research (in English): lecture slides (PPTX, 46 pages)

Exploring Big Data Analysis: Fundamental Scientific Problems

Outline
• Big Data: Opportunities and Challenges
• Some More Scientific Problems in Big Data Analysis and Processing
• Some Advances on Big Data Research

Big Data
A term for a collection of data sets so large and complex that they are difficult to process and analyze with on-hand database management tools and traditional data processing and analysis methods (Wikipedia).
Units: ZB ($10^{21}$ bytes), EB ($10^{18}$), PB ($10^{15}$), TB ($10^{12}$), GB ($10^{9}$), MB ($10^{6}$).

Big Data: Opportunities and Challenges
Why is it difficult? Big data challenges existing information technologies, management paradigms, and the statistical and computational sciences.
• Volume: PB to ZB in scale; distributed storage and processing are necessary.
• Velocity: growing tremendously; data arrive as a flow.
• Variety: multisource, correlated, heterogeneous; unstructured, unreliable, inconsistent.
• Value: the total dataset embodies great value, while an individual datum or small subset contains little information.

What opportunities? Big data embody great value that may not be discoverable in small data sets:
• Scientific research: high-energy physics, astronomy, life science, geosciences and remote sensing
• Social governance
• Business: new sources of benefit and income, finding valuable customers, marketing

Big data also offers:
• The fourth paradigm of research (Jim Gray)
• A systematic approach uniquely applicable to modern management
• A big data view for assessing public policies

Big data research: a truly inter/multidisciplinary activity
• Data acquisition & data management (Management Science): Fundamental Challenge 1
• Data assessment & processing (Information Science): Fundamental Challenge 2
• Data understanding (Math and Statistics): Fundamental Challenge 3
• Applications (Engineering): Fundamental Challenge 4
Big Data Industry: value chain management, business patterns, ...

Challenge 1: Data Resource Management & Public Policies
Acquisition; quality; standards; sharing; privacy protection; safety; data-driven management.

Challenge 2: IT & Science for Big Data
Architecture; system/software/algorithms; scalability/complexity; real-time processing.

Challenge 3: Statistics & Computation for Big Data Analytics
• Representation (uniform scheme; complexity)
• Modeling (parent space identification; sampling)
• Mining (clustering; classification; regression; prediction; variable selection)
• Analytics (relevance analysis; latent variable analytics; statistical inference)
• Computation (subsampling; complexity; distributed computation)

Challenge 4: Big Data Engineering
Highly domain-specific; applies to any data-driven field (social-media-based, trade-data-based, record-based (survey, observation), empirical-data-based, experimental-data-based).

Some More Scientific Problems in Big Data Analysis and Processing

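Problem 1: High Dimensionality
High-dimensionality problem: the number of features $p$ is far larger than the sample size $n$, and $n$ varies with $p$, i.e. $n = n(p)$.
• Classical: $n \gg p$
• High-D: $p \gg n$
• Big data: $p \gg n(p)$

Classical solution: asymptotic normality. For the linear model $y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$ with data $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the matrix form is $Y = X_{n\times p}\,\beta_{p\times 1} + \varepsilon$ with noise variance $\sigma^2$, the least-squares estimate is $\hat\beta = (X^\top X)^{-1} X^\top Y$, and
$\sqrt{n}\,(\hat\beta - \beta) \sim N\big(0, (\tfrac{1}{n} X^\top X)^{-1}\sigma^2\big) \xrightarrow{d} N(0, \sigma^2 I_{p\times p})$ when $\tfrac{1}{n} X^\top X \to I_{p\times p}$.
This breaks down in the high-D regime: if $p > n$, then $\mathrm{rank}(X^\top X) \le n < p$, so $X^\top X$ is singular and $\hat\beta$ is not even defined.
Core open questions:
• How can priors be added so that a high-D problem becomes well defined?
• Sparse modeling
• High-D statistics
• High-D data mining (clustering stability, classification consistency)
Hot issues: sparse modeling (compressed sensing; low-rank matrix decomposition; sparse learning).

Problem 2: Sub-sampling
Sub-sampling problem: a big data set has to be processed by some type of divide-and-conquer scheme, such as the Hadoop system (The Big Data Bootstrap, Kleiner et al., ICML 2012).
Scheme: Map (random sub-sampling) splits the data $X_1, X_2, X_3, \ldots, X_n$ of $D$ into subsets $D_1, \ldots, D_k, \ldots, D_m$; each subset yields an intermediate solution $f_1, f_2, \ldots, f_m$; Reduce (aggregation) combines them into the final estimate $f^*$.
Core open questions:
• How should one sub-sample and aggregate so that the final $f^*$ models $D$ properly?
• Is distributed processing feasible?
• How well do traditional sub-sampling techniques work?
• Sub-sampling axioms (similarity; transitivity; ...)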
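The cited Big Data Bootstrap (bag of little bootstraps) applies this map/reduce pattern to assessing an estimator's quality. Below is a simplified pure-Python sketch. It assumes the estimator is the sample mean and the quality measure is its standard error. Map draws $s$ random subsets of size $b = n^\gamma$. Each subset is resampled $r$ times up to the full size $n$ using multinomial counts. Reduce averages the per-subset standard errors. The function names and the defaults for `gamma`, `s`, and `r` are illustrative, not values taken from the paper.

```python
import random
import statistics


def weighted_mean(values, counts):
    """Mean of a resample stored as multiplicities: values[i] appears counts[i] times."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


def blb_standard_error(data, gamma=0.7, s=5, r=40, seed=0):
    """Bag-of-little-bootstraps estimate of the standard error of the sample mean."""
    rng = random.Random(seed)
    n = len(data)
    b = int(n ** gamma)  # size of each little subset, b = n^gamma
    per_subset = []
    for _ in range(s):
        subset = rng.sample(data, b)  # Map: random sub-sample without replacement
        estimates = []
        for _ in range(r):
            # Resample n points from the b subset points: Multinomial(n, uniform) counts.
            counts = [0] * b
            for idx in rng.choices(range(b), k=n):
                counts[idx] += 1
            estimates.append(weighted_mean(subset, counts))
        per_subset.append(statistics.stdev(estimates))  # quality estimate on this subset
    return statistics.fmean(per_subset)  # Reduce: average the s quality estimates


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    print("BLB standard error:", round(blb_standard_error(data), 4))
    print("textbook sd/sqrt(n):", round(statistics.stdev(data) / len(data) ** 0.5, 4))
```

For well-behaved data the two printed values should be close. Each resample, however, only touches $b \ll n$ distinct points, which is what makes the scheme attractive for distributed storage.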

Problem 3: Computational Complexity
Computational complexity problems: traditionally, computational complexity concerns how difficult a problem is to solve, or how much computational cost an algorithm must pay to solve it.
• Traditional setting: an algorithm $A$ for problem $P$ produces the result $R = A(D)$ from a fixed data set $D$.
• Big data setting: data arrive as $D_1, D_2, D_3, \ldots$, processing is interleaved with data exchange, and the result at time $t$ is $R_t = A_t(D_t)$.
Core open questions:
• How should complexity be properly defined in the big data setting?
• Is a given big data problem easy or difficult?
• How can a complexity theory be established for specific types of big data problems?
  - Flow data $D_{t_i}$: easy if $A_{t_i}(D_{t_i})$ yields $R_{t_i}$ within $\Delta t_i = t_{i+1} - t_i$.
  - Distributed processing: easiness depends on how the processing time compares with the data exchange time.

Problem 4: R/D Computation
Real-time and distributed computation problem: parallel and distributed processing are necessary and may become the only viable way of processing big data. The main challenges are:
• Real time
• Feasibility
• Efficiency
• Scalability
Typical infrastructure: Hadoop (HDFS, HBase, MapReduce).
Example: when new data $D_2$ arrive, a model $F(x)$ built on $D_1$ must be updated to $F_{\text{new}}(x)$ on $D_1 + D_2$. Illustration: code $X = (0,0,0,1,1,1,0,0,0,1,1,0)$ (Xu et al., Efficiency speed-up for evolutionary computation: Fundamentals and Fast-GAs, AMC, 2003).
Core open questions:
• What IT is needed to support fast storage, reading, ranking, ...?
• Problem decomposability: can a data modeling problem be decomposed into a series of problems, each depending on a sub-data set, and if so, how?
• Solution assembly: how can the solution of a problem be assembled from its sub-solutions (component solutions)?
• Is a specific data-flow computation problem difficult or easy?
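For some simple models, problem decomposability and solution assembly have an exact answer. Below is a minimal sketch, assuming the "model" is just the mean and variance of a numeric data stream. Each sub-data set yields a small component solution. The components can be merged in any order, and a model fitted on $D_1$ can be updated to $D_1 + D_2$ without revisiting $D_1$. The merge rule is the standard pairwise variance update (Chan et al.), swapped in here for illustration. The class name `Summary` is hypothetical.

```python
import statistics
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Component solution for one sub-data set: size, mean, and sum of squared deviations."""
    n: int
    mean: float
    m2: float

    @staticmethod
    def of(chunk):
        """Solve the sub-problem on one non-empty chunk."""
        mean = statistics.fmean(chunk)
        return Summary(len(chunk), mean, sum((x - mean) ** 2 for x in chunk))

    def merge(self, other):
        """Assemble two component solutions without revisiting the raw data."""
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self):
        """Sample variance of all data summarized so far (needs n >= 2)."""
        return self.m2 / (self.n - 1)


if __name__ == "__main__":
    data = [float((7 * i) % 23) for i in range(1000)]
    # Decompose into sub-data sets, solve each one, then assemble the results.
    chunks = [data[i:i + 250] for i in range(0, len(data), 250)]
    assembled = reduce(Summary.merge, [Summary.of(c) for c in chunks])
    # Flow update: a model on D1, then new data D2 arrive.
    model = Summary.of(data[:600]).merge(Summary.of(data[600:]))
    print(assembled.mean, assembled.variance, model.variance)
```

Most modeling problems do not admit such an exact, constant-size summary. The open questions above concern exactly that gap.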

Problem 5: Unstructured Processing
Unstructured data processing problems: structured data are those that can be represented with a finite number of rules and processed within acceptable time; all other data are unstructured. The main challenges:
• Multisourced
• Heterogeneous
• Understanding is cognition dependent
Structured data and unstructured data (text, image, video) are to be handled by a unified processing platform that produces a decision $F(x)$.
Core open questions:
• How can a uniform platform be built on which different types of unstructured data can be processed simultaneously?
• How can cognition-consistent approaches to unstructured data modeling be developed?

Problem 6: Visualization
Visualization analysis: using visually consistent figures or graphics to exhibit the intrinsic structure and patterns in high-dimensional big data. It is a basic tool for the human-machine interface and for expanding applications.
Pipeline: data space (high-dimensional) → feature extraction → feature space (low-dimensional) → visualization → visualized space (2D). Examples: Facebook, Wordle, Whisper; Microsoft T-Drive (Yuan et al., 2010).
Core open questions:
• Essential feature extraction for high-dimensional data (dimension reduction)?
• Structured representation of imaginal thinking?
• How can an appropriate visualized space be constructed?
• How can a problem in feature space (data space) be mapped to a representation problem in visualized space?
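Principal component analysis is one classical way to carry out the feature-extraction step of this pipeline, though not necessarily the technique the slides have in mind. The pure-Python sketch below centers the data and forms the covariance matrix. It then uses power iteration with deflation to find the top two principal axes, which gives 2-D coordinates for plotting. All function names are hypothetical. For real workloads, a linear-algebra library would replace these loops.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]


def top_eigenpair(mat, iters=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in mat]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]  # start from a unit vector
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in mat]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, mat))
    return lam, v


def deflate(mat, lam, v):
    """Remove the found component: mat - lam * v v^T."""
    return [[mat[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]


def project_to_2d(rows):
    """Map high-dimensional rows to 2-D coordinates along the top two principal axes."""
    x = center(rows)
    c = covariance(x)
    lam1, v1 = top_eigenpair(c)
    _, v2 = top_eigenpair(deflate(c, lam1, v1), seed=1)
    return [(sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2))) for r in x]


if __name__ == "__main__":
    rng = random.Random(3)
    rows = [[rng.gauss(0, 5), rng.gauss(0, 2)] + [rng.gauss(0, 0.1) for _ in range(8)] for _ in range(300)]
    print(project_to_2d(rows)[:3])
```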

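Some Advances on Big Data Research
(1) High-dimensionality problem: sparse modeling; clustering stability
(2) R/D computation problem: feasibility of Hadoop-based algorithms; unveiling traffic anomalies
(3) Unstructured data processing: visual clustering machine

Some Advances in My Group

(1) H-d problem: Sparse modeling
Sparsity (of $x$): there exists a characteristic quantity $q(x)$ that is singular, i.e., abnormally small.
[Figure: example of a sparse signal]
• 1st order (vectors): $q(x) = \mathrm{Card}(x) = \#\{i : x_i \neq 0\}$, $x \in \mathbb{R}^n$
• 2nd order (matrices): $q(X) = \mathrm{Card}(X)$, $\mathrm{Rank}(X)$, $\mathrm{Trace}(X)$, or $\|X\|_*$
• 3rd order (tensors $\mathcal{X} \in \mathbb{R}^{m\times n\times k}$): $q(\mathcal{X}) = \mathrm{Card}(\mathcal{X})$ or a rank-type measure (open)
Models:
• $\min_{x\in\mathbb{R}^n} F(x)$ s.t. $\|x\|_0 \le R$
• $\min_{x\in\mathbb{R}^n} F(x) + \lambda\|x\|_q^q$, $0 \le q \le 1$
• $\min_{L,E\in\mathbb{R}^{m\times n}} \mathrm{Rank}(L) + \lambda\,\mathrm{Card}(E)$ s.t. $Y = \mathcal{A}(L + E)$
Theories:
• Unique solvability theory (signal recovery): RIP for $L_0$ (Candes & Tao, 2006) and for $L_q$ (Cai & Zhang, 2013; Wang et al., 2013); coherence for $L_1$ (Donoho & Elad, 2003).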
• Thresholding representation theory: a solution satisfies $x^* = T_q\big(x^* - \nabla F(x^*)\big)$, and the thresholding operator $T_q$ is analytically expressible only if $q = 0, 1/2, 2/3, 1$ (Xu, 2010; Xu et al., 2012; Zeng et al., 2014).
References for the theories:
• Xu ZB, Data modeling: Visual Psychology Approach and L(1/2) Regularization Theory, Proceedings of ICM, 2010
• Xu et al., L(1/2) Regularization: A Thresholding Representation Theory and a Fast Solver, IEEE TNNLS, 2012
• Zeng et al., L(1/2) Regularization: Convergence of Iterative Half Thresholding Algorithm, IEEE TSP, 2014
Extensions:
• From linear to nonlinear
• From 1st order to higher order
• From unconstrained to constrained
Algorithms:
• Greedy type: OMP (Tropp, 2006), CoSaMP (Needell & Tropp, 2009), SP (Dai, 2009)
• Convex type: linear programming (Candes et al., 2006), FPC (Yin et al., 2008), FISTA (Beck et al., 2009)
• Nonconvex type: reweighted L1 (Candes et al., 2008), IRLS (Daubechies et al., 2010), half thresholding (Xu et al., 2012), smoothing (Chen et al., 2013)
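The thresholding representation above suggests the generic iteration $x \leftarrow T\big(x - \mu\nabla F(x)\big)$. Below is a pure-Python sketch. It assumes the least-squares loss $F(x) = \tfrac12\|Ax - y\|_2^2$ and a fixed step size $\mu$.
- `soft_threshold` ($q = 1$) and `hard_threshold` ($q = 0$) are the standard operators.
- `half_threshold` ($q = 1/2$) follows the closed-form half-thresholding function commonly quoted for L(1/2) regularization, rescaled here to the ½-weighted objective. Check its constants against Xu et al. (2012) before relying on it.

Function names and the step-size rule are illustrative choices.

```python
import math


def soft_threshold(t, tau):
    """q = 1: argmin_x 0.5*(x - t)**2 + tau*|x|."""
    return math.copysign(max(abs(t) - tau, 0.0), t)


def hard_threshold(t, tau):
    """q = 0: argmin_x 0.5*(x - t)**2 + tau*[x != 0]; keeps t iff |t| > sqrt(2*tau)."""
    return t if abs(t) > math.sqrt(2.0 * tau) else 0.0


def half_threshold(t, tau):
    """q = 1/2: argmin_x 0.5*(x - t)**2 + tau*|x|**0.5 via the closed form with lam = 2*tau."""
    lam = 2.0 * tau  # rescale to the (x - t)**2 + lam*|x|**0.5 form
    cutoff = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(t) <= cutoff:
        return 0.0
    phi = math.acos((lam / 8.0) * (abs(t) / 3.0) ** -1.5)
    return (2.0 / 3.0) * t * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


def iterative_thresholding(A, y, tau, threshold, mu=None, iters=500):
    """Iterate x <- T(x - mu*grad F(x), mu*tau) for F(x) = 0.5*||A x - y||^2 (A as rows)."""
    m, n = len(A), len(A[0])
    if mu is None:
        # 1/||A||_F^2 never exceeds 1/||A||_2^2, a safe step for the gradient part.
        mu = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]  # A x - y
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]          # A^T (A x - y)
        x = [threshold(x[j] - mu * g[j], mu * tau) for j in range(n)]
    return x


if __name__ == "__main__":
    A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
    y = [0.0, 3.0, 0.0, -2.0, 3.0, -2.0]  # y = A x for the sparse x = (0, 3, 0, -2)
    for name, T in [("hard", hard_threshold), ("soft", soft_threshold), ("half", half_threshold)]:
        print(name, [round(v, 3) for v in iterative_thresholding(A, y, 0.01, T)])
```

The three operators differ only in the scalar map applied coordinate-wise. This is why the analytic expressibility of $T_q$ matters: it is what makes each iteration cheap.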

(1) H-d problem: Clustering stability
Clustering: categorizing a data set into subgroups according to data similarity; the basis of pattern recognition.
Traditional K-means: $C^K(D) = \arg\min_{C} \sum_{k=1}^{K} \sum_{x_i \in C_k} d(x_i, c_k)$, where $c_k$ is the center of cluster $C_k$.
H-d setting (new challenges): given a data flow $D_t$ with
• variable dimension $p_t$,
• variable sample size $n(p_t)$,
compute $C_t = C^K(D_t)$ with $D_t \subset \mathbb{R}^{p_t}$, $|D_t| = n(p_t)$, and require $C_t \to C^*$ (consistency + stability).
This calls for new modeling (feature decomposable), a new concept (optimal clustering), and new theory: if the data flow is mixture-Gaussian distributed, then 1) the sparse K-means is consistent, and 2) the optimal solution is stable as $p$ grows with $n = n(p)$ (Chang, Lin & Xu, Sparse K-Means via $l_\infty/l_0$ Penalty for High-dimensional Data Clustering, 2014).

(2) R&D computation problem: Feasibility of Hadoop-based regression
Regression: find an estimate of the correspondence between input $X$ and output $Y$ based on a finite number of observations $S = \{(x_i, y_i)\}_{i=1}^{n}$.
Traditional approach: RERM (regularized empirical risk minimization).
• Model: $f_S = \arg\min_{f\in\mathcal{H}_K} \frac{1}{n}\sum_{z\in S}\ell(f, z) + \lambda\|f\|_K^2$.
• Theory: $f_S$ converges to the regression function $f_\rho$. The analysis relies on the fact that the hypothesis error is nonpositive, because $f_S$ minimizes the regularized empirical risk on $S$.
Big data setting: $S$ is too big to process on a central computer, so distributed processing is required. $S$ is split into $S_1, S_2, \ldots, S_m$ on local machines, and a global machine combines the results.
Hadoop-based regression:
• Step 1 (local machines): $f_j = \arg\min_{f\in\mathcal{H}_K} \frac{1}{n_j}\sum_{z\in S_j}\ell(f, z) + \lambda\|f\|_K^2$, $j = 1, \ldots, m$, where $n_j = |S_j|$.
• Step 2 (global machine): $f^* = \frac{1}{m}\sum_{j=1}^{m} f_j$.
New challenge: the hypothesis error. Since $f^*$ does not minimize the regularized empirical risk on $S$, its hypothesis error is no longer guaranteed to be nonpositive.
New methodology: use the random sampling inequality to estimate the hypothesis error. (A random sampling inequality quantifies the fact that a differentiable function cannot attain large values anywhere if its derivatives are bounded and its values are small on a sufficiently dense discrete set.)
Feasibility theory: under certain conditions, the Hadoop-based regression algorithm is feasible in the sense of consistency, $\mathcal{E}(f^*) - \mathcal{E}(f_\rho) \to 0$, where $\mathcal{E}$ is the expected risk (Chang & Xu, Distributed Regression for Big Data: A Feasibility Theory, ICML 2014).
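Below is a minimal sketch of the two-step scheme, assuming the simplest RERM instance: squared loss with a linear kernel (ridge regression), solved in closed form on each block. `solve`, `ridge`, and `distributed_ridge` are hypothetical plain-Python helpers. No Hadoop API is used, and the local machines are simulated as in-memory blocks.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge(samples, lam):
    """RERM with squared loss, linear kernel: argmin_w (1/n) sum (y - w.x)^2 + lam*|w|^2."""
    d, n = len(samples[0][0]), len(samples)
    gram = [[sum(x[i] * x[j] for x, _ in samples) / n + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(x[i] * y for x, y in samples) / n for i in range(d)]
    return solve(gram, rhs)  # normal equations: ((1/n) X^T X + lam I) w = (1/n) X^T y


def distributed_ridge(samples, m, lam):
    """Step 1: fit ridge on each of m disjoint blocks S_j; Step 2: f* = (1/m) sum_j f_j."""
    blocks = [samples[j::m] for j in range(m)]
    local = [ridge(block, lam) for block in blocks]
    return [sum(w[k] for w in local) / m for k in range(len(local[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [1.0, -2.0, 0.5]
    samples = []
    for _ in range(6000):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        samples.append((x, sum(a * b for a, b in zip(w_true, x)) + rng.gauss(0, 0.1)))
    print("central f_S:", ridge(samples, 1e-3))
    print("averaged f*:", distributed_ridge(samples, 6, 1e-3))
```

Even in this linear case, the averaged $f^*$ is generally not identical to $f_S$. That difference is the reason its hypothesis error needs a separate analysis.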

(2) R&D computation problem: Unveiling Traffic Anomalies
Traffic anomaly monitoring is a typical flow big data problem that requires real-time processing.
Inputs: the topology of the IP network, the traffic matrix $Z$, and the anomaly matrix $A$. Model: a 2nd-order sparsity model, solved with the LLA-LADM algorithm.
Data: Abilene IP network, http:/internet2.edu/observatory/achive/data-collections.html
• 11 nodes, 41 links, 121 OD flows
• One-week period: 2003/11/8 to 2003/11/14
• 5-minute intervals, $T = 2016$

(3) Unstructured problem: Visual Clustering Machine
Core idea: view a data modeling problem as a cognition problem, and solve it by simulating visual psychology principles. We develop the model in low dimensions through visual intuition and transfer it to high dimensions by mathematical induction (Leung & Xu, IEEE TPAMI, 2000).
• Traditional approach: data-structure based
• New approach: cognition based (for regression, clustering, and classification)
Why can I recognize it so easily?

A basic visual principle: the distribution of light strength reaching the retina is controlled by the distance between the object and the retina, or by the curvature of the crystalline lens.
[Figure: visual imaging system at the retina level and the visual cortex level]

Scale-space representation: view the distance, or the curvature of the lens, as the scale; the image (i.e., the light strength) of an object can then be represented at multiple scales (Witkin, IJCAI, 1983; Perona, PAMI, 1990).
Let $p(x)$ denote the light-strength distribution of an object in the real world and $\sigma$ its distance to the retina. The projected image on the retina is then modeled as $P(x, \sigma)$ with $P(x, 0) = p(x)$:
• Linear diffusion model: $\frac{\partial P(x,\sigma)}{\partial \sigma} = \sigma\,\Delta P(x,\sigma)$, $P(x, 0) = p(x)$, whose solution is $P(x, \sigma) = p(x) * G(x, \sigma) = \int p(x - y)\, G(y, \sigma)\, dy$ with $G(x, \sigma) = \frac{1}{(\sqrt{2\pi}\,\sigma)^{d}} e^{-|x|^2/(2\sigma^2)}$ for $x \in \mathbb{R}^d$.
• Nonlinear diffusion model: $\frac{\partial P(x,\sigma)}{\partial \sigma} = \mathrm{div}\big(f(|\nabla P|)\,\nabla P\big)$, $P(x, 0) = p(x)$.
[Figure: multiscale representation of the Lena image with increasing $\sigma$]

Scale-space clustering: view each datum as a light point and the data set as an image; the clustering structure is then observed from the multiscale representation of the data image (Leung, Zhang & Xu, IEEE Trans. PAMI, 2000).
• Data image: $p(x) = \frac{1}{N}\sum_{i=1}^{N} \delta(x - x_i)$ for the data set $X = \{x_i\}_{i=1}^{N}$.
• Multiscale representation: $P(x; \sigma) = G(x; \sigma) * p(x) = \frac{1}{N}\sum_{i=1}^{N} G(x - x_i; \sigma)$.
[Figure: multiscale evolution of a data set at $\sigma$ = 0.2, 1.0, 2.0]
• Gradient flow: $\frac{dx(t)}{dt} = \nabla_x P(x(t); \sigma)$, $x(0) = x_0$; blob centroid: $y(x_0, \sigma) = \lim_{t\to\infty} x(t; x_0, \sigma)$.
[Figure: the same data set at increasing scales, showing 300 clusters, 3 clusters, and 1 cluster]
What is a blob? A light blob is a cluster.
It corresponds to a set of data points starting from which the gradient flow reaches the same local maximum.
3 basic problems
Step 1. Given a set of scales $\sigma_0, \sigma_1, \ldots$ with ... At $\sigma_0$, each datum is a cluster center and its blob center is itself. Let ...
Step 2. Find the new blob center ...
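Below is a minimal pure-Python sketch of scale-space clustering at a single scale. Each datum starts its own gradient flow on $P(x;\sigma) = \frac{1}{N}\sum_i G(x - x_i;\sigma)$, and data that reach the same local maximum form one blob. The sketch assumes the flow is discretized with the mean-shift update $x \leftarrow \sum_i w_i x_i / \sum_i w_i$, where $w_i = e^{-|x - x_i|^2/(2\sigma^2)}$. This update equals a gradient-ascent step on $P$ of size $\sigma^2 / P(x)$. The function names, tolerances, and merging rule are hypothetical choices, not the authors' algorithm. The sketch treats one scale at a time and does not track blobs across scales.

```python
import math


def blob_center(x0, data, sigma, tol=1e-7, max_iter=1000):
    """Follow the gradient flow of P(x; sigma) from x0 up to a local maximum.

    Discretized with the mean-shift update, i.e. a gradient-ascent step of
    size sigma**2 / P(x) (an illustrative discretization choice).
    """
    x = tuple(x0)
    for _ in range(max_iter):
        weights = [math.exp(-math.dist(x, xi) ** 2 / (2.0 * sigma ** 2)) for xi in data]
        total = sum(weights)
        if total == 0.0:
            break
        new_x = tuple(sum(w * xi[k] for w, xi in zip(weights, data)) / total for k in range(len(x)))
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def scale_space_clusters(data, sigma, merge_tol=1e-3):
    """Group data whose gradient flows reach the same local maximum (blob)."""
    centers, labels = [], []
    for xi in data:
        c = blob_center(xi, data, sigma)
        for k, ck in enumerate(centers):
            if math.dist(c, ck) < merge_tol:
                labels.append(k)
                break
        else:  # no existing blob center nearby: start a new cluster
            centers.append(c)
            labels.append(len(centers) - 1)
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    for sigma in (0.01, 0.5, 20.0):
        centers, labels = scale_space_clusters(data, sigma)
        print(f"sigma={sigma}: {len(centers)} clusters, labels={labels}")
```

Running it over increasing $\sigma$ reproduces the qualitative picture of the slides. At a very small scale every datum is its own blob, at an intermediate scale the two groups separate, and at a very large scale all data merge into one cluster.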
