数据仓库的概念课件.ppt

上传人(卖家):晟晟文业 文档编号:4107047 上传时间:2022-11-11 格式:PPT 页数:27 大小:145.50KB
下载 相关 举报
数据仓库的概念课件.ppt_第1页
第1页 / 共27页
数据仓库的概念课件.ppt_第2页
第2页 / 共27页
数据仓库的概念课件.ppt_第3页
第3页 / 共27页
数据仓库的概念课件.ppt_第4页
第4页 / 共27页
数据仓库的概念课件.ppt_第5页
第5页 / 共27页
点击查看更多>>
资源描述

1、2022年11月11日星期五1 Data Warehousing2022年11月11日星期五2 1.传统数据库以及OLTP(On-Line Transaction Processing 联机事务处理)在日常的管理事务处理中获得了巨大的成功,但是对管理人员的决策分析要求却无法满足。2.因为,管理人员常常希望能够通过对组织中的大量数据进行分析,了解业务的的发展趋势。而传统数据库只保留了当前的业务处理信息,缺乏决策分析所需要的大量的历史信息。3.为满足管理人员的决策分析需要,就需要在数据库的基础上产生适应决策分析的数据环境数据仓库(Data Warehose)。数据仓库技术出现的背景数据仓库技术出现

2、的背景2022年11月11日星期五3n1-What is a data warehouse?2022年11月11日星期五4What is Data Warehouse?nDefined in many different ways,but not rigorously.nA decision support database that is maintained separately from the organizations operational databasenSupport information processing by providing a solid platform o

3、f consolidated,historical data for analysis.n“A data warehouse is a subject-oriented,integrated,time-variant,and nonvolatile collection of data in support of managements decision-making process.”W.H.Inmon2022年11月11日星期五5数据仓库的定义与基本特性数据仓库的定义与基本特性n William H.Inmon在1993年所写的论著Building the DataWarehouse首先系

4、统地阐述了关于数据仓库的思想、理论,为数据仓库的发展奠定了历史基石。文中他将数据仓库定义为:a data warehouse is a subject-oriented,integrated,non-volatile,time-variant collection of data in support of management decisions.一个面向主题的、集成的、非易失性的、随时间变化的数据的集合,以用于支持管理层决策过程。2022年11月11日星期五6Data WarehouseSubject-OrientednOrganized around major subjects,suc

5、h as customer,product,sales.nFocusing on the modeling and analysis of data for decision makers,not on daily operations or transaction processing.nProvide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.2022年11月11日星期五7Da

6、ta WarehouseIntegratednConstructed by integrating multiple,heterogeneous data sourcesnrelational databases,flat files,on-line transaction recordsnData cleaning and data integration techniques are applied.nEnsure consistency in naming conventions,encoding structures,attribute measures,etc.among diffe

7、rent data sourcesnE.g.,Hotel price:currency,tax,breakfast covered,etc.nWhen data is moved to the warehouse,it is converted.2022年11月11日星期五8Data WarehouseTime VariantnThe time horizon for the data warehouse is significantly longer than that of operational systems.nOperational database:current value da

8、ta.nData warehouse data:provide information from a historical perspective(e.g.,past 5-10 years)nEvery key structure in the data warehousenContains an element of time,explicitly or implicitlynBut the key of operational data may or may not contain“time element”.2022年11月11日星期五9Data WarehouseNon-Volatil

9、enA physically separate store of data transformed from the operational environment.nOperational update of data does not occur in the data warehouse environment.nDoes not require transaction processing,recovery,and concurrency control mechanismsnRequires only two operations in data accessing:ninitial

10、 loading of data and access of data.2022年11月11日星期五10Data Warehouse vs.Heterogeneous DBMSnTraditional heterogeneous DB integration:nBuild wrappers/mediators on top of heterogeneous databases nQuery driven approachnWhen a query is posed to a client site,a meta-dictionary is used to translate the query

11、 into queries appropriate for individual heterogeneous sites involved,and the results are integrated into a global answer setnComplex information filtering,compete for resourcesnData warehouse:update-driven,high performancenInformation from heterogeneous sources is integrated in advance and stored i

12、n warehouses for direct query and analysis2022年11月11日星期五11Data Warehouse vs.Operational DBMSnOLTP(on-line transaction processing)nMajor task of traditional relational DBMSnDay-to-day operations:purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.nOLAP(on-line analytical pr

13、ocessing)nMajor task of data warehouse systemnData analysis and decision makingnDistinct features(OLTP vs.OLAP):nUser and system orientation:customer vs.marketnData contents:current,detailed vs.historical,consolidatednDatabase design:ER+application vs.star+subjectnView:current,local vs.evolutionary,

14、integratednAccess patterns:update vs.read-only but complex queries2022年11月11日星期五12OLTP vs.OLAP OLTP OLAP users clerk,IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current,up-to-date detailed,flat relational isola

15、ted historical,summarized,multidimensional integrated,consolidated usage repetitive ad-hoc access read/write index/hash on prim.key lots of scans unit of work short,simple transaction complex query#records accessed tens millions#users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction t

16、hroughput query throughput,response 2022年11月11日星期五13Why Separate Data Warehouse?nHigh performance for both systemsnDBMS tuned for OLTP:access methods,indexing,concurrency control,recoverynWarehousetuned for OLAP:complex OLAP queries,multidimensional view,consolidation.nDifferent functions and differ

17、ent data:nmissing data:Decision support requires historical data which operational DBs do not typically maintainndata consolidation:DS requires consolidation(aggregation,summarization)of data from heterogeneous sourcesndata quality:different sources typically use inconsistent data representations,co

18、des and formats which have to be reconciled2022年11月11日星期五14n2-A multi-dimensional data model2022年11月11日星期五15From Tables and Spreadsheets to Data CubesnA data warehouse is based on a multidimensional data model which views data in the form of a data cubenA data cube,such as sales,allows data to be mo

19、deled and viewed in multiple dimensionsnDimension tables,such as item(item_name,brand,type),or time(day,week,month,quarter,year)nFact table contains measures(such as dollars_sold)and keys to each of the related dimension tablesnIn data warehousing literature,an n-D base cube is called a base cuboid.

20、The top most 0-D cuboid,which holds the highest-level of summarization,is called the apex cuboid.The lattice of cuboids forms a data cube.2022年11月11日星期五16Cube:A Lattice of Cuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locati

21、ontime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex)cuboid1-D cuboids2-D cuboids3-D cuboids4-D(base)cuboid2022年11月11日星期五17A Concept Hierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan.allregionofficecountryT

22、orontoFrankfurtcity2022年11月11日星期五18Multidimensional DatanSales volume as a function of product,month,and regionProductRegionMonthDimensions:Product,Location,TimeHierarchical summarization pathsIndustry Region YearCategory Country QuarterProduct City Month Week Office Day2022年11月11日星期五19A Sample Data

23、 CubeTotal annual salesof TV in U.S.A.DateProductCountrysumsum TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum2022年11月11日星期五20Cuboids Corresponding to the Cubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex)cuboid1-D cuboids2-D cuboids3-D(base)cuboid2022年11月11日星期

24、五21n3-Conceptual Modeling of Data Warehouses2022年11月11日星期五22Modeling data warehouses:dimensions&measuresnStar schema:A fact table in the middle connected to a set of dimension tables nSnowflake schema:A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller di

25、mension tables,forming a shape similar to snowflakengalaxy schema:Multiple fact tables share dimension tables,viewed as a collection of stars,therefore called galaxy schema2022年11月11日星期五23Example of Star Schema time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcou

26、ntrylocationSales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch2022年11月11日星期五24Example of Snowflake Schematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreet

27、city_keylocationSales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycityprovince_or_streetcountrycity2022年11月11日星期五25Example of Gal

28、axy schema time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSales Fact Tabletime_key item_key branch_key location_key units_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranchShipping

29、 Fact Tabletime_key item_key shipper_key from_location to_location dollars_cost units_shippedshipper_keyshipper_namelocation_keyshipper_typeshipper2022年11月11日星期五26n4-Typical OLAP Operations2022年11月11日星期五27nRoll up(drill-up):summarize datanby climbing up hierarchy or by dimension reductionnDrill down

30、(roll down):reverse of roll-upnfrom higher level summary to lower level summary or detailed data,or introducing new dimensionsnSlice and dice:nproject and select nPivot(rotate):nreorient the cube,visualization,3D to series of 2D planes.nOther operationsndrill across:involving(across)more than one fact tablendrill through:through the bottom level of the cube to its back-end relational tables(using SQL)

展开阅读全文
相关资源
猜你喜欢
相关搜索
资源标签

当前位置:首页 > 办公、行业 > 各类PPT课件(模板)
版权提示 | 免责声明

1,本文(数据仓库的概念课件.ppt)为本站会员(晟晟文业)主动上传,163文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。
2,用户下载本文档,所消耗的文币(积分)将全额增加到上传者的账号。
3, 若此文所含内容侵犯了您的版权或隐私,请立即通知163文库(发送邮件至3464097650@qq.com或直接QQ联系客服),我们立即给予删除!


侵权处理QQ:3464097650--上传资料QQ:3464097650

【声明】本站为“文档C2C交易模式”,即用户上传的文档直接卖给(下载)用户,本站只是网络空间服务平台,本站所有原创文档下载所得归上传人所有,如您发现上传作品侵犯了您的版权,请立刻联系我们并提供证据,我们将在3个工作日内予以改正。


163文库-Www.163Wenku.Com |网站地图|