-
What is a data warehouse?
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
"The data warehouse is a collection of integrated, subject-oriented databases designed to support DDS functions, where each unit of data is non-volatile and relevant to some moment in time"
-
Why is a DW needed? What is the ultimate purpose of a DW in BI systems?
To provide the single version of the truth
-
The four major characteristics fo data warehousing:
- Subject-oriented
- Integrated
- Time-variant
- Nonvolatile
-
Characteristics of Data Warehousing
Subject-Oriented
Data are organized by topics, such as sales, products, customers, etc. Best for providing a more comprehensive view of the organization; not only how a business is operating, but why.
-
Characteristics of Data Warehousing
Integrated
Data from different sources are stored in a consisten format. Also clarity is obtained in unit of measures, naming/labeling of attributes, etc. (The assumption is the datat warehouse is totally integrated.)
-
Characteristics of Data Warehousing
Time Variant
Provides data at various points in time (daily, weekly, monthly, quaterly, annually - historic and current data so as to analyze trends, deviations, compare and forcast outcomes, etc.) Every data warehouse should have a time variable.
i.e. LSU enrollement, retention, graduation data
-
There are three main types of Data Warehouses
- Data Mart
- Operational Data Stores (ODS)
- Enterprise data warehouses (EDW)
-
A Data Mart is:
a subset of a data warehouse, usually consisting of a single subject area (marketing, sales, customer satisfaction inventory, production, etc.)
-
A Data Mart has two componets which are
- Dependent Data Mart
- Independent Data Mart
-
Dependent Data Mart (subset of Data Mart)
Created directly from the data warehouse. This ensures that the user is using/viewing the same data available at all other users. EDW must be constructed first.
-
-
Independant Data Mart (subset of Data Mart)
A small warehouse designed for a department or strategic business unit (SBU). Its source is not an EDW
-
-
Operational Data Stores (ODS)
A type of database often used as an interim (or staging) area for a data warehouse, especially for customer information files (CIF). Data are updated frequently through the course of business operations as opposed to the static contents of a data warehouse. (short-term memory)
-
-
Enterprise Data Warehouses (EDW)
A large-scale data warehouse that is used across the enterprise/company for decision support. Being large-scaled, the EDW integrates data in standard format from many sources. (Direct TV and Enterprise use EDW)
-
Enterprise Data Warehouse
-
Differentiate among a data mart, an ODS, and an EDW
- An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis.
- Both an EDW and a Data Mart are data warehouses. An EDW (Enterprise Data Warehouse) is an all-encompanssing DW that covers all subject areas of interest to the entire organization. A data mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area
-
Data Mart
is a subset of a data warehouse, typically consisting of a single subject area (smaller and focusing on a particular subject or department). A data mart can be either dependent or independent.
-
Dependent Data Mart
a subset that is created directly from the data warehouse
it has the advantages of using a consistent data model and providing quality data
-
Independent Data Mart
a small warehouse designed for a strategic business unit or department, but its source is not an EDW
-
Operational Data Stores (ODS)
This type of database is often used as an interim staging area for a data warehouse.
An ODS is similar to short-term memory in that it stores only very recent information
-
Enterprise Data Warehouse (EDW)
a large-scale data warehouse that is used across the enterprise for decision support.
The large scale nature provides integration of data from many sources into a standard format for effective BI and decision support applications
-
MetaData
- Data about Data
- descirbes the contents of a data warehouse, its structure (such as a field name, data type, default value, length), meaning, syntax and the manner of its use.
-
MetaData - 3 Main Types
Syntactic Metadata
Structural Metadata
Semantic Metadata
-
MetaData Types
Syntactic Metadata
Data describing the syntax of data
ex. "Dublin Core" may be expressed in plain text, HTML or XML
-
MetaDate Types
Structural Metadata
Data describing the structure of the data
ex. field name, data type, length, table relationships
-
Meta Data Type
Semantic Metadata
Data describing the meaning of the data
ex. a webpage may include metadata specifying what language it is written in
-
Metadata in BI Architecture
Centralized
-
Metadata in BI Architecture
DISTRIBUTED
-
DW Framework
(Database can come from all: Database sources, ETL, Process, EDW, or Data mart)
-
Data Warehouse Architecture
Three-Tier
 - 1. Database Server (data acquisition software - backend)
- 2. The data warehouse that contains the data & software (application server)
- 3. Client (front-end) software that allows users to access and analyze data from the warehouse (Client Workstation)
-
Data Warehouse Architecture
Two-Tier Architecture
 - 1. Application and database server
- 2. Client Workstation
|
|