-
What is Business Intelligence (BI)
- A set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information
- Coined by the gartner group - early 1990s
-
Business intellegence supports proper _______
- governance
- For controlling and monitoring business health
- Creates accountability
-
How does BI manage information
- Master Data Management (MDM)
- Provides for comprehensive and consistent definition of all data in an organization
- Ensures uniform and consistent view of all data
-
How can you monitor business health
- Setting goals
- Defining key performance indicators (KPIs)
- Quantifiable measurements that assess a company's effectiveness in reaching its goals (i.e. general, financial, human resources)
-
Typical major components of BI
- Data extraction, transformation, and loading tools
- Data store (Data warehouse or data mart)
- Data query and analysis tools (OLAP)
- Data presentation and visualization tools (dashboard)
-
What are 2 different types of data
- Operational data
- Decision support data
-
Describe operational data
- Transactional databases
- Stored in highly normalized tables in a relational database
- Dynamically updated
- Focus on traditional information systems
-
Describe decision support data
- Data warehouses
- Stored in formats that facilitate data extraction, data analysis, and decision making
- Often aggregated and with redundancies
-
What step of BI is "Collecting and storing operational data" (there are 6 steps)
- step 1

-
What step of BI is "Aggregating the operational data into decision support data" (there are 6 steps)
- Step 2

-
What step of BI is "Analyzing the decision support data to generate information" (there are 6 steps)
- Step 3

-
What step of BI is "Presenting the information to the end user to support decision-making" (there are 6 steps)
- Step 4

-
What step of BI is "Making business decisions (and generating more data)" (there are 6 steps)
Step 5
-
What step of BI is "Monitoring results to evaluate outcomes of the business decisions" (there are 6 steps)
Step 6
-
What is a data warehouse
- A database optimized for data analysis and read-only query processing
- Prime objective of it is to provide fast and accurate answers to data analysis queries
-
Data are typically extracted from various sources and then transformed and integrated into the
Data warehouse
-
Development of a data ware house requires ______, _______, and __________
Time, money, considerable managerial effort
-
Data characteristics of data warehouses
- Integrated - consitent format and meaning
- Subject-oriented - organized and summarized to answer questions
- Time-variant - captures and represents the flow of data over time
- Nonvolatile - once data enters the warehouse, it is never removed
-
What is a data mart
- A small single-subject data warehouse subset that provides decision support to a small group of people
- Less organizational commitment required
- Lower cost, shorter implementation time
-
Describe online analytical processing (OLAP)
- Graphical user interface
- Analytical processing logic
- Data-processing logic
-
What are the differentiating characteristics of OLAP
- Advanced functionality for data presentation, aggregation, modeling and analysis
- Capacity for multi-dimensional analysis
- Used with both transactional DBs and data warehouses
-
Describe a data cube
- Data stored in multi-dimensional arrays
- Provides an opportunity to "slice and dice" data
- foundation for OLAP
- Data retrieval is much quicker than with standard relational databases
-
Star schema's have
- Fact tables
- Dimension tables
-
Describe fact tables
- Associated with a particular type of data
- ex. sales table
- Sales: Sales_Quantity, Sales_Price, Sales_Total
- Orders: Order_Quantity, Order_Price, Order_Amount
-
Describe dimension tables
- Attributes provide descriptive information about the facts within a given dimension
- ex. product, time and location tables
- Product: Prod_Brand, Prod_Color, Prod_Size, Prod_Package
- Location: Region_ID, Loc_State, Loc_City, Loc_Office
-
Data is extracted from source systems at regular intervals - typically measured in days, months and quarters
Predictable frequency
-
data is sourced from controlled, internal systems supporting established and well-defined back-office processes
Static sources
-
Data structures are known and modeled in advance of analysis
Fixed models
-
questions to be asked of the data are pre-defined
Defined queries
-
Rigorous change control is enforced before the introduction of new data sources or reporting requirements
Slow-changing requirements
-
The consumers of BI reports are typically business managers and senior executives
Limited users
-
Projects where SQL is ideal
- logical related descrete data requirements which can be identified up-front
- data integrity is essential
- standards-based proven technology with good developer experience and support
-
Projects where NoSQL is ideal
- unrelated indeterminate or evolving data requirements
- simpler or looser project objectives, able to start coding immediately
- speed and scalability is imperative
-
What are the 8 main differences in SQL vs NoSQL
- Tables
- Schemas
- Normalization
- JOINs
- Data integrity
- Transactions
- Performance
- Scaling
-
what are the big factors impacting traditional BI databases
- Semi-structured and unstructured data
- Rapid evolution of database schema
- High-velocity data sources
- Quickly growing data volumes
-
What is data mining
Non-trivial extraction of implicit, previous unknown and potentially useful information from data
-
Why may traditional techniques of answering business question be unsuitable
- Enormity of data
- high dimensionality of the data
- heterogenous, distributed nature of data
-
Describe supervised algorithms (classification)
- Learning by example
- use training data with correct answers
- identify a class label for the incoming new data
- Find a model to predict the class attribute as a function of the values of the other attributes
-
Describe unsupervised algorithms (clustering)
- do not use training data
- classes may not be known in advance
-
What are the 3 things required by the Nearest-Neighbor Classifier
- The set of stored records
- Distance Metric to compute the distance between records
- The value of k, the number of nearest neighbors to retrieve
-
To classify an unknown record using the nearest-neighbor classifier
- Compute the distance to other training records
- identify k nearest neighbors
- use class labels of nearest neighbors to determine the class label of unknown record
-
Factors to consider when choosing the value of k
- If too small, the model is sensitive to noise
- If too large, neighborhood may include too many points from other classes
|
|