Data Warehouses OLAP and Business Intelligence

  1. What is Business Intelligence (BI)
    • A set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information
    • Coined by the gartner group - early 1990s
  2. Business intellegence supports proper _______
    • governance
    • For controlling and monitoring business health
    • Creates accountability
  3. How does BI manage information
    • Master Data Management (MDM)
    • Provides for comprehensive and consistent definition of all data in an organization
    • Ensures uniform and consistent view of all data
  4. How can you monitor business health
    • Setting goals
    • Defining key performance indicators (KPIs)
    • Quantifiable measurements that assess a company's effectiveness in reaching its goals (i.e. general, financial, human resources)
  5. Typical major components of BI
    • Data extraction, transformation, and loading tools
    • Data store (Data warehouse or data mart)
    • Data query and analysis tools (OLAP)
    • Data presentation and visualization tools (dashboard)
  6. What are 2 different types of data
    • Operational data
    • Decision support data
  7. Describe operational data
    • Transactional databases
    • Stored in highly normalized tables in a relational database
    • Dynamically updated
    • Focus on traditional information systems
  8. Describe decision support data
    • Data warehouses
    • Stored in formats that facilitate data extraction, data analysis, and decision making
    • Often aggregated and with redundancies
  9. What step of BI is "Collecting and storing operational data" (there are 6 steps)
    • step 1
    • Image Upload 1
  10. What step of BI is "Aggregating the operational data into decision support data" (there are 6 steps)
    • Step 2
    • Image Upload 2
  11. What step of BI is "Analyzing the decision support data to generate information" (there are 6 steps)
    • Step 3
    • Image Upload 3
  12. What step of BI is "Presenting the information to the end user to support decision-making" (there are 6 steps)
    • Step 4
    • Image Upload 4
  13. What step of BI is "Making business decisions (and generating more data)" (there are 6 steps)
    Step 5
  14. What step of BI is "Monitoring results to evaluate outcomes of the business decisions" (there are 6 steps)
    Step 6
  15. What is a data warehouse
    • A database optimized for data analysis and read-only query processing
    • Prime objective of it is to provide fast and accurate answers to data analysis queries
  16. Data are typically extracted from various sources and then transformed and integrated into the
    Data warehouse
  17. Development of a data ware house requires ______, _______, and __________
    Time, money, considerable managerial effort
  18. Data characteristics of data warehouses
    • Integrated - consitent format and meaning
    • Subject-oriented - organized and summarized to answer questions
    • Time-variant - captures and represents the flow of data over time
    • Nonvolatile - once data enters the warehouse, it is never removed
  19. What is a data mart
    • A small single-subject data warehouse subset that provides decision support to a small group of people
    • Less organizational commitment required
    • Lower cost, shorter implementation time
  20. Describe online analytical processing (OLAP)
    • Graphical user interface
    • Analytical processing logic
    • Data-processing logic
  21. What are the differentiating characteristics of OLAP
    • Advanced functionality for data presentation, aggregation, modeling and analysis
    • Capacity for multi-dimensional analysis
    • Used with both transactional DBs and data warehouses
  22. Describe a data cube
    • Data stored in multi-dimensional arrays
    • Provides an opportunity to "slice and dice" data
    • foundation for OLAP
    • Data retrieval is much quicker than with standard relational databases
  23. Star schema's have
    • Fact tables
    • Dimension tables
  24. Describe fact tables
    • Associated with a particular type of data
    • ex. sales table
    • Sales:  Sales_Quantity, Sales_Price, Sales_Total
    • Orders:  Order_Quantity, Order_Price, Order_Amount
  25. Describe dimension tables
    • Attributes provide descriptive information about the facts within a given dimension
    • ex. product, time and location tables
    • Product:  Prod_Brand, Prod_Color, Prod_Size, Prod_Package
    • Location:  Region_ID, Loc_State, Loc_City, Loc_Office
  26. Data is extracted from source systems at regular intervals - typically measured in days, months and quarters
    Predictable frequency
  27. data is sourced from controlled, internal systems supporting established and well-defined back-office processes
    Static sources
  28. Data structures are known and modeled in advance of analysis
    Fixed models
  29. questions to be asked of the data are pre-defined
    Defined queries
  30. Rigorous change control is enforced before the introduction of new data sources or reporting requirements
    Slow-changing requirements
  31. The consumers of BI reports are typically business managers and senior executives
    Limited users
  32. Projects where SQL is ideal
    • logical related descrete data requirements which can be identified up-front
    • data integrity is essential
    • standards-based proven technology with good developer experience and support
  33. Projects where NoSQL is ideal
    • unrelated indeterminate or evolving data requirements
    • simpler or looser project objectives, able to start coding immediately
    • speed and scalability is imperative
  34. What are the 8 main differences in SQL vs NoSQL
    • Tables
    • Schemas
    • Normalization
    • JOINs
    • Data integrity
    • Transactions
    • Performance
    • Scaling
  35. what are the big factors impacting traditional BI databases
    • Semi-structured and unstructured data
    • Rapid evolution of database schema
    • High-velocity data sources
    • Quickly growing data volumes
  36. What is data mining
    Non-trivial extraction of implicit, previous unknown and potentially useful information from data
  37. Why may traditional techniques of answering business question be unsuitable
    • Enormity of data
    • high dimensionality of the data
    • heterogenous, distributed nature of data
  38. Describe supervised algorithms (classification)
    • Learning by example
    • use training data with correct answers
    • identify a class label for the incoming new data
    • Find a model to predict the class attribute as a function of the values of the other attributes
  39. Describe unsupervised algorithms (clustering)
    • do not use training data
    • classes may not be known in advance
  40. What are the 3 things required by the Nearest-Neighbor Classifier
    • The set of stored records
    • Distance Metric to compute the distance between records
    • The value of k, the number of nearest neighbors to retrieve
  41. To classify an unknown record using the nearest-neighbor classifier
    • Compute the distance to other training records
    • identify k nearest neighbors
    • use class labels of nearest neighbors to determine the class label of unknown record
  42. Factors to consider when choosing the value of k
    • If too small, the model is sensitive to noise
    • If too large, neighborhood may include too many points from other classes
Card Set
Data Warehouses OLAP and Business Intelligence