BDD Ch 4 - Exam III

  1. 0. Redundant Physical Infrastructure
    1. Security Infrastructure
    2. Operational Databases (day-to-day)
    3. Organizing Data Services & Tools
    4. Analytical Data Warehouse
    4.5. Big Data Analytics
    5. Reporting & Visualization
    6. Big Data Applications
    6 Layers of the Tech Stack
  2. layer 0 - Redundant Physical Infrastructure
    layer in the stack that includes hardware, network systems, communication, etc., and ensures that multiple components are able to "step in" when another component fails
  3. performance - latency requirements
    availability - interruptions/uptime
    scalability - adjust to needs
    flexibility - add to resources & recover
    cost - affordable
    principles that should be applied to your data implementation approach (typically at layer 0)
  4. privacy
    an individual's expectation that their information will not be disclosed without consent
  5. security
    the mechanism and the strategy that prevents unauthorized disclosure of a person's information
  6. redundancy
    a concept that makes an infrastructure or system resilient to failures or changes - ensures that a single malfunction won't cause an outage, because multiple components are able to "step in" when another component fails
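A minimal sketch of redundancy in action, assuming a simple "try the next replica" failover policy. The replica names and the helpers `make_replica` and `call_with_failover` are hypothetical, chosen only to show one component stepping in when another fails.

```python
class ReplicaDown(Exception):
    """Raised when a (simulated) redundant component cannot serve a request."""


def make_replica(name, healthy=True):
    """Return a callable that simulates one redundant component (hypothetical)."""
    def serve(request):
        if not healthy:
            raise ReplicaDown(f"{name} is down")
        return f"{name} handled {request}"
    return serve


def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    errors = []
    for serve in replicas:
        try:
            return serve(request)
        except ReplicaDown as exc:
            errors.append(str(exc))  # record the failure, let the next one step in
    # An outage only happens if every redundant component has failed.
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


replicas = [make_replica("node-a", healthy=False), make_replica("node-b")]
print(call_with_failover(replicas, "read block 7"))
```

Because node-a is down, node-b answers and the caller never sees an outage.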
  7. layer 1 - Security Infrastructure
    layer in the stack that includes security and privacy requirements
  8. who - should have access?
    what - data is available to whom?
    when - specific times?
    where - require physical presence?
    how - data in what form?
    elements of data & application access
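The who/what/when/where/how questions can be expressed as a simple access rule check. This is a sketch under assumptions: the roles, dataset name, time windows, and the "masked"/"full" delivery forms are all hypothetical examples, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AccessRule:
    role: str           # who should have access
    dataset: str        # what data is available to that role
    start: time         # when: start of the allowed time window
    end: time           # when: end of the allowed time window
    on_site_only: bool  # where: is physical presence required?
    form: str           # how: form of the data, e.g. "masked" or "full"


# Hypothetical rules for illustration only.
RULES = [
    AccessRule("analyst", "sales", time(8), time(18), False, "masked"),
    AccessRule("dba", "sales", time(0), time(23, 59), True, "full"),
]


def check_access(role, dataset, at, on_site):
    """Return the form the data may be delivered in, or None if access is denied."""
    for rule in RULES:
        if rule.role != role or rule.dataset != dataset:
            continue
        if not (rule.start <= at <= rule.end):
            continue
        if rule.on_site_only and not on_site:
            continue
        return rule.form
    return None


print(check_access("analyst", "sales", time(9, 30), on_site=False))
print(check_access("analyst", "sales", time(22), on_site=False))
```

The analyst gets masked data during business hours and nothing at 10 p.m.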
  9. data & application access
    data encryption
    threat detection
    elements of a security strategy
  10. encryption
    transforming data (using an algorithm and a key) so that it is not readable except by the person who is meant to receive it and holds the key to decrypt it
  11. private or secret key cryptography
    the same key is used to encrypt and decrypt the message - faster, but more open to attack because the shared key must be given to both the sender and the receiver
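The defining property of secret key cryptography, one shared key for both directions, can be shown with a toy XOR cipher. This is an illustration only and is NOT secure; real systems use vetted ciphers such as AES from a cryptography library. The key value is a hypothetical example.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every byte with the repeating key; applying it twice undoes it.

    Toy cipher for illustration only: it is trivially breakable.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


shared_key = b"s3cret"  # hypothetical key; sender AND receiver must both hold it
ciphertext = xor_cipher(b"meet at noon", shared_key)  # sender encrypts
plaintext = xor_cipher(ciphertext, shared_key)        # receiver decrypts with the same key
print(plaintext)
```

Anyone who intercepts `shared_key` in transit can read every message, which is the weakness the card refers to.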
  12. public key cryptography
    the sender encrypts with one key (the receiver's public key) and the receiver decrypts with a matching private key that only they hold - slower but more secure
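A minimal sketch of public key cryptography using textbook RSA with tiny primes. This is for illustration only; real keys are thousands of bits long and use padding schemes. The primes 61 and 53 and the exponent 17 are the usual classroom example values.

```python
# Textbook RSA with tiny primes: illustration only, never use for real data.
p, q = 61, 53
n = p * q                 # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent = modular inverse of e (Python 3.8+)

public_key = (e, n)       # published so anyone can send the receiver a message
private_key = (d, n)      # kept secret by the receiver


def encrypt(m, key):
    exp, mod = key
    return pow(m, exp, mod)


def decrypt(c, key):
    exp, mod = key
    return pow(c, exp, mod)


message = 65                                   # must be an integer smaller than n
ciphertext = encrypt(message, public_key)      # sender uses the receiver's public key
recovered = decrypt(ciphertext, private_key)   # receiver uses their own private key
print(ciphertext, recovered)  # 2790 65
```

The public key can be shared openly because recovering `d` from `(e, n)` requires factoring `n`, which is infeasible at real key sizes.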
  13. application programming interface (API) - the most common API style is Representational State Transfer (REST), used for connecting web resources
    a defined interface that lets different applications connect and communicate - is designed to be flexible & user-driven
  14. layer 2 - Operational Database
    layer in the stack that holds the day-to-day data for the organization
  15. ACID: (transactions can/cannot)
    Atomicity - all or nothing
    Consistency - must leave the database in a valid state (cannot break its rules)
    Isolation - concurrent transactions do not interfere with each other
    Durability - once committed, data doesn't disappear but is permanently stored
    characteristics of a database transaction
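Atomicity and consistency can be demonstrated with Python's built-in `sqlite3` module. This sketch assumes a hypothetical `accounts` table whose CHECK constraint forbids negative balances; a transfer that would break that rule is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed, so the credit to dst was rolled back too


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30), balances(conn))   # succeeds: alice 70, bob 80
print(transfer(conn, "alice", "bob", 500), balances(conn))  # fails: balances unchanged
```

The second transfer credits bob before the debit fails; atomicity guarantees that partial credit is undone.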
  16. layer 3 - Organizing Data Services and Tools
    layer in the stack where big data elements are captured, validated, and assembled into contextually relevant collections
  17. a distributed file system
    serialization services
    coordination services
    extract, transform, and load (ETL) tools
    workflow services
    common technologies in the Organizing Data Services and Tools layer
  18. a distributed file system
    technology necessary to accommodate the decomposition of data streams and to provide scale and storage capacity - allows access from a number of hosts
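The decomposition-and-replication idea behind a distributed file system can be sketched in a few lines. The block size, host names, replication factor, and round-robin placement are hypothetical simplifications; real systems such as HDFS use far larger blocks and smarter placement policies.

```python
# Hypothetical settings, chosen only for illustration.
BLOCK_SIZE = 4
HOSTS = ["host1", "host2", "host3", "host4"]
REPLICATION = 3


def split_into_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Decompose a data stream into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_replicas(num_blocks: int, hosts=HOSTS, replication=REPLICATION):
    """Assign each block to `replication` distinct hosts, round-robin."""
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [hosts[(block_id + r) % len(hosts)] for r in range(replication)]
    return placement


blocks = split_into_blocks(b"big data stream!")
layout = place_replicas(len(blocks))
for block_id, block in enumerate(blocks):
    print(block_id, block, layout[block_id])
```

Because each block lives on several hosts, many hosts can read in parallel and losing one host does not lose data.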
  19. serialization services
    technology necessary for persistent data storage and multi-language remote procedure calls (RPCs) - converts data into a standard format so that different applications can store, exchange, and understand it
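Serialization round-trips an in-memory object through a language-neutral format. This sketch uses JSON from Python's standard library as a stand-in for the binary formats common in Hadoop stacks (such as Apache Avro); the record fields are hypothetical.

```python
import json

record = {"customer_id": 42, "items": ["disk", "switch"], "total": 199.5}

# Serialize: turn the in-memory object into text that can be stored on disk
# or sent to a program written in another language.
payload = json.dumps(record, sort_keys=True)

# Deserialize: the receiving side rebuilds an equivalent object.
restored = json.loads(payload)
print(payload)
print(restored == record)  # True
```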
  20. coordination services
    technology necessary for building distributed applications (locking and so on) - helps make distributed applications resilient (e.g., able to withstand an outage and come back online)
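Locking is the classic coordination primitive. A real coordination service (for example Apache ZooKeeper) provides locks across machines; this sketch is only a single-process stand-in using `threading.Lock`, and the `LockService` class and lock name are hypothetical.

```python
import threading


class LockService:
    """Toy in-process stand-in for a coordination service's named locks."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dictionary of named locks

    def lock(self, name):
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())


service = LockService()
inventory = {"widgets": 0}


def worker(n):
    for _ in range(n):
        # Only one worker may update the shared record at a time.
        with service.lock("inventory/widgets"):
            inventory["widgets"] += 1


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(inventory["widgets"])  # 4000
```

Every worker asks the same service for the same named lock, so updates never collide.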
  21. extract, transform, and load (ETL) tools
    technology necessary for the loading and conversion of structured and unstructured data into Hadoop - moves data from its source and gets it ready for its application
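The three ETL steps can be sketched with Python's standard library. The card describes loading into Hadoop; here an in-memory SQLite database stands in as the target, and the CSV data and `orders` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with messy values.
RAW = """order_id,amount,region
1, 19.99 ,east
2,not-a-number,west
3,5.00,WEST
"""


def extract(text):
    """Extract: read rows from the source (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types, normalize values, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])  # float() tolerates surrounding spaces
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append((int(row["order_id"]), amount, row["region"].strip().lower()))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, region TEXT)")
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Row 2 is rejected during the transform step, and region names are normalized before loading.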
  22. workflow services
    technology necessary for scheduling jobs and providing a structure for synchronizing process elements across layers - keep instructions in a defined order
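Keeping jobs in a defined order usually means running them in dependency order. This sketch uses `graphlib.TopologicalSorter` from Python's standard library (3.9+); the job names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical jobs, each mapped to the set of jobs it depends on.
jobs = {
    "extract_logs": set(),
    "extract_sales": set(),
    "transform": {"extract_logs", "extract_sales"},
    "load_warehouse": {"transform"},
    "build_dashboard": {"load_warehouse"},
}


def run(job):
    print("running", job)


# static_order() yields every job only after all of its dependencies.
order = list(TopologicalSorter(jobs).static_order())
for job in order:
    run(job)
```

A real workflow service adds scheduling, retries, and monitoring on top of this ordering.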
  23. layer 4 - Analytical Data Warehouses
    layer in the stack created specifically for heavy analysis and therefore big data (as opposed to strategic planning, which is the primary purpose of traditional data warehouses & marts)
  24. layer 4.5 - Big Data Analytics
    layer in the stack where analytics tools are used such as reporting and dashboards, visualization, and analytics and advanced analytics
  25. layer 5 - Reporting & Visualization
    layer in the stack (specific to an organization) that is focused on giving employees what they need, when they need it, to successfully make decisions
  26. layer 6 - Big Data Application
    layer in the stack (specific to an organization) that encompasses anything that fits into the mold - analytics packages, ETL packages, networking support, advanced DSS packages, etc.
Author: mjweston
ID: 242150
Description: Tech Stack