0. Redundant Physical Infrastructure
1. Security Infrastructure
2. Operational Databases (day-to-day)
3. Organizing Data Services & Tools
4. Analytical Data Warehouse
4.5. Big Data Analytics
5. Reporting & Visualization
6. Big Data Applications
6 Layers of the Tech Stack
layer 0 - Redundant Physical Infrastructure
layer in the stack that includes hardware, network systems, communication, etc., and is built with redundancy so that other components are able to "step in" when one component fails
performance - latency requirements
availability - interruptions/uptime
scalability - adjust to needs
flexibility - add resources as needed & recover from failures
cost - affordable
principles that should be applied to your data implementation approach (typically at layer 0)
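The availability principle is usually stated as an uptime percentage. As a rough illustration (the targets below are example values, not requirements from these notes), this sketch converts an uptime target into the downtime it allows per year:

```python
# Minimal sketch: convert an availability (uptime) target into the
# downtime it allows per year. The targets below are illustrative only.
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def allowed_downtime_hours(availability: float) -> float:
    """Return the maximum hours of downtime per year for an availability
    given as a fraction, e.g. 0.999 for "three nines"."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} uptime -> {allowed_downtime_hours(target):.2f} h/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why higher availability targets drive up infrastructure cost.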
privacy - the expectation by an individual that their information will not be disclosed without consent
security - the mechanisms and strategies that prevent unauthorized disclosure of a person's information
redundancy - a concept that makes an infrastructure or system resilient to failures or changes: a malfunction won't cause an outage because other components are able to "step in" when one component fails
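To make the "step in" idea concrete, here is a minimal failover sketch. The `primary` and `standby` functions are hypothetical stand-ins for redundant servers, disks or network links; the caller simply tries each replica in order until one responds.

```python
# Minimal failover sketch: try redundant replicas in order until one
# succeeds. primary() and standby() are hypothetical stand-ins for
# real redundant components.
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every redundant component has failed."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    errors = []
    for replica in replicas:
        try:
            return replica()  # first healthy replica "steps in"
        except ConnectionError as exc:
            errors.append(exc)  # record the failure, then try the next one
    raise AllReplicasFailed(errors)


def primary() -> str:
    raise ConnectionError("primary is down")


def standby() -> str:
    return "served by standby"


print(call_with_failover([primary, standby]))
```

Only when every replica fails does the caller see an outage, which is the point of adding redundancy.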
layer 1 - Security Infrastructure
layer in the stack that includes security and privacy requirements
who - should have access?
what - data is available to whom?
when - specific times?
where - require physical presence?
how - data in what form?
elements of data & application access
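The five access questions can be encoded as rules and checked per request. This is a hypothetical sketch: the roles, dataset name, time windows and "masked"/"raw" forms are invented for illustration, and a real deployment would rely on its platform's own access-control features.

```python
# Hypothetical access-policy sketch covering who / what / when / where / how.
# All roles, datasets and time windows are illustrative assumptions.
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class Rule:
    role: str           # who should have access
    dataset: str        # what data is available to them
    start: time         # when: start of allowed window
    end: time           # when: end of allowed window
    on_site_only: bool  # where: require physical presence?
    form: str           # how: form of the data served ("masked" or "raw")


RULES = [
    Rule("analyst", "sales", time(8), time(18), False, "masked"),
    Rule("dba", "sales", time(0), time(23, 59), True, "raw"),
]


def access(role: str, dataset: str, at: time, on_site: bool) -> Optional[str]:
    """Return the form in which data may be served, or None if denied."""
    for r in RULES:
        if (r.role == role and r.dataset == dataset
                and r.start <= at <= r.end
                and (on_site or not r.on_site_only)):
            return r.form
    return None


print(access("analyst", "sales", time(9), on_site=False))
```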
data & application access
data encryption
threat detection
elements of security strategy:
data encryption - transforming data so that it cannot be read by anyone except the intended recipient
private or secret key cryptography
the same key is used to encrypt and decrypt the message - faster, but the shared key must be distributed to both parties, which makes it more open to attack
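A stdlib-only illustration of the shared-key idea is a one-time pad, where XOR with the same random key both encrypts and decrypts. This is a sketch of the concept, not a recommendation: production systems use vetted symmetric ciphers such as AES through a cryptography library.

```python
# Symmetric-key sketch: a one-time pad, where the SAME secret key both
# encrypts and decrypts. Illustrative only; real systems use vetted
# ciphers (e.g. AES) from a cryptography library.
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the message length")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"quarterly sales report"
key = secrets.token_bytes(len(message))  # shared secret, used only once

ciphertext = xor_bytes(message, key)     # sender encrypts with the key
plaintext = xor_bytes(ciphertext, key)   # receiver decrypts with the same key
print(plaintext.decode())
```

Both sides need an identical copy of `key`, so getting that key to the receiver securely is the weak point the notes refer to.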
public key cryptography
the sender encrypts with the receiver's public key and the receiver decrypts with their own private key - slower but more secure
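Textbook RSA with tiny primes shows the public/private split. This is a toy for illustration only (no padding and keys far too small to be secure); the primes and exponent are the classic teaching values, not anything prescribed by these notes.

```python
# Textbook RSA with tiny primes, for illustration only (insecure: no
# padding, keys far too small). The receiver publishes (n, e), keeps d.
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent 2753 (needs Python 3.8+)


def encrypt(m: int) -> int:
    """Sender encrypts with the receiver's PUBLIC key (n, e)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Receiver decrypts with their PRIVATE key (n, d)."""
    return pow(c, d, n)


c = encrypt(65)
print(c, decrypt(c))
```

Anyone can encrypt with `(n, e)`, but only the holder of `d` can decrypt, so no secret key ever has to be shared.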
application programming interface (API) - the most common style of web API is Representational State Transfer (REST), used for connecting web resources
a piece of software that connects different applications - designed to be flexible & user-driven
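In a REST-style API, resources are identified by URL paths and acted on with HTTP verbs. The sketch below is hypothetical: an in-memory `customers` store and a `handle` dispatcher stand in for a real web server, so it runs without any network access.

```python
# Minimal REST-style sketch: resources are URL paths, actions are HTTP
# verbs, responses are JSON. The customers store and handle() dispatcher
# are hypothetical; a real service would sit behind a web server.
import json
from typing import Tuple

customers = {1: {"id": 1, "name": "Ada"}}


def handle(method: str, path: str, body: str = "") -> Tuple[int, str]:
    """Dispatch (verb, path) to an action; return (status, JSON body)."""
    parts = path.strip("/").split("/")
    if parts[0] != "customers":
        return 404, json.dumps({"error": "not found"})
    if method == "GET" and len(parts) == 2:
        cust = customers.get(int(parts[1])) if parts[1].isdigit() else None
        if cust is None:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(cust)
    if method == "POST" and len(parts) == 1:
        record = json.loads(body)
        new_id = max(customers, default=0) + 1
        record["id"] = new_id
        customers[new_id] = record
        return 201, json.dumps(record)
    return 405, json.dumps({"error": "method not allowed"})


print(handle("GET", "/customers/1"))
print(handle("POST", "/customers", '{"name": "Grace"}'))
```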
layer 2 - Operational Databases
layer in the stack that holds the day-to-day data for the organization
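ACID (what a transaction must guarantee):
Atomicity - all or nothing: a transaction completes entirely or not at all
Consistency - a transaction cannot leave the database in an invalid state
Isolation - concurrent transactions do not interfere with each other
Durability - once committed, data doesn't disappear but is stored permanently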
characteristics of a database transaction
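Atomicity and consistency can be seen with Python's built-in `sqlite3` module. In this sketch (the `accounts` table and balances are hypothetical example data), a transfer that would overdraw an account violates a `CHECK` constraint, and the whole transaction, including the credit already applied, is rolled back.

```python
# Atomicity sketch with SQLite: a transfer either fully commits or fully
# rolls back. The accounts table and balances are hypothetical data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance + ? "
                         "WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? "
                         "WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was undone as well


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer("alice", "bob", 30))   # succeeds
print(transfer("alice", "bob", 500))  # would overdraw, so nothing changes
print(balances())
```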
layer 3 - Organizing Data Services and Tools
layer in the stack where big data elements are captured, validated, and assembled into contextually relevant collections
a distributed file system
serialization services
coordination services
extract, transform, and load (ETL) tools
workflow services
common technologies in the Organizing Data Services and Tools layer
a distributed file system
technology necessary to accommodate the decomposition of data streams and to provide scale and storage capacity - allows access from a number of hosts
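A distributed file system decomposes data into blocks and keeps copies on several hosts. This hypothetical sketch mimics that in memory: the block size, replication factor and host names are illustrative assumptions (HDFS, for comparison, uses much larger blocks, 128 MB by default).

```python
# Hypothetical sketch of HDFS-style storage: split data into fixed-size
# blocks and place each block on several hosts (round-robin). Block size,
# replication factor and host names are illustrative only.
from itertools import cycle, islice

BLOCK_SIZE = 4   # bytes, tiny for the demo
REPLICATION = 2  # copies of each block
HOSTS = ["host-a", "host-b", "host-c"]


def store(data: bytes):
    """Return (per-host block storage, number of blocks)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    storage = {h: {} for h in HOSTS}
    ring = cycle(HOSTS)
    for idx, block in enumerate(blocks):
        for host in islice(ring, REPLICATION):  # next REPLICATION hosts
            storage[host][idx] = block
    return storage, len(blocks)


def read(storage: dict, n_blocks: int, down=frozenset()) -> bytes:
    """Reassemble the data from any live host holding each block."""
    parts = []
    for idx in range(n_blocks):
        copy = next((blocks[idx] for host, blocks in storage.items()
                     if host not in down and idx in blocks), None)
        if copy is None:
            raise OSError(f"block {idx} unavailable")
        parts.append(copy)
    return b"".join(parts)


storage, n = store(b"big data stack!")
print(read(storage, n, down={"host-a"}))
```

Because every block lives on two of the three hosts, the file can still be read when any single host is down.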
serialization services
technology necessary for persistent data storage and multi-language remote procedure calls (RPCs) - converts data into a format that different applications can read and exchange
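Serialization turns an in-memory object into bytes that another program, possibly written in another language, can read back. JSON is used here only because it is in Python's standard library; big data stacks often use binary formats such as Avro for the same job.

```python
# Serialization sketch: convert an in-memory record into language-neutral
# bytes and back. JSON is used for readability; Hadoop-style stacks often
# use binary formats (e.g. Avro) instead. The record is example data.
import json

record = {"customer_id": 42, "items": ["laptop", "mouse"], "total": 1249.5}

payload = json.dumps(record).encode("utf-8")    # serialize -> bytes
restored = json.loads(payload.decode("utf-8"))  # deserialize -> dict

print(payload)
print(restored == record)
```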
coordination services
technology necessary for building distributed applications (locking and so on) - helps multiple applications stay resilient (e.g., withstand an outage and recover)
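Locking is the classic coordination primitive: only one worker may update a shared resource at a time. This is a single-process analogy, with `threading.Lock` standing in for the distributed lock a coordination service such as ZooKeeper would provide across machines.

```python
# Single-process analogy for a coordination service's distributed lock:
# several workers must not update the same shared resource at once.
# threading.Lock stands in for a cluster-wide lock (e.g. via ZooKeeper).
import threading

counter = 0
lock = threading.Lock()


def worker(increments: int) -> None:
    global counter
    for _ in range(increments):
        with lock:  # only one worker holds the lock at a time
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```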
extract, transform, and load (ETL) tools
technology necessary for loading and converting structured and unstructured data into Hadoop - moves data from its source and prepares it for its application
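The three ETL steps can be sketched end to end with the standard library. The CSV content and `orders` table are hypothetical, and SQLite stands in for the target; a Hadoop pipeline would load into HDFS or a warehouse instead.

```python
# Minimal ETL sketch: Extract rows from a CSV source, Transform them
# (clean values, convert types, drop bad rows), and Load them into a
# target database. CSV content and table name are hypothetical.
import csv
import io
import sqlite3

RAW_CSV = """order_id,amount,region
1, 19.99 ,north
2,5.00,SOUTH
3,,north
"""


def extract(text: str) -> list:
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # drop rows with a missing amount
            continue
        clean.append((int(row["order_id"]), float(amount),
                      row["region"].strip().lower()))
    return clean


def load(rows: list, conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER, amount REAL, region TEXT)")
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders").fetchall())
```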
workflow services
technology necessary for scheduling jobs and providing a structure for synchronizing process elements across layers - keeps jobs running in a defined order
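A workflow service essentially runs jobs in an order that respects their dependencies. This sketch uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+); the job names and dependencies are hypothetical.

```python
# Workflow sketch: each job lists the jobs it depends on, and a
# dependency-respecting order is computed before running them.
# Job names are hypothetical; graphlib requires Python 3.9+.
from graphlib import TopologicalSorter

jobs = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
    "archive_raw": {"extract"},
}

order = list(TopologicalSorter(jobs).static_order())
for step in order:
    print("running", step)
```

A real scheduler would also handle retries and run independent jobs (here `archive_raw` and `validate`) in parallel.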
layer 4 - Analytical Data Warehouses
layer in the stack built specifically for heavy analysis of big data (as opposed to strategic planning, which is the primary purpose of traditional data warehouses & marts)
layer 4.5 - Big Data Analytics
layer in the stack where analytics tools are used, such as reporting and dashboards, visualization, and standard and advanced analytics
layer 5 - Reporting & Visualization
layer in the stack (specific to an organization) that is focused on giving employees what they need, when they need it, to make decisions successfully
layer 6 - Big Data Application
layer in the stack (specific to an organization) that encompasses the applications built on top of the lower layers - analytics packages, ETL packages, networking support, advanced decision support system (DSS) packages, etc.