-
0.Redundant Physical Infrastructure
1.Security Infrastructure
2.Operational Databases (day-to-day)
3.Organizing Data Services & Tools
4.Analytical Data Warehouse
4.5.Big Data Analytics
5.Reporting & Visualization
6.Big Data Applications
6 Layers of the Tech Stack
-
layer 0 - Redundant Physical Infrastructure
layer in the stack that includes hardware, network systems, communication, etc. and ensures multiple components are able to "step in" when another component fails
-
performance - latency requirements
availability - interruptions/uptime
scalability - adjust to needs
flexibility - add to resources & recover
cost - affordable
principles that should be applied to your data implementation approach (typically at layer 0)
-
privacy
the expectation by an individual that their information will not be disclosed without consent
-
security
the mechanism and the strategy that prevents unauthorized disclosure of a person's information
-
redundancy
a concept that allows an infrastructure or system to be resilient to failure or changes - ensures that a malfunction won't cause an outage - multiple components are able to "step in" when another component fails
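
A minimal sketch of the "step in" idea, assuming two hypothetical components (`primary` and `standby`) that provide the same service. The names and the `call_with_failover` helper are illustrative only, not part of any real infrastructure product.

```python
from typing import Callable, Sequence


def call_with_failover(components: Sequence[Callable[[], str]]) -> str:
    """Try each redundant component in order and return the first success."""
    errors = []
    for component in components:
        try:
            return component()
        except Exception as exc:  # one failed component should not cause an outage
            errors.append(exc)
    raise RuntimeError(f"all {len(components)} components failed: {errors}")


def primary() -> str:
    # Hypothetical component that is currently malfunctioning.
    raise ConnectionError("primary is down")


def standby() -> str:
    # Hypothetical redundant component that steps in.
    return "served by standby"


result = call_with_failover([primary, standby])
print(result)
```

An outage occurs only if every component in the list fails, which is the point of redundancy.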
-
layer 1 - Security Infrastructure
layer in the stack that includes security and privacy requirements
-
who - should have access?
what - data is available to whom?
when - specific times?
where - require physical presence?
how - data in what form?
elements of data & application access
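
The five questions can be read as fields of an access rule. The sketch below assumes a hypothetical `AccessRule` record and `is_allowed` check; the role, dataset, and form names are made up for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AccessRule:
    role: str          # who
    dataset: str       # what
    start: time        # when (window start)
    end: time          # when (window end)
    onsite_only: bool  # where
    form: str          # how, e.g. "aggregated" or "raw"


def is_allowed(rule: AccessRule, role: str, dataset: str,
               at: time, onsite: bool, form: str) -> bool:
    """Grant access only if every element of the rule is satisfied."""
    return (
        role == rule.role
        and dataset == rule.dataset
        and rule.start <= at <= rule.end
        and (onsite or not rule.onsite_only)
        and form == rule.form
    )


# Hypothetical rule: analysts may read aggregated sales data, on site, 08:00-18:00.
rule = AccessRule("analyst", "sales", time(8, 0), time(18, 0), True, "aggregated")

in_hours = is_allowed(rule, "analyst", "sales", time(9, 30), True, "aggregated")
after_hours = is_allowed(rule, "analyst", "sales", time(22, 0), True, "aggregated")
remote = is_allowed(rule, "analyst", "sales", time(9, 30), False, "aggregated")
```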
-
data & application access
data encryption
threat detection
elements of security strategy:
-
encryption
transforming the data with a key so that it is not readable except by the person who is meant to receive it
-
private or secret key cryptography
the same key is used to encrypt and decrypt the message - faster, but more open to attack because the key must be shared between sender and receiver
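
A toy illustration of the "same key both ways" property, assuming a hypothetical shared secret. The XOR keystream below is built from `hashlib.sha256` purely for demonstration and is not a secure cipher; real systems would use a vetted algorithm such as AES.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the same keystream undoes itself."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


shared_key = b"hypothetical shared secret"  # both parties must hold this key
message = b"quarterly sales figures"

ciphertext = xor_cipher(shared_key, message)    # sender encrypts
recovered = xor_cipher(shared_key, ciphertext)  # receiver decrypts with the same key
```

Anyone who obtains `shared_key` can read every message, which is why key distribution is the weak point.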
-
public key cryptography
the sender uses one key (the receiver's public key) to encrypt the message and the receiver uses their own private key to decrypt it - slower but more secure
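
A toy sketch of the two-key idea using textbook RSA with deliberately tiny primes. The numbers are chosen only so the arithmetic is easy to follow; real keys are thousands of bits long and use padding, so this is not usable for actual security. Requires Python 3.8+ for the modular inverse via `pow`.

```python
# Receiver's key generation (toy-sized primes, for illustration only).
p, q = 61, 53
n = p * q                 # modulus, part of both keys
phi = (p - 1) * (q - 1)   # Euler's totient of n
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e

public_key = (e, n)       # published to anyone who wants to send a message
private_key = (d, n)      # kept secret by the receiver

message = 65              # a message encoded as an integer smaller than n

# Sender encrypts with the receiver's PUBLIC key.
ciphertext = pow(message, public_key[0], public_key[1])

# Receiver decrypts with their own PRIVATE key.
decrypted = pow(ciphertext, private_key[0], private_key[1])
```

The sender never needs the private key, so nothing secret has to be exchanged before communicating.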
-
application programming interface (API) - the most common style for web APIs is Representational State Transfer (REST), used for connecting web resources
a piece of software that connects different applications - is designed to be flexible & user-driven
-
layer 2 - Operational Database
layer in the stack that holds the day-to-day data for the organization
-
ACID: (transactions can/cannot)
Atomicity - all or nothing
Consistency - cannot leave the database in an invalid state
Isolation - not interfere with each other
Durability - committed data doesn't disappear but is stored permanently
characteristics of a database transaction
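
Atomicity and consistency can be seen in a small sketch using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical; the `CHECK` constraint stands in for a business rule that a transaction must not violate.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
db.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
db.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was rolled back too.
        return False


ok = transfer(db, "alice", "bob", 30)       # valid: both updates are kept
failed = transfer(db, "alice", "bob", 500)  # invalid: neither update is kept
balances = dict(db.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not include the 500 credit even though that update ran first: the transaction was undone as a whole.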
-
layer 3 - Organizing Data Services and Tools
layer in the stack where big data elements are captured, validated, and assembled into contextually relevant collections
-
a distributed file system
serialization services
coordination services
extract, transform, and load (ETL) tools
workflow services
common technologies in the Organizing Data Services and Tools layer
-
a distributed file system
technology necessary to accommodate the decomposition of data streams and to provide scale and storage capacity - allows access from a number of hosts
-
serialization services
technology necessary for persistent data storage and multi-language remote procedure calls (RPCs) - a form of transforming data for understanding between applications
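
As a small illustration of transforming data so another application can understand it, the sketch below round-trips a record through JSON. JSON is used here only as a familiar stand-in; the record fields are hypothetical, and big data stacks often use dedicated binary serialization formats instead.

```python
import json

# Hypothetical in-memory record produced by one application.
record = {"customer_id": 42, "items": ["widget", "gear"], "total": 19.5}

# Serialize: turn the object into bytes that can be stored or sent over the wire.
payload = json.dumps(record).encode("utf-8")

# Deserialize: another application (possibly in another language) rebuilds it.
restored = json.loads(payload.decode("utf-8"))
```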
-
coordination services
technology necessary for building distributed applications (locking and so on) - help make multiple applications resilient (ex: withstand an outage and come back on)
-
extract, transform, and load (ETL) tools
technology necessary for the loading and conversion of structured and unstructured data into Hadoop - get data ready from its source to its application
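
The three ETL steps can be sketched in a few lines. This is only an illustration under stated assumptions: the CSV text, the `sales` table, and the cleaning rules are made up, and an in-memory SQLite database stands in for the real target (the card names Hadoop).

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with messy names and one unusable amount.
RAW_CSV = "name,amount\n alice ,10.50\nBOB,n/a\ncarol,7\n"


def extract(text: str) -> list:
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: normalize names and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that cannot be converted
        clean.append((row["name"].strip().lower(), amount))
    return clean


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
```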
-
workflow services
technology necessary for scheduling jobs and providing a structure for synchronizing process elements across layers - keep instructions in a defined order
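
"Keeping instructions in a defined order" can be sketched as a dependency graph that is sorted before jobs run. The job names are hypothetical, and `graphlib.TopologicalSorter` (Python 3.9+) stands in for a real workflow scheduler.

```python
from graphlib import TopologicalSorter

# Each hypothetical job maps to the set of jobs that must finish before it starts.
dependencies = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields the jobs so that every prerequisite comes first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

If two jobs depended on each other, `static_order()` would raise `graphlib.CycleError`, which is how a scheduler can reject an impossible workflow.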
-
layer 4 - Analytical Data Warehouses
layer in the stack created specifically for the purpose of heavy analysis and therefore big data (as opposed to strategic planning, which is the primary purpose of traditional data warehouses & marts)
-
layer 4.5 - Big Data Analytics
layer in the stack where analytics tools are used, such as reporting and dashboards, visualization, and basic and advanced analytics
-
layer 5 - Reporting & Visualization
layer in the stack (specific to an organization) that is focused on giving employees what they need when they need it to successfully make decisions
-
layer 6 - Big Data Application
layer in the stack (specific to an organization) that encompasses anything that fits into the mold - analytics packages, ETL packages, networking support, advanced DSS packages, etc.