MapReduce functional behavior that states that mapping must be concluded before reducing can take place
synchronization
MapReduce functional behavior that applies once all the mapping is complete and reducing is about to begin; it gathers and prepares all the mapped data for reduction
code/data colocation
MapReduce functional behavior where code and its related data are placed on the same node prior to execution
fault/error handling
MapReduce functional behavior where the engine must recognize that something is wrong and make the necessary corrections
hardware/network topology
synchronization
file system
three categories of optimization techniques used to improve the reliability & performance of MapReduce jobs
hardware/network topology
MapReduce optimization technique that states that the closer the hardware processing elements are to each other, the less latency you will have to deal with.
synchronization
MapReduce optimization technique that states that all values for the same key are sent to the same reducer
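One way to picture "same key, same reducer" is a hash partitioner. The sketch below is only an illustration: `pick_reducer`, `partition`, and the sample pairs are hypothetical names and data, not part of any MapReduce library.

```python
import zlib

def pick_reducer(key, num_reducers):
    """Map a key to a reducer index deterministically."""
    # crc32 is stable across runs, unlike Python's salted built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition(pairs, num_reducers):
    """Place each (key, value) pair in the bucket of the reducer that owns its key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[pick_reducer(key, num_reducers)].append((key, value))
    return buckets

# Hypothetical mapper output: every pair for "cat" ends up in one bucket.
mapped = [("cat", 1), ("dog", 1), ("cat", 1), ("bird", 1), ("dog", 1)]
buckets = partition(mapped, num_reducers=3)
```

Because the reducer index depends only on the key, two pairs with the same key can never land in different buckets.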
file system
MapReduce optimization technique that states that a "warm standby" should be kept, that lots of small files should be avoided, and that sustained bandwidth over long stretches and security are necessary
syntax
the grammar of a programming language
logic error
an error produced when a program is able to run, but doesn't produce reasonable output
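A small illustration of a logic error, using made-up function names: both functions run without complaint, but only one gives a reasonable answer.

```python
def average_buggy(values):
    # Logic error: divides by a hard-coded 2 instead of the number of values.
    return sum(values) / 2

def average_fixed(values):
    # Correct: divides by how many values there actually are.
    return sum(values) / len(values)

scores = [80, 90, 100]
buggy = average_buggy(scores)  # 135.0, which runs fine but is not a sensible average
fixed = average_fixed(scores)  # 90.0
```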
sequence
a set of instructions that are performed one instruction at a time in the order stated
selection (or decision)
a set of instructions that are performed according to the outcome of a question
loop
a set of instructions that are performed iteratively until something tells it to stop
sequence
selection (or decision)
loop
high level structures that can be combined to create programs
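A minimal sketch of the three structures combined in one short program (the function name `classify` is just an example):

```python
def classify(numbers):
    labels = []                      # sequence: statements run one after another
    for n in numbers:                # loop: repeat for every number, stop when none are left
        if n % 2 == 0:               # selection: choose a branch based on a question
            labels.append("even")
        else:
            labels.append("odd")
    return labels

result = classify([1, 2, 3, 4])
```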
argument
something that a function is going to work on
function
the structure that encompasses the code and the arguments - a code cluster - usually returns something (a value) passed back through variables
function
a set of instructions that do a specific task often needing information passed to it in variables so it can perform the task
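For example, a function that takes two arguments and returns a value (the names here are illustrative only):

```python
def rectangle_area(width, height):
    # width and height are the arguments the function works on.
    return width * height            # the value passed back to the caller

area = rectangle_area(3, 4)          # the returned value is stored in a variable
```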
encapsulation
large sets of code organized to allow for reuse for certain tasks - these often include a type of code cluster called a function
MapReduce
a program that allows large sets of data to be worked with at the same time over a number of nodes
input
splitting
mapping
shuffling & sorting
reducing
final result
steps in the MapReduce process
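The steps above can be sketched as a single-machine word count. This is a toy illustration under simplifying assumptions (everything runs in one process, and all function names are hypothetical), not how a real MapReduce engine is implemented.

```python
from collections import defaultdict

def split_input(lines, num_splits):
    """Splitting: deal the input lines out into chunks."""
    return [lines[i::num_splits] for i in range(num_splits)]

def map_chunk(chunk):
    """Mapping: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle_and_sort(mapped_chunks):
    """Shuffling & sorting: group values by key, then order the keys."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())

def reduce_group(key, values):
    """Reducing: collapse all values for one key into a single count."""
    return key, sum(values)

lines = ["the cat sat", "the dog sat", "the end"]            # input
chunks = split_input(lines, 2)                               # splitting
mapped = [map_chunk(chunk) for chunk in chunks]              # mapping
grouped = shuffle_and_sort(mapped)                           # shuffling & sorting
final_result = dict(reduce_group(k, v) for k, v in grouped)  # reducing, then final result
```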
scheduling
MapReduce behavior that self-manages the number of tasks and the number of nodes so that all mapping occurs prior to reducing
synchronizing
MapReduce behavior that self-manages the tasks by holding task results in limbo until all have completed; once the tasks are completed, the mapped results are placed in a "shuffle & sort" area
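A rough sketch of the "hold results until all tasks complete" idea, using threads in place of nodes. The thread pool, the `map_task` function, and the sample chunks are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def map_task(chunk):
    """A stand-in map task: emit (word, 1) for each word."""
    return [(word, 1) for word in chunk.split()]

chunks = ["a b", "b c", "c a"]
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(map_task, chunk) for chunk in chunks]
    # Barrier: wait() blocks here until every map task has finished.
    done, not_done = wait(futures)

# Only now are the held results released to the shuffle & sort stage.
shuffle_area = [future.result() for future in futures]
```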
code/data colocation
MapReduce behavior that sends a copy of the code to each node because there is enhanced efficiency when the data and the code reside in the same node
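One way to picture colocation: when assigning a task, prefer a free node that already holds the data block. The block-to-node table and `assign_node` below are hypothetical, a sketch rather than any engine's actual scheduler.

```python
# Hypothetical record of which nodes store a copy of each data block.
block_locations = {
    "block-A": ["node-1", "node-3"],
    "block-B": ["node-2"],
}

def assign_node(block, free_nodes):
    """Prefer a free node that already stores the block; otherwise fall back to any free node."""
    local = [node for node in block_locations.get(block, []) if node in free_nodes]
    if local:
        return local[0]              # code goes to where the data already is
    return sorted(free_nodes)[0]     # no local copy is free, so the data must travel

chosen_local = assign_node("block-A", {"node-2", "node-3"})
chosen_remote = assign_node("block-B", {"node-1", "node-3"})
```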
fault/error handling
MapReduce behavior that all programs "should" include, which allows the system to properly realize when a failure or error occurs (e.g., assigning a new node to complete a failed node's process)
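A minimal sketch of reassigning a failed node's work, assuming a failure shows up as a raised `RuntimeError`. The node names and both functions are made up for illustration.

```python
def run_with_failover(task, nodes):
    """Try the task on each node in turn until one succeeds."""
    errors = []
    for node in nodes:
        try:
            return node, task(node)
        except RuntimeError as exc:
            # Recognize the failure, record it, and move on to the next node.
            errors.append((node, str(exc)))
    raise RuntimeError(f"task failed on every node: {errors}")

def count_words(node):
    """A stand-in task that fails on one simulated node."""
    if node == "node-1":
        raise RuntimeError("node-1 is unreachable")
    return len("the cat sat".split())

used_node, result = run_with_failover(count_words, ["node-1", "node-2"])
```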