Ch 8 Big Data - Exam IV

The flashcards below were created by user mjweston on FreezingBlue Flashcards.

  1. algorithm
    a series of steps that need to occur in service to an overall goal
  2. commutative
    property stating that the order in which a (MapReduce) function is executed doesn't affect the result
  3. scheduling
    synchronization
    code/data colocation
    fault/error handling
    foundational behaviors of MapReduce
  4. scheduling
    MapReduce functional behavior that states that mapping must be concluded before reducing can take place
  5. synchronization
    MapReduce functional behavior that applies once all the mapping is complete and reducing is about to begin; it gathers and prepares all the mapped data for reduction
  6. code/data colocation
    MapReduce functional behavior where code and its related data are placed on the same node prior to execution
  7. fault/error handling
    MapReduce functional behavior where the engine must recognize that something is wrong and make the necessary corrections
  8. hardware/network topology
    synchronization
    file system
    three categories of optimization techniques used to improve the reliability & performance of MapReduce jobs
  9. hardware/network topology
    MapReduce optimization technique that states that the closer the hardware processing elements are to each other, the less latency you will have to deal with.
  10. synchronization
    MapReduce optimization technique that states that all values from the same key are sent to the same reducer
  11. file system
    MapReduce optimization technique that states that a "warm standby" should be kept, that lots of small files should be avoided, and that long stretches of bandwidth, as well as security, are necessary
  12. syntax
    the grammar of a programming language
  13. logic error
    an error produced when a program is able to run, but doesn't produce reasonable output
  14. sequence
    a set of instructions that are performed one instruction at a time in the order stated
  15. selection (or decision)
    a set of instructions that are performed according to the outcome of a question
  16. loop
    a set of instructions that are performed iteratively until a condition tells the loop to stop
  17. sequence
    selection (or decision)
    loop
    high-level structures that can be combined to create programs
  18. argument
    something that a function is going to work on
  19. function
    the structure that encompasses the code and the arguments - a code cluster - usually returns something (a value) passed back through variables
  20. function
    a set of instructions that does a specific task, often needing information passed to it in variables so it can perform the task
  21. encapsulation
    large sets of code organized to allow for reuse for certain tasks - these often include a type of code cluster called a function
  22. MapReduce
    a program that allows large sets of data to be worked with at the same time over a number of nodes
  23. input
    mapping
    shuffling & sorting
    reducing
    final result
    steps in the MapReduce process
  24. scheduling
    MapReduce behavior that self-manages the number of tasks and the number of nodes so that all mapping occurs prior to reducing
  25. synchronizing
    MapReduce behavior that self-manages the tasks by holding task results in limbo until all have completed; once the tasks are completed, the mapped results are placed in a "shuffle & sort" area
  26. code/data colocation
    MapReduce behavior that sends a copy of the code to each node because there is enhanced efficiency when the data and the code reside in the same node
  27. fault/error handling
    MapReduce behavior that all programs "should" include, which allows the system to properly recognize when a failure or error occurs (e.g., assigning a new node to complete a failed node's process)
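
The MapReduce cards above can be tied together with a small word-count sketch. This is only an illustration that runs in a single Python process, not on a cluster of nodes; the function names (`map_words`, `shuffle_and_sort`, `reduce_counts`, `word_count`) and the sample input are hypothetical and do not come from the card set. It assumes a map step that emits (word, 1) pairs, a shuffle & sort step that sends all values for the same key to the same reducer, and a reduce step that sums them.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & sort: group values by key so one reducer sees all values for a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_counts(key, values):
    """Reduce step: combine every value collected for a single key."""
    return key, sum(values)


def word_count(lines):
    # Scheduling: the full list of mapped pairs is built before any reducing starts.
    mapped = [pair for line in lines for pair in map_words(line)]
    # Synchronization: mapped results are gathered and prepared for reduction.
    grouped = shuffle_and_sort(mapped)
    # Final result: one (word, count) entry per key.
    return dict(reduce_counts(key, values) for key, values in grouped.items())


lines = ["big data big ideas", "data moves to code"]
result = word_count(lines)
print(result)
# {'big': 2, 'code': 1, 'data': 2, 'ideas': 1, 'moves': 1, 'to': 1}
```

Because addition is commutative, feeding the input lines in a different order, for example `word_count(list(reversed(lines)))`, gives the same counts. A real MapReduce engine would also handle code/data colocation and fault/error handling across nodes, which this sketch leaves out.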
Card Set: Ch 8 Big Data - Exam IV - MapReduce Fundamentals