Bioinformatics I (parts 1-5)

  1. Three routes to fixation
    • *Purifying selection
    • *Positive selection/diversifying selection
    • *Drift
  2. Potential Sources of Genetic Drift
    • *Chromosome segregation during meiosis
    • -There is a 50% chance of an offspring receiving either of two parental alleles
    • -It is possible for all offspring to receive the same allele
    • *(Random) fluctuations in the number of offspring an individual or couple produces
    • *(Random) effects of which organisms survive to adulthood
  3. Is fixation time independent of population size?
    • No:
    • Fixation is slower in large populations, hence large populations generally have more genetic variation
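A minimal Wright-Fisher sketch of drift, assuming a diploid population of constant size N (2N allele copies), random mating and no selection or mutation. Each generation, the next set of allele copies is drawn at random from the current allele frequency, which captures the random segregation and offspring-number effects listed above. The function names are hypothetical, and this is a toy model rather than a definitive simulator.

```python
import random
from typing import Optional, Tuple

# Toy neutral Wright-Fisher model (assumption): constant size, random mating,
# no selection or mutation. Function names are hypothetical.


def generations_to_fixation(pop_size: int, start_freq: float = 0.5,
                            rng: Optional[random.Random] = None) -> Tuple[int, float]:
    """Simulate drift until one allele is fixed (and the other lost).

    Returns (generations elapsed, final allele frequency); the final
    frequency is 0.0 (focal allele lost) or 1.0 (focal allele fixed).
    """
    if rng is None:
        rng = random.Random()
    copies = 2 * pop_size                # diploid: 2N allele copies
    count = round(start_freq * copies)   # copies of the focal allele
    generations = 0
    while 0 < count < copies:
        p = count / copies
        # Each copy in the next generation is drawn independently from the
        # current gene pool (binomial sampling), so the frequency wanders at random.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        generations += 1
    return generations, count / copies


def mean_fixation_time(pop_size: int, replicates: int = 30, seed: int = 0) -> float:
    """Average number of generations until fixation over several runs."""
    rng = random.Random(seed)
    total = sum(generations_to_fixation(pop_size, rng=rng)[0] for _ in range(replicates))
    return total / replicates
```

Comparing mean_fixation_time(10) with mean_fixation_time(100) should show fixation taking far longer in the larger population, which is why large populations tend to hold more variation.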
  4. What does an assembly process yield?
    • Contigs!
    • A contig is a contiguous sequence built from a group of reads that have been found to overlap
    • Also have scaffolds and supercontigs
    • –Use additional information beyond sequence overlap (e.g., paired-end reads) to determine the order, orientation and genomic location of contigs
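As an illustration of how overlapping reads become a contig, here is a naive greedy-merge sketch: repeatedly join the two reads with the longest suffix/prefix overlap. It assumes error-free reads from a single strand; real assemblers use overlap or de Bruijn graphs and must handle sequencing errors, repeats and reverse complements. The function names and the minimum-overlap parameter are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (>= min_len), else 0."""
    start = 0
    while True:
        # Look for b's first min_len bases inside a; the earliest match is the longest overlap.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembler (assumption): merge the best-overlapping pair until none overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining strings are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

With the reads ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"], the sketch merges everything into the single contig "ATTAGACCTGCCGGAATAC".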
  5. What does the following image describe?
    [Image not available]
    • Paired end reads:
    • –Each read is paired with a second read a known distance apart
    • –Easy to do by cloning fragments into a bacterial vector
    • Mate pairs are a similar concept with differing chemistry
  6. When was the Human Genome Project completed?
    2003
  7. Paper read:
    • Abstract: Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.
  8. Sanger Sequencing Platforms
    • Sigma-Aldrich
    • "Chain termination method"
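The chain-termination idea can be sketched as a toy simulation, assuming we already know the newly synthesized strand (written 5' to 3', primer excluded) and ignoring the real chemistry: in each of four reactions a small amount of one dideoxynucleotide (ddATP, ddCTP, ddGTP or ddTTP) stops synthesis wherever that base is added, so sorting all fragments by length and reading their terminal base recovers the sequence. The template is the reverse complement of the strand read off the gel. Function names are hypothetical.

```python
from typing import Dict, List

# Toy model (assumption): each ddNTP lane contains one fragment for every
# position where that base is incorporated. Function names are hypothetical.


def ddntp_lanes(new_strand: str) -> Dict[str, List[int]]:
    """For each ddNTP reaction, list the lengths of fragments terminated by that base."""
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # chain stopped after `length` nucleotides
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Sort all fragments by length (shortest first) and read each one's terminal base."""
    fragments = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in fragments)


def reverse_complement(seq: str) -> str:
    """Template strand that the new strand was copied from."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

For a new strand "ATGCGT", the G lane holds fragments of length 3 and 5, reading the gel gives back "ATGCGT", and the template is "ACGCAT".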
  9. Short read NGS: Sequencing by ligation (SBL)
    • In SBL approaches, a probe sequence that is bound to a fluorophore hybridizes to a DNA fragment and is ligated to an adjacent oligonucleotide for imaging. The emission spectrum of the fluorophore indicates the identity of the base or bases complementary to specific positions within the probe.
  10. Short read NGS: Sequencing by synthesis (SBS)
    • In SBS approaches, a polymerase is used and a signal, such as a fluorophore or a change in ionic concentration, identifies the incorporation of a nucleotide into an elongating strand.
  11. DNA microarrays
    • DNA microarrays have been used for genetic research since the early 1980s. In DNA microarrays, single-stranded DNA (ssDNA) probes are immobilized on a substrate in a discrete location with spots as small as 50 μm. Target DNA is labelled with a fluorophore and hybridized to the array. The intensity of the signal is used to determine the number of bound molecules.
  12. Gestalt Proximity Principle
    Objects that are close to one another are perceived as belonging to the same group
  13. NanoString
    Similar to microarrays, the nCounter Analysis System from NanoString relies on target–probe hybridization. Two probes target a gene of interest: one probe is bound to a fluorophore ‘barcode’ and the other anchors the target for imaging. The number and type of each barcode is counted. NanoString is unique in that the probes are labelled molecules that are bound together in a discrete order, which can be changed to create hundreds of different labels.
  14. qPCR
    • Real-time qPCR utilizes the PCR reaction to detect targets of interest. Gene-specific primers are used and the target is detected either by the incorporation of a double-stranded DNA (dsDNA)-specific dye or by the release of a TaqMan FRET (fluorescence resonance energy transfer) probe through polymerase 5′−3′ exonuclease activity.
  15. Optical Mapping
    • Optical mapping combines long-read technology with low-resolution sequencing. Originally a method for ordering restriction enzyme sites (ref. 150) through digestion and size separation, this technology now uses fluorescent markers to tag particular sequences within DNA fragments that are up to ~1 Mb long. The results are imaged and aligned to each other and/or to a reference to map the locations of the probes relative to each other.
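The original restriction-map form of optical mapping can be sketched as an in-silico digest: find every occurrence of a recognition site and report the fragment sizes between cuts. EcoRI (GAATTC, cut after the G) is an assumed example enzyme; only the given strand is scanned, which is enough for a palindromic site like this one. The fluorescent-label version similarly records where a short motif occurs along a long molecule. Function names are hypothetical.

```python
from typing import List

# Hypothetical helpers for a single-strand in-silico restriction digest.


def restriction_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment sizes after cutting `cut_offset` bases into each recognition site."""
    cuts = [pos + cut_offset for pos in restriction_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ECORI_SITE, ECORI_CUT = "GAATTC", 1  # EcoRI cuts G^AATTC (example enzyme)
```

For "AAGAATTCTTTTGAATTCAA" the sites start at positions 2 and 12, giving fragments of 3, 10 and 7 bases; ordering such fragments (or label positions) along a molecule is what produces the map.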
  16. BLAST
    • * Basic Local Alignment Search Tool
    • * builds a lookup table of short words from the query and scans the database for them
    • * searches by looking for exact matches of a given word length in this structure
    • * local alignments are constructed for regions around such words
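A minimal seed-and-extend sketch in the spirit of BLAST, assuming a toy match/mismatch score and an illustrative X-drop cutoff: exact word matches ("seeds") are found with a lookup table, then each seed is extended in both directions without gaps. Real BLAST adds neighborhood words and substitution matrices for proteins, the two-hit rule, gapped extension and E-value statistics, so this is not its actual implementation. Function names and default parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical seed-and-extend helpers; scoring values are illustrative only.


def find_word_hits(query: str, subject: str, word_size: int) -> List[Tuple[int, int]]:
    """Exact word (k-mer) matches as (query_pos, subject_pos) pairs."""
    # Index every word in the query so the subject can be scanned in one pass.
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def extend_hit(query: str, subject: str, qi: int, sj: int, word_size: int,
               match: int = 1, mismatch: int = -1, x_drop: int = 3) -> Tuple[int, int, int]:
    """Ungapped extension of a word hit; returns (score, query_start, query_end).

    Extension in each direction stops once the running score falls more than
    `x_drop` below the best score seen so far in that direction.
    """
    def best_extension(pairs):
        best, best_len, score = 0, 0, 0
        for steps, (a, b) in enumerate(pairs, start=1):
            score += match if a == b else mismatch
            if score > best:
                best, best_len = score, steps
            elif best - score > x_drop:
                break
        return best, best_len

    right = zip(query[qi + word_size:], subject[sj + word_size:])
    left = zip(reversed(query[:qi]), reversed(subject[:sj]))
    right_score, right_len = best_extension(right)
    left_score, left_len = best_extension(left)
    total = word_size * match + left_score + right_score
    return total, qi - left_len, qi + word_size + right_len
```

The exact-word requirement is also the source of the weakness described next: two similar sequences that never share an exact word of the chosen length produce no seeds, so nothing is ever extended.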
  17. What are some weaknesses of BLAST?
    • - BLAST requires at least two near-exact word matches in close proximity to start an alignment
    • - A heuristic or exact alignment algorithm is then employed. Because alignments only start from such word matches, you can construct pathological cases where two similar sequences have no BLAST hit
    • - This is worse for nucleotide searches
  18. What are E-values in BLAST?
    • - Expectation value
    • - Similar to a p-value, it measures the statistical significance of a hit (lower is more significant)
    • - The E-value gives the expected number of hits as strong as or stronger than the given hit in a database of the size searched
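For ungapped local alignments, the Karlin-Altschul statistics behind BLAST give E = K·m·n·e^(−λS), where m and n are the query and database lengths, S is the alignment score, and K and λ are constants that depend on the scoring system and residue composition. The values of K and λ below are arbitrary placeholders, not BLAST's actual parameters; the sketch only shows how E scales. The probability of at least one such chance hit is P = 1 − e^(−E).

```python
import math

# Placeholder constants for illustration only (assumption): real K and lambda
# depend on the scoring matrix, gap penalties and residue frequencies.
LAMBDA, K = 0.3, 0.1


def karlin_altschul_evalue(score: float, query_len: int, db_len: int,
                           lam: float, k: float) -> float:
    """Expected number of chance hits scoring >= `score`: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


def evalue_to_pvalue(evalue: float) -> float:
    """Probability of at least one chance hit this good, assuming Poisson-distributed hits."""
    return 1.0 - math.exp(-evalue)
```

Two consequences follow directly from the formula: doubling the database size doubles E for the same score, and for small E the p-value is almost equal to E.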
  19. What is asymptotic notation?
    • Expresses the running time of an algorithm as an approximate function of the problem size
    • - O(n) notation
    • - n is the problem size; O(f(n)) describes an upper bound on how the running time grows as n increases
  20. How do you define running time complexity?
    O(n^2)
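As an example of quadratic running time, comparing every pair of n sequences requires n(n−1)/2 comparisons, which is O(n^2), whereas a single pass over the sequences is O(n). The sketch below simply counts basic steps; the function names are hypothetical.

```python
# Hypothetical step counters illustrating O(n) versus O(n^2) growth.


def linear_scan_steps(n: int) -> int:
    """One step per item: grows as O(n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def all_pairs_steps(n: int) -> int:
    """One step per unordered pair (i, j) with i < j: n(n-1)/2 steps, i.e. O(n^2)."""
    steps = 0
    for i in range(n):
        for _ in range(i + 1, n):
            steps += 1
    return steps
```

Doubling n doubles the linear count but roughly quadruples the pairwise count: all_pairs_steps(100) is 4950 while all_pairs_steps(200) is 19900.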
  21. How many yes or no questions does it take to specify a letter from an alphabet of size 20?
    • Somewhere between 4 and 5
    • 2^4 = 16 < 20 < 2^5 = 32
    • log2(20) ≈ 4.32
    • In general: log-base(2) of the alphabet size
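One way to see why the answer lies between 4 and 5 is to play the game: repeatedly ask whether the letter is in the first half of the remaining candidates (a binary search). The sketch assumes the 20 standard one-letter amino-acid codes as the alphabet; some letters are pinned down after 4 questions and others need 5, while log2(20) ≈ 4.32 bits is the information in one letter when all letters are equally likely. The function name is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes (example alphabet)


def questions_to_identify(letter: str, alphabet: str = AMINO_ACIDS) -> int:
    """Count the yes/no questions a binary search needs to pin down `letter` (hypothetical helper)."""
    target = alphabet.index(letter)
    lo, hi = 0, len(alphabet)  # remaining candidates are alphabet[lo:hi]
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1  # "Is the letter among alphabet[lo:mid]?"
        if target < mid:
            hi = mid
        else:
            lo = mid
    return questions


counts = [questions_to_identify(aa) for aa in AMINO_ACIDS]
# log2 of the alphabet size is the information content of one letter, in bits.
bits_per_letter = math.log2(len(AMINO_ACIDS))
```

For a 4-letter DNA alphabet the same search always needs exactly log2(4) = 2 questions.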
  22. What is a ribosome mostly made out of?
    rRNA
  23. What are the three sequence types found in biology?
    • DNA (alphabet A, C, G, T)
    • RNA (alphabet A, C, G, U)
    • Protein (alphabet of 20 amino acids)
  24. Maximum Parsimony
    Faster than likelihood methods; only appropriate if sequences evolve slowly and at the same rate
    • * Parsimony can prefer the wrong tree and positively mislead
    • (it is sensitive to homoplasies = parallel, convergent and reversed substitutions)
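Parsimony scores a tree by the minimum number of substitutions it requires. Below is a minimal sketch of Fitch's algorithm for a fixed, rooted binary tree, written as nested tuples of taxon names (an assumed representation); it scores each alignment column independently and sums the results. Searching over tree topologies, the expensive part of a real analysis, is not shown, and the toy sequences are made up.

```python
from typing import Dict, Set, Tuple, Union

# Assumed representation: a leaf is a taxon name; an internal node is (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, seqs: Dict[str, str], site: int) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum substitutions) for one column."""
    if isinstance(tree, str):
        return {seqs[tree][site]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, seqs, site)
    right_states, right_cost = fitch_site(right, seqs, site)
    shared = left_states & right_states
    if shared:
        # Children can agree on a state: no extra substitution at this node.
        return shared, left_cost + right_cost
    # Children disagree: one substitution is needed below this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total minimum number of substitutions over all aligned sites."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))


# Made-up toy alignment for illustration.
seqs = {"s1": "AAG", "s2": "AAG", "s3": "CTG", "s4": "CTG"}
```

The tree ((s1, s2), (s3, s4)) needs 2 substitutions while ((s1, s3), (s2, s4)) needs 4, so parsimony prefers the first; homoplasy can make such counts favor the wrong tree.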
  25. What did Darwin propose to explain how organisms can persist in an ecosystem when they reproduce exponentially?
    Differential Reproductive Success
  26. Why was AZT-resistant HIV less competitive than wild-type HIV?
    AZT-resistant HIV has a slower reverse transcriptase. The slower enzyme allows time for incorrectly incorporated nucleotides such as AZT (an analog of thymidine) to be removed, but it also means resistant HIV replicates its genome more slowly than wild-type HIV, so it is outcompeted in an AZT-free environment.
  27. What are some potential sources of genetic drift?
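    • Chromosome segregation during meiosis
    • (Random) fluctuations in the number of offspring an individual or couple produces
    • (Random) effects of which organisms survive to adulthood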
  28. What is a "while" loop?
    The while loop repeats a section of code as long as a condition remains true, so it is used when the number of repetitions is not known in advance and the loop should stop once a specific condition is met.
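A small Python example, assuming the task is to count codons from the start of a DNA string until the first stop codon: the number of iterations is not known in advance, so a while loop keeps going as long as its condition stays true. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq: str) -> int:
    """Number of complete codons read (frame 0) before the first stop codon (hypothetical helper)."""
    i = 0
    # Repeat while a full codon remains AND that codon is not a stop codon.
    while i + 3 <= len(seq) and seq[i:i + 3] not in STOP_CODONS:
        i += 3
    return i // 3
```

For "ATGAAATGACC" the loop reads ATG and AAA, stops at TGA, and returns 2; if no stop codon is present, it stops when fewer than three bases remain.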
  29. What are some steps to writing good pseudocode?
    • *Start with a statement that clearly defines the goal of the pseudocode.
    • *Outline the required steps in a logical sequence.
    • *Use indentation for conditional statements.
    • *Follow programming conventions when naming commands and choosing formats.
    • *Explain each step with notes as you go.
    • *Proofread the pseudocode to ensure that it is clear and easy to understand; it should be understandable even by people from non-technical backgrounds.
  30. When are two N50 values comparable?
    When their total assembly lengths are the same (N50 is defined relative to half of the total assembly length).
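A short sketch of how N50 is usually computed: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. Because that halfway point depends on the total length, N50 values from assemblies of different total size are not directly comparable. The contig lengths in the usage note are made up for illustration.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the cumulative length first reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig")
```

For contigs [1, 2, 3, 4, 5] the N50 is 4, but adding five extra 1-length contigs raises the total to 20 and drops the N50 to 3, even though no contig changed.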
  31. What are two kinds of homologous genes?
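    • Orthologs - homologous genes in different species that diverged after a speciation event
    • Paralogs - homologous genes that diverged after a gene duplication event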
  32. [Image-only card: question and answer images not available]
  33. How do you get a single gene duplication event?
    • –Caused primarily by errors in recombination (e.g., unequal crossing over)
    • –or by reverse transcription of an mRNA that is reinserted into the genome (retroposition)
Author: saucyocelot
ID: 359190
Card Set: Bioinformatics I (parts 1-5)
Description: Concepts and ideas from the first few weeks of class