Digital Analysis of Genomes

  1. Here you see petri plates containing thousands of separate colonies constituting part of a human genomic library. Each clone contains a different ______ DNA molecule, each with a ______ ______ attached to a different fragment of human genomic DNA. Note that the colonies are scattered randomly around the plate, so their arrangement on the plate has no correspondence to their relative order in the ______
    • recombinant 
    • plasmid vector
    • genome
  2. How can you then tell which colony contains which fragment of human DNA?
    With today's technology, the simplest answer is to sequence the human DNA insert in each clone (based on the method developed by Fred Sanger in the 1970s)
  3. Automated Sanger sequencing (5-8-1-1-1-4-story)
    pg 310-311
  4. Primer:
    • Primer: a short single-stranded DNA molecule (an oligonucleotide) that is complementary to part of the template and that provides the free 3' end to which DNA polymerase can attach new nucleotides
    • Hybridization: the natural tendency of complementary single-stranded molecules of DNA or RNA to base-pair and form double helices
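    A minimal sketch of primer hybridization, assuming error-free sequences written 5'->3'. The function names and the example sequences are invented for illustration; the point is only that a primer binds where the template contains its reverse complement.

```python
# Hypothetical helpers; sequences below are made up for illustration.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def primer_binding_site(template: str, primer: str) -> int:
    """Return the 0-based index on the template where the primer
    hybridizes, or -1 if there is no perfectly complementary site."""
    return template.find(reverse_complement(primer))

template = "ATGGCCATTGTAATGGGCCGC"  # single-stranded template, 5'->3'
primer = "GCGGCCCATT"               # oligonucleotide, 5'->3'
site = primer_binding_site(template, primer)
print(site)
```

    Real hybridization tolerates some mismatches depending on temperature and salt; this sketch only models perfect base pairing.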
  5. Dideoxyribonucleotide triphosphate:
    Also known as dideoxynucleotides: nucleotide analogs lacking the 3'-hydroxyl group that is critical for the formation of phosphodiester bonds. Dideoxynucleotides are key components of the most common methods of DNA sequencing
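    A toy model of why chain termination reveals a sequence, assuming an idealized reaction in which a labeled dideoxynucleotide terminates synthesis exactly once at every position. The function names are hypothetical. Each fragment is recorded as its length plus the identity of its terminal base; sorting by length (the job of electrophoresis) and reading the terminal bases recovers the newly synthesized strand.

```python
import random

def sanger_fragments(new_strand):
    """Every chain-terminated product as (length, terminal dideoxy base)."""
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]

def read_sequence(fragments):
    """Order fragments by length, then read off the labeled terminal bases."""
    return "".join(base for _, base in sorted(fragments))

fragments = sanger_fragments("GATTACA")
random.Random(0).shuffle(fragments)   # fragments are mixed together in the tube
print(read_sequence(fragments))
```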
  6. Genomes range from the 700,000 base pairs (700 kb) in the smallest known microbial genome, to more than 3 billion base pairs (3 gigabase pairs, or 3 Gb) distributed among the 23 chromosomes of ______, to even larger genomes
    • humans
  7. If any single DNA sequencing run can yield at most ______ bases of information, then you might think you would need to obtain at least 3 million such sequences to determine the human genome's entire sequence. Why is this a gross underestimate?
    • 1000
    • because you would really need to examine at least five times this number of clones from a genomic library to ensure just a 95% chance that each portion of the genome would be represented once
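    The arithmetic behind this card, as a short sketch. The constant names are invented here; the values (3 Gb genome, 1000-base reads, fivefold oversampling) come from the cards above.

```python
GENOME_SIZE_BP = 3_000_000_000   # about 3 Gb for the human genome
READ_LENGTH_BP = 1_000           # maximum bases from one sequencing run
OVERSAMPLING = 5                 # "at least five times this number" of clones

# Naive lower bound: reads laid end to end with no overlap and no gaps.
minimum_reads = GENOME_SIZE_BP // READ_LENGTH_BP

# Random clones overlap and miss regions, so many more are needed.
realistic_reads = minimum_reads * OVERSAMPLING

print(minimum_reads, realistic_reads)
```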
  8. How can you do so many DNA sequencing runs? And how can you deal with the immense amount of data you would obtain so that you could somehow figure out how these millions of small, 1000 base snippets are ordered with respect to each other in the intact genome?
    The whole-genome shotgun strategy or The hierarchical strategy
  9. The whole-genome shotgun strategy (2-story)
    pg 314
  10. What was the one problem recognized with the whole genome shotgun strategy early on?
    • Genomes contain many kinds of repetitive DNA sequences, each of which can be located in many positions scattered throughout the genome
    • If you found a cloned sequence containing repetitive DNA, how would you know where in the genome that particular copy belonged?
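    A toy sketch of the overlap idea behind shotgun assembly, assuming error-free reads that all come from the same strand. This is a simple greedy merge for illustration, not the software used by any sequencing project, and the function names are hypothetical. It also hints at the repeat problem: if a repeat is longer than a read, several reads share the same overlaps and the merge order becomes ambiguous.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left; remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

contigs = greedy_assemble(["CTGAAGTC", "ATGCGTAC", "GTACCTGA"])
print(contigs)
```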
  11. Minimal tiling path
    The smallest set of Bacterial Artificial Chromosome (BAC) vector clones with the least amount of overlap that could cover the entire genome
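    One way to picture a minimal tiling path is as an interval-cover problem. The sketch below assumes each BAC clone has already been mapped to a (start, end) position on the genome, with end exclusive; the function name and the coordinates are invented for illustration. A greedy choice (always take the clone that reaches farthest while still touching the covered region) yields the fewest clones.

```python
def minimal_tiling_path(clones, genome_length):
    """Greedy interval cover of [0, genome_length) using mapped clones."""
    path, covered = [], 0
    while covered < genome_length:
        # Clones that overlap or abut the covered region and extend past it.
        candidates = [c for c in clones if c[0] <= covered < c[1]]
        if not candidates:
            raise ValueError(f"gap in clone coverage at position {covered}")
        best = max(candidates, key=lambda c: c[1])
        path.append(best)
        covered = best[1]
    return path

clones = [(0, 150), (100, 300), (120, 260), (250, 450), (280, 400), (430, 600)]
path = minimal_tiling_path(clones, 600)
print(path)
```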
  12. Hierarchical strategy (4-story)
    pg 315
  13. How did Celera (a private company) manage to successfully complete sequencing via the shotgun approach?
    pg 315
  14. The importance of points (3) and (4) is that they provide _____ information about the two sequences in a clone.
    • spatial
    • Ex: the two ends sequenced from a BAC clone insert must be located about 200 kb apart in the genome
  15. How does the spatial information overcome the problem posed by repeat sequences scattered throughout the genome?
    • Paired-end sequences from clones of three insert sizes (2 kb, 10 kb, and 200 kb) make it possible to bridge most lengths of repetitive sequences
  16. The whole genome shotgun strategy has one main advantage with respect to the hierarchical approach:
    After the genomic libraries are constructed, the rest of the procedure can be highly automated
  17. Celera invested in a huge facility containing hundreds of DNA sequencing machines fed by other robotic machines that first prepared DNA from the clones of ____ _____, placed these DNAs into _______ reactions, and then loaded the reactions into the _____ machines. This automation allowed Celera to obtain relatively cheaply the millions of DNA sequence _____ required to provide about 10-fold genomic equivalent coverage
    • genomic libraries
    • sequencing 
    • sequencing 
    • reads
  18. The DNA sequencing machines could feed their data into a centralized supercomputer, whose complex software could then assemble all these sequences into the chromosomal strings. The Celera tech had such large relative efficiencies that most species' genomes that have been determined to date (2013) were deciphered with the _____-______ ______ approach
    whole-genome shotgun approach
  19. Genomes also have other important features such as ______, ________ and ________ ______ (regions of DNA that move between different places in the genomes)
    centromeres, telomeres, and transposable elements
  20. The annotation of the genome (explain) depends on the compilation of data derived from diverse methods of investigation
    parsing out which sequences of DNA do which tasks
  21. One way to look specifically for regions that might correspond to the exons of protein-coding genes is to scan genomic DNA sequences for long _____ ______ ______; that is, stretches of nucleotides that have a reading frame of triplets uninterrupted by a ____ ____.
    • open reading frames 
    • stop codon
  22. There are 4^3 = 64 possible triplets of the four nucleotides, of which three (TAA, TAG, and TGA) signify stop. If you looked at any random sequence of DNA starting at any one nucleotide, state a very rough estimate of how many codons you would encounter, on average, before running into a stop codon
    64/3 ≈ 21 triplets
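    The estimate can be checked by enumerating the codons. This sketch assumes each triplet is equally likely and independent of its neighbors (real genomes are not uniformly random), so the wait for the first stop codon follows a geometric distribution with mean 1/p. Variable names are invented here.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# All 4^3 triplets of the four nucleotides.
all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]

stop_fraction = len(STOP_CODONS) / len(all_codons)   # p = 3/64
expected_codons_to_stop = 1 / stop_fraction          # mean of a geometric distribution

print(len(all_codons), round(expected_codons_to_stop, 1))
```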
  23. If that nucleotide begins a reading frame that continues without a stop for significantly more than 21 triplets, a good chance exists that the DNA in this region is not a ______ set of nucleotides, but instead actually encodes _____ ______ within a protein
    • random
    • amino acids
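    A minimal sketch of an ORF scan, assuming a single strand read in one frame at a time. The function name is hypothetical, and real gene finders also check the opposite strand, start codons, and splice sites. A run of stop-free codons much longer than about 21 flags a region as a candidate for protein-coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_open_run(seq, frame=0):
    """Longest stretch of consecutive non-stop codons in one reading frame,
    measured in codons."""
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0          # a stop codon closes the open stretch
        else:
            current += 1
            best = max(best, current)
    return best

candidate = "ATG" + "GCT" * 30 + "TAA"    # invented sequence: 31 codons, then stop
print(longest_open_run(candidate) > 21)   # longer than expected by chance
```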
  24. That method is useful but far from foolproof. Genomes are so large that regions that do not correspond to genes might, on rare occasions, contain a ____ _____ by chance. On the other hand, because many genes in higher eukaryotes are interrupted by ______, some protein-coding ______ are so small that they would not be identified as _____ unless other information was available
    • long ORF
    • introns 
    • exons
    • ORFs
  25. A segment of DNA is said to be a homolog of a sequence in another species when the two show evidence of derivation from the ______ DNA sequence in a ______ ______. For perfectly matched sequences that are ___ bp in length or longer, the evidence is clear
    • same
    • common ancestor 
    • 50
  26. But evidence for ________ of imperfectly matched DNA regions requires a more sophisticated statistical analysis, a task that is readily performed by specialized bioinformatics programs. When homologs of a DNA sequence are found in many different species, the sequence is said to be _______
    • homology 
    • conserved
  27. A traditional _______ tree, like the one in the figure, depicts the relatedness of multiple species to each other, with branch points that represent a series of nested _______ _______.
    • phylogenetic
    • common ancestors
  28. When the human genome is compared as a whole with other representative vertebrate species, the percentage of sequence conservation is very _____ for chimps and monkeys, but ______ as the elapsed time to a common ancestor increases
    • high
    • decreases
  29. At a distance of over 400 million years, the fish genome contains only ___% of the DNA sequences present in the ______ genome. In contrast, when comparisons are restricted to human protein-coding regions, conservation levels remain ______ (more than ___%) throughout vertebrate evolution
    • 2%
    • human 
    • high 
    • 82%
  30. Mutations that disrupt the function of functional DNA sequences such as protein-coding regions may ______ the evolutionary fitness of the organism. As a result, functionally important sequences evolve more ____ than nonfunctional sequences, which do not contribute to _______
    • lessen
    • slowly
    • phenotype
  31. Unconstrained divergence of nonfunctional sequences would eventually eliminate all evidence of _____ ______. Thus ______ _______  comparison results have biological relevance
    • common ancestors
    • whole genome
  32. With a computerized genome visualization tool, it becomes possible to explore DNA sequence _______ directly along the genome as well as across ______ time
    • conservation 
    • evolutionary time
  33. The figure shows an example of cross-species homology analysis for a 100 kb region containing four genes. The bottom row of the figure displays the locations and _____/______ structures of the four genes in the human genome. Above this row are _______ maps for three representative vertebrate species; _______ conserved DNA sequences are indicated within the dark lines or blocks
    • exon/intron
    • homology
    • highly
  34. As anticipated from the close relationship between human and chimpanzee species, nearly complete _______ of human sequences exists across the entire region in a chimp genome. In other mammals, represented here by the mouse, _______ is also apparent across the entire region, but the pattern is ______, indicating small regions of __________ interspersed with small, ________ regions
    • conservation
    • conservation 
    • choppy
    • conservation
    • nonconserved
  35. As we move farther across the phylogenetic landscape to fish, we can more clearly distinguish sequences subject to evolutionary ________ from those that are not. Note in particular that large parts of the coding regions of three of the four genes are ______ conserved in all the species examined. This conservation suggests that the ______ _______ of the three genes are critical to the survival of all _________
    • constraints
    • highly
    • protein products
    • vertebrates
  36. However, a homolog of the fourth gene is not found in zebrafish, indicating that the function of the fourth gene is _______ to fish. Regions of homology between the human and mouse or zebrafish genomes are much less frequent in ______, in the noncoding parts of ______ (corresponding to the 5' and 3' UTRs of the genes), and in the spaces ______ genes
    • dispensable 
    • introns
    • exons
    • between
  37. What usually predicts the location of genes? What are the exceptions?
    • Sequence conservation over long evolutionary periods, such as the time humans last shared a common ancestor with mice or fish
    • Conserved DNA sequences can be observed rarely at locations outside of the coding regions.
  38. The fact that these features are so well conserved strongly suggests that they have a ______ that is subject to evolutionary constraints; however, in most cases we do not yet know what these _______ may be.
    • function 
    • function
  39. Scientists are actively exploring the potential roles of these conserved noncoding sequences; for example, some might represent _______ elements that help determine when and where nearby genes are _______ into mRNA
    • enhancer
    • transcribed
  40. Many genes encode ______ while some others, such as genes for _____ and _____, do not. However, all genes are _____ into RNAs, even if some RNAs are not _______
    • proteins
    • rRNAs and tRNAs
    • transcribed
    • translated
  41. If you knew the sequence of the ____ produced from a gene, it would be easy to find that gene in genomic DNA simply by looking for the DNA sequence complementary to the _____.
    • RNA
    • RNA
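    A sketch of that lookup, assuming the RNA is unspliced (as with an rRNA) so that it matches one contiguous stretch of genomic DNA. The function names and sequences are invented for illustration. The search checks both strands, because the template strand can be either one.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna):
    """Opposite DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(dna))

def locate_gene(genomic_dna, rna):
    """Return (strand, index) of the gene on the given genomic strand,
    or None. '+' means the RNA-like sequence is on the given strand;
    '-' means it is on the opposite strand."""
    rna_like = rna.replace("U", "T")        # DNA with the same sequence as the RNA
    index = genomic_dna.find(rna_like)
    if index != -1:
        return ("+", index)
    index = genomic_dna.find(reverse_complement(rna_like))
    if index != -1:
        return ("-", index)
    return None

genome = "CCGTATGGCACGTTAGGA"
print(locate_gene(genome, "AUGGCACGU"))
print(locate_gene(genome, "UGCCAUAC"))
```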
  42. This approach in fact works well for RNAs that can be purified in ______ amounts like rRNAs (which can be isolated from other RNAs because they form part of the _______). In contrast, most mRNAs are so relatively _____ in cells that they cannot be purified readily
    • large
    • ribosome
    • rare
  43. Moreover, although techniques for determining the nucleotide sequence of RNAs do exist, they are less widely available and more difficult to perform than the methods available for sequencing DNA. What is the easiest way to study mRNAs?
    To copy them into DNA, then clone the resultant DNA molecules, and then to sequence these clones by the same methods already described for genomic DNA
  44. To produce DNA clones from mRNA sequences, researchers rely on a series of in vitro reactions that mimics part of the life cycle of viruses known as ______. ______, which include among their ranks the _____ virus that causes AIDS, carry their genetic info in molecules of RNA
    • retroviruses
    • Retroviruses
    • HIV
  45. As part of their gene-transmission kit, retroviruses also contain the unusual enzyme known as _____-________ ______ _______. After infecting a cell, a retrovirus uses ______ ________ to copy its single strand of RNA into a mirror-image strand of ______ DNA
    • RNA-dependent DNA polymerase (reverse transcriptase)
    • reverse transcriptase
    • complementary DNA (cDNA)
  46. The reverse transcriptase then makes a second strand of DNA complementary to the first _____ strand (and equivalent in sequence to the original _____ template). Finally, this ______-stranded DNA copy of the retroviral RNA chromosome integrates into the host cell's ________
    • cDNA strand
    • RNA
    • double-stranded
    • genome
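    The two synthesis steps can be sketched as string operations, assuming perfect base pairing and full-length copying. The function names and the example RNA are hypothetical. The first strand is complementary to the RNA; the second strand is complementary to the first and therefore matches the RNA with T in place of U.

```python
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(rna):
    """DNA strand complementary to the RNA template, written 5'->3'."""
    return "".join(RNA_TO_DNA_PAIR[b] for b in reversed(rna))

def second_strand_cdna(first_strand):
    """DNA strand complementary to the first cDNA strand, written 5'->3'."""
    return "".join(DNA_PAIR[b] for b in reversed(first_strand))

rna = "AUGGCACGUUAA"                 # invented example template
first = first_strand_cdna(rna)
second = second_strand_cdna(first)
print(first, second)
```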
  47. Although the designation cDNA originally meant a single strand of DNA complementary to an RNA molecule, it now refers to any DNA (single or double stranded) derived from an _____ _______
    RNA template
  48. How do you isolate the mRNAs of a particular cell type (for example, red blood cell precursors) so that you can use reverse transcriptase to make cDNA copies of the mRNAs transcribed in those cells? (7-story)
    pg 319 bottom left to top right
  49. The addition of ____ _____ to this total mRNA (as well as ample amounts of the four deoxyribonucleotide triphosphates and primers to initiate synthesis) generates ______-stranded ______ bound to the ______ template
    • reverse transcriptase 
    • single-stranded
    • cDNA 
    • mRNA template
  50. The primers used in this reaction are also ______ so as to initiate polymerization of the first ______ strand from all mRNAs. After synthesis is finished, you can ________ mRNA-cDNA hybrids into single strands by ______ the hybrids to ______ temperatures. The addition of an ______ enzyme that digests the original RNA strand leaves intact single strands of _____
    • oligo-dT
    • cDNA
    • denature
    • heating
    • high
    • RNase enzyme
    • cDNA
  51. Most of the single-stranded cDNAs ____ back on themselves at their ___ end to form transient ______ _____ via base pairing with random _________ nucleotides in nearby sequences in the same strand. These ______ _____ serve as _______ for synthesis of the second DNA strand
    • fold
    • 3' end
    • hairpin loops
    • complementary
    • hairpin loops 
    • primers
  52. Now the addition of DNA polymerase, in the presence of the requisite deoxyribonucleotide triphosphates, initiates the production of a second ______ strand from the just-synthesized _____-stranded _____ template. The products are _____-stranded ____ molecules
    • second cDNA
    • single-stranded
    • cDNA template
    • double-stranded cDNA
  53. After using ______ enzymes and _____ to insert the double-stranded cDNA into a suitable vector and then transforming the vector-insert recombinants into appropriate host cells, you would have a _______ of double-stranded cDNA fragments
    • restriction enzymes and ligase
    • library
  54. The cDNA fragment in each individual clone will correspond to an ______ molecule in the red blood cell ________ that served as your sample. It is important to note that this cDNA library includes only the ______ from that part of the genome that these cells were actively transcribing for translation into protein
    • mRNA
    • precursors
    • exons
  55. Why don't the clones in cDNA libraries contain introns?
    The mRNAs from which they were produced do not have introns
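    A small illustration of why a cDNA is shorter than its gene, assuming the exon coordinates are already known. The toy gene, its coordinates, and the function name are invented for this sketch. Splicing joins the exons and discards the intron, so a cDNA copied from the mature mRNA contains exon sequence only.

```python
def splice(gene, exons):
    """Join exon segments given as (start, end) pairs, end exclusive."""
    return "".join(gene[start:end] for start, end in exons)

exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GATTAA"
gene = exon1 + intron + exon2            # genomic clone: exons plus intron
exons = [(0, 6), (18, 24)]

cdna_insert = splice(gene, exons)        # what a cDNA clone would carry
print(len(gene), len(cdna_insert), cdna_insert)
```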
  56. The clones of genomic libraries represent all regions of the DNA ______ and show what the intact _______ looks like in the region of each clone.
    • equally 
    • genome
  57. The clones in cDNA libraries reveal which parts of the genome contain the information used in making ______ in specific ______. The prevalence of the mRNAs for specific genes also gives some indication, though _____, of the relative amounts of the various proteins made in those cells
    • proteins 
    • tissues
    • imperfect
Card Set
Digital Analysis of Genomes
Ch 9.3-9.6