-
DNA
- Deoxyribonucleic acid (DNA)
- • An alphabet of 4 letters
- – adenine (A)
- – guanine (G)
- – cytosine (C)
- – thymine (T)
- • Sugar backbone
- – deoxyribose
- • Directional (has polarity)
- – 5’ phosphate group
- – 3’ hydroxyl group
- • Double-stranded
- – Allows easy replication
- • In base pairs
- – A-T
- – C-G
- – “Reverse-complement”
- • Packaged into
- chromosomes
- – Humans: 23 pairs of
- chromosomes (46 total)
- – Fruit flies: 4 pairs (8 total)
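The base-pairing rules and the "reverse-complement" idea above can be sketched in a few lines of Python. This is only an illustration; the function name and the example sequence are made up for this note.

```python
# Watson-Crick pairing from the notes: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    # Reverse first (the strands are antiparallel), then swap each base.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

strand = "AAGTC"  # hypothetical 5'->3' sequence
print(reverse_complement(strand))  # GACTT
```

Applying the function twice returns the original strand, which is a quick sanity check on any implementation.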
-
Types of DNA
- • nucDNA in the nucleus
- • Mitochondria have mtDNA
- • Chloroplasts have cpDNA
-
central dogma
- • DNA provides
- instructions for making
- RNA
- • RNA
- – can act on its own OR
- – template for making
- proteins (mRNA)
- • Proteins: the major
- machinery for cells
-
Gene structure: coding genes
- • Promoter: start transcription
- • Exons: expressed
- • Introns: intragenic (intervening) regions
- • Transcription only reads one strand (the template); the RNA
- matches the “coding strand” (the other strand would produce
- junk)
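A minimal sketch of strand usage during transcription, assuming the usual convention that the polymerase pairs new RNA bases against the template strand, so the transcript has the same sequence as the coding strand with U in place of T. The sequences and the function name are hypothetical.

```python
# Template DNA base -> RNA base that pairs with it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3to5: str) -> str:
    """Build RNA 5'->3' by pairing against a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

coding = "ATGGTTTAA"    # hypothetical coding strand, 5'->3'
template = "TACCAAATT"  # its complement, written 3'->5'

mrna = transcribe(template)
print(mrna)                             # AUGGUUUAA
print(mrna == coding.replace("T", "U"))  # True
```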
-
genes
- – relatively small proportion of genome (<5% in some cases), but varies among species
- – ~23,000 genes in humans, ~13,000 in fruit flies
- – number of genes does not scale with organism complexity
-
Regulatory regions
- – contain binding sites for transcription factors
- – affect rate and occurrence of transcription
-
Repetitive regions
satellites, minisatellites, microsatellites
-
Transposable elements:
- elements that can copy themselves around the genome, e.g., SINEs and
- LINEs (short and long interspersed nuclear elements). Can disrupt or change gene function.
-
Pseudogenes:
non-functional copies of coding genes
-
Ultra-conserved elements
- – 481 identical regions of at least 200bp found between mouse, human, and rat (Bejerano
- et al. 2004 Science)
- – Function unclear
-
Methylation:
- attach methyl group to cytosine (5-methylcytosine), usually inactivates a gene.
- Can be inherited across cell divisions, but low fidelity. A focus of epigenetics.
-
Ribonucleic acid (RNA)
- • Usually single-stranded
- • Sugar backbone has ribose
- instead of deoxyribose
- • Four letters
- – Adenine
- – Cytosine
- – Guanine
- – Uracil (U): very similar to
- thymine (T)
- • Secondary and tertiary
- structure often important
-
mRNA
messenger from DNA to protein
-
tRNA
- delivers (‘transfers’) amino acids to the translation
- process
-
rRNA
- forms the structure of the ribosome (along with
- proteins) for translation
-
siRNA
- small, interfering molecules (20-25bp) that reduce
- expression of specific genes
-
miRNA
- micro RNAs (21-23bp) that bind to
- complementary mRNA strands and prevent translation.
- Less specific than siRNAs.
-
introns
- – spliced out of mRNA
- – alternative splicing
- possible and important
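Splicing and alternative splicing can be pictured as joining different subsets of exon intervals from the same pre-mRNA. The sketch below is illustrative only: the transcript, the exon coordinates, and the function name are invented, and introns are written in lower case purely for readability.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive); everything between them is dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: exon, intron, exon, intron, exon.
pre_mrna = "AUGGCC" + "guaaguag" + "GAUUAC" + "guccccag" + "AAAUGA"
exons = [(0, 6), (14, 20), (28, 34)]

full = splice(pre_mrna, exons)                     # all three exons kept
skipped = splice(pre_mrna, [exons[0], exons[2]])   # exon 2 skipped

print(full)     # AUGGCCGAUUACAAAUGA
print(skipped)  # AUGGCCAAAUGA
```

Exon skipping is only one form of alternative splicing, but it shows how one gene can yield more than one mature mRNA.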
-
Exons
- – Untranslated Region
- (UTR): stability,
- localization, efficiency
- – Coding sequence (CDS) for
- protein sequence
-
genetic code
- • Translates nucleotide triplets into
- amino acids
- • Code varies somewhat between
- nucDNA vs. mtDNA
- • Start (AUG/ATG) and stop codons
- for translation
- • Redundant codes for the same
- amino acid
- – Synonymous mutations: change
- the nucleotide but not the amino
- acid (often 3rd position)
- • fourfold degenerate (e.g., Val)
- • twofold degenerate (e.g., Tyr)
- – Non-synonymous mutations
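To make redundancy concrete, here is a small sketch that builds the standard nuclear codon table and uses it to label single-base changes and to count third-position degeneracy. It assumes the standard code only (mitochondrial codes differ, as noted above); the function names are made up for this example.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify(codon: str, pos: int, new_base: str) -> str:
    """Label a single-base change as synonymous or non-synonymous."""
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same else "non-synonymous"

def third_position_degeneracy(codon: str) -> int:
    """Count how many third-position bases give the same amino acid."""
    aa = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + b] == aa for b in BASES)

print(classify("GTT", 2, "C"))           # synonymous (Val -> Val)
print(classify("GTT", 0, "A"))           # non-synonymous (Val -> Ile)
print(third_position_degeneracy("GTT"))  # 4 (Val, fourfold degenerate)
print(third_position_degeneracy("TAT"))  # 2 (Tyr, twofold degenerate)
```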
-
locus
- a segment of the genome, used to
- refer to a gene or other genetic marker
-
allele
- alternative forms of the same locus
- (i.e., a difference in sequence, e.g., maternal vs. paternal copy)
-
Allozymes
- • Different forms of the same protein
- – detects amino acid differences (coarse)
- – distinguish by size and/or electric charge
- • Run on an electrophoresis gel
- – Produces distinctive bands at specific places
-
allozyme limitations
cannot detect silent (synonymous) mutations, which leave the protein unchanged
-
mtDNA/cpDNA sequencing
- Read sequence directly: AAGT, etc.
- • Maternally inherited (except molluscs
- and rare exceptions)
- • Cytochrome oxidase I (COI): common
- for identifying species
- • Control region: higher substitution
- rate, greater diversity (doesn’t code
- for proteins)
- • Cytochrome b (cyt b): intermediate
- mutation rate
- • Only a single locus (n = 1)
-
Restriction fragment length
polymorphisms (RFLP)
- • Assays DNA directly
- • Restriction enzymes found naturally in
- bacteria
- • Bind to a specific DNA sequence
- – Won’t bind if there has been a
- mutation at the cut site
- • Cut the strand
- – Leaves a blunt or a sticky end
- • Run product on a gel, assess variation
- in length
- • Often used with mtDNA
- – Most popular in 80s and 90s
- – Cheap
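The RFLP logic above can be sketched as a simulated digest. The example uses EcoRI, which recognizes GAATTC and cuts after the first G (leaving a sticky end); the sequences, the function name, and its parameters are hypothetical, and only one strand of a linear molecule is modeled.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every recognition site."""
    cuts, start = [], 0
    while (hit := seq.find(site, start)) != -1:
        cuts.append(hit + cut_offset)  # cut position within the recognition site
        start = hit + 1
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Hypothetical 52 bp allele with two EcoRI sites.
wild = "ACGT" * 3 + "GAATTC" + "TTGCA" * 4 + "GAATTC" + "CCGG" * 2
# Same allele with one base changed inside the first site (GAATTC -> GAGTTC).
mutant = wild[:14] + "G" + wild[15:]

print(digest(wild))    # [13, 26, 13]
print(digest(mutant))  # [39, 13]
```

The mutant loses one cut, so two of the wild-type bands merge into a single longer fragment, which is the length difference read off the gel.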
-
nucDNA
- Potential for many loci: greater
- power for inference
- • Can examine protein-coding loci
- • Exons or introns
- – Exon-priming, intron-crossing (EPIC)
- primers: more likely to work across
- species
-
Microsatellites
- • Repetitive sections of the genome
- – AT AT AT AT AT
- – GTA GTA GTA GTA
- – 2-5 bp repeats
- • Strands can slip during replication or recombination
- – adds or deletes a repeat
- – changes length of microsatellite locus
- – Frequent mutation: loci are highly polymorphic
- • Assess variation in fragment length on a gel
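A rough sketch of how slippage changes allele length at a microsatellite locus. The flanking sequences, motif, and function names are invented for illustration; real slippage can happen anywhere in the repeat array, whereas this toy version always edits the first repeat unit.

```python
import re

def count_repeats(seq: str, motif: str) -> int:
    """Length, in repeat units, of the longest uninterrupted run of the motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)

def slip(seq: str, motif: str, gain: bool) -> str:
    """Add (gain=True) or delete (gain=False) one repeat unit at the first motif."""
    i = seq.find(motif)
    if i == -1:
        return seq
    return seq[:i] + motif + seq[i:] if gain else seq[:i] + seq[i + len(motif):]

flank5, flank3 = "CCTGC", "TTGCC"  # hypothetical flanking (primer) regions
allele = flank5 + "GTA" * 4 + flank3

longer = slip(allele, "GTA", gain=True)
shorter = slip(allele, "GTA", gain=False)

for a in (shorter, allele, longer):
    print(count_repeats(a, "GTA"), len(a))
# 3 19
# 4 22
# 5 25
```

Each repeat gained or lost shifts the fragment by one motif length (3 bp here), which is what separates alleles on the gel.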
-
Single Nucleotide Polymorphisms (SNPs)
- • Usually from nucDNA
- • A single position in the genome that can be one of
- two alternate bases
- • Need primers (flanking sequence)
- – often developed from genome
- • Only see the SNPs that you look for (ascertainment
- bias)
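Ascertainment bias can be illustrated with a toy SNP panel: SNPs are discovered in a few aligned sequences, and later samples are genotyped only at those positions. All sequences and names below are hypothetical, and the sketch assumes the sequences are already aligned and of equal length.

```python
def find_snps(aligned: list[str]) -> dict[int, set[str]]:
    """Map each variable position to the bases seen there, keeping biallelic sites only."""
    snps = {}
    for pos, column in enumerate(zip(*aligned)):
        alleles = set(column)
        if len(alleles) == 2:
            snps[pos] = alleles
    return snps

# Hypothetical aligned sequences from a small discovery panel.
discovery = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
panel = find_snps(discovery)
print(sorted(panel))  # [2, 7]

# A new sample differs at position 4, but the panel never assays that site.
new_sample = "ACGTTCGT"
called = {pos: new_sample[pos] for pos in panel}
print(called)  # {2: 'G', 7: 'T'}
```

The variant at position 4 goes unseen because it was absent from the discovery panel, so diversity in the new sample is underestimated.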
-
Transcriptomes
- • All mRNA in a tissue
- • Transcribed genes ~5% of genome
- • Genes likely to be functional
- • Can identify sequence AND level of expression
- (# transcripts)
-
Genotyping by sequencing
- • Various methods to select a repeatable
- subset of the genome
- – e.g., cut with restriction enzymes and sequence
- near cut-sites (RADseq, RRL, CRoPS)
- • Efficient use of next-gen sequencing for
- population-level or phylogenetic questions
- – e.g., 300 thousand bp from each of 10,000
- individuals instead of 3 billion bp from 1
- individual
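The trade-off in the example above is simple arithmetic: the same sequencing effort is spread thinly over many individuals instead of deeply over one. A quick check of the numbers from the notes (variable names are made up; coverage depth is ignored):

```python
bp_per_individual_gbs = 300_000   # reduced-representation subset per individual
individuals_gbs = 10_000
bp_whole_genome = 3_000_000_000   # one individual, whole genome

total_gbs = bp_per_individual_gbs * individuals_gbs
print(total_gbs == bp_whole_genome)             # True: same total number of bases
print(bp_per_individual_gbs / bp_whole_genome)  # 0.0001, i.e. 0.01% of the genome each
```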
-
Sanger sequencing
- • Developed by Frederick Sanger and
- colleagues in 1977
- • Outline is similar to PCR: polymerase copies
- a DNA strand
- • Copying starts from primers
- – can be same as used in PCR
- • Uses a mix of normal (dNTP) and dideoxy
- nucleotides (ddNTP)
- – H instead of OH at 3’ and 2’ positions on sugar
- backbone
- – cannot be extended further by polymerase
- • ddNTPs are fluorescently labelled
- – 4 colors for 4 nucleotides
- • Creates DNA copies that terminate at
- random positions
- – Hence, called the “dye-termination” method
- – Generates forward and reverse strands that
- may overlap in the middle
- • Run fragments through a gel,
- separate by size
- – Capillary gel on an automated
- sequencer
- • Read sequence of fluorescing
- dyes with a laser
- • Fragments up to 1000 bp
- – $2-3/sequence (250bp/$)
- • Widely available as a
- commercial service
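The chain-termination idea can be mimicked with a toy simulation: many copies are started, each stops at a random labelled ddNTP, and sorting the fragments by length lets the terminal bases be read off in order. This is a sketch under simplifying assumptions (one primer, the template written in the order the polymerase reads it, a fixed ddNTP probability per base); all names and parameter values are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sanger_fragments(template: str, n_copies: int, dd_fraction: float,
                     rng: random.Random) -> list[str]:
    """Copy the template many times; each added base is a ddNTP with probability dd_fraction."""
    fragments = []
    for _ in range(n_copies):
        copy = ""
        for base in template:
            copy += COMPLEMENT[base]
            if rng.random() < dd_fraction:  # labelled ddNTP added: chain stops here
                break
        else:
            continue  # no ddNTP incorporated: unlabelled copy, never read
        fragments.append(copy)
    return fragments

def read_sequence(fragments: list[str]) -> str:
    """Sort by length (the gel) and read the labelled terminal base of each size class."""
    last_base = {len(f): f[-1] for f in fragments}
    return "".join(last_base[n] for n in sorted(last_base))

rng = random.Random(0)
fragments = sanger_fragments("TACGGATCCATG", 5000, 0.2, rng)
print(read_sequence(fragments))  # ATGCCTAGGTAC (complement of the template)
```

Long fragments are rarer than short ones because every extra base is another chance to terminate, which is one reason read quality falls off toward the end of a real Sanger read.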
-
Illumina sequencing (aka Solexa)
- • Most popular of next-gen technologies
- • Reads 33-300 bp at either end of a DNA
- fragment
- • Up to 280 million reads per run
- – $1k-4k per run (12 million bp/$)
- – Available from sequencing centers (e.g.,
- Princeton LSI, UC Berkeley)
- • Can put multiple samples on a lane, identify
- later by DNA “barcode”
- • Process:
- – Attach DNA fragments to surface of a glass slide
- (“flow cell”)
- – Amplify into a cluster
- – Attach a fluorescently labeled dNTP
- – Image
- – Remove fluorescent label
- – Add next fluorescently labeled dNTP
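Sorting pooled reads back to their samples by DNA "barcode" (demultiplexing) might look like the sketch below. It assumes the barcode sits at the start of each read and must match exactly; real pipelines often read the index separately and tolerate mismatches. The barcodes, reads, and sample names are invented.

```python
from collections import defaultdict

def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign each read to a sample by its leading barcode, then trim the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)  # no barcode matched
    return dict(by_sample)

barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}  # hypothetical barcodes
reads = ["ACGTGGATTC", "TGCACCATGA", "ACGTTTAGGC", "GGGGAAAACC"]

result = demultiplex(reads, barcodes)
print(result["sample_A"])    # ['GGATTC', 'TTAGGC']
print(result["sample_B"])    # ['CCATGA']
print(result["unassigned"])  # ['GGGGAAAACC']
```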