Purpose of HapMap & projected utility to human genetics research
Goal: to make a map with a high density of SNP (single nucleotide polymorphism: point mutation) markers throughout the genome
1. used to identify disease-causing mutations by looking for correlations b/w SNPs & ppl with a certain phenotype/disease
2. specifically looking for genes involved in simple (single-gene, Mendelian inheritance) or complex diseases
3. also ID regions of the genome that have undergone RECENT selection, to gain MORE info about human evolution.
What is a haplotype within the context of the HapMap project?
Haplotype: 2 or more genes/alleles that tend to be co-inherited/show linkage disequilibrium (do NOT assort independently, as Mendelian inheritance would predict). Usually they are on the SAME chromosome
For Complex diseases: haplotype may include UNLINKED genes that MUST be co-inherited for disease to persist.
What are tagSNPs and how are they used?
TagSNPs: SNPs within a region of HIGH linkage disequilibrium. These are useful to identify specific genetic variation in genes/alleles WITHOUT identifying EVERY SNP in the chromosomal region.
(Only need to type these tagSNPs to identify which haplotype an individual has; not necessary to sequence all SNPs)
Estimated that about 300,000-600,000 tagSNPs will give us COMPLETE INFO re: an individual's genotype
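A minimal sketch of the tagSNP idea, assuming a toy set of four haplotypes over six SNP sites (the haplotype names and sequences are invented for illustration, not HapMap data). It brute-forces the smallest set of SNP positions that still tells every haplotype apart. Real tagSNP selection works from LD statistics across many samples; this only shows why a few SNPs can stand in for the rest.

```python
from itertools import combinations

# Hypothetical haplotypes: one letter per SNP site (illustration only).
HAPLOTYPES = {
    "hap_A": "ACGTAC",
    "hap_B": "ACGTGT",
    "hap_C": "GTGCAC",
    "hap_D": "GTATGT",
}


def find_tag_snps(haplotypes):
    """Return the smallest tuple of 0-based SNP positions whose alleles
    uniquely identify every haplotype (brute force, toy inputs only)."""
    seqs = list(haplotypes.values())
    n_sites = len(seqs[0])
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            signatures = {tuple(s[p] for p in positions) for s in seqs}
            if len(signatures) == len(seqs):
                return positions
    # Only reached if two haplotypes are identical at every site.
    return tuple(range(n_sites))


def identify_haplotype(tag_alleles, positions, haplotypes):
    """Look up which haplotype matches the alleles seen at the tag positions."""
    for name, seq in haplotypes.items():
        if tuple(seq[p] for p in positions) == tuple(tag_alleles):
            return name
    return None


tags = find_tag_snps(HAPLOTYPES)
print(tags)
print(identify_haplotype(("G", "A"), tags, HAPLOTYPES))
```

Here two of the six sites are enough to separate all four haplotypes, so the other four sites never need to be typed.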
****In genome wide association studies (GWAS) the term haplotype has a slightly different meaning. How is this term used in GWAS studies?
It is used to refer to 2 or more genes/alleles that are involved in association with a specific disease???
What is meant by the term Linkage Disequilibrium and how is it used in genome wide association studies? Within the context of these types of studies, what is meant by the term, association?
Linkage disequilibrium: Situation where alleles at different loci are inherited together more often than expected by chance (do NOT assort independently, as Mendelian inheritance would predict).
In GWAS: used as a measure of the degree of linkage between the genes/markers and the associated disease/phenotype.
The more TIGHTLY linked the genes that contribute to the disease/phenotype: the HIGHER the level of linkage disequilibrium.
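Linkage disequilibrium is usually put into numbers with the standard population-genetics statistics D and r^2, which these notes do not spell out. A minimal sketch, assuming two biallelic loci with alleles A and B (the function name and the example frequencies are illustrative):

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying both allele A and allele B
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    D  = p_ab - p_a * p_b   (0 when the loci are independent)
    r2 = D^2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Independent loci: A and B occur together exactly as often as chance predicts.
print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))

# Complete LD: every chromosome carrying A also carries B.
print(linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5))
```

An r2 near 1 means one SNP predicts the other almost perfectly, which is what makes a typed marker SNP useful as a stand-in for a nearby untyped causal variant.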
Briefly describe how the HapMap Project was done. What was the source of the genome samples, i.e. what was the composition of the study cohort?
Began with looking at 269 individuals from 4 population groups:
1. 90 (30 trios) of European ancestry (Utah)
2. 90 (30 trios) from Yoruba African tribe
3. 45 unrelated Chinese
4. 44 unrelated Japanese
Looked for COMMON SNPs: sites where a minimum of 2 alleles are present in the gene pool, and the frequency of the less common allele is at least 0.05
Chose 10 regions (500kb each) over 7 chromosomes, sequenced in 48 individuals from the study.
The SNP sites identified in these 48 were then genotyped in all 269 ppl.
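The "common SNP" cutoff above can be written as a small check. This is a sketch assuming a biallelic site and a flat list of observed alleles, one entry per chromosome sampled; the function names and example counts are made up for illustration.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the less common allele at a biallelic site.
    Returns 0.0 if only one allele is observed (site is not polymorphic)."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    # most_common() sorts by count, so index 1 is the second allele.
    return counts.most_common()[1][1] / total


def is_common_snp(alleles, threshold=0.05):
    """Apply the notes' cutoff: less common allele frequency of at least 0.05."""
    return minor_allele_frequency(alleles) >= threshold


# Hypothetical samples of 100 chromosomes each.
print(is_common_snp(["A"] * 90 + ["G"] * 10))  # minor allele at 0.10
print(is_common_snp(["A"] * 98 + ["G"] * 2))   # minor allele at 0.02
```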
What were some of the major findings of the HapMap project?
SNP density is HIGHER than expected: 1 SNP every 297bp on avg.
SNPs are NOT just clustered in CODING regions. (interSNP distances typically LESS than 10kb)
Amount of SNP sharing between the European ancestry (Utah), Yoruba African tribe, Chinese and Japanese populations is similar, BUT Chinese and Japanese are more CLOSELY related to each other than to the others
What are microarrays, how are they produced & how are they used?
1. Obtain genomic DNA sample (or reverse transcribe RNA) and label with biotin or a fluorescent tag
2. Hybridize to microarray (aka Gene chip w/millions of probes/features) of KNOWN DNA seq/alleles
3. Detect sites on microarray where sample has hybridized using FLUORESCENT probe
Microarray chip details:
sequences printed on glass wafers.
Protective groups are removed by light deprotection (only in the areas where a specific nucleotide, e.g. T, will be attached), then that nucleotide (e.g. T) adds onto the linker molecule; the process repeats until about 25 nucleotides have attached to each sequence
25 nucleotides gives best specificity results
Phosphate groups are present to prevent branching between sequences, but are removed at the final step
If any unprotected nucleotides are left after the nucleotide addition step, they are CAPPED to prevent mutant sequences
Fluorescence from a feature indicates that the sample has bound to the sequence at that feature
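The deprotect-then-add cycle described above can be simulated in a few lines. This is only a toy model, assuming the chip is flooded with one base per step in a fixed A, C, G, T rotation and that the light mask exposes exactly the features whose next base matches. The function name and the 3-nt probes are illustrative; real probes are about 25 nt.

```python
def synthesize(probes, base_order="ACGT"):
    """Grow all probes in parallel, one flooded base per step.

    Returns (grown_sequences, number_of_steps). A feature is "deprotected"
    in a step only if the base being added is the next one it needs.
    """
    for probe in probes:
        if set(probe) - set(base_order):
            raise ValueError(f"probe {probe!r} contains a base not in {base_order!r}")
    grown = [""] * len(probes)
    steps = 0
    while any(g != p for g, p in zip(grown, probes)):
        base = base_order[steps % len(base_order)]
        for i, target in enumerate(probes):
            next_index = len(grown[i])
            # The mask exposes this feature only if it needs `base` next.
            if next_index < len(target) and target[next_index] == base:
                grown[i] += base
        steps += 1
    return grown, steps


grown, steps = synthesize(["ACG", "TTA"])
print(grown, steps)
```

Because every unfinished feature gains at least one base per full A, C, G, T rotation, this scheme needs at most 4 x (probe length) steps no matter how many features are on the chip.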
In the Styrkarsdottir et al paper the p-value they used to indicate statistical significance at the genome level was p = 1.7 x 10^-7. Why did they set their p-value for significance so low? Why didn't they use the p = 0.05 value that is commonly used in statistics?
The significance threshold was set so low because the study tested 301,019 SNPs for association with bone mineral density of the hip and lumbar spine. With 300,000+ separate tests, many SNPs would pass p = 0.05 by chance alone, so the threshold had to be corrected for the number of tests to give genome-wide significance. The p-value was therefore taken as 0.05 (the standard p-value in stats) DIVIDED by 301,019 SNPs, which gives about 1.7 x 10^-7.
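The arithmetic behind that threshold, dividing 0.05 by the number of SNPs tested (a Bonferroni-style correction), is easy to check. A short sketch; the function and variable names are illustrative:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after dividing alpha by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05
N_SNPS = 301_019  # number of SNPs tested, as given in the notes

threshold = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{threshold:.1e}")  # 1.7e-07

# With no correction, roughly 5% of all tests would pass p < 0.05 by chance.
expected_false_positives = ALPHA * N_SNPS
print(round(expected_false_positives))
```

At an uncorrected p = 0.05, about 15,000 of the 301,019 SNPs would look "significant" even if none were truly associated with bone mineral density.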
****What is one of the major limitations of the HapMap data? To put that another way, now that research groups are beginning to use the HapMap data to look for genes associated with complex diseases or traits, what is one of the issues that they have encountered?
For some experiments, such as in the bone mineral density and fractures article, the sample sets do not represent the entire species, only subsets of the population.
They also found that the contribution of each allele to a trait (here, bone mineral density) may be very small.
There may be many more genes or loci with SNPs that also correlate with the trait but have not been studied yet.
What are the functions of the different types of RNA polymerase in eukaryotic cells?
RNA pol I: TRANSCRIBES rRNA genes other than 5S (rRNA is a structural component of the translational machinery)
RNA pol II: TRANSCRIBES all PROTEIN coding and many FUNCTIONAL RNA genes, including snoRNAs and microRNAs
RNA pol III: TRANSCRIBES tRNA and 5S rRNA genes
Mitochondrial RNA pol: nuclear encoded protein, but is structurally and functionally related to bacteriophage RNA pol.
What are the different types of common promoter elements?
Are these promoter elements absolutely required for transcription? What role(s) do the common promoter elements play in transcription regulation?
Common promoter elements:
1. GC or CAAT box: binding sites for transcription factors
2. TATA box: facilitates BINDING of TBP (TATA-binding protein). Is in 32% of promoters
3. BRE: transcription factor recognition site for TFIIB, which positions RNA pol at the start site of transcription
4. Inr and DPE: other core promoter elements
Common promoter elements are neither ESSENTIAL nor SUFFICIENT for initiation of transcription
What are the various components of the RNA pol II initiation complex and what are their functions?
Composed of many transcription factors, as well as RNA pol II:
TBP (TATA binding protein): recognizes TATA box
TAF subunits: recognize DNA seq. near start point, also regulate DNA binding by TBP
TFIIB: Recognizes BRE element, used to position RNA pol. @ start site of transcription
TFIIF: stabilizes RNA pol interaction with TBP and TFIIB, also attracts other TFs (TFIIE and TFIIH)
TFIIE: attracts and regulates TFIIH
TFIIH: UNWINDS DNA @ transcription start point, phosphorylates Ser 5 of RNA pol C-terminal domain, also RELEASES RNA pol. from promoter
What are enhancers and silencers and how do they differ from the common promoter elements? Why can enhancers and silencers be located 1000s of bps away from the genes they regulate?
Enhancers and silencers are DNA sequences bound by proteins (transcription factors) that activate or repress transcription.
They may be located within the promoter region, or 1000s of bps from the start of transcription, in which case they can utilize DNA looping so that the TFs bound to them can interact w/pol II and other TFs at the start site of transcription.
A sequence that acts as an enhancer for one gene can also act as a silencer for a different gene.
Some enhancers bind TFs that must dimerize to function; this is controlled by an extensive regulatory complex that responds to various developmental/environmental signals