-
What are the two types of databases?
Primary / curated
-
A scientist has solved a structure and would like to create functional rules that can be propagated to rest of the members of the family. What kind of rule do you think he can make?
Site rule
-
What is the basic unit of a PIRSF?
Homeomorphic family
-
What tool would you use to find a template for modeling?
BLAST
-
Which of these are elements of protein secondary structure?
- A) Helix
- B) Sheet
- C) Loop
- D) All of the above
D: all of the above
-
What is the only amino acid found at position 3 in a Type II turn?
Glycine
-
Which of the following databases is based on HMM?
- A) Uniprot
- B) Pfam
- C) CDD
- D) GenBank
B: Pfam
-
"A subgroup of two or more taxa or DNA/protein sequences that includes both their common ancestor and all of their descendants." What is this a definition of?
Clade
-
What is central to molecular recognition?
Protein-ligand interactions
-
Which of the following is a supersecondary structure?
- A) alpha helix
- B) gamma turns
- C) beta sheet
- D) Greek key
D: Greek key
-
What is the guiding principle that is used in assigning protein function?
% identity
-
Which one of the following methods is based on clustering algorithms?
- A) Least Squares
- B) Neighbor Joining
- C) Minimum Evolution
- D) Maximum Parsimony
B: Neighbor joining
-
Name one database that stores homology models?
MODBASE
-
Two structures with sequence identities below 20% can still be closely related, with an RMSD < 1 Å, and belong to the same family. What can you attribute this to?
structures evolve slowly
-
What is the other term used for beta turns?
Reverse Turns
-
Which of the following is a domain database?
TigrFams
-
You are given the two accessions gi|4503055 & gi|117351. Which database would you go to in order to retrieve the FASTA-formatted file of these sequences?
NCBI
-
Where would you go to find the neighbors for proteins with IDs 1P91, 2ZDZ, 1BNL?
DALI
-
Which of the elements of protein structure is most flexible?
Loops
-
Which of the following is a Uniprot database?
UniParc
-
Homologs produced by gene duplication are called what?
Paralogs
-
Which of the following databases is based on HMMs?
SMART
-
Sequences are considered to be in the Twilight Zone if their sequence identity is:
< 30%
-
Which pair is considered a conservative substitution?
Y/F
-
Which of the following is a Supersecondary structure?
helix-loop-helix
-
Which of the following is best suited to get divergent sequences?
- A) Profile HMMs
- B) BLASTN
- C) TBLASTN
- D) BLASTP
A: Profile HMMs
-
What is the most important parameter used in interpreting the results of sequence comparisons?
E-value
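A rough sketch of how an E-value relates to a bit score, using the standard relation E = m · n · 2^(-S'), where S' is the bit score. The function name and the example lengths are made up for illustration; real BLAST uses effective (edge-corrected) lengths, so its reported values will differ.

```python
def expected_hits(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with raw lengths standing in for the
    effective lengths that BLAST actually uses (an assumption).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Higher bit scores give smaller E-values for the same search space.
for score in (10, 20, 40):
    print(score, expected_hits(score, query_len=32, db_len=32))
```

The smaller the E-value, the less likely the hit is to be a chance match.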
-
At what sequence identity cut-off is it safe to model?
> 60%
-
Who came up with the first evolutionary tree for globins?
M.O. Dayhoff
-
What is the typical word size used when using blastp?
3
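To make "word size" concrete, here is a minimal sketch of splitting a query into overlapping words of length 3. The function name and the example peptide are hypothetical; this only illustrates the seeding idea, not the full blastp procedure.

```python
def words(seq, w=3):
    """Return every overlapping word of length w in seq, left to right."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


query = "MKVLAT"  # made-up peptide
print(words(query))
```

A query of length L yields L - w + 1 words, so a query shorter than the word size yields none.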
-
What is bioinformatics?
Integration of Omic terms
-
Name the three distinct domains of life. Roughly how many billion years ago did each of them evolve?
- Bacteria (2.6 billion years)
- Archaea (3.5 billion years)
- Eukarya (2.2 billion years)
-
When was the first draft of the human genome sequencing project completed (roughly)?
2001: Completion of the human draft genome!
-
What are accession numbers and IDs? What type of ID is used by PDB? An example, please?
- Identifier: string of letters and digits; can change
- Accession: letters and numbers; stable
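As an illustration of the PDB part of the question: classic PDB IDs (such as 1P91, 2ZDZ, 1BNL used elsewhere in these notes) are four characters, a digit from 1 to 9 followed by three letters or digits. The check below is a sketch under that assumption; the function name is made up, and it does not cover any extended ID formats.

```python
import re

# Classic PDB ID: one digit (1-9) followed by three alphanumeric characters.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text):
    """Return True if text matches the classic 4-character PDB ID pattern."""
    return PDB_ID.fullmatch(text) is not None


for candidate in ("1P91", "2ZDZ", "1BNL", "O95050"):
    print(candidate, looks_like_pdb_id(candidate))
```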
-
What are gaps? How many gaps do you see in the following alignment?
3 gaps
-
Name one tool that you would use to align multiple sequences.
- 1. CLUSTALW (Progressive Method)
- 2. MUSCLE (Iterative method)
- 3. T-Coffee/Expresso (Structure based)
-
What tool/database would you use to get structural neighbors?
-
What tool would you use to browse a genome and find the Chromosomal location of a gene?
UCSC
-
Name one tool that you would use to align two sequences in a pairwise alignment.
Align using BLAST
-
What is a domain? Give names of at least three domain families you have encountered so far.
A domain is an evolutionarily mobile unit of a protein
-
Define homology. Will two proteins belonging to the same family be considered homologous? Why? At what sequence identity cut-offs can two proteins safely be considered homologous?
Homology: two sequences or structures are said to be homologs (homologous to each other) if they are related by divergence from a common ancestor. Homology = descent from a common ancestor.
Yes, two proteins belonging to the same family can be considered homologous, since they share a common ancestor.
-
Define Orthologs and Paralogs. Give an example for each.
- Orthologs: homologs produced by speciation
- Paralogs: homologs produced by gene duplication
- Xenologs: homologs resulting from horizontal transfer of a gene between two organisms
-
What does sequence identity mean?
Sequence identity: the extent to which two sequences are invariant.
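A minimal sketch of computing percent identity from two already-aligned sequences. It assumes one common convention (identical residues divided by aligned columns without gaps); other tools divide by alignment length or by the shorter sequence, so numbers can differ. The function name and sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent of gap-free aligned columns in which both residues match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


print(percent_identity("ACDE", "ACDF"))
```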
-
What is a PAM matrix? What does PAM1 mean? What kind of alignment is it based on?
- PAM matrices: Point Accepted Mutations
- Based on global alignments of closely related proteins
- Calculated from comparisons of sequences with no more than 1% divergence
-
What are the two types of alignments? What type does BLAST use?
Global and local alignments
Local alignment is almost always used for database searches such as BLAST
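The difference between the two alignment types can be sketched with one dynamic-programming routine that returns either a global (Needleman-Wunsch style) or local (Smith-Waterman style) score. The scoring values (match +1, mismatch -1, linear gap -2) and the sequences are arbitrary assumptions; BLAST itself uses heuristics rather than this full table.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global score (every residue of both sequences is aligned).
    local=True:  local score (best-scoring pair of substrings).
    """
    cols = len(b) + 1
    # First row: global pays for leading gaps, local starts fresh at 0.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


print(align_score("ACGT", "TTACGTTT"))              # global
print(align_score("ACGT", "TTACGTTT", local=True))  # local
```

The short query is penalized for every unmatched flanking residue in the global case, while the local score only reflects the matching region, which is why local alignment suits database searches.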
-
What are the different components of a protein?
- Motifs
- domains
- Full-length Protein
- Integrated family databases
- 3D structure
-
What is a substitution matrix? Name the two major types of matrices.
- Contains values proportional to the probability that amino acid i mutates into amino acid j, for all pairs of amino acids.
- Constructed by assembling a large and diverse sample of verified pairwise alignments (or multiple sequence alignments) of amino acids.
- Should reflect the true probabilities of mutations occurring through a period of evolution.
- The two major types of substitution matrices are PAM and BLOSUM.
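One common way such matrix entries are derived is as a log-odds score: how much more often a pair is seen aligned than expected by chance. The sketch below assumes that formulation with a half-bit scale; the function name and the frequencies are invented for illustration and are not taken from any real matrix.

```python
import math


def log_odds(q_ij, p_i, p_j, scale=2.0):
    """Rounded log-odds score for aligning residues i and j.

    q_ij: observed frequency of the aligned pair (hypothetical input)
    p_i, p_j: background frequencies of each residue (hypothetical input)
    scale: 2.0 gives half-bit units (an assumed convention)
    """
    return round(scale * math.log2(q_ij / (p_i * p_j)))


# Pair seen twice as often as chance -> positive score.
print(log_odds(0.02, 0.1, 0.1))
# Pair seen half as often as chance -> negative score.
print(log_odds(0.005, 0.1, 0.1))
```

Positive scores mark substitutions that are tolerated more often than chance; negative scores mark rare ones.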
-
Name two protein domain databases.
-
Comparing two sequences is the cornerstone of any bioinformatics analysis. Can you explain why that is so?
- It helps us to understand if two proteins are functionally related
- It helps us to understand if two sequences are structurally related
- It helps us to understand if two sequences are structurally or Functionally related or both
- It helps us to identify common domains and motifs (ligand binding Sites; metal sites; active sites)
- More importantly, it helps us to understand the differences and their functional divergence, and hence helps us look back at events billions of years ago (evolution!)
-
Which database serves as a universal hub for protein structures determined by X-ray, NMR or EM?
PDB
-
What are the different Uniprot databases?
-
Name the database you would use to get information about diseases.
OMIM: Online Mendelian inheritance in man
-
Name two structure classification databases.
-
Name two full-length classification databases.
-
You are given the two accessions O95050 & Q12400. Which database would you go to in order to retrieve the FASTA-formatted file of these sequences? Describe the steps.
- 1. Go to http://www.uniprot.org/
- 2. Click on Retrieve
- 3. Paste the IDs
- 4. Save the file as FASTA from the options
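The same retrieval can be sketched programmatically. This assumes UniProt's REST service serves one entry per accession at a URL of the form `https://rest.uniprot.org/uniprotkb/<accession>.fasta`; that base URL and the helper names are assumptions not stated in these notes, so check the current UniProt documentation before relying on them.

```python
from urllib.request import urlopen

# Assumed base URL of the UniProt REST service (not given in the notes).
BASE = "https://rest.uniprot.org/uniprotkb"


def fasta_url(accession):
    """Build the assumed FASTA download URL for one UniProt accession."""
    return f"{BASE}/{accession}.fasta"


def fetch_fasta(accession):
    """Download the FASTA record as text (needs network access)."""
    with urlopen(fasta_url(accession)) as response:
        return response.read().decode("utf-8")


# Only the URLs are printed here; call fetch_fasta() to actually download.
for acc in ("O95050", "Q12400"):
    print(fasta_url(acc))
```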
-
I would like to know if the 3D structure of my query protein has been determined. How would I go about it? Which database would I go to and what tool would I use?
- 1. Go to NCBI and do a BLAST
- 2. Go to PDB and do a BLAST against PDB
-
Where would you go to find the neighbours (other related proteins) for proteins with ids 1P91, 2ZDZ, 1BNL
-
A scientist would like to design a hybridization probe for a Type IIB bleeding disorder gene that he has the sequence for. Which tool would you recommend he uses for his probe design?
Primer3Plus
-
A scientist has just cloned a gene and would like to see if it has any known domains. How should he go about it? Which database do you recommend he uses?
- Get the sequence from NCBI
- Search it against Pfam, SMART, or CDD
-
Which of the following is a domain database?
CDD
-
Which of the following is a Uniprot database?
- A) REFSEQ
- B) UNIPARC
- C) OMIM
- D) ENTREZ
B: UNIPARC
-
Primary Databases
- Text: PubMed
- DNA seq: GenBank, DDBJ, EMBL
- Protein seq: Entrez Proteins, TrEMBL, RefSeq
- Protein structures: PDB
-
Curated databases
- DNA seq: RefSeq, OMIM
- Protein seq: Swiss-Prot, PIR, RefSeq
- Genomes: Entrez Genomes, COGs
-
Protein sequence databases
-
UniProt databases
- UniProtKB: Swiss-Prot/TrEMBL
- UniRef
- UniParc
-
Secondary processed structural databases
-
Protein full-length classification
-
Tools/Programs
- Chromosome location: UCSC Genome Browser, NCBI MapViewer
- Primer/probe design: Primer3Plus
- Pairwise alignments: BLAST Align (bl2seq)
- ID conversion: PIR ID mapping
- Multiple seq alignments: CLUSTALW (progressive method); MUSCLE (iterative method); T-Coffee, Expresso (structure-based method); HMMER (statistical method)
- Homology modeling: MODELLER, SWISS-MODEL
- Model validation: Ramachandran plot
-
KB protein seq
- UNIPROTKB (SP & TrEMBL)
- NCBI (REFSEQ)
-
STRUCTURE SPECIFIC: UNIVERSAL HUB
PDB (PRIMARY)
-
STRUCTURE SPECIFIC: NEIGHBORS
-
STRUCTURE SPECIFIC: STRU. CLASS
-
PRIMARY DB: PROT SEQ
- UNIPROTKB (TrEMBL)
- NCBI (REFSEQ)
-
SEARCH FOR COMPLETE GENOMES
GOLD
-
WHAT IS A DOMAIN
UNIT OF A PROTEIN THAT CARRIES STRUCTURE AND FUNCTION
-
WHAT IS A CONSERVED DOMAIN
PIECE OF PROTEIN THAT IS CONSERVED ACROSS A FAMILY
-
HOW MANY GENES HAVE BEEN IDENTIFIED THAT ARE INVOLVED IN DISEASE?
< 2500
-
SEQ SIMILARITY
- EXTENT TO WHICH NUCLEOTIDE OR PROTEIN SEQUENCES ARE RELATED
- BASED ON IDENTITY PLUS CONSERVATION
-
SEQ CONSERVATION
CHANGES AT A SPECIFIC POSITION OF AN AMINO ACID OR SEQUENCE THAT PRESERVE THE PHYSICO-CHEMICAL PROPERTIES OF THE ORIGINAL RESIDUE
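A small sketch of the idea: treat a substitution as conservative when both residues fall in the same physico-chemical group. The grouping below is one simplified scheme chosen for illustration (C, G and P are deliberately left ungrouped); textbooks and substitution matrices draw these boundaries differently.

```python
# Simplified residue groups (an assumption; other schemes exist).
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
}


def is_conservative(a, b):
    """True if a and b differ but share a physico-chemical group."""
    return a != b and any(a in members and b in members
                          for members in GROUPS.values())


print(is_conservative("Y", "F"))  # both aromatic
print(is_conservative("D", "K"))  # opposite charges
```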
-
PHI VS. PSI BLAST
- PHI: PATTERN-HIT INITIATED SEARCH
- PSI: POSITION-SPECIFIC ITERATED SEARCH
-
PAM
- POINT ACCEPTED MUTATION
- M.O. DAYHOFF
- LOW PAM: SHORT STRONG LOCAL SIMILARITIES
- HIGH PAM: WEAK SIMILARITIES
-
phylogeny
- evolutionary history of an organism
- cornerstone of systematic taxonomy
-
systematics
study of the evolution of biological diversity
-
root
common ancestor of all taxa
-
branch
reflects the relationship b/w taxa according to descent and ancestry
-
node
a taxonomic unit identifying either an existing or extinct species
-
distance scale
scale that represents the number of differences b/w organisms or seq
-
topology
defines the branching patterns of the tree
-
What are the two types of computational methods used in phylogenetic analysis?
Clustering algorithms & Optimality approaches
-
Name four methods that use an optimality criterion.
Parsimony; Maximum Likelihood; Minimum evolution & Least squares
-
Name a few methods that use clustering algorithms.
UPGMA & Neighbor joining
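To show what "clustering algorithm" means here, below is a compact UPGMA sketch: repeatedly merge the two closest clusters and average distances weighted by cluster size. The taxa and distances are invented, branch lengths are omitted, and the tree is returned as nested tuples purely for illustration.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree topology as nested tuples.

    names: list of taxon labels
    dist:  dict mapping (a, b) label pairs to pairwise distances
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Hypothetical distances: A and B are closest, D is the most distant.
distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], distances))
```

Neighbor joining follows the same merge-and-update pattern but corrects for unequal rates between lineages when choosing which pair to join.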
-
How is the decision made on when to use what method?
Based on the levels of similarity
-
Define Taxonomy and Cladistics.
-
What are the three types of trees?
- cladogram
- phylogram
- ultrametric tree
-
How are phylogenetic analyses depicted?
-
What is evolution at the molecular level?
-
What is functional annotation?
-
Why is manual annotation important and absolutely essential?
-
What are the advantages of using PIRSFs?
-
What are the two types of rules? What is the difference between them?
-
What are site rules and what do you absolutely need to create a site rule for propagation?
-
What is the clustering tool used to create PIRSFs?