The two-language problem refers to the fact that many scientific codes are prototyped in a slow but flexible language (to test an idea quickly) but then have to be moved to a faster but less flexible language for practical applications. Translating from one language to another both slows things down and introduces bugs.
How does Julia address the two-language problem?
It allows flexible prototyping, and the prototype can then be turned into fast, optimized code in the same language rather than being rewritten in a lower-level one
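A minimal sketch of the idea (illustrative Julia; the function is made up for this example): the same untyped, prototype-style code is just-in-time compiled to specialized, fast machine code for each concrete argument type, so the prototype never has to be rewritten in another language.

# A generic "prototype" function: no type annotations needed, so it stays flexible.
function mean_square(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x^2            # works for any numeric element type
    end
    return s / length(xs)
end

mean_square([1, 2, 3])          # compiled specialized for Vector{Int}
mean_square([1.0, 2.5, 3.0])    # compiled specialized for Vector{Float64}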
When was DNA discovered as the physical carrier of hereditary information (also which experiment?)
1944, Oswald Avery (with Colin MacLeod and Maclyn McCarty)
"transforming principle"
building on Griffith's R and S strain pneumococcus experiment in mice
What was discovered in 1950 by Erwin Chargaff?
that the amounts of A and T are equal, as are the amounts of G and C (Chargaff's rules), pointing toward complementary base pairing between A–T and G–C
What was discovered by Walther Flemming, 1878?
DNA is densely coiled and packed into chromosomes
Overview of basic DNA terms (~10 terms)
Where in DNA might we use motif discovery?
to identify transcription factor binding sites
Facts about the human genome:
all cells have the same genome (all cells came from repeated duplication events starting from an initial cell)
human genome is 99.9% similar between individuals
human genome is 3 billion bp long (~1.8 m stretched out)
Why do we study Bioinformatics?
DNA sequencing has created massive amounts of data that can only efficiently be analyzed with computers
As the information becomes greater and more complex, more computational tools are needed to sort through the data
DNA hard drives
the memory in every cell that tells how to create the organism it is found in
potential way to store computer information
Researchers were able to encode PDFs of Watson and Crick's paper, Shakespeare's sonnets, and an audio file of MLK's "I Have a Dream" speech in DNA
Describe a basic computational problem:
You have a specific input and a task you want to solve; the result is the output
Principles of algorithm design:
Exhaustive search or brute force
Branch-and-bound
Dynamic programming
Divide-and-conquer
Randomization
Heuristics
Time complexity of an algorithm (RAM model):
Time Complexity: Basic operations take constant time. The running time of an algorithm is the total number of basic operations executed.
What are basic operations and what are some examples?
They are basic computations performed by an algorithm, largely independent of the programming language
Examples:
–Evaluating an expression
–Assigning a value to a variable
–Indexing into an array
–Calling a method
–Returning from a method
–Comparing two values
Inspect the pseudocode. What is the maximum number of basic operations executed by the algorithm, as a function of the input size?
Total: 6n-3
finding an index = 1 operation
assigning a value = 1 operation
each operation inside the loop is executed once per iteration, so its cost is multiplied by (n-1), the number of iterations
*** we focus on characteristic operations***
What is the characteristic operation here?
the operation inside the loop body, executed n-1 times
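A sketch of this style of counting (illustrative Julia on a simple find-the-maximum loop, not the pseudocode behind the 6n-3 total; the exact count depends on which operations you choose to count):

# Find the maximum of the first n elements of A.
function array_max(A, n)
    currentMax = A[1]            # 1 index + 1 assignment           = 2 ops
    for i in 2:n                 # the loop body runs n-1 times
        if A[i] > currentMax     # 1 index + 1 comparison           = 2 ops per iteration
            currentMax = A[i]    # at most 1 index + 1 assignment   = 2 ops per iteration
        end
    end
    return currentMax            # 1 op
end
# Worst case (increasing input): 2 + 4(n-1) + 1 = 4n - 1 basic operations.
# The characteristic operation is the comparison A[i] > currentMax, executed n-1 times.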
When analyzing an algorithm, what is the goal?
What is best, worst, and average case?
Show the algorithm is correct, and analyze its running time:
Most algorithms transform input objects into output objects, and their running time grows with the input size.
–Identify basic operations and problem size.
–Count number of basic operations as function of problem size.
Worst case: MAXIMUM number of basic operations over ALL inputs of size n.
Best case: MINIMUM number of basic operations over ALL inputs of size n.
Average case: AVERAGE number of basic operations over ALL inputs of size n.
What is an algorithm?
An algorithm for solving a computational problem is a well-defined procedure that takes a set of values as input and produces the output in a finite number of steps.
What are the main ways we have looked at analyzing an algorithm?
Time (runtime) and space (memory)
Example: Insertion sort
What is the best, average, and worst case?
Best case: n-1 comparisons (input already sorted)
Average case: about n²/4 comparisons, i.e. Θ(n²)
Worst case: each new element is smaller than everything in the already sorted prefix (reverse-sorted input), giving n(n-1)/2 comparisons
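A minimal Julia sketch of insertion sort (illustrative; not necessarily the course's pseudocode), with the comparison counts behind these cases noted at the end:

function insertion_sort!(A)
    n = length(A)
    for i in 2:n
        key = A[i]
        j = i - 1
        # shift elements of the sorted prefix A[1..i-1] that are larger than key
        while j >= 1 && A[j] > key      # this comparison is the characteristic operation
            A[j+1] = A[j]
            j -= 1
        end
        A[j+1] = key
    end
    return A
end
# Best case (already sorted):    n-1 comparisons, each inner loop stops immediately.
# Worst case (reverse sorted):   1 + 2 + ... + (n-1) = n(n-1)/2 comparisons.
# Average case (random order):   about half the worst case, still Θ(n²).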
big-Oh:
if f(n) is asymptotically less than or equal to g(n)
big-Omega:
if f(n) is asymptotically greater than or equal to g(n)
big-Theta:
if f(n) is asymptotically equal to g(n)
Which algorithm has the better run time?
Algorithm 2 (linear)
The big-Oh notation gives an upper bound on the growth rate of a function
The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n)
We can use the big-Oh notation to rank functions according to their growth rate
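In symbols (standard form): f(n) is O(g(n)) if there exist a constant c > 0 and an integer constant n₀ ≥ 1 such that f(n) ≤ c·g(n) for all n ≥ n₀.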
What are the three Big-Oh rules?
What should we think of when we see f<g?
Let f<g denote "f grows asymptotically slower than g"
i.e. f∈o(g); then you can use the rule for poly-logarithmic functions below:
for asymptotic comparisons in Big-Oh notation, how should we compare: n, log, log(log)?
powers of n dominate powers of log n, which dominate powers of log log n
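For example (illustrative): (log log n)^10 ∈ o((log n)^2), and (log n)^100 ∈ o(n^0.01).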
Describe big-Omega in terms of a constant (c >0) and an integer constant
Describe big-Theta in terms of constants (c' and c'' > 0) and an integer constant
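In the standard textbook form (the course's exact wording may differ):
f(n) is Ω(g(n)) if there exist a constant c > 0 and an integer constant n₀ ≥ 1 such that f(n) ≥ c·g(n) for all n ≥ n₀.
f(n) is Θ(g(n)) if there exist constants c' > 0 and c'' > 0 and an integer constant n₀ ≥ 1 such that c'·g(n) ≤ f(n) ≤ c''·g(n) for all n ≥ n₀.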
More big-Oh rules (big-Theta):
when is f∈Θ(g)?
More big-Oh rules (little oh):
f∈o(g)
More big-Oh rules (little omega):
f∈ω(g)
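Standard forms of these three (the course's exact wording may differ):
f∈Θ(g) exactly when f∈O(g) and f∈Ω(g).
f∈o(g) if for every constant c > 0 there is an integer constant n₀ ≥ 1 with f(n) ≤ c·g(n) for all n ≥ n₀ (f grows strictly slower than g).
f∈ω(g) if g∈o(f), i.e. f grows strictly faster than g.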
Little 'oh' includes Big Theta?
T/F
False
More big-Oh rules:
all logarithms are asymptotically equivalent
T/F
True
More big-Oh rules:
asymptotically, exponential functions grow slower than polynomials and logarithms
T/F
False
Lexicographical ordering for the following functions:
a.
b.
c.
d. 1
e.
a. (0,-1, 0)
b. (0, -0.5, 0)
c. (1, 0, -1)
d. (0, 0, 0)
e. (0, 0, 1)
Rank the following in order from fastest to slowest:
a.
b.
c.
d. 1
e.
c, e, d, b, a
The class of problems for which a proposed solution can be checked (verified) in polynomial time, but for which no polynomial-time algorithm for computing a solution is known
NP
Define when problem P is called intractable
if there exists no polynomial-time algorithm to solve it
Intractability is a property of the problem, not of any one algorithm for solving it.
note: just because we don't know the poly-time algorithm for P, does not make P intractable
Why are polynomial time problems important?
Poly-time is important because, for large problem sizes, non-poly-time algorithms will take forever to execute
How is a polynomial time algorithm defined?
A polynomial-time algorithm has a worst-case running time bounded by a polynomial function of the input size, i.e. O(n^k) for some constant k
How can we group different sort of computational problems into three categories?
1. Problems for which we know poly-time algorithms
2. Problems that have been proven to be intractable
3. Problems for which we don’t know a poly-time algorithm, but no one has proven that one doesn’t exist either
Example of a problem discussed that requires a non-polynomial amount of output/processing time
Determining all Hamiltonian Circuits
(n-1)! circuits in the worst case
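A brute-force sketch in Julia (illustrative, not from the course): fix a start vertex and extend the tour over every ordering of the remaining vertices, which is why a complete graph gives (n-1)! circuits to examine.

# Count Hamiltonian circuits in a graph given by a Boolean adjacency matrix,
# fixing vertex 1 as the start (each undirected circuit is counted once per direction).
function count_hamiltonian_circuits(adj)
    n = size(adj, 1)
    count = 0
    function extend(path, visited)
        if length(path) == n
            if adj[path[end], path[1]]   # close the circuit back to the start
                count += 1
            end
            return
        end
        for v in 2:n
            if !visited[v] && adj[path[end], v]
                visited[v] = true
                push!(path, v)
                extend(path, visited)
                pop!(path)
                visited[v] = false
            end
        end
    end
    visited = falses(n)
    visited[1] = true
    extend([1], visited)
    return count
end

adj = [i != j for i in 1:4, j in 1:4]   # complete graph on 4 vertices
count_hamiltonian_circuits(adj)          # 3! = 6 = (n-1)! circuits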
Example of a problem discussed for which no algorithm can always compute the correct solution?
The Halting Problem
these are undecidable problems and very few exist; the Halting Problem is a classic
What are some problems in the "Unknown Category" of computational problem?
Many problems belong to the category:
0-1 Knapsack, Traveling Salesperson, m-coloring, Hamiltonian Circuits, etc.
What does the "Unknown Category" mean when describing a computational problem?
We don't know whether they are tractable or intractable