Foundations of Biology
Every living organism has a genome made up of deoxyribonucleic acid (DNA)
arranged in a double helix. This genetic material is duplicated and passed to
the daughter cells. Organisms are divided into eukaryotes (organisms with a
nucleus and segmented genes with introns and exons, such as primates) and
prokaryotes (usually circular genomes without a nuclear membrane, such as
Bacteria and Archaea). The sequence of DNA is represented as a sequence of
letters, partitioned into chromosomes, over a four-letter alphabet {A,C,G,T}
corresponding to the bases in the helix. Each base is paired with a
complementary base (A with T and C with G), hence we only need one of the two
strands in order to describe a genome (our convention is to represent DNA in
the 5' to 3' direction).
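Because of the pairing rule above, the second strand can always be recovered
from the first. A minimal Python sketch of this (the function name
reverse_complement is our own choice, and the input is assumed to contain only
the letters A, C, G and T):

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also written in the 5' to 3' direction."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice gives back the original strand, which is why
storing one strand is enough.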
Genomes are highly organized and structured. Some sequences represent genes,
which encode proteins. Proteins are made of 20 different amino acids. Within a
gene, every three nucleotides encode one amino acid; such a nucleotide triplet
is called a codon (please refer to the genetic code). Some codons encode the
end (TAA, TAG and TGA) or the beginning (ATG) of the protein.
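The start and stop codons above are enough to sketch a naive open reading
frame (ORF) scan. The following Python example is only an illustration: the
function name find_orfs and the min_codons parameter are our own, only the
given strand is scanned, and real gene finders use far more evidence than
this.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2):
    """Return (start, end) pairs of ORFs on the given strand.

    start is the index of the A in ATG, end is exclusive and includes
    the stop codon. min_codons counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):              # the three reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i               # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None            # look for the next ATG
    return orfs

print(find_orfs("CCATGAAATGGTAAGG"))  # [(2, 14)]
```

To scan the opposite strand as well, the same function can be run on the
reverse complement of the sequence.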
In order to make proteins, DNA is copied into a similar molecule called mRNA in
a process called transcription. The mRNA is then processed in translation into
a string of amino acids that collectively form a protein. Proteins perform
nearly all cellular work, including duplication, digestion and cell death.
Proteins are regulated through tight transcription control and, once a function
is performed, the protein is quickly degraded so that it is inactivated and its
resources are recycled.
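Transcription and translation can be mimicked on strings. The sketch below
assumes the input is the coding strand written 5' to 3' (so the mRNA is the
same sequence with U in place of T) and uses the standard genetic code; the
names transcribe and translate are our own.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna: str) -> str:
    """Turn the coding DNA strand into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate(transcribe("ATGAAATGGTAA")))  # MKW
```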

© 2002 by Bruce Alberts, Alexander
Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter.
Two distinct amino acid sequences that share a common ancestor are called
homologous. The shared ancestry can result from speciation (orthologous) or
from duplication (paralogous). Because we cannot sequence ancestral genomes, it
is hard to prove that two sequences are homologous. However, statistical
arguments can show that it is extremely likely that two sequences are
homologous. It is therefore very important to identify homologous sequences
between the genomes of related species. Closely related species, however, are
too similar at the protein level, in which case one should compare DNA
sequences. Alignment of these sequences can point to parts of the genome that
are under selection and likely to be functional.
Duplicated genes are generally considered to adopt one of three possible fates: nonfunctionalization (silencing of one copy), neofunctionalization (acquisition of a novel function for one copy), or subfunctionalization (partitioning of tissue-specific patterns of expression of the ancestral gene between the two copies) (Lynch and Conery, 2000).
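As an illustration of the sequence alignment mentioned above, here is a
minimal sketch of a global alignment score computed by dynamic programming
(the Needleman-Wunsch recurrence). The scoring values (match 1, mismatch -1,
gap -2) are arbitrary assumptions for the example; real comparisons use
substitution matrices and tuned gap penalties.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best global alignment score of a and b with linear gap costs."""
    # prev[j] holds the best score of aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                      # a[:i] against an empty prefix
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]

print(global_alignment_score("ACGT", "ACT"))  # 1
```

Only two rows of the table are kept, which is enough for the score; recovering
the alignment itself would require storing the full table or the traceback.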
Genomes also undergo duplication. A gene can be duplicated during genome
duplication, or through unequal crossing-over, replication error or
retrotransposition of an endogenous mRNA. Once duplicated, a gene can mutate
and ultimately perform a different function from the original gene, a process
we call NEO-FUNCTIONALIZATION. If the second gene mutates towards complete loss
of function, we call it a PSEUDO-GENE. A third option is that the duplicated
gene maintains the same function but is transcribed at a different time or in a
different tissue, in which case we call it SUB-FUNCTIONALIZATION.
Two genes can also evolve separately and achieve the same function, such as
subtilisin and chymotrypsin. Their sequences are not necessarily similar, in
which case not only the sequence but also the structure of the protein is
important for homolog detection.
Hidden Markov Models allow for the integration of diverse biological
information, such as primary structure (linear sequence), secondary and
tertiary structure (protein folding) and gene structure (introns, exons,
operons, start and stop codons and the open reading frame of a gene).
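To give a feel for how a Hidden Markov Model labels a sequence, here is a toy
two-state example decoded with the Viterbi algorithm. Everything in it is an
assumption made for illustration: the states "H" (GC-rich) and "L" (AT-rich)
and all probabilities are invented, not taken from any real gene model.

```python
import math

# Hypothetical toy model: H = GC-rich region, L = AT-rich region.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1},
         "L": {"H": 0.1, "L": 0.9}}
EMIT = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

def viterbi(seq: str) -> str:
    """Return the most probable state path for seq, one letter per base."""
    if not seq:
        return ""
    # Log probabilities avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []                                   # back pointers per position
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES,
                            key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("GCGCGCGCATATATAT"))  # HHHHHHHHLLLLLLLL
```

A gene-structure model works the same way, with states such as exon, intron
and intergenic region in place of the two toy states.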
Although replication of the genetic material is the main way of ensuring
survival of the offspring, some organisms have evolved ways of taking up
genetic information from the environment. We see many examples of LATERAL GENE
TRANSFER in bacteria, which we can detect as regions of differing genome
content. Genome content can be measured as a bias in nucleotide composition.
For example, if a bacterium lives in a very hot environment, we find its genome
enriched in G paired with C relative to A paired with T, because GC pairing
endures higher temperatures. This GC measure was the first way biologists could
distinguish different species, so you will invariably find that information
reported for all organisms. After lateral gene transfer it is possible to
detect the foreign genetic material by its different GC content.
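The GC measure described above is simple to compute. The sketch below also
flags windows whose GC content deviates from the genome-wide value, as a naive
stand-in for spotting foreign material; the window size, step and deviation
threshold are arbitrary parameters chosen for illustration, and the function
names are our own.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_windows(seq: str, window: int, step: int):
    """GC content of each full window, as (start, gc) pairs."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]

def atypical_windows(seq: str, window: int, step: int, max_deviation: float):
    """Windows whose GC content differs from the whole sequence by more
    than max_deviation."""
    overall = gc_content(seq)
    return [(i, gc) for i, gc in gc_windows(seq, window, step)
            if abs(gc - overall) > max_deviation]

# An AT-rich toy "genome" with a GC-rich insert in the middle.
genome = "AT" * 20 + "GC" * 5 + "AT" * 20
print(atypical_windows(genome, window=10, step=10, max_deviation=0.5))
# [(40, 1.0)]
```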
PROTEINS:
The beginning of a protein is called the N-terminus and the end is called the
C-terminus. Proteins have a complex structure organized in 4 levels: the linear
sequence of amino acids is called the primary structure; the secondary
structure is the local shape formed by interactions of amino acids close
together; the tertiary structure is the three-dimensional architecture; and the
quaternary structure is the complex composed of several interacting protein
chains.
A protein also carries information that helps its transport and localization.
A nuclear localization signal can be located in the middle of the protein, and
a secretion signal is located at the N-terminus. Proteins can be soluble and
free-flowing within the cell or bound in a membrane (transmembrane). Because
the interior of a membrane is highly hydrophobic, membrane-bound proteins are
required to have hydrophobic stretches.
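Hydrophobic stretches can be searched for with a sliding-window average of a
hydropathy scale. The sketch below uses the Kyte-Doolittle scale; the window of
19 residues and the cutoff of 1.6 are values commonly quoted with that scale
for membrane-spanning segments, and are used here as assumptions rather than
as a definitive predictor. The function name is our own.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_stretches(protein: str, window: int = 19,
                          threshold: float = 1.6):
    """Return (start, mean hydropathy) for every window at or above
    the threshold."""
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((i, round(mean, 2)))
    return hits

# A charged toy sequence with a 19-residue isoleucine stretch.
toy = "D" * 5 + "I" * 19 + "D" * 5
print((5, 4.5) in hydrophobic_stretches(toy))  # True
```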
Studies of conformation and function have revealed a main organizational unit:
the protein domain. A protein domain is a structural subset of a protein that
can fold independently into a stable structure. Domains are modular units that
together form the active parts of proteins. They combine to create different
functions and are built from combinations of α helices and β sheets.
For any further biological questions, you may want to check out the free online
books from NCBI.
Positive selection: in human evolution, for example, we can guess what some of
the selective forces are, such as new nutrition, the need to reproduce, and
exposure to disease-causing agents. Traits that have been changed by selection
include the hemoglobin gene variant conferring resistance to falciparum malaria
in sub-Saharan Africa (where the disease is present) and LCT, the adult lactose
digestion gene, which rose to high frequency in some populations after cattle
domestication.
The signature of selection is persistent and stays within a species. Recently
researchers have been able to scan the genome for regions where selection has
occurred, in order to identify first the locus and then the gene (trait). When
a neutral mutation occurs, one that does not affect fitness, its frequency
changes by genetic drift in the population over many generations. When
selection operates, the mutation spreads quickly and may become fixed in the
population (that is, present in 100% of individuals). There are two ways we can
test for selection: 1) study differences between species, or 2) study genetic
variation within a species. For functional mutations under positive selection,
beneficial mutations may become fixed. For example, in the sperm-related gene
PRM1, exon 2 has 6 differences between human and chimp, of which 5 are
functional changes and 1 is silent (Rooney 1999 and Wyckoff 2000 used the Ka/Ks
test, the relative rate test and the MK test).
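The distinction between functional (nonsynonymous) and silent (synonymous)
changes can be made mechanically from the genetic code. The sketch below only
counts differing codons in two aligned, in-frame coding sequences. It is a
crude illustration and not the Ka/Ks test itself, which also normalizes by the
number of synonymous and nonsynonymous sites and corrects for multiple
substitutions. The function name and the toy sequences are our own.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def count_codon_changes(seq1: str, seq2: str):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        codon1, codon2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1          # same amino acid: silent change
        else:
            nonsynonymous += 1       # different amino acid: functional change
    return synonymous, nonsynonymous

# AAA (K) vs AGA (R) is nonsynonymous; CTG (L) vs CTA (L) is synonymous.
print(count_codon_changes("ATGAAACTG", "ATGAGACTA"))  # (1, 1)
```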

Shaffner
(MIT) Science seminars
Initially a locus holds diversity. When an advantageous mutation occurs and
comes under positive selection, a selective sweep eliminates variation in the
region and brings the mutation to high frequency. The diversity of haplotypes
diminishes (that is, a long common haplotype becomes fixed throughout the
population).
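The loss of haplotype diversity after a sweep can be quantified by haplotype
homozygosity: the probability that two haplotypes drawn at random from the
sample are identical. The sketch below is a minimal illustration with made-up
haplotypes; the function name is our own, and haplotype-based selection tests
build on this kind of quantity with considerably more machinery.

```python
from collections import Counter

def haplotype_homozygosity(haplotypes) -> float:
    """Probability that two haplotypes drawn without replacement match."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    identical_pairs = sum(c * (c - 1) for c in counts.values())
    return identical_pairs / (n * (n - 1))

diverse = ["AA", "AC", "CA", "CC"]        # before a sweep: all different
swept = ["ACGT", "ACGT", "ACGT", "ACGT"]  # after a sweep: one haplotype
print(haplotype_homozygosity(diverse))  # 0.0
print(haplotype_homozygosity(swept))    # 1.0
```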
Between-species analysis has limited power, except for rapidly evolving gene
classes (e.g. olfactory genes).
The expected distribution of the statistical tests under neutrality is obtained
by simulation of neutral variation; variants that are outliers are candidates
for positive selection. The best we can do is to apply the tests to datasets
and identify the extreme tail (the outliers of the distribution); this is how
novel candidates were picked up. Frequency of allele vs. haplotype length: long
haplotypes at high frequency are candidates. Some candidate regions contain no
genes but have been shown to be associated with obesity. SLC24A5, which
functions in pigmentation in zebrafish (skin pigment), is a candidate for
selection.
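Picking the extreme tail of an empirical distribution, as described above, can
be sketched in a few lines. The locus names and scores below are invented for
illustration, the function name is our own, and the fraction kept is an
arbitrary choice; being in the tail makes a locus a candidate, not a proven
case of selection.

```python
def extreme_tail(scores: dict, fraction: float = 0.01):
    """Return the loci with the highest scores, keeping the top
    `fraction` of all loci (at least one if any are present)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return [locus for locus, _ in ranked[:n_keep]]

# Hypothetical test statistics for four loci.
toy_scores = {"locus_a": 1.0, "locus_b": 5.0, "locus_c": 3.0, "locus_d": 4.0}
print(extreme_tail(toy_scores, fraction=0.5))  # ['locus_b', 'locus_d']
```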