Foundations of Biology

 

Every living organism has a genome made of deoxyribonucleic acid (DNA) arranged in a double helix. This genetic material is duplicated and passed to the daughter cells. Organisms are divided into eukaryotes (organisms with a nucleus and genes segmented into introns and exons, such as primates) and prokaryotes (organisms without a nuclear membrane and usually with circular genomes, such as Bacteria and Archaea). The DNA sequence is represented as a string of letters, partitioned into chromosomes, over the four-letter alphabet {A,C,G,T}, corresponding to the bases in the helix. Each base is paired with a complementary base on the opposite strand (A with T and C with G), hence we need only one of the two strands in order to describe a genome (our convention is to represent DNA in the 5' to 3' direction).
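The pairing rule above means the second strand can always be recomputed from the first. A minimal sketch in Python, assuming the sequence is an uppercase or lowercase string over {A,C,G,T} (the function name is ours, chosen for illustration):

```python
# Translation table implementing the pairing rule A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, also written in the 5' to 3' direction."""
    # Complement each base, then reverse, because the two strands
    # of the helix run in opposite directions.
    return strand.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original strand, which is why storing one strand is enough.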

Genomes are highly organized and structured. Some sequences represent genes, which encode proteins. Proteins are made of 20 different amino acids. Within a gene, every three nucleotides encode one amino acid; such a triplet is called a codon (please refer to the genetic code). Some codons mark the end (TAA, TAG and TGA) or the beginning (ATG) of the protein.
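As an illustration of reading codons, the sketch below finds the first ATG in a DNA string and translates triplets until a stop codon. It assumes the standard genetic code and a single reading frame; the function name and the use of "X" for unrecognized codons are our own choices, not part of any library.

```python
from itertools import product

# Standard genetic code, listed in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    # Step through complete codons only, staying in the frame set by ATG.
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with unknown bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate_first_orf("CCATGGCTTGGTAACC"))  # MAW
```

In the example, ATG GCT TGG TAA is read as Met, Ala, Trp, stop.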

In order to make proteins, DNA is copied into a similar molecule called mRNA in a process called transcription. The mRNA is then read, in a process called translation, into a chain of amino acids that forms the protein. Proteins perform most cellular work, including duplication, digestion and cell death. Proteins are regulated through tight transcriptional control, and once a function has been performed, the protein is quickly degraded in order to inactivate it and recycle resources.

 

© 2002 by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter.

 

 

Two distinct amino acid sequences that share a common ancestor are called homologous. These could have been inherited through speciation (orthologous) or duplication (paralogous). Because we cannot sequence ancestral genomes, it is hard to prove that two sequences are homologous. However, statistical arguments can show that homology is extremely likely. It is therefore very important to identify homologous sequences between the genomes of related species. Closely related species, however, are too similar at the protein level, in which case one should compare DNA sequences. Alignment of these sequences can point to parts of the genome that are under selection and likely to be functional. Duplicated genes are generally considered to adopt one of three possible fates: nonfunctionalization (silencing of one copy), neofunctionalization (acquisition of a novel function for one copy), or subfunctionalization (partitioning of tissue-specific patterns of expression of the ancestral gene between the two copies) (Lynch and Conery, 2000).
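To make the idea of sequence alignment concrete, here is a minimal sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not parameters taken from this text; real comparisons use substitution matrices and statistical significance estimates.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b under a simple linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]

print(global_alignment_score("ACGT", "ACT"))  # 1
```

The example aligns ACGT with AC-T: three matches and one gap give 3 - 2 = 1.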

 

Genomes also undergo duplication. A gene can be duplicated during genome duplication, or through unequal cross-over, replication error or retro-transposition of an endogenous mRNA. Once duplicated, one copy can mutate and ultimately perform a different function from the original gene, a process we call NEO-FUNCTIONALIZATION. If the second copy mutates towards complete loss of function, we call it a PSEUDO-GENE. A third option is that the duplicated gene maintains the same function but is transcribed at a different time or in a different tissue, in which case we call it SUB-FUNCTIONALIZATION.

Two genes can also evolve separately and arrive at the same function, as with subtilisin and chymotrypsin. Their sequences are not necessarily similar, in which case not only the sequence but also the structure of the protein matters when deciding whether two proteins are homologous. Hidden Markov Models allow for integration of diverse biological information such as primary structure (linear sequence), secondary and tertiary structure (protein folding) and gene structure (introns, exons, operons, start and stop codons and the open reading frame of a gene).

Although replication of the genetic material is the main way of passing genetic information to offspring, some organisms have evolved ways of taking up genetic information from the environment. We see many examples of LATERAL GENE TRANSFER in bacteria, which we can detect through differences in genome content. Genome content can be measured as a bias in nucleotide composition. For example, if a bacterium lives in a very hot environment, we find its genome enriched in G paired with C relative to A paired with T, because GC pairs withstand higher temperatures. GC content was one of the first measures biologists used to distinguish species, so you will find it reported for almost every organism. After a lateral gene transfer, the foreign genetic material can often be detected by its different GC content.
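GC content is simple to compute, and scanning it along a genome in windows is one way to look for regions whose composition differs from the rest. The sketch below is illustrative only; the window and step sizes are arbitrary assumptions, and the function names are ours.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC content of each full window, as (start position, GC fraction) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Toy example: an AT-rich stretch followed by a GC-rich stretch.
print(windowed_gc("AAAAGGGG", window=4, step=4))  # [(0, 0.0), (4, 1.0)]
```

A window whose GC fraction departs strongly from the genome-wide value is a candidate for foreign origin, although composition alone is not proof of transfer.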

PROTEINS:

The beginning of a protein is called the N-terminus and the end is called the C-terminus. Proteins have a complex structure organized in 4 levels: the linear sequence of amino acids is called the primary structure; the secondary structure is the local shape formed by interactions between amino acids close together in the chain; the tertiary structure is the three-dimensional architecture of the whole chain; and the quaternary structure is the complex formed by several interacting protein chains.

A protein also carries information that directs its transport and localization. A nuclear localization signal typically lies within the protein sequence, while a secretion signal is located at the N-terminus. Proteins can be soluble and free-flowing within the cell or bound in a membrane (transmembrane). Because the interior of a membrane is highly hydrophobic, membrane-bound proteins are required to have hydrophobic stretches.

Studies of conformation and function have revealed a main organizational unit: the protein domain. A protein domain is a structural subset of a protein that can fold independently into a stable structure. Domains are modular units that together form the active parts of proteins. They are combined to create different functions and are built from combinations of α helices and β sheets.

 

 

For any further biological questions, you may want to check out the free online books from NCBI.

 

Positive selection: in human evolution, for example, we can guess what some of the selective forces are, such as new nutrition, the need to reproduce and exposure to disease-causing agents. Examples of traits changed by selection are the hemoglobin gene variant conferring resistance to falciparum malaria in sub-Saharan Africa (where the disease is present) and LCT, the adult lactose digestion gene, fixed in the population after cattle domestication.

The signature of selection is persistent and stays within a species. Recently researchers have been able to scan the genome for regions where selection has occurred, in order to identify first the locus and then the gene (trait). When a neutral mutation occurs, it does not affect fitness, and its frequency drifts in the population over many generations. When selection operates, the mutation spreads quickly and may become fixed in the population (that is, present in 100% of individuals). There are two ways to test for selection: 1) study differences between species, or 2) study genetic variation within a species. For functional mutations under positive selection, beneficial mutations may become fixed; for example, comparing human with chimp, exon 2 of the sperm-related gene PRM1 has 6 mutations, of which 5 are functional changes and only 1 is silent (Rooney 1999; Wyckoff 2000, using the KA/KS test, relative rate test and MK test).
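The comparison of functional (amino acid changing) and silent differences can be illustrated with a crude codon-by-codon count between two aligned coding sequences. This is only a sketch under stated assumptions: it uses the standard genetic code, requires gap-free sequences of equal length in the same reading frame, and counts differing codons rather than normalizing by the number of silent and functional sites as a real KA/KS estimate does. The function name is ours.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def count_codon_changes(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (functional, silent) counts of codons that differ between two sequences."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free and a multiple of 3 long")
    functional = silent = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1       # same amino acid: silent change
        else:
            functional += 1   # different amino acid: functional change
    return functional, silent

# GCT -> GCC keeps alanine (silent); AAA -> AGA changes lysine to arginine.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```

An excess of functional over silent changes, as in the 5 versus 1 pattern described above, is the kind of signal the formal tests are designed to evaluate.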

 

Shaffner (MIT) Science seminars

Initially a locus holds diversity. When an advantageous mutation occurs and comes under positive selection, a selective sweep eliminates variation in the region and brings the mutation to high frequency. The diversity of haplotypes diminishes (that is, a long common haplotype becomes fixed throughout the population).
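The loss of haplotype diversity after a sweep can be summarized with haplotype homozygosity, the probability that two haplotypes drawn at random (with replacement) from the sample are identical. The sketch below is an illustration with made-up toy data; published haplotype-based tests are more elaborate.

```python
from collections import Counter

def haplotype_homozygosity(haplotypes: list[str]) -> float:
    """Sum of squared haplotype frequencies in the sample."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((count / n) ** 2 for count in counts.values())

# Hypothetical samples at one locus, before and after a sweep.
before_sweep = ["AA", "AC", "CA", "CC"]  # four different haplotypes
after_sweep = ["AC", "AC", "AC", "AC"]   # one haplotype fixed

print(haplotype_homozygosity(before_sweep))  # 0.25
print(haplotype_homozygosity(after_sweep))   # 1.0
```

Values near 1 indicate that a single haplotype dominates the region, which is the pattern expected after a selective sweep.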

 

Between-species analysis has limited power, but it does detect rapidly evolving gene classes (for example, olfactory genes).

Statistical tests compare the data with what is expected under neutrality, using simulation of neutral variation; variants that are outliers are candidates for positive selection. The best we can do is to apply the tests to datasets and identify the extreme tail (the outliers of the distribution); this is how novel candidates were picked up. Frequency of allele vs. haplotype length: long haplotypes at high frequency are candidates. Some candidate regions contain no genes but have been shown to be associated with obesity. Another candidate for selection is SLC24A5, a gene with a known function in zebrafish skin pigmentation.
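The outlier approach can be sketched as an empirical p-value: each observed score is compared with scores obtained from neutral simulations, and loci in the extreme tail are flagged. The scores, locus names and the 0.01 cutoff below are hypothetical; the neutral simulations themselves are assumed to come from elsewhere.

```python
def empirical_p_value(observed: float, neutral_scores: list[float]) -> float:
    """Fraction of neutral scores at least as extreme as observed (with +1 correction)."""
    exceed = sum(1 for score in neutral_scores if score >= observed)
    # Adding 1 to numerator and denominator avoids reporting a p-value of zero.
    return (exceed + 1) / (len(neutral_scores) + 1)

def flag_outliers(observed_by_locus: dict[str, float],
                  neutral_scores: list[float], alpha: float = 0.01) -> list[str]:
    """Names of loci whose empirical p-value is at or below alpha, sorted by name."""
    return sorted(
        locus for locus, score in observed_by_locus.items()
        if empirical_p_value(score, neutral_scores) <= alpha
    )

# Hypothetical data: 100 neutral scores and two observed loci.
neutral = [float(x) for x in range(100)]
observed = {"locusA": 150.0, "locusB": 50.0}
print(flag_outliers(observed, neutral))  # ['locusA']
```

Loci flagged this way are candidates only; being an outlier does not by itself establish selection.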