Bioinformatics Notes – Get to know your proteins

Please feel free to contact me with any questions: Carolina Dallett – cdallett@gmail.com

              One of the goals of bioinformatics is to infer the function of a protein from its basic properties: sequence and structure. It is sometimes hard to determine protein function through bench work alone, and bioinformatics can be a powerful predictive tool to guide further experiments. In this module we would like to teach you the tools of protein phylogenetic analysis, that is, the study of the hierarchical evolutionary relationships of organisms and their genomes (the study of the “intersection of Evolution and Genomics”, Eisen, J 2003).

              Inference is a multistep process involving selection of homologs, multiple sequence alignment (MSA) and phylogenetic tree construction, followed by analysis of annotations and tree topology. When inferring molecular function by homology, it is critical to confirm that the query and hit have the same overall fold (i.e., are globally alignable), that they are orthologous, and that the database annotation of the hit is based on experiment. As you attempt to infer the function of your protein of interest, keep in mind the following caveats, which can cause errors in your investigation and its outcome: gene duplication (which enables neofunctionalization of a set of proteins on the same structural template), domain shuffling (domain fusion or fission may give you only a local region of homology, so the function may belong to a different region of the protein), database error (yes, the GenBank database carries many annotation errors) and evolutionary distance (although proteins share a common ancestor, they may have evolved different functions). (Adapted from Brown and Sjolander 2006).
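To make two of these checks concrete, here is a minimal Python sketch that flags a hit whose alignment covers only part of the query or the hit (a hint of domain shuffling) or whose annotation is not experimental. The Hit record, its field names, the evidence labels and the 0.7 coverage cutoff are all assumptions made for illustration; they do not come from any particular tool. Orthology cannot be judged from alignment coordinates and still has to be read from the phylogenetic tree.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise alignment between a query and a database hit.

    All field names are hypothetical; coordinates are 1-based and inclusive.
    """
    query_len: int
    hit_len: int
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int
    evidence: str  # assumed labels: "experimental" or "predicted"


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence that falls inside the aligned region."""
    return (end - start + 1) / length


def is_globally_alignable(hit: Hit, min_cov: float = 0.7) -> bool:
    """True if the alignment spans most of BOTH sequences.

    The 0.7 cutoff is an illustrative assumption, not a published standard.
    """
    query_cov = coverage(hit.query_start, hit.query_end, hit.query_len)
    hit_cov = coverage(hit.hit_start, hit.hit_end, hit.hit_len)
    return query_cov >= min_cov and hit_cov >= min_cov


def transfer_warnings(hit: Hit, min_cov: float = 0.7) -> list[str]:
    """Collect reasons to distrust transferring the hit's annotation."""
    warnings = []
    if not is_globally_alignable(hit, min_cov):
        warnings.append("local alignment only: possible domain shuffling")
    if hit.evidence != "experimental":
        warnings.append("hit annotation is not based on experiment")
    return warnings


if __name__ == "__main__":
    # A 300-residue query matching only one region of a 900-residue hit.
    example = Hit(300, 900, 10, 290, 400, 680, "predicted")
    for message in transfer_warnings(example):
        print(message)
```

Coverage is checked on both sequences because a short query can align end to end with a single domain of a much longer multi-domain hit; looking at query coverage alone would miss that case.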

TABLE OF CONTENTS

      I.       Foundations of Biology

  II.       Basic Bioinformatics:

A.  Overview of Online tools, Databases and Resources

B.   Gathering Homologs using BLAST and FlowerPower

C.  Creating, Viewing and Editing MSA and using Belvu

D.  Protein Structure Prediction (web servers and HMM methods)

E.   Protein cellular localization

F.   Predicting key residues

G. Protein-protein interactions and pathway inference

H. Tree Construction

I.      Analysis and how to obtain a FASTA-formatted sequence

III.       Frequently Asked Questions

IV.       Unix for beginners

  V.       LINKS

VI.       Handy references