BRCA1 Protein Homologs and Sequence Alignments




ClustalW alignment

The following file is the ClustalW alignment of the Homo sapiens BRCA1 protein with its homologs in Pan troglodytes, Mus musculus, Bos taurus, Canis lupus familiaris, Gallus gallus and Caenorhabditis elegans. All default settings were used. [3] (Clicking the link for each species opens the GenPept entry for that species' BRCA1 homolog.)

I have highlighted the regions of the human protein domains in the file in yellow. The region towards the N-terminus is the RING-finger domain, and the two regions towards the C-terminus are BRCT domains. See Protein Motifs and Domains for more information on these regions.

Alignments were also carried out using each of the four substitution matrices available in ClustalW (Blosum, Pam, Gonnet, and Id). All four produced exactly the same phylogenetic tree, which follows the ClustalW file below.

clustalw_protein_alignment.docx
File Size: 146 kb
File Type: docx

T-COFFEE alignment

The following file is the T-COFFEE alignment of the Homo sapiens BRCA1 protein with its homologs in Pan troglodytes, Mus musculus, Bos taurus, Canis lupus familiaris, Gallus gallus and Caenorhabditis elegans. All default settings were used. [4,5] (Clicking the link for each species opens the GenPept entry for that species' BRCA1 homolog.)

Alignments were also carried out using the two matrices available (Blosum and Pam). The phylogenetic trees were identical using both matrices. The tree is shown after the T-COFFEE alignment file below. The alignment score was 64 using Blosum and 63 using Pam.

t-coffee_protein_alignment.docx
File Size: 128 kb
File Type: docx

MUSCLE alignment

The following file is the MUSCLE alignment of the Homo sapiens BRCA1 protein with its homologs in Pan troglodytes, Mus musculus, Bos taurus, Canis lupus familiaris, Gallus gallus and Caenorhabditis elegans. All default settings were used. [1,2] (Clicking the link for each species opens the GenPept entry for that species' BRCA1 homolog.)

muscle_protein_alignment.docx
File Size: 146 kb
File Type: docx

STRING Occurrence view

STRING offers an interesting way of showing homology. Shown below is the occurrence view available from the website. Note that RNF53 is an alias for BRCA1, so the amino acid sequence similarity for BRCA1 is shown in the first column on the left. STRING uses sequence alignments to generate sequence similarities. The darker the square, the more similar the sequence is to the human protein. The proteins shown to the right of BRCA1/RNF53 are proteins known to interact with BRCA1 in humans. From these data, it appears that BRCA1 function may have evolved later than that of H2AFX and RAD51, as those sequences show greater conservation as far back as plants and fungi.

From STRING 8.0, 2009. BRCA1 Occurrence View. Retrieved from string.embl.de/newstring_cgi/show_ajax_phylo_evidence.pl?taskId=0F60PZB3wZ2R&allnodes=1.
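
The idea behind the shading in the occurrence view (more similar sequence, darker square) can be illustrated with a simple percent-identity calculation against the human sequence. This is only a sketch: the measure STRING actually uses is not described here, and the function name `percent_identity`, the labels, and the short aligned sequences are all hypothetical toy values, not real BRCA1 data.

```python
def percent_identity(ref, other):
    """Percent of aligned columns, ignoring gapped ones, where both residues match."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(ref, other):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments (invented for illustration only).
toy_alignment = {
    "human_toy": "MDLSALRVEEVQ",
    "chimp_toy": "MDLSALRVEEVQ",
    "mouse_toy": "MDLSAVRIEEVQ",
    "worm_toy":  "M--SA-RKDE-Q",
}

reference = toy_alignment["human_toy"]
identity_to_human = {
    name: percent_identity(reference, seq) for name, seq in toy_alignment.items()
}

for name, value in identity_to_human.items():
    print(f"{name}: {value:.1f}% identical to human_toy")
```

Gapped columns are skipped here, which makes a very short sequence look more similar than it would if gaps were counted as mismatches; either convention is defensible, but it should be stated when numbers are compared.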

Analysis

All three alignment programs can be accessed through the EMBL-EBI website, and therefore the ease of use for all of these programs was very similar. All three programs allow input of several sequences in FASTA format. The ClustalW program has many more options that can be changed from the default than either T-COFFEE or MUSCLE. The ClustalW algorithm generates scores for subsets of the aligned sequence based on overall similarity, which is an advantage of this program over the others. [1,2,3,4,5]
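
Since all three programs take their input in FASTA format, a minimal sketch of reading that format may be useful. The function name `parse_fasta` and the two short records are hypothetical and invented for illustration; real input would be the full BRCA1 protein sequences taken from the GenPept entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping identifier -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Keep only the first word of the header line as the identifier.
            words = line[1:].split()
            header = words[0] if words else ""
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence lines may be wrapped
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input (invented fragments, not real database records).
example = """>human_toy example fragment
MDLSALRVEE
VQNVINAMQK
>mouse_toy example fragment
MDLSAVRIEE
VQNVLHAMQK
"""

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), seq)
```

Each record starts with a `>` header line, and the sequence may be wrapped over several lines, which the parser joins back together.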

All three programs aligned the sequences very similarly, with the exception of C. elegans. The C. elegans protein is very short, so finding a good match to the other, more closely related homologs was a challenge. The regions of greatest dissimilarity between the homologs were the C-terminal and N-terminal ends.
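
One way to locate the least similar regions of an alignment is to score each column and then average the scores over a sliding window. The sketch below assumes a simple conservation score (the fraction of sequences sharing the most common residue in a column, with gaps counting against the column); the function names and the three toy sequences are hypothetical and are not taken from the alignment files above.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences that share the most common residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n)  # gaps lower the score because n includes them
    return scores


def window_means(scores, width):
    """Mean conservation over each window of `width` consecutive columns."""
    return [
        sum(scores[i:i + width]) / width
        for i in range(len(scores) - width + 1)
    ]


# Toy alignment: conserved in the middle, variable at both ends.
toy = [
    "MKTACDEF",
    "-RTACDQY",
    "--TACD-W",
]

scores = column_conservation(toy)
means = window_means(scores, width=2)
print([round(s, 2) for s in scores])
print([round(m, 2) for m in means])
```

In this toy case the windows at both ends score lowest, which is the same pattern as the poorly matched termini described above.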

ClustalW and T-COFFEE both generate phylogenetic trees based on the data from the alignment. I was unable to find the algorithms used by each program to derive the trees. The trees produced differed dramatically. The order of species in both trees does not match the accepted evolutionary tree, though some of the branches are simply inverted because the program cannot distinguish between the original and later sequences. Another major difference is that the distances between branch points are very different in the two trees, as can be seen above.
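
To show in general terms how a tree can be derived from an alignment, here is a minimal sketch of a distance-based approach: pairwise distances are computed from the aligned sequences, and the two closest clusters are merged repeatedly. It uses UPGMA, a simple textbook clustering method chosen here for brevity; it is not claimed to be the method used by ClustalW or T-COFFEE. The function names, labels, and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatches among columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs):
    """Return a nested-tuple tree built by merging the closest clusters first."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=lambda p: dist[p],
        )
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted average.
        for k in clusters:
            d_ik = dist[frozenset((i, k))]
            d_jk = dist[frozenset((j, k))]
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy aligned fragments (invented for illustration only).
names = ["human_toy", "chimp_toy", "mouse_toy", "chicken_toy"]
seqs = ["MDLSALRVEE", "MDLSALRVEQ", "MDLSAVRIEQ", "MELSTVKIDQ"]

tree = upgma(names, seqs)
print(tree)
```

Because the result depends on both the distance measure and the clustering rule, two programs given the same alignment can produce trees with different branching orders and branch lengths.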

The STRING view of protein homology is a little more visual. The phylogenetic tree on the left is the accepted tree based on many proteins, and is not generated from the similarity of the proteins of interest shown. You can see a pattern in the colors, with species closer to humans being darker red not only for BRCA1, but also for its interacting proteins. For species that are still closely related to humans but show less homology than expected, all of the proteins tend to show lower similarity, not just BRCA1. This may suggest that compensatory mutations occurred in these other proteins, allowing them to retain their interactions in these organisms.


[1] Dereeper, A., Guignon, V., Blanc, G., Audic, S., Buffet, S., Chevenet, F., Dufayard, J.F., Guindon, S., Lefort, V.,
     Lescot, M., Claverie, J.M., Gascuel, O. (2008). Phylogeny.fr: robust phylogenetic analysis for the non-specialist.
     Nucleic Acids Research, 36(Web Server issue): W465-9. doi:10.1093/nar/gkn180.
[2] Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic
     Acids Research, 32(5): 1792-1797. doi:10.1093/nar/gkh340.

[3] Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace,
     I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J. and Higgins, D.G. (2007). ClustalW and ClustalX version
     2. Bioinformatics, 23(21): 2947-2948. doi:10.1093/bioinformatics/btm404.
[4] Notredame, C., Higgins, D.G., and Heringa, J. (2000). T-Coffee: a novel method for fast and accurate multiple
     sequence alignment. Journal of Molecular Biology, 302(1): 205-217. doi:10.1006/jmbi.2000.4042.

[5] Poirot, O., O'Toole, E., and Notredame, C. (2003). Tcoffee@igs: A web server for computing, evaluating, and
     combining multiple sequence alignments. Nucleic Acids Research, 31(13): 3503-3506. doi:10.1093/nar/gkg522.

Site created by Jessica D. Kueck
Genetics 677 Assignment, Spring 2009
University of Wisconsin-Madison