Overview
Proteins are the fundamental building blocks of all life, and understanding and classifying them is a crucial step in realising the benefits to human health that are encoded in genome information. Each entry in the Pfam database includes a multiple alignment of protein sequences together with an accompanying statistical model, called a hidden Markov model (HMM).
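An HMM assigns a probability to a sequence by summing over every path through its hidden states, which is what lets it score how well a sequence fits a family. The sketch below is a rough illustration only. It assumes a toy two-state HMM over a hypothetical two-letter alphabet ("H" for hydrophobic, "P" for polar residues) with invented parameters. It computes P(sequence | model) with the forward algorithm and reports a log-odds score in bits against a uniform background model. Pfam's real models are profile HMMs built with the HMMER software, with match, insert and delete states for each alignment column, so this is not Pfam's scoring code.

```python
import math

# Toy, hypothetical model: NOT a Pfam profile HMM. The hidden states and
# all probabilities below are invented purely for illustration.
STATES = ("core", "loop")
START = {"core": 0.5, "loop": 0.5}
TRANS = {
    "core": {"core": 0.9, "loop": 0.1},
    "loop": {"core": 0.1, "loop": 0.9},
}
# Emission probabilities over a hypothetical two-letter alphabet:
# "H" = hydrophobic residue, "P" = polar residue.
EMIT = {
    "core": {"H": 0.8, "P": 0.2},
    "loop": {"H": 0.3, "P": 0.7},
}
# Uniform background (null) model over the same alphabet.
BACKGROUND = {"H": 0.5, "P": 0.5}


def forward(seq, states, start_p, trans_p, emit_p):
    """Return P(seq | HMM) by summing over all hidden-state paths.

    Assumes seq is non-empty and uses only symbols present in emit_p.
    """
    # alpha[s]: probability of emitting the prefix so far and ending in state s
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def log_odds_bits(seq):
    """Log-odds score in bits: log2(P(seq | HMM) / P(seq | background))."""
    p_model = forward(seq, STATES, START, TRANS, EMIT)
    p_null = math.prod(BACKGROUND[c] for c in seq)
    return math.log2(p_model / p_null)


if __name__ == "__main__":
    for s in ("HHHHHH", "PPHPPP", "HPHPHP"):
        print(s, round(log_odds_bits(s), 2))
```

A positive score means the toy model explains the sequence better than the background model does. Profile HMM search tools such as HMMER report scores in bits on the same log-odds principle.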
Proteins are built from one or more regions, called domains, and the particular combination of domains a protein contains can determine its function. Pfam allows users to analyse sequence data and search for related proteins in the database. It also lets users view the structure and domain architecture of any stored protein, examine which species the proteins are found in and inspect multiple alignments. In addition, Pfam stores and gives access to information on higher-level groupings of related protein families, known as clans, whose members are related by sequence similarity, by structural similarity or by statistical comparison of their hidden Markov models.
The database comprises two main collections of information. Pfam-A contains high-quality entries that have been curated manually. To extend the sequence coverage of Pfam, a second collection, Pfam-B, contains automatically generated entries that are of lower quality but add valuable coverage for regions not yet curated into Pfam-A.
The latest version of the Pfam database contains nearly 12,000 curated protein families, but the aim of the project is to develop a comprehensive classification of all known protein sequences. On the way to this ambitious goal, Pfam, as an open-access resource, will continue to speed scientific discovery by sharing all new information as it is added to the database.