        PFAM : Multiple alignments and profile HMMs of protein domains

                              RELEASE 5.1
                 --------------------------------------

1. INTRODUCTION

Pfam is a collection of protein family alignments which were constructed
semi-automatically using hidden Markov models (HMMs). Sequences that were
not covered by Pfam were clustered and aligned automatically, and are
released as Pfam-B. Pfam families have permanent accession numbers and
contain functional annotation and cross-references to other databases,
while Pfam-B families are re-generated at each release and are
unannotated.

See http://www.sanger.ac.uk/Software/Pfam/
    http://pfam.wustl.edu/
    http://www.cgr.ki.se/Pfam/

2. STATISTICS

                             Pfam                           Pfam-B
                ------------------------------  ------------------------------
Release  Date   families  sequences   residues  families  sequences   residues  Source
-------  -----  --------  ---------  ---------  --------  ---------  ---------  --------
0.2      01/96       100      10431    2246421     11763      32081    9200334  Swiss 32
1.0      04/96       175      15610    3560959     11929      31931    8957230  Swiss 33
2.0      03/97       527      28170    6770529     13289      31349    8224614  Swiss 34
2.1      10/97       527      28205    6790960     13289      31349    8224614  Swiss 34
3.0      06/98       806      99043   22766133     33550      79544   20648530  Swiss 35 + SP-TrEMBL 5
3.1      09/98      1313     114750   27573470     33550      79544   20648530  Swiss 35 + SP-TrEMBL 5
3.2      10/98      1344     115155   27689081     33550      79544   20648530  Swiss 35 + SP-TrEMBL 5
3.3      12/98      1390     119420   28085438     33550      79544   20648530  Swiss 35 + SP-TrEMBL 5
3.4      01/99      1407     119963   28343136     33550      79544   20648530  Swiss 35 + SP-TrEMBL 5
4.0      05/99      1465     147347   34476183    128689     123610   33470292  Swiss 37 + SP-TrEMBL 9
4.1      07/99      1488     148195   34692597     36739      89640   22510097  Swiss 37 + SP-TrEMBL 9
4.2      08/99      1664     155979   36683193     40017      99587   24062200  Swiss 37 + SP-TrEMBL 9
4.3      09/99      1815     161833   37803491     39506      97492   23115975  Swiss 37 + SP-TrEMBL 9
4.4      11/99      2000     164412   38411490     39200      96055   22552453  Swiss 37 + SP-TrEMBL 9
5.0      01/00      2008     178110   41516321     39228      96077   22506088  Swiss 38 + SP-TrEMBL 11
5.1      02/00      2015     179782   41704446     42357     103709   24762358  Swiss 38 + SP-TrEMBL 11

3. DESCRIPTION OF CHANGES MADE SINCE RELEASE 5.0

Pfam 5.1 is based on Swiss-Prot 38 and SP-TrEMBL 11 sequences. These
databases can be accessed from

    ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release/
    ftp://ftp.ebi.ac.uk/pub/databases/trembl/

Release 5.1 contains 7 new families since the last release.

There are no format changes since the last release.

We are grateful to the many people who contributed data: Rob Finn,
Matthew Bashton, Chris Ponting, Peer Bork, Joerg Schultz, Richard Copley,
Tim Dudgeon, Harold Hutter, Anton Enright as well as many others.

4. FUTURE FORMAT CHANGES

The next release will contain a further type of Database reference for
cross-links to the Interpro database:

    DR   INTERPRO; IPR000001;

5. DESCRIPTION OF RELEASE FILES

relnotes.txt - This file.
userman.txt  - A fuller description of Pfam fields.
Pfam-A.full  - Annotation and full alignments in Pfam format of all
               Pfam-A families.
Pfam-A.seed  - Annotation and seed alignments in Pfam format of all
               Pfam-A families.
Pfam-B       - All Pfam-B families.
swissPfam    - Pfam domain organisation of all Swiss-Prot proteins.
Pfam         - All Pfam-A HMMs in an HMM library searchable with the
               hmmpfam program.
PfamFrag     - All Pfam-A HMMs in fs (fragment search) mode in an HMM
               library searchable with the hmmpfam program.
diff         - A list of files for each family that have changed since
               the last release.

6. DESCRIPTION OF FIELDS

Compulsory fields:
------------------

AC   Accession number:          Accession number in form PFxxxxx or PBxxxxxx.
ID   Identification:            One word name for family.
DE   Definition:                Short description of family.
AU   Author:                    Authors of the entry.
AL   Alignment method of seed:  The method used to align the seed members.
SE   Source of seed:            The source suggesting the seed members belong
                                to one family.
GA   Gathering method:          Search threshold to build the full alignment.
TC   Trusted Cutoff:            Lowest sequence score and domain score of
                                match in the full alignment.
NC   Noise Cutoff:              Highest sequence score and domain score of
                                match not in full alignment.
SQ   Sequence:                  Number of sequences in alignment.
//                              End of alignment.

Optional fields:
----------------

DC   Database Comment:          Comment about database reference.
DR   Database Reference:        Reference to external database.
RC   Reference Comment:         Comment about literature reference.
RN   Reference Number:          Reference Number.
RM   Reference Medline:         Eight digit medline UI number.
RT   Reference Title:           Reference Title.
RA   Reference Author:          Reference Author.
RL   Reference Location:        Journal location.
PI   Previous identifier:       Record of all previous ID lines.
KW   Keywords:                  Keywords.
CC   Comment:                   Comments.

7. REFERENCES

Papers on Pfam are listed below:

  i) Sonnhammer ELL, Eddy SR, Durbin R.
     Proteins: Structure, Function and Genetics 28:405-420 (1997).

 ii) Sonnhammer ELL, Eddy SR, Birney E, Bateman A, Durbin R.
     Nucleic Acids Research 26:320-322 (1998).

iii) Bateman A, Birney E, Durbin R, Eddy SR, Finn RD, Sonnhammer ELL.
     Nucleic Acids Research 27:260-262 (1999).

 iv) Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer ELL.
     Nucleic Acids Research 28:263-266 (2000).

We suggest that you reference the most recent paper.

8. COPYRIGHT NOTICE

Pfam - A database of protein domain family alignments and HMMs
Copyright (C) 1996-1999 The Pfam consortium.

This database is free; you can redistribute it and/or modify it under the
terms of the GNU Library General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.

In summary, you are free to redistribute *verbatim* copies of Pfam or any
Pfam files in any way you like, including packaging Pfam in proprietary
software, so long as your copy of Pfam retains our copyright notice and
the GNU license. You may also make *modified* copies of Pfam and
distribute them, but your derivative database must be freely distributed
under the GNU LGPL.

Many academic freeware licenses prohibit any form of commercial use. In
contrast, the intent of our license is that Pfam should be freely
available to both industrial and academic researchers, including the use
of the Pfam database in commercial software; however, proprietary
modifications of the Pfam database itself are prohibited. Proprietary
modification of the Pfam database is possible only by a separate formal
licensing agreement from the Pfam consortium and our host institutions.

See the file GNULICENSE for the full text of the GNU Library General
Public License.

This database is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public
License for more details.

You may also obtain a copy of the GNU LGPL by writing to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Pfam is maintained by a consortium of researchers. You can contact the
Pfam consortium at:

    pfam-admin@sanger.ac.uk

The current members of the Pfam consortium are:

Alex Bateman, Ewan Birney, Kevin Howe, Lorenzo Cerutti, Richard Durbin:
    The Sanger Centre, UK
Erik Sonnhammer, Christian Storm, Michael Asman:
    Karolinska Institute, Sweden
Sean Eddy, Ajay Khanna, Christian Zmasek:
    Washington University, St Louis, USA

___________________
The Pfam Consortium
February 2000
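
The two-letter field codes described in section 6 lend themselves to
simple line-oriented parsing. The following Python sketch is an
illustration only and is not part of the Pfam distribution. It assumes
that each annotation line begins with a two-letter code followed by
whitespace and a value, that alignment rows follow the SQ line, and that
"//" ends the entry; the exact layout is described in userman.txt and may
differ. The accession PF99999, the family name and the sequences in the
example are hypothetical. The DR value is the one quoted in section 4.

```python
# Sketch only: the line layout (two-letter code, whitespace, value) is an
# assumption; consult userman.txt for the actual Pfam format.

COMPULSORY = ("AC", "ID", "DE", "AU", "AL", "SE", "GA", "TC", "NC", "SQ")


def parse_entry(lines):
    """Return (fields, alignment) for one entry, stopping at '//'."""
    fields = {}          # field code -> list of values, in file order
    alignment = []       # raw alignment rows found after the SQ line
    in_alignment = False
    for raw in lines:
        line = raw.rstrip()
        if line.startswith("//"):
            break                      # end of alignment / end of entry
        if not line:
            continue                   # skip blank lines
        if in_alignment:
            alignment.append(line)
            continue
        code, value = line[:2], line[2:].strip()
        fields.setdefault(code, []).append(value)
        if code == "SQ":
            in_alignment = True        # everything after SQ is alignment
    return fields, alignment


def missing_compulsory(fields):
    """List the compulsory field codes that the entry does not contain."""
    return [code for code in COMPULSORY if code not in fields]


def parse_db_reference(value):
    """Split a DR value such as 'INTERPRO; IPR000001;' into its parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


# Hypothetical entry, deliberately incomplete, for demonstration only.
EXAMPLE = """\
AC   PF99999
ID   example_dom
DE   Hypothetical family used only for illustration
DR   INTERPRO; IPR000001;
SQ   2
seqA/1-5   ACDEF
seqB/1-5   ACD-F
//
""".splitlines()

fields, alignment = parse_entry(EXAMPLE)
print(fields["AC"], parse_db_reference(fields["DR"][0]))
print("missing:", missing_compulsory(fields))
```

Values are kept as lists because several codes (for example DR and CC)
can occur more than once in an entry.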