This readme should help you get started with "pfam_scan.pl", which is for use
with the HMMER3 version of HMMER.

--------------------------------------------------------------------------------
- Setting up -------------------------------------------------------------------
--------------------------------------------------------------------------------

First download the tarball 'PfamScan.tar.gz', which is located in the same
directory as this README file.

Unpack the script
=================

shell% tar zxvf PfamScan.tar.gz
...
shell% cd PfamScan

Install HMMER3
==============

1) Get the tarball of the latest HMMER3 release from the HMMER site at
   Janelia Farm. For example:

   shell% wget ftp://selab.janelia.org/pub/software/hmmer3/3.0/hmmer-3.0.tar.gz

2) Uncompress the tarball:

   shell% tar zxvf hmmer-3.0.tar.gz

3) Compile the HMMER3 source code (optional: if you download the precompiled
   binaries instead, skip to the section about adding HMMER3 binaries to your
   path). It is recommended that you compile the source code with your system
   compiler rather than use the precompiled binaries. HMMER usually compiles
   in a few easy steps. For detailed instructions, see the HMMER3 INSTALL
   guide.

   i) Change to the HMMER source directory:

      shell% cd hmmer-3.0

   ii) Run "configure". For best performance, and depending on the
       architecture of the machine that will be running HMMER, it's
       recommended that you use the Intel compiler, "icc", to build the HMMER
       binaries. You can also use "gcc", but note that older versions (<3.3)
       will not correctly compile the code. You can tell "configure" which
       compiler to use by setting the "CC" variable:

       shell% ./configure CC=icc LDFLAGS="-static" --prefix=/path/to/install/hmmer3

   iii) Run make:

        shell% make

   iv) Check that everything is compiled and working, by running:

       shell% make check

   v) Install HMMER:

      shell% make install

Adding HMMER3 binaries to your path
===================================

You need to make sure that the HMMER binaries are found on your shell's
executable search path.
If you compiled HMMER as described above, "make install" places the binaries
in the "bin" directory under the prefix that you gave to "configure".

If you are using bash:

bash% export PATH=/path/to/install/hmmer3/bin:$PATH

or, if you are using csh (or tcsh):

csh% setenv PATH /path/to/install/hmmer3/bin:$PATH

Note: if you are using the pre-compiled binaries, just point to those, e.g.
/path/to/uncompressedTar/hmmer-3.0b3/binaries.

Non-standard Perl dependencies
==============================

The PfamScan.pm module depends on several modules that don't come as part of a
standard Perl distribution, notably the Moose framework. Everything that you
need can be installed using the "cpan" tool. You'll need to make sure that
you've already configured your "cpan" environment, and then:

shell% cpan Moose

Moose itself has quite a few dependencies, so don't worry if it looks like
you're installing half of CPAN!

PfamScan.pm also requires bioperl. We have currently only tested against
bioperl 1.4, and we believe it works with 1.6. You can install bioperl via
CPAN, or you can download bioperl 1.4 from here:

http://bioperl.org/DIST/bioperl-1.4.tar.gz

Adding Pfam Modules to your PERL5LIB
====================================

If you are using bash:

bash% export PERL5LIB=/path/to/pfam_scanDir:$PERL5LIB

or for C-shells:

csh% setenv PERL5LIB /path/to/pfam_scanDir:$PERL5LIB

Change the path of Perl in pfam_scan.pl
=======================================

Open PfamScan/pfam_scan.pl in a text editor, and change the first line of the
code to point to your Perl.

You should be good to go now!
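
Before running a search, it can help to confirm that the setup steps above
took effect. The following is a minimal sketch in Python, not part of
PfamScan: it checks that the HMMER3 programs named in this README ("hmmscan",
"hmmalign", "hmmpress") are on the search path and that Perl can load Moose
and bioperl. The function names are hypothetical, and "Bio::SeqIO" is used
only as a representative bioperl module; it is an assumption that loading it
is a fair proxy for what PfamScan.pm needs.

```python
import shutil
import subprocess

# Program names taken from this README.
HMMER_TOOLS = ["hmmscan", "hmmalign", "hmmpress"]

# Moose is named in this README; Bio::SeqIO is an assumed stand-in for bioperl.
PERL_MODULES = ["Moose", "Bio::SeqIO"]


def missing_tools(tools, path=None):
    """Return the tools that cannot be found on the search path.

    If path is None, the current PATH environment variable is used.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def perl_module_available(module):
    """Return True if 'perl -M<module> -e 1' succeeds, False otherwise."""
    try:
        result = subprocess.run(
            ["perl", "-M" + module, "-e", "1"],
            capture_output=True,
        )
    except FileNotFoundError:
        # No perl executable on the search path at all.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for tool in missing_tools(HMMER_TOOLS):
        print("not on PATH:", tool)
    for module in PERL_MODULES:
        if not perl_module_available(module):
            print("Perl cannot load:", module)
```

If the script prints nothing, the programs and modules it looks for were
found. It does not check PERL5LIB or the first line of pfam_scan.pl.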
--------------------------------------------------------------------------------
- Running searches using "pfam_scan.pl" ----------------------------------------
--------------------------------------------------------------------------------

Download Pfam data files
========================

You will need to download the following files from the Pfam FTP site
(ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/):

Pfam-A.hmm
Pfam-A.hmm.dat
Pfam-B.hmm
Pfam-B.hmm.dat
active_site.dat

You will need to generate binary files for Pfam-A.hmm and Pfam-B.hmm by
running the following commands:

shell% hmmpress Pfam-A.hmm
shell% hmmpress Pfam-B.hmm

Using pfam_scan.pl
==================

"pfam_scan.pl" is a program that searches a FASTA file against a library of
Pfam HMMs.

REQUIREMENTS
============

To recap, "pfam_scan.pl" requires:

 - Several modules written by Pfam, all of which are available as part of
   this tarball
 - Bioperl and HMMER3 installed (hmmscan and hmmalign should be in your path)
 - Pfam-A.hmm (and binaries): a data file that contains the Pfam-A library of
   HMMs
 - Pfam-B.hmm (and binaries): a data file that contains the Pfam-B library of
   HMMs
 - Pfam-A.hmm.dat: a data file that contains information about each Pfam-A
   family
 - Pfam-B.hmm.dat: a data file that contains information about each Pfam-B
   family
 - active_site.dat: a data file needed for the -as option, which contains
   active site information about each family
 - a FASTA-format file containing your query sequence(s)

Usage
=====

pfam_scan.pl -fasta <fasta_file> -dir <directory location of Pfam files>

Additional options:

  -h             : show this help
  -o <file>      : output file, otherwise send to STDOUT
  -clan_overlap  : show overlapping hits within clan member families (applies
                   to Pfam-A families only)
  -align         : show the HMM-sequence alignment for each match
  -e_seq <n>     : specify hmmscan evalue sequence cutoff for Pfam-A searches
                   (default Pfam defined)
  -e_dom <n>     : specify hmmscan evalue domain cutoff for Pfam-A searches
                   (default Pfam defined)
  -b_seq <n>     : specify hmmscan bit score sequence cutoff for Pfam-A
                   searches (default Pfam defined)
  -b_dom <n>     : specify hmmscan bit score domain cutoff for Pfam-A searches
                   (default Pfam defined)
  -pfamB         : search against Pfam-B HMMs (uses E-value sequence and
                   domain cutoff 0.001), in addition to searching Pfam-A HMMs
  -only_pfamB    : search against Pfam-B HMMs only (uses E-value sequence and
                   domain cutoff 0.001)
  -as            : predict active site residues for Pfam-A matches
  -json [pretty] : write results in JSON format. If the optional value
                   "pretty" is given, the JSON output will be formatted using
                   the "pretty" option in the JSON module

For more help, check the perldoc:

shell% perldoc pfam_scan.pl

Output format
=============

The output should be familiar to anyone who's used the old "pfam_scan.pl"
script. Each line contains the following information:

<seq id> <alignment start> <alignment end> <envelope start> <envelope end>
<hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score>
<E-value> <significance> <clan> <predicted_active_site_residues>

Example output (with -as option):

Q5NEL3.1   2 224   2 227 PB01348.1  Pfam-B_13481  Pfam-B 1 184 226 358.5 1.4e-107 NA NA
O65039.1  38  93  38  93 PF08246.5  Inhibitor_I29 Domain 1  58  58  45.9  2.8e-12  1 No_clan
O65039.1 126 342 126 342 PF00112.16 Peptidase_C1  Domain 1 216 216 296.0  1.1e-88  1 CL0125 predicted_active_site[150,285,307]

FAQ
===

Q: Why are the results from running hmmscan different from those I get when I
   run pfam_scan.pl?

A: We group together families which we believe to have a common evolutionary
   ancestor in clans. Where there are overlapping matches within a clan,
   pfam_scan.pl will only show the most significant match (the one with the
   lowest E-value) within the clan. We perform the same clan filtering step on
   the Pfam website. If you do want the script to report all the overlapping
   clan matches, you can use the -clan_overlap option.

Q: Why do I get the following error 'Expected at least 10 pieces of data:
   Domain annotation for each model (and alignments)' when I run pfam_scan.pl?

A: This error arises if you have an old version of the Pfam modules that was
   written for a beta version of HMMER3. The solution is to remove the old
   Pfam modules and download the current copy of PfamScan.tar.gz.
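
For downstream scripting, the whitespace-separated output shown in the
"Output format" section can be split into named fields. The following is a
minimal sketch in Python, not part of PfamScan. The class name, function name
and field names are hypothetical labels chosen to match the example output,
and the sketch assumes that lines starting with "#" are comments and that the
active-site annotation, when present, is a single token with no spaces, as in
the example.

```python
from typing import List, NamedTuple, Optional

ACTIVE_SITE_PREFIX = "predicted_active_site["


class PfamHit(NamedTuple):
    """One match line; field names are illustrative labels only."""
    seq_id: str
    ali_start: int
    ali_end: int
    env_start: int
    env_end: int
    hmm_acc: str
    hmm_name: str
    hmm_type: str
    hmm_start: int
    hmm_end: int
    hmm_length: int
    bit_score: float
    e_value: float
    significance: str  # kept as text: "NA" appears for Pfam-B in the example
    clan: str
    active_sites: List[int]


def parse_hit(line: str) -> Optional[PfamHit]:
    """Parse one output line; return None for blank or '#' lines."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split()
    if len(fields) < 15:
        raise ValueError("expected at least 15 columns, got %d" % len(fields))
    sites: List[int] = []
    for extra in fields[15:]:
        # e.g. predicted_active_site[150,285,307]
        if extra.startswith(ACTIVE_SITE_PREFIX) and extra.endswith("]"):
            inner = extra[len(ACTIVE_SITE_PREFIX):-1]
            sites.extend(int(pos) for pos in inner.split(",") if pos)
    return PfamHit(
        seq_id=fields[0],
        ali_start=int(fields[1]),
        ali_end=int(fields[2]),
        env_start=int(fields[3]),
        env_end=int(fields[4]),
        hmm_acc=fields[5],
        hmm_name=fields[6],
        hmm_type=fields[7],
        hmm_start=int(fields[8]),
        hmm_end=int(fields[9]),
        hmm_length=int(fields[10]),
        bit_score=float(fields[11]),
        e_value=float(fields[12]),
        significance=fields[13],
        clan=fields[14],
        active_sites=sites,
    )
```

For example, calling parse_hit on each line of a file written with the -o
option and discarding the None results gives a list of hits that can be
filtered by clan or E-value. If you need a structured format, the -json
option may be a better starting point than parsing the text output.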