7160 IPR009937 \

This family consists of several hypothetical bacterial proteins of around 140 residues in length. Members of the family seem to be found exclusively in Actinomycetes. The function of this family is unknown.

\ 2946 IPR003479 \ The major capsid protein of adenovirus is also known as the hexon. This is a family of hexon-associated proteins (protein IIIa).\ 420 IPR006982 \

Glutamate synthase (GltS) is a key enzyme in the early stages of the assimilation of ammonia in bacteria, yeasts, and plants. In bacteria, L-glutamate is involved in osmoregulation, is the precursor for other amino acids, and can be the precursor for heme biosynthesis. In plants, GltS is essential for the reassimilation of ammonia released by photorespiration. On the basis of the amino acid sequence and the nature of the electron donor, three different classes of GltS can be defined as follows: 1) ferredoxin-dependent GltS (Fd-GltS), 2) NADPH-dependent GltS (NADPH-GltS), and 3) NADH-dependent GltS (the properties of the three classes have been reviewed extensively PUBMED:10357231). The enzyme is a complex iron-sulfur flavoprotein that catalyzes the reductive transfer of the amido nitrogen from L-glutamine to 2-oxoglutarate to form two molecules of L-glutamate, via intramolecular channelling of ammonia from the amidotransferase domain to the FMN-binding domain.
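Adding the two domain reactions together shows why one glutamine and one 2-oxoglutarate yield two glutamates. The balance below is written for the NADPH-dependent class only. That is an assumption made for illustration: NADH-GltS uses NADH in the same way, and Fd-GltS draws its electrons from reduced ferredoxin. The balance shows net stoichiometry, not the intramolecular electron path through the flavin cofactors.

```latex
\begin{aligned}
\text{L-glutamine} + \mathrm{H_2O} &\longrightarrow \text{L-glutamate} + \mathrm{NH_3} \\
\text{2-oxoglutarate} + \mathrm{NH_3} + \mathrm{NADPH} + \mathrm{H^+} &\longrightarrow \text{L-glutamate} + \mathrm{NADP^+} + \mathrm{H_2O} \\
\text{net:}\quad \text{L-glutamine} + \text{2-oxoglutarate} + \mathrm{NADPH} + \mathrm{H^+} &\longrightarrow 2\,\text{L-glutamate} + \mathrm{NADP^+}
\end{aligned}
```

The channelled NH3 and the water cancel between the two half-reactions. This leaves one reduced nicotinamide cofactor consumed per pair of glutamates formed.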

\

Reaction of amidotransferase domain:

\ \ \

Reactions of FMN-binding domain:

\ \ \ The central domain of glutamate synthase connects the N-terminal amidotransferase domain with the FMN-binding domain and has an alpha/beta overall topology PUBMED:11967268.\ 1181 IPR004116 \

Amelogenins, cell adhesion proteins, play a role in the biomineralisation of teeth. They seem to regulate formation of crystallites during the secretory stage of tooth enamel development and are thought to play a major role in the structural organisation and mineralisation of developing enamel. The extracellular matrix of the developing enamel comprises two major classes of protein: the hydrophobic amelogenins and the acidic enamelins PUBMED:8118759.

\

\ Circular dichroism studies of porcine amelogenin have shown that the protein consists of 3 discrete folding units PUBMED:8454575: the N-terminal region appears to contain beta-strand structures, while the C-terminal region displays characteristics of a random coil conformation. Subsequent studies on the bovine protein have indicated that the amelogenin structure contains a repetitive beta-turn segment and a "beta-spiral" between Gln112 and Leu138, which sequester a (Pro, Leu, Gln)-rich region PUBMED:2598664. The beta-spiral offers a probable site for interactions with Ca2+ ions.

\

\ Mutations in the human amelogenin gene (AMGX) cause X-linked hypoplastic amelogenesis imperfecta, a disease characterised by defective enamel. A 9-bp deletion in exon 2 of AMGX results in the loss of the codons for Ile5, Leu6, Phe7 and Ala8 and their replacement by a new threonine codon, disrupting the 16-residue (Met1-Ala16) amelogenin signal peptide PUBMED:7782077.
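The codon arithmetic behind this deletion can be checked in a few lines. This is a minimal Python sketch. The variable names are illustrative, and the only inputs are the figures quoted above: four codons lost, one threonine codon gained, and a 16-residue signal peptide.

```python
# Worked check of the AMGX exon 2 deletion described above.
CODON_LEN = 3  # nucleotides per codon

signal_peptide_len = 16                        # Met1-Ala16
lost_residues = ["Ile5", "Leu6", "Phe7", "Ala8"]
gained_residues = ["Thr"]                      # the new threonine codon

# Net change in DNA length: codons removed minus codons added.
net_deleted_bp = (len(lost_residues) - len(gained_residues)) * CODON_LEN

# A net deletion that is a multiple of 3 leaves the downstream reading frame intact.
in_frame = net_deleted_bp % CODON_LEN == 0

# Length of the mutant signal peptide.
mutant_signal_len = signal_peptide_len - len(lost_residues) + len(gained_residues)

print(net_deleted_bp, in_frame, mutant_signal_len)  # 9 True 13
```

The result agrees with the reported 9-bp deletion. Because 9 is a multiple of 3, the reading frame downstream of the deletion is preserved, and the signal peptide shrinks from 16 to 13 residues.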

\ 3824 IPR004929 \

This protein is involved in host lysis. This family is not considered to be a peptidase according to the MEROPS database.

\ \ 7769 IPR012475 \

Lectins are involved in many recognition events at the molecular or cellular level. These fungal lectins, such as Aleuria aurantia lectin (AAL), specifically recognise fucosylated glycans. AAL is a dimeric protein, with each monomer being organised into a six-bladed beta-propeller fold and a small antiparallel two-stranded beta-sheet. The beta-propeller fold is important in fucose recognition; five binding pockets are found between the propeller blades. The small beta-sheet, on the other hand, is involved in the dimerisation process PUBMED:12732625.

\ 7497 IPR011631 \ These proteins appear to be specific to Mycoplasma species. They are of unknown function.\ 2726 IPR004381 \

This family includes glycerate kinase 2, which catalyses the phosphorylation of (R)-glycerate to 3-phospho-(R)-glycerate in the presence of ATP.

\ 5189 IPR008269 \

Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

\ \

Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
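The clan-specific triad orderings can be expressed as a lookup keyed on the order in which His, Asp and Ser occur along the chain. This is a minimal Python sketch. The function names are hypothetical. The example positions (His57/Asp102/Ser195 in chymotrypsin numbering and Asp32/His64/Ser221 in subtilisin BPN') are for illustration only. Real clan assignments rest on structure and other evidence, not on residue order alone.

```python
from typing import Dict

# Triad orderings quoted in the text, keyed by one-letter codes in sequence order.
TRIAD_ORDER_TO_CLAN = {
    "HDS": "SA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase C clan)",
}


def triad_order(positions: Dict[str, int]) -> str:
    """Return the triad's one-letter codes sorted by residue number."""
    return "".join(sorted(positions, key=lambda residue: positions[residue]))


def clan_from_triad(positions: Dict[str, int]) -> str:
    """Look up the clan for a triad ordering; 'unknown' if the order is not listed."""
    return TRIAD_ORDER_TO_CLAN.get(triad_order(positions), "unknown")


chymotrypsin = {"H": 57, "D": 102, "S": 195}  # chymotrypsin numbering
subtilisin = {"D": 32, "H": 64, "S": 221}     # subtilisin BPN' numbering

print(triad_order(chymotrypsin), clan_from_triad(chymotrypsin))  # HDS SA (chymotrypsin clan)
print(triad_order(subtilisin), clan_from_triad(subtilisin))      # DHS SB (subtilisin clan)
```

Any ordering not quoted in the text falls through to "unknown" rather than being guessed.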

\

Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
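The one-letter naming scheme lends itself to a small lookup. This is a minimal Python sketch under stated assumptions:

- Family identifiers are taken to be a type letter followed by a number, as in S16 and M6 elsewhere in this document.
- The optional trailing letter is an assumption.
- The function name is hypothetical, and the lookup only restates the rules above.
- Type P is not handled, because it applies to clans rather than families.

```python
import re

# One-letter catalytic-type codes as listed in the text.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Types whose nucleophile is part of an amino acid (acyl intermediate formed)
# versus an activated water molecule, per the text.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

# Assumed identifier shape: type letter, family number, optional letter suffix.
FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)([A-Z]?)")


def describe_family(family_id: str) -> str:
    """Summarise the catalytic type and nucleophile implied by a family identifier."""
    match = FAMILY_ID.fullmatch(family_id.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised family identifier: {family_id!r}")
    code = match.group(1)
    if code in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not known"
    return f"{family_id}: catalytic type {CATALYTIC_TYPE[code]}; nucleophile: {nucleophile}"


print(describe_family("S16"))  # lon protease family, as named in this document
print(describe_family("M6"))   # immune inhibitor A family, as named in this document
```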

\ \

This signature defines the C-terminal proteolytic domain of the archaeal, bacterial and eukaryotic lon proteases, which are ATP-dependent serine peptidases belonging to the MEROPS peptidase family S16 (lon protease family, clan SF). In eukaryotes, the majority of these proteins are located in the mitochondrial matrix PUBMED:8248235, PUBMED:9620272. In yeast, Pim1 is located in the mitochondrial matrix, is required for mitochondrial function, and is constitutively expressed but increased after thermal stress, suggesting that Pim1 may play a role in the heat shock response PUBMED:8276800.

\ 6913 IPR010766 \

This presumed domain is about 120 amino acids in length. It is found in association with CBS domains, as well as with the CbiA domain. The function of this domain is unknown. It is named the DRTGG domain after some of the most conserved residues. This domain may be very distantly related to a pair of CBS domains. There are no significant sequence similarities, but its length and association with CBS domains support this idea.

\ 3786 IPR006966 \ Members of the Peroxin-3 family are peroxisomal proteins. They are thought to be involved in membrane vesicle biogenesis prior to the translocation of matrix proteins PUBMED:10848631.\ 6994 IPR009833 \

This family consists of several hypothetical Enterobacterial proteins of around 130 residues in length. Members of this family seem to be found exclusively in Escherichia coli and Salmonella species. The function of this family is unknown.

\ 4065 IPR000313 \ Upon characterization of WHSC1, a gene that maps to the Wolf-Hirschhorn syndrome critical region and whose C-terminus is similar to the Drosophila melanogaster ASH1/trithorax group proteins, a novel protein domain designated the PWWP domain was identified PUBMED:9618163. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. It is present in proteins of nuclear origin and plays a role in cell growth and differentiation. On the basis of its position, the amino acid composition close to the PWWP motif and the pattern of other domains present, it has been suggested that the domain is involved in protein-protein interactions PUBMED:10802047.\ 5445 IPR008502 \ This family consists of several proteins of unknown function found exclusively in Arabidopsis thaliana.\ 2531 IPR007854 \ This short motif is about 40 amino acids in length. It is found in Fip1, a component of the Saccharomyces cerevisiae pre-mRNA polyadenylation factor; Fip1 directly interacts with poly(A) polymerase PUBMED:7736590. This region of Fip1 is needed for the interaction with the Yth1 subunit of the complex and for specific polyadenylation of the cleaved mRNA precursor PUBMED:11238938.\ 6253 IPR010492 \

DNA replication in eukaryotes results from a highly coordinated interaction between proteins, often as part of protein complexes, and the DNA template. One of the key early steps leading to DNA replication is formation of the prereplication complex, or pre-RC. The pre-RC is formed by the sequential binding of the origin recognition complex (ORC), Cdc6 and Cdt1 proteins, and the MCM complex. Activation of the pre-RC into the initiation complex (IC) is achieved via the action of S-phase kinases, eventually leading to the loading of the replication machinery.

\

Recently, a novel replication complex, GINS (for Go, Ichi, Nii, and San; five, one, two, and three in Japanese), has been identified PUBMED:12730133, PUBMED:12730134. \ \ The precise function of GINS is not known. However, genetic and two-hybrid interactions indicate that it mediates the loading of the enzymatic replication machinery at a step after the action of the S-phase kinases PUBMED:12730134. Furthermore, GINS may be a part of the replication machinery itself, since it is found associated with replicating DNA PUBMED:12730133, PUBMED:12730134. Electron microscopy of GINS shows that it forms a ring-like structure PUBMED:12730133, reminiscent of the structure of PCNA PUBMED:8001157, the DNA polymerase delta replication clamp. This observation, coupled with the observed interactions for GINS, suggests that the complex may represent the replication clamp for DNA polymerase epsilon PUBMED:12730133.

\ \ \

The GINS complex is essential for initiation of DNA replication in Xenopus egg extracts PUBMED:12730133. This 100 kDa stable complex includes Sld5, Psf1, Psf2, and Psf3. Homologues of these components are found also in other eukaryotes. This family of proteins represents the Psf3 component.

\ 4226 IPR006032 \

Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

\

Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

\ \ \

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. S12 belongs to a family of ribosomal proteins which are grouped on the basis of sequence similarities. This protein is typically known as S12 in bacteria, S23 in eukaryotes and as either S12 or S23 in the Archaea PUBMED:.

\

Bacterial S12 molecules contain a conserved aspartic acid residue which undergoes a novel post-translational modification, beta-methylthiolation, to form the corresponding 3-methylthioaspartic acid.

\ 1648 IPR003213 \

Cytochrome c oxidase is an oligomeric enzymatic complex that is a component of the respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen PUBMED:6307356. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane.

\

In eukaryotes, in addition to the three large subunits, I, II and III, that form the catalytic centre of the enzyme complex, there are a variable number of small polypeptide subunits. One of these subunits is the potentially haem-binding subunit, VIb, which is encoded in the nucleus PUBMED:11136449.

\ 5579 IPR008620 \ This family consists of several Rhizobium FixH-like proteins. It has been suggested that the four proteins FixG, FixH, FixI, and FixS may participate in a membrane-bound complex coupling the FixI cation pump with a redox process catalysed by FixG PUBMED:2536685.\ 8143 IPR013212 \

Proteins containing this domain are checkpoint proteins involved in cell division. This region has been shown to be essential for the binding of BUB1 and MAD3 to CDC20p PUBMED:10704439.

\ 4085 IPR004321 \

The variable portions of the genes encoding immunoglobulins and T cell receptors are assembled from component V, D, and J DNA segments by a site-specific recombination reaction termed V(D)J recombination. V(D)J recombination is targeted to specific sites on the chromosome by recombination signal sequences (RSSs) that flank antigen receptor gene segments. The RSS consists of a conserved heptamer (consensus, 5'-CACAGTG-3') and nonamer (consensus, 5'-ACAAAAACC-3') separated by a spacer of either 12 or 23 bp. Efficient recombination occurs between a 12-RSS and a 23-RSS, a restriction known as the 12/23 rule.
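A toy scanner makes the heptamer-spacer-nonamer layout and the 12/23 rule concrete. This is a minimal Python sketch that looks only for exact matches to the consensus sequences given above. Real RSSs often deviate from consensus, occur on either strand, and sit in both orientations relative to the coding segment; this sketch handles none of those cases. The function names and the demo sequence are hypothetical.

```python
import re
from typing import List, Tuple

HEPTAMER = "CACAGTG"   # consensus heptamer from the text
NONAMER = "ACAAAAACC"  # consensus nonamer from the text
SPACERS = (12, 23)     # the two permitted spacer lengths


def find_rss(seq: str) -> List[Tuple[int, int]]:
    """Return (start, spacer_length) for each exact heptamer-spacer-nonamer match.

    Only the given strand and the heptamer-first orientation are scanned.
    """
    seq = seq.upper()
    hits = []
    for spacer in SPACERS:
        # Zero-width lookahead so that overlapping candidates are all reported.
        pattern = re.compile(f"(?={HEPTAMER}[ACGT]{{{spacer}}}{NONAMER})")
        hits.extend((m.start(), spacer) for m in pattern.finditer(seq))
    return sorted(hits)


def obeys_12_23_rule(spacer_a: int, spacer_b: int) -> bool:
    """Efficient recombination pairs one 12-RSS with one 23-RSS."""
    return {spacer_a, spacer_b} == {12, 23}


demo = "GG" + HEPTAMER + "A" * 12 + NONAMER + "TT" + HEPTAMER + "C" * 23 + NONAMER
sites = find_rss(demo)
print(sites)                                       # [(2, 12), (32, 23)]
print(obeys_12_23_rule(sites[0][1], sites[1][1]))  # True
```

Pairing two 12-RSSs, or two 23-RSSs, returns False, which mirrors the restriction stated in the text.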

\ \

V(D)J recombination can be divided into two phases, DNA cleavage and DNA joining. DNA cleavage requires two lymphocyte-specific factors, the products of the recombination activating genes, RAG1 and RAG2, which together recognize the RSSs and create double-strand breaks at the RSS-coding segment junctions PUBMED:11961538. RAG-mediated DNA cleavage occurs in a synaptic complex termed the paired complex, which is formed from two distinct RSS-RAG complexes, a 12-SC and a 23-SC (where SC stands for signal complex). The DNA cleavage reaction involves two distinct enzymatic steps: initial nicking that creates a 3'-OH between a coding segment and its RSS, followed by hairpin formation in which the newly created 3'-OH attacks a phosphodiester bond on the opposite DNA strand. This generates a blunt, 5' phosphorylated signal end containing all of the RSS elements, and a covalently sealed hairpin coding end.

\ \

The second phase of V(D)J recombination, in which broken DNA fragments are processed and joined, is less well characterized. Signal ends are typically joined precisely to form a signal joint, whereas joining of the coding ends requires the hairpin structure to be opened and typically involves nucleotide addition and deletion before formation of the coding joint. The factors involved in these processes include ubiquitously expressed proteins involved in the repair of DNA double-strand breaks by nonhomologous end joining, terminal deoxynucleotidyl transferase, and Artemis protein.

\ \

In addition to their critical roles in RSS recognition and DNA cleavage, the RAG proteins may perform two distinct types of function in the postcleavage phase of V(D)J recombination. A structural function has been inferred from the finding that, after DNA cleavage in vitro, the DNA ends remain associated with the RAG proteins in a "four end" complex known as the cleaved signal complex. After release of the coding ends in vitro, and after coding joint formation in vivo, the RAG proteins remain in a stable signal end complex (SEC) containing the two signal ends. These postcleavage complexes may serve as essential scaffolds for the second phase of the reaction, with the RAG proteins acting to organize the DNA processing and joining events.

\ \

The second type of RAG protein-mediated postcleavage activity is the catalysis of phosphodiester bond hydrolysis and strand transfer reactions. The RAG proteins are capable of opening hairpin coding ends in vitro. The RAG proteins also show 3' flap endonuclease activity that may contribute to coding end processing/joining and can utilize the 3' OH group on the signal ends to attack hairpin coding ends (forming hybrid or open/shut joints) or virtually any DNA duplex (forming a transposition product).

\ \ 2425 IPR001144 \

Escherichia coli heat-labile enterotoxin is a bacterial protein toxin with an AB5 multimer structure, in which the B pentamer has a membrane-binding function and the A chain is needed for enzymatic activity PUBMED:8478941. The B subunits are arranged as a donut-shaped pentamer, each subunit participating in ~30 hydrogen bonds and 6 salt bridges with its two neighbours PUBMED:8478941.

\

The A subunit has a less well-defined secondary structure. It predominantly interacts with the pentamer via the C-terminal A2 fragment, which runs through the charged central pore of the B subunits. A putative catalytic residue in the A1 fragment (Glu112) lies close to a hydrophobic region, which packs two loops together. It is thought that this region might be important for catalysis and membrane translocation PUBMED:8478941.

\ 6296 IPR009465 \

This conserved region is found in the N-terminal half of several Spondin proteins. Spondins are involved in patterning axonal growth trajectory through either inhibiting or promoting adhesion of embryonic nerve cells PUBMED:11287656.

\ 1548 IPR000453 \ Chorismate synthase catalyzes the last of the seven steps in the shikimate pathway, which is used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate, which can then be used in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin (FMNH2 or FADH2) for its activity. Chorismate synthase from various sources shows a high degree of sequence conservation PUBMED:1718979, PUBMED:1837329. It is a protein of about 360 to 400 amino-acid residues.\ 578 IPR006667 \ This region is the integral membrane part of the eubacterial MgtE family of magnesium transporters. Related regions are also found in archaebacterial and eukaryotic proteins. All the archaebacterial and eukaryotic examples have two copies of the region, which suggests that the eubacterial examples may act as dimers. Members of this family probably transport Mg2+ or other divalent cations into the cell. The alignment contains two highly conserved aspartates that may be involved in cation binding (Bateman A unpubl.)\ 7986 IPR012972 \

This domain is located N-terminal to WD40 repeats. It is found in the microtubule-associated protein PUBMED:15112237.

\ 5584 IPR008825 \ S-antigens are heat stable proteins that are found in the blood of individuals infected with malaria.\ 2624 IPR011602 \

Fumble is required for cell division in Drosophila. Mutants lacking fumble exhibit abnormalities in bipolar spindle organisation, chromosome segregation, and contractile ring formation. Analyses have demonstrated that it encodes three protein isoforms, all of which contain a domain with high similarity to the pantothenate kinases of Emericella nidulans and mouse PUBMED:11238410. A role of fumble in membrane synthesis has been proposed PUBMED:11238410.

\ 7007 IPR009839 \

This family consists of several SseB proteins, which appear to be found exclusively in Enterobacteria. SseB is known to enhance serine-sensitivity in Escherichia coli PUBMED:7982894 and is part of the Salmonella pathogenicity island 2 (SPI-2) translocon PUBMED:12724372.

\ 162 IPR005171 \

Cytochrome c oxidase (COX) is a multi-subunit enzyme complex that catalyzes the final step of electron transfer through the respiratory chain on the mitochondrial inner membrane. Bacterial cytochrome c oxidases generally consist of four different subunits, I to IV. This family is composed of cytochrome c oxidase subunit IV from prokaryotes, which is present in a cleft formed by subunits I and III. Subunit IV assists the binding of a copper ion, CuB, to subunit I during biosynthesis or assembly of the oxidase complex PUBMED:8663126.

\ 5985 IPR009321 \

This family consists of several hypothetical archaeal proteins of unknown function.

\ 3420 IPR003171 \ This family includes the 5,10-methylenetetrahydrofolate reductase from bacteria and methylenetetrahydrofolate reductase from eukaryotes. The structure of this domain is known to be a TIM barrel PUBMED:10201405.\ 3150 IPR001982 \ The LAGLIDADG and HNH domains of site-specific DNA endonucleases encoded by viruses, bacteriophages as well as archaeal, eukaryotic nuclear and organellar genomes are characterized by the sequence motifs 'LAGLIDADG' and 'HNH', respectively PUBMED:9187655, PUBMED:9254693. Phylogenetic analysis of the two domains indicates a lack of exchange of endonucleases between different mobile elements (environments) and between hosts from different phylogenetic kingdoms. However, there does appear to have been considerable exchange of endonuclease domains amongst elements of the same type. Such events are suggested to be important for the formation of elements of new specificity PUBMED:9358175.\

'Homing' is the lateral transfer of an intervening genetic sequence, either an intron or an intein, to a cognate allele that lacks that element. The end result of homing is the duplication of the intervening sequence. The process is initiated by site-specific endonucleases that are encoded by open reading frames within the mobile elements. These endonucleases may be contrasted with a variety of enzymes involved in nucleic acid strand breakage and rearrangement, particularly restriction endonucleases. They are encoded within the intervening sequence and there are interesting limitations on the position and length of their open reading frames, and therefore on their structures. These enzymes display a unique strategy of flexible recognition of very long DNA target sites. This strategy allows these sequences to minimize nonspecific cleavage within the host genome, while maximizing the ability of the endonuclease to cleave closely related variants of the homing site PUBMED:10487208.

\ 1556 IPR006945 \ Circoviruses are small circular single-stranded viruses. This family represents the VP2 protein.\ 1316 IPR007780 \ This family consists of several bacterial proteins which are closely related to NAD-glutamate dehydrogenase found in Streptomyces clavuligerus. Glutamate dehydrogenases (GDHs) are a broadly distributed group of enzymes that catalyse the reversible oxidative deamination of glutamate to ketoglutarate and ammonia PUBMED:10924516.\ 3900 IPR003487 \ This family represents the phosphoprotein of Paramyxoviridae, a putative RNA polymerase alpha subunit that may function in template binding.\ 433 IPR005198 \

O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

\

This is a family of alpha-1,6-mannanases belonging to glycoside hydrolase family 76.

\ 5271 IPR008807 \ This family consists of several ROS/MUCR transcriptional regulator proteins. The ros chromosomal gene is present in octopine and nopaline strains of Agrobacterium tumefaciens as well as in Sinorhizobium meliloti. This gene encodes a 15.5 kDa protein that specifically represses the virC and virD operons in the virulence region of the Ti plasmid PUBMED:2013576 and is necessary for succinoglycan production PUBMED:7756693. S. meliloti can produce two types of acidic exopolysaccharides, succinoglycan and galactoglucan, that are interchangeable for infection of Medicago sativa subsp. sativa nodules. MucR from S. meliloti acts as a transcriptional repressor that blocks the expression of the exp genes responsible for galactoglucan production therefore allowing the exclusive production of succinoglycan PUBMED:10656595.\ 7267 IPR010889 \

This family consists of several hypothetical bacterial proteins of around 130 residues in length. Members of this family seem to be found exclusively in Rhizobium species. The function of this family is unknown.

\ 5375 IPR008757 \

Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophilic role. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
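Two regular expressions can illustrate the difference between the loose HEXXH motif and the stricter abXHEbbHbc definition. This is a minimal Python sketch. The residue sets used for 'b' (uncharged) and 'c' (hydrophobic) are assumptions chosen for illustration, not a published definition, and the example peptides are synthetic. A hit is only a hint, since the text notes that HEXXH is relatively common.

```python
import re
from typing import Dict, List

# Residue classes for abXHEbbHbc. These sets are illustrative assumptions:
# 'b' is taken as any residue except D, E, K, R, H or P, and 'c' as a common
# hydrophobic set. Proline is excluded throughout, since the text says it is
# never found in this site.
A_CLASS = "VT"                    # 'a': most often valine or threonine
B_CLASS = "ACFGILMNQSTVWY"        # 'b': uncharged (assumed set, no Pro)
C_CLASS = "AVLIMFWC"              # 'c': hydrophobic (assumed set)
X_CLASS = "ACDEFGHIKLMNQRSTVWY"   # 'X': any residue except Pro (assumption)

LOOSE = re.compile(r"(?=HE..H)")
STRICT = re.compile(
    f"(?=[{A_CLASS}][{B_CLASS}][{X_CLASS}]HE[{B_CLASS}]{{2}}H[{B_CLASS}][{C_CLASS}])"
)


def scan_zinc_motifs(protein: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of loose HEXXH and strict abXHEbbHbc matches."""
    protein = protein.upper()
    return {
        "HEXXH": [m.start() for m in LOOSE.finditer(protein)],
        "abXHEbbHbc": [m.start() for m in STRICT.finditer(protein)],
    }


print(scan_zinc_motifs("VSAHELTHAV"))  # {'HEXXH': [3], 'abXHEbbHbc': [0]}
print(scan_zinc_motifs("VSAHEPTHAV"))  # {'HEXXH': [3], 'abXHEbbHbc': []}
```

In the second example, the stricter pattern rejects a candidate that the bare HEXXH search accepts, because proline occupies one of the 'b' positions.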

\

Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

\ \

This group of metallopeptidases belongs to MEROPS peptidase family M6 (immune inhibitor A family, clan MA(M)). The predicted active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH.

\ \ \

InhA of Bacillus thuringiensis (an entomopathogenic bacterium) specifically cleaves antibacterial peptides produced by insect hosts PUBMED:2089225. B. thuringiensis is highly resistant to the insect immune system due to its production of two factors, inhibitor A (InhA or InA) and inhibitor B (InhB or InB), which selectively block the humoral defense system developed by insects against Escherichia coli and Bacillus cereus PUBMED:992874. B. thuringiensis is especially resistant to cecropins and attacins, which are the main classes of inducible antibacterial peptides in various lepidopterans and dipterans PUBMED:7140755, PUBMED:3318666. InhA has been shown to specifically hydrolyze cecropins and attacins in the immune hemolymph of Hyalophora cecropia in vitro PUBMED:6421577. However, it has been suggested that the role of InhA in resistance to the humoral defense system is not consistent with the time course of InhA production PUBMED:12029046.

B. thuringiensis has two proteins belonging to this group (InhA and InhA2), and it has been shown that InhA2 has a vital role in virulence when the host is infected via the oral route PUBMED:12029046. The B. cereus member has been found as an exosporium component from endospores PUBMED:10475957. B. thuringiensis InhA is induced at the onset of sporulation and is regulated by Spo0A and AbrB PUBMED:11429458. Vibrio cholerae PrtV is thought to be encoded in the pathogenicity island PUBMED:9371455. However, PrtV mutants did not exhibit a reduced virulence phenotype, and thus PrtV is not an indispensable virulence factor PUBMED:9371455.

Annotation note: due to the presence of PKD repeats in some of the members of this group (e.g., V. cholerae VCA0223), spurious similarity hits may appear (involving unrelated proteins), which may lead to the erroneous transfer of functional annotations and protein names. Also, please note that related Bacillus subtilis Bacillopeptidase F (Bpr or Bpf) contains two different protease domains: N-terminal (peptidase S8, subtilase, a subtilisin-like serine protease) and this C-terminal domain (peptidase M6), which may also complicate annotation.

\ 1344 IPR007600 \ Polyhedra are large crystalline occlusion bodies containing nucleopolyhedrovirus virions, and surrounded by an electron-dense structure called the polyhedron envelope or polyhedron calyx. The polyhedron envelope (associated) protein PEP is thought to be an integral part of the polyhedron envelope. PEP is concentrated at the surface of polyhedra, and is thought to be important for the proper formation of the periphery of polyhedra. It is thought that PEP may stabilise polyhedra and protect them from fusion or aggregation PUBMED:8176372.\ 4172 IPR001196 \

Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

\

Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

\ \ \

L15 is one of the proteins of the large ribosomal subunit. In Escherichia coli, L15 is known to bind the 23S rRNA. Ribosomal protein L15 from bacteria and plant chloroplasts (nuclear-encoded) belongs to this family. Vertebrate L27a, Tetrahymena thermophila L29 and fungal L27a (L29, CRP-1, CYH2) are also members of this group PUBMED:.

\

Ribosomal protein L18e from a number of archaebacteria shows homology to both eukaryotic L18 and eubacterial ribosomal protein L15, an observation that has been taken to support the view that archaea represent an evolutionary stage between bacteria and eukaryotes PUBMED:10527834.

\ 8152 IPR013233 \

Mammalian PIG-X and yeast PBN1 are essential components of glycosylphosphatidylinositol-mannosyltransferase I. These enzymes are involved in the transfer of sugar molecules.

\ 3145 IPR003500 \ This family of proteins contains the sugar isomerase enzymes ribose 5-phosphate isomerase B (RpiB) and the two subunits of galactose-6-phosphate isomerase, LacA and LacB. Ribose 5-phosphate isomerase B forms a homodimer and catalyses the conversion of D-ribose 5-phosphate to D-ribulose 5-phosphate in the nonoxidative branch of the pentose phosphate pathway. Galactose-6-phosphate isomerase is a heteromultimeric protein consisting of subunits LacA and LacB, and catalyses the conversion of D-galactose 6-phosphate to D-tagatose 6-phosphate in the tagatose 6-phosphate pathway of lactose catabolism. The enzyme is induced by galactose or lactose.\ 5769 IPR010267 \

This family consists of several Chordopoxvirus A20R proteins. The A20R protein is required for DNA replication, is associated with the processive form of the viral DNA polymerase, and directly interacts with the viral proteins encoded by the D4R, D5R, and H5R open reading frames. A20R may contribute to the assembly or stability of the multiprotein DNA replication complex PUBMED:12490386.

\ 6262 IPR004406 \ Synonym(s): Citrate hydro-lyase, Aconitase\

Aconitate hydratase 2 is involved in energy metabolism as part of the TCA cycle. It catalyses the formation of cis-aconitate from citrate. Aconitase has an active [4Fe-4S] form and an inactive [3Fe-4S] form. The active [4Fe-4S] cluster is part of the catalytic site that interconverts citrate, cis-aconitate and isocitrate.
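
As a sketch of the interconversion named above, the standard textbook account has aconitase dehydrate citrate to cis-aconitate and then rehydrate it to isocitrate. The scheme below shows only that net water bookkeeping; it does not attempt to describe the iron-sulfur cluster chemistry.

```latex
\begin{align*}
\text{citrate} &\rightleftharpoons \text{\textit{cis}-aconitate} + \mathrm{H_2O} \\
\text{\textit{cis}-aconitate} + \mathrm{H_2O} &\rightleftharpoons \text{isocitrate}
\end{align*}
```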

\ 1833 IPR002836 \

This protein family is found in archaea and eukaryota. The human TFAR19 gene encodes a protein that shares significant homology with the corresponding proteins of species ranging from yeast to mice. TFAR19 exhibits a ubiquitous expression pattern, and its expression is upregulated in tumor cells undergoing apoptosis. TFAR19 may play a general role in the apoptotic process PUBMED:9920759. Also included in this family is a DNA-binding protein from the archaeon Methanobacterium thermoautotrophicum.

\ 3725 IPR000045 \

Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

\ \

Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (groups of evolutionarily related proteins) and further subdivided into families, largely on the basis of their tertiary structure.

\

This group of aspartic endopeptidases belongs to MEROPS peptidase family A24 (type IV prepilin peptidase family, clan AD), subfamily A24A.
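
The catalytic-type letters and the nucleophile rule described above can be illustrated with a small lookup. This is a sketch, not a MEROPS tool: it assumes a family identifier is written as the catalytic-type letter, the family number and an optional subfamily letter (as in A24A), and the names `describe_family` and `FAMILY_ID` are hypothetical.

```python
import re

# Catalytic-type letters as listed in the text. "P" is used only for clans
# containing families of more than one type, so it is not accepted here.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Per the text: Ser, Thr and Cys peptidases use an amino acid as the
# nucleophile (forming an acyl intermediate); aspartic, glutamic and
# metallopeptidases use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

# Assumed identifier shape: type letter, family number, optional subfamily letter.
FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)([A-Z]?)")


def describe_family(family_id: str) -> dict:
    """Split an identifier such as 'A24A' into its components."""
    match = FAMILY_ID.fullmatch(family_id.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised family identifier: {family_id!r}")
    letter, number, subfamily = match.groups()
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "unknown"
    return {
        "catalytic_type": CATALYTIC_TYPES[letter],
        "family": int(number),
        "subfamily": subfamily or None,
        "nucleophile": nucleophile,
    }


print(describe_family("A24A"))
```

For the family discussed here, `describe_family("A24A")` reports an aspartic peptidase of family 24, subfamily A, whose nucleophile is activated water.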

\ \

Bacteria produce a number of protein precursors that undergo post-translational methylation and proteolysis before secretion as active proteins. Type IV prepilin leader peptidases are enzymes that mediate this type of post-translational modification. Type IV pilin is a protein found on the surface of Pseudomonas aeruginosa, Neisseria gonorrhoeae and other Gram-negative pathogens. Pilin subunits attach the infecting organism to the surface of host epithelial cells. They are synthesised as prepilin subunits, which differ from mature pilin by containing a 6-8 residue leader peptide consisting of charged amino acids. Mature type IV pilins also contain a methylated N-terminal phenylalanine residue.

\ \

Prepilin leader peptidases are found on the cytosolic membrane surface, where they have dual activity: cleavage of glycine-phenylalanine bonds and methylation of the newly revealed N-terminal phenylalanine. The consensus sequence for the site of proteolytic cleavage is -G+F-T-L/I-, in which the Gly P1 residue is essential PUBMED:7845226. The peptidases are susceptible to thiol-blocking reagents.
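
The consensus -G+F-T-L/I- above can be turned into a simple pattern scan. The sketch below is a hypothetical helper, not a validated predictor: it only looks for Gly followed by Phe-Thr-Leu/Ile, models only the proteolytic step, and uses an invented toy sequence rather than a real prepilin.

```python
import re
from typing import List, Optional

# "+" in the consensus -G+F-T-L/I- marks the scissile bond: Gly is P1, Phe is
# P1', Thr is P2' and Leu or Ile is P3'. The lookahead consumes only the Gly,
# so match.end() is the index of the first residue of the mature pilin.
CLEAVAGE_SITE = re.compile(r"G(?=FT[LI])")


def cleavage_positions(prepilin: str) -> List[int]:
    """Return 0-based indices of the first residue after each candidate cut."""
    return [m.end() for m in CLEAVAGE_SITE.finditer(prepilin.upper())]


def mature_pilin(prepilin: str) -> Optional[str]:
    """Trim the leader at the first candidate site; None if no site is found.

    Only the proteolytic step is modelled; N-methylation of the new
    N-terminal Phe is ignored.
    """
    sites = cleavage_positions(prepilin)
    return prepilin[sites[0]:] if sites else None


# Hypothetical toy sequence with a 6-residue leader, for illustration only.
toy_prepilin = "MKAQKGFTLIELMIVV"
print(cleavage_positions(toy_prepilin))  # [6]
print(mature_pilin(toy_prepilin))        # FTLIELMIVV
```

A plain regular expression reports every G-F-T-L/I run, including ones far from the N-terminus, so a practical scan would at least restrict hits to the first few residues after the leader.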

\ 5462 IPR008512 \ This family consists of several plant proteins of unknown function.\ 2165 IPR007547 \ This is a family of uncharacterised proteins.\ 5881 IPR009272 \

This is a family of proteins from the archaeon Sulfolobus, with undetermined function.

\ 7911 IPR012988 \

This presumed domain is found at the N terminus of Ribosomal L30 proteins and has been termed RL30NT or NUC018 PUBMED:15112237.

\ 2109 IPR007393 \ This is a family of uncharacterised proteins.\ 1810 IPR007185 \ DNA polymerase epsilon is essential for cell viability and chromosomal DNA replication in budding yeast. In addition, DNA polymerase epsilon may be involved in DNA repair and cell-cycle checkpoint control. The enzyme consists of at least four subunits in both mammalian cells and yeast. The largest subunit of DNA polymerase epsilon is responsible for polymerase activity. In mouse, DNA polymerase epsilon subunit B is the second largest subunit of the enzyme. Part of its N-terminal region was found to be responsible for the interaction with SAP18. Experimental evidence suggests that this subunit may recruit histone deacetylase to the replication fork to modify the chromatin structure PUBMED:11872158.\ 7272 IPR010004 \

This entry represents the N terminus (approximately 80 residues) of Ycf66, a protein that seems to be restricted to eukaryotes that contain chloroplasts and to cyanobacteria.

\ 5226 IPR008800 \ This family consists of bacterial PufQ proteins. PufQ is required for bacteriochlorophyll biosynthesis serving a regulatory function in the formation of photosynthetic complexes PUBMED:10196154.\ 6036 IPR009344 \

This family consists of Borna disease virus G glycoprotein sequences. Borna disease virus (BDV) infection produces a variety of clinical diseases, from behavioural illnesses to classical fatal encephalitis PUBMED:12163584. G protein is important for viral entry into the host cell PUBMED:8985354,PUBMED:11435588.

\ 4846 IPR005134 \ This conserved hypothetical protein family, with four predicted transmembrane regions, is found in Escherichia coli, Haemophilus influenzae and Helicobacter pylori among completed genomes.\ 5263 IPR008770 \ This family consists of DNA terminal protein GP3 sequences from Phi-29-like bacteriophages. DNA terminal protein GP3 is linked to the 5' ends of both strands of the genome through a phosphodiester bond between the beta-hydroxyl group of a serine residue and the 5'-phosphate of the terminal deoxyadenylate. This protein is essential for DNA replication and is involved in the priming of DNA elongation PUBMED:6779279.\ 2160 IPR007555 \ This is a family of uncharacterised hypothetical prokaryotic proteins.\ 3931 IPR006834 \ This is a family of Chordopoxvirus proteins that form one of the two subunits of VITF-3, a virally encoded complex necessary for intermediate-stage transcription PUBMED:10077573.\ 3038 IPR003987 \

Intercellular adhesion molecules (ICAMs) and vascular cell adhesion molecule-1 (VCAM-1) are part of the immunoglobulin superfamily. They are important in inflammation, immune responses and intracellular signalling events PUBMED:9151947. The ICAM family consists of five members, designated ICAM-1 to ICAM-5. They are known to bind to the leucocyte integrins CD11/CD18 during inflammation and immune responses. In addition, ICAMs may exist in soluble forms in human plasma, owing to activation and proteolysis mechanisms at cell surfaces.

\

ICAM-1 (CD54) contains five Ig-like domains. It is expressed on leucocytes, endothelial and epithelial cells, and is upregulated in response to bacterial invasion. The protein is a ligand for lymphocyte function-associated (LFA) antigens and also a receptor for CD11a,b/CD18, fibrinogen, human rhinoviruses and Plasmodium falciparum-infected erythrocytes. The ICAM-1 binding sites for CD11a/CD18 and its other binding partners are located in the first domain and overlap. ICAM-1 domain 2 seems to play an important role in maintaining the conformation of domain 1, particularly the structural integrity of the LFA-1 ligand-binding site PUBMED:10998349.

\

The three-dimensional atomic structure of the tandem N-terminal Ig-like domains (D1 and D2) of ICAM-1 has been determined to 2.2 Å resolution and fitted into a cryo-electron microscopy reconstruction of a rhinovirus-ICAM-1 complex PUBMED:9539703. Extensive charge interactions between ICAM-1 and human rhinoviruses are largely conserved in the major and minor receptor groups of rhinoviruses. The interaction of ICAMs with LFA-1 is mediated by a divalent cation bound to the insertion (I) domain on the alpha chain of LFA-1 and the carboxyl group of a conserved glutamic acid residue on ICAMs.

\

ICAM-2 (CD102) has two Ig-like domains. It is expressed on endothelial cells, leucocytes and platelets, and binds to CD11a,b/CD18. The protein is refractory to proinflammatory cytokines and plays an important role in the adhesion of leucocytes to uninduced endothelium PUBMED:10352278.

\

ICAM-3 (CD50) contains five Ig-like domains and binds to the leucocyte integrins CD11a,d/CD18. The protein plays an important role in the immune response and perhaps in signal transduction PUBMED:10725740.

\

ICAM-4 (LW blood group Ag) is red blood cell (RBC) specific and binds to CD11a,b/CD18. It is associated with the RBC Rh antigens and could be important in retaining immature red cells in the bone marrow or in the uptake of senescent cells into the spleen PUBMED:10846180.

\

ICAM-5 (telencephalin) has nine Ig-like domains and is confined to the telencephalon of the brain. The role of this CD11a/CD18-binding molecule is not yet known PUBMED:10741396.

\

VCAM-1 was first described as a cytokine-inducible endothelial adhesion molecule. It can bind to the leucocyte integrin VLA-4 (very late antigen-4) to recruit leucocytes to sites of inflammation PUBMED:11133225. The predominant form of VCAM-1 in vivo has an N-terminal extracellular region comprising seven Ig-like domains PUBMED:7531291. A conserved integrin-binding motif has been identified in domains 1 and 4; variants of this motif are present in the N-terminal domain of all members of the integrin-binding subgroup of the immunoglobulin superfamily. The structure of a VLA-4-binding fragment comprising the first two domains of VCAM-1 has been determined to 1.8 Å resolution. The integrin-binding motif is exposed and forms the N-terminal region of the loop between beta-strands C and D of domain 1 PUBMED:7531291. VCAM-1 domains 1 and 2 are structurally similar to ICAM-1 and ICAM-2 PUBMED:11133225.

\ 2131 IPR007416 \

This is a family of bacterial proteins with no known function.

\ 2405 IPR004990 \ This is a family of hypothetical proteins from cereal crops.\ 2621 IPR004224 \ Fumarate reductase couples the reduction of fumarate to succinate to the oxidation of quinol to quinone, the reverse of the reaction catalysed by the related complex II of the respiratory chain (succinate dehydrogenase) PUBMED:10586875. The fumarate reductase complex contains three protein subunits. Subunit A contains the site of fumarate reduction and a covalently bound flavin adenine dinucleotide prosthetic group. Subunit B contains three iron-sulphur centres. The menaquinol-oxidizing subunit C (this family) consists of five membrane-spanning, primarily helical segments and binds two haem b molecules PUBMED:10586875.\ 5007 IPR000607 \ Double-stranded RNA-specific adenosine deaminase (DRADA) converts multiple adenosines to inosines and creates I/U mismatched base pairs in double-helical RNA substrates without apparent sequence specificity. DRADA modifies adenosines in AU-rich regions more frequently, probably because A/U base pairs melt more easily than G/C base pairs. The protein modifies viral RNA genomes and may be responsible for the hypermutation of certain negative-stranded viruses. DRADA edits the mRNAs for glutamate receptor subunits by site-selective adenosine deamination. The DRADA repeat is also found in viral E3 proteins, which contain a double-stranded RNA-binding domain.\ 5812 IPR009245 \

This family consists of several Cytomegalovirus UL20A proteins. UL20A is thought to be a glycoprotein PUBMED:11928987.

\ 7741 IPR012462 \

This family is composed of sequences derived from hypothetical eukaryotic proteins of unknown function.

\ 1605 IPR001808 \ Numerous bacterial transcription regulatory proteins bind DNA via a helix-turn-helix (HTH) motif. These proteins are very diverse, but for convenience may be grouped into subfamilies on the basis of sequence similarity. This family groups together a range of proteins, including anr, crp, clp, cysR, fixK, flp, fnr, fnrN, hlyX and ntcA PUBMED:14638413, PUBMED:10550204. Within this family, the HTH motif is situated towards the C-terminus.\ \ 2647 IPR000721 \ The retroviral capsid protein derived from Gag, known as p24 in HIV-1, forms the inner protein layer of the nucleocapsid. This protein performs highly complex, orchestrated tasks during the assembly, budding, maturation and infection stages of the viral replication cycle. During viral assembly, the proteins form membrane associations and self-associations that ultimately result in budding of an immature virion from the infected cell. Gag precursors also function during viral assembly to selectively bind and package two plus strands of genomic RNA. ELISA tests for p24 are the most commonly used method to demonstrate virus replication both in vivo and in vitro.\ 1555 IPR003383 \ Circoviruses are small viruses with circular, single-stranded DNA genomes. This family is the ORF-2 protein from viruses such as porcine circovirus PUBMED:9573301 and beak and feather disease virus. These proteins are about 220 amino acids long and of unknown function.\ 1843 IPR002809 \ This prokaryotic protein family has no known function. Members are predicted to be integral membrane proteins.\ 7796 IPR012896 \

This is the beta tail domain of the integrin protein. Integrins are receptors involved in cell-cell and cell-extracellular matrix interactions.

\ 5954 IPR009304 \

This is a family of Kaposi's sarcoma-associated herpesvirus (HHV8) latent membrane protein.

\ 294 IPR006702 \ This family of plant proteins contains a domain that may have catalytic activity. It has a conserved arginine and aspartate that could form an active site. These proteins are predicted to contain 3 or 4 transmembrane helices.\ 19 IPR006693 \

The alpha/beta hydrolase fold is common to several hydrolytic enzymes of widely differing phylogenetic origin and catalytic function. The core of each enzyme is similar: an alpha/beta sheet (not a barrel) of eight beta-strands connected by alpha-helices PUBMED:1409539. This entry describes a closely associated region that is found in a number of lipases.

\ 4245 IPR000289 \

Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

\

Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

\ \ \

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. Examples are:

  • Mammalian S28 PUBMED:11875025
  • Plant S28 PUBMED:8278557
  • Fungi S33 PUBMED:1481571
  • Archaebacterial S28e.

    These proteins have from 64 to 78 amino acids and a highly conserved C-terminal region.

    \ 6368 IPR013068 \

    Galanin is a peptide hormone that controls various biological activities PUBMED:1710578. Galanin-like immunoreactivity has been found in the central and peripheral nervous systems of mammals, with high concentrations demonstrated in discrete regions of the central nervous system, including the median eminence, hypothalamus, arcuate nucleus, septum, neuro-intermediate lobe of the pituitary, and the spinal cord. Its localisation within neurosecretory granules suggests that galanin may function as a neurotransmitter, and it has been shown to coexist with a variety of other peptide and amine neurotransmitters within individual neurons PUBMED:2448788.

    \ \

    Although the precise physiological role of galanin is uncertain, it has a number of pharmacological properties: it stimulates food intake when injected into the third ventricle of rats; it increases levels of plasma growth hormone and prolactin, and decreases dopamine levels in the median eminence PUBMED:2448788; and infusion into humans results in hyperglycemia and glucose intolerance, and inhibits pancreatic release of insulin, somatostatin and pancreatic polypeptide. Galanin also modulates smooth muscle contractility within the gastro-intestinal and genito-urinary tracts; all such activities suggest that the hormone may play an important role in the nervous modulation of endocrine and smooth muscle function PUBMED:2448788.

    \ \

    This domain represents the galanin message-associated peptide (GMAP) domain which is found C-terminal to the galanin domain in the preprogalanin precursor protein. GMAP sequences in different species show a high degree of homology, but the biological function of the GMAP peptide is not known PUBMED:9639260.

    \ \ 3854 IPR003431 \

    Phytase (phytate 3-phosphatase) is a secreted enzyme that hydrolyses phytate to release inorganic phosphate. This family appears to represent a novel enzyme that shows phytase activity (PUBMED:9603817) and has been shown to have a six-bladed propeller fold (PUBMED:10655618).

    \ 7108 IPR009903 \

    This family consists of several Baculovirus proteins of around 55 residues in length. The function of this family is unknown.

    \ 5248 IPR008410 \ This entry contains the C-terminal regions of several bacterial cellulose synthase operon C (BCSC) proteins. BCSC is involved in cellulose synthesis although the exact function of this protein is unknown PUBMED:11260463.\ 3955 IPR006803 \

    This entry represents the Poxvirus protein I5.

    \ 1655 IPR005481 \

    Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamoyl phosphate from glutamine or ammonia and bicarbonate PUBMED:1972379. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and pyrimidines. Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and arginine.

    In bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways, while other bacteria have separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (carA) that provides the glutamine amidotransferase (GATase) activity needed to release ammonia from glutamine, and a large chain (carB) that provides CPSase activity. The large subunit consists of four structural units: the carboxyphosphate synthetic component, the oligomerization domain, the carbamoyl phosphate synthetic component and the allosteric domain PUBMED:10089390. Such a structure is also present in fungi for arginine biosynthesis (CPA1 and CPA2).

    Two main CPSases have been identified in mammals. CPSase I is mitochondrial, is found at high levels in the liver and is involved in arginine biosynthesis, while CPSase II is cytosolic, is associated with aspartate carbamoyltransferase (ATCase) and dihydroorotase (DHOase), and is involved in pyrimidine biosynthesis. In the pyrimidine pathway in most eukaryotes, CPSase is found as a domain in a multifunctional protein that also has GATase, ATCase and DHOase activity. Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates and is a monofunctional protein located in the mitochondrial matrix. The CPSase domain is typically 120 kDa in size and has arisen from the duplication of an ancestral subdomain of about 500 amino acids. Each subdomain independently binds ATP, and it is suggested that the two homologous halves act separately, one catalyzing the phosphorylation of bicarbonate to carboxyphosphate and the other that of carbamate to carbamoyl phosphate. The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase (ACC), propionyl-CoA carboxylase (PCCase), pyruvate carboxylase (PC) and urea carboxylase.
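
Summing the partial reactions named above gives the overall stoichiometry of the glutamine-dependent enzyme. The scheme is a sketch based on the standard textbook account rather than on the cited papers; for the ammonia-dependent CPSase I, the GATase line is dropped and NH3 enters directly.

```latex
\begin{align*}
\text{glutamine} + \mathrm{H_2O} &\longrightarrow \text{glutamate} + \mathrm{NH_3} && \text{(GATase)} \\
\mathrm{HCO_3^-} + \mathrm{ATP} &\longrightarrow \text{carboxyphosphate} + \mathrm{ADP} && \text{(first ATP site)} \\
\text{carboxyphosphate} + \mathrm{NH_3} &\longrightarrow \text{carbamate} + \mathrm{P_i} \\
\text{carbamate} + \mathrm{ATP} &\longrightarrow \text{carbamoyl phosphate} + \mathrm{ADP} && \text{(second ATP site)} \\
\intertext{Net reaction (NH$_3$, carboxyphosphate and carbamate cancel):}
2\,\mathrm{ATP} + \mathrm{HCO_3^-} + \text{glutamine} + \mathrm{H_2O} &\longrightarrow 2\,\mathrm{ADP} + \mathrm{P_i} + \text{glutamate} + \text{carbamoyl phosphate}
\end{align*}
```

The two ATP-consuming lines correspond to the two homologous halves of the CPSase domain described above.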

    \ 4838 IPR005265 \

    The conserved hypothetical integral membrane proteins of this family appear to be found only in Gram-negative bacteria; their function is unknown.

    \ 6643 IPR010659 \

    This domain, known as the connection domain, lies between the thumb and palm domains PUBMED:1377403.

    \ 6275 IPR010501 \

    This family consists of several bacterial TraB pilus assembly proteins. TraB is known to be essential for piliation and transfer, but very little is known about its specific role in this process. It has been suggested that TraB extends into the periplasmic space and is anchored in the inner membrane via a single transmembrane segment near the N terminus. It is also thought that TraB may interact with TraP in order to stabilise the proposed transmembrane complex formed by the tra operon products PUBMED:8655498.

    \ 3525 IPR001678 \

    This domain is found in archaeal, bacterial and eukaryotic proteins.

    \ \

    In the archaea and bacteria, they are annotated as putative nucleolar protein, Sun (Fmu) family protein or tRNA/rRNA cytosine-C5-methylase. The majority have the S-adenosyl methionine (SAM) binding domain and are related to Escherichia coli Fmu (Sun) protein (16S rRNA m5C 967 methyltransferase) whose structure has been determined PUBMED:14656444.

    \ \

    In eukaryotes, the majority are annotated as hypothetical protein, nucleolar protein or Nop2/Sun (Fmu) family. Unlike their bacterial homologues, few of the eukaryotic members of this family have the SAM-binding signature. Despite this, Saccharomyces cerevisiae (yeast) Nop2p is a probable RNA m5C methyltransferase PUBMED:12872006. It is essential for the processing and maturation of 27S pre-rRNA and for large ribosomal subunit biogenesis PUBMED:12872006; it is localized to the nucleolus and is essential for viability PUBMED:7806561. Reduced Nop2p expression limits yeast growth and decreases levels of mature 60S ribosomal subunits while altering rRNA processing PUBMED:8972218. There is substantial identity between Nop2p and human p120 (NOL1), which is also called the proliferation-associated nucleolar antigen PUBMED:7806561, PUBMED:2576976.

    \ \ 2421 IPR005050 \

    The expression of early nodulin (ENOD) genes has been well characterized in several legume species. Based on their biochemical attributes and expression patterns, they are postulated to have roles in cell structure, in the control of nodule ontogeny by the degradation of Nod factor, and in carbon metabolism PUBMED:10759502.

    \ 77 IPR004776 \ Proteins in this group are mostly uncharacterised and of unknown function.\ 2111 IPR007409 \

    This domain is often found adjacent to a methylase domain () in restriction endonucleases or methylases. In one of the proteins, , it is adjacent to a helicase domain () in a putative restriction endonuclease.

    \ 4141 IPR000448 \ The nucleocapsid (N) protein is said to have a 'tight' structure. The carboxyl end of the N-terminal domain possesses an RNA-binding domain. Sequence alignments show two regions of reasonable conservation, approximately residues 64-103 and 201-329 PUBMED:9603315. A whole, functional protein is required for encapsidation to take place PUBMED:9501055.\ 2776 IPR004629 \

    The WecG member of this superfamily, believed to be a UDP-N-acetyl-D-mannosaminuronic acid transferase, plays a role in enterobacterial common antigen (ECA) synthesis in Escherichia coli. Another family member, the Bacillus subtilis TagA protein, is involved in the biosynthesis of the cell wall polymer poly(glycerol phosphate). A third family member, CpsF (CMP-N-acetylneuraminic acid synthetase), has a role in the capsular polysaccharide biosynthesis pathway.

    \ 5887 IPR009273 \

    This is a family of bacterial proteins of undetermined function. All members come from bacteria of the order Rhizobiales.

    \ 5586 IPR008896 \ The chloroplast genomes of most higher plants contain two giant open reading frames designated ycf1 and ycf2. Although the function of Ycf1 is unknown, it is known to be an essential gene PUBMED:10792825.\ 4441 IPR008258 \

    Bacterial lytic transglycosylases degrade murein via cleavage of the beta-1,4-glycosidic bond between N-acetylmuramic acid and N-acetylglucosamine, with the concomitant formation of a 1,6-anhydrobond in the muramic acid residue. There are both soluble (Slt enzymes) and membrane-bound (Mlt enzymes) lytic transglycosylases, which differ in size, sequence, activity, specificity and location. The multi-domain structure of the 70 kDa soluble lytic transglycosylase Slt70 is known; it includes a superhelical domain and a C-terminal catalytic domain with a lysozyme-like fold PUBMED:10452894. The catalytic domain is structurally conserved in some membrane-bound lytic transglycosylases and in bacteriophage transglycosylases, even though their sequences can differ considerably PUBMED:8203016. The most conserved part of this domain is its N-terminal extremity, which contains two conserved serines and a glutamate that have been shown PUBMED:8107871 to be involved in the catalytic mechanism. This family is distantly related to .

    \ \ 6717 IPR009677 \

    This family consists of several hypothetical bacterial proteins of around 235 residues in length. Members of this family seem to be found exclusively in the Enterobacteria Salmonella typhimurium and Escherichia coli. The function of this family is unknown.

    \ 1281 IPR002146 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0 complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The complex uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, driven by a cycle of ATP binding and hydrolysis PUBMED:11309608.

    \ \ \ \ \ \

    The CF(0) B/B' subunits are thought to interact with the stalk of the CF(1) subunits.

    \ 4709 IPR007069 \ Transposases are needed for efficient transposition of the insertion sequence or transposon DNA. This family includes transposases IS1294 and IS801.\ 823 IPR005326 \

    This presumed domain is found at the N terminus of some isoforms of the cytoskeletal muscle protein plectin as well as the ribosomal S10 protein. This domain may be involved in RNA binding.

    \ 444 IPR004360 \ Glyoxalase I (lactoylglutathione lyase) catalyzes the first step of the glyoxalase pathway. S-lactoylglutathione is then converted by glyoxalase II to lactic acid PUBMED:7684374. Glyoxalase I is a ubiquitous enzyme that binds one mole of zinc per subunit. The bacterial and yeast enzymes are monomeric, while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. This domain is found in other related proteins, including the bleomycin resistance protein and dioxygenases, e.g. 4-hydroxyphenylpyruvate dioxygenase.\ 2709 IPR008146 \

    Glutamine synthetase () (GS) PUBMED:2900091 plays an essential role in the metabolism of nitrogen by catalyzing the condensation of glutamate and ammonia to form glutamine.

    There seem to be three different classes of GS PUBMED:8096645, PUBMED:2575672, PUBMED:7916055.

    Although the three classes of GS are clearly structurally related, their sequence similarity is not extensive.

    \ 4734 IPR001278 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the terminal adenosine of the tRNA, whereas in class II reactions the 3'-hydroxyl is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
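    The class assignments listed above can be captured in a small lookup table. The sketch below is a minimal Python illustration that encodes only the two lists given in this entry, together with the preferred aminoacylation site stated for each class. The names `CLASS_I`, `CLASS_II`, `ACYLATED_HYDROXYL` and `synthetase_class` are hypothetical, and the table deliberately ignores known lineage-specific exceptions (for example, the class I lysyl-tRNA synthetase found in some archaea).

```python
# Class assignments of the 20 aminoacyl-tRNA synthetases as listed in this
# entry (hypothetical helper; lineage-specific exceptions are ignored).
CLASS_I = {
    "arginine", "cysteine", "glutamic acid", "glutamine", "isoleucine",
    "leucine", "methionine", "tyrosine", "tryptophan", "valine",
}
CLASS_II = {
    "alanine", "asparagine", "aspartic acid", "glycine", "histidine",
    "lysine", "phenylalanine", "proline", "serine", "threonine",
}

# Preferred aminoacylation site on the tRNA terminal adenosine, per class.
ACYLATED_HYDROXYL = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for the synthetase specific for amino_acid."""
    name = amino_acid.strip().lower()
    if name in CLASS_I:
        return "I"
    if name in CLASS_II:
        return "II"
    raise KeyError(f"no synthetase class listed for {amino_acid!r}")


# Sanity checks: 10 + 10 synthetases, and no amino acid appears in both lists.
assert len(CLASS_I) == len(CLASS_II) == 10
assert CLASS_I.isdisjoint(CLASS_II)
```

    For example, `synthetase_class("Tryptophan")` returns `"I"`, and `ACYLATED_HYDROXYL[synthetase_class("serine")]` gives `"3'-OH"`, the site the text reports as preferred by class II enzymes.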

    The 10 class I synthetases are considered to share a catalytic domain structure based on the Rossmann fold, which is entirely different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    Arginyl-tRNA synthetase () has been crystallized, and preliminary X-ray crystallographic analysis of complexes between yeast arginyl-tRNA synthetase and yeast tRNA(Arg) is available PUBMED:10739930.

    \ 3274 IPR000296 \ The cation-dependent mannose-6-phosphate (man-6-P) receptor is one of two transmembrane proteins involved in the transport of lysosomal enzymes from the Golgi complex and the cell surface to lysosomes PUBMED:1376319. Lysosomal enzymes bearing phosphomannosyl residues bind specifically to man-6-P receptors in the Golgi apparatus, and the resulting receptor-ligand complex is transported to an acidic prelysosomal compartment, where the low pH mediates dissociation of the complex. Binding is optimal in the presence of divalent cations.

    The receptor is a single polypeptide chain that contains a putative signal sequence and a transmembrane domain PUBMED:2954652. The cation-dependent mannose 6-phosphate (M6P) receptor (CD-MPR) is present predominantly as a stable homodimer in membranes and has a single M6P-binding site per polypeptide PUBMED:2954652, PUBMED:2544594. The molecule crystallizes as a homodimer, with approximately 20% of the entire surface area of each monomer in contact with the other monomer through predominantly hydrophobic interactions PUBMED:12612639. Each monomer contains a single alpha-helix near its amino terminus, followed by nine primarily antiparallel beta-strands that form two beta-sheets positioned orthogonally to each other. Extensive hydrophobic interactions between the two beta-sheets result in each monomer forming a flattened beta-barrel structure. Six cysteine residues form three intramolecular disulphide bonds that are essential for generating the ligand-binding conformation of the receptor. The structures of the liganded molecules show that the carbohydrate-binding site of the receptor lies relatively deep inside the protein, so that the terminal M6P residue and the penultimate sugar ring of bound pentamannosyl phosphate are mostly buried in the receptor. This deep binding pocket allows numerous interactions to form between the CD-MPR and its carbohydrate ligands.

    \ 1013 IPR007529 \

    The HIT-type zinc finger contains 7 conserved cysteines and one histidine that can potentially coordinate two zinc atoms. It is named after the protein that first defined the domain, the yeast HIT1 protein () PUBMED:1325386. The HIT-type zinc finger displays some sequence similarity to the MYND-type zinc finger. The function of this domain is unknown, but it is found mainly in nuclear proteins involved in gene regulation and chromatin remodeling. This domain is also found in thyroid receptor interacting protein 3 (TRIP-3), which specifically interacts with the ligand-binding domain of the thyroid receptor.

    \ 4560 IPR000643 \ Iodothyronine deiodinase () (DI) PUBMED:, PUBMED:7592917 is the vertebrate enzyme responsible for the deiodination of the prohormone thyroxine (T4 or 3,5,3',5'-tetraiodothyronine) into the biologically active hormone T3 (3,5,3'-triiodothyronine), and of T3 into the inactive metabolite T2 (3,3'-diiodothyronine). All known DI are proteins of about 250 residues that contain a selenocysteine at their active site. Three types of DI are known; type II is essential for providing the brain with appropriate levels of T3 during the critical period of development, and type III is essential for regulating thyroid hormone inactivation during embryological development.\ 3093 IPR003065 \ The Salmonella typhimurium surface presentation of antigens K/invasion protein B gene (SpaK/InvB) is one of 12 genes that form a cluster responsible for invasion properties. The gene product is required for entry of the bacterium into epithelial cells, and is thus considered to be a virulence factor PUBMED:8404849. Other Spa genes in the cluster are related to invasion (Inv) genes in similar Salmonella and Shigella species PUBMED:7752894, and to flagellar biosynthesis genes in Helicobacter pylori PUBMED:10066464. A further analogous gene (Spa15 homologue) has also been found in Yersinia PUBMED:8045880.

    The SpaK/InvB protein has a molecular mass of 15 kDa and is believed to play a part in the Sec-independent type III protein secretion system of Salmonella typhimurium and Shigella flexneri PUBMED:9159221. In the organisation of the Spa/Inv locus, the SpaK/InvB gene lies adjacent to SpaL/InvC PUBMED:8045880, and its product may contribute to the ATPase activity of the latter.

    \ 825 IPR007477 \

    This presumed domain is found in proteins containing FERM domains. It binds both spectrin and actin, hence the name SAB (Spectrin and Actin Binding) domain.

    \ 4599 IPR004111 \

    Gram-negative bacteria have developed several resistance mechanisms against the broad-spectrum antibiotic tetracycline (Tc) PUBMED:7707374. A common mechanism involves a membrane-associated protein (TetA) that exports the antibiotic out of the cell before it can attach to ribosomes and inhibit polypeptide chain growth. TetA expression is regulated by the Tet repressor (TetR). TetR occurs as a homodimer and uses two HTH motifs to bind tandem DNA operators, thereby blocking expression of the associated genes, TetA and TetR.

    The structure of the class D TetR repressor protein PUBMED:8153629 comprises 10 alpha-helices with connecting turns and loops. The 3 N-terminal helices constitute the DNA-binding HTH domain, which has an inverse orientation compared with HTH motifs in other DNA-binding proteins. The core of the protein, formed by helices 5-10, is responsible for dimerisation and contains, for each monomer, a binding pocket that accommodates Tc in the presence of a divalent cation.

    \ 4825 IPR002753 \ These archaebacterial proteins have no known function. Members of the family are about 90-105 amino acid residues long.\ 6524 IPR009578 \

    This family consists of a number of ~25-residue repeats commonly found in streptococcal surface antigens, although one copy is present in the HPSR2-heavy chain potential motor protein of Giardia lamblia (). This family is often found in conjunction with .

    \ 3758 IPR001577 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates a water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
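    To illustrate how the extended pattern abXHEbbHbc could be used to scan a sequence, the following minimal Python sketch compiles it into a regular expression. The residue sets are assumptions: the text only says that 'a' is most often valine or threonine, that 'b' is uncharged and 'c' hydrophobic, and that proline is never found in the site. The sketch therefore treats 'a' strictly as V or T, takes "uncharged" to exclude D, E, K, R and H, uses an assumed hydrophobic set for 'c', and excludes proline from every variable position. The names `HEXXH_EXTENDED` and `find_hexxh_sites` are hypothetical.

```python
import re

# Assumed residue classes for the extended motif abXHEbbHbc (simplifications;
# the text gives only qualitative descriptions). Proline is excluded throughout.
A_RES = "VT"                    # 'a': most often valine or threonine
B_RES = "ACFGILMNQSTVWY"        # 'b': uncharged (assumed: not D, E, K, R, H, P)
C_RES = "AFILMVWY"              # 'c': hydrophobic (assumed set)
X_RES = "ACDEFGHIKLMNQRSTVWY"   # 'X': any residue except proline

HEXXH_EXTENDED = re.compile(
    f"[{A_RES}][{B_RES}][{X_RES}]HE[{B_RES}][{B_RES}]H[{B_RES}][{C_RES}]"
)


def find_hexxh_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 10-mer) for each non-overlapping match."""
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in HEXXH_EXTENDED.finditer(seq)]
```

    For instance, `find_hexxh_sites("mkkvaahelghalkk")` returns `[(3, "VAAHELGHAL")]`, while placing a proline at the 'X' position (`"MKKVAPHELGHALKK"`) gives no match. A hit flags only a candidate site; as the text notes, the plain HEXXH motif is relatively common.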

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, indicated by the first character of the identifier: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use a catalytic amino acid residue as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
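    Because the first character of a family identifier encodes the catalytic type, identifiers of the form used in this entry (for example M8, or M10A for a subfamily) can be decoded mechanically. The Python sketch below does this and, following the text, reports which kind of nucleophile the type implies. The regular expression `FAMILY_ID` and the function `describe_family` are hypothetical illustrations rather than part of any MEROPS software, and the sketch handles only the family type letters listed above (not clan identifiers such as type P).

```python
import re

# Catalytic-type codes for peptidase families, as listed in the text.
CATALYTIC_TYPE = {
    "A": "aspartic", "C": "cysteine", "G": "glutamic acid", "M": "metallo",
    "S": "serine", "T": "threonine", "U": "unknown",
}

# Per the text: serine, threonine and cysteine peptidases use an amino acid
# residue as the nucleophile; aspartic, glutamic and metallo use activated water.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

# Type letter, family number, optional subfamily letter (e.g. "M8", "M10A").
FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)([A-Z]?)")


def describe_family(family_id: str) -> dict:
    """Decode a hypothetical family identifier such as 'M10A'."""
    match = FAMILY_ID.fullmatch(family_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    letter, number, subfamily = match.groups()
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "unknown"
    return {
        "type": CATALYTIC_TYPE[letter],
        "family": int(number),
        "subfamily": subfamily or None,
        "nucleophile": nucleophile,
    }
```

    For example, `describe_family("M10A")` returns `{"type": "metallo", "family": 10, "subfamily": "A", "nucleophile": "activated water"}`.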

    This group of metallopeptidases belongs to the MEROPS peptidase family M8 (leishmanolysin family, clan MA(M)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA.

    Leishmanolysin is an enzyme found in eukaryotes, including Leishmania and related parasitic protozoa PUBMED:7674922. The endopeptidase is the most abundant protein on the cell surface during the promastigote stage of the parasite, and is attached to the membrane by a glycosylphosphatidylinositol anchor PUBMED:7674922. In the amastigote form, the parasite lives in lysosomes of host macrophages and produces a form of the protease that has an acidic pH optimum PUBMED:7674922. This differs from most other metalloproteases and may be an adaptation to the environment in which the organism survives PUBMED:7674922.

    \ 2338 IPR002763 \ The function of this family is unknown. Aquifex aeolicus has two copies of this protein. A probable aspartyl-tRNA synthetase from Escherichia coli PUBMED:2129559 belongs to this group.\ 502 IPR001126 \

    In Escherichia coli, UV and many chemicals appear to cause mutagenesis by a process of translesion synthesis that requires DNA polymerase III and the SOS-regulated proteins UmuD, UmuC and RecA. This machinery allows replication to continue through DNA lesions, thereby avoiding lethal interruption of DNA replication after DNA damage PUBMED:9560379. UmuC is a well-conserved protein in prokaryotes, with a homologue in yeast.

    Proteins currently known to belong to this family are listed below:

  • Escherichia coli MucB protein. Plasmid-borne analog of the UmuC protein.
  • Yeast Rev1 protein. Homologue of UmuC, also required for normal induction of mutations by physical and chemical agents.
  • Salmonella typhimurium ImpB protein. Plasmid-borne analog of the UmuC protein.
  • Bacterial UmuC protein.
  • Escherichia coli DNA-damage-inducible protein P (DinP).
  • Salmonella typhimurium SamB, a plasmid-associated homologue of UmuC.

    \ 5571 IPR008883 \ This family consists of the eukaryotic tumour susceptibility gene 101 protein (TSG101). Altered transcripts of this gene have been detected in sporadic breast cancers and many other human malignancies. However, the involvement of this gene in neoplastic transformation and tumourigenesis remains unclear. TSG101 is required for normal cell function in embryonic and adult tissues, but it is not a tumour suppressor for sporadic forms of breast cancer PUBMED:12482969.\ 1795 IPR001679 \

    DNA ligase (polydeoxyribonucleotide synthase) is the enzyme that joins two DNA fragments by catalyzing the formation of a phosphodiester bond between the 5'-phosphate of one fragment and the 3'-hydroxyl of the other. It is active during DNA replication, DNA repair and DNA recombination. There are two forms of DNA ligase: one requires ATP (), the other NAD ().

    This family is predominantly composed of NAD-dependent bacterial DNA ligases. They are proteins of about 75 to 85 kDa whose sequence is well conserved PUBMED:1526462, PUBMED:8390989. They also show similarity to yicF, an Escherichia coli hypothetical protein of 63 kDa.

    \ 6154 IPR009398 \

    This domain is found in adenylate cyclases and related proteins. The exact function of this domain is unknown.

    \ 3419 IPR003690 \

    This family currently contains one sequence of known function: human mitochondrial transcription termination factor (mTERF), a multizipper protein that nevertheless binds DNA as a monomer, with evidence pointing to intramolecular leucine zipper interactions PUBMED:9118945. The precursors contain a mitochondrial targeting sequence, and the mature mTERF exhibits three leucine zippers, one of which is bipartite, and two widely spaced basic domains. Both basic domains and all three leucine zipper motifs are necessary for DNA binding. Unlike other leucine zippers, these do not appear to mediate dimerisation PUBMED:9118945.

    The rest of the family consists of hypothetical proteins, none of which has been functionally characterised.

    \ 317 IPR007033 \ This is a family of hypothetical eukaryotic proteins.\ 6445 IPR000751 \ M-phase inducer phosphatases function as dosage-dependent inducers in mitotic control PUBMED:1836978, PUBMED:2120044, PUBMED:8156993, PUBMED:1392080. They are tyrosine protein phosphatases required for progression of the cell cycle. They may directly dephosphorylate p34(cdc2) and thereby activate its kinase activity. They catalyze the reaction: protein tyrosine phosphate + H2O = protein tyrosine + phosphate.\ 715 IPR000403 \

    Phosphatidylinositol 3-kinase (PI3-kinase) () PUBMED:1322797 is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The three products of PI3-kinase, PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3), function as second messengers in cell signalling. Phosphatidylinositol 4-kinase (PI4-kinase) () PUBMED:8194527 is an enzyme that acts on phosphatidylinositol (PI) in the first committed step in the production of the second messenger inositol-1,4,5-trisphosphate. This domain is also present in a wide range of protein kinases involved in diverse cellular functions, such as control of cell growth, regulation of cell cycle progression, a DNA damage checkpoint, recombination, and maintenance of telomere length. Despite significant homology to lipid kinases, no lipid kinase activity has been demonstrated for any of the PIK-related kinases PUBMED:12456783.

    The PI3- and PI4-kinases share a well-conserved domain at their C-terminal section; this domain seems to be distantly related to the catalytic domain of protein kinases PUBMED:8387896, PUBMED:12151228. The catalytic domain of PI3K has the typical bilobal structure seen in other ATP-dependent kinases, with a small N-terminal lobe and a large C-terminal lobe. The core of this domain is the most conserved region of the PI3Ks. The ATP cofactor binds in the crevice formed by the N- and C-terminal lobes; a loop between two strands provides a hydrophobic pocket for binding of the adenine moiety, and a lysine residue interacts with the alpha-phosphate. In contrast to protein kinases, the PI3K loop that interacts with the phosphates of ATP (known as the glycine-rich loop or P-loop) contains no glycine residues. Instead, contact with the ATP -phosphate is maintained through the side chain of a conserved serine residue.

    \ 7279 IPR010894 \

    This family contains the bacterial stage V sporulation protein AD (SpoVAD), which is approximately 340 residues long. This is one of six proteins encoded by the spoVA operon, which is transcribed exclusively in the forespore at about the time of dipicolinic acid (DPA) synthesis in the mother cell. The functions of the proteins encoded by the spoVA operon are unknown, but it has been suggested they are involved in DPA transport during sporulation PUBMED:11751839.

    \ 3396 IPR007846 \ The MPPN (Mitotic PhosphoProtein N end) family is uncharacterised; however, it probably plays a role in the cell cycle because the family includes mitotic phosphoproteins PUBMED:9115395. This family also includes a suppressor of thermosensitive mutations in the DNA polymerase delta gene, Pol III PUBMED:7862092. The conserved central region appears to be distantly related to the RNA-binding region RNP-1 (RNA recognition motif, ), suggesting an RNA-binding function for this protein.\ 1804 IPR001001 \ This entry describes the beta chain of DNA polymerase III, a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria. The beta chain is required for initiation of replication from an RNA primer, with deoxynucleotides from deoxynucleoside triphosphates (dNTPs) being added to the 3'-end of the growing DNA chain.\ 4187 IPR000988 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families PUBMED:8048931 consists of mammalian ribosomal protein L24; yeast ribosomal protein L30A/B (Rp29) (YL21); Kluyveromyces lactis ribosomal protein L30; Arabidopsis thaliana ribosomal protein L24 homolog; Haloarcula marismortui ribosomal protein HL21/HL22; and Methanococcus jannaschii MJ1201. These proteins have 60 to 160 amino acid residues.

    \ 2041 IPR007163 \ This is a family of predicted transmembrane proteins of unknown function. Members usually have between 6 and 9 predicted transmembrane segments.\ 4323 IPR005572 \

    Sigma-E is important for the induction of proteins involved in the heat shock response. RseA binds sigma-E via its N-terminal domain, sequestering sigma-E and preventing transcription from heat-shock promoters PUBMED:9159523. The C-terminal domain is located in the periplasm and may interact with other proteins that signal periplasmic stress.

    \ 1526 IPR007597 \ The precise function of these proteins is unclear, but some of them are involved in the flagellar motor switch PUBMED:11722727. The region represented in this entry is found in the CheC, CheX, CheA and FliY proteins. In some cases, this region is repeated in multiple copies.\ 5165 IPR008002 \

    This family consists of several herpesvirus proteins of unknown function.

    \ 2143 IPR007433 \ This family includes several proteins of uncharacterised function.\ 3045 IPR004436 \

    This family of enzymes catalyses the NADP(+)-dependent oxidative decarboxylation of isocitrate to form 2-oxoglutarate, CO2, and NADPH within the Krebs cycle (). Thus this enzyme supplies the cell with a key intermediate in energy metabolism and with precursors for biosynthetic pathways. The activity of this enzyme, which is controlled by phosphorylation, helps regulate carbon flux between the Krebs cycle and the glyoxylate bypass, an alternative route that accumulates carbon for biosynthesis when acetate is the sole carbon source for growth PUBMED:7836312. The phosphorylation state of this enzyme is controlled by isocitrate dehydrogenase kinase/phosphatase. This family has been found in a number of bacterial species, including Azotobacter vinelandii, Corynebacterium glutamicum, Rhodomicrobium vannielii, and Neisseria meningitidis.

    The structure of isocitrate dehydrogenase from Azotobacter vinelandii () has been determined PUBMED:12467571. This molecule consists of two distinct domains, a small domain and a large domain, with a folding topology similar to that of dimeric isocitrate dehydrogenase from E. coli (). The structure of the large domain repeats a motif observed in the dimeric enzyme. This fused structure, arising from domain duplication, enables a single polypeptide chain to form a catalytic site homologous to that of the dimeric enzyme, in which the catalytic site lies at the interface of two identical subunits.

    \ 939 IPR003346 \ Transposases are needed for efficient transposition of the insertion sequence or transposon DNA. This family includes transposases for IS116, IS110 and IS902. It is often found with the transposase IS111A/IS1328/IS1533 family (see ).\ 6533 IPR008374 \

    Striated fibre assemblin (SFA), an acidic 33 kDa protein, is the major component of striated microtubule-associated fibres (SMAFs) in the flagellar basal apparatus of green flagellates. In Chlamydomonas and other green flagellates, the SMAFs form a cross-like pattern and run alongside the proximal parts of four bundles of flagellar root microtubules.

    The sequence of SFA contains two structurally distinct domains PUBMED:8491776. The head domain, with ~30 residues, contains all the prolines (3-8 depending on species) and is rich in hydroxyamino acids. This non-helical domain is further characterised by the presence of repetitive SP-motifs, some of them in the context SP(M/T)R, which is a putative substrate for p34-CDC2 kinase. The rod domain, with ~250 residues, is predicted to be mostly alpha-helical (the alpha-helix content was estimated to be 76% for the entire molecule or 85% for the postulated rod domain PUBMED:8491776). This domain shows a pronounced coiled-coil-forming ability and contains a 29-residue repeat pattern based on four heptads, followed by a skip residue.
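    The 29-residue repeat length follows from simple arithmetic: four heptads contribute 4 × 7 = 28 residues, and the skip residue brings the total to 29. The Python sketch below assigns the conventional coiled-coil heptad positions (a to g) across consecutive repeats. It assumes, purely for illustration, that each repeat begins at heptad position a; the actual register of the SFA rod sequence is not given here, and the name `heptad_register` is hypothetical.

```python
HEPTAD = "abcdefg"
HEPTADS_PER_REPEAT = 4
REPEAT_LENGTH = HEPTADS_PER_REPEAT * len(HEPTAD) + 1  # 4 x 7 + 1 skip = 29


def heptad_register(length: int) -> list[str]:
    """Label each residue of a rod segment with its heptad position.

    Assumes (hypothetically) that every 29-residue repeat begins at position
    'a'; the final residue of each repeat is labelled 'skip'.
    """
    labels = []
    for i in range(length):
        offset = i % REPEAT_LENGTH
        if offset == REPEAT_LENGTH - 1:
            labels.append("skip")
        else:
            labels.append(HEPTAD[offset % len(HEPTAD)])
    return labels
```

    Under this assumption, `heptad_register(29)` contains four 'a' positions and ends with 'skip', and the 30th residue (`heptad_register(30)[29]`) restarts the pattern at 'a'.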

    \ 1141 IPR001203 \

    Enzymes of the aldehyde ferredoxin oxidoreductase (AOR) family PUBMED:9242907 contain a tungsten cofactor and a 4Fe-4S cluster and catalyse the interconversion of aldehydes and carboxylates PUBMED:8672295. This family includes AOR, formaldehyde ferredoxin oxidoreductase (FOR) and glyceraldehyde-3-phosphate ferredoxin oxidoreductase (GAPOR), all isolated from hyperthermophilic archaea PUBMED:9242907; carboxylic acid reductase found in clostridia PUBMED:2550230; and hydroxycarboxylate viologen oxidoreductase from Proteus vulgaris, the sole member of the AOR family containing molybdenum PUBMED:8026480. GAPOR may be involved in glycolysis PUBMED:7721730, but the functions of the other proteins are not yet clear. AOR has been proposed to be the primary enzyme responsible for oxidising the aldehydes that are produced by the 2-keto acid oxidoreductases PUBMED:9275170.

    \ 6669 IPR010667 \

    This family consists of several tail tube protein gp19 sequences from the T4-like viruses PUBMED:3363870, PUBMED:2403438.

    \ 4134 IPR004901 \

    Alpha-1,4-glucan-protein synthase catalyses the reaction: . The enzyme has a possible role in the synthesis of cell wall polysaccharides in plants PUBMED:13677461. It is found associated with the cell wall, with the highest concentrations in the plasmodesmata. It is also located in the Golgi apparatus.

    \ 7071 IPR010826 \

    This domain is found in several Phlebovirus glycoprotein G1 sequences. Members of the Bunyaviridae family acquire an envelope by budding through the lipid bilayer of the Golgi complex. The budding compartment is thought to be determined by the accumulation of the two heterodimeric membrane glycoproteins G1 and G2 in the Golgi PUBMED:9811692.

    \ 546 IPR007149 \ Members of this family are part of the Paf1/RNA polymerase II complex PUBMED:11927560, PUBMED:11884586. The Paf1 complex probably functions during the elongation phase of transcription PUBMED:11927560.\ 226 IPR001774 \ Ligands of the Delta/Serrate/lag-2 (DSL) family and their receptors, members of the lin-12/Notch family, mediate cell-cell interactions that specify cell fate in invertebrates and vertebrates. In C. elegans, two DSL genes, lag-2 and apx-1, influence different cell fate decisions during development PUBMED:8575327. Molecular interaction between Notch and Serrate, another EGF-homologous transmembrane protein containing a region of striking similarity to Delta, has been demonstrated, and the same two EGF repeats of Notch that mediate Delta binding may also constitute a Serrate-binding domain PUBMED:1657403, PUBMED:7716513.\ 5874 IPR010326 \

    Sec6 is a component of the multiprotein exocyst complex. Sec6 interacts with Sec8, Sec10 and Exo70. These exocyst proteins localise to regions of active exocytosis (at the growing ends of interphase cells and in the medial region of cells undergoing cytokinesis) in an F-actin-dependent and exocytosis-independent manner PUBMED:11854409.

    \ 1604 IPR001981 \ Colipase is a small protein cofactor needed by pancreatic lipase for efficient hydrolysis of dietary lipids. Efficient absorption of dietary fats depends on the action of pancreatic triglyceride lipase. Colipase binds to the C-terminal, non-catalytic domain of lipase, thereby stabilising its active conformation and considerably enlarging the overall hydrophobic binding site. Structural studies of the complex and of colipase alone have revealed the functionality of its architecture PUBMED:9240923, PUBMED:10570245.

    Colipase is a small protein with five conserved disulphide bonds. Structural analogies have been recognised between a developmental protein (Dickkopf), the pancreatic lipase C-terminal domain, the N-terminal domains of lipoxygenases and the C-terminal domain of alpha-toxin. These non-catalytic domains in the latter enzymes are important for interaction with membranes. It has not been established whether these domains are also involved in binding a protein cofactor, as is the case for pancreatic lipase PUBMED:10570245.

    \ 2885 IPR000585 \

    Hemopexin () is a serum glycoprotein that binds haem and transports it to the liver for breakdown and iron recovery, after which the free hemopexin returns to the circulation PUBMED:12042069. Hemopexin prevents haem-mediated oxidative stress. Structurally, hemopexin consists of two similar halves of approximately two hundred amino acid residues connected by a histidine-rich hinge region. Each half is itself formed by the repetition of a basic unit of some 35 to 45 residues. Hemopexin-like domains have been found in two other types of protein: vitronectin PUBMED:9572850, a cell adhesion and spreading factor found in plasma and tissues, and the matrixins MMP-1, MMP-2, MMP-3, MMP-9, MMP-10, MMP-11, MMP-12, MMP-14, MMP-15 and MMP-16, members of the matrix metalloproteinase family that cleave extracellular matrix constituents PUBMED:14619953. These zinc endopeptidases, which belong to MEROPS peptidase subfamily M10A, have a single hemopexin-like domain in their C-terminal section. The hemopexin domain is thought to facilitate binding to a variety of molecules and proteins; for example, the HX repeats of some matrixins bind tissue inhibitors of metalloproteinases (TIMPs).

    \ 4027 IPR001743 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose, using the hydrogen extracted from water by PSII; oxygen is released as a by-product of water splitting.

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide electrons, which are ultimately passed on to PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    This family represents the low molecular weight transmembrane protein PsbT found in PSII, which is thought to be associated with the D1 (PsbA) - D2 (PsbD) heterodimer. PsbT may be involved in the formation and/or stabilisation of dimeric PSII complexes, because in the absence of this protein dimeric PSII complexes were found to be less abundant. Furthermore, although PsbT does not confer photo-protection, it is required for the efficient recovery of photo-damaged PSII PUBMED:11451956.

    \ 6582 IPR009609 \

    This family consists of several bacterial phosphonate metabolism protein PhnG sequences. In Escherichia coli, the phn operon encodes proteins responsible for the uptake and breakdown of phosphonates. The exact function of PhnG is unknown; however, it is thought likely that, together with six other proteins, PhnG makes up the C-P (carbon-phosphorus) lyase PUBMED:9882650.

    \ 2319 IPR007785 \ This family contains several uncharacterised eukaryotic proteins of unknown function.\ 1873 IPR003453 \ This domain has no known function, nor do any of the proteins that possess it. The aligned region is approximately 150 amino acids long.\ 5827 IPR010302 \

    This family of herpesvirus proteins includes U4, U5 and UL27.

    \ 7301 IPR010012 \

    This family consists of several spasmodic peptide gm9a sequences. Conotoxin gm9a is a putative 27-residue polypeptide encoded by Conus gloriamaris and is known to be a homologue of the 'spasmodic peptide', tx9a, isolated from the venom of the mollusk-hunting cone shell Conus textile PUBMED:12193600. Upon injection of this venom component, normal mice are converted into behavioural phenocopies of a well-known mutant, the spasmodic mouse PUBMED:10677206.

    \ 2953 IPR005507 \ The proteins in this family are poorly characterised, but an investigation PUBMED:11596096 has indicated that the immediate early protein is required for the down-regulation of MHC class I expression in dendritic cells. Human herpesvirus 6 immediate early protein is also referred to as U90.\ 2513 IPR002713 \ The FF domain may be involved in protein-protein interactions PUBMED:10390614. It often occurs as multiple copies and often accompanies WW domains. PRP40 from yeast encodes a novel, essential splicing component that associates with the yeast U1 small nuclear ribonucleoprotein particle PUBMED:8622699.\ 6113 IPR011131 \

    This family consists of several Orthopoxvirus proteins of unknown function.

    \ 1003 IPR004019 \ The YLP motif is found in several Drosophila proteins. Its function is unknown; however, the presence of completely conserved tyrosine residues, and the motif's occurrence in the human erbb-4 receptor protein-tyrosine kinase precursor, suggest that it could be a substrate for tyrosine kinases.\ 571 IPR003399 \

    This domain is found in all 24 mce genes associated with the four mammalian cell entry (mce) operons of Mycobacterium tuberculosis and their homologs in other Actinomycetales PUBMED:12052567, PUBMED:14500535. The archetype (mce1A, Rv0169) was isolated as being necessary for colonisation of, and survival within, the macrophage PUBMED:8367727. The domain is also found in:

    \ \ \ 7754 IPR012486 \

    The sequences featured in this family are similar to a hypothetical protein product of ORF N1221 in the CPT1-SPC98 intergenic region of the yeast genome (). This encodes an acidic polypeptide with several possible transmembrane regions PUBMED:8619318.

    \ 7113 IPR010838 \

    This family contains several hypothetical bacterial proteins of unknown function that are approximately 250 residues long.

    \ 927 IPR001440 \

    The tetratricopeptide repeat (TPR) is a structural motif present in a wide range of proteins PUBMED:7667876, PUBMED:9482716, PUBMED:1882418. It mediates protein-protein interactions and the assembly of multiprotein complexes PUBMED:14659697. The TPR motif consists of 3-16 tandem repeats of 34 amino acid residues, although individual TPR motifs can be dispersed in the protein sequence. Sequence alignment of the TPR domains reveals a consensus sequence defined by a pattern of small and large amino acids. TPR motifs have been identified in organisms ranging from bacteria to humans. Proteins containing TPRs are involved in a variety of biological processes, such as cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding.

    The X-ray structure of a domain containing three TPRs from protein phosphatase 5 revealed that TPR adopts a helix-turn-helix arrangement, with adjacent TPR motifs packing in a parallel fashion, resulting in a spiral of repeating anti-parallel alpha-helices PUBMED:14659697. The two helices are denoted helix A and helix B. The packing angle between helix A and helix B is ~24° within a single TPR and generates a right-handed superhelical shape. Helix A interacts with helix B and with helix A' of the next TPR. Two protein surfaces are generated: the inner concave surface is contributed mainly by residues on helix A, and the other surface presents residues from both helices A and B.

    \ 591 IPR003409 \ The MORN (Membrane Occupation and Recognition Nexus) motif is found in multiple copies in several proteins including junctophilins (PUBMED:10949023). The function of this motif is unknown.\ 69 IPR001506 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis.

    Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
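
    As a rough illustration, the HEXXH and abXHEbbHbc motifs can be written as regular expressions and scanned against a protein sequence. This is a minimal sketch with several assumptions. The residue sets used for 'b' (uncharged) and 'c' (hydrophobic) are illustrative choices. 'a' is restricted to its most common values (V or T), and proline is excluded from every variable position. The example fragment is hypothetical. A motif match on its own does not show that a protein is a metalloprotease.

```python
import re

# Illustrative residue classes (assumptions, not an official definition).
UNCHARGED = "ACFGHILMNQSTVWY"        # 'b': excludes charged D, E, K, R and proline
HYDROPHOBIC = "ACFILMVW"             # 'c': a simple hydrophobic set
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"  # 'X': any residue except proline

# Minimal HEXXH zinc-binding motif.
HEXXH = re.compile(f"HE[{ANY_BUT_PRO}]{{2}}H")

# Stricter abXHEbbHbc form, with 'a' taken as V or T.
ABXHEBBHBC = re.compile(
    f"[VT][{UNCHARGED}][{ANY_BUT_PRO}]HE[{UNCHARGED}]{{2}}H"
    f"[{UNCHARGED}][{HYDROPHOBIC}]"
)


def find_motifs(pattern: re.Pattern, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence fragment, for illustration only.
fragment = "MKVVAHELTHAVGG"
print(find_motifs(HEXXH, fragment))       # [(5, 'HELTH')]
print(find_motifs(ABXHEBBHBC, fragment))  # [(2, 'VVAHELTHAV')]
```

    The loose HEXXH pattern matches far more often than the stricter form. This fits the text's note that HEXXH alone is relatively common.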


    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
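
    The letter-coding scheme above lends itself to a simple lookup. The sketch below is hypothetical helper code, not part of MEROPS itself. It reads only the leading catalytic-type letter of a family or clan identifier (for example "M12" or "MA"). It then reports the nucleophile class described in the paragraph above.

```python
# Catalytic-type letters as described in the text; "P" applies to clans only.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing more than one type)",
}

AMINO_ACID_NUCLEOPHILE = {"C", "S", "T"}  # form an acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}       # activated water molecule


def catalytic_type(identifier: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for a family or clan identifier."""
    identifier = identifier.strip()
    if not identifier:
        raise ValueError("empty identifier")
    code = identifier[0].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter: {code!r}")
    if code in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not defined by the type letter"
    return CATALYTIC_TYPES[code], nucleophile


print(catalytic_type("M12"))  # ('metallo', 'activated water molecule')
```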


    This group of metallopeptidases belongs to the MEROPS peptidase family M12, subfamily M12A (astacin family, clan MA(M)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active site residues for members of this family and thermolysin occur in the motif HEXXH PUBMED:7674922.


    The astacin () family of metalloendopeptidases encompasses a range of proteins found in organisms from hydra to humans, in mature and developmental systems PUBMED:7670368. Their functions include activation of growth factors, degradation of polypeptides, and processing of extracellular proteins PUBMED:7670368. The proteins are synthesised with N-terminal signal and pro-enzyme sequences, and many contain multiple domains C-terminal to the protease domain. They are either secreted from cells or associated with the plasma membrane.


    The astacin molecule adopts a kidney shape, with a deep active-site cleft between its N- and C-terminal domains PUBMED:8445658. The zinc ion, which lies at the bottom of the cleft, exhibits a unique penta-coordinated mode of binding, involving 3 histidine residues, a tyrosine and a water molecule (which is also bound to the carboxylate side chain of Glu93) PUBMED:8445658. The N-terminal domain comprises 2 alpha-helices and a 5-stranded beta-sheet. The overall topology of this domain is shared by the archetypal zinc-endopeptidase thermolysin. Astacin protease domains also share common features with serralysins, matrix metalloendopeptidases, and snake venom proteases; they cleave peptide bonds in polypeptides such as insulin B chain and bradykinin, and in proteins such as casein and gelatin; and they have arylamidase activity PUBMED:7670368.

    \ 803 IPR007075 \ RNA polymerases catalyse the DNA dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 6, represents a mobile module of the RNA polymerase. Domain 6 forms part of the shelf module PUBMED:8910400, PUBMED:11313498. This family appears to be specific to the largest subunit of RNA polymerase II.\ 5152 IPR007989 \

    This family consists of several uncharacterised Arabidopsis thaliana proteins of unknown function.

    \ 3303 IPR003179 \ Methyl-coenzyme M reductase (MCR) is the enzyme responsible for microbial formation of methane. It is a hexamer composed of 2 alpha, 2 beta, and 2 gamma subunits with two identical nickel porphinoid active sites PUBMED:9367957.\ 4381 IPR001619 \

    Sec1-like molecules have been implicated in a variety of eukaryotic vesicle transport processes, including neurotransmitter release by exocytosis PUBMED:8769846. They regulate vesicle transport by binding to a t-SNARE from the syntaxin family. This process is thought to prevent formation of the SNARE complex, a protein complex required for membrane fusion. Whereas Sec1 molecules are essential for neurotransmitter release and other secretory events, their interaction with syntaxin molecules seems to represent a negative regulatory step in secretion PUBMED:10903948.

    \ 7468 IPR011507 \

    These Rhodopirellula baltica proteins share a highly conserved sequence, centred around an invariant QPP motif, at their N termini. This motif may represent an export signal.

    \ 5140 IPR007977 \

    The p21 membrane protein of vaccinia virus, encoded by the A17L (or A18L) gene, has been reported to localise to the inner of the two membranes of the intracellular mature virus (IMV). It has also been shown that p21 acts as a membrane anchor for the externally located fusion protein P14 (A27L gene) PUBMED:11882999.

    \ 2792 IPR007507 \

    This is a domain found in proteins that transfer activated sugars to a variety of substrates, including glycogen, fructose-6-phosphate and lipopolysaccharides. Proteins bearing this domain transfer UDP, ADP, GDP or CMP linked sugars. This region is flanked at the N terminus by a signal peptide and at the C terminus by a glycosyl transferase group 1 domain (). The eukaryotic glycogen synthases may be distant members of this bacterial family PUBMED:10952982.

    \ 3503 IPR006975 \ NifQ is involved in early stages of the biosynthesis of the iron-molybdenum cofactor (FeMo-co) PUBMED:8316214, which is an integral part of the active site of dinitrogenase PUBMED:7954845. The conserved C-terminal cysteine residues may be involved in metal binding PUBMED:8316214.\ 6858 IPR009752 \

    This family consists of both hypothetical bacterial and phage proteins of around 145 residues in length. The function of this family is unknown.

    \ 7912 IPR012996 \

    This domain is a putative zinc-binding domain (CHHC motif) in RNP H and F. The domain is often associated with .

    \ 5002 IPR004443 \ The C-terminal region of yjeF from Escherichia coli shows similarity to hydroxyethylthiazole kinase (thiM) and other enzymes involved in thiamine biosynthesis. Saccharomyces cerevisiae YKL151C and Bacillus subtilis yxkO match the yjeF C-terminal domain but lack this region. The proteins in this group are of unknown function.\ 1394 IPR001851 \

    Bacterial binding protein-dependent transport systems PUBMED:3527048, PUBMED:2229036 are multicomponent systems typically composed of a periplasmic substrate-binding protein, one or two reciprocally homologous integral inner-membrane proteins and one or two peripheral membrane ATP-binding proteins that couple energy to the active transport system.


    The integral inner-membrane proteins translocate the substrate across the membrane. It has been shown PUBMED:3000770, PUBMED:7934906 that most of these proteins contain a conserved region located about 80 to 100 residues from their C-terminal extremity. This region seems PUBMED:1738314 to be located in a cytoplasmic loop between two transmembrane domains. Apart from the conserved region, the sequences of these proteins are quite divergent; however, they can be classified into seven families, termed araH, cysTW, fecCD, hisMQ, livHM, malFG and oppBC.

    \ 7610 IPR012419 \

    The members of this family are sequences that are similar to a region of the Cas1p protein (). This is an O-acetyltransferase that in Cryptococcus neoformans was shown to be required for O-acetylation of its capsular polysaccharide PUBMED:11703667. The capsule is this organism's most obvious virulence factor PUBMED:11703667.

    \ 5457 IPR008706 \ This family consists of a group of 17.4 kDa nanovirus proteins which are highly related to the Vicia faba necrotic yellows virus component 8 protein whose function is unknown PUBMED:9880028.\ 4422 IPR007046 \ This domain makes a direct interaction with the core RNA polymerase, to form an enhancer dependent holoenzyme PUBMED:10894718. The centre of this domain contains a very weak similarity to a helix-turn-helix motif, which may represent a DNA binding domain.\ 1715 IPR002219 \

    Diacylglycerol (DAG) is an important second messenger. Phorbol esters (PE) are analogues of DAG and potent tumor promoters that cause a variety of physiological changes when administered to both cells and tissues. DAG activates a family of serine/threonine protein kinases, collectively known as protein kinase C (PKC) PUBMED:1396661. Phorbol esters can directly stimulate PKC. The N-terminal region of PKC, known as C1, has been shown PUBMED:2500657 to bind PE and DAG in a phospholipid and zinc-dependent fashion. The C1 region contains one or two copies (depending on the isozyme of PKC) of a cysteine-rich domain, which is about 50 amino-acid residues long, and which is essential for DAG/PE-binding. The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are probably the six cysteines and two histidines that are conserved in this domain.

    \ 7237 IPR010877 \

    This family contains GP46 phage proteins (approximately 120 residues long).

    \ 5319 IPR008823 \ The RuvB protein makes up part of the RuvABC revolvasome, which catalyses the resolution of Holliday junctions that arise during genetic recombination and DNA repair. Branch migration is catalysed by the RuvB protein, which is targeted to the Holliday junction by the structure-specific RuvA protein PUBMED:12423347. This group of sequences contains this signature, which is located in the C-terminal region of the proteins; it is thought to be a helicase DNA-binding domain.\ 2525 IPR001298 \

    The many different actin cross-linking proteins share a common architecture, consisting of a globular actin-binding domain and an extended rod. Whereas their actin-binding domains consist of two calponin homology domains (see ), their rods fall into three families.


    The rod domain of the family including the Dictyostelium discoideum gelation factor (ABP120) and human filamin (ABP280) is constructed from tandem repeats of a 100-residue motif that is glycine and proline rich PUBMED:9164464. The gelation factor's rod contains 6 copies of the repeat, whereas filamin has a rod constructed from 24 repeats. The 3D structure of rod repeats from the gelation factor has shown that they consist of a beta-sandwich, formed by two beta-sheets arranged in an immunoglobulin-like fold PUBMED:9164464, PUBMED:10467095. Because conserved residues that form the core of the repeats are preserved in filamin, the repeat structure is expected to be common to the members of the gelation factor/filamin family.


    The head-to-tail homodimerisation is crucial to the function of the ABP120 and ABP280 proteins. This interaction involves a small portion at the distal end of the rod domains. For the gelation factor it has been shown that the carboxy-terminal repeat 6 dimerises through a double edge-to-edge extension of the beta-sheet and that repeat 5 contributes to dimerisation to some extent PUBMED:9417983, PUBMED:10467095, PUBMED:2668299.

    \ 3593 IPR001704 \

    Orexins (also known as hypocretins) are recently identified neuropeptides that are specifically localised to the hypothalamus. They are thought to interact with autonomic, neuroendocrine and neuroregulatory systems, and play an important role in the regulation of feeding behaviour PUBMED:9892705, PUBMED:9419374. When applied to hypothalamic neurones, these peptides are neuroexcitatory, an action that is probably mediated by their binding to a new family of G-protein-coupled receptors (orexin receptors 1 and 2), which were previously orphan receptors PUBMED:9491897.


    To date, two orexins have been characterised (orexin-A and -B), both encoded by a single mRNA transcript (prepro-orexin): orexin-A is a 33-residue peptide with two intramolecular disulphide bonds in the N-terminal region; and orexin-B is a linear 28-residue peptide. These peptides have 46% identity at the amino acid sequence level, and show some similarity to the glucagon/vasoactive intestinal polypeptide/secretin peptide family.
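
    Percent identity figures such as the 46% quoted above are computed from a pairwise alignment. Below is a minimal sketch. It assumes the two sequences are already aligned, with gaps written as "-", and counts identity only over columns where neither sequence has a gap. Other conventions, such as dividing by the full alignment length or by the shorter sequence, give different numbers. The published orexin value may use any of these. The example alignment is hypothetical and is not orexin-A or orexin-B.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment (not real orexin sequences).
print(percent_identity("ACDE-G", "ACDQTG"))  # 80.0
```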

    \ \ 4379 IPR000701 \ Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic component composed of an FAD-binding flavoprotein and an iron-sulphur protein, and a hydrophobic component composed of a cytochrome b and a membrane anchor protein.

    The cytochrome b component is a mono-heme transmembrane protein PUBMED:1447196, PUBMED:8152421, PUBMED:7616569 belonging to a family that includes cytochrome b-556 from bacterial SDH (gene sdhC); cytochrome b560 from the mammalian mitochondrial SDH complex and that encoded in the mitochondrial genome of some algae and in the plant Marchantia polymorpha; cytochrome b from yeast mitochondrial SDH complex (gene SDH3 or CYB3); and protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues that comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the heme group.
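
    Transmembrane regions like the three described above are often first located with a hydropathy plot. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 cut-off, a commonly used heuristic. It is an illustration, not the method used to define this family. It handles only the 20 standard residues, and the test sequence is an artificial poly-leucine stretch flanked by lysines.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy per window; entry i covers residues i..i+window-1."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def candidate_tm_segments(sequence: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of above-threshold windows into (start, end) residue spans.

    Spans are 0-based with an exclusive end.
    """
    segments = []
    start = None
    profile = hydropathy_profile(sequence, window)
    for i, score in enumerate(profile):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            # Last window above threshold was i - 1; it ends at i - 1 + window.
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


# Artificial test sequence: 25 leucines flanked by 10 lysines on each side.
print(candidate_tm_segments("K" * 10 + "L" * 25 + "K" * 10))  # [(5, 40)]
```

    Window averaging smears the boundaries: the detected span extends a few residues into the lysine flanks. Predicted ends should therefore be treated as approximate.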

    \ 7062 IPR009871 \

    This family consists of several Banana bunchy top virus proteins of around 120 residues in length. One member is annotated as a movement protein, whereas most other family members are hypothetical. The function of this family is unknown.

    \ 7478 IPR004044 \

    The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. It is a domain of around 70 amino acids that is present in a wide variety of quite diverse nucleic acid-binding proteins PUBMED:8036511. It has been shown to bind RNA PUBMED:9302998, PUBMED:10369774. Like many other RNA-binding motifs, KH motifs are found in one or multiple copies (14 copies in chicken vigilin) and, at least for hnRNP K (three copies) and FMR-1 (two copies), each motif is necessary for in vitro RNA binding activity, suggesting that they may function cooperatively or, in the case of single KH motif proteins (for example, Mer1p), independently PUBMED:8036511.


    According to structural analyses PUBMED:9302998, PUBMED:10369774, PUBMED:11160884, the KH domain can be separated into two groups. The first group, type-1, contains a beta-alpha-alpha-beta-beta-alpha structure, whereas in type-2 the last two beta-strands are located in the N-terminal part of the domain (alpha-beta-beta-alpha-alpha-beta). Sequence similarity between these two folds is limited to a short region (VIGXXGXXI) in the RNA-binding motif. This motif is located between helices 1 and 2 in type-1 and between helices 2 and 3 in type-2. Proteins known to contain a type-2 KH domain include the eukaryotic and prokaryotic S3 family of ribosomal proteins, and the prokaryotic GTP-binding protein era.
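
    The short consensus shared by both KH folds can be scanned for directly. This is a minimal sketch that assumes each 'X' in VIGXXGXXI stands for any residue. A literal match like this is strict, so KH domains that deviate from the consensus at the V or I positions would be missed. The example fragment is hypothetical.

```python
import re


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus where 'X' means any residue and other letters are literal."""
    return re.compile("".join("." if c == "X" else re.escape(c)
                              for c in consensus.upper()))


KH_CORE = consensus_to_regex("VIGXXGXXI")


def find_kh_core(sequence: str) -> list[int]:
    """0-based start positions of non-overlapping consensus matches."""
    return [m.start() for m in KH_CORE.finditer(sequence.upper())]


# Hypothetical fragment, for illustration only.
print(find_kh_core("AAVIGKKGEKIAA"))  # [2]
```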

    \ 8044 IPR013208 \

    Lipocalins are transporters for small hydrophobic molecules, such as lipids, steroid hormones, bilins, and retinoids. The structure is an eight-stranded beta barrel.

    \ 2253 IPR006728 \ This conserved region is found in several uncharacterised proteins from Gram-positive bacteria.\ 6890 IPR010761 \

    This family contains a number of Clc-like proteins that are approximately 250 residues long. These seem to be specific to Caenorhabditis elegans.

    \ 7715 IPR012932 \

    Vitamin K epoxide reductase (VKOR) recycles reduced vitamin K, which is used subsequently as a co-factor in the gamma-carboxylation of glutamic acid residues in blood coagulation enzymes. VKORC1 is a member of a large family of predicted enzymes that are present in vertebrates, Drosophila, plants, bacteria and archaea PUBMED:15276181. Four cysteine residues and one residue, which is either serine or threonine, are identified as likely active-site residues PUBMED:15276181. In some plant and bacterial homologues the VKORC1 homologous domain is fused with domains of the thioredoxin family of oxidoreductases PUBMED:15276181.

    \ 2310 IPR007770 \ This family contains uncharacterised plant proteins of unknown function.\ 7733 IPR012892 \

    Sequences found in this entry are derived from a number of bacteriophage and prophage proteins. They are similar to gp58 (), a minor structural protein of Lactobacillus delbrueckii bacteriophage LL-H PUBMED:7828907.

    \ 6780 IPR009713 \

    This family consists of several Enterobacterial PsiA proteins. The function of PsiA is unknown although it is thought that it may affect the generation of an SOS signal in Escherichia coli PUBMED:3526338.

    \ 879 IPR004125 \

    The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a 7S RNA and six protein subunits. One of these subunits, the 54 kDa protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 has a two-domain structure: the G-domain, which binds GTP, and the M-domain, which binds the 7S RNA and also binds the signal sequence. The N-terminal 300 residues of SRP54 include the GTP-binding site (G-domain) (see ) and are evolutionarily related to similar domains in other proteins PUBMED:7518075, PUBMED:14657338.


    These proteins include Escherichia coli and Bacillus subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homolog of ftsY; and bacterial flagellar biosynthesis protein flhF.

    \ 5988 IPR010379 \

    During the bacterial cell cycle, the tubulin-like cell-division protein FtsZ polymerises into a ring structure that establishes the location of the nascent division site. EzrA modulates the frequency and position of FtsZ ring formation PUBMED:10449747.

    \ 1639 IPR000883 \ Cytochrome c oxidase () is a key enzyme in aerobic metabolism. Proton-pumping heme-copper oxidases represent the terminal, energy-transfer enzymes of respiratory chains in prokaryotes and eukaryotes. The CuB-heme a3 (or heme o) binuclear center, associated with the largest subunit I of cytochrome c and ubiquinol oxidases (), is directly involved in the coupling between dioxygen reduction and proton pumping PUBMED:8083153, PUBMED:8049679. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes).

    The enzyme complex ranges from 3-4 subunits (prokaryotes) to 13 polypeptides (mammals), of which only the catalytic subunit (equivalent to mammalian subunit I (CO I)) is found in all heme-copper respiratory oxidases. The presence of a bimetallic center (formed by a high-spin heme and copper B) as well as a low-spin heme, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members PUBMED:8013452, PUBMED:6307356, PUBMED:2824194. In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in heme and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions PUBMED:8083153.


    It has been shown that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that proteobacteria (Purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria PUBMED:8083153.


    Nitric oxide reductase (NOR) () exists in denitrifying species of archaea and eubacteria and is a heterodimer of cytochromes b and c. Phenazine methosulphate can act as an acceptor. The PROSITE signature in this entry recognises the heme-copper site of the nitric oxide reductases.

    \ 1273 IPR000749 \ ATP:guanido phosphotransferases are a family of structurally and functionally related enzymes PUBMED:2324092, PUBMED:7819288 that reversibly catalyze the transfer of phosphate between ATP and various phosphagens. The enzymes belonging to this family include glycocyamine kinase (), which catalyzes the transfer of phosphate from ATP to guanidoacetate; arginine kinase (), which catalyzes the transfer of phosphate from ATP to arginine; taurocyamine kinase (), an annelid-specific enzyme that catalyzes the transfer of phosphate from ATP to taurocyamine; lombricine kinase (), an annelid-specific enzyme that catalyzes the transfer of phosphate from ATP to lombricine; Smc74, a cercaria-specific enzyme from Schistosoma mansoni PUBMED:2324092; and creatine kinase () (CK) PUBMED:3896131, PUBMED:2324105, which plays an important role in energy metabolism of vertebrates. CK catalyzes the reversible transfer of high-energy phosphate from ATP to creatine, generating phosphocreatine and ADP. There are at least four different, but very closely related, forms of CK. Two isozymes, M (muscle) and B (brain), are cytosolic, while the other two are mitochondrial. In sea urchin there is a flagellar isozyme, which consists of a triplicated CK domain. A cysteine residue is implicated in the catalytic activity of these enzymes, and the region around this active site residue is highly conserved.\ 6133 IPR010445 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 7738 IPR012507 \

    The sequences featured in this family are similar to two proteins expressed by Lactococcus lactis, YibE () and YibF (). Most of the members of this family are annotated as being putative membrane proteins, and in fact the sequences contain a high proportion of hydrophobic residues.

    \ 2514 IPR007709 \ Formylglutamate amidohydrolase (FGase) catalyzes the terminal reaction in the five-step pathway for histidine utilization in Pseudomonas putida. By this action, N-formyl-L-glutamate (FG) is hydrolyzed to produce L-glutamate plus formate PUBMED:3308850.\ 4787 IPR002213 \

    UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyzes the addition of the glycosyl group from a UTP-sugar to a small hydrophobic molecule. This family currently consist of:

    \ \

    These enzymes share a conserved domain of about 50 amino acid residues located in their C-terminal section.

    \ 5759 IPR009229 \

    This family consists of several AgrD proteins from many Staphylococcus species. The agr locus was initially described in Staphylococcus aureus as an element controlling the production of exoproteins implicated in virulence. Its pattern of action has been shown to be complex, upregulating certain extracellular toxins and enzymes expressed post-exponentially and repressing some exponential-phase surface components. agrD encodes the precursor of the autoinducing peptide (AIP). The AIP derived from AgrD by the action of AgrB interacts with AgrC in the membrane to activate AgrA, which upregulates transcription both from promoter P2, amplifying the response, and from P3, initiating the production of a novel effector: RNAIII. In S. aureus, delta-hemolysin is the only translation product of RNAIII and is not involved in the regulatory functions of the transcript, which is therefore the primary agent for modulating the expression of other operons controlled by agr PUBMED:11807079.

    \ \ 2580 IPR002141 \ Influenza virus nucleoprotein is a structural protein which encapsidates the negative strand viral RNA. NP is one of the main determinants of species specificity. The question of how far the NP gene can cross the species barrier by reassortment and become adapted by mutation to the new host has been discussed PUBMED:4024728.\ 7814 IPR012946 \

    The X8 domain PUBMED:11115868 contains 6 conserved cysteine residues that presumably form three disulphide bridges. The domain is found in an olive pollen allergen PUBMED:15004167 as well as at the C terminus of family 17 glycosyl hydrolases PUBMED:11115868. This domain may be involved in carbohydrate binding.

    \ 6787 IPR010716 \

    This family represents a conserved region approximately 200 residues long within eukaryotic RecQ helicase protein-like 5 (RecQ5). The RecQ helicases have been implicated in DNA repair and recombination, and RecQ5 may have an important role in DNA metabolism PUBMED:10710432.

    \ 1553 IPR008136 \ CinA is the first gene in the competence-inducible (cin) operon, and is thought to be specifically required at some stage in the process of transformation PUBMED:7538190. This is a C-terminal region of putative competence/damage-inducible proteins from the cin operon.\ 3429 IPR002856 \

    This family of methyltransferases occurs in both archaea and bacteria. In archaea, members of this family (MtrH) are involved in the energy conservation step of methanogenesis, while in bacteria, members of this family whose function has been defined (CmuB) are involved in the metabolism of chloromethane.


    In archaea, the enzyme tetrahydromethanopterin S-methyltransferase is composed of eight subunits, MtrA-H. It is a membrane-associated enzyme complex which catalyzes an energy-conserving, sodium-ion-translocating step in methanogenesis from hydrogen and carbon dioxide PUBMED:7737157. Subunit MtrH catalyzes the methylation reaction and was shown to exhibit methyltetrahydromethanopterin:cob(I)alamin methyltransferase activity PUBMED:10338124.
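
    Written as an equation, the methyltransferase activity attributed to MtrH above reads as follows. H4MPT denotes tetrahydromethanopterin. The cobamide is written as free cob(I)alamin to follow the activity name, which simplifies the situation in the enzyme complex.

```latex
\mathrm{CH_3\text{-}H_4MPT} + \text{cob(I)alamin}
  \longrightarrow \mathrm{H_4MPT} + \text{methylcob(III)alamin}
```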


    In bacteria, the pathway of chloromethane utilisation allows the microorganisms that possess it to grow with chloromethane as the sole carbon and energy source. It is initiated by a corrinoid-dependent methyltransferase system involving methyltransferase I (CmuA) and methyltransferase II (CmuB), which transfer the methyl group of chloromethane onto tetrahydrofolate PUBMED:10200311. The methyl group of chloromethane is first transferred by the protein CmuA to its corrinoid moiety, from where it is transferred to tetrahydrofolate by CmuB, thereby yielding methyltetrahydrofolate PUBMED:10447694, PUBMED:11358510.


    CmuB has methylcobalamin:tetrahydrofolate methyltransferase activity, and catalyzes the conversion of methylcobalamin and tetrahydrofolate to cob(I)alamin and methyltetrahydrofolate.
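
    The two methyl transfers described above can be summarised as follows. The corrinoid bound to CmuA is written generically as Co(I) and CH3-Co(III), and H4folate denotes tetrahydrofolate. Protonation states are simplified.

```latex
\begin{align}
  \mathrm{CH_3Cl} + \text{CmuA-[Co(I)]}
    &\longrightarrow \text{CmuA-[CH}_3\text{-Co(III)]} + \mathrm{Cl^-} \\
  \text{methylcobalamin} + \mathrm{H_4folate}
    &\xrightarrow{\text{CmuB}} \text{cob(I)alamin} + \mathrm{CH_3\text{-}H_4folate}
\end{align}
```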

    \ \ \ \ 7338 IPR011089 \

    The family contains RloF from Campylobacter jejuni; its function and those of the other members are unknown.

    \ 713 IPR007719 \ Phytochelatin synthase is the enzyme responsible for the synthesis of heavy-metal-binding peptides (phytochelatins) from glutathione and related thiols PUBMED:11814595.\ 2857 IPR002534 \ The medium (M) genome segment of hantaviruses (family Bunyaviridae) encodes the two virion glycoproteins, G1 and G2, as a precursor protein in the complementary-sense RNA PUBMED:3114716.\ 7781 IPR012472 \

    This family of fungal proteins is uncharacterised. Each protein contains two copies of this region.

    \ 417 IPR000294 \

    In this domain, many glutamate residues are post-translationally modified by vitamin K-dependent carboxylation to form gamma-carboxyglutamate (Gla) PUBMED:3106112, PUBMED:2183788. The Gla domain is responsible for the high-affinity binding of calcium ions. It starts at the N-terminal extremity of the mature form of the protein and ends with a conserved aromatic residue. A conserved Gla-x(3)-Gla-x-Cys motif PUBMED:3317405, found in the middle of the domain, seems to be important for substrate recognition by the carboxylase.
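
    As a rough illustration, the Gla-x(3)-Gla-x-Cys motif can be written as a regular expression. This is a minimal sketch that assumes the search runs on the unmodified, gene-encoded sequence, in which each Gla position still appears as glutamate (E); the helper name `find_gla_motifs` and the test sequence are hypothetical. A match only flags a candidate site and says nothing about whether the residues are actually carboxylated.

```python
import re

# Gla-x(3)-Gla-x-Cys, assuming Gla positions are still encoded as Glu (E).
# The lookahead lets overlapping candidate sites be reported.
GLA_MOTIF = re.compile(r"(?=(E.{3}E.C))")


def find_gla_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each candidate motif in seq."""
    return [(m.start(), m.group(1)) for m in GLA_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_gla_motifs("AAEKLGEACQQ"))
```

    Using a lookahead rather than a plain pattern means that two motifs sharing residues are both reported, which matters when Glu-rich stretches produce several overlapping candidates.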

    \

    The 3D structure of the Gla domain has been solved PUBMED:7713897, PUBMED:8663165. Calcium ions induce conformational changes in the Gla domain and are necessary for the Gla domain to fold properly. A common structural feature of functional Gla domains is the clustering of N-terminal hydrophobic residues into a hydrophobic patch that mediates interaction with the cell surface membrane PUBMED:8663165.

    \ 2683 IPR000407 \

    A number of nucleoside diphosphate and triphosphate hydrolases, as well as some as yet uncharacterized proteins, have been found to belong to the same family PUBMED:8579614, PUBMED:8703025. The uncharacterized proteins all seem to be membrane-bound.

    \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ \ 1784 IPR002925 \ Dienelactone hydrolases play a crucial role in chlorocatechol degradation via the modified ortho cleavage pathway. Enzymes induced in 4-fluorobenzoate-utilizing bacteria have been classified into three groups on the basis of their specificity towards cis- and trans-dienelactone PUBMED:7684040.\ Some proteins contain repeated small fragments of this domain (for example, rat kan-1 protein).\ 1138 IPR007797 \ This family consists of AF4 (Proto-oncogene AF4) and FMR2 (Fragile X E mental retardation syndrome) nuclear proteins. These proteins have been linked to human diseases such as acute lymphoblastic leukemia and mental retardation PUBMED:11171403. The family also contains a Drosophila AF4 homologue, Lilliputian, which contains an AT-hook domain. Lilliputian represents a novel pair-rule gene that acts in cytoskeleton regulation, segmentation and morphogenesis in Drosophila PUBMED:11171404.\ 7508 IPR011626 \

    This domain covers the complement component region of the alpha-2-macroglobulin family.

    \

    The alpha-macroglobulin (aM) family of proteins includes protease inhibitors PUBMED:2473064, typified by the human tetrameric a2-macroglobulin (a2M); they belong to the MEROPS proteinase inhibitor family I39, clan IL. These protease inhibitors share several defining properties, which include (i) the ability to inhibit proteases from all catalytic classes, (ii) the presence of a 'bait region' and a thiol ester, (iii) a similar protease inhibitory mechanism and (iv) the inactivation of the inhibitory capacity by reaction of the thiol ester with small primary amines. \ aM protease inhibitors inhibit by steric hindrance PUBMED:2472396. The mechanism involves protease cleavage of the bait region, a segment of the aM that is particularly susceptible to proteolytic cleavage, which initiates a conformational change such that the aM collapses about the protease. In the resulting aM-protease complex, the active site of the protease is sterically shielded, thus substantially decreasing access to protein substrates. Two additional events occur as a consequence of bait region cleavage, namely (i) the beta-cysteinyl-gamma-glutamyl thiol ester becomes highly reactive and (ii) a major conformational change exposes a conserved COOH-terminal receptor-binding domain (RBD) PUBMED:2469470. RBD exposure allows the aM-protease complex to bind to clearance receptors and be removed from circulation PUBMED:2430968. Tetrameric, dimeric and, more recently, monomeric aM protease inhibitors have been identified PUBMED:9914899, PUBMED:10426429.

    \ \ 3611 IPR003323 \

    This is a group of proteins found primarily in viruses, eukaryotes and the pathogenic bacterium Chlamydia pneumoniae. In viruses they are annotated as replicases or RNA-dependent RNA polymerases. The eukaryotic sequences are related to the Ovarian Tumour (OTU) gene in Drosophila, cezanne deubiquitinating peptidase and tumor necrosis factor, alpha-induced protein 3 (MEROPS peptidase family C64) and otubain 1 and otubain 2 (MEROPS peptidase family C65).

    \ \ \ \ \

    None of these proteins has a known biochemical function, but low sequence similarity with the polyprotein regions of arteriviruses, together with conserved cysteine, histidine and possibly aspartate residues, suggests that those not yet recognised as peptidases could possess cysteine protease activity PUBMED:10664582.

    \ 3686 IPR000730 \

    Proliferating cell nuclear antigen (PCNA), or cyclin, is a non-histone acidic nuclear protein PUBMED:2884104 that plays a key role in the control of eukaryotic DNA replication PUBMED:1346518. It acts as a co-factor for DNA polymerase delta, which is responsible for leading strand DNA replication PUBMED:2565339. The sequence of PCNA is well conserved between plants and animals, indicating a strong selective pressure for structure conservation and suggesting that this type of DNA replication mechanism is conserved throughout eukaryotes PUBMED:1671766. In yeast, PCNA (POL30) is associated with polymerase III, the yeast analogue of polymerase delta.

    \ \ \ \

    Homologues of PCNA have also been identified in the archaea (euryarchaeota and crenarchaeota), in Paramecium bursaria chlorella virus and in nuclear polyhedrosis viruses.

    \ 1459 IPR001612 \

    Caveolins PUBMED:8567687, PUBMED:8552590, PUBMED:15003112 are a family of integral membrane proteins which are the principal components of caveolae membranes. Caveolae are flask-shaped plasma membrane invaginations whose exact cellular function is not yet clear. Caveolins may act as scaffolding proteins within caveolar membranes by compartmentalizing and concentrating signaling molecules. Various classes of signaling molecules, including G-protein subunits, receptor and non-receptor tyrosine kinases, endothelial nitric oxide synthase (eNOS) and small GTPases, bind caveolin-1 (Cav-1) through its 'caveolin-scaffolding domain'.

    \

    Currently, three different forms of caveolins are known: caveolin-1 (or VIP21), caveolin-2 and caveolin-3 (or M-caveolin).

    \

    Caveolins are proteins of about 20 kDa that form high-molecular-mass homo-oligomers. Structurally, they seem to have N-terminal and C-terminal hydrophilic segments and a long central transmembrane domain that probably forms a hairpin in the membrane. Both extremities are known to face the cytoplasm. Caveolae are enriched in cholesterol, and Cav-1 is one of the few proteins that binds cholesterol tightly and specifically.

    \ \ 1062 IPR005079 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
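
    The one-letter catalytic-type codes above lend themselves to a simple lookup. The sketch below assumes identifiers are plain strings such as "C45", "M15" or "PB(C)" whose first character is the type code; the dictionaries and the `catalytic_type` helper are illustrative names, not part of any MEROPS software.

```python
# One-letter catalytic-type codes, as listed in the paragraph above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clan containing families of more than one catalytic type
}

# Nucleophile per type, following the text: serine, threonine and cysteine
# peptidases use an amino-acid residue (acyl intermediate); aspartic, glutamic
# and metallopeptidases use an activated water molecule.
NUCLEOPHILE = {
    "S": "amino-acid residue",
    "T": "amino-acid residue",
    "C": "amino-acid residue",
    "A": "activated water",
    "G": "activated water",
    "M": "activated water",
}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a family or clan ID."""
    code = identifier.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}")
    return CATALYTIC_TYPES[code]


if __name__ == "__main__":
    for ident in ("C45", "M15", "PB(C)"):
        print(ident, catalytic_type(ident))
```

    Unknown prefixes (for example inhibitor families such as "I39") raise an error instead of being silently mapped, since they are not peptidase catalytic types.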

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C45 (clan PB(C)). The active-site residue for members of this family and family T1 is C-terminal to the autolytic cleavage site. They represent a family of enzymes which catalyse the final step in penicillin biosynthesis PUBMED:2120195.

    \ 3798 IPR002477 \

    This domain, peptidoglycan binding domain 1, may have a general peptidoglycan binding function. It is composed of three alpha helices and is found at the N or C terminus of a variety of enzymes involved in bacterial cell wall degradation PUBMED:9555893, PUBMED:7121588, PUBMED:1683402. Examples are:

    \ \

    \ \

    Many of the proteins having this domain are as yet uncharacterised. Those that have been characterised are metallopeptidases belonging to MEROPS peptidase family M15 (clan MD), subfamily M15A. A number of the proteins belonging to subfamily M15A are non-peptidase homologues, as they have either been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.

    \ \ 3727 IPR001769 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C25 (gingipain, clan CD). The protein fold of the peptidase domain for members of this family resembles that of caspase 1, the type example for clan CD.

    \ \ \

    This is a protein family found only in bacteria. Porphyromonas gingivalis (Bacteroides gingivalis) is a Gram-negative anaerobic bacterial species strongly associated with adult periodontitis. One of its distinguishing characteristics and putative virulence properties is the ability to agglutinate erythrocytes PUBMED:8926061. It is a highly proteolytic organism which metabolizes small peptides and amino acids. Indirect evidence suggests that the proteases produced by this microorganism constitute an important virulence factor PUBMED:1322368. Protease-encoding genes have been shown to contain multiple copies of repeated nucleotide sequences. These conserved sequences have also been found in hemagglutinin genes PUBMED:9632563.

    \ 8070 IPR013245 \

    This short repeat is found in the Sel1 protein PUBMED:8722778. It is related to TPR repeats.

    \ 6462 IPR009537 \

    This family represents a conserved region within hypothetical bacterial and archaeal proteins of unknown function.

    \ 1505 IPR004486 \ This is the small subunit of a heterodimer which catalyzes the reaction CO + H2O + Acceptor = CO2 + Reduced acceptor and is involved in the synthesis of acetyl-CoA from CO2 and H2.\ 5150 IPR007987 \

    This family consists of several poxvirus A21 proteins.

    \ 5717 IPR008829 \ This family consists of several eukaryotic and archaeal proteins which are related to the Homo sapiens soluble liver antigen/liver pancreas antigen (SLA/LP autoantigen). Autoantibodies are a hallmark of autoimmune hepatitis, but most are not disease specific. Autoantibodies to soluble liver antigen (SLA) and to liver and pancreas antigen (LP) have been described as disease specific, occurring in about 30% of all patients with autoimmune hepatitis PUBMED:10801173. The function of SLA/LP is unknown; however, it has been suggested that the protein may function as a serine hydroxymethyltransferase and may be an important enzyme in the thus far poorly understood selenocysteine pathway PUBMED:11481605. The archaeal sequences are annotated as pyridoxal phosphate-dependent enzymes.\ 7082 IPR009883 \

    This family consists of several hypothetical bacterial proteins of around 135 residues in length. Members of this family all appear to be Enterobacterial proteins. The function of this family is unknown.

    \ 5266 IPR008857 \ This family consists of several thyrotropin-releasing hormone (TRH) proteins. Thyrotropin-releasing hormone (TRH; pyroGlu-His-Pro-NH2), originally isolated as a hypothalamic neuropeptide hormone, most likely also acts as a neuromodulator and/or neurotransmitter in the central nervous system (CNS). This interpretation is supported by the identification of a peptidase localised on the surface of neuronal cells, termed TRH-degrading ectoenzyme (TRH-DE), which selectively inactivates TRH. TRH has been used clinically for the treatment of spinocerebellar degeneration and disturbance of consciousness in humans PUBMED:12467901.\ 3821 IPR006481 \

    This group of sequences represents one of a large number of mutually dissimilar families of phage holins. Holins act against the host cell membrane to allow lytic enzymes of the phage to reach the bacterial cell wall. This family includes the product of the S gene of phage lambda.

    \ 4274 IPR007759 \

    DNA-dependent RNA polymerases () are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-dependent RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. RNA synthesis begins after the attachment of RNA polymerase to a specific site, the promoter, on the template DNA strand, and continues until a termination sequence is reached. \ The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript.\ \ Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise:\ \

    \ \ Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. \ Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits.

    \ \ \

    The delta protein is a dispensable subunit of Bacillus subtilis RNA polymerase (RNAP) that has major effects on the biochemical properties of the purified enzyme. In the presence of delta, RNAP displays an increased specificity of transcription, a decreased affinity for nucleic acids, and an increased efficiency of RNA synthesis because of enhanced recycling PUBMED:10336502. The delta protein contains two distinct regions: an N-terminal domain and a glutamate- and aspartate-rich C-terminal region PUBMED:7545758.

    \ 6504 IPR009562 \

    This family consists of several hypothetical bacterial proteins of around 150 residues in length. The function of this family is unknown.

    \ 4694 IPR001888 \ Autonomous mobile genetic elements such as transposons or insertion sequences (IS) encode an enzyme, transposase, that is required for excising and inserting the mobile element. Transposases have been grouped into various families PUBMED:8041625, PUBMED:1310791, PUBMED:1718819. This family includes the mariner transposase PUBMED:8895590.\ 5853 IPR010314 \

    This is a domain of unknown function found towards the N-terminus of a family of E3 ubiquitin protein ligases, including yeast TOM1, many of which appear to play a role in mRNA transcription and processing. This domain is found in association with and immediately C-terminal to another domain of unknown function: .

    \ 2986 IPR000532 \ A number of polypeptidic hormones, mainly expressed in the intestine or the pancreas, belong to a group of structurally related peptides PUBMED:3133967, PUBMED:3291691. One such hormone, glucagon, is widely distributed and produced in the alpha-cells of pancreatic islets PUBMED:4076759. It affects glucose metabolism in the liver PUBMED:6577439 by inhibiting glycogen synthesis, stimulating glycogenolysis and enhancing gluconeogenesis. It also increases mobilisation of glucose, free fatty acids and ketone bodies, which are metabolites produced in excess in diabetes mellitus. Glucagon is produced, like other peptide hormones, as part of a larger precursor (preproglucagon), which is cleaved to produce glucagon, glucagon-like peptide I and glucagon-like peptide II PUBMED:3260236. The structure of glucagon itself is fully conserved in all known mammalian species PUBMED:4076759. Other members of the structurally similar group include glicentin precursor, secretin, gastric inhibitory polypeptide, vasoactive intestinal peptide (VIP), prealbumin, peptide HI-27 and growth hormone releasing factor.\ 6375 IPR010543 \

    This entry represents the C terminus of a number of hypothetical plant proteins.

    \ 3902 IPR004003 \ Alanine dehydrogenases () and pyridine nucleotide transhydrogenase () have been shown to share regions of similarity PUBMED:8439307. Alanine dehydrogenase catalyzes the NAD-dependent reversible reductive amination of pyruvate into alanine. Pyridine nucleotide transhydrogenase catalyzes the reduction of NADP+ to NADPH with the concomitant oxidation of NADH to NAD+. This enzyme is located in the plasma membrane of prokaryotes and in the inner membrane of the mitochondria of eukaryotes. The transhydrogenation between NADH and NADP is coupled with the translocation of a proton across the membrane. In prokaryotes the enzyme is composed of two different subunits, an alpha chain (gene pntA) and a beta chain (gene pntB), while in eukaryotes it is a single chain protein. The sequences of alanine dehydrogenase from several bacterial species are related to those of the alpha subunit of bacterial pyridine nucleotide transhydrogenase and of the N-terminal half of the eukaryotic enzyme. The two most conserved regions correspond respectively to the N-terminal extremity of these proteins and to a central glycine-rich region which is part of the NAD(H)-binding site.\ 5947 IPR003886 \

    This is an extracellular domain of unknown function in nidogen (entactin) and hypothetical proteins PUBMED:11893501. Nidogen is a sulphated glycoprotein which is widely distributed in basement membranes and is tightly associated with laminin. It also binds to collagen IV. Nidogen probably plays a role in cell adhesion and cell-extracellular matrix interactions.

    \ 7847 IPR013122 \

    This domain contains the cation channel region of PKD1 and PKD2 proteins.

    \ 2581 IPR000256 \ NS1 is a homodimeric RNA-binding protein found in influenza virus that is required for viral replication. NS1 binds the poly(A) tails of mRNAs, keeping them in the nucleus. NS1 inhibits pre-mRNA splicing by tightly binding to a specific stem-bulge of U6 snRNA PUBMED:9360601.\ 5732 IPR008581 \ This family consists of a number of hypothetical proteins from plants. The function of this family is unknown.\ 247 IPR004158 \ The function of the plant proteins constituting this family is unknown.\ 2272 IPR002749 \ These proteins of unknown function are found in archaebacteria and are probably transmembrane proteins.\ 7336 IPR011109 \

    This domain is usually found associated with in putative integrases/recombinases of mobile genetic elements of diverse bacteria and phages.

    \ 7974 IPR012948 \

    This domain is the central domain of AARP2. It is weakly similar to the GTP-binding domain of elongation factor TU PUBMED:15112237.

    \ 154 IPR007763 \

    The proton-pumping NADH:ubiquinone oxidoreductase, also called complex I, is the entry point for electrons into the respiratory chains of many bacteria and mitochondria of most eukaryotes PUBMED:14741580. It couples electron transfer with the translocation of protons across the membrane, thus providing the proton motive force essential for energy-consuming processes. The human enzyme complex is composed of a total of 43-45 subunits.

    This family contains the 17.2 kDa subunit of complex I of NADH:ubiquinone oxidoreductase and its homologues. This subunit is believed to be one of the 36 structural complex I proteins.

    \ 7544 IPR011666 \ This domain is found at the N terminus of several eukaryotic RNA processing proteins (e.g ).\ 6335 IPR010525 \

    This pattern represents a conserved region of auxin-responsive transcription factors.

    \

    The plant hormone auxin (indole-3-acetic acid) can regulate the expression of several gene families, including the Aux/IAA, GH3 and SAUR families. Two related families of proteins, Aux/IAA proteins () and the auxin response factors (ARF), are key regulators of auxin-modulated gene expression PUBMED:12036262. There are multiple ARF proteins, some of which activate, while others repress, transcription. ARF proteins bind to auxin-responsive cis-acting promoter elements (AuxREs) using an N-terminal DNA-binding domain. It is thought that Aux/IAA proteins regulate transcription by modifying ARF activity through the C-terminal protein-protein interaction domains () found in both Aux/IAA and ARF proteins.

    \ 2571 IPR005626 \

    This is a family of FLP proteins that catalyse recombination between large inverted repeats of the plasmid.

    \ \ \ 6624 IPR010649 \

    This family consists of several bacterial periplasmic nitrate reductase NapE proteins. Seven genes, napKEFDABC, encoding the periplasmic nitrate reductase system were cloned from the denitrifying phototrophic bacterium Rhodobacter sphaeroides f. sp. denitrificans IL106. NapE is thought to be a transmembrane protein PUBMED:10227138.

    \ 7103 IPR009899 \

    This family consists of several bacterial antirestriction (ArdA) proteins. ArdA functions in bacterial conjugation to allow an unmodified plasmid to evade restriction in the recipient bacterium and yet acquire cognate modification PUBMED:12618468.

    \ 3806 IPR007686 \ This entry represents a family of bacterial phosphatidylglycerophosphatases (), known as PgpA. Bacteria appear to possess several phosphatidylglycerophosphatases; thus, PgpA is not essential in Escherichia coli PUBMED:1309518.\ 5006 IPR005877 \

    Many surface proteins found in Streptococcus, Staphylococcus, and related lineages share apparently homologous signal sequences. A motif resembling [YF]SIRKxxxGxxS[VIA] appears at the start of the transmembrane domain. The GxxS motif appears perfectly conserved, suggesting a specific function and not just homology.
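
    A minimal way to scan sequences for this signal-sequence motif is a regular expression in which each x becomes a single-residue wildcard. The helper `find_ysirk_motifs` and the example sequence are hypothetical; the pattern is transcribed directly from the motif above and has not been benchmarked against real streptococcal or staphylococcal proteins.

```python
import re

# [YF]SIRKxxxGxxS[VIA]: each x stands for any single residue.
YSIRK_MOTIF = re.compile(r"[YF]SIRK.{3}G.{2}S[VIA]")


def find_ysirk_motifs(seq: str) -> list[int]:
    """Return 0-based start positions of [YF]SIRKxxxGxxS[VIA] matches in seq."""
    return [m.start() for m in YSIRK_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical sequence fragment containing one copy of the motif.
    print(find_ysirk_motifs("MKKYSIRKFSVGLASVAAA"))
```

    Because the G and S positions are fixed while the x positions are free, a single substitution at either conserved residue is enough to abolish a match, mirroring the observation that the GxxS part appears perfectly conserved.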

    \ 7119 IPR009908 \

    This family consists of several bacterial methylamine utilisation MauE proteins. The enzymes involved in methylamine oxidation via methylamine dehydrogenase (MADH) are encoded by genes present in the mau cluster. MauE and MauD are specifically involved in the processing, transport and/or maturation of the beta-subunit, and the absence of either of these proteins leads to production of a non-functional beta-subunit which is rapidly degraded PUBMED:9403107.

    \ 3115 IPR001229 \

    A variety of proteins containing this domain are lectins, such as Jacalin from the seeds of Artocarpus heterophyllus (jackfruit), which is specific for galactose. Some lectins in this group stimulate distinct T- and B-cell functions; one example is Jacalin, which binds to the T-antigen PUBMED:12206779. This domain is found in 1 to 6 copies in lectins. The domain is also found in the salt-stress induced protein from rice and an animal prostatic spermine-binding protein.

    \ 7840 IPR012616 \

    The TOM13 family of proteins are mitochondrial outer membrane proteins that mediate the assembly of beta-barrel proteins PUBMED:15326197.

    \ 235 IPR003791 \

    This entry describes proteins of unknown function.

    \ 1067 IPR003439 \

    ATP-binding cassette (ABC) transporters are multidomain membrane proteins, responsible for the controlled efflux and influx of substances (allocrites) across cellular membranes. They are minimally composed of four domains, with two transmembrane domains (TMDs) responsible for allocrite binding and transport and two nucleotide-binding domains (NBDs) responsible for coupling the energy of ATP hydrolysis to conformational changes in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of hydrolysis at one NBD effectively abrogates hydrolysis at the other. Hydrolysis at the two NBDs may occur in an alternating fashion, although the NBDs appear substantially functionally symmetrical in terms of their binding to diverse nucleotides PUBMED:12504680.

    \

    On the basis of sequence similarities a family of related ATP-binding proteins has been characterized PUBMED:2229036, PUBMED:3288195, PUBMED:3762694, PUBMED:3762695, PUBMED:1977073.

    \ \ \

    The proteins belonging to this family also contain one or two copies of the 'A' consensus sequence PUBMED:6329717 or the 'P-loop' PUBMED:2126155 (see ).

    \ 5692 IPR008448 \ This family consists of several Chordopoxvirus DNA-directed RNA polymerase 7 kDa polypeptide sequences. DNA-dependent RNA polymerase catalyses the transcription of DNA into RNA PUBMED:1560534.\ 4170 IPR000218 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which have been grouped on the basis of sequence similarities PUBMED:. Based on amino-acid sequence homology, it is predicted that ribosomal protein L14 is a member of a recently identified family of structurally related RNA-binding proteins PUBMED:15299380. L14 is a protein of 119 to 137 amino-acid residues.

    \ \ 4480 IPR004084 \

    Spo11 is a meiosis-specific protein in yeast that covalently binds to DNA double-strand breaks (DSBs) during the early stages of meiosis PUBMED:10534401. These DSBs initiate homologous recombination, which is required for chromosomal segregation and generation of genetic diversity during meiosis. Mouse and human homologues of Spo11 have been cloned and characterised. The two proteins are 82% identical to each other and share ~25% identity with other family members. Mouse Spo11 has been localised to chromosome 2H4, and human SPO11 to chromosome 20q13.2-q13.3, a region amplified in some breast and ovarian tumours PUBMED:10534401.

    Similarity between SPO11 and archaebacterial TOP6A proteins points to evolutionary specialisation of a DNA-cleavage function for meiotic recombination PUBMED:10622720. Note that the yeast SPO11 protein shares far less similarity with other SPO11 proteins than the human and mouse homologues do with each other.
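
    Identity percentages such as the 82% figure depend on how they are counted. The sketch below assumes one common convention, identical residues divided by the number of aligned columns in which neither sequence has a gap, applied to an already computed pairwise alignment; the cited study may have used a different denominator, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap.

    Assumes the two strings are rows of a pairwise alignment of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contribute a residue.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACDE-G", "ACDQHG"))
```

    Dividing by the full alignment length, or by the length of the shorter sequence, would give a lower number for the same alignment, which is one reason identity values from different sources are not always directly comparable.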

    \ 3327 IPR000316 \

    Metallothioneins (MTs) are small proteins that bind heavy metals such as zinc, copper, cadmium and nickel. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds PUBMED:1779825, PUBMED:2959513, PUBMED:3064814, PUBMED:2959504. An empirical classification into three classes has been proposed by Fowler and coworkers PUBMED:2959504 and Kojima PUBMED:1779826. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units PUBMED:2959504.

    This original classification system has proved limited, in that it does not allow clear differentiation of patterns of structural similarity, either between or within classes. Consequently, all class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically related and thus alignable sequences. This system subdivides the MT superfamily into families, subfamilies, subgroups, and isolated isoforms and alleles.

    The metallothionein superfamily comprises all polypeptides that resemble equine renal metallothionein in several respects PUBMED:2959504, for example: low molecular weight; high metal content; an amino acid composition with high Cys and low aromatic residue content; a unique sequence with a characteristic distribution of cysteines; and spectroscopic manifestations indicative of metal thiolate clusters. An MT family subsumes MTs that share particular sequence-specific features and are thought to be evolutionarily related. The inclusion of an MT within a family presupposes that its amino acid sequence is alignable with those of all members. Fifteen MT families have been characterised, each identified by its number and its taxonomic range: e.g., Family 1: vertebrate MTs.

    \

    Family 15 consists of plant MTs. Its members are recognised by the sequence pattern [YFH]-x(5,25)-C-[SKD]-C-[GA]-[SDPAT]-x(0,1)-C-x-[CYF], which matches all plant sequences but also MTCU_HELPO and the non-MT ITB3_HUMAN. The taxonomic range of the members extends to plants. Plant MTs are 45-84 residue proteins containing 17 conserved cysteines that bind 5 zinc ions. Generally, there are two Cys-rich regions (domain 1 and domain 3) separated by a Cys-poor region (domain 2), and only domain 2 contains unusual residues. It is believed that these proteins may have a role in Zn2+ homeostasis during embryogenesis. Family 15 includes the following subfamilies: p1, p2, p2v, p3, pec, p21.
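    The Family 15 signature is written in PROSITE pattern syntax. The sketch below, which is illustrative rather than an official tool, translates such a pattern into a Python regular expression and scans a sequence with it. It assumes only the syntax elements that appear in this pattern (single residues, [..] alternatives, {..} exclusions, x wildcards and (n) or (n,m) repeats); the helper names prosite_to_regex and find_hits are hypothetical, and the test sequence is synthetic, not a real MT.

```python
import re

# Family 15 (plant MT) signature as given in the text.
FAMILY15_PATTERN = "[YFH]-x(5,25)-C-[SKD]-C-[GA]-[SDPAT]-x(0,1)-C-x-[CYF]"

# One PROSITE element: a residue, [alternatives], {exclusions} or x,
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Sketch only: terminal anchors ('<', '>') and rarer syntax are not handled.
    """
    tokens = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                         # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            token = residue                     # 'C' or '[SKD]' is already valid regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return "".join(tokens)


def find_hits(pattern: str, sequence: str) -> list:
    """Return (start, end) spans of non-overlapping matches in the sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.span() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex(FAMILY15_PATTERN))
    # Synthetic string built to satisfy the pattern (not a real protein).
    print(find_hits(FAMILY15_PATTERN, "MSYAAAAACSCGSCACK"))
```

    As the entry itself notes, a pattern match is not proof of membership: this signature also hits MTCU_HELPO and ITB3_HUMAN, so hits would normally be checked against other evidence.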

    \ \ 6517 IPR010603 \

    The ClpX heat shock protein of Escherichia coli is a member of the universally conserved Hsp100 family of proteins, and possesses a putative zinc finger motif of the C4 type PUBMED:11278349. This presumed zinc binding domain (ZBD) is found at the N terminus of the ClpX protein. ClpX is an ATPase which functions both as a substrate specificity component of the ClpXP protease and as a molecular chaperone. ZBD is a member of the treble clef zinc finger family, a motif known to facilitate protein-ligand, protein-DNA and protein-protein interactions; it forms a constitutive dimer that is essential for the degradation of some, but not all, ClpX substrates PUBMED:14525985.

    \ 2217 IPR007655 \ This is a family of hypothetical bacterial proteins.\ 1121 IPR003391 \ This protein, also known as bellett protein, is covalently attached to the termini of replicating DNA in vivo PUBMED:433158 and may play a role in DNA replication.\ 7805 IPR013111 \

    This family contains EGF domains found in a variety of extracellular proteins.

    \ 7783 IPR012500 \

    The Clostridium neurotoxin family is composed of tetanus neurotoxin and seven serotypes of botulinum neurotoxin. The structure of the botulinum neurotoxin reveals a four-domain protein: the N-terminal catalytic domain, the central translocation domain and two receptor-binding domains PUBMED:9783750. After cell-surface binding and receptor-mediated endocytosis of the neurotoxin, an acid-induced conformational change in the translocation domain is believed to allow it to penetrate the endosomal membrane and form a pore, thereby facilitating the passage of the catalytic domain across the membrane into the cytosol PUBMED:9783750. The structure of the translocation domain reveals a pair of helices 105 Angstroms long, and the domain is structurally distinct from other pore-forming toxins PUBMED:9783750.

    \ 6456 IPR010582 \

    Catalases are antioxidant enzymes that catalyse the conversion of hydrogen peroxide to water and molecular oxygen, protecting cells from its toxic effects PUBMED:11351128. Hydrogen peroxide is produced as a consequence of oxidative cellular metabolism and can be converted by transition metals into the highly reactive hydroxyl radical, which can damage a wide variety of molecules within a cell, leading to oxidative stress and cell death. Catalases neutralise hydrogen peroxide toxicity and are produced by all aerobic organisms, from bacteria to man. Most catalases are monofunctional, haem-containing enzymes, although there are also bifunctional haem-containing peroxidase/catalases that are closely related to plant peroxidases, and non-haem, manganese-containing catalases that are found in bacteria PUBMED:14745498.
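    Written as balanced equations, the catalase reaction described above is shown first. The second line is the Fenton reaction, included as one concrete example of how a transition metal converts hydrogen peroxide into the hydroxyl radical; iron is chosen here for illustration, since the entry does not name a specific metal.

```latex
\begin{align}
2\,\mathrm{H_2O_2} &\longrightarrow 2\,\mathrm{H_2O} + \mathrm{O_2} && \text{(catalase)} \\
\mathrm{Fe^{2+}} + \mathrm{H_2O_2} &\longrightarrow \mathrm{Fe^{3+}} + \mathrm{OH^-} + {}^{\bullet}\mathrm{OH} && \text{(Fenton reaction)}
\end{align}
```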

    \ \

    This entry represents a small conserved region within catalase enzymes.

    \ 2384 IPR005658 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This domain of proteinase inhibitors belongs to MEROPS inhibitor family I11, clan IN. Ecotins are dimeric periplasmic proteins from Escherichia coli and related Gram-negative bacteria that have been shown to be potent inhibitors of many trypsin-fold serine proteases of widely varying substrate specificity, which belong to MEROPS peptidase family S1 PUBMED:14705960. Phylogenetic analysis suggested that ecotin has an exogenous target, possibly neutrophil elastase. Ecotins from E. coli, Yersinia pestis and Pseudomonas aeruginosa, all species that encounter the mammalian immune system, inhibit neutrophil elastase strongly, whereas ecotin from the plant pathogen Pantoea citrea inhibits neutrophil elastase 1000-fold less strongly than the others PUBMED:14705961.

    \ \

    All of these ecotins potently inhibit the pancreatic digestive peptidases trypsin and chymotrypsin, while showing more variable inhibition of the blood peptidases Factor Xa, thrombin and urokinase-type plasminogen activator.

    \ \ 6186 IPR009416 \

    This family consists of several Mycobacterium ESAT-6-like proteins of unknown function.

    \ 60 IPR001606 \

    Members of the recently discovered ARID (AT-rich interaction domain) family of DNA-binding proteins are found in fungi and in invertebrate and vertebrate metazoans. ARID-encoding genes are involved in a variety of biological processes, including embryonic development, cell lineage gene regulation and cell cycle control. Although the specific roles of this domain and of ARID-containing proteins in transcriptional regulation are yet to be elucidated, they include both positive and negative transcriptional regulation and a likely involvement in the modification of chromatin structure PUBMED:10838570. The basic structure of the ARID domain appears to be a series of six alpha-helices separated by beta-strands, loops or turns, but the structured region may extend to an additional helix at either or both ends of the basic six. Based on primary sequence homology, ARID proteins can be partitioned into three structural classes: minimal ARID proteins, which consist of a core domain formed by six alpha-helices; ARID proteins that supplement the core domain with an N-terminal alpha-helix; and extended-ARID proteins, which contain the core domain and additional alpha-helices at their N- and C-termini.

    \ \ \

    The human SWI-SNF complex protein p270 is an ARID family member with non-sequence-specific DNA binding activity. The ARID consensus and other structural features are common to both p270 and yeast SWI1, suggesting that p270 is a human counterpart of SWI1 PUBMED:10757798. The approximately 100-residue ARID sequence is present in a series of proteins strongly implicated in the regulation of cell growth, development and tissue-specific gene expression. Although about a dozen ARID proteins can be identified from database searches, to date only Bright (a regulator of B-cell-specific gene expression), dead ringer (a Drosophila melanogaster gene product required for normal development) and MRF-2 (which represses expression from the cytomegalovirus enhancer) have been analyzed directly with regard to their DNA binding properties. Each binds preferentially to AT-rich sites. In contrast, p270 shows no sequence preference in its DNA binding activity, demonstrating that AT-rich binding is not an intrinsic property of ARID domains and that ARID family proteins may be involved in a wider range of DNA interactions PUBMED:10757798.

    \ 696 IPR008915 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis.

    Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
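    To make the difference between the loose HEXXH motif and the stricter abXHEbbHbc definition concrete, here is a minimal Python sketch that scans a sequence for both. The residue sets chosen for 'b' (uncharged) and 'c' (hydrophobic) are assumptions made for illustration, and restricting 'a' to Val/Thr makes the sketch stricter than the text, which says only that 'a' is *most often* one of these. The function name scan_zinc_motifs is hypothetical, and the test strings are illustrative.

```python
import re

# Residue classes for the stricter abXHEbbHbc definition. The exact sets are an
# assumption for illustration: the text says only that 'a' is most often Val or
# Thr, 'b' is uncharged, 'c' is hydrophobic, and that Pro never occurs in the site.
A_RESIDUES = "VT"
B_RESIDUES = "ACFGILMNQSTVWY"       # uncharged: excludes D, E, K, R, H and P
C_RESIDUES = "AFILMVWY"             # hydrophobic
X_RESIDUES = "ACDEFGHIKLMNQRSTVWY"  # any standard residue except P

LOOSE_HEXXH = re.compile(r"HE..H")
STRICT_MOTIF = re.compile(
    f"[{A_RESIDUES}][{B_RESIDUES}][{X_RESIDUES}]HE"
    f"[{B_RESIDUES}][{B_RESIDUES}]H[{B_RESIDUES}][{C_RESIDUES}]"
)


def scan_zinc_motifs(sequence: str) -> dict:
    """Return 0-based start positions of loose HEXXH and strict abXHEbbHbc matches."""
    seq = sequence.upper()
    return {
        "HEXXH": [m.start() for m in LOOSE_HEXXH.finditer(seq)],
        "abXHEbbHbc": [m.start() for m in STRICT_MOTIF.finditer(seq)],
    }


print(scan_zinc_motifs("VVAHELTHAV"))  # both definitions match
print(scan_zinc_motifs("VVAHEPTHAV"))  # Pro at a 'b' position: only the loose motif matches
```

    The second call shows why the stricter definition is useful: HEXXH alone accepts a proline between the two histidines, whereas abXHEbbHbc, in line with the observation that proline is never found in this site, rejects it.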

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
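    Because the first character of a MEROPS identifier encodes the catalytic type, it can be read off mechanically. The Python sketch below does this for the family and clan identifiers that appear in these entries (M50, M20, U6, clans MM, MH and U-). The accepted identifier shapes, a type letter followed by a number with an optional subfamily letter, or by one more letter or '-', are an assumption about the naming convention, and catalytic_type is a hypothetical helper, not part of any MEROPS API.

```python
import re

# Catalytic-type codes as listed in the text; 'P' is used only for clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Assumed identifier shapes: a family is a type letter plus a number with an
# optional subfamily letter (e.g. 'M50', 'S1A'); a clan is a type letter plus
# one more letter or '-' (e.g. 'MM', 'U-').
IDENTIFIER = re.compile(r"[ACGMSTUP](?:\d+[A-Z]?|[A-Z-])")


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a MEROPS
    peptidase family or clan identifier."""
    ident = identifier.strip().upper()
    if IDENTIFIER.fullmatch(ident) is None:
        raise ValueError(f"not a peptidase family or clan identifier: {identifier!r}")
    return CATALYTIC_TYPES[ident[0]]


for example in ("M50", "M20", "U6", "MH", "U-"):
    print(example, "->", catalytic_type(example))
```

    Inhibitor identifiers such as I11 (used in the ecotin entry) follow a separate scheme and are rejected with a ValueError rather than being misread as a peptidase type.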

    \ \

    This family contains metallopeptidases belonging to MEROPS peptidase family M50 (S2P protease family, clan MM).

    \ \

    Members of the M50 metallopeptidase family include mammalian sterol-regulatory element binding protein (SREBP) site 2 protease, Escherichia coli protease EcfE, stage IV sporulation protein FB and various hypothetical bacterial and eukaryotic homologues. A number of proteins are classified as non-peptidase homologues because they have either been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.

    \ 6597 IPR009614 \

    The Axe-Txe pair in Enterococcus faecium and the homologous YefM-YoeB pair in Escherichia coli have been shown to act as antitoxin-toxin pairs. This family describes the toxin component. Nearly every example found lies next to an identifiable antitoxin, as indicated by a match to PUBMED:12603745.

    \ 151 IPR002486 \ The function of this domain is unknown. It is found in the N-terminal region of nematode cuticle collagens. Cuticle is a tough elastic structure secreted by hypodermal cells and is primarily composed of collagen proteins PUBMED:2753356, PUBMED:7828882.\ 7921 IPR012625 \

    This family consists of the huwentoxin-II (HWTX-II) family of toxins secreted by spiders. These toxins are found in the venom of the bird spider Selenocosmia huwena Wang. HWTX-II adopts a novel scaffold, different from the ICK (inhibitor cystine knot) motif found in other huwentoxins. HWTX-II consists of 37 amino acid residues, including six cysteines involved in three disulfide bridges PUBMED:15066414.

    \ 474 IPR004154 \ tRNA synthetases, or tRNA ligases, are involved in protein synthesis. This domain is found in histidyl, glycyl, threonyl and prolyl tRNA synthetases PUBMED:10447505; it is probably the anticodon-binding domain PUBMED:9115984.\ 3428 IPR005866 \

    This model describes subunit C of N5-methyltetrahydromethanopterin:coenzyme M methyltransferase in methanogenic archaea. This methyltransferase is a membrane-associated enzyme complex that uses a methyl-transfer reaction to drive a sodium-ion pump. Archaea have evolved energy-yielding pathways marked by one-carbon biochemistry featuring novel cofactors and enzymes. This transferase is involved in the transfer of a methyl group from N5-methyltetrahydromethanopterin to coenzyme M. In an accompanying reaction, methane is produced by two-electron reduction of the methyl moiety in methyl-coenzyme M by another enzyme, methyl-coenzyme M reductase.
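    The two steps described above can be sketched as reactions, where H4MPT abbreviates tetrahydromethanopterin and HS-CoM is coenzyme M. Coenzyme B (HS-CoB), shown as the electron donor in the second step, is standard methanogenesis background that this entry does not itself mention.

```latex
\begin{align}
\text{CH}_3\text{-H}_4\text{MPT} + \text{HS-CoM} &\longrightarrow \text{CH}_3\text{-S-CoM} + \text{H}_4\text{MPT}
  && \text{(methyltransferase, coupled to Na$^+$ pumping)} \\
\text{CH}_3\text{-S-CoM} + \text{HS-CoB} &\longrightarrow \text{CH}_4 + \text{CoM-S-S-CoB}
  && \text{(methyl-coenzyme M reductase)}
\end{align}
```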

    \ \ 163 IPR004204 \ Cytochrome c oxidase, a 13-subunit complex, is the terminal oxidase in the mitochondrial electron transport chain. This family is composed of cytochrome c oxidase subunit VIc.\ 532 IPR002223 \

    The majority of the sequences having this domain belong to MEROPS inhibitor family I2, clan IB (the Kunitz/bovine pancreatic trypsin inhibitor family). They inhibit proteases of the S1 family PUBMED:14705960 and are restricted to the metazoa, with a single exception: Amsacta moorei entomopoxvirus. They are short (~50 residue) alpha/beta proteins with few secondary structures. The fold is constrained by 3 disulphide bonds. The type example for this family is aprotinin (bovine pancreatic trypsin inhibitor, or basic protease inhibitor) PUBMED:1714504, but the family includes numerous other members PUBMED:1703675, PUBMED:1593645, PUBMED:8159751, PUBMED:1304909, such as snake venom basic protease; mammalian inter-alpha-trypsin inhibitors; trypstatin, a rat mast cell inhibitor of trypsin; a domain found in an alternatively spliced form of Alzheimer's amyloid beta-protein; domains at the C-termini of the alpha(1) and alpha(3) chains of type VII and type VI collagens; and tissue factor pathway inhibitor precursor.

    \ 4990 IPR005545 \

    The majority of proteins in this group contain a single copy of this domain, though it is also found as a repeat. A strongly conserved histidine and an aspartate suggest that the domain has an enzymatic function.

    \ 5665 IPR008426 \ This family consists of several eukaryotic centromere protein H (CENP-H) sequences. The macromolecular centromere-kinetochore complex plays a critical role in sister chromatid separation, but its complete protein composition and its precise dynamic function during mitosis have not yet been clearly determined. CENP-H contains a coiled-coil structure and a nuclear localisation signal. CENP-H is specifically and constitutively localised in kinetochores throughout the cell cycle, where it may play a role in kinetochore organisation and function PUBMED:10488063.\ 3313 IPR000111 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycosyl hydrolase family 27 together with the family 36 alpha-galactosidases form the glycosyl hydrolase clan GH-D, a superfamily of alpha-galactosidases, alpha-N-acetylgalactosaminidases and isomaltodextranases which are likely to share a common catalytic mechanism and structural topology.

    \

    Alpha-galactosidase (melibiase) PUBMED:4561015 catalyzes the hydrolysis of melibiose into galactose and glucose. In man, deficiency of this enzyme is the cause of Fabry's disease (X-linked sphingolipidosis). Alpha-galactosidase is present in a variety of organisms, and there is a considerable degree of similarity between the sequences of alpha-galactosidases from various eukaryotic species. Escherichia coli alpha-galactosidase (gene melA), which requires NAD and magnesium as cofactors, is not structurally related to the eukaryotic enzymes; by contrast, an Escherichia coli plasmid-encoded alpha-galactosidase (gene rafA) PUBMED:2556373 contains a region of about 50 amino acids that is similar to a domain of the eukaryotic alpha-galactosidases.

    Alpha-N-acetylgalactosaminidase PUBMED:2174888 catalyzes the hydrolysis of terminal non-reducing N-acetyl-D-galactosamine residues in N-acetyl-alpha-D-galactosaminides. In man, deficiency of this enzyme is the cause of Schindler and Kanzaki diseases. The sequence of this enzyme is highly related to that of the eukaryotic alpha-galactosidases.

    \ 7996 IPR012617 \

    This C-terminal domain is found in traube proteins PUBMED:15112237.

    \ 6124 IPR009387 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 5084 IPR007921 \

    The CHAP (cysteine, histidine-dependent amidohydrolases/peptidases) domain is a region of 110 to 140 amino acids found in proteins from bacteria, bacteriophages, archaea and eukaryotes of the Trypanosomatidae family. Many of these proteins are uncharacterized, but it has been proposed that they may function mainly in peptidoglycan hydrolysis. The CHAP domain is found in a wide range of protein architectures; it is commonly associated with bacterial-type SH3 domains and with several families of amidase domains. It has been suggested that CHAP domain-containing proteins utilize a catalytic cysteine residue in a nucleophilic-attack mechanism PUBMED:12765833, PUBMED:12765834.

    \ \

    The CHAP domain contains two invariant residues, a cysteine and a histidine. These residues form part of the putative active site of CHAP domain-containing proteins. Secondary structure predictions show that the CHAP domain belongs to the alpha + beta structural class, with the N-terminal half largely containing predicted alpha helices and the C-terminal half principally composed of predicted beta strands PUBMED:12765833, PUBMED:12765834.

    \ \

    Some proteins known to contain a CHAP domain are listed below:\

    \ \ 520 IPR002777 \

    Prefoldin (PFD) is a chaperone that interacts exclusively with type II chaperonins, hetero-oligomers lacking an obligate co-chaperonin that are found only in eukaryotes (chaperonin-containing T-complex polypeptide-1 (CCT)) and archaea. Eukaryotic PFD is a multi-subunit complex containing six polypeptides in the molecular mass range of 14-23 kDa. In archaea, on the other hand, PFD is composed of two types of subunits, two alpha and four beta. The six subunits associate to form two back-to-back up-and-down eight-stranded barrels, from which hang six coiled coils. Each subunit contributes one (beta subunits) or two (alpha subunits) beta hairpin turns to the barrels. The coiled coils are formed by the N and C termini of an individual subunit. Overall, this unique arrangement resembles a jellyfish. The eukaryotic PFD hexamer is composed of six different subunits; however, these can be grouped into two alpha-like (PFD3 and -5) and four beta-like (PFD1, -2, -4 and -6) subunits based on amino acid sequence similarity with their archaeal counterparts. Eukaryotic PFD has a six-legged structure similar to that seen in the archaeal homologue PUBMED:11106732, PUBMED:12456645. This family contains the archaeal beta subunit and eukaryotic prefoldin subunits 1, 2, 4 and 6.

    \ \

    Eukaryotic PFD has been shown to bind both actin and tubulin co-translationally. The chaperone then delivers the target protein to CCT, interacting with the chaperonin through the tips of the coiled coils. No authentic target proteins of any archaeal PFD have been identified to date.

    \ 7624 IPR012434 \

    The members of this family are sequences derived from a group of hypothetical proteins expressed by certain bacterial species. The region concerned is approximately 440 amino acid residues in length.

    \ 6588 IPR010632 \

    This is a group of plant proteins, most of which are hypothetical and of unknown function. All members contain the domain, suggesting that they may possess kinase activity.

    \ 2402 IPR004703 \ Bacterial PTS transporters transport and concomitantly phosphorylate their sugar substrates, and typically consist of multiple subunits or protein domains. The Man family is unique in several respects among PTS permease families:
  • It is the only PTS family in which members possess an IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteinyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.

    The only characterized member of this family of PTS transporters is the Escherichia coli galactitol transporter. Gat family PTS systems typically have 3 components: IIA, IIB and IIC.

    This family is specific for the IIC component of the PTS Gat family.

    \ 4685 IPR001102 \ Synonym(s): Transglutaminase, Fibrinoligase, TGase \

    Protein-glutamine gamma-glutamyltransferases (TGases) are calcium-dependent enzymes that catalyze the cross-linking of proteins by promoting the formation of isopeptide bonds between the gamma-carboxamide group of a glutamine in one polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the conjugation of polyamines to proteins PUBMED:1683845, PUBMED:1974250.
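    The cross-linking step can be sketched as a single transamidation, with Gln-C(O)NH2 standing for the glutamine side-chain carboxamide and H2N-Lys for the lysine epsilon-amino group. The release of ammonia shown here is standard transamidation chemistry rather than something this entry states explicitly.

```latex
\[
\text{Gln-}\mathrm{C(O)NH_2} + \mathrm{H_2N}\text{-Lys}
\xrightarrow{\ \text{TGase},\ \mathrm{Ca^{2+}}\ }
\text{Gln-}\mathrm{C(O)NH}\text{-Lys} + \mathrm{NH_3}
\]
```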

    \ \

    Transglutaminases are widely distributed in various organs, tissues and body fluids. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilizing the fibrin clot.

    \ 5133 IPR007970 \

    This family consists of several uncharacterised Drosophila melanogaster proteins of unknown function.

    \ 6954 IPR009809 \

    This family consists of several hypothetical bacterial proteins of around 180 residues in length. The function of this family is unknown.

    \ 849 IPR007191 \

    Sec8 is a component of the exocyst complex involved in the docking of exocytic vesicles with a fusion site on the plasma membrane. The exocyst complex is composed of Sec3, Sec5, Sec6, Sec8, Sec10, Sec15, Exo70 and Exo84.

    \ 1476 IPR003153 \

    Cbl adaptor proteins are RING-type E3 ubiquitin ligases. Cbl may be involved in the negative regulation of thymocyte development, targeting its substrate for ubiquitination PUBMED:11864842. The ubiquitin ligase activity of Cbl, and of its homologue Cbl-b, plays a role in the negative regulation of upstream kinases, such as Lck, Syk and PI3K, in T and B cells PUBMED:12787751. Cbl can interact with the EGF receptor (EGFR), causing the ubiquitination of the receptor following EGF ligand binding and Grb2 association. Ubiquitination is required for ligand-induced endocytosis of the EGFR PUBMED:15194809. The N-terminal domain of Cbl is evolutionarily conserved, and is known to bind to phosphorylated tyrosine residues.

    \ 4271 IPR000605 \ This domain includes RNA helicases thought to be involved in duplex unwinding during viral RNA replication. Proteins containing this domain are found in a variety of single-stranded RNA viruses.\ 47 IPR002110 \

    The ankyrin repeat is one of the most common protein-protein interaction motifs in nature. Ankyrin repeats are tandemly repeated modules of about 33 amino acids. They occur in a large number of functionally diverse proteins, mainly from eukaryotes. The few known examples from prokaryotes and viruses may be the result of horizontal gene transfer PUBMED:8108379. The repeat has been found in proteins of diverse function, such as transcriptional initiators, cell-cycle regulators, cytoskeletal proteins, ion transporters and signal transducers. The ankyrin fold appears to be defined by its structure rather than its function, since there is no specific sequence or structure which is universally recognised by it.

    \ \

    The conserved fold of the ankyrin repeat unit is known from several crystal and solution structures PUBMED:8875926, PUBMED:9353127, PUBMED:9461436, PUBMED:9865693. Each repeat folds into a helix-loop-helix structure with a beta-hairpin/loop region projecting out from the helices at a 90° angle. The repeats stack together to form an L-shaped structure PUBMED:8875926, PUBMED:12461176.

    \ \ \ 2434 IPR001323 \ Erythropoietin, a plasma glycoprotein, is the primary physiological mediator of erythropoiesis PUBMED:3773894. It is involved in the regulation of the level of peripheral erythrocytes by stimulating the differentiation of erythroid progenitor cells, found in the spleen and bone marrow, into mature erythrocytes PUBMED:3346214. It is primarily produced in adult kidneys and foetal liver, and acts by attaching to specific binding sites on erythroid progenitor cells, stimulating their differentiation PUBMED:2877922. Severe kidney dysfunction causes a reduction in the plasma levels of erythropoietin, resulting in chronic anaemia; injection of purified erythropoietin into the bloodstream can help to relieve this type of anaemia. Levels of erythropoietin in plasma fluctuate with the oxygen tension of the blood, but androgens and prostaglandins also modulate the levels to some extent PUBMED:2877922. Erythropoietin glycoprotein sequences are well conserved; as a consequence, the hormones are cross-reactive among mammals, i.e. the hormone from one species, say human, can stimulate erythropoiesis in other species, say mouse or rat PUBMED:1420369.

    Thrombopoietin (TPO), a glycoprotein, is the mammalian hormone that functions as a megakaryocytic lineage-specific growth and differentiation factor, affecting the proliferation and maturation of megakaryocytes from their committed progenitor cells and acting at a late stage of megakaryocyte development. It acts as a circulating regulator of platelet numbers.

    \ 282 IPR006696 \ This is a potential integral membrane protein with no known function.\ 5520 IPR008692 \ This family contains several membrane proteins from Mycobacterium species PUBMED:11891304.\ 2830 IPR001482 \ A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of proteins (also called the type II pathway), belong to this group PUBMED:8438237, PUBMED:7934814. These proteins are probably located in the cytoplasm and, on the basis of the presence of a conserved P-loop region, bind ATP.\ 5013 IPR000380 \ Prokaryotic topoisomerase I PUBMED:7773745, PUBMED:7770916, otherwise known as relaxing enzyme, untwisting enzyme or swivelase, catalyses the ATP-independent breakage of single-stranded DNA, followed by passage and rejoining of another single-stranded DNA region PUBMED:8114910. This reaction brings about the conversion of one topological isomer of DNA into another: e.g., relaxation of superhelical turns; interconversion of simple and knotted rings of single-stranded DNA; and intertwisting of single-stranded rings of complementary sequences PUBMED:8114910, PUBMED:2553698. Prokaryotic topoisomerase I folds in an unusual way to give 4 distinct domains, enclosing a hole large enough to accommodate a double-stranded DNA segment PUBMED:8114910. A tyrosine at the active site, which lies at the interface of 2 domains, is involved in transient breakage of a DNA strand and formation of a covalent protein-DNA intermediate PUBMED:8114910. The structure reveals a plausible mechanism by which this and related enzymes could catalyse the passage of one DNA strand through a transient break in another strand PUBMED:8114910. Escherichia coli contains 2 type I topoisomerases: topoisomerases I and III PUBMED:2553698. Topoisomerase III can be purified as a potent concatenase, but its role in DNA metabolism is still unclear PUBMED:2553698. Yeast, a eukaryote, also contains a topoisomerase that is similar in sequence and function to the prokaryotic type I topoisomerases PUBMED:2546682.\ 7052 IPR010821 \

    This family consists of several plant-specific chlorophyllase proteins. Chlorophyllase (Chlase) is the first enzyme involved in chlorophyll (Chl) degradation and catalyses the hydrolysis of the ester bond to yield chlorophyllide and phytol PUBMED:10611389.

    \ 4959 IPR008187 \ The human immunodeficiency virus type 1 Vpu protein acts in the degradation of CD4 in the endoplasmic reticulum and in the enhancement of virion release from the plasma membrane of infected cells PUBMED:7853484.\ 3777 IPR005073 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism, as neither the protein fold of the active-site domain nor the active-site residues have been reported.

    \

    This group of peptidases belongs to MEROPS peptidase family U6 (murein endopeptidase family, clan U-). The type example is murein endopeptidase from Escherichia coli. Members of this family are penicillin-insensitive murein endopeptidases involved in the removal of murein from the sacculus by cleaving the peptide bonds between neighbouring strands in mature murein.

    \ 92 IPR000515 \

    Bacterial binding protein-dependent transport systems PUBMED:3527048, PUBMED:2229036 are multicomponent systems typically composed of a periplasmic substrate-binding protein, one or two reciprocally homologous integral inner-membrane proteins and one or two peripheral membrane ATP-binding proteins that couple energy to the active transport system. The integral inner-membrane proteins translocate the substrate across the membrane. It has been shown PUBMED:3000770, PUBMED:7934906 that most of these proteins contain a conserved region located about 80 to 100 residues from their C-terminal extremity. This region seems to be located in a cytoplasmic loop between two transmembrane domains PUBMED:1738314. Apart from the conserved region, the sequences of these proteins are quite divergent and they have a variable number of transmembrane helices; however, they can be classified into seven families, termed araH, cysTW, fecCD, hisMQ, livHM, malFG and oppBC.

    \ 5055 IPR007892 \

    CHASE4 is an extracellular sensory domain present in various classes of transmembrane receptors that are upstream of signal transduction pathways in prokaryotes. Specifically, CHASE4 domains are found in histidine kinases in archaea and in predicted diguanylate cyclases/phosphodiesterases in bacteria. The environmental factors recognized by CHASE4 domains are not yet known PUBMED:12486065.

    \ 691 IPR002933 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis.

    Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
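    The catalytic-type letters can be read as a simple lookup keyed on the first character of a family or clan identifier. This Python sketch encodes only what the paragraph above states. The helper names are hypothetical, and the code does not check that an identifier actually exists in MEROPS.

```python
# Catalytic-type codes as listed in the text above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clan containing families of more than one catalytic type
}

# Types whose nucleophile is an activated water molecule vs. an amino acid residue.
WATER_NUCLEOPHILE = {"A", "G", "M"}
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}  # these form an acyl intermediate


def catalytic_type(merops_id: str) -> str:
    """Catalytic type encoded by the first character of a family or clan identifier."""
    return CATALYTIC_TYPES[merops_id.strip()[0].upper()]


def nucleophile(merops_id: str) -> str:
    """Nucleophile class implied by the type code, per the paragraph above."""
    code = merops_id.strip()[0].upper()
    if code in WATER_NUCLEOPHILE:
        return "activated water molecule"
    if code in RESIDUE_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    return "not determined by the type code"


for ident in ("M20", "M23", "C3", "PA"):
    print(ident, catalytic_type(ident), "|", nucleophile(ident))
```

    Clan PA, for example, resolves to "mixed" because it contains both serine and cysteine peptidase families. That is why the type code alone cannot name its nucleophile.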

    \ \

    This group of proteins contains the metallopeptidases and non-peptidase homologues that belong to the MEROPS peptidase family M20 (clan MH) PUBMED:7674922. The peptidases of this clan have two catalytic zinc ions at the active site, bound by His/Asp, Asp, Glu, Asp/Glu and His. The catalysed reaction involves the release of an N-terminal amino acid, usually neutral or hydrophobic, from a polypeptide PUBMED:7674922. The peptidase M20 family has four subfamilies: \

    \ 1321 IPR001615 \ Bacillus thuringiensis produces toxins active against insects PUBMED:8632451. The toxin kills the larvae of dipteran insects by making pores in the epithelial cell membrane of the insect midgut. The crystal protein is produced during sporulation and is accumulated both as an inclusion and as part of the spore coat.\ 1047 IPR013079 \

    6-Phosphofructo-2-kinase is a bifunctional enzyme that catalyses both the synthesis and the degradation of fructose-2,6-bisphosphate. The fructose-2,6-bisphosphatase reaction involves a phosphohistidine intermediate. The catalytic pathway is:\ \ \ The enzyme is important in the regulation of hepatic carbohydrate metabolism and is found in greatest quantities in the liver, kidney and heart. In mammals, several genes often encode different isoforms, each of which differs in its tissue distribution and enzymatic activity PUBMED:9652401. The family described here resembles the ATP-driven phosphofructokinases. However, the two share little sequence similarity, although a few residues seem key to their interaction with fructose 6-phosphate PUBMED:9753654.

    \ \

    This domain forms the N-terminal region of this enzyme, while another domain forms the C-terminal region.

    \ 8003 IPR012572 \

    This domain is required for cell cycle arrest induced by spindle assembly checkpoint (SPC) activation. It is also involved in the nuclear accumulation and kinetochore targeting of proteins Bub1p, Bub3p and Mad3p PUBMED:15525673.

    \ 7066 IPR009875 \

    This family consists of several bacterial type IV pilus assembly (PilZ) proteins. PilZ is thought to have a cytoplasmic location and be essential for type 4 fimbrial biogenesis but its exact function is unknown PUBMED:8550441.

    \ 6887 IPR009769 \

    This entry represents the C terminus (approximately 250 residues) of a number of hypothetical plant proteins of unknown function.

    \ 8035 IPR013239 \

    This is a family of fungal proteins. RPA14 is one of the final two subunits of Saccharomyces cerevisiae RNA polymerase I and is proposed to play a role in the recruitment of pol I to the promoter PUBMED:15647272.

    \ 1104 IPR006717 \ This domain constitutes the N-terminal region of the E1B 55 kDa protein. E1B 55K binds the tumour suppressor protein p53, converting it from a transcriptional activator that responds to damaged DNA into an unregulated repressor of genes with a p53-binding site PUBMED:10207064. This protects the virus against p53-induced host antiviral responses and prevents apoptosis induced by the adenovirus E1A protein PUBMED:10207064. The role of the N-terminal region in the function of E1B is not known.\ 1351 IPR004148 \

    Endocytosis and intracellular transport involve several mechanistic steps:

  • (1) for the internalisation of cargo molecules, the membrane needs to bend to form a vesicular structure, which requires membrane curvature and a rearrangement of the cytoskeleton;
  • (2) following its formation, the vesicle has to be pinched off the membrane;
  • (3) the cargo has to be subsequently transported through the cell and the vesicle must fuse with the correct cellular compartment.

    Members of the Amphiphysin protein family are key regulators in the early steps of endocytosis. They are involved in the formation of clathrin-coated vesicles in two ways: they promote the assembly of a protein complex at the plasma membrane, and they directly assist in inducing the high curvature of the membrane at the neck of the vesicle. Amphiphysins contain a characteristic domain, known as the BAR (Bin/Amphiphysin/Rvs) domain, which is required for their in vivo function and their ability to tubulate membranes PUBMED:14993925.

    The crystal structure of these proteins suggests that the domain forms a crescent-shaped dimer of a three-helix coiled coil with a characteristic set of conserved hydrophobic, aromatic and hydrophilic amino acids. Proteins containing this domain have been shown to homodimerise, heterodimerise or, in a few cases, interact with small GTPases.

    \ 2864 IPR002531 \ The hypervariable region of the E2/NS1 region of hepatitis C virus varies greatly between viral isolates. E2 is thought to encode a structurally unconstrained envelope protein PUBMED:9425941.\ 5493 IPR008529 \

    These are hypothetical proteins from the proteobacteria.

    \ 2530 IPR003467 \ Fimbriae (also known as pili) are polar filaments radiating from the surface of the bacterium to a length of 0.5-1.5 micrometers, which enable bacteria to colonize the epithelium of specific host organs. This family consists of the minor and major fimbrial subunits.\ 1245 IPR000802 \

    Arsenic is a toxic metalloid whose trivalent and pentavalent ions inhibit a variety of biochemical processes. Operons that encode arsenic resistance have been found in multicopy plasmids from both Gram-positive and Gram-negative bacteria PUBMED:7721697. The resistance mechanism is encoded by a single operon, which houses an anion pump. The pump has two polypeptide components: a catalytic subunit (the ArsA protein), which functions as an oxyanion-stimulated ATPase, and an arsenite export component (the ArsB protein), which is associated with the inner membrane PUBMED:1688427. The ArsA and ArsB proteins are thought to form a membrane complex that functions as an anion-translocating ATPase.

    \

    The ArsB protein is distinguished by its overall hydrophobic character, in keeping with its role as a membrane-associated channel. Sequence analysis reveals the presence of 13 putative transmembrane (TM) regions.

    \ 4186 IPR005633 \

    The N-terminal domain appears to be specific to the eukaryotic ribosomal proteins L25, L23, and L23a.

    \ 5623 IPR008780 \ This family consists of several Vir proteins specific to Plasmodium vivax. The vir genes are present at about 600-1,000 copies per haploid genome and encode proteins that are immunovariant in natural infections, indicating that they may have a functional role in establishing chronic infection through antigenic variation PUBMED:11298455.\ 2294 IPR007021 \ These are transposase-like proteins with no known function.\ 1923 IPR003827 \

    This entry describes proteins of unknown function.

    \ 2607 IPR002908 \

    This family is related to the globular C-terminus of frataxin, the protein that is mutated in Friedreich's ataxia PUBMED:8931268. Friedreich's ataxia is a progressive neurodegenerative disorder caused by loss-of-function mutations in the gene encoding frataxin (FRDA). Frataxin mRNA is predominantly expressed in tissues with a high metabolic rate (including liver, kidney, brown fat and heart). Mouse and yeast frataxin homologues contain a potential N-terminal mitochondrial targeting sequence, and human frataxin has been observed to co-localise with a mitochondrial protein. Furthermore, disruption of the yeast gene has been shown to result in mitochondrial dysfunction. Friedreich's ataxia is thus believed to be a mitochondrial disease caused by a mutation in the nuclear genome (specifically, expansion of an intronic GAA triplet repeat) PUBMED:8596916, PUBMED:8815938, PUBMED:9241270.

    \

    This family is also found in a number of bacterial proteins whose functions are currently unknown.

    \ 6557 IPR009601 \

    This family consists of mammalian nuclear receptor co-activator NRIF3 proteins. NRIF3 exhibits a distinct receptor specificity: it interacts with and potentiates the activity of only thyroid hormone receptors (TRs) and retinoid X receptors (RXRs), and not the other nuclear receptors examined. NRIF3 is a coregulator that possesses both transactivation and transrepression domains and/or functions. Collectively, the NRIF3 family of coregulators may play dual roles in mediating both positive and negative regulatory effects on gene expression PUBMED:11713274.

    \ 2892 IPR006727 \

    This is a family of unknown function found in the Herpes viruses.

    \ 2744 IPR006104 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 2 comprises enzymes with several known activities: beta-galactosidase, beta-mannosidase and beta-glucuronidase.

    \ \

    These enzymes contain a conserved glutamic acid residue which, in Escherichia coli lacZ, has been shown to be the general acid/base catalyst in the active site of the enzyme PUBMED:1350782.

    The sugar binding domain has a jelly-roll fold PUBMED:8008071.

    \ 263 IPR005174 \

    The proteins in this family are found in plants. Their function is unknown.

    \ 3823 IPR002104 \

    Members of this family cleave DNA substrates by a series of staggered cuts, during which the protein becomes covalently linked to the DNA through a catalytic tyrosine residue at the carboxy end of the alignment PUBMED:9082984, PUBMED:9288963.

    \ \

    The catalytic site residues in Cre recombinase are Arg-173, His-289, Arg-292 and Tyr-324.

    \ 4394 IPR000389 \ A number of small hydrophilic plant seed proteins are structurally related. These proteins contain from 83 to 153 amino acid residues and may play a role in equipping the seed for survival, maintaining a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components PUBMED:12231998, PUBMED:8492809. They may also play a role during imbibition by controlling water uptake.\ 3913 IPR000662 \ Proteins belonging to this family include coat protein VP1 from polyomaviruses, which are dsDNA viruses with no RNA stage in their life cycle. The virus capsid is composed of 72 icosahedral units, each of which is composed of five copies of VP1. The virus attaches to the cell surface by recognition of oligosaccharides terminating in alpha(2,3)-linked sialic acid.\ 3083 IPR003220 \ Insertion elements are mobile elements in DNA, usually encoding proteins required for transposition, for example transposases. This protein is absolutely required for transposition of insertion element 1.\ 7710 IPR012909 \

    This domain is found at the N-terminus of the polyhydroxyalkanoate (PHA) synthesis regulators. These regulators have been shown to directly bind DNA and PHA PUBMED:12081972. The invariant nature of this domain compared to the C-terminal domain(s) suggests that it contains the DNA-binding function.

    \ 7485 IPR010003 \

    This entry represents a conserved region approximately 60 residues long within eukaryotic HepA-related protein (HARP). HARP exhibits single-stranded DNA-dependent ATPase activity and is ubiquitously expressed in human and mouse tissues PUBMED:10857751. Family members may contain more than one copy of this region.

    \ 5681 IPR008567 \ This family consists of several hypothetical prokaryotic proteins with no known function.\ 1657 IPR001251 \ This entry defines the C-terminal region of various retinaldehyde/retinal-binding proteins that may be functional components of the visual cycle. Cellular retinaldehyde-binding protein (CRALBP) carries 11-cis-retinol or 11-cis-retinaldehyde as endogenous ligands and may function as a substrate carrier protein that modulates interaction of these retinoids with visual cycle enzymes PUBMED:1715867.

    The multidomain protein Trio binds the LAR transmembrane tyrosine phosphatase, contains a protein kinase domain, and has separate rac-specific and rho-specific guanine nucleotide exchange factor domains PUBMED:8643598. Trio is a multifunctional protein that integrates and amplifies signals involved in coordinating actin remodeling, which is necessary for cell migration and growth.

    Other members of the family are transfer proteins. These include a guanine nucleotide exchange factor that may function as an effector of RAC1, and phosphatidylinositol/phosphatidylcholine transfer protein, which is required for the transport of secretory proteins from the Golgi complex. They also include alpha-tocopherol transfer protein, which enhances the transfer of its ligand between separate membranes.

    \ 4075 IPR003914 \

    Regeneration of injured axons at neuromuscular junctions has been assumed to be regulated by extracellular factors that promote neurite outgrowth. A novel neurite outgrowth factor from chick denervated skeletal muscle has been cloned and characterised. The protein, termed neurocrescin (rabaptin), has been shown to be secreted in an activity-dependent fashion PUBMED:9427343.

    Rabaptin is a 100 kDa coiled-coil protein that interacts with the GTP form of the small GTPase Rab5, a potent regulator of endocytic transport PUBMED:8521472. It is mainly cytosolic, but a fraction co-localises with Rab5 to early endosomes. Rab5 recruits rabaptin-5 to purified early endosomes in a GTP-dependent manner, demonstrating functional similarities with other members of the Ras superfamily. Immunodepletion of rabaptin-5 from cytosol strongly inhibits Rab5-dependent early endosome fusion. Thus, rabaptin-5 is a Rab effector required for membrane docking and fusion.

    \ 725 IPR001736 \ Phosphatidylcholine-hydrolyzing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs). PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types of transport vesicles or may link constitutive vesicular transport to signal transduction pathways. PC-hydrolyzing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs, and viral proteins. Each of these appears to possess a domain duplication, which is apparent from the presence of two motifs containing well-conserved histidine, lysine, and/or asparagine residues which may contribute to the active site aspartic acid. An Escherichia coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs PUBMED:8732763, PUBMED:8755242, PUBMED:8051126, PUBMED:9242915.\ 6612 IPR009622 \

    This is a group of proteins of unknown function.

    \ 5277 IPR008473 \ This family appears to be found in a group of prophage proteins.\ 594 IPR006738 \

    Motilin is a gastrointestinal regulatory polypeptide produced by motilin cells in the duodenal epithelium. It is released into the general circulation at about 100-min intervals during the inter-digestive state and is the most important factor in controlling the inter-digestive migrating contractions. Motilin also stimulates endogenous secretion from the endocrine pancreas PUBMED:9210180.

    This domain is also found in ghrelin, a growth hormone secretagogue synthesised by endocrine cells in the stomach. Ghrelin stimulates growth hormone secretagogue receptors in the pituitary. These receptors are distinct from the growth hormone-releasing hormone receptors, and thus provide a means of controlling pituitary growth hormone release by the gastrointestinal system PUBMED:11306336.

    \ 2625 IPR007014 \

    This is a family of short proteins found in eukaryotes and some archaea. Although the function of these proteins is not known, they may contain transmembrane helices.

    \ 7875 IPR012510 \

    The repeat has the consensus sequence GDV(K/Q/R)(T/S/G)X(R/K/T)WLFETXPLD. This repeat motif is typically found in the N terminus of the proteins, with between 2 and 28 copies. In the human cardiomyopathy-associated protein 1, the SAA repeat domain may interact with actin (unpublished, Wu X, et al). The corresponding homologues in mouse and chicken localise to the adherens junction complex of the intercalated disc in cardiac muscle and to the myotendon junction of skeletal muscle. mXin may co-localise with vinculin, which is known to attach actin to the cytoplasmic membrane PUBMED:12203715. Therefore, it is thought that all the proteins containing the repeat act as adapter proteins for actin binding.
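    As a rough illustration, the consensus GDV(K/Q/R)(T/S/G)X(R/K/T)WLFETXPLD can be written as a regular expression. This Python sketch assumes that X means any residue, and it counts only exact consensus matches. Real repeats are degenerate, so this approach would undercount them. The test sequence and function name are invented for illustration.

```python
import re

# Consensus GDV(K/Q/R)(T/S/G)X(R/K/T)WLFETXPLD, with X taken to mean any residue.
SAA_REPEAT = re.compile(r"GDV[KQR][TSG].[RKT]WLFET.PLD")


def find_repeats(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each exact consensus match."""
    return [(m.start(), m.group()) for m in SAA_REPEAT.finditer(seq.upper())]


toy = "MSGDVKTARWLFETAPLDSSGDVQGEKWLFETLPLDQQ"  # hypothetical sequence, two repeats
hits = find_repeats(toy)
print(len(hits), [start for start, _ in hits])  # 2 [2, 20]
```

    Counting real copy numbers, which range from 2 to 28 in these proteins, would call for a profile or HMM search that tolerates mismatches.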

    \ 5222 IPR008798 \ This family consists of several avirulence proteins from Pseudomonas syringae and Xanthomonas campestris.\ 2252 IPR000620 \ This domain is found in proteins including the Erwinia chrysanthemi PecM protein, which is involved in pectinase, cellulase and blue pigment regulation; and the Salmonella typhimurium PagO protein, the function of which is unknown. Many members of this family have no known function and are predicted to be integral membrane proteins and many of the proteins contain two copies of the domain.\ 694 IPR002886 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role.

    Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Members of this family are the metallopeptidases belonging to MEROPS peptidase family M23 (clan M-), subfamily M23B.

    \ \

    The proteins in this family are variously described as peptidase, endopeptidase, NlpD, lysostaphin, lipoprotein or hypothetical protein. Bacterial lipoproteins for which no proteolytic activity has been demonstrated belong to the non-peptidase homologues of the M23B subfamily.

    \ 5487 IPR008808 \ This family consists of several broad-spectrum mildew resistance proteins from Arabidopsis thaliana. Plant disease resistance (R) genes control the recognition of specific pathogens and activate subsequent defence responses. The A. thaliana locus Resistance to powdery mildew 8 (RPW8) contains two naturally polymorphic, dominant R genes, RPW8.1 and RPW8.2, which individually control resistance to a broad range of powdery mildew pathogens. They induce localised, salicylic acid-dependent defences similar to those induced by R genes that control specific resistance. Apparently, broad-spectrum resistance mediated by RPW8 uses the same mechanisms as specific resistance PUBMED:11141561, PUBMED:12509520.\ 5509 IPR004646 \

    A number of Fe-S cluster-containing hydro-lyases share a conserved motif, including argininosuccinate lyase, adenylosuccinate lyase, aspartase, class I fumarate hydratase (fumarase), and tartrate dehydratase. Proteins in this group represent a subset of closely related proteins or modules, including the Escherichia coli tartrate dehydratase alpha chain and the N-terminal region of the class I fumarase (where the C-terminal region is homologous to the tartrate dehydratase beta chain). The activity of archaeal proteins in this group is unknown.

    \ 2618 IPR001182 \ A number of prokaryotic integral membrane proteins involved in cell cycle processes have been found to be structurally related PUBMED:2509435, PUBMED:2113157. These proteins include the cell division protein ftsW and the rod shape-determining protein rodA (or mrdB) of Escherichia coli and related bacteria, the Bacillus subtilis stage V sporulation protein E (spoVE), the Bacillus subtilis hypothetical proteins ywcF and ylaO, and the Cyanophora paradoxa cyanelle ftsW homolog.\ 7150 IPR009930 \

    This family consists of several Seadornavirus Vp10 proteins found in the Banna and Kadipiro virus. Members of this family are typically around 240 residues in length. The function of this family is unknown.

    \ 552 IPR004872 \

    This family of bacterial lipoproteins contains several antigenic members, which may be involved in bacterial virulence. Their precise function is unknown. However, they are probably distantly related to a family of solute-binding proteins.

    \ \ 5027 IPR005381 \

    This domain is a putative nucleic acid-binding zinc finger and is found in proteins that also contain an XS domain and an XH domain.

    \ 4697 IPR002686 \ Transposases are needed for efficient transposition of the insertion sequence or transposon DNA. This family includes transposases for IS200 from Escherichia coli.\ 6096 IPR010431 \

    This family consists of several eukaryotic fascin or singed proteins. The fascins are a structurally unique and evolutionarily conserved group of actin cross-linking proteins. Fascins function in the organisation of two major forms of actin-based structures: dynamic, cortical cell protrusions and cytoplasmic microfilament bundles. The cortical structures, which include filopodia, spikes, lamellipodial ribs, oocyte microvilli and the dendrites of dendritic cells, have roles in cell-matrix adhesion, cell interactions and cell migration, whereas the cytoplasmic actin bundles appear to participate in cell architecture PUBMED:11948621.

    \ 1348 IPR003103 \

    BAG domains are present in Bcl-2-associated athanogene 1 and silencer of death domains. The BAG proteins are modulators of chaperone activity; they bind to HSP70/HSC70 proteins and promote substrate release. The proteins have anti-apoptotic activity and increase the anti-cell death function of BCL-2 induced by various stimuli. BAG-1 binds to the serine/threonine kinase Raf-1 or Hsc70/Hsp70 in a mutually exclusive interaction. BAG-1 promotes cell growth by binding to and stimulating Raf-1 activity. The binding of Hsp70 to BAG-1 diminishes Raf-1 signaling, inhibits subsequent events such as DNA synthesis, and arrests the cell cycle. BAG-1 has been suggested to function as a molecular switch that encourages cells to proliferate under normal conditions but to become quiescent in a stressful environment PUBMED:12406544.

    BAG-family proteins contain a single BAG domain, except for human BAG-5, which has four BAG repeats. The BAG domain is a conserved region located at the C-terminus of the BAG-family proteins that binds the ATPase domain of Hsc70/Hsp70. The BAG domain is evolutionarily conserved, and BAG domain-containing proteins have been described in a variety of organisms, including mice, Xenopus, Drosophila, Bombyx mori (silk worm), Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Arabidopsis thaliana.

    \

    The BAG domain has 110–124 amino acids and is composed of three anti-parallel alpha-helices, each approximately 30–40 amino acids in length. The first and second helices interact with the serine/threonine kinase Raf-1, and the second and third helices are the sites of the BAG domain interaction with the ATPase domain of Hsc70/Hsp70. Binding of the BAG domain to the ATPase domain is mediated by both electrostatic and hydrophobic interactions in BAG-1 and is energy-requiring.

    \ 5009 IPR002653 \ A20 (an inhibitor of cell death)-like zinc fingers are believed to mediate self-association in A20. These fingers also mediate IL-1-induced NF-kappa B activation.\ 5191 IPR005414 \

    The type III secretion system of Gram-negative bacteria is used to transport virulence factors from the pathogen directly into the host cell PUBMED:9618447 and is only triggered when the bacterium comes into close contact with the host. Effector proteins secreted by the type III system do not possess a secretion signal, and are considered unique because of this. Salmonella spp. secrete an effector protein called SopE that is responsible for stimulating the reorganisation of the host cell actin cytoskeleton and ruffling of the cellular membrane PUBMED:9482928. It acts as a guanine nucleotide exchange factor on Rho-GTPase proteins such as Cdc42 and Rac. Because the bacterium must return the cell to its "normal" state as quickly as possible, another effector, the tyrosine phosphatase SptP, reverses the actions brought about by SopE PUBMED:11316807.

    \ \

    Recently, it has been found that SopE and its protein homologue SopE2 can activate different sets of Rho-GTPases in the host cell PUBMED:11316807. Far from being a redundant set of two similar type III effectors, they act in unison to specifically activate different Rho-GTPase signalling cascades in the host cell during infection.

    \ 7568 IPR012919 \

    The Caenorhabditis elegans UNC-84 protein is a nuclear envelope protein that is involved in nuclear anchoring and migration during development. The S. pombe Sad1 protein localises at the spindle pole body. UNC-84 and Sad1 share a common C-terminal region that is often termed the SUN (Sad1 and UNC) domain PUBMED:10508607, PUBMED:15082709. In mammals, the SUN domain is present in two proteins, Sun1 and Sun2 PUBMED:10508607. The SUN domain of Sun2 has been demonstrated to be in the perinuclear space PUBMED:15082709.

    \ 6220 IPR009435 \

    The Asr protein is synthesised as a precursor, and its cleavage is essential for moderate to high acid tolerance PUBMED:12670971.

    \ 6679 IPR009657 \

    This family contains a number of hypothetical viral proteins of unknown function approximately 200 residues long.

    \ 7345 IPR011115 \

    SecA protein binds to the plasma membrane, where it interacts with proOmpA to support translocation of proOmpA through the membrane. SecA protein achieves this translocation, in association with SecY protein, in an ATP-dependent manner PUBMED:9644254, PUBMED:2542029. This domain represents the N-terminal ATP-dependent helicase domain.

    \ 809 IPR005574 \

    The eukaryotic RNA polymerase subunits RPB4 and RPB7 form a heterodimer that reversibly associates with the RNA polymerase II core. Archaeal cells contain a single RNAP made up of about 12 subunits, displaying considerable homology to the eukaryotic RNAPII subunits. The RPB4 and RPB7 homologs are called subunits F and E, respectively, and have been shown to form a stable heterodimer. While the RPB7 homolog is reasonably well conserved, the similarity between the eukaryotic RPB4 and the archaeal F subunit is barely detectable PUBMED:11741548.

    \ 7189 IPR009155 \

    Cytochrome b562 is a haem-containing protein that is expressed in the periplasm of Escherichia coli. In b-type cytochromes, the haem group is not covalently attached to the polypeptide. Cytochrome b562 has a four-helix bundle structure similar to that found in members of the cytochrome c family. Cytochrome b562 has a reduction potential of 167 mV, which sets the energy yield possible in metabolism and is also a key determinant of the rate at which redox reactions proceed PUBMED:11914078.

    \ 2424 IPR005491 \ Emsy protein is amplified in breast cancer and interacts with BRCA2. The Emsy N terminal (ENT) domain is found in other vertebrate and plant proteins of unknown function, and has a completely conserved histidine residue that may be functionally important.\ 1692 IPR000199 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins that are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This signature defines cysteine peptidases belonging to MEROPS peptidase family C3 (picornain, clan PA(C)), subfamilies C3A and C3B. The protein fold of this peptidase domain resembles that of the serine peptidase chymotrypsin PUBMED:8164744, the type example for clan PA.

    \ \

    Picornaviral proteins are expressed as a single polyprotein, which is cleaved by the viral C3 cysteine protease. The poliovirus polyprotein is selectively cleaved at Gln-|-Gly bonds. In other picornavirus reactions, Glu may be substituted for Gln, and Ser or Thr for Gly.
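    The cleavage rule above can be sketched as a scan for candidate scissile bonds: Gln-|-Gly for poliovirus, or Gln/Glu followed by Gly/Ser/Thr in the relaxed form. This sketch assumes that only the two residues flanking the bond matter. That is a simplification, since the text gives no further specificity, so many hits would not be genuine processing sites. The sequence and function names are hypothetical.

```python
import re

# Candidate P1-P1' dipeptides taken from the text above.
POLIOVIRUS_SITE = re.compile(r"(?=QG)")        # Gln-|-Gly
RELAXED_SITE = re.compile(r"(?=[QE][GST])")    # Gln/Glu-|-Gly/Ser/Thr


def candidate_sites(seq: str, pattern: re.Pattern = RELAXED_SITE) -> list[int]:
    """0-based index of the P1 residue at each candidate scissile bond."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def split_at(seq: str, sites: list[int]) -> list[str]:
    """Cut after each P1 residue, giving the putative cleavage products."""
    cuts = [0] + [i + 1 for i in sites] + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


toy = "AAQGKKETLL"  # hypothetical polyprotein fragment
print(candidate_sites(toy, POLIOVIRUS_SITE))   # [2]
print(split_at(toy, candidate_sites(toy)))     # ['AAQ', 'GKKE', 'TLL']
```

    Keeping the strict and relaxed patterns separate mirrors the distinction drawn in the text between poliovirus and other picornaviruses.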

    \ 2369 IPR001177 \ Papillomavirus helicase E1 protein is an ATP-dependent DNA helicase required for initiation of viral DNA replication. It forms a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin, which contains binding sites for both proteins.\ 6302 IPR010511 \

    This entry comprises the N-terminal domain of membrane-bound lytic murein transglycosylase D PUBMED:10843862.

    \ 7561 IPR011706 \

    Copper is one of the most prevalent transition metals in living organisms and its biological function is intimately related to its redox properties. Since free copper is toxic, even at very low concentrations, its homeostasis in living organisms is tightly controlled by subtle molecular mechanisms. In eukaryotes, before being transported inside the cell via the high-affinity copper transporters of the CTR family, the copper (II) ion is reduced to copper (I). In blue copper proteins such as Cupredoxin, the copper (I) ion form is stabilised by a constrained His2Cys coordination environment.

    Multicopper oxidases PUBMED:2404764, PUBMED:1995346 are enzymes that possess three spectroscopically different copper centres. These centres are called type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). Structurally, these proteins contain a cupredoxin-like fold, a beta-sandwich consisting of 7 strands in 2 beta-sheets, arranged in a Greek-key beta-barrel PUBMED:11867755.

    \ 641 IPR003654 \ This 14 amino acid motif has been identified within the C-terminal region of several Paired-like homeodomain (HD)-containing proteins PUBMED:8944018, PUBMED:9466998. It was named the OAR domain after the initials of otp, aristaless and rax PUBMED:9096350. Although it has been proposed that this domain could be important for transactivation and be involved in protein-protein interactions or DNA binding PUBMED:9096350, PUBMED:9140395, its function is not yet known. Proteins known to contain an OAR domain include human RIEG, defects in which are the cause of Rieger syndrome PUBMED:8944018; human OG12X and murine Og12x, whose function is not yet known PUBMED:9466998; vertebrate Rax, which plays a role in the proliferation and/or differentiation of retinal cells PUBMED:9096350; Drosophila DRX, which appears to be important in brain development PUBMED:9482887; and human SHOX, encoded by the short stature homeobox-containing gene. Defects in, or lack of, this protein are the cause of short stature associated with Turner syndrome PUBMED:9140395.\ 4317 IPR007175 \ This family contains a ribonuclease P subunit of human and yeast. Other members of the family include the probable archaeal homologues. This subunit possibly binds the precursor tRNA PUBMED:11497433.\ 2536 IPR005186 \

    Although these proteins are known to be important for flagellar function, their exact role is unknown.

    \ 4646 IPR007635 \ All proteins containing this domain also contain a tandem repeat of CCCH zinc fingers. Tis11B, Tis11D and their homologues are thought to be regulatory proteins involved in the response to growth factors PUBMED:1695727. Tis11B is thought to be involved in calcium signalling-induced apoptosis in B cells PUBMED:8898945. The function of this N-terminal domain is unknown.\ 904 IPR000435 \ Tektin heteropolymers form unique protofilaments of flagellar microtubules PUBMED:8609631. The proteins are predicted to form extended rods composed of two alpha-helical segments (~180 residues long) capable of forming coiled coils, interrupted by non-helical linkers PUBMED:8609631. The two segments are similar in sequence, indicating a gene duplication event. Along each tektin rod, cysteine residues occur with a periodicity of ~8 nm, coincident with the axial repeat of tubulin dimers in microtubules PUBMED:8609631. It is proposed that the assembly of tektin heteropolymers produces filaments with repeats of 8, 16, 24, 32, 40, 48 and 96 nm, generating the basis for the complex spatial arrangements of axonemal components PUBMED:8609631.\ 2846 IPR007267 \

    Members of this family are predicted to be integral membrane proteins with three or four transmembrane spans. They are involved in the synthesis of cell surface polysaccharides. The GtrA family is a subset of this family. GtrA is predicted to be an integral membrane protein with 4 transmembrane spans. It is involved in O antigen modification by Shigella flexneri bacteriophage X (SfX), but does not determine the specificity of glucosylation. Its function remains unknown, but it may play a role in translocation of undecaprenyl phosphate linked glucose (UndP-Glc) across the cytoplasmic membrane PUBMED:10376843. Another member of this family is a dTDP-glucose-4-keto-6-deoxy-D-glucose reductase, which catalyses the conversion of dTDP-4-keto-6-deoxy-D-glucose to dTDP-D-fucose in the biosynthesis of the serotype-specific polysaccharide antigen of Actinobacillus actinomycetemcomitans Y4 (serotype b) PUBMED:10358040. This family also includes the teichoic acid glycosylation protein, GtcA, which is a serotype-specific protein in some Listeria innocua and Listeria monocytogenes strains. Its exact function is not known, but it is essential for decoration of cell wall teichoic acids with glucose and galactose PUBMED:11029438.

    \ 4471 IPR003066 \ The Salmonella typhimurium surface presentation of antigens N/invasion protein J gene (SpaN/InvJ) is one of 12 genes that form a cluster responsible for invasion properties. The gene product is required for entry of the bacterium into epithelial cells, and is thus considered to be a virulence factor PUBMED:8404849. Other Spa genes in the cluster are related to invasion (Inv) genes in similar Salmonella and Shigella species PUBMED:9068645, and to flagella biosynthesis genes in Helicobacter pylori PUBMED:10066464.

    Functional analysis of the SpaN/InvJ gene product has revealed the protein to have a molecular weight of 36.4 kDa PUBMED:7752894. It is required by the organism to gain access to mammalian epithelial cells, and InvJ- mutants fail to infect these cells. It has also been found that the inv-spa loci of Salmonella species encode a type III protein secretion system, essential to the bacterium's host cell invasion process PUBMED:8751894. Surprisingly, type III-secreted proteins lack the customary signal sequence characteristic of most bacterial secretory peptides PUBMED:7752894.

    \ 6718 IPR010688 \

    This family consists of Bacteriophage Mu Gp45 related proteins from both phages and bacteria. The function of this family is unknown although it has been suggested that family members may be involved in baseplate assembly.

    \ 2377 IPR001633 \ This domain is found in diverse bacterial signaling proteins. It is called EAL after its conserved residues. The EAL domain is a good candidate for a diguanylate phosphodiesterase function PUBMED:11557134. The domain contains many conserved acidic residues that could participate in metal binding and might form the phosphodiesterase active site. It often, but not always, occurs along with PAS and DUF9 domains, which are also found in many signalling proteins.\ 1568 IPR000547 \ Clathrin is the major protein of the polyhedral coat of coated pits and vesicles. Two different adaptor protein complexes link the clathrin lattice either to the plasma membrane or to the trans-Golgi network. Clathrin triskelions, composed of 3 heavy chains and 3 light chains, are the basic subunits of the clathrin coat. In the presence of light chains, hub assembly is influenced by both the pH and the concentration of calcium. The heavy chains each contain 7 repeats in the arm region. Other eukaryotic proteins, for example vacuolar membrane proteins, may contain one or two repeats.\ 2490 IPR003151 \ The FAT domain is present in the PIK-related kinases. Members of the family of PIK-related kinases may act as intracellular sensors that govern radial and horizontal pathways PUBMED:10782091.\ 7981 IPR012952 \

    This C-terminal domain is found in the BING4 family of nucleolar WD40 repeat proteins PUBMED:15112237.

    \ 652 IPR007204 \

    The Arp2/3 complex is a seven-protein assembly that is critical for actin nucleation and branching in cells. Arp2/3 nucleates new actin filaments while bound to existing filaments, thus creating a branched network PUBMED:15040784. The complex consists of Arp2, Arp3, p41, p34, p21, p20 and p16. Subunits p34 and p20 constitute the core of the structure, with the remaining subunits located peripherally PUBMED:11741539. This entry describes the p21 subunit. Proteins such as WASp and Scar1 may mediate receptor signalling through interactions with p21-Arc, resulting in activation of the Arp2/3 complex PUBMED:11162547.

    \ 2743 IPR006103 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    Glycoside hydrolase family 2 comprises enzymes with several known activities: beta-galactosidase, beta-mannosidase and beta-glucuronidase.

    These enzymes contain a conserved glutamic acid residue which, in Escherichia coli lacZ, has been shown to be the general acid/base catalyst in the active site of the enzyme PUBMED:1350782.

    Beta-galactosidase from E. coli has a TIM-barrel-like core surrounded by four other largely beta domains PUBMED:8008071.

    \ 6240 IPR009444 \

    This family consists of several bacterial TraD conjugal transfer proteins PUBMED:8763953.

    \ 3888 IPR007712 \ Members of this family are involved in plasmid stabilization. The exact molecular function of this protein is not known.\ 7901 IPR012606 \

    This domain is found at the N terminus of ribosomal S13 and S15 proteins. This domain is also identified as NUC021 PUBMED:15112237.

    \ 7409 IPR011493 \

    This domain is found in the IgA1-specific metalloendopeptidases, which attach to the cell wall peptidoglycan by an amide bond PUBMED:8926055. IgA1 protease selectively cleaves human IgA1 and is likely to be a pathogenicity factor in some pathogens, including Giardia spp. PUBMED:12841855. This domain is also found in various other contexts. It is named GLUG after its largely conserved G-L-any-G motif.
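A G-L-any-G motif like the one that gives GLUG its name can be located with a regular expression. This is only a sketch under stated assumptions: the function name and test sequences are invented, and a bare four-residue pattern will match far more often than real GLUG domains occur, since domain detection in practice relies on profiles of the whole region rather than a single motif.

```python
import re
from typing import List

# G-L-any-G; the zero-width lookahead lets overlapping matches be reported.
GLUG_MOTIF = re.compile(r"(?=GL[A-Z]G)")


def find_glug(seq: str) -> List[int]:
    """Return 0-based start positions of every G-L-x-G occurrence in seq."""
    return [m.start() for m in GLUG_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_glug("AGLSGTTGLKGLAG"))  # invented sequence with three motifs
    print(find_glug("GLLGLAG"))         # two matches sharing the Gly at index 3
```

Using a lookahead rather than a plain `GL.G` pattern matters when motifs share a residue: a consuming match would skip the second motif in "GLLGLAG".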

    The IgA1-specific metalloendopeptidases belong to MEROPS peptidase family M26, clan MA(E).

    \ 7367 IPR011430 \

    These eukaryotic proteins include DRIM (Down-Regulated In Metastasis), which is differentially expressed in metastatic and non-metastatic human breast carcinoma cells PUBMED:9673349. It is believed to be involved in processing of non-coding RNA PUBMED:12837249.

    \ 7217 IPR009974 \

    This family consists of several Orthopoxvirus specific proteins of around 100 residues in length. The function of this family is unknown.

    \ 1071 IPR003675 \

    This family consists of various hypothetical protein sequences for which the function is unknown. One of the proteins is an abortive infection protein that confers resistance to the bacteriophage Phi 712 PUBMED:8795193. AbiG is an abortive infection (Abi) mechanism encoded by the conjugative plasmid pCI750, originally isolated from Lactococcus lactis subsp. cremoris UC653. The resistance mechanism acts at neither the phage adsorption nor the phage DNA restriction level PUBMED:8795193.

    Also in this family is a series of bacteriocin-like peptides PlnP, PlnI, PlnT, PlnP and PlnU from Lactobacillus plantarum C11. Lactobacillus plantarum C11 secretes a small cationic peptide, plantaricin A, that serves as an induction signal for bacteriocin production as well as transcription of plnABCD. The plnABCD operon encodes the plantaricin A precursor (PlnA) itself and determinants (PlnBCD) for a signal transducing pathway PUBMED:8755874.

    \ 3551 IPR007252 \

    Nup84p forms a complex with five proteins, including Nup120p, Nup85p, Sec13p, and a Sec13p homolog. This Nup84p complex in conjunction with Sec13-type proteins is required for correct nuclear pore biogenesis PUBMED:8565072.

    \ 7414 IPR011447 \

    This is a family of proteins identified in Rhodopirellula baltica.

    \ 3752 IPR000755 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
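The difference between the common HEXXH motif and the stricter abXHEbbHbc definition above can be made concrete with two regular expressions. This is a minimal sketch: the exact residue sets used for 'b' (uncharged) and 'c' (hydrophobic) are assumptions chosen for illustration rather than a published definition, 'a' is restricted to Val/Thr, proline is excluded from every position of the stringent pattern because the text says it is never found in the site, and the toy sequence is invented.

```python
import re
from typing import List

# Residue classes for abXHEbbHbc. The memberships of 'b' and 'c' are
# assumptions for illustration; Pro is left out of every class because the
# text states that proline is never found in this site.
A_CLASS = "VT"                   # 'a': most often Val or Thr
B_CLASS = "ACFGILMNQSTVWY"       # 'b': assumed uncharged = not D/E/K/R/H
C_CLASS = "ACFILMVWY"            # 'c': assumed hydrophobic set
X_CLASS = "ACDEFGHIKLMNQRSTVWY"  # 'X': any residue except Pro

HEXXH = re.compile(r"(?=HE[A-Z]{2}H)")
STRINGENT = re.compile(
    rf"(?=[{A_CLASS}][{B_CLASS}][{X_CLASS}]HE[{B_CLASS}]{{2}}H[{B_CLASS}][{C_CLASS}])"
)


def hexxh_sites(seq: str) -> List[int]:
    """0-based index of the first His of each HEXXH match."""
    return [m.start() for m in HEXXH.finditer(seq.upper())]


def stringent_sites(seq: str) -> List[int]:
    """0-based index of the first His of each abXHEbbHbc match."""
    # The first His sits three residues after the 'a' position.
    return [m.start() + 3 for m in STRINGENT.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "GGVSAHELTHAVGGDDKHEKRHDD"  # invented sequence with two HEXXH motifs
    print(hexxh_sites(toy))      # both motifs
    print(stringent_sites(toy))  # only the first passes the stricter pattern
```

Reporting the position of the first His in both functions keeps the two result lists directly comparable.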

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    This group of metallopeptidases belong to MEROPS peptidase family M15 (clan MD), subfamily M15D (vanX D-Ala-D-Ala dipeptidase).

    The D-alanyl-D-alanine dipeptidase enzyme from Enterococcus faecalis is also known as the vancomycin resistance protein VanX, and hydrolyses D-Ala-D-Ala. Its catalytic efficiency for hydrolysis of D-Ala-D-Ala is 250-fold higher than for D-Ala-D-lactate. The latter therefore remains intact for subsequent incorporation into peptidoglycan precursors that terminate in the dipeptide D-Ala-D-lactate rather than D-Ala-D-Ala, thereby preventing vancomycin from binding. The enzyme requires a metal cofactor, and is induced by vancomycin through regulation by VanS and VanR.

    \ 4608 IPR003923 \

    Transcription initiation factor TFIID is a multimeric protein complex that plays a central role in mediating promoter responses to various activators and repressors. The complex includes TATA binding protein (TBP) and various TBP-associated factors (TAFs). This protein is a bona fide RNA polymerase II-specific TATA-binding protein-associated factor (TAF) and is essential for viability PUBMED:8662725.

    TFIID acts to nucleate the transcription complex, recruiting the rest of the factors through a direct interaction with TFIIB. The TBP subunit of TFIID is sufficient for TATA-element binding and TFIIB interaction, and can support basal transcription. The protein belongs to the TAF2H family.

    \ 4168 IPR005822 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    Ribosomal protein L13 is one of the proteins from the large ribosomal subunit PUBMED:8119894. In Escherichia coli, L13 is known to be one of the early assembly proteins of the 50S ribosomal subunit.

    \ 7379 IPR011433 \

    This family is found in a group of small bacterial proteins. Its function is not known.

    \ 5368 IPR008838 \ This family consists of several variable surface proteins from Brachyspira hyodysenteriae.\ 4769 IPR001084 \ Microtubules consist of tubulins as well as a group of additional proteins collectively known as the microtubule-associated proteins (MAPs). MAPs have been classified into two classes: high molecular weight MAPs and Tau protein. The Tau proteins promote microtubule assembly and stabilize microtubules.

    The C-terminal region of these proteins contains three or four tandem repeats of a conserved domain of about thirty amino acid residues which is implicated in tubulin-binding and which seems to have a stiffening effect on microtubules.

    \ 815 IPR000228 \ RNA cyclases are a family of RNA-modifying enzymes that are conserved in eukaryotes, bacteria and archaea. RNA 3'-terminal phosphate cyclase PUBMED:9184239, PUBMED:2199762 catalyses the conversion of 3'-phosphate to a 2',3'-cyclic phosphodiester at the end of RNA. These enzymes might be responsible for production of the cyclic phosphate RNA ends that are known to be required by many RNA ligases in both prokaryotes and eukaryotes.

    RNA cyclase is a protein of 36 to 42 kDa. The best conserved region is a glycine-rich stretch of residues located in the central part of the sequence, which is reminiscent of various ATP, GTP or AMP glycine-rich loops.

    The crystal structure of RNA 3'-terminal phosphate cyclase shows that each molecule consists of two domains. The larger domain contains three repeats of a folding unit comprising two parallel alpha helices and a four-stranded beta sheet; this fold was previously identified in translation initiation factor 3 (IF3). The large domain is similar to one of the two domains of 5-enolpyruvylshikimate-3-phosphate synthase and UDP-N-acetylglucosamine enolpyruvyl transferase. The smaller domain uses a similar secondary structure element with different topology, observed in many other proteins such as thioredoxin PUBMED:10673421. Although the active site of this enzyme could not be unambiguously assigned, it can be mapped to a region surrounding His309, an adenylate acceptor, in which a number of amino acids are highly conserved in the enzyme from different sources PUBMED:10673421.

    \ 6043 IPR010408 \

    This family consists of several infectious salmon anaemia virus haemagglutinin proteins. Infectious salmon anaemia virus (ISAV), an orthomyxovirus-like virus, is an important fish pathogen in marine aquaculture PUBMED:11714961.

    \ 2539 IPR007824 \ This family consists of several eukaryotic paraflagellar rod component proteins. The eukaryotic flagellum represents one of the most complex macromolecular structures found in any organism and contains more than 250 proteins PUBMED:11112698. In addition to its locomotive role, the flagellum is probably involved in nutrient uptake since receptors for host low-density lipoproteins are localised on the flagellar membrane as well as on the flagellar pocket membrane PUBMED:11163437.\ 2336 IPR007842 \

    The HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domain is a region of 110 residues found in the C-terminus of sacsin, a chaperonin implicated in an early-onset neurodegenerative disease in humans, and in many bacterial and archaebacterial proteins. There are three classes of proteins with HEPN domains:

  • Single-domain HEPN proteins found in many bacteria.
  • Two-domain proteins with N-terminal nucleotidyltransferase (NT) and C-terminal HEPN domains. This N-terminal NT domain belongs to a large family of NTs, which includes several classes of enzymes that are responsible for some types of bacterial resistance to aminoglycosides. These enzymes deactivate various antibiotics by transferring a nucleotidyl group to the drug.
  • A multidomain sacsin protein in genomes of fish and mammals. The HEPN domain is located at the C-terminus of the protein, directly after the DnaJ domain.

    The crystal structure of the HEPN domain from the TM0613 protein of Thermotoga maritima indicates that it is structurally similar to the C-terminal all-alpha-helical domain of kanamycin nucleotidyltransferases (KNTases). It is composed of five alpha helices, three of which form an up-and-down helical bundle, with a pair of short helices on the side. The distant structural similarity suggests that the HEPN domain might be involved in nucleotide binding PUBMED:12765831.

    \ 4798 IPR004290 \ Members of this family are functionally uncharacterised proteins from herpesviruses.\ 6178 IPR009410 \

    This family consists of several plant-specific allene oxide cyclase proteins. The allene oxide cyclase (AOC)-catalysed step in jasmonate (JA) biosynthesis is important in the wound response of tomato PUBMED:12581315.

    \ 4124 IPR007816 \ This family includes both ResB and cytochrome c biogenesis proteins. Mutations in ResB indicate that it is essential for growth PUBMED:10844653. ResB is predicted to be a transmembrane protein PUBMED:10844653.\ 5698 IPR008729 \ This family consists of several bacterial phenolic acid decarboxylase proteins. Phenolic acids, also called substituted cinnamic acids, are important lignin-related aromatic acids and natural constituents of plant cell walls. These acids (particularly ferulic, p-coumaric, and caffeic acids) bind the complex lignin polymer to the hemicellulose and cellulose in plants. The phenolic acid decarboxylase (PAD) gene (pad) is transcriptionally regulated by p-coumaric, ferulic, or caffeic acid; these three acids are the three substrates of PAD PUBMED:9546183.\ 5286 IPR008842 \ Siva binds to the CD27 cytoplasmic tail. It has a DD homology region, a box-B-like ring finger, and a zinc finger-like domain. Overexpression of Siva in various cell lines induces apoptosis, suggesting an important role for Siva in the CD27-transduced apoptotic pathway PUBMED:9177220. Siva-1 binds to and inhibits BCL-X(L)-mediated protection against UV radiation-induced apoptosis. Indeed, the unique amphipathic helical region (SAH) present in Siva-1 is required for its binding to BCL-X(L) and for sensitising cells to UV radiation. Natural complexes of Siva-1/BCL-X(L) are detected in HUT78 cells and murine thymocytes, suggesting a potential role for Siva-1 in regulating T cell homeostasis PUBMED:12011449. This family contains both Siva-1 and the shorter Siva-2, which lacks the sequence coded by exon 2. It has been suggested that Siva-2 could regulate the function of Siva-1 PUBMED:10597319.\ 4460 IPR001189 \

    Superoxide dismutases (SODs) catalyse the conversion of superoxide radicals to molecular oxygen and hydrogen peroxide. Their function is to destroy the radicals that are normally produced within cells and are toxic to biological systems. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one PUBMED:3315461, PUBMED:3345848, PUBMED:1556751. This family includes both single metal-binding SODs and cambialistic SODs, which can bind either Mn or Fe. Fe/MnSODs are ubiquitous enzymes that are responsible for the majority of SOD activity in prokaryotes, fungi, blue-green algae and mitochondria. Fe/MnSODs are found as homodimers or homotetramers.
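For reference, the overall dismutation and the two-step (ping-pong) cycle in which the metal centre of a Mn/Fe SOD alternates between oxidation states can be written as below, with M standing for Mn or Fe. This is the standard textbook scheme, added here for clarity; it is not a result taken from the cited references.

```latex
\begin{aligned}
\mathrm{M^{3+}} + \mathrm{O_2^{\bullet-}} &\longrightarrow \mathrm{M^{2+}} + \mathrm{O_2} \\
\mathrm{M^{2+}} + \mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^{+}} &\longrightarrow \mathrm{M^{3+}} + \mathrm{H_2O_2} \\
\text{net:}\quad 2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^{+}} &\longrightarrow \mathrm{O_2} + \mathrm{H_2O_2}
\end{aligned}
```

Adding the two half-reactions cancels the metal ions on both sides, leaving the net reaction; charge and atoms balance in each line.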

    The structure of Fe/MnSODs can be divided into two domains, an alpha N-terminal domain and an alpha/beta C-terminal domain, connected by a loop. The structure of the N-terminal domain consists of two helices in an antiparallel hairpin, with a left-handed twist PUBMED:9537987. The structure of the C-terminal domain is of the alpha/beta type, and consists of a three-stranded antiparallel beta-sheet in the order 213, along with four helices in the arrangement alpha/beta(2)/alpha/beta/alpha(2) PUBMED:9931259.

    \ 7513 IPR011629 \ This group of proteins contains P47K, a Pseudomonas chlororaphis protein needed for nitrile hydratase expression, and the cobW gene product, which may be involved in cobalamin biosynthesis in Pseudomonas denitrificans PUBMED:1655697. This entry represents the C-terminal domain.\ 198 IPR011545 \

    Members of this family include the DEAD and DEAH box helicases. Helicases are involved in unwinding nucleic acids. The DEAD box helicases are involved in various aspects of RNA metabolism, including nuclear transcription, pre-mRNA splicing, ribosome biogenesis, nucleocytoplasmic transport, translation, RNA decay and organellar gene expression.

    \ 5856 IPR010315 \

    This family consists of bacterial proteins of unknown function, which are hydrolase-like.

    \ 6227 IPR009112 \

    GTP cyclohydrolase I feedback regulatory protein (GFRP) in mammals helps regulate the biosynthesis of tetrahydrobiopterin through the feedback inhibition of the rate-limiting enzyme GTP cyclohydrolase I (GTPCHI). Tetrahydrobiopterin is the cofactor required for the hydroxylation of aromatic amino acids. The crystal structure of GFRP reveals that the protein forms a homopentamer PUBMED:11580249. In the presence of phenylalanine, the stimulatory complex consists of a GTPCHI decamer sandwiched by two GFRP pentamers, which is thought to enhance GTPCHI activity by locking the enzyme in the active state PUBMED:11818540. The structure of GFRP consists of two alpha/beta layers arranged beta(2)-alpha-beta(2)-alpha-beta(2), with antiparallel beta-sheets in the order 342165.

    \ 821 IPR001995 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (proteins which are evolutionarily related), and further sub-divided into families, largely on the basis of their tertiary structure.

    This group of aspartic peptidases belong to the MEROPS peptidase family A2 (retropepsin family, clan AA), subfamily A2A. The family includes the single domain aspartic proteases from retroviruses, retrotransposons, and badnaviruses (plant dsDNA viruses).

    Retroviral aspartyl protease is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins.

    \ 4565 IPR002040 \ Tachykinins PUBMED:3284438, PUBMED:1969374, PUBMED:1324401 are a group of biologically active peptides which excite neurons, evoke behavioral responses, are potent vasodilators and contract (directly or indirectly) many smooth muscles. This family includes many other peptides. Tachykinins, like most other active peptides, are synthesized as larger protein precursors that are enzymatically converted to their mature forms. Tachykinins are from ten to twelve residues long.\ 307 IPR006903 \ This entry represents a conserved region found in a number of uncharacterised eukaryotic proteins.\ 5281 IPR008911 \ This family consists of several scorpion toxins which act by blocking small-conductance calcium-activated potassium ion channels in their victims.\ 231 IPR004401 \ The function of this protein is unknown. It is restricted to Bacteria and the plant Arabidopsis. The plant form contains an additional N-terminal region that may serve as a transit peptide and shows a close relationship to the cyanobacterial member, suggesting that it is a chloroplast protein. Members of this family are found in a single copy per bacterial genome, but are broadly distributed. A member is present even in the minimal gene complement of Mycoplasma genitalium.\ 6091 IPR010429 \

    This family consists of several bacterial FdrA proteins. FdrA is known to play a role in the suppression of dominant negative FtsH proteins PUBMED:7500942.

    \ 51 IPR006190 \

    Antifreeze proteins (AFPs) are defined by their ability to bind ice and prevent it from growing. In this way they function in both freeze-resistance and freeze-tolerance strategies of organisms that live at sub-zero temperatures and require protection from ice growth. In fish, five AFP types have been described that are remarkably diverse in their 3D structures. They have completely dissimilar folds and no sequence homology. Type III AFPs found in eelpouts are 65-residue proteins with a compact globular fold formed from short beta strands, which presents a flat ice-binding surface. These proteins are homologous to the C-terminal region of mammalian and prokaryotic sialic acid synthase (SAS; gene neuB), which has been called the AFP-like domain PUBMED:12171656. The similarity is greatest in the protein core and the flat ice-binding region. SAS is involved in the condensation of phosphoenolpyruvate with N-acetylmannosamine derivatives to generate N-acetylneuraminic acid, an intermediate used for the sialylation of glycoconjugates. The function of the AFP-like domain, which is a beta-clip fold PUBMED:15146494, in SAS is not known, but it has been proposed that it could be involved in sugar binding.

    \ 7774 IPR012907 \

    Penicillin-binding protein 5 expressed by Escherichia coli functions as a D-alanyl-D-alanine carboxypeptidase. It is composed of two domains that are oriented at approximately right angles to each other. The N-terminal domain is the catalytic domain. The C-terminal domain, this entry, is organised into a sandwich of two anti-parallel beta-sheets and has a relatively hydrophobic surface compared with the N-terminal domain. Its precise function is unknown; it may mediate interactions with other cell wall-synthesising enzymes, thus allowing the protein to be recruited to areas of active cell wall synthesis. It may also function as a linker domain that positions the active site in the catalytic domain closer to the peptidoglycan layer, allowing it to interact with cell wall peptides PUBMED:10967102.

    \ 218 IPR007693 \

    The hexameric helicase DnaB unwinds the DNA duplex at the Escherichia coli chromosome replication fork. Although the mechanism by which DnaB both couples ATP hydrolysis to translocation along DNA and denatures the duplex is unknown, a change in the quaternary structure of the protein involving dimerization of the N-terminal domain has been observed and may occur during the enzymatic cycle. This N-terminal domain is required both for interaction with other proteins in the primosome and for DnaB helicase activity.

    \ 1372 IPR006862 \ This entry represents the N-terminal region of acyl-CoA thioester hydrolase and bile acid-CoA:amino acid N-acyltransferase (BAAT) PUBMED:11673457. This region is not thought to contain the active site of either enzyme. Thioesterase isoforms have been identified in peroxisomes, cytoplasm and mitochondria, where they are thought to have distinct functions in lipid metabolism PUBMED:10567408. For example, in peroxisomes, the hydrolase acts on bile acid-CoA esters PUBMED:11673457.\ 4992 IPR006879 \ This is a family of YdjC-like proteins. It is possibly involved in the cleavage of cellobiose-phosphate PUBMED:8407820.\ 7824 IPR013120 \

    This family represents the C-terminal region of the male sterility protein from Arabidopsis and Drosophila. A sequence-related jojoba acyl-CoA reductase is also included.

    \ 6380 IPR010547 \

    This family consists of several plant-specific mitochondrial import receptor subunit TOM20 (translocase of outer membrane 20 kDa subunit) proteins. Most mitochondrial proteins are encoded by the nuclear genome and are synthesised in the cytosol. TOM20 is a general import receptor that binds to mitochondrial presequences in the early steps of protein import into mitochondria PUBMED:12691756.

    \ 4376 IPR004235 \ Scytalone dehydratase is a member of the group of enzymes involved in fungal melanin biosynthesis. It was first identified in a phytopathogenic fungus, Pyricularia oryzae, which causes rice blast disease. Scytalone dehydratase is a molecular target of inhibitor design efforts aimed at protecting rice plants from fungal disease PUBMED:9922139.\ 3344 IPR007737 \ Mga is a DNA-binding protein that activates the expression of several important virulence genes in group A streptococcus in response to changing environmental conditions PUBMED:11952907. The family also contains VirR-like proteins, which match only at the C terminus of the alignment.\ 1407 IPR005168 \

    Bunyavirus has three genomic segments: small (S), middle-sized (M), and large (L). The S segment encodes the nucleocapsid and a non-structural protein. The M segment codes for two glycoproteins, G1 and G2, and another non-structural protein (NSm). The L segment codes for an RNA polymerase. This family contains the G2 glycoprotein which interacts with the G1 glycoprotein PUBMED:7645217.

    \ 3013 IPR006120 \

    Site-specific recombination plays an important role in DNA rearrangement in prokaryotic organisms. Two types of site-specific recombination are known to occur:

    \
    1. Recombination between inverted repeats resulting in the reversal of a DNA segment.
    2. Recombination between repeat sequences on two DNA molecules resulting in their cointegration, or between repeats on one DNA molecule resulting in the excision of a DNA fragment.
    \

    Site-specific recombination is characterized by a strand exchange mechanism that requires no DNA synthesis or high energy cofactor; the phosphodiester bond energy is conserved in a phospho-protein linkage during strand cleavage and re-ligation.

    \

    Two unrelated families of recombinases are currently known PUBMED:3011407. The first, called the 'phage integrase' family, groups a number of bacterial phage and yeast plasmid enzymes. The second PUBMED:2896291, called the 'resolvase' family, groups enzymes that share the following structural characteristics: an N-terminal catalytic and dimerization domain that contains a conserved serine residue involved in the transient covalent attachment to DNA, and a C-terminal helix-turn-helix DNA-binding domain.

    \ 1303 IPR011606 \

    Some proteins in this entry are encoded by a gene that is part of the azl operon, which is involved in branched-chain amino acid transport PUBMED:9287000. Overexpression of this gene results in resistance to the leucine analogue 4-azaleucine. The protein has five potential transmembrane motifs.

    \ 2270 IPR006866 \ This domain represents the N-terminal region of several plant proteins of unknown function.\ 264 IPR005176 \

    Members of this family contain a basic helix-loop-helix leucine zipper motif PUBMED:10831844.

    \ 4176 IPR005484 \

    This family includes L18 from bacteria and L5 from eukaryotes. The ribosomal 5S RNA is the only known rRNA species to bind a ribosomal protein before its assembly into the ribosomal subunits PUBMED:8474444. In eukaryotes, the 5S rRNA molecule binds one protein species, a 34-kDa protein which has been implicated in the intracellular transport of 5S rRNA, while in bacteria it binds two or three different protein species PUBMED:8219074.

    \ 1672 IPR006893 \

    This family of proteins contains p23 from citrus tristeza virus (CTV), a member of the Closteroviridae. CTV produces more positive than negative RNA strands, and p23 controls this asymmetrical RNA accumulation. Amino acids 42-180 are essential for function and are thought to contain RNA-binding and zinc finger domains PUBMED:11752137.

    \ 2616 IPR007082 \ In Escherichia coli, nine gene products are known to be essential for assembly of the division septum. One of these, FtsL, is a bitopic membrane protein whose precise function is not understood. It has been proposed that FtsL interacts with the DivIC protein PUBMED:10844672; however, this interaction may be indirect PUBMED:11994149.\ 2474 IPR005067 \

    Fatty acid desaturases are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids.

    \ \

    There appear to be two distinct families of fatty acid desaturases, which do not seem to be evolutionarily related.

    \ \

    Family 1 is composed of:

    \ \

    - Stearoyl-CoA desaturase (SCD) PUBMED:2570068.

    \ \ \

    Family 2 is composed of:

    \ \

    - Plant stearoyl-acyl-carrier-protein desaturase PUBMED:2006187. These enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.

    \

    - Cyanobacterial DesA PUBMED:2118597, an enzyme that can introduce a second cis double bond at the delta(12) position of fatty acids bound to membrane glycerolipids. DesA is involved in chilling tolerance, since the phase transition temperature of cellular membrane lipids depends on the degree of unsaturation of their fatty acids.

    \ 3031 IPR002558 \ I/LWEQ domains bind to actin. It has been shown that the I/LWEQ domains from mouse talin and yeast Sla2p interact with F-actin PUBMED:9159132. The domain has four conserved blocks, and its name is derived from the initial conserved amino acid of each of the four blocks PUBMED:9159132. I/LWEQ domains can be placed into four major groups based on sequence similarity:\
    1. Metazoan talin.
    2. Dictyostelium discoideum TalA/TalB and SLA110.
    3. Metazoan Hip1p.
    4. Saccharomyces cerevisiae Sla2p.
    \ 380 IPR002563 \

    This entry represents the FMN-binding domain found in NAD(P)H-flavin oxidoreductases (flavin reductases), a class of enzymes capable of producing reduced flavin for bacterial bioluminescence and other biological processes. This domain is also found in various other oxidoreductase and monooxygenase enzymes PUBMED:12829278, PUBMED:15461461, PUBMED:11017201.

    \

    This domain consists of a beta-barrel with Greek key topology and is related to the ferredoxin reductase-like FAD-binding domain. The flavin reductases have a dimerisation mode different from that found in the PNP oxidase-like family, which also carries an FMN-binding domain with a similar topology.

    \ \ 4257 IPR000554 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities PUBMED:8371989. One of these families consists of Xenopus S8, and mammalian, insect and yeast S7. These proteins have about 200 amino acids.

    \ 3565 IPR006757 \ Opioid peptides act as growth factors in neural and non-neural cells and tissues, in addition to serving in neurotransmission/neuromodulation in the nervous system. The opioid growth factor receptor is an integral membrane protein associated with the nucleus. This conserved region is situated at the N-terminus of the member proteins, with a series of imperfect repeats lying immediately C-terminal to it PUBMED:11890982.\ 4115 IPR007662 \ Protein sigmaC in its native state was shown to be a homotrimer. It was demonstrated that the sigmaC subunits are not covalently bound via disulphide linkages and that the formation of an intrachain disulphide bond between the two cysteine residues of the sigmaC polypeptide may have a negative effect on oligomer stability. The susceptibility of the trimer to pH, temperature, ionic strength, chemical denaturants and detergents indicates that hydrophobic interactions contribute much more to oligomer stability than do ionic interactions and hydrogen bonding PUBMED:11752709.\ 1300 IPR005166 \

    A family of avian-specific viral glycoproteins that form a receptor-binding gp85 polypeptide linked through a disulphide bond to a membrane-spanning gp37 spike. Gp85 confers a high degree of subgroup specificity for interaction with distinct cell receptors PUBMED:3009025.

    \ 630 IPR007207 \

    The Ccr4-Not complex (Not1, Not2, Not3, Not4 and Not5) is a global regulator of transcription that affects genes positively and negatively and is thought to regulate transcription factor TFIID PUBMED:7926748. This domain is the N-terminal region of the Not proteins.

    \ 5802 IPR010287 \

    This group consists of several hypothetical bacterial proteins of unknown function.

    \ 2354 IPR002793 \

    The function of these prokaryotic proteins is unknown.

    \ 504 IPR006849 \ Members of this family are components of Elongator, a multi-subunit component of a novel RNA polymerase II holoenzyme for transcriptional elongation PUBMED:10024884.\ 4055 IPR005323 \

    This domain is found in pullulanases, which are carbohydrate-debranching proteins. It occurs either N- or C-terminal to the alpha-amylase active site region. The domain contains several conserved aromatic residues that are suggestive of a carbohydrate-binding function.

    \ 5502 IPR008534 \

    This family includes proteins that are about 200 amino acids in length. The proteins are all from baculoviruses. This family includes ORF107 from Orgyia pseudotsugata multicapsid polyhedrosis virus (OpMNPV) and a variety of other numbered ORF proteins, such as ORF52 and ORF140, from other baculoviruses. The function of these proteins is unknown.

    \ 105 IPR004826 \

    There are several different types of Maf transcription factors with different roles in the cell. MafG and MafH are small Mafs which lack a putative transactivation domain. They behave as transcriptional repressors when they dimerize among themselves. However, they also serve as transcriptional activators by dimerizing with other (usually larger) basic-zipper proteins and recruiting them to specific DNA-binding sites. Maf transcription factors contain a conserved basic region leucine zipper (bZIP) domain, which mediates their dimerization and DNA-binding properties. Neural retina-specific leucine zipper proteins also belong to this family. Together with the basic region, the Maf extended homology region (EHR), conserved only within the Maf family, defines the DNA binding specific to Mafs. This structure enables Mafs to make a broader area of contact with DNA and to recognize longer DNA sequences. In particular, the two residues at the beginning of helix H2 are positioned to recognize the flanking region PUBMED:11875518. Small Maf proteins heterodimerize with Fos and may act as competitive repressors of the NF-E2 transcription factor.

    In mouse, Maf1 may play an early role in axial patterning. Defects in these proteins are a cause of autosomal dominant retinitis pigmentosa.

    \ 3206 IPR004961 \ The proteobacterial lipase chaperone is a lipase helper protein which seems to assist in the folding of extracellular lipase during its passage through the periplasm.\ 4774 IPR004138 \ This family represents herpes virus protein U79 and cytomegalovirus early phosphoprotein P34 (UL112).\ 3207 IPR001087 \ A variety of lipolytic enzymes with serine as part of the active site have been identified PUBMED:7610479. Members of this family include Aeromonas hydrophila lipase, Vibrio mimicus arylesterase, Vibrio parahaemolyticus thermolabile hemolysin, rabbit phospholipase (AdRab-B) and Brassica napus anther-specific proline-rich protein.\ 7614 IPR012426 \

    This family consists of several hypothetical archaeal proteins of unknown function.

    \ 7806 IPR004595 \

    All proteins in this family for which functions are known are components of the TFIIH complex, which is involved in the initiation of transcription and nucleotide excision repair. It includes the yeast transcription factor Ssl1 (Suppressor of stem-loop protein 1), which is essential for translation initiation and affects UV resistance.

    \ \

    The C-terminal region is essential for transcription activity. This region binds three zinc atoms through two independent domains. The first contains a C4 zinc finger motif, whereas the second is characterised by a CX(2)CX(2-4)FCADCD motif. The solution structure of the second C-terminal domain revealed homology with the regulatory domain of protein kinase C PUBMED:.
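    The second zinc-binding motif, CX(2)CX(2-4)FCADCD, can be read as a regular expression in which X is any residue and X(2-4) is a run of two to four residues. The Python snippet below is a minimal sketch under that reading, assuming one-letter residue codes; the function name `find_zinc_motif` and the test sequence are hypothetical, and a single regex is no substitute for a curated pattern scan.

```python
import re

# CX(2)CX(2-4)FCADCD read as a regex: Cys, any two residues, Cys,
# any two to four residues, then the literal stretch F-C-A-D-C-D.
ZINC_MOTIF = re.compile(r"C.{2}C.{2,4}FCADCD")


def find_zinc_motif(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit.

    find_zinc_motif is a hypothetical helper used only for illustration.
    """
    return [(m.start(), m.group()) for m in ZINC_MOTIF.finditer(seq.upper())]


# Hypothetical toy sequence carrying one copy of the motif.
print(find_zinc_motif("MKCAACGSTFCADCDLL"))
```

    Input is upper-cased first so that lower-case sequence strings are matched as well.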

    \ 4747 IPR002310 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet formation, flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
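    The class assignments in the paragraph above amount to a lookup table. The minimal Python sketch below encodes exactly those two lists, using three-letter amino acid codes (an assumption of this sketch), together with the preferred attachment hydroxyl for each class; the name `synthetase_class` is hypothetical, and the table reproduces the lists as given and nothing more.

```python
# Class assignments exactly as listed in the text (three-letter codes).
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# tRNA hydroxyl to which the aminoacyl group is preferentially coupled.
ATTACHMENT_SITE = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for the synthetase specific to an amino acid.

    synthetase_class is a hypothetical helper used only for illustration.
    """
    aa = amino_acid.strip().capitalize()  # "gly" / "GLY" -> "Gly"
    if aa in CLASS_I:
        return "I"
    if aa in CLASS_II:
        return "II"
    raise ValueError(f"no class assignment listed for {amino_acid!r}")


for aa in ("Gly", "Tyr"):
    cls = synthetase_class(aa)
    print(aa, cls, ATTACHMENT_SITE[cls])
```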

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    In eubacteria, glycyl-tRNA synthetase is an alpha2/beta2 tetramer composed of two different subunits PUBMED:6309809, PUBMED:7962006, PUBMED:7665503. In some eubacteria, and in archaea and eukaryota, glycyl-tRNA synthetase is an alpha2 dimer. It belongs to class IIc and is one of the most complex synthetases. What is most interesting is the lack of similarity between the two types: divergence at the sequence level is so great that it is impossible to infer descent from common genes. The alpha and beta subunits also lack significant sequence similarity. However, they are translated from a single mRNA PUBMED:6309809, and a single-chain glycyl-tRNA synthetase from Chlamydia trachomatis has been found to have significant similarity with both domains, suggesting divergence from a single polypeptide chain PUBMED:7665503.

    \ 7988 IPR012974 \

    This N-terminal domain is found in RNA-binding proteins of the NOP5 family PUBMED:15112237.

    \ 6158 IPR009400 \

    Nucleotide excision repair is a major pathway for repairing UV light-induced DNA damage in most organisms. REX1 is required for DNA repair in Chlamydomonas reinhardtii PUBMED:12697762, and has homologues in other eukaryotes.

    \ 3937 IPR004967 \ This family includes Poxvirus C7 and F8A proteins.\ 4895 IPR003360 \ This is the US22 protein family of hypothetical proteins from herpes virus. The namesake of this family, US22, is an early nuclear protein that is secreted from cells PUBMED:1321206. The US22 family may have a role in virus replication and pathogenesis PUBMED:10405367.\ 5675 IPR008415 \ This family consists of nucleopolyhedrovirus late expression factor 3 (LEF-3) sequences, which are known to be ssDNA-binding proteins PUBMED:10073712. Alkaline nuclease (AN) and LEF-3 may participate in homologous recombination of the baculovirus genome in a manner similar to that of the exonuclease (Redalpha) and DNA-binding protein (Redbeta) of the Red-mediated homologous recombination system of bacteriophage lambda PUBMED:12551981.\ 2135 IPR007426 \

    This sequence is usually found in association with and , and occasionally also with in integral membrane proteins. Together, this entry, and make up the C-terminal portion of Staphylococcus aureus FmtC/MprF, which is involved in resistance to defensins by the lysinylation of membrane phospholipids PUBMED:11342591. This domain along with and also occurs adjacent to the OB-fold nucleic acid binding domain and tRNA synthetase class II in lysyl-tRNA synthetases.

    \ 7832 IPR012536 \

    This is a family of unique short (US) cytoplasmic glycoproteins which are expressed in cytomegalovirus PUBMED:11992003.

    \ 4871 IPR005362 \

    This family of uncharacterised proteins is found only in Treponema pallidum. Its members contain a putative signal peptide and so may be secreted proteins.

    \ 2716 IPR007370 \ This is a group of bacterial glutamate-cysteine ligases that carry out the first step of the glutathione biosynthesis pathway.\ 7241 IPR010880 \

    This family consists of several Betaherpesvirus immediate-early glycoprotein UL37 sequences. The human cytomegalovirus (HCMV) UL37 immediate-early regulatory protein is a type I integral membrane N-glycoprotein which traffics through the ER and the Golgi network PUBMED:8794367.

    \ 2204 IPR007536 \ This is a protein of unknown function found in proteobacteria. In Salmonella typhimurium, expression of this protein is regulated by heat shock PUBMED:10629202.\ 7418 IPR011450 \

    This is a family of proteins of unknown function.

    \ 3345 IPR007885 \ This family contains several Mycoplasma MgpC-like proteins.\ 736 IPR007280 \ This domain is normally found at the C terminus of secreted archaeal and bacterial peptidases, the majority of which belong to MEROPS peptidase families M4 (thermolysin), M9A (microbial collagenase) and S8 (subtilisin).\ 2091 IPR007361 \ This is a family of uncharacterised proteins.\ 5724 IPR008703 \ This family consists of several bacterial Na+-translocating NADH-quinone reductase subunit A (NQRA) proteins. The Na+-translocating NADH:ubiquinone oxidoreductase (Na+-NQR) generates an electrochemical Na+ potential driven by aerobic respiration PUBMED:10587447.\ 1732 IPR004668 \ These proteins are members of the C4-Dicarboxylate Uptake (Dcu) family. Most proteins in this family have 12 GES predicted transmembrane regions; however, one member has 10 experimentally determined transmembrane regions with both the N- and C-termini localized to the periplasm. The two Escherichia coli proteins, DcuA and DcuB, transport aspartate, malate, fumarate and succinate, and function as antiporters with any two of these substrates. Since DcuA is encoded in an operon with the gene for aspartase, and DcuB is encoded in an operon with the gene for fumarase, their physiological functions may be to catalyze aspartate:fumarate and fumarate:malate exchange during the anaerobic utilization of aspartate and fumarate, respectively.\ 3757 IPR000013 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
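    Both motif definitions in the paragraph above translate into regular expressions. The following is a minimal Python sketch, not a validated motif scanner: the residue groupings chosen for 'a', 'b' and 'c' are assumptions spelled out in the comments (in particular, 'a' is restricted to V or T even though the text only says "most often"), and the helper `scan` and the toy sequence are hypothetical.

```python
import re

# Simple form: His-Glu-X-X-His, with X restricted to non-proline residues
# because proline is reported never to occur in this site. The lookahead
# makes overlapping occurrences visible to finditer.
HEXXH = re.compile(r"(?=(HE[^P]{2}H))")

# Stricter form abXHEbbHbc. Residue classes are assumptions of this sketch:
#   a = V or T (a simplification of "most often valine or threonine")
#   b = uncharged, taken here as anything except D, E, K, R (and never P)
#   c = hydrophobic, taken here as A, V, L, I, M, F or W
B = "[^DEKRP]"
STRICT = re.compile(rf"(?=([VT]{B}[^P]HE{B}{B}H{B}[AVLIMFW]))")


def scan(pattern: re.Pattern, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every match; hypothetical helper."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


toy = "GGVVAHELTHAVTD"  # hypothetical toy fragment
print(scan(HEXXH, toy))
print(scan(STRICT, toy))
```

    The lookahead wrapper is a design choice: motif hits can overlap in real sequences, and a plain pattern would skip any hit that starts inside a previous one.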

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
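    The naming rule described above can be expressed as a small lookup. The Python sketch below assumes identifiers are strings such as "M7" (a family) or "MA" (a clan) whose first character carries the catalytic type; the function names are hypothetical and this is not software provided by MEROPS itself.

```python
# First character of a family or clan identifier -> catalytic type, as listed
# in the text. "P" is used only for clans containing families of more than
# one catalytic type.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Serine, threonine and cysteine peptidases use part of an amino acid as the
# nucleophile (forming an acyl intermediate); aspartic, glutamic and
# metallopeptidases use an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def _code(identifier: str) -> str:
    return identifier.strip()[:1].upper()


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type for an identifier such as 'M7' or 'MA'."""
    code = _code(identifier)
    if code not in CATALYTIC_TYPE:
        raise ValueError(f"unrecognised catalytic type code in {identifier!r}")
    return CATALYTIC_TYPE[code]


def nucleophile(identifier: str) -> str:
    """Return the nucleophile class implied by the catalytic type."""
    code = _code(identifier)
    if code in AMINO_ACID_NUCLEOPHILE:
        return "amino acid residue"
    if code in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not specified"  # U (unknown) and P (mixed) carry no single answer


for ident in ("M7", "S8", "MA"):
    print(ident, catalytic_type(ident), nucleophile(ident))
```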

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M7 (snapalysin family, clan MA(M)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA.

    \ \ \

    With a molecular weight of around 16 kDa, Streptomyces extracellular neutral protease is one of the smallest known proteases PUBMED:7674922; it is capable of hydrolysing milk proteins PUBMED:7674922. The enzyme is synthesised as a proenzyme with a signal peptide, a propeptide and an active domain that contains the conserved HEXXH motif characteristic of metalloproteases. Although family M7 shows active site sequence similarity to other members, it differs in one major respect: the third zinc ligand appears to be an aspartate residue rather than the usual histidine.

    \ 66 IPR000485 \

    The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence similarities. One such family is the AsnC/Lrp subfamily PUBMED:7770911. The Lrp family of transcriptional regulators appears to be widely distributed among bacteria and archaea, as an important regulatory system of amino acid metabolism and related processes PUBMED:12675791.

    Members of the Lrp family are small DNA-binding proteins with molecular masses of around 15 kDa. Target promoters often contain a number of binding sites that typically lack obvious inverted repeat elements, and to which binding is usually co-operative. LrpA from Pyrococcus furiosus is the first Lrp-like protein for which a three-dimensional structure has been solved. In the crystal structure, LrpA forms an octamer consisting of four dimers. The structure revealed that the N-terminal part of the protein consists of a helix-turn-helix (HTH) domain, a fold generally involved in DNA binding. The C-terminus of Lrp-like proteins has a beta-fold, in which the two alpha-helices are located at one side of the four-stranded antiparallel beta-sheet. LrpA forms a homodimer mainly through interactions between the beta-strands of this C-terminal domain, and an octamer through further interactions between the second alpha-helix and fourth beta-strand of the motif. Hence, the C-terminal domain of Lrp-like proteins appears to be involved in ligand-response and activation PUBMED:12675791.

    \ 6020 IPR010395 \

    This family consists of several bacterial TorD proteins. Many prokaryotic molybdoenzymes, for example the TMAO reductase (TorA) of Escherichia coli, require the insertion of a bis(molybdopterin guanine dinucleotide) molybdenum (bis(MGD)Mo) cofactor into their catalytic site to be active and to be translocated to the periplasm. The TorD chaperone increases apoTorA activation up to four-fold, allowing maturation of most of the apoprotein. Therefore TorD is involved in the first step of TorA maturation, making it competent to receive the cofactor PUBMED:12766163.

    \ 867 IPR007287 \ Sof1 is essential for cell growth and is a component of the nucleolar rRNA processing machinery PUBMED:8508778.\ 6754 IPR009696 \

    This entry represents the C terminus (approximately 100 residues) of a putative replisome organiser protein in Lactococcus bacteriophages PUBMED:11157223.

    \ 7199 IPR009963 \

    This family consists of several hypothetical bacterial proteins of around 90 residues in length. Members of the family seem to be found exclusively in Mycobacterium species. The function of this family is unknown.

    \ 1748 IPR003207 \ This family contains the small subunit of the trimeric diol dehydratases and glycerol dehydratases. These enzymes are produced by some enterobacteria in response to growth substances PUBMED:9805380, PUBMED:10949584.\ 4579 IPR004226 \

    The folding pathway of tubulins includes highly specific interactions with a series of cofactors (A, B, C, D and E) after they are released from the eukaryotic chaperonin CCT. Cofactors A and D capture and stabilise tubulin in a quasi-native conformation. Cofactor E binds to the cofactor D-tubulin complex, and interaction with cofactor C then causes the release of tubulin polypeptides in the native state. This family is the tubulin-specific chaperone A.

    \ \ 4334 IPR002176 \

    The Escherichia coli ruvC gene is involved in DNA repair and in the late step of RecE and RecF pathway recombination PUBMED:1661673. RuvC protein cleaves cruciform junctions, which are formed by the extrusion of inverted repeat sequences from a supercoiled plasmid and which are structurally analogous to Holliday junctions, by introducing nicks into strands with the same polarity. The nicks leave a 5'-terminal phosphate and a 3'-terminal hydroxyl group, which are ligated by E. coli or T4 DNA ligases. Analysis of the cleavage sites suggests that DNA topology rather than a particular sequence determines the cleavage site. RuvC protein also cleaves Holliday junctions that are formed between gapped circular and linear duplex DNA by the function of RecA protein. The active form of RuvC protein is a dimer, which is mechanistically suited for an endonuclease involved in swapping DNA strands at the crossover junctions. It is inferred that RuvC protein is an endonuclease that resolves Holliday structures in vivo PUBMED:1661673.

    \

    RuvC is a small protein of about 20 kDa. It requires and binds a magnesium ion. The structure of E. coli RuvC is a 3-layer alpha-beta sandwich containing a 5-stranded beta-sheet sandwiched between 5 alpha-helices PUBMED:8057369.

    \ 3856 IPR001294 \ Phytochrome belongs to a family of plant photoreceptors that mediate physiological and developmental responses to changes in red and far-red light conditions PUBMED:1812812. The protein undergoes reversible photochemical conversion between a biologically inactive red light-absorbing form and the active far-red light-absorbing form. Phytochrome is a dimer of identical 124 kDa subunits, each of which contains a linear tetrapyrrole chromophore covalently attached via a Cys residue.\

    In Arabidopsis thaliana, there are genes for at least five phytochrome proteins PUBMED:2606345. These photoreceptors control such responses as germination, stem elongation, flowering, gene expression, and chloroplast and leaf development. It is not yet known which red light responses are controlled by which phytochrome species, or whether the different phytochromes have overlapping functions PUBMED:8453299. Synechocystis strain PCC 6803 hypothetical protein slr0473 contains a domain similar to that of plant phytochromes and also seems to bind a chromophore.\

    \ 909 IPR003228 \ Human transcription initiation factor TFIID is composed of the TATA-binding polypeptide (TBP) and at least 13 TBP-associated factors (TAFs) that, collectively or individually, are involved in activator-dependent transcription PUBMED:7667268.\ 2668 IPR003854 \ This is the GASA family of gibberellin-regulated cysteine-rich proteins. The expression of these proteins is up-regulated by the plant hormone gibberellin, and most of them have some role in plant development. Twelve cysteine residues are conserved within the alignment, giving these proteins the potential to possess six disulphide bonds.\ 5396 IPR008385 \ This family consists of several African swine fever virus J13L proteins.\ 325 IPR007866 \ This family of proteins has no known function. This region may contain transmembrane alpha helices. The domain is found in a variety of metazoan species.\ 5932 IPR010355 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 5664 IPR008450 \ This family consists of several examples of the Drosophila melanogaster-specific chorion protein S16. The chorion genes of Drosophila are amplified in response to developmental signals in the follicle cells of the ovary PUBMED:1908228.\ 7909 IPR012537 \

    This family consists of chloramphenicol (Cm) resistance gene leader peptides. Inducible resistance to Cm in both Gram-positive and Gram-negative bacteria is controlled by translation attenuation. In translation attenuation, the ribosome-binding site (RBS) for the resistance determinant is sequestered in a secondary structure domain within the mRNA. Preceding the secondary structure is a short, translated ORF termed the leader. Ribosome stalling in the leader destabilises the downstream secondary structure, allowing translation of the Cm resistance gene to initiate PUBMED:8955642.

    \ 1075 IPR000890 \ Acetate kinase, which is predominantly found in micro-organisms, facilitates the production of acetyl-CoA by phosphorylating acetate to acetyl phosphate in the presence of ATP and a divalent cation PUBMED:8226682, PUBMED:8396545. The enzyme is important in glycolysis, and its levels increase in the presence of excess glucose. The growth of a bacterial mutant lacking acetate kinase has been shown to be inhibited by glucose, suggesting that the enzyme is involved in the excretion of excess carbohydrate PUBMED:8226682. A related enzyme, butyrate kinase, facilitates the formation of butyryl-CoA by phosphorylating butyrate in the presence of ATP to form butyryl phosphate PUBMED:8396545.\ 1994 IPR002414 \

    This domain has no known function. It is found in various hypothetical proteins and putative lipoproteins from mycoplasmas.

    \ 6972 IPR009817 \

    This family consists of several Varicellovirus UL45 or gene 15 proteins. The Equine herpesvirus 1 (EHV-1) UL45 protein is a type II membrane glycoprotein. It has been found to be non-essential for EHV-1 growth in vitro, although its deletion reduces the virus's replication efficiency PUBMED:11145911.

    \ 1778 IPR000512 \ Diphtheria toxin is a 58 kDa protein secreted by lysogenic strains of Corynebacterium diphtheriae. The toxin causes the disease diphtheria in humans by gaining entry into the cell cytoplasm and inhibiting protein synthesis PUBMED:8573568. The mechanism of inhibition involves transfer of the ADP-ribose group of NAD to elongation factor-2 (EF-2), rendering EF-2 inactive. The catalysed reaction is: NAD+ + EF-2 = nicotinamide + ADP-ribosyl-EF-2.\ The crystal structure of the diphtheria toxin homodimer has been determined to 2.5 Å resolution PUBMED:1589020. The structure reveals a Y-shaped molecule of three domains. The first is a catalytic domain (fragment A), whose fold is of the alpha + beta type. The second is a transmembrane (TM) domain, which consists of nine alpha-helices, two pairs of which may participate in pH-triggered membrane insertion and translocation. The third is a receptor-binding domain, which forms a flattened beta-barrel with a jelly-roll-like topology PUBMED:1589020. The TM and receptor-binding domains together constitute fragment B.\ 857 IPR007630 \ Region 4 of sigma-70-like sigma factors is involved in binding to the -35 promoter element via a helix-turn-helix motif PUBMED:11931761. Because of the way Pfam handles overlaps between families, the threshold for this family has been set artificially high to prevent overlaps with other helix-turn-helix families. As a result, there are many false negatives.\ 2008 IPR005584 \ The function of these short proteins is unknown, but they contain four conserved cysteines and may therefore be involved in zinc binding.\ 4354 IPR003284 \ Salmonella typhimurium contains a 90 kb plasmid that is associated with virulence PUBMED:2164511. This plasmid encodes at least six genes needed by the bacterium for invading host macrophages during infection. These include the 70 kDa mkaA protein PUBMED:1657882, a recognised virulence factor. They also include four more recently described spv genes under the control of a regulator PUBMED:8483415. 
The spv genes are induced under carbon-poor conditions in the stationary phase of growth. Their expression is under the control of both the spvR regulator and the katF locus in Salmonella. It has been proposed that individual spv proteins may be required at different time points during infection PUBMED:9234805.\

    SpvB is a 65 kDa protein that has been localised to the bacterial cytoplasm PUBMED:9234805. Its expression peaks during early stationary phase but declines as the latent phase of the infection is reached, suggesting a role in initiating virulence.

    \ 6811 IPR009725 \

    This entry represents a conserved region, approximately 100 residues long, within a number of bacterial and archaeal 3-demethylubiquinone-9 3-methyltransferases. Note that some proteins contain more than one copy of this region.

    \ 3657 IPR002500 \ This domain is found in phosphoadenosine phosphosulphate (PAPS) reductase enzymes, or PAPS sulphotransferases. PAPS reductase is part of the adenine nucleotide alpha hydrolase superfamily, which also includes N-type ATP PPases and ATP sulphurylases PUBMED:9261082. The enzyme uses thioredoxin as an electron donor for the reduction of PAPS to phospho-adenosine-phosphate (PAP) PUBMED:9261082, PUBMED:7588765. The domain is also found in NodP nodulation protein P from Rhizobium meliloti, which has ATP sulphurylase (sulphate adenylate transferase) activity PUBMED:2250719.\ 3520 IPR007128 \ NNF1 is an essential yeast gene required for proper spindle orientation, nucleolar and nuclear envelope structure, and mRNA export PUBMED:9247195.\ 1998 IPR005524 \

    This family of integral membrane proteins is predicted to be a group of permeases of unknown specificity.

    \ 893 IPR000061 \ SWAP takes its name from the Suppressor-of-White-APricot splicing regulator from Drosophila melanogaster. The domain is found in regulators responsible for pervasive, non-sex-specific alternative pre-mRNA splicing and has been found in splicing regulatory proteins PUBMED:8206918. These ancient, conserved SWAP proteins share a colinearly arrayed series of novel sequence motifs PUBMED:7971282.\ 4673 IPR000678 \ In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is associated with a double-protein transition. The first transition corresponds to the replacement of histones by several spermatid-specific proteins, also called transition proteins. These are themselves replaced by protamines during the second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic, zinc-binding protein PUBMED:1930189 of 116 to 137 amino-acid residues. \

    Structurally, TP2 consists of three distinct parts: a conserved serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains cysteine residues, and a conserved C-terminal domain of about 70 residues rich in lysines and arginines.

    \ 7296 IPR010904 \

    Amaranthus caudatus agglutinin, or amaranthin, is a lectin from the ancient South American crop, amaranth grain. Although its biological function is unknown, it has a high binding specificity for the methyl-glycoside of the T-antigen, found linked to serine or threonine residues of cell surface glycoproteins PUBMED:2271665. The protein is a homodimer, and each subunit consists of two beta-trefoil domains PUBMED:9334739.

    \ \ 6274 IPR010500 \

    Hepcidin is an antibacterial and antifungal protein expressed in the liver and is also a signaling molecule in iron metabolism. The hepcidin protein is cysteine-rich and forms a distorted beta-sheet with an unusual disulphide bond found at the turn of the hairpin PUBMED:12138110.

    \ 5836 IPR009254 \

    Laminins are glycoproteins that are major constituents of the basement membrane of cells. Laminins are trimeric molecules; laminin-1 is an alpha1 beta1 gamma1 trimer. It has been suggested that the domains I and II from laminin A, B1 and B2 may come together to form a triple helical coiled-coil structure PUBMED:3182802. Binding to cells via a high affinity receptor, laminin is thought to mediate the attachment, migration and organisation of cells into tissues during embryonic development by interacting with other extracellular matrix components.

    \ 2602 IPR000292 \

    A number of bacterial and archaebacterial proteins involved in transporting formate or nitrite have been shown to be related PUBMED:8022272.\

    \

    These transporters are proteins of about 280 residues and seem to contain six transmembrane regions.

    \ 4314 IPR003315 \ The small G protein Rab3A plays an important role in the regulation of neurotransmitter release. The crystal structure of the small G protein Rab3A complexed with the effector domain of rabphilin-3A shows that the effector domain of rabphilin-3A contacts Rab3A in two distinct areas. The first interface involves the Rab3A switch I and switch II regions, which are sensitive to the nucleotide-binding state of Rab3A. The second interface consists of a deep pocket in Rab3A that interacts with an SGAWFF structural element of rabphilin-3A. Sequence and structure analyses, together with biochemical data, suggest that this pocket, or Rab complementarity-determining region (RabCDR), establishes a specific interaction between each Rab protein and its effectors. It has been suggested that RabCDRs could be major determinants of effector specificity during vesicle trafficking and fusion PUBMED:10025402.\ 8117 IPR013174 \

    This family corresponds to subunit 3 of dolichol-phosphate mannosyltransferase, an enzyme which generates mannosyl donors for glycosylphosphatidylinositols, N-glycan and protein O- and C-mannosylation. DPM3 is an integral membrane protein and plays a role in stabilising the dolichol-phosphate mannosyl transferase complex PUBMED:10835346.

    \ 3136 IPR004132 \ Kinetoplastid membrane protein 11 is a major cell surface glycoprotein of the parasite Leishmania donovani. It stimulates T-cell proliferation and may play a role in the immunology of the disease leishmaniasis.\ 1252 IPR001542 \

    Arthropod defensins are a family of insect and scorpion cysteine-rich antibacterial peptides, primarily active against Gram-positive bacteria PUBMED:2911573, PUBMED:2358464, PUBMED:8471044, PUBMED:1761552, PUBMED:1425705. All these peptides range in length from 38 to 51 amino acids. There are six conserved cysteines all involved in intrachain disulphide bonds.

    \

    A schematic representation of peptides from the arthropod defensin family is shown below.\

    \
                +----------------------------+\
                |                            | \
              xxCxxxxxxxxxxxxxxCxxxCxxxxxxxxxCxxxxxCxCxx\
                               |   |               | |\
                               +---|---------------+ |\
                                   +-----------------+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
     \
    

    \

    Although low level sequence similarities have been reported PUBMED:2911573 between the arthropod defensins and mammalian defensins, the topological arrangement of the disulphide bonds as well as the tertiary structure PUBMED:2401368 are completely different in the two families.
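    The schematic above pairs the six conserved cysteines as C1-C4, C2-C5 and C3-C6. The sketch below reads that connectivity off a sequence string. It is a hypothetical helper that assumes every 'C' in the input is one of the six conserved cysteines, so real sequences with extra cysteines would need to be trimmed to the aligned region first.

```python
def defensin_disulphides(seq: str) -> list[tuple[int, int]]:
    """Pair the six conserved cysteines as C1-C4, C2-C5, C3-C6.

    Returns 0-based index pairs into seq. Hypothetical helper: assumes
    every 'C' in seq is one of the six conserved cysteines.
    """
    cys = [i for i, residue in enumerate(seq) if residue == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    # Cysteine k (0-based) is bonded to cysteine k + 3, as in the schematic.
    return [(cys[k], cys[k + 3]) for k in range(3)]


# The schematic line itself, with 'x' for non-conserved positions.
schematic = "xxCxxxxxxxxxxxxxxCxxxCxxxxxxxxxCxxxxxCxCxx"
print(defensin_disulphides(schematic))
```

    The helper uses ordinal pairing (k with k + 3) rather than fixed offsets, because the spacing between cysteines varies across the 38 to 51 residue peptides.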

    \ 695 IPR000642 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
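    The loose HEXXH motif and the stricter abXHEbbHbc definition above can be expressed as regular expressions for scanning a protein sequence. This is a minimal sketch. The residue sets chosen for "uncharged" and "hydrophobic" are illustrative assumptions, not a published definition. Position 'a' is left unrestricted apart from excluding proline, since the text only says it is "most often" V or T.

```python
import re

# Residue classes are illustrative assumptions: "uncharged" here excludes
# D, E, K, R and H, and "hydrophobic" is one common textbook set. Proline
# is left out of every class because it is never found in this motif.
ANY_NON_PRO = "ACDEFGHIKLMNQRSTVWY"
UNCHARGED = "ACFGILMNQSTVWY"
HYDROPHOBIC = "ACFILMVWY"

# Loose motif HEXXH; the lookahead lets overlapping hits be reported.
LOOSE = re.compile("(?=(HE..H))")

# Stringent motif abXHEbbHbc: a, b, X, H, E, b, b, H, b, c.
STRICT = re.compile(
    "(?=([{a}][{b}][{x}]HE[{b}][{b}]H[{b}][{c}]))".format(
        a=ANY_NON_PRO, b=UNCHARGED, x=ANY_NON_PRO, c=HYDROPHOBIC
    )
)


def find_motif(pattern: re.Pattern, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every hit in seq."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


fragment = "GGVDVVAHELTHAVTDY"  # illustrative sequence, not a specific protein
print(find_motif(LOOSE, fragment))   # [(7, 'HELTH')]
print(find_motif(STRICT, fragment))  # [(4, 'VVAHELTHAV')]
```

    Replacing the L in the fragment with P ("...HEPTH...") still matches the loose pattern but no longer matches the stringent one. This is the kind of extra filtering the stricter definition provides.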

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
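    The catalytic-type letters and nucleophile rules above amount to a simple lookup. The sketch below decodes a family identifier such as M41 or U57. The function name and the letter-plus-digits parsing are assumptions for illustration. Subfamily suffixes (for example a trailing letter) are deliberately rejected rather than guessed at. Type P is omitted because the text reserves it for clans.

```python
# Catalytic-type letters as listed in the text (P applies to clans only).
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Serine, threonine and cysteine peptidases use an amino acid as the
# nucleophile and form an acyl intermediate; aspartic, glutamic and
# metallopeptidases use an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe_family(family_id: str) -> tuple[str, str]:
    """Decode a family identifier like 'M41' into (catalytic type, nucleophile).

    Hypothetical helper: only accepts one letter followed by digits.
    """
    letter, number = family_id[:1].upper(), family_id[1:]
    if letter not in CATALYTIC_TYPES or not number.isdigit():
        raise ValueError(f"not a family identifier: {family_id!r}")
    if letter in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "unknown"
    return CATALYTIC_TYPES[letter], nucleophile


print(describe_family("M41"))  # ('metallo', 'activated water molecule')
print(describe_family("U57"))  # ('unknown', 'unknown')
```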

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M41 (FtsH endopeptidase family, clan MA(E)). The predicted active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH.

    \ \ \

    The peptidase M41 family belongs to a larger family of zinc metalloproteases. This family includes the cell division protein FtsH and the yeast mitochondrial respiratory chain complexes assembly protein, a putative ATP-dependent protease required for assembly of the mitochondrial respiratory chain and ATPase complexes. FtsH is an integral membrane protein that seems to act as an ATP-dependent zinc metallopeptidase binding one zinc ion.

    \ 3999 IPR001765 \ Carbonic anhydrases (CA) are zinc metalloenzymes which catalyze the reversible hydration of carbon dioxide. In Escherichia coli, CA (gene cynT) is involved in recycling carbon dioxide formed in the bicarbonate-dependent decomposition of cyanate by cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate PUBMED:1740425. In photosynthetic bacteria and plant chloroplasts, CA is essential for inorganic carbon fixation PUBMED:1584776. Prokaryotic and plant chloroplast CAs are structurally and evolutionarily related. They form a family distinct from the one that groups the many different forms of eukaryotic CAs. Hypothetical proteins yadF from Escherichia coli and HI1301 from Haemophilus influenzae also belong to this family.\ 2406 IPR003424 \ This family consists of egg-laying hormone (ELH) precursor and atrial gland peptides from the little (Aplysia parvula) and California (Aplysia californica) sea hares. The family also includes the ovulation prohormone precursor from the great pond snail (Lymnaea stagnalis). This family thus represents a conserved gastropod ovulation and egg production prohormone. Note that many of the proteins present are further cleaved to give individual peptides PUBMED:9520477. Neuropeptidergic bag cells of the marine mollusc A. californica synthesize an egg-laying hormone (ELH) precursor protein. This precursor is cleaved to generate several bioactive peptides, including ELH, bag cell peptides (BCP) and acidic peptide (AP) PUBMED:10518477.\ 7395 IPR011523 \

    This region is conserved in several predicted nucleoproteins and transposase-like proteins.

    \ 6521 IPR009576 \

    This family consists of several hypothetical Enterobacterial proteins of around 212 residues in length and is known as YjfM in Escherichia coli. The function of this family is unknown.

    \ 5706 IPR008776 \ This family consists of the Phytoreovirus nonstructural proteins Pns9 and Pns10. The function of this family is unknown.\ 3193 IPR001640 \ Prolipoprotein diacylglyceryl transferase PUBMED:8051048 is the bacterial enzyme that catalyzes the first step in lipoprotein biogenesis: it transfers the n-acyl diglyceride group onto what will become the N-terminal cysteine of membrane lipoproteins. Prolipoprotein diacylglyceryl transferase (gene lgt) is an integral membrane protein.\ 7681 IPR012493 \

    The sequences featured in this family are similar to a region of the human renin receptor that bears a putative transmembrane spanning segment PUBMED:12045255. The renin receptor is involved in intracellular signal transduction by activating the ERK1/ERK2 pathway. It also increases the efficiency of angiotensinogen cleavage by receptor-bound renin, thereby facilitating angiotensin II generation and action on the cell surface PUBMED:12045255.

    \ 7947 IPR012633 \

    This family consists of the SFI family of spider toxins. This family of toxins might share structural, evolutionary and functional relationships with other small, highly structurally constrained spider neurotoxins. These toxins are highly selective agonists/antagonists of different voltage-dependent calcium channels and are extremely valuable reagents in the analysis of neuromuscular function.

    \ 1160 IPR002086 \ Aldehyde dehydrogenases are enzymes which oxidize a wide variety of aliphatic and aromatic aldehydes using NAD or NADP as a cofactor. In mammals, at least four different forms of the enzyme are known PUBMED:2713359:
    - class-1 (or Ald C), a tetrameric cytosolic enzyme;
    - class-2 (or Ald M), a tetrameric mitochondrial enzyme;
    - class-3 (or Ald D), a dimeric cytosolic enzyme;
    - class IV, a microsomal enzyme.
    Aldehyde dehydrogenases have also been sequenced from fungal and bacterial species. A number of enzymes are known to be evolutionarily related to aldehyde dehydrogenases. A glutamic acid residue and a cysteine residue have been implicated in the catalytic activity of mammalian aldehyde dehydrogenase. These residues are conserved in all the enzymes of this family.\ \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. In this system, a designation consists of the first three letters of the genus, a space, the first letter of the species name, a space, and an arabic number. If two species names have identical designations, they are distinguished from one another by adding one or more letters (as necessary) to each species designation.
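    The basic naming rule above is mechanical enough to sketch in a few lines. The function below is a hypothetical helper that builds a designation from genus, species and number. It does not implement the extra letters used to separate clashing designations. For example, Alt a 10 and Cla h 3 come from Alternaria alternata and Cladosporium herbarum.

```python
def allergen_designation(genus: str, species: str, number: int) -> str:
    """Build a WHO/IUIS-style allergen name such as 'Alt a 10'.

    Hypothetical helper implementing only the basic rule: first three
    letters of the genus, a space, first letter of the species, a space,
    and an arabic number. Disambiguating extra letters are not handled.
    """
    if len(genus) < 3 or not species or number < 1:
        raise ValueError("need a genus of at least 3 letters, a species and a positive number")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


print(allergen_designation("Alternaria", "alternata", 10))  # Alt a 10
print(allergen_designation("Cladosporium", "herbarum", 3))  # Cla h 3
```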

    \

    The allergens in this family include allergens with the following designations: Alt a 10 and Cla h 3.

    \ 5177 IPR008014 \

    Glycogen synthase kinase-3 (GSK-3) sequentially phosphorylates four serine residues on glycogen synthase (GS), in the sequence SxxxSxxxSxxxSxxxS(p), by recognising and phosphorylating the first serine in the sequence motif SxxxS(p) (where S(p) represents a phosphoserine). Interaction of GSK-3 with a peptide derived from GSK-3-binding protein prevents GSK-3 from interacting with Axin. This interaction thereby inhibits the Axin-dependent phosphorylation of beta-catenin by GSK-3 PUBMED:11738041.
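    The stepwise recognition of SxxxS(p) can be illustrated with a toy string model. Each phosphorylation creates a new S(p) four residues further towards the N-terminus, which primes the next serine. This is a sketch under stated assumptions. The priming phosphoserine is taken as already present. The function name is hypothetical. Only the spacing rule from the text is modelled, not the enzyme kinetics.

```python
def gsk3_cascade(seq: str, primed: int) -> list[int]:
    """Toy model of sequential GSK-3 phosphorylation along SxxxS(p) motifs.

    Starting from an existing phosphoserine at 0-based index `primed`,
    repeatedly phosphorylate the serine 4 residues N-terminal to the last
    phosphoserine until no serine sits there. Returns the positions
    phosphorylated by GSK-3, in order. Hypothetical helper.
    """
    if not 0 <= primed < len(seq) or seq[primed] != "S":
        raise ValueError("priming site must be a serine inside seq")
    sites = []
    i = primed
    while i - 4 >= 0 and seq[i - 4] == "S":
        i -= 4  # newly phosphorylated serine becomes the next S(p)
        sites.append(i)
    return sites


toy = "SAAASAAASAAASAAAS"  # SxxxSxxxSxxxSxxxS with the final S primed
print(gsk3_cascade(toy, 16))  # [12, 8, 4, 0]
```

    The four returned positions correspond to the four serines GSK-3 phosphorylates. If one serine is replaced, the cascade stops at that point because the SxxxS(p) motif is broken.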

    \ 4868 IPR007344 \ This family of uncharacterised proteins is also known as GrpB.\ 6040 IPR009346 \

    This family consists of several eukaryotic GRIM-19 (gene associated with retinoic-interferon-induced mortality 19) proteins. GRIM-19 was reported to encode a small protein that is primarily distributed in the nucleus and can promote cell death induced by IFN-β and retinoic acid (RA). A bovine homologue of GRIM-19 was co-purified with mitochondrial NADH:ubiquinone oxidoreductase (complex I) from bovine heart. Its exact cellular localisation and function have therefore been unclear. It has now been discovered that GRIM-19 is a specific interacting protein that negatively regulates Stat3 activity PUBMED:12628925.

    \ 2065 IPR007278 \ The function of this family is unknown. It has been suggested that some members of this family are regulators of transcription.\ 6439 IPR009525 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 7699 IPR012450 \

    This protein is found in some prophages of Lactococcus lactis PUBMED:11160885.

    \ 6563 IPR010616 \

    This entry represents a conserved region within plant proline-rich proteins.

    \ 5410 IPR008764 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    This is a group of peptidases belonging to MEROPS peptidase family U57 (clan U-). The type example is the YabG protein of Bacillus subtilis, a protease involved in the synthesis and maturation of the spore coat proteins SpoIVA and YrbA PUBMED:11040425.

    \ 8073 IPR013216 \

    Members of this family are SAM dependent methyltransferases.

    \ 5995 IPR010384 \

    This is a family of uncharacterised bacterial sequences.

    \ 3373 IPR005647 \ This family of proteins includes MND1 from Saccharomyces cerevisiae. The mnd1 protein forms a complex with hop2 to promote homologous chromosome pairing and meiotic double-strand break repair PUBMED:11940665.\ 3808 IPR003513 \ This is a family of proteins from single-stranded DNA bacteriophages. Scaffold proteins B and D are required for procapsid formation. Sixty copies of the internal scaffold protein B are found in the procapsid.\ 5077 IPR007914 \

    This family of proteins is functionally uncharacterised.

    \ 771 IPR007216 \ Two of the members of this family have been characterised as being involved in the regulation of Ste11-regulated sex genes PUBMED:9671458, PUBMED:9447985.\ 3168 IPR007213 \ This is a family of leucine carboxyl methyltransferases. This family may need to be subdivided, as the full alignment contains a significantly shorter mouse sequence.\ 7223 IPR009977 \

    This family contains a number of bacterial mig-14 proteins (approximately 270 residues long). In Salmonella, mig-14 contributes to resistance to antimicrobial peptides, although the mechanism is not fully understood PUBMED:12029036.

    \ 4573 IPR005092 \

    This family of trans-activating transcriptional regulators (TATR), also known as immediate early protein 1, is common to the Nucleopolyhedroviruses.

    \ 3791 IPR007802 \ This family consists of several Cytochrome B6-F complex subunit VI (PetL) proteins found in a number of plant species. PetL is one of the small subunits which make up the cytochrome b(6)f complex. PetL is not absolutely required for either the accumulation or for the function of cytochrome b6f; in its absence, however, the complex becomes unstable in vivo in aging cells and labile in vitro. It has been suggested that the N terminus of the protein is likely to lie in the thylakoid lumen PUBMED:11796719.\ 2325 IPR006507 \

    These are putative membrane proteins from alpha and gamma proteobacteria, each making up their own clade. The two clades have less than 25% identity between them.

    \ 2947 IPR001312 \

    Hexokinase is an important enzyme that catalyses the ATP-dependent conversion of aldo- and keto-hexose sugars to hexose-6-phosphate (H6P). The enzyme can catalyse this reaction on glucose, fructose, sorbitol and glucosamine, and as such is the first step in a number of metabolic pathways PUBMED:1783373. The addition of a phosphate group to the sugar acts to trap it in a cell, since the negatively charged phosphate cannot easily traverse the plasma membrane.

    \ \

    The enzyme is widely distributed in eukaryotes. There are three isozymes of hexokinase in yeast (PI, PII and glucokinase): isozymes PI and PII phosphorylate both aldo- and keto-sugars, whereas glucokinase is specific for aldo-hexoses. All three isozymes contain two domains PUBMED:1783373. Structural studies of yeast hexokinase reveal a well-defined catalytic pocket that binds ATP and hexose, allowing easy transfer of the phosphate from ATP to the sugar PUBMED:10749890. Vertebrates contain four hexokinase isozymes, designated I to IV, where types I to III contain a duplication of the two-domain yeast-type hexokinases. Both the N- and C-terminal halves bind hexose and H6P, though in types I and III only the C-terminal half supports catalysis, while both halves support catalysis in type II. The N-terminal half is the regulatory region. Type IV hexokinase is similar to the yeast enzyme in containing only the two domains, and is sometimes incorrectly referred to as glucokinase.

    \ \

    The different vertebrate isozymes differ in their catalysis, localisation and regulation, thereby contributing to the different patterns of glucose metabolism in different tissues PUBMED:12756287. Whereas types I to III can phosphorylate a variety of hexose sugars and are inhibited by glucose-6-phosphate (G6P), type IV is specific for glucose and shows no G6P inhibition. Type I enzyme may have a catabolic function, producing H6P for energy production in glycolysis; it is bound to the mitochondrial membrane, which enables the coordination of glycolysis with the TCA cycle. Types II and III enzyme may have anabolic functions, providing H6P for glycogen or lipid synthesis. Type IV enzyme is found in the liver and pancreatic beta-cells, where it is controlled by insulin (activation) and glucagon (inhibition). In pancreatic beta-cells, type IV enzyme acts as a glucose sensor to modify insulin secretion. Mutations in type IV hexokinase have been associated with diabetes mellitus.

    \ 4671 IPR000693 \ Sea anemones produce many different neurotoxins with related structure and function. Proteins belonging to this family include several of these neurotoxins, among them calitoxin and anthopleurin. The neurotoxins bind specifically to the sodium channel, delaying its inactivation during signal transduction and resulting in strong stimulation of mammalian cardiac muscle contraction. In neuromuscular preparations of crustaceans, calitoxin 1 has been shown to increase transmitter release, causing firing of the axons. Three disulphide bonds are present in this protein.\ 373 IPR000336 \

    Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts. They include yellow fever, West Nile, tick-borne encephalitis (TBE), Japanese encephalitis (JE) and Dengue type 2 viruses PUBMED:15378043. Flaviviruses have three structural proteins: the core nucleocapsid protein C and the envelope glycoproteins M and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from the class I fusion proteins of influenza virus and HIV. Glycoprotein E is composed of three domains. Domain I (the dimerisation domain) is an 8-stranded beta barrel. Domain II (the central domain) is an elongated domain composed of twelve beta strands and two alpha helices. Domain III (the immunoglobulin-like domain) is an IgC-like module with ten beta strands. This entry represents the Ig-like domain III, which contains a putative receptor-binding loop PUBMED:12759475.

    \ 7886 IPR012570 \

    This family consists of the leucine operon leader peptide. The leucine operon is involved in the control of the biosynthesis of leucine. Four adjacent leucine codons within the leucine leader RNA are critically important in transcription attenuation-mediated control of leucine operon expression in bacteria. The leader RNA contains translational start and stop signals, a cluster of four leucine codons and overlapping regions of dyad symmetry that are capable of forming stem-and-loop structures PUBMED:3922957.
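    The key sequence feature here is a cluster of four adjacent in-frame leucine codons in the leader RNA. The sketch below scans a leader ORF for such runs. It assumes the input starts at the leader's start codon. It recognises the six standard-code leucine codons (UUA, UUG, CUU, CUC, CUA, CUG). It only locates the codon cluster and does not model ribosome stalling or the competing stem-and-loop structures that actually drive attenuation.

```python
# The six leucine codons of the standard genetic code, written as RNA.
LEU_CODONS = {"UUA", "UUG", "CUU", "CUC", "CUA", "CUG"}


def leucine_runs(rna: str, min_run: int = 4) -> list[tuple[int, int]]:
    """Find runs of at least min_run consecutive in-frame leucine codons.

    Reads codons from position 0, which is assumed to be the start codon
    of the leader ORF. DNA input (T) is converted to RNA (U). Returns
    (nucleotide offset of run, number of leucine codons) pairs.
    """
    rna = rna.upper().replace("T", "U")
    runs: list[tuple[int, int]] = []
    start, length = 0, 0
    for pos in range(0, len(rna) - 2, 3):
        if rna[pos:pos + 3] in LEU_CODONS:
            if length == 0:
                start = pos  # first leucine codon of a new run
            length += 1
        else:
            if length >= min_run:
                runs.append((start, length))
            length = 0
    if length >= min_run:  # run reaching the end of the sequence
        runs.append((start, length))
    return runs


# Illustrative leader: AUG, four leucine codons, then a stop codon.
print(leucine_runs("AUGCUUCUCCUACUGUAA"))  # [(3, 4)]
```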

    \ 378 IPR000752 \ NS2A is a hydrophobic protein about 25 kDa in size, which is cleaved from NS1 by a membrane-bound host protease PUBMED:7474145. NS2A has been found to associate with the dsRNA within the vesicle packages. It has also been found to associate with the known replicase components, so NS2A has been postulated to be part of this replicase complex PUBMED:9636360.\ 3511 IPR003816 \

    The nitrate reductase enzyme is composed of three types of subunit: one alpha, one beta and two gamma subunits. It is the second nitrate reductase enzyme of Escherichia coli and can substitute for the NRA enzyme, allowing the bacterium to use nitrate as an electron acceptor during anaerobic respiration PUBMED:2233673.

    \ \

    Nitrate reductase gamma subunit resembles cytochrome b and transfers electrons from quinones to the beta subunit PUBMED:9738886.

    \ 7009 IPR009841 \

    This family consists of several VirC2 proteins which seem to be found exclusively in Agrobacterium species and Rhizobium etli. VirC2 is known to be involved in virulence in Agrobacterium species but its exact function is unclear PUBMED:3584058, PUBMED:3759904.

    \ 7932 IPR012967 \

    This domain is found at the N terminus of a variety of plant O-methyltransferases. It has been shown to mediate dimerisation of these proteins PUBMED:11224575.

    \ 6006 IPR009331 \

    This family consists of several bacterial proteins which are homologous to the oligogalacturonate-specific porin protein KdgM from Erwinia chrysanthemi. The phytopathogenic Gram-negative bacterium Erwinia chrysanthemi secretes pectinases, which are able to degrade the pectic polymers of plant cell walls, and uses the degradation products as a carbon source for growth. KdgM is a major outer membrane protein, whose synthesis is strongly induced in the presence of pectic derivatives. KdgM behaves like a voltage-dependent porin that is slightly selective for anions and that exhibits fast block in the presence of trigalacturonate. In contrast to most porins, KdgM seems to be monomeric PUBMED:11773048.

    \ 5427 IPR008596 \ This family consists of Rex/Tax proteins from human and simian T-cell leukaemia viruses. The exact function of these proteins is unknown.\ 7637 IPR012488 \

    The region featured in this family is found repeated in a number of plant proteins, some of which are expressed specifically in nodules formed during symbiotic interactions with certain bacterial species. Some of these proteins are also termed glycine-rich proteins (GRPs), owing to the presence of a glycine-rich C-terminal region in their structures PUBMED:12236598. Bacterial infection is required for the induction of nodule-specific GRP genes, and it is thought that nodule-specific GRPs may play non-redundant roles required at specific stages of nodule development PUBMED:12236598. Some members of this group may be cytosolic, whereas others are thought to be membrane-associated PUBMED:9037164.

    \ 2752 IPR003476 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \ This group of beta-galactosidase enzymes belongs to glycosyl hydrolase family 42. The enzyme catalyses the hydrolysis of terminal, non-reducing beta-D-galactose residues.\ 7352 IPR006565 \

    This bromodomain is found in eukaryotic transcription factors and PHD domain-containing proteins.

    \ 3961 IPR006956 \ This family includes variola (smallpox) and vaccinia virus L5 proteins. L5 is thought to contain a metal-binding region PUBMED:8383392.\ 2854 IPR004131 \

    Two types of proteins that hydrolyse inorganic pyrophosphate (PPi), very different in both amino acid sequence and structure, have been characterised to date: soluble and membrane-bound proton-pumping pyrophosphatases (sPPases and H(+)-PPases, respectively). sPPases are ubiquitous proteins that hydrolyse PPi to release heat, whereas H+-PPases, so far unidentified in animal and fungal cells, couple the energy of PPi hydrolysis to proton movement across biological membranes PUBMED:12451180, PUBMED:10471843. The latter type is represented by this group of proteins. H+-PPases are also called vacuolar-type inorganic pyrophosphatases (V-PPases) or pyrophosphate-energised vacuolar membrane proton pumps PUBMED:11343697. In plants, vacuoles contain two enzymes for acidifying the interior of the vacuole, the V-ATPase and the V-PPase (V is for vacuolar) PUBMED:10471843.

    Two distinct biochemical subclasses of H+-PPases have been characterised to date: K+-stimulated and K+-insensitive PUBMED:12451180, PUBMED:11343697.

    For additional information please see PUBMED:1311852, PUBMED:10556526.

    \ 4763 IPR003367 \ Thrombospondin is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. It can bind to fibrinogen, fibronectin, laminin and type V collagen. This repeat is found in the thrombospondin family of proteins and probably binds to calcium PUBMED:2430973. Cartilage oligomeric matrix protein is also part of this family.\ 3947 IPR005005 \

    The vaccinia virus F12L gene encodes a 65 kDa protein that is expressed late during infection and is important for plaque formation, EEV production and virulence. The F12L protein is located on intracellular enveloped virus (IEV) particles, but is absent from immature virions, intracellular mature virus and cell-associated enveloped virus. F12L co-localises with endosomal compartments and microtubules and appears to play a role in the transport of IEV particles to the cell surface on microtubules PUBMED:11752717.

    \ 7131 IPR010844 \

    This entry represents a conserved region approximately 100 residues long within eukaryotic occludin proteins and the RNA polymerase II elongation factor ELL. Occludin is an integral membrane protein that localises to tight junctions PUBMED:8276896, while ELL is an elongation factor that can increase the catalytic rate of RNA polymerase II transcription by suppressing transient pausing by the polymerase at multiple sites along the DNA PUBMED:8596958.

    \ 6056 IPR010413 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 2681 IPR007392 \

    This domain is found at the C terminus of D-galactarate dehydratase PUBMED:9772162 and altronate hydrolase (altronic acid hydratase) PUBMED:9579062. As purified, both enzymes are catalytically inactive in the absence of added Fe2+, Mn2+ and beta-mercaptoethanol. Synergistic activation of altronate hydrolase activity is seen in the presence of both iron and manganese ions, suggesting that the enzyme may have two ion-binding sites. Mn2+ appears to be part of the enzyme active centre, but the function of the single bound Fe2+ ion is unknown. The hydratase has no Fe-S core PUBMED:3038546. The N-terminal region is represented by a separate entry.

    \ 3460 IPR004679 \ These proteins are members of the citrate:cation symporter (CCS) family. They have 12 transmembrane regions predicted by GES hydropathy analysis. Most members of the CCS family catalyze citrate uptake with either Na+ or H+ as the cotransported cation. However, one member is specific for L-malate and probably functions by a proton symport mechanism.\ 7680 IPR012440 \

    Archaeal and bacterial hypothetical proteins are found in this family, with the region in question being approximately 40 residues long.

    \ 6226 IPR010482 \

    Peroxisomes play diverse roles in the cell, compartmentalising many activities related to lipid metabolism and functioning in the decomposition of toxic hydrogen peroxide. Sequence similarity was identified between two hypothetical proteins and the peroxin integral membrane protein Pex24p PUBMED:12707309.

    \ 2308 IPR007749 \ This family consists of AT14A-like proteins from Arabidopsis thaliana. At14a contains a small domain that has sequence similarity to integrins from fungi, insects and humans. Transcripts of At14a are found in all Arabidopsis tissues, and the protein localises partly to the plasma membrane PUBMED:10196471.\ 3338 IPR005299 \

    This family of plant methyltransferases contains enzymes that act on a variety of substrates, including salicylic acid, jasmonic acid and 7-methylxanthine. Caffeine is synthesised through a sequential three-step methylation of xanthine derivatives at positions 7-N, 3-N and 1-N. The protein 7-methylxanthine methyltransferase (designated CaMXMT) catalyses the second step, producing theobromine PUBMED:11108716.
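    The order of the three methylations, and where CaMXMT sits in it, can be sketched as a small bookkeeping loop. This is an illustrative toy only, not a model from the cited work: it tracks which ring nitrogens carry a methyl group, and (as an assumption beyond this entry) it collapses the first methylation, which in planta acts on xanthosine, together with the subsequent ribose removal into a single step yielding 7-methylxanthine. The names `METHYLATION_ORDER`, `PRODUCT_NAMES` and `methylation_steps` are hypothetical.

    ```python
    # Methylation order on the purine ring, as stated in the entry: 7-N, 3-N, 1-N.
    METHYLATION_ORDER = ["7-N", "3-N", "1-N"]

    # Common names of the methylxanthines reached after each step.
    # Assumption: step 1 is simplified to give 7-methylxanthine directly.
    PRODUCT_NAMES = {
        frozenset({"7-N"}): "7-methylxanthine",
        frozenset({"7-N", "3-N"}): "theobromine (3,7-dimethylxanthine)",
        frozenset({"7-N", "3-N", "1-N"}): "caffeine (1,3,7-trimethylxanthine)",
    }


    def methylation_steps():
        """Yield (step number, position methylated, product name) in pathway order."""
        methylated = set()
        for step, position in enumerate(METHYLATION_ORDER, start=1):
            methylated.add(position)
            yield step, position, PRODUCT_NAMES[frozenset(methylated)]


    for step, position, product in methylation_steps():
        print(f"step {step}: methylate {position} -> {product}")
    ```

    In this toy, step 2 (the 3-N methylation of 7-methylxanthine) is the reaction attributed to CaMXMT above.
    
    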

    \ 5485 IPR008524 \ This family consists of several Siphovirus and Lactococcus proteins of unknown function. The viral sequences are thought to be tail component proteins.\ 1436 IPR001393 \ Calsequestrin is the principal calcium-binding protein present in the sarcoplasmic reticulum of cardiac and skeletal muscle PUBMED:3379055. It is a highly acidic protein that is able to bind over 40 calcium ions and acts as an internal calcium store in muscle. Sequence analysis has suggested that calcium is not bound in distinct pockets via EF-hand motifs, but rather via presentation of a charged protein surface.

    Two forms of calsequestrin have been identified. The cardiac form is present in cardiac and slow skeletal muscle, and the fast skeletal form is found in fast skeletal muscle. The release of calsequestrin-bound calcium (through a calcium release channel) triggers muscle contraction. The active protein is not highly structured, with more than 50% of it adopting a random coil conformation PUBMED:3427023. When calcium binds there is a structural change whereby the alpha-helical content of the protein increases from 3 to 11% PUBMED:3427023. Both forms of calsequestrin are phosphorylated by casein kinase II, but the cardiac form is phosphorylated more rapidly and to a higher degree PUBMED:1985907.

    \ 5809 IPR010292 \

    This family consists of several bacterial CreA proteins, the function of which is unknown.

    \ 3682 IPR003173 \ p15 has a bipartite structure composed of an amino-terminal regulatory domain and a carboxy-terminal cryptic DNA-binding domain PUBMED:8062392. The DNA-binding activity of the carboxy-terminal domain is masked by the amino-terminal domain of p15. Activity is controlled by protein kinases that target the regulatory domain.\ 1239 IPR006035 \ Arginase, which catalyses the conversion of arginine to urea and ornithine, is one of the five urea cycle enzymes that convert ammonia to urea as the principal product of nitrogen excretion PUBMED:7916684. There are several arginase isozymes: a liver isozyme takes part in the final step of the urea cycle in ureotelic animals PUBMED:3540966; other isozymes take part in the first step of arginine degradation in various cell types (the kidney, small intestine and lactating mammary glands) PUBMED:6094498, and differ in catalytic, molecular and immunological properties PUBMED:3540966. Deficiency in the liver isozyme leads to argininemia, which is usually associated with hyperammonemia PUBMED:3540966.\ 4684 IPR001156 \

    Transferrins are eukaryotic iron-binding glycoproteins that control the level of free iron in biological fluids PUBMED:3032619. The proteins arose by duplication of a domain, each duplicated domain binding one iron atom. Members of the family include blood serotransferrin (siderophilin), milk lactotransferrin (lactoferrin), egg white ovotransferrin (conalbumin) and membrane-associated melanotransferrin.

    \ \

    Human lactoferrin is a serine peptidase belonging to MEROPS peptidase family S60, clan SR. It is found at high concentrations in all human secretions, where it plays a major role in mucosal defence. Lactoferrin cleaves IgA1 protease at an arginine-rich region defined by amino acids RRSRRSVR and digests Hap at a similar arginine-rich sequence (VRSRRAAR). Ser259 and Lys73 form a catalytic dyad, reminiscent of a number of bacterial serine proteases.

    \ 7427 IPR011457 \

    This is a small family of short hypothetical proteins in Leptospira interrogans.

    \ 6202 IPR010474 \

    This family consists of several bovine specific leukaemia virus receptors which are thought to function as transmembrane proteins, although their exact function is unknown PUBMED:12692298.

    \ 4173 IPR000439 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
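    The elongation cycle described above can be sketched as a toy state machine. This is a minimal illustration, not a model from the cited papers: the class `ToyRibosome` and its methods are hypothetical, only the A and P sites are tracked, amino acids are supplied directly as one-letter codes (no codon decoding), and the exit site(s) are collapsed into a counter of released tRNAs.

    ```python
    from typing import List, Optional


    class ToyRibosome:
        """Hypothetical toy model tracking only the A and P sites."""

        def __init__(self, initiator: str) -> None:
            # The P site holds a tRNA; the list is the peptide it carries
            # (an empty list means the tRNA is deacylated).
            self.p_site: Optional[List[str]] = [initiator]
            self.a_site: Optional[List[str]] = None
            self.trnas_released = 0

        def decode(self, amino_acid: str) -> None:
            # An aminoacyl-tRNA (delivered by EF-Tu.GTP) enters the empty A site.
            if self.a_site is not None:
                raise RuntimeError("A site is already occupied")
            self.a_site = [amino_acid]

        def transfer_peptide(self) -> None:
            # The chain on the P-site tRNA is transferred to the A-site
            # aminoacyl-tRNA, leaving a deacylated tRNA in the P site.
            if self.p_site is None or self.a_site is None:
                raise RuntimeError("both A and P sites must be occupied")
            self.a_site = self.p_site + self.a_site
            self.p_site = []

        def translocate(self) -> None:
            # EF-G.GTP moves the new peptidyl-tRNA from the A site to the P site;
            # the deacylated tRNA leaves through the exit site(s).
            if self.p_site != []:
                raise RuntimeError("P site must hold a deacylated tRNA")
            self.trnas_released += 1
            self.p_site, self.a_site = self.a_site, None

        def elongate(self, amino_acid: str) -> None:
            # One full cycle extends the chain by exactly one residue.
            self.decode(amino_acid)
            self.transfer_peptide()
            self.translocate()


    ribosome = ToyRibosome("M")
    for residue in "KLV":
        ribosome.elongate(residue)
    print("".join(ribosome.p_site), ribosome.trnas_released)  # MKLV 3
    ```

    Each cycle leaves the A site empty and releases one deacylated tRNA, which is why three elongation cycles give a four-residue chain and three released tRNAs.
    
    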

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities PUBMED:7733938. One of these families consists of:

    \ \
  • Mammalian L15.
  • Insect L15.
  • Plant L15.
  • Yeast YL10 (L13) (Rp15r).
  • Archaebacterial L15e.

    These proteins have about 200 amino acid residues.

    \ 7577 IPR011986 \

    Dioxygenases catalyse the incorporation of both atoms of molecular oxygen into substrates using a variety of reaction mechanisms. Cleavage of aromatic rings is one of the most important functions of dioxygenases, which play key roles in the degradation of aromatic compounds. The substrates of ring-cleavage dioxygenases can be classified into two groups according to the mode of scission of the aromatic ring. Intradiol enzymes use a non-haem Fe(III) to cleave the aromatic ring between two hydroxyl groups (ortho-cleavage), whereas extradiol enzymes use a non-haem Fe(II) to cleave the aromatic ring between a hydroxylated carbon and an adjacent non-hydroxylated carbon (meta-cleavage) PUBMED:10730195, PUBMED:15264822. These two subfamilies differ in sequence, structural fold, iron ligands, and the orientation of second-sphere active site amino acid residues. Extradiol dioxygenases are usually homo-multimeric, bind one atom of ferrous ion per subunit and have a subunit size of about 33 kDa. Extradiol dioxygenases can be divided into three classes. Class I and II enzymes show sequence similarity, with the two-domain class II enzymes having evolved from a class I enzyme through gene duplication. Class III enzymes are different in sequence and structure, but they do share several common active-site characteristics with the class II enzymes; in particular, the coordination sphere and the disposition of the putative catalytic base are very similar. Class III enzymes usually have two subunits, designated A and B.
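    The intradiol/extradiol distinction above depends only on how many hydroxylated carbons flank the cleaved bond. A minimal sketch of that rule, assuming ring carbons numbered 1 to 6 and a cleaved bond given as a pair of adjacent carbons; `cleavage_mode` and `IRON_STATE` are hypothetical names, and the classifier ignores everything else about real substrates (substituents, enzyme class, mechanism).

    ```python
    from typing import Optional, Set, Tuple

    # Ring carbons are numbered 1-6; a cleaved bond joins two adjacent carbons.
    RING_SIZE = 6

    # Iron oxidation state associated with each cleavage mode, as stated above.
    IRON_STATE = {"intradiol": "Fe(III)", "extradiol": "Fe(II)"}


    def cleavage_mode(hydroxylated: Set[int], bond: Tuple[int, int]) -> Optional[str]:
        """Classify ring cleavage by how many hydroxylated carbons flank the bond."""
        a, b = bond
        if not all(1 <= carbon <= RING_SIZE for carbon in bond):
            raise ValueError("ring carbons must be numbered 1-6")
        if (a - b) % RING_SIZE not in (1, RING_SIZE - 1):
            raise ValueError("bond must join two adjacent ring carbons")
        flanking_hydroxyls = (a in hydroxylated) + (b in hydroxylated)
        if flanking_hydroxyls == 2:
            return "intradiol"  # ortho-cleavage between the two hydroxyl groups
        if flanking_hydroxyls == 1:
            return "extradiol"  # meta-cleavage beside a single hydroxylated carbon
        return None  # neither mode described in this entry


    # Protocatechuate (3,4-dihydroxybenzoate): hydroxyls on C3 and C4.
    PROTOCATECHUATE_OH = {3, 4}
    print(cleavage_mode(PROTOCATECHUATE_OH, (3, 4)))  # intradiol
    print(cleavage_mode(PROTOCATECHUATE_OH, (4, 5)))  # extradiol
    ```

    Cleaving protocatechuate at the 4,5 bond comes out as extradiol, consistent with the classification of the protocatechuate 4,5-dioxygenase LigAB as an extradiol enzyme.
    
    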

    \

    LigAB is a protocatechuate 4,5-dioxygenase that belongs to the extradiol class III enzyme family. The LigA subunit of this enzyme is multi-helical, containing a compact array of 6 short helices PUBMED:10467151.

    \ \ 8020 IPR012601 \

    This family consists of the spermatozoal protamines. Spermatozoal protamines play an important role in remodelling the sperm chromatin during mammalian spermiogenesis. Nuclear elongation and chromatin condensation are concomitant with modifications in the basic protein complement associated with DNA. Somatic histones are initially replaced by testis-specific histone variants, then by transition proteins, and ultimately by protamines PUBMED:12672123.

    \ 4468 IPR004865 \ The function of this domain is unknown. It is about 105 amino acid residues in length and is predicted to be predominantly alpha-helical. This domain is usually found at the amino terminus of proteins that contain a SAND domain. \ \ 6838 IPR010740 \

    This family consists of several mammalian endomucin proteins. Endomucin is an early endothelial-specific antigen that is also expressed on putative hematopoietic progenitor cells.

    \ 6704 IPR010682 \

    This family consists of several plant self-incompatibility response (SCRL) proteins. The male component of the self-incompatibility response in Brassica has been shown to be encoded by the S locus cysteine-rich gene (SCR). SCR is related, at the sequence level, to the pollen coat protein (PCP) gene family whose members encode small, cysteine-rich proteins located in the proteo-lipidic surface layer (tryphine) of Brassica pollen grains PUBMED:11437247.

    \ 7965 IPR012585 \

    This family consists of the anticodon nuclease activator proteins. Pre-existing host tRNAs are reprocessed during bacteriophage T4 infection of certain Escherichia coli strains. In this pathway, tRNA(Lys) is cleaved 5' to the wobble base by the anticodon nuclease and is later restored by polynucleotide kinase and RNA ligase reactions PUBMED:3280805.

    \ 3203 IPR004960 \ The bacterial lipid A biosynthesis protein, or lipid A biosynthesis (KDO)2-(lauroyl)-lipid IVA acyltransferase (EC 2.3.1.-), transfers myristate or laurate, activated on ACP, to the lipid IVA moiety of (KDO)2-(lauroyl)-lipid IVA during lipopolysaccharide core biosynthesis.\ 3175 IPR004864 \

    Different types of LEA (late embryogenesis abundant) proteins are expressed at different stages of late embryogenesis in higher plant seed embryos and under conditions of dehydration stress. The function of these proteins is unknown. This family represents a group of LEA proteins that appear to be distinct from those in other LEA families.

    \ \ 3236 IPR005538 \

    This family is uncharacterised. It contains the protein LrgA that has been hypothesised to export murein hydrolases PUBMED:8824633.

    \ 1944 IPR004296 \ This domain is located at the C-terminal region of a number of Caenorhabditis elegans proteins of unknown function.\ 5235 IPR008738 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
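    The naming scheme above can be decoded mechanically from the first character of an identifier. The sketch below is an illustration, not a MEROPS tool: the helpers `family_info` and `clan_type` are hypothetical, they accept only plain identifiers such as "C27" or "SR" (subfamily suffixes are not handled), and the nucleophile descriptions simply restate the rule given in the text.

    ```python
    import re
    from typing import Tuple

    # First character of a family or clan identifier -> catalytic type (from the text).
    CATALYTIC_TYPE = {
        "A": "aspartic",
        "C": "cysteine",
        "G": "glutamic",
        "M": "metallo",
        "S": "serine",
        "T": "threonine",
        "U": "unknown",
    }

    # Serine, threonine and cysteine peptidases attack with an amino-acid side chain
    # and form an acyl intermediate; aspartic, glutamic and metallopeptidases use an
    # activated water molecule.
    SIDE_CHAIN_NUCLEOPHILE = {"C", "S", "T"}
    WATER_NUCLEOPHILE = {"A", "G", "M"}


    def family_info(family_id: str) -> Tuple[str, str]:
        """Return (catalytic type, nucleophile) for an identifier such as 'C27'."""
        match = re.fullmatch(r"([ACGMSTU])(\d+)", family_id)
        if match is None:
            raise ValueError(f"unrecognised family identifier: {family_id!r}")
        letter = match.group(1)
        if letter in SIDE_CHAIN_NUCLEOPHILE:
            nucleophile = "amino-acid side chain (acyl intermediate)"
        elif letter in WATER_NUCLEOPHILE:
            nucleophile = "activated water"
        else:
            nucleophile = "unknown"
        return CATALYTIC_TYPE[letter], nucleophile


    def clan_type(clan_id: str) -> str:
        """Return the catalytic type of a two-letter clan such as 'CA' or 'SR'."""
        if re.fullmatch(r"[ACGMPSTU][A-Z]", clan_id) is None:
            raise ValueError(f"unrecognised clan identifier: {clan_id!r}")
        # Clans containing families of more than one catalytic type are type P.
        return "mixed" if clan_id[0] == "P" else CATALYTIC_TYPE[clan_id[0]]


    print(family_info("S60"))  # ('serine', 'amino-acid side chain (acyl intermediate)')
    print(clan_type("CA"))     # cysteine
    ```
    
    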

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C27 (clan CA). The type example is the rubella virus endopeptidase, which is required for processing of the rubella virus replication protein.

    \ 50 IPR006818 \ This family includes the yeast ASF1 protein, which derepresses transcriptionally silenced genes. The human ASF1 homologue has been found to possess histone chaperone activity, which may explain the derepressing function of this family PUBMED:10759893.\ 5975 IPR010373 \

    This is a family of uncharacterised prophage proteins that are also found in bacteria and humans.

    \ 3360 IPR007874 \ In Escherichia coli, FtsZ assembles into a Z ring at midcell, while assembly at polar sites is prevented by the min system. MinC, a component of this system, is an inhibitor of FtsZ assembly that is positioned within the cell by interaction with MinDE. MinC is an oligomer, probably a dimer PUBMED:10869074. The C-terminal half of MinC is the most conserved and interacts with MinD. The N-terminal half is thought to interact with FtsZ.\ 2183 IPR007552 \ This is a family of hypothetical prokaryotic proteins.\ 4784 IPR001732 \

    The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes that possess the ability to catalyze the NAD-dependent twofold oxidation of an alcohol to an acid without the release of an aldehyde intermediate PUBMED:2470755, PUBMED:9013585.
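    As a sketch of the stoichiometry implied by a twofold oxidation (one NAD+ consumed per two-electron step, two steps per primary alcohol oxidised to a carboxylate), the UDP-glucose dehydrogenase reaction can be written as follows, with the product shown as its carboxylate anion. GDP-mannose dehydrogenase follows the same pattern, with GDP-mannose as substrate and GDP-mannuronate as product.

    ```latex
    \text{UDP-}\alpha\text{-D-glucose} + 2\,\mathrm{NAD^{+}} + \mathrm{H_2O}
      \longrightarrow \text{UDP-}\alpha\text{-D-glucuronate} + 2\,\mathrm{NADH} + 3\,\mathrm{H^{+}}
    ```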

    \ \

    The enzymes have a wide range of functions. In plants, UDP-glucose dehydrogenase is an important enzyme in the synthesis of hemicellulose and pectin PUBMED:12031484, which are components of newly formed cell walls, while in zebrafish UDP-glucose dehydrogenase is required for cardiac valve formation PUBMED:11533493. In Xanthomonas campestris, a plant pathogen, UDP-glucose dehydrogenase is required for virulence PUBMED:11554764.

    \ \

    GDP-mannose dehydrogenase catalyzes the formation of GDP-mannuronic acid, which is the monomeric unit from which the exopolysaccharide alginate is formed. Alginate is secreted by a number of bacteria, including the pathogenic bacterium Pseudomonas aeruginosa and Azotobacter vinelandii. In Pseudomonas aeruginosa, alginate is believed to play an important role in the bacterium's resistance to antibiotics and to the host immune response PUBMED:12135385, while in Azotobacter vinelandii it is essential for the encystment process PUBMED:9864323.

    \ 6071 IPR009359 \

    Phenylacetate-CoA oxygenase is encoded by a five-gene complex and is responsible for the hydroxylation of phenylacetate-CoA (PA-CoA), the second catabolic step in phenylacetic acid (PA) degradation PUBMED:9600981, PUBMED:9748275. Although the exact function of this enzyme has not been determined, it has been shown to be required for phenylacetic acid degradation and has been proposed to function as a multicomponent oxygenase acting on phenylacetate-CoA PUBMED:9748275.

    \ 3131 IPR002494 \ High sulphur proteins are cysteine-rich proteins synthesized during the differentiation of hair matrix cells, and form hair fibers in association with hair keratin intermediate filaments PUBMED:9524245. This family has been divided up into four regions, with the second region containing 8 copies of a short repeat PUBMED:9524245. This family is also known as B2 or KAP1.\ 7145 IPR009926 \

    This family consists of several hypothetical YcgR proteins. YcgR may be involved in the flagellar motor function and may be a new member of the flagellar regulon PUBMED:11031114.

    \ 7721 IPR012459 \

    This family contains sequences from a number of hypothetical eukaryotic proteins of unknown function. The region featured is approximately 150 amino acids long.

    \ 4557 IPR008253 \

    The MARVEL domain is often found in lipid-associating proteins, such as occludin and MAL family proteins PUBMED:12468223. It may be part of the machinery of membrane apposition events, such as transport vesicle biogenesis.

    \ 2482 IPR002606 \ This family consists of part of the bifunctional enzyme riboflavin kinase/FAD synthetase. These enzymes have both ATP:riboflavin 5'-phosphotransferase and ATP:FMN adenylyltransferase activities PUBMED:3023344. They catalyse the 5'-phosphorylation of riboflavin to FMN and the adenylylation of FMN to FAD PUBMED:3023344. A domain has been identified in the N-terminal region that is well conserved in all the bacterial FAD synthetases. This domain has remote similarity to nucleotidyl transferases and, hence, it may be involved in the adenylylation reaction of FAD synthetases PUBMED:12517446.\ 1178 IPR003174 \

    Alpha-TIF (VP16) from Herpes Simplex virus is an essential tegument protein involved in the transcriptional activation of viral immediate early (IE) promoters (alpha genes) during the lytic phase of viral infection. VP16 associates with cellular transcription factors to enhance transcription rates, including the general transcription factor TFIIB and the transcriptional coactivator PC4. The N-terminal residues of VP16 confer specificity for the IE genes, while the C-terminal residues are responsible for transcriptional activation. Within the C-terminal region are two activation regions that can independently and cooperatively activate transcription PUBMED:15654739. VP16 forms a transcriptional regulatory complex with two cellular proteins, the POU-domain transcription factor Oct-1 and the cell-proliferation factor HCF-1 PUBMED:12826401. VP16 is an alpha/beta protein with an unusual fold. Other transcription factors may have a similar topology.

    \ \ 6807 IPR009724 \

    This family contains a number of eukaryotic proteins of unknown function that are approximately 160 residues long.

    \ 5709 IPR008431 \ This family consists of the eukaryotic protein 2',3'-cyclic nucleotide 3'-phosphodiesterase (CNP). CNP is one of the earliest myelin-related proteins expressed in differentiating oligodendrocytes and Schwann cells. CNP is abundant in the central nervous system and in oligodendrocytes. This protein is also found in mammalian photoreceptor cells, testis and lymphocytes. Although the biological function of CNP is unknown, it is thought to play a significant role in the formation of the myelin sheath, where it comprises 4% of total protein. CNP selectively cleaves 2',3'-cyclic nucleotides to produce 2'-nucleotides in vitro. Although physiologically relevant substrates with 2',3'-cyclic termini are still unknown, numerous cyclic phosphate-containing RNAs occur transiently within eukaryotic cells. Other known protein families capable of hydrolysing 2',3'-cyclic nucleotides include tRNA ligases and plant cyclic phosphodiesterases. The catalytic domains of all these proteins contain two tetrapeptide motifs, H-X-T/S-X, where X is usually a hydrophobic residue. Mutation of either histidine in CNP abolishes enzymatic activity PUBMED:11885989.\ 6263 IPR010494 \

    This entry represents a 31-residue repeat, present in several copies, that seems to be exclusive to Moraxella catarrhalis UspA proteins. The UspA1 and UspA2 proteins of M. catarrhalis are structurally related and are exposed on the bacterial cell surface, where they can function as adhesins PUBMED:10671460. This repeat is commonly found together with another conserved domain.

    \ 5233 IPR008688 \ The Fo sector of the ATP synthase is a membrane-bound complex which mediates proton transport. It is composed of nine different polypeptide subunits (a, b, c, d, e, f, g, F6, A6L) PUBMED:8011660.\ 4771 IPR004935 \ Tymoviruses are single-stranded RNA viruses. This family includes a protein of unknown function that has been named on the basis of its molecular weight. Tymoviruses such as ononis yellow mosaic tymovirus encode only three proteins, two of which overlap: this protein overlaps a larger ORF that is thought to encode the polymerase PUBMED:2800337. \ 3198 IPR001781 \

    Recently PUBMED:1970421, PUBMED:1467648 a number of proteins have been found to contain a conserved cysteine-rich domain of about 60 amino-acid residues. These proteins are:\

    \

    These proteins generally have two tandem copies of a domain, called LIM (for Lin-11 Isl-1 Mec-3), in their N-terminal section. Zyxin and paxillin are exceptions in that they contain three and four LIM domains, respectively, at their C-terminal extremity. In apterous, isl-1, LH-2, lin-11, lim-1 to lim-3, lmx-1, ceh-14 and mec-3 there is a homeobox domain some 50 to 95 amino acids after the LIM domains.

    \

    In the LIM domain, there are seven conserved cysteine residues and a histidine. The arrangement followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]. The LIM domain binds two zinc ions PUBMED:8506279. LIM does not bind DNA; rather, it seems to act as an interface for protein-protein interaction.
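    The consensus above uses PROSITE pattern syntax: x is any residue, [..] is any one of the listed residues, and (n) or (n,m) is a repeat count. A minimal sketch of scanning a sequence for it with Python's `re` module, assuming a hand-written converter: `prosite_to_regex` is a hypothetical helper that handles only the elements appearing in this pattern, not full PROSITE syntax (for example `{..}` exclusions or `<`/`>` anchors), and the demo sequence is synthetic, not a real protein.

    ```python
    import re


    def prosite_to_regex(pattern: str) -> str:
        """Convert a simple PROSITE pattern to a Python regular expression.

        Hypothetical helper: supports single residues, 'x' (any residue),
        '[..]' alternatives and '(n)' / '(n,m)' repeat counts only.
        """
        element_re = re.compile(r"([A-Z]|x|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")
        parts = []
        for element in pattern.split("-"):
            match = element_re.fullmatch(element)
            if match is None:
                raise ValueError(f"unsupported pattern element: {element!r}")
            residue, low, high = match.groups()
            token = "." if residue == "x" else residue
            if low is not None:
                token += f"{{{low}}}" if high is None else f"{{{low},{high}}}"
            parts.append(token)
        return "".join(parts)


    LIM_PATTERN = "C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]"
    LIM_REGEX = re.compile(prosite_to_regex(LIM_PATTERN))

    print(prosite_to_regex(LIM_PATTERN))
    # C.{2}C.{16,23}H.{2}[CH].{2}C.{2}C.{16,21}C.{2,3}[CHD]

    # Synthetic sequence built to satisfy the pattern (alanine as filler).
    demo = ("C" + "A" * 2 + "C" + "A" * 17 + "H" + "A" * 2 + "C" + "A" * 2
            + "C" + "A" * 2 + "C" + "A" * 17 + "C" + "A" * 2 + "D")
    print(LIM_REGEX.search(demo) is not None)  # True
    ```

    A pattern match only flags a candidate LIM domain; the zinc-binding and protein-interaction properties described above are not tested by a sequence scan.
    
    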

    \ 6079 IPR010425 \

    This is a family of uncharacterised proteins from proteobacteria.

    \ 2290 IPR007003 \ This family includes several uncharacterised archaeal proteins.\ 3037 IPR006723 \ This family includes a 69 kDa protein which has been identified as an islet cell autoantigen in type I diabetes mellitus PUBMED:8975715. Its precise function is unknown.\ 6030 IPR010401 \

    This family includes the human glycogen debranching enzyme, which has a number of distinct catalytic activities. In the yeast homologue, mutations in this region have been shown to disrupt the enzyme's amylo-alpha-1,6-glucosidase activity.

    \ 2567 IPR001389 \

    Yeast flocculation protein may be directly involved in the flocculation process PUBMED:7502576. The extensively O-glycosylated protein is probably attached to the membrane by a GPI-anchor.

    \ 7766 IPR012922 \

    The sequences featured in this family are similar to a probable integrase expressed by the SSV1 virus of the archaeon Sulfolobus shibatae. This protein may be necessary for the integration of the virus into the host genome by site-specific recombination PUBMED:1926776.

    \ 4410 IPR001477 \ The mumps virus SH protein is a membrane protein and is not essential for virus growth PUBMED:8918542. Its function is unknown.\ 889 IPR000917 \ Sulfatases are enzymes that hydrolyze various sulphate esters. The sequences of different types of sulphatases are available and have been shown to be structurally related PUBMED:2303452, PUBMED:2122463, PUBMED:2476654. They include arylsulphatase A (ASA), a lysosomal enzyme which hydrolyzes cerebroside sulphate; arylsulphatase B (ASB), which hydrolyzes the sulphate ester group from N-acetylgalactosamine 4-sulphate residues of dermatan sulphate; arylsulphatases D (ASD) and E (ASE); steryl-sulphatase (STS), a membrane-bound microsomal enzyme which hydrolyzes 3-beta-hydroxy steroid sulphates; iduronate 2-sulphatase precursor (IDS), a lysosomal enzyme that hydrolyzes the 2-sulphate groups from non-reducing-terminal iduronic acid residues in dermatan sulphate and heparan sulphate; N-acetylgalactosamine-6-sulphatase, which hydrolyzes the 6-sulphate groups of the N-acetyl-D-galactosamine 6-sulphate units of chondroitin sulphate and the D-galactose 6-sulphate units of keratan sulphate; glucosamine-6-sulphatase (G6S), which hydrolyzes the N-acetyl-D-glucosamine 6-sulphate units of heparan sulphate and keratan sulphate; N-sulphoglucosamine sulphohydrolase (sulphamidase), the lysosomal enzyme that catalyzes the hydrolysis of N-sulpho-D-glucosamine into glucosamine and sulphate; sea urchin embryo arylsulphatase; green algae arylsulphatase, which plays an important role in the mineralization of sulphates; and arylsulphatase from Escherichia coli (gene aslA), Klebsiella aerogenes (gene atsA) and Pseudomonas aeruginosa (gene atsA).\ 1457 IPR002609 \ This family consists of various caulimovirus viroplasmin proteins. The viroplasmin protein is encoded by gene VI and is the main component of viral inclusion bodies, or viroplasms PUBMED:2402462. Inclusions are the site of viral assembly, DNA synthesis and accumulation PUBMED:2402462. Two domains exist within gene VI, corresponding approximately to the 5' third and middle third of the gene; these influence systemic infection in a light-dependent manner PUBMED:8372449.\ 2808 IPR000173 \

    Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) plays an important role in glycolysis and gluconeogenesis PUBMED:2716055 by reversibly catalysing the oxidation and phosphorylation of D-glyceraldehyde-3-phosphate to 1,3-diphosphoglycerate. The enzyme exists as a tetramer of identical subunits, each containing 2 conserved functional domains: an NAD-binding domain, and a highly conserved catalytic domain PUBMED:6303388. The enzyme has been found to bind to actin and tropomyosin, and may thus have a role in cytoskeleton assembly. Alternatively, the cytoskeleton may provide a framework for precise positioning of the glycolytic enzymes, thus permitting efficient passage of metabolites from enzyme to enzyme PUBMED:6303388.
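    For reference, the reversible oxidation and phosphorylation step described above can be written as the following balanced equation (standard textbook form, with inorganic phosphate as P_i; 1,3-bisphosphoglycerate is the same compound as 1,3-diphosphoglycerate). The NAD+ reduced here is the cofactor bound by the NAD-binding domain.

    ```latex
    \text{D-glyceraldehyde 3-phosphate} + \mathrm{P_i} + \mathrm{NAD^{+}}
      \rightleftharpoons \text{1,3-bisphosphoglycerate} + \mathrm{NADH} + \mathrm{H^{+}}
    ```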

    \

    GAPDH displays diverse non-glycolytic functions as well, its role depending upon its subcellular location. For instance, the translocation of GAPDH to the nucleus acts as a signalling mechanism for programmed cell death, or apoptosis PUBMED:10740219. The accumulation of GAPDH within the nucleus is involved in the induction of apoptosis, where GAPDH functions in the activation of transcription. The presence of GAPDH is associated with the synthesis of pro-apoptotic proteins like BAX, c-JUN and GAPDH itself.

    \

    GAPDH has been implicated in certain neurological diseases: GAPDH is able to bind to the gene products from neurodegenerative disorders such as Huntington’s disease, Alzheimer’s disease, Parkinson’s disease and Machado-Joseph disease through stretches encoded by their CAG repeats. Abnormal neuronal apoptosis is associated with these diseases. Propargylamines such as deprenyl increase neuronal survival by interfering with apoptosis signalling pathways via their binding to GAPDH, which decreases the synthesis of pro-apoptotic proteins PUBMED:12721812.

    \ \ \ 4369 IPR006127 \

    This is a family of periplasmic solute-binding proteins, such as TroA, which interacts with an ATP-binding cassette transport system in Treponema pallidum and plays a role in the transport of zinc across the cytoplasmic membrane of the bacterium.

    \ 7593 IPR011676 \ The proteins of this entry are mainly hypothetical proteins expressed by Oryza sativa.\ 5761 IPR009230 \

    This family consists of fungus-specific ATP synthase protein 8 sequences. The family may be related to the ATP synthase protein 8 found in other eukaryotes.

    \ 1216 IPR007240 \ Apg17 is required for activating the Apg1 protein kinase.\ 534 IPR001196 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    L15 is one of the proteins of the large ribosomal subunit.\ In Escherichia coli, L15 is known to bind the 23S rRNA. Ribosomal protein L15 from bacteria and the nuclear-encoded L15 of plant chloroplasts belong to this family. Vertebrate L27a, Tetrahymena thermophila L29 and fungal L27a (L29, CRP-1, CYH2) are also members of this group PUBMED:.

    \

    Ribosomal protein L18e from a number of archaebacteria shows homology to both eukaryotic L18 and eubacterial ribosomal protein L15, an observation that has been taken to support the view that archaea represent an evolutionary stage between bacteria and eukaryotes PUBMED:10527834.

    \ 1136 IPR001710 \

    Adrenomedullin is a hypotensive peptide, first identified in a human pheochromocytoma, a tumour arising from the adrenal medulla PUBMED:8387282, PUBMED:7690563. The precursor protein is ~185 amino acids in length and includes a 21-residue putative N-terminal signal sequence PUBMED:7690563. The active peptide, which is expressed in adrenal glands, lung, kidney, heart, spleen, duodenum and submandibular glands, is thought to function as a hormone in circulation control.

    \

    The adrenomedullin precursor is believed to contain 2 cleavage sites, one of which produces the active adrenomedullin hormone, and the other, a 20-residue peptide (proadrenomedullin N-terminal 20 peptide, or proam-n20) of unknown function PUBMED:7688224.

    \ 3740 IPR011765 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
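As an illustration of the stricter abXHEbbHbc definition, the motif can be written as a regular expression and scanned across a protein sequence. This is a minimal sketch, not a published pattern: the residue classes chosen for 'a', 'b' and 'c' are assumptions (the text only says 'a' is most often Val or Thr, so restricting it to those two is a simplification), and `find_hexxh_sites` is a hypothetical helper name. Proline is excluded from every variable position, following the observation that it never occurs in this site.

```python
import re

# Assumed residue classes (illustrative only, not an official definition).
A_CLASS = "VT"                    # 'a': simplified to Val or Thr only
B_CLASS = "ACFGILMNQSTVWY"        # 'b': uncharged; excludes D, E, H, K, R and Pro
C_CLASS = "AFILMVWY"              # 'c': hydrophobic
X_CLASS = "ACDEFGHIKLMNQRSTVWY"   # 'X': any standard residue except Pro

# Positions: a b X H E b b H b c
ZINCIN_MOTIF = re.compile(
    f"[{A_CLASS}][{B_CLASS}][{X_CLASS}]HE[{B_CLASS}][{B_CLASS}]H[{B_CLASS}][{C_CLASS}]"
)


def find_hexxh_sites(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched 10-mer) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in ZINCIN_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(find_hexxh_sites("MKTVAAHELGHALKR"))
```

Because `finditer` returns non-overlapping matches, two candidate sites that share residues would be reported only once; a lookahead pattern would be needed to catch such overlaps.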

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
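The naming scheme described above can be made concrete with a small lookup that reads the catalytic type from the first character of a MEROPS family or clan identifier (for example M16, C57, ME or MK, all of which appear in this document). The function names and the returned strings are illustrative assumptions, not part of MEROPS itself; the grouping by nucleophile simply restates the last two sentences of the paragraph.

```python
# Catalytic-type letters as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan with families of more than one type)",
}

# Serine, threonine and cysteine peptidases use part of an amino acid as the
# nucleophile; aspartic, glutamic and metallopeptidases use activated water.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a family/clan ID."""
    letter = identifier.strip()[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type in {identifier!r}")
    return CATALYTIC_TYPES[letter]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the catalytic type, where the text defines one."""
    catalytic_type(identifier)  # raises ValueError for an unknown letter
    letter = identifier.strip()[:1].upper()
    if letter in RESIDUE_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not defined"


if __name__ == "__main__":
    for ident in ("M16", "ME", "C57", "MK"):
        print(ident, catalytic_type(ident), nucleophile(ident), sep=" | ")
```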

    \ \

    The majority of the sequences in this entry are metallopeptidases and non-peptidase homologues belonging to MEROPS peptidase family M16 (clan ME), subfamilies M16A, M16B and M16C; they include:

    \ \ \

    These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal section. This region includes a conserved histidine followed, after two intervening residues, by a glutamate and another histidine. In pitrilysin, it has been shown PUBMED:7990931 that this H-x-x-E-H motif is involved in enzymatic activity; the two histidines bind zinc and the glutamate is necessary for catalytic activity.\ \ The proteins classified as non-peptidase homologues either have been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for the catalytic activity.
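The H-x-x-E-H motif described for pitrilysin can be located with a short pattern search. In this minimal sketch, `hxxeh_sites` is a hypothetical helper: it only finds the sequence pattern and labels the residue roles as stated above (two zinc-binding histidines and a catalytic glutamate). A match alone does not show that a protein is an active peptidase, since some non-peptidase homologues have been found experimentally to lack activity.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
HXXEH = re.compile(r"(?=(H[A-Z]{2}EH))")


def hxxeh_sites(seq: str) -> list[dict]:
    """Return residue roles (1-based positions) for each H-x-x-E-H occurrence."""
    sites = []
    for m in HXXEH.finditer(seq.upper()):
        first_his = m.start() + 1  # 1-based position of the first His
        sites.append({
            "zinc_his": (first_his, first_his + 4),  # the two zinc-binding His
            "catalytic_glu": first_his + 3,          # Glu needed for catalysis
        })
    return sites


if __name__ == "__main__":
    # Toy sequence, not a real pitrilysin fragment.
    print(hxxeh_sites("MAGHFLEHRK"))
```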

    \ 4660 IPR004905 \ This family represents the Tombusvirus P19 core protein.\ 4513 IPR006811 \ The yeast Ssu72 is an essential protein that may be involved in transcription start site specification PUBMED:8657130.\ 596 IPR002717 \

    Moz is a monocytic leukemia zinc finger protein, and the SAS protein from Saccharomyces cerevisiae is involved in silencing the HMR locus. These proteins were reported to be homologous to acetyltransferases PUBMED:8607265, but this similarity is not supported by standard sequence analysis.

    \ 468 IPR001650 \

    The domain that defines this group of proteins is found in a wide variety of helicases and helicase-related proteins. It may not be an autonomously folding unit, but rather an integral part of the helicase.

    \ 3426 IPR005779 \

    This model describes N5-methyltetrahydromethanopterin:coenzyme M methyltransferase subunit D in methanogenic archaea. This methyltransferase is a membrane-associated enzyme complex that uses a methyl-transfer reaction to drive a sodium-ion pump. \ \ Archaea have evolved energy-yielding pathways marked by one-carbon biochemistry featuring novel cofactors and enzymes. This transferase (encoded by subunit A) is involved in the transfer of the methyl group from N5-methyltetrahydromethanopterin to coenzyme M. In an accompanying reaction, methane is produced by two-electron reduction of methyl-coenzyme M by another enzyme, methyl-coenzyme M reductase.

    \ \ 7633 IPR012891 \

    This domain is found in proteins carrying other domains known to be involved in intracellular signalling pathways, indicating that it might also be involved in these pathways. It has 4 highly conserved cysteine residues, suggesting that it can bind zinc ions. Moreover, it is found repeated in some members of this family; this may indicate that these domains are able to interact with one another, raising the possibility that this domain mediates heterodimerisation.

    \ 33 IPR001395 \

    The aldo-keto reductase family includes a number of related monomeric NADPH-dependent oxidoreductases, such as aldehyde reductase, aldose reductase, prostaglandin F synthase, xylose reductase, rho crystallin, and many others PUBMED:2498333. All possess a similar structure, with a beta-alpha-beta fold characteristic of nucleotide binding proteins PUBMED:2105951.\ The fold comprises a parallel beta-8/alpha-8-barrel, which contains a novel NADP-binding motif. The binding site is located in a large, deep, elliptical pocket in the C-terminal end of the beta sheet, the substrate being bound in an extended conformation. The hydrophobic nature of the pocket favours aromatic and apolar substrates over highly polar ones PUBMED:1621098.

    Binding of the NADPH coenzyme causes a massive conformational change that reorients a loop, effectively locking the coenzyme in place. This binding is more similar to that of FAD- than of NAD(P)-binding oxidoreductases PUBMED:1447221.

    \

    Some proteins of this entry contain a K+ ion channel beta chain regulatory domain; these are reported to have oxidoreductase activity PUBMED:10884227.

    \ 6648 IPR009640 \

    This entry represents the C terminus of a prophage tail fibre protein found mostly in Escherichia coli. This domain is found together with a conserved RLGP motif.

    \ 79 IPR003759 \

    Cobalamin-dependent methionine synthase is a large modular protein that catalyses methyl transfer from methyltetrahydrofolate (CH3-H4folate) to homocysteine. During the catalytic cycle, it supports three distinct methyl transfer reactions, each involving the cobalamin (vitamin B12) cofactor and a substrate bound to its own functional unit PUBMED:11731805. The cobalamin cofactor plays an essential role in this reaction, accepting the methyl group from CH3-H4folate to form methylcob(III)alamin, and in turn donating the methyl group to homocysteine to generate methionine and cob(I)alamin.

    \

    Methionine synthase is a large enzyme composed of four structurally and functionally distinct modules: the first two modules bind homocysteine and CH3-H4folate, the third module binds the cobalamin cofactor and the C-terminal module binds S-adenosylmethionine. The cobalamin-binding module is composed of two structurally distinct domains: a 4-helical bundle cap domain (residues 651-740 in the Escherichia coli enzyme) and an alpha/beta B12-binding domain (residues 741-896). The 4-helical bundle forms a cap over the alpha/beta domain, which acts to shield the methyl ligand of cobalamin from solvent PUBMED:8939751. Furthermore, in the conversion to the active conformation of this enzyme, the 4-helical cap rotates to allow the cobalamin cofactor to bind the activation domain. The alpha/beta domain is a common cobalamin-binding motif, whereas the 4-helical bundle domain with its methyl cap is a distinctive feature of methionine synthases.

    \

    This entry represents the 4-helical bundle cap domain. This domain is also present in other shorter proteins that bind to B12, and is always found N-terminal to the alpha/beta B12-binding domain.

    \ 4917 IPR003842 \

    Helicobacter pylori is a microaerophilic bacterium with the extraordinary ability to establish infections in human stomachs that can last for years or decades, despite immune and inflammatory responses and normal turnover of the gastric epithelium and the overlying mucin layer in which it resides. Most H. pylori strains secrete a toxin (VacA) that induces multiple structural and functional alterations in eukaryotic cells. The most prominent effect of VacA is its capacity to induce the formation of large cytoplasmic vacuoles in eukaryotic cells. In addition, VacA interferes with the process of antigen presentation, increases the permeability of polarised epithelial cell monolayers, and forms anion-selective membrane channels. \ Formation of channels in endosomal membranes may be an important feature of the mechanism by which VacA induces cell vacuolation. H. pylori vacA encodes a ~139 kDa protoxin, which undergoes cleavage of a 33-residue N-terminal signal sequence and C-terminal proteolytic processing to yield a mature secreted toxin. Purified VacA degrades during prolonged storage into two fragments (of ~34 and ~58 kDa), which are derived from the N- and C-terminus of the toxin, respectively. The experimentally determined mass of the intact toxin (~88.2 kDa) corresponds closely to the sum of the masses of the two proteolytic fragments PUBMED:11160018.\

    \ Secondary structure predictions suggest that a 35 kDa portion of the VacA C-terminal domain is rich in amphipathic beta-sheets, and this region exhibits low-level similarity to members of the autotransporter protein family. In addition, at the C-terminus of VacA there is a phenylalanine-containing motif that is commonly found in autotransporter proteins, as well as in numerous Gram-negative bacterial outer membrane proteins. An intact N-terminal portion of VacA is not required for proteolytic processing of the protoxin. However, the N-terminal 32 amino acids of the mature VacA are predicted to form the only contiguous hydrophobic region in the protein that is long enough to span the membrane. Moreover, isogenic H. pylori mutant strains in which the C-terminal VacA domain is disrupted fail to express or secrete any detectable VacA, which is probably attributable to the degradation of export-incompetent toxin precursors within the periplasm. It has been speculated that the VacA protoxin may undergo proteolytic cleavage at multiple sites downstream of amino acid 854, which would yield a 33 kDa cell-associated domain as well as a fragment of ~15 kDa PUBMED:11160018.\ \

    \ 471 IPR012312 \

    Iterative searches with the HHE family PUBMED:9188702 found it to be related to hemerythrin. They also demonstrated that what has been described as a single domain PUBMED:11513618 in fact consists of two cation-binding domains. Members of this family occur throughout nature and are involved in a variety of processes. For instance, a member of this family in Nereis diversicolor binds cadmium so as to protect the organism from toxicity PUBMED:12625841. However, hemerythrin is classically described as oxygen-binding through two attached Fe2+ ions, and a bacterial member acts as a regulator of the response to NO, which suggests yet another set-up for its metal ligands PUBMED:678527. In Staphylococcus aureus, a member of this family has been noted to be important when the organism switches to living in environments with low oxygen concentrations PUBMED:678527; perhaps this protein acts as an oxygen store or scavenger.

    \ 7163 IPR009939 \

    This family consists of several fungal chitosanase proteins. Chitin, xylan, 6-O-sulphated chitosan and O-carboxymethyl chitin are indigestible by chitosanase PUBMED:11115392.

    \ 2667 IPR002003 \ Gas vesicles are small, hollow, gas-filled protein structures found in several cyanobacterial and archaebacterial microorganisms PUBMED:2513809. They allow the organism to position itself at a depth favorable for growth. Gas vesicles are hollow cylindrical tubes, closed by a hollow, conical cap at each end.\ Both the conical end caps and the central cylinder are made up of 4-5 nm wide ribs that run at right angles to the long axis of the structure. Gas vesicles appear to be composed of two different protein components: GVPa and GVPc.\ GVPc is a minor constituent of gas vesicles and seems to be located on the outer surface. Structurally, cyanobacterial GVPc consists of four or five tandem repeats of a 33-residue sequence flanked by sequences of 18 and 10 residues at the N- and C-termini, respectively.\ 3634 IPR007814 \ This family includes proteins such as PaaA and PaaC that are part of a catabolic pathway for phenylacetic acid PUBMED:9748275. These proteins may form part of a dioxygenase complex.\ 7323 IPR011123 \

    This region is mostly found at the end of the beta propellers in a family of two-component regulators. However, it is also found tandemly repeated in proteins without other signal transduction domains. It is named after the conserved tyrosines found in the alignment. The exact function is not known.

    \ 4322 IPR005573 \

    Sigma-E is important for the induction of proteins involved in the heat shock response. RseA binds sigma-E via its N-terminal domain, sequestering sigma-E and preventing transcription from heat-shock promoters PUBMED:9159523. The C-terminal domain is located in the periplasm and may interact with other proteins that signal periplasmic stress.

    \ 3447 IPR000432 \

    This is the C-terminal domain of proteins in the MutS family of DNA mismatch repair proteins; it is found in association with the MutS III domain and the MutS N-terminal region.\ Yeast MSH3, bacterial proteins involved in DNA mismatch repair and the predicted protein product of the mouse Rep-3 gene share extensive sequence similarity. \ This family of proteins is named after the Salmonella typhimurium MutS protein, which is involved in replication repair and plays a role in preventing recombination between non-identical sequences PUBMED:8510668. \ Human MSH has been implicated in hereditary non-polyposis colorectal carcinoma (HNPCC) and is a mismatch-binding protein.\ Mismatch repair contributes to the overall fidelity of DNA replication PUBMED:3304141. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The sequences of some proteins involved in mismatch repair in different organisms have been found to be evolutionarily related PUBMED:1651234, PUBMED:8510668. \ A region rich in glycine and negatively charged residues is found in the C-terminal section of these proteins, about 80 residues C-terminal to an ATP-binding site.

    \ 4129 IPR000625 \ REV is a viral anti-repression trans-activator protein, which appears to act post-transcriptionally to relieve negative repression of GAG and ENV production. It is a phosphoprotein whose state of phosphorylation is mediated by a specific serine kinase activity present in the nucleus. REV accumulates in the nucleoli.\ 5470 IPR008845 \ This family consists of several Theileria P67 surface antigens. A stage-specific surface antigen of Theileria parva, p67, is the basis for the development of an anti-sporozoite vaccine for the control of East Coast fever (ECF) in Bos taurus. The antigen has been shown to contain five distinct linear peptide sequences recognised by sporozoite-neutralising murine monoclonal antibodies PUBMED:10024569.\ 3733 IPR004970 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins that are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This is a group of cysteine peptidases which constitute MEROPS peptidase family C57 (clan CE). The type example is vaccinia virus I7 processing peptidase (vaccinia virus); protein I7 is expressed in the late phase of infection PUBMED:2835495.

    \ 5642 IPR008561 \ This family consists of several unidentified baculovirus proteins of around 85 residues in length with no known function.\ 692 IPR000905 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M22 (clan MK). The type example is O-sialoglycoprotein endopeptidase from Pasteurella haemolytica (now Mannheimia haemolytica).

    \ \

    O-Sialoglycoprotein endopeptidase is secreted by the bacterium Mannheimia haemolytica and digests only proteins that are heavily sialylated, in particular those with sialylated serine and threonine residues PUBMED:7674959.\ Substrate proteins include glycophorin A and the leukocyte surface antigens CD34, CD43, CD44 and CD45 PUBMED:7674922, PUBMED:7674959. Removal of sialic acid residues by treatment with neuraminidase completely abolishes susceptibility to O-sialoglycoprotein endopeptidase digestion PUBMED:7674922, PUBMED:7674959.

    \ \

    Sequence similarity searches have revealed other members of the M22 family, from yeast, Mycobacterium, Haemophilus influenzae and the cyanobacterium Synechocystis PUBMED:7674922. The zinc-binding and catalytic residues of this family have not been determined, although the motif HMEGH may be a zinc-binding region PUBMED:7674922.

    \ 5383 IPR008481 \ This family consists of several uncharacterised proteins from the bacterium Coxiella burnetii. C. burnetii is the causative agent of Q fever.\ 8063 IPR013260 \

    Proteins in this entry are involved in cell cycle progression and pre-mRNA splicing PUBMED:12384582, PUBMED:11102353.

    \ 3466 IPR001433 \

    Bacterial ferredoxin-NADP+ reductase may be bound to the thylakoid membrane or anchored to the thylakoid-bound phycobilisomes.\ Chloroplast ferredoxin-NADP+ reductase may play a key role in regulating the relative amounts of cyclic and non-cyclic electron flow to meet the demands of the plant for ATP and reducing power. It is involved in the final step of the linear photosynthetic electron transport chain and has also been implicated in cyclic electron flow around photosystem I, where its role would be to return electrons from ferredoxin to the cytochrome b6f complex.

    \ \

    This domain is present in a variety of proteins, including bacterial flavohemoprotein, mammalian NADH-cytochrome b5 reductase, eukaryotic NADPH-cytochrome P450 reductase, plant nitrate reductase, nitric-oxide synthase, bacterial vanillate demethylase and others.

    \ 7660 IPR012498 \

    Alpha-A conotoxin PIVA is the major paralytic toxin found in the venom produced by the piscivorous snail Conus purpurascens. This peptide acts by blocking the acetylcholine-binding site of the nicotinic acetylcholine receptor at the neuromuscular junction PUBMED:7673220. The overall shape of the peptide is described as an "iron", with a highly charged hydrophilic loop formed by residues 15S-19R making up the "handle" domain, which is exposed to the exterior of the protein. The stability of the conotoxin is primarily governed by three disulphide bonds. A triangular structural motif formed by residues 19R, 12H and 6Y is thought to constitute a "binding core" that is important in binding to the acetylcholine receptor PUBMED:9048550.

    \ 7691 IPR012421 \

    This family is found at the C-terminus of the Tropheryma whipplei WisP family proteins PUBMED:12606174.

    \ 6943 IPR009802 \

    This family consists of several hypothetical bacterial proteins of around 110 residues in length. The function of this family is unknown but members seem to be specific to Borrelia burgdorferi (Lyme disease spirochete).

    \ 2546 IPR003680 \

    This family consists of a domain with a flavodoxin-like fold. The family includes bacterial and eukaryotic NAD(P)H dehydrogenase (quinone). These enzymes catalyse the NAD(P)H-dependent two-electron reduction of quinones and protect cells against damage by free radicals and reactive oxygen species PUBMED:2168383. They use an FAD cofactor. The equation for this reaction is NAD(P)H + acceptor = NAD(P)(+) + reduced acceptor. NAD(P)H dehydrogenase (quinone) is also involved in the bioactivation of prodrugs used in chemotherapy PUBMED:2168383. The family also includes acyl carrier protein phosphodiesterase, which converts holo-ACP to apo-ACP by hydrolytic cleavage of the phosphopantetheine residue from ACP PUBMED:7568029. This family is related to FMN_red and Flavodoxin_1.

    \ 1115 IPR000736 \ Hexon is the major coat protein of adenovirus type 2 and is synthesised during late infection.\ It forms a homotrimer. The 240 copies of the hexon trimer are organised so that 12 lie on each of the 20 facets. The central 9 hexons in a facet are cemented together by 12 copies of polypeptide IX. The penton complex, formed by the peripentonal hexons and the penton base (which holds a fibre in place), lies at each of the 12 vertices PUBMED:7932702.\ 6871 IPR009760 \

    This family consists of several hypothetical bacterial proteins of around 50 residues in length. The function of this family is unknown.

    \ 181 IPR006768 \

    This group of sequences contains a conserved C-terminal domain that is found in the Schizosaccharomyces pombe protein CwfJ. CwfJ is part of the Cdc5p complex involved in mRNA splicing PUBMED:11884590. This domain is found in association with another domain, which is generally C-terminal and adjacent to this domain.

    \ \ \ 3884 IPR001896 \ This family of membrane/coat proteins is found in a number of different ssRNA plant virus families, including potexviruses, hordeiviruses and carlaviruses.\ 6070 IPR010421 \

    This is a family of uncharacterised proteins found in Proteobacteria.

    \ 2810 IPR007245 \ GPI (glycosyl phosphatidyl inositol) transamidase is a multiprotein complex. Gpi16, Gpi8 and Gaa1 form a sub-complex of the GPI transamidase. GPI transamidase adds glycosylphosphatidylinositols (GPIs) to newly synthesized proteins. Gpi16 is an essential N-glycosylated transmembrane glycoprotein. Gpi16 is largely found on the lumenal side of the ER. It has a single C-terminal transmembrane domain and a small C-terminal cytosolic extension with an ER retrieval motif PUBMED:11598210.\ 3390 IPR006833 \

    Ammonia monooxygenase and particulate methane monooxygenase are both integral membrane proteins, occurring in ammonia oxidisers and methanotrophs respectively, and are thought to be evolutionarily related PUBMED:7590173. These enzymes have a relatively wide substrate specificity and can catalyse the oxidation of a range of substrates, including ammonia, methane, halogenated hydrocarbons and aromatic molecules PUBMED:12209257. These enzymes are composed of 3 subunits (A, B and C) and contain various metal centres, including copper. Particulate methane monooxygenase from Methylococcus capsulatus (Bath) is an ABC homotrimer, which contains mononuclear and dinuclear copper metal centres, and a third metal centre containing a metal ion whose identity in vivo is not certain PUBMED:15674245.

    \

    The soluble regions of particulate methane monooxygenase from Methylococcus capsulatus (Bath) derive primarily from the B subunit. This subunit forms two antiparallel beta sheets and contains the mononuclear and dinuclear copper metal centres PUBMED:15674245.

    \ 4012 IPR003434 \ This family consists of a conserved probable envelope protein (ORF2) of porcine reproductive and respiratory syndrome virus (PRRSV). Also in the family is a minor structural protein from lactate dehydrogenase-elevating virus.\ 7227 IPR009978 \

    This family consists of several hypothetical bacterial proteins of around 440 residues in length. The function of this family is unknown.

    \ 6344 IPR009486 \

    This family consists of several purine nucleoside permeases from both bacteria and fungi PUBMED:9802205.

    \ 6752 IPR009694 \

    This family consists of several hypothetical enterobacterial proteins of around 170 residues in length. Members of this family are found in Escherichia coli, Salmonella typhimurium and Shigella species. The function of this family is unknown.

    \ 2931 IPR004958 \ This is a family of Herpes virus UL4 proteins, which are related to HSV-1, HSV-2, EHV-1 58, and VZV 56 proteins.\ 7053 IPR009866 \

    This family contains human NADH-ubiquinone oxidoreductase subunit NDUFB4 and related sequences.

    \ 1933 IPR003848 \

    This domain of unknown function is found in several uncharacterized proteins.

    \ 7129 IPR009916 \

    This family consists of several hypothetical bacterial proteins of around 150 residues in length. The function of this family is unknown. Members of this family seem to be found exclusively in the order Bacillales.

    \ 6769 IPR010708 \

    This family consists of several 5'-nucleotidase, deoxy (pyrimidine), cytosolic type C (NT5C) proteins. 5'(3')-Deoxyribonucleotidase is a ubiquitous enzyme in mammalian cells whose physiological function is not known PUBMED:10681516.

    \ 754 IPR006970 \

    This short repeat is composed of the tetrapeptide XPTX. It is found in a variety of proteins; however, it is not clear whether these repeats are homologous to each other.
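As an illustration only, the XPTX motif can be scanned for with a regular expression. The sketch below assumes that X stands for any one-letter residue code (A to Z) and that overlapping occurrences should all be reported; `find_xptx` is a hypothetical helper, not part of any InterPro tool.

```python
import re

# Hypothetical helper: X is assumed to be any one-letter residue code.
# The zero-width lookahead lets overlapping tetrapeptides be reported.
XPTX = re.compile(r"(?=([A-Z]PT[A-Z]))")


def find_xptx(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, tetrapeptide) for every XPTX occurrence."""
    return [(m.start(), m.group(1)) for m in XPTX.finditer(sequence.upper())]


# Toy sequence, not a real protein.
print(find_xptx("APTPTA"))  # [(0, 'APTP'), (2, 'TPTA')]
```

The two hits share residues; a plain consuming pattern such as `[A-Z]PT[A-Z]` would report only the first one.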

    \ 2728 IPR001360 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 1 comprises enzymes with a number of known activities: beta-glucosidase, beta-galactosidase, 6-phospho-beta-galactosidase, 6-phospho-beta-glucosidase, lactase-phlorizin hydrolase, beta-mannosidase and myrosinase.

    \ 566 IPR007145 \ This is a family of microtubule-associated proteins. One of its members is the yeast anaphase spindle elongation protein.\ 5528 IPR008409 \ This family consists of several eukaryotic sequences of unknown function. The mammalian members of this family are annotated as breast carcinoma amplified sequence 2 (BCAS2) proteins PUBMED:12169396. BCAS2 is a putative spliceosome-associated protein PUBMED:9731529.\ 4593 IPR004980 \

    This is a non-structural protein found in members of the genus Tenuivirus.

    \ 5364 IPR008709 \ This family contains several eukaryotic neurochondrin proteins. Neurochondrin induces hydroxyapatite resorptive activity in bone marrow cells resistant to bafilomycin A1, an inhibitor of macrophage- and osteoclast-mediated resorption. Expression of the gene is localised to chondrocytes, osteoblasts and osteocytes in bone, and to the hippocampus and the Purkinje cell layer of the cerebellum in the brain PUBMED:10231559.\ 2636 IPR001019 \

    Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma PUBMED:14762218. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation PUBMED:15294442. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor and to dissociate into an active, GTP-bound alpha subunit and a beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C and ion channels. These effectors in turn regulate the intracellular concentrations of second messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal PUBMED:15119945. The length of the G protein signal is controlled by how long the alpha subunit remains GTP-bound, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications PUBMED:11313912.
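The activation cycle described above can be summarised as a two-state machine. This is a deliberately simplified sketch: it collapses receptor binding, GDP/GTP exchange and subunit dissociation into one `activate` event, and GTP hydrolysis plus re-association into one `hydrolyse` event. The state and event names are illustrative, not standard nomenclature.

```python
from enum import Enum


class GProteinState(Enum):
    # Inactive heterotrimer: GDP-bound alpha subunit associated with beta/gamma.
    INACTIVE_TRIMER = "alpha(GDP)-beta/gamma"
    # Signalling: GTP-bound alpha subunit and free beta/gamma dimer.
    ACTIVE_DISSOCIATED = "alpha(GTP) + beta/gamma"


# Event -> (state in which the event can occur, resulting state).
TRANSITIONS = {
    # Activated receptor promotes GDP/GTP exchange; the subunits dissociate.
    "activate": (GProteinState.INACTIVE_TRIMER, GProteinState.ACTIVE_DISSOCIATED),
    # GTP is hydrolysed to GDP; alpha and beta/gamma re-associate.
    "hydrolyse": (GProteinState.ACTIVE_DISSOCIATED, GProteinState.INACTIVE_TRIMER),
}


def step(state: GProteinState, event: str) -> GProteinState:
    """Apply one event, rejecting events that are invalid in the current state."""
    required, result = TRANSITIONS[event]
    if state is not required:
        raise ValueError(f"{event!r} cannot occur in state {state.name}")
    return result


def run(events: list[str]) -> list[GProteinState]:
    """Start from the inactive trimer and record every state visited."""
    state = GProteinState.INACTIVE_TRIMER
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history


print([s.name for s in run(["activate", "hydrolyse"])])
# ['INACTIVE_TRIMER', 'ACTIVE_DISSOCIATED', 'INACTIVE_TRIMER']
```

Timing is not modelled here; RGS proteins, which regulate how long the alpha subunit stays GTP-bound, would act on the `hydrolyse` transition.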

    \

    There are several isoforms of each subunit, many of which have splice variants, which together can make up hundreds of combinations of G proteins. The specific combination of subunits in heterotrimeric G proteins affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli PUBMED:9278091, PUBMED:11882385. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions.

    \ \

    This family consists of the G protein alpha subunit, which acts as a weak GTPase. G protein classes are defined based on the sequence and function of their alpha subunits, which in mammals fall into four main categories: G(S)alpha, G(Q)alpha, G(I)alpha and G(12)alpha; there are also fungal and plant classes of alpha subunits. The alpha subunit consists of two domains: a GTP-binding domain and a helical insertion domain. The GTP-binding domain is homologous to Ras-like small GTPases, and includes switch regions I and II, which change conformation during activation. The switch regions are loops of alpha-helices with conformations sensitive to guanine nucleotides. The helical insertion domain is inserted into the GTP-binding domain before switch region I and is unique to heterotrimeric G proteins. This helical insertion domain functions to sequester the guanine nucleotide at the interface with the GTP-binding domain and must be displaced to enable nucleotide dissociation.

    \ 6356 IPR009493 \

    This family consists of several phage and bacterial proteins which are closely related to the GpE tail protein from phage P2.

    \ 7664 IPR012854 \

    Copper amine oxidases catalyse the oxidative deamination of primary amines to the corresponding aldehydes, while reducing molecular oxygen to hydrogen peroxide. These enzymes are dimers of identical subunits, each comprising four domains. The N-terminal domain, which is absent in some amine oxidases, consists of a five-stranded antiparallel beta sheet twisted around an alpha helix. The D1 domains from the two subunits comprise the stalk of the mushroom-shaped dimer; they interact with each other but do not pack tightly together PUBMED:8591028, PUBMED:10576737.

    \ 5329 IPR008474 \ This family is dominated by ORFs from Circoviridae. The function of this family remains to be determined.\ 7016 IPR010805 \

    This family consists of Kaposi's sarcoma-associated herpesvirus (KSHV) K8 proteins. KSHV is a human Gammaherpesvirus related to Epstein-Barr virus (EBV) and herpesvirus saimiri. KSHV open reading frame K8 encodes a basic region-leucine zipper protein of 237 aa that homodimerises. K8 interacts and co-localises with human SNF5, a cellular chromatin-remodelling factor, both in vivo and in vitro. K8 is thought to function as a transcriptional activator under specific conditions and its transactivation activity requires its interaction with the cellular chromatin remodelling factor hSNF5 PUBMED:12604819.

    \ 5940 IPR009299 \

    This family consists of several Gammaherpesvirus capsid proteins. The exact function of this family is unknown.

    \ 343 IPR006806 \ This is a family of eukaryotic NADH-ubiquinone oxidoreductase subunits from complex I of the electron transport chain, initially identified in Neurospora crassa as a 29.9 kDa protein. The conserved region is found at the N terminus of the member proteins PUBMED:1830489.\ 62 IPR000225 \

    The armadillo (Arm) repeat is an approximately 40 amino acid long tandemly repeated sequence motif first identified in the Drosophila melanogaster segment polarity gene armadillo involved in signal transduction through wingless. Animal Arm-repeat proteins function in various processes, including intracellular signalling and cytoskeletal regulation, and include such proteins as beta-catenin, the junctional plaque protein plakoglobin, the adenomatous polyposis coli (APC) tumour suppressor protein, and the nuclear transport factor importin-alpha, amongst others PUBMED:9770300. A subset of these proteins is conserved across eukaryotic kingdoms. In higher plants, some Arm-repeat proteins function in intracellular signalling like their mammalian counterparts, while others have novel functions PUBMED:12946625.

    \

    The three-dimensional fold of an armadillo repeat is known from the crystal structure of beta-catenin, where the 12 repeats form a superhelix of alpha helices with three helices per unit PUBMED:9298899. The cylindrical structure features a positively charged groove, which presumably interacts with the acidic surfaces of the known interaction partners of beta-catenin.

    \ \ 2080 IPR007334 \ This family consists of bacterial uncharacterised proteins.\ 3476 IPR002164 \

    It is thought that nucleosome assembly proteins (NAPs) act as histone chaperones, shuttling both core and linker histones from their site of synthesis in the cytoplasm to the nucleus. The proteins may be involved in regulating gene expression and therefore cellular differentiation PUBMED:9325046, PUBMED:8923009.

    \

    The centrosomal protein C-Nap1, also known as Cep250, has been implicated in the cell-cycle-regulated cohesion of microtubule-organizing centers. This 281 kDa protein consists mainly of domains predicted to form coiled-coil structures. The C-terminal region defines a novel histone-binding domain that is responsible for targeting CNAP1, and possibly condensin, to mitotic chromosomes PUBMED:12138188. During interphase, C-Nap1 localizes to the proximal ends of both parental centrioles, but it dissociates from these structures at the onset of mitosis. Re-association with centrioles then occurs in late telophase or at the very beginning of G1 phase, when daughter cells are still connected by post-mitotic bridges. Electron microscopic studies performed on isolated centrosomes suggest that a proteinaceous linker connects parental centrioles; C-Nap1 may be part of a linker structure that ensures the cohesion of duplicated centrosomes during interphase but is dismantled upon centrosome separation at the onset of mitosis PUBMED:12140259.

    \ 4358 IPR001985 \

    S-adenosylmethionine decarboxylase (AdoMetDC) PUBMED:10378277 catalyzes the removal of the carboxylate group of S-adenosylmethionine to form S-adenosyl-3-(methylthio)propylamine, which then acts as the n-propylamine group donor in the synthesis of the polyamines spermidine and spermine from putrescine.

    \

    The catalytic mechanism of AdoMetDC involves a covalently bound pyruvoyl group. This group is post-translationally generated by a self-catalyzed intramolecular proteolytic cleavage reaction between a glutamate and a serine. This cleavage generates two chains, beta (N-terminal) and alpha (C-terminal). The N-terminal serine residue of the alpha chain is then converted by nonhydrolytic serinolysis into a pyruvoyl group.

    \ 5106 IPR007943 \

    This domain is found in members of the junctin, junctate and aspartyl beta-hydroxylase protein families. Junctate is an integral ER/SR membrane calcium-binding protein, which comes from an alternatively spliced form of the same gene that generates aspartyl beta-hydroxylase and junctin PUBMED:11735129. Aspartyl beta-hydroxylase catalyses the post-translational hydroxylation of aspartic acid or asparagine residues contained within epidermal growth factor (EGF) domains of proteins PUBMED:11773073.

    \ 1094 IPR001792 \

    Acylphosphatase is an enzyme of approximately 98 amino acid residues that specifically catalyses the hydrolysis of the carboxyl-phosphate bond of acylphosphates PUBMED:1664426; its substrates include 1,3-diphosphoglycerate and carbamyl phosphate PUBMED:2538623. The enzyme has a mainly beta-sheet structure with 2 short alpha-helical segments. It is distributed in a tissue-specific manner in a wide variety of species, although its physiological role is as yet unknown PUBMED:2538623; it may, however, play a part in the regulation of the glycolytic pathway and pyrimidine biosynthesis PUBMED:2830253. There are two known isozymes: one seems to be specific to muscular tissues, while the other, called 'organ-common type', is found in many different tissues. There are also bacterial and archaebacterial hypothetical proteins that are highly similar to this enzyme and that probably possess the same activity.

    \

    These proteins include:\

    \ 5228 IPR008622 \ This family contains several bacterial flagellar FliT proteins. The flagellar proteins FlgN and FliT have been proposed to act as substrate specific export chaperones, facilitating incorporation of the enterobacterial hook-associated axial proteins (HAPs) FlgK/FlgL and FliD into the growing flagellum. In Salmonella typhimurium flgN and fliT mutants, the export of target HAPs is reduced, concomitant with loss of unincorporated flagellin into the surrounding medium PUBMED:11169117.\ 234 IPR003740 \

    This entry describes proteins of unknown function.

    \ 3843 IPR004624 \

    This protein family includes an uncharacterised member designated phnA in Escherichia coli, part of a large operon associated with alkylphosphonate uptake and carbon-phosphorus bond cleavage PUBMED:2155230. This protein is not related to the characterised phosphonoacetate hydrolase designated PhnA PUBMED:9300819.

    \ 4515 IPR004978 \

    Stanniocalcin (STC) is a calcium- and phosphate-regulating hormone produced in bony fish by the corpuscles of Stannius, which are located close to the kidney. It is a major antihypercalcemic hormone in fish. Recent results suggest that the biological repertoires of STCs in mammals will be considerably larger than in fish and may not be limited to mineral metabolism.

    \ 3542 IPR004740 \

    This family of proteins transports nucleosides with high affinity. Transport is driven by the proton motive force. This family includes nucleoside permease (NupG) and xanthosine permease (XapB) from Escherichia coli.

    \ 2329 IPR007827 \ This family contains uncharacterised baculoviral proteins.\ 2865 IPR002518 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
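The catalytic-type letter at the start of a family identifier can be decoded mechanically. The sketch below assumes family identifiers take the form of a type letter followed by a number, as in U39 (the hepatitis C virus NS2 family discussed in this entry); `describe_family` is a hypothetical helper, and the nucleophile grouping simply restates the paragraph above.

```python
# Catalytic-type codes as listed above. "P" is used only for clans that
# contain families of more than one type, so it is not accepted here.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Nucleophile by catalytic type, following the description above.
AMINO_ACID_NUCLEOPHILE = {"C", "S", "T"}  # form an acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}       # use an activated water molecule


def describe_family(identifier: str) -> str:
    """Summarise a family identifier such as 'U39' (hypothetical helper)."""
    code = identifier[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}")
    if code in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino-acid residue (acyl intermediate)"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "not determined"
    return f"{identifier}: {CATALYTIC_TYPES[code]} peptidase; nucleophile: {nucleophile}"


print(describe_family("U39"))
# U39: unknown peptidase; nucleophile: not determined
```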

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    The non-structural protein 2 (NS2) proteins of hepatitis C virus are peptidases belonging to MEROPS peptidase family U39 (hepatitis C virus endopeptidase 2).

    \ \

    The viral genome is translated into a single polyprotein of about 3000 amino acids. Generation of the mature non-structural proteins relies on the activity of viral proteases. NS2 is a zinc-dependent endopeptidase which cleaves at the NS2/NS3 junction PUBMED:9224925, PUBMED:9261354. The action of the NS3 proteinase (NS3P), which resides in the N-terminal one-third of the NS3 protein, then yields all remaining non-structural proteins.

    \ 70 IPR000637 \ High mobility group (HMG) proteins are a family of relatively low molecular weight non-histone components in chromatin. HMG-I and HMG-Y are proteins of about 100 amino acid residues which are produced by the alternative splicing of a single gene. HMG-I proteins bind preferentially to the minor groove of AT-rich regions in double-stranded DNA PUBMED:1692833, PUBMED:8414980. It is suggested that these proteins could function in nucleosome phasing and in the 3' end processing of mRNA transcripts. They are also involved in the transcription regulation of genes containing, or in close proximity to, AT-rich regions. DNA-binding of these, and several related, proteins is effected by an 11-residue domain known as an AT-hook. Within known HMG-I proteins are found three highly conserved regions, closely related to the consensus sequence TPKRPRGRPKK. A synthetic oligopeptide with this sequence specifically binds to substrate DNA in a manner reminiscent of intact HMG-I proteins. Structure predictions suggest that the peptide has a secondary structure similar to the anti-tumour and anti-viral drugs netropsin and distamycin, and to the dye Hoechst 33258. These ligands, which also preferentially bind to AT-rich DNA, effectively compete with both the synthetic peptide and the HMG-I proteins for DNA binding. The peptide also contains novel structural features such as a predicted Asx bend, or 'hook', at its N-terminus, and laterally-projecting cationic Arg/Lys 'bristles', which may play a role in the binding of HMG-I proteins. The predicted peptide structure, the AT-hook, is a previously undescribed DNA-binding motif PUBMED:1692833.\ \ 1864 IPR002847 \

    This is a group of prokaryotic proteins that have no known function. The signature is sometimes found at the N terminus of proteins that contain the nitroreductase family signature.

    \ 4331 IPR000894 \ RuBisCO (ribulose-1,5-bisphosphate carboxylase/oxygenase) is a bifunctional enzyme that catalyses both the carboxylation and oxygenation of ribulose-1,5-bisphosphate (RuBP) PUBMED:, thus fixing carbon dioxide as the first step of the Calvin cycle. RuBisCO is the major protein in the stroma of chloroplasts, and in higher plants exists as a complex of 8 large and 8 small subunits. The function of the small subunit is unknown PUBMED:3012537. While the large subunit is coded for by a single gene, the small subunit is coded for by several different genes, which are expressed in a tissue-specific manner. They are transcriptionally regulated by the light receptor phytochrome PUBMED:3010233, which results in RuBisCO being more abundant during the day when it is required.\ 2116 IPR007402 \ This is a family of uncharacterised proteins.\ 7313 IPR011094 \

    This family is the lppY/lpqO homologue family. They are related to 'probable conserved lipoproteins' LppY and LpqO from Mycobacterium bovis. \

    \ 3956 IPR007674 \ The previously uncharacterised I6 protein binds tightly and with great specificity to the hairpin form of the viral telomeric sequence. This telomere-binding protein is thought to play a role in the initiation of vaccinia virus genome replication and/or genome encapsidation PUBMED:11581377.\ 2087 IPR007352 \ This is a predicted membrane protein with four transmembrane helices.\ 3288 IPR006856 \

    This family includes Saccharomyces cerevisiae mating type protein alpha 1. MAT alpha 1 is a transcription activator that activates mating-type alpha-specific genes with the help of the MADS-box containing MCM1 transcription factor; together they bind cooperatively to PQ elements upstream of alpha-specific genes. The MCM1-MATalpha1 complex is required for the proper DNA bending that is needed for transcriptional activation PUBMED:15118075. Alpha 1 interacts in vivo with STE12, linking expression of alpha-specific genes to the alpha-pheromone response pathway PUBMED:8339934.

    \ \ \ \ 2805 IPR000328 \ The gp41 subunit of the envelope protein complex from human immunodeficiency virus (HIV) and simian immunodeficiency viruses (SIV) mediates membrane fusion during viral entry PUBMED:9689046.\ 3272 IPR001465 \

    Malate synthase catalyzes the aldol condensation of glyoxylate with acetyl-CoA to form malate. This is the second step of the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants. Malate synthase has a TIM beta/alpha-barrel fold PUBMED:10715138.

    \ \ 7560 IPR011712 \

    This is the dimerisation and phosphoacceptor domain of a sub-family of histidine kinases. It shares sequence similarity with and .

    \ 6103 IPR009376 \

    This family consists of several Lactococcus lactis bacteriophage and Lactococcus lactis proteins of unknown function.

    \ 911 IPR007365 \

    This entry represents the dimerisation domain found in the transferrin receptor, as well as in a number of other proteins including glutamate carboxypeptidase II and N-acetylated-alpha-linked acidic dipeptidase like protein.

    \

    The transferrin receptor (TfR) assists iron uptake into vertebrate cells through a cycle of endo- and exocytosis of the iron transport protein transferrin (Tf). TfR binds iron-loaded (diferric) Tf at the cell surface and carries it to the endosome, where the iron dissociates from Tf. The apo-Tf remains bound to TfR until it reaches the cell surface, where apo-Tf is replaced by diferric Tf from the serum to begin the cycle again. Human TfR is a homodimeric type II transmembrane protein. The crystal structure of a TfR monomer reveals a 3-domain structure: a protease-like domain that closely resembles carboxy- and amino-peptidases; an apical domain consisting of a beta-sandwich; and a helical dimerisation domain. The dimerisation domain consists of a 4-helical bundle that makes contact with each of the three domains in the dimer partner PUBMED:10531064.

    \ \ 1531 IPR000789 \

    In eukaryotes, cyclin-dependent protein kinases interact with cyclins to regulate cell cycle progression, and are required for the G1 and G2 stages of cell division PUBMED:3322810. The proteins bind to a regulatory subunit, cyclin-dependent kinase regulatory subunit (CKS), which is essential for their function. This regulatory subunit is a small protein of 79 to 150 residues. In budding yeast (gene CKS1) and in fission yeast (gene suc1) a single isoform is known, while mammals have two highly related isoforms. The regulatory subunits exist as hexamers, formed by the symmetrical assembly of 3 interlocked homodimers, creating an unusual 12-stranded beta-barrel structure PUBMED:8211159. Through the barrel centre runs a tunnel 12 Å in diameter, lined by 6 exposed helix pairs PUBMED:8491379. Six kinase units can be modelled to bind the hexameric structure, which may thus act as a hub for cyclin-dependent protein kinase multimerisation PUBMED:8491379, PUBMED:8211159.

    \ \ 5689 IPR008773 \ This family consists of several proteobacterial phosphonate metabolism protein (PhnI) sequences. Bacteria that use phosphonates as a phosphorus source must be able to break the stable carbon-phosphorus bond. In Escherichia coli phosphonates are broken down by a C-P lyase that has a broad substrate specificity. The genes for phosphonate uptake and degradation in E. coli are organised in an operon of 14 genes, named phnC to phnP. Three gene products (PhnC, PhnD and PhnE) comprise a binding protein-dependent phosphonate transporter, which also transports phosphate, phosphite, and certain phosphate esters such as phosphoserine; two gene products (PhnF and PhnO) may have a role in gene regulation; and nine gene products (PhnG, PhnH, PhnI, PhnJ, PhnK, PhnL, PhnM, PhnN, and PhnP) probably comprise a membrane-associated C-P lyase enzyme complex PUBMED:1335942.\ 6729 IPR010692 \

    This family consists of several RTX iron-regulated FrpC proteins which appear to be found exclusively in Neisseria meningitidis. FrpC has been shown to be related to the RTX family of bacterial cytotoxins. FrpC is found in the meningococcal outer membrane. The function of this family is unknown although it is thought to be a virulence factor PUBMED:12654851.

    \ 2724 IPR000343 \

    Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles including porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or C4) pathway, which involves the single-step condensation of succinyl-CoA and glycine and is catalyzed by ALA synthase, and the C5 pathway from the five-carbon skeleton of glutamate. The C5 pathway operates in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.

    \ The initial step in the C5 pathway is carried out by members of this family, the glutamyl-tRNA reductases (GluTR) PUBMED:1502723, which catalyze the Mg2+/NADPH-dependent conversion of glutamyl-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant release of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase. GSA is converted to ALA by GSA aminotransferase. This use of an aminoacyl-tRNA in a reaction other than peptide bond formation is highly unusual.

    \

    \ GluTR is a protein of about 50 kDa (467 to 550 residues) which contains a few conserved regions. The best conserved region is located at positions 99 to 122 in the sequences of known GluTRs. This region seems important for the activity of the enzyme.

    \ 3544 IPR003154 \ This family contains both S1 and P1 nucleases, which cleave RNA and single-stranded DNA with no base specificity. \ 5011 IPR007087 \

    Zinc finger domains PUBMED:3125980, PUBMED: are nucleic acid-binding protein structures first identified in the Xenopus laevis transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues including 2 conserved Cys and 2 conserved His residues in a C-2-C-12-H-3-H type motif. The 12 residues separating the second Cys and the first His are mainly polar and basic, implicating this region in particular in nucleic acid binding. The zinc finger motif is an unusually small, self-folding domain in which Zn is a crucial component of its tertiary structure. All bind 1 atom of Zn in a tetrahedral array to yield a finger-like projection, which interacts with nucleotides in the major groove of the nucleic acid. The Zn binds to the conserved Cys and His residues. Fingers have been found to bind to about 5 base pairs of nucleic acid containing short runs of guanine residues. They have the ability to bind to both RNA and DNA, a versatility not demonstrated by the helix-turn-helix motif. The zinc finger may thus represent the original nucleic acid binding protein. It has also been suggested that a Zn-centred domain could be used in a protein interaction, e.g. in protein kinase C. Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines, while the second pair are histidines.
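The fixed spacing quoted above (C-2-C-12-H-3-H) translates directly into a regular expression. This is a minimal sketch that takes the spacings literally; `find_c2h2` is a hypothetical helper. Natural C2H2 fingers show some variation in these spacings, so a practical scan would more likely use a profile or a pattern with ranges.

```python
import re

# C, two residues, C, twelve residues, H, three residues, H.
# Spacings are taken literally from the C-2-C-12-H-3-H description.
C2H2_MOTIF = re.compile(r"C[A-Z]{2}C[A-Z]{12}H[A-Z]{3}H")


def find_c2h2(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) half-open spans of non-overlapping C2H2 matches."""
    return [(m.start(), m.end()) for m in C2H2_MOTIF.finditer(sequence.upper())]


# Synthetic test sequence built to fit the motif; it is not a real protein.
finger = "CAAC" + "K" * 12 + "HRRRH"
print(find_c2h2("MM" + finger + "GG"))  # [(2, 23)]
```

A candidate with only 11 residues between the second Cys and the first His is rejected, since the pattern enforces the exact spacing.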

    \ \ 4929 IPR003128 \

    Villin is an F-actin bundling protein involved in the maintenance of the microvilli of the absorptive epithelia. The villin-type "headpiece" domain is a modular motif found at the extreme C-terminus of larger "core" domains in over 25 cytoskeletal proteins in plants and animals, often in association with the gelsolin repeat. Although the headpiece is classified as an F-actin-binding domain, it has been shown that not all headpiece domains are intrinsically F-actin-binding motifs; surface charge distribution may be an important element for F-actin recognition PUBMED:11977079. An autonomously folding, 35-residue, thermostable subdomain (HP36) of the full-length 76-residue villin headpiece is the smallest known example of a cooperatively folded domain of a naturally occurring protein. The structure of HP36, as determined by NMR spectroscopy, consists of three short helices surrounding a tightly packed hydrophobic core PUBMED:12095260.

    \ 706 IPR005021 \ The majority of the members of this family are bacteriophage proteins, several of which are thought to be terminase large subunit proteins. There are also a number of bacterial proteins of unknown function.\ 238 IPR004245 \ Members of this family are uncharacterised and share a long conserved region that may contain several domains.\ 1450 IPR011614 \

    Catalases are antioxidant enzymes that catalyse the conversion of hydrogen peroxide to water and molecular oxygen, serving to protect cells from its toxic effects PUBMED:11351128. Hydrogen peroxide is produced as a consequence of oxidative cellular metabolism and can be converted to the highly reactive hydroxyl radical via transition metals, this radical being able to damage a wide variety of molecules within a cell, leading to oxidative stress and cell death. Catalases act to neutralise hydrogen peroxide toxicity, and are produced by all aerobic organisms ranging from bacteria to man. Most catalases are mono-functional, haem-containing enzymes, although there are also bifunctional haem-containing peroxidase/catalases that are closely related to plant peroxidases, and non-haem, manganese-containing catalases that are found in bacteria PUBMED:14745498.

    \ \

    This entry represents a conserved region within catalase enzymes.

    \ 5036 IPR004457 \ An orthologous protein found once in each of the completed archaeal genomes corresponds to a zinc finger-containing domain repeated as the N-terminal and C-terminal halves of the mouse protein ZPR1. ZPR1 is an experimentally proven zinc-binding protein that binds the tyrosine kinase domain of the epidermal growth factor receptor (EGFR); binding is inhibited by EGF stimulation and tyrosine phosphorylation, and activation by EGF is followed by some redistribution of ZPR1 to the nucleus. By analogy, other proteins with the ZPR1 zinc finger domain may be regulatory proteins that sense protein phosphorylation state and/or participate in signal transduction.\ 7399 IPR011442 \

    These proteins are associated with transcription initiation factor TFIID subunit 6 (TAF6).

    \ 3090 IPR003634 \ Interleukin-13 (IL-13) is a pleiotropic cytokine which may be important in the regulation of the inflammatory and immune responses PUBMED:8096327. It inhibits inflammatory cytokine production and synergises with IL-2 in regulating interferon-gamma synthesis. The sequences of IL-4 and IL-13 are distantly related.\ 1662 IPR003785 \ Creatininase catalyses the hydrolysis of creatinine to creatine PUBMED:7670196.\ 6926 IPR010773 \

    This family consists of several bacterial proteins of around 115 residues in length. Members of this family are found in Bacillus species and Streptomyces coelicolor; the function of the family is unknown.

    \ 1234 IPR001535 \ Arenaviruses are single-stranded RNA viruses. The arenavirus S RNAs that have been characterised include conserved terminal sequences, an ambisense arrangement of the coding regions for the precursor glycoprotein (GPC) and nucleocapsid (N) proteins, and an intergenic region capable of forming a base-paired "hairpin" structure. The resulting mature proteins are the glycoproteins G1 and G2 and the N protein PUBMED:2042397.\

    Tacaribe virus (TACV) is an arenavirus that is genetically and antigenically closely related to Junin virus (JUNV), the aetiological agent of Argentine haemorrhagic fever (AHF). It is well established that TACV fully protects experimental animals against an otherwise lethal challenge with JUNV, and that this protection is conferred by the heterologous glycoprotein. A recombinant vaccinia virus expressing the JUNV glycoprotein precursor (VV-GJun) protected 72% of the animals inoculated with two doses of VV-GJun against lethal JUNV challenge PUBMED:10769070.

    \ 8140 IPR013229 \

    This domain is found in both archaea and bacteria and has similarity to S-layer (surface layer) proteins. It is named after the characteristic PEGA sequence motif found in this domain. The secondary structure of this domain is predicted to consist of beta-strands PUBMED:.

    \ 6397 IPR009507 \

    This family consists of several short, hypothetical bacterial proteins of unknown function.

    \ 8036 IPR013270 \

    This family represents the CD47 leukocyte antigen V-set like Ig domain PUBMED:12124426, PUBMED:8794870.

    \ 5740 IPR008588 \ This family consists of proteins of unknown function found in Caenorhabditis species.\ 4231 IPR000266 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The ribosomal proteins catalyse ribosome assembly and stabilise the rRNA, tuning the structure of the ribosome for optimal function. Evidence suggests that, in prokaryotes, the peptidyl transferase reaction is performed by the large subunit 23S rRNA, whereas proteins probably have a greater role in eukaryotic ribosomes. Most of the proteins lie close to, or on the surface of, the 30S subunit, arranged peripherally around the rRNA PUBMED:9281425. The small subunit ribosomal proteins can be categorised as primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins.\ The small ribosomal subunit protein S17 is known to bind specifically to the 5' end of 16S ribosomal RNA in Escherichia coli (primary rRNA binding protein), and is thought to be involved in the recognition of termination codons. Experimental evidence PUBMED:9371771 has revealed that S17 has virtually no groups exposed on the ribosomal surface.
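    The primary/secondary/tertiary categories above amount to a simple rule over assembly dependencies. The following is a minimal Python sketch, assuming a hypothetical input that maps each small-subunit protein to the proteins that must already be bound before it can join; the function name `classify_binding` and the proteins `X` and `Y` are illustrative placeholders, and only S17 (a primary binder according to this entry) comes from the text.

```python
from typing import Dict, Set


def classify_binding(deps: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each protein as a primary, secondary or tertiary rRNA binder.

    `deps` is a hypothetical dependency map: protein -> set of proteins that
    must already be bound before it can join the assembling subunit.
    """
    # Primary binders attach to 16S rRNA directly, with no prerequisites.
    primaries = {p for p, required in deps.items() if not required}
    levels: Dict[str, str] = {}
    for protein, required in deps.items():
        if not required:
            levels[protein] = "primary"
        elif required <= primaries:
            # Depends only on primary binders.
            levels[protein] = "secondary"
        else:
            # Needs at least one non-primary (secondary or tertiary) binder.
            levels[protein] = "tertiary"
    return levels


# Hypothetical example: S17 binds directly; X and Y are placeholder proteins.
example = {"S17": set(), "X": {"S17"}, "Y": {"S17", "X"}}
print(classify_binding(example))
# {'S17': 'primary', 'X': 'secondary', 'Y': 'tertiary'}
```

    The rule only inspects direct prerequisites, so a protein whose prerequisites include any non-primary binder is treated as tertiary, matching the definition given above.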

    \ 6736 IPR010697 \

    This family consists of several hypothetical bacterial proteins of around 180 residues in length. The function of this family is unknown.

    \ 246 IPR004145 \ This domain is only found in fly proteins. It is found associated with YLP motifs in some proteins.\ 4167 IPR000206 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family of large subunit ribosomal proteins is called the L7/L12 family. L7/L12 is present in each 50S subunit in four copies organized as two dimers. The L8 protein complex, consisting of two dimers of L7/L12 and L10 in Escherichia coli ribosomes, is assembled on the conserved region of 23S rRNA termed the GTPase-associated domain PUBMED:10488095. The L7/L12 dimer probably interacts with EF-Tu. L7 and L12 differ only by a single post-translational modification: the addition of an acetyl group to the N terminus of L7.

    \ 3614 IPR000510 \ Enzymes belonging to this family include cofactor-requiring nitrogenases and protochlorophyllide reductase. The key enzymatic reactions in nitrogen fixation are catalyzed by the nitrogenase complex, which has two components: the iron protein (component 2), and a component (component 1) which is either a molybdenum-iron, vanadium-iron or iron-iron protein. The enzyme forms a hexamer of two alpha, two beta and two delta chains. Protochlorophyllide reductase is involved in the light-independent accumulation of chlorophyll, probably at the step of reduction of protochlorophyllide to chlorophyllide.\ 4701 IPR004264 \ Proteins in this group are TNP1/EN/SPM-like transposon proteins of unknown function, mostly from Arabidopsis thaliana.\ 4938 IPR000937 \ The capsid proteins of plant icosahedral positive strand RNA viruses form four different domains: a positively charged, N-terminal 'R' domain, which interacts with RNA (66 residues); a connecting arm, 'a' (35 residues); a central, surface 'S' domain, which forms the virion shell; and a projecting, C-terminal 'P' domain PUBMED:7704529. Some of the viruses lack either the R or P domains. The S domain contains from 158 to 166 amino acids and comprises 8 anti-parallel beta-strands, which form a twisted sheet or jelly-roll fold. This structure is shared by a number of plant viral capsid proteins, including carmoviruses, dianthoviruses, sobemoviruses, tombusviruses and tobacco necrosis virus PUBMED:1856686.\ 5831 IPR010304 \

    This family consists of several eukaryotic survival motor neuron (SMN) proteins. The Survival of Motor Neurons (SMN) protein, the product of the spinal muscular atrophy-determining gene, is part of a large macromolecular complex (SMN complex) that functions in the assembly of spliceosomal small nuclear ribonucleoproteins (snRNPs). The SMN complex functions as a specificity factor essential for the efficient assembly of Sm proteins on U snRNAs and likely protects cells from illicit, and potentially deleterious, non-specific binding of Sm proteins to RNAs.

    \ 270 IPR005528 \

    This is a small domain found in a family of Streptomyces proteins that are annotated as 'putative secreted protein'. The domain occurs singly or as a pair, and many copies contain two cysteines that may form a disulphide bridge.

    \ 7300 IPR010907 \

    In Pseudomonas aeruginosa, the fucose-binding lectin II (PA-IIL) contributes to the virulence of the bacterium. PA-IIL functions as a tetramer when binding fucose. Each monomer consists of a nine-stranded, antiparallel beta-sandwich and contains two calcium cations that mediate the binding of fucose in a recognition mode unique among carbohydrate-protein interactions PUBMED:12415289.

    \ 5448 IPR008506 \ This family consists of several eukaryotic proteins of unknown function.\ 590 IPR002581 \ This family consists of the morbillivirus RNA polymerase alpha subunit and non-structural protein V. The P gene of morbilliviruses is co-transcriptionally edited, so that in the V fusion protein the N-terminal half of the P protein is joined to a cysteine-rich region, which has been shown to bind zinc PUBMED:1634877. Morbilliviruses are negative-strand ssRNA viruses of the family Paramyxoviridae; members include measles virus and phocine distemper virus.\ 6421 IPR009516 \

    This family consists of several Raspberry bushy dwarf virus coat proteins.

    \ 382 IPR007588 \

    This domain is a potential FLYWCH Zn-finger found in a number of eukaryotic proteins.

    \ 30 IPR001608 \

    Alanine racemase plays a role in providing the D-alanine required for cell wall biosynthesis by isomerising L-alanine to D-alanine. Proteins containing this domain are found in both prokaryotes and eukaryotes PUBMED:1676385,PUBMED:7871888. The molecular structure of alanine racemase from Bacillus stearothermophilus was determined by X-ray crystallography to a resolution of 1.9 Å PUBMED:9063881. The alanine racemase monomer is composed of two domains, an eight-stranded alpha/beta barrel at the N-terminus, and a C-terminal domain essentially composed of beta-strands. The pyridoxal 5'-phosphate (PLP) cofactor lies in and above the mouth of the alpha/beta barrel and is covalently linked via an aldimine linkage to a lysine residue, which is at the C-terminus of the first beta-strand of the alpha/beta barrel.

    \

    This domain is also found in the PROSC (proline synthetase co-transcribed bacterial homolog) family of proteins, which are not known to have alanine racemase activity.

    \ 1929 IPR003832 \

    This family is related to the acid phosphatase/vanadium-dependent haloperoxidases; members of this group are uncharacterised.

    \ 2643 IPR004115 \ This domain is found in some members of the GatB and aspartyl-tRNA synthetases.\ 2449 IPR007266 \ Members of this family are required for the formation of disulphide bonds in the endoplasmic reticulum PUBMED:10754564, PUBMED:10982384.\ 875 IPR004151 \ Caenorhabditis elegans Sre proteins are candidate chemosensory receptors. There are four main recognized groups of such receptors: Odr-10, Sra, Sro, and Srg. Sre (this family), Sra and Srb comprise the Sra group. All of the above receptors are thought to be G protein-coupled seven transmembrane domain proteins PUBMED:10580986, PUBMED:7585938. The existence of several different chemosensory receptors underlies the fact that, in spite of having only 20-30 chemosensory neurones, C. elegans detects hundreds of different chemicals, with the ability to discern individual chemicals among combinations PUBMED:10580986.\ 1574 IPR002679 \ This family consists of coat proteins from closteroviruses, members of the family Closteroviridae. The viral coat protein encapsulates and protects the viral genome. Both the large cp1 and smaller cp2 coat proteins originate from the same primary transcript PUBMED:2033386. Members of the Closteroviridae include Sugar beet yellow virus and Grapevine leafroll-associated virus; closteroviruses have a positive-strand ssRNA genome with no DNA stage during replication.\ 3692 IPR000396 \ Cyclic-AMP phosphodiesterase (PDE) catalyses the hydrolysis of cAMP to the corresponding nucleoside 5' monophosphate. On the basis of sequence similarity, most PDEs can be grouped together PUBMED:2159198, but 2 enzymes lie apart from the main family and represent a second distinct class PUBMED:2824992: this includes PDEs from Dictyostelium and yeast. The central part of these enzymes contains a highly conserved region with three histidines.\ 6919 IPR010769 \

    This family consists of several bacterial ribosomal RNA methyltransferase (aminoglycoside-resistance methyltransferase) proteins PUBMED:8486289,PUBMED:2013410.

    \ 139 IPR003508 \

    This family consists of caspase-activated (CAD) nucleases, which induce DNA fragmentation and chromatin condensation during apoptosis, and the cell death activator proteins CIDE-A and CIDE-B, which are inhibitors of CAD nuclease. The two proteins interact through the region defined by the method signatures.

    \ 369 IPR003104 \

    Formin homology (FH) proteins play a crucial role in the reorganization of the actin cytoskeleton, which mediates various functions of the cell cortex including motility, adhesion, and cytokinesis PUBMED:10631086. Formins are multidomain proteins that interact with diverse signalling molecules and cytoskeletal proteins, although some formins have been assigned functions within the nucleus. Formins are characterised by the presence of three FH domains (FH1, FH2 and FH3), although members of the formin family do not necessarily contain all three domains PUBMED:12538772. The proline-rich FH1 domain mediates interactions with a variety of proteins, including the actin-binding protein profilin, SH3 (Src homology 3) domain proteins, and WW domain proteins. The FH2 domain is required for the self-association of formin proteins through the ability of FH2 domains to directly bind each other PUBMED:14576350, and may also act to inhibit actin polymerisation PUBMED:14992721. The FH3 domain is less well conserved and may be important for determining intracellular localisation of formin family proteins. In addition, some formins can contain a GTPase-binding domain (GBD) required for binding to Rho small GTPases, and a C-terminal conserved Dia-autoregulatory domain (DAD).

    \

    This entry represents the FH2 domain, which was shown by X-ray crystallography to have an elongated, crescent shape containing three helical subdomains PUBMED:15006353.

    \ 7371 IPR011431 \

    The members of this family are of unknown function.

    \ 7022 IPR010808 \

    The response regulators for CheA bind to the P2 domain, which is found between and as either one or two copies. Highly flexible linkers connect P2 to the rest of CheA and impart remarkable mobility to the P2 domain. This feature is thought to enhance the inter CheA dimer phosphotransfer reactions within the signalling complex, thereby amplifying the phosphorylation signal PUBMED:10564504.

    \ 5731 IPR008580 \ This domain consists of the N-terminal portion of several eukaryotic sequences. The function of this domain is unknown.\ 3709 IPR001449 \

    Phosphoenolpyruvate carboxylase (PEPCase), an enzyme found in all multicellular plants, catalyses the formation of oxaloacetate from phosphoenolpyruvate (PEP) and a bicarbonate ion PUBMED:1450389. This reaction is harnessed by C4 plants to capture and concentrate carbon dioxide into the photosynthetic bundle sheath cells. It also plays a key role in the nitrogen fixation pathway in legume root nodules: here it functions in concert with glutamine, glutamate and asparagine synthetases and aspartate amido transferase to synthesise aspartate and asparagine, the major nitrogen transport compounds in various amine-transporting plant species PUBMED:1421147.

    \

    PEPCase also plays an anaplerotic role in bacteria and plant cells, supplying oxaloacetate to the TCA cycle, which requires continuous input of C4 molecules in order to replenish the intermediates removed for amino acid biosynthesis PUBMED:2779518. The C-terminus of the enzyme contains the active site, which includes a conserved lysine residue involved in substrate binding and other conserved residues important for the catalytic mechanism PUBMED:1508152.

    \ 3717 IPR000696 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
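    The catalytic-type letter scheme described above can be illustrated with a small lookup. This is a hedged Python sketch rather than a MEROPS tool: the helpers `catalytic_type` and `nucleophile` are hypothetical, and the code only assumes, as the text states, that the first character of a family or clan identifier (for example "A6", "S9B" or "SC") is the catalytic-type letter.

```python
# Catalytic-type letters as listed in the text; "P" applies to clans only.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}

# Per the text: Ser, Thr and Cys peptidases use an amino acid as the nucleophile
# (forming an acyl intermediate); aspartic, glutamic and metallopeptidases use
# an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def _type_letter(identifier: str) -> str:
    """Extract the leading catalytic-type letter from a family or clan ID."""
    return identifier.strip()[:1].upper()


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first letter of the ID."""
    code = _type_letter(identifier)
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(identifier: str) -> str:
    """Return the kind of nucleophile the text associates with this type."""
    code = _type_letter(identifier)
    if code in AMINO_ACID_NUCLEOPHILE:
        return "amino acid (acyl intermediate)"
    if code in WATER_NUCLEOPHILE:
        return "activated water"
    return "not specified"


print(catalytic_type("A6"), nucleophile("A6"))    # aspartic activated water
print(catalytic_type("S9B"), nucleophile("S9B"))  # serine amino acid (acyl intermediate)
```

    Because only the first character is inspected, family, subfamily and clan identifiers of the same type all resolve to the same catalytic type.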

    \ \

    Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179.\ Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (groups of evolutionarily related proteins), and further sub-divided into families, largely on the basis of their tertiary structure.

    \

    This group of proteins, which includes the Nodavirus coat precursor endopeptidases, consists of aspartic peptidases that belong to MEROPS peptidase family A6 (clan AB).

    \ \

    Nodaviruses are small, icosahedral viruses, pathogenic to insects and mammals. Each virion packages two RNA strands, RNA1 and RNA2. Nodavirus coat precursor endopeptidase (also known as protein alpha) is the only protein encoded by RNA2. During virion assembly, this precursor is cleaved into coat proteins beta and gamma. RNA1 encodes two proteins, at least one of which is involved in RNA replication. The relatively uncomplicated nature of their structural protein and RNA constituents makes the nodaviruses a good virus model PUBMED:2116525.

    \ \

    The 3D structure of the capsid protein has been determined by X-ray crystallography to 2.8 Å resolution PUBMED:2116525. The structure contains a beta-barrel domain, with a prominent protrusion composed largely of beta-sheet. This protrusion, together with similar protrusions from neighbouring subunits, forms a prominent trigonal pyramid with quasi-3-fold symmetry PUBMED:2116525. Two alpha-helices extend toward the interior of the particle PUBMED:2116525.

    \ 7041 IPR009861 \

    This family consists of several mammalian DAP10 membrane proteins. In activated mouse natural killer (NK) cells, the NKG2D receptor associates with two intracellular adaptors, DAP10 and DAP12, which trigger phosphatidylinositol 3-kinase (PI3K) and Syk family protein tyrosine kinases, respectively. It has been suggested that the DAP10-PI3K pathway is sufficient to initiate NKG2D-mediated killing of target cells PUBMED:12740576.

    \ 2945 IPR003384 \ The Hepatitis E virus (HEV) genome is a single-stranded, positive-sense RNA molecule of approximately 7.5 kb PUBMED:10449466. Three open reading frames (ORF) were identified within the HEV genome: ORF1 encodes nonstructural proteins, ORF2 encodes the putative structural protein(s), and ORF3 encodes a protein of unknown function. ORF2 contains a consensus signal peptide sequence at its amino terminus and a capsid-like region with a high content of basic amino acids similar to that seen with other virus capsid proteins PUBMED:1926770.\ 3469 IPR011128 \ NAD-dependent glycerol-3-phosphate dehydrogenase (GPDH) catalyses the interconversion of dihydroxyacetone phosphate and L-glycerol-3-phosphate. This family represents the N-terminal NAD-binding domain PUBMED:10801498. \ \ 1766 IPR001874 \ 3-dehydroquinate dehydratase, or dehydroquinase, catalyzes the conversion of 3-dehydroquinate into 3-dehydroshikimate. This is the third step in the shikimate pathway, which leads to chorismate, the precursor for the biosynthesis of aromatic amino acids. Two classes of dehydroquinases exist, known as types I and II. Class II enzymes are homododecamers made up of subunits of about 17 kDa. They are found in some bacteria such as Actinomycetales PUBMED:1910148, PUBMED:8170389 and some fungi, where they act in a catabolic pathway that allows the use of quinic acid as a carbon source.\ 2795 IPR000172 \

    The glucose-methanol-choline (GMC) oxidoreductases are FAD flavoprotein oxidoreductases PUBMED:1542121, PUBMED:8218217. These enzymes include a variety of proteins, such as choline dehydrogenase (CHD), methanol oxidase (MOX) and cellobiose dehydrogenase PUBMED:10725534, which share a number of regions of sequence similarity. One of these regions, located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known.

    \ 7643 IPR012924 \

    This domain consists of a group of sequences that are similar to the core of the TfuA protein. This protein is involved in the production of trifolitoxin (TFX), a gene-encoded, post-translationally modified peptide antibiotic PUBMED:8763943. The role of TfuA in TFX synthesis is unknown, and it may be involved in other cellular processes PUBMED:8763943.

    \ 5214 IPR008850 \ This short sequence region is found in four copies at the N terminus of the TEP1 telomerase component. The functional significance of the region is uncertain. However, the conservation of two histidines and a cysteine suggests that it may bind zinc.\ 2477 IPR002529 \

    This family consists of fumarylacetoacetase (FAA), also known as fumarylacetoacetate hydrolase (FAH). It also includes the bifunctional Escherichia coli enzyme 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase (EC 5.3.3.-) (HHDD isomerase)/5-carboxymethyl-2-oxo-hex-3-ene-1,7-dioate decarboxylase (EC 4.1.1.-) (OPET decarboxylase).

    \

    Fumarylacetoacetate hydrolase is the last enzyme of the tyrosine catabolic pathway, and deficiency in this enzyme causes Tyrosinemia type I, an inborn error of metabolism PUBMED:9101289.

    \ 3291 IPR000982 \ The matrix protein plays a crucial role in virus assembly, and interacts with the RNP complex as well as with the viral membrane. It is found in morbilliviruses, paramyxoviruses and pneumoviruses.\ 1387 IPR007024 \ An FAD-binding domain, BLUF, exemplified by the N-terminus of the AppA protein from Rhodobacter sphaeroides, is present in various proteins, primarily from Bacteria. The BLUF domain is involved in sensing blue light (and possibly redox state) using FAD and is similar to the flavin-binding PAS domains and cryptochromes. The predicted secondary structure suggests that the BLUF domain is a novel FAD-binding fold PUBMED:12368079.\ 4104 IPR003717 \ The damage avoidance-tolerance pathway(s) require functional recA, recF, recO, and recR genes, suggesting the mechanism to be daughter strand gap repair. The ruvABC genes or the recG gene is also required. The RecG pathway appears to be more active than the RuvABC pathway PUBMED:11073901. RecO may contain a mononucleotide-binding fold PUBMED:2544549.\ 6823 IPR009734 \

    This family consists of several bacterial and phage proteins of around 130 residues in length which seem to be related to the bacteriophage P2 GpU protein, which is thought to be involved in tail assembly PUBMED:12426340.

    \ 5652 IPR005456 \

    Melanin-concentrating hormone (MCH) is a cyclic peptide originally identified in teleost fish PUBMED:10421367,PUBMED:10421368. In fish, MCH is released from the pituitary and causes lightening of skin pigment cells through pigment aggregation PUBMED:10996523. In mammals, MCH is predominantly expressed in the hypothalamus, and functions as a neurotransmitter in the control of a range of physiological processes. A major role of MCH is thought to be in the regulation of feeding: injection of MCH into rat brains stimulates feeding; expression of MCH is upregulated in the hypothalamus of obese and fasting mice; and mice lacking MCH are lean and eat less PUBMED:10421367. MCH and alpha melanocyte-stimulating hormone (alpha-MSH) have antagonistic effects on a number of physiological functions. Alpha-MSH darkens pigmentation in fish and reduces feeding in mammals, whereas MCH increases feeding PUBMED:10996523.

    \

    \ MCH is derived from a pre-pro-hormone (pre-pro-MCH), which contains 1-2 hormones other than MCH, depending on the species. In all species, the 17-19 C-terminal amino acids are cleaved to release MCH. In mammals, amino acids 132-144 encode the hormone neuropeptide EI (NEI), whilst in salmonids, the analogous region encodes neuropeptide EV (NEV), and in other fish, the region encodes MCH gene-related peptide (Mgrp) PUBMED:8559281. A further peptide, known as neuropeptide GE (NGE), is thought to be found in mammalian pre-pro-MCH upstream of NEI, encoded by amino acids 110-129. NEI has been shown to enhance oxytocin and reduce arginine vasopressin secretion from rat pituitary PUBMED:9175893. Two paralogues of MCH, known as pro-MCH-like 1 and 2 genes (PMCHL1 and PMCHL2), which arose recently in primate evolution, also exist. At present, it is unclear whether the PMCHL genes are functional genes or inactive pseudogenes.\

    \ 3772 IPR001375 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They display a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
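    The clan-specific ordering of the catalytic triad can be captured as a lookup table. A minimal Python sketch, using only the three orders given in the text; the names `TRIAD_ORDER`, `triad_residues` and `clan_for_order` are hypothetical helpers for illustration, and the order strings are read from the N-terminal to the C-terminal end of the sequence.

```python
from typing import List, Optional

# One-letter to three-letter codes for the triad residues.
RESIDUE_NAMES = {"S": "Ser", "D": "Asp", "H": "His"}

# Linear (N- to C-terminal) order of the triad in each clan, as stated in the text.
TRIAD_ORDER = {"SA": "HDS", "SB": "DHS", "SC": "SDH"}


def triad_residues(clan: str) -> List[str]:
    """Return the triad residues of a clan in sequence order."""
    return [RESIDUE_NAMES[code] for code in TRIAD_ORDER[clan]]


def clan_for_order(order: str) -> Optional[str]:
    """Return the clan whose triad order matches `order`, or None if none does."""
    for clan, known in TRIAD_ORDER.items():
        if known == order.upper():
            return clan
    return None


print(triad_residues("SA"))   # ['His', 'Asp', 'Ser']
print(clan_for_order("sdh"))  # SC
```

    Since the three orders are distinct permutations, the order alone separates these three clans; the text only says the arrangement "commonly" reflects clan relationships, so a match should be read as a hint rather than an assignment.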

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This domain covers the active site serine of the serine peptidases belonging to MEROPS peptidase family S9 (prolyl oligopeptidase family, clan SC). The protein fold of the peptidase domain for members of this family resembles that of serine carboxypeptidase D, the type example of clan SC.

    \

    \ \

    These proteins belong to MEROPS peptidase families S9A, S9B and S9C.

    \ 4756 IPR000458 \ This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of three regions. The N and C termini are conserved between all members of the family, whereas the central region is not well conserved and contains a large number of threonine residues which can be glycosylated PUBMED:7592617. Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that were proposed to be involved in the interaction with, and invasion of, mammalian host cells.\ 5037 IPR000906 \ This is a domain of unknown function, present in ZO-1 and Unc5-like netrin receptors. It is also found in different variants of ankyrin, which are responsible for attaching integral membrane proteins to cytoskeletal elements.\ 2538 IPR007058 \ This family appears to be distantly related to and which are also components of the archaeal flagellum.\ 3916 IPR002797 \ Members of this family are integral membrane proteins PUBMED:8118055, and many are implicated in the production of polysaccharide. The family includes RfbX, part of the O antigen biosynthesis operon PUBMED:7517390, and SpoVB from Bacillus subtilis, which is involved in spore cortex biosynthesis PUBMED:1744050.\ 4050 IPR001127 \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) PUBMED:8246840, PUBMED:2197982 is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates coupled with their translocation across the cell membrane, which makes the PTS a link between the uptake and metabolism of sugars.

    \ \

    The general mechanism of the PTS is the following: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred via a signal transduction pathway to enzyme I (EI), which in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme 2 (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains, IIA, IIB and IIC PUBMED:1537788. These can either be fused together in a single polypeptide chain or exist as two or three interactive chains, formerly called enzymes II (EII) and III (EIII).

    \ \

    The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine which is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with the sugar uptake processed by the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site.

    \ \

    An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose PUBMED:8246840, PUBMED:1537788, PUBMED:7815935, PUBMED:11361063.

    \ \

    \

    \ 3178 IPR004283 \ The late expression factor 2 (lef-2) protein from Nucleopolyhedrovirus is required for expression of late genes. The lef-2 protein has been\ shown to be specifically required for expression from the vp39 and polh promoters PUBMED:8445724.\ 5613 IPR008430 \ This family consists of several bacterial cytotoxic necrotizing factor proteins as well as related dermonecrotic toxin (DNT) from Bordetella species. Cytotoxic necrotizing factor 1 (CNF1) causes necrosis of Oryctolagus cuniculus skin and re-organisation of the actin cytoskeleton in cultured cells PUBMED:12622819. Bordetella dermonecrotic toxin (DNT) stimulates the assembly of actin stress fibres and focal adhesions by deamidating or polyaminating Gln63 of the small GTPase Rho. DNT is an A-B toxin which is composed of an N-terminal receptor-binding (B) domain and a C-terminal enzymatically active (A) domain PUBMED:12065482.\ 7035 IPR009856 \

    This family consists of several plant specific light regulated Lir1 proteins. Lir1 mRNA accumulates in the light, reaching maximum and minimum steady-state levels at the end of the light and dark period, respectively. Plants germinated in the dark have very low levels of lir1 mRNA, whereas plants germinated in continuous light express lir1 at an intermediate but constant level. It is thought that lir1 expression is controlled by light and a circadian clock. The exact function of this family is unclear PUBMED:8499615.

    \ 4842 IPR002737 \

    This family of proteins, found in all branches of life, has not been characterized.

    \ 3154 IPR000034 \

    Laminins represent a distinct family of extracellular matrix proteins present only in basement membranes in almost every animal tissue. They are heterotrimeric molecules composed of alpha, beta and gamma subunits (formerly A, B1, and B2, respectively PUBMED:7921537) and form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains PUBMED:2404817, PUBMED:7827749. Most of the globular domains of the short arms correspond to one of two different motifs, the 200-residue laminin N-terminal (domain VI) (LN) module and the 250-residue laminin domain IV (L4) module PUBMED:8615779. All alpha chains share a unique C-terminal G domain which consists of five laminin G modules. The laminins can self-assemble, bind to other matrix macromolecules, and have unique and shared cell interactions mediated by integrins, dystroglycan, and other receptors. There are at least 14 laminin isoforms that regulate a variety of cellular functions including cell adhesion, migration, proliferation, signaling and differentiation PUBMED:9758133, PUBMED:7827749, PUBMED:11054872.

    \ \

    The laminin B domain (also known as domain IV) is an extracellular module of unknown function. It is found in a number of different proteins, including heparan sulphate proteoglycan from basement membrane, a laminin-like protein from Caenorhabditis elegans, and laminin. The laminin IV domain is not found in short laminin chains (alpha4 or beta3).

    \ \ 6309 IPR010514 \

    COX2 (Cytochrome O ubiquinol OXidase 2) is a major component of the respiratory complex during vegetative growth. It transfers electrons from a quinol to the binuclear centre of the catalytic subunit 1. The function of this region is not known.

    \ 2445 IPR005352 \

    The uncharacterised proteins in this family are integral membrane proteins. They may contain 4 transmembrane helices. The family contains a conserved arginine and a conserved histidine that may be functionally important.

    \ 6451 IPR010579 \

    Major Histocompatibility Complex (MHC) glycoproteins are heterodimeric cell surface receptors that function to present antigen peptide fragments to T cells responsible for cell-mediated immune responses. MHC molecules can be subdivided into two groups on the basis of structure and function: class I molecules present intracellular antigen peptide fragments (~10 amino acids) on the surface of the host cells to cytotoxic T cells; class II molecules present exogenously derived antigenic peptides (~15 amino acids) to helper T cells. MHC class I and II molecules are assembled and loaded with their peptide ligands via different mechanisms. However, both present peptide fragments rather than entire proteins to T cells, and are required to mount an immune response.

    \

    Class I MHC glycoproteins are expressed on the surface of all somatic nucleated cells, with the exception of neurons. MHC class I receptors present peptide antigens that are synthesised in the cytoplasm, which include self-peptides (presented for self-tolerance) as well as foreign peptides (such as viral proteins). These antigens are generated from degraded protein fragments that are transported to the endoplasmic reticulum by TAP proteins (transporter of antigenic peptides), where they can bind MHC I molecules, before being transported to the cell surface via the Golgi apparatus PUBMED:9485452, PUBMED:15526153. MHC class I receptors display antigens for recognition by cytotoxic T cells, which have the ability to destroy virus-infected or malignant (surfeit of self-peptides) cells.

    \

    MHC class I molecules comprise two chains: an MHC alpha chain (heavy chain) and a beta2-microglobulin chain (light chain), of which only the alpha chain spans the membrane. The alpha chain has three extracellular domains (alpha 1-3; and ), a transmembrane region and a C-terminal cytoplasmic tail; the soluble extracellular beta-2 microglobulin chain associates primarily with the alpha-3 domain and is necessary for MHC stability. This entry represents the alpha chain C-terminal tail domain.

    \ 6256 IPR009449 \

    In Saccharomyces cerevisiae, Sec2p is a GDP/GTP exchange factor for Sec4p, which is required for vesicular transport at the post-Golgi stage of yeast secretion PUBMED:9199166.

    \ 917 IPR002909 \ This family consists of a domain that has an immunoglobulin-like fold. These domains are found in cell surface receptors such as Met and Ron, as well as in intracellular transcription factors, where they are involved in DNA binding.\ The Ron tyrosine kinase receptor shares with the members of its subfamily (Met and Sea) a unique functional feature: the control of cell dissociation, motility, and invasion of extracellular matrices (scattering) PUBMED:8816464.\ 6334 IPR010524 \

    This domain is found at the N terminus of several sigma54-dependent transcriptional activators including PrpR, which activates catabolism of propionate.

    \ 6661 IPR009648 \

    This family consists of several bacterial malonate decarboxylase gamma subunit proteins. Malonate decarboxylase of Klebsiella pneumoniae consists of four different subunits and catalyses the conversion of malonate plus H+ to acetate and CO2. The catalysis proceeds via acetyl and malonyl thioester residues with the phosphoribosyl-dephospho-CoA prosthetic group of the acyl carrier protein (ACP) subunit. MdcD and MdcE together probably function as malonyl-S-ACP decarboxylase PUBMED:9208947.

    \ 7441 IPR011470 \

    This small family is found in several undescribed proteins. The alignment is distinguished by the frequent occurrence of conserved glycine and aromatic residues.

    \ 6895 IPR009774 \

    This family consists of several hypothetical Streptococcus thermophilus bacteriophage proteins of around 235 residues in length. The function of this family is unknown.

    \ 6503 IPR009561 \

    This family consists of several hypothetical archaeal and bacterial proteins of around 300 residues in length. The function of this family is unknown.

    \ 3222 IPR002520 \ This family consists of the p50 and variable adherence-associated antigen\ (Vaa) adhesins from Mycoplasma hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic\ arthritis PUBMED:8698503.\ An adhesin is a cell surface molecule that mediates adhesion to other\ cells or to the surrounding surface or substrate.\ The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem\ repetitive DNA sequences encoding a periodic peptide structure, and is\ highly immunogenic in the human host PUBMED:8698503. p50 is also a 50-kDa\ lipoprotein, having three repeats (A, B and C), that may be a tetramer of\ 191 kDa in its native environment PUBMED:8926064.\ 8123 IPR013230 \

    This entry contains zinc D-Ala-D-Ala carboxypeptidases from Streptomyces species and non-peptidase homologues that belong to MEROPS peptidase family M15 (subfamily M15A, clan MD).

    \ 1630 IPR004876 \ Members of this family are Coronavirus proteins that are located in the nucleocapsid. They have no known function.\ 3124 IPR003937 \

    Potassium channels are the most diverse group of the ion channel family\ PUBMED:1772658, PUBMED:1879548. They are important in shaping the action potential, and in neuronal excitability and plasticity PUBMED:2451788. The potassium channel family is\ composed of several functionally distinct isoforms, which can be broadly\ separated into 2 groups PUBMED:2555158: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    \

    These are all highly similar proteins, with only small amino acid\ changes causing the diversity of the voltage-dependent gating mechanism,\ channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on its type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; and others are regulated by GTP-binding proteins or\ other second messengers PUBMED:2448635. In eukaryotic cells, K+ channels\ are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes PUBMED:1373731. In prokaryotic cells, they play a role in the\ maintenance of ionic homeostasis PUBMED:11178249.

    \

    All K+ channels discovered so far possess a core of \ alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has\ been termed the K+ selectivity sequence.\ In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane.\ However, it remains unclear how the two-P-domain subunits assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned to one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains.\ The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK)\ PUBMED:11178249, PUBMED:. The 2TM domain family comprises inward-rectifying K+ \ channels. In addition, there are K+ channel alpha-subunits that possess two P-domains. These are usually highly regulated K+ selective leak channels.
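    Because the selectivity sequence is written as a residue pattern, it translates directly into a regular expression. The following is a minimal sketch that assumes only Python's standard `re` module; the function name `find_selectivity_motifs` and the test fragments are hypothetical illustrations, and a pattern hit on its own does not establish that a protein is a K+ channel.

```python
import re

# T/SxxTxGxG as a regex: [TS] = Thr or Ser, "." = any residue.
# The lookahead lets overlapping occurrences be reported.
SELECTIVITY_MOTIF = re.compile(r"(?=([TS]..T.G.G))")


def find_selectivity_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 8-mer) for every motif occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in SELECTIVITY_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical fragment containing the often-cited TVGYG filter core.
    print(find_selectivity_motifs("ETATTVGYGDLYP"))
    # Two hits in one chain would be consistent with a two-P-domain subunit.
    print(find_selectivity_motifs("TATTVGYGXXSAVTFGYG"))
```

    Counting hits per chain is one simple way to separate candidate one-P-domain from two-P-domain subunits, though real annotation relies on full profile models rather than a single motif.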

    \

    KCNQ channels differ from other voltage-gated 6 TM helix channels, chiefly \ in that they possess no tetramerisation domain. Consequently, they rely on\ interaction with accessory subunits, or form heterotetramers with other\ members of the family PUBMED:10838601. Currently, 5 members of the KCNQ family are \ known. These have been found to be widely distributed within the body,\ having been shown to be expressed in the heart, brain, pancreas, lung,\ placenta and ear. They were initially cloned as a result of a search for \ proteins involved in cardiac arrhythmia. Subsequently, mutations in other \ KCNQ family members have been shown to be responsible for some forms of\ hereditary deafness PUBMED:8528244 and benign familial neonatal epilepsy PUBMED:9430594.

    \ 4690 IPR001248 \

    The Nucleobase Cation Symporter-1 (NCS1) family consists of bacterial and yeast transporters for nucleobases, including purines and pyrimidines. Members of this family possess twelve putative transmembrane alpha-helical spanners (TMSs). At least some of them have been shown to function in uptake by a substrate:H+ symport mechanism.

    \ 3552 IPR006027 \

    This domain is found in a number of functionally different proteins:

    \ \ \

    NusB is a prokaryotic transcription factor involved in antitermination processes, during which it interacts with the boxA portion of the mRNA nut site. Previous studies have shown that NusB exhibits an all-helical fold, and that the protein from Escherichia coli forms monomers, while Mycobacterium tuberculosis NusB is a dimer. The functional significance of NusB dimerization is unknown. \ \ An N-terminal arginine-rich sequence is the probable RNA binding site, exhibiting aromatic residues as potential stacking partners for the RNA bases. The RNA binding region is hidden in the subunit interface of dimeric NusB proteins, such as NusB from M. tuberculosis, suggesting that such dimers have to undergo a considerable conformational change or dissociate for engagement with RNA. In certain organisms, dimerization may be employed to package NusB in an inactive form until recruitment into antitermination complexes PUBMED:9670024, PUBMED:15279620.

    \ \

    The antitermination proteins of Escherichia coli are recruited in the replication cycle of\ bacteriophage lambda, where they play an important role in switching from the\ lysogenic to the lytic cycle.

    \ \ 733 IPR006916 \

    The Popeye (POP) family of proteins is restricted to vertebrates and is preferentially expressed in developing and adult striated muscle. It is represented by a conserved region which includes three potential transmembrane domains PUBMED:10882522. The strong conservation of POP genes during evolution and their preferential expression in heart and skeletal muscle suggest that these novel proteins may have an important function in these tissues in vertebrates.

    \ 8006 IPR012966 \

    This is a conserved domain in the anillin family of proteins, which are involved in cell division PUBMED:12668659. In Schizosaccharomyces pombe, anillin (Mid2) is involved in septin ring organisation and cell separation PUBMED:12668659, PUBMED:12654901.

    \ 6670 IPR009653 \

    This family consists of a number of eukaryotic proteins of around 72 residues in length. The function of this family is unknown.

    \ 5756 IPR009227 \

    This family consists of several Zea mays specific MURB-like proteins. The transposition of Mu elements underlying Mutator activity in maize requires a transcriptionally active MuDR element. Despite variation in MuDR copy number and RNA levels in Mutator lines, transposition events are consistently late in plant development, and Mu excision frequencies are similar PUBMED:11251096.

    \ 6281 IPR010503 \

    These are B subunits from the type II heat-labile enterotoxin. The B subunits form a pentameric ring, which interacts with one A subunit. The structural arrangements of type I and type II heat-labile enterotoxins are thus very similar PUBMED:8805549.

    \ 1981 IPR005061 \

    This is a eukaryotic protein family of unknown function.

    \ 7630 IPR012441 \

    The members of this family are all sequences found within hypothetical proteins expressed by various bacterial species. The region concerned is approximately 150 residues long.

    \ 2382 IPR006825 \ Eclosion hormone is an insect neuropeptide that triggers the performance of ecdysis behaviour, which causes shedding of the old cuticle at the end of a molt PUBMED:11950244, PUBMED:1634328.\ 7104 IPR009900 \

    This entry represents a series of 13-residue repeats found in the apopolysialoglycoprotein of Oncorhynchus mykiss (rainbow trout) and Oncorhynchus masou (cherry salmon). Polysialoglycoprotein (PSGP) of unfertilised eggs of rainbow trout consists of tandem repeats of a glycotridecapeptide, Asp-Asp-Ala-Thr*-Ser*-Glu-Ala-Ala-Thr*-Gly-Pro-Ser-Gly (* denotes the attachment site of a polysialoglycan chain). In response to egg activation, PSGP is discharged by exocytosis into the space between the vitelline envelope and the plasma membrane, i.e. the perivitelline space, where the 200 kDa PSGP molecules undergo rapid and dramatic depolymerisation by proteolysis into glycotridecapeptides PUBMED:3182867.
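    The repeat arithmetic above (13 residues per unit, three starred attachment sites per unit) can be made concrete with a short sketch. It assumes an idealised PSGP made of exact copies of the stated tridecapeptide, and it assumes that depolymerisation cuts at the repeat boundaries (Gly13|Asp1), which is consistent with the released peptides starting with Asp-Asp but is not a statement about the actual protease specificity. All names here are hypothetical.

```python
# Idealised one-letter form of the PSGP glycotridecapeptide
# Asp-Asp-Ala-Thr*-Ser*-Glu-Ala-Ala-Thr*-Gly-Pro-Ser-Gly
REPEAT = "DDATSEAATGPSG"
# 0-based offsets of the starred (polysialoglycan-bearing) residues Thr4, Ser5, Thr9
GLYCAN_OFFSETS = (3, 4, 8)


def count_tandem_repeats(sequence: str) -> int:
    """Count consecutive exact copies of REPEAT starting at position 0."""
    n = 0
    while sequence.startswith(REPEAT, n * len(REPEAT)):
        n += 1
    return n


def glycan_sites(n_repeats: int) -> list[int]:
    """1-based residue numbers of the attachment sites in n_repeats tandem copies."""
    return [
        i * len(REPEAT) + offset + 1
        for i in range(n_repeats)
        for offset in GLYCAN_OFFSETS
    ]


def depolymerise(sequence: str) -> list[str]:
    """Split the leading run of repeats into tridecapeptides (assumed boundary cuts)."""
    n = count_tandem_repeats(sequence)
    return [sequence[i * len(REPEAT):(i + 1) * len(REPEAT)] for i in range(n)]


if __name__ == "__main__":
    core = REPEAT * 2
    print(count_tandem_repeats(core), glycan_sites(2))
    print(depolymerise(core))
```

    Sequence variation between natural repeats, and the glycan chains themselves, are deliberately not modelled.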

    \ 6569 IPR009604 \

    This entry represents a conserved region approximately 250 residues long located towards the C terminus of eukaryotic ataxin-2. Ataxin-2 is a protein of unknown function, within which expansion of a polyglutamine tract (due to expansion of unstable CAG repeats in the coding region of the SCA2 gene) causes spinocerebellar ataxia type 2 (SCA2), a late-onset neurodegenerative disorder PUBMED:9339681. The expanded polyglutamine repeat in ataxin-2 causes disruption of the normal morphology of the Golgi complex and increased incidence of cell death PUBMED:12812977. Ataxin-2 is predicted to consist of mostly non-globular domains PUBMED:9462862.

    \ 3217 IPR007326 \ This presumed domain is about 100 amino acids in length. It is found in lipoproteins of unknown function and is greatly expanded in Mycoplasma pulmonis. The domain is found in up to five copies in some proteins.\ 3981 IPR007490 \

    This family is the B22R protein from Poxviruses.

    \ 5354 IPR008810 \ This family consists of several virulence-associated proteins from Rhodococcus equi. R. equi is an important pulmonary pathogen of foals and is increasingly isolated from pneumonic infections and other infections in human immunodeficiency virus-infected patients. Isolates from foals possess a large virulence plasmid, varying in size from 80 to 90 kb. Isolates lacking the plasmid are avirulent to foals. Little is known about the function of the plasmid apart from its encoding a virulence-associated surface protein PUBMED:11083803.\ 547 IPR001320 \ The ability of synapses to modify their synaptic strength in response to activity is a fundamental property of the nervous system and may be an essential component of learning and memory. There are three classes of ionotropic glutamate receptor, namely NMDA (N-methyl-D-aspartate), AMPA (alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic\ acid) and kainate receptors. They are believed to play critical roles in synaptic plasticity. At many synapses in the brain, transient activation of NMDA receptors leads to a persistent modification in the strength of synaptic transmission mediated by AMPA receptors, and kainate receptors can act as the induction trigger for long-term changes in synaptic transmission PUBMED:10580501.\ 6857 IPR009751 \

    This family consists of several CryBP1-like proteins from Bacillus thuringiensis and Paenibacillus popilliae. Members of this family are thought to be involved in the overall toxicity of the bacteria to their hosts PUBMED:7730255, PUBMED:9209052.

    \ 2275 IPR006913 \

    Glutathione-dependent formaldehyde-activating enzymes catalyze the condensation of formaldehyde and glutathione to S-hydroxymethylglutathione. All known members of this family contain 5 strongly conserved cysteine residues.

    \ 177 IPR000323 \ Copper type II, ascorbate-dependent monooxygenases PUBMED:2792366 are a class of enzymes\ that require copper as a cofactor and use ascorbate as an electron\ donor. This family contains two related enzymes, Dopamine-beta-monooxygenase\ and Peptidyl-glycine alpha-amidating monooxygenase.\ There are a few regions of sequence similarity between these two enzymes;\ two of these regions contain clusters of conserved histidine residues, which\ are most probably involved in binding copper.\ 1320 IPR000184 \ The protein sequences of D15 from various strains of Haemophilus influenzae are highly conserved, with only a small variable region identified near the carboxyl terminus of the protein PUBMED:7737523. D15 is a highly conserved antigen that is protective in animal models and it may be a useful component of a universal subunit vaccine against Haemophilus infection and disease PUBMED:9284140. Membrane proteins from other bacteria have been shown to elicit protective immunity. Oma87 is a protective outer membrane antigen of Pasteurella multocida PUBMED:8757848.\ 7550 IPR013105 \

    This Pfam entry includes outlying Tetratricopeptide-like repeats (TPR) that are not matched by . See: PUBMED:7667876, PUBMED:9482716.

    \ 7261 IPR010887 \

    This family consists of several bacterial proteins of around 360 residues in length. The function of this family is unknown.

    \ 1369 IPR004199 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Beta-galactosidase enzymes belong to the glycosyl hydrolase 42 family. Beta-galactosidase is the product of the lac operon Z gene of Escherichia coli. This enzyme catalyses the hydrolysis of the disaccharide lactose to galactose and glucose, and can also convert lactose to allolactose, the inducer of the lac operon. This domain is found in single-chain beta-galactosidases, which comprise five domains. The active site is located in a deep pocket built around the central alpha-beta barrel, with the other domains conferring specificity for a disaccharide substrate. This entry represents domain 5, which contains an N-terminal loop that swings towards the active site upon the deep binding of a ligand to produce a closed conformation PUBMED:11732897. This domain is also found in the amino-terminal portion of the small chain of dimeric beta-galactosidases.

    \ 8104 IPR013263 \

    Some proteobacterial topoisomerase I proteins contain two zinc-ribbon-like domains at the C terminus that are structurally homologous to . However, it is unlikely that the domain can bind zinc, as only one of the four cysteine residues remains PUBMED:10873443.

    \ 6645 IPR010661 \

    This domain is known as the thumb domain. It is composed of a four helix bundle PUBMED:1377403.

    \ 5760 IPR010261 \

    This family consists of a number of bacterial sequences, which are highly similar to the Tir chaperone protein in Escherichia coli. In many Gram-negative bacteria, a key indicator of pathogenic potential is the possession of a specialised type III secretion system, which is utilised to deliver virulence effector proteins directly into the host cell cytosol. Many of the proteins secreted from such systems require small cytosolic chaperones to maintain the secreted substrates in a secretion-competent state. CesT serves a chaperone function for the enteropathogenic Escherichia coli (EPEC) translocated intimin receptor (Tir) protein, which confers upon EPEC the ability to alter host cell morphology following intimate bacterial attachment PUBMED:11849537.

    \ 2339 IPR002764 \

    Members of this archaebacterial family have no known function.

    \ 4390 IPR004692 \

    Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase\ pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to\ the translocase component PUBMED:2202721. From there, the mature proteins are either targeted to the outer\ membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial\ chromosome.\

    \

    The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral\ membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of\ the mature peptide into the periplasm (SecD and SecF) PUBMED:2202721. The chaperone protein SecB PUBMED:11336818 is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm.\ SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane\ protein ATPase SecA for secretion PUBMED:10418149. Together with\ SecY and SecG, SecE forms a multimeric channel through which preproteins\ are translocated, using both the proton motive force and ATP-driven secretion. The latter is mediated by SecA.

    \ \

    SecG has two transmembrane\ domains, both of which contribute to the recognition of preprotein signal\ sequences by the translocation complex PUBMED:7650029. The protein also undergoes\ membrane topology inversion when coupled to the SecA cycle PUBMED:11445571.

    \ \ 455 IPR004160 \ Elongation factor Tu consists of three structural domains; this entry represents the third domain. This domain adopts a beta barrel structure and is involved in binding both charged tRNA PUBMED:7491491 and EF-Ts PUBMED:9253415.\ 5914 IPR009285 \

    This family consists of several Orthopoxvirus A26L and A30L proteins. The Vaccinia A30L gene is regulated by a late promoter and encodes a protein of approximately 9 kDa. It is thought that the A30L protein is needed for vaccinia virus morphogenesis, specifically the association of the dense viroplasm with viral membranes PUBMED:11390577.

    \ 3874 IPR001697 \

    Pyruvate kinase (PK) catalyses the final step in glycolysis PUBMED:2379684, the conversion of phosphoenolpyruvate to pyruvate with concomitant phosphorylation of ADP to ATP:
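    Written as a single reaction, the conversion is shown below; the Mg2+ and K+ placed over the arrow reflect the ion requirement stated in the following paragraph, and the equation is not balanced for protons.

```latex
\text{phosphoenolpyruvate} + \text{ADP}
  \xrightarrow{\ \text{PK};\ \mathrm{Mg}^{2+},\ \mathrm{K}^{+}\ }
  \text{pyruvate} + \text{ATP}
```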

    \ \

    The enzyme, which is found in all living organisms, requires both magnesium and potassium ions for its activity PUBMED:3519210. In vertebrates, there are four tissue-specific isozymes: L (liver), R (red cells), M1 (muscle, heart and brain), and M2 (early foetal tissue). In plants, PK exists as cytoplasmic and plastid isozymes, while most bacteria and lower eukaryotes have one form, except in certain bacteria, such as Escherichia coli, that have two isozymes. All isozymes appear to be tetramers of identical subunits of ~500 residues.

    \

    PK helps control the rate of glycolysis, along with phosphofructokinase and hexokinase. PK possesses allosteric sites for numerous effectors, yet the isozymes respond differently, in keeping with their different tissue distributions PUBMED:12798932. The activity of L-type (liver) PK is increased by fructose-1,6-bisphosphate (F1,6BP) and lowered by ATP and alanine (a gluconeogenic precursor); therefore, when glucose levels are high, glycolysis is promoted, and when they are low, gluconeogenesis is promoted. L-type PK is also hormonally regulated, being activated by insulin and inhibited by glucagon, which triggers covalent modification (phosphorylation) of the enzyme. M1-type (muscle, brain) PK is inhibited by ATP, but F1,6BP and alanine have no effect, which correlates with the function of muscle and brain, as opposed to the liver.

    \

    The structure of cat muscle pyruvate kinase has been determined PUBMED:3519210. The protein comprises three domains, each belonging to the alpha-beta class; one of these adopts a 3-layer (aba) sandwich architecture; the other two form beta-barrels.

    \ 5030 IPR007449 \ This entry represents the ZipA C-terminal domain. ZipA is involved in septum formation in bacterial cell division. Its C-terminal domain binds FtsZ, a major component of the bacterial septal ring. The structure of this domain is an alpha-beta fold with three alpha helices and a beta sheet of six antiparallel beta strands. The major loops protruding from the beta sheet surface are thought to form a binding site for FtsZ PUBMED:10924108.\ 961 IPR005113 \ This region is always found associated with . It is predicted to form an all beta domain PUBMED:11563850.\ 8098 IPR013164 \

    This cadherin domain is usually the most N-terminal copy of cadherin domains.

    \ 7524 IPR011646 \ The KAP (after Kidins220/ARMS and PifA) family of predicted NTPases are sporadically distributed across a wide phylogenetic range in bacteria and in animals. Many of the prokaryotic KAP NTPases are encoded in plasmids and tend to undergo disruption to form pseudogenes. A unique feature of all eukaryotic and certain bacterial KAP NTPases is the presence of two or four transmembrane helices inserted into the P-loop NTPase domain. These transmembrane helices anchor KAP NTPases in the membrane such that the P-loop domain is located on the intracellular side PUBMED:15128444.\ 6169 IPR010463 \

    This family consists of uncharacterised proteins that are putative lipases.

    \ 7197 IPR009962 \

    This family consists of several hypothetical bacterial proteins of around 85 residues in length. The function of this family is unknown.

    \ 5968 IPR009311 \

    These proteins include several that are annotated as alpha-interferon inducible proteins.

    \ 2034 IPR007164 \ This is a family of archaebacterial proteins about 170 amino acids in length. They have no known function. The most conserved portion of the protein contains the sequence GEEDL, which may be important for its function.\ 4216 IPR002132 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:2198942, PUBMED:2016059, PUBMED:1840500, PUBMED:, groups:

    \ \

    L5 is a protein of about 180 amino-acid residues.

    \ 4304 IPR002512 \ Rotaviruses are dsRNA viruses that appear to infect a wide range of mammals. The gene 11 product is a non-structural phosphoprotein designated NS26 PUBMED:2548010.\ 5 IPR002225 \ The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene \ isomerase (3 beta-HSD) catalyses the oxidation and isomerisation \ of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene \ steroid precursors into the corresponding 4-ene-ketosteroids necessary\ for the formation of all classes of steroid hormones.\ 4284 IPR000783 \ In eukaryotes, there are three different forms of DNA-dependent RNA polymerases ()\ transcribing different sets of genes. Each class of RNA polymerase is an assemblage of ten to\ twelve different polypeptides. In archaebacteria, there is generally a single form of RNA\ polymerase which also consists of an oligomeric assemblage of 10 to 13 polypeptides.\ Archaebacterial subunit H (gene rpoH) PUBMED:10191143, PUBMED:1729711 is a small protein of about 8.5 to\ 10 kD; it is evolutionarily related to the C-terminal part of a 23 kD component shared by all three\ forms of eukaryotic RNA polymerases (gene RPB5 in yeast and POLR2E in mammals).\ 7862 IPR012951 \

    This domain is found in berberine bridge enzyme and berberine bridge enzyme-like proteins, which are involved in the biosynthesis of numerous isoquinoline alkaloids. Berberine bridge enzyme catalyses the transformation of the N-methyl group of \ (S)-reticuline into the C-8 berberine bridge carbon of (S)-scoulerine PUBMED:8972604.

    \ 5434 IPR008495 \ This family consists of several hypothetical proteins of unknown function, found in Borrelia burgdorferi and Borrelia garinii.\ 671 IPR005255 \

    This is a family of 4-hydroxythreonine-4-phosphate dehydrogenases (). PdxA protein takes part in vitamin B6 biosynthesis: it is required for the formation of pyridoxine 5'-phosphate from 4-(phosphohydroxy)-L-threonine and 1-deoxy-D-xylulose-5-phosphate.

    \ \ 2604 IPR005793 \

    Methionyl-tRNA formyltransferase () transfers a formyl group onto the amino terminus of the acyl moiety of the methionyl aminoacyl-tRNA. The formyl group appears to play a dual role in the initiator identity of N-formylmethionyl-tRNA by promoting its recognition by IF2 and by impairing its binding to EF-Tu-GTP. This family also includes formyltetrahydrofolate dehydrogenases, which produce formate from formyl-tetrahydrofolate. These enzymes contain an N-terminal domain in common with other formyl transferase enzymes (). The C-terminal domain has an open beta-barrel fold PUBMED:8887566.

    \ 4892 IPR000257 \ Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic pathway, catalyzes the sequential decarboxylation of the four acetyl side chains of uroporphyrinogen to yield coproporphyrinogen PUBMED:1576986. URO-D deficiency is responsible for the human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. The most conserved region is located in the N-terminal section; it contains a perfectly conserved hexapeptide. This hexapeptide contains two arginine residues that could bind, via salt bridges, the carboxyl groups of the propionate side chains of the substrate.\

    The crystal structure of human uroporphyrinogen decarboxylase shows that it consists of a single domain containing a (beta/alpha)8-barrel with a deep active site cleft formed by loops at the C-terminal ends of the barrel strands. \ URO-D is a dimer in solution. Dimerisation juxtaposes the active site clefts of the monomers, suggesting a functionally important interaction between the catalytic centers PUBMED:9564029.

    \ 2031 IPR007152 \ Members of this family are around 350 amino acids in length. They are found in archaea and have no known function.\ 7329 IPR011098 \

    The G5 domain (named after its conserved glycine residues) is a module of ~80 residues that is found in a variety of enzymes such as Streptococcal IgA peptidases and various glycosyl hydrolases in bacteria. It is found in one to seven copies in association with other domains, such as LysM, bacterial Ig-like, M23 and M26 peptidases, F5/8 type C, vanW or transglycosylase-like. The G5 domain contains a few highly conserved residues. None of these conserved residues is of the polar type typically found in enzyme active sites, so it seems unlikely that this region has an enzymatic function. However, in nearly all cases the G5 domain is associated with a known enzymatic domain. Therefore, the G5 domain may confer localization or substrate specificity on the proteins in which it is found. Because a common feature of the proteins containing G5 domains is N-acetylglucosamine binding, it has been suggested that this function might be attributed to the G5 domain. Alternative functions could be allosteric regulation of the enzymatic domain or cofactor binding PUBMED:15598841.

    \ 4764 IPR004307 \

    Tryptophan-rich sensory protein (TspO) is an integral membrane protein that acts as a negative regulator of the\ expression of specific photosynthesis genes in response to oxygen/light PUBMED:7673149. It is involved in the efflux of porphyrin\ intermediates from the cell. This reduces the activity of coproporphyrinogen III oxidase, which is thought to lead to the\ accumulation of a putative repressor molecule that inhibits the expression of specific photosynthesis genes. Several\ conserved aromatic residues are necessary for TspO function: they are thought to be involved in binding porphyrin\ intermediates PUBMED:10681549.

    \

    The rat mitochondrial peripheral benzodiazepine receptor (MBR) was shown not only to retain\ its structure within a bacterial outer membrane, but also to substitute functionally for TspO in TspO- mutants,\ and to act in a similar manner to TspO in its in situ location, the outer mitochondrial membrane PUBMED:9144197. The biological significance\ of MBR remains unclear. It is thought to be involved in a variety of cellular functions, including cholesterol\ transport in steroidogenic tissues.

    \ 968 IPR005375 \

    This family contains a number of small uncharacterised proteins including BM-002.

    \ 6233 IPR009439 \

    This family consists of several red chlorophyll catabolite reductase (RCC reductase) proteins. Red chlorophyll catabolite (RCC) reductase (RCCR) and pheophorbide (Pheide) a oxygenase (PaO) catalyse the key reaction of chlorophyll catabolism, porphyrin macrocycle cleavage of Pheide a to a primary fluorescent catabolite (pFCC) PUBMED:10743659.

    \ 2003 IPR005532 \

    This presumed domain is found in bacterial proteins. In some cases these proteins also contain a protein kinase domain. The function of this domain is unknown.

    \ 4162 IPR002143 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L1 is the largest protein from the large ribosomal subunit. In Escherichia coli, L1 is known to bind to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:8635468, PUBMED:8607874, groups:

    \ \ 2822 IPR000791 \ Several uncharacterized proteins are evolutionarily related, including Yarrowia lipolytica glyoxylate \ pathway regulator GPR1; yeast protein FUN34 and hypothetical proteins YCR10c and YDR384c; fission yeast hypothetical protein SpAC5D6.09c; Escherichia coli hypothetical protein yaaH; and Methanobacterium \ thermoautotrophicum hypothetical protein Mth215. They are hydrophobic proteins that seem to contain \ six transmembrane regions and could therefore be involved in transport. They range from 188 to \ 283 amino acids in length.\ 6212 IPR009429 \

    This family consists of several Baculovirus LEF-11 proteins. The exact function of this family is unknown although it has been shown that LEF-11 is required for viral DNA replication during the infection cycle PUBMED:11861844 and plays a role in late/very late gene activation.

    \ 5894 IPR009275 \

    SepZ is a component of the type III secretion system used in bacteria. The sepZ gene lies within the locus of enterocyte effacement (LEE). SepZ mutants exhibit reduced invasion efficiency and lack tyrosine phosphorylation of Hp90 PUBMED:8878013.

    \ 6363 IPR010536 \

    This entry represents the N-terminal region of several mammalian sequences and one bird sequence from Gallus gallus (Chicken). All of the mammalian proteins are hypothetical and have no known function, but the chicken protein is annotated as a repulsive guidance molecule (RGM). RGM is a GPI-linked axon guidance molecule of the retinotectal system. RGM is repulsive for a subset of axons, those from the temporal half of the retina. Temporal retinal axons invade the anterior optic tectum in a superficial layer, and encounter RGM expressed in a gradient with increasing concentration along the anterior-posterior axis. Temporal axons are able to receive posterior-dependent information by sensing gradients or concentrations of guidance cues. Thus, RGM is likely to provide positional information for temporal axons invading the optic tectum in the stratum opticum PUBMED:12353034.

    \ 135 IPR007514 \ Members of this family are probably coiled-coil proteins that are similar to the CHD5 (Congenital heart disease 5) protein. The exact molecular function of these eukaryotic proteins is unknown.\ 5066 IPR007903 \

    The PRC-barrel is an all-beta barrel domain found in photosystem reaction centre subunit H of\ the purple bacteria. PRC-barrels are\ approximately 80 residues long and are widely represented in bacteria, archaea and plants. This\ domain is also present at the C terminus of the pan-bacterial protein RimM, which is involved in\ ribosomal maturation and processing of 16S rRNA. A family of small proteins conserved in all\ known euryarchaea is composed entirely of a single stand-alone copy of the domain\ PUBMED:12429060.

    \ 4493 IPR001641 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Aspartic endopeptidases () of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179.\ Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned into clans (proteins which are evolutionarily related), and further sub-divided into families, largely on the basis of their tertiary structure.

    \

    This group of aspartic peptidases belongs to MEROPS peptidase family A9 (spumapepsin family, clan AA).
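    The identifier convention described for peptidases (the first character of a family or clan name gives its catalytic type, e.g. family A9 and clan AA) can be illustrated with a short lookup. This is a minimal sketch, not a MEROPS tool: the dictionary and the function name `catalytic_type` are hypothetical, and the codes are only those listed in the text for this entry.

```python
# Hypothetical lookup table: catalytic-type codes as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clans only: families of more than one type)",
}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a
    family identifier (e.g. "A9") or clan identifier (e.g. "AA")."""
    if not identifier:
        raise ValueError("empty identifier")
    code = identifier[0].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unknown catalytic-type code: {code!r}")
    return CATALYTIC_TYPES[code]


print(catalytic_type("A9"))  # spumapepsin family: prints "aspartic"
```

    Only the first character is inspected, so the same function serves both families and clans; the remaining characters of an identifier carry no catalytic information under this convention.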

    \ \

    Foamy viruses are single-stranded enveloped retroviruses that have been noted to infect monkeys, cats and humans. In the human virus, the aspartic protease is encoded by the retroviral gag gene PUBMED:2451755, and in monkeys by the pol gene PUBMED:1647358. At present, the virus has not been proven to cause any particular disease. However, studies have shown human foamy virus causes neurological disorders in infected mice PUBMED:9549727. It is not clear whether the Foamy virus/spumavirus proteases share a common evolutionary origin with other aspartic proteases.

    \ 7923 IPR012626 \

    This family consists of insecticidal peptides isolated from the venom of the spiders Aptostichus schlingeri and Calisoga sp. Nine insecticidal peptides were isolated from the venom of Aptostichus schlingeri, and seven of these toxins cause flaccid paralysis in insect larvae within 10 min of injection. All nine peptides were lethal within 24 hours PUBMED:1440641.

    \ 128 IPR007852 \

    Paf1 is an RNA polymerase II-associated protein in yeast, which defines a complex that is distinct from the Srb/Mediator holoenzyme.\ The Paf1 complex, which also contains Cdc73, Ctr9, Hpr1, Ccr4, Rtf1 and Leo1, is required for full expression of a subset of yeast genes, particularly those responsive to signals from the Pkc1/MAP kinase cascade. The complex appears to play an essential role in RNA elongation PUBMED:12242279.

    \ 4310 IPR007779 \ Rotavirus particles consist of three concentric proteinaceous capsid layers. The innermost capsid (core) is made of VP2. The genomic RNA and the two minor proteins VP1 and VP3 are encapsidated within this layer PUBMED:8178489. The N terminus of rotavirus VP2 is necessary for the encapsidation of VP1 and VP3 PUBMED:9420216.\ 463 IPR006933 \

    This family is defined by an N-terminal conserved region found in several huntingtin-associated protein 1 (HAP1) homologues. HAP1 binds to huntingtin in a polyglutamine repeat-length-dependent manner. However, its possible role in the pathogenesis of Huntington's disease is unclear. This family also includes a similar N-terminal conserved region from hypothetical protein products of ALS2CR3 genes found in the human juvenile amyotrophic lateral sclerosis critical region 2q33-2q34 PUBMED:11161814.

    \ 4377 IPR005130 \

    L-serine dehydratase is found either as a heterodimer of alpha and beta chains or as a fusion of the two chains in a single protein. This enzyme catalyses the deamination of serine\ to form pyruvate. This enzyme is part of the gluconeogenesis pathway.

    \ 6471 IPR010590 \

    This family consists of several enterobacterial YbdJ proteins. The function of this family is unknown.

    \ 5632 IPR008658 \ This family consists of several eukaryotic kinesin-associated (KAP) proteins. Kinesins are intracellular multimeric transport motor proteins that move cellular cargo on microtubule tracks. It has been shown that the sea urchin KRP85/95 holoenzyme associates with a KAP115 non-motor protein, forming a heterotrimeric complex in vitro, called kinesin-II PUBMED:10819327.\ 6740 IPR010699 \

    This family consists of several hypothetical bacterial proteins of around 200 residues in length. The function of this family is unknown although a few members are thought to be membrane proteins.

    \ 7930 IPR012509 \

    This family consists of the Anemonia sulcata toxin III (ATX III) neurotoxin family. ATX III is a neurotoxin that is produced by sea anemone; it adopts a compact structure containing four reverse turns and two other chain reversals, but no regular alpha-helix or beta-sheet. A hydrophobic patch found on the surface of the peptide may constitute part of the sodium channel binding surface PUBMED:7727358.

    \ 5555 IPR008887 \ This small family of proteins is currently restricted to Methanosarcina species. Members of this family are about 200 residues in length, except for that has two copies of this region. Although the function of this region is unknown, the pattern of conservation, which includes multiple conserved aspartate and glutamate residues, suggests that this may be an enzyme. The most conserved motif in these proteins is NEL/MEXNE/D, where X can be any amino acid, and is found at the C terminus of these proteins.\ 7431 IPR011460 \

    These proteins of unknown function are found in Leptospira interrogans and in several gamma proteobacteria.

    \ 367 IPR008331 \

    Ferritin is one of the major non-haem iron storage proteins in animals, plants, and microorganisms PUBMED:15222465. It consists of a mineral core of hydrated ferric oxide, and a multi-subunit protein shell that encloses the former and assures its solubility in an aqueous environment.

    \

    In animals the protein is mainly cytoplasmic and there are generally two or more genes that encode closely related subunits - in mammals there are two subunits which are known as H(eavy) and L(ight). In plants ferritin is found in the chloroplast PUBMED:2211706.

    \

    This family contains ferritins and other ferritin-like proteins such as members of the DPS family and bacterioferritins.

    \ 738 IPR004971 \ This is a family of viral mRNA capping enzymes. The enzyme catalyses the first two reactions in the mRNA cap formation pathway. It is a heterodimer consisting of a large and small subunit. This entry is the large subunit. \ 3652 IPR004356 \

    P pili, or fimbriae, are ~68 Å in diameter and 1 micron in length, the\ bulk of which is a fibre composed of the main structural protein PapA PUBMED:1348107.\ At its tip, the pilus is terminated by a fibrillum consisting of repeating\ units of the PapE protein. This, in turn, is topped by the adhesins, PapF\ and PapG, both of which are needed for receptor binding. The tip fibrillum\ is anchored to the main PapA fibre by the PapK pilus-adaptor protein. PapH,\ an outer membrane protein, then anchors the entire rod in the bacterial\ envelope PUBMED:7816100. A cytoplasmic chaperone (PapD) assists in assembling the \ monomers of the macromolecule in the membrane.

    \

    All of the functional pap genes are arranged in a cluster (operon) on the \ Escherichia coli genome. It is believed that selective pressure exerted by the \ host's urinary and intestinal tract isoreceptors forced the spread of this \ operon to other strains via lateral transfer PUBMED:1357526. PapB, encoded within the \ cluster, acts as a transcriptional regulator of the functional pap genes\ and is located in the bacterial cytoplasm PUBMED:2568258. Its mechanism involves\ differential binding to separate sites in the cluster, suggesting that \ this protein is both an activator and repressor of pilus-adhesion \ transcription. The protein shares similarity with other E. coli fimbrial-\ adhesion transcription regulators, such as AfaA, DaaA and FanB.\

    \ 7120 IPR009909 \

    This entry represents a domain of approximately 90 residues that is tandemly repeated within interferon-induced 35 kDa protein (IFP 35) and the homologous N-myc-interactor (Nmi). This domain mediates Nmi-Nmi protein interactions and subcellular localisation PUBMED:10950963.

    \ 7530 IPR011641 \

    This domain contains 5 conserved cysteine residues, which are likely to\ participate in disulphide bonds. It is found in a wide variety of\ extracellular proteins. Its function is currently unknown.

    \ 950 IPR004344 \

    Tubulins and microtubules are subjected to several post-translational modifications of which the reversible\ detyrosination/tyrosination of the carboxy-terminal end of most alpha-tubulins has been extensively analysed. This\ modification cycle involves a specific carboxypeptidase and the activity of the tubulin-tyrosine ligase (TTL) PUBMED:10685598. Tubulin-tyrosine ligase (TTL) catalyses the\ ATP-dependent post-translational addition of a tyrosine to the carboxy terminal end of detyrosinated alpha-tubulin. The true\ physiological function of TTL has so far not been established. In\ normally cycling cells, the tyrosinated form of tubulin predominates. However, in breast cancer cells, the detyrosinated\ form frequently predominates, with a correlation to tumour aggressiveness PUBMED:11431336.

    \

    3-nitrotyrosine has\ been shown to be incorporated, by TTL, into the carboxy terminal end of detyrosinated alpha-tubulin. This reaction is not\ reversible by the carboxypeptidase enzyme. Cells cultured in 3-nitrotyrosine rich medium showed evidence of altered\ microtubule structure and function, including altered cell morphology, epithelial barrier dysfunction, and apoptosis PUBMED:10339593.

    \ 1852 IPR002825 \

    The function of the archaebacterial proteins in this family is unknown.

    \ 4793 IPR007584 \ UL35 represents a true late gene which encodes a 12 kDa capsid protein PUBMED:1313892.\ 3102 IPR002648 \ Isopentenyl transferase / dimethylallyl transferase synthesizes isopentenyladenosine 5'-monophosphate, a cytokinin that induces shoot formation on host plants infected with the Ti plasmid PUBMED:1465104.\ 45 IPR000020 \

    Complement components C3, C4 and C5 are large glycoproteins that have important functions in the immune response and host defence PUBMED:1431125. They have a wide variety of biological activities and are proteolytically activated by cleavage at a specific site, forming a- and b-fragments PUBMED:2777798. A-fragments form distinct structural domains of approximately 76 amino acids, coded for by a single exon within the complement protein gene. The C3a, C4a and C5a components are referred to as anaphylatoxins PUBMED:2777798, PUBMED:3081348: they cause smooth muscle contraction, histamine release from mast cells, and enhanced vascular permeability PUBMED:3081348. They also mediate chemotaxis, inflammation, and generation of cytotoxic oxygen radicals PUBMED:3081348. The proteins are highly hydrophilic, with a mainly alpha-helical structure held together by 3 disulphide bridges PUBMED:3081348.

    \

    Fibulins are secreted\ glycoproteins that become incorporated into a fibrillar extracellular matrix when\ expressed by cultured cells or added exogenously to cell monolayers PUBMED:2269669, PUBMED:12778127. The five known members of the family share an elongated structure\ and many calcium-binding sites, owing to the presence of tandem\ arrays of epidermal growth factor-like domains. They have\ overlapping binding sites for several basement-membrane proteins,\ tropoelastin, fibrillin, fibronectin and proteoglycans, and they\ participate in diverse supramolecular structures. The\ amino-terminal domain I of fibulin consists of three anaphylatoxin-like (AT)\ modules, each approximately 40 residues long and containing four or six cysteines. The\ structure of an AT module was determined for the\ complement-derived anaphylatoxin C3a, and was found to be\ a compact alpha-helical fold that is stabilized by three disulphide bridges in the\ pattern Cys1-Cys4, Cys2-Cys5 and Cys3-Cys6 (where Cys is cysteine). The bulk of the remaining portion of the fibulin molecule is a series of\ nine EGF-like repeats PUBMED:8245130. The ProDom signature in this entry does not hit the fibulins.

    \ 6763 IPR009702 \

    This family consists of several hypothetical bacterial and archaeal proteins of around 130 residues in length. The function of this family is unknown, although it is thought that they may be iron-sulphur binding proteins.

    \ 878 IPR000897 \

    The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion \ of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP \ consists of a 7S RNA and six protein subunits. One of these subunits, the 54 kD protein (SRP54), is \ a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. The 54K subunit of the signal recognition particle has a two domain structure: the G-domain that binds GTP and the M-domain (see ) that binds the 7s RNA and also binds the signal sequence. The \ N-terminal 300 residues of SRP54 include the GTP-binding site (G-domain) and are evolutionarily related \ to similar domains in other proteins PUBMED:7518075.

    \

    These proteins include Escherichia coli and Bacillus \ subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition \ particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which \ ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the \ endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to \ that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homolog of \ ftsY; and bacterial flagellar biosynthesis protein flhF.

    \ 7533 IPR011663 \ The UbiC transcription regulator-associated (UTRA) domain is a conserved ligand-binding domain that has a similar fold to PUBMED:12757941. It is believed to modulate activity of bacterial transcription factors in response to binding small molecules PUBMED:12757941.\ 2699 IPR004995 \ Dormant Bacillus subtilis spores germinate in the presence of particular nutrients called germinants. The spores are thought to\ recognize germinants through receptor proteins encoded by the gerA family of operons, which includes gerA, gerB, and\ gerK. The GerA proteins are predicted to be membrane associated.\ 2694 IPR002488 \ This family consists of the N terminal region of geminivirus\ C4 or AC4 proteins. In Tomato yellow leaf curl geminivirus\ (TYLCV) the C4 protein is necessary for efficient spreading of\ the virus in Lycopersicon esculentum (tomato) PUBMED:8091687.\ 7557 IPR013103 \

    A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. This entry includes reverse transcriptases not recognised by PUBMED:1698615.

    \ 2133 IPR007424 \

    This sequence is usually found in association with and , and occasionally also with in integral membrane proteins. Together, this entry, and make up the C-terminal portion of Staphylococcus aureus FmtC/MprF, which is involved in resistance to defensins by the lysinylation of membrane phospholipids PUBMED:11342591. This domain along with and also occurs adjacent to the OB-fold nucleic acid binding domain () and tRNA synthetase class II () in lysyl-tRNA synthetases.

    \ 7292 IPR010009 \

    This family consists of several insect apolipoprotein-III sequences. Exchangeable apolipoproteins constitute a functionally important family of proteins that play critical roles in lipid transport and lipoprotein metabolism. Apolipophorin III (apoLp-III) is a prototypical exchangeable apolipoprotein found in many insect species that functions in transport of diacylglycerol (DAG) from the fat body lipid storage depot to flight muscles in the adult life stage PUBMED:11818551.

    \ 533 IPR003973 \

    Potassium channels are the most diverse group of the ion channel family\ PUBMED:1772658, PUBMED:1879548. They are important in shaping the action potential, and in neuronal excitability and plasticity PUBMED:2451788. The potassium channel family is\ composed of several functionally distinct isoforms, which can be broadly\ separated into 2 groups PUBMED:2555158: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    \

    These are all highly similar proteins, with only small amino acid\ changes causing the diversity of the voltage-dependent gating mechanism,\ channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on its type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; and others are regulated by GTP-binding proteins or\ other second messengers PUBMED:2448635. In eukaryotic cells, K+ channels\ are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes PUBMED:1373731. In prokaryotic cells, they play a role in the\ maintenance of ionic homeostasis PUBMED:11178249.

    \

    All K+ channels discovered so far possess a core of \ alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has\ been termed the K+ selectivity sequence.\ In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane.\ However, it remains unclear how the 2 P-domain subunits assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned into one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains.\ The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK)\ PUBMED:11178249, PUBMED:. The 2TM domain family comprises inward-rectifying K+ \ channels. In addition, there are K+ channel alpha-subunits that possess two P-domains. These are usually highly regulated K+ selective leak channels.
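    To illustrate how the K+ selectivity sequence could be located in a protein sequence, the sketch below scans with a regular expression. It assumes that "T/SxxTxGxG" is read as T or S, then any two residues, T, any residue, G, any residue, G. The function name `find_selectivity_motifs` and the test peptide are hypothetical; a real survey would rely on a profile or HMM rather than a single pattern.

```python
import re

# Assumed reading of T/SxxTxGxG: [TS], any, any, T, any, G, any, G.
K_SELECTIVITY = re.compile(r"[TS]..T.G.G")


def find_selectivity_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every motif hit.

    Every start position is tried, so overlapping hits are reported,
    unlike re.finditer, which only yields non-overlapping matches.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq)):
        m = K_SELECTIVITY.match(seq, start)
        if m:
            hits.append((start, m.group()))
    return hits


# Hypothetical pore-loop-like peptide used only for illustration.
print(find_selectivity_motifs("MSVETATTVGYGDL"))  # [(4, 'TATTVGYG')]
```

    Uppercasing the input lets lower-case sequence files be scanned without changing the pattern.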

    \

    The Kv family can be divided into 4 subfamilies on the basis of sequence\ similarity and function: Shaker (Kv1), Shab (Kv2), Shaw (Kv3) and Shal \ (Kv4). All consist of pore-forming alpha subunits that associate with \ different types of beta subunit. Each alpha subunit comprises six hydrophobic TM domains with a P-domain between the fifth and sixth, which partially resides in the membrane. The fourth TM domain has positively charged residues at every third residue and acts as a voltage sensor, which triggers the conformational change that opens the channel pore in response to a displacement in membrane potential PUBMED:10712896.

    \

    The Shab voltage-gated delayed rectifier K+ channels (also known as Kv2 channels) are responsible for much of the delayed rectifier current in the Drosophila melanogaster nervous system and muscle. However, in vertebrates, the roles of Kv2 channels in the delayed rectifier currents of the heart and skeletal muscle remain largely undetermined. Kv2 channels can be further divided into 2 subtypes, designated Kv2.1 and Kv2.2 PUBMED:.

    \ 5984 IPR010377 \

    This family consists of several hypothetical bacterial sequences and one Caenorhabditis elegans sequence (). The function of this family is unknown.

    \ 4383 IPR005609 \

    This family consists of homologues of Sec61beta - a component of the Sec61/SecYEG protein secretory system. The domain is found in eukaryotes and archaea and is possibly homologous to the bacterial SecG.

    \ 649 IPR006844 \ The proteins in this family are part of a complex of eight ER proteins that transfers the core oligosaccharide from a dolichol carrier to Asn-X-Ser/Thr motifs PUBMED:7622558. This family includes both OST3 and OST6, each of which contains four predicted transmembrane helices. Disruption of OST3 and OST6 leads to a defect in the assembly of the complex. Hence, these genes appear to be essential for recruiting a fully active complex necessary for efficient N-glycosylation PUBMED:10358084.\ 5278 IPR008710 \ Nicastrin and presenilin are two major components of the gamma-secretase complex, which carries out the intramembrane proteolysis of type I integral membrane proteins such as the amyloid precursor protein (APP) and Notch. Nicastrin is synthesised in fibroblasts and neurons as an endoglycosidase-H-sensitive glycosylated precursor protein (immature nicastrin) and is then modified by complex glycosylation in the Golgi apparatus and by sialylation in the trans-Golgi network (mature nicastrin) PUBMED:12584255.\ 3595 IPR004248 \

    Borrelia burgdorferi supercoiled plasmids encode multicopy tandem open reading frames called Orf-A, Orf-B, Orf-C and Orf-D. This family corresponds to Orf-D. The putative product of this gene has no known function PUBMED:8655511.

    \ 4674 IPR013049 \

    In all organisms, type II DNA topoisomerases are essential for untangling chromosomal DNA PUBMED:10545127. The structure of the DNA-binding core of the Methanococcus jannaschii DNA topoisomerase VI A subunit has been determined to 2.0 Å resolution. The overall structure of the subunit is unique, demonstrating that archaeal type II enzymes are distinct from other type II topoisomerases. Nevertheless, the core structure contains a pair of domains that are also found in type IA and classic type II topoisomerases. Such regions may form the basis of a DNA cleavage mechanism shared among these enzymes PUBMED:10545127.

    \

    The core A subunit is a dimer, with a deep groove spanning both protomers PUBMED:10545127. The dimer architecture is such that DNA is thought to bind in the groove, across the A subunit interface, and the monomers are thought to separate during DNA transport. The A subunit of topoisomerase VI is similar to the meiotic recombination factor, Spo11.

    \

    This domain is present in type IIB topoisomerases and is thought to be involved in DNA binding due to its similarity to E. coli catabolite activator protein (CAP).

    \ \ 7957 IPR013118 \

    This domain is the mannitol dehydrogenase C-terminal domain.

    \ 4467 IPR007375 \ Sarcosine oxidase is a hetero-tetrameric enzyme that contains both covalently bound FMN and non-covalently bound FAD and NAD+. This enzyme catalyzes the oxidative demethylation of sarcosine to yield glycine, H2O2, and 5,10-CH2-tetrahydrofolate (H4folate) in a reaction requiring H4folate and O2 PUBMED:11330998, PUBMED:7543100.\ 97 IPR007084 \

    The BRICHOS family is defined by a 100 amino acid region found in a variety of proteins implicated in dementia, respiratory distress and cancer, including BRI-2, Chondromodulin-I (ChM-I), CA11, and surfactant protein C PUBMED:12114016. In several cases, the BRICHOS region is located in the propeptide region that is removed after proteolytic processing. This domain could be involved in the complex post-translational processing of these proteins.

    \ 7788 IPR012917 \

    This is a family of mitochondrial ribosomal proteins, which appears to be fungal specific PUBMED:1574929.

    \ 781 IPR002610 \

    This family contains integral membrane proteins that are related to the Drosophila rhomboid protein. Members of this family are found in archaea, bacteria and eukaryotes. Analysis suggests that Rhomboid-1 is a novel intramembrane serine protease that directly cleaves the membrane-anchored TGF-alpha-like growth factor Spitz, allowing it to activate the Drosophila EGF receptor PUBMED:2110920, PUBMED:11672525. These proteins contain three strongly conserved histidines in the putative transmembrane regions that may be involved in the peptidase function.

    \ \ \

    This group of proteins contains probable serine peptidases belonging to MEROPS peptidase family S54 (Rhomboid, clan S-), as well as proteins classified as non-peptidase homologues that either have been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.

    \ \ \ \ 25 IPR001392 \ Clathrin-coated pits and vesicles originate from the plasma membrane and the trans-Golgi, and mediate the trafficking of proteins to and from these membranes PUBMED:8288128. The different vesicle types transport different proteins. Plasma membrane vesicles are involved in the endocytosis of membrane proteins, such as LDL and EGF receptors, whereas trans-Golgi vesicles are involved in protein sorting and regulated secretion.

    The main components of the pits are clathrin and the clathrin-associated protein complex, AP (also known as assembly or adaptor proteins) PUBMED:2177341. Both the trans-Golgi adaptor protein complex, AP-1, and the plasma membrane adaptor protein complex, AP-2, are heterotetramers that consist of two large chains (beta' and gamma in AP-1, and alpha and beta in AP-2); a medium chain (AP47 in AP-1, and AP50 in AP-2); and a small chain (AP19 in AP-1, and AP17 in AP-2).

    \

    The adaptor complexes are believed to couple clathrin lattices with particular membrane proteins by interacting with their cytoplasmic tails, leading to their selection and concentration: the medium chains regulate this process by self-phosphorylation via a mechanism that is still unclear PUBMED:1761056. The medium chains possess a highly conserved N-terminal domain of around 230 amino acids, which may be the region of interaction with other AP proteins; a linker region of between 10 and 42 amino acids; and a less well-conserved C-terminal domain of around 190 amino acids, which may be the site of specific interaction with the protein being transported in the vesicle PUBMED:1761056.
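Adding the three approximate domain lengths gives a rough expected size for a medium chain. Each term is itself approximate, so treat the result as a ballpark figure rather than a measured value:

$$L_{\text{medium}} \approx \underbrace{230}_{\text{N-terminal}} + \underbrace{(10 \text{ to } 42)}_{\text{linker}} + \underbrace{190}_{\text{C-terminal}} \approx 430 \text{ to } 462 \text{ residues}$$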

    \ 1959 IPR004881 \

    This protein has been shown to cleave GTP and remain bound to GDP PUBMED:12220175. It acts as an unusual circularly permuted GTPase that catalyzes rapid hydrolysis of GTP with slow catalytic turnover. A role as a regulator of translation has been suggested PUBMED:14973029. The Aquifex aeolicus ortholog is split into consecutive open reading frames.

    \ 5590 IPR008912 \ This group of proteins contains a VWA-type domain; the function of this family is unknown. It is found as part of a CO-oxidising (Cox) system operon in several bacteria PUBMED:10433972.\ 5188 IPR008025 \

    Contractility of vascular smooth muscle depends on phosphorylation of myosin light chains, and is modulated by hormonal control of myosin phosphatase activity. Signaling pathways activate kinases such as PKC or Rho-dependent kinases that phosphorylate the myosin phosphatase inhibitor protein called CPI-17. Phosphorylation of CPI-17 at Thr-38 enhances its inhibitory potency 1000-fold, creating a molecular switch for regulating contraction PUBMED:11734001.

    \ 3401 IPR007227 \ MreD (murein formation D) protein is involved in rod shape determination in Escherichia coli and, more generally, in cell shape determination in bacteria, whether or not they are rod-shaped.\ 1570 IPR001473 \

    Clathrin is the major protein of the polyhedral coat of coated pits and vesicles. In yeast, it is involved in the retention of proteins in an intracellular membrane compartment, probably the trans-Golgi. Clathrin has a triskelion structure composed of three heavy chains and three light chains; these triskelions are the basic subunits of the clathrin coat. The C-terminal domain of the heavy chain forms the hub of the triskelion and contains the trimerization domain and the light chain-binding domain involved in the assembly of the clathrin lattice.

    \ \

    The N-terminal region of the heavy chain is known as the globular domain and is composed of seven repeats that form a beta-propeller PUBMED:9827808.

    \ 5123 IPR007960 \

    This family consists of several forms of mammalian taste receptor proteins (TAS2Rs). TAS2Rs are G protein-coupled receptors expressed in subsets of taste receptor cells of the tongue and palate epithelia and are organised in the genome in clusters. The proteins are genetically linked to loci that influence bitter perception in mice and humans PUBMED:10761934.

    \ 98 IPR007109 \

    The Brix domain is found in a number of eukaryotic proteins including SSF proteins from yeast and humans, Arabidopsis thaliana Peter Pan-like protein and several hypothetical proteins.

    \ 7728 IPR012463 \

    The members of this family are sequences derived from hypothetical plant proteins of unknown function. One member of this family () is annotated as a putative RNA-binding protein, but no evidence was found to support this.

    \ 3689 IPR007320 \

    PDCD2 is localized predominantly in the cytosol of cells situated at the opposite pole of the germinal center from the centroblasts as well as in cells in the mantle zone. It has been shown to interact with BCL6, an evolutionarily conserved Kruppel-type zinc finger protein that functions as a strong transcriptional repressor and is required for germinal center development. The rat homologue, Rp8, is associated with programmed cell death in thymocytes.

    \ 3749 IPR001842 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
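The loose and stringent forms of the motif can both be expressed as regular expressions. The sketch below makes several assumptions. It takes 'a' as V or T only and bars proline at X. The residue sets chosen for "uncharged" and "hydrophobic" are illustrative assumptions, not a published definition. The helper names are hypothetical, and the test fragment is synthetic.

```python
import re

# Illustrative residue classes (assumptions): "uncharged" excludes D, E, K, R, H and P;
# "hydrophobic" is one common choice of set.
UNCHARGED = "ACFGILMNQSTVWY"
HYDROPHOBIC = "ACFILMVW"

# Plain HEXXH; the lookahead reports overlapping windows.
LOOSE_HEXXH = re.compile(r"(?=(HE..H))")

# Stringent a-b-X-H-E-b-b-H-b-c form, with 'a' taken as V or T and no proline at X.
STRICT_HEXXH = re.compile(
    rf"(?=([VT][{UNCHARGED}][^P]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def hexxh_hits(sequence: str, strict: bool = False) -> list[tuple[int, str]]:
    """Return (0-based start, matched window) for each HEXXH-type hit."""
    pattern = STRICT_HEXXH if strict else LOOSE_HEXXH
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


demo = "GGVVAHELTHAVGG"  # synthetic fragment built to fit the stringent pattern
print(hexxh_hits(demo))               # [(5, 'HELTH')]
print(hexxh_hits(demo, strict=True))  # [(2, 'VVAHELTHAV')]
```

Replacing the T in the demo with P (`GGVVAHELPHAVGG`) still matches the loose HEXXH pattern but not the stringent one, in line with the observation that proline is never found in the site.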

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
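The first-letter convention for families maps naturally onto a lookup table. The minimal Python sketch below is a hypothetical helper, not a MEROPS API. Type P is left out because, as described above, it applies to clans rather than families.

```python
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}
# Nucleophile is part of an amino acid; these form an acyl intermediate.
AMINO_ACID_NUCLEOPHILE = {"C", "S", "T"}
# Nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe_family(family_id: str) -> tuple[str, str]:
    """Split a family identifier such as 'M36' into (catalytic type, nucleophile)."""
    code = family_id[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"no family catalytic type for {family_id!r}")
    if code in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid side chain"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "unknown"
    return CATALYTIC_TYPES[code], nucleophile


print(describe_family("M36"))  # ('metallo', 'activated water')
print(describe_family("S54"))  # ('serine', 'amino acid side chain')
```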

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M36 (fungalysin family, clan MA(E)). The predicted active site residues for members of this family and for thermolysin, the type example for clan MA, occur in the motif HEXXH.

    \ \

    Fungalysin is produced by Aspergillus and other fungal species to aid degradation of host lung cell walls on infection. The enzyme is a 42 kDa single-chain protein, with a pH optimum of 7.5-8.0 and an optimal temperature of 60 °C PUBMED:.

    \ 4021 IPR002682 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbJ found in PSII. PsbJ is one of the most hydrophobic proteins in the thylakoid membrane, and is located in a gene cluster with PsbE, PsbF and PsbL (PsbEFJL). Both PsbJ and PsbL () are essential for proper assembly of the OEC. Mutations in PsbJ cause the light-harvesting antenna to remain detached from the PSII dimers PUBMED:14686923. In addition, both PsbJ and PsbL are involved in the unidirectional flow of electrons: PsbJ regulates the forward electron flow from D2 (Qa) to the plastoquinone pool, and PsbL prevents the reduction of PSII by back electron flow from plastoquinol, protecting PSII from photo-inactivation PUBMED:14979726.

    \ 1240 IPR001518 \

    Argininosuccinate synthase () (AS) is a urea cycle enzyme that catalyzes the penultimate step in arginine biosynthesis: the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and pyrophosphate PUBMED:2123815, PUBMED:3133361.
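Written out as an equation, the ligation described above is as follows (PP_i denotes pyrophosphate):

$$\text{citrulline} + \text{aspartate} + \text{ATP} \longrightarrow \text{argininosuccinate} + \text{AMP} + \text{PP}_\mathrm{i}$$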

    \

    In humans, a defect in the AS gene causes citrullinemia, a genetic disease characterized by severe vomiting spells and mental retardation.

    \

    AS is a homotetrameric enzyme composed of chains of about 400 amino acid residues. An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS from various prokaryotes, archaebacteria and eukaryotes show significant similarity.

    \ 4116 IPR000153 \ Sigma 3 is the major outer capsid protein of reovirus PUBMED:8648682. Sigma 3 is encoded by genome segment 4. Sigma 3 binds to double-stranded RNA and associates with polypeptide μ1 and its cleavage product μ1C to form the outer shell of the virion. The Sigma 3 protein possesses a zinc-finger motif and an RNA-binding domain in the N and C termini, respectively. This protein is also thought to play a role in pathogenesis.\ 3927 IPR007664 \

    The poxvirus A28 protein is expressed at late times during the virus replication cycle and is a membrane component of the intracellular mature virion. Repression of A28 inhibits cell-to-cell spread, suggesting that all poxviruses use a common A28-dependent mechanism of cell penetration PUBMED:14963132. An N-terminal hydrophobic sequence, present in all poxvirus A28 orthologues, anchors the protein in the virion surface membrane so that most of it is exposed to the cytoplasm PUBMED:14963131.

    \ 936 IPR000301 \

    A number of eukaryotic CD antigens have been shown to be related PUBMED:1860863. CD9 (also called DRAP-27, MRP-1 or p24) upregulates both the activity of HB-EGF as a receptor for diphtheria toxin and its juxtacrine activity. CD9 mAbs modulate cell adhesion and migration and trigger platelet activation that is blocked by mAbs directed to the platelet Fc receptor CD32. In mice, CD9 mAb KMC8.8 has been shown to inhibit the production of myeloid cells in vitro and has a costimulatory activity for T cells. CD9 is a type III membrane protein, with four putative transmembrane domains.

    \

    CD37 (or gp52-40) is involved in signal transduction and serves as a stable marker for malignancies derived from mature B cells, like B-CLL, HCL, and all types of B-NHL.

    \

    CD63 transfection reduced melanoma cell motility on fibronectin, collagen and laminin, and reduced the growth and metastasis of melanoma cells in nude mice PUBMED:9120293. CD63 has been used as a marker for late endosomes and for primary melanomas.

    \ \

    These proteins are all type II membrane proteins: they contain an N-terminal transmembrane (TM) domain, which acts both as a signal sequence and a membrane anchor, and 3 additional TM regions (hence the name 'TM4'). The sequences contain a number of conserved cysteine residues.

    \ \

    CD molecules are leucocyte antigens on cell surfaces. The CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm

    \ \ 188 IPR011127 \ This entry represents the N-terminal region of the D-alanine--D-alanine ligase enzyme (), which is thought to be involved in substrate binding PUBMED:10908650. D-Alanine is one of the central molecules of the cross-linking step of peptidoglycan assembly. There are three enzymes involved in the D-alanine branch of peptidoglycan biosynthesis: the pyridoxal phosphate-dependent D-alanine racemase (Alr), the ATP-dependent D-alanine:D-alanine ligase (Ddl), and the ATP-dependent D-alanine:D-alanine-adding enzyme (MurF) PUBMED:9054558.\ 2544 IPR000208 \ Flaviviruses produce a polyprotein from the ssRNA genome. The polyprotein is cleaved into a number of products, one of which is NS5. Recombinant dengue type 1 virus NS5 protein expressed in Escherichia coli exhibits RNA-dependent RNA polymerase activity. This RNA-directed RNA polymerase possesses a number of short regions and motifs homologous to other RNA-directed RNA polymerases PUBMED:8607261.\ 6190 IPR009419 \

    This family consists of the P3A protein of Picornaviridae. P3A has been identified as a genome-linked protein (VPg) and is involved in replication PUBMED:3018280.

    \ 6328 IPR010520 \

    This family consists of bacterial proteins of unknown function, which are hydrolase-like.

    \ 4004 IPR003146 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    The peptidases are synthesised as inactive molecules, zymogens, with propeptides that must be removed by proteolytic cleavage to activate the enzyme. Structural studies of carboxypeptidases A and B reveal the propeptide to exist as a globular domain, followed by an extended alpha-helix; this shields the catalytic site, without specifically binding to it, while the substrate-binding site is blocked by making specific contacts PUBMED:7674922, PUBMED:1548696.

    \ \

    Members of this propeptide family are found in the metallocarboxypeptidases A1, A2 PUBMED:9384570, A3, A4, A5, A6, U, insect gut carboxypeptidase and B PUBMED:12162965, and are associated with peptidases belonging to MEROPS peptidase family M14A.

    \ \

    Carboxypeptidases are found in abundance in pancreatic secretions. The pro-segment moiety (activation peptide) accounts for up to a quarter of the total length of the peptidase.

    \ 5545 IPR008878 \ This protein is found in insertion sequences related to IS66. The function of these proteins is uncertain, but they are probably essential for transposition PUBMED:11418571.\ 4118 IPR007199 \

    Replication factor A protein 1 (RPA1) forms a multiprotein complex with RPA2 and RPA3 that binds single-stranded DNA and functions in the recognition of DNA damage for nucleotide excision repair. The complex binds to single-stranded DNA sequences participating in DNA replication, as well as to those mediating transcriptional repression and activation, and stimulates the activity of the cognate strand exchange protein Sep1. It cooperates with T-AG and DNA topoisomerase I to unwind template DNA containing the Simian Virus 40 origin of replication.

    \ 6050 IPR010411 \

    These proteins include several putative tail assembly chaperones encoded by phages of Gram-negative bacteria.

    \ 2224 IPR007679 \ This is a family of hypothetical proteins. Some family members contain two copies of the region.\ 3703 IPR001646 \ These repeats occur in many cyanobacterial proteins but are also found in other bacterial proteins and in plant proteins PUBMED:9654141. The repeats were first identified in hglK PUBMED:7592418. The function of these repeats is unknown. The structure of this repeat has been predicted to be a beta-helix PUBMED:9655353. The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid.\ 1149 IPR000866 \

    Peroxiredoxins (Prxs) are a ubiquitous family of antioxidant enzymes that also control cytokine-induced peroxide levels, which mediate signal transduction in mammalian cells. Prxs can be regulated by changes to phosphorylation, redox and possibly oligomerization states. Prxs are divided into three classes: typical 2-Cys Prxs; atypical 2-Cys Prxs; and 1-Cys Prxs. All Prxs share the same basic catalytic mechanism, in which an active-site cysteine (the peroxidatic cysteine) is oxidized to a sulphenic acid by the peroxide substrate. The recycling of the sulphenic acid back to a thiol is what distinguishes the three enzyme classes. Using crystal structures, a detailed catalytic cycle has been derived for typical 2-Cys Prxs, including a model for the redox-regulated oligomeric state proposed to control enzyme activity PUBMED:12517450.
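The shared first step can be written as a single reaction. Here C_P (our notation) marks the peroxidatic cysteine, and ROOH stands for a generic hydroperoxide. For hydrogen peroxide, R = H and the product ROH is water:

$$\mathrm{Prx{-}C_P{-}SH} + \mathrm{ROOH} \longrightarrow \mathrm{Prx{-}C_P{-}SOH} + \mathrm{ROH}$$

The three classes differ only in how the sulphenic acid on the right-hand side is reduced back to the thiol on the left.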

    \ \ \

    Alkyl hydroperoxide reductase (AhpC) is responsible for directly reducing organic hydroperoxides in its reduced dithiol form. Thiol-specific antioxidant (TSA) is a physiologically important antioxidant that constitutes an enzymatic defense against sulphur-containing radicals. This family contains AhpC and TSA, as well as related proteins.

    \ \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation.
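The naming rule can be expressed as a small function. This is a sketch of the rule as worded above, not an official WHO/IUIS tool, and both function names are hypothetical. The disambiguation step applies the "add letters as necessary" clause literally to every colliding name. It assumes that colliding species epithets differ in spelling.

```python
from collections import defaultdict


def allergen_designation(genus: str, species: str, number: int, species_letters: int = 1) -> str:
    """First three letters of the genus, the species letter(s), then the allergen number."""
    return f"{genus[:3].capitalize()} {species[:species_letters].lower()} {number}"


def species_letters_needed(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Smallest species-prefix length that keeps otherwise identical designations apart."""
    groups = defaultdict(set)
    for genus, species in pairs:
        groups[(genus[:3].lower(), species[:1].lower())].add(species.lower())
    needed = {}
    for genus, species in pairs:
        rivals = groups[(genus[:3].lower(), species[:1].lower())]
        k = 1
        # Lengthen the species prefix until every epithet in the group is distinct.
        while len({s[:k] for s in rivals}) < len(rivals):
            k += 1
        needed[(genus, species)] = k
    return needed


print(allergen_designation("Aspergillus", "fumigatus", 3))  # Asp f 3
letters = species_letters_needed([("Aspergillus", "fumigatus"), ("Aspergillus", "flavus")])
print(allergen_designation("Aspergillus", "flavus", 13, letters[("Aspergillus", "flavus")]))  # Asp fl 13
```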

    \

    The allergens in this family include allergens with the following designations: Asp f 3, Mal f 2 and Mal f 3.

    \ 3948 IPR007675 \

    Protein F15 is found in a number of Poxviruses.

    \ 2161 IPR007461 \

    Proteins in this family often also contain an SH3 domain (), or a FYVE zinc finger ().

    \ 2860 IPR005566 \

    Expression of Hydrophobic Abundant protein is thought to be developmentally regulated and possibly involved in spherule cell wall formation PUBMED:3170484.

    \ 2528 IPR000259 \ Members of this family of bacterial proteins are involved in regulating the length and mediating the adhesion of fimbriae. Fimbriae (also called pili) are polar filaments that radiate from the surface of the bacterium to a length of 0.5-1.5 micrometers and enable bacteria to colonize the epithelium of specific host organs. Fimbriae also promote virulence PUBMED:10066469, PUBMED:1681580, PUBMED:2890081.\ 2132 IPR002727 \ This family includes prokaryotic proteins of unknown function, as well as a protein annotated as the pit accessory protein from Sinorhizobium meliloti. However, the function of this protein is also unknown (Pit stands for phosphate transport) PUBMED:8013901.\ 1542 IPR002573 \ Choline kinase (ATP:choline phosphotransferase, ) catalyses the committed step in the synthesis of phosphatidylcholine by the CDP-choline pathway PUBMED:9506987.\ 7584 IPR011667 \ This region is found in a number of hypothetical proteins thought to be expressed by the eukaryote Encephalitozoon cuniculi, an obligate intracellular microsporidial parasite. It is approximately 200 residues long.\ 4484 IPR007730 \

    This 35 residue repeat is found in bacterial proteins involved in sporulation and cell division, such as FtsN, CwlM and RlpA. This repeat might be involved in binding peptidoglycan. FtsN is an essential cell division protein with a simple bitopic topology: a short N-terminal cytoplasmic segment fused to a large carboxy periplasmic domain through a single transmembrane domain. The repeats lie at the periplasmic C-terminus, which has an RNP-like fold PUBMED:15101973. FtsN localises to the septum ring complex. The CwlM protein is a cell wall hydrolase, where the C-terminal region, including the repeats, determines substrate specificity PUBMED:1495475. RlpA is a rare lipoprotein A protein that may be important for cell division. Its N-terminal cysteine may be attached to thioglyceride and N-fatty acyl residues PUBMED:3316191.

    \ 4659 IPR005683 \

    The mitochondrial protein translocase (MPT) family, which brings nuclear-encoded preproteins into mitochondria, is very complex, with 19 currently identified protein constituents. These proteins include several chaperone proteins, four proteins of the outer membrane translocase (Tom) import receptor, five proteins of the Tom channel complex, five proteins of the inner membrane translocase (Tim) and three "motor" proteins. The inner membrane translocase is formed of a complex with a number of proteins, including the Tim17, Tim23 and Tim44 subunits. This family is specific for the Tom22 proteins.

    \ 333 IPR005517 \ This domain is found in elongation factor G, elongation factor 2 and some tetracycline resistance proteins and adopts a ribosomal protein S5 domain 2-like fold.\ 3437 IPR007385 \

    This family contains MukE proteins, which are involved in the segregation and condensation of prokaryotic chromosomes. MukE, along with MukF (), interacts with MukB () in vivo, forming a complex that is required for chromosome condensation and segregation in Escherichia coli PUBMED:10545099. The Muk complex appears to be similar to the SMC-ScpA-ScpB complex in other prokaryotes, where MukB is the homologue of SMC PUBMED:12065423. ScpA () and ScpB () have little sequence similarity to MukE or MukF, though they are predicted to be structurally similar, being predominantly alpha-helical with coiled-coil regions.

    \ 2266 IPR006907 \ This family includes several uncharacterised mouse proteins.\ 2077 IPR007315 \ This is a family of eukaryotic membrane proteins with unknown function.\ 5313 IPR006612 \

    Zinc finger domains PUBMED:3125980, PUBMED: are nucleic acid-binding protein structures first identified in the Xenopus laevis transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino acid residues, including 2 conserved Cys and 2 conserved His residues in a C-x(2)-C-x(12)-H-x(3)-H type motif. The 12 residues separating the second Cys and the first His are mainly polar and basic, implicating this region in particular in nucleic acid binding. The zinc finger motif is an unusually small, self-folding domain in which Zn is a crucial component of its tertiary structure. All bind 1 atom of Zn in a tetrahedral array to yield a finger-like projection, which interacts with nucleotides in the major groove of the nucleic acid. The Zn binds to the conserved Cys and His residues. Fingers have been found to bind to about 5 base pairs of nucleic acid containing short runs of guanine residues. They have the ability to bind to both RNA and DNA, a versatility not demonstrated by the helix-turn-helix motif. The zinc finger may thus represent the original nucleic acid-binding protein. It has also been suggested that a Zn-centred domain could be used in protein interactions, e.g. in protein kinase C. Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc-coordinating residues are cysteines, while the second pair are histidines.
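The C2H2 spacing described above places two residues between the cysteines, 12 between the second Cys and the first His, and three between the histidines. That spacing translates directly into a regular expression. The minimal sketch below uses the fixed spacing exactly as stated, and the helper name is hypothetical. Natural C2H2 fingers tolerate some variation in these gaps, which this pattern deliberately ignores.

```python
import re

# Fixed C-x(2)-C-x(12)-H-x(3)-H spacing; the lookahead reports overlapping windows.
C2H2_FIXED = re.compile(r"(?=(C.{2}C.{12}H.{3}H))")


def c2h2_candidates(sequence: str) -> list[int]:
    """0-based start positions of windows matching the fixed C2H2 spacing."""
    return [m.start() for m in C2H2_FIXED.finditer(sequence.upper())]


demo = "MK" + "CAAC" + "Q" * 12 + "HAAAH" + "GG"  # synthetic test string
print(c2h2_candidates(demo))  # [2]
```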

    \

    The THAP domain is an ~90-residue domain restricted to animals, which is shared between the THAP family of cellular DNA-binding proteins and transposases from mobile genomic parasites. The defined THAP domain includes: a C2CH signature (consensus: C-x(2,4)-C-x(35,50)-C-x(2)-H); three additional key residues that are strictly conserved in all THAP domains found to date (THAP1 amino acids P26, W36, F58); a C-terminal AVPTIF box; and several other conserved amino acid positions with distinct physicochemical properties (e.g. hydrophobic and polar). The THAP domain can be found in one or more copies and can be associated with other domains, such as the C2H2-type zinc finger. The THAP domain is thought to be a DNA-binding domain (DBD) PUBMED:12575992, PUBMED:12717420.
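The C2CH consensus is written in PROSITE-style pattern notation, which can be converted mechanically into a regular expression. The sketch below handles only a small subset of that syntax, and the helper names are hypothetical. It checks only the C2CH spacing, not the conserved P26, W36 and F58 positions or the AVPTIF box, so a hit is at most a candidate.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a subset of PROSITE pattern syntax to a Python regular expression.

    Supports single residues (C), any residue (x), allowed sets ([ST]),
    forbidden sets ({P}) and repeat counts such as x(2) or x(35,50).
    Terminal anchors (< and >) are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        if low is None:
            parts.append(residue)
        elif high is None:
            parts.append(f"{residue}{{{low}}}")
        else:
            parts.append(f"{residue}{{{low},{high}}}")
    return "".join(parts)


THAP_C2CH = re.compile(prosite_to_regex("C-x(2,4)-C-x(35,50)-C-x(2)-H"))


def has_thap_c2ch(sequence: str) -> bool:
    """True if the sequence contains a window fitting the C2CH spacing."""
    return THAP_C2CH.search(sequence.upper()) is not None


print(prosite_to_regex("C-x(2,4)-C-x(35,50)-C-x(2)-H"))  # C.{2,4}C.{35,50}C.{2}H
```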

    \ 4614 IPR007076 \

    This domain is found in a number of bacterial proteins, including the TfoX gene product of Haemophilus influenzae. TfoX may play a key role in the development of genetic competence by regulating the expression of late competence-specific genes PUBMED:7724607. This family corresponds to the presumed N-terminal domain of TfoX. The domain is found in association with the C-terminal domain in some, but not all, members of this group, suggesting that it is an autonomous and functionally unrelated domain.

    \ 3536 IPR000744 \

    Regulated exocytosis of neurotransmitters and hormones, as well as intracellular traffic, requires fusion of two lipid bilayers. SNARE proteins are thought to form a protein bridge, the SNARE complex, between an incoming vesicle and the acceptor compartment. SNARE proteins contribute to the specificity of membrane fusion, implying that the mechanisms by which SNAREs are targeted to subcellular compartments are important for specific docking and fusion of vesicles. This mechanism involves a family of conserved proteins, members of which appear to function at all sites of constitutive and regulated secretion in eukaryotes PUBMED:7846761. Among them are 2 types of cytosolic protein, NSF (N-ethylmaleimide-sensitive factor) and the SNAPs (alpha-, beta- and gamma-soluble NSF attachment proteins). The yeast vesicular fusion protein sec17, a cytoplasmic peripheral membrane protein involved in vesicular transport between the endoplasmic reticulum and the Golgi apparatus, shows a high degree of sequence similarity to the alpha-SNAP family.

    \

    SNAP-25 and its non-neuronal homologue Syndet/SNAP-23 are synthesized as soluble proteins in the cytosol. Both are palmitoylated at cysteine residues clustered in a loop between the N- and C-terminal coils, and palmitoylation is essential for membrane binding and plasma membrane targeting. The C-terminal and N-terminal helices of SNAP-25 are each targeted to the plasma membrane by two distinct cysteine-rich domains, and appear to regulate the availability of SNAP-25 to form complexes with SNAREs PUBMED:12140265.

    \ 291 IPR007613 \ This is a domain which occurs in several uncharacterised plant proteins. It is predicted to contain several transmembrane helices and is usually found together with a cytochrome domain.\ 6463 IPR010586 \

    This family consists of several nodulation protein NolV sequences from different Rhizobium species PUBMED:8412662. The function of this family is unclear.

    \ 342 IPR006885 \ This is a family of pankaryotic NADH-ubiquinone oxidoreductase subunits from complex I of the electron transport chain, initially identified in Neurospora crassa as a 21 kDa protein PUBMED:7947902.\ 8082 IPR013254 \

    The sperm-activating peptides (SAPs) are isolated from egg-conditioned medium (egg jelly) of sea urchins. SAPs have several effects on sea urchin spermatozoa: they stimulate sperm respiration and motility through intracellular alkalinization, and they cause a transient elevation of cAMP, cGMP and Ca2+ levels in sperm cells PUBMED:1756858, PUBMED:2059627.

    \ 7821 IPR012940 \

    This domain occurs in some putative nucleic acid binding proteins. One of these proteins has been partially characterised PUBMED:15488989 and contains two putative phosphorylation sites and a possible dimerisation / leucine zipper domain.

    \ 3924 IPR007755 \ This is a family of conserved Chordopoxvirinae A11 family proteins. A conserved region spans the entire protein in the majority of family members.\ 2177 IPR007508 \ This is a family of hypothetical proteins. It is present in prokaryotes and Arabidopsis.\ 2629 IPR006726 \ This domain includes a conserved region found in two proteins associated with fusaric acid resistance, from Burkholderia cepacia PUBMED:1370369 and from Klebsiella oxytoca. The function of this region is unknown.\ 4433 IPR003000 \ These sequences represent the Sir2 family of NAD+-dependent deacetylases. The Silent Information Regulator protein of Saccharomyces cerevisiae (Sir2p) is one of several factors critical for silencing at least three loci. Among them, it is unique because it silences the rDNA as well as the mating-type loci and telomeres. Sir2p interacts in a complex with itself and with Sir3p and Sir4p, two proteins that are able to interact with nucleosomes. In addition, Sir2p interacts with ubiquitination factors and/or complexes PUBMED:9214640. Unlike Sir3p and Sir4p, for which no homologues are known, Sir2p is part of a multigene family in yeast, the homologues being HST1, HST2, HST3 and HST4. \ \ \ Highly conserved structural homologues also occur in other organisms, ranging from bacteria to plants and humans. Proteins of this family have been proposed to play a role in silencing, chromosome stability and ageing PUBMED:7498786. In addition, an in vitro ADP-ribosyltransferase activity has been associated with Escherichia coli and human members of this family PUBMED:10381378. Homologues of Sir2 share a core domain including the GAG and NID motifs and a putative C4 zinc finger. The regions containing these three conserved motifs are individually essential for Sir2 silencing function, as are the four cysteines PUBMED:10473645. 
In addition, the conserved residues HG next to the putative zinc finger have been shown to be essential for the ADP-ribosyltransferase activity PUBMED:10381378. \ \ \ Sir2-like enzymes catalyze a reaction in which the cleavage of NAD(+) and histone and/or protein deacetylation are coupled to the formation of O-acetyl-ADP-ribose, a novel metabolite. The dependence of the reaction on both NAD(+) and the generation of this potential second messenger offers new clues to understanding the function and regulation of nuclear, cytoplasmic and mitochondrial Sir2-like enzymes PUBMED:12517451.\ \ \ \ 6177 IPR010468 \

    This domain is found in several mammalian hormone-sensitive lipase (HSL) proteins. Hormone-sensitive lipase, a key enzyme in fatty acid mobilisation, overall energy homeostasis, and possibly steroidogenesis, is acutely controlled via reversible phosphorylation by catecholamines and insulin PUBMED:3420405.

    \ 5151 IPR007988 \

    This family consists of several variants of the human and chimpanzee (Pan troglodytes) sperm antigen proteins (HE2 and EP2, respectively). The EP2 gene codes for a family of androgen-dependent, epididymis-specific secretory proteins. The EP2 gene uses alternative promoters and differential splicing to produce a family of variant messages. The translated putative protein variants differ significantly from each other. Some of these putative proteins have similarity to beta-defensins, a family of antimicrobial peptides PUBMED:10819450.

    \ 758 IPR002478 \ The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally, the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation initiation factor eIF1/SUI1; these proteins may constitute a novel type of translation factor. Unexpectedly, the PUA domain was also detected in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of these enzymes in the regulation of the expression of other genes PUBMED:10093218. The PUA domain is predicted to be an RNA-binding domain.\ 7337 IPR011088 \

    The members of this family are restricted to the Gammaproteobacteria and Epsilonproteobacteria. The function of these proteins is unknown.

    \ 3184 IPR007786 \ The baculovirus Autographa californica nuclear polyhedrosis virus encodes a DNA-dependent RNA polymerase that is required for transcription of viral late genes. This polymerase is composed of four equimolar subunits: LEF-8, LEF-4, LEF-9 and p47. LEF-9 is homologous to the largest (beta') subunit of prokaryotic DNA-directed RNA polymerase PUBMED:12124466.\ 6219 IPR004575 \

    MAT1 (menage a trois 1) is a RING finger protein with a characteristic C3HC4 motif located in the N-terminal domain. MAT1 stabilizes the cyclin H-CDK7 complex to form a functional CDK-activating kinase (CAK) enzymatic complex, which then goes on to activate many of the CDK enzymes intimately involved in the cell cycle PUBMED:11007478. CDK7 forms a stable complex with cyclin H and MAT1 in vivo only when phosphorylated on either one of two residues (Ser164 or Thr170) in its T-loop. The requirement for MAT1 for the activation of CAK can be bypassed by the phosphorylation of CDK7 on the T-loop. The two mechanisms for CDK7 complex stabilization and activation (MAT1 addition and T-loop phosphorylation), which can operate independently in vitro, actually cooperate under physiological conditions to maintain complex integrity. With prolonged exposure to elevated temperature, dissociation to monomeric subunits occurs in vivo when CDK7 is dephosphorylated, even in the presence of MAT1 PUBMED:11447116.
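
    The in vitro rule described above can be restated as a small Boolean model: either MAT1 binding or phosphorylation of Ser164 or Thr170 in the T-loop suffices to stabilise and activate the CDK7 complex. This is a hypothetical toy sketch. The class and function names are invented, and the model deliberately ignores in vivo behaviour, where both mechanisms cooperate and dephosphorylated CDK7 dissociates at elevated temperature even with MAT1 present.

```python
from dataclasses import dataclass
from typing import FrozenSet

# The two CDK7 T-loop residues named in the text.
T_LOOP_SITES: FrozenSet[str] = frozenset({"Ser164", "Thr170"})

@dataclass(frozen=True)
class Cdk7State:
    """Hypothetical snapshot of CDK7: MAT1 binding plus phosphorylated sites."""
    mat1_bound: bool
    phospho_sites: FrozenSet[str] = frozenset()

    @property
    def t_loop_phosphorylated(self) -> bool:
        # Phosphorylation of either T-loop residue counts.
        return bool(self.phospho_sites & T_LOOP_SITES)

def cak_active_in_vitro(state: Cdk7State) -> bool:
    """Toy in vitro rule: MAT1 binding OR T-loop phosphorylation suffices."""
    return state.mat1_bound or state.t_loop_phosphorylated
```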

    \

    The Cyclin H-MAT1-CDK7 complex also forms part of TFIIH, a multiprotein complex required for both transcription and DNA repair.

    \ 175 IPR001117 \ Multicopper oxidases are enzymes that possess three spectroscopically different copper centers PUBMED:2404764, PUBMED:1995346. The enzymes that belong to this family include laccase, which in fungi and plants oxidizes many different types of phenols and diamines; ascorbate oxidase, a higher plant enzyme; and ceruloplasmin, a protein found in the serum of mammals and birds that oxidizes a great variety of inorganic and organic substances.\ \

    The multicopper oxidase, type 1 domain is also present in proteins that have lost the ability to bind copper. Proteins that belong to this group include copper resistance protein A (copA) from a plasmid in Pseudomonas syringae, blood coagulation factors V (Fa V) and VIII (Fa VIII) PUBMED:8293473, and yeast FET3, which is required for ferrous iron uptake.

    \ 3885 IPR006904 \

    These sequences are a family of uncharacterised hypothetical proteins restricted to eukaryotes. One member is a sequence from Nicotiana tabacum that is up-regulated in response to tobacco mosaic virus (TMV) infection.

    \ 7072 IPR010827 \

    This motif is found primarily in bacterial surface antigens, normally as a variable number of repeats at the N terminus. The C terminus of these proteins is normally represented by another entry. There may also be a relationship to the hemolysin activator HlyB. The alignment centres on a -GY- or -GF- motif. Some members of this family are found in the mitochondria. It is predicted to have a mixed alpha/beta secondary structure.

    \ 6307 IPR010513 \

    This domain is found in a group of endoribonucleases PUBMED:9637683. Specifically, these enzymes cleave an intron from the yeast Hac1 mRNA, which causes it to be much more efficiently translated.

    \ 2637 IPR001770 \

    Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma PUBMED:14762218. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation PUBMED:15294442. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound (inactive) trimeric G protein to be released from the receptor and to dissociate into an active, GTP-bound alpha subunit and a beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C and ion channels. These effectors in turn regulate the intracellular concentrations of second messengers, such as cAMP, diacylglycerol, and sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal PUBMED:15119945. The length of the G protein signal is controlled by how long the alpha subunit remains GTP-bound, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications PUBMED:11313912.
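
    The activation cycle described above can be summarised as a three-state machine. The Python sketch below is a simplified illustration under stated assumptions. The state and event names are hypothetical, receptor release and subunit dissociation are collapsed into a single step, and regulation of the signal's duration by RGS proteins or covalent modification is not modelled.

```python
from enum import Enum, auto

class GProteinState(Enum):
    INACTIVE_TRIMER = auto()   # alpha-GDP associated with the beta/gamma dimer
    RECEPTOR_ENGAGED = auto()  # trimer bound by a ligand-activated GPCR
    SIGNALLING = auto()        # alpha-GTP and beta/gamma dimer act on effectors

# Allowed (state, event) -> next state pairs, following the cycle in the text.
TRANSITIONS = {
    (GProteinState.INACTIVE_TRIMER, "receptor_activated"): GProteinState.RECEPTOR_ENGAGED,
    (GProteinState.RECEPTOR_ENGAGED, "gdp_for_gtp_exchange"): GProteinState.SIGNALLING,
    (GProteinState.SIGNALLING, "gtp_hydrolysis"): GProteinState.INACTIVE_TRIMER,
}

def step(state: GProteinState, event: str) -> GProteinState:
    """Advance the toy cycle by one event, rejecting out-of-order events."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None

def run(events: list[str]) -> list[GProteinState]:
    """Apply events in order, starting from the inactive trimer."""
    state = GProteinState.INACTIVE_TRIMER
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history
```

    For example, `run(["receptor_activated", "gdp_for_gtp_exchange", "gtp_hydrolysis"])` passes through the signalling state and ends back at the inactive trimer. Attempting hydrolysis before GTP has been loaded raises an error.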

    \

    There are several isoforms of each subunit, many of which have splice variants, which together can make up hundreds of combinations of G proteins. The specific combination of subunits in a heterotrimeric G protein affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli PUBMED:9278091, PUBMED:11882385. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions.

    \ \

    This entry represents the G protein gamma subunit and the GGL (G protein gamma-like) domain, which are related in sequence and consist of an extended alpha-helical polypeptide. The G protein gamma subunit forms a stable dimer with the beta subunit, but it does not make any contact with the alpha subunit, which contacts the opposite face of the beta subunit. The GGL domain is found in several RGS proteins. GGL domains can interact with beta subunits to form novel dimers that prevent gamma subunit binding, and may prevent heterotrimer formation by inhibiting alpha subunit binding. The interaction between neuron-specific isoforms of G protein beta-5 and RGS GGL domains may represent a general mode of binding between beta-propeller proteins and their partners PUBMED:11331068.

    \ 1017 IPR006895 \

    COPII (coat protein complex II)-coated vesicles carry proteins from the endoplasmic reticulum (ER) to the Golgi complex PUBMED:11535824. COPII-coated vesicles form on the ER by the stepwise recruitment of three cytosolic components: Sar1-GTP to initiate coat formation, Sec23/24 heterodimer to select SNARE and cargo molecules, and Sec13/31 to induce coat polymerisation and membrane deformation PUBMED:12239560.

    \

    Sec23p and Sec24p are structurally related, folding into five distinct domains: a beta-barrel, a zinc finger, an alpha/beta trunk domain, an all-helical region and a C-terminal gelsolin-like domain. This entry describes an approximately 55-residue Sec23/24 zinc-binding domain, which lies against the beta-barrel at the periphery of the complex.

    \ \ 1468 IPR002750 \ Members of this family are involved in cobalamin synthesis. One member has been designated precorrin methylase (cbiH) but in fact represents a fusion between cbiH and cbiG. Because other multi-functional proteins involved in cobalamin biosynthesis, including CysG, CobL (CbiET), CobIJ and CobA-HemD, catalyse adjacent steps in the pathway, it is possible that CbiG catalyses a reaction step adjacent to that of CbiH. In the anaerobic pathway such a step could be the formation of a gamma lactone, which is thought to help mediate the anaerobic ring contraction process PUBMED:9742225.\ 1467 IPR002748 \ CbiD is essential for cobalamin biosynthesis in both Salmonella typhimurium and Bacillus megaterium, but no functional role has been ascribed to the protein. The CbiD protein has a putative S-AdoMet binding site. It is possible that CbiD has the same role as CobF in undertaking the C-1 methylation and deacylation reactions required during the ring contraction process PUBMED:9742225.\ 4703 IPR004291 \ Transposase proteins are necessary for efficient DNA transposition. This family includes the bacterial insertion sequence (IS) element IS66 from Agrobacterium tumefaciens PUBMED:6095299. IS66 may cause genetic and structural variations of the T region and the vir region of the octopine Ti plasmids PUBMED:6095299.\ 1163 IPR005506 \ This set of repeats is found in a small family of secreted proteins of unknown function, which may be involved in signal transduction.\ 1722 IPR004006 \ Dihydroxyacetone kinase (glycerone kinase) catalyses the ATP-dependent phosphorylation of glycerone to glycerone phosphate in the glycerol utilization pathway. This is the kinase domain of the dihydroxyacetone kinase family.\ 7519 IPR011649 \ The cyanobacterial clock proteins KaiA and KaiB are proposed as regulators of the circadian rhythm in cyanobacteria. 
Mutations in both proteins have been reported to alter or abolish circadian rhythmicity. KaiB adopts an alpha-beta meander motif and is found to be a dimer PUBMED:15071498.\ 2471 IPR004455 \ The function of F420-dependent NADP reductase is the transfer of electrons from reduced coenzyme F420 into an electron transport chain. It catalyses the reduction of F420 with NADPH and the reduction of NADP(+) with F420H(2).\ 5137 IPR007974 \

    This family consists of tenuivirus NS-3 (PV3 or GV3) proteins. The function of this protein is unknown, although it is thought to be a replication protein.

    \ 3662 IPR001016 \ Paramyxoviridae, like other non-segmented negative strand RNA viruses, have an RNA-dependent RNA polymerase composed of two subunits, a large protein L and a phosphoprotein P. The L protein confers the RNA polymerase activity on the complex while the P protein acts as a transcription factor PUBMED:9224928.\ 7817 IPR012945 \

    This domain is involved in the folding pathway of tubulins PUBMED:12225668.

    \ 1026 IPR007875 \ This family consists of eukaryotic Sprouty protein homologues. Sprouty proteins have been revealed as inhibitors of the Ras/mitogen-activated protein kinase (MAPK) cascade, a pathway crucial for developmental processes initiated by activation of various receptor tyrosine kinases PUBMED:11731251. In the mouse, the sprouty gene has been found to be expressed in the brain, cochlea, nasal organs, teeth, salivary gland, lungs, digestive tract, kidneys and limb buds PUBMED:12391162.\ 6289 IPR009462 \

    This entry represents several eukaryotic domains of unknown function, which are present in chromodomain helicase DNA-binding proteins. This domain is often found in conjunction with several other domains.

    \ 7507 IPR011627 \

    This is the receptor domain region of the alpha-2-macroglobulin family.

    \ \

    The alpha-macroglobulin (aM) family of proteins includes protease inhibitors PUBMED:2473064, typified by the human tetrameric alpha-2-macroglobulin (a2M); they belong to the MEROPS proteinase inhibitor family I39, clan IL. These protease inhibitors share several defining properties, which include (i) the ability to inhibit proteases from all catalytic classes, (ii) the presence of a 'bait region' and a thiol ester, (iii) a similar protease inhibitory mechanism and (iv) the inactivation of the inhibitory capacity by reaction of the thiol ester with small primary amines. \ aM protease inhibitors inhibit by steric hindrance PUBMED:2472396. The mechanism involves protease cleavage of the bait region, a segment of the aM that is particularly susceptible to proteolytic cleavage, which initiates a conformational change such that the aM collapses about the protease. In the resulting aM–protease complex, the active site of the protease is sterically shielded, thus substantially decreasing access to protein substrates. Two additional events occur as a consequence of bait region cleavage, namely (i) the beta-cysteinyl-gamma-glutamyl thiol ester becomes highly reactive and (ii) a major conformational change exposes a conserved COOH-terminal receptor binding domain (RBD) PUBMED:2469470. RBD exposure allows the aM protease complex to bind to clearance receptors and be removed from circulation PUBMED:2430968. Tetrameric, dimeric and, more recently, monomeric aM protease inhibitors have been identified PUBMED:9914899, PUBMED:10426429.

    \ \ 7389 IPR011437 \

    These proteins have four conserved cysteines, which is suggestive of a metal binding function. This domain may be found on its own or duplicated in the proteins.

    \ 7330 IPR011106 \

    The MANSC (motif at N terminus with seven cysteines) domain is a module with a well-conserved seven-cysteine motif that is present at the N terminus of higher multicellular animal membrane and extracellular proteins. It is possible that some of the cysteine residues in the MANSC domain form structurally important disulfide bridges. All of the MANSC-containing proteins contain predicted transmembrane regions and signal peptides. It has been proposed that the MANSC domain in HAI-1 might function through binding with hepatocyte growth factor activator and matriptase PUBMED:15124631.

    \ 4735 IPR002308 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Cysteinyl-tRNA synthetase is an alpha monomer and belongs to class Ia.
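
    The class assignments listed above amount to a lookup table. The following Python sketch encodes them with one-letter amino acid codes, together with the general 2'- versus 3'-hydroxyl preference described in the text. The function names are hypothetical. The sketch follows the text's general rule only; exceptions to either the class assignment or the attachment site in particular organisms or enzymes are not modelled.

```python
# Class membership as listed in the text, using one-letter amino acid codes.
CLASS_I = frozenset("RCEQILMYWV")   # Arg Cys Glu Gln Ile Leu Met Tyr Trp Val
CLASS_II = frozenset("ANDGHKFPST")  # Ala Asn Asp Gly His Lys Phe Pro Ser Thr

def synthetase_class(aa: str) -> str:
    """Return 'I' or 'II' for the synthetase charging one-letter amino acid `aa`."""
    code = aa.strip().upper()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise ValueError(f"no class assignment for {aa!r}")

def preferred_hydroxyl(aa: str) -> str:
    """General rule from the text: class I uses the 2'-OH, class II the 3'-OH."""
    return "2'-OH" if synthetase_class(aa) == "I" else "3'-OH"
```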

    \ 2286 IPR006974 \

    This is a family of hypothetical proteins from Chlamydia pneumoniae.

    \ 1360 IPR006804 \

    The members of this group of sequences contain a conserved N-terminal domain which is found in the BCL7 family. The function of BCL7 proteins is unknown, though they may be involved in early development. Notably, BCL7B is commonly hemizygously deleted in patients with Williams syndrome PUBMED:9931421.

    \ 3818 IPR006479 \

    This group of sequences represents one of more than 30 families of phage proteins, all lacking detectable homology with each other, that are known or believed to act as holins. Holins act in host cell lysis by bacteriophages. Members of this family are found in phage PBSX and phage SPP1, among others.

    \ 1325 IPR002633 \ The bacteriocins are small peptides that inhibit the growth of various bacteria. Bacteriocins of lactic acid bacteria may inhibit their target cells by permeabilizing the cell membrane PUBMED:9611809.\ 6859 IPR010747 \

    This family consists of several hypothetical bacterial proteins of around 170 residues in length. The function of this family is unknown.

    \ 3110 IPR001228 \ The bacterial ispD protein catalyzes the third step of the deoxyxylulose-5-phosphate (DXP) pathway of isoprenoid biosynthesis: the formation of 4-diphosphocytidyl-2C-methyl-D-erythritol from CTP and 2C-methyl-D-erythritol 4-phosphate.\ 5829 IPR010303 \

    This domain of unknown function is found in several transcriptional co-activators, including the CREB-binding protein, which is an acetyltransferase that acetylates histones, giving a specific tag for transcriptional activation. CREB-binding protein also acetylates non-histone proteins.

    \ 2309 IPR007769 \ This family contains poxvirus proteins belonging to the A19 family. The proteins are of unknown function.\ 1779 IPR001762 \

    The adhesion of platelets to the extracellular matrix, and platelet-platelet interactions, are essential in thrombosis and haemostasis PUBMED:1859363. Platelets adhere to damaged blood vessels, release biologically active chemicals, and aggregate, a function that is inhibited in normal blood PUBMED:1859363. The binding of fibrinogen to the glycoprotein IIb/IIIa complex of activated platelets is essential to platelet aggregation and is induced by many agonists, including ADP, collagen, thrombin, epinephrine and prostaglandin endoperoxide analogue.

    \

    Snake venoms affect blood coagulation and platelet function in a complex manner PUBMED:1755841: some induce aggregation and release reactions, and some inhibit them PUBMED:1859363. Disintegrin, a component of some snake venoms, does not inhibit the release reactions; instead it inhibits platelet aggregation by blocking the binding of fibrinogen to the receptor glycoprotein complex of activated platelets PUBMED:1755841. Disintegrins act by binding to the integrin glycoprotein IIb-IIIa receptor on the platelet surface, and inhibit aggregation induced by ADP, thrombin, platelet-activating factor and collagen. The role of disintegrin in preventing blood coagulation renders it of medical interest, particularly with regard to its use as an anti-coagulant PUBMED:1385408.

    \

    Disintegrins are peptides of about 70 amino acid residues that contain many cysteines all involved in disulphide bonds PUBMED:2036389. Disintegrins contain an Arg-Gly-Asp (RGD) sequence, a recognition site of many adhesion proteins. The RGD sequence of disintegrins is postulated to interact with the glycoprotein IIb-IIIa complex.

    \

    Disintegrins have been sequenced from a number of snake species. They include albolabrin, applagin, barbourin, batroxostatin, bitistatin, echistatin, elegantin, eristicophin, flavoridin, halysin, kistrin, tergeminin and triflavin.

    \

    Some other proteins are known to contain a disintegrin domain:

    \ \

    The schematic representation of the structure of a typical disintegrin is shown below:

    \
    \
                                       +---+
           +--------+             +----|---|--------------------+
           |        |             |    |   |                    |
      xxxxxCxCxxxxxxCCxxxxCxxxxxxxCxxxxCCxxCxxxxxxxxCxxxRGDxxxxxCxxxxxxCxxxxxxx
             |       |    |             |           |                  |
             +-------+    +-------------+           +------------------+
    \
    'C': conserved cysteine involved in a disulphide bond.
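
    Two sequence features are highlighted above: the Arg-Gly-Asp (RGD) tripeptide, and a cysteine-rich chain in which every cysteine is disulphide-bonded. Both can be tallied with a few lines of Python. The function name is hypothetical and the check is descriptive only. It is not a disintegrin classifier, since RGD also occurs in many other adhesion proteins. Because each cysteine is described as paired in a disulphide bond, an odd cysteine count is flagged as unexpected.

```python
import re

def disintegrin_features(seq: str) -> dict:
    """Tally features the text associates with disintegrins (hypothetical helper)."""
    seq = seq.upper()
    n_cys = seq.count("C")
    return {
        "length": len(seq),
        "cysteines": n_cys,
        # Every cysteine is described as disulphide-bonded, so expect an even count.
        "cysteines_even": n_cys % 2 == 0,
        # 0-based start index of each Arg-Gly-Asp tripeptide.
        "rgd_positions": [m.start() for m in re.finditer("RGD", seq)],
    }
```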
    
    \ 3204 IPR000734 \ Triglyceride lipases are lipolytic enzymes that hydrolyse ester linkages of triglycerides PUBMED:3147715. Lipases are widely distributed in animals, plants and prokaryotes. At least three tissue-specific isozymes exist in higher vertebrates: pancreatic, hepatic and gastric/lingual. These lipases are closely related to each other and to lipoprotein lipase, which hydrolyses triglycerides of chylomicrons and very low density lipoproteins (VLDL) PUBMED:2917565. The most conserved region in all these proteins is centered around a serine residue which has been shown PUBMED:2304545 to participate, with a histidine and an aspartic acid residue, in a charge relay system. Such a region is also present in lipases of prokaryotic origin and in lecithin-cholesterol acyltransferase (LCAT) PUBMED:3458198, which catalyzes fatty acid transfer between phosphatidylcholine and cholesterol.\ 6833 IPR010737 \

    This family represents a conserved region approximately 200 residues long within bacterial type III effector Hrp-dependent outer proteins (Hops). These effectors are delivered by a type III secretion system, which certain Gram-negative bacterial pathogens of plants and animals use to inject virulence effector proteins into host cells PUBMED:10922033. Many members of this family are hypothetical proteins.

    \ 6779 IPR010711 \

    This family consists of several group XII secretory phospholipase A2 precursor (PLA2G12) proteins. Group XII and group V PLA(2)s are thought to participate in the helper T cell immune response through release of immediate second signals and generation of downstream eicosanoids PUBMED:11278438.

    \ 5223 IPR008466 \ This family consists of several mammalian protein phosphatase inhibitor 1 (IPP-1) and dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) proteins. Protein phosphatase inhibitor-1 is involved in signal transduction and is an endogenous inhibitor of protein phosphatase-1 PUBMED:10960791. It has been demonstrated that DARPP-32, when phosphorylated, can inhibit protein phosphatase-1 PUBMED:12543476. DARPP-32 has a key role in many neurotransmitter pathways throughout the brain and has been shown to be involved in controlling receptors, ion channels and other physiological factors, including the brain's response to drugs of abuse such as cocaine, opiates and nicotine. DARPP-32 is reciprocally regulated by the two neurotransmitters most often implicated in schizophrenia: dopamine and glutamate. Dopamine activates DARPP-32 through the D1 receptor pathway and disables it through the D2 receptor. Glutamate, acting through the N-methyl-D-aspartate receptor, renders DARPP-32 inactive. A mutant form of DARPP-32 has been linked with gastric cancers PUBMED:12124342.\ 109 IPR006941 \ The major pathways of mRNA turnover in eukaryotes initiate with shortening of the poly(A) tail. CAF1 encodes a critical component of the major cytoplasmic deadenylase in yeast. Caf1p is required for normal mRNA deadenylation in vivo and localises to the cytoplasm. Caf1p copurifies with a Ccr4p-dependent poly(A)-specific exonuclease activity. Some members of this family contain a single-stranded nucleic acid binding domain, R3H.\ 1819 IPR001305 \

    Molecular chaperones are a diverse family of proteins that function to protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme that couples cycles of ATP binding, hydrolysis, and ADP release by an N-terminal ATP-hydrolyzing domain to cycles of sequestration and release of unfolded proteins by a C-terminal substrate binding domain. Dimeric GrpE is the co-chaperone for DnaK, and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold PUBMED:8016869. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. Thus the co-chaperones DnaJ and GrpE are capable of tightly regulating the nucleotide-bound and substrate-bound state of DnaK in ways that are necessary for the normal housekeeping functions and stress-related functions of the DnaK molecular chaperone cycle.

    Besides stimulating the ATPase activity of DnaK through its J-domain, DnaJ also associates with unfolded polypeptide chains and prevents their aggregation PUBMED:15063739. Thus, DnaK and DnaJ may bind to one and the same polypeptide chain to form a ternary complex. The formation of a ternary complex may result in cis-interaction of the J-domain of DnaJ with the ATPase domain of DnaK. An unfolded polypeptide may enter the chaperone cycle by associating first either with ATP-liganded DnaK or with DnaJ. DnaK interacts with both the backbone and side chains of a peptide substrate; it thus shows binding polarity and admits only L-peptide segments. In contrast, DnaJ has been shown to bind both L- and D-peptides and is assumed to interact only with the side chains of the substrate.
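
    DnaK couples its nucleotide state to substrate binding: DnaJ stimulates ATP hydrolysis and GrpE accelerates ADP release. This cycle can be sketched as a toy state model. Two assumptions go beyond the text and are labelled in the code. First, the ADP-bound state is treated as holding the substrate and the ATP-bound state as releasing it, which is the conventional description of Hsp70 chaperones. Second, DnaK's weak basal ATPase and unassisted nucleotide exchange are treated as negligible. The class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DnaKModel:
    """Hypothetical toy model of one DnaK chaperone cycle."""
    nucleotide: str = "ATP"       # "ATP" or "ADP"
    substrate_held: bool = False

    def hydrolyse(self, dnaj_present: bool) -> None:
        # DnaJ stimulates DnaK's weak ATPase. Assumption: without DnaJ,
        # hydrolysis is treated as negligible in this toy model.
        if self.nucleotide != "ATP":
            raise RuntimeError("hydrolysis requires ATP-bound DnaK")
        if dnaj_present:
            self.nucleotide = "ADP"
            # Assumption: the ADP-bound state holds the substrate tightly.
            self.substrate_held = True

    def exchange(self, grpe_present: bool) -> None:
        # GrpE acts as nucleotide exchange factor, speeding ADP release.
        # Assumption: without GrpE, exchange is treated as negligible here.
        if self.nucleotide != "ADP":
            raise RuntimeError("exchange requires ADP-bound DnaK")
        if grpe_present:
            self.nucleotide = "ATP"   # ADP released, ATP rebinds
            # Assumption: the ATP-bound state releases the substrate.
            self.substrate_held = False
```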

    \ 2847 IPR001054 \

    Guanylate cyclases catalyse the formation of cyclic GMP (cGMP) from GTP. cGMP acts as an intracellular messenger, activating cGMP-dependent kinases and regulating cGMP-sensitive ion channels. The role of cGMP as a second messenger in vascular smooth muscle relaxation and retinal phototransduction is well established. Guanylate cyclase is found in both the soluble and particulate fractions of eukaryotic cells. The soluble and plasma membrane-bound forms differ in structure, regulation and other properties PUBMED:1349465, PUBMED:1356629, PUBMED:1680765, PUBMED:1982420. Most currently known plasma membrane-bound forms are receptors for small polypeptides. The soluble forms of guanylate cyclase are cytoplasmic heterodimers having alpha and beta subunits.

    \

    In all characterized eukaryotic guanylyl and adenylyl cyclases, cyclic nucleotide synthesis is carried out by the conserved class III cyclase domain.
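
    The reactions carried out by the class III cyclase domain can be written as follows. The pyrophosphate (PP_i) co-product is standard for these enzymes but is not stated in the text above.

```latex
\begin{aligned}
\mathrm{GTP} &\xrightarrow{\ \text{guanylate cyclase}\ } \mathrm{cGMP} + \mathrm{PP_i} \\
\mathrm{ATP} &\xrightarrow{\ \text{adenylyl cyclase}\ } \mathrm{cAMP} + \mathrm{PP_i}
\end{aligned}
```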

    \ 808 IPR007641 \

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase, compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Rpb2 is the second largest subunit of the RNA polymerase. This domain comprises the anchor and clamp structural domains. The clamp region (C-terminal) contains a zinc-binding motif, and is so named because of its interaction with the clamp domain found in Rpb1. The domain also contains a region termed switch 4. The switches within the polymerase are thought to signal different stages of transcription PUBMED:11313498.

    \ 2831 IPR007831 \

    This domain is found at the N terminus of members of the general secretory system II protein E. Proteins in this subfamily are typically involved in Type IV pilus biogenesis (e.g. ), though some are involved in other processes; for instance aggregation in Myxococcus xanthus (e.g. ) PUBMED:11073903.

    \ 6977 IPR009819 \

    This family consists of several Caenorhabditis elegans pes-10 and related proteins. Members of this family are typically around 400 residues in length. The function of this family is unknown.

    \ 4454 IPR000175 \

    Neurotransmitter transport systems are integral to the release, re-uptake and recycling of neurotransmitters at synapses. High affinity transport proteins found in the plasma membrane of presynaptic nerve terminals and glial cells are responsible for removing released transmitters from the extracellular space, thereby terminating their actions PUBMED:15336049. Plasma membrane neurotransmitter transporters fall into two structurally and mechanistically distinct families. The majority of the transporters constitute an extensive family of homologous proteins that derive energy from the co-transport of Na+ and Cl- in order to transport neurotransmitter molecules into the cell against their concentration gradient. The family has a common structure of 12 presumed transmembrane helices and includes carriers for gamma-aminobutyric acid (GABA), noradrenaline/adrenaline, dopamine, serotonin, proline, glycine, choline, betaine and taurine. They are structurally distinct from the second, more restricted family of plasma membrane transporters, which are responsible for excitatory amino acid transport. The latter couple glutamate and aspartate uptake to the cotransport of Na+ and the counter-transport of K+, with no apparent dependence on Cl- PUBMED:8811182. In addition, both of these transporter families are distinct from the vesicular neurotransmitter transporters PUBMED:8103691, PUBMED:7823024.

    Sequence analysis of the Na+/Cl- neurotransmitter superfamily reveals that it can be divided into four subfamilies, these being transporters for monoamines, the amino acids proline and glycine, GABA, and a group of orphan transporters PUBMED:9779464.

    \

    \ 7732 IPR012873 \

    This family is composed of hypothetical bacterial proteins of unknown function.

    \ 7206 IPR009967 \

    This family consists of several FlbT proteins. FlbT is a post-transcriptional regulator of flagellin. FlbT associates with the 5' untranslated region (UTR) of fljK (25 kDa flagellin) mRNA, and this association requires a predicted loop structure in the transcript. Mutations within this loop abolish FlbT association and result in increased mRNA stability. FlbT is therefore thought to promote the degradation of flagellin mRNA by associating with the 5' UTR PUBMED:11029689.

    \ 1543 IPR007440 \ Chorismate lyase catalyses the first step in ubiquinone synthesis, i.e. the removal of pyruvate from chorismate to yield 4-hydroxybenzoate.\ 1757 IPR007729 \ 2-keto-3-deoxy-galactonokinase catalyses the second step in D-galactonate degradation.\ 806 IPR007647 \ RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase, compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 5 is also known as the external 2 domain PUBMED:11313498.\ 2971 IPR000910 \

    High mobility group (HMG or HMGB) proteins are a family of relatively low molecular weight non-histone components in chromatin. HMG1 (also called HMG-T in fish) and HMG2 are two highly related proteins that bind single-stranded DNA preferentially and unwind double-stranded DNA. Although they have no sequence specificity, they have a high affinity for bent or distorted DNA, and bend linear DNA. HMG1 and HMG2 contain two DNA-binding HMG-box domains (A and B) that show structural and functional differences, and have a long acidic C-terminal domain rich in aspartic and glutamic acid residues. The acidic tail modulates the affinity of the tandem HMG boxes in HMG1 and 2 for a variety of DNA targets. HMG1 and 2 appear to play important architectural roles in the assembly of nucleoprotein complexes in a variety of biological processes, for example V(D)J recombination, the initiation of transcription, and DNA repair PUBMED:11497996.

    \

    The profile in this entry describing the HMG domains is much more general than the signature. In addition to the HMG1 and HMG2 proteins, HMG domains occur in single or multiple copies in the following protein classes: the SOX family of transcription factors; SRY sex determining region Y protein and related proteins PUBMED:12920151; LEF1 lymphoid enhancer binding factor 1 PUBMED:10890911; SSRP recombination signal recognition protein; MTF1 mitochondrial transcription factor 1; UBF1/2 nucleolar transcription factors; Abf2 yeast ARS-binding factor PUBMED:11779632; and Saccharomyces cerevisiae transcription factors Ixr1, Rox1, Nhp6a, Nhp6b and Spp41.

    \ 5989 IPR010380 \

    This is a family of uncharacterised bacterial proteins.

    \ 6709 IPR010684 \

    This family represents a conserved region within RNA polymerase II transcription factor SIII (Elongin) subunit A. In mammals, the Elongin complex activates elongation by RNA polymerase II by suppressing transient pausing of the polymerase at many sites within transcription units. Elongin is a heterotrimer composed of A, B, and C subunits of 110, 18, and 15 kilodaltons, respectively. Subunit A has been shown to function as the transcriptionally active component of Elongin PUBMED:7660129.

    \ 3372 IPR007760 \

    Catalases () are antioxidant enzymes that catalyse the conversion of hydrogen peroxide to water and molecular oxygen. Hydrogen peroxide is produced as a consequence of oxidative cellular metabolism and can be converted to the highly reactive hydroxyl radical via transition metals, this radical being able to damage a wide variety of molecules within a cell, leading to oxidative stress and cell death. Catalases act to neutralise hydrogen peroxide toxicity, and are produced by all aerobic organisms ranging from bacteria to man. There are three structurally independent classes of catalases: ubiquitous mono-functional haem-containing catalases (), bifunctional haem-containing catalase-peroxidases that are closely related to plant peroxidases (), and non-haem manganese-containing catalases PUBMED:14745498.

    \

    This entry represents the non-haem Mn-catalases, which are found in several bacterial species PUBMED:14871145. The structure of the Mn catalase from Lactobacillus plantarum reveals a homo-hexamer, in which each subunit contains a dimanganese active site that is accessed by a single substrate channel PUBMED:11587647. The dimanganese active site performs a two-electron catalytic cycle that alternately oxidises and reduces the dimanganese atoms, in a manner similar to the haem-containing catalases.
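
    The two-electron cycle described above can be written as a pair of half-reactions. The Mn(II)/Mn(III) oxidation-state assignment and the proton bookkeeping below are the conventional textbook description and are an assumption here, not something stated in this entry; summing the two steps recovers the overall catalase reaction (hydrogen peroxide to water and molecular oxygen) given in the first paragraph.

```latex
\begin{aligned}
\mathrm{Mn^{II}Mn^{II}} + \mathrm{H_2O_2} + 2\,\mathrm{H^+} &\longrightarrow \mathrm{Mn^{III}Mn^{III}} + 2\,\mathrm{H_2O} \\
\mathrm{Mn^{III}Mn^{III}} + \mathrm{H_2O_2} &\longrightarrow \mathrm{Mn^{II}Mn^{II}} + \mathrm{O_2} + 2\,\mathrm{H^+} \\
\text{net:}\quad 2\,\mathrm{H_2O_2} &\longrightarrow 2\,\mathrm{H_2O} + \mathrm{O_2}
\end{aligned}
```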

    \ 4748 IPR002311 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
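
    The class partition above amounts to a simple lookup. The sketch below transcribes the two amino-acid lists into sets and checks that they are disjoint and cover all 20 synthetases. The function names are hypothetical, and the 2'/3'-hydroxyl mapping encodes only the general preference stated in the text, not a rule that holds for every enzyme.

```python
# Class assignment of the 20 aminoacyl-tRNA synthetases, transcribed from the text.
# Keys are three-letter amino-acid codes; this is an illustrative lookup only.
CLASS_I = frozenset({"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"})
CLASS_II = frozenset({"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"})


def synthetase_class(amino_acid: str) -> str:
    """Return "I" or "II" for a three-letter amino-acid code."""
    if amino_acid in CLASS_I:
        return "I"
    if amino_acid in CLASS_II:
        return "II"
    raise KeyError(f"no synthetase class listed for {amino_acid!r}")


def preferred_attachment_site(amino_acid: str) -> str:
    """Ribose hydroxyl the text gives as the usual acceptor for each class.

    Encodes the general tendency only (hypothetical helper, not a strict rule).
    """
    return "2'-OH" if synthetase_class(amino_acid) == "I" else "3'-OH"


if __name__ == "__main__":
    # The two classes are disjoint and together cover all 20 standard amino acids.
    assert not (CLASS_I & CLASS_II)
    assert len(CLASS_I | CLASS_II) == 20
    for aa in ("Gly", "Trp", "Ser"):
        print(aa, synthetase_class(aa), preferred_attachment_site(aa))
```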

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. No conserved structural features for tRNA recognition by class I synthetases have been established.

    \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \

    In eubacteria, glycyl-tRNA synthetase () is an alpha2/beta2 tetramer composed of two different subunits PUBMED:6309809, PUBMED:7962006, PUBMED:7665503. In some eubacteria, and in archaea and eukaryotes, glycyl-tRNA synthetase is an alpha2 dimer (see ), this family. It belongs to class IIc and is one of the most complex synthetases. Most interestingly, the two types lack similarity: divergence at the sequence level is so great that it is impossible to infer descent from common genes. The alpha (see ) and beta subunits also lack significant sequence similarity. However, they are translated from a single mRNA PUBMED:6309809, and a single-chain glycyl-tRNA synthetase from Chlamydia trachomatis has been found to have significant similarity with both domains, suggesting divergence from a single polypeptide chain PUBMED:7665503.

    \ 2906 IPR005028 \

    This domain of unknown function is found in the intermediate/early proteins of the Herpes virus. Many of these proteins play a role in transcriptional regulation.

    \ 498 IPR005471 \

    The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence similarities. One of these subfamilies, called 'iclR', groups several proteins including:\ \

    \

    \ \

    These proteins have a helix-turn-helix motif at the N-terminus that is similar to that of other DNA-binding proteins PUBMED:1840643.

    \ 2572 IPR003813 \ Methyl-viologen-reducing hydrogenase (MVH) is one of the enzymes involved in methanogenesis and is encoded in the mth-flp-mvh-mrt cluster of methane genes in Methanobacterium thermoautotrophicum PUBMED:7730278. No specific functions have been assigned to the delta subunit.\ 4229 IPR000589 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. In Escherichia coli, this protein binds to 16S ribosomal RNA and functions at early steps in ribosome assembly. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:, PUBMED:2263452, groups bacterial and plant chloroplast S15; archaeal Haloarcula marismortui HmaS15 (HS11); yeast mitochondrial S28; and mammalian, yeast, Brugia pahangi and Wuchereria bancrofti S13. S15 is a protein of 80 to 250 amino acid residues.

    \ 3982 IPR006163 \

    Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some multienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid groups PUBMED:5321311.

    The amino-terminal region of the ACP proteins is well defined and consists of four alpha helices arranged in a right-handed bundle held together by interhelical hydrophobic interactions. The Asp-Ser-Leu (DSL) motif is conserved in all of the ACP sequences, and the 4'-PP prosthetic group is covalently linked via a phosphodiester bond to the serine residue. The DSL sequence is present at the amino terminus of helix II, a region of the protein referred to as the recognition helix, which is responsible for the interaction of ACPs with the enzymes of type II fatty acid synthesis PUBMED:11825906.
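
    Locating the DSL motif in a one-letter protein sequence, and reporting the serine that would carry the 4'-PP group, can be sketched with a plain regular-expression scan. This is a minimal illustration under stated assumptions: the function name and the toy sequence are hypothetical, and a bare tripeptide match is not diagnostic of an ACP on its own (real annotation relies on profiles, not a three-residue pattern).

```python
import re

# Asp-Ser-Leu motif in one-letter amino-acid code.
DSL_MOTIF = re.compile("DSL")


def dsl_serine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the Ser in each DSL occurrence (hypothetical helper)."""
    # match.start() is the 0-based index of Asp; the Ser is the next residue,
    # so add 1 for that offset and 1 more to convert to 1-based numbering.
    return [match.start() + 2 for match in DSL_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "MKDSLAGDSL"  # hypothetical sequence, not a real ACP
    print(dsl_serine_positions(toy_sequence))
```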

    \ 651 IPR002884 \

    This domain, termed the P domain, is approximately 150 amino acids in length and lies C-terminal to a serine endopeptidase domain belonging to MEROPS peptidase family S8 (clan SB), subfamily S8B (kexin). The domain is primarily associated with the calcium-dependent serine endopeptidases, the kex2/subtilisin proprotein convertases (PCs), which have been identified in all eukaryotes PUBMED:9353231 as well as in the gammaproteobacteria, Nostoc (cyanobacteria) and Streptomyces avermitilis.

    \ \

    The P domain appears to be necessary for folding and maintaining the endopeptidase catalytic domain and for regulating its calcium and acidic pH dependence. In addition, contained within the middle of the P domain in most PC family members is the cognate integrin binding RGD sequence PUBMED:10212221, which may be required for intracellular compartmentalization and maintenance of enzyme stability within the ER. The integrity of the RGD sequence of proprotein convertase PC1 is critical for its zymogen and C-terminal processing and for its cellular trafficking PUBMED:9307023, PUBMED:10212221. The carboxy-terminal tail, the least conserved region of all convertases, gives each PC family member its unique character PUBMED:10842308.

    \ \ 5418 IPR008487 \ This family consists of several uncharacterised hypothetical proteins of unknown function from Xylella fastidiosa, the organism that causes Pierce's disease in plants.\ 669 IPR005139 \ This domain is found in peptide chain release factors.\ 587 IPR001453 \ Eukaryotic and prokaryotic molybdoenzymes require a molybdopterin cofactor (MoCF) for their activity. The biosynthesis of this cofactor involves a complex multistep enzymatic pathway. One of the eukaryotic proteins involved in this pathway is the Drosophila protein cinnamon PUBMED:8088525, which is highly similar to gephyrin, a rat microtubule-associated protein which was thought to anchor the glycine receptor to subsynaptic microtubules.

    Cinnamon and gephyrin are evolutionarily related, in their N-terminal half, to the Escherichia coli MoCF biosynthesis proteins mog/chlG and moaB/chlA2 and, in their C-terminal half, to E. coli moeA/chlE.

    \ 727 IPR004343 \ The plus-3 domain is about 90 residues in length and is often found associated with the GYF domain (). The function of plus-3 is uncertain. It is possible that this domain is involved in DNA binding, as it has three conserved positively charged residues, hence the name plus-3. It is found in the yeast Rtf1 protein, which may be a transcription elongation factor PUBMED:11014804.\ 3497 IPR004232 \ Nitrile hydratase is composed of two subunits, alpha and beta, and catalyzes the hydration of nitrile compounds to the corresponding amides.\ 7622 IPR012433 \

    This family consists of sequences from hypothetical proteins thought to be expressed by two members of the Xanthomonas genus. The region in question is 125 amino acid residues long.

    \ 7095 IPR009893 \

    This family consists of several Nucleopolyhedrovirus capsid protein P87 sequences. P87 is expressed late in infection and concentrated in infected cell nuclei PUBMED:2184573.

    \ 1445 IPR005596 \ Beta-carotene hydroxylase is involved in zeaxanthin synthesis by hydroxylating beta-carotene, exploiting iron-activated oxygen to break the C-H bond with concomitant formation of a double bond or oxygen insertion. The enzyme may also be involved in other pathways PUBMED:10431816.\ 7848 IPR012969 \

    Proteins in this family bind to fibrinogen. Members include the fibrinogen receptor, FbsA (), which mediates platelet aggregation PUBMED:15383464.

    \ 4740 IPR002318 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    Alanyl-tRNA synthetase () is an alpha4 tetramer that belongs to class IIc.

    \ 6957 IPR009812 \

    This family consists of several hypothetical Staphylococcus aureus and Staphylococcus aureus bacteriophage proteins of around 65 residues in length. The function of this family is unknown.

    \ 3603 IPR003718 \ Osmotically inducible protein C (OsmC) is a stress-induced protein found in E. coli. The transcription of the osmC gene of Escherichia coli is regulated as a function of the phase of growth and is induced during the decelerating phase, before entry into stationary phase. Transcription is initiated from two overlapping promoters, osmCp1 and osmCp2 PUBMED:8820643.\

    An organic hydroperoxide detoxification protein (OHR) from Xanthomonas campestris pv. phaseoli is highly induced by organic hydroperoxides, weakly induced by H2O2, and not induced at all by a superoxide generator. Ohr may be a new type of organic hydroperoxide detoxification protein PUBMED:9573147.

    \ 5267 IPR008657 \ This family contains several jumping translocation breakpoint proteins or JTBs. Jumping translocation (JT) is an unbalanced translocation that comprises amplified chromosomal segments jumping to various telomeres. JTB, located at 1q21, has been found to fuse with the telomeric repeats of acceptor telomeres in a case of JT. hJTB (Homo sapiens JTB) encodes a transmembrane protein that is highly conserved among divergent eukaryotic species. JT results in a hJTB truncation, which potentially produces an hJTB product devoid of the transmembrane domain. hJTB is located in a gene-rich region at 1q21, called EDC (Epidermal Differentiation Complex) PUBMED:10321732. JTB has also been implicated in prostatic carcinomas PUBMED:10762645.\ 2487 IPR006838 \ This family includes the hamster androgen-induced FAR-17a protein () PUBMED:2045681, and its human homologue, the AIG1 protein () PUBMED:11266118. The function of these proteins is unknown. This family also includes homologous regions from a number of other metazoan proteins.\ 6137 IPR010449 \

    This domain is found in the Numb family of proteins.

    \ 3388 IPR006657 \

    A domain in this entry corresponds to the C-terminal domain IV in dimethyl sulphoxide (DMSO) reductase, which interacts with the 2-amino pyrimidone ring of both molybdopterin guanine dinucleotide molecules PUBMED:8890912.

    \ 4510 IPR007198 \

    Ssl1-like proteins are 40 kDa subunits of the transcription factor II H complex. This domain is often found associated with the C2H2 type Zn-finger ().

    \ 425 IPR007577 \ The DXD motif is a short conserved motif found in many families of glycosyltransferases, which add a range of different sugars to other sugars, phosphates and proteins. DXD-containing glycosyltransferases all use nucleoside diphosphate sugars as donors and require divalent cations, usually manganese. The DXD motif is expected to play a carbohydrate binding role in sugar-nucleoside diphosphate and manganese dependent glycosyltransferases PUBMED:9653120.\ 2107 IPR007384 \

    This family includes an N-terminal region of unknown function from the Erwinia carotovora exoenzyme regulation regulon orf1 protein, which also contains a domain found in RNA pseudouridylate synthase .

    \ 6143 IPR000758 \ Virulence-related outer membrane proteins are expressed in Gram-negative bacteria and are essential to bacterial survival within macrophages and for eukaryotic cell invasion. Members of this group include: \
  • PagC, required by Salmonella typhimurium for survival in macrophages and for virulence in mice PUBMED:1766380
  • Rck outer membrane protein of the Salmonella typhimurium virulence plasmid PUBMED:8675302
  • Ail, a product of the Yersinia enterocolitica chromosome capable of mediating bacterial adherence to and invasion of epithelial cell lines PUBMED:1688838
  • OmpX from Escherichia coli that promotes adhesion to and entry into mammalian cells. It also has a role in the resistance against attack by the human complement system PUBMED:1987115
  • a bacteriophage lambda outer membrane protein, Lom PUBMED:1846140

    The crystal structure of OmpX from Escherichia coli reveals that OmpX consists of an eight-stranded antiparallel all-next-neighbour beta barrel PUBMED:10545325. The structure shows two girdles of aromatic amino acid residues and a ribbon of nonpolar residues that attach to the membrane interior. The core of the barrel consists of an extended hydrogen-bonding network of highly conserved residues. OmpX thus resembles an inverse micelle. The OmpX structure shows that the membrane-spanning part of the protein is much better conserved than the extracellular loops. Moreover, these loops form a protruding beta sheet, the edge of which presumably binds to external proteins. It is suggested that this type of binding promotes cell adhesion and invasion and helps defend against the complement system. Although OmpX has the same beta-sheet topology as the structurally related outer membrane protein A (OmpA) , their barrels differ with respect to the shear numbers and internal hydrogen-bonding networks.

    \ 86 IPR001107 \ The band 7 protein is an integral membrane protein which is thought to regulate cation conductance. A variety of proteins belong to this family. These include the prohibitins, cytoplasmic anti-proliferative proteins, and stomatin, an erythrocyte membrane protein. Bacterial HflC protein also belongs to this family.\ 3941 IPR004968 \ This protein is necessary for viral DNA replication, and is a nucleic acid-independent nucleoside triphosphatase. \ \ 7782 IPR013104 \

    The Clostridium neurotoxin family is composed of tetanus neurotoxin and seven serotypes of botulinum neurotoxin. The structure of botulinum neurotoxin reveals a four-domain protein: the N-terminal catalytic domain (), the central translocation domain and two receptor-binding domains PUBMED:9783750. This domain is the C-terminal receptor-binding domain, which adopts a modified beta-trefoil fold with a six-stranded beta-barrel and a beta-hairpin triplet capping the domain PUBMED:9783750. The first step in the intoxication process is a binding event between this domain and the pre-synaptic nerve ending PUBMED:9783750.

    \ 2809 IPR007720 \ Glycosylphosphatidylinositol (GPI) represents an important anchoring molecule for cell surface proteins. The first step in its synthesis is the transfer of N-acetylglucosamine (GlcNAc) from UDP-N-acetylglucosamine to phosphatidylinositol (PI). This chemically simple step is genetically complex because three or four genes are required in both Saccharomyces cerevisiae (GPI1, GPI2 and GPI3) and mammals (GPI1, PIG A, PIG H and PIG C), respectively PUBMED:11849707.\ 2128 IPR007423 \ This is a small bacterial protein of unknown function.\ 7623 IPR012860 \

    The members of this family include sequences derived from hypothetical eukaryotic proteins of unknown function. The region in question is approximately 550 residues long.

    \ 1144 IPR006763 \ To date, many different Plasmodium antigens recognised by hyperimmune human sera have been cloned, sequenced and characterised. The majority contain tandemly repeated amino acid sequences which make up a considerable portion of the protein sequence. It has been suggested that these repeat-containing antigens may provide an immunological smokescreen that helps the parasite evade the human immune system. This repeat is found exclusively in the Plasmodium falciparum Ag332 protein and occupies most of its length PUBMED:7628570.\ 7088 IPR009887 \

    This family consists of several progressive ankylosis protein (ANK or ANKH) sequences. The ANK protein spans the outer cell membrane and shuttles inorganic pyrophosphate (PPi), a major inhibitor of physiologic and pathologic calcification, bone mineralisation and bone resorption PUBMED:11326272. Mutations in ANK are thought to give rise to craniometaphyseal dysplasia (CMD), a rare skeletal disorder characterised by progressive thickening and increased mineral density of craniofacial bones and abnormally developed metaphyses in long bones PUBMED:11326338.

    \ 7174 IPR009946 \

    This family consists of several hypothetical Nucleopolyhedrovirus proteins of around 100 residues in length. The function of this family is unknown.

    \ 5314 IPR008832 \ This family consists of several eukaryotic SRP9 proteins. SRP9 together with the Alu-homologous region of 7SL RNA and SRP14 comprise the Alu domain of SRP, which mediates pausing of synthesis of ribosome associated nascent polypeptides that have been engaged by the targeting domain of SRP PUBMED:7730321.\ 4126 IPR006119 \

    Site-specific recombination plays an important role in DNA rearrangement in prokaryotic organisms. Two types of site-specific recombination are known to occur:

    \

    1. Recombination between inverted repeats resulting in the reversal of a DNA segment.
    2. Recombination between repeat sequences on two DNA molecules resulting in their cointegration, or between repeats on one DNA molecule resulting in the excision of a DNA fragment.

    \
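
    The single-molecule outcomes listed above (inversion between inverted repeats, excision between direct repeats) can be illustrated with a toy string model. This is a sketch under loudly simplified assumptions: DNA is a plain A/C/G/T string, both recombination sites are identical short strings, and the crossover is taken to occur exactly at the site boundaries. It models only the sequence outcome, not the recombinase chemistry, and the function names are hypothetical.

```python
# Toy string model of site-specific recombination outcomes on one DNA molecule.
# Assumptions: identical short sites, crossover exactly at the site boundaries.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def invert_between(seq: str, site: str) -> str:
    """Inverted repeats (site ... reverse_complement(site)): flip the segment between them."""
    left = seq.index(site)
    start = left + len(site)
    end = seq.index(reverse_complement(site), start)
    # The intervening segment is reversed and complemented; the flanking sites stay put.
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def excise_between(seq: str, site: str) -> tuple[str, str]:
    """Direct repeats: return (remaining linear DNA, excised circle).

    Each product keeps one copy of the site; the circle is written linearised,
    starting at its site.
    """
    left = seq.index(site)
    right = seq.index(site, left + len(site))
    return seq[:left] + seq[right:], seq[left:right]


if __name__ == "__main__":
    print(invert_between("TTAACGGGGACGTTCC", "AACG"))
    print(excise_between("TTAACGGGGAAACGCC", "AACG"))
```

    Cointegration of two molecules is the reverse of the excision step in this model (joining a circle carrying one site into a molecule carrying the other), so it is not coded separately.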

    Site-specific recombination is characterized by a strand exchange mechanism that requires no DNA synthesis or high energy cofactor; the phosphodiester bond energy is conserved in a phospho-protein linkage during strand cleavage and re-ligation.

    \

    Two unrelated families of recombinases are currently known PUBMED:3011407. The first, called the 'phage integrase' family, groups a number of bacterial phage and yeast plasmid enzymes. The second PUBMED:2896291, called the 'resolvase' family, groups enzymes which share the following structural characteristics: an N-terminal catalytic and dimerization domain that contains a conserved serine residue involved in the transient covalent attachment to DNA, and a C-terminal helix-turn-helix DNA-binding domain .

    \ 2163 IPR007462 \ This protein is predicted to be an integral membrane protein.\ 3298 IPR004089 \

    Bacterial chemotactic-signal transducers PUBMED:, PUBMED:3052756 are proteins that respond to changes in the concentration of attractants and repellents in the environment, and transduce a signal from the outside to the inside of the cell. These proteins undergo two covalent modifications: deamidation and reversible methylation. Attractants increase the level of methylation while repellents decrease it. The methyl groups are added by the methyltransferase CheR and are removed by the methylesterase CheB.

    \ \

    All these proteins are composed of the same structural domains: an N-terminal region that resembles a signal peptide, but which is not removed from the mature protein and serves as a membrane-spanning region; a periplasmic domain of about 160 amino acids that forms the receptor domain; a second transmembrane region; and finally a C-terminal cytoplasmic domain of about 300 amino acids which contains the methylation sites.

    \

    The methyl-accepting sites are specific glutamate residues (some of these sites are translated as glutamine but are irreversibly deamidated by cheB). They are clustered in two regions of the cytoplasmic domain PUBMED:2033064.

    \ 5875 IPR009269 \

    This is a family of eukaryotic proteins with undetermined function.

    \ 5149 IPR007986 \

    This family consists of NINE proteins from several bacteriophages and from Escherichia coli.

    \ 6757 IPR009697 \

    This family consists of several Rotavirus-specific VP3 proteins. VP3 is known to be a viral guanylyltransferase and is thought to possess methyltransferase activity; VP3 is therefore predicted to be a multifunctional capping enzyme PUBMED:10603323.

    \ 4773 IPR002227 \ Tyrosinase () PUBMED:3130643 is a copper monooxygenase that catalyzes the hydroxylation of monophenols and the oxidation of o-diphenols to o-quinones. This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the formation of pigments such as melanins and other polyphenolic compounds. Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has been shown PUBMED:1901488 to be bound by three conserved histidine residues. The regions around these copper-binding ligands are well conserved and also shared by some hemocyanins, which are copper-containing oxygen carriers from the hemolymph of many molluscs and arthropods PUBMED:2664531, PUBMED:1898774.\ At least two proteins related to tyrosinase are known to exist in mammals: TRP-1 (TYRP1) PUBMED:7813420, which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic acid (DHICA) to indole-5,6-quinone-2-carboxylic acid; and TRP-2 (TYRP2) PUBMED:1537334, which is the melanogenic enzyme DOPAchrome tautomerase () that catalyzes the conversion of DOPAchrome to DHICA. TRP-2 differs from tyrosinases and TRP-1 in that it binds two zinc ions instead of copper PUBMED:7980602.\ Other proteins that belong to this family are plant polyphenol oxidases (PPO) (), which catalyze the oxidation of mono- and o-diphenols to o-diquinones PUBMED:1391768; and Caenorhabditis elegans hypothetical protein C02C2.1.\ 2788 IPR000925 \ This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown to have any neuraminidase or hemagglutinin activity. The amino terminus is thought to be cytoplasmic, and the carboxyl terminus extracellular. The extracellular region contains four completely conserved cysteine residues.\ 8071 IPR013247 \

    A homologue of the SH3 domain has been found in a number of different bacterial proteins including glycyl-glycine endopeptidase, bacteriocin and some hypothetical proteins.

    \ 5473 IPR008806 \ This entry describes the C-terminal region of several DNA-directed RNA polymerase III polypeptides which are related to the Saccharomyces cerevisiae RPC82 protein. RNA polymerase C (III) promotes the transcription of tRNA and 5S RNA genes. In Saccharomyces cerevisiae, the enzyme is composed of 15 subunits, ranging from 160 to about 10 kDa PUBMED:1406632.\ 7081 IPR009882 \

    This family consists of several Gypsy/Env proteins from Drosophila and Ceratitis fruit fly species. Gypsy is an endogenous retrovirus of Drosophila melanogaster. Phylogenetic studies suggest that occasional horizontal transfer events of gypsy occur between Drosophila species. Gypsy possesses infective properties associated with the products of the envelope gene that might be at the origin of these interspecies transfers PUBMED:11805056.

    \ 189 IPR000846 \

    Dihydrodipicolinate reductase () catalyzes the second step in the biosynthesis of diaminopimelic acid and lysine, the NAD- or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate.

    \ 3058 IPR001468 \

    Indole-3-glycerol phosphate synthase () (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan, the ring closure of 1-(2-carboxy-phenylamino)-1-deoxyribulose into indole-3-glycerol phosphate. In some bacteria, IGPS is a single-chain enzyme. In others, such as Escherichia coli, it is the N-terminal domain of a bifunctional enzyme that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase () (PRAI) activity (see ), the third step of tryptophan biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that contains a PRAI C-terminal domain and a glutamine amidotransferase () (GATase) N-terminal domain (see ).

    A structure of the IGPS domain of the bifunctional enzyme from the mesophilic bacterium E. coli (eIGPS) has been compared with the monomeric indole-3-glycerol phosphate synthase from the hyperthermophilic archaeon Sulfolobus solfataricus (sIGPS). Both are single-domain (beta/alpha)8 barrel proteins, with one (eIGPS) or two (sIGPS) additional helices inserted before the first beta strand PUBMED:8747452.

    \ 4461 IPR004250 \ Somatostatin inhibits the release of the pituitary growth hormone somatotropin, as well as the release of glucagon and insulin from the pancreas of fasted animals. Cortistatin is a cortical neuropeptide with neuronal depressant and sleep-modulating properties PUBMED:8622767.\ 2949 IPR005212 \

    This domain occurs in a range of proteins from antibiotic production pathways. These include the gra-ORF27 product, which probably functions at an early step, most likely as a dTDP-4-keto-6-deoxyglucose 2,3-dehydratase PUBMED:9831526. Its homologues include dnmT from the daunorubicin biosynthetic gene cluster in S. peucetius PUBMED:8955419, a similar gene from the daunomycin biosynthetic cluster in Streptomyces sp. strain C5 PUBMED:8655529, eryBVI from the erythromycin cluster in S. erythraea and snoH from the nogalamycin cluster in S. nogalater. This domain is a region of about 200 amino acids, which may be a structural unit, and occurs twice within the proteins that contain it.

    \ 7619 IPR012431 \

    This is a family consisting of sequences from hypothetical proteins of unknown function expressed by certain species of archaea. One member () is thought to be similar to tropomyosin PUBMED:10382966.

    \ 7696 IPR012448 \

    The proteins in this entry have not been characterised.

    \ 2656 IPR005850 \

    Galactose-1-phosphate uridyl transferase (GalT) catalyses the conversion of UDP-glucose and alpha-D-galactose 1-phosphate to alpha-D-glucose 1-phosphate and UDP-galactose during galactose metabolism. The enzyme is present in prokaryotes and eukaryotes. Defects in GalT in humans are the cause of galactosemia, an inherited disorder of galactose metabolism that leads to jaundice, cataracts and mental retardation.

    \

    This entry describes the C-terminal domain of galactose-1-phosphate uridyl transferase. SCOP reports a fold duplication between the C-terminal and N-terminal domains. Both domains are involved in Zn and Fe binding.

    \ 1274 IPR000749 \ ATP:guanido phosphotransferases are a family of structurally and functionally related enzymes PUBMED:2324092, PUBMED:7819288 that reversibly catalyze the transfer of phosphate between ATP and various phosphagens. The enzymes belonging to this family include glycocyamine kinase (), which catalyzes the transfer of phosphate from ATP to guanidoacetate; arginine kinase (), which catalyzes the transfer of phosphate from ATP to arginine; taurocyamine kinase (), an annelid-specific enzyme that catalyzes the transfer of phosphate from ATP to taurocyamine; lombricine kinase (), an annelid-specific enzyme that catalyzes the transfer of phosphate from ATP to lombricine; Smc74, a cercaria-specific enzyme from Schistosoma mansoni PUBMED:2324092; and creatine kinase () (CK) PUBMED:3896131, PUBMED:2324105, which plays an important role in energy metabolism of vertebrates. CK catalyzes the reversible transfer of high-energy phosphate from ATP to creatine, generating phosphocreatine and ADP. There are at least four different, but very closely related, forms of CK. Two isozymes, M (muscle) and B (brain), are cytosolic, while the other two are mitochondrial. In sea urchin there is a flagellar isozyme, which consists of a triplication of the CK domain. A cysteine residue is implicated in the catalytic activity of these enzymes and the region around this active-site residue is highly conserved.\ 6185 IPR010470 \

    This family consists of several Benyvirus proteins of unknown function.

    \ 3835 IPR006487 \

    This group of sequences represents members of the phage lambda minor tail protein L family.

    \ 3690 IPR006952 \ Retinal rod and cone cGMP phosphodiesterases function as the effector enzymes in the vertebrate visual transduction cascade. This family represents the inhibitory gamma subunit PUBMED:11900530, which is also expressed outside retinal tissues and has been shown to interact with the G-protein-coupled receptor kinase 2 signalling system to regulate the epidermal growth factor- and thrombin-dependent stimulation of p42/p44 mitogen-activated protein kinase in human embryonic kidney 293 cells PUBMED:11502744.\ 125 IPR004017 \ This domain is usually found in two copies per protein. It contains up to four conserved cysteines. The group includes proteins characterised as heterodisulphide reductase, subunit B (HdrB); succinate dehydrogenase, subunit C (SdhC, ); Fe-S oxidoreductase; glycerol-3-phosphate dehydrogenase subunit C (anaerobic GlpC, ); and glycolate oxidase iron-sulfur subunit (GlcF) PUBMED:8606183.\ 1011 IPR002857 \

    Zinc finger domains PUBMED:3125980, PUBMED: are nucleic acid-binding protein structures first identified in the Xenopus laevis transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues including 2 conserved Cys and 2 conserved His residues in a C-2-C-12-H-3-H type motif. The 12 residues separating the second Cys and the first His are mainly polar and basic, implicating this region in particular in nucleic acid binding. The zinc finger motif is an unusually small, self-folding domain in which Zn is a crucial component of its tertiary structure. All bind 1 atom of Zn in a tetrahedral array to yield a finger-like projection, which interacts with nucleotides in the major groove of the nucleic acid. The Zn binds to the conserved Cys and His residues. Fingers have been found to bind to about 5 base pairs of nucleic acid containing short runs of guanine residues. They have the ability to bind to both RNA and DNA, a versatility not demonstrated by the helix-turn-helix motif. The zinc finger may thus represent the original nucleic acid binding protein. It has also been suggested that a Zn-centred domain could be used in a protein interaction, e.g. in protein kinase C. Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines, while the second pair are histidines.
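As an illustration of the C-2-C-12-H-3-H spacing described above, the sketch below scans a protein sequence for candidate matches with a regular expression. It is a hypothetical helper (`find_c2h2_sites` is not part of any InterPro tool) that assumes exactly the fixed spacing given in the text; published C2H2 definitions allow some variation in spacing, and a regex hit alone is not evidence of a folded, zinc-binding finger.

```python
import re

# Fixed spacing taken from the text: Cys, 2 residues, Cys, 12 residues,
# His, 3 residues, His. The lookahead lets overlapping candidates be reported.
C2H2_MOTIF = re.compile(r"(?=(C.{2}C.{12}H.{3}H))")


def find_c2h2_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each candidate C2H2 motif."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in C2H2_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Synthetic sequence (not a real protein) built to contain one motif.
    demo = "MK" + "CAAC" + "A" * 12 + "HAAAH" + "GG"
    print(find_c2h2_sites(demo))
```

Using a zero-width lookahead rather than a plain pattern means adjacent or overlapping candidates are all reported instead of being consumed by the first match.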

    \ This domain contains eight conserved cysteine residues that bind to zinc. The CXXC domain is found in proteins that methylate cytosine, proteins that bind to methylcytosine and HRX-related proteins.\ 6912 IPR009786 \

    This family consists of several thyroid hormone-inducible hepatic protein (Spot 14 or S14) sequences. Mainly expressed in tissues that synthesise triglycerides, the mRNA coding for Spot 14 has been shown to be increased in rat liver by insulin, dietary carbohydrates, glucose in hepatocyte culture medium, as well as thyroid hormone. In contrast, dietary fats and polyunsaturated fatty acids have been shown to decrease the amount of Spot 14 mRNA, while an elevated level of cAMP acts as a dominant negative factor. In addition, liver-specific factors or chromatin organisation of the gene have been shown to contribute to the regulation of its expression PUBMED:9003802. Spot 14 protein is thought to be required for induction of hepatic lipogenesis PUBMED:11564699.

    \ 3326 IPR000966 \

    Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, and nickel. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds PUBMED:1779825, PUBMED:2959513, PUBMED:3064814 species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units. This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Consequently, all class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically related and thus alignable sequences.

    \

    Diptera (Drosophila, family 5) MTs are 40-43 residue proteins that contain 10 conserved cysteines arranged in five Cys-X-Cys groups. In particular, the consensus pattern C-G-x(2)-C-x-C-x(2)-Q-x(5)-C-x-C-x(2)-D-C-x-C has been found to be diagnostic of family 5 MTs. The protein is found primarily in the alimentary canal, and its induction is stimulated by ingestion of cadmium or copper PUBMED:2578462. Mercury, silver and zinc induce the protein to a lesser extent. Family 5 includes two subfamilies, d1 and d2; only one d2 member is known so far. Both subfamilies are matched by the same entry.
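The family 5 consensus above is written in PROSITE pattern notation. As a minimal sketch, assuming only the syntax that appears in it (single residues, `x` wildcards and `x(n)` repeat counts) plus `x(n,m)` ranges and `[..]`/`{..}` residue sets, it can be translated into a Python regular expression and used to scan sequences. The function name `prosite_to_regex` is hypothetical, terminal anchors (`<`, `>`) are deliberately not handled, and the demo sequence is synthetic rather than a real metallothionein; a pattern hit is a diagnostic signal, not proof of family membership.

```python
import re

# One PROSITE-style element: a residue, x, [..] or {..}, with optional (n) or (n,m).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # [..] alternatives and single residues are already valid regex.
        if high is not None:
            core += "{" + low + "," + high + "}"
        elif low is not None:
            core += "{" + low + "}"
        pieces.append(core)
    return "".join(pieces)


FAMILY5_CONSENSUS = "C-G-x(2)-C-x-C-x(2)-Q-x(5)-C-x-C-x(2)-D-C-x-C"

if __name__ == "__main__":
    regex = prosite_to_regex(FAMILY5_CONSENSUS)
    print(regex)
    # Synthetic sequence built to fit the consensus.
    print(bool(re.search(regex, "MPCGAACACAAQAAAAACACAADCACK")))
```

Keeping the translation to a plain regex string means the result can be reused with `re.search` or `re.finditer` over many sequences without re-parsing the pattern.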

    \ 6019 IPR009337 \

    This is a family of uncharacterised Proteobacteria proteins.

    \ 3082 IPR003235 \ Caenorhabditis elegans insulin-like peptides PUBMED:1868853 are evolutionarily related to insulin; relaxin; insulin-like growth factors I and II PUBMED:2197088; mammalian Leydig cell-specific insulin-like peptide (gene INSL3) PUBMED:8253799 and early placenta insulin-like peptide (ELIP) (gene INSL4) PUBMED:8666396; insect prothoracicotropic hormone (bombyxin) PUBMED:; locust insulin-related peptide (LIRP) PUBMED:1688797; and molluscan insulin-related peptides 1-5 PUBMED:9548970. Structurally, all these peptides consist of two polypeptide chains (A and B) linked by two disulphide bonds. They all share a conserved arrangement of four cysteines in their A chain. The first of these cysteines is linked by a disulphide bond to the third one, and the second and fourth cysteines are linked by interchain disulphide bonds to cysteines in the B chain. \

    Insulin is involved in the regulation of normal glucose homeostasis, as well as other specific physiological functions PUBMED:6243748. It is synthesised as a prepropeptide from which an endoplasmic reticulum-targeting sequence is cleaved to yield proinsulin. Proinsulin contains regions A and B separated by an intervening connecting region, C. The connecting region is cleaved, liberating the active protein, which contains the A and B chains, held together by 2 disulphide bonds PUBMED:503234.

    \ 5958 IPR008300 \

    \ Salmonella enterica serovar Typhimurium degrades 1,2-propanediol by a pathway that requires coenzyme B12, adenosylcobalamin (AdoCbl). Proteins required for 1,2-propanediol degradation are encoded by the pdu operon PUBMED:10498708. PduL functions in this pathway, but its exact role is not yet determined.

    \

    \ Propanediol degradation is thought to be important for the natural Salmonella populations, since propanediol is produced by the fermentation of the common plant sugars rhamnose and fucose PUBMED:10498708, PUBMED:9023178. More than 1% of the Salmonella enterica genome is devoted to the utilization of propanediol and cobalamin biosynthesis. In vivo expression technology has indicated that propanediol utilization (pdu) genes may be important for growth in host tissues, and competitive index studies with mice have shown that pdu mutations confer a virulence defect PUBMED:9539791, PUBMED:9922242. The pdu operon is contiguous and coregulated with the cobalamin (B12) biosynthesis cob operon, indicating that propanediol catabolism may be the primary reason for de novo B12 synthesis in Salmonella PUBMED:1312999, PUBMED:8226666, PUBMED:1313000. Please see , and for more details on the propanediol utilization pathway and the pdu operon.

    \ 2367 IPR008250 \

    P-type (or E1-E2-type) ATPases constitute a superfamily of cation transport enzymes, present in both prokaryotes and eukaryotes, whose members mediate membrane flux of all common biologically relevant cations PUBMED:8226755. The enzymes, which form an aspartyl phosphate intermediate in the course of ATP hydrolysis, can be divided into 4 major groups PUBMED:8151716: (1) Ca2+-transporting ATPases; (2) Na+/K+- and gastric H+/K+-transporting ATPases; (3) plasma membrane H+-transporting ATPases (proton pumps) of plants, fungi and lower eukaryotes; and (4) all bacterial P-type ATPases, except the Mg2+-ATPase of Salmonella typhimurium, which is more similar to the eukaryotic sequences. However, the variety of sequence analysis methods in use has produced a diversity of classifications.

    \ \ 5879 IPR010328 \

    This is a family of uncharacterised bacterial proteins.

    \ 161 IPR004203 \ Cytochrome c oxidase, a 13-subunit complex, is the terminal oxidase in the mitochondrial electron transport chain. This family is composed of cytochrome c oxidase subunit IV. The Dictyostelium discoideum member of this family is called COX VI. The Saccharomyces cerevisiae protein YGX6_YEAST appears to be the yeast COX IV subunit.\ 863 IPR001232 \ SKP1 (together with SKP2) was identified as an essential component of the cyclin A-CDK2 S phase kinase complex PUBMED:10205047. It was found to bind several F-box containing proteins (e.g., Cdc4, Skp2, cyclin F) and to be involved in the ubiquitin protein degradation pathway. A yeast homologue of SKP1 (P52286) was identified in the centromere-bound kinetochore complex PUBMED:8670864 and is also involved in the ubiquitin pathway PUBMED:9390558. In the slime mold, FP21 was shown to be glycosylated in the cytosol and is homologous to SKP1 PUBMED:7852383.\ 2455 IPR001308 \ The electron transfer flavoprotein (ETF) serves as a specific electron acceptor for various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer PUBMED:2326318, PUBMED:8525056. A similar system also exists in some bacteria.\

    \ The alpha subunit of ETF is structurally related to the bacterial nitrogen fixation protein fixB, which could play a role in a redox process and feed electrons to ferredoxin.

    \ 1506 IPR003175 \ Cell cycle progression is negatively controlled by cyclin-dependent kinase inhibitors (CDIs). CDIs are involved in cell cycle arrest at the G1 phase. \ 6975 IPR010793 \

    This family consists of several eukaryotic mitochondrial 28S ribosomal protein S30 (or programmed cell death protein 9 PDCD9) sequences. The exact function of this family is unknown although it is known to be a component of the mitochondrial ribosome and a component in cellular apoptotic signaling pathways PUBMED:11248257.

    \ 4171 IPR002784 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This entry includes the eukaryotic ribosomal protein L14, which binds to the 60S ribosomal subunit, and archaebacterial ribosomal protein L14E, which binds to the 50S ribosomal subunit.

    \ 2038 IPR007176 \ This is an archaeal family of unknown function.\ 2112 IPR007399 \ This is a putative lipoprotein.\ 5641 IPR008725 \

    The function of the orthopoxvirus F7L proteins is unknown.

    \ 2426 IPR001835 \

    Escherichia coli heat-labile enterotoxin is a bacterial protein toxin with an AB5 multimer structure, in which the B pentamer has a membrane-binding function and the A chain () is needed for enzymatic activity PUBMED:8478941. The B subunits are arranged as a donut-shaped pentamer, each subunit participating in ~30 hydrogen bonds and 6 salt bridges with its two neighbours PUBMED:8478941.

    \

    The A subunit has a less well-defined secondary structure. It predominantly interacts with the pentamer via the C-terminal A2 fragment, which runs through the charged central pore of the B subunits. A putative catalytic residue in the A1 fragment (Glu112) lies close to a hydrophobic region, which packs two loops together. It is thought that this region might be important for catalysis and membrane translocation PUBMED:8478941.

    \ 4529 IPR006969 \ This family represents the Stig1 cysteine-rich plant protein. The tobacco stigma-specific gene, STIG1, is developmentally regulated and expressed specifically in the stigmatic secretory zone. Pistils of transgenic STIG1-barnase tobacco plants undergo normal development, but lack the stigmatic secretory zone and are female sterile. Pollen grains are unable to penetrate the surface of the ablated pistils. Application of stigmatic exudate from wild-type pistils to the ablated surface increases the efficiency of pollen tube germination and growth and restores the capacity of pollen tubes to penetrate the style PUBMED:8039494. The function of STIG1 is unknown.\ 222 IPR007255 \ Dor1 is involved in vesicle targeting to the yeast Golgi apparatus and complexes with a number of other trafficking proteins, which include Sec34 and Sec35 PUBMED:11703943.\ 8034 IPR013218 \

    The Mtw1 kinetochore complex contains at least four essential components including Mtw1, DSN1, NNF1 and NSL1. All proteins exhibit genetic and two-hybrid interactions and all stably associate in solution. The function of the complex is unclear, though it is involved in chromosome segregation PUBMED:15502821, PUBMED:12455957.

    \ 6872 IPR010752 \

    This family consists of several hypothetical bacterial proteins of around 475 residues in length. The majority of family members are from Pseudomonas species but the family also contains sequences from Shewanella oneidensis and Thauera aromatica.

    \ 5961 IPR010368 \

    This family consists of several relatively short bacterial and archaeal hypothetical sequences. The function of this family is unknown.

    \ 5108 IPR007945 \

    Mature peptide hormones and neuropeptides are typically synthesised from much larger precursors and require several post-translational processing steps, including proteolytic cleavage, for the formation of the bioactive species. The subtilisin-related proteolytic enzymes that accomplish neuroendocrine-specific cleavages are known as prohormone convertases 1 and 2 (PC1 and PC2), which belong to MEROPS peptidase family S8B. The cell biology of these proteases within the regulated secretory pathway of neuroendocrine cells is complex, and they are themselves initially synthesised as inactive precursor molecules. ProPC1 propeptide cleavage occurs rapidly in the endoplasmic reticulum, yet its major site of action on prohormones takes place later in the secretory pathway. PC1 undergoes an interesting carboxyl-terminal processing event whose function appears to be to activate the enzyme. ProPC2, on the other hand, exhibits comparatively long initial folding times and exits the endoplasmic reticulum without propeptide cleavage, in association with the neuroendocrine-specific protein 7B2. Once the proPC2/7B2 complex arrives at the trans-Golgi network, 7B2 is internally cleaved into two domains, the 21-kDa fragment and a carboxy-terminal 31-residue peptide. PC2 propeptide removal occurs in the maturing secretory granule, most likely through autocatalysis, and 7B2 association does not appear to be directly required for this cleavage event. However, if proPC2 has not encountered 7B2 intracellularly, it cannot generate a catalytically active mature species. The molecular mechanism behind the intriguing intracellular association of 7B2 and proPC2 is still unknown, but may involve conformational rearrangement or stabilisation of a proPC2 conformer mediated by a 36-residue internal segment of 21-kDa 7B2.

    \ \ \

    This family represents 7B2 (secretogranin V), the molecular escort protein for PC2. 7B2 is a bifunctional protein with an N-terminal activation domain and a C-terminal inhibitory domain (MEROPS inhibitor family I21, clan I-) separated by a furin cleavage site PUBMED:10506829. Although 7B2 is a potent inhibitor of PC2, it is absolutely required for the activation of PC2, which is synthesised as a zymogen. Both full-length (27 kDa) 7B2 and the C-terminal peptide (CT domain) derived from intramolecular cleavage of 7B2 are potent inhibitors of PC2. Studies have shown the active peptide in the CT domain to be LLRVHK, which is active in the nanomolar range not only against PC2 but also against PC1 PUBMED:9756897, PUBMED:10812060. Knockout studies have shown that PC2 nulls are not phenotypically equivalent to 7B2 nulls, which suggests that 7B2 may have other activities in addition to being the activator of PC2 PUBMED:12472887.

    \ \

    7B2 exhibits both structural and functional homology to proSAAS (), the PC1-binding protein. The CT domain of proSAAS contains the same inhibitor hexapeptide as 7B2; 7B2 and proSAAS are therefore two members of a homologous family of prohormone convertase inhibitor proteins.

    \ \ 1546 IPR005801 \ This entry represents the catalytic regions of the chorismate-binding enzymes anthranilate synthase, isochorismate synthase, aminodeoxychorismate synthase and para-aminobenzoate synthase.\ Anthranilate synthase catalyses the reaction:\ \ The enzyme is a tetramer comprising 2 I and 2 II components: this entry is restricted to component I, which catalyses the formation of anthranilate using ammonia rather than glutamine, while component II provides glutamine amidotransferase activity.\ 4426 IPR009042 \

    The bacterial core RNA polymerase complex, which consists of five subunits, is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme PUBMED:3052291. RNA polymerase recruits alternative sigma factors as a means of switching on specific regulons. Most bacteria express a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary sigma factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes.

    With regard to sequence similarity, sigma factors can be grouped into two classes, the sigma-54 and sigma-70 families. Sequence alignments of the sigma-70 family members reveal four conserved regions that can be further divided into subregions, e.g. sub-region 2.2, which may be involved in the binding of the sigma factor to the core RNA polymerase; and sub-region 4.2, which seems to harbour a DNA-binding 'helix-turn-helix' motif involved in binding the conserved -35 region of promoters recognized by the major sigma factors PUBMED:3092189, PUBMED:1597408.\

    \ 2577 IPR005188 \

    The influenza C virus genome consists of seven single-stranded RNA segments. The shortest RNA segment encodes a 286 amino acid non-structural protein NS1 as well as the NS2 protein. The NS2 protein is only about 60 amino acids in length and of unknown function.

    \ 850 IPR001214 \

    Synonym(s): Suvar3-9, Enhancer-of-zeste, Trithorax

    \ \

    Proteins bearing the widely distributed SET domain (~130 amino acid) have been shown to contribute to epigenetic mechanisms of gene regulation by methylation of lysine residues in histones and other proteins. The SET domain genes are widely represented in the eukaryotic genomes, and proteins were initially distributed into four families, SU(VAR)3-9, E(Z), ASH1 and TRITHORAX based on the homology of their SET domains. Additional proteins have now been identified which do not fit into this classification PUBMED:12039029.

    \

    The SET domain appears generally as one part of a larger multidomain protein. Three structures of very different proteins with distinct domain compositions have recently been described: Neurospora DIM-5, a member of the Su(var) family of histone lysine methyltransferases (HKMTs) which methylate histone H3 on lysine 9; human SET7 (also called SET9), which methylates H3 on lysine 4; and garden pea Rubisco LSMT, an enzyme that does not modify histones, but instead methylates lysine 14 in the flexible tail of the large subunit of the enzyme Rubisco. The SET domain itself turned out to be an uncommon structure. Although in all three studies electron density maps revealed the location of the AdoMet or AdoHcy cofactor, the SET domain bears no similarity at all to the canonical AdoMet-dependent methyltransferase fold. A tyrosine strictly conserved in the C-terminal motif of the SET domain could be involved in abstracting a proton from the protonated amino group of the substrate lysine, promoting its nucleophilic attack on the sulphonium methyl group of the AdoMet cofactor. In contrast to the AdoMet-dependent protein methyltransferases of the classical type, which tend to bind their polypeptide substrates on top of the cofactor, the Rubisco LSMT structure indicates that AdoMet binds in a separate cleft, suggesting how a polypeptide substrate could be subjected to multiple rounds of methylation without having to be released from the enzyme. In contrast, SET7/9 is able to add only a single methyl group to its substrate. It has been demonstrated that association of SET domain and myotubularin-related proteins modulates growth control PUBMED:9537414. The SET domain-containing Drosophila protein, enhancer of zeste, has a function in segment determination and the mammalian homologue may be involved in the regulation of gene transcription and chromatin structure.

    \

    It seems likely that the varied domains that occur together with the SET domain are involved in recognising protein substrates and 'reading' histone tails in order to dictate which (if any) of their multiple lysine residues should be methylated PUBMED:12372294.

    \ \ 4016 IPR001280 \ Photosystem I, a membrane complex found in the chloroplasts of plants and in cyanobacteria, uses light energy to transfer electrons from plastocyanin to ferredoxin PUBMED:. The electron transfer components of the photosystem include the primary electron donor, chlorophyll P-700, and five electron acceptors: chlorophyll (A0), phylloquinone (A1) and three 4Fe-4S iron-sulphur centres, designated Fx, Fa and Fb.\ \

    The proteins psaA and psaB are similar and form a dimer in the membrane; this complex binds the electron transfer components PUBMED:.

    \ 2558 IPR000090 \

    The flagellar motor switch in Escherichia coli and Salmonella typhimurium regulates the direction of flagellar rotation and hence controls swimming behaviour PUBMED:8224881. The switch is a complex apparatus that responds to signals transduced by the chemotaxis sensory signalling system during chemotactic behaviour PUBMED:8224881. CheY, the chemotaxis response regulator, is believed to act directly on the switch to induce tumbles in the swimming pattern, but no physical interaction between CheY and the switch proteins has yet been demonstrated.

    \

    The switch complex comprises at least three proteins: FliG, FliM and FliN. FliG has been shown to interact with FliM, FliM with itself, and FliM with FliN PUBMED:8631704. Several residues within the middle third of FliG appear to be strongly involved in the FliG-FliM interaction, whereas residues near the N- or C-termini are less important PUBMED:8631704. Such clustering suggests that the FliG-FliM interaction plays a central role in switching.

    \

    Analysis of the FliG, FliM and FliN sequences shows that none is especially hydrophobic or appears to be an integral membrane protein PUBMED:2656645. This result is consistent with other evidence suggesting that the proteins may be peripheral to the membrane, possibly mounted on the basal body M ring PUBMED:2656645, PUBMED:1631122. FliG is present in about 25 copies per flagellum. The structure of its C-terminal domain is known; this domain functions specifically in motor rotation PUBMED:10440379.

    \ 998 IPR000465 \ Xeroderma pigmentosum (XP) PUBMED:8160271 is a human autosomal recessive disease characterised by a high incidence of sunlight-induced skin cancer. Skin cells of affected individuals are hypersensitive to ultraviolet light because of defects in the incision step of DNA excision repair. At least seven genetic complementation groups, XP-A to XP-G, are involved in this pathway. XP-A is the most severe form of the disease and is caused by defects in a 30 kDa nuclear protein called XPA (or XPAC) PUBMED:1918083. The sequence of the XPA protein is conserved from higher eukaryotes PUBMED:1764072 to yeast (gene RAD14) PUBMED:1741034. XPA is a hydrophilic protein of 247 to 296 amino acid residues that has a C4-type zinc finger motif in its central section.\ \ 7006 IPR009838 \

    This family consists of several bacterial TraL proteins. TraL is a predicted peripheral membrane protein, which is thought to be involved in bacterial sex pilus assembly PUBMED:8655498. The exact function of this family is unclear.

    \ 676 IPR002692 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine protease have been identified, and these are grouped into six clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangement of the catalytic residues commonly reflects clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
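    The observation that triad order reflects clan relationships amounts to sorting the catalytic residues by their position along the chain. The Python sketch below illustrates this. The residue numbers are the conventional chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221) numberings rather than values taken from this entry, and the function name `triad_order` is hypothetical.

```python
def triad_order(positions: dict[str, int]) -> str:
    """Return the one-letter codes of the catalytic residues in chain order.

    `positions` maps a one-letter residue code ('S', 'D', 'H') to its
    1-based position in the polypeptide chain.
    """
    ordered = sorted(positions.items(), key=lambda item: item[1])
    return "".join(residue for residue, _ in ordered)


# Conventional chymotrypsin numbering (clan SA): His57, Asp102, Ser195.
chymotrypsin = {"S": 195, "D": 102, "H": 57}
# Conventional subtilisin BPN' numbering (clan SB): Asp32, His64, Ser221.
subtilisin = {"S": 221, "D": 32, "H": 64}

if __name__ == "__main__":
    print(triad_order(chymotrypsin))  # HDS
    print(triad_order(subtilisin))    # DHS
```

    Only relative order matters here, so any consistent numbering scheme gives the same three-letter string.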

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
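    Because the first character of a family or clan identifier encodes the catalytic type, the scheme above can be expressed as a simple lookup. This is a minimal Python sketch: only the letter codes and the nucleophile rules come from the text, while the helper names `catalytic_type` and `nucleophile` are hypothetical and not part of any MEROPS software.

```python
# Catalytic-type letters as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan of families of more than one type)",
}

# Types whose nucleophile is part of an amino acid (an acyl intermediate forms).
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def _type_letter(identifier: str) -> str:
    """Return the upper-cased first character of a family or clan identifier."""
    letter = identifier.strip()[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    return letter


def catalytic_type(identifier: str) -> str:
    """Catalytic type of a family (e.g. 'S45') or clan (e.g. 'PB')."""
    return CATALYTIC_TYPES[_type_letter(identifier)]


def nucleophile(identifier: str) -> str:
    """Nucleophile class implied by the catalytic-type letter."""
    letter = _type_letter(identifier)
    if letter in RESIDUE_NUCLEOPHILE:
        return "part of an amino acid residue (acyl intermediate)"
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not implied by the type letter"


if __name__ == "__main__":
    for ident in ("S45", "C16", "M55", "PB"):
        print(ident, catalytic_type(ident), "|", nucleophile(ident))
```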

    \ \

    The penicillin amidases or penicillin acylases are serine peptidases belonging to MEROPS peptidase family S45 (clan PB(S)). The protein fold of the peptidase domain for members of this family resembles that of the archaeal proteasome subunit B, the type example of clan PB.

    \ \

    Penicillin amidase or penicillin acylase catalyses the hydrolysis of benzylpenicillin to phenylacetic acid and 6-aminopenicillanic acid (6-APA), a key intermediate in the synthesis of penicillins PUBMED:9292993.

    \ 4289 IPR007832 \ The family comprises a subunit specific to RNA Pol III, the polymerase responsible for tRNA synthesis. The C34 subunit of Saccharomyces cerevisiae RNA Pol III is part of a subcomplex of three subunits that have no counterpart in the other two nuclear RNA polymerases. This subunit interacts with TFIIIB70 and therefore participates in Pol III recruitment PUBMED:9312031.\ 1552 IPR000479 \ The cation-independent mannose-6-phosphate receptor is a type I membrane protein responsible for transporting phosphorylated lysosomal enzymes from the Golgi complex and the cell surface to lysosomes. Lysosomal enzymes bearing phosphomannosyl residues bind specifically to mannose-6-phosphate receptors in the Golgi apparatus, and the resulting receptor-ligand complex is transported to an acidic prelysosomal compartment, where the low pH mediates dissociation of the complex. This receptor also binds insulin-like growth factor. It contains 15 copies of a repeat.\ 1969 IPR004989 \

    This domain represents the N-terminal region of Orf6, which is localised upstream of the 20S proteasome subunit genes, prcA and prcB in members of the Actinobacteria: Streptomyces coelicolor PUBMED:9765579, Frankia sp. PUBMED:10652097 and Rhodococcus erythropolis PUBMED:7583123.

    \ 856 IPR007627 \ Region 2 of sigma-70 is the most conserved region of the entire protein, and all members of this class of sigma factor contain it. Its high conservation reflects the fact that region 2 contains both the -10 promoter recognition helix and the primary core RNA polymerase binding determinant. The core-binding helix interacts with the clamp domain of the largest polymerase subunit, beta prime PUBMED:11931761, PUBMED:8858155. The aromatic residues of the recognition helix, found at the C terminus of this domain, are thought to mediate strand separation, thereby allowing transcription initiation PUBMED:11931761, PUBMED:8858155.\ 6329 IPR009479 \

    This family consists of several human herpesvirus U55 proteins. The function of this family is unknown.

    \ 1894 IPR003746 \

    This entry describes proteins of unknown function.

    \ 7985 IPR012971 \

    This N-terminal domain is found in a subfamily of hypothetical nucleolar GTP-binding proteins similar to human NGP1 PUBMED:15112237.

    \ 881 IPR007591 \ This is a family of eukaryotic single-stranded DNA-binding proteins with specificity for a pyrimidine-rich element found in the promoter region of the alpha2(I) collagen gene.\ 2521 IPR002181 \

    Fibrinogen plays key roles in both blood clotting and platelet aggregation. During blood clot formation, the conversion of soluble fibrinogen to insoluble fibrin is triggered by thrombin, resulting in the polymerisation of fibrin, which forms a soft clot; this is then converted to a hard clot by factor XIIIA, which further cross-links fibrin molecules. Platelet aggregation involves the binding of the platelet protein receptor integrin alpha(IIb)-beta(3) to the C-terminal domain of the fibrinogen gamma chain, mediating a range of adhesive reactions that include adhesion, platelet aggregation and fibrin clot retraction PUBMED:12799374.

    \

    Fibrinogen occurs as a dimer, in which each monomer is composed of three non-identical chains, alpha, beta and gamma, linked together by several disulphide bonds PUBMED:11460466. The N-termini of all six chains come together to form the centre of the molecule, from which the monomers extend in opposite directions as coiled coils, followed by C-terminal globular domains. The C-terminal globular domains are referred to as the D regions, while the coiled-coil and N-terminal central region are referred to as the E region. During clot formation, the N-terminal regions of the alpha and beta chains are cleaved, enabling them to bind to the C-terminal regions of the gamma and beta chains, respectively, of adjacent molecules, causing the proteins to polymerise PUBMED:11593005.

    \

    This entry represents the C-terminal globular domains (D region) of the alpha, beta and gamma chains. These domains are related to domains in other proteins: the sea cucumber fibrinogen-like proteins FreP-A and FreP-B; the C terminus of the Drosophila scabrous protein, which is involved in the regulation of neurogenesis, possibly through the inhibition of R8 cell differentiation; and ficolins, which display lectin activity towards N-acetylglucosamine through their fibrinogen-like domains PUBMED:12396010.

    \ \ 7987 IPR012973 \

    This C-terminal domain is found in the NOG subfamily of nucleolar GTP-binding proteins PUBMED:15112237.

    \ 6995 IPR010941 \

    This entry represents the N-terminal region of the bacterial poly-beta-hydroxybutyrate polymerase (PhaC). Polyhydroxyalkanoic acids (PHAs) are carbon and energy reserve polymers produced by some bacteria when carbon sources are plentiful and another nutrient, such as nitrogen, phosphate, oxygen or sulphur, becomes limiting. PHAs composed of monomeric units ranging from 3 to 14 carbons exist in nature. When the carbon source is exhausted, the bacterium utilises the stored PHA. PhaC links D-(-)-3-hydroxybutyryl-CoA to an existing PHA molecule by forming an ester bond PUBMED:10427049.

    \ 7259 IPR009995 \

    This family consists of several archaeal proteins of around 370 residues in length. The function of this family is unknown.

    \ 2401 IPR011618 \

    Bacterial PTS transporters transport and concomitantly phosphorylate their sugar substrates, and typically consist of multiple subunits or protein domains. The Man family is unique in several respects among PTS permease families:

  • It is the only PTS family in which members possess a IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.

    The Gut family consists only of glucitol-specific permeases, which occur in both Gram-negative and Gram-positive bacteria. In Escherichia coli, the glucitol permease consists of a IIA protein, a IIC protein and a IIBC protein.

    This entry represents the N-terminal conserved region of the IIBC component.

    \ \ 5549 IPR008775 \ This family is made up of several eukaryotic phytanoyl-CoA dioxygenase (PhyH) proteins as well as a number of bacterial dioxygenases. PhyH is a peroxisomal enzyme that catalyses the first step of phytanic acid alpha-oxidation. PhyH deficiency causes Refsum's disease (RD), an inherited neurological syndrome biochemically characterised by the accumulation of phytanic acid in plasma and tissues PUBMED:10767344.\ 6188 IPR009417 \

    This family consists of several Rice tungro bacilliform virus P12 proteins. The function of this family is unknown PUBMED:2041739.

    \ 3502 IPR001501 \

    Hydrogenases are enzymes that catalyze the reversible activation of hydrogen and which occur widely in prokaryotes as well as in some eukaryotes. There are various types of hydrogenases, but all of them seem to contain at least one iron-sulphur cluster. They can be broadly divided into two groups: hydrogenases containing nickel and, in some cases, also selenium (the [NiFe] and [NiFeSe] hydrogenases) and those lacking nickel (the [Fe] hydrogenases).

    \

    The [NiFe] and [NiFeSe] hydrogenases are heterodimers that consist of a small subunit, which contains a signal peptide, and a large subunit. All the known large subunits seem to be evolutionarily related PUBMED:2180913; they contain two Cys-x-x-Cys motifs, one at their N-terminal end and the other at their C-terminal end. These four cysteines are involved in binding nickel PUBMED:7854413. In the [NiFeSe] hydrogenases, the first cysteine of the C-terminal motif is a selenocysteine, which has been shown experimentally to be a nickel ligand PUBMED:2521386.
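    The Cys-x-x-Cys motifs described above can be located in a sequence with a regular expression. In this Python sketch, the function `find_cxxc` and the toy sequence are hypothetical. The first position also accepts U, the one-letter code for selenocysteine, to cover the [NiFeSe] case; this is a simplification, since it applies the substitution to both motifs whereas the text reports it only for the C-terminal one. A zero-width lookahead is used so that overlapping motifs are all reported.

```python
import re

# Cys-x-x-Cys, where x is any residue; U (selenocysteine) is also accepted at
# the first position. The lookahead keeps each match zero-width so that
# overlapping motifs are not skipped.
CXXC = re.compile(r"(?=([CU]..C))")


def find_cxxc(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for every Cys-x-x-Cys match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real hydrogenase large subunit.
    print(find_cxxc("MKCASCLLLRUGHCK"))  # [(3, 'CASC'), (11, 'UGHC')]
```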

    \ 1537 IPR003506 \

    Three cysteine-rich proteins (also believed to be lipoproteins) make up the extracellular matrix of the Chlamydial outer membrane PUBMED:2287277. They are essential for the structural integrity of both the elementary body (EB) and reticulate body (RB) phases. As these bacteria lack the peptidoglycan layer common to most Gram-negative microbes, such proteins are highly important in the pathogenicity of the organism.

    \

    The largest of these is the major outer membrane protein (MOMP), which constitutes around 60% of the total protein of the membrane PUBMED:8477811. OMP6 is the second largest, with a molecular mass of 58 kDa, while the OMP3 protein is ~15 kDa PUBMED:2287277. MOMP is believed to elicit the strongest immune response, and has recently been linked to heart disease through its sequence similarity to a murine heart-muscle-specific alpha myosin PUBMED:10037605.

    \

    The OMP6 family plays a structural role in the outer membrane during the EB stage of the Chlamydial cell, and different biovars show a small, yet highly significant, difference in peptide charge PUBMED:2287277. Members of this family are found in Chlamydia trachomatis, Chlamydia pneumoniae and Chlamydia psittaci.

    \ 2200 IPR007497 \ Members of this family have so far been found in bacterial and mouse SwissProt or TrEMBL entries. However, possible family members have also been identified in translated rat (GenBank:AW144450) and human (GenBank:AI478629) ESTs. A mouse family member has been named SIMPL (signalling molecule that associates with mouse pelle-like kinase). SIMPL appears to facilitate and/or regulate complex formation between IRAK/mPLK (IL-1 receptor-associated kinase) and IKK (inhibitor of kappa-B kinase) containing complexes, and thus regulate NF-kappa-B activity PUBMED:11096118. Separate experiments demonstrate that a mouse family member (named LaXp180) binds the Listeria monocytogenes surface protein ActA, a virulence factor that induces actin polymerisation. It may also bind stathmin, a protein involved in signal transduction and in the regulation of microtubule dynamics PUBMED:11207567. In bacteria, the function of these proteins is unknown, but they are thought to be located in the periplasm or outer membrane.\ 1473 IPR002762 \

    The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and so may have a related function. Some CbiX proteins contain a striking histidine-rich region at their C terminus, which suggests that they might be involved in metal chelation PUBMED:9742225.

    \ 885 IPR002645 \ The STAS (Sulphate Transporter and AntiSigma factor antagonist) domain is found in the C-terminal region of sulphate transporters and bacterial anti-sigma factor antagonists. It has been suggested that this domain may have a general NTP binding function. The establishment of differential gene expression in sporulating Bacillus subtilis involves four protein components, one of which is SpoIIAA (). The four components regulate the sporulation sigma factor F. Early in sporulation, SpoIIAA is in the phosphorylated state (SpoIIAA-P) as a result of the activity of the ATP-dependent protein kinase SpoIIAB (). The site at which this protein is phosphorylated is a conserved serine. SpoIIAB is an anti-sigma factor that, in its free form, inhibits sigma F by binding to it. Competition by SpoIIAA (the anti-anti-sigma factor) for binding to SpoIIAB releases sigma F activity PUBMED:9560229. The STAS domain is found in the anti-sigma factor antagonist SpoIIAA.\ 2430 IPR001299 \

    Ependymins are secretory proteins found predominantly in the cerebrospinal fluid of teleost fish PUBMED:1831964, PUBMED:8350351. A bound form of these glycoproteins, which may be the functional form of ependymins, is associated with the extracellular matrix, probably with collagen fibrils PUBMED:8005346. The proteins bind calcium via N-linked sialic acid residues. The molecular function of ependymins appears to be related to cell contact phenomena involving the extracellular matrix PUBMED:8005346.

    \ 7462 IPR011509 \

    This short repeat is found in the RtxA toxin family PUBMED:9927695.

    \ 1122 IPR004912 \ The function of this protein is unknown. It has a conserved amino terminus of 50 residues followed by a positively charged tail, suggesting that it may interact with nucleic acid. \ 7833 IPR012545 \

    This family contains many hypothetical bacterial proteins.

    \ 5465 IPR008630 \ This family contains a number of glycosyltransferase enzymes that contain a DXD motif. It includes a number of Caenorhabditis elegans homologues in which the DXD is replaced by DXH. Some members of this family are included in glycosyltransferase family 34.\ 14 IPR013057 \ This transmembrane region is found in many amino acid transporters, including UNC-47 and MTR. UNC-47 encodes a vesicular gamma-aminobutyric acid (GABA) transporter (VGAT) and is predicted to have 10 transmembrane domains UNC47_CAEEL PUBMED:9349821. MTR is an N system amino acid transporter protein involved in methyltryptophan resistance MTR_NEUCR. Other members of this family include proline transporters and amino acid transporters whose specificity has not yet been identified.

    This is a family of proteins which display differential expression in various tumour and cell lines. The function of these proteins is unknown.

    \ 7182 IPR010857 \

    This family contains a number of zona-pellucida-binding proteins that seem to be restricted to mammals. These are sperm proteins that bind to the 90 kDa family of zona pellucida glycoproteins in a calcium-dependent manner PUBMED:7729589. These represent some of the specific molecules that mediate the first steps of gamete interaction, allowing fertilisation to occur PUBMED:9378618.

    \ 2446 IPR006716 \ This family consists of the fungal C-8 sterol isomerase and the mammalian sigma1 receptor. C-8 sterol isomerase (delta-8--delta-7 sterol isomerase) catalyses a reaction in ergosterol biosynthesis that results in unsaturation at C-7 in the B ring of sterols PUBMED:8082205. The sigma1 receptor is a low molecular mass mammalian protein located in the endoplasmic reticulum PUBMED:8755605 that interacts with endogenous steroid hormones, such as progesterone and testosterone PUBMED:9425306. It also binds the sigma ligands, a set of chemically unrelated drugs including haloperidol, pentazocine and ditolylguanidine PUBMED:8755605. Sigma1 effectors are not well understood, but sigma1 agonists have been observed to affect NMDA receptor function, the alpha-adrenergic system and opioid analgesia.\ 3361 IPR005527 \

    Cytokinesis needs to be regulated spatially in order to ensure that it occurs between the daughter genomes. In prokaryotes such as Escherichia coli, cytokinesis is initiated by FtsZ, a tubulin-like protein that assembles into a ring structure at the cell center called the Z ring. A fundamental problem in prokaryotic cell biology is to understand how the midcell division site is identified. Two major negative regulatory systems are known to be involved in preventing Z-ring assembly at all sites except the midcell. One of these systems, called nucleoid occlusion, blocks Z-ring assembly in the area occupied by an unsegregated nucleoid until a critical stage in chromosome replication or segregation is reached. The other system consists of three proteins, MinC, MinD and MinE, which prevent assembly of Z rings in regions of the cell not covered by the nucleoid, such as the cell poles. MinC is an inhibitor of FtsZ polymerization, resulting in the inhibition of Z ring assembly in the cell; MinD greatly enhances the inhibitory effects of MinC in vivo; and MinE antagonizes the effects of MinC and MinD PUBMED:11378404.

    \

    MinE is a small bifunctional protein. The amino terminus of MinE is required to interact with MinD, while the carboxyl terminus is required for 'topological specificity' - that is, the ability of MinE to antagonize MinCD inhibition of Z rings at the midcell position but not at the poles.

    \ \ 1397 IPR004275 \ In addition to the highly specific cell-mediated immune system, vertebrates possess an efficient host-defense mechanism against invading microorganisms, which involves the synthesis of highly potent antimicrobial peptides with a broad spectrum of activity. This family contains a number of these defence peptides secreted from the skin of amphibians, including the opiate-like dermorphins and deltorphins, and the antimicrobial dermoseptins and temporins.\ 3026 IPR000688 \

    Bacterial membrane-bound nickel-dependent hydrogenases require a number of accessory proteins that are involved in their maturation. The exact role of these proteins is not yet clear, but some seem to be required for the incorporation of the nickel ions PUBMED:8305450. One of these proteins is generally known as hypA. It is a protein of about 12 to 14 kDa that contains, in its C-terminal region, four conserved cysteines that form a zinc-finger-like motif. Escherichia coli has two proteins that belong to this family, hypA and hybF. Homologues have also been found in a number of archaeal species, including MJ0214 in the genome of Methanococcus jannaschii.

    \ \ 5986 IPR009322 \

    This is a family of small phage tail proteins, referred to as protein E.

    \ 623 IPR007007 \ Ninjurin (nerve injury-induced protein) is involved in nerve regeneration and in the formation of some tissues PUBMED:8780658.\ 6585 IPR010629 \

    This entry represents several insect-specific allergen repeats. These repeats are commonly found in various proteins from cockroaches, fruit flies and mosquitoes. It has been suggested that the repeat sequences evolved by duplication of an ancestral amino acid domain, which may have arisen from the mitochondrial energy transfer proteins PUBMED:9804858.

    \ 4850 IPR007394 \ Members of this family are predicted to contain a helix-turn-helix motif, for example residues 37-55 in Mycoplasma mycoides p13 (). Genes encoding family members are often part of operons that encode components of the SRP pathway, and this protein may regulate the expression of an operon related to the SRP pathway PUBMED:9070906.\ 4403 IPR005621 \ The binding of SeqA protein to hemimethylated GATC sequences is important in the negative modulation of chromosomal initiation at oriC, and in the formation of SeqA foci necessary for Escherichia coli chromosome segregation PUBMED:11457824. SeqA tetramers are able to aggregate or multimerize in a reversible, concentration-dependent manner PUBMED:11457824. Apart from its function in the control of DNA replication, SeqA may also be a specific transcription factor PUBMED:11442835.\ 4584 IPR001983 \

    Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein that has been found to be preferentially synthesized in cells during the early growth phase of some types of tumor PUBMED:2479380, PUBMED:3357792, but which is also expressed in normal cells. The physiological function of TCTP is still not known. It was first identified as a histamine-releasing factor acting in IgE-dependent allergic reactions. In addition, TCTP has been shown to bind to tubulin in the cytoskeleton, has a high affinity for calcium, is the binding target of the antimalarial compound artemisinin, and is induced in vitamin D-dependent apoptosis. TCTP production is thought to be controlled at the translational as well as the transcriptional level PUBMED:10951206.

    \

    TCTP is a hydrophilic protein of 18 to 20 kDa. TCTPs do not share significant sequence similarity with any other class of proteins. The structure of TCTP has been determined and shows significant structural similarity to the human protein Mss4, which is a guanine nucleotide-free chaperone of the Rab protein PUBMED:11473261. Close homologs have been found in plants PUBMED:1623194, earthworm PUBMED:9655922, Caenorhabditis elegans (F52H2.11), Hydra, Saccharomyces cerevisiae (YKL056c) PUBMED:8091862 and Schizosaccharomyces pombe (SpAC1F12.02c).

    \ 6276 IPR009454 \

    This region is found in Apolipophorin proteins.

    \ 1194 IPR010256 \

    A number of evolutionarily-related proteins have been found to be involved in the transport of ammonium ions across membranes PUBMED:8062823, PUBMED:8621394.

    \ \

    Members of this family include:\

    \

    As expected from their transport function, these proteins are highly hydrophobic and seem to contain from 10 to 12 transmembrane domains.

    \ 312 IPR006867 \ This conserved region contains a leucine zipper-like domain. The proteins are found only in plants and their functions are unknown.\ 3723 IPR002705 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C16 (clan CA). They are found only in viruses such as the Coronaviridae, often as part of a multifunctional protein with RNA-directed RNA polymerase activity.

    \ 445 IPR007554 \ Wall-associated teichoic acids are a heterogeneous class of phosphate-rich polymers that are covalently linked to the cell wall peptidoglycan of Gram-positive bacteria. They consist of a main chain of phosphodiester-linked polyols and/or sugar moieties attached to peptidoglycan via a linkage unit. CDP-glycerol:poly(glycerophosphate) glycerophosphotransferase is responsible for the polymerisation of the main chain of the teichoic acid by sequential transfer of glycerol-phosphate units from CDP-glycerol to the linkage unit lipid PUBMED:10648531.\ 3756 IPR007035 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
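    The loose HEXXH motif and the stricter abXHEbbHbc definition can be compared with regular expressions. This Python sketch is illustrative only: the residue classes chosen for 'b' (uncharged, treating D, E, K and R as the charged residues) and 'c' (hydrophobic) are assumptions rather than a published definition, 'a' is allowed to be any residue because valine or threonine is only the most frequent choice, proline is excluded from every variable position of the strict pattern, and the function `find_motifs` and the test sequences are hypothetical.

```python
import re

# Assumed residue classes (illustrative, not a published definition).
ANY_NON_PRO = "ACDEFGHIKLMNQRSTVWY"   # any residue except proline
UNCHARGED = "ACFGHILMNQSTVWY"         # excludes charged D, E, K, R and proline
HYDROPHOBIC = "AFILMVW"               # excludes proline

# Loose motif: H-E-x-x-H. Lookaheads report overlapping matches.
LOOSE = re.compile(r"(?=(HE..H))")
# Strict motif: a b X H E b b H b c.
STRICT = re.compile(
    "(?=("
    f"[{ANY_NON_PRO}][{UNCHARGED}][{ANY_NON_PRO}]"  # a b X
    "HE"
    f"[{UNCHARGED}][{UNCHARGED}]"                    # b b
    "H"
    f"[{UNCHARGED}][{HYDROPHOBIC}]"                  # b c
    "))"
)


def find_motifs(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return (1-based start, match) lists for the loose and strict motifs."""
    seq = sequence.upper()
    return {
        "HEXXH": [(m.start() + 1, m.group(1)) for m in LOOSE.finditer(seq)],
        "abXHEbbHbc": [(m.start() + 1, m.group(1)) for m in STRICT.finditer(seq)],
    }


if __name__ == "__main__":
    # Hypothetical toy sequences; the second places proline at a 'b' position.
    print(find_motifs("GGVTAHEAVHAIGG"))
    print(find_motifs("GGVTAHEPVHAIGG"))
```

    The second toy sequence still matches HEXXH but fails the strict pattern, which mirrors the observation that proline is never found in this site.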

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M55 (DppA aminopeptidase family, clan MN). The type example is Bacillus subtilis DppA, a binuclear zinc-dependent, D-specific aminopeptidase. Its structure reveals that DppA is a new example of a self-compartmentalising protease, a family of proteolytic complexes of which proteasomes are the most extensively studied representatives. The DppA enzyme is composed of identical 30 kDa subunits organised in a decamer with 52 point-group symmetry. A 20 Å wide channel runs through the complex, giving access to a central chamber holding the active sites. The structure shows DppA to be the prototype of a new family of metalloaminopeptidases characterised by the SXDXEG key sequence PUBMED:11473256. The only known substrates are D-Ala-D-Ala and D-Ala-Gly-Gly.

    \ 94 IPR004143 \ This domain is found in biotin protein ligase and in lipoate-protein ligases A and B. Biotin is covalently attached at the active site of certain enzymes that transfer carbon dioxide from bicarbonate to organic acids to form cellular metabolites. Biotin protein ligase (BPL) is the enzyme responsible for attaching biotin to a specific lysine at the active site of biotin enzymes. Each organism probably has only one BPL. Biotin attachment is a two-step reaction that results in the formation of an amide linkage between the carboxyl group of biotin and the epsilon-amino group of the modified lysine PUBMED:10470036. Lipoate-protein ligase A (LPLA) catalyses the formation of an amide linkage between lipoic acid and a specific lysine residue in lipoate-dependent enzymes PUBMED:8206909.\ 7514 IPR011656 \ NOTCH signalling plays a fundamental role during a great number of developmental processes in multicellular animals PUBMED:10221902. NOD and NODP represent a region present in many NOTCH proteins and NOTCH homologs in multiple species, such as NOTCH2 and NOTCH3, LIN12, SC1 and TAN1. The role of the NOD and NODP domains remains to be elucidated.\ 7273 IPR010890 \

    This family contains the bacterial primosomal replication proteins priB and priC (approximately 180 residues long). In Escherichia coli, these function in the assembly of the primosome PUBMED:10613856.

    \ 6361 IPR010535 \

    This family consists of hypothetical proteins specific to Oryza sativa. One sequence () appears to be tandemly repeated.

    \ 4828 IPR005064 \

    This is a protein family of unknown function.

    \ 2733 IPR001554 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because protein folds are better conserved than protein sequences, some of the families can be grouped into 'clans'.

    \

    Glycoside hydrolase family 14 comprises enzymes with only one known activity: beta-amylase (). A Glu residue has been proposed as a catalytic residue, but it is not known whether it is the nucleophile or the proton donor.

    \ \

    Beta-amylase PUBMED:2457058, PUBMED:2464171 is an enzyme that hydrolyzes 1,4-alpha-glucosidic linkages in starch-type polysaccharide substrates so as to remove successive maltose units from the non-reducing ends of the chains. Beta-amylase is present in certain bacteria as well as in plants.

    \

    Three highly conserved sequence regions are found in all known beta-amylases. The first of these regions is located in the N-terminal section of the enzymes and contains an aspartate known to be involved in the catalytic mechanism PUBMED:2474529. The second, located more centrally, is centred around a glutamate that is also involved in the catalytic mechanism PUBMED:8174545.

    \

    The 3D structure of a complex of soybean beta-amylase with an inhibitor (alpha-cyclodextrin) has been determined to 3.0 Å resolution by X-ray diffraction PUBMED:1491009. The enzyme folds into large and small domains: the large domain has a (beta alpha)8 super-secondary structural core, while the smaller is formed from two long loops extending from the beta-3 and beta-4 strands of the (beta alpha)8 fold PUBMED:1491009. The interface of the two domains, together with shorter loops from the (beta alpha)8 core, forms a deep cleft in which the inhibitor binds PUBMED:1491009. Two maltose molecules also bind in the cleft, one sharing a binding site with alpha-cyclodextrin and the other sitting more deeply in the cleft PUBMED:1491009.

    \ 1595 IPR003805 \

    Some bacteria synthesize cobalamin (vitamin B12) de novo under anaerobic conditions. The CobU, CobS, CobT, and CobC proteins have been proposed to catalyse the late steps in adenosylcobalamin biosynthesis, which define the nucleotide loop assembly pathway PUBMED:10518530. CobS is the cobalamin(-5'-phosphate) synthase involved in part III of cobalamin biosynthesis. The enzyme catalyses the reactions adenosylcobinamide-GDP + alpha-ribazole-5'-P = adenosylcobalamin-5'-phosphate + GMP and adenosylcobinamide-GDP + alpha-ribazole = adenosylcobalamin + GMP. The protein product is associated with a large complex of proteins and is induced by cobinamide PUBMED:8501034.
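    Written as display equations, keeping the "=" convention used in the text, the two CobS reactions are:

```latex
\begin{align*}
\text{adenosylcobinamide-GDP} + \alpha\text{-ribazole-5'-P}
  &= \text{adenosylcobalamin-5'-phosphate} + \text{GMP} \\
\text{adenosylcobinamide-GDP} + \alpha\text{-ribazole}
  &= \text{adenosylcobalamin} + \text{GMP}
\end{align*}
```

    Both reactions use adenosylcobinamide-GDP as the donor and release GMP; they differ only in whether the alpha-ribazole acceptor carries a 5'-phosphate, which is carried through to the product.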

    \ \ 2396 IPR001884 \

    Translation initiation factor 5A (IF-5A) is reported to be involved in the first step of peptide bond formation in translation, to be involved in cell-cycle regulation and to be a cofactor for the Rev and Rex transactivator proteins of human immunodeficiency virus-1 and T-cell leukaemia virus I, respectively PUBMED:8347280, PUBMED:1903841, PUBMED:9753699. \ \ \ IF-5A contains an unusual amino acid, hypusine (N-epsilon-(4-amino-2-hydroxybutyl)lysine), that is required for its function. The first step in the post-translational modification of lysine to hypusine is catalyzed by the enzyme deoxyhypusine synthase, the structure of which has been reported.

    \ \ \

    The crystal structure of IF-5A from the archaeon Pyrobaculum aerophilum has been determined to 1.75 Å. Unmodified P. aerophilum IF-5A is found to be a beta structure with two domains and three separate hydrophobic cores. The lysine (Lys42) that is post-translationally modified by deoxyhypusine synthase is found at one end of the IF-5A molecule in a turn between beta strands beta4 and beta5; this lysine residue is freely solvent accessible. The C-terminal domain is homologous to the cold-shock protein CspA of E. coli, which has a well-characterized RNA-binding fold, suggesting that IF-5A is involved in RNA binding PUBMED:9753699.

    \ \ 3189 IPR000065 \ Leptin, a metabolic monitor of food intake and energy need, is encoded by the ob (obese) gene. The protein may function as part of a signalling pathway from adipose tissue that acts to regulate the size of the body fat depot PUBMED:7984236, the hormone effectively switching off the brain's appetite signal when it senses that the body is satiated. Obese humans have high levels of the protein, suggesting a similarity to type II (adult-onset) diabetes, in which sufferers over-produce insulin but cannot respond to it metabolically: they have become insulin resistant. Similarly, it is thought that obese individuals may be leptin resistant.\ 799 IPR000722 \

    RNA polymerases catalyze the DNA-dependent polymerisation of RNA, using the four ribonucleoside triphosphates as substrates. Prokaryotes contain a single RNA polymerase, compared with three in eukaryotes (not including mitochondrial and chloroplast polymerases). Eukaryotic RNA polymerase I is used mainly to transcribe ribosomal RNA units, polymerase II transcribes mRNA precursors, and polymerase III transcribes 5S rRNA and tRNA genes. Each class of RNA polymerase is assembled from nine to fourteen different polypeptides. Members of the family include the largest subunit from eukaryotes; the gamma subunit from Cyanobacteria; the beta' subunit from bacteria; the A' subunit from archaea; and the B'' subunit from chloroplast RNA polymerases.

    \ 5921 IPR009287 \

    This family consists of several eukaryotic transcription initiation Spt4 proteins. The three transcription-elongation factors Spt4, Spt5, and Spt6 are conserved among eukaryotes and are essential for transcription via the modulation of chromatin structure. Spt4 and Spt5 are tightly associated in a complex, while the physical association of the Spt4-Spt5 complex with Spt6 is considerably weaker. Spt4, Spt5, and Spt6 have been shown to play roles in transcription elongation in both yeast and humans, including a role in activation by Tat. They are general transcription-elongation factors that control transcription both positively and negatively in important regulatory and developmental roles PUBMED:11182892.

    \ 1842 IPR002808 \

    This entry describes prokaryotic proteins of unknown function.

    \ 4662 IPR000531 \ In Escherichia coli the TonB protein interacts with outer membrane receptor proteins that carry out high-affinity binding and energy-dependent uptake of specific substrates into the periplasmic space PUBMED:14499604. These substrates are either poorly permeable through the porin channels or are encountered at very low concentrations. In the absence of TonB these receptors bind their substrates but do not carry out active transport. The TonB protein also interacts with some colicins. The proteins that are currently known or presumed to interact with TonB include BtuB, CirA, FatA, FcuT, FecA, FhuE, FptA, HemR, IrgA, IutA, PfeA, PupA and Tbp1. Most of these proteins contain a short conserved region at their N-terminus.\ 728 IPR007186 \

    This domain inhibits pectin methylesterases (PMEs) and invertases through formation of a non-covalent 1:1 complex PUBMED:8521860. It has been implicated in the regulation of fruit development, carbohydrate metabolism and cell wall extension. It may also be involved in inhibiting microbial pathogen PMEs. It has been observed that it is often expressed as a large inactive preprotein PUBMED:8521860. It is also found at the N-termini of PMEs predicted from DNA sequences, suggesting that both PMEs and their inhibitors are expressed as a single polyprotein and subsequently processed. It has two disulphide bridges and is mainly alpha-helical PUBMED:10880981.

    \ 1947 IPR002577 \

    The hxlR-type HTH domain is a domain of ~90-100 amino acids present in putative transcription regulators with a winged helix-turn-helix (wHTH) structure. The domain is named after Bacillus subtilis hxlR, a transcription activator of the hxlAB operon involved in the detoxification of formaldehyde PUBMED:10572115. The hxlR-type domain forms the core of putative transcription regulators and of hypothetical proteins occurring in eubacteria as well as in archaea. The sequence and structure of hxlR-type proteins show similarities with the marR-type wHTH PUBMED:11839496.

    \

    \ The crystal structure of ytfH resembles the DNA-binding domains of winged helix proteins, containing a three helix (H) bundle and a three-stranded antiparallel beta-sheet (B) in the topology: H1-H2-B1-H3-H4-B2-B3-H5-H6. This topology corresponds with that of the marR-type DNA-binding domain, wherein helices 3 and 4 comprise the helix-turn-helix motif and the beta-sheet is called the wing.

    \ \ \ 5680 IPR008566 \ This family consists of several uncharacterised proteins from the Gammaherpesvirinae.\ 496 IPR001494 \

    The exchange of macromolecules between the nucleus and cytoplasm takes place through nuclear pore complexes. Active transport of large molecules through these pore complexes requires carrier proteins that shuttle between the two compartments. Members of the importin-beta/karyopherin-beta family act as carriers for many nuclear trafficking processes PUBMED:12372823. Importin-beta binds cargo in the cytoplasm, the complex moves through the pore, and the cargo is released in the nucleus on binding of Ran-GTP to importin-beta.

    \ \

    Importin-beta is a helicoidal molecule constructed from 19 HEAT repeats, each formed from a pair of alpha-helices. Many nuclear pore proteins contain FG sequence repeats, and interactions between repeats containing FxFG or GLFG cores and transport factors have been demonstrated. The crystal structure of residues 1-442 of importin-beta bound to a GLFG peptide indicates that this repeat core binds to the same primary site as FxFG cores, suggesting that functional differences between different repeats probably arise from differences in their spatial organization.

    \ 7278 IPR010005 \

    This family contains the bacterial formate hydrogenlyase maturation protein HycH, which is approximately 140 residues long. It may be required for the conversion of a precursor form of the large subunit of hydrogenase 3 into a mature form PUBMED:1625581.

    \ 3097 IPR000354 \

    Involucrin PUBMED:1359382, PUBMED:8277848 is a protein present in keratinocytes of epidermis and other stratified squamous epithelia. Involucrin first appears in the cell cytosol but ultimately becomes cross-linked to membrane proteins by transglutaminase, thus helping to form an insoluble envelope beneath the plasma membrane.

    \ \

    Structurally, involucrin consists of a conserved region of about 75 amino acid residues followed by two segments of highly variable length that contain glutamine-rich tandem repeats. The glutamine residues in the tandem repeats are the substrate for transglutaminase in the cross-linking reaction. The total size of the protein varies from 285 residues (in dog) to 835 residues (in orangutan).

    \ 5184 IPR008021 \

    Coat protein A, also known as attachment protein, is necessary for adsorption of the virion onto the F-pilus of the host cell.

    \ \ 7287 IPR010008 \

    This family contains a number of RstB proteins approximately 120 residues long, including RstB1 and RstB2, from the Vibrio cholerae phage CTX. Functional analyses indicate that rstB2 is required for integration of the CTXphi phage into the V. cholerae chromosome PUBMED:9220000.

    \ 391 IPR007219 \

    This domain is found in a number of fungal transcription factors including transcriptional activator xlnR, yeast regulatory protein GAL4, and other transcription proteins regulating a variety of cellular and metabolic processes.

    \ 3813 IPR003512 \ This family contains the bacteriophage helix-destabilizing protein, or single-stranded DNA binding protein, required for DNA synthesis. The protein binds to DNA in a highly cooperative manner without pronounced sequence specificity. In the presence of single-stranded DNA it binds cooperatively to form a helical protein-DNA complex. During synthesis, it prevents the single-stranded (progeny) viral DNA from being converted back into the double-stranded replicative form.\ 450 IPR000237 \ The GRIP (golgin-97, RanBP2alpha, Imh1p and p230/golgin-245) domain PUBMED:10209120, PUBMED:10209123, PUBMED:10209125 is found in many large coiled-coil proteins. It has been shown to be sufficient for targeting to the Golgi. The GRIP domain contains a completely conserved tyrosine residue.\ 1211 IPR012307 \

    This TIM alpha/beta barrel structure is found in xylose isomerase () and in endonuclease IV (, ). This domain is also found in the N termini of bacterial myo-inositol catabolism proteins. These are involved in the myo-inositol catabolism pathway and are required for growth on myo-inositol in Rhizobium leguminosarum bv. viciae PUBMED:11497462.

    \ 6886 IPR009768 \

    This family represents a conserved region within a number of myosin II heavy chain-like proteins that seem to be specific to Arabidopsis thaliana.

    \ 2245 IPR007654 \ This region is found in some SIR2 proteins ().\ 6369 IPR010540 \

    This family consists of several bacterial proteins of unknown function.

    \ 5134 IPR007971 \

    This family consists of several bundlin proteins from Escherichia coli. Bundlin is a type IV pilin protein that is the only known structural component of enteropathogenic E. coli bundle-forming pili (BFP). BFP play a role in virulence, antigenicity, autoaggregation and localised adherence to epithelial cells PUBMED:11083828.

    \ 4259 IPR000754 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the A-site aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
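    The elongation cycle described above (A-site binding, peptidyl transfer, translocation) can be illustrated with a toy model that tracks only which tRNA carries the growing chain. This is a simplified sketch under stated assumptions: initiation and termination are not modelled, EF-Tu, EF-G and GTP appear only as comments, and `CODON_TABLE` is a hypothetical four-codon subset of the standard genetic code chosen for illustration.

```python
# Toy model of the elongation cycle: it tracks only the growing chain as it
# passes between the A and P sites. EF-Tu, EF-G and GTP appear as comments,
# not as simulated objects, and the codon table is a small illustrative subset
# of the standard genetic code.
CODON_TABLE = {"AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def elongate(mrna: str) -> str:
    """Return the one-letter peptide encoded by an AUG-initiated toy mRNA."""
    usable = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] != "AUG":
        raise ValueError("toy model expects an AUG start codon")

    # Initiation is not modelled: start with Met-tRNA already in the P site.
    p_site_chain = ["M"]
    for codon in codons[1:]:
        if codon in STOP_CODONS:
            break  # termination and release are outside this sketch
        # An aminoacyl-tRNA (delivered as a complex with EF-Tu and GTP) binds
        # the A site; codons missing from the toy table raise KeyError.
        incoming = CODON_TABLE[codon]
        # Peptidyl transfer: the chain moves from the P-site tRNA onto the
        # A-site aminoacyl-tRNA, extending it by one residue.
        a_site_chain = p_site_chain + [incoming]
        # Translocation (EF-G and GTP): the new peptidyl-tRNA moves to the
        # P site and the deacylated tRNA leaves through an exit site.
        p_site_chain = a_site_chain
    return "".join(p_site_chain)


print(elongate("AUGUUUGGCAAAUAA"))  # MFGK
```

    Each loop iteration corresponds to one round of the cycle, so the chain grows by exactly one residue per codon read.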

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:, PUBMED:2332055, groups bacterial, algal chloroplast, cyanelle and archaeal S9 proteins together with mammalian, plant and yeast mitochondrial ribosomal S9 proteins.

    \ 6497 IPR009555 \

    This entry represents several repeats specific to Xylella fastidiosa surface proteins, which are found in conjunction with , and .

    \ 2741 IPR000805 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because protein folds are better conserved than protein sequences, some of the families can be grouped into 'clans'.

    \

    Glycoside hydrolase family 26 comprises enzymes with only one known activity: mannanase ().

    \ \

    Family 26 encompasses mainly mannan endo-1,4-beta-mannosidases. Mannan endo-1,4-beta-mannosidase hydrolyses mannan and galactomannan but displays little activity towards other plant cell wall polysaccharides PUBMED:7848261. The enzyme randomly hydrolyses 1,4-beta-D-linkages in mannans, galactomannans, glucomannans and galactoglucomannans.

    \ 5951 IPR010364 \

    This family consists of several bacterial CreD or Cet inner membrane proteins. Dominant mutations of the cet gene of Escherichia coli result in tolerance to colicin E2 and increased amounts of an inner membrane protein with a Mr of 42,000. The cet gene is shown to be in the same operon as the phoM gene, which is required in a phoR background for expression of the structural gene for alkaline phosphatase, phoA. Although the Cet protein is not required for phoA expression, it has been suggested that the Cet protein has an enhancing effect on the transcription of phoA PUBMED:2835585.

    \ 4904 IPR006709 \ This protein is found to be part of a large ribonucleoprotein complex containing the U3 snoRNA PUBMED:12068309. Depletion of the Utp proteins impedes production of the 18S rRNA, indicating that they are part of the active pre-rRNA processing complex. This large RNP complex has been termed the small subunit (SSU) processome PUBMED:12068309.\ 521 IPR006652 \

    Kelch is a 50-residue motif, named after the Drosophila mutant in which it was first identified PUBMED:8453663. The motif appears six times in the Drosophila egg-chamber regulatory protein, and is also found in mouse protein MIPP PUBMED:8453663 and in a number of poxviruses. In addition, kelch repeats have been recognised in alpha- and beta-scruin PUBMED:7593276, PUBMED:7822422, and in galactose oxidase from the fungus Dactylium dendroides PUBMED:8126718. The structure of galactose oxidase reveals that the repeated sequence corresponds to a 4-stranded anti-parallel beta-sheet motif that forms the repeat unit in a super-barrel structural fold PUBMED:8182749.

    \

    The known functions of kelch-containing proteins are diverse: scruin is an actin cross-linking protein; galactose oxidase catalyses the oxidation of the hydroxyl group at the C6 position in D-galactose; neuraminidase hydrolyses sialic acid residues from glycoproteins; and kelch may have a cytoskeletal function, as it is localised to the actin-rich ring canals that connect the 15 nurse cells to the developing oocyte in Drosophila PUBMED:7593276. Nevertheless, based on the location of the kelch pattern in the catalytic unit in galactose oxidase, functionally important residues have been predicted in glyoxal oxidase PUBMED:8126718.

    \ 3601 IPR003184 \ This family of orthopoxvirus secreted proteins (also known as T1 and A41) interact with members of both the CC and CXC superfamilies of chemokines. It has been suggested that these secreted proteins modulate leukocyte influx into virus-infected tissues PUBMED:9123853.\ 7910 IPR012986 \

    This family consists of the PsaX family of photosystem I (PSI) protein subunits. PSI is a large multi-subunit pigment protein complex embedded in the thylakoid membranes of green plants and cyanobacteria. PsaX is one of the 12 protein subunits found in PSI and these subunits are arranged as monomers or trimers within the membrane as shown by the structure of the trimeric complex from Synechococcus elongatus PUBMED:14556907.

    \ 3883 IPR001010 \ Thionins are small, basic plant proteins, 45 to 50 amino acids in length, which include three or four conserved disulphide linkages. The proteins are toxic to animal cells, presumably attacking the cell membrane and rendering it permeable: this results in the inhibition of sugar uptake and allows potassium and phosphate ions, proteins, and nucleotides to leak from cells PUBMED:3985614. Thionins are mainly found in seeds where they may act as a defence against consumption by animals. A barley (Hordeum vulgare) leaf thionin that is highly toxic to plant pathogens and is involved in the mechanism of plant defence against microbial infections has also been identified PUBMED:1377959. The hydrophobic protein crambin from the Abyssinian cabbage (Crambe abyssinica) is also a member of the thionin family PUBMED:3985614.\ 5399 IPR008485 \ This family consists of several eukaryotic proteins of unknown function.\ 6484 IPR009547 \

    This family consists of several Tenuivirus PVC2 proteins from Rice grassy stunt virus, Maize stripe virus and Rice hoja blanca virus. The function of this family is unknown.

    \ 8142 IPR013207 \

    This 54-amino-acid repeat is found in many hypothetical proteins. Several hypothetical proteins from Corynebacterium glutamicum and C. efficiens, along with the PS1 protein, contain this repeat region. The N-terminal region of PS1 contains an esterase domain which transfers corynomycolic acid. The C-terminal region consists of 4 tandem LGFP repeats. It is hypothesised that the PS1 proteins in Corynebacterium, when associated with the cell wall, may be anchored via the LGFP tandem repeats, which may be important for maintaining cell wall integrity PUBMED:. Deletion of the protein results in a 10-fold increase in cell volume, implying that the protein is involved in cell-shape formation PUBMED:12740729. The secondary structure of each repeat is predicted to comprise two beta-strands and one alpha-helix PUBMED:.

    \ 5811 IPR009244 \

    This family consists of several eukaryotic proteins, which are homologues of the yeast MED7 protein. Activation of gene transcription in metazoans is a multistep process that is triggered by factors that recognise transcriptional enhancer sites in DNA. These factors work with co-activators such as MED7 to direct transcriptional initiation by the RNA polymerase II apparatus PUBMED:9989412.

    \ 5886 IPR010334 \

    An essential step in mRNA turnover is decapping. In yeast, two proteins have been identified that are essential for decapping, Dcp1 (this family) and Dcp2 (). The precise role of these proteins in the decapping reaction has not been established. Evidence suggests that Dcp1 may enhance the function of Dcp2 PUBMED:12554866.

    \ 5407 IPR008760 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They display a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1-S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
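    The point that the same three residues occur in a different linear order in each clan can be made explicit with a small lookup table. This is an illustrative sketch: `TRIAD_ORDER` and `describe_triad` are hypothetical names, and the table encodes only the three orders quoted in the paragraph above.

```python
# Catalytic-triad order quoted above, N- to C-terminal, as one-letter codes
# (H = His, D = Asp, S = Ser). The dictionary is an illustrative encoding.
TRIAD_ORDER = {
    "SA": "HDS",  # chymotrypsin clan
    "SB": "DHS",  # subtilisin clan
    "SC": "SDH",  # carboxypeptidase C clan
}
THREE_LETTER = {"H": "His", "D": "Asp", "S": "Ser"}


def describe_triad(clan: str) -> str:
    """Spell out a clan's triad order, e.g. 'SA' -> 'His-Asp-Ser'."""
    return "-".join(THREE_LETTER[r] for r in TRIAD_ORDER[clan])


# Same three residues in every clan ...
assert all(sorted(order) == ["D", "H", "S"] for order in TRIAD_ORDER.values())
# ... but a different linear order in each.
assert len(set(TRIAD_ORDER.values())) == len(TRIAD_ORDER)

for clan in TRIAD_ORDER:
    print(clan, describe_triad(clan))
```

    The two checks capture the paragraph's argument: the residue set is shared, while the sequence order differs and so can serve as a clan marker.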

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
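    The letter convention above can be sketched as a small decoder that also reports the nucleophile class described in the same paragraph. This is a hypothetical helper, not part of any MEROPS software; it accepts only the letters listed in this entry and plain family identifiers such as S32 or M55, and does not attempt to handle any other identifier formats.

```python
import re

# Catalytic-type letters for peptidase families, as listed above.
FAMILY_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}
# Clans may additionally use P when they contain families of more than one type.
CLAN_TYPES = {**FAMILY_TYPES, "P": "mixed"}


def nucleophile(letter: str) -> str:
    """Nucleophile class implied by the catalytic-type letter."""
    if letter in {"S", "T", "C"}:
        return "amino-acid residue (acyl intermediate)"
    if letter in {"A", "G", "M"}:
        return "activated water molecule"
    return "unknown"


def decode_family(family_id: str) -> tuple[str, int, str]:
    """Split an identifier such as 'S32' into (type, number, nucleophile)."""
    match = re.fullmatch(r"([ACGMSTU])(\d+)", family_id)
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    letter, number = match.group(1), int(match.group(2))
    return FAMILY_TYPES[letter], number, nucleophile(letter)


def clan_type(clan_id: str) -> str:
    """Catalytic type of a clan from its first letter, e.g. 'MN' -> 'metallo'."""
    return CLAN_TYPES[clan_id[0]]


print(decode_family("M55"))  # ('metallo', 55, 'activated water molecule')
print(clan_type("PA"))       # mixed
```

    Keeping family and clan lookups separate mirrors the text: P is valid only as a clan type, so `decode_family("P1")` is rejected.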

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S32 (clan PA(S)). The type example is equine arteritis virus serine endopeptidase (equine arteritis virus), which is involved in processing of nidovirus polyproteins PUBMED:10725411.

    \ 252 IPR004883 \ This is a group of uncharacterised proteins of unknown function.\ 6985 IPR009826 \

    This entry represents the N terminus (approximately 100 residues) of a number of phage DNA circulation proteins.

    \ 2633 IPR007313 \ This is a bacterial family of cytoplasmic membrane proteins with two transmembrane regions. The molecular function of FxsA is unknown, but in Escherichia coli its overexpression has been shown to alleviate the exclusion of phage T7 in cells carrying an F plasmid.\ 4420 IPR002078 \ Some bacterial regulatory proteins activate the expression of genes from promoters recognized by core RNA polymerase associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues involved in the ATP-dependent interaction with sigma-54 PUBMED:8407777, PUBMED:2041769. About half of the proteins in which this domain is found (algB, dcdT, flbD, hoxA, hupR1, hydG, ntrC, pgtA and pilR) belong to signal transduction two-component systems PUBMED:2694934 and possess, in their N-terminal section, a domain that can be phosphorylated by a sensor-kinase protein. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase activity, which may be required to promote a conformational change necessary for the interaction PUBMED:1534752. The domain contains an atypical ATP-binding motif A (P-loop) as well as a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the domain.\ 3 IPR005238 \

    2-phosphosulfolactate phosphatase () catalyzes the hydrolysis of 2-phospho-3-sulfolactate (formed by sulfonation of phosphoenolpyruvate) to sulfolactate, the second step in coenzyme M biosynthesis. Coenzyme M is the terminal methyl carrier in methanogenesis PUBMED:11589710.

    \ 888 IPR004112 \

    In bacteria, two distinct membrane-bound enzyme complexes are responsible for the interconversion of fumarate and succinate (): fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of an FAD-binding flavoprotein and an iron-sulphur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome b.

    \

    In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) () is an enzyme composed of two subunits: an FAD flavoprotein and an iron-sulphur protein.

    \

    The flavoprotein subunit is a protein of about 60 to 70 kDa to which FAD is covalently bound via a histidine residue located in the N-terminal section of the protein PUBMED:2668268. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species PUBMED:1375942.

    \

    This family includes members that bind FAD, such as the flavoprotein subunits of succinate dehydrogenase and fumarate reductase, aspartate oxidase and the alpha subunit of adenylylsulphate reductase.

    \ 1699 IPR000511 \ Cytochrome c heme-lyase (CCHL) () and cytochrome c1 heme-lyase (CC1HL) PUBMED:1499554 are mitochondrial enzymes that catalyze the covalent attachment of a heme group to two cysteine residues of cytochrome c and c1. These two enzymes are functionally and evolutionarily related. There are two conserved regions, the first located in the central section and the second in the C-terminal section. Both regions contain conserved histidine, tryptophan and acidic residues which could be important for the interaction of the enzymes with the apoproteins and/or the heme group.\ 4205 IPR001780 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the A-site aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:\

    \

    These proteins have 87 to 110 amino-acid residues.

    \ 753 IPR007168 \ This domain is found in phage shock protein C (PspC), which is thought to be a transcriptional regulator. The presumed domain is 60 amino acid residues in length.\ 4285 IPR005571 \ Rpb5 has a bipartite structure that includes a eukaryote-specific N-terminal domain and a C-terminal domain resembling the archaeal RNAP subunit H PUBMED:10841537, PUBMED:10841538. The N-terminal domain is involved in DNA binding and is part of the jaw module in the RNA pol II structure PUBMED:10784442. This module is important for positioning the downstream DNA.\ 1174 IPR000936 \

    Alphaviruses are enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts; they include Semliki Forest and Sindbis viruses PUBMED:15378043. Alphaviruses have three structural proteins: the core nucleocapsid protein C, and the envelope proteins P62 and E1, which associate as a heterodimer. The viral membrane-anchored surface glycoproteins are responsible for receptor recognition and entry into target cells through membrane fusion. The proteolytic maturation of P62 into E2 and E3 causes a change in the viral surface. Together the E1, E2, and sometimes E3 glycoprotein "spikes" form an E1/E2 dimer or an E1/E2/E3 trimer, where E2 extends from the centre to the vertices, E1 fills the space between the vertices, and E3, if present, is at the distal end of the spike PUBMED:8107141, PUBMED:9445057. Upon exposure of the virus to the acidity of the endosome, E1 dissociates from E2 to form an E1 homotrimer, which is necessary for the fusion step that drives the cellular and viral membranes together PUBMED:11301009. This entry represents the alphaviral E2 glycoprotein. The E2 glycoprotein interacts with the nucleocapsid through its cytoplasmic domain, while its ectodomain is responsible for binding a cellular receptor.

    \ 1323 IPR000249 \

    Members of this group are polyhedral organelle shell proteins CsoS1A, CsoS1B and CsoS1C of Thiobacillus neapolitanus (Halothiobacillus neapolitanus) and their orthologs from other bacteria.

    \ \

    Some autotrophic and non-autotrophic organisms form polyhedral organelles, carboxysomes/enterosomes PUBMED:11722879. The best studied is the carboxysome of Halothiobacillus neapolitanus, which is composed of at least 9 proteins: six shell proteins (CsoS1A, CsoS1B, CsoS1C, CsoS2A, CsoS2B and CsoS3, a carbonic anhydrase) PUBMED:14729686, one protein of unknown function, and the large and small subunits of RuBisCo (CbbL and CbbS). Carboxysomes appear to be approximately 120 nm in diameter, are most often observed as regular hexagons, and have a solid interior bounded by a unilamellar protein shell. The interior is filled with type I RuBisCo, which is composed of 8 large and 8 small subunits; it accounts for 60% of the carboxysomal protein, approximately 300 molecules of enzyme per carboxysome. Carboxysomes are required for autotrophic growth at low CO2 concentrations and are thought to function as part of a CO2-concentrating mechanism PUBMED:15012219, PUBMED:9891798.

    \ \ \

    Polyhedral organelles (enterosomes) from non-autotrophic organisms are involved in coenzyme B12-dependent 1,2-propanediol utilisation (e.g., in Salmonella enterica PUBMED:10498708) and ethanolamine utilisation (e.g., in Salmonella typhimurium PUBMED:7868611). Genes needed for enterosome formation are located in the 1,2-propanediol utilisation (pdu) PUBMED:11844753, PUBMED:10498708 or ethanolamine utilisation (eut) PUBMED:7868611, PUBMED:10464203 operons, respectively. Although enterosomes of non-autotrophic organisms are apparently related structurally to carboxysomes, a functional relationship is uncertain. A role in CO2 concentration, similar to that of the carboxysome, is unlikely since there is no known association between CO2 and coenzyme B12-dependent 1,2-propanediol or ethanolamine utilisation PUBMED:11844753. It seems probable that enterosomes help protect cells from reactive aldehyde species in the degradation pathways of 1,2-propanediol and ethanolamine PUBMED:11722879.

    \ \ 7800 IPR013108 \

    This entry consists of a variety of amidohydrolase enzymes.

    \ 5789 IPR009236 \

    This family consists of A13L proteins from the Chordopoxviruses. A13L or p8 is one of the three most abundant membrane proteins of the intracellular mature Vaccinia virus PUBMED:9311819.

    \ 5587 IPR008724 \ This family consists of several sequences which are highly related to the C1 protein of the Vaccinia virus.\ 2938 IPR007620 \ In herpes simplex virus type 2, UL56 is thought to be a tail-anchored type II membrane protein involved in vesicular trafficking. The C-terminal hydrophobic region is required for association with the cytoplasmic membrane, and the N-terminal proline-rich region is important for the translocation of UL56 to the Golgi apparatus and cytoplasmic vesicles PUBMED:12050385.\ 3350 IPR001039 \

    Major Histocompatibility Complex (MHC) glycoproteins are heterodimeric cell surface receptors that function to present antigen peptide fragments to T cells responsible for cell-mediated immune responses. MHC molecules can be subdivided into two groups on the basis of structure and function: class I molecules present intracellular antigen peptide fragments (~10 amino acids) on the surface of the host cells to cytotoxic T cells; class II molecules present exogenously derived antigenic peptides (~15 amino acids) to helper T cells. MHC class I and II molecules are assembled and loaded with their peptide ligands via different mechanisms. However, both present peptide fragments rather than entire proteins to T cells, and are required to mount an immune response.

    \

    Class I MHC glycoproteins are expressed on the surface of all somatic nucleated cells, with the exception of neurons. MHC class I receptors present peptide antigens that are synthesised in the cytoplasm, which include self-peptides (presented for self-tolerance) as well as foreign peptides (such as viral proteins). These antigens are generated from degraded protein fragments that are transported to the endoplasmic reticulum by TAP proteins (transporter associated with antigen processing), where they can bind MHC I molecules, before being transported to the cell surface via the Golgi apparatus PUBMED:9485452, PUBMED:15526153. MHC class I receptors display antigens for recognition by cytotoxic T cells, which have the ability to destroy virus-infected or malignant (surfeit of self-peptides) cells.

    \

    MHC class I molecules comprise two chains: an MHC alpha chain (heavy chain) and a beta2-microglobulin chain (light chain), where only the alpha chain spans the membrane. The alpha chain has three extracellular domains (alpha1-3, with alpha1 at the N terminus), a transmembrane region and a C-terminal cytoplasmic tail; the soluble extracellular beta2-microglobulin chain associates primarily with the alpha3 domain and is necessary for MHC stability. The alpha1 and alpha2 domains of the alpha chain are referred to as the recognition region, because the peptide antigen binds in a deep groove between these two domains. This entry represents the alpha chain domains alpha1 and alpha2 that make up this recognition region (the alpha3 domain is represented by a separate entry).

    \ 2122 IPR007408 \ This is a family of archaeal proteins of unknown function.\ 2801 IPR007305 \ Traffic through the yeast Golgi complex depends on a member of the syntaxin family of SNARE proteins, Sed5, present in early Golgi cisternae. Got1 is thought to facilitate Sed5-dependent fusion events PUBMED:10406798.\ 4799 IPR004285 \ Members of this family are functionally uncharacterised.\ 1116 IPR003389 \ IVa2 protein can interact with the adenoviral packaging signal, and this interaction involves DNA sequences that have previously been shown to be required for packaging PUBMED:10684284. During the course of lytic infection, the adenovirus major late promoter (MLP) is induced to high levels after replication of viral DNA has started. IVa2 is a transcriptional activator of the major late promoter PUBMED:8207818.\ 6948 IPR010781 \

    This family consists of several hypothetical bacterial proteins of around 95 residues in length. The function of this family is unknown.

    \ 6095 IPR010430 \

    This is a family of bacterial and archaeal proteins with unknown function.

    \ 6523 IPR009577 \

    This family contains a small number of putative small multi-drug export proteins.

    \ 5634 IPR008717 \ This family consists of the eukaryotic Noggin proteins. Noggin is a glycoprotein that binds bone morphogenetic proteins (BMPs) selectively and, when added to osteoblasts, opposes the effects of BMPs. Noggin has been found to arrest the differentiation of stromal cells, preventing cellular maturation PUBMED:12633782.\ 1112 IPR000939 \ Adenoviruses are responsible for diseases such as pneumonia, cystitis, conjunctivitis and diarrhoea, all of which can be fatal to patients who are immunocompromised PUBMED:7704534. Viral infection commences with recognition of host cell receptors by means of specialised proteins on viral surfaces. Specific attachment of adenovirus is achieved through interactions between host-cell receptors and the adenovirus fiber protein, and is mediated by the globular carboxy-terminal domain of the fiber protein rather than by the 'shaft' region represented by this family. The alignment of this family contains two copies of a fifteen-residue repeat found in the 'shaft' region of adenoviral fiber proteins.\ 4706 IPR006783 \ Autonomous mobile genetic elements such as transposons or insertion sequences (IS) encode an enzyme, transposase, that is required for excising and inserting the mobile element. Transposases have been grouped into various families PUBMED:8041625, PUBMED:1310791, PUBMED:1718819. This family includes the putative transposase ISC1217 from archaebacteria.\ 5355 IPR008477 \ This is a family of eukaryotic proteins of unknown function that are induced by tumour necrosis factor.\ 3134 IPR007478 \ The N-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain is suggested to be homologous to the fungal DNA-binding APSES domain. 
The KilA-N and APSES domains may also share a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the N-terminal domains of the tRNA endonuclease PUBMED:11897024.\ 3362 IPR000425 \

    A number of transmembrane (TM) channel proteins can be grouped together on the basis of sequence similarities PUBMED:8325040, PUBMED:2014003, PUBMED:1715617, PUBMED:7529436.

    \ \

    These include:\ \

    \ \

    MIP family proteins are thought to contain 6 TM domains. Sequence analysis suggests that the proteins may have arisen through tandem, intragenic duplication from an ancestral protein that contained 3 TM domains PUBMED:.

    \ \

    Some of the proteins in this group form the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins PUBMED:11845000. Aquaporin-CHIP (Aquaporin 1) belongs to the Colton blood group system and is associated with the Co(a/b) antigens.

    \ \ \ \ \ \ 3105 IPR007251 \

    The low-affinity iron permease is an integral membrane protein that is required for low-affinity ferrous iron uptake and is induced by iron deprivation.

    \ 5244 IPR001665 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the family name indicating the type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
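The naming rules above amount to a small lookup: the first character of a family identifier gives the catalytic type and the kind of nucleophile, and a clan mixing types is type P. A minimal sketch, assuming identifiers of the form letter-plus-number (as in C37 and S60 mentioned in this document); the function names `catalytic_type`, `nucleophile` and `clan_type` are hypothetical and not part of any MEROPS software.

```python
from typing import Iterable

# Catalytic-type codes as listed in the text (first character of a family ID).
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Per the text: serine, threonine and cysteine peptidases use an amino acid as
# nucleophile (acyl intermediate); aspartic, glutamic and metallopeptidases
# use an activated water molecule.
ACYL_INTERMEDIATE = {"S", "T", "C"}
ACTIVATED_WATER = {"A", "G", "M"}


def catalytic_type(family_id: str) -> str:
    """Map a family identifier such as 'C37' to its catalytic type."""
    return CATALYTIC_TYPES[family_id[0].upper()]


def nucleophile(family_id: str) -> str:
    """Describe the nucleophile implied by the family's catalytic type."""
    code = family_id[0].upper()
    if code in ACYL_INTERMEDIATE:
        return "amino acid residue (acyl intermediate)"
    if code in ACTIVATED_WATER:
        return "activated water"
    return "unknown"


def clan_type(family_ids: Iterable[str]) -> str:
    """A clan whose families span more than one catalytic type is type 'P'."""
    codes = {fid[0].upper() for fid in family_ids}
    if not codes:
        raise ValueError("a clan needs at least one family")
    return codes.pop() if len(codes) == 1 else "P"
```

For example, `catalytic_type("C37")` gives "cysteine" and `clan_type(["C37", "S60"])` gives "P". That mixed list is only an illustration of the rule; it does not claim that these two families share a clan.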

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins that are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C37 (clan PA(C)). The type example is calicivirin (Southampton virus), an endopeptidase that cleaves the polyprotein at sites N-terminal to itself, liberating the polyprotein helicase. Southampton virus is a positive-stranded ssRNA virus belonging to the Caliciviruses, which cause gastroenteritis. The calicivirus genome contains two open reading frames, ORF1 and ORF2. ORF1 encodes a non-structural polypeptide that has RNA helicase, cysteine protease and RNA polymerase activity PUBMED:8642693. The regions of the polyprotein in which these activities lie are similar to proteins produced by the picornaviruses PUBMED:1551442. ORF2 encodes a structural, capsid protein. Two different families of caliciviruses can be distinguished on the basis of sequence similarity, namely the Norwalk-like viruses or small round structured viruses (SRSVs), and those classed as non-SRSVs.

    \ \ \ 862 IPR001232 \ SKP1 (together with SKP2) was identified as an essential component of the cyclin A-CDK2 S phase kinase complex PUBMED:10205047. It was found to bind several F-box-containing proteins (e.g., Cdc4, Skp2, cyclin F) and to be involved in the ubiquitin protein degradation pathway. A yeast homologue of SKP1 (P52286) was identified in the centromere-bound kinetochore complex PUBMED:8670864 and is also involved in the ubiquitin pathway PUBMED:9390558. In the slime mold, FP21 was shown to be glycosylated in the cytosol and to have homology to SKP1 PUBMED:7852383.\ 7760 IPR012912 \

    Members of this family are similar to the protein product of ORF-3 found on plasmid pRiA4 in the bacterium Agrobacterium rhizogenes. This plasmid is responsible for tumourigenesis at wound sites of plants infected by this bacterium, but the ORF-3 product does not seem to be involved in the pathogenetic process PUBMED:2226811. Other proteins in this family are annotated as putative TnpR resolvases, but no further evidence was found to support this. Moreover, another member of this family is described as a probable LexA repressor and in fact carries a LexA DNA-binding domain, but no references were found to expand on this.

    \ 2540 IPR001029 \ Flagellin is the subunit that polymerizes to form the filaments of bacterial flagella. Two regions, one at the N terminus and the other (this one) at the C terminus, seem always to occur together PUBMED:2190210.\ 4087 IPR000156 \

    Ran is an evolutionarily conserved member of the Ras superfamily that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Ran Binding Protein 1 (RanBP1) has guanine nucleotide dissociation inhibitory (GDI) activity specific for the GTP form of Ran, and also stimulates GTP hydrolysis by Ran mediated by the Ran GTPase-activating protein (GAP). RanBP1 contributes to keeping the RanGTP gradient across the nuclear envelope high (through its GDI activity) and the cytoplasmic levels of RanGTP low (as a GAP cofactor) PUBMED:12019565.

    All RanBP1 proteins contain a Ran-binding domain of approximately 150 amino acid residues. RanBP1 binds directly to RanGTP with high affinity. There are four sites of contact between Ran and the Ran-binding domain. One of these involves binding of the C-terminal segment of Ran to a groove on the Ran-binding domain that is analogous to the surface used in the EVH1-peptide interaction PUBMED:10404224. Nup358 contains four Ran-binding domains. The structure of the first of these is known PUBMED:10078529.

    \ 7236 IPR009983 \

    This family consists of several hypothetical bacterial proteins of around 90 residues in length. The function of this family is unknown.

    \ 1515 IPR004282 \ Members of this family are probable integral membrane proteins. Their molecular function is unknown. CemA proteins are found in the inner envelope membrane of chloroplasts but not in the thylakoid membrane PUBMED:8633006. A cyanobacterial member of this family has been implicated in CO2 transport, but is probably not a CO2 transporter itself PUBMED:8633006.\ 884 IPR002913 \

    START (StAR-related lipid-transfer) is a lipid-binding domain in StAR, HD-ZIP and signalling proteins PUBMED:10322415. StAR (Steroidogenic Acute Regulatory protein) is a mitochondrial protein that is synthesised in response to luteinising hormone stimulation PUBMED:7961770. Expression of the protein in the absence of hormone stimulation is sufficient to induce steroid production, suggesting that this protein is required in the acute regulation of steroidogenesis. Representatives of the START domain family have been shown to bind different ligands such as sterols (StAR protein) and phosphatidylcholine (PC-TP). Ligand binding by the START domain can also regulate the activities of other domains that co-occur with the START domain in multidomain proteins, such as the RhoGAP, homeodomain and thioesterase domains PUBMED:10322415, PUBMED:11276083.

    \

    \ The crystal structure of the START domain of human MLN64 shows an alpha/beta fold built around a U-shaped incomplete beta-barrel. Most importantly, the interior of the protein encompasses a 26 x 12 x 11 Angstrom hydrophobic tunnel that is apparently large enough to bind a single cholesterol molecule PUBMED:10802740. The START domain structure revealed an unexpected similarity to that of the birch pollen allergen Bet v 1 and to bacterial polyketide cyclases/aromatases PUBMED:11276083, PUBMED:10802740.

    \ 7366 IPR006637 \

    This hydrophobic repeat is found in a number of Clostridium proteins. It contains a conserved tryptophan residue.

    \ 650 IPR002128 \ This domain represents a C-terminal extension of NADH-ubiquinone/plastoquinone (complex I) chains. Only NADH-ubiquinone chain 5 from chloroplasts belongs to this family. Chain 5 is a component of complex I, which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is associated with proton translocation across the membrane PUBMED:1470679.\ 4442 IPR003189 \ This family represents the B subunit of Shiga-like toxin (SLT or verotoxin) produced by some strains of Escherichia coli associated with hemorrhagic colitis and hemolytic uremic syndrome. SLTs are composed of one enzymatic A subunit and five cell-binding B subunits.\ 5727 IPR008579 \ The function of the proteins in this entry is unknown. They contain the conserved barrel domain of the 'cupin' superfamily, and members are specific to plants and bacteria.\ 1489 IPR002712 \

    CcdB protein is a topoisomerase poison from Escherichia coli PUBMED:9917404. It is responsible for killing plasmid-free segregants and interferes with the activity of DNA gyrase. It also inhibits partitioning of the chromosomal DNA.

    \ \ 1044 IPR013086 \

    Neurotransmitter transport systems are integral to the release, re-uptake and recycling of neurotransmitters at synapses. High affinity transport proteins found in the plasma membrane of presynaptic nerve terminals and glial cells are responsible for the removal of released transmitters from the extracellular space, thereby terminating their actions PUBMED:15336049. Plasma membrane neurotransmitter transporters fall into two structurally and mechanistically distinct families. The majority of the transporters constitute an extensive family of homologous proteins that derive energy from the co-transport of Na+ and Cl- in order to transport neurotransmitter molecules into the cell against their concentration gradient. The family has a common structure of 12 presumed transmembrane helices and includes carriers for gamma-aminobutyric acid (GABA), noradrenaline/adrenaline, dopamine, serotonin, proline, glycine, choline, betaine and taurine. They are structurally distinct from the second, more restricted family of plasma membrane transporters, which are responsible for excitatory amino acid transport. The latter couple glutamate and aspartate uptake to the co-transport of Na+ and the counter-transport of K+, with no apparent dependence on Cl- PUBMED:8811182. In addition, both of these transporter families are distinct from the vesicular neurotransmitter transporters PUBMED:8103691, PUBMED:7823024.

    Sequence analysis of the Na+/Cl- neurotransmitter superfamily reveals that it can be divided into four subfamilies, these being transporters for monoamines, the amino acids proline and glycine, GABA, and a group of orphan transporters PUBMED:9779464.

    \

    The 5-HT neurotransmitter transporter is known to be expressed in the brain and also in the periphery: on platelet, placental and pulmonary cell membranes. The brain 5-HT transporter is thought to be the principal site of action of therapeutic anti-depressants (which inhibit this transporter), and it may also mediate the behavioural effects of cocaine and amphetamines PUBMED:7681602. The human form (630 amino acids) is 92% identical to the rat brain 5-HT transporter, and shares the same predicted topology and conserved sites for post-translational modification.

    \ \

    This domain is found in the N-terminal region of some 5-HT neurotransmitter transporters.

    \ 6408 IPR010559 \

    This family represents a region within bacterial histidine kinase enzymes. Two-component signal transduction systems such as those mediated by histidine kinase are integral parts of bacterial cellular regulatory processes, and are used to regulate the expression of genes involved in virulence. Members of this family often contain and/or .

    \ 8107 IPR013264 \

    This is the N-terminal, catalytic core domain of DNA primases. DNA primase is a nucleotidyltransferase that synthesizes the oligoribonucleotide primers required for DNA replication on the lagging strand of the replication fork. It can also prime the leading strand and has been implicated in cell division PUBMED:8294018.

    \ 3886 IPR007711 \ Several plasmids with proteic killer gene systems have been reported. All of them encode a stable toxin and an unstable antidote. Upon loss of the plasmid, the less stable inhibitor is inactivated more rapidly than the toxin, allowing the toxin to be activated. Activation of these systems results in cell filamentation and cessation of viable cell production. It has been verified that both the stable killer and the unstable inhibitor of these systems are short polypeptides. This family corresponds to the toxin.\ 5173 IPR008010 \

    This family of eukaryotic membrane proteins includes the putative receptor for human cytomegalovirus gH. The cellular function of this family remains unknown.

    \ 3898 IPR004930 \ This family is the Pneumovirus nucleocapsid protein. It is the most abundant protein in the virion and an important element in conferring helical symmetry on the nucleoprotein core as well as interacting with the M protein during virion formation.\ 6866 IPR009757 \

    This family consists of several Circovirus proteins of around 60 residues in length. The function of this family is unknown.

    \ 3059 IPR007743 \ Interferon-inducible GTPase (IIGP) is thought to play a role in intracellular defense. IIGP is predominantly associated with the Golgi apparatus, also localizes to the endoplasmic reticulum, and has a distinct role in IFN-induced intracellular membrane trafficking or processing PUBMED:11907101.\ 2613 IPR004923 \ The Saccharomyces cerevisiae iron permease FTR1 is a plasma membrane permease for high-affinity iron uptake. Also included in this family are bacterial hypothetical integral membrane proteins.\ 4989 IPR002644 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbZ (Ycf9), which is thought to be located at the interface of PSII and LHCII (light-harvesting complex II) complexes, the latter containing the light-harvesting antenna. PsbZ appears to act as a structural factor, or linker, that stabilises the PSII-LHCII supercomplexes, which fail to form in PsbZ-deficient mutants. This may in part be due to the marked decrease in two LHCII antenna proteins, CP26 and CP29, in PsbZ-deficient mutants, which results in structural changes as well as functional modifications in PSII PUBMED:11402165. PsbZ may also be involved in photo-protective processes under sub-optimal growth conditions.

    \ 3936 IPR005004 \

    This is a family of proteins expressed by members of the Poxviridae.

    \ 5202 IPR008037 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This family of serine protease inhibitors belongs to MEROPS inhibitor family I19, clan IW. They inhibit chymotrypsin, a peptidase belonging to the S1 family PUBMED:14705960.

    \ \

    They were first isolated from Locusta migratoria (migratory locust). These were HI, LMCI-1 (PMP-D2) and LMCI-2 (PMP-C) PUBMED:1472051, PUBMED:1740125, PUBMED:10696590; five additional members SGPI-1 to 5 were identified in Schistocerca gregaria (desert locust) PUBMED:9475173, and a heterodimeric serine protease inhibitor (pacifastin) was isolated from the hemolymph of Pacifastacus leniusculus (crayfish) PUBMED:9192625.

    \ \ \

    Pacifastin is a 155-kDa protein composed of two covalently linked subunits, which are separately encoded. The heavy chain of pacifastin (105 kDa) is related to transferrins, containing three transferrin lobes, two of which seem to be active for iron binding PUBMED:9192625. A number of the members of the transferrin family are also serine peptidases belonging to MEROPS peptidase family S60. The light chain of pacifastin (44 kDa) is the proteinase inhibitory subunit and has nine cysteine-rich inhibitory domains that are homologous to each other. The locust inhibitors share a conserved array of six cysteine residues with the pacifastin light chain. Structures of members of this family reveal a triple-stranded antiparallel beta-sheet connected by three disulphide bridges PUBMED:9192625.

    \ \

    The biological function(s) of the locust inhibitors is (are) not fully understood. LMCI-1 and LMCI-2 were shown to inhibit the endogenous proteolytic activating cascade of prophenoloxidase. Expression analysis shows that the genes encoding the SGPI precursors are differentially expressed in a time-, stage- and hormone-dependent manner.

    \ \ 3918 IPR003684 \ This family consists of porins from the alpha subdivision of the proteobacteria; the members of this family are related to Gram-ve_porins PUBMED:1370281. The porins form large aqueous channels in the cell membrane that allow the selective entry of hydrophilic compounds; this so-called 'molecular sieve' is found in the cell walls of Gram-negative bacteria.\ 3039 IPR000258 \ Certain Gram-negative bacteria express proteins that enable them to promote nucleation of ice at relatively high temperatures (above -5 °C) PUBMED:1366726, PUBMED:8224607. These proteins are localised at the outer membrane surface and can cause frost damage to many plants. The primary structure of the proteins contains a highly repetitive domain that dominates the sequence. The domain comprises a number of 48-residue repeats, which themselves contain 3 blocks of 16 residues, the first 8 of which are identical. It is thought that the repetitive domain may be responsible for aligning water molecules in the seed crystal.\
    \
                  [.........48.residues.repeated.domain..........]\
                 /              / |              | \\              \\\
                AGYGSTxTagxxssli  AGYGSTxTagxxsxlt  AGYGSTxTaqxxsxlt\
                [16.residues...]  [16.residues...]  [16.residues...]\
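The repeat layout in the diagram (a 48-residue unit made of three 16-residue blocks, each opening with the octapeptide AGYGSTxT) can be checked mechanically. This is a minimal sketch under stated assumptions: it treats the diagram's `x` as "any residue" and requires every block to start with the consensus exactly, which real repeats may not do. The names `block_matches` and `count_repeats` are hypothetical, and the test strings used with it are synthetic, filled in from the diagram's consensus rather than taken from a real ice-nucleation protein.

```python
import re

BLOCK_LEN = 16                               # residues per block
BLOCKS_PER_REPEAT = 3
REPEAT_LEN = BLOCK_LEN * BLOCKS_PER_REPEAT   # 48 residues per repeat
# Conserved octapeptide opening every 16-residue block, taken from the diagram;
# 'x' (any residue) becomes the regex wildcard '.'.
BLOCK_START = re.compile("AGYGSTxT".replace("x", "."))


def block_matches(block: str) -> bool:
    """True if a 16-residue block starts with the conserved octapeptide."""
    return len(block) == BLOCK_LEN and BLOCK_START.match(block) is not None


def count_repeats(seq: str, start: int = 0) -> int:
    """Count consecutive 48-residue repeats beginning at index `start`.

    Stops at the first 48-residue unit in which any of the three blocks
    fails the consensus check, or when fewer than 48 residues remain.
    """
    count = 0
    pos = start
    while pos + REPEAT_LEN <= len(seq):
        unit = seq[pos:pos + REPEAT_LEN]
        blocks = [unit[i:i + BLOCK_LEN] for i in range(0, REPEAT_LEN, BLOCK_LEN)]
        if not all(block_matches(b) for b in blocks):
            break
        count += 1
        pos += REPEAT_LEN
    return count
```

A strict prefix match keeps the sketch simple; tolerating substitutions in the octapeptide (for example, allowing one mismatch) would be a natural extension if real sequences deviate from the consensus.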
    
    \ 7415 IPR011444 \

    This domain is found in a family of paralogues in the planctomycetes. Its function is not known. It is found associated with the Planctomycete cytochrome C domain.

    \ 2269 IPR006462 \

    These sequences comprise a paralogous family of hypothetical proteins in Arabidopsis thaliana. No homologs are detected from other species. Length heterogeneity within the family is attributable partly to a 21-residue repeat present in from zero to three tandem copies. The proteins have no known function.

    \ 5280 IPR008453 \ This family consists of clavanin proteins from the haemocytes of the invertebrate Styela clava, a solitary tunicate. The family is made up of four alpha-helical antimicrobial peptides, clavanins A, B, C and D. The tunicate peptides resemble magainins in size, primary sequence and antibacterial activity. Synthetic clavanin A displays comparable antimicrobial activity to magainins and cecropins. The presence of alpha-helical antimicrobial peptides in the haemocytes of a urochordate suggests that such peptides are primeval effectors of innate immunity in the vertebrate lineage PUBMED:9001389.\ 4160 IPR000100 \ Ribonuclease P (RNase P) PUBMED:1689306, PUBMED:1700778, PUBMED:1374553 is a site-specific endonuclease that generates mature tRNAs by cleaving off the leader sequences at their 5' ends. In bacteria, RNase P is known to be composed of two components: a large (about 400 nucleotides) RNA (gene rnpB) and a small protein (119 to 133 amino acids) (gene rnpA). The RNA moiety of RNase P carries the catalytic activity; the function of the protein component is not yet clear, although it may act as an electrostatic screen allowing the highly negatively charged RNA enzyme-substrate complex to fold into the catalytic conformation. The sequence of rnpA is not highly conserved; however, there is a conserved basic region in the central part of the protein.\ 3413 IPR005091 \

    The major surface protein (MSP1) of the cattle pathogen Anaplasma is a heterodimer composed of MSP1a and MSP1b. This family represents the MSP1b chain. The MSP1 proteins are putative adhesins for bovine erythrocytes.

    \ 6090 IPR010428 \

    This is a family of bacterial proteins of unknown function.

    \ 5938 IPR010357 \

    This family consists of several hypothetical eukaryotic proteins of unknown function that are thioredoxin-like.

    \ 2419 IPR005639 \

    This family contains insecticidal toxins produced by Bacillus species of bacteria. During spore formation the bacteria produce crystals of this protein. When an insect ingests these proteins, they are activated by proteolytic cleavage. The N terminus is cleaved in all of the proteins and a C-terminal extension is cleaved in some members. Once activated, the endotoxin binds to the gut epithelium and causes cell lysis by the formation of cation-selective channels, which leads to the death of the insect. The activated toxin is composed of three distinct domains: an N-terminal helical bundle domain involved in membrane insertion and pore formation; a beta-sheet domain involved in receptor binding; and a C-terminal beta-sandwich domain that interacts with the N-terminal domain to form a channel PUBMED:7490762, PUBMED:11468393. This entry represents the conserved N-terminal domain.

    \ 3645 IPR002472 \ Neuronal ceroid lipofuscinoses (NCL) represent a group of encephalopathies that occur in 1 in 12,500 children. Mutations in the palmitoyl protein thioesterase (PPT) gene cause infantile neuronal ceroid lipofuscinosis (INCL) PUBMED:7637805. The most common mutation results in intracellular accumulation of the polypeptide and undetectable enzyme activity in the brain. Direct sequencing of cDNAs derived from brain RNA of INCL patients has shown a missense transversion of A to T at nucleotide position 364, which results in substitution of Trp for Arg at position 122 in the protein. Arg 122 is immediately adjacent to a lipase consensus sequence that contains the putative active-site Ser of PPT. The occurrence of this and two other independent mutations in the PPT gene strongly suggests that defects in this gene cause INCL.\ 7603 IPR011693 \ This family contains sequences which are similar to ORF40 of the bacteriophage TP901-1. The members of this family are both viral and bacterial proteins. In most bacteriophages, the genes located between the major head gene (orf 36 in TP901-1) and the major tail gene (orf 42 in TP901-1) are involved in the formation and connection of the head and tail structures, and in DNA packaging PUBMED:11312666.\ 7216 IPR010866 \

    This family contains the bacterial enzyme alpha-2,8-polysialyltransferase (approximately 500 residues long), which catalyses the polycondensation of alpha-2,8-linked sialic acid required for the synthesis of polysialic acid (PSA) PUBMED:12578835.

    \ 6042 IPR010407 \

    This family consists of several mammalian signaling lymphocytic activation molecule (SLAM) proteins. Optimal T cell activation and expansion require engagement of the TCR plus co-stimulatory signals delivered through accessory molecules. SLAM, a 70 kDa co-stimulatory molecule belonging to the Ig superfamily, is defined as a human cell surface molecule that mediates CD28-independent proliferation of human T cells and IFN-gamma production by human Th1 and Th2 clones PUBMED:10570270. SLAM has also been recognised as a receptor for measles virus PUBMED:12610126.

    \ 314 IPR006943 \ This conserved region is found in a number of plant proteins of unknown function.\ 6591 IPR010633 \

    This family consists of several prophage minor tail protein Z-like sequences from Escherichia coli, Salmonella typhimurium and lambda-like bacteriophages.

    \ 505 IPR001093 \ Synonym(s): Inosine-5'-monophosphate dehydrogenase, Inosinic acid dehydrogenase; \ Synonym(s): Guanosine 5'-monophosphate oxidoreductase \ \ \

    This entry contains two related enzymes, IMP dehydrogenase and GMP reductase. These enzymes adopt a TIM barrel structure.

    \ \

    IMP dehydrogenase (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the NAD-dependent oxidation of IMP to XMP PUBMED:2902093.\ \ \ \ IMP dehydrogenase is associated with cell proliferation and is a possible target for cancer chemotherapy. Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in humans PUBMED:1969416. IMP dehydrogenase nearly always contains a long insertion that harbours two CBS domains.

    \ \

    GMP reductase catalyzes the irreversible, NADPH-dependent reductive deamination of GMP to IMP PUBMED:2904262.\ \ \ \ It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides, and maintains the intracellular balance of A and G nucleotides.

    \ 6999 IPR010799 \

    This entry represents the C terminus (approximately 200 residues) of the product of a bacterial gene cluster that is involved in the degradation of the cyanobacterial toxin microcystin LR.

    \ 4345 IPR001751 \ The S-100 domain is a subfamily of the EF-hand calcium-binding proteins. S-100s are small, dimeric, acidic calcium- and zinc-binding proteins PUBMED: abundant in the brain. They have two different types of calcium-binding site: a low-affinity site with a special structure and a 'normal' EF-hand-type high-affinity site. The vitamin D-dependent intestinal calcium-binding proteins (ICaBP or calbindin 9 kDa) also belong to this family, but do not form dimers. In recent years the sequences of many new members of this family have been determined (for reviews see PUBMED:2115931, PUBMED:3075365, PUBMED:7759097); in most cases the function of these proteins is not yet known, although it is becoming clear that they are involved in cell growth and differentiation, cell cycle regulation and metabolic control. Some of these proteins are known to bind calcium while others do not (p10, for example).\ 1247 IPR000768 \ Mono-ADP-ribosylation is a post-translational modification of proteins in which the ADP-ribose moiety of NAD is transferred to proteins. This process is responsible for the toxicity of some bacterial toxins (e.g., cholera and pertussis toxins). A family of mono(ADP-ribosyl)transferases that transfer ADP-ribose to arginine exists in vertebrates PUBMED:8703012.\ \ At least five forms of the enzyme have been characterised to date, some of which are attached to the membrane via glycosylphosphatidylinositol (GPI) anchors, while others appear to be secreted. The enzymes contain ~250-300 residues, including putative signal sequences and carbohydrate attachment sites. In addition, the N- and C-termini are predominantly hydrophobic, a characteristic of GPI-anchored proteins PUBMED:7947688.\ 755 IPR001533 \

    DCoH is the dimerization cofactor of hepatocyte nuclear factor 1 (HNF-1) that functions as both a transcriptional coactivator and a pterin dehydratase PUBMED:8897596. X-ray crystallographic studies have shown that the ligand binds at four sites per tetrameric enzyme, with little apparent conformational change in the protein.

    \ \ 5035 IPR003224 \ This family is found in RNA viruses, and may be involved in RNA binding, possibly playing a regulatory role.\ 221 IPR013050 \

    The DOMON domain is a 110-125-residue domain that has been identified in the physiologically important enzyme dopamine beta-monooxygenase and in several other secreted and transmembrane proteins from both plants and animals. Its name derives from DOpamine beta-MOnooxygenase N-terminal domain. The DOMON domain can be found in one to four copies and in association with other domains, such as the Cu-ascorbate-dependent monooxygenase domain, the epidermal growth factor domain, the trypsin inhibitor-like domain (TIL), the SEA domain and the Reelin domain. The architectures of DOMON domain proteins strongly suggest a function in extracellular adhesion PUBMED:11551777.

    \

    \ The sequence conservation is predominantly centered around patches of hydrophobic residues. The secondary structure prediction of the DOMON domain points to an all-beta-strand fold with seven or eight core strands supported by a buried core of conserved hydrophobic residues. There is a characteristic motif with two small positions (Gly or Ser) corresponding to a conserved turn immediately C-terminal to strand three. It has been proposed that the DOMON domain might form a beta-sandwich structure, with the strands distributed into two beta sheets as is seen in many extracellular adhesion domains such as the immunoglobulin, fibronectin type III, cadherin and PKD domains PUBMED:11551777.

    \ 7913 IPR012604 \

    This C-terminal region is found in RBM1-like RNA binding hnRNPs PUBMED:15112237.

    \ 5857 IPR010316 \

    This domain is found at the N terminus of bacterial DNA-3-methyladenine glycosylase II, which is involved in DNA repair PUBMED:10706276, and in several putative regulatory proteins.

    \ \ 5585 IPR008797 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    In PSII, the oxygen-evolving complex (OEC) is responsible for catalysing the splitting of water into O2 and 4H+. The OEC is composed of a cluster of manganese, calcium and chloride ions bound to extrinsic proteins. In cyanobacteria there are five extrinsic proteins in the OEC (PsbO, PsbP-like, PsbQ-like, PsbU and PsbV), while in plants there are only three (PsbO, PsbP and PsbQ), PsbU and PsbV having been lost during the evolution of green plants PUBMED:15258264.
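    Written as a balanced half-reaction, the water splitting described above consumes two water molecules for every O2 released, and the four protons are accompanied by four electrons that are passed back to the photo-oxidised reaction centre. A short worked equation, with atoms and charge balanced on both sides:

```latex
2\,\mathrm{H_2O} \;\longrightarrow\; \mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^-
```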

    \

    This family represents the PSII OEC protein PsbQ. Both PsbQ and PsbP are regulators that are necessary for the biogenesis of optically active PSII. The crystal structure of PsbQ from spinach revealed a four-helix bundle. The distribution of positive and negative charges on the protein surface might explain the ability of PsbQ to increase the binding of chloride and calcium ions and make them available to PSII PUBMED:12949587.

    \ \ 5479 IPR008599 \ This region is found in several proteins characterised as carbohydrate diacid regulators (e.g. ). An HTH DNA-binding motif is found at the C terminus of these proteins, suggesting that this region includes the sugar recognition region.\ 822 IPR000477 \ The use of an RNA template to produce DNA, for integration into the host genome and exploitation of a host cell, is a strategy employed in the replication of retroid elements, such as the retroviruses and bacterial retrons. The enzyme catalysing polymerisation is an RNA-directed DNA polymerase, or reverse transcriptase (RT). Reverse transcriptase occurs in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses.\

    Retroviral reverse transcriptase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. The discovery of retroelements in prokaryotes raises intriguing questions concerning their roles in bacteria, the origin and evolution of reverse transcriptases, and whether the bacterial reverse transcriptases are older than eukaryotic reverse transcriptases PUBMED:8828137.

    \ 3007 IPR007250 \ These heat shock proteins (Hsp9 and Hsp12) are strongly expressed, increasing 100-fold upon entry into stationary phase in yeast PUBMED:2175390, PUBMED:8679693.\ 3262 IPR004962 \ Mab-21 is a homeotic regulator homologue. The protein is found in eukaryotes. \ 7775 IPR012878 \

    The members of this family are sequences derived from hypothetical bacterial and eukaryotic proteins of unknown function. One member of this family is annotated as a possible arabinosidase, but no references were found to back this.

    \ 1259 IPR004618 \

    Aspartate--ammonia ligase (asparagine synthetase) catalyses the conversion of L-aspartate to L-asparagine in the presence of ATP and ammonia. This family represents one of two non-homologous forms of aspartate--ammonia ligase found in Escherichia coli. This type is also found in Haemophilus influenzae, Treponema pallidum and Lactobacillus delbrueckii, but appears to have a very limited distribution. The fact that the protein from Haemophilus influenzae is more than 70% identical to that from the spirochete Treponema pallidum, but less than 65% identical to that from the closely related Escherichia coli, strongly suggests lateral transfer.
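    The lateral-transfer argument above rests on pairwise percent identity. A minimal sketch of that calculation follows, assuming the two sequences have already been aligned and assuming one common convention for the denominator (columns where neither sequence has a gap); other conventions, such as dividing by the length of the shorter sequence, give different numbers. The function name and toy fragments are hypothetical and are not the real H. influenzae, T. pallidum or E. coli sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: the denominator is the number of alignment columns in which
    neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        identical += int(a == b)
    if compared == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * identical / compared


# Toy, hypothetical fragments: 5 of 6 gap-free columns match.
print(round(percent_identity("MKT-LLA", "MKSALLA"), 1))  # 83.3
```

    With real data, the same function would be applied to each pairwise alignment (H. influenzae vs T. pallidum, H. influenzae vs E. coli) and the results compared against the species phylogeny.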

    \ 3952 IPR006872 \ This is a family of poxvirus late H7 proteins.\ 4739 IPR002314 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
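    The class assignments and attachment-site preferences listed above amount to a small lookup table. The sketch below transcribes them into Python; the names `CLASS_I`, `CLASS_II`, `synthetase_class` and `preferred_hydroxyl` are hypothetical. It encodes only the general tendencies stated in the text: known exceptions (for example, class I lysyl-tRNA synthetases in some archaea and bacteria, or class II phenylalanyl-tRNA synthetase acylating the 2'-hydroxyl) are deliberately not modelled.

```python
# Class membership of the 20 aminoacyl-tRNA synthetases, as listed in the text.
CLASS_I = frozenset({"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"})
CLASS_II = frozenset({"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"})

# tRNA ribose hydroxyl generally used by each class (a tendency, not a rule).
PREFERRED_HYDROXYL = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return "I" or "II" for a three-letter amino acid code."""
    if amino_acid in CLASS_I:
        return "I"
    if amino_acid in CLASS_II:
        return "II"
    raise KeyError(f"no synthetase class recorded for {amino_acid!r}")


def preferred_hydroxyl(amino_acid: str) -> str:
    """Hydroxyl generally aminoacylated by the synthetase for this amino acid."""
    return PREFERRED_HYDROXYL[synthetase_class(amino_acid)]


# Sanity checks: the two classes are disjoint and together cover all 20 amino acids.
assert CLASS_I.isdisjoint(CLASS_II)
assert len(CLASS_I | CLASS_II) == 20

print(synthetase_class("Trp"), preferred_hydroxyl("Trp"))  # I 2'-OH
# The synthetases covered by this entry's domain (Gly, His, Pro, Thr, Ser) are all class II.
print(all(synthetase_class(aa) == "II" for aa in ("Gly", "His", "Pro", "Thr", "Ser")))  # True
```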

    \ \

    The 10 class I synthetases are considered to share a catalytic domain structure based on the Rossmann fold, which is entirely different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    This domain includes the glycine, histidine, proline, threonine and serine tRNA synthetases.

    \ 3403 IPR001339 \

    The mRNA capping enzyme in yeasts is composed of two separate chains: alpha, an mRNA guanyltransferase, and beta, an RNA 5'-triphosphatase. X-ray crystallography reveals a large conformational change during guanyl transfer by mRNA capping enzymes PUBMED:9160746. Binding of the enzyme to nucleotides is specific to the GMP moiety of GTP. The viral mRNA capping enzyme is a monomer that transfers a GMP cap onto the end of mRNA that terminates with a 5'-diphosphate tail.

    \ 120 IPR002557 \

    The Peritrophin-A domain is found in chitin-binding proteins, particularly the peritrophic matrix proteins of insects and animal chitinases PUBMED:9651363, PUBMED:8621536, PUBMED:9256413. Copies of the domain are also found in some baculoviruses. It is an extracellular domain that contains six conserved cysteines that probably form three disulphide bridges. Chitin binding has been demonstrated for a protein containing only two of these domains PUBMED:9651363.

    \ 3920 IPR000860 \

    Porphobilinogen deaminase (PBGD), or hydroxymethylbilane synthase, is the third enzyme in the biosynthetic pathway of tetrapyrroles, which include the vitally important macrocycles haem, chlorophyll and corrin PUBMED:. PBGD catalyses the head-to-tail polymerisation of four molecules of porphobilinogen to assemble the open-chain tetrapyrrole hydroxymethylbilane. PBGD is a ubiquitously occurring, monomeric protein showing high sequence conservation among proteins from bacteria, fungi, plants and mammals. The protein contains a dipyrromethane cofactor, which is covalently attached to a cysteine side chain. The structure of PBGD shows the same chain fold as proteins from two classes of binding protein, the transferrins and the group-II periplasmic receptors (the sulphate-, phosphate-, maltodextrin- and lysine/arginine/ornithine-binding proteins). Despite these structural similarities, there is no significant identity between their sequences.
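    The stoichiometry of the polymerisation described above can be written as a single balanced equation: each head-to-tail condensation of a porphobilinogen unit releases one ammonia, and the final hydrolytic release of the assembled tetrapyrrole from the enzyme consumes one water molecule.

```latex
4\ \text{porphobilinogen} + \mathrm{H_2O} \;\longrightarrow\; \text{hydroxymethylbilane} + 4\,\mathrm{NH_3}
```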

    \ \ 5604 IPR008736 \ Human papillomaviruses (HPVs) are epitheliotropic viruses, and their life cycle is intimately linked to the stratification and differentiation state of the host epithelial tissues. The kinetics of E5a protein expression during the complete viral life cycle has been studied and the highest level was found to be coincidental with the onset of virion morphogenesis PUBMED:9721230.\ 7900 IPR012960 \

    This is an N-terminal domain of dyskerin-like proteins, which is often associated with the TruB N-terminal and PUA domains PUBMED:15112237.

    \ 2655 IPR006079 \

    Lantibiotics are heavily modified bacteriocin-like peptides from Gram-positive bacteria. They contain alpha,beta-unsaturated amino acids (dehydroalanine and dehydrobutyrine) and lanthionine or 3-methyllanthionine rings (collectively known as thioether rings). There are two types of lantibiotic:

    1. Type A lantibiotics (which include nisin, subtilin, epidermin, gallidermin and Pep5) are strongly cationic and bactericidal: nisin, subtilin and Pep5 inhibit the growth of Gram-positive bacteria, probably by voltage-dependent pore formation in the cytoplasmic membrane, resulting in cellular efflux of electrolytes, amino acids and ATP;
    2. Type B lantibiotics possess at most one positive charge and are not bactericidal.

    \ \

    This family contains both type A and type B molecules.

    \ 494 IPR002197 \

    The Factor for Inversion Stimulation (FIS) protein is a regulator of bacterial functions, and binds specifically to weakly related DNA sequences PUBMED:7536730, PUBMED:11123690. It activates ribosomal RNA transcription, and is involved in upstream activation of rRNA promoters. The protein has been shown to play a role in the regulation of virulence factors in both Salmonella typhimurium and Escherichia coli PUBMED:11532124. Its other functions include inhibition of the initiation of DNA replication from the OriC site, and promotion of Hin-mediated DNA inversion.

    \ \

    \ In its C-terminal region, FIS contains a helix-turn-helix (HTH) DNA-binding motif, which shares a high degree of similarity with other HTH motifs of more primitive bacterial transcriptional regulators, such as the nitrogen assimilation regulatory proteins (NtrC) from species like Azotobacter, Rhodobacter and Rhizobium. This has led to speculation that both evolved from a single common ancestor PUBMED:9738943.

    \ \

    \ The 3-dimensional structure of the E. coli FIS DNA-binding protein has been determined by X-ray diffraction to 2.0 Å resolution PUBMED:1619650, PUBMED:11183780. FIS is composed of four alpha-helices tightly intertwined to form a globular dimer with two protruding HTH motifs. The 24 N-terminal amino acids are poorly defined, indicating that they might act as 'feelers' suitable for DNA or protein (invertase) recognition PUBMED:1619650. Other proteins belonging to this subfamily include:

    \ \ \ 4753 IPR002028 \

    Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan PUBMED:2679363, PUBMED:1366510:\ \ \ It has two functional domains, each found on a separate subunit in bacteria and plants. In Escherichia coli, the two subunits, A and B, are encoded by the trpA and trpB genes, respectively. The alpha chain catalyses the aldol cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate, and the beta chain catalyses the synthesis of tryptophan from indole and serine. In fungi the two domains are fused together in a single multifunctional protein, in the order NH2-A-B-COOH PUBMED:2521855, PUBMED:2734310. The two domains of the Neurospora crassa polypeptide are linked by a 54-residue connector that has less than 25% identity to the 45-residue connector of the Saccharomyces cerevisiae polypeptide. Two acidic residues are believed to serve as proton donors/acceptors in the enzyme's catalytic mechanism.
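    The two half-reactions described above can be summed to give the net reaction: the indole released by the alpha chain is consumed by the beta chain and cancels out, and the beta-chain condensation of indole with serine releases one water molecule.

```latex
\begin{aligned}
\alpha &: \text{indoleglycerol phosphate} \longrightarrow \text{indole} + \text{glyceraldehyde 3-phosphate} \\
\beta &: \text{indole} + \text{L-serine} \longrightarrow \text{L-tryptophan} + \mathrm{H_2O} \\
\text{net} &: \text{indoleglycerol phosphate} + \text{L-serine} \longrightarrow \text{L-tryptophan} + \text{glyceraldehyde 3-phosphate} + \mathrm{H_2O}
\end{aligned}
```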

    \ \ \ 4342 IPR003726 \ S-methylmethionine:homocysteine methyltransferase from Escherichia coli accepts selenohomocysteine as a substrate. S-methylmethionine is an abundant plant product that can be utilized for methionine biosynthesis PUBMED:9882684. Human methionine synthase (5-methyltetrahydrofolate:L-homocysteine S-transmethylase) shares 53% and 63% identity with the E. coli and the presumptive Caenorhabditis elegans proteins, respectively, and contains all residues implicated in B12 binding to the E. coli protein PUBMED:9013615. Betaine--homocysteine S-methyltransferase converts betaine and homocysteine to dimethylglycine and methionine, respectively. This reaction is also required for the irreversible oxidation of choline PUBMED:8798461.\ 4851 IPR005345 \

    The proteins of this family are uncharacterised; they contain five CXXC motifs.
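    A CXXC motif (two cysteines separated by any two residues) is easy to locate with a regular expression. The sketch below is one way to count them, assuming plain one-letter protein sequences with no line breaks; the function name and example sequences are hypothetical. It uses a zero-width lookahead so that overlapping motifs sharing a cysteine (as in `CXXCXXC`) are each reported, rather than the second being skipped.

```python
import re

# Zero-width lookahead: every start position of C-X-X-C is reported,
# including motifs that overlap by sharing a cysteine.
CXXC = re.compile(r"(?=C..C)")


def cxxc_positions(sequence: str) -> list[int]:
    """Return 0-based start positions of every CXXC motif (X = any residue)."""
    return [m.start() for m in CXXC.finditer(sequence.upper())]


print(cxxc_positions("CAACAAC"))            # [0, 3]
print(len(cxxc_positions("CAACG" * 5)))     # 5
```

    A non-overlapping pattern such as `C..C` on its own would report only one motif for `CAACAAC`, which would undercount tightly spaced motifs.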

    \ 4536 IPR000368 \ Sucrose synthases catalyse the synthesis of sucrose in the following reaction:\ \ This family includes the bulk of the sucrose synthase protein. However, the carboxyl-terminal region of the sucrose synthases belongs to the glycosyl transferase family. This enzyme is found mainly in plants but also appears in bacteria.\ 4241 IPR004977 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The S25 ribosomal protein is a component of the 40S ribosomal subunit.

    \ 3392 IPR003872 \ This is a family of spirochete major outer sheath protein C-terminal regions. These proteins are present on the bacterial cell surface. In Treponema denticola the major outer sheath protein (Msp) binds immobilized laminin and fibronectin, supporting the hypothesis that Msp mediates the extracellular matrix binding activity of T. denticola PUBMED:9023187.\ 2 IPR001078 \ This domain is found in the lipoamide acyltransferase component of the branched-chain alpha-keto acid dehydrogenase complex, which catalyses the overall conversion of alpha-keto acids to acyl-CoA and carbon dioxide PUBMED:8487300. The complex contains multiple copies of three enzymatic components: branched-chain alpha-keto acid decarboxylase (E1), lipoamide acyltransferase (E2) and lipoamide dehydrogenase (E3). The domain is also found in the dihydrolipoamide succinyltransferase component of the 2-oxoglutarate dehydrogenase complex. These proteins contain one to three copies of a lipoyl-binding domain followed by the catalytic domain.\ 2701 IPR011584 \

    The green fluorescent protein was originally found in the jellyfish Aequorea victoria, and functions as an energy-transfer acceptor. It fluoresces in vivo upon receiving energy from the Ca2+-activated photoprotein aequorin. The protein absorbs light maximally at 395 nm and exhibits a smaller absorbance peak at 470 nm. The fluorescence emission spectrum peaks at 509 nm with a shoulder at 540 nm. The protein is produced in the photocytes and contains a chromophore composed of modified amino acid residues. The chromophore is formed upon cyclization of the residues Ser-dehydroTyr-Gly.

    \ \ 5269 IPR008690 \ The N5-methyltetrahydromethanopterin:coenzyme M methyltransferase of Methanosarcina mazei Go1 is a membrane-associated, corrinoid-containing protein that uses a transmethylation reaction to drive an energy-conserving sodium ion pump PUBMED:9559648.\ 861 IPR003380 \ The c-ski proto-oncogene has been shown to influence proliferation, morphological transformation and myogenic differentiation PUBMED:7999783. It may play a role in terminal differentiation of skeletal muscle cells but not in the determination of cells to the myogenic lineage. Sno, a Ski proto-oncogene homologue, is expressed in two isoforms and plays a role in the response to proliferation stimuli.\ 7903 IPR000876 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes yeast S7 (YS6), archaeal S4e, and mammalian and plant cytoplasmic S4 PUBMED:2124517. Two highly similar isoforms of mammalian S4 exist, one encoded by a gene on chromosome Y and the other by a gene on chromosome X. These proteins have 233 to 264 amino acids.

    \ 7252 IPR009989 \

    This family contains the bacterial protein TrbM (approximately 180 residues long). In Comamonas testosteroni T-2, TrbM is derived from the IncP1beta plasmid pTSA, which encodes the widespread genes for p-toluenesulfonate (TSA) degradation PUBMED:11282598.

    \ 4451 IPR002625 \ This family includes the Smr (Small MutS Related) proteins and the C-terminal region of the MutS2 protein. It has been suggested that this domain interacts with the MutS1 protein in the case of Smr proteins, and with the N-terminal MutS-related region of MutS2 PUBMED:10431172.\ 7380 IPR011428 \

    This domain is found as a tandem repeat in Bacillus coat protein X and as a single domain in coat protein V. The proteins are found in the insoluble fraction PUBMED:8509331.

    \ 5852 IPR009261 \

    This family consists of several baculovirus proteins of unknown function.

    \ 203 IPR003156 \

    This domain is often found adjacent to the DHH domain of the RecJ-like phosphoesterase family, and is called DHHA1 for DHH-associated domain. DHHA1 is diagnostic of DHH subfamily 1 members PUBMED:9478130. This domain is also found in alanyl-tRNA synthetase, suggesting that it may have an RNA-binding function. The domain is about 60 residues long and contains a conserved GG motif.

    \ 3270 IPR007244 \

    NatC N(alpha)-terminal acetyltransferases contain Mak10p, Mak31p and Mak3p subunits. All three subunits are associated with each other to form the active complex PUBMED:11274203.

    \ 5073 IPR007910 \

    This family consists of several Borrelia burgdorferi proteins of unknown function.

    \ 3998 IPR004137 \ Members of this family, also known as hybrid-cluster proteins, contain two Fe/S centers: a [4Fe-4S] cubane cluster and a hybrid [4Fe-2S-2O] cluster. The physiological role of these proteins is not yet known, although a role in nitrate/nitrite respiration has been suggested PUBMED:10651802.\ 7498 IPR011653 \ This group of paralogous proteins identified in Mycoplasma penetrans includes homologues of p35 PUBMED:12466555.\ 7421 IPR011519 \

    This conserved sequence is found associated with in several paralogous proteins in Rhodopirellula baltica. It is also found associated with in several eukaryotic integrin-like proteins (e.g. human ASPIC) and in several other bacterial proteins PUBMED:12536216.

    \ 5010 IPR000058 \ This domain was first identified as a zinc finger at the C-terminus of An1, a ubiquitin-like protein in Xenopus laevis PUBMED:8390387.\ The following pattern describes the zinc finger:\
    \
    C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C\
    
    \ where X can be any amino acid, and the number following each X, or the range in parentheses, gives how many such residues occur at that position. The domain has since been identified in a number of as yet uncharacterised proteins from various sources.\ 7065 IPR009874 \

    This family consists of several hypothetical bacterial sequences and one archaeal sequence, all of around 120 residues in length. The function of this family is unknown.

    \ 4399 IPR002579 \

    Peptide methionine sulphoxide reductase (Msr) reverses the inactivation of many proteins due to the oxidation of critical methionine residues by reducing methionine sulphoxide, Met(O), to methionine PUBMED:10841552. It is present in most living organisms, and the cognate structural gene belongs to the so-called minimum gene set PUBMED:8994848, PUBMED:8816789.

    \ \

    The two domains, MsrA and MsrB, reduce different epimeric forms of methionine sulphoxide. This group represents MsrB, the crystal structure of which has been determined to 1.8 Å resolution PUBMED:11938352. The overall structure shows no resemblance to the structures of MsrA from other organisms, although the active sites show approximate mirror symmetry. In each case, conserved amino acid motifs mediate the stereospecific recognition and reduction of the substrate. Unlike the MsrA domain, the MsrB domain activates the cysteine or selenocysteine nucleophile through a unique Cys-Arg-Asp/Glu catalytic triad. The collapse of the reaction intermediate most likely results in the formation of a sulfenic or selenenic acid moiety. Regeneration of the active site occurs through a series of thiol-disulfide exchange steps involving another active-site Cys residue and thioredoxin.

    \ \

    In a number of pathogenic bacteria, including Neisseria gonorrhoeae, the MsrA and MsrB domains are fused, with MsrA N-terminal to MsrB. This arrangement is reversed in Treponema pallidum. In Neisseria gonorrhoeae and Neisseria meningitidis a thioredoxin domain is fused to the N-terminus; it may function to reduce the active sites of the downstream MsrA and MsrB domains.

    \ \ 1464 IPR003958 \

    The CCAAT-binding factor (CBF) is a mammalian transcription factor that binds to a CCAAT motif in the promoters of a wide variety of genes, including type I collagen and albumin. The factor is a heteromeric complex of A and B subunits, both of which are required for DNA binding PUBMED:2266139, PUBMED:1549471. The subunits can interact in the absence of DNA binding, and conserved regions in each subunit are important in mediating this interaction.

    The A subunit can be split into three domains on the basis of sequence similarity: a non-conserved N-terminal 'A domain'; a highly conserved central 'B domain' involved in DNA binding; and a C-terminal 'C domain', which contains a number of glutamine and acidic residues involved in protein-protein interactions PUBMED:1549471. The A subunit shows striking similarity to the HAP3 subunit of the yeast CCAAT-binding heterotrimeric transcription factor PUBMED:1549471, PUBMED:7845362. The Kluyveromyces lactis HAP3 protein has been predicted to contain a 4-cysteine zinc finger, which is thought to be present in similar HAP3 and CBF subunit A proteins, in which the third cysteine is replaced by a serine PUBMED:7845362. This domain is found in CCAAT-binding transcription factors and in archaeal histones.

    \ 5503 IPR008535 \ This family consists of several bacterial proteins of unknown function.\ 1735 IPR004177 \ The DDHD domain is 180 residues long and contains four conserved residues that may form a metal-binding site; the domain is named after these four residues. This pattern of conserved metal-binding residues is often seen in phosphoesterase domains. The domain is found in retinal degeneration B proteins, as well as in a family of probable phospholipases.\ 7209 IPR010863 \

    This family consists of several hypothetical archaeal proteins of around 110 residues in length. The function of this family is unknown, although one sequence is described as a putative HTH transcription regulator.

    \ 804 IPR000684 \ RNA polymerase II PUBMED:1883205, PUBMED:1700503 is one of the three forms of RNA polymerase that exist in eukaryotic nuclei. The C-terminal region of the largest subunit of this oligomeric enzyme consists of tandem repeats of a conserved heptapeptide PUBMED:2251729. The number of repeats varies according to the species (for example, there are 17 in Plasmodium, 26 in yeast, 44 in Drosophila and 52 in mammals). The region containing these repeats is essential for the function of polymerase II. This repeated heptapeptide (called CT7n or CTD) is rich in hydroxyl groups. It probably projects out of the globular catalytic domain and may interact with the acidic activator domains of transcriptional regulatory proteins. It is also known to bind DNA by intercalation. RNA polymerase II is activated by phosphorylation, and the serine and threonine residues in the CT7n repeats are the targets of this phosphorylation.\ 3816 IPR005068 \

    This repeat is found in phage tail fibers, for example in protein K PUBMED:7676622, but bacterial homologues have also been identified. The repeats are about 40 residues long.

    \ 6151 IPR009396 \

    This family consists of several eukaryotic pigment-dispersing hormone (PDH) proteins. PDH is produced in the eyestalks of Crustacea, where it induces light-adapting movements of pigment in the compound eye and regulates pigment dispersion in the chromatophores PUBMED:8477858.

    \ 3748 IPR001384 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
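
As an illustration, the stricter abXHEbbHbc definition can be turned into a regular expression for scanning a protein sequence. This is a minimal sketch under stated assumptions, not a curated signature: the residue sets chosen for 'a', 'b' and 'c' are assumptions (the text says only that 'a' is most often Val or Thr, 'b' is uncharged and 'c' is hydrophobic), proline is excluded from every variable position because the text notes it never occurs in this site, and the function name find_hexxh_motifs is hypothetical.

```python
import re

# Assumed residue classes (one-letter codes); the text defines them only loosely.
A_POS = "VT"                      # 'a': most often Val or Thr (other residues do occur)
UNCHARGED = "ACFGHILMNQSTVWY"     # 'b': assumed = not D, E, K, R; P also excluded
HYDROPHOBIC = "AVLIMFW"           # 'c': an assumed hydrophobic set
NON_PRO = "ACDEFGHIKLMNQRSTVWY"   # 'X': any residue except proline

# abXHEbbHbc written position by position; wrapping it in a lookahead lets
# overlapping occurrences be reported.
_CORE = (f"[{A_POS}][{UNCHARGED}][{NON_PRO}]HE"
         f"[{UNCHARGED}][{UNCHARGED}]H[{UNCHARGED}][{HYDROPHOBIC}]")
_MOTIF = re.compile(f"(?=({_CORE}))")


def find_hexxh_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched 10-mer) for each abXHEbbHbc hit."""
    return [(m.start() + 1, m.group(1)) for m in _MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_hexxh_motifs("MKKVAGHEALHALGG"))  # [(4, 'VAGHEALHAL')]
    print(find_hexxh_motifs("VAPHEALHAL"))       # [] (proline at the X position)
```

Because 'a' is restricted to Val/Thr here, the pattern will miss genuine sites with other residues at that position; widening A_POS to NON_PRO gives a looser, HEXXH-centred search.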

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of a catalytic amino acid residue as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
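
The naming scheme just described amounts to a lookup on the first character of an identifier. The sketch below assumes identifiers are written in the style of the family and clan names used in this entry (for example M35 and MA); the helper names catalytic_type and nucleophile are hypothetical and are not part of any MEROPS tool.

```python
# Catalytic-type codes as listed in the text. "P" is used only for clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Per the text: these types use an amino acid residue as the nucleophile
# (forming an acyl intermediate) ...
RESIDUE_NUCLEOPHILE = {"serine", "threonine", "cysteine"}
# ... while these use an activated water molecule.
WATER_NUCLEOPHILE = {"aspartic", "glutamic acid", "metallo"}


def catalytic_type(identifier: str) -> str:
    """Map a family (e.g. 'M35') or clan (e.g. 'MA') identifier to its catalytic type."""
    code = identifier.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the identifier's catalytic type."""
    kind = catalytic_type(identifier)
    if kind in RESIDUE_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if kind in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not determined"


if __name__ == "__main__":
    print(catalytic_type("M35"), "/", nucleophile("M35"))
    # metallo / activated water molecule
```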

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M35 (deuterolysin family, clan MA(M)). The protein fold of the peptidase domain of members of this family resembles that of thermolysin, the type example for clan MA.

    \ \

    Deuterolysin is a microbial zinc-containing metalloprotease that shows some similarity to thermolysin PUBMED:1886621. The protein is expressed with a possible 19-residue signal sequence, a 155-residue propeptide, and an active peptide of 177 residues PUBMED:8049277. The active peptide contains an HEXXH motif towards the C-terminus, but the other zinc ligands are as yet undetermined PUBMED:1886621, PUBMED:8049277.
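
Laid end to end, the three segments give a precursor of 19 + 155 + 177 = 351 residues. The sketch below converts segment lengths into 1-based, inclusive boundaries; it assumes the segments are contiguous and in the order listed (signal sequence, propeptide, active peptide), and the helper name segment_boundaries is hypothetical.

```python
def segment_boundaries(segments: list[tuple[str, int]]) -> list[tuple[str, int, int]]:
    """Turn (name, length) pairs into (name, start, end), 1-based and inclusive,
    assuming the segments are contiguous and listed N- to C-terminal."""
    boundaries = []
    start = 1
    for name, length in segments:
        end = start + length - 1
        boundaries.append((name, start, end))
        start = end + 1
    return boundaries


# Segment lengths quoted for the deuterolysin precursor (the signal sequence is "possible").
DEUTEROLYSIN = [("signal sequence", 19), ("propeptide", 155), ("active peptide", 177)]

if __name__ == "__main__":
    for name, start, end in segment_boundaries(DEUTEROLYSIN):
        print(f"{name}: {start}-{end}")
    # signal sequence: 1-19
    # propeptide: 20-174
    # active peptide: 175-351
```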

    \ 6207 IPR009426 \

    This family consists of several Barley yellow dwarf virus proteins of unknown function.

    \ 374 IPR011999 \

    Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include yellow fever, West Nile, tick-borne encephalitis (TBE), Japanese encephalitis (JE) and Dengue type 2 viruses PUBMED:15378043. Flaviviruses have three structural proteins: the core nucleocapsid protein C, and the envelope glycoproteins M and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from the class I fusion proteins of influenza virus and HIV. Glycoprotein E is composed of three domains: domain I (central domain) is an 8-stranded beta barrel, domain II (dimerisation domain) is an elongated domain composed of twelve beta strands and two alpha helices, and domain III (immunoglobulin-like domain) is an IgC-like module with ten beta strands. This entry represents domains I and II, which are intertwined PUBMED:7753193.

    \

    The glycoprotein E dimers on the viral surface re-cluster irreversibly into fusion-competent trimers upon exposure to low pH, as found in the acidic environment of the endosome. The formation of trimers results in a conformational change in the hinge region of domain II, a key structural element that opens a ligand-binding hydrophobic pocket at the interface between domains I and II. The conformational change exposes a fusion peptide loop at the tip of domain II, which inserts into the target membrane during the fusion step and is required to drive the cellular and viral membranes together PUBMED:12759475.

    \ \ 2378 IPR007286 \

    EAP30 is a subunit of the ELL complex. ELL is an 80-kDa RNA polymerase II transcription factor that interacts with three other proteins to form the ELL complex. The ELL complex can increase the catalytic rate of transcription elongation but, unlike ELL alone, is unable to repress initiation of transcription by RNA polymerase II. EAP30 is thought to lead to the derepression of ELL's transcriptional inhibitory activity.

    \ 4848 IPR005343 \

    This is a small family of mainly hypothetical proteins of unknown function.

    \ 4813 IPR000631 \

    Several uncharacterised proteins have been shown to share regions of similarity, including yeast chromosome XI hypothetical protein YKL151c; Caenorhabditis elegans hypothetical protein R107.2; Escherichia coli hypothetical protein yjeF; Bacillus subtilis hypothetical protein yxkO; Helicobacter pylori hypothetical protein HP1363; Mycobacterium tuberculosis hypothetical protein MtCY77.05c; Mycobacterium leprae hypothetical protein B229_C2_201; Synechocystis strain PCC 6803 hypothetical protein sll1433; and Methanococcus jannaschii hypothetical protein MJ1586. These are proteins of about 30 to 40 kDa whose central region is well conserved.

    \ \ 7852 IPR013113 \

    Proteins in this entry are siderophore-interacting FAD-binding proteins.

    \

    This entry includes the vibriobactin utilization protein ViuB, which is involved in the removal of iron from iron-vibriobactin complexes, as well as several hypothetical proteins.

    \ 3822 IPR006480 \

    This group of sequences describes one of the many mutually dissimilar families of holins, phage proteins that act together with lytic enzymes in bacterial lysis. Besides phage holins, this family includes the protein TcdE/UtxA, which is involved in toxin secretion in Clostridium difficile and related species PUBMED:11444771.

    \ 328 IPR006314 \

    A defined member of this superfamily is Dyp, a dye-decolorizing peroxidase that lacks a typical heme-binding region PUBMED:10742277. A distinct, uncharacterized branch of this superfamily has a typical twin-arginine dependent signal sequence characteristic of exported proteins with bound redox cofactors.

    \ 1989 IPR005180 \

    This domain is found in an uncharacterised set of proteins. It normally occurs once within a sequence, but is also found as a tandem repeat. It has an interesting phylogenetic distribution, with the majority of examples in bacteria and archaea, but it is also found in Drosophila melanogaster.

    \ 2086 IPR007351 \ This is a family of uncharacterised proteins.\ 439 IPR001675 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates and related proteins into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 29 comprises enzymes with a number of known activities: sialyltransferase, beta-galactosamide alpha-2,6-sialyltransferase, alpha-N-acetylgalactosaminide alpha-2,6-sialyltransferase, beta-galactoside alpha-2,3-sialyltransferase, N-acetyllactosaminide alpha-2,3-sialyltransferase, alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase and lactosylceramide alpha-2,3-sialyltransferase. These enzymes use a nucleotide monophosphosugar (CMP-NeuAc) as the donor instead of a nucleotide diphosphosugar.

    \ \

    Sialyltransferase may be responsible for the synthesis of the sequence NeuAc-alpha-2,3-Gal-beta-1,3-GalNAc-, found on sugar chains O-linked to Thr or Ser and also as a terminal sequence on certain gangliosides. These enzymes catalyse sialyltransfer reactions during glycosylation, and are type II membrane proteins.

    \ 6101 IPR010433 \

    This family consists of several plant-specific eukaryotic initiation factor 4B proteins.

    \ 6707 IPR009673 \

    This family contains hypothetical proteins of unknown function that are approximately 200 residues long. They seem to be specific to Caenorhabditis elegans.

    \ 4727 IPR006037 \

    This domain is often found next to the TrkA-N domain. The exact function of this domain is unknown. It has been suggested that it may bind an unidentified ligand. The domain is predicted to adopt an all beta structure PUBMED:11292341.

    \ 5085 IPR007922 \

    This family contains several actinomycete proteins of unknown function.

    \ 7499 IPR011639 \ Homologues of the Escherichia coli Eco57I restriction endonuclease are found in several phylogenetically diverse bacteria.\ 4583 IPR005334 \

    Tctex-1 is a dynein light chain. Dynein translocates rhodopsin-bearing vesicles along microtubules, and Tctex-1 has been shown to bind to the cytoplasmic tail of rhodopsin. An efficient vectorial transport system is required to deliver large numbers of newly synthesized rhodopsin molecules (~10^7 molecules per day per photoreceptor) to the base of the outer segment of the photoreceptor, and Tctex-1 may well play a role in this process. C-terminal rhodopsin mutations responsible for retinitis pigmentosa inhibit the interaction between Tctex-1 and rhodopsin, which may be the molecular basis of retinitis pigmentosa.
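
To put the transport demand in perspective, a delivery rate of about ten million (10^7) rhodopsin molecules per day per photoreceptor works out, as a back-of-the-envelope estimate that assumes a constant rate over 24 hours, to roughly a hundred molecules every second:

$$
\frac{10^{7}\ \text{molecules day}^{-1}}{24 \times 3600\ \text{s day}^{-1}} = \frac{10^{7}}{86\,400}\ \text{molecules s}^{-1} \approx 116\ \text{molecules s}^{-1}
$$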

    In the mouse, the chromosomal location and pattern of expression of Tctex-1 make it a candidate for involvement in male sterility PUBMED:2570638.

    \ 5560 IPR008813 \ This family consists of Firmicute RepL proteins which are involved in plasmid replication.\ 4973 IPR003777 \

    This entry describes proteins of unknown function.

    \ 3599 IPR000183 \ These enzymes are collectively known as group IV decarboxylases PUBMED:8181483. Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into two different families on the basis of sequence similarity PUBMED:3143046, PUBMED:8181483. Members of this family, while most probably evolutionarily related, do not share extensive regions of sequence similarity. The proteins contain a conserved lysine residue which, in mouse ODC PUBMED:1730582, is known to be the site of attachment of the pyridoxal-phosphate group. The proteins also contain a stretch of three consecutive glycine residues, which has been proposed to be part of a substrate-binding region PUBMED:2198270.\ 5770 IPR010268 \

    This family consists of several archaeal PaREP1 proteins. The function of the family is unknown.

    \ 3213 IPR005014 \

    Actinobacillus pleuropneumoniae is the etiological agent of swine pleuropneumonia. The gene encoding an outer membrane lipoprotein A (OmlA) of Actinobacillus pleuropneumoniae has been cloned from several serotypes and is thought to exist as allelic variants PUBMED:9809431.

    \ 2627 IPR006211 \ The furin-like cysteine-rich region has been found in a variety of eukaryotic proteins involved in signal transduction by receptor tyrosine kinases, a mechanism that involves receptor aggregation PUBMED:1936959.\ 1346 IPR006997 \

    This is a family of Baculovirus proteins of unknown function.

    \ 4254 IPR000529 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial, red algal chloroplast and cyanelle S6 ribosomal proteins.

    \ 1577 IPR004271 \

    This family represents the matrix protein, M1, of influenza C virus. The M1 protein is the product of a spliced mRNA. Small quantities of the unspliced mRNA, which additionally encodes the M2 protein, are also found in the cell.

    \ 5078 IPR007915 \

    This family of proteins is functionally uncharacterised.

    \ 1343 IPR007601 \ Polyhedra are large crystalline occlusion bodies containing nucleopolyhedrovirus virions, surrounded by an electron-dense structure called the polyhedron envelope or polyhedron calyx. The polyhedron envelope (associated) protein PEP is thought to be an integral part of the polyhedron envelope. PEP is concentrated at the surface of polyhedra and is thought to be important for the proper formation of their periphery. It is thought that PEP may stabilise polyhedra and protect them from fusion or aggregation PUBMED:8176372.\ 2522 IPR003303 \ Filaggrins are filament-associated proteins that interact with keratin intermediate filaments of terminally differentiating mammalian epidermis via disulphide bond formation PUBMED:2740331. They show wide species variation, and their aberrant expression has been implicated in a number of keratinising disorders. The proteins are synthesised as large, insoluble, highly phosphorylated precursors containing multiple tandem repeats of 324 amino acids, which are not separated by a large linker. The precursor is deposited as keratohyalin granules. During terminal differentiation, it is dephosphorylated and proteolytically cleaved.\ 2767 IPR005196 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    The family of glycosyl hydrolases containing this domain includes vacuolar acid trehalase and maltose phosphorylase. Maltose phosphorylase (MP) is a dimeric enzyme that catalyzes the conversion of maltose and inorganic phosphate into beta-D-glucose-1-phosphate and glucose. This domain is believed to be essential for catalytic activity PUBMED:11587643, although its precise function remains unknown.

    \ 7702 IPR012453 \

    This family of small proteins seems to be found in several places in the Coxiella genome.

    \ 6125 IPR010442 \

    This domain is suggested to be involved in protein-protein interactions PUBMED:10485852. The domain is found in conjunction with .

    \ 3342 IPR007328 \ Mytilus foot protein-3 (Mfp-3) is a highly polymorphic protein family located in the byssal adhesive plaques of blue mussels.\ 4564 IPR007707 \ This family contains the proteins TACC1, TACC2 and TACC3, which are concentrated in the centrosomes of eukaryotes and may play a conserved role in organising centrosomal microtubules. The human TACC proteins have been linked to cancer, and TACC2 has been identified as a possible tumour suppressor (AZU-1) PUBMED:11121038.\ 1561 IPR004680 \ This family includes two characterized citrate/proton symporters from Bacillus subtilis. CitM transports citrate complexed to Mg2+, while CitH apparently transports citrate without Mg2+. The family also includes uncharacterized transporters, including a third paralog in Bacillus subtilis.\ 6486 IPR009548 \

    This family consists of several hypothetical eukaryotic proteins of unknown function.

    \ 2134 IPR007425 \

    This sequence is usually found in association with and , and occasionally also with in integral membrane proteins. Together, this entry, and make up the C-terminal portion of Staphylococcus aureus FmtC/MprF, which is involved in resistance to defensins by the lysinylation of membrane phospholipids PUBMED:11342591. This domain, along with and , also occurs adjacent to the OB-fold nucleic acid binding domain and tRNA synthetase class II in lysyl-tRNA synthetases.

    \ 7061 IPR010824 \

    This family consists of several hypothetical bacterial proteins of around 125 residues in length. Several members of this family are described as putative lipoproteins and are often known as YcfL. The function of this family is unknown.

    \ 2174 IPR007470 \

    The majority of proteins in this family are annotated as uroporphyrin-III C-methyltransferase PUBMED:3062586; however, there is no direct evidence to support this annotation for these proteins, which come from mainly pathogenic Gram-negative organisms. There is some evidence to suggest that the proteins are membrane anchored, as they have a predicted N-terminal signal peptide and transmembrane domain, and they may be involved in haem transport PUBMED:9006036.

    \ 1659 IPR003691 \

    Three genes, crcA, cspE and crcB, when present in high copy number, confer camphor resistance on a cell and suppress mutations in the chromosomal partition gene mukB in Escherichia coli. The cspE gene has previously been identified as encoding a cold shock-like protein with homologues in all organisms tested PUBMED:8844142.

    \ \

    Camphor and mukB mutations may interfere with chromosome condensation, and high-copy crcA, cspE and crcB have been implicated in promoting or protecting chromosome folding PUBMED:8844142.

    \ 6932 IPR009796 \

    This family consists of several hypothetical Streptococcus thermophilus bacteriophage proteins of around 130 residues in length. One of the sequences in this family, from phage Sfi11 (Swiss-Prot:O80186), is known as Gp149. The function of this family is unknown.

    \ 5777 IPR010273 \

    This family consists of a series of hypothetical bacterial proteins. One of the family members from Bacillus subtilis is thought to be involved in cell division and sporulation PUBMED:2556375.

    \ 3020 IPR006894 \ This domain represents a C-terminal conserved region found in bacterial proteins necessary for hydrogenase synthesis. Their precise function is unknown PUBMED:8045431.\ 906 IPR001647 \ Numerous bacterial transcription regulatory proteins bind DNA via a helix-turn-helix (HTH) motif. These proteins are very diverse, but for convenience may be grouped into subfamilies on the basis of sequence similarity. One such family groups together a range of proteins, including acrR, betI, bm3R1, envR, qacR, mtrR, tcmR, tetC, tetR, ttk, ybiH and yhgD PUBMED:7826010, PUBMED:8196548, PUBMED:8428974. Many of these proteins function as repressors that control the level of susceptibility to hydrophobic antibiotics and detergents. They all have similar molecular weights, from 21 to 25 kDa. The helix-turn-helix motif is located in the first third of the protein. The 3D structure of the homodimeric TetR protein complexed with 7-chloro-tetracycline-magnesium has been determined to 2.1 Å resolution PUBMED:7707374. TetR folds into 10 alpha-helices with connecting turns and loops. The three N-terminal alpha-helices of the repressor form the DNA-binding domain: this structural motif encompasses an HTH fold with an inverse orientation compared with that of other DNA-binding proteins.\ 2090 IPR007355 \ This is an archaeal protein of unknown function.\ 1539 IPR001344 \

    The light-harvesting complex (LHC) consists of chlorophylls A and B and the chlorophyll A-B binding protein. LHC functions as a light receptor that captures and delivers excitation energy to photosystems I and II with which it is closely associated. Under changing light conditions, the reversible phosphorylation of light harvesting chlorophyll a/b binding proteins (LHCII) represents a system for balancing the excitation energy between the two photosystems PUBMED:15225658.

    \

    The N-terminus of the chlorophyll A-B binding protein extends into the stroma, where it is involved in the adhesion of granal membranes and is photoregulated by reversible phosphorylation of its threonine residues PUBMED:10682866. Both these processes are believed to mediate the distribution of excitation energy between photosystems I and II.

    \

    This family also includes the photosystem II protein PsbS, which plays a role in energy-dependent quenching that increases thermal dissipation of excess absorbed light energy in the photosystem PUBMED:15033974.

    \ \ 5227 IPR008608 \ This family contains several mammalian ecotropic viral integration site 2A (EVI2A) proteins. The function of this protein is unknown, although it is thought to be a membrane protein and may function as an oncogene in retrovirus-induced myeloid tumours PUBMED:2167436, PUBMED:2117566.\ 3283 IPR002771 \

    Members of this family are integral membrane proteins and include the antibiotic resistance protein MarC. These proteins may be transporters.

    \ 118 IPR002524 \

    Members of this family are integral membrane proteins that have been found to increase tolerance to divalent metal ions such as cadmium, zinc and cobalt. These proteins are considered to be efflux pumps that remove these ions from cells PUBMED:9696746, PUBMED:8829543; however, others are implicated in ion uptake PUBMED:1508175. The family has six predicted transmembrane domains. Members of the family are variable in length because of variably sized inserts, often containing low-complexity sequence.

    \ 6625 IPR009630 \

    This family consists of several hypothetical proteins of around 415 residues in length which seem to be specific to the bacterium Leptospira interrogans.

    \ 6805 IPR009722 \

    This entry represents a conserved region approximately 100 residues long within a number of hypothetical bacterial proteins that may be regulated by SdiA, a member of the LuxR family of transcriptional regulators PUBMED:9495757. Some proteins contain the repeat.

    \ 4097 IPR000238 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosome-binding factor A PUBMED:9422595 (gene rbfA) is a bacterial protein that associates with free 30S ribosomal subunits. It does not associate with 30S subunits that are part of 70S ribosomes or polysomes. It is essential for efficient processing of 16S rRNA. Ribosome-binding factor A is a protein of 13 to 15 kDa that is found in most prokaryotic organisms. A putative chloroplastic form seems to exist in plants.

    \ 6889 IPR010760 \

    This is a group of proteins of unknown function.

    \ 152 IPR008160 \ Members of this family belong to the collagen superfamily PUBMED:8240831. Collagens are generally extracellular structural proteins involved in the formation of connective tissue structure. The sequence consists predominantly of G-X-Y repeats, and the polypeptide chains form a triple helix. The first position of the repeat is glycine; the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause of scurvy.

    Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple helical structure.
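The G-X-Y rule above can be checked mechanically on a sequence string. Below is a minimal sketch, assuming a one-letter amino acid string that already starts at the first glycine of the triple-helical region. The helper names `count_gxy_triplets` and `proline_fraction_xy` are hypothetical, and the example sequence is made up. Hydroxyproline is formed after translation, so it appears as proline (P) in a sequence.

```python
def count_gxy_triplets(seq: str) -> int:
    """Count consecutive in-frame Gly-X-Y triplets from the start of `seq`.

    Hypothetical helper: assumes `seq` begins at the first glycine of a
    triple-helical region; only complete triplets are counted.
    """
    count = 0
    # Step through the sequence three residues at a time.
    for i in range(0, len(seq) - 2, 3):
        if seq[i] != "G":  # the first position of every repeat must be glycine
            break
        count += 1
    return count


def proline_fraction_xy(seq: str, n_triplets: int) -> float:
    """Fraction of X and Y positions occupied by proline in the first n triplets."""
    if n_triplets == 0:
        return 0.0
    xy = [seq[3 * k + j] for k in range(n_triplets) for j in (1, 2)]
    return xy.count("P") / len(xy)


example = "GPPGPAGAP"  # made-up sequence, not a real collagen
n = count_gxy_triplets(example)
print(n, round(proline_fraction_xy(example, n), 2))
```

For the made-up example this prints `3 0.67`, meaning three intact G-X-Y repeats with proline at four of the six X/Y positions.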

    \ 2376 IPR000148 \ This family includes the E7 oncoprotein from various papillomaviruses PUBMED:11422538. E7, along with E5 and E6, appears to be especially important for viral oncogenesis. E5 is located at the cell surface and reduces gap junction communication between cells. In cervical cancer, E5 is expressed in the earlier stages of neoplastic transformation of the cervical epithelium during viral infection. The role of E7 is less well understood, but it has been shown to impede growth arrest signals in both NIH 3T3 cells and HFKs, and this correlates with elevated cdc25A gene expression. This deregulation of cdc25A is linked to disruption of cell cycle arrest PUBMED:11752153.\ \ 7804 IPR012947 \

    The catalytically active form of threonyl/alanyl tRNA synthetase is a dimer. Within the tRNA synthetase class II dimer, the bound tRNA interacts with both monomers, making specific interactions with the catalytic domain, the C-terminal domain, and this SAD domain (the second additional domain). The second additional domain consists of a pair of perpendicularly orientated antiparallel beta sheets, of four and three strands respectively, that surround a central alpha helix forming the core of the domain PUBMED:10319817.

    \ 8004 IPR012955 \

    This domain is the C-terminal region of the CASP family of proteins. These are Golgi membrane proteins which are thought to have a role in vesicle transport PUBMED:12429822.

    \ 7123 IPR009912 \

    This family consists of several hypothetical bacterial proteins of around 160 residues in length. Members of this family contain four highly conserved cysteine residues toward the C-terminal region of the protein. The function of this family is unknown.

    \ 870 IPR007259 \

    Members of this family are spindle pole body (SPB) components such as Spc97, Spc98 and gamma-tubulin. The SPB functions as the microtubule-organising centre in yeast, with the microtubule cytoskeleton playing an essential role in chromosome segregation, cellular organisation and vesicle trafficking in eukaryotic cells. In most cells, the centrosome is the primary microtubule-organising centre that nucleates and organises microtubules. Gamma-tubulin localises to centrosomes and is required for microtubule nucleation. In Saccharomyces cerevisiae, gamma-tubulin forms a stable complex with Spc97 and Spc98 PUBMED:11950928.

    \ 4726 IPR005657 \

    This family contains saliva proteins from haematophagous insects that counteract vertebrate host haemostasis events such as coagulation, vasoconstriction and platelet aggregation PUBMED:12421416. These include:

    \ \

    All members of this family belong to MEROPS proteinase inhibitor family I59, clan IZ.

    \ \ \ 657 IPR003014 \

    It has been shown that the N-terminal N domains of members of the plasminogen/hepatocyte growth factor family, the apple domains of the plasma prekallikrein/coagulation factor XI family, and domains of various nematode proteins belong to the same module superfamily, the PAN module PUBMED:10561497. PAN contains a conserved core of three disulphide bridges. In some members of the family there is an additional fourth disulphide bridge that links the N and C termini of the domain. The domain is found in diverse proteins: in some it mediates protein-protein interactions, in others protein-carbohydrate interactions.

    \ 1504 IPR004461 \ The carbon monoxide dehydrogenase alpha subunit catalyses the interconversion of CO and CO2 and the synthesis of acetyl-CoA from the methylated corrinoid/iron-sulphur protein, CO and CoA. Nomenclature follows the description for Methanosarcina thermophila. The complex is also found in Archaeoglobus fulgidus, which is not considered a methanogen, but it is otherwise generally associated with methanogenesis.\ 3715 IPR000588 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
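The one-letter catalytic-type convention described above amounts to a lookup on the first character of a family or clan identifier. Below is a minimal sketch that encodes only the letters listed in the paragraph. The names `CATALYTIC_TYPES` and `catalytic_type` are hypothetical, not part of any MEROPS software. Inhibitor families such as I59 use a separate scheme and are rejected here.

```python
# Catalytic-type letters as listed in the text; "P" marks a clan whose
# families span more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a
    peptidase family or clan identifier such as "A3" or "AA"."""
    if not identifier:
        raise ValueError("empty identifier")
    try:
        return CATALYTIC_TYPES[identifier[0].upper()]
    except KeyError:
        raise ValueError(f"no peptidase catalytic type for {identifier!r}") from None


for ident in ("A3", "AA"):  # family A3 and clan AA, both cited below
    print(ident, "->", catalytic_type(ident))
```

Only the first character is inspected. The family number and any subfamily suffix (for example the second "A" in A3A) carry no catalytic-type information under this convention.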

    \ \

    Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (proteins that are evolutionarily related) and further subdivided into families, largely on the basis of their tertiary structure.

    \

    This group of sequences contains an aspartic peptidase signature that belongs to MEROPS peptidase family A3, subfamily A3A (cauliflower mosaic virus-type endopeptidase, clan AA).

    \ \ \

    Cauliflower mosaic viruses belong to a group of plant viruses known as pararetroviruses, which have a double-stranded DNA genome. The genome includes an open reading frame (ORF V) that shows similarities to the pol gene of retroviruses. This ORF codes for a polyprotein that includes a reverse transcriptase, which, on the basis of a DTG triplet near the N terminus, was suggested to include an aspartic protease. The presence of an aspartic protease has been confirmed by mutational studies, implicating Asp-45 in catalysis. The protease releases itself from the polyprotein and is involved in reactions required to process the ORF IV polyprotein, which includes the viral coat protein PUBMED:7674916. The viral aspartic peptidase signature has also been found associated with a polyprotein encoded by integrated pararetrovirus-like sequences in the genome of Nicotiana tabacum (tobacco) PUBMED:10557305.

    \ 363 IPR001041 \

    The ferredoxin protein family are electron carrier proteins with an iron-sulphur cofactor that act in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulphur cluster(s) and according to sequence similarities.

    \

    This entry represents members of the 2Fe-2S ferredoxin family, including putidaredoxin, terpredoxin and adrenodoxin, that have a general core structure consisting of beta(2)-alpha-beta(2) PUBMED:11173487, PUBMED:15755454, PUBMED:10220356, PUBMED:12069587. They are proteins of around one hundred amino acids with four conserved cysteine residues to which the 2Fe-2S cluster is ligated. This conserved region is also found as a domain in various metabolic enzymes and in multidomain proteins, such as aldehyde oxidoreductase (N-terminal), xanthine oxidase (N-terminal), phthalate dioxygenase reductase (C-terminal), succinate dehydrogenase iron-sulphur protein (N-terminal), and methane monooxygenase reductase (N-terminal).

    \ \ 859 IPR007022 \ Survival motor neuron (SMN) interacting protein 1, SIP1, interacts with SMN protein and plays a crucial role in the biogenesis of spliceosomes. There is evidence that the protein is linked to spinal muscular atrophy and amyotrophic lateral sclerosis in humans PUBMED:11943600.\ 5008 IPR002530 \

    Alpha-prolamins are the major seed storage proteins of species of the grass tribe Andropogoneae. They are unusually rich in glutamine, proline, alanine, and leucine residues, and their sequences show a series of tandem repeats presumed to be the result of multiple intragenic duplication PUBMED:8451243. In Zea mays, the 22 kDa and 19 kDa zeins are encoded by a large multigene family and are the major seed storage proteins, accounting for 70% of the total zein fraction. Structurally, the 22 kDa and 19 kDa zeins are composed of nine adjacent, topologically antiparallel helices clustered within a distorted cylinder. The 22 kDa alpha-zeins are encoded by 23 genes PUBMED:11691845; twenty-two of the members are found in a roughly tandem array forming a dense gene cluster. The expressed genes in the cluster are interspersed with nonexpressed genes. Interestingly, some of the expressed genes differ in their transcriptional regulation. Gene amplification appears to have occurred in blocks of genes, which explains the rapid and compact expansion of the cluster during the evolution of maize.

    \ 7761 IPR012856 \

    D-aminopeptidase is a dimeric enzyme with each monomer being composed of three domains. Domain B is organised to form a beta barrel made up of eight antiparallel beta strands. It is connected to domain A, the catalytic domain, by an eight-residue sequence, and also interacts with both domains A and C via non-covalent bonds. Domain B probably functions in maintaining domain C in a good position to interact with domain A PUBMED:10986464.

    \ 1525 IPR000673 \ CheB methylesterase is responsible for removing the methyl group from the gamma-glutamyl methyl ester residues in the methyl-accepting chemotaxis proteins (MCP). The enzyme catalyzes the reaction: protein L-glutamate O-methyl ester + H2O = protein L-glutamate + methanol. CheB is regulated through phosphorylation by CheA. The N-terminal region of the protein is similar to that of other regulatory components of sensory transduction systems. The Myxococcus FrzG protein also belongs to this family and is required for the normal aggregation of cells during fruiting body formation.\ 4559 IPR001058 \

    Synucleins are small, soluble proteins expressed primarily in neural tissue and in certain tumors PUBMED:9750188, PUBMED:11806835. The family includes three known proteins: alpha-synuclein, beta-synuclein, and gamma-synuclein. All synucleins have in common a highly conserved alpha-helical lipid-binding motif with similarity to the class-A2 lipid-binding domains of the exchangeable apolipoproteins PUBMED:10952980.

    \

    Synuclein family members are not found outside vertebrates, although they have some conserved structural similarity with plant late embryogenesis abundant (LEA) proteins. The alpha- and beta-synuclein proteins are found primarily in brain tissue, where they are seen mainly in presynaptic terminals PUBMED:7857654, PUBMED:7877458. The gamma-synuclein protein is found primarily in the peripheral nervous system and retina, but its expression in breast tumors is a marker for tumor progression PUBMED:9044857. Normal cellular functions have not been determined for any of the synuclein proteins, although some data suggest a role in the regulation of membrane stability and/or turnover. Mutations in alpha-synuclein are associated with rare familial cases of early-onset Parkinson's disease, and the protein accumulates abnormally in Parkinson's disease, Alzheimer's disease, and several other neurodegenerative illnesses PUBMED:11433374.

    \ 7874 IPR011725 \

    This entry describes a very small protein, coenzyme PQQ biosynthesis protein A, which is shorter than 25 amino acids in many species. It is proposed to serve as a peptide precursor of coenzyme pyrrolo-quinoline-quinone (PQQ), with the Glu and Tyr of a conserved motif, Glu-Xxx-Xxx-Xxx-Tyr, becoming part of the product PUBMED:9467911.
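Locating the conserved Glu-Xxx-Xxx-Xxx-Tyr motif in a one-letter sequence is a simple pattern search. Below is a minimal sketch using Python's `re` module with a zero-width lookahead, so that overlapping occurrences are all reported. The function name `find_e3y_motifs` is hypothetical, and the input peptide is made up rather than a real PqqA sequence.

```python
import re

# E, any three residues, then Y; the lookahead makes the match zero-width,
# so overlapping motifs (e.g. in "EEAAYY") are all found.
E3Y_MOTIF = re.compile(r"(?=(E...Y))")


def find_e3y_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched pentapeptide) for every E-x-x-x-Y in seq."""
    return [(m.start(), m.group(1)) for m in E3Y_MOTIF.finditer(seq)]


print(find_e3y_motifs("MKWEAPAYTEL"))  # made-up peptide
```

This prints `[(3, 'EAPAY')]`. The pattern only locates candidate motifs. It says nothing about whether a given Glu and Tyr are actually incorporated into PQQ.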

    \ 2871 IPR003427 \

    Histidine decarboxylase catalyses the formation of histamine from histidine. It requires a pyruvoyl group for its activity. Nonhydrolytic self-catalytic cleavage of the proenzyme (pi chain) yields two subunits, alpha and beta, which assemble into a hexamer, (alpha beta)6.

    \ 278 IPR007180 \ This domain is specific to the human splicing factor 3b subunit 2 and its orthologs.\ 298 IPR006736 \

    This family consists of several uncharacterised plant proteins which share a conserved region.

    \ 4886 IPR002026 \

    Urease is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and ammonia PUBMED:3402446: urea + H2O = CO2 + 2 NH3. Historically, it was the first enzyme to be crystallized (in 1926). It is mainly found in plant seeds and microorganisms. In plants, urease is a hexamer of identical chains. In bacteria PUBMED:2651866, it consists of either two or three different subunits (alpha, beta and gamma, described in this entry). The structure of the urease complex is known PUBMED:7754395.

    \ \ This subunit does not appear to take part in the catalytic mechanism.\ 7864 IPR013123 \

    This domain is an RNA 2'-O ribose methyltransferase substrate-binding domain.

    \ 6678 IPR009656 \

    This entry represents the C terminus of bacterial poly(3-hydroxybutyrate) (PHB) depolymerase, which degrades PHB granules to oligomers and monomers of 3-hydroxybutyric acid.

    \ 6879 IPR009764 \

    This family consists of several ovarian carcinoma immunoreactive antigen (OCIA) and related eukaryotic sequences. The function of this family is unknown PUBMED:11162530, PUBMED:12445744.

    \ 1811 IPR007015 \ Proteins of this family are predominantly nucleolar. The majority are described as transcription factor transactivators. The family also includes the fifth essential DNA polymerase (Pol5p) of Schizosaccharomyces pombe and Saccharomyces cerevisiae. Pol5p is localized exclusively to the nucleolus and binds near or at the enhancer region of rRNA-encoding DNA repeating units. \ 103 IPR013069 \

    The BTB (for BR-C, ttk and bab) PUBMED:7938017 or POZ (for Pox virus and Zinc finger) PUBMED:7958847 domain is present near the N terminus of a fraction of zinc finger proteins and in other proteins, such as Kelch and a family of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some instances heteromeric dimerisation PUBMED:7958847. The structure of the dimerised PLZF BTB/POZ domain has been solved and consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule PUBMED:9770450. POZ domains from several zinc finger proteins have been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes, including N-CoR and SMRT PUBMED:9019154, PUBMED:9824158, PUBMED:9765306. The POZ or BTB domain is also known as BR-C/Ttk or ZiN.

    \ 4551 IPR007233 \ Synbindin is a physiological syndecan-2 ligand on dendritic spines, the small protrusions on the surface of dendrites that receive the vast majority of excitatory synapses. Syndecan-2 induces spine formation by recruiting intracellular vesicles toward postsynaptic sites through the interaction with synbindin PUBMED:11018053. \ 6118 IPR009385 \

    This family consists of several plasmid SOS inhibition protein (PsiB) sequences PUBMED:9987116.

    \ 4785 IPR001732 \

    The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes that possess the ability to catalyze the NAD-dependent 2-fold oxidation of an alcohol to an acid without the release of an aldehyde intermediate PUBMED:2470755, PUBMED:9013585.

    \ \

    The enzymes have a wide range of functions. In plants, UDP-glucose dehydrogenase is an important enzyme in the synthesis of hemicellulose and pectin PUBMED:12031484, which are components of newly formed cell walls, while in zebrafish UDP-glucose dehydrogenase is required for cardiac valve formation PUBMED:11533493. In Xanthomonas campestris, a plant pathogen, UDP-glucose dehydrogenase is required for virulence PUBMED:11554764.

    \ \

    GDP-mannose dehydrogenase catalyzes the formation of GDP-mannuronic acid, the monomeric unit from which the exopolysaccharide alginate is formed. Alginate is secreted by a number of bacteria, including the pathogenic bacterium Pseudomonas aeruginosa and Azotobacter vinelandii. In Pseudomonas aeruginosa, alginate is believed to play an important role in the bacterium's resistance to antibiotics and the host immune response PUBMED:12135385, while in Azotobacter vinelandii it is essential for the encystment process PUBMED:9864323.

    \ 330 IPR004953 \ The human EB1 protein was originally discovered as a protein interacting with the C terminus of the APC protein. This interaction is often disrupted in colon cancer, due to deletions affecting the APC C terminus. Several EB1 orthologues are also included in this family. The interaction between EB1 and APC has been shown to have a potent synergistic effect on microtubule polymerization; neither EB1 nor APC alone has this effect. It is thought that EB1 targets APC to the plus ends of microtubules, where APC promotes microtubule polymerization. This process is regulated by phosphorylation of APC by Cdc2, which disrupts APC-EB1 binding. Human EB1 protein can functionally substitute for the yeast EB1 homologue Mal3. In addition, Mal3 can substitute for human EB1 in promoting microtubule polymerization with APC. \ \ 1749 IPR000167 \

    A number of proteins are produced by plants that experience water stress. Water stress occurs when the water available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response of plants to water stress. Proteins that are expressed during water stress are called dehydrins PUBMED:2562763, PUBMED:1387328. Dehydrins contribute to freezing stress tolerance in plants, and it has been suggested that this could be partly due to their protective effect on membranes PUBMED:15356392.

    \ \ \

    Dehydrins share a number of structural features. One of the most notable features is the presence, in their central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has been found in all known dehydrins so far, with the exception of pea dehydrins. A second conserved feature is the presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues that follows the poly-serine region, and the second copy is found at the C-terminal extremity.
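The serine-run feature described above can be screened for with a regular expression. Below is a minimal sketch that reports maximal runs of exactly five to nine serines and checks whether the next few residues look charged. The text does not define "a cluster of charged residues", so the `window` and `min_charged` thresholds and the charged set D/E/K/R are assumptions chosen only for illustration. The function name `find_s_segments` is hypothetical, and the test sequences are made up.

```python
import re

# Maximal serine runs of length 5 to 9: the lookbehind and lookahead reject
# runs that are part of a longer stretch of serines.
SER_RUN = re.compile(r"(?<!S)S{5,9}(?!S)")
CHARGED = set("DEKR")  # assumption: acidic and basic residues count as charged


def find_s_segments(seq: str, window: int = 4, min_charged: int = 3):
    """Return (start, run_length, followed_by_charged_cluster) per serine run.

    `window` and `min_charged` are hypothetical thresholds, not values
    taken from the dehydrin literature.
    """
    hits = []
    for m in SER_RUN.finditer(seq):
        downstream = seq[m.end(): m.end() + window]
        n_charged = sum(aa in CHARGED for aa in downstream)
        hits.append((m.start(), m.end() - m.start(), n_charged >= min_charged))
    return hits


print(find_s_segments("AASSSSSSDEKRA"))  # made-up sequence
```

This prints `[(2, 6, True)]`. A run of ten or more serines is deliberately not reported, because the feature is defined as five to nine.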

    \ 3228 IPR004564 \

    This protein, LolA, is so far known only in the gamma subdivision of the Proteobacteria. In Escherichia coli, lipoproteins are anchored to the periplasmic side of either the inner or outer membrane through N-terminal lipids, depending on the lipoprotein-sorting signal present at position 2 PUBMED:12032293. Five Lol proteins are involved in the sorting and outer membrane localization of lipoproteins. LolCDE, an ATP-binding cassette (ABC) transporter in the inner membrane, releases outer membrane-directed lipoproteins from the inner membrane in an ATP-dependent manner, leading to the formation of a water-soluble complex between the lipoprotein and the molecular chaperone LolA. The LolA-lipoprotein complex crosses the periplasm and then interacts with the outer membrane receptor LolB, which is essential for the anchoring of lipoproteins to the outer membrane.

    \

    E. coli lipoproteins are anchored to the inner or outer membrane depending on the residue at position 2. Aspartate at this position makes lipoproteins specific to the inner membrane, whereas other residues cause the release of lipoproteins from the inner membrane.

    \ 1251 IPR001332 \ Arteriviruses encode four envelope proteins: GL, GS, M and N. The GL envelope glycoprotein is heterogeneously glycosylated with N-acetyllactosamine in a cell-type-specific manner. The GL glycoprotein carries the neutralization determinants PUBMED:8553578.\ 6323 IPR010519 \

    This family consists of transformer proteins from several Drosophila species and also from Ceratitis capitata (Mediterranean fruit fly). The transformer locus (tra) produces an RNA processing protein that alternatively splices the doublesex pre-mRNA in the sex determination hierarchy of Drosophila melanogaster PUBMED:8013913.

    \ 6301 IPR010510 \

    This family consists of several mammalian fibroblast growth factor-binding protein 1 (FGF-BP1) sequences. Fibroblast growth factors (FGFs) play important roles during fetal and embryonic development PUBMED:11819092. FGF-BP1 is a secreted protein that can bind fibroblast growth factors (FGFs) 1 and 2 PUBMED:11509569.

    \ 3865 IPR002527 \ Poliovirus infection leads to drastic alterations in membrane permeability late during infection. Proteins 2B and 2BC enhance membrane permeability PUBMED:9218794, PUBMED:8798506.\ 1271 IPR005144 \

    The ATP-cone is an evolutionarily mobile, ATP-binding regulatory domain PUBMED:10939243.

    \ 4149 IPR001826 \

    RHS elements are proteins of non-essential function believed to play an important role in the natural ecology of the cell. The protein sequences comprise a highly conserved 141 kDa domain containing multiple tandem 22-residue repeats, followed by divergent C-terminal domains PUBMED:2403547, PUBMED:7934896. The 22-residue repeats contain a YD dipeptide, which is the most strongly conserved motif of the repeat.

    \ \ 5699 IPR008570 \ This family consists of eukaryotic proteins with no known function.\ 7674 IPR012861 \

    This family contains many hypothetical bacterial and archaeal proteins. A few members of this family are annotated as being putative transmembrane proteins, and the region in question in fact contains many hydrophobic residues.

    \ 3424 IPR005778 \

    This model describes N5-methyltetrahydromethanopterin:coenzyme M methyltransferase subunit A in methanogenic archaea. This methyltransferase is a membrane-associated enzyme complex that uses a methyl-transfer reaction to drive a sodium-ion pump. Archaea have evolved energy-yielding pathways marked by one-carbon biochemistry featuring novel cofactors and enzymes. This transferase (subunit A) is involved in the transfer of the methyl group from N5-methyltetrahydromethanopterin to coenzyme M. In an accompanying reaction, methane is produced by two-electron reduction of methyl-coenzyme M by another enzyme, methyl-coenzyme M reductase.

    \ \ 1221 IPR006801 \ Apolipoprotein A-II (ApoA-II) is the second major apolipoprotein of high density lipoprotein in human plasma. Mature ApoA-II is present as a dimer of two 77-amino acid chains joined by a disulphide bridge PUBMED:12119188. ApoA-II regulates many steps in HDL metabolism, and its role in coronary heart disease is unclear PUBMED:12119188. In bovine serum, the ApoA-II homologue is present in almost free form. Bovine ApoA-II shows antimicrobial activity against Escherichia coli and yeasts in phosphate buffered saline (PBS) PUBMED:9538260.\ 8011 IPR012614 \

    This family consists of the small acid-soluble spore proteins (SASP) of the P type (sspP). sspP is expressed only in the forespore compartment of the sporulating cell, under sigma-G control, from the same promoter as sspO. Mutations deleting sspP cause no discernible effect on sporulation, spore properties or spore germination PUBMED:10806362.

    \ 2601 IPR001766 \ The fork head protein of Drosophila melanogaster, a transcription factor that promotes terminal rather than segmental development, contains neither homeodomains nor zinc-fingers characteristic of other transcription factors PUBMED:2566386. Instead, it contains a distinct type of DNA-binding region, containing around 100 amino acids, which has since been identified in a number of transcription factors (including D. melanogaster FD1-5, mammalian HNF-3, human HTLF, Saccharomyces cerevisiae HCM1, etc.). This is referred to as the fork head domain but is also known as a 'winged helix' PUBMED:2566386, PUBMED:8332212, PUBMED:1356269.\ The fork head domain binds B-DNA as a monomer PUBMED:8332212, but shows no similarity to previously identified DNA-binding motifs. Although the domain is found in several different transcription factors, a common function is their involvement in early developmental decisions of cell fates during embryogenesis PUBMED:1356269.\ 790 IPR001047 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins have been grouped on the basis of sequence similarities PUBMED:7662106. One of these families, S8E, consists of a number of proteins with either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archaebacteria).

    \ 6660 IPR009647 \

    This conserved region of approximately 90 residues is found in a subgroup of bacterial penicillin-binding proteins (PBPs). A loop region of variable length separates this region from the transpeptidase unit. It is predicted to adopt a beta fold.

    \ 3975 IPR007580 \

    Poxvirus T4 protein is thought to be secreted, or retained in the endoplasmic reticulum if the protein also contains an additional C-terminal region. M-T4 of myxoma virus is thought to protect infected lymphocytes from apoptosis and modulate the inflammatory response to virus infection PUBMED:10544103.

    \ 619 IPR013132 \ NeuB is the prokaryotic N-acetylneuraminic acid (Neu5Ac) synthase. It catalyses the direct formation of Neu5Ac (the most common sialic acid) by condensation of phosphoenolpyruvate (PEP) and N-acetylmannosamine (ManNAc). This reaction has only been observed in prokaryotes; eukaryotes synthesise the 9-phosphate form, Neu5Ac-9-P, and utilize ManNAc-6-P instead of ManNAc. Such eukaryotic enzymes are not present in this family PUBMED:10873658. This family also contains SpsE spore coat polysaccharide biosynthesis proteins.\ 2517 IPR000577 \ It has been shown PUBMED:1659648 that four different types of carbohydrate kinases seem to be evolutionarily related. These enzymes include L-fucolokinase (gene fucK), gluconokinase (gene gntK), glycerol kinase (gene glpK), xylulokinase (gene xylB) and L-xylulose kinase (gene lyxK). These enzymes are proteins of 480 to 520 amino acid residues.\ 2630 IPR000776 \ The fusion glycoproteins from this family are found in ssRNA negative-strand viruses. This protein directs fusion of viral and cellular membranes, resulting in viral penetration, and can direct fusion of infected cells with adjoining cells, resulting in the formation of syncytia. The mature form is a dimer of polypeptides F-1 and F-2 linked by a disulphide bond.\ 3297 IPR001208 \

    The MCM domain is found in DNA-dependent ATPases required for the initiation of eukaryotic DNA replication PUBMED:1454522, PUBMED:8265339, PUBMED:14731643. In eukaryotes there is a family of six proteins that contain this domain, MCM2 to MCM7. They were first identified in yeast, where most of them have a direct role in the initiation of chromosomal DNA replication by interacting directly with autonomously replicating sequences (ARS). They were thus called minichromosome maintenance proteins, MCM proteins PUBMED:8332451.

    \ \

    This family is also present in the archaebacteria in one to four copies. Methanococcus jannaschii has four members: MJ0363, MJ0961, MJ1489 and MJECL13.

    \ \

    The "MCM motif" contains Walker-A and Walker-B type nucleotide-binding motifs. The diagnostic sequence defining the MCMs is IDEFDKM. Only Mcm2 (aka Cdc19 or Nda1) has been subjected to mutational analysis in this region, and most mutations abolish its activity PUBMED:9383050. The presence of a putative ATP-binding domain implies that these proteins may be involved in an ATP-consuming step in the initiation of DNA replication in eukaryotes.

    \ \

    The MCM proteins bind together in a large complex PUBMED:9366552. Within this complex, individual subunits associate with different affinities, and there is a tightly associated core of Mcm4 (Cdc21), Mcm6 (Mis5) and Mcm7 PUBMED:9658174. This core complex in human MCMs has been associated with helicase activity in vitro PUBMED:9305914, leading to the suggestion that the MCM proteins are the eukaryotic replicative helicase.

    \ \

    Fission yeast (Schizosaccharomyces pombe) MCMs, like those in metazoans, are found in the nucleus throughout the cell cycle. This contrasts with budding yeast (Saccharomyces cerevisiae), in which MCM proteins move in and out of the nucleus during each cell cycle. Assembly of the MCM complex in fission yeast is required for MCM localization, ensuring that only intact MCM complexes remain in the nucleus PUBMED:10588642.

    \ 137 IPR000618 \

    Insect cuticle is composed of proteins and chitin. The cuticular proteins seem to be specific to the type of cuticle (flexible or stiff) that occurs at different stages of insect development. The proteins found in the flexible cuticle of larvae and pupae of different insects share a conserved C-terminal section PUBMED:2462055; such a region is also found in the soft endocuticle of adult insects PUBMED:1997327, as well as in other cuticular proteins, including those of arachnids PUBMED:9014336. In addition, cuticular proteins share hydrophobic regions dominated by tetrapeptide repeats (A-A-P-A/V), which are presumed to be functionally important PUBMED:1997327, PUBMED:9066122. Many insect cuticle proteins also include a 35-36 amino acid motif known as the R and R consensus. An extended form of this motif has been shown PUBMED:11520687 to bind chitin. It has no sequence similarity to the cysteine-containing chitin-binding domain of chitinases and some peritrophic membrane proteins, suggesting that arthropods have two distinct classes of chitin-binding proteins: those with the chitin-binding domains found in lectins, chitinases and peritrophic membranes (cysCBD), and those with the type of chitin-binding domain found in cuticular proteins (non-cysCBD) PUBMED:11520687.
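The tetrapeptide repeat can be written as the regular expression `AAP[AV]`. The sketch below, with the hypothetical helper `count_aap_repeats`, counts non-overlapping matches in a one-letter amino acid string; it illustrates the pattern and is not a validated predictor of cuticular proteins.

```python
import re

# A-A-P-A/V: alanine, alanine, proline, then alanine or valine.
AAP_REPEAT = re.compile(r"AAP[AV]")


def count_aap_repeats(sequence: str) -> int:
    """Count non-overlapping A-A-P-A/V tetrapeptides in a protein sequence."""
    return len(AAP_REPEAT.findall(sequence.upper()))


# Hypothetical fragment: AAPA and AAPV match, AAPL does not.
print(count_aap_repeats("GAAPAGAAPVGAAPL"))
```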

    \

    The cuticle protein signature has been found in locust cuticle proteins 7 (LM-7), 8 (LM-8), 19 (LM-19) and endocuticle structural glycoprotein ABD-4; Hyalophora cecropia cuticle proteins 12 and 66; Drosophila melanogaster larval cuticle proteins I, II, III and IV (LCP1 to LCP4); Drosophila pupal cuticle proteins PCP, EDG-78E and EDG-84E; Manduca sexta cuticle protein LCP-14; Tenebrio molitor cuticle proteins ACP-20, A1A, A2B and A3A; and Araneus diadematus (spider) cuticle proteins ACP 11.9, ACP 12.4, ACP 12.6, ACP 15.5 and ACP 15.7.

    \ 7430 IPR011459 \

    These proteins share a region of homology in their N termini, and are found in several phylogenetically diverse bacteria and in the archaeon Methanosarcina acetivorans. Some of these proteins also contain characterised domains such as (e.g. ) and (e.g. ).

    \ 3129 IPR007045 \

    Proteins in this bacterial family have been characterized as 4-deoxy-L-threo-5-hexosulose-uronate ketol-isomerase (), an enzyme involved in pectin degradation.

    \ \ \ 253 IPR004920 \

    Proteins in this group are Caenorhabditis elegans proteins of unknown function.

    \ 3106 IPR000369 \

    Potassium channels are the most diverse group of the ion channel family PUBMED:1772658, PUBMED:1879548. They are important in shaping the action potential, and in neuronal excitability and plasticity PUBMED:2451788. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups PUBMED:2555158: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    \

    These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin-binding properties. Each type of K+ channel is activated by different signals and conditions, depending on its mode of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; and others are regulated by GTP-binding proteins or other second messengers PUBMED:2448635. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes PUBMED:1373731. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis PUBMED:11178249.

    \

    All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how two-P-domain subunits assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned to one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK) PUBMED:11178249, PUBMED:. The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha subunits that possess two P-domains. These are usually highly regulated K+-selective leak channels.
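The K+ selectivity sequence T/SxxTxGxG maps directly onto a regular expression in which each `x` becomes `.`. The sketch below assumes plain one-letter amino acid strings; the helper name `selectivity_sites` and the fragment are hypothetical. Because the pattern is short and degenerate, matches outside a pore loop are expected by chance, so hits are only candidates.

```python
import re

# T/SxxTxGxG: Thr or Ser, two residues, Thr, one residue, Gly, one residue, Gly.
# The lookahead lets overlapping candidate sites all be reported.
K_SELECTIVITY = re.compile(r"(?=([TS]..T.G.G))")


def selectivity_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 8-mer) for each candidate selectivity sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in K_SELECTIVITY.finditer(seq)]


# Hypothetical pore-loop fragment used only for illustration.
print(selectivity_sites("LLSETATTVGYGDL"))
```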

    \ \

    Ion channels exhibit a high degree of diversity, varying both in their electrophysiological and pharmacological properties PUBMED:3194754, PUBMED:1939241. The slow voltage-gated potassium channels (Isk) are membrane proteins that induce selective potassium permeation upon membrane depolarisation. It is thought that they may act as discrete potassium-conducting ion channels, or alternatively that they may serve as a modulatory protein that activates endogenous potassium channels.

    \

    The Isk channel protein is considerably smaller than the sodium, potassium or calcium channel proteins, and contains only a single putative transmembrane domain. The potassium current elicited by this protein is also unusually slow in activation and deactivation after electrical polarisation. Such characteristics differ from those of conventional ion channels, more closely resembling those of simple channel-forming peptide ionophores and synthetic amphiphilic peptides PUBMED:3194754, PUBMED:1939241.

    \ \ 6876 IPR009762 \

    This family consists of several Circovirus proteins of around 35 residues in length. Members of this family are described as ORF-10 proteins and their function is unknown.

    \ 2225 IPR002743 \ This archaebacterial and bacterial protein family has no known function.\ 7266 IPR009999 \

    This family consists of several Staphylococcus aureus and related bacteriophage proteins of around 65 residues in length. The function of this family is unknown.

    \ 3404 IPR004206 \ The mRNA capping enzyme in yeasts is composed of two separate chains, an mRNA guanylyltransferase and an RNA 5'-triphosphatase. This is the beta chain of the mRNA capping enzyme, which has triphosphatase activity. The beta chain (polynucleotide 5'-phosphatase ) converts the 5'-triphosphate end of a nascent mRNA chain into a diphosphate in the first step of mRNA capping. The function of the capping enzyme also depends on the guanylyltransferase activity conferred by the alpha chain (see ).\ 5100 IPR007937 \

    Vaccinia viral RNA synthesis is carried out by a virus-coded, multi-subunit, eukaryotic-like RNA polymerase. RNA polymerase subunits are synthesized throughout infection, and the assembled RNA polymerase is packaged into nascent virions late in infection. The RNA polymerase exists in two different forms, one specific for early genes and one specific for late genes. Both forms of the RNA polymerase have eight subunits in common, ranging in size from 147 to 7 kDa. This family consists of several poxvirus DNA-dependent RNA polymerase 22 kDa subunits.

    \ 2575 IPR004208 \ A specific region of the influenza B virus NS1 protein, which includes part of its effector domain, blocks the covalent linkage of mouse ISG15 to its target proteins both in vitro and in infected cells. Of the several hundred proteins induced by interferon (IFN) alpha/beta, the ubiquitin-like ISG15 protein is one of the most predominant. Influenza A virus employs a different strategy: its NS1 protein does not bind the ISG15 protein; instead, little or no ISG15 protein is produced during infection PUBMED:11157743.\ 27 IPR003833 \ This domain represents subunit 1 of allophanate hydrolase (AHS1).\ 1849 IPR002822 \

    The proteins in this family have no known function.

    \ 352 IPR006706 \

    Extensins are homologous hydroxyproline-rich glycoproteins (HRGPs) found in the plant extracellular matrix. The key to the role of HRGPs in cell wall self-assembly and cell extension lies in their chemistry, which depends on extensive post-translational modifications (PTMs): hydroxylation, glycosylation and cross-linking. Repetitive peptide motifs characterize HRGPs.

    \ \

    This is a family of extensin-like proteins.

    \ 5166 IPR008003 \

    This family contains several bacteriophage proteins. Three of the proteins in this family have been labelled putative cro repressor proteins.

    \ 4180 IPR005813 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) or the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    L20 is a protein from the large (50S) subunit; in Escherichia coli it is known to bind directly to the 23S rRNA and is required for ribosome assembly, but does not take part in protein synthesis. It belongs to a family of ribosomal proteins that includes L20 from eubacteria, plant and algal chloroplasts, and cyanelles PUBMED:.

    \ 6556 IPR009600 \

    Many eukaryotic proteins are anchored to the cell surface via glycosylphosphatidylinositol (GPI), which is posttranslationally attached to the C terminus by GPI transamidase. The mammalian GPI transamidase is a complex of at least four subunits, GPI8, GAA1, PIG-S, and PIG-T. PIG-U is thought to represent a fifth subunit in this complex and may be involved in the recognition of either the GPI attachment signal or the lipid portion of GPI PUBMED:12802054.

    \ 4840 IPR002833 \ This domain has no known function, and has been found in yeast, archaebacteria and eubacteria. In Caenorhabditis elegans this domain occurs with the ubiquitin-associated domain (see ).\ 6435 IPR009523 \

    This family consists of several prokineticin proteins and related BM8 sequences. The suprachiasmatic nucleus (SCN) controls the circadian rhythm of physiological and behavioural processes in mammals. It has been shown that prokineticin 2 (PK2), a cysteine-rich secreted protein, functions as an output molecule from the SCN circadian clock. PK2 messenger RNA is rhythmically expressed in the SCN, and the phase of PK2 rhythm is responsive to light entrainment. Molecular and genetic studies have revealed that PK2 is a gene that is controlled by a circadian clock PUBMED:12024206.

    \ 1862 IPR002845 \

    The archaebacterial proteins in this family have no known function.

    \ 5236 IPR008739 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
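The catalytic-type letter codes and the two nucleophile classes described above can be captured in a small lookup. This is a sketch under the simplifying assumption that only the first character of a family or clan identifier (such as C28 or CA) is inspected; the helper `describe` is hypothetical and is not part of any MEROPS software.

```python
# Catalytic-type codes as listed in the text above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan only)",
}

# Nucleophile is part of an amino acid; an acyl intermediate is formed.
AMINO_ACID_NUCLEOPHILE = {"C", "S", "T"}
# Nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe(identifier: str) -> str:
    """Describe a family/clan identifier from its first (catalytic-type) letter."""
    code = identifier[0].upper()
    kind = CATALYTIC_TYPES.get(code)
    if kind is None:
        raise ValueError(f"unknown catalytic type code: {code!r}")
    if code in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid residue"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "not specified"
    return f"{identifier}: {kind} peptidase, nucleophile: {nucleophile}"


print(describe("C28"))
```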

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C28 (clan CA). The protein fold of the peptidase unit for members of this family resembles that of papain.

    The leader peptidase of foot-and-mouth disease virus cleaves itself from the growing polyprotein and also cleaves the host translation initiation factor 4GI (eIF4G), thus inhibiting 5'-cap-dependent translation PUBMED:12297280.

    \ 7017 IPR009846 \

    This family consists of several eukaryotic splicing factor 3B subunit 10 (SF3b10) proteins. SF3b10 is a 10 kDa subunit of the splicing factor SF3b. SF3b associates with the splicing factor SF3a and a 12S RNA unit to form the U2 small nuclear ribonucleoprotein (U2 snRNP) complex. SF3b10 and SF3b14b are also thought to facilitate the interaction of U2 with the branch site PUBMED:12234937.

    \ 1139 IPR005654 \

    Proteins in this family contain a P-loop motif and are predicted to be ATPases.

    \ 3663 IPR002608 \

    This family consists of the C proteins (C', C, Y1, Y2) found in the Paramyxovirinae, e.g. human parainfluenza virus 3 and Sendai virus. The C proteins affect viral RNA synthesis, having both positive and negative effects during the course of infection PUBMED:9621061. The Paramyxovirinae have a negative-strand ssRNA genome of 15.3 kb from which six mRNAs are transcribed; five of these are monocistronic. The P/C mRNA is polycistronic and has two overlapping open reading frames, P and C; C encodes the nested C proteins C', C, Y1 and Y2 PUBMED:2542021.

    \ 7010 IPR009842 \

    This family consists of several hypothetical bacterial proteins of around 310 residues in length. Members of this family seem to be found exclusively in Agrobacterium, Rhizobium and Brucella species. The function of this family is unknown.

    \ 5234 IPR008893 \ This domain is named after the most conserved central motif of the domain. It is found in a variety of polyA polymerases as well as the Escherichia coli molybdate metabolism regulator and other proteins of unknown function. The domain is found in isolation in proteins such as and is between 70 and 80 residues in length. \ 3383 IPR006788 \ MOBP is abundantly expressed in central nervous system myelin, and shares several characteristics with myelin basic protein (MBP) in terms of regional distribution and function. MOBP has been shown to be essential for normal arrangement of the radial component in central nervous system myelin PUBMED:10103078, PUBMED:11793190.\ 656 IPR001926 \ Pyridoxal-5'-phosphate-dependent enzymes (B6 enzymes) catalyze manifold reactions in the metabolism of amino acids. Most of these enzymes can be assigned to one of three different families of homologous proteins, the alpha, beta and gamma families. The alpha and gamma families might be distantly related to one another, but are clearly not homologous to the beta family. The beta family includes L- and D-serine dehydratase, threonine dehydratase, the beta subunit of tryptophan synthase, threonine synthase and cysteine synthase. These enzymes catalyze beta-replacement or beta-elimination reactions PUBMED:8112347.\

    Comparison of sequences from eukaryotic, archaebacterial and eubacterial species indicates that the functional specialization of most B6 enzymes had already occurred in the universal ancestor cell. The cofactor pyridoxal 5'-phosphate must have emerged very early in biological evolution; conceivably, organic cofactors and metal ions were the first biological catalysts PUBMED:10800595.

    \

    The 3D structure of the beta subunit of tryptophan synthase has been solved. The subunit has two domains that are approximately the same size and similar to each other in folding pattern. Each has a core containing a four-stranded parallel beta-sheet with three helices on its inner side and one on the outer side. The cofactor is bound at the interface between the domains PUBMED:7748903.

    \ 1550 IPR000953 \ The CHROMO (CHRomatin Organization MOdifier) domain PUBMED:1982376, PUBMED:1708124, PUBMED:7667093, PUBMED:7501439 is a conserved region of around 60 amino acids, originally identified in Drosophila modifiers of variegation. These are proteins that alter the structure of chromatin to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed. In one of these proteins, Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain a chromo domain appear to fall into 3 classes. The first class includes proteins having an N-terminal chromo domain followed by a region termed the chromo shadow domain PUBMED:7667093, e.g. Drosophila and human heterochromatin protein Su(var)205 (HP1), and mammalian modifier 1 and modifier 2. The second class includes proteins with a single chromo domain, e.g. Drosophila protein Polycomb (Pc), mammalian modifier 3, human Mi-2 autoantigen and several yeast and Caenorhabditis elegans hypothetical proteins. In the third class, paired tandem chromo domains are found, e.g. in mammalian DNA-binding/helicase proteins CHD-1 to CHD-4 and yeast protein CHD1.\ 2610 IPR005581 \

    This family includes eukaryotic fructosamine-3-kinase enzymes PUBMED:11016445, which may initiate a process leading to the deglycation of fructoselysine and of glycated proteins, and which phosphorylate 1-deoxy-1-morpholinofructose, fructoselysine, fructoseglycine, fructose and glycated lysozyme. The family also includes bacterial members that have not been characterised but probably have a similar or identical function.

    \ 4791 IPR004936 \ The UL21 protein appears to be a dispensable component in herpesviruses PUBMED:8151763.\ 4360 IPR004297 \ Members of this family are found in plants of the Solanaceae, a taxonomic family that includes pepper and tobacco species. Synthesis of these proteins is induced by tobacco mosaic virus (TMV) and salicylic acid PUBMED:1477404; indeed, they are thought to be involved in the development of systemic acquired resistance (SAR) after an initial hypersensitive response to microbial infection PUBMED:1477404, PUBMED:10888849. SAR is characterized by long-lasting resistance to infection by a wide range of pathogens, extending to plant tissues distant from the initial infection site PUBMED:10888849.\ 4689 IPR002848 \

    Translins are DNA-binding proteins that specifically recognise consensus sequences at the breakpoint junctions in chromosomal translocations, mostly involving immunoglobulin (Ig)/T-cell receptor gene segments. They seem to recognise single-stranded DNA ends generated by staggered breaks occurring at recombination hot spots PUBMED:9013868.

    \ 4058 IPR004720 \ Bacterial PTS transporters transport and concomitantly phosphorylate their sugar substrates, and typically consist of multiple subunits or protein domains. The Man family is unique in several respects among PTS permease families.\
  • It is the only PTS family in which members possess an IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteinyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.

    The mannose permease of Escherichia coli, for example, can transport and phosphorylate glucose, mannose, fructose, glucosamine, N-acetylglucosamine and other sugars. Other members of this family can transport sorbose, fructose and N-acetylglucosamine.

    \

    This entry is specific for the IIB components of the Man family of PTS transporters.

    \ 2557 IPR001624 \

    Four genes from the major Bacillus subtilis chemotaxis locus have been shown to encode proteins that are similar to the Salmonella typhimurium FlgB, FlgC, FlgG and FliF proteins; a further gene product is similar to the Escherichia coli FliE protein PUBMED:1905667. All of these proteins are thought to form part of the hook-basal body complex of the bacterial flagella PUBMED:1905667. The FlgB, FlgC and FlgG proteins are components of the proximal and distal rods; FliF forms the M-ring that anchors the rod assembly to the membrane; but the role of FliE has not yet been determined PUBMED:1905667. The similarity between the proteins in these two organisms suggests that the structures of the M-ring and the rod may be similar PUBMED:1905667. Nevertheless, some differences in size and amino acid composition between some of the homologues suggest the basal body proteins may be organised slightly differently within B. subtilis PUBMED:1905667.

    \

    From gel electrophoresis and autoradiography of 35S-labelled S. typhimurium hook-basal body complexes and the deduced number of sulphur-containing residues in FliE, the stoichiometry of the protein in the hook-basal body complex has been estimated to be about nine subunits PUBMED:1551848. FliE does not undergo cleavage of a signal peptide, nor does it show any similarity to the axial components like the rod or hook proteins, which are thought to be exported by the flagellum-specific export pathway PUBMED:1551848. On this evidence, it has been suggested that FliE may be in the vicinity of the MS ring, perhaps acting as an adaptor protein between ring and rod substructures PUBMED:1551848.

    \ 7312 IPR011083 \

    This region is occasionally found in conjunction with . Most of the proteins appear to be phage tail proteins; however, some appear to be involved in other processes. For instance, the RhiB protein () from Rhizobium leguminosarum may be involved in plant-microbe interactions PUBMED:1597418. A related protein, microcystin-related protein (MrpB, ), is involved in the pathogenicity of Microcystis aeruginosa. The finding of this family in a structural component of the phage tail fibre baseplate () suggests that its function is structural rather than enzymatic. Structural studies show that this region consists of a helix, a loop and three beta-strands PUBMED:12888344. This alignment does not capture the third strand, as it is separated from the rest of the structure by around 100 residues. This strand is conserved in homologues but the intervening sequence is not. Much of the function of appears to reside in this intervening region. In the tertiary structure of the phage baseplate this domain forms part of the collar and may bind SO4. The long unconserved region may be due to domain swapping in and out of a loop or to rapid evolution.

    \ 7084 IPR009885 \

    This family consists of several hypothetical Enterobacterial proteins of around 80 residues in length. The function of this family is unknown.

    \ 2371 IPR001866 \ E2 is an early regulatory protein found in the dsDNA papillomaviruses. E2 regulates viral transcription and DNA replication. It binds to the E2RE response element (5'-ACCNNNNNNGGT-3') present in multiple copies in the regulatory region. It can either activate or repress transcription, depending on the position of the E2RE with regard to proximal promoter elements. Repression occurs by sterically hindering the assembly of the transcription initiation complex. The E1-E2 dimer complex binds to the origin of DNA replication PUBMED:1328886.\ 3320 IPR003402 \ The methionine-10 mutant allele of Neurospora crassa codes for a protein of unknown function. However, homologous proteins have been found in yeast, suggesting this protein may be involved in methionine biosynthesis, transport and/or utilization PUBMED:7557397.\ 1117 IPR002605 \ This family consists of various adenovirus penton base proteins, from both the mastadenoviridae, which have mammalian hosts, and the aviadenoviridae, which have avian hosts. The penton base is a major structural protein forming part of the penton, which consists of a base and a fiber; the pentons hold a morphologically prominent position at the vertex capsomer in the adenovirus particle PUBMED:1316685. In mammalian adenovirus there is only one fiber on each base, whereas in avian adenovirus there are two PUBMED:1316685.\ 4743 IPR002649 \

    In transfer RNA many different modified nucleosides are found, especially in the anticodon region. tRNA (guanine-N1-)-methyltransferase is one of several enzymes that act, together with other tRNA-modifying enzymes, before the formation of the mature tRNA. It catalyses the methylation of guanosine (G) to N1-methylguanosine (m1G) at position 37 of tRNAs that read CUN (leucine), CCN (proline) and CGG (arginine) codons. The presence of m1G improves the cellular growth rate and the polypeptide chain step time, and also prevents the tRNA from shifting the reading frame PUBMED:2207153.
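The codon classes named above can be restated as a small predicate. The function `read_by_m1g37_trna` is a hypothetical illustration that covers only the CUN, CCN and CGG codons listed in this entry; it does not model tRNA decoding in any particular organism.

```python
def read_by_m1g37_trna(codon: str) -> bool:
    """True if the codon is CUN (Leu), CCN (Pro) or CGG (Arg), per the text above."""
    codon = codon.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(codon) != 3 or any(base not in "ACGU" for base in codon):
        raise ValueError(f"not a codon: {codon!r}")
    # CUN and CCN: any third base; CGG: exact match only.
    return codon[:2] in ("CU", "CC") or codon == "CGG"


for c in ("CUG", "CCA", "CGG", "CGA", "AUG"):
    print(c, read_by_m1g37_trna(c))
```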

    The mechanism of the trmD3-induced frameshift involving mutant tRNA(Pro) and tRNA(Leu) species has been investigated PUBMED:7689113. It has been suggested that the conformation of the anticodon loop may be a major determining element for the formation of m1G37 in vivo PUBMED:9047363.

    \ 5330 IPR008603 \ Dynactin is a multi-subunit complex and a required cofactor for most, or all, of the cellular processes powered by the microtubule-based motor cytoplasmic dynein. p62 binds directly to the Arp1 subunit of dynactin PUBMED:10671518, PUBMED:10607597.\ 826 IPR005062 \

    This family of eukaryotic proteins brings together the yeast nuclear export factor Sac3, which localizes to the cytoplasmic fibrils of the nuclear pore complex PUBMED:12631707, and mammalian GANP/MCM3-associated proteins, which facilitate the nuclear localization of MCM3, a protein that associates with chromatin in the G1 phase of the cell cycle.

    \ 3926 IPR006932 \ This is a family of Poxvirus A22 proteins.\ 6171 IPR009407 \

    This family consists of the Parechovirus genome-linked protein VPg (P3B).

    \ 1218 IPR007241 \ In yeast, 15 Apg proteins coordinate the formation of autophagosomes. Autophagy is a bulk degradation process induced by starvation in eukaryotic cells PUBMED:11689437. Apg9 plays a direct role in the formation of cytoplasm-to-vacuole targeting and autophagic vesicles, possibly serving as a marker for a specialized compartment essential for these vesicle-mediated alternative targeting pathways PUBMED:10662773.\ 200 IPR006720 \

    The defective chorion-1 gene (dec-1) in Drosophila encodes follicle cell proteins necessary for proper eggshell assembly. Multiple products of the dec-1 gene are formed by alternative RNA splicing and proteolytic processing PUBMED:1699826. Cleavage products include S80 (80 kDa) which is incorporated into the eggshell, and further proteolysis of S80 gives S60 (60 kDa).

    Alternative splicing generates different carboxy terminal ends in different protein isoforms. This domain is the most C-terminal region that is present in the main isoforms.

    \ 3432 IPR003314 \ This family consists of MuA-transposase and repressor protein CI. The phage Mu transposase is essential for integration, replication-transposition, and excision of Mu DNA. The N-terminus of the Mu transposase has considerable sequence homology with the Mu repressor and with the NH2 terminus of the transposase of the Mu-like phage D108. These three proteins are known to share binding sites on DNA. An internal sequence in the Mu A protein also shares these features PUBMED:2999776.\

    The repressor protein of bacteriophage Mu establishes and maintains lysogeny by shutting down transposition functions needed for phage DNA replication. It interacts with several repeated DNA sequences within the early operator, preventing transcription from two divergent promoters. It also directly represses transposition by competing with the MuA transposase for an internal activation sequence (IAS) that is coincident with the operator and required for efficient transposition. The transposase and repressor proteins compete for the operator/IAS region using homologous DNA-binding domains located at their amino termini PUBMED:10387082.

    \ 4934 IPR005376 \

    The adenovirus early E2A DNA-binding protein (Ad DBP) is a multifunctional protein required, amongst other things, for DNA replication and transcription control. It binds to single- and double-stranded DNA, as well as to RNA, in a sequence-independent manner. This signature represents the zinc-binding domain of the viral DNA-binding protein, which is active in DNA replication. The protein contains two zinc atoms in different, novel coordinations; these appear to be required for the stability of the protein fold rather than being involved in direct contacts with the DNA. Two copies of this domain are found at the C-terminus of many members of the family PUBMED:8039495.

    \ 3066 IPR002183 \

    Interleukin-3 (IL3) is a cytokine that regulates blood-cell production by controlling the production, differentiation and function of granulocytes and macrophages PUBMED:3497843, PUBMED:2413359. The protein, which exists in vivo as a monomer, is produced in activated T-cells and mast cells PUBMED:3497843, PUBMED:2413359, and is activated by the cleavage of an N-terminal signal sequence PUBMED:2413359.

    \

    IL3 is produced by T-lymphocytes and T-lymphomas only after stimulation with antigens, mitogens, or chemical activators such as phorbol esters. However, IL3 is constitutively expressed in the myelomonocytic leukaemia cell line WEHI-3B PUBMED:2413359. The genetic change that switched this cell line to constitutive IL3 production is thought to be the key event in the development of this leukaemia PUBMED:2413359.

    \ 5442 IPR008500 \ This family consists of porcine and bovine circovirus ORF3 proteins of unknown function.\ 6485 IPR010597 \

    This family consists of several uncharacterised mammalian proteins of unknown function.

    \ 3395 IPR007151 \ This family includes proteins related to Mpp10 (M phase phosphoprotein 10). The U3 small nucleolar ribonucleoprotein (snoRNP) is required for three cleavage events that generate the mature 18S rRNA from the pre-rRNA. In Saccharomyces cerevisiae, depletion of Mpp10, a U3 snoRNP-specific protein, halts 18S rRNA production and impairs cleavage at the three U3 snoRNP-dependent sites PUBMED:9391061.\ 1785 IPR006998 \

    The dlt operon (dltA to dltD) of Lactobacillus rhamnosus 7469 encodes four proteins responsible for the esterification of lipoteichoic acid (LTA) by D-alanine. These esters play an important role in controlling the net anionic charge of the poly(GroP) moiety of LTA. The dltA and dltC genes encode the D-alanine-D-alanyl carrier protein ligase (Dcl) and the D-alanyl carrier protein (Dcp), respectively. Whereas the functions of DltA and DltC were defined, those of DltB and DltD were unknown. In vitro assays showed that DltD bound Dcp for ligation with D-alanine by Dcl in the presence of ATP. In contrast, the Escherichia coli acyl carrier protein (ACP), a homologue of Dcp involved in fatty acid biosynthesis, was not bound by DltD and thus was not ligated with D-alanine. DltD also catalyzed the hydrolysis of mischarged D-alanyl-ACP. The hydrophobic N-terminal sequence of DltD was required for anchoring the protein in the membrane. It is hypothesized that this membrane-associated DltD facilitates the binding of Dcp and Dcl for ligation of Dcp with D-alanine, and that the resulting D-alanyl-Dcp is translocated to the primary site of D-alanylation PUBMED:10781555.

    \ \ \

    These sequences contain the C-terminal region of DltD.

    \ 1855 IPR002831 \

    TrmB, a protein with an apparent molecular weight of 38,800, is involved in the maltose-specific regulation of the trehalose/maltose ABC transport operon in Thermococcus litoralis. TrmB has been shown to act as a maltose-specific repressor, and this repression is counteracted by maltose and trehalose. TrmB binds maltose and trehalose half-maximally at sugar concentrations of 20 µM and 0.5 mM, respectively PUBMED:12426307. Other members of this family are annotated as either transcriptional regulators or hypothetical proteins.
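    Converting both half-maximal concentrations to the same unit makes the comparison explicit. The symbol K_{0.5} is notation introduced here. The fractional-occupancy line assumes simple single-site hyperbolic binding, which the source does not state. The 100 µM sugar concentration is an arbitrary illustrative value.

```latex
\[
\begin{aligned}
K_{0.5}^{\mathrm{mal}} &= 20\ \mu\mathrm{M}, \qquad
K_{0.5}^{\mathrm{tre}} = 0.5\ \mathrm{mM} = 500\ \mu\mathrm{M},\\
\frac{K_{0.5}^{\mathrm{tre}}}{K_{0.5}^{\mathrm{mal}}} &= \frac{500\ \mu\mathrm{M}}{20\ \mu\mathrm{M}} = 25,\\
\theta &= \frac{[S]}{K_{0.5} + [S]} \quad \text{(assumed single-site binding)},\\
\theta_{\mathrm{mal}}(100\ \mu\mathrm{M}) &= \frac{100}{20 + 100} \approx 0.83, \qquad
\theta_{\mathrm{tre}}(100\ \mu\mathrm{M}) = \frac{100}{500 + 100} \approx 0.17.
\end{aligned}
\]
```

    On these figures, half-maximal binding of trehalose requires a sugar concentration 25-fold higher than that of maltose.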

    \ \ \ 7634 IPR012478 \

    This family contains sequences bearing similarity to a region of GSG1 (), a protein specifically expressed in testicular germ cells PUBMED:9337410. It is possible that overexpression of the human homolog is involved in the tumourigenesis of human testicular germ cell tumours PUBMED:9337410. The region in question has four highly conserved cysteine residues.

    \ 7185 IPR009953 \

    This family consists of several bacterial dinitrogenase reductase ADP-ribosyltransferase (DRAT) proteins. Members of this family seem to be specific to Rhodospirillum, Rhodobacter and Azospirillum species. Dinitrogenase reductase ADP-ribosyl transferase (DRAT) carries out the transfer of the ADP-ribose from NAD to the Arg-101 residue of one subunit of the dinitrogenase reductase homodimer, resulting in inactivation of that enzyme. Dinitrogenase reductase-activating glycohydrolase (DRAG) removes the ADP-ribose group attached to dinitrogenase reductase, thus restoring nitrogenase activity. The DRAT-DRAG system negatively regulates nitrogenase activity in response to exogenous NH4+ or energy limitation in the form of a shift to darkness or to anaerobic conditions PUBMED:11160092.

    \ 7877 IPR012535 \

    Cdc14 is a component of the septation initiation network (SIN) and is required for the localisation and activity of Sid1. Sid1 is a protein kinase that localises asymmetrically to one spindle pole body (SPB) in anaphase and disappears prior to cell separation PUBMED:10775265, PUBMED:11384993.

    \ 4712 IPR004906 \

    This group includes the Caenorhabditis elegans vacuolar assembly protein and several uncharacterised proteins that may be transposases.

    \ 703 IPR004107 \

    Proteins containing this domain cleave DNA substrates by a series of staggered cuts, during which the protein becomes covalently linked to the DNA through a catalytic tyrosine residue at the carboxy end of the alignment PUBMED:9082984, PUBMED:9288963.

    \ \

    The phage integrase N-terminal SAM-like domain is almost always found with the signature that defines the phage integrase family (see ).

    \ 1963 IPR004948 \ This family includes hypothetical ATP-binding proteins from prokaryotes.\ 2824 IPR004218 \ Prokaryotic glutathione synthetase (glutathione synthase) catalyses the conversion of gamma-L-glutamyl-L-cysteine and glycine to orthophosphate and glutathione in the presence of ATP. This is the second step in glutathione biosynthesis. The enzyme is inhibited by 7,8-dihydrofolate, methotrexate and trimethoprim. This is the ATP-binding domain of the enzyme.\ 6942 IPR009801 \

    This family consists of several hypothetical eukaryotic proteins of around 200 residues in length. Members of this family seem to be specific to mammals and their function is unknown.

    \ 1560 IPR002736 \

    The citG gene is found in a gene cluster with the genes for the citrate lyase subunits PUBMED:9457870. The CitG protein catalyzes the conversion of ATP and dephospho-CoA to adenine and 2'-(5''-triphosphoribosyl)-3'-dephospho-CoA, the predicted precursor of the citrate lyase prosthetic group PUBMED:11042274.
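    The CitG reaction described above can be written as a single reaction equation. This only restates the preceding sentence in chemical notation, with charges and protons omitted. It is not a fully balanced stoichiometric equation.

```latex
\[
\mathrm{ATP} + \text{dephospho-CoA}
  \longrightarrow \text{adenine}
  + \text{2'-(5''-triphosphoribosyl)-3'-dephospho-CoA}
\]
```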

    \ 681 IPR000668 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
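    The lettering convention above can be illustrated with a small lookup. This is a hypothetical helper, not part of any MEROPS software. It only encodes the catalytic-type letters and the nucleophile grouping stated in the paragraph, and uses the identifiers C1, C1A and CA from this entry as examples. Types U and P return "not specified" because the paragraph does not describe their nucleophile.

```python
"""Hypothetical helper: decode the catalytic type from a peptidase
family or clan identifier, using the first-character convention."""

# First character of a family/clan identifier -> catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clan containing families of more than one type
}

# Amino acid nucleophile, forming an acyl intermediate.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
# Activated water molecule as the nucleophile.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the identifier's first character."""
    if not identifier:
        raise ValueError("identifier must not be empty")
    letter = identifier[0].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unknown catalytic-type letter in {identifier!r}")
    return CATALYTIC_TYPES[letter]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the catalytic-type letter."""
    letter = identifier[0].upper() if identifier else ""
    if letter in AMINO_ACID_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not specified"


if __name__ == "__main__":
    for ident in ("C1", "C1A", "CA"):
        print(ident, catalytic_type(ident), nucleophile(ident), sep=" | ")
```

    Only the first character is inspected, so family (C1), sub-family (C1A) and clan (CA) identifiers all decode to the same type.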

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of proteins belongs to the peptidase family C1, sub-family C1A (papain family, clan CA). It includes proteins classed as non-peptidase homologs. These have either been shown experimentally to lack peptidase activity or lack one or more of the active site residues.

    \ \

    The papain family has a wide variety of activities, including broad-range (papain) and narrow-range endo-peptidases, aminopeptidases, dipeptidyl peptidases and enzymes with both exo- and endo-peptidase activity PUBMED:7845226. Members of the papain family are widespread, found in baculovirus PUBMED:8439290, eubacteria, yeast, and practically all protozoa, plants and mammals PUBMED:7845226. The proteins are typically lysosomal or secreted, and proteolytic cleavage of the propeptide is required for enzyme activation, although bleomycin hydrolase is cytosolic in fungi and mammals PUBMED:3117099. Papain-like cysteine proteinases are essentially synthesised as inactive proenzymes (zymogens) with N-terminal propeptide regions. The activation process of these enzymes includes the removal of propeptide regions. The propeptide regions serve a variety of functions in vivo and in vitro. The pro-region is required for the proper folding of the newly synthesised enzyme, the inactivation of the peptidase domain and stabilisation of the enzyme against denaturation at neutral to alkaline pH. Amino acid residues within the pro-region mediate their membrane association, and play a role in the transport of the proenzyme to lysosomes. Among the most notable features of propeptides are their ability to inhibit the activity of their cognate enzymes and the high selectivity that certain propeptides exhibit for inhibition of the peptidases from which they originate PUBMED:12188906.\

    \ \

    The catalytic residues of papain are Cys-25 and His-159, other important residues being Gln-19, which helps form the 'oxyanion hole', and Asn-175, which orientates the imidazole ring of His-159.

    \ \ 7153 IPR009931 \

    This family consists of several Curtovirus V2 proteins. The exact function of V2 is unclear but it is known that the protein is required for a successful host infection process PUBMED:9123819.

    \ 6822 IPR009733 \

    This family represents a conserved region approximately 60 residues long, multiple copies of which are found within eukaryotic involucrin, and which is rich in glutamine and glutamic acid residues. Involucrin forms part of the insoluble cornified cell envelope (a specialised protective barrier) of stratified squamous epithelia PUBMED:12210515. Members of this family seem to be restricted to mammals.

    \ 4570 IPR003122 \

    The aspartate receptor, Tar, is a member of a family of transmembrane receptors that mediate chemotactic responses in certain enteric bacteria, such as Salmonella typhimurium and Escherichia coli PUBMED:8831788. These methyl-accepting chemotaxis receptors are one of the first components in the sensory excitation and adaptation responses in bacteria, which act to alter swimming behaviour upon detection of specific chemicals. The aspartate receptor mediates movement towards the attractants aspartate and maltose, and away from the repellents nickel and cobalt. There are many different types of bacterial 60 kDa transmembrane receptors, which share similar topology and signalling mechanisms. They possess three domains: a periplasmic ligand-binding domain, two transmembrane segments, and a cytoplasmic domain. The structure of the ligand-binding domain comprises a closed or partly opened, four-helical bundle with a left-handed twist. The difference in the sequence of the ligand-binding domain between receptors reflects the different ligand specificities. Binding of the ligand causes a conformational change that is transmitted across the membrane to the cytoplasmic activation domain PUBMED:11504940.

    \ 3049 IPR001288 \

    Initiation factor 3 (IF-3) (gene infC) is one of the three factors required for the initiation of protein biosynthesis in bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex, which consists of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 is a basic protein that binds to the 30S ribosomal subunit PUBMED:8405963. The chloroplast initiation factor IF-3(chl) enhances the poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits; its central section is evolutionarily related to the sequence of bacterial IF-3 PUBMED:8144528.

    \ 2159 IPR001142 \ A number of uncharacterised integral membrane proteins from yeast contain an internal duplication due to duplicated genes. Duplicated copies of genes may be classified in two types of cluster organization. The first type includes genes sharing a significant level of identity in the amino acid sequences of their predicted protein product. They are recovered on two different chromosomes, transcribed in the same orientation and the distance between them is conserved. The second type of cluster is based on one gene unit tandemly repeated. This duplication is itself repeated elsewhere in the genome. The basic gene unit is recovered many times in the genome and is a component of a multigene family of unknown function. These organizations in clusters of genes suggest a 'Lego organization' of the yeast chromosomes PUBMED:9234674. The proteins belonging to this family are of unknown function.\ 729 IPR006588 \

    The PAW domain of unknown function is found in peptide N glycanase (PNGase, ) and in a number of hypothetical proteins.

    \ 7372 IPR011425 \

    These proteins include CSE2 () PUBMED:8336709 and a subunit of the Mediator complex PUBMED:12584197. CSE2, or chromosome segregation protein, has a microtubule-related role in chromosome segregation.

    \ 6012 IPR010392 \

    This family consists of several Potexvirus coat proteins.

    \ 1301 IPR005468 \ Avidin PUBMED:2388586 is a minor constituent of egg white in several groups of oviparous vertebrates. Avidin, which was discovered in the 1920s, takes its name from the avidity with which it binds biotin. These two molecules bind so strongly that it is extremely difficult to separate them. Streptavidin is a protein produced by Streptomyces avidinii which also binds biotin and whose sequence is evolutionarily related to that of avidin.\

    Avidin and streptavidin both form homotetrameric complexes of noncovalently associated chains. Each chain forms a very strong and specific non-covalent complex with one molecule of biotin.

    \ \

    The three-dimensional structures of both streptavidin PUBMED:2928324, PUBMED:8515446 and avidin PUBMED:2784773 have been determined and revealed them to share a common fold: an eight-stranded antiparallel beta-barrel with a repeated +1 topology enclosing an internal ligand-binding site.

    \

    Fibropellins I and III PUBMED:8500658 are proteins that form the apical lamina of the sea urchin embryo, a component of the extracellular matrix. These two proteins have a modular structure composed of a CUB domain (see), followed by a variable number of EGF repeats and a C-terminal avidin-like domain.

    \ 6457 IPR010583 \

    This family consists of several bacterial MltA-interacting protein (MipA)-like sequences. As well as interacting with the membrane-bound lytic transglycosylase MltA, MipA is known to bind to PBP1B, a bifunctional murein transglycosylase/transpeptidase. MipA is considered to be a structural protein that mediates the assembly of MltA and PBP1B into a complex PUBMED:10037771.

    \ 5764 IPR010263 \

    This family consists of a series of hypothetical bacterial sequences of unknown function.

    \ 741 IPR002885 \

    Pentatricopeptide repeat (PPR) proteins are characterised by the presence of a tandem array of repeats, where the number of PPR motifs controls the affinity and specificity of the PPR protein for RNA. These proteins occur predominantly in plants, where they appear to play essential roles in RNA/DNA metabolism in mitochondria and chloroplasts PUBMED:15270678. It has been suggested that each of the highly variable PPR proteins is a gene-specific regulator of plant organellar RNA metabolism. PPR proteins may also play a role in organelle biogenesis, probably via binding to organellar transcripts PUBMED:15269332. Examples of PPR repeat-containing proteins include PET309, which may be involved in RNA stabilisation PUBMED:7664742, and crp1, which is involved in RNA processing PUBMED:8039510. The repeat is also associated with a predicted plant protein whose domain organization is similar to that of the human BRCA1 protein.

    \ 7128 IPR010842 \

    This family consists of several hypothetical bacterial proteins of around 120 residues in length. Members of this family seem to be found exclusively in Rhizobium, Agrobacterium and Pseudomonas species. The function of this family is unknown.

    \ 8131 IPR013159 \

    This entry represents the C-terminal domain of bacterial DnaA proteins PUBMED:8110826, PUBMED:1779750, PUBMED:2558436 that play an important role in initiating and regulating chromosomal replication. DnaA is an ATP- and DNA-binding protein. It binds specifically to 9 bp nucleotide repeats known as dnaA boxes which are found in the chromosome origin of replication (oriC).

    \

    DnaA is a protein of about 50 kDa that contains two conserved regions: the first is located in the N-terminal half and corresponds to the ATP-binding domain, the second is located in the C-terminal half and could be involved in DNA-binding. The protein may also bind the RNA polymerase beta subunit, the dnaB and dnaZ proteins, and the groE gene products (chaperonins) PUBMED:2172087.

    \ 1965 IPR004952 \ This family includes several proteins of unknown function. Members of this family may be involved in nitrogen fixation, since they are found within nitrogen fixation operons. \ 1411 IPR001784 \ The bunyaviruses are enveloped viruses with a genome consisting of three ssRNA segments (called L, M and S). The nucleocapsid (N) protein is encoded on the small (S) genomic RNA. The N protein is the major component of the nucleocapsids. This protein is thought to interact with the L protein, virus RNA and/or other N proteins PUBMED:7897347.\ 329 IPR003172 \

    Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. In this system, a designation is composed of the first three letters of the genus; a space; the first letter of the species name; a space; and an Arabic numeral. If two species names would have identical designations, they are distinguished from one another by adding one or more letters (as necessary) to each species designation.

    \

    The allergens in this family include allergens with the following designations: Lep d 1, Der f 1, Der m 1 and Der p 1.
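    The naming rule above is mechanical enough to sketch as a function. The function name and the `species_letters` parameter are hypothetical. Taking the disambiguating letters from the start of the species name is an assumption made for illustration, since the rule only says "one or more letters". In the examples, Der p 1 comes from Dermatophagoides pteronyssinus (house dust mite), Der f 1 from Dermatophagoides farinae and Lep d 1 from Lepidoglyphus destructor.

```python
"""Hypothetical sketch of the allergen designation rule:
<first 3 letters of genus> <first letter(s) of species> <Arabic numeral>."""


def allergen_designation(genus: str, species: str, number: int,
                         species_letters: int = 1) -> str:
    """Build a designation such as 'Der p 1'.

    species_letters > 1 models the (assumed) extra letters used when two
    species would otherwise receive identical designations.
    """
    if len(genus) < 3:
        raise ValueError("genus name must have at least three letters")
    if not 1 <= species_letters <= len(species):
        raise ValueError("species_letters out of range")
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    genus_part = genus[:3].capitalize()                # first three letters of genus
    species_part = species[:species_letters].lower()   # first letter(s) of species
    return f"{genus_part} {species_part} {number}"     # Arabic numeral last


print(allergen_designation("Dermatophagoides", "pteronyssinus", 1))
print(allergen_designation("Dermatophagoides", "farinae", 1))
```

    The genus part is normalised to an initial capital and the species part to lower case, matching the form of the designations listed above.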

    \ \

    This family includes the E1 protein, an epididymal secretory protein, as well as the Dermatophagoides pteronyssinus (house dust mite) allergens.

    \ 3329 IPR003019 \

    Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, nickel, etc. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds PUBMED:1779825, PUBMED:2959513. An empirical classification into three classes has been proposed by Fowler and coworkers PUBMED:2959504 and Kojima PUBMED:1779826. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units PUBMED:2959504.

    \

    This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Consequently, all class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically-related and thus alignable sequences. This system subdivides the MT superfamily into families, subfamilies, subgroups, and isolated isoforms and alleles.

    \

    The metallothionein superfamily comprises all polypeptides that resemble equine renal metallothionein in several respects PUBMED:2959504: e.g., low molecular weight; high metal content; amino acid composition with high Cys and low aromatic residue content; unique sequence with characteristic distribution of cysteines, and spectroscopic manifestations indicative of metal thiolate clusters. An MT family subsumes MTs that share particular sequence-specific features and are thought to be evolutionarily related. The inclusion of an MT within a family presupposes that its amino acid sequence is alignable with that of all members. Fifteen MT families [http://www.unizh.ch/~mtpage/MT.html] have been characterised, each family being identified by its number and its taxonomic range: e.g., Family 1: vertebrate MTs.

    \

    This entry is a superfamily of metallothioneins, containing 3 families.

    \ 2987 IPR001955 \

    Pancreatic hormone (PP) PUBMED:6107857 is a peptide, synthesized in the pancreatic islets of Langerhans, that regulates pancreatic and gastrointestinal functions.

    \

    The hormone is produced as a larger propeptide, which is enzymatically cleaved to yield the mature active peptide: this is 36 amino acids in length PUBMED:3031687 and has an amidated C terminus PUBMED:2599092. The hormone has a globular structure: residues 2-8 form a left-handed poly-proline-II-like helix, residues 9-13 a beta turn, and residues 14-32 an alpha-helix, held close to the first helix by hydrophobic interactions PUBMED:3031687. Unlike that of glucagon, another peptide hormone, the structure of pancreatic peptide is preserved in aqueous solution PUBMED:2067973. Both N and C termini are required for activity: receptor binding and activation functions may reside in the N and C termini respectively PUBMED:3031687.

    \ \ 565 IPR000998 \ A 170 amino acid domain, the so-called MAM domain, has been recognised in the extracellular region of functionally diverse proteins PUBMED:8387703. These proteins have a modular, receptor-like architecture comprising a signal peptide, an N-terminal extracellular domain, a single transmembrane domain and an intracellular domain. Such proteins include meprin (a cell surface glycoprotein) PUBMED:1374387; A5 antigen (a developmentally-regulated cell surface protein) PUBMED:1908252; and receptor-like tyrosine protein phosphatase PUBMED:1655529. The MAM domain is thought to have an adhesive function. It contains 4 conserved cysteine residues, which probably form disulphide bridges.\ 6089 IPR009369 \

    This family consists of several Actinobacillus actinomycetemcomitans leukotoxin activator (LktC) proteins. Actinobacillus actinomycetemcomitans is a Gram-negative bacterium that has been implicated in the etiology of several forms of periodontitis, especially localised juvenile periodontitis. LktC along with LktB and LktD are thought to be required for activation and localisation of the leukotoxin PUBMED:2004819.

    \ 666 IPR000095 \ The molecular bases of the versatile functions of Rho-like GTPases are still unknown. The Cdc42/Rac interactive binding (CRIB) domain is a small domain that binds Cdc42p- and/or Rho-like small GTPases. The CRIB region has been shown to inhibit transcriptional activation and cell transformation mediated by the Ras-Rac pathway PUBMED:9119069. In fission yeast, pak1+ encodes a protein kinase that interacts with Cdc42p and is involved in the control of cell polarity and mating PUBMED:8846783.\ 7767 IPR012414 \

    This family features the antihypertensive and antiviral proteins BDS-I () and BDS-II () expressed by Anemonia sulcata. BDS-I is organised into a triple-stranded antiparallel beta-sheet, with an additional small antiparallel beta-sheet at the N-terminus PUBMED:2566326. Both peptides are known to specifically block the Kv3.4 potassium channel, and thus bring about a decrease in blood pressure PUBMED:9506974. Moreover, they inhibit the cytopathic effects of mouse hepatitis virus strain MHV-A59 on mouse liver cells, by an unknown mechanism PUBMED:2566326.

    \ 7583 IPR006457 \

    This domain is found tandemly duplicated in most members of a paralogous family in the archaeon Methanosarcina acetivorans str. C2A. This domain is clearly related to the central region of a family of archaeal S-layer proteins described in .

    \ 355 IPR005804 \

    Fatty acid desaturases are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids. There seem to be two distinct families of fatty acid desaturases, which do not seem to be evolutionarily related. This entry is family 2 of the desaturases. It includes plant stearoyl-acyl-carrier-protein desaturase (), which catalyses the introduction of a double bond at the delta position of stearoyl-ACP to produce oleoyl-ACP PUBMED:2006187 and is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils. It also includes cyanobacterial desA PUBMED:2118597, an enzyme that can introduce a second cis double bond at the delta position of fatty acids bound to membrane glycerolipids. DesA is involved in chilling tolerance, because the phase transition temperature of cellular membrane lipids depends on the degree of unsaturation of their fatty acids.

    \ 7054 IPR009867 \

    This family consists of several hypothetical bacterial proteins of around 120 residues in length. The function of this family is unknown.

    \ 3564 IPR001292 \

    The oestrogen receptors (ERs) are steroid or nuclear hormone receptors that act as transcription regulators involved in diverse physiological functions. Oestrogen receptors function as dimeric molecules in nuclei to regulate the transcription of target genes in a ligand-responsive manner. The ER consists of three functional and structural domains: an N-terminal modulatory domain, a highly conserved DNA-binding domain that recognises specific sequences (), and a C-terminal ligand-binding domain ().

    \

    The N-terminal modulatory domain spans the first 180 residues and contains the activation function 1 (AF1) region. Nuclear receptors differ considerably with respect to AF1 activity and regulation, as it is a poorly conserved region PUBMED:15831449. There is another activation function region, namely AF2, which resides in the C-terminal end of the ligand-binding domain. Transcription activation is facilitated by both AF1 and AF2, which appear to act synergistically in the ER complex PUBMED:15728727, PUBMED:14612550. For example, the ER can recruit TIF2 (transcription intermediary factor 2) via the AF1 and AF2 regions, whose synergistic action results in the activation of transcription.

    \

    This entry represents the AF1-containing modulatory domain found at the N-terminus in oestrogen alpha-type receptors.

    \ \ \ 1332 IPR006790 \ This is a family of viral structural glycoproteins PUBMED:1629955.\ 3304 IPR003901 \ Methyl coenzyme M reductase (MCR) catalyses the final step in methanogenesis. MCR is composed of three subunits, alpha, beta and gamma PUBMED:8863453. Genes encoding the beta (mcrB) and gamma (mcrG) subunits are separated by two open reading frames coding for two proteins, C and D PUBMED:3170483. The function of proteins C and D (this family) is unknown.\ 2669 IPR001651 \ Gastrin and cholecystokinin (CCK) PUBMED: are structurally and functionally related peptide hormones that function as hormonal regulators of various digestive processes and feeding behaviors. They are known to induce gastric secretion, stimulate pancreatic secretion, increase blood circulation and water secretion in the stomach and intestine, and stimulate smooth muscle contraction. Originally found in the gut, these hormones have since been shown to be present in various parts of the nervous system. Like many other active peptides, they are synthesized as larger protein precursors that are enzymatically converted to their mature forms. They are found in several molecular forms due to tissue-specific post-translational processing. The biological activity of gastrin and CCK is associated with the last five C-terminal residues. One or two positions upstream of this pentapeptide, there is a conserved sulphated tyrosine residue. The amphibian caerulein skin peptide, the cockroach leukosulphakinin I and II (LSK) peptides, the Drosophila melanogaster putative CCK homologs drosulphakinins I and II, a chicken gastrin/cholecystokinin-like peptide, and cionin, a neuropeptide from the protochordate Ciona intestinalis, belong to the same family.\ 7139 IPR009923 \

    This family consists of several hypothetical bacterial proteins as well as one archaeal sequence . Members of this family are typically of around 70 residues in length. The function of this family is unknown.

    \ 3099 IPR004959 \ This family includes IpaB, which is an invasion plasmid antigen from Shigella PUBMED:11207575, as well as EvcA from Escherichia coli. Members of this family seem to be involved in the pathogenicity of some enterobacteria. However, the exact function of this component is not clear. \ \ 4368 IPR004682 \

    TRAP-T family proteins generally consist of three components, and these systems have so far been found in Gram-negative bacteria, Gram-positive bacteria and archaea. Only one member of the family has been both sequenced and functionally characterised. This system is the DctPQM system of Rhodobacter capsulatus. DctP is a periplasmic dicarboxylate (malate, fumarate, succinate) binding receptor that is biochemically well-characterised.

    \ 7825 IPR002587 \ 1L-myo-Inositol-1-phosphate synthase () catalyzes the conversion of D-glucose 6-phosphate to 1L-myo-inositol-1-phosphate, the first committed step in the production of all inositol-containing compounds, including phospholipids, either directly or by salvage. The enzyme exists in a cytoplasmic form in a wide range of plants, animals, and fungi. It has also been detected in several bacteria, and a chloroplast form is observed in algae and higher plants. Inositol phosphates play an important role in signal transduction.\

    In baker's yeast, Saccharomyces cerevisiae, the transcriptional regulation of the INO1 gene has been studied in detail PUBMED:7975896 and its expression is sensitive to the availability of phospholipid precursors as well as growth phase. The regulation of the structural gene encoding 1L-myo-inositol-1-phosphate synthase has also been analyzed at the transcriptional level in the aquatic angiosperm, Spirodela polyrrhiza and the halophyte, Mesembryanthemum crystallinum PUBMED:9370339.

    \ 3151 IPR003192 \ Maltoporin (LamB protein) forms a trimeric structure which facilitates the diffusion of maltodextrins across the outer membrane of Gram-negative bacteria. The membrane channel is formed by an antiparallel beta-barrel PUBMED:7824948.\ 3292 IPR004518 \

    This domain is found in a group of prokaryotic proteins which includes Escherichia coli MazG. The domain is about 100 amino acid residues in length and contains four conserved negatively charged residues that probably form an active site or metal binding site.

    \ 7693 IPR012903 \

    This domain is found in cyanobacteria and in the nitrogen-fixing proteobacterium Azotobacter vinelandii. It may be involved in nitrogen fixation, but no role has been assigned PUBMED:2644218.

    \ \ 6381 IPR010548 \

    This family consists of several mammalian-specific BCL2/adenovirus E1B 19 kDa protein-interacting protein 3 (BNIP3) sequences. BNIP3 belongs to the Bcl-2 homology 3 (BH3)-only family, a Bcl-2-related family possessing an atypical BH3 domain, which regulates programmed cell death (PCD) from mitochondrial sites by selective Bcl-2/Bcl-XL interactions. BNIP3 family members contain a C-terminal transmembrane domain that is required for their mitochondrial localisation and homodimerisation, as well as for regulation of their pro-apoptotic activities. BNIP3-mediated apoptosis has been reported to be independent of caspase activation and cytochrome c release, and is characterised by early plasma membrane and mitochondrial damage, prior to the appearance of chromatin condensation or DNA fragmentation PUBMED:12690108.

    \ 6102 IPR009375 \

    This family consists of several bacteriophage Mu-like tail sheath (GpL) proteins as well as several related hypothetical bacterial proteins.

    \ 6345 IPR009487 \

    This family consists of several Orthopoxvirus A43R proteins. The function of this family is unknown.

    \ 3963 IPR006971 \ This family includes the M2 protein, of unknown function, from variola virus. \ 229 IPR001159 \ The double-stranded RNA-binding domain (DsRBD) is found in a variety of RNA-binding proteins with different structures and a diversity of functions PUBMED:8036511.\ It is involved in the localization of at least five different mRNAs in the early Drosophila embryo, and is used by the interferon-induced protein kinase in humans, which is part of the cellular response to dsRNA.\ 4143 IPR003490 \ Infectious hematopoietic necrosis virus (IHNV) is a member of the family Rhabdoviridae. The non-virion protein (NV) is encoded by one of the six genes of the IHNV genome PUBMED:8578857, but is absent from vesiculovirus-like rhabdoviruses PUBMED:9010293.\ 2407 IPR002200 \

    Elicitins are a family of small, highly conserved proteins secreted by phytopathogenic oomycetes of the genus Phytophthora PUBMED:7753775, PUBMED:. They are toxic proteins responsible for inducing a necrotic and systemic hypersensitive response in plants of the Solanaceae and Cruciferae families. Leaf necrosis provides immediate control of pathogen invasion and induces systemic acquired resistance; both responses mediate basic protection against subsequent pathogen inoculation.

    \ \

    Members of this family share a high level of sequence similarity, but they differ in net charge, dividing them into two classes: alpha and beta PUBMED:7753775, PUBMED:. Alpha-elicitins are highly acidic, with a valine residue at position 13, whereas beta-elicitins are basic, with a lysine at the same position. Residue 13 is known to be involved in the control of necrosis and, being exposed, is thought to be involved in ligand/receptor binding PUBMED:, PUBMED:9385630. Phenotypically, the two classes can be distinguished by their necrotic properties: beta-elicitins are 100-fold more toxic and provide better subsequent protection PUBMED:7753775, PUBMED:.
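    The alpha/beta rule above reduces to a check of a single residue. The Python sketch below uses a hypothetical helper, not a published tool; it assumes residue 13 is counted with 1-based numbering from the first residue of the mature protein (after signal-peptide removal), which the text does not spell out, and returns "unknown" for any residue other than Val or Lys.

```python
def elicitin_class(mature_seq):
    """Classify a mature elicitin sequence as 'alpha', 'beta' or 'unknown'.

    Assumption: 1-based numbering from the first residue of the mature
    protein, so position 13 is index 12 in Python.
    """
    if len(mature_seq) < 13:
        raise ValueError("sequence is shorter than 13 residues")
    residue = mature_seq[12].upper()
    if residue == "V":
        return "alpha"  # valine at position 13: acidic class
    if residue == "K":
        return "beta"   # lysine at position 13: basic, more necrotic class
    return "unknown"    # any other residue is not covered by the rule
```

    For example, `elicitin_class("A" * 12 + "K")` returns `"beta"`; the poly-Ala input is a synthetic placeholder, not a real elicitin sequence.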

    \ 2392 IPR001027 \ Human immunodeficiency virus (HIV) and equine infectious anemia virus (EIAV) are closely related lentiviruses that infect immune cells, but their pathogenesis differs. The coat polyprotein of EIAV contains gp90 and gp45 PUBMED:2841805.\

    A fluorescence polarization-based diagnostic assay for EIAV has been developed, providing the basis for an improved commercial diagnostic assay for EIAV infection of horses. The most sensitive and specific probe was a peptide corresponding to the immunodominant region of the EIAV transmembrane protein, gp45 PUBMED:10790112.

    \ 1262 IPR006034 \ Asparaginase, which is found in various plant, animal and bacterial cells, catalyses the deamidation of asparagine to yield aspartic acid and an ammonium ion, resulting in a depletion of free circulatory asparagine in plasma PUBMED:3026924. The enzyme is effective in the treatment of human malignant lymphomas, which have a diminished capacity to produce asparagine synthetase: in order to survive, such cells absorb asparagine from blood plasma PUBMED:2407723, PUBMED:3379033. If Asn levels have been depleted by injection of asparaginase, the lymphoma cells die. Glutaminase, a similar enzyme, catalyses the deamidation of glutamine to glutamic acid and an ammonium ion PUBMED:2407723. Both enzymes are homotetramers PUBMED:3026924, and two threonine residues in the N-terminal half of the proteins are involved in catalytic activity.\ 6058 IPR010415 \

    This is a family of uncharacterised bacterial proteins.

    \ 5260 IPR008412 \ Bone sialoprotein (BSP) is a major structural protein of the bone matrix that is specifically expressed by fully differentiated osteoblasts PUBMED:8061918. The expression of BSP is normally restricted to mineralised connective tissues of bones and teeth, where it has been associated with mineral crystal formation. However, ectopic expression of BSP occurs in various lesions, including oral and extraoral carcinomas, in which it has been associated with the formation of microcrystalline deposits and the metastasis of cancer cells to bone PUBMED:10785518.\ 2900 IPR002896 \ Herpesviruses are dsDNA viruses with no RNA stage. This family consists of glycoprotein D (gD or gIV), which is common to herpes simplex virus types 1 and 2, as well as equine herpesvirus, bovine herpesvirus and Marek's disease virus. Glycoprotein D has been found on the viral envelope and on the plasma membrane of infected cells. gD immunisation can produce an immune response to bovine herpesvirus 1 (BHV-1). This response is stronger than that elicited by the other major BHV-1 glycoproteins, gB (gI) and gC (gIII) PUBMED:7530392.\ 6342 IPR010527 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    In PSII, the oxygen-evolving complex (OEC) is responsible for catalysing the splitting of water to O2 and 4H+. The OEC is composed of a cluster of manganese, calcium and chloride ions bound to extrinsic proteins. In cyanobacteria there are five extrinsic proteins in the OEC (PsbO, PsbP-like, PsbQ-like, PsbU and PsbV), while in plants there are only three (PsbO, PsbP and PsbQ), PsbU and PsbV having been lost during the evolution of green plants PUBMED:15258264.
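    The water-splitting reaction quoted above balances only once the electrons withdrawn from water are included. Written as a half-reaction, two water molecules are oxidised per O2 released, and four electrons pass into the PSII electron-transport chain:

```latex
2\,\mathrm{H_2O} \longrightarrow \mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^-
```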

    \

    This family represents the PSII extrinsic protein PsbU, which forms part of the OEC in cyanobacteria and red algae. PsbU acts to stabilise the oxygen-evolving machinery of PSII against heat-induced inactivation, which is crucial for cellular thermo-tolerance PUBMED:10318707.

    \ \ 3691 IPR002073 \ Retinal 3',5'-cGMP phosphodiesterase (PDE) is located in photoreceptor outer segments PUBMED:: it is light-activated and plays a pivotal role in signal transduction. In rod cells, PDE is oligomeric, comprising an alpha, a beta and two gamma subunits, while in cones, PDE is a homodimer of alpha chains, which are associated with several smaller subunits. Both rod and cone PDEs catalyse the hydrolysis of cAMP or cGMP to the corresponding nucleoside 5'-monophosphates, and both also bind cGMP with high affinity. The cGMP-binding sites are located in the N-terminal half of the protein sequence, while the catalytic core resides in the C-terminal portion.\ 4455 IPR000330 \

    This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI) as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1) PUBMED:7651832, PUBMED:14729263. SNF2 functions as the ATPase component of the SNF2/SWI multisubunit complex, which utilises energy derived from ATP hydrolysis to disrupt histone-DNA interactions, resulting in the increased accessibility of DNA to transcription factors.

    \

    Proteins that contain this domain appear to be distantly related to the DEAX box helicases; however, no helicase activity has ever been demonstrated for these proteins.

    \ 5708 IPR008618 \ This family consists of several Fijivirus 64 kDa capsid proteins.\ 7668 IPR012886 \

    The formiminotransferase (FT) domain of formiminotransferase-cyclodeaminase (FTCD) forms a homodimer, and each protomer comprises two subdomains. The N-terminal subdomain is made up of a six-stranded mixed beta-pleated sheet and five alpha helices, which are arranged on the external surface of the beta sheet. This, in turn, faces the beta sheet of the C-terminal subdomain to form a double beta-sheet layer. The two subdomains are separated by a short linker sequence, which is not thought to be any more flexible than the remainder of the molecule. The substrate is predicted to form a number of contacts with residues found in both the N-terminal and C-terminal subdomains PUBMED:10673422.

    \ 3501 IPR011541 \

    High affinity nickel transporters are involved in the incorporation of nickel into H2-uptake hydrogenase PUBMED:7934894, PUBMED:7651142 and urease PUBMED:8197192 enzymes and are essential for the expression of catalytically active hydrogenase and urease. Ion uptake is dependent on proton motive force. HoxN in Alcaligenes eutrophus is thought to be an integral membrane protein with seven transmembrane helices PUBMED:8288539. The family also includes a cobalt transporter.

    \ 5674 IPR008445 \ This family consists of several Chordopoxvirus A15 like sequences.\ 986 IPR008197 \

    A group of proteins containing 8 characteristically spaced cysteine residues, which are involved in disulphide bond formation, have been termed '4-disulphide core' proteins PUBMED:6896234. While the pattern of conserved cysteines suggests that the sequences may adopt a similar fold, the overall degree of sequence similarity is low (e.g. a few Pro and Gly residues are reasonably well conserved, as is the polar/acidic nature of residues between the third and fourth Cys, but otherwise there is little sequence conservation). The group of sequences that share this pattern includes whey acidic protein (WAP) PUBMED:6896234; WDNM1 protein, which is involved in the metastatic potential of adenocarcinomas in rats PUBMED:3136918; Kallmann syndrome protein PUBMED:1913827; caltrin-like protein II from guinea pig, which inhibits calcium transport into spermatozoa PUBMED:2324101; and elafin, a serine elastase inhibitor that belongs to MEROPS inhibitor family I17. Elafin has no activity against plasmin, trypsin, alpha-chymotrypsin, or cathepsin G PUBMED:2394696.\

    \ 3452 IPR001614 \

    The myelin sheath is a multi-layered membrane, unique to the nervous system, that functions as an insulator to greatly increase the velocity of axonal impulse conduction PUBMED:2435734. Myelin proteolipid protein (PLP or lipophilin) PUBMED:1711121 is the major myelin protein of the central nervous system (CNS). It probably plays an important role in the formation or maintenance of the multilamellar structure of myelin. In humans, point mutations in PLP cause Pelizaeus-Merzbacher disease (PMD), a neurologic disorder of myelin metabolism. In animals, dysmyelinating diseases such as mouse 'jimpy' (jp), rat md, or dog 'shaking pup' are also caused by mutations in PLP.

    \

    PLP is a highly conserved PUBMED:1722981 hydrophobic protein of 276 to 280 amino acids that appears to contain four transmembrane segments and two disulphide bonds, and that covalently binds lipids (at least six palmitate groups in mammals) PUBMED:1281423.

    \

    PLP is closely related to M6, a neuronal membrane glycoprotein PUBMED:8398137.

    \ 5669 IPR008402 \ The anaphase-promoting complex (APC) is a conserved multi-subunit ubiquitin ligase required for the degradation of key cell cycle regulators. Members of this family are components of the anaphase-promoting complex homologous to Apc15p PUBMED:12477395.\ 7322 IPR011110 \

    A large group of two-component regulator proteins appear to have the same N-terminal structure of 14 tandem repeats. These repeats show homology to members of known beta-propeller families, indicating that they are likely to form a beta-propeller. This family has been built with artificially high cut-offs in order to avoid overlaps with other beta-propeller families. The fourteen repeats are likely to form two propellers; it is not clear whether these structures recruit other proteins or interact with DNA.

    \ 4218 IPR000915 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes mammalian ribosomal protein L6 (previously known as TAX-responsive enhancer element binding protein 107); Caenorhabditis elegans ribosomal protein L6 (R151.3); Saccharomyces cerevisiae ribosomal protein YL16A/YL16B; and Mesembryanthemum crystallinum ribosomal protein YL16-like. These proteins range in length from 175 (yeast) to 287 (mammalian) amino acids.

    \ 442 IPR001173 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases that use nucleotide diphospho-sugars, nucleotide monophospho-sugars and sugar phosphates, and of related proteins, into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.
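    Schematically, for a nucleotide diphospho-sugar donor (one of the donor classes named above), the transfer can be written as below. Here "R-OH" is a placeholder for a hydroxyl-bearing acceptor; this assumption covers the common O-glycosylation case but not every acceptor type, and monophospho-sugar donors release an NMP rather than an NDP.

```latex
\text{NDP-sugar} + \text{R-OH} \xrightarrow{\text{glycosyltransferase}} \text{R-}O\text{-sugar} + \text{NDP}
```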

    \ \ This domain is found in a diverse family of glycosyl transferases that transfer the sugar from UDP-glucose, UDP-N-acetyl-galactosamine, GDP-mannose or CDP-abequose to a range of substrates including cellulose, dolichol phosphate and teichoic acids.\ \ 3869 IPR007813 \

    PilN is a plasmid-encoded lipoprotein that localises to the outer membrane of bacteria and is part of a thin pilus required only for liquid mating PUBMED:10686134.

    \ 1342 IPR007663 \ Baculoviruses are distinct from other virus families in that there are two viral phenotypes: budded virus (BV) and occlusion-derived virus (ODV). BVs disseminate viral infection throughout the tissues of the host and ODVs transmit baculovirus between insect hosts. GFP tagging experiments implicate p74 as an ODV envelope protein PUBMED:2688302, PUBMED:11514740.\ 6217 IPR009434 \

    This family consists of several mammalian neuroendocrine-specific Golgi protein P55 (NESP55) sequences. NESP55 is a novel member of the chromogranin family and is a soluble, acidic, heat-stable secretory protein that is expressed exclusively in endocrine and nervous tissues, although less widely than chromogranins PUBMED:12438142.

    \ 413 IPR004104 \

    Enzymes containing this domain utilise NADP or NAD, and are known as the GFO/IDH/MOCA family in Swiss-Prot. GFO is a glucose-fructose oxidoreductase, which converts D-glucose and D-fructose into D-gluconolactone and D-glucitol in the sorbitol-gluconate pathway. MOCA is a rhizopine catabolism protein which may catalyze the NADH-dependent dehydrogenase reaction involved in rhizopine catabolism. Other proteins belonging to this family include Gal80, a negative regulator for the expression of lactose and galactose metabolic genes, and several hypothetical proteins from yeast, Escherichia coli and Bacillus subtilis.

    \

    The oxidoreductase, C-terminal domain is almost always associated with the oxidoreductase, N-terminal domain (see ).

    \ 2590 IPR000562 \ Fibronectin is a multi-domain glycoprotein, found in a soluble form in plasma and in an insoluble form in loose connective tissue and basement membranes, that binds cell surfaces and various compounds including collagen, fibrin, heparin, DNA, and actin. Fibronectins are involved in a number of important functions, e.g., wound healing; cell adhesion; blood coagulation; cell differentiation and migration; maintenance of the cellular cytoskeleton; and tumour metastasis PUBMED:3031656. The major part of the sequence of fibronectin consists of the repetition of three types of domains, which are called type I, II, and III PUBMED:3780752. The type II domain is approximately forty residues long, contains four conserved cysteines involved in disulphide bonds and is part of the collagen-binding region of fibronectin. In fibronectin the type II domain is duplicated. Type II domains have also been found in a range of proteins including blood coagulation factor XII; bovine seminal plasma proteins PDC-109 (BSP-A1/A2) and BSP-A3 PUBMED:3606570; cation-independent mannose-6-phosphate receptor PUBMED:1323236; mannose receptor of macrophages PUBMED:2373685; 180 kDa secretory phospholipase A2 receptor PUBMED:8294398; DEC-205 receptor PUBMED:7753172; 72 kDa and 92 kDa type IV collagenases PUBMED:2834383; and hepatocyte growth factor activator PUBMED:7683665.\ 1157 IPR004236 \

    The alpha-lytic protease prodomain is associated with serine peptidases, specifically the alpha-lytic endopeptidases, which are bacterial enzymes belonging to MEROPS peptidase subfamily S1E. Synthesis as a precursor carrying a pro region may be a general property of the extracellular proteases of Gram-negative bacteria PUBMED:3234766. The alpha-lytic protease is encoded with a large (166 amino acid) N-terminal pro region that is required transiently, both in vivo and in vitro, for the correct folding of the protease domain PUBMED:2507926, PUBMED:1552947. The pro region also acts as a potent inhibitor of the mature enzyme PUBMED:1579568.

    \ 1910 IPR003794 \

    This entry describes proteins of unknown function.

    \ 1635 IPR005603 \

    This non-structural protein does not appear to be essential for viral growth in tissue culture and its physiological role is unknown.

    \ 3669 IPR001290 \ Poly(ADP-ribose) polymerase (PARP) modifies various nuclear proteins by poly(ADP-ribosyl)ation. The modification is dependent on DNA and is involved in the regulation of various important cellular processes such as differentiation, proliferation and tumor transformation, and also in the regulation of the molecular events involved in the recovery of the cell from DNA damage PUBMED:8390463, PUBMED:8755499. Poly(ADP-ribose) polymerase catalyses the covalent attachment of ADP-ribose units from NAD+ to itself and to a limited number of other DNA-binding proteins, which decreases their affinity for DNA. The C-terminal catalytic domain of the polymerase is almost always associated with the N-terminal regulatory domain (see ).\ 6076 IPR009361 \

    Zw10 and Rough deal proteins are both required for correct metaphase checkpoint function during mitosis PUBMED:11146659, PUBMED:11146660. These proteins bind to the centromere/kinetochore PUBMED:11146660.

    \ 499 IPR006847 \ This region is found in the N-terminal half of translation initiation factor IF-2. It is found in two copies in IF-2 alpha isoforms, and in only one copy in the N-terminally truncated beta and gamma isoforms PUBMED:1764105. Its function is unknown.\ 3276 IPR001538 \

    Mannose-6-phosphate isomerase or phosphomannose isomerase (PMI) is the enzyme that catalyzes the interconversion of mannose-6-phosphate and fructose-6-phosphate. In eukaryotes, PMI is involved in the synthesis of GDP-mannose, a constituent of N- and O-linked glycans and GPI anchors, and in prokaryotes it participates in a variety of pathways, including capsular polysaccharide biosynthesis and D-mannose metabolism. PMIs belong to the cupin superfamily, whose functions range from isomerase and epimerase activities involved in the modification of cell wall carbohydrates in bacteria and plants, to non-enzymatic storage proteins in plant seeds, and transcription factors linked to congenital baldness in mammals PUBMED:11165500. Three classes of PMI have been defined PUBMED:8307007.

    \ The type II phosphomannose isomerases are bifunctional enzymes. This entry covers the isomerase region of the protein PUBMED:9507048. The guanosine diphospho-D-mannose pyrophosphorylase region is described in another InterPro entry (see ).\ 686 IPR001930 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
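    To illustrate the difference between the loose HEXXH motif and the stricter abXHEbbHbc definition, the sketch below scans a protein sequence with Python regular expressions. The residue sets standing in for "uncharged" and "hydrophobic" are illustrative assumptions, as is excluding proline from every non-His/Glu position; the function name `find_zinc_motifs` is hypothetical, and this is not a substitute for a curated profile.

```python
import re

# Illustrative residue classes (assumptions, not a curated definition):
# the text only says 'a' is most often Val or Thr, 'b' is uncharged,
# 'c' is hydrophobic, and that Pro is never found in the site.
A_RES = "VT"
UNCHARGED = "ACFGILMNQSTVWY"   # excludes D, E, H, K, R and P
HYDROPHOBIC = "ACFILMVW"       # excludes P
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"

# Zero-width lookaheads let overlapping occurrences be reported.
HEXXH = re.compile(rf"(?=(HE[{ANY_BUT_PRO}]{{2}}H))")
STRICT = re.compile(
    rf"(?=([{A_RES}][{UNCHARGED}][{ANY_BUT_PRO}]"
    rf"HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_zinc_motifs(seq):
    """Return 0-based start positions of loose (HEXXH) and strict (abXHEbbHbc) hits."""
    seq = seq.upper()
    loose = [m.start() for m in HEXXH.finditer(seq)]
    strict = [m.start() for m in STRICT.finditer(seq)]
    return loose, strict
```

    On the synthetic sequence `"GGVVAHELTHAVGG"`, `find_zinc_motifs` returns `([5], [2])`: the HEXXH core starts at the His at index 5, and the full abXHEbbHbc window starts two residues further upstream, at the 'a' position. Placing a proline inside the core (as in `"AHEPLHA"`) removes both hits.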

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
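    The naming convention above is effectively a lookup table keyed on the first character of a family or clan identifier. The Python sketch below encodes only what the paragraph states: the letter codes, type P for mixed clans, and the split between residue and water nucleophiles. `describe_identifier` is a hypothetical helper, not part of any MEROPS interface.

```python
# Single-letter catalytic-type codes, as listed in the paragraph above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clans containing more than one type)",
}

# Nucleophile grouping described in the same paragraph.
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}  # acyl intermediate; can act as transferases
WATER_NUCLEOPHILE = {"A", "G", "M"}    # activated water molecule


def describe_identifier(identifier):
    """Map an identifier such as 'M1', 'S1E' or 'MA' to (type, nucleophile)."""
    code = identifier.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}")
    if code in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not specified"
    return CATALYTIC_TYPES[code], nucleophile
```

    For example, `describe_identifier("M1")` returns `("metallo", "activated water molecule")`, while `describe_identifier("S1E")` reports a serine type with an amino acid nucleophile.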

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M1 (clan MA(E)), the type example being aminopeptidase N from Homo sapiens. The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA.

    \ \ \ Membrane alanine aminopeptidase is part of the HEXXH+E group; the family consists entirely of aminopeptidases, spread across a wide variety of species PUBMED:7674922. Functional studies show that CD13/APN catalyzes the removal of single amino acids from the amino terminus of small peptides and probably plays a role in their final digestion; one family member (leukotriene-A4 hydrolase) is known to hydrolyse the epoxide leukotriene-A4 to form an inflammatory mediator PUBMED:7674922. This hydrolase has been shown to have aminopeptidase activity PUBMED:2244921, and the zinc ligands of the M1 family were identified by site-directed mutagenesis on this enzyme PUBMED:7674922. CD13 participates in trimming peptides bound to MHC class II molecules PUBMED:8691132 and cleaves the chemokine MIP-1, altering its target cell specificity from basophils to eosinophils PUBMED:8627182. CD13 acts as a receptor for specific strains of RNA viruses (coronaviruses), which cause a relatively large percentage of upper respiratory tract infections.

    \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ 6003 IPR010386 \

    This family consists of several bacterial tRNA-(MSIO[6]A)-hydroxylase (MiaE) proteins. The modified nucleoside 2-methylthio-N-6-isopentenyl adenosine (ms2i6A) is present at position 37 (3' of the anticodon) of tRNAs that read codons beginning with U, except tRNA(I,V Ser), in Escherichia coli. In Salmonella typhimurium, 2-methylthio-cis-ribozeatin (ms2io6A) is found in tRNA, probably in the species corresponding to those that carry ms2i6A in E. coli. The miaE gene is absent from E. coli, a finding consistent with the absence of the hydroxylated derivative of ms2i6A in this species PUBMED:8253666.

    \ 1238 IPR001669 \

    The arginine dihydrolase (AD) pathway is found in many prokaryotes and some primitive eukaryotes, an example of the latter being Giardia lamblia PUBMED:9504342. The three-enzyme anaerobic pathway breaks down L-arginine to form 1 mol of ATP, carbon dioxide and ammonia. In simpler bacteria, the first enzyme, arginine deiminase, can account for up to 10% of total cell protein PUBMED:9504342.
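    The one-ATP yield follows from the stoichiometry of the three steps. The paragraph above names only arginine deiminase; the other two enzymes shown here (ornithine carbamoyltransferase and carbamate kinase) are the ones conventionally assigned to this pathway. Protonation states are omitted, and the carbamate released by carbamate kinase is shown already decomposed to CO2 and NH3:

```latex
\begin{aligned}
\text{L-arginine} + \mathrm{H_2O} &\xrightarrow{\text{arginine deiminase}} \text{L-citrulline} + \mathrm{NH_3} \\
\text{L-citrulline} + \mathrm{P_i} &\xrightarrow{\text{ornithine carbamoyltransferase}} \text{L-ornithine} + \text{carbamoyl phosphate} \\
\text{carbamoyl phosphate} + \mathrm{ADP} &\xrightarrow{\text{carbamate kinase}} \mathrm{ATP} + \mathrm{CO_2} + \mathrm{NH_3} \\[4pt]
\text{net:}\quad \text{L-arginine} + \mathrm{H_2O} + \mathrm{P_i} + \mathrm{ADP} &\longrightarrow \text{L-ornithine} + \mathrm{CO_2} + 2\,\mathrm{NH_3} + \mathrm{ATP}
\end{aligned}
```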

    \ \

    Most prokaryotic arginine deiminase pathways are under the control of a repressor gene, termed ArgR PUBMED:1583685. This is a negative regulator that releases the arginine deiminase operon for expression only in the presence of arginine PUBMED:9851988. The crystal structure of apo-ArgR from Bacillus stearothermophilus has been determined to 2.5 Å resolution by X-ray crystallography PUBMED:10331868. The protein exists as a hexamer of identical subunits, and is shown to have six DNA-binding domains, clustered around a central oligomeric core when bound to arginine. It interacts predominantly with A·T base pairs in ARG boxes. This hexameric protein binds DNA via its N terminus to repress arginine biosynthesis or activate arginine catabolism. Some species have several ArgR paralogs. In a neighbor-joining tree, some of these paralogous sequences show long branches and differ significantly from the well-conserved C-terminal region.

    \ 1254 IPR002640 \

    \ The serum paraoxonases/arylesterases are enzymes that catalyse the hydrolysis of the toxic metabolites of a variety of organophosphorus insecticides. The enzymes hydrolyse a broad spectrum of organophosphate substrates, including paraoxon and a number of aromatic carboxylic acid esters (e.g., phenyl acetate), and hence confer resistance to organophosphate toxicity PUBMED:8661009. \

    \

    \ Mammals have three distinct paraoxonase types, termed PON1-3 PUBMED:8661009, PUBMED:11038162. In mice and humans, the PON genes are found close together on the same chromosome. PON activity has been found in a variety of tissues, with highest levels in liver and serum; the source of serum PON is thought to be the liver. Unlike mammals, fish and avian species lack paraoxonase activity. \

    \

    \ Human and rabbit PONs appear to have two distinct Ca2+ binding sites, one required for stability and one required for catalytic activity. The Ca2+ dependency of PONs suggests a mechanism of hydrolysis in which Ca2+ acts as the electrophilic catalyst, as proposed for phospholipase A2. The paraoxonase enzymes PON1 and PON3 are high density lipoprotein (HDL)-associated proteins capable of preventing oxidative modification of low density lipoproteins (LDL) PUBMED:11038162. Although PON2 has antioxidant properties, the enzyme does not associate with HDL.\

    \

    \ Within a given species, PON1, PON2 and PON3 share ~60% amino acid sequence identity, whereas between mammalian species, the same PON type (PON1, PON2 or PON3) shares 79-90% identity at the amino acid level. Human PON1 and PON3 share numerous conserved phosphorylation and N-glycosylation sites; however, it is not known whether the PON proteins are modified at these sites, or whether modification at these sites is required for activity in vivo PUBMED:11038162. \
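    Identity figures such as the ~60% and 79-90% quoted above are computed from a pairwise alignment. The Python sketch below shows one common way to do this, assuming the two sequences are already aligned with '-' as the gap character; dividing by the number of ungapped columns is an assumed convention (tools also use the full alignment length or the shorter sequence), and `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Identity is taken over columns where neither sequence has a gap; this is
    one of several conventions, so figures from different tools may differ.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are not scored
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared
```

    For example, `percent_identity("MKTAY", "MKSAY")` returns `80.0`; these are toy sequences, not PON fragments.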

    \ \ This family consists of arylesterases (also known as serum paraoxonases). These enzymes hydrolyse organophosphorus esters such as paraoxon and are found in the liver and blood. They confer resistance to organophosphate toxicity PUBMED:9032442. Human arylesterase (PON1) is associated with HDL and may protect against LDL oxidation PUBMED:8661009.\ 709 IPR002591 \ This family consists of phosphodiesterases, including human plasma-cell membrane glycoprotein PC-1 / alkaline phosphodiesterase I / nucleotide pyrophosphatase (NPPase). These enzymes catalyse the cleavage of phosphodiester and phosphosulphate bonds in NAD, deoxynucleotides and nucleotide sugars PUBMED:9344668. Another member of this family is ATX (autotaxin), a tumor cell motility-stimulating protein that exhibits type I phosphodiesterase activity PUBMED:7982964. The alignment encompasses the active site PUBMED:7730366, PUBMED:7982964. Also present within this family is the 60 kDa Ca2+-ATPase from Myroides odoratus PUBMED:8617788.\ 1182 IPR006799 \ Anti-Mullerian hormone (AMH) is a signalling molecule involved in male and female sexual differentiation PUBMED:1782869. Defects in the synthesis or action of AMH cause persistent Mullerian duct syndrome (PMDS), a rare form of male pseudohermaphroditism PUBMED:8162013. This family represents the N-terminal part of the protein, which is not thought to be essential for activity PUBMED:8162013. AMH contains a TGF-beta domain at its C terminus.\ 4044 IPR000762 \ Several extracellular heparin-binding proteins involved in the regulation of growth and differentiation belong to a new family of growth factors. 
These growth factors are highly related proteins of about 140 amino acids that contain 10 conserved cysteines, probably involved in disulphide bonds. They include pleiotrophin PUBMED:15121180 (also known as heparin-binding growth-associated molecule HB-GAM, heparin-binding growth factor 8 HBGF-8, heparin-binding neurotrophic factor HBNF and osteoblast-specific protein OSF-1); midkine (MK) PUBMED:15047154; retinoic acid-induced heparin-binding protein (RIHB) PUBMED:7796887; and pleiotrophic factors alpha-1 and -2 and beta-1 and -2 from Xenopus laevis, the homologues of midkine and pleiotrophin, respectively. Pleiotrophin is a heparin-binding protein with neurotrophic activity and mitogenic activity towards fibroblasts. It is highly expressed in brain and uterus tissues, but is also found in gut, muscle and skin. It is thought to possess an important brain-specific function. Midkine is a regulator of differentiation whose expression is regulated by retinoic acid, and, like pleiotrophin, is a heparin-binding growth/differentiation factor that acts on fibroblasts and nerve cells.\ 1000 IPR000242 \

    Tyrosine-specific protein phosphatases (PTPases)\ PUBMED:1650499, PUBMED:1335746, PUBMED:1836211, PUBMED:2560275, PUBMED:2550140 are\ enzymes that catalyse the removal of a phosphate group attached to a tyrosine\ residue.

    \ \ \

    These enzymes are very important in the control of cell growth,\ proliferation, differentiation and transformation. Multiple forms of PTPase\ have been characterised and can be classified into two categories: soluble\ PTPases and transmembrane receptor proteins that contain PTPase domain(s).

    \ \

    Structurally, all known receptor PTPases are made up of a variable-length\ extracellular domain, followed by a transmembrane region and a C-terminal\ catalytic cytoplasmic domain. Some of the receptor PTPases contain fibronectin\ type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or\ carbonic anhydrase-like domains in their extracellular region. The cytoplasmic\ region generally contains two copies of the PTPase domain. The first seems to\ have enzymatic activity, while the second is inactive. The inactive domains of tandem phosphatases can be divided into two classes. Those that bind phosphorylated tyrosine residues are more conserved and may recruit multi-phosphorylated substrates for the adjacent active domains, while those of the other class have accumulated several variable amino acid substitutions and have completely lost the ability to bind tyrosine. The second class shows a release of evolutionary constraint for the sites around the catalytic centre, which emphasises a difference in function from the first group. A region of higher conservation common to both classes suggests a new regulatory centre PUBMED:14739250.

    \ \

    PTPase domains consist of about 300 amino acids. There are two conserved\ cysteines; the second has been shown to be absolutely required for\ activity. Furthermore, a number of conserved residues in its immediate\ vicinity have also been shown to be important.

    \ \ 5781 IPR010276 \

    This family consists of allatostatins, bombystatins, helicostatins, cydiastatins and schistostatin from several insect species. Allatostatins (ASTs) of the Tyr/Phe-Xaa-Phe-Gly-Leu/Ile-NH2 family are a group of insect neuropeptides that inhibit juvenile hormone biosynthesis by the corpora allata PUBMED:10098619.

    \ 3267 IPR003697 \

    Maf is a putative inhibitor of septum formation in eukaryotes, bacteria, and archaea.\ The Maf protein shares substantial\ amino acid sequence identity with the Escherichia coli OrfE protein PUBMED:8387996.

    \ \ 2976 IPR006898 \ This domain consists of an alternative C terminus of homeobox-containing transcription factor HNF-1, found in the HNF-1A isoform. Different isoforms of HNF-1 are generated by the differential use of polyadenylation sites and by alternative splicing. The C-terminal region of HNF-1 is responsible for the activation of transcription, and HNF-1A, which has this C-terminal extension, transactivates less well than the B and C isoforms PUBMED:7900999. Mutations and polymorphisms in HNF-1 cause the type 3 form of maturity-onset diabetes of the young (MODY3) PUBMED:9133564.\ 7948 IPR012634 \

    This family consists of PhTx insecticidal neurotoxins found in the venom of the Brazilian spider Phoneutria nigriventer. The venom of P. nigriventer contains numerous neurotoxic polypeptides of 30-140 amino acids, which exert a range of biological effects. While some of these neurotoxins are lethal to mice after intracerebroventricular injection, others are extremely toxic to insects of the orders Diptera and Dictyoptera but have much weaker toxic effects on mice PUBMED:10978749.

    \ 6389 IPR010552 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 3422 IPR004316 \ This family includes proteins such as Drosophila saliva PUBMED:9739134, MtN3 involved in root nodule development PUBMED:8634476 and a protein\ involved in activation and expression of recombination activation genes (RAGs) PUBMED:8630032. Although the molecular function of\ these proteins is unknown, they are almost certainly transmembrane proteins. This family contains a region of two\ transmembrane helices that is found in two copies in most members of the family.\ 6550 IPR010610 \

    This entry represents a conserved region of unknown function within bacterial glycosyl transferases. Many proteins containing this domain are members of glycosyl transferase family 28.

    \ 5449 IPR008868 \ This family consists of several bacterial TniB NTP-binding proteins. TniB is a probable ATP-binding protein PUBMED:8195081 which is involved in Tn5053 mercury resistance transposition PUBMED:8594337.\ 1809 IPR004868 \ Like DNA-directed DNA polymerase family B, members of this family are also DNA polymerase type B proteins. Those included here are found in plant and fungal mitochondria, and in viruses. \ \ 3953 IPR004969 \

    Proteins in this group show homology to the late protein encoded by the vaccinia virus I1L gene.

    \ 1130 IPR000850 \ Adenylate kinases (ADK) are phosphotransferases that catalyse the reversible reaction \ ATP + AMP <=> 2 ADP, \ an essential reaction for many processes in living cells. Two ADK isozymes \ have been identified in mammalian cells. These specifically bind AMP and favour binding to ATP over \ other nucleoside triphosphates (AK1 is cytosolic and AK2 is located in the mitochondria). A third ADK \ has been identified in bovine heart and human cells PUBMED:6088234; this is a mitochondrial GTP:AMP \ phosphotransferase that is also specific for the phosphorylation of AMP but can only use GTP or ITP as a\ substrate PUBMED:218813. ADK has also been identified in different bacterial species and in yeast \ PUBMED:1587477. Two further enzymes are known to be related to the ADK family, namely yeast uridine \ monophosphokinase and slime mold UMP-CMP kinase. Within the ADK family there are several conserved \ regions, including the ATP-binding domains. One of the most conserved areas includes an Arg residue, \ whose modification inactivates the enzyme, together with an Asp that resides in the catalytic cleft \ of the enzyme and participates in a salt bridge.\ 6440 IPR009526 \

    Members of this protein family are small, typically about 80 residues in length, and are highly hydrophobic. The gene is found so far only in a subset of the firmicutes in association with genes of the ATP synthase F1 complex or NADH-quinone oxidoreductase. This family includes YwzB from Bacillus subtilis.

    \ 5413 IPR008737 \

    The proteins in this family are of unknown function and are found exclusively in nematodes PUBMED:7525414.

    \ 2137 IPR007418 \ This is a family of uncharacterised archaeal/bacterial proteins.\ 1700 IPR003317 \ These proteins are cytochrome bd-type terminal oxidases that catalyse quinol-dependent, Na+-independent oxygen uptake PUBMED:8626304. Members of this family are integral membrane proteins and contain a protoheme IX centre, B558. \

    Cytochrome bd may play an important role in microaerobic nitrogen fixation in the enteric bacterium Klebsiella pneumoniae, where it is expressed under all conditions that permit diazotrophy PUBMED:9274021.

    \ 4732 IPR000924 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet formation, flanked by\ alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases,\ the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in\ class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I.\ The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
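The class assignments listed above amount to a simple lookup table. The Python sketch below encodes exactly those two lists together with the general 2'/3'-hydroxyl rule stated in the text. The names `CLASS_I`, `CLASS_II`, `synthetase_class` and `initial_acylation_site` are hypothetical. The hydroxyl rule is applied uniformly even though the text only says the 3'-hydroxyl is "preferred" in class II, so individual synthetases may deviate from it.

```python
# Hypothetical helper: class assignments copied from the two lists in the text.
CLASS_I = frozenset({"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"})
CLASS_II = frozenset({"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"})


def synthetase_class(aa: str) -> str:
    """Return 'I' or 'II' for a three-letter amino acid code."""
    if aa in CLASS_I:
        return "I"
    if aa in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid code: {aa!r}")


def initial_acylation_site(aa: str) -> str:
    """General rule only: class I -> 2'-OH, class II -> 3'-OH (exceptions not modelled)."""
    return "2'-OH" if synthetase_class(aa) == "I" else "3'-OH"


if __name__ == "__main__":
    for aa in ("Glu", "Gly", "Trp"):
        print(aa, synthetase_class(aa), initial_acylation_site(aa))
```

Using frozensets keeps the two classes immutable and makes it easy to check that they are disjoint and together cover 20 amino acids.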

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Glutamyl-tRNA synthetase is a class Ic synthetase and shows several similarities with glutaminyl-tRNA synthetase in structure and catalytic properties. It is an alpha2 dimer. To date one crystal structure of a glutamyl-tRNA synthetase (Thermus thermophilus) has been solved. The molecule has the form of a bent cylinder and consists of four domains. The N-terminal half (domains 1 and 2) contains the Rossmann fold typical of class I synthetases and resembles the corresponding part of E. coli GlnRS, whereas the C-terminal half exhibits a GluRS-specific structure PUBMED:9426192.\

    \ 7110 IPR010095 \

    This entry represents a region of sequence similarity between a family of putative transposases of Thermoanaerobacter tengcongensis, smaller related proteins from Bacillus anthracis, putative transposases described by , and other proteins.

    \ 6496 IPR010600 \

    This entry represents the C-terminal region of inter-alpha-trypsin inhibitor heavy chains. Inter-alpha-trypsin inhibitors are glycoproteins with a high inhibitory activity against trypsin, built up from different combinations of four polypeptides: bikunin and the three heavy chains that belong to this family (HC1, HC2, HC3). The heavy chains do not have any protease inhibitory properties but have the capacity to interact in vitro and in vivo with hyaluronic acid, which promotes the stability of the extracellular matrix. This domain is associated with the VWA domain.

    \ 949 IPR012680 \

    Laminins are large heterotrimeric glycoproteins involved in basement membrane function PUBMED:15037599. The laminin globular (G) domain can be found in one to several copies in various laminin family members, including a large number of extracellular proteins. The C-terminus of the laminin alpha chain contains a tandem repeat of five laminin G domains, which are critical for heparin-binding and cell attachment activity PUBMED:10747011. Laminin alpha4 is distributed in a variety of tissues including peripheral nerves, dorsal root ganglion, skeletal muscle and capillaries; in the neuromuscular junction, it is required for synaptic specialisation PUBMED:15823034. The structure of the laminin-G domain has been predicted to resemble that of pentraxin PUBMED:9480764.

    \ \

    Laminin G domains can vary in their function, and a variety of binding functions have been ascribed to different LamG modules. For example, the laminin alpha1 and alpha2 chains each have five C-terminal laminin G domains, of which only domains LG4 and LG5 contain binding sites for heparin, sulphatides and the cell surface receptor dystroglycan PUBMED:10747011. Laminin G-containing proteins appear to have a wide variety of roles in cell adhesion, signalling, migration, assembly and differentiation. This entry represents one subtype of laminin G domains, which is sometimes found in association with thrombospondin-type laminin G domains.

    \ 5484 IPR008523 \ This family consists of several bacterial proteins of unknown function.\ 3540 IPR005835 \

    This domain is found in a wide range of enzymes which transfer nucleotides onto phosphosugars.

    \ 6057 IPR010414 \

    The human FRG1 gene maps to human chromosome 4q35 and has been identified as a candidate for facioscapulohumeral muscular dystrophy. Currently, the function of FRG1 is unknown PUBMED:9714712.

    \ 5426 IPR008490 \ This family consists of several proteins from Sulfolobus solfataricus described as the first ORF in transposon ISC1212.\ 7502 IPR011699 \ These proteins share some similarity with members of the Major Facilitator Superfamily (MFS).\ 5544 IPR008382 \ This family consists of several mammalian protein kinase A anchoring protein 3 (PRKA3) or A-kinase anchor protein 110 kDa (AKAP 110) sequences. Agents that increase intracellular cAMP are potent stimulators of sperm motility. Anchoring inhibitor peptides, designed to disrupt the interaction of the cAMP-dependent protein kinase A (PKA) with A kinase-anchoring proteins (AKAPs), are potent inhibitors of sperm motility. PKA anchoring is a key biochemical mechanism controlling motility. AKAP110 shares compartments with both RI and RII isoforms of PKA and may function as a regulator of both motility- and head-associated functions such as capacitation and the acrosome reaction PUBMED:10319321.\ 6264 IPR009452 \

    This entry consists of several Pneumovirus matrix glycoprotein M2 sequences. Members of this family function as a transcription processivity factor that is essential for virus replication PUBMED:12692207.

    \ 7675 IPR012482 \

    The sequences in this family are derived from plant proteins and are similar to a polypeptide encoded by a Lemna gibba mRNA that was found to be more abundant in dark-treated plants than in plants grown under continuous white light (PUBMED:). The function of this polypeptide is unknown.

    \ 4166 IPR000911 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 \ is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the \ basis of sequence similarities PUBMED:2167467, PUBMED:, groups bacterial, plant chloroplast, red \ algal chloroplast, cyanelle and archaebacterial L11; and mammalian, plant and yeast L12 (YL15). L11 is \ a protein of 140 to 165 amino acid residues. In E. coli, the C-terminal half of L11 has been \ shown PUBMED:2483975 to be in an extended and loosely folded conformation and is likely to be buried \ within the ribosomal structure.

    \ 7887 IPR012638 \

    This family consists of the tryptophan (trp) leader peptides. Tryptophan accumulation is the principal event resulting in downregulation of transcription of the structural genes of the trp operon. The leader region of the trp operon transcript can form mutually exclusive secondary structures: one terminates transcription of the trp operon when tryptophan is plentiful, while the alternative allows transcription to continue when tryptophan is scarce PUBMED:15262409.

    \ 4280 IPR007073 \ RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 7, represents a mobile module of the RNA polymerase. Domain 7 interacts with the lobe domain of Rpb2 PUBMED:8910400, PUBMED:11313498.\ 3034 IPR005214 \

    This family represents the product of gene 3 of avian infectious bronchitis virus. Currently, the function of this protein remains unknown.

    \ 4713 IPR003201 \ Transposons are mobile DNA sequences capable of replication and insertion into the chromosome. Typically, a transposon carries a gene for the transposase enzyme, which catalyses insertion, flanked by terminal inverted repeats. Tn5 has a unique method of self-regulation in which a truncated version of the transposase enzyme acts as an inhibitor PUBMED:10207011.\ 1751 IPR002915 \ This family includes the enzyme deoxyribose-phosphate aldolase, which is involved in nucleotide metabolism. \ \ The family also includes a group of related bacterial proteins of unknown function, see examples and .\ 6203 IPR009424 \

    This family consists of several short hypothetical plant proteins of unknown function.

    \ 6428 IPR009520 \

    This family consists of several short, hypothetical phage and bacterial proteins. The function of this family is unknown.

    \ 4883 IPR001441 \

    Synonym(s): Di-trans-poly-cis-undecaprenyl-diphosphate synthase, Undecaprenyl pyrophosphate synthetase, Undecaprenyl pyrophosphate synthase, UPP synthetase

    \ \

    Di-trans-poly-cis-decaprenylcistransferase (UPP synthetase) \ generates undecaprenyl pyrophosphate (UPP) from isopentenyl pyrophosphate\ (IPP) PUBMED:9882662. This bacterial enzyme is also found in archaebacteria and in a number of uncharacterized proteins including some from yeasts.

    \ 7293 IPR010010 \

    This family consists of several plant, algal and cyanobacterial photosystem I protein M (PsaM) sequences. PsaM forms part of the photosystem I complex and its binding is stabilised by PsaI PUBMED:8787020.

    \ 2969 IPR004889 \ H2-forming N5,N10-methylenetetrahydromethanopterin dehydrogenase, also known as coenzyme F420-dependent N(5),N(10)-methenyltetrahydromethanopterin reductase, catalyses an intermediate step in methanogenesis from CO(2) and H(2) in methanogenic archaea: the conversion of N(5),N(10)-methylenetetrahydromethanopterin and reduced coenzyme F420 to 5-methyl-5,6,7,8-tetrahydromethanopterin and coenzyme F420.\ 6960 IPR010783 \

    This family consists of several bacterial HrpN harpin proteins. HrpN is a virulence determinant which elicits lesion formation in Arabidopsis and tobacco and triggers systemic resistance in Arabidopsis PUBMED:12650449.

    \ 1451 IPR001894 \

    The precursor sequences of a number of antimicrobial peptides secreted by neutrophils (polymorphonuclear leukocytes) upon activation have been found to be evolutionarily related and are collectively known as cathelicidins PUBMED:7589491.

    \

    Structurally, these proteins consist of three domains: a signal sequence, a conserved region of about 100 residues that contains four cysteines involved in two disulphide bonds, and a highly divergent C-terminal section of variable size. It is in this C-terminal section that the antibacterial peptides are found; they are proteolytically processed from their precursor by enzymes such as elastase. This structure is shown in the following schematic representation:

    \
    \
       +---+--------------------------------+--------------------+\
       |Sig| Propeptide     C  C  C  C      | Antibacterial pep. |\
       +---+----------------|--|--|--|------+--------------------+\
                            |  |  |  |\
                            +--+  +--+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    
    \ 5519 IPR008811 \ This family consists of several raffinose synthase proteins, also known as seed imbibition (Sip1) proteins. Raffinose (O-alpha-D-galactopyranosyl-(1-->6)-O-alpha-D-glucopyranosyl-(1-->2)-O-beta-D-fructofuranoside) is a widespread oligosaccharide in plant seeds and other tissues. Raffinose synthase is the key enzyme that channels sucrose into the raffinose oligosaccharide pathway PUBMED:12244450.\ 3271 IPR006958 \ The function of these proteins is unknown. The yeast orthologues have been implicated in cell cycle progression and biogenesis of 60S ribosomal subunits. Schistosoma mansoni Mak16 has been shown to direct protein transport to the nucleolus PUBMED:10838225.\ 7404 IPR011422 \

    These proteins include BRCA1-associated protein 2 (BRAP2), which binds nuclear localisation signals (NLSs) in vitro and in yeast two-hybrid screening PUBMED:9497340. These proteins share a region of sequence similarity at their N terminus. They also have at the C terminus.

    \ 7281 IPR010896 \

    This helix-turn-helix-containing DNA-binding domain is found in association with\ homing nucleases PUBMED:13678957.

    \ \ 3545 IPR004301 \ Nucleoplasmins are also known as chromatin decondensation proteins. They bind to core histones and transfer them to DNA in a reaction that requires ATP. This is thought to play a role in the assembly of regular\ nucleosomal arrays.\ 1379 IPR003784 \ The BioY protein is involved in the bioconversion of pimelate into dethiobiotin PUBMED:2110099, although the exact function of the protein is unknown.\ 7480 IPR011417 \

    AP180 is an endocytotic accessory protein that has been implicated in the formation of clathrin-coated pits. The domain is involved in phosphatidylinositol 4,5-bisphosphate binding and is a universal adaptor for nucleation of clathrin coats PUBMED:12740367, PUBMED:12742163.

    \ 5054 IPR007891 \

    CHASE3 is an extracellular sensory domain, which is present in various classes of\ transmembrane receptors that are upstream of signal transduction pathways in bacteria. Specifically,\ CHASE3 domains are found in histidine kinases, adenylate cyclases, methyl-accepting chemotaxis\ proteins and predicted diguanylate cyclases/phosphodiesterases. Environmental factors that are\ recognized by CHASE3 domains are not known at this time PUBMED:12486065.

    \ 2054 IPR007206 \

    This is a protein of unknown function. It is found C-terminal to another domain of unknown function, DUF383.

    \ 7811 IPR012944 \

    This domain occurs in several hypothetical proteins. It also occurs in RagB, a protein involved in signalling PUBMED:7499430, and SusD, an outer membrane protein involved in nutrient binding PUBMED:11717282.

    \ 5337 IPR008876 \ This family consists of several enterobacterial TraY proteins. TraY is involved in bacterial conjugation where it is required for efficient nick formation in the F plasmid PUBMED:12003924.\ 685 IPR005078 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
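The one-letter catalytic-type prefix described above can be decoded mechanically. The following Python sketch is a hypothetical helper, not part of MEROPS. It assumes a family identifier is a single type letter followed by a number (as in C54 or S55 elsewhere in this section), ignores any subfamily suffix, and maps the letter to its catalytic type and to the nucleophile class stated in the text. Type P is rejected because the text reserves it for mixed clans rather than families.

```python
import re

# Catalytic-type letters for families, as described in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Serine, threonine and cysteine peptidases use an amino acid as the
# nucleophile and form an acyl intermediate; aspartic, glutamic and
# metallopeptidases use an activated water molecule.
SIDE_CHAIN_NUCLEOPHILE = frozenset("STC")
WATER_NUCLEOPHILE = frozenset("AGM")

# Assumed identifier format: type letter + family number, e.g. "C54".
_FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)")


def describe_family(family_id: str) -> dict:
    """Split an identifier such as 'C54' into catalytic type, number and nucleophile."""
    match = _FAMILY_ID.fullmatch(family_id.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised family identifier: {family_id!r}")
    letter, number = match.group(1), int(match.group(2))
    if letter in SIDE_CHAIN_NUCLEOPHILE:
        nucleophile = "amino acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "unknown"
    return {
        "catalytic_type": CATALYTIC_TYPES[letter],
        "number": number,
        "nucleophile": nucleophile,
    }


if __name__ == "__main__":
    for fid in ("C54", "S55"):
        print(fid, describe_family(fid))
```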

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases constitutes MEROPS peptidase family C54 (Aut2 peptidase family, clan CA), whose members are proteins of unknown function.

    \ 5707 IPR008803 \ This family consists of several eukaryotic root hair defective 3 (RHD3)-like GTP-binding proteins. It has been speculated that the RHD3 protein is a member of a novel class of GTP-binding proteins that is widespread in eukaryotes and required for regulated cell enlargement PUBMED:9087433. The family also contains the homologous Saccharomyces cerevisiae synthetic enhancement of YOP1 (SEY1) protein, which is involved in membrane trafficking PUBMED:12427979.\ 5408 IPR008763 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities\ in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin\ and carboxypeptidase C clans have a catalytic triad of serine, aspartate and\ histidine in common: serine acts as a nucleophile, aspartate as an\ electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of\ the catalytic residues are similar between families, despite different\ protein folds PUBMED:7845208. The linear arrangements of the catalytic residues\ commonly reflect clan relationships. For example the catalytic triad in\ the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the\ subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
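The relationship between triad order and clan can be illustrated with a short Python sketch. It sorts the three catalytic residues by sequence position and looks the resulting order up in a table built from the three examples above. The function names are hypothetical, and the sample positions are the commonly cited numbering for chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221). Because the text says the linear order only commonly reflects clan relationships, the result is a hint rather than a classification.

```python
from __future__ import annotations

# Linear (N- to C-terminal) order of the catalytic triad, from the text.
TRIAD_ORDER_TO_CLAN = {
    "HDS": "SA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase clan)",
}


def triad_order(positions: dict[str, int]) -> str:
    """Return the catalytic residues ('S', 'D', 'H') sorted by sequence position."""
    if set(positions) != {"S", "D", "H"}:
        raise ValueError("expected positions for exactly S, D and H")
    if len(set(positions.values())) != 3:
        raise ValueError("catalytic residues must occupy distinct positions")
    return "".join(sorted(positions, key=positions.__getitem__))


def suggest_clan(positions: dict[str, int]) -> str:
    """Heuristic only: residue order commonly, not always, reflects clan membership."""
    return TRIAD_ORDER_TO_CLAN.get(triad_order(positions), "no match")


if __name__ == "__main__":
    chymotrypsin = {"H": 57, "D": 102, "S": 195}
    subtilisin = {"D": 32, "H": 64, "S": 221}
    for name, triad in (("chymotrypsin", chymotrypsin), ("subtilisin", subtilisin)):
        print(name, triad_order(triad), suggest_clan(triad))
```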

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S55 (SpoIVB peptidase family, clan PA(S)).

    The protein SpoIVB plays a key role in signaling in the final sigma-K checkpoint of Bacillus subtilis PUBMED:11418578, PUBMED:11741860.

    \ 4352 IPR007226 \ Toxoplasma gondii is a persistent protozoan parasite capable of infecting almost any warm-blooded vertebrate. The surface of T. gondii is coated with a family of developmentally regulated glycosylphosphatidylinositol (GPI)-linked proteins (SRSs), of which SAG1 is the prototypic member. SRS proteins mediate attachment to host cells and interface with the host immune response to regulate the virulence of the parasite. The structure of the immunodominant SAG1 antigen reveals a homodimeric configuration PUBMED:9418898. This family of surface antigens is also found in other apicomplexans.\ 1226 IPR003163 \ This DNA-binding domain is found in several yeast proteins involved in transcriptional regulation. Often these proteins also contain the ank domain. The resolved structure of this domain reveals a DNA-binding motif characteristic of the CAP family of helix-turn-helix transcription factors.\ 2206 IPR002740 \

    This family of prokaryotic proteins has no known function.

    \ 5409 IPR008762 \

    Bacterial chemotactic-signal transducers PUBMED:3052756 are proteins that respond to\ changes in the concentration of attractants and repellents in the environment,\ and transduce a signal from the outside to the inside of the cell. These\ proteins undergo two covalent modifications: deamidation and reversible\ methylation. Attractants increase the level of methylation while repellents\ decrease it. The methyl groups are added by the methyltransferase CheR and\ are removed by the methylesterase CheB.

    \ \

    All these proteins are composed of the same structural domains: an N-terminal\ region that resembles a signal peptide, but which is not removed from the\ mature protein and serves as a membrane-spanning region; a periplasmic\ domain of about 160 amino acids that forms the receptor domain; a second\ transmembrane region; and finally a C-terminal cytoplasmic domain of about 300\ amino acids that contains the methylation sites.

    \

    The methyl-accepting sites are specific glutamate residues (some of these\ sites are translated as glutamine but are irreversibly deamidated by CheB).\ They are clustered in two regions of the cytoplasmic domain PUBMED:2033064.

    \ 3448 IPR004268 \

    \ The sequencing of a number of pathogenic bacterial genomes has led to the discovery of novel\ virulence proteins that are yet to be biochemically \ characterised. One example is the MviN family of proteins, first \ described in Salmonella PUBMED:8200538, and conserved across a wide variety of \ pathogens in both animals and plants. Further work on these proteins of\ as yet unknown function has revealed that they are integral membrane proteins \ and are part of an operon essential in at least one species PUBMED:11274131.\

    \ \ \ 5031 IPR001138 \

    The N-terminal region of a number of fungal transcriptional regulatory\ proteins contains a Cys-rich motif that is involved in zinc-dependent\ binding of DNA. The region forms a binuclear Zn cluster, in which two Zn\ atoms are bound by six Cys residues \ PUBMED:2107541, PUBMED:1557122.\ A wide range of proteins are known to contain this domain. These include the\ proteins involved in arginine, proline, pyrimidine, quinate, maltose and galactose\ metabolism; amide and GABA catabolism; leucine biosynthesis and others.

    \ \ 6031 IPR010402 \

    The CCT (CONSTANS, CO-like, and TOC1) domain is a highly conserved basic module of ~43 amino acids, which is found near the C-terminus of plant proteins often involved in light signal transduction. The CCT domain is found in association with other domains, such as the B-box zinc finger, the GATA-type zinc finger, the ZIM motif or the response regulatory domain. The CCT domain contains a putative nuclear localisation signal within the second half of the CCT motif and has been\ shown to be involved in nuclear localisation; it probably also plays a role in\ protein-protein interactions PUBMED:10926537.

    \ \ 7116 IPR009907 \

    This family consists of several bacterial proteins of around 70 residues in length. The function of this family is unknown.

    \ 7555 IPR013093 \

    This ATPase family, associated with various cellular activities, includes some of the AAA proteins not detected by PUBMED:7646486, PUBMED:9927482.

    \ 6839 IPR009741 \

    This family consists of several hypothetical plant proteins of around 100 residues in length. The function of this family is unknown.

    \ 5510 IPR008734 \ This family consists of several eukaryotic phosphorylase kinase alpha and beta subunits. Phosphorylase kinase (PHK) is a regulatory enzyme in glycogen metabolism. Mutations in the gene encoding the alpha subunit of PHK (PHKA2) have been shown to be responsible for X-linked liver glycogenosis (XLG). XLG, a frequent type of glycogen storage disease, is characterised by hepatomegaly and growth retardation PUBMED:9384616, PUBMED:9835437.\ 3319 IPR003457 \ MerT is a mercuric transport integral membrane protein responsible for transporting Hg2+ ions from the periplasmic protein MerP (also part of the transport system) to mercuric reductase (MerA).\ 7426 IPR011455 \

    This is a family of paralogous proteins in Leptospira interrogans.

    \ 4439 IPR004903 \ Bacterial surface layer proteins are S-layer precursor proteins. The S-layer is a paracrystalline mono-layered assembly of proteins which coat the surface of bacteria.\ \ 786 IPR000039 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Members of this family are large subunit ribosomal proteins which are found in the Eukaryota and Archaea. These proteins have 115 to 187 amino-acid residues. The family consists of:

    \

    \ \ \ 5412 IPR008848 \ This family consists of several plasmid regulatory proteins from the extremely thermophilic and acidophilic archaeal genus Sulfolobus.\ 5633 IPR008661 \ This family consists of several eukaryotic L6 membrane proteins. L6, IL-TMP, and TM4SF5 are cell surface proteins predicted to have four transmembrane domains. Previous sequence analysis led to their assignment as members of the tetraspanin superfamily, but it has now been found that they are not significantly related to genuine tetraspanins and instead constitute their own L6 family PUBMED:10975581. Several members of this family have been implicated in human cancer PUBMED:1565644, PUBMED:9479038.\ 1043 IPR002421 \

    The N-terminal and internal 5'-3' exonuclease domains are commonly found together and are most often associated with 5' to 3' nuclease activities. The XPG protein signatures are never found outside the '53EXO' domains, whereas the 53EXO domains themselves occur in a more diverse set of proteins PUBMED:7926735, PUBMED:10322433, PUBMED:8464724. The number of amino acids separating the two 53EXO domains, together with the presence of accompanying motifs, allows several protein families to be distinguished.

    In the eubacterial type A DNA polymerases, the N-terminal and internal domains are separated by a few amino acids, usually four. The pattern DNA_POLYMERASE_A is always present towards the C-terminus. Several eukaryotic structure-dependent endonucleases and exonucleases have the 53EXO domains separated by 24 to 27 amino acids, and the XPG protein signatures are always present. In several proteins from herpesviridae, the two 53EXO domains are separated by 50 to 120 amino acids. These proteins are implicated in the inhibition of the expression of the host genes. Eukaryotic DNA repair proteins with 600 to 700 amino acids between the 53EXO domains all carry the XPG protein signatures.

    \ 6213 IPR009430 \

    Gas vesicles provide cells with buoyancy, enabling them to remain at the water surface. These organelles are generally synthesized by halophilic archaea and cyanobacteria, as well as some other prokaryotes. A cluster of 12-14 gvp genes is responsible for gas vesicle synthesis, for instance gvpMLKJIHGFEDACNO in Halobacterium sp. PUBMED:15126480. GvpF and GvpL are essential for gas vesicle formation and display sequence similarity to one another, both containing predicted coiled-coil domains that are often involved in self-oligomerisation.

    \ 852 IPR007699 \ This domain was thought to be unique to the SGT1-like proteins, but is also found in calcyclin binding proteins. Sgt1p is a highly conserved eukaryotic protein that is required for both SCF (Skp1p/Cdc53p-Cullin-F-box)-mediated ubiquitination and kinetochore function in yeast and also plays a role in the cAMP pathway. Calcyclin (S100A6) is a member of the S100A family of calcium binding proteins and appears to play a role in cell proliferation PUBMED:12577318.\ 1069 IPR004697 \ The p-aminobenzoyl-glutamate transporter family includes two transporters, the AbgT (YdaH) protein of Escherichia coli and MtrF of Neisseria gonorrhoeae. AbgT is apparently cryptic in wild type cells, but when expressed on a high copy number plasmid, or when expressed at higher levels due to mutation, it allows utilization of p-aminobenzoyl-glutamate as a source of p-aminobenzoate for p-aminobenzoate auxotrophs. p-Aminobenzoate is a constituent of and a precursor for the biosynthesis of folic acid.\ 7971 IPR012640 \

    This family consists of the homologues of the VirB proteins of type IV secretion systems (T4SS). Conjugal transfer across the cell envelope of Gram-negative bacteria is mediated by a supramolecular structure termed mating pair formation (Mpf) complex. Collectively, secretion pathways ancestrally related to bacterial conjugation systems are now known as T4SS. T4SS are involved in the delivery of effector molecules to eukaryotic target cells; each of these systems exports distinct DNA or protein substrates to effect a myriad of changes in host cell physiology during infection PUBMED:11309113.

    \ 5486 IPR008640 \ This seven-residue repeat makes up the majority of the sequence of a family of bacterial haemagglutinins and invasins. The representative alignment contains four repeats.\ 2223 IPR007612 \ This is a family of plant and bacterial uncharacterised proteins.\ 5554 IPR008778 \ This region is found in the C-terminal half of the Pirin protein. The function of Pirin is unknown, but the gene encoding it is expressed in all tissues of the human body, most strongly in the liver and heart. Pirin is a nuclear protein, localised exclusively within the nucleoplasm and concentrated predominantly within dot-like subnuclear structures. A tomato homologue of human Pirin has been found to be induced during programmed cell death. Human Pirin interacts with Bcl-3 and NFI and is therefore probably involved in the regulation of DNA transcription and replication. It appears to be an Fe(II)-containing member of the Cupin superfamily.\ 7830 IPR006518 \

    These sequences are full-length and part-length members of the RHS (retrotransposon hot spot) family in Trypanosoma brucei and Trypanosoma cruzi. Members of this family are frequently interrupted by non-LTR retrotransposons inserted at exactly the same relative position.

    \ 1393 IPR000877 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This family of eukaryotic proteinase inhibitors belongs to MEROPS inhibitor family I12, clan IF. They inhibit serine peptidases of the S1 family PUBMED:14705960.

    \ \

    The Bowman-Birk inhibitor family PUBMED:6996568 is one of the numerous families of serine proteinase inhibitors. They have a duplicated structure and generally possess two distinct inhibitory sites.\ These inhibitors are primarily found in plants and in particular in the seeds of legumes as well as in cereal grains. In cereals they exist in two forms, one of which is a duplication of the basic structure PUBMED:3667571. \ Proteins of the Bowman-Birk inhibitor family of serine proteinase inhibitors interact with the enzymes they inhibit via an exposed surface loop that adopts the canonical proteinase inhibitory conformation. The resulting non-covalent complex renders the proteinase inactive. This inhibition mechanism is common for the majority of serine proteinase inhibitor proteins and many analogous examples are known. A distinctive feature of the Bowman-Birk inhibitor protein, however, is that the interacting loop is a particularly well-defined disulfide-linked short beta-sheet region PUBMED:11375759, PUBMED:12325158, PUBMED:12643767.

    \ \ \ \ 7908 IPR012992 \

    This family consists of the tetracycline resistance determinant tet(M) leader peptides. A short open reading frame corresponding to a 28-amino-acid peptide and containing a number of inverted repeat sequences was found immediately upstream of tet(M). Transcriptional analyses have shown that expression of tet(M) results from extension of a small transcript representing the upstream leader region into the resistance determinant. This leader sequence is therefore responsible for transcriptional attenuation and thus for regulating transcription of tet(M) PUBMED:1323953.

    \ 8012 IPR012530 \

    This family consists of the B melanoma antigen (BAGE) peptides. The BAGE gene encodes a human tumour antigen that is recognised by a cytolytic T lymphocyte. BAGE genes are expressed in melanomas, bladder and lung carcinomas and in a few tumours of other histological types PUBMED:12461691.

    \ 3504 IPR001075 \

    Pioneering investigations on the maturation of Fe-S proteins were performed in bacteria and have led to the identification of two operons termed nif (nitrogen fixation) and isc (iron-sulfur cluster assembly) that function in Fe-S-cluster biosynthesis. The nif operon encodes proteins that execute specific functions in the assembly of nitrogenase, a complex metalloenzyme that catalyses the fixation of nitrogen; some of the Nif proteins are specifically involved in the formation of the Fe-S cluster of nitrogenase, and homologues of these are also found in organisms that do not fix nitrogen PUBMED:8875867, PUBMED:8048161. The isc operon encodes proteins necessary for the maturation of bacterial Fe-S proteins.

    \ \ \

    In a number of organisms, for example Azotobacter vinelandii, NifU is a protein associated with the nif operon. It contains two domains: the N-terminal domain and the C-terminal domain presented in this entry. These domains exist either together or on different polypeptides, and both are found in organisms that do not fix nitrogen (e.g. yeast), so they have a broader significance in the cell than nitrogen fixation. It has been proposed that they are specifically required for the formation and maturation of Fe-S clusters, which in eukaryotes occurs in the mitochondrial matrix. In yeast, for example, deletion of the C-terminal domain does not markedly affect Fe-S biosynthesis, but in combination with inactivation of ISU1 it causes a defect in mitochondrial Fe-S protein maturation.

    \ \ 372 IPR001122 \

    Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include yellow fever, West Nile, tick-borne encephalitis (TBE), Japanese encephalitis (JE) and Dengue type 2 viruses PUBMED:15378043. Flaviviruses have three structural proteins: the core nucleocapsid protein C and the envelope glycoproteins M and E. The virion of these viruses is a nucleocapsid covered by a lipoprotein envelope, where the nucleocapsid is a complex of capsid protein C and mRNA. The capsid protein C is a dimeric alpha-helical protein, and its interaction with RNA is critical for the production of viable virus particles PUBMED:12768036.

    \ 4035 IPR000549 \ Photosystem I (PSI) PUBMED:3333014 is an integral membrane protein complex that uses light energy to mediate electron transfer from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and in cyanobacteria. PSI is composed of at least 14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about 7 to 9 kDa and are evolutionarily related PUBMED:8360180. Both seem to contain two transmembrane regions. Cyanobacteria contain only PSI-K.\ 7578 IPR013018 \

    This is a group of uncharacterised proteins from the archaea. MTH865 is a hypothetical 8.4 kDa protein of unknown function from the archaeon Methanobacterium thermoautotrophicum. The proteins have an EF-hand-like fold.

    \ 149 IPR003495 \ This family of proteins contains P47K, a Pseudomonas chlororaphis protein needed for nitrile hydratase expression, and the cobW gene product, which may be involved in cobalamin biosynthesis in Pseudomonas denitrificans PUBMED:1655697.\ 6703 IPR010681 \

    This family consists of several plethodontid receptivity factor (PRF) proteins which seem to be specific to Plethodon jordani (Jordan's salamander). PRF is a courtship pheromone produced by males that increases female receptivity PUBMED:10489368.

    \ 168 IPR007052 \ The function of the CS domain is unknown. The CS domain is sometimes found C-terminal to the CHORD domain in metazoan proteins, but occurs separately from the CHORD domain in plants. This association is thought to be indicative of a functional interaction between CS and CHORD domains PUBMED:10571178.\ 3063 IPR003443 \ Interleukin-15 (IL-15) is a cytokine that possesses a variety of biological functions, including stimulation and maintenance of cellular immune responses PUBMED:10872679. IL-15 stimulates the proliferation of T-lymphocytes, which requires interaction of IL-15 with components of IL-2R, including IL-2R beta and probably IL-2R gamma, but not IL-2R alpha.\ 2452 IPR006891 \ The enteropathogenic Escherichia coli EspF secreted protein induces host cell apoptosis. Its proline-rich structure suggests that it may act by binding to SH3 domains or EVH1 domains of host cell signalling proteins PUBMED:11298644.\ 3315 IPR004222 \ Methane monooxygenase catalyses the oxidation of methane to methanol in the presence of oxygen and NAD(P)H in methanotrophs. It has a broad specificity, hydroxylating many alkanes and converting alkenes into the corresponding epoxides. In additional reactions, CO is oxidized to CO(2), ammonia is oxidized to hydroxylamine, and some aromatic compounds and cyclic alkanes can also be hydroxylated, although more slowly. In Methylococcus capsulatus there are two forms of the enzyme, a soluble and a membrane-bound type. The soluble form consists of 3 components, A, B and C. Protein A is made up of 3 chains, alpha, beta and gamma. This family represents the gamma chain.\ 2307 IPR007750 \ This family is found in Arabidopsis thaliana and contains uncharacterised proteins.\ 142 IPR003333 \

    Methyl transfer from the ubiquitous S-adenosyl-L-methionine (AdoMet) to either nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms ranging from bacteria to plants and mammals. The reaction is catalyzed by methyltransferases (Mtases) and modifies DNA, RNA, proteins and small molecules, such as catechol, for regulatory purposes. The various aspects of the role of DNA methylation in prokaryotic restriction-modification systems and in a number of cellular processes in eukaryotes, including gene regulation and differentiation, are well documented.

    \

    This entry represents cyclopropane-fatty-acyl-phospholipid synthase, which is closely related to methyltransferases.

    \

    Cyclopropane-fatty-acyl-phospholipid synthase or CFA synthase catalyses the reaction:

    \ \

    The major mycolic acid produced by Mycobacterium tuberculosis contains two cis-cyclopropanes in the meromycolate chain. Cyclopropanation may contribute to the structural integrity of the cell wall complex PUBMED:7592990.

    \ \ 5688 IPR008638 \

    This entry represents a conserved domain found near the N-terminus of a number of large, repetitive bacterial proteins, including many proteins of over 2500 amino acids. A number of the members of this family have been designated adhesins, filamentous haemagglutinins, heme/hemopexin-binding protein, etc. Members generally have a signal sequence, then an intervening region, then the region described in this entry. Following this region, proteins typically have regions rich in repeats but may show no homology between the repeats of one member and the repeats of another. This domain is suggested to be a carbohydrate-dependent haemagglutination activity site PUBMED:11703654.

    \ 5072 IPR007909 \

    This family consists of several bacterial conjugal transfer proteins, TrbD. TrbD contains a\ nucleotide binding motif and may provide energy for the export of DNA or the export of other Trb\ proteins PUBMED:8763954.

    \ 4987 IPR007214 \ This domain of unknown function is found in numerous prokaryotic organisms. The structure of YbaK shows a novel fold. This domain also occurs in a number of prolyl-tRNA synthetases (proRS) from prokaryotes. Thus, the domain is thought to be involved in oligonucleotide binding, with possible roles in recognition/discrimination or editing of prolyl-tRNA PUBMED:10813833.\ 2404 IPR006957 \ Ethylene insensitive 3 (EIN3) proteins are a family of plant DNA-binding proteins that regulate transcription in response to the gaseous plant hormone ethylene, and are essential for ethylene-mediated responses. \ \ In the presence of ethylene, dark-grown dicotyledonous\ seedlings undergo dramatic morphological changes collectively known as the 'triple response'. In Arabidopsis, these changes consist of a radial swelling of the hypocotyl, an exaggeration of the\ curvature of the apical hook, and the inhibition of cell elongation in the hypocotyl and root.\ 3660 IPR002895 \ The G surface protein of Paramecium primaurelia has important internal homologies and a periodic structure, which could be dictated in part by the rigid scaffolding of cysteine residues. The predicted secondary structure shows a near absence of alpha-helix and an abundance of beta-pleated sheets and random coils. The repetitive nature of the amino acid sequence favours a structural role for the protein PUBMED:3783679. This structure is based on the presence of 37 periods of about 75 residues, each period containing eight cysteine residues PUBMED:2308165. Homologies with other proteins are limited to surface antigens of trypanosomes.\ 5931 IPR009293 \

    This family consists of bacterial sequences several of which are thought to be general stress proteins.

    \ 3685 IPR000730 \

    Proliferating cell nuclear antigen (PCNA), or cyclin, is a non-histone acidic nuclear protein\ PUBMED:2884104 that plays a key role in the control of eukaryotic DNA replication PUBMED:1346518.\ It acts as a co-factor for DNA polymerase delta, which is responsible for leading strand DNA\ replication PUBMED:2565339. The sequence of PCNA is well conserved between plants and animals,\ indicating a strong selective pressure for structure conservation, and suggesting that this type\ of DNA replication mechanism is conserved throughout eukaryotes PUBMED:1671766. In yeast, POL30 is associated with polymerase III, the yeast analogue of polymerase delta.

    \ \ \ \

    Homologues of\ PCNA have also been identified in the archaea (euryarchaeota and crenarchaeota) and in Paramecium bursaria chlorella virus and in nuclear polyhedrosis viruses.

    \ 8115 IPR001191 \ Geminiviruses are characterised by a genome of circular single-stranded DNA encapsidated\ in twinned (geminate) quasi-isometric particles, from which the group derives its name\ PUBMED:. Most geminiviruses can be divided\ into 2 subgroups on the basis of host range and/or insect vector: those that infect dicotyledonous plants and are transmitted by the same whitefly species, and\ those that infect monocotyledonous plants and are transmitted by different leafhopper\ vectors. The whitefly-transmitted cassava latent (CLV),\ tomato golden mosaic (TGMV) and bean golden mosaic (BGMV) viruses possess a bipartite\ genome. By contrast, only a single DNA component has been identified for the leafhopper-transmitted \ maize streak (MSV) and wheat dwarf (WDV) viruses PUBMED:6526009, PUBMED:2829117.\ Beet curly top (BCTV), bean summer death and tobacco yellow dwarf viruses belong to a\ third possible subgroup. BCTV is transmitted by a specific leafhopper species, yet like\ the whitefly-transmitted geminiviruses it has a host range confined to dicotyledonous\ plants.\ 7130 IPR010843 \

    This family consists of several bacterial and archaeal AroM proteins. In Escherichia coli the aroM gene is cotranscribed with aroL PUBMED:3001025. The function of this family is unknown.

    \ 6708 IPR010683 \

    This family represents a conserved region within a number of proteins of unknown function that seem to be specific to Arabidopsis thaliana. Note that some family members contain more than one copy of this region.

    \ 3161 IPR006826 \ Lantibiotics are antimicrobial agents derived from ribosomally synthesised peptides PUBMED:1539969. They are produced by bacteria of the Firmicutes phylum, and include mutacin, subtilin, and nisin. Lantibiotic peptides contain thioether bridges termed lanthionines that are thought to be generated by dehydration of serine and threonine residues followed by addition of cysteine residues PUBMED:12127987. This family constitutes the N-terminal region of the enzyme proposed to catalyse the dehydration step PUBMED:12127987, PUBMED:10215865.\ 1460 IPR003199 \ This family of choloylglycine hydrolases includes conjugated bile acid hydrolase (CBAH) and penicillin acylase, which cleave carbon-nitrogen bonds other than peptide bonds in linear amides. \ 6452 IPR010580 \

    This family consists of several ribosome associated membrane protein RAMP4 (or SERP1) sequences. Stabilisation of membrane proteins in response to stress involves the concerted action of a rescue unit in the ER membrane comprised of SERP1/RAMP4, other components of the translocon, and molecular chaperones in the ER PUBMED:10601334.

    \ 6564 IPR010617 \

    This family represents a conserved region within a number of hypothetical proteins of unknown function found in eukaryotes, bacteria and archaea. These may possibly be integral membrane proteins.

    \ 3384 IPR007835 \ The MOFRL (multi-organism fragment with rich leucine) domain is found in bacteria and eukaryotes. The function of this domain is not clear, although it occurs in some putative enzymes such as reductases and kinases.\ 5395 IPR008707 \ This family consists of several PilC protein sequences from Neisseria gonorrhoeae and Neisseria meningitidis. PilC is a phase-variable protein associated with pilus-mediated adherence of pathogenic Neisseria to target cells PUBMED:9467907.\ 2889 IPR006878 \ This is a family of herpesvirus proteins including Epstein-Barr virus protein BBRF1.\ 8017 IPR012531 \

    This family consists of the BB1 proteins. BB1 is a growth regulating protein that is expressed by multiple tissues in humans, including the lung. BB1 has been shown to function in cell growth-related processes of foetal and early postnatal lung PUBMED:11033765.

    \ 1363 IPR000409 \

    The "beige" mouse is an established animal model of Chediak-Higashi\ Syndrome (CHS) PUBMED:8896560. The BEACH domain was described in the BEIGE protein\ (D1035670) and in the highly homologous CHS protein. It is also\ found in distantly related proteins, for example the factor associated with neutral\ sphingomyelinase activation (FAN) PUBMED:9620659.

    \ \

    The BEACH domain is usually followed by a series of WD repeats. The function of the BEACH domain is\ unknown.

    \ 6022 IPR009338 \

    This is a family of conserved bacteriophage open reading frames.

    \ 2043 IPR002696 \ This is a family of short (70-amino-acid) hypothetical\ proteins from various bacteria. They contain three conserved \ cysteine residues. A member from Aeromonas hydrophila has\ been found to have hemolytic activity (unpublished).\ 7687 IPR012446 \

    Sequences found in this family are derived from hypothetical eukaryotic proteins of unknown function. The region in question is approximately 280 residues long.

    \ 6843 IPR009744 \

    This family consists of several bacterial VirC1 proteins. In Agrobacterium tumefaciens, a cis-active 24-base-pair sequence adjacent to the right border of the T-DNA, called overdrive, stimulates tumour formation by increasing the level of T-DNA processing. It is thought that the virC operon, which enhances T-DNA processing, probably does so because the VirC1 protein interacts with overdrive. It has now been shown that the virC1 gene product binds to overdrive but not to the right border of T-DNA PUBMED:2592351.

    \ 2905 IPR005206 \

    The immediate-early protein ICP4 (infected-cell polypeptide 4) is required for efficient transcription of early and late viral genes and is thus essential for productive infection. ICP4 is a large phosphoprotein that binds DNA in a sequence-specific manner as a homodimer. ICP4 represses transcription of LAT, ICP4 and ORF-P, whose promoters have a high-affinity ICP4 binding site that spans the transcription initiation site. ICP4 proteins have two highly conserved regions; this family contains the N-terminal region, which contains sites for DNA binding and homodimerisation PUBMED:11739685.

    \ 5340 IPR008390 \ Members of this family are 19 kDa membrane proteins. The levels of the plant protein AWPM-19 increase dramatically in response to elevated levels of abscisic acid. The increased abundance of this protein leads to greater freezing tolerance PUBMED:9249988.\ 3895 IPR002569 \

    Peptide methionine sulphoxide reductase (Msr) reverses the inactivation of many proteins due to the oxidation of critical methionine residues by reducing methionine sulphoxide, Met(O), to methionine PUBMED:10841552. It is present in most living organisms, and the cognate structural gene belongs to the so-called minimum gene set PUBMED:8994848, PUBMED:8816789.

    \ \

    The two domains, MsrA and MsrB, reduce different epimeric forms of methionine sulphoxide. This group represents MsrA, the crystal structure of which has been determined in a number of organisms. In Mycobacterium tuberculosis, the MsrA structure has been determined to 1.5 Å resolution PUBMED:12837786. \ \ In contrast to the three catalytic cysteine residues found in previously characterised MsrA structures, M. tuberculosis MsrA represents a class containing only two functional cysteine residues. The overall structure shows no resemblance to the structures of MsrB from other organisms, though the active sites show approximate mirror symmetry. In each case, conserved amino acid motifs mediate the stereospecific recognition and reduction of the substrate.

    \ \

    In a number of pathogenic bacteria, including Neisseria gonorrhoeae, the MsrA and MsrB domains are fused, with MsrA N-terminal to MsrB. This arrangement is reversed in Treponema pallidum. In Neisseria gonorrhoeae and Neisseria meningitidis a thioredoxin domain is fused to the N-terminus. This may function to reduce the active sites of the downstream MsrA and MsrB domains.

    \ \ 4333 IPR000085 \ RuvA forms a complex with RuvB; this complex is a helicase that mediates\ Holliday junction migration by localised denaturation and re-annealing.\ RuvA stimulates, in the presence of DNA, the weak\ ATPase activity of RuvB. RuvA binds both single- and double-stranded\ DNA (dsDNA). RuvA binds preferentially to supercoiled rather\ than to relaxed dsDNA.\ 4533 IPR003674 \

    N-linked glycosylation is a ubiquitous protein modification, and is essential for viability in eukaryotic cells. A lipid-linked core-oligosaccharide is assembled at the membrane of the endoplasmic reticulum and transferred to selected asparagine residues of nascent polypeptide chains by the oligosaccharyl transferase (OTase) complex PUBMED:7588624.

    \ \

    This family consists of the oligosaccharyl transferase STT3 subunit and related proteins. The STT3 subunit is part of the oligosaccharyl transferase (OTase) complex of proteins and is required for its activity PUBMED:7588624.

    \ 44 IPR002007 \

    Peroxidases are haem-containing enzymes that use hydrogen peroxide as\ the electron acceptor to catalyse a number of oxidative reactions.

    \ \

    Peroxidases are found in bacteria, fungi, plants and animals. On the basis\ of sequence similarity, a number of animal haem peroxidases can be\ categorised as members of a superfamily: myeloperoxidase (MPO); eosinophil\ peroxidase (EPO); lactoperoxidase (LPO); thyroid peroxidase (TPO);\ prostaglandin H synthase (PGHS); and peroxidasin PUBMED:8062820, PUBMED:7922023, PUBMED:2840655.

    \

    MPO plays a major role in the oxygen-dependent microbicidal system of neutrophils. EPO from eosinophilic granulocytes participates in immunological reactions, and potentiates tumor necrosis factor (TNF) production and hydrogen peroxide release by human monocyte-derived macrophages PUBMED:2548579, PUBMED:7774640. In the main, MPO (and possibly EPO) utilises Cl- ions and H2O2 to form hypochlorous acid (HOCl), which can effectively kill bacteria or parasites. In secreted fluids, LPO catalyses the oxidation of thiocyanate ions (SCN-) by H2O2, producing the weak oxidising agent hypothiocyanite (OSCN-), which has bacteriostatic activity PUBMED:6295491. TPO uses I- ions and H2O2 to generate iodine, and plays a central role in the biosynthesis of the thyroid hormones T(3) and T(4).

    \

    To date, the 3D structures of MPO and PGHS have been reported. MPO is a \ homodimer: each monomer consists of a light (A or B) and a heavy (C or D) \ chain resulting from post-translational excision of 6 residues from the \ common precursor. Monomers are linked by a single inter-chain disulphide. \ Each monomer includes a bound calcium ion PUBMED:1320128. PGHS exists as a symmetric \ dimer, each monomer of which consists of 3 domains: an N-terminal epidermal\ growth factor (EGF) like module; a membrane-binding domain; and a large\ C-terminal catalytic domain containing the cyclooxygenase and the peroxidase \ active sites. The catalytic domain shows striking structural similarity to \ MPO. The cyclooxygenase active site, which catalyses the formation of \ prostaglandin G2 (PGG2) from arachidonic acid, resides at the apex of a \ long hydrophobic channel, extending from the membrane-binding domain to the\ centre of the molecule. The peroxidase active site, which catalyses the\ reduction of PGG2 to PGH2, is located on the other side of the molecule, at\ the haem binding site PUBMED:8121489. Both MPO and the catalytic domain of PGHS are \ mainly alpha-helical, 19 helices being identified as topologically and\ spatially equivalent; PGHS contains 5 additional N-terminal helices that\ have no equivalent in MPO. In both proteins, three Asn residues in each\ monomer are glycosylated.

    \ 4811 IPR001233 \ A number of uncharacterized proteins including Escherichia coli rtcB, Mycobacterium \ tuberculosis MtCY441.01., Caenorhabditis elegans F16A11.2 and Methanococcus jannaschii \ MJ0682 belong to this family.\ 7703 IPR012454 \

    The proteins in this entry have not been characterised.

    \ 2867 IPR000745 \ NS4a (non-structural protein 4a) forms an integral part of the NS3 serine protease of Hepatitis C virus, as it is required as a cofactor for a number of the cleavages PUBMED:9568891, PUBMED:9261364. NS4a has also been reported to interact with NS4b and NS3 to form a multi-subunit replicase complex PUBMED:9261364.\ 5996 IPR009324 \

    This is a family of uncharacterised proteins found in bacteria and archaea.

    \ 6189 IPR009418 \

    This family consists of several hypothetical Mycobacterium leprae specific proteins. The function of this family is unknown.

    \ 4991 IPR003105 \

    This domain has been termed SRAYDG, for SET and RING finger Associated, and because of the conserved YDG motif within the domain. Further characteristics of the domain are the conservation of up to 13 evenly spaced glycine residues and a VRV(I/V)RG motif. The domain is found mainly in plants and animals, and also in bacteria. In animals, this domain is associated with the Np95-like ring finger protein and the related gene product Np97, which contains PHD and RING finger domains and is an important determinant of cell cycle progression. Np95 is a chromatin-associated ubiquitin ligase that binds histones directly, with a remarkable preference for histone H3 and its N-terminal tail. The SRA-YDG domain of Np95 is indispensable both for the interaction with histones and for chromatin binding in vivo PUBMED:9880673, PUBMED:14993289.\ In plants the SRA-YDG domain is associated with the SET domain, found in a family of histone methyltransferases, and in bacteria it is found in association with HNH, a non-specific nuclease motif PUBMED:14993289, PUBMED:11691919.
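
    The two sequence motifs named above can be located in a protein sequence with simple pattern matching. What follows is a minimal sketch, assuming plain one-letter amino acid strings. The helper `scan_sra_ydg` and the example sequence are hypothetical. An exact motif match alone does not establish that a sequence carries the SRA-YDG domain.

```python
import re

# Hypothetical helper: exact-match patterns for the two motifs named in the
# SRA-YDG description. "VRV(I/V)RG" means V-R-V, then I or V, then R-G.
MOTIFS = {
    "YDG": re.compile(r"YDG"),
    "VRV(I/V)RG": re.compile(r"VRV[IV]RG"),
}


def scan_sra_ydg(seq: str) -> dict[str, list[int]]:
    """Return the 1-based start position of every match of each motif."""
    s = seq.upper()
    return {
        name: [m.start() + 1 for m in pattern.finditer(s)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    print(scan_sra_ydg("AAYDGKVRVIRGAAVRVVRG"))
    # {'YDG': [3], 'VRV(I/V)RG': [7, 15]}
```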

    \ \ 4716 IPR007194 \ TRAPP plays a key role in the targeting and/or fusion of ER-to-Golgi transport vesicles with their acceptor compartment. TRAPP is an 800 kDa complex that contains at least 10 subunits.\ 3959 IPR005007 \

    This is a family of proteins expressed by members of the Poxviridae.

    \ 7353 IPR001496 \

    The SOCS box was first identified in SH2-domain-containing proteins of the suppressor of cytokine signalling (SOCS) family PUBMED:9202125 but was later also found in:

    \ \

    The SOCS box found in these proteins is a carboxy-terminal domain of about 50 amino acids, composed of two blocks of well-conserved residues separated by 2 to 10 nonconserved residues PUBMED:9419338. The C-terminal conserved region is an L/P-rich sequence of unknown function, whereas the N-terminal conserved region is a consensus BC box PUBMED:9869640, which binds to the Elongin BC complex PUBMED:9869640, PUBMED:10051596. It has been proposed that this association could couple bound proteins to the ubiquitination or proteasomal compartments PUBMED:10051596.

    \ 3892 IPR005002 \ This enzyme () is involved in the synthesis of the GDP-mannose and dolichol-phosphate-mannose required for a number of critical mannosyl transfer reactions.\ 5967 IPR009310 \

    This is a family of bacterial proteins encoded in the biphenyl dioxygenase (bph) operon. The function of this family is unknown.

    \ 8149 IPR013253 \

    This domain is found in cell division proteins which are required for kinetochore-spindle association PUBMED:15371542.

    \ 508 IPR003308 \

    Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains: an amino-terminal zinc-binding domain, a central catalytic domain (rve) and a carboxy-terminal DNA-binding domain. It is often found as part of the POL polyprotein.

    \ 3921 IPR000864 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This family of proteinase inhibitors belongs to MEROPS inhibitor family I13, clan IG. They inhibit peptidases of the S1 and S8 families PUBMED:14705960. \ Potato inhibitor type I sequences are not restricted to potatoes but are found in other plant species, for example barley endosperm chymotrypsin inhibitor PUBMED:3106042 and pumpkin trypsin inhibitor. Outside plants, they are found in leeches, e.g. the medicinal leech Hirudo medicinalis, but not in other metazoa PUBMED:3519213. In general, the proteins have retained a specificity towards chymotrypsin-like and elastase-like proteases PUBMED:. Structurally these inhibitors are small (60 to 90 residues) and, in contrast with other families of protease inhibitors, they lack disulphide bonds. The inhibitor is a wedge-shaped molecule whose pointed edge is formed by the protease-binding loop, which contains the scissile bond. The loop binds tightly to the protease active site, and subsequent cleavage of the scissile bond causes inhibition of the enzyme PUBMED:3519213.

    \ \

    The inhibitors (designated type I and II) are \ synthesised in potato tubers, increasing in concentration as the tuber develops. Synthesis of the inhibitors throughout the plant is also induced by leaf damage; this systemic response being triggered by the release of a putative plant hormone PUBMED:.

    \ \

    Examples found in the bacteria and archaea are probable false positives.

    \ \ 7592 IPR011675 \ This is a family of hypothetical bacterial and bacteriophage proteins. The region in question is approximately 150 residues long and is highly conserved throughout the family.\ 4215 IPR000552 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence\ similarities. One of these families consists of mammalian PUBMED:3396452, Trypanosoma brucei,\ Caenorhabditis elegans and fungal L44, and Haloarcula marismortui LA PUBMED:8504167.

    \ 4038 IPR006924 \ This small acidic protein is found in 30S ribosomal subunit of cyanobacteria and plant plastids. In plants it has been named plastid-specific ribosomal protein 3 (PSRP-3), and in cyanobacteria it is named Ycf65. Plastid-specific ribosomal proteins may mediate the effects of nuclear factors on plastid translation. The acidic PSRPs are thought to contribute to protein-protein interactions in the 30S subunit, and are not thought to bind RNA PUBMED:10874039.\ 201 IPR001194 \

    The human serine- and leucine-rich DENN protein possesses an RGD cellular adhesion motif and a leucine-zipper-like motif associated with protein dimerization, and shows partial homology to the receptor binding domain of tumor necrosis factor alpha. DENN is virtually identical to MADD, a human MAP kinase-activating death domain protein that interacts with type I tumor necrosis factor receptor. DENN displays significant homology to Rab3 GEP, a rat GDP/GTP exchange protein specific for Rab3 small G proteins implicated in intracellular vesicle trafficking. DENN also exhibits strong similarity to Caenorhabditis elegans AEX-3, which interacts with Rab3 to regulate synaptic vesicle release PUBMED:9796103. The DENN domain is always flanked on both sides by more divergent domains, known as uDENN and dDENN, which could play a key role in DENN function.

    \ 4777 IPR004033 \ A number of methyltransferases have been shown to share regions of similarity PUBMED:9045837. Apart from the ubiquinone/menaquinone biosynthesis methyltransferases (for example, the C-methyltransferase from the ubiE gene of Escherichia coli), the ubiquinone biosynthesis methyltransferases (for example, the C-methyltransferase from the COQ5 gene of Saccharomyces cerevisiae) and the menaquinone biosynthesis methyltransferases (for example, the C-methyltransferase from the MENH gene of Bacillus subtilis), this family also includes methyltransferases involved in biotin and sterol biosynthesis and in phosphatidylethanolamine methylation.\ 7549 IPR011710 \

    This domain is found at the C terminus of the coatomer beta subunit proteins (Beta-coat proteins). This C-terminal domain probably adapts the function of the N-terminal domain.

    \ 1123 IPR007530 \

    Also known as aminoglycoside 6-adenylyltransferase (), this protein confers resistance to aminoglycoside antibiotics.

    \ 6806 IPR009723 \

    This entry represents a conserved region of approximately 150 residues located towards the N terminus of the POP1 subunit, which is common to both the RNase MRP and RNase P ribonucleoproteins PUBMED:7926742. RNase P generates mature tRNA molecules by cleaving their 5' ends.

    \ 4145 IPR000406 \ The GDP dissociation inhibitor for rho proteins, rho GDI, regulates GDP/GTP\ exchange. The protein contains 204 amino acids, with a calculated Mr value\ of 23,421. Hydropathy analysis shows it to be largely hydrophilic, with a\ single hydrophobic region. Results of database searches suggest rho GDI is\ a novel protein, currently with no known homologue. \ The protein plays an important role in the activation of the superoxide\ (O2-)-generating NADPH oxidase of phagocytes. This process requires the\ interaction of membrane-associated cytochrome b559 with 3 cytosolic\ components: p47-phox, p67-phox and a heterodimer of the small G-protein\ p21rac1 and rho GDI PUBMED:8223583. The association of p21rac and GDI inhibits\ dissociation of GDP from p21rac, thereby maintaining it in an inactive form.\ The proteins are attached via a lipid tail on p21rac that binds to the\ hydrophobic region of GDI PUBMED:8796870. Dissociation of these proteins might be\ mediated by the release of lipids (e.g., arachidonate and phosphatidate)\ from membranes through the action of phospholipases PUBMED:8796870. The lipids may then\ compete with the lipid tail on p21rac for the hydrophobic pocket on GDI.\ 5790 IPR010282 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6051 IPR009350 \

    This family represents the minor tail protein T of Lambda-like viruses and their prophage. The minor tail protein T is located at the distal end and is involved in the assembly of the initiator complex for tail polymerisation. The protein is essential for tail assembly but is not found in the mature virion PUBMED:6220514.

    \ 2748 IPR000602 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 38 comprises enzymes with only one known activity: alpha-mannosidase.

    \ \

    Lysosomal alpha-mannosidase is necessary for the catabolism of N-linked carbohydrates released during glycoprotein turnover. The enzyme catalyzes the hydrolysis of terminal, non-reducing alpha-D-mannose residues in alpha-D-mannosides, and can cleave all known types of alpha-mannosidic linkages. Defects in the gene cause lysosomal alpha-mannosidosis (AM), a lysosomal storage disease characterized by the accumulation of unbranched oligosaccharide chains.

    \ 3169 IPR005413 \

    The type III secretion system of Gram-negative bacteria is used to transport virulence factors from the pathogen directly into the host cell PUBMED:9618447 and is only triggered when the bacterium comes into close contact with the host. Effector proteins secreted by the type III system do not possess a secretion signal, and are considered unique because of this. Yersinia spp. secrete effector proteins called YopB and YopD that facilitate the spread of \ other translocated proteins through the type III needle and the host cell cytoplasm PUBMED:9440524. In turn, the transcription of these moieties is thought to be\ regulated by another gene, lcrV, found on the Yops virulon that encodes the \ entire type III system PUBMED:9495760. The product of this gene, LcrV protein, also \ regulates the secretion of YopD through the type III translocon PUBMED:11443094, and \ itself acts as a protective "V" antigen for Yersinia pestis, the causative \ agent of plague PUBMED:11489861.

    \ \

    Recently, a homologue of the Yersinia LcrV protein (PcrV) was found in\ Pseudomonas aeruginosa, an opportunistic pathogen. In vivo studies using\ mice found that immunisation with the protein protected burned animals from \ infection by Pseudomonas aeruginosa, and enhanced survival. In addition, it\ is speculated that PcrV determines the size of the needle pore for type III\ secreted effectors. PUBMED:11500471.\

    \ 1812 IPR001462 \ This domain is at the C-terminus of hepadnavirus P proteins and represents a functional domain that controls the RNase H activities of the protein. The domain is always associated with and .\ 1872 IPR003341 \ This signature describes a cysteine repeat C-X3-C-X3-C the function of which is unknown as is the function of the proteins in which they occur. Most of the sequences in this group are from Caenorhabditis elegans and Caenorhabditis briggsae.\ 6362 IPR009496 \

    This entry contains several mammalian sequences and one bird sequence, from Gallus gallus (Chicken). In some sequences it represents the C-terminal region, while in others it represents the full protein. All of the mammalian proteins are hypothetical and have no known function, but the chicken protein is annotated as a repulsive guidance molecule (RGM). RGM is a GPI-linked axon guidance molecule of the retinotectal system. RGM is repulsive for a subset of axons, those from the temporal half of the retina. Temporal retinal axons invade the anterior optic tectum in a superficial layer, and encounter RGM expressed in a gradient with increasing concentration along the anterior-posterior axis. Temporal axons are able to receive posterior-dependent information by sensing gradients or concentrations of guidance cues. Thus, RGM is likely to provide positional information for temporal axons invading the optic tectum in the stratum opticum PUBMED:12353034.

    \ \ 3951 IPR007678 \

    Protein G5 is found in a number of Poxviruses.

    \ 3453 IPR001609 \

    Muscle contraction is caused by sliding between the thick and thin filaments of the myofibril. Myosin is a major component of thick filaments and exists as a hexamer of 2 heavy chains PUBMED:1939027, 2 alkali light chains and 2 regulatory light chains. The heavy chain can be subdivided into the N-terminal globular head and the C-terminal coiled-coil rod-like tail, although some forms have a globular region at their C terminus. There are many cell-specific isoforms of myosin heavy chains, encoded by a multi-gene family PUBMED:2806546. Myosin interacts with actin to convert chemical energy, in the form of ATP, into mechanical energy PUBMED:3540939. The 3-D structure of the head portion of myosin has been determined PUBMED:8316857 and a model for the actin-myosin complex has been constructed PUBMED:8316858.

    \

    The globular head is well conserved, and some highly conserved regions may relate to functional and structural domains PUBMED:6576334. The rod-like tail starts with an invariant proline residue, and contains many repeats of a 28-residue region, interrupted at 4 regularly spaced points known as skip residues. Although the sequence of the tail is not well conserved, its chemical character is: hydrophobic, charged and skip residues occur in a highly ordered and repeated fashion PUBMED:6576334.

    \ 410 IPR004129 \ Glycerophosphoryl diester phosphodiesterases display broad specificity for glycerophosphodiesters: glycerophosphocholine, glycerophosphoethanolamine, glycerophosphoglycerol and bis(glycerophosphoglycerol) are all hydrolysed by this enzyme.\ 1853 IPR002826 \

    The prokaryotic proteins in this family have no known function.

    \ 182 IPR006767 \

    This group of sequences contains a conserved C-terminal domain that is found in the Schizosaccharomyces pombe protein CwfJ. CwfJ is part of the Cdc5p complex involved in mRNA splicing PUBMED:11884590. This domain is found in association with , which is generally N-terminal and adjacent to this domain.

    \ 6986 IPR009827 \

    This entry represents the N-terminal region of the bacterial dicarboxylate carrier protein MatC. The MatC protein is an integral membrane protein that could function as a malonate carrier PUBMED:9826185.

    \ 1024 IPR007886 \

    Alanine dehydrogenases and pyridine nucleotide transhydrogenase have been shown to share regions of similarity PUBMED:8439307. Alanine dehydrogenase catalyzes the NAD-dependent reversible reductive amination of pyruvate into alanine. Pyridine nucleotide transhydrogenase catalyzes the reduction of NADP+ to NADPH with the concomitant oxidation of NADH to NAD+. This enzyme is located in the plasma membrane of prokaryotes and in the inner membrane of the mitochondria of eukaryotes. The transhydrogenation between NADH and NADP is coupled with the translocation of a proton across the membrane. In prokaryotes the enzyme is composed of two different subunits, an alpha chain (gene pntA) and a beta chain (gene pntB), while in eukaryotes it is a single-chain protein. The sequences of alanine dehydrogenase from several bacterial species are related to those of the alpha subunit of bacterial pyridine nucleotide transhydrogenase and of the N-terminal half of the eukaryotic enzyme. The two most conserved regions correspond to the N-terminal extremity of these proteins, represented in this entry, and to a central glycine-rich region that is part of the NAD(H)-binding site.

    \ 2181 IPR007551 \ This is a family of uncharacterised proteins.\ 8064 IPR013258 \

    This domain is associated with the N terminus of striatin. Striatin is an intracellular protein which has a caveolin-binding motif, a coiled-coil structure, a calmodulin-binding site, and a WD () repeat domain PUBMED:10748158. It acts as a scaffold protein PUBMED:15569929 and is involved in signalling pathways PUBMED:10748158, PUBMED:12610732.

    \ 918 IPR007379 \

    Tim44 is an essential component of the machinery that mediates the translocation of nuclear-encoded proteins across the mitochondrial inner membrane PUBMED:10430866. Tim44 is thought to bind phospholipids of the mitochondrial inner membrane both by electrostatic interactions and by penetrating the polar head group region PUBMED:10430866.

    \ 3179 IPR005052 \ Lectins are structurally diverse proteins that bind to specific carbohydrates. This family includes the VIP36 and ERGIC-53 lectins. These two proteins were the first recognized members of a family of animal lectins similar (19-24%) to the leguminous plant lectins PUBMED:8205612. The alignment for this family covers residues lying towards the N terminus, where the similarity of VIP36 and ERGIC-53 is greatest. However, while Fiedler and Simons PUBMED:8205612 identified these proteins as a new family of animal lectins, this alignment also includes yeast sequences. ERGIC-53 is a 53 kDa protein, localized to the intermediate region between the endoplasmic reticulum and the Golgi apparatus (ER-Golgi-Intermediate Compartment, ERGIC). It was identified as a calcium-dependent, mannose-specific lectin PUBMED:8868475. Its dysfunction has been associated with combined factors V and VIII deficiency OMIM:227300 OMIM:601567, suggesting an important and substrate-specific role for ERGIC-53 in the glycoprotein-secreting pathway PUBMED:8868475, PUBMED:10090935.\ 3529 IPR000800 \ The Notch domain is also called the 'DSL' domain or the Lin-12/Notch repeat (LNR). The LNR region is present only in Notch-related proteins, C-terminal to the EGF repeats. The lin-12/Notch proteins act as transmembrane receptors for intercellular signals that specify cell fates during animal development. In response to a ligand, proteolytic cleavages release the intracellular domain of Notch, which then gains access to the nucleus and acts as a transcriptional co-activator PUBMED:3119223. The LNR region is thought to negatively regulate the activity of the Lin-12/Notch proteins. It is a triplication of a module of around 35-40 amino acids present on the extracellular part of the protein PUBMED:7697721, PUBMED:8139658. Each module contains six cysteine residues engaged in three disulphide bonds and three conserved aspartate and asparagine residues PUBMED:3119223. 
The biochemical characterization of a recombinantly expressed LIN-12.1 module from the human Notch1 receptor indicates that the disulphide bonds are formed between the first and fifth, second and fourth, and third and sixth cysteines. The formation of this particular disulphide isomer is favored by the presence of Ca2+, which is also required to maintain the structural integrity of the rLIN-12.1 module. The conserved aspartate and asparagine residues are likely to be important for Ca2+ binding, and thereby contribute to the native fold.\ 7525 IPR011620 \ \

    Two-component regulatory system lytS/lytT probably regulates genes involved in cell wall metabolism.\ This entry represents the transmembrane region of the 5TM-LYT (5TM Receptors of the LytS-YhcK type) PUBMED:12914674.

    \ 8099 IPR013215 \

    The N-terminal and C-terminal domains of cobalamin-independent methionine synthase together define a catalytic cleft in the enzyme. The N-terminal domain is thought to bind the substrate, in particular, the negatively charged polyglutamate chain. The N-terminal domain is also thought to stabilise a loop from the C-terminal domain PUBMED:15326182.

    \ 8045 IPR013177 \

    This domain is found at the C terminal end of mitochondrial proteins of unknown function.

    \ 5909 IPR010343 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 5895 IPR009276 \

    This is a family of proteobacterial proteins of unknown function.

    \ 3636 IPR002004 \

    The polyadenylate-binding protein (PABP) has a conserved C-terminal domain (PABC), which is also found in the hyperplastic discs protein (HYD) family of ubiquitin ligases that contain HECT domains () PUBMED:11287654. PABP recognises the 3’ mRNA poly(A) tail and plays an essential role in eukaryotic translation initiation and mRNA stabilisation/degradation. PABC domains of PABP are peptide-binding domains that mediate PABP homo-oligomerisation and protein-protein interactions. In mammals, the PABC domain of PABP functions to recruit several different translation factors to the mRNA poly(A) tail PUBMED:11940585.

    \ 5122 IPR007959 \

    This family consists of dinoflagellate luciferase and luciferin binding proteins. Luciferase is\ involved in catalysing the light emitting reaction in bioluminescence and luciferin binding protein\ (LBP) is known to bind to luciferin (the substrate for luciferase) to stop it reacting with the enzyme\ and therefore switching off the bioluminescence function. The expression of these two proteins is\ controlled by a circadian clock at the translational level, with synthesis and degradation occurring on\ a daily basis PUBMED:11747464.

    \ 2014 IPR005622 \

    This is a family of uncharacterised bacterial proteins that includes Escherichia coli SprT. SprT is described as a regulator of the bolA gene in stationary phase PUBMED:13129938. The majority of members contain the metallopeptidase zinc-binding signature, which has an HExxH motif; however, there is no evidence that they are metallopeptidases.
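
    The HExxH signature mentioned above (His, Glu, any two residues, His) can be searched for with a regular expression. What follows is a minimal sketch, assuming a one-letter amino acid string. The helper `find_hexxh` and the example sequences are hypothetical. As the entry itself notes, a match says nothing about whether the protein is actually a metallopeptidase.

```python
import re

# HExxH: His, Glu, any two residues, His.
# A zero-width lookahead is used so overlapping occurrences are all reported.
HEXXH = re.compile(r"(?=HE..H)")


def find_hexxh(seq: str) -> list[int]:
    """Return the 1-based start positions of every HExxH match in seq."""
    return [m.start() + 1 for m in HEXXH.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    print(find_hexxh("MAHEALHGK"))  # [3]
    print(find_hexxh("HEHEHAH"))    # [1, 3] (overlapping matches)
```

The lookahead is a design choice. A plain `HE..H` pattern would consume the characters of the first match and miss the second one in sequences like `HEHEHAH`.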

    \ 6817 IPR009728 \

    This entry represents the N-terminal region of the mammalian BAALC proteins. BAALC (brain and acute leukaemia, cytoplasmic) is highly conserved among mammals, but is absent from lower organisms. Two isoforms are specifically expressed in neuroectoderm-derived tissues, but not in tumours or cancer cell lines of non-neural tissue origin. It has been shown that blasts from a subset of patients with acute leukaemia greatly overexpress eight different BAALC transcripts, resulting in five protein isoforms. Among patients with acute myeloid leukaemia, those overexpressing BAALC show distinctly poor prognosis, pointing to a key role of the BAALC products in leukaemia. It has been suggested that BAALC is a gene implicated in both neuroectodermal and hematopoietic cell functions PUBMED:11707601.

    \ 409 IPR004308 \ This family represents the catalytic subunit of glutamate-cysteine ligase (), also known as\ gamma-glutamylcysteine synthetase (GCS). This enzyme catalyses the rate limiting step in the biosynthesis of glutathione.\ The eukaryotic enzyme is a dimer of a heavy chain and a light chain with all the catalytic activity exhibited by the heavy\ chain.\ 4252 IPR000851 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S5 is one of the proteins of the small ribosomal subunit and is 166 to 254 amino acid residues long. In Escherichia coli, S5 is known to be important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase translational error frequencies. On the basis of sequence similarities PUBMED:, PUBMED:2247072, it belongs to a family of ribosomal proteins that groups bacterial, cyanelle, red algal chloroplast, archaeal and fungal mitochondrial S5; mammalian, Caenorhabditis elegans, Drosophila and plant S2; and yeast S4 (SUP44).

    \ 1430 IPR007736 \ This family contains plant proteins related to caleosin. Caleosins contain calcium-binding domains and have an oleosin-like association with lipid bodies. Caleosins are present at relatively low levels and are mainly bound to microsomal membrane fractions at the early stages of seed development. As the seeds mature, overall levels of caleosins increase dramatically and they become associated almost exclusively with storage lipid bodies PUBMED:11171180. The calcium-binding domain is probably related to the calcium-binding EF-hand motif.\ 4096 IPR003116 \ This is the Ras-binding domain found in proteins related to Ras. It is found in association with the PE-bind and pkinase domains.\ 1235 IPR000229 \ Arenaviruses are single-stranded RNA viruses. This family represents the nucleocapsid protein that encapsidates the viral ssRNA PUBMED:8599223.\ 7922 IPR012557 \

    This family consists of the heat-stable enterotoxin (ST) from Escherichia coli. ST is a small peptide of 18 or 19 amino acid residues produced by enterotoxigenic Escherichia coli and is a cause of acute diarrhoea in infants and travellers in developing countries. ST triggers a biological response by binding to the membrane-associated guanylyl cyclase C, which is located on intestinal epithelial cell membranes PUBMED:15049831.

    \ 6935 IPR009798 \

    This family consists of several plant wound-induced protein sequences related to WI12 from Mesembryanthemum crystallinum. Wounding, methyl jasmonate and pathogen infection are known to induce local WI12 expression. WI12 expression is also thought to be developmentally controlled in the placenta and developing seeds. WI12 preferentially accumulates in the cell wall, and it has been suggested that it plays a role in the reinforcement of the cell wall after wounding and during plant development PUBMED:11598226.

    \ 3188 IPR004247 \ This family contains retroviral transactivating (Tat) proteins from a variety of lentiviruses. The Tat protein may have a role in trans-activation of the viral long terminal repeat PUBMED:2536163.\ 2388 IPR001059 \ Elongation factor P (EF-P) is a prokaryotic protein translation factor required for efficient peptide bond synthesis on 70S ribosomes from fMet-tRNAfMet PUBMED:9195040. It probably functions indirectly by altering the affinity of the ribosome for aminoacyl-tRNA, thus increasing the reactivity of aminoacyl-tRNAs as acceptors for peptidyl transferase.\ \ 5388 IPR008423 \ This family contains several Bacillus thuringiensis P21 proteins. These proteins are thought to be molecular chaperones and have mosquitocidal properties PUBMED:9023925, PUBMED:2644205.\ 7539 IPR011695 \ This motif is found in the Tash AT-hook proteins of Theileria annulata. These proteins are transported to the host nucleus and are likely to be involved in pathogenesis PUBMED:11683409. The repeat is also often found in conjunction with . Shiels et al. suggest that they may be part of PEST motifs (a signal for rapid proteolytic degradation) PUBMED:15075278, though this is not definite. This motif is also found in other T. annulata proteins, which have no other known domains (unpublished data: C Yeats).\ 1080 IPR005519 \ This family of class B acid phosphatases also contains a number of vegetative storage proteins (VPS25). The acid phosphatase activity of VPS has been experimentally demonstrated PUBMED:1639823.\ 4355 IPR003518 \

    Salmonella typhimurium contains a 90 kb plasmid that is associated with virulence. This plasmid encodes at least 6 genes needed by the bacterium for invading host macrophages during infection. These include the 70 kDa mkaA protein PUBMED:2164511, a recognised virulence factor, and, more recently described, four spv genes under the control of a regulator PUBMED:8483415.

    \

    Deletion studies on the virulence plasmid have shown that an open reading frame encoding a 28 kDa protein is needed for successful invasion of the host. This protein, designated mkfA PUBMED:2164511, VRP4 PUBMED:2696057 or VirA PUBMED:1657882 by different groups, is utilised by the microbe upon entry into macrophages, although the exact mechanism is unclear.

    \ 6903 IPR008302 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 4500 IPR002100 \ Human serum response factor (SRF) is a ubiquitous nuclear protein important for cell proliferation and differentiation. SRF function is essential for transcriptional regulation of numerous growth-factor-inducible genes, such as the c-fos oncogene and muscle-specific actin genes. A core domain of around 90 amino acids is sufficient for the activities of DNA binding, dimerisation and interaction with accessory factors. Within the core is a DNA-binding region, designated the MADS box PUBMED:7637780, that is highly similar to many eukaryotic regulatory proteins: among these are MCM1, the regulator of cell type-specific genes in budding yeast; DSRF, a Drosophila trachea development factor; the MEF2 family of myocyte-specific enhancer factors; and the Agamous and Deficiens families of plant homeotic proteins. \

    Proteins belonging to the MADS family function as dimers, the primary DNA-binding element of which is an anti-parallel coiled coil of two amphipathic alpha-helices, one from each subunit. The DNA wraps around the coiled coil, allowing the basic N-termini of the helices to fit into the DNA major groove. The chain extending from the helix N-termini reaches over the DNA backbone and penetrates into the minor groove. A 4-stranded, anti-parallel beta-sheet packs against the coiled-coil face opposite the DNA and is the central element of the dimerisation interface. The MADS-box domain is commonly found associated with the K-box region.

    \ 2391 IPR001379 \

    Fertilization proteins are acrosomal proteins involved in various roles during the fertilization process. Structurally these proteins consist of a closed bundle of helices with a right-hand twist. Lysin and SP18, both characterised in abalone, are two evolutionarily related fertilization proteins that have distinctive roles. Following its release from sperm, lysin binds to the egg vitelline envelope (VE) via the VE receptor for lysin (VERL), then non-enzymatically dissolves the VE to create a hole, thereby allowing the sperm to pass through the envelope and fuse with the egg PUBMED:10666624. Lysins exhibit species-specific binding to their egg receptor, possibly through differences in charged surface residues PUBMED:10698629. SP18 is also released from sperm, acting as a potent fusogen of liposomes to mediate the fusion between the sperm and egg cell membranes. Despite a similarity in the overall fold, the variation in the surface features of SP18 and lysin accounts for their different roles in fertilization PUBMED:11331004.

    \ \ 5279 IPR008777 \ This family consists of Phytoreovirus nonstructural proteins Pns10 and Pns11. Genome segment S11 of Oryza sp. gall dwarf virus (Rice gall dwarf virus), a member of Phytoreovirus, encodes a putative protein of 40 kDa that exhibits approximately 37% homology at the amino acid level to the nonstructural proteins Pns10 of Oryza sp. dwarf and wound tumour viruses, which are other members of Phytoreovirus PUBMED:10949951.\ 1119 IPR005608 \

    Adenovirus infection inhibits synthesis and processing of rRNA and redistributes nucleolar antigens. Adenovirus protein V associates with nucleoli in infected cells.

    \ 3312 IPR004927 \

    Mercury is a highly toxic metal. Toxicity can result from three different mercurial forms: elemental, inorganic ion and organomercurial compounds. The ability of bacteria to detoxify mercurial compounds by reduction and volatilisation is conferred by the Mer genes, which are usually plasmid-encoded (although chromosomal resistance determinants have also occasionally been identified) PUBMED:9168120. Organomercurial lyase (MerB), also known as alkylmercury lyase, mediates the first of the two steps in the microbial detoxification of organomercurial salts (the second is catalysed by mercuric reductase). \

    \

    Organomercurial lyase catalyses the protonolysis of the C-Hg bond in a wide range of organomercurial salts (primary, secondary, tertiary, alkyl, vinyl, allyl and aryl) to Hg(II) and the respective organic compound PUBMED:10548738:\

    \

    RHg(+) + H(+) = RH + Hg(2+)\

    \

    Hg(II) is subsequently detoxified by mercuric reductase. \

    \

    The enzyme has been purified to homogeneity from Escherichia coli and has been found to be a 22.4 kDa monomer with no detectable cofactors or metal ions.

    \ \ \ 7260 IPR010886 \

    This family consists of several bacterial histone H1-like Hc1 proteins, which appear to be specific to Chlamydia species. Chlamydiae are prokaryotic obligate intracellular parasites that undergo a biphasic life cycle involving an infectious, extracellular form known as elementary bodies and an intracellular, replicating form termed reticulate bodies. The gene coding for Hc1 is expressed only during the late stages of the chlamydial life cycle concomitant with the reorganisation of chlamydial reticulate bodies into elementary bodies, suggesting that the Hc1 protein plays a role in the condensation of chlamydial chromatin during intracellular differentiation PUBMED:2023942.

    \ 7196 IPR009961 \

    This family consists of several uncharacterised proteins from Drosophila melanogaster. The function of this family is unknown.

    \ 5456 IPR008420 \ This family consists of P13 proteins from Borrelia species. P13 is a 13 kDa integral membrane protein which is post-translationally processed at both ends and modified by an unknown mechanism PUBMED:11292755.\ 3930 IPR007008 \

    This is a family of poxvirus proteins of unknown function.

    \ 318 IPR007034 \

    This conserved region is found in a number of eukaryotic proteins, including the ribosome biogenesis protein (BMS) which may act as a molecular switch during maturation of the 40S ribosomal subunit in the nucleolus.

    \ 6764 IPR009703 \

    This family consists of several mammalian selenoprotein S (SelS) sequences. SelS is a plasma membrane protein and is present in a variety of tissues and cell types. These proteins are involved in the degradation of misfolded endoplasmic reticulum (ER) luminal proteins, participating in their transfer from the ER to the cytosol, where they are destroyed by the proteasome in a ubiquitin-dependent manner PUBMED:12477932. They probably serve as a linker between DER1, which mediates the retro-translocation of misfolded proteins into the cytosol, and the ATPase complex VCP, which mediates their translocation and ubiquitination.

    \ 2740 IPR002053 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 25 comprises enzymes with only one known activity: lysozyme.

    \

    It has been shown PUBMED:1916274, PUBMED:1747104 that a number of cell-wall lytic enzymes are evolutionarily related and can be classified into a single family. Two residues, an aspartate and a glutamate, have been shown PUBMED:567645 to be important for the catalytic activity of the Chalaropsis enzyme. These residues, as well as some others in their vicinity, are conserved in all proteins from this family.

    \ 3073 IPR000506 \ Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This reaction is the second in the synthetic pathway of the essential branched side chain amino acids valine and isoleucine PUBMED:9218783. The enzyme forms a tetramer of similar but non-identical chains, and requires magnesium as a cofactor.\ \ 8153 IPR013235 \

    This domain is specific to the PPP5 subfamily of serine/threonine phosphatases.

    \ 6790 IPR010719 \

    This family contains a number of putative rRNA methylases.

    \ 4492 IPR007347 \ In Bacillus subtilis this protein interferes with sporulation at an early stage and this inhibitory effect is overcome by SpoIIB and SpoVG. SpoVS seems to play a positive role in allowing progression beyond stage V of sporulation. Null mutations in the spoVS gene block sporulation at stage V, impairing the development of heat resistance and coat assembly PUBMED:7559352.\ 3943 IPR007585 \

    Protein E2 is encoded by pox viruses and its function is unknown.

    \ 250 IPR004882 \

    This family consists of several LUC7 protein homologues that are restricted to eukaryotes. LUC7 has been shown to be a U1 snRNA associated protein PUBMED:10631324 with a role in splice site recognition PUBMED:11170747. The entry contains human and mouse LUC7 like (LUC7L) proteins PUBMED:10500099 and human cisplatin resistance-associated overexpressed protein (CROP) PUBMED:11804584.

    \ 2899 IPR000785 \ The equine herpesvirus EHV1 protein belongs to a family of sequences that groups together HSV1 UL10, EHV1 52, VZV 50, EBV BBRF3, HVS1 39 and HCMV UL100. Little is yet known about the properties of the protein. However, its amino acid sequence is highly hydrophobic, containing 8 putative membrane-spanning regions, and it is therefore believed to be either membrane-associated or transmembrane.\ 5215 IPR008449 \ This family consists of several Drosophila chorion proteins S36 and S38. The chorion genes of Drosophila are amplified in response to developmental signals in the follicle cells of the ovary PUBMED:1908228.\ 3514 IPR007876 \ This family comprises several flagellar sheath adhesin proteins, also called neuraminyllactose-binding hemagglutinin precursor (NLBH) or N-acetylneuraminyllactose-binding fibrillar hemagglutinin receptor-binding subunits. NLBH is found exclusively in Helicobacter, gut-colonising bacteria that bind to sialic acid-rich macromolecules present on the gastric epithelium PUBMED:11855744.\ 3785 IPR007223 \

    Peroxin-13 is a component of the peroxisomal translocation machinery, together with Peroxin-14 and Peroxin-17. Both termini of Peroxin-13 are oriented to the cytosol. It is required for peroxisomal association of Peroxin-14 PUBMED:10882522. The proteins also contain an SH3 domain.

    \ 5400 IPR008754 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
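    The loose HEXXH motif and the more stringent abXHEbbHbc definition can be expressed as a pattern search over a protein sequence. The Python sketch below is illustrative only. Several choices in it are assumptions rather than a published motif definition: the residue sets used for 'b' (uncharged) and 'c' (hydrophobic), the restriction of 'a' to Val or Thr, and the hypothetical names `HEXXH`, `STRINGENT` and `find_motifs`. A match is not evidence of peptidase activity.

```python
import re

# Residue classes below are illustrative assumptions, not a curated definition.
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"  # 'X': any residue except Pro (never seen in this site)
UNCHARGED = "ACFGILMNQSTVWY"         # 'b': assumed uncharged set (His, Asp, Glu, Lys, Arg, Pro excluded)
HYDROPHOBIC = "ACFILMVWY"            # 'c': assumed hydrophobic set

# Loose motif H-E-X-X-H; the lookahead lets overlapping hits be reported.
HEXXH = re.compile(f"(?=(HE[{ANY_BUT_PRO}]{{2}}H))")

# Stringent motif abXHEbbHbc, simplifying 'a' to Val or Thr (the most common residues).
STRINGENT = re.compile(
    f"(?=([VT][{UNCHARGED}][{ANY_BUT_PRO}]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_motifs(sequence: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every hit in a protein sequence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "GGVVAHELTHAVTDY"  # made-up test sequence, not a real protein
    print(find_motifs(toy, HEXXH))      # [(5, 'HELTH')]
    print(find_motifs(toy, STRINGENT))  # [(2, 'VVAHELTHAV')]
```

    Because the stringent pattern is a subset of the loose one, a sequence can carry an HEXXH match without satisfying abXHEbbHbc. This mirrors the observation that the bare HEXXH motif is relatively common.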

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
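    The naming convention above can be sketched as a lookup keyed on the first character of a family or clan identifier. This is a hypothetical helper: `describe` and its output strings are not part of any MEROPS tool. It encodes only the catalytic types and the nucleophile classes stated in the paragraph above.

```python
# Catalytic-type letters, as listed in the text; 'P' applies to clans of mixed type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan with families of more than one type)",
}

# Types whose nucleophile is part of an amino acid (these form an acyl intermediate).
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe(identifier: str) -> str:
    """Describe a family (e.g. 'M43') or clan (e.g. 'MA') identifier by its first character."""
    letter = identifier.strip().upper()[:1]
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    if letter in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not specified"
    return f"{identifier}: {CATALYTIC_TYPES[letter]}; nucleophile: {nucleophile}"


if __name__ == "__main__":
    print(describe("M43"))  # M43: metallo; nucleophile: activated water molecule
```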

    \ \

    This group of metallopeptidases belongs to the MEROPS peptidase family M43 (cytophagalysin family, clan MA(M)), subfamily M43B. The predicted active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH.

    \ \

    The type example of this family is the pregnancy-associated plasma protein A (PAPP-A), which cleaves insulin-like growth factor (IGF) binding protein-4 (IGFBP-4), causing a dramatic reduction in its affinity for IGF-I and -II. Through this mechanism, PAPP-A is a regulator of IGF bioactivity in several systems, including the human ovary and the cardiovascular system PUBMED:10913121, PUBMED:11713222, PUBMED:11897673.

    \ 3591 IPR001399 \

    Bluetongue virus VP6 protein binds ATP and exhibits an RNA-dependent ATPase function and a helicase activity that catalyses the unwinding of double-stranded RNA substrates PUBMED:9311795. VP6 from five United States prototype bluetongue virus (BTV) serotypes contains unusually high concentrations of glycine, few aromatic amino acids, but a high concentration of charged amino acids, a characteristic of hydrophilic proteins PUBMED:1329371.

    \ \

    VP6 is an inner capsid protein that surrounds the genomic dsRNA. Its hydrophilic nature, coupled with a capability to bind ssRNA and dsRNA, suggests that it interacts directly with the BTV genomic RNA.

    \ 1210 IPR002680 \ The alternative oxidase is used as a second terminal oxidase in mitochondria: electrons are transferred directly from reduced ubiquinol to oxygen, forming water PUBMED:8770590. This single-step pathway is not coupled to ATP synthesis and is not inhibited by cyanide PUBMED:9426242. In rice the transcript levels of the alternative oxidase are increased by low temperature PUBMED:9426242. It has been predicted to contain a coupled diiron center on the basis of a conserved sequence motif consisting of the proposed iron ligands, four Glu and two His residues PUBMED:11106766. An EPR study of Arabidopsis thaliana alternative oxidase AOX1a shows that the enzyme contains a hydroxo-bridged mixed-valent Fe(II)/Fe(III) binuclear iron center PUBMED:12215444. A catalytic cycle has been proposed that involves the diiron center and at least one transient protein-derived radical, most probably an invariant Tyr residue PUBMED:11801238.\ \ \ 4209 IPR002674 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of the elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This ribosomal protein is found in archaebacteria and eukaryotes PUBMED:2546769. Ribosomal protein L37 has a single zinc finger-like motif of the C2-C2 type PUBMED:8484768.

    \ 2493 IPR004464 \ The GlpX protein is involved in glycerol metabolism but its exact function is unknown. It is induced by, but not required for, growth on glycerol.\ 1054 IPR004117 \ All known members of this group are seven-transmembrane proteins that are candidate odorant receptors in Drosophila.\ 5591 IPR008553 \ The members of this archaebacterial protein family are around 250-300 amino acid residues in length. The function of these proteins is not known.\ 217 IPR007218 \

    DNA polymerase is responsible for effective DNA replication. The function of subunit 4 of DNA polymerase delta is not yet known.

    \ 1227 IPR005138 \

    This is the N-terminal domain of aerolysin and pertussis toxin, which contains a C-type lectin-like fold.

    \ 5058 IPR007895 \

    This is a domain of unknown function found in proteins of unknown function.

    \ \ 1370 IPR003093 \

    Active cell suicide (apoptosis) is induced by events such as growth factor withdrawal and toxins. It is controlled by regulators, which either have an inhibitory effect on programmed cell death (anti-apoptotic) or block the protective effect of inhibitors (pro-apoptotic) PUBMED:15335822, PUBMED:8918887. Many viruses have found a way of countering defensive apoptosis by encoding their own anti-apoptosis genes, preventing their target cells from dying too soon.

    All proteins belonging to the Bcl-2 family PUBMED:8910675 contain at least one of the BH1, BH2, BH3 or BH4 domains. All anti-apoptotic proteins contain BH1 and BH2 domains; some of them contain an additional N-terminal BH4 domain (Bcl-2, Bcl-x(L), Bcl-w), which is never seen in pro-apoptotic proteins, except for Bcl-x(S). On the other hand, all pro-apoptotic proteins contain a BH3 domain (except for Bad), which is necessary for dimerization with other proteins of the Bcl-2 family and crucial for their killing activity; some of them also contain BH1 and BH2 domains (Bax, Bak). The BH3 domain is also present in some anti-apoptotic proteins, such as Bcl-2 or Bcl-x(L). Proteins that are known to contain these domains include vertebrate Bcl-2 (alpha and beta isoforms) and Bcl-x (isoforms Bcl-x(L) and Bcl-x(S)); mammalian proteins Bax and Bak; mouse protein Bid; Xenopus laevis proteins Xr1 and Xr11; human induced myeloid leukemia cell differentiation protein MCL1; and Caenorhabditis elegans protein ced-9.

    \ 5880 IPR010329 \

    In eukaryotes, 3-hydroxyanthranilic acid dioxygenase is part of the kynurenine pathway for the degradation of tryptophan and the biosynthesis of nicotinic acid PUBMED:9539135. The prokaryotic homologue is involved in the 2-nitrobenzoate degradation pathway PUBMED:12620844.

    \ 4912 IPR004848 \ This family of viral proteins is known as the 110 family PUBMED:2325202. The function of members of this family is unknown. The family contains a central cysteine-rich region with eight conserved cysteines. Some members of the family contain two copies of the cysteine-rich region Swiss:P18560.\ 2191 IPR007501 \ This is a family of hypothetical archaeal proteins.\ 4705 IPR007321 \

    This domain is found in a family of plant gene products and is thought to be related to gypsy-type transposons. There is a domain of unknown function, DUF390, at the C terminus of the proteins.

    \ 543 IPR002172 \

    Low density lipoprotein (LDL) is the major cholesterol-carrying lipoprotein of plasma. The receptor protein binds LDL and transports it into cells by endocytosis. In order to be internalised, the receptor-ligand complex must first cluster into clathrin-coated pits. Seven successive cysteine-rich repeats of about 40 amino acids are present in the N-terminal region of this multidomain membrane protein PUBMED:6091915.

    \ \

    The LDL-receptor class A domain contains 6 disulphide-bonded cysteines PUBMED:7548065 and a highly conserved cluster of negatively charged amino acids, of which many are clustered on one face of the module PUBMED:7603991. A schematic representation of this domain is shown here:

    \
    \
       +---------------------+        +--------------------------------+\
       |                     |        |                                |\
      -CxxxxxxxxxxxxCxxxxxxxxCxxxxxxxxCxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxC-\
                    |                            |\
                    +----------------------------+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    'x': any residue.\
    
    \

    In LDL-receptors the class A domains form the binding site for LDL PUBMED:6091915 and calcium PUBMED:3320043. The acidic residues between the fourth and sixth cysteines are important for high-affinity binding of positively charged sequences in LDLR's ligands PUBMED:3283935. The repeat has been shown PUBMED:7603991 to consist of a beta-hairpin structure followed by a series of beta turns. In the absence of calcium, LDL-A domains are unstructured; the bound calcium ion imparts structural integrity.

    \ \

    Following these repeats is a 350-residue domain that resembles part of the epidermal growth factor (EGF) precursor PUBMED:6327078, PUBMED:6091915.

    Similar domains have been found (see references in PUBMED:7603991) in several extracellular and membrane proteins.

    \ \

    \ Numerous familial hypercholesterolaemia mutations of the LDL receptor alter the calcium-coordinating residues of LDL-A domains or other crucial scaffolding residues.\

    \ 7521 IPR011658 \ This domain forms an insert in bacterial beta-glucosidases and is found in other glycosidases, glycosyltransferases, proteases, amidases, yeast adhesins, and bacterial toxins, including anthrax protective antigen (PA). The domain also occurs in a Dictyostelium prespore-cell-inducing factor Psi and in fibrocystin, the mammalian protein whose mutation leads to polycystic kidney and hepatic disease. The crystal structure of PA shows that this domain (named PA14 after its location in the PA20 pro-peptide) has a beta-barrel structure. The PA14 domain sequence suggests a binding function, rather than a catalytic role. The PA14 domain distribution is compatible with carbohydrate binding.\ 7038 IPR010814 \

    This family consists of several hypothetical bacterial proteins of around 100 residues in length. Members of this family appear to be Actinomycete specific. The function of this family is unknown.

    \ 153 IPR004477 \

    This family is defined to identify a pair of paralogous 3' exoribonucleases in Escherichia coli, plus the set of proteins apparently orthologous to one or the other in other eubacteria. VacB was characterized originally as required for the expression of virulence genes, but is now recognized as the exoribonuclease RNase R (Rnr). Its paralog in Escherichia coli and Haemophilus influenzae is designated exoribonuclease II (Rnb). Both are involved in the degradation of mRNA, and consequently have strong pleiotropic effects that may be difficult to disentangle. Both these proteins share domain-level similarity (RNB, S1) with a considerable number of other proteins, and full-length similarity scoring below the trusted cutoff to proteins associated with various phenotypes but uncertain biochemistry; it may be that these latter proteins are also 3' exoribonucleases.

    \ 824 IPR002942 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of the elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The S4 domain is a small domain consisting of 60-65 amino acid residues that was detected in the bacterial ribosomal protein S4, eukaryotic ribosomal S9, two families of pseudouridine synthases, a novel family of predicted RNA methylases, a yeast protein containing a pseudouridine synthetase and a deaminase domain, bacterial tyrosyl-tRNA synthetases, and a number of uncharacterized, small proteins that may be involved in translation regulation PUBMED:10093218. The S4 domain probably mediates binding to RNA.

    \ 1904 IPR003774 \

    This entry describes proteins of unknown function.

    \ 933 IPR001102 \ Synonym(s): Transglutaminase, Fibrinoligase, TGase \

    Protein-glutamine gamma-glutamyltransferases () (TGase) are calcium-dependent enzymes that catalyze the cross-linking of proteins by promoting the formation of isopeptide bonds between the gamma-carboxamide group of a glutamine in one polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the conjugation of polyamines to proteins PUBMED:1683845, PUBMED:1974250.

    \ \

    Transglutaminases are widely distributed in various organs, tissues and body fluids. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilizing the fibrin clot.

    \ 7729 IPR012464 \

    This family contains sequences derived from proteins of unknown function expressed by Drosophila melanogaster and Anopheles gambiae.

    \ 7297 IPR010011 \

    This domain, which is usually found tandemly repeated, is found in various receptor co-activating proteins.

    \ 57 IPR007680 \ Arabinosyltransferase is involved in the arabinogalactan (AG) biosynthesis pathway in mycobacteria. AG is a component of the macromolecular assembly of the mycolyl-AG-peptidoglycan complex of the cell wall. This enzyme has important clinical applications, as it is believed to be the target of the antimycobacterial drug ethambutol PUBMED:8876238.\ 3456 IPR003356 \ This domain is found in N-6 adenine-specific DNA methylase () from Type I and Type IC restriction systems.\ These enzymes are responsible for the methylation of specific DNA sequences in order to prevent the host from digesting its own genome via its restriction enzymes. These methylases have the same sequence specificity as their corresponding restriction enzymes. The type I restriction and modification system is composed of three polypeptides R, M and S. The M and S subunits together form a methyltransferase that methylates two adenine residues in complementary strands of a bipartite DNA recognition sequence. In the presence of the R subunit, the complex can also act as an endonuclease, binding to the same target sequence but cutting the DNA some distance from this site. Whether the DNA is cut or modified depends on the methylation state of the target sequence. When the target site is unmodified, the DNA is cut. When the target site is hemimethylated, the complex acts as a maintenance methyltransferase, modifying the DNA so that both strands become methylated.\ 942 IPR002305 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II synthetases PUBMED:.
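The class assignments listed above can be captured as a simple lookup table. The Python sketch below is illustrative only: the names `CLASS_I`, `CLASS_II` and `synthetase_class` are hypothetical, and the table encodes only the canonical assignment given in the text, ignoring known exceptions such as the class I lysyl-tRNA synthetase found in some archaea.

```python
# Canonical class assignment of the 20 aminoacyl-tRNA synthetases, as listed above.
# Hypothetical helper names; known exceptions (e.g. archaeal class I LysRS) are ignored.
CLASS_I = frozenset({
    "arginine", "cysteine", "glutamic acid", "glutamine", "isoleucine",
    "leucine", "methionine", "tyrosine", "tryptophan", "valine",
})
CLASS_II = frozenset({
    "alanine", "asparagine", "aspartic acid", "glycine", "histidine",
    "lysine", "phenylalanine", "proline", "serine", "threonine",
})


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for the synthetase that charges the given amino acid."""
    name = amino_acid.strip().lower()
    if name in CLASS_I:
        return "I"
    if name in CLASS_II:
        return "II"
    raise ValueError(f"not one of the 20 canonical amino acids: {amino_acid!r}")


# The two classes partition the 20 canonical amino acids, ten in each.
assert len(CLASS_I) == len(CLASS_II) == 10
assert not CLASS_I & CLASS_II
```

Keeping the two sets disjoint and equal in size mirrors the statement that each class contains ten synthetases.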

    \ \

    The 10 class I synthetases are considered to share a catalytic domain structure based on the Rossmann fold, which is entirely different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    \ \ 2983 IPR001356 \ The homeobox domain was first identified in a number of Drosophila homeotic and segmentation proteins, but is now known to be well conserved in many other animals, including vertebrates PUBMED:2568852, PUBMED:1357790, PUBMED:. Hox genes encode homeodomain-containing transcriptional regulators that operate differential genetic programs along the anterior-posterior axis of animal bodies PUBMED:12445403. The domain binds DNA through a helix-turn-helix (HTH) structure. The HTH motif is characterised by two alpha-helices, which make intimate contacts with the DNA and are joined by a short turn. The second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions, which occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA PUBMED:. The first helix helps to stabilise the structure.

    The motif is very similar in sequence and structure in a wide range of DNA-binding proteins (e.g., cro and repressor proteins, homeotic proteins, etc.). One of the principal differences between HTH motifs in these different proteins arises from the stereochemical requirement for glycine in the turn, which is needed to avoid steric interference of the beta-carbon with the main chain: for cro and repressor proteins the glycine appears to be mandatory, while for many of the homeotic and other DNA-binding proteins the requirement is relaxed.

    \ 4286 IPR006110 \

    In eukaryotes, there are three different forms of DNA-dependent RNA polymerases () transcribing different sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13 polypeptides. A component of 14 to 18 kDa, shared by all three forms of eukaryotic RNA polymerase and sequenced in budding yeast (gene RPB6 or RPO26), fission yeast (gene rpb6 or rpo15), human and African swine fever virus, is evolutionarily related to the archaebacterial subunit K (gene rpoK). The archaebacterial protein is colinear with the C-terminal part of the eukaryotic subunit.

    \

    The structures of the omega subunit and RPB6, and the structures of the omega/beta' and RPB6/RPB1 interfaces, suggest a molecular mechanism for the function of omega and RPB6 in promoting RNAP assembly and/or stability. The conserved regions of omega and RPB6 form a compact structural domain that interacts simultaneously with conserved regions of the largest RNAP subunit and with the C-terminal tail following a conserved region of the largest RNAP subunit. The second half of the conserved region of omega and RPB6 forms an arc that projects away from the remainder of the structural domain and wraps over and around the C-terminal tail of the largest RNAP subunit, clamping it in a crevice, and threading the C-terminal tail of the largest RNAP subunit through the narrow gap between omega and RPB6 PUBMED:11158566.

    \ 1725 IPR006076 \ This family includes various FAD dependent oxidoreductases: Glycerol-3-phosphate dehydrogenase (), Sarcosine oxidase beta subunit (), D-alanine oxidase (), D-aspartate oxidase (). \

    D-amino acid oxidase () (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced in fungi and vertebrates, where they are known to be located in the peroxisomes. D-aspartate oxidase () (DASOX) PUBMED:1601857 is an enzyme, structurally related to DAO, which catalyzes the same reaction but is active only toward dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown PUBMED:1673125 to be important for the enzyme's catalytic activity.

    \ 256 IPR004987 \

    This is a family of proteins of unknown function.

    \ 1037 IPR004173 \ This domain is predicted to be a small molecule binding domain, based on its occurrence with other domains PUBMED:11292341. The domain is named after its three conserved histidine residues.\ 7809 IPR012577 \

    Members of this family include many hypothetical proteins. It also includes members of the NIPSNAP family, which have putative roles in vesicular transport PUBMED:9661659. This domain is often found in duplicate.

    \ 4527 IPR001499 \

    G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \

    Little is known about the structure and function of the mating factor receptors, STE2 and STE3. It is believed, however, that they are integral membrane proteins that may be involved in the response to mating factors on the cell membrane PUBMED:, PUBMED:3001640, PUBMED:2836861. The amino acid sequences of both receptors contain high proportions of hydrophobic residues grouped into 7 domains, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins. However, while a similar 3D framework has been proposed to account for this, there is no significant sequence similarity either between STE2 and STE3, or between these and the rhodopsin-type family: the receptors thus bear their own unique '7TM' signatures.

    \

    The STE3 gene of Saccharomyces cerevisiae encodes the cell-surface receptor that binds the 12-residue lipopeptide a-factor. Several related fungal pheromone receptor sequences are known: these include pheromone B alpha 1 and B alpha 3, and pheromone B beta 1 receptors from Schizophyllum commune; pheromone receptor 1 from Ustilago hordei; and pheromone receptors 1 and 2 from Ustilago maydis. Members of the family share about 20% sequence identity.

    \ 3439 IPR000713 \

    This family contains a number of related ligase enzymes that catalyse consecutive steps in the synthesis of peptidoglycan. This family also includes folylpolyglutamate synthase that transfers glutamate to folylpolyglutamate and cyanophycin synthetase that catalyses the biosynthesis of the cyanobacterial reserve material multi-L-arginyl-poly-L-aspartate (cyanophycin) PUBMED:9652408.

    \

    The N-terminal domain is almost always associated with the cytoplasmic peptidoglycan synthetases C-terminal domain (see ).

    \ 196 IPR005112 \

    This region is always found associated with . It is predicted to form a globular domain that is completely alpha-helical PUBMED:11563850. Although this is not statistically supported, it has been suggested that this domain may be similar to members of the Rho/Rac/Cdc42 GEF family PUBMED:11563850.

    \ 3336 IPR003358 \ This is a family of hypothetical proteins which are putative methyltransferases. The aligned region contains the GXGXG S-AdoMet binding site suggesting a putative methyltransferase activity.\ 3880 IPR003187 \

    Outer membrane phospholipase A (OMPLA) is an integral membrane phospholipase, which is present in many Gram-negative bacteria and has a broad substrate specificity. The role of OMPLA has been most thoroughly studied in Escherichia coli, where it participates in the secretion of bacteriocins. Bacteriocin release is triggered by a lysis protein (bacteriocin release protein or BRP), followed by a phospholipase-dependent accumulation of lysophospholipids and free fatty acids in the outer membrane PUBMED:12615538. The reaction products enhance the permeability of the outer membrane, which allows the semispecific secretion of bacteriocins. One speculative function of OMPLA is related to organic solvent tolerance in bacteria.

    Structurally, it consists of a 12-stranded antiparallel beta-barrel with a convex and a flat side. The active site residues are exposed on the exterior of the flat face of the beta-barrel. The activity of the enzyme is regulated by reversible dimerisation. Dimer interactions occur exclusively in the membrane-embedded parts of the flat side of the beta-barrel, with polar residues embedded in an apolar environment forming the key interactions. The active site His and Ser residues are located at the exterior of the beta-barrel, at the outer leaflet side of the membrane. This location indicates that under normal conditions the substrate and the active site are physically separated, since in E. coli phospholipids are exclusively located in the inner leaflet of the outer membrane.

    \ 7534 IPR011625 \

    This is a domain of the alpha-2-macroglobulin family.

    \

    The alpha-macroglobulin (aM) family of proteins includes protease inhibitors PUBMED:2473064, typified by the human tetrameric alpha2-macroglobulin (a2M); they belong to the MEROPS proteinase inhibitor family I39, clan IL. These protease inhibitors share several defining properties, which include (i) the ability to inhibit proteases from all catalytic classes, (ii) the presence of a 'bait region' and a thiol ester, (iii) a similar protease inhibitory mechanism and (iv) the inactivation of the inhibitory capacity by reaction of the thiol ester with small primary amines. aM protease inhibitors inhibit by steric hindrance PUBMED:2472396. The mechanism involves protease cleavage of the bait region, a segment of the aM that is particularly susceptible to proteolytic cleavage, which initiates a conformational change such that the aM collapses about the protease. In the resulting aM-protease complex, the active site of the protease is sterically shielded, thus substantially decreasing access to protein substrates. Two additional events occur as a consequence of bait region cleavage, namely (i) the beta-cysteinyl-gamma-glutamyl thiol ester becomes highly reactive and (ii) a major conformational change exposes a conserved COOH-terminal receptor binding domain (RBD) PUBMED:2469470. RBD exposure allows the aM-protease complex to bind to clearance receptors and be removed from circulation PUBMED:2430968. Tetrameric, dimeric, and, more recently, monomeric aM protease inhibitors have been identified PUBMED:9914899, PUBMED:10426429.

    \ \ 4466 IPR006279 \

    These sequences represent the delta subunit of a family of known and putative heterotetrameric sarcosine oxidases. Five operons of such oxidases are found in Mesorhizobium loti and three in Agrobacterium tumefaciens, a high enough copy number to suggest that not all members share the same function. Sarcosine oxidase catalyzes the oxidative demethylation of sarcosine to glycine. The reaction converts tetrahydrofolate to 5,10-methylene-tetrahydrofolate. The enzyme is known in monomeric and heterotetrameric (alpha, beta, gamma, delta) forms; this entry represents the heterotetrameric form.

    \ 3762 IPR000383 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
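As a sketch of how the linear ordering of the triad reflects clan membership, the Python snippet below sorts the catalytic residues by sequence position and compares the resulting string with the three orderings quoted above. The function names and the input format (a mapping from one-letter residue code to sequence position) are assumptions for illustration; an actual clan assignment rests on structural and sequence evidence, not on residue order alone.

```python
from typing import Dict, List

# Triad orderings (N- to C-terminal) quoted in the text, keyed by clan.
TRIAD_ORDER_BY_CLAN = {"SA": "HDS", "SB": "DHS", "SC": "SDH"}


def triad_order(positions: Dict[str, int]) -> str:
    """Join the residue codes ('S', 'D', 'H') in order of sequence position."""
    return "".join(sorted(positions, key=positions.__getitem__))


def matching_clans(positions: Dict[str, int]) -> List[str]:
    """Return the clans whose quoted triad ordering matches the given positions."""
    order = triad_order(positions)
    return [clan for clan, pattern in TRIAD_ORDER_BY_CLAN.items() if pattern == order]
```

For example, with the standard chymotrypsin numbering (His57, Asp102, Ser195), `matching_clans({"H": 57, "D": 102, "S": 195})` returns `["SA"]`.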

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
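The naming convention above can be decoded mechanically. This Python sketch assumes that a peptidase family identifier is a single catalytic-type letter followed by digits (as in S15, C25 and M44 elsewhere in this text); clan identifiers, type P (which applies only to clans) and inhibitor families such as I39 are deliberately rejected. The function name `describe_family` is hypothetical and not part of any MEROPS interface.

```python
import re
from typing import Optional, Tuple

# Catalytic-type letters as described in the text.
CATALYTIC_TYPE = {
    "A": "aspartic", "C": "cysteine", "G": "glutamic acid", "M": "metallo",
    "S": "serine", "T": "threonine", "U": "unknown",
}

# Serine, threonine and cysteine peptidases use a catalytic residue as the
# nucleophile; aspartic, glutamic and metallopeptidases use activated water.
NUCLEOPHILE = {
    "S": "catalytic residue", "T": "catalytic residue", "C": "catalytic residue",
    "A": "activated water", "G": "activated water", "M": "activated water",
}


def describe_family(family_id: str) -> Tuple[str, Optional[str]]:
    """Return (catalytic type, nucleophile) for an identifier such as 'S15'."""
    match = re.fullmatch(r"([ACGMSTU])(\d+)", family_id.strip().upper())
    if match is None:
        raise ValueError(f"not a peptidase family identifier: {family_id!r}")
    letter = match.group(1)
    # Type U has no known nucleophile, so .get() returns None for it.
    return CATALYTIC_TYPE[letter], NUCLEOPHILE.get(letter)
```

For instance, `describe_family("M44")` gives `("metallo", "activated water")`, matching the family M44 entry further on.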

    \ \

    The sequences in this family are serine peptidases belonging to MEROPS peptidase family S15 (clan SC) PUBMED:7845208. The type example is X-Pro dipeptidyl-peptidase of Lactococcus lactis.

    \ \

    These proteins, which have a specificity similar to that of mammalian dipeptidyl-peptidase IV, release N-terminal Xaa-Pro dipeptides: the penultimate residue must be proline. In L. lactis the proteins exist as cytoplasmic homodimers PUBMED:7845208.

    \ 5817 IPR009247 \

    This family consists of several Chordopoxvirus sequences homologous to the Vaccinia virus A35R protein. The function of this family is unknown.

    \ 480 IPR000286 \ Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is mediated in part by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases, acetoin utilization proteins and acetylpolyamine amidohydrolases are all members of this ancient protein superfamily PUBMED:9278492.\ 2582 IPR000968 \ Influenza virus belongs to the class of ssRNA negative-strand viruses. Nonstructural protein 2 (NS2) may play a role in promoting normal replication of the genomic RNAs by preventing the replication of short-length RNA species PUBMED:8113739. NS1 and NS2 proteins are produced from the same gene by alternative splicing.\ 1294 IPR001690 \

    Bacterial species have many methods of controlling gene expression and cell growth. Regulation of gene expression in response to changes in cell density is termed quorum sensing PUBMED:10607620, PUBMED:9990077. Quorum-sensing bacteria produce, release and respond to hormone-like molecules (autoinducers) that accumulate in the external environment as the cell population grows. Once a threshold of these molecules is reached, a signal transduction cascade is triggered that ultimately leads to behavioural changes in the bacterium PUBMED:9990077. Autoinducers are thus clearly important mediators of molecular communication.

    \

    Conjugal transfer of Agrobacterium octopine-type Ti plasmids is activated by octopine, a metabolite released from plant tumours PUBMED:8188582. Octopine causes conjugal donors to secrete a pheromone, Agrobacterium autoinducer (AAI), and exogenous AAI further stimulates conjugation. The putative AAI synthase and an AAI-responsive transcriptional regulator have been found to be encoded by the Ti plasmid traI and traR genes, respectively. TraR and TraI are similar to the LuxR and LuxI regulatory proteins of Vibrio fischeri, and AAI is similar in structure to the diffusible V. fischeri autoinducer, the inducing ligand of LuxR. TraR activates target genes in the presence of AAI and also activates traR and traI themselves, creating two positive-feedback loops. TraR-AAI-mediated activation in wild-type Agrobacterium strains is enhanced by culturing on solid media, suggesting a possible role in cell density sensing PUBMED:8188582.

    \

    Production of light by the marine bacterium Vibrio fischeri and by recombinant hosts containing cloned lux genes is controlled by the density of the culture PUBMED:3697093. Density-dependent regulation of lux gene expression has been shown to require a locus consisting of the luxR and luxI genes.

    \

    In these and other Gram-negative bacteria, N-(3-oxohexanoyl)-L-homoserine lactone (OHHL) acts as the autoinducer by binding to transcriptional regulatory proteins and activating them PUBMED:7968529. OHHL and related molecules, such as N-butanoyl- (BHL), N-hexanoyl- (HHL) and N-oxododecanoyl- (PAI) homoserine lactones, are produced by a family of proteins that share a high level of sequence similarity.

    \

    Proteins that are currently members of this family include:\

    \ 3649 IPR006843 \ This family identifies a conserved region found in a number of plastid lipid-associated proteins (PAPs), and in a number of putative fibrillin proteins.\ 335 IPR007316 \ eIF-3 is a multisubunit complex that stimulates translation initiation in vitro at several different steps. This family corresponds to the gamma subunit of eIF3 PUBMED:7542616, PUBMED:9851972.\ 1967 IPR005069 \ This family of worm proteins has no known function.\ 3468 IPR005106 \

    This domain adopts a Rossmann NAD-binding fold. The C-terminal domain of homoserine dehydrogenase contributes a single helix to this structural domain, which is not included in the Pfam model.

    \ 3903 IPR004475 \ This family represents the large subunit, DP2, of a two subunit novel archaebacterial replicative DNA polymerase first characterized for Pyrococcus furiosus. The structure of DP2 appears to be organized as a ~950 residue component separated from a ~300 residue component by a ~150 residue intein. The other subunit, DP1, has sequence similarity to the eukaryotic DNA polymerase delta small subunit.\ 7958 IPR012600 \

    This motif is found at the N-terminal end of some peptidases that belong to MEROPS peptidase family C25 (). Little is known about its function.

    \ 46 IPR001828 \ This describes a ligand binding domain and includes extracellular ligand binding domains of a wide range of receptors, as well as the bacterial amino acid binding proteins of known structure PUBMED:8011339.\ 7362 IPR006571 \

    TLDc is a domain of unknown function, restricted to eukaryotes, and commonly found in TBC () and LysM () domain containing proteins.

    \ 1745 IPR000653 \ The members of this family are probably all pyridoxal-phosphate-dependent aminotransferase enzymes with a variety of molecular functions. The family includes StsA, StsC and StsS PUBMED:9238101. The aminotransferase activity was demonstrated for purified StsC protein as the L-glutamine:scyllo-inosose aminotransferase, which catalyzes the first amino transfer in the biosynthesis of the streptidine subunit of streptomycin PUBMED:9238101.\ 5365 IPR008478 \ This family consists of several uncharacterised proteins from Borrelia burgdorferi and Borrelia garinii.\ 5014 IPR004198 \ Predicted zinc finger with eight potential zinc ligand binding residues. This domain is found in Jumonji PUBMED:11165500, and may have a DNA binding function. The mouse jumonji protein is required for neural tube formation, and is essential for normal heart development. It also plays a role in the down-regulation of cell proliferation signalling.\ 2166 IPR007548 \ This is a family of uncharacterised prokaryotic proteins.\ 122 IPR003305 \ The 1,4-beta-glucanase CenC from Cellulomonas fimi contains two cellulose-binding domains, CBD(N1) and CBD(N2), arranged in tandem at its N-terminus. These homologous CBDs are distinctive in that they bind amorphous but not crystalline cellulose PUBMED:10704194. Multidimensional heteronuclear nuclear magnetic resonance (NMR) spectroscopy was used to determine the tertiary structure of the 152 amino acid N-terminal cellulose-binding domain from C. fimi 1,4-beta-glucanase CenC (CBDN1) PUBMED:8916925. The tertiary structure of CBDN1 is strikingly similar to that of the bacterial 1,3-1,4-beta-glucanases, as well as other sugar-binding proteins with jelly-roll folds.\ 1939 IPR004024 \ This domain is found in several Caenorhabditis elegans proteins. It contains 4 conserved cysteines. This domain is presumably extracellular, and these cysteines probably form disulphide bridges.\ 1447 IPR001588 \

    Caseins PUBMED:3074304 are the major protein constituent of milk. Caseins can be classified into two families; the first consists of the kappa-caseins, and the second groups the alpha-s1, alpha-s2, and beta-caseins. The alpha/beta caseins are a rapidly diverging family of proteins. However, two regions are conserved: a cluster of phosphorylated serine residues and the signal sequence.

    \

    Alpha-s2 casein is known as epsilon-casein in mouse, gamma-casein in rat and casein-A in guinea pig. Alpha-s1 casein is known as alpha-casein in rat and rabbit and as casein-B in guinea pig.

    \ 3087 IPR000413 \

    Integrins are the major metazoan receptors for cell adhesion to extracellular matrix proteins and, in vertebrates, also play important roles in certain cell-cell adhesions, make transmembrane connections to the cytoskeleton and activate many intracellular signaling pathways PUBMED:12297042. Integrins are alpha-beta heterodimers; each subunit crosses the membrane once, with most of the polypeptide in the extracellular space and a short cytoplasmic domain. Most integrins recognise relatively short peptide motifs, and in general require an acidic amino acid to be present. Ligand specificity depends on both the alpha and beta subunits. Many integrins are expressed on cell surfaces in an inactive state in which they do not bind ligands and do not signal. Integrins frequently intercommunicate, and the engagement of one may lead to the activation or inhibition of another.

    \

    The structure of unliganded alphaV beta3 showed the molecule to be folded, with the head bent over towards the C termini of the legs, which would normally be inserted into the membrane. The head comprises a beta-propeller domain at the N terminus of the alphaV subunit and an I/A domain inserted into a loop on the top of the hybrid domain in the beta subunit. The I/A domain consists of a Rossmann fold with a central parallel beta-sheet surrounded by amphipathic alpha-helices.

    \ Some alpha subunits are cleaved post-translationally to produce a heavy and a light chain linked by a disulphide bond PUBMED:3028640, PUBMED:2199285. Integrin alpha chains share a conserved sequence which is found at the beginning of the cytoplasmic domain, just after the end of the transmembrane region. Within the N-terminal domain of alpha subunits, seven sequence repeats, each of approximately 60 amino acids, have been found PUBMED:3327687. It has been predicted that these repeats assume the beta-propeller fold. The domains contain seven four-stranded beta-sheets arranged in a torus around a pseudosymmetry axis PUBMED:8990162. Integrin ligands and a putative Mg2+ ion are predicted to bind to the upper face of the propeller, in a manner analogous to the way in which the trimeric G-protein beta subunit (G beta) (which also has a beta-propeller fold) binds the G protein alpha subunit PUBMED:8990162.\

    Integrin cytoplasmic domains are normally less than 50 amino acids in length, with the beta-subunit sequences exhibiting greater homology to each other than the alpha-subunit sequences PUBMED:12826403. This is consistent with current evidence that the beta subunit is the principal site for binding of cytoskeletal and signalling molecules, whereas the alpha subunit has a regulatory role. The first ten residues of the alpha-subunit cytoplasmic domain appear to form an alpha helix that is terminated by a proline residue. The remainder of the domain is highly acidic in nature and loops back to contact the membrane-proximal lysine anchor residue.

    \ 206 IPR004265 \ This family contains a number of proteins which are induced during disease response in plants.\ 7034 IPR009855 \

    This family consists of several Baculovirus specific late expression factor 10 (LEF-10) sequences. LEF-10 is thought to be a late expressed structural protein although its exact function is unknown PUBMED:12202224.

    \ 4666 IPR006171 \

    This is a conserved region from DNA primase. It corresponds to the Toprim domain common to DnaG primases, topoisomerases, OLD family nucleases and RecR proteins PUBMED:9121560. Both DnaG motifs IV and V are present in the alignment; the DxD (V) motif may be involved in Mg2+ binding, and mutations to the conserved glutamate (IV) completely abolish DnaG-type primase activity. DNA primase is a nucleotidyltransferase: it synthesizes the oligoribonucleotide primers required for DNA replication on the lagging strand of the replication fork; it can also prime the leading strand and has been implicated in cell division PUBMED:8294018. This family also includes the atypical archaeal A subunit from type II DNA topoisomerases PUBMED:9722641. Type II DNA topoisomerases catalyse the relaxation of DNA supercoiling by causing transient double-strand breaks.

    \ 6584 IPR009611 \

    This entry represents the C-terminal region of eukaryotic chorion protein S19. In Drosophilidae, the S19 gene is known to form part of an autosomal cluster that also contains s16, s15 and s18 PUBMED:11404001. Note that members of this family contain a conserved PVA motif, and many contain .

    \ 1111 IPR000978 \ Adenoviruses are responsible for diseases such as pneumonia, cystitis, conjunctivitis and diarrhoea, all of which can be fatal to patients who are immunocompromised PUBMED:7704534. Viral infection commences with recognition of host cell receptors by means of specialised proteins on viral surfaces. Specific attachment of adenovirus is achieved through interactions between host-cell receptors and the adenovirus fiber protein, and is mediated by the globular carboxy-terminal domain of the adenovirus fiber protein, termed the carboxy-terminal knob domain.\ 3751 IPR005072 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
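The two motif definitions above can be written as regular expressions. In this Python sketch the residue classes are assumptions made for illustration: 'b' (uncharged) is taken as any residue except D, E, K, R, H and P; 'c' (hydrophobic) as A, C, F, I, L, M, V, W or Y; and 'a' is restricted to V or T, although the text only says these are the most common. Proline is excluded from every position of the strict pattern, following the observation that it is never found in this site. The names `LOOSE`, `STRICT` and `motif_starts` are hypothetical.

```python
import re
from typing import List

# Assumed residue classes (illustrative, not a published definition).
UNCHARGED = "ACFGILMNQSTVWY"  # 'b': excludes D, E, K, R, H and P
HYDROPHOBIC = "ACFILMVWY"     # 'c'

# Loose motif: any HEXXH. The lookahead reports overlapping hits too.
LOOSE = re.compile(r"(?=HE..H)")

# Strict motif abXHEbbHbc, with 'a' limited to V/T and no proline anywhere.
STRICT = re.compile(
    rf"(?=[VT][{UNCHARGED}][^P]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def motif_starts(pattern: "re.Pattern[str]", sequence: str) -> List[int]:
    """Return 0-based start positions of every match in a protein sequence."""
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

On the toy sequence `GGVVAHELTHAVTD` both patterns hit: the loose one at index 5 (the H of HE) and the strict one at index 2, where the ten-residue window begins. Replacing the L after HE with P keeps the loose hit but removes the strict one, which is the extra selectivity the stricter definition is meant to provide.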

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M44 (clan ME). The active site residues for members of this family and family M16 occur in the motif HXXEH. The type example is the vaccinia virus-type metalloendopeptidase G1 from vaccinia virus, which is expressed by many Poxviridae and appears to play a role in the maturation of viral proteins.

    \ 3183 IPR007025 \ Late expression factor 8 (LEF-8) is one of the primary components of the RNA polymerase encoded by polyhedrosis viruses. LEF-8 shows homology to the second largest subunit of prokaryotic DNA-directed RNA polymerase PUBMED:12124466.\ 1048 IPR006114 \

    6-Phosphogluconate dehydrogenase (6PGD) catalyses the NADP-dependent oxidative decarboxylation of 6-phosphogluconate to ribulose 5-phosphate. This reaction is a component of the hexose monophosphate shunt, or pentose phosphate pathway (PPP) PUBMED:2113917, PUBMED:6641716. Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequences are highly conserved PUBMED:1659648. The protein is a homodimer in which the monomers act independently PUBMED:6641716: each contains a large, mainly alpha-helical domain and a smaller beta-alpha-beta domain, containing a mixed parallel and anti-parallel 6-stranded beta sheet PUBMED:6641716. NADP is bound in a cleft in the small domain, with the substrate binding in an adjacent pocket PUBMED:6641716.

    This entry represents the C-terminal all-alpha domain of 6-phosphogluconate dehydrogenase. The domain contains two structural repeats of 5 helices each. The NAD-binding domain is described in .

    \ 2689 IPR001191 \ Geminiviruses are characterised by a genome of circular single-stranded DNA encapsidated\ in twinned (geminate) quasi-isometric particles, from which the group derives its name\ PUBMED:. Most geminiviruses can be divided\ into 2 subgroups on the basis of host range and/or insect vector: those that infect dicotyledonous plants and are transmitted by the same whitefly species, and\ those that infect monocotyledonous plants and are transmitted by different leafhopper\ vectors. The whitefly-transmitted cassava latent (CLV),\ tomato golden mosaic (TGMV) and bean golden mosaic (BGMV) viruses possess a bipartite\ genome. By contrast, only a single DNA component has been identified for the leafhopper-transmitted \ maize streak (MSV) and wheat dwarf (WDV) viruses PUBMED:6526009, PUBMED:2829117.\ Beet curly top (BCTV), bean summer death and tobacco yellow dwarf viruses belong to a\ third possible subgroup. BCTV is transmitted by a specific leafhopper species, yet like\ the whitefly-transmitted geminiviruses it has a host range confined to dicotyledonous\ plants.\ 4326 IPR007568 \ This family comprises fungal proteins with multiple transmembrane regions. RTA1 is involved in resistance to 7-aminocholesterol PUBMED:8660468, while RTM1 confers resistance to an unknown toxic chemical in molasses PUBMED:7672593. These proteins may bind to the toxic substance and thus prevent toxicity. They are not thought to be involved in the efflux of xenobiotics PUBMED:8660468.\ 2205 IPR007537 \ This is a family of uncharacterised eukaryotic proteins.\ 3664 IPR002021 \ The nucleocapsid protein is referred to as NP. NP is the major\ structural component of the nucleocapsid. The protein is approx.\ 58 kDa. 
2600 NP molecules tightly encapsidate the viral RNA.\ NP interacts with several other virus-encoded proteins, all of which are \ involved in controlling replication: NP-NP, NP-P, NP-(PL), \ and NP-V PUBMED:9125045, PUBMED:8806522, PUBMED:8396656.\ 5045 IPR007687 \ Methyl coenzyme M reductase (MCR) catalyses the final step in methanogenesis. MCR is composed of three subunits, alpha, beta and gamma PUBMED:8863453. Genes encoding the beta (mcrB) and gamma (mcrG) subunits are separated by two open reading frames coding for two proteins, C and D PUBMED:3170483. The function of proteins C and D is unknown.\ 2158 IPR007523 \

    This is a family of uncharacterised proteins possibly involved in DNA repair.

    \ 3096 IPR006830 \ This family represents the Salmonella outer membrane lipoprotein InvH. The molecular function of this protein is unknown, but it is required for the localisation of InvG to the outer membrane; InvG is involved in a type III secretion apparatus mediating host cell invasion PUBMED:9680224, PUBMED:9786184.\ 1866 IPR003326 \ This domain is found in a family of proteins from Caenorhabditis elegans. The domain has no known function, but has 4 conserved cysteine residues and is at most 175 residues long.\ 4613 IPR007077 \

    This domain is found in a number of bacterial proteins including the TfoX gene product of Haemophilus influenzae. TfoX may play a key role in the development of genetic competence by regulating the expression of late competence-specific genes PUBMED:7724607. This family corresponds to the presumed C-terminal domain of TfoX. The domain is found in association with the N-terminal domain in some, but not all, members of this group, suggesting that it is an autonomous and functionally unrelated domain. For example it is found associated with in .

    \ 517 IPR006155 \ Human genes containing triplet repeats can markedly expand in length, leading\ to neuropsychiatric disease. Expansion of triplet repeats explains the\ phenomenon of anticipation, i.e. the increasing severity or earlier age of\ onset in successive generations in a pedigree PUBMED:8325628.\ A novel gene containing CAG repeats has been identified and mapped to\ chromosome 14q32.1, the genetic locus for Machado-Joseph disease (MJD).\ Normally, the gene contains 13-36 CAG repeats, but most clinically diagnosed\ patients and all affected members of a family with the clinical and \ pathological diagnosis of MJD show expansion of the repeat number to \ 68-79 repeats PUBMED:7874163. Similar abnormalities in related genes may give rise to diseases\ similar to MJD. \ MJD is a neurodegenerative disorder characterised by cerebellar ataxia, \ pyramidal and extra-pyramidal signs, peripheral nerve palsy, external \ ophthalmoplegia, facial and lingual fasciculation and bulging. The disease\ is autosomal dominant, with late onset of symptoms, generally after the\ fourth decade.\ 2019 IPR005642 \

    Members of this family contain a conserved core of four predicted transmembrane segments. Some members have an additional pair of N-terminal transmembrane helices. The functions of the proteins in this family are unknown.

    \ 134 IPR001715 \

    The calponin homology domain (also known as CH-domain) is a superfamily of actin-binding domains found in both cytoskeletal proteins and signal transduction proteins PUBMED:7589522. It comprises the following groups of actin-binding domains:\

    \

    A comprehensive review of proteins containing this type of actin-binding domain is given in PUBMED:7584474.

    \

    The CH domain is involved in actin binding in some members of the family. However, in calponin there is evidence that the CH domain is not involved in its actin-binding activity PUBMED:9625744. Most proteins have two copies of the CH domain; however, some proteins such as calponin and the human vav proto-oncogene product have only a single copy. The structure of an example CH domain has recently been solved PUBMED:9164454.

    \ 6927 IPR009793 \

    This family consists of several hypothetical bacterial proteins of around 200 residues in length. The function of this family is unknown although some members are annotated as being putative integral membrane proteins.

    \ 1448 IPR001707 \

    Chloramphenicol acetyltransferase (CAT) PUBMED:1867713 catalyzes the acetyl-CoA-dependent acetylation of chloramphenicol (Cm), an antibiotic that inhibits prokaryotic peptidyltransferase activity. Acetylation of Cm by CAT inactivates the antibiotic. A histidine residue, located in the C-terminal section of the enzyme, plays a central role in its catalytic mechanism.

    \

    There is a second family of CAT PUBMED:1314803, evolutionarily unrelated to the main family described above. These CATs belong to the bacterial hexapeptide-repeat-containing transferase family (see ).

    \

    The crystal structure of the type III enzyme from Escherichia coli with chloramphenicol bound has been determined. CAT is a trimer of identical subunits (monomer Mr 25,000) and the trimeric structure is stabilized by a number of hydrogen bonds, some of which result in the extension of a beta-sheet across the subunit interface. Chloramphenicol binds in a deep pocket located at the boundary between adjacent subunits of the trimer, such\ that the majority of residues forming the binding pocket belong to one subunit while the catalytically essential histidine belongs to the adjacent subunit. His195 is appropriately positioned to act as a general base catalyst in the reaction, and the required tautomeric stabilization is provided by an unusual interaction with a main-chain carbonyl oxygen PUBMED:2187098.

    \ 5362 IPR008651 \ This family consists of several bacterial HicB related proteins. The function of HicB is unknown although it is thought to be involved in pilus formation. It has been speculated that HicB performs a function antagonistic to that of pili and yet is necessary for invasion of certain niches PUBMED:9721313.\ 397 IPR003018 \ This domain is present in phytochromes and cGMP-specific phosphodiesterases. cGMP-dependent 3',5'-cyclic phosphodiesterase catalyses the conversion of guanosine 3',5'-cyclic phosphate to guanosine 5'-phosphate.\ A phytochrome is a regulatory photoreceptor which exists in two forms that are reversibly interconvertible by light: the Pr form, which absorbs maximally in the red region of the spectrum, and the Pfr form, which absorbs maximally in the far-red region. This domain is also found in NifA, a transcriptional activator which is required for activation of most Nif operons which are directly involved in nitrogen fixation. \ NifA interacts with sigma-54.\ 2804 IPR004257 \ This family contains a predicted structural envelope protein GP4 from equine arteritis virus (EAV).\ 443 IPR000715 \ This pattern describes a family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P\ transferases. Members of the family include eukaryotic N-acetylglucosamine-1-phosphate\ transferases, which catalyze the conversion of UDP-N-acetyl-D-glucosamine and dolichyl\ phosphate to UMP and N-acetyl-D-glucosaminyl-diphosphodolichol in the glycosylation pathway;\ and bacterial phospho-N-acetylmuramoyl-pentapeptide-transferases, which catalyze the first step\ of the lipid cycle reactions in the biosynthesis of cell wall peptidoglycan.\ 6357 IPR010533 \

    This entry includes vertebrate transcription factors, some of which are regulated by IL-3/adenovirus E4 promoter binding protein PUBMED:1620116. Others were found to strongly repress transcription in a DNA-binding-site-dependent manner PUBMED:1620116.

    \ 6427 IPR009519 \

    This family consists of several hypothetical Fijivirus proteins of unknown function.

    \ 5432 IPR008494 \ This family consists of several highly related Mus musculus and Homo sapiens proteins of unknown function.\ 4299 IPR004942 \

    This family includes proteins that are about 100 amino acids long. Members of this family of proteins are associated with both flagellar outer\ arm dynein and Drosophila and rat brain cytoplasmic dynein. It has been proposed that roadblock/LC7 family members may modulate specific dynein\ functions PUBMED:10402468.

    \ \ 8028 IPR013269 \

    This entry contains Orf UL2 of Human cytomegalovirus (HCMV), which is a short protein of unknown function PUBMED:12533697.

    \ 6797 IPR010723 \

    Proteins containing this domain are all oxygen-independent coproporphyrinogen-III oxidases (HemN). This enzyme catalyses the oxygen-independent conversion of coproporphyrinogen-III to protoporphyrinogen-IX PUBMED:12196143, one of the last steps in haem biosynthesis. The function of this domain is unclear, but comparison with other proteins containing a radical SAM domain suggests it may be a substrate-binding domain.

    \ 7437 IPR011466 \

    These proteins from several diverse bacteria share a short conserved sequence towards their N termini.

    \ 6310 IPR010515 \

    NC10 stands for Non-helical region 10 and is taken from . A mutation in this region in is associated with an increased risk of prostate cancer. This domain is cleaved from the precursor and forms endostatin. Endostatin is a key tumour suppressor and has been used to treat cancer. It is a potent angiogenesis inhibitor PUBMED:11606364. Endostatin also binds a zinc ion near the N terminus; according to PUBMED:10704302, this is likely to be of structural rather than functional importance.

    \ 576 IPR007754 \

    N-acetylglucosaminyltransferase II is a Golgi resident enzyme that catalyzes an essential step in the biosynthetic pathway leading from high mannose to complex N-linked oligosaccharides PUBMED:7797505. Mutations in the MGAT2 gene lead to a congenital disorder of glycosylation (CDG IIa). CDG IIa patients have an increased bleeding tendency, unrelated to coagulation factors PUBMED:11596651.

    \

    Synonym(s): UDP-N-acetyl-D-glucosamine:alpha-6-D-mannoside beta-1,2-N-acetylglucosaminyltransferase II, GnT II/MGAT2.

    \ 3441 IPR011601 \

    This entry represents a C-terminal conserved region of UDP-N-acetylenolpyruvoylglucosamine reductase, which is also called UDP-N-acetylmuramate dehydrogenase. It is part of the pathway for the biosynthesis of UDP-N-acetylmuramoyl-pentapeptide, which is a precursor of bacterial peptidoglycan.

    \ \ 4700 IPR004244 \ Many human L1 elements are capable of retrotransposition. Some of these have been shown to exhibit reverse transcriptase (RT) activity PUBMED:9140393, although the function of many is as yet unknown.\ 6396 IPR009506 \

    This family is found in several hypothetical bacterial proteins. In some cases it represents the C-terminal region whereas in others it represents the whole sequence.

    \ 6376 IPR010544 \

    This domain represents a region within kinesin-related proteins from higher plants. Many proteins containing this domain also contain the domain. Kinesins are ATP-driven microtubule motor proteins that produce directed force PUBMED:12471890. Some family members are associated with the phragmoplast, a structure composed mainly of microtubules that executes cytokinesis in higher plants PUBMED:10898978.

    \ 1340 IPR007799 \

    This family consists of baculoviral p47 proteins. p47 is one of the primary components of the Autographa californica\ multinucleocapsid polyhedrovirus-encoded RNA polymerase, which initiates transcription from late and very late promoters PUBMED:9733837.\

    \ 7344 IPR011116 \

    SecA protein binds to the plasma membrane where it interacts with proOmpA to support translocation of proOmpA through the membrane. SecA protein achieves this translocation, in association with SecY protein, in an ATP-dependent manner. This domain is composed of two C-terminal alpha helical subdomains: the wing and scaffold subdomains.

    \ 7268 IPR010000 \

    This family consists of several caerin 1 proteins from Litoria species (Australian tree frogs). The caerin 1 peptides are among the most powerful of the broad-spectrum antibiotic amphibian peptides PUBMED:12717721. These peptides are secreted from amphibian skin and can interact with and disrupt bacterial membranes, leading to permeabilisation of the cell membrane. Caerin 1.1 forms a helix-bend-helix structure, in which both helices are required for activity and the bend region is required for flexibility.

    \ 1922 IPR003826 \ Members of this family are related to the amino terminus of Escherichia coli S-adenosylmethionine decarboxylase.\ 2578 IPR001561 \ Matrix protein (M1) of influenza virus is a bifunctional protein that mediates the\ encapsidation of RNA-nucleoprotein cores into the membrane envelope. It is therefore\ required that M1 binds both membrane and RNA simultaneously PUBMED:9164466.\ 8083 IPR013214 \

    Mastoparan (MP) peptides I, II and III are extracted from the venom gland of the Neotropical social wasp Protopolybia exigua (Saussure). They are tetradecapeptides containing seven to ten hydrophobic amino acid residues and two to four lysine residues in their primary sequences. These peptides cause the degranulation of mast cells. Protopolybia-MP-I also causes haemolysis of erythrocytes.

    \ 5284 IPR008606 \ This family consists of several eukaryotic translation initiation factor 4E binding proteins (EIF4EBP1, -2 and -3). Translation initiation in eukaryotes is mediated by the cap structure (m7GpppN, where N is any nucleotide) present at the 5' end of all cellular mRNAs except organellar ones. The cap is recognised by eukaryotic initiation factor 4F (eIF4F), which consists of three polypeptides, including eIF4E, the cap-binding protein subunit. The interaction of the cap with eIF4E facilitates the binding of the ribosome to the mRNA. eIF4E activity is regulated in part by the translational repressors 4E-BP1, 4E-BP2 and 4E-BP3, which bind to it and prevent its assembly into eIF4F PUBMED:9593750.\ 5695 IPR008569 \ This 32-residue repeat occurs in proteins of unknown function and appears to be restricted to Caenorhabditis elegans.\ 977 IPR006926 \ This protein forms part of the Class C vacuolar protein sorting (Vps) complex. Vps16 is essential for vacuolar protein sorting, which is essential for viability in plants, but not yeast PUBMED:11702788. The Class C Vps complex is required for SNARE-mediated membrane fusion at the lysosome-like yeast vacuole. It is thought to play essential roles in membrane docking and fusion at the Golgi-to-endosome and endosome-to-vacuole stages of transport PUBMED:11422941. The role of Vps16 in this complex is not known.\ 3977 IPR005022 \

    Proteins in this family function as trans-activators of viral late genes.

    \ 3065 IPR000779 \ T-Lymphocytes regulate the growth and differentiation of certain lymphopoietic and\ haemopoietic cells through the release of various secreted protein factors PUBMED:3918306.\ These factors, which include interleukin-2 (IL2), are secreted by lectin- or antigen-stimulated\ T-cells, and have various physiological effects. IL2 is a lymphokine that induces the\ proliferation of responsive T-cells. In addition, it acts on some B-cells, via receptor-specific\ binding PUBMED:3517854, as a growth factor and antibody production stimulant PUBMED:1510960. The\ protein is secreted as a single glycosylated polypeptide, and cleavage of a signal sequence\ is required for its activity PUBMED:3517854. Solution NMR suggests that the structure of IL2 comprises a\ bundle of 4 helices (termed A-D), flanked by 2 shorter helices and several poorly-defined\ loops. Residues in helix A, and in the loop region between helices A and B, are important for\ receptor binding. Secondary structure analysis has suggested similarity to IL4 and \ granulocyte-macrophage colony stimulating factor (GMCSF) PUBMED:1510960.\ \ 1533 IPR001002 \

    A number of plant and fungal proteins that bind N-acetylglucosamine (e.g. solanaceous lectins of tomato and potato, plant endochitinases, the wound-induced proteins: hevein, win1 and win2, and the Kluyveromyces lactis killer toxin alpha subunit) contain this domain PUBMED:1757999. The domain may occur in one or more copies and is thought to be involved in recognition or binding of chitin subunits PUBMED:2070799, PUBMED:1375935. In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal sequence and is therefore at the N-terminus of the mature protein; in the killer toxin alpha subunit it is located in the central section of the protein.

    \ 131 IPR006695 \ Centromere Protein B (CENP-B) is a DNA-binding protein localized to the centromere. Within the N-terminal 125 residues, there is a DNA-binding region, which binds to a corresponding 17bp CENP-B box sequence. CENP-B dimers bind either two separate DNA molecules or two CENP-B boxes on one DNA molecule, with the intervening stretch of DNA forming a loop structure. The CENP-B DNA-binding domain consists of two repeating domains, RP1 and RP2. This family corresponds to RP1, which has been shown to consist of four helices in a helix-turn-helix structure PUBMED:9451007.\ 4487 IPR004761 \

    Amino acid permeases are integral membrane proteins involved in the transport\ of amino acids into the cell. A number of such proteins have been found to be\ evolutionarily related PUBMED:3146645, PUBMED:2687114, PUBMED:8382989.\ These proteins seem to contain up to 12 transmembrane segments. The best conserved region\ in this family is located in the second transmembrane segment.

    \

    Spore germination protein (amino acid permease) is involved in the response to the germinative mixture of L-asparagine, glucose, fructose and potassium ions (AFFK). These proteins could be amino acid transporters.

    \ 4243 IPR002906 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as a C-terminal extension of ubiquitin (CEP). The S27a\ domain comprises the C-terminal half of the protein.\ The synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a transient metabolic stabilization and is required for efficient ribosome biogenesis PUBMED:2538753. The ribosomal extension protein S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a\ source of proteins PUBMED:2538756.

    \ 6508 IPR009566 \

    This family consists of several hypothetical proteins of around 120 residues in length which are found specifically in Trypanosoma brucei. The function of this family is unknown.

    \ 1646 IPR002124 \

    Cytochrome c oxidase is an oligomeric enzymatic complex which is a component \ of the respiratory chain complex and is involved in the transfer of electrons from \ cytochrome c to oxygen PUBMED:6307356. \ In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in \ aerobic prokaryotes it is found in the plasma membrane.

    \

    In eukaryotes, in addition to the \ three large subunits, I, II and III, that form the catalytic center of the enzyme complex, there are \ a variable number of small polypeptidic subunits. One of these subunits, known as Vb in mammals, V in slime mold and IV in yeast, binds a zinc atom. The sequence of subunit Vb is well conserved and includes three conserved cysteines that coordinate the zinc ion PUBMED:1661610, PUBMED:8638158. Two of these cysteines are clustered in the C-terminal section of the subunit.

    \ 7042 IPR010815 \

    This family consists of several hypothetical Enterobacterial proteins of around 100 residues in length. Members of this family are often described as YbjC. In Escherichia coli the ybjC gene is located downstream of nfsA (which encodes the major oxygen-insensitive nitroreductase). It is thought that nfsA and ybjC form an operon and that its promoter is a class I SoxS-dependent promoter PUBMED:11741843. The function of this family is unknown.

    \ 4762 IPR000884 \

    This repeat was first described in 1986 by Lawler and Hynes PUBMED:2430973. It was found in the thrombospondin protein, where it is repeated 3 times. A number of proteins involved in the complement pathway (properdin, C6, C7, C8A, C8B, C9) PUBMED:2459396, as well as extracellular matrix proteins such as mindin, F-spondin PUBMED:10409509 and SCO-spondin, and even the circumsporozoite surface protein 2 and TRAP proteins of Plasmodium PUBMED:10508153, PUBMED:1501644, are now known to contain one or more instances of this repeat.\ It has been implicated in cell-cell interaction, inhibition of angiogenesis PUBMED:10500044 and\ apoptosis PUBMED:9135017.

    \

    The intron-exon organisation of the properdin gene confirms the hypothesis \ that the repeat might have evolved by a process involving exon shuffling PUBMED:1417780.\ A study of properdin structure provides some information about the structure of\ the thrombospondin type I repeat PUBMED:1868073.

    \ 1744 IPR006081 \

    Defensins are 2-6 kDa, cationic, microbicidal peptides active against many Gram-negative and Gram-positive bacteria, \ fungi, and enveloped viruses PUBMED:8528769, containing three pairs of intramolecular disulphide bonds. On the basis of their size and pattern of\ disulphide bonding, mammalian defensins are classified into alpha, beta and theta categories. Alpha-defensins, which have been identified in humans, monkeys and several\ rodent species, are particularly abundant in neutrophils, certain macrophage populations and Paneth cells of the small intestine.

    Defensins are produced constitutively and/or in response to microbial products or proinflammatory cytokines. Some defensins are also called corticostatins (CS) because \ they inhibit corticotropin-stimulated corticosteroid production. The mechanism(s) by which microorganisms are killed and/or inactivated by defensins is not understood completely. However, it is generally believed that killing is a\ consequence of disruption of the microbial membrane. The polar topology of defensins, with spatially separated charged and hydrophobic regions, allows them to\ insert themselves into the phospholipid membranes so that their hydrophobic regions are buried within the lipid membrane interior and their charged (mostly cationic)\ regions interact with anionic phospholipid head groups and water. Subsequently, some defensins can aggregate to form 'channel-like' pores; others might bind to and cover the microbial membrane in a 'carpet-like' manner. The net outcome is the disruption of membrane integrity and function,\ which ultimately leads to the lysis of microorganisms. Some defensins are synthesized as propeptides which may be relevant to this process.

    Human neutrophil-derived alpha-defensins (HNPs) are\ capable of enhancing phagocytosis by mouse macrophages. HNP1-3 have been reported to increase the production of tumor necrosis factor (TNF) and IL-1, while decreasing the production of IL-10 by monocytes.\ Increased levels of proinflammatory factors (e.g. IL-1, TNF, histamine and prostaglandin D2) and suppressed levels of IL-10 at the site of microbial infection are likely to\ amplify local inflammatory responses. This might be further reinforced by the capacity of some human and rabbit alpha-defensins to inhibit the production of\ immunosuppressive glucocorticoids by competing for the binding of adrenocorticotropic hormone to its receptor. Moreover, human alpha-defensins can enhance\ or suppress the activation of the classical pathway of complement in vitro by binding to solid-phase or fluid-phase complement C1q, respectively. The\ capacity of defensins to enhance phagocytosis, promote neutrophil recruitment, enhance the production of proinflammatory cytokines, suppress anti-inflammatory\ mediators and regulate complement activation argues that defensins upregulate innate host inflammatory defenses against microbial invasion.

    \ 1345 IPR007589 \ This family constitutes the 39 kDa major capsid protein of the Baculoviridae PUBMED:2644736.\ 3741 IPR000819 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to the MEROPS peptidase family M17 (leucyl aminopeptidase family, clan MF), the type example being leucyl aminopeptidase from Bos taurus.

    \ \

    Aminopeptidases are exopeptidases involved in the processing and regular\ turnover of intracellular proteins, although their precise role in cellular\ metabolism is unclear PUBMED:1555602, PUBMED:2395881. Leucine aminopeptidases cleave leucine residues\ from the N terminus of polypeptide chains, but substantial rates are evident\ for all amino acids PUBMED:2395881.

    \ \

    The enzymes exist as homo-hexamers, comprising 2 trimers stacked on top of one another PUBMED:2395881. Each monomer binds 2 zinc ions and folds into 2 alpha/beta-type quasi-spherical globular domains, producing a comma-like shape PUBMED:2395881. The N-terminal 150 residues form a 5-stranded beta-sheet with 4 parallel and 1 anti-parallel strand sandwiched between 4 alpha-helices PUBMED:2395881. An alpha-helix extends into the C-terminal domain, which comprises a central 8-stranded saddle-shaped beta-sheet sandwiched between groups of helices, forming the monomer hydrophobic core PUBMED:2395881. A 3-stranded beta-sheet resides on the surface of the monomer, where it interacts with other members of the hexamer PUBMED:2395881. The 2 zinc ions and the active site are entirely located in the C-terminal catalytic domain PUBMED:2395881.

    \ 5569 IPR008705 \ This family contains a conserved novel zinc finger domain found in the eukaryotic proteins Nanos and Xcat-2. In Drosophila melanogaster, Nanos functions as a localised determinant of posterior pattern. Nanos RNA is localised to the posterior pole of the maturing egg cell and encodes a protein that emanates from this localised source. Nanos acts as a translational repressor and thereby establishes a gradient of the morphogen Hunchback PUBMED:7601003. Xcat-2 RNA is found in the vegetal cortical region, is inherited by the vegetal blastomeres during development and is degraded very early in development. The localised and maternally restricted expression of Xcat-2 RNA suggests a role for its protein in setting up regional differences in gene expression that occur early in development PUBMED:8223259.\ 6768 IPR009706 \

    This family consists of several hypothetical bacterial proteins of around 200 residues in length. The function of this family is unknown.

    \ 5187 IPR008024 \

    This domain consists of two transmembrane helices and a conserved linking section.

    \ 4737 IPR004364 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    This entry includes the asparagine, aspartic acid and lysine tRNA synthetases.

    \ 2924 IPR002580 \ This family consists of various herpesvirus proteins: the gene 20 product, U49 protein, UL24 protein and BXRF1. The UL24 gene (the 24th ORF) is not essential for virus replication, but mutants with lesions in UL24 show a reduced ability to replicate in tissue culture and have reduced thymidine kinase activity, as the UL24 gene overlaps with the thymidine kinase gene PUBMED:9501052.\ 1153 IPR000031 \

    Phosphoribosylaminoimidazole carboxylase is a fusion protein in plants and fungi, but consists of two non-interacting proteins in bacteria, PurK and PurE. PurK, N5-carboxyaminoimidazole ribonucleotide (N5-CAIR) synthetase, catalyzes the conversion of 5-aminoimidazole ribonucleotide (AIR), ATP and bicarbonate to N5-CAIR, ADP and Pi. PurE converts N5-CAIR to CAIR, the sixth step of de novo purine biosynthesis. In the presence of high concentrations of bicarbonate, PurE is reported to be able to convert AIR to CAIR directly and without ATP. Some members of this family contain two copies of this domain PUBMED:10074353. The crystal structure of PurE indicates a unique quaternary structure that confirms the octameric nature of the enzyme PUBMED:10574791.

    \ 954 IPR006990 \ None of the members of the tweety (tty) family have been functionally characterized. However, they are considered to be transmembrane proteins with five potential membrane-spanning regions. A number of potential functions have been suggested because their hydrophobicity profiles resemble those of the yeast FTR1 and FTH1 iron transporter proteins and the mammalian neurotensin receptors 1 and 2, although there is no detectable sequence homology between these proteins and the tweety-related proteins. It has been proposed that the tweety-related proteins could be involved in transport of iron or other divalent cations, or alternatively that they may be membrane-bound receptors PUBMED:10950931.\ 611 IPR004041 \

    The NAF domain is a 24-amino-acid domain found in a plant-specific subgroup of serine-threonine protein kinases (CIPKs), which interact with calcineurin B-like calcium sensor proteins (CBLs). Whereas the N-terminal part of CIPKs comprises a conserved catalytic domain typical of Ser-Thr kinases, the much less conserved C-terminal domain appears to be unique to this subgroup of kinases. The only exception is the NAF domain, which forms an 'island of conservation' in this otherwise variable region. The NAF domain has been named after the prominent conserved amino acids Asn-Ala-Phe. It represents a minimal protein interaction module that is both necessary and sufficient to mediate the interaction with the CBL calcium sensor proteins PUBMED:11230129.

    \

    The secondary structure of the NAF domain is currently not known, but secondary structure computation of the C-terminal region of Arabidopsis thaliana CBL-interacting protein kinase 1 revealed a long helical structure PUBMED:11230129.

    \ 1283 IPR002699 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \

    The CF(0) D subunit may be an integral part of the catalytic sector of the V-ATPase PUBMED:7831318. Proteins in this family include V-type H+ transporting and Na+ dependent ATPases.

    \ 323 IPR007808 \ This family of uncharacterised, mostly short, proteins contains a putative zinc-binding domain with four conserved cysteines.\ 3139 IPR000001 \ Kringles are autonomous structural domains found throughout the blood clotting and fibrinolytic proteins. Kringle domains are believed to play a role in binding mediators (e.g., membranes, other proteins or phospholipids) and in the regulation of proteolytic activity PUBMED:3886654, PUBMED:6373375, PUBMED:2157850. \ Kringle domains PUBMED:3131537, PUBMED:3891096, PUBMED:1879523 are characterised by a triple-loop, three-disulphide-bridge structure, whose conformation is defined by a number of hydrogen bonds and small pieces of antiparallel beta-sheet. They are found in varying numbers of copies in some plasma proteins, including prothrombin and urokinase-type plasminogen activator, which are serine proteases belonging to MEROPS peptidase family S1A.\ 4865 IPR005358 \

    This family of proteins contains 8 conserved cysteines that may form a zinc-binding site. The function of these proteins is unknown.

    \ 4981 IPR002706 \

    DNA-repair protein Xrcc1 functions in the repair of single-strand DNA breaks in mammalian cells and forms a repair complex with beta-Pol, ligase III and PARP PUBMED:10467087. The NMR solution structure of the Xrcc1 N-terminal domain (Xrcc1 NTD) shows that the structural core is a beta-sandwich with beta-strands connected by loops, three helices and two short two-stranded beta-sheets at each connection side. The Xrcc1 NTD specifically binds single-strand break DNA (gapped and nicked) and a gapped DNA-beta-Pol complex PUBMED:10467102.

    \ 5464 IPR008513 \ This family consists of several bacterial proteins of unknown function.\ 440 IPR005027 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates, and related proteins, into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 43 comprises enzymes with only one known activity, beta-glucuronyltransferase.

    \ 7918 IPR012622 \

    This family consists of ergtoxin peptides, which are toxins secreted by scorpions. The ergtoxins are capable of blocking the function of K+ channels. More than 100 ergtoxins have been found in scorpion venoms, and they have been classified into three subfamilies according to their primary structures PUBMED:15211519.

    \ 6078 IPR009362 \

    This is a family of uncharacterised proteins found in viruses, archaea and bacteria.

    \ 2535 IPR007057 \

    This family contains the archaeal flagellar protein F and related proteins; they appear to be distantly related to .

    \ 418 IPR006096 \

    Glutamate, leucine, phenylalanine and valine dehydrogenases are structurally and functionally related. They contain a Gly-rich region containing a conserved Lys residue, which has been implicated in the catalytic activity, in each case a reversible oxidative deamination reaction.

    \

    Glutamate dehydrogenases (GluDH) are enzymes that catalyze the NAD- and/or NADP-dependent reversible deamination of L-glutamate into alpha-ketoglutarate PUBMED:1358610, PUBMED:8315654. GluDH isozymes are generally involved with either ammonia assimilation or glutamate catabolism. Two separate enzymes are present in yeasts: the NADP-dependent enzyme, which catalyses the amination of alpha-ketoglutarate to L-glutamate, and the NAD-dependent enzyme, which catalyses the reverse reaction PUBMED:2989290. The latter links the L-amino acids with the Krebs cycle, which provides a major pathway for metabolic interconversion of alpha-amino acids and alpha-keto acids PUBMED:3368458.

    \

    Leucine dehydrogenase (LeuDH) is an NAD-dependent enzyme that catalyzes the reversible deamination of leucine and several other aliphatic amino acids to their keto analogues PUBMED:3069133. Each subunit of this octameric enzyme from Bacillus sphaericus contains 364 amino acids and folds into two domains, separated by a deep cleft. The nicotinamide ring of the NAD+ cofactor binds deep in this cleft, which is thought to close during the hydride transfer step of the catalytic cycle.

    \ \ 519 IPR003855 \ This is a family of K+ transporters that are conserved across phyla, with bacterial (KUP) PUBMED:8226635, yeast (HAK) PUBMED:7621817 and plant (AtKT) PUBMED:9350997 sequences as members.\ 5274 IPR008607 \ Trypanosoma brucei escapes destruction by the host immune system by regularly replacing its Variant Surface Glycoprotein (VSG) coat. The VSG is expressed in a VSG expression site, together with expression site associated genes (ESAG) 6 and 7, which encode the heterodimeric transferrin receptor (Tf-R). There are around 20 VSG expression sites, and trypanosomes can change the site that is active. Since ESAG6 and 7 in different expression sites differ somewhat in sequence, expression site switching results in the production of a slightly different Tf-R PUBMED:11814575.\ 112 IPR004178 \ Small-conductance Ca2+-activated K+ channels (SK channels) are independent of voltage and gated solely by intracellular Ca2+. These membrane channels are heteromeric complexes that comprise pore-forming alpha-subunits and the Ca2+-binding protein calmodulin (CaM) PUBMED:11323678. CaM binds to the SK channel through the CaM-binding domain (CaMBD), which is located in an intracellular region of the alpha-subunit immediately C-terminal to the pore. Channel opening is triggered when Ca2+ binds the EF hands in the N-lobe of CaM. The structure of this domain complexed with CaM is known PUBMED:11323678. This domain forms an elongated dimer with a CaM molecule bound at each end; each CaM wraps around three alpha-helices, two from one CaMBD subunit and one from the other.\ 7222 IPR010869 \

    This family contains a number of hypothetical bacterial proteins of unknown function, approximately 400 residues long.

    \ 4835 IPR005227 \

    Holliday junction resolvases (HJRs) are key enzymes of DNA recombination. The principal HJRs are now known or confidently predicted for all bacteria and archaea whose genomes have been completely sequenced, with many species encoding multiple potential HJRs. Structural and evolutionary relationships of HJRs and related nucleases suggest that the HJR function has evolved independently from at least four distinct structural folds, namely RNase H, endonuclease, endonuclease VII-colicin E and RusA.

    \ \

    Horizontal gene transfer, lineage-specific gene loss and gene family expansion, and non-orthologous gene displacement seem to have been major forces in the evolution of HJRs and related nucleases. A remarkable case of displacement is seen in the Lyme disease spirochete Borrelia burgdorferi, which does not possess any of the typical HJRs, but instead encodes, in its chromosome and each of the linear plasmids, members of the exonuclease family predicted to function as HJRs. The diversity of HJRs and related nucleases in bacteria and archaea contrasts with their near absence in eukaryotes. The few detected eukaryotic representatives of the endonuclease fold and the RNase H fold have probably been acquired from bacteria via horizontal gene transfer. The identity of the principal HJR(s) involved in recombination in eukaryotes remains uncertain; this function could be performed by topoisomerase IB or by a novel, so far undetected, class of enzymes. Likely HJRs and related nucleases were identified in the genomes of numerous bacterial and eukaryotic DNA viruses. Gene flow between viral and cellular genomes has probably played a major role in the evolution of this class of enzymes.

    \ \ \

    This family represents the YqgF family of putative Holliday junction resolvases. With the exception of the spirochetes, the YqgF family is represented in all bacterial lineages, including the mycoplasmas with their highly degenerate genomes.

    \

    The RuvC resolvases are conspicuously absent in the low-GC Gram-positive bacterial lineage, with the exception of Ureaplasma urealyticum PUBMED:10982859. Furthermore, loss-of-function ruvC mutants of E. coli show a residual HJR activity that cannot be ascribed to the prophage-encoded RusA resolvase PUBMED:8648624. This suggests that the YqgF family proteins could be alternative HJRs whose function partially overlaps with that of RuvC PUBMED:10982859.

    \ \ 5739 IPR008587 \ This family consists of a number of sequences found in plants. The function of this family is unknown.\ 2654 IPR008174 \

    Galanin is a peptide hormone that controls various biological activities PUBMED:1710578. Galanin-like immunoreactivity has been found in the central and peripheral nervous systems of mammals, with high concentrations demonstrated in discrete regions of the central nervous system, including the median eminence, hypothalamus, arcuate nucleus, septum, neuro-intermediate lobe of the pituitary, and the spinal cord. Its localisation within neurosecretory granules suggests that galanin may function as a neurotransmitter, and it has been shown to coexist with a variety of other peptide and amine neurotransmitters within individual neurons PUBMED:2448788.

    \

    Although the precise physiological role of galanin is uncertain, it has a number of pharmacological properties: it stimulates food intake, when injected into the third ventricle of rats; it increases levels of plasma growth hormone and prolactin, and decreases dopamine levels in the median eminence PUBMED:2448788; and infusion into humans results in hyperglycemia and glucose intolerance, and inhibits pancreatic release of insulin, somatostatin and pancreatic peptide. Galanin also modulates smooth muscle contractility within the gastro-intestinal and genito-urinary tracts, all such activities suggesting that the hormone may play an important role in the nervous modulation of endocrine and smooth muscle function PUBMED:2448788.

    \

    Galanin is a 29-amino-acid peptide processed from a larger precursor protein. Except in humans, galanin is C-terminally amidated. Its sequence is highly conserved and the first 14 residues are identical in all currently known sequences.

    \ \ 4031 IPR003375 \

    PsaE is a 69-amino-acid polypeptide of photosystem I present on the stromal side of the thylakoid membrane. The structure consists of a well-defined five-stranded beta-sheet similar to SH3 domains PUBMED:8193119. This subunit may form complexes with ferredoxin and ferredoxin-oxidoreductase in the photosystem I reaction centre.

    \ 734 IPR002869 \ This domain is found in prokaryotes. It includes a region of the large protein pyruvate-flavodoxin oxidoreductase and the whole pyruvate ferredoxin oxidoreductase gamma subunit protein. It is not known whether the gamma subunit has a catalytic or regulatory role. Pyruvate oxidoreductase (POR) catalyses the final step in the fermentation of carbohydrates in anaerobic microorganisms PUBMED:8550425. This involves the oxidative decarboxylation of pyruvate with the participation of thiamine, followed by the transfer of an acetyl moiety to coenzyme A for the synthesis of acetyl-CoA PUBMED:8550425. The family also includes pyruvate flavodoxin oxidoreductase as encoded by the nifJ gene in cyanobacteria, which is required for growth on molecular nitrogen when iron is limited PUBMED:8415612.\ 6543 IPR010609 \

    This repeat makes up the C-terminal part of the bacteriophage T4 baseplate protein Gp5. This region of the protein forms a needle-like projection from the baseplate that is presumed to puncture the bacterial cell membrane. Structurally, three copies of the repeated region trimerise to form a beta-solenoid structure PUBMED:11823865. This family also includes repeats from bacterial Vgr proteins.

    \ 1704 IPR004877 \

    Cytochrome b561 is a secretory vesicle-specific electron transport protein. It is an integral membrane protein that binds two heme groups non-covalently.

    \ 5830 IPR009251 \

    This family consists of several alpha-2,3-sialyltransferase (CST-I) proteins largely found in Campylobacter jejuni.

    \ 4867 IPR005360 \

    Members of this family of proteins are about 80 amino acids in length and their function is unknown. The proteins contain a conserved GRY motif.

    \ 2277 IPR006914 \

    This group of proteins, mainly from Neisseria meningitidis, may have hemagglutinin or hemolysin activity. A number of them have a second conserved domain, , which is found in possible Pseudomonas aeruginosa hemagglutinins.

    \ 3534 IPR005614 \

    NrfD is an integral transmembrane protein with loops in both the periplasm and the cytoplasm. NrfD is thought to participate in the transfer of electrons, from the quinone pool into the terminal components of the Nrf pathway PUBMED:8057835.

    \ 3112 IPR003110 \

    Phosphorylated immunoreceptor tyrosine-based activation motifs (ITAMs) exhibit unique abilities to bind and activate Lyn and Syk tyrosine kinases PUBMED:7594458. The motif, which may be dually phosphorylated on tyrosine, links antigen receptors to downstream signalling machinery.

    \ 8124 IPR013238 \

    Rpc25 is a strongly conserved subunit of RNA polymerase III and has homology to Rpa43 in RNA polymerase I, Rpb7 in RNA polymerase II and the archaeal RpoE subunit. Rpc25 is required for transcription initiation and is not essential for the elongating properties of RNA polymerase III PUBMED:15612920.

    \ 6166 IPR010462 \

    This family consists of several bacterial ectoine synthase proteins. The ectABC genes encode the diaminobutyric acid acetyltransferase (EctA), the diaminobutyric acid aminotransferase (EctB), and the ectoine synthase (EctC). Together these proteins constitute the ectoine biosynthetic pathway PUBMED:11823218.

    \ 2497 IPR001670 \

    Alcohol dehydrogenase (ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the concomitant reduction of NAD PUBMED:. Currently, three structurally and catalytically different types of alcohol dehydrogenase are known:\

    \

    Iron-containing ADHs have been found in yeast (gene ADH4) PUBMED:3584063, as well as in Zymomonas mobilis (gene adhB) PUBMED:2823079. These two iron-containing ADHs are closely related to the following enzymes:\

    \

    \ 3950 IPR006854 \ This is a family of poxvirus proteins required for virus morphogenesis. This protein is necessary for proteolytic processing of the major viral structural proteins, P4a and P4b PUBMED:1920628.\ 6630 IPR009633 \

    This family consists of several Orthopoxvirus-specific proteins, predominantly around 340 residues in length. This family contains both B17 and B15 proteins, whose function is unknown.

    \ 1350 IPR000299 \ This domain is found in a number of cytoskeletal-associated proteins that associate with various proteins at the interface between the plasma membrane and the cytoskeleton. It is a conserved N-terminal domain of about 150 residues PUBMED:2120593, PUBMED:1955455, PUBMED:7983158, involved in the linkage of cytoplasmic proteins to the membrane.\ 2811 IPR004174 \ gpW is a 68-residue protein known to be present in phage particles. Extracts of phage-infected cells lacking gpW contain DNA-filled heads and active tails, but no infectious virions. gpW is required for the addition of gpFII to the head, which is, in turn, required for the attachment of tails. Since gpFII and tails are known to be attached at the connector, gpW is also likely to assemble at this site. The addition of gpW to filled heads increases the DNase resistance of the packaged DNA, suggesting that gpW either forms a plug at the connector to prevent ejection of the DNA, or binds directly to the DNA. The large number of positively charged residues in gpW (its calculated pI is 10.8) is consistent with a role in DNA interaction PUBMED:11302702.\ 2547 IPR003382 \ This domain is found in diverse flavoprotein enzymes, including the epidermin biosynthesis protein EpiD, which has been shown to be a flavoprotein that binds FMN PUBMED:1644762. This enzyme catalyzes the removal of two reducing equivalents from the cysteine residue of the C-terminal meso-lanthionine of epidermin to form a --C==C-- double bond. This family also includes the B chain of dipicolinate synthase. Dipicolinic acid is a small polar molecule that accumulates to high concentrations in bacterial endospores and is thought to play a role in spore heat resistance, or the maintenance of heat resistance PUBMED:8345520. Dipicolinate synthase catalyses the formation of dipicolinic acid from dihydroxydipicolinic acid.
This family also includes phenylacrylic acid decarboxylase (EC 4.1.1.-) PUBMED:8181743.\ 2935 IPR006908 \ This is a family of herpesvirus UL49 tegument proteins. It was shown that interactions between herpesvirus envelope and tegument proteins may play a role in secondary envelopment during herpesvirus virion maturation.\ \ \ 6799 IPR009718 \

    This entry represents the C terminus (approximately 30 residues) of a number of Rex proteins. These are redox-sensing repressors that appear to be widespread among Gram-positive bacteria PUBMED:12970197. They modulate transcription in response to changes in cellular NADH/NAD(+) redox state. Rex is predicted to include a pyridine nucleotide-binding domain (Rossmann fold), and residues that might play key structural and nucleotide binding roles are highly conserved.

    \ 5090 IPR007927 \

    This family contains several bacteriophage proteins of unknown function.

    \ 4335 IPR003035 \ This domain is named RWP-RK after a conserved motif at the C terminus of the domain. The domain is found in algal minus dominance proteins as well as plant proteins involved in nitrogen-controlled development PUBMED:10647012.\ 8001 IPR012603 \

    This domain is found N-terminal to the ARID/BRIGHT domain in DNA-binding proteins of the Retinoblastoma-binding protein 1 family PUBMED:15112237.

    \ 6280 IPR010502 \

    This entry represents the family 9 carbohydrate-binding module (CBD9), which exhibits an immunoglobulin-like beta-sandwich fold, with an additional beta-strand at the N terminus PUBMED:12796496.

    \

    Bacterial extracellular cellulases and hemicellulases are involved in the hydrolysis of the major structural polysaccharides of plant cell walls. These are usually modular enzymes that contain catalytic and non-catalytic domains. The CBD9 domain binds to cellulose, xylan, as well as to a range of soluble di- and mono-saccharides, and is found in cellulose- and xylan-degrading enzymes PUBMED:9752722.

    \ \ 6255 IPR009448 \

    The N-terminal region of this group of proteins is required for correct folding of the ER UDP-Glc: glucosyltransferase. These proteins selectively reglucosylate unfolded glycoproteins, thus providing quality control for protein transport out of the ER. Unfolded, denatured glycoproteins are substantially better substrates for glucosylation by this enzyme than are the corresponding native proteins. This protein and transient glucosylation may be involved in monitoring and/or assisting the folding and assembly of newly made glycoproteins, in order to identify glycoproteins that need assistance in folding from chaperones.

    \ 1746 IPR003206 \ This family contains the large subunit of the trimeric diol dehydratases and glycerol dehydratases. These enzymes are produced by some enterobacteria in response to growth substances.\ 6420 IPR009515 \

    This family consists of several hypothetical short plant proteins from Arabidopsis thaliana and Oryza sativa. The function of this family is unknown.

    \ 6534 IPR009584 \

    This family consists of several Citrus tristeza virus (CTV) 6-kDa, 51-residue hydrophobic (P6) proteins. The function of this family is unknown.

    \ 7641 IPR012496 \

    These sequences are similar to a region conserved amongst various protein products of the transmembrane channel-like (TMC) gene family, such as Transmembrane channel-like protein 3 and EVIN2; this region is termed the TMC domain PUBMED:12906855. Mutations in these genes are implicated in a number of human conditions, such as deafness and epidermodysplasia verruciformis PUBMED:12906855. TMC proteins are thought to have important cellular roles, and may be modifiers of ion channels or transporters PUBMED:12812529.

    \ 7137 IPR010845 \

    This family consists of several bacterial FlaF flagellar proteins. FlaF and FlaG are trans-acting, regulatory factors that modulate flagellin synthesis during flagellum biogenesis PUBMED:1699845.

    \ 4587 IPR004537 \

    Tellurite resistance protein TehB is encoded, together with TehA, by the tellurite-reducing tehAB operon. When present in high copy number, TehB is responsible for potassium tellurite resistance, probably by increasing the reduction rate of tellurite to metallic tellurium within the bacterium. TehB is a cytoplasmic protein which possesses three conserved motifs (I, II, and III) found in S-adenosyl-L-methionine (SAM)-dependent non-nucleic acid methyltransferases PUBMED:11053398. Conformational changes in TehB are observed upon binding of both tellurite and SAM, suggesting that TehB utilizes a methyltransferase activity in the detoxification of tellurite.

    \ 5163 IPR008000 \

    This family consists of several uncharacterised bacterial proteins of unknown function.

    \ 3368 IPR003454 \ This family consists of monooxygenase components such as methane monooxygenase regulatory protein B (MmoB). When present at low concentration, MmoB converts methane monooxygenase from an oxidase to a hydroxylase and stabilizes intermediates required for the activation of dioxygen PUBMED:10393915. Also found in this family is DmpM, or phenol hydroxylase protein component P2; this protein lacks redox cofactors and is required for optimal turnover of phenol hydroxylase PUBMED:9012665. Phenol hydroxylase catabolises phenol and some of its methylated derivatives in the first step of phenol biodegradation, and is required for growth on phenol. The multicomponent enzyme is made up of P0, P1, P2, P3, P4 and P5 polypeptides.\ 7629 IPR012439 \

    This family consists of sequences derived from hypothetical eukaryotic proteins. A region approximately 100 residues in length is featured.

    \ 1092 IPR001227 \ Enzymes involved in fatty acid biosynthesis, such as bacterial malonyl CoA-acyl carrier protein transacylase and eukaryotic fatty acid synthase, belong to this group. Also included are the polyketide synthases 6-methylsalicylic acid synthase, a multifunctional enzyme involved in the biosynthesis of patulin, and conidial green pigment synthase.\ 6638 IPR009636 \

    This family consists of several phage minor structural protein GP20 sequences of around 180 residues in length. The function of this family is unknown.

    \ 2408 IPR005539 \

    This domain is required for the nuclear localisation of these proteins PUBMED:11352458. All of these proteins are members of the Tale/Knox homeodomain family, a subfamily, containing homeobox .

    \ 3019 IPR001109 \

    The large subunit of [NiFe]-hydrogenase, as well as other nickel metalloenzymes, is synthesised as a precursor devoid of the metalloenzyme active site. This precursor then undergoes a complex post-translational maturation process that requires a number of accessory proteins. The hydrogenase expression/formation proteins (HUPF/HYPC) form a family of small proteins that are hydrogenase precursor-specific chaperones required for this maturation process PUBMED:8497190. They are believed to keep the hydrogenase precursor in a conformation accessible for metal incorporation PUBMED:9485446, PUBMED:10783387.

    \ \ 2357 IPR002798 \ Many members of this family have no known function and are predicted to be integral membrane proteins. One member is annotated as "Stage II sporulation protein M related" and is weakly related to other proteins with similar annotation.\ 67 IPR007803 \ The aspartyl/asparaginyl beta-hydroxylase specifically hydroxylates one aspartic acid or asparagine residue in certain epidermal growth factor-like domains of a number of proteins PUBMED:8041771.\ 716 IPR002420 \

    Phosphatidylinositol 3-kinase (PI3-kinase) is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The C2 domain, usually located at the N terminus, interacts mainly with the scaffolding helical domain of the enzyme and makes only minor contacts with the catalytic domain PUBMED:12151228. The domain consists of two four-stranded antiparallel beta-sheets that form a beta-sandwich. The isolated C2 domain binds multilamellar phospholipid vesicles, which suggests that this domain could play a role in membrane association. Membrane attachment by C2 domains is typically mediated by the loops connecting the beta-strands, which in other C2 domain-containing proteins form the calcium-binding region.

    \ 6997 IPR010798 \

    This family consists of several eukaryotic triadin proteins. Triadin is a ryanodine receptor and calsequestrin binding protein located in junctional sarcoplasmic reticulum of striated muscles PUBMED:11707337.

    \ 4003 IPR000128 \ Steroid or nuclear hormone receptors (NRs) constitute an important superfamily of transcription regulators that are involved in widely diverse physiological functions, including control of embryonic development, cell differentiation and homeostasis. Members of the superfamily include the steroid hormone receptors and receptors for thyroid hormone, retinoids, 1,25-dihydroxy-vitamin D3 and a variety of other ligands. The proteins function as dimeric molecules in nuclei to regulate the transcription of target genes in a ligand-responsive manner PUBMED:7899080, PUBMED:8165128. In addition to C-terminal ligand-binding domains, these nuclear receptors contain a highly conserved, N-terminal zinc finger that mediates specific binding to target DNA sequences, termed ligand-responsive elements. In the absence of ligand, steroid hormone receptors are thought to be weakly associated with nuclear components; hormone binding greatly increases receptor affinity.\ \

    NRs are extremely important in medical research, a large number of them being implicated in diseases such as cancer, diabetes and hormone resistance syndromes. While several NRs act as ligand-inducible transcription factors, many do not yet have a defined ligand and are accordingly termed "orphan" receptors. During the last decade, more than 300 NRs have been described, many of which are orphans that cannot easily be named because of confusion over nomenclature in the literature. However, a new system has recently been introduced in an attempt to rationalise the increasingly complex set of names used to describe superfamily members.

    \

    The progesterone receptor consists of 3 functional and structural domains: an N-terminal (modulatory) domain; a DNA binding domain that mediates specific binding to target DNA sequences (ligand-responsive elements); and a hormone binding domain. The N-terminal domain is unique to the progesterone receptors and spans approximately the first 500 residues; the highly conserved DNA-binding domain is smaller (around 65 residues) and occupies the central portion of the protein; and the hormone binding domain lies at the receptor C-terminus.\

    \ 8027 IPR013265 \

    This entry contains putative genes, of 129 bp, from the Trichothecene gene cluster of Fusarium sporotrichioides and F. graminearum that encode a predicted protein of 43 amino acids whose function is unknown PUBMED:12080147, PUBMED:11352533.

    \ 2479 IPR003953 \

    In bacteria two distinct, membrane-bound enzyme complexes are responsible for the interconversion of fumarate and succinate: fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulphur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome b.

    \

    In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulphur protein.

    \

    The flavoprotein subunit is a protein of about 60 to 70 kDa to which FAD is covalently bound via a histidine residue located in the N-terminal section of the protein PUBMED:2668268. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species PUBMED:1375942.

    \

    This family includes members that bind FAD, such as the flavoprotein subunits of succinate and fumarate dehydrogenase, aspartate oxidase and the alpha subunit of adenylylsulphate reductase.

    \ 2453 IPR000801 \ This family contains several seemingly unrelated proteins, including human esterase D; mycobacterial antigen 85, which is responsible for the high affinity of mycobacteria for fibronectin; Corynebacterium glutamicum major secreted protein PS1; and hypothetical proteins from Escherichia coli, yeast, mycobacteria and Haemophilus influenzae.\ 5297 IPR008904 \ The largest of the mammalian translation initiation factors, eIF3, consists of at least eight subunits ranging in mass from 35 to 170 kDa. eIF3 binds to the 40S ribosomal subunit at an early step of translation initiation and promotes the binding of methionyl-tRNAi and mRNA PUBMED:8995409.\ 340 IPR001799 \

    Ephrins are a family of proteins PUBMED:7838529 that are ligands of class V (EPH-related) receptor protein-tyrosine kinases. These receptors and their ligands have been implicated in regulating neuronal axon guidance and in patterning of the developing nervous system, and may also serve a patterning and compartmentalization role outside the nervous system.

    \

    Ephrins are membrane-attached proteins of 205 to 340 residues. Attachment appears to be crucial for their normal function. Type-A ephrins are linked to the membrane via a glycosylphosphatidylinositol (GPI)-linkage, while type-B ephrins are type-I membrane proteins.

    \ \ 7851 IPR012550 \

    This family contains many hypothetical proteins from bacteria and yeast.

    \ 4585 IPR005015 \

    Thermostable direct hemolysin (TDH) is considered an important virulence factor in Vibrio parahaemolyticus gastroenteritis and is a dimer composed of two identical subunits of approximately 21 kDa. A number of biological properties have been attributed to TDH, including hemolytic activity, enterotoxicity, cytotoxicity and cardiotoxicity PUBMED:11267763.

    \ 5070 IPR007907 \

    This family consists of several uncharacterised baculovirus proteins of unknown function.

    \ 5719 IPR008576 \ This family consists of several eukaryotic proteins of unknown function that are S-adenosyl-L-methionine-dependent methyltransferase-like.\ 4221 IPR000244 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L9 is one of the proteins of the large ribosomal subunit. In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins grouped on the basis of sequence similarities PUBMED:, PUBMED:8306963.

    \

    The crystal structure of Bacillus stearothermophilus L9 shows that the 149-residue protein comprises two globular domains connected by a rigid linker PUBMED:12051860. Each domain contains an rRNA binding site, and the protein functions as a structural protein in the large subunit of the ribosome. The C-terminal domain consists of two loops, an alpha-helix and a three-stranded mixed parallel/antiparallel beta-sheet packed against the central alpha-helix. The long central alpha-helix is exposed to solvent in the middle and participates in the hydrophobic cores of the two domains at both ends.

    \ 6915 IPR010767 \

    This family consists of several hypothetical bacterial proteins of around 100 residues in length. The function of this family is unknown.

    \ 5131 IPR007968 \

    This family consists of several uncharacterised tobravirus proteins of unknown function.

    \ 2796 IPR002012 \ The gonadotropin-releasing hormones (GnRH) (gonadoliberin) PUBMED: are a family of peptides that play a pivotal role in reproduction. The main function of GnRH is to act on the pituitary to stimulate the synthesis and secretion of luteinizing and follicle-stimulating hormones, but GnRH also acts on the brain, retina, sympathetic nervous system, gonads and placenta in certain species. There seem to be at least three forms of GnRH. The second form is expressed in the midbrain and seems to be widespread. The third form has so far been found only in fish. GnRH is a C-terminally amidated decapeptide processed from a larger precursor protein. Four of the ten residues are perfectly conserved in all species where GnRH has been sequenced.\ 5673 IPR008772 \ This family consists of several bacterial PhnH sequences, which are known to be involved in phosphonate metabolism PUBMED:2155230.\ 3260 IPR005555 \ The M-factor is a pheromone produced upon nitrogen starvation. The production of M-factor is increased by the pheromone signal. The protein undergoes post-translational modification: the C-terminal signal peptide is removed, and the carboxy-terminal cysteine residue is carboxy-methylated and S-alkylated with a farnesyl residue PUBMED:8878833.\ 6798 IPR010724 \

    This entry represents the N terminus (approximately 80 residues) of replication initiator protein A (RepA), a DNA replication initiator in plasmids PUBMED:12637554. Most proteins in this entry are bacterial, but archaeal and eukaryotic members are also included.

    \ 2837 IPR005628 \ Members of this family are involved in the general secretion pathway. The family includes proteins such as ExeK, PulK, OutX and XcpX.\ 6160 IPR009401 \

    Members of this family have been shown to be involved in transcriptional repression via the Mediator complex PUBMED:12738880.

    \ 1847 IPR001434 \

    This group of sequences is defined by a conserved region of about 53 amino acids found within larger, usually repeated, regions of proteins from a small number of phylogenetically distant prokaryotes. Examples include a 132-residue region found repeated in three of the five longest proteins of Bacillus anthracis, a 131-residue repeat in a cell wall-anchored protein of Enterococcus faecalis, and a 120-residue repeat in Methanobacterium thermoautotrophicum. A similar region is found in some Chlamydia trachomatis outer membrane proteins.

    \

    In Chlamydia trachomatis, three cysteine-rich proteins (also believed to be lipoproteins), MOMP, OMP6 and OMP3, make up the extracellular matrix of the outer membrane PUBMED:2287277. They are essential for the structural integrity of both the elementary body (EB) and reticulate body (RB) phases. They are thought to be involved in porin formation, and because these bacteria lack the peptidoglycan layer common to most Gram-negative microbes, such proteins are highly important in the pathogenicity of the organism.

    \ 1485 IPR005036 \

    This family consists of several eukaryotic proteins that are thought to be involved in the regulation of glycogen metabolism. For instance, the mouse PTG protein has been shown to interact with glycogen synthase, phosphorylase kinase and phosphorylase a, three enzymes with key roles in the regulation of glycogen metabolism. PTG also binds the catalytic subunit of protein phosphatase 1 (PP1C) and localizes it to glycogen. Subsets of similar interactions have been observed with several other members of this family, such as the yeast PIG1, PIG2, GAC1 and GIP2 proteins. While the precise function of these proteins is not known, they may serve as scaffolds, bringing together the key enzymes in glycogen metabolism. This entry is a carbohydrate binding domain.

    \ 4348 IPR000858 \ In Brassicaceae, self-incompatible plants have a self/non-self recognition system that prevents the flowering plant from achieving self-fertilization. This system is sporophytically controlled by multiple alleles at a single locus (S); there are a total of 50 different S alleles in Brassica oleracea. S-locus glycoproteins, as well as S-receptor kinases, are genetically linked to the S alleles PUBMED:7672580. Most of the proteins within this family contain an apple-like domain, which is predicted to possess protein- and/or carbohydrate-binding functions.\ 8090 IPR013271 \

    This family contains neuropeptides isolated from the ganglia of the African giant snail, Achatina fulica. Each peptide has a Trp residue at both the N- and C-termini. Purified WWamide-1, -2 and -3 showed an inhibitory effect on the phasic contractions of the anterior byssus retractor muscle (ABRM) PUBMED:8495720.

    \ 5797 IPR010284 \

    This family consists of several short hypothetical plant and cyanobacterial proteins. In plants these proteins are localised to the chloroplast and are known as hypothetical chloroplast protein 12. Members of this family are likely to play some role in photosynthesis.

    \ 2518 IPR000253 \

    The forkhead-associated (FHA) domain PUBMED:7482699 is a phosphopeptide recognition domain found in many regulatory proteins. It displays specificity for phosphothreonine-containing epitopes but will also recognise phosphotyrosine with relatively high affinity. It spans approximately 80-100 amino acid residues folded into an 11-stranded beta sandwich, which sometimes contains small helical insertions between the loops connecting the strands PUBMED:11911881.

    \ \

    To date, genes encoding FHA-containing proteins have been identified in eubacterial and eukaryotic but not archaeal genomes. The domain is present in a diverse range of proteins, such as kinases, phosphatases, kinesins, transcription factors, RNA-binding proteins and metabolic enzymes which partake in many different cellular processes - DNA repair, signal transduction, vesicular transport and protein degradation are just a few examples.

    \ 6614 IPR009624 \

    This is a group of proteins of unknown function.

    \ 6979 IPR009821 \

    This family consists of several Enterobacterial proteins of around 50 residues in length. Members of this family are found in Escherichia coli and Salmonella typhi where they are often known as YdfA. The function of this family is unknown.

    \ 7490 IPR011662 \ This is a short domain found at the N terminus of the secretins of the bacterial type II/III secretory system as well as the TonB-dependent receptor proteins. These proteins are involved in TonB-dependent active uptake of selective substrates.\ 592 IPR005303 \

    This domain is found N-terminal to the MOSC domain. The function of this domain is unknown; however, it is predicted to adopt a beta-barrel fold.

    \ 1737 IPR006719 \

    The defective chorion-1 gene (dec-1) in Drosophila encodes follicle cell proteins necessary for proper eggshell assembly. Multiple products of the dec-1 gene are formed by alternative RNA splicing and proteolytic processing PUBMED:1699826. Cleavage products include S80 (80 kDa) which is incorporated into the eggshell, and further proteolysis of S80 gives S60 (60 kDa).

    \

    This domain is present at the N terminus of these proteins.

    \ 544 IPR004238 \ Different types of late embryogenesis abundant (LEA) proteins are expressed at different stages of late embryogenesis in higher plant seed embryos and under conditions of dehydration stress. They may be induced by abscisic acid. This domain may be repeated several times in these proteins, whose function is unknown.\ 5625 IPR008897 \ This family consists of the Saccharomyces cerevisiae trans-acting factor B and C (REP1 and 2) proteins. The S. cerevisiae plasmid stability system consists of two plasmid-coded proteins, Rep1 and Rep2, and a cis-acting locus, STB. The Rep proteins show both self- and cross-interactions in vivo and in vitro, and bind to the STB DNA with assistance from host factor(s). Within the S. cerevisiae nucleus, the Rep1 and Rep2 proteins tightly associate with STB-containing plasmids into well organised plasmid foci that form a cohesive unit in partitioning. It is generally accepted that the protein-protein and DNA-protein interactions engendered by the Rep-STB system are central to plasmid partitioning. Point mutations in Rep1 that knock out interaction with Rep2 or with STB simultaneously block the ability of these Rep1 variants to support plasmid stability PUBMED:12177044.\ 1378 IPR005499 \ This family contains the enzyme 6-carboxyhexanoate--CoA ligase. This enzyme is involved in the first step of biotin synthesis, where it converts pimelate into pimeloyl-CoA; it requires magnesium as a cofactor and forms a homodimer PUBMED:1445232.\ 1179 IPR003298 \ A novel antigen of Plasmodium falciparum has been cloned that contains a hydrophobic domain typical of an integral membrane protein. The antigen is designated apical membrane antigen 1 (AMA-1) because it appears to be located in the apical complex PUBMED:2701947. AMA-1 appears to be transported to the merozoite surface close to the time of schizont rupture. \

    The 66 kDa merozoite surface antigen (PK66) of Plasmodium knowlesi, a simian malaria parasite, possesses vaccine-related properties believed to originate from a receptor-like role in parasite invasion of erythrocytes PUBMED:2211675. The sequence of PK66 is conserved throughout Plasmodium and shows high similarity to P. falciparum AMA-1. Following schizont rupture, the distribution of PK66 changes in a coordinated manner associated with merozoite invasion. Prior to rupture, the protein is concentrated at the apical end, after which it distributes itself entirely across the surface of the free merozoite. Immunofluorescence studies suggest that, during invasion, PK66 is excluded from the erythrocyte at, and behind, the invasion interface PUBMED:2211675.

    \ 7942 IPR012518 \

    This family consists of the ocellatin family of antimicrobial peptides. Ocellatins are produced from the electrically stimulated skin secretions of the South American frog, Leptodactylus ocellatus. The family consists of three structurally related peptides, ocellatin 1, ocellatin 2 and ocellatin 3. These peptides show haemolytic activity against human erythrocytes and are also active against Escherichia coli PUBMED:15648972.

    \ 4658 IPR005017 \

    This family includes TodX from Pseudomonas putida F1 and TbuX from Ralstonia pickettii PKO1. These are membrane proteins of uncertain function that are involved in toluene catabolism. Related proteins involved in the degradation of similar aromatic hydrocarbons, such as CymD, are also in this family.

    \ 4935 IPR000416 \ VP4 is one of the two surface proteins of rotaviruses (the other being VP7). VP4 is the rotavirus cell attachment protein in vitro and in vivo PUBMED:8523562. The receptor-binding specificity of rotaviruses, via VP4, may be influenced by the associated VP7 protein PUBMED:8551583. Positions 150 and 187 of VP4 play an important role in early rotavirus-cell interactions PUBMED:9568967.\ 101 IPR005607 \

    The BSD domain is a domain of about 60 residues named after the BTF2-like transcription factors, Synapse-associated proteins and DOS2-like proteins in which it is found. It is also found in several hypothetical proteins. The BSD domain occurs in one or two copies in a variety of species, ranging from primitive protozoa to humans. It can be found associated with other domains, such as the BTB domain or the U-box, in multidomain proteins. The function of the BSD domain is not yet known PUBMED:11943536.

    \ \

    Secondary structure prediction indicates the presence of three alpha helices, which probably form a three-helical bundle in small domains. The third predicted helix contains neighboring phenylalanine and tryptophan residues, less common amino acids that are invariant in all the BSD domains identified and that are the most striking sequence features of the domain PUBMED:11943536.\

    \ Some proteins known to contain one or two BSD domains are listed below:\ \
  • Mammalian TFIIH basal transcription factor complex p62 subunit (GTF2H1).
  • Yeast RNA polymerase II transcription factor B 73 kDa subunit (TFB1), the homologue of BTF2.
  • Yeast DOS2 protein. It is involved in single-copy DNA replication and ubiquitination.
  • Drosophila synapse-associated protein SAP47.
  • Mammalian SYAP1.
  • Various Arabidopsis thaliana hypothetical proteins.
    \ 4349 IPR001673 \

    Several Dictyostelium species have proteins that contain conserved repeats. These proteins have been variously described as 'extracellular matrix protein B', 'cyclic nucleotide phosphodiesterase inhibitor precursor', 'prestalk protein precursor', 'putative calmodulin-binding protein CamBP64', 'cysteine-rich, acidic integral membrane protein precursor' and 'hypothetical protein'. The repeats are not confined to Dictyostelium spp.; they also occur in one of the conidiospore surface proteins of the ascomycete Trichoderma harzianum.

    \ 6945 IPR009803 \

    This family consists of several hypothetical proteins which seem to be specific to Oryzias latipes (Japanese ricefish). Members of this family are typically around 200 residues in length. The function of this family is unknown.

    \ 4488 IPR001537 \ The spoU gene of Escherichia coli codes for a protein that shows strong similarities to previously characterized 2'-O-methyltransferases PUBMED:9321663, PUBMED:8265370. The Pet56 protein of Saccharomyces cerevisiae has been shown to be required for ribose methylation at a universally conserved nucleotide in the peptidyl transferase center of the mitochondrial large ribosomal RNA (21S rRNA). Cells with reduced Pet56 activity were deficient in the formation of functional large subunits of the mitochondrial ribosome. The Pet56 protein catalyzes the site-specific formation of 2'-O-methylguanosine on in vitro transcripts of both mitochondrial 21S rRNA and E. coli 23S rRNA, providing evidence for an essential modified nucleotide in rRNA PUBMED:8266080.\ 3118 IPR002487 \ The K-box region is commonly found associated with SRF-type transcription factors. The K-box is a possible coiled-coil structure PUBMED:2031185 and may have a role in multimer formation PUBMED:7958839.\ 1918 IPR003806 \

    This entry describes proteins of unknown function.

    \ 1010 IPR001878 \ The 18-residue CCHC zinc finger domain is found mainly in the nucleocapsid proteins of retroviruses. It is required for viral genome packaging and for early stages of infection PUBMED:2695083, PUBMED:10074407. It is also found in eukaryotic proteins involved in RNA binding or single-stranded DNA binding.\ 5515 IPR008540 \ This group of proteins contains members of the BZR1/LAT61 family of plant transcriptional repressors involved in controlling the response to brassinosteroids (BRs). BRs are plant hormones that play essential roles in growth and development. BZR1 binds directly to DNA, repressing the expression of genes involved in BR synthesis. Phosphorylation of BZR1 by BIN2 targets BZR1 to the 20S proteasome, while dephosphorylation leads to nuclear accumulation of BZR1 PUBMED:15681342.\ 3763 IPR001847 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 to S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
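    The clan-specific triad orders above can be captured in a small lookup table. The following is a minimal Python sketch of that idea. The names TRIAD_ORDER, triad_order and same_residues are hypothetical, and the only data used are the three orders quoted in the text. It shows that the three clans share the same residues but differ in their linear order.

```python
# Linear (N- to C-terminal) order of the catalytic triad in three serine
# peptidase clans, as quoted in the text. H = His, D = Asp, S = Ser.
# The table and function names are illustrative only.
TRIAD_ORDER = {
    "SA": "HDS",  # chymotrypsin clan
    "SB": "DHS",  # subtilisin clan
    "SC": "SDH",  # carboxypeptidase C clan
}

RESIDUE_NAMES = {"H": "histidine", "D": "aspartate", "S": "serine"}


def triad_order(clan: str) -> list:
    """Return the triad residues of a clan in sequence order (N to C)."""
    return [RESIDUE_NAMES[code] for code in TRIAD_ORDER[clan]]


def same_residues(clan_a: str, clan_b: str) -> bool:
    """True if two clans use the same triad residues, whatever their order."""
    return sorted(TRIAD_ORDER[clan_a]) == sorted(TRIAD_ORDER[clan_b])


if __name__ == "__main__":
    for clan in TRIAD_ORDER:
        print(clan, triad_order(clan))
```

    Comparing sorted residue codes separates "which residues" from "in what order". That mirrors the point in the text: the chemistry is shared, while the linear arrangement tracks clan membership.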

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
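    The naming convention described above, in which the first character of a family or clan identifier gives the catalytic type, can be decoded mechanically. The following is a minimal Python sketch under that assumption. The function names catalytic_type and nucleophile are hypothetical, and this is not an official MEROPS tool or API. The letter meanings and the nucleophile grouping are taken only from the paragraph above.

```python
# Catalytic-type letters used as the first character of peptidase family and
# clan identifiers, as listed in the text. "P" is used only for clans that
# contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clans only)",
}

# Per the text: serine, threonine and cysteine peptidases use an amino acid
# as the nucleophile and form an acyl intermediate; aspartic, glutamic and
# metallopeptidases use an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def _type_letter(identifier: str) -> str:
    """Return the upper-cased first character of an identifier such as 'S21'."""
    return identifier.strip()[:1].upper()


def catalytic_type(identifier: str) -> str:
    """Decode the catalytic type from an identifier such as 'S21'."""
    letter = _type_letter(identifier)
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    return CATALYTIC_TYPES[letter]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile for the catalytic type of an identifier."""
    letter = _type_letter(identifier)
    if letter in AMINO_ACID_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not specified"


if __name__ == "__main__":
    print(catalytic_type("S21"), "/", nucleophile("S21"))
```

    For example, the family named in this entry, S21, decodes to a serine peptidase that uses an amino acid nucleophile. Types "U" and "P" are left as "not specified" because the text assigns them no mechanism.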

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S21 (assemblin family, clan SH).

    \ \

    A number of viral proteases have been discovered, and their sequences show very little similarity to one another. Studies with protease inhibitors suggest that the herpesvirus protease is a serine protease belonging to either the trypsin-like or subtilisin-like families; it is not inhibited by inhibitors of cysteine, aspartic or metallo proteases.

    \ 7993 IPR012978 \

    This is the central domain of a novel family of hypothetical nucleolar proteins PUBMED:15112237.

    \ 1705 IPR005798 \

    In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is a component of respiratory chain complex III, also known as the bc1 complex or ubiquinol-cytochrome c reductase. In plant chloroplasts and cyanobacteria, there is an analogous protein, cytochrome b6, a component of the plastoquinone-plastocyanin reductase, also known as the b6f complex.

    \

    Cytochrome b/b6 PUBMED:2509716, PUBMED:8329437 is an integral membrane protein of approximately 400 amino acid residues that probably has 8 transmembrane segments. In plants and cyanobacteria, cytochrome b6 consists of two subunits encoded by the petB and petD genes. The sequence of petB is colinear with the N-terminal part of mitochondrial cytochrome b, while petD corresponds to the C-terminal part. Cytochrome b/b6 non-covalently binds two heme groups, known as b562 and b566. Four conserved histidine residues are postulated to be the ligands of the iron atoms of these two heme groups.

    \

    Apart from regions around some of the histidine heme ligands, there are a few conserved regions in the sequence of b/b6. The best conserved of these regions includes an invariant P-E-W triplet, which lies in the loop that separates the fifth and sixth transmembrane segments. It seems to be important for electron transfer at the ubiquinone redox site, called Qz or Qo (where o stands for outside), located on the outer side of the membrane. This entry represents the C-terminal region of these proteins.

    \ 978 IPR005378 \

    The movement of lipid and protein components between intracellular organelles requires the regulated interactions of many molecules. Vacuolar protein sorting-associated protein 5 (Vps5) is a yeast protein that is a subunit of a large multimeric complex, termed the retromer complex, involved in retrograde transport of proteins from endosomes to the trans-Golgi network. Sorting nexin (SNX) 1 and SNX2 are its mammalian orthologs PUBMED:11102511.

    \ \

    To carry out its biological functions, Vps5 forms the retromer complex with at least four other proteins: Vps17, Vps26, Vps29 and Vps35. Vps35 contains a central region of weaker sequence similarity, thought to indicate the presence of at least three domains PUBMED:11102511.

    \ 4596 IPR003325 \ This domain is found in tellurite resistance proteins, a cAMP-binding protein, chemical-damaging agent resistance proteins and general stress proteins. Tellurium compounds are used in several industrial processes, although they are relatively rare in the environment. Genes associated with tellurite resistance (TeR) are found in many pathogenic bacteria PUBMED:10203839. \

    The cellular slime mould, Dictyostelium discoideum, contains a cAMP-binding protein, CABP1, which is composed of two subunits. The C-terminal half of each subunit contains this domain PUBMED:2176639.

    \ 4976 IPR005380 \

    The XS (rice gene X and SGS3) domain is found in a family of plant proteins including gene X and SGS3. SGS3 is thought to be involved in post-transcriptional gene silencing (PTGS). This domain contains a conserved aspartate residue that may be functionally important.

    The XS domain-containing proteins contain coiled-coils, which suggests that they may oligomerise. Most coiled-coil proteins form either a dimeric or a trimeric structure. It is possible that different members of the XS domain family could oligomerise via their coiled-coils, forming a variety of complexes PUBMED:12162795.

    \ 2925 IPR004021 \

    This domain has no known function. It is found in one or two copies per protein, in association with the PAAD/DAPIN domain.

    \ 5110 IPR007947 \

    CD164 is a mucin-like receptor, or sialomucin, with specificity in receptor/ligand interactions that depends on the structural characteristics of the mucin-like receptor. Its functions include mediating, or regulating, haematopoietic progenitor cell adhesion and the negative regulation of their growth and/or differentiation. It exists in the native state as a disulphide-linked homodimer of two 80-85 kDa subunits. It is usually expressed by CD34+ and CD341o/- haematopoietic stem cells and associated microenvironmental cells. In its extracellular region, it contains two mucin domains (I and II) linked by a non-mucin domain, which has been predicted to contain intradomain disulphide bridges. This receptor may play a key role in haematopoiesis by facilitating the adhesion of human CD34+ cells to bone marrow stroma and by negatively regulating CD34+ CD341o/- haematopoietic progenitor cell proliferation. These effects involve the CD164 class I and/or II epitopes recognised by the monoclonal antibodies (mAbs) 105A5 and 103B2/9E10. These epitopes are carbohydrate-dependent and are located on the N-terminal mucin domain I PUBMED:10491205, PUBMED:11027692.

    Murine MGC-24v and rat endolyn have been found to share significant sequence similarity with human CD164. However, CD164 lacks the consensus glycosaminoglycan (GAG)-attachment site found in MGC-24. It is possible that GAG association is responsible for the high molecular weight of the epithelial-derived MGC-24 glycoprotein PUBMED:9763543.

    Studies of its genomic structure have placed CD164 within the subgroup of mucins that comprise multiple exons, and have demonstrated the diverse chromosomal distribution of this family of molecules. Molecules with such multiple exons may have sophisticated regulatory mechanisms that involve not only post-translational modifications of the oligosaccharide side chains, but also differential exon usage. Although the intron and exon sizes differ between the mouse and human genes, the predicted proteins are similar in size and structure, maintaining functionally important motifs that regulate cell proliferation or subcellular distribution PUBMED:11027692.

    CD164 is a gene whose expression depends on differential usage of polyadenylation sites within the 3'-UTR. The conserved distribution of the 3.2- and 1.2-kb CD164 transcripts between mouse and human suggests that (i) a mechanism may exist to regulate tissue-specific polyadenylation, and (ii) differences in polyadenylation are important for the expression and function of CD164 in different tissues. Two other aspects of the structure of CD164 are of particular interest. First, it shares one of several conserved features of a cytokine-binding pocket. In this respect, it is notable that evidence exists for a class of cell-surface sialomucin modulators that directly interact with growth factor receptors to regulate their response to physiological ligands. Second, its cytoplasmic tail contains a C-terminal YHTL motif found in many endocytic membrane proteins or receptors. These Tyr-based motifs bind to adaptor proteins, which mediate the sorting of membrane proteins into transport vesicles from the plasma membrane to the endosomes, and between intracellular compartments.

    \ 6029 IPR010400 \

    This is a family of proteins of unknown function.

    \ 7215 IPR009973 \

    This family consists of several Seadornavirus-specific VP7 proteins of around 305 residues in length. The function of this family is unknown.

    \ 8094 IPR013206 \

    These peptides are designated Leucophaea maderae tachykinin-related peptides (Lem TRPs). Some were isolated from the midgut of L. maderae, whereas others appear to be brain-specific. The Lem TRPs of the brain are myotropic and increase the amplitude and frequency of spontaneous contractions and the tonus of hindgut muscle in L. maderae PUBMED:9114447. Tachykinin-related peptides have also been isolated from combined brain, corpora cardiaca, corpora allata and suboesophageal ganglion extracts of Locusta migratoria, where they stimulate visceral muscle contractions of the oviduct and foregut PUBMED:2132575.

    \ 4296 IPR002156 \

    The RNase H domain is responsible for hydrolysis of the RNA portion of RNA-DNA hybrids. This activity requires the presence of divalent cations (Mg2+ or Mn2+) that bind its active site. This domain is part of a large family of homologous RNase H enzymes, of which the RNase HI protein from Escherichia coli is the best characterised PUBMED:9741851. Secondary structure predictions for the enzymes from E. coli, yeast, human liver and diverse retroviruses (HIV, Rous sarcoma virus, foamy viruses) supported, in every case, the five beta-strands (1 to 5) and four or five alpha-helices (A, B/C, D, E) that have been identified by crystallography in the RNase H domain of HIV-1 reverse transcriptase and in E. coli RNase H PUBMED:10603172. Reverse transcriptase (RT) is a modular enzyme carrying polymerase and ribonuclease H (RNase H) activities in separable domains. RT converts the single-stranded RNA genome of a retrovirus into a double-stranded DNA copy for integration into the host genome. This process requires ribonuclease H as well as RNA- and DNA-directed DNA polymerase activities.

    Retroviral RNase H is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. Bacterial RNase H catalyses the endonucleolytic cleavage of RNA in RNA-DNA hybrids to 5'-phosphomonoesters.

    The 3D structure of the RNase H domain from diverse bacteria and retroviruses has been solved PUBMED:2169648, PUBMED:8108376, PUBMED:1707186. All have four beta-strands and four to five alpha-helices. The Escherichia coli RNase H1 protein binds a single Mg2+ ion cofactor in the active site of the enzyme. The divalent cation is bound by the carboxyl groups of four acidic residues, Asp-10, Glu-48, Asp-70, and Asp-134 PUBMED:8108376. The first three acidic residues are highly conserved in all bacterial and retroviral RNase H sequences.

    \ 5197 IPR008032 \

    This is a family of archaebacterial proteins of unknown function. The structure of a member of this family has been solved by structural genomics techniques. It comprises segregated helical and antiparallel beta-sheet regions. It is a putative metal-binding protein.

    \ 2967 IPR003996 \

    Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior PUBMED:1558765. Four principal exotoxin secretion systems have been described. In the type II and IV secretion systems, toxins are first exported to the periplasm by way of a cleaved N-terminal signal sequence; a second set of proteins is used for extracellular transport (type II), or the C terminus of the exotoxin itself is used (type IV). Type III secretion involves at least 20 proteins that assemble into a needle; effector proteins are then translocated through this needle without the need for a signal sequence. In the type I system, a complete channel is formed through both membranes, and the secretion signal is carried on the C terminus of the exotoxin.

    The RTX (repeats in toxin) family of cytolytic toxins belongs to the type I secretion system, and these toxins are important virulence factors in Gram-negative bacteria. In addition to the C-terminal signal sequence, RTX toxins contain several glycine-rich repeats. These are essential for binding calcium, and are critical for the biological activity of the secreted toxins PUBMED:8800842. All RTX toxin operons are arranged in the order rtxCABD. RtxA is the structural component of the exotoxin. RtxB and RtxD are both required for its export from the bacterial cell. RtxC is an acyl-carrier-protein-dependent acyl-modification enzyme, required to convert RtxA to its active form PUBMED:10470043.

    Escherichia coli hemolysin (HlyA) is often quoted as the model for RTX toxins. Recent work on HlyC, the product of its rtxC-equivalent gene PUBMED:9521785, has revealed that HlyC carries out the acylation step in the post-translational modification of two internal lysine residues in the HlyA protein. Other HlyC residues, including His23 and two conserved tyrosine residues, also appear to be important for its activity PUBMED:10413532.

    \ 941 IPR002525 \

    Transposase proteins are necessary for efficient DNA transposition. This family includes an amino-terminal region of the pilin gene inverting protein (PIVML) and members of the IS111A/IS1328/IS1533 family of transposases.

    \ 7325 IPR011113 \

    The Rho termination factor disengages newly transcribed RNA from its DNA template on certain specific transcripts. It is thought that two copies of Rho bind to RNA and that Rho functions as a hexamer of protomers PUBMED:10230401.

    \ 5335 IPR008673 \

    This family consists of several mammalian microfibril-associated glycoprotein (MAGP) 1 and 2 proteins. MAGP-1 and MAGP-2 are components of elastic fibres. MAGP-1 has been proposed to bind a C-terminal region of tropoelastin, the soluble precursor of elastin. MAGP-2 was found to interact with fibrillin-1 and -2, as well as with fibulin-1, another component of elastic fibres. This suggests that MAGP-2 may be important in the assembly of microfibrils PUBMED:12122015.

    \ 4790 IPR004286 \

    The UL16 protein may play a role in capsid maturation, including DNA packaging/cleavage PUBMED:9645194. In immunofluorescence studies PUBMED:8955043, UL16 was localised to the nucleus of infected cells in areas containing high concentrations of HSV capsid proteins. These nuclear compartments have been described previously as viral assemblons PUBMED:8676489 and are distinct from compartments containing replicating DNA. Localisation within assemblons argues for a role of the UL16-encoded protein in capsid assembly or maturation PUBMED:8955043.

    \ 7165 IPR010853 \

    This repeat is found in the CagY proteins, which are encoded by the CAG pathogenicity island and are involved in delivery of the protein CagA into host cells PUBMED:12823823. CagY forms part of a surface needle structure, and this repeat may form an alpha-helical rod structure PUBMED:12823823. The repeat contains conserved -DC- and -EC- motifs, which are regularly spaced in the alignment.

    \ 5224 IPR008768 \

    This family contains the capsid assembly protein (scaffolding protein) of bacteriophage T7.

    \ 2511 IPR004207 \

    Ferredoxin thioredoxin reductase is a [4Fe-4S] protein that plays an important role in the ferredoxin/thioredoxin regulatory chain. It converts an electron signal (photoreduced ferredoxin) to a thiol signal (reduced thioredoxin), regulating enzymes by reduction of specific disulphide groups. It catalyses the light-dependent activation of several photosynthetic enzymes. Ferredoxin thioredoxin reductase is a heterodimer of subunit a and subunit b. Subunit a is the variable subunit, and b is the catalytic chain. This family represents the alpha chain.

    \ 999 IPR000465 \

    Xeroderma pigmentosum (XP) PUBMED:8160271 is a human autosomal recessive disease, characterised by a high incidence of sunlight-induced skin cancer. Skin cells of individuals with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. At least seven genetic complementation groups are involved in this pathway: XP-A to XP-G. XP-A is the most severe form of the disease and is due to defects in a 30 kDa nuclear protein called XPA (or XPAC) PUBMED:1918083. The sequence of the XPA protein is conserved from higher eukaryotes PUBMED:1764072 to yeast (gene RAD14) PUBMED:1741034. XPA is a hydrophilic protein of 247 to 296 amino-acid residues which has a C4-type zinc finger motif in its central section.

    \ 4686 IPR001264 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates, and of related proteins, into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    Glycosyltransferase family 51 comprises enzymes with only one known activity: murein polymerase. These enzymes utilise MurNAc-GlcNAc-P-P-lipid II as the sugar donor.

    The family includes the bifunctional penicillin-binding proteins that have a transglycosylase (N-terminal) and a transpeptidase (C-terminal) domain PUBMED:9244263, and the monofunctional biosynthetic peptidoglycan transglycosylases PUBMED:8830253.

    \ 5730 IPR008379 \

    There is a unique sequence domain at the C terminus of all known 4.1 proteins, known as the C-terminal domain (CTD). Mammalian CTDs are associated with a growing number of protein-protein interactions, although such activities have yet to be associated with invertebrate CTDs. Mammalian CTDs are generally defined by sequence alignment as encoded by exons 18-21. Comparison of known vertebrate 4.1 proteins with invertebrate 4.1 proteins indicates that mammalian 4.1 exon 19 represents a vertebrate adaptation that extends the sequence of the CTD with a Ser/Thr-rich sequence. The CTD was first described as a 22/24 kDa domain identified by chymotryptic digestion of erythrocyte 4.1 (4.1R). The CTD is thought to represent an independent folding structure that has gained function since the divergence of vertebrates from invertebrates PUBMED:11432737.

    \ 4803 IPR001526 \

    CD59 (also called 1F-5Ag, H19, HRF20, MACIF, MIRL, P-18 or protectin) inhibits formation of the membrane attack complex (MAC), thus protecting cells from complement-mediated lysis. As a GPI-anchored molecule, it has a signaling role in T cell activation. It may also have a role in cell adhesion through CD2, although this is controversial. CD59 associates with C9, inhibiting its incorporation into C5b-8 and thereby preventing the terminal steps in polymerisation of the MAC in plasma membranes. Genetic defects in GPI-anchor attachment that cause a reduction or loss of both CD59 and CD55 on erythrocytes produce the symptoms of the disease paroxysmal nocturnal hemoglobinuria (PNH).

    A variety of GPI-linked cell-surface glycoproteins are composed of one or more copies of a conserved domain of about 100 amino-acid residues PUBMED:1850423, PUBMED:8394346. Among these proteins, U-PAR contains three tandem copies of the domain, while all the others are made up of a single domain.

    As shown in the following schematic, this conserved domain contains 10 cysteine residues involved in five disulphide bonds. In U-PAR, the first copy of the domain lacks the fourth disulphide bond.

         +------+     +------------------------+                    +---+
         |      |     |                        |                    |   |
     xCxxCxxxxxxCxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxCxxxxCxxxxxxxxxxxxxxCCxxxCxxxxxxxx
      |                     |                       |              |
      +---------------------+                       +--------------+

    'C': conserved cysteine involved in a disulphide bond.
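
    The bond pattern in the schematic can be read off mechanically. The following Python sketch is illustrative only. The three strings are the schematic's upper bond row, sequence row and lower bond row with their common indent removed, and the function names are hypothetical. The sketch maps each `+` bond end to the ordinal of the cysteine in the same column. It assumes that no two bonds drawn on the same row overlap, which holds for this schematic.

```python
# Illustrative sketch: recover disulphide pairings from the ASCII schematic.
# Rows are copied from the schematic with the common leading indent removed.
TOP    = "    +------+     +------------------------+                    +---+"
SEQ    = "xCxxCxxxxxxCxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxCxxxxCxxxxxxxxxxxxxxCCxxxCxxxxxxxx"
BOTTOM = " +---------------------+                       +--------------+"


def bond_ends(row: str) -> list[tuple[int, int]]:
    """Column pairs joined by one drawn bond.

    '+' marks are paired left to right, which is valid only because
    no two bonds on the same row overlap or nest.
    """
    cols = [i for i, ch in enumerate(row) if ch == "+"]
    return list(zip(cols[0::2], cols[1::2]))


def disulphide_pairs(top: str, seq: str, bottom: str) -> list[tuple[int, int]]:
    """Return each drawn bond as a pair of 1-based cysteine ordinals."""
    cys_cols = [i for i, ch in enumerate(seq) if ch == "C"]
    ordinal = {col: n for n, col in enumerate(cys_cols, start=1)}
    pairs = [
        (ordinal[a], ordinal[b])  # KeyError here means a misaligned '+'
        for row in (top, bottom)
        for a, b in bond_ends(row)
    ]
    return sorted(pairs)


if __name__ == "__main__":
    pairs = disulphide_pairs(TOP, SEQ, BOTTOM)
    print(f"{SEQ.count('C')} cysteines, {len(pairs)} bonds: {pairs}")
```

    Run on this schematic, the sketch reports ten cysteines joined as C1-C5, C2-C3, C4-C6, C7-C8 and C9-C10.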
    
    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm

    \ 6416 IPR010565 \

    This entry represents the N-terminal region of muskelin and is found in conjunction with several repeats. Muskelin is an intracellular, kelch repeat protein that is needed in cell-spreading responses to the matrix adhesion molecule, thrombospondin-1 PUBMED:12384287.

    \ 2473 IPR000771 \

    Fructose-bisphosphate aldolase PUBMED:2199259, PUBMED:1412694 is a glycolytic enzyme that catalyzes the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class II aldolases PUBMED:1412694, mainly found in prokaryotes and fungi, are homodimeric enzymes that require a divalent metal ion, generally zinc, for their activity. This family also includes the Escherichia coli galactitol operon protein, gatY, which catalyzes the transformation of tagatose 1,6-bisphosphate into glycerone phosphate and D-glyceraldehyde 3-phosphate. It also includes the Escherichia coli N-acetylgalactosamine operon protein, agaY, which catalyzes the same reaction. Two histidine residues in the first half of the sequence of these enzymes have been shown to be involved in binding a zinc ion PUBMED:8436219.

    \ 6681 IPR009659 \

    This family consists of several hypothetical bacterial proteins of around 150 residues in length. The function of this family is unknown.

    \ 6300 IPR010509 \

    This region covers the N terminus and first two membrane regions of a small family of ABC transporters. Mutations in this domain in are believed responsible for Zellweger Syndrome-2 PUBMED:1301993; mutations in are responsible for recessive X-linked adrenoleukodystrophy PUBMED:8441467. A Saccharomyces cerevisiae protein containing this domain is involved in the import of long-chain fatty acids PUBMED:8670886.

    \ 2002 IPR005531 \

    This is a family of small proteins. It includes a protein identified as an alkaline shock protein PUBMED:7864904, so members of this family may be involved in the stress response.

    \ 7950 IPR012571 \

    Proteins in this family are yeast mitochondrial inner membrane proteins MDM31 and MDM32. These proteins are required for the maintenance of mitochondrial morphology, and the stability of mitochondrial DNA PUBMED:15631992.

    \ 4156 IPR000788 \

    Ribonucleotide reductase (RNR) PUBMED:3286319, PUBMED:8511586 catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides, providing the precursors necessary for DNA synthesis. RNRs divide into three classes on the basis of their metallocofactor usage. Class I RNRs, found in eukaryotes, bacteria, bacteriophage and viruses, use a diiron-tyrosyl radical. Class II RNRs, found in bacteria, bacteriophage, algae and archaea, use coenzyme B12 (adenosylcobalamin, AdoCbl). Class III RNRs, found in anaerobic bacteria and bacteriophage, use an FeS cluster and S-adenosylmethionine to generate a glycyl radical. Many organisms have more than one class of RNR present in their genomes.

    Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). Class II RNRs are less complex, using the small molecule B12 in place of the small chain PUBMED:11875520.

    The reduction of ribonucleotides to deoxyribonucleotides involves the transfer of free radicals. The function of each metallocofactor is to generate an active-site thiyl radical. This thiyl radical then initiates the nucleotide reduction process by hydrogen atom abstraction from the ribonucleotide PUBMED:9309223. The radical-based reaction involves five cysteines. Two of these are located at adjacent antiparallel strands in a new type of ten-stranded alpha/beta-barrel. Two others reside at the carboxyl end in a flexible arm. The fifth, in a loop in the centre of the barrel, is positioned to initiate the radical reaction PUBMED:8052308. There are several regions of similarity in the sequence of the large chain of prokaryotes, eukaryotes and viruses, spread across three domains: an N-terminal domain common to the mammalian and bacterial enzymes; a C-terminal domain common to the mammalian and viral ribonucleotide reductases; and a central domain common to all three PUBMED:9309223.

    \ 7388 IPR011436 \

    This domain is found in a small number of Chlamydia proteins of unknown function. It occurs together with .

    \ 6161 IPR009402 \

    This family consists of several Orthopoxvirus A47 proteins. The function of this family is unknown.

    \ 6126 IPR009388 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    This family represents the low molecular weight transmembrane protein PsbY found in PSII. In higher plants, two related PsbY proteins exist, PsbY-1 and PsbY-2, which appear to function as a heterodimer. In spinach and Arabidopsis, these two proteins arise from a single-copy nuclear gene whose product is processed in the chloroplast. By contrast, prokaryotic and organellar chromosomes encode a single PsbY protein, as found in cyanobacteria and red algae, indicating a duplication event in the evolution of higher plants PUBMED:15042356. PsbY has two low-level, manganese-dependent activities: a catalase-like activity and an L-arginine-metabolising activity that converts L-arginine into ornithine and urea PUBMED:9829828. In addition, a redox-active group is thought to be present in the protein. In cyanobacteria, PsbY deletion mutants have a slightly impaired PSII that is less capable than the wild type of coping with low levels of calcium ions.

    \ 4188 IPR001021 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    The bacterial ribosomal protein L25 is an RNA-binding protein. Ribosomal protein L25 shows homology to general stress proteins and glutaminyl-tRNA synthetases PUBMED:9799245.

    \ 6120 IPR010439 \

    This domain is often found in tandem repeats and co-occurs with C2 domains, protein kinase C phorbol ester/diacylglycerol-binding regions and PH domains.

    \ 5080 IPR007917 \

    This family of proteins is functionally uncharacterised.

    \ 6392 IPR009504 \

    This family consists of several bacterial YhjQ proteins. The function of this family is unknown.

    \ 2373 IPR004167 \

    This is a small domain of the E2 subunit of 2-oxo acid dehydrogenases that is responsible for binding the E3 subunit. It is found in the branched-chain alpha-keto acid dehydrogenase complex of bacteria, which catalyses the overall conversion of alpha-keto acids to acyl-CoA and carbon dioxide. It is also found in the E3-binding protein of eukaryotic pyruvate dehydrogenase.

    \ 1242 IPR005129 \

    Bacterial periplasmic transport systems require the function of a specific substrate-binding protein, located in the periplasm, and several cytoplasmic membrane transport components. In Escherichia coli K-12, the arginine-ornithine transport system requires an arginine-ornithine-binding protein, and the lysine-arginine-ornithine (LAO) transport system includes a LAO-binding protein. Both periplasmic proteins can be phosphorylated by a single kinase, ArgK PUBMED:2136858. This phosphorylation reduces the transport activity of the periplasmic transport systems that include each of the binding proteins. The ArgK protein acts both as an ATPase and as a kinase.

    \ 640 IPR000086 \

    MutT is a small bacterial protein (~12-15 kDa) involved in the GO system PUBMED:1328155. This system is responsible for removing an oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite dA and dC residues of template DNA with nearly equal efficiency, leading to A.T to C.G transversions. MutT specifically degrades 8-oxo-dGTP to the monophosphate, with the concomitant release of pyrophosphate. A short conserved N-terminal region of MutT (designated the MutT domain) is also found in a variety of other prokaryotic, viral and eukaryotic proteins PUBMED:8233837, PUBMED:8170394, PUBMED:8226881, PUBMED:10373642.

    The generic name 'NUDIX hydrolases' (NUcleoside DIphosphate linked to some other moiety X) has been coined for this domain family PUBMED:8810257. The family can be divided into a number of subgroups, of which MutT antimutagenic activity represents only one type. Most of the rest hydrolyse diverse nucleoside diphosphate derivatives (including ADP-ribose, GDP-mannose, TDP-glucose, NADH, UDP-sugars, dNTP and NTP).

    \ 3472 IPR001694 \

    Respiratory-chain NADH dehydrogenase PUBMED:, PUBMED:2029890 (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane. It also seems to exist in the chloroplast and in cyanobacteria (as an NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this bioenergetic enzyme complex, fifteen are located in the membrane part, and seven of these are encoded by the mitochondrial and chloroplast genomes of most species. The most conserved of these organelle-encoded subunits is known as subunit 1 (gene ND1 in mitochondria, and NDH1 in chloroplasts) and seems to contain the ubiquinone-binding site.

    The ND1 subunit is highly similar to subunit 4 of Escherichia coli formate hydrogenlyase (gene hycD) and to subunit C of hydrogenase-4 (gene hyfC). Paracoccus denitrificans NQO8 and Escherichia coli nuoH NADH-ubiquinone oxidoreductase subunits also belong to this family PUBMED:7690854.

    \ 7972 IPR012539 \

    This family consists of the cuticle proteins from Cancer pagurus and Homarus americanus. These proteins have been isolated from the calcified regions of the crustacean exoskeleton. They contain two copies of an 18-residue sequence motif, which thus far has been found only in crustacean calcified exoskeletons PUBMED:10425740.

    \ 1930 IPR003839 \

    This is a family of proteins of unknown function, so far found only in Caenorhabditis elegans and Caenorhabditis briggsae.

    \ 2512 IPR004209 \

    Ferredoxin thioredoxin reductase is a [4Fe-4S] protein that plays an important role in the ferredoxin/thioredoxin regulatory chain. It converts an electron signal (photoreduced ferredoxin) to a thiol signal (reduced thioredoxin), regulating enzymes by reduction of specific disulphide groups. It catalyses the light-dependent activation of several photosynthetic enzymes. Ferredoxin thioredoxin reductase is a heterodimer of subunit a and subunit b. Subunit a is the variable subunit, and b is the catalytic chain. This family represents the beta chain.

    \ 2333 IPR002761 \

    This domain is about 200 amino acids long, with a strongly conserved SGGKD motif in its N-terminal region. The structure of Q8U2K6 from Pyrococcus furiosus has been solved to 2.7 Å resolution, and the protein is suggested to be a putative N-type pyrophosphatase.

    In some members of the family, e.g. , this domain is associated with , another domain of unknown function. Proteins with this uncharacterized domain include two apparent ortholog families in the archaea, one of which is universal among the first four completed archaeal genomes. The domain comprises the full length of the archaeal proteins and the first third of fungal proteins.
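
    The conserved N-terminal SGGKD motif suggests a simple first-pass screen. The Python sketch below reports the 1-based start positions of exact SGGKD matches. The function name, the 50-residue "N-terminal" window and the example sequence are hypothetical choices and are not part of this entry. A motif hit alone is weak evidence. In practice, family membership is assigned with signatures covering the whole ~200-residue domain.

```python
import re
from typing import Optional

# Exact match to the conserved motif named in this entry.
SGGKD = re.compile("SGGKD")


def motif_starts(seq: str, max_start: Optional[int] = None) -> list[int]:
    """Return 1-based start positions of SGGKD in a protein sequence.

    If max_start is given, keep only hits starting at or before that
    residue, as a crude (hypothetical) stand-in for "near the N terminus".
    """
    hits = [m.start() + 1 for m in SGGKD.finditer(seq.upper())]
    if max_start is not None:
        hits = [h for h in hits if h <= max_start]
    return hits


if __name__ == "__main__":
    toy = "MKVLVLSGGKDSTLAAHL"  # hypothetical sequence, for illustration only
    print(motif_starts(toy, max_start=50))  # -> [7]
```

    A regular expression is used so that the exact motif could later be relaxed into a pattern with ambiguous positions without changing the scanning code.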

    \ 6626 IPR010650 \

    This is a family of bacterial and archaeal PrkA serine kinases, approximately 630 residues long. PrkA possesses the A-motif of nucleotide-binding proteins and exhibits distant homology to eukaryotic protein kinases PUBMED:8626065. Note that many family members are hypothetical.

    \ 5721 IPR008670 \

    This family consists of several bacterial acyl-CoA reductase (LuxC) proteins. The channelling of fatty acids into the fatty aldehyde substrate for the bacterial bioluminescence reaction is catalysed by a fatty acid reductase multienzyme complex. This complex channels fatty acids through the thioesterase (LuxD), synthetase (LuxE) and reductase (LuxC) components PUBMED:9128139.

    \ 7673 IPR012890 \

    Sequences found in this family are similar to a region of a human GC-rich sequence DNA-binding factor homolog. This protein is thought to be involved in transcriptional regulation because of its partial homology to a transcription repressor and histone-interacting protein PUBMED:11707072.

    \ 2040 IPR007177 \

    This domain is found in a family of proteins of unknown function. It appears to be found in eukaryotes and archaebacteria, and occurs associated with a potential metal-binding region in RNase L inhibitor, RLI ().

    \ 683 IPR001730 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
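The letter-coding rules above can be captured in a few lines. The following is a minimal Python sketch, assuming that a family identifier starts with its catalytic-type letter (as in "C4" for the NIa family) and that a clan is characterised only by the set of catalytic types of its families; real MEROPS assignments also rest on structural evidence. The function names are hypothetical and are not part of any MEROPS interface.

```python
# Catalytic-type letter codes as described in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Serine, threonine and cysteine peptidases use part of an amino acid as the
# nucleophile (and form an acyl intermediate).
AMINO_ACID_NUCLEOPHILE = {"C", "S", "T"}
# Aspartic, glutamic and metallopeptidases use an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def family_type(family_id: str) -> str:
    """Return the catalytic-type letter from the first character of a family ID."""
    code = family_id[0].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unknown catalytic type code: {code!r}")
    return code


def clan_type(family_ids: list[str]) -> str:
    """A clan whose families span more than one catalytic type is of type P."""
    types = {family_type(f) for f in family_ids}
    return types.pop() if len(types) == 1 else "P"


def nucleophile(code: str) -> str:
    """Describe the nucleophile used by a given catalytic type."""
    if code in AMINO_ACID_NUCLEOPHILE:
        return "amino acid"
    if code in WATER_NUCLEOPHILE:
        return "activated water"
    return "unknown"
```

For example, `clan_type(["S1", "C4"])` returns "P" because the two (illustrative) families have different catalytic types, whereas `clan_type(["C4"])` returns "C".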

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    Nuclear inclusion A (NIA) proteases from potyviruses are cysteine peptidases belonging to the MEROPS peptidase family C4 (NIa protease family, clan PA(C)) PUBMED:7845226, PUBMED:.

    \ \

    Potyviruses include plant viruses in which the single-stranded RNA encodes a polyprotein with NIA protease activity, where proteolytic cleavage is specific for Gln+Gly sites. The NIA protease acts on the polyprotein, releasing itself by Gln+Gly cleavage at both the N- and C-termini. It further processes the polyprotein by cleavage at five similar sites in the C-terminal half of the sequence. In addition to its C-terminal protease activity, the NIA protease contains an N-terminal domain that has been implicated in the transcription process PUBMED:7845226.
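As an illustration of the Gln+Gly specificity described above, the following Python sketch scans a protein sequence (one-letter code) for Gln-Gly dipeptides and splits it at each one. It is a deliberate simplification: it assumes every Gln-Gly bond is a candidate site, whereas real NIa processing depends on additional residues around the scissile bond. The function names and the toy sequence are hypothetical.

```python
def candidate_cleavage_sites(seq: str) -> list[int]:
    """Return 0-based indices i where seq[i] is Gln (Q) and seq[i + 1] is Gly (G).

    Each index marks a candidate Gln+Gly scissile bond (cut after position i).
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "Q" and seq[i + 1] == "G"]


def cleave(seq: str, sites: list[int]) -> list[str]:
    """Split seq between Gln and Gly at every given site."""
    fragments, start = [], 0
    for i in sorted(sites):
        fragments.append(seq[start:i + 1])  # fragment ends with the Gln
        start = i + 1                       # next fragment starts with the Gly
    fragments.append(seq[start:])
    return fragments
```

For the toy sequence "MAQGSTQGK", `candidate_cleavage_sites` returns [2, 6] and `cleave` yields ["MAQ", "GSTQ", "GK"].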

    \

    This peptidase is present in the nuclear inclusion protein of potyviruses.

    \ 5792 IPR003888 \ The "FY-rich" domain N-terminal region is sometimes closely juxtaposed with the C-terminal region (), but in other proteins is far removed from it. It is of unknown function, but occurs frequently in chromatin-associated proteins such as trithorax and its homologues.\ 7347 IPR011118 \

    This family includes fungal tannase PUBMED:8917102 and feruloyl esterase PUBMED:11931668, PUBMED:8679110. It also includes several bacterial homologues of unknown function.

    \ 7079 IPR010830 \

    This family contains a number of hypothetical bacterial proteins of unknown function that are approximately 300 residues long.

    \ 3209 IPR001809 \ The ospA and ospB genes encode the major outer membrane proteins of the Lyme disease spirochaete Borrelia burgdorferi PUBMED:2761388. The deduced gene products, OspA and OspB, contain 273 and 296 residues, respectively PUBMED:2761388. The two Osp proteins show a high degree of sequence similarity, indicating a recent evolutionary event. Molecular analysis and sequence comparison of OspA and OspB with other proteins have revealed similarity to the signal peptides of prokaryotic lipoproteins PUBMED:2761388, PUBMED:1560779.\ 1793 IPR002288 \

    Topoisomerases catalyse the interconversion of topological isomers of DNA and play a key role in DNA metabolism PUBMED:7770916. Topoisomerase I catalyses an ATP-independent reaction, while topoisomerase II catalyses an ATP-dependent reaction, resulting in the formation of DNA supercoils PUBMED:1651812, PUBMED:1646964, PUBMED:2845399. Eukaryotic enzymes can form both positive and negative supercoils, while prokaryotic enzymes form only negative supercoils.

    \ \

    Eukaryotic topoisomerase II exists as a homodimer; in bacteriophage T4 it consists of three heterologous subunits; in prokaryotes it exists as a tetramer of two subunits (two each of gyrA and gyrB); and in Escherichia coli, a second type II topoisomerase, involved in chromosome segregation (topoisomerase IV), consists of two subunits (parC and parE). GyrB, parE, and the product of bacteriophage T4 gene 39 are all similar to the eukaryotic proteins.

    \

    Structural studies of E. coli topoisomerase II have shown that the enzyme binds to DNA, forming a complex in which a DNA strand of approximately 120 base pairs is wound around a protein core. At low resolution, this complex resembles a flattened sphere, and may be heart-shaped, with the DNA embedded in the protein. There is evidence for channels or cavities in the complex, which may have a role in the DNA translocation process PUBMED:1646964.

    \

    The gyrB protein possesses two uniquely folded domains. The N-terminal domain (domain 1) possesses ATP-binding and hydrolysis functions, and forms an 8-stranded anti-parallel beta-sheet with unusual strand connectivities; the structure, which is stabilised by a hydrophobic core, can be subdivided into 6- and 2-stranded anti-parallel sheets, connected by a parallel sheet. The C-terminal domain (domain 2) contains a 4-stranded mixed parallel and anti-parallel beta-sheet. Four helices are also present, two of which are rich in arginine residues. The gyrB dimer is punctured by a 20 Å hole, which may provide a gateway through which DNA is passed during supercoiling. Every arginine of domain 2 protrudes into this hole, possibly creating a DNA-binding surface PUBMED:1646964.

    \

    From this structural information and results of various biochemical studies, a possible mechanism has been proposed: DNA is first bound by the gyrB dimer, then cleaved by gyrA. A large conformational change allows passage of another DNA strand through the double-stranded break and into the protein complex. This may involve ATP binding, exploiting the energy of association of ATP to the complex to stabilise an unfavourable protein conformation. The DNA break is then repaired by ligation, and the whole DNA molecule released; this possibly involves hydrolysis of ATP to ADP and inorganic phosphate, which can dissociate from the protein, allowing the protein complex to return to its favoured conformation and releasing the DNA PUBMED:1646964.

    \ 4287 IPR005576 \

    The eukaryotic RNA polymerase subunits RPB4 and RPB7 form a heterodimer that reversibly associates with the RNA polymerase II core. Archaeal cells contain a single RNAP made up of about 12 subunits, displaying considerable homology to the eukaryotic RNAPII subunits. The RPB4 and RPB7 homologs are called subunits F and E, respectively, and have been shown to form a stable heterodimer. While the RPB7 homolog is reasonably well conserved, the similarity between the eukaryotic RPB4 and the archaeal F subunit is barely detectable PUBMED:11741548.

    \ 2121 IPR002726 \ This archaebacterial protein has no known function. It contains several predicted transmembrane regions, suggesting it is an integral membrane protein.\ 5576 IPR008820 \ Rubella virus (RV), the sole member of the genus Rubivirus within the family Togaviridae, is a small enveloped, positive-strand RNA virus. The nucleocapsid consists of 40S genomic RNA and a single species of capsid protein, and is enveloped within a host-derived lipid bilayer containing two viral glycoproteins, E1 (58 kDa) and E2 (42-46 kDa). In virus-infected cells, RV matures by budding either at the plasma membrane or at internal membranes, depending on the cell type, and enters adjacent uninfected cells by a membrane fusion process in the endosome, directed by E1-E2 heterodimers. Heterodimer formation is crucial for transport of E1 out of the endoplasmic reticulum to the Golgi and plasma membrane. In RV E1, a cysteine at position 82 is crucial for E1-E2 heterodimer formation and cell surface expression of the two proteins. E1 has been shown to be a type I membrane protein, rich in cysteine residues, with extensive intramolecular disulphide bonds PUBMED:11682134. This family is found together with and .\ 1214 IPR004939 \

    The anaphase-promoting complex (APC) is a multi-subunit E3 protein ubiquitin ligase that is responsible for the metaphase to anaphase transition and the exit from mitosis. One of the subunits of the APC that is required for its ubiquitination activity is Doc1/Apc10, a protein composed of a Doc1 homology domain that has been identified in a number of diverse putative E3 ubiquitin ligases. The crystal structure of Saccharomyces cerevisiae Doc1/Apc10 has been resolved at 2.2 Å resolution PUBMED:11884135. The Doc1 homology domain forms a beta-sandwich structure that is related in architecture to the galactose-binding domain of galactose oxidase, the coagulation factor C2 domain and a domain of XRCC1. Residues that are invariant amongst Doc1/Apc10 sequences map to a beta-sheet region of the molecule, whose counterparts in galactose oxidase, the coagulation factor C2 domains and XRCC1 mediate biomolecular interactions.

    \ 3991 IPR008179 \

    Phosphoribosyl-ATP pyrophosphatase catalyses the second step in the histidine biosynthetic pathway:\ \ The Neurospora crassa enzyme also catalyses the reactions of histidinol dehydrogenase () and phosphoribosyl-AMP cyclohydrolase ().

    \ \ 22 IPR000472 \ Transforming growth factor-beta (TGF-beta) forms a family with other growth factors described in . The receptors for most of the members of this growth factor family are related. These proteins are receptor-type kinases of Ser/Thr type (), which have a single transmembrane domain and a specific hydrophilic Cys-rich ligand-binding domain PUBMED:9023056, PUBMED:8047140, PUBMED:8909794. The C-terminal part of the extracellular domain is conserved. Some of the receptors of this family contain subclass-specific N-terminal extensions of this homology domain. The type I receptors also possess 7 extracellular residues preceding the cysteine box.\ 191 IPR007708 \

    This presumed domain is found at the C terminus of lariat debranching enzyme. This domain is always found in association with a metallo-phosphoesterase domain . RNA lariat debranching enzyme is capable of digesting a variety of branched nucleic acid substrates and multicopy single-stranded DNAs. The enzyme degrades intron lariat structures during splicing.

    \ 1720 IPR006218 \ Members of the 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthetase family catalyse the first step of the common pathway leading to chorismate, the precursor of the aromatic amino acids. Class I includes bacterial and yeast enzymes; class II includes enzymes from higher plants and various microorganisms (see ) PUBMED:8760910. \

    The first step in the common pathway leading to the biosynthesis of aromatic compounds is the stereospecific condensation of phosphoenolpyruvate (PEP) and D-erythrose-4-phosphate (E4P) giving rise to 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP). This reaction is catalysed by DAHP synthase, a metal-activated enzyme, which in microorganisms is the target for negative-feedback regulation by pathway intermediates or by end products. In Escherichia coli there are three DAHP synthetase isoforms, each specifically inhibited by one of the three aromatic amino acids. The crystal structure of the phenylalanine-regulated form of DAHP synthetase shows that the fold is a (beta/alpha)8 barrel with several additional beta strands and alpha helices PUBMED:10425687.

    \ 6728 IPR009682 \

    This family consists of several hypothetical Staphylococcus aureus and phage proteins of 53 residues in length. The function of this family is unknown.

    \ 6135 IPR010447 \

    This family consists of several Herpesvirus IR6 proteins. The equine herpesvirus 1 (EHV-1) IR6 protein forms typical rod-like structures in infected cells, influences virus growth at elevated temperatures, and determines the virulence of EHV-1 Rac strains PUBMED:9811716.

    \ 618 IPR004142 \ This family consists of proteins from different gene families: Ndr1/RTP/Drg1, Ndr2, and Ndr3. Their similarity was previously noted PUBMED:10581191. The precise molecular and cellular function of members of this family is still unknown, yet they are known to be involved in cellular differentiation events. The Ndr1 group was the first to be discovered. Their expression is repressed by the proto-oncogenes N-myc and c-myc, and in line with this observation, Ndr1 protein expression is down-regulated in neoplastic cells, and is reactivated when differentiation is induced by chemicals such as retinoic acid. Ndr2 and Ndr3 expression is not under the control of N-myc or c-myc. Ndr1 expression is also activated by several chemicals: tunicamycin and homocysteine induce Ndr1 in human umbilical endothelial cells; nickel induces Ndr1 in several cell types. Members of this family are found in a wide variety of multicellular eukaryotes, including an Ndr1-type protein in Helianthus annuus (sunflower), known as Sf21. Interestingly, the highest scoring matches in the noise are all alpha/beta hydrolases (), suggesting that this family may have an enzymatic function.\ 6119 IPR010438 \

    This family consists of several bacteriophage lambda Bor and Escherichia coli Iss proteins. Expression of bor significantly increases the survival of the E. coli host cell in animal serum. This property is a well-known bacterial virulence determinant; indeed, bor and its adjacent sequences are highly homologous to the iss serum resistance locus of the plasmid ColV2-K94, which confers virulence in animals. It has been suggested that lysogeny may generally have a role in bacterial survival in animal hosts, and perhaps in pathogenesis PUBMED:2144037.

    \ 4553 IPR001359 \ Synapsins are neuronal phosphoproteins that coat synaptic vesicles, bind to several elements of the cytoskeleton (including actin filaments), and are believed to function in the regulation of neurotransmitter release PUBMED:2117454, PUBMED:10578110. The synapsin family currently includes the highly related synapsins I and II, each of which exists as two alternatively spliced variants (IA and IB; IIA and IIB) that differ only at the C terminus. The family also includes synapsin III.\ 2673 IPR003191 \ Transcription of the antiviral guanylate-binding protein (GBP) is induced by interferon-gamma during macrophage induction. This family contains GBP1 and GBP2, both GTPases capable of binding GTP, GDP and GMP.\ 4612 IPR001222 \

    Transcription factor S-II (TFIIS) is a eukaryotic protein which induces mRNA cleavage by enhancing the intrinsic nuclease activity of RNA polymerase (Pol) II, allowing the polymerase to transcribe past template-encoded pause sites. TFIIS shows DNA-binding activity only in the presence of RNA polymerase II PUBMED:3346229. It is widely distributed, being found in mammals, Drosophila, yeast and the archaebacterium Sulfolobus acidocaldarius PUBMED:8502569. S-II proteins have a relatively conserved C-terminal region but a variable N-terminal region, and some members of this family are expressed in a tissue-specific manner PUBMED:1917889, PUBMED:8566795.

    \

    TFIIS is a modular factor that comprises an N-terminal domain I, a central domain II, and a C-terminal domain III PUBMED:12914699. The weakly conserved domain I forms a four-helix bundle and is not required for TFIIS activity. Domain II forms a three-helix bundle, and domain III adopts a zinc-ribbon fold with a thin protruding beta-hairpin. Domain II and the linker between domains II and III are required for Pol II binding, whereas domain III is essential for stimulation of RNA cleavage. TFIIS extends from the polymerase surface via a pore to the internal active site, spanning a distance of 100 Å. Two essential and invariant acidic residues in a TFIIS loop complement the Pol II active site and could position a metal ion and a water molecule for hydrolytic RNA cleavage. TFIIS also induces extensive structural changes in Pol II that would realign nucleic acids in the active centre.

    \

    Some viral proteins also contain the TFIIS zinc ribbon C-terminal domain. The vaccinia virus protein, unlike its eukaryotic homologue, is an integral RNA polymerase subunit rather than a readily separable transcription factor PUBMED:2398897.

    \ 7899 IPR012987 \

    This presumed domain is found at the N terminus of RNP K-like proteins that also contain KH domains PUBMED:15112237.

    \ 3793 IPR006708 \

    Peroxisomes form an intracellular compartment, bounded by a typical lipid bilayer membrane. Peroxisome functions are often specialised by organism and cell type; two widely distributed and well-conserved functions are H2O2-based respiration and fatty acid beta-oxidation. Other functions include ether lipid (plasmalogen) synthesis and cholesterol synthesis in animals, the glyoxylate cycle in germinating seeds ("glyoxysomes"), photorespiration in leaves, glycolysis in trypanosomes ("glycosomes"), and methanol and/or amine oxidation and assimilation in some yeasts.

    PEX genes encode the machinery ("peroxins") required to assemble the peroxisome. Membrane assembly and maintenance require three of these (peroxins 3, 16, and 19) and may occur without the import of the matrix (lumen) enzymes. Matrix protein import follows a branched pathway of soluble recycling receptors, with one branch for each class of peroxisome targeting sequence (two are well characterised), and a common trunk for all. At least one of these receptors, Pex5p, enters and exits peroxisomes as it functions. Proliferation of the organelle is regulated by Pex11p. Peroxisome biogenesis is remarkably conserved among eukaryotes. A group of fatal, inherited neuropathologies are recognised as peroxisome biogenesis diseases.

    \ 3513 IPR000415 \

    This family is involved in the reduction of nitrogen-containing compounds. Members of this family utilise FMN as a cofactor and are often found to be homodimers. Annotated activities include oxygen-insensitive NAD(P)H nitroreductase (FMN-dependent nitroreductase) (dihydropteridine reductase) () and NADH dehydrogenase (). A number of the proteins are described as oxidoreductases. They are primarily found in bacterial lineages, though a number of eukaryotic homologs have been identified: Caenorhabditis elegans , Drosophila melanogaster , mouse and human . This domain is not found in photosynthetic eukaryotes; sequences from photosynthetic organisms that appear to contain it are possible false positives.

    \ \ 4179 IPR002171 \

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:1579444, PUBMED:, groups:

    \ \ 1962 IPR004921 \ This family represents a group of plasmid-encoded proteins specifically found in Borrelia that currently do not show any similarity to any other proteins outside the Borrelia genus. Proteins within this family are about 450 residues long and are found to be expanded in Borrelia burgdorferi. The function of this protein is unknown. \ 7511 IPR011637 \ These proteins appear to have some sequence similarity with , but their function is unknown PUBMED:15306018.\ 902 IPR004345 \

    This family includes members from a wide variety of eukaryotes. It includes the TB2/DP1 (deleted in polyposis) protein, which in humans is deleted in severe forms of familial adenomatous polyposis, an inherited autosomal dominant cancer syndrome.

    \

    The family also includes a plant protein with known similarity to TB2/DP1, the HVA22 abscisic acid-induced protein (e.g. Q07764), which is thought to be a regulatory protein.

    \ 1846 IPR002812 \

    3-Dehydroquinate synthase (DHQS) () is an enzyme in the common pathway of aromatic amino acid biosynthesis that catalyses the conversion of 3-deoxy-D-arabino-heptulosonic acid 7-phosphate (DAHP) into 3-dehydroquinic acid PUBMED:11173489. The biosynthesis of aromatic amino acids is an essential metabolic function for most prokaryotic as well as lower eukaryotic cells, including plants. The pathway is absent in humans; therefore, DHQS represents a potential target for the development of novel and selective antimicrobial agents. Owing to the threat posed by the spread of pathogenic bacteria resistant to many currently used antimicrobial drugs, there is clearly a need to develop new anti-infective drugs acting at novel targets. A further potential use for DHQS inhibitors is as herbicides PUBMED:11412967.

    \ 341 IPR007815 \ This family includes erythromycin esterase enzymes PUBMED:3899861, PUBMED:3523438 that confer resistance to the erythromycin antibiotic.\ 1033 IPR004175 \ Members of this entry are bacterial and archaeal RNA ligases that are able to ligate tRNA half molecules containing 2',3'-cyclic phosphate and 5' hydroxyl termini to products containing the 2',5' phosphodiester linkage. Each member of this family contains an internal duplication, each half of which contains an HXTX motif that defines the family. The structure of a related protein is known PUBMED:12466548. They belong to the 2H phosphoesterase superfamily PUBMED:11080166. They share a common active site, characterised by two conserved histidines, with vertebrate myelin-associated 2',3' phosphodiesterases, Arabidopsis thaliana CPDases and several bacterial and viral proteins.\ 1402 IPR001562 \

    Kinases are generally multi-domain, multi-functional proteins. Protein tyrosine kinases (PTKs), serine/threonine kinases, and other signal transduction proteins possess a region of unknown function related to pleckstrin, designated the PH domain. A point mutation affecting a conserved Arg in the PH domain of the cytoplasmic PTK Btk causes the human disease X-linked agammaglobulinemia and X-linked immunodeficiency in mice. Btk (Bruton's tyrosine kinase) is an enzyme which is essential for B cell maturation in humans and mice PUBMED:8070576. Btk forms a family with two other PTKs, Itk/Tsk and Tec, in which the PH domain is followed by an SH3 domain. The conserved sequence between the two regions has tentatively been designated the TH (Tec homology) domain. The N-terminal 27 residues of the TH domain are highly conserved (the Btk motif), and are followed by a proline-rich region (PRR): the Btk motif contains a conserved His and three Cys residues that form a zinc finger (although its topology differs from known zinc finger topologies), while PRRs are commonly involved in protein-protein interactions. The Tec extension to the PH domain may be of functional importance in various signalling pathways in different species PUBMED:8070576. A complete TH domain, containing both the Btk motif and the PRR, has not been found outside the Btk family, and may be a hallmark of these cytoplasmic PTKs.

    \ \

    The crystal structures of Btk show that the Btk-type zinc finger has a globular core, formed by a long loop which is held together by a zinc ion. The zinc-binding residues are a histidine and three cysteines, which are fully conserved in the Btk motif PUBMED:8070576, PUBMED:9280283, PUBMED:9796816, PUBMED:9218782.

    \ 5241 IPR008744 \

    This signature is found in the RNA-directed RNA polymerase of apple chlorotic leaf spot virus and cherry mottle virus.

    \ 7505 IPR011628 \

    This conserved region is found in a group of haemagglutinins and peptidases, e.g. , that, in Porphyromonas gingivalis, form components of the major extracellular virulence complex RgpA-Kgp - a mixture of proteinases and adhesins PUBMED:10858222. These domains are cleaved from the original polyprotein and form part of the adhesins PUBMED:9245829.

    \ 892 IPR007019 \

    The surfeit locus protein SURF-6 has been shown to be a component of the nucleolar matrix and has a strong binding capacity for nucleic acids PUBMED:9548374. SURF-6 is always found in the nucleolus regardless of the phase of the cell cycle, suggesting that it is a structural protein constitutively present in nucleolar substructures. A role in rRNA processing has been proposed for this protein.

    \ 5726 IPR008578 \ This family consists of several plant proteins of unknown function.\ 6877 IPR010755 \

    This family consists of several hypothetical bacterial proteins of around 165 residues in length. The function of this family is unknown.

    \ 6867 IPR010750 \

    This family consists of several hypothetical eukaryotic proteins of around 300 residues in length. The function of this family is unknown.

    \ 3803 IPR005844 \

    Phosphoglucomutase (, PGM) is an enzyme responsible for the conversion of D-glucose 1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose. Phosphomannomutase (, PMM) is an enzyme responsible for the conversion of D-mannose 1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria.

    \

    This domain is contained in both proteins.

    \ 3332 IPR001497 \

    Synonym(s): 6-O-methylguanine-DNA methyltransferase, O-6-methylguanine-DNA-alkyltransferase

    \ \

    The repair of DNA containing O6-alkylated guanine is carried out by DNA-[protein]-cysteine S-methyltransferase (). The major mutagenic and carcinogenic effect of methylating agents in DNA is the formation of O6-alkylguanine. The alkyl group at the O-6 position is transferred to a cysteine residue in the enzyme PUBMED:3052269. This is a suicide reaction since the enzyme is irreversibly inactivated and the methylated protein accumulates as a dead-end product. Most, but not all, of the methyltransferases are also able to repair O-4-methylthymine. DNA-[protein]-cysteine S-methyltransferases are widely distributed and are found in various prokaryotic and eukaryotic sources PUBMED:1579490.
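Because each enzyme molecule is consumed in the reaction, repair capacity is stoichiometric rather than catalytic. The toy Python model below makes that explicit. It assumes exactly one alkyl group is transferred per enzyme molecule and ignores kinetics, new protein synthesis and O-4-methylthymine repair; the function name is hypothetical.

```python
def repair_capacity(active_enzymes: int, o6_lesions: int) -> tuple[int, int, int]:
    """Toy model of a suicide (single-turnover) repair enzyme.

    Each enzyme molecule accepts one alkyl group on its active-site cysteine and
    is then permanently inactivated, so at most one lesion is repaired per molecule.
    Returns (lesions_repaired, lesions_remaining, active_enzymes_remaining).
    """
    if active_enzymes < 0 or o6_lesions < 0:
        raise ValueError("counts must be non-negative")
    repaired = min(active_enzymes, o6_lesions)
    return repaired, o6_lesions - repaired, active_enzymes - repaired
```

With 100 active enzyme molecules and 30 lesions, all 30 lesions are repaired and 70 enzyme molecules remain; with 10 molecules and 25 lesions, 15 lesions remain unrepaired and the active enzyme pool is exhausted.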

    \ 87 IPR004711 \ The benzoate transporter family contains only a single characterised member, the benzoate transporter of Acinetobacter calcoaceticus, which functions as a benzoate/proton symporter.\ 3839 IPR004188 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet formation, flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II PUBMED:.
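The class assignments listed above can be encoded as a simple lookup table. This is a minimal Python sketch, assuming three-letter amino acid codes as keys and recording only the general 2'/3'-hydroxyl tendency stated in the text; real enzymes include exceptions (for example, some archaea use a class I lysyl-tRNA synthetase), and the names used here are hypothetical.

```python
# Class membership of the 20 canonical aminoacyl-tRNA synthetases, keyed by the
# three-letter code of the amino acid each enzyme charges (as listed in the text).
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# General tendency only: class I enzymes attach the aminoacyl group to the
# 2'-hydroxyl of the terminal adenosine, class II enzymes preferentially the 3'-hydroxyl.
PREFERRED_HYDROXYL = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return "I" or "II" for a three-letter amino acid code (case-insensitive)."""
    code = amino_acid.strip().capitalize()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise KeyError(f"no canonical synthetase class for {amino_acid!r}")


# The two classes are disjoint and together cover all 20 canonical amino acids.
assert CLASS_I.isdisjoint(CLASS_II) and len(CLASS_I | CLASS_II) == 20
```

For example, `synthetase_class("Tyr")` returns "I", and `PREFERRED_HYDROXYL[synthetase_class("Ser")]` returns "3'-OH".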

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    Phenylalanyl-tRNA synthetase from Thermus thermophilus has an (alpha)2(beta)2 quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanyl-tRNA synthetase as a member of class II aaRSs was based only on sequence alignment of the small alpha-subunit with other synthetases PUBMED:8199244. This is the N-terminal domain of phenylalanyl-tRNA synthetase.

    \ 1399 IPR002538 \ Members of this family assemble into long tubular structures at the surface of the infected protoplast. These proteins aid viral infection PUBMED:9267012, PUBMED:9514964.\ 8106 IPR004624 \

    This protein family includes an uncharacterised member designated phnA in Escherichia coli, part of a large operon associated with alkylphosphonate uptake and carbon-phosphorus bond cleavage PUBMED:2155230. This protein is not related to the characterised phosphonoacetate hydrolase designated PhnA PUBMED:9300819.

    \ 7220 IPR010868 \

    This entry represents the N terminus (approximately 50 residues) of cyclin-dependent kinase inhibitor 2a p19Arf, which seems to be restricted to mammals. This is a tumour-suppressor protein that has been shown to inhibit the growth of human tumour cells lacking functional p53 by inducing a transient G2 arrest and subsequently apoptosis PUBMED:12660818.

    \ 1310 IPR002554 \ Protein phosphatase 2A (PP2A) is a major intracellular protein phosphatase that regulates multiple aspects of cell growth and metabolism. The ability of this widely distributed heterotrimeric enzyme to act on a diverse array of substrates is largely controlled by the nature of its regulatory B subunit. There are multiple families of B subunits; this family is called the B56 family PUBMED:7592815.\ 2504 IPR003447 \ The femAB operon codes for two nearly identical, approximately 50-kDa proteins involved in the formation of the staphylococcal pentaglycine interpeptide bridge in peptidoglycan PUBMED:9393725. These proteins are also considered a factor influencing the level of methicillin resistance PUBMED:10209768.\ 890 IPR000863 \ This family includes a range of sulphotransferase proteins, including flavonyl 3-sulphotransferase, aryl sulphotransferase, alcohol sulphotransferase, estrogen sulphotransferase and phenol-sulphating phenol sulphotransferase. These enzymes are responsible for the transfer of sulphate groups to specific compounds.\ 7753 IPR012477 \

    This family features glycosyltransferases belonging to glycosyltransferase family 52 PUBMED:12691742, which have alpha-2,3-sialyltransferase and alpha-glucosyltransferase activity. For example, the beta-galactoside alpha-2,3-sialyltransferase expressed by Neisseria meningitidis is a member of this family and is involved in a step of lipooligosaccharide biosynthesis requiring sialic acid transfer; these lipooligosaccharides are thought to be important in pathogenesis PUBMED:8910446.

    \ 6969 IPR010789 \

    This family consists of several putative Lactococcus bacteriophage terminase small subunit proteins. The exact function of this family is unknown.

    \ 6510 IPR009567 \

    This family consists of several eukaryotic proteins of around 360 residues in length. The function of this family is unknown.

    \ 5247 IPR008629 \ In Arabidopsis, GUN4 is required for the plastid-mediated repression of nuclear transcription that is involved in controlling the levels of magnesium-protoporphyrin IX (Mg-Proto). GUN4 binds the product and substrate of Mg-chelatase, the enzyme that produces Mg-Proto, and activates Mg-chelatase. GUN4 is thought to participate in plastid-to-nucleus signaling by regulating Mg-Proto synthesis or trafficking.\ 12 IPR008154 \

    Amyloidogenic glycoprotein (A4 protein or APP) is an integral, glycosylated membrane protein of the brain PUBMED:2900137, PUBMED:8140621. APP is associated with Alzheimer's disease (AD) because a small peptide of 43 residues, called the amyloid beta protein, which is part of the A4 sequence, is the major constituent of amyloid deposits in AD and in Down's syndrome. As shown in the schematic representation below, the amyloid beta protein both precedes and forms part of the unique transmembrane region of A4.

    \
    \
           +----------------------------------------XXXXXXX-------------+
           |  Extracellular                         XXXXXXX Cytoplasmic |
           +------------------------------------BBBBBBBBXXX-------------+

    'X': Transmembrane region.
    'B': Position of the amyloid beta protein in A4.
    
    \

    The exact function of A4 protein is not yet known, but it has been suggested that it mediates cell-cell interactions PUBMED:8380642, PUBMED:8425535.

    \ 2649 IPR004957 \ The Spumavirus gag protein is a core viral polyprotein that undergoes specific enzymatic cleavages in vivo to yield the mature protein.\ 5527 IPR008906 \ This dimerisation domain is found at the C terminus of the transposases of elements belonging to the Activator superfamily (hAT element superfamily). The isolated dimerisation domain forms extremely stable dimers in vitro PUBMED:10662858.\ 7381 IPR011434 \

    This domain is found as 1-3 copies in a small family of proteins of unknown function.

    \ 1314 IPR002010 \

    Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior PUBMED:8969244. Four secretion systems have been described in animal enteropathogens such as Salmonella and Yersinia, with further sequence similarities in plant pathogens such as Ralstonia and Erwinia PUBMED:8969244.

    \ \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell PUBMED:10334981 and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis PUBMED:10564516. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself PUBMED:10564516, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.

    \ \

    The type III inner membrane proteins are believed to be used as structural moieties in a complex with several other subunits PUBMED:9618447. One such set of inner membrane proteins, labeled "R" here for nomenclature purposes, includes the Salmonella and Shigella SpaR, the Yersinia YscT, the Rhizobium Y4YN and the Erwinia HrcT proteins PUBMED:9618447. The flagellar protein FliR also shares similarity, probably due to evolution of the type III secretion system from the flagellar biosynthetic pathway.

    \ \ 5862 IPR009265 \

    This family consists of several short baculovirus proteins of unknown function.

    \ 3801 IPR001576 \ Phosphoglycerate kinase (PGK) is an enzyme that catalyses the reversible transfer of a phosphoryl group from 1,3-diphosphoglycerate to ADP, forming ATP. In the second step of the second phase of glycolysis, 1,3-diphosphoglycerate is converted to 3-phosphoglycerate, forming one molecule of ATP; in the reverse direction, ATP is consumed and one molecule of ADP is formed. This reaction is essential in most cells for the generation of ATP in aerobes, for fermentation in anaerobes and for carbon fixation in plants. \
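
The PGK reaction described above can be summarised as one reversible equation. This is a sketch in standard biochemical notation: the magnesium-bound nucleotide forms follow the substrate description given in this entry, and 1,3-diphosphoglycerate is written under its other common name, 1,3-bisphosphoglycerate.

```latex
\text{1,3-bisphosphoglycerate} + \text{MgADP}
  \;\rightleftharpoons\;
\text{3-phosphoglycerate} + \text{MgATP}
```

Read left to right, this is the glycolytic, ATP-forming direction. Read right to left, it is the reverse reaction, which consumes ATP and forms ADP.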

    PGK is found in all living organisms and its sequence has been highly conserved throughout evolution. The enzyme exists as a monomer containing two nearly equal-sized domains that correspond to the N- and C-termini of the protein (the last 15 C-terminal residues loop back into the N-terminal domain). 3-phosphoglycerate (3-PG) binds to the N-terminal domain, while the nucleotide substrates, MgATP or MgADP, bind to the C-terminal domain of the enzyme. This extended two-domain structure is associated with large-scale 'hinge-bending' conformational changes, similar to those found in hexokinase PUBMED:10593256. At the core of each domain is a 6-stranded parallel beta-sheet surrounded by alpha helices. Domain 1 has a parallel beta-sheet of six strands with an order of 342156, while domain 2 has a parallel beta-sheet of six strands with an order of 321456. Analysis of the reversible unfolding of yeast phosphoglycerate kinase leads to the conclusion that the two lobes are capable of folding independently, consistent with the presence of intermediates on the folding pathway with a single domain folded PUBMED:2124145.

    \

    Phosphoglycerate kinase (PGK) deficiency is associated with haemolytic anaemia and mental disorders in man PUBMED:6689547.

    \ 678 IPR008279 \ A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a phospho-histidine intermediate have been shown to be structurally related PUBMED:7686067, PUBMED:8973315, PUBMED:2176881, PUBMED:1557039. All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a histidine residue. This domain is a "swivelling" beta/beta/alpha domain which is thought to be mobile in all proteins known to contain it PUBMED:12083528. It is often found associated with the pyruvate phosphate dikinase PEP/pyruvate-binding domain at its N terminus.\ 1177 IPR000933 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycosyl hydrolase family 29 encompasses alpha-L-fucosidases PUBMED:2482732, lysosomal enzymes responsible for hydrolysing the alpha-1,6-linked fucose joined to the reducing-end N-acetylglucosamine of the carbohydrate moieties of glycoproteins. Deficiency of alpha-L-fucosidase results in the lysosomal storage disease fucosidosis.

    \ \ 1120 IPR000646 \ This family includes hexon-associated proteins from adenoviruses. Adenoviruses are responsible for diseases such as pneumonia, cystitis, conjunctivitis and diarrhoea, all of which can be fatal to patients who are immunocompromised PUBMED:7704534.\ 3940 IPR007660 \

    This is a family of Chordopoxvirinae D3 proteins. The conserved region occupies the entire length of the D3 protein.

    \ 5738 IPR008586 \ This family consists of several hypothetical proteins from plants. The function of this family is unknown.\ 3076 IPR007285 \ Chlamydia trachomatis is an obligate intracellular bacterium that develops within a parasitophorous vacuole termed an inclusion. The inclusion is nonfusogenic with lysosomes but intercepts lipids from a host cell exocytic pathway. Initiation of chlamydial development is concurrent with modification of the inclusion membrane by a set of C. trachomatis-encoded proteins collectively designated Incs. One of these Incs, IncA, is functionally associated with the homotypic fusion of inclusions PUBMED:12065525.\ 4202 IPR001705 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
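
The elongation cycle described above can be illustrated with a deliberately simplified model that tracks only which chain occupies which ribosomal site. This is a sketch under stated assumptions, not a kinetic model. The function name `translate`, the four-codon table (a small subset of the standard genetic code) and the placement of the first residue in the P site are all introduced here for illustration; the text does not cover initiation.

```python
# Toy model of the ribosomal elongation cycle. CODON_TABLE is a
# hypothetical subset of the standard genetic code, just large enough
# for the example.
CODON_TABLE = {"AUG": "M", "GCU": "A", "AAA": "K", "UGG": "W"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> tuple[str, int]:
    """Return (nascent chain, number of deacylated tRNAs released)."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    # Assumption: initiation leaves a tRNA carrying the first residue in the P site.
    p_site_chain = CODON_TABLE[codons[0]]
    released = 0
    for codon in codons[1:]:
        if codon in STOP_CODONS:
            break  # termination is outside the scope of this sketch
        # 1. Aminoacyl-tRNA (complexed with EF-Tu and GTP) binds the A site.
        a_site_residue = CODON_TABLE[codon]
        # 2. Peptidyl transfer: the chain moves from the P-site tRNA onto the
        #    A-site aminoacyl-tRNA and is extended by one residue.
        a_site_chain = p_site_chain + a_site_residue
        # 3. Translocation (EF-G and GTP): the new peptidyl-tRNA moves to the
        #    P site and the deacylated tRNA leaves through an exit site.
        p_site_chain = a_site_chain
        released += 1
    return p_site_chain, released


print(translate("AUGGCUAAAUGGUAA"))  # ('MAKW', 3)
```

Each loop iteration corresponds to one elongation cycle, so one deacylated tRNA is released per residue added after the first.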

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L33 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L33 has been shown to be on the surface of the 50S subunit. L33 belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:1742360, PUBMED:8112583, PUBMED:, groups:\

    \

    L33 is a small protein of 49 to 66 amino-acid residues.

    \ 5020 IPR000967 \ This domain is presumed to be a zinc-binding domain. The following pattern describes the zinc finger: C-X(1-6)-H-X-C-X3-C(H/C)-X(3-4)-(H/C)-X(1-10)-C, where X can be any amino acid and the numbers in brackets indicate the number of residues. The two positions written as (H/C) can be either His or Cys. This domain is found in the human transcriptional repressor NF-X1, a repressor of HLA-DRA transcription; the Drosophila shuttle craft protein, which plays an essential role during the late stages of embryonic neurogenesis; and the yeast hypothetical protein YNL023C.\ 2833 IPR004846 \

    This family includes protein D, which is involved in the general (type II) secretion pathway (GSP) of Gram-negative bacteria, a signal sequence-dependent process responsible for protein export PUBMED:8438237, PUBMED:1365398, PUBMED:1592799, PUBMED:8326859, PUBMED:7901733, PUBMED:7934814, PUBMED:8190064, and protein G from the type III secretion system.

    \

    A number of proteins are involved in the GSP; one of these is known as protein D (GSPD protein), which is most probably located in the outer membrane PUBMED:2677007. This suggests that protein D constitutes the apparatus of the accessory mechanism and is thus involved in transporting exoproteins from the periplasm, across the outer membrane, to the extracellular environment.

    \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself PUBMED:10564516, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. Protein G aids in the structural assembly of the invasion complex PUBMED:8733226.

    \ 7991 IPR012580 \

    This small domain is found in a novel nucleolar family PUBMED:15112237.

    \ 3381 IPR005053 \ This family includes the MobA protein from the E. coli plasmid RSF1010 and the MobL protein from the Thiobacillus ferrooxidans plasmid pTF1. These sequences are mobilization proteins, which are essential for specific plasmid transfer.\ 6737 IPR010698 \

    This family consists of several Chordopoxvirus proteins of around 160 residues in length. The function of this family is unknown.

    \ 6511 IPR009568 \

    This family contains a number of hypothetical proteins of unknown function from Arabidopsis thaliana.

    \ 501 IPR007701 \ Interferon-related developmental regulator (IFRD1) is the human homologue of the Rattus norvegicus early response protein PC4 and its murine homolog TIS7 PUBMED:9050919. The exact function of IFRD1 is unknown but it has been shown that PC4 is necessary for muscle differentiation and that it might have a role in signal transduction. This entry also contains IFRD2 and its murine equivalent SKMc15, which are highly expressed soon after gastrulation and in the hepatic primordium, suggesting an involvement in early hematopoiesis PUBMED:9722946.\ 1032 IPR006140 \

    \ A number of NAD-dependent 2-hydroxyacid dehydrogenases that seem to be specific for the D-isomer of their substrate have been shown to be functionally and structurally related. All contain a glycine-rich region located in the central section of these enzymes; this region corresponds to the NAD-binding domain. The catalytic domain is described in

    \ 3702 IPR005650 \ The penicillinase repressor negatively regulates expression of the penicillinase gene. The N-terminal region of this protein is involved in operator recognition, while the C-terminal region is responsible for dimerisation of the protein PUBMED:8226686.\ 7229 IPR009979 \

    This family consists of several Lentivirus viral infectivity factor (VIF) proteins. VIF is known to be essential for the ability of cell-free virus preparations to infect cells PUBMED:9440006. Members of this family are specific to bovine immunodeficiency virus (BIV) and Jembrana disease virus, which also infects cattle.

    \ 6290 IPR006624 \

    Tectonins I and II are two dominant proteins in the nuclei and nuclear matrix of plasmodia of Physarum polycephalum, consisting of 217 and 353 amino acids, respectively. Tectonin I is homologous to the C-terminal two-thirds of tectonin II. Both proteins contain six tandem repeats that are each 33-37 amino acids in length and define a new consensus sequence. Homologous repeats are found in L-6, a bacterial lipopolysaccharide-binding lectin from horseshoe crab hemocytes. The repetitive sequences of the tectonins and L-6 are reminiscent of the WD repeats of the beta-subunit of G proteins, suggesting that they form beta-propeller domains. The tectonins may be lectins that function as part of a transmembrane signaling complex during phagocytosis PUBMED:9497393.

    \ \ 4450 IPR000037 \ In bacteria, SsrA RNA recognizes ribosomes stalled on defective messages and acts as both a tRNA and an mRNA to mediate the addition of a short peptide tag to the C terminus of the partially synthesized nascent polypeptide chain. The SsrA-tagged protein is then degraded by C-terminal-specific proteases.

    SmpB, a unique RNA-binding protein that is conserved throughout the bacterial kingdom, is an essential component of the SsrA quality-control system. Deletion of the smpB gene in Escherichia coli results in the same phenotypes observed in ssrA-defective cells, including a variety of phage development defects and the failure to tag proteins translated from defective mRNAs. Purified SmpB binds specifically and with high affinity to SsrA RNA and is required for stable association of SsrA with ribosomes in vivo. Formation of an SmpB-SsrA complex appears to be critical in mediating SsrA activity after aminoacylation with alanine but prior to the transpeptidation reaction that couples this alanine to the nascent chain. SsrA RNA is present at wild-type levels in the smpB mutant, arguing against a model of SsrA action that involves direct competition for transcription factors PUBMED:10393194.

    \ 6749 IPR009691 \

    This family consists of several VirB2 type IV secretion proteins. The virB2 gene encodes a component of a putative type IV secretion system and is known to be a pathogenicity factor in Bartonella species PUBMED:12421311.

    \ 2320 IPR007823 \ This family consists of uncharacterised eukaryotic proteins which are related to S-adenosyl-L-methionine-dependent methyltransferases.\ 4955 IPR000012 \ HIV is the human retrovirus associated with AIDS (acquired immune deficiency syndrome), and SIV its simian counterpart. Three main groups of primate lentivirus are known, designated HIV-1, HIV-2/SIVMAC/SIVSM and SIVAGM. SIVMND has been suggested to represent a fourth distinct group PUBMED:2797181. These groups are believed to have diverged from a common ancestor long before the spread of AIDS in humans. Genetic variation in HIV-1 and HIV-2 has been studied extensively, and the nucleotide sequences of several strains have been reported PUBMED:2611042.

    ORF analysis has revealed two open reading frames, yielding the so-called R- and X-ORF proteins, whose functions are unknown but which show a high degree of sequence similarity.

    \ 187 IPR007732 \

    Flavocytochrome b558 is the catalytic core of the respiratory-burst oxidase, an enzyme complex that catalyzes the NADPH-dependent reduction of O2 to the superoxide anion O2- in phagocytic cells. Flavocytochrome b558 is anchored in the plasma membrane. It is a heterodimer that consists of a large glycoprotein, gp91phox (phox for phagocyte oxidase; beta subunit), and a small protein, p22phox (alpha subunit). The other components of the respiratory-burst oxidase are water-soluble proteins of cytosolic origin, namely p67phox, p47phox, p40phox and Rac. Upon cell stimulation, they assemble with the membrane-bound flavocytochrome b558, which becomes activated and generates O2- PUBMED:8798532.\
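
The overall reaction of the respiratory-burst oxidase can be written with the usual stoichiometry. Each NADPH donates two electrons, and each electron reduces one O2 molecule to superoxide, so the charges balance on both sides:

```latex
\text{NADPH} + 2\,\text{O}_2 \;\longrightarrow\; \text{NADP}^+ + \text{H}^+ + 2\,\text{O}_2^{\bullet-}
```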

    \ 7897 IPR012568 \

    This family represents the K167/Chmadrin repeat PUBMED:15112237. The function of this repeat is unknown.

    \ 1645 IPR003204 \

    Cytochrome c oxidase is an oligomeric enzymatic complex which is a component of the respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen PUBMED:6307356. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane.

    \

    In eukaryotes, in addition to the three large subunits, I, II and III, that form the catalytic center of the enzyme complex, there is a variable number of small polypeptide subunits. One of these subunits is known as Va.

    \ 7411 IPR011444 \

    This domain is found in a family of paralogues in the planctomycetes. Its function is not known. It is found associated with the Planctomycete cytochrome C domain.

    \ 6245 IPR009445 \

    This family consists of several hypothetical eukaryotic proteins of unknown function.

    \ 965 IPR005369 \

    The function of this family is unknown; however, the proteins contain two cysteine clusters that may be iron-sulphur redox centres.

    \ 4125 IPR006935 \ This family includes the res subunit of type III restriction enzymes PUBMED:11178902, PUBMED:9628345.\ 5775 IPR010272 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 7979 IPR012541 \

    This C-terminal domain is found in the Dbp10p subfamily of hypothetical RNA helicases PUBMED:15112237.

    \ 3415 IPR005634 \

    This is a family of Drosophila proteins that are typified by the repetitive motif C-G-P.

    \ 7484 IPR011514 \

    This is a short domain found in bacterial type II/III secretory system proteins. The architecture of these proteins suggests that this family may be functionally analogous to .

    \ 4782 IPR004205 \

    The ubiquinol-cytochrome c reductase complex (cytochrome bc1 complex) is a respiratory multi-enzyme complex PUBMED:9651245, which recognizes a mitochondrial targeting presequence. The bc1 complex contains 11 subunits: 3 respiratory subunits (cytochrome b, cytochrome c1 and Rieske protein), 2 core proteins and 6 low molecular weight proteins. This family represents the 9.5 kDa subunit of the complex. This subunit, together with cytochrome b, binds to ubiquinone.

    \ 395 IPR000101 \

    Gamma-glutamyltranspeptidase (GGT) PUBMED:2868390 catalyzes the transfer of the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione and for drug and xenobiotic detoxification PUBMED:1378736. In both prokaryotes and eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single-chain precursor by an autocatalytic cleavage. The active site of GGT is known to be located in the light subunit. The sequences of mammalian and bacterial GGT show a number of regions of high similarity PUBMED:2570061. Pseudomonas cephalosporin acylases that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show some GGT activity PUBMED:1358202. Like GGT, these GL-7ACA acylases are composed of two subunits.
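
The two outcomes described above, transpeptidation and hydrolysis, can be sketched as follows. Glutathione is written as gamma-Glu-Cys-Gly, and X is a placeholder introduced here for illustration, standing for an amino acid or peptide acceptor:

```latex
\begin{aligned}
\gamma\text{-Glu-Cys-Gly} + \text{X} &\longrightarrow \gamma\text{-Glu-X} + \text{Cys-Gly} && \text{(transpeptidation)}\\
\gamma\text{-Glu-Cys-Gly} + \text{H}_2\text{O} &\longrightarrow \text{Glu} + \text{Cys-Gly} && \text{(hydrolysis)}
\end{aligned}
```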

    \ \

    As an autocatalytic peptidase GGT belongs to MEROPS peptidase family T3 (gamma-glutamyltransferase family, clan PB(T)). The active site residue for members of this family and family T1 is C-terminal to the autolytic cleavage site. The type example is gamma-glutamyltransferase 1 from Escherichia coli.

    \ 568 IPR001739 \

    Methylation at CpG dinucleotides, the most common DNA modification in eukaryotes, has been correlated with gene silencing associated with various phenomena such as genomic imprinting, transposon and X-chromosome inactivation, differentiation, and cancer. Effects of DNA methylation are mediated through proteins which bind to symmetrically methylated CpGs. Such proteins contain a specific domain of ~70 residues, the methyl-CpG-binding domain (MBD), which is linked to additional domains associated with chromatin, such as the bromodomain, the AT hook motif, the SET domain, or the PHD finger. MBD-containing proteins appear to act as structural proteins, which recruit a variety of histone deacetylase (HDAC) complexes and chromatin remodeling factors, leading to chromatin compaction and, consequently, to transcriptional repression. The MBD of MeCP2, MBD1, MBD2, MBD4 and BAZ2 mediates binding to DNA; in the case of MeCP2, MBD1 and MBD2, binding is preferentially to methylated CpG. In the case of human MBD3 and SETDB1, the MBD has been shown to mediate protein-protein interactions PUBMED:12529184, PUBMED:12787239.

    \ \

    The MBD folds into an alpha/beta sandwich structure comprising a layer of twisted beta sheet, backed by another layer formed by the alpha1 helix and a hairpin loop at the C terminus. These layers are both amphipathic, with the alpha1 helix and the beta sheet lying parallel and the hydrophobic faces tightly packed against each other. The beta sheet is composed of two long inner strands (beta2 and beta3) sandwiched by two shorter outer strands (beta1 and beta4) PUBMED:11371345.

    \ \ 1690 IPR004367 \

    Cyclins are eukaryotic proteins that play an active role in controlling nuclear cell division cycles PUBMED:12910258, and regulate cyclin dependent kinases (CDKs). Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two main groups of cyclins, G1/S cyclins, which are essential for the control of the cell cycle at the G1/S (start) transition, and G2/M cyclins, which are essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase). In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

    \

    Cyclin homologues have been found in various viruses, including herpesvirus saimiri and Kaposi's sarcoma-associated herpesvirus. These viral homologues differ from their cellular counterparts in that the viral proteins have gained new functions and eliminated others to harness the cell and benefit the virus PUBMED:11056549.

    \ \

    This is the C-terminal domain of cyclins.

    \ 7177 IPR009949 \

    This family consists of several hypothetical Sapovirus proteins of around 165 residues in length. The function of this family is unknown.

    \ 3784 IPR000028 \ Chloroperoxidase (CPO) is a versatile heme-containing enzyme that exhibits peroxidase, catalase and cytochrome P450-like activities in addition to catalyzing halogenation reactions PUBMED:8747463. Despite functional similarities with other heme enzymes, CPO folds into a novel tertiary structure dominated by eight helical segments. The catalytic base, required to cleave the peroxide O-O bond, is glutamic acid rather than histidine as in other peroxidases.\ 2150 IPR007451 \ Protein of unknown function, cotranscribed with purB in Escherichia coli, but with a function unrelated to purine biosynthesis PUBMED:8969519.\ 5667 IPR008401 \ The anaphase-promoting complex (APC) is a conserved multi-subunit ubiquitin ligase required for the degradation of key cell cycle regulators. Members of this family are components of the anaphase-promoting complex homologous to Apc13p PUBMED:12477395.\ 4963 IPR005159 \

    The WCCH motif is found in retrotransposons and geminiviruses. A specific function has not been associated with this motif PUBMED:11600699.

    \ 1042 IPR002421 \

    The N-terminal and internal 5'-3' exonuclease domains are commonly found together, and are most often associated with 5' to 3' nuclease activities. The XPG protein signatures are never found outside the '53EXO' domains. The latter are found in more diverse proteins PUBMED:7926735, PUBMED:10322433, PUBMED:8464724. The number of amino acids that separate the two 53EXO domains, and the presence of accompanying motifs, allow the diagnosis of several protein families.

    In the eubacterial type A DNA polymerases, the N-terminal and internal domains are separated by a few amino acids, usually four, and the pattern DNA_POLYMERASE_A is always present towards the C terminus. Several eukaryotic structure-dependent endonucleases and exonucleases have the 53EXO domains separated by 24 to 27 amino acids, and the XPG protein signatures are always present. In several proteins from herpesviridae, the two 53EXO domains are separated by 50 to 120 amino acids; these proteins are implicated in the inhibition of host gene expression. Eukaryotic DNA repair proteins with 600 to 700 amino acids between the 53EXO domains all carry the XPG protein signatures.
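
The spacing rules above amount to a lookup on the number of residues separating the two 53EXO domains. The following sketch encodes them. The names `SPACING_RULES` and `classify_53exo_spacing` are hypothetical, and the text's "a few amino acids, usually four" is encoded as 1 to 10 residues; this range is an assumption because the text gives no upper bound. Spacing is only part of the diagnosis: the text also relies on accompanying motifs such as DNA_POLYMERASE_A and the XPG signatures, which this sketch does not check.

```python
from typing import Optional

# (min gap, max gap, family) for the number of residues separating the two
# 53EXO domains, as stated in the text. The 1-10 range for "a few, usually
# four" is an assumption.
SPACING_RULES = [
    (1, 10, "eubacterial type A DNA polymerase"),
    (24, 27, "eukaryotic structure-dependent endo-/exonuclease"),
    (50, 120, "herpesviridae protein"),
    (600, 700, "eukaryotic DNA repair protein"),
]


def classify_53exo_spacing(gap: int) -> Optional[str]:
    """Return the family suggested by the inter-domain gap, or None."""
    for low, high, family in SPACING_RULES:
        if low <= gap <= high:
            return family
    return None


for gap in (4, 25, 80, 650, 300):
    print(gap, classify_53exo_spacing(gap))
```

A gap that falls between the ranges, such as 300 residues, returns `None`. The text does not associate these intermediate spacings with any family.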

    \ 7416 IPR011448 \

    This is a domain that occurs in 1-2 copies in a family of proteins identified in Leptospira interrogans. The function of the proteins is not known.

    \ 5024 IPR004217 \ This family of proteins contains a putative zinc-binding domain with four conserved cysteine residues. This domain is found in the human disease protein Deafness Dystonia Protein 1. Members of this family, such as Tim9 and Tim10, are involved in mitochondrial protein import PUBMED:11101512 and seem to be localised to the mitochondrial intermembrane space PUBMED:8663351.\ 7834 IPR010017 \

    Methyl transfer from the ubiquitous S-adenosyl-L-methionine (AdoMet) to nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms ranging from bacteria to plants and mammals. The reaction is catalyzed by methyltransferases (Mtases) and modifies DNA, RNA, proteins and small molecules, such as catechol, for regulatory purposes. The various aspects of the role of DNA methylation in prokaryotic restriction-modification systems and in a number of cellular processes in eukaryotes, including gene regulation and differentiation, are well documented.

    \ \

    Three classes of DNA Mtases transfer the methyl group from AdoMet to the target base to form either N-6-methyladenine, N-4-methylcytosine or C-5-methylcytosine. In C-5-cytosine Mtases, ten conserved motifs are arranged in the same order PUBMED:8127644. Motif I (a glycine-rich or closely related consensus sequence; FAGxGG in M.HhaI PUBMED:8343957), shared by other AdoMet-Mtases PUBMED:2684970, is part of the cofactor binding site, and motif IV (PCQ) is part of the catalytic site. In contrast, sequence comparison among N-6-adenine and N-4-cytosine Mtases revealed two conserved segments PUBMED:2690010, although more conserved segments may be present. One of them corresponds to motif I in C-5-cytosine Mtases, and the other is named (D/N/S)PP(Y/F). Crystal structures are known for a number of Mtases PUBMED:7607476, PUBMED:8343957, PUBMED:8127644, PUBMED:7971991. The cofactor binding sites are almost identical and the essential catalytic amino acids coincide. The comparable protein folding and the existence of equivalent amino acids in similar secondary and tertiary positions indicate that many (if not all) AdoMet-Mtases have a common catalytic domain structure. This permits tertiary structure prediction of other DNA, RNA, protein, and small-molecule AdoMet-Mtases from their amino acid sequences PUBMED:7897657.
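
The sequence motifs named above can be searched for with regular expressions, as in the following sketch. It is illustrative only. `MOTIFS` and `scan_motifs` are hypothetical names, and motif I is represented solely by its M.HhaI form, FAGxGG, although the text notes that motif I is more generally a glycine-rich or closely related consensus. An exact-match search is also far less sensitive than the profile-based methods normally used to detect such motifs. The example sequence is made up.

```python
import re

# Motif patterns quoted in the text, written as Python regular expressions
# ("x" = any residue; "(D/N/S)" = one of D, N or S).
MOTIFS = {
    "motif I": r"FAG.GG",  # M.HhaI form of the cofactor-binding motif
    "motif IV": r"PCQ",  # catalytic-site motif of C-5-cytosine Mtases
    "(D/N/S)PP(Y/F)": r"[DNS]PP[YF]",  # N-6-adenine / N-4-cytosine Mtases
}


def scan_motifs(sequence: str) -> dict[str, list[int]]:
    """Return the 0-based start position of every match of each motif."""
    return {
        name: [m.start() for m in re.finditer(pattern, sequence)]
        for name, pattern in MOTIFS.items()
    }


print(scan_motifs("MKFAGIGGLLPCQAANPPYW"))
# {'motif I': [2], 'motif IV': [10], '(D/N/S)PP(Y/F)': [15]}
```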

    \ \

    This is a set of proteobacterial proteins which have homology in their central region to a large number of methyltransferases active on a variety of substrates.

    \ \ 2071 IPR007296 \

    This is a domain of unknown function. It sometimes occurs singly, or as the C-terminal domain in combination with two other domains of unknown function: DUF404 and DUF407.

    \ 621 IPR001258 \

    The NHL (NCL-1, HT2A and LIN-41) repeat is found in a variety of enzymes of the copper type II, ascorbate-dependent monooxygenase family, which catalyse the C-terminal alpha-amidation of biological peptides PUBMED:1894599. The repeat also occurs in a human zinc finger protein that specifically interacts with the activation domain of lentiviral Tat proteins PUBMED:7778269. The repeat domain is often associated with RING finger and B-box motifs PUBMED:9868369.

    \ 4736 IPR002904 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435. Class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet flanked by alpha-helices PUBMED:8364025 and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
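
    The class assignments listed above amount to a small lookup table. The sketch below encodes them in Python. The names CLASS_I, CLASS_II, ATTACHMENT_SITE and synthetase_classes are hypothetical. Lysine is reported as belonging to both classes because class I lysyl-tRNA synthetases also exist in archaea and some eubacteria. The 2'/3'-hydroxyl mapping reflects only the general rule stated above, not every individual enzyme.

```python
# Amino acids (three-letter codes) whose synthetases fall in each class, as listed in the text.
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# General rule from the text: preferred tRNA ribose hydroxyl for aminoacyl attachment.
ATTACHMENT_SITE = {"I": "2'-OH", "II": "3'-OH"}

def synthetase_classes(aa: str) -> set[str]:
    """Return the synthetase class(es) for an amino acid given as a three-letter code."""
    classes = set()
    if aa in CLASS_I:
        classes.add("I")
    if aa in CLASS_II:
        classes.add("II")
    if aa == "Lys":
        # Class Ic lysyl-tRNA synthetases occur in archaea and some eubacteria.
        classes.add("I")
    if not classes:
        raise KeyError(f"unknown amino acid code: {aa!r}")
    return classes

for aa in ("Trp", "Ser", "Lys"):
    cls = sorted(synthetase_classes(aa))
    print(aa, cls, [ATTACHMENT_SITE[c] for c in cls])
```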

    \ \

    The 10 class I synthetases are considered to share a common catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    Lysyl-tRNA synthetase is an alpha2 homodimer. Unusually, lysyl-tRNA synthetases belong to both class I and class II. In eubacteria and eukaryota, lysyl-tRNA synthetases belong to class II, in the same family as aspartyl-tRNA synthetase. The class Ic lysyl-tRNA synthetase family is present in archaea and some eubacteria PUBMED:9353192. Moreover, some eubacteria carry a gene X, which is similar to part of the class II lysyl-tRNA synthetase. Lysyl-tRNA synthetase is duplicated in some species. In E. coli, for example, there is a constitutive gene (lysS) and an inducible one (lysU). A refined crystal structure shows that the active site of lysU is shaped to position the substrates for the nucleophilic attack of the lysine carboxylate on the ATP alpha-phosphate. No residues are directly involved in catalysis, but a number of highly conserved amino acids and three metal ions coordinate the substrates and stabilise the pentavalent transition state. A loop close to the catalytic pocket, disordered in the lysine-bound structure, becomes ordered upon adenine binding PUBMED:10913247.

    \ 464 IPR003107 \

    The HAT (Half A TPR) repeat has a repetitive pattern characterised by three aromatic residues with conserved spacing. HAT repeats are similar to TPRs (tetratricopeptide repeats) in both structure and sequence, though they lack the highly conserved alanine and glycine residues found in TPRs. Different proteins contain between 9 and 12 HAT repeats. HAT-repeat-containing proteins appear to be components of macromolecular complexes that are required for RNA processing PUBMED:9478129. The repeats may be involved in protein-protein interactions. The HAT motif is strikingly similar in structure to HEAT repeats: it is of similar length and consists of two short helices connected by a loop.

    \ \ \ \ 659 IPR002058 \

    These PAP/25A-associated domains are found in uncharacterised eukaryotic proteins. A number of these proteins are described as 'topoisomerase 1-related', though they appear to have little or no homology to topoisomerase 1. The signatures that define this group of sequences often occur towards the C-terminus, after the PAP/25A core domain.

    \ 7697 IPR012868 \

    The proteins in this entry have not been characterised.

    \ 3054 IPR000710 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They exhibit a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine peptidases (denoted S1 - S27) have been identified. They are grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE). These appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans share a catalytic triad of serine, aspartate and histidine: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
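
    The linear order of the triad is simply the N-to-C order of the three catalytic residues, so it can be derived by sorting residue positions. The sketch below assumes residue numbers are already known. As input it uses the familiar chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221) numbering. The helper names triad_order and CLAN_BY_ORDER are hypothetical, and only the three clans mentioned above are covered. The order is a coarse clue that "commonly" reflects the clan, not a classification method.

```python
# Triad order (N- to C-terminal) for the clans named in the text.
CLAN_BY_ORDER = {
    "HDS": "SA (chymotrypsin)",
    "DHS": "SB (subtilisin)",
    "SDH": "SC (carboxypeptidase)",
}

def triad_order(positions: dict[str, int]) -> str:
    """Return one-letter codes of the catalytic residues sorted by sequence position."""
    return "".join(res for res, _ in sorted(positions.items(), key=lambda item: item[1]))

chymotrypsin = {"S": 195, "H": 57, "D": 102}
subtilisin = {"S": 221, "D": 32, "H": 64}
for name, triad in (("chymotrypsin", chymotrypsin), ("subtilisin", subtilisin)):
    order = triad_order(triad)
    print(name, order, CLAN_BY_ORDER.get(order, "unassigned"))
```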

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
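
    In the naming scheme described above, the first character of a family identifier encodes the catalytic type, so it can be decoded mechanically. The sketch below is a hypothetical helper; describe_family is not part of any MEROPS software. It maps that first character to the catalytic type and to the kind of nucleophile given in the paragraph above. The letter P is used only for clans containing families of more than one type, so it is reported as mixed.

```python
# Catalytic-type letters as listed in the text.
CATALYTIC_TYPE = {
    "A": "aspartic", "C": "cysteine", "G": "glutamic acid", "M": "metallo",
    "S": "serine", "T": "threonine", "U": "unknown", "P": "mixed (clan of several types)",
}
ACYL_INTERMEDIATE = {"S", "T", "C"}   # amino-acid nucleophile forming an acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}   # activated water molecule as the nucleophile

def describe_family(identifier: str) -> str:
    """Decode the catalytic type from a MEROPS-style family or clan identifier."""
    code = identifier.strip()[0].upper()
    kind = CATALYTIC_TYPE[code]
    if code in ACYL_INTERMEDIATE:
        nucleophile = "amino-acid nucleophile via an acyl intermediate"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not specified"
    return f"{identifier}: {kind}; nucleophile: {nucleophile}"

for ident in ("S6", "M26", "C40", "PA"):
    print(describe_family(ident))
```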

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S6 (clan PA(S)). The type example is the IgA1-specific serine endopeptidase from Neisseria gonorrhoeae PUBMED:7845208. These enzymes cleave prolyl bonds in the hinge regions of immunoglobulin A heavy chains. Similar specificity is shown by the unrelated M26 family of metalloendopeptidases.

    \ \ 2584 IPR001407 \

    Influenza RNA-dependent RNA polymerase is composed of three subunits: P1 (or PB1), P2 (or PA) and P3 (or PB2). The influenza virus PB1 protein contains two separate domains involved in the interaction with the PB2 and PA subunits PUBMED:9348094, PUBMED:8948635. PB1 has two GTP-binding sites.

    \ 8005 IPR012994 \

    This family contains a set of membrane proteins, typically 33 amino acids long. The family has no known function, but the protein is found in the cydAB operon of Escherichia coli. Members share a consensus motif (MWYFXW) that is rich in aromatic residues. The protein forms a single membrane-spanning helix. This family seems to be restricted to proteobacteria PUBMED:9068659.

    \ 331 IPR007706 \

    This family contains EBNA-3A, -3B, and -3C which are latent infection nuclear proteins important for Epstein-Barr virus (EBV)-induced B-cell immortalisation and the immune response to EBV infection.

    \ 7870 IPR012621 \

    This family consists of the TOM7 mitochondrial import receptors. TOM7 forms part of the translocase of the outer mitochondrial membrane (TOM) complex. It appears to modulate the dynamics of the mitochondrial protein transport machinery by promoting the dissociation of subunits of the outer membrane translocase PUBMED:9642296.

    \ 7776 IPR012499 \

    This family includes three peptides secreted by the spider Hadronyche versuta. These are insect-selective, excitatory neurotoxins that may function by antagonising muscle acetylcholine receptors, or acetylcholine receptor subtypes present in other invertebrate neurons PUBMED:10881200. Janus atracotoxin-Hv1c (J-ACTX-Hv1c) is organised into a disulphide-rich globular core (residues 3-19) and a beta-hairpin (residues 20-34). There are 4 disulphide bridges, one of which is a vicinal disulphide bridge. This bridge is known to be unimportant for maintaining the structure but critical for insecticidal activity PUBMED:10881200.

    \ 2106 IPR007383 \ This protein is predicted to be a membrane protein.\ 3402 IPR001339 \

    The mRNA capping enzyme in yeasts is composed of two separate chains: alpha, an mRNA guanylyltransferase, and beta, an RNA 5'-triphosphatase. X-ray crystallography reveals a large conformational change during guanyl transfer by mRNA capping enzymes PUBMED:9160746. Binding of the enzyme to nucleotides is specific to the GMP moiety of GTP. The viral mRNA capping enzyme is a monomer that transfers a GMP cap onto the end of mRNA that terminates with a 5'-diphosphate tail.

    \ 7495 IPR011654 \ This group of paralogous proteins identified in Mycoplasma penetrans includes hypothetical proteins of unknown function PUBMED:12466555.\ 6180 IPR009412 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 3699 IPR007735 \ This family consists of the C-terminal region of pecanex protein homologues. The pecanex gene is a maternal-effect neurogenic gene found in Drosophila PUBMED:1460533.\ 2510 IPR007202 \ These proteins contain a domain with four conserved cysteines that probably form an Fe-S redox cluster.\ 7294 IPR010903 \

    This family consists of several hypothetical glycine rich plant and bacterial proteins of around 300 residues in length. The function of this family is unknown.

    \ 2353 IPR002791 \

    This family of proteins has not been characterised.

    \ 518 IPR003131 \

    The N-terminal, cytoplasmic tetramerization domain (T1) of voltage-gated K+ channels encodes molecular determinants for subfamily-specific assembly of alpha-subunits into functional tetrameric channels PUBMED:9886290. This domain is found in a subset of a larger group of proteins that contain a BTB/POZ domain.

    \ 5799 IPR010286 \

    This family consists of several conserved hypothetical proteins from both eukaryotes and prokaryotes. The function of this family is unknown.

    \ 5898 IPR010340 \

    The large phosphorylated protein (UL32-like) of herpes viruses is the polypeptide most frequently reactive in immuno-blotting analyses with antisera when compared with other viral proteins PUBMED:2455019.

    \ 2097 IPR006340 \

    Members of this family are uncharacterized proteins of about 180 amino acids from the Bacillus/Clostridium group of Gram-positive bacteria, found in no more than one copy per genome.

    \ 527 IPR001909 \

    The Krueppel-associated box (KRAB) is a domain of around 75 amino acids that is found in the N-terminal part of about one third of eukaryotic Krueppel-type C2H2 zinc finger proteins (ZFPs) PUBMED:14519192. It is enriched in charged amino acids and can be divided into subregions A and B, which are predicted to fold into two amphipathic alpha-helices. The KRAB A and B boxes can be separated by variable spacer segments, and many KRAB proteins contain only the A box PUBMED:2023909.

    \

    The functions currently known for members of the KRAB-containing protein family include transcriptional repression of RNA polymerase I, II, and III promoters, binding and splicing of RNA, and control of nucleolus function. The KRAB domain functions as a transcriptional repressor when tethered to the template DNA by a DNA-binding domain. A sequence of 45 amino acids in the KRAB A subdomain has been shown to be necessary and sufficient for transcriptional repression. The B box does not repress by itself but does potentiate the repression exerted by the KRAB A subdomain PUBMED:8183939, PUBMED:8183940. Gene silencing requires the binding of the KRAB domain to the RING-B box-coiled coil (RBCC) domain of the KAP-1/TIF1-beta corepressor. As KAP-1 binds to the heterochromatin proteins HP1, it has been proposed that the KRAB-ZFP-bound target gene could be silenced following recruitment to heterochromatin PUBMED:10653693, PUBMED:10748030.

    \

    KRAB-ZFPs probably constitute the single largest class of transcription factors within the human genome PUBMED:10360839. Although the function of KRAB-ZFPs is largely unknown, they appear to play important roles during cell differentiation and development. The KRAB domain is generally encoded by two exons. The regions coded by the two exons are known as KRAB-A and KRAB-B.

    \ 2125 IPR007413 \ Some members of this family are thought to possess an ATP-binding domain towards their N terminus.\ 4507 IPR001734 \

    Sodium/substrate symport (or co-transport) is a widespread mechanism of solute transport across the cytoplasmic membranes of prokaryotic and eukaryotic cells. In this mechanism, the energy stored in an inwardly directed electrochemical sodium gradient (sodium motive force, SMF) is used to drive solute accumulation against a concentration gradient. The SMF is generated by primary sodium pumps (e.g. sodium/potassium ATPases, sodium-translocating respiratory chain complexes) or via the action of sodium/proton antiporters. Sodium/substrate transporters are grouped into different families based on sequence similarities PUBMED:1965458, PUBMED:8031825.

    \

    One of these families, known as the sodium:solute symporter family (SSSF), contains over a hundred members of prokaryotic and eukaryotic origin PUBMED:12354616. The average hydropathy plot for SSSF proteins predicts 11 to 15 putative transmembrane domains (TMs) in alpha-helical conformation. A secondary structure model of PutP from Escherichia coli suggests that the protein contains 13 TMs. In this model, the N-terminus is located on the periplasmic side of the membrane and the C-terminus faces the cytoplasm. These results support the idea of a common topological motif for members of the SSSF. Transporters with a C-terminal extension are proposed to have an additional 14th TM.
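
    Hydropathy plots like the one mentioned above are usually computed by averaging a per-residue hydrophobicity scale over a sliding window. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 mean-hydropathy cutoff, values commonly used to flag candidate transmembrane helices. These parameters, the function names and the toy sequence are assumptions for illustration. They are not the method used to derive the 11 to 15 TM estimate for the SSSF.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy per window; entry i covers residues i..i+window-1 (0-based)."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]

def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return 1-based (start, end) residue spans covered by runs of windows above threshold."""
    segments = []
    start = None
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h >= threshold and start is None:
            start = i                                    # first window of a hydrophobic run
        elif h < threshold and start is not None:
            segments.append((start + 1, i - 1 + window)) # last residue of the last good window
            start = None
    if start is not None:
        segments.append((start + 1, len(seq)))
    return segments

# Toy sequence: a 25-residue leucine block (residues 21-45) flanked by aspartates.
toy = "D" * 20 + "L" * 25 + "D" * 20
print(candidate_tm_segments(toy))  # prints [(16, 50)]
```

    The reported span (16-50) extends beyond the leucine block on both sides. Window averaging smears hydropathy in this way, so predicted TM boundaries are approximate.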

    \

    An ordered binding model of sodium/substrate transport suggests that sodium binds to the empty transporter first, inducing a conformational change that increases the affinity of the transporter for the solute. Formation of the ternary complex induces another structural change that exposes sodium and substrate to the other side of the membrane. Substrate and sodium are released, and the empty transporter re-orientates in the membrane, allowing the cycle to start again.
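
    The ordered binding model above is essentially a small state machine in which substrate can bind only after sodium. The sketch below encodes that cycle under several assumptions. The state and event names are hypothetical. The two sides of the membrane are labelled OUT and IN on the assumption that the symporter accumulates solute in the cytoplasm. Release of sodium and substrate is collapsed into a single step for brevity.

```python
from enum import Enum, auto

class State(Enum):
    EMPTY_OUT = auto()    # empty transporter, binding sites facing outside (assumed)
    NA_OUT = auto()       # sodium bound first; affinity for the solute raised
    TERNARY_OUT = auto()  # sodium + substrate bound (ternary complex)
    TERNARY_IN = auto()   # structural change exposes both ligands to the other side
    EMPTY_IN = auto()     # sodium and substrate released

# Allowed transitions of the ordered binding cycle.
TRANSITIONS = {
    (State.EMPTY_OUT, "bind_sodium"): State.NA_OUT,
    (State.NA_OUT, "bind_substrate"): State.TERNARY_OUT,
    (State.TERNARY_OUT, "reorient"): State.TERNARY_IN,
    (State.TERNARY_IN, "release"): State.EMPTY_IN,
    (State.EMPTY_IN, "reorient"): State.EMPTY_OUT,
}

def step(state: State, event: str) -> State:
    """Apply one event, rejecting any step the ordered model does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed in state {state.name}") from None

def run(events: list[str], state: State = State.EMPTY_OUT) -> State:
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state

cycle = ["bind_sodium", "bind_substrate", "reorient", "release", "reorient"]
print(run(cycle))
```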

    \ 4495 IPR002060 \ Squalene synthase (farnesyl-diphosphate farnesyltransferase) (SQS) and phytoene synthase (EC 2.5.1.-) (PSY) share a number of functional similarities. These similarities are also reflected at the level of their primary structure PUBMED:8294001, PUBMED:8474436, PUBMED:8250898. In particular, three well-conserved regions are shared by SQS and PSY; they could be involved in substrate binding and/or the catalytic mechanism. \

    SQS catalyzes the conversion of two molecules of farnesyl diphosphate (FPP) into squalene. This is the first committed step in the cholesterol biosynthetic pathway. The SQS reaction proceeds in two separate steps. First, the two molecules of FPP undergo a head-to-head condensation to form presqualene diphosphate. This intermediate is then rearranged in an NADPH-dependent reduction to form squalene. SQS is found in eukaryotes. In yeast it is encoded by the ERG9 gene, in mammals by the FDFT1 gene. SQS seems to be membrane-bound.

    \

    PSY catalyzes the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into phytoene. This is the second step in the biosynthesis of carotenoids from isopentenyl diphosphate. The PSY reaction proceeds in two separate steps. First, the two molecules of GGPP undergo a head-to-head condensation to form prephytoene diphosphate. This intermediate is then rearranged to form phytoene. PSY is found in all organisms that synthesize carotenoids: plants and photosynthetic bacteria, as well as some non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by the gene crtB. In plants PSY is localized in the chloroplast.

    \ 7407 IPR013044 \

    This domain is found in a small number of Chlamydia proteins of unknown function. It occurs together with .

    \ 7546 IPR012910 \

    The Plug domain has been shown to be an independently folding subunit of the TonB-dependent receptors PUBMED:15111112. It acts as the channel gate, blocking the pore until the channel is bound by a ligand. At this point it undergoes conformational changes and opens the channel.

    \ 2835 IPR005644 \

    This is a group of NolW-like proteins, which are closely related to bacterial type II and type III secretion system proteins.

    \ 2037 IPR007161 \

    This is a family of bacterial and archaeal proteins of unknown function.

    \ 2979 IPR000021 \ The hok/gef family proteins of Gram-negative bacteria are toxic to cells when over-expressed, killing the cells from within by interfering with a vital function in the cell membrane PUBMED:3070354. Some family members (flm) increase the stability of unstable RNA PUBMED:3070354. Others (pnd) induce the degradation of stable RNA at higher than optimum growth temperatures PUBMED:2465777, and still others affect the release of cellular magnesium by membrane alterations PUBMED:2465777. The proteins are short (50-70 residues). They consist of an N-terminal hydrophobic (possibly membrane-spanning) domain and a C-terminal periplasmic region, which contains the toxic domain. The C-terminal region contains a conserved cysteine residue that mediates homo-dimerisation in the gef protein, although dimerisation is not necessary for the toxic effect PUBMED:1943700.\ 178 IPR003892 \ This domain may be involved in binding ubiquitin-conjugating enzymes (UBCs). CUE domains also occur in two proteins of the IL-1 signal transduction pathway, tollip and TAB2.\ 89 IPR003344 \ Proteins that contain this domain are found in a variety of bacterial and phage surface proteins such as intimins. \ Intimin is a bacterial cell-adhesion molecule that mediates the intimate bacterial host-cell interaction. It contains three domains: two immunoglobulin-like domains and a C-type lectin-like module. This implies that carbohydrate recognition may be important in intimin-mediated cell adhesion PUBMED:10201396.\ 625 IPR000064 \

    The Escherichia coli NLPC/Listeria P60 domain occurs at the C terminus of a number of different bacterial and viral proteins. The viral proteins are either described as tail assembly proteins or Gp19. In bacteria, the proteins are variously described as being putative tail component of prophage, invasin, invasion associated protein, putative lipoprotein, cell wall hydrolase, or putative endopeptidase.

    \ \

    The Escherichia coli NLPC/Listeria P60 domain is contained within the boundaries of the cysteine peptidase domain that defines MEROPS peptidase family C40 (clan C-). Type examples are dipeptidyl-peptidase VI from Bacillus sphaericus and the gamma-glutamyl-diamino acid-endopeptidase precursor from Lactococcus lactis. This group also contains proteins classified as non-peptidase homologues. These proteins either have been found experimentally to lack peptidase activity, or lack amino acid residues believed to be essential for catalytic activity in the C40 family.\

    \ 972 IPR001943 \ During Escherichia coli nucleotide excision repair, DNA damage recognition and processing are achieved by the action of the uvrA, uvrB and uvrC gene products PUBMED:8466476. UvrB and UvrC share a common domain of around 35 amino acids, the so-called UVR domain. This domain in UvrB can interact with the homologous domain in UvrC through a putative coiled-coil structure. This interaction is important for the incision of the damaged strand PUBMED:8530482.\ 7340 IPR011091 \

    This family of proteins is restricted to the Actinobacteria and Proteobacteria. The function of these proteins is unknown.

    \ 4197 IPR000517 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.
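
    The elongation cycle described above (A-site entry, peptide transfer, translocation) can be mimicked by a toy model that tracks only which tRNA carries the growing chain. The sketch below is purely illustrative. The Ribosome class and its method names are hypothetical, and amino acids are plain strings. EF-Tu, EF-G and GTP appear only in comments, and all exit sites are collapsed into a single counter of released tRNAs.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Ribosome:
    p_site: List[str]                    # chain on the P-site tRNA, N- to C-terminus
    a_site: Optional[List[str]] = None   # chain on the A-site tRNA, if occupied
    released_trnas: int = 0              # deacylated tRNAs that have left the ribosome

    def deliver(self, aa: str) -> None:
        """Aminoacyl-tRNA enters the A site (delivered with EF-Tu and GTP)."""
        if self.a_site is not None:
            raise RuntimeError("A site is occupied")
        self.a_site = [aa]

    def transfer(self) -> None:
        """Move the P-site chain onto the A-site aminoacyl-tRNA, extending it by one residue."""
        if self.a_site is None or len(self.a_site) != 1:
            raise RuntimeError("A site must hold a single aminoacyl-tRNA")
        self.a_site = self.p_site + self.a_site
        self.p_site = []  # the P-site tRNA is now deacylated

    def translocate(self) -> None:
        """Shift the new peptidyl-tRNA to the P site (EF-G and GTP) and release the old tRNA."""
        if self.a_site is None or self.p_site:
            raise RuntimeError("peptide transfer must happen before translocation")
        self.p_site, self.a_site = self.a_site, None
        self.released_trnas += 1

ribosome = Ribosome(p_site=["Met"])
for aa in ("Gly", "Ser", "Lys"):
    ribosome.deliver(aa)
    ribosome.transfer()
    ribosome.translocate()
print(ribosome.p_site, ribosome.released_trnas)
```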

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. On the basis of sequence similarities PUBMED:1549461, L30 belongs to a ribosomal protein family that groups bacterial and archaeal L30, yeast mitochondrial L33, and Drosophila melanogaster, Dictyostelium discoideum, fungal and mammalian L7 ribosomal proteins. Bacterial L30 proteins are small, about 60 residues. Archaeal L30 proteins are about 150 residues, and eukaryotic L7 proteins are about 250 to 270 residues.

    \ \ 6004 IPR009330 \

    This family consists of several bacterial lipopolysaccharide core biosynthesis proteins (WaaY or RfaY). The waaY, waaQ, and waaP genes are located in the central operon of the waa (formerly rfa) locus on the chromosome of Escherichia coli. This locus contains genes whose products are involved in the assembly of the core region of the lipopolysaccharide molecule. WaaY is the enzyme that phosphorylates HepII in this system PUBMED:9756860.

    \ 6974 IPR010792 \

    This family consists of several hypothetical bacterial proteins, which seem to be specific to Chlamydia pneumoniae. Members of this family are typically around 400 residues in length. The function of this family is unknown.

    \ 4567 IPR004823 \ The TATA box binding protein associated factor (TAF) is part of the transcription initiation factor TFIID multimeric protein complex. TFIID plays a central role in mediating promoter responses to various activators and repressors. It binds tightly to TAFII-250 and directly interacts with TAFII-40. TFIID is composed of TATA binding protein (TBP) and a number of TBP-associated factors (TAFs). TAF proteins adopt a histone-like fold.\ 3235 IPR003758 \

    Tetraacyldisaccharide 4'-kinase phosphorylates the 4'-position of a tetraacyldisaccharide 1-phosphate precursor (DS-1-P) of lipid A. The enzyme has not yet been purified because of its instability PUBMED:9575203. This enzyme is involved in the synthesis of the lipid A portion of the bacterial lipopolysaccharide (LPS) layer.

    \ \ 4184 IPR002671 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L22e forms part of the 60S ribosomal subunit PUBMED:1840484. This family is found in eukaryotes. Rat L22 is related to ribosomal proteins from other eukaryotes and is identical in amino acid sequence to human EAP, the EBER 1 (Epstein-Barr virus encoded RNA) associated protein PUBMED:7999786.

    \ 5406 IPR000280 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They exhibit a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine peptidases (denoted S1 - S27) have been identified. They are grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE). These appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans share a catalytic triad of serine, aspartate and histidine: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S31 (clan PA(S)).

    The type example is the pestivirus NS3 polyprotein peptidase from bovine viral diarrhea virus, a type 1 pestivirus. The pestiviruses are single-stranded RNA viruses whose genomes encode one large polyprotein PUBMED:7845208. The p80 endopeptidase resides towards the middle of the polyprotein and is responsible for processing all non-structural pestivirus proteins PUBMED:7845208, PUBMED:1651596. The p80 enzyme is similar to other proteases in the PA(S) clan and is predicted to have a fold similar to that of chymotrypsin PUBMED:7845208, PUBMED:2548336. An HDS catalytic triad has been identified PUBMED:2548336.

    \ 2934 IPR005029 \

    The herpes simplex virus type 1 gene UL47 encodes the tegument proteins referred to collectively as VP13/14, which are believed to be differentially modified forms of the same protein. These proteins have been shown to target to the nucleus. The function of this family is unknown, but it contains a number of Herpesviridae proteins.

    \ 4147 IPR000198 \ Members of the Rho family of small G proteins transduce signals from plasma-membrane receptors and control cell adhesion, motility and shape by actin cytoskeleton formation. Like all other GTPases, Rho proteins act as molecular switches, with an active GTP-bound form and an inactive GDP-bound form. The active conformation is promoted by guanine-nucleotide exchange factors. The inactive state is promoted by GTPase-activating proteins (GAPs), which stimulate the intrinsic GTPase activity of small G proteins. This entry represents a Rho/Rac/Cdc42-like GAP domain that is found in a wide variety of large, multi-functional proteins PUBMED:9009196. A number of structures are known for this family PUBMED:9009196, PUBMED:8962058, PUBMED:9262406. The domain is composed of seven alpha helices. This domain is also known as the breakpoint cluster region-homology (BH) domain.\ 6601 IPR009616 \

    This entry represents the N-terminal region of several mammal-specific Bim proteins. The Bim protein is one of the BH3-only proteins, members of the Bcl-2 family that have only one of the Bcl-2 homology regions, BH3. BH3-only proteins are essential initiators of apoptotic cell death PUBMED:11734221.

    \ 7819 IPR012642 \

    Proteins containing the Wos2 domain are involved in the regulation of the cell cycle PUBMED:10581266. They are related to the human p23 family PUBMED:10581266.

    \ 4059 IPR004896 \ This protein is required for high-level transcription of the PUC operon. It is an integral membrane protein. The family includes other proteins from Rhodobacter, e.g. bacteriochlorophyll synthase.\ 4639 IPR000652 \

    Triosephosphate isomerase (TIM) PUBMED:2204417 is the glycolytic enzyme that catalyzes the reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several metabolic pathways and is essential for efficient energy production. It is a dimer of identical subunits, each made up of about 250 amino acid residues. A glutamic acid residue is involved in the catalytic mechanism PUBMED:2005961, and the sequence around this active site residue is perfectly conserved in all known TIMs. Deficiencies in TIM are associated with haemolytic anaemia coupled with a progressive, severe neurological disorder PUBMED:12023819.
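    Written as a chemical equation, the reversible interconversion catalysed by TIM is:

```latex
\[
  \text{D-glyceraldehyde 3-phosphate} \;\rightleftharpoons\; \text{dihydroxyacetone phosphate}
\]
```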

    \ 4738 IPR000824 \

    The mtr operon encodes a presumed RNA-binding regulatory protein that is required for attenuation control of the trp operon PUBMED:2123343. The operon consists of two structural genes, mtrA and mtrB, predicted to encode 22 kDa and 6 kDa polypeptides, respectively PUBMED:1551827. MtrB is similar to RegA, an RNA-binding regulatory protein of bacteriophage T4. Both mtrA and mtrB have been shown to be necessary for regulation of beta-galactosidase production. The crystal structure of the Trp RNA-binding attenuation protein (TRAP) of Bacillus subtilis has been solved PUBMED:7525975, PUBMED:7715723. The protein is an ondecamer of 7-stranded beta-sandwiches. The 11 subunits are stabilised by 11 inter-subunit strands, forming a beta-wheel with a large central hole. The binding of L-tryptophan in clefts between adjacent strands induces conformational changes in the protein. It is possible that, on binding, the mRNA target forms a matching circle in which 11 U/GAG repeats are bound to the surface of the tryptophan-bound protein ondecamer PUBMED:7715723.
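    The RNA target described above carries 11 U/GAG repeats, one for each TRAP subunit. The minimal sketch below counts non-overlapping GAG/UAG triplets in an RNA string and checks whether there is at least one per subunit. It is an illustration only: the function names and the toy sequence are hypothetical, and the sketch ignores the spacing between repeats and everything else that governs real TRAP binding.

```python
import re

# One repeat unit: G or U followed by "AG", i.e. GAG or UAG.
REPEAT_UNIT = re.compile(r"[GU]AG")


def count_uag_gag_repeats(rna):
    """Count non-overlapping GAG/UAG triplets in an RNA string (hypothetical helper)."""
    return len(REPEAT_UNIT.findall(rna.upper()))


def has_repeat_per_subunit(rna, subunits=11):
    """True if there is at least one GAG/UAG triplet for each TRAP subunit."""
    return count_uag_gag_repeats(rna) >= subunits


toy_leader = "GAGUU" * 11  # made-up sequence, for illustration only
print(count_uag_gag_repeats(toy_leader))   # 11
print(has_repeat_per_subunit(toy_leader))  # True
```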

    \ \ 1794 IPR004615 \

    DNA-directed DNA polymerase catalyzes DNA-template-directed extension of the 3'-end of a DNA strand by one nucleotide at a time. DNA polymerase III is a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria. The enzyme also has 3' to 5' exonuclease activity. Its core, composed of the alpha, epsilon and theta chains, associates with a tau subunit that allows the core to dimerise, forming the PolIII' complex. PolIII' associates with the gamma complex (gamma, delta, delta', psi and chi chains) and with the beta chain. This family represents the psi subunit of the DNA polymerase III holoenzyme in Escherichia coli and related species, a small subunit whose exact function is not known. It appears to have a narrow taxonomic distribution, being restricted to the gammaproteobacteria.
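    The subunit organisation described above can be summarised as a small lookup table. This is an illustrative sketch only: the dictionary layout and the `subassembly_of` helper are hypothetical, and subunit stoichiometry is not modelled.

```python
# Sub-assemblies of the E. coli DNA polymerase III holoenzyme, as listed in the text.
# The grouping is illustrative; copy numbers of each subunit are not represented.
HOLOENZYME = {
    "core": ("alpha", "epsilon", "theta"),
    "tau": ("tau",),  # allows core dimerisation, giving the PolIII' complex
    "gamma complex": ("gamma", "delta", "delta'", "psi", "chi"),
    "beta": ("beta",),
}


def subassembly_of(subunit):
    """Return the name of the sub-assembly that contains the given subunit."""
    for name, members in HOLOENZYME.items():
        if subunit in members:
            return name
    raise KeyError(f"unknown subunit: {subunit}")


print(subassembly_of("psi"))  # gamma complex
```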

    \ 5559 IPR008858 \

    The TROVE (Telomerase, Ro and Vault) domain is a module of ~300-500 residues that is found in TEP1 and Ro60, the protein components of three ribonucleoprotein particles (RNPs). The TROVE domain is also found in bacterial ribonucleoproteins, suggesting an ancient origin of these ribonucleoproteins. The TROVE domain can be found associated with other domains, such as the VWFA domain, the TEP1 N-terminal domain, the NACHT-NTPase domain, and WD-40 repeats. It may be involved in binding the RNA components of the three RNPs, which are telomerase RNA, Y RNA and vault RNA PUBMED:14563212.

    \ \

    The TROVE domain contains a few absolutely conserved residues. As none of these conserved residues is of the polar type typically found in active sites, it seems unlikely that this region has an enzymatic function PUBMED:14563212.

    \ 5109 IPR007946 \

    This family consists of several eukaryotic AAR2-like proteins. The Saccharomyces cerevisiae protein AAR2 is involved in splicing the pre-mRNA of the a1 cistron and of other genes that are important for cell growth PUBMED:1922071.

    \ 6277 IPR009455 \

    The domain is found exclusively in plant mitochondria and is a putative homing endonuclease, though such a function remains to be demonstrated. The domain is found C-terminal to the plant mitochondrial ATPase subunit 8 domain.

    \ 1893 IPR003745 \

    This entry describes proteins of unknown function.

    \ 3210 IPR004890 \

    This domain is found along with a central domain in a group of Mycoplasma lipoproteins of unknown function.

    \ 4563 IPR003133 \ This domain of large T antigen binds to the SV40 origin of DNA replication PUBMED:8946857.\ 1588 IPR001977 \

    This family contains dephospho-CoA kinases, which catalyse the ATP-dependent phosphorylation of the 3'-hydroxyl group of dephosphocoenzyme A to form coenzyme A.
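    As a reaction, the ATP-dependent phosphorylation described above is:

```latex
\[
  \text{ATP} + \text{dephospho-CoA} \longrightarrow \text{ADP} + \text{CoA}
\]
```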

    \ 1841 IPR008203 \ This family includes short archaeal proteins of unknown function. Archaeoglobus fulgidus has twelve copies of this protein, several of which are clustered together in the genome.\ 3248 IPR000362 \

    A number of lyase-class enzymes for which fumarate is a substrate have been shown to share a short conserved sequence around a methionine that is probably involved in their catalytic activity PUBMED:3282546, PUBMED:. Examples of members of this family include the following:

    \ \ \
  • P32427 (PCAB_PSEPU): 3-carboxy-cis,cis-muconate lactonizing enzyme (3-carboxy-cis,cis-muconate cycloisomerase), an enzyme involved in the catabolism of aromatic acids PUBMED:1390752.
  • P24057 (CRD1_ANAPL): Delta-crystallin, which shares around 90% sequence identity with argininosuccinate lyase, showing that it is an example of a 'hijacked' enzyme; accumulated mutations have, however, rendered the protein enzymatically inactive.
  • P05042 (FUMC_ECOLI): Class II fumarase (fumarate hydratase), which catalyzes the reversible hydration of fumarate to L-malate. Class II fumarases are thermostable and tetrameric, and are found in prokaryotes (for example, Escherichia coli fumC) as well as in eukaryotes.
  • P04424 (ARLY_HUMAN): Argininosuccinase (argininosuccinate lyase), which catalyzes the formation of arginine and fumarate from argininosuccinate, the last step in the biosynthesis of arginine.
  • P04422 (ASPA_ECOLI): Aspartate ammonia-lyase (aspartase), which catalyzes the reversible conversion of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, except that ammonia rather than water is involved in the trans-elimination reaction.
  • P00923 (FUMA_ECOLI): Class I fumarase. Class I fumarases are thermolabile dimeric enzymes (for example, Escherichia coli fumA and fumB). The sequences of the two classes of fumarases are not closely related.
  • P25739 (PUR8_ECOLI): Adenylosuccinase (adenylosuccinate lyase) PUBMED:1574589, which catalyzes the eighth step in the de novo biosynthesis of purines, the formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from 1-(5-phosphoribosyl)-4-(N-succinocarboxamide)-5-aminoimidazole. The same enzyme also catalyzes the formation of fumarate and AMP from adenylosuccinate.
    \ 5582 IPR008552 \ This short presumed domain is found in a large number of hypothetical plant proteins. The domain is quite rich in conserved glycine residues. It occurs in some putative transposons but currently has no known function.\ 245 IPR004320 \ This family represents a number of Arabidopsis thaliana proteins of unknown function.\ 1602 IPR000290 \ This family includes bacterial colicin and pyocin immunity proteins PUBMED:8692833, PUBMED:8755730. These immunity proteins can bind specifically to the DNase-type colicins and pyocins and inhibit their bactericidal activity. The 1.8-angstrom crystal structure of the ImmE7 protein consists of four antiparallel alpha-helices PUBMED:8692833. Sequence similarities between colicins E2, A and E1 PUBMED:3936034 are less striking. The colicin E2 (pyocin) immunity protein does not share similarity with either the colicin E3 or the cloacin DF13 PUBMED:6253914 immunity proteins. The immunity protein protects a cell that harbours the plasmid ColE2, which encodes colicin E2, against colicin E2; it is thus essential both for autonomous replication and for colicin E2 immunity PUBMED:3892228.\ 4452 IPR000928 \

    SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes, which are proposed to account for the specificity of membrane fusion and to directly execute fusion by forming a tight complex (the SNARE or core complex) that brings the synaptic vesicle and plasma membranes together. The SNAREs constitute a large family of proteins that are characterized by 60-residue sequences known as SNARE motifs, which have a high propensity to form coiled coils and often precede carboxy-terminal transmembrane regions. The synaptic core complex is formed by four SNARE motifs (two from SNAP25 and one each from synaptobrevin and syntaxin 1) that are unstructured in isolation but form a parallel four-helix bundle on assembly. The crystal structure of the core complex revealed that the helix bundle is highly twisted and contains several salt bridges on the surface, as well as layers of interior hydrophobic residues. However, a polar layer in the centre of the complex is formed by three glutamines (two from SNAP25 and one from syntaxin 1) and one arginine (from synaptobrevin) PUBMED:12154365.

    \

    Members of the SNAP-25 family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment PUBMED:8226991.

    \ 7740 IPR012457 \

    The members of this family are hypothetical proteins expressed by Trypanosoma cruzi, a eukaryotic parasite that causes Chagas disease in humans. This region is found in multiple copies per protein.

    \ 1688 IPR003712 \ Some bacteria can overcome the toxicity of environmental cyanate by decomposing it. This reaction is catalyzed by cyanate lyase (also known as cyanase) PUBMED:3049588. Cyanate lyase is found in bacteria and plants and catalyzes the reaction of cyanate with bicarbonate to produce ammonia and carbon dioxide. \
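    Balancing the overall reaction described above (cyanate and bicarbonate converted to ammonia and carbon dioxide) gives the stoichiometry below. It is written as a net equation only and does not show any reaction intermediate.

```latex
\[
  \text{NCO}^{-} + \text{HCO}_{3}^{-} + 2\,\text{H}^{+} \longrightarrow \text{NH}_{3} + 2\,\text{CO}_{2}
\]
```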

    The cyanate lyase monomer is composed of two domains. The N-terminal domain shows structural similarity to the DNA-binding alpha-helix bundle motif. The C-terminal domain has an 'open fold' with no structural homology to other proteins. The dimer structure reveals the C-terminal domains to be intertwined, and the decamer is formed by a pentamer of these dimers. The active site of the enzyme is located between dimers and comprises residues from four adjacent subunits of the homodecamer PUBMED:10801492.

    \ 1616 IPR013125 \

    The connexins are a family of integral membrane proteins that oligomerise to form intercellular channels that are clustered at gap junctions. These channels are specialised sites of cell-cell contact that allow the passage of ions, intracellular metabolites and messenger molecules (with molecular weights below about 1-2 kDa) from the cytoplasm of one cell to its opposing neighbours. They are found in almost all vertebrate cell types, and somewhat similar proteins have been cloned from plant species. Invertebrates use a different family of molecules, the innexins, which share a predicted secondary structure similar to that of the vertebrate connexins but have no sequence identity to them PUBMED:9769729.

    \ \

    Vertebrate gap junction channels are thought to participate in diverse biological functions. For instance, in the heart they permit the rapid cell-cell transfer of action potentials, ensuring coordinated contraction of the cardiomyocytes. They are also responsible for neurotransmission at specialised 'electrical' synapses. In non-excitable tissues, such as the liver, they may allow metabolic cooperation between cells. In the brain, glial cells are extensively coupled by gap junctions; this allows waves of intracellular Ca2+ to propagate through nervous tissue, and may contribute to the ability of glia to spatially buffer local changes in extracellular K+ concentration PUBMED:7685944.

    \ \

    The connexin protein family is encoded by at least 13 genes in rodents, with many homologues cloned from other species. Connexins show overlapping tissue expression patterns, with most tissues expressing more than one connexin type. Their conductance, permeability to different molecules, phosphorylation, and the voltage dependence of their gating have been found to vary. Possible communication diversity is increased further by the fact that gap junctions may be formed by the association of different connexin isoforms from apposing cells. However, in vitro studies have shown that not all possible combinations of connexins produce active channels PUBMED:8811187, PUBMED:8608591.

    \ \

    Hydropathy analysis predicts that all cloned connexins share a common transmembrane (TM) topology. Each connexin is thought to contain 4 TM domains, with two extracellular and three cytoplasmic regions. This model has been validated for several of the family members by in vitro biochemical analysis. Both N- and C-termini are thought to face the cytoplasm, and the third TM domain has an amphipathic character, suggesting that it contributes to the lining of the formed channel. Amino acid sequence identity between the isoforms is ~50-80%, with the TM domains being well conserved. Both extracellular loops contain characteristically conserved cysteine residues, which likely form intramolecular disulphide bonds. By contrast, the single putative intracellular loop (between TM domains 2 and 3) and the cytoplasmic C-terminus are highly variable among the family members. Six connexins are thought to associate to form a hemi-channel, or connexon. Two connexons then interact (likely via the extracellular loops of their connexins) to form the complete gap junction channel.
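    The topology argument above, that four TM segments with both termini in the cytoplasm give two extracellular and three cytoplasmic regions, can be checked with a short sketch. The function names are hypothetical, and the model assumes that each TM segment crosses the membrane exactly once, so the side alternates after every segment.

```python
def membrane_regions(n_tm, n_terminus="cytoplasmic"):
    """Label the regions of a polytopic membrane protein from N- to C-terminus.

    Each TM segment is assumed to cross the membrane once, so the
    side of the chain flips after every TM segment.
    """
    opposite = {"cytoplasmic": "extracellular", "extracellular": "cytoplasmic"}
    side = n_terminus
    regions = [("N-terminus", side)]
    for i in range(1, n_tm + 1):
        regions.append((f"TM{i}", "membrane"))
        side = opposite[side]
        label = "C-terminus" if i == n_tm else f"loop TM{i}-TM{i + 1}"
        regions.append((label, side))
    return regions


def count_by_side(regions):
    """Count how many regions lie on each side of (or within) the membrane."""
    counts = {}
    for _, side in regions:
        counts[side] = counts.get(side, 0) + 1
    return counts


connexin = membrane_regions(4)
print(count_by_side(connexin))
# {'cytoplasmic': 3, 'membrane': 4, 'extracellular': 2}
```

The loop between TM2 and TM3 comes out on the cytoplasmic side, matching the single putative intracellular loop described in the text.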

    \ \
     \
           NH2-***        ***        *************-COOH\
                 **     **   **      **\
                 **    **     **    **   Cytoplasmic\
              ---**----**-----**----**----------------\
                 **    **     **    **   Membrane\
                 **    **     **    **\
              ---**----**-----**----**----------------\
                 **    **     **    **   Extracellular\
                  **  **       **  **\
                    **           **\
    
    \ \

    Gap junction alpha-8 protein (also called connexin50, Cx50, or lens fibre protein MP70) is a connexin of ~431 amino acid residues. The chicken isoform is shorter (399 residues) and is hence known as Cx45.6. Cx50 and Cx46 are the two gap junction proteins normally found in lens fibre cells of the eye. Evidence from both genetically engineered mice and the identification of mutations in the human Cx50-encoding gene highlights the importance of this connexin in maintaining lens transparency. Deletion of the mouse Cx50 gene produces a viable phenotype, but these animals start to develop cataracts (of the zonular pulverulent type) at about one week old. They also have abnormally small eyes and lenses. Similarly, mutations in the human gene encoding Cx50 have been associated with congenital cataracts. Affected individuals develop cataracts with zonular pulverulent opacities, and analysis shows that they carry a single point mutation in the Cx50 coding region, resulting in a non-conservative substitution of serine for proline in the second putative TM domain.

    \ \

    This domain is found in the C-terminal region of these proteins.

    \ 1797 IPR004149 \ DNA ligases catalyse the crucial step of joining the breaks in duplex DNA during DNA replication, repair and recombination, utilizing either ATP or NAD(+) as a cofactor PUBMED:10698952. This domain is a small zinc-binding motif that presumably binds DNA. It is found only in NAD-dependent DNA ligases.\ 7477 IPR011424 \

    This short domain is rich in cysteines and histidines. The pattern of conservation is similar to that found in .

    \ 31 IPR000674 \ Aldehyde oxidase catalyzes the conversion of an aldehyde, in the presence of oxygen and water, to an acid and hydrogen peroxide. The enzyme is a homodimer and requires FAD, molybdenum and two 2Fe-2S clusters as cofactors. Xanthine dehydrogenase catalyzes the oxidation of xanthine to urate and also requires FAD, molybdenum and two 2Fe-2S clusters as cofactors. This activity is often found in a bifunctional enzyme that also has xanthine oxidase activity. The enzyme can be converted from the dehydrogenase form to the oxidase form irreversibly by proteolysis or reversibly through oxidation of sulphydryl groups.\ \ 1440 IPR007833 \ This family includes export proteins involved in capsule polysaccharide biosynthesis, such as KpsS and LipB.\ 5566 IPR008454 \ This domain is found in the Staphylococcus aureus collagen-binding surface protein. However, this region does not mediate collagen binding; the region carries out that function. The structure of the repetitive B-region has been solved PUBMED:10673425 and shows a beta-sandwich fold. It is thought that this region forms a stalk in the Staphylococcus aureus collagen-binding protein that presents the ligand-binding domain away from the bacterial cell surface.\ 541 IPR007000 \

    Basement membranes separate dissimilar cell types and thus compartmentalize almost all tissues. They are thin sheets of extracellular matrix whose main components are type IV collagen, nidogen, sulphated proteoglycans and laminins. Laminins are found in all basement membranes, but also in embryonic mesenchyme and loose connective tissue. Laminin is thought to mediate the attachment, migration and organisation of cells into tissues during embryonic development by interacting with other extracellular matrix components PUBMED:1975589.

    \

    All native laminins identified so far are composed of one alpha, one beta and one gamma chain, and 12 different heterotrimers have been proposed based on five alpha, three beta and three gamma chains. In vitro studies have indicated that laminins mediate a variety of biological functions. First, they form a self-assembling structural network to which other components of the basement membrane attach. Second, they attach cells to the extracellular matrix via alpha-dystroglycan and integrin receptors. Third, they convey information to the cell interior via these receptors, as exemplified by mesenchymal-to-epithelial transitions in the kidney, myogenesis in skeletal muscle, and outgrowth of neurites. Studies of mutated genes, as well as gene-targeting experiments, have indicated different functional roles for the different laminins PUBMED:11969289.
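    The chain counts above can be turned into a quick combinatorial check. With one alpha, one beta and one gamma chain per molecule, five alpha, three beta and three gamma chains allow 5 × 3 × 3 = 45 distinct heterotrimers, of which only 12 have been proposed, suggesting that only a subset of the possible chain combinations actually assembles. The sketch below uses hypothetical labels (alpha1 to gamma3) purely for counting.

```python
from itertools import product

# Chain variants as counted in the text: five alpha, three beta, three gamma.
# The labels are illustrative placeholders, not a claim about nomenclature.
ALPHA = [f"alpha{i}" for i in range(1, 6)]
BETA = [f"beta{i}" for i in range(1, 4)]
GAMMA = [f"gamma{i}" for i in range(1, 4)]


def candidate_trimers():
    """Every alpha-beta-gamma combination, one chain of each type per laminin."""
    return list(product(ALPHA, BETA, GAMMA))


trimers = candidate_trimers()
print(len(trimers))  # 45
```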

    \ \

    This family of proteins represents the alpha chain of laminin.

    \ 2145 IPR007556 \ This is a family of uncharacterised prokaryotic proteins.\ 3187 IPR007156 \ The members of this family are related to the LemA protein. LemA contains a predicted N-terminal transmembrane helix. It has been predicted that the small N-terminus is extracellular PUBMED:8758895. The exact molecular function of this protein is uncertain.\ 1585 IPR005107 \

    Proteins containing this domain form structural complexes with other known families, such as and .

    \ 2589 IPR000083 \

    Fibronectin type I repeats are one of the three types of repeat found in fibronectin, a plasma protein that binds cell surfaces and various compounds including collagen, fibrin, heparin, DNA and actin. The type I domain (FN1) is approximately 40 residues in length. Four conserved cysteines are involved in disulphide bonds. The 3D structure of the FN1 domain has been determined PUBMED:2112232, PUBMED:1602484, PUBMED:7582899. It consists of two antiparallel beta-sheets: a double-stranded sheet that is linked by a disulphide bond to a triple-stranded sheet. The second conserved disulphide bridge links the C-terminal adjacent strands of the domain.

    \

    In chain A of human tissue plasminogen activator, the FN1 domain, together with the following epidermal growth factor (EGF)-like domain, is involved in fibrin binding PUBMED:1900516. It has been suggested that these two modules form a single structural and functional unit PUBMED:7582899. The two domains keep their specific tertiary structure but interact intimately to bury a hydrophobic core; the inter-module linker makes up the third strand of the EGF module's major beta-sheet.

    \ 7940 IPR012516 \

    This family consists of the halocidin family of antimicrobial peptides. Halocidins are isolated from the haemocytes of the tunicate Halocynthia aurantium. They are dimers formed by a disulfide linkage between cysteines of two monomers of different sizes. Halocidins have been shown to have strong antimicrobial activity against a wide variety of pathogenic bacteria and have been proposed as candidate peptide antibiotics against multidrug-resistant bacteria PUBMED:12067731.

    \ 2049 IPR007183 \ This is a protein of unknown function.\ 2124 IPR007411 \ This family consists of bacterial proteins of uncharacterised function.\ 4311 IPR000297 \

    Recommended name: Peptidylprolyl isomerase

    \

    Synonyms for proteins with this domain are: Peptidyl-prolyl cis-trans isomerase, PPIase, rotamase, cyclophilin, FKBP65

    \ \

    Peptidylprolyl isomerase is an enzyme that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides PUBMED:2186809. It has been reported in bacteria and eukaryotes.

    \ 1113 IPR006965 \ This 19 kDa glycoprotein binds major histocompatibility complex (MHC) class I antigens in the endoplasmic reticulum (ER). The ER retention signal at the C-terminus of GP19K causes retention of the complex in the ER, preventing lysis of the cell by cytotoxic T-lymphocytes PUBMED:8249282.\ 1077 IPR003702 \ This family contains several enzymes that take part in pathways involving acetyl-CoA. Acetyl-CoA hydrolase from yeast catalyses the formation of acetate from acetyl-CoA, CoA transferase (CAT1) produces succinyl-CoA, and acetate-CoA transferase uses acyl-CoA and acetate to form acetyl-CoA.\ 6152 IPR010456 \

    This family consists of several ribosomal protein L11 methyltransferase sequences.

    \ 3779 IPR005082 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
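    The naming scheme described above, in which the first character of a family or clan identifier encodes its catalytic type, can be expressed as a small lookup. This is a sketch of the stated scheme only; the function names are hypothetical and not part of any MEROPS software.

```python
# First character of a family or clan identifier -> catalytic type,
# following the scheme described in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clans containing families of more than one type
}


def catalytic_type(identifier):
    """Return the catalytic type encoded in an identifier such as 'U9', 'M28' or 'MH'."""
    return CATALYTIC_TYPES[identifier[0].upper()]


def nucleophile(identifier):
    """Describe the nucleophile implied by the catalytic type, as stated in the text."""
    kind = catalytic_type(identifier)
    if kind in ("serine", "threonine", "cysteine"):
        return "catalytic residue (acyl intermediate)"
    if kind in ("aspartic", "glutamic acid", "metallo"):
        return "activated water molecule"
    return "not determined"


print(catalytic_type("U9"), "/", nucleophile("U9"))    # unknown / not determined
print(catalytic_type("M28"), "/", nucleophile("M28"))  # metallo / activated water molecule
```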

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    This group of peptidases belongs to MEROPS peptidase family U9 (phage prohead processing peptidase family, clan U-), which play a role in the head assembly of bacteriophage T4.

    \ 4437 IPR000623 \ Shikimate kinase catalyzes the fifth step of the shikimate pathway, which leads to chorismate, the precursor of the aromatic amino acids PUBMED:7612934. The enzyme catalyzes the ATP-dependent phosphorylation of shikimate: ATP + shikimate = ADP + shikimate 3-phosphate. The protein is found in bacteria (gene aroK or aroL), plants and fungi; in fungi it is part of a multifunctional enzyme that catalyses five consecutive steps in this pathway. In 1994, the 3D structure of shikimate kinase was predicted to be very close to that of adenylate kinase, suggesting a functional similarity as well as an evolutionary relationship PUBMED:7703851. This prediction has since been confirmed experimentally. The protein is reported to possess an alpha/beta fold, consisting of a central sheet of five parallel beta-strands flanked by alpha-helices, a topology very similar to that of adenylate kinase PUBMED:9600856.\ 7447 IPR011475 \

    Several Rhodopirellula baltica proteins share this probable domain. Most of these proteins are predicted to be secreted or membrane-associated.

    \ 7239 IPR010879 \

    This family represents a series of bacterial domains of unknown function of around 50 residues in length. Members of this family are often found as tandem repeats and in some cases represent the whole protein. All member proteins are described as being hypothetical.

    \ 693 IPR007484 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
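    The HEXXH motif described above can be located with a regular expression. The sketch below is a minimal illustration: it uses a zero-width lookahead so that overlapping motifs are all reported, and it applies the proline rule only to the two X positions, which is a simplifying assumption. The function name is hypothetical.

```python
import re

# Zero-width lookahead: overlapping motifs (e.g. sharing a histidine) are all found.
# Assumption: proline is excluded only from the two X positions.
HEXXH = re.compile(r"(?=HE[^P][^P]H)")


def find_hexxh(sequence):
    """Return 0-based start positions of HEXXH motifs in a protein sequence."""
    return [m.start() for m in HEXXH.finditer(sequence.upper())]


print(find_hexxh("AAHELGHAA"))  # [2]
print(find_hexxh("AAHEPGHAA"))  # []
print(find_hexxh("HEAAHEAAH"))  # [0, 4]
```

A match alone does not identify a metalloprotease: as noted above, HEXXH is relatively common, which is why the stricter abXHEbbHbc definition is used.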

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This domain is found in metallopeptidases belonging to the MEROPS peptidase family M28 (aminopeptidase Y, clan MH) PUBMED:7674922. They also contain a transferrin receptor-like dimerisation domain and a protease-associated PA domain.

    \ 3043 IPR003521 \ The nucleotide-sensitive chloride conductance regulatory protein (ICln) is found ubiquitously in mammalian (and other) cell types and is postulated to play a critical role in cell volume regulation. Initial studies proposed that ICln was itself a swelling-activated anion channel; however, further studies demonstrated that it is localised primarily to the cell cytoplasm. It has therefore been postulated that activation of cell volume regulation may involve reversible translocation of ICln from the cytoplasm and its insertion into the plasma membrane. It is not resolved whether the anion channel involved in cell volume regulation after cell swelling comprises one or more subunits and, if it does, whether ICln is in fact one of them PUBMED:9696697.\ 757 IPR003501 \ The bacterial phosphoenolpyruvate:sugar phosphotransferase system (PTS) is a multi-protein system involved in the regulation of a variety of metabolic and transcriptional processes. The lactose/cellobiose-specific family is one of four structurally and functionally distinct groups of IIB PTS system cytoplasmic enzymes. The fold of IIB(cellobiose) is similar to that of mammalian tyrosine phosphatases. This signature is often found downstream of .\ 6786 IPR010715 \

    This represents a conserved region approximately 180 residues long within bacterial S-type pyocins. Pyocins are polypeptide toxins produced by, and active against, bacteria. S-type pyocins cause cell death by DNA breakdown due to endonuclease activity PUBMED:12423794.

    \ 2828 IPR000889 \ Glutathione peroxidase (GSHPx), an enzyme whose principal function is to protect against damage from endogenously formed hydroperoxides, catalyses the reduction of hydroperoxides by glutathione PUBMED:, PUBMED:7565867.\ \ In higher vertebrates, several forms of GSHPx are known, including a ubiquitous cytosolic form (GSHPx-1), a gastrointestinal cytosolic form (GSHPx-GI), a plasma secreted form (GSHPx-P) and an epididymal secretory form (GSHPx-EP). In filarial nematode parasites, the major soluble cuticular protein (gp29) is a secreted GSHPx, which may provide a mechanism of resistance to the immune reaction of the mammalian host by neutralising the products of the oxidative burst of leukocytes PUBMED:1631065. The Escherichia coli protein btuE, a periplasmic protein involved in vitamin B12 transport, is evolutionarily related to GSHPxs, although the significance of this relationship is unclear. The structure of bovine seleno-glutathione peroxidase has been determined PUBMED:6852035. The protein belongs to the alpha-beta class, with a 3-layer (aba) sandwich architecture. The catalytic site of GSHPx contains a conserved residue which is either a cysteine or, in many eukaryotic GSHPx, a selenocysteine PUBMED:2142875.\ 3527 IPR007742 \

    Bacterial nitrous oxide (N2O) reductase is the terminal oxidoreductase of a respiratory process that generates dinitrogen from N2O. To attain its functional state, the enzyme undergoes a maturation process that involves the protein-driven synthesis of a unique copper-sulphur cluster and metallation of the binuclear CuA site in the periplasm. NosD is a periplasmic protein that is thought to insert copper into the exported reductase apoenzyme PUBMED:8626275.

    \ 5948 IPR009302 \

    This entry consists of the tail length tape measure protein from bacteriophage HK97 and related sequences from Escherichia coli O157:H7.

    \ 5098 IPR007935 \

    This family consists of several tobravirus 2B proteins. The 2B protein is known to be required for transmission by both Paratrichodorus pachydermus and Paratrichodorus anemones nematodes PUBMED:11162804. Transmission of the tobravirus Tobacco rattle virus by trichodorid vector nematodes requires the viral coat protein (CP) and the 2B protein, a nonstructural protein encoded by RNA2, the smaller of the two viral genomic RNAs. It is hypothesized that the 2B protein functions by interacting with a small, flexible domain located at the C-terminus of the CP, forming a bridge between the virus particle and the internal surface of the vector nematode feeding apparatus PUBMED:12202212.

    \ 8145 IPR013243 \

    This domain is found in the protein Sgf73/Sca7, which is a component of the multihistone acetyltransferase complexes SAGA and SILK PUBMED:15932941. It is also found in Ataxin-7, a human protein which, in its pathological polyglutamine-expanded form, is responsible for the neurodegenerative disease spinocerebellar ataxia 7 (SCA7) PUBMED:15932941.

    \ 8022 IPR012981 \

    This domain is involved in pre-rRNA processing PUBMED:15670595. It has been shown to be required either for nucleolar retention or correct assembly of the box C/D snoRNP in Saccharomyces cerevisiae PUBMED:15670595.

    \ 5492 IPR008528 \ This family consists of several plant proteins of unknown function.\ 4214 IPR007836 \ L41 associates with the ribonucleoprotein particles of the 60S subunit late in the ribosomal maturation process. L41 is encoded by the smallest known open reading frame and in yeast is composed of only 24 amino acids, 17 of which are arginine or lysine.\ 1231 IPR006752 \

    Archaeal flagella are unique motility structures, and the absence of bacterial structural motility genes in the complete genome sequences of flagellated archaeal species has suggested that archaeal flagellar biogenesis is likely mediated by novel components. FlaD and FlaE are present in the cell as membrane-associated proteins but are not major components of isolated flagellar filaments. Interestingly, flaD was found to encode two proteins, each translated from a separate ribosome binding site.

    \ \

    This group of sequences contains the archaeal FlaD and FlaE proteins. The conserved region that defines these sequences is found in the N-terminal region of FlaE but towards the C-terminal region of FlaD PUBMED:11717274.

    \ 1718 IPR000756 \ Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The DAG kinase domain is assumed to be an accessory domain. Upon cell stimulation, DAG kinase converts DAG into phosphatidate, initiating the resynthesis of phosphatidylinositols and attenuating protein kinase C activity. It catalyzes the reaction: ATP + 1,2-diacylglycerol = ADP + 1,2-diacylglycerol 3-phosphate. The enzyme is stimulated by calcium and phosphatidylserine and phosphorylated by protein kinase C. This domain is always associated with .\ 7093 IPR009891 \

    This family consists of several plant tapetum specific proteins. Members of this family are found in Arabidopsis thaliana, Brassica napus and Sinapis alba. Members of this family may be involved in sporopollenin formation and/or deposition PUBMED:7764317.

    \ 7954 IPR012576 \

    This family consists of the NADH-ubiquinone oxidoreductase B12 subunit proteins. NADH is the central source of electrons in mitochondrial and bacterial respiration. NADH-ubiquinone oxidoreductase is involved in the transfer of electrons from NADH to the electron transport chain. This oxidation of NADH is coupled to proton transfer across the membrane, generating a proton motive force that is utilised for the synthesis of ATP. The function of this subunit is unclear PUBMED:9425316.

    \ 2187 IPR007475 \ This is a family of uncharacterised proteins.\ 4391 IPR006940 \

    Securin, also known as pituitary tumour-transforming gene product, is a regulatory protein that plays a central role in chromosome stability, in the p53/TP53 pathway, and in DNA repair. It probably acts by blocking the action of key proteins; for example, during mitosis it blocks Separase/ESPL1 function, preventing the proteolysis of the cohesin complex and the subsequent segregation of the chromosomes. At the onset of anaphase, it is ubiquitinated, leading to its destruction and to the liberation of ESPL1. However, its function is not limited to an inhibitory activity, since it is also required to activate ESPL1. Its negative regulation of the transcriptional activity and related apoptotic activity of TP53 may explain the strong transforming capability of the protein when it is overexpressed. Overexpression of securin is associated with a number of tumours, and it has been proposed that this may be due to erroneous chromatid separation leading to chromosome gain or loss PUBMED:10411507.

    \ 1621 IPR001083 \

    Some fungal transcription factors contain an N-terminal domain, the copper fist, which seems to be involved in copper-dependent DNA binding PUBMED:8262047, PUBMED:8509391. These proteins activate the transcription of the metallothionein gene in response to copper. Metallothionein maintains copper levels in yeast PUBMED:3052856, PUBMED:8262047. The copper fist domain, which is similar in structure to metallothionein itself, undergoes a large conformational change on copper binding that allows DNA binding. The domain contains a conserved array of zinc-binding residues (Cys-X2-Cys-X8-Cys-X-His) and forms a three-stranded antiparallel beta-sheet with two short helical segments that project from one end of the beta-sheet PUBMED:9665167. Conserved residues form a basic patch that may be important for DNA binding.\
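
The conserved spacing Cys-X2-Cys-X8-Cys-X-His quoted above maps directly onto a regular expression over one-letter residue codes. The Python sketch below is illustrative only: the helper name and the toy sequence are hypothetical, and a pattern hit flags the spacing of the putative metal-binding residues, not the presence of a copper fist domain.

```python
import re

# Cys-X2-Cys-X8-Cys-X-His in one-letter codes; "." stands for any residue X
COPPER_FIST_MOTIF = re.compile(r"C.{2}C.{8}C.H")


def find_copper_fist_motifs(sequence: str) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans of non-overlapping motif matches.

    Hypothetical helper: a regex hit only reflects the conserved spacing of
    the putative zinc-binding residues; it is not a domain prediction.
    """
    return [m.span() for m in COPPER_FIST_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MACAACGGGGGGGGCKHLL"  # invented sequence containing one motif
    print(find_copper_fist_motifs(toy))
```

A production scan would also need to handle overlapping hits (for example with a lookahead pattern); `finditer` as used here reports non-overlapping matches only.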

    \ 1581 IPR003779 \ The catechol and protocatechuate branches of the 3-oxoadipate pathway, which are important for the bacterial degradation of aromatic compounds, converge at the common intermediate 3-oxoadipate enol-lactone. Carboxymuconolactone decarboxylase (CMD) is involved in protocatechuate catabolism. In some bacteria, a gene fusion event leads to expression of CMD with a hydrolase involved in the same pathway PUBMED:9495744.\ 778 IPR003388 \

    Eukaryotic proteins of the reticulon (RTN) family all share an association with the endoplasmic reticulum (ER). Whereas their N-terminal regions are not related to one another, all reticulon proteins share a 200-residue region of sequence similarity at the C terminus. This region contains two large hydrophobic regions separated by a 66-residue hydrophilic segment. The conserved hydrophobic C-terminal portion has been shown to play an essential role in the association of reticulons with the ER membrane. The hydrophobic portions are thought to be membrane-embedded, and the hydrophilic 66-residue segment to be localised to the lumenal/extracellular face of the membrane. Most reticulons have a di-lysine ER retention motif at the C terminus. Because of their likely association with both the rough and the smooth ER, the reticulons might play some role in transport processes or in the regulation of intracellular calcium levels. It has been suggested that the reticulons may serve as ER-associated channel-like complexes PUBMED:7844160, PUBMED:8833145, PUBMED:9693037, PUBMED:10667797.
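
The text does not spell out the di-lysine signal; a common way of writing it is KKXX or KXKXX at the extreme C terminus (lysines at positions -4 and -3, or -5 and -3). Taking that canonical form as an assumption, a minimal Python check might look like this (the helper name and test tails are hypothetical):

```python
import re

# Assumed canonical di-lysine ER retention signals, anchored at the C terminus:
# KKXX (Lys at -4 and -3) or KXKXX (Lys at -5 and -3); "." is any residue.
DILYSINE_SIGNAL = re.compile(r"(?:KK..|K.K..)$")


def has_dilysine_signal(sequence: str) -> bool:
    """Hypothetical helper: True if the C-terminal residues fit KKXX or KXKXX."""
    return DILYSINE_SIGNAL.search(sequence.strip().upper()) is not None


if __name__ == "__main__":
    for seq in ("MAAAKKEE", "MAAAKEKEE", "MAAAAEEE"):  # invented C-terminal tails
        print(seq, has_dilysine_signal(seq))
```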

    \ 1089 IPR006091 \

    Mammalian acyl-CoA dehydrogenases () are enzymes that catalyse the first step in each cycle of beta-oxidation in the mitochondrion. Acyl-CoA dehydrogenases PUBMED:3326738, PUBMED:2777793, PUBMED:8034667 catalyse the alpha,beta-dehydrogenation of acyl-CoA thioesters to the corresponding trans-2,3-enoyl-CoA products with concomitant reduction of enzyme-bound FAD. Reoxidation of the flavin involves transfer of electrons to ETF (electron-transferring flavoprotein). These enzymes are homodimers containing one molecule of FAD.

    The monomeric enzyme is folded into three domains of approximately equal size. The N-terminal and C-terminal domains are mainly alpha-helical and are packed together, and the middle domain consists of two orthogonal beta-sheets. The flavin ring is buried in the crevice between the two alpha-helical domains and the beta-sheet of one subunit, and the adenosine pyrophosphate moiety stretches into the subunit junction formed by two C-terminal domains PUBMED:8356049.

    The central domain of Acyl-CoA dehydrogenase has a beta-barrel fold.

    \ 7350 IPR011084 \

    The metallo-beta-lactamase fold contains five sequence motifs. The first four motifs are found in and are common to all metallo-beta-lactamases. The fifth motif appears to be specific to function. This entry represents the fifth motif from metallo-beta-lactamases involved in DNA repair PUBMED:12177301.

    \ 628 IPR007276 \ Emg1 and Nop14 are novel proteins whose interaction is required for the maturation of the 18S rRNA and for 40S ribosome production PUBMED:11694595.\ 357 IPR006094 \

    Various enzymes use FAD as a co-factor; most of these enzymes are oxygen-dependent oxidoreductases containing a covalently bound FAD group that is attached to a histidine via an 8-alpha-(N3-histidyl)-riboflavin linkage. One of these enzymes, vanillyl-alcohol oxidase (VAO, ), has a solved structure; the alignment includes the FAD-binding site, called the PP-loop, between residues 99 and 110 PUBMED:10984479. The FAD molecule is covalently bound in the known structure; however, the residue that links to the FAD is not in the alignment. VAO catalyses the oxidation of a wide variety of substrates, ranging from aromatic amines to 4-alkylphenols.

    \ 7658 IPR012481 \

    Kanamycin nucleotidyltransferase (KNTase) is involved in conferring resistance to aminoglycoside antibiotics and catalyses the transfer of a nucleoside monophosphate group from a nucleotide to kanamycin. This enzyme is dimeric, with each subunit being composed of two domains. The C-terminal domain contains five alpha helices, four of which are organised into an up-and-down alpha-helical bundle. Residues found in this domain may contribute to the enzyme's active site PUBMED:7577914.

    \ 5920 IPR010351 \

    This family consists of several hypothetical proteins from Escherichia coli, Yersinia pestis and Salmonella typhi.

    \ 3566 IPR006770 \

    Opioid peptides act as growth factors in neural and non-neural cells and tissues, in addition to serving in neurotransmission/neuromodulation in the nervous system. The native opioid growth factor (OGF), [Met(5)]-enkephalin, is an inhibitory peptide that plays a role in cell proliferation and tissue organization during development, cancer, cellular renewal, wound healing, and angiogenesis. OGF action is mediated by a receptor mechanism; the receptor for OGF (OGFr) is an integral membrane protein associated with the nucleus.

    \ \ \

    OGFr is distinguished by containing a series of imperfect repeats. This entry describes a proline-rich repeat found in a human opioid growth factor receptor PUBMED:11890982.

    \ 2632 IPR003814 \ Formylmethanofuran dehydrogenases (), found in methanogenic Archaea, are molybdenum or tungsten iron-sulphur proteins containing a pterin cofactor PUBMED:8125106. The enzyme from Methanosarcina barkeri is a molybdenum iron-sulphur protein involved in methanogenesis. Subunit E is co-expressed with the enzyme but fails to co-purify, and thus its function is unknown PUBMED:8617280.\ 4830 IPR004254 \ Members of this family are integral membrane proteins. This family includes proteins that are hemolysin-III homologs.\ 2230 IPR007595 \ This family contains several uncharacterised staphylococcal proteins.\ 7569 IPR011713 \

    This entry includes some LRRs that fail to be detected by the model.

    \ 4288 IPR005570 \ Rpb8 is a subunit common to the three yeast RNA polymerases, pol I, II and III. Rpb8 interacts with the largest subunit Rpb1, and with Rpb3 and Rpb11, two smaller subunits.\ 7925 IPR012628 \

    This family consists of toxic peptides (Magi 5) found in the venom of a spider of the family Hexathelidae. Magi 5 is the first spider toxin reported to bind site 4 of a mammalian sodium channel, and it has an insecticidal effect, causing paralysis when injected into larvae.

    \ 937 IPR001460 \

    This signature identifies a large group of proteins, which include:

    \ \

    The many penicillin-binding proteins represented in this group of sequences are responsible for the final stages of peptidoglycan biosynthesis for cell wall formation. The proteins synthesise cross-linked peptidoglycan from lipid intermediates and contain a penicillin-sensitive transpeptidase C-terminal domain. The active site serine (residue 337 in ) is conserved in all members of this family PUBMED:8605631.

    \ \

    MecR1 and BlaR1 are metallopeptidases belonging to MEROPS peptidase family M56, clan M-. BlaR1 and MecR1 cleave their cognate transcriptional repressors BlaI and MecI, respectively, activating the synthesis of MecA.

    \ \

    MecR1 is present in Staphylococcus aureus and Staphylococcus sciuri, whereas BlaR1 (also known as BlaR, PenR1, or PenJ) has been found in Bacillus licheniformis, Staphylococcus epidermidis, Staphylococcus haemolyticus, and several S. aureus strains. These proteins are plasmid-encoded, chromosomal, or transposon-mediated. MecR1/BlaR1 proteins consist of homologous N-terminal 330-residue transmembrane metallopeptidase domains linked to homologous extracellular 260-residue PBP-like penicillin-sensor moieties.

    \ \ 5820 IPR009248 \

    The Rhizobium meliloti bacA gene encodes a function that is essential for bacterial differentiation into bacteroids within plant cells in the symbiosis between R. meliloti and alfalfa. An Escherichia coli homologue of BacA, SbmA, is implicated in the uptake of microcins and bleomycin. This family is likely to be a subfamily of the ABC transporter family.

    \ 4136 IPR000342 \

    RGS (Regulator of G Protein Signalling) proteins are multi-functional, GTPase-accelerating proteins that promote GTP hydrolysis by the alpha subunit of heterotrimeric G proteins, thereby inactivating the G protein and rapidly switching off G protein-coupled receptor signalling pathways PUBMED:10836135. Upon activation by GPCRs, heterotrimeric G proteins exchange GDP for GTP, are released from the receptor, and dissociate into free, active GTP-bound alpha subunit and beta-gamma dimer, both of which activate downstream effectors. The response is terminated upon GTP hydrolysis by the alpha subunit (), which can then bind the beta-gamma dimer (, ) and the receptor. RGS proteins markedly reduce the lifespan of GTP-bound alpha subunits by stabilising the G protein transition state.
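
The G protein cycle described above can be sketched as a two-state machine. As a simplifying assumption, GTP hydrolysis is treated as a first-order process with rate constant k, so the mean lifetime of GTP-bound alpha is 1/k, and RGS activity is modelled as a hypothetical fold increase in k. State names, event strings and rate values are illustrative, not measured.

```python
from enum import Enum, auto


class GProteinState(Enum):
    # alpha(GDP) bound to the beta-gamma dimer: inactive heterotrimer
    INACTIVE_HETEROTRIMER = auto()
    # free alpha(GTP) and free beta-gamma, both able to activate effectors
    ACTIVE_DISSOCIATED = auto()


# Hypothetical event names for the two transitions described in the text
TRANSITIONS = {
    (GProteinState.INACTIVE_HETEROTRIMER, "gpcr_activation"): GProteinState.ACTIVE_DISSOCIATED,
    (GProteinState.ACTIVE_DISSOCIATED, "gtp_hydrolysis"): GProteinState.INACTIVE_HETEROTRIMER,
}


def step(state: GProteinState, event: str) -> GProteinState:
    """Apply one event; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def mean_active_lifetime(k_hydrolysis: float, rgs_fold: float = 1.0) -> float:
    """Mean lifetime of alpha(GTP) under first-order hydrolysis: 1 / (k * fold).

    rgs_fold is an assumed acceleration factor standing in for RGS activity.
    """
    return 1.0 / (k_hydrolysis * rgs_fold)


if __name__ == "__main__":
    s = step(GProteinState.INACTIVE_HETEROTRIMER, "gpcr_activation")
    print(s, step(s, "gtp_hydrolysis"))
    print(mean_active_lifetime(0.02), mean_active_lifetime(0.02, rgs_fold=100))
```

With an assumed k of 0.02 per second, a 100-fold acceleration shortens the mean active lifetime from 50 s to 0.5 s, which is the sense in which RGS proteins rapidly switch off signalling.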

    \

    All RGS proteins contain an RGS-box (or RGS domain), which is required for activity. Some small RGS proteins, such as RGS1 and RGS4, consist of little more than an RGS domain, while others also contain additional domains that confer further functionality PUBMED:10987813. RGS domains can be found in conjunction with a variety of domains, including: DEP for membrane targeting (), PDZ for binding to GPCRs (), PTB for phosphotyrosine binding (), RBD for Ras binding (), GoLoco for guanine nucleotide dissociation inhibitor activity (), PX for phosphatidylinositol binding (), PXA that is associated with PX (), PH for stimulating guanine nucleotide exchange (), and GGL (G protein gamma subunit-like) for binding G protein beta subunits () PUBMED:15090201. RGS proteins that contain GGL domains can interact with G protein beta subunits to form novel dimers that prevent G protein gamma subunit binding and G protein alpha subunit association, thereby preventing heterotrimer formation.

    \ \ 7716 IPR013099 \

    This entry includes the ion channels of the two-membrane-helix type found in bacteria PUBMED:11836519.

    \ 7827 IPR012991 \

    Members of this family are components of the type IV secretion system. They mediate intercellular transfer of macromolecules via a mechanism ancestrally related to that of bacterial conjugation machineries.

    \ 4084 IPR006802 \ This family includes the radial spoke head proteins RSP4 and RSP6 from Chlamydomonas reinhardtii, and several eukaryotic homologues, including mammalian RSHL1, the protein product of a familial ciliary dyskinesia candidate gene PUBMED:11237735.\ 3873 IPR001697 \

    Pyruvate kinase () (PK) catalyses the final step in glycolysis PUBMED:2379684, the conversion of phosphoenolpyruvate to pyruvate with concomitant phosphorylation of ADP to ATP:

    \ \

    The enzyme, which is found in all living organisms, requires both magnesium and potassium ions for its activity PUBMED:3519210. In vertebrates, there are four tissue-specific isozymes: L (liver), R (red cells), M1 (muscle, heart and brain), and M2 (early foetal tissue). In plants, PK exists as cytoplasmic and plastid isozymes, while most bacteria and lower eukaryotes have a single form; exceptions include certain bacteria, such as Escherichia coli, that have two isozymes. All isozymes appear to be tetramers of identical subunits of ~500 residues.

    \

    PK helps control the rate of glycolysis, along with phosphofructokinase () and hexokinase (). PK possesses allosteric sites for numerous effectors, yet the isozymes respond differently, in keeping with their different tissue distributions PUBMED:12798932. The activity of L-type (liver) PK is increased by fructose-1,6-bisphosphate (F1,6BP) and lowered by ATP and alanine (a gluconeogenic precursor); therefore, when glucose levels are high, glycolysis is promoted, and when they are low, gluconeogenesis is promoted. L-type PK is also hormonally regulated, being activated by insulin and inhibited by glucagon, which covalently modifies the PK enzyme. M1-type (muscle, brain) PK is inhibited by ATP, but F1,6BP and alanine have no effect, which correlates with the function of muscle and brain, as opposed to the liver.
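
The isozyme-specific regulation described above amounts to a lookup table. The Python sketch below records only the effects stated in the text; the table and helper names are hypothetical, and treating unlisted effectors as having no effect is a simplification, not a claim from the source.

```python
# Effects summarised from the text; effectors not listed default to
# "no effect" (a simplification, except where the text states it for M1).
PK_EFFECTORS = {
    "L": {  # liver isozyme
        "F1,6BP": "activates",
        "ATP": "inhibits",
        "alanine": "inhibits",
        "insulin": "activates",
        "glucagon": "inhibits",
    },
    "M1": {  # muscle/brain isozyme
        "ATP": "inhibits",
    },
}


def effect(isozyme: str, effector: str) -> str:
    """Look up how an effector acts on a PK isozyme (hypothetical helper)."""
    return PK_EFFECTORS[isozyme].get(effector, "no effect")


if __name__ == "__main__":
    for eff in ("F1,6BP", "ATP", "alanine"):
        print(eff, "L:", effect("L", eff), "M1:", effect("M1", eff))
```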

    \

    The structure of cat muscle pyruvate kinase has been determined PUBMED:3519210. The protein comprises three domains, each belonging to the alpha-beta class; one of these adopts a 3-layer (aba) sandwich architecture, and the other two form beta-barrels.

    \ 5883 IPR010331 \

    Among the bacterial genes required for nodule invasion are the exo genes, which are involved in the production of an extracellular polysaccharide. Mutations in exoD result in altered exopolysaccharide production and defects in nodule invasion PUBMED:1987158.

    \ 387 IPR002770 \

    Formylmethanofuran:tetrahydromethanopterin formyltransferase (Ftr) is involved in C1 metabolism in methanogenic archaea, sulphate-reducing archaea and methylotrophic bacteria. It catalyses the following reversible reaction:

    \ \ \

    Ftr from the thermophilic methanogen Methanopyrus kandleri (optimum growth temperature 98 degrees C) is a hyperthermophilic enzyme that is absolutely dependent on the presence of lyotropic salts for activity and thermostability. The crystal structure of Ftr, determined to a reveals a homotetramer composed essentially of two dimers. Each subunit is subdivided into two tightly associated lobes, both consisting of a predominantly antiparallel beta sheet flanked by alpha helices, forming an alpha/beta sandwich structure. The approximate location of the active site was detected in a region close to the dimer interface PUBMED:9195883. Ftr from the mesophilic methanogen Methanosarcina barkeri and from the sulphate-reducing archaeon Archaeoglobus fulgidus have a similar structure PUBMED:12192072.

    \ \

    In the methylotrophic bacterium Methylobacterium extorquens, Ftr interacts with three other polypeptides to form an Ftr/cyclohydrolase complex which catalyses the hydrolysis of formyl-tetrahydromethanopterin to formate during growth on C1 substrates PUBMED:12123819.

    \ \ 7429 IPR011456 \

    This is a large family of short hypothetical proteins in Leptospira interrogans.

    \ 6192 IPR008083 \

    The CD34 group of monoclonal antibodies recognises CD34 (also termed CD34 antigen), a 105-120 kDa cell surface glycoprotein that is selectively expressed by human myeloid and lymphoid progenitor cells, including the haemopoietic stem cell. The protein is also expressed on vascular endothelial cells, where it is concentrated on the surface of the interdigitating processes, suggesting a possible involvement in cell interactions or adhesion, by mediating the attachment of stem cells to the bone marrow extracellular matrix, or directly to stromal cells. The restricted pattern of expression of CD34 in haemopoiesis suggests that it may have a significant function in the earliest stages of blood cell differentiation in the bone marrow PUBMED:1694174, PUBMED:1709048.\

    \

    CD34 is a phosphoprotein shown to be activated by protein kinase C (PKC) in a developmental stage-specific manner. Analysis of the human CD34 sequence suggests that the protein is a type I transmembrane (TM) molecule. The predicted internal portion of the protein appears to retain basic amino acid residues adjacent to Ser residues, presenting at least two potential target sites for PKC phosphorylation. In addition, there are two other consensus motifs that correspond to potential target sites for Ca2+/calmodulin-dependent kinase and/or protease-activated kinase I PUBMED:1694174.\
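
The phrase "basic amino acid residues adjacent to Ser residues" can be read as a naive sequence scan. The Python sketch below flags each Ser with Lys or Arg immediately before or after it; the helper name and the toy sequence are hypothetical, and this rule is a simplification of the text, not a validated PKC site predictor.

```python
BASIC = {"K", "R"}  # lysine and arginine in one-letter code


def ser_adjacent_to_basic(sequence: str) -> list[int]:
    """Return 0-based positions of Ser residues with Lys/Arg directly next to them.

    Hypothetical helper illustrating "basic residues adjacent to Ser"; it is
    not a validated PKC phosphorylation-site predictor.
    """
    seq = sequence.upper()
    hits = []
    for i, residue in enumerate(seq):
        if residue != "S":
            continue
        before = seq[i - 1] if i > 0 else ""
        after = seq[i + 1] if i + 1 < len(seq) else ""
        if before in BASIC or after in BASIC:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(ser_adjacent_to_basic("MKSAASRLSP"))  # invented tail sequence
```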

    \

    The protein is not strongly similar to other known proteins, but some weak similarities do exist: e.g., to the S+T region (a region rich in potential O-linked carbohydrate attachment sites), the TM domain and cytoplasmic domain of cell surface proteins such as leukosialin, a major sialoglycoprotein of rat and human leukocytes; to the N-terminal glycosylated region of CD45 (the leukocyte common antigen); and to groups of interrelated proteins involved in cell adhesion or the regulation of complement.\

    \

    A homologue of human CD34 is expressed in mouse. The amino acid sequences diverge significantly only at their N-termini, which are predicted to be highly glycosylated and whose functions are probably modulated by carbohydrate. The observed pattern of expression of the murine CD34 gene is consistent with that of the human antigen. That CD34 is also highly expressed outside haematopoiesis, by vascular endothelial cells and by fibroblasts in differentiated tissue, suggests a role common to a variety of cell types. Concentration of CD34 on the interdigitating membrane projections of adjacent capillary endothelial cells has strengthened the idea that it functions in the control of events leading to cell-cell or cell-matrix adhesion, a role that could be modulated by variation in its levels of glycosylation. The conservation between human and mouse of the cysteine-rich domain in the extracellular part of the protein, and the exceptionally high conservation of the cytoplasmic domain, imply that the protein is more than a carrier for either carbohydrate or negatively charged terminal sialic acid residues (a role postulated for leukosialin/sialophorin). The highly conserved domain may serve to provide an internal signal of external contact with a ligand.\

    \ \ 2937 IPR007622 \ In infected cells, UL55 is associated with the nuclear matrix and found adjacent to compartments containing the capsid protein ICP35. UL55 was not detected in assembled virions. It is thought that UL55 may play a role in virion assembly or maturation PUBMED:9714248.\ 515 IPR000868 \ This is a family of hydrolase enzymes. Isochorismatase, also known as 2,3-dihydro-2,3-dihydroxybenzoate synthase, catalyses the conversion of isochorismate, in the presence of water, to 2,3-dihydro-2,3-dihydroxybenzoate and pyruvate.\ 2432 IPR001509 \ This family of proteins utilise NAD as a cofactor. The proteins in this family use nucleotide-sugar substrates for a variety of chemical reactions PUBMED:9174344.\ 5455 IPR008700 \ This family consists of several plant nitrate-induced (NOI) proteins.\ 2677 IPR001409 \ Steroid or nuclear hormone receptors (NRs) constitute an important superfamily of transcription regulators that are involved in widely diverse physiological functions, including control of embryonic development, cell differentiation and homeostasis. Members of the superfamily include the steroid hormone receptors and receptors for thyroid hormone, retinoids, 1,25-dihydroxy-vitamin D3 and a variety of other ligands. The proteins function as dimeric molecules in nuclei to regulate the transcription of target genes in a ligand-responsive manner PUBMED:7899080, PUBMED:8165128. In addition to C-terminal ligand-binding domains, these nuclear receptors contain a highly conserved N-terminal zinc finger that mediates specific binding to target DNA sequences, termed ligand-responsive elements. In the absence of ligand, steroid hormone receptors are thought to be weakly associated with nuclear components; hormone binding greatly increases receptor affinity.\ \

    NRs are extremely important in medical research, a large number of them being implicated in diseases such as cancer, diabetes and hormone resistance syndromes. While several NRs act as ligand-inducible transcription factors, many do not yet have a defined ligand and are accordingly termed "orphan" receptors. During the last decade, more than 300 NRs have been described, many of which are orphans that cannot easily be named owing to confusion in the current nomenclature. However, a new system has recently been introduced in an attempt to rationalise the increasingly complex set of names used to describe superfamily members.

    \

    \ The glucocorticoid receptor consists of 3 functional and structural domains: an N-terminal (modulatory) domain; a DNA-binding domain that mediates specific binding to target DNA sequences (ligand-responsive elements); and a hormone-binding domain. The N-terminal domain is unique to the glucocorticoid receptors; it spans the first 440 residues and is primarily responsible for transcriptional activation. The smaller (around 65 residues), highly conserved central portion of the protein is the DNA-binding domain, which plays a role in DNA-binding specificity, homodimerisation and interactions with other proteins. The hormone-binding domain comprises approximately 250 residues at the C terminus of the receptor. This domain mediates receptor activity via interaction with heat shock proteins and cyclophilins, or with hormone. For more information, see the GRR resource [http://biochem1.basic-sci.georgetown.edu/GRR/GRR.html].

    \ 1606 IPR001981 \ Colipase is a small protein cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. Efficient absorption of dietary fats is dependent on the action of pancreatic triglyceride lipase. Colipase binds to the C-terminal, non-catalytic domain of lipase, thereby stabilising its active conformation and considerably increasing the overall hydrophobic binding site. Structural studies of the complex and of colipase alone have revealed the functionality of its architecture PUBMED:9240923, PUBMED:10570245.\

    Colipase is a small protein with five conserved disulphide bonds. Structural analogies have been recognised between a developmental protein (Dickkopf), the pancreatic lipase C-terminal domain, the N-terminal domains of lipoxygenases and the C-terminal domain of alpha-toxin. These non-catalytic domains in the latter enzymes are important for interaction with membranes. It has not been established whether these domains are also involved in possible protein cofactor binding, as is the case for pancreatic lipase PUBMED:10570245.

    \ 2231 IPR007598 \ This is a family of Arabidopsis thaliana proteins. Many of these members contain a repeated region.\ 3789 IPR003899 \

    A large group of bacterial exotoxins are referred to as "A/B toxins", essentially because they are formed from two subunits PUBMED:8225592. The "A" subunit possesses enzyme activity and is transferred to the host cell following a conformational change in the membrane-bound transport "B" subunit PUBMED:8225592.

    \

    Bordetella pertussis is the causative agent of whooping cough and is a Gram-negative aerobic coccobacillus. Its major virulence factor is the pertussis toxin, an A/B exotoxin that mediates both the colonisation and toxaemic stages of the disease PUBMED:3704651, PUBMED:2873570. Recombinant, inactive forms of the 5 subunits that make up the toxin have proven to be good vaccines. The S2 and S3 subunits of the toxin form part of the "B" moiety. They are responsible for binding the whole toxin to host cells prior to invasion, and are classed as adhesins PUBMED:2873570. S2 attaches to a host receptor called lactosylceramide. It has also been speculated that the S3 subunit may preferentially bind phagocytes.

    \

    The crystal structure of pertussis toxin has been determined to 2.9 Å resolution PUBMED:8075982. The catalytic A-subunit (S1) shares structural similarity with other ADP-ribosylating bacterial toxins, although differences in the C-terminal portion explain its unique activation mechanism. Despite its heterogeneous subunit composition, the structure of the cell-binding B-oligomer (S2, S3, two copies of S4, and S5) resembles the symmetrical B-pentamers of the cholera and Shiga toxin families, but it interacts differently with the A-subunit, and there is virtually no sequence similarity between the B-subunits of the different toxins. Two peripheral domains that are unique to the pertussis toxin B-oligomer share structural similarity with a calcium-dependent eukaryotic lectin and reveal possible receptor-binding sites.

    \ 35 IPR004856 \

    N-linked (asparagine-linked) glycosylation of proteins is mediated by a highly conserved pathway in eukaryotes, in which a lipid (dolichol phosphate)-linked oligosaccharide is assembled at the endoplasmic reticulum membrane prior to the transfer of the oligosaccharide moiety to the target asparagine residues. This oligosaccharide is composed of Glc(3)Man(9)GlcNAc(2). The addition of the three glucose residues is the final series of steps in the synthesis of the oligosaccharide precursor. Alg6 transfers the first glucose residue, and Alg8 transfers the second one PUBMED:8016100. In the human alg6 gene, a C-to-T transition that replaces Ala333 with Val has been identified as the cause of a congenital disorder of glycosylation, designated type Ic OMIM:603147 PUBMED:10359825.
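
The genetic code shows why a single C-to-T transition can replace alanine with valine: alanine codons have the form GCN and valine codons GTN, so (assuming the change is read on the coding strand) only a C-to-T change at the second codon position turns Ala into Val. The Python sketch below checks this with a partial codon table limited to those eight codons; the helper names are hypothetical.

```python
# Partial standard codon table: only the alanine (GCN) and valine (GTN) codons
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
}


def c_to_t(codon: str, position: int) -> str:
    """Apply a C->T transition at a 0-based codon position."""
    if codon[position] != "C":
        raise ValueError(f"position {position} of {codon} is not C")
    return codon[:position] + "T" + codon[position + 1:]


def translate(codon: str) -> str:
    """Translate a codon using the partial table; '?' if it is not listed."""
    return CODON_TO_AA.get(codon, "?")


if __name__ == "__main__":
    for ala in ("GCT", "GCC", "GCA", "GCG"):
        mutant = c_to_t(ala, 1)  # second codon position
        print(ala, translate(ala), "->", mutant, translate(mutant))
```

A C-to-T change at the third position of GCC gives GCT, which still encodes alanine, so the second position is the only one where this transition produces the reported substitution.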

    \ 5814 IPR010294 \

    This domain represents the Spacer-1 region from the ADAM-TS family of metalloproteinases PUBMED:11279086.

    \ 1358 IPR000712 \

    Active cell suicide (apoptosis) is induced by events such as growth factor withdrawal and toxins. It is controlled by regulators, which either have an inhibitory effect on programmed cell death (anti-apoptotic) or block the protective effect of inhibitors (pro-apoptotic) PUBMED:15335822, PUBMED:8918887. Many viruses have found a way of countering defensive apoptosis by encoding their own anti-apoptosis genes, preventing their target cells from dying too soon.

    All proteins belonging to the Bcl-2 family PUBMED:8910675 contain at least one of the BH1, BH2, BH3 or BH4 domains. All anti-apoptotic proteins contain BH1 and BH2 domains, and some of them contain an additional N-terminal BH4 domain (Bcl-2, Bcl-x(L), Bcl-w), which is never seen in pro-apoptotic proteins, except for Bcl-x(S). On the other hand, all pro-apoptotic proteins contain a BH3 domain (except for Bad), which is necessary for dimerization with other proteins of the Bcl-2 family and crucial for their killing activity; some of them also contain BH1 and BH2 domains (Bax, Bak). The BH3 domain is also present in some anti-apoptotic proteins, such as Bcl-2 or Bcl-x(L). Proteins that are known to contain these domains include vertebrate Bcl-2 (alpha and beta isoforms) and Bcl-x (isoforms Bcl-x(L) and Bcl-x(S)); mammalian proteins Bax and Bak; mouse protein Bid; Xenopus laevis proteins Xr1 and Xr11; human induced myeloid leukemia cell differentiation protein MCL1; and Caenorhabditis elegans protein ced-9.

    \ 1317 IPR001486 \ Globins are heme-containing proteins involved in binding and/or transporting oxygen. Almost all globins belong to a large family , the only exceptions being the following proteins, which form a family of their own PUBMED:2111321, PUBMED:8177215. These proteins contain a conserved histidine which could be involved in heme binding.\ 4251 IPR000876 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes yeast S7 (YS6), archaeal S4e, and mammalian and plant cytoplasmic S4 PUBMED:2124517. Two highly similar isoforms of mammalian S4 exist, one encoded by a gene on chromosome Y and the other by a gene on chromosome X. These proteins range from 233 to 264 amino acids in length.

    \ 3768 IPR005319 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 to S27) of serine peptidase have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
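
    The clan-specific orderings quoted above can be captured in a small lookup table. The following Python sketch is illustrative only: the names `TRIAD_ORDER`, `RESIDUE_NAMES` and `triad_sequence_order` are hypothetical and not part of MEROPS or any library, and the table holds only the three clans whose orders are given in the text.

```python
# Hypothetical lookup table: linear (N- to C-terminal) order of the catalytic
# triad in the three serine peptidase clans named in the text.
# H = histidine, D = aspartate, S = serine.
TRIAD_ORDER = {
    "SA": "HDS",  # chymotrypsin clan
    "SB": "DHS",  # subtilisin clan
    "SC": "SDH",  # carboxypeptidase C clan
}

RESIDUE_NAMES = {"H": "His", "D": "Asp", "S": "Ser"}


def triad_sequence_order(clan):
    """Return the triad residues of a clan in sequence order, e.g. ['His', 'Asp', 'Ser']."""
    if clan not in TRIAD_ORDER:
        raise ValueError(f"no triad order recorded for clan {clan!r}")
    return [RESIDUE_NAMES[code] for code in TRIAD_ORDER[clan]]


if __name__ == "__main__":
    for clan in TRIAD_ORDER:
        # Same three residues in every clan; only the linear arrangement differs.
        print(clan, "-".join(triad_sequence_order(clan)))
```

    The residue set is identical across the three clans; only the order differs, which is the property the text uses as a marker of clan relationships.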

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of an amino acid residue as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
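
    The naming scheme above can be illustrated with a minimal Python sketch that decodes a family identifier into its catalytic type and nucleophile class. The identifier shape assumed here (type letter, family number, optional subfamily letter, as in S48, C2 and C60B elsewhere in this file) is inferred from examples rather than taken from a formal MEROPS specification, and `describe_family` is a hypothetical helper.

```python
import re

# Catalytic-type codes as listed in the text. "P" is reserved for clans that
# contain families of more than one type, so it is not accepted for families.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Assumed identifier shape: type letter + family number + optional subfamily letter.
FAMILY_RE = re.compile(r"([ACGMSTU])(\d+)([A-Z]?)")

SIDE_CHAIN_NUCLEOPHILE = {"S", "T", "C"}  # acyl-intermediate mechanism
WATER_NUCLEOPHILE = {"A", "G", "M"}       # activated water molecule


def describe_family(family_id):
    """Decode a family identifier such as 'S48' or 'C60B' (hypothetical helper)."""
    match = FAMILY_RE.fullmatch(family_id)
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    letter, number, subfamily = match.groups()
    if letter in SIDE_CHAIN_NUCLEOPHILE:
        nucleophile = "amino acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "unknown"
    return {
        "catalytic_type": CATALYTIC_TYPES[letter],
        "family": int(number),
        "subfamily": subfamily or None,
        "nucleophile": nucleophile,
    }


if __name__ == "__main__":
    for family_id in ("S48", "C2", "C60B"):
        print(family_id, describe_family(family_id))
```

    Rejecting "P" at family level mirrors the statement that type P describes only clans containing families of more than one catalytic type.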

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S48 (clan S-). The protein fold of the peptidase domain and the active site residues (except for the catalytic serine) are not known for any members of this family.

    \ 6820 IPR009731 \

    This family consists of several proteins similar to bacteriophage lambda replication protein P. The bacteriophage lambda P protein promotes replication of the phage chromosome by recruiting a key component of the cellular replication machinery to the viral origin. Specifically, P protein delivers one or more molecules of Escherichia coli DnaB helicase to a nucleoprotein structure formed by the lambda O initiator at the lambda replication origin PUBMED:2165499.

    \ 2876 IPR000569 \ The name HECT comes from 'Homologous to the E6-AP Carboxyl Terminus' PUBMED:7708685. Proteins containing this domain at the C-terminus include ubiquitin-protein ligase, which regulates ubiquitination of CDC25. Ubiquitin-protein ligase accepts ubiquitin from an E2 ubiquitin-conjugating enzyme in the form of a thioester, and then directly transfers the ubiquitin to targeted substrates. A cysteine residue is required for ubiquitin-thioester formation. Human thyroid receptor interacting protein 12, which also contains this domain, is a component of an ATP-dependent multisubunit protein that interacts with the ligand binding domain of the thyroid hormone receptor. It may be an E3 ubiquitin-protein ligase. Human ubiquitin-protein ligase E3A interacts with the E6 protein of the cancer-associated human papillomavirus types 16 and 18. The E6/E6-AP complex binds to and targets the p53 tumor-suppressor protein for ubiquitin-mediated proteolysis.\ 302 IPR006775 \

    This domain, usually associated with the C terminus, represents a conserved region in uncharacterised proteins with a pankaryotic distribution.

    \ 1535 IPR004835 \ Chitin synthase, also known as chitin-UDP acetyl-glucosaminyl transferase, is a plasma membrane-bound protein which catalyses the conversion of UDP-N-acetyl-D-glucosamine and {(1,4)-(N-acetyl-beta-D-glucosaminyl)}(N) to UDP and {(1,4)-(N-acetyl-beta-D-glucosaminyl)}(N+1). It plays a major role in cell wall biogenesis. \ 2940 IPR002600 \ This family consists of various functionally undefined proteins from the Herpesviridae and UL7 from bovine herpesvirus PUBMED:8551568, PUBMED:7793062. UL7 is not essential for virus replication in cell culture; it localizes to the cytoplasm of infected cells, accumulating around the nucleus, but could not be detected in purified virions PUBMED:8551568. Members of the Herpesviridae have a dsDNA genome and do not have an RNA stage during their replication.\ 7745 IPR012877 \

    This region is found in a number of Caenorhabditis elegans and Caenorhabditis briggsae proteins, in one case as a repeat. In many of the family members, this region is associated with the CHK region described by SMART as being found in ZnF_C4 and HLH domain-containing kinases. In fact, one member of this family is annotated as being a member of the nuclear hormone receptor family, and contains regions typical of such proteins.

    \ 7883 IPR012558 \

    This family consists of erythromycin resistance gene leader peptides. These leader peptides are involved in the translational attenuation of erythromycin resistance genes. Interestingly, the consensus sequence of peptides conferring erythromycin resistance is similar to that of the leader peptides, thus indicating that a similar type of interaction between the nascent peptide and antibiotics can occur in both cases PUBMED:11587794.

    \ 4818 IPR005337 \

    This is a family of putative P-loop ATPases. Many of the proteins in this family are hypothetical and kinase activity has been proposed for some family members.

    \ 6225 IPR004462 \ This domain is found as essentially the full length of desulphoredoxin, a 37-residue homodimeric non-heme iron protein. It is also found as the N-terminal domain of desulphoferrodoxin (rbo), a homodimeric non-heme iron protein with 2 Fe atoms per monomer in different oxidation states. This domain binds the ferric rather than the ferrous Fe of desulphoferrodoxin. Neelaredoxin, a monomeric blue non-heme iron protein, lacks this domain.\ 5154 IPR007991 \

    This family consists of several eukaryotic proteins which are homologous to the Saccharomyces cerevisiae RRN3 protein. RRN3 is one of the RRN genes specifically required for the transcription of rDNA by the RNA polymerase I (Pol I) complex within the nucleolus of S. cerevisiae PUBMED:8670901.\ In mammalian cells, the phosphorylation state of Rrn3 regulates rDNA transcription by determining the steady-state concentration of Rrn3 PUBMED:12015311.

    \ 2970 IPR000079 \

    High mobility group (HMG) proteins constitute a family of relatively low molecular weight non-histone components in chromatin. HMG14 and HMG17 are highly similar proteins of about 100 amino acid residues; the sequence of chicken HMG14 is almost as similar to chicken HMG17 as it is to mammalian HMG14 polypeptides PUBMED:3384337. The proteins bind to the inner side of the nucleosomal DNA, altering the interaction between the DNA and the histone octamer. It is thought that they may be involved in the process that confers specific chromatin conformations to transcribable regions in the genome PUBMED:3754870.

    \

    The SMART signature describes a nucleosomal binding domain, which facilitates binding of proteins to nucleosomes in chromatin. The domain is most commonly found in the high mobility group (HMG) proteins HMG14 and HMG17; however, it is also found in other proteins which bind to nucleosomes, e.g. NBP-45. NBP-45 is a nucleosomal binding protein, first identified in mice PUBMED:10692437, which is related to HMG14 and HMG17. NBP-45 binds specifically to nucleosome core particles, and can function as a transcriptional activator. These findings led to the suggestion that this domain, common to NBP-45, HMG14 and HMG17, is responsible for binding of the proteins to nucleosomes in chromatin.

    \ 4061 IPR003180 \ Methylpurine-DNA glycosylase is a base excision-repair protein. It is responsible for the hydrolysis of the deoxyribose N-glycosidic bond, excising 3-methyladenine and 3-methylguanine from damaged DNA. \ 7235 IPR009982 \

    This family consists of several VP6 proteins from the Banna virus as well as a related protein VP5 from the Kadipiro virus. Members of this family are typically of around 420 residues in length. The function of this family is unknown.

    \ 5261 IPR008816 \ This family consists of several Rickettsia genus-specific 17 kDa surface antigen proteins.\ 5352 IPR008731 \

    This sequence identifies proteins which are a component of the phosphoenolpyruvate-dependent sugar phosphotransferase system, a major carbohydrate active-transport system. Enzyme I transfers the phosphoryl group from phosphoenolpyruvate (PEP) to the phosphoryl carrier protein (HPr). Enzyme I is common to all phosphotransferase systems.

    \ 5538 IPR008455 \ This region is found in a group of Dictyostelium discoideum proteins. It is likely to form a coiled-coil. Some of the proteins are regulated by cyclic AMP and are expressed late in development PUBMED:2157129.\ 1632 IPR006841 \

    This is a family of Coronavirus nonstructural protein NS2. Phosphoamino acid analysis confirmed the phosphorylated nature of NS2 and identified serine and threonine as its phosphorylated amino acid residues PUBMED:1833877. It was also demonstrated that the ns2 gene product is not essential for murine hepatitis virus replication in transformed murine cells PUBMED:2168966.

    \ 4918 IPR004311 \

    Proteins containing this domain include a number of Helicobacter pylori outer membrane proteins with multiple copies of this small conserved region.

    \ 1432 IPR001300 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of an amino acid residue as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of cysteine peptidases belongs to the MEROPS peptidase family C2 (calpain family, clan CA). A type example is calpain, which is an intracellular protease involved in many important cellular functions that are regulated by calcium PUBMED:2539381. The protein is a complex of 2 polypeptide chains (light and heavy), with three known forms in mammals PUBMED:7845226, PUBMED:2555341: a highly calcium-sensitive (i.e., micromolar range) form known as mu-calpain, mu-CANP or calpain I; a form sensitive to calcium in the millimolar range, known as m-calpain, m-CANP or calpain II; and a third form, known as p94, which is found only in skeletal muscle PUBMED:2555341.

    \ \

    All forms have identical light chains but different heavy chains. Both mu- and m-calpain are heterodimers containing an identical 28-kDa subunit and an 80-kDa subunit that shares 55-65% sequence similarity between the two proteases PUBMED:7845226, PUBMED:2539381. The crystallographic structure of m-calpain reveals six "domains" in the 80-kDa subunit:

    \ \
    1. A 19-amino acid NH2-terminal sequence;
    2. Active site domain IIa;
    3. Active site domain IIb. Domain II shows low levels of sequence similarity to papain; although the catalytic His has not been located by biochemical means, it is likely that calpain and papain are related PUBMED:7845226.
    4. Domain III;
    5. An 18-amino acid extended sequence linking domain III to domain IV;
    6. Domain IV, which resembles the penta EF-hand family of polypeptides, binds calcium and regulates activity PUBMED:7845226. Ca2+ binding causes a rearrangement of the protein backbone, the net effect of which is that a Trp side chain, which acts as a wedge between catalytic domains IIa and IIb in the apo state, moves away from the active site cleft, allowing proper formation of the catalytic triad PUBMED:11914728.
    \ \ \

    Calpain-like mRNAs have been identified in other organisms, including bacteria, but the molecules encoded by these mRNAs have not been isolated, so little is known about their properties. How calpain activity is regulated in these organisms is still unclear. In metazoans, the activity of calpain is controlled by a single proteinase inhibitor, calpastatin. The calpastatin gene can produce eight or more calpastatin polypeptides ranging from 17 to 85 kDa by use of different promoters and alternative splicing events. The physiological significance of these different calpastatins is unclear, although all bind to three different places on the calpain molecule; binding to at least two of the sites is Ca2+ dependent. The calpains are thought to participate in a variety of cellular processes including remodelling of cytoskeletal/membrane attachments, different signal transduction pathways, and apoptosis. Deregulated calpain activity following loss of Ca2+ homeostasis results in tissue damage in response to events such as myocardial infarcts, stroke, and brain trauma PUBMED:12843408.

    \ \ 827 IPR005097 \ This family is composed of three structural domains that cannot be separated in the linear sequence. In some organisms this enzyme is found as a bifunctional polypeptide with lysine ketoglutarate reductase (PF). Saccharopine dehydrogenase can also function as a saccharopine reductase.\ 1099 IPR003381 \

    The late 100 kDa protein is a non-structural viral protein involved in the transport of hexon from the cytoplasm to the nucleus.

    \ 141 IPR004176 \ This short domain is found in one or two copies at the amino terminus of ClpA and ClpB proteins from bacteria and eukaryotes. The function of these domains is uncertain but they may form a protein binding site PUBMED:10982797. The proteins are thought to be subunits of ATP-dependent proteases which act as chaperones to target the proteases to substrates.\ 6666 IPR009651 \

    This family represents the aluminium resistance protein, which confers resistance to aluminium in bacteria PUBMED:9367855.

    \ 6337 IPR009481 \

    This entry represents the N-terminal region of the bacterial heat shock protein HtpX, the pattern being seen in conjunction with . The abnormal accumulation of misfolded proteins outside the plasma (cytoplasmic or inner) membrane up-regulates the synthesis of a class of envelope-localized catalysts of protein folding and degradation. The pathway for this transmembrane signalling in Escherichia coli is mediated by the CpxR-CpxA two-component phospho-relay mechanism PUBMED:12081643. Expression of HtpX in the plasma membrane is under the control of CpxR, with the metalloproteinase active site of HtpX located on the cytosolic side of the membrane. This suggests a potential role for HtpX in the response to misfolded proteins.

    \ 5872 IPR010324 \

    Dam-replacing protein (DRP) is a restriction endonuclease that is flanked by pseudo-transposable small repeat elements. The replacement of Dam-methylase by DRP allows phase variation through slippage-like mechanisms in several pathogenic isolates of Neisseria meningitidis PUBMED:11334887.

    \ 6998 IPR009835 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of an amino acid residue as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This family contains cysteine peptidases belonging to MEROPS peptidase family C60 (clan C-), subfamily C60B. It contains bacterial sortase B proteins that are approximately 200 residues long. Sortase, a transpeptidase present in almost all Gram-positive bacteria, anchors a range of important surface proteins to the cell wall PUBMED:11239768.

    \ 1162 IPR001303 \

    This family includes: rhamnulose-1-phosphate aldolase; L-fuculose phosphate aldolase PUBMED:8515438, PUBMED:8676381, which is involved in the third step of fucose metabolism; L-ribulose-5-phosphate 4-epimerase, which is involved in the third step of L-arabinose catabolism; a probable sugar isomerase, SgbE; hypothetical proteins; and the metazoan adducins, which have not been ascribed any enzymatic function but which play a role in cell membrane cytoskeleton organisation.

    \ \ \

    Adducin is a cell-membrane skeletal protein that was first purified from human erythrocytes and subsequently isolated from bovine brain membranes. Isoforms of this protein have been detected in lung, kidney, testes and liver. Erythrocyte adducin is a 200-kDa heterodimeric protein, composed of alpha and beta subunits, present at about 30,000 copies per cell. It binds with high affinity to Ca(2+)/calmodulin and is a substrate for protein kinases A and C. Both alpha-adducin and beta-adducin show alternative splicing. Thus, there may be several different heterodimeric or homodimeric forms of adducin, each with a different functional specificity. It is thought to play a role in assembly of the spectrin-actin lattice that underlies the plasma membrane PUBMED:102560. Missense mutations in both the alpha- and beta-adducin genes that alter amino acids that are normally phosphorylated have been associated with the regulation of blood pressure in the Milan hypertensive strain (MHS) of rats.\ Gamma adducin was isolated from human foetal brain PUBMED:8893809. It shows a high degree of similarity to the alpha and beta adducins.

    \ \ 839 IPR003309 \ A number of C2H2-zinc finger proteins contain a highly conserved N-terminal motif termed the SCAN domain. The SCAN domain may play an important role in the assembly and function of this newly defined subclass of transcriptional regulators PUBMED:10567577.\ 2044 IPR007169 \ This is a bacterial family of unknown function.\ 1856 IPR002834 \

    This archaebacterial domain has no known function. In one protein it is attached to a DNA-binding domain, suggesting that this domain might be involved in recognizing some regulatory molecule.

    \ 3156 IPR012679 \

    Laminins are large heterotrimeric glycoproteins involved in basement membrane function PUBMED:15037599. The laminin globular (G) domain can be found in one to several copies in various laminin family members, which includes a large number of extracellular proteins. The C-terminus of laminin alpha chain contains a tandem repeat of five laminin G domains, which are critical for heparin-binding and cell attachment activity PUBMED:10747011. Laminin alpha4 is distributed in a variety of tissues including peripheral nerves, dorsal root ganglion, skeletal muscle and capillaries; in the neuromuscular junction, it is required for synaptic specialisation PUBMED:15823034. The structure of the laminin-G domain has been predicted to resemble that of pentraxin PUBMED:9480764.

    \ \

    Laminin G domains can vary in their function, and a variety of binding functions has been ascribed to different LamG modules. For example, the laminin alpha1 and alpha2 chains each have five C-terminal laminin G domains, of which only domains LG4 and LG5 contain binding sites for heparin, sulphatides and the cell surface receptor dystroglycan PUBMED:10747011. Laminin G-containing proteins appear to have a wide variety of roles in cell adhesion, signalling, migration, assembly and differentiation. This entry represents one subtype of laminin G domains, which is sometimes found in association with thrombospondin-type laminin G domains.

    \ 5142 IPR007979 \

    This family consists of several ICEA proteins from Helicobacter pylori. H. pylori infection causes gastritis and peptic ulcer disease, and the bacterium is classified as a definite carcinogen for gastric cancer. ICEA1 has been speculated to be associated with peptic ulcer disease and may have endonuclease activity PUBMED:11843964.

    \ 3489 IPR005054 \ The proteins of this entry are derived from nepoviruses. Together with comoviruses and picornaviruses, nepoviruses are classified in the picornavirus superfamily of plus-strand single-stranded RNA viruses. This family aligns several nepovirus coat protein sequences. In several cases, this region is found at the C-terminus of the RNA2-encoded viral polyprotein. The coat protein consists of three trapezoid-shaped beta-barrel domains, and forms a pseudo T = 3 icosahedral capsid structure PUBMED:9519407.\ 4212 IPR002136 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family includes ribosomal L4/L1 from eukaryotes and plants and L4 from bacteria. L4 from yeast has been shown to bind rRNA PUBMED:9838082. These proteins have 246 (plant) to 427 (human) amino acids.

    \ 7914 IPR012591 \

    The PRO8NT domain is found at the N terminus of pre-mRNA splicing factors of the PRO8 family PUBMED:15112237.

    \ 6436 IPR009524 \

    This family consists of several hypothetical mammalian proteins (from mouse and human). The function of this family is unknown.

    \ 4611 IPR003196 \ Accurate transcription in vivo requires at least six general transcription initiation factors, in addition to RNA polymerase II. Transcription initiation factor IIF (TFIIF) is a tetramer of two beta subunits associated with two alpha subunits, and it interacts directly with RNA polymerase II. The beta subunit of TFIIF is required for recruitment of RNA polymerase II onto the promoter. \ 7961 IPR012950 \

    This family consists of the alpha and beta enterocins and lactococcin G peptides. These peptides have some antimicrobial properties; they inhibit the growth of Enterococcus spp. and a few other Gram-positive bacteria. These peptides act as pore-forming toxins that create cell membrane channels through a barrel-stave mechanism and thus produce an ionic imbalance in the cell. This family of antimicrobial peptides belongs to the class II group of bacteriocins PUBMED:10742203.

    \ 6965 IPR010788 \

    This family represents a conserved region approximately 350 residues long within plant violaxanthin de-epoxidase (VDE). In higher plants, violaxanthin de-epoxidase forms part of a conserved system that dissipates excess energy as heat in the light-harvesting complexes of photosystem II (PSII), thus protecting them from photo-inhibitory damage PUBMED:8692813.

    \ 7394 IPR011440 \

    This domain is found as 1-2 copies in a small family of proteins of unknown function.

    \ 7064 IPR009873 \

    This family consists of several Phytoreovirus S7 proteins which are thought to be viral core proteins PUBMED:2313270.

    \ 7124 IPR009913 \

    This family consists of several bacterial conjugative transfer TraP proteins from Escherichia coli and Salmonella typhimurium. TraP appears to play a minor role in conjugation and may interact with TraB, which varies in sequence along with TraP, in order to stabilise the proposed transmembrane complex formed by the tra operon products PUBMED:8655498.

    \ 523 IPR000794 \ Beta-ketoacyl-ACP synthase (KAS) PUBMED:3076376 is the enzyme that catalyzes the condensation of malonyl-ACP with the growing fatty acid chain. It is found as a component of a number of enzymatic systems, including fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH; the multi-functional 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum PUBMED:2209605, which is involved in the biosynthesis of a polyketide antibiotic; polyketide antibiotic synthase enzyme systems; Emericella nidulans multifunctional protein Wa, which is involved in the biosynthesis of conidial green pigment; Rhizobium nodulation protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl chain; and yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: first the acyl component of an activated acyl primer is transferred to a cysteine residue of the enzyme, and it is then condensed with an activated malonyl donor with the concomitant release of carbon dioxide.\ 493 IPR002145 \

    CopG, also known as RepA, is responsible for the regulation of plasmid copy number. It binds to the repAB promoter and controls synthesis of the plasmid replication initiator protein RepB. Although many bacterial transcription regulation proteins bind DNA through a 'helix-turn-helix' (HTH) motif, CopG displays a fully defined HTH-motif structure that is involved not in DNA binding, but in the maintenance of the intrinsic dimeric functional structure and cooperativity PUBMED:9714164, PUBMED:9857196.

    \ 5086 IPR007923 \

    This family consists of several herpesvirus glycoprotein L or UL1 proteins. Glycoprotein L is known to form a complex with glycoprotein H, but the function of this complex is poorly understood PUBMED:9267002.

    \ 1573 IPR003063 \ The cloacin immunity protein complexes with cloacin in equimolar quantities and inhibits it by binding with high affinity to the cloacin C-terminal catalytic domain. The immunity protein is relatively small, containing 85 amino acids.

    An extra ribosome binding site has been found to precede the immunity gene on the polycistronic Clo DF13 mRNA PUBMED:6253914, which perhaps accounts for the fact that, in cloacinogenic cells, more immunity protein than cloacin is synthesised PUBMED:6253914. Comparison of the complete amino acid sequence of the Clo DF13 immunity protein with that of the Col E3 and Col E6 immunity proteins reveals extensive similarities in primary structure, although Col E3 and Clo DF13 immunity proteins are exchangeable only to a low extent in vivo and in vitro PUBMED:6253914.

    \ 8128 IPR013250 \

    SNM1 is a subunit of RNase MRP (mitochondrial RNA processing), a ribonucleoprotein endoribonuclease that has roles in both mitochondrial DNA replication and nuclear 5.8S rRNA processing. SNM1 is an RNA binding protein that binds the MRP RNA specifically PUBMED:10523674.

    \ 6535 IPR010607 \

    This family consists of several hypothetical Rhizobiales specific proteins of around 270 residues in length. The function of this family is unknown.

    \ 3411 IPR001185 \ Mechanosensitive ion channels (MscL) play a critical role in transducing physical stresses at the cell membrane into an electrochemical response. MscL is a protein which forms a channel organized as a homopentamer, with each subunit containing two transmembrane regions PUBMED:9856938. Prokaryotes harbor a large-conductance mechanosensitive channel (gene mscL) that opens in response to stretch forces in the membrane lipid bilayer and may participate in the regulation of osmotic pressure changes within the cell PUBMED:9632260.\ 4309 IPR002107 \ This protein has been called NSP4, NSP5, NS28, and NCVP5. The final steps in the assembly of rotavirus occur in the lumen of the endoplasmic reticulum (ER). Targeting of the immature inner capsid particle (ICP) to this compartment is mediated by the cytoplasmic tail of NSP4, located in the ER membrane PUBMED:2548854, PUBMED:8887538.\ 5172 IPR008009 \

    This alignment represents the conserved core region of a ~90-residue repeat found in several haemagglutinins and other cell surface proteins. Sequence similarities to Hyalin () and the PKD domain () suggest an Ig-like fold, so this family may be similar in function to the () and () protein families.

    \ 7604 IPR011677 \ This is a group of sequences derived from hypothetical eukaryotic proteins. The region in question is approximately 330 residues long and has a cysteine-rich N terminus.\ 4282 IPR007642 \

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Rpb2 is the second largest subunit of the RNA polymerase. This domain forms one of the two distinctive lobes of the Rpb2 structure and is also known as the lobe domain PUBMED:11313498. DNA has been shown to bind to the concave surface of the lobe domain, which plays a role in maintaining the transcription bubble. Many of the bacterial members contain large insertions within this domain, a region known as dispensable region 1 (DRI).

    \ 4618 IPR006942 \ TH1 is a highly conserved but uncharacterised metazoan protein. No homologue has been identified in Caenorhabditis elegans PUBMED:11030415. TH1 binds specifically to A-Raf kinase PUBMED:11952167.\ 4009 IPR004931 \ Prothymosin alpha and parathymosin are two ubiquitous small acidic nuclear proteins that are thought to be involved in cell cycle progression, proliferation, and cell differentiation PUBMED:10854063. \ \ 2012 IPR005590 \

    This family consists of bacterial proteins whose function has not been characterised.

    \ 6458 IPR010584 \

    This family consists of several Enterobacterial exodeoxyribonuclease VIII proteins.

    \ 5504 IPR008698 \ This family consists of several NADH-ubiquinone oxidoreductase B18 subunit proteins from different eukaryotic organisms. Oxidative phosphorylation is the well-characterised process in which ATP, the principal carrier of chemical energy of individual cells, is produced using a mitochondrial proton gradient formed by the transfer of electrons from NADH and FADH2 to molecular oxygen. The oxidative phosphorylation (OXPHOS) system is located in the mitochondrial inner membrane and consists of five multi-subunit enzyme complexes and two small electron carriers: coenzyme Q10 and cytochrome c. At least 70 structural proteins involved in the formation of the whole OXPHOS system are encoded by nuclear genes, whereas 13 structural proteins are encoded by the mitochondrial genome. Deficiency of NADH-ubiquinone oxidoreductase, the first enzyme complex of the mitochondrial respiratory chain, is one of the most frequent causes of human mitochondrial encephalomyopathies PUBMED:10830904.\ 2194 IPR007488 \

    Family member Shigella flexneri VirK () is a virulence protein required for the expression or correct membrane localisation of IcsA (VirG) on the bacterial cell surface PUBMED:1406277, PUBMED:11115111. This family also includes Pasteurella haemolytica lapB (), which is thought to be membrane-associated.

    \ 7746 IPR012913 \

    The sequences found in this family are similar to a region found in the beta-subunit of glucosidase II (), which is also known as protein kinase C substrate 80K-H (PRKCSH). The enzyme catalyses the sequential removal of two alpha-1,3-linked glucose residues in the second step of N-linked oligosaccharide processing PUBMED:10929008. The beta subunit is required for the solubility and stability of the heterodimeric enzyme, and is involved in retaining the enzyme within the endoplasmic reticulum PUBMED:10929008. Mutations in the gene coding for PRKCSH have been found to be involved in the development of autosomal dominant polycystic liver disease (ADPLD), but the precise role the protein has in the pathogenesis of this disease is unknown PUBMED:12529853.

    \ 2009 IPR005586 \

    The proteins in this family are uncharacterised. They are 170-190 amino acid residues in length.

    \ 7759 IPR012501 \

    This family contains various proteins that are homologues of the yeast Vps54 protein, such as the rat homolog (), the human homolog (), and the mouse homolog (). In yeast, Vps54 associates with Vps52 and Vps53 proteins to form a trimolecular complex that is involved in protein transport between Golgi, endosomal, and vacuolar compartments PUBMED:12039048. All Vps54 homologues contain a coiled coil region (not found in the region featured in this family) and multiple dileucine motifs PUBMED:12039048.

    \ 4396 IPR007843 \ Selenoprotein W contains selenium as selenocysteine in the primary protein structure and levels of this selenoprotein are affected by selenium PUBMED:12405536. The precise role of this family is unclear.\ 2415 IPR006760 \

    This is a conserved region found in both cAMP-regulated phosphoprotein 19 (ARPP-19) and alpha/beta endosulphine. No function has yet been assigned to ARPP-19. Endosulphine is the endogenous ligand for the ATP-dependent potassium channels, which occupy a key position in the control of insulin release from the pancreatic beta cell by coupling membrane polarity to metabolism. In both proteins the region occupies the majority of the sequence PUBMED:11279279, PUBMED:11213264.

    \ 7207 IPR009968 \

    This family consists of several bacterial proteins of around 175 residues in length. Members of this family seem to be found exclusively in Chlamydia species. The function of this family is unknown.

    \ 5183 IPR008020 \

    The major coat protein in the capsid of filamentous bacteriophage forms a helical assembly of about 7000 identical protomers, each comprising 46 amino acids after cleavage of the signal peptide. Each protomer forms a slightly curved helix, and together the protomers form a tubular structure that encapsulates the viral DNA PUBMED:10666593.

    \ 647 IPR007220 \

    In eukaryotes, DNA replication initiation is driven by a single conserved initiator complex termed the origin recognition complex (ORC). The ORC is a six-protein complex; its function is reviewed in PUBMED:7867956. This entry represents subunit 2, which binds the origin of replication. It plays a role in chromosome replication and mating-type transcriptional silencing.

    \ 899 IPR006809 \ The general transcription factor, TFIID, consists of the TATA-binding protein (TBP) associated with a series of TBP-associated factors (TAFs) that together participate in the assembly of the transcription preinitiation complex. The conserved region is found at the C terminus of most member proteins. The crystal structure of hTAFII28 with hTAFII18 shows that this region is involved in the binding of these two subunits. The conserved region contains four alpha helices and three loops arranged as in histone H3 PUBMED:7729427, PUBMED:9695952.\ 4580 IPR005332 \

    Two small nested genes (p19 and p22) are located near the 3' end of the genome of tomato bushy stunt virus (TBSV): the p19 gene encodes a soluble protein, whereas the p22 gene specifies a membrane-associated protein. p22 is required for cell-to-cell movement in all plants tested PUBMED:7491767.

    \ 718 IPR007070 \ This family of eukaryotic proteins includes phosphatidylinositolglycan class N (PIG-N), which is the mammalian homologue of the yeast protein MCD4P expressed in the endoplasmic reticulum PUBMED:10574991. PIG-N is essential for glycosylphosphatidylinositol (GPI) anchor synthesis. GPI-anchored proteins are cell surface-localised proteins that serve many important cellular functions PUBMED:10069808.\ 171 IPR002059 \ When Escherichia coli is exposed to a temperature drop from 37 to 10 degrees centigrade, a 4-5 hour lag phase occurs, after which growth is resumed at a reduced rate PUBMED:1912512. During the lag phase, the expression of around 13 proteins, which contain specific DNA-binding regions PUBMED:2247479, is increased 2-10 fold. These so-called 'cold shock' proteins are thought to help the cell to survive at temperatures lower than the optimum growth temperature, possibly by condensation of the chromosome and organization of the prokaryotic nucleoid; heat shock proteins, by contrast, help the cell to survive at temperatures greater than the optimum PUBMED:1912512. A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding proteins PUBMED:1622933, PUBMED:2184368, PUBMED:8022259. This domain is known as the 'cold-shock domain' (CSD), part of which is highly similar to the RNP-1 RNA-binding motif PUBMED:1614871.\ 2866 IPR004109 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1-S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
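As a small illustration of how the linear order follows from residue positions, the sketch below sorts the catalytic residues of a triad by sequence position and returns their order as one-letter codes. The helper name `triad_order` is hypothetical. The chymotrypsin positions (His57, Asp102, Ser195, chymotrypsinogen numbering) are the standard ones; the other two position sets are made-up values chosen only to reproduce the DHS and SDH orders quoted above, not real subtilisin or carboxypeptidase numbering.

```python
from typing import Dict


def triad_order(positions: Dict[str, int]) -> str:
    """Return the N- to C-terminal order of catalytic residues as one-letter codes.

    `positions` maps a one-letter residue code ('S', 'D', 'H') to its
    position in the linear sequence.
    """
    if len(set(positions.values())) != len(positions):
        raise ValueError("catalytic residues must occupy distinct positions")
    # Sort the residue codes by their sequence position and join them.
    return "".join(sorted(positions, key=positions.__getitem__))


# Chymotrypsin (clan SA), chymotrypsinogen numbering.
chymotrypsin = {"H": 57, "D": 102, "S": 195}
# Hypothetical positions, used only to illustrate the other two orders.
subtilisin_like = {"D": 30, "H": 65, "S": 220}
carboxypeptidase_c_like = {"S": 150, "D": 340, "H": 400}

print(triad_order(chymotrypsin))             # HDS
print(triad_order(subtilisin_like))          # DHS
print(triad_order(carboxypeptidase_c_like))  # SDH
```

Only the relative order matters here, which is why the same three residues can yield three different clan-specific codes.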

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
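The naming rule above amounts to a lookup from the leading letter of a family identifier to its catalytic type. This is a minimal sketch: `parse_family` is a hypothetical helper, and it assumes an identifier is a single catalytic-type letter followed by digits, as in S29 and C12 elsewhere in this file. Subfamily suffixes (such as the trailing letter in S1A) and clan identifiers are deliberately rejected rather than guessed at.

```python
import re
from typing import Tuple

# Catalytic-type letters as listed in the text above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}


def parse_family(family_id: str) -> Tuple[str, int]:
    """Split a family identifier such as 'S29' into (catalytic type, number)."""
    match = re.fullmatch(r"([ACGMSTU])(\d+)", family_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    letter, number = match.groups()
    return CATALYTIC_TYPES[letter], int(number)


print(parse_family("S29"))  # ('serine', 29)
print(parse_family("C12"))  # ('cysteine', 12)
```

Type P is left out of the table because, as stated above, it describes mixed clans rather than families.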

    \ \

    This signature identifies the hepatitis C virus NS3 protein as a serine protease belonging to MEROPS peptidase family S29 (hepacivirin family, clan PA(S)), which has a trypsin-like fold. The non-structural (NS) protein NS3 is one of the NS proteins involved in replication of the HCV genome. The NS2 proteinase (), a zinc-dependent enzyme, performs a single proteolytic cut to release the N terminus of NS3. The action of NS3 proteinase (NS3P), which resides in the N-terminal one-third of the NS3 protein, then yields all remaining non-structural proteins. The C-terminal two-thirds of the NS3 protein contain a helicase. The functional relationship between the proteinase and helicase domains is unknown. NS3 has a structural zinc-binding site and requires the cofactor NS4A. It has been suggested that the NS3 serine protease of hepatitis C virus is involved in cell transformation and that the ability to transform requires an active enzyme PUBMED:11264729.

    \ 6059 IPR010416 \

    This is a family of plasmid-encoded proteins with unknown function.

    \ 3192 IPR006199 \

    This is the DNA-binding domain of the LexA SOS regulon repressor, which prevents expression of DNA repair proteins in bacteria. The aligned region contains a variant form of the helix-turn-helix DNA-binding motif PUBMED:8076591. This domain, usually located at the N terminus, is found associated with the autoproteolytic domain of LexA.

    \ 6622 IPR010648 \

    This is a group of proteins of unknown function.

    \ 3969 IPR004976 \ Poly(A) polymerase () catalyses template-independent extension of the 3' end of a DNA or RNA strand by one nucleotide at a time. The poxvirus enzyme creates the 3' poly(A) tail of mRNAs and is a heterodimer of a catalytic and a regulatory subunit. This entry represents the catalytic subunit. \ 8068 IPR013257 \

    The SRI (Set2 Rpb1 interacting) domain mediates RNA polymerase II interaction and couples histone H3 K36 methylation with transcript elongation PUBMED:15798214.

    \ 2476 IPR003664 \ The plsX gene is part of the bacterial fab gene cluster, which encodes several key fatty acid biosynthetic enzymes PUBMED:9642179. The plsX gene encodes a poorly understood enzyme of phospholipid metabolism PUBMED:10464226.\ 567 IPR002083 \

    Although apparently functionally unrelated, intracellular TRAFs and extracellular meprins share a conserved region of about 180 residues, the meprin and TRAF homology (MATH) domain PUBMED:12387856.\ \ Meprins are mammalian tissue-specific metalloendopeptidases of the astacin family implicated in developmental, normal and pathological processes by hydrolyzing a variety of proteins. Various growth factors, cytokines, and extracellular matrix proteins are substrates for meprins. They are composed of five structural domains: an N-terminal endopeptidase domain, a MAM domain (see ), a MATH domain, an EGF-like domain (see ) and a C-terminal transmembrane region. Meprin A and B form membrane-bound homotetramers, whereas homooligomers of meprin A are secreted. A proteolytic site adjacent to the MATH domain, present only in meprin A, allows the release of the protein from the membrane PUBMED:7890660.

    \ \

    TRAF proteins were first isolated by their ability to interact with TNF receptors PUBMED:8069916. They promote cell survival by the activation of downstream protein kinases and, finally, transcription factors of the NF-kB and AP-1 family. The TRAF proteins are composed of 3 structural domains: a RING finger (see ) in the N-terminal part of the protein, one to seven TRAF zinc fingers (see ) in the middle and the MATH domain in the C-terminal part PUBMED:12387856. The MATH domain is necessary and sufficient for self-association and receptor interaction. From the structural analysis, two consensus sequences recognized by the TRAF domain have been defined: a major one, [PSAT]x[QE]E, and a minor one, PxQxxD PUBMED:10518213.\ \ The structure of the TRAF2 protein reveals a trimeric self-association of the MATH domain PUBMED:10206649. The domain forms a new, eight-stranded antiparallel beta sandwich structure. A coiled-coil region adjacent to the MATH domain is also important for the trimerisation. The oligomerisation is essential for establishing appropriate connections to form signaling complexes with TNF receptor-1. The ligand-binding surface of TRAF proteins is located in beta-strands 6 and 7 PUBMED:10518213.
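The two TRAF-binding consensus sequences translate directly into regular expressions ('x' becomes '.', bracketed residues stay character classes). The sketch below scans a sequence for both; `find_traf_motifs` and the test peptide are hypothetical, and a match is only a sequence-level hint, not evidence that a protein actually binds a TRAF domain.

```python
import re
from typing import Dict, List, Tuple

# Consensus motifs from the text, as regular expressions wrapped in a
# zero-width lookahead so that overlapping hits are all reported.
TRAF_MOTIFS = {
    "major": re.compile(r"(?=([PSAT].[QE]E))"),
    "minor": re.compile(r"(?=(P.Q..D))"),
}


def find_traf_motifs(sequence: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return (1-based start position, matched peptide) hits for each motif."""
    seq = sequence.upper()
    return {
        name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in TRAF_MOTIFS.items()
    }


# Made-up peptide containing one hit of each motif.
print(find_traf_motifs("AAPVQEAAPKQLLDAA"))
# {'major': [(3, 'PVQE')], 'minor': [(9, 'PKQLLD')]}
```

The lookahead matters for the degenerate major motif: in a stretch such as PAEEE both PAEE and AEEE satisfy [PSAT]x[QE]E, and a plain non-overlapping search would report only the first.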

    \ \ 4448 IPR005511 \

    \ Regucalcin, also known as senescence marker protein-30 (SMP30), was discovered in 1978 as a Ca2+-binding protein that does not contain EF-hand motifs, suggesting a novel class of Ca2+-binding protein. It is primarily localised to the liver and kidney cortex of animals. Expression of its mRNA in the liver and renal cortex of rats is stimulated by an increase in cellular Ca2+ levels PUBMED:9920722, PUBMED:1315924. \

    \

    \ Regucalcin, as a regulatory protein of Ca2+, has a pivotal role in the control of many cell functions. The protein has a reversible effect on Ca2+-induced activation and inhibition of many enzymes in both liver and renal cortex cells PUBMED:1315924. It has also been shown to inhibit various protein kinases (including Ca2+/calmodulin-dependent protein kinase, protein kinase C and tyrosine kinase) and protein phosphatases, indicating a regulatory role in signal transduction within the cell. In addition, regucalcin regulates intracellular Ca2+ homeostasis by enhancing Ca2+-pumping activity in the plasma membrane through activation of the pump enzymes. Moreover, it can inhibit RNA synthesis in the nuclei of normal and regenerating rat livers in vitro PUBMED:9278268.\

    \

    \ Hydropathy profiles indicate hydrophobic domains in both the N- and C-terminal regions of the regucalcin molecule, although the protein also exhibits hydrophilic characteristics. Human and rodent regucalcins share 89% sequence identity; this high degree of conservation between species suggests that the complete structure is required for physiological function. \
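A hydropathy profile of the kind mentioned above is usually computed as a sliding-window average of per-residue hydropathy values. The sketch below assumes the Kyte-Doolittle scale and a default window of 9 residues; the text does not say which scale or window was used for regucalcin, and `hydropathy_profile` is a hypothetical helper.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Mean hydropathy over each full sliding window along the sequence.

    Element i is the average for residues i .. i+window-1 (0-based), so the
    profile has len(sequence) - window + 1 points.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[aa] for aa in sequence.upper()]
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one residue: add the new one, drop the oldest.
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile
```

Windows with positive averages point to hydrophobic stretches and negative ones to hydrophilic stretches; plotting the returned list against residue index gives the familiar profile.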

    \ \ 4013 IPR000501 \ The members of this family are associated with capsid intermediates during packaging of dsDNA viruses with no RNA stage in their replication cycle PUBMED:9696839. The protein may affect translocation of the virus glycoproteins to membranes, and is involved in capsid maturation.\ 1768 IPR007778 \ This family consists of REP proteins from a number of Dictyostelium species (slime moulds). REP protein is probably involved in transcription regulation and control of DNA replication, specifically the amplification of plasmids present at low copy numbers. The formation of homomultimers may be required for their regulatory activity PUBMED:10366530.\ 653 IPR003429 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    The anti-apoptotic protein p35 from baculovirus is thought to prevent the suicidal response of infected insect cells by inhibiting caspases. Ectopic expression of p35 in a number of transgenic animals or cell lines is also anti-apoptotic, giving rise to the hypothesis that the protein is a general inhibitor of caspases.

    \ \

    Purified recombinant p35 inhibits human caspase-1, -3, -6, -7, -8, and -10 with kass values from 1.2 × 10^3 to 7 × 10^5 M^-1 s^-1, and with upper limits of Ki values from 0.1 to 9 nM. Inhibition of 12 unrelated serine or cysteine proteases was insignificant, implying that p35 is a potent caspase-specific inhibitor; it belongs to MEROPS proteinase inhibitor family I50, clan IQ. The interaction of p35 with caspase-3, as a model of the inhibitory mechanism, revealed classic slow-binding inhibition, with both active sites of the caspase-3 dimer acting equally and independently. Inhibition resulted from complex formation between the enzyme and inhibitor, which could be visualised under non-denaturing conditions but was dissociated by SDS to give p35 cleaved at Asp87, the P1 residue of the inhibitor. Complex formation requires the substrate-binding cleft to be unoccupied PUBMED:9692966.
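To give a feel for what the quoted range of kass values means in time, the sketch below converts an association rate constant into an inactivation half-time. It assumes pseudo-first-order conditions (inhibitor in large excess over enzyme, negligible dissociation of the complex), so that k_obs = kass·[I] and t1/2 = ln 2 / k_obs; the 100 nM p35 concentration and the helper name `inhibition_half_time` are hypothetical.

```python
import math


def inhibition_half_time(k_ass: float, inhibitor_molar: float) -> float:
    """Half-time (s) of enzyme inactivation under pseudo-first-order conditions.

    Assumes the inhibitor is in large excess over the enzyme and that the
    enzyme-inhibitor complex does not appreciably dissociate, so that the
    observed first-order rate constant is k_obs = k_ass * [I].
    """
    if k_ass <= 0 or inhibitor_molar <= 0:
        raise ValueError("k_ass and the inhibitor concentration must be positive")
    k_obs = k_ass * inhibitor_molar  # units: s^-1
    return math.log(2) / k_obs


# Lower and upper k_ass values quoted above, at a hypothetical 100 nM p35.
for k_ass in (1.2e3, 7e5):  # units: M^-1 s^-1
    t_half = inhibition_half_time(k_ass, 100e-9)
    print(f"k_ass = {k_ass:.1e} M^-1 s^-1 -> t1/2 = {t_half:.0f} s")
# k_ass = 1.2e+03 M^-1 s^-1 -> t1/2 = 5776 s
# k_ass = 7.0e+05 M^-1 s^-1 -> t1/2 = 10 s
```

Under these assumptions the slowest-associating caspase in the quoted range would take roughly an hour and a half to be half-inhibited, against about ten seconds for the fastest; raising the inhibitor concentration shortens both times proportionally.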

    \ \

    Infecting the insect cell line IPLB-Ld652Y with the baculovirus Autographa californica multinucleocapsid nucleopolyhedrovirus (AcMNPV) results in global translation arrest, which correlates with the presence of the AcMNPV apoptotic suppressor, p35. However, the contribution of p35 to translation arrest is not solely due to caspase inactivation; its activity also enhances signalling to a separate translation arrest pathway, possibly by stimulating the late stages of the baculovirus infection cycle PUBMED:14980489.

    \ \ 4459 IPR001189 \

    Superoxide dismutases (SODs) () catalyse the conversion of superoxide radicals to molecular oxygen and hydrogen peroxide. Their function is to destroy the radicals that are normally produced within cells and are toxic to biological systems. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one PUBMED:3315461, PUBMED:3345848, PUBMED:1556751. This family includes both single metal-binding SODs and cambialistic SODs, which can bind either Mn or Fe. Fe/MnSODs are ubiquitous enzymes that are responsible for the majority of SOD activity in prokaryotes, fungi, blue-green algae and mitochondria. Fe/MnSODs are found as homodimers or homotetramers.
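For reference, the dismutation can be written as the two-step metal redox cycle generally accepted for this family, with M standing for the active-site Mn or Fe: the oxidised metal is first reduced by one superoxide, releasing O2, and then reoxidised by a second superoxide, which is protonated to hydrogen peroxide. Summing the two half-reactions gives the net reaction.

```latex
\begin{aligned}
\mathrm{M^{3+}\text{-}SOD} + \mathrm{O_2^{\bullet-}} &\rightarrow \mathrm{M^{2+}\text{-}SOD} + \mathrm{O_2} \\
\mathrm{M^{2+}\text{-}SOD} + \mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} &\rightarrow \mathrm{M^{3+}\text{-}SOD} + \mathrm{H_2O_2} \\
\text{net: }\; 2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} &\rightarrow \mathrm{O_2} + \mathrm{H_2O_2}
\end{aligned}
```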

    \

    The structure of Fe/MnSODs can be divided into two domains, an alpha N-terminal domain and an alpha/beta C-terminal domain, connected by a loop. The N-terminal domain consists of two helices in an antiparallel hairpin, with a left-handed twist PUBMED:9537987. The C-terminal domain is of the alpha/beta type and consists of a three-stranded antiparallel beta-sheet in the order 213, along with four helices in the arrangement alpha/beta(2)/alpha/beta/alpha(2) PUBMED:9931259.

    \ 2533 IPR000774 \ Peptidyl-prolyl isomerases accelerate the folding of proteins, and the FKPA-type enzymes probably act in the folding of extracytoplasmic proteins. They catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides. This family is only found at the amino terminus of proteins belonging to the family. This particular domain is of unknown function.\ 1796 IPR004150 \

    DNA ligases catalyse the crucial step of joining breaks in duplex DNA during DNA replication, repair and recombination, utilizing either ATP or NAD(+) as a cofactor PUBMED:10698952. This family is a small OB-fold domain found after the adenylation domain DNA_ligase_N in NAD+-dependent ligases (). OB-fold domains are generally involved in nucleic acid binding.

    \ 4370 IPR007500 \

    This is a domain of unknown function found at the N terminus of proteins involved in cell wall development and nitric oxide protection.

    \ \

    ScdA is required for normal cell growth and development; mutants have an increased level of peptidoglycan cross-linking and aberrant cellular morphology suggesting a role for ScdA in cell wall metabolism PUBMED:9308171.

    \ \

    NorA1, NorA2, and YtfE are involved in the nitric oxide (NO) response. NorA1 and NorA2, which are similar to YtfE, are co-transcribed with the membrane-bound NO reductases. The genes appear to be involved in NO protection, but their function is unknown PUBMED:11069685, PUBMED:15546870.\

    \ 1682 IPR005627 \ Copper transport in Escherichia coli is mediated by the products of at least six genes, cutA, cutB, cutC, cutD, cutE, and cutF. A mutation in one or more of these genes results in increased copper sensitivity. Members of this family are between 200 and 300 amino acids in length and are found in both eukaryotes and bacteria.\ 402 IPR004152 \ The GAT domain is responsible for binding of GGA proteins to several members of the ARF family, including ARF1 PUBMED:10747089 and ARF3. The GAT domain stabilizes membrane-bound ARF1 in its GTP-bound state by interfering with GAP proteins PUBMED:11301005.\ 6694 IPR009665 \

    This family consists of several uncharacterised bacterial proteins, which seem to be specific to the orders Clostridia and Bacillales. Family members are typically around 180 residues in length. The function of this family is unknown.

    \ 1750 IPR003433 \ The virus capsid is composed of 60 icosahedral units made up of a combination of VP4, VP3, VP2 and VP1. Four different translation initiation sites on the densovirus capsid protein mRNA give rise to these four viral proteins, VP1 to VP4. This family represents VP4.\ 2015 IPR005624 \ This entry contains uncharacterised proteins, including GlcG. The alignment contains many conserved motifs that are suggestive of cofactor binding and enzymatic activity.\ 3584 IPR007210 \ This domain is part of a high-affinity multicomponent binding-protein-dependent transport system involved in bacterial osmoregulation. This domain is often fused to the permease component of the transporter complex. It is often found in integral membrane proteins or proteins predicted to be attached to the membrane by a lipid anchor. Glycine betaine is involved in protection from high-osmolarity environments, for example in Bacillus subtilis PUBMED:7622480. OpuBC is closely related and involved in choline transport. Choline is necessary for the biosynthesis of glycine betaine PUBMED:10216873. L-carnitine is important for osmoregulation in Listeria monocytogenes. This domain is also found in proteins binding L-proline (ProX), histidine (HisX) and taurine (TauA).\ 3995 IPR001330 \

    The beta subunit of the farnesyltransferases is responsible for peptide binding. Squalene-hopene cyclase is a bacterial enzyme that catalyzes the cyclization of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism PUBMED:9295270. Lanosterol synthase () (oxidosqualene-lanosterol cyclase) catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones and vitamin D in vertebrates and of ergosterol in fungi PUBMED:8016864. Cycloartenol synthase () (2,3-epoxysqualene-cycloartenol cyclase) is a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to cycloartenol.

    \ 1861 IPR002840 \

    These archaebacterial proteins have no known function.

    \ 3721 IPR001578 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    \ This group of cysteine peptidases belongs to the MEROPS peptidase family C12 (ubiquitin C-terminal hydrolase family, clan CA). Families within the CA clan are loosely termed papain-like, as the protein fold of the peptidase unit resembles that of papain, the type example for clan CA. The type example for family C12 is the human ubiquitin C-terminal hydrolase UCH-L1.

    \ \

    Ubiquitin is highly conserved and is commonly found conjugated to proteins in eukaryotic cells, where it may act as a marker for rapid degradation, or it may have a chaperone function in protein assembly PUBMED:7845226. Ubiquitin is released from the bound protein by proteolytic cleavage PUBMED:7845226. A number of deubiquitinating proteases are known: all are activated by thiol compounds PUBMED:7845226, PUBMED:3015923, and inhibited by thiol-blocking agents and ubiquitin aldehyde PUBMED:7845226, PUBMED:3031653, and as such have the properties of cysteine proteases PUBMED:7845226.

    \ \

    The deubiquitinating proteases can be split into 2 size ranges (20-30 kDa and 100-200 kDa, ) PUBMED:7845226: this family comprises the 20-30 kDa peptidases, which include the yeast yuh1. Yeast yuh1 protease is known to be active only against small ubiquitin conjugates, being inactive against conjugated beta-galactosidase PUBMED:7845226. A mammalian homologue, UCH (ubiquitin conjugate hydrolase), is one of the most abundant proteins in the brain PUBMED:7845226. Only one conserved cysteine can be identified, along with two conserved histidines. The spacing between the cysteine and the second histidine is thought to be more representative of the cysteine/histidine spacing of a cysteine protease catalytic dyad PUBMED:7845226.

    \ 4320 IPR001737 \

    This family of proteins includes rRNA adenine dimethylases (e.g. KsgA) and the erythromycin resistance methylases (Erm).

    \ \

    The bacterial enzyme KsgA catalyzes the transfer of a total of four methyl groups from S-adenosyl-L-methionine (S-AdoMet) to two adjacent adenosine bases in 16S rRNA. This enzyme and the resulting modified adenosine bases appear to be conserved in all species of eubacteria, eukaryotes, and archaea, and in eukaryotic organelles. Bacterial resistance to the aminoglycoside antibiotic kasugamycin involves inactivation of KsgA and resulting loss of the dimethylations, with modest consequences to the overall fitness of the organism. In contrast, the yeast ortholog, Dim1, is essential. In yeast, and presumably in other eukaryotes, the enzyme performs a vital role in pre-rRNA processing in addition to its methylating activity. The best-conserved region in these enzymes is located in the N-terminal section and probably corresponds to the S-adenosylmethionine (SAM)-binding domain.

    \ \

    The crystal structure of KsgA from Escherichia coli has been solved to a resolution of 2.1 Å. It bears a strong similarity to the crystal structure of ErmC' from Bacillus stearothermophilus and a lesser similarity to the yeast mitochondrial transcription factor, sc-mtTFB PUBMED:15136037.

    \ \

    The Erm family of RNA methyltransferases, which methylate a single adenosine base in 23S rRNA, confer resistance to the MLS-B group of antibiotics. Despite their sequence similarity, the two enzyme families (KsgA and Erm) have strikingly different levels of regulation that remain to be elucidated. Other orthologs in this family include the yeast (MTF1, PUBMED:11567089) and human (h-mtTFB) mitochondrial transcription factors, which are nuclear encoded. Human mtTFB is able to stimulate transcription in vitro independently of its S-adenosylmethionine binding and rRNA methyltransferase activity PUBMED:12897151.

    \ \ 7246 IPR010882 \

    This family consists of several acidic phosphoprotein precursor PCEMA1 sequences which appear to be found exclusively in Plasmodium chabaudi. PCEMA1 is an antigen that is associated with the membrane of the infected erythrocyte throughout the entire intraerythrocytic cycle PUBMED:1475002. The exact function of this family is unclear.

    \ 991 IPR003306 \ The WIF domain is found in the RYK tyrosine kinase receptors and in WIF, the Wnt-inhibitory factor. The domain is extracellular and contains two conserved cysteines that may form a disulphide bridge. This domain binds Wnt in WIF, and it has been suggested that RYK may also bind to Wnt PUBMED:10637605.\ 5443 IPR008501 \ This family consists of several eukaryotic proteins of unknown function.\ 4261 IPR002858 \ Several multicopy gene families have been described in Plasmodium falciparum, including the stevor family of subtelomeric open reading frames and the rif interspersed repetitive elements. Both families contain three predicted transmembrane segments. It has been proposed that stevor and rif are members of a larger superfamily that code for variant surface antigens PUBMED:9879895.\ 6576 IPR010626 \

    This family represents a conserved region that is found within bacterial proteins, most of which are hypothetical. Some members contain multiple copies.

    \ 2226 IPR007578 \ This is a protein of unknown function, found in herpesvirus and cytomegalovirus.\ 6976 IPR010794 \

    This family consists of several maltose operon periplasmic protein precursor (MalM) sequences. The function of this family is unknown PUBMED:1730061.

    \ 4920 IPR007391 \ Members of this family include vancomycin resistance protein W (VanW). Genes encoding members of this family have been found in vancomycin resistance gene clusters vanB PUBMED:11376048 and vanG PUBMED:11036060. The function of VanW is unknown.\ 7 IPR006683 \

    This family contains a wide variety of enzymes, principally thioesterases. It includes 4HBT, which catalyses the final step in the biosynthesis of 4-hydroxybenzoate from 4-chlorobenzoate in the soil-dwelling microbe Pseudomonas CBS-3. This family also includes various cytosolic long-chain acyl-CoA thioester hydrolases. Long-chain acyl-CoA hydrolases hydrolyse palmitoyl-CoA to CoA and palmitate; they also catalyse the hydrolysis of other long-chain fatty acyl-CoA thioesters.

    \ 4974 IPR005593 \

    This bacterial enzyme splits fructose-6-P and/or xylulose-5-P, with the aid of inorganic phosphate, into acetyl-P plus erythrose-4-P and/or acetyl-P plus glyceraldehyde-3-P, respectively PUBMED:11292814. This family is distantly related to transketolases.
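
The two cleavages can also be written as balanced equations. This is the standard phosphoketolase stoichiometry, added here for reference rather than taken from the cited paper; each reaction consumes inorganic phosphate (P_i) and releases water:

$$
\begin{aligned}
\text{D-fructose 6-phosphate} + \mathrm{P_i} &\rightarrow \text{acetyl phosphate} + \text{D-erythrose 4-phosphate} + \mathrm{H_2O} \\
\text{D-xylulose 5-phosphate} + \mathrm{P_i} &\rightarrow \text{acetyl phosphate} + \text{D-glyceraldehyde 3-phosphate} + \mathrm{H_2O}
\end{aligned}
$$

In both cases a C2 unit leaves as acetyl phosphate, so the six-carbon substrate yields a four-carbon sugar phosphate and the five-carbon substrate a three-carbon one.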

    \ 3163 IPR000784 \ This family includes the L2 minor capsid protein, a late protein from papillomaviruses.\ The papillomaviruses are dsDNA viruses with no RNA stage in their replication cycle.\ 714 IPR003719 \

    Five genes, phzF, phzA, phzB, phzC and phzD, encode enzymes for phenazine biosynthesis in the biological control bacterium Pseudomonas aureofaciens. Protein PhzF is similar to 3-deoxy-D-arabino-heptulosonate-7-phosphate synthases of solanaceous plants. PhzC is responsible for the conversion of phenazine-1-carboxylic acid to 2-hydroxyphenazine-1-carboxylic acid PUBMED:8586283.

    \ 3567 IPR007684 \ This is a viral family of phage zinc-binding transcriptional activators, which also contains cryptic members in some bacterial genomes PUBMED:1597424. The P4 phage delta protein contains two such domains attached covalently, while the P2 phage Ogr proteins possess one domain but function as dimers. All the members of this family have the following consensus sequence: C-X(2)-C-X(3)-A-X(2)-R-X(15)-C-X(4)-C-X(3)-F PUBMED:9143285.\ 6144 IPR010453 \

    This family consists of several Arenavirus RNA polymerase proteins PUBMED:2705303.

    \ 5040 IPR007363 \ This is a family of uncharacterised proteins.\ 6827 IPR009737 \

    This family contains a number of bacterial and eukaryotic proteins approximately 400 residues long that resemble ferredoxin and appear to have sucrolytic activity PUBMED:7957893.

    \ 2264 IPR002747 \ This protein family is found in Archaebacteria and Bacteria. These proteins have no known function.\ 7816 IPR012942 \

    SRR1 proteins are signalling proteins involved in regulating the circadian clock PUBMED:12533513.

    \ 3579 IPR003394 \ Pathogenic Neisseria spp. possess a repertoire of phase-variable opacity proteins that mediate various pathogen/host cell interactions PUBMED:10036728. These proteins are integral membrane proteins related to other porins and the Haemophilus influenzae OpA protein.\ 7255 IPR010885 \

    This family represents a conserved region approximately 130 residues long within a number of hypothetical archaeal proteins of unknown function. Some family members contain more than one copy of this region.

    \ 2691 IPR000657 \

    Geminiviruses are characterised by a genome of circular single-stranded DNA encapsidated in twinned (geminate) quasi-isometric particles, from which the group derives its name PUBMED:. Most geminiviruses can be divided into two subgroups on the basis of host range and/or insect vector. The whitefly-transmitted cassava latent (CLV), tomato golden mosaic (TGMV) and bean golden mosaic (BGMV) viruses possess a bipartite genome. By contrast, only a single DNA component has been identified for the leafhopper-transmitted maize streak (MSV) and wheat dwarf (WDV) viruses PUBMED:6526009, PUBMED:2829117. Beet curly top (BCTV), bean summer death and tobacco yellow dwarf viruses belong to a third possible subgroup. Sequence comparison of the whitefly-transmitted squash leaf curl PUBMED:1984668 and tomato yellow leaf curl viruses PUBMED:1840676, PUBMED:1926771 with the genomic components of TGMV and BGMV reveals a close evolutionary relationship PUBMED:1984668. Amino acid sequence alignments of potato yellow mosaic virus (PYMV) proteins with those encoded by other geminiviruses show that PYMV is closely related to geminiviruses isolated from the New World, especially in the putative coat protein gene regions PUBMED:1856690.\

    \

    Geminiviruses contain three ORFs (designated AL1, AL2, and AL3) that overlap and are specified by multiple polycistronic mRNAs. The AL3 protein comprises approximately 0.05% of the cellular proteins and is present in the soluble and organelle fractions PUBMED:8030214. AL3 may form oligomers PUBMED:8794317. Immunoprecipitation of AL3 from baculovirus expression system extracts expressing both AL1 and AL3 showed that the two proteins also form a complex with each other. The AL3 protein is involved in viral replication. \

    \ 7735 IPR012423 \

    The Saccharomyces cerevisiae member of this family is part of NuA4, the only essential histone acetyltransferase complex in Saccharomyces cerevisiae involved in global histone acetylation PUBMED:15353583.

    \ 3625 IPR006789 \ The Arp2/3 protein complex has been implicated in the control of actin polymerization. The human complex consists of seven subunits, which include the actin-related proteins Arp2 and Arp3, and five others referred to as p41-Arc, p34-Arc, p21-Arc, p20-Arc, and p16-Arc. The precise function of p16-Arc is currently unknown. Its structure consists of a single domain containing a bundle of seven alpha helices PUBMED:9230079, PUBMED:11721045.\ 5373 IPR008621 \ This family consists of several Cbb3-type cytochrome oxidase components (FixQ/CcoQ). FixQ is found in nitrogen-fixing bacteria. Since nitrogen fixation is an energy-consuming process, effective symbioses depend on operation of a respiratory chain with a high affinity for O2, closely coupled to ATP production. This requirement is fulfilled by a special three-subunit terminal oxidase (cytochrome terminal oxidase cbb3), which was first identified in Bradyrhizobium japonicum as the product of the fixNOQP operon PUBMED:11717256.\ 7890 IPR012578 \

    Proteins containing this domain are components of the nuclear pore complex PUBMED:12791264. One member is the nucleoporin POM34, which is thought to have a role in anchoring peripheral Nups into the pore and mediating pore formation PUBMED:12791264.

    \ 5603 IPR008394 \ This family consists of several AfaD and related proteins from Escherichia coli and Salmonella bacteria. The afa gene clusters encode an afimbrial adhesive sheath produced by E. coli. The adhesive sheath is composed of two proteins, AfaD and AfaE, which are independently exposed at the bacterial cell surface. AfaE is required for bacterial adhesion to HeLa cells and AfaD for the uptake of adherent bacteria into these cells PUBMED:10981717.\ 4691 IPR007350 \

    This domain corresponds to a C-terminal cysteine-rich region that probably binds a metal ion and could be DNA-binding. It is found in association with the DDE superfamily and the Tc5 transposase family.

    \ 2626 IPR002481 \

    The ferric uptake regulator (FUR) family includes metal ion uptake regulator proteins, which bind to operator DNA and control the transcription of metal ion-responsive genes.

    \ 3650 IPR007010 \

    In eukaryotes, polyadenylation of pre-mRNA plays an essential role in the initiation step of protein synthesis, as well as in the export and stability of mRNAs. Poly(A) polymerase, the enzyme at the heart of the polyadenylation machinery, is a template-independent RNA polymerase that specifically incorporates ATP at the 3' end of mRNA. The crystal structure of bovine poly(A) polymerase bound to an ATP analogue at 2.5 Å resolution has been determined PUBMED:10944102. The structure revealed expected and unexpected similarities to other proteins. As expected, the catalytic domain of poly(A) polymerase shares substantial structural homology with other nucleotidyl transferases such as DNA polymerase beta and kanamycin transferase.

    \ \

    The C-terminal domain unexpectedly folds into a compact domain reminiscent of the RNA-recognition motif fold. The three invariant aspartates of the catalytic triad ligate two of the three active site metals. One of these metals also contacts the adenine ring. Furthermore, conserved, catalytically important residues contact the nucleotide. These contacts, taken together with metal coordination of the adenine base, provide a structural basis for ATP selection by poly(A) polymerase.

    \ \ 5537 IPR008841 \ This family consists of several Siphovirus tail component proteins as well as some bacterial proteins of unknown function.\ 5321 IPR008389 \ ATP synthase subunit H is an extremely hydrophobic protein of approximately 9 kDa PUBMED:9556572. This subunit may be required for assembly of vacuolar ATPase PUBMED:9556572.\ 5955 IPR009305 \

    This family consists of several eukaryotic and prokaryotic proteins of unknown function. The yeast protein has been found to be non-essential for cell growth.

    \ 6297 IPR010508 \

    This domain is found in the neurobeachins. The function of this region is not known.

    \ 6502 IPR009560 \

    This family consists of several hypothetical bacterial proteins of around 340 residues in length. Members of this family contain six highly conserved cysteine residues. The function of this family is unknown.

    \ 7073 IPR009878 \

    This domain is found in several Phlebovirus glycoprotein G2 sequences. Members of the Bunyaviridae family acquire an envelope by budding through the lipid bilayer of the Golgi complex. The budding compartment is thought to be determined by the accumulation of the two heterodimeric membrane glycoproteins G1 and G2 in the Golgi PUBMED:9811692.

    \ 5616 IPR008723 \ This family consists of the RNA-dependent RNA polymerase protein VP1 from the Orbivirus. VP1 may have both enzymatic and structural roles in the virus life cycle PUBMED:1846500.\ 2570 IPR007047 \

    This entry is for the fimbriae associated protein Flp/Fap pilin component.

    \ 5292 IPR008771 \ This family consists of phi-29-like late gene activators (also known as early protein GP4). This protein is thought to be a positive regulator of late transcription and may function as a sigma-like component of the host RNA polymerase PUBMED:10438592.\ 813 IPR004018 \ The RPEL repeat is named after four conserved amino acids it contains. The function of the RPEL repeat is unknown; however, it might be a DNA-binding repeat, based on the observation that Q9VZY2 contains a SAP domain that is also implicated in DNA binding.\ 1465 IPR002586 \ This entry consists of various cobyrinic acid a,c-diamide synthases. These include CbiA and CbiP from Salmonella typhimurium PUBMED:7635831, and CobQ from Rhodobacter capsulatus PUBMED:8501034. These amidases catalyse amidations of various side chains of hydrogenobyrinic acid or cobyrinic acid a,c-diamide in the biosynthesis of cobalamin (vitamin B12) from uroporphyrinogen III. Vitamin B12 is an important cofactor and an essential nutrient for many plants and animals and is primarily produced by bacteria PUBMED:7635831.\ 7180 IPR009951 \

    This family consists of bacterial and phage Gam proteins. The gam gene of bacteriophage Mu encodes a protein which protects linear double stranded DNA from exonuclease degradation in vitro and in vivo PUBMED:2945162.

    \ Mu bacteriophage inserts its DNA into the genome of host bacteria and is used as a model for DNA transposition events in other systems. The eukaryotic Ku protein has key roles in DNA repair and in certain transposition events. It was shown through biochemical studies that Gam and the related protein of Haemophilus influenzae display DNA binding characteristics remarkably similar to those of human Ku PUBMED:12524520. In addition, Gam can interfere with Ty1 retrotransposition in Saccharomyces cerevisiae. These data reveal structural and functional parallels between bacteriophage Gam and eukaryotic Ku and suggest that their functions have been evolutionarily conserved PUBMED:12524520.

    \ \ 744 IPR000836 \ Members of the PRT family are catalytic and regulatory proteins involved in nucleotide synthesis and salvage. The name PRT comes from the phosphoribosyltransferase enzymes, which carry out phosphoryl transfer reactions on PRPP, an activated form of ribose-5-phosphate. This family includes a range of diverse phosphoribosyltransferase enzymes, including adenine phosphoribosyltransferase; hypoxanthine-guanine-xanthine phosphoribosyltransferase; hypoxanthine phosphoribosyltransferase; ribose-phosphate pyrophosphokinase; amidophosphoribosyltransferase; orotate phosphoribosyltransferase; uracil phosphoribosyltransferase; and xanthine-guanine phosphoribosyltransferase. Not all PRT proteins are enzymes. For example, in some bacteria PRT proteins regulate the expression of purine and pyrimidine synthetic genes. Members of the family are defined by the protein fold and by a short sequence motif that was correctly predicted to be a PRPP-binding site. Apart from this motif, different PRT proteins have a low level of sequence identity (less than 15%). The PRT sequence motif is only found in PRTases from the nucleotide synthesis and salvage pathways. Other PRTases, from the tryptophan, histidine and nicotinamide synthetic and salvage pathways, lack the PRT sequence motif and are not members of this family.\ 3635 IPR004020 \

    The PYRIN domain was identified as a putative protein-protein interaction domain at the N-terminal region of several proteins thought to function in apoptotic and inflammatory signaling pathways. Using secondary structure prediction and potential-based fold recognition methods, the PYRIN domain is predicted to be a member of the six-helix bundle death domain-fold superfamily, which includes death domains (DDs), death effector domains (DEDs) and caspase recruitment domains (CARDs). Members of the death domain-fold superfamily are well-established mediators of protein-protein interactions found in many proteins involved in apoptosis and inflammation, suggesting that PYRIN domains serve a similar function. Comparison of the circular dichroism spectrum of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1 with the spectra of several proteins known to adopt the death domain fold provides experimental support for the structure prediction PUBMED:11514682. The domain is found in interferon-inducible proteins, pyrin and myeloid cell nuclear differentiation antigen.

    \ 6316 IPR010517 \

    This family consists of several Lactococcus lactis bacteriophage major structural proteins.

    \ 2605 IPR002376 \ A number of formyltransferases belong to this group. Methionyl-tRNA formyltransferase transfers a formyl group onto the amino terminus of the acyl moiety of the methionyl aminoacyl-tRNA. The formyl group appears to play a dual role in the initiator identity of N-formylmethionyl-tRNA by promoting its recognition by IF2 and by impairing its binding to EF-Tu-GTP. Formyltetrahydrofolate dehydrogenase produces formate from formyltetrahydrofolate. This is the N-terminal domain of these enzymes and is found upstream of the C-terminal domain.\ \

    The trifunctional glycinamide ribonucleotide synthetase-aminoimidazole ribonucleotide synthetase-glycinamide ribonucleotide transformylase catalyses the second, third and fifth steps in de novo purine biosynthesis. The glycinamide ribonucleotide transformylase belongs to this group.

    \ 5076 IPR007913 \

    This family of proteins is functionally uncharacterised.

    \ 4792 IPR002493 \ The herpesvirus UL25 gene product is a virion component involved in virus penetration PUBMED:8615003 and capsid assembly. The product of the UL25 gene is required for packaging but not cleavage of replicated viral DNA PUBMED:8615003. This family includes a number of herpesvirus proteins: EHV-1 36, EBV BVRF1, HCMV UL77, ILTV ORF2, and VZV gene 34.\ 1992 IPR005325 \

    This represents a group of short repeats that occur in a limited number of membrane proteins. The region may be further divided into short repeats of around 7-10 residues with the pattern G-#-X(2)-#(2)-X, where # denotes a hydrophobic residue.
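
The pattern notation above maps directly onto a regular expression. The following is a minimal Python sketch, assuming a hypothetical hydrophobic set of A, V, L, I, M, F, W and C (the entry only says that # is hydrophobic, so the set should be adjusted to taste); `find_repeats` and `HYDROPHOBIC` are illustrative names, not part of any database tool. A zero-width lookahead is used so that overlapping candidate 7-mers are all reported:

```python
import re

# Assumed hydrophobic residue set for "#"; the source does not define it.
HYDROPHOBIC = "AVLIMFWC"

# G-#-X(2)-#(2)-X as a regex; the lookahead lets overlapping hits be found.
REPEAT_RE = re.compile(rf"(?=(G[{HYDROPHOBIC}].{{2}}[{HYDROPHOBIC}]{{2}}.))")


def find_repeats(sequence):
    """Return (1-based start, matched 7-mer) for every candidate repeat."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in REPEAT_RE.finditer(seq)]


# Made-up sequence containing two matches.
print(find_repeats("MGAKSLVEGIRTFAQ"))  # [(2, 'GAKSLVE'), (9, 'GIRTFAQ')]
```

Such a scan only flags candidates; whether a hit is a genuine repeat of this family still depends on its context in a membrane protein.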

    \ 4937 IPR007617 \ This is a family of ssRNA positive-strand viral proteins. The conserved region is found in the Beta C and Beta D transcripts.\ 1285 IPR001469 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \

    This family represents the subunits called delta and epsilon in human and other metazoan species. In bacterial species, the delta (D) subunit is equivalent to the oligomycin sensitivity-conferring protein (OSCP) of metazoans. The E. coli delta and metazoan OSCP subunits are found in the Pfam family OSCP.

    \ 393 IPR000024 \ The Frizzled CRD (cysteine-rich domain) is conserved in diverse proteins, including several receptor tyrosine kinases PUBMED:9637908, PUBMED:9684897, PUBMED:9852758. In Drosophila melanogaster, members of the Frizzled family of tissue-polarity genes encode proteins that appear to function as cell-surface receptors for Wnts. The Frizzled genes belong to the seven-transmembrane class of receptors (7TMR) and have in their extracellular region a cysteine-rich domain that has been implicated as the Wnt-binding domain. This cysteine-rich domain shares sequence similarity with several receptor tyrosine kinases that have roles in development, including the muscle-specific receptor tyrosine kinase (MuSK), the neuronal specific kinase (NSK2), and ROR1 and ROR2.\ \ The structure of this domain is known and is composed mainly of alpha helices. This domain contains ten conserved cysteines that form five disulphide bridges. These are shown in schematic form below:\
    \
            +--------------+\
            |              |\
    --C--C--C--C--C--C--C--C--C--C--\
      |  |     |  |  |  |     |  |\
      |  +-----+  |  |  +-----+  |\
      +-----------+  +-----------+\
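
Read off the schematic, the bridges pair the conserved cysteines as Cys1-Cys5, Cys2-Cys4, Cys3-Cys8, Cys6-Cys10 and Cys7-Cys9. The Python sketch below records that connectivity as data and maps it onto residue positions; `bridge_residues` is a hypothetical helper, not part of any InterPro tool, and it assumes the input contains exactly the ten conserved cysteines of one CRD and no others:

```python
# Disulphide connectivity of the Frizzled CRD, read off the schematic as
# pairs of cysteine ordinals (1 = most N-terminal conserved cysteine).
FRIZZLED_CRD_BRIDGES = ((1, 5), (2, 4), (3, 8), (6, 10), (7, 9))


def bridge_residues(sequence, bridges=FRIZZLED_CRD_BRIDGES):
    """Map cysteine-ordinal bridges to 1-based residue positions in `sequence`.

    Assumes `sequence` holds exactly the conserved cysteines of one domain;
    any extra cysteine would shift the ordinals, so a count mismatch is an error.
    """
    cys_positions = [i + 1 for i, aa in enumerate(sequence.upper()) if aa == "C"]
    expected = 2 * len(bridges)
    if len(cys_positions) != expected:
        raise ValueError(f"expected {expected} cysteines, found {len(cys_positions)}")
    return [(cys_positions[a - 1], cys_positions[b - 1]) for a, b in bridges]


# Sanity check: each of the ten cysteines takes part in exactly one bridge.
assert sorted(c for pair in FRIZZLED_CRD_BRIDGES for c in pair) == list(range(1, 11))

# The backbone line of the schematic, used as a toy input.
print(bridge_residues("--C--C--C--C--C--C--C--C--C--C--"))
# [(3, 15), (6, 12), (9, 24), (18, 30), (21, 27)]
```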
    
    \ 6419 IPR009514 \

    This family consists of several nuclear disruption (Ndd) proteins from T4-like phages. Early in a bacteriophage T4 infection, the phage ndd gene causes the rapid destruction of the structure of the Escherichia coli nucleoid. The targets of Ndd action may be the chromosomal sequences that determine the structure of the nucleoid PUBMED:9748458.

    \ 2917 IPR007611 \ This family is named after the human herpesvirus protein, but has been characterised in cytomegalovirus as UL47. Cytomegalovirus UL47 is a component of the tegument, a protein layer surrounding the viral capsid. UL47 co-precipitates with the UL48 and UL69 tegument proteins and the major capsid protein UL86. A UL47-containing complex is thought to be involved in the release of viral DNA from the disassembling virus particle PUBMED:11773380.\ 1477 IPR000254 \ The microbial degradation of cellulose and xylans requires several types of enzymes, such as endoglucanases, cellobiohydrolases (exoglucanases) or xylanases PUBMED:1886523. Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids. The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues, and it is found at either the N-terminal or the C-terminal extremity of the enzyme. As shown in the following schematic representation, there are four conserved cysteines in this type of CBD, all involved in disulphide bonds.\
    \
                             +----------------+\
                             |          +-----|---------+\
                             |          |     |         |\
                      xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx\
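
The schematic pairs the four cysteines as Cys1-Cys3 and Cys2-Cys4, so the two bridges cross rather than nest. The Python sketch below encodes that reading; `cbd_disulphides` and `bridges_interleave` are hypothetical helpers written for illustration, and the code assumes the input is an already excised 36-residue CBD containing exactly the four conserved cysteines:

```python
# Bridges of the 36-residue fungal CBD, read off the schematic as cysteine
# ordinals: Cys1-Cys3 and Cys2-Cys4.
CBD_BRIDGES = ((1, 3), (2, 4))
CBD_LENGTH = 36


def cbd_disulphides(domain):
    """Return 1-based residue pairs for the CBD disulphide bridges."""
    if len(domain) != CBD_LENGTH:
        raise ValueError(f"expected {CBD_LENGTH} residues, got {len(domain)}")
    cys = [i + 1 for i, aa in enumerate(domain.upper()) if aa == "C"]
    if len(cys) != 4:
        raise ValueError(f"expected 4 cysteines, got {len(cys)}")
    return [(cys[a - 1], cys[b - 1]) for a, b in CBD_BRIDGES]


def bridges_interleave(p, q):
    """True if two bridges cross, i.e. one starts inside the other's span."""
    (a, b), (c, d) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return a < c < b < d


# The sequence line of the schematic, with x as placeholder residues.
template = "x" * 7 + "C" + "x" * 10 + "C" + "x" * 5 + "C" + "x" * 9 + "C" + "x"
pairs = cbd_disulphides(template)
print(pairs, bridges_interleave(*pairs))  # [(8, 25), (19, 35)] True
```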
    
    \ 841 IPR001283 \

    A number of eukaryotic extracellular proteins have been shown to be evolutionarily related. The family includes rodent sperm-coating glycoprotein (or acidic epididymal glycoprotein), which is thought to be involved in sperm maturation PUBMED:1301383; mammalian testis-specific protein (Tpx-1) PUBMED:2613236; glioma pathogenesis-related protein; lizard helothermine, a toxin that blocks ryanodine receptors; venom allergen 5 from vespid wasps and venom allergen 3 from fire ants, which are potent allergens that mediate allergic reactions to the stings of insects of the order Hymenoptera PUBMED:8454859; plant pathogenesis-related proteins of the PR-1 family PUBMED:2026137, which are synthesised during pathogen infection or other stress-related responses; proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune, which are loosely associated with fruiting body hyphal walls PUBMED:8245835; ancylostoma secreted protein from dog hookworm; and yeast hypothetical proteins YJL078c, YJL079c and YKR013w. The precise functions of these proteins are still unclear.

    \ \ 5539 IPR008884 \ This family consists of bacterial macrocin O-methyltransferase (TylF) proteins. TylF is responsible for the methylation of macrocin to produce tylosin. Tylosin is a macrolide antibiotic used in veterinary medicine to treat infections caused by Gram-positive bacteria and as a growth promoter in the pig (Sus scrofa) industry. It is produced by several Streptomyces species. As with other macrolides, the antibiotic activity of tylosin is due to the inhibition of protein biosynthesis by a mechanism that involves the binding of tylosin to the ribosome, preventing the formation of the mRNA-aminoacyl-tRNA-ribosome complex PUBMED:10220165.\ 3216 IPR005619 \ The function of this presumed lipoprotein is unknown. The family includes Escherichia coli YajG.\ 4203 IPR000271 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L34 is one of the proteins of the large subunit of the prokaryotic ribosome. It is a small basic protein of 44 to 51 amino acid residues PUBMED:1461740. L34 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups together eubacterial L34, red algal chloroplast L34 and cyanelle L34.

    \ 7195 IPR009960 \

    This family consists of several fungal fruit body lectin proteins. Fruit body lectins are thought to have insecticidal activity PUBMED:12787928 and may also function in capturing nematodes PUBMED:12450118.

    \ 8 IPR008334 \

    5'-nucleotidases PUBMED:1637327 are enzymes that catalyze the hydrolysis of phosphate esterified at carbon 5' of the ribose and deoxyribose portions of nucleotide molecules. 5'-nucleotidase is a ubiquitous enzyme found in a wide variety of species, and it occurs in different cellular locations. The extracellular 5'-nucleotidase of mammals and of the electric ray is a homodimeric disulphide-bonded glycoprotein attached to the membrane by a GPI anchor, and requires zinc for its activity. Vibrio parahaemolyticus 5'-nucleotidase (gene nutA) is bound to the membrane by a lipid chain, and requires chloride and magnesium ions for its activity. It is involved in degrading extracellular 5'-nucleotides for nutritional needs.
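
The general hydrolysis can be summarised as the following reaction (standard 5'-nucleotidase stoichiometry, given here for reference). With AMP as the substrate it yields adenosine, which is the conversion attributed to CD73 in this entry:

$$
\text{5'-ribonucleotide} + \mathrm{H_2O} \rightarrow \text{ribonucleoside} + \mathrm{P_i}
\qquad\text{e.g.}\qquad
\mathrm{AMP} + \mathrm{H_2O} \rightarrow \text{adenosine} + \mathrm{P_i}
$$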

    \ \

    Periplasmic bacterial 5'-nucleotidase (gene ushA), also known as UDP-sugar hydrolase, can degrade UDP-glucose and other nucleotide diphosphate sugars. It produces sugar-1-phosphate, which can then be used by the cell. UshA seems to require cobalt for its activity. 5'-Nucleotidases are evolutionarily related to the periplasmic bacterial 2',3'-cyclic-nucleotide 2'-phosphodiesterase (gene cpdB), which catalyzes two consecutive reactions (it first converts 2',3'-cyclic nucleotides to 3'-nucleotides and then acts as a 3'-nucleotidase), and to mosquito apyrase (ATP-diphosphohydrolase) PUBMED:7846038, which catalyzes the hydrolysis of ATP into AMP and facilitates hematophagy by preventing ADP-dependent platelet aggregation in the host.

    \ \

    CD73 (also called ecto-5'-nucleotidase) possesses the enzymatic activity of a 5'-nucleotidase and catalyses the dephosphorylation of purine and pyrimidine ribo- and deoxyribonucleoside monophosphates to their corresponding nucleosides. Triggering of lymphocyte CD73 with monoclonal antibodies (mAb) causes the phosphorylation and dephosphorylation of as yet unidentified protein substrates PUBMED:9015312. A possible function of CD73 is to regulate the availability of adenosine for interaction with cell-surface adenosine receptors by converting AMP to adenosine. In common with other GPI-anchored surface proteins, CD73 can mediate costimulatory signals in T cell activation PUBMED:2550543.

    \ \

    This entry is the C-terminal domain of 5'-nucleotidases.\

    \ 2210 IPR007562 \ This is a family of uncharacterised archaeal proteins.\ 3160 IPR006827 \ Lantibiotics are antimicrobial agents derived from ribosomally synthesised peptides PUBMED:1539969. They are produced by bacteria of the phylum Firmicutes and include mutacin, subtilin and nisin. Lantibiotic peptides contain thioether bridges termed lanthionines, which are thought to be generated by dehydration of serine and threonine residues followed by addition of cysteine residues PUBMED:12127987. This family constitutes the C-terminus of the enzyme proposed to catalyse the dehydration step PUBMED:12127987, PUBMED:10215865.\ 1942 IPR004251 \

    This is a Poxvirus protein family of unknown function.

    \ 11 IPR002589 \

    This domain is found in a number of proteins associated with DNA and/or RNA unwinding.

    \ 2046 IPR007179 \

    This is a group of proteins of unknown function. It is found N-terminal to another domain of unknown function, DUF381.

    \ 6442 IPR001422 \ Neuromodulin is a component of motile growth cones. It is a membrane protein whose expression is widely correlated with successful axon elongation PUBMED:3272162. It is a crucial component of an effective regeneration response in the nervous system PUBMED:2641999. Although its function is uncertain, the N-terminal region is well conserved and contains both a calmodulin-binding domain and sites for acylation, membrane attachment and protein kinase C phosphorylation. Structure predictions suggest that the C-terminus may exist as an extended, negatively charged rod with some similarity to the side arms of neurofilaments, indicating that the biological role of neuromodulin may depend on its ability to form a dynamic membrane-cytoplasm-calmodulin complex PUBMED:2641999.\ 7306 IPR011095 \

    This entry represents the C-terminal, catalytic domain of the D-alanine--D-alanine ligase enzyme. D-Alanine is one of the central molecules of the cross-linking step of peptidoglycan assembly. There are three enzymes involved in the D-alanine branch of peptidoglycan biosynthesis: the pyridoxal phosphate-dependent D-alanine racemase (Alr), the ATP-dependent D-alanine:D-alanine ligase (Ddl) and the ATP-dependent D-alanine:D-alanine-adding enzyme (MurF) PUBMED:12499203.

    \ 7204 IPR009966 \

    This family consists of several plant specific prosystemin proteins. Prosystemin is the precursor protein of the 18 amino acid wound signal systemin which activates systemic defence in plant leaves against insect herbivores PUBMED:9484462.

    \ 1038 IPR006108 \

    3-hydroxyacyl-CoA dehydrogenase (HCDH) PUBMED:3479790 is an enzyme involved in fatty acid metabolism; it catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In peroxisomes, 3-hydroxyacyl-CoA dehydrogenase forms, with enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional enzyme in which the N-terminal domain bears the hydratase/isomerase activities and the C-terminal domain the dehydrogenase activity. There are two mitochondrial enzymes: one is monofunctional and the other is, like its peroxisomal counterpart, multifunctional.
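
Written as a reaction (standard stoichiometry for the NAD+-dependent enzyme, added here for reference), the dehydrogenase step is reversible; in beta-oxidation it runs from left to right:

$$
\text{(S)-3-hydroxyacyl-CoA} + \mathrm{NAD^+} \rightleftharpoons \text{3-oxoacyl-CoA} + \mathrm{NADH} + \mathrm{H^+}
$$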

    \

    In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA) HCDH is part of a multifunctional enzyme which also contains an ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain PUBMED:2204034.

    \

    There are two major regions of similarity in the sequences of proteins of the HCDH family: the first, located in the N-terminal region, corresponds to the NAD-binding site; the second is located in the center of the sequence. This entry represents the C-terminal domain, which is also found in lambda crystallin. Some proteins include two copies of this domain.

    \ 607 IPR004837 \ The sodium/calcium exchangers are a family of integral membrane proteins. This domain covers the integral membrane regions of these proteins. Sodium/calcium exchangers regulate intracellular Ca2+ concentrations in many cell types, including cardiac myocytes, epithelial cells, neurons, retinal rod photoreceptors and smooth muscle cells PUBMED:1700476. Ca2+ is moved into or out of the cytosol depending on the Na+ concentration PUBMED:1700476. In humans and rats there are three isoforms: NCX1, NCX2 and NCX3 PUBMED:8798769. \ 1949 IPR004335 \ Several members of this family are Borrelia burgdorferi plasmid proteins of unknown function.\ 223 IPR007301 \

    is a subunit of the terminal quinol oxidase present in the plasma membrane of Acidianus ambivalens, with a calculated molecular mass of 20.4 kDa PUBMED:15306018. Thiosulphate:quinone oxidoreductase (TQO) catalyses one of the early steps in elemental sulphur oxidation. A novel TQO enzyme was purified from the thermoacidophilic archaeon Acidianus ambivalens and shown to consist of a large subunit (DoxD) and a smaller subunit (DoxA). The DoxD-like and DoxA-like subunits are fused into a single polypeptide in .

    \ 2862 IPR002521 \ The viral core protein forms the internal viral coat that encapsidates the genomic RNA and is enveloped in a host cell-derived lipid membrane. The core protein has been shown, by yeast two-hybrid assay, to interact with cellular DEAD box helicases PUBMED:10329544. The N terminus of the core protein is involved in transcriptional repression PUBMED:10082392.\ 7138 IPR009922 \

    This family contains a number of hypothetical bacterial proteins of unknown function approximately 200 residues long.

    \ 5518 IPR008867 \ This family consists of several bacterial thiazole biosynthesis protein G sequences. ThiG, together with ThiF and ThiH, is proposed to be involved in the synthesis of 4-methyl-5-(β-hydroxyethyl)thiazole (THZ), an intermediate in the thiazole production pathway PUBMED:9371431.\ 1223 IPR002325 \ The cytochrome b6f integral membrane protein complex transfers electrons between the two reaction center complexes of oxygenic photosynthetic membranes. It also participates in formation of the transmembrane electrochemical proton gradient by transferring protons from the stromal to the internal lumen compartment PUBMED:. The cytochrome b6f complex contains four polypeptides: cytochrome f (285 aa); cytochrome b6 (215 aa); Rieske iron-sulphur protein (179 aa); and subunit IV (160 aa) PUBMED:8027021. In its structure and functions, the cytochrome b6f complex bears extensive analogy to the cytochrome bc1 complex of mitochondria and photosynthetic purple bacteria; cytochrome f (cyt f) plays a role analogous to that of cytochrome c1, in spite of their different structures PUBMED:7631417. \

    The 3D structure of turnip cyt f has been determined PUBMED:8762139. The lumen-side segment of cyt f includes two structural domains: a small one above a larger one that, in turn, is on top of the attachment to the membrane domain. The large domain consists of an anti-parallel beta-sandwich and a short haem-binding peptide, which form a three-layer structure. The small domain is inserted between beta-strands F and G of the large domain and is an all-beta domain. The haem nestles between two short helices at the N-terminus of cyt f. Within the second helix is the sequence motif for the c-type cytochromes, CxxCH (residues 21-25), which is covalently attached to the haem through thioether bonds to Cys-21 and Cys-24. His-25 is the fifth haem iron ligand. The sixth haem iron ligand is the alpha-amino group of Tyr-1 in the first helix PUBMED:8762139. Cyt f has an internal network of water molecules that may function as a proton wire PUBMED:8762139. The water chain appears to be a conserved feature of cyt f.
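    The passage above locates the haem-binding motif by residue number, and a short scan makes that numbering concrete. The sketch below uses a regular expression to search a protein sequence for CxxCH and reports 1-based positions. `FRAGMENT` is a hypothetical sequence built so that the motif falls at residues 21-25. It should not be read as a verified cyt f sequence. The function name `find_cxxch` is also illustrative.

```python
import re

# c-type cytochrome haem-binding motif: Cys, any two residues, Cys, His.
# The lookahead lets overlapping candidate motifs all be reported.
CXXCH = re.compile(r"(?=(C..CH))")


def find_cxxch(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for every CxxCH in `sequence`."""
    return [(m.start() + 1, m.group(1)) for m in CXXCH.finditer(sequence.upper())]


# Hypothetical fragment, built so that the motif occupies residues 21-25.
FRAGMENT = "YPIFAQQNYENPREATGRIVCANCHLA"

if __name__ == "__main__":
    for start, motif in find_cxxch(FRAGMENT):
        # First Cys at `start`, second Cys at start + 3, His (fifth haem ligand) at start + 4.
        print(start, motif, "Cys:", start, start + 3, "His:", start + 4)
```

    For the fragment above, this prints `21 CANCH Cys: 21 24 His: 25`, which matches the Cys-21/Cys-24/His-25 numbering in the text.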

    \ 5121 IPR007958 \

    This family contains various secreted scorpion short toxins, which seem to be unrelated to those described in .

    \ 4151 IPR003193 \

    CD38, the HUGO gene name, is also called T10 or ADP-ribosyl cyclase/cyclic ADP-ribose hydrolase (). CD38 is a novel enzyme capable of catalysing multiple reactions, including NAD glycohydrolase, ADP-ribosyl cyclase, cyclic ADP-ribose hydrolase and base-exchange activities. Two of the enzymatic products, cyclic ADP-ribose (cADPR) and nicotinic acid adenine dinucleotide phosphate (NAADP), are calcium messengers in a wide variety of cells, from protists and plants to mammals, including humans. CD38 is a positive and negative regulator of cell activation and proliferation, depending on the cellular environment. It is involved in adhesion between human lymphocytes and endothelial cells, and in the metabolism of the two calcium messengers cADPR and NAADP.
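    The four activities named above can be summarised as the reactions below. This is a compact sketch based on general biochemistry, not on the cited sources. The base-exchange line shows how NAADP arises from NADP+. Reaction conditions, such as the acidic pH usually associated with base exchange, are not shown.

```latex
\begin{aligned}
\mathrm{NAD^{+}} &\rightarrow \text{cADPR} + \text{nicotinamide} && \text{(ADP-ribosyl cyclase)}\\
\text{cADPR} + \mathrm{H_2O} &\rightarrow \text{ADP-ribose} && \text{(cADPR hydrolase)}\\
\mathrm{NAD^{+}} + \mathrm{H_2O} &\rightarrow \text{ADP-ribose} + \text{nicotinamide} && \text{(NAD glycohydrolase)}\\
\mathrm{NADP^{+}} + \text{nicotinic acid} &\rightarrow \text{NAADP} + \text{nicotinamide} && \text{(base exchange)}
\end{aligned}
```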

    \ \

    CD157 (also called BP-3/IF-7, BST-1 or Mo5) has ADP-ribosyl cyclase and cyclic ADP-ribose hydrolase activities. CD157 supports the growth of a pre-B cell line, DW34. Anti-CD157 mAb IF-7 has synergistic effects on anti-CD3-induced growth of T progenitor cells, and facilitates the development of αβ TCR+ cells in the fetal thymic organ culture system.

    \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigens nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ 4708 IPR006842 \ This is a family of putative transposases that includes the YhgA sequence from Escherichia coli () and several prokaryotic homologues.\ 1603 IPR003825 \ Colicin V production protein is required in E. coli for colicin V production from plasmid pColV-K30 PUBMED:2542219. This protein is encoded in the purF operon.\ 7582 IPR011664 \ These protein sequences, found in various bacterial species, are similar to those of Abi proteins, which are involved in bacteriophage resistance mediated by abortive infection in Lactococcus species PUBMED:8534099, PUBMED:7601848. The proteins are thought to have helix-turn-helix motifs, found in many DNA-binding proteins, allowing them to perform their function PUBMED:7601849.\ 2628 IPR004266 \ This family contains the protein with a molecular mass of 26189 Da (P26) from beet necrotic yellow vein virus (BNYVV). The function of these proteins is unknown.\ 1347 IPR006774 \ ABF1 is a sequence-specific DNA-binding protein involved in transcription activation, gene silencing and initiation of DNA replication. ABF1 is known to remodel chromatin, and it is proposed that it mediates its effects on transcription and gene expression by modifying local chromatin architecture PUBMED:11756546. These functions require a conserved stretch of 20 amino acids in the C-terminal region of ABF1 (amino acids 639 to 662 in Saccharomyces cerevisiae ()) PUBMED:11756546. The N-terminal two thirds of the protein are necessary for DNA binding, and the N terminus (amino acids 9 to 91 in S. cerevisiae) is thought to contain a novel zinc-finger motif that may stabilise the protein structure PUBMED:1594441.\ 396 IPR007246 \

    GPI transamidase is a multiprotein complex required for the terminal step of attaching the glycosylphosphatidylinositol (GPI) anchor to proteins. Gpi16, Gpi8 and Gaa1 form a sub-complex of the GPI transamidase.

    \ 2782 IPR006759 \

    Complex-type oligosaccharides are synthesised through elongation by glycosyltransferases after trimming of the precursor oligosaccharides transferred to proteins in the endoplasmic reticulum. N-Acetylglucosaminyltransferases (GnTs) take part in the formation of branches in the biosynthesis of complex-type sugar chains.

    In vertebrates, six GnTs, designated GnT-I to GnT-VI, have been identified; they catalyse the transfer of GlcNAc to the core mannose residues of Asn-linked sugar chains. GnT-IV () catalyzes the transfer of GlcNAc from UDP-GlcNAc to the GlcNAcβ1-2Manα1-3 arm of the core oligosaccharide [Gn2(2,2) core oligosaccharide] and forms a GlcNAcβ1-4(GlcNAcβ1-2)Manα1-3 structure on the core oligosaccharide [Gn3(2,4,2) core oligosaccharide]. In some members the conserved region occupies all but the extreme N terminus, where all members carry a signal sequence. In other members the conserved region does not span the entire protein but still lies towards its N-terminal end PUBMED:9278430.

    \ 4041 IPR001559 \

    Synonym(s): Paraoxonase, A-esterase, Aryltriphosphatase, Phosphotriesterase, Paraoxon hydrolase

    \

    Bacteria such as Pseudomonas diminuta harbor a plasmid that carries the gene for aryldialkylphosphatase (). This enzyme has attracted interest because of its potential use in the detoxification of chemical waste and warfare agents and its ability to degrade agricultural pesticides such as parathion. It acts specifically on synthetic organophosphate triesters and phosphorofluoridates. It does not seem to have a naturally occurring substrate and may thus have evolved optimally for utilizing paraoxon.
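    To make "utilizing paraoxon" concrete, the hydrolysis of paraoxon, the enzyme's best-characterised substrate, can be written as the reaction below. This is a summary from general enzymology rather than from the cited references. It shows only the overall stoichiometry, not the binuclear-metal mechanism.

```latex
\text{paraoxon} + \mathrm{H_2O} \rightarrow \text{diethyl phosphate} + \text{4-nitrophenol}
```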

    \

    Aryldialkylphosphatase belongs to a family PUBMED:9383406, PUBMED:9548740 of enzymes that possess a binuclear zinc metal center at their active site. The two zinc ions are coordinated by six different residues, four of which are histidines.

    \ 3457 IPR002941 \ This domain is found in DNA methylases. In prokaryotes, the major role of DNA methylation is to protect host DNA against degradation by restriction enzymes. This family contains both N-4 cytosine-specific DNA methylases and N-6 adenine-specific DNA methylases. N-4 cytosine-specific DNA methylases () PUBMED:7607512 are enzymes that specifically methylate the amino group at the C-4 position of cytosines in DNA. Such enzymes are found as components of type II restriction-modification systems in prokaryotes. They recognize a specific sequence in DNA and methylate a cytosine in that sequence, thereby protecting the DNA from cleavage by type II restriction enzymes that recognize the same sequence. N-6 adenine-specific DNA methylases () (A-Mtase) are enzymes that specifically methylate the amino group at the C-6 position of adenines in DNA. Such enzymes are found in all three types of bacterial restriction-modification systems (in type I systems the A-Mtase is the product of the hsdM gene, and in type III systems it is the product of the mod gene). All of these enzymes recognize a specific sequence in DNA and methylate an adenine in that sequence.\ 8033 IPR013161 \

    The short BssC protein (57 amino acids) has been described as the gamma-subunit of benzylsuccinate synthase from Thauera aromatica strain K172 PUBMED:9632263. TutF, identified in T. aromatica strain T1, has been described as highly similar to BssC PUBMED:10698784.

    \ 1805 IPR007459 \ The DNA polymerase III holoenzyme () is the polymerase responsible for the replication of the Escherichia coli chromosome. The holoenzyme is composed of the DNA polymerase III core, the sliding clamp, and the DnaX clamp-loading complex. The DnaX complex contains either the tau or the gamma product of the dnaX gene, complexed with delta and delta' and with chi and psi. Chi forms a 1:1 heterodimer with psi. The chi-psi complex functions by increasing the affinity of tau and gamma for delta and delta', allowing a functional clamp-loading complex to form at physiological subunit concentrations. Psi is responsible for the interaction with DnaX (gamma/tau), but psi is insoluble unless it is in a complex with chi PUBMED:7494000.\ 7033 IPR010813 \

    This family consists of several hypothetical bacterial proteins, which seem to be specific to Staphylococcus species. Members of this family are typically around 100 residues in length. The function of this family is unknown.

    \ 3954 IPR006754 \

    The 34-kDa protein encoded by the I3 gene of vaccinia virus is expressed at early and intermediate times postinfection and is phosphorylated on serine residues. I3 protein demonstrates a striking affinity for single-stranded, but not for double-stranded, DNA, which suggests a role in DNA replication and/or repair. Electrophoretic mobility shift assays indicate that numerous I3 molecules can bind to a template, reflecting the stoichiometric interaction of I3 with DNA. Sequence analysis reveals that a pattern of aromatic and charged amino acids common to many replicative single-stranded DNA binding proteins (SSBs) is conserved in I3 PUBMED:9525612.

    \ 5433 IPR008598 \ This family consists of several drought-induced 19 (Di19)-like proteins. Di19 has been found to be strongly expressed in both the roots and leaves of Arabidopsis thaliana during progressive drought PUBMED:7823904. The precise function of Di19 is unknown.\ 1534 IPR004834 \ This region is found commonly in chitin synthase classes I, II and III. Chitin, a linear homopolymer of GlcNAc residues, is an important component of the cell wall of fungi and is synthesised on the cytoplasmic surface of the cell membrane by membrane-bound chitin synthases PUBMED:7773595. \ 1224 IPR000074 \

    Exchangeable apolipoproteins (apoA, apoC and apoE) have the same genomic structure and are members of a multi-gene family that probably evolved from a common ancestral gene. This entry includes the ApoA1, ApoA4 and ApoE proteins. ApoA1 and ApoA4 are part of the APOA1/C3/A4/A5 gene cluster on chromosome 11 PUBMED:15108119. Apolipoproteins function in lipid transport as structural components of lipoprotein particles, cofactors for enzymes and ligands for cell-surface receptors. In particular, apoA1 is the major protein component of high-density lipoproteins; apoA4 is thought to act primarily in intestinal lipid absorption; and apoE is a blood plasma protein that mediates the transport and uptake of cholesterol and lipid by way of its high affinity interaction with different cellular receptors, including the low-density lipoprotein (LDL) receptor. Recent findings with apoA1 and apoE suggest that the tertiary structures of these two members of the human exchangeable apolipoprotein gene family are related PUBMED:15234552. The three-dimensional structure of the LDL receptor-binding domain of apoE indicates that the protein forms an unusually elongated four-helix bundle that may be stabilized by a tightly packed hydrophobic core that includes leucine zipper-type interactions and by numerous salt bridges on the mostly charged surface. Basic amino acids important for LDL receptor binding are clustered into a surface patch on one long helix PUBMED:2063194.

    \ 5964 IPR010369 \

    This is a family of plant proteins with unknown function.

    \ 6962 IPR010785 \

    This family consists of several hypothetical Nucleopolyhedrovirus proteins of around 375 residues in length. The function of this family is unknown.

    \ 6837 IPR010739 \

    This family consists of several bacterial proteins of around 120 residues in length. The function of this family is unknown.

    \ 7102 IPR009898 \

    This family contains a number of bacterial proteins of unknown function approximately 180 residues long. These are possibly integral membrane proteins.

    \ 4626 IPR002155 \

    Two different types of thiolase PUBMED:1755959, PUBMED:2191949, PUBMED:1354266 are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase () and 3-ketoacyl-CoA thiolase (). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved in biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis.
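    The difference in specificity between the two thiolases can be summarised by their reactions, written here in standard textbook form rather than taken from the cited references. In the first line, n is the chain length of the 3-oxoacyl-CoA substrate. Thiolase II is shown as reversible because the text describes it acting in the biosynthetic (condensation) direction.

```latex
\begin{aligned}
\text{thiolase I:}\quad & \text{3-oxoacyl-CoA}\,(C_n) + \text{CoA} \rightarrow \text{acyl-CoA}\,(C_{n-2}) + \text{acetyl-CoA}\\
\text{thiolase II:}\quad & 2\,\text{acetyl-CoA} \rightleftharpoons \text{acetoacetyl-CoA} + \text{CoA}
\end{aligned}
```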

    \ \

    In eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes.

    \ \

    There are two conserved cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the formation of an acyl-enzyme intermediate. The second, located at the C-terminal extremity, is the active-site base involved in deprotonation in the condensation reaction.

    \ \

    Mammalian nonspecific lipid-transfer protein (nsL-TP) (also known as sterol carrier protein 2) is a protein that seems to exist in two different forms: a 14 kDa protein (SCP-2) and a larger 58 kDa protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport; the latter is found in peroxisomes. The C-terminal part of SCP-x is identical to SCP-2, while the N-terminal portion is evolutionarily related to thiolases PUBMED:1755959.

    \ 5694 IPR008822 \ This family consists of several bacterial and phage Holliday junction resolvase (RusA) like proteins. The RusA protein of Escherichia coli is an endonuclease that can resolve Holliday intermediates and correct the defects in genetic recombination and DNA repair associated with inactivation of RuvAB or RuvC PUBMED:7813450.\ 144 IPR001180 \

    Based on sequence similarities a domain of homology has been identified in the following proteins PUBMED:10391936:

    \ \

    This domain, called the citron homology domain, is often found after cysteine rich and pleckstrin homology (PH) domains at the C-terminal end of the proteins PUBMED:10391936. It acts as a regulatory domain and could be involved in macromolecular interactions PUBMED:10391936, PUBMED:9135144.

    \ 3944 IPR006749 \ This family contains fowlpox virus protein E6 and its homologues. The members of this family are functionally uncharacterised PUBMED:10729156.\ 777 IPR004932 \

    RER1 family proteins are involved in the retrieval of some endoplasmic reticulum membrane proteins from the early Golgi compartment. The C terminus of yeast Rer1p interacts with a coatomer complex PUBMED:11238450.

    \ \ 6265 IPR009453 \

    The Saccharomyces cerevisiae ISN1 (YOR155c) gene encodes an IMP-specific 5'-nucleotidase, which catalyses degradation of IMP to inosine as part of the purine salvage pathway.

    \ 3227 IPR000106 \ Low molecular weight (LMW) phosphotyrosine protein phosphatase (or acid phosphatase) acts on tyrosine-phosphorylated proteins, low-MW aryl phosphates and natural and synthetic acyl phosphates PUBMED:1587862, PUBMED:1304913. It is a cytoplasmic enzyme that catalyses the reaction:\ \ The structure of the protein has been solved by X-ray crystallography PUBMED:8052313 and is found to form a single structural domain. It belongs to the alpha/beta class, with 6 alpha-helices and 4 beta-strands forming a 3-layer alpha-beta-alpha sandwich architecture.\ 7523 IPR011643 \ In Chlamydomonas reinhardtii, the gene encoding is induced by iron deficiency PUBMED:12012236. Its product complements yeast fet3fet4 or ftr1 mutations, enabling assimilation of iron. In green algae this protein is secreted, and in Chlorococcum littorale it is periplasmic.\ 1673 IPR000269 \

    Amine oxidases (AO) are enzymes that catalyze the oxidation of a wide range of biogenic amines, including many neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing () and copper-containing ().\ Copper-containing AO act as disulphide-linked homodimers. They catalyse the oxidation of primary amines to aldehydes, with the subsequent release of ammonia and hydrogen peroxide; the reaction requires one copper ion per subunit and topaquinone as cofactor PUBMED:8591028. Copper-containing amine oxidases are found in bacteria, fungi, plants and animals. In prokaryotes, the enzyme enables various amine substrates to be used as sources of carbon and nitrogen PUBMED:9048544, PUBMED:9405045. In eukaryotes they have a broader range of functions, including cell differentiation and growth, wound healing, detoxification and cell signalling PUBMED:8805580.
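    The overall reaction described above can be written out as follows, where R stands for the variable part of the primary amine substrate. This is the standard stoichiometry for copper amine oxidases. The copper ion and the topaquinone cofactor are not consumed, so they do not appear in the equation.

```latex
\mathrm{RCH_2NH_2} + \mathrm{O_2} + \mathrm{H_2O} \rightarrow \mathrm{RCHO} + \mathrm{NH_3} + \mathrm{H_2O_2}
```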

    \

    The copper amine oxidases occur as mushroom-shaped homodimers of 70-95 kDa, each monomer containing a copper ion and a covalently bound redox cofactor, topaquinone (TPQ). TPQ is formed by post-translational modification of a conserved tyrosine residue. The copper ion is coordinated with three histidine residues and two water molecules in a distorted square pyramidal geometry, and has a dual function in catalysis and TPQ biogenesis. The catalytic domain is the largest of the 3-4 domains found in copper amine oxidases, and consists of a beta sandwich of 18 strands in two sheets. The active site is buried and requires a conformational change to allow the substrate access.

    \ 4809 IPR002140 \

    A number of uncharacterized hydrophilic proteins of about 30 kDa share regions of similarity. These include,

    \ \ 7247 IPR009987 \

    This family contains the bacterial protein PilM (approximately 150 residues long). PilM is an inner membrane protein that has been predicted to function as a component of the pilin transport apparatus and thin-pilus basal body PUBMED:11751821.

    \ 7136 IPR009921 \

    This domain occurs in several hypothetical bacterial proteins of around 150 residues in length. The function of this domain is unknown.

    \ 593 IPR006737 \

    Motilin is a gastrointestinal regulatory polypeptide produced by motilin cells in the duodenal epithelium. It is released into the general circulation at about 100-min intervals during the inter-digestive state and is the most important factor in controlling the inter-digestive migrating contractions. Motilin also stimulates endogenous release from the endocrine pancreas PUBMED:9210180.

    This domain is also found in ghrelin, a growth hormone secretagogue synthesised by endocrine cells in the stomach. Ghrelin stimulates growth hormone secretagogue receptors in the pituitary. These receptors are distinct from the growth hormone-releasing hormone receptors, and thus provide a means of controlling pituitary growth hormone release by the gastrointestinal system PUBMED:11306336.

    This domain represents a peptide sequence that lies C-terminal to motilin/ghrelin () on the respective precursor peptide. Its function is unknown.

    \ 7369 IPR011488 \

    These proteins share a region of similarity that falls towards the C terminus from .

    \ 3792 IPR005497 \ PetN is a small hydrophobic protein, crucial for cytochrome b6-f complex assembly and/or stability.\ 1907 IPR003788 \

    This entry describes proteins of unknown function.

    \ 4943 IPR003365 \ This is a family of viral ORFs from various plant and animal ssDNA circoviruses. Published evidence to support the annotated function "viral replication associated protein" has not been found.\ 599 IPR003534 \ The major royal jelly proteins (MRJPs) comprise 12.5% of the mass, and 82-90% of the protein content PUBMED:9791542, of honeybee (Apis mellifera) royal jelly. Royal jelly is a substance secreted by the cephalic glands of nurse bees PUBMED:10441680, and it is used to trigger development of a queen bee from a bee larva. The biological function of the MRJPs is unknown, but they are believed to play a major role in nutrition due to their high essential amino acid content PUBMED:10380654.\

    Two royal jelly proteins, MRJP3 and MRJP5, contain a tandem repeat that gives rise to high genetic variability. This polymorphism may be useful for genotyping individual bees PUBMED:10380654.

    \ 4094 IPR002720 \ Retinoblastoma-like and retinoblastoma-associated proteins may have a function in cell cycle regulation. They form a complex with adenovirus E1A and SV40 large T antigen, and may bind and modulate the function of certain cellular proteins with which T and E1A compete for pocket binding. The proteins may act as tumor suppressors, and are potent inhibitors of E2F-mediated trans-activation. \ This domain has the cyclin fold PUBMED:8152925.\ \

    The crystal structure of the Rb pocket bound to a nine-residue E7 peptide containing the LxCxE motif, shared by other Rb-binding viral and cellular proteins, shows that the LxCxE peptide binds a highly conserved groove on the B-box portion of the pocket; the A-box portion appears to be required for the stable folding of the B box (see ). Also highly conserved is the extensive A-B interface, suggesting that it may be an additional protein-binding site. The A and B boxes each contain the cyclin-fold structural motif, with the LxCxE-binding site on the B-box cyclin fold being similar to a Cdk2-binding site of cyclin A and to a TBP-binding site of TFIIB PUBMED:9495340.
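    Motifs such as LxCxE are conventionally written in PROSITE-style notation, where `x` means "any residue". The sketch below translates a small subset of that notation into a regular expression and scans a sequence for matches. It handles only single residues, `x`, residue sets such as `[ST]`, and repeat counts such as `x(2)`. The names `prosite_to_regex` and `find_motif` are hypothetical helpers, not part of any PROSITE tool. The peptide in the example is likewise hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a small subset of PROSITE pattern syntax into a Python regex.

    Supported elements: a single residue ('L'), the wildcard 'x', a residue set
    ('[ST]') and an optional repeat count ('x(2)'). Anything else is rejected.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, count = m.groups()
        token = "." if token == "x" else token
        parts.append(token + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_motif(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every, possibly overlapping, match."""
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("L-x-C-x-E"))        # L.C.E
    print(find_motif("L-x-C-x-E", "DLYCYEQLN"))  # [(2, 'LYCYE')]
```

    A lookahead is used so that overlapping occurrences are not skipped. Real PROSITE patterns also allow exclusions (`{P}`), ranges (`x(2,4)`) and anchors, which this sketch deliberately rejects.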

    \ \

    The A and B boxes are found at the C-terminal end of the protein; the A-box is on the N-terminal side of the B-box.

    \ 4497 IPR000344 \ Animals recognise a wide variety of chemicals using their senses of taste and smell. The nematode Caenorhabditis elegans has only 14 types of chemosensory neuron, yet is able to respond to dozens of chemicals because each neuron detects several stimuli. More than 40 highly divergent transmembrane proteins that could contribute to this functional diversity have been described PUBMED:7585938. Most of the candidate receptor genes are in clusters of similar genes; 11 of these appear to be expressed in small subsets of chemosensory neurons. A single type of neuron can potentially express at least 4 different receptor genes PUBMED:7585938. Some of these might encode receptors for water-soluble attractants, repellents and pheromones, which may be divergent members of the G-protein-coupled receptor family PUBMED:7585938.\ Sequences of the Sra family of C. elegans receptor-like proteins contain 6-7 hydrophobic, putative transmembrane regions. These can be distinguished from other 7TM proteins (especially those known to couple G-proteins) by their own characteristic TM signatures.\ 1820 IPR004947 \ Deoxyribonuclease II () hydrolyses DNA under acidic conditions with a preference for double-stranded DNA. It catalyses the endonucleolytic cleavage of DNA to 3'-phosphomononucleotide and 3'-phosphooligonucleotide end-products. The enzyme may play a role in apoptosis.\ This family also includes hypothetical proteins from Caenorhabditis elegans.\ 516 IPR013129 \ Jumonji protein is required for neural tube formation in mice PUBMED:7758946. There is evidence of domain swapping within the jumonji family of transcription factors PUBMED:10838566. This domain is often associated with jmjN (see ) and belongs to the Cupin superfamily PUBMED:10838566.\ 3795 IPR004258 \ Severe Plasmodium falciparum malaria is characterized by excessive sequestration of infected and uninfected erythrocytes in the microvasculature of the affected organ. 
Rosetting, the adhesion of P. falciparum-infected erythrocytes to uninfected erythrocytes, is a virulent parasite phenotype associated with the occurrence of severe malaria PUBMED:9419207. The adhesive ligand P. falciparum erythrocyte membrane protein 1 (PfEMP1) is a rosetting protein that contains clusters of glycosaminoglycan-binding motifs.\ 100 IPR001487 \ Bromodomains are found in a variety of mammalian, invertebrate and yeast DNA-binding proteins PUBMED:1350857. Bromodomains can interact with acetylated lysine PUBMED:9175470.\ In some proteins, the classical bromodomain has diverged to such an extent that parts of the region are either missing or contain an insertion (e.g., mammalian protein HRX, Caenorhabditis elegans hypothetical protein ZK783.4, yeast protein YTA7). The bromodomain may occur as a single copy, or in duplicate.\

    The precise function of the domain is unclear, but it may be involved in protein-protein interactions and may play a role in assembly or activity of multi-component complexes involved in transcriptional activation PUBMED:7580139.

    \ 6668 IPR009652 \

    This family consists of several programmed cell death 10 protein (PDCD10 or TFAR15) sequences. The function of this family is unknown.

    \ 606 IPR005148 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
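    The two class lists above form a simple lookup table keyed by the amino acid each enzyme charges. The sketch below encodes them with one-letter codes and checks that they partition the 20 standard amino acids. It follows the canonical assignment given in the text. Known exceptions, such as the class I lysyl-tRNA synthetase found in some archaea, are not represented. The 2'/3' hydroxyl preference is left out because it is a class-level tendency with exceptions, not a per-enzyme rule. `synthetase_class` is an illustrative name.

```python
# Class assignment of the 20 standard aminoacyl-tRNA synthetases, keyed by the
# one-letter code of the amino acid each enzyme charges (canonical assignment only).
CLASS_I = frozenset("RCEQILMYWV")   # Arg, Cys, Glu, Gln, Ile, Leu, Met, Tyr, Trp, Val
CLASS_II = frozenset("ANDGHKFPST")  # Ala, Asn, Asp, Gly, His, Lys, Phe, Pro, Ser, Thr


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for a one-letter amino acid code."""
    code = amino_acid.upper()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise ValueError(f"not one of the 20 standard amino acids: {amino_acid!r}")


if __name__ == "__main__":
    # Sanity checks: ten enzymes per class, no overlap, twenty in total.
    assert len(CLASS_I) == len(CLASS_II) == 10
    assert CLASS_I.isdisjoint(CLASS_II)
    assert len(CLASS_I | CLASS_II) == 20
    print(synthetase_class("R"), synthetase_class("k"))  # I II
```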

    \ \

    The 10 class I synthetases are considered to share a catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    This domain is found at the N terminus of arginyl-tRNA synthetase and is also called additional domain 1 (Add-1). It is about 140 residues long and has been suggested to be involved in tRNA recognition PUBMED:9736621.

    \ 4550 IPR000436 \

    Sushi domains, also known as complement control protein (CCP) modules or short consensus repeats (SCR), exist in a wide variety of complement and adhesion proteins. The structure of this domain is known. It is based on a beta-sandwich arrangement: one face is made up of three beta-strands hydrogen-bonded to form a triple-stranded region at its centre, and the other face is formed from two separate beta-strands PUBMED:1829116.

    \ \

    CD21 (also called C3d receptor, CR2, Epstein Barr virus receptor or EBV-R) is the receptor for EBV and for C3d, C3dg and iC3b. Complement components may activate B cells through CD21. CD21 is part of a large signal-transduction complex that also involves CD19, CD81, and Leu13.

    \ \

    Some of the proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook Academic Press, London / San Diego, (1997)]. Complement decay-accelerating factor (Antigen CD55) belongs to the Cromer blood group system and is associated with Cr(a), Dr(a), Es(a), Tc(a/b/c), Wd(a), WES(a/b), IFC and UMC antigens. Complement receptor type 1 (C3b/C4b receptor) (Antigen CD35) belongs to the Knops blood group system and is associated with Kn(a/b), McC(a), Sl(a) and Yk(a) antigens.

    \ \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigens nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ 7860 IPR013115 \

    Members containing this domain are involved in histidine biosynthesis.

    \ 3158 IPR002000 \

    Lysosome-associated membrane glycoproteins (lamp) PUBMED:1939168 are integral membrane proteins, specific to lysosomes, and whose exact biological function is not yet clear. Structurally, the lamp proteins consist of two internally homologous lysosome-luminal domains separated by a proline-rich hinge region; at the C-terminal extremity there is a transmembrane region (TM) followed by a very short cytoplasmic tail (C). In each of the duplicated domains, there are two conserved disulphide bonds. This structure is schematically represented in the figure below.

       +-----+            +-----+         +-----+            +-----+
       |     |            |     |         |     |            |     |
      xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx
      +--------------------------++Hinge++--------------------------++TM++C+

    In mammals, there are two closely related types of lamp: lamp-1 and lamp-2. In chicken, lamp-1 is known as LEP100.

    CD68 (also called gp110 or macrosialin) PUBMED:8486654 is a heavily glycosylated integral membrane protein whose structure consists of a mucin-like domain followed by a proline-rich hinge, a single lamp-like domain, a transmembrane region and a short cytoplasmic tail.

    CD molecules are leucocyte antigens on cell surfaces. The CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm

    617 IPR006989

    Nab1 and Nab2 are co-repressors that specifically interact with and repress transcription mediated by the three members of the NGFI-A (Egr-1, Krox24, zif/268) family of eukaryotic (metazoan) transcription factors PUBMED:9418898. This family represents NAB conserved region 2, near the C terminus of the protein. This region is necessary for transcriptional repression by the Nab proteins PUBMED:9418898. It is also required for transcriptional activation by Nab proteins at Nab-activated promoters PUBMED:10734128.

    4070 IPR011576

    Pyridoxamine 5'-phosphate oxidase (PNPOx; ) is an FMN flavoprotein that catalyses the oxidation of pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to pyridoxal-5-P (PLP). This reaction serves as the terminal step in the de novo biosynthesis of PLP in Escherichia coli and as part of the salvage pathway of this coenzyme in both E. coli and mammalian cells PUBMED:12686112, PUBMED:12824491. The binding sites for FMN and for substrate have been highly conserved throughout evolution.

    This FMN-binding domain is present in pyridoxamine 5'-phosphate oxidases and also in a number of proteins that have not been demonstrated to have enzymatic activity.

    6928 IPR009794

    This family consists of several hypothetical bacterial proteins of around 125 residues in length. The function of this family is unknown.

    6412 IPR010561

    DIRP (Domain in Rb-related Pathway) is encoded by multiple eukaryotic genomes and is postulated to be involved in the Rb-related pathway. It is present in proteins including lin-9 of Caenorhabditis elegans, aly of Drosophila melanogaster and proteins of mustard weed. Studies of the DIRP-containing proteins lin-9 and aly suggest that this domain might be involved in development. Aly and lin-9 act in parallel to, or downstream of, activation of MAPK by the RTK-Ras signalling pathway.

    4082 IPR007268

    Rad9 is required for transient cell-cycle arrests and transcriptional induction of DNA repair in response to DNA damage.

    2645 IPR000071

    Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes PUBMED:9657938. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, their primary sequences can be very different.

    This entry represents matrix proteins from immunodeficiency lentiviruses, such as human and simian immunodeficiency viruses (HIV and SIV, respectively) PUBMED:12465460. The structure of the HIV-1 matrix protein consists of five alpha helices, a short 3-10 helix and a three-stranded mixed beta-sheet PUBMED:7966331.

    6423 IPR009517

    This family consists of several Borna disease virus (BDV) P24 proteins. The function of this family is unknown.

    4199 IPR000054

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    A number of eukaryotic and archaebacterial large subunit ribosomal proteins can be grouped on the basis of sequence similarities. These proteins have 87 to 128 amino-acid residues. This family consists of:

  • Yeast L34
  • Archaeal L31 PUBMED:2207169
  • Plant L31
  • Mammalian L31 PUBMED:3816785

    6761 IPR009701

    This family consists of several special lobe-specific silk protein SSP160 sequences which appear to be specific to Chironomus (Midge) species.

    2440 IPR007499

    The DNA single-strand annealing proteins (SSAPs), such as RecT, Red-beta, ERF and Rad52, function in RecA-dependent and RecA-independent DNA recombination pathways. This family includes proteins related to ERF PUBMED:11914131.

    6519 IPR009574

    This family consists of several hypothetical bacterial proteins of around 260 residues in length. The function of this family is unknown.

    7169 IPR009941

    This family consists of several hypothetical bacterial proteins of around 150 residues in length. Members of this family seem to be found exclusively in Borrelia burgdorferi (Lyme disease spirochete). The function of this family is unknown.

    3067 IPR002354

    Cytokines are protein messengers that carry information from cell to cell PUBMED:8151703. Interleukin-4 (IL4) is one such molecule and participates in several B-cell activation processes: it enhances production and secretion of IgG1 and IgE PUBMED:3083412; it induces expression of class II major histocompatibility complex (MHC) molecules on resting B-cells; and it regulates expression of the low-affinity Fc receptor for IgE on lymphocytes and monocytes. IL4 has a compact, globular fold (similar to other cytokines), stabilised by 3 disulphide bonds PUBMED:1993171. One half of the structure is dominated by a 4 alpha-helix bundle with a left-handed twist PUBMED:1400355. The helices are anti-parallel, with 2 overhand connections, which fall into a 2-stranded anti-parallel beta-sheet PUBMED:1400355.

    7234 IPR010876

    This family consists of several eukaryotic NICE-3 and related proteins. The gene coding for NICE-3 is part of the epidermal differentiation complex (EDC), which comprises a large number of genes that are of crucial importance for the maturation of the human epidermis PUBMED:11230159. The function of NICE-3 is unknown.

    7115 IPR010839

    This family consists of several bacterial and plant proteins of around 400 residues in length. The function of this family is unknown.

    732 IPR002509

    This domain is found in polysaccharide deacetylases. This family of polysaccharide deacetylases includes NodB (nodulation protein B from Rhizobium), which is a chitooligosaccharide deacetylase PUBMED:9163424. It also includes chitin deacetylase from yeast PUBMED:9133736, and endoxylanases, which hydrolyse glycosidic bonds in xylan PUBMED:8170399.

    7915 IPR012592

    The PROCN domain is the central domain in pre-mRNA splicing factors of the PRO8 family PUBMED:15112237.

    3697 IPR001478

    PDZ domains are found in diverse signaling proteins in bacteria, yeasts, plants, insects and vertebrates PUBMED:9041651, PUBMED:9204764. PDZ domains can occur in one or multiple copies and are nearly always found in cytoplasmic proteins. They bind either the carboxyl-terminal sequences of proteins or internal peptide sequences PUBMED:9204764. In most cases, interaction between a PDZ domain and its target is constitutive, with a binding affinity of 1 to 10 µM. However, agonist-dependent activation of cell surface receptors is sometimes required to promote interaction with a PDZ protein. PDZ domain proteins are frequently associated with the plasma membrane, a compartment where high concentrations of phosphatidylinositol 4,5-bisphosphate (PIP2) are found. Direct interaction between PIP2 and a subset of class II PDZ domains (syntenin, CASK, Tiam-1) has been demonstrated.

    PDZ domains consist of 80 to 90 amino acids comprising six beta-strands (betaA to betaF) and two alpha-helices, A and B, compactly arranged in a globular structure. Peptide binding of the ligand takes place in an elongated surface groove as an antiparallel beta-strand interacts with the betaB strand and the B helix. The structure of PDZ domains allows binding to a free carboxylate group at the end of a peptide through a carboxylate-binding loop between the betaA and betaB strands.

    5339 IPR008387

    Coupling factor 6 (F6) is a component of mitochondrial ATP synthase which is required for the interactions of the catalytic and proton-translocating segments PUBMED:1825642.

    1666 IPR005534

    CsgG is an outer membrane-located lipoprotein that is highly resistant to protease digestion. During the assembly of curli, which are adhesive surface fibres, CsgG is required to maintain the stability of CsgA and CsgB PUBMED:9383186.

    3215 IPR006876

    This group of uncharacterised proteins has a conserved C-terminal region which is found in LMBR1 and in the lipocalin-1 receptor. LMBR1 was thought to play a role in preaxial polydactyly, but more recent evidence suggests that this is not the case PUBMED:12032320.

    6840 IPR009742

    This entry represents a bacterial repeated motif of around 30 residues in length. These repeats are often found in multiple copies in the curlin proteins CsgA and CsgB. Curli fibres are thin aggregative surface fibres, associated with adhesion, which bind laminin, fibronectin, plasminogen, human contact phase proteins, and major histocompatibility complex (MHC) class I molecules. Curli fibres are encoded by the csg gene cluster, which comprises two divergently transcribed operons. One operon encodes the csgB, csgA, and csgC genes, while the other encodes csgD, csgE, csgF, and csgG. The assembly of the fibres is unique and involves extracellular self-assembly of the curlin subunit (CsgA), dependent on a specific nucleator protein (CsgB). CsgD is a transcriptional activator essential for expression of the two curli fibre operons, and CsgG is an outer membrane lipoprotein involved in extracellular stabilisation of CsgA and CsgB PUBMED:11254632.

    976 IPR006925

    This protein forms part of the Class C vacuolar protein sorting (Vps) complex. Vps16 is required for vacuolar protein sorting, which is essential for viability in plants but not in yeast PUBMED:11702788. The Class C Vps complex is required for SNARE-mediated membrane fusion at the lysosome-like yeast vacuole. It is thought to play essential roles in membrane docking and fusion at the Golgi-to-endosome and endosome-to-vacuole stages of transport PUBMED:11422941. The role of Vps16 in this complex is not known.

    6522 IPR010604

    This family consists of several plant-specific nuclear matrix protein 1 (NMP1) sequences. Nuclear Matrix Protein 1 is a ubiquitously expressed 36 kDa protein which has no homologues in animals and fungi but is highly conserved among flowering and non-flowering plants. NMP1 is located in both the cytoplasm and the nucleus, and the nuclear fraction is associated with the nuclear matrix. NMP1 is a candidate for a plant-specific structural protein with a function in both the nucleus and the cytoplasm PUBMED:12654864.

    2598 IPR002666

    The reduced folate carrier (a transmembrane glycoprotein) transports reduced folate into mammalian cells via the carrier-mediated mechanism (as opposed to the receptor-mediated mechanism). It also transports cytotoxic folate analogues used in chemotherapy, such as methotrexate (MTX) PUBMED:9161403. Mammalian cells have an absolute requirement for exogenous folates, which are needed for growth and for the biosynthesis of macromolecules PUBMED:9161403.

    5596 IPR008554

    This family consists of Poxvirus proteins with similarity to glutaredoxin 2, as well as related bacterial sequences from Leuconostoc mesenteroides that are annotated as MesC proteins. MesC is a protein of unknown function which forms part of the mesentericin operon.

    574 IPR000055

    This domain is also known as the target recognition domain (TRD). Restriction-modification (R-M) systems protect a bacterial cell against invasion of foreign DNA by endonucleolytic cleavage of DNA that lacks a site-specific modification. The host genome is protected from cleavage by methylation of specific nucleotides in the target sites.

    In type I systems, both restriction and modification activities are present in one heteromeric enzyme complex composed of one DNA specificity subunit (this family), two modification (M) subunits and two restriction (R) subunits PUBMED:9837717. Most of the proteins in this family have two copies of the domain.
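    The protection logic described above (DNA carrying an unmodified recognition site is restricted, while methylated host sites are spared) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a model of any real enzyme: the recognition sequence `GCAT` is an arbitrary placeholder, the function `restriction_targets` is hypothetical, and methylation is represented simply as a set of protected site start positions. Where and how the actual cut is made is not modelled.

```python
def restriction_targets(dna: str, site: str, methylated_starts: set[int]) -> list[int]:
    """Return 0-based start positions of recognition sites left unprotected.

    Toy model (hypothetical): an occurrence of `site` is protected if its
    start index appears in `methylated_starts`; every other occurrence is
    reported as a target for restriction. The cleavage position itself is
    not modelled.
    """
    targets = []
    start = dna.find(site)
    while start != -1:
        if start not in methylated_starts:
            targets.append(start)
        # Resume one base later so overlapping occurrences are also found.
        start = dna.find(site, start + 1)
    return targets


if __name__ == "__main__":
    seq = "TTGCATAAGCATCC"  # synthetic sequence with two copies of the placeholder site
    # "Host" DNA: both sites methylated, so nothing is restricted.
    print(restriction_targets(seq, "GCAT", {2, 8}))  # []
    # "Foreign" DNA: no methylation, so both sites are targets.
    print(restriction_targets(seq, "GCAT", set()))  # [2, 8]
```

    The same sequence is treated differently depending only on its modification state, which is the point of the R-M scheme: the specificity (recognition) step is shared, and methylation decides the outcome.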

    3046 IPR000898

    Indoleamine 2,3-dioxygenase (IDO, ) PUBMED:1907934 is a cytosolic haem protein which, together with the hepatic enzyme tryptophan 2,3-dioxygenase, catalyzes the conversion of tryptophan and other indole derivatives to kynurenines. The physiological role of IDO is not fully understood but is of great interest, because IDO is widely distributed in human tissues, can be up-regulated via cytokines such as interferon-gamma, and can thereby modulate the levels of tryptophan, which is vital for cell growth. The degradative action of IDO on tryptophan leads to cell death by starvation of this essential and relatively scarce amino acid. IDO is a haem-containing enzyme of about 400 amino acids. Site-directed mutagenesis showed His346 () to be essential for haem binding, indicating that this histidine residue may be the proximal ligand. Mutation of Asp274 also compromised the ability of IDO to bind haem, suggesting that Asp274 may coordinate to haem directly as the distal ligand or may be essential in maintaining the conformation of the haem pocket PUBMED:12766158.

    Other proteins that are evolutionarily related to IDO include yeast hypothetical protein YJR078w and myoglobin from the red muscle of the archaeogastropod molluscs Haliotis madaka and Sulculus diversicolor PUBMED:8011076, PUBMED:12711393. These unusual globins lack enzymatic activity but have kept the haem group.

    1781 IPR007060

    DivIC from Bacillus subtilis is necessary for both vegetative and sporulation septum formation PUBMED:8113187. These proteins are mainly composed of an N-terminal coiled-coil.

    7424 IPR011453

    This is a large family of paralogous proteins apparently unique to planctomycetes.

    4142 IPR004902

    This is a family of Rhabdovirus nucleocapsid proteins. These proteins undergo phosphorylation.

    4436 IPR007857

    The human homologue of Schizosaccharomyces pombe Skb1 (Shk1 kinase-binding protein 1) is a protein methyltransferase PUBMED:10531356. These proteins seem to play a role in Jak signalling.

    3180 IPR001220

    Legume lectins are one of the largest lectin families, with more than 70 lectins reported. Leguminous plant lectins resemble each other in their physicochemical properties, although they differ in their carbohydrate specificities. They consist of two or four subunits, each with a relative molecular mass of 30 kDa and one carbohydrate-binding site. The interaction with sugars requires tightly bound calcium and manganese ions. The structural similarities of these lectins have been revealed by primary structure analyses and X-ray crystallographic studies. X-ray studies have shown that the folding of the polypeptide chains in the region of the carbohydrate-binding sites is also similar, despite differences in the primary sequences. The carbohydrate-binding sites of these lectins consist of two conserved amino acids on beta pleated sheets. One of these loops contains the metal ions calcium and manganese, which keep the amino acid residues of the sugar-binding site at the required positions. The amino acid sequence of this loop plays an important role in the carbohydrate-binding specificities of these lectins. These lectins bind either glucose/mannose or galactose.

    The exact function of legume lectins is not known, but they may be involved in the attachment of nitrogen-fixing bacteria to legumes and in the protection against pathogens.

    Some legume lectins are proteolytically processed to produce two chains, beta (N-terminal) and alpha (C-terminal)(). The lectin concanavalin A (conA) from jack bean is exceptional in that the two chains are transposed and ligated (by formation of a new peptide bond). The N-terminus of mature conA thus corresponds to that of the alpha chain and the C-terminus to the beta chain.

    811 IPR001247

    This domain is found in 3'-5' exoribonucleases, including ribonuclease PH, which contains a single copy of the domain and removes nucleotide residues following the -CCA terminus of tRNA, and polyribonucleotide nucleotidyltransferase (PNPase), which contains two tandem copies of the domain and is involved in mRNA degradation in the 3'-5' direction. PNPase is part of the RNA degradosome, a multi-enzyme complex important in RNA processing and messenger RNA degradation. In yeast, these proteins are components of the exosome 3'-5' exoribonuclease complex, which is required for 3' processing of the 5.8S rRNA PUBMED:9390555.

    6777 IPR009711

    This family consists of several hypothetical bacterial proteins of around 90 residues in length. The function of this family is unknown.

    3894 IPR004031

    Several vertebrate small integral membrane glycoproteins are evolutionarily related PUBMED:7499407, PUBMED:7499420, PUBMED:8996089, including eye lens specific membrane protein 20 (MP20 or MP19); epithelial membrane protein-1 (EMP-1), which is also known as tumor-associated membrane protein (TMP) or as squamous cell-specific protein Cl-20; epithelial membrane protein-2 (EMP-2), which is also known as XMP; epithelial membrane protein-3 (EMP-3), also known as YMP; and peripheral myelin protein 22 (PMP-22), which is expressed in many tissues but mainly by Schwann cells as a component of myelin of the peripheral nervous system (PNS). PMP-22 probably plays a role both in myelinization and in cell proliferation. Mutations affecting PMP-22 are associated with hereditary motor and sensory neuropathies such as Charcot-Marie-Tooth disease type 1A (CMT-1A) in humans or the trembler phenotype in mice. The proteins of this family are about 160 to 173 amino acid residues in size and contain four transmembrane segments. PMP-22, EMP-1, -2 and -3 are highly similar, while MP20 is more distantly related. This family also includes the claudins, which are components of tight junctions.

    1203 IPR005039

    Prophages P1 and P7 exist as unit-copy DNA plasmids in the bacterial cell. Maintenance of the prophage state requires the continuous expression of two repressors: (i) C1, a protein which negatively regulates the expression of lytic genes including the C1 inactivator gene coi, and (ii) C4, an antisense RNA which specifically inhibits the synthesis of an anti-repressor, Ant.

    1173 IPR002548

    Alphaviruses are enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Semliki Forest and Sindbis viruses PUBMED:15378043. Alphaviruses consist of three structural proteins: the core nucleocapsid protein C, and the envelope proteins P62 and E1 that associate as a heterodimer. The viral membrane-anchored surface glycoproteins are responsible for receptor recognition and entry into target cells through membrane fusion. The proteolytic maturation of P62 into E2 () and E3 () causes a change in the viral surface. Together the E1, E2, and sometimes E3, glycoprotein "spikes" form an E1/E2 dimer or an E1/E2/E3 trimer, where E2 extends from the centre to the vertices, E1 fills the space between the vertices, and E3, if present, is at the distal end of the spike PUBMED:8107141. Upon exposure of the virus to the acidity of the endosome, E1 dissociates from E2 to form an E1 homotrimer, which is necessary for the fusion step to drive the cellular and viral membranes together. The alphaviral glycoprotein E1 is a class II viral fusion protein, which is structurally different from the class I fusion proteins found in influenza virus and HIV. The structure of the Semliki Forest virus E1 glycoprotein is similar to that of flaviviral glycoprotein E, with three structural domains in the same primary sequence arrangement PUBMED:11301009. This entry represents all three domains of the alphaviral E1 glycoprotein.

    3845 IPR001200

    The outer and inner segments of vertebrate rod photoreceptor cells contain phosducin, a soluble phosphoprotein that complexes with the beta/gamma-subunits of the GTP-binding protein transducin. Light-induced changes in cyclic nucleotide levels modulate the phosphorylation of phosducin by protein kinase A PUBMED:2203790. The protein is thought to participate in the regulation of visual phototransduction or in the integration of photoreceptor metabolism. Similar proteins have been isolated from the pineal gland, and it is believed that the functional role of the protein is the same in both retina and pineal gland PUBMED:2210381.

    8043 IPR013171

    This region contains the zinc-binding domain of cytidine and deoxycytidylate deaminase.

    Cytidine deaminase () (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia while deoxycytidylate deaminase () (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes are known to bind zinc and to require it for their catalytic activity PUBMED:1567863, PUBMED:8428902. These two enzymes do not share any sequence similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought to be involved in the binding of the catalytic zinc ion.

    7726 IPR012874

    This family contains hypothetical proteins of unknown function found in Methanosarcina acetivorans and Methanosarcina mazei.

    6048 IPR013085

    Zinc finger domains PUBMED:3125980, PUBMED: are nucleic acid-binding protein structures first identified in the Xenopus laevis transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues including 2 conserved Cys and 2 conserved His residues in a C-2-C-12-H-3-H type motif. The 12 residues separating the second Cys and the first His are mainly polar and basic, implicating this region in particular in nucleic acid binding. The zinc finger motif is an unusually small, self-folding domain in which Zn is a crucial component of its tertiary structure. All bind 1 atom of Zn in a tetrahedral array to yield a finger-like projection, which interacts with nucleotides in the major groove of the nucleic acid. The Zn binds to the conserved Cys and His residues. Fingers have been found to bind to about 5 base pairs of nucleic acid containing short runs of guanine residues. They have the ability to bind to both RNA and DNA, a versatility not demonstrated by the helix-turn-helix motif. The zinc finger may thus represent the original nucleic acid binding protein. It has also been suggested that a Zn-centred domain could be used in a protein interaction, e.g. in protein kinase C. Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines, while the second pair are histidines.
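    The C-2-C-12-H-3-H spacing described above can be turned into a simple sequence scan. The sketch below is a literal, fixed-spacing reading of that motif; the pattern and the helper name `find_c2h2_motifs` are illustrative assumptions, not a published PROSITE or InterPro signature. Genuine C2H2 fingers tolerate some variation in these spacings, so a real search would need a more permissive pattern or a profile model.

```python
import re

# Hypothetical pattern: a literal reading of the C-2-C-12-H-3-H spacing
# (two residues between the Cys pair, twelve between the second Cys and the
# first His, three between the two His). The lookahead wrapper lets
# overlapping candidates be reported as separate hits.
C2H2_PATTERN = re.compile(r"(?=(C.{2}C.{12}H.{3}H))")


def find_c2h2_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each C-x2-C-x12-H-x3-H hit."""
    seq = sequence.upper()  # accept lower-case one-letter codes as well
    return [(m.start(), m.group(1)) for m in C2H2_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Synthetic test sequence (not a real protein): 'M', the motif, then 'G'.
    demo = "M" + "CAAC" + "A" * 12 + "HAAAH" + "G"
    for start, segment in find_c2h2_motifs(demo):
        print(start, segment)
```

    Using a zero-width lookahead is a deliberate design choice: without it, `finditer` would resume scanning after the end of each match and could miss a second candidate that begins inside the first.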

    This domain is found in several U1 small nuclear ribonucleoprotein C (U1-C) proteins. The U1 small nuclear ribonucleoprotein (U1 snRNP) binds to the pre-mRNA 5' splice site (ss) at early stages of spliceosome assembly. Recruitment of U1 to a class of weak 5' ss is promoted by binding of the protein TIA-1 to uridine-rich sequences immediately downstream from the 5' ss. Binding of TIA-1 in the vicinity of a 5' ss helps to stabilise U1 snRNP recruitment, at least in part, via a direct interaction with U1-C, thus providing one molecular mechanism for the function of this splicing regulator PUBMED:12486009. This domain is probably a zinc-binding motif which is found in several copies in some proteins.

    2397 IPR002735

    This domain is found in the N-terminus of eIF-5 and the C-terminus of eIF-2 beta. This region corresponds to the whole of the archaebacterial eIF-2 beta homolog. It contains a putative zinc binding C4 finger.

    1106 IPR004985

    Adenoviruses have evolved multiple mechanisms to evade the host immune response. Several of the immunomodulatory proteins are encoded in early transcription unit 3 (E3).

    845 IPR006900

    COPII (coat protein complex II)-coated vesicles carry proteins from the endoplasmic reticulum (ER) to the Golgi complex PUBMED:11535824. COPII-coated vesicles form on the ER by the stepwise recruitment of three cytosolic components: Sar1-GTP to initiate coat formation, Sec23/24 heterodimer to select SNARE and cargo molecules, and Sec13/31 to induce coat polymerisation and membrane deformation PUBMED:12239560.

    Sec23p and Sec24p are structurally related, folding into five distinct domains: a beta-barrel, a zinc-finger (), an alpha/beta trunk domain (), an all-helical region, and a C-terminal gelsolin-like domain (). This entry describes the all-helical domain, which forms an approximately 105-residue segment with the C-terminal 30 residues. The linker between alpha-M and alpha-N contacts Sar1.

    3387 IPR006656

    This domain is found in a number of molybdopterin-containing oxidoreductases and in tungsten formylmethanofuran dehydrogenase subunit D (FwdD) and molybdenum formylmethanofuran dehydrogenase subunit D (FmdD), where a single domain constitutes almost the entire subunit. Formylmethanofuran dehydrogenase catalyses the first step in methane formation from CO2 in methanogenic archaea and has a molybdopterin dinucleotide cofactor PUBMED:9818358.

    7262 IPR010888

    This family consists of several minor pilin proteins, including CblD from Burkholderia cepacia, which is known to be the initiator of pilus biogenesis PUBMED:12686638. The family also contains a variety of Enterobacterial minor pilin proteins.

    785 IPR002670

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    Ribosomal protein L18ae forms part of the 60S ribosomal subunit PUBMED:1840484. This family is found in eukaryotes. Rat ribosomal protein L18 is homologous to Xenopus laevis L14 PUBMED:3371159.

    2082 IPR007336

    This family includes several bacterial proteins of unknown function, although at least one member () is a putative coproporphyrinogen III oxidase.

    2790 IPR001296

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Proteins containing this domain transfer UDP-, ADP-, GDP- or CMP-linked sugars to a variety of \ substrates, including glycogen, fructose-6-phosphate and lipopolysaccharides. The \ bacterial enzymes are involved in various biosynthetic processes that include\ exopolysaccharide biosynthesis, lipopolysaccharide core biosynthesis and the biosynthesis\ of the slime polysaccharide colanic acid. Mutations in this domain of the human\ N-acetylglucosaminyl-phosphatidylinositol biosynthetic protein are the cause of \ paroxysmal nocturnal hemoglobinuria (PNH), an acquired hemolytic blood disorder\ characterized by venous thrombosis, erythrocyte hemolysis, infections and defective \ hematopoiesis.

    \ 8096 IPR013155 \

    This domain is found in valyl and leucyl tRNA synthetases. It binds to the anticodon of the tRNA.

    \ 1626 IPR003377 \

    The Drosophila cornichon protein (gene: cni) PUBMED:7540118 is required in the germline\ for dorsal-ventral signaling. Dorsal-ventral pattern formation involves a\ reorganization of the microtubule network correlated with the movement of the\ oocyte nucleus, and depends on the correct initial establishment of the\ anterior-posterior axis via a signal from the oocyte produced by cornichon\ and gurken and received by the torpedo protein in the follicle cells. The\ biochemical function of the cornichon protein is currently not known. It is a protein of 144 residues that seems to contain three transmembrane\ regions.

    \ 3492 IPR001860 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 34 comprises enzymes with only one known activity: sialidase or neuraminidase ().

    \ \

    Neuraminidases cleave the terminal sialic acid residues from carbohydrate chains in glycoproteins. Sialic acid is a negatively charged sugar associated with the protein and lipid portions of lipoproteins. In influenza virus, neuraminidases prevent self-aggregation by removing the carbohydrate from the viral envelope, thus facilitating the mobility of the virus to and from the site of infection.\ Antiviral agents that inhibit influenza viral neuraminidase activity are of major importance in the control of influenza PUBMED:10623375.

    \ 7955 IPR013110 \

    The DOT1 domain regulates gene expression by methylating histone H3 PUBMED:15292170. H3 methylation by DOT1 has been shown to be required for the DNA damage checkpoint in yeast PUBMED:15632126.

    \ 2855 IPR002637 \

    This family contains the Saccharomyces cerevisiae HAM1 protein and other hypothetical archaeal, bacterial and Caenorhabditis elegans proteins. \ Saccharomyces cerevisiae HAM1 protects against the mutagenic effects of the base analog 6-N-hydroxylaminopurine (HAP), which can be a natural product of monooxygenase activity on adenine. HAM1 protein protects the cell from HAP, either at the level of deoxynucleoside triphosphates or at the DNA level, by a yet unidentified set of reactions PUBMED:8789257.

    \ 6315 IPR010516 \

    This family consists of several eukaryotic Sin3 associated polypeptide p18 (SAP18) sequences. SAP18 is known to be a component of the Sin3-containing complex, which is responsible for the repression of transcription via the modification of histone polypeptides PUBMED:9150135. SAP18 is also present in the ASAP complex which is thought to be involved in the regulation of splicing during the execution of programmed cell death PUBMED:12665594.

    \ 359 IPR005101 \

    Deoxyribodipyrimidine photolyase (DNA photolyase) is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and,\ upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring\ joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that\ requires two chromophore cofactors for its activity: a reduced FADH2 and\ either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized 8-hydroxy-5-deazaflavin\ (8-HDF) derivative (F420). The folate or deazaflavin chromophore\ appears to function as an antenna, while the FADH2 chromophore is thought to\ be responsible for electron transfer. On the basis of sequence similarities,\ DNA photolyases can be grouped into two classes. The first class contains\ enzymes from Gram-negative and Gram-positive bacteria, the halophilic\ archaebacterium Halobacterium halobium, fungi and plants. Class 1 enzymes bind\ either 5,10-MTHF (E. coli, fungi, etc.) or 8-HDF (S. griseus, H. halobium).
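    As a rough guide to the energy involved, the energy of a single photon across the quoted 300 to 500 nm window follows from the standard relation E = hc/λ. The value hc ≈ 1239.84 eV·nm is a physical constant supplied here and does not come from the cited work:

```latex
\begin{aligned}
E &= \frac{hc}{\lambda}, \qquad hc \approx 1239.84\ \text{eV nm} \\
E(300\ \text{nm}) &\approx \frac{1239.84}{300}\ \text{eV} \approx 4.13\ \text{eV} \\
E(500\ \text{nm}) &\approx \frac{1239.84}{500}\ \text{eV} \approx 2.48\ \text{eV}
\end{aligned}
```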

    \ \

    Proteins containing this domain also include Arabidopsis cryptochromes 1 (CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue light-induced gene\ expression.

    \ \ \ 1974 IPR005098 \ This domain is found in a number of worm proteins and has no known function. The boundaries of the presumed domain are rather uncertain.\ 3077 IPR003446 \ This protein is plasmid encoded and found to be essential for plasmid replication, and is involved in copy control functions PUBMED:3041379.\ 2903 IPR004996 \

    This is a family of proteins expressed by members of the Herpesviridae.

    \ 3738 IPR001818 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
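    The HEXXH and abXHEbbHbc definitions above translate naturally into regular expressions. The Python sketch below is illustrative only. Several choices in it are assumptions rather than parts of any published pattern: the residue sets used for "uncharged" and "hydrophobic", the restriction of 'a' to V or T, and the function name `scan`. A motif hit on its own does not show that a protein is a metalloprotease.

```python
import re

# Assumed residue classes (one-letter codes). The text only says "uncharged"
# and "hydrophobic"; these particular sets are illustrative choices.
UNCHARGED = "ACFGILMNQSTVWY"    # excludes D, E, K, R, H and proline
HYDROPHOBIC = "ACFILMVWY"
NON_PROLINE = "ACDEFGHIKLMNQRSTVWY"

# Loose motif: HEXXH with no proline in the two X positions.
# The zero-width lookahead "(?=(...))" lets overlapping matches be reported.
LOOSE = re.compile(r"(?=(HE[^P]{2}H))")

# Stricter motif abXHEbbHbc: 'a' limited to V or T (the residues the text
# calls most common), 'b' uncharged, X any non-proline, 'c' hydrophobic.
STRICT = re.compile(
    rf"(?=([VT][{UNCHARGED}][{NON_PROLINE}]HE[{UNCHARGED}]{{2}}H"
    rf"[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def scan(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return 0-based start positions and matched text for both motifs."""
    seq = sequence.upper()
    return {
        "loose": [(m.start(), m.group(1)) for m in LOOSE.finditer(seq)],
        "strict": [(m.start(), m.group(1)) for m in STRICT.finditer(seq)],
    }
```

For example, `scan("GGVVAHELTHAVGG")` (a made-up sequence) reports the loose hit `HELTH` at position 5 and the stricter hit `VVAHELTHAV` at position 2. The lookaheads ensure that overlapping candidates are not skipped.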

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
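    The naming rule described above, in which the first character of a family or clan identifier gives the catalytic type, can be sketched as a small lookup. This is a hypothetical helper written for illustration. The function name `describe` and the returned strings are assumptions. Inhibitor identifiers such as I9 follow a separate scheme, so this sketch rejects them.

```python
# Catalytic-type letters as listed in the text. "P" is only used for clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Nucleophile the text associates with each catalytic type.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}   # form an acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}        # activated water molecule


def describe(identifier: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for a code such as 'M29' or 'MA'."""
    code = identifier.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"not a peptidase family/clan code: {identifier!r}")
    if code in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        # 'U' (unknown) and 'P' (mixed clans) say nothing about the nucleophile.
        nucleophile = "not determined by the type letter"
    return CATALYTIC_TYPES[code], nucleophile
```

`describe("M29")` returns `("metallo", "activated water")`. By contrast, `describe("U35")` reports that the nucleophile cannot be inferred from the letter alone.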

    \ \

    This group of metallopeptidases belong to the MEROPS peptidase families:

    \

    \ \

    The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA.

    Sequences having this domain are extracellular metalloproteases, such as collagenase and stromelysin, which\ degrade the extracellular matrix and are known as matrixins. They are zinc-dependent,\ calcium-activated proteases synthesised as inactive precursors\ (zymogens), which are proteolytically cleaved to yield the active enzyme\ PUBMED:2551898, PUBMED:2167841. All matrixins and related proteins possess 2 domains: an N-terminal\ domain and a zinc-binding active site domain. The N-terminal domain\ peptide, cleaved during the activation step, includes a conserved PRCGVPDV\ octapeptide, known as the cysteine switch, whose Cys residue chelates the\ active site zinc atom, rendering the enzyme inactive. The active enzyme\ degrades components of the extracellular matrix, playing a role in the\ initial steps of tissue remodelling during morphogenesis, wound healing,\ angiogenesis and tumour invasion PUBMED:2551898, PUBMED:2167841.
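    The cysteine switch lends itself to a simple sequence scan. The sketch below is written for illustration. It slides an eight-residue window along a sequence and reports windows that keep the switch cysteine and differ from PRCGVPDV at no more than `max_mismatches` other positions. The text gives only the consensus octapeptide, so the mismatch tolerance and the function name `find_cysteine_switch` are assumptions.

```python
MOTIF = "PRCGVPDV"              # consensus cysteine-switch octapeptide from the text
CYS_OFFSET = MOTIF.index("C")   # position of the zinc-chelating Cys within the motif (2)


def find_cysteine_switch(sequence: str, max_mismatches: int = 1) -> list[tuple[int, str, int]]:
    """Return (Cys index, window, mismatches) for candidate cysteine switches.

    Windows must keep the Cys; the other positions may differ from the
    consensus at up to `max_mismatches` places (an assumed tolerance).
    """
    seq = sequence.upper()
    width = len(MOTIF)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window[CYS_OFFSET] != "C":
            continue  # without the Cys there is nothing to chelate the zinc
        mismatches = sum(a != b for a, b in zip(window, MOTIF))
        if mismatches <= max_mismatches:
            hits.append((start + CYS_OFFSET, window, mismatches))
    return hits
```

Each hit is reported as the 0-based index of the Cys, the matched window and the number of mismatches. For instance, the made-up sequence `"MAAPRCGVPDVGG"` yields `[(5, "PRCGVPDV", 0)]`.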

    \ \ 1305 IPR007309 \ Yeast transcription factor IIIC (TFIIIC) is a multisubunit protein complex that interacts with two control elements of class III promoters called the A and B blocks. This family represents the subunit within TFIIIC involved in B-block binding PUBMED:1279682.\ 4430 IPR004693 \ Marine diatoms such as Cylindrotheca fusiformis encode at least six silicon transport protein homologues which exhibit similar size and topology. One characterized member of the family (Sit1) functions in the energy-dependent uptake of either silicic acid [Si(OH)4] or silicate [Si(OH)3O-] by a Na+ symport mechanism. The system is found in marine diatoms which make their "glass houses" out of silicon.\ 4620 IPR002922 \ This family includes a putative thiamine biosynthetic enzyme PUBMED:7961415. This enzyme is involved in the biosynthesis of the thiamine precursor thiazole, and is repressed by thiamine.\ 5750 IPR010259 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    Propeptides, also known as inhibitor or activation peptides, are responsible for the modulation of folding and activity of the pro-enzyme (zymogen). The pro-segment docks into the enzyme moiety shielding the substrate binding site, thereby promoting inhibition of the enzyme. Several such propeptides share a similar topology, despite often low sequence identities. The propeptide region has an open-sandwich antiparallel-alpha/antiparallel-beta fold, with two alpha-helices and four beta-strands with a (beta/alpha/beta)x2 topology. \

    \ \

    This group of sequences contains the propeptide domain at the N terminus of peptidases belonging to MEROPS family S8A, subtilisins. A number of the members of this group of sequences belong to MEROPS inhibitor family I9, clan I-. The propeptide is removed by proteolytic cleavage, which activates the enzyme.

    \ 7466 IPR011517 \

    The bacterial core RNA polymerase complex, which consists of five subunits, is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme PUBMED:3052291. RNA polymerase recruits alternative sigma factors as a means of switching on specific regulons. Most bacteria express a multiplicity of sigma factors. Two of these factors, Sigma70 (gene rpoD; major sigma factor) and Sigma54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes.

    \

    This family represents a group of sigma factors that are able to regulate extracytoplasmic function (ECF) PUBMED:12073657. Eubacteria display considerable genetic diversity among ECF-sigma factors, but all retain two features: the ability to respond to extra-cytoplasmic functions, and regulation by anti-sigma and anti-anti-sigma factors PUBMED:15374527. This family shows sequence similarity to and .

    \ \ 61 IPR006773 \

    This is a family of eukaryotic proteins which are believed to be involved in cell adhesion. Members are involved in gastrulation and also in metastasis formation and the progression of cancer. Experimental evidence suggests that these proteins are transmembrane proteins and possibly glycoproteins PUBMED:10610020, PUBMED:10919708.

    \ 6447 IPR009530 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 925 IPR007373 \ Thiamin pyrophosphokinase (TPK, ) catalyzes the transfer of a pyrophosphate group from ATP to vitamin B1 (thiamin) to form the coenzyme thiamin pyrophosphate (TPP). Thus, TPK is important for the formation of a coenzyme required for central metabolic functions. The structure of thiamin pyrophosphokinase suggests that the enzyme may operate by a mechanism of pyrophosphoryl transfer similar to those described for pyrophosphokinases functioning in nucleotide biosynthesis PUBMED:11435118.\ 2650 IPR001079 \ Animal lectins display a wide variety of architectures.\ They are classified according to the carbohydrate-recognition\ domain (CRD), of which there are two main types, S-type and C-type.\

    Galectins (previously S-lectins) bind exclusively to beta-galactosides such as lactose. They do not require metal ions for activity.\ Galectins are found predominantly, but not exclusively, in mammals PUBMED:8124704. Their function is unclear. They are developmentally regulated and may be involved in differentiation, cellular regulation and tissue\ construction.

    \ 4201 IPR002677 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of the elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
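    The elongation cycle described above can be sketched as a loop: an aminoacyl-tRNA enters the A site, peptidyl transfer follows, and then translocation. This is a deliberately simplified toy model, not a description of real ribosome kinetics. The following are all assumptions made for illustration: the tiny codon table, starting the chain with Met, treating UAA as "stop", and the function name `elongate`. The model leaves out GTP hydrolysis and the exit sites.

```python
from typing import Optional

# Toy codon table: a tiny subset of the standard genetic code (assumption).
# None marks a stop codon, for which no aminoacyl-tRNA is delivered.
CODONS: dict[str, Optional[str]] = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "UAA": None,
}


def elongate(mrna: str) -> list[str]:
    """Toy elongation cycle; returns the residues on the final peptidyl-tRNA."""
    mrna = mrna.upper()
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    if not codons or codons[0] != "AUG":
        raise ValueError("toy model expects an AUG start codon")
    p_site = [CODONS["AUG"]]        # initiating peptidyl-tRNA sits in the P site
    for codon in codons[1:]:
        amino_acid = CODONS[codon]
        if amino_acid is None:      # stop codon: elongation ends
            break
        a_site = [amino_acid]       # 1. aminoacyl-tRNA enters the A site (EF-Tu/GTP)
        a_site = p_site + a_site    # 2. peptidyl transfer: chain moves onto the A-site tRNA
        p_site = a_site             # 3. translocation (EF-G/GTP); deacylated tRNA leaves
    return p_site
```

`elongate("AUGUUUGGCAAAUAA")` yields `['Met', 'Phe', 'Gly', 'Lys']`. Each pass through the loop extends the peptidyl-tRNA by exactly one residue, as the text describes.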

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L32p is part of the 50S ribosomal subunit. This family is found in both prokaryotes and eukaryotes. Ribosomal protein L32 of yeast binds to and regulates the splicing and the translation of the transcript of its own gene PUBMED:9121443.

    \ 4957 IPR007143 \

    Vacuolar protein sorting-associated protein VPS28 is required for normal endocytic and biosynthetic traffic to the yeast vacuole. It may facilitate the formation of transport intermediates required for efficient transport out of the prevacuolar endosome.

    \ 2447 IPR001171 \ The two fungal enzymes, C-14 sterol reductase (gene ERG24 in budding yeast and erg3 in\ Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1\ in fission yeast), are involved in ergosterol biosynthesis. They act by reducing\ double bonds in precursors of ergosterol PUBMED:8125337.\ These proteins are highly hydrophobic and seem to contain seven or eight transmembrane\ regions. Chicken lamin B receptor, which is thought to anchor the lamina to the inner\ nuclear membrane, also belongs to this family.\ 1132 IPR000043 \ S-adenosyl-L-homocysteine hydrolase () (AdoHcyase) is an enzyme of\ the activated methyl cycle, responsible for the reversible hydration of \ S-adenosyl-L-homocysteine into adenosine and homocysteine. AdoHcyase is a\ ubiquitous enzyme that binds and requires NAD+ as a cofactor.\ AdoHcyase is a highly conserved protein PUBMED:1631127 of about 430 to 470 amino acids.\ The family contains a glycine-rich region in the central part of AdoHcyase, a region thought to be\ involved in NAD binding.\ 119 IPR006433 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    This group of peptidases belong to MEROPS peptidase family U35 (clan U-). This family contains the prohead protease from HK97 and related phage and prophage. It is generally encoded next to the gene for the capsid protein that it processes, and in some cases may be fused to it. This family does not show similarity to the prohead protease of phage T4 ().

    \ 6738 IPR009685 \

    This family consists of several mammalian male enhanced antigen 1 (MEA1) proteins. The Mea-1 gene is found to be localised in primary and secondary spermatocytes and spermatids, but the protein products are detected only in spermatids. Intensive transcription of the Mea-1 gene and specific localisation of the gene product suggest that Mea-1 may play an important role in the late stage of spermatogenesis PUBMED:8907304.

    \ 3745 IPR000787 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belong to MEROPS peptidase family M29 (aminopeptidase T family, clan M-). The protein fold of the peptidase domain and the active site residues are not known for any members of the thermophilic metallo-aminopeptidases family.

    \ \ 6816 IPR009727 \

    This family consists of several NifT and FixU bacterial proteins. The function of NifT is unknown. The protein may be involved in biosynthesis of the FeMo cofactor of nitrogenase, although perturbation of nifT expression in K. pneumoniae has only a limited effect on nitrogen fixation PUBMED:9139910.

    \ 7528 IPR011624 \ This entry represents the extracellular domain of the 7TM-HD (7TM Receptors with HD hydrolase) PUBMED:12914674.\ 4078 IPR006909 \ This family represents a conserved C-terminal region found in eukaryotic cohesins of the Rad21, Rec8 and Scc1 families. Rad21/Rec8 like proteins mediate sister chromatid cohesion during mitosis and meiosis, as part of the cohesin complex PUBMED:11687503. Cohesion is necessary for homologous recombination (including double-strand break repair) and correct chromatid segregation. These proteins may also be involved in chromosome condensation. Dissociation at the metaphase to anaphase transition causes loss of cohesion and chromatid segregation PUBMED:10207075.\ 3532 IPR001046 \

    The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and yeast proteins Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins \ defined by a conserved hydrophobic core of ten transmembrane domains PUBMED:7479731. Nramp1 is an integral membrane protein expressed exclusively in cells of \ the immune system and is recruited to the membrane of a phagosome upon \ phagocytosis. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+\ amongst others. It is expressed at high levels in the intestine and is the \ major transferrin-independent iron uptake system in mammals PUBMED:9719491. The yeast proteins Smf1 and Smf2 may also transport divalent cations PUBMED:9632246.

    \ \

    The natural resistance of mice to infection with intracellular parasites is\ controlled by the Bcg locus, which modulates the cytostatic/cytocidal\ activity of phagocytes. Nramp1, the gene responsible, is expressed exclusively in\ macrophages and polymorphonuclear leukocytes, and encodes a polypeptide\ (natural resistance-associated macrophage protein) with features typical of integral\ membrane proteins. Other transporter proteins from a variety of sources also belong\ to this family.

    \ \ 166 IPR002423 \

    The assembly of proteins has been thought to be the sole result of properties inherent in the primary sequence of polypeptides themselves. In some cases, however, structural information from other protein molecules is required for correct folding and subsequent assembly into oligomers PUBMED:2897629. These 'helper' molecules are referred to as molecular chaperones, a subfamily of which are the chaperonins PUBMED:1349837, which include 10 kDa and 60 kDa proteins. These are found in abundance in prokaryotes, chloroplasts and mitochondria. They are required for normal cell growth (as demonstrated by the fact that no temperature sensitive mutants for the chaperonin genes can be found in the temperature range 20 to 43 degrees centigrade PUBMED:2897629), and are stress-induced, acting to stabilise or protect disassembled polypeptides under heat-shock conditions PUBMED:1349837.

    \

    The 10 kDa chaperonin (cpn10, or GroES in bacteria) exists as a ring-shaped oligomer of 6 to 8 identical subunits, whereas the 60 kDa chaperonin (cpn60, or GroEL in bacteria) forms a structure comprising 2 stacked rings, each ring containing 7 identical subunits PUBMED:2897629. These ring structures assemble by self-stimulation in the presence of Mg2+-ATP. The cpn10 and cpn60 oligomers also require Mg2+-ATP in order to interact to form a functional complex, although the mechanism of this interaction is as yet unknown PUBMED:1350777. This chaperonin complex is essential for the correct folding and assembly of polypeptides into oligomeric structures, of which the chaperonins themselves are not a part PUBMED:1349837. The binding of cpn10 to cpn60 inhibits the weak ATPase activity of cpn60.
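    The ring stoichiometry above implies approximate oligomer sizes. This arithmetic takes the nominal 10 and 60 kDa subunit masses in the names at face value. Real subunit masses differ somewhat from these rounded labels, so the totals are only approximate:

```latex
\begin{aligned}
\text{cpn60 (GroEL)}: &\quad 2\ \text{rings} \times 7\ \text{subunits} = 14\ \text{subunits},
  \qquad 14 \times 60\ \text{kDa} \approx 840\ \text{kDa} \\
\text{cpn10 (GroES)}: &\quad (6\ \text{to}\ 8)\ \text{subunits} \times 10\ \text{kDa} \approx 60\ \text{to}\ 80\ \text{kDa}
\end{aligned}
```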

    \

    The 60 kDa form of chaperonin is the immunodominant antigen of patients with Legionnaires' disease PUBMED:1672279, and is thought to play a role in the protection of the Legionella bacteria from oxygen radicals within macrophages. This hypothesis is based on the finding that the cpn60 gene is upregulated in response to hydrogen peroxide, a source of oxygen radicals. Cpn60 has also been found to display strong antigenicity in many bacterial species PUBMED:1347461, and has the potential for inducing immune protection against unrelated bacterial infections. The RuBisCO subunit binding protein (which has been implicated in the assembly of RuBisCO) and cpn60 have been found to be evolutionary homologues, the RuBisCO subunit binding protein having the C-terminal Gly-Gly-Met repeat found in all bacterial cpn60 sequences. Although the precise function of this repeat is unknown, it is thought to be important as it is also found in 70 kDa heat-shock proteins PUBMED:1672279. The crystal structure of Escherichia coli GroEL has been solved at 2.8 Å resolution PUBMED:7935790. The TCP-1 family of proteins acts as molecular chaperones for tubulin, actin and probably some other proteins. They are weakly, but significantly, related to the cpn60/groEL chaperonin family.

    \ 4711 IPR002514 \ Transposase proteins are necessary for efficient DNA transposition.\ This family consists of various Escherichia coli insertion elements and other\ bacterial transposases some of which are members of the IS3 family.\ This region includes a helix-turn-helix motif (HTH) at the N terminus\ followed by a leucine zipper (LZ) motif. The LZ motif has been shown\ to mediate oligomerisation of the transposase components in IS911 PUBMED:9761671.\ 1022 IPR001507 \

    A large domain, containing around 260 amino acids, has been recognised in a variety of receptor-like eukaryotic glycoproteins PUBMED:1313375. All of these proteins are mosaic proteins composed of various domains, and all have a large extracellular region followed by either a transmembrane region and a very short cytoplasmic region or by a GPI-anchor. The domain common to all these proteins is located in the C-terminal portion of the extracellular region, and contains 8 conserved Cys residues, which are probably involved in disulphide bond formation.

    \ \

    CD105 (also called endoglin) is the regulatory component of the TGF-beta receptor complex. It is a modulator of cellular responses to TGF-beta 1.

    \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ 4841 IPR003509 \ The function of this family is unknown. Members include several bacterial hypothetical proteins.\ 5758 IPR010260 \

    This family consists of several short bacterial and phage proteins, which are related to the Escherichia coli protein AlpA. AlpA suppresses two phenotypes of a delta lon protease mutant, overproduction of capsular polysaccharide and sensitivity to UV light PUBMED:7511582. Several of the sequences in this family are thought to be DNA-binding proteins.

    \ 772 IPR007850 \ Proteins containing this region include Caenorhabditis elegans UNC-89. This region is found repeated in UNC-89 and shows conservation of\ prolines, lysines and glutamic acids. Proteins with RCSD are involved in muscle M-line assembly, but the function of the RCSD region itself is\ not clear. \ 3849 IPR000224 \ This protein is found in ssRNA negative-strand rhabdoviruses. It is\ known as the phosphoprotein or P protein PUBMED:9375014,\ PUBMED:9343167. This protein may be part of the RNA-dependent\ RNA polymerase complex PUBMED:9375014. The\ phosphorylation states of this protein may regulate the transcription\ and replication complexes PUBMED:9343167.\ 7365 IPR011426 \

    This family includes CamS (), from which Staphylococcus aureus sex pheromone staph-cAM373 is processed. It also includes a number of uncharacterised bacterial proteins.

    \ 2305 IPR007744 \ This family includes several proteins of unknown function and seems to be specific to Caenorhabditis elegans.\ 2720 IPR006148 \ This entry contains 6-phosphogluconolactonase (), glucosamine-6-phosphate isomerase (), and galactosamine-6-phosphate isomerase. 6-phosphogluconolactonase is the enzyme responsible for the hydrolysis of 6-phosphogluconolactone to 6-phosphogluconate, the second step in the pentose phosphate pathway. Glucosamine-6-phosphate isomerase (or glucosamine-6-phosphate deaminase) is the enzyme responsible for the conversion of D-glucosamine 6-phosphate into D-fructose 6-phosphate PUBMED:8747459. This is the last specific step in the pathway for N-acetylglucosamine (GlcNAc) utilization in bacteria such as Escherichia coli (gene nagB) or in fungi such as Candida albicans (gene NAG1).\ A region located in the central part of glucosamine-6-phosphate isomerase contains a conserved histidine that has been shown, in nagB, to be important for the pyranose ring-opening step of the catalytic mechanism PUBMED:8747459.\ 3130 IPR003461 \ Keratins are a well-known group of intermediate filament proteins. Like actin filaments, keratins are flexible but provide a firm cell skeleton. Unlike actin, however, no known keratins are associated with motor functions. This family represents avian keratin proteins PUBMED:6200321, found in feathers, scales and claws. The avian keratins (F-ker, S-ker, C-ker and B-ker) are a complex mixture of very similar polypeptides.\ 3257 IPR003451 \

    Terpenes are among the largest groups of natural products and include compounds such as vitamins, cholesterol and carotenoids. The biosynthesis of all terpenoids begins with one or\ both of the two C5 precursors of the pathway: isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). In\ animals, fungi, and certain bacteria, the synthesis of IPP and DMAPP occurs via the well-known mevalonate pathway; however, a second, nonmevalonate terpenoid pathway has been identified in many eubacteria, algae and the chloroplasts of higher plants PUBMED:11004185.

    LytB (IspH) catalyses the conversion of 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate into IPP and DMAPP in this second pathway. The enzyme appears to be responsible for a branch-step in the nonmevalonate pathway, in that IPP and DMAPP are produced in parallel from a single precursor, although the exact mechanism of this is not currently fully understood PUBMED:11818558. Escherichia coli LytB protein has been found to regulate the activity of RelA (guanosine 3',5'-bispyrophosphate synthetase I), which in turn controls the level of a regulatory metabolite. It is involved in penicillin tolerance and the stringent response PUBMED:9537400.

    \ 7302 IPR006605 \

    Basement membranes are sheet-like extracellular matrices found at the basal surfaces of epithelia and condensed mesenchyma. By preventing cell mixing and providing a cell-adhesive substrate, they play crucial roles in tissue development and function. Basement membranes are composed of an evolutionarily ancient set of large glycoproteins, which includes members of the laminin family, collagen IV, perlecan and nidogen/entactin. Nidogen/entactin is an important basement membrane component, which promotes cell attachment, neutrophil chemotaxis, trophoblast outgrowth and angiogenesis. It consists of three globular regions, G1-G3. G1 and G2 are connected by a thread-like segment, whereas the connection between G2 and G3 is rod-like PUBMED:9633511, PUBMED:11427896.

    \ \

    The nidogen G2 region binds to collagen IV and perlecan. The nidogen G2 structure is composed of two domains, an N-terminal EGF-like domain and a much larger beta-barrel domain of ~230 residues. The nidogen G2 beta-barrel consists of an 11-stranded beta-barrel of complex topology, the interior of which is traversed by the hydrophobic, predominantly alpha-helical segment connecting strands C and D. The N-terminal half of the barrel comprises two beta-meanders (strands A-C and D-F) linked by the buried alpha-helical segment. The polypeptide chain then crosses the bottom of the barrel and forms a five-stranded Greek key motif in the C-terminal half of the domain. Helix alpha3 caps the top of the barrel and forms the interface to the EGF-like domain. The nidogen G2 beta-barrel domain has unexpected structural similarity to green fluorescent protein, suggesting that the two derive from a common ancestor. A large patch on the barrel surface is strikingly conserved in all metazoan nidogens. Site-directed mutagenesis demonstrates that the residues in this conserved patch are involved in the binding of perlecan, and possibly also of collagen IV PUBMED:11427896.

    \ 1444 IPR002568 \

    This family of carlavirus nucleic acid binding proteins includes a motif for a potential C-4 type zinc finger; this motif has four highly conserved cysteine residues and is a conserved feature of the carlavirus 3'-terminal ORF PUBMED:2265707. These proteins may function as viral transcriptional regulators. The carlaviruses include garlic latent virus, potato virus S and potato virus M; these are positive-strand ssRNA viruses with no DNA stage.

    \ 4388 IPR003335 \

    Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component PUBMED:2202721. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome.

    \

    The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) PUBMED:2202721. The chaperone protein SecB PUBMED:11336818 is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion PUBMED:10418149. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, driven by both the proton motive force and ATP hydrolysis, the latter mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains PUBMED:12167867. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices. \

    \

    This family consists of various prokaryotic SecD and SecF protein export membrane proteins. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. Unlike the SecD subunit of the preprotein translocase of Escherichia coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps PUBMED:9694879. Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters PUBMED:9694879.

    \ 1027 IPR007881 \ This family contains several eukaryotic transmembrane proteins which are related to the Caenorhabditis elegans protein UNC-50. A mammalian homologue, UNCL, is a novel inner nuclear membrane protein that associates with RNA and is involved in the cell-surface expression of neuronal nicotinic receptors. UNCL appears to play a broader role, because UNCL homologues are present in two yeast species and a plant species, none of which express nicotinic receptors, and UNCL is also found in tissues that lack nicotinic receptors.\ 624 IPR003765 \ The nitrate-reducing system, nitrate reductase, is stimulated by anaerobiosis, nitrate and nitrite. The delta subunit is not part of the nitrate reductase enzyme but is most likely needed for assembly of the multisubunit enzyme complex. In the absence of the delta subunit, the core alpha-beta enzyme complex is unstable PUBMED:9738886. The delta subunit is essential for enzyme activity in vivo and in vitro.\ 5091 IPR007928 \

    Antifreeze proteins (AFPs) are a class of proteins that are able to bind to and inhibit the growth of macromolecular ice, thereby permitting an organism to survive subzero temperatures by decreasing the probability of ice nucleation in their bodies PUBMED:15291806. These proteins have been characterized from a variety of organisms, including fish, plants, bacteria, fungi and arthropods. This entry represents insect AFPs of the type found in spruce budworm, Choristoneura fumiferana.

    \

    The structure of these AFPs consists of a left-handed beta-helix with 15 residues per coil PUBMED:12015145. The beta-helices of insect AFPs present a highly rigid array of threonine residues and bound water molecules that can effectively mimic the ice lattice. As such, beta-helical AFPs provide more effective coverage of the ice surface than the alpha-helical fish AFPs.

    \

    A second insect antifreeze protein, from Tenebrio molitor, also consists of beta-helices; however, in these proteins the helices form a right-handed twist. These proteins show no sequence homology to the current entry, but may act by a similar mechanism. The beta-helix may be used as an AFP structural motif in non-homologous proteins from other (non-fish) organisms as well.

    \ \ 2074 IPR005272 \

    These small proteins are approximately 100 amino acids in length and appear to be found only in gamma proteobacteria. The function of this protein family is unknown.

    \ \ \ \ 6251 IPR009447 \

    Glycosylphosphatidylinositol (GPI) anchoring is a conserved post-translational modification that attaches cell surface proteins to the plasma membrane in eukaryotes. GWT1 is involved in GPI anchor biosynthesis; it is required for inositol acylation in yeast PUBMED:12714589.

    \ 4849 IPR005344 \

    Uncharacterised integral membrane protein family.

    \ 2941 IPR005211 \

    This family groups together the viral proteins BLRF1, U46, 53 and UL73. The UL73-like envelope glycoproteins, which associate in a high-molecular-mass complex with their counterpart, gM, induce neutralizing antibody responses in the host. These glycoproteins are highly polymorphic, particularly in the N-terminal region PUBMED:11602789.

    \ 5322 IPR008869 \ Toluene tolerance is mediated by increased cell membrane rigidity resulting from changes in fatty acid and phospholipid compositions, exclusion of toluene from the cell membrane, and removal of intracellular toluene by degradation PUBMED:9020089. Many proteins are involved in these processes. Members of this family are transporters that show similarity to ABC transporters PUBMED:9658016.\ 1029 IPR007342 \ Indigoidine is a blue pigment synthesised by Erwinia chrysanthemi that is implicated in pathogenicity and protection from oxidative stress. IdgA is involved in indigoidine biosynthesis, but its specific function is unknown PUBMED:11790734.\ 3306 IPR003420 \ Methanol dehydrogenase (MDH) is a bacterial periplasmic quinoprotein that oxidizes methanol to formaldehyde. MDH is a tetramer of two alpha and two beta subunits. This family contains the small beta subunit.\ 6734 IPR010696 \

    This family consists of several hypothetical bacterial proteins of around 80 residues in length. This family contains a number of conserved cysteine residues and its function is unknown.

    \ 5428 IPR008491 \ This family contains several eukaryotic sequences which are thought to be CDK5 activator-binding proteins; however, the function of this family is unknown.\ 6364 IPR010537 \

    This family contains avian adenovirus fibre proteins, which have been linked to variations in virulence PUBMED:8764019. Avian adenoviruses possess penton capsomers that consist of a pentameric base associated with two fibres PUBMED:7563058.

    \ 2660 IPR000115 \ Phosphoribosylglycinamide synthetase (GARS; phosphoribosylamine-glycine ligase) PUBMED:2687276 catalyzes the second step in the de novo biosynthesis of purines: ATP + 5-phospho-D-ribosylamine + glycine = ADP + phosphate + N1-(5-phospho-D-ribosyl)glycinamide. In bacteria, GARS is a monofunctional enzyme (encoded by the purD gene); in yeast, it is part of a bifunctional enzyme (encoded by the ADE5,7 gene) together with phosphoribosylformylglycinamidine cyclo-ligase (AIRS); in higher eukaryotes, it is part of a trifunctional enzyme (GARS-AIRS-GART) together with AIRS and phosphoribosylglycinamide formyltransferase (GART).\ 1412 IPR007322 \ The bunyaviruses are enveloped viruses with a genome consisting of three ssRNA segments (called L, M and S). The nucleocapsid protein is encoded by the small (S) genomic RNA. The L segment codes for an RNA polymerase. This family contains the RNA-dependent RNA polymerase encoded by the L segment.\ 6607 IPR010645 \

    This entry represents the N terminus of several putative bacterial membrane proteins, which may be sugar transporters. Note that many members are hypothetical proteins.

    \ 2331 IPR007877 \ This family consists of uncharacterised proteins from Arabidopsis thaliana.\ 2609 IPR007516 \

    Coenzyme F420 hydrogenase reduces the low-potential two-electron acceptor coenzyme F420. This entry contains the N termini of F420 hydrogenase and dehydrogenase beta subunits PUBMED:2207102, PUBMED:10751389. The N terminus of the Methanobacterium formicicum formate dehydrogenase beta chain is also represented in this entry PUBMED:3531194. This region is often found in association with the 4Fe-4S binding domain, fer4, and with the C terminus of these proteins.

    \ 4378 IPR005131 \ L-serine dehydratase is found either as a heterodimer of alpha and beta chains or as a fusion of the two chains in a single protein. This enzyme catalyses the deamination of serine to form pyruvate and is part of the gluconeogenesis pathway.\ 1176 IPR004913 \

    The exact function of the herpesvirus glycoprotein J is unknown, but it appears to play a role in the inhibition of apoptosis of the host cell PUBMED:11090178.

    \ 5097 IPR007934 \

    This family consists of several fungal alpha-L-arabinofuranosidase B (AbfB) proteins. L-Arabinose is a constituent of plant cell wall polysaccharides. It is found in polymeric form in L-arabinan, in which the backbone is formed by alpha-1,5-linked L-arabinose residues that can be branched via alpha-1,2- and alpha-1,3-linked L-arabinofuranose side chains. AbfB hydrolyses alpha-1,5, alpha-1,3 and alpha-1,2 linkages in both oligosaccharides and polysaccharides that contain terminal non-reducing L-arabinofuranoses in side chains PUBMED:10217508.

    \ 4002 IPR002097 \ Profilin is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1 ratio, thus preventing the polymerization of actin into filaments (F-actin). In certain circumstances it can also promote actin polymerization. Profilin also binds to polyphosphoinositides such as PIP2. Overall sequence similarity among profilins from organisms belonging to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well conserved. That region is thought to be involved in actin binding. \

    A protein structurally similar to profilin is present in the genome of variola and vaccinia viruses (gene A42R).

    \ \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806 (1994)]. In this system, a designation is composed of the first three letters of the genus; a space; the first letter of the species name; a space; and an arabic number. If two species names have identical designations, they are distinguished from one another by adding one or more letters (as necessary) to each species designation.
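    The designation rule just described is mechanical enough to sketch in code. The Python function below is a hypothetical helper (`allergen_designation` is not part of any official WHO/IUIS tool). It assumes the allergen number has already been assigned, and it deliberately omits the extra-letter rule for species whose designations would clash.

```python
def allergen_designation(genus: str, species: str, number: int) -> str:
    """Hypothetical helper: build '<first 3 letters of genus> <first letter
    of species> <arabic number>'. The extra-letter disambiguation for
    clashing species designations is NOT implemented here."""
    if len(genus) < 3 or not species:
        raise ValueError("genus needs at least 3 letters and species must be non-empty")
    if number < 1:
        raise ValueError("the arabic number must be a positive integer")
    # Genus abbreviation is capitalised; the species letter is lower case.
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


# Two designations listed for this family:
print(allergen_designation("Betula", "verrucosa", 2))     # Bet v 2
print(allergen_designation("Arabidopsis", "thaliana", 8))  # Ara t 8
```

    Handling clashes would require the full list of already-registered designations, which is why the sketch leaves that step out.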

    \

    The allergens in this family include allergens with the following designations: Ara t 8, Bet v 2, Cyn d 12, Hel a 2, Mer a 1 and Phl p 11.

    \ 2020 IPR005645 \

    The function of the proteins from this family is unknown.

    \ 1492 IPR005616 \ Members of this family include NrfF, CcmH, CycL and Ccl2.\ 4318 IPR002661 \

    The ribosome recycling factor or ribosome release factor (RRF) dissociates ribosomes from mRNA after termination of translation, and is essential for bacterial growth PUBMED:8183897. Thus ribosomes are 'recycled' and ready for another round of protein synthesis.

    \ \ 3293 IPR006922 \

    This family consists of Mbe/Mob proteins defined by an N-terminal conserved region. These proteins are essential for specific plasmid transfer.

    \ 3120 IPR001772 \ Eukaryotic protein kinases PUBMED:, PUBMED:7768349, PUBMED:1835513, PUBMED:1956325, PUBMED:3291115 are enzymes that belong to a very extensive family of proteins which share a conserved catalytic core common with both serine/threonine and tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of protein kinases. In the N-terminal extremity of the catalytic domain there is a glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. In the central part of the catalytic domain there is a conserved aspartic acid residue which is important for the catalytic activity of the enzyme PUBMED:1862342.\

    This domain is found in the C-terminal extremity of various serine/threonine-protein kinases from fungi, plants and animals.

    \ 562 IPR002190 \

    The first mammalian members of the MAGE (melanoma-associated antigen) gene family were originally described as completely silent in normal adult tissues, with the exception of male germ cells and, for some of them, placenta. By contrast, these genes were expressed in various kinds of tumors. However, other members of the family were recently found to be expressed in normal cells, indicating that the family is larger and more disparate than initially expected. MAGE-like genes have also been identified in non-mammalian species, such as the zebrafish and Drosophila melanogaster. Although no MAGE homologous sequences have been identified in Caenorhabditis elegans, Saccharomyces cerevisiae or Schizosaccharomyces pombe, MAGE sequences have been found in several plant species, including Arabidopsis thaliana PUBMED:11454705.

    \

    The only region of homology shared by all members of the family is a stretch of about 200 amino acids, named the MAGE conserved domain. The MAGE conserved domain is usually located close to the C terminus, although it can also be found in a more central position in some proteins. It is generally present as a single copy, but is duplicated in some proteins. It has been proposed that the MAGE conserved domain of MAGE-D proteins might interact with the p75 neurotrophin receptor or related receptors PUBMED:11454705.

    \ 4034 IPR002615 \ This family consists of the photosystem I reaction centre subunit IX, or PsaJ, from various organisms including Synechocystis sp. (strain PCC 6803), Pinus thunbergii (green pine) and Zea mays (maize).\ PsaJ is a small (4.4 kDa), chloroplast-encoded, hydrophobic subunit of the photosystem I reaction complex whose function is not yet fully understood PUBMED:10220342. PsaJ can be cross-linked to PsaF and has a single predicted transmembrane domain. It has a proposed role in maintaining PsaF in the correct orientation to allow fast electron transfer from soluble donor proteins to P700+ PUBMED:10220342.\ 117 IPR004014 \ The alpha chains of the H+/K+- and Na+/K+-transporting ATPases catalyze the hydrolysis of ATP, coupled with the exchange of Na+ (or H+) and K+ ions across the plasma membrane. The proteins are located in the cell membrane PUBMED:2553482, where the ion transport they mediate creates the electrochemical gradient that provides the energy for the active transport of various nutrients. H+/K+-transporting ATPases are also responsible for the production of acid in the stomach PUBMED:3023364. H+/K+- and Na+/K+-ATPases are members of the P-type (or E1-E2-type) cation-transporting ATPase superfamily, which has evolved from a common ancestral gene PUBMED:8151716. The sequences contain 10 transmembrane (TM) helices, some of which are well conserved throughout the superfamily. They may thus all operate via a similar mechanism, with an aspartylphosphoryl enzyme intermediate PUBMED:2876992 being formed during the catalytic cycle. Members of the superfamily are involved in Na+/K+, H+/K+, Ca2+ and Mg2+ transport.\ 7650 IPR012908 \

    The sequences found in this family are similar to PGAP1, an endoplasmic reticulum membrane protein with a catalytic serine-containing motif that is conserved in a number of lipases. PGAP1 functions as a GPI inositol-deacylase; this deacylation is important for the efficient transport of GPI-anchored proteins from the endoplasmic reticulum to the Golgi apparatus.

    \ 2959 IPR006062 \ Histidine is formed by several complex and distinct biochemical reactions catalysed by eight enzymes. The proteins involved in steps 4 and 6 of the histidine biosynthesis pathway belong to a single family. These enzymes are called His6 and His7 in eukaryotes and HisA and HisF in prokaryotes. HisA is a phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase, involved in the fourth step of histidine biosynthesis. The bacterial HisF protein is a cyclase which catalyzes the cyclization reaction that produces D-erythro-imidazole glycerol phosphate during the sixth step of histidine biosynthesis. The yeast His7 protein is a bifunctional protein which catalyzes an amidotransferase reaction that generates imidazole-glycerol phosphate and 5-aminoimidazole-4-carboxamide ribonucleotide, the latter being used in purine biosynthesis. The enzyme also catalyzes the cyclization reaction that produces D-erythro-imidazole glycerol phosphate, and is involved in the fifth and sixth steps of histidine biosynthesis.\ 7043 IPR009862 \

    This family consists of several bacterial proteins of around 110 residues in length. Members of this family seem to be specific to Agrobacterium species and to Rhizobium loti. The function of this family is unknown.

    \ 7570 IPR012925 \

    This domain is found at the C-terminus of some MerR family transcription factors and has an alpha-helical globin-like fold PUBMED:12682015. It includes Mta, a central regulator of multidrug resistance in Bacillus subtilis.

    \ 3234 IPR004463 \ UDP-3-O-(R-3-hydroxymyristoyl)-GlcNAc deacetylase from Escherichia coli, LpxC, was previously designated EnvA. This enzyme is involved in lipid A precursor biosynthesis and is essential for cell viability.\ 3146 IPR000843 \ Numerous bacterial transcription regulatory proteins bind DNA via a helix-turn-helix (HTH) motif. These proteins are very diverse, but for convenience may be grouped into subfamilies on the basis of sequence similarity. One such family groups together a range of proteins, including ascG, ccpA, cytR, ebgR, fruR, galR, galS, lacI, malI, opnR, purR, rafR, rbtR and scrR PUBMED:1639817, PUBMED:1805309. Within this family, the HTH motif is situated towards the N-terminus.\ 1021 IPR000834 \

    This group of sequences contains a diverse range of gene families, which include metallopeptidases belonging to MEROPS peptidase family M14 (carboxypeptidase A, clan MC), subfamilies M14A and M14B.

    \ \ \

    The carboxypeptidase A family can be divided into two subfamilies: carboxypeptidase H (regulatory) and carboxypeptidase A (digestive) PUBMED:7674922. Members of the H family have longer C-termini than those of family A PUBMED:1449602, and carboxypeptidase M (a member of the H family) is bound to the membrane by a glycosylphosphatidylinositol anchor, unlike the majority of the M14 family, which are soluble PUBMED:7674922.

    \ \

    The zinc ligands have been determined as two histidines and a glutamate, and the catalytic residue has been identified as a C-terminal glutamate, but these do not form the characteristic metalloprotease HEXXH motif PUBMED:7674922, PUBMED:6887246. Members of the carboxypeptidase A family are synthesised as inactive molecules with propeptides that must be cleaved to activate the enzyme. Structural studies of carboxypeptidases A and B reveal the propeptide to exist as a globular domain, followed by an extended alpha-helix; this shields the catalytic site, without specifically binding to it, while the substrate-binding site is blocked by making specific contacts PUBMED:7674922, PUBMED:1548696.

    \ \

    Other examples of protein families in this entry include:

    \ \ 7453 IPR011479 \

    This is a family of short hypothetical proteins found in Rhodopirellula baltica.

    \ 5850 IPR006477 \

    This group of sequences identifies a large paralogous family of variant antigens from several Plasmodium species (P. yoelii, P. berghei and P. chabaudi). No orthologues of this family are thought to exist in P. falciparum.

    \ 858 IPR004162 \ The seven in absentia (sina) gene was first identified in Drosophila. The Drosophila Sina protein is essential for the determination of the R7 pathway in photoreceptor cell development: the loss of functional Sina results in the transformation of the R7 precursor cell to a non-neuronal cell type. The Sina protein contains an N-terminal RING finger domain (zf-C3HC4). Through this domain, Sina binds E2 ubiquitin-conjugating enzymes (UbcD1). Sina also interacts with Tramtrack (TTK88) via PHYL. Tramtrack is a transcriptional repressor that blocks photoreceptor determination, while PHYL down-regulates the activity of TTK88. In turn, the activity of PHYL requires the activation of the Sevenless receptor tyrosine kinase, a process essential for R7 determination. It is thought that Sina targets TTK88 for degradation, thereby promoting the R7 pathway. Murine and human homologues of Sina have also been identified. The human homologue Siah-1 PUBMED:9403064 also binds E2 enzymes (UbcH5) and, through a series of physical interactions, targets beta-catenin for ubiquitin-mediated degradation. Siah-1 expression is enhanced by p53, which is itself induced by DNA damage. This pathway thus links DNA damage to beta-catenin degradation PUBMED:9267026, PUBMED:11389839. Sina proteins, therefore, physically interact with a variety of proteins. The N-terminal RING finger domain that binds ubiquitin-conjugating enzymes is described in zf-C3HC4, and does not form part of the alignment for this family. The remaining C-terminal part is involved in interactions with other proteins, and is included in this alignment. In addition to the Drosophila protein and mammalian homologues, whose similarity was noted previously, this family also includes putative homologues from Caenorhabditis elegans and Arabidopsis thaliana.\ 3140 IPR002160 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    The Kunitz-type soybean trypsin inhibitor (STI) family consists mainly of proteinase inhibitors from Leguminosae seeds PUBMED:14705960. They belong to MEROPS inhibitor family I3, clan IC. They exhibit inhibitory activity against serine proteinases, namely trypsin (MEROPS peptidase family S1) and subtilisin (MEROPS peptidase family S8), as well as against thiol proteinases (MEROPS peptidase family C1) and aspartic proteinases (MEROPS peptidase family A1) PUBMED:14705960. \

    \

    Inhibitors from cereals are active against subtilisin and endogenous alpha-amylases, while some also inhibit tissue plasminogen activator. The inhibitors are usually specific for either trypsin or chymotrypsin, and some are effective against both. They are thought to protect the seeds against consumption by animal predators, while at the same time serving as seed storage proteins themselves; all the actively inhibitory members contain 2 disulphide bridges. The existence of a member with no inhibitory activity, winged bean albumin 1, suggests that the inhibitors may have evolved from seed storage proteins.

    \

    Proteins from the Kunitz family contain from 170 to 200 amino acid residues and one or two intra-chain disulphide bonds. The best conserved region is found in their N-terminal section. The crystal structures of soybean trypsin inhibitor (STI), trypsin inhibitor DE-3 from Erythrina caffra (ETI) PUBMED:1988676 and the bifunctional proteinase K/alpha-amylase inhibitor from wheat (PK13) have been solved, showing them to share the same 12-stranded beta-sheet structure as those of interleukin-1 and heparin-binding growth factors PUBMED:1738162. The beta-sheets are arranged in 3 similar lobes around a central axis, 6 strands forming an anti-parallel beta-barrel. Despite the structural similarity, STI shows no interleukin-1 bioactivity, presumably as a result of their primary sequence disparities. The active inhibitory site containing the scissile bond is located in the loop between beta-strands 4 and 5 in STI and ETI.

    \ \ \

    The STIs belong to a superfamily that also contains the interleukin-1 proteins, heparin binding growth factors (HBGF) and histactophilin, all of which have very similar structures, but share no sequence similarity with the STI family.

    \ 1307 IPR003147 \ Protein L is a bacterial protein with immunoglobulin (Ig) light chain-binding properties. It contains a number of homologous b1 repeats towards the N-terminus. These repeats have been found to be responsible for the interaction of protein L with Ig light chains PUBMED:1618782.\ 1753 IPR007599 \

    The endoplasmic reticulum (ER) of the yeast Saccharomyces cerevisiae contains a proteolytic system able to selectively degrade misfolded lumenal secretory proteins. To examine the components involved in this degradation process, mutants were isolated. They could be divided into four complementation groups. The mutations led to stabilization of two different substrates for this process, and the classes were named der, for degradation in the ER. DER1 was cloned by complementation of the der1-2 mutation. The DER1 gene codes for a novel, hydrophobic protein that is localized to the ER. Deletion of DER1 abolished degradation of the substrate proteins, suggesting that the Der1 protein may be specifically required for the degradation process associated with the ER PUBMED:8631297. Interestingly, this family seems distantly related to the Rhomboid family of membrane peptidases. Members of this family may also mediate degradation of misfolded proteins.

    \ 7984 IPR012562 \

    This is the C-terminal domain found in the RNA helicase II / Gu protein family PUBMED:15112237.

    \ 1501 IPR003338 \ The VAT protein of the archaebacterium Thermoplasma acidophilum, like all other members of the Cdc48/p97 family of AAA ATPases, has two ATPase domains and a 185-residue amino-terminal substrate-recognition domain, VAT-N. VAT shows activity in protein folding and unfolding and thus shares the common function of these ATPases in disassembly and/or degradation of protein complexes. \

    VAT-N is composed of two equally sized subdomains. The amino-terminal subdomain VAT-Nn forms a double-psi beta-barrel whose pseudo-twofold symmetry is mirrored by an internal sequence repeat of 42 residues. The carboxy-terminal subdomain VAT-Nc forms a novel six-stranded beta-clam fold PUBMED:10531028. Together, VAT-Nn and VAT-Nc form a kidney-shaped structure, in close agreement with results from electron microscopy. VAT-Nn is related to numerous proteins including prokaryotic transcription factors, metabolic enzymes, the protease cofactors UFD1 and PrlF, and aspartic proteinases.

    \ 3878 IPR000719 \ Eukaryotic protein kinases PUBMED:12734000, PUBMED:7768349, PUBMED:1835513, PUBMED:1956325, PUBMED:3291115 are enzymes that belong to a very extensive family of proteins which share a conserved catalytic core common with both serine/threonine and tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of protein kinases. In the N-terminal extremity of the catalytic domain there is a glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. In the central part of the catalytic domain there is a conserved aspartic acid residue which is important for the catalytic activity of the enzyme PUBMED:1862342. This entry includes protein kinases from eukaryotes and viruses and may include some bacterial hits too.\ 3177 IPR001304 \

    Animal lectins display a wide variety of architectures. They are classified according to the carbohydrate-recognition domain (CRD), of which there are two main types, S-type and C-type PUBMED:3290208, PUBMED:8341801, PUBMED:.

    \

    C-type lectins display a wide range of specificities. They require Ca2+ for their activity. They are found predominantly, but not exclusively, in vertebrates.

    \

    They can be classified into a number of subgroups based on their function and structure:

  • Endocytic lectins - membrane-bound receptors that mediate endocytosis of glycoproteins
  • Collectins - represented by the soluble mannose-binding proteins of mammalian serum and liver PUBMED:1721241
  • Selectins - membrane-bound proteins involved in inflammation PUBMED:15336187, PUBMED:1439808

    CD22 (also called BL-CAM or Lyb8) is an adhesion and signaling molecule. Targeted disruption of CD22 in mice results in a reduced level of surface IgM on peripheral B cells, enhanced Ca2+ flux in response to Ig signaling, and variable proliferative responses to surface Ig crosslinking. Several studies observed a reduced response to thymus-independent antigens. The CD22-knockout data support a role for CD22 in limiting antigen receptor signaling, although a positive role in certain B cell responses cannot be excluded.

    CD molecules are leucocyte antigens on cell surfaces. The CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm

    \ 760 IPR001683 \

    The PX (phox) domain PUBMED:8931154 occurs in a variety of eukaryotic proteins associated with intracellular signaling pathways. PX domains are important phosphoinositide-binding modules that have varying lipid-binding specificities PUBMED:11884510. The PX domain is approximately 120 residues long PUBMED:11373621 and folds into a three-stranded beta-sheet followed by three alpha-helices and a proline-rich region that immediately precedes a membrane-interaction loop spanning approximately eight hydrophobic and polar residues. The PX domain of p47phox binds to the SH3 domain in the same protein PUBMED:11373621. Phosphorylation of p47(phox), a cytoplasmic activator of the microbicidal phagocyte oxidase (phox), elicits interaction of p47(phox) with phosphoinositides. The phosphorylation-driven conformational change of p47(phox) enables its PX domain to bind phosphoinositides; this interaction plays a crucial role in recruiting p47(phox) from the cytoplasm to membranes and in the subsequent activation of the phagocyte oxidase. The lipid-binding activity of this protein is normally suppressed by intramolecular interaction of the PX domain with the C-terminal Src homology 3 (SH3) domain PUBMED:12356722.

    Among these proteins are: the phox proteins p40phox (see ) and p47phox (see ), the Cpk class of phosphatidylinositol 3-kinase, phospholipase D, Saccharomyces cerevisiae Bem1 and Schizosaccharomyces pombe Scd2, S. cerevisiae GTPase-activating protein Bem3, sorting nexins and the murine adapter protein Fish. A recent multiple alignment of representative PX domain sequences can be found in PUBMED:9687503.

    \ 4392 IPR002208 \

    Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors and require a chaperone subunit to direct them to the translocase component PUBMED:2202721. From there, the mature proteins are either targeted to the outer membrane or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome.

    The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) PUBMED:2202721. The chaperone protein SecB PUBMED:11336818 is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation and targets them to the peripheral membrane protein ATPase SecA for secretion PUBMED:10418149. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains PUBMED:12167867. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices.

    The eubacterial secY protein PUBMED:1406280 interacts with the signal sequences of secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains 10 transmembrane (TM), 6 cytoplasmic and 5 periplasmic regions.

    Cytoplasmic regions 2 and 3, and TM domains 1, 2, 4, 5, 7 and 10 are well conserved: the conserved cytoplasmic regions are believed to interact with cytoplasmic secretion factors, while the TM domains may participate in protein export PUBMED:2110998. Homologs of secY are found in archaebacteria PUBMED:1764515. SecY is also encoded in the chloroplast genome of some algae PUBMED:1544427 where it could be involved in a prokaryotic-like protein export system across the two membranes of the chloroplast endoplasmic reticulum (CER) which is present in chromophyte and cryptophyte algae.

    \ 5651 IPR008632 \ Parasitic nematodes produce at least two structurally novel classes of small helix-rich retinol- and fatty-acid-binding proteins that have no counterparts in their plant or animal hosts and thus represent potential targets for new nematicides. Gp-FAR-1 is a member of the nematode-specific fatty-acid- and retinol-binding (FAR) family of proteins but localises to the surface of the organism, placing it in a strategic position for interaction with the host. Gp-FAR-1 functions as a broad-spectrum retinol- and fatty-acid-binding protein, and it is thought that it is involved in the evasion of primary host plant defence systems PUBMED:11368765.\ 6009 IPR010389 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 1512 IPR004197 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    Cellulases (endoglucanases) catalyse the endohydrolysis of 1,4-beta-D-glucosidic linkages in cellulose. This entry represents the N-terminal Ig-like domain of cellulase; enzymes containing this domain belong to family 9 of the glycoside hydrolases ().

    \ 1763 IPR001295 \

    Dihydroorotate dehydrogenase () (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of pyrimidines, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FMN-containing flavoprotein. In bacteria (gene pyrD), DHOdehase is located on the inner side of the cytoplasmic membrane. In some yeasts, such as Saccharomyces cerevisiae (gene URA1), it is a cytosolic protein, while in other eukaryotes it is found in the mitochondria PUBMED:1409592.
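
    As a worked illustration, the conversion described above is an oxidation in which two electrons and two protons are passed to an electron acceptor. The acceptor is written generically as A here (an assumption of this sketch); in practice it differs between enzyme forms, for example a quinone for the membrane-associated enzymes such as bacterial PyrD, or fumarate for the cytosolic S. cerevisiae enzyme.

```latex
\text{(S)-dihydroorotate} + \text{A} \;\rightleftharpoons\; \text{orotate} + \text{AH}_2
```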

    \ 2110 IPR002725 \ Members of this family are found in some archaebacteria, as well as Helicobacter pylori. The proteins are 190-240 amino acids long, with the C terminus being the most conserved region, containing three conserved histidines.\ 4669 IPR001947 \

    Scorpion venoms contain a variety of peptides toxic to mammals, insects and crustaceans. Among these peptides is a family of short toxins (30 to 40 residues) PUBMED:7998956, PUBMED:7819188 including charybdotoxin, kaliotoxin PUBMED:1730708, noxiustoxin PUBMED: and iberiotoxin PUBMED:1694175, PUBMED:1381959. Charybdotoxin consists of a single polypeptide chain and is a potent, selective inhibitor of calcium-activated potassium channels in pituitary and aortic smooth muscle cells; the toxin reversibly blocks channel activity by interacting with the external pore of the channel protein PUBMED:2453055.

    The tertiary structure of the toxins comprises a 3-stranded beta-sheet and a short helix, and is stabilised by a number of disulphide bridges PUBMED:1381959, as shown in the following schematic representation:

    \
                                 +---------------------+\
                                 |                     |\
                                 |                     |\
                          xxxxxxxCxxxxxCxxxCxxxxxxxxxxxCxxxxCxCxxx\
                                       |   |                | |\
                                       |   +----------------+ |\
                                       +----------------------+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    
    \

    \ 1377 IPR000089 \ The biotin/lipoyl attachment domain has a conserved lysine residue that binds biotin or lipoic acid. Biotin plays a catalytic role in some carboxyl transfer reactions and is covalently attached, via an amide bond, to a lysine residue in enzymes requiring this coenzyme PUBMED:1526981. E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to a lysine residue PUBMED:1825611. The lipoic acid cofactor is found in a variety of proteins, including the H-protein of the glycine cleavage system (GCS), mammalian and yeast pyruvate dehydrogenases, and fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus.\ 6382 IPR009103 \

    Olfactory marker protein (OMP) is a highly expressed, cytoplasmic protein found in mature olfactory sensory receptor neurons of all vertebrates. OMP is a modulator of the olfactory signal transduction cascade. The crystal structure of OMP reveals a beta sandwich consisting of eight strands in two sheets with a jelly-roll topology PUBMED:12054873. Three highly conserved regions have been identified as possible protein-protein interaction sites in OMP, indicating a possible role for OMP in modulating such interactions, thereby acting as a molecular switch PUBMED:12054872.

    \ 2047 IPR007254 \ This archaeal family of unknown function is predicted to be an integral membrane protein with six transmembrane regions.\ 399 IPR002079 \ The retroviral p12 protein is a proline-rich virion structural protein found in the inner coat. The function carried out by p12 in assembly and replication is unknown. p12 is associated with pathogenicity of the virus PUBMED:7690416.\ 7612 IPR012494 \

    This family represents the Reovirus core protein Mu-2. Mu-2 is a microtubule associated protein and is thought to play a key role in the formation and structural organisation of reovirus inclusion bodies PUBMED:11932414, PUBMED:1566600.

    \ 6214 IPR009431 \

    This family consists of several D1 dopamine receptor-interacting (calcyon) proteins. D1/D5 dopamine receptors in the basal ganglia, hippocampus, and cerebral cortex modulate motor, reward, and cognitive behaviour. D1-like dopamine receptors likely modulate neocortical and hippocampal neuronal excitability and synaptic function via Ca2+ as well as cAMP-dependent signaling PUBMED:11929934. Defective calcyon proteins have been implicated in both attention-deficit/hyperactivity disorder (ADHD) PUBMED:11923911 and schizophrenia.

    \ 4853 IPR005115 \

    This domain is found duplicated in bacterial membrane proteins of unknown function and contains three transmembrane helices. The conserved glycines are suggestive of an ion channel.

    \ 259 IPR005044 \

    This family consists of proteins of unknown function found in Caenorhabditis species.

    \ 7021 IPR009848 \

    This family consists of several hypothetical Lactococcus lactis and related phage proteins of around 75 residues in length. The function of this family is unknown.

    \ 1100 IPR004292 \ The adenoviral protein 52K (named after the earliest known 52kDa members) is a DNA-binding protein PUBMED:8627769 that is probably involved in virion assembly.\ 4496 IPR000737 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    The squash inhibitors form one of a number of serine proteinase inhibitor families. They belong to MEROPS inhibitor family I7, clan IE. They are generally annotated as either trypsin or elastase inhibitors (MEROPS peptidase family S1, ). The proteins, found exclusively in the seeds of the Cucurbitaceae, e.g. Citrullus lanatus (watermelon), Cucumis sativus (cucumber) and Momordica charantia (balsam pear), are approximately 30 residues in length and contain 6 Cys residues, which form 3 disulphide bonds PUBMED:2914611. The inhibitors function by binding to a serine protease (such as trypsin), which cleaves the peptide bond between Arg/Lys and Ile residues in the N-terminal portion of the protein PUBMED:1731946, PUBMED:2914611. Structural studies have shown that the inhibitor has an ellipsoidal shape and is largely composed of beta-turns PUBMED:2914611. The fold and Cys connectivity of the proteins resemble those of potato carboxypeptidase A inhibitor PUBMED:1731946.

    \ 7039 IPR009859 \

    This family consists of several hypothetical bacterial and phage proteins of around 180 residues in length. The function of this family is unknown.

    \ 6897 IPR009775 \

    This family consists of several Porcine reproductive and respiratory syndrome virus (PRRSV) ORF2b proteins. The function of this family is unknown; however, large amounts of 2b protein are present in the virion, and this protein is thought to be an integral component of the virion PUBMED:11504553.

    \ 6821 IPR009732 \

    This family consists of several hypothetical bacterial proteins of around 120 residues in length. The function of this family is unknown.

    \ 3359 IPR005526 \ In Escherichia coli, FtsZ assembles into a Z ring at midcell, while assembly at polar sites is prevented by the min system. MinC, a component of this system, is an inhibitor of FtsZ assembly that is positioned within the cell by interaction with MinDE. MinC is an oligomer, probably a dimer PUBMED:10869074. The C-terminal half of MinC is the most conserved and interacts with MinD. The N-terminal half is thought to interact with FtsZ.\ 2911 IPR003450 \ This family represents the herpesvirus origin of replication binding protein, probably involved in DNA replication.\ 6398 IPR009214 \ There are currently no experimental data for members of this group or their homologues. However, these proteins are predicted to be integral membrane proteins (with several transmembrane segments).\ 3535 IPR003873 \ This is a family of small nonstructural proteins, well conserved among Coronavirus strains. This protein is also found in murine hepatitis virus as small envelope protein E.\ 2789 IPR000312 \

    The glycosyl transferase family includes anthranilate phosphoribosyltransferase (TrpD, ) and thymidine phosphorylase (). All these proteins can transfer a phosphorylated ribose substrate. Thymidine phosphorylase () catalyses the reversible phosphorolysis of thymidine, deoxyuridine and their analogues to their respective bases and 2-deoxyribose 1-phosphate. This enzyme regulates the availability of thymidine and is therefore essential to nucleic acid metabolism.
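
    The phosphoribosyl transfers mentioned above can be written out as reactions. The equations below are standard textbook forms given for illustration rather than taken from this entry: TrpD transfers the 5-phosphoribosyl group of 5-phospho-alpha-D-ribose 1-diphosphate (PRPP) to anthranilate, while thymidine phosphorylase uses inorganic phosphate to cleave the glycosidic bond of thymidine.

```latex
\begin{align*}
\text{anthranilate} + \text{PRPP} &\rightleftharpoons \text{N-(5-phospho-D-ribosyl)anthranilate} + \text{PP}_i\\
\text{thymidine} + \text{P}_i &\rightleftharpoons \text{thymine} + \text{2-deoxy-}\alpha\text{-D-ribose 1-phosphate}
\end{align*}
```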

    \ 2059 IPR007228 \

    This domain is found in a family of long proteins that are currently found only in rice. Their function is unknown, although they may be some kind of transposable element. There is a putative gypsy-type transposon domain () towards the N terminus of the proteins.

    \ 2685 IPR002005 \ Rab proteins constitute a family of small GTPases that serve a regulatory role in vesicular membrane traffic PUBMED:7957092, PUBMED:7585614; C-terminal geranylgeranylation is crucial for their membrane association and function. This post-translational modification is catalysed by Rab geranylgeranyl transferase (Rab-GGTase), a multi-subunit enzyme that contains a catalytic heterodimer and an accessory component, termed Rab escort protein (REP)-1 PUBMED:7957092. REP-1 presents newly synthesised Rab proteins to the catalytic component and forms a stable complex with the prenylated proteins following the transfer reaction.

    The mechanism of REP-1-mediated membrane association of Rab5 is similar to that mediated by Rab GDP dissociation inhibitor (GDI). REP-1 and Rab GDI also share other functional properties, including the ability to inhibit the release of GDP and to remove Rab proteins from membranes.

    The crystal structure of the bovine alpha-isoform of Rab GDI has been determined to a resolution of 1.81 Å PUBMED:8609986. The protein is composed of two main structural units: a large complex multi-sheet domain I and a smaller alpha-helical domain II.

    The structural organisation of domain I is closely related to FAD-containing monooxygenases and oxidases PUBMED:8609986. Conserved regions common to GDI and the choroideraemia gene product, which delivers Rab to catalytic subunits of Rab geranylgeranyltransferase II, are clustered on one face of the domain PUBMED:7585614. The two most conserved regions form a compact structure at the apex of the molecule; site-directed mutagenesis has shown these regions to play a critical role in the binding of Rab proteins PUBMED:8609986.

    \ 6179 IPR009411 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 2619 IPR005567 \

    This region contains the LXXLL motif, which is necessary for the interaction of FTZ with the nuclear receptor FTZ-F1. FTZ is thought to represent a category of LXXLL motif-dependent co-activators for nuclear receptors.
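
    Because the motif is defined by a simple residue pattern (L = leucine, X = any amino acid), candidate LXXLL sites can be located with a regular expression. The sketch below is a hypothetical helper, not part of any published tool; it uses a zero-width lookahead so that overlapping occurrences are all reported, and the input is a made-up toy sequence rather than a real FTZ fragment. A pattern match alone does not show that a site is functional.

```python
import re

# L = leucine, X = any residue. The lookahead (?=...) is zero-width, so the
# scan advances one residue at a time and overlapping motifs are all found.
LXXLL = re.compile(r"(?=(L..LL))")


def find_lxxll(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for every LXXLL match in `sequence`."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in LXXLL.finditer(seq)]


if __name__ == "__main__":
    # Made-up toy sequence, not a real FTZ fragment.
    print(find_lxxll("MSALKALLSRLAELLQ"))  # [(4, 'LKALL'), (11, 'LAELL')]
```

    Reporting 1-based positions keeps the output consistent with the residue numbering used in sequence annotations.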

    \ 6509 IPR010601 \

    This family consists of several hypothetical proteins of around 360 residues in length and seems to be specific to Caenorhabditis elegans. The function of this family is unknown.

    \ 817 IPR004039 \ Rubredoxin is a low molecular weight iron-containing bacterial protein involved in electron transfer PUBMED:2244884, PUBMED:1992166, sometimes replacing ferredoxin as an electron carrier PUBMED:7726577.

    The 3-D structures of a number of rubredoxins have been solved PUBMED:1303768, PUBMED:3441010. The fold belongs to the alpha+beta class, with 2 alpha-helices and 2-3 beta-strands. Its active site contains an iron ion which is co-ordinated by the sulphurs of four conserved cysteine residues forming an almost regular tetrahedron. The conserved cysteines reside on two loops, which are the most conserved regions of the protein. In addition, a ring of acidic residues in the proximity of the [Fe(Cys)4] centre is also well conserved PUBMED:3441010.

    \ 5420 IPR008902 \ This family consists of bacterial rhamnosidase A and B enzymes. L-Rhamnose is abundant in biomass as a common constituent of glycolipids and glycosides, such as plant pigments, pectic polysaccharides, gums or biosurfactants. Some rhamnosides are important bioactive compounds. For example, terpenyl glycosides, the glycosidic precursor of aromatic terpenoids, act as important flavouring substances in grapes. Other rhamnosides act as cytotoxic rhamnosylated terpenoids, as signal substances in plants or play a role in the antigenicity of pathogenic bacteria PUBMED:10632887.\ 4778 IPR007129 \

    Yeast ubiquinol-cytochrome c chaperone is required for assembly of coenzyme QH2-cytochrome c reductase. Homologues are found in a number of different organisms, including human, Caenorhabditis elegans and Rhizobium meliloti.

    \ 5043 IPR007329 \ This conserved region includes the FMN-binding site of the NqrC protein PUBMED:11248234 as well as the NosR and NirI regulatory proteins.\ 7282 IPR010897 \

    This family contains the bacterial stage II sporulation protein P (SpoIIP) (approximately 350 residues long). It has been shown that a block in polar cytokinesis in Bacillus subtilis is mediated partly by transcription of spoIID, spoIIM and spoIIP. This inhibition of polar division is involved in the locking in of asymmetry after the formation of a polar septum during sporulation PUBMED:11886548.

    \ 5414 IPR008399 \ This region is found in the putatively cytoplasmic C terminus of the anthrax receptor.\ 1391 IPR003459 \ This protein is encoded by an open reading frame in plasmid-borne DNA repeats of Borrelia species. This protein is known as ORF-A PUBMED:8636030. The function of this putative protein is unknown.\ 662 IPR002641 \ This family consists of various patatin glycoproteins from the total soluble protein of potato tubers PUBMED:3371664. Patatin is a storage protein, but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids PUBMED:3371664.\ 4491 IPR007390 \ One of the family members is Bacillus subtilis stage V sporulation protein R, which is involved in spore cortex formation PUBMED:8144469. Little is known about cortex biosynthesis, except that it depends on several sigma-E-controlled genes, including spoVR PUBMED:8982457.\ 7397 IPR011494 \

    The Hira proteins are found in a range of eukaryotes and are implicated in the assembly of repressive chromatin. These proteins also contain .

    \ 2508 IPR013130 \ This family includes a common region in the transmembrane proteins mammalian cytochrome b-245 heavy chain (gp91-phox), ferric reductase transmembrane component in yeast and respiratory burst oxidase from Arabidopsis thaliana. This may be a family of flavocytochromes, including a potential FAD-binding domain, that are capable of moving electrons across the plasma membrane PUBMED:8321236. Mutations in the sequence of cytochrome b-245 heavy chain (gp91-phox) lead to X-linked chronic granulomatous disease, in which the bactericidal ability of phagocytic cells is reduced; the disease is characterised by the absence of a functional plasma membrane-associated NADPH oxidase PUBMED:3600768.\ 6208 IPR009427 \

    This family consists of several hypothetical Borrelia burgdorferi and Borrelia hermsii proteins. The function of this family is unknown.

    \ 4095 IPR002719 \ Retinoblastoma-like and retinoblastoma-associated proteins may have a function in cell cycle regulation. They form a complex with adenovirus E1A and SV40 large T antigen, and may bind and modulate the function of certain cellular proteins with which T and E1A compete for pocket binding. The proteins may act as tumor suppressors and are potent inhibitors of E2F-mediated trans-activation. This domain has the cyclin fold PUBMED:8152925.

    The crystal structure of the Rb pocket bound to a nine-residue E7 peptide containing the LxCxE motif, shared by other Rb-binding viral and cellular proteins, shows that the LxCxE peptide binds a highly conserved groove on the B-box portion of the pocket; the A-box portion (see ) appears to be required for the stable folding of the B box. Also highly conserved is the extensive A-B interface, suggesting that it may be an additional protein-binding site. The A and B boxes each contain the cyclin-fold structural motif, with the LxCxE-binding site on the B-box cyclin fold being similar to a Cdk2-binding site of cyclin A and to a TBP-binding site of TFIIB PUBMED:9495340.

    The A and B boxes are found at the C-terminal end of the protein; the B-box is on the C-terminal side of the A-box.

    \ 102 IPR004324 \ Members of this family are transmembrane proteins. Several are putative Leishmania proteins that are thought to be pteridine transporters PUBMED:10589984, PUBMED:7984172. This family also contains five putative Arabidopsis thaliana proteins of unknown function as well as two predicted prokaryotic proteins (from the cyanobacteria Synechocystis and Synechococcus).\ 1383 IPR002663 \ VP3 is a minor structural component of the virus. The large RNA segment of birnaviruses codes for a polyprotein (N-VP2-VP4-VP3-C) PUBMED:2828658.\ 495 IPR007150 \ Hus1, Rad1 and Rad9 are three evolutionarily conserved proteins required for checkpoint control in fission yeast. These proteins are known to form a stable complex in vivo PUBMED:11739777. The Hus1-Rad1-Rad9 complex may form a PCNA-like ring structure and could function as a sliding clamp during checkpoint control.\ 3556 IPR007340 \ This family includes the Haemophilus influenzae opacity-associated protein (OapA). This protein is required for efficient nasopharyngeal mucosal colonization, and its expression is associated with a distinctive transparent colony phenotype. OapA is thought to be a secreted protein, and its expression exhibits high-frequency phase variation PUBMED:8559074.\ 1656 IPR005479 \

    Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl phosphate from glutamine () or ammonia () and bicarbonate PUBMED:1972379. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and pyrimidines. Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and arginine. In bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways, while other bacteria have separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (carA) that provides the glutamine amidotransferase (GATase) activity needed to remove the ammonia group from glutamine, and a large chain (carB) that provides the CPSase activity. Such a structure is also present in fungi for arginine biosynthesis (CPA1 and CPA2).

    Two main CPSases have been identified in mammals. CPSase I is mitochondrial, is found at high levels in the liver and is involved in arginine biosynthesis, while CPSase II is cytosolic, is associated with aspartate carbamoyltransferase (ATCase) and dihydroorotase (DHOase) and is involved in pyrimidine biosynthesis. In the pyrimidine pathway in most eukaryotes, CPSase is found as a domain in a multi-functional protein, which also has GATase, ATCase and DHOase activity. Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates and is a monofunctional protein located in the mitochondrial matrix. The CPSase domain is typically 120 kDa in size and has arisen from the duplication of an ancestral subdomain of about 500 amino acids. Each subdomain independently binds ATP, and it is suggested that the two homologous halves act separately, one catalyzing the phosphorylation of bicarbonate to carboxyphosphate and the other that of carbamate to carbamyl phosphate. The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase () (ACC), propionyl-CoA carboxylase () (PCCase), pyruvate carboxylase () (PC) and urea carboxylase ().
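
    The two ATP-dependent phosphorylations described above fit into the commonly accepted reaction sequence for CPSase, sketched below for illustration. The step in which ammonia attacks carboxyphosphate to give carbamate, and the glutamine hydrolysis supplied by the GATase activity, come from general biochemistry rather than from this entry.

```latex
\begin{align*}
\text{L-glutamine} + \text{H}_2\text{O} &\rightarrow \text{L-glutamate} + \text{NH}_3 && \text{(GATase; glutamine-dependent enzymes only)}\\
\text{HCO}_3^- + \text{ATP} &\rightarrow \text{carboxyphosphate} + \text{ADP} && \text{(first ATP site)}\\
\text{carboxyphosphate} + \text{NH}_3 &\rightarrow \text{carbamate} + \text{P}_i\\
\text{carbamate} + \text{ATP} &\rightarrow \text{carbamyl phosphate} + \text{ADP} && \text{(second ATP site)}\\[4pt]
\text{net: } 2\,\text{ATP} + \text{HCO}_3^- + \text{NH}_3 &\rightarrow 2\,\text{ADP} + \text{P}_i + \text{carbamyl phosphate}
\end{align*}
```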

    \ 5832 IPR010305 \

    This family consists of several small bacterial proteins, some of which are classified as putative lipoproteins. The function of this family is unknown.

    \ 2243 IPR007628 \ This is a family of uncharacterised proteins.\ 5421 IPR006530 \

    These sequences contain two tandem copies of a 21-residue extracellular repeat that is found in Gram-negative, Gram-positive, and animal proteins. The repeat is named for a YD dipeptide, the most strongly conserved motif of the repeat. These repeats appear in general to be involved in binding carbohydrate; the chicken teneurin-1 YD-repeat region has been shown to bind heparin PUBMED:10341219, PUBMED:7934896, PUBMED:2403547.

    \ 1771 IPR004123 \

    Thioredoxins PUBMED:3896121, PUBMED:2668278, PUBMED:7788289, PUBMED:7788290 are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of 2 cysteine thiol groups to a disulphide, accompanied by the transfer of 2 electrons and 2 protons. The net result is the covalent interconversion of a disulphide and a dithiol.
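
    The redox mechanism above can be summarised as a thiol-disulphide exchange. In the sketch below, Trx-(SH)2 and Trx-S2 denote reduced and oxidised thioredoxin and the target protein is written generically; the second line is the corresponding half-reaction, showing the transfer of 2 electrons and 2 protons.

```latex
\begin{align*}
\text{Trx-(SH)}_2 + \text{protein-S}_2 &\rightleftharpoons \text{Trx-S}_2 + \text{protein-(SH)}_2\\
\text{R-S-S-R}' + 2e^- + 2\,\text{H}^+ &\rightleftharpoons \text{R-SH} + \text{R}'\text{-SH}
\end{align*}
```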

    Compared to human thioredoxin, the human U5 snRNP-specific protein U5-15kD contains 37 additional residues, which may cause structural changes that most likely form binding sites for other spliceosomal proteins or RNA. Although U5-15kD apparently lacks protein disulphide isomerase activity, it is strictly required for pre-mRNA splicing PUBMED:10610776.

    \ 470 IPR001199 \ Cytochromes b5 are ubiquitous electron transport proteins found in animals, plants and yeasts PUBMED:2752049. The microsomal and mitochondrial variants are membrane-bound, while those from erythrocytes and other animal tissues are water-soluble PUBMED:4030743, PUBMED:8439576.

    The 3D structure of bovine cyt b5 is known, the fold belonging to the alpha+beta class, with 5 strands and 5 short helices forming a framework for supporting a central haem group PUBMED:1167544. The cytochrome b5 domain is similar to that of a number of oxidoreductases, such as plant and fungal nitrate reductases, sulphite oxidase, yeast flavocytochrome b2 (L-lactate dehydrogenase) and plant cyt b5/acyl lipid desaturase fusion protein.

    \ 7027 IPR009850 \

    This family represents a conserved region approximately 150 residues long that is sometimes repeated within some Babesia bovis proteins of unknown function.

    \ 4425 IPR007127 \

    The bacterial core RNA polymerase complex, which consists of five subunits, is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme PUBMED:3052291. RNA polymerase recruits alternative sigma factors as a means of switching on specific regulons. Most bacteria express a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary sigma factor, and sigma-54 (gene rpoN or ntrA), direct the transcription of a wide variety of genes. The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes.

    With regard to sequence similarity, sigma factors can be grouped into two classes, the sigma-54 and sigma-70 families. Sequence alignments of the sigma-70 family members reveal four conserved regions that can be further divided into subregions, e.g. subregion 2.2, which may be involved in the binding of the sigma factor to the core RNA polymerase, and subregion 4.2, which seems to harbor a DNA-binding 'helix-turn-helix' motif involved in binding the conserved -35 region of promoters recognized by the major sigma factors PUBMED:3092189, PUBMED:1597408.

    Region 1.1 modulates DNA binding by regions 2 and 4 when sigma is not bound to the core RNA polymerase PUBMED:9927430, PUBMED:10613885. Region 1.1 is also involved in promoter binding.

    \ 3716 IPR000250 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
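
    The one-letter convention above lends itself to a simple lookup. The sketch below assumes, as the text states, that the first character of a family or clan identifier encodes the catalytic type (e.g. 'G1', 'S53', 'SB', or a peptidase identifier such as 'G01.001'); the function names are hypothetical and not part of any MEROPS software. It also records which catalytic types use an amino acid nucleophile and which use activated water.

```python
"""Hypothetical lookup of peptidase catalytic type from a MEROPS-style identifier."""

# One-letter catalytic-type codes, as listed in the text.
# "P" is used only for clans that contain families of more than one type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing more than one type)",
}

# Serine, threonine and cysteine peptidases use an amino acid as the nucleophile
# (forming an acyl intermediate); aspartic, glutamic and metallopeptidases use
# an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def _type_letter(identifier: str) -> str:
    """Return the upper-cased first character of a family/clan/peptidase identifier."""
    letter = identifier.strip().upper()[:1]
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    return letter


def catalytic_type(identifier: str) -> str:
    """Map e.g. 'G1', 'S53', 'SB' or 'G01.001' to its catalytic type name."""
    return CATALYTIC_TYPES[_type_letter(identifier)]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile used by the catalytic type of `identifier`."""
    letter = _type_letter(identifier)
    if letter in AMINO_ACID_NUCLEOPHILE:
        return "amino acid (acyl intermediate)"
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not defined for this type"


if __name__ == "__main__":
    for ident in ("G1", "S53", "SB", "G01.001"):
        print(ident, catalytic_type(ident), nucleophile(ident), sep=" | ")
```

    Keying on the first character only means family, clan and individual peptidase identifiers can all be passed to the same function; types U and P deliberately report no nucleophile, since the text does not assign one to them.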

    The peptidases in family G1 form a subset of what were formerly termed 'pepstatin-insensitive carboxyl proteinases'. After its discovery in about 1970, the pentapeptide pepstatin soon came to be thought of as a very general inhibitor of the endopeptidases that are active at acidic pH. More recently, however, several acid-acting endopeptidases from bacteria and fungi have been found to be resistant to pepstatin. The unusual active sites of these 'pepstatin-insensitive carboxyl proteinases' proved difficult to characterise, but it has now been established that the enzymes from bacteria are acid-acting serine peptidases in family S53 (clan SB), , whereas the fungal enzymes are in family G1 (formerly A4). The importance of glutamate ('E') and glutamine ('Q') residues in the active sites of the family G1 enzymes led to the family name, Eqolisin.

    \ \

    This group of glutamate/glutamine peptidases belongs to MEROPS peptidase family G1 (eqolisin family, clan GA). An example of this group is scytalidoglutamic peptidase. The proteins are thermostable, pepstatin-insensitive and active at low pH PUBMED:7674922. The enzyme has a unique heterodimeric structure, with a 39-residue light chain and a 173-residue heavy chain bound to each other non-covalently PUBMED:1918060. The tertiary structure of the active site of scytalidoglutamic peptidase (MEROPS G01.001) with a bound tripeptide product has been interpreted as showing that Glu136 is the primary catalytic residue. The most likely mechanism is suggested to be nucleophilic attack by a water molecule, activated by the Glu136 side chain, on the si-face of the scissile peptide bond carbon atom to form the tetrahedral intermediate. Electrophilic assistance and oxyanion stabilisation are provided by the side-chain amide of Gln53.

    \ \ \

    Both scytalidoglutamic peptidase (MEROPS G01.001) and aspergilloglutamic peptidase (MEROPS G01.002) cleave the Tyr26-Thr27 bond in the B chain of oxidized insulin, a bond not cleaved by other acid-acting endopeptidases. Scytalidoglutamic peptidase is most active on casein at pH 2 and is inhibited by 1,2-epoxy-3-(p-nitrophenoxy)propane (EPNP), a compound that also inhibits pepsin.

    \ 7934 IPR012526 \

    This family consists of antimicrobial peptides secreted by scorpions. Novel antimicrobial peptides have been isolated from scorpions, namely opistoporin PUBMED:12354111 and pandinin PUBMED:11563967. These peptides adopt essentially helical structures and show high antimicrobial activity against Gram-negative and Gram-positive bacteria, respectively.

    \ 648 IPR005055 \

    A class of small (14–20 kDa) water-soluble proteins called odorant-binding proteins (OBPs), first discovered in insect sensillar lymph and also found in the mucus of vertebrates, is postulated to mediate the solubilisation of hydrophobic odorant molecules and thereby facilitate their transport to the receptor neurons. OS-D, the product of a gene expressed in the olfactory system of Drosophila melanogaster, shares features common to vertebrate odorant-binding proteins but has a primary structure unlike theirs PUBMED:8206941. OS-D derivatives have subsequently been found in chemosensory organs of phylogenetically distinct insects, including cockroaches, phasmids and moths, suggesting that OS-D-like proteins are conserved among insects.

    \ 2324 IPR006506 \

    These sequences represent an equivalog of hypothetical proteins from gamma proteobacteria, including HI0040.

    \ 5363 IPR008451 \ This family consists of several ALT protein homologues found in nematodes. Lymphatic filariasis is a major tropical disease caused by the mosquito-borne nematodes Brugia and Wuchereria. About 120 million people are infected and at risk of lymphatic pathology such as acute lymphangitis and elephantiasis. Expression of alt-1 and alt-2 is initiated midway through development in the mosquito, peaks in the infective larva and declines sharply following entry into the host. ALT-1 and the closely related ALT-2 have been identified as strong candidates for a future vaccine against human filariasis PUBMED:10858234.\ 43 IPR007865 \

    This N-terminal domain is associated with the N-terminal region of aminopeptidase P (X-Pro aminopeptidase I and II, ), Xaa-Pro dipeptidase (prolidase, ) and related sequences. It is not found associated with the methionyl aminopeptidase 1 () or methionyl aminopeptidase 2 () families. The domain is structurally very similar to the creatinase N-terminal domain () PUBMED:9520390; however, little or no sequence similarity exists between the two domains.

    \ \

    The sequences belong to MEROPS peptidase family M24B, clan MG.

    \ 8018 IPR012643 \

    This family consists of the wound-inducible basic proteins from plants. The metabolic activities of plants are dramatically altered upon mechanical injury or pathogen attack, and a large number of proteins, such as the wound-inducible basic proteins, accumulate at wound or infection sites. These proteins are small (47 amino acids), have no signal peptide, and are hydrophilic and basic PUBMED:8310075.

    \ 6702 IPR009164 \ This group represents a fructose-1,6-bisphosphatase, Bacillus type PUBMED:9696785.\ 2933 IPR005051 \ The UL46 protein (VP11/12) is produced in the late phase of herpesvirus infection in a manner highly dependent on viral DNA synthesis, and is mainly located in the cytoplasm at the edge of the nucleus. It is a tegument phosphoprotein reported to modulate the activity of the UL48 (VP16, alpha-TIF) protein.\ 1524 IPR004867 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \ This short domain is found in members of glycoside hydrolase family 20 () and represents the C-terminal domain of chitobiases and beta-hexosaminidases. It is composed of a beta-sandwich structure PUBMED:8673609. The function of this domain is unknown. \ \ 5032 IPR001531 \

    Bacillus cereus contains a monomeric phospholipase C () (PLC) of 245 amino-acid residues. Although PLC prefers to act on phosphatidylcholine, it also shows weak catalytic activity with sphingomyelin and phosphatidylinositol PUBMED:2841128. Sequence studies have shown the protein to be similar both to the alpha toxins of Clostridium perfringens and Clostridium bifermentans, phospholipases C involved in haemolysis and cell rupture PUBMED:2536355, and to lecithinase from Listeria monocytogenes, which aids cell-to-cell spread by breaking down the double-membrane vacuoles that surround the bacterium during transfer PUBMED:1309513.

    \

    Each of these proteins is a zinc-dependent enzyme, binding 3 zinc ions per molecule PUBMED:2111259. The enzymes catalyse the conversion of phosphatidylcholine and water to 1,2-diacylglycerol and choline phosphate PUBMED:2841128, PUBMED:2536355, PUBMED:2111259.

    \

    In B. cereus, there are nine residues known to be involved in binding the zinc ions: 5 His, 2 Asp, 1 Glu and 1 Trp. These residues are all conserved in the Clostridium alpha-toxin.

    \ 887 IPR001104 \

    Synonym(s): Steroid 5-alpha-reductase

    \

    3-oxo-5-alpha-steroid 4-dehydrogenases catalyse the conversion of 3-oxo-5-alpha-steroid + acceptor to 3-oxo-delta(4)-steroid + reduced acceptor. The steroid 5-alpha-reductase enzyme is responsible for the formation of dihydrotestosterone, a hormone that promotes the differentiation of the male external genitalia and prostate during fetal development PUBMED:1686016. In humans, mutations in this enzyme can cause a form of male pseudohermaphroditism in which the external genitalia and prostate fail to develop normally. A related enzyme found in plants is DET2, a steroid reductase from Arabidopsis; mutations in DET2 cause defects in light-regulated development PUBMED:8602526. This domain is present in both the type 1 and type 2 forms.

    \ 7907 IPR012975 \

    This domain is found C-terminal to 1 or 2 domains PUBMED:15112237 in NONA and PSP1 proteins.

    \ 2707 IPR004379 \ UDP-galactopyranose mutase () is involved in the conversion of UDP-galactopyranose (UDP-Galp) into UDP-galactofuranose (UDP-Galf) through a 2-keto intermediate, and contains FAD as a cofactor. The gene is known as glf, ceoA, and rfbD. Its activity has been demonstrated experimentally in Escherichia coli, Mycobacterium tuberculosis, and Klebsiella pneumoniae.\ 347 IPR003761 \

    Exonuclease VII is composed of two types of non-identical subunit: one large subunit and four small ones PUBMED:6284744. Exonuclease VII catalyses exonucleolytic cleavage in either the 5'-3' or 3'-5' direction to yield 5'-phosphomononucleotides.

    \ 4357 IPR003519 \

    Salmonella typhimurium contains a 90kb plasmid that is associated with virulence. This plasmid encodes at least 6 genes needed by the bacterium for invading host macrophages during infection. These include the 70kDa mkaA protein PUBMED:2164511, a recognised virulence factor.

    \

    Deletion studies of the virulence plasmid have shown that an open reading frame encoding a 28kDa protein is needed for successful invasion of the host. This protein, designated mkfA PUBMED:2164511, VRP4 PUBMED:2696057 or VirA PUBMED:1657882 by different groups, is used by the microbe upon entry into macrophages, although the exact mechanism is unclear.

    \ 4528 IPR006088 \

    This family includes C-5 sterol desaturase and C-4 sterol methyl oxidase. Members of this family are involved in cholesterol biosynthesis and biosynthesis of a plant cuticular wax. These enzymes contain many conserved histidine residues. Members of this family are integral membrane proteins.

    \ 5644 IPR008863 \ This family consists of several prokaryotic TelA-like proteins. TelA and KlA are associated with tellurite resistance PUBMED:9406390 and plasmid fertility inhibition PUBMED:7665479.\ 1532 IPR007439 \

    This family represents the bacterial chemotaxis phosphatase, CheZ. This protein forms a dimer characterised by a long four-helix bundle, composed of two helices from each monomer. CheZ dephosphorylates CheY in a reaction that is essential to maintain a continuous chemotactic response to environmental changes. It is thought that CheZ's conserved residue Gln 147 orientates a water molecule for nucleophilic attack at the CheY active site.

    \ 5483 IPR008799 \ This family consists of several avirulence D (AvrD) proteins primarily found in Pseudomonas syringae PUBMED:10485919.\ 6074 IPR009360 \

    Isy1 protein is important in the optimisation of splicing PUBMED:10094305.

    \ 1536 IPR003517 \

    Three cysteine-rich proteins (also believed to be lipoproteins) make up the extracellular matrix of the Chlamydial outer membrane PUBMED:2287277. They are essential for the structural integrity of both the elementary body (EB) and reticulate body (RB) phases. As these bacteria lack the peptidoglycan layer common to most Gram-negative microbes, such proteins are highly important in the pathogenicity of the organism.

    \

    The largest of these is the major outer membrane protein (MOMP), which constitutes around 60% of the total protein of the membrane PUBMED:8477811. OMP2 is the second largest, with a molecular mass of 58kDa, while the OMP3 protein is ~15kDa PUBMED:2287277. MOMP is believed to elicit the strongest immune response, and has recently been linked to heart disease through its sequence similarity to a murine heart-muscle-specific alpha myosin PUBMED:10037605.

    \

    The OMP3 family plays a structural role in the outer membrane during the EB stage of the Chlamydial cell, and different biovars show small, yet highly significant, differences in peptide charge PUBMED:2287277. Members of this family are found in Chlamydia trachomatis, Chlamydia pneumoniae, and Chlamydia psittaci.

    \ 2379 IPR001913 \

    Equine arteritis virus small envelope glycoprotein (GS) is a class I transmembrane protein which adopts a number of different conformations PUBMED:8938984, PUBMED:7745690.

    \ 3989 IPR007498 \

    Paraquat is a superoxide radical-generating agent. The promoter of the paraquat-inducible pqiA gene is also inducible by other known superoxide generators PUBMED:7751275. This is predicted to be a family of integral membrane proteins, possibly located in the inner membrane. This family is related to NADH dehydrogenase subunit 2 ().

    \ 7575 IPR012921 \

    Spen (split end) proteins regulate the expression of key transcriptional effectors in diverse signalling pathways. They are large proteins characterised by N-terminal RNA-binding motifs and a highly conserved C-terminal SPOC (Spen paralog and ortholog C-terminal) domain. The function of the SPOC domain is unknown, but the SPOC domain of the SHARP Spen protein has been implicated in the interaction of SHARP with the SMRT/NcoR corepressor, where SHARP plays an essential role in the repressor complex PUBMED:12897056.

    \

    The SPOC domain is folded into a single compact domain consisting of a beta-barrel with seven strands framed by six alpha helices. A number of deep grooves and clefts in the surface, plus two nonpolar loops, render the SPOC domain well suited to protein-protein interactions; most of the conserved residues occur on the protein surface rather than in the core. Other proteins containing a SPOC domain include Drosophila Split ends, which promotes sclerite development in the head and restricts it in the thorax, and mouse MINT (homologue of SHARP), which is involved in skeletal and neuronal development via its repression of Msx2.

    \ 7888 IPR012639 \

    This family consists of the tryptophan operon leader peptides. The tryptophan operon is regulated by transcription attenuation in response to changes in the level of tryptophan. The leader transcript can adopt alternative, mutually exclusive secondary structures that result either in termination of transcription before the tryptophan structural genes or in transcription of the entire operon PUBMED:12213655.

    \ 2985 IPR001400 \

    Somatotropin is a hormone that plays an important role in growth control. It belongs to a family that includes choriomammotropin (lactogen), its placental analogue; prolactin, which promotes lactation in the mammary gland, and placental prolactin-related proteins; proliferin and proliferin-related protein; and somatolactin from various fish PUBMED:, PUBMED:2765528, PUBMED:1993170, PUBMED:2790033. The 3D structure of bovine somatotropin has been predicted using a combination of heuristics and energy minimisation PUBMED:2021631.

    \ 8139 IPR013190 \

    This putative domain is found at the C-terminus of glycosyl hydrolase family 98 proteins. It is not expected to contribute to catalytic activity.

    \ 7140 IPR009924 \

    This family consists of several hypothetical Caenorhabditis elegans proteins of around 85 residues in length. The function of this family is unknown.

    \ 4914 IPR004096 \ Central cellular functions such as metabolism, solute transport and signal transduction are regulated, in part, via binding of small molecules by specialized domains. The 4-vinyl reductase (4VR) domain is a predicted small-molecule-binding domain that may bind hydrocarbons PUBMED:11292341. Proteins that contain this domain include a regulator of the phenol catabolic pathway and a protein involved in chlorophyll biosynthesis.\ 2173 IPR007549 \

    This is a domain of uncharacterised prokaryotic proteins. It is often found C-terminal to the radical SAM domain ().

    \ 4217 IPR002132 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
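The elongation cycle described above can be summarised as a toy state model. The sketch below is an illustration under simplifying assumptions, not a kinetic or structural model: initiation is not modelled (every residue, including the first, arrives through the A site), decoding is assumed to be already done, and the class and function names (ToyRibosome, translate) are hypothetical. It only records where the growing chain sits and which elongation factor assists each step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRibosome:
    """Toy model: tracks the chain on the P-site peptidyl-tRNA and the factor-assisted steps."""
    p_site_chain: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def elongation_cycle(self, amino_acid: str) -> None:
        # 1. Aminoacyl-tRNA enters the A site as a complex with EF-Tu and GTP.
        a_site_chain = [amino_acid]
        self.log.append("EF-Tu")
        # 2. Peptidyl transfer: the chain on the P-site tRNA is moved onto the A-site
        #    aminoacyl-tRNA, extending it by one residue; the P-site tRNA is now deacylated.
        a_site_chain = self.p_site_chain + a_site_chain
        # 3. Translocation with EF-G and GTP: the new peptidyl-tRNA moves to the P site
        #    and the deacylated tRNA leaves through an exit site (not tracked here).
        self.p_site_chain = a_site_chain
        self.log.append("EF-G")


def translate(amino_acids: list[str]) -> ToyRibosome:
    """Run one elongation cycle per (already decoded) amino acid."""
    ribosome = ToyRibosome()
    for aa in amino_acids:
        ribosome.elongation_cycle(aa)
    return ribosome
```

Calling translate(["Met", "Ala", "Gly"]) leaves the three-residue chain on the P-site tRNA after three cycles, each assisted once by EF-Tu and once by EF-G.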

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:2198942, PUBMED:2016059, PUBMED:1840500, PUBMED:, groups:

    \ \

    L5 is a protein of about 180 amino-acid residues.

    \ 6172 IPR010465 \

    This domain is found in Diaphanous-related formins (Drfs). It binds the N-terminal GTPase-binding domain (GBD); this link is broken when GTP-bound Rho binds to the GBD and activates the protein. The addition of diaphanous autoregulatory domains (DAD) to mammalian cells induces actin filament formation, stabilises microtubules, and activates serum-response mediated transcription PUBMED:12676083.

    \ 5610 IPR008605 \ This family consists of several eukaryotic extracellular matrix protein 1 (ECM1) sequences. ECM1 has been shown to regulate endochondral bone formation, stimulate the proliferation of endothelial cells and induce angiogenesis. Mutations in the ECM1 gene can cause lipoid proteinosis, a disorder which causes generalised thickening of skin, mucosae and certain viscera. Classical features include beaded eyelid papules and laryngeal infiltration leading to hoarseness PUBMED:11929856.\ 5015 IPR002694 \ Zinc fingers are found in a wide variety of proteins, and are associated with DNA binding. There are several different types, and this family contains the CHC2-type zinc finger, which is found in bacteria and viruses.\ 376 IPR000069 \ Flaviviruses are small enveloped viruses with virions composed of three proteins called C, M and E PUBMED:8676481, PUBMED:7913359, PUBMED:8437237. The envelope glycoprotein M is made as a precursor, called prM. The precursor portion of the protein is the signal peptide for the protein's entry into the membrane. prM is cleaved to form M in a late-stage cleavage event, and this cleavage is associated with a change in the infectivity and fusion activity of the virus.\ 3629 IPR011615 \

    This domain is found in p53-like transcription factors, where it is responsible for DNA binding. These transcription factors play diverse roles in the regulation of cellular functions: for example, the p53 tumour suppressor upregulates the expression of genes involved in cell cycle arrest and apoptosis PUBMED:12826037. The DNA-binding domain acts to clamp, or in the case of TonEBP encircle, the DNA target in order to stabilize the protein-DNA complex PUBMED:11780147. Protein interactions may also serve to stabilize the protein-DNA complex; for example, in the STAT-1 dimer the SH2 (Src homology 2) domain in each monomer is coupled to the DNA-binding domain to increase stability PUBMED:9630226. The DNA-binding domain consists of a beta-sandwich formed of 9 strands in 2 sheets with a Greek-key topology. This structure is found in many transcription factors, often within the DNA-binding domain.

    \ 6354 IPR009491 \

    This family consists of several short, hypothetical bacterial proteins of unknown function.

    \ 4900 IPR004937 \ Members of this family transport urea across membranes. The family includes a bacterial homologue.\ \ 461 IPR003106 \

    This region is a plant-specific leucine zipper that is always found associated with a homeobox PUBMED:7915839.

    \ 7704 IPR012504 \

    Check - See: PUBMED:15231775

    \ 5061 IPR007898 \

    The protein Rrn10 has been identified as a component of the Upstream Activating Factor (UAF), an RNA polymerase I (pol I)-specific transcription stimulatory factor that recognizes the upstream ribosomal RNA (rRNA) gene promoter in a sequence-specific manner and stimulates rRNA synthesis PUBMED:12490702.

    \ 1990 IPR005181 \

    This domain is associated with proteins from viruses, bacteria and eukaryotes. In the latter two taxonomic groups some of the proteins are annotated as either sialic acid-specific 9-O-acetylesterase () or acetylxylan esterase related enzyme. The function of this domain is unknown.

    \ 1912 IPR003797 \ This family of proteins is related to DegV of Bacillus subtilis and includes paralogous sets in several species (B. subtilis, Deinococcus radiodurans, Mycoplasma pneumoniae) that are closer in percent identity to each other than to most homologs from other species. This suggests both recent paralogy and diversity of function.\ 6034 IPR010404 \

    This family consists of proteins of unknown function, around 200 amino acids in length. The proteins contain a conserved PYR motif in the N-terminal half that may be functionally important. The species distribution of the family is notable: so far it is restricted to cyanobacteria, cryptomonads and plants, which suggests that this protein may be involved in some aspect of a photosynthetic lifestyle.

    \ 427 IPR001223 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Some members of this family, , belong to the chitinase class II group, which includes chitinase, chitodextrinase and the killer toxin of Kluyveromyces lactis. The chitinases hydrolyse chitin oligosaccharides. The family also includes various glycoproteins from mammals; cartilage glycoprotein and the oviduct-specific glycoproteins are two examples.

    \ \ 4921 IPR003709 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
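The stringent motif abXHEbbHbc can be written as a regular expression to scan a protein sequence. The sketch below is an illustration only: the text does not list which residues count as "uncharged" or "hydrophobic", so the residue sets chosen here are assumptions, 'a' is restricted to Val/Thr (the text says only "most often"), and proline is excluded from every position. The test sequence and the helper name find_motif are hypothetical.

```python
import re

# Assumed residue classes (one-letter codes); proline is excluded everywhere,
# since the text states it is never found in this site.
POS_A = "[VT]"                 # 'a': most often Val or Thr (restricted to these here)
POS_B = "[ACFGILMNQSTVWY]"     # 'b': uncharged (assumed: not D, E, H, K, R; no P)
POS_C = "[ACFILMVWY]"          # 'c': hydrophobic (assumed set; no P)
ANY_BUT_PRO = "[^P]"           # 'X': any residue except proline

# Lookahead so that overlapping occurrences are all reported by start position.
STRINGENT = re.compile(
    f"(?={POS_A}{POS_B}{ANY_BUT_PRO}HE{POS_B}{POS_B}H{POS_B}{POS_C})"
)
LOOSE = re.compile("(?=HE..H)")  # plain HEXXH core


def find_motif(sequence: str, pattern: re.Pattern) -> list[int]:
    """Return 0-based start positions of every match of pattern in sequence."""
    return [m.start() for m in pattern.finditer(sequence.upper())]


hypothetical = "MKGVVAHELTHAVTDY"
print(find_motif(hypothetical, STRINGENT))  # start of the 10-residue stringent motif
print(find_motif(hypothetical, LOOSE))      # start of the HEXXH core
```

In the hypothetical sequence above, the stringent pattern starts three residues before the HEXXH core; replacing the Leu after HE with Pro removes the stringent match while leaving the plain HEXXH match in place, reflecting the observation that proline is excluded from this site.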

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    These proteins are metallopeptidases belonging to MEROPS peptidase family M15 (clan MD), subfamily M15B (vanY D-Ala-D-Ala carboxypeptidase) and M15C (Ply, L-alanyl-D-glutamate peptidase).

    \ \ \

    Acquired VanA- and VanB-type glycopeptide resistance in enterococci is due to synthesis of modified peptidoglycan precursors terminating in D-lactate. Unlike VanA-type strains, which are resistant to both vancomycin and teicoplanin, VanB-type strains remain teicoplanin-susceptible PUBMED:8631706. The vanY gene was found to be necessary for synthesis of the vancomycin-inducible D,D-carboxypeptidase activity previously proposed to be responsible for glycopeptide resistance. However, this activity was not required for peptidoglycan synthesis in the presence of glycopeptides PUBMED:1398115.

    \ \

    Bacteriophage lysins (Ply) or endolysins are phage-encoded cell wall lytic enzymes which are synthesised late during virus multiplication and mediate the release of progeny virions. Bacteriophages of the pathogen Listeria monocytogenes encode endolysin enzymes which specifically hydrolyse the cross-linking peptide bridges in Listeria peptidoglycan. Ply118 is a 30.8-kDa L-alanyl-D-glutamate peptidase and Ply511 (36.5 kDa) acts as an N-acetylmuramoyl-L-alanine amidase ().

    \ 5301 IPR008646 \ This family consists of several UL45 proteins found specifically in the herpes simplex virus family. The herpes simplex virus UL45 gene encodes an 18 kDa virion envelope protein whose function remains unknown. It has been suggested that the 18 kDa UL45 gene product is required for efficient growth in the central nervous system at low doses and may play an important role under the conditions of a naturally acquired infection PUBMED:11958453.\ 5069 IPR007906 \

    This family consists of the lactophorin precursors proteose peptone component 3 (PP3) and glycosylation-dependent cell adhesion molecule 1 (GlyCAM-1). GlyCAM-1 functions as a ligand for L-selectin, a saccharide-binding protein on the surface of circulating leukocytes, and mediates the trafficking of blood-borne lymphocytes into secondary lymph nodes. In this context, sulphation of the carbohydrates of GlyCAM-1 has been shown to be a critical structural requirement for recognition by L-selectin. GlyCAM-1 is also expressed in the pregnant and lactating mammary glands of the mouse, at an unknown site in the lung, in the bovine uterus and in the rat cochlea PUBMED:12057858.

    \ 40 IPR002508 \

    The cell wall envelope of Gram-positive bacteria is a macromolecular, exoskeletal organelle that is assembled and turned over at designated sites. The cell wall also functions as a surface organelle that allows Gram-positive pathogens to interact with their environment, in particular the tissues of the infected host. All of these functions require that surface proteins and enzymes be properly targeted to the cell wall envelope. Two basic mechanisms, cell wall sorting and targeting, have been identified. Cell wall sorting is the covalent attachment of surface proteins to the peptidoglycan via a C-terminal sorting signal that contains a consensus LPXTG sequence. More than 100 proteins that possess cell wall-sorting signals, including the M proteins of Streptococcus pyogenes, protein A of Staphylococcus aureus, and several internalins of Listeria monocytogenes, have been identified. Cell wall targeting involves the noncovalent attachment of proteins to the cell surface via specialised binding domains. Several of these wall-binding domains appear to interact with secondary wall polymers that are associated with the peptidoglycan, for example teichoic acids and polysaccharides. Proteins that are targeted to the cell surface include muralytic enzymes such as autolysins, lysostaphin, and phage lytic enzymes. Other examples of targeted proteins are the surface S-layer proteins of bacilli and clostridia, as well as virulence factors required for the pathogenesis of Listeria monocytogenes (internalin B) and Streptococcus pneumoniae (PspA) infections PUBMED:10066836.

    \

    Autolysin hydrolyses the link between N-acetylmuramoyl residues and L-amino acid residues in certain bacterial cell wall glycopeptides.

    \ 5930 IPR009292 \

    This is a family of eukaryotic proteins with unknown function.

    \ 6650 IPR009641 \

    This family contains a number of viral proteins of unknown function.

    \ 82 IPR003824 \ Bacitracin resistance protein (BacA) may confer resistance to bacitracin by phosphorylation of undecaprenol PUBMED:8389741.\ 1576 IPR000753 \

    Clusterin is a vertebrate glycoprotein PUBMED:1585460, the exact function of which is not yet clear. Clusterin expression is complex, with different forms appearing in different cell compartments. One set of proteins is directed for secretion, while other clusterin species are expressed in the cytoplasm and nucleus. The secretory form of the clusterin protein (sCLU) is targeted to the ER by an initial leader peptide. This ~60-kDa pre-sCLU protein is further glycosylated and proteolytically cleaved into alpha- and beta-subunits, held together by disulphide bonds. Extracellular sCLU is an 80-kDa protein and may act as a molecular chaperone, scavenging denatured proteins outside cells following specific stress-induced injury such as heat shock. sCLU possesses nonspecific binding activity to hydrophobic domains of various proteins in vitro PUBMED:12551933.

    \

    A specific nuclear form of CLU (nCLU) acts as a pro-death signal, inhibiting cell growth and survival. The nCLU protein has two coiled-coil domains: one at its N terminus that is unable to bind Ku70, and a C-terminal coiled-coil domain that is uniquely able to associate with Ku70 and is minimally required for cell death.

    \

    Clusterin is synthesized as a precursor polypeptide of about 400 amino acids, which is post-translationally cleaved to form two subunits of about 200 amino acids each. The two subunits are linked by five disulphide bonds to form an antiparallel ladder-like structure PUBMED:1491011. In each of the mature subunits, the five cysteines that are involved in disulphide bonds are clustered in domains of about 30 amino acids located in the central part of the subunits.

    \ 4347 IPR006454 \

    These sequences represent one of several families of proteins associated with the formation of prokaryotic S-layers. Members of this family are found in archaeal species, including Pyrococcus horikoshii (split into two tandem reading frames), Methanococcus jannaschii, and related species. Some local similarity can be found to other S-layer protein families.

    \ 4913 IPR004072 \

    G-protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \

    The rhodopsin-like GPCRs themselves represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices PUBMED:2111655, PUBMED:2830256, PUBMED:8386361.

    \

    Pheromones have evolved in all animal phyla to signal sex and dominance status, and are responsible for stereotypical social and sexual behaviour among members of the same species. In mammals, these chemical signals are believed to be detected primarily by the vomeronasal organ (VNO), a chemosensory organ located at the base of the nasal septum PUBMED:11163270. The VNO is present in most amphibia, reptiles and non-primate mammals but is absent in birds, adult catarrhine monkeys and apes PUBMED:10531049. An active role for the human VNO in the detection of pheromones is disputed; the VNO is clearly present in the foetus but appears to be atrophied or absent in adults. Three distinct families of putative pheromone receptors have been identified in the vomeronasal organ (V1Rs, V2Rs and V3Rs). All are G protein-coupled receptors but are only distantly related to the receptors of the main olfactory system, highlighting their different role PUBMED:11163270.

    \

    The V1 receptors share between 50 and 90% sequence identity but have little similarity to other families of G protein-coupled receptors. They appear to be distantly related to the mammalian T2R bitter taste receptors and the rhodopsin-like GPCRs PUBMED:10548735. In rat, the family comprises 30-40 genes. These are expressed in the apical regions of the VNO, in neurons expressing Gi2. Coupling of the receptors to this protein mediates inositol trisphosphate signalling PUBMED:11163270. A number of human V1 receptor homologues have also been found. The majority of these human sequences are pseudogenes PUBMED:11116092, but an apparently functional receptor has been identified that is expressed in the human olfactory system PUBMED:10973240.

    \ 8113 IPR013249 \

    Region 4 of sigma-70-like sigma factors is involved in binding to the -35 promoter element via a helix-turn-helix motif PUBMED:11931761.

    \ 2055 IPR004375 \

    This family consists of conserved hypothetical proteins, about 150 amino acids in length, with no known function. The family is restricted to the bacteria. It includes three members in Escherichia coli K12 and three in Streptococcus pneumoniae.

    \ 712 IPR004228 \

    Cryptophytes are unicellular photosynthetic algae that use a lumenally located light-harvesting system, which is distinct from the phycobilisome structure found in cyanobacteria and red algae. One of the key components of this system is water-soluble phycoerythrin (PE) 545, whose expression is enhanced by low light levels PUBMED:10430868. PE 545 is a heterodimer composed of two distinct alpha subunits, alpha(1) and alpha(2), and two beta subunits. Each alpha subunit carries a covalently linked 15,16-dihydrobiliverdin chromophore that probably acts as the final energy acceptor. The architecture of the heterodimer suggests that PE 545 may dock to an acceptor protein via a deep cleft, and that energy may be transferred via this intermediary protein to the reaction center PUBMED:10430868.

    \ \ 5997 IPR010385 \

    This family consists of several hypothetical proteins from Rhizobium meliloti, Rhizobium loti and Agrobacterium tumefaciens. The function of this family is unknown.

    \ 3718 IPR001872 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
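    The first-character convention described above lends itself to a simple lookup. The sketch below is a minimal illustration, assuming identifiers are written as they appear in this text (for example `A8` or `M12A` for families, `AC` or `MJ` for clans); the helper names `catalytic_type` and `uses_water_nucleophile` are hypothetical and not part of any MEROPS tool.

```python
# Catalytic-type codes: the first character of a MEROPS family or clan identifier.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clans containing families of more than one type
}

# Types whose nucleophile is an activated water molecule (per the text above).
WATER_NUCLEOPHILE_TYPES = {"A", "G", "M"}


def catalytic_type(merops_id: str) -> str:
    """Return the catalytic type encoded by an identifier such as 'A8', 'M12A' or 'AC'."""
    code = merops_id.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type in {merops_id!r}")
    return CATALYTIC_TYPES[code]


def uses_water_nucleophile(merops_id: str) -> bool:
    """True for aspartic, glutamic and metallo peptidase identifiers."""
    return merops_id.strip()[:1].upper() in WATER_NUCLEOPHILE_TYPES


if __name__ == "__main__":
    for ident in ("A8", "M11", "M12A", "AC"):
        print(ident, catalytic_type(ident), uses_water_nucleophile(ident))
```

    Only the first character is inspected, so subfamily suffixes such as the trailing `A` in `M12A` and clan qualifiers such as `MA(M)` do not affect the result.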

    \ \

    Aspartic endopeptidases () of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned into clans (proteins which are evolutionarily related), and further subdivided into families, largely on the basis of their tertiary structure.

    \

    This group of aspartic peptidases belongs to the MEROPS peptidase family A8 (signal peptidase II family, clan AC). The catalytic residues have not been identified, but three conserved aspartates can be identified from sequence alignments. The type example is the Escherichia coli lipoprotein signal peptidase, or SPase II (). This enzyme recognises a conserved sequence and cuts in front of a cysteine residue to which a glyceride-fatty acid lipid is attached. SPase II is an integral membrane protein.

    \ \ \

    Bacterial cell walls contain large amounts of murein lipoprotein, a small protein that is both N-terminally bound to lipid and attached to membrane peptidoglycan (murein) through the epsilon-amino group of its C-terminal lysine residue PUBMED:7674916. Secretion of this lipoprotein is facilitated by the action of lipoprotein signal peptidase (also known as leader peptidase II), located in the inner membrane PUBMED:7674916, PUBMED:6368552. The enzyme is inhibited by globomycin and also by pepstatin, suggesting that it is an aspartic peptidase PUBMED:7674916.

    \ \ 8080 IPR013266 \

    PdT-3 (Tryptophyllin-3 peptide) is a subfamily of the Tryptophyllin family, within the FSAP (Frog Skin Active Peptide) superfamily. Members were originally identified in skin extracts of Neotropical leaf frogs (Phyllomedusa sp.). Peptides in this subfamily have an average length of 13 amino acids. The pharmacological activity of the tryptophyllins remains to be established PUBMED:14687697, but these peptides appear to act on liver protein synthesis and body weight PUBMED:3831963.

    \ 7406 IPR011499 \

    This domain is found at the N terminus of a group of Chlamydial lipid A biosynthesis proteins. It is also found by itself in a family of proteins of unknown function.

    \ 7460 IPR011483 \

    This is a family of proteins found in Rhodopirellula baltica that are predicted to be secreted. Also, a member has been identified in Caulobacter crescentus (). These proteins may be related to .

    \ 5747 IPR008686 \ This family consists of several Mitovirus RNA-dependent RNA polymerase proteins. The family also contains fragment matches in the mitochondria of Arabidopsis thaliana PUBMED:9657003.\ 2103 IPR005939 \

    This is a family of largely hypothetical proteins of unknown function.\

    \ \ 4885 IPR002019 \

    Urease is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and ammonia PUBMED:3402446: urea + H2O = CO2 + 2 NH3. Historically, it was the first enzyme to be crystallized (in 1926). It is mainly found in plant seeds and microorganisms. In plants, urease is a hexamer of identical chains. In bacteria PUBMED:2651866, it consists of either two or three different subunits (alpha; beta, described in this entry; and gamma). The structure of the urease complex is known PUBMED:7754395.
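    Urease hydrolyses one molecule of urea with one molecule of water to give carbon dioxide and two molecules of ammonia; a quick atom count confirms that this stoichiometry is balanced. A minimal sketch: the elemental compositions are entered by hand and the `side_total` helper is a hypothetical name introduced only for this example.

```python
from collections import Counter

# Elemental composition of each species, entered by hand for this check.
UREA = Counter({"C": 1, "H": 4, "N": 2, "O": 1})  # CO(NH2)2
WATER = Counter({"H": 2, "O": 1})
CO2 = Counter({"C": 1, "O": 2})
NH3 = Counter({"N": 1, "H": 3})


def side_total(*terms):
    """Sum atom counts over (stoichiometric coefficient, composition) pairs."""
    total = Counter()
    for coeff, formula in terms:
        for element, n in formula.items():
            total[element] += coeff * n
    return total


reactants = side_total((1, UREA), (1, WATER))
products = side_total((1, CO2), (2, NH3))
print(reactants == products)  # True: C1 H6 N2 O2 on both sides
```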

    \ This subunit does not appear to take part in the catalytic mechanism. \ This subunit is known (confusingly) as alpha in Helicobacter.\ 7274 IPR010891 \

    This family contains the bacterial protein GumN (approximately 330 residues long). Note that many members of this family are hypothetical proteins.

    \ 4927 IPR002499 \

    Vaults are the largest ribonucleoprotein particles known, with a mass of approximately 13 MDa. Their function has not been determined PUBMED:10196123. This family corresponds to a repeat found in the amino-terminal half of the major vault protein (MVP, or lung resistance-related protein), which has a mass of 100 kDa.

    \ \

    The 13 MDa mammalian vault structure is highly regular and consists of approximately 96 molecules of the 100 kDa major vault protein (MVP), 2 molecules of the 240 kDa minor vault protein TEP1, 8 molecules of the 193 kDa minor vault protein VPARP and at least 6 copies of a small untranslated RNA of 88 to 141 bases. The MVP molecules form the core of the complex, which is a barrel-like structure with an invaginated waist and two protruding caps. The complex can unfold into two symmetrical flower-like structures with 8 petals each, each petal supposedly consisting of 6 MVP molecules PUBMED:10196123.
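    The copy numbers and masses quoted above can be cross-checked with simple arithmetic. This sketch reuses only the approximate figures given in the text: the petal model accounts for all 96 MVP molecules, and the three listed proteins add up to roughly 11.6 MDa, somewhat below the ~13 MDa quoted for the whole particle (the vault RNA is not included in this sum). The variable names are illustrative only.

```python
# Copy number and mass (kDa) of each protein component, as listed in the text.
COMPONENTS = {
    "MVP": (96, 100),
    "TEP1": (2, 240),
    "VPARP": (8, 193),
}

# Two flower-like halves, 8 petals each, ~6 MVP molecules per petal.
HALVES, PETALS_PER_HALF, MVP_PER_PETAL = 2, 8, 6
mvp_from_petals = HALVES * PETALS_PER_HALF * MVP_PER_PETAL
print(mvp_from_petals == COMPONENTS["MVP"][0])  # True: 2 x 8 x 6 = 96

# Mass contributed by each protein component, converted from kDa to MDa.
protein_mda = {name: copies * kda / 1000 for name, (copies, kda) in COMPONENTS.items()}
total_protein_mda = sum(protein_mda.values())
print(round(total_protein_mda, 2))  # 11.62
```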

    \ \

    Although all vault components have been identified and characterized, and a model of the vault complex has been determined at 31-Angstrom resolution PUBMED:10196123, little is known about vault assembly. MVP molecules interact with each other via their coiled-coil domain. Purified MVP is able to bind calcium, as it contains typical calcium-binding EF-hands. No interactions have been demonstrated between TEP1 and other vault proteins. However, the N-terminal half of MVP binds to a specific domain in the C terminus of VPARP. Furthermore, VPARP (e.g. ) contains amino acid stretches mediating intramolecular binding and a distinct domain with similarity to the catalytic domain of poly(ADP-ribose) polymerase, . MVP (lung resistance-related protein) is overexpressed in many multidrug-resistant cancer cells.

    \ \

    TEP1 (e.g. ) has a mass of 240 kDa and, in addition to being a vault component, is also a telomerase-associated component. The large number of WD40 repeats, , in the C terminus of TEP1 would suit a structural or organizing role for this protein in the vault. The sharing of TEP1 between vaults and telomerase suggests that it may play a common role in some aspect of ribonucleoprotein structure, function or assembly.

    \ 5049 IPR007524 \ This region is found N-terminal to the pectate lyase domain () in some plant pectate lyase enzymes.\ 1739 IPR003332 \ Decorin is a proteoglycan that decorates collagen fibres. Borrelia burgdorferi causes Lyme disease, a tick-borne infection that can develop into a chronic, multisystemic disorder. Decorin may mediate the adherence of B. burgdorferi to collagen fibers in skin and other tissues PUBMED:7642279. Borrelia burgdorferi decorin-binding protein A (DbpA) facilitates this binding PUBMED:9784533.\ 1650 IPR003205 \

    Cytochrome c oxidase () is an oligomeric enzymatic complex which is a component of the respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen PUBMED:6307356. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane.

    \

    In eukaryotes, in addition to the three large subunits, I, II and III, that form the catalytic center of the enzyme complex, there is a variable number of small polypeptide subunits. This family is composed of cytochrome c oxidase subunit VIII.

    \ 3062 IPR004281 \ Interleukin 12 (IL-12) is a disulphide-bonded heterodimer consisting of a 35 kDa alpha subunit and a 40 kDa beta subunit. It is involved in the stimulation and maintenance of Th1 cellular immune responses, including the normal host defence against various intracellular pathogens, such as Leishmania, Toxoplasma, measles virus and HIV. IL-12 also has an important role in pathological Th1 responses, such as in inflammatory bowel disease and multiple sclerosis. Suppression of IL-12 activity in such diseases may have therapeutic benefit. On the other hand, administration of recombinant IL-12 may have therapeutic benefit in conditions associated with pathological Th2 responses PUBMED:11422900, PUBMED:9597139.\ 7941 IPR012517 \

    This family consists of lactocin 705, a bacteriocin produced by Lactobacillus casei CRL 705. Lactocin 705 is a class IIb bacteriocin whose activity depends on the complementation of two peptides (705-alpha and 705-beta) of 33 amino acid residues each. Lactocin 705 is active against several Gram-positive bacteria, including food-borne pathogens, and is a good candidate for use in the biopreservation of fermented meats PUBMED:10754241.

    \ 2897 IPR003363 \

    Glycoprotein G (gG) is one of the seven external glycoproteins of HSV1 and HSV2 PUBMED:3027242. This family also contains the glycoprotein GX (gX) initially identified in Pseudorabies virus. In HSV2-infected cells, gG-2 is cleaved into a secreted amino-terminal portion (sgG-2) and a carboxy-terminal portion. The latter protein is further O-glycosylated, generating the cell membrane-associated mature gG-2 (mgG-2). The mgG-2 protein has been widely used as a prototype antigen for detection of type-specific antibodies against HSV2 PUBMED:12904375.

    \ 7221 IPR009976 \

    This family contains the Sec10 component (approximately 650 residues long) of the eukaryotic exocyst complex, which specifically affects the synthesis and delivery of secretory and basolateral plasma membrane proteins PUBMED:12665531.

    \ 7543 IPR011692 \

    This family of plant proteins has been implicated in nodule development in the legume Medicago truncatula PUBMED:8634476. Northern blot analysis showed that MtN-19 is induced during nodulation PUBMED:8634476. The molecular function of these proteins is unknown.

    \ 3589 IPR007753 \ Orbiviruses are double-stranded RNA viruses, and bluetongue virus is a member of this genus. The core of bluetongue virus (BTV) is a multienzyme complex composed of two major proteins (VP7 and VP3) and three minor proteins (VP1, VP4 and VP6) in addition to the viral genome. VP4 has been shown to perform all RNA capping activities and has both methyltransferase type 1 and type 2 activities associated with it PUBMED:9811835.\ 3979 IPR007031 \ Members of this family are approximately 26 kDa and are involved in trans-activation of late transcription PUBMED:8523544.\ 3815 IPR005003 \

    This is a repeat found in the tail fibres of many bacteriophage and homologous bacterial proteins.

    \ 233 IPR003735 \

    This entry describes proteins of unknown function.

    \ 3804 IPR005845 \

    Phosphoglucomutase (, PGM) is an enzyme responsible for the conversion of D-glucose 1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose. Phosphomannomutase (, PMM) is an enzyme responsible for the conversion of D-mannose 1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria.

    \

    This domain is contained in both proteins.

    \ 107 IPR000584 \

    Ca2+ ions are unique in that they not only carry charge but are also the most widely used diffusible second messengers. Voltage-dependent Ca2+ channels (VDCC) are a family of molecules that allow cells to couple electrical activity to intracellular Ca2+ signalling. The opening and closing of these channels by depolarizing stimuli, such as action potentials, allows Ca2+ ions to enter neurons down a steep electrochemical gradient, producing transient intracellular Ca2+ signals. Many of the processes that occur in neurons, including transmitter release, gene transcription and metabolism, are controlled by Ca2+ influx occurring simultaneously at different cellular locales. The activity of the channel pore is modulated by 4 tightly coupled subunits: an intracellular beta subunit; a transmembrane gamma subunit; and a disulphide-linked complex of alpha-2 and delta subunits, which are proteolytically cleaved from the same gene product.

    \ \

    Voltage-gated calcium channels are classified as T, L, N, P, Q and R, and are distinguished by their sensitivity to pharmacological blocks, single-channel conductance kinetics, and voltage-dependence. On the basis of their voltage activation properties, the voltage-gated calcium channel classes can be further divided into two broad groups: the low (T-type) and high (L, N, P, Q and R-type) threshold-activated channels PUBMED:.
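    The two-group split above can be encoded as a small lookup table. This is an illustrative sketch only: the `Threshold` enum and `threshold_group` helper are hypothetical names, and the table holds nothing beyond the grouping stated in the text.

```python
from enum import Enum


class Threshold(Enum):
    """Activation-threshold groups named in the text."""
    LOW = "low-threshold"
    HIGH = "high-threshold"


# Channel class -> threshold group: T is low-threshold, the rest high-threshold.
CHANNEL_THRESHOLD = {
    "T": Threshold.LOW,
    **{cls: Threshold.HIGH for cls in ("L", "N", "P", "Q", "R")},
}


def threshold_group(channel_class: str) -> Threshold:
    """Look up a class written as 'L' or 'L-type' (case-insensitive)."""
    key = channel_class.strip().upper()
    if key.endswith("-TYPE"):
        key = key[: -len("-TYPE")]
    return CHANNEL_THRESHOLD[key]


if __name__ == "__main__":
    for name in ("T-type", "L-type", "n"):
        print(name, threshold_group(name).value)
```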

    \

    L-type calcium channels are formed from different alpha-1 subunit isoforms that determine the pharmacological properties of the channel, since they form the drug-binding domain. Other properties, such as gating voltage-dependence, G protein modulation and kinase susceptibility, are influenced by alpha-2, delta and beta subunits.

    There are four distinct beta subunits: beta-1, beta-2, beta-3 and beta-4, and the magnitude of the shift they produce in the voltage-dependence of activation varies with the particular subtype PUBMED:9153247.

    \ 2263 IPR006855 \

    This region of unknown function is found at the C terminus of Neurospora crassa acetylglutamate synthase (). It is also found C-terminal to the amino acid kinase region in some fungal acetylglutamate kinase enzymes (). These enzymes play a role in arginine biosynthesis.

    \ 3640 IPR004963 \ This family contains a number of uncharacterised proteins. Some of these are thought to be putative pectinacetylesterases.\ 1545 IPR005649 \ The chorion genes of Drosophila are amplified in response to developmental signals in the follicle cells of the ovary PUBMED:1908228.\ 1257 IPR007844 \ The AsmA protein is involved in the assembly of outer membrane proteins in Escherichia coli PUBMED:8866482. AsmA mutations were isolated as extragenic suppressors of an OmpF assembly mutant PUBMED:7476172. AsmA may have a role in LPS biogenesis PUBMED:7476172.\ 482 IPR006121 \

    Proteins that transport heavy metals in micro-organisms and mammals share similarities in their sequences and structures.

    \

    These proteins provide an important focus for research: some are involved in bacterial resistance to toxic metals, such as lead and cadmium, while others are involved in inherited human syndromes, such as Wilson's and Menkes diseases PUBMED:8091505.

    \

    A conserved domain has been found in a number of these heavy metal transport or detoxification proteins PUBMED:8091505. The domain, which has been termed Heavy-Metal-Associated (HMA), contains two conserved cysteines that are probably involved in metal binding.

    \

    \ The solution structure of the fourth HMA domain of the Menkes copper-transporting ATPase shows a well-defined structure comprising a four-stranded antiparallel beta-sheet and two alpha helices packed in an alpha-beta sandwich fold PUBMED:9437429. This fold is common to other domains and is classified as "ferredoxin-like".

    \ \ 4558 IPR001050 \ \ Syndecans are a family of transmembrane heparan sulphate proteoglycans which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a variety of molecules via their heparan sulphate chains and can act as receptors or as co-receptors PUBMED:1335744, PUBMED:8370471.\ \ 3683 IPR002015 \ This is a weakly conserved repeat module of unknown function, which occurs in two regulatory subunits of the 26S proteasome and in one subunit of the APC complex (cyclosome) PUBMED:9204704.\ 8057 IPR013156 \

    Pseudins are a subfamily of the FSAP family (Frog Secreted Active Peptides), extracted from the skin of the paradoxical frog Pseudis paradoxa (Pseudidae). The pseudins belong to the class of cationic, amphipathic alpha-helical antimicrobial peptides PUBMED:11689009.

    \ 3445 IPR007696 \

    This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with MutS_V, MutS_II, MutS_I and MutS_IV. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH 1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in hereditary non-polyposis colorectal carcinoma (HNPCC) and is a mismatch-binding protein. The aligned region corresponds to domain III, which is central to the structure of Thermus aquaticus MutS.

    \ 7562 IPR011707 \

    Copper is one of the most prevalent transition metals in living organisms and its biological function is intimately related to its redox properties. Since free copper is toxic, even at very low concentrations, its homeostasis in living organisms is tightly controlled by subtle molecular mechanisms. In eukaryotes, before being transported inside the cell via the high-affinity copper transporters of the CTR family, the copper (II) ion is reduced to copper (I). In blue copper proteins such as the cupredoxins, the copper (I) form is stabilised by a constrained His2Cys coordination environment.

    Multicopper oxidases PUBMED:2404764, PUBMED:1995346 are enzymes that possess three spectroscopically different copper centres. These centres are called type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). Structurally, these proteins contain a cupredoxin-like fold, a beta-sandwich consisting of 7 strands in 2 beta-sheets, arranged in a Greek-key beta-barrel PUBMED:11867755.

    \ 411 IPR007123 \

    Gelsolin is a cytoplasmic, calcium-regulated, actin-modulating protein that binds to the barbed ends of actin filaments, preventing monomer exchange (end-blocking or capping) PUBMED:3023087. It can promote nucleation (the assembly of monomers into filaments), as well as sever existing filaments. In addition, this protein binds with high affinity to fibronectin. Plasma gelsolin and cytoplasmic gelsolin are derived from a single gene by alternative initiation sites and differential splicing.

    \

    Sequence comparisons indicate an evolutionary relationship between gelsolin, villin, fragmin and severin PUBMED:2850369. Six large repeating segments occur in gelsolin and villin, and 3 similar segments in severin and fragmin. While the multiple repeats have yet to be related to any known function of the actin-severing proteins, the superfamily appears to have evolved from an ancestral sequence of 120 to 130 amino acid residues PUBMED:2850369.

    \ 837 IPR001638 \

    Bacterial high affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. The protein components of these traffic systems include one or two transmembrane protein components, one or two membrane-associated ATP-binding proteins (ABC transporters; see ) and a high affinity periplasmic solute-binding protein. The latter are thought to bind the substrate in the vicinity of the inner membrane, and to transfer it to a complex of inner membrane proteins for concentration into the cytoplasm.

    \

    In Gram-positive bacteria, which are surrounded by a single membrane and therefore have no periplasmic region, the equivalent proteins are bound to the membrane via an N-terminal lipid anchor. These homologous proteins do not play an integral role in the transport process per se, but probably serve as receptors to trigger or initiate translocation of the solute through the membrane by binding to external sites of the integral membrane proteins of the efflux system.

    \

    In addition, at least some solute-binding proteins function in the initiation of sensory transduction pathways.

    \

    On the basis of sequence similarities, the vast majority of these solute-binding proteins can be grouped PUBMED:8336670 into eight families or clusters, which generally correlate with the nature of the solute bound.

    \

    Family 3 groups together periplasmic proteins that bind specific amino acids and opines, and a periplasmic homolog with catalytic activity.

    \ 2427 IPR001489 \

    Prokaryotic heat-stable enterotoxins are responsible for acute diarrhea PUBMED:3552731. The active toxin is a short peptide of around twenty residues which contains six cysteines involved in three disulphide bonds.

    \ 7030 IPR009852 \

    This entry represents the C terminus (approximately 180 residues) of eukaryotic T-complex protein 10. The T-complex is involved in spermatogenesis in mice PUBMED:12068715.

    \ 810 IPR001352 \

    Ribonuclease HII degrades the ribonucleotide moiety of RNA-DNA hybrid molecules by endonucleolytic cleavage, producing 5'-phosphomonoesters. Proteins belonging to this family have been found in bacteria, archaea and yeasts. This family also includes ribonuclease HIII.

    \ 1186 IPR006680 \

    This group of enzymes represents a large metal-dependent hydrolase superfamily PUBMED:8550522. The family includes adenine deaminase (), which hydrolyses adenine to form hypoxanthine and ammonia. The adenine deaminase reaction is important for adenine utilization as a purine and also as a nitrogen source PUBMED:9144792. This family also includes dihydroorotase and N-acetylglucosamine-6-phosphate deacetylases (). These enzymes catalyse the reaction: This family includes dihydroorotase and urease, which belong to MEROPS peptidase family M38 (beta-aspartyl dipeptidase, clan MJ), where they are classified as non-peptidase homologs.

    \ 2614 IPR003494 \ FtsA is essential for bacterial cell division and co-localizes to the septal ring with FtsZ. It has been suggested that the FtsA-FtsZ interaction has arisen through coevolution in different bacterial strains PUBMED:9352931.\ 8116 IPR013242 \

    This region defines single-domain aspartyl proteases from retroviruses, retrotransposons, and badnaviruses (plant dsDNA viruses). These proteases are generally part of a larger polyprotein, usually pol and more rarely gag. Retroviral proteases appear to be homologous to a single domain of the two-domain eukaryotic aspartyl proteases.

    \ 6464 IPR010587 \

    This family consists of several uncharacterised proteins from Melanoplus sanguinipes entomopoxvirus (MsEPV). The function of this family is unknown.

    \ 4262 IPR003117 \

    In the absence of cAMP, Protein Kinase A (PKA) exists as an equimolar tetramer of regulatory (R) and catalytic (C) subunits PUBMED:11734894. In addition to its role as an inhibitor of the C subunit, the R subunit anchors the holoenzyme to specific intracellular locations and prevents the C subunit from entering the nucleus. All R subunits have a conserved domain structure consisting of the N-terminal dimerization domain, inhibitory region, cAMP-binding domain A and cAMP-binding domain B. R subunits interact with C subunits primarily through the inhibitory site. The cAMP-binding domains show extensive sequence similarity and bind cAMP cooperatively.

    \ \

    Two types of R subunit exist, Type I and Type II, which differ in molecular weight, sequence, autophosphorylation capability, cellular location and tissue distribution. Types I and II are further subdivided into alpha and beta subtypes, based mainly on sequence similarity. This entry covers the region of RII alpha (the regulatory subunit of type II PKA) that contains the dimerisation interface and the binding site for A-kinase-anchoring proteins (AKAPs).

    \ 2928 IPR005208 \

    This is a family of Herpesvirus proteins, including UL33 and UL51. The proteins in this family are involved in packaging viral DNA.

    \ 855 IPR003582 \

    The ShK toxin domain is found in metridin, a toxin from Metridium senile (brown sea anemone), and several hypothetical proteins from Caenorhabditis elegans.

    A number of the proteins in this group are metallopeptidases belonging to MEROPS peptidase families M10A, M12A and M14A, the majority belonging to M12A, the astacin/adamalysin family of metallopeptidases.

    \ 5711 IPR008573 \ This family consists of several baculovirus proteins of around 130 residues in length. The function of this family is unknown.\ 944 IPR002547 \ This domain is found in prokaryotic methionyl-tRNA synthetases, prokaryotic phenylalanyl-tRNA synthetases, the yeast GU4 nucleic-binding protein (G4p1 or p42, ARC1) PUBMED:8895587, human tyrosyl-tRNA synthetase PUBMED:9162081, and endothelial-monocyte activating polypeptide II. G4p1 binds specifically to tRNA and forms a complex with methionyl-tRNA synthetases PUBMED:8895587. In human tyrosyl-tRNA synthetase this domain may direct tRNA to the active site of the enzyme PUBMED:8895587. This domain may perform a common function in tRNA aminoacylation PUBMED:9162081.\ 6028 IPR010399 \

    This short motif is found in a variety of plant transcription factors that contain GATA domains as well as other motifs. The most conserved amino acids form the pattern TIFF/YXG. This domain may be involved in binding DNA.
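    The TIFF/YXG consensus can be scanned for with a regular expression, reading the slash as "F or Y" and X as any residue. A minimal sketch: the pattern name and the `find_tiff_y_xg` helper are hypothetical, the toy sequence is invented for illustration, and the regex matches only this exact consensus, so degenerate variants will be missed.

```python
import re

# TIFF/YXG: Thr-Ile-Phe, then Phe or Tyr, then any residue (X), then Gly.
TIFF_Y_XG = re.compile(r"TIF[FY].G")


def find_tiff_y_xg(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in TIFF_Y_XG.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any real protein.
    print(find_tiff_y_xg("MSTIFYAGKR"))  # [(2, 'TIFYAG')]
```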

    \ 4397 IPR007672 \ SelP is the only known eukaryotic selenoprotein that contains multiple selenocysteine (Sec) residues, and accounts for more than 50% of the selenium content of rat and human plasma PUBMED:10775431. It is thought to be glycosylated PUBMED:11168591. SelP may have antioxidant properties. It can attach to epithelial cells, and may protect vascular endothelial cells against peroxynitrite toxicity PUBMED:10775431. The high selenium content of SelP suggests that it may be involved in intercellular selenium transport or storage PUBMED:11168591. The promoter structure of bovine SelP suggests that it may be involved in countering heavy metal intoxication, and may also have a developmental function PUBMED:9358058. The N-terminal region always contains one Sec residue, and this is separated from the C-terminal region (9-16 Sec residues) by a histidine-rich sequence PUBMED:11168591. The large number of Sec residues in the C-terminal portion of SelP suggests that it may be involved in selenium transport or storage. However, it is also possible that this region has a redox function PUBMED:11168591.\ 5376 IPR008752 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
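    Both the loose HEXXH motif and the more stringent abXHEbbHbc form can be written as regular expressions. The text does not define "uncharged" or "hydrophobic", so the residue sets below are assumptions; position 'a' is restricted to Val or Thr even though the text only says these are the most frequent residues there; and Pro is excluded at every variable position, following the observation that proline never occurs in the site. The helper name `residue_class` and the two test peptides are hypothetical.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Assumptions, not definitions from the text: "charged" = D, E, K, R;
# "hydrophobic" = A, V, L, I, M, F, W.
CHARGED = set("DEKR")
HYDROPHOBIC = set("AVLIMFW")


def residue_class(residues) -> str:
    """Regex character class for the given residues, always excluding proline."""
    return "[" + "".join(sorted(set(residues) - {"P"})) + "]"


A = residue_class("VT")                   # 'a': restricted here to Val or Thr
B = residue_class(AMINO_ACIDS - CHARGED)  # 'b': uncharged
C = residue_class(HYDROPHOBIC)            # 'c': hydrophobic
X = residue_class(AMINO_ACIDS)            # 'X': any residue except Pro

HEXXH = re.compile(r"HE..H")  # loose motif, any residue at X
STRICT = re.compile(A + B + X + "HE" + B + B + "H" + B + C)  # abXHEbbHbc

if __name__ == "__main__":
    for seq in ("KVVAHELTHAVG", "KVVAHEPTHAVG"):  # hypothetical peptides
        loose, strict = HEXXH.search(seq), STRICT.search(seq)
        print(seq, loose and loose.group(), strict and strict.group())
```

    The second peptide satisfies the loose HEXXH pattern but not the stringent one, because it carries a proline at the first 'b' position.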

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M11 (gametolysin family, clan MA(M)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active site residues for members of this family and thermolysin occur in the motif HEXXH PUBMED:7674922.

    The type example is gametolysin from the unicellular biflagellated alga Chlamydomonas reinhardtii. Gametolysin is a zinc-containing metalloprotease that is responsible for the degradation of the cell wall. Homologues of gametolysin have also been reported in the simple multicellular organism Volvox PUBMED:11489172, PUBMED:11680823.\ 371 IPR001179 \

    Synonym(s): Peptidylprolyl cis-trans isomerase

    \ \ In vertebrates, FKBP-type peptidylprolyl isomerases () are receptors for the two immunosuppressants FK506 and rapamycin. The drugs inhibit T cell proliferation by arresting two distinct cytoplasmic signal transmission pathways. Peptidylprolyl isomerases accelerate protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides. These proteins are found in a variety of organisms.\ 672 IPR000084 \ This family is named after a PE motif near the amino terminus. The carboxyl termini of members of this family are variable and fall into several classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation of Mycobacterium tuberculosis PUBMED:9634230.\ 1598 IPR002102 \

    Cohesin domains interact with a complementary domain, termed the dockerin domain (see ). The cohesin-dockerin interaction is the crucial interaction for complex formation in the cellulosome PUBMED:9083107.

    \ \ \

    The scaffoldin component of the cellulolytic bacterium Clostridium thermocellum is a non-hydrolytic protein that organizes the hydrolytic enzymes into a large complex, called the cellulosome. Scaffoldin comprises a series of functional domains, among which are a single cellulose-binding domain and nine cohesin domains, which are responsible for integrating the individual enzymatic subunits into the complex.

    \ 6849 IPR010743 \

    This family consists of several bacterial and one archaeal methionine biosynthesis MetW proteins. Biosynthesis of methionine from homoserine in Pseudomonas putida takes place in three steps. The first step is the acylation of homoserine to yield an O-acyl-L-homoserine. This reaction is catalysed by the products of the metXW genes and is equivalent to the first step in enterobacteria, Gram-positive bacteria and fungi, except that in these microorganisms the reaction is catalysed by a single polypeptide (the product of the metA gene in Escherichia coli and the met5 gene product in Neurospora crassa). In Pseudomonas putida, as in Gram-positive bacteria and certain fungi, the second and third steps are a direct sulphydrylation, which converts the O-acyl-L-homoserine into homocysteine, followed by a methylation that yields methionine. The latter reaction can be mediated by either of the two methionine synthetases present in the cells PUBMED:11479715.

    \ \ 6346 IPR009488 \

    This family consists of several hypothetical proteins of unknown function which appear to be found exclusively in Helicobacter pylori.

    \ 56 IPR007239 \ Apg5p is directly required for the import of aminopeptidase I via the cytoplasm-to-vacuole targeting pathway PUBMED:10712513.\ 2780 IPR006813 \ This family represents beta-1,4-mannosyl-glycoprotein beta-1,4-N-acetylglucosaminyltransferase (). This enzyme transfers the bisecting GlcNAc to the core mannose of complex N-glycans. The addition of this residue is regulated during development and has functional consequences for receptor signalling, cell adhesion, and tumour progression PUBMED:11986323, PUBMED:11784313.\ 7794 IPR012902 \

    This short motif directs methylation of the conserved phenylalanine residue. It is most often found at the N-terminus of pilins and other proteins involved in secretion, see , , and .

    \ 3318 IPR000551 \ The many bacterial transcription regulation proteins that bind DNA through a 'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence similarities. One of these is the MerR subfamily. MerR, which is found in many bacterial species, mediates the mercuric-dependent induction of the mercury resistance operon. In the absence of mercury, MerR represses transcription by binding tightly, as a dimer, to the 'mer' operator region; when mercury is present, the dimeric complex binds a single ion and becomes a potent transcriptional activator, while remaining bound to the mer site. Members of the family include the mercuric resistance operon regulatory protein MerR; Bacillus subtilis bltR and bmrR; Bacillus glnR; Streptomyces coelicolor hspR; Bradyrhizobium japonicum nolA; Escherichia coli superoxide response regulator soxR; and Streptomyces lividans transcriptional activator tipA PUBMED:7688297, PUBMED:2492496, PUBMED:7608059, PUBMED:1677938, PUBMED:1988958, PUBMED:2305262. Other members include hypothetical proteins from E. coli, B. subtilis and Haemophilus influenzae. Within this family, the HTH motif is situated towards the N-terminus.\ 7361 IPR006576 \

    BRK is a domain of unknown function found only in the Metazoa, in association with the CHROMO domain () and the DEAD/DEAH box helicase domain ().

    \ 2777 IPR002516 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 11 comprises enzymes with only one known activity: galactoside 2-L-fucosyltransferase ().

    \ \

    Some of the proteins in this group form the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook. Academic Press, London/San Diego (1997)]. Galactoside 2-L-fucosyltransferase 1 () and galactoside 2-L-fucosyltransferase 2 () belong to the Hh blood group system and are associated with the H/h and Se/se antigens.

    \ 3053 IPR013151 \

    This entry is for immunoglobulin-like domains. Studies indicate that the interactions essential for defining the structure of these beta sandwich proteins are also important in nucleation of folding, and that proteins containing this fold may share similar folding pathways even when they have low sequence similarity. The fold consists of a beta sandwich formed of 7 strands in 2 sheets with a Greek-key topology. Some members of the fold have additional strands. The Pfam alignments do not include the first and last strands of the immunoglobulin-like domain.

    \ \ 960 IPR001394 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C19 (ubiquitin-specific protease family, clan CA). Families within the CA clan are loosely termed papain-like, as the protein fold of the peptidase unit resembles that of papain, the type example for clan CA. Predicted active site residues for members of this family and family C1 occur in the same order in the sequence: N/Q, C, H. The type example is human ubiquitin-specific protease 14.

    \ \

    Ubiquitin is a highly conserved protein that is commonly found conjugated to proteins in eukaryotic cells, where it may act as a marker for rapid degradation or may have a chaperone function in protein assembly PUBMED:7845226. The ubiquitin is released from the bound protein by protease cleavage PUBMED:7845226. A number of deubiquitinising proteases are known: all are activated by thiol compounds PUBMED:7845226, PUBMED:3015923, and inhibited by thiol-blocking agents and ubiquitin aldehyde PUBMED:7845226, PUBMED:3031653, and as such they have the properties of cysteine proteases PUBMED:7845226.

    \ \

    The deubiquitinising proteases can be split into two size ranges (20-30 kDa, , and 100-200 kDa) PUBMED:7845226: this family comprises the 100-200 kDa proteins, which include the Ubp1 ubiquitin peptidase from yeast. Only one conserved cysteine can be identified, along with two conserved histidines. The spacing between the cysteine and the second histidine is thought to be more representative of the cysteine/histidine spacing of a cysteine protease catalytic dyad PUBMED:7845226.

    \ 5530 IPR008647 \ UL49.5 protein consists of 98 amino acids with a calculated molecular mass of 10,155 Da. It contains a putative signal peptide and transmembrane domains but lacks a consensus sequence for N-glycosylation. UL49.5 protein is an O-glycosylated structural component of the viral envelope PUBMED:8551587.\ 114 IPR000022 \

    Proteins containing this domain include biotin-dependent carboxylases PUBMED:8102604, PUBMED:8366018. The carboxyl transferase domain carries out the following reaction: transcarboxylation from biotin to an acceptor molecule. There are two recognised types of carboxyl transferase: one uses acyl-CoA and the other uses a 2-oxo acid as the acceptor molecule of carbon dioxide. All of the members of this family use acyl-CoA as the acceptor molecule.

    \ 2820 IPR001437 \ Bacterial proteins GreA and GreB are necessary for efficient RNA polymerase transcription elongation past template-encoded arresting sites. Arresting sites in DNA have the property of trapping a certain fraction of elongating RNA polymerases that pass through, resulting in locked DNA/RNA/polymerase ternary complexes. Cleavage of the nascent transcript by cleavage factors, such as GreA or GreB, allows the resumption of elongation from the new 3' terminus PUBMED:8431948, PUBMED:7854424.

    Escherichia coli GreA and GreB are sequence homologues and have homologues in every known bacterial genome PUBMED:12914698. GreA induces cleavage two or three nucleotides behind the terminus and can only prevent the formation of arrested complexes, while GreB releases longer sequences of up to eighteen nucleotides and can rescue pre-existing arrested complexes. These functional differences correlate with a distinctive structural feature, the distribution of positively charged residues on one face of the N-terminal coiled coil. Remarkably, despite their close functional similarity, the prokaryotic Gre factors have no sequence or structural similarity with eukaryotic TFIIS.

    \ 3149 IPR000576 \ In bacteria there are a number of families of transport proteins, including symporters and antiporters, that mediate the intake of a variety of sugars with the concomitant uptake of hydrogen ions (proton symporters) PUBMED:8438231. Members of the lacY family of Escherichia coli and Klebsiella pneumoniae are proton/beta-galactoside symporters, which, like most sugar transporters, are integral membrane proteins with 12 predicted transmembrane (TM) regions. Also similar to the lacY family are the raffinose (rafB) and sucrose (cscB) permeases from E. coli PUBMED:1435727.\ 6796 IPR010722 \

    Biotin synthase (BioB), , catalyses the last step of the biotin biosynthetic pathway. The reaction consists of the introduction of a sulphur atom into dethiobiotin. BioB functions as a homodimer PUBMED:12482614. Thiamin synthesis is a complex process involving at least six gene products (ThiFSGH, ThiI and ThiJ). Two of the proteins required for the biosynthesis of the thiazole moiety of thiamine (vitamin B(1)) are ThiG and ThiH (this entry), which form a heterodimer PUBMED:12650933. Both of these reactions are thought to involve the binding of cofactors, and both enzymes function as dimers PUBMED:12482614, PUBMED:12650933. This domain may therefore be involved in cofactor binding or dimerisation.

    \ 8007 IPR012613 \

    This family consists of the small acid-soluble spore proteins (SASP) of the O type (sspO). SspO (originally cotK) is unique to the spores of Bacillus subtilis and is expressed only in the forespore compartment of sporulating cells of this organism. The sspO gene is the first gene in a likely operon with sspP, and it is transcribed primarily by RNA polymerase with the forespore-specific sigma factor, sigma-G. Deletion of sspO causes the loss of SspO from the forespore but has no discernible effect on sporulation, spore properties or spore germination PUBMED:10806362.

    \ 5908 IPR010342 \

    This family consists of several hypothetical proteins from both prokaryotes and eukaryotes. The function of this family is unknown.

    \ 219 IPR007694 \

    The hexameric helicase DnaB unwinds the DNA duplex at the Escherichia coli chromosome replication fork. Although the mechanism by which DnaB both couples ATP hydrolysis to translocation along DNA and denatures the duplex is unknown, a change in the quaternary structure of the protein involving dimerization of the N-terminal domain has been observed and may occur during the enzymatic cycle. This C-terminal domain contains an ATP-binding site and is therefore probably the site of ATP hydrolysis.

    \ 707 IPR001965 \

    The plant homeodomain (PHD) finger PUBMED:7701562,PUBMED: is a C4HC3 zinc-finger-like motif found in nuclear proteins thought to be involved in chromatin-mediated transcriptional regulation. The PHD finger motif is reminiscent of, but distinct from, the C3HC4 type RING finger.

    \

    The function of this domain is not yet known, but by analogy with the LIM domain it could be involved in protein-protein interaction and be important for the assembly or activity of multicomponent complexes involved in transcriptional activation or repression. Alternatively, the interactions could be intramolecular and be important in maintaining the structural integrity of the protein. Like the RING finger and the LIM domain, the PHD finger is thought to bind two zinc ions.

    \ 4861 IPR005354 \

    This family of small proteins has no known function.

    \ 3942 IPR006890 \ This is a family of poxvirus proteins.\ 3546 IPR004870 \

    This is a family of nucleoporin proteins (Nups). Nucleoporins are the main components of the nuclear pore complex in eukaryotic cells, and mediate bidirectional nucleocytoplasmic transport, especially of mRNA and proteins. Two subsets of nucleoporins that contain peptide repeats have been identified: one is characterised by the FG (Phe-Gly) repeat; the other, which is included in this family, contains WD (Trp-Asp) repeats. WD repeat Nups (Nup37, Nup43, Seh1, ALADIN, RAE, and Sec13) are thought to be involved in the assembly of structural domains of the nuclear pore complex PUBMED:14517296.

    \ \ \ 1955 IPR004861 \ This family consists of putative tyrosine phosphatase proteins; this function is inferred from several sequences at the top of the noise, such as the Raccoon poxvirus protein-tyrosine phosphatase (), . \ \ 1863 IPR002846 \ These archaebacterial proteins have no known function. The domain is found duplicated in some sequences.\ 6856 IPR010746 \

    This family contains a number of viral proteins of unknown function approximately 200 residues long. Family members seem to be restricted to badnaviruses.

    \ 4478 IPR003671 \

    Spindlin (Spin) and Ssty were first identified for their involvement in gametogenesis. Spindlin was identified as a maternal transcript present in the unfertilised egg and early embryo, and was subsequently shown to interact with the spindle apparatus during oogenesis, and may therefore be important for mitosis PUBMED:9053325. In addition, spindlin appears to be a target for cell cycle-dependent phosphorylation, and as such may play a role in cell cycle regulation during the transition from gamete to embryo PUBMED:11806826. Ssty is a multi-copy, Y-linked spermatogenesis-specific transcript that appears to be required for normal spermatogenesis PUBMED:15020475. Ssty may play an analogous role to spindlin in sperm cells, namely during the transition from sperm cells to early embryo, and in mitosis.

    \ 3202 IPR005152 \

    These lipases are expressed and secreted during the infection cycle of these pathogens. In particular, Candida albicans has a large number of different lipases, possibly reflecting broad lipolytic activity, which may contribute to the persistence and virulence of C. albicans in human tissue PUBMED:11131027.

    \ 604 IPR004009 \ This domain has an SH3-like fold. It is found at the N-terminus of many but not all myosins. The function of this domain is unknown.\ 6937 IPR010940 \

    This entry represents the C terminus (approximately 100 residues) of bacterial and eukaryotic magnesium-protoporphyrin IX methyltransferase (). This enzyme converts magnesium-protoporphyrin IX to magnesium-protoporphyrin IX methyl ester using S-adenosyl-L-methionine as a cofactor PUBMED:8071204.

    \ 4826 IPR003844 \

    This entry describes integral membrane proteins of unknown function.

    \ 4876 IPR003485 \

    This is a family of unique short (US) region proteins from herpesvirus strains. The US2 family has no known function.

    \ 5704 IPR008866 \ This family consists of several phage terminase large subunit proteins as well as related sequences from several bacterial species. The DNA packaging enzyme of bacteriophage lambda, terminase, is a heteromultimer composed of a small subunit, gpNu1, and a large subunit, gpA, products of the Nu1 and A genes, respectively. Terminase is involved in the site-specific binding and cutting of the DNA in the initial stages of packaging. It is now known that gpA is actively involved in late stages of packaging, including DNA translocation, and that this enzyme contains separate functional domains for its early and late packaging activities PUBMED:11866517.\ 4319 IPR000504 \ Many eukaryotic proteins that are known or thought to bind single-stranded RNA contain one or more copies of a putative RNA-binding domain of about 90 amino acids. This is known as the eukaryotic putative RNA-binding region RNP-1 signature PUBMED:2470643, PUBMED:3072706, or RNA recognition motif (RRM). RRMs are found in a variety of RNA-binding proteins, including heterogeneous nuclear ribonucleoproteins (hnRNPs), proteins implicated in regulation of alternative splicing, and protein components of small nuclear ribonucleoproteins (snRNPs). The heterodimeric splicing factor U2 snRNP auxiliary factor (U2AF) appears to contain two RRM-like domains with specialized features for protein recognition PUBMED:15231733. The motif also appears in a few single-stranded DNA-binding proteins. The RRM structure consists of four strands and two helices arranged in an alpha/beta sandwich, with a third helix present during RNA binding in some cases PUBMED:8290338.\ 4519 IPR001443 \ Staphylocoagulase is an extracellular protein, produced by several strains of Staphylococcus aureus, that specifically forms a complex with prothrombin PUBMED:3481366, PUBMED:2587230. 
This complex, named staphylothrombin, can clot fibrinogen without any proteolytic cleavage of prothrombin. The C terminus of staphylocoagulase contains tandem repeats that do not seem to be required for the procoagulant activity.\ 5839 IPR010308 \

    This family consists of hypothetical proteins of unknown function found in fungi.

    \ 3852 IPR008170 \

    This family describes PhoU, a regulatory protein of unknown mechanism for high-affinity phosphate ABC transporter systems. The protein consists of two copies of the domain described by the Pfam model. Deletion of PhoU activates constitutive expression of the phosphate ABC transporter and allows phosphate transport, but causes a growth defect, suggesting that the protein has some second function PUBMED:8226621.

    \ \ \ 5237 IPR008740 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \ This group of cysteine peptidases corresponds to MEROPS peptidase family C30 (clan PA(C)). These peptidases are related to serine endopeptidases of family S1 and are restricted to RNA viruses, where they are involved in viral polyprotein processing during replication PUBMED:12093723, PUBMED:10725411, PUBMED:11842254.\ 321 IPR007795 \ This family contains uncharacterised bacterial membrane proteins.\ 473 IPR001451 \

    A variety of bacterial transferases contain a repeat structure composed of tandem repeats of a [LIV]-G-X(4) hexapeptide, which, in the tertiary structure of LpxA (UDP-N-acetylglucosamine acyltransferase) PUBMED:7481807, has been shown to form a left-handed parallel beta helix. A number of different transferase protein families contain this repeat, such as galactoside acetyltransferase-like proteins PUBMED:11937062, the gamma class of carbonic anhydrases PUBMED:10924115, and tetrahydrodipicolinate N-succinyltransferases (DapD), the last of which contain an extra N-terminal 3-helical domain PUBMED:11910040.
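The [LIV]-G-X(4) repeat can be located by scanning for runs of consecutive hexapeptide units. This Python sketch assumes X is any of the 20 standard residues and that a run needs at least three units (an arbitrary threshold, not from the source). `find_hexapeptide_runs` is hypothetical, and real repeat detection generally tolerates imperfect units, which this exact-match scan does not.

```python
import re
from typing import List, Tuple

# X is taken here as any of the 20 standard residues (an assumption).
RESIDUE = "ACDEFGHIKLMNPQRSTVWY"
HEXAPEPTIDE = f"[LIV]G[{RESIDUE}]{{4}}"  # one [LIV]-G-X(4) unit


def find_hexapeptide_runs(seq: str, min_repeats: int = 3) -> List[Tuple[int, int]]:
    """Hypothetical helper: (0-based start, number of units) for each tandem run."""
    pattern = re.compile(f"(?:{HEXAPEPTIDE}){{{min_repeats},}}")
    return [(m.start(), (m.end() - m.start()) // 6)
            for m in pattern.finditer(seq.upper())]


demo = "MK" + "LGAAAA" + "IGKDNT" + "VGSEGR" + "LGQVAF" + "PP"
print(find_hexapeptide_runs(demo))  # [(2, 4)]
```

Because each unit has a fixed length of six residues, the number of units in a run is simply the match length divided by six.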

    \ 929 IPR001867 \

    This domain is almost always found associated with the response regulator receiver domain (see ). It may play a role in DNA binding PUBMED:9016718.

    \ 42 IPR000873 \ A number of prokaryotic and eukaryotic enzymes, which appear to act via an ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity PUBMED:2118102, PUBMED:2911486, PUBMED:2254270. This region is a Ser/Thr/Gly-rich domain that is further characterised by a conserved Pro-Lys-Gly triplet. The family of enzymes includes luciferase, long chain fatty acid CoA ligase, acetyl-CoA synthetase and various other closely related synthetases.\ 3483 IPR005550 \

    Members of this family are components of the mitotic spindle. It has been shown that Ndc80/HEC from yeast is part of a complex called the Ndc80p complex PUBMED:11266451. This complex is thought to bind to the microtubules of the spindle.

    \ 4655 IPR001368 \

    A number of proteins, some of which are known to be receptors for growth factors, have been found to contain a cysteine-rich domain at the N-terminal region that can be subdivided into four (or in some cases, three) repeats containing six conserved cysteines, all of which are involved in intrachain disulphide bonds PUBMED:8387891.

    \ \

    CD27 (also called S152 or T14) mediates a co-stimulatory signal for T and B cell activation and is involved in murine T cell development. Tyrosine phosphorylation of ZAP-70 following CD27 ligation of T cells has been reported PUBMED:7989747, but not confirmed independently. CD30 was originally identified as Ki-1, an antigen expressed on Reed-Sternberg cells in Hodgkin's lymphoma and in some non-Hodgkin's lymphomas, particularly diffuse large-cell lymphoma and immunoblastic lymphoma. CD30 has pleiotropic effects on CD30-positive lymphoma cell lines, ranging from cell proliferation to cell death. It is thought to be involved in negative selection of T cells in the thymus and is involved in TCR-mediated cell death. CD30 is a member of the TNFR family of molecules and activates NF-kappaB through interaction with TRAF2 and TRAF5. CD40 (Bp50) plays a central role in the regulation of both cell-mediated and antibody-mediated immunity. It is central to T cell-dependent (TD) responses and may influence the survival of B cell lymphomas.

    \

    CD95 (also called APO-1, fas antigen, Fas tumor necrosis factor receptor superfamily, member 6, TNFRSF6 or apoptosis antigen 1, APT1) is expressed, typically at high levels, on activated T and B cells. It is involved in the mediation of apoptosis-inducing signals.

    \ \

    Other proteins known to belong to this family PUBMED:1653571, PUBMED:2174582, PUBMED:15335933, PUBMED:15335677 are tumor necrosis factor type I and type II receptors (TNFR), Shope fibroma virus soluble TNF receptor (protein T2), lymphotoxin alpha/beta receptor, low-affinity nerve growth factor receptor (LA-NGFR) (p75), T-cell antigen OX40, Wsl-1 (a receptor for an as yet undefined ligand that mediates apoptosis) and Vaccinia virus protein A53 (SalF19R).

    \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ 539 IPR005547 \

    Members of this family are involved in determining life span PUBMED:9872981. The molecular mechanisms by which LAG1 determines longevity are unclear, although some evidence suggests a role in ceramide synthesis PUBMED:1387200.

    \ 6980 IPR009822 \

    This family consists of several hypothetical bacterial proteins of around 180 residues in length, which are often known as YaeQ. YaeQ is homologous to RfaH, a specialised transcription elongation protein. YaeQ is known to compensate for loss of RfaH function PUBMED:9604894.

    \ 6558 IPR009602 \

    This family consists of several eukaryotic sequences of around 270 residues in length. Members of this family are found in mouse, human and Drosophila melanogaster. The function of this family is unknown.

    \ 2185 IPR007505 \ This is a family of hypothetical prokaryotic proteins.\ 7662 IPR013102 \

    This domain is found at the C-terminal end of the large alpha/beta domain making up various pyrimidine nucleoside phosphorylases PUBMED:9817849, PUBMED:2199449. It has slightly different conformations in different members of this family. For example, pyrimidine nucleoside phosphorylase (PYNP, ) has an additional three-stranded antiparallel beta sheet compared with other members of the family, such as E. coli thymidine phosphorylase (TP, ) PUBMED:9817849. The domain contains an alpha/beta hammerhead fold, and residues in this domain seem to be important in formation of the homodimer PUBMED:9817849.

    \ 4208 IPR000509 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.
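The elongation cycle described above repeats three steps: delivery of an aminoacyl-tRNA to the A site, transfer of the growing chain from the P-site tRNA, and translocation. The Python toy model below is a hypothetical illustration of that bookkeeping only; the class and method names are invented, and it ignores GTP hydrolysis, proofreading, the exit sites and all kinetics.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRibosome:
    """Hypothetical toy model of the elongation cycle; not a kinetic model."""
    p_peptide: List[str] = field(default_factory=list)  # chain on the P-site tRNA
    a_amino_acid: Optional[str] = None     # aminoacyl-tRNA waiting in the A site
    a_peptide: Optional[List[str]] = None  # chain on the A-site tRNA after transfer
    released_trnas: int = 0                # deacylated tRNAs that have left

    def deliver(self, amino_acid: str) -> None:
        """Aminoacyl-tRNA enters the A site (delivered with EF-Tu and GTP)."""
        if self.a_amino_acid is not None or self.a_peptide is not None:
            raise RuntimeError("A site is occupied")
        self.a_amino_acid = amino_acid

    def transfer(self) -> None:
        """The P-site chain moves onto the A-site tRNA, one residue longer."""
        if self.a_amino_acid is None:
            raise RuntimeError("no aminoacyl-tRNA in the A site")
        self.a_peptide = self.p_peptide + [self.a_amino_acid]
        self.a_amino_acid = None
        self.p_peptide = []  # the P-site tRNA is now deacylated

    def translocate(self) -> None:
        """EF-G and GTP move the peptidyl-tRNA to the P site; the deacylated tRNA leaves."""
        if self.a_peptide is None:
            raise RuntimeError("no peptidyl-tRNA in the A site")
        self.p_peptide = self.a_peptide
        self.a_peptide = None
        self.released_trnas += 1


ribosome = ToyRibosome(p_peptide=["Met"])  # assume an initiator tRNA already in the P site
for aa in ["Ala", "Gly", "Ser"]:
    ribosome.deliver(aa)
    ribosome.transfer()
    ribosome.translocate()
print(ribosome.p_peptide, ribosome.released_trnas)  # ['Met', 'Ala', 'Gly', 'Ser'] 3
```

Each completed cycle lengthens the chain by one residue and releases one deacylated tRNA, which is the invariant the guard checks in each method protect.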

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins also have functions 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. The L36E ribosomal family consists of mammalian, Caenorhabditis elegans and Drosophila L36, Candida albicans L39, and yeast YL39 ribosomal proteins PUBMED:8484789.

    \ 4548 IPR002995 \

    The surfeit locus gene SURF4 (or surf-4) encodes a conserved integral eukaryotic membrane protein of about 270 to 300 amino-acid residues that seems to be located in the endoplasmic reticulum PUBMED:7540914.

    \ 1706 IPR005797 \

    In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is a component of respiratory chain complex III () - also known as the bc1 complex or ubiquinol-cytochrome c reductase. In plant chloroplasts and cyanobacteria, there is an analogous protein, cytochrome b6, a component of the plastoquinone-plastocyanin reductase (), also known as the b6f complex.

    \

    Cytochrome b/b6 PUBMED:2509716, PUBMED:8329437 is an integral membrane protein of approximately 400 amino acid residues that probably has 8 transmembrane segments. In plants and cyanobacteria, cytochrome b6 consists of two subunits encoded by the petB and petD genes. The sequence of petB is colinear with the N-terminal part of mitochondrial cytochrome b, while petD corresponds to the C-terminal part. Cytochrome b/b6 non-covalently binds two heme groups, known as b562 and b566. Four conserved histidine residues are postulated to be the ligands of the iron atoms of these two heme groups.

    \

    Apart from regions around some of the histidine heme ligands, there are a few conserved regions in the sequence of b/b6. The best conserved of these regions includes an invariant P-E-W triplet which lies in the loop that separates the fifth and sixth transmembrane segments. It seems to be important for electron transfer at the ubiquinone redox site, called Qz or Qo (where o stands for outside), located on the outer side of the membrane. This entry represents the N-terminal region of these proteins.

    \ 7869 IPR012529 \

    This family consists of attractins, a family of water-borne pheromones. Mate attraction in Aplysia involves a long-distance water-borne signal in the form of the attractin peptide that is released during egg laying. These peptides contain 6 conserved cysteines and are folded into 2 antiparallel helices. The second helix contains the IEECKTS sequence conserved in Aplysia attractins PUBMED:15118100.

    \ 388 IPR002770 \

    Formylmethanofuran:tetrahydromethanopterin formyltransferase (Ftr) is involved in C1 metabolism in methanogenic archaea, sulphate-reducing archaea and methylotrophic bacteria. It catalyses the following reversible reaction:

    \ \ \

    Ftr from the thermophilic methanogen Methanopyrus kandleri (optimum growth temperature 98 degrees C) is a hyperthermophilic enzyme that is absolutely dependent on the presence of lyotropic salts for activity and thermostability. The crystal structure of Ftr reveals a homotetramer composed essentially of two dimers. Each subunit is subdivided into two tightly associated lobes, both consisting of a predominantly antiparallel beta sheet flanked by alpha helices, forming an alpha/beta sandwich structure. The approximate location of the active site was detected in a region close to the dimer interface PUBMED:9195883. Ftr from the mesophilic methanogen Methanosarcina barkeri and from the sulphate-reducing archaeon Archaeoglobus fulgidus has a similar structure PUBMED:12192072.

    \ \

    In the methylotrophic bacterium Methylobacterium extorquens, Ftr interacts with three other polypeptides to form an Ftr/cyclohydrolase complex which catalyses the hydrolysis of formyl-tetrahydromethanopterin to formate during growth on C1 substrates PUBMED:12123819.

    \ \ 6706 IPR009672 \

    This family consists of several Pkip-1 proteins, which seem to be specific to Nucleopolyhedroviruses. The function of this family is unknown although it has been found that Pkip-1 is not essential for virus replication in cell culture or by in vivo intrahaemocoelic injection PUBMED:12867634.

    \ 159 IPR003302 \ SPRR genes (formerly SPR) encode a novel class of polypeptides (small proline rich proteins) that are strongly induced during differentiation of human epidermal keratinocytes in vitro and in vivo. The most characteristic feature of the SPRR gene family resides in the structure of the central segments of the encoded polypeptides, which are built up from tandemly repeated units of either eight (SPRR1 and SPRR3) or nine (SPRR2) amino acids with the general consensus XKXPEPXX, where X is any amino acid PUBMED:8325635.\ 5659 IPR008625 \ This family consists of several GAGE and XAGE proteins, which are found exclusively in humans. The function of this family is unknown, although they have been implicated in human cancers PUBMED:11992404.\ 6490 IPR010598 \

    This entry represents the C terminus of D-glucuronyl C5-epimerase. Glucuronyl C5-epimerases catalyse the conversion of D-glucuronic acid (GlcUA) to L-iduronic acid (IdoA) units during the biosynthesis of glycosaminoglycans PUBMED:9346972.

    \ 1014 IPR004181 \

    Miz1 (Msx-interacting-zinc finger) is a zinc finger-containing protein with homology to the yeast protein, Nfi-1. Miz1 is a sequence specific DNA binding protein that can function as a positive-acting transcription factor. Miz1 binds to the homeobox protein Msx2, enhancing the specific DNA-binding ability of Msx2 PUBMED:9256341. Other proteins containing this domain include the human pias family (protein inhibitor of activated STAT protein).

    \ 4940 IPR000606 \ This family includes RNA helicases thought to be involved in duplex unwinding during viral RNA replication. Members of this family are found in positive-strand single-stranded RNA viruses and belong to helicase superfamily 1. This helicase has multiple roles at different stages of viral RNA replication, as dissected by mutational analysis PUBMED:10217401. \ 3410 IPR001136 \ \ The merozoite surface antigen 2 (MSA-2) may play a role in merozoite attachment to the erythrocyte. It is thought to be attached to the membrane by a GPI anchor.\ \ 967 IPR005373 \

    Members of this family are proteins of unknown function.

    \ 3094 IPR003520 \

    Secretion of virulence factors in Gram-negative bacteria involves \ transportation of the protein across two membranes to reach the cell \ exterior. There have been four secretion systems described in \ sequence similarities in plant pathogens like Ralstonia and Erwinia PUBMED:8969244.

    \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell PUBMED:10334981 and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself PUBMED:10564516, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.

    \

    The Salmonella/Shigella invasion protein E (InvE) is one such type III secretion protein subunit. Its gene lies within the SPI I pathogenicity island; the protein is localised to the outer membrane and is involved in surface presentation.

    \ 2950 IPR005708 \

    Alkaptonuria (AKU), a rare hereditary disorder, was the first disease to be interpreted as an inborn error of metabolism. It causes homogentisic aciduria, ochronosis, and arthritis. AKU patients are deficient in homogentisate 1,2-dioxygenase (), the enzyme that converts homogentisate to maleylacetoacetate, a step in the catabolism of both tyrosine and phenylalanine. \

    \ \ 6789 IPR010718 \

    This family includes a number of hypothetical bacterial and archaeal proteins of unknown function.

    \ 1564 IPR000704 \

    Casein kinase, a ubiquitous, well-conserved protein kinase involved in cell metabolism and differentiation, is characterized by its preference for Ser or Thr in acidic stretches of amino acids. The enzyme is a tetramer of 2 alpha- and 2 beta-subunits PUBMED:2666134, PUBMED:1856204. However, some species (e.g., mammals) possess 2 related forms of the alpha-subunit (alpha and alpha'), while others (e.g., fungi) possess 2 related beta-subunits (beta and beta') PUBMED:7737972. The alpha-subunit is the catalytic unit and contains regions characteristic of serine/threonine protein kinases. The beta-subunit is believed to be regulatory, possessing an N-terminal auto-phosphorylation site, an internal acidic domain, and a potential metal-binding motif PUBMED:7737972. The beta-subunit is a highly conserved protein of about 25 kDa that contains, in its central section, a cysteine-rich motif that could be involved in binding a metal such as zinc PUBMED:8027080. The mammalian beta-subunit gene promoter shares common features with those of other mammalian protein kinases and is closely related to the promoter of the regulatory subunit of cAMP-dependent protein kinase PUBMED:7737972.

    \ 2283 IPR006954 \ This family contains a conserved region found in a number of uncharacterised Caenorhabditis elegans proteins.\ 230 IPR002804 \

    The function of this group of proteins from the Archaea is unknown. A single homolog is found in the bacterium, Aquifex aeolicus.

    \ 5230 IPR008792 \ This family contains several bacterial coenzyme PQQ synthesis protein D (PqqD) sequences. This protein is required for coenzyme pyrrolo-quinoline-quinone (PQQ) biosynthesis.\ 5990 IPR010381 \

    This family consists of proteins of unknown function found in Caenorhabditis species.

    \ 5942 IPR010359 \

    This is a family of bacterial and viral proteins with undetermined function. A conserved H-E-X-X-H motif is suggestive of a catalytic active site and shows similarity to .
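
    As a rough illustration of how such a motif is usually located, the sketch below scans a protein sequence for H-E-X-X-H sites with a regular expression. It is a minimal example under stated assumptions: the function name `find_hexxh` and the test sequence are hypothetical, X is taken to be any one-letter residue code, and a match only flags a candidate site; it does not show that the protein is catalytically active.

```python
import re

# H-E-X-X-H: His, Glu, any two residues, His.
# The lookahead makes finditer report overlapping candidate sites too.
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for each H-E-X-X-H site."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence, not taken from any member of this family.
    print(find_hexxh("MKTAHEAGHLLR"))  # [(5, 'HEAGH')]
```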

    \ 319 IPR007751 \

    This domain is associated with eukaryotic proteins of unknown function, which are hydrolase-like.

    \ 4932 IPR006077 \

    Vinculin is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts or adhesion plaques PUBMED:2112986. In addition to actin, vinculin interacts with other structural proteins such as talin and alpha-actinins.

    \

    Vinculin is a large protein of 116 kDa (about 1000 residues). Structurally, the protein consists of an acidic N-terminal domain of about 90 kDa separated from a basic C-terminal domain of about 25 kDa by a proline-rich region of about 50 residues. The central part of the N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110-amino-acid domain.

    \

    Alpha-catenins are evolutionarily related to vinculin PUBMED:1924379. Catenins are proteins that associate with the cytoplasmic domain of a variety of cadherins. The association of catenins with cadherins produces a complex which is linked to the actin filament network, and which seems to be of primary importance for the cell-adhesion properties of cadherins. Three different types of catenins seem to exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 kDa. In terms of their structure, the most significant differences from vinculin are the absence, in alpha-catenin, of the repeated domain and of the proline-rich segment.

    \ \ 3720 IPR005077 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
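
    The catalytic-type letter at the start of a family or clan identifier can be decoded mechanically. The following is a small sketch, assuming only the letter conventions given in the paragraph above; `classify` and the two lookup tables are hypothetical names, and the nucleophile descriptions simply restate the text (an amino acid nucleophile with an acyl intermediate for serine, threonine and cysteine types; activated water for aspartic, glutamic and metallo types).

```python
# Catalytic-type letters as described in the text.
# "P" applies only to clans that contain families of more than one type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan only)",
}

# Nucleophile used by each catalytic type, restated from the text.
NUCLEOPHILE = {
    "S": "serine residue (acyl-enzyme intermediate)",
    "T": "threonine residue (acyl-enzyme intermediate)",
    "C": "cysteine residue (acyl-enzyme intermediate)",
    "A": "activated water molecule",
    "G": "activated water molecule",
    "M": "activated water molecule",
}


def classify(identifier: str) -> tuple[str, str]:
    """Decode a family (e.g. 'C11') or clan (e.g. 'CD') identifier by its first letter."""
    letter = identifier[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type in {identifier!r}")
    return CATALYTIC_TYPES[letter], NUCLEOPHILE.get(letter, "not assigned")


if __name__ == "__main__":
    print(classify("C11"))  # ('cysteine', 'cysteine residue (acyl-enzyme intermediate)')
```

    In this sketch, mixed-type ("P") clans and unknown-type ("U") families receive no nucleophile assignment, since the text gives none for them.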

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionary related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belong to the MEROPS peptidase family C11 (clostripain family, clan CD).

    \ 3205 IPR002918 \ Lipases or triacylglycerol acylhydrolases hydrolyse ester bonds in triacylglycerol giving diacylglycerol, monoacylglycerol, glycerol and free fatty acids PUBMED:1320940. These have been called class 2 as they are not clearly related to other lipase families.\

    These enzymes catalyse the reaction:

    \ \ 4040 IPR002505 \

    This entry contains both phosphate acetyltransferase:\ \ \ and\ phosphate butyryltransferase:

    \ \ \ \

    These enzymes catalyse the transfer of an acetyl or butyryl group to orthophosphate.

    \ 5782 IPR010277 \

    This family consists of a number of phage late control gene D proteins and related bacterial sequences.

    \ 6828 IPR010733 \

    This family consists of several hypothetical eukaryotic sequences of around 400 residues in length. The function of this family is unknown.

    \ 7092 IPR009890 \

    This family contains a number of eukaryotic etoposide-induced 2.4 (EI24) proteins approximately 350 residues long. In cells treated with the cytotoxic drug etoposide, EI24 is induced by p53 PUBMED:8649819. It has been suggested to play an important role in negative cell growth control PUBMED:10594026.

    \ 4898 IPR006953 \ This domain identifies a group of proteins described as general vesicular transport factor, transcytosis-associated protein (TAP) or vesicle docking protein (p115). This myosin-shaped molecule consists of an N-terminal globular head region, a coiled-coil tail which mediates dimerisation, and a short C-terminal acidic region PUBMED:11927603. p115 tethers COPI vesicles to the Golgi by binding, via its C-terminal acidic region, the coiled-coil proteins giantin (on the vesicles) and GM130 (on the Golgi). It is required for intercisternal transport in the Golgi stack. This domain is found in the head region, which is highly conserved but of unknown function; it does not seem to be essential for vesicle tethering PUBMED:11927603. The N-terminal part of the head region contains context-detected Armadillo/beta-catenin-like repeats.\ 7315 IPR005414 \

    The type III secretion system of Gram-negative bacteria is used to transport virulence factors from the pathogen directly into the host cell PUBMED:9618447 and is only triggered when the bacterium comes into close contact with the host. Effector proteins secreted by the type III system do not possess a secretion signal, and are considered unique because of this. Salmonella spp. secrete an effector protein called SopE that is responsible for stimulating the reorganisation of the host cell actin cytoskeleton and ruffling of the cellular membrane PUBMED:9482928. It acts as a guanine nucleotide exchange factor for Rho GTPases such as Cdc42 and Rac. As it is imperative for the bacterium to revert the cell to its "normal" state as quickly as possible, another effector, the tyrosine phosphatase SptP, reverses the changes brought about by SopE PUBMED:11316807.

    \ \

    Recently, it has been found that SopE and its homologue SopE2 can activate different sets of Rho GTPases in the host cell PUBMED:11316807. Far from being a redundant pair of similar type III effectors, they act in unison to specifically activate different Rho GTPase signalling cascades in the host cell during infection.\

    \ 4091 IPR000593 \ Ras GTPase-activating protein (rasGAP) is a major contributor to the downregulation of ras by facilitating GTP hydrolysis of activated ras. In addition, GAP participates in the down-stream effector system of the ras signaling pathway. Abnormal signal transduction involving activated ras genes plays a major role in the development of a variety of tumors. Depending on the precise genetic alteration, its location within the gene and the effects it exerts on protein function, rasGAP can theoretically function as either an oncogene or as a tumor suppressor gene PUBMED:8738474.\ 1422 IPR002126 \

    Cadherins are a family of adhesion molecules that mediate Ca2+-dependent cell-cell adhesion in all solid tissues of the organism and modulate a wide variety of processes, including cell polarisation and migration PUBMED:2197976, PUBMED:,PUBMED:14570569. Cadherin-mediated cell-cell junctions are formed as a result of interaction between extracellular domains of identical cadherins located on the membranes of neighbouring cells. The stability of these adhesive junctions is ensured by binding of the intracellular cadherin domain to the actin cytoskeleton. There are a number of different isoforms distributed in a tissue-specific manner in a wide variety of organisms. Cells containing different cadherins tend to segregate in vitro, while those that contain the same cadherins tend to preferentially aggregate together. This observation is linked to the finding that cadherin expression causes morphological changes involving the positional segregation of cells into layers, suggesting they may play an important role in the sorting of different cell types during morphogenesis, histogenesis and regeneration. They may also be involved in the regulation of tight and gap junctions, and in the control of intercellular spacing. Cadherins are evolutionarily related to the desmogleins, which are components of intercellular desmosome junctions involved in the interaction of plaque proteins.

    \

    Structurally, cadherins comprise a number of domains: classically, these include a signal sequence; a propeptide of around 130 residues; five tandemly repeated extracellular cadherin domains, 4 of which are cadherin repeats while the fifth contains 4 conserved cysteines; a single transmembrane domain; and a C-terminal cytoplasmic domain PUBMED:11736639. However, proteins are designated as members of the broadly defined cadherin family if they have one or more cadherin repeats. A cadherin repeat is an independently folding sequence of approximately 110 amino acids that contains motifs with the conserved sequences DRE, DXNDNAPXF, and DXD. Crystal structures have revealed that multiple cadherin domains form Ca2+-dependent rod-like structures with a conserved Ca2+-binding pocket at the domain-domain interface. Cadherins depend on calcium for their function: calcium ions bind to specific residues in each cadherin repeat to ensure its proper folding and to confer rigidity upon the extracellular domain, and calcium binding is essential for cadherin adhesive function and for protection against protease digestion.
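
    The three conserved motifs named above can be searched for with simple patterns in which X stands for any residue. The sketch below is illustrative only, assuming a plain one-letter sequence string; the function name and the example sequence are hypothetical, and motif hits alone do not establish that a region folds as a cadherin repeat (that normally requires comparison against the full repeat, for example with a profile model).

```python
import re

# Conserved cadherin-repeat motifs; "X" in the motif name means any residue.
# Lookaheads allow overlapping occurrences to be reported.
CADHERIN_MOTIFS = {
    "DRE": re.compile(r"(?=DRE)"),
    "DXNDNAPXF": re.compile(r"(?=D[A-Z]NDNAP[A-Z]F)"),
    "DXD": re.compile(r"(?=D[A-Z]D)"),
}


def find_cadherin_motifs(sequence: str) -> dict[str, list[int]]:
    """Return the 1-based start positions of each motif in the sequence."""
    seq = sequence.upper()
    return {
        name: [m.start() + 1 for m in pattern.finditer(seq)]
        for name, pattern in CADHERIN_MOTIFS.items()
    }


if __name__ == "__main__":
    # Synthetic sequence containing one copy of each motif.
    print(find_cadherin_motifs("AADRELLDVNDNAPVFGGDYD"))
    # {'DRE': [3], 'DXNDNAPXF': [8], 'DXD': [19]}
```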

    \ 6544 IPR009591 \

    This family consists of several Sugar beet yellow virus (SBYV) putative membrane-binding proteins of around 54 residues in length. The function of this family is unknown.

    \ 3938 IPR006791 \

    This entry represents the Pox virus D2 proteins.

    \ 1009 IPR000571 \ Zinc finger domains are thought to be involved in DNA binding and exist as different types, depending on the positions of the cysteine residues. Proteins containing zinc finger domains of the C-x8-C-x5-C-x3-H type include zinc finger proteins from eukaryotes involved in cell cycle or growth phase-related regulation, e.g. human TIS11B (butyrate response factor 1), a probable regulatory protein involved in regulating the response to growth factors, and the mouse growth factor-inducible nuclear protein TTP, which has the same function. Another protein containing this domain is the human splicing factor U2AF 35 kDa subunit, which plays a critical role in both constitutive and enhancer-dependent splicing by mediating essential protein-protein and protein-RNA interactions required for 3' splice site selection. It has been shown that different CCCH zinc finger proteins interact with the 3' untranslated region of various mRNAs PUBMED:9703499, PUBMED:10330172. This type of zinc finger is very often present in two copies.\ 3032 IPR007648 \ ATP synthase inhibitor prevents the enzyme from switching to ATP hydrolysis during collapse of the electrochemical gradient, for example during oxygen deprivation PUBMED:8961923. ATP synthase inhibitor forms a one-to-one complex with the F1 ATPase, possibly by binding at the alpha-beta interface. It is thought to inhibit ATP synthesis by preventing the release of ATP. The minimum inhibitory region for bovine inhibitor () is from residues 39 to 72. The inhibitor has two oligomeric states: dimer (the active state) and tetramer. At low pH, the inhibitor forms a dimer via antiparallel coiled-coil interactions between the C-terminal regions of two monomers. At high pH, the inhibitor forms tetramers and higher oligomers by coiled-coil interactions involving the N terminus and inhibitory region, thus preventing the inhibitory activity PUBMED:8961923.\ 667 IPR008914 \

    The PEBP family is a highly conserved group of proteins that have been identified in numerous tissues in a wide variety of organisms, including bacteria, yeast, nematodes, plants, Drosophila and mammals. The various functions described for members of this family include lipid binding, neuronal development PUBMED:12492898, serine protease inhibition PUBMED:11034991, the control of the morphological switch between shoot growth and flower structures PUBMED:10764580, and the regulation of several signalling pathways such as the MAP kinase pathway PUBMED:12551925 and the NF-kappaB pathway PUBMED:11585904. The control of the latter two pathways involves the PEBP protein RKIP, which interacts with MEK and Raf-1 to inhibit the MAP kinase pathway, and with TAK1, NIK, IKKalpha and IKKbeta to inhibit the NF-kappaB pathway. Other PEBP-like proteins that show strong structural homology to PEBP include Escherichia coli YBHB and YBCL, the rat neuropeptide HCNP, and Antirrhinum centroradialis CEN.

    \

    Structures have been determined for several members of the PEBP-like family, all of which show extensive fold conservation. The structure consists of a large central beta-sheet flanked by a smaller beta-sheet on one side, and an alpha helix on the other. Sequence alignments show two conserved central regions, CR1 and CR2, that form a consensus signature for the PEBP family. These two regions form part of the ligand-binding site, which can accommodate various anionic groups. The N- and C-terminal regions are the least conserved, and may be involved in interactions with different protein partners. The N-terminal residues 2-12 form the natural cleavage peptide HCNP involved in neuronal development. The C-terminal region is deleted in plant and bacterial PEBP homologues, and may help control accessibility to the active site.

    \ \ 5574 IPR008909 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II PUBMED:.
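
    The class assignments listed above amount to a lookup table. The sketch below encodes them, together with the general rule stated in the text for which ribose hydroxyl receives the aminoacyl group. The function names are hypothetical, and the hydroxyl rule is the general tendency described here rather than an absolute one (phenylalanyl-tRNA synthetase, for instance, is a class II enzyme usually reported to acylate the 2'-hydroxyl).

```python
# Class membership by three-letter amino acid code, as listed in the text.
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for the synthetase specific to the given amino acid."""
    aa = amino_acid.capitalize()  # accepts 'arg', 'ARG' or 'Arg'
    if aa in CLASS_I:
        return "I"
    if aa in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid code: {amino_acid!r}")


def preferred_acylation_site(amino_acid: str) -> str:
    """General rule from the text: class I -> 2'-OH, class II -> 3'-OH (exceptions exist)."""
    return "2'-OH" if synthetase_class(amino_acid) == "I" else "3'-OH"


if __name__ == "__main__":
    # The two classes are disjoint and together cover the 20 standard amino acids.
    assert not CLASS_I & CLASS_II and len(CLASS_I | CLASS_II) == 20
    print(synthetase_class("Trp"), preferred_acylation_site("Ser"))  # I 3'-OH
```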

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \ This all-alpha-helical domain is the anticodon-binding domain of arginyl-tRNA synthetase. It is known as the DALR domain after its characteristic conserved amino acids PUBMED:10447505.\ 6551 IPR004753 \

    Bacterial cell shape varies greatly between species, and characteristic morphologies are used for identification purposes. In addition to individual cell shape, the way in which groups of cells are arranged is also typical of some bacterial species, especially Gram-positive coccoids. For many years, it was believed that micro-organisms with other than spheroidal cell shapes maintained their morphology by means of their external cell walls. Recently, however, studies of the Gram-positive rod Bacillus subtilis have revealed two related genes that are essential for the integrity of cell morphogenesis PUBMED:11290328. These genes, mreB and mbl, encode products that localise close to the cell surface, forming filamentous helical structures. Many homologues have been found in diverse bacterial groups, suggesting a common ancestor PUBMED:11544518.

    \

    The crystal structure of MreB from Thermotoga maritima has been determined by X-ray crystallography PUBMED:11544518. It consists of 19 beta-strands and 15 alpha-helices, and shows remarkable structural similarity to eukaryotic actin. MreB crystals also contain proto-filaments, with individual proteins assembling into polymers like F-actin, in the same orientation. It is therefore hypothesised that MreB was the forerunner of actin in early eukaryotes PUBMED:11731313.

    \ 6383 IPR009501 \

    This family consists of several hypothetical proteins from bacteria and from Dictyostelium discoideum (Slime mold). The function of this family is unknown.

    \ 4329 IPR000685 \ Ribulose bisphosphate carboxylase (RuBisCO) PUBMED:6351728, PUBMED:12221984 catalyzes the initial step in Calvin's reductive pentose phosphate cycle in plants as well as purple and green bacteria. It consists of a large catalytic subunit and a small subunit of undetermined function. In plants, the large subunit is encoded by the chloroplast genome while the small subunit is encoded in the nuclear genome. Molecular activation of RuBisCO by CO2 involves the formation of a carbamate with the epsilon-amino group of a conserved lysine residue. This carbamate is stabilized by a magnesium ion. One of the ligands of the magnesium ion is an aspartic acid residue close to the active site lysine PUBMED:1969412.\ 2692 IPR000211 \

    The movement of bipartite Geminiviruses such as squash leaf curl virus (SqLCV) requires the cooperative interaction of two essential virus-encoded movement proteins, BR1 and BL1. Recent studies of SqLCV and bean dwarf mosaic virus have shown that BR1 and BL1 act in a cooperative manner to move the viral genome intracellularly from the nucleus to the cytoplasm and across the cell wall from cell to cell. BR1 is a nuclear shuttle protein, and it has been proposed to bind newly replicated viral ssDNA genomes and move these between the nucleus and cytoplasm. These BR1-genome complexes are then directed to the cell periphery through interactions between BR1 and BL1, where, as the result of BL1 action, the complexes are moved to adjacent uninfected cells. The precise mechanism by which BL1 transports these genome complexes across the cell wall, and whether this may differ in different cell types, remains at issue PUBMED:9765472.

    \ 2478 IPR003097 \ Flavoprotein pyridine nucleotide cytochrome reductases PUBMED:1748631 (FPNCR) catalyse the interchange of reducing equivalents between one-electron carriers and the two-electron-carrying nicotinamide dinucleotides. The enzymes include ferredoxin:NADP+ reductases (FNR) PUBMED:8027025, plant and fungal NAD(P)H:nitrate reductases PUBMED:1748631, PUBMED:12165428, NADH:cytochrome b5 reductases PUBMED:3700359, NADPH:P450 reductases PUBMED:1908607, NADPH:sulphite reductases PUBMED:2550423, nitric oxide synthases PUBMED:1712077, phthalate dioxygenase reductase PUBMED:8298460, and various other flavoproteins.\ 5104 IPR007941 \

    This family consists of several uncharacterised eukaryotic proteins.

    \ 1187 IPR006992 \

    These proteins are related to the metal-dependent hydrolase superfamily PUBMED:9144792. The family includes 2-amino-3-carboxymuconate-6-semialdehyde decarboxylase (ACMSD), which converts alpha-amino-beta-carboxymuconate-epsilon-semialdehyde (ACMS) to alpha-aminomuconate semialdehyde (AMS). ACMS can be converted non-enzymatically to quinolinate, a potent endogenous excitotoxin of neuronal cells which is implicated in the pathogenesis of various neurodegenerative disorders. In the presence of ACMSD, ACMS is converted to AMS, a benign catabolite.

    \ \ 1557 IPR006495 \

    This group of sequences represents the acyl carrier protein (gamma subunit) of the holoenzyme citrate lyase (), which is composed of alpha (), beta (), and acyl carrier protein subunits in a stoichiometric ratio of 6:6:6. Citrate lyase is an enzyme which converts citrate to oxaloacetate. In bacteria, this reaction is involved in citrate fermentation. The acyl carrier protein covalently binds the coenzyme of citrate lyase. The set contains an experimentally characterized member from Leuconostoc mesenteroides PUBMED:9457870. The sequences come from a wide range of Gram-positive bacteria. For Gram-negative bacteria, it appears that only sequences from the gamma proteobacteria are included.

    \ 7949 IPR012325 \

    Assassin bugs (Arthropoda:Insecta:Hemiptera:Reduviidae), sometimes known as conenoses or kissing bugs, are one of the largest and most morphologically diverse families of true bugs, feeding on crickets, caterpillars and other insects. Some assassin bug species are bloodsucking parasites of mammals, including humans. They can be found throughout most of the world, and their size varies from a few millimeters to as much as 3 or 4 centimeters PUBMED:. The toxic saliva of the predatory assassin bugs contains a complex mixture of small and large peptides with diverse uses, such as immobilizing and pre-digesting prey and defense against competitors and predators. Assassin bug toxins are small peptides with disulfide connectivity that target ion channels. They are relatively homologous to the calcium channel blockers omega-conotoxins from marine cone snails and belong to the four-loop cysteine scaffold structural class PUBMED:11423127, PUBMED:11669615.

    \

    One of these small proteins, Ptu1, reversibly blocks N-type calcium channels but is less effective against L- and P/Q-type calcium channels PUBMED:11423127. Ptu1 is 34 amino acid residues long and is cross-linked by 3 disulfide bridges. Ptu1 contains a beta-sheet region made of 2 antiparallel beta-strands and consists of a compact disulfide-bonded core from which four loops emerge, as well as the N- and C-termini PUBMED:11669615. Some assassin bug toxins are listed below:\

    \ 5774 IPR010271 \

    This family consists of toxin-coregulated pilus subunit (TcpA) proteins from Vibrio cholerae and related sequences. The major virulence factors of toxigenic V. cholerae are cholera toxin (CT), which is encoded by a lysogenic bacteriophage (CTXPhi), and toxin-coregulated pilus (TCP), an essential colonisation factor which is also the receptor for CTXPhi. The genes for the biosynthesis of TCP are part of a larger genetic element known as the TCP pathogenicity island PUBMED:12540588.

    \ 156 IPR004214 \

    Cone snail toxins, conotoxins, are small neurotoxic peptides with disulfide connectivity that target ion-channels or G-protein coupled receptors. Based on the number and pattern of disulfide bonds and biological activities, conotoxins can be classified into several families PUBMED:11478951. Omega, delta and kappa families of conotoxins have a knottin or inhibitor cysteine knot scaffold. The knottin scaffold is a very special disulfide-through-disulfide knot, in which the III-VI disulfide bond crosses the macrocycle formed by two other disulfide bonds (I-IV and II-V) and the interconnecting backbone segments, where I-VI indicates the six cysteine residues starting from the N-terminus.
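
    Because the knottin connectivity is defined purely by the order of the six cysteines, it can be written down directly from a sequence. The sketch below assumes a mature toxin sequence with exactly six cysteines and applies the I-IV, II-V, III-VI pairing described above; the function name and the example sequence are hypothetical, and the result is the canonical pairing for this scaffold, not a structural prediction.

```python
def knottin_disulfides(sequence: str) -> list[tuple[int, int]]:
    """Pair cysteines I-IV, II-V and III-VI (1-based residue positions)."""
    cys = [i + 1 for i, aa in enumerate(sequence.upper()) if aa == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    # cys[0]..cys[5] correspond to cysteines I..VI counted from the N-terminus.
    # I-IV and II-V close the macrocycle; III-VI threads through it.
    return [(cys[0], cys[3]), (cys[1], cys[4]), (cys[2], cys[5])]


if __name__ == "__main__":
    # Synthetic sequence with cysteines at positions 2, 5, 9, 10, 13 and 16.
    print(knottin_disulfides("GCAACKKKCCAACGGC"))  # [(2, 10), (5, 13), (9, 16)]
```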

    \

    The disulfide bonding network, together with specific amino acids in the inter-cysteine loops, provides the specificity of conotoxins PUBMED:10988292. The cysteine arrangements are the same for the omega, delta and kappa families, even though omega conotoxins are calcium channel blockers, delta conotoxins delay the inactivation of sodium channels, and kappa conotoxins are potassium channel blockers PUBMED:11478951. Mu conotoxins have two types of cysteine arrangement, but neither forms the knottin scaffold. Mu conotoxins target voltage-gated sodium channels PUBMED:11478951 and are useful probes for investigating the voltage-dependent sodium channels of excitable tissues PUBMED:2410412. Alpha conotoxins have two types of cysteine arrangement PUBMED:1390774 and are competitive nicotinic acetylcholine receptor antagonists.

    \ 71 IPR004130 \ Members of this family are found in a range of archaea and eukaryotes and are hypothesised to bind ATP.\ 7667 IPR012425 \

    This domain is found towards the C-terminal region of various aldolase enzymes. It consists of five alpha-helices, four of which form an antiparallel helical bundle that plugs the C-terminus of the N-terminal TIM barrel domain PUBMED:12764229. The communication domain is thought to play an important role in the heterodimerisation of the enzyme PUBMED:12764229.

    \ 5937 IPR010356 \

    This family consists of several enterobacterial haemolysin E (HlyE) proteins. HlyE is a novel pore-forming toxin of Escherichia coli, Salmonella typhi and Shigella flexneri. It is unrelated to the well-characterised pore-forming Escherichia coli haemolysins of the RTX family, namely haemolysin A (HlyA) and the enterohaemolysin encoded by the plasmid-borne ehxA gene of Escherichia coli O157. Nevertheless, expression of HlyE in the absence of the RTX toxins is sufficient to give Escherichia coli a haemolytic phenotype. HlyE is a 34 kDa protein that is expressed during anaerobic growth of Escherichia coli. Anaerobic expression is controlled by the transcription factor FNR, so that upon ingestion and entry into the anaerobic mammalian intestine, HlyE is produced and may then contribute to colonisation of the host PUBMED:10660049.

    \ 3642 IPR007133 \ Members of this family are components of the RNA polymerase II-associated Paf1 complex. The Paf1 complex functions during the elongation phase of transcription in conjunction with Spt4-Spt5 and Spt16-Pob3 PUBMED:11927560, PUBMED:11884586.\ 5308 IPR008846 \ This family consists of several different short staphylococcal proteins, including SLUSH A, B and C proteins as well as haemolysin and gonococcal growth inhibitor. Some strains of the coagulase-negative Staphylococcus lugdunensis produce a synergistic hemolytic activity (SLUSH), phenotypically similar to the delta-hemolysin of S. aureus PUBMED:8975897. Gonococcal growth inhibitor from Staphylococcus acts on the cytoplasmic membrane of the gonococcal cell, causing cytoplasmic leakage and, eventually, death PUBMED:3134553.\ 5522 IPR008826 \ This family consists of several eukaryotic selenium-binding proteins as well as three sequences from archaea. The exact function of this protein is unknown, although SBP56 is thought to participate in late stages of intra-Golgi protein transport PUBMED:10799528. The Lotus japonicus homologue of SBP56, LjSBP, is thought to have more than one physiological role and may be involved in controlling the oxidation/reduction status of target proteins in vesicular Golgi transport PUBMED:12026169.\ 3479 IPR004298 \ Nicotianamine synthase catalyzes the trimerization of S-adenosylmethionine to yield one molecule of nicotianamine. Nicotianamine has an important role in plant iron uptake mechanisms. Plants adopt two strategies (termed I and II) of iron acquisition. Strategy I is adopted by all higher plants except graminaceous plants, which adopt strategy II PUBMED:10359845, PUBMED:9952442. 
In strategy I plants, the role of nicotianamine is not fully determined. Possible roles include forming complexes that are more stable with ferrous than with ferric ion, which might serve as a sensor of the physiological iron status within a plant or be involved in iron transport PUBMED:10359845. In strategy II (graminaceous) plants, nicotianamine is the key intermediate (and nicotianamine synthase the key enzyme) in the synthesis of the mugineic acid family of phytosiderophores, the only phytosiderophore family known in plants. Phytosiderophores are iron chelators whose secretion by the roots greatly increases under iron deficiency PUBMED:9952442.\ 2696 IPR002621 \ This family consists of putative movement proteins from maize streak virus and wheat dwarf virus.\ 6321 IPR009475 \

    This family represents the N-terminal region of several proteins found in Caenorhabditis elegans. The family is often found with .

    \ 1228 IPR003762 \ The Escherichia coli araBAD operon consists of three genes encoding three enzymes that convert L-arabinose to D-xylulose 5-phosphate.\ L-arabinose isomerase (araA) catalyses the conversion of L-arabinose to L-ribulose, the first step in the pathway of L-arabinose utilization as a carbon source PUBMED:9084180.\ 5863 IPR010319 \

    Structural analysis predicts that this family of proteins are bacterial transglutaminase-like cysteine peptidases (BTLCPs) with an invariant Cys-His-Asp catalytic triad and an N-terminal signal sequence. They are predicted to possess the papain-like cysteine proteinase fold and catalyse post-translational protein modification through transamidase, acetylase or hydrolase activity. Inspection of neighbouring genes suggests a link between this predicted activity and a type-I secretion system resembling ATP-binding cassette exporters of toxins and proteases involved in bacterial pathogenicity PUBMED:15288868.

    \ 2829 IPR005494 \ This region contains the glutathionylspermidine synthase enzymatic activity . This is the C-terminal region in bienzymes such as . Glutathionylspermidine (GSP) synthetases of Trypanosomatidae and Escherichia coli couple the hydrolysis of ATP (to ADP and Pi) with the formation of an amide bond between spermidine and the glycine carboxylate of glutathione (gamma-Glu-Cys-Gly). In the pathogenic trypanosomatids, this reaction is the penultimate step in the biosynthesis of the antioxidant metabolite trypanothione (N1,N8-bis-(glutathionyl)spermidine) and is a target for drug design PUBMED:7775463.\ 1155 IPR010918 \

    This entry includes hydrogenase expression/formation protein HypE, which may be involved in the maturation of [NiFe] hydrogenases; AIR synthase and FGAM synthase, which are involved in de novo purine biosynthesis; and selenide, water dikinase, an enzyme that synthesizes selenophosphate from selenide and ATP.

    \ 6646 IPR009638 \

    This family represents the eukaryotic Fez1 protein. Fez1 contains a leucine-zipper region with similarity to the DNA-binding domain of the cAMP-responsive activating-transcription factor 5 PUBMED:10097140. There is evidence that Fez1 inhibits cancer cell growth through regulation of mitosis, and that its alterations result in abnormal cell growth PUBMED:11504921. Note that some family members contain more than one copy of this region.

    \ 3367 IPR005656 \

    This family includes 2-methylcitrate dehydratase (PrpD) that is required for propionate catabolism. It catalyses the third step of the 2-methylcitric acid cycle.

    \ 6830 IPR010734 \

    This represents a conserved region approximately 180 residues long within eukaryotic copines. Copines are Ca2+-dependent phospholipid-binding proteins that are thought to be involved in membrane trafficking, and they may also be involved in cell division and growth PUBMED:12440769.

    \ 2144 IPR007434 \ This family contains several proteins of uncharacterised function.\ 281 IPR007284 \

    This group of proteins contains one or more copies of the ground-like domain, which is specific to Caenorhabditis elegans and Caenorhabditis briggsae. It has been proposed that ground-like domain-containing proteins may bind and modulate the activity of Patched-like membrane molecules, reminiscent of the modulating activities of neuropeptides PUBMED:10523520.

    \ 2293 IPR007020 \

    These are proteins of unknown function found in Lactococcus lactis and its associated bacteriophages.

    \ 891 IPR005331 \

    Chondroitin 4-sulphotransferase catalyses the transfer of sulphate to the C-4 position of N-acetylgalactosamine in chondroitin and desulphated dermatan sulphate, but it does not form 4,6-di-O-sulphated N-acetylgalactosamine when chondroitin sulphate C is used as an acceptor. This suggests that 4-O-sulphation of N-acetylgalactosamine may precede epimerization of glucuronic acid to iduronic acid during dermatan sulphate biosynthesis. HNK-1 and other Golgi-associated sulphotransferases share homologous sequences, including the RDP motif.

    \ 4962 IPR003307 \

    This domain of unknown function is found at the C-terminus of several translation initiation factors PUBMED:8520487. It was first detected at the very C-termini of the yeast protein GCD6 (eIF-2B epsilon) and of two other eukaryotic translation initiation factors, eIF-4 gamma and eIF-5. It may be involved in the interaction of eIF-2B, eIF-4 gamma and eIF-5 with eIF-2 PUBMED:8520487.

    \ 5381 IPR008480 \ This family consists of several plant proteins of unknown function. Three of the sequences, from Gossypium hirsutum, are described as fibre-expressed proteins PUBMED:9750105. The remaining sequences, from Arabidopsis thaliana, are uncharacterised.\ 1937 IPR003870 \ This domain is found in a family of hypothetical proteins, mostly from Mycobacterium tuberculosis, which includes a putative transposase.\ 6453 IPR009533 \

    This family consists of several hypothetical eukaryotic proteins of unknown function.

    \ 1498 IPR004918 \ In the budding yeast Saccharomyces cerevisiae, cell division control protein Cdc37 is required for the productive formation of Cdc28-cyclin complexes. Cdc37 may be a kinase targeting subunit of Hsp90 PUBMED:9242486.\ 8137 IPR013222 \

    This novel putative carbohydrate binding module (NPCBM) domain is found at the N-terminus of glycosyl hydrolase family 98 proteins.

    \ 7871 IPR012574 \

    This family consists of proteins with similarity to the mitochondrial proteolipids. Mitochondrial proteolipid consists of about 60 amino acid residues and is about 6.8 kDa in size PUBMED:2298292.
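As a rough consistency check on these figures, a protein's mass can be estimated from its length using an average residue mass. The sketch assumes the common rule of thumb of about 110 Da per residue. That value is an approximation, not something stated in this entry, and the function name `approx_mass_kda` is hypothetical.

```python
# Rule-of-thumb average mass of an amino acid residue in a protein (Da).
# This is an assumption: the true value depends on amino acid composition.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int, residue_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Estimate a polypeptide's mass in kDa from its length alone."""
    return n_residues * residue_mass_da / 1000.0


print(round(approx_mass_kda(60), 2))  # 6.6
```

The estimate of about 6.6 kDa for 60 residues is in reasonable agreement with the ~6.8 kDa quoted above. The difference is within what a composition-independent average can be expected to miss.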

    \ 7001 IPR009836 \

    This family represents a conserved region approximately 150 residues long within a number of hypothetical plant proteins of unknown function.

    \ 7057 IPR009868 \

    This family consists of several VirE2 proteins which seem to be specific to Agrobacterium tumefaciens and Rhizobium etli. VirE2 is known to interact, via its C terminus, with VirD4. Agrobacterium tumefaciens transfers oncogenic DNA and effector proteins to plant cells during the course of infection. Substrate translocation across the bacterial cell envelope is mediated by a type IV secretion (TFS) system composed of the VirB proteins, as well as VirD4, a member of a large family of inner membrane proteins implicated in the coupling of DNA transfer intermediates to the secretion machine. VirE2 is therefore thought to be a protein substrate of a type IV secretion system which is recruited to a member of the coupling protein superfamily PUBMED:12950931.

    \ 580 IPR000353 \

    Major Histocompatibility Complex (MHC) glycoproteins are heterodimeric cell surface receptors that function to present antigen peptide fragments to T cells responsible for cell-mediated immune responses. MHC molecules can be subdivided into two groups on the basis of structure and function: class I molecules present intracellular antigen peptide fragments (~10 amino acids) on the surface of the host cells to cytotoxic T cells; class II molecules present exogenously derived antigenic peptides (~15 amino acids) to helper T cells. MHC class I and II molecules are assembled and loaded with their peptide ligands via different mechanisms. However, both present peptide fragments rather than entire proteins to T cells, and are required to mount an immune response.

    \

    Class II MHC glycoproteins are expressed on the surface of antigen-presenting cells (APC), including macrophages, dendritic cells and B cells. MHC II proteins present peptide antigens that originate extracellularly from foreign bodies such as bacteria. Proteins from the pathogen are degraded into peptide fragments within the APC, which sequesters these fragments into the endosome so they can bind to MHC class II proteins, before being transported to the cell surface. MHC class II receptors display antigens for recognition by helper T cells (stimulate development of B cell clones) and inflammatory T cells (cause the release of lymphokines that attract other cells to site of infection) PUBMED:15120183.

    \

    MHC class II molecules are composed of two membrane-spanning chains, alpha () and beta, of similar size. Both chains consist of two globular domains (N- and C-terminal) and a transmembrane segment that anchors them to the membrane PUBMED:7612235. A groove in the structure acts as the peptide-binding site. This entry represents the N-terminal domain (also called the beta-1 domain) of the beta chain.

    \ \ 7654 IPR012386 \

    This group represents a 2',3' cyclic phosphodiesterase, plant type. Please see the following relevant references: PUBMED:11694509, PUBMED:12466548.

    \ 3633 IPR003186 \ The PA28 activator complex (also known as the 11S regulator of the 20S proteasome) is a ring-shaped hexameric structure of alternating alpha and beta subunits. This entry represents the beta subunit. The activator complex binds to the 20S proteasome and stimulates peptidase activity in an ATP-independent manner.\ 2227 IPR006707 \ This is a family of hypothetical bacterial proteins.\ 3486 IPR000900 \ Nebulin is a 600-800 kDa protein found in the thin filaments of striated vertebrate muscle. It is presumed to play a role in binding and stabilising F-actin PUBMED:8609630, essentially by providing a template for actin polymerisation (i.e., acting as an "actin zipper"). The amino acid sequence shows a uniform repeating pattern along its length, with a repeated 35-residue motif constituting up to 97% of the polypeptide. Analysis of individual repeats reveals a progressive N- to C-terminal divergence, coupled with an increasing alpha-helix propensity. This correlates with a higher binding affinity for F-actin at the C-terminus. It is therefore postulated that once the repeats have formed an initiation complex, the whole length of the nebulin molecule may then associate with the thin filament in a highly co-operative process, in a manner similar to the closing of a zipper PUBMED:8609630.\ 4742 IPR006678 \

    tRNA-intron endonucleases () cleave pre-tRNA, producing 5'-hydroxyl and 2',3'-cyclic phosphate termini and specifically removing the intron PUBMED:9200602. This entry represents the N-terminal domain of tRNA-intron endonuclease.

    \ 7228 IPR010873 \

    This family contains interleukin 11 (approximately 200 residues long). This is a secreted protein that stimulates megakaryocytopoiesis, resulting in increased production of platelets, as well as activating osteoclasts, inhibiting epithelial cell proliferation and apoptosis, and inhibiting macrophage mediator production. These functions may be particularly important in mediating the hematopoietic, osseous and mucosal protective effects of interleukin 11 PUBMED:9416001. Family members seem to be restricted to mammals.

    \ 4770 IPR001042 \

    This signature defines two sets of proteins, one approximately 440 amino acids and the other approximately 1755 amino acids in length. The latter group, described as Ty1 protein B, has an aspartic peptidase signature that belongs to MEROPS peptidase family A11 (clan AA), subfamily A11B.

    \ \ \

    Yeast retrotransposon Ty1 produces its proteins as precursors that are subsequently cleaved by an aspartic protease encoded by the element. Cleavage of the Gag and Gag-Pol polyprotein precursors is a critical step in proliferation of retroviruses and retroelements. These cleavage events are essential for transposition as they release the active reverse transcriptase and integrase and they modify the structure of the virus-like particles in a way that is analogous to the morphological changes that occur during retrovirus core maturation PUBMED:9261411, PUBMED:8971723, PUBMED:8764068.

    \ \ 1330 IPR006934 \ Baculovirus occlusion-derived virus (ODV) derives its envelope from an intranuclear membrane source. The N-terminal amino acid sequence of the Autographa californica nuclear polyhedrosis virus (AcMNPV) envelope protein ODV-E66 is highly hydrophobic. This defined hydrophobic domain was shown to direct the protein, E66, to induced membrane microvesicles within the nucleus of baculovirus-infected cells and to the viral envelope. In addition, it was suggested that movement of this protein into the nuclear envelope may begin in cytoplasmic membranes, such as the endoplasmic reticulum, and that transport into the nucleus may be mediated through the outer and inner nuclear membranes PUBMED:9108103.\ \ 4767 IPR003913 \

    Tuberous sclerosis (TSC) is an autosomal dominant disorder caused by a mutation in either the TSC1 or the TSC2 tumour suppressor gene. The disease is characterised by hamartomas in one or more organs (including brain, skin, heart and kidney), giving rise to a broad phenotypic spectrum (including seizures, mental retardation, renal dysfunction and dermatological abnormalities) PUBMED:9580671. TSC2 encodes tuberin, a putative GTPase-activating protein for Rap1 and Rab5. The TSC1 gene was recently identified and codes for hamartin, a novel protein with no significant similarity to tuberin or any other known vertebrate protein. Hamartin and tuberin have been shown to associate physically in vivo, their interaction being mediated by predicted coiled-coil domains. Hamartin and tuberin are thought to function in the same complex rather than in separate pathways. Moreover, because oligomerisation of the hamartin C-terminal coiled-coil domain is inhibited by the presence of tuberin, tuberin may act as a chaperone that prevents hamartin self-aggregation. \

    Tuberin is a widely expressed 1784-amino-acid protein PUBMED:7558029. Expression of the wild-type gene in TSC2 mutant tumour cells inhibits proliferation and tumorigenicity. This "suppressor" activity is encoded by a functional domain in the C-terminus that shares similarity with the GTPase-activating protein Rap1GAP. In a yeast two-hybrid assay, the cytosolic factor rabaptin-5 was found to associate with a distinct domain lying adjacent to the TSC2 GAP similarity region. Rabaptin-5 also binds the active form of the GTPase Rab5. Tuberin is thought to function as a Rab5GAP in vivo, negatively regulating Rab5-GTP activity in endocytosis PUBMED:9045618.

    \ 7752 IPR012473 \

    This family features sequences bearing similarity to the C-terminal portion of the bacteriophage T4 protein fibritin (). This protein is responsible for attaching the long tail fibres to the virus particle and forms the "whiskers", or fibres, on the neck of the virion. The region seen in this family contains an N-terminal coiled-coil portion and the C-terminal globular foldon domain (residues 457-486), which is essential for fibritin trimerisation and folding PUBMED:15033360. This domain consists of a beta-hairpin. Three such hairpins come together in a beta-propeller-like arrangement in the trimer, which is stabilised by hydrogen bonds, salt bridges and hydrophobic interactions PUBMED:15033360.

    \ 3598 IPR000183 \ These enzymes are collectively known as group IV decarboxylases PUBMED:8181483. Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into two different families on the basis of sequence similarities PUBMED:3143046, PUBMED:8181483. Members of this family, while most probably evolutionarily related, do not share extensive regions of sequence similarity. The proteins contain a conserved lysine residue that is known, in mouse ODC PUBMED:1730582, to be the site of attachment of the pyridoxal-phosphate group. The proteins also contain a stretch of three consecutive glycine residues, which has been proposed to be part of a substrate-binding region PUBMED:2198270.\ 1419 IPR004695 \

    Two members of the Tellurite-Resistance/Dicarboxylate Transporter (TDT) family have been functionally characterized. One is the TehA protein of Escherichia coli, which has been implicated in resistance to tellurite. The other is the Mae1 protein of Schizosaccharomyces pombe, which takes up malate and other dicarboxylates by a proton symport mechanism. These proteins exhibit 10 putative transmembrane alpha-helical spanners (TMSs).

    \ \ 3706 IPR000181 \

    Peptide deformylase (PDF) is an essential metalloenzyme required for the removal of the formyl group at the N-terminus of nascent polypeptide chains in eubacteria PUBMED:9846875. The enzyme acts as a monomer and binds a single zinc ion, catalysing the reaction:\ \ Catalytic efficiency strongly depends on the identity of the bound metal PUBMED:9565550.

    \

    The structure of these enzymes is known PUBMED:8845003, PUBMED:9665852. PDF, a member of the zinc metalloprotease family, comprises an active core domain of 147 residues and a C-terminal tail of 21 residues. The 3D fold of the catalytic core has been determined by X-ray crystallography and NMR. Overall, the structure contains a series of antiparallel beta-strands that surround two perpendicular alpha-helices. The C-terminal helix contains the characteristic HEXXH motif of metalloenzymes, which is crucial for activity. The helical arrangement, and the way the histidine residues bind the zinc ion, is reminiscent of other metalloproteases, such as thermolysin or metzincins. However, the arrangement of secondary and tertiary structures in PDF, and the positioning of its third zinc ligand (a cysteine residue), are quite different. These discrepancies, together with notable biochemical differences, suggest that PDF constitutes a new class of zinc metalloproteases PUBMED:8845003.
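The HEXXH motif mentioned above is a simple sequence pattern, so candidate occurrences can be located with a regular expression. This is a minimal sketch: `find_hexxh` is a hypothetical helper, and the inputs are made-up toy strings rather than real PDF sequences. A pattern match alone does not show that a protein is a metalloenzyme. The pattern also does not model the third zinc ligand, the cysteine residue, described in the text.

```python
import re

# HEXXH: His, Glu, any two residues, His. Wrapping the pattern in a
# zero-width lookahead lets overlapping occurrences be reported.
HEXXH = re.compile(r"(?=HE..H)")


def find_hexxh(sequence: str) -> list[int]:
    """Return 0-based start positions of every HEXXH occurrence."""
    return [match.start() for match in HEXXH.finditer(sequence.upper())]


print(find_hexxh("GGHEAGHGG"))  # [2]
print(find_hexxh("HELLHEAAH"))  # [0, 4]
```

The lookahead matters in the second example. The two motifs share the histidine at position 4, and a plain non-overlapping search would report only the first one.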

    \ 1400 IPR007541 \ These basic secretory proteins (BSPs) are believed to be part of the plant defence mechanism against pathogens PUBMED:10202814.\ 38 IPR007798 \ This family consists of mammalian ameloblastin precursor (amelin) proteins. The matrix proteins of tooth enamel consist mainly of amelogenin but also include non-amelogenin proteins, which, although their volumetric percentage is low, have an important role in enamel mineralization. One of the non-amelogenin proteins is ameloblastin, also known as amelin and sheathlin. Ameloblastin (AMBN) is one of the enamel sheath proteins and is thought to have a role in determining the prismatic structure of growing enamel crystals PUBMED:11867231.\ 7951 IPR012635 \

    This family consists of acidic alpha-KTx short-chain scorpion toxins. These toxins, named parabutoxins, block voltage-gated K+ channels and have extremely low pI values. They lack the crucial pore-plugging lysine. The second important residue of the functional dyad, the hydrophobic residue (Phe or Tyr), is also missing PUBMED:14561751.

    \ 7486 IPR011647 \ This motif occurs in multiple copies in Leptospira interrogans proteins.\ 4247 IPR001351 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, and the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S3 is one of the proteins of the small ribosomal subunit. In Escherichia coli, S3 is known to be involved in the binding of initiator Met-tRNA. This family of ribosomal proteins includes S3 from bacteria, algae and plant chloroplasts, cyanelles, archaebacteria, plant mitochondria, vertebrates, insects, Caenorhabditis elegans and yeast PUBMED:8036511. This entry represents the C-terminal domain.

    \ 308 IPR006874 \ This is a conserved region found in uncharacterised proteins from Caenorhabditis elegans.\ 7980 IPR012961 \

    This C-terminal domain is found in DOB1/SKI2/helY-like DEAD box helicases PUBMED:15112237.

    \ 3985 IPR004168 \ This protein motif is found in the PEVK region of the titin protein. Titin is a muscle protein that may be involved in muscle assembly and in maintaining the structural integrity of sarcomeres. It may have protein kinase activity.\ 2516 IPR000577 \ It has been shown PUBMED:1659648 that four different types of carbohydrate kinase seem to be evolutionarily related. Enzymes in this family include L-fuculokinase () (gene fucK); gluconokinase () (gene gntK); glycerol kinase () (gene glpK); xylulokinase () (gene xylB); and L-xylulose kinase () (gene lyxK). These enzymes are 480 to 520 amino acid residues long.\ 7408 IPR011505 \

    These peptidases, which cleave mammalian IgA, are found in Gram-positive bacteria. Often found associated with , they may be attached to the cell wall.

    \ 2901 IPR003493 \ Herpesvirus glycoprotein H (gH) is a virion associated envelope glycoprotein PUBMED:9526546. Complex formation between gH and gL has been demonstrated in both virions and infected cells PUBMED:9267002.\ 3676 IPR001523 \

    The paired box is a conserved 124 amino acid N-terminal domain of unknown function that usually, but not always, precedes a homeobox domain (see ) PUBMED:7527137, PUBMED:7981748. Paired box genes are expressed in alternate segments of the developing fruit fly, the observed grouping of segments into pairs depending on the position of the segment in the segmental array, and not on the identity of the segment as in the case of homeotic genes. This implies that the genes affect different processes from those altered by homeotic genes.

    \ 5691 IPR008568 \ This family consists of eukaryotic putative transmembrane proteins of unknown function.\ 279 IPR004378 \ The Mycobacterium tuberculosis paralogous family 11 groups a number of related hypothetical proteins from this organism. The function of these proteins is not yet known.\ 4859 IPR005351 \

    This is a small family of proteins of unknown function which appear to be related to the hypothetical protein CG10674 from Drosophila melanogaster ().

    \ 7382 IPR011490 \

    This domain is found in a wide variety of contexts, but mostly occurring in cell wall associated proteins. A lack of conserved catalytic residues suggests that it is a binding domain. From context, possible substrates are hyaluronate or fibronectin (personal obs: C Yeats). This is further evidenced by PUBMED:12438356. Possibly the exact substrate is N-acetyl glucosamine. Finding it in the same protein as further supports this proposal. It is found in the C-terminal part of , which is removed during maturation PUBMED:14759609. Some of the proteins it is found in (e.g. ) are involved in methicillin resistance PUBMED:10896508. The name FIVAR derives from Found In Various Architectures.

    \ 7446 IPR011474 \

    This is a family of short hypothetical proteins found in Rhodopirellula baltica.

    \ 3409 IPR006685 \

    Mechanosensitive (MS) channels provide protection against hypo-osmotic shock, responding both to stretching of the cell membrane and to membrane depolarisation. They are present in the membranes of organisms from all three domains of life: bacteria, archaea and eukarya PUBMED:12626684. There are two families of MS channels: large-conductance MS channels (MscL) and small-conductance MS channels (MscS or YGGB). The pressure threshold for MscS opening is 50% that of MscL PUBMED:12446901. The MscS family is much larger and more variable in size and sequence than the MscL family. Much of the diversity in MscS proteins lies in the size of the transmembrane region, which ranges from three to eleven transmembrane helices, although the three C-terminal helices are conserved. This family contains sequences from the MscS family of proteins.

    \

    MscS folds as a homo-heptamer with a cylindrical shape, and can be divided into transmembrane and extramembrane regions: an N-terminal periplasmic region, a transmembrane region, and a C-terminal cytoplasmic region (middle and C-terminal domains). The transmembrane region forms a channel through the membrane that opens into a chamber enclosed by the extramembrane portion, the latter connecting to the cytoplasm through distinct portals PUBMED:12446901.

    \ 794 IPR007209 \ This is a possible metal-binding domain in endoribonuclease RNase L inhibitor. It is found at the N-terminal end of RNase L inhibitor proteins, adjacent to the 4Fe-4S binding domain, fer4, . Also often found adjacent to the DUF367 domain in uncharacterised proteins. The RNase L system plays a major role in the anti-viral and anti-proliferative activities of interferons PUBMED:9524254, and could possibly play a more general role in the regulation of RNA stability in mammalian cells. Inhibitory activity requires concentration-dependent association of RLI with RNase L PUBMED:7539425.\ 5558 IPR008427 \ This fungal specific cysteine rich domain is found in some proteins with proposed roles in fungal pathogenesis PUBMED:12633989.\ 3971 IPR005008 \

    This family represents the Poxvirus rifampicin resistance protein. No genotypic variants of Poxvirus family members encoding a predicted C-terminally truncated form of these proteins have been isolated. This failure suggests that the C terminus of the molecule may be essential to protein function and, in turn, that this function may be essential to viral replication. It has been proposed that possession of a gene encoding a member of this polypeptide family might represent a defining molecular characteristic of the Poxviridae PUBMED:8609479.

    \ 2620 IPR004216 \

    L-fucose isomerase converts the aldose L-fucose into the corresponding ketose L-fuculose in the first step of fucose metabolism, using Mn2+ as a cofactor. The enzyme is a hexamer, forming the largest structurally known ketol isomerase, and has no sequence or structural similarity to other ketol isomerases. The structure was determined by X-ray crystallography at 2.5 Å resolution PUBMED:9367760.

    \ 3041 IPR011600 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
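The naming rule above is mechanical enough to express directly in code. The following Python sketch, using the hypothetical helper names `catalytic_type` and `nucleophile`, maps the first character of a family or clan identifier (for example C14, the caspase family, or M12) to its catalytic type and to the class of nucleophile described in the text. It is only an illustration of the stated convention, not a MEROPS tool or API.

```python
# Catalytic-type prefixes as listed in the text. Families carry one of the
# specific letters; "P" is used only for clans that mix catalytic types.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "P": "mixed (clan only)",
    "U": "unknown",
}

# Types whose nucleophile is part of an amino acid (acyl intermediate formed).
AMINO_ACID_NUCLEOPHILE = {"C", "S", "T"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of an ID."""
    return CATALYTIC_TYPES[identifier[0].upper()]


def nucleophile(identifier: str) -> str:
    """Classify the nucleophile implied by the catalytic type, per the text."""
    letter = identifier[0].upper()
    if letter in AMINO_ACID_NUCLEOPHILE:
        return "amino acid side chain"
    if letter in WATER_NUCLEOPHILE:
        return "activated water"
    # Mixed (P) or unknown (U) types give no single answer.
    return "not determined"
```

For instance, `catalytic_type("CD")` returns `"cysteine"` for the clan containing the caspases, and `nucleophile("M12")` returns `"activated water"`.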

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins) and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of sequences represents the p20 (20 kDa) subunit of caspases. The p20 subunit and the p10 (10 kDa) subunit form the catalytic domain of the caspase and are derived from the p45 (45 kDa) precursor PUBMED:15226512.

    \ \

    Caspases (Cysteine-dependent ASPartyl-specific proteASE) are cysteine peptidases that belong to the MEROPS peptidase family C14 (caspase family, clan CD) based on the architecture of their catalytic dyad or triad PUBMED:11517925. Caspases are tightly regulated proteins that require zymogen activation to become active, and once active can be regulated by caspase inhibitors. Activated caspases act as cysteine proteases, using the sulphydryl group of a cysteine side chain for catalysing peptide bond cleavage at aspartyl residues in their substrates. The catalytic cysteine and histidine residues are on the p20 subunit after cleavage of the p45 precursor.

    \

    Caspases are mainly involved in mediating cell death (apoptosis) PUBMED:10578171, PUBMED:10872455, PUBMED:15077141. They have two main roles within the apoptosis cascade: as initiators that trigger the cell death process, and as effectors of the process itself. Caspase-mediated apoptosis follows two main pathways, one extrinsic and the other intrinsic or mitochondrial-mediated. The extrinsic pathway involves the stimulation of various TNF (tumour necrosis factor) cell surface receptors on cells targeted to die by various TNF cytokines that are produced by cells such as cytotoxic T cells. The activated receptor transmits the signal to the cytoplasm by recruiting FADD, which forms a death-inducing signalling complex (DISC) with caspase-8. The subsequent activation of caspase-8 initiates the apoptosis cascade involving caspases 3, 4, 6, 7, 9 and 10. The intrinsic pathway arises from signals that originate within the cell as a consequence of cellular stress or DNA damage. The stimulation or inhibition of different Bcl-2 family proteins results in the leakage of cytochrome c from the mitochondria, and the formation of an apoptosome composed of cytochrome c, Apaf1 and caspase-9. The subsequent activation of caspase-9 initiates the apoptosis cascade involving caspases 3 and 7, among others. At the end of the cascade, caspases act on a variety of signal transduction proteins, cytoskeletal and nuclear proteins, chromatin-modifying proteins, DNA repair proteins and endonucleases that destroy the cell by disintegrating its contents, including its DNA. The different caspases have different domain architectures depending on where they fit into the apoptosis cascades; however, they all carry the catalytic p10 and p20 subunits.

    \

    Caspases can have roles other than in apoptosis; for example, caspase-1 (interleukin-1 beta convertase) is involved in the inflammatory process. The activation of apoptosis can sometimes lead to caspase-1 activation, providing a link between apoptosis and inflammation, such as during the targeting of infected cells. Caspases may also be involved in cell differentiation PUBMED:15066636.

    \ \ 4185 IPR013025 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) or large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This domain is found in both eukaryotic L25 and prokaryotic and eukaryotic L23 proteins.

    \ 866 IPR003395 \

    The SMC (structural maintenance of chromosomes) family of proteins exists in virtually all organisms, including both bacteria and archaea. SMC proteins are essential for successful chromosome transmission during replication and segregation of the genome in all organisms, and form three types of heterodimer (SMC1/SMC3, SMC2/SMC4 and SMC5/SMC6), which are core components of large multiprotein complexes. The best known complexes are cohesin, which is responsible for sister-chromatid cohesion, and condensin, which is required for full chromosome condensation in mitosis.

    SMCs are generally present as single proteins in bacteria, and as at least six distinct proteins in eukaryotes. The proteins range in size from approximately 110 to 170 kDa, and share a five-domain structure, with globular N- and C-terminal domains separated by a long (circa 100 nm or 900 residues) coiled-coil segment, in the centre of which is a globular 'hinge' domain characterized by a set of four highly conserved glycine residues that are typical of flexible regions in a protein. The amino-terminal domain contains a 'Walker A' nucleotide-binding motif (GxxGxGKS/T), which mutational studies have shown to be essential in several proteins. The carboxy-terminal domain contains a sequence (the DA-box) that resembles a 'Walker B' motif (XXXXD, where X is any hydrophobic residue), and an LSGG motif with homology to the signature sequence of the ATP-binding cassette (ABC) family of ATPases PUBMED:12360193.
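As a rough illustration of how the motifs named above can be located in a sequence, the sketch below scans a protein string with regular expressions. The hydrophobic alphabet used for the DA-box/Walker B pattern is an assumption (the text only says "any hydrophobic residue"), the function name `find_smc_motifs` is hypothetical, and a pattern match on its own is not evidence of a functional ATPase.

```python
import re

# Walker A as written in the text: GxxGxGK followed by S or T (x = any residue).
WALKER_A = re.compile(r"G..G.GK[ST]")
# ASSUMPTION: this residue set stands in for "any hydrophobic residue".
HYDROPHOBIC = "AVILMFWC"
# DA-box / Walker B pattern XXXXD: four hydrophobic residues, then aspartate.
WALKER_B = re.compile(rf"[{HYDROPHOBIC}]{{4}}D")
# ABC-type signature fragment named in the text.
LSGG = re.compile(r"LSGG")


def find_smc_motifs(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif (non-overlapping hits)."""
    seq = seq.upper()
    patterns = {"walker_a": WALKER_A, "walker_b": WALKER_B, "lsgg": LSGG}
    return {
        name: [m.start() for m in pat.finditer(seq)]
        for name, pat in patterns.items()
    }
```

On the toy (non-biological) string `MKGAAGSGKSTLLSGGEIVLAD` this returns `{'walker_a': [2], 'walker_b': [17], 'lsgg': [12]}`. In a real SMC protein, the Walker A hit would be expected in the N-terminal domain and the DA-box and LSGG hits in the C-terminal domain, separated by the long coiled coil.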

    All SMC proteins appear to form dimers, either homodimers, as in the case of prokaryotic SMC proteins, or heterodimers between different but related SMC proteins. The dimers are arranged in an antiparallel alignment. This orientation brings the N- and C-terminal globular domains (from either different or identical protomers) together, which unites an ATP-binding site (Walker A motif) within the N-terminal domain with a Walker B motif (DA box) within the C-terminal domain to form a potentially functional ATPase. Protein interaction and microscopy data suggest that SMC dimers form a ring-like structure which might embrace DNA molecules. Non-SMC subunits associate with the SMC amino- and carboxy-terminal domains. The sequence homology within the carboxy-terminal domain is relatively high within the SMC1-SMC4 group, whereas SMC5 and SMC6 show some divergence in both of these sequences.

    \

    SMCs share not only sequence similarity but also structural similarity with ABC proteins. SMC proteins function together with other proteins in a range of chromosomal transactions, including chromosome condensation, sister-chromatid cohesion, recombination, DNA repair and epigenetic silencing of gene expression PUBMED:11983169.

    \

    This domain is found at the N terminus of SMC proteins.

    \ 4028 IPR005610 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbW found in PSII, where it is a subunit of the oxygen-evolving complex. PsbW appears to have several roles, including guiding PSII biogenesis and assembly, stabilising dimeric PSII PUBMED:10950961, and facilitating PSII repair after photo-inhibition PUBMED:9335523. There appear to be two classes of PsbW: class 1, found predominantly in algae and cyanobacteria, and class 2, found predominantly in plants. This entry represents class 1 PsbW.

    \ 1237 IPR001669 \

    The arginine dihydrolase (AD) pathway is found in many prokaryotes and some primitive eukaryotes, an example of the latter being Giardia lamblia PUBMED:9504342. The three-enzyme anaerobic pathway breaks down L-arginine to form 1 mol of ATP, carbon dioxide and ammonia. In simpler bacteria, the first enzyme, arginine deiminase, can account for up to 10% of total cell protein PUBMED:9504342.

    \ \

    Most prokaryotic arginine deiminase pathways are under the control of a repressor gene, termed ArgR PUBMED:1583685. This is a negative regulator, and will only release the arginine deiminase operon for expression in the presence of arginine PUBMED:9851988. The crystal structure of apo-ArgR from Bacillus stearothermophilus has been determined to 2.5 Å by X-ray crystallography PUBMED:10331868. The protein exists as a hexamer of identical subunits, and is shown to have six DNA-binding domains, clustered around a central oligomeric core when bound to arginine. It predominantly interacts with A·T base pairs in ARG boxes. This hexameric protein binds DNA through its N-terminal domain to repress arginine biosynthesis or activate arginine catabolism. Some species have several ArgR paralogs. In a neighbor-joining tree, some of these paralogous sequences show long branches and differ significantly from the well-conserved C-terminal region.

    \ 7156 IPR009933 \

    This family consists of several T-DNA border endonuclease VirD1 proteins, which appear to be found exclusively in Agrobacterium species. Agrobacterium, a plant pathogen, is capable of stably transforming the plant cell with a segment of its own DNA called T-DNA (transferred DNA). Among other factors, this process depends on the specialised bacterial virulence proteins VirD1 and VirD2, which excise the T-DNA from its adjacent sequences. VirD1 is thought to interact with VirD2 in this process PUBMED:9689041.

    \ 4267 IPR005913 \

    dTDP-4-dehydrorhamnose reductase catalyzes the last of four steps in the synthesis of dTDP-rhamnose, a precursor of LPS components such as the core antigen and O-antigen.

    \ \ 7417 IPR011449 \

    This is a family of proteins identified in Rhodopirellula baltica. Members are also found in the proteobacteria, the Chlorobiaceae (green sulphur bacteria) and the betaproteobacteria, e.g. Nitrosomonas europaea. One member, from Rhodopirellula baltica, shows some similarity to M12B zinc peptidases.

    \ 4948 IPR001747 \

    This family contains regions from vitellogenin, microsomal triglyceride transfer protein and apolipoprotein B-100. These proteins are all involved in lipid transport PUBMED:9687371.

    This family contains the LV1n chain from lipovitellin, the predominant lipoprotein found in the yolk of egg-laying animals, involved in lipid and metal storage. LV1n forms two domains and portions of two more: the N-sheet, the helical segment, one beta-strand of the A-sheet, and all but two beta-strands of the C-sheet. The N-sheet domain mainly consists of 11 beta-strands wrapped around an uncharged helix of 14 residues, although another beta-strand and three more small helices are included. There are two disulphide bonds within this region; one is conserved in the homologous proteins MTP and apoB. An extended loop containing two beta-strands helps to link the N-sheet to both the C- and A-sheets. In addition, a depression in the N-sheet globular domain accepts the loops of several beta-strands from the C- and A-sheets, forming multiple interactions. It has been suggested that the region resembles, and may function as, a flexible "ball-and-socket" joint accommodating the lipid as the lipoprotein assembles PUBMED:12135361.

    \ \ 1319 IPR001425 \ The bacterial opsins are retinal-binding proteins that provide light-dependent ion transport and sensory functions to a family of halophilic bacteria PUBMED:2468194, PUBMED:2591367. They are integral membrane proteins believed to contain seven transmembrane (TM) domains, the last of which contains the attachment point for retinal (a conserved lysine).

    There are several classes of these bacterial proteins: they include bacteriorhodopsin and archaerhodopsin, which are light-driven proton pumps; halorhodopsin, a light-driven chloride pump; and sensory rhodopsin, which mediates both photoattractant (in the red) and photophobic (in the UV) responses.

    \ 7891 IPR012989 \

    The SEP domain is named after Saccharomyces cerevisiae Shp1, Drosophila melanogaster eyes closed gene (eyc), and vertebrate p47. In p47, the SEP domain has been shown to bind to and inhibit the cysteine protease cathepsin L PUBMED:15498563. Most SEP domains are closely followed by a UBX domain PUBMED:15498563.

    \ 5979 IPR009318 \

    In Drosophila, taste is perceived by gustatory neurons located in sensilla distributed on several different appendages throughout the body of the animal. This family represents the taste receptor sensitive to trehalose PUBMED:10710312, PUBMED:11516643.

    \ 5666 IPR008565 \ This family consists of several hypothetical bacterial sequences as well as one viral sequence. The function of this family is unknown.\ 6748 IPR010703 \

    This family represents a conserved region of approximately 200 residues within a number of eukaryotic dedicator of cytokinesis (DOCK) proteins. These proteins are potential guanine nucleotide exchange factors that activate some small GTPases, such as Rac, by exchanging bound GDP for free GTP PUBMED:12432077. DOCK proteins are required during several cellular processes, such as cell motility and phagocytosis. For instance, DOCK2 is specifically expressed in haemopoietic cells, and plays a critical role in lymphocyte migration PUBMED:12829596.

    \ 865 IPR001163 \

    This family is found in Lsm (like-Sm) proteins and in bacterial Lsm-related Hfq proteins. In each case, the domain adopts a core structure consisting of an open beta-barrel with an SH3-like topology.

    \

    Lsm (like-Sm) proteins have diverse functions, and are thought to be important modulators of RNA biogenesis and function PUBMED:10801455, PUBMED:12438310. The Sm proteins form part of specific small nuclear ribonucleoproteins (snRNPs) that are involved in the processing of pre-mRNAs to mature mRNAs, and are a major component of the eukaryotic spliceosome. Most snRNPs consist of seven Sm proteins (B/B', D1, D2, D3, E, F and G) arranged in a ring on a uridine-rich sequence (Sm site), plus a small nuclear RNA (snRNA) (either U1, U2, U5 or U4/6) PUBMED:15130578. All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker PUBMED:7744013. In other snRNPs, certain Sm proteins are replaced with different Lsm proteins; for example, in U7 snRNPs the D1 and D2 Sm proteins are replaced with the U7-specific Lsm10 and Lsm11 proteins, where Lsm11 plays a role in U7-dependent histone RNA processing PUBMED:15526162. Lsm proteins are also found in archaebacteria, which lack a splicing apparatus, suggesting a more general role for Lsm proteins.

    \

    The pleiotropic translational regulator Hfq (host factor Q) is a bacterial Lsm-like protein that modulates the structure of numerous RNA molecules by binding preferentially to A/U-rich sequences in RNA PUBMED:15561140. Hfq adopts an Lsm-like fold; however, unlike the heptameric Sm proteins, Hfq forms a homo-hexameric ring.

    \ \ 7467 IPR011521 \

    These hypothetical proteins in Rhodopirellula baltica contain several repeats of a sequence whose core contains the residues YTV.

    \ 4072 IPR008162 \

    Inorganic pyrophosphatase (PPase) PUBMED:2160278, PUBMED:1323891 is the enzyme responsible for the hydrolysis of pyrophosphate (PPi), which is formed principally as the product of the many biosynthetic reactions that utilize ATP. All known PPases require the presence of divalent metal cations, with magnesium conferring the highest activity. Among other residues, a lysine has been postulated to be part of, or close to, the active site. PPases have been sequenced from bacteria such as Escherichia coli (homohexamer), thermophilic bacterium PS-3 and Thermus thermophilus, from the archaebacterium Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial isoform of PPase has been characterized which seems to be involved in energy production and whose activity is stimulated by uncouplers of ATP synthesis.

    \

    PPase sequences share several regions of similarity, among which is a region containing three conserved aspartates that are involved in cation binding.

    \ 6347 IPR010528 \

    This family consists of several bacterial TolA proteins as well as two eukaryotic proteins of unknown function. Tol proteins are involved in the translocation of group A colicins. Colicins are bacterial protein toxins that are active against Escherichia coli and other related species. TolA is anchored to the cytoplasmic membrane by a single membrane-spanning segment near the N terminus, leaving most of the protein exposed to the periplasm PUBMED:12423782.

    \ 5164 IPR008001 \

    Colony stimulating factor 1 (CSF-1) is a homodimeric polypeptide growth factor whose primary function is to regulate the survival, proliferation, differentiation, and function of cells of the mononuclear phagocytic lineage. This lineage includes mononuclear phagocytic precursors, blood monocytes, tissue macrophages, osteoclasts, and microglia of the brain, all of which possess cell surface receptors for CSF-1. The protein has also been linked with male fertility PUBMED:11897698, and mutations in the Csf-1 gene have been found to cause osteopetrosis and failure of tooth eruption PUBMED:12379742.

    \ 1315 IPR006135 \

    Secretion of virulence factors in Gram-negative bacteria involves transport of the protein across two membranes to reach the cell exterior. Four secretion systems have been described in animal enteropathogens such as Salmonella and Yersinia, with further sequence similarities in plant pathogens such as Ralstonia and Erwinia PUBMED:8969244.

    \ \ The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell PUBMED:10334981 and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis PUBMED:10564516. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.

    \ \ The type III inner membrane proteins are believed to serve as structural components of a complex with several other subunits PUBMED:9618447. One such set of inner membrane proteins, labeled "S" here for nomenclature purposes, includes the Salmonella and Shigella SpaS, the Yersinia YscU, Rhizobium Y4YO, and the Erwinia HrcU proteins. The flagellar protein FlhB also shares similarity, probably owing to evolution of the type III secretion system from the flagellar biosynthetic pathway.

    \ 1529 IPR000780 \

    Methyl transfer from the ubiquitous S-adenosyl-L-methionine (AdoMet) to nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms ranging from bacteria to plants and mammals. The reaction is catalyzed by methyltransferases (Mtases) and modifies DNA, RNA, proteins and small molecules, such as catechol, for regulatory purposes. The role of DNA methylation in prokaryotic restriction-modification systems and in a number of cellular processes in eukaryotes, including gene regulation and differentiation, is well documented.

    \ \

    Three classes of DNA Mtases transfer the methyl group from AdoMet to the target base to form either N-6-methyladenine, N-4-methylcytosine or C-5-methylcytosine. In C-5-cytosine Mtases, ten conserved motifs are arranged in the same order PUBMED:8127644. Motif I (a glycine-rich or closely related consensus sequence; FAGxGG in M.HhaI PUBMED:8343957), shared by other AdoMet-Mtases PUBMED:2684970, is part of the cofactor binding site, and motif IV (PCQ) is part of the catalytic site. In contrast, sequence comparison among N-6-adenine and N-4-cytosine Mtases revealed two conserved segments PUBMED:2690010, although more may be present. One of them corresponds to motif I in C-5-cytosine Mtases, and the other is named (D/N/S)PP(Y/F). Crystal structures are known for a number of Mtases PUBMED:7607476, PUBMED:8343957, PUBMED:8127644, PUBMED:7971991. The cofactor binding sites are almost identical and the essential catalytic amino acids coincide. The comparable protein folding and the existence of equivalent amino acids in similar secondary and tertiary positions indicate that many (if not all) AdoMet-Mtases have a common catalytic domain structure. This permits tertiary structure prediction of other DNA, RNA, protein and small-molecule AdoMet-Mtases from their amino acid sequences PUBMED:7897657.
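The motif-based distinction drawn above can be sketched as a crude sequence classifier. This is a simplification under stated assumptions: motif I is reduced to the M.HhaI form FAGxGG (real motif I sequences vary), motif IV of C-5-cytosine Mtases to the literal PCQ, and the amino-Mtase motif to (D/N/S)PP(Y/F). The name `guess_mtase_class` is hypothetical, and its output is a hint, not a family assignment.

```python
import re

# ASSUMPTION: motif I simplified to the M.HhaI consensus FAGxGG.
MOTIF_I_HHAI = re.compile(r"FAG.GG")
# Motif IV (catalytic) of C-5-cytosine Mtases.
MOTIF_IV_C5 = re.compile(r"PCQ")
# Conserved segment of N-6-adenine / N-4-cytosine Mtases: (D/N/S)PP(Y/F).
MOTIF_AMINO = re.compile(r"[DNS]PP[YF]")


def guess_mtase_class(seq: str) -> str:
    """Very rough class guess from the presence of the motifs in the text."""
    seq = seq.upper()
    if MOTIF_AMINO.search(seq):
        return "N6-adenine/N4-cytosine-like"
    if MOTIF_I_HHAI.search(seq) and MOTIF_IV_C5.search(seq):
        return "C5-cytosine-like"
    return "unclassified"
```

Requiring both motif I and motif IV for the C-5 call mirrors the text's point that motif I alone is shared by other AdoMet-Mtases and so does not discriminate between classes.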

    \ \

    Flagellated bacteria swim towards favourable chemicals and away from deleterious ones. Sensing of chemoeffector gradients involves chemotaxis receptors, transmembrane (TM) proteins that detect stimuli through their periplasmic domains and transduce the signals via their cytoplasmic domains PUBMED:, PUBMED:9115443. Signalling outputs from these receptors are influenced both by the binding of the chemoeffector ligand to their periplasmic domains and by methylation of specific glutamate residues on their cytoplasmic domains. Methylation is catalysed by CheR, an S-adenosylmethionine-dependent methyltransferase PUBMED:9115443, which reversibly methylates specific glutamate residues within a coiled-coil region to form gamma-glutamyl methyl ester residues PUBMED:9115443, PUBMED:9628482. The structure of the S. typhimurium chemotaxis receptor methyltransferase CheR, bound to S-adenosylhomocysteine, has been determined to a resolution of 2.0 Å PUBMED:9115443. The structure reveals CheR to be a two-domain protein, with a smaller N-terminal helical domain linked via a single polypeptide connection to a larger C-terminal alpha/beta domain. The C-terminal domain has the characteristics of a nucleotide-binding fold, with an insertion of a small antiparallel beta-sheet subdomain. The S-adenosylhomocysteine-binding site is formed mainly by the large domain, with contributions from residues within the N-terminal domain and the linker region PUBMED:9115443.

    \ 4108 IPR003783 \ RecX is a putative bacterial regulatory protein PUBMED:10869079. The gene encoding RecX is found downstream of recA, and it has been suggested that the RecX protein might regulate RecA activity by interacting with the RecA protein or filament PUBMED:10869079.\ 4195 IPR002171 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) or large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins that, on the basis of sequence similarities PUBMED:1579444, PUBMED:, comprises the following groups:

    \ \ 4928 IPR002714 \

    This family of proteins is involved in the ubiquitylation and subsequent proteasomal degradation of proteins via the von Hippel-Lindau ubiquitylation complex. They appear to act as the target recruitment subunit in the E3 ubiquitin ligase complex and recruit hydroxylated hypoxia-inducible factor (HIF) under normoxic conditions. They are also involved in transcriptional repression through interaction with HIF1A, HIF1AN and histone deacetylases. Human VHL has been shown to form a ternary complex with the elongin B and elongin C proteins PUBMED:10205047. This complex binds Cul2, which is then involved in the regulation of vascular endothelial growth factor mRNA.

    \ 5608 IPR008459 \ This family consists of several nonstructural protein 4 (NS4) sequences or putative small membrane proteins.\ 2053 IPR007205 \

    This is a protein of unknown function. It is found N-terminal to another domain of unknown function, DUF384.

    \ 1592 IPR003724 \

    The BtuR, CobO and CobP proteins are Cob(I)alamin adenosyltransferases involved in cobalamin (vitamin B12) biosynthesis. They are involved in cobalamin synthesis by an anaerobic pathway, in which cobalt is added at an early stage and molecular oxygen is not required PUBMED:9742225.

    \ 5899 IPR013029 \

    This domain is found at the C terminus of a family of conserved hypothetical proteins with GTP-binding motifs, and is possibly related to the ubiquitin-like and MoaD/ThiS superfamilies.

    \ 4060 IPR001313 \

    The Drosophila pumilio gene codes for an unusual protein that binds RNA through its Puf domain, which usually consists of a tandem array of eight repeats. The FBF-2 protein of Caenorhabditis elegans also has a Puf domain. Both proteins function as translational repressors in early embryonic development by binding sequences in the 3' UTR of target mRNAs PUBMED:9393998, PUBMED:9404893. The same type of repetitive domain has been found in a number of other proteins from all eukaryotic kingdoms. The Puf proteins characterised to date have been reported to bind to 3'-untranslated region (UTR) sequences encompassing a so-called UGUR tetranucleotide motif and thereby to repress gene expression by affecting mRNA translation or stability.
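To make the UGUR notation concrete (R denotes a purine, A or G), the sketch below lists the start positions of UGUR tetranucleotides in a 3' UTR sequence. The function name `find_ugur` is hypothetical, T is converted to U so that DNA input is also accepted, and a motif hit on its own does not imply that a Puf protein binds there.

```python
import re

# UGUR: U, G, U, then a purine (R = A or G). The zero-width lookahead lets
# overlapping occurrences be reported.
UGUR = re.compile(r"(?=UGU[AG])")


def find_ugur(utr: str) -> list[int]:
    """Return 0-based start positions of UGUR motifs in an RNA or DNA string."""
    rna = utr.upper().replace("T", "U")
    return [m.start() for m in UGUR.finditer(rna)]
```

For example, `find_ugur("AAUGUACCUGUG")` returns `[2, 8]`.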

    \

    In Saccharomyces cerevisiae, five proteins, termed Puf1p to Puf5p, bear six to eight Puf repeats PUBMED:15024427. Puf3p binds nearly exclusively to cytoplasmic mRNAs that encode mitochondrial proteins; Puf1p and Puf2p interact preferentially with mRNAs encoding membrane-associated proteins; Puf4p preferentially binds mRNAs encoding nucleolar ribosomal RNA-processing factors; and Puf5p is associated with mRNAs encoding chromatin modifiers and components of the spindle pole body. This suggests the existence of an extensive network of RNA-protein interactions that coordinates the post-transcriptional fate of large sets of cytotopically and functionally related RNAs through each stage of their life cycle.

    \ 6352 IPR010531 \

    This family consists of several NOA36 proteins which contain 29 highly conserved cysteine residues. The function of this protein is unknown.

    \ 147 IPR004165 \ Coenzyme A (CoA) transferases belong to an evolutionarily conserved PUBMED:1624453, PUBMED:9325289 family of enzymes catalyzing the reversible transfer of CoA from one carboxylic acid to another. They have been identified in many prokaryotes and in mammalian tissues. The bacterial enzymes are heterodimers of two subunits (A and B) of about 25 kDa each, while eukaryotic SCOT (succinyl-CoA:3-ketoacid CoA transferase) consists of a single chain that is colinear with the two bacterial subunits.\ 4503 IPR002778 \ The signal recognition particle (SRP) binds to the signal peptide of proteins as they are being translated. The binding of the SRP halts translation, and the complex is then transported to the cytoplasmic surface of the endoplasmic reticulum. The SRP then aids translocation of the protein through the ER membrane. The SRP is a ribonucleoprotein composed of a small RNA and several proteins. One of these proteins is the Srp19 protein PUBMED:2460823 (Sec65 in Saccharomyces cerevisiae PUBMED:1313947, PUBMED:1313948).\ 7896 IPR012993 \

    This domain is characteristic of UVSB PI-3 kinase, MEI-41 and ESR1 PUBMED:15112237.

    \ 289 IPR006869 \ This is a conserved region found in uncharacterised proteins from Caenorhabditis elegans and Arabidopsis thaliana.\ 1908 IPR003790 \

    This entry describes proteins of unknown function.

    \ 6600 IPR010640 \

    This family consists of several bacteria-specific low temperature requirement A (LtrA) proteins, which have been found to be essential for growth at low temperatures in Listeria monocytogenes PUBMED:8534098.

    \ 3159 IPR007822 \ This family contains the lanthionine synthetase C-like proteins 1 and 2, which are related to the bacterial lanthionine synthetase components C (LanC). LANCL1 (P40 seven-transmembrane-domain protein) and LANCL2 (testis-specific adriamycin sensitivity protein) are thought to be peptide-modifying enzyme components in eukaryotic cells. Both proteins are produced in large quantities in the brain and testes and may have a role in the immune surveillance of these organs PUBMED:11376939.\ 1817 IPR002205 \

    Topoisomerases are ubiquitous enzymes that catalyze cleavage and religation of DNA molecules, allowing the interconversion of topological isomers of DNA, and play a key role in DNA metabolism. Topoisomerases of type I and type II cleave one and two DNA strands, respectively. Topoisomerase I catalyses an ATP-independent reaction, while topoisomerase II catalyses an ATP-dependent reaction, resulting in the formation of DNA supercoils PUBMED:1651812, PUBMED:1646964, PUBMED:2845399. Eukaryotic enzymes can form both positive and negative supercoils, while prokaryotic enzymes form only negative supercoils.

    \ \

    Eukaryotic topoisomerase II exists as a homodimer; in bacteriophage T4 it consists of three heterologous subunits; most bacteria have two homologous type II enzymes: DNA gyrase (topoisomerase II, Gyr) and topoisomerase IV (Par). Each enzyme is composed of two subunits. GyrA is involved in breakage and reunion of DNA, and GyrB functions as an ATPase. GyrB, parE and the product of bacteriophage T4 gene 39 are all similar to the eukaryotic proteins.

    \

    This family includes subunit A, encoded by gyrA (DNA gyrase A) and parC. GyrA is composed of two fragments. The structure of the 59 kDa N-terminal fragment of the E. coli enzyme has been determined, and the position of the catalytic tyrosine has been localized. The 38 kDa C-terminal fragment of GyrA remains the largest piece of the topoisomerase sequence without structural information. It lacks catalytic activity, but can complement the N-terminal fragment, increasing its supercoiling activity. The C-terminal fragment acts as a non-specific DNA-binding protein and is probably involved in stabilization of the DNA-topoisomerase complex PUBMED:11948780.

    \ 7807 IPR012941 \

    Phenol hydroxylase is a homodimer that hydroxylates phenol to catechol or similar products. The enzyme comprises three domains. The first two domains form the active site. The third domain, represented by this entry, is involved in forming the dimerisation interface. The domain adopts a thioredoxin-like fold PUBMED:9634698.

    \ 6299 IPR009466 \

    This region of coronavirus polyproteins encodes the NSP11 protein.

    \ 7512 IPR011661 \ The sulphur oxygenase/reductase (SOR) of the thermoacidophilic archaeon Acidianus ambivalens is an unusual enzyme consisting of 24 identical subunits arranged in a perfectly symmetrical hollow sphere and containing a mononuclear non-heme iron centre (personal communication: A. Kletzin). At 85 °C in vitro, elemental sulphur is oxidised to sulphite, thiosulphate and hydrogen sulphide with no external cofactors needed. The proposed equation is: 4 S + O2 + 4 H2O -> 2 HSO3- + 2 H2S + 2 H+.\ 1886 IPR003737 \

    Although most of the proteins in this group are of unknown function, one, from Schizosaccharomyces pombe, has been characterised as a probable N-acetylglucosaminyl-phosphatidylinositol de-N-acetylase.

    \ \ \ 3588 IPR002614 \ The orbivirus VP3 protein is part of the virus core and forms a 'subcore' shell made up of 120 copies of the 100K protein PUBMED:9774103. VP3 particles can also bind RNA and are fundamental in the early stages of viral core formation PUBMED:9774103. Also found in the family is structural core protein VP2 from Broadhaven virus, which is similar to VP3 in bluetongue virus PUBMED:1328474. Orbiviruses belong to the larger family Reoviridae, whose members have a dsRNA genome of 10-12 linear segments PUBMED:9774103; orbiviruses found in this family include bluetongue virus and epizootic hemorrhagic disease virus.\ 4434 IPR007360 \ SirB up-regulates Salmonella typhimurium invasion gene transcription. It is, however, not essential for the expression of these genes. Its function is unknown PUBMED:10322010.\ 2323 IPR006482 \

    This protein is found in at least five species that contain CRISPR loci, where its gene occurs exclusively next to other cas genes. Its function is unknown.

    \ 6336 IPR004479 \ This protein family is represented by a single member in nearly every completed large (> 1000 genes) prokaryotic genome. In Rhizobium meliloti, a species in which the exo genes make succinoglycan, a symbiotically important exopolysaccharide, exsB is located nearby and affects succinoglycan levels, probably through polar effects on exsA expression or through being part of the same polycistronic mRNA PUBMED:8544814. In Arthrobacter viscosus, the homologous gene is designated ALU1 and is associated with an aluminum tolerance phenotype PUBMED:9367855. The function is unknown.\ 4831 IPR000944 \ The following uncharacterized bacterial proteins have been shown to be evolutionarily related: Desulphovibrio vulgaris protein Rrf2; Escherichia coli hypothetical proteins yfhP and yjeB; Bacillus subtilis hypothetical proteins yhdE, yrzC and ywgB; Mycobacterium tuberculosis hypothetical protein Rv1287; and Synechocystis strain PCC 6803 hypothetical protein slr0846. These are small proteins of 12 to 18 kDa that seem to contain a signal sequence, and may represent a family of probable transcriptional regulators.\ 3101 IPR002627 \ tRNA isopentenyltransferases are also known as tRNA delta(2)-isopentenylpyrophosphate transferases or IPP transferases. These enzymes modify both cytoplasmic and mitochondrial tRNAs at A(37) to give isopentenyl A(37) PUBMED:8139535.\ 2597 IPR004234 \

    FokI is a member of an unusual class of bipartite restriction enzymes that recognize a specific DNA sequence and cleave DNA nonspecifically a short distance away from that sequence. It is a type IIs restriction endonuclease PUBMED:9724744. FokI contains amino- and carboxy-terminal domains corresponding to the DNA-recognition and cleavage functions, respectively.
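The bipartite behaviour (recognise one site, cut a short distance away) can be sketched as a scan for recognition sites followed by fixed cut offsets. This sketch assumes FokI's commonly cited specificity, GGATG with cuts 9 nt (top strand) and 13 nt (bottom strand) downstream of the site, usually written GGATG(9/13); these values are not given in the text above. The helper `fok_cut_sites` is hypothetical and scans only the strand it is given, ignoring sites on the reverse complement.

```python
def fok_cut_sites(seq: str, site: str = "GGATG", top: int = 9,
                  bottom: int = 13) -> list[tuple[int, int, int]]:
    """Return (site_start, top_cut, bottom_cut) for each site on the given strand.

    All positions are 0-based indices into `seq` (the bottom-strand cut is
    expressed in top-strand coordinates). A cut at position p falls between
    seq[p - 1] and seq[p]. Sites too close to the 3' end for both cuts to
    fall inside the sequence are skipped.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        end = start + len(site)  # first base after the recognition site
        top_cut, bottom_cut = end + top, end + bottom
        if bottom_cut <= len(seq):
            hits.append((start, top_cut, bottom_cut))
        start = seq.find(site, start + 1)  # continue scanning after this hit
    return hits


if __name__ == "__main__":
    print(fok_cut_sites("GGATG" + "A" * 20))  # [(0, 14, 18)]
```

Keeping the site and offsets as parameters makes the assumed GGATG(9/13) values easy to replace if a different specificity is intended.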

    \

    The recognition domain is made of three smaller subdomains (D1, D2 and D3) which are evolutionarily related to the helix-turn-helix-containing DNA-binding domain of the catabolite gene activator protein CAP PUBMED:9214510.

    \ 7496 IPR011655 \ These proteins include those ascribed to Mycoplasma penetrans paralogue family 26 in PUBMED:12466555.\ 7841 IPR012596 \

    Proteins in this family are bacteriophage GP30.3 proteins. Their function is poorly characterised PUBMED:8088550, PUBMED:9272856.

    \ 2554 IPR007412 \ FlgM binds and inhibits the activity of the transcription factor sigma 28. Inhibition of sigma 28 prevents the expression of genes from flagellar transcriptional class 3, which include genes for the filament and chemotaxis. Correctly assembled basal body-hook structures export FlgM, relieving inhibition of sigma 28 and allowing expression of class 3 genes. NMR studies show that free FlgM is mostly unfolded, which may facilitate its export. The C-terminal half of FlgM adopts a tertiary structure when it binds to sigma 28. All mutations in FlgM that prevent sigma 28 inhibition affect the C-terminal domain, which is thought to constitute the binding domain. A minimal binding domain has been identified between Glu 64 and Arg 88 in Salmonella typhimurium. The N-terminal portion remains unstructured and may be necessary for recognition by the export machinery PUBMED:9095196.\ 7545 IPR001245 \ Protein kinases comprise a large family of enzymes that mediate the response of eukaryotic cells to external stimuli by phosphorylation of hydroxyamino acids. The enzymes fall into two broad classes, characterised with respect to substrate specificity: serine/threonine specific and tyrosine specific PUBMED:3291115. \ \

    Tyrosine phosphorylating activity was originally detected in two viral transforming proteins PUBMED:, but many retroviral transforming proteins and their cellular counterparts have since been shown to possess such activity. The growth factor receptors, which are activated by ligand binding, and the insulin-related peptide receptor are also family members.

    \ \ 6565 IPR010618 \

    This family of proteins is very likely to act as transglycosylase enzymes related to and . These other families are weakly matched by this family, and include the known active site residues.

    \ 4531 IPR004141 \ Strictosidine synthase is a key enzyme in alkaloid biosynthesis. It catalyses the condensation of tryptamine with secologanin to form strictosidine.\ 4161 IPR001568 \

    The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are structurally and functionally related 30 kDa glycoproteins PUBMED:2229029 that cleave the 3'-5' internucleotide linkage of RNA via a nucleotide 2',3'-cyclic phosphate intermediate.

    \

    Two histidine residues have been shown PUBMED:2298207, PUBMED:1633875 to be involved in the catalytic mechanism of RNase T2 and Rh. These residues and the region around them are highly conserved in a number of other RNases that have been found to be evolutionarily related to these fungal enzymes.

    \ 5239 IPR008742 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
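The one-letter catalytic-type prefix can be decoded mechanically. The sketch below simply transcribes the scheme described above; the table and the helpers `catalytic_type` and `nucleophile` are hypothetical names, not part of any MEROPS software, and identifiers such as "C32" or "M42" are assumed to start with the catalytic-type letter.

```python
from typing import Optional

# Catalytic-type letters as listed in the text; "P" is used for clans only.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}


def catalytic_type(merops_id: str) -> str:
    """Map an identifier such as 'C32' or 'M42' to its catalytic type."""
    letter = merops_id.strip()[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {merops_id!r}")
    return CATALYTIC_TYPES[letter]


def nucleophile(merops_id: str) -> Optional[str]:
    """Return the nucleophile class described in the text, or None if not stated."""
    letter = merops_id.strip()[:1].upper()
    if letter in {"S", "T", "C"}:
        return "amino acid residue (forms an acyl intermediate)"
    if letter in {"A", "G", "M"}:
        return "activated water molecule"
    return None  # U (unknown) and P (mixed clans)


if __name__ == "__main__":
    print(catalytic_type("C32"), "/", nucleophile("C32"))
    print(catalytic_type("M42"), "/", nucleophile("M42"))
```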

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of cysteine peptidases corresponds to MEROPS peptidase family C32 (clan CA). The type example is equine arteritis virus-type cysteine proteinase (porcine reproductive and respiratory syndrome virus), which is involved in viral polyprotein processing PUBMED:10725411.

    \ 1787 IPR006999 \

    The dlt operon (dltA to dltD) of Lactobacillus rhamnosus 7469 encodes four proteins responsible for the esterification of lipoteichoic acid (LTA) by D-alanine. These esters play an important role in controlling the net anionic charge of the poly(GroP) moiety of LTA. The dltA and dltC genes encode the D-alanine-D-alanyl carrier protein ligase (Dcl) and D-alanyl carrier protein (Dcp), respectively. Whereas the functions of DltA and DltC are defined, the functions of DltB and DltD are unknown. In vitro assays showed that DltD bound Dcp for ligation with D-alanine by Dcl in the presence of ATP. In contrast, the Escherichia coli homologue of Dcp, the acyl carrier protein (ACP) involved in fatty acid biosynthesis, was not bound by DltD and thus was not ligated with D-alanine. DltD also catalyzed the hydrolysis of the mischarged D-alanyl-ACP. The hydrophobic N-terminal sequence of DltD was required for anchoring the protein in the membrane. It is hypothesized that this membrane-associated DltD facilitates the binding of Dcp and Dcl for ligation of Dcp with D-alanine, and that the resulting D-alanyl-Dcp is translocated to the primary site of D-alanylation PUBMED:10781555.

    \ \ \

    These sequences contain the N-terminal region of DltD.

    \ \ \ 3127 IPR003820 \

    Kdp, the high affinity ATP-driven K+-transport system of Escherichia coli, is a complex of the membrane-bound subunits KdpA, KdpB, KdpC and the small peptide KdpF. KdpC forms strong interactions with the KdpA subunit, serving to assemble and stabilize the Kdp complex PUBMED:9858692. It has been suggested that KdpC could be one of the connecting links between the energy-providing subunit KdpB and the K+-transporting subunit KdpA PUBMED:9858692. The K+ transport system actively transports K+ ions via ATP hydrolysis.

    \ 1429 IPR006018 \

    This group of proteins includes two protein families: caldesmon and lymphocyte specific protein.

    \

    Caldesmon (CDM) is an actin- and myosin-binding protein implicated in the regulation of actomyosin interactions in smooth muscle and non-muscle cells, possibly acting as a bridge between myosin and actin filaments PUBMED:1555769. CDM is believed to be an elongated molecule, with an N-terminal myosin/calmodulin-binding domain and a C-terminal tropomyosin/actin/calmodulin-binding domain, separated by a 40 nm-long central helix PUBMED:1555769.

    \

    A high-molecular-weight form of CDM is predominantly expressed in smooth muscles, while a low-molecular-weight form is widely distributed in non-muscle tissues and cells (the protein is not expressed in skeletal muscle or heart).

    \ 272 IPR005629 \

    This family consists of the beta-glucan synthesis-associated proteins KRE6 and SKN1. Beta-1,6-glucan is a key component of the yeast cell wall, interconnecting cell wall proteins, beta-1,3-glucan and chitin. It has been postulated that the synthesis of beta-1,6-glucan begins in the endoplasmic reticulum with the formation of protein-bound primer structures, and that these primer structures are extended in the Golgi complex by two putative glucosyltransferases that are functionally redundant, Kre6 and Skn1. This is followed by maturation steps at the cell surface and by coupling to other cell wall macromolecules PUBMED:10601196.

    \ \ 5046 IPR007685 \ The functions of Escherichia coli RelA and SpoT differ somewhat. RelA produces pppGpp (or ppGpp) from ATP and GTP (or GDP). SpoT degrades ppGpp, but may also act as a secondary ppGpp synthetase. The two proteins are strongly similar. In many species, a single homologue of SpoT and RelA appears responsible for both ppGpp synthesis and ppGpp degradation. \

    (p)ppGpp is a regulatory metabolite of the stringent response, but also appears to be involved in antibiotic biosynthesis in some species.

    \ 8091 IPR013231 \

    Abdominal perisympathetic organs of insects contain periviscerokinin neuropeptides of about 11 amino acids.

    \ 1663 IPR005558 \

    Crustacean neurohormone H proteins are referred to as precursor-related peptides, as they are typically co-transcribed and translated with the CHH neurohormone. However, in some species this neuropeptide is synthesized as a separate protein. Furthermore, neurohormone H can undergo proteolysis to give rise to 5 different neuropeptides PUBMED:3298549.

    \ 4339 IPR002133 \

    S-adenosylmethionine synthetase (MAT) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP PUBMED:1696256. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis.

    \

    Bacteria have a single isoform of AdoMet synthetase (gene metK); budding yeast (genes SAM1 and SAM2) and mammals have two, while plants generally have a multigene family.

    \

    The sequence of AdoMet synthetase is highly conserved across isozymes and species. The active sites of both the Escherichia coli and rat liver MAT reside between two subunits, with contributions from side chains of residues from both subunits, resulting in a dimer as the minimal catalytic entity. The side chains that contribute to the ligand binding sites are conserved between the two proteins. In the structures of complexes with the E. coli enzyme, the phosphate groups have the same positions in the (PPi plus Pi) complex and the (ADP plus Pi) complex, and are located at the bottom of a deep cavity with the adenosyl group nearer the entrance PUBMED:1213535.

    \ 7994 IPR012979 \

    This C-terminal domain is found in nucleolar proteins PUBMED:15112237, which have WD-40 repeats.

    \ 2595 IPR005189 \

    Focal adhesion kinase (FAK) is a tyrosine kinase found in focal adhesions, intracellular signaling complexes that are formed following engagement of the extracellular matrix by integrins. The C-terminal "focal adhesion targeting" (FAT) region is necessary and sufficient for localizing FAK to focal adhesions. The crystal structure of FAT shows it forms a four-helix bundle that resembles those found in two other proteins involved in cell adhesion, alpha-catenin and vinculin PUBMED:11799401. The binding of FAT to the focal adhesion protein, paxillin, requires the integrity of the helical bundle, whereas binding to another focal adhesion protein, talin, does not.

    \ 1993 IPR005500 \

    This family consists of eubacterial and archaebacterial proteins of unknown function. The proteins contain a motif HXXXEXX(W/Y) where X can be any amino acid. This motif is likely to be functionally important and may be involved in metal binding.
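The HXXXEXX(W/Y) motif translates directly into a regular expression, which is one simple way to screen sequences for it. A minimal sketch, assuming sequences use the standard one-letter amino acid code; `find_hxxxexxwy` is a hypothetical helper, and a match says nothing about whether the motif actually binds metal.

```python
import re

# HXXXEXX(W/Y): His, any three residues, Glu, any two residues, then Trp or Tyr.
# The zero-width lookahead makes overlapping occurrences all reportable.
_MOTIF = re.compile(r"(?=(H.{3}E.{2}[WY]))")


def find_hxxxexxwy(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each motif in a protein sequence."""
    return [(m.start(), m.group(1)) for m in _MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_hxxxexxwy("MAHKLMEAGWQ"))  # [(2, 'HKLMEAGW')]
```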

    \ 2874 IPR000357 \

    The HEAT repeat is a tandemly repeated, 37-47 amino acid long module occurring in a number of cytoplasmic proteins, including the four name-giving proteins huntingtin, elongation factor 3 (EF3), the 65 kDa alpha regulatory subunit of protein phosphatase 2A (PP2A) and the yeast PI3-kinase TOR1 PUBMED:7550332. Arrays of HEAT repeats consist of 3 to 36 units forming a rod-like helical structure and appear to function as protein-protein interaction surfaces. It has been noted that many HEAT repeat-containing proteins are involved in intracellular transport processes.

    \ \

    In the crystal structure of PP2A PR65/A PUBMED:9989501, the HEAT repeats consist of pairs of antiparallel alpha helices, as predicted in PUBMED:7550332.

    \ 7254 IPR009991 \

    This family contains p22, the smallest subunit of dynactin, a complex that binds to cytoplasmic dynein and is a required activator for cytoplasmic dynein-mediated vesicular transport. Dynactin localises to the cleavage furrow and to the midbodies of dividing cells, suggesting that it may function in cytokinesis PUBMED:9722614. Family members are approximately 170 residues long and seem to be restricted to mammals.

    \ 2774 IPR005200 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    This is a family of eukaryotic beta-1,3-glucanases belonging to glycoside hydrolase family 81.

    \ 6934 IPR010776 \

    This family consists of several eukaryotic TBP-1 interacting protein (TBPIP) sequences. TBP-1 has been shown to interact with the human immunodeficiency virus type 1 (HIV-1) viral protein Tat and to modulate the essential replication process of HIV. In addition, TBP-1 has been shown to be a component of the 26S proteasome, a basic multiprotein complex that degrades ubiquitinated proteins in an ATP-dependent fashion. Human TBPIP interacts with human TBP-1 and modulates the inhibitory action of human TBP-1 on HIV-Tat-mediated transactivation PUBMED:10806355.

    \ 5577 IPR008821 \ Rubella virus (RV), the sole member of the genus Rubivirus within the family Togaviridae, is a small enveloped, positive-strand RNA virus. The nucleocapsid consists of 40S genomic RNA and a single species of capsid protein, which is enveloped within a host-derived lipid bilayer containing two viral glycoproteins, E1 (58 kDa) and E2 (42-46 kDa). In virus-infected cells, RV matures by budding either at the plasma membrane or at internal membranes, depending on the cell type, and enters adjacent uninfected cells by a membrane fusion process in the endosome, directed by E1-E2 heterodimers. Heterodimer formation is crucial for E1 transport out of the endoplasmic reticulum to the Golgi and plasma membrane. In RV E1, a cysteine at position 82 is crucial for E1-E2 heterodimer formation and cell surface expression of the two proteins PUBMED:11682134. This family is found together with and .\ 7489 IPR011630 \ These proteins have no known function.\ 7802 IPR012939 \

    This domain occurs within alpha-1,2-mannosidases, which remove alpha-1,2-linked mannose residues from Man(9)(GlcNAc)(2) by hydrolysis. They are critical for the maturation of N-linked oligosaccharides and ER-associated degradation PUBMED:10026209.

    \ 5071 IPR007908 \

    This family consists of several outer membrane proteins (2a and 2b) from Brucella abortus. Brucella abortus is a Gram-negative, facultative intracellular bacterium that can infect many species of animals and humans PUBMED:9884218.

    \ 1421 IPR004676 \ These proteins are members of the Cadmium Resistance (CadD) Family. To date, this family of proteins has only been found in Gram-positive bacteria. The CadD family includes two close orthologues in two Staphylococcus species that have been reported to function in cadmium resistance, and another staphylococcal protein that has been reported to possibly function in quaternary ammonium ion export.\ 4435 IPR001347 \ The SIS (Sugar ISomerase) domain is a phosphosugar-binding domain PUBMED:10203754 found in many phosphosugar isomerases and phosphosugar binding proteins. SIS domains are also found in proteins that regulate the expression of genes involved in the synthesis of phosphosugars, possibly by binding to the end-product of the pathway.\ 561 IPR003891 \ This domain is found in DAP-5, eIF4G, MA-3 and other proteins. DAP-5 and MA-3 are involved in cell death or apoptosis. The domain is highly alpha-helical, and may contain repeats and/or regions similar to MIF4G domains.\ 4544 IPR005556 \

    This family of proteins is restricted to the fungi, specifically the Saccharomycetales and Schizosaccharomycetales. In Saccharomyces cerevisiae its members have been termed the SUN gene family, whose products display high homology in their 258-amino-acid C-terminal domain. SIM1, UTH1 and NCA3 (the founding SUN members), together with SUN4, are involved in different cellular processes: DNA replication, ageing, mitochondrial biogenesis and, in the case of SUN4, cell septation PUBMED:10870102.

    \ \ 4446 IPR005120 \

    Nonsense-mediated mRNA decay (NMD) is a surveillance mechanism by which eukaryotic cells detect and degrade transcripts containing premature termination codons. Three 'up-frameshift' proteins, UPF1, UPF2 and UPF3, are essential for this process in organisms ranging from yeast to humans and plants PUBMED:11368911. Exon junction complexes (EJCs) are deposited ~24 nucleotides upstream of exon-exon junctions after splicing. Translation displaces the EJCs; however, premature translation termination upstream of one or more EJCs triggers the recruitment of UPF1, UPF2 and UPF3 and activates the NMD pathway PUBMED:12718880, PUBMED:15048104.
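Read literally, the EJC model above can be turned into a toy predictor: place an EJC about 24 nt upstream of every exon-exon junction and flag a transcript for NMD if translation terminates upstream of at least one of them. This is a deliberately simplified sketch; `predicts_nmd` is a hypothetical helper, coordinates are assumed to be 0-based positions on the spliced mRNA, and real NMD decisions depend on factors not modelled here.

```python
def predicts_nmd(stop_codon_start: int, exon_junctions: list[int],
                 ejc_offset: int = 24) -> bool:
    """Toy NMD call from the EJC model described in the text.

    stop_codon_start: 0-based position of the first nucleotide of the stop codon.
    exon_junctions:   0-based positions of exon-exon junctions on the spliced mRNA.
    ejc_offset:       distance (nt) upstream of each junction where an EJC sits.
    """
    ejc_positions = [junction - ejc_offset for junction in exon_junctions]
    # Termination upstream of any EJC leaves that EJC undisplaced -> NMD.
    return any(stop_codon_start < ejc for ejc in ejc_positions)


if __name__ == "__main__":
    junctions = [300, 600]
    print(predicts_nmd(150, junctions))  # True: premature stop in the first exon
    print(predicts_nmd(700, junctions))  # False: stop in the last exon
```

A transcript without introns has no junctions, so the sketch never predicts NMD for it.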

    \ \

    This family contains UPF3. The crystal structure of the complex between human UPF2 and UPF3b, which contain, respectively, a MIF4G (middle portion of eIF4G) domain and an RNP domain (ribonucleoprotein-type RNA-binding domain), has been determined to 1.95 Å. The protein-protein interface is mediated by highly conserved charged residues in UPF2 and UPF3b and involves the beta-sheet surface of the UPF3b RNP domain, which such domains generally use to bind nucleic acids. In UPF3b the RNP domain does not bind RNA, whereas the UPF2 construct and the complex do. This indicates that some RNP domains have evolved for specific protein-protein interactions rather than as nucleic acid binding modules PUBMED:15004547.

    \ \ 3976 IPR004975 \ The Poxvirus trans-activator protein A1 is a general late promoter trans-activator. It is active in the intermediate stages of infection.\ 6656 IPR009644 \

    Fukutin is a eukaryotic protein necessary for the maintenance of muscle integrity, cortical histiogenesis, and normal ocular development. Mutations in the fukutin gene have been shown to result in Fukuyama-type congenital muscular dystrophy characterised by brain malformation - one of the most common autosomal-recessive disorders in Japan PUBMED:12783852. This family represents a short conserved region within fukutin-related proteins that is sometimes repeated.

    \ 979 IPR007261 \ Vps36 is involved in Golgi to endosome trafficking.\ 3678 IPR005311 \

    This domain is found at the N-terminus of Class B High Molecular Weight Penicillin-Binding Proteins. Its function has not been precisely defined, but is strongly implicated in PBP polymerisation. The domain forms a largely disordered "sugar tongs" structure.

    \ 4805 IPR005226 \

    This family has no known function. It includes potential membrane proteins.

    \ 4325 IPR004336 \ The molecular structure and function of the NS2 protein are not known. However, mutants lacking NS2 grow more slowly than the wild type, yet NS2 is not essential for viral replication PUBMED:9847328.\ 5170 IPR008007 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
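A first-pass scan for the HEXXH zinc-binding motif can be written as a regular expression. In this sketch (the helper name `find_hexxh` is hypothetical) the two variable positions exclude proline, following the observation above that proline is never found in this site. The stricter abXHEbbHbc pattern is deliberately not encoded, because it would require choosing residue sets for "uncharged" and "hydrophobic", and an HEXXH hit alone does not establish that a protein is a metalloprotease.

```python
import re

# HEXXH with proline excluded from the two X positions, as noted in the text.
# The zero-width lookahead lets overlapping hits all be reported.
_HEXXH = re.compile(r"(?=(HE[^P]{2}H))")


def find_hexxh(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each HEXXH motif."""
    return [(m.start(), m.group(1)) for m in _HEXXH.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_hexxh("VTAHELGHNF"))  # [(3, 'HELGH')]
```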

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M42 (glutamyl aminopeptidase family, clan MH). For members of this family and family M28 the predicted metal ligands occur in the same order in the sequence: H, D, E, D/E, H; and the active site residues occur in the motifs HXD and EE.

    \ 938 IPR002559 \ Autonomous mobile genetic elements such as transposons or insertion sequences (IS) encode an enzyme, transposase, that is required for excising and inserting the mobile element. Transposases have been grouped into various families PUBMED:8041625, PUBMED:1310791, PUBMED:1718819. This family includes the IS4 transposase.\ 4946 IPR002927 \ This family consists of virion host shutoff (VHS) proteins from various herpesviruses, including varicella zoster virus and pseudorabies virus. The VHS proteins inhibit cellular gene expression in infected cells. The VHS polypeptide destabilizes pre-existing host mRNAs and ensures rapid turnover of viral mRNAs PUBMED:9311788.\ 5296 IPR008413 \

    A possible function for these proteins is to guide the assembly of the membrane sector of the ATPase enzyme complex PUBMED:7961438.\

    \ 7192 IPR009197 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 5198 IPR008033 \

    The filamentous bacteriophages are flexible rods about 1 to 2 microns long and 6 nm in diameter, with a helical shell of protein subunits surrounding a DNA core. The approximately 50-residue coat protein subunit is largely alpha-helical, and the axis of the alpha-helix makes a small angle with the axis of the virion. The protein shell can be considered in three sections: the outer surface, occupied by the N-terminal region of the subunit, rich in acidic residues that interact with the surrounding solvent and give the virion a low isoelectric point; the interior of the shell, including a 19-residue stretch of apolar side-chains, where protein subunits interact mainly with each other; and the inner surface, occupied by the C-terminal region of the subunit, rich in basic residues that interact with the DNA core.

    \

    This is a family of class I phage major coat protein Gp8 or B, which is a baseplate structural protein. The coat protein is largely alpha-helical with a slight curve PUBMED:8289247.

    \ 5111 IPR007948 \

    This family consists of several uncharacterised bacterial proteins.

    \ 5386 IPR008469 \ This family contains several plant plasma membrane proteins termed DREPPs (developmentally regulated plasma membrane polypeptides) PUBMED:9415814.\ 5081 IPR007918 \

    This family of proteins is functionally uncharacterised.

    \ 836 IPR006059 \

    Bacterial high-affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. The protein components of these traffic systems include one or two transmembrane protein components, one or two membrane-associated ATP-binding proteins and a high-affinity periplasmic solute-binding protein. In Gram-positive bacteria, which are surrounded by a single membrane and therefore have no periplasmic region, the equivalent proteins are bound to the membrane via an N-terminal lipid anchor. These solute-binding proteins do not play an integral role in the transport process per se, but probably serve as receptors to trigger or initiate translocation of the solute through the membrane by binding to external sites of the integral membrane proteins of the efflux system. In addition, at least some solute-binding proteins function in the initiation of sensory transduction pathways.

    \

    On the basis of sequence similarities, the vast majority of these solute-binding proteins can be grouped PUBMED:8336670 into eight families, or clusters, which generally correlate with the nature of the solute bound. Family 1 currently includes the periplasmic maltose/maltodextrin-binding proteins of Enterobacteriaceae (gene malE) PUBMED:7853407 and Streptococcus pneumoniae malX; the multiple oligosaccharide binding protein of Streptococcus mutans (gene msmE); Escherichia coli glycerol-3-phosphate-binding protein; Serratia marcescens iron-binding protein (gene sfuA) and the homologous proteins (gene fbp) from Haemophilus influenzae and Neisseria; and Escherichia coli thiamine-binding protein (gene tbpA).

    \ 6037 IPR010405 \

    This family consists of several cofactor of BRCA1 (COBRA1)-like proteins. It is thought that COBRA1, along with BRCA1, is involved in chromatin unfolding. COBRA1 is recruited to the chromosomal site by the first BRCT repeat of BRCA1, and is itself sufficient to induce chromatin unfolding. BRCA1 mutations that enhance chromatin unfolding also increase its affinity for, and recruitment of, COBRA1. It is thought that reorganisation of higher levels of chromatin structure is an important regulated step in BRCA1-mediated nuclear functions PUBMED:11739404.

    \ 3737 IPR002620 \

    The alphaviruses produce two mRNAs after infection: the genomic (49S) RNA, which is translated into the nonstructural (replicase) proteins, and the subgenomic (26S) RNA, which serves as the mRNA for the virion structural proteins. The long polyprotein comprises individual nonstructural proteins that are released by proteolytic processing to give nsP1, nsP2, nsP3 and nsP4 PUBMED:3488539. This signature identifies non-structural protein 2 (nsP2), which has two reported activities: \

  • 1) The nonstructural protein nsP2 (799 amino acids) of Semliki Forest virus and Sindbis virus specifically cleaves the gamma,beta-triphosphate bond at the 5' end of RNA. This activity is restricted to the N-terminal region; the C-terminal domain has no RNA triphosphatase activity PUBMED:10748213.
  • 2) nsP2 belongs to MEROPS peptidase family C9 (clan CA) and is required for the processing of the polyprotein PUBMED:11257180.

    \ 1475 IPR003153 \

    Cbl adaptor proteins are RING-type E3 ubiquitin ligases. Cbl may be involved in the negative regulation of thymocyte development, targeting its substrate for ubiquitination PUBMED:11864842. The ubiquitin ligase activity of Cbl, and of its homologue Cbl-b, plays a role in the negative regulation of upstream kinases, such as Lck, Syk and PI3K, in T and B cells PUBMED:12787751. Cbl can interact with the EGF receptor (EGFR), causing the ubiquitination of the receptor following EGF ligand binding and Grb2 association. Ubiquitination is required for ligand-induced endocytosis of the EGFR PUBMED:15194809. The N-terminal domain of Cbl is evolutionarily conserved, and is known to bind to phosphorylated tyrosine residues.

    \ 2670 IPR000583 \

    A large group of biosynthetic enzymes are able to catalyse the removal of the amide nitrogen of glutamine as ammonia and then to transfer it to a substrate to form a new carbon-nitrogen bond. This catalytic activity is known as glutamine amidotransferase (GATase) PUBMED:4355768. The GATase domain exists either as a separate polypeptide subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities, two classes of GATase domains have been identified PUBMED:3298209, PUBMED:6086650: class-I (also known as trpG-type) and class-II (also known as purF-type). Enzymes containing class-II GATase domains include amido phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase), which catalyses the first step in purine biosynthesis (gene purF in bacteria, ADE4 in yeast); glucosamine--fructose-6-phosphate aminotransferase, which catalyses the formation of glucosamine 6-phosphate from fructose 6-phosphate and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in yeast); and asparagine synthetase (glutamine-hydrolysing), which is responsible for the synthesis of asparagine from aspartate and glutamine. A cysteine is present at the N-terminal extremity of the mature form of all these enzymes.

    \ \

    This domain is found in a number of cysteine peptidases belonging to MEROPS peptidase family C44 and their non-peptidase homologs.

    \ 2617 IPR005548 \

    FtsQ is one of several cell division proteins. FtsQ interacts with other Fts proteins (reviewed in PUBMED:9864306). The precise function of FtsQ is unknown.

    \ 5516 IPR008542 \ This family consists of a series of repeated sequences (of around 180 residues) which are found in Salmonella typhimurium, Salmonella typhi and Escherichia coli. These repeats are almost always found with this entry. The repeats are associated with RatA and RatB, the coding sequences of which are found in the pathogenicity island of Salmonella. The sequences may be determinants of pathogenicity PUBMED:12540539, PUBMED:15347755.\ 2963 IPR001801 \

    The histone-like nucleoid-structuring (H-NS) protein belongs to a family of bacterial proteins that play a role in the formation of nucleoid structure and affect gene expression under certain conditions PUBMED:7875316.

    \ 6589 IPR009612 \

    This entry represents a conserved region within several bacterial proteins that resemble ImcF, which has been proposed PUBMED:12127983 to be involved in Vibrio cholerae cell surface reorganisation, resulting in increased adherence to epithelial cells and increased conjugation frequency. Note that many entry members are hypothetical proteins.

    \ 5192 IPR008027 \

    The UQCRX/QCR9 protein is subunit 9/10 of complex III, and is a protein of about 7 kDa. Deletion of QCR9 results in the inability of Saccharomyces cerevisiae to grow on a non-fermentable carbon source PUBMED:8382892. The protein is part of the mitochondrial respiratory chain.

    \ 1034 IPR005163 \

    This small triple helical domain has been predicted to assume a topology similar to helix-turn-helix domains. These domains are found at the C-terminus of proteins related to .

    \ 6891 IPR009770 \

    This entry represents the C terminus (approximately 100 residues) of a number of hypothetical bacterial proteins of unknown function.

    \ 7270 IPR010002 \

    This family contains a number of ponericin peptides (approximately 30 residues long) from the venom of the predatory ant Pachycondyla goeldii. These peptides exhibit antibacterial and insecticidal properties, and may adopt an amphipathic alpha-helical structure in polar environments such as cell membranes PUBMED:11279030.

    \ 4316 IPR007178 \

    DNA-directed RNA polymerase catalyses the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates. In Sulfolobus acidocaldarius, RpoE2 is one of 13 subunits in the RNA polymerase. RpoE2 in Methanococcus jannaschii contains a predicted C4-type zinc finger at positions 4 to 19 and this sequence has been noted as a potential metal binding motif in Sulfolobus acidocaldarius PUBMED:8127719. It is possible that family members contain a C4 zinc finger.

    \ 6446 IPR009529 \

    This family consists of several Maize streak virus proteins of unknown function.

    \ 5204 IPR008039 \

    Although archaeal flagella appear superficially similar to those of bacteria, they are quite distinct PUBMED:11250034. In several archaea, the flagellin genes are followed immediately by the flagellar accessory genes flaCDEFGHIJ. The gene products may have a role in translocation, secretion, or assembly of the flagellum. FlaC is a protein whose exact role is unknown, but it has been shown to be membrane-associated (by immunoblotting of fractionated cells) PUBMED:11717274.

    \ 3925 IPR006744 \

    This family contains vaccinia virus protein A12 and its homologues. VVA12 is a virion protein, though its function is unknown.

    \ 7797 IPR012848 \

    Most eukaryotic aspartic endopeptidases (MEROPS peptidase family A1) are synthesised with signal peptides and propeptides. The animal pepsin-like endopeptidase propeptides form a distinct family of propeptides, which contain a conserved motif approximately 30 residues long. In pepsinogen A, the first 11 residues of the mature pepsin sequence are displaced by residues of the propeptide. The propeptide contains two helices that block the active-site cleft; in particular, the conserved Asp11 residue in pepsin hydrogen bonds to a conserved Arg residue in the propeptide. This hydrogen bond stabilises the propeptide conformation and is probably responsible for triggering the conversion of pepsinogen to pepsin under acidic conditions PUBMED:1594574, PUBMED:2056534.

    \ 1290 IPR005598 \

    Membrane-bound ATP synthases (F1F0) catalyze the synthesis of ATP via a rotary catalytic mechanism utilizing the energy of an electrochemical ion gradient. The transmembrane potential is thought to drive rotation of a ring of F0 subunit c, together with subunits gamma and epsilon of F1, which thereby form the rotor part of the enzyme, whereas the remainder of the F1F0 complex functions as a stator that compensates for the torque generated during rotation.

    \

    A possible function for this protein is to guide the assembly of the membrane sector of the ATPase enzyme complex.

    \ 4698 IPR001959 \

    This entry represents a conserved region of a probable transposase family, which is found in a number of uncharacterised bacterial proteins. A novel insertion sequence (IS)-like element of the thermophilic bacterium PS3, which promotes expression of the alanine carrier protein-encoding gene PUBMED:7557457, belongs to this entry.

    \ 6968 IPR008355 \

    \ Interferon (IFN)-gamma is a dimeric glycoprotein produced by activated T cells and natural killer cells. Although originally isolated based on its antiviral activity, IFN-gamma also displays powerful anti-proliferative and immunomodulatory activities, which are essential for developing appropriate cellular defenses against a variety of infectious agents. The first step in eliciting these responses is the specific high-affinity interaction of IFN-gamma with its cell-surface receptor (IFN-gammaRalpha); the complex then interacts with at least one of a family of additional species-specific accessory factors (AF-1 or IFN-gammaRbeta), which convey different cellular responses. One such response is the association and phosphorylation of two protein tyrosine kinases (Jak-1 and Jak-2), which in turn stimulate nuclear transcription activators PUBMED:7617032.\

    \

    \ The human IFN-gammaR is a member of the hematopoietic cytokine receptor superfamily. It is expressed in a membrane-bound form in many cell types, and is over-expressed in tumour cells. It comprises an extracellular portion of 229 residues, a single transmembrane region, and a cytoplasmic domain of 221 residues. As with other members of its superfamily, the cytokine-binding sites are formed by a small set of closely spaced surface loops that extend from a beta-sheet core, much like antigen-binding sites on antibodies. The extracellular IFN-gammaR monomer comprises two domains (domain D1 from residues 14-102, and domain D2 from residues 114-221), each resembling an Ig fold with fibronectin type III topology PUBMED:9367779.\
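
The domain boundaries above are 1-based, inclusive residue ranges, so converting them to zero-based slices is an easy place to make off-by-one errors. The Python sketch below applies a hypothetical helper to a placeholder sequence, since the receptor sequence itself is not part of this text. It shows that D1 spans 89 residues and D2 spans 108, leaving 11 residues (103-113) between them.

```python
# Sketch: turn the 1-based, inclusive residue ranges quoted for the two
# extracellular domains into Python slices and report their lengths.
# The receptor sequence is not given in the text, so a placeholder is used.

DOMAINS = {"D1": (14, 102), "D2": (114, 221)}  # 1-based, inclusive


def domain_slice(seq: str, start: int, end: int) -> str:
    """Residues start..end (1-based, inclusive) as a Python string slice."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside sequence")
    return seq[start - 1:end]


placeholder = "X" * 229  # length of the extracellular portion quoted above

for name, (start, end) in DOMAINS.items():
    print(name, len(domain_slice(placeholder, start, end)))  # D1 89, then D2 108
```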

    \ 4364 IPR006341 \

    This is a family of small, glutamine and asparagine-rich peptides that store amino acids in the spores of Bacillus subtilis and related bacteria. Most members of the family have two copies of the spore protease (GPR) cleavage motif, typically EFASE in this family, separating three low-complexity repeats.
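
A plain substring search is one simple way to find the two copies of the GPR cleavage motif in a candidate sequence. The Python sketch below assumes the motif is exactly EFASE (the text says this is only typical for the family) and uses a made-up Gln/Asn-rich peptide. It reports where the motif occurs and the three segments it separates, but it does not model where GPR actually cuts.

```python
import re

# Sketch: locate the EFASE motif (the typical spore protease (GPR) cleavage
# motif in this family) and return the low-complexity stretches between
# copies. The exact scissile bond inside the motif is not modelled; the motif
# is simply removed from the flanking segments.

MOTIF = "EFASE"


def motif_positions(seq: str, motif: str = MOTIF) -> list:
    """0-based start positions of non-overlapping motif occurrences."""
    return [m.start() for m in re.finditer(re.escape(motif), seq)]


def segments_between(seq: str, motif: str = MOTIF) -> list:
    """Split on the motif; two copies give three segments."""
    return seq.split(motif)


if __name__ == "__main__":
    # Hypothetical Gln/Asn-rich peptide with two motif copies.
    peptide = "MQNQNQGQ" + MOTIF + "QNNQQN" + MOTIF + "NQQGNQ"
    print(motif_positions(peptide))   # [8, 19]
    print(segments_between(peptide))  # ['MQNQNQGQ', 'QNNQQN', 'NQQGNQ']
```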

    \ 2085 IPR007349 \

    This is a probable integral membrane protein. It is usually found associated with the domain of unknown function DUF405.

    \ 997 IPR004938 \ Plant cell walls are crucial for development, signal transduction, and disease resistance in plants. Cell walls are made of cellulose, hemicelluloses, and pectins. Xyloglucan (XG), the principal load-bearing hemicellulose of dicotyledonous plants, has a terminal fucosyl residue. This fucosyltransferase adds this residue PUBMED:10373113. \ 2089 IPR007354 \ This protein is predicted to be an integral membrane protein.\ 3577 IPR005618 \ This family includes outer membrane protein W (OmpW) proteins from a variety of bacterial species. This protein may form the receptor for S4 colicins in Escherichia coli PUBMED:10348872.\ 8030 IPR013146 \

    This entry includes thymopoietins, short proteins of 49 amino acids isolated from bovine spleen cells PUBMED:7306506. Thymopoietins (TMPOs) are a group of ubiquitously expressed nuclear proteins. They are suggested to play an important role in nuclear envelope organisation and cell cycle control PUBMED:10430029.

    \ \

    Thymopoietins are characterised by the LEM (LAP2, emerin, MAN1) domain, a globular module of approximately 40 amino acids that is mostly found in the nucleoplasmic portions of metazoan inner nuclear membrane proteins. The LEM domain has been shown to mediate binding to BAF (barrier-to-autointegration factor) and BAF-DNA complexes. BAF dimers bind to double-stranded DNA non-specifically and thereby bridge DNA molecules to form a large, discrete nucleoprotein complex PUBMED:14618255, PUBMED:10671519.

    \ \

    The solution structure of the LEM domain reveals that it is composed of a three-residue N-terminal helical turn and two large parallel α helices interacting through a set of conserved hydrophobic amino acids. The two helices, which are connected by a long loop, are oriented at an angle of approximately 45 degrees PUBMED:11435115.

    \ 767 IPR004579 \ All proteins in this family for which functions are known are components of a multiprotein endonuclease complex (usually made up of Rad1 and Rad10 homologs). This complex is used primarily for nucleotide excision repair but also for some aspects of recombination repair. In yeast, Rad10 works as a heterodimer with Rad1, and is involved in nucleotide excision repair of DNA damaged by UV light, bulky adducts or cross-linking agents. The complex forms an endonuclease which specifically degrades single-stranded DNA.\ 416 IPR006204 \

    The galacto-, homoserine, mevalonate and phosphomevalonate kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably involved in the binding of ATP PUBMED:1846667, PUBMED:10562426. This group of kinases has been called 'GHMP' (from the first letters of their substrates).

    \ 1665 IPR003090 \

    The crystallins are water-soluble structural proteins that occur in high concentration in the cytoplasm of eye lens fiber cells. Four major groups of crystallin have been distinguished on the basis of size, charge and immunological properties: alpha-, beta- and gamma-crystallins occur in all vertebrate classes (though gamma-crystallins are low or absent in avian lenses); and delta-crystallin is found exclusively in reptiles and birds PUBMED:2688200, PUBMED:7634077.

    \

    Alpha-crystallin occurs as large aggregates, comprising two types of related subunits (A and B) that are highly similar to the small (15-30 kDa) heat shock proteins (HSPs), particularly in their C-terminal halves. The relationship between these families is one of classic gene duplication and divergence from the small HSP family, allowing adaptation to novel functions. Divergence probably occurred prior to evolution of the eye lens, alpha-crystallin being found in small amounts in tissues outside the lens PUBMED:2688200.

    \ 2594 IPR005070 \ Expression of the envelope (Env) glycoprotein is essential for viral particle egress. This feature is unique to the Spumavirinae, a subclass of the Retroviridae. \ 449 IPR005202 \

    Sequence analysis of the products of the GRAS (GAI, RGA, SCR) gene family indicates that they share a variable N-terminus and a highly conserved C-terminus that contains five recognizable motifs PUBMED:10341448. Proteins in the GRAS family are transcription factors that seem to be involved in development and other processes. Mutation of the SCARECROW (SCR) gene results in a radial pattern defect in the root, namely the loss of a ground tissue layer. The PAT1 protein is involved in phytochrome A signal transduction PUBMED:10817761.

    \

    \ GRAS proteins contain a conserved region of about 350 amino acids that can be divided into five motifs, found in the following order: leucine heptad repeat I, the VHIID motif, leucine heptad repeat II, the PFYRE motif and the SAW motif PUBMED:10341448, PUBMED:14760535. Plant-specific GRAS proteins have parallels in their motif structure to the animal Signal Transducers and Activators of Transcription (STAT) family of proteins PUBMED:10842311, which also suggests some parallels in their functions.

    \ 682 IPR001096 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
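
The one-letter naming convention above can be expressed as a lookup table. This Python sketch is illustrative only. The helper name is hypothetical, and it simply reads the first character of identifiers such as C13, S1 or PA that appear elsewhere in this text.

```python
# Sketch: decode the catalytic-type letter that begins a MEROPS family or
# clan identifier, following the convention described above. The helper
# name is hypothetical; only the letter-to-type mapping comes from the text.

CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the identifier's first character."""
    key = identifier.strip().upper()[:1]
    if key not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}")
    return CATALYTIC_TYPES[key]


print(catalytic_type("C13"), catalytic_type("S1"))  # cysteine serine
```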

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C13 (legumain family, clan CD). A type example is legumain from Canavalia ensiformis. The blood fluke parasite Schistosoma mansoni has two cysteine proteases in its digestive tract, one a cathepsin B-like protease, the other termed hemoglobinase PUBMED:7845226, PUBMED:3305515. The latter has been hard to purify free of cathepsin B, and forms expressed in Escherichia coli prove to be inactive, suggesting that hemoglobinase may act in association with cathepsin B PUBMED:7845226, PUBMED:8457210. Plant vacuolar processing enzyme and legumain from legumes PUBMED:7845226 have been shown to have sequence and functional similarity to hemoglobinase. The catalytic residues of the family are currently unknown, but sequence alignments reveal one totally conserved cysteine and two totally conserved histidines.

    \ 6127 IPR009389 \

    This family consists of several hypothetical proteins from Agrobacterium, Rhizobium and Brucella species. The function of this family is unknown.

    \ 3626 IPR001429 \

    P2X purinoceptors are cell membrane ion channels gated by adenosine 5'-triphosphate (ATP) and other nucleotides; they have been found to be widely expressed on mammalian cells and, on the basis of their functional properties, can be differentiated into three sub-groups. The first group is almost equally well activated by ATP and its analogue alpha,beta-methylene ATP, whereas the second group is not activated by the latter compound. A third type of receptor (also called P2Z) is distinguished by the fact that repeated or prolonged agonist application leads to the opening of much larger pores, allowing large molecules to traverse the cell membrane. This increased permeability rapidly leads to cell death and lysis.

    \ \

    Molecular cloning studies have identified seven P2X receptor subtypes, designated P2X1-P2X7. These receptors share 35-48% amino acid identity, and possess two putative transmembrane (TM) domains, separated by a long (~270 residues) intervening sequence, which is thought to form an extracellular loop. Around a quarter of the residues within the loop are invariant between the cloned subtypes, including 10 characteristic cysteines.
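
Percent-identity figures such as the one above depend on how identity is counted. The Python sketch below uses one common convention, which is an assumption here and is not taken from the cited studies: identical positions divided by the aligned columns where neither sequence has a gap. The aligned toy sequences are invented.

```python
# Sketch of one common percent-identity definition (an assumption; other
# conventions divide by alignment length or by the shorter sequence and give
# different numbers). Input sequences must already be aligned.

def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues / gap-free aligned columns, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACDE-F", "ACQEGF"))  # 80.0
```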

    \ \

    Studies of the functional properties of heterologously expressed P2X receptors, together with the examination of their distribution in native tissues, suggest that they likely occur as both homo- and heteromultimers in vivo PUBMED:10414359, PUBMED:12270951.

    \ \

    \ 460 IPR006861 \ This family includes the HABP4 family of hyaluronan-binding proteins, and the PAI-1 mRNA-binding protein, PAI-RBP1. HABP4 has been observed to bind hyaluronan (a glycosaminoglycan), but it is not known whether this is its primary role in vivo. It has also been observed to bind RNA, but with a lower affinity than that for hyaluronan PUBMED:10887182. PAI-1 mRNA-binding protein specifically binds the mRNA of type-1 plasminogen activator inhibitor (PAI-1), and is thought to be involved in regulation of mRNA stability PUBMED:11001948. However, in both cases, the sequence motifs predicted to be important for ligand binding are not conserved throughout the family, so it is not known whether members of this family share a common function.\ 7067 IPR009876 \

    This family consists of several Neisseria-specific OpcA outer membrane proteins. Opc (formerly called 5C) is one of the major outer membrane proteins and has been shown to play an important role in meningococcal adhesion to, and invasion of, both epithelial and endothelial cells PUBMED:12706886.

    \ 6627 IPR009631 \

    This family consists of several hypothetical plant and photosynthetic bacterial proteins of around 160 residues in length. The function of this family is unknown, although the species distribution suggests that the protein may play a part in photosynthesis.

    \ 157 IPR007715 \ Coq4p was shown to associate peripherally with the matrix face of the mitochondrial inner membrane. The putative mitochondrial-targeting sequence present at the N terminus of the polypeptide efficiently imports it into mitochondria. The function of Coq4p is unknown, although its presence is required to maintain a steady-state level of Coq7p, another component of the Q biosynthetic pathway PUBMED:11469793.\ 5171 IPR008008 \

    This is a short conserved region found in some transposons.

    \ 948 IPR001254 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1-S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
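
The HDS/DHS/SDH comparison amounts to sorting the three catalytic residues by their position in the sequence. The Python sketch below assumes the conventional residue numbering for bovine chymotrypsin (His57, Asp102, Ser195) and for subtilisin BPN' (Asp32, His64, Ser221). The helper name is hypothetical.

```python
# Sketch: derive the linear order of a Ser/Asp/His catalytic triad from
# residue numbers. The helper name is hypothetical; the example positions use
# conventional chymotrypsin and subtilisin BPN' numbering.

ONE_LETTER = {"Ser": "S", "Asp": "D", "His": "H"}


def triad_order(positions: dict) -> str:
    """One-letter triad order, sorted by sequence position."""
    ordered = sorted(positions, key=positions.get)
    return "".join(ONE_LETTER[res] for res in ordered)


chymotrypsin = {"His": 57, "Asp": 102, "Ser": 195}  # clan SA
subtilisin = {"Asp": 32, "His": 64, "Ser": 221}     # clan SB
print(triad_order(chymotrypsin), triad_order(subtilisin))  # HDS DHS
```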

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine proteases belongs to MEROPS peptidase family S1 (chymotrypsin family, clan PA(S)) and to peptidase family S6 (Hap serine peptidases).

    \

    The chymotrypsin family is almost totally confined to animals, although trypsin-like enzymes are found in actinomycetes of the genera Streptomyces and Saccharopolyspora, and in the fungus Fusarium oxysporum PUBMED:7845208. The enzymes are inherently secreted, being synthesised with a signal peptide that targets them to the secretory pathway. Animal enzymes are either secreted directly, packaged into vesicles for regulated secretion, or retained in leukocyte granules PUBMED:7845208.

    \

    The Hap (Haemophilus adhesion and penetration) family proteins play a role in the interaction with human epithelial cells. The serine protease activity is localized in the N-terminal domain, whereas the binding domain is in the C-terminal region.

    \ 3562 IPR002993 \ Ornithine decarboxylase antizyme (ODC-AZ) PUBMED:7813017 binds to, and destabilizes, ornithine decarboxylase (ODC), a key enzyme in polyamine synthesis. ODC is then rapidly degraded. The expression of ODC-AZ requires programmed ribosomal frameshifting, which is modulated according to the cellular concentration of polyamines. High levels of polyamines induce a +1 ribosomal frameshift in the translation of mRNA for the antizyme, leading to the expression of a full-length protein. At least two forms of ODC-AZ exist in mammals PUBMED:9782076, and the protein has been found in Drosophila (protein Gutfeeling).\ 3214 IPR005297 \

    This family occurs as tandem repeats in a set of lipoproteins. The alignment contains a Y-X4-D motif.
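
The Y-X4-D motif (a Tyr, any four residues, then an Asp) maps directly onto a regular expression. In this Python sketch, a zero-width lookahead is used so that overlapping matches are also reported. The helper name and the example sequence are made up.

```python
import re

# Sketch: find Y-X4-D motifs (Tyr, any four residues, Asp). The lookahead
# makes each match zero-width, so overlapping occurrences are all reported.

Y_X4_D = re.compile(r"(?=(Y.{4}D))")


def find_y_x4_d(seq: str) -> list:
    """Return (0-based start, matched 6-mer) for every Y-X4-D occurrence."""
    return [(m.start(), m.group(1)) for m in Y_X4_D.finditer(seq)]


print(find_y_x4_d("AYKLMNDGGYAAAAD"))  # [(1, 'YKLMND'), (9, 'YAAAAD')]
```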

    \ 1928 IPR003831 \

    This entry describes proteins of unknown function.

    \ 6978 IPR009820 \

    This family consists of several Paramecium bursaria chlorella virus 1 (PBCV-1) proteins of around 250 residues in length. The function of this family is unknown.

    \ 5720 IPR008879 \ This family consists of several coat proteins that are specific to positive-strand ssRNA viruses with no DNA stage, such as the trichoviruses and vitiviruses.\ 6680 IPR009658 \

    This entry represents a conserved region within a number of proteins of unknown function that seem to be specific to Caenorhabditis elegans. Note that some proteins in the entry contain more than one copy of this region.

    \ 2365 IPR001372 \

    Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force-generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules.

    \

    Dynein is composed of a number of ATP-binding large subunits, intermediate size subunits and small subunits. Among the small subunits is a group of highly conserved proteins that make up this family PUBMED:7744782, PUBMED:8628263.\

    \ 1640 IPR007745 \ Cox17p is essential for the assembly of functional cytochrome c oxidase (CCO) and for delivery of copper ions to the mitochondrion for insertion into the enzyme in Saccharomyces cerevisiae PUBMED:12370308.\ 5242 IPR008745 \ The domain is found towards the N terminus of the polyprotein of Malus x domestica stem grooving virus, Citrus tatter leaf virus and Pear black necrotic leaf spot virus; its function is unknown PUBMED:1413530, PUBMED:8277280.\ 2039 IPR007162 \ This is an archaeal family of unknown function.\ 1020 IPR003689 \ These ZIP zinc transporter proteins define a family of metal ion transporters that are found in plants, protozoa, fungi, invertebrates, and vertebrates, making it now possible to address questions of metal ion accumulation and homeostasis in diverse organisms PUBMED:9618566.\ 3137 IPR005540 \

    The MEINOX region comprises two domains, KNOX1 and KNOX2. KNOX1 plays a role in suppressing target gene expression. KNOX2, essential for function, is thought to be necessary for homo-dimerization PUBMED:11549765.

    \ 6246 IPR004468 \ CTP synthase is involved in pyrimidine ribonucleotide/ribonucleoside metabolism. The enzyme catalyzes the reaction L-glutamine + H2O + UTP + ATP = CTP + phosphate + ADP + L-glutamate. The enzyme exists as a dimer of identical chains that aggregates as a tetramer. This gene has been found about 500 bp upstream (5') of the enolase gene in both the beta (Nitrosomonas europaea) and gamma (Escherichia coli) subdivisions of the proteobacteria PUBMED:9711852.\ 4158 IPR000026 \ Ribonuclease N1 (RNase N1) is a guanine-specific ribonuclease from fungi. RNase T1 and some bacterial RNases are related.\ \

    The enzyme hydrolyses the phosphodiester bonds in RNA and oligoribonucleotides PUBMED:8110767, resulting in 3'-nucleoside monophosphates via 2',3'-cyclophosphate intermediates.
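
The Python sketch below gives a rough illustration of guanine-specific cleavage. It models a complete digest in which every phosphodiester bond on the 3' side of a G is hydrolysed. The resulting fragments end in G with a 3'-phosphate (written "p"). The function name is hypothetical. Partial digestion, modified bases and the 2',3'-cyclic phosphate intermediate are deliberately ignored.

```python
# Simplified sketch of a complete guanine-specific digest (RNase N1/T1-like).
# Assumptions: every bond 3' of a G is cleaved; a G at the very 3' end has no
# downstream phosphodiester bond and is left as is; the cyclic intermediate
# is skipped and the product is written directly as a 3'-phosphate ("p").

def guanine_specific_digest(rna: str) -> list:
    rna = rna.upper()
    fragments, current = [], ""
    for i, base in enumerate(rna):
        current += base
        # Cleave 3' of G only where a downstream phosphodiester bond exists.
        if base == "G" and i < len(rna) - 1:
            fragments.append(current + "p")
            current = ""
    if current:
        fragments.append(current)  # original 3'-terminal fragment
    return fragments


print(guanine_specific_digest("AUGCCGAGUA"))  # ['AUGp', 'CCGp', 'AGp', 'UA']
```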

    \ 1275 IPR002650 \ This entry consists of ATP sulphurylase (sulphate adenylyltransferase), some forms of which are part of a bifunctional polypeptide chain together with adenosine phosphosulphate (APS) kinase. Both enzymes are required for PAPS (phosphoadenosine-phosphosulphate) synthesis from inorganic sulphate PUBMED:8522184. ATP sulphurylase catalyses the synthesis of adenosine phosphosulphate (APS) from ATP and inorganic sulphate PUBMED:9671738.\ 7151 IPR010848 \

    This family consists of several hypothetical bacterial proteins of around 180 residues in length. The function of this family is unknown.

    \ 7920 IPR012624 \

    This family consists of the I-superfamily of conotoxins, a new class of peptides in the venom of some Conus species. These toxins are characterised by four disulfide bridges and inhibit or modify ion channels of nerve cells. The I-superfamily conotoxins are found in five or six major clades of cone snails and could possibly be found in many more species PUBMED:15450929.

    \ 1087 IPR004000 \

    Actin PUBMED:1388079, PUBMED:8448030 is a ubiquitous protein involved in the formation of filaments that are major components of the cytoskeleton. These filaments interact with myosin to produce a sliding effect, which is the basis of muscular contraction and many aspects of cell motility, including cytokinesis. Each actin protomer binds one molecule of ATP and has one high-affinity site for either calcium or magnesium ions, as well as several low-affinity sites. Actin exists as a monomer in low salt concentrations, but filaments form rapidly as salt concentration rises, with the consequent hydrolysis of ATP. Actin from many sources forms a tight complex with deoxyribonuclease I (DNase I), although the significance of this is still unknown. The formation of this complex results in the inhibition of DNase I activity, and actin loses its ability to polymerise. It has been shown that an ATPase domain of actin shares similarity with ATPase domains of hexokinase and hsp70 proteins PUBMED:1828889, PUBMED:1323828.

    \

    In vertebrates there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. In plants there are many isoforms which are probably involved in a variety of functions such as cytoplasmic streaming, cell shape determination, tip growth, graviperception, cell wall deposition, etc.

    \

    Recently, some divergent actin-like proteins have been identified in several species. These include: centractin (actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii, which seems to be a component of a multi-subunit centrosomal complex involved in microtubule-based vesicle motility (this subfamily is known as ARP1); the ARP2 subfamily, which includes chicken ACTL, Saccharomyces cerevisiae ACT2, Drosophila melanogaster 14D and Caenorhabditis elegans actC; the ARP3 subfamily, which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and Schizosaccharomyces pombe act2; and the ARP4 subfamily, which includes yeast ACT3 and Drosophila 13E.

    \ 2648 IPR003036 \ P30 is essential for viral assembly PUBMED:2414902.\ Cleavage of P70 in vitro can be accompanied by a shift from a concentrically coiled internal strand ("immature") to a collapsed ("mature") form of the virus core PUBMED:410020.\ 3132 IPR007659 \ This is a family of keratins, high-sulphur matrix proteins. The keratin products of mammalian epidermal derivatives such as wool and hair consist of microfibrils embedded in a rigid matrix of other proteins. The matrix proteins include the high-sulphur and high-tyrosine keratins, having molecular weights of 6-20 kDa, whereas microfibrils contain the larger, low-sulphur keratins (40-56 kDa) PUBMED:4678578.\ 8041 IPR013246 \

    The Sgf11 family includes Sgf11, a subunit of the SAGA complex in Saccharomyces cerevisiae. SAGA is a multisubunit protein complex involved in transcriptional regulation. It combines proteins involved in interactions with DNA-bound activators and TATA-binding protein (TBP), as well as enzymes for histone acetylation and deubiquitylation PUBMED:15657441.

    \ 6121 IPR010440 \

    These lipopolysaccharide kinases are related to protein kinases. This family includes the product of the waaP (rfaP) gene, which is required for the addition of phosphate to O-4 of the first heptose residue of the lipopolysaccharide (LPS) inner core region. WaaP has previously been shown to be necessary for resistance to hydrophobic and polycationic antimicrobials in Escherichia coli and to be required for virulence in invasive strains of Salmonella enterica PUBMED:11069912.

    \ 2279 IPR002752 \ These proteins of unknown function are found in bacteria and archaebacteria.\ 5991 IPR010382 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 7672 IPR006631 \

    These are domains of unknown function found in Drosophila melanogaster proteins of unknown function.

    \ 216 IPR006133 \

    DNA is the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the life cycle of a cell. This function is performed by DNA-directed DNA polymerases, which add deoxyribonucleoside triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA chain as a template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used for the de novo synthesis of a DNA chain. Even though there are two different methods of priming, these are mediated by two very similar classes of polymerase, A and B, with similar methods of chain elongation.

    \

    A number of DNA polymerases have been grouped under the designation of DNA polymerase family B. Six regions of similarity (numbered from I to VI) are found in all or a subset of the B family polymerases. The most conserved region (I) includes a conserved tetrapeptide with two aspartate residues. Its function is not yet known; however, it has been suggested that it may be involved in binding a magnesium ion. All sequences in the B family contain a characteristic DTDS motif, and possess many functional domains, including a 5'-3' elongation domain, a 3'-5' exonuclease domain PUBMED:8679562, a DNA binding domain, and binding domains for both dNTPs and pyrophosphate PUBMED:9757117.
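    In a DNA polymerase, each incoming nucleotide is chosen by pairing with the template base and is joined to the free 3' end of the growing strand, so synthesis runs 5'->3' while the template is read 3'->5'. The toy Python model below illustrates only that bookkeeping. It is a minimal sketch assuming a perfectly matched primer; it ignores fidelity, proofreading, processivity and the protein-priming route, and `extend_primer` is a hypothetical helper, not part of any library.

```python
# Watson-Crick base pairing used to choose each incoming nucleotide.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template: str, primer: str) -> str:
    """Return the new strand (written 5'->3') made by extending `primer` along `template`.

    Both arguments are written 5'->3'. The primer must pair with the 3' end of
    the template; each loop iteration adds one nucleotide to the primer's 3' end.
    """
    if len(primer) > len(template):
        raise ValueError("primer longer than template")
    # The polymerase reads the template 3'->5', so walk it in reverse.
    template_3to5 = template[::-1]
    for i, base in enumerate(primer):
        if PAIR[base] != template_3to5[i]:
            raise ValueError(f"primer does not pair with template at position {i}")
    new_strand = list(primer)
    for template_base in template_3to5[len(primer):]:
        # Add the nucleotide complementary to the next template base at the 3' end.
        new_strand.append(PAIR[template_base])
    return "".join(new_strand)


print(extend_primer("ATGCGT", "AC"))  # ACGCAT
```

The result is the reverse complement of the template, as expected for an antiparallel daughter strand.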

    \

    This domain has 3' to 5' exonuclease activity and adopts a ribonuclease H type fold PUBMED:8679562.

    \ 412 IPR000683 \

    This group of enzymes utilises NADP or NAD, and is known as the GFO/IDH/MOCA family in Swiss-Prot. GFO is a glucose-fructose oxidoreductase, which converts D-glucose and D-fructose into D-gluconolactone and D-glucitol in the sorbitol-gluconate pathway. MOCA is a rhizopine catabolism protein which may catalyze the NADH-dependent dehydrogenase reaction involved in rhizopine catabolism. Other proteins belonging to this family include Gal80, a negative regulator of the expression of lactose and galactose metabolic genes, and several hypothetical proteins from yeast, Escherichia coli and Bacillus subtilis.

    \

    The oxidoreductase, N-terminal domain is almost always associated with the oxidoreductase, C-terminal domain (see ).

    \ 7861 IPR013121 \

    This entry contains ferric reductase NAD binding proteins.

    \ 2485 IPR003516 \ Fanconi anaemia (FA) PUBMED:1641028, PUBMED:8490620, PUBMED:7929819 is a recessive inherited disease characterised by defective DNA repair. FA cells are sensitive to DNA cross-linking agents that cause chromosomal instability and cell death. The disease is manifested clinically by progressive pancytopenia, variable physical anomalies, and predisposition to malignancy PUBMED:7929819. Four complementation groups have been identified, designated A to D. The FA group A gene (FAA) has been cloned PUBMED:9169126, but its function remains to be elucidated.\ 6809 IPR010728 \

    This entry represents a conserved region approximately 120 residues long within the bacterial Flp pilus assembly protein CpaB.

    \ 1270 IPR004312 \ ATHILA is a group of Arabidopsis thaliana retrotransposons PUBMED:8534844 belonging to the Ty3/gypsy family of the long terminal repeat (LTR) class of eukaryotic retrotransposons PUBMED:9611185, PUBMED:10889217. The central region of ATHILA retrotransposons contains two or three open reading frames (ORFs). This family represents the ORF1 product. The function of ORF1 is unknown.\ 6417 IPR009512 \

    This family consists of several circumsporozoite-related antigen (CRA) or exported protein-1 (EXP1) sequences found specifically in Plasmodium species. The function of this family is unknown.

    \ 3334 IPR001077 \

    Methyl transfer from the ubiquitous S-adenosyl-L-methionine (AdoMet) to nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms ranging from bacteria to plants and mammals. The reaction is catalyzed by methyltransferases (Mtases) and modifies DNA, RNA, proteins and small molecules, such as catechol, for regulatory purposes. The various aspects of the role of DNA methylation in prokaryotic restriction-modification systems and in a number of cellular processes in eukaryotes, including gene regulation and differentiation, are well documented.

    \ \

    Three classes of DNA Mtases transfer the methyl group from AdoMet to the target base to form either N-6-methyladenine, N-4-methylcytosine or C-5-methylcytosine. In C-5-cytosine Mtases, ten conserved motifs are arranged in the same order PUBMED:8127644. Motif I (a glycine-rich or closely related consensus sequence; FAGxGG in M.HhaI PUBMED:8343957), shared by other AdoMet-Mtases PUBMED:2684970, is part of the cofactor binding site, and motif IV (PCQ) is part of the catalytic site. In contrast, sequence comparison among N-6-adenine and N-4-cytosine Mtases identified two conserved segments PUBMED:2690010, although more may be present. One of them corresponds to motif I in C-5-cytosine Mtases, and the other is named (D/N/S)PP(Y/F). Crystal structures are known for a number of Mtases PUBMED:7607476, PUBMED:8343957, PUBMED:8127644, PUBMED:7971991. The cofactor binding sites are almost identical and the essential catalytic amino acids coincide. The comparable protein folding and the existence of equivalent amino acids in similar secondary and tertiary positions indicate that many (if not all) AdoMet-Mtases have a common catalytic domain structure. This permits tertiary structure prediction of other DNA, RNA, protein, and small-molecule AdoMet-Mtases from their amino acid sequences PUBMED:7897657.
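    The motifs named above can be located in a protein sequence with regular expressions. The Python sketch below is a simplification, not a motif-detection method: it uses exact patterns taken from the text (FAGxGG as in M.HhaI, PCQ, and (D/N/S)PP(Y/F)), whereas real motif I sequences are more degenerate and are normally detected with profiles. The test sequence is invented to contain one copy of each motif (in real proteins, PCQ is characteristic of C-5 Mtases and (D/N/S)PP(Y/F) of N-6/N-4 Mtases), and `find_mtase_motifs` is a hypothetical helper.

```python
import re

# Exact-match versions of the motifs named in the text; "x" in FAGxGG becomes ".".
MOTIFS = {
    "motif I (FAGxGG)": re.compile(r"FAG.GG"),
    "motif IV (PCQ)": re.compile(r"PCQ"),
    "(D/N/S)PP(Y/F)": re.compile(r"[DNS]PP[YF]"),
}


def find_mtase_motifs(protein: str) -> list[tuple[int, str, str]]:
    """Return (start, motif name, matched residues) for every hit, sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((match.start(), name, match.group()))
    return sorted(hits)


# Hypothetical sequence built to contain one copy of each motif.
for start, name, residues in find_mtase_motifs("MKFAGIGGLSDPPYAPCQW"):
    print(start, name, residues)
```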

    \ \

    This domain includes a range of O-methyltransferases some of which utilize S-adenosyl methionine as substrate PUBMED:8434913. In prokaryotes, the major role of DNA methylation is to protect host DNA against degradation by restriction enzymes. In eukaryotes, DNA methylation has been implicated in the control of several cellular processes, including differentiation, gene regulation, and embryonic development. O-methyltransferases have a common catalytic domain structure, which might be universal among S-adenosyl-L-methionine (AdoMet)-dependent methyltransferases PUBMED:7773746.

    \

    Comparative analysis of the predicted amino acid sequences of a number of plant O-methyltransferase cDNA clones shows that they share some 32-71% sequence identity and can be grouped according to the different compounds they utilise as substrates PUBMED:9484457.

    \ 5225 IPR008626 \ This family contains Saccharomyces cerevisiae GAL11 proteins. Gal11 and Sin4 proteins are Saccharomyces cerevisiae global transcription factors that regulate transcription of a variety of genes, both positively and negatively. Gal11 functions mainly in the activation of transcription, whereas Sin4 has the opposite role, yet they are reported to be present as a complex in the so-called RNA polymerase II holoenzyme PUBMED:11536332.\ 2977 IPR006897 \ This domain consists of a region found within the alpha isoform and at the C terminus of the beta isoform of the homeobox-containing transcription factor HNF-1. Different isoforms of HNF-1 are generated by the differential use of polyadenylation sites and by alternative splicing. The C-terminal region of HNF-1 is responsible for the activation of transcription PUBMED:7900999. Mutations and polymorphisms in HNF-1 cause the type 3 form of maturity-onset diabetes of the young (MODY3) PUBMED:9133564.\ 2257 IPR002746 \ This family of hypothetical proteins is found in Archaebacteria and has no known function.\ 2326 IPR002760 \

    This family contains archaebacterial proteins of unknown function. Members of this family may be transmembrane proteins.

    \ 4621 IPR002817 \ ThiC is found within the thiamin biosynthesis operon and is involved in thiamin biosynthesis PUBMED:10382260. The precise catalytic function of ThiC is still not known. ThiC participates in the formation of 4-amino-5-hydroxymethyl-2-methylpyrimidine from AIR, an intermediate in de novo purine biosynthesis.\ \ 7087 IPR010832 \

    Mature peptide hormones and neuropeptides are typically synthesised from much larger precursors and require several post-translational processing steps, including proteolytic cleavage, to form the bioactive species. The subtilisin-related proteolytic enzymes that accomplish neuroendocrine-specific cleavages are known as prohormone convertases 1 and 2 (PC1 and PC2), which belong to MEROPS peptidase family S8B. The cell biology of these proteases within the regulated secretory pathway of neuroendocrine cells is complex, and they are themselves initially synthesised as inactive precursor molecules. ProPC1 propeptide cleavage occurs rapidly in the endoplasmic reticulum, yet PC1 acts on prohormones mainly later in the secretory pathway. PC1 also undergoes a carboxy-terminal processing event that appears to activate the enzyme. ProPC2, on the other hand, folds comparatively slowly and exits the endoplasmic reticulum without propeptide cleavage, in association with the neuroendocrine-specific protein 7B2. Once the proPC2/7B2 complex arrives at the trans-Golgi network, 7B2 is internally cleaved into two domains, a 21-kDa fragment and a carboxy-terminal 31-residue peptide. PC2 propeptide removal occurs in the maturing secretory granule, most likely through autocatalysis, and 7B2 association does not appear to be directly required for this cleavage event. However, if proPC2 has not encountered 7B2 intracellularly, it cannot generate a catalytically active mature species. The molecular mechanism behind the association of 7B2 and proPC2 is still unknown, but may involve conformational rearrangement or stabilisation of a proPC2 conformer mediated by a 36-residue internal segment of 21-kDa 7B2.

    \ \ \

    This family represents proSAAS, which belongs to MEROPS inhibitor family I49, clan I-. ProSAAS is the PC1 binding protein PUBMED:10632593, PUBMED:11742530. It exhibits both structural and functional homology to 7B2 (), the PC2 binding protein PUBMED:10812060. The C-terminal (CT) domain of proSAAS contains the same inhibitory hexapeptide as 7B2 PUBMED:9756897, PUBMED:10812060; consequently, 7B2 and proSAAS are members of a homologous family of prohormone convertase inhibitor proteins. However, despite their apparent similarities, there are profound differences in the evolutionary and cell biology of the two prohormone convertases, which are likely to be influenced by their binding proteins and their respective N-terminal PUBMED:12914799 and C-terminal domains.

    \ 665 IPR000270 \ The Phox and Bem1p (PB1) domain is present in many eukaryotic cytoplasmic signalling proteins. The domain adopts a beta-grasp fold, similar to that found in ubiquitin and Ras-binding domains. A motif, variously termed OPR, PC and AID, represents the most conserved region of the majority of PB1 domains and is necessary for PB1 domain function. This function is the formation of PB1 domain heterodimers, although not all PB1 domain pairs associate. \ 6536 IPR010608 \

    This family consists of several plant specific hypothetical proteins of around 160 residues in length. The function of this family is unknown.

    \ 5276 IPR008391 \ This family consists of several bacterial acetyl xylan esterase proteins. Acetyl xylan esterases are enzymes that hydrolyse the ester linkages of the acetyl groups in position 2 and/or 3 of the xylose moieties of natural acetylated xylan from hardwood. They are accessory enzymes of the xylanolytic system, which also includes xylanases, beta-xylosidases, alpha-arabinofuranosidases and methylglucuronidases; all of these are required for the complete hydrolysis of xylan PUBMED:10878123.\ 7074 IPR009879 \

    This entry consists of several Phlebovirus nonstructural NS-M proteins, which represent the N-terminal region of the M polyprotein precursor. The function of this family is unknown.

    \ 4622 IPR003720 \ Thiamine pyrophosphate (TPP) is synthesized de novo in many bacteria and is a required cofactor for many enzymes in the cell. \ ThiI is required for thiazole synthesis in the thiamine biosynthesis pathway PUBMED:9209060. Almost all the proteins in this group have an N-terminal THUMP domain (see ).\ 1850 IPR002823 \ Members of this prokaryotic family have no known function. They are predicted to be integral membrane proteins and are similar to a protein in a tartrate utilization region (TAR) of Agrobacterium vitis, a common pathogen of grapevine. Most grapevine strains utilize tartrate, an abundant compound in grapevine PUBMED:8672817.\ \ 3653 IPR005309 \

    PapG, the adhesin of the P pili, is situated at the tip and is only a minor component of the whole pilus structure. A two-domain structure has been postulated for PapG: a carbohydrate-binding N-terminal domain and a chaperone-binding C-terminal domain (this domain). The chaperone-binding domain is highly conserved, and is essential for the correct assembly of the pilus structure when aided by the chaperone PapD PUBMED:11454740, PUBMED:11440716.

    \ 5588 IPR008653 \ This family consists of several eukaryotic immediate early response (IER) 2 and 5 proteins. The role of IER5 is unclear, although it may play an important role in mediating the cellular response to mitogenic signals. Little is known about the function of IER2 either, although it is thought to play a role in mediating the cellular responses to a variety of extracellular signals PUBMED:11102586, PUBMED:10049588.\ 5905 IPR009281 \

    This family consists of several LR8-like proteins from humans, mice and rats. The function of the human LR8 protein is unknown, although it is known to be strongly expressed in lung fibroblasts PUBMED:9922225.

    \ 110 IPR004346 \

    This family includes the Helicobacter pylori protein CagE, which, together with other proteins from the cag pathogenicity island (PAI), makes up a type IV secretion system. The precise role of CagE is not known, but studies in animal models have shown that it is essential for pathogenesis in H. pylori-induced gastritis and peptic ulceration PUBMED:11104802. Indeed, expression of the cag PAI has been shown to be essential for stimulating human gastric epithelial cell apoptosis in vitro PUBMED:11447179.

    \

    Similar type IV transport systems are also found in other bacteria. This family includes proteins from the trb and Vir conjugal transfer systems in Agrobacterium tumefaciens and homologues of VirB proteins from other species.

    \ 1438 IPR001837 \

    Cyclase-associated protein (CAP) is a conserved two-domain protein that helps to activate the catalytic activity of adenylyl cyclase in the cyclase-bound state through interaction with Ras, which binds to the cyclase in a different region. With its other domain, CAP can bind monomeric actin and therefore also carries a cytoskeletal function. The protein is thus involved in Ras/cAMP-dependent signal transduction and most likely serves as an adapter protein translocating the adenylyl cyclase complex to the actin cytoskeleton PUBMED:1550959, PUBMED:7962207.

    \

    Structurally, CAP is a protein of 474 to 551 residues. The N- and C-terminal domains of CAP are connected by an intermediate section which contains a proline-rich region. In the yeast protein, this domain has been further divided into the P1 and P2 regions. The P1 region consists of a 14-amino-acid sequence of unknown function, while the P2 region contains a consensus SH3-binding motif (PXXP) and is necessary to target CAP to cortical actin patches. Dictyostelium CAP is a phosphatidylinositol 4,5-bisphosphate (PIP2)-regulated G-actin sequestering protein, which is present in the cytosol and shows enrichment at plasma-membrane regions. The cortical translocation is mediated by the N-terminal domain PUBMED:12351838.
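    The PXXP consensus mentioned above (proline, any two residues, proline) can be located with a regular expression. In proline-rich stretches such matches overlap, so the minimal Python sketch below uses a zero-width lookahead to report every start position rather than only non-overlapping hits. `pxxp_sites` is a hypothetical helper and the test sequence is invented, not taken from any CAP protein; a match shows only that the consensus is present, not that the site actually binds an SH3 domain.

```python
import re

# The lookahead "(?=...)" is zero-width, so every start position is tested and
# overlapping PXXP occurrences (common in proline-rich regions) are all reported.
PXXP = re.compile(r"(?=P..P)")


def pxxp_sites(protein: str) -> list[int]:
    """Return 0-based start positions of every PXXP occurrence."""
    return [m.start() for m in PXXP.finditer(protein)]


print(pxxp_sites("APPLPPAP"))           # [1, 2, 4]
print(re.findall(r"P..P", "APPLPPAP"))  # ['PPLP'] without the lookahead
```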

    \ 1670 IPR004820 \

    This family includes PUBMED:10208837:

    \ \

    CTP:cholinephosphate cytidylyltransferase (CCT) is a key regulatory enzyme in phosphatidylcholine biosynthesis that catalyzes the formation of CDP-choline. A comparison of the catalytic domains of CCTs from a wide variety of organisms reveals a large number of completely conserved residues. There may be a role for the conserved HXGH sequence in catalysis. The membrane-binding domain in rat CCT has been defined, and it has been suggested that lipids may play a role in inactivating the enzyme. A phosphorylation domain has been described PUBMED:9370319.

    \ \ 5353 IPR004685 \ Characterized members of the branched chain Amino Acid:Cation Symporter (LIVCS) family transport all three of the branched chain aliphatic amino acids (leucine (L), isoleucine (I) and valine (V)). They function by a Na+ or H+ symport mechanism and have 12 putative transmembrane helices.\ 2385 IPR001326 \

    Eukaryotic elongation factor 1 (EF-1) is responsible for the GTP-dependent binding of aminoacyl-tRNAs to the ribosomes PUBMED:2278101. EF-1 is composed of four subunits: the alpha chain, which binds GTP and aminoacyl-tRNAs; the gamma chain, which probably plays a role in anchoring the complex to other cellular components; and the beta and delta (or beta') chains. The beta and delta chains are highly similar proteins that both stimulate the exchange of GDP bound to the alpha chain for GTP PUBMED:2207149. The beta and delta chains are hydrophilic proteins. Their C-terminus seems to be important for the nucleotide exchange activity, while the N-terminus is probably involved in the interaction with the gamma chain.

    \ 7168 IPR010855 \

    Expression from a human cytomegalovirus early promoter (E1.7) has been shown to be activated in trans by the IE2 gene product. Although the IE1 gene product alone had no effect on this early viral promoter, maximal early promoter activity was detected when both IE1 and IE2 gene products were present PUBMED:2157038. The IE1 protein from cytomegalovirus is also known as UL123.

    \ 7063 IPR009872 \

    This family consists of several bacterial proteins of around 100 residues in length. The function of this family is unknown.

    \ 5320 IPR008696 \ This domain is involved in snoRNP biogenesis PUBMED:12228251.\ 1995 IPR005504 \ This family contains a number of archaeal proteins that are completely uncharacterised. The proteins are between 130 and 160 amino acids long. Their C-terminus contains several conserved residues.\ 1798 IPR001525 \ C-5 cytosine-specific DNA methylases () (C5 Mtase) are enzymes that specifically methylate the C-5 carbon of cytosines in DNA to produce C5-methylcytosine PUBMED:3248729, PUBMED:8127644, PUBMED:2716049. In mammalian cells, cytosine-specific methyltransferases methylate certain CpG sequences; this methylation is believed to modulate gene expression and cell differentiation. In bacteria, these enzymes are a component of restriction-modification systems and serve as valuable tools for the manipulation of DNA PUBMED:7773746, PUBMED:8127644. The structure of HhaI methyltransferase (M.HhaI) has been resolved to 2.5 Å PUBMED:8343957: the molecule folds into two domains, a larger catalytic domain containing catalytic and cofactor binding sites, and a smaller DNA recognition domain.\ 6112 IPR010436 \

    This family consists of several Cytomegalovirus UL84 proteins. The open reading frame UL84 of human cytomegalovirus encodes a multifunctional regulatory protein which is required for viral DNA replication and binds with high affinity to the immediate-early transactivator IE2-p86 PUBMED:12610148.

    \ 5287 IPR008648 \ This family includes UL69 and IE63 that are transcriptional regulator proteins.\ 1418 IPR001442 \

    This duplicated domain is present at the C terminus of type IV collagen, the major structural component of glomerular basement membranes (GBM), which forms a 'chicken-wire' meshwork together with laminins, proteoglycans and entactin/nidogen. Mutations in alpha-5 collagen IV are associated with X-linked Alport syndrome.

    \ 5807 IPR009242 \

    This family consists of several short, hypothetical bacterial proteins of unknown function.

    \ 3933 IPR007596 \ The repeat is found in the A-type inclusion protein of the Poxvirus family PUBMED:2826668.\ 4276 IPR011261 \

    RNA polymerase (RNAP) II, which is responsible for all mRNA synthesis in eukaryotes, consists of 12 subunits. Subunits Rpb3 and Rpb11 form a heterodimer that is functionally analogous to the archaeal RNAP D/L heterodimer, and to the prokaryotic RNAP alpha subunit (RpoA) homodimer. In each case, they play a key role in RNAP assembly by forming a platform on which the catalytic subunits (eukaryotic Rpb1/Rpb2, and prokaryotic beta/beta’) can interact PUBMED:11453250. These different subunits share regions of homology required for dimerisation. In eukaryotic Rpb11 and archaeal L subunits, the dimerisation domain consists of a contiguous Rpb11-like domain, whereas in eukaryotic Rpb3, archaeal D and bacterial RpoA subunits (), the dimerisation domain consists of the Rpb11-like domain interrupted by an insert domain. In the prokaryotic alpha subunit, this dimerisation domain is the N-terminal domain PUBMED:9657722.

    \ 2021 IPR005646 \ This family of bacterial proteins has no known function. The proteins are in the region of 500-600 amino acid residues in length.\ 7377 IPR003335 \

    Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component PUBMED:2202721. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome.

    \

    The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) PUBMED:2202721. The chaperone protein SecB PUBMED:11336818 is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion PUBMED:10418149. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both the proton motive force and ATP hydrolysis; the ATP-driven step is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains PUBMED:12167867. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices.

    \

    This family consists of various prokaryotic SecD and SecF protein export membrane proteins. The SecD and SecF equivalents of the Gram-positive bacterium Bacillus subtilis are jointly present in one polypeptide, denoted SecDF, that is required to maintain a high capacity for protein secretion. Unlike the SecD subunit of the preprotein translocase of Escherichia coli, SecDF of B. subtilis was not required for the release of a mature secretory protein from the membrane, indicating that SecDF is involved in earlier translocation steps PUBMED:9694879. Comparison with SecD and SecF proteins from other organisms revealed the presence of 10 conserved regions in SecDF, some of which appear to be important for SecDF function. Interestingly, the SecDF protein of B. subtilis has 12 putative transmembrane domains. Thus, SecDF shows not only sequence similarity but also structural similarity to secondary solute transporters PUBMED:9694879.

    \ 4619 IPR001938 \

    Pathogenesis-related (PR) proteins, which are induced by various agents ranging from ethylene to pathogens, are structurally diverse and apparently ubiquitous in plants PUBMED:1463856: they include thaumatin, osmotin, tobacco major and minor PR proteins, alpha-amylase/trypsin inhibitor, and P21 and PWIR2 soybean and wheat leaf proteins. The proteins are involved in systemic acquired resistance and stress responses in plants, although their precise role is unknown PUBMED:1463856. Thaumatin is an intensely sweet-tasting protein (about 100,000 times sweeter than sucrose PUBMED:7049841) found in the West African shrub Thaumatococcus daniellii: it is induced by attack by viroids, which are single-stranded, unencapsulated RNA molecules that do not code for protein.

    Like other PR proteins, thaumatin is predicted to have a mainly beta structure, with a high content of beta-turns and little helix PUBMED:1463856. Tobacco cells exposed to gradually increased salt concentrations develop a greatly increased tolerance to salt, due to the expression of osmotin PUBMED:, a member of the PR protein family. Wheat plants attacked by barley powdery mildew express a PR protein (PWIR2), which results in resistance against that infection PUBMED:1650615. The similarity of this and other PR proteins to the maize alpha-amylase/trypsin inhibitor suggests that PR proteins may act as some form of inhibitor PUBMED:1650615.

    \ 1774 IPR007535 \

    This domain is the N-terminal region of catechol, chlorocatechol or hydroxyquinol 1,2-dioxygenase proteins. This region is always found adjacent to the dioxygenase domain ().

    \ 7068 IPR010825 \

    This family consists of several Drosophila-specific Turandot proteins. The Turandot A (TotA) gene encodes a humoral factor, which is secreted from the fat body and accumulates in the body fluids. TotA is strongly induced upon bacterial challenge, as well as by other types of stress such as high temperature, mechanical pressure, dehydration, UV irradiation, and oxidative agents. It is also upregulated during metamorphosis and in old age. Flies that overexpress TotA show prolonged survival and retain normal activity at otherwise lethal temperatures. Although TotA is only induced by severe stress, it responds to a much wider range of stimuli than heat shock genes such as hsp70 or immune genes such as Cecropin A1 PUBMED:11369236.

    \ 2064 IPR007277 \ This is a family of conserved eukaryotic transmembrane proteins.\ 3477 IPR005591 \

    The napB gene encodes a dihaem cytochrome c, the small subunit of a heterodimeric periplasmic nitrate reductase PUBMED:11389694.

    \ 6132 IPR008316 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 4307 IPR006950 \ This family comprises the 11 kDa non-structural proteins found in segment S11 of the Rotavirus genome. They may form part of a complex that is involved in the replication of the genome.\ 4474 IPR005523 \

    This domain is currently found in Streptomyces bacteria, in a set of proteins with no known function. Most proteins contain two copies of this domain.

    \ 2973 IPR000891 \

    Pyruvate carboxylase () (PC), a member of the biotin-dependent enzyme family, is involved in gluconeogenesis, mediating the carboxylation of pyruvate to oxaloacetate. Biotin-dependent carboxylases perform a two-step reaction. Enzyme-bound biotin is first carboxylated by bicarbonate and ATP, and the carboxyl group temporarily bound to biotin is subsequently transferred to an acceptor substrate such as pyruvate PUBMED:11851389. PC has three functional domains: a biotin carboxylase (BC) domain, a carboxyltransferase (CT) domain, which performs the second part of the reaction, and a biotinyl domain PUBMED:7780827, PUBMED:10229653. The mechanism by which the carboxyl group is transferred from the carboxybiotin to the pyruvate is not well understood.
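    The two-step scheme for biotin-dependent carboxylases can be written as a pair of half-reactions, with E-biotin denoting the biotin carried on the biotinyl domain. This is a simplified sketch: Mg2+, protons and the charges on ATP and ADP are omitted, and the assignment of each step to the BC and CT domains follows the domain description in this entry. Adding the two steps cancels the carboxybiotin intermediate and gives the overall PC reaction.

```latex
\begin{aligned}
\text{E-biotin} + \mathrm{HCO_3^-} + \mathrm{ATP} &\longrightarrow \text{E-biotin-CO}_2^- + \mathrm{ADP} + \mathrm{P_i} && \text{(BC domain)}\\
\text{E-biotin-CO}_2^- + \text{pyruvate} &\longrightarrow \text{E-biotin} + \text{oxaloacetate} && \text{(CT domain)}\\
\text{pyruvate} + \mathrm{HCO_3^-} + \mathrm{ATP} &\longrightarrow \text{oxaloacetate} + \mathrm{ADP} + \mathrm{P_i} && \text{(net)}
\end{aligned}
```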


    The pyruvate carboxyltransferase domain is also found in other pyruvate-binding enzymes and in acetyl-CoA-dependent enzymes, suggesting that this domain can be associated with different enzymatic activities.

    \ 1769 IPR007643 \ The Dictyostelium discoideum spore coat is a polarised extracellular matrix composed of glycoproteins and cellulose. Four of the major coat glycoproteins exist as a multi-protein complex within the prespore vesicles before secretion. Of these, SP96 and SP70 are members of this family. The presence of SP96 and SP70 in the complex is necessary for the cellulose binding activity of the complex, which is in turn necessary for normal spore coat assembly PUBMED:10931888. The function of this region is not known.\ 3722 IPR000816 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
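    The catalytic-type scheme above can be captured in a small lookup table. This is a minimal sketch. It assumes that MEROPS family identifiers consist of a catalytic-type letter followed by a number (for example C15 and M4, both used in this document), optionally followed by a subfamily letter. `describe_family` is a hypothetical helper and is not part of any MEROPS tool.

```python
import re

# Catalytic-type letters as listed in the text; 'P' applies only to clans
# containing families of more than one catalytic type, never to families.
CATALYTIC_TYPES = {
    "A": "aspartic", "C": "cysteine", "G": "glutamic acid", "M": "metallo",
    "S": "serine", "T": "threonine", "U": "unknown", "P": "mixed (clan only)",
}

# Per the text: Ser, Thr and Cys peptidases use an amino-acid side chain as the
# nucleophile (acyl intermediate); Asp, Glu and metallo use activated water.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

# Assumed identifier format: family letter (not 'P'), digits, optional subfamily letter.
FAMILY_RE = re.compile(r"^([ACGMSTU])(\d+)([A-Z]?)$")


def describe_family(family_id: str) -> dict:
    """Decode a family identifier such as 'C15' into its catalytic properties."""
    m = FAMILY_RE.match(family_id)
    if m is None:
        raise ValueError(f"not a recognised family identifier: {family_id!r}")
    letter, number, subfamily = m.groups()
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino-acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "unknown"
    return {
        "catalytic_type": CATALYTIC_TYPES[letter],
        "family_number": int(number),
        "subfamily": subfamily or None,
        "nucleophile": nucleophile,
    }
```

    For example, `describe_family("C15")` reports a cysteine peptidase family whose nucleophile is an amino-acid side chain, and `describe_family("M4")` reports a metallo family that uses an activated water molecule.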


    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.


    Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:


    This group of cysteine peptidases belongs to MEROPS peptidase family C15 (pyroglutamyl peptidase I, clan CF). The type example is pyroglutamyl peptidase I of Bacillus amyloliquefaciens.


    Pyroglutamyl/pyrrolidone carboxyl peptidase (Pcp or PYRase) is an exopeptidase that hydrolytically removes the N-terminal pGlu residue from pGlu-peptides or pGlu-proteins PUBMED:7824521, PUBMED:1353026. PYRase has been found in prokaryotes and eukaryotes, where at least two different classes have been characterised: the first contains bacterial and animal type I PYRases, and the second contains animal type II and serum PYRases. Type I and bacterial PYRases are soluble enzymes, while type II PYRases are membrane-bound. The primary applications of PYRase have been in protein and peptide sequencing and in bacterial diagnosis PUBMED:1353026. The conserved residues Cys-144 and His-168 have been shown to be essential by inhibition and mutagenesis studies PUBMED:7824521, PUBMED:7909543.

    \ 7205 IPR010862 \

    This family consists of several bacterial proteins of around 115 residues in length. Members of this family seem to be found exclusively in Salmonella and Yersinia species and several have been described as being putative cytoplasmic proteins. The function of this family is unknown.

    \ 7183 IPR010187 \

    This entry represents selenoprotein B of glycine reductase, sarcosine reductase, betaine reductase, D-proline reductase, and perhaps others. All members are expected to contain an internal UGA codon, encoding selenocysteine, which may be misinterpreted as a stop codon.
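    The stop-codon ambiguity can be seen in a toy translator. This is a minimal sketch that assumes a DNA coding sequence (so UGA appears as TGA) and the standard genetic code. `translate` and its `read_through_tga` flag are hypothetical names. Treating every internal in-frame TGA as selenocysteine is a deliberate simplification, because real Sec insertion depends on additional signals that are not modelled here.

```python
# Standard genetic code, with codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, read_through_tga: bool = False) -> str:
    """Translate from the first base until a stop codon.

    With read_through_tga=True, any in-frame TGA that is not the final codon
    is decoded as selenocysteine ('U') instead of terminating translation.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    protein = []
    for idx in range(n_codons):
        codon = cds[3 * idx: 3 * idx + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            if read_through_tga and codon == "TGA" and idx < n_codons - 1:
                protein.append("U")  # internal UGA read as selenocysteine
                continue
            break  # genuine (or naively assumed) stop codon
        protein.append(aa)
    return "".join(protein)
```

    A naive reading of `ATGTGAGGCTAA` stops after the first residue (`M`). When the internal TGA is treated as selenocysteine, translation continues and yields `MUG`.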

    \ 573 IPR004843 \

    Protein phosphorylation plays a central role in the regulation of cell functions PUBMED:2827745, causing the activation or inhibition of many enzymes involved in various biochemical pathways PUBMED:2176161. Kinases and phosphatases are the enzymes responsible for this, and may themselves be subject to control through the action of hormones and growth factors PUBMED:2827745. Serine/threonine (S/T) phosphatases catalyse the dephosphorylation of phosphoserine and phosphothreonine residues. In mammalian tissues, four different types of S/T protein phosphatase (PP) have been identified: PP1, PP2A, PP2B and PP2C. Except for PP2C, these enzymes are evolutionarily related. The catalytic regions of the proteins are well conserved and have a slow mutation rate, suggesting that major changes in these regions are highly detrimental PUBMED:2827745.


    The metallo-phosphoesterase motif is found in a large number of proteins involved in phosphorylation. These include serine/threonine phosphatases, DNA polymerases, exonucleases and other phosphatases.

    \ 2799 IPR000524 \

    Many bacterial transcription regulation proteins bind DNA through a helix-turn-helix (HTH) motif, which can be classified into subfamilies on the basis of sequence similarities. The HTH GntR family has many members distributed among diverse bacterial groups that regulate various biological processes. It was named GntR after the Bacillus subtilis repressor of the gluconate operon PUBMED:2060763. Family members include GntR, HutC, KorA, NtaR, FadR, ExuR, FarR, DgoR and PhnF. The crystal structure of the FadR protein has been determined PUBMED:11013219. In general, these proteins contain a DNA-binding HTH domain at the N terminus, and an effector-binding or oligomerisation domain at the C terminus (). The DNA-binding domain is well conserved in structure for the whole of the GntR family, consisting of a 3-helical bundle core with a small beta-sheet (wing); the GntR winged helix structure is similar to that found in several other transcriptional regulator families. The regions outside the DNA-binding domain are more variable and are consequently used to define GntR subfamilies PUBMED:11756427. This entry represents the N-terminal DNA-binding domain of the GntR family.

    \ 7665 IPR009109 \

    Ran GTPase is a ubiquitous protein required for nuclear transport, spindle assembly, nuclear assembly and mitotic cell cycle regulation. RanGTPase activating protein 1 (RanGAP1) is one of several RanGTPase accessory proteins. During interphase, RanGAP1 is located in the cytoplasm, while during mitosis it becomes associated with the kinetochores PUBMED:12852855. Cytoplasmic RanGAP1 is required for RanGTPase-directed nuclear transport. The activity of RanGAP1 requires the accessory protein RanBP1. RanBP1 facilitates RanGAP1 hydrolysis of Ran-GTP, both directly and by promoting the dissociation of Ran-GTP from transport receptors, which would otherwise block RanGAP1-mediated hydrolysis. RanGAP1 is thought to bind to the Switch 1 and Switch 2 regions of RanGTPase. The Switch 2 region can be buried in complexes with karyopherin-beta2, and requires the interaction with RanBP1 to permit RanGAP1 function. RanGAP1 can undergo SUMO (small ubiquitin-like modifier) modification, which targets RanGAP1 to RanBP2/Nup358 in the nuclear pore complex, and is required for association with the nuclear pore complex and for nuclear transport PUBMED:11853669. The enzymes involved in SUMO modification are located on the filaments of the nuclear pore complex.


    The RanGAP1 N-terminal domain is fairly well conserved between vertebrate and fungal proteins, but the yeast protein lacks the C-terminal domain. The C-terminal domain is SUMO-modified and required for the localisation of RanGAP1 at the nuclear pore complex. The structure of the C-terminal domain is multihelical, consisting of two curved alpha/alpha layers in a right-handed superhelix.


    The SSF signature in this entry is currently under review. Please be aware that some of the protein hits may be false positives.

    \ 5316 IPR008727 \ This motif is usually found in pairs in a family of bacterial membrane proteins. It is also found as a triplet of tandem repeats that makes up the entire length of the proteins in another family of hypothetical proteins.\ 2599 IPR004269 \ This family includes the folate receptor, which binds folate and reduced folic acid derivatives and mediates delivery of 5-methyltetrahydrofolate to the interior of cells. These proteins are attached to the membrane by a GPI anchor. A riboflavin-binding protein required for the transport of riboflavin to the developing oocyte in chicken also belongs to this family.\ 7747 IPR012931 \

    This domain is found in the N-terminal region of the TraG protein () from Escherichia coli. This is a membrane-spanning protein, with three predicted transmembrane segments and two periplasmic regions PUBMED:1348105. The TraG protein is known to be essential for DNA transfer in the process of conjugation, with the N-terminal portion being required for F pilus assembly PUBMED:1348105, PUBMED:7915817. The protein is thought to interact with the periplasmic domain of TraN () to stabilise mating-cell interactions PUBMED:7915817.

    \ 6045 IPR010409 \

    This family includes gbp, a soybean protein that binds to GAGA-element dinucleotide repeat DNA PUBMED:12177492. It seems likely that the region which defines this family mediates DNA binding. This putative domain contains several conserved cysteines and a histidine, suggesting that it may be a zinc-binding DNA interaction domain.

    \ 7370 IPR011419 \

    Mitochondrial F1-ATPase is an oligomeric enzyme composed of five distinct subunit polypeptides. The alpha and beta subunits make up the bulk of the protein mass of F1. In Saccharomyces cerevisiae both subunits are synthesised as precursors with N-terminal targeting signals that are removed upon translocation of the proteins to the matrix compartment PUBMED:1826907. These proteins include examples from eukaryotes and bacteria and may have chaperone activity, being involved in F1 ATPase complex assembly.

    \ 5762 IPR009231 \

    This family consists of several mid-1-related chloride channels. Mid-1-related chloride channel (MCLC) proteins function as a chloride channel when incorporated in the planar lipid bilayer PUBMED:11279057.

    \ 1406 IPR005167 \

    Bunyavirus has three genomic segments: small (S), middle-sized (M), and large (L). The S segment encodes the nucleocapsid and a non-structural protein. The M segment codes for two glycoproteins, G1 and G2, and another non-structural protein (NSm). The L segment codes for an RNA polymerase. This family contains the G1 glycoprotein which is the viral attachment protein PUBMED:8553534.

    \ 3695 IPR001086 \

    Prephenate dehydratase (, PDT) catalyses the decarboxylation and dehydration of prephenate to phenylpyruvate. In microorganisms it is part of the terminal pathway of phenylalanine biosynthesis. In some bacteria, such as Escherichia coli, PDT is part of a bifunctional enzyme (P-protein) that also catalyses the transformation of chorismate into prephenate (chorismate mutase, , ), while in other bacteria it is a monofunctional enzyme. The sequence of monofunctional PDT aligns well with the C-terminal part of P-proteins PUBMED:9642265.
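    The two activities of the bifunctional P-protein can be summarised as follows. The stoichiometry shown is the standard one and is added here for reference. The chorismate mutase step is an intramolecular rearrangement with no other products.

```latex
\begin{aligned}
\text{chorismate} &\xrightarrow{\ \text{chorismate mutase}\ } \text{prephenate} \\
\text{prephenate} &\xrightarrow{\ \text{PDT}\ } \text{phenylpyruvate} + \mathrm{CO_2} + \mathrm{H_2O}
\end{aligned}
```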

    \ 5249 IPR008470 \ This family contains several plant, cyanobacterial and algal proteins of unknown function. The family is exclusively found in phototrophic organisms and may therefore play a role in photosynthesis.\ 3541 IPR007710 \

    Nucleoside 2-deoxyribosyltransferase () catalyses the cleavage of the glycosidic bonds of 2-deoxyribonucleosides. Nucleoside 2-deoxyribosyltransferases can be divided into two groups based on their substrate specificity: class I enzymes are specific for the transfer of deoxyribose between two purines, while class II enzymes will transfer the deoxyribose between either purines or pyrimidines. The structures of the class I PUBMED:14992575 and class II PUBMED:8805514 enzymes are very similar. In class I enzymes, the purine base shields the active site from solvent, which the smaller pyrimidine base cannot do, while in class II enzymes the active site is shielded by a loop (residues 48-62). Both classes of enzymes are found in various Lactobacillus species and participate in nucleoside recycling in these microorganisms. This entry represents both classes of enzymes.
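    The class distinction above amounts to a simple substrate rule. The sketch below encodes only that rule. The base names and the helper functions `base_kind` and `can_transfer` are hypothetical. The model is deliberately coarse and ignores the finer substrate preferences of individual enzymes.

```python
# Coarse nucleobase classes (standard chemistry).
PURINES = {"adenine", "guanine"}
PYRIMIDINES = {"cytosine", "thymine", "uracil"}


def base_kind(base: str) -> str:
    """Classify a nucleobase as 'purine' or 'pyrimidine'."""
    if base in PURINES:
        return "purine"
    if base in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown nucleobase: {base!r}")


def can_transfer(enzyme_class: str, donor: str, acceptor: str) -> bool:
    """Whether a class 'I' or 'II' enzyme could move deoxyribose from the
    donor base to the acceptor base, under the specificity rule in the text."""
    kinds = {base_kind(donor), base_kind(acceptor)}
    if enzyme_class == "I":
        # Class I: transfer only between two purines.
        return kinds == {"purine"}
    if enzyme_class == "II":
        # Class II: purines or pyrimidines on either side.
        return True
    raise ValueError(f"unknown enzyme class: {enzyme_class!r}")
```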

    \ 7162 IPR009938 \

    This entry represents the N terminus of interferon-induced 35 kDa protein (IFP 35) (approximately 80 residues long), which contains a leucine zipper motif in an alpha helical configuration PUBMED:10950963. This group of proteins also includes N-myc-interactor (Nmi), a homologous interferon-induced protein.

    \ 6459 IPR009535 \

    This family represents a small conserved region of unknown function within eukaryotic phospholipase C (). All members also contain and .

    \ 4807 IPR000241 \ This domain is probably a methylase. It is associated with the THUMP domain that also occurs with RNA modification domains PUBMED:11295541.\ 6269 IPR010497 \

    This entry represents the N-terminal region of the eukaryotic epoxide hydrolase protein. Epoxide hydrolases () comprise a group of functionally related enzymes that catalyse the addition of water to oxirane compounds (epoxides), thereby usually generating vicinal trans-diols. EHs have been found in all types of living organisms, including mammals, invertebrates, plants, fungi and bacteria. In animals, the major interest in EH is directed towards their detoxification capacity for epoxides, since they are important safeguards against the cytotoxic and genotoxic potential of oxirane derivatives that are often reactive electrophiles because of the high tension of the three-membered ring system and the strong polarisation of the C-O bonds. This is of significant relevance because epoxides are frequent intermediary metabolites, which arise during the biotransformation of foreign compounds PUBMED:10548561. This domain is often found in conjunction with .
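    The reaction described above can be written generically as follows. R and R' are arbitrary substituents, and CH(O)CH is used here only as a shorthand for the three-membered oxirane ring.

```latex
\underbrace{\mathrm{R\text{-}CH(O)CH\text{-}R'}}_{\text{epoxide (oxirane)}}
+ \mathrm{H_2O}
\;\xrightarrow{\ \text{EH}\ }\;
\underbrace{\mathrm{R\text{-}CH(OH)\text{-}CH(OH)\text{-}R'}}_{\text{vicinal diol (usually trans)}}
```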

    \ 7109 IPR009904 \

    This family contains a number of eukaryotic Insulin-induced proteins (INSIG-1 and INSIG-2) approximately 200 residues long. INSIG-1 and INSIG-2 are found in the endoplasmic reticulum and bind the sterol-sensing domain of SREBP cleavage-activating protein (SCAP), preventing it from escorting SREBPs to the Golgi. Their combined action permits feedback regulation of cholesterol synthesis over a wide range of sterol concentrations PUBMED:12202038, PUBMED:12242332.

    \ 2519 IPR001712 \

    The Flagellar/Hr/Invasion Proteins Export Pore (FHIPEP) family PUBMED:8253684, PUBMED:8316211 consists of a number of proteins that constitute the type III secretion (or signal peptide-independent) pathway apparatus PUBMED:1365398, PUBMED:1592799. This mechanism translocates proteins lacking an N-terminal signal peptide across the cell membrane in one step, as it does not require an intermediate periplasmic process to cleave the signal peptide. It is a common pathway amongst Gram-negative bacteria for secreting toxic and flagellar proteins.


    The pathway apparatus comprises three components: two within the inner membrane and one within the outer membrane PUBMED:8316211. An FHIPEP protein is located within the inner membrane, although it is unknown which component it constitutes. All FHIPEP proteins are about 700 amino acid residues long. Within the sequence, the N terminus is highly conserved and hydrophobic, suggesting that this terminus is embedded within the membrane with 6-8 transmembrane (TM) domains, while the C terminus is less conserved and appears to be devoid of TM regions. It is possible that members of the FHIPEP family serve as pores for the export of specific proteins.

    \ 5153 IPR007990 \

    This family consists of seminal vesicle autoantigen and prolactin-inducible proteins (PIP). Seminal vesicle autoantigen (SVA) is specifically present in the seminal plasma of mice. This 19 kDa secretory glycoprotein suppresses the motility of spermatozoa by interacting with phospholipid. PIP has several known functions. In saliva, this protein plays a role in host defence by binding to microorganisms such as Streptococcus. PIP is an aspartyl proteinase and acts as a factor capable of suppressing T-cell apoptosis through its interaction with CD4 PUBMED:11178965.

    \ 2886 IPR007845 \ The Yersinia enterocolitica O:8 periplasmic binding protein-dependent transport system consists of four proteins: the periplasmic haemin-binding protein HemT, the haemin permease protein HemU, the ATP-binding hydrophilic protein HemV and the haemin-degrading protein HemS (this family).\ 3755 IPR001570 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
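    Both motif definitions above translate directly into regular expressions. This is a minimal sketch. The text specifies 'a', 'b' and 'c' only loosely, so the residue sets below are assumptions: 'a' is restricted to Val/Thr, 'b' is taken to be any residue other than D, E, K, R, H or P, and 'c' is a conventional hydrophobic set. Proline is excluded from every position of the stringent pattern because it is never found in this site. The plain HEXXH pattern deliberately allows any residue at X, including proline. `find_motifs` is a hypothetical helper.

```python
import re

# Assumed residue classes for the stringent motif abXHEbbHbc.
A_SITE = "VT"                          # 'a': most often Val or Thr
UNCHARGED = "ACFGILMNQSTVWY"           # 'b': uncharged, Pro excluded (assumed set)
HYDROPHOBIC = "ACFILMVWY"              # 'c': hydrophobic, Pro excluded (assumed set)
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"    # 'X' in the stringent motif

# Lookahead groups let finditer report overlapping matches.
HEXXH = re.compile(r"(?=(HE..H))")
STRINGENT = re.compile(
    rf"(?=([{A_SITE}][{UNCHARGED}][{ANY_BUT_PRO}]HE"
    rf"[{UNCHARGED}][{UNCHARGED}]H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_motifs(seq: str, pattern: re.Pattern) -> list:
    """Return (1-based start position, matched residues) for every match."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]
```

    On the synthetic string `GVVAHELTHAVTDY` (not a real protein sequence), the plain pattern finds `HELTH` at position 5 and the stringent pattern finds `VVAHELTHAV` at position 2. In `TAAHEPLHAL`, the plain HEXXH pattern still matches `HEPLH`, but the stringent pattern rejects the site because proline occupies a 'b' position.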


    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.


    This group of metallopeptidases constitutes MEROPS peptidase family M4 (thermolysin family, clan MA(E)). The protein fold of the peptidase domain of thermolysin is the type example for members of clan MA. The thermolysin family is composed only of secreted eubacterial endopeptidases. The zinc-binding residues are H-142, H-146 and E-166, with E-143 acting as the catalytic residue. Thermolysin also contains four calcium-binding sites, which contribute to its unusual thermostability. The family also includes enzymes from a number of pathogens, including Legionella and Listeria, and the protein pseudolysin, all with a substrate specificity for an aromatic residue in the P1' position. Three-dimensional structure analysis has shown that the enzymes undergo a hinge-bend motion during catalysis. Pseudolysin has a broader specificity, acting on large molecules such as elastin and collagen, possibly due to its wider active site cleft PUBMED:7674922.

    \ 6500 IPR009558 \

    This family consists of several hypothetical bacterial proteins of around 210 residues in length. The function of this family is unknown.

    \ 6248 IPR009446 \

    The mgm101 gene was identified as essential for maintenance of the mitochondrial genome in Saccharomyces cerevisiae PUBMED:10209025. Based on its DNA-binding activity, and experimental work with a temperature-sensitive mgm101 mutant, it has been proposed that the mgm101 gene product performs an essential function in the repair of oxidatively damaged mitochondrial DNA PUBMED:10209025.

    \ 2316 IPR007777 \ This family consists of uncharacterised proteins from Borrelia burgdorferi. There is some evidence to suggest that the proteins may be outer surface proteins.\ 2033 IPR007155 \ Members of this entry are short (less than 100 amino acids) proteins found in archaebacteria. The function of these proteins is unknown.\ 7176 IPR009948 \

    This family contains a number of bacterial Syd proteins approximately 180 residues long. It has been suggested that Syd is loosely associated with the cytoplasmic surface of the cytoplasmic membrane, and that interaction with SecY may be involved in this membrane association PUBMED:7890670.

    \ 8095 IPR013210 \

    Leucine Rich Repeats () are short sequence motifs present in a number of proteins with diverse functions and cellular locations. Leucine Rich Repeats are often flanked by cysteine rich domains. This domain is often found at the N-terminus of tandem leucine rich repeats.

    \ 5718 IPR008610 \ This family consists of several eukaryotic rRNA processing protein EBP2 sequences. Ebp2p is required for the maturation of 25S rRNA and 60S subunit assembly. Ebp2p may be one of the target proteins of Rrs1p for executing the signal to regulate ribosome biogenesis PUBMED:10947841.\ 1152 IPR006703 \

    This entry represents the Arabidopsis protein AIG1, which appears to be involved in plant resistance to bacteria. The Arabidopsis disease resistance gene RPS2 is involved in the recognition of bacterial pathogens carrying the avirulence gene avrRpt2. AIG1 (avrRpt2-induced gene) exhibits RPS2- and avrRpt2-dependent induction early after infection with Pseudomonas syringae carrying avrRpt2 PUBMED:8742710.

    The pattern also recognises a number of mammalian proteins, for example the rat immune-associated nucleotide 4 protein, suggesting that the family may have a wider function.

    \ 1196 IPR006980 \

    Ammonia monooxygenase and the particulate methane monooxygenase are both integral membrane proteins, occurring in ammonia oxidisers and methanotrophs respectively, which are thought to be evolutionarily related PUBMED:7590173. These enzymes have a relatively wide substrate specificity and can catalyse the oxidation of a range of substrates including ammonia, methane, halogenated hydrocarbons and aromatic molecules PUBMED:12209257. These enzymes are composed of 3 subunits - A (), B () and C () - and contain various metal centres, including copper. Particulate methane monooxygenase from Methylococcus capsulatus (Bath) is a trimer of ABC protomers, which contains mononuclear and dinuclear copper metal centres, and a third metal centre containing a metal ion whose identity in vivo is not certain PUBMED:15674245.


    The C subunit from Methylococcus capsulatus (Bath) resides primarily in the membrane and consists of five transmembrane helices. Several conserved residues contribute to a metal-binding centre PUBMED:15674245.

    \ 3744 IPR000395 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.


    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.


    This group of metallopeptidases belongs to MEROPS peptidase family M27 (clan MA(E)). A number of the proteins have been classified as non-peptidase homologues, either because they have been found experimentally to be without peptidase activity or because they lack amino acid residues that are believed to be essential for the catalytic activity of peptidases in the family.


    There are seven antigenically distinct forms of botulinum neurotoxin, designated A, B, C1, D, E, F and G. The seven neurotoxins are potent protein toxins that inhibit neurotransmitter release from peripheral cholinergic synapses PUBMED:2160960. On binding to the neuronal synapses, the molecules are internalised and move by retrograde transport up the axon into the spinal cord, where they can move between post- and presynaptic neurons. The toxin inhibits neurotransmitter release by acting as a zinc endopeptidase that cleaves synaptic proteins such as synaptobrevins, syntaxin and SNAP-25 PUBMED:8897436. The protein toxins exist as disulphide-linked heterodimers of light and heavy chains. The light chain has the pharmacological activity, while the N- and C-termini of the heavy chain mediate channel formation and toxin binding PUBMED:2160960. The light chain exhibits a high level of sequence similarity to tetanus toxin (TeTx). Alignment of all characterised neurotoxin sequences reveals the presence of highly conserved amino acid domains interspersed with amino acid tracts with little overall similarity. The most divergent region corresponds to the C-terminal extremity of each toxin, which may reflect differences in specificity of binding to neurone acceptor sites PUBMED:1541280.

    \ 764 IPR002638 \ Quinolinate phosphoribosyl transferase (QPRTase) or nicotinate-nucleotide pyrophosphorylase is involved in the de novo synthesis of NAD in both prokaryotes and eukaryotes. It catalyses the reaction of quinolinic acid with 5-phosphoribosyl-1-pyrophosphate (PRPP) in the presence of Mg2+ to give rise to nicotinic acid mononucleotide (NaMN), pyrophosphate and carbon dioxide PUBMED:9016724, PUBMED:8561507. Unlike , this domain also includes the molybdenum transport system protein ModD.\ 1614 IPR013092 \

    The connexins are a family of integral membrane proteins that oligomerise to form intercellular channels that are clustered at gap junctions. These channels are specialised sites of cell-cell contact that allow the passage of ions, intracellular metabolites and messenger molecules (with molecular weight less than 1-2 kD) from the cytoplasm of one cell to its opposing neighbours. They are found in almost all vertebrate cell types, and somewhat similar proteins have been cloned from plant species. Invertebrates utilise a different family of molecules, innexins, that share a similar predicted secondary structure to the vertebrate connexins, but have no sequence identity to them PUBMED:9769729.


    Vertebrate gap junction channels are thought to participate in diverse biological functions. For instance, in the heart they permit the rapid cell-cell transfer of action potentials, ensuring coordinated contraction of the cardiomyocytes. They are also responsible for neurotransmission at specialised 'electrical' synapses. In non-excitable tissues, such as the liver, they may allow metabolic cooperation between cells. In the brain, glial cells are extensively coupled by gap junctions; this allows waves of intracellular Ca2+ to propagate through nervous tissue, and may contribute to their ability to spatially buffer local changes in extracellular K+ concentration PUBMED:7685944.


    The connexin protein family is encoded by at least 13 genes in rodents, with many homologues cloned from other species. They show overlapping tissue expression patterns, most tissues expressing more than one connexin type. Their conductances, permeability to different molecules, phosphorylation and voltage-dependence of their gating, have been found to vary. Possible communication diversity is increased further by the fact that gap junctions may be formed by the association of different connexin isoforms from apposing cells. However, in vitro studies have shown that not all possible combinations of connexins produce active channels PUBMED:8811187, PUBMED:8608591.


    Hydropathy analysis predicts that all cloned connexins share a common transmembrane (TM) topology. Each connexin is thought to contain 4 TM domains, with two extracellular and three cytoplasmic regions. This model has been validated for several of the family members by in vitro biochemical analysis. Both N- and C-termini are thought to face the cytoplasm, and the third TM domain has an amphipathic character, suggesting that it contributes to the lining of the formed channel. Amino acid sequence identity between the isoforms is ~50-80%, with the TM domains being well conserved. Both extracellular loops contain characteristically conserved cysteine residues, which likely form intramolecular disulphide bonds. By contrast, the single putative intracellular loop (between TM domains 2 and 3) and the cytoplasmic C-terminus are highly variable among the family members. Six connexins are thought to associate to form a hemi-channel, or connexon. Two connexons then interact (likely via the extracellular loops of their connexins) to form the complete gap junction channel.

           NH2-***        ***        *************-COOH
                 **     **   **      **
                 **    **     **    **   Cytoplasmic
              ---**----**-----**----**----------------
                 **    **     **    **   Membrane
                 **    **     **    **
              ---**----**-----**----**----------------
                 **    **     **    **   Extracellular
                  **  **       **  **
                    **           **
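    The topology sketched above, together with the connexon stoichiometry, can be encoded as a small data structure. This is a minimal sketch. The segment labels (EL1, IL, EL2) and the helper names are our own, and the model only restates the predicted topology described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    location: str  # "cytoplasmic", "membrane" or "extracellular"


# Predicted connexin topology: cytoplasmic N terminus, four TM domains,
# two extracellular loops, one intracellular loop, cytoplasmic C terminus.
CONNEXIN_TOPOLOGY = [
    Segment("N-terminus", "cytoplasmic"),
    Segment("TM1", "membrane"),
    Segment("EL1", "extracellular"),
    Segment("TM2", "membrane"),
    Segment("IL", "cytoplasmic"),
    Segment("TM3", "membrane"),  # amphipathic; thought to line the channel
    Segment("EL2", "extracellular"),
    Segment("TM4", "membrane"),
    Segment("C-terminus", "cytoplasmic"),
]

CONNEXINS_PER_CONNEXON = 6  # six connexins form a hemi-channel (connexon)
CONNEXONS_PER_CHANNEL = 2   # two docked connexons form the complete channel


def count_segments(location: str) -> int:
    """Number of topology segments at the given location."""
    return sum(1 for s in CONNEXIN_TOPOLOGY if s.location == location)


def sides_alternate() -> bool:
    """Check that every TM segment separates opposite faces of the membrane."""
    triples = zip(CONNEXIN_TOPOLOGY, CONNEXIN_TOPOLOGY[1:], CONNEXIN_TOPOLOGY[2:])
    for prev, seg, nxt in triples:
        if seg.location == "membrane" and {prev.location, nxt.location} != {
            "cytoplasmic",
            "extracellular",
        }:
            return False
    return True


connexins_per_channel = CONNEXINS_PER_CONNEXON * CONNEXONS_PER_CHANNEL
```

    The counts reproduce the text: four TM domains, two extracellular regions and three cytoplasmic regions. Multiplying the two stoichiometries implies twelve connexins per complete channel.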
    

    Two sets of nomenclature have been used to identify the connexins. The first, and most commonly used, classifies the connexin molecules according to molecular weight, such as connexin43 (abbreviated to Cx43), indicating a connexin of molecular weight close to 43 kD. However, studies have revealed cases where clear functional homologues exist across species that have quite different molecular masses; therefore, an alternative nomenclature was proposed based on evolutionary considerations, which divides the family into two major subclasses, alpha and beta, each with a number of members PUBMED:1320430. Due to their ubiquity and overlapping tissue distributions, it has proved difficult to elucidate the functions of individual connexin isoforms. To circumvent this problem, particular connexin-encoding genes have been subjected to targeted disruption in mice, and the phenotype of the resulting animals investigated. Around half the connexin isoforms have been investigated in this manner PUBMED:9861669. Further insight into the functional roles of connexins has come from the discovery that a number of human diseases are caused by mutations in connexin genes. For instance, mutations in Cx32 give rise to a form of inherited peripheral neuropathy called X-linked dominant Charcot-Marie-Tooth disease PUBMED:7570999. Similarly, mutations in Cx26 are responsible for both autosomal recessive and dominant forms of nonsyndromic deafness, a disorder characterised by hearing loss, with no apparent effects on other organ systems.
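    The molecular-weight nomenclature can be decoded mechanically. The sketch below is hypothetical: `approx_mass_kd` is our own helper, and allowing an optional decimal part in the name is an assumption about name formats. The number gives only an approximate mass. As noted above, functional homologues in different species can carry quite different numbers, so the name alone says nothing about orthology.

```python
import re

# "Cx43" denotes a connexin of molecular weight close to 43 kD.
CX_NAME = re.compile(r"^Cx(\d+(?:\.\d+)?)$")


def approx_mass_kd(name: str) -> float:
    """Approximate molecular weight (kD) implied by a Cx-style name."""
    m = CX_NAME.match(name)
    if m is None:
        raise ValueError(f"not a Cx-style connexin name: {name!r}")
    return float(m.group(1))
```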

    \ \

    This domain is found in the N-terminal region of these proteins.

    \ \ \ \ \ \ 5374 IPR008839 \ Members of this family are mitochondrial inner membrane proteins with a role in inner mitochondrial membrane organisation and biogenesis PUBMED:12591915.\ 4988 IPR003359 \

    Photosystem I (PSI) is a large protein complex embedded within the photosynthetic thylakoid membrane. It consists of 11 subunits, ~100 chlorophyll a molecules, 2 phylloquinones, and 3 Fe4S4-clusters. The three dimensional structure of the PSI complex has been resolved at 2.5 Å PUBMED:11418848, which allows the precise localisation of each cofactor. PSI together with photosystem II (PSII) catalyses the light-induced steps in oxygenic photosynthesis - a process found in cyanobacteria, eukaryotic algae (e.g. red algae, green algae) and higher plants.

    \

    To date, three thylakoid proteins involved in the stable accumulation of PSI have been identified: BtpA () PUBMED:9045660, Ycf3 PUBMED:9321389, PUBMED:9314531, and Ycf4 PUBMED:9321389. Because translation of the psaA and psaB mRNAs, which encode the two PSI reaction centre polypeptides, is not affected in mutant strains lacking functional ycf3 and ycf4, the products of these two genes appear to act at a post-translational step of PSI biosynthesis.\ These gene products are therefore involved either in the stabilisation or in the assembly of the PSI complex. However, their exact roles remain unknown. The BtpA protein appears to act at the level of PSI stabilisation PUBMED:10806238. It is an extrinsic membrane protein located on the cytoplasmic side of the thylakoid membrane PUBMED:10103064, PUBMED:10806238. Homologs of BtpA are found in the crenarchaeota and euryarchaeota, where their function remains unknown. The Ycf4 protein is firmly associated with the thylakoid membrane, presumably through a transmembrane domain PUBMED:9321389. Ycf4 co-fractionates with a protein complex larger than PSI upon sucrose density gradient centrifugation of solubilised thylakoids PUBMED:9321389. The Ycf3 protein is loosely associated with the thylakoid membrane and can be released from the membrane with sodium carbonate. This suggests that Ycf3 is not part of a stable complex and that it probably interacts transiently with its partners PUBMED:11752384. Ycf3 contains a number of tetratricopeptide repeats (TPR, ); TPR is a structural motif present in a wide range of proteins that mediates protein-protein interactions.

    \ \ 4232 IPR001210 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped\ in this family of ribosomal proteins, S17e. They include vertebrate, Drosophila and\ Neurospora crassa (crp-3) S17 proteins as well as yeast S17a (RP51A) and S17b (RP51B) and\ archaebacterial S17e PUBMED:3240863, PUBMED:2507396, PUBMED:6092944.

    \ 6545 IPR009592 \

    This family consists of several hypothetical bacterial proteins of around 335 residues in length. Members of this family are found exclusively in Escherichia coli and Salmonella species and are often referred to as YggM proteins. The function of this family is unknown.

    \ 5529 IPR008545 \ This family consists of several plant proteins of unknown function. Several sequences in this family are described as being myosin heavy chain-like.\ 7684 IPR012867 \

    This entry contains hypothetical proteins expressed by either bacterial or archaeal species. Some of these are annotated as being transmembrane proteins, and many contain a high proportion of hydrophobic residues.

    \ 3461 IPR004770 \ A single member of the NhaC family, a protein from Bacillus firmus, has been functionally characterized. It is involved in pH homeostasis and sodium extrusion. Members of the NhaC family are found in both Gram-negative bacteria and Gram-positive bacteria.\ 7872 IPR012575 \

    This family consists of the MNLL subunits of the NADH-ubiquinone oxidoreductase complex. NADH-ubiquinone oxidoreductase is involved in the transfer of electrons from NADH to the electron transport chain. This oxidation of NADH is coupled to proton transfer across the membrane, generating a proton motive force that is utilised for the synthesis of ATP PUBMED:15581635. The MNLL subunit is one of the many subunits found in the complex and contains a mitochondrial import sequence. However, its role is unclear PUBMED:12644575.

    \ 1538 IPR000604 \ The major outer membrane protein of Chlamydia contains four symmetrically spaced variable domains (VDs I\ to IV). This protein maintains the structural rigidity of the outer membrane and facilitates porin formation,\ permitting diffusion of solutes through the intracellular reticulate body membrane. It is believed to play a role\ in pathogenesis and possibly adhesion. Along with the lipopolysaccharide, the major outer membrane protein\ (MOMP) makes up the surface of the elementary body cell. Disulphide bond interactions within and between\ MOMP molecules and other components form high molecular weight oligomers. The MOMP is the protein used\ to determine the different serotypes.\ 1678 IPR000247 \ Cucumoviruses are tripartite RNA plant viruses believed to share a close\ evolutionary relationship with brome mosaic viruses. The cucumoviruses\ include cucumber mosaic virus PUBMED:2230731, peanut stunt virus PUBMED:1926787 and tomato aspermy virus PUBMED:1990057. The viral coat proteins show a high degree of sequence\ similarity PUBMED:2230731.\ 2472 IPR001698 \

    The actin filament system, a prominent part of the cytoskeleton in eukaryotic cells, is both a static structure and a dynamic network that can undergo rearrangements: it is thought to be involved in processes such as cell movement and phagocytosis PUBMED:2341404, as well as muscle contraction.

    \

    The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed end) thereby blocking the exchange of subunits at these ends. Unlike gelsolin (see ) and severin this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. Neither of the subunits shows sequence similarity to other filament-capping proteins PUBMED:2341404.

    \

    The beta subunit is a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic species PUBMED:2179733.

    \ 7777 IPR012879 \

    The members of this family are all hypothetical eukaryotic proteins of unknown function. One member () is described as being an adipocyte-specific protein, but no evidence of this was found.

    \ 2094 IPR007366 \ This is an archaeal protein of unknown function.\ 558 IPR002482 \ This domain is about 40 residues long and is found in a variety\ of enzymes involved in bacterial cell wall degradation PUBMED:1352512. This\ domain may have a general peptidoglycan binding function.\ 6577 IPR009606 \

    This family contains hypothetical plant proteins of unknown function. Family members contain a number of conserved cysteine residues.

    \ 8062 IPR013168 \

    This domain was originally found in the C-terminal moiety of the Cpl-7 lysozyme encoded by the Streptococcus pneumoniae bacteriophage Cp-7 (). It is assumed that this domain represents a cell wall binding motif although no direct evidence has been obtained so far to support this.

    \ 5628 IPR008634 \ This family consists of archaeal GvpO proteins which are required for gas vesicle synthesis PUBMED:8606186. The family also contains two related sequences from Streptomyces coelicolor.\ 1660 IPR000587 \ Creatinase or creatine amidinohydrolase () catalyzes the conversion of creatine and water to sarcosine\ and urea. The enzyme works as a homodimer, and is induced by choline chloride. Each monomer of creatinase\ has two clearly defined domains, a small N-terminal domain, and a large C-terminal domain. Each of the two active\ sites is made by residues of the large domain of one monomer and some residues of the small domain of the other\ monomer.\ \ 1312 IPR000119 \

    Bacteria synthesize a set of small, usually basic proteins of about 90\ residues that bind DNA and are known as histone-like proteins PUBMED:3118156,\ PUBMED:3047111. Examples include the HU protein, which in Escherichia coli is a dimer of closely related alpha and beta chains and in other bacteria can be a dimer of identical chains. HU-type proteins have been found in a variety of eubacteria, cyanobacteria and archaebacteria, and are also encoded in the chloroplast genome of some algae PUBMED:1961745. The integration host factor (IHF), a dimer of closely related chains that seems to function in genetic recombination as well as in translational and transcriptional control PUBMED:2972385, is found in enterobacteria. Viral members include the African swine fever virus protein A104R (or LMW5-AR) PUBMED:8464748.

    The exact\ function of these proteins is not yet clear, but they are capable of wrapping\ DNA and protecting it against denaturation under extreme environmental\ conditions. The structure is known for one of these proteins PUBMED:6540370. The protein exists as a dimer, and two "beta-arms" function as the non-specific \ binding site for bacterial DNA.

    \ 5956 IPR010366 \

    This family consists of the Shigella flexneri specific protein OspC. The function of this family is unknown but it is thought that Osp proteins may be involved in postinvasion events related to virulence. Since bacterial pathogens adapt to multiple environments during the course of infecting a host, it has been proposed that Shigella evolved a mechanism to take advantage of a unique intracellular cue, which is mediated through MxiE, to express proteins when the organism reaches the eukaryotic cytosol PUBMED:12142411.

    \ 7118 IPR010840 \

    This family consists of several bacterial proteins of around 210 residues in length. The function of this family is unknown.

    \ 6391 IPR009503 \

    This family consists of several short Lactococcus lactis and bacteriophage proteins. The function of this family is unknown.

    \ 3877 IPR001573 \

    Cell signalling mediated via GPCRs (G-protein-coupled receptors) involves the assembly of receptors, G-proteins, effectors and downstream elements into complexes that approach in design 'solid-state' signalling devices. Scaffold molecules, such as the AKAPs (A-kinase anchoring proteins), were discovered more than a decade ago and represent dynamic platforms, enabling multivalent signalling PUBMED:12546660. This family of functionally related proteins is classified on the basis of their ability to associate with the PKA holoenzyme inside cells. A shared property of most, if not all, AKAPs is the ability to form multivalent signal transduction complexes. \ \

    Each anchoring protein contains at least two functional motifs PUBMED:8968497. The conserved PKA binding motif forms an amphipathic helix of 14-18 residues that interacts with hydrophobic determinants located in the extreme N-terminus of the regulatory subunit dimer. The subcellular address of each AKAP is encoded by a unique targeting motif. Gravin, an autoantigen recognised by serum from myasthenia gravis patients, contains 3 repeats of this domain PUBMED:9000000.

    \ 2303 IPR007731 \ This is a family of phage proteins of unknown function.\ 2980 IPR006493 \

    This family is represented by BlyA, a small holin found on Borrelia circular plasmids that have proved to be temperate phages PUBMED:11073925. This protein was previously proposed to be a hemolysin. BlyA is small (67 residues) and contains two largely hydrophobic helices and a highly charged C terminus.

    \ 3619 IPR000440 \ This family contains chain 3 of the NADH-ubiquinone / plastoquinone oxidoreductase\ , which catalyses the following reactions:\ \ \ 4797 IPR004340 \

    Herpes simplex virus type 1 (HSV1) DNA replication in host cells is known to be mediated by seven viral-encoded proteins,\ three of which form a heterotrimeric DNA helicase-primase complex. This complex consists of UL5, UL8, and UL52\ subunits. Heterodimers consisting of UL5 and UL52 have been shown to retain both helicase and primase activities.\ Nevertheless, UL8 is still essential for replication: though it lacks any DNA binding or catalytic activities, it is involved in\ the transport of UL5-UL52 and it also interacts with other replication proteins.

    \

    The molecular mechanisms of the\ UL5-UL52 catalytic activities are not known. While UL5 is associated with DNA helicase activity and UL52 with DNA\ primase activity, the helicase activity requires the interaction of UL5 and UL52 PUBMED:10501495, PUBMED:11278618. It is not known whether the primase\ activity can be maintained by UL52 alone. The biological significance of the UL52-UL8\ interaction is not known. Yeast two-hybrid analysis together with immunoprecipitation experiments has shown that the\ HSV1 UL52 region between residues 366-914 is essential for this interaction, while the first 349 N-terminal residues are\ dispensable PUBMED:10501495.

    \

    This family also includes protein UL70 of cytomegalovirus (CMV, a subgroup of the Herpesviridae)\ strains; by analogy with UL52, UL70 is thought to have DNA primase activity. Indeed, CMV\ strains also possess a DNA helicase-primase complex, the other subunits being protein UL105 (with known similarity to\ HSV1 UL5) and protein UL102.

    \ 7818 IPR012533 \

    GLE1 is an essential nuclear export factor involved in RNA export.

    \ 7648 IPR012476 \

    The members of this family are sequences that are similar to the human protein GLE1 (). This protein is localised at the nuclear pore complexes and functions in poly(A)+ RNA export to the cytoplasm PUBMED:9618489.

    \ 1575 IPR001907 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities\ in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin\ and carboxypeptidase C clans have a catalytic triad of serine, aspartate and\ histidine in common: serine acts as a nucleophile, aspartate as an\ electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of\ the catalytic residues are similar between families, despite different\ protein folds PUBMED:7845208. The linear arrangements of the catalytic residues\ commonly reflect clan relationships. For example the catalytic triad in\ the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the\ subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belongs to the MEROPS peptidase family S14 (ClpP endopeptidase family, clan SK). ClpP is the catalytic subunit of an ATP-dependent protease that cleaves a number of proteins, such as casein and albumin PUBMED:2197275. The protease is a complex of ATP-binding regulatory A subunits and catalytic P subunits, both of which are required for effective levels of protease activity in the presence of\ ATP PUBMED:2197275, although the P subunit alone does possess some catalytic activity. This family of sequences represents the P subunit.

    \ \

    Proteases highly similar to ClpP have been found to be encoded in the genomes\ of bacteria, metazoa and some viruses, and in the chloroplast genome of plants. A number of the proteins in this family are classified as non-peptidase homologues because they have either been found experimentally to lack peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.\

    \ 7983 IPR012968 \

    This domain is present in proteins of the Ferlin family. It is often located between two C2 domains PUBMED:15112237.

    \ 3237 IPR007300 \ The two products of the lrgAB operon are potential membrane proteins, and LrgA and LrgB are both thought to control murein hydrolase activity and penicillin tolerance PUBMED:10714982.\ 6313 IPR009472 \

    This family consists of several hypothetical proteins of unknown function all from photosynthetic organisms including plants and cyanobacteria.

    \ 3478 IPR005623 \ This is an uncharacterized protein involved in the formation of periplasmic nitrate reductase.\ 5672 IPR008898 \ This family consists of several bacterial YopD-like proteins. Virulent Yersinia species harbour a common plasmid that encodes essential virulence determinants (Yersinia outer proteins [Yops]), which are regulated by the extracellular stimuli Ca2+ and temperature. YopD is thought to be a transmembrane protein and contains an amphipathic alpha-helix in its carboxy terminus PUBMED:8418066.\ 64 IPR011022 \

    G protein-coupled receptors are a large family of signalling molecules that respond to a wide variety of extracellular stimuli. The receptors relay the information encoded by the\ ligand through the activation of heterotrimeric G proteins and intracellular effector molecules. To ensure the appropriate regulation of the signalling cascade, it is vital to properly\ inactivate the receptor. This inactivation is achieved, in part, by the binding of a soluble protein, arrestin, which uncouples the receptor from the downstream G protein after the receptor is phosphorylated by G\ protein-coupled receptor kinases. In\ addition to the inactivation of G protein-coupled receptors, arrestins have also been implicated in the endocytosis of receptors and cross talk with other signalling pathways. Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with \ photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin\ PUBMED:15335861. The protein binds calcium, and shows similarity in its C-terminus to alpha-transducin and\ other purine nucleotide-binding proteins. In mammals, arrestin is associated with autoimmune uveitis.

    \ Arrestins comprise a family of closely-related proteins that includes beta-arrestin-1 and -2, which regulate\ the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity\ to activate G(S) proteins. The crystal structure of bovine retinal arrestin comprises two domains of antiparallel beta-sheets connected through a hinge\ region and one short alpha-helix on the back of the amino-terminal fold PUBMED:9495348. The binding region for phosphorylated light-activated rhodopsin is\ located at the N-terminal domain, as indicated by the docking of the photoreceptor to the three-dimensional structure of arrestin. The C-terminal domain is a sandwich formed by several beta-sheets.

    \ 6501 IPR009559 \

    This family consists of several Lactococcus lactis bacteriophage major capsid proteins.

    \ 4702 IPR004252 \ Some members of this family are putative plant transposon proteins of unknown function. Some may be similar to leucine zipper transcription factors.\ 5293 IPR008639 \ This family consists of Halobacterium gas vesicle protein C sequences which are thought to confer stability to the gas vesicle membranes PUBMED:1404376,PUBMED:8763925.\ 5471 IPR008517 \ This family consists of several bacterial proteins of unknown function. Some of the family members are described as putative lipoproteins.\ 6719 IPR009678 \

    This family consists of P2 phage tail completion protein R (GpR) like sequences. GpR is thought to be a tail completion protein which is essential for stable head joining PUBMED:8178426.

    \ 3085 IPR001037 \

    Integrase comprises three domains capable of folding independently and whose three-dimensional structures are known. However, the manner in which the N-terminal, catalytic core, and C-terminal domains interact in the holoenzyme remains obscure. Numerous studies indicate that the enzyme functions as a multimer, minimally a dimer. The integrase proteins from HIV-1 and ASV have been studied most carefully with respect to the structural basis of catalysis. Although the active site of ASV integrase does not undergo significant conformational changes on binding the required metal cofactor, that of HIV-1 integrase does. This active site-mediated conformational change in HIV-1 reorganizes the catalytic core and C-terminal domains and appears to promote an interaction that is favourable for catalysis PUBMED:10384242.

    \

    Retroviral integrase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. Retroviral integrase-related gene sequences are also known to be present in eukaryotes. Bacterial transposases involved in the transposition of insertion sequences also belong to this group.

    \

    HIV integrase catalyses the incorporation of virally derived DNA into the human genome. This unique step in the virus life cycle provides a variety of points for intervention and hence is an attractive target for the development of new therapeutics for the treatment of AIDS PUBMED:9161051. Substrate recognition by the retroviral integrase enzyme is critical for retroviral integration. To catalyze this recombination event, integrase must recognize and act on two types of substrates, viral DNA and host DNA, yet the necessary interactions exhibit markedly different degrees of specificity PUBMED:10384243.

    \ 4710 IPR002492 \ Transposase proteins are necessary for efficient DNA transposition.\ This family includes the amino-terminal region of Tc1, Tc1A, Tc1B and Tc2B transposases of Caenorhabditis elegans. The region encompasses the specific DNA binding and second DNA recognition domains as well as an amino-terminal region of the catalytic domain of Tc3 as described in PUBMED:9312061. Tc3 is a member of the Tc1/mariner family of transposable elements.\ 1244 IPR003348 \ This ATPase is involved in the removal of arsenite, antimonite and arsenate from the cell. \

    In Escherichia coli an anion-translocating ATPase has been identified as the product of the arsenical resistance operon of resistance plasmid R773. This ATP-driven oxyanion pump catalyses extrusion of the oxyanions arsenite, antimonite and arsenate. Maintenance of a low intracellular concentration of oxyanion produces resistance to the toxic agents. The pump is composed of two polypeptides, the products of the arsA and arsB genes. This two-subunit enzyme produces resistance to arsenite and antimonite. A third gene, arsC, expands the\ substrate specificity to allow for arsenate pumping and resistance PUBMED:1704144.

    \

    The ArsA and ArsB proteins form a membrane-bound pump that functions as an oxyanion-translocating ATPase. The ArsC protein is an arsenate reductase that reduces arsenate to arsenite, which is subsequently pumped out of the cell PUBMED:7629056.

    \ 6021 IPR010396 \

    This family consists of several Orthopoxvirus A5L proteins. The vaccinia virus WR A5L open reading frame (corresponding to open reading frame A4L in vaccinia virus Copenhagen) encodes an immunodominant late protein found in the core of the vaccinia virion. The A5 protein appears to be required for the immature virion to form the brick-shaped intracellular mature virion PUBMED:10233918.

    \ 6693 IPR009664 \

    This family consists of several conserved hypothetical bacterial proteins of around 95 residues in length. The function of this family is unknown.

    \ 5617 IPR008416 \ This family consists of several VP1054 proteins from the Baculoviruses. VP1054 is a virus structural protein required for nucleocapsid assembly PUBMED:9188569.\ 2302 IPR007700 \ This is a family of uncharacterised plant proteins of unknown function.\ 3273 IPR003428 \ This mitochondrial matrix protein family contains members of the MAM33 family which bind to the globular 'heads' of C1Q.\ 4993 IPR000420 \ A number of yeast cell wall glycoproteins are characterized by the presence of\ tandem repeats of a region of 18 to 19 residues PUBMED:8322511, PUBMED:9301021.\ 8069 IPR013228 \

    This domain is found C terminal to the PE () and PPE () domains. The secondary structure of this domain is predicted to be a mixture of alpha helices and beta strands PUBMED:12711809.

    \ 7319 IPR011107 \

    These proteins include Ypi1, a novel Saccharomyces cerevisiae type 1 protein phosphatase inhibitor PUBMED:14506263 and ppp1r11/hcgv (), annotated as having protein phosphatase inhibitor activity PUBMED:8781118.

    \ 5873 IPR010325 \

    Rhamnogalacturonate lyase degrades the rhamnogalacturonan I (RG-I) backbone of pectin PUBMED:12591882. This family contains mainly plant members, but also includes a member from the plant pathogen Erwinia chrysanthemi.

    \ 6254 IPR010493 \

    The N-terminal domain of serine acetyltransferase has a sequence that is conserved in plants PUBMED:7608200 and bacteria PUBMED:7608200.

    \ 1667 IPR003751 \

    The RNA-binding protein CsrA (carbon storage regulator) is a new kind of global regulator, which facilitates specific mRNA decay PUBMED:9211896. CsrA is entirely contained within a globular complex of approximately 18 CsrA-H6 subunits and a single RNA, CsrB. CsrA binds to the CsrB RNA molecule to form the Csr regulatory system, which has a strong negative regulatory effect on glycogen biosynthesis, gluconeogenesis and glycogen catabolism and a positive regulatory effect on glycolysis PUBMED:9211896.

    \ 6778 IPR009712 \

    This family consists of several bacterial and phage proteins of around 115 residues in length. The function of this family is unknown.

    \ 5536 IPR008548 \ This family contains the G6 protein from Vaccinia virus (strain Copenhagen) and related proteins from other Orthopoxvirus. The proteins are uncharacterized.\ 2241 IPR006533 \

    These sequences represent the Vgr family of proteins, associated with some classes of Rhs elements. This model does not include a large octapeptide repeat region, VGXXXXXX, found in the Vgr of Rhs classes G and E PUBMED:9696756.

    \ 854 IPR007009 \ This conserved region identifies a set of hypothetical protein sequences from the Metazoa and Ascomycota which include SHQ1 from Saccharomyces cerevisiae.\ 99 IPR004328 \ This functionally uncharacterised domain is found in a number of signal transduction proteins.\ 8032 IPR013232 \

    Gene 1.1 in Bacteriophage T7 encodes a 42 amino acid protein, rich in basic amino acids suggesting its interaction with nucleic acids PUBMED:6254001. Many homologues are present in different T7 and T3-like bacteriophage.

    \ 5405 IPR008758 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities\ in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin\ and carboxypeptidase C clans have a catalytic triad of serine, aspartate and\ histidine in common: serine acts as a nucleophile, aspartate as an\ electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of\ the catalytic residues are similar between families, despite different\ protein folds PUBMED:7845208. The linear arrangements of the catalytic residues\ commonly reflect clan relationships. For example the catalytic triad in\ the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the\ subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S28 (clan SC). The predicted active site residues for members of this family and family S10 occur in the same order in the sequence: S, D, H.

    These serine proteases include several eukaryotic enzymes such as lysosomal Pro-X carboxypeptidase, dipeptidyl-peptidase II, and thymus-specific serine peptidase PUBMED:10527559, PUBMED:11003393, PUBMED:11139392, PUBMED:11173530.

    \ 1053 IPR003002 \

    This large family of proteins is related to . They are seven-transmembrane receptors. This family does not include all known members, as there are problems with overlapping specificity with . This family is very divergent and is greatly expanded in Caenorhabditis elegans (nematode worm) PUBMED:9582190.

    \ 7625 IPR012436 \

    This family contains sequences derived from a group of hypothetical proteins expressed by Arabidopsis thaliana. These sequences are highly similar and the region concerned is about 100 residues long.

    \ 7717 IPR012413 \

    The sequences found in this family are similar to the BA14K proteins expressed by Brucella abortus () and by Brucella suis (). BA14K was found to be strongly immunoreactive; it induces both humoral and cellular responses in hosts throughout the infective process PUBMED:9673296.

    \ 7939 IPR012515 \

    This family consists of the pleurocidin family of antimicrobial peptides. Pleurocidins are found in the skin mucous secretions of the winter flounder (Pleuronectes americanus) and these peptides exhibit antimicrobial activity against Escherichia coli. Pleurocidin is predicted to assume an amphipathic alpha-helical conformation similar to other linear antimicrobial peptides and may play a role in innate host defence PUBMED:9115266.

    \ 1396 IPR003497 \ This entry represents the N-terminus of baculovirus BRO and ALI motif proteins. The function of BRO proteins is unknown. It has been suggested that BRO-A and BRO-C are DNA-binding proteins that influence host DNA replication and/or transcription PUBMED:10888617. This Pfam domain does not include the characteristic invariant alanine, leucine, isoleucine motif of the ALI proteins PUBMED:9847359.\ 4525 IPR000956 \ Stathmin is a ubiquitous phosphorylated protein thought to act as an intracellular relay for diverse regulatory pathways PUBMED:2358074, functioning through a variety of second messengers. Its phosphorylation and gene expression are regulated throughout development PUBMED:8344928 and in response to extracellular signals regulating cell proliferation, differentiation and function PUBMED:2745432. Stathmin and the related proteins SCG10 and XB3 contain an N-terminal domain (XB3 contains an additional N-terminal hydrophobic region), a 78-amino-acid coiled-coil region, and a short C-terminal domain.\ 1261 IPR003190 \ Decarboxylation of aspartate is the major route of beta-alanine production in bacteria, and is catalysed by the enzyme aspartate decarboxylase. The enzyme is translated as an inactive proenzyme that is processed into two chains, A and B. This family contains both chains of aspartate decarboxylase.\ 6374 IPR010542 \

    This domain represents the C-terminal region of vertebrate heat shock transcription factors. Heat shock transcription factors regulate the expression of heat shock proteins - a set of proteins that protect the cell from damage caused by stress and aid the cell's recovery after the removal of stress PUBMED:11509572. This C-terminal region is found with the N-terminal , and may contain a three-stranded coiled-coil trimerisation domain and a CE2 regulatory region, the latter of which is involved in sustained heat shock response PUBMED:11509572.

    \ 3510 IPR005117 \

    Sulfite reductases (SiRs) and related nitrite reductases (NiRs) catalyse the six-electron reduction reactions of sulfite to sulfide, and nitrite to ammonia, respectively. The Escherichia coli SiR enzyme is a complex composed of two proteins, a flavoprotein alpha-component (SiR-FP) and a hemoprotein beta-component (SiR-HP), and has an alpha(8)beta(4) quaternary structure PUBMED:10984484. SiR-FP contains both FAD and FMN, while SiR-HP contains a Fe(4)S(4) cluster coupled to a siroheme through a cysteine bridge. Electrons are transferred from NADPH to FAD, and on to FMN in SiR-FP, from which they are transferred to the metal centre of SiR-HP, where they reduce the siroheme-bound sulfite.
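
    The "six-electron" figure follows from the change in formal oxidation state of the central atom. As a worked check, assuming the usual conventions of O = -2 and H = +1:

```latex
\begin{aligned}
\mathrm{SO_3^{2-}}:&\quad x_{\mathrm{S}} + 3(-2) = -2 \;\Rightarrow\; x_{\mathrm{S}} = +4\\
\mathrm{S^{2-}}:&\quad x_{\mathrm{S}} = -2\\
\Delta_{\mathrm{S}} &= (+4) - (-2) = 6\ e^{-}\\[4pt]
\mathrm{NO_2^{-}}:&\quad x_{\mathrm{N}} + 2(-2) = -1 \;\Rightarrow\; x_{\mathrm{N}} = +3\\
\mathrm{NH_3}:&\quad x_{\mathrm{N}} + 3(+1) = 0 \;\Rightarrow\; x_{\mathrm{N}} = -3\\
\Delta_{\mathrm{N}} &= (+3) - (-3) = 6\ e^{-}
\end{aligned}
```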

    \

    SiR-HP has a two-fold symmetry, which generates a distinctive three-domain alpha/beta fold that controls assembly and reactivity PUBMED:7569952. This entry describes the ferredoxin-like (alpha/beta sandwich) domain, which consists of a duplication containing two subdomains of this fold.

    \ \ 1840 IPR002806 \ This archaebacterial protein family has no known function.\ 3056 IPR000724 \ This domain is found as a tandem repeat in streptococcal cell surface proteins, such as the IgG-binding proteins G and MIG. These proteins are type I membrane proteins that bind to the constant Fc region of IgG with high affinity. The N-terminus of MIG mediates binding to the plasma proteinase inhibitor alpha 2-macroglobulin after complex formation with proteases.\ 4222 IPR000244 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins grouped on the basis of sequence similarities PUBMED:, PUBMED:8306963.

    \

    The crystal structure of Bacillus stearothermophilus L9 shows that the 149-residue protein comprises two globular domains connected by a rigid linker PUBMED:12051860. Each domain contains an rRNA binding site, and the protein functions as a structural protein in the large subunit of the ribosome. The C-terminal domain consists of two loops, an alpha-helix and a three-stranded mixed parallel/anti-parallel beta-sheet packed against the central alpha-helix. The long central alpha-helix is exposed to solvent in the middle and participates in the hydrophobic cores of the two domains at both ends.

    \ 1415 IPR006771 \

    The pathogenic dimorphic fungal organism Blastomyces dermatitidis exists as a budding yeast at 37 degrees C and as a mycelium at 25 degrees C. Bys1 is expressed specifically in the high-temperature, unicellular yeast morphology and codes for a protein of 18.6 kDa that contains multiple putative phosphorylation sites, a hydrophobic N terminus, and two 34-amino-acid domains with similarly spaced nine-amino-acid degenerate repeating motifs PUBMED:11811639. The molecular function of this protein is not known.

    \ 4568 IPR006751 \ The general transcription factor TFIID consists of the TATA-binding protein (TBP) associated with a series of TBP-associated factors (TAFs) that together participate in the assembly of the transcription preinitiation complex. TAFII55 binds to TAFII250 and inhibits its acetyltransferase activity. The exact role of TAFII55 is currently unknown. The conserved region is situated towards the N terminus of the protein PUBMED:11592977.\ 7771 IPR012480 \

    This family features sequences that are similar to a region of the Flavobacterium heparinum proteins heparinase II () and heparinase III (). The former is known to degrade heparin and heparan sulphate, whereas the latter predominantly degrades heparan sulphate. Both are secreted into the periplasmic space upon induction with heparin PUBMED:8702264.

    \ 5643 IPR008562 \ This family consists of several baculovirus sequences of between 350 and 380 residues long. The family has no known function.\ 1685 IPR007253 \ This repeat is found in multiple tandem copies in proteins including amidase enhancers PUBMED:1356138 and adhesins PUBMED:11254569.\ 68 IPR001920 \

    Aspartate racemase () and glutamate racemase () are two evolutionarily related bacterial enzymes that do not seem to require a cofactor for their activity PUBMED:8385993. Glutamate racemase, which interconverts L-glutamate and D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics such as gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a hypothetical protein from Erwinia carotovora and one from Escherichia coli (ygeA).

    \

    Two conserved cysteines are present in the sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in proton abstraction from the substrate.

    \ 1371 IPR003343 \ This domain is found in a variety of bacterial and phage surface proteins such as intimins. Intimin is a bacterial cell-adhesion molecule that mediates the intimate bacterial host-cell interaction. It contains three domains, two immunoglobulin-like domains and a C-type lectin-like module, implying that carbohydrate recognition may be important in intimin-mediated cell adhesion PUBMED:10201396.\ 4277 IPR001529 \

    DNA-dependent RNA polymerases () are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-dependent RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. RNA synthesis begins after the attachment of RNA polymerase to a specific site, the promoter, on the template DNA strand, and continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise: RNA polymerase I synthesises the precursors of most ribosomal RNAs, RNA polymerase II synthesises mRNA precursors, and RNA polymerase III synthesises the precursors of 5S rRNA, the tRNAs and a variety of other small RNAs.

    Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits.

    \

    In archaebacteria, there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13 polypeptides. It has recently been shown PUBMED:8265347, PUBMED:8417319 that small subunits of about 15 kDa, found in polymerase types I and II, are highly conserved. These proteins contain a probable zinc finger in their N-terminal region and a C-terminal zinc ribbon domain (see ).

    \ 1170 IPR003164 \

    The AP2 adaptor is a heterotetramer that plays a central role in clathrin-mediated endocytosis by linking transmembrane receptors to be internalised to the clathrin lattice. During clathrin-mediated endocytosis, clathrin-coated vesicles are formed by pinching off a portion of the plasma membrane, along with its cargo molecules. The AP2 adaptor links the cargo to the clathrin coat, and can interact with proteins involved in the formation of the clathrin-coated vesicles. The alpha adaptor subunit can be divided into a trunk domain and the C-terminal appendage domain (or ear domain), separated by a linker region. The C-terminal appendage domain regulates translocation of endocytic accessory proteins to the bud site PUBMED:12057195.

    \ \ \ 7656 IPR012884 \

    The phage-encoded excisionase protein (Xis, ) is involved in excisive recombination by regulating the assembly of the excisive intasome and by inhibiting viral integration. It adopts an unusual winged-helix structure in which two alpha helices are packed against two extended strands. Also present in the structure is a two-stranded anti-parallel beta-sheet, whose strands are connected by a four-residue wing. During interaction with DNA, helix alpha2 is thought to insert into the major groove, while the wing contacts the adjacent minor groove or phosphodiester backbone. The C-terminal region of Xis is involved in interaction with phage-encoded integrase (Int), and a putative C-terminal alpha helix may fold upon interaction with Int and/or DNA PUBMED:12460578.

    \ 1960 IPR004884 \ This is a group of uncharacterised proteins of unknown function.\ 789 IPR003489 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family contains the sigma-54 modulation protein family and the S30AE family of ribosomal proteins which includes the light-repressed protein (lrtA) PUBMED:8063707.

    \ 724 IPR001024 \

    This domain, the PLAT (Polycystin-1, Lipoxygenase, Alpha-Toxin) domain or LH2 (Lipoxygenase homology) domain, is found in a variety of membrane- or lipid-associated proteins. It is present in lipoxygenases, iron-dependent enzymes involved at various steps in the biosynthesis of leukotrienes. The known structure of pancreatic lipase shows that this domain binds to procolipase, which mediates membrane association. This domain may mediate membrane attachment via other protein binding partners. The structure of this domain is known for many members of the family and is composed of a beta sandwich PUBMED:11412104.

    \ 2887 IPR002006 \ The core antigen of hepatitis viruses possesses a carboxyl terminus rich in arginine. On this basis it was predicted that the core antigen would bind DNA PUBMED:399329. There is some experimental evidence to support this PUBMED:2677399.\ 1611 IPR001135 \ NADH-ubiquinone oxidoreductase, chain 49 kDa () is the third largest subunit of complex I and is a component of the iron-sulphur (IP) fragment of the enzyme. The respiratory-chain NADH dehydrogenase (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane; it also seems to exist in the chloroplast and in cyanobacteria (as an NADH-plastoquinone oxidoreductase). The 49 kDa chain of NADH-ubiquinone oxidoreductase is one of the 25 to 30 polypeptide subunits of this bioenergetic enzyme complex.\ \

    A number of bacterial enzymes also belong to this family. They include NADH-ubiquinone oxidoreductase subunit D (gene nuoD), formate hydrogenlyase subunit 5 (gene hycE) and hydrogenase-4 subunit G (gene hyfG), all from Escherichia coli, as well as subunit NQO4 of NADH-ubiquinone oxidoreductase from Paracoccus denitrificans PUBMED:1445936, PUBMED:7690854.

    \ 7898 IPR012588 \

    This domain is found at the N terminus of 3'-5' exonucleases with HRDC domains, and also in putative exosome components PUBMED:15112237.

    \ 2908 IPR005030 \

    This is a family of viral latent proteins whose function is not fully understood. A role in transcriptional regulation has been suggested PUBMED:11024123.

    \ 3980 IPR007586 \ The 25 kDa product of Vaccinia virus gene L4R is also known as VP8. VP8 is found in the cores of Vaccinia virions and is essential for the formation of transcriptionally competent viral particles. It binds both single-stranded and double-stranded DNA and RNA with similar affinities. Binding is thought to involve cooperative interactions between protein subunits. The protein is proteolytically cleaved during viral assembly at an Ala-Gly-Ala site. Possible roles for VP8 include packaging and maintaining the DNA genome in a transcribable configuration; binding ssDNA during transcription initiation; and cooperation with I8R protein to unwind early promoter regions. VP8 may also function in either transcription elongation or release of mRNA molecules from viral particles PUBMED:9321647.\ 2638 IPR006701 \ Initiation of packaging of double-stranded viral DNA involves the specific interaction of the prohead with viral DNA in a process mediated by a phage-encoded terminase protein. The terminase enzymes are usually hetero-oligomers composed of a small and a large subunit. This region is found on the large subunit and possesses endonuclease and ATPase activities that require Mg2+ and a neutral or slightly basic pH. This region is also found in bacterial sequences PUBMED:10930407, PUBMED:1548711.\ 5819 IPR010298 \

    This family consists of several hypothetical bacterial proteins as well as some uncharacterised sequences from Arabidopsis thaliana. The function of this family is unknown.

    \ 6216 IPR009433 \

    This family consists of several membrane-associated protein VP24 sequences from a variety of Ebola and Marburg viruses. The VP24 protein of Ebola virus is believed to be a secondary matrix protein and a minor component of virions. VP24 possesses structural features commonly associated with viral matrix proteins and may have a role in virus assembly and budding PUBMED:12525613.

    \ 6955 IPR009810 \

    This family consists of several plant-specific late nodulin sequences which are homologous to the Pisum sativum (garden pea) ENOD3 protein. ENOD3 is expressed in the late stages of root nodule formation and contains two pairs of cysteine residues toward the protein's C terminus, which may be involved in metal binding PUBMED:2152123.

    \ 1687 IPR004153 \ This repeat contains the conserved pattern CXCXC, where X can be any amino acid. The repeat is found in up to five copies in vascular endothelial growth factor C PUBMED:8612600. In the salivary glands of the dipteran Chironomus tentans, a specific messenger ribonucleoprotein (mRNP) particle, the Balbiani ring (BR) granule, can be visualized during its assembly on the gene and during its nucleocytoplasmic transport. This repeat is found in over 70 copies in Balbiani ring protein 3 (). It is also found in some silk proteins PUBMED:9089085.\ 3594 IPR004060 \

    G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \

    The rhodopsin-like GPCRs themselves represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices PUBMED:2111655, PUBMED:2830256, PUBMED:8386361.

    \

    The hypothalamus plays a central role in the integrated control of feeding and energy homeostasis PUBMED:9491897. A new family of neuropeptides, the orexins, has been identified; orexins bind and activate two closely related, previously orphan, GPCRs PUBMED:9491897, PUBMED:9656726. Orexins stimulate appetite and food consumption PUBMED:9656726. Their genes are expressed bilaterally and symmetrically in the lateral hypothalamus, which has been shown to be the "feeding centre". By contrast, the "satiety centre" is located in the ventromedial hypothalamus and is dominated by the leptin-regulated neuropeptide network.

    \

    Both orexin receptors exhibit a similar pharmacology - the 2 orexin peptides, orexin-A and orexin-B, bind to both receptors and, in each case, agonist binding results in an increase in intracellular calcium levels. However, orexin-B shows a 10-fold selectivity for orexin receptor type 2, whilst orexin-A is equipotent at both receptors PUBMED:10498827.

    \ 4130 IPR000352 \ Peptide chain release factors (RFs) are required for the termination of protein biosynthesis PUBMED:8821264. At present two classes of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner PUBMED:2215213: RF-1 (gene prfA) mediates UAA- and UAG-dependent termination, while RF-2 (gene prfB) mediates UAA- and UGA-dependent termination. RF-1 and RF-2 are structurally and evolutionarily related proteins, which have been shown PUBMED:1408743 to be part of a larger family.\ 6478 IPR009543 \

    Members of this family may play a role in the control of protein cycling through the trans-Golgi network. Vacuolar sorting protein is an ATPase required for endosomal trafficking PUBMED:10637304. Defects in the human protein VPS13A cause chorea-acanthocytosis, an autosomal recessive neurodegenerative disorder characterized by the gradual onset of hyperkinetic movements and abnormal erythrocyte morphology PUBMED:11381253.

    \ 8108 IPR013227 \

    The apple domain, , has an N-terminal region that contains four tandem repeats of about 90 amino acids and a C-terminal catalytic domain. The 90-amino-acid repeated domain contains 6 conserved cysteines. It has been shown PUBMED:1998666 that three disulphide bonds link the first and sixth, second and fifth, and third and fourth cysteines. This entry contains apple-like domains, which are present in plasminogen, Caenorhabditis elegans hypothetical ORFs and the extracellular portion of plant S-locus glycoproteins () and S-receptor kinases. The domain is predicted to possess protein- and/or carbohydrate-binding functions.
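
    The nested pairing rule (first with sixth, second with fifth, third with fourth) can be expressed as a short Python sketch. It assumes the input is a single repeat already excised from a longer sequence and containing exactly six cysteines; the function name and the toy sequence are hypothetical, and real repeats would first need to be delineated by alignment.

```python
def apple_disulfides(repeat: str) -> list[tuple[int, int]]:
    """Pair cysteines 1-6, 2-5 and 3-4 in a repeat with exactly six cysteines.

    Returns 0-based (i, j) sequence positions for each predicted disulphide bond.
    """
    cys = [i for i, residue in enumerate(repeat.upper()) if residue == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    # Nested pairing: outermost pair first, then working inwards.
    return [(cys[k], cys[5 - k]) for k in range(3)]


toy_repeat = "GCAACTTCGGCSSCKKCL"  # hypothetical sequence with six cysteines
print(apple_disulfides(toy_repeat))
```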

    \ 5728 IPR008641 \ This family consists of several Epstein-Barr virus BFRF1-like proteins from the gammaherpesviruses. BFRF1 belongs to the lytic proteins, since it is expressed following activation of the EBV replication cycle. Furthermore, it can be classified as an early protein, given that its expression is only partially inhibited by treatment with PAA and ACV (with the BFRF1 gene behaving like BALF5, a known early gene) and that it is present in Raji cells, which harbour a defective EBV strain that does not allow expression of the late lytic genes PUBMED:10708440.\ 7784 IPR012928 \

    The Clostridium neurotoxin family is composed of tetanus neurotoxin and seven serotypes of botulinum neurotoxin. The structure of the botulinum neurotoxin reveals a four-domain protein: the N-terminal catalytic domain (), the central translocation domain and two receptor-binding domains PUBMED:9783750. This domain is the N-terminal receptor-binding domain, which comprises two seven-stranded beta-sheets sandwiched together to form a jelly roll motif PUBMED:9783750. The role of this domain in receptor binding appears to be indirect.

    \ 1594 IPR004485 \ This protein is involved in cobalamin (vitamin B12) biosynthesis and porphyrin biosynthesis. It converts cobyric acid to cobinamide by the addition of aminopropanol on the F carboxylic group. It is part of the cob operon.\ 4630 IPR004114 \

    The THUMP domain is shared by 4-thiouridine synthases, pseudouridine synthases and RNA methylases PUBMED:11295541 and is probably an RNA-binding domain that adopts an alpha/beta fold similar to that found in the C-terminal domain of translation initiation factor 3 and ribosomal protein S8. The THUMP domain probably functions by delivering a variety of RNA modification enzymes to their targets PUBMED:11295541.

    \

    This domain is found in the thiamine biosynthesis proteins (ThiI) (see ).

    \ 3121 IPR003900 \ This group of proteins contains the KID repeat as found in Borrelia and spirochete RepA / Rep+ proteins. The function of these proteins is unknown. RepA and related Borrelia proteins have been suggested to play an important genus-wide role in the biology of the Borrelia PUBMED:9733706.\ 1079 IPR000560 \ Acid phosphatases () are a heterogeneous group of proteins that hydrolyze phosphate esters, optimally at low pH. It has been shown PUBMED:1989985 that a number of acid phosphatases, from both prokaryotes and eukaryotes, share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines seem to be involved in the enzymes' catalytic mechanism PUBMED:8334986, PUBMED:1429631. The first histidine is located in the N-terminal section and forms a phosphohistidine intermediate, while the second is located in the C-terminal section and possibly acts as a proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and include the Escherichia coli pH 2.5 acid phosphatase (gene appA) and glucose-1-phosphatase () (gene agp); yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5); Schizosaccharomyces pombe acid phosphatase (gene pho1); Aspergillus awamori phytases A and B () (genes phyA and phyB); mammalian lysosomal and prostatic acid phosphatases; and several Caenorhabditis elegans hypothetical proteins.\ 1055 IPR001599 \

    This family contains serum complement C3 and C4 precursors and alpha-macroglobulins.

    \ \ \

    The alpha-macroglobulin (αM) family of proteins includes protease inhibitors PUBMED:2473064, typified by the human tetrameric α2-macroglobulin (α2M); they belong to the MEROPS proteinase inhibitor family I39, clan IL. These protease inhibitors share several defining properties, which include (i) the ability to inhibit proteases from all catalytic classes, (ii) the presence of a 'bait region' and a thiol ester, (iii) a similar protease inhibitory mechanism and (iv) the inactivation of the inhibitory capacity by reaction of the thiol ester with small primary amines. αM protease inhibitors inhibit by steric hindrance PUBMED:2472396. The mechanism involves protease cleavage of the bait region, a segment of the αM that is particularly susceptible to proteolytic cleavage, which initiates a conformational change such that the αM collapses about the protease. In the resulting αM-protease complex, the active site of the protease is sterically shielded, thus substantially decreasing access to protein substrates. Two additional events occur as a consequence of bait region cleavage, namely (i) the β-cysteinyl-γ-glutamyl thiol ester becomes highly reactive and (ii) a major conformational change exposes a conserved COOH-terminal receptor binding domain (RBD) PUBMED:2469470. RBD exposure allows the αM-protease complex to bind to clearance receptors and be removed from circulation PUBMED:2430968. Tetrameric, dimeric, and, more recently, monomeric αM protease inhibitors have been identified PUBMED:9914899, PUBMED:10426429.

    \ \ 2244 IPR007649 \ This entry represents a conserved region in a number of uncharacterised plant proteins.\ 7307 IPR006109 \

    NAD-dependent glycerol-3-phosphate dehydrogenase () (GPD) catalyzes the reversible reduction of dihydroxyacetone phosphate to glycerol-3-phosphate. It is a cytoplasmic protein, active as a homodimer PUBMED:2500660, each monomer containing an N-terminal NAD binding site PUBMED:6773774. In insects, it acts in conjunction with a mitochondrial alpha-glycerophosphate oxidase in the alpha-glycerophosphate cycle, which is essential for the production of energy used in insect flight PUBMED:2500660.

    \ 3743 IPR000994 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
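
    Both the loose HEXXH motif and the more stringent abXHEbbHbc pattern translate naturally into regular expressions. The Python sketch below is only an approximation under stated assumptions: it restricts 'a' to V or T (the text says "most often", so rarer residues are dropped), treats 'b' as any residue other than D, E, K, R, H or P, treats 'c' as one of A, V, I, L, M, F, W or C, and excludes proline from every position of the stringent pattern. These residue classes, the helper name `motif_hits` and the toy sequences are all hypothetical.

```python
import re

# Loose motif: His-Glu-any-any-His. The lookahead lets overlapping hits be reported.
LOOSE_HEXXH = re.compile(r"(?=(HE..H))")

# Stringent motif abXHEbbHbc, using assumed residue classes (see lead-in).
A_RES = "[VT]"        # 'a': most often valine or threonine
B_RES = "[^DEKRHP]"   # 'b': uncharged (assumed: not D, E, K, R, H), never proline
X_RES = "[^P]"        # 'X': any residue except proline
C_RES = "[AVILMFWC]"  # 'c': hydrophobic (assumed set)
STRINGENT = re.compile(
    f"(?=({A_RES}{B_RES}{X_RES}HE{B_RES}{B_RES}H{B_RES}{C_RES}))"
)


def motif_hits(pattern: re.Pattern, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every hit of `pattern`."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


for seq in ("GVVAHELTHAVTD", "GVVAHEPTHAVTD"):  # hypothetical toy sequences
    print(seq, motif_hits(LOOSE_HEXXH, seq), motif_hits(STRINGENT, seq))
```

    The second toy sequence places a proline inside the motif: the loose HEXXH pattern still accepts it, while the stringent pattern rejects it, mirroring the observation that proline is never found in this site.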

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
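    The naming convention described above can be expressed as a small lookup. This sketch assumes, as the text states, that the first character of a family or clan identifier encodes the catalytic type. The function names are hypothetical; this is not a MEROPS API.

```python
# First-character catalytic-type codes, as listed in the text above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Types that use an amino acid as nucleophile and form an acyl intermediate;
# aspartic, glutamic and metallo types use an activated water molecule instead.
ACYL_INTERMEDIATE_TYPES = {"S", "T", "C"}
WATER_NUCLEOPHILE_TYPES = {"A", "G", "M"}


def catalytic_type(identifier: str, is_clan: bool = False) -> str:
    """Decode the catalytic type from a family (e.g. 'M24') or clan (e.g. 'MG') name."""
    code = identifier[:1].upper()
    if is_clan and code == "P":
        return "mixed (families of more than one type)"
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code: {identifier!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(code: str) -> str:
    """Describe the nucleophile for a single-letter catalytic type."""
    code = code.upper()
    if code in ACYL_INTERMEDIATE_TYPES:
        return "amino acid residue (forms an acyl intermediate)"
    if code in WATER_NUCLEOPHILE_TYPES:
        return "activated water molecule"
    raise ValueError(f"nucleophile not described for type {code!r}")


print(catalytic_type("M24"))               # metallo
print(catalytic_type("MG", is_clan=True))  # metallo
```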

    \ \

    This group of metallopeptidases and non-peptidase homologs belongs to MEROPS peptidase family M24 (clan MG). It includes the enzymes proline dipeptidase and methionine aminopeptidase.

    \ \ 3666 IPR002693 \ This family consists of the paramyxovirus P phosphoprotein from Sendai virus and human and bovine parainfluenza viruses. The P protein is an essential part of the viral RNA polymerase complex formed from the P and L proteins PUBMED:10400742. The exact role of the P protein in this complex is unknown, but it is involved in multiple protein-protein interactions and in binding the polymerase complex to the nucleocapsid or ribonucleoprotein template PUBMED:10400742. It also appears to be important for the proper folding of the L protein PUBMED:10400742. The paramyxoviruses have a negative-sense ssRNA genome PUBMED:10400742.\ 8125 IPR013219 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family of proteins corresponds to mitochondrial ribosomal subunit S27.

    \ 7240 IPR009984 \

    This family contains the eukaryotic protein geminin (approximately 200 residues long). Geminin inhibits DNA replication by preventing the incorporation of the MCM complex into the prereplication complex, and is degraded during the mitotic phase of the cell cycle. It has been proposed that geminin inhibits DNA replication during the S, G2 and M phases, and that geminin destruction at the metaphase-anaphase transition permits replication in the succeeding cell cycle PUBMED:9635433.

    \ 5963 IPR009309 \

    This family consists of several hypothetical bacterial proteins. The function of the family is unknown.

    \ 1417 IPR001073 \ \ C1q is a subunit of the C1 enzyme complex that activates the serum complement system. C1q comprises 6 A, 6 B and 6 C chains. These share the same topology, each possessing a small, globular N-terminal domain, a collagen-like Gly/Pro-rich central region, and a conserved C-terminal region, the C1q domain PUBMED:1706597. The C1q protein is produced in collagen-producing cells and shows sequence and structural similarity to collagens VIII and X PUBMED:2591537, PUBMED:2019595.\ \ 1528 IPR000780 \

    Methyl transfer from the ubiquitous S-adenosyl-L-methionine (AdoMet) to nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms ranging from bacteria to plants and mammals. The reaction is catalyzed by methyltransferases (Mtases) and modifies DNA, RNA, proteins and small molecules, such as catechol, for regulatory purposes. The role of DNA methylation is well documented, both in prokaryotic restriction-modification systems and in a number of eukaryotic cellular processes, including gene regulation and differentiation.

    \ \

    Three classes of DNA Mtases transfer the methyl group from AdoMet to the target base to form N-6-methyladenine, N-4-methylcytosine or C-5-methylcytosine. In C-5-cytosine Mtases, ten conserved motifs are arranged in the same order PUBMED:8127644. Motif I is part of the cofactor binding site and is shared by other AdoMet-Mtases PUBMED:2684970; it is a glycine-rich or closely related consensus sequence, FAGxGG in M.HhaI PUBMED:8343957. Motif IV (PCQ) is part of the catalytic site. In contrast, sequence comparison among N-6-adenine and N-4-cytosine Mtases revealed two conserved segments PUBMED:2690010, although more conserved segments may be present. One of them corresponds to motif I in C-5-cytosine Mtases, and the other is named (D/N/S)PP(Y/F). Crystal structures are known for a number of Mtases PUBMED:7607476, PUBMED:8343957, PUBMED:8127644, PUBMED:7971991. Their cofactor binding sites are almost identical and their essential catalytic amino acids coincide. The comparable protein folding and the existence of equivalent amino acids in similar secondary and tertiary positions indicate that many (if not all) AdoMet-Mtases have a common catalytic domain structure. This permits tertiary structure prediction of other DNA, RNA, protein and small-molecule AdoMet-Mtases from their amino acid sequences PUBMED:7897657.
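    As a rough illustration of how these sequence signatures might be located, the sketch below scans a protein sequence for three patterns named in the paragraph:
    - the M.HhaI version of motif I (FAGxGG);
    - motif IV (PCQ);
    - the (D/N/S)PP(Y/F) segment of N-6-adenine and N-4-cytosine Mtases.

    Treating each signature as an exact pattern is a simplifying assumption, since real motifs are degenerate. The function name is hypothetical.

```python
import re

# Literal patterns taken from the text; 'x' becomes the regex wildcard '.'.
# Lookaheads let overlapping occurrences be reported.
MTASE_MOTIFS = {
    "motif I (M.HhaI, FAGxGG)": re.compile(r"(?=FAG.GG)"),
    "motif IV (PCQ)": re.compile(r"(?=PCQ)"),
    "(D/N/S)PP(Y/F)": re.compile(r"(?=[DNS]PP[YF])"),
}


def find_mtase_motifs(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif found in the sequence."""
    seq = seq.upper()
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in MTASE_MOTIFS.items()}


# Made-up test sequence containing one copy of each signature.
print(find_mtase_motifs("AAFAGIGGAADPPYAPCQ"))
```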

    \ \

    Flagellated bacteria swim towards favourable chemicals and away from deleterious ones. Sensing of chemoeffector gradients involves chemotaxis receptors, transmembrane (TM) proteins that detect stimuli through their periplasmic domains and transduce the signals via their cytoplasmic domains PUBMED:, PUBMED:9115443. Signalling outputs from these receptors are influenced both by the binding of the chemoeffector ligand to their periplasmic domains and by methylation of specific glutamate residues on their cytoplasmic domains. Methylation is catalysed by CheR, an S-adenosylmethionine-dependent methyltransferase PUBMED:9115443, which reversibly methylates specific glutamate residues within a coiled coil region to form gamma-glutamyl methyl ester residues PUBMED:9115443, PUBMED:9628482. The structure of the S. typhimurium chemotaxis receptor methyltransferase CheR, bound to S-adenosylhomocysteine, has been determined to a resolution of 2.0 A PUBMED:9115443. The structure reveals CheR to be a two-domain protein, with a smaller N-terminal helical domain linked via a single polypeptide connection to a larger C-terminal alpha/beta domain. The C-terminal domain has the characteristics of a nucleotide-binding fold, with an insertion of a small anti-parallel beta-sheet subdomain. The S-adenosylhomocysteine-binding site is formed mainly by the large domain, with contributions from residues within the N-terminal domain and the linker region PUBMED:9115443.

    \ 5056 IPR007893 \

    This domain is found in protein U, a spore coat protein produced at the late stage of development of Myxococcus xanthus. Protein U is produced as a secretory precursor, pro-protein U, which is then secreted across the membrane to assemble on the spore surface PUBMED:1904442. \ \ \ This domain is also found in the products of a number of genes within a conserved polycistronic operon that encodes a novel chaperone-usher pili assembly system. Examples are CsuA/B of Acinetobacter baumannii, CsuA, CsuB and CsuE of Vibrio parahaemolyticus, and the related genes of Yersinia pestis.

    \ \

    In Acinetobacter baumannii, csuC and csuE are required in the early steps of the process that leads to biofilm formation. The conservation of the genes and gene order among unrelated bacteria suggests that the csu operon is widespread. It also suggests that the operon is involved in surface pilus formation, which allows the bacteria to form biofilms on abiotic surfaces, a property that may aid their survival in their natural environment PUBMED:14663080.

    \ \ 3549 IPR005549 \

    Members of this family are components of the mitotic spindle. It has been shown that Nuf2 from yeast is part of a complex called the Ndc80p complex PUBMED:11266451. This complex is thought to bind to the microtubules of the spindle. An Arabidopsis protein that had not previously been identified as a member of this family has been included in it, . The match is not strong but, in common with other members of this family, the protein contains a coiled-coil C-terminal to this region.

    \ 984 IPR002035 \ The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved in the aetiology of bleeding disorders PUBMED:8440408. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins PUBMED:8412987, PUBMED:8145250, PUBMED:1864378. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are intracellular. They are involved in functions such as transcription, DNA repair, ribosomal and membrane transport and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation and signal transduction), involving interaction with a large array of ligands PUBMED:8412987. A number of human diseases arise from mutations in VWA domains. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands PUBMED:8145250. Fold recognition algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices PUBMED:7843416. 3D structures have been determined for the I-domains of integrins CD11b (with bound magnesium) PUBMED:7867070 and CD11a (with bound manganese) PUBMED:7479767. The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for binding protein ligands PUBMED:7867070. 
The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are completely conserved, but the manner in which the metal ion is coordinated differs slightly PUBMED:7479767.\ 6232 IPR009438 \

    This family consists of several plant-specific phytosulfokine precursor proteins. Phytosulfokines are active as either a pentapeptide or a C-terminally truncated tetrapeptide. These compounds were first isolated because of their ability to stimulate cell division in somatic embryo cultures of Asparagus officinalis PUBMED:12049922.

    \ 4102 IPR007855 \

    Eukaryotic RNA-dependent RNA polymerases (RDRP) are involved in the amplification of regulatory microRNAs during post-transcriptional gene silencing PUBMED:12553882. This enzyme is highly conserved in most eukaryotes but is missing in archaea and bacteria. The core catalytic domain of RDRP enzymes is structurally similar to the beta' subunit of DNA-dependent RNA polymerases (DDRP); however, the other domains of DDRP show no similarity to those of RDRP.

    \ \ \ 7019 IPR010807 \

    This family consists of several short, hypothetical bacterial proteins of around 70 residues in length. Members of this family contain eight highly conserved cysteine residues. The function of the family is unknown.

    \ 730 IPR001247 \ This domain is found in two 3'-5' exoribonucleases. Ribonuclease PH contains a single copy of the domain and removes nucleotide residues following the -CCA terminus of tRNA. Polyribonucleotide nucleotidyltransferase (PNPase) contains two tandem copies of the domain and is involved in mRNA degradation in the 3'-5' direction. PNPase is involved in the RNA degradosome, a multi-enzyme complex important in RNA processing and messenger RNA degradation. In yeast these proteins are components of the exosome 3'-5' exoribonuclease complex that is required for 3' processing of the 5.8S rRNA PUBMED:9390555.\ 4411 IPR000980 \

    The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues, first identified as a conserved sequence region between the oncoproteins Src and Fps PUBMED:3025655. Similar sequences were later found in many other intracellular signal-transducing proteins PUBMED:1377638. SH2 domains function as regulatory modules of intracellular signalling cascades. They bind with high affinity to phosphotyrosine-containing target peptides in a sequence-specific and strictly phosphorylation-dependent manner PUBMED:7883800, PUBMED:15335710, PUBMED:14731533, PUBMED:7531822. SH2 domains recognize between 3 and 6 residues C-terminal to the phosphorylated tyrosine, in a fashion that differs from one SH2 domain to another. They are found in a wide variety of protein contexts:
    - in association with the catalytic domains of phospholipase C-gamma (PLC-gamma) and the non-receptor protein tyrosine kinases;
    - within structural proteins such as fodrin and tensin;
    - in a group of small adaptor molecules, such as Crk and Nck.

    The domains are frequently found as repeats in a single protein sequence, and will then often bind both mono- and di-phosphorylated substrates.

    The structure of the SH2 domain belongs to the alpha+beta class, its overall shape forming a compact flattened hemisphere. The core structural elements comprise a central hydrophobic anti-parallel beta-sheet, flanked by 2 short alpha-helices. The loop between strands 2 and 3 provides many of the binding interactions with the phosphate group of the phosphopeptide ligand, and is hence designated the phosphate binding loop. The phosphorylated ligand binds perpendicular to the beta-sheet. It typically interacts with the phosphate binding loop and with a hydrophobic binding pocket that accommodates the pY+3 side chain. The N- and C-termini of the domain are close together in space and on the opposite face from the phosphopeptide binding surface. It has been speculated that this arrangement has facilitated the integration of SH2 domains into surface-exposed regions of host proteins PUBMED:11911873.
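    The specificity rules in the two paragraphs above can be sketched as a peptide-inspection helper. Those rules are that SH2 domains recognize 3 to 6 residues C-terminal to the phosphotyrosine, and that a hydrophobic pocket accommodates the pY+3 side chain. The hydrophobic residue set, the example peptide and both function names are assumptions for illustration. Real SH2 specificity differs from domain to domain and cannot be reduced to this check.

```python
# Assumed set of hydrophobic residues (illustrative, not from the source).
HYDROPHOBIC = set("AVLIMFWYC")


def sh2_window(seq: str, py_index: int, length: int = 3) -> str:
    """Return the residues C-terminal to the phosphotyrosine at py_index (0-based)."""
    if not 3 <= length <= 6:
        raise ValueError("SH2 domains recognize 3-6 C-terminal residues")
    if seq[py_index].upper() != "Y":
        raise ValueError("py_index must point at a tyrosine")
    return seq[py_index + 1:py_index + 1 + length]


def has_hydrophobic_plus3(seq: str, py_index: int) -> bool:
    """True if the pY+3 residue exists and is in the assumed hydrophobic set."""
    pos = py_index + 3
    return pos < len(seq) and seq[pos].upper() in HYDROPHOBIC


peptide = "AAYEEIAA"  # hypothetical peptide; the tyrosine at index 2 is assumed phosphorylated
print(sh2_window(peptide, 2), has_hydrophobic_plus3(peptide, 2))  # EEI True
```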

    \ 336 IPR004701 \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) PUBMED:8246840, PUBMED:2197982 is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates concomitantly with their translocation across the cell membrane, which makes the PTS a link between the uptake and metabolism of sugars.

    \ \

    The general mechanism of the PTS is as follows. A phosphoryl group from phosphoenolpyruvate (PEP) is transferred, via a signal transduction pathway, to enzyme I (EI). EI in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme 2 (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains, IIA, IIB and IIC PUBMED:1537788. These can either be fused together in a single polypeptide chain or exist as two or three interacting chains, formerly called enzymes II (EII) and III (EIII).

    \ \

    The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine which is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with the sugar uptake processed by the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site.
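    The phosphoryl relay described in the last two paragraphs can be summarised as an ordered chain of donor-acceptor pairs. This is a bookkeeping sketch only. The function name and string labels are hypothetical. The only residue identities shown are those stated in the text: the IIA histidine, and the IIB cysteine or histidine.

```python
def pts_relay(iib_residue: str = "Cys") -> list[tuple[str, str]]:
    """Ordered (donor, acceptor) steps of the PTS phosphoryl transfer chain.

    iib_residue is 'Cys' or 'His', depending on the sugar transported.
    """
    if iib_residue not in ("Cys", "His"):
        raise ValueError("IIB is phosphorylated on a Cys or His residue")
    carriers = [
        "PEP",
        "EI",
        "HPr",
        "EIIA (His)",
        f"EIIB ({iib_residue})",
        "sugar (translocated by EIIC)",
    ]
    # Each carrier passes the phosphoryl group to the next one in the list.
    return list(zip(carriers, carriers[1:]))


for donor, acceptor in pts_relay("His"):
    print(f"{donor} -> {acceptor}")
```

    Calling `pts_relay("His")` corresponds to the Man family described below, whose IIB constituent is phosphorylated on a histidyl residue.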

    \ \

    An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose PUBMED:8246840, PUBMED:1537788, PUBMED:7815935, PUBMED:11361063.

    \ \

    The Man family is unique in several respects among PTS permease families.\

  • It is the only PTS family in which members possess a IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteinyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.

    The mannose permease of Escherichia coli, for example, can transport and phosphorylate glucose, mannose, fructose, glucosamine, N-acetylglucosamine and other sugars. Other members of this family can transport sorbose, fructose and N-acetylglucosamine.

    \ \

    This family is specific for the IIA components.

    \ 3117 IPR005643 \

    The c-Jun NH(2)-terminal kinase (JNK) is a member of an evolutionarily conserved sub-family of mitogen-activated protein (MAP) kinases PUBMED:11402333, PUBMED:11790549.

    \ 5491 IPR008527 \ This family consists of several proteins of unknown function from Raphanus sativus (Radish) and Brassica napus (Rape).\ 1642 IPR002429 \

    Cytochrome c oxidase PUBMED:6307356, PUBMED:8083153 is an oligomeric enzymatic complex that is a component of the respiratory chain and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane. The number of polypeptides in the complex ranges from 3-4 in prokaryotes up to 13 in mammals.

    \

    Subunit 2 (CO II) transfers the electrons from cytochrome c to the catalytic subunit 1. It contains two adjacent transmembrane regions in its N-terminus and the major part of the protein is exposed to the periplasmic or to the mitochondrial intermembrane space, respectively. CO II provides the substrate-binding site and contains a copper center called Cu(A), probably the primary acceptor in cytochrome c oxidase. An exception is the corresponding subunit of the cbb3-type oxidase which lacks the copper A redox-center. Several bacterial CO II have a C-terminal extension that contains a covalently bound heme c.

    \ 8144 IPR013170 \

    The cwf21 family is involved in mRNA splicing. It has been isolated as a subcomplex of the spliceosome in Schizosaccharomyces pombe PUBMED:11884590.

    \ 6785 IPR010714 \

    This entry represents the C terminus (approximately 500 residues) of the eukaryotic coatomer alpha subunit. Coatomer (COPI) is a large cytosolic protein complex, which forms a coat around vesicles budding from the Golgi apparatus. Such coatomer-coated vesicles have been proposed to play a role in many distinct steps of intracellular transport PUBMED:9261053. This domain is found along with the domain.

    \ 4933 IPR003176 \ This domain represents the C-terminal domain of the viral DNA-binding protein, a multifunctional protein involved in DNA replication and transcription control.\ 6884 IPR009766 \

    This family represents a conserved region approximately 130 residues long within a number of proteins of unknown function that seem to be specific to the white spot syndrome virus (WSSV).

    \ 5622 IPR008862 \ This family consists of several eukaryotic T-complex protein 11 (Tcp11) related sequences. Tcp11 is only expressed in fertile adult mammalian testes and is thought to be important in sperm function and fertility. The family also contains the Saccharomyces cerevisiae Sok1 protein which is known to suppress cyclic AMP-dependent protein kinase mutants PUBMED:8065298.\ 7351 IPR011080 \

    This entry represents bacterial domains with an Ig-like fold. These domains are found in a variety of bacterial surface proteins.

    \ 1222 IPR006781 \

    Exchangeable apolipoproteins are water-soluble protein components of lipoproteins that solubilize lipids and regulate their metabolism by binding to cell receptors or activating specific enzymes. Apolipoprotein C-I (ApoC-1) is the smallest exchangeable apolipoprotein and transfers among HDL (high-density lipoprotein), VLDL (very low-density lipoprotein) and chylomicrons. ApoC-1 activates lecithin:cholesterol acyltransferase (LCAT) and inhibits cholesteryl ester transfer protein. It can also inhibit hepatic lipase and phospholipase A2, and can stimulate cell growth. ApoC-1 delays the clearance of beta-VLDL by inhibiting its uptake via the LDL receptor-related pathway PUBMED:11580293. ApoC-1 has been implicated in hypertriglyceridemia PUBMED:11353333 and Alzheimer's disease PUBMED:11741391.

    ApoC-1 is believed to comprise two dynamic helices that are stabilized by interhelical interactions and connected by a short linker region. In the lipid-free state, the minimal folding unit of this and other exchangeable apolipoproteins is the helix-turn-helix motif formed by four 11-mer sequence repeats.

    \ 1968 IPR004988 \

    This is a family of proteins of unknown function.

    \ 1593 IPR002157 \

    In eukaryotes, a number of proteins are involved in the binding and transport of cobalamin (vitamin B12) PUBMED:6313022. Some of them have been sequenced and shown PUBMED:1708393, PUBMED:8439564 to be evolutionarily related; they are listed below:

    \ \ \ \

    These glycoproteins are polypeptides of about 400 amino acids that share many regions of similarity.

    \ 2138 IPR007621 \ This is a family of uncharacterised proteins found in both eukaryotes and eubacteria. In eubacteria the region lies towards the N-terminus of the protein and is accompanied by an N-terminal signal sequence; the C-terminus of eubacterial proteins typically contains one or more putative transmembrane regions. In eukaryotes the region is not accompanied by a signal sequence.\ 4524 IPR005575 \

    Statherin functions biologically to inhibit the nucleation and growth of calcium phosphate minerals. The N-terminus of statherin is highly charged, and its glutamic acid residues have been shown to be important in the recognition of hydroxyapatite PUBMED:1313424.

    \ 721 IPR002498 \ The family consists of various type I, II and III phosphatidylinositol-4-phosphate 5-kinases (PIP5K enzymes). They contain a region from the common kinase core found in the type I phosphatidylinositol-4-phosphate 5-kinase (PIP5K) family, as described in PUBMED:9535851. PIP5Ks catalyse the formation of phosphatidylinositol-4,5-bisphosphate via the phosphorylation of phosphatidylinositol-4-phosphate, a precursor in the phosphoinositide signaling pathway.\ 5099 IPR007936 \

    This family contains several bacterial proteins related to virulence-associated protein E.

    \ 2821 IPR001437 \ Bacterial proteins greA and greB are necessary for efficient RNA polymerase transcription elongation past template-encoded arresting sites. Arresting sites in DNA have the property of trapping a certain fraction of elongating RNA polymerases that pass through, resulting in locked DNA/RNA/polymerase ternary complexes. Cleavage of the nascent transcript by cleavage factors, such as greA or greB, allows the resumption of elongation from the new 3' terminus PUBMED:8431948, PUBMED:7854424.

    Escherichia coli GreA and GreB are sequence homologues, and have homologues in every known bacterial genome PUBMED:12914698. GreA induces cleavage two or three nucleotides behind the terminus and can only prevent the formation of arrested complexes. GreB releases longer sequences, up to eighteen nucleotides in length, and can rescue preexisting arrested complexes. These functional differences correlate with a distinctive structural feature, the distribution of positively charged residues on one face of the N-terminal coiled coil. Remarkably, despite close functional similarity, the prokaryotic Gre factors have no sequence or structural similarity with eukaryotic TFIIS.

    \ 1090 IPR006092 \

    Mammalian acyl-CoA dehydrogenases are enzymes that catalyse the first step of each cycle of beta-oxidation in mitochondria. Acyl-CoA dehydrogenases PUBMED:3326738, PUBMED:2777793, PUBMED:8034667 catalyze the alpha,beta-dehydrogenation of acyl-CoA thioesters to the corresponding trans-2,3-enoyl-CoA products, with concomitant reduction of enzyme-bound FAD. Reoxidation of the flavin involves transfer of electrons to ETF (electron-transferring flavoprotein). These enzymes are homodimers containing one molecule of FAD.

    The monomeric enzyme is folded into three domains of approximately equal size. The N-terminal and C-terminal domains consist mainly of packed alpha-helices, and the middle domain consists of two orthogonal beta-sheets. The flavin ring is buried in the crevice between the two alpha-helical domains and the beta-sheet of one subunit. The adenosine pyrophosphate moiety extends into the subunit junction formed by two C-terminal domains PUBMED:8356049.

    The N-terminal domain of acyl-CoA dehydrogenase is an all-alpha domain. On dimerisation, the N-terminus of one molecule extends into the other subunit and lies on the surface of the molecule.

    \ 1435 IPR001580 \

    Synonym(s): Calregulin, CRP55, HACBP

    \

    Calreticulin PUBMED:1497605 is a high-capacity calcium-binding protein which is present in most tissues and located at the periphery of the endoplasmic reticulum (ER) and sarcoplasmic reticulum (SR) membranes. It probably plays a role in the storage of calcium in the lumen of the ER and SR, and it may well have other important functions.

    \

    Structurally, calreticulin is a protein of about 400 amino acid residues consisting of three domains:\

    \

    Calreticulin is evolutionarily related to several other calcium-binding proteins, including Onchocerca volvulus antigen RAL-1, calnexin PUBMED:8203019 and calmegin PUBMED:8126001.

    \ 5729 IPR008613 \ Extracellular Ca2+-dependent nuclease YokF from Bacillus subtilis and several other surface-exposed proteins from diverse bacteria are encoded in the genomes in two paralogous forms that differ by a ~45 amino acid fragment, which comprises a novel conserved domain. Sequence analysis of this domain revealed a conserved DxDxDGxxCE motif, which is strikingly similar to the Ca2+-binding loop of the calmodulin-like EF-hand domains, suggesting an evolutionary relationship between them. Functions of many of the other proteins in which the novel domain, named Excalibur (extracellular calcium-binding region), is found, as well as a structural model of its conserved motif are consistent with the notion that the Excalibur domain binds calcium. This domain is but one more example of the diversity of structural contexts surrounding the EF-hand-like calcium-binding loop in bacteria. This loop is thus more widespread than hitherto recognised and the evolution of EF-hand-like domains is probably more complex than previously appreciated PUBMED:12694917.\ 3641 IPR005065 \

    Platelet-activating factor acetylhydrolase (PAF-AH) is a subfamily of phospholipase A2, , responsible for inactivation of platelet-activating factor through cleavage of an acetyl group. Three known PAF-AHs are the brain heterotrimeric PAF-AH Ib, the extracellular, plasma PAF-AH (pPAF-AH), and the intracellular PAF-AH isoform II (PAF-AH II).

    \ 4250 IPR001912 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies PUBMED:2041737, PUBMED:. S4 is a protein of 171 to 205 amino-acid residues (except for NAM9, which is much larger). The crystal structure of a bacterial S4 protein revealed a two-domain molecule. The first domain is composed of four helices in the known structure. The second domain is in the middle of the first one and displays some structural homology with the ETS DNA-binding domain PUBMED:9707415. This family includes small ribosomal subunit S4 from prokaryotes and S9 from animals.

    \ 4039 IPR004277 \ Phosphatidylserine synthase (PSS) is also known as serine exchange enzyme. This family represents eukaryotic PSS I and II, membrane-bound proteins that catalyse the replacement of the head group of a phospholipid (phosphatidylcholine or phosphatidylethanolamine) by L-serine.\ 4101 IPR007476 \ Members of the RdgC family may have exonuclease activity. RdgC is required for efficient pilin variation in Neisseria gonorrhoeae, suggesting that it may be involved in recombination reactions PUBMED:10655208. In Escherichia coli, RdgC is required for growth in recombination-deficient exonuclease-depleted strains. Under these conditions, RdgC may act as an exonuclease to remove collapsed replication forks in the absence of the normal repair mechanisms PUBMED:8807285.\ 4429 IPR003478 \ The reoviral gene S1 encodes haemagglutinin (sigma 1 protein), an outer capsid protein and a major factor in determining virus-host cell interactions. Sigma 1s is one of two translation products of the S1 gene.\ 5882 IPR010330 \

    Many of the members of this family are described as transcription factors. CoiA falls within a competence-specific operon in Streptococcus. CoiA is an uncharacterised protein.

    \ 5911 IPR010345 \

    IL-17 is a potent proinflammatory cytokine produced by activated memory T cells PUBMED:11781375. The IL-17 family is thought to represent a distinct signalling system that appears to have been highly conserved across vertebrate evolution PUBMED:11781375.

    \ 795 IPR007528 \ This family includes RINT-1, a Rad50-interacting protein that participates in radiation-induced checkpoint control PUBMED:11096100, as well as the TIP-1 protein from yeast, which seems to be involved in a complex with Sec20p that is required for Golgi transport PUBMED:8334998.\ 1050 IPR003212 \ This family contains members of the hyperthermophilic archaebacterial 7 kDa DNA-binding/endoribonuclease P2 family. There are five 7 kDa DNA-binding proteins, 7a-7e, found as monomers in the cell. Protein 7e shows the tightest DNA-binding ability.\ 5813 IPR009246 \

    This family consists of several bacterial ethanolamine ammonia-lyase light chain (EutC) sequences. Ethanolamine ammonia-lyase is a bacterial enzyme that catalyses the adenosylcobalamin-dependent conversion of certain vicinal amino alcohols to oxo compounds and ammonia PUBMED:2197274.

    \ 2050 IPR007184 \

    Glycosidases or glycosyl hydrolases are a large and widespread family of enzymes that hydrolyse the glycosidic bonds between carbohydrates or between a carbohydrate and an aglycone moiety. On the basis of sequence and structural similarity, this family belongs to the beta-fructosidase (furanosidase) superfamily of glycosyl hydrolases. This leads to the prediction that proteins of this family have glycosidase (glycoside hydrolase) activity and most probably act on a furanoside residue (fructose, arabinose or ribose). The crystal structure of a member of this family from Thermotoga maritima (PDB:1VKD), determined at high resolution by structural genomics initiatives, reveals a five-bladed beta-propeller fold with three acidic residues forming the active site.

    \ \ 5025 IPR001293 \ Some of the proteins that have this domain are mammalian signal transducers associated with the cytoplasmic\ domain of the 75 kDa tumor necrosis factor receptor. A heterocomplex, homodimer or heterodimer of TRAF1 and\ TRAF2 binds to the N-terminal region of the inhibitor of apoptosis proteins 1 and 2 (IAPs) and recruits them to the tumor\ necrosis factor receptor 2. Other proteins, such as the F45G2.6 protein from C. elegans and the DG17 protein from slime mold, also\ have this domain.\ 1892 IPR003744 \ This is a family of uncharacterized proteins. Conserved regions of hydrophobicity suggest that all members of the family may be integral\ membrane proteins. \ 6700 IPR009669 \

    This family consists of several EspG-like proteins from Citrobacter rodentium and Escherichia coli. EspG is secreted by the type III secretory system and is translocated into host epithelial cells. EspG is homologous with Shigella flexneri protein VirA and can rescue invasion in a Shigella virA mutant, indicating that these proteins are functionally equivalent in Shigella. EspG plays an accessory but as yet undefined role in EPEC virulence that may involve intestinal colonisation PUBMED:11349072.

    \ 3788 IPR003898 \

    A large group of bacterial exotoxins are referred to as "A/B toxins", \ essentially because they are formed from two subunits PUBMED:8225592. The "A" subunit\ possesses enzyme activity, and is transferred to the host cell following a conformational change in the membrane-bound transport "B" subunit PUBMED:8225592.

    \

    Bordetella pertussis is the causative agent of whooping cough, and is a \ Gram-negative aerobic coccobacillus. Its major virulence factor is the pertussis \ toxin, an A/B exotoxin that mediates both colonisation and toxaemic stages\ of the disease PUBMED:3704651, PUBMED:2873570. Recombinant, inactive forms of the five subunits that make up the toxin have proven to be good vaccines.\ The S1 ("A") subunit of pertussis toxin causes the characteristic sound of \ the "whoop" in whooping cough. It achieves this through ADP-ribosylation of \ host Gi alpha subunits, which inhibit adenylate cyclase PUBMED:3704651, PUBMED:2873570. Released from this inhibition, adenylate cyclase produces elevated levels of cAMP, leading to increased cell exudate and inflammation in the lungs PUBMED:2737291.

    \

    The crystal structure of pertussis toxin has been determined to 2.9 Å \ resolution PUBMED:8075982. The catalytic A-subunit (S1) shares structural similarity with other ADP-ribosylating bacterial toxins, although differences in the C-terminal portion explain its unique activation mechanism. Despite its\ heterogeneous subunit composition, the structure of the cell-binding\ B-oligomer (S2, S3, two copies of S4, and S5) resembles the symmetrical\ B-pentamers of the cholera and Shiga toxin families, but it interacts\ differently with the A-subunit and there is virtually no sequence similarity between B-subunits of the different toxins.

    \ 525 IPR001752 \

    Kinesin PUBMED:8542443, PUBMED:2142876, PUBMED:14732151 is a microtubule-associated force-producing protein that may play a role in organelle transport. The kinesin motor activity is directed toward the microtubule's plus end. Kinesin is an oligomeric complex composed of two heavy chains and two light chains. The maintenance of the quaternary structure does not require interchain disulphide bonds.

    \

    The heavy chain is composed of three structural domains: a large globular N-terminal domain, which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP and to bind and move on microtubules); a central alpha-helical coiled coil domain that mediates heavy chain dimerization; and a small globular C-terminal domain, which interacts with other proteins (such as the kinesin light chains), vesicles and membranous organelles.

    \

    A number of proteins have recently been found that contain a domain similar to the kinesin 'motor' domain PUBMED:8542443, PUBMED:1832505:\

    \

    The kinesin motor domain is located in the N-terminal part of most of the above proteins, with the exception of KAR3, klpA, and ncd where it is located in the C-terminal section.

    \

    The kinesin motor domain contains about 330 amino acids. An ATP-binding motif of type A is found near positions 80 to 90, and the C-terminal half of the domain is involved in microtubule binding.
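    The following sketch scans a protein sequence for a type A ATP-binding (P-loop) motif of the kind described above. It assumes the PROSITE-style consensus [AG]-x(4)-G-K-[ST]. The function name `find_p_loops` and the example fragment are hypothetical. This regex is only a rough screen and does not replace curated motif or domain profiles.

```python
import re

# Type A ATP-binding (P-loop) consensus written as a regex:
# A or G, any four residues, then G, K and S or T.
# The lookahead makes finditer report overlapping matches too.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for each P-loop-like hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]


# Hypothetical fragment containing a kinesin-like GQTGSGKT P-loop.
fragment = "MSKAGQTGSGKTHTM"
print(find_p_loops(fragment))  # [(5, 'GQTGSGKT')]
```

    For a full-length motor domain, you would then check whether a reported start falls roughly near residues 80 to 90 of the domain, as stated above.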

    \ 4090 IPR001936 \

    Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyze GTP to GDP PUBMED:1898771. This intrinsic GTPase activity of ras is stimulated by a family of proteins collectively known as 'GAP' or GTPase-activating proteins PUBMED:1883874, PUBMED:7945277. As it is the GTP bound form of ras which is active, these proteins are said to be down-regulators of ras.

    \

    The Ras GTPase-activating proteins are quite large (from 765 residues for sar1 to 3079 residues for IRA2) but share only a limited (about 250 residues) region of sequence similarity, referred to as the 'catalytic domain' or rasGAP domain.

    \

    Note: There are distinctly different GAPs for the rap and rho/rac subfamilies of ras-like proteins (reviewed in reference PUBMED:8259209) that do not share sequence similarity with ras GAPs.

    \ 7659 IPR012905 \

    The members of this family are similar to the galactophilic lectin-1 (PA-IL) expressed by P. aeruginosa. Lectins recognising specific carbohydrates found on the surface of host cells are known to be involved in the initiation of infections by this organism. The protein is thought to be organised into an extensive network of beta-sheets, as is the case with many other lectins PUBMED:1429650.

    \ 1255 IPR001873 \

    The apical membrane of many tight epithelia contains sodium channels that\ are primarily characterised by their high affinity for the diuretic blocker\ amiloride PUBMED:8181670, PUBMED:8905643, PUBMED:7499195. These channels mediate the first step of active sodium\ reabsorption essential for the maintenance of body salt and water\ homeostasis PUBMED:8181670. In vertebrates, the channels control reabsorption of\ sodium in kidney, colon, lung and sweat glands; they also play a role in\ taste perception.

    \

    Members of the epithelial Na+ channel (ENaC) family fall into four\ subfamilies, termed alpha, beta, gamma and delta PUBMED:8905643. The proteins exhibit\ the same apparent topology, each with two transmembrane (TM) spanning\ segments, separated by a large extracellular loop. In most ENaC proteins\ studied to date, the extracellular domains are highly conserved and contain\ numerous cysteine residues, with flanking C-terminal amphipathic TM regions,\ postulated to contribute to the formation of the hydrophilic pores of the\ oligomeric channel protein complexes. It is thought that the well-conserved\ extracellular domains serve as receptors to control the activities of the\ channels.

    \

    Vertebrate ENaC proteins are similar to degenerins of Caenorhabditis elegans\ PUBMED:7929098: deg-1, del-1, mec-4, mec-10 and unc-8. Mutations in these proteins can cause neuronal degeneration, and they are also thought to form sodium channels.

    \

    Structurally, the proteins that belong to this family consist of about 510\ to 920 amino acid residues. They are made of an intracellular N-terminus\ region followed by a transmembrane domain, a large extracellular loop, a\ second transmembrane segment and a C-terminal intracellular tail PUBMED:7929098.

    \ 2315 IPR007767 \ This family contains uncharacterised proteins from Caenorhabditis elegans.\ 4098 IPR007721 \ The Escherichia coli high-affinity ribose-transport system consists of six proteins encoded by the rbs operon (rbsD, rbsA, rbsC, rbsB, rbsK and rbsR). Of the six components, RbsD is the only one whose function is unknown, although it is thought to play a critical role in PtsG-mediated ribose transport PUBMED:11320319. This family also includes FucU, a protein from the fucose biosynthesis operon, which by similarity to RbsD is presumably also involved in fucose transport.\ 2123 IPR007410 \ This is a putative membrane or periplasmic protein.\ 2318 IPR007787 \ This family contains uncharacterised Chlamydia proteins.\ 5953 IPR010365 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 5686 IPR008411 \ The bovine lentivirus, also known as bovine immunodeficiency-like virus (bovine immunodeficiency virus), has conserved and hypervariable regions in the surface envelope gene PUBMED:9032387.\ 2671 IPR007652 \ The glycosphingolipids (GSL) form part of eukaryotic cell membranes. They consist of a hydrophilic carbohydrate moiety linked to a hydrophobic ceramide tail embedded within the lipid bilayer of the membrane. Lactosylceramide, Gal1,4Glc1Cer (LacCer), is the common synthetic precursor to the majority of GSL found in vertebrates. Alpha-1,4-glycosyltransferases utilise UDP donors and transfer the sugar to a beta-linked acceptor. This region appears to be confined to higher eukaryotes. No function has yet been assigned to this region PUBMED:10854428.\ 6633 IPR010654 \

    This family consists of several bacteriophage lambda tail assembly protein I and related phage and bacterial sequences. Members of this family are typically around 200 residues in length. The function of this family is unknown.

    \ 3001 IPR003134 \

    The cortactin or HS1 repeat is a tandem repeat of 37-amino acid actin-binding domains. The repeat is named after human cortactin and HS1, proteins involved in cytoskeletal rearrangements implicated in cell migration and apoptosis, respectively. Cortactin contains 6.5 tandem copies of the repeat and is conserved among metazoans, although e.g. insect cortactin and splice variants contain fewer copies. Hematopoietic lineage cell specific protein 1 (HS1) contains 3.5 tandem copies of the cortactin repeat and is mainly expressed in hematopoietic cells. Both cortactin and HS1 contain a C-terminal SH3 domain. The cortactin repeat domain binds filamentous actin (F-actin) in proteins that modulate the assembly of the actin cytoskeleton. Secondary structure predictions indicate that the cortactin repeat could exhibit a helix-turn-helix structure PUBMED:12534372, PUBMED:15186216.

    \ 1458 IPR006858 \ This protein is found in the nucleus of infected cells and may act as a transcriptional regulator. It induces apoptosis, and is also known as apoptin.\ 5454 IPR008508 \ This family consists of several hypothetical archaeal proteins of unknown function.\ 5791 IPR009237 \

    US3 of human cytomegalovirus is an endoplasmic reticulum resident transmembrane glycoprotein that binds to major histocompatibility complex class I molecules and prevents their departure. The endoplasmic reticulum retention signal of the US3 protein is contained in the luminal domain of the protein PUBMED:12525649.

    \ 4977 IPR007636 \ This family consists of type II restriction enzymes that recognise the double-stranded sequence CTCGAG and cleave after C-1 PUBMED:3001639.\ 1291 IPR002951 \ Human genes containing triplet repeats can markedly expand in length, \ leading to neuropsychiatric disease. Expansion of triplet repeats \ explains the phenomenon of anticipation, i.e. the increasing severity or \ earlier age of onset in successive generations in a pedigree PUBMED:8325628. \ Dentatorubral pallidoluysian atrophy (DRPLA, or Smith's disease) is one of \ five disorders now known to result from expansion of a CAG trinucleotide\ repeat encoding glutamine PUBMED:8965642.\ The reported full-length cDNA sequence encodes a serine repeat and a region\ of alternating acidic and basic amino acids, in addition to the glutamine\ repeat PUBMED:8965642, PUBMED:7842016. It is believed that the pathology of DRPLA may arise from the \ altered structure and function of the abnormal protein PUBMED:8965642. Although the \ function of the protein is still unknown, its unusual amino acid composition\ may provide clues toward understanding neurodegenerative diseases associated\ with triplet repeat expansion PUBMED:7842016.\ 2285 IPR006967 \ This family of proteins is found in the Caudovirales and in prophages. It may be a head/tail component or be involved in tail assembly.\ 3769 IPR005320 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities\ in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin\ and carboxypeptidase C clans have a catalytic triad of serine, aspartate and\ histidine in common: serine acts as a nucleophile, aspartate as an\ electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of\ the catalytic residues are similar between families, despite different\ protein folds PUBMED:7845208. The linear arrangements of the catalytic residues\ commonly reflect clan relationships. For example, the catalytic triad in\ the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the\ subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
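    The link between triad order and clan can be sketched as a lookup: sort the Ser, His and Asp positions along the sequence, then read off the one-letter order. This is a minimal illustration, assuming the three clan orders listed above are the only ones of interest. The names `triad_order` and `guess_clan` are hypothetical. Order is only a clue to clan membership, not a classifier. For example, the Ser-His-Glu triad of family S51 described in this entry falls outside it.

```python
# Linear order of the catalytic triad -> clan, as listed in the text.
TRIAD_ORDER_TO_CLAN = {
    "HDS": "SA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase C clan)",
}


def triad_order(positions: dict[str, int]) -> str:
    """Order the one-letter codes S, H and D by their sequence position."""
    return "".join(sorted(positions, key=positions.get))


def guess_clan(positions: dict[str, int]) -> str:
    """Map a triad order to a clan, or 'unassigned' if it is not listed."""
    return TRIAD_ORDER_TO_CLAN.get(triad_order(positions), "unassigned")


# Chymotrypsin numbering: His57, Asp102, Ser195.
print(triad_order({"S": 195, "H": 57, "D": 102}))  # HDS
print(guess_clan({"S": 195, "H": 57, "D": 102}))   # SA (chymotrypsin clan)
# Subtilisin BPN' numbering: Asp32, His64, Ser221.
print(guess_clan({"S": 221, "H": 64, "D": 32}))    # SB (subtilisin clan)
```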

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
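    The naming rule above, where the first character of a family or clan identifier gives its catalytic type, is easy to decode mechanically. This sketch uses only the letters listed in the text. The current MEROPS scheme may include further letters that are not covered here. The nucleophile groupings also follow the paragraph above. The function names are hypothetical.

```python
# First character of a MEROPS family or clan identifier -> catalytic type,
# restricted to the letters listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}


def catalytic_type(identifier: str) -> str:
    """Decode the catalytic type from an identifier such as 'S51' or 'PC'."""
    return CATALYTIC_TYPES.get(identifier[:1].upper(), "not listed")


def nucleophile(identifier: str) -> str:
    """Nucleophile class by catalytic type, as described in the text."""
    code = identifier[:1].upper()
    if code in {"S", "T", "C"}:
        return "amino acid residue (acyl intermediate formed)"
    if code in {"A", "G", "M"}:
        return "activated water molecule"
    return "not specified"


print(catalytic_type("S51"), "|", nucleophile("S51"))
print(catalytic_type("PC"))
```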

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S51 (clan PC(S)). The type example is dipeptidase E (alpha-aspartyl dipeptidase) from Escherichia coli. The family contains alpha-aspartyl dipeptidases (dipeptidase E) and cyanophycinases.

    \ \

    The three-dimensional structure of Salmonella typhimurium aspartyl dipeptidase (peptidase E) has been determined at 1.2 Å resolution. The structure of this 25-kDa enzyme consists of two mixed beta-sheets forming a V, flanked by six alpha-helices. The active site contains a Ser-His-Glu catalytic triad and is the first example of a serine peptidase/protease with a glutamate in the catalytic triad. The active site Ser is located on a strand-helix motif reminiscent of that found in alpha/beta-hydrolases, but the polypeptide fold and the organisation of the catalytic triad differ from those of the known serine proteases. This enzyme appears to represent a new example of convergent evolution of peptidase activity PUBMED:11106384.

    \ \

    Alpha-aspartyl dipeptidase hydrolyses dipeptides containing N-terminal aspartate residues, Asp-|-Xaa. It does not act on peptides with N-terminal Glu, Asn or Gln, nor does it cleave isoaspartyl peptides. In the cyanobacteria, cyanophycinase is an exopeptidase that catalyses the hydrolytic cleavage of multi-L-arginyl-poly-L-aspartic acid (cyanophycin, a water-insoluble reserve polymer) into aspartate-arginine dipeptides.
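    The substrate rule for alpha-aspartyl dipeptidase stated above can be written as a simple predicate. This is a deliberately simplified model, assuming dipeptides are given as two one-letter codes (N-terminal residue first) and that linkage type is supplied as a flag. It ignores kinetics and any other residue preferences. The function name is hypothetical.

```python
def is_dipeptidase_e_substrate(dipeptide: str, isoaspartyl: bool = False) -> bool:
    """Return True if a dipeptide fits the stated Asp-|-Xaa specificity.

    `dipeptide` is two one-letter codes, N-terminal residue first.
    Isoaspartyl (beta-linked) dipeptides and any non-Asp N-terminus,
    including Glu (E), Asn (N) and Gln (Q), are rejected.
    """
    if len(dipeptide) != 2 or isoaspartyl:
        return False
    return dipeptide[0].upper() == "D"


for dp in ["DA", "DL", "EA", "NA", "QA"]:
    print(dp, is_dipeptidase_e_substrate(dp))
print(is_dipeptidase_e_substrate("DA", isoaspartyl=True))  # False
```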

    \ \ \ \ 7309 IPR011086 \

    This domain of unknown function is found in a limited set of Bradyrhizobium proteins. There appears to be a periodic -DG- motif in the domain.

    \ 2562 IPR001689 \

    The flagellar motor switch in Escherichia coli and Salmonella typhimurium regulates the direction of flagellar rotation and hence controls swimming behaviour PUBMED:8224881. The switch is a complex apparatus that responds to signals transduced by the chemotaxis sensory signalling system during chemotactic behaviour PUBMED:8224881. CheY, the chemotaxis response regulator, is believed to act directly on the switch to induce tumbles in the swimming pattern, but no physical interactions of CheY and switch proteins have yet been demonstrated.

    \

    The switch complex comprises at least three proteins - FliG, FliM and FliN. It has been shown that FliG interacts with FliM, FliM interacts with itself, and FliM interacts with FliN PUBMED:8631704. Several residues within the middle third of FliG appear to be strongly involved in the FliG-FliM interaction, with residues near the N or C termini being less important PUBMED:8631704. Such clustering suggests that FliG-FliM interaction plays a central role in switching.

    \

    Analysis of the FliG, FliM and FliN sequences shows that none are especially hydrophobic or appear to be integral membrane proteins PUBMED:2656645. This result is consistent with other evidence suggesting that the proteins may be peripheral to the membrane, possibly mounted on the basal body M ring PUBMED:2656645, PUBMED:1631122.

    \ 2927 IPR003868 \ This is a family of Herpesvirus proteins including UL31, UL53, and the product of ORF 69 in some strains. The proteins in this family have no known function.\ 4350 IPR000096 \ The serum amyloid A (SAA) proteins comprise a family of vertebrate proteins\ that associate predominantly with high-density lipoproteins (HDL) PUBMED:7504491, PUBMED:8188253. The\ synthesis of certain members of the family is greatly increased (as much as\ 1000-fold) in inflammation, making SAA a major acute phase reactant.\ While the major physiological function of SAA is unclear, prolonged elevation\ of plasma SAA levels, as in chronic inflammation, results in a\ pathological condition, called amyloidosis, which affects the liver, kidney\ and spleen and which is characterized by the highly insoluble accumulation of\ SAA in these tissues.\ SAA are proteins of about 110 amino acid residues. The most highly conserved \ region is located in the central part of the sequence.\ 7785 IPR012470 \

    Family of fungal proteins with unknown function. A member of this family has been found to localise in the mitochondria PUBMED:14562095.

    \ 2070 IPR007295 \ Family member FomD is a predicted protein from a fosfomycin biosynthesis gene cluster in Streptomyces wedmorensis PUBMED:7500951. Its function is unknown.\ 1906 IPR003776 \

    This domain comprises the whole of a protein in Methanococcus jannaschii and Methanobacterium thermoautotrophicum, all but the N-terminal 60 residues from a protein of Mycobacterium tuberculosis, and all but the C-terminal 180 residues from a protein in Haemophilus influenzae and Escherichia coli, among proteins from published complete genomes.

    \ 6864 IPR009756 \

    This family represents a conserved region approximately 160 residues long that is repeated multiple times in plant starch synthase III (SS-III). Starch synthases extend alpha-1,4 glucan chains by catalysing the transfer of the glucosyl moiety of ADP-Glc to the non-reducing end of a pre-existing alpha-1,4 glucan. SS-III is thought to be primarily involved in the synthesis of amylopectin rather than amylose PUBMED:10859191.

    \ 2506 IPR011619 \

    Escherichia coli has an iron(II) transport system (feo) which may make an important contribution to the iron supply of the cell under anaerobic conditions. FeoB has been identified as part of this transport system and may play a role in the transport of ferrous iron. FeoB is a large 700-800 amino acid integral membrane protein. The N terminus contains a P-loop motif suggesting that iron transport may be ATP dependent PUBMED:8407793.

    \ 528 IPR007851 \ The Saccharomyces cerevisiae member of this family, KRR1, has been shown to be required for the assembly of preribosomal 40S subunits in the nucleolus PUBMED:11027267. KRR1 is highly expressed in dividing cells and its expression ceases almost completely when cells enter the stationary phase.\ 5298 IPR008905 \ The largest of the mammalian translation initiation factors, eIF3, consists of at least eight subunits ranging in mass from 35 to 170 kDa. eIF3 binds to the 40S ribosome in an early step of translation initiation and promotes the binding of methionyl-tRNAi and mRNA PUBMED:8995409.\ 4725 IPR003337 \ Trehalose-phosphatases catalyse the dephosphorylation of\ trehalose-6-phosphate to trehalose and orthophosphate. Trehalose is a common disaccharide of bacteria, fungi and invertebrates that appears to play a major role in desiccation tolerance. A pathway for trehalose biosynthesis may also exist in plants PUBMED:9681009. The trehalose-phosphatase signature is found in the C-terminus of\ trehalose-6-phosphate synthase adjacent to the trehalose-6-phosphate synthase domain. It would appear that the equivalents of the two genes of the Escherichia coli otsBA operon, otsA (\ trehalose-6-phosphate synthase) and otsB (trehalose-phosphatase, this family), have undergone gene fusion in\ most eukaryotes PUBMED:8045430.\ 2912 IPR004997 \

    This is an accessory subunit of Herpesvirus DNA polymerase that acts to increase the processivity of polymerisation.

    \ 3254 IPR003059 \

    The DNA sequence of the entire colicin E2 operon has been determined PUBMED:3892228.\ The operon comprises the colicin activity gene (ceaB), the colicin immunity\ gene (ceiB) and the lysis gene (celB), which is essential for colicin\ release from producing cells PUBMED:3892228. A putative LexA binding site is located\ upstream from ceaB, and a rho-independent terminator structure is located\ downstream from celB PUBMED:3892228. Comparison of the amino acid sequences of colicin\ E2 and cloacin DF13 reveals extensive similarity. These colicins have\ different modes of action and recognise different cell surface receptors;\ the two major regions of heterology, at the C-terminus and in the C-terminal\ end of the central region, are thought to correspond to the catalytic and \ receptor-recognition domains, respectively PUBMED:3892228.

    \ \

    Sequence similarities between colicins E2, A and E1 PUBMED:3936034 are less striking.\ The colicin E2 (pyocin) immunity protein does not share similarity with\ either the colicin E3 or cloacin DF13 PUBMED:6253914 immunity proteins. By contrast,\ the lysis proteins of the ColE2, ColE1 and CloDF13 plasmids are almost\ identical except in the N-terminal regions, which themselves are similar to\ lipoprotein signal peptides PUBMED:3892228. Processing of the ColE2 prolysis protein\ to the mature form is prevented by globomycin, a specific inhibitor of the\ lipoprotein signal peptidase PUBMED:3892228. The mature ColE2 lysis protein is located\ in the cell envelope PUBMED:3892228.

    \ \ \ 3220 IPR001677 \

    Bacterial transferrin binding proteins act as transferrin receptors and are required for transferrin utilisation. Transferrins are iron-binding glycoproteins that control the level of free iron in biological fluids.

    \ 6215 IPR009432 \

    This family consists of several eukaryotic proteins of unknown function.

    \ 8126 IPR013261 \

    TIM21 interacts with the outer mitochondrial TOM complex and promotes the insertion of proteins into the inner mitochondrial membrane PUBMED:15797382.

    \ 2848 IPR008144 \

    Guanylate kinase (GK) PUBMED:1314905 catalyzes the ATP-dependent phosphorylation of GMP into GDP.\ It is essential for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes\ (such as yeast) and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK\ has been shown PUBMED:1310897, PUBMED:8097461, PUBMED:1329277 to be structurally similar to protein A57R (or SalG2R)\ from various strains of Vaccinia virus.

    \

    Proteins containing one or more copies of the DHR domain, an SH3 domain as well as a C-terminal GK-like\ domain, are collectively termed MAGUKs (membrane-associated guanylate kinase homologs) PUBMED:8155583, and\ include Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1); mammalian tight junction\ protein Zo-1; a family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of\ NMDA receptor subunits (SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102); vertebrate 55 kDa erythrocyte\ membrane protein (p55); C. elegans protein lin-2; rat protein CASK; and human proteins DLG2 and DLG3. There is\ an ATP-binding site (P-loop) in the N-terminal section of GK, which is not conserved in the GK-like domain of\ the above proteins. However, these proteins retain the residues known, in GK, to be involved in the binding of\ GMP.

    \ 639 IPR006964 \

    This domain represents the C-terminal conserved region of NUDE proteins. Aspergillus nidulans NUDE acts in the cytoplasmic dynein/dynactin pathway and is required for distribution of nuclei PUBMED:11509576. It is a homologue of the nuclear distribution protein RO11 of Neurospora crassa. NUDE interacts with NUDF via an N-terminal coiled coil domain; this is the only domain that is absolutely required for NUDE function.

    \ \ \ \ 475 IPR005213 \

    This short (30 amino acids) repeat is found in a number of plant proteins. It contains a conserved HGWP motif, hence its name. The function of these proteins is unknown.

    \ 6097 IPR009372 \

    This family consists of several short Chordopoxvirus proteins of unknown function.

    \ 1935 IPR003863 \ This domain is a region found in several Arabidopsis thaliana hypothetical proteins none of which have any known function. The aligned region contains two cysteine residues.\ 1629 IPR003449 \ This is a family of proteins from coronavirus which may function in the formation of membrane-bound replication complexes or in viral assembly.\ 6795 IPR009717 \

    This entry represents the C terminus (approximately 80 residues) of a number of bacterial Mo-dependent nitrogenases. These are involved in nitrogen fixation in cyanobacteria PUBMED:7568132.

    \ 5969 IPR009312 \

    This entry represents a tail fibre component U of bacteriophage.

    \ 6191 IPR010471 \

    This family consists of several hypothetical plant proteins from Arabidopsis thaliana and Oryza sativa. The function of this family is unknown.

    \ 6804 IPR010726 \

    This entry represents a conserved region approximately 80 residues long within a number of proteins of unknown function that seem to be specific to Caenorhabditis elegans. Some proteins contain more than one copy of this region.

    \ 1496 IPR002159 \

    CD36 is a transmembrane, highly glycosylated, 88 kDa glycoprotein expressed by monocytes, macrophages, platelets, microvascular endothelial \ cells and adipose tissues. Platelet glycoprotein IV (GP IV) (GPIIIb) (CD36 antigen) is also called GPIV, OKM5-antigen or PASIV. CD36 recognizes oxidized low-density lipoprotein, long-chain fatty acids, anionic phospholipids, collagen types I, IV and V, thrombospondin (TSP) and Plasmodium falciparum-infected erythrocytes. Recognition of apoptotic neutrophils occurs in cooperation with TSP and alpha-v beta-3 integrin.\ Other ligands may remain to be identified.

    \ \

    CD36 acts as a scavenger receptor for oxidized LDL and shed photoreceptor outer segments, functions in the recognition and phagocytosis of apoptotic cells, and serves as a cell adhesion molecule in platelet adhesion and aggregation and in platelet-monocyte and platelet-tumor cell\ interactions PUBMED:9478926.

    \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ 5976 IPR009316 \

    The COG complex comprises eight proteins COG1-8. The COG complex plays critical roles in Golgi structure and function PUBMED:11980916.

    \ 2184 IPR007553 \ This is a family of uncharacterised bacterial proteins.\ 4192 IPR002672 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L28e forms part of the 60S ribosomal subunit PUBMED:1840484. This family is found in eukaryotes. In rat there are 9 or 10 copies of the L28 gene. The L28 protein contains a possible internal duplication of 9 residues PUBMED:2207170.

    \ 3155 IPR002049 \ Laminins PUBMED:2404817 are the major noncollagenous components of basement membranes\ that mediate cell adhesion, growth, migration, and differentiation. They are\ composed of distinct but related alpha, beta and gamma chains. The three\ chains form a cross-shaped molecule that consists of a long arm and three short\ globular arms. The long arm consists of a coiled-coil structure contributed by\ all three chains and cross-linked by interchain disulphide bonds.\ Besides different types of globular domains, each subunit contains, in its first\ half, consecutive repeats of about 60 amino acids in length that include eight\ conserved cysteines PUBMED:2666164. The tertiary structure PUBMED:8648630, PUBMED:8648631 of this domain is\ remotely similar in its N-terminal region to that of the EGF-like module (see ). It is known as an 'LE' or 'laminin-type EGF-like' domain. The\ number of copies of the LE domain in the different forms of laminins is highly\ variable; from 3 up to 22 copies have been found.\ A schematic representation of the topology of the four disulphide bonds in\ the LE domain is shown below.\ \
    \
             +-------------------+\
           +-|-----------+       |  +--------+  +-----------------+\
           | |           |       |  |        |  |                 |\
         xxCxCxxxxxxxxxxxCxxxxxxxCxxCxxxxxGxxCxxCxxgaagxxxxxxxxxxxCxx\
           sssssssssssssssssssssssssssssssssss\
    \
    'C': conserved cysteine involved in a disulphide bond\
    'a': conserved aromatic residue\
    'G': conserved glycine (lower case = less conserved)\
    's': region similar to the EGF-like domain\
    
    \ In mouse laminin gamma-1 chain, the seventh LE domain has been shown to be the\ only one that binds with a high affinity to nidogen PUBMED:7781764. The binding-sites are\ located on the surface within the loops C1-C3 and C5-C6 PUBMED:8648630, PUBMED:8648631. Long\ consecutive arrays of LE domains in laminins form rod-like elements of limited\ flexibility PUBMED:2404817, which determine the spacing in the formation of laminin\ networks of basement membranes PUBMED:8349613.\ 6049 IPR009349 \

    This zinc finger appears to be common to activating signal cointegrator 1/thyroid receptor-interacting protein 4.

    \ 1658 IPR008273 \ This entry defines the N-terminal region of various retinaldehyde/retinal-binding proteins that may be\ functional components of the visual cycle. Cellular retinaldehyde-binding protein (CRALBP) carries 11-cis-retinol or 11-cis-retinaldehyde as endogenous ligands and may function as a substrate carrier protein that modulates interaction of these retinoids with visual cycle enzymes PUBMED:1715867. \ The multidomain protein Trio binds the LAR transmembrane tyrosine phosphatase, contains a protein kinase domain, and has separate rac-specific and rho-specific guanine nucleotide exchange factor domains PUBMED:8643598. Trio is a multifunctional protein that integrates and amplifies signals involved in coordinating actin remodeling, which is necessary for cell migration and growth.\

    Other members of the family include \ a guanine nucleotide exchange factor that may \ function as an effector of RAC1; the phosphatidylinositol/phosphatidylcholine transfer \ protein, which is required for the transport of secretory proteins from the Golgi\ complex; and the alpha-tocopherol transfer protein, which enhances the transfer of its \ ligand between separate membranes.

    \ 7727 IPR012875 \

    The members of this family are sequences derived from hypothetical eukaryotic and bacterial proteins. The region in question is approximately 60 residues long.

    \ 4385 IPR000904 \ The SEC7 domain was named after the first protein found to contain such a region PUBMED:3042778. \ It has been shown to be linked to guanine nucleotide exchange activity PUBMED:9072969, PUBMED:9442017.\ The 3D structure of the domain displays several alpha-helices PUBMED:9653114. It was found to be \ associated with other domains involved in guanine nucleotide exchange (e.g., CDC25, Dbl) in mammalian \ factors PUBMED:9868368.\ 104 IPR005137 \

    Photosystem I (PSI) is a large protein complex embedded within the photosynthetic thylakoid membrane. It consists of 11 subunits, ~100 chlorophyll a molecules, 2 phylloquinones, and 3 Fe4S4 clusters. The three-dimensional structure of the PSI complex has been resolved at 2.5 Å PUBMED:11418848, which allows the precise localisation of each cofactor. PSI together with photosystem II (PSII) catalyses the light-induced steps in oxygenic photosynthesis, a process found in cyanobacteria, eukaryotic algae (e.g. red algae, green algae) and higher plants.

    \

    To date, three thylakoid proteins involved in the stable accumulation of PSI have been identified: BtpA PUBMED:9045660, Ycf3 PUBMED:9321389, PUBMED:9314531, and Ycf4 () PUBMED:9321389. Because translation of the psaA and psaB mRNAs, which encode the two PSI reaction centre polypeptides, is not affected in mutant strains lacking functional ycf3 and ycf4, the products of these two genes appear to act at a post-translational step of PSI biosynthesis.\ These gene products are therefore involved either in the stabilisation or in the assembly of the PSI complex. However, their exact roles remain unknown. The BtpA protein appears to act at the level of PSI stabilisation PUBMED:10806238. It is an extrinsic membrane protein located on the cytoplasmic side of the thylakoid membrane PUBMED:10103064, PUBMED:10806238. Homologs of BtpA are found in the crenarchaeota and euryarchaeota, where their function remains unknown. The Ycf4 protein is firmly associated with the thylakoid membrane, presumably through a transmembrane domain PUBMED:9321389. Ycf4 co-fractionates with a protein complex larger than PSI upon sucrose density gradient centrifugation of solubilised thylakoids PUBMED:9321389. The Ycf3 protein is loosely associated with the thylakoid membrane and can be released from the membrane with sodium carbonate. This suggests that Ycf3 is not part of a stable complex and that it probably interacts transiently with its partners PUBMED:11752384. Ycf3 contains a number of tetratricopeptide repeats (TPR, ); TPR is a structural motif present in a wide range of proteins, which mediates protein-protein interactions.

    \ \ 4224 IPR001848 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Evidence suggests that, in prokaryotes, the peptidyl transferase reaction is performed by the large subunit 23S rRNA, whereas proteins probably have a greater role in eukaryotic ribosomes. Most of the proteins lie close to, or on the surface of, the 30S subunit, arranged peripherally around the rRNA PUBMED:9281425. The small subunit ribosomal proteins can be categorised as primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins.

    \

    The small ribosomal subunit protein S10 consists of about 100 amino acid residues. In Escherichia coli, S10 is involved in binding tRNA to the ribosome, and also operates as a transcriptional elongation factor PUBMED:8021936. Experimental evidence PUBMED:9371771 has revealed that S10 has virtually no groups exposed on the ribosomal surface, and is one of the "split proteins": these are a discrete group that are selectively removed from 30S subunits under low salt conditions and are required for the formation of activated 30S reconstitution intermediate (RI*) particles. S10 belongs to a family of proteins PUBMED:2179947 that includes: bacteria S10; algal chloroplast S10; cyanelle S10; archaebacterial S10; Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10; Arabidopsis thaliana mitochondrial S10 (nuclear encoded); vertebrate S20; plant S20; and yeast URP2.

    \ 5390 IPR008892 \ This family consists of several WCOR413-like plant cold acclimation proteins.\ 6027 IPR009341 \

    These proteins, whose function is unknown, are found in phages of Gram-positive bacteria.

    \ 7275 IPR008986 \

    Ebola virus is a non-segmented, negative-strand RNA virus that causes severe haemorrhagic fever in humans with high rates of mortality. The Ebola virus matrix protein VP40 is a major structural protein that plays a central role in virus assembly and budding at the plasma membrane of infected cells. VP40 proteins associate with cellular membranes, interact with the cytoplasmic tails of glycoproteins, and bind to the ribonucleoprotein complex. The VP40 monomer consists of two domains, the N-terminal oligomerization domain and the C-terminal membrane-binding domain, connected by a flexible linker. Both the N- and C-terminal domains fold into beta sandwich structures of similar topology PUBMED:10944105. Within the N-terminal domain are two overlapping L-domains with the sequences PTAP and PPEY at residues 7 to 13, which are required for efficient budding PUBMED:12559917. L-domains are thought to mediate their function in budding through their interaction with specific host cellular proteins, such as tsg101 and vps-4 PUBMED:12525615.
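
Because the PTAP and PPEY late-domain motifs are short fixed sequences, locating them is a plain substring search. The Python sketch below is illustrative only. `find_late_domains` is a hypothetical helper. The fragment is a made-up sequence, built so that PTAP occupies residues 7-10 and PPEY occupies residues 10-13, matching the overlap described above; it is not asserted to be the real VP40 sequence. Positions are reported 1-based and inclusive.

```python
import re

def find_late_domains(seq: str) -> list[tuple[str, int, int]]:
    """Return (motif, start, end) for each PTAP/PPEY occurrence, 1-based inclusive."""
    hits = []
    for motif in ("PTAP", "PPEY"):
        # A zero-width lookahead also reports overlapping copies of the same motif.
        for m in re.finditer(f"(?={motif})", seq.upper()):
            start = m.start() + 1
            hits.append((motif, start, start + len(motif) - 1))
    # Order hits by their position along the sequence.
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical fragment (not taken from a sequence database).
fragment = "MRRVILPTAPPEYMEA"
print(find_late_domains(fragment))  # [('PTAP', 7, 10), ('PPEY', 10, 13)]
```

The two motifs overlap because they share the proline at residue 10. Finding a motif this way shows only that the sequence is present; it says nothing about whether that copy functions in budding.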

    \ \ 1002 IPR004910 \

    The Drosophila gene, Yippee, reveals a novel family of putative zinc binding proteins highly conserved among eukaryotes.

    \ 2745 IPR001764 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \ Glycoside hydrolase family 3 comprises enzymes with a number of known activities: beta-glucosidase (); beta-xylosidase (); N-acetyl beta-glucosaminidase (); glucan\ beta-1,3-glucosidase (); cellodextrinase (); exo-1,3-1,4-glucanase ().\ \ These enzymes are two-domain globular proteins that are N-glycosylated at three sites PUBMED:10368285. This domain is often\ N-terminal to the glycoside hydrolase family 3 C-terminal domain. \ 1166 IPR006948 \ Allicin is a thiosulphinate that gives rise to dithiines, allyl sulphides and ajoenes, the three groups of active compounds in Allium species. Allicin is synthesised from sulphoxide cysteine derivatives by alliinase, whose C-S lyase activity cleaves C(beta)-S(gamma) bonds. It is thought that this enzyme forms part of a primitive plant defence system PUBMED:12235163.\ 6489 IPR009550 \

    This family represents a conserved region within Agrobacterium tumefaciens VirE3. Agrobacterium tumefaciens (a plant pathogen) has a tumour-inducing (Ti) plasmid, part of which, the transfer (T) region, is transferred to plant cells during the infection process. Vir proteins mediate the processing of the T-region and the transfer of a single-stranded (ss) DNA copy of this region, the T-strand, into the recipient cells. VirE3 is a translocated effector protein, but its specific role has not been established PUBMED:12560481.

    \ 1980 IPR005102 \

    The structure of this module is known PUBMED:11080456 and consists of an Ig-like fold. The function of this domain is unknown, but it might be involved in mediating interactions with carbohydrates.

    \ 7360 IPR011081 \

    This entry represents bacterial domains with an Ig-like fold. These domains are found in a variety of bacterial surface proteins.

    \ 5488 IPR008525 \ This family consists of several proteins of unknown function from Coxiella burnetii (the causative agent of the zoonotic disease Q fever).\ 1641 IPR000859 \ The CUB domain is an extracellular domain of approximately 110 residues which is found in functionally \ diverse, mostly developmentally regulated proteins PUBMED:8510165, PUBMED:2026272 and in peptidases belonging to MEROPS peptidase families M12A (astacin) and S1A (chymotrypsin). Almost all CUB domains \ contain four conserved cysteines which probably form two disulphide bridges (C1-C2, C3-C4). The structure \ of the CUB domain has been predicted to be a beta-barrel similar to that of immunoglobulins. Proteins \ that have been found to contain the CUB domain include mammalian complement subcomponents C1s/C1r, which \ form the calcium-dependent complex C1, the first component of the classical pathway of the complement \ system; hamster serine protease Casp, which degrades type I and IV collagen and fibronectin in the \ presence of calcium; mammalian complement-activating component of Ra-reactive factor (RARF), a protease \ that cleaves the C4 component of complement; vertebrate enteropeptidase (), a type II \ membrane protein of the intestinal brush border, which activates trypsinogen; vertebrate bone \ morphogenetic protein 1 (BMP-1), a protein which induces cartilage and bone formation and expresses \ metalloendopeptidase activity; sea urchin blastula proteins BP10 and SpAN; Caenorhabditis elegans \ hypothetical proteins F42A10.8 and R151.5; neuropilin (A5 antigen), a calcium-independent cell adhesion \ molecule that functions during the formation of certain neuronal circuits; fibropellins I and III from \ sea urchin; mammalian hyaluronate-binding protein TSG-6 (or PS4), a serum- and growth factor-induced \ protein; mammalian spermadhesins; and Xenopus embryonic protein UVS.2, which is expressed during \ dorsoanterior development.\ 4198 IPR002150 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. L31 is a protein of 66 to 97 amino-acid residues which has only been found so far in bacteria and in some plant and algal chloroplasts.

    \ 6885 IPR009767 \

    This family represents a conserved region approximately 130 residues long within the bacterial DNA helicase TraI. TraI is a bifunctional protein that catalyses the unwinding of duplex DNA and also acts as a sequence-specific DNA trans-esterase, providing the site- and strand-specific nick required to initiate DNA transfer PUBMED:11054423.

    \ 458 IPR005114 \ This short domain is found in multiple copies in bacterial helicase proteins. The domain is predicted to contain 3 alpha helices. The function of this domain may be to bind nucleic acid.\ 4719 IPR005498 \ Although not essential for conjugation, the TrbI protein greatly increases conjugational efficiency PUBMED:8763954.\ 1060 IPR005128 \ Alpha-acetolactate decarboxylase plays a dual role in the\ cell: (i) it catalyzes the second step of the acetoin pathway, \ and thus potentially regulates the internal pH of cells,\ and (ii) it controls the pool of alpha-acetolactate during leucine\ and valine synthesis.\ 6739 IPR009686 \

    This family contains a number of plant senescence-associated proteins of approximately 450 residues in length. In Hemerocallis, petals have a genetically based program that leads to senescence and cell death approximately 24 hours after the flower opens, and senescence proteins produced around that time are believed to have a role in this program PUBMED:10412903.

    \ 6268 IPR009052 \

    DNA polymerase III holoenzyme is the primary enzyme responsible for replication of Escherichia coli chromosomal DNA. The holoenzyme consists of 17 proteins and contains two core polymerases. The polymerase III catalytic core has three tightly associated subunits: alpha, epsilon and theta. The alpha subunit is responsible for the DNA polymerase activity, while the epsilon subunit is the 3'-5' proofreading exonuclease. The epsilon subunit binds to both the alpha and theta subunits in the linear order alpha-epsilon-theta. The theta subunit is the smallest, and may act to enhance the proofreading activity of epsilon, although its function remains elusive. The fold of the theta subunit appears similar to the chaperone J-domain PUBMED:10794414.

    \ \ 1454 IPR004986 \ The gene III product (P15) of cauliflower mosaic virus (CaMV) is a DNA binding protein in which the DNA binding activity\ is located on its C-terminal part. A family of related proteins is expressed by other members of the Caulimoviridae.\ 4530 IPR007882 \

    Neurons contain abundant subsets of highly stable microtubules that resist de-polymerising conditions such as exposure to the cold. Stable microtubules are thought to be essential for neuronal development, maintenance, and function. STOP is a major factor responsible for the intriguing stability properties of neuronal microtubules and is important for synaptic plasticity. STOPs (for stable tubule only polypeptides) are calmodulin-binding and calmodulin-regulated proteins which, in mammals, are encoded by a single gene but exhibit\ substantial cell specific variability due to mRNA splicing and alternative promoter use. STOP microtubule stabilising activity has been ascribed to two classes of new bifunctional calmodulin- and\ microtubule-binding motifs, with distinct microtubule binding properties in vivo. STOPs seem to be restricted to vertebrates and are composed of a conserved domain split by the apparent insertion of\ variable sequences that are completely unrelated among species PUBMED:14567673.

    N-STOP (for neuronal adult STOP) contains two repeat domains. The central repeat domain is composed of five repeated sequences of 46\ amino acids. These sequences are almost completely identical, exhibiting an unusual degree of conservation of the repeat motif compared to repeated sequences in other microtubule-associated proteins. The\ carboxy-terminal repeat domain is composed of 28 imperfect repeats of an 11-amino-acid consensus sequence. Upstream of the carboxy-terminal repeat domain, rat N-STOP contains a highly basic sequence (called\ the "KR domain" after its high lysine and arginine content) and a so-called "linker domain" located between the central repeat domain and the KR domain. To date, two splicing variants of STOP, E-STOP and F-STOP, have been characterised in rodents. Knowledge of STOP function and properties may help to improve neuroleptic treatment of illnesses such as schizophrenia, which are currently thought to result from synaptic defects PUBMED:12231625.

    \ 3218 IPR000680 \ This family of lipoproteins is found in Borrelia spirochetes. The function of these proteins is\ uncertain, but it may serve to avoid the host immune response by changing from one surface\ exposed variable major outer membrane lipoprotein to another.\ 4387 IPR003708 \

    Secretion across the inner membrane in some Gram-negative bacteria occurs\ via the preprotein translocase pathway. Proteins are produced in the \ cytoplasm as precursors, and require a chaperone subunit to direct them to \ the translocase component PUBMED:2202721. From there, the mature proteins are either \ targeted to the outer membrane, or remain as periplasmic proteins. The \ translocase protein subunits are encoded on the bacterial chromosome.

    \

    \ The translocase itself comprises 7 proteins, including a chaperone protein\ (SecB), an ATPase (SecA), an integral membrane complex (SecCY, SecE and \ SecG), and two additional membrane proteins that promote the release of \ the mature peptide into the periplasm (SecD and SecF) PUBMED:2202721. The chaperone \ protein SecB PUBMED:11336818 is a highly acidic homotetrameric protein that exists\ as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains \ preproteins in an unfolded state after translation, and targets these to \ the peripheral membrane protein ATPase SecA for secretion PUBMED:10418149.

    \

    \ Recently, the tertiary structure of Haemophilus influenzae SecB () was resolved\ by means of X-ray crystallography to 2.5 Å PUBMED:11101901. The chaperone comprises four\ chains, forming a tetramer, each chain of which has a simple alpha+beta fold\ arrangement. While one binding site on the homotetramer recognises unfolded\ polypeptides by hydrophobic interactions, the second binds to SecA through\ the latter's C-terminal 22 residues.

    \ 1148 IPR006819 \ The virD operon in Agrobacterium encodes a site-specific endonuclease and a number of other poorly characterised products. This family represents the VirD5 protein.\ 2505 IPR007167 \ This family includes FeoA, a small protein probably involved in Fe2+ transport PUBMED:8407793.\ 914 IPR000672 \ Enzymes that participate in the transfer of one-carbon units require the coenzyme tetrahydrofolate (THF).\ Various reactions generate one-carbon derivatives of THF, which can be interconverted between different\ oxidation states by methylene-THF dehydrogenase (), methenyl-THF cyclohydrolase ()\ and formyl-THF synthetase () PUBMED:2541774, PUBMED:8485162. The dehydrogenase and cyclohydrolase\ activities are expressed by a variety of multifunctional enzymes, including the trifunctional eukaryotic\ C1-tetrahydrofolate synthase PUBMED:2541774; a bifunctional eukaryotic mitochondrial protein; and the\ bifunctional Escherichia coli folD protein PUBMED:2541774, PUBMED:8485162. Methylene-tetrahydrofolate dehydrogenase and\ methenyltetrahydrofolate cyclohydrolase share an overlapping active site PUBMED:2541774, and as such are\ usually located together in proteins, acting in tandem on the carbon-nitrogen bonds of substrates other\ than peptide bonds.\ 1472 IPR003339 \ Cobalt transport proteins are most often found in cobalamin (vitamin B12)\ biosynthesis operons. Salmonella typhimurium synthesizes cobalamin (vitamin B12) de novo under anaerobic conditions. Not all Salmonella and Pseudomonas cobalamin synthetic genes have apparent homologs in the other species, suggesting that the cobalamin biosynthetic pathways differ between the two organisms PUBMED:8501034.\ 8008 IPR012611 \

    This family consists of the small acid-soluble spore proteins (SASP) belonging to the K type (sspK). The SspK proteins are unique to the spores of Bacillus subtilis and are expressed only in the forespore compartment of sporulating cells of this organism. The sspK gene is monocistronic, and transcription is primarily by RNA polymerase carrying the forespore-specific sigma factor sigma-G. Deletion of sspK results in loss of SspK from the spore but has no discernible effect on sporulation, spore properties or spore germination PUBMED:10806362.

    \ 7529 IPR011621 \ These bacterial 7TM receptor proteins have an intracellular domain . This entry corresponds to the 7 helix transmembrane domain. These proteins also contain an N-terminal extracellular domain.\ 7040 IPR009860 \

    This family consists of several phage associated hyaluronidase proteins () which seem to be specific to Streptococcus pyogenes and Streptococcus pyogenes bacteriophages. The substrate of hyaluronidase is hyaluronic acid, a sugar polymer composed of alternating N-acetylglucosamine and glucuronic acid residues. Hyaluronic acid is found in the ground substance of human connective tissue and the vitreous of the eye and also is the sole component of the capsule of group A streptococci. The capsule has been shown to be an important virulence factor of this organism by virtue of its ability to resist phagocytosis. Production by S. pyogenes of both a hyaluronic acid capsule and hyaluronidase enzymatic activity capable of destroying the capsule is an interesting, yet-unexplained, phenomenon PUBMED:7622224.

    \ 3807 IPR005133 \

    This is a family of small, transmembrane proteins believed to be\ components of Na+/H+ and K+/H+ antiporters. Members, including proteins designated\ MnhG from Staphylococcus aureus and PhaG from Rhizobium meliloti, show some\ similarity to chain L of the NADH dehydrogenase I, which also translocates protons.

    \ 5033 IPR007343 \

    Members of this family of bacterial proteins are described as hypothetical proteins or zinc metallopeptidases. The majority have an HExxH zinc-binding motif characteristic of neutral zinc metallopeptidases; however, there is no evidence to support their function as metallopeptidases.
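
A regular expression can check a sequence for the HExxH motif. The sketch below assumes plain one-letter amino-acid strings. `hexxh_sites` and the example sequence are hypothetical. As the entry notes, a match only shows that the motif is present; it does not show that the protein is an active metallopeptidase.

```python
import re

# H, E, any two residues, H; the lookahead lets overlapping matches be reported.
HEXXH = re.compile(r"(?=HE[A-Z]{2}H)")

def hexxh_sites(seq: str) -> list[int]:
    """Return the 1-based start position of every HExxH match in seq."""
    return [m.start() + 1 for m in HEXXH.finditer(seq.upper())]

# Hypothetical sequence containing two HExxH copies.
print(hexxh_sites("MAHEFGHLLHEAAHK"))  # [3, 10]
```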

    \ 3111 IPR002611 \ These proteins contain an ATP/GTP-binding P-loop motif. They are found \ associated with IS21 family insertion sequences PUBMED:7698671. Functionally they have not been characterized, but they may be involved in transposition PUBMED:9141667.\ 1385 IPR004284 \ Birnaviruses are dsRNA viruses. Non-structural protein VP5 is encoded on RNA segment A. The function of this small viral\ protein is unknown.\ 8109 IPR006583 \

    CW is a domain associated with a number of Caenorhabditis elegans\ hypothetical proteins.

    \ 4729 IPR003445 \ This family consists of various potassium transport proteins (Trk) and V-type sodium ATP synthase subunit J or translocating ATPase J (). These proteins are involved in active sodium uptake, utilizing ATP in the process. TrkH from Escherichia coli is a hydrophobic membrane protein and determines the specificity and kinetics of cation transport by the Trk system in this organism PUBMED:7896723. This protein interacts with TrkA and requires TrkE for transport activity.\ 7020 IPR009847 \

    This family consists of several mammalian SNRPN upstream reading frame (SNURF) proteins. SNURF or RPF4 is a RING-finger protein and a coregulator of androgen receptor-dependent transcription. It has been suggested that SNURF is involved in the regulation of processes required for late steps of spermatid maturation PUBMED:12351196, PUBMED:12874792.

    \ 7439 IPR011468 \

    This is a family of hypothetical proteins found in Leptospira interrogans.

    \ 3909 IPR002507 \ Reovirus nonstructural protein sigma NS exhibits a ssRNA-binding activity and is thought to be involved in assembling the reovirus mRNAs for genome replication and virion morphogenesis. Various studies have been carried out to localize the RNA-binding site PUBMED:9343167. They suggest that the first 11 amino acids of sigma NS, which are predicted to form an amphipathic alpha-helix, are important for both ssRNA binding and formation of complexes larger than 7-9 S.\

    A number of other studies have attempted to identify and characterise the RNA-binding activities of sigma NS. A study of the avian reovirus sigma NS protein suggests that it binds to single-stranded RNA in a nucleotide sequence non-specific manner and is functionally similar to its counterpart specified by mammalian reovirus PUBMED:9634083.

    \ 2467 IPR003753 \

    Exonuclease VII is composed of two types of nonidentical subunit: one large subunit and four small ones PUBMED:6284744.\ Exonuclease VII catalyses exonucleolytic cleavage in\ either the 5'-3' or the 3'-5' direction to yield 5'-phosphomononucleotides. At its N-terminus, the large subunit also contains the OB-fold domain () that binds nucleic acids.

    \ 3573 IPR006665 \

    Most of the bacterial outer membrane proteins in this group are porin-like integral membrane proteins (such as ompA) PUBMED:2202726, but some are small lipid-anchored proteins (such as pal) PUBMED:10515919. It is also found in MotB and related proteins. They are present in the outer membrane of many Gram-negative organisms PUBMED:1538702. This domain is found at the C-terminal half of these proteins and is well conserved. The N-terminal half is variable although some of the proteins in this group have the OmpA-like transmembrane domain at the N terminus.

    \ 503 IPR006921 \

    This domain, primarily C-terminal, is found in a family of proteins thought to be involved in regulating gene activity in the proliferative and/or differentiative pathways induced by NGF PUBMED:9722946.

    \ 951 IPR000007 \

    Tubby, an autosomal recessive mutation mapping to mouse chromosome 7, was recently found to be the result of a splicing defect in a novel gene of unknown function. This mutation maps to the tub gene PUBMED:8612280, PUBMED:8606774. The mouse tubby mutation causes maturity-onset obesity, insulin resistance and sensory deficits. In contrast to the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase in body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes such as Alstrom and Bardet-Biedl. Although these phenotypes indicate a vital role for tubby proteins, no biochemical function has yet been ascribed to any family member PUBMED:10591637. It has, however, been suggested that the phenotypic features of tubby mice may result from cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins (TULPs). TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that indicate their importance in nervous-system function and development PUBMED:14708010.

    \ \

    Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor in sequence, but in TUB it contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution. This domain is arranged as a 12-stranded, all anti-parallel, closed beta-barrel that surrounds a central alpha helix (located at the extreme carboxyl terminus of the protein), which forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors PUBMED:10591637.\

    \ 4509 IPR000691 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    Bacteria of the genus Streptomyces produce a number of proteinase inhibitors, which belong to MEROPS inhibitor family I16, clan IY. They are characterised by their strong activity towards subtilisin (MEROPS peptidase family S8, ) and are collectively known as Streptomyces subtilisin inhibitors (SSI). Some SSI also inhibit trypsin, chymotrypsin (MEROPS peptidase family S1, ) and griselysin (MEROPS peptidase family M4, ) PUBMED:14705960. Mutation of the active site residue can influence inhibition specificity PUBMED:1908859. SSI is a homodimer, each monomer containing two anti-parallel beta-sheets and two short alpha-helices. Protease binding induces the widening of a channel-like structure, in which hydrophobic side-chains are sandwiched between two lobes PUBMED:6387152. Loss of the C-terminal tetrapeptide VFAF drastically reduces the inhibitory effect of the proteins when less than one molecule of inhibitor is present per molecule of enzyme. This implies that the tetrapeptide is necessary to maintain the correct 3D fold PUBMED:6993452. Structural similarities between the primary and secondary contact loops of SSI and those of the ovomucoid and pancreatic secretory trypsin inhibitor family suggest that the two families evolved from a common ancestor PUBMED:6387152.

    \ \ \ 7956 IPR012970 \

    This family consists of a group of secreted bacterial lyase enzymes () capable of acting on hyaluronan and chondroitin in the extracellular matrix of host tissues, contributing to the invasive capacity of the pathogen.

    \ 7003 IPR009837 \

    This family represents a conserved region approximately 180 residues long within osteoregulin, a bone-remodelling protein expressed highly in osteocytes within trabecular and cortical bone. A conserved RGD motif is found towards the C-terminal end of this region, and this is potentially involved in integrin recognition PUBMED:10967096.
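    Checking that an RGD motif lies towards the C-terminal end of a region is a simple substring scan. The following is a minimal sketch. The function names are hypothetical, and the example region is invented, not osteoregulin.

```python
def motif_positions(seq: str, motif: str = "RGD") -> list[int]:
    """Return 1-based start positions of every occurrence of `motif` in `seq`."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = seq.find(motif, start + 1)  # step by one so overlaps are kept
    return hits


def relative_positions(seq: str, motif: str = "RGD") -> list[float]:
    """Position of each hit as a fraction of the length (1.0 = C-terminal end)."""
    return [pos / len(seq) for pos in motif_positions(seq, motif)]


region = "MKTAYSPQLNESGRGDSL"  # hypothetical region
print(motif_positions(region))                            # [14]
print([round(x, 2) for x in relative_positions(region)])  # [0.78]
```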

    \ 4449 IPR007450 \

    This is a bacterial outer membrane lipoprotein, possibly involved in maintaining the structural integrity of the cell envelope PUBMED:9973334. The lipid attachment site is a conserved N-terminal cysteine residue sometimes found adjacent to the OmpA domain ().

    \ 3747 IPR001333 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
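    The loose HEXXH motif and the stricter abXHEbbHbc pattern translate naturally into regular expressions. This is a minimal sketch with these assumptions:
    - The exact residue sets chosen for 'b' (uncharged) and 'c' (hydrophobic) are my own.
    - 'a' is restricted to V or T, although the text only says "most often".
    - Proline is excluded from every position.

    The test sequence is hypothetical.

```python
import re

# Residue classes are assumptions for illustration. The text only states that
# 'b' is uncharged, 'c' is hydrophobic, and proline never occurs in the site.
UNCHARGED = "ACFGILMNQSTVWY"          # 'b': excludes D, E, H, K, R and P
HYDROPHOBIC = "ACFILMVWY"             # 'c'
NOT_PROLINE = "ACDEFGHIKLMNQRSTVWY"   # 'X'

# Lookaheads let overlapping matches be reported.
HEXXH = re.compile(rf"(?=HE[{NOT_PROLINE}][{NOT_PROLINE}]H)")
# 'a' is restricted to V or T here, although the text says only "most often".
STRINGENT = re.compile(
    rf"(?=[VT][{UNCHARGED}][{NOT_PROLINE}]HE[{UNCHARGED}][{UNCHARGED}]"
    rf"H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def scan_zinc_motifs(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of loose and stringent motif matches."""
    seq = seq.upper()
    return {
        "HEXXH": [m.start() + 1 for m in HEXXH.finditer(seq)],
        "abXHEbbHbc": [m.start() + 1 for m in STRINGENT.finditer(seq)],
    }


print(scan_zinc_motifs("GGVAGHEAGHALGG"))  # {'HEXXH': [6], 'abXHEbbHbc': [3]}
print(scan_zinc_motifs("GGVAGHEPGHALGG"))  # proline in the site: no matches
```

    A regex hit marks only a candidate site. As the text notes, the HEXXH motif is common, and confirming a metal-binding site needs structural evidence.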

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
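    The naming rule above can be sketched as a small parser. It is based on the identifier shapes used in this section, such as family M32, subfamily C1B, and clans MA and SM.
    - The helper names and regexes are hypothetical.
    - Sub-clan notation such as MA(E) is not handled.
    - Inhibitor families (I16, I47) are deliberately rejected, because they are not peptidases.

```python
import re

CATALYTIC_TYPES = {
    "A": "aspartic", "C": "cysteine", "G": "glutamic acid", "M": "metallo",
    "S": "serine", "T": "threonine", "U": "unknown",
    "P": "mixed",  # clans containing families of more than one type
}

# Families: type letter + number + optional subfamily letter (e.g. M32, C1B).
FAMILY_RE = re.compile(r"^([ACGMSTU])(\d+)([A-Z]?)$")
# Clans: two letters, the first giving the type (e.g. MA, SM, CA).
CLAN_RE = re.compile(r"^([ACGMSTUP])[A-Z]$")


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by a peptidase family or clan ID."""
    ident = identifier.strip().upper()
    if not (FAMILY_RE.match(ident) or CLAN_RE.match(ident)):
        raise ValueError(f"not a peptidase family or clan identifier: {identifier!r}")
    return CATALYTIC_TYPES[ident[0]]


def nucleophile(identifier: str) -> str:
    """Summarise the nucleophile implied by the catalytic type, per the text."""
    kind = catalytic_type(identifier)
    if kind in ("serine", "threonine", "cysteine"):
        return "amino-acid side chain (acyl intermediate)"
    if kind in ("aspartic", "glutamic acid", "metallo"):
        return "activated water molecule"
    return "not implied by the type letter"


for ident in ("M32", "C1B", "S41", "SM"):
    print(ident, catalytic_type(ident), "-", nucleophile(ident))
```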

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M32 (carboxypeptidase Taq family, clan MA(E)). The predicted active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH.

    \ \

    Carboxypeptidase Taq is a zinc-containing thermostable metallopeptidase. It was originally discovered in and purified from Thermus aquaticus; optimal enzymatic activity occurs at 80 °C. Although very little is known about this enzyme, it is thought either to be associated with a membrane or to be particle-bound PUBMED:.

    \ 3414 IPR007515 \

    Guanine nucleotide exchange factor MSS4 (Rab-interacting factor) is a guanine-nucleotide releasing protein that acts on members of the SEC4/YPT1/RAB subfamily. It stimulates the release of GDP and may play a role in vesicular transport.

    \ 1205 IPR005165 \

    Anthrax bacilli produce a set of three proteins, protective antigen, lethal factor (LF; 90 kDa) and edema factor, which are known collectively as anthrax toxin (ATx). These proteins are nontoxic individually, but act in binary or ternary combinations to produce shock-like symptoms and death. LF is a Zn2+-protease that cleaves several mitogen-activated protein kinase kinases, kills macrophages, and causes death of the host PUBMED:11326092.

    \ 2500 IPR004108 \ Proteins containing this domain may be involved in the mechanism of biological hydrogen activation and contain 4Fe-4S clusters. They can use molecular hydrogen for the reduction of a variety of substances.\ 2197 IPR007495 \ This is a family of putative periplasmic proteins.\ 4718 IPR007665 \ This is a family of proteins known to be involved in conjugal transfer. The TrbF protein is thought to form part of the pilus required for transfer PUBMED:11846762.\ 4079 IPR006910 \ This domain represents a conserved N-terminal region found in eukaryotic cohesins of the Rad21, Rec8 and Scc1 families. Rad21/Rec8-like proteins mediate sister chromatid cohesion during mitosis and meiosis as part of the cohesin complex PUBMED:11687503. Cohesion is necessary for homologous recombination (including double-strand break repair) and correct chromatid segregation. These proteins may also be involved in chromosome condensation. Dissociation at the metaphase to anaphase transition causes loss of cohesion and chromatid segregation PUBMED:10207075.\ 4953 IPR005515 \

    VOMI binds tightly to ovomucin fibrils of the egg yolk membrane. Its structure PUBMED:8131734 consists of three beta-sheets forming Greek key motifs, which are related by an internal pseudo three-fold symmetry. The structure of VOMI is strongly similar to that of the delta-endotoxin, and it also has a carbohydrate-binding site in the top region of the common fold PUBMED:8848836.

    \ 1701 IPR013081 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    Cytochrome b559, which forms part of the reaction centre core of PSII, is a heterodimer composed of one alpha subunit (PsbE), one beta subunit (PsbF) and a haem cofactor. Two histidine residues, one from each subunit, coordinate the haem. Although cytochrome b559 is a redox-active protein, it is unlikely to be involved in primary electron transport in PSII because of its very slow photo-oxidation and photo-reduction kinetics. Instead, cytochrome b559 could participate in a secondary electron transport pathway that helps protect PSII from photo-damage. Cytochrome b559 is essential for PSII assembly PUBMED:12560096.

    \ \

    This domain occurs in both the alpha and beta subunits of cytochrome b559. In the alpha subunit it occurs together with a lumenal domain (), while in the beta subunit it occurs on its own.

    \ 7419 IPR011451 \

    This entry represents a cluster of homologous proteins identified in Leptospira interrogans. One member () has been predicted to be a phenazine biosynthesis family protein.

    \ 1633 IPR007878 \

    Members have a phosphoesterase module (2H) PUBMED:12466548 and are predicted to be involved in RNA modification. The viral group of 2H phosphoesterases contains proteins from two unrelated virus types: the type C rotaviruses (VP3 protein, ), which are double-stranded multipartite RNA viruses, and the coronaviruses (NS2 protein, this group), which are positive-strand RNA viruses. Given that these viruses have vertebrate hosts, it is likely that the 2H phosphoesterase domain was derived from the host by one of the virus groups, followed by rapid sequence divergence PUBMED:12466548. Subsequently, it may have been exchanged between the viral families. Although the direction of the exchange is not clear, it is possible that a double-stranded replicative form of a subgenomic RNA transcript of the coronavirus NS2 was stabilised by a rotavirus and incorporated into its multipartite double-stranded RNA genome PUBMED:12466548. Because of their predicted role in RNA modification, these proteins may be useful as novel drug targets.

    \ 3974 IPR007579 \

    Poxvirus T4 protein is thought to be retained in the endoplasmic reticulum. M-T4 of myxoma virus () is thought to protect infected lymphocytes from apoptosis and modulate the inflammatory response to virus infection PUBMED:10544103. The N terminus is .

    \ 2757 IPR005192 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    This is a family of dextranase () and isopullulanase () which are all members of glycoside hydrolase family 49 (). Dextranase hydrolyses alpha-1,6-glycosidic bonds in dextran polymers.

    \ 680 IPR004134 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of proteins belongs to MEROPS peptidase family C1, subfamily C1B (bleomycin hydrolase, clan CA). This family contains prokaryotic and eukaryotic aminopeptidases and bleomycin hydrolases.

    \ 964 IPR006214 \

    This family of proteins of unknown function contains a subset of Bax inhibitor-1 proteins.

    \ 2779 IPR004310 \ This protein is encoded by ORF3 of equine arteritis virus. Its function is unknown.\ 668 IPR000717 \ A homology domain of unclear function occurs in the C-terminal region of several regulatory components of the 26S proteasome as well as in other proteins. This domain has also been called the PINT motif (Proteasome, Int-6, Nip-1 and TRIP-15) PUBMED:9644972. Apparently, all of the characterized proteins containing PCI domains are parts of larger multi-protein complexes. Proteins with PCI domains include budding yeast proteasome regulatory components Rpn3 (Sun2), Rpn5, Rpn6, Rpn7 and Rpn9 PUBMED:9584156; mammalian proteasome regulatory components p55, p58 and p44.5, and translation initiation factor 3 complex subunits p110 and INT6 PUBMED:8995409, PUBMED:9341143; Arabidopsis COP9 and FUS6/COP11 PUBMED:8689678; mammalian G-protein pathway suppressor GPS1; and several uncharacterized ORFs from plants, nematodes and mammals. The complete homology domain comprises approximately 200 residues; the highest conservation is found in the C-terminal half. Several of the proteins mentioned above have no detectable homology to the N-terminal half of the domain.\ 6291 IPR010505 \

    The molybdenum cofactor (MoCo) is part of the active site of all molybdenum (Mo)-dependent enzymes, with the exception of nitrogenase, and plays important roles in the global carbon, sulphur and nitrogen cycles PUBMED:14761975. MoCo is synthesized by a highly conserved pathway that involves several enzymes, many of which have been characterised. These include MoaA and MoaC from Escherichia coli, which catalyse the first step of the pathway: the conversion of a guanosine derivative into molybdopterin precursor Z. MoaA belongs to a family of enzymes involved in the synthesis of metallo-cofactors (). Each subunit of the MoaA dimer comprises an N-terminal SAM domain () that contains the [4Fe-4S] cluster typical for this family of enzymes, as well as an additional [4Fe-4S] cluster in the C-terminal domain that is unique to MoaA proteins PUBMED:15317939. The unique Fe site of the C-terminal [4Fe-4S] cluster is thought to be involved in the binding and activation of 5'-GTP.

    \

    Mutations in the human MoCo biosynthesis proteins MOCS1, MOCS2 or GEPH cause MoCo Deficiency type A (MOCOD), causing the loss of activity of MoCo-containing enzymes, resulting in neurological abnormalities and death PUBMED:12754701.

    \ \ 6898 IPR009776 \

    This family consists of several bacterial Spo0M proteins which are thought to control sporulation in Bacillus subtilis. Spo0M exerts certain negative effects on sporulation, and its gene expression is controlled by sigmaH PUBMED:9795118.

    \ 3842 IPR006766 \ This is a family of conserved plant proteins. The conserved region was identified in a phosphate-induced protein of unknown function PUBMED:10189698.\ 2488 IPR002544 \ The neuropeptide Phe-Met-Arg-Phe-NH2 (FMRFamide) is a potent cardioactive neuropeptide in Lymnaea stagnalis PUBMED:1968092. FMRFamide (Phe-Met-Arg-Phe-NH2) was first demonstrated to be cardioactive in several molluscan species. FMRFamide is now known to be cardioexcitatory in mammals, to inhibit morphine-induced antinociception, and to block morphine-, defeat-, and deprivation-induced feeding PUBMED:3067224. \

    Thirteen neuropeptides varying in length from 7 to 11 residues and ending C-terminally in -Phe-Met-Arg-Phe-NH2 (calliFMRFamides 1-13) and one dodecapeptide ending in -Met-Ile-Arg-Phe-NH2 (calliMIRFamide 1) have been isolated from thoracic ganglia of the blowfly Calliphora vomitoria. Results indicate that the N terminus (in addition to the C terminus as previously found for FMRFamides of other organisms) is crucial for at least some biological activities PUBMED:1549595.
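    The peptides above are grouped by their amidated C-terminal tetrapeptide, -FMRF-NH2 or -MIRF-NH2. That grouping rule can be sketched as a lookup on the last four residues. Some points to note:
    - The function name and the separate `amidated` flag are hypothetical design choices.
    - The test peptides, apart from FMRFamide itself, are invented.
    - As the text notes, the N terminus also matters for activity, so this classifies only the C-terminal motif.

```python
from typing import Optional

# Amidated C-terminal tetrapeptides named in the text (one-letter code).
C_TERMINAL_MOTIFS = {"FMRF": "FMRFamide", "MIRF": "MIRFamide"}


def rfamide_family(seq: str, amidated: bool) -> Optional[str]:
    """Classify a peptide by its amidated C-terminal tetrapeptide.

    Returns None if the C terminus is not amidated or carries neither motif.
    Only the C terminus is inspected; activity also depends on the N terminus.
    """
    if not amidated:
        return None
    return C_TERMINAL_MOTIFS.get(seq.upper()[-4:])


print(rfamide_family("FMRF", amidated=True))       # FMRFamide
print(rfamide_family("GAAGSMIRF", amidated=True))  # MIRFamide (hypothetical peptide)
print(rfamide_family("FMRF", amidated=False))      # None
```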

    \ 4165 IPR000911 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:2167467, PUBMED:, groups bacterial, plant chloroplast, red algal chloroplast, cyanelle and archaebacterial L11, and mammalian, plant and yeast L12 (YL15). L11 is a protein of 140 to 165 amino-acid residues. In E. coli, the C-terminal half of L11 has been shown PUBMED:2483975 to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.

    \ 1446 IPR000117 \ Kappa-casein is a mammalian milk protein involved in a number of important physiological processes PUBMED:9409842. In the gut, the ingested protein is split into an insoluble peptide (para kappa-casein) and a soluble hydrophilic glycopeptide (caseinomacropeptide). Caseinomacropeptide is responsible for increased efficiency of digestion, prevention of neonate hypersensitivity to ingested proteins, and inhibition of gastric pathogens.\ 2615 IPR002543 \

    The FtsK/SpoIIIE domain is found extensively in a wide variety of proteins from prokaryotes and plasmids PUBMED:7592387, some of which contain up to three copies. The domain contains a putative ATP-binding P-loop motif. A mutation in FtsK causes a temperature-sensitive block in cell division, and FtsK is involved in peptidoglycan synthesis or modification PUBMED:7592387. The SpoIIIE protein is implicated in intercellular chromosomal DNA transfer PUBMED:7592387.
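    A candidate P-loop can be located with a pattern search. A commonly used consensus for this motif (Walker A) is [AG]-x(4)-G-K-[ST]. Using that consensus as the search pattern is an assumption here, and a hit only suggests nucleotide binding; it does not demonstrate it. The function name and test sequence are hypothetical.

```python
import re

# Walker A / P-loop consensus [AG]-x(4)-G-K-[ST], written as a regex.
# The lookahead reports overlapping matches as well.
P_LOOP = re.compile(r"(?=[AG].{4}GK[ST])")


def p_loop_sites(seq: str) -> list[int]:
    """Return 1-based start positions of candidate P-loop motifs."""
    return [m.start() + 1 for m in P_LOOP.finditer(seq.upper())]


print(p_loop_sites("AAGESGSGKSTAA"))  # hypothetical sequence -> [3]
```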

    \ 6231 IPR009437 \

    This family consists of several lamprin proteins from the Sea lamprey Petromyzon marinus. Lamprin, an insoluble non-collagen, non-elastin protein, is the major connective tissue component of the fibrillar extracellular matrix of lamprey annular cartilage. Although not generally homologous to any other protein, soluble lamprins contain a tandemly repeated peptide sequence (GGLGY), which is present in both silkmoth chorion proteins and spider dragline silk. Strong homologies to this repeat sequence are also present in several mammalian and avian elastins. It is thought that these proteins share a structural motif which promotes self-aggregation and fibril formation in proteins through interdigitation of hydrophobic side chains in beta-sheet/beta-turn structures, a motif that has been preserved in recognisable form over several hundred million years of evolution PUBMED:7678258.
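    Runs of the tandemly repeated GGLGY peptide can be found with a simple exact-match scan. The following is a minimal sketch, with three limitations to keep in mind:
    - It counts only perfect, back-to-back copies, so degenerate repeats, which real proteins often contain, are missed.
    - The function name is hypothetical.
    - The example fragment is invented.

```python
def tandem_runs(seq: str, unit: str = "GGLGY", min_copies: int = 2) -> list[tuple[int, int]]:
    """Find runs of back-to-back exact copies of `unit`.

    Returns (1-based start, number of copies) for every run with at least
    `min_copies` consecutive copies. Reported runs do not overlap.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    runs = []
    i = 0
    while i <= len(seq) - k:
        copies = 0
        while seq[i + copies * k : i + (copies + 1) * k] == unit:
            copies += 1
        if copies >= min_copies:
            runs.append((i + 1, copies))
            i += copies * k  # jump past the whole run
        else:
            i += 1
    return runs


example = "AAGGLGYGGLGYGGLGYPPGGLGYA"  # hypothetical fragment
print(tandem_runs(example))                # [(3, 3)]
print(tandem_runs(example, min_copies=1))  # [(3, 3), (20, 1)]
```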

    \ 3393 IPR003857 \ This is a family of spirochete major outer sheath protein N-terminal regions. These proteins are present on the bacterial cell surface. In Treponema denticola the major outer sheath protein (Msp) binds immobilized laminin and fibronectin supporting the hypothesis that Msp mediates the extracellular matrix binding activity of T. denticola PUBMED:9023187.\ 1145 IPR001480 \ Members of this domain are plant lectins. Curculin is a sweet-tasting and taste-modifying protein from the fruits of Curculigo latifolia (Lumbah). The three mannose-binding sites are devoid of mannose-binding activity PUBMED:9132060. Other members of this domain are mannose specific and have diverse functions. The lectin of the saffron crocus (Crocus sativus L.) specifically interacts with a yeast mannan and is a major corm protein specifically expressed in this organ PUBMED:10691656. \

    The actin-binding and vesicle-associated protein comitin exhibits a mannose-specific lectin activity and may have a role in cell motility. It binds to vesicle membranes via mannose residues and, by way of its interaction with actin, links these membranes to the cytoskeleton.

    \ 7999 IPR012583 \

    This N-terminal domain is found in hypothetical nucleolar proteins with NUC202 tandem repeat PUBMED:15112237.

    \ 5338 IPR008908 \ Sarcoglycans are a subcomplex of transmembrane proteins which are part of the dystrophin-glycoprotein complex. They are expressed in skeletal, cardiac and smooth muscle. Although numerous studies have been conducted on the sarcoglycan subcomplex in skeletal and cardiac muscle, the distribution and localisation of these proteins along the nonjunctional sarcolemma are not clear PUBMED:12566627. This family contains alpha and epsilon members.\ 208 IPR004190 \ The DNA polymerase processivity factor is a replisome sliding clamp subunit, which is responsible for tethering the catalytic subunit of DNA polymerase to the DNA during high-speed replication. The crystal structure of the bacteriophage RB69 sliding clamp has been solved. It shows that the peptide binds to the sliding clamp at the same position as that of a replication inhibitor peptide bound to PCNA. This suggests that the replication inhibitor protein p21CIP1 competes with eukaryotic polymerases for the same binding pocket on the clamp PUBMED:10535734.\ 454 IPR004161 \ Elongation factor Tu consists of three structural domains; this is the second domain. This second domain adopts a beta-barrel structure and is involved in binding to charged tRNA PUBMED:7491491. This domain is also found in other proteins such as elongation factor G and translation initiation factor IF-2. This domain is structurally related to , and in fact has weak sequence matches to this domain.\ 5575 IPR008788 \ This family consists of Poxvirus N2L proteins. N2L may be responsible for alpha-amanitin resistance PUBMED:2024475.\ 3767 IPR005151 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They display a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1-S27) of serine protease have been identified, and these are grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
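    The link between the linear order of the triad residues and clan membership can be expressed as a small lookup. This is a heuristic sketch that uses only the three orders given above. The dictionary and function names are hypothetical, and serine peptidases in other clans fall outside it. The chymotrypsin positions use the conventional chymotrypsinogen numbering (His57, Asp102, Ser195); the second example is invented.

```python
# Linear triad orders given in the text, mapped to the clan they usually indicate.
ORDER_TO_CLAN = {"HDS": "SA", "DHS": "SB", "SDH": "SC"}


def triad_order(positions: dict[str, int]) -> str:
    """Return the triad residues (one-letter codes) sorted by sequence position."""
    if set(positions) != {"S", "D", "H"}:
        raise ValueError("expected positions for exactly S, D and H")
    return "".join(sorted(positions, key=positions.__getitem__))


def clan_hint(positions: dict[str, int]) -> str:
    """Suggest a clan from the triad order; 'unknown' if the order is not listed."""
    return ORDER_TO_CLAN.get(triad_order(positions), "unknown")


chymotrypsin = {"H": 57, "D": 102, "S": 195}
print(triad_order(chymotrypsin), clan_hint(chymotrypsin))  # HDS SA
print(clan_hint({"D": 30, "H": 65, "S": 220}))             # SB
```

    The text says the order "commonly" reflects the clan, so the result is a hint to check against structure, not an assignment.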

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S41 (C-terminal processing peptidase family, clan SM). Members of this group include the tricorn protease of bacteria and archaea; C-terminal peptidases with different substrate specificities in different species, including processing of the D1 protein of the photosystem II reaction centre in higher plants and cleavage of an 11-residue peptide from the precursor form of penicillin-binding protein; and some that appear to be responsible for degrading oligopeptides, probably derived from the proteasome.

    \ 4080 IPR004583 \

    In nucleotide excision repair (NER) in eukaryotes, DNA is incised on both sides of the lesion, resulting in the removal of a fragment ~25-30 nucleotides long. This is followed by repair synthesis and ligation. In yeast, this reaction requires several groups of factors:
    - the damage-binding factors Rad14, RPA and the Rad4-Rad23 complex;
    - the transcription factor TFIIH, which contains the two DNA helicases Rad3 and Rad25 that are essential for creating a bubble structure;
    - the two endonucleases, the Rad1-Rad10 complex and Rad2, which incise the damaged DNA strand on the 5' and 3' side of the lesion, respectively PUBMED:10915862.

    Homologues of all the above-mentioned yeast genes, except for RAD7, RAD16 and MMS19, have been identified in humans. Mutations in these human genes affect NER in the same way as they do in yeast, with the exception of XPC, the human counterpart of yeast RAD4. Deletion of RAD4 causes the same high level of UV sensitivity as do mutations in the other class 1 genes, and rad4 mutants are completely defective in incision. By contrast, XPC is required for the repair of nontranscribed regions of the genome but not for the repair of the transcribed DNA strand.

    \ 2826 IPR005615 \

    This homodimeric enzyme catalyses the second step in glutathione biosynthesis.

    \ 6735 IPR009684 \

    This family consists of several animal-specific latexins and latexin-related proteins that belong to MEROPS proteinase inhibitor family I47, clan I- PUBMED:14705960.

    \

    Latexin, a protein with inhibitory activity against rat carboxypeptidase A1 (CPA1) and CPA2 (MEROPS peptidase family M14A), is expressed in a neuronal subset in the cerebral cortex and in cells of other neural and non-neural tissues of the rat PUBMED:10698712, PUBMED:11455960. OCX-32, the 32 kDa eggshell matrix protein, is present at high levels in the uterine fluid during the terminal phase of eggshell formation and is localised predominantly in the outer eggshell. The timing of OCX-32 secretion into the uterine fluid suggests that it may play a role in the termination of mineral deposition PUBMED:12952168. OCX-32 shares limited identity (32%) with two unrelated proteins: latexin, and a skin protein encoded by a retinoic acid receptor-responsive gene, TIG1. Tazarotene Induced Gene 1 (TIG1) encodes a putative 228-residue transmembrane protein. This protein has a small N-terminal intracellular region, a single membrane-spanning hydrophobic region and a large C-terminal extracellular region containing a glycosylation signal. TIG1 is up-regulated by retinoic acid receptor-specific synthetic retinoids, but not by retinoid X receptor-specific ones PUBMED:8601727. TIG1 may be a tumour suppressor gene whose diminished expression is involved in the malignant progression of prostate cancer PUBMED:11929948.

    \ \ 5561 IPR008864 \ This family consists of several Tenuivirus nucleocapsid proteins PUBMED:2024478.\ 4294 IPR004025 \ This enzyme hydrolyses 28S rRNA and acts as a protein synthesis inhibitor. Members of the ribonuclease U2 family include ribonuclease mitogillin, ribonuclease alpha-sarcin and ribonuclease clavin precursor proteins.\ 6538 IPR009586 \

    This domain represents a conserved region within the Z subunit of bacterial chlorophyllide reductase. This enzyme converts chlorophylls to bacteriochlorophylls by reducing ring B of the tetrapyrrole. Most proteins of this entry contain the domain.

    \ 4538 IPR007768 \ SUFU, which encodes the human orthologue of Drosophila Suppressor of fused, appears to have a conserved role in the repression of Hedgehog signalling. The SUFU protein exerts its repressor role by physically interacting with GLI proteins in both the cytoplasm and the nucleus PUBMED:12150819. SUFU has been identified as a tumour-suppressor gene: mutations in it predispose individuals to medulloblastoma, apparently by modulating the SHH signalling pathway PUBMED:12068298.\ 1266 IPR007036 \

    This family includes two enzymes. The first is succinylglutamate desuccinylase, which catalyses the fifth and last step in arginine catabolism by the arginine succinyltransferase pathway. The second is aspartoacylase, which cleaves acylaspartate into a fatty acid and aspartate. Mutations in the human aspartoacylase gene lead to Canavan disease PUBMED:8252036.

    \ 7613 IPR012858 \

    This group of sequences is similar to a region of the dendritic cell-specific transmembrane protein (DC-STAMP). DC-STAMP is thought to be a novel receptor protein that shares no identity with other multimembrane-spanning proteins PUBMED:11169400. It is predicted to have seven transmembrane regions PUBMED:11169400, two of which lie in the region covered by this family. DC-STAMP is also described as having potential N-linked glycosylation sites and a potential phosphorylation site for PKC PUBMED:11169400, but these are not conserved.

    \ 7383 IPR011496 \

    This family consists of both eukaryotic and prokaryotic hyaluronidases. A human member of this family is expressed in meningioma PUBMED:9811929. The Clostridium perfringens enzyme is involved in pathogenesis and is likely to act on connective tissue during gas gangrene PUBMED:8177218. These enzymes catalyse the random hydrolysis of 1->4 linkages between N-acetyl-beta-D-glucosamine and D-glucuronate residues in hyaluronate.

    \ 3490 IPR005305 \

    This domain occurs within nepoviruses. Together with comoviruses and picornaviruses, nepoviruses are classified in the picornavirus superfamily of positive-strand, single-stranded RNA viruses. This entry covers the coat protein sequences of several nepoviruses. In several cases, the domain is found at the C terminus of the RNA2-encoded viral polyprotein. The coat protein consists of three trapezoid-shaped beta-barrel domains and forms a pseudo T = 3 icosahedral capsid PUBMED:9519407.

    \ 6870 IPR010751 \

    This family consists of several bacterial TrfA proteins. The trfA operon of broad-host-range IncP plasmids is essential to activate the origin of vegetative replication in diverse species. The trfA operon encodes two ORFs. The first ORF is highly conserved and encodes a putative single-stranded DNA binding protein (Ssb). The second, trfA, contains two translational starts as in the IncP alpha plasmids, generating related polypeptides of 406 (TrfA1) and 282 (TrfA2) amino acids. TrfA2 is very similar to the IncP alpha product, whereas the N-terminal region of TrfA1 shows very little similarity to the equivalent region of IncP alpha TrfA1. This region has been implicated in the ability of IncP alpha plasmids to replicate efficiently in Pseudomonas aeruginosa PUBMED:8954881.
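    The two TrfA polypeptide lengths allow a quick estimate of where the second translational start lies. This is a back-of-the-envelope sketch, not a coordinate taken from the cited study. It assumes that both starts are in the same reading frame and share one stop codon, as the shared C-terminal region suggests. It also assumes the initiator methionine is counted as residue 1.

```latex
\begin{aligned}
\Delta &= 406 - 282 = 124\ \text{residues} \\
\text{TrfA2 start} &= \text{codon } 124 + 1 = 125\ \text{of the TrfA1 reading frame} \\
\text{offset} &= 124 \times 3 = 372\ \text{nt downstream of the first nucleotide of the TrfA1 start codon}
\end{aligned}
```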

    \ 2875 IPR004155 \

    These proteins contain a short bi-helical repeat that is related to HEAT. Cyanobacteria and red algae harvest light energy using macromolecular complexes known as phycobilisomes (PBS), which are peripherally attached to the photosynthetic membrane. The major components of PBS are the phycobiliproteins. These heterodimeric proteins carry covalently attached phycobilins: open-chain tetrapyrrole chromophores that function as the photosynthetic light-harvesting pigments. Phycobiliproteins differ in sequence and in the nature and number of phycobilins attached to each of their subunits. This family includes the lyase enzymes that specifically attach particular phycobilins to apophycobiliprotein subunits. The most comprehensively studied of these is the CpcE/F lyase, which attaches phycocyanobilin (PCB) to the alpha subunit of apophycocyanin PUBMED:8132596. Similarly, MpeU/V attaches phycoerythrobilin to phycoerythrin II, while CpeY/Z is thought to be involved in phycoerythrobilin (PEB) attachment to phycoerythrin (PE) I. PEs I and II differ in sequence and in the number of attached PEB molecules: PE I has five, PE II has six PUBMED:9023176.

    \

    All the reactions of the above lyases involve addition of an apoprotein cysteine SH group to a terminal delta-3,3'-double bond. Such a reaction is not possible in the case of phycoviolobilin (PVB), the phycobilin of alpha-phycoerythrocyanin (alpha-PEC). It is thought that in this case PCB, not PVB, is first added to apo-alpha-PEC and is then isomerised to PVB. The addition reaction has been shown to occur in the presence of either component of the alpha-PEC-PVB lyase, PecE or PecF, or both. The isomerisation reaction occurs only when both PecE and PecF are present, i.e. the PecE/F phycobiliprotein lyase is also a phycobilin isomerase PUBMED:10708746. Another member of this family is the NblB protein, whose similarity to the phycobiliprotein lyases was noted previously PUBMED:9882677. This constitutively expressed protein is not known to have any lyase activity. It is thought to be involved in coordinating PBS degradation with environmental nutrient limitation. It has been suggested that the similarity of NblB to the phycobiliprotein lyases reflects an ability to bind tetrapyrrole phycobilins via the common repeated motif PUBMED:9882677.
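    The reported component requirements of the PecE/F lyase-isomerase amount to a small truth table. The Python sketch below encodes only those two statements: either component suffices for PCB addition, and isomerisation needs both. The function name `pec_activities` is hypothetical, and the code models no chemistry.

```python
def pec_activities(components: set) -> set:
    """Activities reported on apo-alpha-PEC for a given set of PecE/PecF components."""
    present = components & {"PecE", "PecF"}
    activities = set()
    if present:  # either component alone suffices for PCB addition
        activities.add("PCB addition")
    if present == {"PecE", "PecF"}:  # isomerisation to PVB requires both
        activities.add("PCB-to-PVB isomerisation")
    return activities


for combo in (set(), {"PecE"}, {"PecF"}, {"PecE", "PecF"}):
    print(sorted(combo), "->", sorted(pec_activities(combo)))
```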

    \ 4278 IPR000268 \ In eukaryotes, there are three different forms of DNA-dependent RNA polymerase, each transcribing a different set of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13 polypeptides. Archaebacterial subunit N (gene rpoN) PUBMED:7597027 is a small protein of about 8 kDa. It is evolutionarily related PUBMED:8045907 to an 8.3 kDa component shared by all three forms of eukaryotic RNA polymerase (gene RPB10 in yeast and POLR2J in mammals), as well as to African swine fever virus protein CP80R PUBMED:11831707.\ \ There is a conserved region located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines that bind a zinc ion PUBMED:10841539.\ 4054 IPR003188 \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) PUBMED:8246840, PUBMED:2197982 is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates concomitantly with their translocation across the cell membrane. This coupling makes the PTS a link between the uptake and the metabolism of sugars.

    \ \

    The general mechanism of the PTS is as follows. A phosphoryl group from phosphoenolpyruvate (PEP) is transferred via a signal transduction pathway to enzyme I (EI), which in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme 2 (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains, IIA, IIB and IIC PUBMED:1537788. These can either be fused together in a single polypeptide chain or exist as two or three interacting chains, formerly called enzymes II (EII) and III (EIII).

    \ \

    The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine that is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with sugar uptake through the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site.
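    The phosphoryl relay described above can be written as a single chain of transfers. The net equation, in which PEP is converted to pyruvate, is standard PTS bookkeeping added here for orientation. The residue labels for IIA and IIB follow the text, and the IID domain present in some systems is omitted from this simplified sketch.

```latex
\begin{aligned}
&\text{PEP} \rightarrow \text{EI} \rightarrow \text{HPr} \rightarrow \text{IIA}\ (\text{His}) \rightarrow \text{IIB}\ (\text{Cys or His}) \rightarrow \text{sugar}\ (\text{translocated by IIC}) \\
&\text{net:}\quad \text{PEP} + \text{sugar}_{\text{out}} \rightarrow \text{pyruvate} + \text{sugar-P}_{\text{in}}
\end{aligned}
```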

    \ \

    An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose PUBMED:8246840, PUBMED:1537788, PUBMED:7815935, PUBMED:11361063.

    \ \

    The lactose/cellobiose-specific family is one of four structurally and functionally distinct groups of IIA PTS system enzymes. Proteins in this family normally function as homotrimers, stabilised by a centrally located metal ion PUBMED:9261069. Separation into subunits is thought to occur after phosphorylation.\

    \ 3166 IPR001124 \ A number of mammalian lipid-binding serum glycoproteins belong to this family. They include the lipopolysaccharide-binding protein (LBP), the bactericidal permeability-increasing protein (BPI), the cholesteryl ester transfer protein (CETP) and the phospholipid transfer protein (PLTP) PUBMED:2722846, PUBMED:8132678, PUBMED:2402637.\ 3850 IPR000811 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases and related proteins into distinct sequence-based families has been described PUBMED:9334165. It covers enzymes that use nucleotide diphospho-sugars, nucleotide monophospho-sugars and sugar phosphates. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each family. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarity may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 35 comprises enzymes with only one known activity: glycogen and starch phosphorylase.

    \ \

    The main role of glycogen phosphorylase (GPase) is to provide phosphorylated glucose molecules (glucose 1-phosphate, G-1-P) PUBMED:2182117. GPase is a highly regulated allosteric enzyme. Because of its regulatory sites, the enzyme can operate at a variety of rates. Rather than simply being switched "on" or "off", it can be thought of as being set to an ideal rate for the changing conditions in the cell. The most important allosteric effector is the phosphate group covalently attached to Ser14. This switches GPase from the b (inactive) state to the a (active) state. Upon phosphorylation, GPase attains about 80% of its Vmax. When the enzyme is not phosphorylated, GPase activity is practically non-existent at low AMP levels PUBMED:.

    \

    There has been some controversy over the quaternary structure of GPase. All sources agree that the enzyme is multimeric, but whether it is a tetramer or a dimer has been debated. GPase in the a form forms tetramers in the crystal, but the consensus seems to be that, regardless of the a or b form, GPase functions as a dimer in vivo PUBMED:2667896. The GPase monomer is best described as consisting of two domains, an N-terminal domain and a C-terminal domain PUBMED:8798388. The C-terminal domain is often referred to as the catalytic domain. It consists of a beta-sheet core surrounded by layers of helical segments PUBMED:2667896, and carries the covalently attached vitamin cofactor pyridoxal phosphate (PLP). The N-terminal domain also consists of a central beta-sheet core surrounded by layers of helical segments. It contains the various allosteric effector sites that regulate the enzyme.

    \

    Bacterial phosphorylases follow the same catalytic mechanism as their plant and animal counterparts, but differ considerably in substrate specificity and regulation. The catalytic domains are highly conserved, while the regulatory sites are only poorly conserved. For maltodextrin phosphorylase from Escherichia coli, the physiological role of the enzyme in the utilisation of maltodextrins is known in detail; that of all other bacterial phosphorylases is still unclear. Possible roles include regulation of endogenous glycogen metabolism during periods of starvation. Roles in sporulation, stress responses or rapid adaptation to changing environments are also possible PUBMED:10077830.

    \ 2115 IPR007401 \ This is a predicted membrane protein.\ 1733 IPR004669 \ These proteins are members of the C4-dicarboxylate Uptake C (DcuC) family. DcuC has 12 GES-predicted transmembrane regions, is induced only under anaerobic conditions and is not repressed by glucose. DcuC may therefore function as a succinate efflux system during anaerobic glucose fermentation. However, when overexpressed, it can replace either DcuA or DcuB in catalysing fumarate-succinate exchange and fumarate uptake.\ 2591 IPR003961 \

    Fibronectins are multi-domain glycoproteins found in a soluble form in plasma, and in an insoluble form in loose connective tissue and basement membranes PUBMED:3780752. They contain multiple copies of 3 repeat regions (types I, II and III), which bind to a variety of substances including heparin, collagen, DNA, actin, fibrin and fibronectin receptors on cell surfaces. The wide variety of these substances means that fibronectins are involved in a number of important functions: e.g., wound healing; cell adhesion; blood coagulation; cell differentiation and migration; maintenance of the cellular cytoskeleton; and tumour metastasis PUBMED:3031656. The role of fibronectin in cell differentiation is demonstrated by the marked reduction in the expression of its gene when neoplastic transformation occurs. Cell attachment has been found to be mediated by the binding of the tetrapeptide RGDS to integrins on the cell surface PUBMED:2466295, although related sequences can also display cell adhesion activity.

    \

    Plasma fibronectin occurs as a dimer of 2 different subunits, linked together by 2 disulphide bonds near the C-terminus. The difference in the 2 chains occurs in the type III repeat region and is caused by alternative splicing of the mRNA from one gene PUBMED:3780752. The observation that, in a given protein, an individual repeat of one of the 3 types (e.g., the first FnIII repeat) shows much less similarity to its subsequent tandem repeats within that protein than to its equivalent repeat between fibronectins from other species, has suggested that the repeating structure of fibronectin arose at an early stage of evolution. It also seems to suggest that the structure is subject to high selective pressure PUBMED:6317187.

    \

    The fibronectin type III repeat region is an approximately 100 amino acid domain, different tandem repeats of which contain binding sites for DNA, heparin and the cell surface PUBMED:3780752. The superfamily of sequences believed to contain FnIII repeats represents 45 different families, the majority of which are involved in cell surface binding in some manner, or are receptor protein tyrosine kinases, or cytokine receptors.

    \ 7452 IPR011478 \

    This is a family of cytochrome-like proteins in Rhodopirellula baltica. These proteins also contain , , and .

    \ 1502 IPR005045 \ Members of this family have no known function. They have predicted transmembrane helices.\ 5589 IPR008380 \

    This family includes a 5'-nucleotidase specific for purines (IMP and GMP) PUBMED:9371705. These enzymes are members of the haloacid dehalogenase (HAD) superfamily. HAD members are recognized by three short motifs: {hhhhDxDx(T/V)}, {hhhh(T/S)}, and either {hhhh(D/E)(D/E)x(3-4)(G/N)} or {hhhh(G/N)(D/E)x(3-4)(D/E)}. Here "h" stands for a hydrophobic residue and "x" for any residue. Crystal structures of many HAD enzymes have confirmed PSI-PRED predictions of secondary structural elements, which show each "hhhh" segment of the motifs as part of a beta sheet. This subfamily belongs to "subfamily I" of the HAD superfamily by virtue of a "cap" domain between motifs 1 and 2. Its cap domain has a different predicted secondary structure from that of all other known HAD enzymes and has therefore been designated "subfamily IG"; the domain appears to adopt a mixed alpha/beta fold.

    \ 882 IPR007726 \ The SSXT or SS18 protein is involved in synovial sarcoma in humans. A SYT-SSX fusion gene resulting from the chromosomal translocation t(X;18) (p11;q11) is characteristic of synovial sarcomas. This translocation fuses the SSXT (SYT) gene from chromosome 18 to either of two homologous genes at Xp11, SSX1 or SSX2 PUBMED:12173050.\ 834 IPR007587 \

    This family includes a conserved region from a group of yeast proteins that associate with the SIT4 phosphatase. This association is required for SIT4's role in G1 cyclin transcription and for bud formation. This family also includes homologous regions from other eukaryotes.

    \ 7101 IPR010835 \

    This family consists of several hypothetical bacterial proteins of around 190 residues in length. Several members of this family are annotated as being putative lipoproteins and are often known as YceB. The function of this family is unknown.

    \ 3825 IPR002196 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 24 comprises enzymes with only one known activity: lysozyme.

    \ This family includes lambda phage lysozyme and Escherichia coli endolysin PUBMED:3586019. Lysozyme helps to release mature phage particles from the cell by breaking down the peptidoglycan of the cell wall. The enzyme hydrolyses the 1,4-beta linkages between N-acetyl-D-glucosamine and N-acetylmuramic acid in the peptidoglycan heteropolymers of prokaryotic cell walls. E. coli endolysin also functions in bacterial cell lysis and acts as a transglycosylase.\ \ The T4 lysozyme structure contains two domains, the interface between which forms the active-site cleft. The two domains undergo a 'hinge-bending' motion about an axis passing through the molecular waist PUBMED:3586019, PUBMED:2234094. This mobility is thought to be important in allowing substrates access to the active site.\ 4270 IPR001788 \ This family may represent an RNA-dependent RNA polymerase PUBMED:8269709. The family contains the following proteins:\ \ 7044 IPR009863 \

    This family consists of several bacterial LcrG proteins. Yersiniae are equipped with the Yop virulon, an apparatus that allows extracellular bacteria to deliver toxic Yop proteins inside the host cell cytosol in order to sabotage the communication networks of the host cell or even to cause cell death. LcrG is a component of the Yop virulon involved in the regulation of secretion of the Yops PUBMED:9484897.

    \ 6141 IPR010451 \

    Acetoacetate decarboxylase (ADC) is involved in solventogenesis in certain bacteria. Solventogenesis occurs at the end of the exponential growth phase, when there is a metabolic switch away from classical sugar fermentation, which produces acetate and butyrate. The cells instead re-internalise these acids and convert them to acetone and butanol PUBMED:11824611. In Clostridium, Spo0A controls the switch from acid to solvent production, and a Spo0A-binding motif occurs in the gene encoding ADC PUBMED:10972834.

    \

    This family also contains the fungal decarboxylase DEC1 encoded by the Tox1B locus, which along with the Tox1A gene product is required for the production of the polyketide T-toxin. The pathogenic fungus Cochliobolus heterostrophus requires the T-toxin for high virulence to maize with T-cytoplasm PUBMED:12236595.

    \ 7581 IPR011681 \ GcrA and CtrA together form a master cell cycle regulator. These bacterial regulators are involved in controlling cell-cycle progression and asymmetric polar morphogenesis PUBMED:15087506. During this process, there are temporal and spatial variations in the concentrations of GcrA and CtrA. These variations produce time- and space-dependent transcriptional regulation of modular functions that implement cell-cycle processes PUBMED:15087506. More specifically, GcrA acts as an activator of components of the replisome and the segregation machinery PUBMED:15087506.\ 3212 IPR005132 \ This is a domain found in some bacterial and eukaryotic lipoproteins. The function of RlpA is not well understood, but it has been shown to act as a suppressor of prc mutants in Escherichia coli PUBMED:8576052. This entry contains a conserved region in the middle of RlpA.\ 7396 IPR011495 \

    This is the dimerisation and phosphoacceptor domain of a subfamily of histidine kinases. It shares sequence similarity with and . It is usually found adjacent to a C-terminal ATPase domain. This domain is found in a wide range of bacteria and also in several archaea.

    \ 7233 IPR010875 \

    This family consists of several bacterial proteins of around 130 residues in length. Members of this family seem to be specific to Borrelia burgdorferi (Lyme disease spirochete). The function of this family is unknown.

    \ 4421 IPR000394 \ Sigma factors PUBMED:3052291 are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase to specific initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria express a multiplicity of sigma factors. Two of these factors direct the transcription of a wide variety of genes: sigma-70 (gene rpoD), generally known as the major or primary sigma factor, and sigma-54 (gene rpoN or ntrA). The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard to sequence similarity, sigma factors can be grouped into two classes: the sigma-54 and sigma-70 families. The sigma-70 family has many different sigma factors. The sigma-54 family consists exclusively of the sigma-54 factor PUBMED:2517036, PUBMED:7934866. This factor is required for the transcription of promoters that have characteristic -24 and -12 consensus recognition elements but lack the typical -10 and -35 sequences recognised by the major sigma factors. The sigma-54 factor is also characterised by its interaction with ATP-dependent positive regulatory proteins that bind to upstream activating sequences. Structurally, sigma-54 factors consist of three distinct regions:\
    1. A relatively well conserved N-terminal glutamine-rich region of about 50 residues that contains a potential leucine zipper motif.
    2. A region of variable length which is not well conserved.
    3. A well conserved C-terminal region of about 350 residues that contains a second potential leucine zipper, a potential DNA-binding 'helix-turn-helix' motif and a perfectly conserved octapeptide whose function is not known.
    \ 6963 IPR010786 \

    This family consists of several hypothetical Enterobacterial proteins of around 250 residues in length. The function of this family is unknown.

    \ 5837 IPR010307 \

    It has been suggested that the domains I and II from laminin A, B1 and B2 may come together to form a triple helical coiled-coil structure PUBMED:3182802.

    \ 2957 IPR000429 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This group of proteins belongs to the hirudin family of proteinase inhibitors, MEROPS inhibitor family I14, clan IM; they inhibit serine peptidases of the S1 family PUBMED:14705960.

    \ \ \ \

    Hirudin is a potent thrombin inhibitor secreted by the salivary glands of Hirudinaria manillensis (buffalo leech) and Hirudo medicinalis (medicinal leech) PUBMED:3513162. It forms a stable non-covalent complex with alpha-thrombin, thereby abolishing its ability to cleave fibrinogen. The structure of hirudin has been solved by NMR PUBMED:2567183. The structure of a recombinant hirudin-thrombin complex has been determined by X-ray crystallography to 2.3 Å PUBMED:2374926. Hirudin consists of an N-terminal globular domain and an extended C-terminal domain. Residues 1-3 form a parallel beta-strand with residues 214-217 of thrombin, and the nitrogen atom of residue 1 makes a hydrogen bond with the Ser195 O gamma atom of the catalytic site. The C-terminal domain makes numerous electrostatic interactions with an anion-binding exosite of thrombin, while the last five residues are in a helical loop that forms many hydrophobic contacts PUBMED:2374926.

    \ 1061 IPR007689 \ Mating-type protein A-alpha specifies the A-alpha-Y mating type. The A-alpha-Y protein binds to the A-alpha-Z protein of another mating type in Schizophyllum commune PUBMED:9286672 and may also regulate gene expression in the homokaryotic cell.\ 7734 IPR012490 \

    This is a family of proteins expressed by the crenarchaeon Pyrobaculum aerophilum. The members are highly variable in length and level of conservation. The presence of numerous frameshifts and internal stop codons in multiple alignments is thought to indicate that most family members are no longer functional PUBMED:11792869.

    \ 2387 IPR001816 \

    In prokaryotes, elongation factor Ts (EF-Ts) is a component of the elongation cycle of protein biosynthesis. It associates with the EF-Tu.GDP complex and induces the exchange of GDP for GTP. It remains bound to the aminoacyl-tRNA.EF-Tu.GTP complex up to the GTP hydrolysis stage on the ribosome PUBMED:1637866.

    \

    EF-Ts is also a component of the chloroplast protein biosynthetic machinery and is encoded in the chloroplast genome of some algae PUBMED:8219057. It is also present in mitochondria PUBMED:7615523.

    \ \ 792 IPR005806 \

    Ubiquinol-cytochrome c reductase (bc1 complex or complex III) is an enzyme complex of bacterial and mitochondrial oxidative phosphorylation systems. It catalyses the oxidoreduction of the mobile redox components ubiquinol and cytochrome c, generating an electrochemical potential, which is linked to ATP synthesis PUBMED:2986972, PUBMED:3004982. The complex consists of three subunits in most bacteria and nine in mitochondria. Both bacterial and mitochondrial complexes contain cytochrome b and cytochrome c1 subunits, and an iron-sulphur 'Rieske' subunit, which contains a high-potential 2Fe-2S cluster PUBMED:2820981. The mitochondrial form also includes six other subunits that do not possess redox centres. Plastoquinone-plastocyanin reductase (b6f complex), present in cyanobacteria and the chloroplasts of plants, catalyses the oxidoreduction of plastoquinol and cytochrome f. This complex, which is functionally similar to ubiquinol-cytochrome c reductase, comprises cytochrome b6, cytochrome f and Rieske subunits PUBMED:1391772.
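    The electron bookkeeping behind this oxidoreduction can be made explicit. Ubiquinol is a two-electron donor, whereas each cytochrome c haem accepts one electron. The scalar equation below is a simplified sketch. It omits the vectorial proton movements of the Q cycle that actually generate the electrochemical potential, and the analogous b6f equation is not shown.

```latex
\mathrm{QH_2} + 2\,\text{cyt}\,c\,(\mathrm{Fe^{3+}}) \longrightarrow \mathrm{Q} + 2\,\text{cyt}\,c\,(\mathrm{Fe^{2+}}) + 2\,\mathrm{H^+}
```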

    \

    The Rieske subunit acts by binding either a ubiquinol or plastoquinol anion, transferring an electron to the 2Fe-2S cluster, then releasing the electron to the cytochrome c or cytochrome f haem iron PUBMED:2986972, PUBMED:1391772. The Rieske domain has a [2Fe-2S] centre: two conserved cysteines coordinate one Fe ion, while the other Fe ion is coordinated by two conserved histidines. The 2Fe-2S cluster is bound in the highly conserved C-terminal region of the Rieske subunit.

    \ 3116 IPR003349 \

    Jumonji protein is required for neural tube formation in mice PUBMED:7758946. There is evidence of domain swapping within the jumonji family of transcription factors PUBMED:10838566. This domain is often associated with JmjC.

    \ 1818 IPR002939 \

    Molecular chaperones are a diverse family of proteins that protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme with two coupled activities. Its N-terminal ATP-hydrolysing domain cycles through ATP binding, hydrolysis and ADP release. These cycles are coupled to cycles of sequestration and release of unfolded proteins by a C-terminal substrate-binding domain. Dimeric GrpE is a co-chaperone for DnaK and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold PUBMED:8016869. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. Thus the co-chaperones DnaJ and GrpE can tightly regulate the nucleotide-bound and substrate-bound states of DnaK. This regulation is necessary for both the normal housekeeping functions and the stress-related functions of the DnaK molecular chaperone cycle.

    Besides stimulating the ATPase activity of DnaK through its J-domain, DnaJ also associates with unfolded polypeptide chains and prevents their aggregation PUBMED:15063739. Thus, DnaK and DnaJ may bind to one and the same polypeptide chain to form a ternary complex. The formation of a ternary complex may result in cis-interaction of the J-domain of DnaJ with the ATPase domain of DnaK. An unfolded polypeptide may enter the chaperone cycle by associating first either with ATP-liganded DnaK or with DnaJ. DnaK interacts with both the backbone and side chains of a peptide substrate; it thus shows binding polarity and admits only L-peptide segments. In contrast, DnaJ has been shown to bind both L- and D-peptides and is assumed to interact only with the side chains of the substrate.

    This domain consists of the C-terminal region of the DnaJ protein. Although the function of this region is unknown, it is always found associated with and .\

    \ 2209 IPR007561 \ This is a family of uncharacterised proteins.\ 2007 IPR005583 \

    The members of this family are functionally uncharacterised. They are about 250 amino acids in length.

    \ 4951 IPR002588 \ This RNA methyltransferase domain PUBMED:10364504 is found in a wide range of ssRNA viruses, including Hordei-, Tobra-, Tobamo-, Bromo-, Clostero- and Caliciviruses. This methyltransferase is involved in mRNA capping, which enhances mRNA stability. Capping usually occurs in the nucleus; therefore, many viruses that replicate in the cytoplasm encode their own capping enzymes PUBMED:10364504.\ 405 IPR006107 \

    Glutamyl-tRNA(Gln) amidotransferase subunit B () PUBMED:9342321 is a microbial enzyme that furnishes a means for formation of correctly charged Gln-tRNA(Gln) through the transamidation of misacylated Glu-tRNA(Gln) in organisms which lack glutaminyl-tRNA synthetase. The reaction takes place in the presence of glutamine and ATP through an activated gamma-phospho-Glu-tRNA(Gln). The enzyme is composed of three subunits: A (an amidase), B and C. It also exists in eukaryotes as a protein targeted to the mitochondria.\

    \ 4650 IPR004667 \

    These proteins are members of the ATP:ADP Antiporter (AAA) family, which consists of nucleotide transporters that have 12 GES predicted transmembrane regions. One protein from Rickettsia prowazekii functions to take up ATP from the eukaryotic cell cytoplasm into the bacterium in exchange for ADP. Five AAA family paralogues are encoded within the genome of R. prowazekii. This organism transports UMP and GMP but not CMP, and it seems likely that one or more of the AAA family paralogues are responsible. The genome of Chlamydia trachomatis encodes two AAA family members, Npt1 and Npt2, which catalyse ATP/ADP exchange and GTP, CTP, ATP and UTP uptake probably employing a proton symport mechanism. Two homologous adenylate translocators of Arabidopsis thaliana are postulated to be localized to the intracellular plastid membrane where they function as ATP importers.

    \ 646 IPR003455 \ \

    This domain is found at the N-terminus of polyketide synthesis O-methyltransferase proteins.

    \ \ 6640 IPR010657 \

    This entry represents a conserved region located towards the N-terminal end of ImpA and related proteins. ImpA is an inner membrane protein, which has been suggested to be involved with proteins that are exported and associated with colony variations in Actinobacillus actinomycetemcomitans PUBMED:11083768. Note that many members are hypothetical proteins.

    \ 7859 IPR013107 \

    Acyl-CoA dehydrogenases () are enzymes that catalyse the first step in each cycle of beta-oxidation in mitochondria. Acyl-CoA dehydrogenases PUBMED:3326738, PUBMED:2777793, PUBMED:8034667 catalyse the alpha,beta-dehydrogenation of acyl-CoA thioesters to the corresponding trans-2,3-enoyl-CoA products with concomitant reduction of enzyme-bound FAD. Reoxidation of the flavin involves transfer of electrons to ETF (electron-transferring flavoprotein). These enzymes are homodimers containing one molecule of FAD.

    The monomeric enzyme is folded into three domains of approximately equal size. The N-terminal and C-terminal domains are mainly alpha-helical and packed together, and the middle domain consists of two orthogonal beta-sheets. The flavin ring is buried in the crevice between the two alpha-helical domains and the beta-sheet of one subunit, and the adenosine pyrophosphate moiety is stretched into the subunit junction with one formed by two C-terminal domains PUBMED:8356049. The C-terminal domain of acyl-CoA dehydrogenase is an all-alpha, four-helical up-and-down bundle.

    \ 2233 IPR006514 \

    These sequences contain an uncharacterised domain found in both Arabidopsis thaliana (at least 10 copies) and Oryza sativa. Most member proteins have only a short stretch of sequence N-terminal to this domain, but one has a long N-terminal extension that includes a protein kinase domain ().

    \ 3512 IPR002351 \

    Nitrophorins are haemoproteins found in the saliva of blood-feeding insects PUBMED:10093938, PUBMED:11058753. The saliva of the blood-sucking bug Rhodnius prolixus contains four homologous nitrophorins, designated NP1 to NP4 in order of their relative abundance in the glands PUBMED:7721773. As isolated, nitrophorins contain nitric oxide (·NO) ligated to the ferric (FeIII) haem iron. Histamine, which is released by the host in response to tissue damage, is another nitrophorin ligand. Nitrophorins transport ·NO to the feeding site. Dilution, binding of histamine and an increase in pH (from pH ~5 in the salivary gland to pH ~7.4 in the host tissue) facilitate the release of ·NO into the tissue, where it induces vasodilatation.

    \ \

    The salivary nitrophorin from the hemipteran Cimex lectularius has no sequence similarity to Rhodnius prolixus nitrophorins. It has been suggested that the two classes of insect nitrophorins arose through convergent evolution PUBMED:9716517.

    \ \

    3-D structures of several nitrophorin complexes are known PUBMED:11058753. The nitrophorin structures reveal a lipocalin-like eight-stranded beta-barrel, three alpha-helices and two disulphide bonds, with the haem inserted into one end of the barrel. Members of the lipocalin family are known to bind a variety of small hydrophobic ligands, including biliverdin, in a similar fashion (see PUBMED:8761444 for review). The haem iron is ligated to His59. The position of His59 is restrained through a water-mediated hydrogen bond to the carboxylate of Asp70. The His59-Fe bond is bent ~15° out of the imidazole plane. Asp70 forms an unusual hydrogen bond with one of the haem propionates, suggesting the residue has an altered pKa. In the NP1-histamine structure (PDB 1NP1), the planes of the His59 and histamine imidazole rings lie in an arrangement almost identical to that found in oxidised cytochrome b5.

    \ 4616 IPR001111 \ The transforming growth factor beta, N-terminus (TGFb) domain is present in a variety of proteins, including transforming growth factor beta, decapentaplegic proteins and bone morphogenetic proteins. Transforming growth factor beta is a multifunctional peptide that controls proliferation, differentiation and other functions in many cell types. The decapentaplegic protein acts as an extracellular morphogen responsible for the proper development of the embryonic dorsal hypoderm, for the viability of larvae and for the viability of the epithelial cells in the imaginal disks. Bone morphogenetic protein induces cartilage and bone formation and may be responsible for epithelial osteogenesis in some organisms.\ 7522 IPR011650 \ This domain consists of four beta strands and two alpha helices, which make up the dimerisation surface of members of the M20 family of peptidases PUBMED:9083113. This family includes a range of zinc metallopeptidases belonging to several families in the peptidase classification PUBMED:7674922. Family M20 comprises glutamate carboxypeptidases; peptidase family M25 contains X-His dipeptidases.\ 3080 IPR013021 \

    This is a region of myo-inositol-1-phosphate synthases that is related to the glyceraldehyde-3-phosphate dehydrogenase-like, C-terminal domain.

    \

    1L-myo-Inositol-1-phosphate synthase () catalyzes the conversion of D-glucose 6-phosphate to 1L-myo-inositol-1-phosphate, the first committed step in the production of all inositol-containing compounds, including phospholipids, either directly or by salvage. The enzyme exists in a cytoplasmic form in a wide range of plants, animals, and fungi. It has also been detected in several bacteria, and a chloroplast form is observed in algae and higher plants. Inositol phosphates play an important role in signal transduction.

    \

    In baker's yeast, Saccharomyces cerevisiae, the transcriptional regulation of the INO1 gene has been studied in detail PUBMED:7975896 and its expression is sensitive to the availability of phospholipid precursors as well as growth phase. The regulation of the structural gene encoding 1L-myo-inositol-1-phosphate synthase has also been analyzed at the transcriptional level in the aquatic angiosperm, Spirodela polyrrhiza and the halophyte, Mesembryanthemum crystallinum PUBMED:9370339.

    \ 1600 IPR000293 \

    Colicins are plasmid-encoded polypeptide toxins produced by and active against Escherichia coli and closely related bacteria. The channel-forming colicins are transmembrane proteins that depolarize the cytoplasmic membrane, leading to dissipation of cellular energy PUBMED:1689257, PUBMED:1693745. Colicins A, B, E1, Ia, Ib, and N belong to that group. The N-terminal part of these colicins is involved in their uptake; the central part is important for binding to outer membrane receptors and the C-terminal part is the channel-forming region.

    \ 5120 IPR007957 \

    L11L is an integral membrane protein of the African swine fever virus that is expressed late in the viral replication cycle. The protein is thought to be non-essential for growth in vitro and for virus virulence in domestic pigs PUBMED:9603334.

    \ 4808 IPR011063 \

    This group of proteins belongs to the PP-loop superfamily PUBMED:7731953.

    \ 16 IPR001048 \ This entry contains proteins with various specificities and includes the aspartate, glutamate and uridylate kinase families. In prokaryotes and plants, the synthesis of the essential amino acids lysine and threonine is predominantly regulated by feedback inhibition of aspartate kinase (AK) and dihydrodipicolinate synthase (DHPS).\ In Escherichia coli, thrA, metLM, and lysC encode aspartokinase isozymes that show feedback inhibition by threonine, methionine, and lysine, respectively PUBMED:10220897. The lysine-sensitive isoenzyme of aspartate kinase from spinach leaves has a subunit composition of 4 large and 4 small subunits PUBMED:9584993. \

    In plants, although the control of carbon fixation and nitrogen assimilation has been studied in detail, relatively little is known about the regulation of carbon and nitrogen flow into amino acids. The metabolic regulation of expression of an Arabidopsis thaliana aspartate kinase/homoserine dehydrogenase (AK/HSD) gene, which encodes two linked key enzymes in the biosynthetic pathway of aspartate family amino acids, has been studied PUBMED:9501134. The conversion of aspartate into either the storage amino acid asparagine or aspartate family amino acids may be subject to coordinated, reciprocal metabolic control, and this biochemical branch point is part of a larger, coordinated regulatory mechanism of nitrogen and carbon storage and utilization.

    \ 490 IPR001879 \

    This domain is found in the extracellular part of some hormone receptors including the calcitonin receptor; corticotropin releasing factor receptor 1; diuretic hormone receptor; glucagon-like peptide 1 receptor; and parathyroid hormone peptide receptor.

    \ 5398 IPR008484 \ This family consists of several short (27-30 aa) porcine and bovine circovirus ORF6 proteins of unknown function.\ 276 IPR007158 \ The proteins in this family are around 200 amino acids long with the exception of that has an additional 100 amino acids at its N terminus. The function of these bacterial proteins is unknown; however, they do contain several conserved histidines and aspartates that might form a metal-binding site.\ 1302 IPR005042 \

    The pathogenicity gene, pthA, of Xanthomonas citri is required to elicit symptoms of Asiatic citrus canker disease; introduction of pthA into Xanthomonas strains that are mildly pathogenic or opportunistic on citrus confers the ability to induce cankers on citrus. Structurally, pthA is highly similar to avrBs3 and avrBsP from X. c. pv. vesicatoria and to avrB4, avrb6, avrb7, avrBIn, avrB101, and avrB102 from X. c. pv. malvacearum PUBMED:1421509.

    \ 4863 IPR005356 \

    The proteins in this family are about 190 amino acids long. Their function is unknown.

    \ 2348 IPR002785 \

    Proteins of this family are associated with CRISPR repeats in a wide set of prokaryotic genomes; their function is unknown.

    \ 1719 IPR001206 \ Diacylglycerol kinase (DGK, ) phosphorylates diacylglycerol (DAG) to yield phosphatidic acid. This enzyme initiates resynthesis of phosphoinositides consumed by phospholipase C during cellular signal transduction. Mammalian DGK consists of nine isozymes encoded by separate genes PUBMED:11983067. In addition to PKC-like zinc fingers and catalytic regions commonly conserved in all DGKs, these isozymes contain a variety of regulatory domains of known and/or predicted functions. The mammalian isozymes are named according to the order of their cDNA cloning and are subdivided into five groups based on their characteristic structural features. Each DGK isozyme is a critical downstream component of a DAG-dependent signaling system. \

    This domain is usually associated with an accessory domain (see ).

    \ 7339 IPR011090 \

    This family of proteins is restricted to the Gammaproteobacteria; their function is unknown.

    \ 510 IPR005522 \

    ArgRIII has been demonstrated to be an inositol polyphosphate kinase PUBMED:10574768 which catalyses the reaction\ \ .

    \ 5888 IPR010335 \

    This family consists of several mammalian pre-pro-megakaryocyte potentiating factor precursor (MPF) or mesothelin proteins. Mesothelin is a glycosylphosphatidylinositol-linked glycoprotein highly expressed in mesothelial cells, mesotheliomas, and ovarian cancer, but the biological function of the protein is not known PUBMED:10733593, PUBMED:10500211.

    \ 6623 IPR009629 \

    This family consists of several Erythrovirus X proteins, which seem to be found exclusively in human parvovirus and human erythrovirus. The function of this family is unknown.

    \ 322 IPR007807 \ This domain is about 350 amino acid residues long and appears to have a P-loop motif, suggesting that it is an ATPase. This domain is often N-terminal to a GCN5-related N-acetyltransferase domain .\ 1776 IPR000512 \ Diphtheria toxin () is a 58 kDa protein secreted by lysogenic strains of Corynebacterium diphtheriae. The toxin causes the disease diphtheria in humans by gaining entry into the cell cytoplasm and inhibiting protein synthesis PUBMED:8573568. The mechanism of inhibition involves transfer of the ADP-ribose group of NAD to elongation factor-2 (EF-2), rendering EF-2 inactive. The catalysed reaction is as follows: \ \ The crystal structure of the diphtheria toxin homodimer has been determined to 2.5 Å resolution PUBMED:1589020. The structure reveals a Y-shaped molecule of three domains: a catalytic domain (fragment A), whose fold is of the alpha + beta type; a transmembrane (TM) domain, which consists of nine alpha-helices, two pairs of which may participate in pH-triggered membrane insertion and translocation; and a receptor-binding domain, which forms a flattened beta-barrel with a jelly-roll-like topology PUBMED:1589020. The TM and receptor-binding domains together constitute fragment B.\ 2566 IPR002910 \ This family consists of various plant development proteins that are homologues of the Floricaula (FLO) and leafy (LFY) floral meristem identity proteins. Mutations in the sequences of these proteins affect flower and leaf development.\ 4477 IPR004169 \ The spider neurotoxins in this family are thought to be calcium ion channel inhibitors.\ 6422 IPR010566 \

    This family consists of a number of bacteria-specific domains, which are found in haemolysin-type calcium-binding proteins. This family is found in conjunction with and is often found in multiple copies.

    \ 6920 IPR009790 \

    This family consists of several hypothetical mammalian proteins of around 250 residues in length. The function of this family is unknown.

    \ 912 IPR004095 \

    The TGS domain is present in a number of enzymes, for example, in threonyl-tRNA synthetase (ThrRS), GTPase, and guanosine-3',5'-bis(diphosphate) 3'-pyrophosphohydrolase (SpoT) PUBMED:10447505. The TGS domain is also present at the amino terminus of the uridine kinase from the spirochaete Treponema pallidum (but not any other organism, including the related spirochaete Borrelia burgdorferi).

    \

    TGS is a small domain that consists of ~50 amino acid residues and is predicted to possess a predominantly beta-sheet structure. There is no direct information on the functions of the TGS domain, but its presence in two types of regulatory proteins (the GTPases and guanosine polyphosphate phosphohydrolases/synthetases) suggests a ligand (most likely nucleotide)-binding, regulatory role PUBMED:10447505.

    \ 6914 IPR009787 \

    This family consists of several hypothetical eukaryotic proteins of around 190 residues in length. The function of this family is unknown.

    \ 7194 IPR009959 \

    This family consists of several hypothetical bacterial proteins of around 125 residues in length. The function of this family is unknown.

    \ 7253 IPR009990 \

    This family consists of several Pardaxin proteins. Pardaxin, a 33-amino-acid pore-forming polypeptide toxin isolated from the Red Sea Moses sole Pardachirus marmoratus, has a helix-hinge-helix structure. This is a common structural motif found both in antibacterial peptides that can act selectively on bacterial membranes (e.g., cecropin), and in cytotoxic peptides that can lyse both mammalian and bacterial cells (e.g., melittin). Pardaxin possesses a high antibacterial activity with a significantly reduced haemolytic activity towards human red blood cells compared with melittin PUBMED:8620888. Pardaxin has also been found to have a shark repellent action PUBMED:3996550.

    \ 2954 IPR004792 \ This is a family of conserved hypothetical proteins that may include proteins with a dinucleotide-binding motif (Rossmann fold), including oxidoreductases and dehydrogenases.\ 3812 IPR006516 \

    This set of sequences represents a family of phage and plasmid replication proteins. In bacteriophage IKe and related phage, the full-length protein is designated gene II protein. A much shorter protein of unknown function, translated from a conserved in-frame alternative initiator, is designated gene X protein. Members of this family also include plasmid replication proteins.

    \ 1874 IPR003366 \ This domain is found in a family of hypothetical Caenorhabditis elegans proteins. Neither the aligned region nor any of the proteins that possess it has a known function. The aligned region is approximately 130 amino acids long and contains two conserved cysteine residues.\ 7590 IPR011673 \ This is a family of proteins of unknown function expressed by various bacterial species. Some members of this family (e.g. , ) are thought to be lipoproteins. Another member of this family () is thought to be involved in photosynthesis PUBMED:10976061.\ 6662 IPR009649 \

    This family consists of several bacterial TraU proteins. TraU appears to be more essential to conjugal DNA transfer than to assembly of pilus filaments PUBMED:2198250.

    \ 6613 IPR009623 \

    This is a group of proteins of unknown function.

    \ 3164 IPR003334 \ Latrophilin is a member of the secretin family of G protein-coupled receptors. Alpha-latrotoxin (LTX) stimulates massive exocytosis of synaptic vesicles and may help to elucidate the mechanism of regulation of neurosecretion. Latrophilin is the synaptic Ca2+-independent LTX receptor. The extracellular domain of latrophilin is homologous to olfactomedin (see ), a soluble neuronal protein thought to participate in odorant binding. Latrophilin may bind unidentified endogenous ligands and transduce signals into nerve terminals, thus implicating G proteins in the control of synaptic vesicle exocytosis PUBMED:9261169.\ 963 IPR003903 \ The Ubiquitin Interacting Motif (UIM) was first described in the 26S proteasome subunit PSD4/RPN-10 PUBMED:9488668. It is known to bind multiple ubiquitin moieties and has also been found in many proteins involved in the endocytic pathway, including the PSD4/RPN-10/S5a multiubiquitin-binding subunit of the 26S proteasome, the VPS27 vacuolar sorting protein, and ataxin-3, a protein involved in ataxia.\ 1388 IPR003760 \ This is a family of basic membrane lipoproteins from Borrelia and various putative lipoproteins from other bacteria. All of these proteins are outer membrane proteins and are thus antigenic when produced by the pathogenic members of the family PUBMED:9350727. \

    The Bacillus subtilis degR, a positive regulator of the production of degradative enzymes, is also a member of this group PUBMED:9335269.

    \ 1790 IPR007059 \ The terminal electron transfer enzyme dimethyl sulphoxide reductase of Escherichia coli is a heterotrimeric enzyme composed of a membrane extrinsic catalytic dimer (DmsAB) and a membrane intrinsic polytopic anchor subunit (DmsC) PUBMED:8429002. This family represents DmsC.\ \ \ 4375 IPR001991 \

    It has been shown PUBMED:8031825 that integral membrane proteins that mediate the uptake of a wide variety of molecules with the concomitant uptake of sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number of distinct families. One of these families PUBMED:1279699 is known as the sodium:dicarboxylate symporter family (SDF).

    \ \

    The re-uptake of neurotransmitters from synapses is thought to be an important mechanism for terminating their action, by removing these chemicals from the synaptic cleft and transporting them into presynaptic nerve terminals and surrounding neuroglia. This removal is also believed to prevent them from accumulating to neurotoxic levels PUBMED:1448170, PUBMED:1280334.

    \ \

    The structure of these transporter proteins has been variously reported to contain from 8 to 10 transmembrane (TM) regions, although 10 now seems to be the accepted value.

    \ \

    Members of the family include several mammalian excitatory amino acid transporters and a number of bacterial transporters. They vary with regard to their dependence on the transport of sodium and other ions.

    \ 832 IPR007856 \

    Synonym(s): cerebroside sulphate activator, CSAct

    Saposin B is a small non-enzymatic glycoprotein required for the breakdown of cerebroside sulphates (sulphatides) in lysosomes. Saposin B contains three intramolecular disulphide bridges, exists as a dimer and is remarkably heat, protease, and pH stable. The crystal structure of human saposin B reveals an unusual shell-like dimer consisting of a monolayer of alpha-helices enclosing a large hydrophobic cavity. Although the secondary structure of saposin B is similar to that of the known monomeric members of the saposin-like superfamily, the helices are repacked into a different tertiary arrangement to form the homodimer. A comparison of the two forms of the saposin B dimer suggests that extraction of target lipids from membranes involves a conformational change that facilitates access to the inner cavity PUBMED:12518053.

    \ \ 5161 IPR007998 \

    This family consists of several eukaryotic proteins of unknown function.

    \ 8081 IPR013213 \

    Mastoparans are a family of tetradecapeptides from wasp venom that have been shown to directly activate GTP-binding regulatory proteins. These peptides show selectivity among G proteins: they strongly activate Go and Gi but not Gs or Gt. Although all peptides in this family are 14 amino acids long, they can adopt different structures PUBMED:9537994.

    \ 3071 IPR001811 \ Synonym(s): cytokine, intercrine\

    Many low-molecular-weight factors secreted by cells, including fibroblasts, macrophages and endothelial cells, in response to a variety of stimuli such as growth factors, interferons, viral transformation and bacterial products, are structurally related PUBMED:1910690, PUBMED:2149646, PUBMED:2687068. Most members of this family of proteins seem to have mitogenic, chemotactic or inflammatory activities. These small cytokines are also called intercrines or chemokines. They are cationic proteins of 70 to 100 amino acid residues that share four conserved cysteine residues involved in two disulphide bonds, as shown in the following schematic representation:\

    \
                                 +------------------------------------+\
                                 |                                    |\
         xxxxxxxxxxxxxxxxxxxxxxCxCxxxxxxxxxxxxxxxxxxxxxxxCxxxxxxxxxxxxCxxxxx\
                               |                         |\
                               +-------------------------+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    

    \

    These proteins can be sorted into two groups based on the spacing of the two amino-terminal cysteines. In the first group (see ), the two cysteines are separated by a single residue (C-x-C), while in the second group (see ), they are adjacent (C-C).
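    The two-group split above depends only on how many residues separate the first two cysteines, so it can be written as a short rule. The sketch below uses a hypothetical helper (`classify_chemokine` is not part of any InterPro tool). It assumes the input is a mature chemokine chain with the signal peptide removed, because cysteines upstream of the conserved motif would shift the result. It is an illustration only and no substitute for profile-based family assignment.

```python
def classify_chemokine(seq: str) -> str:
    """Classify a chemokine-like sequence by the spacing of its first two Cys.

    Hypothetical helper: assumes `seq` is a mature chain (signal peptide
    removed) in one-letter amino-acid code.
    """
    # Indices of every cysteine in the sequence.
    cys = [i for i, residue in enumerate(seq.upper()) if residue == "C"]
    if len(cys) < 2:
        return "unclassified"
    # Number of residues lying between the first and second cysteine.
    gap = cys[1] - cys[0] - 1
    if gap == 1:
        return "C-X-C"
    if gap == 0:
        return "C-C"
    return "unclassified"


# Toy sequences, not real chemokines.
print(classify_chemokine("AAACACAAAACAAC"))  # C-X-C
print(classify_chemokine("AAACCAAAACAAC"))   # C-C
```

    Only the first cysteine pair is inspected, mirroring the grouping criterion in the text; the remaining two conserved cysteines are not checked.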

    \ \ 6692 IPR009663 \

    This family consists of several enterobacterial PilO proteins. The function of PilO is unknown. It has been suggested that PilO is a cytoplasmic protein in the absence of other Pil proteins but is translocated to the outer membrane in their presence. Alternatively, PilO may form a complex with other Pil protein(s). PilO has been predicted to function as a component of the pilin transport apparatus and thin-pilus basal body PUBMED:11751821. This family does not seem to be related to .

    \ 6390 IPR010553 \

    This family consists of several hypothetical and putative bacterial sugar phosphate isomerase proteins related to RpiB.

    \ 7924 IPR012627 \

    This family consists of Magi peptide toxins (Magi 1, 2 and 5) isolated from the venom of a spider of the family Hexathelidae. These insecticidal peptide toxins bind to sodium channels and induce flaccid paralysis when injected into lepidopteran larvae. However, they are not toxic to mice when injected intracranially at 20 pmol/g.

    \ 476 IPR000445 \ The HhH motif is a domain of around 20 amino acids present in prokaryotic and eukaryotic non-sequence-specific DNA-binding proteins PUBMED:7664751, PUBMED:9973609, PUBMED:9987128. \ The HhH motif is similar to, but distinct from, the HtH motif. Both of these motifs have two helices connected by a short turn. In the HtH motif, the second helix binds to DNA with the helix in the major groove. This allows contacts between specific bases and residues throughout the protein. In the HhH motif, the second helix does not protrude from the surface of the protein and therefore cannot lie in the major groove of the DNA. Crystallographic studies suggest that the interaction of the HhH domain with DNA is mediated by amino acids located in the strongly conserved loop (L-P-G-V) and at the N-terminal end of the second helix PUBMED:7664751. This interaction could involve the formation of hydrogen bonds between protein backbone nitrogens and DNA phosphate groups PUBMED:8692686. \ The structural difference between the HtH and HhH domains is reflected at the functional level: whereas the HtH domain, found primarily in gene regulatory proteins, binds DNA in a sequence-specific manner, the HhH domain is found instead in proteins with enzymatic activities and binds DNA without sequence specificity PUBMED:8692686.\ 7966 IPR012995 \

    This family consists of the CIII family of regulatory proteins. The lambda CIII protein has 54 amino acids and it forms an amphipathic helix within its amino acid sequence. Lambda CIII stabilises the lambda CII protein and the host sigma factor 32, responsible for transcribing genes of the heat shock regulon PUBMED:8990286.

    \ 6746 IPR009689 \

    This family represents a conserved region approximately 200 residues long within a number of proteins of unknown function that seem to be specific to Caenorhabditis elegans.

    \ 5208 IPR008042 \

    This signature identifies members of the Pao retrotransposon family.

    \ 6929 IPR009795 \

    This family consists of several Trypanosoma brucei putative variant specific antigen proteins of around 80 residues in length.

    \ 616 IPR006988 \ Nab1 and Nab2 are co-repressors that specifically interact with and repress transcription mediated by the three members of the NGFI-A (Egr-1, Krox24, zif/268) family of eukaryotic (metazoa) transcription factors PUBMED:9418898. This region consists of the N-terminal NAB conserved region 1, which interacts with the EGR1 inhibitory domain (R1) PUBMED:9418898. It may also mediate multimerisation.\ \ 2916 IPR006928 \

    This entry represents the N-terminal domain of the Herpesvirus tegument protein.

    \ 3084 IPR004825 \ The insulin family of proteins PUBMED:6107857 groups a number of evolutionarily related active peptides, including insulin; relaxin; insulin-like growth factors I and II PUBMED:2197088; mammalian Leydig cell-specific insulin-like peptide (gene INSL3) PUBMED:8253799 and early placenta insulin-like peptide (ELIP) (gene INSL4) PUBMED:8666396; insect prothoracicotropic hormone (bombyxin) PUBMED:; locust insulin-related peptide (LIRP) PUBMED:1688797; molluscan insulin-related peptides 1 to 5 (MIP) PUBMED:1868853; and Caenorhabditis elegans insulin-like peptides PUBMED:9548970. Structurally, all these peptides consist of two polypeptide chains (A and B) linked by two disulphide bonds. They all share a conserved arrangement of four cysteines in their A chain. The first of these cysteines is linked by a disulphide bond to the third one, and the second and fourth cysteines are linked by interchain disulphide bonds to cysteines in the B chain. Insulin is involved in the regulation of normal glucose homeostasis, as well as other specific physiological functions PUBMED:6243748. It is synthesised as a prepropeptide from which an endoplasmic reticulum-targeting sequence is cleaved to yield proinsulin. Proinsulin contains regions A and B separated by an intervening connecting region, C. The connecting region is cleaved, liberating the active protein, which contains the A and B chains held together by two disulphide bonds PUBMED:503234.\ 2944 IPR000361 \

    This family includes HesB which may be involved in nitrogen fixation; the hesB gene is expressed only under nitrogen fixation conditions PUBMED:10217509. Other members of this family include various hypothetical proteins that also contain the NifU-like domain (). NifU-like proteins are found in species as divergent as humans and H. influenzae suggesting that these proteins perform some basic cellular function PUBMED:8875867.

    \ 4123 IPR001590 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
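    Because the extended motif is defined position by position, it maps naturally onto a regular expression. The sketch below is illustrative only: the residue sets used for 'uncharged' (every standard residue except Asp, Glu, Lys, Arg and Pro) and 'hydrophobic' are assumptions rather than a curated definition; position 'a' is restricted to Val or Thr even though the text says these are only the most common residues there; and proline is excluded from every position. A zero-width lookahead is used so that overlapping hits are all reported. `find_motifs` is a hypothetical helper name.

```python
import re

# Assumed residue classes (one-letter codes); proline is left out of all of them.
UNCHARGED = "ACFGHILMNQSTVWY"   # all standard residues except D, E, K, R and P
HYDROPHOBIC = "AVLIMFWY"        # a simple, assumed hydrophobic set

# Plain zinc-binding motif H-E-X-X-H, with X != Pro.
HEXXH = re.compile(r"(?=(HE[^P]{2}H))")

# Extended motif abXHEbbHbc: a = V/T, b = uncharged, c = hydrophobic, X != Pro.
EXTENDED = re.compile(
    rf"(?=([VT][{UNCHARGED}][^P]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_motifs(pattern: re.Pattern, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every overlapping hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


toy = "MKVAGHELLHSLK"  # synthetic sequence, not a real protease
print(find_motifs(HEXXH, toy))     # [(5, 'HELLH')]
print(find_motifs(EXTENDED, toy))  # [(2, 'VAGHELLHSL')]
```

    A motif hit is only a hint: as the text notes, HEXXH is relatively common, so matches outside a plausible metalloprotease context should not be read as evidence of catalytic activity.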

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
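    The naming rule above, where the first character of an identifier encodes the catalytic type, can be decoded mechanically. The lookup tables below only restate this paragraph; `describe_family` is a hypothetical illustration, not part of any MEROPS software, and it covers only the type letters listed here.

```python
# Catalytic-type letters as listed in the text above.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clans containing families of more than one type)",
}

# Nucleophile used by each catalytic type, per the text.
NUCLEOPHILE = {
    "S": "amino acid residue (forms an acyl intermediate)",
    "T": "amino acid residue (forms an acyl intermediate)",
    "C": "amino acid residue (forms an acyl intermediate)",
    "A": "activated water molecule",
    "G": "activated water molecule",
    "M": "activated water molecule",
}


def describe_family(identifier: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for an identifier such as 'M12'."""
    letter = identifier.strip()[:1].upper()
    if letter not in CATALYTIC_TYPE:
        raise ValueError(f"unrecognised catalytic type in {identifier!r}")
    return CATALYTIC_TYPE[letter], NUCLEOPHILE.get(letter, "not specified")


print(describe_family("M12"))  # ('metallo', 'activated water molecule')
```

    Types U and P have no single nucleophile, so the helper reports "not specified" for them rather than guessing.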

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M12, subfamily M12B (adamalysin family), clan MA(M). The protein fold of the peptidase domain of members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active-site residues of both these proteins and thermolysin occur in the motif HEXXH PUBMED:7674922.

    \ \

    The adamalysins are zinc-dependent endopeptidases found in snake venom. Some mammalian proteins, such as ,\ and fertilin, also belong to this group. Fertilin and closely related\ proteins appear to lack some active-site residues and\ may not be active enzymes.

    \ \ \

    CD156 (also called ADAM8 or human MS2) has been implicated in the extravasation of leukocytes. CD molecules are leukocyte antigens on cell surfaces (CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm).\

    \ \ 1381 IPR007100 \

    RNA-directed RNA polymerase (RdRp) is an essential protein encoded in the genomes of all RNA-containing viruses with no DNA stage. It catalyses synthesis of the RNA strand complementary to a given RNA template. The RdRps of many viruses are products of polyprotein processing. Some RdRps consist of one polypeptide chain; others are complexes of several subunits.\ The domain organization PUBMED:9878607 and the 3D structure of the catalytic center of a wide range of RdRps, even those with low overall sequence homology, are conserved. The catalytic center is formed by several motifs containing a number of conserved amino acid residues.

    \ \

    There are 4 superfamilies of viruses that cover all RNA containing viruses with no DNA stage:

    \ The RNA-directed RNA polymerases may also be divided into the following three subgroups of the above superfamily:\

    \ \

    This family consists of the Birnaviridae enzymes. These proteins lack the highly conserved Gly-Asp-Asp (GDD) sequence, a component of the proposed catalytic site of this enzyme family that exists in the conserved motif VI of the palm domain of other RNA-directed RNA polymerases PUBMED:12069523.
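The GDD check described above can be sketched as a plain substring scan. `find_gdd` is a hypothetical helper, not an established tool. A GDD hit does not by itself show that the tripeptide sits in motif VI of a palm domain. Likewise, the absence of GDD is only consistent with a birnavirus-type enzyme; it does not prove one.

```python
def find_gdd(seq: str) -> list[int]:
    """Return 0-based start positions of every Gly-Asp-Asp (GDD) tripeptide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "GDD"]


print(find_gdd("AAGDDAAGDD"))  # [2, 7]
```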

    \ 7933 IPR012965 \

    This is a fungal domain of unknown function, though the yeast protein MSB1, which contains this domain, is thought to play a role in bud formation PUBMED:1996092.

    \ 3374 IPR007182 \

    This domain is found in a possible subunit of the Na+/H+ antiporter PUBMED:9852009, PUBMED:11356194 as well as in the bacterial NADH dehydrogenase subunit. Usually four transmembrane regions are found in this domain.

    \ 4418 IPR004124 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Sialidases hydrolyse alpha-(2->3)-, alpha-(2->6)- and alpha-(2->8)-glycosidic linkages of terminal sialic acid residues in oligosaccharides, glycoproteins, glycolipids, colominic acid and synthetic substrates. Sialidases may act as pathogenic factors in microbial infections PUBMED:2034213.

    \

    The 1.8 Å structure of trans-sialidase from the leech Macrobdella decora, in complex with 2-deoxy-2,3-didehydro-NeuAc, has been solved. The refined model, comprising\ residues 81-769, has a catalytic beta-propeller domain, an N-terminal lectin-like domain and an irregular beta-stranded\ domain inserted into the catalytic domain PUBMED:9562562.

    \ 2815 IPR001899 \

    Viruses, parasites and bacteria are covered in protein and sugar molecules that help them gain entry into a host by counteracting the host's defences. One such molecule is the M protein produced by certain streptococcal bacteria. M proteins embody a motif that is now known to be shared by many Gram-positive bacterial surface proteins. The motif includes a conserved hexapeptide, which precedes a hydrophobic C-terminal membrane anchor, which itself precedes a cluster of basic residues PUBMED:2188957, PUBMED:2287281.\ This arrangement is shown in the following schematic:

    \
    \
      +--------------------------------------------+-+--------+-+\
      |    Variable length extracellular domain    |H| Anchor |B|\
      +--------------------------------------------+-+--------+-+\
    \
      'H': conserved hexapeptide.\
      'B': cluster of basic residues.\
    
    \

    It has been proposed that this hexapeptide sequence is responsible for a post-translational modification necessary for the proper anchoring of the proteins that bear it to the cell wall.
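The three-part layout in the schematic (hexapeptide, hydrophobic anchor, basic tail) can be checked with a simple scan near the C terminus. The text does not spell out the hexapeptide itself. This sketch assumes the PROSITE-style consensus L-P-x-T-G-[STGAVDE]. The search window, the hydrophobic residue set and the thresholds in the hypothetical `find_anchor` function are illustrative choices, not published cut-offs.

```python
import re
from typing import Optional

# Assumed hexapeptide consensus (PROSITE-style L-P-x-T-G-[STGAVDE]).
HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")
HYDROPHOBIC = set("AVLIMFWGC")  # illustrative residue set for the membrane anchor
BASIC = set("KR")


def find_anchor(seq: str, window: int = 50, tail: int = 8,
                min_basic: int = 2, min_hydrophobic: float = 0.6) -> Optional[int]:
    """Return the start of the hexapeptide if the hexapeptide/anchor/basic-tail
    layout is found within the last `window` residues, otherwise None."""
    seq = seq.upper()
    hit = HEXAPEPTIDE.search(seq, max(0, len(seq) - window))
    if hit is None:
        return None
    after = seq[hit.end():]
    anchor, basic_tail = after[:-tail], after[-tail:]
    if not anchor:
        return None
    hydrophobic_fraction = sum(r in HYDROPHOBIC for r in anchor) / len(anchor)
    basic_count = sum(r in BASIC for r in basic_tail)
    if hydrophobic_fraction >= min_hydrophobic and basic_count >= min_basic:
        return hit.start()
    return None


toy = "A" * 100 + "LPSTGE" + "LLLLAVIGLLAVLAGA" + "KRKRE"  # invented test sequence
print(find_anchor(toy))  # 100
```

A real cell-wall sorting-signal predictor would use better-calibrated thresholds. The point here is only to show the order of the three elements.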

    \ \ 7121 IPR009910 \

    This family consists of several hypothetical bacterial proteins of around 80 residues in length. Members of this family contain four highly conserved cysteine residues. The function of this family is unknown.

    \ 130 IPR001547 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 5 comprises enzymes with several known activities: endoglucanase, beta-mannanase, exo-1,3-glucanase, endo-1,6-glucanase, xylanase and endoglycoceramidase.

    \ \

    The microbial degradation of cellulose and xylans requires several types of enzymes. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as cellulase family A PUBMED:2806912 or as glycosyl hydrolase family 5 PUBMED:1747104. One of the conserved regions in this family contains a conserved glutamic acid residue that is potentially involved in the catalytic mechanism PUBMED:1677466.

    \ 3984 IPR004259 \ This family includes the M1 phosphoprotein non-structural RNA polymerase alpha subunit from various strains of rabies virus PUBMED:2148206. The M1 phosphoprotein is thought to be a\ component of the active polymerase, and may be involved in template binding.\ 8097 IPR013272 \

    This domain is found at the C terminus in proteins of the YL1 family PUBMED:7702631. These proteins have been shown to be DNA-binding and may be transcription factors PUBMED:7702631. This domain is also found in proteins that do not belong to the YL1 family.

    \ 7146 IPR009927 \

    This family consists of several hypothetical archaeal proteins of around 350 residues in length. The function of this family is unknown.

    \ 3281 IPR000102 \ In MAP1B, the basic region containing the KKEE and KKEVI motifs is responsible for the interaction between MAP1B and microtubules in vivo. This region bears no sequence relationship to the microtubule-binding domains of kinesin, MAP2 or tau PUBMED:2480963.\ Neuraxin is a putative structural protein of the rat central nervous system that is immunologically related to\ microtubule-associated protein 5 (MAP5). Neuraxin may be implicated in neuronal membrane-microtubule interactions PUBMED:2555150. Both proteins contain a region that consists of 12 tandem\ repeats of a 17-residue motif.\ 4152 IPR002734 \ This domain is\ found in the C terminus of the bifunctional deaminase-reductase of Escherichia coli, Bacillus subtilis and other bacteria, in combination with . This bifunctional enzyme catalyses the second and third steps in the biosynthesis of riboflavin, i.e., the deamination of 2,5-diamino-6-ribosylamino-4(3H)-pyrimidinone 5'-phosphate (deaminase) and the subsequent reduction of the ribosyl side chain (reductase) PUBMED:9068650. The domain is also present in some HTP reductases from archaea and fungi.\ 7436 IPR011465 \

    This is a family of paralogous proteins from Rhodopirellula baltica.

    \ 6628 IPR010651 \

    This is a family of bacterial sugar transporters approximately 300 residues long. Members include glucose uptake proteins PUBMED:10438764, ribose transport proteins, and several putative and hypothetical membrane proteins probably involved in sugar transport across bacterial membranes.

    \ 4540 IPR005828 \

    Recent genome-sequencing data and a wealth of biochemical and molecular genetic investigations have revealed the occurrence of dozens of families of primary and secondary transporters. Two such families have been found to occur ubiquitously in all major groups of living organisms. These are the ATP-binding cassette (ABC) superfamily and the major facilitator superfamily (MFS), also called the uniporter-symporter-antiporter family. ABC family permeases are in general multicomponent primary active transporters, capable of transporting both small molecules and macromolecules in response to ATP hydrolysis, whereas the MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients. Although well over 100 families of transporters have now been recognized and classified, the ABC superfamily and MFS account for nearly half of the solute transporters encoded within the genomes of microorganisms. They are also prevalent in higher organisms. The\ importance of these two families of transport systems to living organisms therefore cannot be overstated PUBMED:9529885.

    \ \

    The MFS was originally believed to function primarily in the uptake of sugars, but subsequent studies revealed that drug efflux systems, transporters of Krebs cycle metabolites, organophosphate:phosphate exchangers, oligosaccharide:H+ symport permeases and bacterial aromatic acid permeases were all members of the MFS. These observations suggested that the MFS is far more widespread in nature and far more diverse in function than had previously been thought. Seventeen subgroups of the MFS have been identified PUBMED:9529885.

    \ \

    Evidence suggests that the MFS permeases arose by a tandem intragenic duplication event in the early prokaryotes. This event generated a 12-transmembrane-spanner (TMS) protein topology from a primordial 6-TMS unit. Surprisingly, all currently recognized MFS permeases retain the two six-TMS units within a single polypeptide chain, although in 3 of the 17 MFS families an additional two TMSs are found PUBMED:8987357. Moreover, the well-conserved MFS-specific motif between TMS2 and TMS3 and the related but less well conserved motif between TMS8 and TMS9 PUBMED:1970645 prove to be characteristic of virtually all of the more than 300 MFS proteins identified.
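Counting transmembrane segments, as in the 12-TMS topology described above, is often approximated from a hydropathy profile. This sketch uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 threshold. That is a common rule of thumb, not something stated in the text. It is a crude predictor and will not reliably recover 12 segments for every MFS permease. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (one-letter amino-acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every run of `window` consecutive residues."""
    seq = seq.upper()
    return [
        sum(KYTE_DOOLITTLE.get(r, 0.0) for r in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def count_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count contiguous stretches of windows whose mean exceeds `threshold`."""
    count, inside = 0, False
    for score in hydropathy_profile(seq, window):
        if score > threshold:
            if not inside:
                count += 1
            inside = True
        else:
            inside = False
    return count


toy = "K" * 20 + "L" * 20 + "K" * 20 + "L" * 20 + "K" * 20  # two hydrophobic blocks
print(count_tm_segments(toy))  # 2
```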

    \ \ 3092 IPR003895 \

    Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior PUBMED:7608068. There have been four secretion systems described in \ animal enteropathogens, such as Salmonella and Yersinia, with further \ sequence similarities in plant pathogens like Ralstonia and Erwinia.

    \

    The type III secretion system is of great interest, as it is used to \ transport virulence factors from the pathogen directly into the host cell \ PUBMED:9618447 and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a\ ring structure to allow secretion of flagellin and is an integral part of\ the flagellum itself PUBMED:10564516, type III subunits in the outer membrane\ translocate secreted proteins through a channel-like structure.

    \

    Exotoxins secreted by the type III system do not possess a secretion signal,\ and are considered unique for this reason PUBMED:9618447. Salmonella and Shigella spp. secrete invasion proteins, SipB and IpaB respectively, which are required for internalisation of the bacterium within the host\ cell PUBMED:7608068. Apoptosis is then induced by the binding of IL-1 converting enzyme to the exotoxin.

    \ 2994 IPR005000 \

    This family includes 2,4-dihydroxyhept-2-ene-1,7-dioic acid aldolase and 4-hydroxy-2-oxovalerate aldolase.

    \ 4015 IPR007345 \ Pyruvyl-transferases are involved in peptidoglycan-associated polymer biosynthesis. CsaB in Bacillus anthracis is necessary for the non-covalent anchoring of proteins containing an SLH (S-layer homology) domain to peptidoglycan-associated pyruvylated polysaccharides. WcaK and AmsJ are involved in the biosynthesis of colanic acid in Escherichia coli and of amylovoran in Erwinia amylovora, respectively PUBMED:10970841.\ 3125 IPR004684 \ This family includes the characterized 2-keto-3-deoxygluconate transporters from Bacillus subtilis and Erwinia chrysanthemi. Homologues of this protein are found in both Gram-positive and Gram-negative bacteria.\ 4526 IPR000366 \

    G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \ \

    Little is known about the structure and function of the mating factor\ receptors, STE2 and STE3. It is believed, however, that they are integral\ membrane proteins that may be involved in the response to mating factors\ on the cell membrane PUBMED:, PUBMED:3001640, PUBMED:2836861. The amino acid sequences of both receptors\ contain high proportions of hydrophobic residues grouped into 7 domains,\ in a manner reminiscent of the rhodopsins and other receptors believed to\ interact with G-proteins. However, while a similar 3D framework has been\ proposed to account for this, there is no significant sequence similarity\ either between STE2 and STE3, or between these and the rhodopsin-type\ family: the receptors thus bear their own unique '7TM' signatures.

    \ \ 2464 IPR000305 \

    During Escherichia coli nucleotide excision repair, DNA damage recognition and processing are achieved by the action of the uvrA, uvrB and uvrC gene products PUBMED:12034838. The UvrC proteins contain four conserved regions: a central region that interacts with UvrB (Uvr domain), a helix-hairpin-helix (HhH) domain important for 5' incision of damaged DNA, and homology regions 1 and 2, whose functions are unknown. UvrC homology region 2 is specific to UvrC proteins, whereas UvrC homology region 1 is also shared by a few other nucleases.

    \

    This domain is found in the amino-terminal region of excinuclease ABC subunit C (UvrC) and of bacteriophage T4 endonucleases segA, segB, segC, segD and segE; it is also found in putative endonucleases encoded by group I introns of fungi and phage.

    \ 1258 IPR001962 \ This domain is always found associated with (). Family members that contain this domain catalyse the conversion of aspartate to asparagine. Asparagine synthetase B catalyzes the assembly of asparagine from aspartate, Mg(2+)ATP and glutamine.\ The three-dimensional architecture of the N-terminal domain of asparagine synthetase B is similar to that observed for glutamine phosphoribosylpyrophosphate amidotransferase, while the molecular motif of the C-terminal domain is reminiscent of that observed for GMP synthetase PUBMED:10587437.\ 1015 IPR002893 \

    The MYND domain (myeloid, Nervy, and DEAF-1) is\ present in a large group of proteins that includes RP-8\ (PDCD2), Nervy, and predicted proteins\ from Drosophila, mammals, Caenorhabditis elegans, yeast, and\ plants PUBMED:7498738, PUBMED:8617243,\ PUBMED:2072913. The MYND domain consists of a\ cluster of cysteine and histidine residues, arranged with\ an invariant spacing to form a potential zinc-binding\ motif PUBMED:8617243. Mutating conserved\ cysteine residues in the DEAF-1 MYND domain does not\ abolish DNA binding, which suggests that the MYND\ domain might be involved in protein-protein interactions\ PUBMED:8617243. Indeed, the MYND domain\ of ETO/MTG8 interacts directly with the N-CoR and\ SMRT co-repressors PUBMED:9584201, PUBMED:9819404. Aberrant recruitment\ of co-repressor complexes and inappropriate transcriptional\ repression is believed to be a general mechanism\ of leukemogenesis caused by the t(8;21)\ translocations that fuse ETO with the acute myelogenous\ leukemia 1 (AML1) protein. ETO has been shown to\ be a co-repressor recruited by the promyelocytic leukemia\ zinc finger (PLZF) protein PUBMED:10688654. A\ divergent MYND domain present in the adenovirus E1A\ binding protein BS69 was also shown to interact with\ N-CoR and mediate transcriptional repression PUBMED:10734313. The current evidence suggests that\ the MYND motif in mammalian proteins constitutes a\ protein-protein interaction domain that functions as a\ co-repressor-recruiting interface.

    \ \ 5597 IPR008555 \ This family consists of several eukaryotic proteins of unknown function. One family member is a circulating cathodic antigen (CCA) found in Schistosoma mansoni (blood fluke) PUBMED:10413050.\ 6762 IPR010706 \

    This family consists of several fatty acid cis/trans isomerase proteins, which appear to be found exclusively in bacteria of the orders Vibrionales and Pseudomonadales. Cis/trans isomerase (CTI) catalyses the cis-trans isomerisation of esterified fatty acids in phospholipids, mainly cis-palmitoleic acid (C(16:1,9)) and cis-vaccenic acid (C(18:1,11)), in response to solvents. The CTI protein has been shown to be involved in solvent resistance in Pseudomonas putida PUBMED:10482510.

    \ 6720 IPR009679 \

    This family consists of several phage regulatory protein CII (CP76) sequences. These are thought to be DNA-binding proteins involved in the establishment of lysogeny PUBMED:3806670.

    \ 3474 IPR007260 \ This family represents a putative ManNAc-6-P-to-GlcNAc-6P epimerase in the N-acetylmannosamine (ManNAc) utilization pathway found mainly in pathogenic bacteria.\ 620 IPR006202 \

    Neurotransmitter ligand-gated ion channels are transmembrane receptor-ion channel complexes that open transiently upon binding of specific ligands, allowing rapid transmission of signals at chemical synapses PUBMED:1721053, PUBMED:1846404.

    \

    Of the five families known, four have been shown to form a sequence-related superfamily: the gamma-aminobutyric acid type A (GABA-A), nicotinic acetylcholine, glycine and serotonin 5HT3 receptors. The ionotropic glutamate receptors have a distinct primary structure.

    \

    The four superfamily members all possess a pentameric structure (made up of varying subunits) surrounding a central pore. Each subunit contains a large extracellular N-terminal ligand-binding region, three hydrophobic transmembrane domains, a large intracellular region and a fourth hydrophobic domain PUBMED:1721053, PUBMED:1846404.

    \ \

    This entry represents the extracellular ligand-binding domain of these ion channels. This domain forms a pentameric arrangement in the known structure.

    \ 1007 IPR002515 \ Zinc fingers are found in a wide variety of proteins, and are associated with DNA binding. There are several different types, and this family contains the C2HC-type zinc finger, which is found in eukaryotes.\ 2678 IPR006336 \

    Glutamate-cysteine ligase is also known as gamma-glutamylcysteine synthetase (gamma-ECS). This enzyme catalyses the first and rate-limiting step in de novo glutathione biosynthesis. Members of this family are found in archaea, bacteria and plants. May and Leaver PUBMED:7937837 discuss the possible evolutionary origins of glutamate-cysteine ligase enzymes in different organisms. They suggest that the enzyme evolved independently in different eukaryotes from an ancestral bacterial enzyme. They also state that Arabidopsis thaliana gamma-glutamylcysteine synthetase is structurally unrelated to its mammalian, yeast and Escherichia coli homologues. In plants, there are separate cytosolic and chloroplast forms of the enzyme.

    \ 4206 IPR001706 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) and large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
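The elongation cycle described above (A-site delivery, peptide transfer, translocation) can be modelled as a loop over codons. This is a toy sketch. The codon table holds only a few codons, the initiator tRNA is assumed to already occupy the P site, and termination at a stop codon is collapsed into a single "release" event. The function `elongate` is hypothetical.

```python
# Toy codon table (assumption): only a handful of codons; stop codons map to None.
CODONS = {"AUG": "M", "GCU": "A", "UUU": "F", "AAA": "K",
          "UAA": None, "UAG": None, "UGA": None}


def elongate(mrna: str) -> tuple[str, list[str]]:
    """Simulate elongation; return the peptide and a log of ribosome events."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    if not codons or codons[0] != "AUG":
        raise ValueError("expected an AUG start codon")
    peptide = CODONS["AUG"]  # initiator tRNA assumed to sit in the P site already
    log = [f"P-site:{peptide}"]
    for codon in codons[1:]:
        amino_acid = CODONS[codon]  # KeyError for codons missing from the toy table
        if amino_acid is None:      # stop codon: simplified chain release
            log.append("release")
            break
        # Aminoacyl-tRNA (delivered with EF-Tu and GTP) occupies the A site, and
        # the chain in the P site is transferred onto it, gaining one residue.
        peptide += amino_acid
        log.append(f"transfer:{peptide}")
        # EF-G and GTP translocate the new peptidyl-tRNA to the P site.
        log.append("translocate")
    return peptide, log


print(elongate("AUGGCUUUUAAAUAA")[0])  # MAFK
```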

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    L35 is a basic protein of 60 to 70 amino-acid residues from the large (50S) subunit PUBMED:3542048. Like many basic polypeptides, L35 completely inhibits ornithine decarboxylase when present unbound in the cell, but the inhibitory function is abolished upon its incorporation into ribosomes PUBMED:3542048. It belongs to a family of ribosomal proteins, including L35 from bacteria, plant chloroplast, red algae chloroplasts and cyanelles. In plants it is a nuclear encoded gene product, which suggests a chloroplast-to-nucleus relocation during the evolution of higher plants PUBMED:2271612.

    \ 4925 IPR002843 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0,F1 and CF1), F0F1-ATPase, F1-ATPase, \ F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \ \ \

    This family includes the AC39 subunit from vacuolar ATP\ synthase PUBMED:8509410, and the C subunit from archaebacterial\ ATP synthase PUBMED:8702544. The family also includes subunit C from the\ sodium-transporting ATP synthase of Enterococcus hirae PUBMED:8157629.

    \ 2929 IPR005209 \

    This family includes the UL34 protein from herpesviruses. The UL34 gene product is a membrane protein exclusively phosphorylated by the U(S)3 protein kinase PUBMED:1656069, PUBMED:1318405. This protein forms a complex with PUBMED:11507225.

    \ 5548 IPR008601 \ This family is based on a group of Dictyostelium discoideum proteins that are essential in early development PUBMED:2153977, are located on the cell surface and mediate cell-cell adhesion.\ 7314 IPR011105 \

    These enzymes have been implicated in cell wall hydrolysis, most extensively in Bacillus subtilis. For instance is expressed during sporulation in an inactive form and deposited on the cell outer cortex. During germination the enzyme is activated and hydrolyses the cortex PUBMED:10658652. A similar role is carried out by the partially redundant PUBMED:9515903.

    \ \

    The sleB gene encodes a germination-specific N-acetylmuramyl-L-alanine amidase in B. subtilis and\ Bacillus cereus PUBMED:10197998. It is synthesized with a putative\ signal sequence and hydrolyses the spore cortex in situ during germination. In dormant spores it exists in a mature but inactive state.

    \ \ 5962 IPR009308 \

    This family consists of several bacterial L-rhamnose isomerase proteins. This enzyme interconverts L-rhamnose and L-rhamnulose. In some species, including Escherichia coli, this is the first step in rhamnose catabolism. Subsequent steps are catalysed by rhamnulose kinase (rhaB) and then rhamnulose-1-phosphate aldolase (rhaD) to yield glycerone phosphate and (S)-lactaldehyde.

    \ 622 IPR004274 \ The function of this domain is unclear. It is found in proteins of diverse function, including phosphatases, some of which may be active in ternary elongation complexes, and a number of NLI-interacting factors. In the phosphatases this domain is often present N-terminal to the BRCT domain.\ 673 IPR000070 \

    Pectinesterase (pectin methylesterase) catalyses the de-esterification of pectin into pectate and methanol. Pectin is one of the main components of the plant cell wall. In plants, pectinesterase plays an important role in cell wall metabolism during fruit ripening. In plant bacterial pathogens such as Erwinia carotovora and in fungal pathogens such as Aspergillus niger, pectinesterase is involved in maceration and soft-rotting of plant tissue. Plant pectinesterases are regulated by pectinesterase inhibitors, which are ineffective against microbial enzymes PUBMED:15722470.

    \

    Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity. The crystal structure of pectinesterase from Erwinia chrysanthemi revealed a beta-helix structure similar to that found in pectinolytic enzymes, though it is different from most structures of esterases PUBMED:11162105. The putative catalytic residues are in a similar location to those of the active site and substrate-binding cleft of pectate lyase.

    \ \ 193 IPR002125 \

    Cytidine deaminase (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia, while deoxycytidylate deaminase (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes are known to bind zinc and to require it for their catalytic activity PUBMED:1567863, PUBMED:8428902. These two enzymes do not share any sequence similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought to be involved in the binding of the catalytic zinc ion.

    \

    Such a region is also found in other proteins PUBMED:8061614, PUBMED:8203015:

    \ \ 2921 IPR005207 \

    This is a family of Herpesvirus proteins including UL14. UL14 protein is a minor component of the virion tegument PUBMED:10590088 and is expressed late in infection. UL14 protein can influence the intracellular localization patterns of a number of proteins belonging to the capsid or the DNA encapsidation machinery PUBMED:11161269.

    \ 8072 IPR013154 \

    This is the catalytic domain of alcohol dehydrogenases. Many of them contain an inserted zinc binding domain. This domain has a GroES-like structure PUBMED:8804825, PUBMED:10556240.

    \ 5105 IPR007942 \

    This family contains several phospholipase-like proteins from Arabidopsis thaliana which are homologous to PEARLI 4.

    \ 7620 IPR012432 \

    This is a group of sequences found in hypothetical proteins predicted to be expressed in a number of bacterial species. The region in question is approximately 150 amino acid residues long.

    \ 1164 IPR007873 \ The formation of N-glycosidic linkages of glycoproteins involves the ordered assembly of the common Glc3Man9GlcNAc2 core-oligosaccharide on the lipid carrier dolichyl pyrophosphate. Whereas early mannosylation steps occur on the cytoplasmic side of the endoplasmic reticulum with GDP-Man as donor, the final reactions from Man5GlcNAc2-PP-Dol to Man9GlcNAc2-PP-Dol on the lumenal side use Dol-P-Man PUBMED:11308030. The ALG3 gene encodes the Dol-P-Man:Man5GlcNAc2-PP-Dol mannosyltransferase.\ 5710 IPR008381 \ Mutants of the Saccharomyces cerevisiae ACN9 gene have two- to fourfold elevated levels of enzymes of the glyoxylate cycle, gluconeogenesis, and acetyl-CoA metabolism PUBMED:10328823. The ACN9 protein was localised to the mitochondrial intermembrane space PUBMED:10103055.\ 3114 IPR007310 \ Bacteria solve the iron supply problem caused by the insolubility of Fe(3+) by synthesizing iron-complexing compounds, called siderophores, and by using iron sources of their hosts, such as heme and iron bound to transferrin and lactoferrin. Escherichia coli, as an example of Gram-negative bacteria, forms sophisticated Fe(3+)-siderophore and heme transport systems across the outer membrane. IucA and IucC catalyse discrete steps in biosynthesis of the siderophore aerobactin from N epsilon-acetyl-N epsilon-hydroxylysine and citrate PUBMED:3087960.\ 6465 IPR009538 \

    This family consists of several PV-1 (PLVAP) proteins, which seem to be specific to mammals. PV-1 is a novel protein component of the endothelial fenestral and stomatal diaphragms PUBMED:11401446. The function of this family is unknown.

    \ 1190 IPR001544 \

    Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped PUBMED:1644759 into subfamilies.

    \

    One of these, called class-IV, currently consists of proteins of about 270 to 415 amino-acid residues that share a few regions of sequence similarity. Surprisingly, the best conserved region does not include the lysine residue to which the pyridoxal-phosphate group is known to be attached in ilvE. Instead, it is located some 40 residues toward the C-terminal side of the pyridoxal-phosphate-binding lysine. The D-amino acid aminotransferases (D-AATs), which are among the members of this entry, are required by bacteria to catalyse the synthesis of D-glutamic acid and D-alanine. These are essential constituents of the bacterial cell wall and the building blocks for other D-amino acids. Despite the difference in the structure of their substrates, D-AATs and L-ATTs show strong similarity PUBMED:7626635, PUBMED:9163511.

    \ 2563 IPR007442 \ FliO is an essential component of the flagellum-specific protein export apparatus PUBMED:10049367. It is an integral membrane protein. Its precise molecular function is unknown.\ 2819 IPR006812 \ Found in clostridia, this protein contains one active site selenocysteine and catalyses the reductive deamination of glycine, which is coupled to the esterification of orthophosphate resulting in the formation of ATP PUBMED:2963330. A member of this family may also exist in Treponema denticola PUBMED:11797052.\ 4860 IPR005353 \

    The function of this family of proteins is unknown.

    \ 1005 IPR000315 \ The B-box zinc finger is a domain of around 40 amino acids. One or two copies of this motif are generally associated with a RING finger and a coiled-coil motif to form the so-called tripartite motif. It is found essentially in transcription factors, ribonucleoproteins and proto-oncoproteins, but no function has been clearly assigned to this domain PUBMED:9923704. It has been shown to be essential but not sufficient to localize the PML protein in a punctate pattern in interphase nuclei PUBMED:8643677. Of the 7 possible ligands for the zinc atom contained in a B-box, only 4 are used, binding one zinc atom in a Cys2-His2 tetrahedral arrangement. NMR analysis revealed that the B-box structure comprises two beta-strands, two helical turns and three extended loop regions, different from any other zinc-binding motif PUBMED:8846787.\ 3443 IPR007695 \ This signature is of the N-terminal domain of proteins in the MutS family of DNA mismatch repair proteins and is found associated with located in the C-terminal region.\ Yeast MSH3, bacterial proteins involved in DNA mismatch repair and the predicted protein product of the mouse Rep-3 gene share extensive sequence similarity. \ This family of proteins is named after the Salmonella typhimurium MutS protein, which is involved in replication repair and plays a role in preventing recombination between non-identical sequences PUBMED:8510668. \ Human MSH has been implicated in hereditary non-polyposis colorectal cancer (HNPCC) and is a mismatch-binding protein.\ \

    Mismatch repair contributes to the overall fidelity of DNA replication PUBMED:3304141. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The sequences of some proteins involved in mismatch repair in different organisms have been found to be evolutionarily related PUBMED:1651234, PUBMED:8510668.

    \ 6413 IPR010562 \

    This family consists of several insect-specific haemolymph juvenile hormone-binding proteins (JHBPs). Juvenile hormone (JH) has a profound effect on insects: it regulates embryogenesis, maintains the status quo of larval development and stimulates reproductive maturation in the adult forms. JH is transported from its sites of synthesis to target tissues by JHBP, a haemolymph carrier that protects the JH molecules from hydrolysis by non-specific esterases present in the insect haemolymph PUBMED:12595713.

    \ 6054 IPR010412 \

    This is a family of conserved bacterial proteins with unknown function.

    \ 36 IPR001952 \

    Alkaline phosphatase (ALP) PUBMED:2379681 is a zinc- and magnesium-containing metalloenzyme that hydrolyzes phosphate esters, optimally at high pH. It is found in nearly all living organisms, with the exception of some plants. In Escherichia coli, ALP (gene phoA) is found in the periplasmic space. In Saccharomyces cerevisiae, it (gene PHO8) is found in lysosome-like vacuoles, and in mammals it is a glycoprotein attached to the membrane by a GPI anchor.

    \

    Streptomyces species that synthesize the antibiotic streptomycin (SM) express a phosphatase (gene strK) that is closely related to ALP. It specifically cleaves both streptomycin-6-phosphate and, more slowly, streptomycin-3''-phosphate PUBMED:1654502.

    \

    In mammals, four different isozymes are currently known PUBMED:2286375. Three of them are tissue-specific: the placental, placental-like (germ cell) and intestinal isozymes. The fourth form is tissue non-specific and was previously known as the liver/bone/kidney isozyme.

    \

    Alkaline phosphatase exists as a dimer. Each monomer binds two zinc atoms and one magnesium atom, which are essential for enzymatic activity, and folds into a 10-stranded beta-sheet structure PUBMED:1898729.

    \ 3398 IPR001354 \

    Mandelate racemase (MR) and muconate lactonizing enzyme (MLE) are two bacterial enzymes involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions, yet they are related at the level of their primary, quaternary (homooctamer) and tertiary structures PUBMED:2215699, PUBMED:8256284.\ A number of other proteins also seem to be evolutionarily related to these two enzymes. These include various plasmid-encoded chloromuconate cycloisomerases, Escherichia coli protein rspA PUBMED:7545940, E. coli bifunctional DGOA protein, E. coli hypothetical proteins ycjG, yfaW and yidU, and a hypothetical protein from Streptomyces ambofaciens PUBMED:8277241.

    \ 4150 IPR001676 \

    This domain occurs in the capsid proteins of picornaviruses, which are non-enveloped plus-strand ssRNA animal viruses with icosahedral capsids. They include rhinovirus (common cold) and poliovirus.

    \

    The common structure is an 8-stranded beta sandwich which can have one or two extra strands.

    \ 697 IPR004106 \

    This unusual 7-bladed beta-propeller, found as an N-terminal domain, protects the catalytic triad of prolyl oligopeptidase (see ), with which it is almost always associated, by excluding larger peptides and proteins from proteolysis in the cytosol.

    \ \

    Prolyl oligopeptidases are serine peptidases belonging to MEROPS peptidase family S9 (clan SC), subfamily S9A. The protein fold of the peptidase domain of members of this family resembles that of serine carboxypeptidase D, the type example of clan SC.

    \ \

    The prolyl oligopeptidase family PUBMED:1953688, PUBMED:1515061, PUBMED:1355343 consists of a number of evolutionarily related peptidases whose catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution.

    \ \ \ \ 4219 IPR005568 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    L6 is a protein from the large (50S) subunit. In Escherichia coli, it is located in the aminoacyl-tRNA binding site of the peptidyltransferase centre, and is known to bind directly to 23S rRNA. It belongs to a family of ribosomal proteins, including L6 from bacteria, cyanelles (structures that perform similar functions to chloroplasts, but have structural and biochemical characteristics of Cyanobacteria) and mitochondria; and L9 from mammals, Drosophila, plants and yeast. L6 comprises 2 almost identical folds, suggesting that it was derived by the duplication of an ancient RNA-binding protein gene. Analysis reveals several sites on the protein surface where interactions with other ribosome components may occur, the N-terminus being involved in protein-protein interactions and the C-terminus containing possible RNA-binding sites PUBMED:8262035.

    \ 1085 IPR007752 \ The ActA family is found in Listeria and is associated with motility. ActA protein acts as a scaffold to assemble and activate host cell actin cytoskeletal factors at the bacterial surface, resulting in directional actin polymerization and propulsion of the bacterium through the cytoplasm of the host cell PUBMED:11886549, PUBMED:11854187.\ 1711 IPR005126 \ Within the NapC/NirT family of cytochrome c proteins, some members, such as NapC and NirT, bind four haem groups, while others, such as TorC, bind five haems. This family aligns the common N-terminal region that contains four haem-binding C-X(2)-CH motifs.\ 7008 IPR009840 \

    This family consists of several hypothetical bacterial proteins of around 135 residues in length. Members of this family appear to be found exclusively in the Enterobacteria Escherichia coli, Citrobacter rodentium and Salmonella typhi. The function of this family is unknown.

    \ 4750 IPR004981 \

    This is a family of tryptophan 2,3-dioxygenase enzymes involved in tryptophan metabolism, which catalyse the reaction: L-tryptophan + O2 = N-formyl-L-kynurenine.

    \ 3586 IPR002565 \ This family consists of Orbivirus non-structural proteins of unknown function, which may play a role in the release of virus from infected cells PUBMED:1654377.\ 6848 IPR009748 \

    This family consists of several Orthopoxvirus C10L proteins. The C10L protein may play an important role in vaccinia virus evasion of the host immune system, possibly by blocking IL-1 receptors: C10L is a homologue of the IL-1 receptor antagonist (IL-1Ra) PUBMED:12084512.

    \ 76 IPR005546 \

    Secretion of protein products occurs by a number of different pathways in bacteria. One of these pathways, known as the type V pathway, was first described for the IgA1 protease PUBMED:3027577. The protein component that mediates secretion through the outer membrane is contained within the secreted protein itself; hence the proteins secreted in this way are called autotransporters. This family corresponds to the presumed integral membrane beta-barrel domain that transports the protein. This domain is found at the C-terminus of the proteins in which it occurs. The N-terminus contains the variable passenger domain that is translocated across the membrane. Once the passenger domain is exported, it is cleaved autocatalytically in some proteins; in others a different peptidase is used, and in some cases no cleavage occurs PUBMED:9778731. In those proteins where the cleavage is autocatalytic, the peptidase domains belong to MEROPS peptidase families S6 and S8.

    \ 5336 IPR008812 \ The small Ras-like GTPase Ran plays an essential role in the transport of macromolecules into and out of the nucleus and has been implicated in spindle and nuclear envelope formation during mitosis in higher eukaryotes. A novel RanGTP-binding protein, termed Yrb30p, was identified as the product of the Saccharomyces cerevisiae ORF YGL164c. The protein competes with S. cerevisiae RanBP1 (Yrb1p) for binding to the GTP-bound form of S. cerevisiae Ran (Gsp1p) and, like Yrb1p, is able to form trimeric complexes with RanGTP and some of the karyopherins PUBMED:12578832.\ 1490 IPR007078 \ The CcmD protein is part of a C-type cytochrome biogenesis operon PUBMED:7635817. The exact function of this protein is uncertain. It has been proposed that CcmC, CcmD and CcmE interact directly with each other, establishing a cytoplasm-to-periplasm haem delivery pathway for cytochrome c maturation PUBMED:10998170. This protein is found fused to CcmE in . These proteins contain a predicted transmembrane helix.\ 96 IPR001357 \

    The BRCT domain (named after the C-terminal domain of a breast cancer susceptibility protein) is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA damage PUBMED:9034168, for example in the breast cancer DNA-repair protein BRCA1. The domain is an approximately 100-amino-acid tandem repeat, which appears to act as a phosphoprotein-binding domain PUBMED:14576433.

    \

    A chitin biosynthesis protein from yeast also seems to belong to this group.

    \ 2960 IPR001692 \

    Histidinol dehydrogenase (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria, fungi and plants: the four-electron oxidation of L-histidinol to histidine.

    \

    In 4-electron dehydrogenases, a single active site catalyses 2 separate oxidation steps: oxidation of the substrate alcohol to an intermediate aldehyde, and oxidation of the aldehyde to the product acid, in this case His PUBMED:3533140. The reaction proceeds via a tightly or covalently bound intermediate, and requires the presence of 2 NAD molecules PUBMED:3533140. In contrast to most dehydrogenases, the substrate is bound before the NAD coenzyme PUBMED:3533140. A Cys residue has been implicated in the catalytic mechanism of the second oxidative step PUBMED:3533140.
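    The two oxidation steps and the requirement for 2 NAD molecules can be summarised as a reaction scheme. This is a sketch using standard stoichiometry; it names the aldehyde intermediate L-histidinal (the text above only calls it "an intermediate aldehyde") and writes histidine as the free acid. If histidine is written as its carboxylate anion, one further proton is released in the overall reaction. Each NADH carries two electrons, which accounts for the four-electron oxidation.

```latex
\begin{align}
\text{L-histidinol} + \mathrm{NAD^+} &\longrightarrow \text{L-histidinal} + \mathrm{NADH} + \mathrm{H^+} \\
\text{L-histidinal} + \mathrm{H_2O} + \mathrm{NAD^+} &\longrightarrow \text{L-histidine} + \mathrm{NADH} + \mathrm{H^+} \\
\text{L-histidinol} + \mathrm{H_2O} + 2\,\mathrm{NAD^+} &\longrightarrow \text{L-histidine} + 2\,\mathrm{NADH} + 2\,\mathrm{H^+}
\end{align}
```

    The third line is the sum of the first two, with the L-histidinal intermediate cancelling.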

    \

    In bacteria, HDH is a single-chain polypeptide; in fungi, it is the C-terminal domain of a multifunctional enzyme that catalyzes three different steps of histidine biosynthesis; and in plants, it is expressed as a nuclear-encoded precursor protein that is imported into the chloroplast PUBMED:2034659.

    \ 4393 IPR006722 \

    Sedlin is a 140-amino-acid protein with a putative role in endoplasmic reticulum-to-Golgi transport. Several missense mutations in the SEDL gene, together with deletion mutations that truncate the protein through frameshifts, are responsible for spondyloepiphyseal dysplasia tarda, a progressive skeletal disorder (OMIM:313400) PUBMED:11349230.

    This entry represents an N-terminal conserved region.

    \ 6472 IPR010591 \

    This family consists of several eukaryotic ATP11 proteins. In Saccharomyces cerevisiae, expression of functional F1-ATPase requires two proteins encoded by the ATP11 and ATP12 genes PUBMED:1532796.

    \ 3109 IPR006008 \

    Members of the intracellular septation protein A family are essential for both normal cell division and bacterial virulence, and are believed to play a role in the septation process PUBMED:9746567.

    \ 1197 IPR005533 \

    This domain may have a role in cell adhesion PUBMED:11893501. It is called the AMOP domain after Adhesion associated domain in MUC4 and Other Proteins. This domain is extracellular and contains a number of cysteines that probably form disulphide bridges.

    \ \ 956 IPR000449 \

    UBA domains are a commonly occurring sequence motif of approximately 45 amino acid residues, found in diverse proteins involved in the ubiquitin/proteasome pathway, DNA excision repair, and cell signaling via protein kinases PUBMED:8871400. Human Rad23A, a homologue of yeast Rad23, is one example of a nucleotide excision-repair protein that contains both an internal and a C-terminal UBA domain.

    The solution structure of human Rad23A UBA(2) showed that the domain forms a compact three-helix bundle PUBMED:9846873. Comparison of the structures of UBA(1) and UBA(2) reveals that both form very similar folds and have a conserved large hydrophobic surface patch, which may be a common protein-interacting surface present in diverse UBA domains. Evidence that ubiquitin binds to UBA domains leads to the prediction that the hydrophobic surface patch of UBA domains interacts with the hydrophobic surface on the five-stranded beta-sheet of ubiquitin PUBMED:12079361.

    \ \ \ \ 5705 IPR008572 \ Proteins containing this 36 residue repeated sequence have no known function.\ 720 IPR006786 \

    This conserved region is located adjacent and C-terminal to an N-terminal pinin/SDK domain. Members of this family have very varied localisations within the eukaryotic cell. Pinin is known to localise at the desmosomes and is implicated in anchoring intermediate filaments to the desmosomal plaque PUBMED:8922384. SDK2/3 is a dynamically localised nuclear protein thought to be involved in modulation of alternative pre-mRNA splicing PUBMED:9447706. MemA is a tumour marker preferentially expressed in human melanoma cell lines. A common feature of the members of this family is that they may all participate in regulating protein-protein interactions PUBMED:10095061.

    \ 6714 IPR009675 \

    This family represents a conserved region approximately 60 residues long within the eukaryotic targeting protein for Xklp2 (TPX2). Xklp2 is a kinesin-like protein localised on centrosomes throughout the cell cycle and on spindle pole microtubules during metaphase. In Xenopus, it has been shown that Xklp2 protein is required for centrosome separation and maintenance of spindle bipolarity PUBMED:8548825. TPX2 is a microtubule-associated protein that mediates the binding of the C-terminal domain of Xklp2 to microtubules. It is phosphorylated during mitosis in a microtubule-dependent way PUBMED:10871281.

    \ 404 IPR000991 \

    Glutamine amidotransferase (GATase) activity involves the removal of the ammonia group from a glutamine molecule and its subsequent transfer to a specific substrate, thus creating a new carbon-nitrogen group on the substrate. This activity is found in a range of biosynthetic enzymes, including glutamine amidotransferase, anthranilate synthase component II, p-aminobenzoate synthase and glutamine-dependent carbamoyl-phosphate synthase (CPSase). GATase domains can occur either as single polypeptides, as in glutamine amidotransferases, or as domains in a much larger multifunctional synthase protein, such as CPSase. On the basis of sequence similarities, two classes of GATase domains have been identified PUBMED:3298209, PUBMED:6086650: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains are defined by a conserved catalytic triad consisting of cysteine, histidine and glutamate. Class-I GATase domains have been found in the following enzymes: the second component of anthranilate synthase and 4-amino-4-deoxychorismate (ADC) synthase; CTP synthase; GMP synthase; glutamine-dependent carbamoyl-phosphate synthase; phosphoribosylformylglycinamidine synthase II; and the histidine amidotransferase hisH.

    \ \

    These signatures also detect peptidases belonging to MEROPS peptidase family C26 (gamma-glutamyl hydrolase), and non-peptidase homologs belonging to family C56 (PfpI endopeptidase) both of which are members of clan PC(C). Other members of family C56 are found in .

    \ \ \ 7632 IPR012444 \

    The sequences making up this family are all derived from hypothetical proteins expressed by C. elegans. The region in question is approximately 160 amino acids long.

    \ 8088 IPR013157 \

    The antibacterial peptides in this family are secreted from the granular dorsal glands of Litoria aurea (Green and Golden Bell Frog), L. raniformis (Southern Bell Frog), L. citropa (Blue Mountains tree-frog) and frogs of the genus Uperoleia. They are part of the FSAP peptide family. Among the more active of these are aurein 1.2, aurein 2.2, aurein 3.1, caerin 1.1, maculatin 1.1 and uperin 3.6 PUBMED:10951191. Citropin 1.1, citropin 1.2, citropin 1.3 and a minor peptide are wide-spectrum antibacterial peptides PUBMED:10504394.

    \ 6649 IPR010662 \

    This family contains a number of hypothetical bacterial proteins of unknown function, which may be cytosolic.

    \ 1414 IPR004619 \

    This is a family of proteins found in a single copy in at least ten different early completed bacterial genomes. The only characterised member of the family is Bvg accessory factor (Baf), a protein required, in addition to the regulatory operon bvgAS, for heterologous transcription of the Bordetella pertussis toxin operon (ptx) in Escherichia coli PUBMED:11094274. Pertussis toxin is an important virulence factor of Bordetella pertussis, the causative agent of pertussis or whooping cough. The BvgAS two-component system controls the expression of pertussis toxin and a number of other B. pertussis virulence factors. Baf acts with BvgAS to further activate ptx transcription in E. coli grown in minimal medium without affecting the growth rate, and functional Baf appears to be required for viability of B. pertussis.

    \ 7631 IPR012866 \

    This family consists of sequences found in a number of hypothetical plant proteins of unknown function. The region of interest is approximately 160 amino acids in length and contains nine highly conserved cysteine residues; it probably represents a zinc-binding domain.

    \ 1919 IPR003807 \

    This entry describes proteins of unknown function.

    \ 7141 IPR010846 \

    This family consists of several hypothetical bacterial proteins of around 260 residues in length. The function of this family is unknown.

    \ 3753 IPR001915 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
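    The loose HEXXH motif and the stricter abXHEbbHbc form described above can be scanned for with regular expressions. This is an illustrative sketch, not an InterPro or PROSITE signature: the residue sets chosen for "uncharged" and "hydrophobic" are assumptions (textbooks define these classes differently), 'a' is restricted to V or T even though the text says "most often", and proline is excluded from the variable positions because the text says it is never found in the site.

```python
import re

# ASSUMED residue classes, for illustration only.
UNCHARGED = "ACFGILMNQSTVWY"  # omits D, E, K, R, H; P omitted because the site excludes it
HYDROPHOBIC = "ACFILMVW"

# Zero-width lookaheads so that overlapping matches are all reported.
LOOSE = re.compile(r"(?=HE..H)")  # HEXXH, X = any residue
STRICT = re.compile(
    # abXHEbbHbc: a = V/T, b = uncharged, X = any residue except P, c = hydrophobic
    rf"(?=[VT][{UNCHARGED}][^P]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def motif_starts(pattern, seq):
    """Return 1-based start positions of every match of pattern in seq."""
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    demo = "MKVAAHEAAHALG"  # made-up sequence containing one strict site
    print(motif_starts(LOOSE, demo), motif_starts(STRICT, demo))
```

    On the made-up sequence, the HEXXH match starts at residue 6 and the longer abXHEbbHbc match at residue 3, because the strict form begins three residues before the His-Glu pair.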

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
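    The naming convention just described can be captured in a small lookup. The hypothetical helper `describe` below decodes the leading letter of a MEROPS family or clan identifier (such as M48, S9 or C26, all mentioned in this section) into its catalytic type and nucleophile. The tables are transcribed from the paragraph above; the function itself and its return format are illustrative assumptions, not a MEROPS API.

```python
from typing import Optional, Tuple

# Catalytic-type letters as listed in the text.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}

# Per the text: serine, threonine and cysteine peptidases use an amino acid
# as the nucleophile and form an acyl intermediate; aspartic, glutamic and
# metallopeptidases use an activated water molecule.
NUCLEOPHILE = {
    "S": "amino acid residue (acyl intermediate)",
    "T": "amino acid residue (acyl intermediate)",
    "C": "amino acid residue (acyl intermediate)",
    "A": "activated water",
    "G": "activated water",
    "M": "activated water",
}


def describe(identifier: str) -> Tuple[str, Optional[str]]:
    """Decode the first letter of a family or clan identifier, e.g. 'M48'."""
    code = identifier.strip()[:1].upper()
    if code not in CATALYTIC_TYPE:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    # Types U and P have no single nucleophile, so None is returned for them.
    return CATALYTIC_TYPE[code], NUCLEOPHILE.get(code)
```

    For example, `describe("M48")` gives the metallo type with an activated water nucleophile, while a mixed clan such as `describe("PC")` returns `None` for the nucleophile.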

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M48 (Ste24 endopeptidase family, clan M-); members of both subfamilies are represented. The members of this set of proteins are mostly described as probable protease htpX homolog or CAAX prenyl protease 1, which proteolytically removes the C-terminal three residues of farnesylated proteins. They are integral membrane proteins associated with the endoplasmic reticulum and Golgi, binding one zinc ion per subunit.

    \ \ \

    In Saccharomyces cerevisiae, Ste24p is required for the first NH2-terminal proteolytic processing event within the a-factor precursor, which takes place after COOH-terminal CAAX modification is complete. Ste24p contains multiple predicted membrane spans, a zinc metalloprotease motif (HEXXH), and a COOH-terminal ER retrieval signal (KKXX). The HEXXH protease motif is critical for Ste24p activity, since Ste24p fails to function when conserved residues within this motif are mutated.
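    The two C-terminal signals mentioned for this family, the CAAX box that is trimmed and the KKXX ER retrieval signal, can be illustrated with string checks on a one-letter protein sequence. This is a hypothetical sketch: it treats the A and X positions of CAAX and the X positions of KKXX as any residue (real CAAX boxes favour aliphatic residues at the A positions), and it models only the removal of the three C-terminal residues. It does not model farnesylation or the separate NH2-terminal cleavage.

```python
def has_caax_box(seq: str) -> bool:
    """True if seq ends in C-A-A-X (A and X accepted as any residue here)."""
    return len(seq) >= 4 and seq[-4].upper() == "C"


def remove_aax(seq: str) -> str:
    """Model the CAAX protease step: drop the C-terminal three residues (AAX)."""
    if not has_caax_box(seq):
        raise ValueError("sequence does not end in a CAAX box")
    return seq[:-3]


def has_kkxx_signal(seq: str) -> bool:
    """True if seq ends in K-K-X-X, the dilysine ER retrieval signal."""
    return len(seq) >= 4 and seq[-4:-2].upper() == "KK"
```

    With a made-up sequence, `remove_aax("MQPSTACVIA")` leaves `"MQPSTAC"`, which ends in the (modified) cysteine of the former CAAX box.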

    \ \

    The Ste24p homologues occur in a diverse group of organisms, including Escherichia coli, Schizosaccharomyces pombe, Haemophilus influenzae, and Homo sapiens, which indicates that the gene is highly conserved throughout evolution. Ste24p and the proteins related to it define a subfamily of proteins that are likely to function as intracellular, membrane-associated zinc metalloproteases PUBMED:9015299.

    \ \ 7295 IPR009099 \

    The beta-lactamase-inhibitor protein (BLIP) is produced by Streptomyces species. BLIP acts as a potent inhibitor of beta-lactamases such as TEM-1, which is the most widespread resistance enzyme to penicillin antibiotics. BLIP binds competitively to TEM-1 and makes direct contacts with TEM-1 active site residues. BLIP is able to inhibit a variety of class A beta-lactamases, possibly through flexibility of its two domains. The two tandemly repeated domains of BLIP have an alpha(2)-beta(4) structure, the beta-hairpin loop from domain 1 inserting into the active site of beta-lactamase PUBMED:8605632. BLIP shows no sequence similarity with BLIP-II, even though both bind to and inhibit TEM-1 PUBMED:11573088.

    \ \ 2157 IPR007460 \ Members of this family are uncharacterised proteins.\ 3221 IPR001800 \

    Members of this family are lipoproteins that are probably involved in evasion of the host immune system by pathogens PUBMED:9403685. They are predominantly found in the Spirochaetaceae.

    \ 6552 IPR009597 \

    This region consists of a pair of transmembrane helices and occurs three times in each member of the family.

    \ 3960 IPR006083 \

    Phosphoribulokinase (PRK) catalyses the ATP-dependent phosphorylation of ribulose-5-phosphate to ribulose-1,5-bisphosphate, a key step in the reductive pentose phosphate pathway (Calvin cycle), through which carbon dioxide is assimilated by autotrophic organisms PUBMED:2175647. In general, plant enzymes are light-activated by the thioredoxin/ferredoxin system, while those from photosynthetic bacteria are regulated by a system that has an absolute requirement for NADH. Thioredoxin/ferredoxin regulation is mediated by the reversible oxidation/reduction of sulphydryl and disulphide groups.

    Uridine kinase (pyrimidine ribonucleoside kinase) is the rate-limiting enzyme in the pyrimidine salvage pathway. It catalyzes the ATP-dependent phosphorylation of uridine to UMP: ATP + uridine = ADP + UMP.

    Pantothenate kinase catalyzes the rate-limiting step in the biosynthesis of coenzyme A: the conversion of pantothenate to D-4'-phosphopantothenate in the presence of ATP.

    \ 5387 IPR008482 \ This family consists of several uncharacterised bacterial and archaeal proteins.\ 4897 IPR006955 \ This domain identifies a group of proteins described as general vesicular transport factor, transcytosis-associated protein (TAP) and vesicle docking protein. These are names for p115, a myosin-shaped molecule that consists of an N-terminal globular head region, a coiled-coil tail that mediates dimerisation, and a short C-terminal acidic region PUBMED:11927603. p115 tethers COPI vesicles to the Golgi by binding, via its C-terminal acidic region, the coiled-coil proteins giantin (on the vesicles) and GM130 (on the Golgi). It is required for intercisternal transport in the Golgi stack. This domain is found in the acidic C-terminal region, which binds to the golgins giantin and GM130. p115 is thought to juxtapose two membranes by binding giantin with one acidic region and GM130 with another PUBMED:12077354.\ 2859 IPR002214 \

    Hantaviruses are ssRNA negative-strand viruses. The nucleocapsid protein is an internal protein of the virus particle PUBMED:9208453, PUBMED:8578853.

    \ \ 5258 IPR008471 \ This family contains several uncharacterised bacterial proteins with no known function.\ 1707 IPR003088 \

    Cytochromes c (cytC) can be defined as electron-transfer proteins having one or several haem c groups, bound to the protein by one or, more generally, two thioether bonds involving sulphydryl groups of cysteine residues. The fifth haem iron ligand is always provided by a histidine residue. CytC possess a wide range of properties and function in a large number of different redox processes PUBMED:.

    \

    Ambler PUBMED:1646017 recognised four classes of cytC.

    \

    Class I includes the low-spin soluble cytC of mitochondria and bacteria, with the haem-attachment site towards the N-terminus, and the sixth ligand provided by a methionine residue about 40 residues further on towards the C-terminus. On the basis of sequence similarity, class I cytC were further subdivided into five classes, IA to IE. Class IB includes the eukaryotic mitochondrial cytC and prokaryotic 'short' cyt c2 exemplified by Rhodopseudomonas globiformis cyt c2; class IA includes 'long' cyt c2, such as Rhodospirillum rubrum cyt c2 and Aquaspirillum itersonii cyt c-550, which have several extra loops by comparison with class IB cytC.
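    The class I layout described above (a haem-attachment site near the N-terminus and a methionine sixth ligand about 40 residues further towards the C-terminus) can be explored with a simple motif scan. This is a rough sketch under stated assumptions: it takes the haem-attachment site to be the C-X(2)-C-H motif, whose two Cys residues form the thioether bonds and whose His provides the fifth ligand, and it treats "about 40 residues" as a window of 30 to 50 residues after that His, an arbitrary choice. Any hit is only a candidate; the Met ligand cannot be assigned from sequence alone.

```python
import re

# C-X(2)-C-H haem-attachment motif; the lookahead lets overlapping hits be found.
HAEM_MOTIF = re.compile(r"(?=C..CH)")


def haem_sites(seq):
    """1-based positions of the His residue in each C-X(2)-C-H motif."""
    # m.start() is the 0-based index of the first Cys; the His lies 4 residues
    # further on, so its 1-based position is m.start() + 5.
    return [m.start() + 5 for m in HAEM_MOTIF.finditer(seq.upper())]


def candidate_met_ligands(seq, his_pos, min_offset=30, max_offset=50):
    """1-based positions of Met residues min_offset..max_offset residues after his_pos.

    The default window is an ASSUMED reading of "about 40 residues".
    """
    return [
        i + 1
        for i, residue in enumerate(seq.upper())
        if residue == "M" and min_offset <= (i + 1) - his_pos <= max_offset
    ]
```

    For a made-up sequence `"GDAAKGCAACHTV" + "A" * 39 + "MKK"`, `haem_sites` reports the His at position 11 and `candidate_met_ligands(seq, 11)` reports the Met at position 53, 42 residues downstream.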

    \ 6266 IPR010495 \

    Free iron is scarce in vertebrate hosts, so pathogenic bacteria have developed an alternative to siderophores to access host iron bound in protein complexes. HasA is a secreted hemophore that is able to obtain iron from hemoglobin. Once bound to HasA, the heme is shuttled to the receptor HasR, which releases the heme into the bacterium PUBMED:10360351.

    \ 32 IPR008274 \ Aldehyde oxidase catalyzes the conversion of an aldehyde, in the presence of oxygen and water, to an acid and hydrogen peroxide. The enzyme is a homodimer and requires FAD, molybdenum and two 2Fe-2S clusters as cofactors. Xanthine dehydrogenase catalyzes the oxidation of xanthine to urate, and also requires FAD, molybdenum and two 2Fe-2S clusters as cofactors. This activity is often found in a bifunctional enzyme that also has xanthine oxidase activity. The enzyme can be converted from the dehydrogenase form to the oxidase form irreversibly by proteolysis or reversibly through oxidation of sulphydryl groups.\ \ 1296 IPR007134 \

    Autophagocytosis is a starvation-induced process responsible for the transport of cytoplasmic proteins to the vacuole. This entry represents the N-terminal domain, while the C-terminal domain is represented by .

    \ 497 IPR002867 \ A cysteine-rich domain (C6HC), present in Triad1, is conserved in other proteins encoded by various eukaryotes. The C6HC consensus pattern C-x(4)-C-x(14-30)-C-x(1-4)-C-x(4)-C-x(2)-C-x(4)-H-x(4)-C defines this structure as the fourth family member of the zinc-binding RING, LIM, and LAP/PHD fingers. Strikingly, in most of the proteins the C6HC domain is flanked by two RING finger structures. The novel C6HC motif has been called DRIL (double RING finger linked). The strong conservation of the larger tripartite TRIAD (two RING fingers and DRIL) structure indicates that the three subdomains are functionally linked and identifies a novel class of proteins PUBMED:10422847.\ 4386 IPR011130 \

    The SecA ATPase is involved in the insertion and retraction of preproteins through the plasma membrane. This domain has been found to cross-link to preproteins, which is thought to indicate a role in preprotein binding. The preprotein cross-linking domain is composed of two subdomains that are inserted within the ATPase domain PUBMED:12242434.

    \ 6479 IPR009544 \

    This entry represents the C terminus of hypothetical Arabidopsis thaliana proteins of unknown function.

    \ 4986 IPR003849 \

    This entry describes proteins of unknown function.

    \ 1018 IPR001607 \

    This domain displays some similarity to the Zn-binding domain of the insulinase family. It is found only in a small subfamily of ubiquitin C-terminal hydrolases (deubiquitinases or UBP) PUBMED:9759494, PUBMED:9409543. All members of this subfamily are isopeptidase T enzymes, which are known to cleave isopeptide bonds between ubiquitin moieties.

    \

    \ Some of the proteins containing a UBP zinc finger are listed below:\

    \

    \ \ \ \ 3853 IPR001659 \ A phycobilisome is an accessory light energy harvesting structure on the outer face of the thylakoid membranes in cyanobacteria and red algae. Phycobilisomes are mainly composed of phycobiliproteins (such as allophycocyanin, phycocyanin and phycoerythrin) together with linker polypeptides.\ 4024 IPR007826 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water, supplying electrons to the reaction centre and releasing protons and oxygen; it consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa) and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbM found in PSII. PsbM is one of the most hydrophobic proteins in the thylakoid membrane. The function of this protein is unknown.

    \ 5635 IPR008694 \ This family consists of a series of repeated 73-residue sequences from the Mycoplasma arthritidis MAA2 variable surface protein. MAA2 is implicated in cytoadherence and virulence and has been shown to exhibit both size and phase variability PUBMED:9596719.\ 6282 IPR009457 \

    This family consists of several hypothetical plant specific proteins of unknown function.

    \ 7977 IPR012953 \

    This N-terminal domain is found in BOP1-like WD40 proteins PUBMED:15112237.

    \ 3069 IPR003573 \

    Interleukin-6 (IL6), also referred to as B-cell stimulatory factor-2 (BSF-2) and interferon beta-2, is a cytokine involved in a wide variety of biological functions PUBMED:3491322. It plays an essential role in the final differentiation of B-cells into Ig-secreting cells. It also induces myeloma/plasmacytoma growth and nerve cell differentiation, and in hepatocytes it induces the production of acute phase reactants PUBMED:3491322, PUBMED:2037043.

    \

    A number of other cytokines may be grouped with IL6 on the basis of sequence similarity PUBMED:3491322, PUBMED:2037043, PUBMED:2472117: these include granulocyte colony-stimulating factor (GCSF) and myelomonocytic growth factor (MGF). GCSF acts in hematopoiesis by affecting the production, differentiation and function of two related groups of white blood cells PUBMED:2472117. MGF also acts in hematopoiesis, stimulating proliferation and colony formation of normal and transformed avian cells of the myeloid lineage.

    \

    Cytokines of the IL6/GCSF/MGF family are glycoproteins of about 170 to 180 amino acid residues that contain four conserved cysteine residues involved in two disulphide bonds PUBMED:2472117. They have a compact, globular fold (similar to other interleukins), stabilised by the two disulphide bonds. One half of the structure is dominated by a four alpha-helix bundle with a left-handed twist PUBMED:1400355: the helices are anti-parallel, with two overhand connections, which fall into a two-stranded anti-parallel beta-sheet. The fourth alpha-helix is important to the biological activity of the molecule PUBMED:2037043.

    \

    It has been proposed PUBMED:1717982 that this family can be extended to include LIF and OSM (see the relevant entry), which appear to be structurally related.

    \ 2863 IPR002519 \ Poliovirus infection leads to drastic alterations in membrane permeability late during infection. Proteins 2B and 2BC enhance membrane permeability PUBMED:9218794, PUBMED:8798506.\ 2503 IPR006860 \ FecR is involved in regulation of iron dicitrate transport. In the absence of citrate, FecR inactivates FecI. FecR is probably a sensor that recognizes iron dicitrate in the periplasm.\ 4074 IPR000159 \ Proteins with this domain are mostly Ras-GTP effectors and include the mammalian guanine-nucleotide releasing factor PUBMED:8987396. This factor stimulates the dissociation of GDP from the Ras-related RALA and RALB GTPases, which allows GTP binding and activation of the GTPases. It interacts with, and acts as an effector molecule for, R-Ras, K-Ras and Rap PUBMED:7972015.\ \ The domain is also present in a number of other proteins, among them yeast adenylate cyclase and a yeast sexual differentiation protein that is essential for mating and meiosis. These proteins contain repeated leucine-rich (LRR) segments.\ 6846 IPR009747 \

    This family consists of several bacterial SepL and SsaL proteins. SepL plays an essential role in the infection process of enterohemorrhagic Escherichia coli and is thought to be responsible for the secretion of EspA, EspD, and EspB PUBMED:11053395. SsaL of Salmonella typhimurium is thought to be a component of the type III secretion system PUBMED:9140973.

    \ 5513 IPR008538 \ This family consists of a number of hypothetical proteins from the Anabaena and Synechocystis cyanobacterial species.\ 3328 IPR000518 \

    Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium and nickel. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds PUBMED:3064814, PUBMED:2959513, PUBMED:1779825. An empirical classification into three classes was proposed by Kojima PUBMED:1779826, with class III MTs including atypical polypeptides composed of gamma-glutamylcysteinyl units. Class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically related and thus alignable sequences. The MT superfamily is subdivided into families, subfamilies, subgroups, and isolated isoforms and alleles. The metallothionein superfamily comprises all polypeptides that resemble equine renal metallothionein in several respects PUBMED:2959504, e.g., low molecular weight; high metal content; amino acid composition with high Cys and low aromatic residue content; unique sequence with characteristic distribution of cysteines; and spectroscopic manifestations indicative of metal thiolate clusters. An MT family subsumes MTs that share particular sequence-specific features and are thought to be evolutionarily related. Fifteen MT families have been characterised, each family being identified by its number and its taxonomic range.

    \

    Family 14 consists of prokaryotic MTs. Its members are recognised by the sequence pattern K-C-A-C-x(2)-C-L-C. The taxonomic range of the members extends to cyanobacteria. Known characteristics are: 53 to 56 amino acids; 9 conserved Cys; one conserved tyrosine residue; one conserved histidine residue; and other unusual residues.
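The family 14 signature is written in PROSITE-style pattern notation: `x` stands for any residue, and `x(n)` or `x(n-m)` stands for a run of such residues. As a rough illustration, such a pattern can be turned into a Python regular expression and used to scan sequences. This sketch assumes that only this subset of the syntax is used. The helper name `prosite_to_regex` and the test sequence are hypothetical. Full PROSITE syntax (bracketed alternatives, braced exclusions, terminal anchors) is not handled.

```python
import re

# One pattern element: a residue letter, optionally followed by a repeat
# count "(n)" or a range "(n-m)". Only this subset of PROSITE-style syntax
# is supported (an assumption made for illustration).
_ELEMENT = re.compile(r"([A-Za-z])(?:\((\d+)(?:-(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate e.g. 'K-C-A-C-x(2)-C-L-C' into the regex 'KCAC.{2}CLC'."""
    parts = []
    for residue, low, high in _ELEMENT.findall(pattern):
        # 'x' means any amino acid; every other letter is a literal residue.
        atom = "." if residue in "xX" else residue.upper()
        if low and high:
            atom += "{%s,%s}" % (low, high)   # x(n-m) -> .{n,m}
        elif low:
            atom += "{%s}" % low              # x(n)   -> .{n}
        parts.append(atom)
    return "".join(parts)


if __name__ == "__main__":
    family14 = prosite_to_regex("K-C-A-C-x(2)-C-L-C")
    print(family14)  # KCAC.{2}CLC
    # Hypothetical fragment, not a real metallothionein sequence.
    hit = re.search(family14, "MTKCACAACLCGG")
    print(hit.start() if hit else None)  # 2
```

Working on the pattern text, rather than hand-writing each regex, keeps the signature readable and lets the same helper handle range spacers such as `x(14-30)`.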

    \ 4930 IPR006792 \

    This region is found in plant seed storage proteins, N-terminal to the Cupin domain. In Macadamia integrifolia, this region is processed into peptides of approximately 50 amino acids containing a C-X-X-X-C-X(10-12)-C-X-X-X-C motif. These peptides exhibit antimicrobial activity in vitro PUBMED:10571855.
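Below is a minimal sketch of a sequence scan for this cysteine motif. It assumes that X means any residue and that the central spacer is 10 to 12 residues long. A zero-width lookahead lets overlapping candidates that start at different positions all be reported. The function name and the synthetic sequence are hypothetical. A regex hit only shows that the spacing matches; it says nothing about whether the region is actually processed or antimicrobial.

```python
import re

# C-X-X-X-C, a spacer of 10 to 12 residues, then C-X-X-X-C.
# The lookahead (?=...) is zero-width, so finditer tries every start
# position and overlapping candidates are all reported.
_MOTIF = re.compile(r"(?=(C.{3}C.{10,12}C.{3}C))")


def find_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (start index, motif text) for each position where the motif begins.

    For a given start position, the greedy spacer picks the longest valid match.
    """
    return [(m.start(), m.group(1)) for m in _MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Synthetic sequence built to contain exactly one motif (hypothetical).
    demo = "AA" + "CAAAC" + "G" * 11 + "CAAAC" + "AA"
    print(find_motifs(demo))  # one hit, starting at index 2
```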

    \ 6784 IPR009715 \

    RtcR is a sigma54-dependent enhancer binding protein PUBMED:12618438 that activates transcription of the rtcBA operon. The product of the rtcA gene is an RNA 3'-terminal phosphate cyclase PUBMED:9738023. This domain is found at the N terminus of the RtcR sequence. RtcR, like other sigma54-dependent activators, contains a sigma-54 interaction domain in the central region of the protein sequence.

    \ 2902 IPR003840 \ Helicases from the herpesviruses are responsible for unwinding DNA and are essential for replication and completion of the viral life cycle.\ 4472 IPR007653 \ Translocation of polypeptide chains across the endoplasmic reticulum membrane is triggered by signal sequences. During translocation of the nascent chain through the membrane, the signal sequence of most secretory and membrane proteins is cleaved off. Cleavage is carried out by the signal peptidase complex (SPC), which consists of four subunits in yeast and five in mammals. This family is described as similar to the microsomal signal peptidase 23 kDa subunit and is found in eukaryotes PUBMED:8632014, PUBMED:9148931.\ 3165 IPR001124 \ A number of mammalian lipid-binding serum glycoproteins belong to this family. They include the lipopolysaccharide-binding protein (LBP), the bactericidal permeability-increasing protein (BPI), the cholesteryl ester transfer protein (CETP) and the phospholipid transfer protein (PLTP) PUBMED:2722846, PUBMED:8132678, PUBMED:2402637.\ 5658 IPR008716 \ The nodulation genes of Rhizobia are regulated by the nodD gene product in response to host-produced flavonoids and appear to encode enzymes involved in the production of a lipo-chitose signal molecule required for infection and nodule formation. NodZ is required for the addition of a 2-O-methylfucose residue to the terminal reducing N-acetylglucosamine of the nodulation signal. This substitution is essential for the biological activity of this molecule. Mutations in nodZ result in defective nodulation. nodZ is an unusual nodulation gene: it is not under the control of NodD, yet it is essential for the synthesis of an active nodulation signal PUBMED:8300517.\ 3731 IPR005314 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
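The catalytic type can be read directly from the first character of a MEROPS family or clan identifier, such as C50 or CD. The lookup below is a small sketch of that convention as described above. The function names are hypothetical. The code only inspects the first character, so it does not check whether an identifier actually exists in MEROPS.

```python
# One-letter catalytic-type codes, as listed in the text above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clans only)",
}

# Serine, threonine and cysteine peptidases use an amino-acid side chain as
# the nucleophile (forming an acyl intermediate); aspartic, glutamic and
# metallo peptidases use an activated water molecule.
_SIDE_CHAIN = {"C", "S", "T"}
_WATER = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Catalytic type of a family (e.g. 'C50') or clan (e.g. 'CD') identifier."""
    return CATALYTIC_TYPES[identifier[0].upper()]


def nucleophile(identifier: str) -> str:
    """Kind of nucleophile implied by the identifier's catalytic type."""
    letter = identifier[0].upper()
    if letter in _SIDE_CHAIN:
        return "amino-acid side chain"
    if letter in _WATER:
        return "activated water"
    return "not determined"  # type U, or a mixed-type P clan


if __name__ == "__main__":
    for ident in ("C50", "CD", "C8", "M10", "PA"):
        print(ident, catalytic_type(ident), nucleophile(ident), sep="\t")
```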

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C50 (separase family, clan CD). The active site residues for members of this family and family C14 occur in the same order in the sequence: H, C.

    \ \

    The separases are caspase-like proteases, which play a central role in chromosome segregation. In yeast they cleave the Rad21 subunit of the cohesin complex at the onset of anaphase. During most of the cell cycle, separase is inactivated by the securin/Cut2 protein, which probably covers its active site.

    \ 115 IPR000542 \ A number of eukaryotic acetyltransferases can, on the basis of sequence similarities, be grouped together into a family. These enzymes include choline O-acetyltransferase, an enzyme that catalyzes the biosynthesis of the neurotransmitter acetylcholine PUBMED:3480542; carnitine O-acetyltransferase PUBMED:8420957; peroxisomal carnitine octanoyltransferase, a fatty acid beta-oxidation pathway enzyme which is involved in the transport of medium-chain acyl-CoAs from peroxisomes to mitochondria PUBMED:3233218; mitochondrial carnitine palmitoyltransferases I and II (CPT), enzymes involved in fatty acid metabolism and transport PUBMED:8449948; and Mycoplasma pneumoniae putative acetyltransferase C09_orf600.\ 3923 IPR000327 \ The 'POU' (named after Pit, Oct, Unc and pronounced 'pow') domain is a 70 to 75 amino-acid region found upstream of a homeobox domain in some eukaryotic transcription factors. Such proteins bind to specific DNA sequences to cause temporal and spatial regulation of the expression of genes, many of which are involved in the regulation of neuronal development in the central nervous system of mammals PUBMED:1967821. Some other genes are also regulated, including those for immunoglobulin light and heavy chains (Oct-2) PUBMED:1967834, and trophic hormone genes, such as those for prolactin and growth hormone (Pit-1). Both elements of the POU domain are required for high-affinity sequence-specific DNA binding. The domain may also be involved in protein-protein interactions PUBMED:1628619. The 3-D structure of the POU domain has been determined by multidimensional NMR PUBMED:8462099 and by X-ray crystallography to 3.0 Å resolution PUBMED:8156594.\ 3736 IPR005315 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C8 (clan CA). The peptidases are encoded by the double-stranded RNA genomes of viruses in the genus Hypovirus.

    \ 479 IPR003661 \ The histidine kinase A (phosphoacceptor) N-terminal domain is a dimerisation and phosphoacceptor domain of histidine kinases. It has been found in bacterial sensor protein/histidine kinases.\ 2182 IPR007506 \ This is a group of hypothetical proteins.\ 6944 IPR010779 \

    This family consists of several Streptococcus bacteriophage sequences and related proteins from Streptococcus species. Members of this family are typically around 100 residues in length and their function is unknown.

    \ 3613 IPR000572 \

    A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown PUBMED:2015248 to share a few regions of sequence similarity. These enzymes include xanthine dehydrogenase, aldehyde oxidase, nitrate reductase and sulphite oxidase. The multidomain redox enzyme NAD(P)H:nitrate reductase (NR) catalyzes the reduction of nitrate to nitrite in a single-polypeptide electron transport chain, with electrons flowing NAD(P)H → FAD → cytochrome b5 → molybdopterin → NO3-. Three forms of NR are known: an NADH-specific enzyme found in higher plants and algae; an NAD(P)H-bispecific enzyme found in higher plants, algae and fungi; and an NADPH-specific enzyme found only in fungi PUBMED:2204158. The mitochondrial enzyme sulphite oxidase (sulphite:ferricytochrome c oxidoreductase; EC 1.8.2.1) catalyses the oxidation of sulphite to sulphate, using cytochrome c as the physiological electron acceptor. Sulphite oxidase consists of two structure/function domains: an N-terminal heme domain, similar to cytochrome b5, and a C-terminal molybdopterin domain PUBMED:9428520.
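For reference, the overall two-electron reactions of nitrate reductase and sulphite oxidase can be written as balanced equations. These are standard stoichiometries rather than equations taken from the cited papers. Here cyt c_ox and cyt c_red denote oxidised and reduced cytochrome c.

```latex
\begin{aligned}
\mathrm{NO_3^- + NAD(P)H + H^+} &\longrightarrow \mathrm{NO_2^- + NAD(P)^+ + H_2O} \\
\mathrm{SO_3^{2-} + H_2O + 2\,cyt\,c_{ox}} &\longrightarrow \mathrm{SO_4^{2-} + 2\,cyt\,c_{red} + 2\,H^+}
\end{aligned}
```

Both reactions are balanced in atoms and charge. In each, two electrons pass through the molybdopterin centre, which is why a single two-electron donor (NAD(P)H) or two one-electron acceptors (cytochrome c) appear.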

    \ \ 7666 IPR012422 \

    Bacterial cytochrome c oxidase is found bound to the cell membrane, where it is involved in the generation of the transmembrane proton electrochemical gradient. It is composed of four subunits. Subunit IV consists of one transmembrane helix that does not interact directly with the other subunits, but maintains its position by indirect contacts via phospholipid molecules found in the structure. The function of subunit IV is as yet unknown PUBMED:12144789.

    \ 6808 IPR010727 \

    This family contains a number of hypothetical bacterial proteins of unknown function that are approximately 600 residues long. Most family members seem to be from Pseudomonas.

    \ 6399 IPR010556 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6705 IPR009671 \

    This family consists of several hypothetical bacterial proteins of around 120 residues in length. The function of this family is unknown.

    \ 4438 IPR004015 \

    This family consists of chromatin proteins, the nuclear protein SKIP (SKI-interacting protein), and some hypothetical proteins.

    \ 7324 IPR011124 \

    This domain appears to be a zinc finger. The alignment shows four conserved cysteine residues and a conserved tryptophan. It is predicted to be a highly specialised mononuclear four-cysteine zinc finger that plays a role in DNA binding and/or promoting protein-protein interactions in complicated eukaryotic processes, including chromatin methylation status and early embryonic development. Weak homology to members of a related domain family further supports these predictions. The domain is found exclusively in vertebrates, vertebrate-infecting parasites and higher plants PUBMED:14607086.

    \ 4181 IPR001147 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, and the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit of the ribosome to which they belong: the small subunit (S1 to S31) and the large subunit (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The L21E family contains proteins from a number of eukaryotic and archaebacterial organisms, including mammalian L21, Entamoeba histolytica L21, C. elegans L21 (C14B9.7), yeast L21E (URP1) and Halobacterium marismortui HL31.

    \ 5941 IPR010358 \

    This family consists of several eukaryotic brain and reproductive organ-expressed (BRE) proteins. BRE is a putative stress-modulating gene that, when overexpressed, down-regulates TNF-alpha-induced NF-kappaB activation. A total of six isoforms are produced by alternative splicing, predominantly at either end of the gene. Compared to normal cells, immortalised human cell lines uniformly express higher levels of BRE. Peripheral blood monocytes respond to LPS by down-regulating the expression of all the BRE isoforms. It is thought that the function of BRE and its isoforms is to regulate peroxisomal activities PUBMED:11676476.

    \ 6010 IPR010390 \

    This family consists of a number of hypothetical bacterial proteins of unknown function.

    \ 4413 IPR006993 \

    This family of proteins, which contains SH3BGRL3, is functionally uncharacterized. SH3BGRL3 is a highly conserved small protein that is widely expressed and shows significant similarity to glutaredoxin 1 (GRX1) of Escherichia coli, which is predicted to belong to the thioredoxin superfamily. However, SH3BGRL3 lacks both of the conserved cysteine residues that characterize the enzymatic active site of GRX. This structural feature raises the possibility that SH3BGRL3 and its homologues could function as endogenous modulators of GRX activity PUBMED:11444877.

    \ 5419 IPR008312 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function. However, these proteins are encoded in pathogenic and symbiotic bacteria as part of an operon (part of the SCI genomic island in Salmonella enterica and the imp locus in Rhizobium leguminosarum) implicated in pathogenicity and protein secretion PUBMED:12437215, PUBMED:12580282. These proteins are SciH/ImpB.\ 6075 IPR010423 \

    This family consists of several ookinete surface proteins (Pvs28) from several species of Plasmodium. Pvs25 and Pvs28 are expressed on the surface of ookinetes. These proteins are potential vaccine candidates and induce antibodies that block the infectivity of Plasmodium vivax in immunised animals PUBMED:11738740.

    \ 3677 IPR001904 \

    Paxillin is a cytoskeletal protein involved in actin-membrane attachment at sites of cell adhesion to the extracellular matrix (focal adhesion) PUBMED:7534286, PUBMED:7525621. Extensive tyrosine phosphorylation occurs during integrin-mediated cell adhesion, embryonic development, fibroblast transformation and following stimulation of cells by mitogens that operate through the 7TM family of G-protein-coupled receptors PUBMED:7525621. Paxillin binds in vitro to the focal adhesion protein vinculin, as well as to the SH3 domain of c-Src, and, when tyrosine phosphorylated, to the SH2 domain of v-Crk PUBMED:7525621. An N-terminal region has been identified that supports the binding of both vinculin and the focal adhesion tyrosine kinase, pp125Fak PUBMED:7525621.

    \

    Paxillin is a 68 kDa protein containing multiple domains, including four tandem C-terminal LIM domains (each of which binds 2 zinc ions); an N-terminal proline-rich domain, which contains a consensus SH3 binding site; and three potential Crk-SH2 binding sites PUBMED:7534286. The predicted structure of paxillin suggests that it is a unique cytoskeletal protein capable of interaction with a variety of intracellular signalling and structural molecules important in growth control and the regulation of cytoskeletal organisation PUBMED:7534286, PUBMED:7525621.

    \ 5746 IPR008383 \ This family consists of apoptosis inhibitory protein 5 (API5) sequences from several organisms. Apoptosis, or programmed cell death, is a physiological form of cell death that occurs in embryonic development and organ formation. It is characterised by biochemical and morphological changes such as DNA fragmentation and cell volume shrinkage. API5 is an anti-apoptosis gene located on Homo sapiens chromosome 11, whose expression prevents the programmed cell death that occurs upon the deprivation of growth factors PUBMED:9307294, PUBMED:10393420.\ 7576 IPR011683 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped into 'clans'.

    \ This domain is found in family 53 of the glycosyl hydrolase classification PUBMED:12691742. These enzymes are endo-1,4-beta-galactanases. The structure of this domain is known PUBMED:12484750 and has a TIM barrel fold.\ 1008 IPR001841 \

    Quality control of intracellular proteins is essential for cellular homeostasis. Molecular chaperones recognise and contribute to the refolding of misfolded or unfolded proteins, whereas the ubiquitin-proteasome system mediates the degradation of such abnormal proteins. Ubiquitin-protein ligases (E3s) determine the substrate specificity for ubiquitylation and have been classified into HECT and RING-finger families. More recently, however, U-box proteins, which contain a domain (the U box) of about 70 amino acids that is conserved from yeast to humans, have been identified as a new type of E3 PUBMED:12944364.

    \ \ \ \

    The RING-finger is a specialised type of Zn-finger of 40 to 60 residues that binds two atoms of zinc and is probably involved in mediating protein-protein interactions PUBMED:8317827, PUBMED:8804826, PUBMED:8744354. There are two different variants, the C3HC4-type and the C3H2C3-type, which are clearly related despite their different cysteine/histidine patterns. The latter type is sometimes referred to as the 'RING-H2 finger'.

    \ \

    The RING domain is a protein interaction domain which has been implicated in a range of diverse biological processes. E3 ubiquitin-protein ligase activity is intrinsic to the RING domain of c-Cbl and is likely to be a general function of this domain. Various RING fingers bind to E2 ubiquitin-conjugating enzymes (Ubc's) PUBMED:10662664, PUBMED:10514377, PUBMED:10577187.\

    \ \

    Several 3D-structures for RING-fingers are known PUBMED:8804826, PUBMED:8744354. The 3D structure of the zinc ligation system is unique to the RING domain and is referred to as the 'cross-brace' motif. The spacing of the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C. Metal ligand pairs one and three co-ordinate to bind one zinc ion, whilst pairs two and four bind the second, as illustrated in the following schematic representation:

    \
    \
                                  x x x     x x x\
                                 x      x x      x\
                                x        x        x\
                               x        x x        x\
                              C        C   C        C\
                             x  \\    / x   x \\    /  x\
                             x    Zn   x   x   Zn    x\
                              C /    \\ C   H /    \\ C\
                              x         x x         x\
                     x x x x x x         x         x x x x x x\
    \
     'C': conserved cysteine involved in zinc binding.\
     'H': conserved histidine involved in zinc binding.\
    'Zn': zinc atom.
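The cross-brace assignment can be made concrete with a regular expression for the C3HC4 spacing given above. The eight ligands are captured in sequence order. Pairs one and three are then grouped as the first zinc site, and pairs two and four as the second. This sketch assumes the spacing ranges above are exact. The names and the synthetic test sequence are hypothetical, and the C3H2C3 (RING-H2) variant would need a different pattern.

```python
import re

# C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C,
# with each of the eight zinc ligands in its own capture group.
RING_C3HC4 = re.compile(
    r"(C).{2}(C).{9,39}(C).{1,3}(H).{2,3}(C).{2}(C).{4,48}(C).{2}(C)"
)


def ring_zinc_sites(seq: str):
    """Return 0-based ligand positions grouped by zinc ion, or None if no match."""
    m = RING_C3HC4.search(seq.upper())
    if m is None:
        return None
    ligands = [m.start(i) for i in range(1, 9)]
    pairs = [ligands[0:2], ligands[2:4], ligands[4:6], ligands[6:8]]
    # Cross-brace: pairs 1 and 3 bind one zinc, pairs 2 and 4 bind the other.
    return {"Zn1": pairs[0] + pairs[2], "Zn2": pairs[1] + pairs[3]}


if __name__ == "__main__":
    # Synthetic sequence with the minimal-style spacing (hypothetical).
    demo = ("C" + "AA" + "C" + "G" * 10 + "C" + "A" + "H" + "AA" + "C"
            + "AA" + "C" + "G" * 5 + "C" + "AA" + "C")
    print(ring_zinc_sites(demo))  # {'Zn1': [0, 3, 19, 22], 'Zn2': [14, 16, 28, 31]}
```

Interleaving the pairs in this way is the point of the schematic: the two zinc sites are not formed by consecutive ligands.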
    \ Note that in the older literature, some RING-fingers are denoted as LIM-domains. The LIM-domain Zn-finger is a fundamentally different family, albeit with similar Cys-spacing (see ).\ 5978 IPR009317 \

    This family of proteins contains a conserved 60-residue region. This protein is known as ChaB in Escherichia coli and is found next to ChaA, which is a cation transporter protein. ChaB may regulate ChaA function in some way.

    \ 3694 IPR006782 \ This domain consists of the N-terminal regions of platelet-derived growth factor (PDGF) A and B chains.\ 3857 IPR004964 \ The phenazine biosynthesis proteins A and B are involved in the biosynthesis of this antibiotic. Phenazine is a nitrogen-containing heterocyclic molecule with important implications in virulence, competition and biological control.\ 5270 IPR008684 \ Microvirus A protein is a specific endonuclease that cleaves the viral strand of supertwisted, closed circular DNA at a unique site in the A gene. The A protein also causes relaxation of supertwisted DNA and forms a complex with viral DNA that has a discontinuity in gene A of the viral strand PUBMED:158588. The C-terminal region of the sequence contains the cleavage site for A/A* protein PUBMED:6283158.\ 1823 IPR002105 \

    Gram-positive, thermophilic anaerobes such as Clostridium thermocellum or Clostridium cellulolyticum secrete a highly active and thermostable cellulase complex (cellulosome) responsible for the degradation of crystalline cellulose PUBMED:2252383, PUBMED:1478480. The cellulosome contains at least 30 polypeptides; the majority of the enzymes are endoglucanases, but there are also some xylanases, beta-glucosidases and endo-beta-1,3-1,4-glucanases.

    \ \

    Complete sequence data for many of these enzymes has been obtained. A majority of these proteins contain a highly conserved region of about 65 to 70 residues which is generally (but not always) located in the C terminus. This region contains a duplicated segment of 24 amino acids, the dockerin domain, which is the binding partner of the cohesin domain (see ). The cohesin-dockerin interaction is the crucial interaction for complex formation in the cellulosome PUBMED:10390637.

    \ 7398 IPR011441 \

    This conserved region is found at the N terminus of several phage terminase proteins, associated with .

    \ 5602 IPR008650 \ This family consists of several helicase-primase complex components from the Gammaherpesviruses.\ 485 IPR000536 \

    Steroid or nuclear hormone receptors constitute an important superfamily of transcription regulators that are involved in widely diverse physiological functions, including control of embryonic development, cell differentiation and homeostasis. The receptors function as dimeric molecules in nuclei to regulate the transcription of target genes in a ligand-responsive manner. Nuclear hormone receptors consist of a highly conserved DNA-binding domain that recognises specific sequences, connected via a linker region to a C-terminal ligand-binding domain. In addition, certain nuclear hormone receptors have an N-terminal modulatory domain. Ligand binding causes a conformational change in the ligand-binding domain that induces a response, so the domain acts as a molecular switch that turns on transcriptional activity PUBMED:14973393. For example, after the glucocorticoid receptor binds its corticosteroid ligand, the receptor is induced to perform functions including nuclear translocation, oligomerisation, association with cofactors, kinases and transcription factors, and DNA binding PUBMED:15193451. The ligand-binding domain is a flexible unit, where the binding of a ligand stabilises its conformation, which in turn favours coactivator binding to modify receptor activity PUBMED:15661830; the coactivator can bind to the activator function 2 (AF2) site at the C-terminal end of the ligand-binding domain PUBMED:15728727. The binding of different ligands can alter the conformation of the ligand-binding domain, which ultimately affects the DNA-binding specificity of the DNA-binding domain. In the absence of ligand, steroid hormone receptors are thought to be weakly associated with nuclear components. This entry represents the C-terminal ligand-binding domain.

    \ 6454 IPR010581 \

    This family consists of several hypothetical archaeal proteins of unknown function.

    \ 386 IPR002900 \

    This domain has no known function. It is found in many proteins from Caenorhabditis elegans and Caenorhabditis briggsae.\ The domain is found associated with, and C-terminal to, the cyclin-like F-box .

    \ 2737 IPR006102 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 2 \ comprises enzymes with several known activities: beta-galactosidase (); beta-mannosidase (); beta-glucuronidase ().

    \ \

    These enzymes contain a conserved glutamic acid residue which has been shown, in Escherichia coli lacZ (), to be the general acid/base catalyst in the active site of the enzyme PUBMED:1350782.

    \

    This entry describes the immunoglobulin-like beta-sandwich domain PUBMED:8008071.

    \ 6606 IPR010644 \

    This family contains chlorite dismutase enzymes of bacterial and archaeal origin. This enzyme catalyses the disproportionation of chlorite into chloride and oxygen PUBMED:8929278. Note that many family members are hypothetical proteins.

    \ 7605 IPR011678 \ These sequences are mainly derived from predicted eukaryotic proteins. The region in question lies towards the C terminus of these large proteins and is approximately 300 amino acid residues long.\ 1423 IPR000233 \ Cadherins are transmembrane glycoproteins vital in calcium-dependent cell-cell adhesion during tissue differentiation PUBMED:3061804. Cadherins cluster to form foci of homophilic binding units. A key determinant of the strength of \ cadherin-mediated binding is the juxtamembrane region of the cadherin. This region induces clustering and also binds to the protein p120ctn PUBMED:9566976. The cytoplasmic region is highly conserved in sequence and has been shown experimentally to regulate the cell-cell binding function of the extracellular domain of E-cadherin, possibly through interaction with the cytoskeleton PUBMED:3061804. This domain is found upstream of the cadherin domain .\ 6017 IPR010394 \

    This family consists of both eukaryotic and prokaryotic 5'-nucleotidase sequences ().

    \ 4340 IPR002133 \

    S-adenosylmethionine synthetase (MAT, ) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP PUBMED:1696256. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis.

    \

    In bacteria there is a single isoform of AdoMet synthetase (gene metK); there are two in budding yeast (genes SAM1 and SAM2) and in mammals, whereas plants generally have a multigene family.

    \

    The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. The active sites of both the Escherichia coli and rat liver MAT reside between two subunits, with contributions from side chains of residues from both subunits,\ resulting in a dimer as the minimal catalytic entity. The side chains that contribute to the ligand binding sites are conserved between the two proteins. In the\ structures of complexes with the E. coli enzyme, the phosphate groups have the same positions in the (PPi plus Pi) complex and the (ADP plus Pi) complex,\ and are located at the bottom of a deep cavity with the adenosyl group nearer the entrance PUBMED:1213535.

    \ 7705 IPR012455 \

    This protein is found in Lactobacillus prophages.

    \ 6209 IPR010477 \

    This family consists of several proteins which appear to be specific to Drosophila melanogaster. The function of this family is unknown.

    \ 6896 IPR010762 \

    This family contains a number of major capsid Gp23 proteins approximately 500 residues long, from T4-like bacteriophages.

    \ 7214 IPR010865 \

    This family consists of several hypothetical bacterial and plant proteins of around 125 residues in length. The function of this family is unknown.

    \ 1876 IPR003458 \

    This family contains bacteriophage T4 gp38 and related bacterial prophage and phage proteins. Gene 38 of phage T4 codes for a protein containing 183 amino acid residues with a molecular weight of 22.3 kDa. Together with genes 36 and 37, whose products are structural proteins of the fibre distal part, gene 38 forms one transcription unit. Gp38 is a chaperone that is required for assembly of the distal part of the long fibres and is absent from the mature phage particle. In the absence of gp38, gp37, which is a component of the distal part of the long tail fibre, fails to oligomerise. The carboxy-terminal region of gp37 forms the tip of the distal fibre that interacts with the cell receptors. Functionally, the role of gp38 can be replaced by pTfa of phage lambda PUBMED:8892827, PUBMED:1531648, PUBMED:14625682.

    \ \ \ \

    The function of many of the other members of this family remains to be elucidated.

    \ \ 2345 IPR002779 \

    This family contains the gene products of PduO and EutT which are both cobalamin adenosyltransferases. PduO is a protein with ATP:cob(I)alamin adenosyltransferase activity. The main role of this protein is the conversion of inactive cobalamins to AdoCbl for 1,2-propanediol degradation PUBMED:9311132. The EutT enzyme appears to be an adenosyl transferase, converting CNB12 to AdoB12 PUBMED:9352910.

    \ \ 489 IPR005578 \ This family includes a number of eukaryotic integral membrane proteins, conserved in at least one copy in all sequenced eukaryotes. The gene name in Schizosaccharomyces pombe is hrf1+ for Heavy metal Resistance Factor 1 (unpublished).\ 6355 IPR009492 \

    This family consists of several bacterial TniQ proteins. TniQ, along with TniA and TniB, is involved in the transposition of the mercury-resistance transposon Tn5053 that carries the mer operon. It has been suggested that the tni genes are involved in the dissemination of integrons PUBMED:8594337.

    \ 719 IPR002716 \

    The PilT protein, N-terminal domain (PIN) is a compact domain of about 100 amino acids. The domain has two nearly invariant aspartates and forms a coiled-coil with other monomer units to polymerize a pilus fibre PUBMED:10216854. The function of the PIN domain is unknown but a role in signaling appears likely given the presence of this domain in some bacterial plasmid stability proteins and Dis3 from yeast that is implicated in mitotic control PUBMED:8896453.

    \ 5521 IPR008631 \ This family consists of the eukaryotic glycogen synthase proteins GYS1, GYS2 and GYS3. Glycogen synthase (GS) is the enzyme responsible for the synthesis of alpha-1,4-linked glucose chains in glycogen. It is the rate-limiting enzyme in the synthesis of the polysaccharide, and its activity is highly regulated through phosphorylation at multiple sites and also by allosteric effectors, mainly glucose 6-phosphate (G6P) PUBMED:11415431.\ 5690 IPR008655 \ This family consists of several Helicobacter pylori-specific IceA2 proteins. The function of this family is unknown.\ 2646 IPR003139 \

    Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes PUBMED:9657938. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, their primary sequences can be very different.

    \

    This entry represents matrix proteins from delta-retroviruses such as HTLV-I (human T-cell leukaemia virus-I) and HTLV-II, both members of the human oncovirus subclass of retroviruses PUBMED:11752179, PUBMED:9000634.

    \ \ \ \ \ 2288 IPR006979 \

    This conserved region is found in the C-terminal region of a number of conserved archaeal proteins of unknown function.

    \ 20 IPR000182 \

    Histone acetylation is carried out by a class of enzymes known as histone acetyltransferases\ (HATs), which catalyze the transfer of an acetyl group from acetyl-CoA to the lysine epsilon-amino\ groups on the N-terminal tails of histones PUBMED:12801725. Early indication that HATs were involved in transcription\ came from the observation that in actively transcribed regions of chromatin, histones tend to be\ hyperacetylated, whereas in transcriptionally silent regions histones are hypoacetylated. The histone acetyltransferases are divided into five families. These include the Gcn5-related\ acetyltransferases (GNATs); the MYST (for 'MOZ, Ybf2/Sas3, Sas2 and Tip60')-related HATs;\ p300/CBP HATs; the general transcription factor HATs, which include the TFIID subunit TAF250;\ and the nuclear hormone-related HATs SRC1 and ACTR (SRC3).\ \ The GCN5-related N-acetyltransferase superfamily includes such enzymes as the histone acetyltransferases GCN5 and Hat1, the elongator complex subunit Elp3,\ the mediator-complex subunit Nut1, and Hpa2 PUBMED:9175471.

    \

    Many GNATs share several functional domains, including an N-terminal region of variable length, an\ acetyltransferase domain that encompasses the conserved sequence motifs, a\ region that interacts with the coactivator Ada2, and a C-terminal bromodomain that is believed to\ interact with acetyl-lysine residues. Members of the GNAT family are important for the regulation of cell growth and development. In\ mice, knockouts of Gcn5L are embryonic lethal. Yeast Gcn5 is needed for normal progression\ through the G2/M boundary and mitotic gene expression. The importance of GNATs is\ probably related to their role in transcription and DNA repair.

    \

    The yeast GCN5 (yGCN5) transcriptional coactivator functions as a histone acetyltransferase (HAT) to promote transcriptional activation. The crystal structure of the yeast histone acetyltransferase Hat1-acetyl coenzyme A (AcCoA) complex shows that Hat1 has an elongated, curved structure, and the AcCoA molecule is bound in a cleft on the concave surface of the protein, marking the active site of the enzyme. A channel of variable width and depth that runs across the protein is probably the binding site for the histone substrate PUBMED:9727486. The central protein core associated with AcCoA binding appears to be structurally conserved among a superfamily of N-acetyltransferases, including yeast histone acetyltransferase 1 and Serratia marcescens aminoglycoside 3-N-acetyltransferase PUBMED:10430873.

    \ 4888 IPR007864 \

    Urease and other nickel metalloenzymes are synthesised as precursors devoid of the metalloenzyme active site. These precursors then undergo a complex post-translational maturation process that requires a number of accessory proteins.

    \ \

    Members of this group are nickel-binding proteins required for urease metallocenter assembly PUBMED:8318889. They are believed to function as metallochaperones to deliver nickel to urease apoprotein PUBMED:12072968, PUBMED:10753863. It has been shown by yeast two-hybrid analysis that UreE forms a dimeric complex with UreG in Helicobacter pylori PUBMED:12388207. The UreDFG-apoenzyme complex has also been shown to exist PUBMED:11157956, PUBMED:7721685 and is believed to be, with the addition of UreE, the assembly system for active urease PUBMED:7721685. The complexes, rather than the individual proteins, presumably bind to UreB via UreE/H recognition sites.

    \ \

    The structure of Klebsiella aerogenes UreE reveals a unique two-domain architecture. The N-terminal domain is structurally related to a heat shock protein, while the C-terminal domain shows homology to the Atx1 copper metallochaperone PUBMED:11591723, PUBMED:11602602. Significantly, the metal-binding sites in UreE and Atx1 are distinct in location and types of residues despite the relationship between these proteins, and the mechanism for UreE activation of urease is proposed to be different from the thiol ligand exchange mechanism used by the copper metallochaperones.

    \ \

    The C-terminal domain of this protein is the metal-binding region, which can bind up to six nickel ions per dimer. Most members of this group contain a histidine-rich C-terminal motif that is involved in, but not solely responsible for, binding nickel ions in Klebsiella aerogenes UreE PUBMED:8808929. However, internal ligands, not the histidine residues at the C terminus, are necessary for UreE to assist in urease activation in Klebsiella aerogenes PUBMED:11591723, even though the truncated protein lacking the His-rich region binds two nickel ions instead of six. In Helicobacter pylori and some other organisms, the terminal histidine-rich binding sites are absent, but the internal histidine sites are present, and the latter probably function as nickel donors. Deletion analysis shows that this domain alone is sufficient for metal-binding and activation of urease PUBMED:15866948.

    \ 2868 IPR001490 \ The genome polyprotein contains: capsid protein C, envelope glycoproteins E1 and E2, protein P7, nonstructural protein NS2, protease/helicase NS3, nonstructural proteins NS4A and NS4B (this family), NS5A and NS5B.\ \

    The small proteins NS2A, NS2B, NS4A and NS4B are hydrophobic, suggesting a possible membrane-related function PUBMED:9224925.\ It is known that NS4B interacts with NS4A and NS3 to form a large\ replicase complex to direct the viral RNA replication PUBMED:9261364. NS3 and NS5 may also play a role in the viral RNA replication.

    \ 7889 IPR012559 \

    This family consists of erythromycin resistance gene leader peptides. These leader peptides are involved in the transcriptional attenuation control of the synthesis of the macrolide-lincosamide-streptogramin B resistance protein. The leader acts as a transcriptional attenuator, in contrast to other inducible erm genes. The mRNA leader sequence can fold in either of two mutually exclusive conformations, one of which is postulated to form in the absence of induction and to contain two rho factor-independent terminators PUBMED:1713206.

    \ 462 IPR003660 \ This domain is known as the HAMP domain for histidine kinases, adenylyl\ cyclases, methyl-accepting proteins and phosphatases.\ It is found in bacterial sensor and chemotaxis proteins and in eukaryotic histidine kinases. The bacterial proteins are usually integral membrane proteins and part of a two-component signal transduction pathway.\ 7500 IPR011642 \ This region in the nucleoside transporter proteins is responsible for determining nucleoside specificity in the human CNT1 and CNT2 proteins (e.g. ) PUBMED:10455109. In the FeoB proteins (e.g. ), which are believed to be Fe2+ transporters, it includes the membrane pore region, so the function of this region is likely to be more general than just nucleoside specificity PUBMED:12781516. This family may represent the pore and gate, with a wide potential range of specificity, hence its name: Gate.\ 2679 IPR002930 \

    This is a family of glycine cleavage H-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes.\ A lipoyl group is attached to a completely conserved lysine residue.\ The H protein shuttles the methylamine group of glycine from the P protein to the T protein.

    \ 6579 IPR010628 \

    This family consists of several bacterial ethanolamine ammonia lyase large subunit (EutB) proteins. Ethanolamine ammonia-lyase is a bacterial enzyme that catalyses the adenosylcobalamin-dependent conversion of certain vicinal amino alcohols to oxo compounds and ammonia. The enzyme is a heterodimer composed of subunits of Mr approximately 55,000 (EutB) and 35,000 (EutC) PUBMED:2197274.

    \ 5415 IPR008400 \ This region is found in the putatively extracellular N-terminal half of the anthrax receptor. It is probably part of the Ig superfamily and most closely related to (personal obs: C Yeats).\ 1579 IPR003696 \ The putative O-carbamoyltransferases (O-Cases) encoded by the nodU genes of Rhizobium fredii and Bradyrhizobium japonicum are involved in the synthesis of nodulation factors PUBMED:7559434. The cmcH genes of Nocardia lactamdurans and Streptomyces clavuligerus encode a functional 3'-hydroxymethylcephem O-carbamoyltransferase for cephamycin biosynthesis that shows significant similarity to the O-carbamoyltransferases PUBMED:7557411.\ 2991 IPR006711 \ This domain constitutes the N-terminal region of the paralogous homeobox proteins HoxA9, HoxB9, HoxC9 and HoxD9. The N-terminal region is thought to act as a transcription activation region. Activation may be by interaction with proteins such as Btg proteins, which are thought to recruit a multi-protein Ccr4-like complex PUBMED:10617598.\ 8138 IPR013191 \

    This domain is the putative catalytic domain of glycosyl hydrolase family 98 proteins.

    \ 2869 IPR002868 \ The molecular function of the non-structural 5a viral protein is uncertain.\ The NS5a protein is phosphorylated when expressed in mammalian cells.\ It is thought to interact with the dsRNA-dependent (interferon-\ inducible) kinase PKR PUBMED:9710605, PUBMED:9143277.\ 2548 IPR001444 \

    Many bacterial species swim actively by means of flagella. The flagellum\ is made of three parts: the basal body, the hook and the filament.\ The basal body consists of four rings (L, P, S and M) mounted on a central rod PUBMED:2129540.

    \

    In Salmonella typhimurium and related organisms the rod has been shown to\ consist of four different, yet evolutionarily related proteins: in the distal\ portion of the rod there are about 26 subunits of protein flgG and in the\ proximal portion there are about six subunits each of proteins flgB, flgC, and\ flgF.\ These four proteins contain a highly conserved\ asparagine-rich domain at their N terminus.

    \ 7906 IPR012957 \

    The CHDCT2 C-terminal domain is found in PHD/RING fingers and chromo domain-associated CHD-like helicases PUBMED:15112237.

    \ 8015 IPR012609 \

    This family consists of the stage V sporulation (SpoV) proteins of Bacillus subtilis, which include SpoVM. SpoVM is a small, 26-residue protein that is produced in the mother cell chamber of the sporangium during the process of sporulation in Bacillus subtilis. SpoVM forms an amphipathic alpha-helix and is recruited to the polar septum shortly after the sporangium undergoes asymmetric division. The function of SpoVM depends on proper subcellular localisation PUBMED:12562810.

    \ 7594 IPR011680 \ This is a family of eukaryotic proteins thought to be involved in axonal outgrowth and fasciculation PUBMED:9096408. The N-terminal regions of these sequences are less conserved than the C-terminal regions, and are highly acidic PUBMED:9096408. The Caenorhabditis elegans homolog, UNC-76 (), may play structural and signalling roles in the control of axonal extension and adhesion (particularly in the presence of adjacent neuronal cells PUBMED:9971736) and these roles have also been postulated for other FEZ family proteins PUBMED:9096408. Certain homologs have been definitively found to interact with the N-terminal variable region (V1) of PKC-zeta, and this interaction causes cytoplasmic translocation of the FEZ family protein in mammalian neuronal cells PUBMED:9971736. The C-terminal region probably participates in the association with the regulatory domain of PKC-zeta PUBMED:9971736. The members of this family are predicted to form coiled-coil structures PUBMED:9971736, PUBMED:14697253, which may interact with members of the RhoA family of signalling proteins PUBMED:9971736, but are not thought to contain other characteristic protein motifs PUBMED:14697253. Certain members of this family are expressed almost exclusively in the brain, whereas others (such as FEZ2, ) are expressed in other tissues, and are thought to perform similar but unknown functions in these tissues PUBMED:14697253.\ 1093 IPR003157 \ This bacterial family of Acyl transferases (or myristoyl-acp-specific thioesterases) catalyse the first step in the bioluminescent fatty acid reductase system.\ 1772 IPR007837 \ DNA damage-inducible (din) genes in Bacillus subtilis are coordinately regulated and together compose a global regulatory network that has been termed the SOS-like or SOB regulon. 
This family includes DinB from Bacillus subtilis PUBMED:1847907.\ 3897 IPR005056 \ The matrix proteins of pneumoviruses are transcriptional processivity and antitermination factors and play a crucial role in viral assembly.\ 2215 IPR006700 \

    This family of conserved hypothetical proteins groups mostly bacterial proteins of unknown function.

    \ 7364 IPR011102 \

    Two-component systems, consisting of a histidine kinase and a cognate response regulator protein, represent the best-known apparatus for transducing external cues into a physiological response in bacteria. The HWE domain is found in a subset of two-component system kinases, belonging to the same superfamily as PUBMED:14702314. In PUBMED:14702314, the HWE family was defined by the presence of a conserved H residue and a WXE motif and was limited to members of the proteobacteria. However, many homologues of this domain lack the WXE motif. Furthermore, homologues are found in a wide range of Gram-positive and Gram-negative bacteria as well as in several archaea.

    \ 5737 IPR008654 \ This family consists of the C-terminal region of a number of eukaryotic hypothetical proteins which are homologous to the Saccharomyces cerevisiae protein IWS1. IWS1 is known to be a Pol II transcription elongation factor and interacts with Spt6 and Spt5 PUBMED:12556496, PUBMED:12242279.\ 4373 IPR003782 \ This family is involved in biogenesis of respiratory and photosynthetic systems. In yeast the SCO1 protein is specifically required\ for a post-translational step in the accumulation of subunits 1 and 2 of cytochrome c oxidase (COX-I and COX-II) PUBMED:1944230. It is a mitochondrion-associated cytochrome c oxidase assembly factor.\

    The purple nonsulphur photosynthetic eubacterium Rhodobacter capsulatus is a versatile organism that can obtain cellular energy by several means, including the capture of light energy for photosynthesis as well as the use of light-independent respiration, in which molecular oxygen serves as a terminal electron acceptor. The SENC protein is required for optimal cytochrome c oxidase activity in aerobically grown R. capsulatus cells and is involved in the induction of structural polypeptides of the light-harvesting and reaction center complexes PUBMED:7592491.

    \ 90 IPR005482 \

    Acetyl-CoA carboxylase is found in all animals, plants, and bacteria and catalyzes the first committed step in fatty acid synthesis. It is a\ multicomponent enzyme containing a biotin carboxylase activity, a biotin carboxyl carrier protein, and a carboxyltransferase\ functionality. The\ "B-domain" extends from the main body of the subunit where it folds into two alpha-helical regions and three strands of beta-sheet.\ Following the excursion into the B-domain, the polypeptide chain folds back into the body of the protein where it forms an\ eight-stranded antiparallel beta-sheet. In addition to this major secondary structural element, the C-terminal domain also contains a\ smaller three-stranded antiparallel beta-sheet and seven alpha-helices PUBMED:7915138.

    \ 3079 IPR000990 \

    The pannexin family combines invertebrate gap junction proteins and their vertebrate homologs. These proteins have been named innexins PUBMED:9769729. Gap junctions are composed of membrane proteins,\ which form a channel permeable for ions and small molecules connecting\ the cytoplasm of adjacent cells. Although gap junctions provide similar functions\ in all multicellular organisms, until recently it was believed that\ vertebrates and invertebrates use unrelated proteins for this purpose. While\ the connexin family of gap junction proteins is well-characterized\ in vertebrates, no homologs have been found in invertebrates. In\ turn, gap junction molecules with no sequence homology to connexins have been\ identified in insects and nematodes. It has been suggested that these proteins\ are specific invertebrate gap junctions, and they were thus named innexins\ (invertebrate analog of connexins) PUBMED:9428764. Because innexin homologs were later identified in other taxonomic groups, including vertebrates, indicating their ubiquitous distribution in the animal kingdom, they were called pannexins\ (from the Latin pan-all, throughout, and nexus-connection, bond) PUBMED:10898987, PUBMED:12492443, PUBMED:5028292.

    \ \

    Genomes of vertebrates probably carry a conserved set of 3 pannexin paralogs\ (PANX1, PANX2 and PANX3). Invertebrate genomes may contain more than a dozen\ pannexin (innexin) genes. Vinnexins, viral homologs of pannexins/innexins,\ were identified in Polydnaviruses that occur in obligate symbiotic\ associations with parasitoid wasps. It was suggested that virally encoded\ vinnexin proteins may function to alter gap junction proteins in infected host\ cells, possibly modifying cell-cell communication during encapsulation\ responses in parasitized insects PUBMED:12205780, PUBMED:14651471. Structurally, pannexins are similar to connexins. Both types of protein\ consist of a cytoplasmic N-terminal domain, followed by four transmembrane\ segments that delimit two extracellular loops and one cytoplasmic loop; the\ C-terminal domain is cytoplasmic.

    \ \ \ 5505 IPR008536 \ This family consists of several Chlamydia and Parachlamydia proteins, the function of which is unknown.\ 1541 IPR007521 \

    This domain is found N-terminal to choline/ethanolamine kinase regions () in some plant and fungal choline kinase enzymes (). This region is only found in some members of the choline kinase family, and is therefore unlikely to contribute to catalysis.

    \ 1651 IPR003823 \ This entry represents an uncharacterized domain in proteins of unknown function. This domain is found associated with CBS domains in\ some proteins .\ 3620 IPR000260 \ This domain is found in the NADH ubiquinone oxidoreductase (complex I) () which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is associated with proton translocation across the membrane PUBMED:1470679. This signature is found upstream of .\ 2576 IPR005187 \

    The influenza C virus genome consists of seven single-stranded RNA segments. The shortest RNA segment encodes a 286 amino acid non-structural protein NS1 PUBMED:10900030. This protein contains 6 conserved cysteines that may be functionally important, perhaps binding to a metal ion.

    \ 4541 IPR001950 \ In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-2 and the initiator tRNA-Met in directing the ribosome to the proper start site of translation PUBMED:1729602. SUI1 is a protein of 108 residues. Close homologs of SUI1 have been found PUBMED:7904817 in mammals, insects and plants. SUI1 is also evolutionarily related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus vannielii.\ 5755 IPR009226 \

    This family consists of several isoforms of the penaeidin protein, which is specific to shrimps. Penaeidins, a unique family of antimicrobial peptides (AMPs) with both proline and cysteine-rich domains, were initially identified in the hemolymph of the Pacific white shrimp, Litopenaeus vannamei PUBMED:12242595.

    \ 1161 IPR000887 \ 4-Hydroxy-2-oxoglutarate aldolase () (KHG-aldolase) catalyzes the interconversion of \ 4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase \ () (KDPG-aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate \ into pyruvate and glyceraldehyde 3-phosphate. These two enzymes are structurally and functionally \ related PUBMED:3136164. They are both homotrimeric proteins of approximately 220 amino-acid residues. \ They are class I aldolases whose catalytic mechanism involves the formation of a Schiff-base \ intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes, \ an arginine is required for catalytic activity.\ 305 IPR006461 \

    This group of sequences is defined by a region of about 170 amino acids found at the C terminus of a family of plant proteins. These proteins have highly divergent N-terminal regions rich in low complexity sequence. PSI-BLAST reveals no clear similarity to any characterized protein. At least 12 distinct members are found in Arabidopsis thaliana.

    \ 1408 IPR000797 \

    The NSS proteins are encoded in the S RNA from ssRNA negative-strand viruses PUBMED:8760423. The S RNA also codes for the nucleoprotein N. The two main products are read from overlapping reading frames in the viral complementary sequence.

    \ 5459 IPR008509 \ This family consists of several eukaryotic proteins of unknown function.\ 1578 IPR004267 \

    This family represents the matrix protein, M2, of influenza C virus. The M1 protein is the product of a spliced mRNA (see ). Small\ quantities of the unspliced mRNA, which additionally encodes the M2 protein, are also found in the cell.

    \ 7459 IPR013042 \

    A region of similarity shared by several Rhodopirellula baltica cytochrome-like proteins that are predicted to be secreted. These proteins also contain , , and .

    \ 1416 IPR011616 \

    The basic-leucine zipper (bZIP) transcription factors PUBMED:7780801, PUBMED: of eukaryotes are proteins that contain a basic region mediating sequence-specific DNA-binding followed by a leucine zipper region (see ) required for dimerization.

    \ 5397 IPR008756 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
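The two motif definitions above (the loose HEXXH and the stricter abXHEbbHbc) can be illustrated as regular-expression scans over a protein sequence. This is only a sketch: the residue classes used for 'b' (uncharged) and 'c' (hydrophobic) are assumptions chosen for illustration, 'a' is restricted to valine or threonine even though the text says it is only "most often" one of these, and the helper name find_motifs is hypothetical. A regex hit is a candidate site, not evidence of metalloprotease activity.

```python
import re

# Loose motif: His-Glu-X-X-His, where X is any residue.
# The lookahead (?=...) lets overlapping hits be reported.
LOOSE_HEXXH = re.compile(r"(?=(HE..H))")

# Residue classes for the stricter abXHEbbHbc form.
# ASSUMPTION: "uncharged" = standard residues other than D, E, H, K, R;
# "hydrophobic" = A, V, L, I, M, F, W, Y. Proline is excluded from every
# position because the text states it is never found in this site.
UNCHARGED = "ACFGILMNQSTVWY"          # class b
HYDROPHOBIC = "AVLIMFWY"              # class c
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"   # position X

# a b X H E b b H b c, with a restricted to V or T (simplification).
STRICT_HEXXH = re.compile(
    rf"(?=([VT][{UNCHARGED}][{ANY_BUT_PRO}]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_motifs(pattern: re.Pattern, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every hit, overlaps included."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    demo = "MKTAVVAHELTHAVTDG"  # synthetic sequence, not a real protein
    print(find_motifs(LOOSE_HEXXH, demo))   # [(7, 'HELTH')]
    print(find_motifs(STRICT_HEXXH, demo))  # [(4, 'VVAHELTHAV')]
```

Placing a proline in a 'b' position (for example HEPTH) still satisfies the loose pattern but is rejected by the strict one, which mirrors the observation that proline does not occur in this helical site.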

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M56 (clan M-). The predicted active site residues for members of this family occur in the motif HEXXH. The type example is BlaR1 peptidase from Bacillus licheniformis.

    \ Production of beta-lactamase and penicillin-binding protein 2a (which mediate staphylococcal resistance to beta-lactam antibiotics) is regulated by a signal-transducing integral membrane protein and a transcriptional repressor. The signal transducer is a fusion protein with penicillin-binding and zinc metalloprotease domains. The signal for protein expression is transmitted by site-specific proteolytic cleavage of both the transducer, which auto-activates, and the repressor, which is inactivated, unblocking gene transcription.

    \ 2778 IPR002685 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 15 comprises enzymes with only one known activity: glycolipid 2-alpha-mannosyltransferase.

    \ 6701 IPR009670 \

    This family consists of several cell surface immobilisation antigen SerH proteins which seem to be specific to Tetrahymena thermophila. The SerH locus of Tetrahymena thermophila is one of several paralogous loci with genes encoding variants of the major cell surface protein known as the immobilisation antigen (i-ag) PUBMED:11973302.

    \ 4574 IPR003819 \ This family consists of TauD/TfdA taurine catabolism dioxygenases. The Escherichia coli tauD gene is required for the utilization of taurine (2-aminoethanesulphonic acid) as a sulphur source and is expressed only under conditions of sulphate starvation. TauD is an alpha-ketoglutarate-dependent dioxygenase catalyzing the oxygenolytic release of sulphite from taurine PUBMED:9287300. The 2,4-dichlorophenoxyacetic acid/alpha-ketoglutarate dioxygenase from Burkholderia sp. strain RASC also belongs to this family PUBMED:8779585. TfdA from Alcaligenes eutrophus is a 2,4-D monooxygenase PUBMED:3036764.\ 4255 IPR001377 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins have been grouped on the basis of sequence similarities. \ Ribosomal protein S6 is the major substrate of protein kinases in eukaryotic ribosomes PUBMED:8440735 and may play an important role in controlling cell growth and proliferation through the selective translation of particular classes of mRNA.

    \ 2195 IPR007489 \ This is a C-terminal region from several bacterial proteins of unknown function that may be involved in a theta-type replication mechanism.\ 3929 IPR007032 \

    These proteins are homologues of vaccinia virus A51.

    \ 8056 IPR013181 \

    This is a group of rice proteins of unknown function. They may have a role in ATPase activation.

    \ 5929 IPR009291 \

    This family consists of several hypothetical proteins from plants. The function of this family is unknown.

    \ 5611 IPR008467 \ This family consists of several eukaryotic dynein light intermediate chain proteins. The light intermediate chains (LICs) of cytoplasmic dynein consist of multiple isoforms, which undergo post-translational modification to produce a large number of species. DLIC1 is known to be involved in assembly, organisation, and function of centrosomes and mitotic spindles when bound to pericentrin. DLIC2 is a subunit of cytoplasmic dynein 2 that may play a role in maintaining Golgi organisation by binding cytoplasmic dynein 2 to its Golgi-associated cargo PUBMED:11907264.\ 3249 IPR003159 \

    Proteins containing this central domain form a group of secreted bacterial lyases capable of acting on a variety of substrates. One such enzyme is hyaluronate lyase, a streptococcal surface enzyme that degrades hyaluronan and chondroitin, thereby helping to spread the bacteria throughout host tissues PUBMED:14523022. Hyaluronate lyase () is a four-domain enzyme containing an N-terminal carbohydrate-binding domain, a spacer domain, a catalytic domain, and a C-terminal domain that modulates access to the catalytic cleft of the enzyme. The central domain has a beta-sandwich topology, with 18 strands in two sheets. Other bacterial enzymes that display this structure include the central domain of chondroitin AC lyase () PUBMED:10329169, the central domain of xanthan lyase () PUBMED:12475987, and the third domain of chondroitin ABC lyase () PUBMED:12706721. This entry represents these domains of hyaluronate lyase, chondroitin AC lyase, xanthan lyase and chondroitin ABC lyase. This domain is almost always associated with the polysaccharide lyase family 8 C-terminal domain ().

    \ 2454 IPR002603 \ This domain has no known function and is found in several Caenorhabditis elegans proteins. The domain contains 8-10 conserved cysteines that probably form 4-5 disulphide bridges. Inspection of cysteine conservation suggests that cysteines 1, 2, 3, 4, 9 and 10 are always present, and that sometimes either the pair 5 and 8 or the pair 6 and 7 is missing. This suggests that cysteines 5/8 and 6/7 form disulphide bridges.\ 829 IPR003118 \

    Transcription factors are protein molecules that bind to specific DNA sequences in the genome, resulting in the induction or inhibition of gene transcription PUBMED:2163347. The product of the ets oncogene is such a factor, possessing a region of 85-90 amino acids known as the ETS (erythroblast transformation specific) domain PUBMED:2163347, PUBMED:2253872. This domain is rich in positively-charged and aromatic residues, and binds to purine-rich segments of DNA. The ETS domain has been identified in other transcription factors such as PU.1, human erg, human elf-1, human elk-1, GA binding protein, and a number of others PUBMED:2163347, PUBMED:2253872, PUBMED:8425553.\ It is generally localized at the C-terminus of the protein, with the exception of ELF-1, ELK-1, ELK-3, ELK-4 and ERF, where it is found at the N-terminus.

    \ \

    This entry describes a subfamily of the SAM domain (a widespread domain in signalling and nuclear proteins) that occurs together with the ETS domain.

    \ 7303 IPR011104 \

    This family represents the C-terminal kinase domain of HPr serine/threonine kinase PtsK. This kinase is the sensor in a multicomponent phosphorelay system in control of carbon catabolite repression in bacteria PUBMED:9570401. This kinase is unusual in that it recognises the tertiary structure of its target and is a member of a novel family unrelated to any previously described protein phosphorylating enzymes PUBMED:9570401. X-ray analysis of the full-length crystalline enzyme from Staphylococcus xylosus at a resolution of 1.95 Å shows the enzyme to consist of two clearly separated domains that are assembled in a hexameric structure resembling a three-bladed propeller PUBMED:11904409.

    \ 7026 IPR009849 \

    This entry represents a conserved region approximately 180 residues long, multiple copies of which are sometimes found within hypothetical Ureaplasma parvum proteins of unknown function.

    \ 638 IPR002259 \

    Delayed-early response (DER) gene products include growth progression factors and several unknown products of novel cDNAs. Murine and human cDNAs from one novel DER gene (DER12) have been characterised to identify its product and to examine its role in the growth response PUBMED:7639753. Both sequences encode a hydrophobic 36 kDa protein that is predicted to contain 8 transmembrane (TM) domains. The protein has been localised to the nucleolus, where its concentration increases following mitogen stimulation PUBMED:7639753.

    \

    Although the function of the protein is unknown, its identification as a nucleolar protein whose gene is transcriptionally activated by growth factors suggests that it participates in the proliferative response PUBMED:7639753. Sequence analysis reveals the protein to share a high degree of similarity with the C-terminal portion of equilibrative nucleoside transporters (ENTs). These are integral membrane proteins that enable the movement of hydrophilic nucleosides and nucleoside analogs down their concentration gradients across cell membranes. ENT family members have been identified in humans, mice, fish, tunicates, slime molds, and bacteria PUBMED:12446811.

    \ 775 IPR005094 \ Relaxases/mobilization proteins are required for the horizontal transfer of genetic information contained on plasmids that occurs during bacterial conjugation. The relaxase, in conjunction with several auxiliary proteins, forms the relaxation complex or relaxosome. Relaxases nick duplex DNA in a specific manner by catalysing trans-esterification PUBMED:9350859.\ 5557 IPR007111 \

    The NACHT domain is a 300- to 400-residue predicted nucleoside triphosphatase (NTPase) domain, which is found in animal, fungal and bacterial proteins. The NACHT domain has been named after NAIP, CIITA, HET-E and TP1. It is found in association with other domains, such as the CARD domain (), the DAPIN domain (), the HEAT repeat (), the WD repeat (), the leucine-rich repeat (LRR) or the BIR repeat () PUBMED:10782090.

    \

    \ The NACHT domain consists of seven distinct conserved motifs, including the ATP/GTPase-specific P-loop, the Mg(2+)-binding site (Walker A and B motifs, respectively) and five more specific motifs. The unique features of the NACHT domain include the prevalence of 'tiny' residues (glycine, alanine or serine) directly C-terminal of the Mg(2+)-coordinating aspartate in the Walker B motif, in place of a second acidic residue prevalent in other NTPases. A second acidic residue is typically found in the NACHT-containing proteins two positions downstream. Furthermore, the distal motif VII contains a conserved pattern of polar, aromatic and hydrophobic residues that is not seen in any other NTPase family PUBMED:10782090.

    \ 2992 IPR002718 \

    \ Gram-negative bacterial outer membranes constitute a semi-permeable, size-dependent permeability barrier, for example to hydrolytic enzymes, detergents, dyes and hydrophobic anti-microbials. The outer membrane protein (OMP) profile of Helicobacter pylori differs from that of other Gram-negative bacteria in that the highly non-selective porins are absent and a number of less abundant protein species are observed PUBMED:9252185. OMPs from H. pylori have been identified as porins, gastric epithelial cell adhesins and Lewis B binding adhesins PUBMED:9430586. Extensive C-terminal sequence similarity between these OMPs has been used to define a much larger paralogous family.\

    \

    \ H. pylori is the causative agent of gastritis and peptic ulceration in humans. Numerous subtypes of OMPs have been identified in H. pylori. Attempts have been made to construct recombinant vectors able to express these OMPs in order to develop a vaccine protecting against H. pylori infection and a diagnostic reagent kit to quickly detect H. pylori infection. OMPs were chosen as possible targets of vaccine development as they are H. pylori-specific, surface-exposed and highly antigenic.\

    \ \ 1716 IPR007130 \ The terminal step of triacylglycerol (TAG) formation is catalysed by the enzyme diacylglycerol acyltransferase (DAGAT) PUBMED:11751830, PUBMED:11751875.\ 2736 IPR000726 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 19 comprises enzymes with only one known activity: chitinase ().

    \ \

    Chitinases PUBMED:1516675 are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. Chitinases belong to glycoside hydrolase families 18 or 19 PUBMED:1747104. Chitinases of family 19 (also known as classes IA or I and IB or II) are plant enzymes that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell walls. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain. The catalytic domain of these enzymes consists of about 220 to 230 amino acid residues.

    \ 4924 IPR003633 \ Variant-surface-glycoprotein phospholipase C, by hydrolysis of the attached glycolipid, releases soluble variant surface glycoprotein containing phosphoinositol from the cell wall after lysis. It catalyses the conversion of variant-surface-glycoprotein 1,2-didecanoyl-sn-phosphatidylinositol and water to 1,2-didecanoylglycerol and the soluble variant-surface-glycoprotein. It also cleaves similar membrane anchors on some mammalian proteins.\ 7836 IPR012983 \

    This domain is called PHR as it was originally found in the proteins PAM (), highwire () and RPM (). This domain can be duplicated in the highwire, PAM and RPM sequences. The function of PHR is currently unclear.

    \ 4105 IPR000093 \ The bacterial protein RecR seems to play a role in a recombinational process of DNA repair PUBMED:2674903. It may act with RecF and RecO. RecR is a protein of about 200 amino acid residues. This protein contains a putative C4-type zinc finger in the N-terminal section.\ 6841 IPR010741 \

    This family consists of several Alphaherpesvirus proteins of around 200 residues in length. The function of this family is unknown.

    \ 3231 IPR002217 \

    A major antigen has been recognised in Helicobacter pylori, a protein with an apparent molecular weight of 20,000 and a mass of 18,283 Da PUBMED:7928954. DNA sequence analysis revealed a 525 bp gene, encoding a 175-amino acid residue product with a typical 21-residue lipoprotein signal peptide and consensus prolipoprotein processing site PUBMED:7928954. Results of experimental work with Lpp20 are consistent with it being a nonessential lipoprotein PUBMED:7928954.

    \

    Prokaryotic membrane lipoproteins are synthesised with precursor signal peptides that are cleaved by specific peptidases (signal peptidase II). The enzyme recognises a conserved sequence, cutting upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached PUBMED:2202727.

    \ 7563 IPR011708 \

    This is a conserved region found in the DNA polymerase III alpha subunit (). DNA polymerase III is a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria. This DNA polymerase also exhibits 3' to 5' exonuclease activity. The alpha chain is the DNA polymerase.

    \ 5047 IPR007341 \ This bacterial protein is predicted to be an integral membrane protein. Some family members have been annotated as transglycosylase-associated proteins, but no experimental evidence is provided. This family was annotated based on the information in .\ 1493 IPR004714 \

    Cytochrome cbb3 oxidases are found almost exclusively in Proteobacteria, and represent a distinctive class of proton-pumping respiratory haem-copper oxidases (HCO) that lack many of the key structural features that contribute to the reaction cycle of the intensely studied mitochondrial cytochrome c oxidase (CcO). Expression of cytochrome cbb3 oxidase allows human pathogens to colonise anoxic tissues and agronomically important diazotrophs to sustain nitrogen fixation PUBMED:15100055.

    Genes encoding a cytochrome cbb3 oxidase were initially designated fixNOQP (ccoNOQP). The ccoNOQP operon is always found close to a second gene cluster, known as fixGHIS (ccoGHIS), whose expression is necessary for the assembly of a functional cbb3 oxidase. On the basis of their derived amino acid sequences, each of the four proteins encoded by the ccoGHIS operon is thought to be membrane-bound. It has been suggested that they may function in concert as a multi-subunit complex, possibly playing a role in the uptake and metabolism of copper required for the assembly of the binuclear centre of cytochrome cbb3 oxidase.

    \ 1885 IPR003734 \

    This entry describes proteins of unknown function.

    \ 1064 IPR001626 \

    ATP-binding cassette (ABC) transporters are multidomain membrane proteins, responsible for the controlled efflux and influx of substances (allocrites) across cellular membranes. They are minimally composed of four domains, with two transmembrane domains (TMDs) responsible for allocrite binding and transport and two nucleotide-binding domains (NBDs) responsible for coupling the energy of ATP hydrolysis to conformational changes in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of hydrolysis at one NBD effectively abrogates hydrolysis at the other. Hydrolysis at the two NBDs may occur in an alternating fashion, although the NBDs appear substantially functionally symmetrical in terms of their binding to diverse nucleotides PUBMED:12504680.

    \ A number of bacterial transport systems have been found to contain integral membrane components that have similar sequences PUBMED:1303751: these systems fit the characteristics of ATP-binding cassette transporters PUBMED:1659649. The proteins form homo- or hetero-oligomeric channels, allowing ATP-mediated transport. Hydropathy analysis of the proteins has revealed the presence of 6 possible transmembrane regions. These proteins belong to family 3 of ABC transporters.\ 4346 IPR006454 \

    These sequences represent one of several families of proteins associated with the formation of prokaryotic S-layers. Members of this family are found in archaeal species, including Pyrococcus horikoshii (split into two tandem reading frames), Methanococcus jannaschii, and related species. Some local similarity can be found to other S-layer protein families.

    \ 2000 IPR002651 \ This entry represents a group of hypothetical Caenorhabditis elegans proteins with unknown function. The aligned region is approximately 160 amino acids long.\ 1209 IPR003222 \ The antitermination protein is mostly found in bacteriophages, where it modifies host RNA polymerase, which then transcribes through termination sites that would otherwise have prevented expression of these genes. In this way the protein positively regulates expression of some phage genes.\ 4 IPR005123 \

    This family contains members of the 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily PUBMED:11276424. This family includes the C-terminal region of the prolyl 4-hydroxylase alpha subunit. The holoenzyme has the activity () catalysing the reaction:

    \

    \

    The full enzyme consists of an alpha2beta2 complex, with the alpha subunit contributing most of the active site PUBMED:7753822. The family also includes lysyl hydroxylases, isopenicillin N synthases and AlkB.

    \ \ 7685 IPR012443 \

    Some of the members of this family are hypothetical bacterial and archaeal proteins, but others are annotated as being cation transporters expressed by the archaeon Methanosarcina mazei (, and ).

    \ 2964 IPR002732 \

    This family of archaebacterial proteins consists of Holliday junction resolvases (hjc gene) PUBMED:10430863. The Holliday junction is an essential intermediate of homologous recombination. Holliday junctions are four-stranded DNA complexes that are formed during recombination and related DNA repair events. In the presence of divalent cations, these junctions exist predominantly as the stacked-X form, in which the double-helical segments are coaxially stacked and twisted by 60 degrees in a right-handed direction across the junction cross-over. In this structure, the stacked arms resemble two adjacent double helices, but are linked at the junction by two common strands that cross over between the duplexes PUBMED:12126623. During homologous recombination, genetic information is physically exchanged between parental DNAs via crossing single strands of the same polarity within the four-way Holliday structure. This process is terminated by the endonucleolytic activity of resolvases, which convert the four-way DNA back to two double strands.

    \ 4761 IPR006795 \ This region is found in some members of the SpoU-type rRNA methylase family ().\ 597 IPR007248 \

    The 22 kDa peroxisomal membrane protein (PMP22) is a major component of peroxisomal membranes. PMP22 seems to be involved in pore-forming activity and may contribute to the unspecific permeability of the organelle membrane. PMP22 is synthesised on free cytosolic ribosomes and then directed to the peroxisome membrane by specific targeting information PUBMED:11590176. Mpv17 is a closely related peroxisomal protein involved in the development of early-onset glomerulosclerosis PUBMED:11327696.

    \

    A member of this family found in Saccharomyces cerevisiae is an integral membrane protein of the inner mitochondrial membrane and has been suggested to play a role in mitochondrial function during heat shock PUBMED:15189984.

    \ 3571 IPR000136 \ Oleosins PUBMED:1989697 are the proteinaceous components of plant lipid storage bodies called oil bodies. Oil bodies are small droplets (0.2 to 1.5 µm in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin annulus. Oleosins may have a structural role in stabilising the lipid body during desiccation of the seed, by preventing coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both the lipid and phospholipid moieties.\ Oleosins are proteins of 16 to 24 kDa and are composed of three domains: an N-terminal hydrophilic region of variable length (from 30 to 60 residues); a central hydrophobic domain of about 70 residues; and a C-terminal amphipathic region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up of beta-strand structure and to interact with the lipids PUBMED:1639802. It is the only domain whose sequence is conserved.\ 4382 IPR005606 \

    Sec20 is a membrane glycoprotein associated with the secretory pathway.

    \ 4445 IPR003488 \ The SMF family (DNA processing chain A, dprA) is a group of bacterial proteins. In Helicobacter pylori, dprA is required for natural chromosomal and plasmid transformation PUBMED:10640603.\ 2608 IPR004885 \

    This is a group of proteins of unknown function from bacteriophages.

    \ 2370 IPR000427 \

    E2 is an early regulatory protein found in the dsDNA papillomaviruses. The viral genome is a 7.9-kb circular DNA that codes for at least eight early and two late (capsid) proteins. The products of the early genes E6 and E7 are oncoproteins that destabilise the cellular tumor suppressors p53 and pRB. The product of the E1 gene is a helicase necessary for viral DNA replication. The products of the E2 gene play key roles in the regulation of viral gene transcription and DNA replication. During early stages of viral infection, the E2 protein represses the transcription of the oncogenes E6 and E7; reintroduction of E2 into cervical cancer cell lines leads to repression of E6/E7 transcription, stabilisation of the tumor suppressor p53, and cell-cycle arrest at the G1 phase. E2 can also induce apoptosis by a p53-independent mechanism.

    E2 proteins from all papillomavirus strains bind a consensus palindromic sequence, ACCgNNNNcGGT, present in multiple copies in the regulatory region. Depending on the position of this E2-responsive element (E2RE) relative to proximal promoter elements, E2 can either activate or repress transcription. Repression occurs by sterically hindering the assembly of the transcription initiation complex. \ The E2 protein is composed of a C-terminal DNA-binding domain and an N-terminal trans-activation domain. E2 exists in solution and binds to DNA as a dimer. The E2 DNA-binding domain forms a dimeric β-barrel, with each subunit contributing an anti-parallel 4-stranded β-sheet "half-barrel" PUBMED:1328886, PUBMED:11988474. The topology of each subunit is β1-α1-β2-β3-α2-β4. Helix α1 is the recognition helix, housing all of the amino acid residues involved in direct DNA sequence specification. Upon dimerisation, strands β2 and β4 at the edges of each subunit participate in a continuous hydrogen-bonding network, which results in an 8-stranded β-barrel. The dimer interface is extensive, made up of hydrogen bonds between subunits and a substantial hydrophobic β-barrel core.
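    Because the two halves of the consensus are reverse complements of each other, a site reads the same on either strand. A minimal Python sketch, assuming the lowercase g/c in ACCgNNNNcGGT are treated as required G/C (they may instead mark less strictly conserved positions), scans a DNA strand for candidate sites; the names `find_e2_sites` and `reverse_complement` are illustrative, not from any published tool.

```python
import re

# ASSUMPTION: strict reading of ACCgNNNNcGGT, with the lowercase g/c required.
# The lookahead allows overlapping candidate sites to be reported.
E2_SITE = re.compile(r"(?=ACCG[ACGT]{4}CGGT)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_e2_sites(dna: str) -> list[int]:
    """0-based start positions of candidate E2 sites on the given strand."""
    return [m.start() for m in E2_SITE.finditer(dna.upper())]


site = "ACCGAAAACGGT"
print(reverse_complement(site))                 # ACCGTTTTCGGT
print(find_e2_sites("GG" + site + "CC"))        # [2]
print(find_e2_sites(reverse_complement(site)))  # [0]
```

    Only the variable spacer changes under reverse complementation, so scanning a single strand is enough to find every site in this strict form.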

    \ 654 IPR003137 \ The PA (Protease associated) domain is found as an insert domain in diverse proteases, which include the MEROPS peptidase families A22B, M28, and S8A PUBMED:7674922. The PA domain is also found in a plant vacuolar sorting receptor and members of the RZF family, e.g. .\ 1627 IPR006784 \ This family represents the Coronavirus ORF3 protein, also known as the X2A protein.\ 6988 IPR009829 \

    This family consists of several hypothetical eukaryotic proteins of around 250 residues in length. The function of this family is unknown.

    \ 853 IPR007131 \

    The SLA1 homology domain is found in the cytoskeleton assembly control protein SLA1, which is responsible for the correct formation of the actin cytoskeleton.

    \ 610 IPR003473 \ Quinolinate synthetase catalyzes the second step of the de novo biosynthetic pathway of pyridine nucleotide formation. In particular, quinolinate synthetase is involved in the condensation of dihydroxyacetone phosphate and iminoaspartate to form quinolinic acid PUBMED:10648170. This synthesis requires two enzymes, an FAD-containing "B protein" and an "A protein".\ 7952 IPR012636 \

    This family consists of the tamulustoxins, which are found in the venom of the Indian red scorpion (Mesobuthus tamulus). Tamulustoxin shares no similarity with other scorpion venom toxins, although the positions of its six cysteine residues suggest that it shares the same structural scaffold. Tamulustoxin acts as a potassium channel blocker PUBMED:11361010.

    \ 2997 IPR011126 \

    This entry represents the N-terminal region of HPr serine/threonine kinase PtsK. This kinase is the sensor in a multicomponent phosphorelay system in control of carbon catabolite repression in bacteria PUBMED:11904409. This kinase is unusual in that it recognises the tertiary structure of its target and is a member of a novel family unrelated to any previously described protein phosphorylating enzymes PUBMED:11904409. X-ray analysis of the full-length crystalline enzyme from Staphylococcus xylosus at a resolution of 1.95 Å shows the enzyme to consist of two clearly separated domains that are assembled in a hexameric structure resembling a three-bladed propeller. The blades are formed by two N-terminal domains each, and the compact central hub assembles the C-terminal kinase domains PUBMED:9570401.

    \ 5936 IPR009297 \

    This family consists of several hypothetical bacterial and plant proteins of unknown function.

    \ 7423 IPR011506 \

    This motif is conserved at the N terminus of several Rhodopirellula baltica proteins predicted to be extracellular.

    \ 332 IPR000640 \

    This domain includes the C-terminal regions of the elongation factors EF-G and eEF-2, and of some tetracycline resistance proteins. This domain adopts a ferredoxin-like fold consisting of an alpha/beta sandwich with anti-parallel beta-sheets. It is often found associated with , which contains the signatures for the N-terminus of the proteins. This domain resembles the topology of domain III found in the elongation factors EF-G and eEF-2, with which it forms the C-terminal block, although they are not superimposable and domain III lacks some of the characteristics of this domain. EF-G participates in the elongation phase of protein synthesis, and also facilitates the release of tRNA and mRNA from the ribosome PUBMED:12471894.

    \ 776 IPR004322 \ This is a family of bacterial plasmid DNA replication initiator proteins. These RepA proteins exist as monomers and dimers in equilibrium: monomers bind directly to repeated DNA sequences and thus activate replication; dimers repress repA transcription by binding an inversely repeated DNA operator. Dimer dissociation can occur spontaneously or may be mediated by Hsp70 chaperones.\ A similar RepA family of proteins found mainly in Escherichia coli is involved in plasmid replication (see ).\ 7218 IPR009975 \

    This family consists of several P30 proteins which seem to be specific to Mycoplasma agalactiae. P30 is a 30 kDa immunodominant antigen and is known to be a transmembrane protein PUBMED:11473997.

    \ 796 IPR000687 \ Several uncharacterized proteins were found to be evolutionarily related, including yeast protein RIO1; Caenorhabditis elegans hypothetical protein ZK632.3; Methanococcus jannaschii hypothetical protein MJ0444; and Thermoplasma acidophilum hypothetical protein in rpoA2 3' region. The eukaryotic members of this family are proteins of about 55 to 60 kDa, while the archaebacterial ones are half that size. The central part of these proteins is highly conserved.\ 7690 IPR013094 \

    This catalytic domain is found in a very wide range of enzymes PUBMED:1409539.

    \ 7829 IPR012962 \

    This family contains many hypothetical proteins which are predicted to be zinc-dependent peptidases.

    \ 6599 IPR009615 \

    This entry represents the N terminus of viral desmoplakin. Desmoplakin is a component of mature desmosomes, which are the main adhesive junctions in epithelia and cardiac muscle. Desmoplakin is also essential for the maturation of adherens junctions PUBMED:11781580. Note that many family members are hypothetical.

    \ 2274 IPR006912 \ This family of plant proteins has no known function.\ 261 IPR005046 \

    This is a family of proteins of unknown function. Many contain a tandem peptide repeat sequence of 25 or 26 residues, found in predicted surface proteins (often lipoproteins) from Listeria monocytogenes, L. innocua, Enterococcus faecalis, Lactobacillus plantarum, Mycoplasma mycoides, Helicobacter hepaticus, and other species.

    \ 6875 IPR010754 \

    This family consists of several optic atrophy 3 (OPA3) proteins. OPA3 deficiency causes type III 3-methylglutaconic aciduria (MGA) in humans. This disease manifests with early bilateral optic atrophy, spasticity, extrapyramidal dysfunction, ataxia, and cognitive deficits, but normal longevity PUBMED:12126933.

    \ 962 IPR005122 \

    Uracil-DNA glycosylase () (UNG) is a DNA repair enzyme that excises uracil residues from DNA by cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of misincorporation of dUMP residues by DNA polymerase or deamination of cytosine. The sequence of uracil-DNA glycosylase is extremely well conserved PUBMED:2555154.

    \ 3554 IPR005899 \

    This family comprises distantly related, low complexity, hydrophobic small\ subunits of several related sodium ion-pumping decarboxylases. These include\ oxaloacetate decarboxylase gamma subunit and methylmalonyl-CoA decarboxylase delta subunit.

    \ \ 7694 IPR012424 \

    Members of this family have been implicated in an unusual form of DNA transfer (conjugation) in Bacteroides PUBMED:11319931. The family has been named CtnDOT_TraJ to avoid confusion with other conjugative transfer systems.

    \ 5594 IPR008713 \ NinG or Rap is involved in recombination. Rap (recombination adept with plasmid) increases lambda-by-plasmid recombination catalysed by the Escherichia coli RecBCD pathway PUBMED:11952832.\ 1131 IPR007862 \ Comparisons of adenylate kinases have revealed a particular divergence in the active site lid. In some organisms, particularly the Gram-positive bacteria, residues in the lid domain have been mutated to cysteines, and these cysteine residues are responsible for the binding of a zinc ion. The bound zinc ion in the lid domain is clearly structurally homologous to zinc-finger domains. However, it is unclear whether the adenylate kinase lid is a novel zinc-finger DNA/RNA-binding domain or whether the lid-bound zinc serves a purely structural function PUBMED:9715904.\ 6474 IPR010592 \

    This family consists of several high-affinity transport system protein p37 sequences, which are specific to Mycoplasma species. The p37 gene is part of an operon encoding two additional proteins, which are highly similar to components of the periplasmic binding-protein-dependent transport systems of Gram-negative bacteria. It has been suggested that p37 is part of a homologous, high-affinity transport system in Mycoplasma hyorhinis, a Gram-positive bacterium PUBMED:3208756.

    \ 4514 IPR007311 \ The ST7 (suppression of tumorigenicity 7) protein is thought to be a tumour suppressor. The molecular function of this protein is uncertain.\ 969 IPR000608 \

    The post-translational attachment of ubiquitin () to proteins (ubiquitinylation) alters the function, location or trafficking of a protein, or targets it to the 26S proteasome for degradation PUBMED:15556404, PUBMED:15196553, PUBMED:15454246. Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes: a ubiquitin-activating enzyme (E1, ), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3, , ), which work sequentially in a cascade PUBMED:14998368. The E1 enzyme mediates an ATP-dependent transfer of a thioester-linked ubiquitin molecule to a cysteine residue on the E2 enzyme. The E2 enzyme () then either transfers the ubiquitin moiety directly to a substrate, or to an E3 ligase, which can also ubiquitinylate a substrate.

    \

    There are several different E2 enzymes (over 30 in humans), which are broadly grouped into four classes, all of which have a core catalytic domain (containing the active site cysteine), and some of which have short N- and C-terminal amino acid extensions: class I enzymes consist of just the catalytic core domain (UBC), class II possess a UBC and a C-terminal extension, class III possess a UBC and an N-terminal extension, and class IV possess a UBC and both N- and C-terminal extensions. These extensions appear to be important for the function of some subfamilies, including E2 localisation and protein-protein interactions PUBMED:15545318. In addition, there are proteins with an E2-like fold that are devoid of catalytic activity but which appear to assist in poly-ubiquitin chain formation.

    \ \ 7927 IPR012630 \

    This family consists of the hefutoxins that are found in the venom of the scorpion Heterometrus fulvipes. These toxins, kappa-hefutoxin1 and kappa-hefutoxin2, exhibit no homology to any known toxins. The hefutoxins are potassium channel toxins PUBMED:12034709.

    \ 5844 IPR010311 \

    This family consists of several Reovirus core-spike protein lambda-2 (L2) sequences. The reovirus L2 genome segment encodes the core spike protein lambda-2, which mediates enzymatic reactions in 5' capping of the viral plus-strand transcripts PUBMED:11531411.

    \ 2337 IPR007871 \ This family of eukaryotic proteins has no characterised function. The alignment contains some conserved cysteines and histidines that might form a zinc binding site.\ 5745 IPR008592 \ This family consists of several hypothetical proteins specific to Helicobacter pylori. The function of this family is unknown.\ 7936 IPR012528 \

    This family consists of the ponericin L family of antimicrobial peptides isolated from the venom of the predatory ant Pachycondyla goeldii. The ponericin L family shares similarities with the dermaseptins. Ponericin L peptides may adopt an amphipathic alpha-helical structure in polar environments, and they play a defensive role against microbial pathogens arising from prey introduction and/or ingestion PUBMED:11279030.

    \ 2936 IPR007625 \ UL51 protein is a virion protein. In pseudorabies virus, UL51 () was identified as a component of the capsid PUBMED:9188640. In herpes simplex virus type 1 there is evidence for post-translational modification of UL51 PUBMED:9880018.\ 1058 IPR002466 \ Editases () are enzymes that alter mRNA by catalyzing the site-selective deamination of adenosine residues to inosine. The editase domain contains the active site and binds three Zn atoms PUBMED:9159072.\ \ Several editases share a common global arrangement of domains, from N to C terminus: two 'double-stranded RNA-specific adenosine deaminase' (DRADA) repeat domains (), followed by three 'double-stranded RNA binding' (DsRBD) domains (), followed by the editase domain. Other editases have a simplified domain structure with no DRADA_REP and possibly fewer DSRBD domains. Editases that deaminate cytidine are not detected by this signature.\ 3047 IPR003403 \ This regulatory protein is expressed from an immediate early gene in the replication cycle of herpesviruses. The protein is known by various names, including IE-68, US1, ICP22 and IR4.\ 2105 IPR006698 \

    These are bacterial and archaeal proteins of unknown function.

    \ 4576 IPR004120 \ Human T-cell leukemia virus type I (HTLV-I) is the etiological agent of adult T-cell leukemia (ATL), as well as of tropical spastic paraparesis (TSP) and HTLV-I-associated myelopathy (HAM). Efforts to understand the involvement of HTLV-I in ATL have focused largely on the workings of the virally encoded 40 kDa phospho-oncoprotein, Tax. Tax is a transcriptional activator. Its ability to modulate the expression and function of many cellular genes is thought to be a major mechanism contributing to HTLV-I-mediated transformation of cells. In activating cellular gene expression, Tax impinges upon several cellular signal-transduction pathways, including those for CREB/ATF and NF-kappaB PUBMED:11325603.\ 3369 IPR004869 \ Proteins of this entry are putative integral membrane proteins from bacteria. Several of the members are mycobacterial proteins. Many of the proteins contain two copies of this aligned region. The function of these proteins is not known, although it has been suggested that they may be involved in lipid transport PUBMED:10694977.\ 1559 IPR006472 \

    These sequences, from both Gram-positive and Gram-negative bacteria, represent the alpha subunit of citrate lyase, a holoenzyme composed of alpha (), beta and acyl carrier protein subunits in a 6:6:6 stoichiometry. Citrate lyase is an enzyme that converts citrate to acetate and oxaloacetate. In bacteria, this reaction is involved in citrate fermentation. The alpha subunit catalyzes the reaction acetyl-CoA + citrate = acetate + (3S)-citryl-CoA. The protein from Lactococcus lactis subsp. lactis has been experimentally characterized PUBMED:1115558.

    \ \ 1902 IPR003772 \

    This entry describes proteins of unknown function.

    \ 8121 IPR013188 \

    This region is thought to be a second domain of the M1 matrix protein.

    \ 7564 IPR006527 \

    This domain occurs in a diverse superfamily of genes in plants. Most examples are found C-terminal to an F-box (), a 60 amino acid motif involved in ubiquitination of target proteins to mark them for degradation. Two-hybid experiments support the idea that most members are interchangeable F-box subunits of SCF E3 complexes PUBMED:12169662. Some members have two copies of this domain.

    \ 1806 IPR006343 \

    These sequences contain a conserved domain that is found in DnaD, part of the Bacillus subtilis replication restart primosome, and in a number of phage-associated proteins. Members, both chromosomal and phage-associated, are found in the Bacillus/Clostridium group of Gram-positive bacteria PUBMED:11679082.

    \ 6098 IPR009373 \

    This family consists of several short Circovirus proteins of unknown function.

    \ 5302 IPR008836 \ This family consists of several mammalian semenogelin (I and II) proteins. Freshly ejaculated human semen has the appearance of a loose gel in which the predominant structural protein components are the semenogelins (Sg), secreted by the seminal vesicles PUBMED:1584792.\ 5057 IPR007894 \

    This domain of unknown function is often found adjacent to the GGDEF domain in bacteria ().

    \ 2948 IPR001312 \

    Hexokinase is an important enzyme that catalyses the ATP-dependent conversion of aldo- and keto-hexose sugars to hexose-6-phosphate (H6P). The enzyme can catalyse this reaction on glucose, fructose, sorbitol and glucosamine, and as such catalyses the first step in a number of metabolic pathways PUBMED:1783373. The addition of a phosphate group to the sugar acts to trap it in a cell, since the negatively charged phosphate cannot easily traverse the plasma membrane.

    \ \

    The enzyme is widely distributed in eukaryotes. There are three isozymes of hexokinase in yeast (PI, PII and glucokinase): isozymes PI and PII phosphorylate both aldo- and keto-sugars; glucokinase is specific for aldo-hexoses. All three isozymes contain two domains PUBMED:1783373. Structural studies of yeast hexokinase reveal a well-defined catalytic pocket that binds ATP and hexose, allowing easy transfer of the phosphate from ATP to the sugar PUBMED:10749890. Vertebrates contain four hexokinase isozymes, designated I to IV, where types I to III contain a duplication of the two-domain yeast-type hexokinase. Both the N- and C-terminal halves bind hexose and H6P, though in types I and III only the C-terminal half supports catalysis, while both halves support catalysis in type II. The N-terminal half is the regulatory region. Type IV hexokinase is similar to the yeast enzyme in containing only the two domains, and is sometimes incorrectly referred to as glucokinase.

    \ \

    The different vertebrate isozymes differ in their catalysis, localisation and regulation, thereby contributing to the different patterns of glucose metabolism in different tissues PUBMED:12756287. Whereas types I to III can phosphorylate a variety of hexose sugars and are inhibited by glucose-6-phosphate (G6P), type IV is specific for glucose and shows no G6P inhibition. The type I enzyme may have a catabolic function, producing H6P for energy production in glycolysis; it is bound to the mitochondrial membrane, which enables the coordination of glycolysis with the TCA cycle. The type II and III enzymes may have anabolic functions, providing H6P for glycogen or lipid synthesis. The type IV enzyme is found in the liver and pancreatic beta-cells, where it is controlled by insulin (activation) and glucagon (inhibition). In pancreatic beta-cells, the type IV enzyme acts as a glucose sensor to modify insulin secretion. Mutations in type IV hexokinase have been associated with diabetes mellitus.

    \ 8114 IPR013200 \

    This family contains haloacid dehalogenase-like hydrolase enzymes.

    \ 7348 IPR009216 \ This entry represents proteins of unknown function. It has been shown in Salmonella enterica that srfB is one of the genes activated by the global signal transduction/regulatory system SsrA/B PUBMED:10844662. This activation takes place within eukaryotic cells. The activated genes include pathogenicity island 2 (SPI-2) genes and at least 10 other genes (srfB is one of them) which are believed to be horizontally acquired, and to be involved in virulence/pathogenicity PUBMED:10844662.\ 4361 IPR006875 \ The dystrophin glycoprotein complex (DGC) is a membrane-spanning complex that links the interior cytoskeleton to the extracellular matrix in muscle. The sarcoglycan complex is a subcomplex within the DGC and is composed of several muscle-specific, transmembrane proteins (alpha-, beta-, gamma-, delta- and zeta-sarcoglycan). The sarcoglycans are asparagine-linked glycosylated proteins with single transmembrane domains. This family contains beta, gamma and delta members PUBMED:12107060, PUBMED:12189167.\ 6083 IPR008106 \

    The pathogenic neisseriae are a small group of virulent bacteria that initiate infection at the mucosal membranes of the human host PUBMED:11173033. They are Gram-negative cocci that usually occur in pairs. Neisseria gonorrhoeae is sexually transmitted and can cause renal failure in extreme cases. The more dangerous Neisseria meningitidis is usually a commensal of the nasopharynx, but it can cause meningococcemia and acute bacterial meningitis, especially in young children and teenagers PUBMED:11173033. There are several serogroups, of which types A, B and C are the most virulent. Despite recent advances in vaccinology, this pathogen remains an important research target and is still poorly understood PUBMED:11173033.\

    \

    N. meningitidis has many virulence factors, its major determinant being an antiphagocytic polysaccharide capsule that allows the bacterium to evade the host immune response PUBMED:11738731. Vaccines based on this polysaccharide have proven effective against serogroup A and C meningococci, but there is still no efficient vaccine against serogroup B, which causes the most severe form of meningitis PUBMED:11738731. It is believed that a conjugate protein vaccine derived from published neisserial genome sequences, rather than one based on polysaccharide, will be the best way of eradicating this disease PUBMED:11738731.\

    \

    The focus on novel vaccine targets for N. meningitidis has shifted to the adhesins that the bacterium secretes to colonise host mucosal epithelia before a serious infection takes hold PUBMED:11031243. Interaction of these adhesion molecules with their cognate host receptors allows bacterial entry to the epithelium, intracellular transport across the host cell, and exit into the bloodstream on the other side PUBMED:11031243. Following publication of the complete genome sequence of an N. meningitidis serogroup B strain PUBMED:10710307, several new adhesins have been identified, including one identical to MafB from N. gonorrhoeae.

    \ \ 93 IPR003142 \ The function of this structural domain is unknown. It is found C-terminal to the biotin protein ligase domain .\ 4998 IPR003526 \ This entry represents the MECDP-synthases, which are enzymes of the deoxy-xylulose pathway (terpenoid biosynthesis). The ygbB protein is a putative enzyme of this type PUBMED:10694574. A number of proteins from eukaryotes and prokaryotes share this common N-terminal signature and have been shown to play a role in terpenoid biosynthesis.\ 148 IPR003672 \ This family contains a domain common to the cobN protein and to magnesium protoporphyrin chelatase. CobN may play a role in cobalt insertion reactions and is implicated in the conversion of precorrin-2 to cobyrinic acid in cobalamin biosynthesis PUBMED:1655697. Magnesium protoporphyrin chelatase is involved in chlorophyll biosynthesis as the third subunit of light-independent protochlorophyllide reductase in bacteria and plants PUBMED:8385667.\ 2466 IPR006697 \

    The exodeoxyribonuclease V enzyme is a multisubunit enzyme composed of the proteins RecB (), RecC (this family) and RecD (). This enzyme plays an important role in homologous genetic recombination, repair of double-strand DNA breaks, and resistance to UV irradiation and chemical DNA damage. The enzyme () catalyzes hydrolysis of single-stranded (ss) DNA or double-stranded (ds) DNA and unwinding of the ends of dsDNA PUBMED:7746848. Its nuclease activity is controlled by Chi sites (5' G-C-T-G-G-T-G-G 3') in such a way that the enzyme produces a potent single-stranded DNA substrate for homologous pairing by RecA and single-stranded DNA binding proteins.
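    To make the Chi motif concrete, the following Python sketch scans a DNA string for the 8-nt Chi sequence on either strand. It is illustrative only: `find_chi_sites` and `reverse_complement` are hypothetical helper names, the input is assumed to be a plain A/C/G/T string, and the sketch does not model the fact that RecBCD recognises Chi in only one orientation relative to its direction of travel; it simply reports which strand each match lies on.

```python
# Illustrative sketch (hypothetical helpers): locate Chi sites, 5'-GCTGGTGG-3',
# on both strands of a DNA string. RecBCD directionality is NOT modelled.
CHI = "GCTGGTGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_chi_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every Chi match.

    '+' means GCTGGTGG reads 5'->3' on the given strand; '-' means Chi lies
    on the complementary strand (seen on the given strand as CCACCAGC).
    """
    seq = seq.upper()
    chi_rc = reverse_complement(CHI)  # "CCACCAGC"
    k = len(CHI)
    hits = []
    # Slide an 8-nt window along the sequence and compare against both forms.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == CHI:
            hits.append((i, "+"))
        elif window == chi_rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_chi_sites("AAGCTGGTGGTTCCACCAGCAA"))  # [(2, '+'), (12, '-')]
```

    A plain sliding window is used for readability; for whole genomes, repeated `str.find` calls would avoid building every substring.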

    \ 3376 IPR005066 \

    This domain is found in molybdopterin cofactor (Mo-co) oxidoreductases. It is involved in dimer formation, and\ has an Ig-fold structure PUBMED:9428520.

    \ 2228 IPR006764 \ This is a family of uncharacterised proteins.\ 5702 IPR008730 \ This family consists of several moth pheromone biosynthesis activating neuropeptide (PBAN) sequences. Female moths produce and release species specific sex pheromones to attract males for mating. Pheromone biosynthesis is hormonally regulated by the Pheromone Biosynthesis Activating Neuropeptide (PBAN) which is biosynthesised in the subesophageal ganglion (SOG) PUBMED:12110297.\ 6072 IPR010422 \

    This family consists of several hypothetical eukaryotic proteins of unknown function.

    \ 5910 IPR010344 \

    This family consists of hypothetical bacterial proteins, several of which are described as putative lipoproteins.

    \ 5506 IPR008889 \ This short motif is found in a variety of plant proteins. These proteins vary greatly in length and are mostly composed of low-complexity regions. They all conserve a short motif, FXhVQChTG, where X is any amino acid and h is a hydrophobic amino acid. The function of this motif is uncertain; however, one protein in this family has been found to bind the SigA sigma factor . It seems plausible that this motif is needed for this activity and that the whole family might be involved in modulating plastid sigma factors.\ 551 IPR007651 \ Mutations in the lipin gene lead to fatty liver dystrophy in mice. The protein has been shown to be phosphorylated by the TOR Ser/Thr protein kinases in response to insulin stimulation. The conserved region is found at the N terminus of the member proteins PUBMED:11138012, PUBMED:11792863.\ 6238 IPR010485 \

    Gurmarin is a 35-residue polypeptide from the Asclepiad vine Gymnema sylvestre. It has been utilised as a pharmacological tool in the study of sweet-taste transduction because of its ability to selectively inhibit the neural response to sweet tastants in rats PUBMED:7787425.

    \ 4571 IPR001831 \

    Like other lentiviruses, human immunodeficiency virus type 1 (HIV-1) encodes a trans-activating regulatory protein (Tat), which is essential for efficient transcription of the viral genome PUBMED:1883204, PUBMED:8058789. Tat acts by binding to an RNA stem-loop structure, the trans-activating response element (TAR), found at the 5' ends of nascent HIV-1 transcripts. In binding to TAR, Tat alters the properties of the transcription complex, recruits the positive transcription elongation factor b (P-TEFb) and hence increases the production of full-length viral RNA PUBMED:8058789. Tat protein also associates with RNA polymerase II complexes during early transcription elongation, after promoter clearance and before the synthesis of the full-length TAR RNA transcript. This interaction of Tat with RNA polymerase II elongation complexes is P-TEFb-independent. There are two Tat binding sites on each transcription elongation complex: one is located on TAR RNA and the other on RNA polymerase II near the exit site for nascent mRNA transcripts, which suggests that two Tat molecules are involved in performing various functions during a single round of HIV-1 mRNA synthesis PUBMED:12126615.

    \

    The minimum Tat sequence that can mediate specific TAR binding in vitro has been mapped to a basic domain of 10 amino acids, comprising mostly Arg and Lys residues. Regulatory activity, however, also requires the 47 N-terminal residues, which interact with components of the transcription complex and function as a transcriptional activation domain PUBMED:8058789, PUBMED:2117500, PUBMED:8121496.

    \ 4114 IPR002592 \ This family consists of the reovirus sigma 1 hemagglutinin, the cell attachment protein. This glycoprotein is a minor capsid protein and also determines the serotype-specific humoral immune response. Sigma 1 consists of a fibrous tail and a globular head. The head plays important roles in the cell attachment function of sigma 1 and is a determinant of the type-specific humoral immune response PUBMED:2398530. Reovirus belongs to the orthoreovirus group of the Reoviridae, which have a dsRNA genome. Also present in this family is bacteriophage SF6 lysozyme .\ 1006 IPR003656 \ The BED finger, which was named after the Drosophila proteins BEAF and DREF, is found in one or more copies in cellular regulatory factors and transposases from plants, animals and fungi. The BED finger is a domain of about 50 to 60 amino acid residues that contains a characteristic motif with two highly conserved aromatic positions, as well as a shared pattern of cysteines and histidines that is predicted to form a zinc finger. As diverse BED fingers are able to bind DNA, it has been suggested that DNA binding is the general function of this domain PUBMED:10973053. Some proteins known to contain a BED domain include animal, plant and fungal AC1 and Hobo-like transposases; Caenorhabditis elegans Dpy-20 protein, a predicted cuticular gene transcriptional regulator; Drosophila BEAF (boundary element-associated factor), thought to be involved in chromatin insulation; Drosophila DREF, a transcriptional regulator for S-phase genes; and tobacco 3AF1 and tomato E4/E8-BP1, light- and ethylene-regulated DNA-binding proteins that contain two BED fingers.\ 5638 IPR008712 \ This family consists of several bacteriophage NinF proteins as well as related sequences from Escherichia coli.\ 952 IPR008280 \

    This domain is found in the tubulin alpha, beta and gamma chains, as well as in the bacterial FtsZ family of proteins. These proteins are GTPases and are involved in polymer formation. Tubulin is the major component of microtubules, while FtsZ is the polymer-forming protein of bacterial cell division; it is part of a ring in the middle of the dividing cell that is required for constriction of the cell membrane and cell envelope to yield two daughter cells. FtsZ can polymerise into tubes, sheets and rings in vitro and is ubiquitous in bacteria and archaea. This is the C-terminal domain.

    \ 4601 IPR003711 \

    The bacterium Myxococcus xanthus responds to blue light by producing carotenoids. It also responds to starvation conditions by developing fruiting bodies, where the cells differentiate into myxospores. Each response entails the transcriptional activation of a separate set of genes. A single gene, carD, is required for the activation of both light- and starvation-inducible genes PUBMED:8692912.

    \ \

    The predicted CarD protein contains four repeats of a DNA-binding domain present in mammalian high mobility group I(Y) proteins and other nuclear proteins from animals and plants. Other stretches of CarD also resemble functional domains typical of eukaryotic transcription factors, including a very acidic region and a leucine zipper. High mobility group I(Y) proteins are known to bind the minor groove of A+T-rich DNA PUBMED:8692912.

    \ 7142 IPR010178 \

    This entry represents a family of highly hydrophobic, uncharacterised predicted integral membrane proteins found almost entirely in low-GC Gram-positive bacteria, although a member is also found in Aquifex aeolicus.

    \ 2457 IPR007859 \ Electron-transfer flavoprotein-ubiquinone oxidoreductase (ETF-QO) in the inner mitochondrial membrane accepts electrons from electron-transfer flavoprotein, which is located in the mitochondrial matrix, and reduces ubiquinone in the mitochondrial membrane. The two redox centers in the protein, FAD and a [4Fe-4S] cluster, are present in a 64 kDa monomer PUBMED:8306995.\ 1691 IPR005535 \

    This family contains a set of cyclic peptides with a variety of activities. The structure consists of a distorted triple-stranded beta-sheet and a cysteine-knot arrangement of the disulphide bonds PUBMED:10600388.

    \ 1230 IPR005569 \ Arc repressor acts by the cooperative binding of two Arc repressor dimers to a 21-base-pair operator site. Each Arc dimer uses an antiparallel beta-sheet to recognize bases in the major groove PUBMED:8107872.\ 3335 IPR002935 \ Members of this family are O-methyltransferases. The family also includes bacterial O-methyltransferases that may be involved in antibiotic production PUBMED:8936303.\ 4795 IPR003202 \ The DNA polymerase processivity factor (UL42) of herpes simplex virus forms a heterodimer with UL30 to create the viral DNA polymerase complex. UL42 functions to increase the processivity of polymerisation and makes little contribution to the catalytic activity of the polymerase.\ 146 IPR003781 \ This domain has a Rossmann fold and is found in a number of proteins, including succinyl-CoA synthetases and malate and ATP-citrate ligases.\ 6959 IPR009814 \

    This family consists of several hypothetical Escherichia coli and bacteriophage lambda-like proteins of around 60 residues in length. The function of this family is unknown.

    \ 6194 IPR010472 \

    Formin homology (FH) proteins play a crucial role in the reorganization of the actin cytoskeleton, which mediates various functions of the cell cortex including motility, adhesion, and cytokinesis PUBMED:10631086. Formins are multidomain proteins that interact with diverse signalling molecules and cytoskeletal proteins, although some formins have been assigned functions within the nucleus. Formins are characterised by the presence of three FH domains (FH1, FH2 and FH3), although members of the formin family do not necessarily contain all three domains PUBMED:12538772. The proline-rich FH1 domain mediates interactions with a variety of proteins, including the actin-binding protein profilin, SH3 (Src homology 3) domain proteins, and WW domain proteins. The FH2 domain () is required to inhibit actin polymerisation. The FH3 domain is less well conserved and is required for directing formins to the correct intracellular location, such as the mitotic spindle PUBMED:11171383 or the projection tip during conjugation PUBMED:9606213. In addition, some formins can contain a GTPase-binding domain (GBD) () required for binding to Rho small GTPases, and a C-terminal conserved Dia-autoregulatory domain (DAD).

    \

    This entry represents the FH3 domain.

    \ 6052 IPR009351 \

    This is a family of conserved bacterial proteins with unknown function.

    \ 4235 IPR001266 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family includes a number of eukaryotic and archaebacterial ribosomal proteins: mammalian S19, Drosophila S19, Ascaris lumbricoides S19g (ALEP-1) and S19s, yeast YS16 (RP55A and RP55B), Aspergillus S16 and Haloarcula marismortui HS12.

    \ 7793 IPR012923 \

    Replication fork pausing is required to initiate recombination events. More specifically, Swi1 is required for recombination near the mat1 locus. Swi3 has been found to co-purify with Swi1. Together they define a fork protection complex that coordinates leading- and lagging-strand synthesis and stabilises stalled replication forks PUBMED:15367656. This complex is required for accurate replication, fork protection and replication checkpoint signalling PUBMED:15367656, PUBMED:15371597.

    \ 1825 IPR007637 \ Members of this family are type II restriction enzymes (). They recognise the double-stranded unmethylated sequence GATC and cleave before G-1 PUBMED:11133943.\ 6699 IPR010680 \

    This family consists of several TraH proteins, which seem to be specific to Agrobacterium and Rhizobium species. This protein is thought to be involved in conjugal transfer but its function is unknown. This family does not appear to be related to .

    \ 5840 IPR010309 \

    This is a domain of unknown function found at the N-terminus of a family of E3 ubiquitin protein ligases, including yeast TOM1, many of which appear to play a role in mRNA transcription and processing. This domain is found in association with and immediately N-terminal to another domain of unknown function: .

    \ 5550 IPR008888 \ This domain is found in several Ustilago mating-type proteins. The b locus of the phytopathogenic fungus Ustilago maydis encodes a multiallelic recognition function that controls the ability of the fungus to form a dikaryon and complete the sexual stage of the life cycle. The b locus has at least 25 alleles, and any combination of two different alleles, brought together by mating between haploid cells, allows the fungus to cause disease and undergo sexual development within the plant PUBMED:2227416.\ 5783 IPR010278 \

    This family consists of a number of glycoprotein gp2 sequences from equine herpesviruses.

    \ 4473 IPR007806 \ This family is found in proteins involved in transferring a group of integrating conjugative DNA elements, such as pSAM2 from Streptomyces ambofaciens, during mating PUBMED:8366038. Their precise role is not known.\ 7799 IPR001340 \ Bacterial hemolysins are exotoxins that attack blood cell membranes and cause cell\ rupture by mechanisms that are not clearly defined. Leukocidin consists of two protein components,\ F and S, and causes cytotoxic changes in polymorphonuclear leukocytes.\ 4680 IPR002816 \

    pAD1 is a hemolysin/bacteriocin plasmid originally identified in\ Enterococcus faecalis DS16. It encodes a mating response to a peptide\ sex pheromone, cAD1, secreted by recipient bacteria. Once the plasmid\ pAD1 is acquired, production of the pheromone ceases, a trait related\ in part to a determinant designated traB PUBMED:8029329. However, this family also contains plant and mammalian proteins, suggesting that its members may have a somewhat wider function.

    \ 1712 IPR002689 \ Glycoprotein L from cytomegalovirus serves as a chaperone for the correct folding and surface expression of glycoprotein H (gH) PUBMED:7964634. Glycoprotein L is a member of the heterotrimeric gCIII glycoprotein complex, which also includes gH and gO and has an essential role in viral fusion PUBMED:10196283.\ 1945 IPR004306 \ This domain is found exclusively in Mycoplasma pneumoniae proteins of unknown function. Another related domain () is found exclusively in mycoplasmal proteins of the MG032/MG096/MG288 family, and both domains often occur together.\ 1609 IPR003182 \ The virus capsid is composed of 60 icosahedral units, each of which is composed of one copy of each of the two coat proteins. This family contains the small coat protein (SCP) PUBMED:1546463 of the Comoviridae viral family.\ 4535 IPR006070 \

    Several uncharacterized proteins of 20 to 46 kDa have been shown to contain a number of conserved regions in their N-terminal section. These include yeast protein SUA5 ().

    \ 4465 IPR006917 \ This family represents a group of putative heme-binding proteins PUBMED:10640688. It includes archaeal and bacterial homologues.\ 5229 IPR008715 \ This family consists of nodulation S (NodS) proteins. The products of the rhizobial nodulation genes are involved in the biosynthesis of lipochitin oligosaccharides (LCOs), which are host-specific signal molecules required for nodule formation. NodS is an S-adenosyl-L-methionine (SAM)-dependent methyltransferase involved in N-methylation of LCOs. NodS uses N-deacetylated chitooligosaccharides, the products of the NodBC proteins, as its methyl acceptors PUBMED:11344149.\ 4043 IPR000762 \ Several extracellular heparin-binding proteins involved in regulation of growth and differentiation belong to a new family of growth factors. These growth factors are highly related proteins of about 140 amino acids that contain 10 conserved cysteines probably involved in disulphide bonds, and include pleiotrophin PUBMED:15121180 (also known as heparin-binding growth-associated molecule HB-GAM, heparin-binding growth factor 8 HBGF-8, heparin-binding neurotrophic factor HBNF and osteoblast specific protein OSF-1); midkine (MK) PUBMED:15047154; retinoic acid-induced heparin-binding protein (RIHB) PUBMED:7796887; and pleiotrophic factors alpha-1 and -2 and beta-1 and -2 from Xenopus laevis, the homologs of midkine and pleiotrophin, respectively. Pleiotrophin is a heparin-binding protein that has neurotrophic activity and mitogenic activity towards fibroblasts. It is highly expressed in brain and uterus tissues, but is also found in gut, muscle and skin. It is thought to possess an important brain-specific function.
Midkine is a regulator of differentiation whose expression is regulated by retinoic acid, and, like pleiotrophin, is a heparin-binding growth/differentiation factor that acts on fibroblasts and nerve cells.\ 3172 IPR001236 \ L-lactate dehydrogenases are metabolic enzymes which catalyse the interconversion of \ L-lactate and pyruvate; the reduction of pyruvate to L-lactate is the last step in anaerobic glycolysis. L-lactate dehydrogenase \ is also found as a lens crystallin in bird and crocodile eyes. L-2-hydroxyisocaproate \ dehydrogenases are also members of the family. \ \ Malate dehydrogenases catalyse the interconversion of malate and oxaloacetate. The \ enzyme participates in the citric acid cycle.\ 214 IPR006050 \

    DNA photolyases are enzymes that bind to DNA containing pyrimidine dimers:\ on absorption of visible light, they catalyse dimer splitting into the\ constituent monomers, a process called photoreactivation PUBMED:6325459. This is a DNA\ repair mechanism, repairing pyrimidine dimers induced by\ exposure to ultraviolet light PUBMED:3000886. The precise mechanisms involved in\ substrate binding, conversion of light energy to the mechanical energy\ needed to rupture the cyclobutane ring, and subsequent release of the\ product are uncertain PUBMED:6325459. Analysis of DNA photolyases has revealed the presence\ of an intrinsic chromophore, all monomers containing a reduced FAD moiety,\ and, in addition, either a reduced pterin or 8-hydroxy-5-deazaflavin as a\ second chromophore PUBMED:3000886, PUBMED:2110564. Either chromophore may act as the primary photon\ acceptor, peak absorptions occurring in the blue region of the spectrum\ and in the UV-B region, at a wavelength around 290 nm PUBMED:2110564.

    This domain binds a light harvesting cofactor.

    \ 6155 IPR010457 \

    This domain is a ligand-binding immunoglobulin-like domain PUBMED:9501088. Its two cysteine residues form a disulphide bridge.

    \ 6652 IPR008313 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 194 IPR007722 \ This presumed domain is always found to the N-terminal side of the NUDIX hydrolase domain . This domain appears to be specific to mRNA decapping protein 2 and its close homologues. This region has been termed Box A PUBMED:12218187.\ 2248 IPR007670 \ This family contains several uncharacterised proteins.\ 3917 IPR003869 \ This is a family of diverse bacterial polysaccharide biosynthesis proteins, including the CapD protein from Staphylococcus aureus PUBMED:7961465, the WalL protein, mannosyltransferase PUBMED:9079898, and several putative epimerases. The CapD protein is required for biosynthesis of type 1 capsular polysaccharide.\ 907 IPR004600 \ Members of this family are part of the TFIIH complex, which is involved in the initiation of transcription and in nucleotide excision repair. The core-TFIIH basal transcription factor complex has six subunits; this entry represents the p34 subunit.\ 1838 IPR002803 \

    The function of this family of proteins from the Archaea is unknown. A single homolog is found in the bacterium Aquifex aeolicus.

    \ 7707 IPR013097 \

    The function of this domain is unknown, but it is upregulated in response to salt stress in Populus balsamifera (balsam poplar) PUBMED:14704136. It is also found at the C-terminus of a fructose 1,6-bisphosphate aldolase from Hydrogenophilus thermoluteolus () PUBMED:10705449. is found in the pAO1 plasmid, which encodes genes for molybdopterin uptake and degradation of the plant alkaloid nicotine. The structure of one has been solved (), and the domain forms an alpha-beta barrel dimer PUBMED:14872131. Although there is a clear duplication within the domain, it is not obviously detectable in the sequence.

    \ 5012 IPR001628 \

    Steroid or nuclear hormone receptors constitute an important superfamily of transcription regulators that are involved in widely diverse physiological functions, including control of embryonic development, cell differentiation and homeostasis. The receptors function as dimeric molecules in nuclei to regulate the transcription of target genes in a ligand-responsive manner. Nuclear hormone receptors consist of a highly conserved DNA-binding domain that recognises specific sequences, connected via a linker region to a C-terminal ligand-binding domain (). In addition, certain nuclear hormone receptors have an N-terminal modulatory domain (). The DNA-binding domain can elicit either an activating or repressing effect by binding to specific regions of the DNA known as hormone-response elements PUBMED:15242341, PUBMED:15242339. These response elements position the receptors, and the complexes recruited by them, close to the genes whose transcription is affected. The DNA-binding domains of nuclear receptors consist of two zinc-nucleated modules and a C-terminal extension, where residues in the first zinc module determine the specificity of the DNA recognition and residues in the second zinc module are involved in dimerisation. The DNA-binding domain is furthermore involved in several other functions, including nuclear localization and interaction with transcription factors and co-activators PUBMED:15242339. This entry represents the two zinc finger modules involved in DNA-binding.

    \ 1694 IPR000277 \ A number of pyridoxal-dependent enzymes involved in the metabolism of cysteine, homocysteine and methionine have been shown to be evolutionarily related PUBMED:1577698, PUBMED:8511966. These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine residue located in the central section of these enzymes.\ 5636 IPR008783 \ This family consists of several mammalian podoplanin-like proteins, which are thought to specifically control the unique shape of podocytes PUBMED:12032185.\ 5306 IPR008795 \ The prominins are an emerging family of proteins that, among the multispan membrane proteins, display a novel topology. Mouse and human prominin and mouse prominin-like 1 (PROML1) are predicted to contain five membrane-spanning domains, with an N-terminal domain exposed to the extracellular space, followed by four loops (alternating small cytoplasmic and large extracellular loops) and a cytoplasmic C-terminal domain PUBMED:11467842. The exact function of prominin is unknown, although in humans defects in PROM1, the gene coding for prominin, cause retinal degeneration PUBMED:10587575.\ 5987 IPR010378 \

    This is a family of uncharacterised eukaryotic proteins.

    \ 249 IPR004353 \

    The sequence of a 6.8 kb DNA fragment from Saccharomyces cerevisiae \ chromosome VII has been analysed PUBMED:8896269. The sequence was found to contain\ five open reading frames (ORFs) greater than 100 amino acids in length. One \ of these (a 73.5 kDa protein) shares similarity with the 58.0 kDa SPAC1D4.03C\ from Schizosaccharomyces pombe, and with hypothetical proteins from Homo \ sapiens, Drosophila melanogaster, Caenorhabditis elegans and Fugu rubripes.

    \

    The sequences are characterised by a variable N-terminal domain and a more\ conserved C-terminal domain. They share no similarity with any other known, \ functionally or structurally characterised proteins.

    \ 5565 IPR008456 \ The domain fold is a jelly-roll, composed of two antiparallel beta-sheets and two short alpha-helices PUBMED:9334749. A groove on beta-sheet I exhibited the best surface complementarity to collagen. This site partially overlaps with the peptide sequence previously shown to be critical for collagen binding. Recombinant proteins containing single amino acid mutations designed to disrupt the surface of the putative binding site exhibited significantly lower affinities for collagen.\ 2879 IPR003860 \ This is a group of hemagglutinin esterases from influenza C virus and coronaviruses. Hemagglutinin esterases are membrane glycoproteins present on the surface of the virus and are involved in the cell infection process.\ 6348 IPR010529 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 1760 IPR002220 \ Dihydrodipicolinate synthase (DHDPS) is the key enzyme in lysine biosynthesis\ via the diaminopimelate pathway of prokaryotes, some phycomycetes and\ higher plants. The enzyme catalyses the condensation of L-aspartate-beta-\ semialdehyde and pyruvate to dihydrodipicolinic acid via a ping-pong\ mechanism in which pyruvate binds to the enzyme by forming a Schiff base\ with a lysine residue PUBMED:7853400. Three other proteins are structurally related to DHDPS and probably also act\ via a similar catalytic mechanism. These are Escherichia coli N-acetylneuraminate lyase () (gene nanA), which\ catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form\ N-acetylneuraminate; Rhizobium meliloti protein mosA PUBMED:8349559, which is involved in the biosynthesis\ of the rhizopine 3-O-methyl-scyllo-inosamine; and E. coli hypothetical protein yjhH.\ The sequences of DHDPS from different sources are well conserved. The\ structure takes the form of a homotetramer, in which 2 monomers are\ related by an approximate 2-fold symmetry PUBMED:7853400. Each monomer comprises\ 2 domains: an 8-fold alpha/beta-barrel and a C-terminal alpha-helical\ domain. The fold resembles that of N-acetylneuraminate lyase. The active\ site lysine is located in the barrel domain and has access via 2 channels\ on the C-terminal side of the barrel.\ 7457 IPR011481 \

    These hypothetical proteins in Rhodopirellula baltica have a conserved C-terminal region.

    \ 7812 IPR012598 \

    This repeat is found in two hypothetical Plasmodium proteins.

    \ 3861 IPR000341 \

    Phosphatidylinositol 3-kinase (PI3K) () is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. \ A subset of PI3Ks has the capacity to bind and be activated by the GTP-bound small GTPase p21Ras\ (Ras). PI3Ks are recognized as one of the principal effectors of Ras\ signalling to the cell-cycle control machinery.

    In the structure of the Ras–PI3K gamma complex, contacts between the two molecules are made\ primarily via the so-called switch I region of Ras and the PI3K RBD. The RBD fold comprises a five-stranded mixed beta-sheet,\ flanked by two alpha-helices. Interaction between Ras and the PI3K RBD is primarily polar in character and, as characterized by\ kinetic measurements, is reversible and transient PUBMED:12151228.

    \ \ These regions show some similarity (although not highly \ significant similarity) to Ras-binding domains (unpublished observation).

    \ 5904 IPR009280 \

    This family consists of several short Orthopoxvirus F14 proteins. The function of this protein is unknown.

    \ 4063 IPR003850 \

    Phosphoribosylformylglycinamidine (FGAM) synthetase () catalyses the fourth step in the de\ novo purine biosynthetic pathway.

    \ \ \

    \ In eukaryotes and many bacterial systems (including Escherichia coli and\ Salmonella typhimurium), the FGAM synthetase is encoded\ by a large protein with an N-terminal ATPase\ domain and a C-terminal glutamine-binding domain. In\ archaeal and other bacterial systems, however, FGAM\ synthetase is encoded by separate genes, making it a\ multisubunit (rather than multidomain) enzyme. For example,\ in Bacillus subtilis, the purL protein is homologous\ to the ATPase domain, whereas the purQ protein is\ homologous to the glutamine-binding domain of the single chain\ FGAM synthetases.

    \ \

    The purL and purQ genes are part of the pur operon in\ B. subtilis, which encodes 11 of the 12 enzymes in the\ purine biosynthetic pathway. Genetic studies also\ identified an open reading frame (ORF) encoding 84 amino acids\ in this operon, now known as purS, which is conserved in\ a large group of Gram-positive bacteria and methanogenic\ archaea.

    \ \

    Recent studies showed that disruption of\ the purS gene in B. subtilis resulted in a purine auxotrophic\ phenotype, due to defective FGAM synthetase\ activity. Therefore, the purS protein appears to be required\ for the function of the purL and purQ subunits of\ the FGAM synthetase, but the molecular mechanism for\ the functional role of purS is currently not known.

    \ 8059 IPR013183 \

    This is a family of fungal proteins of unknown function.

    \ 6404 IPR010557 \

    This family consists of a number of hypothetical proteins from Escherichia coli O157:H7 and Salmonella typhi. The function of this family is unknown.

    \ 5849 IPR010313 \

    This family consists of several mammalian specific aralkyl acyl-CoA:amino acid N-acyltransferase (glycine N-acyltransferase) proteins .

    \ 6469 IPR010589 \

    This family consists of several Paramyxovirus structural protein V sequences from the Nipah and Hendra viruses.

    \ 4984 IPR006780 \

    YABBY proteins are a group of plant-specific transcription factors involved in the specification of abaxial polarity in lateral organs such as leaves and floral organs PUBMED:10679447, PUBMED:11858837.

    \ 982 IPR003123 \ This domain is present in yeast vacuolar sorting protein 9 and other proteins.\ 4146 IPR001763 \

    Rhodanese, a sulphurtransferase involved in cyanide detoxification (see ), shares an evolutionary relationship with a large family of proteins PUBMED:9733650, including\

    \

    Rhodanese has an internal duplication. This domain is found as a single copy in other proteins, including phosphatases and ubiquitin C-terminal hydrolases PUBMED:8702871.

    \ 717 IPR006020 \

    The PI domain has a similar structure to the insulin receptor substrate-1 \ PTB domain, a 7-stranded beta-sandwich, capped by a C-terminal helix.\ However, the PI domain contains an additional short N-terminal helix and a\ large insertion between strands 1 and 2, which forms a helix and 2 long\ connecting loops. The substrate peptide fits into a surface cleft formed\ from the C-terminal helix and strand 5 PUBMED:8599766.

    \ 5360 IPR008462 \ CsbD is a bacterial general stress response protein. Its expression is mediated by sigma-B, an alternative sigma factor PUBMED:11988534. The role of CsbD in the stress response is unclear.\ 1420 IPR002601 \

    This domain of unknown function is found at the C-terminus of a number of Caenorhabditis elegans proteins. It may be an extracellular domain. Most copies of the C6\ domain contain six conserved cysteine residues. However, some copies of the domain are missing cysteine residues\ 1 and 3, suggesting that these form a disulphide bridge. In there are 18 copies of the domain.

    \ 4903 IPR007148 \ This domain is found at the C terminus of proteins containing WD40 repeats. These proteins are part of the U3 ribonucleoprotein and the yeast protein is called Utp12 or DIP2 PUBMED:12068309.\ 1073 IPR000582 \

    Acyl-CoA-binding protein (ACBP) is a small (10 kDa) protein that binds medium- and long-chain acyl-CoA esters\ with high affinity, and may act as an intracellular carrier of acyl-CoA esters. ACBP has a number of important\ physiological and biochemical functions: it is known as a diazepam binding inhibitor, as a putative neurotransmitter,\ as a regulator of insulin release from pancreatic cells, and as a mediator in corticotropin-dependent adrenal\ steroidogenesis PUBMED:3525533, PUBMED:1518047. It is possible that the protein acts as a neuropeptide that takes part\ in the modulation of gamma-aminobutyric acid-ergic transmission PUBMED:3525533. The structure of ACBP has been determined\ by NMR spectroscopy and has been shown to be a mainly-alpha protein, consisting of 5 short alpha-helices and 3\ connecting beta-strands PUBMED:1518047.

    \ \

    ACBP is a highly conserved protein of about 90 residues that has so far been found in vertebrates, insects, plants\ and yeast. Other proteins belonging to the ACBP family include mouse endozepine-like peptide (ELP) (gene DBIL5) \ PUBMED:8898349; mammalian MA-DBI, a transmembrane protein of unknown function; and \ human DRS-1 PUBMED:10354522, a protein of unknown function that contains an N-terminal ACBP-like domain and a C-terminal \ enoyl-CoA isomerase/hydratase domain.

    \ 5834 IPR009253 \

    This family consists of several short hypothetical proteobacterial proteins of unknown function.

    \ 2559 IPR000563 \ Many flagellar proteins are exported by a flagellum-specific export pathway. Attempts have been made to characterise\ the apparatus responsible for this process by designing assays to screen for mutants with export defects.\ Experiments involving filament removal from temperature-sensitive flagellar mutants of Salmonella typhimurium have\ shown that, while most mutants were able to regrow filaments, flhA, fliH, fliI and fliN mutants showed no or greatly\ reduced regrowth. This suggests that the corresponding gene products are involved in the process of flagellum-specific export PUBMED:1646201. The sequence of fliH has been deduced and shown to encode a protein with a molecular mass\ of 25,782 Da.\ 7959 IPR012599 \

    This motif is found at the N-terminus of some members of the Peptidase_C1 family () and is involved in activation of these peptidases.

    \ 3430 IPR007848 \ This domain is found in ribosomal RNA small subunit methyltransferase C (e.g. ) as well as in other methyltransferases (e.g. ).\ 3644 IPR001106 \ Phenylalanine ammonia-lyase () (PAL) is a key enzyme of plant and\ fungal phenylpropanoid metabolism, involved in the biosynthesis of a wide\ variety of secondary metabolites such as flavonoids, furanocoumarin phytoalexins\ and cell wall components. These compounds are important for normal growth and for\ responses to environmental stress.\ \

    The family also includes histidine ammonia-lyase () (histidase) that catalyzes the first step in\ histidine degradation, the removal of an ammonia group from histidine to produce\ urocanic acid.

    \ 5609 IPR008677 \ This family consists of mammalian MRVI1 proteins, which are related to the lymphoid-restricted membrane protein (JAW1) and the IP3 receptor-associated cGMP kinase substrates A and B (IRAGA and IRAGB). The function of MRVI1 is unknown, although mutations in the Mrvi1 gene induce myeloid leukaemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, so it has been speculated that Mrvi1 is a tumour suppressor gene PUBMED:10321731. IRAG is very similar in sequence to MRVI1 and is an essential NO/cGKI-dependent regulator of IP3-induced calcium release. Activation of cGKI decreases IP3-stimulated elevations in intracellular calcium, induces smooth muscle relaxation and contributes to the antiproliferative and pro-apoptotic effects of NO/cGMP PUBMED:10724174. Jaw1 is a member of a class of proteins with COOH-terminal hydrophobic membrane anchors and is structurally similar to proteins involved in vesicle targeting and fusion. This suggests that the function and/or the structure of the ER in lymphocytes may be modified by lymphoid-restricted resident ER proteins PUBMED:8021504.\ 1295 IPR007135 \

    Autophagocytosis is a starvation-induced process responsible for transport of cytoplasmic proteins to the vacuole. This entry represents the C-terminal domain; the N-terminal domain is represented by .

    \ 3186 IPR003887 \

    The LEM domain is found in nuclear membrane-associated proteins, including lamina-associated polypeptide 2 and emerin PUBMED:11792821. Defects in the emerin gene are a cause of Emery-Dreifuss muscular dystrophy, an X-linked disorder characterised by early contractures, muscle wasting, weakness and cardiomyopathy.

    \ \ 3078 IPR003086 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This family represents monomeric serralysin inhibitors of about 125 residues, which interact with specific metalloproteases produced by serralysin-secreting bacteria; these bacteria are characteristically plant, insect and animal pathogens. It is probable that the serralysin inhibitors protect the host from proteolysis during export of the protease. The members of this family belong to MEROPS proteinase inhibitor family I38, clan IK.

    \ \

    X-ray crystallography of a complex between the Serratia marcescens protease, SmaPI, and the inhibitor of Erwinia chrysanthemi, Inh, reveals that Inh is folded into an eight-stranded beta-barrel with an N-terminal trunk of 10 residues. Residues 1-5 occupy part of the extended active site of the proteinase, thereby preventing access of the substrate. Residues 6-10 form a linker that connects the N-terminal proteinase-binding peptide to the body of the beta-barrel. The backbone carbonyl of Ser-1 interacts with the catalytic zinc; the Ser-2 side chain occupies the S1-binding site and also forms a hydrogen bond to the carboxyl end of the catalytic Glu, whereas Leu-3 occupies the S2 recognition site. Penetration of the trunk region further than 5 residues into the substrate-binding cleft appears to be prevented by the beta-barrel, which itself interacts with the proteinase near its Met turn (19). Peptide mimetics of the trunk at concentrations up to about 100 mM do not inhibit the protease, demonstrating that the barrel is essential for inhibitory activity PUBMED:10770939, PUBMED:7752231.

    \ \

    Structurally and functionally these inhibitors are closely related to the \ lipocalins, fatty acid-binding proteins, avidins and the enigmatic triabin.\ Together these five protein families constitute the calycin superfamily PUBMED:7684291. \ The proteins are characterised by their high specificity for small hydrophobic molecules and by their ability to form complexes with soluble macromolecules either through intramolecular disulphides or protein-protein interactions PUBMED:8761444.

    \ \ 1970 IPR005096 \ This family is specific to Borrelia burgdorferi. The protein is encoded on extrachromosomal DNA and is of unknown function.\ 4804 IPR005839 \

    This family is defined only on the basis of sequence similarity. Proteins belonging to this family range in size from 47 to 61 kDa and contain six conserved cysteines, three of which are clustered.

    \ 6339 IPR009483 \

    This family consists of several invasion plasmid antigen IpaD proteins. Entry of Shigella flexneri into epithelial cells and lysis of the phagosome involve the IpaB, IpaC, and IpaD proteins, which are secreted by type III secretion machinery, and appear to form a multi-protein complex capable of inducing the phagocytic event which internalizes the bacterium PUBMED:11083774.

    \ 3871 IPR007446 \ PilQ is essential for the biogenesis of type IV pili. Its precise function is unknown, but it has been suggested that it may act as a pilus channel in the final stages of pilus assembly.\ 4353 IPR001636 \

    Phosphoribosylaminoimidazole-succinocarboxamide synthase () (SAICAR synthetase) catalyzes the seventh step in the de novo purine biosynthetic pathway: the ATP-dependent conversion of 5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and aspartic acid to SAICAR PUBMED:1574589.

    \

    In bacteria (purC), fungi (ADE1) and plants (Pur7), SAICAR synthetase is a monofunctional protein; in animals it is the N-terminal domain of a bifunctional enzyme that also has phosphoribosylaminoimidazole carboxylase (AIRC) activity (see ).

    \ \ 2642 IPR002988 \

    The protein G-related albumin-binding (GA) module is\ composed of three alpha-helices PUBMED:9086265. This module is\ found in a range of bacterial cell surface proteins.\ The GA module from the Peptostreptococcus magnus albumin-binding protein (PAB) shows a strong affinity\ for albumin.

    \ 4858 IPR005350 \

    This family of bacterial proteins includes a number of plasmid-encoded virulence proteins.

    \ 1439 IPR007542 \ This family includes the major capsid protein of iridoviruses, chlorella virus and Spodoptera ascovirus, which are all dsDNA viruses with no RNA stage. This is the most abundant structural protein and can account for up to 45% of virion protein PUBMED:10082389. In Chlorella virus PBCV-1 the major capsid protein is a glycoprotein PUBMED:1566573.\ 3434 IPR005588 \

    The members of this family are regulators of the anti-sigma E protein RseD.

    \ 244 IPR004314 \

    This domain is found in a number of Arabidopsis thaliana and other plant proteins of unknown function. A small number of the proteins that contain this domain are annotated as carboxyl-terminal proteinase-like.

    \ 7400 IPR011421 \

    Bucentaur or craniofacial development protein 1 (BCNT) in ruminants has a different domain architecture from that in mouse and human. For this reason it has been used as a model for molecular evolution PUBMED:9602175, PUBMED:12832649, PUBMED:9006920, PUBMED:15475170, PUBMED:11368901. Both bovine and human BCNTs are phosphorylated by casein kinase II in vitro PUBMED:10350657.

    \ 2715 IPR007788 \ Enzymes in this family catalyse the cyclization of free L-glutamine and N-terminal glutaminyl residues in proteins to pyroglutamate (5-oxoproline) and pyroglutamyl residues, respectively PUBMED:11035947. This family includes plant and bacterial enzymes and appears to be unrelated to the mammalian enzymes.\ 2756 IPR000556 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 48 comprises enzymes with several known activities: endoglucanase () and cellobiohydrolase ().

    \ \

    The largest cellulase gene sequenced to\ date is one of the cellulases (celA) from the genome of the thermophilic anaerobic bacterium Caldocellum\ saccharolyticum. The celA gene product is a polypeptide of 1751 amino acids; this has a multidomain structure\ comprising two catalytic domains and two cellulose-binding domains, linked by Pro-Thr-rich regions. The\ N-terminal domain has endoglucanase activity on carboxymethylcellulose, consistent with its similarity\ to several endo-1,4-beta-D-glucanase sequences. The C-terminal domain shows similarity to a cellulase from\ Clostridium thermocellum (CelS), which acts synergistically with a second component to hydrolyse crystalline\ cellulose PUBMED:7612247.

    \ 1803 IPR001001 \ This entry describes the beta chain of DNA polymerase III. This is a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria. The beta chain is required for initiation of replication from an RNA primer, with nucleotide residues from deoxynucleoside triphosphates (dNTPs)\ being added to the 3'-end of the growing DNA chain.\ 1367 IPR001597 \

    This family includes tryptophanase (tryptophan indole-lyase, TNase) (), tyrosine phenol-lyase (TPL) () and threonine aldolase ().

    \ 5972 IPR010372 \

    The delta subunit of DNA polymerase III () is required, together with the delta' subunit, for the assembly of the processivity factor beta(2) onto primed DNA in the reaction catalysed by the DNA polymerase III holoenzyme PUBMED:11432857. The delta subunit is also known as HolA.

    \ 5273 IPR008790 \

    This family contains poxvirus serine/threonine protein kinases, which are essential for the phosphorylation of virion proteins during virion assembly.

    \ 5452 IPR008664 \ This entry consists of mammalian homologues of the LISCH7 protein. LISCH7 is a liver-specific bHLH-ZIP transcription factor.\ 6637 IPR009635 \

    This family consists of several neural proliferation differentiation control-1 (NPDC1) proteins. NPDC1 plays a role in the control of neural cell proliferation and differentiation, and it has been suggested that it may be involved in the development of several secretory glands. This family also contains the C-terminal region of the Caenorhabditis elegans protein CAB-1 (), which is known to interact with AEX-3 PUBMED:10970871.

    \ 452 IPR006169 \

    Several proteins have been shown to contain the five structural motifs characteristic of GTP-binding proteins PUBMED:1449490. These include murine DRG protein, GTP1 protein from Schizosaccharomyces pombe, OBG protein from Bacillus subtilis, and several others. Although these proteins contain GTP-binding motifs and are similar to each other, they do not share sequence similarity with other GTP-binding proteins, and have thus been classed as a novel group, the GTP1/OBG family. The functions of these proteins are as yet uncertain, but they have been shown to be important in development and normal cell metabolism PUBMED:8462872, PUBMED:2537815.

    \ 1943 IPR004256 \

    This entry represents a C-terminal domain of unknown function that is usually fused to a putative prokaryotic DEXX-box ATPase domain () PUBMED:9045616.

    \ 1634 IPR004293 \ Members of this family are non-structural proteins found in transmissible gastroenteritis coronavirus (TGEV) and porcine respiratory coronavirus (PRCV) isolates. These proteins are encoded on the same mRNA as another product, designated ORF3a. Although ORF3a/b have been implicated in TGEV and PRCV pathogenesis, their precise roles remain unclear PUBMED:10948987, PUBMED:10365166.\ 6460 IPR010585 \

    This family consists of several mammalian-specific sequences of XRCC4, a protein involved in DNA double-strand break repair and V(D)J recombination. In the non-homologous end-joining pathway of DNA double-strand break repair, the ligation step is catalysed by a complex of XRCC4 and DNA ligase IV. XRCC4 and ligase IV are thought to be essential for alignment-based gap filling as well as for the final ligation of the breaks PUBMED:12517771.

    \ 5181 IPR008018 \

    The phage head-tail attachment protein is required for the joining of phage heads and tails at the last step of morphogenesis PUBMED:12083526.

    \ 3000 IPR006961 \ HrpZ (harpin elicitor) from the plant pathogen Pseudomonas syringae binds to lipid bilayers and forms a cation-conducting pore in vitro. This pore-forming activity may allow nutrient release or the delivery of virulence factors during bacterial colonisation of host plants PUBMED:11134504.\ 6566 IPR010619 \

    This family represents a conserved region within a number of hypothetical proteins of unknown function found in eukaryotes, bacteria and archaea. Some family members are membrane proteins.

    \ 2838 IPR007812 \ This family consists of general secretion pathway protein L (GspL) sequences from several Gram-negative bacteria. The general secretion pathway of Gram-negative bacteria is responsible for the extracellular secretion of a number of different proteins, including proteases and toxins. This pathway secretes proteins across the cell envelope in two distinct steps. The second step, translocation through the outer membrane, is assisted by at least 13 different gene products. GspL is predicted to contain a large cytoplasmic domain and has been shown to interact with the autophosphorylating cytoplasmic membrane protein GspE. The tri-molecular complex of GspL, GspE and GspM is thought to be involved in regulating the opening and closing of the secretion pore and/or in transducing energy to the site of outer membrane translocation PUBMED:10322014.\ 1494 IPR004852 \

    This is a group of distinct cytochrome c peroxidases (CCPs) that contain two haem groups. Like other cytochrome c peroxidases, they reduce hydrogen peroxide to water using c-type haem as an oxidizable substrate. However, because they possess two haem prosthetic groups instead of one, bacterial CCPs reduce hydrogen peroxide without the need to generate semi-stable free radicals. The two haem groups have significantly different redox potentials. The high-potential (+320 mV) haem feeds electrons from electron shuttle proteins to the low-potential (-330 mV) haem, where peroxide is reduced (indeed, the low-potential site is known as the peroxidatic site) PUBMED:8591033. The CCP protein itself is structured into two domains, each containing one c-type haem group, with a calcium-binding site at the domain interface. This family also includes MauG proteins, whose similarity to di-haem CCP was previously recognized PUBMED:9202457.

    \ \ 3617 IPR001133 \ NADH-ubiquinone oxidoreductase chain 4L () is a subunit of the respiratory-chain NADH dehydrogenase (also known as complex I or NADH-ubiquinone oxidoreductase), an oligomeric enzymatic complex that catalyses the reduction of ubiquinone to ubiquinol. It is found in mitochondria or chloroplasts.\ \ 39 IPR002502 \ This family includes zinc amidases that have N-acetylmuramoyl-L-alanine amidase activity. This enzyme domain cleaves the amide bond between N-acetylmuramoyl and L-amino acids in bacterial cell walls (preferentially D-lactyl-L-Ala). The structure of the bacteriophage T7 enzyme is known and shows that two of the conserved histidines bind zinc.\ 3418 IPR002844 \ This archaeal enzyme family is involved in the formation of methane from carbon dioxide. The enzyme requires coenzyme F420 PUBMED:7852356.\ 1287 IPR008218 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase () is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are translocated (the F0 complex) and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (the F1-ATPase). The complex uses the transmembrane proton motive force generated by photosynthetic or respiratory electron transport to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder formed by the alpha3beta3 subunits, driven by a cycle of ATP binding and hydrolysis PUBMED:11309608.

    \ \ \ \ \ \

    This family also includes the 14 kDa subunit of V-ATPases PUBMED:8682310 and the F subunit of the archaebacterial H+-transporting two-sector ATPase PUBMED:8702544.

    \ 7202 IPR010861 \

    This family consists of several hypothetical, highly conserved Streptococcal and related phage proteins of around 100 residues in length. The function of this family is unknown.

    \ 311 IPR006149 \

    The EB domain has no known function. It is found in several Caenorhabditis elegans proteins. The domain contains eight conserved cysteines that probably form four disulphide bridges, and it is found in association with Kunitz domains.

    \ 5540 IPR008676 \ This family consists of three different eukaryotic proteins: mortality factor 4 (MORF4/MRG15), male-specific lethal 3 (MSL-3) and ESA1-associated factor 3 (EAF3). The MRG family is thought to be involved in transcriptional regulation via histone acetylation PUBMED:11290425, PUBMED:11036083.\ 2234 IPR002881 \

    This domain is found in a family of prokaryotic proteins that have no known function. Proteins belonging to this family include hypothetical proteins from eubacteria and archaebacteria. Some of these proteins also contain the von Willebrand factor type A domain (see ).

    \ 7445 IPR011473 \

    This is a family of paralogous hypothetical proteins identified in Rhodopirellula baltica that also has members in Gloeobacter violaceus, Sinorhizobium meliloti and Agrobacterium tumefaciens.

    \ 6851 IPR009749 \

    This family consists of several bacterial proteins of around 90 residues in length. The function of this family is unknown.

    \ 3696 IPR004569 \ PdxJ is required for the biosynthesis of pyridoxine (vitamin B6), a precursor of the enzyme cofactor pyridoxal phosphate. PdxJ catalyses the condensation of 1-amino-3-oxo-4-(phosphohydroxy)propan-2-one and 1-deoxy-D-xylulose-5-phosphate to form pyridoxine 5'-phosphate. The product of this reaction is oxidized by PdxH to pyridoxal 5'-phosphate.\ 3673 IPR001257 \

    Parvoviruses encode two non-capsid (non-structural) proteins, NS1 and NS2. NS1 is essential for viral DNA replication PUBMED:8372437. These proteins contain the ATP/GTP-binding site motif A (P-loop).

    \ 5446 IPR008503 \ This family consists of several hypothetical proteins from different archaeal and bacterial species.\ 5430 IPR008429 \ This family consists of several eukaryotic cleft lip and palate transmembrane protein 1 (CLPTM1) sequences. Cleft lip with or without cleft palate is a common, genetically complex birth defect. The nonsyndromic forms have been studied genetically using linkage and candidate-gene association studies, with only partial success in defining the loci responsible for orofacial clefting. CLPTM1 encodes a transmembrane protein and shows strong similarity to two Caenorhabditis elegans genes, suggesting that CLPTM1 may belong to a new gene family PUBMED:9828125. This family also contains the Homo sapiens cisplatin resistance-related protein CRR9p, which is associated with cisplatin (CDDP)-induced apoptosis PUBMED:11162647.\ 4667 IPR002517 \ The tospovirus genome consists of three linear ssRNA segments, denoted L, M and S, which are complexed with the nucleocapsid protein. The S RNA encodes the nucleocapsid protein and a non-structural protein PUBMED:8429298.\ 4870 IPR003226 \ The function of this domain is not known, but it is found in several uncharacterised proteins and in a probable metal-dependent protein hydrolase.\ 2549 IPR001635 \

    During flagellar morphogenesis in Salmonella typhimurium and Escherichia coli, the fliK gene product is responsible for hook length control PUBMED:8631687. The deduced FliK proteins from S. typhimurium and E. coli have molecular masses of 41,748 and 39,246 Da, respectively, and are fairly hydrophilic PUBMED:8631687. Sequence comparison reveals around 50% identity, with the greatest conservation in the C-terminal region (71% identity over the last 154 amino acids). Mutagenesis of this conserved region completely abolishes motility. The central and C-terminal regions are rich in proline and glutamine, respectively, and are thought to constitute distinct domains separated by a linker region PUBMED:8631687.

    \

    FliK is considered unlikely to function as a molecular ruler for determining hook length. It is more likely to employ a novel mechanism PUBMED:8631687.

    \ 6340 IPR010526 \

    Members of this entry contain a region found exclusively in eukaryotic sodium channels or their subunits, many of which are voltage-gated. Members very often also contain between one and four copies of and, less often, one copy of .

    \ 6725 IPR009200 \ There are currently no experimental data for members of this group or their homologues. However, these proteins are predicted to contain two or more transmembrane segments.\ 6314 IPR006542 \

    This is a family of small (about 115 amino acids) uncharacterized proteins with N-terminal signal sequences, found exclusively in Gram-positive organisms. Most genomes that have any members of this family have at least two.

    \ 2781 IPR004276 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification into distinct sequence-based families of glycosyltransferases that use nucleotide diphospho-sugars, nucleotide monophospho-sugars and sugar phosphates (), together with related proteins, has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 28 comprises enzymes with a number of known activities: 1,2-diacylglycerol 3-beta-galactosyltransferase (); 1,2-diacylglycerol 3-beta-glucosyltransferase (); beta-N-acetylglucosamine transferase ().

    \ 2113 IPR007398 \ This is a family of uncharacterised proteins.\ 5344 IPR008476 \ This is a family of eukaryotic proteins with undetermined function.\ 1036 IPR000603 \ The 3A protein is found in bromoviruses and cucumoviruses, whose genomes are divided into three RNA segments. The third segment (RNA 3) encodes two proteins, the coat protein and the 3A protein. The precise function of the 3A protein is uncertain, but it has been shown to be involved in movement of the virus from initially infected cells to adjacent cells PUBMED:9356336.\ 6608 IPR009618 \

    This entry represents the C terminus of bacterial Erp proteins that seem to be specific to Borrelia burgdorferi (a causative agent of Lyme disease). Borrelia Erp proteins are particularly heterogeneous, which might enable them to interact with a wide variety of host components PUBMED:12616490.

    \ 5206 IPR008040 \

    This domain is found at the N terminus of the hydantoinase/oxoprolinase family.

    \ 1900 IPR003770 \

    This family contains several aminodeoxychorismate lyase proteins. Aminodeoxychorismate lyase is a pyridoxal 5'-phosphate-dependent enzyme that converts 4-aminodeoxychorismate to pyruvate and p-aminobenzoate, a precursor of folic acid in bacteria PUBMED:11011151.

    \ \ 7317 IPR003536 \ Secretion of virulence factors in Gram-negative bacteria involves transport of the protein across two membranes to reach the cell exterior. Four secretion systems have been described in animal enteropathogens such as Salmonella and Yersinia, with further sequence similarities in plant pathogens such as Ralstonia and Erwinia PUBMED:9618447.\ \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is triggered only when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those involved in bacterial flagellar biosynthesis. However, whereas the flagellar subunits form a ring structure that allows secretion of flagellin and is an integral part of the flagellum itself PUBMED:9618447, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.

    \ \

    Exotoxins secreted by the type III system do not possess a classical secretion signal and are considered unique for this reason PUBMED:9618447. Enteropathogenic and enterohaemorrhagic Escherichia coli produce the bacterial adhesion molecule intimin PUBMED:10835344, which targets the translocated intimin receptor, Tir. Tir is secreted by the bacteria and becomes embedded in the target cell's plasma membrane PUBMED:10835344. This facilitates bacterial attachment to the host cell.

    \ 2773 IPR002037 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because protein folds are better conserved than sequences, some of the families can be grouped into 'clans'.

    \

    Glycoside hydrolase family 8 comprises enzymes with several known activities: endoglucanase (); lichenase (); chitosanase (). These enzymes were formerly known as cellulase family D PUBMED:2806912.

    \ 3354 IPR006777 \

    Bacteriophage PhiX174 is one of the simplest viruses, with a single-stranded, closed circular DNA genome of 5386 nucleotides and four capsid proteins, J, F, G and H. A single molecule of H protein is found on each of the 12 spikes on the microvirus shell. H is involved in the ejection of the phage DNA, and at least one copy is injected into the host's periplasmic space along with the ssDNA viral genome PUBMED:8158636. Part of H is thought to lie outside the shell, where it recognises lipopolysaccharide from virus-sensitive bacterial strains PUBMED:10225278. Part of H may lie within the capsid, since mutations in H can influence the DNA ejection mechanism by affecting DNA-protein interactions PUBMED:8433365. H may span the capsid through the hydrophilic channels formed by G proteins PUBMED:8158636.

    \ 37 IPR006048 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because protein folds are better conserved than sequences, some of the families can be grouped into 'clans'.

    \

    Alpha-amylase is classified as family 13 of the glycosyl hydrolases. The structure is an eight-stranded alpha/beta barrel containing the active site. The barrel is interrupted by a calcium-binding domain of about 70 amino acids that protrudes between beta-strand 3 and alpha-helix 3, and it is followed by a C-terminal Greek key beta-barrel domain.

    \ 6773 IPR010710 \

    This family consists of a number of hypothetical bacterial proteins. The aligned region spans around 56 residues and contains 4 highly conserved cysteine residues towards the N terminus. The function of this family is unknown.

    \ 6353 IPR010532 \

    This family consists of several archaeal sulfocyanin (or blue copper protein) sequences from a number of Sulfolobus species.

    \ 5169 IPR008006 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys. Catalysis also requires at least one other residue, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
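    The following minimal Python sketch scans a protein sequence for the HEXXH motif and for the stricter abXHEbbHbc pattern described above. Several choices are illustrative simplifications, not an official definition: the residue sets used for 'b' (uncharged) and 'c' (hydrophobic), the restriction of 'a' to Val/Thr (the text says only "most often"), and the exclusion of Pro from every position. The name `find_motifs` and the toy sequence are hypothetical.

    ```python
    import re

    # HEXXH: His, Glu, two variable residues, His. Pro is excluded from the
    # variable positions because, per the text, it is never found in this site.
    HEXXH = re.compile(r"(?=(HE[^P]{2}H))")

    # Assumed residue classes (illustrative only):
    UNCHARGED = "ACFGILMNQSTVWY"   # 'b': everything except D, E, K, R, H and P
    HYDROPHOBIC = "AFILMVWY"       # 'c': a simple hydrophobic set, no P

    # abXHEbbHbc with 'a' restricted to its most common residues, V or T.
    STRICT = re.compile(
        rf"(?=([VT][{UNCHARGED}][^P]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
    )


    def find_motifs(seq, pattern):
        """Return (0-based start, matched text) for every, possibly overlapping, match."""
        return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
    ```

    For example, `find_motifs("MKTVAAHEITHAVGL", STRICT)` returns `[(3, "VAAHEITHAV")]`. The zero-width lookahead lets overlapping candidate sites be reported. A motif hit alone does not show that a protein is a metalloprotease.
    
    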

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
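    The naming convention above can be expressed as a small lookup. This is a hypothetical Python sketch: the helpers `catalytic_type` and `nucleophile` are illustrative and not part of any MEROPS software. They encode only the rules stated in the paragraph: the first character of an identifier gives the catalytic type, and P marks clans of mixed type. Serine, threonine and cysteine peptidases use an amino acid as the nucleophile, while aspartic, glutamic and metallopeptidases use an activated water molecule.

    ```python
    # Catalytic types keyed by the first character of a MEROPS family or clan identifier.
    CATALYTIC_TYPES = {
        "A": "aspartic",
        "C": "cysteine",
        "G": "glutamic acid",
        "M": "metallo",
        "S": "serine",
        "T": "threonine",
        "U": "unknown",
        "P": "mixed",  # clans containing families of more than one catalytic type
    }

    # Types whose nucleophile is part of an amino acid (an acyl intermediate is formed).
    AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
    # Types whose nucleophile is an activated water molecule.
    WATER_NUCLEOPHILE = {"A", "G", "M"}


    def _type_code(identifier: str) -> str:
        """Return the upper-cased first character of a family/clan identifier."""
        identifier = identifier.strip()
        if not identifier:
            raise ValueError("empty identifier")
        return identifier[0].upper()


    def catalytic_type(identifier: str) -> str:
        """Catalytic type for an identifier such as 'M26' or 'MA(E)'."""
        return CATALYTIC_TYPES.get(_type_code(identifier), "unrecognised")


    def nucleophile(identifier: str) -> str:
        """Nature of the nucleophile implied by the catalytic type."""
        code = _type_code(identifier)
        if code in AMINO_ACID_NUCLEOPHILE:
            return "amino acid residue"
        if code in WATER_NUCLEOPHILE:
            return "activated water"
        return "unknown"
    ```

    For example, `catalytic_type("M26")` and `catalytic_type("MA(E)")` both return `"metallo"`, and `nucleophile("S1")` returns `"amino acid residue"`.
    
    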

    \ \

    This group of metallopeptidases corresponds to MEROPS peptidase family M26 (clan MA(E)). The active site residues for members of this family and family M4 occur in the motif HEXXH. The type example is IgA1-specific metalloendopeptidase from Streptococcus sanguinis ().

    \ 379 IPR000487 \ Flaviviruses encode a single polyprotein, which is cleaved into three structural and seven non-structural proteins. All but two of these are cleaved by the NS2B-NS3 protease complex PUBMED:9499070, PUBMED:7884844.\ 1710 IPR002322 \ Cytochromes c (cytC) can be defined as electron-transfer proteins having one or several haem c groups, bound to the protein by one or, more generally, two thioether bonds involving sulphydryl groups of cysteine residues. The fifth haem iron ligand is always provided by a histidine residue. CytC possess a wide range of properties and function in a large number of different redox processes PUBMED:. \

    Ambler PUBMED:1646017 recognised four classes of cytC.

    \

    Class III comprises the low-redox-potential multiple haem cytochromes: cyt C7 (trihaem), C3 (tetrahaem) and high-molecular-weight cytC, HMC (hexadecahaem), with only 30-40 residues per haem group. The haem c groups, all bis-histidinyl coordinated, are structurally and functionally nonequivalent and present different redox potentials in the range 0 to -400 mV PUBMED:7830606. The 3D structures of a number of cyt C3 proteins have been determined. The proteins consist of 4-5 alpha-helices and 2 beta-strands wrapped around a compact core of four non-parallel haems, which present a relatively high degree of exposure to the solvent. The overall protein architecture, haem plane orientations and iron-iron distances are highly conserved PUBMED:7830606.

    \ 7135 IPR009920 \

    This family contains subunit 1 (approximately 230 residues long) of bacterial heptaprenyl diphosphate synthase (HEPPP synthase) (). The enzyme consists of two subunits, both of which are required for catalysis of heptaprenyl diphosphate synthesis PUBMED:9748348.

    \ 4629 IPR007292 \

    Nuclear fusion protein tht1 is an integral membrane protein that mutational studies have shown to be required for the fusion of nuclear envelopes during karyogamy PUBMED:9442101.

    \ 242 IPR013053 \

    This family contains the juvenile hormone binding protein of the tobacco hawkmoth (Manduca sexta) PUBMED:8016136, as well as a number of Drosophila proteins of unknown function. Juvenile hormone exerts pleiotropic functions during insect life cycles, and its binding proteins regulate these functions.

    Based on their similarity to the hormone-binding protein, it has been suggested that members of this family are odorant-binding proteins.

    \ 1709 IPR002321 \

    Cytochromes c (cytC) can be defined as electron-transfer proteins having one or several haem c groups, bound to the protein by one or, more generally, two thioether bonds involving sulphydryl groups of cysteine residues. The fifth haem iron ligand is always provided by a histidine residue. CytC possess a wide range of properties and function in a large number of different redox processes PUBMED:. Ambler PUBMED:1646017 recognised four classes of cytC.

    \ \

    Class II includes the high-spin cytC' and a number of low-spin cytochromes, e.g. cyt c-556. The haem-attachment site is close to the C-terminus. The cytC' are capable of binding such ligands as CO, NO or CN(-), albeit with rate and equilibrium constants 100 to 1,000,000-fold smaller than those of other high-spin haemoproteins PUBMED:1646027. This, coupled with its relatively low redox potential, makes it unlikely that cytC' is a terminal oxidase. Thus cytC' probably functions as an electron transfer protein PUBMED:1646016.

    \

    The 3D structures of a number of cytC' have been determined. The molecule usually exists as a dimer, each monomer folding as a four-alpha-helix bundle incorporating a covalently bound haem group at the core PUBMED:1646016. The Chromatium vinosum cytC' exhibits dimer dissociation upon ligand binding PUBMED:8230224.

    \ 3964 IPR005009 \ Vaccinia virus, the prototypic poxvirus, possesses a double-stranded DNA genome of 191,686 base pairs capable of encoding approximately 200 proteins. Virion enzymes produce mature viral mRNA with eukaryotic features, including a 5' cap and a 3' poly(A) tail. Vaccinia virus mRNA capping enzyme is a multifunctional protein with RNA triphosphatase, RNA guanylyltransferase, RNA (guanine-7) methyltransferase and transcription termination factor activities. The protein is a heterodimer of 95- and 33-kDa subunits encoded by the vaccinia virus D1 and D12 genes, respectively. The capping reaction entails transfer of GMP from GTP to the 5'-diphosphate end of mRNA via a covalent enzyme-(lysyl-GMP) intermediate.\ 6554 IPR009598 \

    This family consists of a series of short proteins of around 90 residues in length. The human protein, BC10, has been implicated in bladder cancer. Transcription of its gene is almost completely abolished in highly invasive transitional cell carcinomas (TCCs) PUBMED:11920613. The function of this family is unknown.

    \ 2435 IPR001986 \

    EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) () catalyzes the sixth step of the shikimate pathway, which leads to chorismate, the precursor of the aromatic amino acids. The enzyme is found in bacteria (gene aroA), plants and fungi. In fungi it is part of a multifunctional enzyme that catalyzes five consecutive steps in this pathway PUBMED:11607190. EPSP synthase has been extensively studied because it is the target of the potent herbicide glyphosate, which inhibits the enzyme.

    \

    EPSP synthase sequences from various biological sources show that the structure of the enzyme has been well conserved throughout evolution. Two strongly conserved regions are well defined. The first is part of the active site and is also important for resistance to glyphosate PUBMED:1939260. The second is located in the C-terminal part of the protein and contains a conserved lysine that seems to be important for the activity of the enzyme.

    \ 2853 IPR004105 \

    This helical bundle domain is the homodimer interface of the signal transducing histidine kinase family PUBMED:9989504.

    \ 228 IPR000340 \

    Ser/Thr and Tyr dual specificity phosphatases are a group of enzymes with both Ser/Thr () and tyrosine-specific protein phosphatase () activity. They can remove phosphate groups bound to serine/threonine or tyrosine residues from a wide range of phosphoproteins, including a number of enzymes that have been phosphorylated by a kinase.\ \ Dual specificity protein phosphatases (DSPs) regulate mitogenic signal transduction and control the cell cycle. The crystal structure of a human DSP, vaccinia H1-related phosphatase (VHR), has been determined at 2.1 angstrom resolution PUBMED:8650541. A shallow active site pocket in VHR allows for the hydrolysis of phosphorylated serine, threonine or tyrosine protein residues, whereas the deeper active site of protein tyrosine phosphatases (PTPs) restricts substrate specificity to phosphotyrosine only. Positively charged crevices near the active site may explain the enzyme's preference for substrates with two phosphorylated residues. The VHR structure defines a conserved structural scaffold for both DSPs and PTPs. A "recognition region" connecting helix alpha1 to strand beta1 may determine differences in substrate specificity between VHR, the PTPs and other DSPs.

    \

    These proteins may also have inactive phosphatase domains, and depending on the domain composition, this loss of catalytic activity has different effects on protein function. Inactive single-domain phosphatases can still bind substrates specifically and protect them against dephosphorylation. The inactive domains of tandem phosphatases can be further subdivided into two classes. Those that bind phosphorylated tyrosine residues may recruit multi-phosphorylated substrates for the adjacent active domains and are more conserved. The other class has accumulated several variable amino acid substitutions and has completely lost tyrosine-binding capability. This second class shows a release of evolutionary constraint on the sites around the catalytic centre, which emphasises a difference in function from the first group. A region of higher conservation common to both classes suggests a new regulatory centre PUBMED:14739250.

    \ \ \ \ 5601 IPR006575 \

    The RWD domain is a eukaryotic domain found in RING finger- () and WD repeat- () containing proteins and in a subfamily of DEXDc-like helicases (). It is related to the ubiquitin-conjugating enzyme domain ().

    \ 8136 IPR012387 \

    This group represents a tRNA ligase, yeast type. Please see the following relevant references: PUBMED:12466548, PUBMED:1922054.

    \ 4827 IPR003846 \

    This entry describes proteins of unknown function.

    \ 6711 IPR009674 \

    This domain is found between domain 3 and domain 5, but shows no homology to domain 4 of Rpb2. The external domains in multisubunit RNA polymerase (those most distant from the active site) are known to demonstrate more sequence variability PUBMED:11313498.

    \ 5876 IPR009270 \

    This is a family of bacterial proteins of unknown function.

    \ 309 IPR006887 \ This is a conserved region which characterizes a number of eukaryotic proteins of unknown function.\ 7059 IPR009869 \

    This entry represents the N terminus (approximately 180 residues) of plant Hs1pro-1, which is believed to confer resistance to nematodes PUBMED:12669798.

    \ 4637 IPR002919 \

    This domain is found in proteinase inhibitors as well as in many extracellular proteins. The domain typically contains ten cysteine residues that form five disulphide bonds, linking cysteines 1-7, 2-6, 3-5, 4-10 and 8-9.
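    Given the connectivity stated above, the bonds can be mapped from cysteine ordinals onto residue positions in a domain sequence. This is a minimal Python sketch. The function name `disulphide_pairs` is hypothetical. The sketch assumes the input is exactly one domain containing exactly ten cysteines, and the repetitive test sequence is synthetic rather than a real inhibitor.

    ```python
    # Disulphide connectivity for this domain, as 1-based cysteine ordinals.
    PAIRING = [(1, 7), (2, 6), (3, 5), (4, 10), (8, 9)]


    def disulphide_pairs(seq):
        """Map the cysteine connectivity onto 1-based residue positions in seq."""
        cys = [i for i, aa in enumerate(seq.upper(), start=1) if aa == "C"]
        if len(cys) != 10:
            raise ValueError(f"expected 10 cysteines, found {len(cys)}")
        # Ordinal k refers to the k-th cysteine, i.e. cys[k - 1].
        return [(cys[a - 1], cys[b - 1]) for a, b in PAIRING]
    ```

    For the synthetic sequence `"CA" * 10`, the call returns `[(1, 13), (3, 11), (5, 9), (7, 19), (15, 17)]`. Real sequences with extra or missing cysteines would first need an alignment step, which this sketch does not attempt.
    
    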

    \ \

    This inhibitor domain belongs to MEROPS inhibitor family I8 (clan IA). Proteins containing this domain inhibit peptidases belonging to families S1 (), S8 () and M4 () PUBMED:14705960, and are restricted to the Chordata, Nematoda, Arthropoda and Echinodermata. Examples of proteins containing this domain are:

    \ \ \ \ 1248 IPR003412 \ This is a family of arterivirus structural glycoproteins encoded by open reading frame 4 (ORF4) of the virus.\ 2262 IPR006852 \ This is a family of uncharacterised proteins.\ 2441 IPR005140 \

    This domain is found in the release factor eRF1, which terminates protein biosynthesis by recognizing stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase center. The crystal structure of human eRF1 is known PUBMED:10676813. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase center. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site PUBMED:10676813.

    \ \

    This domain is also found in other proteins whose precise molecular function is unknown, many of them from Archaebacteria. These proteins may also be involved in translation termination, but this awaits experimental verification.

    \ 2483 IPR006793 \ This family represents a number of fimbrial protein transcription regulators found in Gram-negative bacteria. These proteins are thought to facilitate binding of the leucine-responsive regulatory protein to regulatory elements, possibly by inhibiting deoxyadenosine methylation of these elements by deoxyadenosine methylase PUBMED:7476191, PUBMED:8846772.\ 860 IPR011996 \

    Potassium channels are the most diverse group of the ion channel family PUBMED:1772658, PUBMED:1879548. They are important in shaping the action potential, and in neuronal excitability and plasticity PUBMED:2451788. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups PUBMED:2555158: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    \

    These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on its type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; and others are regulated by GTP-binding proteins or other second messengers PUBMED:2448635. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes PUBMED:1373731. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis PUBMED:11178249.

    \

    All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/S)xxTxGxG, which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how the two-P-domain subunits assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned to one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK) PUBMED:11178249, PUBMED:. The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha subunits that possess two P-domains. These are usually highly regulated K+-selective leak channels.
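
The selectivity sequence (T/S)xxTxGxG can be written as a regular expression in which x stands for any residue. Below is a minimal Python sketch. The helper name and the short test fragment are hypothetical. A real motif scan would normally rely on curated profiles rather than a single pattern.

```python
import re

# (T/S)xxTxGxG: T or S, any two residues, T, any residue, G, any residue, G.
# The lookahead lets overlapping occurrences be reported separately.
SELECTIVITY_SEQUENCE = re.compile(r"(?=([TS]..T.G.G))")


def find_selectivity_sequences(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in SELECTIVITY_SEQUENCE.finditer(protein.upper())
    ]


print(find_selectivity_sequences("ETATTVGYGD"))  # [(2, 'TATTVGYG')]
```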

    \

    Ca2+-activated K+ channels are a diverse group of channels that are activated by an increase in intracellular Ca2+ concentration. They are found in the majority of nerve cells, where they modulate cell excitability and the action potential. Three types of Ca2+-activated K+ channel have been characterised: small-conductance (SK), intermediate-conductance (IK) and large-conductance (BK) channels PUBMED:9687354.

    \ \

    SK channels are thought to play an important role in the functioning of all excitable tissues. To date, 3 subtypes (designated SK1-SK3) have been cloned, each of which possesses a different tissue expression profile: SK1 channels are expressed in the heart; SK2 channels are found in the adrenal gland; and SK3 channels are known to be present in skeletal muscle.

    \ \

    This entry represents a conserved region found in proteins of the SK channel family.

    \ 5067 IPR007904 \

    This domain is found at the C terminus of the apolipoprotein B mRNA editing enzyme Apobec-1, which catalyzes C to U editing of apolipoprotein B (apoB) mRNA in the mammalian intestine. This editing is a site-specific post-transcriptional modification in which a single cytidine is enzymatically deaminated to uridine, thereby generating a UAA stop codon in the edited mRNA. The function of this domain is currently unknown.

    \ 5136 IPR007973 \

    This family consists of several bacterial sex pilus assembly and synthesis proteins (TraE). Conjugal transfer of plasmids from donor to recipient cells is a complex process in which cell-to-cell contact plays a key role. Many genes encoded by self-transmissible plasmids are required for various processes of conjugation, including pilus formation, stabilisation of mating pairs, conjugative DNA metabolism, surface exclusion and regulation of transfer gene expression PUBMED:10760136. The exact function of the TraE protein is unknown.

    \ 3016 IPR005697 \

    This family of enzymes, homoserine O-succinyltransferase, catalyses the first step in the biosynthesis of methionine.

    \ 2261 IPR006839 \ This family of bacterial proteins has no known function.\ 1979 IPR002636 \ This family consists of various hypothetical proteins from cyanobacteria, none of which are functionally described. The aligned region is approximately 120-140 amino acids long, corresponding to almost the entire length of the proteins in the family.\ 4941 IPR007609 \

    This family represents the 18 kDa cysteine-rich protein from positive-strand ssRNA viruses.

    \ 1268 IPR007472 \

    This entry represents the C-terminal region of the enzyme arginine-tRNA-protein transferase, which catalyses the post-translational conjugation of arginine to the N terminus of a protein. In eukaryotes, the enzyme functions as part of the N-end rule pathway of protein degradation by conjugating a destabilising amino acid to the N-terminal aspartate or glutamate of a protein, targeting the protein for ubiquitin-dependent proteolysis. N-terminal cysteine is sometimes modified PUBMED:9858543. The N-terminal region is represented by .

    \ 2394 IPR001253 \

    Eukaryotic translation initiation factor 1A (eIF-1A) (formerly known as eIF-4C) is a protein that seems to be required for the maximal rate of protein biosynthesis. It enhances ribosome dissociation into subunits and stabilizes the binding of the initiator Met-tRNA to 40S ribosomal subunits PUBMED:7559407. Archaebacteria also seem to possess an eIF-1A homolog.

    \ 4775 IPR000127 \

    The post-translational attachment of ubiquitin to proteins (ubiquitinylation) alters the function, location or trafficking of a protein, or targets it to the 26S proteasome for degradation PUBMED:15556404, PUBMED:15196553, PUBMED:15454246. Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2) and a ubiquitin ligase (E3), which work sequentially in a cascade PUBMED:14998368. The E1 enzyme is responsible for activating ubiquitin, the first step in ubiquitinylation. The E1 enzyme hydrolyses ATP and adenylates the C-terminal glycine residue of ubiquitin, and then links this residue to the active site cysteine of E1, yielding a ubiquitin-thioester and free AMP. To be fully active, E1 must non-covalently bind to and adenylate a second ubiquitin molecule. The E1 enzyme can then transfer the thioester-linked ubiquitin molecule to a cysteine residue on the ubiquitin-conjugating enzyme, E2, in an ATP-dependent reaction.

    \

    This domain is found twice in each member of the ubiquitin-activating enzyme family and is located downstream of the active site cysteine PUBMED:1634524.

    \ 6284 IPR010504 \

    Arfaptin interacts with ARF1, a small GTPase involved in vesicle budding at the Golgi complex and immature secretory granules. The structure of arfaptin shows that upon binding to a small GTPase, arfaptin forms an elongated, crescent-shaped dimer of three-helix coiled-coils PUBMED:11346801. The N-terminal region of ICA69 is similar to arfaptin PUBMED:12682071.

    \ 6815 IPR010731 \

    This group of proteins consists of several highly conserved Orthopoxvirus proteins known as the C5L protein in Variola virus. The function of these proteins is unknown.

    \ 5349 IPR008767 \

    This family describes proteins found in bacteriophage and in bacterial prophage regions. The function of these proteins is not known.

    \ 5119 IPR007956 \

    This family consists of several eukaryotic malonyl-CoA decarboxylase (MLYCD) proteins. Malonyl-CoA, in addition to being an intermediate in the de novo synthesis of fatty acids, is an inhibitor of carnitine palmitoyltransferase I, the enzyme that regulates the transfer of long-chain fatty acyl-CoA into mitochondria, where it is oxidised. After exercise, malonyl-CoA decarboxylase participates with acetyl-CoA carboxylase in regulating the concentration of malonyl-CoA in liver and adipose tissue, as well as in muscle. Malonyl-CoA decarboxylase is regulated by AMP-activated protein kinase (AMPK) PUBMED:12065578.

    \ 1675 IPR000269 \

    Amine oxidases (AO) are enzymes that catalyze the oxidation of a wide range of biogenic amines, including many neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing and copper-containing. Copper-containing AOs act as disulphide-linked homodimers. They catalyse the oxidation of primary amines to aldehydes, with the subsequent release of ammonia and hydrogen peroxide (RCH2NH2 + O2 + H2O = RCHO + NH3 + H2O2). The reaction requires one copper ion per subunit and topaquinone as cofactor PUBMED:8591028. Copper-containing amine oxidases are found in bacteria, fungi, plants and animals. In prokaryotes, the enzyme enables various amine substrates to be used as sources of carbon and nitrogen PUBMED:9048544, PUBMED:9405045. In eukaryotes they have a broader range of functions, including cell differentiation and growth, wound healing, detoxification and cell signalling PUBMED:8805580.
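
A few lines of Python can check the copper amine oxidase reaction, RCH2NH2 + O2 + H2O -> RCHO + NH3 + H2O2, for atom balance. This is an illustrative sketch. Methylamine (R = H, so RCH2NH2 is CH3NH2 and RCHO is formaldehyde) is an assumed example substrate. The simple formula parser handles only formulas without brackets or charges.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a bracket-free formula such as 'CH5N' or 'H2O2'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(species: list[str]) -> Counter:
    """Sum the atom counts of all species on one side of a reaction."""
    total = Counter()
    for formula in species:
        total.update(atom_counts(formula))
    return total


def is_balanced(lhs: list[str], rhs: list[str]) -> bool:
    return side_total(lhs) == side_total(rhs)


# Methylamine (CH3NH2, written CH5N) + O2 + H2O -> formaldehyde + NH3 + H2O2.
substrates = ["CH5N", "O2", "H2O"]
products = ["CH2O", "NH3", "H2O2"]
print(is_balanced(substrates, products))  # True
```

Removing water from the substrate side leaves the equation unbalanced. This illustrates why water appears as a reactant in the overall reaction.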

    \

    The copper amine oxidases occur as mushroom-shaped homodimers of 70-95 kDa, each monomer containing a copper ion and a covalently bound redox cofactor, topaquinone (TPQ). TPQ is formed by post-translational modification of a conserved tyrosine residue. The copper ion is coordinated with three histidine residues and two water molecules in a distorted square pyramidal geometry, and has a dual function in catalysis and TPQ biogenesis. The catalytic domain is the largest of the 3-4 domains found in copper amine oxidases, and consists of a beta sandwich of 18 strands in two sheets. The active site is buried and requires a conformational change to allow the substrate access.

    \ 6267 IPR010496 \

    This is a family of proteins of unknown function.

    \ 5697 IPR008593 \ This family consists of several bacterial and phage DNA N6-adenine methyltransferase (Dam)-like sequences PUBMED:2180941.\ 1110 IPR007615 \ This is a conserved region found in the Adenovirus E4 34 kDa protein.\ 224 IPR002469 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
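
A small lookup can illustrate the link between residue order and clan. This Python sketch is a heuristic only, because the text says that linear order commonly (not always) reflects clan relationships. The residue numbers in the usage lines are supplied here as examples: chymotrypsin His57/Asp102/Ser195 and subtilisin BPN' Asp32/His64/Ser221, in standard numbering. The function names are hypothetical.

```python
# Linear order of the catalytic triad in three serine peptidase clans,
# as given in the text (H = histidine, D = aspartate, S = serine).
TRIAD_ORDER_TO_CLAN = {
    "HDS": "SA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase clan)",
}


def triad_order(positions: dict[str, int]) -> str:
    """Sort the triad residues by sequence position and join their codes."""
    return "".join(sorted(positions, key=positions.__getitem__))


def likely_clan(positions: dict[str, int]) -> str:
    """Heuristic clan guess from residue order; not a substitute for structure."""
    return TRIAD_ORDER_TO_CLAN.get(triad_order(positions), "no match")


print(likely_clan({"H": 57, "D": 102, "S": 195}))  # SA (chymotrypsin clan)
print(likely_clan({"D": 32, "H": 64, "S": 221}))   # SB (subtilisin clan)
```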

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
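
The naming scheme above can be expressed as a small decoder. This is a Python sketch that encodes only what the paragraph states: the catalytic type letters, the residue-versus-water nucleophile split, and the type P rule for mixed clans. The function names and the family lists in the usage lines are hypothetical examples, not MEROPS data.

```python
# First character of a family identifier -> catalytic type (from the text).
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}
# Types whose nucleophile is part of an amino acid (an acyl intermediate forms).
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(family_id: str) -> str:
    """Return the catalytic type encoded by an identifier such as 'S9'."""
    code = family_id[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type in {family_id!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(family_id: str) -> str:
    """Describe the nucleophile for the family's catalytic type."""
    code = family_id[:1].upper()
    if code in RESIDUE_NUCLEOPHILE:
        return "amino acid residue"
    if code in WATER_NUCLEOPHILE:
        return "activated water"
    return "not specified"


def clan_type(family_ids: list[str]) -> str:
    """A clan whose families span more than one catalytic type is of type P."""
    codes = {f[:1].upper() for f in family_ids}
    if not codes:
        raise ValueError("a clan needs at least one family")
    return codes.pop() if len(codes) == 1 else "P"


print(catalytic_type("S9"), "/", nucleophile("S9"))  # serine / amino acid residue
print(clan_type(["C3", "S1"]))  # P (hypothetical mixed clan)
```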

    \ \

    This domain defines serine peptidases belonging to MEROPS peptidase family S9 (clan SC), subfamily S9B (dipeptidyl-peptidase IV). The protein fold of the peptidase domain for members of this family resembles that of serine carboxypeptidase D, the type example of clan SC. This domain is an alignment of the region to the N-terminal side of the active site, which is found in .

    \ \ \

    CD26 is also called adenosine deaminase-binding protein (ADA-binding protein) or dipeptidyl peptidase IV (DPP IV ectoenzyme). The exopeptidase cleaves N-terminal X-Pro or X-Ala dipeptides from polypeptides (dipeptidyl peptidase IV activity). CD26 serves as a costimulatory molecule in T cell activation and is an associated marker of autoimmune diseases, adenosine deaminase deficiency and HIV pathogenesis.

    \ \

    Dipeptidyl peptidase IV (DPP IV) is responsible for the sequential removal of N-terminal dipeptides from polypeptides having unsubstituted N termini, provided that the penultimate residue is proline. The enzyme catalyses the reaction: release of an N-terminal dipeptide, Xaa-Yaa-|-Zaa-, from a polypeptide, preferentially when Yaa is Pro, provided Zaa is neither Pro nor hydroxyproline. It is a type II membrane protein that forms a homodimer.
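
The sequential dipeptide removal described above can be sketched as a toy digestion in Python. Several assumptions apply. Cleavage is allowed while the penultimate residue is Pro, or optionally Ala, following the X-Ala activity mentioned for CD26. The enzyme nomenclature definition of the reaction excludes cleavage when the next residue is Pro, and the sketch applies that rule too. Hydroxyproline and kinetics are not modelled. The function name dpp4_digest is hypothetical.

```python
def dpp4_digest(peptide: str, allow_ala: bool = False) -> tuple[list[str], str]:
    """Toy model of sequential N-terminal dipeptide release by DPP IV.

    Xaa-Yaa is cleaved off while Yaa is Pro (or Ala when allow_ala is True)
    and the following residue Zaa exists and is not Pro.
    Returns (released dipeptides, remaining peptide).
    """
    accepted_penultimate = {"P", "A"} if allow_ala else {"P"}
    released = []
    seq = peptide.upper()
    # At least three residues are needed: Xaa, Yaa and the scissile partner Zaa.
    while len(seq) >= 3 and seq[1] in accepted_penultimate and seq[2] != "P":
        released.append(seq[:2])
        seq = seq[2:]
    return released, seq


print(dpp4_digest("APGPRA"))  # (['AP', 'GP'], 'RA')
print(dpp4_digest("APPG"))    # ([], 'APPG')
```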

    \ \

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ 5629 IPR008558 \ This family consists of several Lagovirus sequences of unknown function, largely from Oryctolagus cuniculus hemorrhagic disease virus.\ 4731 IPR002300 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet formation, flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
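
The class assignments listed above can be captured as a simple lookup. This is a Python sketch that mirrors the text's two lists and its statement about the preferred hydroxyl. It deliberately ignores exceptions described elsewhere in the literature, such as class I lysyl-tRNA synthetases in some archaea and the class II phenylalanyl-tRNA synthetase, which acylates the 2'-hydroxyl. The function name is hypothetical.

```python
# Class membership as listed in the text (three-letter amino acid codes).
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# Hydroxyl of the tRNA to which the aminoacyl group is usually coupled.
PREFERRED_HYDROXYL = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for a three-letter amino acid code such as 'Trp'."""
    code = amino_acid.capitalize()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid code: {amino_acid!r}")


cls = synthetase_class("Trp")
print(cls, PREFERRED_HYDROXYL[cls])  # I 2'-OH
```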

    \ \

    The 10 class I synthetases are considered to share a catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    \ \ 6847 IPR010742 \

    This family consists of several Rab5-interacting protein (RIP5 or Rab5ip) sequences. The ras-related GTPase rab5 is rate-limiting for homotypic early endosome fusion. Rab5ip represents a novel rab5 interacting protein that may function on endocytic vesicles as a receptor for rab5-GDP and participate in the activation of rab5 PUBMED:10818110.

    \ 4911 IPR004908 \

    ATP synthase is a multisubunit non-phosphorylated ATPase that is involved in the transport of ions. V-type (vacuolar) ATPases are located in the membranes of vacuoles, Golgi complexes and endosomes in eukaryotic cells, where they are responsible for acidifying these organelles by the transport of protons coupled to the hydrolysis of ATP. Organelle acidification is used for receptor-mediated endocytosis, intracellular trafficking and protein degradation. V-ATPase is a hetero-multimeric enzyme composed of a catalytic V1 complex consisting of peripherally associated protein subunits A to H, and a proton-translocating V0 proton pore complex consisting of integral membrane protein components a, c, c', c'' and d PUBMED:14635776.

    \

    This family represents subunit H (also known as Vma13p) of the peripheral V1 complex of vacuolar ATPase, which is a regulatory subunit responsible for activating ATPase activity and coupling ATPase activity to proton flow. The yeast enzyme contains five motifs similar to the HEAT or Armadillo repeats seen in the importins, and can be divided into two distinct domains: a large N-terminal domain consisting of stacked alpha helices, and a smaller C-terminal alpha-helical domain with a similar superhelical topology to an armadillo repeat PUBMED:11416198.

    \ \ 6939 IPR009800 \

    This family consists of several mammalian alpha-helical coiled-coil rod (HCR) proteins. The function of HCR is unknown, but it has been implicated in psoriasis in humans and is thought to affect keratinocyte proliferation PUBMED:11875053.

    \ 1647 IPR001349 \

    Cytochrome c oxidase is an oligomeric enzymatic complex that is a component of the respiratory chain and is involved in the transfer of electrons from cytochrome c to oxygen PUBMED:6307356. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane.

    \

    In eukaryotes, in addition to the three large subunits, I, II and III, that form the catalytic center of the enzyme complex, there is a variable number of small polypeptide subunits. One of these subunits is known as VIa in vertebrates and fungi. Mammals have two tissue-specific isoforms of VIa, a liver and a heart form. Only one form is found in fish PUBMED:9107314.

    \ 7318 IPR003536 \ Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior. There have been four secretion systems described in animal enteropathogens, such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia PUBMED:9618447.\ \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself PUBMED:9618447, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.

    \ \

    Exotoxins secreted by the type III system do not possess a secretion signal, and are considered unique for this reason PUBMED:9618447. Enteropathogenic and enterohaemorrhagic Escherichia coli secrete the bacterial adhesion mediation molecule intimin PUBMED:10835344, which targets the translocated intimin receptor, Tir. Tir is secreted by the bacteria and is embedded in the target cell's plasma membrane PUBMED:10835344. This facilitates bacterial cell attachment to the host.

    \ 7849 IPR012520 \

    This family includes antimicrobial peptides secreted from the skin of frogs. The secretion of these peptides plays an important role in the self-defence of the frogs. Structural characterization of these peptides showed that they belong to four known families: the brevinin-1 family, the esculentin-2 family, the ranatuerin-2 family and the temporin family PUBMED:10651828.

    \ 4944 IPR007792 \ This family includes the Type IV secretory pathway VirB3 protein, which is found associated with bacterial inner and outer membranes and assists T pilus formation as an assembly factor PUBMED:8405938.\ 3431 IPR003369 \ Members of this protein family are involved in a sec-independent translocation mechanism. This pathway has been called the DeltapH pathway in chloroplasts PUBMED:9367960. Members of this family in Escherichia coli are involved in export of redox proteins with a "twin arginine" leader motif (S/T-R-R-X-F-L-K) PUBMED:9546395. This sec-independent pathway is termed TAT for twin-arginine translocation system. This system mainly transports proteins with bound cofactors that require folding prior to export.\ 2282 IPR006951 \ These are proteins of unknown function found in Borrelia burgdorferi, the Lyme disease spirochete.\ 4204 IPR008195 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins belong to the L34e family. These include vertebrate L34, mosquito L31 PUBMED:8049275, plant L34 PUBMED:8075394, yeast putative ribosomal protein YIL052c and archaebacterial L34e.

    \ 7147 IPR009928 \

    This entry represents the N terminus (approximately 120 residues) of bacterial primosomal DnaI proteins, although one family member appears to be of viral origin. DnaI is one of the components of the Bacillus subtilis replication restart primosome, and is required for the DnaB75-dependent loading of the DnaC helicase PUBMED:11679082.

    \ 5649 IPR008699 \ This family consists of several eukaryotic NADH-ubiquinone oxidoreductase ASHI subunit (CI-ASHI) proteins. NADH:ubiquinone oxidoreductase (complex I) is an extremely complicated multiprotein complex located in the inner mitochondrial membrane. Its main function is the transport of electrons from NADH to ubiquinone, which is accompanied by translocation of protons from the mitochondrial matrix to the intermembrane space. Human complex I appears to consist of 41 subunits PUBMED:9878551.\ 2249 IPR006734 \ This family includes a conserved region in several uncharacterised plant proteins.\ 192 IPR004146 \ This short domain is rich in cysteines and histidines. The pattern of conservation is similar to that found in DAG_PE-bind, therefore we have termed this domain DC1 for divergent C1 domain. This domain probably also binds two zinc ions. The function of proteins with this domain is uncertain; however, this domain may bind to molecules such as diacylglycerol (A Bateman pers. obs.). This domain is found in plant proteins.\ 383 IPR005025 \

    NADPH-dependent FMN reductase reduces FMN, and also reduces riboflavin and FAD, although more slowly. Members of this family catalyse the reaction FMNH2 + NADP+ = FMN + NADPH + H+.

    \ \ \ \ \ \ \ 8040 IPR013240 \

    This is a family of yeast proteins. Subunit A34.5 of RNA polymerase I is a non-essential subunit which is thought to help Pol I overcome topological constraints imposed on ribosomal DNA during the process of transcription PUBMED:9121426.

    \ 7434 IPR011463 \

    This is a family of hypothetical proteins identified in Rhodopirellula baltica.

    \ 7657 IPR010191 \

    This entry represents IMP cyclohydrolase, which catalyses the final step in the biosynthesis of inosine monophosphate (IMP) in archaea PUBMED:11844782. In bacteria this step is catalysed by a bifunctional enzyme (PurH).

    \ 4067 IPR004260 \

    Pyrimidine dimer DNA glycosylases are enzymes responsible for initiating the base excision repair pathway. They excise pyrimidine dimers by hydrolysis of the glycosylic bond of the 5' pyrimidine, followed by cleavage of the intra-pyrimidine phosphodiester bond PUBMED:11148051. One such enzyme is T4 endonuclease V, which is responsible for the first step of a pyrimidine-dimer-specific excision-repair pathway PUBMED:2067549. Bacteriophage T4 mutants deficient in this enzyme are extremely sensitive to UV light.

    \ 769 IPR007197 \

    Radical SAM proteins catalyze diverse reactions, including unusual methylations, isomerization, sulphur insertion, ring formation, anaerobic oxidation and protein radical formation. Evidence exists that these proteins generate a radical species by reductive cleavage of S-adenosylmethionine (SAM) through an unusual Fe-S center PUBMED:11222759, PUBMED:15317939.

    \ 1625 IPR002523 \ The CorA transport system is the primary Mg2+ influx system of Salmonella typhimurium and Escherichia coli PUBMED:9775386, PUBMED:9786860. CorA is virtually ubiquitous in the Bacteria and Archaea. There are also eukaryotic relatives of this protein.\ 3561 IPR001414 \ Ocular albinism type 1 (OA1) is an X-linked disorder characterised by severe impairment of visual acuity, retinal hypopigmentation and the presence of macromelanosomes. A novel transcript from the OA1 critical region is expressed at high levels in RNA samples from retina and from melanoma and encodes a potential integral membrane protein PUBMED:7647783. This protein is of unknown function but is known to bind heterotrimeric G proteins.\ 1095 IPR002123 \ This family contains acyltransferases involved in phospholipid biosynthesis and other proteins of unknown function PUBMED:9259571. This family also includes the tafazzin subfamily PUBMED:8630491.\ 1601 IPR005557 \

    Colicin immunity proteins are plasmid encoded proteins necessary for protecting the cell against colicins. Colicins are toxins released by bacteria during times of stress PUBMED:11590016.

    \ 2943 IPR004999 \

    This family represents the capsid assembly protein, which binds DNA and may be involved in anchoring DNA in the capsid.

    \ 6425 IPR010567 \

    This family consists of several P-47 proteins from various Clostridium species as well as two related sequences from Pseudomonas putida. The function of this family is unknown.

    \ 4517 IPR006123 \

    Staphylococcal enterotoxins and streptococcal pyrogenic exotoxins constitute a family of biologically and structurally related toxins produced by Staphylococcus aureus and Streptococcus pyogenes PUBMED:2679358, PUBMED:2185544. These toxins share the ability to bind to the major histocompatibility complex proteins of their hosts. A more distant relative of the family is the Staphylococcus aureus toxic shock syndrome toxin, which shares only a low level of sequence similarity with this group.

    All of these toxins share a similar two-domain fold (N- and C-terminal domains) with a long alpha-helix in the middle of the molecule, a characteristic beta-barrel known as the "oligosaccharide/oligonucleotide fold" in the N-terminal domain and a beta-grasp motif in the C-terminal domain. Each superantigen possesses slightly different binding mode(s) when it interacts with MHC class II molecules or the T-cell receptor PUBMED:9514739.

    The beta-grasp domain has some structural similarities to the beta-grasp motif present in immunoglobulin-binding domains, ubiquitin, 2Fe-2S ferredoxin and translation initiation factor 3, as identified by the SCOP database.

    \ 1147 IPR006741 \

    The accessory gene regulator (agr) of Staphylococcus aureus is the central regulatory system that controls the gene expression for a large set of virulence factors. The agr locus consists of two transcripts: RNAII and RNAIII. RNAII encompasses four genes (agrA, B, C and D) whose gene products assemble a quorum-sensing system. At low cell density, the agr genes are continuously expressed at basal levels. A signal molecule, autoinducing peptide (AIP), produced and secreted by the bacteria, accumulates outside of the cells. When the cell density increases and the AIP concentration reaches a threshold, it activates the agr response, i.e. activation of secreted protein gene expression and subsequent repression of cell wall-associated protein genes. AgrB and AgrD are essential for the production of the autoinducing peptide, which functions as a signal for quorum sensing.
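
A toy simulation can illustrate the threshold behaviour described above. Every numeric value here is an assumption chosen for illustration: logistic growth for cell density, AIP secreted in proportion to density with a fixed fractional loss per step, and an arbitrary activation threshold. None of these values come from measurements of S. aureus. The function names are hypothetical.

```python
def simulate_agr(steps=60, growth_rate=0.3, capacity=1.0, density0=0.01,
                 aip_per_cell=1.0, aip_decay=0.1, threshold=2.0):
    """Return (step, density, aip, agr_active) tuples for a toy quorum model."""
    density, aip = density0, 0.0
    history = []
    for step in range(steps):
        # Logistic growth of the population (arbitrary units).
        density += growth_rate * density * (1.0 - density / capacity)
        # AIP accumulates outside the cells and loses a fixed fraction per step.
        aip += aip_per_cell * density - aip_decay * aip
        # The agr response is taken to be on once AIP reaches the threshold.
        history.append((step, density, aip, aip >= threshold))
    return history


def first_activation(history):
    """Return the first step at which the agr response is on, or None."""
    return next((step for step, _, _, active in history if active), None)


trace = simulate_agr()
print("agr response first active at step", first_activation(trace))
```

At low density the AIP level stays below the threshold and the system remains at its basal state. The switch happens only after the population has grown.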

    AgrB is a transmembrane protein PUBMED:11195102. AgrB is involved in the proteolytic processing of AgrD and may have both proteolytic enzyme activity and a transporter function facilitating the export of the processed AgrD peptide PUBMED:12122003.

    \ 7032 IPR009854 \

    This family consists of several Orthoreovirus membrane fusion protein p10 sequences. p10 is thought to be a multifunctional protein that plays a key role in virus-host interaction PUBMED:11893756.

    \ 2555 IPR007809 \ This family includes the FlgN protein, an export chaperone involved in flagellar synthesis PUBMED:11169117.\ 5980 IPR009319 \

    This is a family of related phage minor capsid proteins.

    \ 6001 IPR009328 \

    This family consists of several bacterial putative membrane proteins of unknown function.

    \ 7029 IPR010812 \

    This family represents a conserved region approximately 200 residues long within the bacterial hypersensitive response secretion protein HrpJ and similar proteins. HrpJ forms part of a type III secretion system through which, in phytopathogenic bacteria, virulence factors are thought to be delivered to plant cells PUBMED:10449783.

    \ 3174 IPR005513 \

    Late embryogenesis abundant (LEA) proteins are abundant in the seed embryos of higher plants. They may play an essential role in seed survival and in controlling water exchange during seed desiccation and imbibition. Family members are conserved along the entire coding region, especially within an internal hydrophobic motif of 20 amino acids, which may be repeated.

    \ 2078 IPR007317 \ This is a family of conserved eukaryotic proteins with undetermined function.\ 7327 IPR011114 \

    Homologous recombination is a crucial process in all living organisms. In bacteria, it involves the RuvA, RuvB and RuvC proteins, which process Holliday junction DNA. RuvA comprises three distinct domains. This entry represents the C-terminal domain, which plays a significant role in the ATP-dependent branch migration of the heteroduplex through direct contact with RuvB PUBMED:10890893. Within the Holliday junction, the C-terminal domain makes no contact with DNA PUBMED:10890893.

    \ 5333 IPR008609 \ This family consists of Ebola and Marburg virus nucleoproteins. These proteins are responsible for encapsidating the genomic RNA. Nucleoprotein DNA vaccines have been found to offer protection against the virus PUBMED:9657001.\ 2634 IPR006884 \ This conserved C-terminal region is found in a number of putative transmembrane GTPases. The Fzo protein is a mediator of mitochondrial fusion PUBMED:9230308. This conserved region is also found in the human mitofusin protein PUBMED:11181170.\ 2117 IPR007403 \ This is a family of putative membrane proteins.\ 1865 IPR002485 \ This domain is found in nematode proteins. It is currently of unknown function.\ 6930 IPR010774 \

    This family consists of several bacterial and phage proteins of around 95 residues in length. The function of this family is unknown.

    \ 2491 IPR003152 \ The FATC domain is found at the C-terminal end of the PIK-related kinases. Members of the family of PIK-related kinases may act as intracellular sensors that govern radial and horizontal pathways PUBMED:10782091.\ 2734 IPR011613 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 15 comprises enzymes with several known activities: glucoamylase (); alpha-glucosidase (); glucodextranase ().

    \ \ \

    Glucoamylase (GA) catalyses the release of D-glucose from the non-reducing ends of starch and other oligo- and polysaccharides. Studies of fungal GA have identified three closely clustered acidic residues that play a role in the catalytic mechanism PUBMED:1970434. This region is also conserved in a recently sequenced bacterial GA PUBMED:1633799.

    \

    The 3D structure of the pseudo-tetrasaccharide acarbose complexed with glucoamylase II(471) from Aspergillus awamori var. X100 has been determined to 2.4 Å resolution PUBMED:8195212. The protein belongs to the mainly-alpha class and contains 19 helices and 9 strands.

    \ 7045 IPR010816 \

    In filamentous fungi, het loci (for heterokaryon incompatibility) are believed to regulate self/nonself-recognition during vegetative growth. As filamentous fungi grow, hyphal fusion occurs within an individual colony to form a network. Hyphal fusion can also occur between different individuals to form a heterokaryon, in which genetically distinct nuclei occupy a common cytoplasm. However, heterokaryotic cells are viable only if the individuals involved have identical alleles at all het loci PUBMED:9770498.

    \ 5474 IPR008504 \ This family consists of several eukaryotic proteins of unknown function.\ 2719 IPR003836 \ Glucokinases are found in invertebrates and microorganisms and are highly specific for glucose. These enzymes phosphorylate glucose using ATP as a donor to give glucose-6-phosphate and ADP PUBMED:9023215.\ 2806 IPR004196 \

    The assembly of a macromolecular structure proceeds via a specific pathway of ordered events, with proteins changing conformation as they join the assembly. The assembly process is aided by scaffolding proteins, which act as chaperones. In bacteriophage phiX174, scaffolding proteins B and D are responsible for procapsid formation: 240 copies of protein D form the external scaffold, while 60 copies of protein B form the internal scaffold PUBMED:9305849. Scaffolding protein D also has a role in the production of viral single-stranded DNA.

    \ \ 7553 IPR011697 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of each family name representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use an amino acid residue in the active site as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. They have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS it is termed the peptidase unit.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins) and further subdivided into families on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    These peptidases have gamma-glutamyl hydrolase activity; that is, they catalyse the cleavage of the gamma-glutamyl bond in poly-gamma-glutamyl substrates. They are structurally related to , but contain extensions in four loops and at the C terminus PUBMED:11953431. They belong to MEROPS peptidase family C26 (gamma-glutamyl hydrolase family), clan PC. The majority of the sequences are classified as unassigned peptidases.
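The naming convention described in this entry can be decoded mechanically. The Python sketch below assumes that a family identifier is a catalytic-type letter followed by a number (as in C26) and that a clan identifier is two letters whose first letter is the catalytic type or P (as in PC); subfamily suffixes and any other MEROPS identifier forms are not handled.

```python
import re

# Catalytic-type letters as listed in the text; P applies only to clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)")   # e.g. "C26"
CLAN_ID = re.compile(r"([ACGMPSTU])([A-Z])")  # e.g. "PC"


def parse_family(family_id: str) -> tuple[str, int]:
    """Return (catalytic type, family number) for an ID such as 'C26'."""
    match = FAMILY_ID.fullmatch(family_id)
    if match is None:
        raise ValueError(f"not a simple MEROPS family ID: {family_id!r}")
    return CATALYTIC_TYPES[match.group(1)], int(match.group(2))


def clan_type(clan_id: str) -> str:
    """Return the catalytic type encoded by the first letter of a clan ID."""
    match = CLAN_ID.fullmatch(clan_id)
    if match is None:
        raise ValueError(f"not a simple MEROPS clan ID: {clan_id!r}")
    return CATALYTIC_TYPES[match.group(1)]


print(parse_family("C26"), clan_type("PC"))  # ('cysteine', 26) mixed
```

The family pattern deliberately excludes P, since the text reserves that letter for clans.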

    \ 7885 IPR012620 \

    This family consists of the tryptophanase (tna) operon leader peptide. Tryptophanase catalyses the degradation of L-tryptophan to indole, pyruvate and ammonia, enabling the bacteria to use tryptophan as a source of carbon, nitrogen and energy. The tna operon of Escherichia coli contains two major structural genes, tnaA and tnaB. Preceding tnaA in the tna operon is a 319-nucleotide transcribed regulatory region that contains the coding region for a 24-residue leader peptide, TnaC. The RNA sequence in the vicinity of the tnaC stop codon is rich in cytidylate residues, which are required for efficient Rho-dependent termination in the leader region of the tna operon PUBMED:14563884.
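As a quick consistency check on the tryptophan degradation reaction, the atoms on each side can be counted. This Python sketch assumes uncharged molecular formulas (L-tryptophan C11H12N2O2, indole C8H7N, pyruvic acid C3H4O3, ammonia NH3) and adds one water molecule on the substrate side, which the description above leaves implicit; the simple parser handles only flat formulas without brackets or charges.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C11H12N2O2' (no brackets, no charges)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(*formulas: str) -> Counter:
    """Sum the atom counts of all species on one side of a reaction."""
    total: Counter = Counter()
    for formula in formulas:
        total.update(atom_counts(formula))
    return total


substrates = side_total("C11H12N2O2", "H2O")     # L-tryptophan + water
products = side_total("C8H7N", "C3H4O3", "NH3")  # indole + pyruvate + ammonia
print(substrates == products)  # True
```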

    \ 4958 IPR007262 \ Vps55 is involved in the secretion of the Golgi form of the soluble vacuolar carboxypeptidase Y, but not in the trafficking of the membrane-bound vacuolar alkaline phosphatase. Both Vps55 and the obesity receptor gene-related protein are important for membrane trafficking to the vacuole/lysosome in eukaryotic cells PUBMED:12006663.\ 1478 IPR002883 \

    The recycling of photosynthetically fixed carbon in plant cell walls is a key microbial process. Enzyme systems that attack the plant cell wall contain noncatalytic carbohydrate-binding modules that mediate attachment to this composite structure and play a pivotal role in maximizing the hydrolytic process. In anaerobes, the degradation is carried out by a high-molecular-weight, multifunctional complex termed the cellulosome. This consists of a number of independent enzyme components, each of which contains a conserved 40-residue dockerin domain, which functions to bind the enzyme to a cohesin domain within the scaffoldin protein PUBMED:7492333, PUBMED:7493964.

    \

    In anaerobic bacteria that degrade plant cell walls, exemplified by Clostridium thermocellum, the dockerin domains of the catalytic polypeptides can bind equally well to any cohesin from the same organism. More recently, anaerobic fungi, typified by Piromyces equi, have been suggested to also synthesize a cellulosome complex, although the dockerin sequences of the bacterial and fungal enzymes are completely different PUBMED:11524680. The fungal enzymes, for example, contain one, two or three copies of the dockerin sequence in tandem within the catalytic polypeptide, whereas all the C. thermocellum cellulosome catalytic components contain a single dockerin domain. The anaerobic bacterial dockerins are homologous to EF hands (calcium-binding motifs) and require calcium for activity, whereas the fungal dockerin does not. Finally, the interaction between cohesin and dockerin appears to be species-specific in bacteria, whereas in fungi there is almost no species specificity of binding, and no sites that distinguish different species have been identified.

    \

    The structure of dockerin from Piromyces equi contains two helical stretches and four short beta-strands, which form an antiparallel sheet structure adjacent to an additional short twisted parallel strand. The N- and C-termini are adjacent to each other.

    \

    Aerobic bacteria contain related regions; however, these appear to function as cellulose- or carbohydrate-binding domains.

    \ \ 2653 IPR003859 \ This is a family of galactosyltransferases from a wide range of metazoa with three related galactosyltransferase activities, all three of which are, in some cases, possessed by a single sequence. The three functions are N-acetyllactosamine synthase (); beta-N-acetylglucosaminyl-glycopeptide beta-1,4-galactosyltransferase (); and lactose synthase (). Note that N-acetyllactosamine synthase, together with alpha-lactalbumin, forms lactose synthase; in the absence of alpha-lactalbumin, the enzyme acts as N-acetyllactosamine synthase.\ 6803 IPR010725 \

    This entry represents a conserved region approximately 50 residues long within a number of proteins of unknown function that seem to be specific to Arabidopsis thaliana. Note that many proteins contain multiple copies of this region.

    \ 2763 IPR005103 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \ The only known activity within this family is that of endoglucanase (). \ 6332 IPR010522 \

    This family consists of several bacterial replication protein C (RepC) sequences.

    \ 1297 IPR003311 \

    The Aux/IAA genes are key regulators of auxin-modified gene expression PUBMED:12036262. The plant hormone auxin (indole-3-acetic acid, IAA) regulates diverse cellular and developmental responses in plants, including cell division, expansion and differentiation, and embryo patterning PUBMED:15061689. Auxin regulates the expression of several gene families, including GH3 and SAUR, as well as Aux/IAA itself. The Aux/IAA proteins act as repressors of auxin-induced gene expression, possibly by modulating the activity of DNA-binding auxin response factors (ARFs) (). Aux/IAA and ARF proteins are thought to interact through C-terminal protein-protein interaction domains found in both.

    \

    Recent evidence suggests that Aux/IAA proteins can also mediate light responses PUBMED:11544131. Some members of the Aux/IAA family are longer and contain an N-terminal DNA-binding domain PUBMED:9482737; these may have an early function in establishing the vascular and body patterns during embryonic and post-embryonic development in some plants.

    \ \ \ \ 4300 IPR000600 \ A family of bacterial proteins has been described that groups transcriptional repressors, sugar kinases and yet-uncharacterized open reading frames PUBMED:7952186. This family, known as ROK (Repressor, ORF, Kinase), includes the xylose operon repressor, xylR, from Bacillus subtilis, Lactobacillus pentosus and Staphylococcus xylosus; N-acetylglucosamine repressor, nagC, from Escherichia coli; glucokinase () from Streptomyces coelicolor; fructokinase () from Pediococcus pentosaceus, Streptococcus mutans and Zymomonas mobilis; allokinase () and mlc from E. coli; and E. coli hypothetical proteins yajF and yhcI and the corresponding Haemophilus influenzae proteins. The repressor proteins (xylR and nagC) in this family possess an N-terminal region that is absent from the sugar kinases and contains a helix-turn-helix DNA-binding motif.\ 4412 IPR001452 \ SH3 (Src homology 3) domains are small protein modules containing approximately 50 amino acid residues PUBMED:15335710, PUBMED:11256992. They are found in a great variety of intracellular or membrane-associated proteins PUBMED:1639195, PUBMED:14731533, PUBMED:7531822, for example in a variety of proteins with enzymatic activity, in adaptor proteins that lack catalytic sequences and in cytoskeletal proteins such as fodrin and yeast actin-binding protein ABP-1. \

    The SH3 domain has a characteristic fold consisting of five or six beta-strands arranged as two tightly packed antiparallel beta-sheets. The linker regions may contain short helices PUBMED:. The surface of the SH3 domain bears a flat, hydrophobic ligand-binding pocket consisting of three shallow grooves defined by conserved aromatic residues, in which the ligand adopts an extended, left-handed helical conformation. The ligand binds with low affinity, but this may be enhanced by multiple interactions. The region bound by the SH3 domain is in all cases proline-rich and contains PXXP as a core conserved binding motif. The function of the SH3 domain is not well understood, but it may mediate many diverse processes such as increasing the local concentration of proteins, altering their subcellular location and mediating the assembly of large multiprotein complexes PUBMED:7953536.
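Finding candidate SH3-binding sites by the PXXP core motif is a simple pattern search. The Python sketch below uses a regular-expression lookahead so that overlapping matches in proline-rich stretches are all reported. The example sequence is invented, and a bare PXXP match is only a candidate: the sketch ignores flanking residues and structural context.

```python
import re

# Zero-width lookahead: each P..P window is reported even when windows overlap.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, tetrapeptide) for every PXXP occurrence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


# Hypothetical proline-rich peptide, for illustration only.
print(find_pxxp("MAPPLPPRPSE"))  # [(2, 'PPLP'), (3, 'PLPP'), (5, 'PPRP')]
```

Without the lookahead, `re.finditer(r"P..P", ...)` would consume each match and miss overlapping motifs such as PLPP above.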

    \ 5572 IPR008419 \ This family consists of P25 proteins from beet (Beta vulgaris subsp. vulgaris) necrotic yellow vein viruses.\ 583 IPR003608 \ This domain is found in ryanodine receptors, inositol trisphosphate receptors and protein O-mannosyltransferases. Inositol 1,4,5-trisphosphate (InsP3) is an intracellular second messenger that transduces growth factor and neurotransmitter signals. InsP3 mediates the release of Ca2+ from intracellular stores by binding to specific Ca2+ channel-coupled receptors. Ryanodine receptors are involved in communication between the transverse tubules and the sarcoplasmic reticulum of cardiac and skeletal muscle. These proteins function as Ca2+-release channels following depolarisation of the transverse tubules PUBMED:1645727. Their function is modulated by Ca2+, Mg2+, ATP and calmodulin. Deficiency in the ryanodine receptor may cause malignant hyperthermia (MH) and central core disease of muscle (CCD) PUBMED:7829078. Protein O-mannosyltransferases transfer mannose from dolichyl-phosphate mannose (Dol-P-Man) to Ser or Thr residues on proteins.\ 3781 IPR002491 \

    ATP-binding cassette (ABC) transporters are a ubiquitous family of importer and exporter proteins that consist of two alpha-helical transmembrane (TM) domains, which form a translocation pathway, and two cytoplasmic ABC domains, which power the transport reaction through binding and hydrolysis of ATP. In addition, most bacterial importers employ a periplasmic substrate-binding protein (PBP) that delivers the ligand to the extracellular gate of the TM domains. These proteins bind their substrates selectively and with high affinity, which is thought to ensure the specificity of the transport reaction. Binding proteins in Gram-negative bacteria are present within the periplasm, whereas those in Gram-positive bacteria are tethered to the cell membrane via acylation of a cysteine residue that is an integral component of a lipoprotein signal sequence. In planta expression of a high-affinity iron-uptake system involving the siderophore chrysobactin in Erwinia chrysanthemi 3937 contributes greatly to the invasive growth of this pathogen on its natural host, African violets PUBMED:8596459. The cobalamin (vitamin B12) and iron transport systems share many common attributes and probably evolved from the same origin PUBMED:12475936, PUBMED:15475351.\

    \

    The structure of the periplasmic-binding domain is composed of two subdomains, each consisting of a central beta-sheet and surrounding alpha-helices, linked by a rigid alpha-helix. The substrate-binding site is located in a cleft between the two alpha/beta subdomains PUBMED:12468528.

    \ 2358 IPR002800 \

    Proteins that belong to this group are restricted to the Mycobacteria and the Archaea and have no known function.

    \ 7520 IPR011701 \ Among the many families of transporters, only two occur ubiquitously in all kingdoms of life: the ATP-Binding Cassette (ABC) superfamily and the Major Facilitator Superfamily (MFS). MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients PUBMED:9529885, PUBMED:9868370.\ 7018 IPR010806 \

    This family consists of several Orthopoxvirus proteins of around 185 residues in length. Members of this family seem to be exclusive to Vaccinia, Camelpox and Cowpox viruses. Some family members are annotated as C8 proteins, but their function is unknown.

    \ 6892 IPR009771 \

    This family represents a conserved region approximately 300 residues long within a number of hypothetical eukaryotic proteins of unknown function. These are possibly integral membrane proteins.

    \ 7321 IPR011082 \

    These proteins include the human C1D protein and Saccharomyces cerevisiae YHR081W (rrp47), an exosome-associated protein required for the 3' processing of stable RNAs PUBMED:12972615.

    \ 6400 IPR010938 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 3255 IPR007054 \ The lysis S protein is a cytotoxic protein that forms holes in membranes, causing cell lysis. The action of lysis S is independent of the proportion of acidic phospholipids in the membrane PUBMED:8467992.\ 985 IPR001846 \

    The members of a family of growth regulators (originally called cef10, connective tissue growth factor, fisp-12, cyr61 or, alternatively, beta IG-M1 and beta IG-M2) all belong to the immediate-early genes expressed after induction by growth factors or certain oncogenes. Sequence analysis of this family revealed four distinct modules, each with homologues in other extracellular mosaic proteins such as von Willebrand factor, slit, thrombospondins, fibrillar collagens, IGF-binding proteins and mucins. Classification and analysis of these modules suggests the location of binding regions and, by analogy to better-characterized modules in other proteins, sheds some light on the structure of this new family PUBMED:7687569.

    \

    The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins PUBMED:8412987, PUBMED:8145250, PUBMED:1864378. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport, and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation and signal transduction), involving interaction with a large array of ligands PUBMED:8412987. A number of human diseases arise from mutations in VWA domains. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands PUBMED:8145250.

    \

    One of the functions of von Willebrand factor (vWF) is to serve as a carrier of clotting factor VIII (FVIII). The native conformation of the D' domain of vWF is required not only for FVIII binding but also for normal multimerization and optimal secretion PUBMED:10807780.

    \ 6922 IPR009791 \

    This family consists of several hypothetical bacterial proteins of around 225 residues in length. Members of this family appear to be specific to Borrelia burgdorferi (the Lyme disease spirochete). The function of this family is unknown.

    \ 4100 IPR000408 \

    The regulator of chromosome condensation (RCC1) PUBMED:8480369 is a eukaryotic protein that binds to chromatin and interacts with ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting as a guanine-nucleotide dissociation stimulator (GDS). The interaction of RCC1 with ran probably plays an important role in the regulation of gene expression.

    \ \

    RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in the following schematic representation, the repeats make up the major part of the length of the protein. Outside the repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a C-terminal domain of about 130 residues.

    \
    \
    +----+-------+-------+-------+-------+-------+-------+-------+-------------+
    |N-t.|Rpt. 1 |Rpt. 2 |Rpt. 3 |Rpt. 4 |Rpt. 5 |Rpt. 6 |Rpt. 7 | C-terminal  |
    +----+-------+-------+-------+-------+-------+-------+-------+-------------+\
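The approximate lengths quoted above imply a rough size range for RCC1 proteins. The Python sketch below simply treats the quoted approximations as hard lower and upper bounds, which they are not, so the results are only indicative.

```python
# Approximate segment lengths (residues) taken from the description above.
N_TERMINAL = (40, 50)
REPEAT = (50, 60)
N_REPEATS = 7
DROSOPHILA_C_TERMINAL = 130  # "about 130 residues"; Drosophila protein only


def length_range(c_terminal: int = 0) -> tuple[int, int]:
    """Return (min, max) total length in residues."""
    low = N_TERMINAL[0] + N_REPEATS * REPEAT[0] + c_terminal
    high = N_TERMINAL[1] + N_REPEATS * REPEAT[1] + c_terminal
    return low, high


def repeat_fraction(c_terminal: int = 0) -> tuple[float, float]:
    """Fraction of the length made up by the repeats at each bound."""
    low, high = length_range(c_terminal)
    return N_REPEATS * REPEAT[0] / low, N_REPEATS * REPEAT[1] / high


print(length_range())                            # (390, 470)
print(length_range(DROSOPHILA_C_TERMINAL))       # (520, 600)
print([round(f, 2) for f in repeat_fraction()])  # [0.9, 0.89]
```

Without the Drosophila-specific C-terminal domain, the seven repeats account for roughly 90% of the length, consistent with the statement that they make up the major part of the protein.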
    
    \ The RCC1-type repeat is also found in the X-linked retinitis pigmentosa GTPase regulator PUBMED:8817343. The RCC1 repeats form a beta-propeller structure.\ 6467 IPR009540 \

    This family consists of several basal layer antifungal peptide (BAP) sequences specific to Zea mays. The BAP2 peptide exhibits potent activity against a broad range of filamentous fungi, including several plant pathogens PUBMED:11319035.

    \ 6173 IPR009408 \

    This region is found in some of the Diaphanous related formins (Drfs) PUBMED:12676083. It consists of low complexity repeats of around 12 residues.

    \ 765 IPR002638 \ Quinolinate phosphoribosyl transferase (QPRTase), or nicotinate-nucleotide pyrophosphorylase, is involved in the de novo synthesis of NAD in both prokaryotes and eukaryotes. It catalyses the reaction of quinolinic acid with 5-phosphoribosyl-1-pyrophosphate (PRPP) in the presence of Mg2+ to give nicotinic acid mononucleotide (NaMN), pyrophosphate and carbon dioxide PUBMED:9016724, PUBMED:8561507. Unlike , this domain also includes the molybdenum transport system protein ModD.\ 1086 IPR001941 \ Pro-opiomelanocortin is present at high levels in the pituitary and is processed into three major peptide families: adrenocorticotrophin (ACTH); alpha-, beta- and gamma-melanocyte-stimulating hormones (MSH); and beta-endorphin PUBMED:2266117. ACTH regulates the synthesis and release of glucocorticoids and, to some extent, aldosterone in the adrenal cortex. It is synthesised and released in response to corticotrophin-releasing factor at times of stress (e.g. heat, cold or infection), and its release leads to increased metabolism. The action of MSH in humans is poorly understood, but it may be involved in temperature regulation PUBMED:2266117. Full activity of ACTH resides in the first 20 N-terminal amino acids, the first 13 of which are identical to alpha-MSH PUBMED:2266117, PUBMED:2839146.\ 876 IPR007222 \

    The signal recognition particle (SRP) is a complex of six distinct polypeptides and a 7S RNA that is essential for transferring nascent polypeptide chains destined for export from the cell to the translocation apparatus of the endoplasmic reticulum membrane PUBMED:10734128. SRP binds hydrophobic signal sequences as they emerge from the ribosome and arrests translation. This entry represents the N-terminal region of the SRP receptor (SRPR); the C-terminal region is . The receptor consists of a heterodimer of an alpha and a beta chain.

    \ 3224 IPR004984 \

    This domain is found along with a C-terminal domain () in a group of Mycoplasma lipoproteins of unknown function.

    \ 23 IPR006090 \

    Mammalian acyl-CoA dehydrogenases () are enzymes that catalyse the first step in each cycle of beta-oxidation in mitochondria. Acyl-CoA dehydrogenases PUBMED:3326738, PUBMED:2777793, PUBMED:8034667 catalyze the alpha,beta-dehydrogenation of acyl-CoA thioesters to the corresponding trans-2,3-enoyl-CoA products with concomitant reduction of enzyme-bound FAD. Reoxidation of the flavin involves transfer of electrons to ETF (electron-transferring flavoprotein). These enzymes are homodimers containing one molecule of FAD.

    The monomeric enzyme is folded into three domains of approximately equal size. The N- and C-terminal domains are mainly alpha-helical and pack together, and the middle domain consists of two orthogonal beta-sheets. The flavin ring is buried in the crevice between the two alpha-helical domains and the beta-sheet of one subunit, and the adenosine pyrophosphate moiety extends into the subunit junction formed by two C-terminal domains PUBMED:8356049. The C-terminal domain of acyl-CoA dehydrogenase is an all-alpha, four-helical up-and-down bundle.

    \ 3680 IPR001297 \ The phycobilisome linker polypeptide determines the state of aggregation and the location of the disc-shaped phycobiliprotein units within the phycobilisome and modulates their spectroscopic properties in order to mediate a directed and optimal energy transfer. The phycobilisome is a hemidiscoidal structure composed of two distinct substructures: a core complex (which contains the phycobiliproteins) and a number of rods radiating from the core. The linker polypeptide is also found in the chloroplasts of some eukaryotes, where it is required for the attachment of phycocyanin to allophycocyanin in the core of the phycobilisome.\ 3245 IPR001964 \ The nucleotide sequence of the RNA of potato leafroll luteovirus (PLRV) has been determined PUBMED:2466700, PUBMED:2732710. The sequence contains six large ORFs. The 3' coding region encodes three polypeptides: a 23K coat protein, a 17K polypeptide encoded in a different frame, and a 53K polypeptide immediately following the coat protein sequence in the same frame. It has been suggested that the 53K polypeptide is translated by readthrough of the amber termination codon of the coat protein gene. The amino acid sequences encoded within the 3' region show many similarities to analogous polypeptides of barley yellow dwarf virus, PAV strain (BYDV), and beet western yellows virus (BWYV). It is possible that the ORF5 protein is a VPg precursor from which, at the onset of RNA synthesis, the VPg molecule is released, in a similar fashion to that proposed for cowpea mosaic virus.\ 6410 IPR010939 \

    This family consists of several eukaryote-specific repeats of unknown function. This repeat always seems to be found with .

    \ 4682 IPR001585 \

    Transaldolase () catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose 7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme of about 34 kDa whose sequence has been well conserved throughout evolution. A lysine has been implicated in the catalytic mechanism of the enzyme PUBMED:8109173; it acts as a nucleophilic group that attacks the carbonyl group of fructose 6-phosphate.
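The carbon bookkeeping of the transaldolase reaction can be sketched by modelling each sugar as an ordered list of labelled carbons. This Python sketch assumes standard carbon numbering, with the transferred three-carbon unit being C1 to C3 of the ketose donor; phosphate groups, stereochemistry and the lysine-bound intermediate are ignored. Running the same function on the products shows that the transfer is reversible.

```python
def transaldolase(donor: list[str], acceptor: list[str]) -> tuple[list[str], list[str]]:
    """Move C1-C3 of a ketose donor onto an aldose acceptor.

    Returns (shortened aldose from the donor, lengthened ketose product).
    Sugars are modelled only as ordered carbon labels.
    """
    unit, remainder = donor[:3], donor[3:]
    return remainder, unit + acceptor


s7p = [f"S7P-C{i}" for i in range(1, 8)]  # sedoheptulose 7-phosphate
g3p = [f"G3P-C{i}" for i in range(1, 4)]  # glyceraldehyde 3-phosphate

e4p, f6p = transaldolase(s7p, g3p)
print(len(e4p), len(f6p))  # 4 6
print(f6p[:3])             # ['S7P-C1', 'S7P-C2', 'S7P-C3']

# Reverse direction: F6P donates the same unit back to E4P.
g3p_again, s7p_again = transaldolase(f6p, e4p)
print(g3p_again == g3p and s7p_again == s7p)  # True
```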

    \

    Transaldolase is evolutionarily related PUBMED:7773398 to a bacterial protein of about 20 kDa (known as talC in Escherichia coli, ), whose exact function is not yet known.

    \ 1374 IPR003896 \

    A large group of bacterial exotoxins are referred to as "A/B toxins", essentially because they are formed from two subunits PUBMED:8225592. The "A" subunit possesses enzyme activity and is transferred to the host cell following a conformational change in the membrane-bound transport "B" subunit. Clostridium species are among the major causes of food poisoning and gastrointestinal illness. They are Gram-positive, spore-forming rods that occur naturally in the soil PUBMED:8225592. Among the toxins produced by certain Clostridium spp. are the binary exotoxins. These proteins consist of two independent polypeptides, which correspond to the A/B subunit moieties. The enzyme component (A) enters the cell through endosomes produced by the oligomeric binding/translocation protein (B), and prevents actin polymerisation through ADP-ribosylation of monomeric G-actin PUBMED:8225592, PUBMED:9659689, PUBMED:10802189.

    \

    Members of the "B" binary toxin family also include the Bacillus anthracis protective antigen (PA) protein PUBMED:8225592, most likely reflecting a common evolutionary ancestor. B. anthracis, a large Gram-positive spore-forming rod, is the causative agent of anthrax. Its two virulence factors are the poly-D-glutamate polypeptide capsule and the anthrax exotoxin itself PUBMED:1910002. The toxin comprises three factors: the protective antigen (PA), the oedema factor (EF) and the lethal factor (LF). Each is a thermolabile protein of ~80 kDa. PA forms the "B" part of the exotoxin and allows passage of the "A" moiety (consisting of EF and LF) into target cells. PA forms the central part of the complete anthrax toxin and, after assembling as a heptamer in the membrane, translocates the A moiety into host cells PUBMED:1910002, PUBMED:3148491.

    \ 1308 IPR005146 \

    This domain is found in tRNA synthetase beta subunits as well as in some proteins that are not tRNA synthetases.

    \ 3860 IPR003113 \ This is the region of the p110 phosphatidylinositol 3-kinase (PI3-Kinase) that binds the p85 subunit.\ 2413 IPR004211 \

    This family of proteins includes phage T4 endonuclease VII, Mycobacteriophage gene 59 and other as yet uncharacterised proteins. Phage T4 endonuclease VII (Endo VII) recognizes a broad spectrum of DNA substrates, ranging from branched DNAs to single-base mismatches. The structure of this enzyme has been resolved: the monomers form an elongated, intertwined molecular dimer that exhibits extreme domain swapping. Two pairs of antiparallel helices, which form a novel 'four-helix cross' motif, are the major dimerization elements PUBMED:10075917.

    \ 5392 IPR008406 \ This family contains several plant dormancy-associated and auxin-repressed proteins, whose function is poorly understood PUBMED:9684359.\ 2297 IPR007028 \ The function of this family of short bacterial proteins is unknown. \ 5092 IPR007929 \

    This family contains several uncharacterised proteins from Neisseria meningitidis. These proteins may have a role in DNA binding.

    \ 2958 IPR002970 \ The lipocalins are a diverse, interesting, yet poorly understood family of proteins composed, in the main, of extracellular ligand-binding proteins displaying high specificity for small hydrophobic molecules PUBMED:2580349, PUBMED:8761444. Functions of these proteins include transport of nutrients, control of cell regulation, pheromone transport, cryptic colouration and the enzymatic synthesis of prostaglandins.\

    \ The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel beta-barrel fold that is well conserved within the family. Sequence similarity within the family is at a much lower level and seems to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site PUBMED:8761444. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes PUBMED:8573354. The anti-parallel beta-barrel fold is also exploited by the fatty acid-binding proteins (which function similarly by binding small hydrophobic molecules), by avidin and the closely related metalloprotease inhibitors, and by triabin. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif.\ The lipocalin family can be subdivided into kernel and outlier sets. The kernel lipocalins form the largest self-consistent group. The outlier lipocalins form several smaller distinct subgroups: the OBPs, the von Ebner's gland proteins, alpha-1-acid glycoproteins, tick histamine binding proteins and the nitrophorins.

    \

    The tick histamine binding proteins are the most recently identified set of outlier lipocalins. The structure of one tick histamine binding protein has been solved PUBMED:10360182 and shows that these proteins have the characteristic lipocalin fold but no appreciable sequence similarity to other lipocalins. The tick histamine binding proteins are secreted into the saliva of the ixodid tick Rhipicephalus appendiculatus and share functional similarity with the nitrophorins, sequestering histamine at the wound site. Because the tick histamine binding proteins outcompete histamine receptors, they are able to overcome host inflammatory and immune responses. This enables the ticks to feed for extended periods, lasting from days to several weeks, and to gorge themselves on large blood meals, increasing their body mass 100-fold. Unlike nitrophorins, the tick proteins do not bind haem (or any other cofactor), but ligate histamine directly in two rigid, orthogonally arranged binding sites at opposing ends of the lipocalin anti-parallel beta-barrel, which have an unusually polar character.

    \ 2836 IPR003413 \ The bacterial general secretion pathway (GSP), also called the type II pathway, is involved in the export of proteins. This family includes GSPI and GSPJ, which contain the pre-pilin signal sequence PUBMED:8407845.\ 3280 IPR004241 \ Light chain 3 (LC3) may function primarily as a subunit of MAP1A and MAP1B, and its expression may regulate the microtubule binding activity of these neuronal microtubule-associated proteins (MAPs) PUBMED:7908909. Related proteins that belong to this group include the human ganglioside expression factor and a symbiosis-related fungal protein.\ 3672 IPR001403 \ The parvovirus coat protein VP2, together with VP1, forms a capsomer. Both proteins are formed from the same transcript by alternative splicing; as a result, VP1 and VP2 differ only in the N-terminal region. VP2 is involved in packaging the viral DNA PUBMED:9129667. The mature virion contains three capsid proteins, VP1, VP2 and VP3, and a non-capsid protein, NS-1.\ 438 IPR002654 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 25 comprises enzymes with only one known activity, in lipopolysaccharide biosynthesis. These enzymes catalyse the transfer of various sugars onto the growing lipopolysaccharide chain during its biosynthesis PUBMED:8817494.

    \ 1425 IPR010258 \

    \ Several bacterial pathogens utilize conjugation machines to export effector molecules during infection. Such systems are members of the type IV or 'adapted conjugation' secretion family. The prototypical type IV system is the Agrobacterium tumefaciens T-DNA transfer machine, which delivers oncogenic nucleoprotein particles to plant cells. Other pathogens, including Bordetella pertussis, Legionella pneumophila, Brucella spp. and Helicobacter pylori, use type IV machines to export effector proteins to the extracellular milieu or the mammalian cell cytosol.

    \

    Conjugation machines of Gram-negative bacteria consist of two surface structures, the mating channel through which the DNA transfer intermediate and proteins are translocated and the conjugal pilus for contacting recipient cells. Various conjugative pili have been visualized, but to date there is no ultrastructural information about the mating channel. Recent work on the A. tumefaciens T-DNA transfer system has focused on identifying interactions among the VirB protein subunits and defining steps in the transporter assembly pathway. There are three functional groups of VirB proteins: proteins localized exocellularly forming the T-pilus or other adhesive structures; mating-channel components; and cytoplasmic membrane ATPases. Although all of these proteins probably assemble as a supramolecular complex, as yet there is no direct evidence for a physical association between the conjugative pilus and the mating channel.

    \

    Several lines of evidence suggest that VirB6 to VirB10 are channel subunits. VirB6, a highly hydrophobic protein, is thought to span the cytoplasmic membrane several times and is presently the best candidate for a channel-forming protein. VirB7, an outer membrane lipoprotein, interacts with itself and with VirB9 via disulfide bonds between unique reactive cysteines present in each protein. The VirB7-VirB9 heterodimer localizes at the outer membrane and plays a critical role in stabilizing other VirB proteins during assembly of the transfer machine. VirB9 is also required for formation of chemically crosslinked VirB10 oligomers, probably corresponding to homotrimers PUBMED:10920394.

    \ \ 2732 IPR002594 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 12 comprises enzymes with two known activities: endoglucanase () and xyloglucan hydrolase (EC not defined). These enzymes were formerly known as cellulase family H.

    \ 6674 IPR010670 \

    This family represents a conserved region approximately 80 residues long within Pyrobaculum aerophilum family 1964 protein.

    \ 851 IPR005224 \

    The sugar fermentation stimulation protein is a probable regulatory factor involved in maltose metabolism. It contains a putative DNA-binding domain, and its gene was isolated as one that enabled Escherichia coli strain MK2001 to use maltose.

    \ 6819 IPR009730 \

    This entry represents the C terminus (approximately 300 residues) of eukaryotic micro-fibrillar-associated protein 1, which is a component of elastin-associated microfibrils in the extracellular matrix PUBMED:8174780.

    \ 197 IPR004022 \ This domain is predicted to be a DNA binding domain. The DDT domain is named after DNA binding homeobox and Different Transcription factors. It is found in fetal Alzheimer antigen and several hypothetical and uncharacterised proteins.\ 2346 IPR002781 \

    This family is found in uncharacterised integral membrane proteins of prokaryotes.

    \ 7359 IPR003894 \

    The TAF homology (TAFH) or Nervy homology region 1 (NHR1) domain is a domain of 95-100 amino acids present in eukaryotic proteins of the MTG/ETO family; its core of ~75-80 residues also occurs in TAF proteins. The transcription initiation TFIID complex is composed of TATA binding protein (TBP) and a number of TBP-associated factors (TAFs). The TAFH/NHR1 domain is named after fruit fly TATA-box-associated factor 110 (TAF110), human TAF105 and TAF130, and the fruit fly protein Nervy, which is a homologue of human MTG8/ETO PUBMED:9447981, PUBMED:9790752. The human eight twenty-one (ETO or MTG8) and related myeloid transforming gene products MTGR1 and MTG16, as well as the Nervy protein, contain the NHR1-4 domains. The NHR1/TAFH domain occurs in the N-terminal part of these proteins, while a MYND-type zinc finger forms the NHR4 domain PUBMED:12559562. The TAFH/NHR1 domain can be involved in protein-protein interactions, e.g. in MTG8/ETO with HSP90 and Gfi-1 PUBMED:10076566.

    \ \ \ 2690 IPR000942 \

    Geminiviruses are characterised by a genome of circular single-stranded DNA encapsidated in twinned (geminate) quasi-isometric particles, from which the group derives its name. Most geminiviruses can be divided into two subgroups on the basis of host range and/or insect vector: those that infect dicotyledonous plants and are transmitted by the same whitefly species, and those that infect monocotyledonous plants and are transmitted by different leafhopper vectors. The whitefly-transmitted cassava latent (CLV), tomato golden mosaic (TGMV) and bean golden mosaic (BGMV) viruses possess a bipartite genome. By contrast, only a single DNA component has been identified for the leafhopper-transmitted maize streak (MSV) and wheat dwarf (WDV) viruses PUBMED:6526009, PUBMED:2829117.

    \ \ \

    Geminiviruses contain three ORFs (designated AL1, AL2 and AL3) that overlap and are specified by multiple polycistronic mRNAs. The AL2 gene product transactivates expression of the TGMV coat protein gene PUBMED:1984661 and of the BR1 movement protein gene.

    \ \ \ \ 5757 IPR009228 \

    This family consists of several bacteriophage capsid scaffolding proteins (GPO) and some related bacterial sequences. GPO is thought to function in both the assembly of proheads and the cleavage of GPN PUBMED:1837355.

    \ 1395 IPR002093 \

    The breast cancer type 2 susceptibility protein (BRCA2) contains a number of 39-amino-acid repeats PUBMED:8673099 that are critical for binding to RAD51 (a key protein in DNA recombinational repair) and for resistance to methyl methanesulphonate treatment PUBMED:9405383, PUBMED:9560268, PUBMED:9811893. BRCA2 is a breast tumor suppressor with a potential function in the cellular response to DNA damage. At the cellular level, expression is regulated in a cell-cycle-dependent manner, and peak expression of BRCA2 mRNA is found in S phase, suggesting that BRCA2 may participate in regulating cell proliferation. There are eight repeats in BRCA2, designated BRC1 to BRC8. BRC1, BRC2, BRC3, BRC4, BRC7 and BRC8 are highly conserved and bind to Rad51, whereas BRC5 and BRC6 are less well conserved and do not bind to Rad51 PUBMED:10551859. It has been suggested that BRCA2 plays a role in positioning Rad51 at the site of DNA repair or in removing Rad51 from DNA once repair has been completed.

    \ 3250 IPR004103 \

    Proteins containing this domain consist of a group of secreted bacterial lyase enzymes capable of acting on hyaluronan (hyaluronate lyase, ) and chondroitin (chondroitin AC lyase, ) in the extracellular matrix of host tissues, contributing to the invasive capacity of the pathogen PUBMED:14523022, PUBMED:10329169. This domain is almost always associated with the polysaccharide lyase family 8, N-terminal domain (see ). This entry represents the C-terminal domain of hyaluronate and chondroitin AC lyase enzymes.

    \ 8029 IPR013267 \

    Most ORF2 sequences isolated from TT virus (TTV) encode a 49-amino-acid protein (pORF2a) because of an in-frame stop codon. ORF2s isolated from G1 TTV encode a 202-amino-acid protein (pORF2ab) PUBMED:10963344.

    \ 7308 IPR011085 \

    The proteins in this entry are of unknown function. Members are restricted to the Alphaproteobacteria Bradyrhizobium and Sinorhizobium.

    \ 1780 IPR007817 \ This family includes DIT1, which is involved in synthesizing dityrosine PUBMED:8183942. Dityrosine is a sporulation-specific component of the Saccharomyces cerevisiae ascospore wall that is essential for the resistance of the spores to adverse environmental conditions. is involved in the biosynthesis of pyoverdine PUBMED:8704959.\ 2735 IPR000490 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 17 comprises enzymes with several known activities: endo-1,3-beta-glucosidase (); lichenase (); exo-1,3-glucanase (). Currently these enzymes have only been found in plants and in fungi.

    \ 5480 IPR008594 \ This family consists of several scavenger mRNA decapping enzymes (DcpS). DcpS is a scavenger pyrophosphatase that hydrolyses the residual cap structure following 3' to 5' decay of an mRNA. The association of DcpS with 3' to 5' exonuclease exosome components suggests that these two activities are linked and that there is a coupled exonucleolytic decay-dependent decapping pathway. The family contains a histidine triad (HIT) sequence with three histidines separated by hydrophobic residues. The central histidine within the DcpS HIT motif is critical for decapping activity and defines the HIT motif as a new mRNA decapping domain, making DcpS the first member of the HIT family of proteins with a defined biological function. This family is related to \ 3500 IPR007231 \

    The nucleoporin interacting component is part of the nuclear pore complex, which is required for protein transport between the cytoplasm and the nucleus.

    \ 563 IPR012301 \

    Malic enzymes (malate oxidoreductases) catalyse the oxidative decarboxylation of malate to form pyruvate PUBMED:, a reaction important in a number of metabolic pathways; for example, carbon dioxide released from the reaction may be used in sugar production during the Calvin cycle of photosynthesis PUBMED:8300616. There are 3 forms of the enzyme PUBMED:1993674: an NAD-dependent form that decarboxylates oxaloacetate; an NAD-dependent form that does not decarboxylate oxaloacetate; and an NADP-dependent form PUBMED:8300616. Other proteins known to be similar to malic enzymes are the Escherichia coli scfA protein; an enzyme from Zea mays (Maize), formerly thought to be cinnamyl-alcohol dehydrogenase PUBMED:2103472; and the hypothetical Saccharomyces cerevisiae protein YKL029c.
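    For reference, the two reactions mentioned above can be written out as below. This is a sketch in standard biochemical notation rather than text taken from the cited sources: the first line is the oxidative decarboxylation shared by all three forms, with NAD+ or NADP+ as the oxidant depending on the form; the second is the oxaloacetate decarboxylation carried out by one of the NAD-dependent forms. Charges and protons are omitted for simplicity.

```latex
\begin{align*}
\text{(S)-malate} + \mathrm{NAD(P)^{+}} &\longrightarrow \text{pyruvate} + \mathrm{CO_{2}} + \mathrm{NAD(P)H} \\
\text{oxaloacetate} &\longrightarrow \text{pyruvate} + \mathrm{CO_{2}}
\end{align*}
```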

    \

    Studies on the duck liver malic enzyme reveal that it can be alkylated by bromopyruvate, resulting in the loss of oxidative decarboxylation and the subsequent enhancement of pyruvate reductase activity PUBMED:1911848. The alkylated form is able to bind NADPH but not L-malate, indicating impaired substrate- or divalent metal ion-binding in the active site PUBMED:1911848. Sequence analysis has highlighted a cysteine residue as the point of alkylation, suggesting that it may play an important role in the activity of the enzyme PUBMED:1911848, although it is absent in the sequences from some species.

    \

    There are three well-conserved regions in the enzyme sequences. Two of them seem to be involved in binding NAD or NADP. The significance of the third, located in the central part of the enzymes, is not yet known.

    \ 7122 IPR009911 \

    This family consists of several insect fibroin P25 proteins. Silk fibroin produced by the silkworm Bombyx mori consists of a heavy chain, a light chain, and a glycoprotein, P25. The heavy and light chains are linked by a disulfide bond, and P25 associates with the disulfide-linked heavy and light chains by noncovalent interactions. P25 plays an important role in maintaining the integrity of the complex PUBMED:10986287.

    \ 2651 IPR005600 \

    The DNA binding domain (residues 1 to 147) of the yeast transcriptional activator GAL4 exists in solution in dimeric form, with the region responsible for dimerisation somewhere between residues 74 and 147. Experimental studies confirmed that the 'hydrophobic region' of the protein (residues 54-97, which contains a larger proportion of alpha-helix) is essential for dimerisation PUBMED:8765712.

    \ 5403 IPR008890 \ This family consists of several RfbT proteins from Vibrio cholerae. It has been found that genetic alteration of the rfbT gene is responsible for serotype conversion of V. cholerae O1 PUBMED:7688846 and determines the difference between the Ogawa and Inaba serotypes, in that the presence of rfbT is sufficient for Inaba-to-Ogawa serotype conversion PUBMED:11035750.\ 29 IPR011079 \

    Alanine racemase plays a role in providing the D-alanine required for cell wall biosynthesis by isomerising L-alanine to D-alanine. This domain is found in both prokaryotic and eukaryotic proteins PUBMED:1676385, PUBMED:7871888. The molecular structure of alanine racemase from Bacillus stearothermophilus was determined by X-ray crystallography to a resolution of 1.9 Å PUBMED:9063881. The alanine racemase monomer is composed of two domains: an eight-stranded alpha/beta barrel at the N terminus, and a C-terminal domain composed essentially of beta-strands. The pyridoxal 5'-phosphate (PLP) cofactor lies in and above the mouth of the alpha/beta barrel and is covalently linked via an aldimine linkage to a lysine residue at the C-terminus of the first beta-strand of the alpha/beta barrel.

    \ 6775 IPR009709 \

    This family consists of several bacterial small basic proteins of around 100 residues in length. The function of this family is unknown.

    \ 5794 IPR009238 \

    This family consists of several Chordopoxvirus A33R proteins. A33R plays a role in promoting Ab-resistant cell-to-cell spread of virus PUBMED:11752718 and interacts with A36R to incorporate the protein into the outer membrane of intracellular enveloped virions (IEV) PUBMED:12634370.

    \ 7881 IPR012566 \

    This family consists of the leader peptides of the ilvB operon. This region encodes a potential leader polypeptide containing 32 amino acids, 12 of which are the regulatory amino acids valine and leucine. A model for the multivalent regulation of this operon by valyl- and leucyl-tRNA is proposed on the basis of the mutually exclusive formation of five strong stem-and-loop structures in the leader mRNA PUBMED:6292893.

    \ 2381 IPR001753 \

    Enoyl-CoA hydratase () (ECH) PUBMED:2806264 and Δ3-Δ2-trans-enoyl-CoA isomerase () (ECI) PUBMED:1958319 are two enzymes involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA, and ECI shifts the Δ3 double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position.
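    The two activities described above can be summarised as the following reactions, written generically for any acyl chain. This is a sketch in standard notation, not taken from the cited sources; the (S) configuration shown for the hydratase product is that of the mitochondrial beta-oxidation intermediate, and the isomerase is shown acting on a 3-enoyl substrate with either double-bond geometry.

```latex
\begin{align*}
\text{trans-2-enoyl-CoA} + \mathrm{H_{2}O} &\rightleftharpoons \text{(S)-3-hydroxyacyl-CoA} && \text{(ECH)} \\
\text{3-enoyl-CoA (cis or trans)} &\rightleftharpoons \text{trans-2-enoyl-CoA} && \text{(ECI)}
\end{align*}
```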

    \

    Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a trifunctional enzyme PUBMED:2303409 consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity.

    \

    In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme which contains both a HCDH and a 3-hydroxybutyryl-CoA epimerase domain PUBMED:2204034.

    \

    A number of other proteins have been found to be evolutionarily related to the ECH/ECI enzymes or domains:\

    \

    \ 6868 IPR009758 \

    This family consists of several hypothetical bacterial proteins, which seem to be found exclusively in Rhizobium and Ralstonia species. Members of this family are typically around 210 residues in length and contain 5 highly conserved cysteine residues at their N terminus. The function of this family is unknown.

    \ 2658 IPR008176 \

    The following small plant proteins are evolutionarily related:

    \ \

    In their mature form, these proteins generally consist of about 45 to 50 amino-acid residues. As shown in the following schematic representation, these peptides contain eight conserved cysteines involved in disulphide bonds.

    \
    \
              +-------------------------------------------+\
              |          +-------------------+            |\
              |          |                   |            |\
            xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC\
                               |   |                | |\
                               +---|----------------+ |\
                                   +------------------+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    
    \

    The folded structure of gamma-purothionin is characterised by a well-defined 3-stranded anti-parallel beta-sheet and a short alpha-helix PUBMED:8380707. Three disulphide bridges are located in the hydrophobic core between the helix and sheet, forming a cysteine-stabilised alpha-helical motif. This structure differs from that of the plant alpha- and beta-thionins, but is analogous to that of scorpion toxins and insect defensins.
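    The cysteine schematic above can also be read programmatically. The Python sketch below encodes the pattern line exactly as drawn ('x' for any residue, 'C' for a conserved cysteine) and derives the cysteine positions, the number of residues between consecutive cysteines, and the positions joined by each disulphide bond. The bond list (1st with 8th, 2nd with 5th, 3rd with 6th and 4th with 7th cysteine) is this sketch's own reading of the connecting lines in the diagram, and all positions refer to the schematic rather than to any particular protein. The names SCHEMATIC and BOND_ORDINALS and the helper functions are illustrative only.

```python
from typing import List, Tuple

# Pattern line copied from the schematic: 'x' = any residue, 'C' = conserved Cys.
SCHEMATIC = "xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC"

# Disulphide connectivity as read off the diagram's connecting lines, given as
# (i, j) cysteine ordinals counted from the N-terminal end (1 = first Cys).
# This reading of the diagram is an assumption of the sketch.
BOND_ORDINALS: List[Tuple[int, int]] = [(1, 8), (2, 5), (3, 6), (4, 7)]


def cysteine_positions(schematic: str) -> List[int]:
    """Return the 1-based positions of every 'C' in the schematic."""
    return [i + 1 for i, ch in enumerate(schematic) if ch == "C"]


def spacing(schematic: str) -> List[int]:
    """Return the number of residues separating consecutive cysteines."""
    cys = cysteine_positions(schematic)
    return [b - a - 1 for a, b in zip(cys, cys[1:])]


def bonded_positions(schematic: str,
                     bond_ordinals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Translate each (ordinal, ordinal) bond into 1-based schematic positions."""
    cys = cysteine_positions(schematic)
    return [(cys[i - 1], cys[j - 1]) for i, j in bond_ordinals]


if __name__ == "__main__":
    print("length:", len(SCHEMATIC))
    print("cysteines:", cysteine_positions(SCHEMATIC))
    print("spacing:", spacing(SCHEMATIC))
    print("bonds:", bonded_positions(SCHEMATIC, BOND_ORDINALS))
```

    Run as a script, it reports a 47-position pattern with eight cysteines, which fits the stated mature length of about 45 to 50 residues. Real family members vary in their exact spacing, so the numbers describe the drawn schematic, not a sequence signature.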

    \ 5159 IPR007996 \

    This family consists of several uncharacterised calicivirus proteins.

    \ 1689 IPR007325 \ Proteins in this family are thought to be cyclase enzymes. They are found among proteins involved in antibiotic synthesis. However, they are also found in organisms that do not make antibiotics, pointing to a wider role for these proteins. The proteins contain a conserved motif, HXGTHXDXPXH, that is likely to form part of the active site.\ 2523 IPR001664 \

    Intermediate filaments (IF) PUBMED:8771189, PUBMED:3052284, PUBMED:2183847 are proteins which are primordial components of the cytoskeleton and the nuclear envelope. They generally form filamentous structures 8 to 14 nm wide.

    \

    IF proteins are members of a very large multigene family of proteins, which has been subdivided into five major subgroups:\

    \

    All IF proteins are structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues, which is arranged in coiled-coil alpha-helices with at least two short characteristic interruptions; an N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail), which is also non-helical and shows extreme length variation between different IF proteins.

    \

    While IF proteins are evolutionarily and structurally related, they have limited sequence similarity except in several regions of the rod domain.

    \ \ 6655 IPR010663 \

    This zinc binding domain is found at the C terminus of isoleucyl tRNA synthetase and the enzyme formamidopyrimidine-DNA glycosylase.

    \ 524 IPR000794 \ Beta-ketoacyl-ACP synthase () (KAS) PUBMED:3076376 is the enzyme that catalyzes the condensation of malonyl-ACP with the growing fatty acid chain. It is found as a component of a number of enzymatic systems, including fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH; the multi-functional 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum PUBMED:2209605, which is involved in the biosynthesis of a polyketide antibiotic; polyketide antibiotic synthase enzyme systems; Emericella nidulans multifunctional protein Wa, which is involved in the biosynthesis of conidial green pigment; Rhizobium nodulation protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl chain; and yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: first, the acyl component of an activated acyl primer is transferred to a cysteine residue of the enzyme, and it is then condensed with an activated malonyl donor with the concomitant release of carbon dioxide.\ 7357 IPR006562 \

    This domain of unknown function is found in helicases and other DNA-binding proteins of eukaryotes.

    \ 2700 IPR000792 \

    This domain is a DNA-binding, helix-turn-helix (HTH) domain of about 65 amino acids, present in transcription regulators of the LuxR/FixJ family of response regulators. The domain is named after Vibrio fischeri luxR, a transcriptional activator for quorum-sensing control of luminescence. LuxR-type HTH domain proteins occur in a variety of organisms. The DNA-binding HTH domain is usually located in the C-terminal part; the N-terminal part can contain an autoinducer binding domain or a response regulatory domain. Most luxR-type regulators act as transcription activators, but some can be repressors or have a dual role for different sites. LuxR-type HTH regulators control a wide variety of activities in various biological processes.

    \ \

    Several structures of luxR-type HTH proteins have been resolved and show that the DNA-binding domain is formed by a four-helix bundle. The helix-turn-helix motif comprises the second and third helices, which are called the scaffold and the recognition helix, respectively. The HTH binds DNA in the major groove, where the N-terminal part of the recognition helix makes most DNA contacts. The fourth helix is involved in dimerization of gerE and traR. Signalling events by one of the four activation mechanisms described below lead to multimerization of the regulator. The regulators bind DNA as multimers PUBMED:11243786, PUBMED:12740396, PUBMED:12087407.

    \ \

    LuxR-type HTH proteins can be activated by one of four different mechanisms:

    \ \

    I. Regulators which belong to a two-component sensory transduction system where the protein is activated by its phosphorylation, generally on an aspartate residue, by a transmembrane kinase PUBMED:12352954, PUBMED:12162958. Some proteins that belong to this category are:

    \
  • Rhizobiaceae fixJ, a global regulator inducing the expression of nitrogen-fixation genes in microaerobiosis.
  • Escherichia coli and Salmonella typhimurium uhpA, activates the uhpT gene for hexose phosphate transport.
  • Escherichia coli narL and narP, activate the nitrate reductase operon.
  • Enterobacteria rcsB, involved in the regulation of exopolysaccharide biosynthesis in enteric and plant pathogenesis.
  • Bordetella pertussis bvgA, plays a role in virulence.
  • Bacillus subtilis comA, plays a role in the expression of late-expressing competence genes.

    II. Regulators which are activated, or in very rare cases repressed, when bound to N-acyl homoserine lactones, which are used as quorum sensing molecules in a variety of Gram-negative bacteria PUBMED:15255890:

    \
  • Vibrio fischeri luxR, activates the bioluminescence operon.
  • Agrobacterium tumefaciens traR, involved in the regulation of Ti plasmid transfer.
  • Erwinia carotovora carR, plays a role in the control of the biosynthesis of carbapenem antibiotics.
  • Erwinia carotovora expR, acts in virulence (soft rot disease) through the activation of genes for plant tissue macerating enzymes.
  • Pseudomonas aeruginosa lasR, activates the elastase gene (lasB).
  • Erwinia chrysanthemi echR and Erwinia stewartii esaR.
  • Pseudomonas aureofaciens phzR, a positive regulator of phenazine antibiotic production.
  • Pseudomonas aeruginosa rhlR, activates the rhlAB operon as well as the lasB gene.

    III. Autonomous effector domain regulators, without a regulatory domain, represented by gerE PUBMED:11243786.

    \
  • Bacillus subtilis gerE, a transcription activator and repressor for the regulation of spore formation.

    IV. Multiple ligand-binding regulators, exemplified by malT PUBMED:11931562.

    \
  • Escherichia coli malT, activates the maltose operon. MalT binds ATP and maltotriose.
    \ 4178 IPR000196 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Three genes from the spc operon in the archaeon (Crenarchaeota) Sulfolobus acidocaldarius, coding for ribosomal proteins S4E, L32E, and L19E (equivalent to rat ribosomal proteins S4, L32, and L19), were sequenced and the structure of the putative proteins was determined. The order of the ribosomal protein genes in the spc operon of the Crenarchaeota kingdom of archaea is identical to that present in the euryarchaeota kingdom of archaea. The genes for r-proteins S4E, L32E, and L19E are absent in bacteria. The archaeal r-proteins showed substantial identity to their eukaryotic equivalents, but in all cases the archaeal proteins formed a separate group from the eukaryotic proteins PUBMED:10381320.

    \ 4942 IPR002166 \ The RNA-dependent RNA polymerase of hepatitis C virus (HCV) is also known as non-structural protein NS5B. NS5B is a 65 kDa protein that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane-bound replication complexes. These complexes transcribe the positive strand, and the resulting minus strand is used as a template for the synthesis of genomic RNA. Two viral proteins, NS3 and NS5B, are involved in this reaction PUBMED:9343198, PUBMED:8598194, PUBMED:9514871.\ 7858 IPR012512 \

    The albumin I protein, a hormone-like peptide, stimulates kinase activity upon binding a membrane-bound 43 kDa receptor. The structure of this region reveals a knottin-like fold comprising three beta strands PUBMED:12631285.

    \ 8002 IPR012590 \

    This domain is found in POP1-like nucleolar proteins PUBMED:15112237.

    \ 1615 IPR013124 \

    The connexins are a family of integral membrane proteins that oligomerise to form intercellular channels that are clustered at gap junctions. These channels are specialised sites of cell-cell contact that allow the passage of ions, intracellular metabolites and messenger molecules (with molecular weight less than 1-2 kDa) from the cytoplasm of one cell to its opposing neighbours. They are found in almost all vertebrate cell types, and somewhat similar proteins have been cloned from plant species. Invertebrates utilise a different family of molecules, innexins, that share a similar predicted secondary structure with the vertebrate connexins but have no sequence identity to them PUBMED:9769729.

    \ \

    Vertebrate gap junction channels are thought to participate in diverse biological functions. For instance, in the heart they permit the rapid cell-cell transfer of action potentials, ensuring coordinated contraction of the cardiomyocytes. They are also responsible for neurotransmission at specialised 'electrical' synapses. In non-excitable tissues, such as the liver, they may allow metabolic cooperation between cells. In the brain, glial cells are extensively coupled by gap junctions; this allows waves of intracellular Ca2+ to propagate through nervous tissue and may contribute to the ability of glia to spatially buffer local changes in extracellular K+ concentration PUBMED:7685944.

    \ \

    The connexin protein family is encoded by at least 13 genes in rodents, with many homologues cloned from other species. They show overlapping tissue expression patterns, with most tissues expressing more than one connexin type. Their conductance, permeability to different molecules, phosphorylation and voltage-dependent gating have been found to vary. The potential for communication diversity is increased further because gap junctions may be formed by the association of different connexin isoforms from apposing cells. However, in vitro studies have shown that not all possible combinations of connexins produce active channels PUBMED:8811187, PUBMED:8608591.

    \ \

    Hydropathy analysis predicts that all cloned connexins share a common transmembrane (TM) topology. Each connexin is thought to contain 4 TM domains, with two extracellular and three cytoplasmic regions. This model has been validated for several of the family members by in vitro biochemical analysis. Both N- and C-termini are thought to face the cytoplasm, and the third TM domain has an amphipathic character, suggesting that it contributes to the lining of the channel. Amino acid sequence identity between the isoforms is ~50-80%, with the TM domains being well conserved. Both extracellular loops contain characteristically conserved cysteine residues, which likely form intramolecular disulphide bonds. By contrast, the single putative intracellular loop (between TM domains 2 and 3) and the cytoplasmic C-terminus are highly variable among the family members. Six connexins are thought to associate to form a hemi-channel, or connexon. Two connexons then interact (likely via the extracellular loops of their connexins) to form the complete gap junction channel.

    \ \
     \
           NH2-***        ***        *************-COOH\
                 **     **   **      **\
                 **    **     **    **   Cytoplasmic\
              ---**----**-----**----**----------------\
                 **    **     **    **   Membrane\
                 **    **     **    **\
              ---**----**-----**----**----------------\
                 **    **     **    **   Extracellular\
                  **  **       **  **\
                    **           **\
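
    The hydropathy analysis mentioned above can be illustrated with a sliding-window profile. The scale (Kyte-Doolittle), the 19-residue window and the 1.6 cut-off are assumptions chosen here as commonly used defaults; the text does not say which method was applied to the connexins, and the toy sequence is synthetic.

```python
# Kyte-Doolittle hydropathy scale (assumed choice of scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers residues i .. i + window - 1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def predict_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of above-threshold windows into (start, end) residue ranges, end exclusive."""
    segments = []
    run_start = None
    # A -inf sentinel closes a run that extends to the last window.
    for i, score in enumerate(hydropathy_profile(seq, window) + [float("-inf")]):
        if score > threshold and run_start is None:
            run_start = i
        elif score <= threshold and run_start is not None:
            # The last window of the run (i - 1) ends at residue i - 1 + window - 1.
            segments.append((run_start, i - 1 + window))
            run_start = None
    return segments


# Synthetic sequence: two hydrophobic stretches separated by charged residues.
toy = "D" * 20 + "L" * 25 + "D" * 20 + "L" * 25 + "D" * 20
print(predict_tm_segments(toy))  # [(15, 50), (60, 95)]
```

    Applied to a real connexin sequence, a profile of this kind would be expected to show four hydrophobic peaks, matching the 4-TM topology drawn above; the window-edge effects visible in the toy output are why predicted boundaries are only approximate.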
    
    \ \

    Gap junction alpha-1 protein (also called connexin43, or Cx43) is a connexin of 381 amino acid residues (human isoform) that is widely expressed in several organs and cell types, and is the principal gap junction protein of the heart. Characterisation of genetically engineered mice that lack Cx43, and of human patients with spontaneously occurring mutations in the gene encoding it (GJA1), suggests that Cx43 is essential for the development of normal cardiac architecture and ventricular conduction. Mice lacking Cx43 survive to term but die shortly after birth. They have cardiac malformations that obstruct the pulmonary artery, causing neonatal cyanosis and subsequent death. This phenotype is reminiscent of some forms of pulmonary artery stenosis. Human subjects with visceroatrial heterotaxia (a heart disorder characterised by arterial defects) have been found to have point mutations in the Cx43-encoding gene that disrupt a potential phosphorylation site within the C-terminus. Consequently, although these mutant Cx43 molecules still form functional gap junction channels, their response to protein kinase activation is impaired.

    \ \

    This domain is found in the C-terminal region of these proteins.

    \ 6305 IPR010512 \

    This family consists of several Drosophila melanogaster specific proteins. The function of this family is unknown.

    \ 3866 IPR001829 \

    Most Gram-negative bacteria possess pili, supramolecular structures on their surface that mediate attachment to specific receptors. Many interacting subunits are required to assemble pili, but assembly takes place only after the subunits have been translocated across the cytoplasmic membrane.

    \

    Periplasmic chaperones assist pilus assembly by binding to the subunits, thereby preventing premature aggregation PUBMED:8670884, PUBMED:1683764. This family of chaperones is structurally, and possibly evolutionarily, related to the immunoglobulin superfamily PUBMED:1348692: its members contain two globular domains, each with a topology identical to an immunoglobulin fold.

    \ 3732 IPR005083 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
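
    The naming scheme above, in which the first character of a family or clan identifier gives the catalytic type, can be captured as a small lookup. This is a hedged sketch: the function name `classify` and the identifier grammar it accepts (a letter followed by digits with an optional subfamily letter for families, or two letters for clans) are assumptions for illustration, not the MEROPS database interface.

```python
import re

# Catalytic-type letters as listed in the text.
CATALYTIC_TYPE = {
    "A": "aspartic", "C": "cysteine", "G": "glutamic acid",
    "M": "metallo", "S": "serine", "T": "threonine", "U": "unknown",
    "P": "mixed (clans only)",
}
# Types whose nucleophile is an amino-acid side chain (acyl intermediate formed).
SIDE_CHAIN_NUCLEOPHILE = {"S", "T", "C"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def classify(identifier: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for a family such as 'M18' or a clan such as 'CE'."""
    m = re.fullmatch(r"([ACGMSTUP])(\d+[A-Z]?|[A-Z])", identifier.upper())
    if m is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    letter = m.group(1)
    if letter in SIDE_CHAIN_NUCLEOPHILE:
        nucleophile = "amino-acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not defined by the type letter"
    return CATALYTIC_TYPE[letter], nucleophile


print(classify("C55"))  # ('cysteine', 'amino-acid side chain (acyl intermediate)')
print(classify("MH"))   # ('metallo', 'activated water molecule')
```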

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This is a group of cysteine peptidases which constitute MEROPS peptidase family C55 (clan CE). The type example is the YopJ protease of Yersinia pseudotuberculosis.

    \ 7375 IPR011508 \

    This domain is found in three copies at the N terminus of the Caenorhabditis elegans RSD-2 protein. RSD-2 (RNAi spreading defective) is involved in systemic RNAi PUBMED:14738731. Mutations in the rsd-2 gene affect RNAi of germline-expressed genes but not of somatic genes PUBMED:14738731.

    \ 2842 IPR000038 \ Members of this family are involved in cell division and bind GTP.\ Members of this family include the cell division control proteins CDC3, CDC10, CDC11 and CDC12/Septin and some uncharacterised proteins involved in cytokinesis.\ 7975 IPR012956 \

    This N-terminal domain is found in CARG-binding factor A-like proteins PUBMED:15112237.

    \ 7573 IPR002087 \ A family of anti-proliferative proteins has been shown to include mammalian and avian BTG1 (which appears to be involved in negative regulation of cell proliferation) and rat/mouse NGF-inducible protein PC3/TIS21 (BTG2) PUBMED:1373383, PUBMED:8325512, PUBMED:1849653. These proteins, which range from 158 to 363 amino acid residues, are highly similar and include 3 conserved cysteine residues. BTG2 seems to have a signal sequence, whereas the other proteins may lack such a domain. The sequence of the N-terminal half of these proteins is well conserved.\ 1954 IPR004859 \ Signatures of this entry align residues towards the N-terminus of several proteins with multiple functions. The members of this family all appear to possess 5'-3' exonuclease activity EC:3.1.11.-, so the aligned region may be necessary for 5'-3' exonuclease function. \ \ 3945 IPR005057 \

    This is a protein family of unknown function.

    \ 2202 IPR007518 \ This is a eukaryotic protein of unknown function.\ 3011 IPR001845 \

    Bacterial transcription regulatory proteins that bind DNA via a helix-turn-helix (HTH) motif can be grouped into families on the basis of sequence similarities. One such group, termed arsR, includes several proteins that appear to dissociate from DNA in the presence of metal ions: arsR, which functions as a transcriptional repressor of an arsenic resistance operon; smtB from Synechococcus PCC 7942, which acts as a transcriptional repressor of the smtA gene that codes for a metallothionein; cadC, a protein required for cadmium-resistance; and hypothetical protein yqcJ from Bacillus subtilis.

    \

    The HTH motif is thought to be located in the central part of these proteins PUBMED:8451191. The motif is characterised by a number of well-conserved residues: at its N-terminal extremity is a cysteine residue; a second Cys is found in arsR and cadC, but not in smtB; and at the C-terminus lie one or two histidines. These residues may be involved in metal binding (Zn in smtB; metal oxyanions such as arsenite, antimonite and arsenate for arsR; and cadmium for cadC) PUBMED:8506147. It is believed that binding of a metal ion could induce a conformational change that would prevent the protein from binding DNA PUBMED:8506147.

    \

    The crystal structure of the cyanobacterial smtB shows a fold of five alpha-helices (H) and a pair of antiparallel beta-strands (B) in the topology H1-H2-H3-H4-B1-B2-H5. Helices 3 and 4 comprise the helix-turn-helix motif, and the beta-sheet is called the wing, as in other winged HTH (wHTH) proteins such as the dtxR-type or the merR-type. Helix 4 is termed the recognition helix; as in other HTHs, it binds the DNA major groove. Most arsR/smtB-like metalloregulators form homodimers PUBMED:14568530. The dimer interface is formed by helix 5 and an N-terminal part PUBMED:9466913. Two distinct metal-binding sites have been identified. The first site comprises cysteine thiolates located in helix 3 of the HTH and, in some cases, in the N-terminus; it is called the alpha3(N) site PUBMED:8506147. The second metal-binding site is located in helix 5 (and the C-terminus) and is called the alpha5(C) site. The alpha3(N) site binds large thiophilic, toxic metals including Cd, Pb, and Bi, as in S. aureus cadC. ArsR lacks the N-terminal arm, and its alpha3 site coordinates smaller thiophilic ions such as As and Sb. The alpha5(C) site contains carboxylate and imidazole ligands and interacts preferentially with biologically required metal ions including Zn, Co, and Ni. ArsR-type metalloregulators contain one of these sites, both, or other potential metal-binding sites PUBMED:12829264, PUBMED:14960585. Binding of metal ions to these sites leads to allosteric changes that can derepress the operator/promoter DNA. The metal-inducible operons contain one or two imperfect 12-2-12 inverted repeats, which can be recognised by multimeric arsR-type metalloregulators.
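
    The 12-2-12 inverted repeats mentioned above can be scanned for by comparing each 12-bp left arm with the reverse complement of the right arm across a 2-bp spacer. This is a minimal sketch: the mismatch tolerance (default 3) standing in for "imperfect", the function names and the test site are all hypothetical and are not taken from the cited work.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def arm_mismatches(site: str, arm: int = 12, spacer: int = 2) -> int:
    """Mismatches between the left arm and the reverse complement of the right arm."""
    if len(site) != 2 * arm + spacer:
        raise ValueError(f"site must be {2 * arm + spacer} nt long")
    left = site[:arm]
    right = site[arm + spacer:]
    return sum(a != b for a, b in zip(left, reverse_complement(right)))


def scan_inverted_repeats(dna: str, max_mismatches: int = 3,
                          arm: int = 12, spacer: int = 2) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every window that forms an imperfect arm-spacer-arm repeat."""
    dna = dna.upper()
    width = 2 * arm + spacer
    hits = []
    for start in range(len(dna) - width + 1):
        mm = arm_mismatches(dna[start:start + width], arm, spacer)
        if mm <= max_mismatches:
            hits.append((start, mm))
    return hits


# Synthetic perfect 12-2-12 site: left arm + 2-nt spacer + reverse complement of left arm.
left_arm = "ACGTTTACAAGC"
site = left_arm + "AT" + reverse_complement(left_arm)
print(arm_mismatches(site))                                     # 0
print(scan_inverted_repeats("GGGGG" + site + "GGGGG", max_mismatches=0))
```

    Because a dimeric (or multimeric) regulator binds the two arms symmetrically, scoring the arms against each other rather than against a fixed consensus is one simple way to capture the palindromic character of these operators.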

    \ 4264 IPR001663 \ Bacterial ring-hydroxylating dioxygenases are multicomponent 1,2-dioxygenase complexes that convert closed-ring structures to non-aromatic cis-diols PUBMED:1885518. The complex has both hydroxylase and electron transfer components. The hydroxylase component is itself composed of two subunits: an alpha-subunit of about 50 kDa and a beta-subunit of about 20 kDa. The electron transfer component is composed either of two subunits, a ferredoxin and a ferredoxin reductase, or of a single bifunctional ferredoxin/reductase subunit. Sequence analysis of hydroxylase subunits of ring-hydroxylating systems (including toluene, benzene and naphthalene 1,2-dioxygenases) suggests that they are derived from a common ancestor PUBMED:1885518. The alpha-subunit binds both a Rieske-like 2Fe-2S cluster and an iron atom: conserved Cys and His residues in the N-terminal region may provide 2Fe-2S ligands, while conserved His and Tyr residues may coordinate the iron. The beta-subunit may be responsible for the substrate specificity of the dioxygenase system PUBMED:1885518.\ 690 IPR001948 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
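
    The HEXXH and abXHEbbHbc definitions above translate naturally into regular expressions. In this sketch the residue classes are assumptions: "uncharged" is taken as any residue other than D, E, K, R, H or P, "hydrophobic" as A, V, L, I, M, F or W, position 'a' is restricted to V or T (the text says only "most often"), and proline is excluded from every variable position, following the note that it is never found in this site. Lookahead groups let overlapping motifs all be reported; the test sequences are synthetic.

```python
import re

# Assumed residue classes (simplifications, not an official definition).
UNCHARGED = "ACFGILMNQSTVWY"  # excludes D, E, K, R, H and P
HYDROPHOBIC = "AVLIMFW"
A_POSITION = "VT"

# Loose motif: H-E-X-X-H with no proline at the X positions.
HEXXH = re.compile(r"(?=(HE[^P]{2}H))")
# Stringent motif: a-b-X-H-E-b-b-H-b-c.
STRINGENT = re.compile(
    rf"(?=([{A_POSITION}][{UNCHARGED}][^P]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif in a protein sequence."""
    seq = seq.upper()
    return {
        "HEXXH": [m.start() for m in HEXXH.finditer(seq)],
        "abXHEbbHbc": [m.start() for m in STRINGENT.finditer(seq)],
    }


print(find_motifs("GGVVAHELTHAVGG"))  # {'HEXXH': [5], 'abXHEbbHbc': [2]}
print(find_motifs("GGVVAHEPTHAVGG"))  # {'HEXXH': [], 'abXHEbbHbc': []}
```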

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M18 (clan MH). The proteins have two catalytic zinc ions at the active site, bound by His/Asp, Asp, Glu, Asp/Glu and His. The catalysed reaction is the release of an N-terminal amino acid, usually neutral or hydrophobic, from a polypeptide PUBMED:7674922.

    \ \

    The type example is aminopeptidase I from Saccharomyces cerevisiae, whose sequence has been deduced and whose mature protein has been shown to consist of 469 amino acids PUBMED:2651436. A 45-residue presequence contains positively charged, negatively charged and hydrophobic residues, which could be arranged in an N-terminal amphiphilic alpha-helix PUBMED:2651436. The presequence differs from signal sequences that direct proteins across bacterial plasma membranes and the endoplasmic reticulum or into mitochondria. It is unclear how this unique presequence targets aminopeptidase I to yeast vacuoles, and how this sorting utilises classical protein secretory pathways PUBMED:2651436.

    \ 6697 IPR009667 \

    This family represents a conserved region approximately 260 residues long within a number of hypothetical proteins of unknown function that seem to be specific to Caenorhabditis elegans. Note that this family contains a number of conserved cysteine and histidine residues.

    \ 4581 IPR004832 \ Two related oncogenes, TCL-1 and MTCP-1, are overexpressed in T cell prolymphocytic leukemias as a result of chromosomal rearrangements that involve the translocation of one T cell receptor gene to either chromosome 14q32 or Xq28 PUBMED:9520380.\ 5382 IPR008720 \ This family consists of several viral hemorrhagic septicemia virus non-virion (NV) proteins. The NV protein is a nonstructural protein that is absent from mature virions but present in infected cells. The function of this protein is unknown PUBMED:7571446.\ 6044 IPR009347 \

    This family consists of several Rice tungro bacilliform virus P46 proteins. The function of this family is unknown.

    \ 5752 IPR009224 \

    This short region is found repeated in the mid region of the adenomatous polyposis coli (APC) proteins. This motif binds axin PUBMED:9823329.

    \ 3654 IPR005310 \

    PapG, the adhesin of the P pili, is situated at the tip and is only a minor component of the whole pilus structure. A two-domain structure has been postulated for PapG: a carbohydrate-binding N-terminal domain (this domain) and a chaperone-binding C-terminal domain. The carbohydrate-binding domain interacts with the receptor glycan PUBMED:11454740, PUBMED:11440716.

    \ 3932 IPR006920 \ This is a family of Chordopoxvirus A9 proteins.\ 2018 IPR002678 \ One member of this family, NIF3 (NGG1p interacting factor 3), interacts with the yeast transcriptional coactivator NGG1p, which is part of the ADA complex; the exact function of this interaction is unknown PUBMED:8663102.\ 4610 IPR003166 \ General transcription factor TFIIE consists of two subunits, TFIIE alpha and TFIIE beta. TFIIE beta has been found to bind to the region where the promoter begins to open and become single-stranded upon transcription initiation by RNA polymerase II. The structure of the DNA-binding core region has been solved PUBMED:10716934 and has a winged helix fold.\ 4577 IPR002212 \

    The transforming growth factor beta (TGF-beta)-binding protein-like (TB) domain comes from human fibrillin-1 PUBMED:8364578. This domain is found in fibrillins and in latent TGF-beta-binding proteins (LTBPs), which are localised to fibrillar structures in the extracellular matrix PUBMED:9362480.

    \ 3553 IPR001062 \ Bacterial transcription antitermination protein, nusG, is a component of the transcription complex and interacts with the termination factor rho and RNA polymerase PUBMED:8422985, PUBMED:1532577. NusG is a bacterial transcriptional elongation factor involved in transcription termination and anti-termination PUBMED:7505669.\ 6634 IPR009634 \

    This family consists of several putative phage excisionase proteins of around 80 residues in length.

    \ 3509 IPR006067 \

    Sulfite reductases (SiRs) and related nitrite reductases (NiRs) catalyse the six-electron reduction of sulfite to sulfide and of nitrite to ammonia, respectively. The Escherichia coli SiR enzyme is a complex composed of two proteins, a flavoprotein alpha-component (SiR-FP) and a hemoprotein beta-component (SiR-HP), and has an alpha(8)beta(4) quaternary structure PUBMED:10984484. SiR-FP contains both FAD and FMN, while SiR-HP contains a Fe(4)S(4) cluster coupled to a siroheme through a cysteine bridge. Electrons are transferred from NADPH to FAD, and on to FMN in SiR-FP, from which they are transferred to the metal centre of SiR-HP, where they reduce the siroheme-bound sulfite.
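
    The "six-electron" count follows from the change in oxidation state of the central atom (S: +4 to -2; N: +3 to -3). As a worked check, balanced half-reactions can be written as below. The product protonation states (H2S, NH4+) are assumptions that depend on pH, and the last line assumes the usual NADPH to NADP+ + H+ + 2e- bookkeeping, which implies three NADPH per sulfite reduced.

```latex
\begin{aligned}
\mathrm{SO_3^{2-}} + 8\,\mathrm{H^+} + 6\,e^- &\rightarrow \mathrm{H_2S} + 3\,\mathrm{H_2O}
  && (\mathrm{S}:\ +4 \rightarrow -2)\\
\mathrm{NO_2^{-}} + 8\,\mathrm{H^+} + 6\,e^- &\rightarrow \mathrm{NH_4^{+}} + 2\,\mathrm{H_2O}
  && (\mathrm{N}:\ +3 \rightarrow -3)\\
\mathrm{SO_3^{2-}} + 3\,\mathrm{NADPH} + 5\,\mathrm{H^+} &\rightarrow \mathrm{H_2S} + 3\,\mathrm{NADP^+} + 3\,\mathrm{H_2O}
\end{aligned}
```

    Each line balances in atoms and charge; for example, the last line carries a net charge of +3 on both sides.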

    \

    SiR-HP has two-fold symmetry, which generates a distinctive three-domain alpha/beta fold that controls assembly and reactivity PUBMED:7569952. In the E. coli SiR-HP enzyme, the Fe(4)S(4) cluster is bound by cysteine residues at positions 433, 439, 478 and 482, the last of which also forms the bridging ligand to the siroheme.

    \ 2502 IPR000522 \ This is a subfamily of the bacterial binding-protein-dependent transport system family, and includes transport system permease proteins involved in the transport of several compounds across the membrane. This entry contains the inner membrane components of this multicomponent transport system.\ \ 7100 IPR009897 \

    This family consists of several Orthoreovirus P17 proteins. P17 is encoded by ORF2 of the S1 gene and is a nonstructural protein that associates with cell membranes PUBMED:11883183.

    \ 2722 IPR007494 \

    Glutaredoxins are a multifunctional family of glutathione-dependent disulphide oxidoreductases PUBMED:14713336. Unlike other glutaredoxins, glutaredoxin 2 (Grx2) cannot reduce ribonucleotide reductase. Grx2 has significantly higher catalytic activity than other glutaredoxins in the reduction of mixed disulphides with glutathione (GSH). The active site residues (Cys9-Pro10-Tyr11-Cys12 in Escherichia coli Grx2), which are found at the interface between the N- and C-terminal domains, are identical to those of other glutaredoxins, but there is no other similarity between glutaredoxin 2 and other glutaredoxins. Grx2 is structurally similar to glutathione-S-transferases (GST), but there is no obvious sequence similarity. The inter-domain contacts are mainly hydrophobic, suggesting that the two domains are unlikely to be stable on their own. Both domains are needed for correct folding and activity of Grx2. It is thought that the primary function of Grx2 is to catalyse reversible glutathionylation of proteins with GSH in cellular redox regulation, including the response to oxidative stress. The N-terminal domain is .

    \ 3770 IPR005321 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
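
    The point that the linear order of the triad residues reflects clan membership can be made concrete by sorting the catalytic residues by sequence position. The chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221) numberings used below are the standard ones; the SC positions are illustrative placeholders, and the function name is hypothetical.

```python
# Triad order -> clan, as stated in the text.
ORDER_TO_CLAN = {
    "HDS": "SA (chymotrypsin)",
    "DHS": "SB (subtilisin)",
    "SDH": "SC (carboxypeptidase C)",
}


def triad_order(positions: dict[str, int]) -> str:
    """One-letter catalytic residues (H, D, S) sorted by their sequence position."""
    return "".join(sorted(positions, key=positions.__getitem__))


chymotrypsin = {"H": 57, "D": 102, "S": 195}  # chymotrypsin numbering
subtilisin = {"D": 32, "H": 64, "S": 221}     # subtilisin BPN' numbering
sc_example = {"S": 150, "D": 340, "H": 400}   # illustrative positions only

for name, triad in [("chymotrypsin", chymotrypsin), ("subtilisin", subtilisin), ("SC example", sc_example)]:
    order = triad_order(triad)
    print(name, order, ORDER_TO_CLAN[order])
```

    Sorting by position rather than storing the order directly keeps the check usable for any protein whose catalytic residues have been located, for example from a structure-based alignment.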

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S58 (DmpA aminopeptidase family, clan PB(S)). The protein fold of the peptidase unit for members of this family resembles that of the archaeal proteasome subunit B, the type example of clan PB. The type example is aminopeptidase DmpA from Ochrobactrum anthropi. This family also contains proteins that have been found experimentally to be without peptidase activity, or that lack amino acid residues believed to be essential for the catalytic activity of peptidases in the family.

    \

    L-aminopeptidase D-Ala-esterase/amidase (DmpA) from Ochrobactrum anthropi releases N-terminal L- and/or D-Ala residues from peptide substrates. This is the only known enzyme to liberate N-terminal amino acids with both D and L stereospecificity. The active form of DmpA is an alpha/beta heterodimer, which results from a putative autocatalytic cleavage of an inactive precursor polypeptide. DmpA shows structural homology to N-terminal nucleophile (Ntn) hydrolase family members and may work by a similar catalytic mechanism, although the arrangement of their secondary structure elements differs significantly PUBMED:10673442.

    \ 4244 IPR000592 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes mammalian, yeast, Chlamydomonas reinhardtii and Entamoeba histolytica S27, and Methanococcus jannaschii MJ0250 PUBMED:8441676. These proteins range from 62 to 87 amino acids in length. They contain, in their central section, a putative zinc-finger region of the type C-x(2)-C-x(14)-C-x(2)-C.
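
    The zinc-finger pattern C-x(2)-C-x(14)-C-x(2)-C is written in PROSITE-style syntax. A minimal converter to a Python regular expression is sketched below. It handles only single residues and x, x(n) and x(n,m) elements; the full PROSITE syntax also includes [..], {..} and terminal anchors, which this sketch rejects. The function name and the test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (residues and x, x(n), x(n,m)) to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if wildcard:
            low, high = wildcard.group(1), wildcard.group(2)
            if low is None:
                parts.append(".")                    # x: any single residue
            elif high is None:
                parts.append(f".{{{low}}}")          # x(n): exactly n residues
            else:
                parts.append(f".{{{low},{high}}}")   # x(n,m): n to m residues
        elif re.fullmatch(r"[A-Z]", element):
            parts.append(element)                    # a fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return "".join(parts)


zinc_finger = prosite_to_regex("C-x(2)-C-x(14)-C-x(2)-C")
print(zinc_finger)  # C.{2}C.{14}C.{2}C

# Synthetic sequence containing one matching zinc-finger-like segment.
toy = "MK" + "CAAC" + "A" * 14 + "CAAC" + "KK"
match = re.search(zinc_finger, toy)
print(match.start() if match else None)  # 2
```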

    \ 3006 IPR001404 \

    Prokaryotes and eukaryotes respond to heat shock and other forms of environmental stress by inducing synthesis of heat-shock proteins (hsp) PUBMED:2853609. The 90 kDa heat shock protein, Hsp90, is one of the most abundant proteins in eukaryotic cells, comprising 1-2% of cellular proteins under non-stress conditions PUBMED:15069952. Its contribution to various cellular processes, including signal transduction, protein folding, protein degradation and morphological evolution, has been extensively studied PUBMED:8419347, PUBMED:7914036. The full functional activity of Hsp90 is gained in concert with other co-chaperones, and Hsp90 plays an important role in the folding of newly synthesised proteins and in the stabilisation and refolding of denatured proteins after stress. Apart from its co-chaperones, Hsp90 binds to an array of client proteins, where the co-chaperone requirement varies and depends on the actual client.

    The sequences of hsp90s show a distinctive domain structure, with a highly conserved N-terminal domain separated from a conserved, acidic C-terminal domain by a highly acidic, flexible linker region.

    \ 4071 IPR002129 \ A number of pyridoxal-dependent decarboxylases share regions of sequence similarity, particularly in the vicinity of a conserved lysine residue, which provides the attachment site for the pyridoxal-phosphate (PLP) group PUBMED:8181483, PUBMED:2124279. Among these enzymes are aromatic-L-amino-acid decarboxylase (L-dopa decarboxylase or tryptophan decarboxylase), which catalyses the decarboxylation of tryptophan to tryptamine PUBMED:8889823; tyrosine decarboxylase, which converts tyrosine into tyramine; and histidine decarboxylase, which catalyses the decarboxylation of histidine to histamine PUBMED:2300558. These enzymes belong to the group II decarboxylases PUBMED:8181483, PUBMED:8889823.\ 4802 IPR006902 \ The long distance movement protein of Umbraviruses mediates the movement of viral RNA through the phloem of infected plants PUBMED:11601910.\ 1742 IPR001855 \

    Defensins are 2-6 kDa cationic microbicidal peptides, containing three pairs of intramolecular disulphide bonds, that are active against many Gram-negative and Gram-positive bacteria, fungi and enveloped viruses PUBMED:8528769. On the basis of their size and pattern of disulphide bonding, mammalian defensins are classified into alpha, beta and theta categories. Every mammalian species explored thus far has beta-defensins. In cows, as many as 13 beta-defensins exist in neutrophils. However, in other species, beta-defensins are more often produced by epithelial cells lining various organs (e.g. the epidermis, bronchial tree and genitourinary tract).

    Defensins are produced constitutively and/or in response to microbial products or proinflammatory cytokines. Some defensins are also called corticostatins (CS) because they inhibit corticotropin-stimulated corticosteroid production. The mechanism(s) by which microorganisms are killed and/or inactivated by defensins are not completely understood. However, it is generally believed that killing is a consequence of disruption of the microbial membrane. The polar topology of defensins, with spatially separated charged and hydrophobic regions, allows them to insert themselves into phospholipid membranes so that their hydrophobic regions are buried within the lipid membrane interior and their charged (mostly cationic) regions interact with anionic phospholipid head groups and water. Subsequently, some defensins can aggregate to form 'channel-like' pores; others might bind to and cover the microbial membrane in a 'carpet-like' manner. The net outcome is the disruption of membrane integrity and function, which ultimately leads to the lysis of microorganisms. Some defensins are synthesized as propeptides, which may be relevant to this process.

    Human, rabbit and\ guinea-pig beta-defensins, as well as human beta-defensin-2 (hBD2), induce the activation and degranulation of mast cells, resulting in the release of histamine and\ prostaglandin D2.

    \ 2878 IPR007142 \

    Hemagglutinin esterases are membrane glycoproteins present on the surface of the virus and are involved in the cell infection process. Hemagglutinin esterase contains hemagglutinin chain 1 (HE1) and hemagglutinin chain 2 (HE2), and forms a homotrimer in which each monomer consists of the two chains linked by a disulphide bond.

    \ \ 7724 IPR012871 \

    The hypothetical proteins found in this family are expressed by Oryza sativa and are of unknown function.

    \ 6197 IPR009421 \

    This family consists of several Maize streak virus 21.7 kDa proteins. The function of this family is unknown.

    \ 7264 IPR009997 \

    This family consists of several Curtovirus V3 proteins of around 90 residues in length. The function of this family is unknown.

    \ 2823 IPR000740 \

    Molecular chaperones are a diverse family of proteins that protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme that couples cycles of ATP binding, hydrolysis and ADP release by an N-terminal ATP-hydrolysing domain to cycles of sequestration and release of unfolded proteins by a C-terminal substrate-binding domain. In prokaryotes, dimeric GrpE is the co-chaperone for DnaK and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold PUBMED:8280473. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. The co-chaperones DnaJ and GrpE can therefore tightly regulate the nucleotide-bound and substrate-bound states of DnaK, as required for both the normal housekeeping and the stress-related functions of the DnaK molecular chaperone cycle.

    The X-ray crystal structure of GrpE in complex with the ATPase domain of DnaK revealed that GrpE is an asymmetric homodimer, bent in a manner that favours extensive contacts with only one DnaK ATPase monomer PUBMED:15136046. GrpE does not actively compete for the atomic positions occupied by the nucleotide. GrpE and ADP mutually reduce one another's affinity for DnaK 200-fold, and ATP instantly dissociates GrpE from DnaK.

    \ 2955 IPR007667 \ This is a family of proteins thought to be involved in the response to hypoxia. Family members come mostly from diverse eukaryotic organisms, although eubacterial members have also been identified. This region is found at the N terminus of the member proteins, which are predicted to be transmembrane proteins PUBMED:11172064.\ 4406 IPR000215 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    Serpins (SERine Proteinase INhibitors) PUBMED:14705960, PUBMED:2690952, PUBMED:8417965 belong to MEROPS inhibitor family I4, clan ID. \ \ Serpins are primarily known as irreversible serine protease inhibitors active against S1, S8 and C14 peptidases. There are both extracellular and intracellular serpins, which are found in all groups of organisms with the notable exception of fungi PUBMED:11116082, PUBMED:12411597.

    \ \ \ \ \

    Serpins and their homologues are a group of high molecular weight (40 to 50 kDa), structurally related proteins involved in a number of fundamental biological processes such as blood coagulation, complement activation, fibrinolysis, angiogenesis, inflammation, tumour suppression and hormone transport. All known serpins have been classified into 16 clades and 10 orphan sequences; the vertebrate serpins can conveniently be classified into six sub-groups PUBMED:11116082. In human plasma, serpins represent approximately 2% of the total protein, of which 70% is alpha-1-antitrypsin.
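    Multiplying the two approximate plasma figures for serpins gives a rough estimate of f_A1AT, the fraction of total human plasma protein that is alpha-1-antitrypsin (A1AT). This is only a back-of-the-envelope combination of the rounded values, not an independently measured figure:

```latex
f_{\mathrm{A1AT}} \approx \underbrace{0.02}_{\text{serpins/total}} \times \underbrace{0.70}_{\text{A1AT/serpins}} = 0.014 \approx 1.4\,\%
```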

    \ \

    In contrast to "rigid" proteinase inhibitors, such as those of the Kunitz or Kazal families, the serpins are metastable proteins (active-state proteins) that interact with their substrate and irreversibly trap the acyl intermediate as a result of a major conformational change PUBMED:11116079; they are best described as suicide substrate inhibitors. The common structure of these proteins is a multi-domain fold containing a bundle of 8 or 9 alpha\ helices and a beta sandwich formed by 3 beta sheets. The reactive centre loop (RCL) is found in the C-terminal part of these proteins. \ On the basis of strong sequence similarities, a number of proteins with no\ known inhibitory activity are said to belong to this family; these include angiotensinogen, corticosteroid-binding globulin and thyroxine-binding globulin PUBMED:12824063.

    \ 2593 IPR004956 \

    Foamy virus (FV) gene expression is strictly dependent on the viral transactivator protein Bel1/Tas. The presence of a functionally active internal promoter, in addition to the conventional LTR promoters, is unique to FVs. The nuclear Bel1/Tas protein of the primate prototype FV binds DNA target sites directly and consists of at least two functional domains: an N-terminal/central DNA-binding domain and a C-terminal activation domain PUBMED:14972532.

    \ 5467 IPR008515 \ This family consists of several short bacterial proteins of unknown function.\ 1072 IPR007138 \

    This domain is found in monooxygenases involved in the biosynthesis of several antibiotics by Streptomyces species; these enzymes can carry out oxygenation without the assistance of any of the prosthetic groups, metal ions or cofactors normally associated with the activation of molecular oxygen. The structure of ActVA-Orf6 monooxygenase from Streptomyces coelicolor, which is involved in actinorhodin biosynthesis, reveals a dimeric alpha+beta barrel topology PUBMED:12514126. There is also a conserved histidine that is likely to be an active-site residue. In Streptomyces coelicolor SCO1909 this domain occurs as a repeat.

    \ 5890 IPR008249 \ There is currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 934 IPR005475 \

    Transketolase (TK) catalyzes the reversible transfer of a\ two-carbon ketol unit from xylulose 5-phosphate to an aldose acceptor, such as\ ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between\ the glycolytic and pentose-phosphate pathways.\ TK requires thiamine pyrophosphate as a cofactor. In most sources where TK has\ been purified, it is a homodimer of approximately 70 kDa subunits. TK sequences\ from a variety of eukaryotic and prokaryotic sources PUBMED:1567394, PUBMED:1737042 show that the\ enzyme has been evolutionarily conserved.\ In the peroxisomes of the methylotrophic yeast Hansenula polymorpha, there is a\ highly related enzyme, dihydroxyacetone synthase (DHAS) (also\ known as formaldehyde transketolase), which exhibits a very unusual\ specificity by including formaldehyde amongst its substrates.

    \ 1-deoxyxylulose-5-phosphate synthase (DXP synthase) PUBMED:9371765 is an enzyme so far\ found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the\ thiamine pyrophosphate-dependent acyloin condensation reaction between carbon\ atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in the biosynthetic pathway to\ isoprenoids, thiamine (vitamin B1) and pyridoxol (vitamin B6). DXP synthase\ is evolutionarily related to TK. \ The N-terminal section contains a histidine residue which appears to function in\ proton transfer during catalysis PUBMED:1628611. In the central\ section there are conserved acidic residues that are part of the active-site cleft\ and may participate in substrate binding PUBMED:1628611.\ This family includes transketolase enzymes \ and also partially matches the 2-oxoisovalerate dehydrogenase\ beta subunit. Both these enzymes\ utilise thiamine pyrophosphate as a cofactor, suggesting\ there may be common aspects in their mechanism of catalysis.
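    The carbon bookkeeping for the transketolase and DXP synthase reactions can be checked with a short Python sketch. The carbon counts are standard values for each metabolite. The CO2 released from pyruvate by DXP synthase is not mentioned in the entry text; it is added here as background knowledge so that the second reaction balances. All names in the code are illustrative.

```python
# Illustrative carbon bookkeeping for the transketolase (TK) and DXP synthase
# reactions. Carbon counts are standard values; the CO2 product of DXP synthase
# is an addition not stated in the entry text.
CARBONS = {
    "xylulose 5-phosphate": 5,
    "ribose 5-phosphate": 5,
    "sedoheptulose 7-phosphate": 7,
    "glyceraldehyde 3-phosphate": 3,
    "pyruvate": 3,
    "1-deoxy-D-xylulose 5-phosphate": 5,
    "CO2": 1,
}

KETOL_UNIT = 2  # carbons in the two-carbon ketol unit moved by TK


def carbon_total(metabolites):
    """Sum the carbon atoms over a list of metabolite names."""
    return sum(CARBONS[name] for name in metabolites)


def is_carbon_balanced(substrates, products):
    """True if substrates and products contain the same number of carbons."""
    return carbon_total(substrates) == carbon_total(products)


TRANSKETOLASE = (
    ["xylulose 5-phosphate", "ribose 5-phosphate"],
    ["sedoheptulose 7-phosphate", "glyceraldehyde 3-phosphate"],
)
DXP_SYNTHASE = (
    ["pyruvate", "glyceraldehyde 3-phosphate"],
    ["1-deoxy-D-xylulose 5-phosphate", "CO2"],
)

# TK: the C5 donor loses the C2 ketol unit (becoming C3) and the C5 acceptor
# gains it (becoming C7).
donor_after = CARBONS["xylulose 5-phosphate"] - KETOL_UNIT
acceptor_after = CARBONS["ribose 5-phosphate"] + KETOL_UNIT

for name, (subs, prods) in [("TK", TRANSKETOLASE), ("DXP synthase", DXP_SYNTHASE)]:
    status = "balanced" if is_carbon_balanced(subs, prods) else "unbalanced"
    print(name, carbon_total(subs), "->", carbon_total(prods), status)
```

    Both reactions come out balanced: 10 carbons on each side for TK and 6 for DXP synthase. The TK donor and acceptor end up with 3 and 7 carbons, matching glyceraldehyde 3-phosphate and sedoheptulose 7-phosphate. If the CO2 entry is dropped from the DXP synthase products, that reaction fails the check.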

    \ 8078 IPR013201 \

    This domain is found at the N terminus of some C1 peptidases, such as cathepsin L, where it acts as a propeptide. There are also a number of proteins that are composed solely of multiple copies of this domain, such as the peptidase inhibitor salarin. This family is classified as I29 by MEROPS.

    \ 7866 IPR012615 \

    This domain has been identified in a number of distantly related species of trematodes. This protein domain is crucial for eggshell synthesis in trematodes (Ebersberger I).

    \ 6760 IPR009700 \

    This family consists of several hypothetical proteins of around 115 residues in length, which seem to be specific to Enterobacteria. The function of the family is unknown.

    \ 7304 IPR006395 \

    These sequences describe methylaspartate ammonia-lyase, also called beta-methylaspartase. It follows methylaspartate mutase (composed of S and E subunits) in one of several possible pathways of glutamate fermentation.

    \ 74 IPR006808 \ The Fo sector of the ATP synthase is a membrane-bound complex that mediates proton transport. It is composed of nine different polypeptide subunits (a, b, c, d, e, f, g, F6 and A6L). The function of subunit g is currently unknown. The conserved region covers all but the very N terminus of the member sequences. No prokaryotic members have been identified thus far PUBMED:8011660.\ 7515 IPR011698 \ This group of enzymes was suggested to be related to the MinD family of ATPases involved in the regulation of cell division in bacteria and archaea PUBMED:10966576. Further sequence analysis suggests a model for the interaction of CobB and CobQ with their respective substrates PUBMED:10966576. CobB and CobQ were also found to contain unusual Triad family (class I) glutamine amidotransferase domains with conserved Cys and His residues, but lacking the Glu residue of the catalytic triad PUBMED:10966576. \ 2498 IPR001367 \ The diphtheria toxin repressor protein (DTXR) is a member of this group PUBMED:7568230. In \ Corynebacterium diphtheriae, where it has been studied in some detail, this protein acts\ as an iron-binding repressor of diphtheria toxin gene expression and may serve as a \ global regulator of gene expression. The N terminus may be involved in iron binding and\ may associate with the tox operator. Binding of DTXR to the tox operator requires a divalent\ metal ion such as cobalt, ferrous iron, manganese or nickel, whereas zinc gives only weak \ activation PUBMED:7743135.\ 5869 IPR010323 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 4855 IPR005347 \

    The proteins in this family are uncharacterised; however, BtrG is part of a butirosin-biosynthetic gene cluster from Bacillus circulans PUBMED:11132962.

    \ 2212 IPR007564 \ This is a family of uncharacterised, hypothetical archaeal proteins.\ 3811 IPR006441 \

    This family represents the major capsid protein component of the heads (capsids) of bacteriophage P2 and related phage including prophage. These sequences represent one of several analogous families lacking detectable sequence similarity. The gene encoding this component is typically located in an operon encoding the small and large terminase subunits, the portal protein and the prohead or maturation protease.

    \ 1815 IPR002755 \

    DNA primase PUBMED:2023935 synthesizes the RNA primers for the Okazaki\ fragments in lagging-strand DNA synthesis. In eukaryotes, DNA primase is a heterodimer of large (p60) and small (p50) subunits. This family represents sequences of the small subunit and the DNA primase sequences of the Archaea PUBMED:10536154. No sequence similarity can be detected between the eukaryotic p50 and p60 subunits and the primases purified from bacteriophage and bacteria.

    \ 5966 IPR009201 \ This group represents a virion core protein, vaccinia E11L type.\ 3475 IPR006753 \

    This is a family of conserved coat proteins from the single stranded DNA Nanoviruses PUBMED:10795525.

    \ 5062 IPR007899 \

    The CHAD domain is an alpha-helical domain functionally associated with some members of the adenylate cyclase family. It has conserved histidines that may chelate metals\ PUBMED:12456267.

    \ 172 IPR007533 \ Cytochrome c oxidase assembly protein Cox11 is essential for the assembly of functional cytochrome c oxidase. In eukaryotes it is an integral protein of the mitochondrial inner membrane. Cox11 is essential for the insertion of Cu(I) ions to form the CuB site, which in turn is required for the stability of other structures in subunit I, for example haems a and a3 and the magnesium/manganese centre. Cox11 is probably only required in sub-stoichiometric amounts relative to the structural units PUBMED:10617659. The C-terminal region of the protein is known to form a dimer. Each monomer coordinates one Cu(I) ion via three conserved cysteine residues (111, 208 and 210 in Saccharomyces cerevisiae). Met 224 is also thought to play a role in copper transfer or in stabilising the copper site PUBMED:12063264.\ 5282 IPR008465 \ Dystroglycan is one of the dystrophin-associated glycoproteins and is encoded by a 5.5 kb transcript in Homo sapiens. The protein product is cleaved into two non-covalently associated subunits, [alpha] (N-terminal) and [beta] (C-terminal). In skeletal muscle the dystroglycan complex works as a transmembrane linkage between the extracellular matrix and the cytoskeleton. [alpha]-dystroglycan is extracellular and binds to merosin ([alpha]-2 laminin) in the basement membrane, while [beta]-dystroglycan is a transmembrane protein and binds to dystrophin, a large rod-like cytoskeletal protein that is absent in Duchenne muscular dystrophy patients. Dystrophin binds to intracellular actin cables. In this way, the dystroglycan complex, which links the extracellular matrix to the intracellular actin cables, is thought to provide structural integrity in muscle tissues. The dystroglycan complex is also known to serve as an agrin receptor in muscle, where it may regulate agrin-induced acetylcholine receptor clustering at the neuromuscular junction. 
There is also evidence that dystroglycan functions as part of a signal transduction pathway: Grb2, a mediator of the Ras-related signal pathway, has been shown to interact with the cytoplasmic domain of dystroglycan. In general, aberrant expression of the dystrophin-associated protein complex underlies the pathogenesis of Duchenne muscular dystrophy, Becker muscular dystrophy and severe childhood autosomal recessive muscular dystrophy. Interestingly, no genetic disease has been described for either [alpha]- or [beta]-dystroglycan. Dystroglycan is widely distributed in non-muscle tissues as well as in muscle tissues. During epithelial morphogenesis of the kidney, the dystroglycan complex has been shown to act as a receptor for the basement membrane. Dystroglycan expression in Mus musculus brain and neural retina has also been reported. However, the physiological role of dystroglycan in non-muscle tissues has remained unclear PUBMED:8872465.\ 2880 IPR001364 \ The haemagglutinin (HA) glycoprotein of influenza is a trimer containing \ three structurally distinct regions: a globular head of anti-parallel \ beta-sheet, which contains the receptor binding site and the variable\ antigenic determinants (antigenic variation in haemagglutinin is\ associated with recurrent epidemics of respiratory diseases in man); a\ triple-stranded, coiled-coil, alpha-helical stalk; and a globular foot of\ anti-parallel beta-sheet PUBMED:3374584, PUBMED:6207440, PUBMED:7464906, PUBMED:6162101. \

    The structural domains of haemagglutinin are arranged broadly as follows:\ a large globular, hydrophilic, carbohydrate-containing domain resides on \ the external surface of the membrane; a small, uncharged hydrophobic peptide\ spans the membrane; and a smaller globular, hydrophilic domain resides on\ the inside of the membrane.

    \

    Each monomer in the structure comprises two\ disulphide-linked chains, HA1 and HA2. The N terminus of HA1 provides a \ central strand in the 5-stranded globular foot; the chain then makes its\ way to the globular head, where it forms an 8-stranded Swiss roll. HA2\ provides two alpha-helices, which form part of the fibrous structure\ (three helices, one from each monomer, pack together as the triple-stranded\ coiled-coil that stabilises the trimer), its C terminus providing the\ remaining strands of the 5-stranded globular foot.

    \ 4427 IPR007624 \ Region 3 forms a discrete, compact three-helical domain within the sigma factor. Region 3 is not normally involved in the recognition of promoter DNA, but at some specific bacterial promoters containing an extended -10 promoter element, residues within region 3 play an important role. Region 3 is primarily involved in binding the core RNA polymerase in the holoenzyme PUBMED:11931761.\ 2587 IPR000262 \ A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been shown PUBMED:2324094, PUBMED:2271624, PUBMED:1939137 to be structurally related.\ The first step in the reaction mechanism of these enzymes is the abstraction of the proton from the alpha-carbon of the substrate, producing a carbanion which can subsequently attach to the N5 atom of FMN. A conserved histidine has been shown PUBMED:2644287 to be involved in the removal of the proton. The region around this active-site residue is highly conserved and contains an arginine residue which is involved in substrate binding.\ 3208 IPR000566 \ Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids and lipids share limited\ regions of sequence homology and a common tertiary structure architecture PUBMED:3622999, PUBMED:1608945, PUBMED:2217163,\ PUBMED:7684291, PUBMED:3238752. This is an eight-stranded antiparallel beta-barrel with a repeated +1 topology enclosing\ an internal ligand-binding site PUBMED:7684291, PUBMED:2217163. The name 'lipocalin' has been proposed PUBMED:3622999 for\ this protein family, but cytosolic fatty-acid binding proteins are also included. The sequences of most members of the family, the core or kernel lipocalins, are characterized by\ three short conserved stretches of residues, while others, the outlier lipocalin group, share only one or two of these\ PUBMED:1834059, PUBMED:7684291. 
Proteins known to belong to this family include alpha-1-microglobulin (protein HC);\ alpha-1-acid glycoprotein (orosomucoid) PUBMED:3064105; aphrodisin; apolipoprotein D; beta-lactoglobulin; complement\ component C8 gamma chain PUBMED:1707134; crustacyanin PUBMED:2026162; epididymal retinoic acid-binding protein\ (E-RABP) PUBMED:8069623; insectacyanin; odorant-binding protein (OBP); human pregnancy-associated endometrial alpha-2\ globulin; probasin (PB), a rat prostatic protein; prostaglandin D synthase PUBMED:1723819; purpurin; von\ Ebner's gland protein (VEGP) PUBMED:7514123; and lizard epididymal secretory protein IV (LESP IV) PUBMED:8486691.\ \ \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee\ King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E.,\ Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of\ the first three letters of the genus; a space; the first letter of the\ species name; a space and an Arabic numeral. In the event that two species\ names have identical designations, they are discriminated from one another\ by adding one or more letters (as necessary) to each species designation.

    \

    The allergens in this family include allergens with the following designations: Bla g 4, Bos d 2, Bos d 5, Can f 1, Can f 2, Equ c 1 and Equ c 2.
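    The basic WHO/IUIS naming rule (first three letters of the genus, first letter of the species, then a number) can be sketched in Python. The sketch builds designations and checks the ones listed for this family. It is a simplified illustration, not an official validator. The function names are hypothetical, and the collision rule (adding extra letters to disambiguate species) is not implemented. The parser accepts several species letters only so that such extended forms would not be rejected.

```python
import re

# Genus part (3 letters), species part (1+ lowercase letters), then a number >= 1.
DESIGNATION = re.compile(r"^([A-Z][a-z]{2}) ([a-z]+) ([1-9]\d*)$")


def allergen_designation(genus: str, species: str, number: int) -> str:
    """Build a basic designation such as 'Can f 1' (no collision handling)."""
    if len(genus) < 3 or not species or number < 1:
        raise ValueError("need a genus of >= 3 letters, a species name and a number >= 1")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


def parse_designation(text: str) -> tuple[str, str, int]:
    """Split a designation into (genus part, species part, number)."""
    match = DESIGNATION.match(text)
    if match is None:
        raise ValueError(f"not a valid allergen designation: {text!r}")
    genus, species, number = match.groups()
    return genus, species, int(number)


# Designations listed for this family in the entry text.
LISTED = ["Bla g 4", "Bos d 2", "Bos d 5", "Can f 1", "Can f 2", "Equ c 1", "Equ c 2"]

print(allergen_designation("Canis", "familiaris", 1))
print([parse_designation(d) for d in LISTED])
```

    For example, Canis familiaris with number 1 gives "Can f 1", and Blattella germanica with number 4 gives "Bla g 4". Every designation listed for this family parses without error.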

    \ 1854 IPR002829 \ These archaebacterial proteins have no known function.\ Members of this family contain seven conserved cysteines and\ may be integral membrane proteins.\ 6767 IPR009705 \

    This family consists of several hypothetical archaeal proteins of around 120 residues in length. All members of this family seem to be specific to Sulfolobus species. The function of this family is unknown.

    \ 7448 IPR011512 \

    This is a family of hypothetical proteins from Leptospira interrogans which share a highly conserved sequence motif at the C terminus.

    \ 3993 IPR001240 \ Indole-3-glycerol phosphate synthase (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan, the ring closure of 1-(2-carboxyphenylamino)-1-deoxyribulose into indole-3-glycerol phosphate. In some bacteria, IGPS is a single-chain enzyme. In others, such as \ Escherichia coli, it is the N-terminal domain of a bifunctional enzyme that also has N-(5-phosphoribosyl)anthranilate isomerase \ (PRAI) activity, catalysing the third step of tryptophan biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that contains a PRAI C-terminal domain and a glutamine amidotransferase (GATase) N-terminal domain.\

    Phosphoribosylanthranilate isomerase (PRAI) is monomeric and labile in most\ mesophilic microorganisms, but dimeric and stable in the hyperthermophile Thermotoga maritima (tPRAI) PUBMED:10745009. Comparison with the known 2.0 Å structure of PRAI from Escherichia coli (ePRAI) shows that tPRAI has the complete TIM or (beta/alpha)8-barrel fold, whereas helix alpha5 in ePRAI is replaced by a loop. The subunits of tPRAI associate via the N-terminal faces of their central beta-barrels. Two long, symmetry-related loops that protrude reciprocally into cavities of the other subunit provide for multiple hydrophobic interactions. Moreover, the side chains of the N-terminal methionines and the C-terminal leucines of both subunits are immobilized in a hydrophobic cluster, and the number of salt bridges is increased in tPRAI. These features appear to be mainly responsible for the high thermostability of tPRAI PUBMED:9166771.

    \ 2569 IPR005626 \

    This is a family of FLP proteins that catalyse recombination between the large inverted repeats of the plasmid.

    \ \ \ 3275 IPR013131 \ Mannitol-1-phosphate 5-dehydrogenase catalyzes the NAD-dependent oxidation of mannitol-1-phosphate\ to fructose-6-phosphate PUBMED:1904856 as part of the phosphoenolpyruvate-dependent phosphotransferase\ system (PTS). The PTS facilitates the vectorial translocation of metabolisable carbohydrates to form\ the corresponding sugar phosphates, which are then converted to glycolytic intermediates PUBMED:1322373.\ Mannitol 2-dehydrogenase catalyzes the NAD-dependent oxidation of mannitol to fructose PUBMED:8254318.\ Several dehydrogenases have been shown PUBMED:8254318 to be evolutionarily related, including \ mannitol-1-phosphate 5-dehydrogenase (gene mtlD); mannitol 2-dehydrogenase (gene mtlK);\ mannonate oxidoreductase (fructuronate reductase) (gene uxuB); Escherichia coli hypothetical\ proteins ydfI and yeiQ; and yeast hypothetical protein YEL070w.\ 2147 IPR007436 \ This family includes several putative integral membrane proteins.\ 2155 IPR007457 \

    The protein represented by this entry, YggX, serves to protect Fe-S clusters from oxidative damage PUBMED:11416172. The effect is two-fold: proteins that rely on Fe-S clusters do not become inactivated, and the release of free iron and of hydrogen peroxide, a DNA-damaging agent, is prevented. These observations are consistent with the hypothesis that YggX chelates free iron, and recent experiments show that YggX can indeed bind Fe(II) in vitro and in vivo PUBMED:12670952. Furthermore, YggX has a positive effect on the action of at least one Fe(II)-responsive protein. The combined actions of YggX are reminiscent of iron-trafficking proteins PUBMED:12033438, and YggX is therefore proposed to play a role in Fe(II) trafficking PUBMED:12670952. In Escherichia coli, YggX was shown to be under the transcriptional control of the redox-sensing SoxRS system PUBMED:14594836.\

    \ 365 IPR007419 \

    The two Fe ions are each coordinated by two conserved cysteine residues. This domain occurs alone in small proteins such as bacterioferritin-associated ferredoxin (BFD). The function of BFD is not known, but it may be a general redox and/or regulatory component involved in the iron storage or mobilisation functions of bacterioferritin in bacteria PUBMED:8639572. This domain is also found in nitrate reductase proteins in association with the nitrite and sulphite reductase 4Fe-4S domain, the nitrite/sulphite reductase ferredoxin-like half domain and the pyridine nucleotide-disulphide oxidoreductase domain. It is also found in NifU nitrogen fixation proteins, in association with the NifU-like N-terminal and C-terminal domains.

    \ 1497 IPR013147 \

    This family represents the transmembrane region of CD47 leukocyte antigen PUBMED:8794870, PUBMED:12124426.

    \ 908 IPR003195 \

    This family includes the Spt3 yeast transcription factors and the 18 kDa subunit from human transcription initiation factor IID (TFIID-18). Determination of the crystal structure reveals an atypical histone fold PUBMED:9695952.

    \ 3832 IPR006497 \

    This set of protein sequences, defined by an N-terminal domain, represents phage lambda replication protein O and other homologous phage and prophage proteins.

    \ 1066 IPR001140 \

    ATP-binding cassette (ABC) transporters are multidomain membrane proteins, responsible\ for the controlled efflux and influx of substances (allocrites) across cellular membranes. They are minimally composed of four domains, with two transmembrane domains\ (TMDs) responsible for allocrite binding and transport and two nucleotide-binding domains\ (NBDs) responsible for coupling the energy of ATP hydrolysis to conformational changes\ in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of\ hydrolysis at one NBD effectively abrogates hydrolysis at the other. Hydrolysis\ at the two NBDs may occur in an alternating fashion, although the NBDs appear substantially functionally\ symmetrical in terms of their binding to diverse nucleotides PUBMED:12504680.

    \

    A variety of ATP-binding transport proteins have a six-transmembrane\ helical region. They are all integral membrane proteins\ involved in a variety of transport systems. Members of this family include the\ cystic fibrosis transmembrane conductance regulator (CFTR), bacterial leukotoxin\ secretion ATP-binding protein, multidrug resistance proteins, the yeast leptomycin B\ resistance protein, the mammalian sulphonylurea receptor and antigen peptide\ transporter 2. Many of these proteins have two such regions.

    \ 4330 IPR000685 \ Ribulose bisphosphate carboxylase (RuBisCO) PUBMED:6351728, PUBMED:12221984 catalyzes the\ initial step in Calvin's reductive pentose phosphate cycle in plants as well as in purple and green bacteria.\ It consists of a large catalytic subunit and a small subunit of undetermined function. In plants, the large\ subunit is encoded by the chloroplast genome while the small subunit is encoded by the nuclear genome.\ Molecular activation of RuBisCO by CO2 involves the formation of a carbamate with the epsilon-amino group\ of a conserved lysine residue. This carbamate is stabilized by a magnesium ion. One of the ligands of\ the magnesium ion is an aspartic acid residue close to the active-site lysine PUBMED:1969412.\ 7210 IPR009970 \

    This family contains the bacterial histone H1-like nucleoprotein HC2 (approximately 200 residues long), which seems to be found mostly in Chlamydia. HC2 functions in DNA condensation, although it has been suggested that it also has other roles PUBMED:8733229.

    \ 180 IPR006973 \ This family represents Cwf15/Cwc15 (from Schizosaccharomyces pombe and Saccharomyces cerevisiae respectively) and their homologues. The function of these proteins is unknown, but they form part of the spliceosome and are thus thought to be involved in mRNA splicing PUBMED:11884590.\ 1198 IPR006828 \

    This region is found in the beta subunit of the 5'-AMP-activated protein kinase complex and in its yeast homologues Sip1, Sip2 and Gal83, which are found in the SNF1 kinase complex PUBMED:8621499. This region is sufficient for interaction of this subunit with the kinase complex, but is not solely responsible for the interaction, and the interaction partner is not known PUBMED:7813428. The isoamylase domain is sometimes found associated with proteins that contain this C-terminal domain.

    \ 7438 IPR011467 \

    These hypothetical proteins from bacteria, such as Rhodopirellula baltica, Bacteroides thetaiotaomicron and Porphyromonas gingivalis, share a region of conserved sequence towards their N termini.

    \ 1834 IPR007215 \ DsrH is involved in oxidation of intracellular sulphur in the phototrophic sulphur bacterium Chromatium vinosum D PUBMED:9695921.\ 3774 IPR001539 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
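    The catalytic-type letters and the two nucleophile groups described above amount to a simple lookup. The Python sketch below encodes them. It assumes family identifiers are written as the type letter followed by a number, as in "C1", "S8" or "U32" in this document. The function name is hypothetical. Clan-only type P and inhibitor families such as I4 are deliberately rejected, because they are not peptidase catalytic types.

```python
import re

# Catalytic-type letters as listed in the text above.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}
# Per the text: S, T and C use part of an amino acid as the nucleophile and
# form an acyl intermediate; A, G and M use an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

FAMILY_ID = re.compile(r"^([A-Z])(\d+)$")


def describe_peptidase_family(family_id: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for an ID such as 'C1', 'S8' or 'U32'."""
    match = FAMILY_ID.match(family_id.strip().upper())
    if match is None or match.group(1) not in CATALYTIC_TYPE:
        raise ValueError(f"not a peptidase family identifier: {family_id!r}")
    letter = match.group(1)
    if letter in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not known"
    return CATALYTIC_TYPE[letter], nucleophile


for fam in ["C1", "S8", "U32"]:
    print(fam, describe_peptidase_family(fam))
```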

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    This is a group of peptidases belonging to MEROPS peptidase family U32 (clan U-). The type example is collagenase (gene prtC) from Porphyromonas gingivalis PUBMED:1317840, an enzyme that degrades type I collagen and seems to require a metal cofactor. The product of prtC is evolutionarily related to a number of uncharacterized proteins with a well-conserved region containing two cysteines.

    \ 2330 IPR007828 \ This is a family of uncharacterised eukaryotic proteins. Some members have a described putative function, but a common theme is not evident.\ 3201 IPR007544 \ Many Gram-positive bacteria produce antimicrobial peptides, generally termed bacteriocins. These peptides are usually cationic, less than 50 amino acid residues long, contain an amphiphilic or hydrophobic region, and often kill their target cells by permeabilizing the cell membrane. Antimicrobial peptides with these characteristics are also produced by plants and a wide variety of animals, including humans, and are thus widely distributed in nature. The Linocin_M18 region is found mostly in eubacteria, though homologous sequences have been identified in archaea PUBMED:8919789, PUBMED:7986050.\ 1031 IPR000308 \

    The 14-3-3 proteins are a large family of approximately 30 kDa acidic proteins which exist primarily as homo- and heterodimers within all eukaryotic cells\ PUBMED:1671102, PUBMED:11911880. There is a high degree of sequence identity and conservation between all the 14-3-3 isotypes, particularly in the regions which form the dimer interface or line the central ligand-binding channel of the dimeric molecule. Each 14-3-3 protein sequence can be roughly divided into three sections: a divergent amino terminus, the conserved core region\ and a divergent carboxyl terminus. The conserved middle core region of the 14-3-3s encodes an amphipathic groove that forms the main functional domain, a cradle\ for interacting with client proteins. The monomer consists of nine helices\ organized in an antiparallel manner, forming an L-shaped structure. The interior of the L-structure is composed of four\ helices: H3 and H5, which contain many charged and polar amino acids, and H7 and H9, which contain hydrophobic amino acids.\ These four helices form the concave amphipathic groove that interacts with target peptides.\

    \ \

    14-3-3 proteins mainly bind proteins containing phosphothreonine or phosphoserine motifs, although exceptions to this rule exist. Extensive investigation of the 14-3-3 binding site of the mammalian serine/threonine kinase Raf-1 has produced a consensus sequence for 14-3-3 binding, RSxpSxP (in the single-letter amino acid code, where x denotes any amino acid and p indicates that the next residue is phosphorylated). 14-3-3 proteins appear to affect intracellular signalling in one of three ways: by directly regulating the catalytic activity of the bound protein; by regulating interactions between the bound protein and other molecules in the cell, through sequestration or modification; or by controlling the subcellular localisation of the bound ligand. Proteins appear to bind first to a single dominant site and then to many much weaker secondary interaction sites. The 14-3-3 dimer can change the conformation of its bound ligand whilst itself undergoing minimal structural alteration.
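    As an illustration of how the RSxpSxP consensus reads, the sketch below scans a protein sequence for R-S-x-S-x-P and keeps only hits whose central serine is phosphorylated. A plain sequence string cannot record phosphorylation, so the function takes the phosphosites as a separate set of 0-based positions. That input format, the function name `find_1433_sites` and the toy sequence are all hypothetical. Because 14-3-3 proteins also bind ligands that lack this motif, a scan like this can only flag candidate sites.

```python
import re

# Consensus from the Raf-1 studies: R S x pS x P (x = any residue).
# The lookahead lets overlapping candidates be reported separately.
RSXSXP = re.compile(r"(?=RS.S.P)")


def find_1433_sites(seq: str, phospho_positions: set[int]) -> list[int]:
    """Return 0-based start positions of RSxpSxP matches in seq.

    phospho_positions is a hypothetical input: the 0-based indices of
    residues known (from elsewhere) to be phosphorylated.
    """
    hits = []
    for m in RSXSXP.finditer(seq.upper()):
        start = m.start()
        # The phosphorylated serine is the fourth residue of the motif.
        if start + 3 in phospho_positions:
            hits.append(start)
    return hits


toy = "MARSTSLPAG"  # invented toy sequence, not a real 14-3-3 ligand
print(find_1433_sites(toy, {5}))    # [2]
print(find_1433_sites(toy, set()))  # [] (motif present, serine not phosphorylated)
```

    Keeping sequence matching and phosphorylation state separate reflects the text's point that binding depends on the phosphorylated residue, not on the primary sequence alone.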

    \ \ 1729 IPR003200 \ This group of proteins represents the nicotinate-nucleotide-dimethylbenzimidazole phosphoribosyltransferase (NN:DBI PRT) enzymes involved in dimethylbenzimidazole synthesis. This function is essential to de novo cobalamin (vitamin B12) production in bacteria. The entry also includes a group of proteins of unknown function.\ 3397 IPR001354 \

    Mandelate racemase (MR) and muconate lactonizing enzyme (MLE) are two bacterial enzymes involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions, yet they are related at the level of their primary, quaternary (homooctamer) and tertiary structures PUBMED:2215699, PUBMED:8256284. A number of other proteins also seem to be evolutionarily related to these two enzymes. These include various plasmid-encoded chloromuconate cycloisomerases, Escherichia coli protein rspA PUBMED:7545940, the E. coli bifunctional DGOA protein, E. coli hypothetical proteins ycjG, yfaW and yidU, and a hypothetical protein from Streptomyces ambofaciens PUBMED:8277241.

    \ 1025 IPR007883 \ This family contains proteins of unknown function from Caenorhabditis elegans.\ 3075 IPR007740 \ This family of proteins has been identified as part of the mitochondrial large ribosomal subunit in Saccharomyces cerevisiae PUBMED:12392552.\ 5039 IPR007453 \

    A member of this family has been observed to co-purify with Desulfovibrio vulgaris dissimilatory sulphite reductase PUBMED:1555572, and many members of this family are annotated as the third (gamma) subunit of dissimilatory sulphite reductase. However, this protein appears to be only loosely associated with the sulphite reductase, which suggests that DsrC may not be an integral part of that enzyme. Members of this family are also found in organisms such as Escherichia coli and Haemophilus influenzae, which lack dissimilatory sulphite reductases but can synthesise the sirohaem-containing assimilatory sulphite and nitrite reductases. It has been speculated that DsrC may be involved in the assembly, folding or stabilisation of sirohaem proteins PUBMED:9493389. A strictly conserved cysteine near the C terminus suggests that DsrC may have a catalytic function in the metabolism of sulphur compounds PUBMED:9695921.

    \ 926 IPR007371 \ Thiamin pyrophosphokinase (TPK) catalyzes the transfer of a pyrophosphate group from ATP to vitamin B1 (thiamin) to form the coenzyme thiamin pyrophosphate (TPP). Thus, TPK is important for the formation of a coenzyme required for central metabolic functions. The structure of thiamin pyrophosphokinase suggests that the enzyme may operate by a mechanism of pyrophosphoryl transfer similar to those described for pyrophosphokinases functioning in nucleotide biosynthesis PUBMED:11435118.\ 1938 IPR003871 \

    This domain (DUF223) found in eukaryotic proteins is of unknown function.

    \ 7403 IPR011502 \

    This is a family of nucleoporins conserved from yeast to human.

    \ 5210 IPR008044 \

    At least one protein containing this domain, the Pal protein from the pneumococcal bacteriophage Dp-1, has been shown to be an N-acetylmuramoyl-L-alanine amidase PUBMED:6146601. Based on the known modular structure of this and other peptidoglycan hydrolases from the pneumococcal system, the active site should reside in the N-terminal domain, whereas the C-terminal domain binds to the choline residues of the cell wall teichoic acids PUBMED:9379901, PUBMED:3422470.

    \ 7655 IPR013095 \

    Type III secretion chaperones are involved in delivering virulence effector proteins from bacterial pathogens directly into eukaryotic cells. The chaperones may prevent aggregation and degradation of their substrates, may target the effector to the secretion apparatus, and may ensure a secretion-competent unfolded conformation of their specific substrate. One member of this family, SigE, forms homodimers in the crystal. The monomers have a novel fold with an alpha-beta(3)-alpha-beta(2)-alpha topology PUBMED:11685226.

    \ 3704 IPR001759 \

    Pentaxins (or pentraxins) PUBMED:6356809, PUBMED:7772283 are a family of proteins which show, under electron microscopy, a discoid arrangement of five noncovalently bound subunits. Proteins of the pentaxin family are involved in acute immunological responses PUBMED:7772283. Three of the principal members of the pentaxin family are serum proteins: namely, C-reactive protein (CRP) PUBMED:9614930, serum amyloid P component protein (SAP) PUBMED:9514915, and female protein (FP) PUBMED:9583999.

    \

    CRP is expressed during the acute phase response to tissue injury or inflammation in mammals. It resembles an antibody in that it performs several functions associated with host defence: it promotes agglutination, bacterial capsular swelling and phagocytosis, and activates the classical complement pathway through its calcium-dependent binding to phosphocholine. CRPs have also been sequenced in an invertebrate, the Atlantic horseshoe crab, where they are a normal constituent of the hemolymph.

    \

    SAP is a vertebrate protein that is a precursor of amyloid component P. It is found in all types of amyloid deposits, in the glomerular basement membrane and in elastic fibres in blood vessels. SAP binds to various lipoprotein ligands in a calcium-dependent manner, and it has been suggested that, in mammals, this may have important implications in atherosclerosis and amyloidosis.

    \

    FP is a SAP homologue found in the Syrian hamster. The concentration of this plasma protein is altered by sex steroids and stimuli that elicit an acute phase response.

    \

    Pentaxin proteins expressed in the nervous system are neural pentaxin I (NPI) and II (NPII) PUBMED:8884281. NPI and NPII are homologous and can coexist in the same species. It is suggested that both proteins mediate the uptake of synaptic macromolecules and play a role in synaptic plasticity. Apexin, a sperm acrosomal protein, is a homologue of NPII found in guinea pigs PUBMED:7798266.

    \

    PTX3 (or TSG-14) protein is a cytokine-induced protein that is homologous to CRPs and SAPs, but its function is not yet known.

    \ 227 IPR002593 \ This domain has no known function. It is found in several Caenorhabditis elegans proteins. The domain contains six conserved cysteines that probably form three disulphide bridges.\ 4395 IPR004534 \ In prokaryotes, the incorporation of selenocysteine as the 21st amino acid, encoded by TGA, requires several elements: SelC is the tRNA itself, SelD acts as a donor of reduced selenium, SelA converts the serine residue carried by SelC into selenocysteine, and SelB is a selenocysteine-specific translation elongation factor. 3' or 5' non-coding elements of the mRNA have been identified as probable structures for directing selenocysteine incorporation. \

    This family describes SelA. A close homolog of SelA is found in Helicobacter pylori, but all other required elements are missing and the protein is shorter at the N-terminus than SelA from other species. The trusted cut-off is set above the score generated for H. pylori putative SelA.

    \ 1082 IPR000573 \ Synonym(s): Citrate hydro-lyase, Aconitase\

    Aconitate hydratase (aconitase) is the tricarboxylic acid cycle enzyme that catalyzes the reversible, stereospecific isomerization of citrate to isocitrate via cis-aconitate, a non-redox process PUBMED:7675781, PUBMED:9020582. Aconitase, in its active form, contains a 4Fe-4S iron-sulphur cluster; three cysteine residues have been shown to be ligands of the 4Fe-4S cluster PUBMED:2726740. Unlike the majority of iron-sulphur proteins, which function as electron carriers, aconitase has an Fe-S cluster that reacts directly with an enzyme substrate PUBMED:8151704.

    \

    In eukaryotes, two isozymes of aconitase are known to exist: one found in the mitochondrial matrix and the other found in the cytoplasm. The aconitase family contains a variety of proteins, which include the iron-responsive element binding protein (IRE-BP) PUBMED:8347279; alpha-isopropylmalate isomerase, an enzyme catalysing the second step in the biosynthesis of leucine; and homoaconitase.

    \

    The aconitate hydratase C-terminal domain is almost always found together with the aconitate hydratase N-terminal domain.

    \ 1338 IPR007879 \ This family consists of a series of unidentified baculoviral P33 protein homologues of unknown function.\ 7649 IPR012479 \

    This family comprises sequences bearing significant similarity to the mouse transcriptional regulator protein HCNGP. This protein is localised to the nucleus and is thought to be involved in the regulation of beta-2-microglobulin genes.

    \ 5547 IPR008628 \ This family consists of several eukaryotic GPP34-like proteins. GPP34 localises to the Golgi complex and is conserved from Saccharomyces cerevisiae to humans. The cytosolically exposed location of GPP34 suggests that it may act as a novel coat protein in Golgi trafficking PUBMED:11042173.\ 920 IPR000157 \

    In Drosophila melanogaster, the Toll protein is involved in establishing dorso-ventral polarity in the embryo. In addition, members of the Toll family play a key role in innate antibacterial and antifungal immunity in insects as well as in mammals. These proteins are type I transmembrane receptors that share an intracellular 200-residue domain with the interleukin-1 receptor (IL-1R), the Toll/IL-1R homologous region (TIR). The similarity between Toll-like receptors (TLRs) and IL-1R is not restricted to sequence homology, since these proteins also share a similar signaling pathway: both induce the activation of a Rel-type transcription factor via an adaptor protein and a protein kinase PUBMED:8621445. Interestingly, MyD88, a cytoplasmic adaptor protein found in mammals, contains a TIR domain associated with a DEATH domain PUBMED:8621445, PUBMED:9374458, PUBMED:10679407. Besides the mammalian and Drosophila melanogaster proteins, a TIR domain is also found in a number of plant proteins implicated in host defense PUBMED:9868361. Like MyD88, these proteins are cytoplasmic.

    \

    Site-directed mutagenesis and deletion analysis have shown that the TIR domain is essential for Toll and IL-1R activities. Sequence analysis has revealed three highly conserved regions among the different members of the family: box 1 (FDAFISY), box 2 (GYKLC-RD-PG), and box 3 (a conserved W surrounded by basic residues). It has been proposed that boxes 1 and 2 are involved in binding proteins involved in signaling, whereas box 3 is primarily involved in directing the localization of the receptor, perhaps through interactions with cytoskeletal elements PUBMED:10671496.
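    The sketch below shows one way to locate consensus boxes such as box 1 and box 2 in a query sequence. It slides each consensus along the sequence and counts the positions where the residues agree. Two things here are assumptions, not part of the source: each "-" in box 2 is read as exactly one unspecified residue, and identity counting is used as the scoring scheme. Real TIR annotation relies on profile models, so this is only a toy illustration of what "conserved box" means.

```python
# Box consensuses quoted in the text. Treating each "-" in box 2 as
# exactly one unspecified residue is an assumption of this sketch.
TIR_BOXES = {"box1": "FDAFISY", "box2": "GYKLC-RD-PG"}


def best_window(seq: str, consensus: str) -> tuple[int, int]:
    """Return (start, matches) for the window of seq that agrees with the
    consensus at the most defined (non '-') positions.

    Returns (-1, -1) if seq is shorter than the consensus.
    """
    width = len(consensus)
    best = (-1, -1)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        matches = sum(
            1 for c, s in zip(consensus, window) if c != "-" and c == s
        )
        if matches > best[1]:  # strict '>' keeps the leftmost best window
            best = (start, matches)
    return best


toy = "AGYKLCARDAPGA"  # invented toy sequence, not a real TIR domain
print(best_window(toy, TIR_BOXES["box2"]))  # (1, 9): all 9 defined positions agree
```

    Scoring by identities rather than requiring an exact match tolerates the substitutions that divergent family members carry. The trade-off is that it ignores which substitutions are conservative.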

    \ 7842 IPR012547 \

    This family contains many hypothetical bacterial proteins.

    \ 529 IPR006164 \

    The Ku heterodimer is composed of Ku70 and Ku80 (or Ku86), 70 kDa and 80 kDa subunits of an ATP-dependent DNA helicase, which contributes to genomic integrity through its ability to bind DNA double-stranded breaks and facilitate repair by the non-homologous end-joining pathway. This is the central DNA-binding beta-barrel domain and is found in both the Ku70 and Ku80 proteins. Ku makes only a few contacts with the sugar-phosphate backbone, and none with the DNA bases, but it fits sterically to major and minor groove contours forming a ring that encircles duplex DNA, cradling two full turns of the DNA molecule. By forming a bridge between the broken DNA ends, Ku acts to structurally support and align the DNA ends, to protect them from degradation, and to prevent promiscuous binding to unbroken DNA. Ku effectively aligns the DNA, while still allowing access of polymerases, nucleases and ligases to the broken DNA ends to promote end joining PUBMED:11483577.

    \ 2898 IPR002874 \ This family consists of glycoprotein I from various members of the Alphaherpesvirinae, including herpes simplex virus, varicella-zoster virus and pseudorabies virus. Glycoprotein I (gI) is important during natural infection: mutants lacking gI produce smaller lesions at the site of infection and show reduced neuronal spread PUBMED:8764058. gI forms a heterodimeric complex with gE; this complex displays Fc receptor activity (it binds to the Fc region of immunoglobulin) PUBMED:8764058. Glycoproteins are also important in the production of virus-neutralizing antibodies and in cell-mediated immunity PUBMED:8207390. The Alphaherpesvirinae have a dsDNA genome and no RNA stage during viral replication.\ 54 IPR003374 \ This prokaryotic family of lipoproteins is related to ApbE from Salmonella typhimurium. ApbE is involved in thiamine synthesis PUBMED:9473043. More specifically, it may be involved in the conversion of aminoimidazole ribotide (AIR) to 4-amino-5-hydroxymethyl-2-methylpyrimidine (HMP) during the biosynthesis of the pyrimidine moiety of thiamine.\ 6865 IPR010749 \

    This family consists of several hypothetical Enterobacterial proteins of around 120 residues in length. The function of this family is unknown.

    \ 3711 IPR001272 \ Phosphoenolpyruvate carboxykinase (ATP) (PEPCK) catalyzes the formation of phosphoenolpyruvate by decarboxylation of oxaloacetate while hydrolyzing ATP, a rate-limiting step in gluconeogenesis (the biosynthesis of glucose) PUBMED:1701430, PUBMED:8609605, PUBMED:8599762. It is involved in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants.\ 7288 IPR003611 \ This is a short helical motif of unknown function found in intron-associated nuclease 2, which is involved in intron homing.\ 5993 IPR010383 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates, and of related proteins, into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    The glycosyltransferase family 36 includes cellobiose phosphorylase, cellodextrin phosphorylase and chitobiose phosphorylase. Many members of this family contain two copies of the domain represented in this entry.

    \ 6359 IPR009494 \

    This family consists of several bacterial proteins from Staphylococcus aureus as well as a number of phage proteins. The function of this family is unknown.

    \ 6900 IPR009777 \

    This family consists of several hypothetical bacterial proteins of around 250 residues in length. Members of this family are often named YacF, after the Escherichia coli protein. The function of this family is unknown.

    \ 5655 IPR008388 \ This family consists of eukaryotic vacuolar ATP synthase subunit S1 proteins PUBMED:7929063.\ 1503 IPR003763 \ The CDP-diacylglycerol pyrophosphatases play a role in the regulation of phospholipid metabolism by inositol, as well as regulating the cellular levels of phosphatidylinositol PUBMED:11016943.\ 7588 IPR011671 \ These sequences are derived from hypothetical eukaryotic proteins. The region in question is approximately 300 residues long.\ 5423 IPR008489 \ This is a family of uncharacterised ORFs found in bacteriophages and Lactococcus lactis.\ 7644 IPR012899 \

    This five-residue motif is found in a number of bacterial proteins bearing similarity to the protein CpxP. CpxP is a periplasmic protein that helps combat extracytoplasmic protein-mediated toxicity and may also be involved in the response to alkaline pH PUBMED:947. Another member of this family, Spy, is also a periplasmic protein that may be involved in the response to stress PUBMED:9068658. The homology between CpxP and Spy may indicate that these two proteins are functionally related PUBMED:9473036. The motif is repeated twice in many members of this entry.

    \ 1950 IPR004347 \

    This domain represents the C-terminal region of Orf6, which is localised upstream of the 20S proteasome subunit genes, prcA and prcB in members of the Actinobacteria: Streptomyces coelicolor PUBMED:9765579, Frankia sp. PUBMED:10652097 and Rhodococcus erythropolis PUBMED:7583123.

    \ 3606 IPR005653 \

    Most proteins in this family are uncharacterised. However, the family does include Escherichia coli OstA, which has been characterised as an organic solvent tolerance protein PUBMED:7811102.

    \ 2579 IPR002089 \ This protein spans the viral membrane, with an extracellular amino terminus and a cytoplasmic carboxy terminus. Influenza virus M2 acts as an ion channel protein PUBMED:9360376. The channel pore is formed by the transmembrane domain of the M2 protein; the wild-type M2 channel was found to be regulated by pH and may have a pivotal role in the biology of influenza virus infection PUBMED:1374685.\ 3493 IPR006029 \

    Neurotransmitter ligand-gated ion channels are transmembrane receptor-ion channel complexes that open transiently upon binding of specific ligands, allowing rapid transmission of signals at chemical synapses PUBMED:1721053, PUBMED:1846404.

    \

    Of the five families known, four have been shown to form a sequence-related superfamily. These are the gamma-aminobutyric acid type A (GABA-A), nicotinic acetylcholine, glycine and serotonin 5HT3 receptors. The ionotropic glutamate receptors have a distinct primary structure.

    \

    However, all these receptors possess a pentameric structure (made up of varying subunits), surrounding a central pore. Each of these subunits contains a large extracellular N-terminal ligand-binding region; 3 hydrophobic transmembrane domains; a large intracellular region; and a fourth hydrophobic domain PUBMED:1721053, PUBMED:1846404.

    \ \

    This domain represents four transmembrane helices of a variety of neurotransmitter-gated ion-channels.

    \ 6087 IPR010427 \

    This is a family of uncharacterised proteins found in Actinobacteria.

    \ 3607 IPR007543 \ This family is involved in organic solvent tolerance in bacteria. The region contains several highly conserved, potentially catalytic, residues PUBMED:7811102.\ 3674 IPR000014 \

    PAS domains are present in many signalling proteins, where they are used as signal sensor domains. PAS domains appear in archaea, bacteria and eukaryotes. Several PAS-domain proteins are known to detect their signal by way of an associated cofactor: haem, flavin and a 4-hydroxycinnamyl chromophore are used in different proteins. The PAS domain was named after three proteins in which it occurs:

    \
  • Per: period circadian protein
  • Arnt: Ah receptor nuclear translocator protein
  • Sim: single-minded protein

    PAS domains are often associated with PAC domains. It appears that these domains are directly linked, and that together they form the conserved 3D PAS fold. The division between the PAS and PAC domains is caused by major differences in sequence in the region connecting these two motifs PUBMED:15009198. In human PAS kinase, this region has been shown to be very flexible and adopts different conformations depending on the bound ligand PUBMED:12377121. Probably the most surprising identification of a PAS domain was in EAG-like K+ channels PUBMED:9301332.

    \ \ 257 IPR005071 \ This family of worm proteins has no known function.\ 6578 IPR010627 \

    This domain is found at the N terminus of bacterial aspartic peptidases belonging to MEROPS peptidase family A24 (clan AD), subfamily A24A (type IV prepilin peptidase). Its function has not been specifically determined; however, some members of the family have been characterised as bifunctional PUBMED:8057924, and this domain may carry the N-methylation activity. The domain consists of an intracellular region between a pair of transmembrane domains. This intracellular region contains an invariant proline and four conserved cysteines. The cysteines are arranged in a two-pair motif: the two residues of each pair are (usually) separated by 2 amino acids, and the two pairs are separated by 21 largely hydrophilic residues (C-X-X-C...X21...C-X-X-C). These cysteines have been shown to be essential to the overall function of the enzyme PUBMED:8340405, PUBMED:9224881.
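    The C-X-X-C...X21...C-X-X-C spacing can be expressed directly as a regular expression. The sketch below does this as a minimal illustration. The default spacer of exactly 21 residues follows the text. The `min_gap`/`max_gap` parameters are a hypothetical addition that allows for the "usually" caveat and are not a published rule, and the function names and toy sequences are invented for illustration.

```python
import re


def cys_motif_pattern(min_gap: int = 21, max_gap: int = 21):
    """Build a regex for C-X-X-C, then min_gap..max_gap residues, then C-X-X-C.

    min_gap/max_gap are hypothetical knobs; the text gives a spacer of 21.
    """
    # Doubled braces in the f-string produce a literal {min,max} quantifier.
    return re.compile(rf"C..C.{{{min_gap},{max_gap}}}C..C")


def find_cys_motifs(seq: str, min_gap: int = 21, max_gap: int = 21) -> list[int]:
    """Return 0-based start positions of non-overlapping two-pair Cys motifs."""
    pattern = cys_motif_pattern(min_gap, max_gap)
    return [m.start() for m in pattern.finditer(seq.upper())]


# Invented toy sequence: one CxxC pair, a 21-residue spacer, a second pair.
toy = "M" + "CAAC" + "D" * 21 + "CEEC" + "K"
print(find_cys_motifs(toy))  # [1]
```

    A strict 21-residue spacer will miss family members with variant spacing. Widening `max_gap` recovers those, at the cost of more chance matches in cysteine-rich sequences.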

    \ \

    The bifunctional enzyme prepilin peptidase (PilD) from Pseudomonas aeruginosa is a key determinant in both type IV pilus biogenesis and extracellular protein secretion, in its roles as a leader peptidase and methyltransferase (MTase). It is responsible for endopeptidic cleavage of the unique leader peptides that characterise type IV pilin precursors, as well as proteins with homologous leader sequences that are essential components of the general secretion pathway found in a variety of Gram-negative pathogens. Following removal of the leader peptides, the same enzyme is responsible for the second post-translational modification that characterises the type IV pilins and their homologues, namely N-methylation of the newly exposed N-terminal amino acid residue PUBMED:9224881.

    \ \ 2753 IPR006710 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 43 includes enzymes with the following activities: beta-xylosidase, alpha-L-arabinofuranosidase, arabinanase and xylanase.

    \ 4675 IPR000878 \ Uroporphyrin-III C-methyltransferase (SUMT) PUBMED:1856165, PUBMED:1906874 catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to the C-2 and C-7 atoms of uroporphyrinogen III to yield precorrin-2 via the intermediate formation of precorrin-1. SUMT is the first enzyme specific to the cobalamin pathway, and precorrin-2 is a common intermediate in the biosynthesis of modified tetrapyrroles such as vitamin B12, siroheme and coenzyme F430. The sequences of SUMT from a variety of bacterial and archaeal species are currently available. In species such as Bacillus megaterium (gene cobA), Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA), SUMT is a protein of about 25 to 30 kDa. In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into siroheme, and a C-terminal domain that has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin, which also seem to be SAM-dependent methyltransferases PUBMED:2211521, PUBMED:8501034. The similarity is especially strong with two of these enzymes, cobI/cbiL (S-adenosyl-L-methionine--precorrin-2 methyltransferase) and cobM/cbiF, whose exact function is not known.\ 5818 IPR010297 \

    This domain is found in hydrolase-like proteins of unknown function.

    \ 3343 IPR000523 \ Magnesium-chelatase is a three-component enzyme that catalyses the insertion of Mg2+ into protoporphyrin IX. This is the first unique step in the synthesis of (bacterio)chlorophyll. As a result, it is thought that Mg-chelatase has an important role in channeling intermediates into the (bacterio)chlorophyll branch in response to conditions suitable for photosynthetic growth. ChlI and BchD have molecular weights of 38-42 kDa.\ 2494 IPR003786 \

    Formate dehydrogenase is required for nitrate-inducible formate dehydrogenase activity. In Wolinella succinogenes it is a membrane-bound molybdoenzyme that is involved in phosphorylative electron transport. The functional formate dehydrogenase may be made up of three or four different subunits PUBMED:1781728. In Escherichia coli, FdhD is required for the formation of active formate dehydrogenases.

    \ 4999 IPR003425 \ This family consists of a repeat found in conserved hypothetical integral membrane proteins. The function of this region and of the proteins that possess it is unknown.\ 1107 IPR003471 \ Early region 3 (E3) of human adenoviruses (Ads) codes for proteins that appear to control viral interactions with the host PUBMED:8627757. This region, called CR1 (conserved region 1) PUBMED:8627757, is found three times in the 49 kDa E3 protein of adenovirus type 19 (a subgroup D virus). CR1 is also found in the 20.1 kDa protein of subgroup B adenoviruses. The function of this 80-amino-acid region is unknown; it is probably a divergent immunoglobulin domain.\ 4481 IPR007727 \ This family of proteins includes Spo12 from Saccharomyces cerevisiae. The Spo12 protein plays a regulatory role in two of the most fundamental processes of biology, mitosis and meiosis, yet its biochemical function remains elusive PUBMED:11729145. Spo12 is a nuclear protein PUBMED:11278742. It is a component of the FEAR (Cdc fourteen early anaphase release) regulatory network, which promotes Cdc14 release from the nucleolus during early anaphase PUBMED:11832211. The FEAR network comprises the polo kinase Cdc5, the separase Esp1, the kinetochore-associated protein Slk19, and Spo12 PUBMED:11832211.\ 6537 IPR009585 \

    This family consists of several hypothetical bacterial proteins of around 51 residues in length which seem to be specific to Vibrio cholerae. The function of this family is unknown.

    \ 1948 IPR004319 \ This domain is found only in Mycoplasma pneumoniae proteins of unknown function. A related domain is also found only in mycoplasmal proteins of the MG032/MG096/MG288 family, and the two domains often occur together.\ 990 IPR003482 \ WhiB is a putative transcription factor in Actinobacteria, required for differentiation and sporulation. The process of mycelium formation in Streptomyces, which occurs in response to nutrient limitation, is controlled by a number of whi genes, named for the white colour of the aerial hyphae when mutations occur in these genes; the normal colour is grey. The exact role of WhiB is not clear, but a mutation in the gene results in white, tightly coiled aerial hyphae.\ 670 IPR003099 \

    Members of this family are prephenate dehydrogenases involved in tyrosine biosynthesis.

    \ 2216 IPR007569 \ This is a family of uncharacterised proteins.\ 5309 IPR008691 \ Most of the antigens of Mycobacterium leprae and Mycobacterium tuberculosis that have been identified are members of stress protein families, which are highly conserved throughout many diverse species. Of the M. leprae and M. tuberculosis antigens identified by monoclonal antibodies, all except the 18 kDa M. leprae antigen and the 19 kDa M. tuberculosis antigen are strongly cross-reactive between these two species and are encoded by very similar genes PUBMED:8454357, PUBMED:2230723.\ 3406 IPR003330 \ The immunogenic major surface antigen (MSG), also termed glycoprotein A (gpA), is involved in the immunopathogenesis of Pneumocystis carinii. MSG from all P. carinii has a conserved secondary structure as well as a conserved function PUBMED:9679195, PUBMED:9712777.\ 6142 IPR010452 \

    This family consists of several bacterial isocitrate dehydrogenase kinase/phosphatase (AceK) proteins PUBMED:9409817.

    \ 6982 IPR009824 \

    This family consists of several hypothetical cyanobacterial proteins of around 150 residues in length, which seem to be specific to Anabaena species. The function of this family is unknown.

    \ 88 IPR004210 \ The BESS motif is named after the proteins in which it is found (BEAF PUBMED:7781065, Suvar(3)7 PUBMED:2107402 and Stonewall PUBMED:8631271). The motif is 40 amino acid residues long and is composed of two predicted alpha helices. Based on the protein in which it is found and the presence of conserved positively charged residues it is predicted to be a DNA binding domain. This domain appears to be specific to Drosophila.\ 6853 IPR010745 \

    This family consists of several hypothetical bacterial proteins of around 150 residues in length. The function of this family is unknown.

    \ 5063 IPR007900 \

    Accurate transcription initiation at protein-coding genes by RNA polymerase II requires the assembly of a multiprotein complex around the mRNA start site. Transcription factor TFIID is one of the general factors involved in this process. Yeast TFIID comprises the TATA-binding protein (TBP) and 14 TBP-associated factors (TAFIIs), nine of which contain histone-fold domains (HFDs). The C-terminal region of the TFIID-specific yeast TAF4 (yTAF4), which contains the HFD, shares strong sequence similarity with Drosophila (d)TAF4 and human TAF4. A structure/function analysis of yTAF4 demonstrates that the HFD, a short conserved C-terminal domain (CCTD), and the region separating them are all required for yTAF4 function. This region of similarity is found in transcription initiation factor TFIID component TAF4 PUBMED:12237303.

    \ 6243 IPR011258 \

    This family represents the N-terminal region of 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (also called phosphoglyceromutase or BPG-independent PGAM). The family is found in conjunction with the Metalloenzyme domain, located in the C-terminal region of the protein.

    \ 3238 IPR000372 \

    Leucine-rich repeat (LRR) domains consist of 2-45 repeated motifs, each 20-30 amino acids long, that generally fold into an arc or horseshoe shape PUBMED:14747988. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions PUBMED:11751054. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors and extracellular matrix-binding glycoproteins. They are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis and the immune response.

    \ \

    LRRs are often flanked by cysteine-rich domains: an N-terminal LRR domain and a C-terminal LRR domain (). This entry represents the N-terminal LRR domain.

    \ \ 2423 IPR000941 \

    Enolase (2-phospho-D-glycerate hydro-lyase) is an essential glycolytic enzyme that catalyses the interconversion of 2-phosphoglycerate and phosphoenolpyruvate PUBMED:1859865, PUBMED:1840492. In vertebrates, there are three different, tissue-specific isoenzymes, designated alpha, beta and gamma. Alpha is present in most tissues, beta is localised in muscle tissue, and gamma is found only in nervous tissue. The functional enzyme exists as a dimer of any two isoforms. In immature organs and in adult liver, it is usually an alpha homodimer; in adult skeletal muscle, a beta homodimer; and in adult neurons, a gamma homodimer. In developing muscle, it is usually an alpha/beta heterodimer, and in the developing nervous system, an alpha/gamma heterodimer PUBMED:3390159. The tissue-specific forms display minor kinetic differences. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown to be evolutionarily related to enolase PUBMED:3589669.

    \

    Neuron-specific enolase is released in a variety of neurological conditions, such as multiple sclerosis, and after seizures or acute stroke. Several tumour cell types have also been found to be positive for neuron-specific enolase. Beta-enolase deficiency is associated with glycogenosis type XIII.

    \ 4424 IPR007631 \ The domain is found in the primary vegetative sigma factor. The function of this domain is unclear, and it can be removed without apparent loss of function PUBMED:8858155, PUBMED:11931761.\ 6691 IPR010679 \

    This family represents a conserved region about 130 residues long within hypothetical proteins of unknown function. Family members include eukaryotic, bacterial and archaeal proteins.

    \ 375 IPR001850 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They exhibit a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine protease (denoted S1-S27) have been identified; these are grouped into six clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
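
    Because the linear order of the triad residues commonly reflects clan relationships, a short sketch can make the lookup concrete. It is illustrative only: the order table is taken from the paragraph above, the helper names are hypothetical (not part of any MEROPS or InterPro API), and a matching order is a hint rather than proof of clan membership. The example positions use the conventional chymotrypsin numbering (His57, Asp102, Ser195) and subtilisin BPN' numbering (Asp32, His64, Ser221).

```python
# Hypothetical helpers: match a catalytic-triad ordering to serine peptidase clans.
# The clan-to-order table is the one given in the text (H = His, D = Asp, S = Ser),
# listed in N- to C-terminal order.
TRIAD_ORDER = {
    "SA": "HDS",  # chymotrypsin clan
    "SB": "DHS",  # subtilisin clan
    "SC": "SDH",  # carboxypeptidase C clan
}


def triad_order(positions: dict[str, int]) -> str:
    """Return the one-letter triad codes sorted by sequence position."""
    if set(positions) != {"H", "D", "S"}:
        raise ValueError("expected positions for exactly H, D and S")
    return "".join(sorted(positions, key=lambda residue: positions[residue]))


def matching_clans(positions: dict[str, int]) -> list[str]:
    """List the clans whose characteristic triad order matches the input."""
    order = triad_order(positions)
    return [clan for clan, expected in TRIAD_ORDER.items() if expected == order]


# Chymotrypsin and subtilisin BPN' triad positions.
print(matching_clans({"H": 57, "D": 102, "S": 195}))  # ['SA']
print(matching_clans({"D": 32, "H": 64, "S": 221}))   # ['SB']
```

    Sorting by position, rather than storing full sequences, keeps the sketch focused on the one property the text describes: the relative order of the three residues.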

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
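
    The letter-code scheme described above can be expressed as a small lookup. This is a sketch under stated assumptions: the codes and the nucleophile grouping come from the paragraph above, while the function names are hypothetical and do not correspond to any MEROPS software interface. The sample identifiers are family and clan names that appear elsewhere in this document.

```python
# Hypothetical lookup of MEROPS catalytic-type codes (first character of a
# family or clan identifier), using the codes listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan with families of more than one type)",
}

# Nucleophile grouping as described in the text: serine, threonine and
# cysteine peptidases use a catalytic amino acid and form an acyl
# intermediate; aspartic, glutamic and metallopeptidases use activated water.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by an identifier such as 'S7' or 'PA'."""
    code = identifier[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(identifier: str) -> str:
    """Return the kind of nucleophile implied by the catalytic-type code."""
    code = identifier[:1].upper()
    if code in AMINO_ACID_NUCLEOPHILE:
        return "catalytic amino acid residue"
    if code in WATER_NUCLEOPHILE:
        return "activated water"
    return "not specified"


for ident in ["S7", "S16", "S30", "C6", "PA"]:
    print(ident, catalytic_type(ident), nucleophile(ident), sep=" | ")
```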

    \ \

    This signature identifies serine peptidases belonging to MEROPS peptidase family S7 (flavivirin family, clan PA(S)). The protein fold of the peptidase domain of members of this family resembles that of chymotrypsin, the type example for clan PA.

    \ \ \

    Flaviviruses produce a polyprotein from the single-stranded RNA (ssRNA) genome. The N-terminal region of the NS3 protein (approx. 180 aa) is required for processing of the polyprotein. NS3 also shares conserved sequence similarity with NTP-binding proteins and with the DEAD family of RNA helicases PUBMED:7642575, PUBMED:2174669, PUBMED:8269709.

    \ 3555 IPR005661 \

    Members of this family are integral membrane proteins. The decarboxylation reactions they catalyse are coupled to the vectorial transport of Na+ across the cytoplasmic membrane, thereby creating a sodium ion motive force that is used for ATP synthesis PUBMED:9428714.

    \ 5736 IPR008585 \ This family consists of a number of bacterial and phage proteins of unknown function, found in Bacillus species and in the Lambda-like viruses.\ 7475 IPR011510 \

    This entry represents a second domain related to the sterile alpha motif (SAM) domain. SAM domains are known to be involved in diverse protein-protein interactions, associating with both SAM-containing and non-SAM-containing proteins.

    \ 4833 IPR003442 \ This group consists of bacterial proteins that contain a P-loop.\ 4758 IPR004933 \ There are several antigenic variants of Rickettsia tsutsugamushi, and a 56-kDa type-specific antigen (TSA) located on the rickettsial surface is responsible for the variation PUBMED:2496028, PUBMED:1618776. TSA proteins are probably integral membrane proteins. \ \ 5490 IPR008635 \ This short motif is found in invasins and haemagglutinins, normally associated with the Hep_Hag repeat ().\ 1884 IPR003731 \

    This entry represents several Nif (B, X and Y) proteins, which are involved in the biosynthesis of the iron-molybdenum cofactor (FeMo-co) found in the dinitrogenase enzyme of the nitrogenase complex in nitrogen-fixing bacteria. The nitrogenase complex catalyses the reduction of atmospheric dinitrogen to ammonia, and is composed of an iron metalloprotein (dinitrogenase reductase; homodimer of NifH; ) and an Fe-Mo metalloprotein (dinitrogenase; heterotetramer of NifD and NifK; ). The pathway for the synthesis of the Fe-Mo cofactor involves several proteins, including NifB, NifE, NifH, NifN, NifQ, NifV and NifX. NifB appears to be an iron-sulphur source for FeMo-co biosynthesis, while NifX may be associated with the mature FeMo-co, in particular with the addition of homocitrate during the last step of biosynthesis PUBMED:11279153. The NifX protein shows sequence similarity with the C-terminus of NifB PUBMED:12892890, as well as with the conserved protein MTH1175 from the archaeon Methanobacterium thermoautotrophicum, which displays a ribonuclease H-like motif of three layers, alpha/beta/alpha, with a single mixed beta-sheet PUBMED:12836677.

    \ 915 IPR000594 \ Ubiquitin-activating enzyme (E1 enzyme) PUBMED:1647207, PUBMED:1656558 activates ubiquitin by first adenylating its C-terminal glycine residue with ATP and then linking this residue to the side chain of a cysteine residue in E1, yielding a ubiquitin-E1 thioester and free AMP. The ubiquitin moiety is then transferred to a cysteine residue on one of the many forms of ubiquitin-conjugating enzymes (E2).\

    The family of ubiquitin-activating enzymes shares, in its catalytic domain, significant similarity with a large family of NAD/FAD-binding proteins. This domain is based on the common NAD/FAD-binding fold and is found in members of several families, including UBA ubiquitin-activating enzymes; the hesA/moeB/thiF family; NADH peroxidases; the LDH family; sarcosine oxidase; phytoene dehydrogenases; alanine dehydrogenases; hydroxyacyl-CoA dehydrogenases and many other NAD/FAD-dependent dehydrogenases and oxidases.

    \ 3091 IPR000442 \ Group II introns use intron-encoded reverse transcriptase, maturase and DNA endonuclease activities for site-specific insertion into DNA PUBMED:9362497. Although this type of intron is self-splicing in vitro, it requires a maturase protein for splicing in vivo. It has been shown that a specific region of the aI2 intron is needed for the maturase function PUBMED:8029012. This region was found to be conserved in group II introns and called domain X PUBMED:8255751.\ 155 IPR001268 \

    Synonym(s): Ubiquinone reductase, Type I dehydrogenase, Complex I dehydrogenase

    \ \

    NADH dehydrogenase (ubiquinone) () is an oligomeric enzymatic complex located in the inner mitochondrial membrane, in the chloroplast or in cyanobacteria (as an NADH-plastoquinone oxidoreductase). The 30 kDa subunit is one of the 25 to 30 polypeptide subunits of this bioenergetic enzyme complex. In mammals and in Neurospora crassa, it is nuclear-encoded as a precursor with a transit peptide; in Paramecium (protein P1) and in Dictyostelium discoideum, it is mitochondrially encoded; and in various higher plants, it is chloroplast-encoded. It is also present in bacteria.

    \ 553 IPR003111 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They exhibit a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine protease (denoted S1-S27) have been identified; these are grouped into six clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This signature defines the N-terminal domain of the archaeal, bacterial and eukaryotic lon proteases, which are ATP-dependent serine peptidases belonging to MEROPS peptidase family S16 (lon protease family, clan SF). In eukaryotes, the majority of these proteins are located in the mitochondrial matrix PUBMED:8248235, PUBMED:9620272. In yeast, Pim1 is located in the mitochondrial matrix and is required for mitochondrial function; it is constitutively expressed, but its expression increases after thermal stress, suggesting that Pim1 may play a role in the heat shock response PUBMED:8276800.

    \ 5922 IPR009288 \

    AIG2 is an Arabidopsis protein that exhibits RPS2- and avrRpt2-dependent induction early after infection with Pseudomonas syringae pv. maculicola strain ES4326 carrying avrRpt2 PUBMED:8742710.

    \ 7737 IPR012927 \

    This domain is present in the N-terminal region of the ShET2 enterotoxin produced by Shigella flexneri () and Escherichia coli (). This protein was found to confer toxigenicity in Ussing chamber assays, and the N-terminal region was found to be important for its enterotoxic effect. It is thought to be a hydrophobic protein that forms inclusion bodies within the bacterial cell, and may be secreted by the Mxi system PUBMED:7591128. Most proteins containing this domain are annotated as putative enterotoxins, but one member () is a regulator of acetyl CoA synthetase, and another two members ( and ) are annotated as ankyrin-like regulatory proteins and contain Ank repeats ().

    \ 5889 IPR010336 \

    ME53 is one of the major early-transcribed genes of baculoviruses. The ME53 protein is reported to contain a putative zinc finger motif PUBMED:8093490.

    \ 5000 IPR007029 \ This short presumed domain is about 50 amino acid residues long. It often contains two cysteines that may be functionally important. This domain is found in copper-transporting ATPases, some phenol hydroxylases and in a set of uncharacterised membrane proteins including . This domain is named after three of the most conserved amino acids it contains. The domain may bind metal ions, possibly copper. This domain is duplicated in some copper-transporting ATPases.\ 7606 IPR011689 \ This is a group of proteins, expressed in the crenarchaeon Pyrobaculum aerophilum, whose members are variable in length and level of conservation. The presence of numerous frameshifts and internal stop codons in multiple alignments is thought to indicate that most family members are no longer functional PUBMED:11792869.\ 1359 IPR007623 \

    This family includes the human p75NTR-associated cell death executor (nerve growth factor receptor-associated protein 1), which may be a signalling adaptor molecule involved in p75NTR-mediated apoptosis induced by nerve growth factor. It may be important in neurogenetic diseases.

    \ 4269 IPR001205 \ RNA-directed RNA polymerase, P3D (), is part of the genome polyprotein that also contains coat proteins VP1 to VP4, core proteins P2A to P2C and P3A, genome-linked protein VPG and picornain 3C () (protease 3C) (P3C). RNA-directed RNA polymerase catalyses RNA-template-directed extension of the 3'-end of an RNA strand by one nucleotide at a time, and can initiate a chain de novo.\ 5816 IPR010296 \

    This family consists of uncharacterised, thioredoxin-like bacterial proteins of unknown function.

    \ 737 IPR002540 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They exhibit a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine protease (denoted S1-S27) have been identified; these are grouped into six clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The Potyviridae are a family of positive-strand RNA viruses, members of which include zucchini yellow mosaic virus and turnip mosaic virus, which cause considerable crop losses worldwide.

    \ \

    This entry represents a C-terminal region from various plant potyvirus P1 proteins (found at the N terminus of the polyprotein). The C terminus of P1 is a serine peptidase belonging to MEROPS peptidase family S30 (clan PA(S)). It is the protease responsible for autocatalytic cleavage between P1 and the helper component protease, which is a cysteine peptidase belonging to MEROPS peptidase family C6 PUBMED:7844540, PUBMED:1529535. The P1 protein may be involved in virus-host interactions PUBMED:7844540.

    \ 5458 IPR008701 \ This family consists of several NPP1-like necrosis-inducing proteins from oomycetes, fungi and bacteria. Infiltration of NPP1 into leaves of Arabidopsis thaliana plants results in transcript accumulation of pathogenesis-related (PR) genes, production of ROS and ethylene, callose apposition, and HR-like cell death PUBMED:12410815.\ 2398 IPR002769 \

    This family includes eukaryotic translation initiation factor 6 (eIF6) as well as presumed archaeal homologues.

    \ \

    The assembly of 80S ribosomes requires joining of the 40S and 60S subunits, which is triggered by the formation of an initiation complex on the 40S subunit. This event is rate-limiting for translation, and depends on external stimuli and the status of the cell.

    \ \

    Eukaryotic translation initiation factor 6 (eIF6) binds specifically to the free 60S ribosomal subunit and prevents its association with the 40S ribosomal subunit PUBMED:9891075. Furthermore, eIF6 interacts in the cytoplasm with RACK1, a receptor for activated protein kinase C (PKC). RACK1 is a major component of translating ribosomes, which harbour significant amounts of PKC. Loading 60S subunits with eIF6 causes a dose-dependent translational block and impairment of 80S formation, which are reversed by expression of RACK1 and stimulation of PKC in vivo and in vitro. PKC stimulation leads to eIF6 phosphorylation and release, promoting 80S ribosome formation. RACK1 provides a physical and functional link between PKC signalling and ribosome activation.

    \ \ \ 7989 IPR012579 \

    This C-terminal domain is found in a novel family of hypothetical nucleolar proteins PUBMED:15112237.

    \ 943 IPR002319 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
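
    The class assignments listed above amount to a simple lookup table. The sketch below encodes exactly those two lists, together with the preferred tRNA hydroxyl for each class as stated in the text; the function name is hypothetical and the table reflects only the assignments given here.

```python
# Hypothetical lookup of aminoacyl-tRNA synthetase class by amino acid,
# using the class I and class II lists given in the text (three-letter codes).
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# Preferred tRNA ribose hydroxyl for aminoacylation, per the text.
ATTACHMENT_SITE = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for a three-letter amino acid code (case-insensitive)."""
    code = amino_acid.strip().capitalize()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise ValueError(f"no synthetase class listed for {amino_acid!r}")


# The two lists are disjoint and together cover the 20 standard amino acids.
assert len(CLASS_I | CLASS_II) == 20 and not CLASS_I & CLASS_II

for aa in ["Phe", "trp", "GLY"]:
    cls = synthetase_class(aa)
    print(aa, cls, ATTACHMENT_SITE[cls])
```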

    \ \

    The 10 class I synthetases are considered to share a catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    Phenylalanyl-tRNA synthetase () is an alpha2/beta2 tetramer composed of two different subunits, and belongs to class IIc. In eubacteria, the small subunit (pheS gene) can be designated as beta (E. coli) or alpha (the nomenclature adopted in InterPro). Reciprocally, the large subunit (pheT gene) can be designated as alpha (E. coli) or beta (see and ). In the other kingdoms (for example, in eukaryotes), the two subunits have equivalent lengths and can be identified by specific signatures. The enzyme from Thermus thermophilus has an alpha2/beta2 quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanyl-tRNA synthetase as a member of class II aaRSs was based only on sequence alignment of the small alpha subunit with other synthetases PUBMED:8199244.

    \ 7471 IPR011487 \

    This is a family of Rhodopirellula baltica hypothetical proteins of about 500 amino acids in length.

    \ 5194 IPR008029 \

    Endonuclease I is a junction-resolving enzyme encoded by bacteriophage T7 that selectively binds and cleaves four-way Holliday DNA junctions PUBMED:12093751. The structure of the enzyme shows that it forms a symmetric homodimer arranged in two well-separated domains. Each domain, however, is composed of elements from both subunits, and amino acid side chains from both protomers contribute to the active site PUBMED:11135673.

    \ 4949 IPR007782 \ Using reduced vitamin K, oxygen, and carbon dioxide, gamma-glutamyl carboxylase post-translationally modifies certain glutamates by adding carbon dioxide to the gamma position of those amino acids. In vertebrates, the modification of glutamate residues of target proteins is facilitated by an interaction between a propeptide present on target proteins and the gamma-glutamyl carboxylase PUBMED:10748045.\ 2465 IPR004140 \ The Exo70 protein forms one subunit of the exocyst complex. First discovered in Saccharomyces cerevisiae PUBMED:8978675, Exo70 and other exocyst proteins have been observed in several other eukaryotes, including humans. In S. cerevisiae, the exocyst complex is involved in the late stages of exocytosis, and is localized at the tip of the bud, the major site of exocytosis in yeast PUBMED:8978675. Exo70 interacts with the Rho3 GTPase PUBMED:10207081. This interaction mediates one of the three known functions of Rho3 in cell polarity: vesicle docking and fusion with the plasma membrane (the other two functions are regulation of actin polarity and transport of exocytic vesicles from the mother cell to the bud) PUBMED:10588647. In humans, the functions of Exo70 and the exocyst complex are less well characterized: Exo70 is expressed in several tissues and is thought to also be involved in exocytosis PUBMED:9405631.\ 5950 IPR010927 \

    This family consists of several bacterial TraH proteins which are involved in pilus assembly.

    \ 6174 IPR010466 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 5679 IPR008668 \ This family consists of several feline-specific lentivirus virion infectivity factor (VIF) proteins. VIF is essential for productive feline immunodeficiency virus infection of host target cells in vitro PUBMED:10441553.\ 1500 IPR004201 \ This domain has a double psi-beta barrel fold and includes VCP-like ATPase and N-ethylmaleimide-sensitive fusion protein N-terminal domains. Both the VAT and NSF N-terminal functional domains consist of two structural domains, of which this is the C-terminal one. The VAT-N domain found in AAA ATPases () is a 185-residue substrate recognition domain PUBMED:10531028.\ 1571 IPR003897 \

    Clostridial species are one of the major causes of food poisoning and other gastrointestinal illnesses. They are Gram-positive, spore-forming rods that occur naturally in the soil PUBMED:8335373. Members of the genus include Clostridium botulinum, which produces one of the most potent toxins in existence; Clostridium tetani, the causative agent of tetanus; and Clostridium perfringens, commonly found in wound infections and diarrhoea cases. The use of toxins to damage the host is a strategy deployed by many bacterial pathogens.

    \

    The major virulence factor of C. perfringens is the CPE enterotoxin, which is secreted upon invasion of the host gut and contributes to food poisoning and other gastrointestinal illnesses PUBMED:8335373. It has a molecular weight of 35.3 kDa, and is responsible for the disintegration of tight junctions between epithelial cells in the gut PUBMED:9087440. This mechanism is mediated by the host proteins claudin-3 and claudin-4, situated at the tight junctions.

    \

    Recently, two more host receptors have been characterised and expressed in vivo PUBMED:9334247. Named CPE-R and RVP1, these may be utilised in the passage of Clostridial species through the gut wall, although the regulatory mechanisms have not been elucidated.

    \ 7518 IPR011648 \

    The cyanobacterial clock proteins KaiA and KaiB are proposed as regulators of the circadian rhythm in cyanobacteria. The overall fold of the KaiA monomer is that of a four-helix bundle, which forms a dimer in the known structure PUBMED:15071498. The N-terminal domain of KaiA from cyanobacteria acts as a pseudo-receiver domain, but lacks the conserved aspartyl residue required for phosphotransfer in response regulators PUBMED:12438647.

    \ 7107 IPR009902 \

    This family consists of several hypothetical Arabidopsis thaliana proteins of around 225 residues in length. The function of this family is unknown.

    \ 6747 IPR009690 \

    This family consists of several phage Gp30.7 proteins of 121 residues in length. Family members seem to be exclusively from the T4-like viruses. The function of this family is unknown.

    \ 5926 IPR009290 \

    This family consists of several radial spoke protein 3 (RSP3) sequences. Eukaryotic cilia and flagella present in diverse types of cells perform motile, sensory, and developmental functions in organisms from protists to humans. They are centred by precisely organised, microtubule-based structures, the axonemes. The axoneme consists of two central singlet microtubules, called the central pair, and nine outer doublet microtubules. These structures are well conserved throughout evolution. The outer doublet microtubules, each composed of A and B sub-fibres, are connected to each other by nexin links, while the central pair is held at the centre of the axoneme by radial spokes. The radial spokes are T-shaped structures extending from the A-tubule of each outer doublet microtubule to the centre of the axoneme. Radial spoke protein 3 (RSP3) is present at the proximal end of the spoke stalk and helps to anchor the radial spoke to the outer doublet. It is thought that radial spokes regulate the activity of inner arm dynein through protein phosphorylation and dephosphorylation PUBMED:12589069.

    \ \ 2156 IPR007458 \ Members of this family are uncharacterised proteins.\ 3190 IPR004616 \ Leucyl/phenylalanyl-tRNA--protein transferase (EC 2.3.2.-) transfers a Leu or Phe to the amino end of certain proteins to enable degradation. The N-terminal residue controls the biological half-life of many proteins via the N-end rule pathway.\ 8086 IPR011720 \

    This family consists of examples of the threonine biosynthesis (thr) operon leader peptide, also called the thr operon attenuator. The small gene for this peptide is often missed in genome annotation. It should be looked for in proteobacterial genomes, immediately upstream of genes for threonine biosynthesis, typically aspartokinase I/homoserine dehydrogenase, homoserine kinase and threonine synthase. Transcription of the rest of the thr operon is attenuated (mostly turned off) unless the ribosome pauses within a stretch of the leader sequence rich in codons for both Ile (made from Thr) and Thr itself, which happens when those amino acids are scarce. The leader peptide itself, once made, may have no role other than to be degraded. Similar systems exist for some other amino acid biosynthetic operons, such as trp.

    \ 3358 IPR001398 \

    Macrophage migration inhibitory factor (MIF) seems to play an important role in host inflammatory responses, where it is involved in the host response to endotoxic shock, probably serving as a pituitary "stress" hormone that regulates systemic inflammatory responses PUBMED:7737686. MIF is a secreted protein that is not processed from a larger precursor.

    \

    D-dopachrome tautomerase, related to MIF, is a mammalian cytoplasmic enzyme involved in melanin biosynthesis that tautomerizes D-dopachrome with concomitant decarboxylation to give 5,6-dihydroxyindole (DHI) PUBMED:8267597.

    \ 2287 IPR006978 \ This conserved region is found in the N-terminal region of a number of conserved archaeal proteins of unknown function.\ 7493 IPR011638 \

    The Gut family consists only of glucitol-specific permeases, but these occur in both Gram-negative and Gram-positive bacteria. Escherichia coli has a IIA protein, a IIC protein and a IIBC protein.

    This entry represents the C-terminal conserved region of the IIBC component.

    \ 398 IPR000840 \

    Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes PUBMED:9657938. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, their primary sequences can be very different.

    \

    This entry represents matrix proteins from gamma-retroviruses, such as Moloney murine leukaemia virus (MMLV), feline leukaemia virus (FLV), and feline sarcoma virus (FSV) PUBMED:12467570, PUBMED:9740771. This entry also identifies matrix proteins from several eukaryotic endogenous retroviruses, which arise when one or more copies of the retroviral genome becomes integrated into the host genome PUBMED:12876457.

    \ 6032 IPR009342 \

    This domain is conserved in enzymes that have carbohydrates as substrate, and may be a carbohydrate-binding domain.

    \ 4511 IPR007481 \

    Escherichia coli stringent starvation protein B (SspB) is thought to enhance the specificity of degradation of tmRNA-tagged proteins by the ClpXP protease. The tmRNA tag (also known as ssrA), an 11-residue peptide added to the C terminus of proteins stalled during translation, targets proteins for degradation by ClpXP and ClpAP. SspB is a cytoplasmic protein that specifically binds to residues 1-4 and 7 of the tag. Binding of SspB enhances degradation of tagged proteins by ClpX and masks sequence elements important for ClpA interactions, thereby inhibiting degradation by ClpA PUBMED:11535833. However, more recent work has cast doubt on the importance of SspB in wild-type cells PUBMED:11810257. SspB is encoded in an operon whose expression is stimulated by carbon, amino acid and phosphate starvation. SspB may play a special role during nutrient stress, for example by ensuring rapid degradation of the products of stalled translation without causing a global increase in the degradation of all ClpXP substrates PUBMED:11009422.

    \ 7801 IPR012936 \

    This domain occurs in many hypothetical proteins and in two partially characterised proteins. One of these, PTX1, is a homeodomain-containing transcription factor involved in regulating all pituitary hormone genes PUBMED:10067870; it is down-regulated in prostate carcinoma PUBMED:11445006. The other, ERGIC-32, is involved in protein transport from the ER to the Golgi PUBMED:15308636.

    \ 2662 IPR000115 \ Phosphoribosylglycinamide synthetase (GARS; phosphoribosylamine-glycine ligase) PUBMED:2687276 catalyzes the second step in the de novo biosynthesis of purines:\ \ \ \ In bacteria, GARS is a monofunctional enzyme (encoded by the purD gene). In yeast, it forms part of a bifunctional enzyme (encoded by the ADE5,7 gene) together with phosphoribosylformylglycinamidine cyclo-ligase (AIRS). In higher eukaryotes, it forms part of a trifunctional enzyme (GARS-AIRS-GART) together with AIRS and phosphoribosylglycinamide formyltransferase (GART).\ 746 IPR001353 \

    This group contains threonine peptidases and non-peptidase homologs belonging to MEROPS peptidase family T1 (proteasome family, clan PB(T)). The family consists of the protease components of the archaeal and bacterial proteasomes and the alpha and beta subunits of the eukaryotic proteasome.

    \ \

    ATP-dependent protease complexes are present in all three kingdoms of life, where they rid the cell of misfolded or damaged proteins and control the level of certain regulatory proteins. They include the proteasome in Eukaryotes, Archaea and Actinomycetales, and the HslVU (ClpQY, clpXP) complex in other eubacteria. Genes homologous to eubacterial HslV (ClpQ) and HslU (ClpY, clpX) have also been shown to be present in the genome of trypanosomatid protozoa PUBMED:12446803.

    \ \

    The proteasome (or macropain) PUBMED:7682410, PUBMED:2643381, PUBMED:1317508, PUBMED:7697118, PUBMED:8882582 is a multicatalytic proteinase complex involved in an ATP/ubiquitin-dependent non-lysosomal proteolytic pathway. In eukaryotes, the proteasome is composed of about 28 subunits, which form a highly ordered ring-shaped structure (20S ring) of about 700 kDa. On the basis of sequence similarity, most proteasome subunits can be classified into two groups, A and B. In eukaryotic organisms there are up to seven different types of beta subunits, three of which may carry the N-terminal threonine residues that act as nucleophiles in catalysis; these three show different specificities. The molecule is barrel-shaped, with the active sites on its inner surfaces, and terminal apertures restrict substrate access to the active sites.

    \ \

    In prokaryotes, the ATP-dependent proteasome is encoded by the heat-shock locus VU (HslVU). It consists of HslV, the protease (MEROPS peptidase subfamily T1B), and HslU, the ATPase and chaperone belonging to the AAA/Clp/Hsp100 family. The crystal structure of Thermotoga maritima HslV has been determined to 2.1 Å resolution; the structure of the dodecameric enzyme is well conserved compared with those from Escherichia coli and Haemophilus influenzae PUBMED:12646382, PUBMED:12823960.

    \ \ \ 4291 IPR005093 \

    This is a family of Leviviridae RNA replicases. The replicase is also known as RNA-dependent RNA polymerase.

    \ 6826 IPR009736 \

    This family consists of several hypothetical bacterial proteins of around 150 residues in length. Some family members are described as putative lipoproteins but the function of the family is unknown.

    \ 4615 IPR001839 \

    Transforming growth factor-beta (TGF-beta) PUBMED: is a multifunctional peptide that controls proliferation, differentiation and other functions in many cell types. TGF-beta-1 is a peptide of 112 amino acid residues derived by proteolytic cleavage from the C-terminal region of a precursor protein.

    \

    A number of proteins are known to be related to TGF-beta-1 PUBMED:, PUBMED:1575734, PUBMED:8199356. Proteins from the TGF-beta family are active only as homo- or heterodimers, the two chains being linked by a single disulphide bond. From X-ray studies of TGF-beta-2 PUBMED:1631557, it is known that all the other cysteines are involved in intrachain disulphide bonds. As shown in the following schematic representation, there are four disulphide bonds in the TGF-betas and in inhibin beta chains, while the other members of this family lack the first bond.

    \ \
    \
                                                         interchain\
                                                         |\
              +------------------------------------------|+\
              |                                          ||\
    xxxxcxxxxxCcxxxxxxxxxxxxxxxxxxCxxCxxxxxxxxxxxxxxxxxxxCCxxxxxxxxxxxxxxxxxxxCxCx\
        |      |                  |  |                                        | |\
        +------+                  +--|----------------------------------------+ |\
                                     +------------------------------------------+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    
    \ 2298 IPR007061 \ This family is commonly found in Streptomyces coelicolor and is of unknown function. These proteins contain several conserved histidines at their N-terminus that may form a metal binding site.\ 5583 IPR008802 \ This family consists of the highly related rubber elongation factor (REF), small rubber particle protein (SRPP) and stress-related protein (SRP) sequences. REF and SRPP are released from the rubber particle membrane into the cytosol during osmotic lysis of the sedimentable organelles (lutoids). The exact function of this family is unknown PUBMED:12461132.\ 4272 IPR002092 \

    DNA-dependent RNA polymerases are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-dependent RNA polymerase, each transcribing a different set of genes. Most RNA polymerases are multimeric enzymes composed of a variable number of subunits. RNA synthesis begins after RNA polymerase attaches to a specific site on the template DNA strand, the promoter, and continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript.\ \ Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise:\ \

    \ \ Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits.

    \

    This is a family of single-chain polymerases that are evolutionarily related and originate from bacteriophage or from mitochondria PUBMED:7526118.

    \ 4575 IPR004370 \

    4-Oxalocrotonate tautomerase (4-OT) catalyzes the isomerisation of beta,gamma-unsaturated enones to their alpha,beta-isomers. The enzyme is part of a plasmid-encoded pathway that enables bacteria harbouring the plasmid to use various aromatic hydrocarbons as their sole sources of carbon and energy. The enzyme is a barrel-shaped hexamer, which can be viewed as a trimer of dimers. The hexamer contains a hydrophobic core formed by three beta-sheets and surrounded by three pairs of alpha-helices. Each 62-residue 4-OT monomer has a relatively simple beta-alpha-beta fold, as described by the structure of the enzyme from Pseudomonas putida PUBMED:12051677. The monomer begins with a conserved proline at the start of a beta-strand, followed by an alpha-helix and a 3₁₀ helix preceding a second, parallel beta-strand, and ends with a beta-hairpin near the C terminus. The dimer results from antiparallel interactions between the beta-sheets and alpha-helices of the two monomers, forming a four-stranded beta-sheet with antiparallel alpha-helices on one side and creating two active sites, one at each end of the beta-sheet. Three dimers further associate to form a hexamer through interactions of the strands of the C-terminal beta-hairpin loops with the edges of the four-stranded beta-sheets of neighbouring dimers, creating a series of cross-links that stabilise the hexamer.

    \

    Pro-1 of the mature protein functions as the general base, while Arg-39 and an ordered water molecule each provide a hydrogen bond to the C-2 oxygen of the substrate. Arg-39 plays an additional role in binding the C-1 carboxylate group. Arg-11 participates in both substrate binding and catalysis: it interacts with the C-6 carboxylate group, thereby holding the substrate in place and drawing electron density towards the C-5 position. The hydrophobic nature of the active site, which lowers the pKa of Pro-1 and provides a favourable environment for catalysis, is largely maintained by Phe-50.

    \ \

    Because several Arg residues located near the active site are not conserved among all members of this family and because of the presence of fairly distantly related paralogs in Campylobacter jejuni, the family is regarded as not necessarily uniform in function.

    \ \ 3604 IPR004894 \ This is a family of outer surface proteins from Borrelia. The function of these proteins is unknown.\ 4607 IPR003162 \ Human transcription initiation factor TFIID is composed of the TATA-binding polypeptide (TBP) and at least 13 TBP-associated factors (TAFs) that collectively or individually are involved in activator-dependent transcription PUBMED:7667268. \

    TAFII-31 protein is a transcriptional coactivator of the p53 protein PUBMED:7761466.

    \ 441 IPR002495 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification into distinct sequence-based families of glycosyltransferases that use nucleotide diphospho-sugars, nucleotide monophospho-sugars and sugar phosphates, and of related proteins, has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 8 comprises enzymes with a number of known activities: lipopolysaccharide galactosyltransferase, lipopolysaccharide glucosyltransferase 1, glycogenin glucosyltransferase and inositol 1-alpha-galactosyltransferase. These enzymes have a distant similarity to family GT_24.

    \ 674 IPR006800 \ Pellino is involved in Toll-like signalling pathways, and associates with the kinase domain of the Pelle Ser/Thr kinase PUBMED:10858658, PUBMED:11132151, PUBMED:10330490.\ 6024 IPR010398 \

    This is a family of predicted bacterial membrane proteins of unknown function.

    \ 4148 IPR000219 \

    The Rho family GTPases Rho, Rac and CDC42 regulate a diverse array of cellular processes. Like all members of the Ras superfamily, the Rho proteins cycle between active GTP-bound and inactive GDP-bound conformational states. Activation of Rho proteins through release of bound GDP and subsequent binding of GTP is catalyzed by guanine nucleotide exchange factors (GEFs) in the Dbl family. The proteins encoded by members of the Dbl family share a common domain of about 200 residues, presented in this entry and designated the Dbl homology (DH) domain, which has been shown to encode a GEF activity specific for a number of Rho family members. In addition, all family members possess a second shared domain, designated the pleckstrin homology (PH) domain. Trio and its homolog UNC-73 are unique within the Dbl family inasmuch as they encode two distinct DH/PH domain modules. The PH domain is invariably located immediately C-terminal to the DH domain, and this invariant topography suggests a functional interdependence between the two structural modules. Biochemical data have established the role of the conserved DH domain in Rho GTPase interaction and activation, and the role of the tandem PH domain in intracellular targeting and/or regulation of DH domain function. The DH domain of Dbl has been shown to mediate oligomerization that is mostly homophilic in nature. In addition to the tandem DH/PH domains, Dbl family GEFs contain diverse structural motifs such as serine/threonine kinase, RBD, PDZ, RGS, IQ, REM, Cdc25 RasGEF, CH, SH2, SH3, EF, spectrin or Ig.

    \ \

    The DH domain is composed of three structurally conserved regions separated by more variable regions. It does not share significant sequence homology with other subtypes of small G-protein GEF motifs, such as the Cdc25 domain and the Sec7 domain, which specifically interact with Ras and ARF family small GTPases, respectively, nor with other Rho protein interactive motifs, indicating that the Dbl family proteins are evolutionarily unique. The DH domain is composed of 11 alpha helices folded into a flattened, elongated alpha-helix bundle in which two of the three conserved regions, conserved region 1 (CR1) and conserved region 3 (CR3), are exposed near the center of one surface. CR1 and CR3, together with part of alpha-6 and the DH/PH junction site, constitute the Rho GTPase-interacting pocket.

    \ \ 7945 IPR012538 \

    This entry represents cytochrome c oxidase subunit IIa. The ba3-type cytochrome c oxidase from Thermus thermophilus is known as a two-subunit enzyme, but its crystal structure revealed an additional transmembrane helix, subunit IIa, that spans the membrane. This subunit consists of 34 residues forming a single helix across the membrane. The presence of this subunit seems to be important for the function of cytochrome c oxidases PUBMED:11152118.

    \ 354 IPR000421 \ Blood coagulation factors V and VIII contain a C-terminal, twice-repeated domain of about 150 amino acids, which is called the F5/8 type C, FA58C or C1/C2-like domain. In the slime mold cell adhesion protein discoidin, a related domain, named the discoidin I-like domain (DLD or DS), has been found; it shares a common C-terminal region of about 110 amino acids with the FA58C domain, but its N-terminal 40 amino acids are much less conserved. Similar domains have been detected in other extracellular and membrane proteins PUBMED:3092220, PUBMED:8390675, PUBMED:8639264. In coagulation factors V and VIII, the repeated domains form part of a larger functional domain that promotes binding to anionic phospholipids on the surface of platelets and endothelial cells PUBMED:3125864. The C-terminal domain of the second FA58C repeat (C2) of coagulation factor VIII has been shown to be responsible for phosphatidylserine binding and to be essential for activity PUBMED:2110840, PUBMED:7515064. It forms an amphipathic alpha-helix, which binds to the membrane PUBMED:7893714. In most proteins, FA58C contains two conserved cysteines, which link the extremities of the domain by a disulphide bond PUBMED:8504111, PUBMED:7613471, PUBMED:8856064. A further disulphide bond is located near the C terminus of the second FA58C domain in MFGM PUBMED:8856064.\
    \
      +------------------------------------------------------------------------+\
      |                                                               +-+      |\
      |                                                               | |      |\
      CxPLGxxQITASxxxxxRLxxxWxxxxWxxxxxxQGxxxxxxxxxxxxGNxxxxxxxxxxRxPxcxcLRxExGC\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    'c': cysteine involved in a disulphide bond in MFGM.\
    'x': any amino acid.\
    upper case letters: conserved residues.\
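The schematic above can be read as a simple sequence pattern. The following is a minimal sketch, not an established FA58C detection method, that turns it into a Python regular expression and lists the positions of the two conserved, disulphide-linked cysteines. The helper names are hypothetical, and two simplifications are assumed: every upper-case residue is treated as invariant (the legend says only "conserved"), and the MFGM-specific 'c' positions are treated as unconstrained.

```python
import re

# Consensus transcribed from the schematic: 'x' = any residue,
# upper case = conserved residue, lower-case 'c' = cysteine bonded only in MFGM.
FA58C_SCHEMATIC = (
    "CxPLGxxQITASxxxxxRLxxxWxxxxWxxxxxxQGxxxxxxxxxxxxGNxxxxxxxxxxRxPxcxcLRxExGC"
)


def schematic_to_regex(schematic: str) -> "re.Pattern[str]":
    """Build a strict regex from the schematic (hypothetical helper).

    Assumption: upper-case residues are matched literally, while 'x' and the
    MFGM-only 'c' both match any single residue.
    """
    parts = []
    for ch in schematic:
        if ch in "xc":
            parts.append(".")  # any single residue
        else:
            parts.append(re.escape(ch))  # conserved residue matched literally
    return re.compile("".join(parts))


def conserved_cysteine_positions(schematic: str) -> list[int]:
    """1-based positions of the upper-case 'C's, i.e. the conserved disulphide pair."""
    return [i + 1 for i, ch in enumerate(schematic) if ch == "C"]


pattern = schematic_to_regex(FA58C_SCHEMATIC)
# Toy sequence that satisfies the schematic: 'x' and 'c' replaced by alanine.
toy = FA58C_SCHEMATIC.replace("x", "A").replace("c", "A")
print(bool(pattern.fullmatch(toy)))
print(conserved_cysteine_positions(FA58C_SCHEMATIC))
```

To look for the pattern inside a longer protein sequence, `pattern.search(sequence)` would be used instead of `fullmatch`. Because the conserved residues are only conserved and not invariant, an exact pattern like this will likely miss many genuine domains.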
    
    \ 18 IPR002912 \

    The ACT domain is found in a variety of contexts and is proposed to be a conserved regulatory binding fold. ACT domains are linked to a wide range of metabolic enzymes that are regulated by amino acid concentration. The archetypical ACT domain is the C-terminal regulatory domain of 3-phosphoglycerate dehydrogenase (3PGDH), which folds with a ferredoxin-like topology. A pair of ACT domains form an eight-stranded antiparallel sheet with two molecules of allosteric inhibitor serine bound in the interface. Biochemical exploration of a few other proteins containing ACT domains supports the suggestions that these domains contain the archetypical ACT structure PUBMED:11751050.

    \ \ 7589 IPR011672 \ This is a family of sequences coming from hypothetical proteins found in both bacterial and archaeal species.\ 1589 IPR000275 \

    Coagulogen is a gel-forming protein of hemolymph that hinders the spread of invaders by immobilising them PUBMED:3905780, PUBMED:6469947. The protein contains a single 175-residue polypeptide chain, which is cleaved after Arg-18 and Arg-46 by a clotting enzyme contained in the hemocyte and activated by bacterial endotoxin (lipopolysaccharide). Cleavage releases two chains of coagulin, A and B, linked by two disulphide bonds, together with the peptide C PUBMED:3905780, PUBMED:6469947. Gel formation results from the interlinking of coagulin molecules. Secondary structure prediction suggests that the C peptide forms an alpha-helix, which is released during the proteolytic conversion of coagulogen to coagulin gel PUBMED:3905780. The beta-sheet structure and the 16 half-cystines found in the molecule appear to yield a compact protein that is stable to acid and heat.
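The fragment boundaries follow directly from the two cleavage sites. The sketch below computes them. The helper name is hypothetical, and the order of the labels (chain A at the N terminus, peptide C in the middle, chain B at the C terminus) is an assumption, because the text does not state the order explicitly.

```python
def fragments_after_cleavage(length: int, cleave_after: list[int]) -> list[tuple[int, int]]:
    """Split residues 1..length into 1-based inclusive (start, end) ranges,
    cutting the peptide bond after each listed residue (hypothetical helper)."""
    bounds = [0] + sorted(cleave_after) + [length]
    return [(start + 1, end) for start, end in zip(bounds, bounds[1:])]


# Values from the text: a 175-residue chain cleaved after Arg-18 and Arg-46.
fragments = fragments_after_cleavage(175, [18, 46])
# Assumed labelling: N-terminal = chain A, middle = peptide C, C-terminal = chain B.
for label, (start, end) in zip(["A", "C", "B"], fragments):
    print(f"{label}: residues {start}-{end} ({end - start + 1} residues)")
# A: residues 1-18 (18 residues)
# C: residues 19-46 (28 residues)
# B: residues 47-175 (129 residues)
```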

    \

    Mammalian blood coagulation is based on the proteolytically induced polymerization of fibrinogens. Initially, fibrin monomers interact noncovalently with each other. The resulting homopolymers are further stabilized when plasma transglutaminase (TGase) forms intermolecular epsilon-(gamma-glutamyl)lysine cross-links. In crustaceans, hemolymph coagulation depends on the TGase-mediated cross-linking of specific plasma-clotting proteins, without a proteolytic cascade. In horseshoe crabs, the proteolytic coagulation cascade triggered by lipopolysaccharides and beta-1,3-glucans leads to the conversion of coagulogen into coagulin, resulting in noncovalent coagulin homopolymers through head-to-tail interaction. Horseshoe crab TGase, however, does not cross-link coagulins intermolecularly. Instead, coagulins have been found to be cross-linked to hemocyte cell surface proteins called proxins, indicating that this cross-linking reaction at the final stage of hemolymph coagulation is an important part of the innate immune system of horseshoe crabs PUBMED:15170505.

    \ 895 IPR007527 \

    The SWIM Zn-chelating domain is found in a variety of prokaryotic and eukaryotic proteins, including mitogen-activated protein kinase kinase kinase 1 (or MEKK 1) and several hypothetical proteins.

    \ 7758 IPR012933 \

    The viral, archaeal and bacterial proteins making up this family are similar to the YcfA protein expressed by Escherichia coli. Most of these proteins are hypothetical proteins of unknown function.

    \ 6278 IPR004671 \

    The Escherichia coli Na+:H+ antiporter NhaB has 12 predicted transmembrane segments (TMS) and catalyses sodium/proton exchange. Unlike that of NhaA, this activity is not pH-dependent.

    \ 3010 IPR007050 \

    Numerous bacterial transcription regulatory proteins bind DNA via a helix-turn-helix (HTH) motif. This entry represents the HTH DNA binding domain found in Halobacterium halobium and described as a putative bacterio-opsin activator.

    \ 7368 IPR011501 \

    Nucleolar complex-associated protein (Noc3p) is conserved in eukaryotes and plays essential roles in replication and rRNA processing in Saccharomyces cerevisiae PUBMED:12110182.

    \ 4582 IPR005333 \

    The cycloidea (cyc) and teosinte branched 1 (tb1) genes code for structurally related proteins implicated in the evolution of key morphological traits. However, the biochemical function of the CYC and TB1 proteins remains to be demonstrated. One of their conserved regions is predicted to form a non-canonical basic helix-loop-helix (bHLH) structure. This domain is also found in two rice DNA-binding proteins, PCF1 and PCF2, where it has been shown to be involved in DNA binding and dimerization. This indicates a new family of transcription factors, termed the TCP family after its first characterised members (TB1, CYC and PCFs) PUBMED:10363373.

    \ 4037 IPR007157 \ This family includes PspA, a protein that suppresses sigma54-dependent transcription. The PspA protein, a negative regulator of the Escherichia coli phage shock psp operon, is produced when virulence factors are exported through secretins in many Gram-negative pathogenic bacteria, and its homologue in plants, VIPP1, plays a critical role in thylakoid biogenesis, which is essential for photosynthesis. Activation of transcription by the enhancer-dependent bacterial sigma54-containing RNA polymerase occurs through ATP hydrolysis-driven protein conformational changes enabled by activator proteins belonging to the large AAA(+) mechanochemical protein family. PspA has been shown to act directly and specifically on, and bind to, the AAA(+) domain of the PspF transcription activator PUBMED:12079332.\ 408 IPR003902 \

    GCM transcription factors are a family of proteins that contain a GCM motif. The GCM motif is a domain that has been identified in a family of transcriptional regulators involved in fundamental developmental processes, comprising Drosophila melanogaster GCM and its mammalian homologs PUBMED:8962155, PUBMED:9114061, PUBMED:9580683, PUBMED:10671510. In GCM transcription factors, the N-terminal moiety contains a DNA-binding domain of 150 residues, and sequence conservation is highest in this GCM domain. In contrast, the C-terminal moiety contains one or two transactivating regions and is only poorly conserved.

    The GCM motif has been shown to be a DNA-binding domain that preferentially recognizes the nonpalindromic octamer 5'-ATGCGGGT-3' PUBMED:8962155, PUBMED:9114061, PUBMED:9580683. The GCM motif contains many conserved basic amino acid residues, seven cysteine residues and four histidine residues PUBMED:8962155. The conserved cysteines are involved in shaping the overall conformation of the domain, in DNA binding and in the redox regulation of DNA binding PUBMED:9580683. The GCM domain represents a new class of Zn-containing DNA-binding domain with no similarity to any other DNA-binding domain PUBMED:12682016. It consists of a large and a small domain tethered together by one of the two Zn ions present in the structure. The large and small domains comprise five- and three-stranded beta-sheets, respectively, with three small helical segments packed against the same side of the two beta-sheets. The GCM domain uses a novel mode of sequence-specific DNA recognition, in which the five-stranded beta-pleated sheet inserts into the major groove of the DNA. Residues protruding from the edge strand of the beta-pleated sheet, and from the following loop and strand, contact the bases and backbone of both DNA strands, providing specificity for the DNA target site.
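Because the preferred GCM octamer is nonpalindromic, a scan of one strand of double-stranded DNA also has to look for its reverse complement, 5'-ACCCGCAT-3', which corresponds to the site on the opposite strand. The sketch below does this under the simplifying assumption of exact matches only. Real binding sites may tolerate variation, and the helper names are hypothetical.

```python
# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_gcm_sites(dna: str, motif: str = "ATGCGGGT") -> list[tuple[int, str]]:
    """Report exact motif matches on either strand (hypothetical helper).

    Returns (0-based start on the given strand, strand) pairs: '+' means the
    motif reads 5'->3' on the given strand, '-' means it reads 5'->3' on the
    complementary strand.
    """
    seq = dna.upper()
    rc_motif = reverse_complement(motif)  # "ACCCGCAT" for the default motif
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif:  # separate check so palindromic motifs also work
            hits.append((i, "-"))
    return hits


print(find_gcm_sites("CCATGCGGGTAAACCCGCATGG"))
# [(2, '+'), (12, '-')]
```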

    \ 3143 IPR007741 \ Proteins containing this domain are located in the mitochondrion and include ribosomal proteins L51 and S25. This domain is also found in the mitochondrial NADH-ubiquinone oxidoreductase B8 subunit (CI-B8). It is not known whether all members of this family form part of the NADH-ubiquinone oxidoreductase and whether they are also all ribosomal proteins.\ 6433 IPR010572 \

    This family consists of hypothetical bacterial and viral proteins of unknown function.

    \ 4604 IPR004855 \

    Transcription initiation factor IIA (TFIIA) is a heterotrimer, the three subunits being known as alpha, beta, and gamma, in order of molecular weight. The N- and C-terminal domains of the gamma subunit are represented in TFIIA_gamma and TFIIA_gamma_C, respectively. This family represents the precursor that yields both the alpha and beta subunits. The TFIIA heterotrimer is an essential general transcription initiation factor for the expression of genes transcribed by RNA polymerase II. Together with TFIID, TFIIA binds to the promoter region; this is the first step in the formation of a pre-initiation complex (PIC). Binding of the rest of the transcription machinery follows this step PUBMED:11089979. After initiation, the PIC does not completely dissociate from the promoter. Some components, including TFIIA, remain attached and re-initiate a subsequent round of transcription.

    \ \ 4880 IPR005371 \

    This family contains small proteins of unknown function, about 50 amino acids in length. The family includes YoaH.

    \ 5310 IPR008837 \ The Drosophila serendipity alpha (sry alpha) gene is specifically transcribed at the blastoderm stage, from nuclear cycle 11 to the onset of gastrulation, in all somatic nuclei PUBMED:2166703. SRY-A is required for the cellularisation of the embryo and is involved in the localisation of the actin filaments just prior to and during plasma membrane invagination PUBMED:8287797.\ 1695 IPR001893 \

    This cysteine-rich repeat contains four cysteines. It is found in multiple copies in a protein that binds to fibroblast growth factors PUBMED:1448090. The repeat is also found in the Golgi apparatus protein 1 precursor (MG-160/ESL-1).

    \ 5992 IPR008314 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 6722 IPR010027 \

    This entry identifies a family of bacteriophage proteins, including protein G of phage lambda. A translational frameshift at a Gly-Lys dipeptide near the C terminus of lambda protein G, occurring with about 4% efficiency, has been described as producing the tail assembly protein G-T.

    \ 7718 IPR012416 \

    The members of this family are putative or actual calmodulin-binding proteins expressed by various plant species. Some members are known to be involved in the induction of plant defence responses PUBMED:12777041, but their precise function in this regard is as yet unknown.

    \ 2383 IPR004221 \ Restriction endonuclease EcoRI is a type II site-specific deoxyribonuclease that catalyses the endonucleolytic cleavage of DNA, using magnesium as a cofactor, to give specific double-stranded fragments with terminal 5'-phosphates. Type II restriction endonucleases are characterized by their specificity for recognising and cleaving specific DNA sequences, yet the sequences of these endonucleases are surprisingly unrelated. EcoRI recognises the DNA sequence GAATTC and cleaves after the first G (G^AATTC).\ 3537 IPR007758 \ The rotavirus nonstructural protein NSP1 is the least conserved protein in the rotavirus genome, and its function in the replication process is not fully understood. The NSP1-like protein appears to be an essential component of the nuclear pore complex; for example, preribosome nuclear export requires the Nup82p-Nup159p-Nsp1p complex. The C-terminal region of Nsp1 is involved in binding Nup82, probably via coiled-coil formation PUBMED:11689687.\ 1792 IPR001241 \

    Topoisomerases catalyse the interconversion of topological isomers of DNA and play a key role in DNA metabolism. Topoisomerase I catalyses an ATP-independent reaction, while topoisomerase II catalyses an ATP-dependent reaction, resulting in the formation of DNA supercoils PUBMED:1651812, PUBMED:1646964, PUBMED:2845399. Eukaryotic enzymes can form both positive and negative supercoils, while prokaryotic enzymes form only negative supercoils.

    \ \

    Eukaryotic topoisomerase II exists as a homodimer; in Enterobacteria phage T4 it consists of three heterologous subunits; in prokaryotes it exists as a tetramer of two subunits (two each of gyrA and gyrB); and in Escherichia coli, a second type II topoisomerase involved in chromosome segregation (topoisomerase IV) consists of two subunits (parC and parE). GyrB, parE and the product of bacteriophage T4 gene 39 are all similar to the eukaryotic proteins.

    \

    Structural studies of E. coli topoisomerase II have shown that the enzyme binds to DNA, forming a complex in which a DNA strand of approximately 120 base pairs is wound around a protein core. At low resolution, this complex resembles a flattened sphere, and may be heart-shaped, with the DNA embedded in the protein. There is evidence for channels or cavities in the complex, which may have a role in the DNA translocation process PUBMED:1646964.

    \

    The gyrB protein possesses two uniquely folded domains. The N-terminal domain (domain 1) possesses ATP-binding and hydrolysis functions and forms an 8-stranded antiparallel beta-sheet with unusual strand connectivities; the structure, which is stabilised by a hydrophobic core, can be subdivided into 6- and 2-stranded antiparallel sheets connected by a parallel sheet. The C-terminal domain (domain 2) contains a 4-stranded mixed parallel and antiparallel beta-sheet. Four helices are also present, two of which are rich in arginine residues. The gyrB dimer is punctured by a 20 Å hole, which may provide a gateway through which DNA is passed during supercoiling. Every arginine of domain 2 protrudes into this hole, possibly creating a DNA-binding surface PUBMED:1646964.

    \

    From this structural information and the results of various biochemical studies, a possible mechanism has been proposed: DNA is first bound by the gyrB dimer, then cleaved by gyrA. A large conformational change allows passage of another DNA strand through the double-stranded break and into the protein complex. This may involve ATP binding, using the energy of ATP association with the complex to stabilise an unfavourable protein conformation. The DNA break is then repaired by ligation and the whole DNA molecule released; this possibly involves hydrolysis of ATP to ADP and inorganic phosphate, which can dissociate from the protein, allowing the protein complex to return to its favoured conformation and release the DNA PUBMED:1646964.

    \ 2768 IPR003469 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \ This family consists of glycosyl hydrolase family 68 (), which includes several bacterial levansucrases and the invertase from Zymomonas. Levansucrase (), also known as beta-D-fructofuranosyl transferase, catalyses the conversion of sucrose and (2,6-beta-D-fructosyl)(N) to glucose and (2,6-beta-D-fructosyl)(N+1); other sugars can also act as fructosyl acceptors. Invertase, or extracellular sucrase (), catalyses the hydrolysis of terminal non-reducing beta-D-fructofuranoside residues in beta-D-fructofuranosides.\ 4823 IPR002036 \ These as yet uncharacterised proteins are 17 to 21 kDa in size. They contain a conserved region with three histidines at the C terminus.\ 3608 IPR002038 \ The major event of endochondral ossification is the proteolytic degradation of calcified cartilage and the extracellular matrix, and their replacement with bone-specific extracellular matrix produced and organised by osteoblasts PUBMED:2033080. One of the most abundant products of osteoblasts is osteopontin, a glycosylated phosphoprotein with a high content of acidic amino acids and one copy of the cell attachment sequence RGD PUBMED:2033080. Osteopontin is thought to act as a bridge between osteoblasts and the apatite mineral of the bone PUBMED:2033080. Osteopontin-K is a kidney protein that is similar to osteopontin and probably also involved in cell adhesion PUBMED:1414488.\ 4656 IPR001022 \ The movement protein of tobamoviruses is necessary for the initial cell-to-cell movement during the early stages of a viral infection. This movement is active and involves the interaction of the movement protein with the plasmodesmata. The movement protein binds RNA in order to perform this role PUBMED:1546450.\

    The N terminus contains two particularly well-conserved regions; substitutions in one of these result in temperature-sensitive cell-to-cell movement. The C terminus contains three sub-regions characterised by the distribution of charged amino acid residues PUBMED:3201760.

    \ 4049 IPR002745 \

    The final step of tRNA splicing in Saccharomyces cerevisiae requires 2'-phosphotransferase (Tpt1), which transfers the 2'-phosphate from ligated tRNA to NAD, producing mature tRNA and ADP-ribose 1''-2''-cyclic phosphate. Yeast and mouse Tpt1 proteins and the bacterial KptA protein can all catalyze the conversion of the reaction intermediate both to product and back to the original substrate, suggesting that these enzymes use the same reaction mechanism. Step 1 of this reaction is strikingly similar to the ADP-ribosylation of proteins catalyzed by a number of bacterial toxins.

    KptA, a functional Tpt1 protein homologue from Escherichia coli, is strikingly similar to yeast Tpt1 in its kinetic parameters, although Escherichia coli is not known to have a 2'-phosphorylated RNA substrate PUBMED:9915792, PUBMED:11705403.

    \ 6794 IPR010721 \

    This family contains a number of bacterial and eukaryotic proteins of unknown function that are approximately 300 residues long.

    \ 1260 IPR001461 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Aspartic endopeptidases () of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (groups of evolutionarily related families), and further subdivided into families, largely on the basis of their tertiary structure.

    \

    This group of aspartic peptidases belongs to MEROPS peptidase family A1 (pepsin family, clan AA). The type example is pepsin A from Homo sapiens.

    \

    Aspartic endopeptidases include pepsins, cathepsins, and renins. Most members of the pepsin family specifically cleave bonds in peptides that are at least six residues in length, with hydrophobic residues in both the P1 and P1' positions PUBMED:7674916. Crystallography has shown the active site to form a groove across the junction of the two lobes, with an extended loop projecting over the cleft to form an 11-residue flap, which encloses substrates and inhibitors within the active site PUBMED:7674916. Specificity is determined by several hydrophobic residues surrounding the catalytic aspartates, and by three residues in the flap.

    \

    Cysteine residues are well conserved within the pepsin family, pepsin itself containing three disulphide loops. The first loop is found in all but the fungal enzymes, and is usually around five residues in length, but is longer in barrierpepsin and candidapepsin; the second loop is also small and found only in the animal enzymes; and the third loop is the largest, found in all members of the family, except for the cysteine-free polyporopepsin. The loops are spread unequally throughout the two lobes, suggesting that they formed after the initial gene duplication and fusion event PUBMED:7674916.

    \

    This family does not include the retroviral or retrotransposon aspartic proteases, which are much smaller and appear to be homologous to the single-domain aspartic proteases.

    \ 202 IPR000591 \ This is a domain of unknown function present in signaling proteins including dishevelled, Egl-10 and pleckstrin proteins. The segment polarity protein dishevelled is required to establish coherent arrays of polarized cells and segments in embryos, and plays a role in wingless signaling. Egl-10 regulates G-protein signaling in the central nervous system. Mammalian regulators of G-protein signaling also contain these domains, and regulate signal transduction by increasing the GTPase activity of G-protein alpha subunits, thereby driving them into their inactive GDP-bound form.\ 5290 IPR000848 \

    G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \

    It has been suggested that the cAMP receptors coordinate aggregation of individual cells into a multicellular organism, and regulate the expression of a large number of developmentally-regulated genes PUBMED:3047871, PUBMED:8436297, PUBMED:8382181. The amino acid sequences of the receptors contain high proportions of hydrophobic residues grouped into 7 domains, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins. However, while a similar 3D framework has been proposed to account for this, there is no significant sequence similarity between these families: the cAMP receptors thus bear their own unique '7TM' signature.

    \ \ 1361 IPR002731 \ This domain is found in the BadF () and BadG () proteins, two subunits of benzoyl-CoA reductase that may be involved in ATP hydrolysis. The family also includes an activase subunit from the enzyme 2-hydroxyglutaryl-CoA dehydratase (). The hypothetical protein AQ_278 from Aquifex aeolicus contains two copies of this region, suggesting that the family may structurally dimerise.\ 7750 IPR012880 \

    The proteins featured in this family are all hypothetical eukaryotic proteins of unknown function. The region in question is approximately 150 residues long.

    \ 6149 IPR010454 \

    This family consists of several phage NinH proteins. The function of this family is unknown.

    \ 3872 IPR006787 \

    This conserved region is found at the N terminus of the member proteins, adjacent and N-terminal to the pinin/SDK/memA domain. Members of this family have very varied localisations within the eukaryotic cell. Pinin is known to localise at the desmosomes and is implicated in anchoring intermediate filaments to the desmosomal plaque PUBMED:8922384, PUBMED:9447706. SDK2/3 is a dynamically localised nuclear protein thought to be involved in the modulation of alternative pre-mRNA splicing PUBMED:12051732. MemA is a tumour marker preferentially expressed in human melanoma cell lines. A common feature of the members of this family is that they may all participate in regulating protein-protein interactions PUBMED:10645008.

    \ \ 4931 IPR000475 \ The virion infectivity factor (vif) of human immunodeficiency virus type 1 (HIV-1) affects the infectivity of virus particles PUBMED:3497453 for T lymphocytes and macrophages (in some cases increasing the infectivity of HIV-1 particles by 100- to 1000-fold), but has no direct effect on transcription, translation or virus release. Vif antibodies are found in the sera of patients at all stages of HIV-1 infection, indicating that vif is expressed in natural infections in vivo. Other lentiviruses, including simian immunodeficiency virus, visna virus and feline immunodeficiency virus, have vif open reading frames, suggesting that vif plays an essential role during natural infections PUBMED:1357189. The expression of vif in BHK-21 cells has been shown to be linked to a modification of the C terminus of gp41env; this modification is inhibited by trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane (E64), a specific inhibitor of cysteine proteases PUBMED:1995946. On the basis of this finding, together with sequence analysis and the effects of point mutations in vif, it has been suggested that vif could be a cysteine protease. Virions produced in the absence of Vif have abnormal core morphology, and those produced in primary T cells carry immature core proteins and low levels of mature capsid PUBMED:14618252.\ 660 IPR007012 \

    In eukaryotes, polyadenylation of pre-mRNA plays an essential role in the initiation step of protein synthesis, as well as in the export and stability of mRNAs. Poly(A) polymerase, the enzyme at the heart of the polyadenylation machinery, is a template-independent RNA polymerase that specifically incorporates ATP at the 3' end of mRNA. The crystal structure of bovine poly(A) polymerase bound to an ATP analog has been determined at 2.5 Å resolution PUBMED:10944102. The structure revealed both expected and unexpected similarities to other proteins. As expected, the catalytic domain of poly(A) polymerase shares substantial structural homology with other nucleotidyl transferases such as DNA polymerase beta and kanamycin transferase.

    \ \

    The central domain of poly(A) polymerase shares structural similarity with the allosteric activity domain of ribonucleotide reductase R1, which comprises a four-helix bundle and a three-stranded mixed beta-sheet. Although both enzymes bind ATP, their ATP-recognition motifs are different.

    \ 4954 IPR001963 \ Glycoprotein VP7, also known as outer shell glycoprotein, is a serotype-specific antigen, and is the major neutralisation antigen. It is found in the dsRNA rotaviruses.\ 8146 IPR013244 \

    Sec39 is involved in the secretory pathway. In Saccharomyces cerevisiae it has been shown to localise to the endoplasmic reticulum and nuclear membrane PUBMED:15942868.

    \ 4829 IPR001378 \ This domain has been observed in a number of proteins of archaeal and bacterial origin. The function of this domain is unknown.\ 975 IPR001807 \

    Chloride channels (CLCs) constitute an evolutionarily well-conserved family of voltage-gated channels that are structurally unrelated to the other known voltage-gated channels. They are found in organisms ranging from bacteria to yeasts and plants, and also to animals. Their functions in higher animals likely include the regulation of cell volume, control of electrical excitability and trans-epithelial transport PUBMED:9046241.

    \ \

    The first member of the family (CLC-0) was expression-cloned from the electric organ of Torpedo marmorata PUBMED:2174129, and subsequently nine CLC-like proteins have been cloned from mammals. They are thought to function as multimers of two or more identical or homologous subunits, and they have varying tissue distributions and functional properties. To date, CLC-0, CLC-1, CLC-2, CLC-4 and CLC-5 have been demonstrated to form functional Cl- channels; whether the remaining isoforms do so is either contested or unproven. One possible explanation for the difficulty in expressing activatable Cl- channels is that some of the isoforms may function as Cl- channels of intracellular compartments, rather than of the plasma membrane. However, they are all thought to have a similar transmembrane (TM) topology, initial hydropathy analysis suggesting 13 hydrophobic stretches long enough to form putative TM domains PUBMED:2174129. Recently, the postulated TM topology has been revised, and it now seems likely that the CLCs have 10 (or possibly 12) TM domains, with both N- and C-termini residing in the cytoplasm PUBMED:9207144.

    \ \

    A number of human disease-causing mutations have been identified in the genes encoding CLCs. Mutations in CLCN1, the gene encoding CLC-1, the major skeletal muscle Cl- channel, lead to both recessively and dominantly inherited forms of muscle stiffness, or myotonia PUBMED:7581380. Similarly, mutations in CLCN5, which encodes CLC-5, a renal Cl- channel, lead to several forms of inherited kidney stone disease PUBMED:8559248. These mutations have been demonstrated to reduce or abolish CLC function.

    \ \ \

    \ 260 IPR005034 \

    This putative domain is found in members of the Dicer protein family of dsRNA nucleases. This domain of about 100 amino acids has no known function, but does contain 3 possible zinc ligands.

    \ 1202 IPR005109 \ The members of this family (Anp1, Van1 and Mnn9) are membrane proteins required for proper Golgi function. These proteins colocalize within the cis Golgi, where they are physically associated in two distinct complexes PUBMED:9430634.\ 7058 IPR010823 \

    This family consists of several bacteriophage T4-like capsid assembly (or portal) proteins. The exact mechanism by which the double-stranded (ds) DNA bacteriophages incorporate the portal protein at a unique vertex of the icosahedral capsid is unknown. In phage T4, there is evidence that this vertex, constituted by 12 subunits of gp20, acts as an initiator for the assembly of the major capsid protein and the scaffolding proteins into a prolate icosahedron of precise dimensions. Regulation of portal protein gene expression plays an important role in controlling prohead assembly in bacteriophage T4 PUBMED:8918937.

    \ 6953 IPR009808 \

    This family consists of hypothetical bacterial and phage proteins of around 59 residues in length. Bacterial members of this family seem to be specific to Enterobacteria. The function of this family is unknown.

    \ 818 IPR004012 \ This domain is present in several proteins that are linked to the functions of GTPases in the Rap and Rab families. They could therefore play important roles in multiple Ras-like GTPase signaling pathways.\ 4414 IPR005327 \

    The small hydrophobic integral membrane protein, SH (previously designated 1A) is found to have a variety of glycosylated forms PUBMED:1413513, PUBMED:2374008. This protein is a component of the mature respiratory syncytial virion PUBMED:1413513 where it may form complexes and appears to play a structural role.

    \ 6529 IPR010606 \

    Mib is a RING ubiquitin ligase in the Notch pathway. Mib interacts with the intracellular domain of Delta to promote its ubiquitylation and internalisation. Cell transplantation studies suggest that mib function is essential in the signalling cell for efficient activation of Notch in neighbouring cells. This domain has been named the 'mib/herc2 domain' PUBMED:12530964, and proteins containing it usually also contain an E3 ligase domain (either RING or HECT).

    \ 6204 IPR010475 \

    This family consists of several insect adipokinetic hormones, as well as the related crustacean red pigment-concentrating hormone. Flight activity of insects comprises one of the most intense biochemical processes known in nature, and therefore provides an attractive model system to study the hormonal regulation of metabolism during physical exercise. In long-distance flying insects, such as the migratory locust, both carbohydrate and lipid reserves are utilised as fuels for sustained flight activity. The mobilisation of these energy stores in Locusta migratoria is mediated by three structurally related adipokinetic hormones (AKHs), which are all capable of stimulating the release of both carbohydrates and lipids from the fat body PUBMED:9723879.

    \ 6270 IPR009159 \ This group represents a dihydrofolate reductase, type II.\ 5957 IPR010367 \

    This family consists of several Chordopoxvirus-specific G3 proteins. The function of this family is unknown.

    \ 1051 IPR000276 \

    G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \

    The rhodopsin-like GPCRs themselves represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices PUBMED:2111655, PUBMED:2830256, PUBMED:8386361.

    \

    \ 1063 IPR003496 \ This is a family of plant proteins induced by water deficit stress (WDS) PUBMED:9426600, or abscisic acid (ABA) stress and ripening PUBMED:7630961.\ 6064 IPR010928 \

    This family consists of several tyrosinase cofactor MelC1 proteins from a number of Streptomyces species. The melanin operon (melC) of Streptomyces antibioticus contains two genes, melC1 and melC2 (apotyrosinase). It is thought that MelC1 forms a transient binary complex with the downstream apotyrosinase MelC2 to facilitate the incorporation of copper ions and the secretion of tyrosinase, indicating that MelC1 is a chaperone for the apotyrosinase MelC2 PUBMED:8360164.

    \ 1761 IPR001667 \ This is a domain of predicted phosphoesterases that includes the Drosophila prune protein and the bacterial RecJ exonuclease PUBMED:9478130. The RecJ protein of Escherichia coli plays an important role in a number of DNA repair and recombination pathways. RecJ catalyzes processive degradation of single-stranded DNA in a 5'-to-3' direction. Sequences highly related to those encoding RecJ can be found in many of the eubacterial genomes sequenced to date PUBMED:10633092.\ \ 2520 IPR000692 \ Fibrillarin is a component of a nucleolar small nuclear ribonucleoprotein (snRNP) that functions in vivo in ribosomal RNA processing PUBMED:2026646, PUBMED:8493104. It is associated with U3, U8 and U13 small nuclear RNAs in mammals PUBMED:2026646 and is similar to the yeast NOP1 protein PUBMED:2686980. Fibrillarin has a well-conserved sequence of around 320 amino acids and contains 3 domains: an N-terminal Gly/Arg-rich region; a central domain resembling other RNA-binding proteins and containing an RNP-2-like consensus sequence; and a C-terminal alpha-helical domain. An evolutionarily related pre-rRNA processing protein, which lacks the Gly/Arg-rich domain, has been found in various archaebacteria.\ 634 IPR001134 \ Netrins are extracellular proteins that control the guidance of commissural axons at the CNS midline and of peripheral motor axons. This domain is present in a number of other proteins. It is found in the C. elegans UNC-6 protein, which guides dorsoventral migrations on the epidermis and is required for the guidance of pioneering axons and migrating cells along the body wall. The domain is also found in cobra venom factor and in complement factors C3, C4 and C5.\ 8053 IPR013197 \

    This family consists of several DNA-directed RNA polymerase III polypeptides which are related to the Saccharomyces cerevisiae RPC82 protein. RNA polymerase C (III) promotes the transcription of tRNA and 5S RNA genes. In Saccharomyces cerevisiae, the enzyme is composed of 15 subunits, ranging from 10 kDa to about 160 kDa PUBMED:1406632. This region is probably a DNA-binding helix-turn-helix.

    \ 4359 IPR003119 \

    Saposins are small lysosomal proteins that serve as activators of various lysosomal lipid-degrading enzymes PUBMED:7595087. They probably act by isolating the lipid substrate from the membrane surroundings, thus making it more accessible to the soluble degradative enzymes. All mammalian saposins are synthesized as a single precursor molecule (prosaposin) which contains four Saposin-B domains, yielding the active saposins after proteolytic cleavage, and two Saposin-A domains that are removed in the activation reaction. The Saposin-B domains also occur in other proteins, many of them active in the lysis of membranes PUBMED:8003971, PUBMED:8868085. The saposin A-type domain may play a role in targeting, as propeptides containing the saposin A-type domain of the C-terminus of prosaposin and of the N-terminal part of pulmonary surfactant-associated protein B are involved in the transport to the lysosome and to secretory granules (lamellar bodies, which are lysosomal-like organelles), respectively PUBMED:8702672.

    \ 2565 IPR003713 \ The fliD operon of several bacteria consists of three flagellar genes, fliD, fliS, and fliT, and is transcribed in this order PUBMED:8550529. In Bacillus subtilis the operon encoding the flagellar proteins FliD, FliS, and FliT is sigma D-dependent PUBMED:8195064.\ 6060 IPR010417 \

    This is a family of plant seed-specific proteins identified in Arabidopsis thaliana. ATS3 is expressed in a pattern similar to the Arabidopsis seed storage protein genes PUBMED:10380802.

    \ 165 IPR007604 \ This entry represents a conserved region in the CP2 transcription factor family.\ 4092 IPR001895 \

    Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyze GTP to GDP PUBMED:1898771. The balance between the GTP-bound (active) and GDP-bound (inactive) states is regulated by the opposing actions of proteins that activate the GTPase activity and proteins that promote the loss of bound GDP and the uptake of fresh GTP PUBMED:8259209, PUBMED:15335949. The latter proteins are known as guanine-nucleotide dissociation stimulators (GDSs), or as guanine-nucleotide releasing (or exchange) factors (GRFs). Proteins that act as GDSs can be classified into at least two families on the basis of sequence similarities: the CDC24 family (see ) and the CDC25 family.

    \

    The proteins of the CDC25 family range in size from 309 residues (LTE1) to 1596 residues (sos). The sequence similarity shared by all these proteins is limited to a region of about 250 amino acids generally located in their C-terminal section (currently the only exceptions are sos and ralGDS, where this domain makes up the central part of the protein). This domain has been shown, in CDC25 and SCD25, to be essential for the activity of these proteins.

    \ 4107 IPR004612 \ The Bacillus subtilis protein belonging to this family has been shown to be required for DNA recombination and repair.\ 6492 IPR009551 \

    This family represents a conserved region of unknown function within a number of hypothetical eukaryotic proteins.

    \ 1427 IPR001693 \

    Calcitonin (alpha type) PUBMED:3060108 is a 32-amino-acid polypeptide hormone that causes a rapid but short-lived drop in the level of calcium and phosphate in the blood by promoting the incorporation of these ions into bone. Alternative splicing of the gene coding for calcitonin produces a distantly related peptide of 37 amino acids, called calcitonin gene-related peptide (CGRP, beta type). CGRP induces vasodilatation in a variety of vessels, including the coronary, cerebral and systemic vasculature. Its abundance in the CNS also points toward a neurotransmitter or neuromodulator role.

    \

    Islet amyloid polypeptide (IAPP) PUBMED:2407732 (also known as diabetes-associated peptide (DAP), or amylin) is a peptide of 37 amino acids that selectively inhibits insulin-stimulated glucose utilization and glycogen deposition in muscle, while not affecting adipocyte glucose metabolism. Structurally, IAPP is closely related to CGRP.

    \

    Two conserved cysteines in the N-terminal region of these peptides are known to form a disulphide bond. The C-terminal residue of all three peptides is amidated.

    \
    \
                    xCxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxx-NH(2)\
                     |     |                             Amide group\
                     +-----+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    
    \ \ 1175 IPR002533 \

    Alphaviruses are enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Semliki Forest and Sindbis viruses PUBMED:15378043. Alphaviruses contain three structural proteins: the core nucleocapsid protein C, and the envelope proteins P62 and E1 (), which associate as a heterodimer. The viral membrane-anchored surface glycoproteins are responsible for receptor recognition and entry into target cells through membrane fusion. The proteolytic maturation of P62 into E2 () and E3 causes a change in the viral surface. Together the E1, E2 and sometimes E3 glycoprotein "spikes" form an E1/E2 dimer or an E1/E2/E3 trimer, where E2 extends from the centre to the vertices, E1 fills the space between the vertices, and E3, if present, is at the distal end of the spike PUBMED:8107141, PUBMED:9445057. Upon exposure of the virus to the acidity of the endosome, E1 dissociates from E2 to form an E1 homotrimer, which is necessary for the fusion step to drive the cellular and viral membranes together PUBMED:11301009. This entry represents the alphaviral E3 glycoprotein. Most alphaviruses lose the peripheral protein E3, but in Semliki Forest virus it remains associated with the viral surface.

    \ 1859 IPR002838 \

    The proteins in this family have no known function.

    \ 4032 IPR003666 \ Photosystem I (PSI) is an integral membrane protein complex that uses light energy to mediate electron transfer from plastocyanin to ferredoxin. Subunit III (or PsaF) is one of at least 14 different subunits that compose the photosystem I reaction center (PSI-RC) PUBMED:8443351.\ 5389 IPR008483 \ This family consists of several uncharacterised proteins from Borrelia burgdorferi and Borrelia garinii.\ 4281 IPR007644 \

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase, compared with three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain forms one of the two distinctive lobes of the Rpb2 structure and is also known as the protrusion domain PUBMED:3116266. The other lobe, RNA polymerase Rpb2, domain 2, is nested within this domain.

    \ 1133 IPR000043 \ S-adenosyl-L-homocysteine hydrolase () (AdoHcyase) is an enzyme of the activated methyl cycle, responsible for the reversible hydrolysis of S-adenosyl-L-homocysteine into adenosine and homocysteine. AdoHcyase is a ubiquitous enzyme that binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein PUBMED:1631127 of about 430 to 470 amino acids. The family contains a glycine-rich region in the central part of AdoHcyase, a region thought to be involved in NAD binding.\ 4724 IPR001661 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 37 comprises enzymes with only one known activity: trehalase ().

    \ \

    Trehalase is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose yielding two glucose subunits PUBMED:8444853. It is an enzyme found in a wide variety of organisms and whose sequence has been highly conserved throughout evolution.

    \ 158 IPR005170 \

    This small domain is found at the C terminus of a family of proteins that contain two CBS domains, and also at the C terminus of some Na+/H+ antiporters. It is also found in CorC, which is involved in magnesium and cobalt efflux. The function of this domain is uncertain, but it might be involved in modulating the transport of ion substrates.

    \ 6258 IPR010931 \

    This entry represents the C-terminal region of RepB proteins from Lactococcus lactis (See ).

    \ 8135 IPR012387 \

    This group represents a tRNA ligase, yeast type. Please see the following relevant references: PUBMED:12466548, PUBMED:1922054.

    \ 6308 IPR011546 \

    This domain is found in the FtsH family of proteins, which includes FtsH, a membrane-bound ATP-dependent protease universally conserved in prokaryotes PUBMED:12732516. The FtsH peptidases, which belong to MEROPS peptidase family M41 (clan MA(E)), efficiently degrade proteins that have low thermodynamic stability; that is, they lack robust unfoldase activity. This feature may be key, implying that thermodynamic stability could be a criterion for selecting proteins for degradation. In Oenococcus oeni, FtsH is involved in protection against environmental stress PUBMED:12667449 and shows increased expression under heat or osmotic stress. These two lines of evidence suggest that FtsH is part of a fundamental prokaryotic self-protection mechanism that checks whether proteins are correctly folded. The precise function of this N-terminal region is unclear.

    \ 782 IPR007794 \ The ribosome receptor is an integral endoplasmic reticulum protein that has been suggested to be involved in secretion. This highly conserved region is found towards the C terminus of the transmembrane domain PUBMED:11836413. Its function is unclear.\ 277 IPR007165 \ These are predicted transmembrane proteins, probably with four transmembrane spans. The function of these bacterial proteins is unknown. The sequences do not appear to contain any conserved polar residues that could form an active site.\ 1156 IPR007071 \ A-kinase (or PKA)-anchoring protein AKAP95 is implicated in mitotic chromosome condensation by acting as a targeting molecule for the condensin complex. The protein contains two zinc fingers which are thought to mediate the binding of AKAP95 to DNA PUBMED:11964380.\ 7743 IPR012883 \

    ERp29 () is a ubiquitously expressed endoplasmic reticulum protein, and is involved in the processes of protein maturation and protein secretion in this organelle PUBMED:10727933, PUBMED:11435111. The protein exists as a homodimer, with each monomer being composed of two domains. The N-terminal domain featured in this family is organised into a thioredoxin-like fold that resembles the a domain of human protein disulphide isomerase (PDI) PUBMED:11435111. However, this domain lacks the C-X-X-C motif required for the redox function of PDI; it is therefore thought that the function of ERp29 is similar to the chaperone function of PDI PUBMED:11435111. The N-terminal domain is exclusively responsible for the homodimerisation of the protein, without covalent linkages or additional contacts with other domains PUBMED:11435111.

    \ 6783 IPR010713 \

    This entry represents the C terminus (approximately 60 residues) of plant xyloglucan endo-transglycosylase (XET). Xyloglucan is the predominant hemicellulose in the cell walls of most dicotyledons. With cellulose, it forms a network that strengthens the cell wall. XET catalyses the splitting of xyloglucan chains and the linking of the newly generated reducing end to the non-reducing end of another xyloglucan chain, thereby loosening the cell wall PUBMED:9487728.

    \ 6182 IPR009413 \

    This family consists of several bacterial and eukaryotic Aegerolysin-like proteins. It has been found that aegerolysin and ostreolysin are expressed during formation of primordia and fruiting bodies. It has been suggested that these haemolysins play an important role in the initial phase of fungal fruiting. The bacterial members of this family are expressed during sporulation PUBMED:12020804.

    \ 7435 IPR011464 \

    This is a family of hypothetical proteins from Rhodopirellula baltica.

    \ 1265 IPR007079 \ This enzyme transforms N(2)-succinylglutamate into succinate and glutamate. This is the fifth and last step in arginine catabolism by the arginine succinyltransferase pathway.\ 8009 IPR012612 \

    This family consists of the small acid-soluble spore protein (SASP) N type (sspN). SspN is a 48-residue protein that is expressed only in the forespore compartment of sporulating Bacillus subtilis. The sspN gene is recognised equally by sigma-G and sigma-F. The role of SspN is still not well defined PUBMED:10333516.

    \ 4776 IPR002830 \

    This family of proteins is found in bacteria, archaea and yeast, with two members in A. fulgidus. They are related to UbiD, a 3-octaprenyl-4-hydroxybenzoate carboxy-lyase from Escherichia coli that is involved in ubiquinone biosynthesis PUBMED:11029449. The member from H. pylori has a C-terminal extension of just over 100 residues that is shared, in part, by the Aquifex aeolicus homologue.

    \ 364 IPR002888 \ The [2Fe-2S] binding domain is found in a range of enzymes including dehydrogenases, oxidases and oxidoreductases.\

    The aldehyde oxidoreductase (Mop) from the sulphate-reducing anaerobic Gram-negative bacterium Desulfovibrio gigas is a homodimer of 907-residue subunits and is a member of the xanthine oxidase family. The protein contains a molybdopterin cofactor (Mo-co) and two different [2Fe-2S] centers. It is folded into four domains, of which the first two bind the iron-sulphur centers and the last two are involved in Mo-co binding. Mo-co is a molybdenum molybdopterin cytosine dinucleotide. Molybdopterin forms a tricyclic system with the pterin bicycle fused to a pyran ring. The molybdopterin dinucleotide is deeply buried in the protein. The cis-dithiolene group of the pyran ring binds the molybdenum, which is coordinated by three more (oxygen) ligands PUBMED:7502041.

    \ 6883 IPR010758 \

    This family contains a number of bacterial short-chain alcohol dehydrogenases that are approximately 400 residues long. Alcohol dehydrogenases display a wide variety of substrate specificities, and play an important role in a broad range of physiological processes. Short-chain alcohol dehydrogenases form part of a group of alcohol dehydrogenases that are dependent upon NADP PUBMED:11358525.

    \ 7256 IPR009992 \

    This family represents a conserved region approximately 400 residues long within 15-O-acetyltransferase (Tri3), which seems to be restricted to ascomycete fungi. In Fusarium sporotrichioides, this enzyme is required for acetylation of the C-15 hydroxyl group of trichothecenes in the biosynthesis of T-2 toxin PUBMED:8593041.

    \ 4068 IPR002801 \

    Aspartate carbamoyltransferase (aspartate transcarbamylase, ATCase) exists as a dimer of catalytic trimers (3x33kDa) that are held together by three dimeric (2x17kDa) regulatory subunits ((c3)2(r2)3). ATCase plays a central role in the regulation of the pyrimidine pathway in bacteria. In (c3)2(r2)3 ATCases, the association of the catalytic subunits c3 with the regulatory subunits r2 is responsible for the establishment of positive co-operativity between catalytic sites for the binding of aspartate, and it dictates the pattern of allosteric response toward nucleotide effectors. ATCase from Escherichia coli is the most extensively studied allosteric enzyme PUBMED:7791626. The crystal structures of the T-state, the T-state with CTP bound, the R-state with N-phosphonacetyl-L-aspartate (PALA) bound, and the R-state with phosphonoacetamide plus malonate bound have been used in interpreting kinetic and mutational studies.
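    As a rough consistency check on the (c3)2(r2)3 composition, the chain masses quoted above (about 33 kDa per catalytic chain and 17 kDa per regulatory chain) can be summed over the twelve chains of the holoenzyme. This is only a back-of-the-envelope estimate that ignores exact sequence masses and bound ligands:

```latex
M_{(c_3)_2(r_2)_3} \approx \underbrace{2 \times 3 \times 33\,\mathrm{kDa}}_{\text{six catalytic chains}}
  + \underbrace{3 \times 2 \times 17\,\mathrm{kDa}}_{\text{six regulatory chains}}
  = 198\,\mathrm{kDa} + 102\,\mathrm{kDa} \approx 300\,\mathrm{kDa}
```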

    \ \

    A high-resolution structure of E. coli ATCase in the presence of PALA (a bisubstrate analog) provides a detailed description of binding at the active site of the enzyme and allows a detailed model of the tetrahedral intermediate to be constructed. The entire regulatory chain has been traced, showing that the N-terminal regions of the regulatory chains R1 and R6 are located in close proximity to each other and to the regulatory site. This portion of the molecule may be involved in the observed asymmetry between the regulatory binding sites as well as in the heterotropic response of the enzyme PUBMED:10651286.

    \ \

    ATCase from Erwinia herbicola differs from the other investigated enterobacterial ATCases in its absence of homotropic co-operativity toward the substrate aspartate and its lack of response to ATP, which is an allosteric effector (activator) of this family of enzymes. Nevertheless, the E. herbicola ATCase has the same quaternary structure as other enterobacterial ATCases, two trimers of catalytic chains with three dimers of regulatory chains ((c3)2(r2)3), and shows extensive primary structure conservation PUBMED:10600394.

    \ \ 6105 IPR009377 \

    This family consists of several bacterial EutA ethanolamine utilisation proteins. The EutA protein is thought to protect the lyase (EutBC) from inhibition by CN-B12 (cyanocobalamin) PUBMED:10464203.

    \ 6499 IPR009557 \

    This family consists of a number of Caenorhabditis elegans-specific repeats of around 36 residues in length, which are found in two hypothetical proteins. This family is found in conjunction with .

    \ 4295 IPR001427 \ Pancreatic ribonucleases (RNAse) are pyrimidine-specific endonucleases found in high quantity in the pancreas of certain mammals and of some reptiles PUBMED:3940901. Specifically, the enzymes are involved in endonucleolytic cleavage of 3'-phosphomononucleotides and 3'-phosphooligonucleotides ending in C-P or U-P with 2',3'-cyclic phosphate intermediates. Ribonuclease can unwind the DNA helix by complexing with single-stranded DNA; the complex arises by an extended multi-site cation-anion interaction between lysine and arginine residues of the enzyme and phosphate groups of the nucleotides. Other proteins belonging to the pancreatic RNAse family include: bovine seminal vesicle and brain ribonucleases; kidney non-secretory ribonucleases PUBMED:2734298; liver-type ribonucleases PUBMED:2611266; angiogenin, which induces vascularisation of normal and malignant tissues; eosinophil cationic protein PUBMED:2473157, a cytotoxin and helminthotoxin with ribonuclease activity; and frog liver ribonuclease and frog sialic acid-binding lectin. Pancreatic RNases contain four conserved disulphide bonds and three amino acid residues involved in catalytic activity.\ 842 IPR005552 \ Scramblase is palmitoylated and contains a potential protein kinase C phosphorylation site. Scramblase exhibits Ca2+-activated phospholipid scrambling activity in vitro. There are also possible SH3 and WW binding motifs. Scramblase is involved in the redistribution of phospholipids after cell activation or injury PUBMED:11487015.\ 7127 IPR010841 \

    This family consists of several bacterial fibronectin-binding proteins which are thought to be involved in virulence in Listeria species PUBMED:10569795, PUBMED:11023185.

    \ 7171 IPR009943 \

    This family consists of several hypothetical plant proteins of around 250 residues in length. Members of this family seem to be found exclusively in Arabidopsis thaliana. The function of this family is unknown.

    \ 1483 IPR001919 \

    The microbial degradation of cellulose and xylans requires several types of enzyme such as endoglucanases (), cellobiohydrolases () (exoglucanases), or xylanases () PUBMED:1886523. Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

    \

    The CBD domain is found either at the N-terminal or at the C-terminal extremity of these enzymes. As shown in the following schematic representation, there are two conserved cysteines in this CBD domain, one at each extremity of the domain, which have been shown PUBMED:1761039 to be involved in a disulphide bond. There are also four conserved tryptophans, two of which are involved in cellulose binding. The CBD of a number of bacterial cellulases has been shown to consist of about 105 amino acid residues PUBMED:1812490, PUBMED:10973978.

    \
               +-------------------------------------------------+
               |                                                 |
              xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx

    'C': conserved cysteine involved in a disulphide bond.
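    The CBD schematic can also be restated programmatically. The sketch below assumes that each 'x' in the drawing stands for exactly one unspecified residue; it lists the conserved positions and rewrites the schematic in PROSITE-style pattern notation (x(n) for a run of n unspecified residues). The function names are hypothetical, and because the schematic is illustrative (it spans 53 positions, whereas the CBD is about 105 residues), the derived fixed-spacing pattern should not be expected to match real CBD sequences.

```python
from itertools import groupby

# Schematic copied from the text; 'x' marks an unspecified residue and
# capital letters mark conserved residues (C, W, N, G).
SCHEMATIC = "xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx"


def conserved_positions(schematic: str) -> dict[str, list[int]]:
    """Map each conserved residue letter to its 1-based positions."""
    positions: dict[str, list[int]] = {}
    for index, residue in enumerate(schematic, start=1):
        if residue != "x":
            positions.setdefault(residue, []).append(index)
    return positions


def to_prosite_style(schematic: str) -> str:
    """Rewrite the schematic as a PROSITE-style pattern string,
    collapsing each run of 'x' into x(n)."""
    elements: list[str] = []
    for residue, run in groupby(schematic):
        length = len(list(run))
        if residue == "x":
            elements.append("x" if length == 1 else f"x({length})")
        else:
            # Repeated conserved residues are listed one element each.
            elements.extend([residue] * length)
    return "-".join(elements)


if __name__ == "__main__":
    print(conserved_positions(SCHEMATIC))
    print(to_prosite_style(SCHEMATIC))
```

    Run on the schematic, this places the two cysteines at positions 2 and 52 (the disulphide pair spanning the domain) and finds four tryptophans, in line with the description above.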
    
    \ 2813 IPR008119 \

    Toxoplasma gondii is an obligate intracellular apicomplexan protozoan parasite with a complex life cycle involving varied hosts PUBMED:11269320. It has two phases of growth: an intestinal phase in feline hosts, and an extra-intestinal phase in other mammals. Oocysts from infected cats develop into tachyzoites, and eventually into bradyzoites and zoitocysts, in the extraintestinal host PUBMED:11269320. Transmission of the parasite occurs through contact with infected cats or raw/undercooked meat; in immunocompromised individuals, it can cause severe and often lethal toxoplasmosis. Acute infection in healthy humans can sometimes also cause tissue damage PUBMED:11269320.

    \

    The protozoan utilises a variety of secretory and antigenic proteins to invade a host and gain access to the intracellular environment PUBMED:11269320. These originate from distinct organelles in the T. gondii cell termed micronemes, rhoptries, and dense granules. They are released at specific times during invasion to ensure the proteins are allocated to their correct target destinations PUBMED:11269320. Dense granule antigens (GRAs) are released from the T. gondii tachyzoite while it is still encapsulated in a host vacuole.

    \

    Gra6, one of these proteins, is associated with the parasitophorous vacuole PUBMED:10498186. It possesses a hydrophobic central region flanked by two hydrophilic domains, and is encoded by a single-copy gene in the T. gondii genome PUBMED:10498186. Gra6 shares a similar function with Gra2, in that it is rapidly targeted to a network of membranous tubules that connect with the vacuolar membrane PUBMED:10498186. Indeed, these two proteins, together with Gra4, form a multimeric complex that stabilises the parasite within the vacuole.

    \ \ 5684 IPR008384 \ This family consists of several eukaryotic ARP2/3 complex 20 kDa subunit (P20-ARC) proteins. The Arp2/3 protein complex has been implicated in the control of actin polymerisation in cells. The human complex consists of seven subunits, which include the actin-related proteins Arp2 and Arp3. It has been suggested that the complex promotes actin assembly in lamellipodia and may participate in lamellipodial protrusion PUBMED:9230079.\ 4242 IPR000892 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families, the S26E family, includes mammalian S26 PUBMED:2993263; Octopus S26 PUBMED:2731467; Drosophila S26 (DS31) PUBMED:2928115; plant cytoplasmic S26; and fungal S26 PUBMED:7821815. These proteins have 114 to 127 amino acids.

    \ 4363 IPR001448 \

    Small, acid-soluble spore proteins (SASP or ASSP) are proteins bound to the spore DNA of bacteria of the genera Bacillus, Thermoactinomyces, and Clostridium PUBMED:3059997, PUBMED:1569005. They are double-stranded DNA-binding proteins that cause DNA to change to an A-like conformation. They protect the DNA backbone from chemical and enzymatic cleavage and thus contribute to the dormant spore's high resistance to UV light. SASP are degraded in the first minutes of spore germination and provide amino acids for both new protein synthesis and metabolism.

    \

    There are two distinct families of SASP: the alpha/beta type and the gamma-type. Alpha/beta SASP are small proteins of about sixty to seventy amino acid residues that are generally coded by a multigene family. The N terminus of alpha/beta SASP contains the site cleaved by a SASP-specific protease that acts during germination, while the C terminus is probably involved in DNA binding.

    \ 626 IPR007064 \ The NMD3 protein is involved in nonsense-mediated mRNA decay. This N-terminal region contains four conserved CXXC motifs that could be metal binding. NMD3 is also involved in ribosome export: export of the 60S ribosomal subunit is mediated by the adapter protein Nmd3p in a Crm1p-dependent pathway PUBMED:10022925.\ 7159 IPR009936 \

    This family consists of several hypothetical bacterial proteins of around 150 residues in length. The function of this family is unknown.

    \ 2076 IPR007308 \ This is a protein of unknown function.\ 945 IPR004934 \

    Actin filaments have an intrinsic polarity, each having a fast-growing (barbed) end and a slow-growing (pointed) end. To regulate the dynamics at these ends, capping proteins have evolved that specifically bind to either the barbed or the pointed ends of the filament, where they block the association and dissociation of monomers. Pointed ends, at which actin monomers have significantly lower association and dissociation rate constants than at barbed ends, are capped by either the Arp2/3 complex or tropomodulins PUBMED:14573353.

    \ \ Tropomodulin is a novel tropomyosin regulatory protein that binds to the end of erythrocyte tropomyosin and blocks head-to-tail association of tropomyosin along actin filaments PUBMED:1370827. Limited proteolysis shows that this protein is composed of two domains. The unstructured tropomyosin-binding region at the N terminus has an actin pointed-end-capping activity that is dramatically up-regulated by tropomyosin coating of the actin filament PUBMED:11029591. The second region, found near the C terminus, is a tropomyosin-independent capping domain that caps pure actin.

    \ \ 549 IPR005592 \

    The N-terminal region of , is found on a subset of Lipase 3 containing proteins.

    \ 5346 IPR008871 \

    This family of proteins contains the coat proteins of the Totiviruses.

    \ 7249 IPR008055 \

    Neurotensin is a 13-residue peptide transmitter, sharing significant similarity in its 6 C-terminal amino acid residues with several other neuropeptides, including neuromedin N (which is derived from the same precursor). This C-terminal region is responsible for the full biological activity, the N-terminal portion having a modulatory role.

    \

    Neurotensin is distributed throughout the central nervous system, with the highest levels in the hypothalamus, amygdala and nucleus accumbens. It induces a variety of effects, including analgesia, hypothermia and increased locomotor activity. It is also involved in regulation of dopamine pathways. In the periphery, neurotensin is found in endocrine cells of the small intestine, where it leads to secretion and smooth muscle contraction PUBMED:11811984. The neurotensin/neuromedin N precursor can also be processed to produce large 125-138 amino acid peptides with the neurotensin or neuromedin N sequence at their C terminus. These large peptides appear to be less potent than their smaller counterparts, but are also less sensitive to degradation and may represent endogenous, long-lasting activators in a number of pathophysiological situations.

    \ 5144 IPR007981 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
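    The naming rule above, where the first character of a family or clan identifier encodes the catalytic type, can be illustrated with a small lookup. This is a hypothetical helper written for illustration only, not part of any MEROPS software; the dictionary and function names are assumptions, and it encodes nothing beyond what the paragraph states (the example identifiers A5, M41 and MA(E) appear elsewhere in this text).

```python
# Catalytic-type codes as listed in the text. "P" is used only for clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Types that use an active-site amino acid as the nucleophile (forming an
# acyl intermediate) versus those that use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a
    family or clan identifier such as 'A5', 'M41' or 'MA(E)'."""
    if not identifier:
        raise ValueError("empty identifier")
    code = identifier[0].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type code: {code!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the identifier's catalytic type."""
    catalytic_type(identifier)  # validates the identifier first
    code = identifier[0].upper()
    if code in RESIDUE_NUCLEOPHILE:
        return "active-site residue (acyl intermediate)"
    if code in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not determined from the type code"  # U (unknown) or P (mixed)


if __name__ == "__main__":
    for ident in ("A5", "M41", "MA(E)"):
        print(ident, catalytic_type(ident), "|", nucleophile(ident))
```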

    \ \

    Aspartic endopeptidases () of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned into clans (groups of evolutionarily related families), and further sub-divided into families, largely on the basis of their tertiary structure.

    \

    This group of aspartic peptidases belongs to MEROPS peptidase family A5 (thermopsin family, clan A-). Currently, the protein fold and active-site residues are not known for any member of this family. The type example is thermopsin from Sulfolobus acidocaldarius. Thermopsin is a thermostable acid protease capable of hydrolysing the following bonds: Leu-Val, Leu-Tyr, Phe-Phe, Phe-Tyr, and Tyr-Thr. The specificity of thermopsin is therefore similar to that of pepsin; that is, it prefers large hydrophobic residues on both sides of the scissile bond PUBMED:2104844.

    \ 320 IPR007789 \ This family contains uncharacterised proteins found in Arabidopsis thaliana.\ 2834 IPR000645 \ The general secretion pathway (GSP) for the export of proteins (also called the type II pathway) PUBMED:8438237 requires a number of protein components. One of them is known as the 'N' protein and has been sequenced in a variety of bacteria such as Aeromonas hydrophila (gene exeN), Erwinia carotovora (gene outN), Klebsiella pneumoniae (gene pulN) and Vibrio cholerae (gene epsN). The size of the 'N' protein is around 250 amino acids. It apparently contains a single transmembrane domain located in the N-terminal section. The short N-terminal domain is predicted to be cytoplasmic and the large C-terminal domain periplasmic.\ 7764 IPR012466 \

    This family is composed of sequences derived from a number of hypothetical eukaryotic proteins of unknown function.

    \ 7112 IPR009905 \

    This family contains the bacterial enzyme 2-vinyl bacteriochlorophyllide hydratase (approximately 150 residues long). It is involved in the light-independent bacteriochlorophyll biosynthesis pathway, adding water across the 2-vinyl group PUBMED:8385667. This enzyme is apparently absent from cyanobacteria (which do not use bacteriochlorophyll).

    \ 1836 IPR000888 \

    Deoxythymidine diphosphate (dTDP)-4-keto-6-deoxy-D-hexulose 3,5-epimerase (RmlC, ) converts dTDP-4-keto-6-deoxy-D-glucose to dTDP-4-keto-L-rhamnose. It is involved in the biosynthesis of dTDP-L-rhamnose, which is an essential component of the bacterial cell wall.

    \

    The crystal structure of RmlC from Methanobacterium thermoautotrophicum was determined in the presence and absence of a substrate analogue. RmlC is a homodimer comprising a central jelly roll motif, which extends in two directions into longer beta-sheets. Binding of dTDP is stabilized by ionic interactions to the phosphate group and by a combination of ionic and hydrophobic interactions with the base. The active site, which is located in the centre of the jelly roll, is formed by residues that are conserved in all known RmlC sequence homologues. The active site is lined with a number of charged residues and a number of residues with hydrogen-bonding potentials, which together comprise a potential network for substrate binding and catalysis. The active site is also lined with aromatic residues which provide favorable environments for the base moiety of dTDP and potentially for the sugar moiety of the substrate PUBMED:10827167.

    \ 3425 IPR005865 \

    This model describes the N5-methyltetrahydromethanopterin:coenzyme M methyltransferase subunit C in methanogenic archaea. This methyltransferase is a membrane-associated enzyme complex that uses a methyl-transfer reaction to drive a sodium-ion pump. Archaea have evolved energy-yielding pathways marked by one-carbon biochemistry featuring novel cofactors and enzymes. This transferase is involved in the transfer of a methyl group from N5-methyltetrahydromethanopterin to coenzyme M. In an accompanying reaction, methane is produced by two-electron reduction of the methyl moiety in methyl-coenzyme M by another enzyme, methyl-coenzyme M reductase.

    \ \ 7917 IPR012556 \

    This family consists of the entericidin antidote/toxin peptides. The entericidin locus is activated in stationary phase under high osmolarity conditions by sigma-S and simultaneously repressed by the osmoregulatory EnvZ/OmpR signal transduction pathway. The entericidin locus encodes tandem paralogous genes (ecnAB) and directs the synthesis of two small cell-envelope lipoproteins, which can maintain plasmids in a bacterial population by means of post-segregational killing PUBMED:9677290.

    \ 4036 IPR000932 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the intrinsic antenna proteins CP43 (PsbC) and CP47 (PsbB) found in the reaction centre of PSII. These polypeptides bind chlorophyll a and beta-carotene and pass the excitation energy on to the reaction centre PUBMED:12163077. This family also includes the iron-stress-induced chlorophyll-binding protein CP43' (IsiA), which evolved in cyanobacteria from a PSII protein to cope with light limitation and stress conditions. Under iron-deficient growth conditions, IsiA associates with PSI to form a complex consisting of a ring of 18 or more IsiA molecules around a PSI trimer, which significantly enlarges the light-harvesting antenna of PSI. IsiA can also provide photoprotection for PSII PUBMED:15301529.

    \ \ 2873 IPR002506 \ The hepatitis delta virus (HDV) encodes a single protein, the hepatitis delta antigen (HDAg). The central region of this protein has been shown to bind RNA PUBMED:8245865. Several interactions are also mediated by a coiled-coil region at the N terminus of the protein PUBMED:9687364.\ 2431 IPR001090 \

    Interactions between the Eph receptor tyrosine kinases and their membrane-bound ligands, the ephrins, are promiscuous but largely fall into two groups: EphA receptors bind to GPI-anchored ephrin-A ligands, while EphB receptors bind to ephrin-B proteins that have a transmembrane and cytoplasmic domain PUBMED:10072375. Remarkably, ephrin-B proteins transduce signals, such that bidirectional signaling can occur upon interaction with Eph receptors. An important role of Eph receptors and ephrins is to mediate cell-contact-dependent repulsion. Eph receptors and ephrins also act at boundaries to channel neuronal growth cones along specific pathways, restrict the migration of neural crest cells, and, via bidirectional signaling, prevent intermingling between hindbrain segments. Intriguingly, Eph receptors and ephrins can also trigger an adhesive response of endothelial cells and are required for the remodeling of blood vessels PUBMED:10730216.

    \

    Biochemical studies suggest that the extent of multimerization of Eph receptors modulates the cellular response and that the actin cytoskeleton is one major target of the intracellular pathways activated by Eph receptors PUBMED:10207129. Eph receptors and ephrins have thus emerged as key regulators of the repulsion and adhesion of cells that underlie the establishment, maintenance, and remodeling of patterns of cellular organization PUBMED:10730216.

    \ 2199 IPR007496 \

    This is an uncharacterised bacterial integral membrane protein, possibly involved in cysteine biosynthesis. It is speculated to be involved in sulphate transport.

    \ 1356 IPR000060 \

    These prokaryotic transport proteins belong to a family known as BCCT (for Betaine/Carnitine/Choline Transporters) and are specific for compounds containing a quaternary nitrogen atom. The BCCT proteins contain 12 transmembrane regions and are energized by proton symport. They contain a conserved region with four tryptophans in their central region PUBMED:8752321.

    \ 4824 IPR000612 \ Several proteins have been shown PUBMED:9588799 to be evolutionarily related. These are small proteins of 52 to 140 amino acid residues that contain two transmembrane domains.\ 3914 IPR001070 \ This family includes the VP2 and VP3 internal coat proteins from polyomaviruses. Polyomaviruses are small dsDNA tumor viruses. Their capsids contain 360 copies of the VP1 protein arranged in 72 pentamers. This capsid encloses the internal proteins VP2 and VP3, as well as the viral DNA. A single copy of VP2 or VP3 associates with each VP1 pentamer. A crystal structure shows that the C-terminal region of the VP2/VP3 protein interacts with the VP1 pentamer PUBMED:9628860.\ 7978 IPR012954 \

    This C-terminal domain is found in BAP28-like nucleolar proteins PUBMED:15112237.

    \ 4338 IPR003029 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The S1 domain was originally identified in ribosomal protein S1 but is found in a large number of RNA-associated proteins. The structure of the S1 RNA-binding domain from the Escherichia coli polynucleotide phosphorylase has been determined using NMR methods and consists of a five-stranded antiparallel beta barrel. Conserved residues on one face of the barrel and adjacent loops form the putative RNA-binding site PUBMED:9008164.

    \

    The structure of the S1 domain is very similar to that of cold shock proteins. This suggests that they may both be derived from an ancient nucleic acid-binding protein PUBMED:9008164.

    \ 4665 IPR001631 \

    Eukaryotic-like DNA topoisomerase I, otherwise known as relaxing enzyme, untwisting enzyme or swivelase, () is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers and is vital for the processes of replication, transcription, and recombination PUBMED:2560656, PUBMED:7773745, PUBMED:2542938, PUBMED:7770916. Topoisomerase I catalyses the ATP-independent breakage of single-stranded DNA, followed by passage and rejoining of another single-stranded DNA region PUBMED:2544263. This reaction brings about the conversion of one topological DNA isomer into another: e.g., relaxation of positive and negative super-coils; interconversion of simple and knotted rings of single-stranded DNA; and intertwisting of single-stranded rings of complementary sequences PUBMED:2544263, PUBMED:1849260.

    \

    When a eukaryotic type I topoisomerase breaks a DNA backbone bond, it simultaneously forms a protein-DNA link in which the hydroxyl group of a tyrosine residue is joined to a 3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotic and poxvirus type I topoisomerases, a number of residues in the region around the active-site tyrosine are conserved.

    \

    Vaccinia virus, a cytoplasmically-replicating poxvirus, encodes a type I DNA topoisomerase that is biochemically similar to eukaryotic-like DNA topoisomerases I, and which has been widely studied as a model topoisomerase. It is the smallest topoisomerase known and is unusual in that it is resistant to the potent chemotherapeutic agent camptothecin. The crystal structure of an amino-terminal fragment of vaccinia virus DNA topoisomerase I shows that the fragment forms a five-stranded, antiparallel beta-sheet with two short alpha-helices and connecting loops. Residues that are conserved between all eukaryotic-like type I topoisomerases are not clustered in particular regions of the structure PUBMED:7994576.

    \

    Human topoisomerase I has been shown to be inhibited by camptothecin (CPT), a plant alkaloid with antitumour activity PUBMED:1849260. The crystal structures of human topoisomerase I comprising the core and carboxyl-terminal domains in covalent and noncovalent complexes with 22-base pair DNA duplexes reveal an enzyme that "clamps" around essentially B-form DNA. The core domain and the first eight residues of the carboxyl-terminal domain of the enzyme, including the active-site nucleophile tyrosine-723, share significant structural similarity with the bacteriophage family of DNA integrases. A binding mode for the anticancer drug camptothecin has been proposed on the basis of chemical and biochemical information combined with the three-dimensional structures of topoisomerase I-DNA complexes PUBMED:9488644.

    \ 1569 IPR000996 \

    Clathrin PUBMED:1973890 is the major coat-forming protein that encloses vesicles such as coated pits and forms cell surface patches involved in membrane traffic within eukaryotic cells. Clathrin coats are assembled from triskelions, each composed of three heavy chains (180 kDa) and three light chains (23 to 27 kDa). The light chains are more divergent in sequence than the heavy chains PUBMED:14617352. The clathrin light chains, which may help to properly orient the assembly and disassembly of the clathrin coats, bind non-covalently to the heavy chain. They also bind calcium and interact with the hsc70 uncoating ATPase. In higher eukaryotes two genes code for distinct but related light chains, LC(a) and LC(b). Each of the two genes can yield, by tissue-specific alternative splicing, two separate forms that differ by the insertion of a sequence of 30 or 18 residues, respectively. In the N-terminal part of the clathrin light chains, there is a domain of 21 amino acid residues that is perfectly conserved in LC(a) and LC(b). In yeast there is a single light chain (gene CLC1) whose sequence is only distantly related to that of higher eukaryotes.

    \ 3582 IPR005318 \

    This family contains bacterial outer membrane porins with serine protease activity PUBMED:9636669. The serine peptidase domain belongs to MEROPS peptidase family S43 (clan PA(S)).

    \ \

    However, many of these proteins are not peptidases and are classified as non-peptidase homologues. They have either been found experimentally to lack peptidase activity, or they lack amino acid residues that are believed to be essential for the catalytic activity of peptidases in the S43 family. These proteins may instead bind ligands and facilitate their diffusion through the outer membrane.

    \ 7154 IPR009932 \

    This family consists of several hypothetical mammalian proteins of around 240 residues in length.

    \ 1956 IPR004879 \

    This is a group of uncharacterised proteins.

    \ 6480 IPR010596 \

    This entry represents the N-terminal region of the Drosophila-specific Methuselah protein. Drosophila Methuselah (Mth) mutants have a 35% increase in average lifespan and increased resistance to several forms of stress, including heat, starvation, and oxidative damage. The protein affected by this mutation is related to G protein-coupled receptors of the secretin receptor family. Mth, like secretin receptor family members, has a large N-terminal ectodomain, which may constitute the ligand binding site PUBMED:11274391.

    \ 5870 IPR009267 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 7178 IPR010856 \

    This family consists of several hypothetical enterobacterial proteins of around 420 residues in length. Members of this family are often known as YbiU. The function of this family is unknown.

    \ 2386 IPR001662 \ Eukaryotic elongation factor 1 (EF-1) is responsible for the GTP-dependent binding of aminoacyl-tRNAs to the ribosomes PUBMED:2278101. EF-1 is composed of four subunits: the alpha chain, which binds GTP and aminoacyl-tRNAs; the gamma chain, which probably plays a role in anchoring the complex to other cellular components; and the beta and delta (or beta') chains. The gamma chain is a protein of about 410 to 440 residues.\ 2380 IPR004186 \ The Epstein-Barr virus nuclear antigen 1 (EBNA1) binds to and activates DNA replication from the latent origin of replication in EBV. The crystal structure of the DNA-binding and dimerization domains has been solved PUBMED:7553871, and EBNA1 appears to bind DNA via two independent regions, the core and the flanking DNA-binding domains. This DNA-binding domain has a ferredoxin-like fold.\ 6834 IPR010738 \

    This family consists of several hypothetical proteins of around 125 residues in length. Members of this family seem to be specific to Listeria and Streptococcus species. The function of this family is unknown.

    \ 2119 IPR007405 \ This is a family of uncharacterised eubacterial proteins.\ 6909 IPR009784 \

    This family consists of several hypothetical bacterial proteins but contains one sequence () from Saccharomyces cerevisiae. Members of this family are typically around 200 residues in length. The function of this family is unknown.

    \ 4253 IPR005324 \

    This is a family of proteins related to the 30S ribosomal protein S5P from Sulfolobus acidocaldarius (). Ribosomal protein S5 is one of the proteins of the small ribosomal subunit. In Escherichia coli, S5 is known to be important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase translational error frequencies.

    \ 3070 IPR000226 \ Interleukin-7 (IL-7) PUBMED:2663018 is a cytokine that serves as a growth factor for early lymphoid cells of both B- and T-cell lineages. Interleukin-9 (IL-9) PUBMED:1971295 is a cytokine that supports IL-2-independent and IL-4-independent growth of helper T-cells. Interleukin-7 and interleukin-9 seem to be evolutionarily related PUBMED:15335670.\ 1987 IPR002414 \

    This domain has no known function. It is found in various hypothetical proteins and putative lipoproteins from mycoplasmas.

    \ 4768 IPR003008 \

    This domain is found in all tubulin chains, as well as in the bacterial FtsZ family of proteins. These proteins are involved in polymer formation. Tubulin is the major component of microtubules, while FtsZ is the polymer-forming protein of bacterial cell division. FtsZ is part of a ring in the middle of the dividing cell that is required for constriction of the cell membrane and cell envelope to yield two daughter cells. FtsZ and tubulin are GTPases, and this entry represents the GTPase domain. FtsZ can polymerise into tubes, sheets, and rings in vitro and is ubiquitous in bacteria and archaea.

    \ 4537 IPR003808 \

    This family consists of the SufE-related proteins. These have been implicated in Fe-S metabolism and export PUBMED:11251816.

    \ 4569 IPR006522 \

    These sequences describe protein S of phage P2, suggested experimentally to act in tail completion and stable head joining, and related proteins from a number of phages.

    \ 7504 IPR011634 \ This domain is found in the BcgI restriction enzyme beta subunit. The reference PUBMED:9642063 suggests that this component is involved in target recognition.\ 7048 IPR010818 \

    This family consists of several hypothetical putative lipoproteins which seem to be found specifically in the bacterium Leptospira interrogans. Members of this family are typically around 670 residues in length and their function is unknown.

    \ 4341 IPR002133 \

    S-adenosylmethionine synthetase (MAT, ) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP PUBMED:1696256. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis.

    \

    In bacteria there is a single isoform of AdoMet synthetase (gene metK). There are two in budding yeast (genes SAM1 and SAM2) and in mammals, while plants generally have a multigene family.

    \

    The sequence of AdoMet synthetase is highly conserved across isozymes and species. The active sites of both the Escherichia coli and rat liver MAT reside between two subunits, with contributions from side chains of residues from both subunits, resulting in a dimer as the minimal catalytic entity. The side chains that contribute to the ligand binding sites are conserved between the two proteins. In the structures of complexes with the E. coli enzyme, the phosphate groups have the same positions in the (PPi plus Pi) complex and the (ADP plus Pi) complex, and are located at the bottom of a deep cavity with the adenosyl group nearer the entrance PUBMED:1213535.

    \ 6224 IPR009436 \

    This family consists of several angiotensin II type I receptor-associated protein (AGTRAP) sequences. AGTRAP is known to interact specifically with the C-terminal cytoplasmic region of the angiotensin II type 1 (AT(1)) receptor and to regulate different aspects of AT(1) receptor physiology. The precise function of this family is unclear.

    \ 4327 IPR003432 \ The bacterial replication terminator protein (RTP) plays a role in the termination of DNA replication by impeding replication fork movement. Two RTP dimers bind to the two inverted repeat regions at the termination site.\ 4260 IPR000056 \ Ribulose-phosphate 3-epimerase () (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle. In Alcaligenes eutrophus two copies of the gene coding for PPE are known PUBMED:1429456. One is chromosomally encoded and the other is on a plasmid. PPE has been found in a wide range of bacteria, archaebacteria, fungi and plants. All of these proteins have between 209 and 241 amino acid residues. The enzyme has a TIM barrel structure.\ 290 IPR007573 \ This is a family of related proteins that is plant-specific.\ 2461 IPR007441 \ EutH is a bacterial membrane protein whose molecular function is unknown. It has been suggested that it may act as an ethanolamine transporter, responsible for carrying ethanolamine from the periplasm to the cytoplasm PUBMED:10464203.\ 1322 IPR002585 \ These proteins are cytochrome bd-type terminal oxidases that catalyse quinol-dependent, Na+-independent oxygen uptake PUBMED:8626304. Members of this family are integral membrane proteins and contain a protohaem IX centre, haem b558. \

    Cytochrome bd may play an important role in microaerobic nitrogen fixation in the enteric bacterium Klebsiella pneumoniae, where it is expressed under all conditions that permit diazotrophy PUBMED:9274021. Subunit I binds a single b-haem through ligands at His186 and Met393 (using SW:P11026 numbering). In addition, His19 is a ligand for the haem b found in subunit II ().

    \ 7086 IPR009886 \

    This family consists of several mammalian HCaRG (hypertension-related, calcium-regulated gene) proteins. HCaRG is negatively regulated by extracellular calcium concentration, and its basal mRNA levels are higher in hypertensive animals. HCaRG is a nuclear protein potentially involved in the control of cell proliferation PUBMED:10918053.

    \ 6466 IPR009539 \

    This family consists of several strabismus (STB) or Van Gogh-like (VANGL) proteins 1 and 2. The exact function of this family is unknown. It is thought, however, that the STB1 and STB2 genes may be strong tumour suppressor gene candidates PUBMED:12060845.

    \ 3018 IPR007038 \ Proteins in this family are hydrogenase/urease accessory proteins. They contain many conserved histidines that are likely to be involved in nickel binding.\ 478 IPR001348 \

    ATP phosphoribosyltransferase () is the enzyme that catalyzes the first step in the biosynthesis of histidine in bacteria, fungi and plants.

    \ 4332 IPR003251 \

    Rubrerythrin (Rr), found in anaerobic sulphate-reducing bacteria PUBMED:7830612, is a fusion protein containing an N-terminal diiron-binding domain and a C-terminal domain homologous to rubredoxin PUBMED:1657933. The physiological role of Rr has not been identified.

    \ \

    The 3-D structure of Desulfovibrio vulgaris rubrerythrin has been solved PUBMED:8646540. The structure reveals a tetramer of two-domain subunits. In each monomer, the N-terminal 146 residues form a four-alpha-helix bundle containing the diiron-oxo site (centre I), and the C-terminal 45 residues form a rubredoxin-like FeS4 domain.

    \ 6546 IPR009593 \

    This family consists of several hypothetical bacterial proteins of around 155 residues in length. Family members are present in Rhizobium, Agrobacterium and Streptomyces species.

    \ 4131 IPR007594 \

    Asymmetric lipid distribution is a fundamental characteristic of biological lipid bilayers. One example is the translocation of the Man5GlcNAc2-PP-Dol intermediate from the cytosolic side of the ER membrane to the lumen before the completion of the biosynthesis of Glc3Man9GlcNAc2-PP-Dol PUBMED:11807558. RFT1 encodes an evolutionarily conserved protein required for this translocation.

    \ 6560 IPR010613 \

    This entry represents the N-terminal region of Pescadillo. Pescadillo protein localises to distinct substructures of the interphase nucleus including nucleoli, the site of ribosome biogenesis. During mitosis pescadillo closely associates with the periphery of metaphase chromosomes and by late anaphase is associated with nucleolus-derived foci and prenucleolar bodies. Blastomeres in mouse embryos lacking pescadillo arrest at morula stages of development, the nucleoli fail to differentiate and accumulation of ribosomes is inhibited. It has been proposed that in mammalian cells pescadillo is essential for ribosome biogenesis and nucleologenesis and that disruption to its function results in cell cycle arrest PUBMED:12237316.

    \ 7600 IPR011690 \ This region is approximately 35 residues long. It is found repeated in a number of putative phosphate starvation-inducible proteins expressed by various bacterial species. PsiF () is known to be an example of such phosphate starvation-inducible proteins PUBMED:2160940.\ 959 IPR001012 \ The UBX domain is found in ubiquitin-regulatory proteins, which are members of the ubiquitination pathway, as well as a number of other proteins including FAF-1 (FAS-associated factor 1), the human Rep-8 reproduction protein and several hypothetical proteins from yeast. The function of the UBX domain is not known although the fragment of avian FAF-1 containing the UBX domain causes apoptosis of transfected cells.\ 7428 IPR011458 \

    This is a family of paralogous proteins in Leptospira interrogans. Several (e.g. ) have been annotated as possible CopG-like transcriptional regulators (see ).

    \ 5944 IPR009300 \

    This family consists of several Staphylococcus aureus bacteriophage RinB proteins and related sequences from their host. The int gene of staphylococcal bacteriophage phi 11 is the only viral gene responsible for the integrative recombination of phi 11. The rinA and rinB genes are both required to activate expression of the int gene PUBMED:8432703.

    \ 4796 IPR004339 \

    UL49 proteins are present in the viral tegument at the surface of the nucleocapsid PUBMED:12134026. Many of the nonconserved tegument proteins of alpha-herpes viruses play important roles during different steps of the viral replication cycle, such as the shutoff of host cell functions by the vhs protein encoded by UL41 and the transcriptional activation of viral immediate-early genes by the UL48 gene product, VP16. UL49 of HSV-1 has been shown to directly interact with VP16. The UL49 gene products of HSV-1 and bovine herpesvirus 1 (BHV-1) exhibit virus-independent intercellular trafficking of unknown biological function but are dispensable for productive viral replication.

    \

    Envelope glycoprotein M (gM) and the complex formed by glycoproteins E (gE) and I (gI) are involved in the secondary envelopment of pseudorabies virus (PrV) particles in the cytoplasm of infected cells. In the absence of the gE-gI complex and gM, envelopment is blocked and capsids surrounded by tegument proteins accumulate in the cytoplasm. The cytoplasmic domains of gE and gM specifically interact with the C-terminal part of the UL49 gene product of PrV suggesting a role for the protein in secondary envelopment during herpesvirus virion maturation PUBMED:12134026.

    \ 4103 IPR001553 \

    The recA gene product is a multifunctional enzyme that plays a role in homologous recombination, DNA repair and induction of the SOS response PUBMED:1896024. In homologous recombination, the protein functions as a DNA-dependent ATPase, promoting synapsis, heteroduplex formation and strand exchange between homologous DNAs PUBMED:1896024. RecA also acts as a protease cofactor that promotes autodigestion of the lexA product and phage repressors. The proteolytic inactivation of the lexA repressor by an activated form of recA can derepress the 20 or so genes involved in the SOS response, which regulates DNA repair, induced mutagenesis, delayed cell division and prophage induction in response to DNA damage PUBMED:1896024.

    RecA is a protein of about 350 amino acid residues. Its sequence is very well conserved among eubacterial species PUBMED:9187054, PUBMED:7592482, PUBMED:8587109. It is also found in the chloroplast of plants PUBMED:1518831. RecA-like proteins are found in archaea and in diverse eukaryotic organisms, such as fission yeast, mouse and human. In the filament visualised by X-ray crystallography, beta-strand 3, the loop C-terminal to beta-strand 2, and alpha-helix D of the core domain form one surface. During polymerisation, this surface packs against alpha-helix A and beta-strand 0 (the N-terminal domain) of an adjacent monomer [Lusetti and Cox, Annu. Rev. Biochem. 2002. 71:71-100.]. The core ATP-binding domain is well conserved, with 14 invariant residues. It contains the nucleotide binding loop between beta-strand 1 and alpha-helix C. The Escherichia coli sequence GPESSGKT matches the consensus sequence (G/A)XXXXGK(T/S) for the Walker A box (also referred to as the P-loop), which is found in a number of nucleoside triphosphate (NTP)-binding proteins. Another nucleotide binding motif, the Walker B box, is found at beta-strand 4 in the RecA structure. The Walker B box is characterised by four hydrophobic amino acids followed by an acidic residue (usually aspartate). Nucleotide specificity and additional ATP binding interactions are contributed by the amino acid residues at beta-strand 2 and the loop C-terminal to that strand, all of which are more than 90% conserved among bacterial RecA proteins.
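
    \

    The Walker A consensus quoted above, (G/A)XXXXGK(T/S), can be written directly as a regular expression. The following Python sketch is illustrative only. The function name `find_walker_a` is hypothetical, and a match to the consensus does not by itself show that a region is a functional P-loop. The sketch checks that the E. coli RecA segment GPESSGKT fits the pattern.

```python
import re

# Walker A (P-loop) consensus from the text: (G/A)XXXXGK(T/S), where X is any residue.
# The zero-width lookahead lets overlapping candidate motifs be reported individually.
WALKER_A = re.compile(r"(?=([GA].{4}GK[ST]))")


def find_walker_a(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every Walker A consensus match in a protein sequence."""
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(protein.upper())]


if __name__ == "__main__":
    # E. coli RecA Walker A segment quoted in the text
    print(find_walker_a("GPESSGKT"))  # [(0, 'GPESSGKT')]
```

    The lookahead is used so that overlapping candidates are all reported. A plain `finditer` over `[GA].{4}GK[ST]` would skip any hit that overlaps an earlier one.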

    \ 6539 IPR009587 \

    This family consists of several bacterial proteins of around 150 residues in length which are specific to Escherichia coli, Salmonella species and Yersinia pestis. The function of this family is unknown.

    \ 7238 IPR010878 \

    This family consists of several Streptococcus thermophilus bacteriophage Gp111 proteins of around 110 residues in length. The function of this family is unknown.

    \ 6696 IPR009666 \

    This family contains hypothetical proteins of unknown function that are approximately 120 residues long. Family members include eukaryotic and bacterial proteins.

    \ 7311 IPR011121 \

    This is a C-terminal tryptophan-rich domain, normally present in two or three copies, found in membrane proteins of Synechocystis and Bradyrhizobium.

    \ 4749 IPR000533 \ Tropomyosins PUBMED:3606587 are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle, tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction PUBMED:12690456. The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein that forms a coiled-coil structure of 2 parallel helices containing 2 sets of 7 alternating actin binding sites PUBMED:6993480. There are multiple cell-specific isoforms, created by differential splicing of the messenger RNA from one gene, and the proportions of the isoforms vary between different cell types. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues and a highly conserved N-terminal region, whereas non-muscle forms are generally smaller and are heterogeneous in their N-terminal region.\ \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. In this system, each designation is composed of the first three letters of the genus, a space, the first letter of the species name, a space and an Arabic number. If two species names would have identical designations, they are distinguished by adding one or more letters (as necessary) to each species designation.

    \

    The allergens in this family include allergens with the following designations: Met e 1.
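
    \

    The naming rule described above is mechanical enough to express as a short function. This Python sketch is a hypothetical helper: `allergen_name` is not part of any official WHO/IUIS tool. It deliberately does not implement the extra-letter rule for species whose designations would otherwise collide. The example uses Metapenaeus ensis, the shrimp that is the source of Met e 1.

```python
def allergen_name(genus: str, species: str, number: int) -> str:
    """Build an allergen designation in the style described in the text.

    Format: first three letters of the genus, a space, the first letter of the
    species name, a space, an Arabic number. Collision handling (adding extra
    letters when two species would share a designation) is not implemented.
    """
    if len(genus) < 3 or not species or number < 1:
        raise ValueError("need a genus of at least 3 letters, a species name and a positive number")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


if __name__ == "__main__":
    print(allergen_name("Metapenaeus", "ensis", 1))  # Met e 1
```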

    \ 1168 IPR006047 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Alpha amylase is classified as family 13 of the glycosyl hydrolases (). The structure is an 8-stranded alpha/beta barrel containing the active site. The barrel is interrupted by a calcium-binding domain of about 70 residues, which protrudes between beta-strand 3 and alpha-helix 3, and it is followed by a carboxyl-terminal Greek key beta-barrel domain.

    \ 5259 IPR008872 \ This is a family of Bacillus insecticidal crystal toxins. Strains of Bacillus that have this insecticidal activity use a binary toxin composed of two proteins, P51 and P42 (this family). Members of this family are highly conserved between strains of different serotypes and phage groups PUBMED:9500937.\ 4760 IPR006761 \

    Tsg (Twisted gastrulation) was identified in Drosophila as being required to specify the dorsal-most structures in the embryo, for example the amnioserosa. Biochemical experiments have revealed three key properties of Tsg:

  • it can synergistically inhibit Dpp/BMP action in both Drosophila melanogaster and vertebrates by forming a tripartite complex between itself, SOG/chordin and a BMP ligand;
  • Tsg seems to enhance the Tld/BMP-1-mediated cleavage rate of SOG/chordin and may change the preference of site utilisation;
  • Tsg can promote the dissociation of chordin cysteine-rich-containing fragments from the ligand to inhibit BMP signalling PUBMED:7958834, PUBMED:11260716.
    \ 5253 IPR008457 \ Copper-sequestering activity displayed by some bacteria is determined by copper-binding protein products of the copper resistance operon (cop). CopD, together with CopC, performs copper uptake into the cytoplasm PUBMED:7917425.\ 436 IPR001503 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 10 comprises enzymes with two known activities: galactoside 3(4)-L-fucosyltransferase () and galactoside 3-fucosyltransferase ().

    \ \

    The galactoside 3-fucosyltransferases display similarities with the alpha-2- and alpha-6-fucosyltransferases PUBMED:9451017. The biosynthesis of the carbohydrate antigen sialyl Lewis X (sLe(x)) is dependent on the activity of a galactoside 3-fucosyltransferase. This enzyme catalyses the transfer of fucose from GDP-beta-fucose to the 3-OH of N-acetylglucosamine present in lactosamine acceptors PUBMED:9042366.

    \

    Some of the proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook Academic Press, London / San Diego, (1997)]. Galactoside 3(4)-L-fucosyltransferase () belongs to the Lewis blood group system and is associated with Le(a/b) antigen.

    \ 675 IPR003477 \

    PemK is a growth inhibitor in Escherichia coli that is known to bind to the promoter region of the Pem operon, autoregulating its own synthesis. It mediates cell death by inhibiting protein synthesis through cleavage of single-stranded RNA. PemK is part of the PemK-PemI system, in which PemI is an antitoxin that inhibits the action of the PemK toxin PUBMED:15024022. PemK homologues have been found in a wide range of bacteria, and together they form an endonuclease family that interferes with mRNA function. This family consists of the PemK protein in addition to ChpA, ChpB, Kid and MazF.

    \ \ \ 5661 IPR008616 \ This family consists of the N-terminal region of the prokaryotic fibronectin-binding protein, the C-terminal region is . Fibronectin binding is considered to be an important virulence factor in streptococcal infections. Fibronectin is a dimeric glycoprotein that is present in a soluble form in plasma and extracellular fluids; it is also present in a fibrillar form on cell surfaces. Both the soluble and cellular forms of fibronectin may be incorporated into the extracellular tissue matrix. While fibronectin has critical roles in eukaryotic cellular processes, such as adhesion, migration and differentiation, it is also a substrate for the attachment of bacteria. The binding of pathogenic Streptococcus pyogenes and Staphylococcus aureus to epithelial cells via fibronectin facilitates their internalisation and systemic spread within the host PUBMED:12055283.\ 1137 IPR005830 \

    This family represents the pore forming lobe of aerolysin, and the related toxins hemolysin and the leukocidin S subunit.

    \ 7213 IPR010864 \

    This family consists of several hypothetical bacterial proteins of around 225 residues in length. The function of this family is unknown.

    \ 5324 IPR008824 \ The RuvB protein makes up part of the RuvABC resolvasome, which catalyses the resolution of Holliday junctions that arise during genetic recombination and DNA repair. Branch migration is catalysed by the RuvB protein, which is targeted to the Holliday junction by the structure-specific RuvA protein PUBMED:12423347. Sequences in this group contain a signature located in the N-terminal region of the protein.\ 4723 IPR000519 \ A cysteine-rich domain of approximately forty-five amino acid residues has been found in some extracellular eukaryotic proteins PUBMED:7820556, PUBMED:9187350, PUBMED:8518738, PUBMED:8267796. It is known as the 'P', 'trefoil' or 'TFF' domain, and contains six cysteines linked by three disulphide bonds with connectivity 1-5, 2-4, 3-6. The domain has been found in a variety of extracellular eukaryotic proteins PUBMED:7820556, PUBMED:8518738, PUBMED:8267796. These include:
    - protein pS2 (TFF1), a protein secreted by the stomach mucosa;
    - spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that inhibits gastrointestinal motility and gastric acid secretion;
    - intestinal trefoil factor (ITF) (TFF3);
    - Xenopus laevis stomach proteins xP1 and xP4;
    - Xenopus integumentary mucins A.1 (FIM-A.1 or preprospasmolysin) and C.1 (FIM-C.1), proteins which may be involved in defense against microbial infections by protecting the epithelia from the external environment;
    - Xenopus skin protein xp2 (or APEG);
    - zona pellucida sperm-binding protein B (ZP-B);
    - intestinal sucrase-isomaltase ( / ), a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyzes sucrose, maltose and isomaltose;
    - lysosomal alpha-glucosidase ().
    \ 7149 IPR009929 \

    This family contains the bacterial type III secretion protein YscO, which is approximately 150 residues long. YscO has been shown to be required for high-level expression and secretion of the anti-host proteins V antigen and Yops in Yersinia pestis PUBMED:9683485.

    \ 6444 IPR009528 \

    This entry represents the C terminus of bacterial enzymes similar to the type II restriction endonucleases BsuBI and PstI. The enzymes of the BsuBI restriction/modification (R/M) system recognise the target sequence 5'-CTGCAG-3' and are functionally identical to those of the PstI R/M system PUBMED:1480472.
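    Because 5'-CTGCAG-3' is its own reverse complement, scanning one strand of a DNA sequence is enough to find every BsuBI/PstI target site. The following is a minimal Python sketch, assuming an uppercase A/C/G/T string; `find_sites` and `reverse_complement` are illustrative helpers, not part of any restriction-analysis library.

```python
# Hypothetical helpers for locating the BsuBI/PstI target sequence.
SITE = "CTGCAG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(dna: str, site: str = SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `dna`.

    Overlapping occurrences are reported too. For a palindromic site such as
    CTGCAG, hits on this strand are also the hits on the complementary strand.
    """
    dna = dna.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


# The target is palindromic, so one strand suffices.
print(reverse_complement(SITE) == SITE)  # True
print(find_sites("AACTGCAGTTCTGCAG"))    # [2, 10]
```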

    \ 2436 IPR005492 \

    Mutations in the LGI/Epitempin gene can result in a distinct form of epilepsy, autosomal dominant lateral temporal epilepsy. The Epitempin protein contains a 130-amino-acid repeat in its C-terminal region, within which a 50-amino-acid sub-domain has since been defined. This domain is often repeated and each repeat is thought to form a beta-sheet, consistent with the repeats assembling into a larger beta structure. The domain has no known function, but the repeats might together form a beta-propeller.

    This domain has now been found in a number of proteins associated with neurological disorders, suggesting that it may play a role in the development of epilepsy and related conditions PUBMED:12217514.

    \ 7245 IPR009986 \

    This family contains the bacterial transcriptional regulator Crl (approximately 130 residues long). Crl regulates transcription of csgA, the gene encoding the curlin subunit of the curli fibres found on the surface of certain bacteria PUBMED:1357528.

    \ 4794 IPR005655 \

    The interaction of UL37 with UL36 is thought to be an important early step in tegumentation during virion morphogenesis in the cytoplasm PUBMED:11861875.

    \ 5112 IPR007949 \

    This family consists of several SDA1 protein homologues. SDA1 is a Saccharomyces cerevisiae protein involved in the control of the actin cytoskeleton. The protein is essential for cell viability and is localised in the nucleus PUBMED:10704371.

    \ 1740 IPR001875 \

    The death effector domain (DED) is a homotypic protein interaction module composed of a bundle of six alpha-helices. DED is related in sequence and structure to the death domain (DD) and the caspase recruitment domain (CARD), which work in similar pathways and show similar interaction properties PUBMED:11504623. The dimerisation of DED domains is mediated primarily by electrostatic interactions. DED domains can be found in isolation, or in combination with other domains. Domains associated with DED include: caspase catalytic domains (in caspase-8, -10), death domains (in FADD), nuclear localisation sequences (in DEDD), transmembrane domains (in Bap31 and Bar), nucleotide-binding domains (in Dap3), coiled-coil domains (in Hip and Hippi), SAM domains (in Bar), and E2-binding RING domains (in Bar) PUBMED:15226512.

    Several DED-containing proteins are involved in the regulation of apoptosis through their interactions with DED-containing caspases, such as caspases 8 and 10 in humans, both of which contain tandem pairs of DEDs. There are many DED-containing modulators of apoptosis, which can either enhance or inhibit caspase activation PUBMED:15173180.

    \ 2515 IPR002348 \ The interleukin-1 (IL1) and heparin-binding growth factor (HBGF) families share low sequence similarity (about 25% PUBMED:1849658) but have very similar structures. Together with the Kunitz-type soybean trypsin inhibitors (STI), they form a structural superfamily, although they show no sequence similarity to the STI family.

    The crystal structures of interleukin-1 beta and HBGF1 have been solved, showing both families to have the same 12-stranded beta-sheet structure PUBMED:1738162; the beta-sheets are arranged in 3 similar lobes around a central axis, with 6 strands forming an anti-parallel beta-barrel PUBMED:1707542, PUBMED:4071057. The beta-sheets are generally well preserved and the crystal structures superimpose in these areas. The intervening loops are less well conserved; the loop between beta-strands 6 and 7 is slightly longer in interleukin-1 beta.\ 5803 IPR010288 \

    This family consists of several bacterial ABC transporter proteins which are homologous to the EcsB protein of Bacillus subtilis. EcsB is thought to be a hydrophobic protein with six membrane-spanning helices, in a pattern found in other hydrophobic components of ABC transporters PUBMED:8581172.

    \ 3153 IPR013056 \ Lambda phage regulatory protein CIII is a small protein that plays a role in stabilising the CII transcriptional activator, via a mechanism that is not yet fully understood PUBMED:1828895, PUBMED:2957696. Stabilised CII activates transcription of cI, the gene encoding the repressor protein that prevents transcription of the genes required for lytic development. The central portion of the protein is well conserved and is both necessary and sufficient for its activity PUBMED:1828895. Comparative analysis of the CIII sequences of lambda, HK022 and the lambdoid bacteriophage P22 has led to the suggestion that this central region adopts an amphipathic alpha-helical structure PUBMED:1828895.\ 2952 IPR003265 \

    Endonuclease III is a DNA repair enzyme which removes a number of damaged pyrimidines from DNA via its glycosylase activity and also cleaves the phosphodiester backbone at apurinic/apyrimidinic sites via a beta-elimination mechanism PUBMED:7773744, PUBMED:9032058. The structurally related DNA glycosylase MutY recognises the 8-oxoguanine:adenine mispair, a mutational intermediate, and excises the adenine PUBMED:1328155. The 3-D structures of E. coli endonuclease III PUBMED:1411536 and of the catalytic domain of MutY PUBMED:9846876 have been determined. The structures contain two all-alpha domains: a sequence-continuous, six-helix domain (residues 22-132) and a Greek-key, four-helix domain formed by one N-terminal and three C-terminal helices (residues 1-21 and 133-211) together with the [Fe4S4] cluster. The cluster is bound entirely within the C-terminal loop by four cysteine residues with the ligation pattern Cys-(Xaa)6-Cys-(Xaa)2-Cys-(Xaa)5-Cys, which is distinct from all other known [Fe4S4] proteins. This structural motif is referred to as the [Fe4S4] cluster loop (FCL) PUBMED:7664751. Two DNA-binding motifs have been proposed, one at either end of the interdomain groove: the helix-hairpin-helix (HhH) and FCL motifs. The primary role of the iron-sulphur cluster appears to be positioning conserved basic residues for interaction with the DNA phosphate backbone by forming the loop of the FCL motif PUBMED:7664751, PUBMED:10900127.
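    The FCL ligation pattern Cys-(Xaa)6-Cys-(Xaa)2-Cys-(Xaa)5-Cys translates directly into a regular expression. A minimal Python sketch, assuming a one-letter protein sequence; `find_fcl_motifs` is a hypothetical helper, and a sequence match is only a candidate, since the FCL is also defined by its structural context.

```python
import re

# Cys-(Xaa)6-Cys-(Xaa)2-Cys-(Xaa)5-Cys: '.' stands for any residue (Xaa).
FCL_LIGATION = re.compile(r"C.{6}C.{2}C.{5}C")


def find_fcl_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in FCL_LIGATION.finditer(protein.upper())]


# Synthetic sequence (not a real protein): four cysteines separated by
# 6, 2 and 5 filler residues, with flanking residues on either side.
toy = "MK" + "C" + "A" * 6 + "C" + "G" * 2 + "C" + "S" * 5 + "C" + "LE"
print(find_fcl_motifs(toy))  # [(2, 'CAAAAAACGGCSSSSSC')]
```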

    The HhH-GPD domain takes its name from its hallmark helix-hairpin-helix and Gly/Pro-rich loop followed by a conserved aspartate. This domain is found in a diverse range of structurally related DNA repair proteins, including endonuclease III and DNA glycosylase MutY, an A/G-specific adenine glycosylase. Both of these enzymes have a C-terminal iron-sulphur cluster loop (FCL). The methyl-CpG-binding protein MBD4 also contains a related domain with thymine DNA glycosylase activity. The family also includes DNA-3-methyladenine glycosylase II, 8-oxoguanine DNA glycosylases and other members of the AlkA family.

    \ 6730 IPR010693 \

    This family consists of a number of hypothetical bacterial proteins of around 70 residues in length. Members of this family contain three highly conserved cysteine residues. The function of this family is unknown.

    \ 2142 IPR007432 \ This family consists of several proteins of uncharacterised function.\ 2073 IPR007299 \

    This protein is predicted to be an integral membrane protein. Several family members are annotated as potential transport proteins, but there is no experimental evidence for the function of any family member. It is usually found associated with the domain of unknown function DUF418.

    \ 4561 IPR005601 \

    Irreversible binding of T-even bacteriophages to Escherichia coli is mediated by the short tail fibres, which serve as inextensible stays during DNA injection. Short tail fibres are exceptionally stable elongated trimers of gene product 12 (gp12), a 56 kDa protein. The N-terminal region of gp12 is important for phage attachment and the central region forms a long shaft, while a C-terminal globular region is implicated in binding to the bacterial lipopolysaccharide core. The distal half-fibre of the long tail fibre contains two molecules each of gp36 and gp37 and one molecule of gp35.

    \ 1837 IPR005636 \

    This presumed domain is found in bacterial and eukaryotic proteins. Its function is unknown. The domain contains multiple conserved motifs, including a DTXW motif after which the domain is named.
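    A DTXW motif (Asp-Thr-any residue-Trp) can be located with a simple pattern search. This is a hedged Python sketch on a made-up sequence; `dtxw_positions` is a hypothetical name, and a hit says nothing by itself about whether the surrounding region belongs to this domain.

```python
import re

# "DTXW": Asp, Thr, any residue (X), Trp.
DTXW = re.compile(r"DT.W")


def dtxw_positions(protein: str) -> list[int]:
    """Return 0-based start positions of each DTXW occurrence."""
    return [m.start() for m in DTXW.finditer(protein.upper())]


print(dtxw_positions("MADTLWKKDTAW"))  # [2, 8]
```

Two DTXW occurrences cannot overlap (the D of a second copy would have to coincide with T, X or W of the first in an incompatible way), so a plain non-overlapping search is sufficient here.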

    \ 1898 IPR003768 \

    This family represents ScpA, which, along with ScpB, interacts with SMC in vivo to form a complex that is required for chromosome condensation and segregation PUBMED:12065423, PUBMED:12897137. The SMC-Scp complex appears to be similar to the MukB-MukE-MukF complex in Escherichia coli PUBMED:10545099, in which MukB is the homologue of SMC. Although ScpA and ScpB have little sequence similarity to MukE or MukF, they are predicted to be structurally similar, being predominantly alpha-helical with coiled-coil regions.

    In most bacterial genomes, scpA and scpB form an operon. Flanking genes are highly variable, suggesting that the operon has moved during evolution. Bacteria containing an smc gene also contain scpA or scpB, but not necessarily both. An exception is Deinococcus radiodurans, which contains scpB but neither smc nor scpA. In the archaea, the gene order smc-scpA is conserved in nearly all species, as is the very short distance between the two genes, indicating co-transcription of the two genes in different archaeal genera and arguing that interaction of the gene products is not confined to the homologues in Bacillus subtilis. Taken together, these studies suggest that SMC, ScpA and ScpB proteins, or their homologues, act together in chromosome condensation and segregation in all prokaryotes PUBMED:12100548.

    \ 559 IPR005119 \ The structure of this domain is known and is similar to that of the periplasmic binding proteins PUBMED:9309218. This domain is found in members of the LysR family of prokaryotic transcriptional regulatory proteins, which share sequence similarity over approximately 280 residues, including a putative helix-turn-helix DNA-binding motif at their N terminus.\ 2052 IPR007181 \

    This is a strongly conserved YPLM motif. It is found C-terminal to another domain of unknown function, DUF372.

    \ 3600 IPR002463 \ Ornatin is a potent glycoprotein IIb-IIIa (GP IIb-IIIa) antagonist and platelet aggregation inhibitor PUBMED:1765068. The protein is 41-52 residues in length and contains the RGD recognition motif common to adhesion proteins, as well as 6 conserved cysteine residues. The sequences of ornatin isoforms B, C, D and E are highly similar, while isoforms A2 and A3 are less similar, lacking the N-terminal 9 residues. Ornatins share ~40% identity with decorsin, a GP IIb-IIIa antagonist isolated from the leech Macrobdella decora PUBMED:1765068.\ 2208 IPR007539 \ This entry represents the C terminus of a protein of unknown function, found in dsDNA viruses with no RNA stage, including bacteriophages lambda and P22, and also in some Escherichia coli prophages.\ 1822 IPR006440 \

    The characterized member of this family is the death-on-curing (DOC) protein of phage P1. It is encoded in a two-gene operon with prevents-host-death (phd) that forms an addiction module. DOC lacks homology to analogous addiction module post-segregational killing proteins involved in plasmid maintenance. These modules work as a combination of a long-lived poison (e.g. this protein) and a more abundant but shorter-lived antidote. Members of this family have a well-conserved central motif, HxFx[ND][AG]NKR. A similar region, with K replaced by G, is found in the huntingtin-interacting protein (HYPE) family PUBMED:8411153.
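    The bracketed motif notation maps onto regular-expression character classes: x is any residue and [ND] means N or D. A minimal Python sketch, assuming one-letter protein sequences; `classify_central_motif` and the test strings are hypothetical, and the HYPE variant simply swaps K for G as the text describes.

```python
import re
from typing import Optional

# HxFx[ND][AG]NKR: '.' = any residue (x), [ND] = Asn or Asp, [AG] = Ala or Gly.
DOC_MOTIF = re.compile(r"H.F.[ND][AG]NKR")
# HYPE-like variant described in the text: the K position is G instead.
HYPE_MOTIF = re.compile(r"H.F.[ND][AG]NGR")


def classify_central_motif(protein: str) -> Optional[str]:
    """Return 'DOC-like', 'HYPE-like' or None for a one-letter sequence."""
    seq = protein.upper()
    if DOC_MOTIF.search(seq):
        return "DOC-like"
    if HYPE_MOTIF.search(seq):
        return "HYPE-like"
    return None


print(classify_central_motif("AAHAFSNANKRAA"))  # DOC-like
print(classify_central_motif("AAHAFSDGNGRAA"))  # HYPE-like
```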

    \ 5195 IPR008030 \

    NmrA is a negative transcriptional regulator involved in the post-translational modification of the transcription factor AreA. NmrA is part of a system controlling nitrogen metabolite repression in fungi PUBMED:11726498. This family contains only a few sequences, because further search iterations give significant matches to other Rossmann-fold families.

    \ 3152 IPR005501 \ This family includes LamB. The lam locus of Aspergillus nidulans consists of two divergently transcribed genes, lamA and lamB, involved in the utilization of lactams such as 2-pyrrolidinone. Both genes are under the control of the positive regulatory gene amdR and are subject to carbon and nitrogen metabolite repression PUBMED:1729609. The exact molecular function of the proteins in this family is unknown.\ 7904 IPR012532 \

    This is a C-terminal domain found in the Bloom's syndrome DEAD helicase subfamily PUBMED:15112237.

    \ 5051 IPR007888 \

    This family of proteins includes the DNA-binding meiosis-specific protein NDT80 PUBMED:12454476. It also includes PhoG and its homologues, proteins that have been found to increase acid phosphatase activity in certain fungi PUBMED:7916713. It is not clear whether these proteins are themselves acid phosphatases.

    \ 6129 IPR010444 \

    This family consists of several bacteriophage lambda Kil protein-like sequences from both phages and bacteria. Induction of a lambda prophage kills the host cell even in the absence of phage replication and lytic functions, owing to expression of the lambda kil gene PUBMED:11470529.

    \ 8092 IPR013202 \

    These neuropeptides were the first members of the insect kinin family to be isolated from the American cockroach. Their occurrence in the retrocerebral complex suggests a physiological role as neurohormones. The C-terminal sequence Phe-X-Ser-Trp-Gly-NH2 identifies the peptides as members of the insect kinin family. Data suggest that insect kinins may be involved in water balance through osmoregulation. These peptides range from 6 to 14 amino acids in length PUBMED:9350979.
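    The C-terminal signature Phe-X-Ser-Trp-Gly-NH2 and the 6 to 14 residue length range can be combined into a simple screen. This Python sketch assumes sequences are given in one-letter code with the C-terminal amide omitted (so Gly-NH2 appears as a final G); `looks_like_insect_kinin` and the example strings are hypothetical.

```python
import re

# Phe-X-Ser-Trp-Gly at the very end of the peptide ('.' = any residue X).
# Assumption: the C-terminal amide is not written, so Gly-NH2 is a final "G".
KININ_C_TERMINUS = re.compile(r"F.SWG\Z")


def looks_like_insect_kinin(peptide: str, min_len: int = 6, max_len: int = 14) -> bool:
    """True if the peptide has the kinin C terminus and a length of 6-14 residues."""
    seq = peptide.upper()
    return min_len <= len(seq) <= max_len and KININ_C_TERMINUS.search(seq) is not None


print(looks_like_insect_kinin("DPAFNSWG"))  # True: ends in F-N-S-W-G, 8 residues
print(looks_like_insect_kinin("FNSWGK"))    # False: motif is not C-terminal
```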

    \ 5209 IPR008043 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
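    The catalytic-type letters and the two nucleophile classes described above can be captured in a small lookup table. A Python sketch, assuming MEROPS-style identifiers whose first character is the type letter (for example family "C21" or clan "CA"); `describe` is a hypothetical helper, not part of any MEROPS tool.

```python
# Catalytic-type letters as listed in the text. The first character of a
# family (e.g. "C21") or clan (e.g. "CA") identifier gives the type.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clans containing more than one type)",
}

# Per the text: S, T and C use an amino acid as the nucleophile and form an
# acyl intermediate; A, G and M use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe(identifier: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for a family or clan identifier."""
    letter = identifier.strip().upper()[0]
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not determined by type letter"
    return CATALYTIC_TYPE[letter], nucleophile


print(describe("C21"))
print(describe("A22"))
```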

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    This group of cysteine peptidases belongs to MEROPS peptidase family C21 (tymovirus endopeptidase family, clan CA). The type example is tymovirus endopeptidase (turnip yellow mosaic virus). The noncapsid protein expressed from ORF-206 of turnip yellow mosaic virus (TYMV) is autocatalytically processed by a papain-like protease, producing N-terminal 150-kDa and C-terminal 70-kDa proteins.

    \ 5182 IPR008019 \

    Apolipoprotein C-II (ApoC-II) is the major activator of lipoprotein lipase, a key enzyme in the regulation of triglyceride levels in human serum PUBMED:10903476.

    \ 1926 IPR002550 \ This transmembrane region has no known function. Many of the sequences in this family are annotated as hemolysins; however, this is due to a similarity to that does not contain this domain. This domain is found in the N terminus of the proteins, adjacent to two intracellular CBS domains.\ 3394 IPR002898 \ This family groups together integral membrane proteins that appear to be involved in translocation of proteins across a membrane. These proteins are probably proton channels. MotA is an essential component of the flagellar motor that uses a proton gradient to generate rotational motion in the flagellum PUBMED:10348868. ExbB is part of the TonB-dependent transduction complex. The TonB complex uses the proton gradient across the inner bacterial membrane to transport large molecules across the outer bacterial membrane.\ 2817 IPR000118 \

    Metazoan granulins PUBMED:1542665 are a family of cysteine-rich peptides of about 6 kDa which may have multiple biological activities. A precursor protein (known as acrogranin) may give rise to seven different forms of granulin (grnA to grnG), which are probably released by post-translational proteolytic processing. Granulins are evolutionarily related to PMP-D1, a peptide extracted from the pars intercerebralis of migratory locusts PUBMED:1740125. A schematic representation of the structure of a granulin is shown below:

           xxxCxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxCx

    'C': conserved cysteine probably involved in a disulphide bond.

    In plants, a granulin domain is often associated with the C terminus of cysteine proteases belonging to MEROPS peptidase family C1, subfamily C1A (papain).
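    The granulin schematic can be turned mechanically into an exact-spacing regular expression. This Python sketch (the function name is hypothetical) converts each run of x into a fixed-length wildcard; because real granulins vary in the spacing between cysteines, such a pattern is at best a first-pass filter rather than a reliable detector.

```python
import re
from itertools import groupby

# Schematic from the text: 'C' = conserved cysteine, 'x' = any residue.
GRANULIN_SCHEMATIC = "xxxCxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxCx"


def schematic_to_regex(schematic: str) -> str:
    """Convert runs of 'x' to '.{n}' and keep cysteines literally."""
    parts = []
    for char, run in groupby(schematic):
        n = len(list(run))
        parts.append(f".{{{n}}}" if char == "x" else char * n)
    return "".join(parts)


# Flanking 'x' positions add no constraint, so they are trimmed first.
GRANULIN_RE = re.compile(schematic_to_regex(GRANULIN_SCHEMATIC.strip("x")))

print(GRANULIN_SCHEMATIC.count("C"))  # 12 conserved cysteines
print(GRANULIN_RE.pattern)
```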

    \ 3333 IPR008332 \

    Synonym(s): 6-O-methylguanine-DNA methyltransferase, O-6-methylguanine-DNA-alkyltransferase

    The repair of DNA containing O6-alkylated guanine is carried out by DNA-[protein]-cysteine S-methyltransferase. The major mutagenic and carcinogenic effect of methylating agents in DNA is the formation of O6-alkylguanine. The alkyl group at the O-6 position is transferred to a cysteine residue in the enzyme PUBMED:3052269. This is a suicide reaction, since the enzyme is irreversibly inactivated and the methylated protein accumulates as a dead-end product. Most, but not all, of the methyltransferases are also able to repair O-4-methylthymine. DNA-[protein]-cysteine S-methyltransferases are widely distributed and are found in various prokaryotic and eukaryotic sources PUBMED:1579490.

    This group of proteins is characterised by an N-terminal ribonuclease-like domain associated with 6-O-methylguanine DNA methyltransferase activity.

    \ 2096 IPR007369 \

    This group of sequences contains aspartic endopeptidases belonging to MEROPS peptidase family A22 (presenilin family, clan AD), subfamily A22B.

    The members of this family are membrane proteins. In some proteins this region is found associated with the protease-associated (PA) domain. The family shows sequence similarity to presenilin; in human and mouse, its members are described as minor histocompatibility antigen H13.

    \ 1376 IPR001273 \

    Phenylalanine, tyrosine and tryptophan hydroxylases constitute a family of tetrahydrobiopterin-dependent aromatic amino acid hydroxylases, all of which are rate-limiting catalysts for important metabolic pathways PUBMED:3475690. The proteins are structurally and functionally related, each containing iron and catalysing ring hydroxylation of aromatic amino acids using tetrahydrobiopterin (BH4) as a substrate. All are regulated by phosphorylation at serines in their N-termini. It has been suggested that the proteins each contain a conserved C-terminal catalytic (C) domain and an unrelated N-terminal regulatory (R) domain. It is possible that the R domains arose from genes that were recruited from different sources to combine with the common gene for the catalytic core. Thus, by combining with the same C domain, the proteins acquired the unique regulatory properties of the separate R domains.

    A variety of enzymes belong to this family. Phenylalanine-4-hydroxylase catalyzes the conversion of phenylalanine to tyrosine; it is copper-dependent in Chromobacterium violaceum and iron-dependent in Pseudomonas aeruginosa. In humans, phenylalanine-4-hydroxylase deficiency causes phenylketonuria, the most common inborn error of amino acid metabolism PUBMED:9406548. Tryptophan 5-hydroxylase catalyzes the rate-limiting step in serotonin biosynthesis, the conversion of tryptophan to 5-hydroxytryptophan. Tyrosine 3-hydroxylase catalyzes the rate-limiting step in catecholamine biosynthesis, the conversion of tyrosine to 3,4-dihydroxy-L-phenylalanine (L-DOPA).

    \ 1328 IPR006871 \ This is a family of Baculovirus ssDNA-binding proteins.\ 4997 IPR005587 \

    This presumed family is about 160 residues long. It is found in archaebacteria and eubacteria. In it is associated with a helix-turn-helix domain. This suggests that this may be a ligand-binding family.

    \ 3268 IPR004023 \ This family was originally identified in Drosophila, where the gene is called mago nashi; it is a strict maternal-effect, grandchildless-like gene PUBMED:1765008. The human homologue has been shown to interact with an RNA-binding protein, ribonucleoprotein RBM8 PUBMED:10662555. RNAi knockdown of the Caenorhabditis elegans homologue causes masculinization of the germ line (Mog phenotype) in hermaphrodites, suggesting that it is involved in hermaphrodite germ-line sex determination PUBMED:10656761.\ 1664 IPR001166 \

    Arthropods express a family of neuropeptides which includes crustacean hyperglycemic hormone (CHH), molt-inhibiting hormone (MIH), gonad-inhibiting hormone (GIH) and mandibular organ-inhibiting hormone (MOIH) from crustaceans, and ion transport peptide (ITP) from locusts PUBMED:8590372.

    Hyperglycemic hormone, which controls blood sugar levels, is an abundant peptide in the sinus glands of isopods and decapods PUBMED:8436119. The peptide is a potent secretagogue, releasing digestive enzymes from the hepatopancreas. It may act as a stress hormone. American lobster molt-inhibiting hormone also shows hyperglycemic hormone activity PUBMED:2169734.

    \ 3922 IPR001592 \

    This protease is found in the genome polyproteins of potyviruses. The genome polyprotein contains: N-terminal protein (P1), helper component protease (HC-PRO), protein P3, 6 kDa protein 1 (6K1), cytoplasmic inclusion protein (CI), 6 kDa protein 2 (6K2), genome-linked protein (VPg), nuclear inclusion protein A, nuclear inclusion protein B and coat protein (CP). The coat protein is at the C terminus of the polyprotein.

    \ 1762 IPR004097 \ This domain is called DHHA2 because it is often associated with the DHH domain and is diagnostic of DHH subfamily 2 members PUBMED:9478130. The domain is about 120 residues long and contains a conserved DXK motif at its amino terminus. It is present in inorganic pyrophosphatases and in the exopolyphosphatase of Saccharomyces cerevisiae.\ 3052 IPR002069 \ Interferon gamma (IFN-gamma) is produced by lymphocytes activated by specific antigens or mitogens. IFN-gamma shows antiviral activity and has important immunoregulatory functions. It is a potent activator of macrophages and has antiproliferative effects on transformed cells. It can potentiate the antiviral and antitumor effects of the type I interferons.\ 6921 IPR010770 \

    This family consists of several eukaryotic SGT1 proteins. Human SGT1 or hSGT1 is known to suppress GCR2 and is highly expressed in the muscle and heart. The function of this family is unknown although it has been speculated that SGT1 may be functionally analogous to the Gcr2p protein of Saccharomyces cerevisiae which is known to be a regulatory factor of glycolytic gene expression PUBMED:9928932.

    \ 271 IPR005560 \ This entry is a small cysteine-rich repeat. The cysteines mostly follow a C-X(2)-C-X(3)-C-X(2)-C-X(3) pattern, though they often appear at other positions in the repeat as well.\ 6609 IPR009619 \

    This is a group of proteins of unknown function.

    \ 7711 IPR008987 \

    The T4 bacteriophage is a structurally complex, double-stranded DNA virus that infects Escherichia coli. Gene product 9 (gp9) connects the long tail fibres to the baseplate, and triggers baseplate reorganization and tail contraction after virus attachment to the host cell. The gp9 protein forms a homotrimer, with each monomer having three domains: the N-terminal alpha-helical domain forms a triple coiled coil, the middle domain is a mixed, seven-stranded beta sandwich with a unique fold, and the C-terminal domain is an eight-stranded beta-sandwich with similarity to jellyroll viral capsid protein structures PUBMED:10545330. The flexible loops that occur between domains may enable the conformational changes necessary during infection.

    The SSF signature in this entry is currently under review. Please be aware that some of the protein hits may be false positives.

    \ 7374 IPR011489 \

    The Pfam alignment for this domain is truncated at the C terminus and does not include the final cysteine defined in Callebaut et al. PUBMED:12507493. This prevents the family from overlapping with other domains.

    \ 6298 IPR010935 \

    This entry represents the hinge region of the SMC (Structural Maintenance of Chromosomes) family of proteins. The hinge region is responsible for formation of the DNA-interacting dimer. It is also possible that its precise structure is an essential determinant of the specificity of the DNA-protein interaction PUBMED:12411491.

    \ 4852 IPR005346 \

    This is a small family of proteins of unknown function.

    \ 5453 IPR008728 \ PAXNEB or PAX6 neighbour is found in several eukaryotic organisms. The function of this protein is unknown.\ 5074 IPR007911 \

    This family consists of several bacterial flagellar transcriptional activator (FlhD) proteins. FlhD combines with FlhC to form a regulatory complex in Escherichia coli. This complex has been shown to be a global regulator involved in many cellular processes as well as a flagellar transcriptional activator PUBMED:11287152.

    \ 7670 IPR012417 \

    The sequences featured in this family are found repeated in a number of plant calmodulin-binding proteins and are thought to constitute the calmodulin-binding domains PUBMED:12825696, PUBMED:11684678. Binding of the proteins to calmodulin depends on the presence of calcium ions PUBMED:12825696, PUBMED:11684678. These proteins are thought to be involved in various processes, such as plant defence responses PUBMED:12825696 and stolonisation or tuberization PUBMED:11684678.

    \ 7487 IPR011651 \ This entry represents a region of conserved sequence at the N terminus of several Notch ligand proteins.\ 5379 IPR008704 \ This family consists of several small subunit ribosomal RNA proteins from various Naegleria species. Naegleria species are pathogenic free-living amoebae PUBMED:9214655, PUBMED:7804245.\ 5218 IPR008659 \ This family contains several KRE9 and KNH1 proteins, cell surface O-glycoproteins that are required for beta-1,6-glucan synthesis in Saccharomyces cerevisiae PUBMED:9748432.\ 2251 IPR006747 \ This family includes several uncharacterised proteins.\ 4696 IPR002622 \ Transposase proteins are necessary for efficient DNA transposition. This family includes insertion sequences from Synechocystis PCC 6803, three of which are characterised as homologous to bacterial IS5 and IS4 elements and to several members of the IS630-Tc1-mariner superfamily PUBMED:9305771.\ 3610 IPR006132 \

    This entry contains two related enzymes:

    1. Aspartate carbamoyltransferase (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides PUBMED:3015959. In prokaryotes ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes it is a domain in a multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals PUBMED:8098212) that also catalyzes other steps of pyrimidine biosynthesis.
    2. Ornithine carbamoyltransferase (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate to citrulline. In mammals this enzyme participates in the urea cycle PUBMED:2662961 and is located in the mitochondrial matrix. In prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it is also involved in the degradation of arginine PUBMED:3109911 (the arginine deiminase pathway).

    It has been shown PUBMED:6379651 that these two enzymes are evolutionarily related. The predicted secondary structures of both enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues which have been shown, by crystallographic studies PUBMED:6377306, to be implicated in binding the phosphoryl group of carbamoyl phosphate; these residues may also play a role in trimerization of the molecules PUBMED:10318893. The carboxyl-terminal, aspartate/ornithine-binding domain is described by .
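    The two reactions listed above can be written side by side; both transfer the carbamoyl group of carbamoyl phosphate to an amino acid and release inorganic phosphate (P_i), consistent with the shared phosphoryl-binding region mentioned in the text. This is a summary in standard biochemical notation, not taken from the cited papers:

```latex
\begin{aligned}
\text{L-aspartate} + \text{carbamoyl phosphate}
  &\xrightarrow{\text{ATCase}} \text{N-carbamoyl-L-aspartate} + \mathrm{P_i} \\
\text{L-ornithine} + \text{carbamoyl phosphate}
  &\xrightarrow{\text{OTCase}} \text{L-citrulline} + \mathrm{P_i}
\end{aligned}
```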

    \ 8000 IPR012584 \

    This domain is found in a novel family of nucleolar proteins PUBMED:15112237.

    \ 2742 IPR000743 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    Glycoside hydrolase family 28 comprises enzymes with several known activities: polygalacturonase (); exo-polygalacturonase (); exo-polygalacturonase (); rhamnogalacturonase (EC not defined).

    Polygalacturonase (PG) (pectinase) PUBMED:2400785, PUBMED:2193922 catalyzes the random hydrolysis of 1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. In fruit, polygalacturonase plays an important role in cell wall metabolism during ripening. In bacterial plant pathogens such as Erwinia carotovora or Pseudomonas solanacearum and fungal pathogens such as Aspergillus niger, polygalacturonase is involved in maceration and soft-rotting of plant tissue. Exo-poly-alpha-D-galacturonosidase (exoPG) PUBMED:2168372 hydrolyzes pectic acid from the non-reducing end, releasing digalacturonate. PG and exoPG share a few regions of sequence similarity and belong to family 28 of the glycosyl hydrolases.

    \ 6616 IPR010646 \

    This is a group of proteins of unknown function.

    \ 7482 IPR003597 \

    The basic structure of immunoglobulin (Ig) molecules is a tetramer of two light chains and two heavy chains linked by disulfide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains (CH1 to CH4). Ig molecules are highly modular proteins, in which the variable and constant domains have clear, conserved sequence patterns. The domains in Ig and Ig-like molecules are grouped into four types: variable (V-type), constant-1 (C1-type), constant-2 (C2-type) and intermediate (I-type) PUBMED:9417933. X-ray and NMR studies have shown that these domains form a Greek-key beta-sandwich structure, with the types differing in the number of strands in the beta-sheets as well as in their sequence patterns PUBMED:15327963, PUBMED:11377196.
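    The chain and domain counts given above add up as follows for one basic tetrameric unit. This Python sketch uses only the numbers stated in the text; `domains_in_tetramer` is a hypothetical helper, and it deliberately ignores polymeric forms (such as secreted pentameric IgM) and non-domain regions like the hinge.

```python
# Domain counts taken from the text.
LIGHT_CHAIN_DOMAINS = 2  # VL + CL, for kappa and lambda alike
HEAVY_CHAIN_CONSTANT_DOMAINS = {
    "alpha": 3, "delta": 3, "gamma": 3,  # CH1-CH3
    "epsilon": 4, "mu": 4,               # CH1-CH4
}


def domains_in_tetramer(heavy_chain: str) -> int:
    """Count Ig domains in one two-light-chain, two-heavy-chain unit."""
    heavy_domains = 1 + HEAVY_CHAIN_CONSTANT_DOMAINS[heavy_chain]  # VH + CH domains
    return 2 * LIGHT_CHAIN_DOMAINS + 2 * heavy_domains


for isotype in ("gamma", "mu"):
    print(isotype, domains_in_tetramer(isotype))  # gamma 12, then mu 14
```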

    \

    The Ig constant chain domains are related to an extracellular domain found in both class I PUBMED:9597133 and class II PUBMED:12637770 major histocompatibility complex (MHC) alpha and beta chains. These homologous domains are approximately one hundred amino acids long and include a conserved intradomain disulfide bond. Immunoglobulin superfamily domains are found in hundreds of proteins with diverse functions. Examples include antibodies, the giant muscle kinase titin and receptor tyrosine kinases. Immunoglobulin-like domains may be involved in protein-protein and protein-ligand interactions.

    \ \ \

    Ig-like domains can be classified according to the number of beta strands. C1-type are classical Ig-like domains that resemble the antibody constant domain. C1 domains are found almost exclusively in molecules involved in the immune system, such as immunoglobulins, major histocompatibility complex molecules and T-cell receptors.

    \ 4720 IPR007688 \

    VirB proteins are thought to act at the bacterial surface, where they play an important role in directing T-DNA transfer to plant cells. VirB6 from Agrobacterium tumefaciens is an essential component of the type IV secretion machinery for T pilus formation and genetic transformation of plants. Absence of VirB6 leads to reduced cellular levels of VirB5 and VirB3, which have been proposed to assist T pilus formation as a minor component and an assembly factor, respectively.

    \ 6595 IPR010637 \

    This family consists of several SifA, SifB and SseJ proteins, which seem to be specific to Salmonella species. SifA, SifB and SseJ have been demonstrated to localise to the Salmonella-containing vacuole (SCV) and to Salmonella-induced filaments (Sifs). Trafficking of SseJ and SifB away from the SCV requires the SPI-2 effector SifA. SseJ trafficking away from the SCV along Sifs is unnecessary for its virulence function PUBMED:12496192.

    \ 7391 IPR011438 \

    This domain is found in several hypothetical bacterial proteins as a tandem repeat.

    \ 6862 IPR010748 \

    This entry represents the N terminus (approximately 300 residues) of subunit 3 of the eukaryotic origin recognition complex (ORC). The ORC is composed of six subunits that are essential for cell viability. They collectively bind to the autonomously replicating sequence (ARS) in a sequence-specific manner and lead to the chromatin loading of other replication factors that are essential for initiation of DNA replication PUBMED:11395502.

    \ 7410 IPR011418 \

    This highly conserved sequence is found at the C terminus of several apurinic/apyrimidinic (AP) endonucleases in a range of Gram-positive and Gram-negative bacteria. See also AP endonucleases family 2.

    \ 6682 IPR009660 \

    This family consists of bacteriophage Gp15 proteins and related bacterial sequences. The function of this family is unknown.

    \ 1773 IPR000627 \

    Dioxygenases catalyse the incorporation of both atoms of molecular oxygen into substrates using a variety of reaction mechanisms. Cleavage of aromatic rings is one of the most important functions of dioxygenases, which play key roles in the degradation of aromatic compounds. The substrates of ring-cleavage dioxygenases can be classified into two groups according to the mode of scission of the aromatic ring. Intradiol enzymes use a non-haem Fe(III) to cleave the aromatic ring between two hydroxyl groups (ortho-cleavage), whereas extradiol enzymes use a non-haem Fe(II) to cleave the aromatic ring between a hydroxylated carbon and an adjacent non-hydroxylated carbon (meta-cleavage) PUBMED:10730195. These two subfamilies differ in sequence, structural fold, iron ligands, and the orientation of second-sphere active site amino acid residues.

    \

    Enzymes that belong to the intradiol family include catechol 1,2-dioxygenase (1,2-CTD), protocatechuate 3,4-dioxygenase (3,4-PCD) and chlorocatechol 1,2-dioxygenase PUBMED:15060064.

    \ \ 5806 IPR010291 \

    This domain, of unknown function, is found in several hypothetical eukaryotic proteins.

    \ 5431 IPR008493 \ This family consists of several eukaryotic proteins of unknown function.\ 453 IPR000795 \ Elongation factors belong to a family of proteins that promote the GTP-dependent binding of aminoacyl-tRNA to the A site of ribosomes during protein biosynthesis, and catalyse the translocation of the synthesised protein chain from the A to the P site. The proteins are all relatively similar in the vicinity of their C-termini, and are also highly similar to a range of proteins that includes the nodulation Q protein from Rhizobium meliloti, bacterial tetracycline resistance proteins PUBMED:2841293 and the omnipotent suppressor protein 2 from yeast.

    In both prokaryotes and eukaryotes, there are three distinct types of elongation factors: EF-1alpha (EF-Tu), which binds GTP and an aminoacyl-tRNA and delivers the latter to the A site of ribosomes; EF-1beta (EF-Ts), which interacts with EF-1alpha/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1alpha; and EF-2 (EF-G), which binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site. In EF-1alpha, a specific region has been shown PUBMED:3126836 to be involved in a conformational change mediated by the hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu and EF-2/EF-G and thus seems typical of GTP-dependent proteins that bind non-initiator tRNAs to the ribosome. The GTP-binding protein synthesis factor family also includes the eukaryotic peptide chain release factor GTP-binding subunits PUBMED:7556078 and prokaryotic peptide chain release factor 3 (RF-3) PUBMED:7737996; the prokaryotic GTP-binding protein lepA and its homologs in yeast (GUF1) and Caenorhabditis elegans (ZK1236.1); yeast HBS1 PUBMED:1394434; rat statin S1 PUBMED:1709933; and the prokaryotic selenocysteine-specific elongation factor selB PUBMED:2531290.

    \ \ 3176 IPR004926 \

    Members of this family are similar to late embryogenesis abundant proteins. Members of the family have been isolated in a number of different screens. However, the molecular function of these proteins remains obscure.

    \ \ 5295 IPR008645 \

    The function of the U47 herpesvirus proteins is unknown PUBMED:10482554.

    \ 3278 IPR002539 \ The C terminus of the MaoC protein shares similarity with a wide variety of enzymes, all of which contain multiple domains. This domain is found in parts of two enzymes that have been assigned dehydratase activities. A deletion mutant of the C-terminal 271 amino acids in [...] abolished its 2-enoyl-CoA hydratase activity, suggesting that this region may have hydratase activity PUBMED:9891075. The maoC gene is part of an operon with maoA, which is involved in the synthesis of monoamine oxidase PUBMED:1556068.\ 1800 IPR003498 \ This family includes proteins that are probably involved in DNA packaging in herpesviruses. This domain is found at the C terminus of the protein.\ 5162 IPR007999 \

    This family consists of several uncharacterised Drosophila melanogaster proteins of unknown function.

    \ 3597 IPR007203 \

    ORMDL1 belongs to a novel gene family comprising three genes in humans (ORMDL1, ORMDL2 and ORMDL3), and homologs in yeast, microsporidia, plants, Drosophila, urochordates and vertebrates. ORMDLs are involved in protein folding in the endoplasmic reticulum.

    \ 2852 IPR003169 \

    The glycine-tyrosine-phenylalanine (GYF) domain is a domain of around 60 amino acids that contains a conserved GP[YF]xxxx[MV]xxWxxx[GN]YF motif. It was identified in the human intracellular protein CD2 binding protein 2 (CD2BP2), which binds to a site containing two tandem PPPGHR segments within the cytoplasmic region of CD2. Binding experiments and mutational analyses have demonstrated the critical importance of the GYF tripeptide in ligand binding. A GYF domain is also found in several other eukaryotic proteins of unknown function PUBMED:9843987. It has been proposed that the GYF domain in these proteins could also be involved in proline-rich sequence recognition PUBMED:10404223.

    \

    Resolution of the structure of the CD2BP2 GYF domain by NMR spectroscopy revealed a compact domain with a beta-beta-alpha-beta-beta topology, in which the single alpha-helix is tilted away from the twisted, antiparallel beta-sheet. The conserved residues of the GYF domain create a contiguous, predominantly hydrophobic patch that forms an integral part of the ligand-binding site PUBMED:10404223. There is limited homology within the C-terminal 20-30 amino acids of various GYF domains, supporting the idea that this part of the domain is structurally but not functionally important PUBMED:12426371.
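    The GYF consensus GP[YF]xxxx[MV]xxWxxx[GN]YF uses bracketed alternatives and "x" for any residue. The minimal sketch below assumes that reading of the notation: uppercase letters are literal residues, each [..] group is one position with the listed alternatives, and each lowercase x is a single-residue wildcard. Under those assumptions it turns the consensus into a regular expression and scans a sequence. The function names and the test sequence are hypothetical. A plain regex only reports exact consensus matches, so this is a screening aid rather than a domain annotator.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile an 'x'-wildcard consensus such as GP[YF]xxxx[MV]xxWxxx[GN]YF.

    Assumed notation: uppercase letters are literal residues, [..] lists the
    allowed residues at one position, and each lowercase 'x' is any residue.
    """
    return re.compile(motif.replace("x", "."))


def find_motifs(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


GYF_MOTIF = "GP[YF]xxxx[MV]xxWxxx[GN]YF"

# Usage with a made-up sequence (not a real protein):
print(find_motifs("MKTGPYAAAAMAAWAAAGYFLL", GYF_MOTIF))  # → [(3, 'GPYAAAAMAAWAAAGYF')]
```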

    \ \ 7640 IPR012918 \

    The members of this family are sequences similar to the C-terminal region of RTP801, the protein product of a hypoxia-inducible factor 1 (HIF-1)-responsive gene PUBMED:11884613. Two members of this family expressed by Drosophila melanogaster, Scylla and Charybde, are designated as Hox targets PUBMED:11884613. RTP801 is thought to be involved in various cellular processes PUBMED:11884613. Overexpression of the gene caused an apoptosis-resistant phenotype in cycling cells, and apoptosis sensitivity in growth-arrested cells PUBMED:11884613. Moreover, the protein product of the mouse homolog of RTP801 (dig2) is thought to be induced by diverse apoptotic signals, and also by dexamethasone treatment PUBMED:12736248.

    \ 780 IPR005162 \

    Transposable elements (TEs) promote various chromosomal rearrangements more efficiently, and often more specifically, than other cellular processes. Retrotransposons are structurally similar to retroviruses and are bounded by long terminal repeats. This is a family of eukaryotic retrotransposon-related Gag (capsid-related) proteins. They share a central motif, QGXXEXXXXXFXXLXXH, that is common to Retroviridae Gag proteins but is poorly conserved.
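    Because the central Gag motif QGXXEXXXXXFXXLXXH is poorly conserved, an exact pattern match would miss many members. The minimal sketch below assumes that X means "any residue". It slides the motif along a sequence and tolerates a limited number of mismatches at the six fixed positions. The mismatch threshold is an illustrative assumption, not a published cutoff. The function names and test sequences are hypothetical.

```python
# Central Gag motif quoted in the text; 'X' marks a position where any residue is allowed.
GAG_MOTIF = "QGXXEXXXXXFXXLXXH"


def fixed_positions(motif: str) -> list[tuple[int, str]]:
    """Return (offset, residue) for every non-wildcard position in the motif."""
    return [(i, aa) for i, aa in enumerate(motif) if aa != "X"]


def scan_with_mismatches(sequence: str, motif: str = GAG_MOTIF,
                         max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Slide the motif along the sequence and report (start, mismatches) for each
    window whose fixed positions differ from the motif at most max_mismatches times."""
    fixed = fixed_positions(motif)
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        mismatches = sum(1 for offset, aa in fixed if sequence[start + offset] != aa)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Made-up window whose final H is replaced by N: rejected when no mismatch is allowed,
# reported with one mismatch otherwise.
print(scan_with_mismatches("QGAAEAAAAAFAALAAN", max_mismatches=0))
print(scan_with_mismatches("QGAAEAAAAAFAALAAN", max_mismatches=1))
```

    Allowing mismatches trades specificity for sensitivity. On long sequences a loose threshold will also report chance hits.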

    \ 6414 IPR010563 \

    This family consists of several TraK proteins from Escherichia coli, Salmonella typhi and Salmonella typhimurium. TraK is known to be essential for pilus assembly but its exact role in this process is unknown PUBMED:8655498.

    \ 2475 IPR006694 \

    This is a group of related eukaryotic putative fatty acid hydrolases.

    \ 3638 IPR007466 \

    Peptidyl-arginine deiminase (PAD) enzymes catalyse the deimination of the guanidino group from carboxy-terminal arginine residues of various peptides to produce ammonia. PAD from Porphyromonas gingivalis (PPAD) appears to be evolutionarily unrelated to mammalian PAD, which is a metalloenzyme. PPAD is thought to belong to the same superfamily as aminotransferase and arginine deiminase, and to form an alpha/beta propeller structure. This family has previously been named PPADH (Porphyromonas peptidyl-arginine deiminase homologs) PUBMED:11504612. The predicted catalytic residues in PPAD are Asp130, Asp187, His236, Asp238 and Cys351 PUBMED:11504612. These are absolutely conserved, with the exception of Asp187, which is absent in two family members. PPAD is also able to catalyse the deimination of free L-arginine, but has primarily peptidyl-arginine specificity. It may have an FMN cofactor PUBMED:10377098.

    \ 6122 IPR010441 \

    This is a family of proteins of unknown function.

    \ 1599 IPR000885 \ Collagens contain a large number of globular domains in between the regions of triple helical repeats. These domains are involved in binding diverse substrates. One of these domains is found at the C terminus of fibrillar collagens. The exact function of this domain is unknown.\ 3028 IPR002780 \

    HypD is encoded by the hyp operon, which is needed for the activity of the three hydrogenase isoenzymes in Escherichia coli; hypD is one of the genes needed for formation of these enzymes PUBMED:1849603. This protein has been found in Gram-negative and Gram-positive bacteria and Archaea. HypD contains many possible metal-binding residues, which may bind nickel. Transposon Tn5 insertions into hypD resulted in R. leguminosarum mutants that lacked any hydrogenase activity in symbiosis with peas PUBMED:8326860.

    \ 1734 IPR004875 \

    These proteins are probably endonucleases of the DDE superfamily. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, whose members contain three carboxylate residues that are believed to be responsible for coordinating the metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction. Interestingly, this family also includes centromere protein B (CENP-B), a DNA-binding protein localised to the centromere PUBMED:9451007. The domain in CENP-B appears to have lost the metal-binding residues and is unlikely to have endonuclease activity.

    \ \ 7616 IPR012428 \

    The members of this family are all derived from relatively short hypothetical proteins thought to be expressed by various Nucleopolyhedroviruses.

    \ 6117 IPR009384 \

    This family consists of several bacterial FlbD flagellar proteins. The exact function of this family is unknown PUBMED:9168127.

    \ 4857 IPR005349 \

    This family of short membrane proteins is as yet uncharacterised.

    \ 6726 IPR010690 \

    This family consists of several putative bacterial stage IV sporulation (SpoIV) proteins. YqfD of Bacillus subtilis is known to be essential for efficient sporulation, although its exact function is unknown PUBMED:12662922.

    \ 4945 IPR007430 \ VirB8 is a bacterial virulence protein with cytoplasmic, transmembrane, and periplasmic regions. It is thought to be a primary constituent of a DNA transporter. The periplasmic region interacts with VirB9, VirB10, and itself PUBMED:11371528.\ 4874 IPR005365 \

    This is a small family of proteins of unknown function.

    \ 4772 IPR000574 \ This signature is found in coat proteins from the related tymoviruses. The coat protein is also known as the virion protein. The virus coat is composed of 180 copies of the coat protein arranged in an icosahedral shell.\ 4578 IPR000195 \ Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small GTPases PUBMED:11013213.\ 2758 IPR000852 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 52 comprises enzymes with only one known activity: beta-xylosidase.

    \ \

    Proteins harboring beta-xylosidase and xylanase activities have been identified in the Gram-positive, facultatively thermophilic aerobe Bacillus stearothermophilus 21 PUBMED:8074507. This microbe, which functions in xylan degradation, can use xylan as its sole carbon source. The enzyme hydrolyses 1,4-beta-D-xylans, removing successive D-xylose residues from the non-reducing termini. It also hydrolyses xylobiose.

    \ 7671 IPR008920 \

    Bacteria regulate membrane fluidity by manipulating the relative levels of saturated and unsaturated fatty acids within the phospholipids of their membrane bilayers. In Escherichia coli, the transcription factor FadR functions as a switch that co-ordinately regulates the machinery required for fatty acid beta-oxidation and the expression of a key enzyme in fatty acid biosynthesis. This single repressor controls the transcription of the whole fad regulon PUBMED:11279025.

    \

    The crystal structure of FadR reveals a two-domain dimeric molecule in which the N-terminal winged-helix domain binds DNA and the C-terminal domain binds acyl-CoA PUBMED:11279025. The binding of acyl-CoA to the C-terminal domain results in a conformational change that affects the DNA binding affinity of the N-terminal domain PUBMED:11013219.

    \

    FadR is a member of the GntR family of bacterial transcription regulators. The DNA-binding domain is well conserved for this family, whereas the C-terminal effector-binding domain is more variable, and is consequently used to define the GntR subfamilies PUBMED:11756427. The FadR group is the largest subgroup, and is characterised by an all-helical C-terminal domain composed of 6 to 7 alpha helices PUBMED:11013219. This entry represents the C-terminal domain of FadR.

    \ \ 5556 IPR008886 \ Despite being classed as uncharacterised proteins, the members of this family are almost certainly enzymes in that they contain a domain distantly related to .\ 5240 IPR008743 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by catalytic type, indicated by the first character of the family name: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of a catalytic amino acid residue as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
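    The letter scheme above can be read mechanically from an identifier such as family "C33" or clan "CA", both of which appear later in this text. The minimal sketch below assumes that the catalytic type is always given by the first character of the name. It also records which kind of nucleophile the text associates with each type. The class, table and function names are hypothetical and are not a MEROPS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalyticType:
    name: str
    nucleophile: str  # as described in the text: residue side chain or activated water


_RESIDUE = "catalytic residue side chain (acyl intermediate)"
_WATER = "activated water molecule"

CATALYTIC_TYPES = {
    "A": CatalyticType("aspartic", _WATER),
    "C": CatalyticType("cysteine", _RESIDUE),
    "G": CatalyticType("glutamic acid", _WATER),
    "M": CatalyticType("metallo", _WATER),
    "S": CatalyticType("serine", _RESIDUE),
    "T": CatalyticType("threonine", _RESIDUE),
    "U": CatalyticType("unknown", "unknown"),
    "P": CatalyticType("mixed (clans only)", "varies by family"),
}


def catalytic_type(identifier: str) -> CatalyticType:
    """Return the catalytic type encoded by the first character of a family or clan name."""
    key = identifier.strip()[:1].upper()
    if key not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type prefix in {identifier!r}")
    return CATALYTIC_TYPES[key]


print(catalytic_type("C33"))
```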

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins that are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \ This group of cysteine peptidases corresponds to MEROPS peptidase family C33 (clan CA). The type example is equine arteritis virus Nsp2-type cysteine proteinase, which is involved in viral polyprotein processing PUBMED:10725411.\ 5289 IPR008405 \ Apo L belongs to the high density lipoprotein family that plays a central role in cholesterol transport. The cholesterol content of membranes is important in cellular processes such as modulating gene transcription and signal transduction both in the adult brain and during neurodevelopment. There are six apo L genes located in close proximity to each other on chromosome 22q12 in humans. 22q12 is a confirmed high-susceptibility locus for schizophrenia and close to the region associated with velocardiofacial syndrome that includes symptoms of schizophrenia PUBMED:11930015.\ 58 IPR004313 \

    The two acireductone dioxygenase enzymes (ARD and ARD', previously known as E-2 and E-2') from Klebsiella pneumoniae share the same amino acid sequence Q9ZFE7, but bind different metal ions: ARD binds Ni2+, ARD' binds Fe2+ PUBMED:9880484. ARD and ARD' can be experimentally interconverted by removal of the bound metal ion and reconstitution with the appropriate metal ion. The two enzymes share the same substrate, 1,2-dihydroxy-3-keto-5-(methylthio)pentene, but yield different products. ARD' yields the alpha-keto precursor of methionine (and formate), thus forming part of the ubiquitous methionine salvage pathway that converts 5'-methylthioadenosine (MTA) to methionine. This pathway is responsible for the tight control of the concentration of MTA, which is a powerful inhibitor of polyamine biosynthesis and transmethylation reactions PUBMED:11371200. ARD yields methylthiopropanoate, carbon monoxide and formate, and thus prevents the conversion of MTA to methionine. The role of the ARD-catalysed reaction is unclear: methylthiopropanoate is cytotoxic, and carbon monoxide can activate guanylyl cyclase, leading to increased intracellular cGMP levels PUBMED:11371200, PUBMED:9880484.

    \

    This family also contains other proteins, whose functions are not well characterised.

    \ 4366 IPR004333 \ The SBP plant protein domain is a sequence-specific DNA-binding domain PUBMED:8569690. Proteins with this domain probably function as transcription factors involved in the control of early flower development. The domain contains 10 conserved cysteine and histidine residues that are probably zinc ligands.\ 7134 IPR009919 \

    This family consists of several hypothetical putative outer membrane proteins which appear to be specific to Anaplasma marginale and Anaplasma ovis.

    \ 7055 IPR004082 \ A total of 715 potential protein-coding genes have been identified in the nucleotide sequence of Arabidopsis thaliana chromosome 5, with an average gene density of 1 gene per 4001 bp PUBMED:10718197. Amongst the gene products is a well-conserved family of 130.7 kDa proteins that share no sequence similarity with any other known proteins. The sequences are characterised by an N-terminal domain of variable length, a central cysteine-rich region and a relatively acidic C-terminal domain. The sequences may possess a PHD finger.\ 5176 IPR008013 \

    GATA transcription factors mediate cell differentiation in a diverse range of tissues. Mutations are often associated with certain congenital human disorders. The six classical vertebrate GATA proteins, GATA-1 to GATA-6, are highly homologous and have two tandem zinc fingers. The classical GATA transcription factors function as transcription activators. In lower metazoans, GATA proteins carry a single canonical zinc finger. This family represents the N-terminal domain of the family of GATA transcription activators.

    \ 7331 IPR011125 \

    Proteins of the HypF family are involved in the maturation and regulation of hydrogenase PUBMED:9492269. They appear to have two zinc finger domains at the N terminus.

    \ 2334 IPR007838 \ This is a family of eubacterial hypothetical proteins.\ 127 IPR007237 \

    This family includes the CD20 protein and the beta subunit of the high affinity receptor for IgE Fc. The high affinity receptor for IgE is a tetrameric structure consisting of a single IgE-binding alpha subunit, a single beta subunit, and two disulphide-linked gamma subunits. The alpha subunit of Fc epsilon RI and most Fc receptors are homologous members of the Ig superfamily. By contrast, the beta and gamma subunits from Fc epsilon RI are not homologous to the Ig superfamily. Both molecules have four putative transmembrane segments and a probable topology where both N- and C termini protrude into the cytoplasm PUBMED:2531187.

    \ 467 IPR006674 \ This domain is found in a superfamily of enzymes with a predicted or known phosphohydrolase activity. These enzymes appear to be involved in nucleic acid metabolism, signal transduction and possibly other functions in bacteria, archaea and eukaryotes. The fact that all the highly conserved residues in the HD superfamily are histidines or aspartates suggests that coordination of divalent cations is essential for the activity of these proteins PUBMED:9868367.\ 8130 IPR013153 \

    This is a family of PrkA bacterial and archaeal serine kinases approximately 630 residues long. This is the N-terminal AAA domain PUBMED:8626065.

    \ 6110 IPR009380 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 2300 IPR007703 \ This family contains several uncharacterised viral proteins of unknown function.\ 4190 IPR001141 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L27 is found in fungi, plants, algae and vertebrates PUBMED:8148381, PUBMED:8058833. The family has a specific signature at the C terminus.

    \ 3363 IPR004326 \

    The Mlo-related proteins are a family of plant integral membrane proteins, first discovered in barley. Mutants lacking wild-type Mlo proteins show broad-spectrum resistance to the powdery mildew fungus, and dysregulated cell death control, with spontaneous cell death in response to developmental or abiotic stimuli. Thus wild-type Mlo proteins are thought to be inhibitors of cell death whose deficiency lowers the threshold required to trigger the cascade of events that result in plant cell death.

    \

    Mlo proteins are localized in the plasma membrane and possess seven transmembrane regions; the Mlo family is the only major family of higher plant proteins with seven transmembrane domains. It has been suggested that Mlo proteins function as G-protein coupled receptors in plants PUBMED:10574976; however, the molecular and biological functions of Mlo proteins are still unclear.

    \ 4119 IPR002631 \ This family consists of various bacterial plasmid replication (Rep) proteins. These proteins are essential for replication of plasmids. The Rep proteins are topoisomerases that nick the positive strand at the plus origin of replication and also at the single-strand conversion sequence PUBMED:2695401.\ 1278 IPR001421 \ Adenosine triphosphate (ATP) synthase contains a rotary motor involved in biological energy conversion. Its membrane-embedded F0 sector has a rotation generator fueled by the proton-motive force, which provides the energy required for the synthesis of ATP by the F1 domain PUBMED:10576729. Subunit 8 is one of the chains of the nonenzymatic component (F0) of the mitochondrial ATPase complex.\ 5499 IPR008627 \ This pentapeptide repeat is found mainly in Caenorhabditis elegans. The most conserved amino acid at each position gives the repeat its name, GETHR (Bateman A unpublished obs.). The family also includes a divergent repeat in a microneme protein. The function of this repeat is unknown.\ 582 IPR003890 \ This is the middle domain of eukaryotic initiation factor 4G (eIF4G). It also occurs in the nonsense-mediated mRNA decay protein 2 (NMD2p), which is involved in nonsense-mediated decay of mRNAs containing premature stop codons, and in the nuclear cap-binding protein (CBP80). The domain is rich in alpha-helices and may contain multiple alpha-helical repeats. In eIF4G, this domain binds eIF4A, eIF3, RNA and DNA.\ 3138 IPR005541 \

    The MEINOX region comprises two domains, KNOX1 and KNOX2. KNOX1 plays a role in suppressing target gene expression. KNOX2, essential for function, is thought to be necessary for homo-dimerization PUBMED:11549765.

    \ 484 IPR003511 \ The HORMA (for Hop1p, Rev7p and MAD2) domain has been suggested to recognise chromatin states that result from DNA adducts, double-stranded breaks or non-attachment to the spindle, and to act as an adaptor that recruits other proteins. Hop1 is a meiosis-specific protein, Rev7 is required for DNA damage-induced mutagenesis, and MAD2 is a spindle checkpoint protein that prevents progression of the cell cycle upon detection of a defect in mitotic spindle integrity.\ 3317 IPR007746 \ The prokaryotic MerE (or URF-1) protein is part of the mercury resistance operon. The protein is thought not to have any direct role in conferring mercury resistance to the organism but may be a mercury resistance transposon PUBMED:9479042, PUBMED:11763242.\ 1914 IPR003799 \

    This entry describes proteins of unknown function.

    \ 7000 IPR010800 \

    This family of proteins includes several glycine-rich proteins as well as nodulins 16 and 24. The family also contains proteins that are induced in response to various stresses.

    \ 1289 IPR000272 \

    The FXYD protein family contains at least seven members in mammals PUBMED:12538882. Two other family members that are not obvious orthologs of any identified mammalian FXYD protein exist in zebrafish. All these proteins share a signature sequence of six conserved amino acids comprising the FXYD motif in the NH2-terminus, and two glycines and one serine residue in the transmembrane domain. FXYD proteins are widely distributed in mammalian tissues with prominent expression in tissues that perform fluid and solute transport or that are electrically excitable.

    \

    Initial functional characterization suggested that FXYD proteins act as channels or as modulators of ion channels; however, subsequent studies have revealed that most FXYD proteins have another specific function and act as tissue-specific regulatory subunits of the Na,K-ATPase. Each of these auxiliary subunits produces a distinct functional effect on the transport characteristics of the Na,K-ATPase that is adjusted to the specific functional demands of the tissue in which the FXYD protein is expressed. FXYD proteins appear to preferentially associate with Na,K-ATPase alpha1-beta isozymes, and affect their function in a way that renders them operationally complementary or supplementary to coexisting isozymes.

    \ 747 IPR005037 \

    Members of this family are related to the pre-mRNA splicing factor PRP38 from yeast PUBMED:1508195, suggesting that all members of this family may be involved in splicing. This conserved region could be involved in RNA binding. The putative domain is about 180 amino acids in length. PRP38 is a unique component of the U4/U6.U5 tri-small nuclear ribonucleoprotein (snRNP) particle and is necessary for an essential step late in spliceosome maturation PUBMED:9582287.

    \ 5468 IPR008516 \ This family consists of several eukaryotic proteins of unknown function.\ 981 IPR007234 \ Vps53 complexes with Vps52 and Vps54 to form a multi-subunit complex involved in regulating membrane trafficking events PUBMED:10637310.\ 1921 IPR003811 \

    This entry describes proteins of unknown function.

    \ 5199 IPR008034 \

    Delta-lysin (delta-toxin) is a 26-amino-acid hemolytic peptide toxin secreted by Staphylococcus aureus. It is thought that delta-toxin forms an amphipathic helix upon binding to lipid bilayers PUBMED:12206677. The precise mode of action of delta-lysin is unclear.

    \ 2242 IPR007618 \ This domain is found at the N-termini of some human herpesvirus U58 proteins and some cytomegalovirus UL87 proteins. This region is always found N-terminal to the UL87 domain, which has no known function.\ 7341 IPR011092 \

    The members of this family are primarily from the Gammaproteobacteria. The function of these proteins is unknown.

    \ 7232 IPR010874 \

    This family consists of several telomere-binding protein beta subunits, which appear to be specific to the family Oxytrichidae. Telomeres are specialised protein-DNA complexes that compose the ends of eukaryotic chromosomes. Telomeres protect chromosome termini from degradation and recombination and act together with telomerase to ensure complete genome replication. TEBP beta forms a complex with TEBP alpha and this complex is able to recognise and bind ssDNA to form a sequence-specific, telomeric nucleoprotein complex that caps the very 3' ends of chromosomes PUBMED:9875850.

    \ 838 IPR007273 \ In vertebrates, secretory carrier membrane proteins (SCAMPs) 1-3 constitute a family of putative membrane-trafficking proteins composed of cytoplasmic N-terminal sequences with NPF repeats, four central transmembrane regions (TMRs), and a cytoplasmic tail. SCAMPs probably function in endocytosis by recruiting EH-domain proteins to the N-terminal NPF repeats but may have additional functions mediated by their other sequences PUBMED:11050114.\ 1978 IPR000615 \ Bestrophin is a 68-kDa basolateral plasma membrane protein expressed in retinal pigment epithelial (RPE) cells. It is encoded by the VMD2 gene, which is mutated in Best macular dystrophy, a disease characterised by a depressed light peak in the electrooculogram (EOG) PUBMED:12032738. VMD2 encodes a 585-amino-acid protein with an approximate mass of 68 kDa, which has been designated bestrophin. Bestrophin shares homology with the Caenorhabditis elegans RFP gene family, named for the presence of a conserved arginine (R), phenylalanine (F), proline (P) amino acid sequence motif. Bestrophin is localised to the basolateral surface of RPE cells, consistent with a role in the generation or regulation of the EOG light peak. Bestrophin and other RFP family members represent a new class of chloride channels, indicating a direct role for bestrophin in generating the light peak PUBMED:12032738. The VMD2 gene underlying Best disease was shown to encode the first human member of the RFP-TM protein family. More than 97% of the disease-causing mutations are located in the N-terminal RFP-TM domain, implying that this domain has important functional properties PUBMED:12058047.\ 1586 IPR000187 \

    Corticotropin-releasing factor (CRF), urotensin-I, urocortin and sauvagine form a family of related neuropeptides in vertebrates. The family can be grouped into 2 separate paralogous lineages, with urotensin-I, urocortin and sauvagine in one group and CRF forming the other group. Urocortin and sauvagine appear to represent orthologues of fish urotensin-I in mammals and amphibians, respectively. The peptides have a variety of physiological effects on stress and anxiety, vasoregulation, thermoregulation, growth and metabolism, metamorphosis and reproduction in various species, and are all released as preprohormones PUBMED:10375459.

    \ CRF PUBMED:2200028 is a hormone found mainly in the paraventricular nucleus of the mammalian hypothalamus that regulates the release of corticotropin (adrenocorticotropic hormone, ACTH) from the pituitary gland. From the hypothalamus, CRF is transported to the anterior pituitary, where it stimulates ACTH release via CRF type 1 receptors, thereby activating the hypothalamo-pituitary-adrenocortical (HPA) axis and thus glucocorticoid release.

    \

    \ CRF is evolutionarily related to a number of other active peptides. Urocortin acts in vitro to stimulate the secretion of adrenocorticotropic hormone. Urotensin is found in the teleost caudal neurosecretory system and may play a role in osmoregulation and act as a corticotropin-releasing factor. Urotensin-I is released from the urophysis of fish and stimulates ACTH and subsequent cortisol release in vivo. The nonhormonal portion of the prohormone is thought to be the urotensin binding protein (urophysin). Sauvagine, isolated from frog skin, has a potent hypotensive and diuretic effect.

    \ 7422 IPR011452 \

    This is a family of paralogues from the planctomycete Rhodopirellula baltica.

    \ 6916 IPR009788 \

    This family consists of several archaeal GvpD gas vesicle proteins. GvpD is thought to be involved in the regulation of gas vesicle formation PUBMED:8763925,PUBMED:12864859.

    \ 5103 IPR007940 \

    The SH3 domain-binding protein inhibits the auto- and transphosphorylation of Bruton's tyrosine kinase (BTK) and acts as a negative regulator of BTK-related signalling in B cells.

    \ 5477 IPR008753 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, 'c' is a hydrophobic residue and 'X' is any residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
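    Both motif definitions in the paragraph above can be written as regular expressions. The Python sketch below is a minimal illustration, not a validated predictor, and several choices in it are assumptions: 'b' (uncharged) is read as any residue other than D, E, K, R or H; 'c' (hydrophobic) as one of A, V, L, I, M, F, W or C; 'a' is restricted to V or T even though the text says these are only the most common choices; and proline is excluded from every position of the strict pattern. The helper `scan` and the test peptide are hypothetical. Lookahead groups are used so that overlapping hits are reported.

```python
import re

# Residue classes below are illustrative assumptions, not a formal definition.
UNCHARGED = "[^DEKRHP]"     # 'b': uncharged, and never proline
HYDROPHOBIC = "[AVLIMFWC]"  # 'c': hydrophobic
NOT_PRO = "[^P]"            # 'X': any residue except proline

# Loose zinc-binding signature: His-Glu-X-X-His.
LOOSE = re.compile(r"(?=(HE..H))")

# Stricter abXHEbbHbc form, with 'a' simplified to V or T.
STRICT = re.compile(
    "(?=([VT]" + UNCHARGED + NOT_PRO + "HE"
    + UNCHARGED + UNCHARGED + "H" + UNCHARGED + HYDROPHOBIC + "))"
)

def scan(pattern: re.Pattern, sequence: str):
    """Return (1-based start, matched residues) for every hit, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical peptide: one site fitting the strict form, one HEXXH with a proline.
peptide = "MKTVVAHELTHAVGGHEPAH"
print(scan(LOOSE, peptide))   # [(7, 'HELTH'), (16, 'HEPAH')]
print(scan(STRICT, peptide))  # [(4, 'VVAHELTHAV')]
```

    The loose pattern reports both HEXXH occurrences, while the stricter form rejects the second one because of the proline, which is the kind of filtering the abXHEbbHbc definition is meant to provide.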

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use an active-site amino acid residue as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
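    The naming scheme in the paragraph above behaves like a lookup keyed on the first character of a family or clan identifier. The Python sketch below encodes only what that paragraph states; the helper name `describe` is hypothetical, it does not query the MEROPS database, and it does not check that a given family number actually exists. Subclan suffixes such as the "(E)" in "MA(E)" are simply ignored.

```python
# Catalytic-type letters as listed in the text.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}

# Nucleophile by catalytic type, following the text; types it does not
# cover (U, P) are left out so that the lookup returns None.
NUCLEOPHILE = {
    "S": "amino acid residue (acyl intermediate)",
    "T": "amino acid residue (acyl intermediate)",
    "C": "amino acid residue (acyl intermediate)",
    "A": "activated water",
    "G": "activated water",
    "M": "activated water",
}

def describe(identifier: str):
    """Return (catalytic type, nucleophile) for a family such as 'M13' or a
    clan such as 'MA'; only the first character is interpreted."""
    letter = identifier.strip()[0].upper()
    if letter not in CATALYTIC_TYPE:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    return CATALYTIC_TYPE[letter], NUCLEOPHILE.get(letter)

print(describe("M13"))    # ('metallo', 'activated water')
print(describe("MA(E)"))  # ('metallo', 'activated water')
```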

    \ \

    This group of metallopeptidases belongs to the MEROPS peptidase family M13 (neprilysin family, clan MA(E)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active site residues for members of this family and thermolysin occur in the motif HEXXH PUBMED:7674922.

    \ \

    M13 peptidases are well-studied proteases found in a wide range of organisms including mammals and bacteria. In mammals they participate in processes such as cardiovascular development, blood-pressure regulation, nervous control of respiration, and regulation of the function of neuropeptides in the central nervous system. In bacteria they may be used for digestion of milk PUBMED:11223883, PUBMED:7674922. The family includes eukaryotic and prokaryotic oligopeptidases, as well as some of the proteins responsible for the molecular basis of the blood group antigens e.g. Kell PUBMED:7674922.

    \ \

    Neprilysin is another member of this group. It is variously known as common acute lymphoblastic leukemia antigen (CALLA), enkephalinase (gp100) and neutral endopeptidase (NEP). It is a plasma membrane-bound mammalian enzyme that is able to digest biologically active peptides, including enkephalins PUBMED:7674922. The zinc ligands of neprilysin are known and are analogous to those in thermolysin, a related peptidase PUBMED:7674922, PUBMED:8099556. Neprilysins, like thermolysin, are inhibited by phosphoramidon, which appears to selectively inhibit this family in mammals. The enzymes are all oligopeptidases, digesting oligo- and polypeptides, but not proteins PUBMED:7674922. Neprilysin consists of a short cytoplasmic domain, a membrane-spanning region and a large extracellular domain. The cytoplasmic domain contains a conformationally restrained octapeptide, which is thought to act as a stop-transfer sequence that prevents proteolysis and secretion PUBMED:7674922, PUBMED:3555489.

    \ \ \ 1961 IPR004919 \ This family includes prokaryotic proteins of unknown function.\ 275 IPR007153 \ Members of this family are around 160 amino acids in length and are mainly found in archaebacteria, with a small number of eubacterial examples. The high level of conservation in this family suggests some as yet unknown but important biological function.\ 2631 IPR002567 \ Herpes simplex virus type 1 glycoprotein K (gK) plays an essential role in viral replication and cell fusion. gK is a very hydrophobic membrane protein that contains a signal sequence and several hydrophobic regions. It contains three transmembrane domains (amino acids 125-139, 226-239, and 311-325) and another hydrophobic domain (amino acids 241-265) located in the extracellular loop, which is less hydrophobic and much longer than the transmembrane sequences. The domains may interact with each other to form a complex tertiary structure that is critical for the biological function of gK PUBMED:9407122.\ 3044 IPR007269 \ The isoprenylcysteine O-methyltransferase carries out carboxyl methylation of cleaved eukaryotic proteins that terminate in a CaaX motif. In Saccharomyces cerevisiae this methylation is carried out by Ste14p, an integral endoplasmic reticulum membrane protein. Ste14p is the founding member of the isoprenylcysteine carboxyl methyltransferase (ICMT) family, whose members share significant sequence homology PUBMED:11451995.\ 2714 IPR003837 \

    Glu-tRNAGln amidotransferase is a heterotrimeric enzyme that is required for correct decoding of glutamine codons during translation. The Glu-tRNAGln amidotransferase provides an important translational fidelity mechanism, replacing incorrectly charged Glu-tRNAGln with the correct Gln-tRNAGln via transamidation of the misacylated Glu-tRNAGln PUBMED:9342321. This activity compensates for the lack of glutaminyl-tRNA synthetase activity in Gram-positive eubacteria, cyanobacteria, archaea, and organelles PUBMED:9342321.

    \ 6249 IPR006297 \

    LepA (GUF1 in Saccharomyces) is a GTP-binding membrane protein related to EF-G and EF-Tu. Two types of phylogenetic tree, rooted by other GTP-binding proteins, suggest that eukaryotic homologs (including GUF1 of yeast) originated within the bacterial LepA family. The function of the proteins in this family is unknown.

    \ 5696 IPR008818 \ This family consists of several Rotavirus major outer capsid protein VP7 sequences. The rotavirus capsid is composed of three concentric protein layers. Proteins VP4 and VP7 comprise the outer layer. VP4 forms spikes and is the viral attachment protein. VP7 is a glycoprotein and the major constituent of the outer protein layer PUBMED:12050377.\ 4752 IPR000831 \

    The Trp repressor (TrpR) binds to at least five operators in the Escherichia coli genome, repressing gene expression. The operators at which it binds vary considerably in DNA sequence and location within the promoter; when bound to the Trp operon it recognises the sequence 5'-ACTAGT-3' and acts to prevent the initiation of transcription. The TrpR controls the trpEDCBA (trpO) operon and the genes for trpR, aroH, mtr and aroL, which are involved in the biosynthesis and uptake of the amino acid tryptophan PUBMED:12475235. The repressor binds to the operators only in the presence of L-tryptophan, thereby controlling the intracellular level of its effector; the complex also regulates Trp repressor biosynthesis by binding to its own regulatory region. TrpR acts as a dimer that is composed of identical 6-helical subunits, where four of the helices form the core of the protein and intertwine with the corresponding helices from the other subunit.
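    The feedback logic described at the end of the paragraph above (TrpR represses only when tryptophan is present, so tryptophan limits its own synthesis) can be illustrated with a toy discrete-time model. Everything quantitative here is invented: the synthesis and consumption rates and the threshold at which TrpR is treated as operator-bound are arbitrary units, and the model ignores TrpR autoregulation and every other layer of control. It only shows that this kind of negative feedback holds the tryptophan level in a narrow band instead of letting it rise without limit.

```python
def simulate_trp(steps: int, synthesis: float = 5.0, consumption: float = 2.0,
                 threshold: float = 20.0, trp: float = 0.0):
    """Toy negative-feedback loop in arbitrary units (all parameters hypothetical)."""
    history = []
    for _ in range(steps):
        # Treat TrpR as operator-bound only when tryptophan is at or above threshold.
        repressed = trp >= threshold
        if not repressed:
            trp += synthesis  # operon transcribed, so tryptophan is made
        trp = max(0.0, trp - consumption)  # tryptophan used by the cell
        history.append(trp)
    return history

levels = simulate_trp(30)
print(levels[:10])
# [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 19.0, 22.0, 20.0]
```

    After the initial rise, the level cycles between 18 and 22 units: each time it crosses the threshold, repression switches synthesis off until consumption brings it back down.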

    \ \ 1284 IPR001469 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, driven by a cycle of ATP binding and hydrolysis PUBMED:11309608.

    \ \ \ \ \ \

    This family represents subunits called delta and epsilon in human and metazoan species. In bacterial species, the delta (D) subunit is equivalent to the oligomycin sensitivity-conferring protein (OSCP) in metazoans. The E. coli delta and metazoan OSCP subunits are found in the Pfam family OSCP.

    \ 5132 IPR007969 \

    This family consists of several uncharacterised Mycobacterium tuberculosis proteins.

    \ 7464 IPR011504 \

    This motif is found at the N-terminus of several short hypothetical proteins in Rhodopirellula baltica and of the predicted arylsulfatase B.

    \ 6055 IPR009353 \

    This family consists of several Orthopoxvirus N1 proteins. The function of this family is unknown.

    \ 2126 IPR007422 \ This is a family of uncharacterised archaeal proteins.\ 2698 IPR000714 \ The IR5 open reading frame (ORF) of the equine herpesvirus type 1 (EHV-1) genome maps within the inverted repeat segments. Sequence analyses of the gene region revealed an ORF of 236 amino acids that showed a high degree of similarity to ORF64 of varicella zoster virus and ORF3 of EHV-4, both of which map within the inverted repeats, and to the US10 ORF of herpes simplex virus type 1 (HSV-1), which maps within the unique short segment. The IR5 ORF houses a sequence of 13 residues (CAYWCCLGHAFAC) that perfectly matches the consensus zinc finger motif (C-X2-4-C-X2-15-C/H-X2-4-C/H) PUBMED:1316680. Putative cis-acting elements flanking the IR5 ORF include a TATA box, a CAAT box, and a polyadenylation signal. Together with various experimental data, these features indicate that the IR5 gene of EHV-1 has the characteristics of a late gene of the gamma-1 class. The DNA sequence covering ~70% of the unique short region (Us) and part of the short inverted repeat of the Marek's disease virus type 1 GA strain has been determined. Sequence analysis showed the presence of nine potential ORFs in the Us region, four of which were found to be similar to US10 (minor virion protein) PUBMED:1282282.\ 1831 IPR003752 \ DsbB is a protein component of the pathway that leads to disulphide bond formation in periplasmic proteins of Escherichia coli and other bacteria. The DsbB protein oxidizes the periplasmic protein DsbA, which in turn oxidizes cysteines in other periplasmic proteins in order to make disulphide bonds PUBMED:8430071. DsbB acts as a redox potential transducer across the cytoplasmic membrane. It is a membrane protein that spans the membrane four times, with both the N- and C-termini of the protein in the cytoplasm. Each of the periplasmic domains of the protein has two essential cysteines. 
The two cysteines in the first periplasmic domain are in a Cys-X-Y-Cys configuration that is characteristic of the active site of other proteins involved in disulphide bond formation, including DsbA and protein disulphide isomerase PUBMED:7957076.\ 2468 IPR003883 \ Extensins are plant cell-wall proteins; they can account for up to 20% of the dry weight of the cell wall. They are highly glycosylated, possibly reflecting their interactions with cell-wall carbohydrates. Amongst their functions is cell wall strengthening in response to mechanical stress (e.g., during attack by pests, plant-bending in the wind, etc.). This repeat occurs within extensin-like proteins.\ 8089 IPR013259 \

    The sulfakinin (SK) family of neuropeptides has only been identified in crustaceans and insects. Most species have the potential to produce two sulfakinin peptides, one of which has a short sulfakinin sequence. The function of the sulfakinins is difficult to assess. In the American cockroach, various forms of the endogenous sulfakinins have been shown to be active on the hindgut and also on the heart. In C. vomitoria the peptides act as neurotransmitters or neuromodulators, linking the brain with all thoracic and abdominal ganglia. In adults of P. monodon they appear to be restricted to a few neurones in the brain, with a neural pathway extending to the ventral thoracic and abdominal ganglia.

    \ 7943 IPR012593 \

    This family consists of the PEA-VEAacid neuropeptides. These neuropeptides were isolated from the abdominal perisympathetic organs of the American cockroach. They are found together with Pea-YLS-amide and Pea-SKNacid, giving a unique neuropeptide pattern in the abdominal perisympathetic organs. The functions of these neuropeptides are unknown PUBMED:10676456.

    \ 7558 IPR011704 \

    This entry includes some of the AAA proteins not detected by the model.

    \ 491 IPR007734 \

    Heparan sulphate (HS) is a long unbranched polysaccharide found covalently attached to various proteins at the cell surface and in the extracellular matrix, where it acts as a co-receptor for a number of growth factors, morphogens, and adhesion proteins. HS 2-O-sulphotransferase (Hs2st) occupies a critical position in the succession of enzymes responsible for the biosynthesis of HS, catalysing the transfer of sulphate to the C2-position of selected hexuronic acid residues within the nascent HS chain. Mice that lack HS2ST undergo developmental failure after midgestation, the most dramatic effect being the complete failure of kidney development PUBMED:11956326. This family is related to .

    \ 7828 IPR012543 \

    This family contains many hypothetical proteins.

    \ 3287 IPR002056 \ Virtually all mitochondrial precursors are imported via the same mechanism PUBMED:7709435: precursors first bind to receptors on the mitochondrial surface, then insert into the translocation channel in the outer membrane. Many outer-membrane proteins participate in the early stages of import, four of which (MAS20, MAS22, MAS37 and MAS70) are components of the receptor. MAS20, which forms a subcomplex with MAS22, seems to interact with most or all mitochondrial precursors, suggesting that the protein binds directly to mitochondrial targeting sequences. The MAS37 and MAS70 components also form a subcomplex, the two subcomplexes possibly binding via their transmembrane (TM) regions; the TM region of MAS70 promotes oligomerisation of attached protein domains and shares sequence similarity with the TM region of MAS20 PUBMED:8163528.\ 4661 IPR003538 \ Iron is essential for growth in both bacteria and mammals. Controlling the amount of free iron in solution is often used as a tactic by hosts to limit invasion of pathogenic microbes; binding iron tightly within protein molecules can accomplish this. Such iron-protein complexes include haem in blood, lactoferrin in tears/saliva and transferrin in blood plasma. Some bacteria express surface receptors to capture eukaryotic iron-binding compounds, while others have evolved siderophores to scavenge iron from iron-binding host proteins PUBMED:8057905.\ \

    The absence of free iron in the surrounding environment triggers transcription of gene clusters that encode both siderophore-synthesis enzymes and receptors that recognise iron-bound siderophores PUBMED:2521621. An example of the latter is Escherichia coli FepA, which resides in the outer envelope and captures iron-bound enterobactin PUBMED:9886293.

    \ \

    To complete transport of bound iron across the inner membrane, a second receptor complex is needed. The major component of this is TonB, a 27 kDa protein that facilitates energy transfer from the proton motive force to outer membrane receptors PUBMED:9643536. Vitamin B12 and colicin receptors also make use of the TonB system to drive active transport at the outer membrane.

    \ 1354 IPR006949 \ The P2 bacteriophage J protein lies at the edge of the baseplate. This family also includes a number of bacterial homologues, which are thought to have been horizontally transferred.\ 4972 IPR007005 \ These proteins are found in a wide range of eukaryotes. Their function is uncertain, though they are nuclear proteins, possibly with DNA-binding activity.\ 2201 IPR007512 \ This family of short eukaryotic proteins has no known function. Most members of this family are only 80 amino acid residues long; however, the Arabidopsis homologue is over 300 residues long. These proteins contain a conserved N-terminal cysteine and a conserved motif GXGXGXG in the carboxy-terminal half that may be functionally important.\ 2705 IPR004911 \ This family includes the two characterized human gamma-interferon-inducible lysosomal thiol reductase (GILT) sequences PUBMED:3136170, PUBMED:10639150. It also contains several other eukaryotic putative proteins with similarity to GILT PUBMED:11491538. The aligned region contains three conserved cysteine residues. In addition, the two GILT sequences possess a C-X(2)-C motif that is shared by some of the other sequences in the family. This motif is thought to be associated with disulphide bond reduction. \ \ 5440 IPR008499 \ This family consists of uncharacterised proteins found in Mus musculus (mouse), human, zebrafish and other eukaryotes.\ 129 IPR000462 \ A number of phosphatidyltransferases, all involved in phospholipid biosynthesis, share a conserved sequence region PUBMED:3031032, PUBMED:1848238. These enzymes catalyse the displacement of CMP from a CDP-alcohol by a second alcohol, with formation of a phosphodiester bond and concomitant breaking of a phosphoric anhydride bond. They are proteins of 200 to 400 amino acid residues. 
The conserved region contains three aspartic acid residues and is located in the N-terminal section of the sequences.\ 2399 IPR004699 \ Bacterial PTS transporters transport and concomitantly phosphorylate their sugar substrates, and typically consist of multiple subunits or protein domains. The Man family is unique in several respects among PTS permease families:
  • It is the only PTS family in which members possess a IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteinyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.

    The Gut family consists only of glucitol-specific transporters, but these occur in both Gram-negative and Gram-positive bacteria. The Escherichia coli system consists of a IIA protein, a IIC protein and a IIBC protein.

    This family is specific for the IIC component.

    \ 3700 IPR004898 \

    Pectate lyase is responsible for the maceration and soft-rotting of plant tissue. It catalyses the eliminative cleavage of pectate to produce oligosaccharides with 4-deoxy-alpha-D-gluc-4-enuronosyl groups at their non-reducing ends. Pectate lyase is an extracellular enzyme and is induced by pectin. It is subject to self-catabolite repression, and has been implicated in plant disease.

    \ \ The structure and the folding kinetics of one member of this family, pectate lyase C (pelC) from Erwinia chrysanthemi, have been investigated in some detail PUBMED:11926834. PelC contains a parallel beta-helix folding motif. The majority of the regular secondary structure is composed of parallel beta-sheets (about 30%). The individual strands of the sheets are connected by unordered loops of varying length. The backbone thus forms a large helix composed of beta-sheets. There are two disulphide bonds in pelC and 12 proline residues. One of these prolines, Pro220, is involved in a cis peptide bond. The folding mechanism of pelC involves two slow phases that have been attributed to proline isomerization.\ 5723 IPR008577 \ This family consists of several uncharacterised proteins from a number of Siphoviruses as well as some bacterial proteins from Streptococcus species. Some members of this family are described as putative minor structural proteins.\ 5860 IPR010318 \

    This family consists of hypothetical bacterial and archaeal proteins of unknown function.

    \ 3285 IPR001038 \

    Equine herpesvirus glycoprotein 13 (EHV-1 gp13) has the characteristic features of a membrane-spanning protein: an N-terminal signal sequence; a hydrophobic membrane anchor region; a charged C-terminal cytoplasmic tail; and an exterior domain with nine potential N-glycosylation sites PUBMED:2455821. EHV-1 gp13 is the structural homologue of the gC-like glycoproteins of Herpes simplex virus (gC-1 and gC-2), pseudorabies Herpesvirus (gIII) and Varicella-zoster virus (gp66).

    \

    Secretory glycoprotein GP57-65 precursor (glycoprotein A - GA) is similar to Herpesvirus glycoprotein C, and belongs to the immunoglobulin gene superfamily PUBMED:2836620, PUBMED:2543160. GA is thought to play an immunoevasive role in the pathogenesis of Marek's disease. It is a candidate for causing the early-stage immunosuppression that occurs after MDHV infection.

    \ 7166 IPR010854 \

    This entry consists of several hypothetical Enterobacterial proteins of around 90 residues in length. Some of the proteins are annotated as ydgH precursors and contain two copies of this region, one at the N terminus and the other at the C terminus. The function of this family is unknown.

    \ 3004 IPR000397 \ Hsp33 is a molecular chaperone, distinguished from all other known chaperones by its mode of functional regulation: its activity is redox regulated. Hsp33 is a cytoplasmically localized protein with highly reactive cysteines that respond quickly to changes in the redox environment. Oxidizing conditions, such as exposure to H2O2, cause disulphide bonds to form in Hsp33, a process that leads to the activation of its chaperone function PUBMED:10025400.\ 2793 IPR001863 \

    Glypicans PUBMED:8589707, PUBMED:7657705 are a family of heparan sulphate proteoglycans which are anchored to cell membranes by a glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

    \ \ 3820 IPR007633 \ Holins are a diverse family of proteins that cause bacterial membrane lysis during late protein synthesis. It is thought that the temporal precision of holin-mediated lysis may arise from the build-up of a holin oligomer, which causes the lysis PUBMED:11459934.\ 4048 IPR000032 \

    Phosphocarrier HPr protein, a small cytoplasmic protein, is a component of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria PUBMED:8246840, PUBMED:2197982. The phosphoryl group from phosphoenolpyruvate (PEP) is transferred to HPr, the phosphoryl carrier protein, by enzyme I. Phospho-HPr then transfers it to the permease. In some bacteria, HPr is a domain in a larger protein that includes an EIII(Fru) (IIA) domain and, in some cases, also an EI domain.

    \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of sugar substrates during their translocation across the cell membrane. The mechanism involves the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via enzyme I (EI) to a phosphocarrier protein (HPr), which in turn transfers it to enzyme II (EII) of the PTS system PUBMED:7853396, PUBMED:7704530.
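    The phosphoryl relay can be pictured as an ordered chain in which each carrier is phosphorylated by the one before it and then passes the group on. The Python sketch below models only that ordering. Splitting enzyme II into IIA and IIB components, with IIC acting as the membrane translocator rather than a phosphoryl carrier, follows standard PTS nomenclature; the tuple `PTS_RELAY` and the helper `relay_steps` are illustrative assumptions, and in some systems these components are domains of a single protein, as noted above for HPr.

```python
# Order of phosphoryl transfer in the PTS (illustrative). IIC translocates the
# sugar but does not carry the phosphoryl group, so it is not in the chain.
PTS_RELAY = ("PEP", "EI", "HPr", "EIIA", "EIIB", "sugar")

def relay_steps(chain=PTS_RELAY):
    """Return the (donor, acceptor) pairs a phosphoryl group passes through."""
    return list(zip(chain, chain[1:]))

for donor, acceptor in relay_steps():
    print(f"{donor} -> {acceptor}")
# PEP -> EI
# EI -> HPr
# HPr -> EIIA
# EIIA -> EIIB
# EIIB -> sugar
```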

    \

    There is a conserved histidine in the N-terminal region of HPr, which serves as an acceptor for the phosphoryl group of EI. In the central part of HPr there is a conserved serine which, in Gram-positive bacteria only, is phosphorylated by an ATP-dependent protein kinase, a process which probably plays a regulatory role in sugar transport.

    \ 4001 IPR002130 \

    Cyclophilin PUBMED: is the major high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A (CSA), but is also found in other organisms. It exhibits peptidyl-prolyl cis-trans isomerase activity (PPIase or rotamase). PPIase is an enzyme activity that accelerates protein folding by catalysing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides PUBMED:2186809. It is probable that CSA mediates some of its effects by forming a tight complex with cyclophilin that inhibits the phosphatase activity of calcineurin PUBMED:7514602, PUBMED:8117697. Cyclophilin A is a cytosolic and highly abundant protein. The protein belongs to a family of isozymes, including cyclophilins B and C, and natural killer cell cyclophilin-related protein PUBMED:1464374, PUBMED:8404888, PUBMED:7526121. Major isoforms have been found throughout the cell, including the ER, and some are even secreted. The sequences of the different forms of cyclophilin-type PPIases are well conserved.

    \
  • Note: FKBPs, a family of proteins that bind the immunosuppressive drug FK506, are also PPIases, but their sequence is not at all related to that of cyclophilin.
  • \ 3126 IPR004623 \ Kdp is a high-affinity ATP-driven K+ transport system in Escherichia coli. It is composed of three membrane-bound subunits, KdpA, KdpB and KdpC, and one small peptide, KdpF. KdpA is the K+-transporting subunit of this complex. During assembly of the complex, KdpA and KdpC bind to each other. This interaction is thought to stabilize the complex. Data indicate that KdpC might connect KdpA, the K+-transporting subunit, to KdpB, the ATP-hydrolyzing (energy-providing) subunit PUBMED:9858692.\ 3035 IPR005295 \

    These proteins are the product of ORF 3B from Avian infectious bronchitis virus (IBV). Currently, the function of this protein remains unknown PUBMED:9168126.

    \ 1199 IPR001103 \

    Steroid or nuclear hormone receptors (NRs) constitute an important superfamily of transcription regulators that are involved in diverse physiological functions, including control of embryonic development, cell differentiation and homeostasis. Members include the steroid hormone receptors and receptors for thyroid hormone, retinoids and 1,25-dihydroxy-vitamin D3. The proteins function as dimeric molecules in the nucleus to regulate the transcription of target genes in a ligand-responsive manner PUBMED:7899080, PUBMED:8165128.

    \ \

    NRs are of considerable importance in medical research, a large number of them being implicated in diseases such as cancer, diabetes and hormone resistance syndromes. Many do not yet have a defined ligand and are accordingly termed "orphan" receptors. More than 300 NRs have been described to date, and a new nomenclature system has recently been introduced in an attempt to rationalise the increasingly complex set of names used to describe superfamily members.

    \

    \ The androgen receptor (AR) consists of 3 functional and structural domains: an N-terminal (modulatory) domain; a DNA-binding domain () that mediates specific binding to target DNA sequences (ligand-responsive elements); and a hormone-binding domain. The N-terminal domain (NTD) is unique to androgen receptors and spans approximately the first 530 residues; the highly conserved DNA-binding domain is smaller (around 65 residues) and occupies the central portion of the protein; and the hormone ligand-binding domain (LBD) lies at the receptor C-terminus. In the absence of ligand, steroid hormone receptors are thought to be weakly associated with nuclear components; hormone binding greatly increases receptor affinity for these components.

    \ \

    The LBDs of steroid hormone receptors fold into 12 helices that form a ligand-binding pocket. When an agonist is bound, helix 12 folds over the pocket to enclose the ligand PUBMED:12089231. When an antagonist is bound, helix 12 is positioned away from the pocket in a way that interferes with the binding of coactivators to a groove in the hormone-binding domain that forms after ligand binding. In AR, ligand binding that induces helix 12 to fold over the pocket exposes a groove that binds a region of the NTD. Coactivator molecules can also bind to this groove, but the predominant site for coactivator binding to AR is in the NTD. The AR ligand resides in a pocket and primarily contacts helices 4, 5 and 10. The DNA-binding region includes eight cysteine residues that form two coordination complexes, each composed of four cysteines and a Zn2+ ion. These two zinc fingers form the structure that binds to the major groove of DNA. The second zinc finger stabilizes the binding complex by hydrophobic interactions with the first finger and contributes to the specificity of receptor DNA binding. It is also necessary for the receptor dimerization that occurs during DNA binding.

    \

    Defects in the androgen receptor cause testicular feminisation syndrome, also known as androgen insensitivity syndrome (AIS) PUBMED:1307250, PUBMED:1569163. AIS may be complete (CAIS), where external genitalia are phenotypically female; partial (PAIS), where genitalia are substantively ambiguous; or mild (MAIS), where external genitalia are normal male, or nearly so. Defects in the receptor also cause X-linked spinal and bulbar muscular atrophy (also known as Kennedy's disease).\

    \ \ 4628 IPR001721 \ Threonine dehydratases, including serine/threonine dehydratase (see ), contain a common C-terminal region that may have a regulatory role. Some members contain two copies of this region PUBMED:9562556.\ \ 5003 IPR005495 \ Members of this family are predicted integral membrane proteins of unknown function. They are about 350 amino acids long, contain about 6 transmembrane regions and may be permeases, although this has not been verified.\ 3279 IPR005298 \

    This presumed 110-residue domain is found in multiple copies in MAP (MHC class II analog protein) PUBMED:7545162. Each of the repeated domains contains a subdomain of 31 residues that shares striking sequence homology with a segment in the peptide-binding groove of the beta chain of the major histocompatibility complex (MHC) class II proteins from different mammalian species. The domain has also been found in a range of other extracellular matrix proteins PUBMED:7545162 and may play a role in protein recognition and binding.

    \ 3355 IPR006815 \ This small protein is involved in DNA packaging, interacting with DNA via its hydrophobic C-terminus. In bacteriophage phi-X174, J is present in 60 copies, and forms an S-shaped polypeptide chain without any secondary structure. It is thought to interact with DNA through simple charge interactions PUBMED:911774.\ 1721 IPR002480 \

    Members of the 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthetase family () catalyse the first step of the common pathway that leads, via chorismate, to the aromatic amino acids. Class I (see ) includes bacterial and yeast enzymes; class II includes enzymes from higher plants and various microorganisms PUBMED:8760910.

    \

    The first step in the common pathway leading to the biosynthesis of aromatic compounds is the stereospecific condensation of phosphoenolpyruvate (PEP) and D-erythrose-4-phosphate (E4P) giving rise to 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP). This reaction is catalyzed by DAHP synthase, a metal-activated enzyme, which in microorganisms is the target for negative-feedback regulation by pathway intermediates or by end products.
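    As a worked summary of the condensation described above, the overall reaction and a carbon count are shown below. The water consumed and the inorganic phosphate (P_i) released are standard features of this reaction but are not stated in the text above; the carbon count shows why the product is a heptulosonate (a seven-carbon acid).

```latex
\mathrm{PEP}\,(\mathrm{C_3}) + \mathrm{E4P}\,(\mathrm{C_4}) + \mathrm{H_2O}
  \;\xrightarrow{\text{DAHP synthase}}\;
  \mathrm{DAHP}\,(\mathrm{C_7}) + \mathrm{P_i},
\qquad 3 + 4 = 7 \text{ carbons}
```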

    \ 7070 IPR006541 \

    These sequences represent a family of integral membrane proteins, most of which are about 650 residues in size and predicted to span the membrane seven times. Nearly half of the members of this family are found in association with a member of the lactococcin 972 family of bacteriocins () PUBMED:10589723. Others may be associated with uncharacterized proteins that may also act as bacteriocins. Although this protein has been suggested to be an immunity protein, and the bacteriocin to be exported by a Sec-dependent process, the role of this protein remains unclear.

    \ 7778 IPR012506 \

    The members of this family are similar to the hypothetical protein yhhN expressed by E. coli (). Many of the members of this family are annotated as being possible transmembrane proteins, and in fact they all have a high proportion of hydrophobic residues.

    \ 6476 IPR010594 \

    This family consists of several hypothetical Baculovirus proteins of unknown function.

    \ 5129 IPR007966 \

    This family consists of several uncharacterised Chlamydia proteins of unknown function.

    \ 1623 IPR004916 \ Ubiquinone biosynthesis proteins, COQ7, are central metabolic regulatory proteins. They belong to a protein family whose members contain two repeats of about 90 amino acids with two conserved motifs, one of which, DXEXXH, may be part of an enzyme active site.\ 663 IPR003392 \ The transmembrane protein patched is a receptor for the morphogen Sonic hedgehog. In Drosophila melanogaster, this protein associates with the smoothened protein to transduce hedgehog signals, leading to the activation of wingless, decapentaplegic and patched itself. It participates in cell interactions that establish pattern within the segment and imaginal disks during development. The mouse homolog may play a role in epidermal development. The human Niemann-Pick C1 protein, defects in which cause Niemann-Pick type II disease, is also a member of this family. This protein is involved in the intracellular trafficking of cholesterol, and may play a role in vesicular trafficking in glia, a process that may be crucial for maintaining the structural and functional integrity of nerve terminals.\ 2711 IPR005190 \

    This is a conserved repeated domain found in GlnE proteins. These proteins adenylate and deadenylate glutamine synthetase. The domain is related to the nucleotidyltransferase domain.

    \ 4022 IPR003687 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbK found in PSII, where it is tightly associated with the antenna protein CP43 (PsbC). PsbK is required for accumulation of the PSII complex, and may participate in the assembly and stability of the PSII complex. In particular, PsbK may be involved in the binding of plastoquinone and in maintaining the dimeric organisation of PSII PUBMED:12939265, PUBMED:9632665.

    \ 992 IPR003657 \ The WRKY domain is a 60-amino-acid region that is defined by the conserved amino acid sequence WRKYGQK at its N-terminal end, together with a novel zinc-finger-like motif. The WRKY domain is found in one or two copies in a superfamily of plant transcription factors involved in the regulation of various physiological programs that are unique to plants, including pathogen defense, senescence, trichome development and the biosynthesis of secondary metabolites. The WRKY domain binds specifically to the DNA sequence motif (T)(T)TGAC(C/T), which is known as the W box. The invariant TGAC core of the W box is essential for function and WRKY binding PUBMED:10785665. Proteins known to contain a WRKY domain include Arabidopsis thaliana ZAP1 (Zinc-dependent Activator Protein-1); AtWRKY44/TTG2, a protein involved in trichome development and anthocyanin pigmentation; and wild oat ABF1 and ABF2, two proteins involved in the gibberellic acid-induced expression of the alpha-Amy2 gene.\ 7835 IPR012546 \

    This family contains many archaeal proteins which have very conserved sequences.

    \ 7968 IPR012607 \

    This family consists of 30S ribosomal subunit protein S22 polypeptides. This polypeptide is 47 amino acids in length and has a molecular weight of about 5 kDa. S22 is one of the stationary-phase-specific ribosomal proteins and is assembled into ribosomal particles in the stationary phase. Together with the other stationary-phase-specific ribosomal proteins, it brings about compositional changes in ribosomes during the stationary phase. The significance of this change is not yet clear PUBMED:11168583.
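    As a quick consistency check on the size given above, a polypeptide's molecular weight can be estimated from its length using an average residue mass of roughly 110 Da. That figure is a common rule of thumb, not a value from the cited work:

```latex
M \approx n \times 110\ \mathrm{Da} = 47 \times 110\ \mathrm{Da} = 5170\ \mathrm{Da} \approx 5\ \mathrm{kDa}
```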

    \ 4182 IPR001787 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:\

    \

    Bacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200 residues.

    \ \ 3899 IPR005099 \ This non-structural protein is one of two found in pneumoviruses. The protein is about 140 amino acids in length. The NS1 protein appears to be important for efficient replication, but is not essential PUBMED:10982380. The NS1 protein has been shown by yeast two-hybrid analysis to interact with the viral P protein PUBMED:10949949. This protein is also known as the 1C protein. It has also been shown that NS1 can potently inhibit transcription and RNA replication PUBMED:9445048.\ 6574 IPR009605 \

    This entry represents a conserved region within Arabidopsis thaliana proteins of unknown function. Proteins of the entry sometimes contain more than one copy of the domain.

    \ 5618 IPR008424 \ This entry occurs in several mammalian T-cell surface antigen CD2 proteins as well as homologous African swine fever virus (ASFV) sequences. CD2 mediates T cell adhesion via its ectodomain and signal transduction utilising its 117-amino acid cytoplasmic tail PUBMED:11376005. The structural and functional similarities of the ASFV LMW8-DR to CD2, a protein that is involved in cell-cell adhesion and immune response modulation, suggest a possible role in the pathogenesis of ASFV infection PUBMED:7907198.\ 7356 IPR006561 \

    This domain is found in proteins containing the double-stranded RNA-binding motif, DSRM (), or the zinc finger domain C2H2 (). This domain is found exclusively in the metazoa.

    \ 3568 IPR000310 \ Pyridoxal-dependent decarboxylases are bacterial proteins acting on ornithine, lysine, arginine and related substrates PUBMED:8181483. One of the regions of sequence similarity contains a conserved lysine residue, which is the site of attachment of the pyridoxal-phosphate group.\ 8050 IPR013166 \

    The proteins in this entry contain the C-terminal domain of citrate lyase ligase.

    \ 5371 IPR008750 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
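    The catalytic-type prefix described above can be read mechanically from a family or clan identifier. The sketch below is illustrative only: the helpers `catalytic_type` and `nucleophile` are hypothetical and are not part of any MEROPS tool or API, and they assume identifiers such as "C47" (a family) or "CA" (a clan) whose first character is the catalytic-type letter.

```python
# Hypothetical helpers: decode the catalytic-type letter that prefixes
# peptidase family and clan identifiers (e.g. family "C47", clan "CA").
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clans containing families of more than one type
}

# Types whose nucleophile is an amino-acid side chain (acyl intermediate)
# versus an activated water molecule, as described in the text.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the identifier's first character."""
    letter = identifier.strip()[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type prefix in {identifier!r}")
    return CATALYTIC_TYPES[letter]


def nucleophile(identifier: str) -> str:
    """Return the kind of nucleophile used by peptidases of this type."""
    letter = identifier.strip()[:1].upper()
    if letter in AMINO_ACID_NUCLEOPHILE:
        return "amino-acid side chain"
    if letter in WATER_NUCLEOPHILE:
        return "activated water"
    return "not specified"  # unknown type, or a mixed ("P") clan


if __name__ == "__main__":
    for ident in ("C47", "CA", "M10", "PA"):
        print(ident, catalytic_type(ident), nucleophile(ident))
```

    For example, the staphopain family in this entry, family C47 in clan CA, decodes as cysteine at both levels, with an amino-acid side chain as the nucleophile.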

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins that are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to peptidase family C47 (staphopain family, clan CA). \ \ The type examples are the staphopains, one of four major families of proteinases secreted by the Gram-positive bacterium Staphylococcus aureus. These staphylococcal cysteine proteases are secreted as preproenzymes that are proteolytically cleaved to generate the mature enzyme PUBMED:12437090, PUBMED:11447146, PUBMED:11767947.

    \ 348 IPR006055 \ This entry includes a variety of exonuclease proteins, such as ribonuclease T PUBMED:8506149 and the epsilon subunit of DNA polymerase III. Ribonuclease T is responsible for the end-turnover of tRNA, and removes the terminal AMP residue from uncharged tRNA. DNA polymerase III is a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria, and also exhibits 3' to 5' exonuclease activity.\ 4177 IPR001857 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L19 is one of the proteins from the large ribosomal subunit PUBMED:8262035, PUBMED:1985969. In Escherichia coli, L19 is known to be located at the 30S-50S ribosomal subunit interface PUBMED:339951 and may play a role in the structure and function of the aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins, including L19 from bacteria and the chloroplasts of red algae.

    \

    L19 is a protein of 120 to 130 amino-acid residues.

    \ 535 IPR004172 \ The L27 domain is found in receptor targeting proteins Lin-2 and Lin-7, as well as some protein kinases and human MPP2 protein.\ 3095 IPR006937 \ This family represents a number of plant neutral invertases ().\ 7856 IPR012522 \

    This family includes antimicrobial peptides isolated from the crude venom of the wolf spider Oxyopes kitabensis. These peptides, known as oxyopinins, are the largest linear cationic amphipathic peptides chemically characterised and exhibit disrupting activities towards biological membranes PUBMED:11976325.

    \ 2623 IPR003418 \ Fumarate reductase is a membrane-bound flavoenzyme consisting of four subunits, A-D. A and B comprise the membrane-extrinsic catalytic domain, and C and D link the catalytic centers to the electron-transport chain. This family consists of the 13 kDa hydrophobic subunit D. This component may be required to anchor the catalytic components of the fumarate reductase complex to the cytoplasmic membrane.\ 6667 IPR010666 \

    This presumed zinc-binding domain is found in a variety of DNA-binding proteins. It seems likely that this domain is involved in nucleic acid binding. It is named GRF after three conserved residues in the centre of the alignment of the domain. This zinc finger may be related to .

    \ 6512 IPR009569 \

    This family consists of several hypothetical bacterial proteins of around 180 residues in length. The function of this family is unknown.

    \ 980 IPR007258 \ Vps52 complexes with Vps53 and Vps54 to form a multi-subunit complex involved in regulating membrane trafficking events PUBMED:10637310.\ 1326 IPR006883 \ This is a family of Baculovirus proteins of approximate mass 19 kDa.\ 7982 IPR012561 \

    This is central domain B in proteins of the Ferlin family PUBMED:15112237.

    \ 3060 IPR000975 \ Interleukin-1 is a cytokine with a wide range of biological and physiological effects, including fever, prostaglandin synthesis (e.g. in fibroblasts, muscle and endothelial cells), T-lymphocyte activation, and interleukin 2 production PUBMED:. This family is a member of a superfamily that also contains the heparin-binding growth factors (HBGF), the Kunitz-type soybean trypsin inhibitors (STI) and histactophilin. All have very similar structures, but although the interleukin-1 and HBGF families share some sequence similarity (about 25%), they show none at all to the STIs. \

    The interleukin-1 family consists of 2 main classes, designated alpha (IL1A) and beta (IL1B), as well as the more recently discovered interleukin 1 receptor antagonist (IL1RA). Sequence similarity is high within the IL1A and IL1B subfamilies (about 60-70%) but low between them (less than 30%). IL1As and IL1Bs are synthesised as larger precursors, which are processed to give mature C-terminal fragments. IL1B requires this cleavage to become biologically active, but the IL1A precursor is already active. Both IL1A and IL1B bind to the same IL1-specific receptor on the target cell, which is then internalised to initiate the relevant effects. IL1RA binds to the IL1 receptor, blocking the effects of IL1A and IL1B whilst eliciting no response of its own. From sequence comparisons, IL1RA seems to have arisen by gene duplication before IL1 diverged into IL1A and IL1B PUBMED:1828896. The crystal structures of IL1A and IL1B PUBMED:2602367 have been solved; they share the same 12-stranded beta-sheet structure as both the heparin-binding growth factors and the Kunitz-type soybean trypsin inhibitors PUBMED:1738162. The beta-sheets are arranged in 3 similar lobes around a central axis, 6 strands forming an anti-parallel beta-barrel. Several regions, especially the loop between strands 4 and 5, have been implicated in receptor binding.

    \ 1677 IPR004946 \

    This family of cucumovirus proteins may be long-distance movement proteins.

    \ 7795 IPR012491 \

    Rec10 / Red1 is involved in meiotic recombination and chromosome segregation during homologous chromosome formation. This protein localises to the synaptonemal complex in Saccharomyces cerevisiae and the analogous structures (linear elements) in Schizosaccharomyces pombe PUBMED:15226405. This family is currently only found in fungi.

    \ 5497 IPR008831 \ The family consists of Saccharomyces cerevisiae SOH1 homologues. SOH1 is responsible for suppressing the temperature-sensitive growth phenotype of the HPR1 mutant PUBMED:7982575 and has been found to be a component of the RNA polymerase II transcription complex. SOH1 interacts with factors involved not only in DNA repair but also in transcription. Thus, the SOH1 protein may serve to couple these two processes PUBMED:8849885.\ 431 IPR004888 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    This is a family of eukaryotic enzymes belonging to glycosyl hydrolase family 63 (). They catalyse the specific cleavage of the non-reducing terminal glucose residue from Glc(3)Man(9)GlcNAc(2). Mannosyl oligosaccharide glucosidase is the first enzyme in the N-linked oligosaccharide processing pathway.
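    The trimming step described above can be written as a hydrolysis reaction. The product, Glc(2)Man(9)GlcNAc(2), follows directly from removing one of the three glucose residues; it is given here for illustration and is not named in the text above.

```latex
\mathrm{Glc_3Man_9GlcNAc_2} + \mathrm{H_2O}
  \;\longrightarrow\;
  \mathrm{Glc_2Man_9GlcNAc_2} + \text{D-glucose}
```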

    \ \ 7895 IPR012982 \

    This domain is found in poly(ADP-ribose)-synthetases PUBMED:15112237. The function of this domain is unknown.

    \ 4728 IPR003148 \

    This domain is found in a wide variety of proteins. These proteins include potassium channels, phosphoesterases and various other transporters. This domain binds to NAD.

    \ 2592 IPR004237 \ The ability of bacteria to bind fibronectin is thought to enable the colonisation of wound tissue and blood clots. The fibronectin-binding protein is directly involved in the fibronectin-mediated adherence of the bacteria to epithelial cells PUBMED:1386839. The fibronectin binding repeat is found in bacterial fibronectin binding proteins and serum opacity factor.\ 3782 IPR001761 \

    This family includes the periplasmic binding proteins and the LacI family transcriptional regulators. The periplasmic binding proteins are the primary receptors for chemotaxis and transport of many sugar-based solutes. The LacI family of proteins consists of transcriptional regulators related to the lac repressor. In these proteins, the sugar-binding domain generally binds a sugar, which changes the DNA-binding activity of the repressor domain (lacI) PUBMED:1583688, PUBMED:8638105.

    \ 7751 IPR012467 \

    The sequences featured in this family are found in hypothetical archaeal and bacterial proteins of unknown function. The region in question is approximately 200 amino acids long.

    \ 1180 IPR005611 \

    Amb V is an Ambrosia sp. (ragweed) pollen allergen. Amb t V has been shown to contain a C-terminal helix as the major T cell epitope. Free sulphydryl groups also play a major role in the T cell recognition of cross-reactive T cell epitopes within these related allergens PUBMED:7594515.

    \ 7384 IPR011435 \

    This family of proteins of unknown function contains several conserved glycines and phenylalanines.

    \ 5896 IPR010339 \

    This family consists of the C-terminal region of several eukaryotic and archaeal RuvB-like 1 (Pontin or TIP49a) and RuvB-like 2 (Reptin or TIP49b) proteins. The N-terminal domain contains the AAA ATPase, central region domain. In zebrafish, the liebeskummer (lik) mutation causes development of hyperplastic embryonic hearts. lik encodes Reptin, a component of a DNA-stimulated ATPase complex. Beta-catenin and Pontin, a DNA-stimulated ATPase that is often part of complexes with Reptin, are in the same genetic pathways. The Reptin/Pontin ratio serves to regulate heart growth during development, at least in part via the beta-catenin pathway PUBMED:12464178. TBP-interacting protein 49 (TIP49) was originally identified as a TBP-binding protein, and two related proteins are encoded by individual genes, tip49a and b. Although the function of this gene family has not been elucidated, they are thought to play a critical role in nuclear events because they interact with various kinds of nuclear factors and have DNA helicase activities. TIP49a has been suggested to act as an autoantigen in some patients with autoimmune diseases PUBMED:10902922.

    \ \ 4183 IPR001063 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind 23S rRNA. It belongs to a family of ribosomal proteins which includes: bacterial L22; algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast); cyanelle L22; archaebacterial L22; mammalian L17; plant L17 and yeast YL17.

    \ 2939 IPR002660 \ This family consists of various proteins from the herpesviridae that are similar to herpes simplex virus type I UL6 virion protein. UL6 is essential for cleavage and packaging of the viral genome PUBMED:8955060.\ 4638 IPR003328 \ Zonadhesin is a sperm-specific membrane protein containing multiple cell adhesion molecule-like domains PUBMED:7592795, PUBMED:9452463. Pig zonadhesin binds to the extracellular matrix of the egg in a species-specific manner. The TILa domain is found five times in pig zonadhesin. It is a cysteine-rich domain that occurs alongside the TIL domain () and is likely to be a distant relative of it.\ 81 IPR002191 \

    The fliL operon of Escherichia coli contains seven genes (including fliO, fliP, fliQ and fliR) involved in the biosynthesis and functioning of the flagellar organelle PUBMED:8282695. The fliO, fliP, fliQ and fliR genes encode highly hydrophobic polypeptides. The fliQ gene product, a small integral membrane protein that contains two putative transmembrane (TM) regions, is required for the assembly of the rivet at the earliest stage of flagellar biosynthesis.

    \

    Proteins sharing an evolutionary relationship with FliQ have been found in a range of bacteria: these include Yop translocation protein S from Yersinia sp. PUBMED:8300512; surface antigen-presentation protein SpaQ from Salmonella typhimurium and Shigella flexneri PUBMED:8404849; and probable translocation protein Y4YM from Rhizobium sp. PUBMED:9163424. All of these proteins take part in exporting proteins that lack signal peptides across the membrane. Although the proteins that these exporters move may differ, the exporters are thought to function in similar ways PUBMED:7814323.

    \ 2410 IPR003407 \ This family represents the immunodominant surface antigen of Theileria parasites including equi merozoite antigen-1 (EMA-1) and equi merozoite antigen-2 (EMA-2) PUBMED:9497033. The protein shows variation at a putative glycosylation site, a potential mechanism for host immune response evasion PUBMED:8538686.\ 6292 IPR010506 \

    This domain binds DMAP1, a transcriptional co-repressor.

    \ 419 IPR000971 \ Globins are heme-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well-studied family which is widely distributed in many organisms. The major groups of globins are hemoglobins (Hb) and myoglobins (Mb) from vertebrates, invertebrate globins, leghemoglobins from plants, and flavohemoproteins from bacteria. Hb is the protein responsible for transporting oxygen from the lungs to other tissues, and is a tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or fetal forms of hemoglobin in which the alpha or the beta chains are replaced by a chain with higher oxygen affinity, as for the gamma, delta, epsilon and zeta chains in mammals, for example. Mb is a monomeric protein responsible for oxygen storage in muscles. A wide variety of globins are found in invertebrates PUBMED:3138426. Molluscs generally have one or two muscle globins which are either monomeric or dimeric, while insects, such as the midge Chironomus thummi, have a large set of extracellular globins. Nematodes and annelids have a variety of intracellular and extracellular globins; some of them are multi-domain polypeptides (from two up to nine-domain globins), and some form large, disulphide-bonded aggregates. Leghemoglobins (Lg) from the root nodules of leguminous plants provide oxygen for bacteroids. Flavohemoproteins from bacteria (Escherichia coli hmpA) and fungi consist of two distinct domains, an N-terminal globin domain and a C-terminal FAD-containing reductase domain. In bacteria such as Vitreoscilla, the enzyme-associated globin is a single-domain protein. All these globins seem to have evolved from a common ancestor.\ 3310 IPR007018 \ Regulation of mRNA synthesis requires intermediary proteins that transduce regulatory signals from upstream transcriptional activator proteins to basal transcription machinery at the core promoter. 
Three types of intermediary factors that enable the basal transcription machinery to respond to transcriptional activator proteins bound to regulatory DNA sequences have been identified: (i) TAFIIs, which associate with TATA-binding protein (TBP) to form TFIID; (ii) mediator, which associates with RNA polymerase II to form a holo-polymerase; and (iii) coactivators such as human upstream stimulatory activity (USA), mammalian CBP/P300, yeast ADA complex, and HMG proteins. The interaction of these multiprotein complexes with activators and general transcription factors is essential for transcriptional regulation. This family of proteins represents the transcriptional mediator protein that is required for activation of many RNA polymerase II promoters and that is conserved from yeast to humans PUBMED:9234719.\ 6039 IPR009345 \

    This family consists of several eukaryotic BMP and activin membrane-bound inhibitor (BAMBI) proteins. Members of the transforming growth factor-beta (TGF-beta) superfamily, including TGF-beta, bone morphogenetic proteins (BMPs), activins and nodals, are vital for regulating growth and differentiation. BAMBI is related to TGF-beta-family type I receptors but lacks an intracellular kinase domain. BAMBI is co-expressed with the ventralising morphogen BMP4 during Xenopus embryogenesis and requires BMP signalling for its expression. The protein stably associates with TGF-beta-family receptors and inhibits BMP and activin as well as TGF-beta signalling PUBMED:10519551.

    \ 1433 IPR001259 \

    Calpain inhibitor (calpastatin) is restricted to the metazoa and specifically inhibits calpain (calcium-dependent cysteine protease). Calpastatin belongs to MEROPS inhibitor family I27, clan II. It plays a key role in post-mortem tenderisation of meat and may be involved in muscle protein degradation in living tissue.

    \ \ \

    The calpain system originally comprised three molecules: two Ca2+-dependent proteases, mu-calpain and m-calpain, and a third polypeptide, calpastatin, whose only known function is to inhibit the two calpains. Both mu- and m-calpain are heterodimers containing an identical 28-kDa subunit and an 80-kDa subunit that shares 55-65% sequence homology between the two proteases. The single calpastatin gene can produce eight or more calpastatin polypeptides ranging from 17 to 85 kDa by use of different promoters and alternative splicing events. The physiological significance of these different calpastatins is unclear, although all bind to three different places on the calpain molecule; binding to at least two of the sites is Ca2+ dependent.

    \ \

    How calpain activity is regulated in cells is still unclear, but the calpains ostensibly participate in a variety of cellular processes including remodelling of cytoskeletal/membrane attachments, different signal transduction pathways, and apoptosis. Deregulated calpain activity following loss of Ca2+ homeostasis results in tissue damage in response to events such as myocardial infarcts, stroke, and brain trauma PUBMED:12843408.

    \ \ 6686 IPR010674 \

    This domain represents a conserved region of approximately 60 residues in length within nucleolar GTP-binding protein 1 (NOG1). The NOG1 family includes eukaryotic, bacterial and archaeal proteins. In Saccharomyces cerevisiae, the NOG1 gene has been shown to be essential for cell viability, suggesting that NOG1 may play an important role in nucleolar functions. In particular, NOG1 is believed to be functionally linked to ribosome biogenesis, which occurs in the nucleolus. In eukaryotes, NOG1 mutants were found to disrupt the biogenesis of the 60S ribosomal subunit PUBMED:12788953.

    \

    The DRG and OBG proteins as well as the prokaryotic NOG-like proteins are homologous throughout their length to the amino half of eukaryotic NOG1, which contains the GTP binding motifs (); the N-terminal GTP-binding motif is required for function.

    \ 1652 IPR008213 \ The phycobilisome linker polypeptide determines the state of aggregation and the location of the disc-shaped phycobiliprotein units within the phycobilisome and modulates their spectroscopic properties in order to mediate a directed and optimal energy transfer. The phycobilisome is a hemidiscoidal structure that is composed of two distinct substructures, a core complex (that contains the phycobiliproteins) and a number of rods radiating from the core. The N-terminal domain of the petH gene product from Anabaena sp. PCC 7119 shows homology to the CpcD phycobilisome linker polypeptide PUBMED:8343609.\ 2583 IPR001009 \ Synonym(s): RNA nucleotidyltransferase (RNA-directed) \ \

    The pattern describes the P2 subunit of influenza RNA polymerase (), an enzyme which is composed of three subunits: P1 (or PB1), P2 (or PA), and P3 (or PB2). Together with the P1 subunit, the P2 subunit is required for viral RNA synthesis during replication of the influenza virus genome PUBMED:8709268.

    \ 6384 IPR010549 \

    This entry represents the C-terminal region of the African swine fever virus IAP-like protein p27. This domain is found in conjunction with . It has been suggested that the domain may be encoded by a gene involved in aspects of infection in the arthropod host, ticks of the genus Ornithodoros PUBMED:9143281.

    \ 1724 IPR004133 \ This domain contains 9 conserved cysteines and is extracellular, so the cysteines may form disulphide bridges. This family of proteins has been termed the DAN family PUBMED:9660951 after the first member to be reported. This family includes DAN, Cerberus and Gremlin. The gremlin protein is an antagonist of bone morphogenetic protein signaling. It is postulated that all members of this family antagonize different TGF-beta ligands PUBMED:9660951.\ 7002 IPR010801 \

    This family contains bacterial fibronectin-attachment proteins (FAP). Family members are rich in alanine and proline, are approximately 300 residues long, and seem to be restricted to mycobacteria. These proteins contain a fibronectin-binding motif that allows mycobacteria to bind to fibronectin in the extracellular matrix PUBMED:9988684.

    \ 2289 IPR006994 \ This family includes a number of poorly characterised eukaryotic proteins.\ 766 IPR001374 \

    The R3H motif: a domain that binds single-stranded nucleic acids.

    \ \

    The most prominent feature of the R3H motif is the presence of an invariant arginine residue and a highly conserved histidine residue that are separated by three residues. The motif also displays a conserved pattern of hydrophobic residues, prolines and glycines. The R3H motif is present in proteins from a diverse range of organisms that includes Eubacteria, green plants, fungi and various groups of metazoans. Intriguingly, it has not yet been identified in Archaea and Escherichia coli.
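    Because the motif's anchor is an arginine and a histidine separated by exactly three residues, the core can be scanned for with a regular expression. The Python sketch below is illustrative only: `find_r3h_cores` is a hypothetical helper, and it checks nothing but the R-x(3)-H core, ignoring the conserved hydrophobic, proline and glycine positions, so most hits in real proteins will not be genuine R3H domains.

```python
import re

# R3H core as described above: an arginine, any three residues, then a histidine.
# The zero-width lookahead lets overlapping candidates all be reported.
R3H_CORE = re.compile(r"(?=(R[A-Z]{3}H))")


def find_r3h_cores(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 5-mer) for every R-x(3)-H occurrence.

    Crude pre-filter only: the flanking conserved positions of the motif
    are not checked.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in R3H_CORE.finditer(seq)]
```

    For example, `find_r3h_cores("MKRAGLHRTTSH")` returns `[(2, 'RAGLH'), (7, 'RTTSH')]`. The lookahead is used instead of a plain pattern so that overlapping arrangements such as `RRAAHH` yield both candidates.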

    \ \

    The sequences that contain the R3H domain, many of which are hypothetical proteins predicted from genome sequencing projects, can be grouped into eight families on the basis of similarities outside the R3H region. Three of the families contain ATPase domains either upstream (families II and VII) or downstream of the R3H domain (family VIII). The N-terminal part of members of family VII contains an SF1 helicase domain. The C-terminal part of family VIII contains an SF2 DEAH helicase domain. The ATPase domain in the members of family II is similar to the stage-III sporulation protein AA (S3AA_BACSU), the proteasome ATPase, bacterial transcription-termination factor rho and the mitochondrial F1-ATPase beta subunit (the F5 helicase family). Family VI contains Cys-rich repeats, as well as a RING-type zinc finger upstream of the R3H domain. JAG bacterial proteins (family I) contain a KH domain N-terminal to the R3H domain. The functions of other domains in R3H proteins support the notion that the R3H domain might be involved in interactions with single-stranded nucleic acids PUBMED:9787637.

    \ 5245 IPR008748 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
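    Since the first character of a family identifier encodes the catalytic type, and that type in turn determines the nucleophile, the classification can be read off mechanically. The Python sketch below assumes identifiers of the form letter-plus-number (such as C41, used later in this section); `describe_family` is a hypothetical helper that inspects only the leading letter and does not query or validate against the MEROPS database.

```python
# Catalytic types of peptidase families, keyed by the first character of the
# family identifier (e.g. "C41" -> cysteine), as listed in the text above.
FAMILY_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Nucleophile is an active-site side chain; an acyl intermediate is formed.
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}
# Nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe_family(family_id: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for an identifier such as 'C41'."""
    letter = family_id.strip().upper()[:1]
    if letter not in FAMILY_TYPES:
        # 'P' is a clan-level code, and inhibitor families (e.g. I27) are not
        # peptidase families, so both are rejected here.
        raise ValueError(f"not a peptidase family identifier: {family_id!r}")
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "active-site amino acid side chain"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not determined"
    return FAMILY_TYPES[letter], nucleophile
```

    With this sketch, `describe_family("C41")` gives `("cysteine", "active-site amino acid side chain")`, while a clan-level "P" or an inhibitor family such as "I27" raises `ValueError`.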

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \ This group of cysteine peptidases corresponds to MEROPS peptidase family C41 (clan C-). The type example is cysteine proteinase (hepatitis E virus), which is a papain-like protease that cleaves the viral polyprotein encoded by ORF1 of the hepatitis E virus (Porcine hemagglutinating encephalomyelitis virus) PUBMED:10963340, PUBMED:1518855, PUBMED:8219799.\ 4518 IPR006173 \

    Staphylococcus aureus is a Gram-positive coccus that grows in clusters or pairs and, owing to its resistance to multiple antibiotics, is a major cause of nosocomial infections PUBMED:3782090. Patients who are immunocompromised (e.g., those suffering from third-degree burns or chronic illness) are at risk from deep staphylococcal infections, such as osteomyelitis and pneumonia. Most skin infections are also caused by this bacterium.

    \ \ Many virulence mechanisms are employed by staphylococci to induce pathogenesis: these can include polysaccharide capsules and exotoxins PUBMED:3782090. One of the major virulence exotoxins is toxic shock syndrome toxin (TSST), which is secreted by the organism upon successful invasion. It causes a major inflammatory response in the host via superantigenic properties, and is the causative agent of toxic shock syndrome.

    \

    The structure of the TSST protein was originally determined to 2.5 Å by means of X-ray crystallography PUBMED:8107781. The N- and C-terminal domains both contain regions involved in MHC class II association; the C-terminal domain is also implicated in binding the T-cell receptor. Overall, the structure resembles that of staphylococcal enterotoxin B (SEB), but differs in its N-terminus and in the degree to which a long central helix is covered by surface loops PUBMED:8268150. The region around the carboxyl end of this helix is proposed to govern the superantigenic properties of TSST. An adjacent region along this helix is thought to be critical to the ability of TSST to induce toxic shock syndrome. Most recently, the structures of five mutants of TSST have been determined to 1.95 Å PUBMED:9194182. The mutations are in the central alpha-helix, and allow mapping of portions of TSST involved in superantigenicity and lethality.

    \ 5848 IPR009259 \

    This family consists of several roughex (RUX) proteins specific to Drosophila species. Roughex can influence the intracellular distribution of cyclin A and is therefore defined as a distinct and specialised cell cycle inhibitor for cyclin A-dependent kinase activity PUBMED:11027291. Rux is thought to regulate the metaphase to anaphase transition during development PUBMED:11231149.

    \ 4498 IPR002184 \

    Animals recognise a wide variety of chemicals using their senses of taste and smell. The nematode Caenorhabditis elegans has only 14 types of chemosensory neuron, yet is able to respond to dozens of chemicals because each neuron detects several stimuli. More than 40 highly divergent transmembrane proteins that could contribute to this functional diversity have been described PUBMED:7585938. Most of the candidate receptor genes are in clusters of similar genes; 11 of these appear to be expressed in small subsets of chemosensory neurons. A single type of neuron can potentially express at least 4 different receptor genes PUBMED:7585938. Some of these might encode receptors for water-soluble attractants, repellents and pheromones, which may be divergent members of the G-protein-coupled receptor family PUBMED:7585938.

    \

    Sequences of the srb family of C.elegans receptor-like proteins contain 6-8 hydrophobic, putative transmembrane, regions. These can be distinguished from other 7TM proteins (especially those known to couple G-proteins, see ) by their own characteristic TM signatures.

    \ 1514 IPR005150 \

    Cellulose, an aggregate of unbranched polymers of beta-1,4-linked glucose residues, is the major component of wood and thus paper, and is synthesized by plants, most algae, some bacteria and fungi, and even some animals. The genes that synthesize cellulose in higher plants differ greatly from the well-characterized genes found in Acetobacter and Agrobacterium sp. More correctly designated as "cellulose synthase catalytic subunits", plant cellulose synthase (CesA) proteins are integral membrane proteins, approximately 1,000 amino acids in length. There are a number of highly conserved residues, including several motifs shown to be necessary for processive glycosyltransferase activity PUBMED:8901635.

    \ 2920 IPR007616 \ The proteins in this family have no known function. Cytomegalovirus UL88 is also a member of this family.\ 6712 IPR010686 \

    This family contains a number of bacterial and eukaryotic proteins of unknown function that are approximately 200 residues long. Some family members are annotated as putative lipoproteins.

    \ 8016 IPR012540 \

    This family consists of cuticle protein 7 isoforms that are isolated from the carapace cuticle of a juvenile horseshoe crab, Limulus polyphemus. There are 3 isoforms of cuticle protein 7. The 3 isoforms are N-terminally blocked but could be deblocked by treatment with pyroglutaminase, showing that the N-terminal residue is a pyroglutamine residue PUBMED:12628379.

    \ 4462 IPR001852 \

    Snz1p is a highly conserved protein involved in growth arrest in S. cerevisiae PUBMED:8955308. Sor1 (singlet oxygen resistance) is essential in pyridoxine (vitamin B6) synthesis in C. nicotianae and Aspergillus flavus. Pyridoxine quenches singlet oxygen at a rate comparable to that of vitamins C and E, two of the most highly efficient biological antioxidants, suggesting a previously unknown role for pyridoxine in active oxygen resistance PUBMED:10430950.

    \ 3621 IPR006137 \

    Respiratory-chain NADH dehydrogenase () (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the chloroplast and in cyanobacteria (as an NADH-plastoquinone oxidoreductase).

    \

    Among the 25 to 30 polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 20 kDa (in mammals) PUBMED:1577158, which is a component of the iron-sulphur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulphur cluster. In mammals and in Neurospora crassa, the 20 kDa subunit is nuclear-encoded and synthesized as a precursor with a transit peptide. It is mitochondrially encoded in Paramecium (gene psbG) and chloroplast-encoded in various higher plants (gene ndhK or psbG).

    \ 3712 IPR001328 \

    Peptidyl-tRNA hydrolase () (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-tRNA to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNAs that drop off the ribosome during protein synthesis PUBMED:1833189, PUBMED:8635758. Bacterial PTH has been found to be evolutionarily related to a yeast protein PUBMED:8563640.

    \ 8058 IPR013182 \

    This domain is found in different combinations with cortical patch components EF hand, SH3 and ENTH and is therefore likely to be involved in cytoskeletal processes. This family contains many hypothetical proteins.

    \ 6850 IPR010744 \

    This family consists of several phage CI repressor proteins and related bacterial sequences. The CI repressor is known to function as a transcriptional switch, determining whether transcription is lytic or lysogenic PUBMED:2370665.

    \ 6306 IPR009469 \

    This domain represents the N-terminal region of the coronavirus RNA-directed RNA Polymerase.

    \ 868 IPR001212 \

    Somatomedin B, a serum factor of unknown function, is a small cysteine-rich peptide, derived proteolytically from the N-terminus of the cell-substrate adhesion protein vitronectin PUBMED:2447940. Cys-rich somatomedin B-like domains are found in a number of proteins PUBMED:1710108, including plasma-cell membrane glycoprotein (which has nucleotide pyrophosphatase and alkaline phosphodiesterase I activities) PUBMED:1647027 and placental protein 11 (which appears to possess amidolytic activity).

    \

    The SMB domain of vitronectin has been demonstrated to interact with both the urokinase receptor and the plasminogen activator inhibitor-1 (PAI-1), and the conserved cysteines of the NPP1 somatomedin B-like domain have been shown to mediate homodimerization PUBMED:12533192. As shown in the schematic representation below, the SMB domain contains eight Cys residues, arranged into four disulfide bonds. It has been suggested that the active SMB domain may tolerate considerable disulfide bond heterogeneity or variability, provided that the Cys25-Cys31 disulfide bond is preserved. The three-dimensional structure of the SMB domain is extremely compact and the disulfide bonds are packed in the center of the domain, forming a covalently bonded core PUBMED:15157085. The structure of the SMB domain presents a new protein fold, with the only ordered secondary structure being a single-turn alpha-helix and a single-turn 3(10)-helix PUBMED:12808446.

    \
    \
             xxCxxxxxxCxxxxxxxxxCxCxxxCxxxxxCCxxxxxCxxxxx\
                                ********************\
    \
    'C': conserved cysteine probably involved in a disulfide bond.\
    '*': position of the pattern.\
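    The schematic fixes the spacing between the eight conserved cysteines, so it can be turned into a regular expression. The Python sketch below assumes every gap length in the schematic is exact and encodes nothing but the cysteine spacing; a published pattern for this domain may allow variable gaps or extra conserved residues, and `schematic_to_regex` and `smb_like_spans` are hypothetical helpers.

```python
import re

# Schematic from the text: 'C' = conserved cysteine, 'x' = any residue.
SMB_SCHEMATIC = "xxCxxxxxxCxxxxxxxxxCxCxxxCxxxxxCCxxxxxCxxxxx"


def schematic_to_regex(schematic: str) -> str:
    """Convert an x/C schematic into a regex with the same fixed spacing.

    Flanking 'x' positions are dropped, so the regex starts and ends on a
    cysteine; each run of n 'x' becomes '[A-Z]{n}'.
    """
    parts = []
    run = 0
    for ch in schematic.strip("x"):
        if ch == "x":
            run += 1
            continue
        if run:
            parts.append(f"[A-Z]{{{run}}}")
            run = 0
        parts.append(ch)
    return "".join(parts)


SMB_REGEX = re.compile(schematic_to_regex(SMB_SCHEMATIC))


def smb_like_spans(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of cysteine arrays matching the schematic."""
    return [m.span() for m in SMB_REGEX.finditer(sequence.upper())]
```

    For the schematic above, `schematic_to_regex` yields `C[A-Z]{6}C[A-Z]{9}C[A-Z]{1}C[A-Z]{3}C[A-Z]{5}CC[A-Z]{5}C`, which spans the first to the eighth cysteine.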
    
    \ 2027 IPR002878 \ This domain has no known function and is found in conserved hypothetical archaea\ and bacterial proteins. The domain is approximately 120 amino acids long.\ 7882 IPR012618 \

    This family consists of the tetracycline resistance leader peptide. The presence of 3 inverted repeats, which can form 2 different conformations of mRNA, suggests that the tetracycline resistance (TcR) region is regulated by a translational attenuation mechanism. A Rho-independent transcriptional terminator structure is present immediately after the translational stop codon of the TET protein PUBMED:2996983.

    \ 3986 IPR002192 \ This enzyme catalyses the reversible conversion of ATP, pyruvate and orthophosphate to AMP, pyrophosphate and phosphoenolpyruvate (PEP) PUBMED:8610096. Residues at the N-terminus correspond to the transit peptide which is indispensable for the transport of the precursor protein into chloroplasts in plants PUBMED:2841317. This domain is present at the N-terminus of some PEP-utilizing enzymes.\ 2721 IPR002109 \

    Glutaredoxins PUBMED:3152490, PUBMED:3286320, PUBMED:2668278, also known as thioltransferases, are small proteins of approximately one hundred amino-acid residues. Glutaredoxin functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin, which functions in a similar way, glutaredoxin possesses an active center disulphide bond. It exists in either a reduced or an oxidized form; in the oxidized form, the two cysteine residues are linked in an intramolecular disulphide bond.

    \ \

    Glutaredoxin has been sequenced in a variety of species. On the basis of extensive sequence similarity, it has been proposed PUBMED:1994586 that vaccinia protein O2L is most probably a glutaredoxin. Phage T4 thioredoxin also seems to be evolutionarily related.

    \ \ \ 185 IPR006671 \

    Cyclins are eukaryotic proteins that play an active role in controlling nuclear cell division cycles PUBMED:12910258, and regulate cyclin dependent kinases (CDKs). Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two main groups of cyclins, G1/S cyclins, which are essential for the control of the cell cycle at the G1/S (start) transition, and G2/M cyclins, which are essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase). In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

    \

    Cyclin homologues have been found in various viruses, including herpesvirus saimiri and Kaposi's sarcoma-associated herpesvirus. These viral homologues differ from their cellular counterparts in that the viral proteins have gained new functions and eliminated others to harness the cell and benefit the virus PUBMED:11056549.

    \ \ Cyclins contain two domains of similar all-alpha fold, of which this entry is associated with the N-terminal domain.\ 4389 IPR001901 \

    Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component PUBMED:2202721. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome.

    \

    The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) PUBMED:2202721. The chaperone protein SecB PUBMED:11336818 is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion PUBMED:10418149. SecE, part of the main SecYEG translocase complex, is ~106 residues in length, and spans the inner membrane of the Gram-negative bacterial envelope. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, driven by both the proton motive force and ATP hydrolysis. The latter is mediated by SecA.

    \ \

    In eukaryotes, the evolutionarily related protein sec61-gamma plays a role in protein translocation through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and beta PUBMED:8107851. Both secE and sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane region at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra N-terminal segment of 60 residues that contains two additional transmembrane domains) PUBMED:9393849.

    \ 7506 IPR011659 \ This region appears to be related to the repeat (personal obs: C Yeats). This model is likely to miss copies within a sequence.\ 7967 IPR012555 \

    This family consists of the major transforming proteins (E5) of the bovine papilloma virus (BPV). The equine sarcoid is one of the most common dermatological lesions in equids. It is a benign, locally invasive dermal fibroblastic lesion and studies have shown an association of the lesions with BPV. E5 is a short hydrophobic membrane protein localising to the Golgi apparatus and other intracellular membranes. It binds to and constitutively activates the platelet-derived growth factor beta receptor in transformed cells. This stimulation activates a receptor signalling cascade which results in an intracellular growth stimulatory signal PUBMED:12951274.

    \ 5826 IPR009250 \

    This family consists of several FlgM proteins from Helicobacter pylori. FlgM is an anti-sigma factor which along with FliA plays a central role in the regulation of flagellar biogenesis in H. pylori PUBMED:11985711.

    \ 5784 IPR009234 \

    This region of the APC family of proteins is known as the basic domain. It contains a high proportion of positively charged amino acids and interacts with microtubules PUBMED:9654054.

    \ 8127 IPR013194 \

    This domain is found on transcriptional regulators. It forms interactions with histone deacetylases PUBMED:12773392.

    \ 3098 IPR001666 \ Phosphatidylinositol transfer protein (PITP) is a ubiquitous cytosolic protein, thought to be involved in transport of phospholipids from their site of synthesis in the endoplasmic reticulum and Golgi to other cell membranes PUBMED:7774006. More recently, PITP has been shown to be an essential component of the polyphosphoinositide synthesis machinery and is hence required for proper signalling by epidermal growth factor and f-Met-Leu-Phe, as well as for exocytosis. The role of PITP in polyphosphoinositide synthesis may also explain its involvement in intracellular vesicular traffic PUBMED:7774006.\ \ 5022 IPR001876 \

    Ran is an evolutionarily conserved member of the Ras superfamily that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Ran binding protein 2 (RanBP2) is a 358-kDa nucleoporin located on the cytoplasmic side of the nuclear pore complex which plays a role in nuclear protein import PUBMED:12019565. RanBP2 contains multiple zinc fingers which mediate binding to RanGDP PUBMED:10318915.

    \ 7106 IPR009901 \

    This family consists of several hypothetical Enterobacterial proteins of around 160 residues in length. The function of this family is unknown.

    \ 3323 IPR006124 \ This domain unites alkaline phosphatase, N-acetylgalactosamine-4-sulphatase, and cerebroside sulphatase, enzymes with known three-dimensional structures, with phosphopentomutase, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase, phosphoglycerol transferase, phosphonate monoesterase, streptomycin-6-phosphate phosphatase, alkaline phosphodiesterase/nucleotide pyrophosphatase PC-1, and several closely related sulphatases. This domain is also related to alkaline phosphatase PUBMED:10082381. The most conserved residues are probably involved in metal binding and catalysis.\ 6378 IPR010545 \

    This family consists of several hypothetical archaeal proteins of unknown function.

    \ 2193 IPR007487 \ This is a family of putative secreted proteins of unknown function.\ 6468 IPR010588 \

    This entry represents the C terminus of plant P proteins. The maize P gene is a transcriptional regulator of genes encoding enzymes for flavonoid biosynthesis in the pathway leading to the production of a red phlobaphene pigment PUBMED:8768374, and P proteins are homologous to the DNA-binding domain of myb-like transcription factors PUBMED:8313474. This domain is associated with domain.

    \ 6047 IPR010410 \

    This is a family of plant proteins with undetermined function.

    \ 3309 IPR004229 \

    Methylamine dehydrogenase () (MADH) is a periplasmic quinoprotein found in several methylotrophic bacteria PUBMED:8021187. MADH is induced when these bacteria are grown on methylamine as a carbon source, and catalyses the oxidative deamination of amines to their corresponding aldehydes. The redox cofactor of this enzyme is tryptophan tryptophylquinone (TTQ). Electrons derived from the oxidation of methylamine are passed to an electron acceptor, which is usually the blue-copper protein amicyanin ().

    \ \ \ \

    MADH is a heterotetramer composed of two heavy subunits and two light subunits. The light subunit forms two antiparallel beta sheets, and contains the active site of this enzyme, which is accessible via a hydrophobic channel between the heavy and light subunits. The redox cofactor TTQ is formed from two post-translationally modified tryptophan residues within this subunit PUBMED:9514722.

    \ 1188 IPR004839 \ Aminotransferases share certain mechanistic features with other pyridoxal-phosphate-dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these enzymes can be grouped PUBMED:1990006 into class I and class II. This entry includes proteins from both subfamilies.\ 3162 IPR002210 \ Papillomaviruses are members of the papovavirus superfamily. More than 70 different types of papillomavirus have been discovered in humans, some of which have been shown to cause genital carcinomas and cutaneous warts. The viruses contain a circular dsDNA genome surrounded by an icosahedral capsid. Two proteins are involved in capsid formation: a major (L1) and a minor (L2) protein, in an approximate ratio of 95:5. Experiments have indicated that intermolecular disulphide bonding is responsible for cohesion of the L1 capsid proteins PUBMED:7561785.\ 123 IPR005084 \

    The carbohydrate-binding module family 6 PUBMED: was previously known as cellulose-binding domain family VI (CBD VI). Its cellulose-binding function has been demonstrated in one case, on amorphous cellulose and xylan. Some of these modules also bind beta-1,3-glucan.

    \ 4926 IPR002842 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase () is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \ \ \

    This family also includes the vacuolar ATP synthase E subunit PUBMED:2145285, as well as the archaebacterial ATP synthase E subunit PUBMED:8702544.

    \ 6401 IPR009508 \

    This family consists of several eukaryotic Churchill proteins. This protein contains a novel zinc binding region that mediates FGF signaling during neural development. The slow induction by FGF of a transcription factor (Churchill) in the neural plate in turn induces expression of Sip1 (Smad interacting protein-1), which inhibits mesodermal genes and sensitizes cells to later neural inducing factors PUBMED:14651843.

    \ 5361 IPR008749 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
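The naming convention above, where the first character of a family or clan identifier encodes the catalytic type, can be illustrated with a minimal Python sketch. The helper names `catalytic_type` and `nucleophile` and the mapping tables are hypothetical illustrations built only from the codes listed in the text, not part of any MEROPS software or API. The sketch also assumes that identifiers such as "C42" (family) or "MA" (clan) always begin with the type letter.

```python
# Catalytic-type codes as listed in the text; "P" is used only for clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Serine, threonine and cysteine peptidases use an amino-acid side chain as
# the nucleophile (forming an acyl intermediate); aspartic, glutamic and
# metallopeptidases use an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Hypothetical helper: decode the catalytic type from the first
    character of a MEROPS-style family or clan identifier such as 'C42'."""
    if not identifier:
        raise ValueError("empty identifier")
    code = identifier[0].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code: {code!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(identifier: str) -> str:
    """Hypothetical helper: report the nucleophile class implied by the type code."""
    catalytic_type(identifier)  # validates the identifier first
    code = identifier[0].upper()
    if code in AMINO_ACID_NUCLEOPHILE:
        return "amino-acid side chain"
    if code in WATER_NUCLEOPHILE:
        return "activated water"
    return "unspecified"  # 'U' (unknown) and 'P' (mixed clan)


print(catalytic_type("C42"))  # cysteine
print(nucleophile("C42"))     # amino-acid side chain
```

Keeping the type letters in a dictionary mirrors the one-letter scheme described above. An unrecognised leading character is reported as an error rather than silently mapped to "unknown", since "U" is itself a defined code.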

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases corresponds to MEROPS peptidase family C42. The type example is beet yellows virus-type papain-like endopeptidase (beet yellows virus) PUBMED:11711606.

    \ 1463 IPR001289 \ The CCAAT-binding factor (CBFB/NF-YA) is a mammalian transcription factor that binds to a CCAAT motif in the promoters of a wide variety of genes, including type I collagen and albumin PUBMED:2266139. The factor is a heteromeric complex of A and B subunits, both of which are required for DNA binding PUBMED:1549471. The subunits can interact in the absence of DNA binding, and conserved regions in each subunit are important in mediating this interaction.

    The B subunit contains a region of similarity with the yeast protein HAP2 PUBMED:2000400. For the B subunit, it has been suggested that the N-terminal portion of the conserved region is involved in subunit interaction and the C-terminal region in DNA binding PUBMED:1569083.

    \ 598 IPR005304 \

    Originally isolated from Schizosaccharomyces pombe, Mra1 () was identified as a protein of 359 amino acids with apparent homologues in rice and budding yeast. Mra1 acts as a high-copy-number suppressor when Ras1 is mutated, recovering the mating deficiency caused by the decrease in Ras1 activity. Mutational analysis in yeast suggests that Mra1 is essential for cell growth, promotes mating PUBMED:9133664, and lies downstream of Ras1 in a unique signal transduction pathway.

    \ 190 IPR002602 \ This domain, of unknown function, is found in several Caenorhabditis elegans proteins. It contains 12 conserved cysteines that probably form six disulphide bridges, and it is found in association with Ig and fibronectin type III domains.\ 6567 IPR010620 \

    This family is related to and is likely to also form a beta-propeller. SBBP stands for Seven Bladed Beta Propeller.

    \ 5113 IPR007950 \

    This family consists of several bacterial fertility inhibition (FINO) proteins. The conjugative transfer of F-like plasmids is repressed by FinO, an RNA-binding protein. FinO blocks the translation of TraJ, a positive activator of transcription of genes required for conjugation. FinO binds a TraJ antisense RNA, FinP, thereby protecting it from degradation, and catalyzes FinP-TraJ mRNA hybridization. Interactions between these two RNAs are predicted to block the TraJ ribosomal binding site. FinO is largely helical, binds to its highest-affinity binding site within FinP as a monomer, and contains two distinct RNA-binding regions PUBMED:10876242.

    \ 3790 IPR003683 \

    This family consists of cytochrome b6/f complex subunit 5 (PetG). The cytochrome bf complex, found in green plants, eukaryotic algae and cyanobacteria, connects photosystem I to photosystem II in the electron transport chain, functioning as a plastoquinol:plastocyanin/cytochrome c6 oxidoreductase PUBMED:7493961. The purified complex from the unicellular alga Chlamydomonas reinhardtii contains seven subunits: four high molecular weight subunits (cytochrome f, Rieske iron-sulphur protein, cytochrome b6, and subunit IV) and three approximately miniproteins (PetG, PetL, and PetX) PUBMED:7493968. Stoichiometry measurements are consistent with every subunit being present as two copies per b6/f dimer. The absence of PetG affects either the assembly or the stability of the cytochrome bf complex in Chlamydomonas reinhardtii PUBMED:7493961.

    \ 6744 IPR009688 \

    This entry represents the C terminus (approx. 120 residues) of a number of eukaryotic proteins of unknown function.

    \ 6283 IPR010934 \

    This entry represents the C-terminal region of several NADH dehydrogenase subunit 5 proteins and is found in conjunction with and .

    \ 7289 IPR010900 \

    This family consists of several bacterial nicotinamide adenine dinucleotide glycohydrolase (NGA) proteins which appear to be specific to Streptococcus pyogenes. NAD glycohydrolase (NADase) is a potential virulence factor. Streptococcal NADase may contribute to virulence by its ability to cleave beta-NAD at the ribose-nicotinamide bond, depleting intracellular NAD pools and producing the potent vasoactive compound nicotinamide PUBMED:10979908.

    \ 5943 IPR010360 \

    This is a family of bacterial sequences with undetermined function.

    \ 7080 IPR009881 \

    This family contains a number of hypothetical bacterial proteins of unknown function, approximately 100 residues in length.

    \ 4744 IPR004506 \ tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase () catalyses the addition of 5-methylaminomethyl-2-thiouridylate to tRNAs using S-adenosyl-L-methionine as a substrate and releasing S-adenosyl-L-homocysteine. The enzyme is cytoplasmic and is involved in tRNA processing.\ 4453 IPR006021 \ Staphylococcus aureus nuclease (SNase) homologues, previously thought to be restricted to bacteria and archaea, are also found in eukaryotes. Staphylococcal nuclease has a multidomain organization PUBMED:9003410. The human cellular coactivator p100 contains four repeats, each of which is an SNase homologue. These repeats are unlikely to possess SNase-like activities, as each lacks the equivalent SNase catalytic residues, yet they may mediate p100's single-stranded DNA-binding function PUBMED:9041650.\ A variety of proteins, including many that are still uncharacterised, belong to this group.\ 3967 IPR005058 \

    P4A is one of the most abundant structural proteins in the Vaccinia virion.

    \ 6547 IPR009594 \

    This entry represents the N terminus of bacterial AraC-type transcriptional regulators. In Escherichia coli, these regulate the L-arabinose operon by sensing the presence of arabinose and, when the sugar is present, transmitting this information from the arabinose-binding domains to the protein's DNA-binding domains PUBMED:12683999. This family might represent the N-terminal arm of the protein, which binds to the C-terminal DNA-binding domains to hold them in a state where the protein prefers to loop and remain non-activating PUBMED:9600837. This domain is associated with the domain.

    \ 1004 IPR007275 \ This family of poorly characterised proteins contains YT521-B, a putative splicing factor from rat. YT521-B is a tyrosine-phosphorylated nuclear protein that interacts with the nuclear transcriptosomal component scaffold attachment factor B and with the 68 kDa Src substrate associated during mitosis, Sam68. In vivo splicing assays demonstrated that YT521-B modulates alternative splice site selection in a concentration-dependent manner PUBMED:10564280.\ 2989 IPR000981 \ Oxytocin and vasopressin are structurally and functionally related nine-residue neurohypophysial peptide hormones. Oxytocin mediates contraction of the smooth muscle of the uterus and mammary gland, while vasopressin has antidiuretic action on the kidney and mediates vasoconstriction of the peripheral vessels PUBMED:3147712. In common with most active peptides, both hormones are synthesised as larger protein precursors that are enzymatically converted to their mature forms. Members of this family are found in birds, fish, reptiles and amphibians (mesotocin, isotocin, valitocin, glumitocin, aspargtocin, vasotocin, seritocin, asvatocin, phasvatocin), in worms (annetocin), octopi (cephalotocin), locust (locupressin or neuropeptide F1/F2) and in molluscs (conopressins G and S) PUBMED:7591488.\ 1339 IPR004122 \

    Barrier-to-autointegration factor (BAF) is an essential protein that is highly conserved in metazoan evolution, and which may act as a DNA-bridging protein PUBMED:12902403. BAF binds directly to double-stranded DNA, to transcription activators, and to inner nuclear membrane proteins, including lamin A filament proteins that anchor nuclear-pore complexes in place, and nuclear LEM-domain proteins that bind to lamin filaments and chromatin. New findings suggest that BAF has structural roles in nuclear assembly and chromatin organization, represses gene expression, and might interlink chromatin structure, nuclear architecture and gene regulation in metazoans PUBMED:15130582.

    \

    BAF can be exploited by retroviruses to act as a host component of pre-integration complexes, which promote the integration of the retroviral DNA into the host chromosome by preventing autointegration of retroviral DNA PUBMED:14645565. BAF might contribute to the assembly or activity of retroviral pre-integration complexes through direct binding to the retroviral proteins p55 Gag and matrix, as well as to DNA.

    \ \ 6553 IPR010611 \

    This short presumed domain contains three conserved aspartate residues, hence the name 3D. This conservation is suggestive of a cation-binding function. The central aspartate is found in a DTG motif that is suggestive of a peptidase-like active site.

    \ 1279 IPR000568 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase () is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \ \

    The CF(0) A subunit is a highly hydrophobic protein and has been predicted to contain 8 transmembrane (TM) regions PUBMED:2162353. It is a key component of the proton channel, possibly playing a direct role in the translocation of protons across the membrane. Sequence comparison of A subunits reveals that the overall level of similarity is quite low, but the degree of conservation is relatively high in putative TM domain 5, which contains a conserved arginine. Mutagenesis experiments have shown that this arginine is required for proton translocation: its replacement results in loss of ATPase activity PUBMED:2536742.

    \ 1083 IPR002655 \ This is a group of acyl-CoA oxidases (). Acyl-CoA oxidase converts acyl-CoA into trans-2-enoyl-CoA PUBMED:9525937.\ 3064 IPR003502 \ Over a hundred cytokines have now been identified, including several putative new members of the IL-1 family. The IL-1 family consists of two main classes, designated alpha (IL1A) and beta (IL1B), as well as the more recently discovered IL-1 receptor antagonist (IL1RA). Sequence similarity is high within the IL1A and IL1B subfamilies (about 60-70%) but low between them (less than 30%). IL1As and IL1Bs are synthesised as larger precursors, which are processed to give mature carboxy-terminal fragments. IL1B requires this cleavage to become biologically active, but the IL1A precursor is already active. Both IL1A and IL1B bind to the same IL1-specific receptor on the target cell, which is then internalised to initiate the relevant effects (which appear to be similar or identical). The N-terminal approximately 115 amino acids form a propeptide that is cleaved off to release the active interleukin-1. This signature is for the propeptide.\ 2006 IPR005585 \

    The proteins in this family are around 140-170 residues in length. The proteins contain many conserved residues, with the most conserved motifs found in the central and C-terminal region. The function of these proteins is unknown.

    \ 3966 IPR004900 \ The Poxvirus P35 protein is an immunodominant envelope protein.\ 3211 IPR004943 \ This family includes Lepidopteran low molecular weight (30 kDa) lipoprotein, which is an extracellular protein of unknown function. Biosynthesis occurs in a stage-dependent fashion in the fat body. \ 6572 IPR010623 \

    This domain represents a conserved region situated towards the C-terminal end of several hypothetical bacterial proteins of unknown function. A few members resemble the ImcF protein, which has been proposed PUBMED:12127983 to be involved in Vibrio cholerae cell surface reorganisation that results in increased adherence to epithelial cell lines and increased conjugation frequency.

    \ 2657 IPR005849 \

    Galactose-1-phosphate uridyl transferase (GalT) catalyses the conversion of UDP-glucose and alpha-D-galactose 1-phosphate to alpha-D-glucose 1-phosphate and UDP-galactose during galactose metabolism. The enzyme is present in prokaryotes and eukaryotes. Defects in GalT in humans cause galactosemia, an inherited disorder of galactose metabolism that leads to jaundice, cataracts and mental retardation.

    \

    This entry describes the C-terminal domain of galactose-1-phosphate uridyl transferase. SCOP reports a fold duplication between the C-terminal and N-terminal domains; both are involved in Zn and Fe binding.

    \ 472 IPR001343 \ Gram-negative bacteria produce a number of proteins that are secreted into the growth medium by a mechanism that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, seem to share two properties: they bind calcium and they contain a multiple tandem repeat of a nonapeptide PUBMED:2303029. The nonapeptide is found in a group of bacterial exported proteins that includes haemolysin, cyclolysin, leukotoxin and metallopeptidases belonging to MEROPS peptidase family M10 (clan MA(M)), subfamily 10B (serralysin).

    It has been suggested that the internally repeated domain of haemolysin may be involved in calcium-mediated binding to erythrocytes. Such a domain has been shown to bind calcium ions in a parallel beta-roll structure PUBMED:8253063.

    \ 2258 IPR006837 \ This is a family of uncharacterised proteins that includes YibQ.\ 2026 IPR007136 \ This repeat is found as four tandem repeats in a family of bacterial membrane proteins. Each repeat contains two transmembrane regions and a conserved tryptophan.\ 6665 IPR009650 \

    This family consists of several Fijivirus specific P9-2 proteins from Rice black streaked dwarf virus (RBSDV) and Fiji disease virus. The function of this family is unknown.

    \ 5416 IPR008903 \ This family consists of several Clostridium botulinum hemagglutinin (HA) subcomponents. C. botulinum type D strain 4947 produces two different sizes of progenitor toxins (M and L) as intact forms without proteolytic processing. The M toxin is composed of neurotoxin (NT) and nontoxic-nonhemagglutinin (NTNHA), whereas the L toxin is composed of the M toxin and hemagglutinin (HA) subcomponents (HA-70, HA-17, and HA-33) PUBMED:8631890.\ 7203 IPR009965 \

    This family consists of several Tenuivirus PV2 proteins. PV2 is thought to be a membrane associated protein PUBMED:8883361. The function of this family is unclear.

    \ 3400 IPR007221 \ MreC (murein formation C) is involved in the rod shape determination in Escherichia coli, and more generally in cell shape determination of bacteria whether or not they are rod-shaped.\ 3830 IPR006428 \

    This group of sequences represents one of several distantly related families of phage portal protein. This protein forms a hole, or portal, that enables DNA passage during packaging and ejection. It also forms the junction between the phage head (capsid) and the tail proteins. It functions as a dodecamer of a single polypeptide with an average molecular weight of 40-90 kDa.

    \ 6677 IPR010672 \

    This entry represents the N terminus of a number of hypothetical archaeal proteins of unknown function.

    \ 2816 IPR001990 \

    Granins (chromogranins or secretogranins) PUBMED:2053134 are a family of acidic proteins present in the secretory granules of a wide variety of endocrine and neuro-endocrine cells. The exact functions of these proteins are not yet known, but they seem to be the precursors of biologically active peptides and/or may act as helper proteins in the packaging of peptide hormones and neuropeptides. Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins.

    \

    Chromogranins and secretogranins together share a C-terminal motif, whereas chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine residues involved in a disulphide bond.

    \ 6938 IPR009799 \

    This family consists of several bacterial sequences which are related to the EthD protein of Rhodococcus ruber (). Rhodococcus ruber (formerly Gordonia terrae) IFP 2001 is one of a few bacterial strains able to degrade ethyl tert-butyl ether (ETBE), which is a major pollutant from gasoline. This strain was found to undergo a spontaneous 14.3-kbp chromosomal deletion, which results in the loss of the ability to degrade ETBE. Sequence analysis of the region corresponding to the deletion revealed the presence of a gene cluster, ethABCD, encoding a ferredoxin reductase (EthA), a cytochrome P-450 (EthB), a ferredoxin (EthC), and a 10-kDa protein of unknown function (EthD), respectively. Upstream of ethABCD lies ethR, which codes for a putative positive transcriptional regulator of the AraC/XylS family. Transformation of the ETBE-negative mutant by a plasmid carrying the ethRABCD genes restored the ability to degrade ETBE. Complementation was abolished if the plasmid carried ethRABC only, demonstrating that EthD is essential for the ETBE degradation system PUBMED:11673424.

    \ 5016 IPR007872 \

    This probable zinc-binding motif contains four cysteines that may chelate zinc. This domain is often found associated with the N-terminal domain of the heat shock protein DnaJ. It is also found in DPH3 and DPH4, which are involved in the biosynthesis of diphthamide.

    \ \

    Diphthamide is a unique post-translationally modified histidine residue found only in translation elongation factor 2 (eEF-2). It is conserved from archaea to humans and serves as the target for diphtheria toxin and Pseudomonas exotoxin A. These two toxins catalyse the transfer of ADP-ribose to diphthamide on eEF-2, thus inactivating eEF-2, halting cellular protein synthesis, and causing cell death PUBMED:11595641. The biosynthesis of diphthamide is dependent on at least five proteins, DPH1 to DPH5, and a still unidentified amidating enzyme. DPH3 and DPH4 share a conserved region, which encodes a putative zinc finger, the DPH-type or CSL-type (after the conserved motif of the final cysteine) zinc finger PUBMED:14527407, PUBMED:15485916. The function of this motif is unknown.

    \ \ \ \ 5041 IPR007364 \ This is a family of uncharacterised proteins.\ 1030 IPR007691 \ UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase () catalyses an early step in lipid A biosynthesis PUBMED:8366125. Members of this family also contain a hexapeptide repeat (). This entry represents the non-repeating region of LpxD proteins.\ 7298 IPR010905 \

    Unsaturated glucuronyl hydrolase catalyses the hydrolytic release of unsaturated glucuronic acids from oligosaccharides produced by the reactions of polysaccharide lyases PUBMED:12777820.

    \ 6134 IPR010446 \

    This family consists of several beta-1,4-N-acetylgalactosaminyltransferase proteins from Campylobacter jejuni PUBMED:10660542.

    \ 5877 IPR009271 \

    The name LSPD derives from the conserved residues in the middle of this repeat. These repeats are found in coagulation factor V and occur in the B domain, which is cleaved prior to activation of the protein. It has been suggested that domain B brings domains A and C together for activation PUBMED:11229814.

    \ 5600 IPR008711 \ The ninR region of Bacteriophage lambda contains two recombination genes, orf (ninB) and rap (ninG), that have roles when the RecF and RecBCD recombination pathways of Escherichia coli, respectively, operate on Bacteriophage lambda PUBMED:11952832.\ 1452 IPR006820 \ This domain occurs at the N-terminal of proteins belonging to the caudal-related homeobox protein family. This region is thought to mediate transcription activation. The level of activation caused by mouse Cdx2 () is affected by phosphorylation at serine 60 via the mitogen-activated protein kinase pathway PUBMED:11729123. Caudal family proteins are involved in the transcriptional regulation of multiple genes expressed in the intestinal epithelium, and are important in differentiation and maintenance of the intestinal epithelial lining. Caudal proteins always have a homeobox DNA binding domain ().\ 186 IPR003158 \ Photosynthesis in purple bacteria is dependent on light-induced electron transfer in the reaction centre (RC), coupled to the uptake of protons from the cytoplasm. The RC contains a cytochrome c subunit which re-reduces the oxidized electron donor.\ 1563 IPR005551 \

    Citrate lyase phosphoribosyl-dephospho-CoA transferase catalyzes the formation of 2-(5''-triphosphoribosyl)-3'-dephosphocoenzyme-A, the precursor of the prosthetic group of the holo-acyl carrier protein (gamma chain) of citrate lyase, from ATP and dephospho-CoA. \

    \ 7181 IPR009952 \

    This family contains uroplakin II, which is approximately 180 residues long and seems to be restricted to mammals. Uroplakin II is an integral membrane protein, and is one of the components of the apical plaques of mammalian urothelium formed by the asymmetric unit membrane - this is believed to play a role in strengthening the urothelial apical surface to prevent the cells from rupturing during bladder distension PUBMED:8175808.

    \ 5823 IPR010300 \

    Cysteine dioxygenase type I () converts cysteine to cysteinesulphinic acid, catalysing the rate-limiting step in sulphate production.

    \ 3671 IPR000176 \

    This family contains viral proteins that are bifunctional, acting both as an mRNA cap-specific RNA 2'-O-methyltransferase, which methylates the ribose 2'-OH group of the first transcribed nucleotide to produce a 2'-O-methylpurine cap, and as a poly(A) polymerase processivity factor, which binds to poly(A) but has no catalytic activity. The structure of this protein is known PUBMED:8612277.

    \ 4779 IPR003197 \

    The cytochrome bd type terminal oxidases catalyse quinol-dependent, Na+-independent oxygen uptake PUBMED:8626304. Members of this family are integral membrane proteins and contain a protohaem IX centre, B558.

    \

    Cytochrome bd may play an important role in microaerobic nitrogen fixation in the enteric bacterium Klebsiella pneumoniae, where it is expressed under all conditions that permit diazotrophy PUBMED:9274021.

    \ \

    The 14 kDa (or VI) subunit of the complex is not directly involved in electron transfer, but has a role in assembly of the complex PUBMED:7770525.

    \ 7401 IPR011420 \

    The AreA nitrogen regulatory proteins (which are GATA type transcription factors) share a highly conserved N terminus and have at the C terminus.

    \ 1679 IPR003350 \ A class of homeodomain proteins, also called ONECUT. \ The CUT domain is a DNA-binding motif that can bind independently or in cooperation with the homeodomain (), which is often found downstream of the CUT domain. Proteins display two modes of DNA binding, which hinge on the homeodomain and on the linker that separates it from the CUT domain, and two modes of transcriptional stimulation, which hinge on the homeodomain PUBMED:9593691.\ 3528 IPR007196 \ The Ccr4-Not complex is a global regulator of transcription that affects genes positively and negatively and is thought to regulate transcription factor TFIID PUBMED:11696541.\ 3246 IPR007534 \ LuxE is an acyl-protein synthetase found in bioluminescent bacteria. LuxE catalyses the formation of an acyl-protein thiolester from a fatty acid and a protein. This is the second step in the bioluminescent fatty acid reduction system, which converts tetradecanoic acid to the aldehyde substrate of the luciferase-catalysed bioluminescence reaction PUBMED:8941351. A conserved cysteine found at position 364 in Photobacterium phosphoreum LuxE () is thought to be acylated during the transfer of the acyl group from the synthetase subunit to the reductase. The C-terminal region of the synthetase is thought to act as a flexible arm to transfer acyl groups between the sites of activation and reduction PUBMED:2023262. This family also includes Vibrio cholerae RBFN protein (), which is involved in the biosynthesis of the O-antigen component 3-deoxy-L-glycero-tetronic acid.\ 5778 IPR010274 \

    This family consists of several Orthopoxvirus A36R proteins. The A36R protein is predicted to be a type Ib membrane protein PUBMED:11017799.

    \ 487 IPR000861 \

    The REM repeat, which is also called the rho effector or HR1 domain, was first described as a region of homology repeated three times in the N-terminal non-catalytic part of protein kinase PRK1 (PKN) PUBMED:7851406. The first two of these repeats were later shown to bind the small G protein rho PUBMED:8647255, PUBMED:9446575, which is known to activate PKN in its GTP-bound form. Similar rho-binding domains also occur in a number of other protein kinases and in the rho-binding proteins rhophilin and rhotekin. The structure of the N-terminal REM repeat complexed with RhoA has been determined by X-ray crystallography PUBMED:10619026; it forms an antiparallel coiled-coil fold termed an ACC finger.

    \ \ 6405 IPR008325 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 1821 IPR002624 \ This family consists of various deoxynucleoside kinases, including cytidine (), guanosine (), adenosine () and thymidine kinase (), which also phosphorylates deoxyuridine and deoxycytidine. These enzymes catalyse the production of a deoxynucleoside 5'-monophosphate from a deoxynucleoside, using ATP and yielding ADP in the process.\ 735 IPR002880 \ This family includes the N-terminal region of pyruvate ferredoxin oxidoreductase, corresponding to the first two structural domains. This region is involved in intersubunit contacts PUBMED:10048931. Pyruvate oxidoreductase (POR) catalyses the final step in the fermentation of carbohydrates in anaerobic microorganisms PUBMED:8550425. This involves the oxidative decarboxylation of pyruvate with the participation of thiamine, followed by the transfer of an acetyl moiety to coenzyme A for the synthesis of acetyl-CoA PUBMED:8550425. The family also includes pyruvate flavodoxin oxidoreductase, encoded by the nifJ gene in cyanobacteria, which is required for growth on molecular nitrogen when iron is limited PUBMED:8415612.\ 1624 IPR006888 \ Cor1 is a component of the chromosome core in meiotic prophase chromosomes PUBMED:7876343. Xlr is a lymphoid cell-specific protein PUBMED:7821804. Xmr is abundantly transcribed in testis in a tissue-specific and developmentally regulated manner. The protein is located in the nuclei of spermatocytes early in the prophase of the first meiotic division, and later becomes concentrated in the XY nuclear subregion, where it is particularly associated with the axes of the sex chromosomes PUBMED:8306953.\ 2771 IPR005197 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    This is a family of alpha-1,3-glucanases belonging to glycoside hydrolase family 71.

    \ 2151 IPR007452 \ This family contains several proteins of uncharacterised function.\ 6844 IPR009745 \

    This entry represents a 24-residue repeated motif from the Trypanosoma brucei cysteine-rich, acidic integral membrane protein precursor (CRAM). CRAM is concentrated in the flagellar pocket, an invagination of the trypanosome cell surface where endocytosis has been documented PUBMED:1697030.

    \ 705 IPR006448 \

    This group of sequences describes a distinct family of putative terminase small subunits from phage (and integrated prophage). Members tend to be encoded by the gene adjacent to the phage terminase large subunit gene.

    \ 2025 IPR007132 \ This repeat was found as seven tandem copies in one protein. It is predicted to be composed of beta-strands, so it is likely to form a beta-propeller structure. It is found in association with BNR repeats, which also form a beta-propeller.\ 5438 IPR008497 \ This family consists of several bacterial proteins of unknown function.\ 1554 IPR007291 \ Circoviruses are small, circular, single-stranded DNA viruses. This family includes the VP1 protein from chicken anemia virus, which is the viral coat protein.\ 928 IPR002792 \

    Saccharomyces cerevisiae contains an endoexonuclease, NucR, that has been implicated in both recombination and repair. The N-terminal half of the protein shows approximately 50% homology with human rho genes, while the C-terminal region, which is related to the Escherichia coli recC protein, apparently encodes the endoexonuclease activity PUBMED:1408836.

    \ 677 IPR002989 \ These repeats are found in many mycobacterial proteins. The repeats are most common in the PPE family of proteins, where they are found in the MPTR subfamily. The function of these repeats is unknown. The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to A(D/N)LXX repeats PUBMED:9655353; however, it is not clear whether these two families are structurally related.\ 4464 IPR005754 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
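
    The catalytic-type convention above can be expressed as a lookup keyed on the first character of a MEROPS identifier. This is a minimal sketch, not MEROPS software: the helper names are hypothetical, and it assumes identifiers such as "C60" (a family) or "ME" (a clan) whose first character is the type code.

```python
# Catalytic-type codes, keyed by the first character of a MEROPS
# family or clan identifier (e.g. "C60", "M16", "CA", "ME").
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan with families of more than one type)",
}

# Types whose nucleophile is part of an amino acid and which form an
# acyl intermediate (so they can also act as transferases).
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(merops_id: str) -> str:
    """Return the catalytic type encoded by a MEROPS identifier (hypothetical helper)."""
    code = merops_id.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {merops_id!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(merops_id: str) -> str:
    """Describe the nucleophile implied by the catalytic type (hypothetical helper)."""
    code = merops_id.strip()[:1].upper()
    if code in AMINO_ACID_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if code in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not determined"


for ident in ("C60", "M16", "CA"):
    print(ident, catalytic_type(ident), "|", nucleophile(ident))
```

    Families of unknown type ("U") and mixed clans ("P") fall through to "not determined", since the text assigns no nucleophile to them.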

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C60 (clan C-) and includes the members of both subfamilies of sortases. The Staphylococcus aureus sortase is a transpeptidase that attaches surface proteins to the cell wall; it cleaves between the Thr and Gly of the LPXTG motif and catalyses the formation of an amide bond between the carboxyl group of threonine and the amino group of the cell-wall peptidoglycan PUBMED:10427003. Sortase homologues are found in almost all Gram-positive bacteria, in a single Gram-negative bacterium (Shewanella putrefaciens) and in an archaeon (Methanobacterium thermoautotrophicum), where LPXTG-mediated cell wall decoration has not been reported PUBMED:11401711, PUBMED:14572546.
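
    The recognition step can be illustrated by scanning a sequence for LPXTG and splitting it at the Thr-Gly bond that sortase cleaves. This is a simplified sketch under stated assumptions: real sortase substrates also carry a C-terminal hydrophobic segment and charged tail, which are ignored here; the helper names are hypothetical; and the sequence below is made up for illustration.

```python
import re

# L-P-X-T-G, where X is any residue. The motif cannot overlap itself,
# so a plain finditer scan finds every occurrence.
LPXTG = re.compile(r"LP[A-Z]TG")


def sortase_cut_sites(seq: str) -> list[int]:
    """Return the index of the Gly of each LPXTG motif, i.e. the position of
    the Thr-Gly bond: seq[:i] ends in ...LPXT and seq[i:] starts with G."""
    return [m.start() + 4 for m in LPXTG.finditer(seq.upper())]


def cleave_first(seq: str) -> tuple[str, str]:
    """Split at the first LPXTG Thr-Gly bond. The N-terminal part keeps the
    Thr, whose carboxyl group sortase links to peptidoglycan."""
    sites = sortase_cut_sites(seq)
    if not sites:
        raise ValueError("no LPXTG motif found")
    i = sites[0]
    return seq[:i], seq[i:]


# Hypothetical sequence, not a real sortase substrate.
surface_part, c_terminal_remnant = cleave_first("MKAAELPETGGEEN")
print(surface_part, c_terminal_remnant)
```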

    \ \

    Surface proteins not only promote interaction between the invading pathogen and animal tissues, but also provide ingenious strategies for bacterial escape from the host's immune response. In the case of S. aureus protein A, immunoglobulins are captured on the microbial surface and camouflage bacteria during the invasion of host tissues. S. aureus mutants lacking the srtA gene fail to anchor and display some surface proteins and are impaired in the ability to cause animal infections. Sortase acts on surface proteins that are initiated into the secretion (Sec) pathway and have their signal peptide removed by signal peptidase. The S. aureus genome encodes two sets of sortase and secretion genes. It is conceivable that S. aureus has evolved more than one pathway for the transport of 20 surface proteins to the cell wall envelope.

    \ \ 199 IPR000488 \

    The death domain (DD) is a homotypic protein interaction module composed of a bundle of six alpha-helices. DD is related in sequence and structure to the death effector domain (DED) and the caspase recruitment domain (CARD), which work in similar pathways and show similar interaction properties PUBMED:11504623. DDs bind one another, forming oligomers. Mammals have numerous and diverse DD-containing proteins PUBMED:7482697. Within these proteins, the DD can be found in combination with other domains, including CARDs, DEDs, ankyrin repeats, caspase-like folds, kinase domains, leucine zippers, leucine-rich repeats (LRR), TIR domains and ZU5 domains PUBMED:15226512.

    \

    Some DD-containing proteins are involved in the regulation of apoptosis and inflammation through their activation of caspases and NF-kappaB, which typically involves interactions with TNF (tumour necrosis factor) cytokine receptors PUBMED:14585074, PUBMED:14601641. In humans, eight of the more than 30 known TNF receptors contain a DD in their cytoplasmic tails; several of these TNF receptors use caspase activation as a signalling mechanism. The DD mediates self-association of these receptors, thereby transmitting the signal to the downstream events that lead to apoptosis. Other DD-containing proteins, such as ankyrin, MyD88 and pelle, are probably not directly involved in cell death signalling. DD-containing proteins also have links to innate immunity, communicating with Toll family receptors through bipartite adapter proteins such as MyD88 PUBMED:12691620.

    \ \ 1738 IPR006718 \

    The defective chorion-1 gene (dec-1) in Drosophila encodes follicle cell proteins necessary for proper eggshell assembly. Multiple products of the dec-1 gene are formed by alternative RNA splicing and proteolytic processing PUBMED:1699826. Cleavage products include S80 (80 kDa) which is incorporated into the eggshell, and further proteolysis of S80 gives S60 (60 kDa).

    This repeat is usually found in 12 copies in the central region of the protein. Its function is unknown. Length polymorphisms of Dec-1 have been observed in wild-type strains, and are caused by changes in the numbers of the first five repeats PUBMED:8350348.

    \ 7316 IPR011100 \

    Alpha-glucuronidases, components of an ensemble of enzymes central to the recycling of photosynthetic biomass, remove alpha-1,2-linked 4-O-methylglucuronic acid from xylans. This family represents the central catalytic domain of alpha-glucuronidase PUBMED:11937059.

    \ 7284 IPR010898 \

    This family contains component I of bacterial heptaprenyl diphosphate synthase (approximately 170 residues long). This is one of the two dissociable subunits that form the enzyme, both of which are required to catalyse the biosynthesis of the side chain of menaquinone-7 PUBMED:9748348.

    \ 4521 IPR001217 \

    The STAT (Signal Transducers and Activators of Transcription) protein family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors; hence they act as signal transducers in the cytoplasm and as transcription activators in the nucleus PUBMED:12039028. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine; the phosphotyrosine is recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signaling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation.

    \

    Signaling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation of specific receptor tyrosine residues, which then serve as docking sites for STATs and other signaling molecules. Once recruited to the receptor, STATs are also phosphorylated by JAKs on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerize, translocate to the nucleus and bind to members of the GAS (gamma-activated site) family of enhancers.

    \

    The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids. The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggests that this family arose from a single primordial gene. STATs share structurally and functionally conserved domains, including: an N-terminal domain that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain that is implicated in protein-protein interactions; a DNA-binding domain with an immunoglobulin-like fold similar to that of the p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain that acts as a phosphorylation-dependent switch to control receptor recognition and DNA binding; and a C-terminal transactivation domain PUBMED:9630226. The crystal structure of the N terminus of Stat4 reveals a dimer whose interface is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerization promotes cooperative binding to tandem GAS elements and interaction with the transcriptional coactivator CBP/p300.

    \ 1997 IPR005512 \ This domain is found in a family of plant hypothetical proteins.\ 6742 IPR010700 \

    This family contains a number of hypothetical proteins of unknown function, approximately 350 residues long, of bacterial and viral origin.

    \ 2785 IPR000741 \

    Fructose-bisphosphate aldolase PUBMED:2199259, PUBMED:1412694 is a glycolytic enzyme that catalyses the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms: class I enzymes PUBMED:3355497 are found in animals, do not require a metal ion, and are characterised by the formation of a Schiff base intermediate between a highly conserved active-site lysine and a substrate carbonyl group; class II enzymes are produced in bacteria and fungi and require an active-site divalent metal ion. This entry represents the class I enzymes.

    \

    In vertebrates, three forms of this enzyme are found: aldolase A is expressed in muscle, aldolase B in liver, kidney, stomach and intestine, and aldolase C in brain, heart and ovary. The different isozymes have different catalytic functions: aldolases A and C are mainly involved in glycolysis, while aldolase B is involved in both glycolysis and gluconeogenesis. Defects in aldolase B result in hereditary fructose intolerance.

    \ \ 1713 IPR003143 \ Cytochrome cd1 (nitrite reductase) catalyses the conversion of nitrite to nitric oxide in the nitrogen cycle. This family represents the d1 heme binding domain of cytochrome cd1, in which His/Tyr side chains ligate the d1 heme iron of the active site in the oxidized state PUBMED:9311786.\ 7850 IPR012549 \

    Some members of this family are putative bacterial membrane proteins. This domain is found immediately N-terminal to the sulphatase domain in many sulphatases.

    \ 6772 IPR010709 \

    This family consists of several archaeal proteins of around 150 residues in length. The function of this family is unknown.

    \ 7046 IPR009864 \

    This family consists of several rhoptry-associated protein 1 (RAP-1) sequences which appear to be specific to Plasmodium falciparum PUBMED:11254620.

    \ 1337 IPR006853 \ This is a family of Baculovirus p26 proteins.\ 5023 IPR000197 \

    CBP and the related protein p300 are large nuclear molecules that interact with transcriptional activators and repressors. They belong to a class of proteins with histone acetyltransferase activity, which suggests a role in chromatin remodeling. They are involved in biological functions as diverse as cell growth, differentiation and apoptosis PUBMED:8848831. CBP/p300 proteins contain, in their N- and C-terminal parts, the so-called transcriptional adaptor putative zinc finger (TAZ finger). Each TAZ domain is approximately 100 amino acids long and shows an internal triplication of a Cys-x4-Cys-x8-His-x3-Cys motif, although some of the repeats are imperfect. The binding sites for YY1, E1A and TFIIB in CBP and p300 have been mapped to the region that contains the TAZ finger, suggesting a possible protein-binding function for this motif. Proteins containing this domain have been found to bind phosphorylated CREB.
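
    Motifs written in this hyphenated notation translate directly into regular expressions. The sketch below rests on stated assumptions: the converter handles only three-letter residue codes and "x"/"xN" wildcards, the helper names are hypothetical, and the test sequence is synthetic rather than a real TAZ domain. It finds only exact copies of the repeat, so the imperfect repeats mentioned above would be missed. A zero-width lookahead reports overlapping matches.

```python
import re

# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def motif_to_regex(motif: str) -> str:
    """Convert e.g. 'Cys-x4-Cys-x8-His-x3-Cys' into 'C.{4}C.{8}H.{3}C'."""
    parts = []
    for token in motif.split("-"):
        low = token.lower()
        if low == "x":
            parts.append(".")  # any single residue
        elif low.startswith("x") and low[1:].isdigit():
            parts.append(".{%d}" % int(low[1:]))  # run of exactly N residues
        else:
            parts.append(THREE_TO_ONE[token.capitalize()])
    return "".join(parts)


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = re.compile("(?=%s)" % motif_to_regex(motif))
    return [m.start() for m in pattern.finditer(seq.upper())]


TAZ_REPEAT = "Cys-x4-Cys-x8-His-x3-Cys"
# Synthetic sequence containing one exact copy of the repeat, starting at index 2.
synthetic = "MM" + "C" + "A" * 4 + "C" + "A" * 8 + "H" + "A" * 3 + "C" + "GG"
print(motif_to_regex(TAZ_REPEAT), find_motif(synthetic, TAZ_REPEAT))
```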

    \ \ 5343 IPR008891 \ This family is common to positive-strand ssRNA viruses, and its members are commonly described as nucleic acid binding proteins (NABP).\ 3656 IPR003354 \ This domain represents a conserved region in papovavirus small and middle T-antigens. It is found as the N-terminal domain in the small T-antigen and is centrally located in the middle T-antigen.\ 5207 IPR008041 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C23 (clan CA). The type example is the Carlavirus (apple stem pitting virus) endopeptidase, which is thought to play a role in the post-translational cleavage of the high-molecular-weight primary translation products of the virus.

    \ 3870 IPR007445 \ PilO proteins are involved in the assembly of pilin. However, the precise function of this family of proteins is not known.\ 2211 IPR007563 \

    This is a family of uncharacterised prokaryotic proteins. Multiple predicted transmembrane regions suggest that the protein is membrane associated.

    \ 1462 IPR003417 \ Core binding factor (CBF) is a heterodimeric transcription factor essential for genetic regulation of hematopoiesis and osteogenesis. The alpha subunit binds to the core site, 5'-PYGPYGGT-3', of a number of enhancers and promoters, including those of murine leukemia virus, the polyomavirus enhancer and T-cell receptor enhancers. The beta subunit enhances the DNA-binding ability of the alpha subunit in vitro, and has been shown to have a structure related to the OB fold PUBMED:10404215. Also included in this family are the Drosophila melanogaster brother and big brother proteins, which regulate the DNA-binding properties of Runt.\ 750 IPR002165 \ This is a cysteine-rich repeat found in several different extracellular receptors. The function of the repeat is unknown. Three copies of the repeat are found in plexin PUBMED:7605632. Two copies are found in the mahogany protein. A related Caenorhabditis elegans protein contains four copies of the repeat, while the Met receptor contains a single copy.\ 1590 IPR006822 \ This family represents the epsilon subunit of the coatomer complex, which is involved in the regulation of intracellular protein trafficking between the endoplasmic reticulum and the Golgi complex PUBMED:10469566.\ 5384 IPR008438 \

    This family consists of several mammalian calcineurin-binding proteins. Calcineurin is a Ca2+/calmodulin-dependent serine/threonine phosphatase that has been implicated in the transduction of signals that control the hypertrophy of cardiac muscle and slow-fibre gene expression in striated muscle. A novel family of striated muscle-specific calcineurin-interacting proteins, called calsarcins or myozenins, has been identified; these proteins interact and co-localize with the Z-disc protein alpha-actinin, thereby coupling muscle activity to calcineurin activation PUBMED:11114196.

    \ \

    Because calcineurin responds to sustained, low-amplitude calcium signals, calsarcins may serve to localize calcineurin in the vicinity of unique intracellular pools, where it can interact with specific upstream activators or downstream substrates. Calsarcins may therefore play an important role in modulating the function and substrate specificity of calcineurin in striated muscle cells.

    \ \

    Three calsarcin isoforms have been identified in human, rat and mouse.\

    \

    \ \

    Calsarcin-1 (CALS-1) is expressed throughout development in all striated muscle tissues; however, its expression is localized to slow-twitch fibers. Calsarcin-2 (CALS-2), which shares approximately 30% identity with CALS-1, is a globular protein with a central glycine-rich domain flanked by alpha-helical regions. CALS-2 is expressed transiently in the heart during early embryogenesis and later becomes restricted to skeletal muscle, with weaker signals in adult prostate, placenta and pancreas. In contrast to CALS-1, CALS-2 expression is restricted to fast-twitch skeletal fibers. Calsarcin-3 is expressed specifically in skeletal muscle and is enriched in fast-twitch fibers. Like calsarcin-1 and calsarcin-2, calsarcin-3 interacts with calcineurin and with the Z-disc proteins alpha-actinin, gamma-filamin and telethonin PUBMED:11842093.

    \ \ \ 3670 IPR004102 \

    Poly(ADP-ribose) polymerase catalyses the covalent attachment of ADP-ribose units from NAD+ to itself and to a limited number of other DNA-binding proteins, which decreases their affinity for DNA. Poly(ADP-ribose) polymerase is a regulatory component induced by DNA damage. The regulatory domain of the polymerase is almost always associated with the C-terminal catalytic domain.

    \

    This domain consists of a duplication of two helix-loop-helix structural repeats PUBMED:9521710.

    \ 1934 IPR002572 \ This domain is found in 1 to 3 copies in archaebacterial\ proteins. The function of the domain is unknown. This\ family appears to be expanded in Archaeoglobus fulgidus.\ 7645 IPR012502 \

    This family contains sequences expressed in eukaryotic organisms that bear high similarity to the WAPL conserved region of the Drosophila melanogaster wings apart-like protein, which is involved in the regulation of heterochromatin structure PUBMED:10747063. hWAPL, the human homolog, has been found to play a role in cervical carcinogenesis and is thought to have functions similar to those of the Drosophila wapl protein PUBMED:15150110. Malfunction of the hWAPL pathway is thought to activate an apoptotic pathway that consequently leads to cell death PUBMED:15150110.

    \ 6814 IPR009726 \

    This entry represents a short conserved region (approximately 40 residues) within TraN bacterial mating pair stabilisation proteins. TraN is thought to be required for the formation of stable mating aggregates during F-directed conjugation. This region contains five conserved cysteine residues.

    \ 430 IPR001382 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 47 comprises enzymes with only one known activity: alpha-mannosidase.

    \

    Alpha-mannosidase is involved in the maturation of Asn-linked oligosaccharides PUBMED:8144580. The enzyme hydrolyses terminal 1,2-linked alpha-D-mannose residues in the oligomannose oligosaccharide Man9GlcNAc2 in a calcium-dependent manner. The mannose residues are trimmed away to produce first Man8GlcNAc2, then a Man5GlcNAc2 structure.

    \ 2825 IPR004215 \ Prokaryotic glutathione synthetase (glutathione synthase) catalyses the conversion of gamma-L-glutamyl-L-cysteine and glycine to orthophosphate and glutathione in the presence of ATP. This is the second step in glutathione biosynthesis. The enzyme is inhibited by 7,8-dihydrofolate, methotrexate and trimethoprim. This domain is the N-terminus of the enzyme.\ 6018 IPR009336 \

    This family consists of several bacterial and phage proteins of unknown function.

    \ 6598 IPR010639 \

    This family consists of several Nucleopolyhedrovirus actin-rearrangement-inducing factor (Arif-1) proteins. In response to infection with Autographa californica multicapsid nuclear polyhedrosis virus (AcMNPV), a sequential rearrangement of the actin cytoskeleton occurs, which is induced by Arif-1 PUBMED:9311884. Arif-1 is tyrosine phosphorylated and is located at the plasma membrane as a component of the actin rearrangement-inducing complex PUBMED:11264366.

    \ 4936 IPR004909 \ This family includes the Sugar beet yellow virus heat shock protein 90 homologue and other hypothetical proteins.\ 5303 IPR008444 \ This family consists of Chlamydia virulence proteins which are thought to be required for growth within mammalian cells PUBMED:2845228.\ 6082 IPR009364 \

    This is a family of uncharacterised proteins found in Proteobacteria.

    \ 2084 IPR007339 \ This family of uncharacterised proteins appears to be restricted to proteobacteria.\ 7780 IPR012505 \

    The members of this family are all hypothetical bacterial proteins of unknown function and are similar to the YbbR protein expressed by Bacillus subtilis. One member is annotated as an uncharacterized secreted protein, whereas another is described as a hypothetical protein in the 5' region of the def gene of Thermus thermophilus, which encodes a deformylase PUBMED:7961514; no further information was found in either case. This region is found repeated up to four times in many members of this family.

    \ 7349 IPR011108 \

    The metallo-beta-lactamase fold contains five sequence motifs. The first four motifs are found in and are common to all metallo-beta-lactamases. The fifth motif appears to be specific to function. This entry represents the fifth motif from metallo-beta-lactamases involved in RNA metabolism PUBMED:12177301.

    \ 5965 IPR010370 \

    The function of this family is unclear, but some members are described as transcription elongation factor A (SII)-like proteins.

    \ 5422 IPR008619 \ This highly divergent repeat occurs in a number of proteins implicated in cell aggregation PUBMED:2539596. The Pfam alignment probably contains three such repeats (personal obs: C Yeats). These are likely to have a beta-helical structure.\ 1631 IPR002574 \ This family consists of various coronavirus matrix proteins, which are transmembrane glycoproteins. The coronavirus M protein, also known as the E1 glycoprotein, is implicated in virus assembly PUBMED:6325918. The E1 viral membrane protein is required for formation of the viral envelope and is transported via the Golgi complex PUBMED:2305554.\ 3364 IPR004983 \

    The Mlp (multicopy lipoprotein) family of lipoproteins is found in Borrelia species PUBMED:9488385. This family was previously known as the 2.9 lipoprotein genes PUBMED:10531261. The surface-expressed products of these genes may represent new candidate vaccinogens for Lyme disease PUBMED:9488385. Members of this family are generally located downstream of four ORFs, called A, B, C and D, that are involved in hemolytic activity.

    \ 3965 IPR006732 \ The function of these viral proteins is not known.\ 7880 IPR012605 \

    This family consists of the RepA1 leader peptides. The frequency of replication of IncFII plasmid NR1 during the cell division cycle is regulated by controlling the synthesis of the plasmid-specific replication initiation protein, RepA1. When RepA1 is synthesised, it binds to the plasmid replication origin (ori) and effects the assembly of a replication complex composed of host proteins that mediate replication of the plasmid PUBMED:1447133. The tap gene encodes a 24-amino-acid protein, and translation of tap is required for translation of repA.

    \ 688 IPR007863 \

    These metallopeptidases belong to MEROPS peptidase family M16 (clan ME). The family includes proteins classified as non-peptidase homologues, which either have been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.

    \ \

    The peptidases in this group of sequences include:

    \ \ \

    These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal section. This region includes a conserved histidine followed, after two intervening residues, by a glutamate and another histidine. In pitrilysin, it has been shown PUBMED:7990931 that this H-x-x-E-H motif is involved in enzymatic activity: the two histidines bind zinc and the glutamate is necessary for catalytic activity. The mitochondrial processing peptidase consists of two structurally related domains. One is the active peptidase, whereas the other, the C-terminal region, is inactive. The two domains hold the substrate like a clamp PUBMED:11470436.
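
    The H-x-x-E-H motif can be located with a short regular expression. This is a sketch only: it reports exact matches in a made-up sequence, does not model the wider sequence context an active M16 site needs, and the function name is hypothetical. A lookahead is used so that overlapping motifs are all reported.

```python
import re

# His, any two residues, Glu, His; the lookahead allows overlapping hits.
HXXEH = re.compile(r"(?=H..EH)")


def hxxeh_sites(seq: str) -> list[tuple[int, int, int]]:
    """For each H-x-x-E-H match, return the 0-based positions of
    (first His, Glu, second His): the two zinc-binding histidines and
    the glutamate needed for catalysis, as described for pitrilysin."""
    return [(m.start(), m.start() + 3, m.start() + 4)
            for m in HXXEH.finditer(seq.upper())]


# Hypothetical sequence for illustration.
print(hxxeh_sites("AAHLGEHAA"))
```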

    \ \ 4910 IPR005124 \ This family represents the eukaryotic vacuolar (H+)-ATPase (V-ATPase) G subunit. V-ATPases generate an acidic environment in several intracellular compartments. Correspondingly, they are found as membrane-attached proteins in several organelles. They are also found in the plasma membranes of some specialized cells. V-ATPases consist of peripheral (V1) and membrane integral (V0) heteromultimeric complexes. The G subunit is part of the V1 complex, but is also thought to be strongly attached to the V0 complex. It may be involved in the coupling of ATP degradation to H+ translocation.\ 3915 IPR000092 \ A variety of isoprenoid compounds are synthesized by various organisms. For example, in eukaryotes the isoprenoid biosynthetic pathway is responsible for the synthesis of a variety of end products, including cholesterol, dolichol and ubiquinone (coenzyme Q). In bacteria this pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones and sugar carrier lipids. Among the enzymes that participate in this pathway are a number of polyprenyl synthetases, which catalyze a 1'-4 condensation between 5-carbon isoprene units. It has been shown PUBMED:2198286, PUBMED:2089044, PUBMED:1826006, PUBMED:1303794, PUBMED:1495965 that all these enzymes share some regions of sequence similarity. Two of these regions are rich in aspartic acid residues and could be involved in the catalytic mechanism and/or the binding of the substrates.\ 7706 IPR012851 \

    The Coat F proteins, which contribute to the Bacillales spore coat, appear to consist of one or two copies of this domain. It occurs multiple times in the genomes it is found in.

    \ 3847 IPR007312 \ This family includes both bacterial phospholipase C enzymes and eukaryotic acid phosphatases.\ 2672 IPR003191 \ Transcription of the antiviral guanylate-binding protein (GBP) is induced by interferon-gamma during macrophage induction. This family contains GBP1 and GBP2, both GTPases capable of binding GTP, GDP and GMP.\ 2872 IPR004195 \ Bacteriophage lambda head decoration protein D stabilizes the head shell after the rearrangement of the GP7 subunits of the head shell lattice that accompanies expansion of the head. There are approximately 420 copies of protein D per mature phage.\ 6663 IPR010664 \

    This family consists of several hypothetical bacterial proteins of around 190 residues in length. The function of this family is unknown.

    \ 3322 IPR005184 \

    A domain found in proteins of unknown function PUBMED:12625841, some of which are described as heat shock proteins (HslJ). In Helicobacter pylori the protein is secreted and implicated in motility. In Leishmania spp. it is described as an essential protein, overexpression of which in L. amazonensis increases virulence PUBMED:10403759. A pair of cysteine residues show correlated conservation, suggesting that they form a disulphide bond.

    \ 751 IPR006568 \

    PSP is a proline-rich domain of unknown function found in spliceosome associated proteins.

    \ \ 6175 IPR009409 \

    This family consists of several short hypothetical archaeal proteins of unknown function.

    \ 1040 IPR003385 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \ The enzymes in this entry belong to glycoside hydrolase family 77 and transfer a segment of a (1,4)-alpha-D-glucan to a new 4-position in an acceptor, which may be glucose or (1,4)-alpha-D-glucan PUBMED:7678257. They belong to the disproportionating family of enzymes.\ 6008 IPR010388 \

    This group, typified by Salmonella typhimurium CbiK, contains anaerobic cobalt chelatases that act in the anaerobic cobalamin biosynthesis pathway PUBMED:9150215, PUBMED:11215515.

    \

    Cobalamin (vitamin B12) can be complexed with metal via ATP-dependent reactions (aerobic pathway) (e.g., in Pseudomonas denitrificans) or via ATP-independent reactions (anaerobic pathway) (e.g., in Salmonella typhimurium) PUBMED:8905078, PUBMED:11469861. The corresponding cobalt chelatases are not homologous. This group belongs to the class of ATP-independent, single-subunit chelatases that also includes distantly related protoporphyrin IX (PPIX) ferrochelatase (HemH) (Class II chelatases) PUBMED:12686546. The structure of Salmonella typhimurium CbiK shows that it has a remarkably similar topology to Bacillus subtilis ferrochelatase despite only weak sequence conservation PUBMED:10451360. Both enzymes contain a histidine residue identified as the metal ion ligand, but CbiK contains a second histidine in place of the glutamic acid residue identified as a general base in PPIX ferrochelatase PUBMED:10451360. Site-directed mutagenesis has confirmed a role for this histidine and a nearby glutamic acid in cobalt binding, modulating metal ion specificity as well as catalytic efficiency PUBMED:10451360.

    \

    It should be noted that CysG and Met8p, which are multifunctional proteins associated with siroheme biosynthesis, include chelatase activity and can therefore be considered as the third class of chelatases PUBMED:12686546. As with the class II chelatases, they do not require ATP for activity. However, they are not structurally similar to HemH or CbiK, and it is likely that they have arisen by the acquisition of a chelatase function within a dehydrogenase catalytic framework PUBMED:11980703, PUBMED:12686546.

    \ \ 2812 IPR007048 \ These proteins from bacteriophage T4 and related phage may be a structural component of the outer wedge of the baseplate that has acidic lysozyme activity PUBMED:3186452.\ 306 IPR006460 \

    This family of hypothetical plant proteins is defined by a region of about 170 amino acids found at the C terminus. These proteins have highly divergent N-terminal regions rich in low complexity sequence. PSI-BLAST reveals no clear similarity to any characterized protein. At least 12 distinct members are found in Arabidopsis thaliana.

    \ 7792 IPR012483 \

    Mba1 is part of the mitochondrial protein export machinery and represents the first component of a novel Oxa1-independent insertion pathway into the mitochondrial inner membrane PUBMED:8690083, PUBMED:11381092.

    \ 6053 IPR009352 \

    The camelpox virus A46 homologue is described as a toll-like receptor inhibitor.

    \ 7770 IPR012881 \

    The members of this family are hypothetical eukaryotic proteins of unknown function. The region in question is approximately 100 amino acid residues long.

    \ 236 IPR003793 \ This is an uncharacterised domain found in proteins of unknown function.\ 5660 IPR008564 \ This family consists of a number of conserved eukaryotic proteins of unknown function.\ 7826 IPR012938 \

    Proteins containing this domain are thought to be glucose/sorbosone dehydrogenases. The best characterised of these proteins is soluble glucose dehydrogenase from Acinetobacter calcoaceticus, which oxidises glucose to gluconolactone. The enzyme is a calcium-dependent homodimer which uses PQQ as a cofactor PUBMED:10508152.

    \ 6431 IPR010570 \

    This family consists of several hypothetical proteins of unknown function and seems to be specific to Bacteroides species.

    \ 2982 IPR007868 \ Homing endonucleases are encoded by mobile DNA elements that are found inserted within host genes in all domains of life. The crystal structure of the homing nuclease PI-Sce PUBMED:12219083 revealed two domains: an endonucleolytic center resembling the C-terminal domain of Drosophila melanogaster Hedgehog protein, and a second domain containing the protein-splicing active site. This domain corresponds to the protein-splicing domain.\ 1362 IPR003920 \

    An operon encoding 4 proteins required for bacterial cellulose biosynthesis (bcs) in Acetobacter xylinum has been isolated via genetic complementation with strains lacking cellulose synthase activity PUBMED:2146681. Nucleotide sequence analysis showed the cellulose synthase operon to consist of 4 genes, designated bcsA, bcsB, bcsC and bcsD, all of which are required for maximal bacterial cellulose synthesis in A. xylinum.

    \

    The calculated molecular mass of the protein encoded by bcsB is 85.3 kDa PUBMED:2146681. The bcsB gene encodes the catalytic subunit of cellulose synthase. The protein polymerises uridine 5'-diphosphate glucose to cellulose: UDP-glucose + (1,4-beta-D-glucosyl)(N) = UDP + (1,4-beta-D-glucosyl)(N+1). The enzyme is specifically activated by the nucleotide cyclic diguanylic acid. Sequence analysis suggests that BcsB contains several transmembrane (TM) domains, and shares a high degree of similarity with Escherichia coli YhjN.

    \ 1767 IPR006796 \ Dickkopf proteins are a class of Wnt antagonists. They possess two conserved cysteine-rich regions. This family represents the N-terminal conserved region PUBMED:12167704. The C-terminal region has been found to share significant sequence similarity with the colipase fold PUBMED:9663378.\ 5514 IPR008539 \

    This family consists of a group of proteins that may be involved in lipopolysaccharide modification PUBMED:10482503. Members are functionally uncharacterised.

    \ 1877 IPR003368 \

    This repeat is found in several Chlamydia polymorphic membrane proteins (Pmps). Chlamydia pneumoniae is an obligate intracellular bacterium and a common human pathogen causing infection of the upper and lower respiratory tract. A feature common to the Pmps is the tetrapeptide motif GGA(I/V/L), which is repeated several times in the N-terminal part. The C-terminal half is characterised by conserved tryptophans and a C-terminal phenylalanine. A signal peptide leader sequence is predicted in Chlamydophila pneumoniae Pmps, which indicates an outer membrane localisation PUBMED:10587946. Pmp10 and Pmp11 contain a signal peptidase II cleavage site, suggesting lipid modification. The C. pneumoniae pmp genes represent 17.5% of the chlamydia-specific coding capacity, and they are all transcribed during chlamydial growth, but the function of the Pmps remains unknown PUBMED:11583841.
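    Because the GGA(I/V/L) tetrapeptide is a fixed pattern with one variable position, it can be located in a protein sequence with a simple regular-expression scan. The sketch below is a hypothetical illustration, not a published Pmp annotation method: the function name `find_pmp_tetrapeptides` and the toy sequence are invented for this example, and the pattern encodes only the tetrapeptide (it ignores the C-terminal tryptophan and phenylalanine features, and a hit alone does not imply that a protein is a Pmp).

```python
import re

# GGA followed by I, V or L, i.e. the GGA(I/V/L) tetrapeptide.
GGA_IVL = re.compile(r"GGA[IVL]")


def find_pmp_tetrapeptides(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, tetrapeptide) for each GGA(I/V/L) in a protein sequence.

    Hypothetical helper for illustration only. Two GGA[IVL] matches can never
    overlap, so a plain finditer() scan reports every occurrence.
    """
    return [(m.start() + 1, m.group()) for m in GGA_IVL.finditer(seq.upper())]


if __name__ == "__main__":
    # Invented toy sequence: two motifs plus a GGAT decoy that must not match.
    toy = "MKGGAIXXGGATQQGGAL"
    print(find_pmp_tetrapeptides(toy))
```

    Run on the toy sequence, this prints `[(3, 'GGAI'), (15, 'GGAL')]`; the `GGAT` decoy at position 9 is skipped because T is not one of the permitted residues.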

    \ 1091 IPR003703 \ This is a family of acyl-CoA thioesterases of unknown function that hydrolyse a range of acyl-CoA thioesters PUBMED:1645722.\ 5935 IPR009296 \

    This family consists of several short hypothetical bacterial proteins of unknown function.

    \ 5841 IPR010310 \

    This family consists of several short bacterial proteins of unknown function.

    \ 5523 IPR008543 \ This family consists of chloroplast-encoded Ycf2, which is around 2000 residues in length. The function of Ycf2 is unknown, though it may be an ATPase. Its retention in reduced chloroplast genomes of non-photosynthetic plants, e.g. Epifagus virginiana, and transformation experiments in tobacco indicate that it has an essential function which is probably not related to photosynthesis PUBMED:10792825.\ 2891 IPR007796 \

    This family consists of the BLLF1 viral late glycoprotein, also termed gp350/220. It is the most abundantly expressed glycoprotein in the viral envelope of the Herpesviruses and is the major antigen responsible for stimulating the production of neutralising antibodies in vivo. The binding of the viral major glycoprotein BLLF1 to the CD21 cellular receptor is thought to play an essential role during infection of B lymphocytes by the Epstein-Barr virus PUBMED:11024143.

    \ 3458 IPR007358 \

    The Escherichia coli nucleoid contains DNA in a condensed but functional form. Analysis of proteins released from isolated spermidine nucleoids after treatment with DNase I reveals significant amounts of two proteins not previously detected in wild-type Escherichia coli. Partial amino-terminal sequencing has identified them as the products of rdgC and yejK. These proteins are strongly conserved in Gram-negative bacteria, suggesting that they have important cellular roles PUBMED:10368163.

    \ 237 IPR004119 \ This family includes proteins of unknown function. All known members of this group are proteins from Drosophila and Caenorhabditis elegans.\ 7049 IPR010819 \

    N-acylglucosamine 2-epimerase (AGE) reversibly converts N-acyl-D-glucosamine to N-acyl-D-mannosamine, the latter ultimately being converted to cytidine 5'-monophospho-N-acetylneuraminic acid, which is used as a precursor for the synthesis of connective tissues, blood cells and cellular macromolecules. AGE is a renin-binding protein (RnBP), which might act as a cellular renin inhibitor. AGE functions as a homodimer in which each monomer has an alpha(6)/alpha(6)-barrel structure commonly found in glucoamylases and cellulases PUBMED:11061972. This family contains a number of eukaryotic and bacterial AGE enzymes.

    \ 3616 IPR001516 \

    This domain represents an N-terminal extension of . It is found in NADH-ubiquinone oxidoreductase chain 5 and eubacterial chain L, subunits of the NADH:ubiquinone oxidoreductase (complex I), which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is associated with proton translocation across the membrane PUBMED:1470679.

    \ 3382 IPR004435 \

    Molybdenum plays an important role in the biogeochemistry of nitrogen and sulfur and is an essential trace element associated with a large group of redox active enzymes in eukaryotes, eubacteria and Archaea. With the exception of nitrogenase, molybdenum is present as a ubiquitous basic structure composed of a molybdenum atom coordinated to one or two molecules of a tricyclic pyranopterin forming the molybdenum cofactor PUBMED:1587808.

    \ \

    The MobB domain is similar to that of the urease accessory protein UreG and the hydrogenase accessory protein HypB, both GTP hydrolases involved in loading nickel into the metallocenters of their respective target enzymes. It is involved in the final step of molybdenum-cofactor biosynthesis. While its precise function has not been identified, it is thought to be involved in the transfer of a guanine dinucleotide moiety to molybdopterin, as it shows GTP-binding and weak GTPase activity PUBMED:9219527. The MobB protein from E. coli, which consists of this domain, is a homodimer PUBMED:14646116. Each molecule is composed of two distinct regions: an outer region made up of 6 beta-strands and three alpha helices, and an inner region made up of a two-strand beta hairpin followed by an alpha helix. These regions require interaction with the second monomer to fold properly. The two monomers are intertwined and form an extensive 16-stranded beta-sheet. While the active site could not be positively identified, the presence of highly conserved residues suggests that the substrate binding site lies in the central solvent channel.

    \ 5307 IPR008796 \ This family contains several Photosystem I reaction centre subunit N (PSI-N) proteins. The protein has no known function although it is localised in the thylakoid lumen PUBMED:10230065. PSI-N is a small extrinsic subunit at the lumen side and is very likely involved in the docking of plastocyanin.\ 3521 IPR004030 \

    Nitric oxide synthase (NOS) enzymes produce nitric oxide (NO) by catalyzing a five-electron oxidation of a guanidino nitrogen of L-arginine (L-Arg). Oxidation of L-Arg to L-citrulline occurs via two successive monooxygenation reactions producing N(omega)-hydroxy-L-arginine as an intermediate. 2 mol of O(2) and 1.5 mol of NADPH are consumed per mole of NO formed PUBMED:8782597.

    \

    Arginine-derived NO synthesis has been identified in mammals, fish, birds, invertebrates, plants, and bacteria PUBMED:8782597. Best studied are mammals, where three distinct genes encode NOS isozymes: neuronal (nNOS or NOS-1), cytokine-inducible (iNOS or NOS-2) and endothelial (eNOS or NOS-3) PUBMED:7510950. iNOS and nNOS are soluble and found predominantly in the cytosol, while eNOS is membrane associated. The enzymes exist as homodimers, each monomer consisting of two major domains: an N-terminal oxygenase domain, which belongs to the class of heme-thiolate proteins, and a C-terminal reductase domain, which is homologous to NADPH:P450 reductase. The interdomain linker between the oxygenase and reductase domains contains a calmodulin (CaM)-binding sequence. NOSs are the only enzymes known to require five bound cofactors simultaneously; animal NOS isozymes are catalytically self-sufficient. The electron flow in the NO synthase reaction is: NADPH --> FAD --> FMN --> heme --> O(2).

    \

    eNOS localisation to endothelial membranes is mediated by cotranslational N-terminal myristoylation and post-translational palmitoylation PUBMED:9199168. The subcellular localisation of nNOS in skeletal muscle is mediated by anchoring of nNOS to dystrophin. nNOS contains an additional N-terminal domain, the PDZ domain PUBMED:7535955. Some bacteria, like Bacillus halodurans, Bacillus subtilis or Deinococcus radiodurans, contain homologs of the NOS oxygenase domain. The pattern is directed against the N-terminal heme binding site.

    \ 6199 IPR009422 \

    This family consists of several mammalian Gemin6 proteins. The exact function of Gemin6 is unknown, but it has been found to form part of the SMN complex. The SMN complex plays a key role in the biogenesis of spliceosomal small nuclear ribonucleoproteins (snRNPs) and other ribonucleoprotein particles PUBMED:11748230.

    \ 6473 IPR009542 \

    This family consists of several microsomal signal peptidase 12 kDa subunit proteins. Translocation of polypeptide chains across the endoplasmic reticulum (ER) membrane is triggered by signal sequences. Subsequently, signal recognition particle interacts with its membrane receptor and the ribosome-bound nascent chain is targeted to the ER where it is transferred into a protein-conducting channel. At some point, a second signal sequence recognition event takes place in the membrane and translocation of the nascent chain through the membrane occurs. The signal sequence of most secretory and membrane proteins is cleaved off at this stage. Cleavage occurs by the signal peptidase complex (SPC) as soon as the lumenal domain of the translocating polypeptide is large enough to expose its cleavage site to the enzyme. The signal peptidase complex is possibly also involved in proteolytic events in the ER membrane other than the processing of the signal sequence, for example the further digestion of the cleaved signal peptide or the degradation of membrane proteins. Mammalian signal peptidase is a complex of five different polypeptide chains. This family represents the 12 kDa subunit (SPC12).

    \ 3107 IPR001804 \

    Isocitrate dehydrogenase (IDH) PUBMED:2682654, PUBMED:1939242 is an important enzyme of carbohydrate metabolism which catalyzes the oxidative decarboxylation of isocitrate into alpha-ketoglutarate. IDH is dependent on either NAD+ or NADP+. In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In Escherichia coli the activity of a NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine residue; the phosphorylated form of IDH is completely inactivated.

    \

    3-isopropylmalate dehydrogenase (IMDH) PUBMED:1748999, PUBMED:7773180 catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate.

    \

    Tartrate dehydrogenase PUBMED:8053675 catalyzes the oxidation of tartrate to oxaloglycolate.

    \

    These enzymes are evolutionarily related PUBMED:2682654, PUBMED:1748999, PUBMED:7773180, PUBMED:8053675. The best conserved region of these enzymes is a glycine-rich stretch of residues located in the C-terminal section.

    \ 7565 IPR012885 \

    This domain is found towards the C terminus of proteins that contain an F-box, suggesting that they are effectors linked with ubiquitination.

    \ 2918 IPR007626 \ This protein is known as R50 in cytomegalovirus.\ 5334 IPR008475 \ This domain is found, normally as a tandem repeat, at the C terminus of bacterial phospholipase C proteins.\ 5532 IPR008441 \ This family consists of several capsular polysaccharide proteins. Capsular polysaccharide (CPS) is a major virulence factor in Streptococcus pneumoniae PUBMED:11179285.\ 3684 IPR000682 \ Protein-L-isoaspartate(D-aspartate) O-methyltransferase (PCMT) PUBMED: (also known as L-isoaspartyl protein carboxyl methyltransferase) is an enzyme that catalyzes the transfer of a methyl group from S-adenosylmethionine to the free carboxyl groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and proteins. The enzyme does not act on normal L-aspartyl residues. L-isoaspartyl and D-aspartyl residues are the products of the spontaneous deamidation and/or isomerization of normal L-aspartyl and L-asparaginyl residues in proteins. PCMT plays a role in the repair and/or degradation of these damaged proteins; the enzymatic methyl esterification of the abnormal residues can lead to their conversion to normal L-aspartyl residues. The SAM domain is present in most of these proteins.\ 2247 IPR007669 \ This family represents a conserved region found in several Caenorhabditis elegans proteins.\ 8021 IPR012573 \

    This family consists of meleagrin and cygnin basic peptides that are isolated from turkey and black swan respectively. Both peptides are low in molecular weight and contain three disulphide bonds with high concentrations of aromatic residues. These peptides show similarity to transferrins and probably play some vital role in avian eggs but the exact function is still unknown PUBMED:2760022.

    \ 6448 IPR009531 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6341 IPR009484 \

    This family consists of several repeats of around 30 residues in length which are found specifically in mature-parasite-infected erythrocyte surface antigen proteins from Plasmodium falciparum. This family is often found in conjunction with .

    \ 3371 IPR001213 \ The mouse mammary tumor virus (MMTV) is a milk-transmitted type B retrovirus. The superantigen (SAg) is encoded in the long terminal repeat PUBMED:7612231.\ 512 IPR001838 \

    Inwardly-rectifying K+ channels (Kir) are the principal class of two-TM domain K+ channels. They are characterised by the property of inward-rectification, which is described as the ability to allow large inward currents and smaller outward currents. Inwardly rectifying potassium channels (Kir) are responsible for regulating diverse processes including: cellular excitability, vascular tone, heart rate, renal salt flow, and insulin release PUBMED:10102275. To date, around twenty members of this superfamily have been cloned, which can be grouped into six families by sequence similarity, and these are designated Kir1.x-6.x PUBMED:7580148, PUBMED:10449331.

    \

    Cloned Kir channel cDNAs encode proteins of ~370-500 residues; both N- and C-termini are thought to be cytoplasmic, and the N-terminus lacks a signal sequence. Kir channel alpha subunits possess only 2 TM domains linked by a P-domain. Kir channels therefore share similarity with the fifth and sixth TM domains and the P-domain of the other K+ channel families. It is thought that four Kir subunits assemble to form a tetrameric channel complex, which may be hetero- or homomeric PUBMED:10102275.

    \

    Potassium channels are the most diverse group of the ion channel family PUBMED:1772658, PUBMED:1879548. They are important in shaping the action potential, and in neuronal excitability and plasticity PUBMED:2451788. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups PUBMED:2555158: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    \

    These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on its type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; and others are regulated by GTP-binding proteins or other second messengers PUBMED:2448635. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes PUBMED:1373731. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis PUBMED:11178249.

    \

    All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how the two-P-domain subunits assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned to one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK) PUBMED:11178249, PUBMED:. The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha-subunits that possess two P-domains. These are usually highly regulated K+ selective leak channels.
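    In the K+ selectivity sequence (T/SxxTxGxG), x stands for any residue, so the pattern translates directly into a regular expression. The sketch below is a hypothetical illustration: the helper name `find_selectivity_motifs` and the test fragment are invented for this example (the fragment is merely built around the well-known TVGYG filter residues), and a pattern match alone does not establish that a protein is a K+ channel. A lookahead is used so that overlapping hits, if any, are all reported.

```python
import re

# T/SxxTxGxG: T or S, two arbitrary residues, T, one arbitrary residue, G,
# one arbitrary residue, G. The zero-width lookahead lets overlapping
# occurrences be reported; group 1 captures the 8-residue match itself.
K_SELECTIVITY = re.compile(r"(?=([TS]..T.G.G))")


def find_selectivity_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched 8-mer) for each T/SxxTxGxG occurrence.

    Hypothetical helper for illustration only.
    """
    return [(m.start() + 1, m.group(1)) for m in K_SELECTIVITY.finditer(seq.upper())]


if __name__ == "__main__":
    # Invented pore-loop-like fragment containing the TVGYG residues.
    fragment = "ALWWSVETATTVGYGDLYPVT"
    print(find_selectivity_motifs(fragment))
```

    On this fragment the scan prints `[(8, 'TATTVGYG')]`: the T at position 8 fills the T/S slot, the T at position 11 fills the fourth position, and the glycines at positions 13 and 15 complete the pattern.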

    \

    \ 3366 IPR006099 \

    Methylmalonyl-CoA mutase (MCM) PUBMED:1975493 is an adenosylcobalamin (vitamin B12) dependent enzyme that catalyzes the isomerization between methylmalonyl-CoA and succinyl-CoA. MCM is involved in various catabolic or biosynthetic pathways; for example in man it is involved in the degradation of several amino acids, odd-chain fatty acids and cholesterol via propionyl-CoA to the tricarboxylic acid cycle; while in some bacteria it is involved in the synthesis of propionate from tricarboxylic acid-cycle intermediates.

    \

    Deficiency of MCM in man causes an often fatal disorder of organic acid metabolism termed methylmalonic acidemia. The sequences of eukaryotic and prokaryotic MCM are rather well conserved. In eukaryotes MCM is located in the mitochondrial matrix and is a homodimer of a polypeptide chain of about 710 amino acids. In bacteria MCM is a dimer of two non-identical, yet structurally related chains. This family also includes an Escherichia coli protein (gene sbm) whose function is not yet known.

    \

    A small degree of similarity is said PUBMED:2197274 to exist between MCM and the large subunit of the adenosylcobalamin-dependent enzyme ethanolamine ammonia-lyase, but this similarity is so weak that these two types of enzymes cannot be detected by a single pattern.

    \ 4636 IPR007378 \ Preprotein translocation at the inner envelope membrane of chloroplasts is so far known to involve five proteins: Tic110, Tic55, Tic40, Tic22 (this family) and Tic20. The molecular function of these proteins has not yet been established PUBMED:12032074.\ 1 IPR006139 \

    A number of NAD-dependent 2-hydroxyacid dehydrogenases which seem to be specific for the D-isomer of their substrate have been shown to be functionally and structurally related. The catalytic domain contains a number of conserved charged residues which may play a role in the catalytic mechanism. The NAD-binding domain is described in

    \ 1480 IPR005088 \ This domain is found in a number of bacterial cellulases.\ 1680 IPR001373 \

    The cullins are hydrophilic proteins involved in cell division control in yeasts PUBMED:8943317 and probably in various processes in the cell cycle of other organisms PUBMED:8681378. This family also includes Drosophila lin19 and mammalian vasopressin-activated calcium-mobilizing receptor (VACM-1), a kidney-specific protein thought to form a cell surface receptor even though it lacks the structural hallmarks of a receptor PUBMED:7611460.

    \ 3446 IPR007861 \ This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with several other domains. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in hereditary non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein PUBMED:8036718. The aligned region corresponds in part with globular domain IV, which is involved in DNA binding, in Thermus aquaticus MutS as characterised in PUBMED:11048710.\ 7025 IPR010811 \

    This represents a short conserved region (approximately 50 residues long), sometimes repeated, within a number of hypothetical Oryza sativa proteins of unknown function.

    \ 6964 IPR010787 \

    This family contains a number of hypothetical bacterial proteins of unknown function approximately 300 residues in length. Some family members are predicted to be metal-dependent.

    \ 5272 IPR008472 \ This family contains sequences that are repeated in several uncharacterised proteins from Drosophila melanogaster.\ 5845 IPR010926 \

    These proteins share a region of sequence similarity with the tail of myosin. Myosins act as molecular motors.\

    \ 5637 IPR008559 \ This family consists of several eukaryotic proteins with no known function.\ 2856 IPR007483 \

    This family includes the hamartin protein, which is thought to function as a tumour suppressor. The hamartin protein interacts with the tuberin protein. Tuberous sclerosis complex (TSC) is an autosomal dominant disorder and is characterised by the presence of hamartomas in many organs, such as brain, skin, heart, lung, and kidney. It is caused by mutation in either the TSC1 or the TSC2 tumour suppressor gene. TSC1 encodes a protein, hamartin, containing two coiled-coil regions, which have been shown to mediate binding to tuberin. The TSC2 gene codes for tuberin. These two proteins function within the same pathway(s) regulating cell cycle, cell growth, adhesion, and vesicular trafficking PUBMED:12167664.

    \ 5703 IPR008901 \ This family consists of several eukaryotic alkaline phytoceramidase (aPHC) sequences. Ceramidases are enzymes involved in regulating cellular levels of ceramides, sphingoid bases, and their phosphates. Alkaline phytoceramidase is responsible for the hydrolysis of phytoceramide PUBMED:11356846.\ 1213 IPR013332 \

    ApbA, the ketopantoate reductase enzyme of Salmonella typhimurium, is required for the synthesis of thiamine via the alternative pyrimidine biosynthetic pathway PUBMED:9488683. Precursors to the pyrimidine moiety of thiamine are synthesized de novo by the purine biosynthetic pathway or the alternative pyrimidine biosynthetic (APB) pathway. The ApbA protein catalyzes the NADPH-specific reduction of ketopantoic acid to pantoic acid. This activity had previously been associated with the pantothenate biosynthetic gene panE PUBMED:9721324. ApbA and PanE are allelic PUBMED:9721324.

    \ \ 6013 IPR010393 \

    This family consists of several bacterial YecM proteins of unknown function.

    \ 7164 IPR010852 \

    This family consists of several hypothetical bacterial proteins of around 180 residues in length. Members of this family are found in Streptomyces, Rhizobium, Ralstonia, Agrobacterium and Bradyrhizobium species. The function of this family is unknown.

    \ 5744 IPR008591 \

    DNA replication in eukaryotes results from a highly coordinated interaction between proteins, often as part of protein complexes, and the DNA template. One of the key early steps leading to DNA replication is formation of the prereplication complex, or pre-RC. The pre-RC is formed by the sequential binding of the origin recognition complex (ORC), Cdc6 and Cdt1 proteins, and the MCM complex. Activation of the pre-RC into the initiation complex (IC) is achieved via the action of S-phase kinases, eventually leading to the loading of the replication machinery.

    \

    Recently, a novel replication complex, GINS (for Go, Ichi, Nii, and San; five, one, two, and three in Japanese), has been identified PUBMED:12730133, PUBMED:12730134. \ \ The precise function of GINS is not known. However, genetic and two-hybrid interactions indicate that it mediates the loading of the enzymatic replication machinery at a step after the action of the S-phase kinases PUBMED:12730134. Furthermore, GINS may be a part of the replication machinery itself, since it is found associated with replicating DNA PUBMED:12730133, PUBMED:12730134. Electron microscopy of GINS shows that it forms a ring-like structure PUBMED:12730133, reminiscent of the structure of PCNA PUBMED:8001157, the DNA polymerase delta replication clamp. This observation, coupled with the observed interactions for GINS, indicates that the complex may represent the replication clamp for DNA polymerase epsilon PUBMED:12730133.

    \ \ \

    The GINS complex is essential for initiation of DNA replication in Xenopus egg extracts PUBMED:12730133. This 100 kDa stable complex includes Sld5, Psf1, Psf2, and Psf3. Homologues of these components are found also in other eukaryotes. This family of proteins represents the Sld5 component.

    \ 3647 IPR003700 \

    The panB gene from Escherichia coli encodes the first enzyme of the pantothenate biosynthesis pathway, ketopantoate hydroxymethyltransferase (KPHMT). Fungal ketopantoate hydroxymethyltransferase is essential for the biosynthesis of coenzyme A, while the pathway intermediate 4'-phosphopantetheine is required for penicillin production PUBMED:10503542.

    \ 4011 IPR004098 \

    The splicing factor Prp18 is required for the second step of pre-mRNA splicing. Prp18 appears to be primarily associated with the U5 snRNP.

    \

    The structure of a large fragment of Saccharomyces cerevisiae Prp18 has been determined PUBMED:10737784. This fragment is fully active in yeast splicing in vitro and includes the sequences of Prp18 that have been evolutionarily conserved. The core structure consists of five alpha-helices that adopt a novel fold. The most highly conserved region of Prp18, a nearly invariant stretch of 19 aa, forms part of a loop between two alpha-helices and may interact with the U5 small nuclear ribonucleoprotein particle PUBMED:10737784.

    \ 3910 IPR001746 \ These occlusion proteins are major components of the virus occlusion bodies, large proteinaceous structures (polyhedra) that protect the virus from the outside environment for extended periods until they are ingested by insect larvae. They occur in various viruses, including the single nucleocapsid nuclear polyhedrosis virus (SNPV) and granulosis virus.\ 3615 IPR001750 \ This domain is found in NADH:ubiquinone oxidoreductase (complex I), which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is coupled to proton translocation across the membrane PUBMED:1470679.\ 6271 IPR010498 \

    This is a family of enterotoxigenic bacterial adhesins.

    \ 240 IPR004253 \ This domain of unknown function is found in Arabidopsis thaliana and other plant proteins.\ 1103 IPR002612 \

    This family consists of the adenovirus E1B 55 kDa protein, or large t-antigen. E1B 55 kDa binds the tumor suppressor protein p53, converting it from a transcriptional activator that responds to DNA damage into an unregulated repressor of genes with a p53-binding site PUBMED:10207064. This protects the virus against p53-induced host antiviral responses and prevents apoptosis induced by the adenovirus E1A protein PUBMED:10207064.\ The E1B region of adenovirus encodes two proteins: E1B 55 kDa, the large t-antigen found in this family, and E1B 19 kDa, the small t-antigen. Both of these proteins inhibit E1A-induced apoptosis.

    \ 5294 IPR008408 \ This family consists of several brain acid soluble protein 1 (BASP1) sequences, also known as neuronal axonal membrane protein NAP-22. BASP1 is a neuron-enriched, Ca2+-dependent calmodulin-binding protein of unknown function PUBMED:10965107, PUBMED:9310187.\ 145 IPR000595 \ Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues PUBMED:14638413, PUBMED:10550204, PUBMED:1710853. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein; gene crp), in which this domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel. There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential for maintaining the structural integrity of the beta-barrel. cAMP- and cGMP-dependent protein kinases (cAPK and cGPK) contain two tandem copies of the cyclic nucleotide-binding domain. cAPKs are composed of two different subunits, a catalytic chain and a regulatory chain, and the regulatory chain contains both copies of the domain. cGPKs are single-chain enzymes that include the two copies of the domain in their N-terminal section. Vertebrate cyclic nucleotide-gated ion channels also contain this domain. Two such cation channels have been fully characterized; one of them is found in rod cells, where it plays a role in visual signal transduction.\ 710 IPR004013 \ The PHP (Polymerase and Histidinol Phosphatase) domain is a putative phosphoesterase domain. This family is often associated with an N-terminal region .\ 7725 IPR012461 \

    This family is composed of sequences derived from hypothetical eukaryotic proteins of unknown function. Some members of this family are annotated as potential phospholipases, but no literature was found to support this.

    \ 428 IPR001139 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because protein fold is better conserved than sequence, some of the families can be grouped into 'clans'.

    \

    Glycoside hydrolase family 30 comprises enzymes with only one known activity: glucosylceramidase ().

    \ \

    Family 30 encompasses the mammalian glucosylceramidases. Human acid beta-glucosidase (D-glucosyl-N-acylsphingosine glucohydrolase) cleaves the glucosidic bonds of glucosylceramide and synthetic beta-glucosides PUBMED:3456607. More than 50 different mutations in the glucocerebrosidase gene have been found to affect the activity of this hydrolase, producing variants of Gaucher disease, the most prevalent lysosomal storage disease PUBMED:3456607, PUBMED:9316290.

    \ 7548 IPR011709 \

    This domain is found towards the C terminus of the DEAD-box helicases (). In these helicases it is apparently always found in association with . However, there appear to be a couple of instances where it occurs on its own, e.g. .

    \ 1786 IPR007002 \

    The dlt operon (dltA to dltD) of Lactobacillus rhamnosus 7469 encodes four proteins responsible for the esterification of lipoteichoic acid (LTA) by D-alanine. These esters play an important role in controlling the net anionic charge of the poly(GroP) moiety of LTA. dltA and dltC encode the D-alanine-D-alanyl carrier protein ligase (Dcl) and the D-alanyl carrier protein (Dcp), respectively. Whereas the functions of DltA and DltC are defined, the functions of DltB and DltD are unknown. In vitro assays showed that DltD bound Dcp for ligation with D-alanine by Dcl in the presence of ATP. In contrast, the Escherichia coli acyl carrier protein (ACP), a homologue of Dcp involved in fatty acid biosynthesis, was not bound by DltD and thus was not ligated with D-alanine. DltD also catalyzed the hydrolysis of mischarged D-alanyl-ACP. The hydrophobic N-terminal sequence of DltD was required to anchor the protein in the membrane. It is hypothesized that this membrane-associated DltD facilitates the binding of Dcp and Dcl for ligation of Dcp with D-alanine, and that the resulting D-alanyl-Dcp is translocated to the primary site of D-alanylation PUBMED:10781555.

    \ \ \

    These sequences contain the central region of DltD.

    \ \ 1046 IPR001813 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) or large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The 60S acidic ribosomal protein plays an important role in the elongation step of protein synthesis. This family includes archaebacterial L12, eukaryotic P0, P1 and P2 PUBMED:8722011.

    \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. In this system, a designation is composed of the first three letters of the genus, a space, the first letter of the species name, a space and an Arabic number. If two species names have identical designations, they are distinguished from one another by adding one or more letters (as necessary) to each species designation.

    \

    The allergens in this family include allergens with the following designations: Alt a 6, Alt a 12, Cla h 3, Cla h 4 and Cla h 12.

    \ 7015 IPR009845 \

    This family consists of several bacterial and related archaeal proteins of around 180 residues in length. The function of this family is unknown.

    \ 5567 IPR000727 \

    The process of vesicular fusion with target membranes depends on a set of SNAREs (SNAP receptors), which are associated with the fusing membranes PUBMED:9239749, PUBMED:9232812. Target SNAREs (t-SNAREs) are localised on the target membrane and belong to two different families, the syntaxin-like family and the SNAP-25-like family. One member of each family, together with a v-SNARE localised on the vesicular membrane, is required for fusion.

    Syntaxins are type I transmembrane proteins that contain several regions with coiled-coil propensity in their cytosolic part, the SNARE motif. SNAP-25 () consists of two coiled-coil regions and is associated with the membrane by lipid anchors. SNARE motifs assemble into parallel four-helix bundles that are stabilised by the burial of hydrophobic helix faces in the bundle core. Monomeric SNARE motifs are disordered, so this assembly reaction is accompanied by a dramatic increase in alpha-helical secondary structure PUBMED:14570579. The parallel arrangement of SNARE motifs within complexes brings the transmembrane anchors, and hence the two membranes, into close proximity. Recently, it was shown that the two coiled-coil regions of SNAP-25 and one of the coiled-coil regions of the syntaxins are related PUBMED:9096343. This domain is found in both the syntaxin and SNAP-25 families, as well as in other proteins.

    \ 7607 IPR012934 \

    The zf-AD domain, also known as ZAD, forms an atypical treble-cleft-like zinc co-ordinating fold. The zf-AD domain is thought to be involved in mediating dimer formation, but does not bind to DNA PUBMED:14604529.

    \ 7326 IPR011112 \

    The Rho termination factor disengages newly transcribed RNA from its DNA template at certain specific transcripts. It is thought that two copies of Rho bind to RNA and that Rho functions as a hexamer of protomers PUBMED:10230401. This domain is found N-terminal to the RNA-binding domain ().

    \ 3848 IPR005984 \

    Phospholamban (PLB) is a small protein (52 amino acids) that regulates the affinity of the cardiac sarcoplasmic reticulum Ca2+-ATPase (SERCA2a) for calcium. PLB is present in cardiac myocytes and in slow-twitch and smooth muscle, and is also expressed in aortic endothelial cells, where it may play a role in tissue relaxation. Phosphorylation and dephosphorylation of phospholamban remove and restore, respectively, its inhibitory activity on SERCA2a: non-phosphorylated phospholamban binds to SERCA2a and inhibits the pump by lowering its affinity for Ca2+, whereas the phosphorylated form does not. PLB is phosphorylated at two sites, Ser-16 by cAMP-dependent protein kinase and Thr-17 by Ca2+/calmodulin-dependent protein kinase. Phosphorylation at Ser-16 is a prerequisite for phosphorylation at Thr-17.

    The structure of a 36-residue N-terminal fragment of human phospholamban, phosphorylated at Ser-16 and Thr-17 and carrying a Cys36Ser mutation, was determined from nuclear magnetic resonance data. The peptide adopts a conformation characterized by two alpha-helices connected by an irregular strand comprising residues Arg-13 to Pro-21. The proline is in the trans conformation. The two phosphate groups on Ser-16 and Thr-17 are shown to interact preferentially with the side chains of Arg-14 and Arg-13, respectively PUBMED:12080135.

    \ \ 3701 IPR007318 \ The Saccharomyces cerevisiae phospholipid methyltransferase () has broad substrate specificity towards unsaturated phospholipids PUBMED:2445736.\ 4297 IPR001900 \

    This group of bacterial and eukaryotic proteins includes characterized exoribonuclease II (RNase II) and ribonuclease R, a bacterial 3' to 5' exoribonuclease homologous to RNase II, together with related sequences PUBMED:11948193, PUBMED:15604703, PUBMED:9829834.

    \ \

    These proteins range in size from 644 residues (rnb) to 1250 residues (SSD1). Although their sequences are highly divergent, they share a conserved domain in their C-terminal section PUBMED:9241229. It is possible that this domain plays a role in the exonuclease function.

    \ 4707 IPR006829 \ This group of putative transposases consists mostly of Bacillus members. However, we have also found a homologue in Bacillus subtilis bacteriophage SPbetac2 (), possibly arising from horizontal transfer.\ 4664 IPR008336 \

    Eukaryotic-like DNA topoisomerase I, otherwise known as relaxing enzyme, untwisting enzyme or swivelase (), is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers. It is vital for replication, transcription and recombination PUBMED:2560656, PUBMED:7773745, PUBMED:2542938, PUBMED:7770916. Topoisomerase I catalyses the ATP-independent breakage of single-stranded DNA, followed by passage and rejoining of another single-stranded DNA region PUBMED:2544263. This reaction converts one topological DNA isomer into another. Examples include relaxation of positive and negative supercoils, interconversion of simple and knotted rings of single-stranded DNA, and intertwisting of single-stranded rings of complementary sequences PUBMED:2544263, PUBMED:1849260.

    \

    When a eukaryotic type I topoisomerase breaks a DNA backbone bond, it simultaneously forms a protein-DNA link in which the hydroxyl group of a tyrosine residue is joined to a 3'-phosphate at one end of the enzyme-severed DNA strand. In eukaryotic and poxvirus type I topoisomerases, a number of residues in the region around the active-site tyrosine are conserved.

    \

    This entry represents the eukaryotic DNA topoisomerase I N-terminal DNA binding domain. Human topoisomerase I has been shown to be inhibited by camptothecin (CPT), a plant alkaloid with antitumour activity PUBMED:1849260. The crystal structures of human topoisomerase I comprising the core and carboxyl-terminal domains in covalent and noncovalent complexes with 22-base pair DNA duplexes reveal an enzyme that "clamps" around essentially B-form DNA. The core domain and the first eight residues of the carboxyl-terminal domain of the enzyme, including the active-site nucleophile tyrosine-723, share significant structural similarity with the bacteriophage family of DNA integrases. A binding mode for the anticancer drug camptothecin has been proposed on the basis of chemical and biochemical information combined with the three-dimensional structures of topoisomerase I-DNA complexes PUBMED:9488644.

    \ 5646 IPR008874 \

    The traT gene is one of the F factor transfer genes and encodes an outer membrane protein which is involved in interactions between Escherichia coli and its surroundings PUBMED:9933744. The protein plays a role in preventing unproductive conjugation between bacteria carrying like plasmids.

    \ 111 IPR003673 \

    This is a family of enzymes with diverse functions. Members include fatty-acid CoA racemases such as arylpropionyl-CoA epimerase, a key enzyme in the inversion metabolism of ibuprofen, as well as carnitine dehydratase (CAIB) () and bile acid-inducible operon protein F (BAIF) PUBMED:3170477.

    \

    The 2-arylpropionic acid derivatives, including ibuprofen, are the most widely used anti-inflammatory analgesic cyclooxygenase inhibitors. The (-)-R-enantiomer, which does not inhibit cyclooxygenase, is epimerized in vivo via 2-arylpropionyl-coenzyme A (CoA) epimerase to the cyclooxygenase-inhibiting (+)-S-enantiomer. The epimerase is clearly important in drug metabolism. Its homology with carnitine dehydratases from several species also suggests that this protein, so far characterized only for its role in drug transformation, has a function in lipid metabolism PUBMED:9106621.\ Carnitine dehydratase catalyzes the dehydration of L-(-)-carnitine to crotonobetaine PUBMED:8188598.

    \ 3200 IPR005818 \ Linker histone H1 is an essential component of chromatin structure. H1 links nucleosomes into higher-order structures. Histone H5 performs the same function as histone H1 and replaces H1 in certain cells. The structure of GH5, the globular domain of linker histone H5, is known PUBMED:8384699, PUBMED:3463990. Its fold is similar to the DNA-binding domain of the catabolite gene activator protein, CAP, thus providing a possible model for the binding of GH5 to DNA.\ 2786 IPR001195 \

    Proteins in this group form the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook Academic Press, London / San Diego, (1997)]. Glycophorin A (PAS-2) and glycophorin B (PAS-3) belong to the MNS blood group system and are associated with antigens that include M/N, S/s, U, He, Mi(a), M(c), Vw, Mur, M(g), Vr, M(e), Mt(a), St(a), Ri(a), Cl(a), Ny(a), Hut, Hil, M(v), Far, Mit, Dantu, Hop, Nob, En(a), ENKT, and others.

    \

    Glycophorin A is the major sialoglycoprotein of the erythrocyte membrane PUBMED:2605264. Structurally, glycophorin A consists of an N-terminal extracellular domain, heavily glycosylated on serine and threonine residues, followed by a transmembrane region and a C-terminal cytoplasmic domain.

    \ 3104 IPR001346 \ The expression of type I interferon genes (interferons alpha and beta) is induced by many agents, including viral attack PUBMED:3409321. Induction is mediated by the binding of interferon regulatory factor 1 (IRF-1) to a region known as the interferon consensus sequence (ICS), located upstream of the interferon genes PUBMED:1460054. Other factors may also bind to the ICS, including IRF-2, which does not function as an activator but rather suppresses the function of IRF-1 under certain circumstances PUBMED:2475256. IRF proteins contain a conserved N-terminal region of about 120 amino acids, which folds into a structure that binds specifically to the ICS; the remaining parts of the sequences vary depending on the precise function of the protein PUBMED:1460054.\ 1696 IPR000010 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This family represents the cystatins, which are cysteine proteinase inhibitors belonging to MEROPS inhibitor family I25, clan IH PUBMED:2107324, PUBMED:14587292, PUBMED:1855589. They mainly inhibit peptidases belonging to peptidase families C1 (papain family) and C13 (legumain family). The cystatin family includes:

    \ \ \ 1384 IPR002664 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. More than 20 families of serine protease (denoted S1-S27) have been identified, and these are grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
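Here is a minimal sketch of how an order string such as HDS or DHS is read off a sequence: sort the catalytic residues by chain position. The function name is hypothetical. The residue numbers are the conventional numberings for bovine chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221). They are supplied here for illustration and are not taken from this entry.

```python
def triad_order(positions: dict[str, int]) -> str:
    """Return the one-letter codes of the catalytic residues in sequence order.

    positions maps a one-letter residue code ('H', 'D', 'S') to its
    position in the polypeptide chain (hypothetical helper).
    """
    if len(set(positions.values())) != len(positions):
        raise ValueError("catalytic residues must occupy distinct positions")
    # Sort the residue codes by where they occur along the chain.
    return "".join(sorted(positions, key=lambda code: positions[code]))


chymotrypsin = {"S": 195, "H": 57, "D": 102}  # clan SA
subtilisin = {"S": 221, "D": 32, "H": 64}     # clan SB
print(triad_order(chymotrypsin))  # HDS
print(triad_order(subtilisin))    # DHS
```

The point of the sketch is that the order string depends only on where each residue sits in the chain, not on the 3D geometry. This is why it can track clan relationships even when the catalytic geometry is shared across unrelated folds.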

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
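The classification letters described above can be sketched as a lookup keyed on the first character of a family or clan identifier. This is an illustrative sketch only. The function names are hypothetical, and the sketch covers only the letters listed in this paragraph. A clan whose families span more than one catalytic type is given the letter P, as stated above.

```python
# One-letter catalytic types, as listed in the text above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Types whose nucleophile is part of an amino acid vs. an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"C", "S", "T"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of an identifier."""
    code = identifier[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type in {identifier!r}")
    return CATALYTIC_TYPES[code]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the catalytic type."""
    code = identifier[:1].upper()
    if code in AMINO_ACID_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if code in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "unknown"


def clan_type_code(family_ids: list[str]) -> str:
    """Return the shared type letter of a clan's families, or 'P' if types are mixed."""
    codes = {fid[:1].upper() for fid in family_ids}
    if not codes:
        raise ValueError("a clan must contain at least one family")
    return codes.pop() if len(codes) == 1 else "P"


print(catalytic_type("S50"))            # serine
print(nucleophile("S50"))               # amino acid residue (acyl intermediate)
print(clan_type_code(["C1", "C13"]))    # C
print(clan_type_code(["C1", "S1"]))     # P
```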

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S50 (clan SF).

    \ \

    The large RNA segment of birnaviruses, segment A, codes for a polyprotein (N-VP2-VP4-VP3-C) PUBMED:2828658. This polyprotein is processed into the major structural proteins of the virion, VP2 and VP3, and the putative protease VP4 PUBMED:2828658.

    \ 6073 IPR008323 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 4733 IPR000924 \

    The aminoacyl-tRNA synthetases () catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435. Class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet flanked by alpha-helices PUBMED:8364025 and are mostly dimeric or multimeric. In reactions catalysed by class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, whereas in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
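The class assignments listed above amount to a simple lookup table. The sketch below encodes them using three-letter amino acid codes. The helper names are hypothetical. The 2'/3'-hydroxyl helper encodes only the general preference stated above and should not be read as a rule without exceptions.

```python
# Amino acids (three-letter codes) grouped by the class of their tRNA synthetase,
# as listed in the text above.
CLASS_I = frozenset({"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"})
CLASS_II = frozenset({"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"})


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for the synthetase that charges the given amino acid."""
    code = amino_acid.capitalize()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise KeyError(f"no synthetase class listed for {amino_acid!r}")


def typical_acceptor_hydroxyl(amino_acid: str) -> str:
    """Ribose hydroxyl to which the aminoacyl group is typically coupled.

    Reflects the general tendency described in the text, not a strict rule.
    """
    return "2'-OH" if synthetase_class(amino_acid) == "I" else "3'-OH"


# The two classes should partition the 20 standard amino acids.
assert CLASS_I.isdisjoint(CLASS_II) and len(CLASS_I | CLASS_II) == 20
print(synthetase_class("glu"), typical_acceptor_hydroxyl("glu"))  # I 2'-OH
print(synthetase_class("Ser"), typical_acceptor_hydroxyl("Ser"))  # II 3'-OH
```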

    \ \

    The 10 class I synthetases are considered to share a catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Glutamyl-tRNA synthetase () is a class Ic synthetase and shows several similarities with glutaminyl-tRNA synthetase in structure and catalytic properties. It is an alpha2 dimer. To date, one crystal structure of a glutamyl-tRNA synthetase (Thermus thermophilus) has been solved. The molecule has the form of a bent cylinder and consists of four domains. The N-terminal half (domains 1 and 2) contains the Rossmann fold typical of class I synthetases and resembles the corresponding part of E. coli GlnRS, whereas the C-terminal half exhibits a GluRS-specific structure PUBMED:9426192.

    \ 3348 IPR003619 \ Mammalian dwarfins are phosphorylated in response to transforming growth factor beta and are implicated in the control of cell growth PUBMED:8799132. The dwarfin family also includes the Drosophila protein MAD, which is required for the function of decapentaplegic (DPP) and may play a role in DPP signaling. Drosophila Mad binds to DNA and directly mediates activation of vestigial by Dpp PUBMED:9230443. This domain is also found in nuclear factor I (NF-I), or CCAAT box-binding transcription factor (CTF).\ 2702 IPR007839 \

    GTP cyclohydrolase III catalyses the formation of 2-amino-5-formylamino-6-ribofuranosylamino-4(3H)-pyrimidinone ribonucleotide monophosphate and inorganic phosphate from GTP. The enzyme also has an independent pyrophosphate phosphohydrolase activity. The proteins are 200-270 amino acids in length.

    \ 1756 IPR007085 \

    The DNA/pantothenate metabolism flavoprotein affects synthesis of DNA and pantothenate metabolism.

    \ 1582 IPR003010 \

    This family contains nitrilases that break carbon-nitrogen bonds and appear to be involved in the reduction of organic nitrogen compounds and ammonia production PUBMED:7987228. The members have distinct substrate specificities and include cyanide hydratases, aliphatic amidases, beta-alanine synthase, and a few other proteins of unknown molecular function. Sequence conservation over the entire length, as well as the similarity in the reactions catalyzed by the known enzymes in this family, points to a common catalytic mechanism. They have an invariant cysteine that is part of the catalytic site in nitrilases. Another highly conserved motif includes an invariant glutamic acid that might also be involved in catalysis PUBMED:7987228.

    \ 7574 IPR012895 \

    This domain is the HSCB C-terminal oligomerisation domain and is found on co-chaperone proteins.

    \ 2731 IPR000773 \ Granulocyte-macrophage colony-stimulating factor (GMCSF) is a cytokine that acts in hematopoiesis to stimulate the growth and differentiation of hematopoietic precursor cells from various lineages, including granulocytes, macrophages, eosinophils and erythrocytes PUBMED:2458827, PUBMED:1569568. GMCSF is a glycoprotein of ~120 residues that contains 4 conserved cysteines involved in disulphide bond formation. The crystal structure of recombinant human GMCSF has been determined PUBMED:1569568. There are two molecules in the asymmetric unit, related by an approximate non-crystallographic 2-fold axis. The overall structure, which is highly compact and globular with a predominantly hydrophobic core, is characterised by a 4-alpha-helix bundle. The helices are arranged in a left-handed antiparallel fashion, with two overhand connections. Within the connections is a two-stranded antiparallel beta-sheet. The tertiary structure has a topology similar to that of Sus scrofa (pig) growth factor and interferon-beta. Most of the proposed critical regions for receptor binding are located on a continuous surface at one end of the molecule that includes the C terminus PUBMED:1569568.\ 4746 IPR007638 \ This is a region found N-terminal to the catalytic domain of glutaminyl-tRNA synthetase () in eukaryotes but not in Escherichia coli. This region is thought to bind RNA in a non-specific manner, enhancing interactions between the tRNA and enzyme, but is not essential for enzyme function PUBMED:10347214.\ 1127 IPR002198 \ The short-chain dehydrogenases/reductases family (SDR) PUBMED:7742302 is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases PUBMED:2707261, PUBMED:1889416, PUBMED:1740120.
Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least 2 domains PUBMED:6789320, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme-binding domain, although there is a large degree of structural similarity. It has therefore been suggested that the structure of dehydrogenases arose through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate-specific domains PUBMED:6789320.\ 756 IPR004327 \ Phosphotyrosyl phosphatase activator (PTPA) proteins stimulate the phosphotyrosyl phosphatase (PTPase) activity of the dimeric form of protein phosphatase 2A (PP2A). The PTPase activity of PP2A (in vitro) is relatively low compared to its better-recognized phosphoserine/threonine protein phosphatase activity. The specific biological role of PTPA is unknown. Basal expression of PTPA depends on the activity of a ubiquitous transcription factor, Yin Yang 1 (YY1). The tumour suppressor protein p53 can inhibit PTPA expression through an unknown mechanism that negatively controls YY1 PUBMED:11171037.\ 6015 IPR009334 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6731 IPR010694 \

    This family consists of several bacterial VirK proteins of around 145 residues in length. The function of this family is unknown PUBMED:11434457.

    \ 1052 IPR000832 \

    G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \

    The secretin-like GPCRs include secretin PUBMED:1646711, calcitonin PUBMED:1658940, parathyroid hormone/parathyroid hormone-related peptides PUBMED:1658941 and vasoactive intestinal peptide PUBMED:1314625, all of which activate adenylyl cyclase and the phosphatidyl-inositol-calcium pathway. The amino acid sequences of the receptors contain high proportions of hydrophobic residues grouped into 7 domains, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins. However, while a similar 3D framework has been proposed to account for this, there is no significant sequence identity between these families: the secretin-like receptors thus bear their own unique '7TM' signature.

    \ \ 1676 IPR003245 \ Blue (type-1) copper proteins are small proteins which bind a single copper atom and which are characterized by an intense electronic absorption band near 600 nm PUBMED:6698995, PUBMED:8433378. The best-known members of this class of proteins are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly related bacterial azurins, which exchange electrons with cytochrome c551. This family of proteins also includes amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on methylamine; auracyanins A and B from Chloroflexus aurantiacus PUBMED:1313011; blue copper protein from Alcaligenes faecalis; cupredoxin (CPC) from cucumber peelings PUBMED:1468551; cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber; halocyanin from Natronobacterium pharaonis PUBMED:8195126, a membrane-associated copper-binding protein; pseudoazurin from Pseudomonas; rusticyanin from Thiobacillus ferrooxidans PUBMED:1879547; stellacyanin from the Japanese lacquer tree; umecyanin from horseradish roots; and allergen Ra3 from ragweed. Although there is an appreciable amount of divergence in the sequences of all these proteins, the copper ligand sites are conserved. This domain is found in a variety of plant cyanins and pollen allergens.\ \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. In this system, a designation is composed of the first three letters of the genus, a space, the first letter of the species name, a space, and an arabic number. If two species names have identical designations, they are distinguished by adding one or more letters (as necessary) to each species designation.

    \

    The allergens in this family include allergens with the following designations: Amb a 3.
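The naming rule described above can be sketched as a small function. This is a minimal illustration, not an official WHO/IUIS tool: the function name `allergen_designation` is hypothetical, and the sketch deliberately ignores the rule for adding extra letters when two species share a designation. The example assumes the short ragweed species Ambrosia artemisiifolia as the source of Amb a 3.

```python
def allergen_designation(genus: str, species: str, number: int) -> str:
    """Build a WHO/IUIS-style allergen designation (hypothetical helper).

    Format: first three letters of the genus, a space, the first letter of
    the species name, a space, and an arabic number. The extra letters used
    to separate species with identical designations are NOT handled here.
    """
    if len(genus) < 3 or not species or number < 1:
        raise ValueError("need a genus of >= 3 letters, a species name and a number >= 1")
    # Genus abbreviation is capitalised; species initial is lower case.
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


# Short ragweed (Ambrosia artemisiifolia), allergen number 3
print(allergen_designation("Ambrosia", "artemisiifolia", 3))  # Amb a 3
```

A real registry lookup would also need to detect designation collisions before assigning the extra letters.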

    \ 5785 IPR010279 \

    This family consists of several bacterial proteins of unknown function that include the Escherichia coli genes for ElaB, YgaM and YqjD.

    \ 3988 IPR003695 \ Exopolyphosphate phosphatase (Ppx) and guanosine pentaphosphate phosphatase (GppA) belong to the sugar kinase/actin/hsp70 superfamily PUBMED:8212131.\ 1426 IPR003930 \

    Potassium channels are the most diverse group of the ion channel family PUBMED:1772658, PUBMED:1879548. They are important in shaping the action potential, and in neuronal excitability and plasticity PUBMED:2451788. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups PUBMED:2555158: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    \

    These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin-binding properties. Different K+ channels are activated by different signals and conditions, depending on how they are regulated: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; and others are regulated by GTP-binding proteins or other second messengers PUBMED:2448635. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes PUBMED:1373731. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis PUBMED:11178249.

    \

    All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/S)xxTxGxG, which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how subunits with two P-domains assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned to one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK) PUBMED:11178249, PUBMED:. The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha subunits that possess two P-domains. These are usually highly regulated K+-selective leak channels.
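The K+ selectivity sequence (T/S)xxTxGxG can be located in a protein sequence with a simple pattern scan, reading each x as "any residue". The sketch below is an assumption-laden illustration: the helper name `find_selectivity_motifs` and the test fragment are hypothetical, and an exact regular-expression match is a much cruder test than the profile-based methods used to define P-domains.

```python
import re

# (T/S)xxTxGxG, where x is any residue, written as a regular expression.
K_SELECTIVITY = re.compile(r"[TS]..T.G.G")


def find_selectivity_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in K_SELECTIVITY.finditer(sequence.upper())]


# Hypothetical fragment containing a TVGYG stretch
print(find_selectivity_motifs("AAATMTTVGYGDAA"))  # [(3, 'TMTTVGYG')]
```

Because `finditer` reports non-overlapping matches only, a sequence with two P-domains would be expected to yield two hits only if the motifs are separated.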

    \ \

    Ca2+-activated K+ channels are a diverse group of channels that are activated by an increase in intracellular Ca2+ concentration. They are found in the majority of nerve cells, where they modulate cell excitability and action potential. Three types of Ca2+-activated K+ channel have been characterised, termed small-conductance (SK), intermediate-conductance (IK) and large-conductance (BK) respectively PUBMED:9687354.

    \

    BK channels (also referred to as maxi-K channels) are widely expressed in the body, being found in glandular tissue, smooth and skeletal muscle, as well as in neural tissues. They have been demonstrated to regulate arteriolar and airway diameter, and also neurotransmitter release. Each channel complex is thought to be composed of 2 types of subunit: the pore-forming (alpha) subunits and smaller accessory (beta) subunits.

    \ \

    The alpha subunit of the BK channel was initially thought to share the characteristic 6TM organisation of the voltage-gated K+ channels. However, the molecule is now thought to possess an additional TM domain, with an extracellular N-terminus and intracellular C-terminus. This C-terminal region contains 4 predominantly hydrophobic domains, which are also thought to lie intracellularly. The extracellular N-terminus and the first TM region are required for modulation by the beta subunit. The precise location of the Ca2+-binding site that modulates channel activation remains unknown, but it is thought to lie within the C-terminal hydrophobic domains.

    \ \

    The beta subunit (which is thought to possess 2 TM domains) increases the Ca2+ sensitivity of the BK channel PUBMED:7695911. It does this by increasing the time spent by the channel in burst-like open states. However, it has little effect on the durations of closed intervals between bursts, or on the numbers of open and closed states entered during gating PUBMED:10051518.\

    \ 2888 IPR001616 \

    Equine herpesvirus-1 (EHV-1) is a respiratory virus capable of causing abortion and neurological disease. Its complete DNA sequence has been determined PUBMED:1318606, and its constituent genes are arranged co-linearly with those in the genomes of other alphaherpesviruses, namely varicella-zoster virus and herpes simplex virus type 1 (HSV-1) PUBMED:1318606. Comparisons of the predicted amino acid sequences have allowed the functions of many EHV-1 proteins to be inferred.

    \

    For example, detailed analysis of HSV-1 and HSV-2 DNA has revealed an open reading frame sufficient to encode 626 amino acids for the HSV-1 alkaline exonuclease (620 amino acids for HSV-2) PUBMED:3005609. Comparison of the predicted amino acid sequences of the viral enzymes has revealed significant differences in the N-terminal portions of the proteins; nevertheless, their three-dimensional structures are believed to be similar.

    \ 3339 IPR012327 \

    In prokaryotes, the major role of DNA methylation is to protect host DNA against degradation by restriction enzymes. There are 2 major classes of DNA methyltransferase that differ in the nature of the modifications they effect. The members of one class (C-MTases) methylate a ring carbon and form C5-methylcytosine (see ). Members of the second class (N-MTases) methylate exocyclic nitrogens and form either N4-methylcytosine (N4-MTases) or N6-methyladenine (N6-MTases). Both classes of MTase utilise the cofactor S-adenosyl-L-methionine (SAM) as the methyl donor and are active as monomeric enzymes PUBMED:7663118.

    \

    N-6 adenine-specific DNA methylases () (A-Mtase) are enzymes that specifically methylate the amino group at the C-6 position of adenines in DNA. Such enzymes are found in all three types of bacterial restriction-modification systems (in type I systems the A-Mtase is the product of the hsdM gene, and in type III it is the product of the mod gene). All of these enzymes recognize a specific sequence in DNA and methylate an adenine in that sequence. It has been shown PUBMED:3323532, PUBMED:3248728, PUBMED:2541254, PUBMED:7607512 that A-Mtases contain a conserved motif, Asp/Asn-Pro-Pro-Tyr/Phe, in their N-terminal section; this conserved region could be involved in substrate binding or in catalysis. The structure of the N6-MTase TaqI (M.TaqI) has been resolved to 2.4 Å PUBMED:7971991. The molecule folds into 2 domains: an N-terminal catalytic domain, which contains the catalytic and cofactor-binding sites and comprises a central 9-stranded beta-sheet surrounded by 5 helices; and a C-terminal DNA recognition domain, which is formed by 4 small beta-sheets and 8 alpha-helices. The N- and C-terminal domains form a cleft that accommodates the DNA substrate. A classification of N-MTases has been proposed, based on conserved motif (CM) arrangements PUBMED:7607512. According to this classification, N6-MTases that have a DPPY motif (CM II) occurring after the FxGxG motif (CM I) are designated D12 class N6-adenine MTases.
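The D12 criterion above is an ordering rule: a motif II match must occur after a motif I match. A toy check of that rule might look like the following. It assumes exact pattern matching (FxGxG for CM I, and the Asp/Asn-Pro-Pro-Tyr/Phe form of CM II); the helper name `looks_like_d12` and the test sequences are hypothetical, and real motif assignment relies on sequence alignment rather than literal patterns.

```python
import re

MOTIF_I = re.compile(r"F.G.G")        # FxGxG (CM I), x = any residue
MOTIF_II = re.compile(r"[DN]PP[YF]")  # Asp/Asn-Pro-Pro-Tyr/Phe (CM II)


def looks_like_d12(sequence: str) -> bool:
    """Toy check: does a CM II match occur after the first FxGxG (CM I)?"""
    seq = sequence.upper()
    first_cm1 = MOTIF_I.search(seq)
    if first_cm1 is None:
        return False
    # Search for CM II only downstream of the end of the first CM I match.
    return MOTIF_II.search(seq, first_cm1.end()) is not None


print(looks_like_d12("MFAGAGKKKKDPPYAA"))  # True
print(looks_like_d12("DPPYKKFAGAG"))       # False: CM II precedes CM I
```

Using the first CM I match makes the check as permissive as possible about where CM II may appear downstream.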

    \ 6989 IPR009830 \

    This family consists of several putative lipoproteins from Mycobacterium species. The function of this family is unknown.

    \ 2351 IPR002789 \

    This prokaryotic protein family has no known function. It contains several conserved aspartates and histidines that could be metal ligands.

    \ 7677 IPR012484 \

    The sequences making up family 7 of the metallothionein superfamily are found repeated in metallothionein proteins expressed by two Tetrahymena species. Metallothioneins are low molecular mass, cysteine-rich metal-binding proteins that are thought to be involved in the regulation of levels of trace metals, and detoxification of these metals when present in excess PUBMED:7813475. Some of the metallothioneins found in this family (for example, ) are known to be induced by cadmium and are thought to be involved in the cellular sequestration of toxic metal ions. The high proportion of cysteine residues allows the metal ions to be bound by the formation of clusters of metal-thiolate complexes PUBMED:7813475. Tetrahymena spp. metallothioneins differ from other eukaryotic metallothioneins mainly in the length of their sequences and in the cysteine-containing motifs they exhibit.

    \ 7976 IPR012959 \

    This C-terminal domain is found in Penguin-like proteins and is associated with Pumilio like repeats PUBMED:15112237.

    \ 6358 IPR010534 \

    This family consists of several phage antitermination Q proteins and related bacterial sequences. Antiterminator proteins control gene expression by recognising control signals near the promoter and preventing transcriptional termination that would otherwise occur at sites that may be a long way downstream PUBMED:8332211.

    \ 5283 IPR008633 \ This family consists of archaeal GvpH proteins which are thought to be involved in gas vesicle synthesis PUBMED:9211710.\ 2167 IPR007463 \ This is a bacterial protein of unknown function.\ 3170 IPR001236 \ L-lactate dehydrogenases are metabolic enzymes which catalyse the interconversion of L-lactate and pyruvate; the reduction of pyruvate to L-lactate is the last step in anaerobic glycolysis. L-lactate dehydrogenase is also found as a lens crystallin in bird and crocodile eyes. L-2-hydroxyisocaproate dehydrogenases are also members of the family. \ \ Malate dehydrogenases catalyse the interconversion of malate and oxaloacetate. The enzyme participates in the citric acid cycle.\ 4624 IPR001869 \ Thiol-activated cytolysins PUBMED:2254290, PUBMED: are toxins produced by a variety of Gram-positive bacteria and are characterized by their ability to lyse cholesterol-containing membranes, their reversible inactivation by oxidation and their capacity to bind to cholesterol. All these proteins contain a single cysteine residue, located in their C-terminal section, which has been shown PUBMED:2888650 to be essential for binding to cholesterol.\ 6530 IPR009581 \

    This family is based on the C terminus of several hypothetical eukaryotic proteins of unknown function. Proteins in this entry contain two conserved motifs, DRHHYE and QCC, as well as a number of conserved cysteine residues.

    \ 2553 IPR001782 \

    The flgH, flgI and fliF genes of Salmonella typhimurium encode the major proteins for the L, P and M rings of the flagellar basal body PUBMED:2544561. In fact, the basal body consists of four rings (L,P,S and M) surrounding the flagellar rod, which is believed to transmit motor rotation to the filament PUBMED:2129540. The M ring is integral to the inner membrane of the cell, and may be connected to the rod via the S (supramembrane) ring, which lies just distal to it. The L and P rings reside in the outer membrane and periplasmic space, respectively.

    \

    The sequences of the FlgH, FlgI and FliF gene products have been determined PUBMED:2544561. FlgH and FlgI, which are exported across the cell membrane to their destinations in the outer membrane and periplasmic space, have typical N-terminal cleaved signal-peptide sequences PUBMED:2544561, PUBMED:3549691. FlgH is predicted to have an extensive beta-sheet structure, in keeping with other outer membrane proteins, and FlgI is thought to have even more beta-structure content PUBMED:2544561. Several aspects of the DNA sequence of these genes and their surrounds suggest complex regulation of the flagellar gene system.

    \ 1135 IPR007666 \

    ATP is the most commonly used phosphoryl group donor for kinases. However, some archaea utilise novel ADP-dependent glucokinases and phosphofructokinases in their glycolytic pathways PUBMED:9075622, PUBMED:11342216, PUBMED:11717273, PUBMED:10409652. These ADP-dependent kinases are homologous to each other but show no significant similarity to any of the currently characterised ATP-dependent enzymes. Interestingly, this family also contains sequences from higher eukaryotes, though their function is not known.

    \ 3997 IPR000817 \

    Prion protein (PrP-c) PUBMED:2572197, PUBMED:1916104, PUBMED:2908696 is a small glycoprotein found in high quantity in the brain of animals infected with certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform encephalopathy (BSE), and the human dementias Creutzfeldt-Jakob disease (CJD) and Gerstmann-Straussler syndrome (GSS). PrP-c is encoded in the host genome and is expressed both in normal and infected cells. During infection, however, the PrP-c molecule becomes altered (conformationally rather than at the amino acid level) to an abnormal isoform, PrP-sc. In detergent-treated brain extracts from infected individuals, fibrils composed of polymers of PrP-sc, namely scrapie-associated fibrils or prion rods, can be visualised by electron microscopy. The precise function of the normal PrP isoform in healthy individuals remains unknown. Several results, mainly obtained in transgenic animals, indicate that PrP-c might play a role in long-term potentiation, in sleep physiology, in oxidative burst compensation (PrP can bind four Cu2+ ions through its octarepeat domain), in interactions with the extracellular matrix (PrP-c can bind to the precursor of the laminin receptor, LRP), in apoptosis and in signal transduction (costimulation of PrP-c induces a modulation of Fyn kinase phosphorylation) PUBMED:12354606.

    The normal isoform, PrP-c, is anchored at the cell membrane, in rafts, through a glycosyl phosphatidyl inositol (GPI) anchor; its half-life at the cell surface is 5 h, after which the protein is internalised through a caveolae-dependent mechanism and degraded in the endolysosome compartment. Conversion between PrP-c and PrP-sc likely occurs during the internalisation process.

    In humans, PrP is a 253 amino acid protein with a molecular weight of 35-36 kDa. It has two hexapeptides and repeated octapeptides at the N-terminus and a disulphide bond, and is associated at the C-terminus with a GPI, which enables it to anchor to the external part of the cell membrane. The secondary structure of PrP-c is mainly composed of alpha-helices, whereas PrP-sc is mainly beta-sheets: transconformation of alpha-helices into beta-sheets has been proposed as the structural basis by which PrP acquires pathogenicity in transmissible spongiform encephalopathies (TSEs). The three-dimensional structures show the protein to be made of a globular domain, which includes three alpha-helices and two small antiparallel beta-sheet structures, and a long flexible tail whose conformation depends on the biophysical parameters of the environment. Crystals of the globular domain of PrP have recently been obtained; their analysis suggests a possible dimerisation of the protein through the three-dimensional swapping of the C-terminal helix 3 and rearrangement of the disulphide bond.

    \ 7867 IPR001941 \ Pro-opiomelanocortin is present in high levels in the pituitary and is processed into 3 major peptide families: adrenocorticotrophin (ACTH); alpha-, beta- and gamma-melanocyte-stimulating hormones (MSH); and beta-endorphin PUBMED:2266117. ACTH regulates the synthesis and release of glucocorticoids and, to some extent, aldosterone in the adrenal cortex. It is synthesised and released in response to corticotrophin-releasing factor at times of stress (e.g. heat, cold, infection), its release leading to increased metabolism. The action of MSH in man is poorly understood, but it may be involved in temperature regulation PUBMED:2266117. Full activity of ACTH resides in the first 20 N-terminal amino acids, the first 13 of which are identical to alpha-MSH PUBMED:2266117, PUBMED:2839146.\ 1714 IPR003038 \ Members of this family are thought to be integral membrane proteins. Some members of this family have been shown to cause apoptosis if mutated PUBMED:8413235; these proteins are known as DAD, for defender against death. The family also includes the epsilon subunit of the oligosaccharyltransferase that is involved in N-linked glycosylation PUBMED:7593165.\ 108 IPR004010 \ Cache is a signaling domain that is found in animal calcium channel subunits and a certain class of prokaryotic chemotaxis receptors.\ 6086 IPR009367 \

    This family consists of several hypothetical eukaryotic and prokaryotic proteins. The function of this family is unknown.

    \ 820 IPR001584 \

    Integrase comprises three domains capable of folding independently and whose three-dimensional structures are known. However, the manner in which the N-terminal, catalytic, and C-terminal domains interact in the holoenzyme remains obscure. Numerous studies indicate that the enzyme functions as a multimer, minimally a dimer. The integrase proteins from Human immunodeficiency virus 1 (HIV-1) and Avian sarcoma virus (ASV) have been studied most carefully with respect to the structural basis of catalysis. Although the active site of ASV integrase does not undergo significant conformational changes on binding the required metal cofactor, that of HIV-1 does. This active site-mediated conformational change in HIV-1 reorganizes the catalytic core and C-terminal domains and appears to promote an interaction that is favorable for catalysis PUBMED:10384242.

    \

    Retroviral integrase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. Retrovirus integrase-related gene sequences are also known in eukaryotes. Bacterial transposases involved in the transposition of insertion sequences also belong to this group.

    \

    HIV integrase catalyses the incorporation of virally derived DNA into the human genome. This unique step in the virus life cycle provides a variety of points for intervention and hence is an attractive target for the development of new therapeutics for the treatment of AIDS PUBMED:9161051. Substrate recognition by the retroviral integrase enzyme is critical for retroviral integration. To catalyze this recombination event, integrase must recognize and act on two types of substrates, viral DNA and host DNA, yet the necessary interactions exhibit markedly different degrees of specificity PUBMED:10384243.

    \ 1952 IPR004858 \ Members of this family are multigene family 530 proteins from African swine fever viruses. These proteins may be involved in promoting survival of infected macrophages PUBMED:11238833.\ \ 1353 IPR001153 \ Barwin is a basic protein isolated from aqueous extracts of barley seeds. It is 125 amino acids in length, and contains six cysteine residues that combine to form three disulphide bridges PUBMED:1390663, PUBMED:1390664. Comparative analysis shows the sequence to be highly similar to a 122 amino acid stretch in the C-terminal region of the products of two wound-induced genes (win1 and win2) from potato, the product of the hevein gene of rubber trees, and pathogenesis-related protein 4 from tobacco. The high levels of similarity to these proteins, and their ability to bind saccharides, suggest that the barwin domain may be involved in a common defense mechanism in plants.\ 211 IPR012310 \

    This domain belongs to a more diverse superfamily, including catalytic domain of the mRNA capping enzyme () and NAD-dependent DNA ligase () PUBMED:8653795.

    \ 2412 IPR007581 \ Endonuclease V is specific for single-stranded DNA or for duplex DNA that contains uracil or that is damaged by a variety of agents PUBMED:8990280.\ 7960 IPR012949 \

    This family forms bacteriocin-like propeptides with a glycine-glycine cleavage site. The bacteriocin is initially formed as a pre-propeptide; cleavage at the glycine-glycine cleavage site yields a leader peptide and the propeptide. The propeptide then undergoes post-translational modification before becoming functional PUBMED:10858242.
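The processing step above, where a pre-propeptide is split into a leader peptide and a propeptide at the glycine-glycine site, can be sketched as a string operation. This is a simplification under stated assumptions: the helper name `split_double_glycine_leader` and the input sequence are hypothetical, cleavage is assumed to occur immediately on the C-terminal side of the first Gly-Gly pair, and the later post-translational modification is not modelled. Real leaders may contain other GG pairs, and the actual site is set by the processing protease.

```python
def split_double_glycine_leader(prepropeptide: str) -> tuple[str, str]:
    """Split a pre-propeptide at the first Gly-Gly (hypothetical helper).

    Assumes cleavage just after the GG pair, so the leader keeps both
    glycines. Returns (leader, propeptide).
    """
    seq = prepropeptide.upper()
    idx = seq.find("GG")
    if idx == -1:
        raise ValueError("no Gly-Gly cleavage site found")
    cut = idx + 2  # cleave on the C-terminal side of the GG pair
    return seq[:cut], seq[cut:]


leader, propeptide = split_double_glycine_leader("MKNLSEELGGKCSTWG")
print(leader, propeptide)  # MKNLSEELGG KCSTWG
```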

    \ 7385 IPR011515 \

    Shugoshin-like proteins contain this conserved sequence at the C terminus, which is rich in basic amino acids. Shugoshin (Sgo1) protects Rec8 at centromeres during meiotic anaphase I so that sister chromatids remain tethered. Sgo2 is a paralogue of Sgo1 and is involved in correctly orienting sister centromeres PUBMED:14730319.

    \ 6631 IPR010652 \

    This family represents a conserved region of approximately 60 residues within a number of hypothetical bacterial and archaeal proteins of unknown function.

    \ 4603 IPR004598 \ Members of this family are part of the TFIIH complex, which is involved in the initiation of transcription and in nucleotide excision repair. The core-TFIIH basal transcription factor complex has six subunits; this is the p52 subunit.\ 989 IPR003124 \

    The WH2 (WASP-Homology 2, or Wiskott-Aldrich homology 2) domain is an actin-binding motif of ~18 amino acids. This domain was first recognized as an essential element for the regulation of the cytoskeleton by the mammalian Wiskott-Aldrich syndrome protein (WASP) family. WH2 proteins occur in eukaryotes from yeast to mammals, in insect viruses, and in some bacteria. The WH2 domain is found as a modular part of larger proteins; it can be associated with the WH1 or EVH1 domain and with the CRIB domain, and the WH2 domain can occur as a tandem repeat. The WH2 domain binds actin monomers and can facilitate the assembly of actin monomers into newly forming actin filaments PUBMED:11434350, PUBMED:11911886.

    \ \ \ 4970 IPR000236 \ The hepatitis B virus (HBV) X gene shares sequences with both the polymerase and precore genes, carries several regulatory signals critical to the replicative cycle, and its product has a transactivating function PUBMED:7561749. The transactivating function is probably associated with a tumorigenic potential of HBx, since X gene sequences encoding functional HBx have been repeatedly found integrated into the genome of liver carcinoma cells PUBMED:8530810.\ 4856 IPR005348 \

    This is a family of small integral membrane proteins found in some archaebacteria.

    \ 3809 IPR006531 \

    This domain occurs in a family of phage (and bacteriocin) proteins related to the phage P2 V gene product, which forms the small spike at the tip of the tail PUBMED:7483254. Homologs are generally annotated as baseplate assembly protein V. At least one member is encoded within a region of Pectobacterium carotovorum (Erwinia carotovora) described as a bacteriocin, a phage tail-derived module able to kill bacteria closely related to the host strain.

    \ 1453 IPR006068 \

    Members of this family are involved in Na+/K+, H+/K+, Ca2+ and Mg2+ transport.

    \ 7465 IPR013043 \

    A region of similarity shared by several Rhodopirellula baltica cytochrome-like proteins that are predicted to be secreted. These proteins also contain , , and .

    \ 5369 IPR008843 \ Entomopoxviruses (EPVs) are large (300-400 nm) oval-shaped viruses replicating in the cytoplasm of their insect host cells. At the end of their replicative cycle, EPV virions are occluded in a highly expressed protein called spheroidin. This protein forms large (5-20 µm long) oval-shaped occlusion bodies (OBs) called spherules. The infectious cycle of EPVs begins with the ingestion of the spherules by the insect host, their dissolution by the alkaline reducing conditions of the midgut fluid, and the release of virions into the midgut lumen. The infective particles first replicate in midgut epithelial cells, then pass the gut barrier to colonise the internal tissues, mainly the fat body cells. Although spheroidin has been shown to be non-essential for viral replication, it plays an essential role in the natural biological cycle of the virus by protecting virions from adverse environmental conditions (e.g. UV degradation), thus improving transmission efficacy. In this respect, spheroidins are functionally similar to the polyhedrins of baculoviruses or cypoviruses PUBMED:10867199.\ 7451 IPR011477 \

    This sequence motif is highly conserved in several short hypothetical proteins from Rhodopirellula baltica. It is also associated with in .

    \ 6491 IPR010599 \

    This region of unknown function is situated between the and domains in a cytoplasmic and membrane associated protein which appears to function as an adapter protein or regulator of Ras signalling pathways PUBMED:14597674.

    \ 2146 IPR007435 \ This family consists of several proteins of uncharacterised function.\ 7376 IPR011427 \

    This domain is found in several Chlamydia polymorphic membrane proteins PUBMED:11254597. Chlamydia pneumoniae is an obligate intracellular bacterium and a common human pathogen causing infection of the upper and lower respiratory tract. This domain is found between the beta-helical repeats () and the C-terminal .

    \ \ \ 3463 IPR001898 \

    Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number of distinct families. One of these families currently consists of the following proteins:\

    \

    These transporters are proteins of 430 to 620 amino acids, which are highly hydrophobic and probably contain about 12 transmembrane regions.

    \ 4605 IPR003194 \ Accurate transcription in vivo requires at least six general transcription initiation factors, in addition to RNA polymerase II. Transcription initiation factor IIA (TFIIA) is a multimeric protein which facilitates the binding of TFIID to the TATA box. \ 4415 IPR006151 \

    This entry contains both shikimate and quinate dehydrogenases. Shikimate 5-dehydrogenase () catalyses the conversion of shikimate to 5-dehydroshikimate. This reaction is part of the shikimate pathway, which is involved in the biosynthesis of aromatic amino acids. Quinate 5-dehydrogenase catalyses the conversion of quinate to 5-dehydroquinate. This reaction is part of the quinate pathway, in which quinic acid is exploited as a source of carbon in prokaryotes and microbial eukaryotes. The shikimate and quinate pathways share two common metabolites, 3-dehydroquinate and dehydroshikimate.

    \ 7516 IPR013106 \

    This domain is found in antibodies as well as neural protein P0 and CTL4 amongst others.

    \ 1649 IPR003177 \

    Cytochrome c oxidase () is an oligomeric enzymatic complex which is a component of the respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen PUBMED:6307356. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane.

    \

    In eukaryotes, in addition to the three large subunits, I, II and III, that form the catalytic center of the enzyme complex, there are a variable number of small polypeptide subunits. This family is composed of the heart and liver isoforms of cytochrome c oxidase subunit VIIa.

    \ 4111 IPR005516 \

    Remorin binds both simple and complex galacturonides. The N-terminal region of remorin is proline-rich, while the C-terminal region has been predicted to form a coiled coil that is expected to interact with other macromolecules, most likely DNA. Functional similarities between the behavior of remorins and that of viral proteins involved in intercellular communication have been noted PUBMED:8989883.

    \ 6393 IPR010554 \

    This group contains several eukaryote specific repeats of around 35 residues in length. The function of this family is unknown.

    \ 922 IPR007829 \ This domain is composed of a pair of transmembrane alpha helices connected by a short linker. The function of this domain is unknown; however, it occurs in a wide range of protein contexts.\ 2321 IPR007801 \ This family consists of uncharacterised bacterial proteins.\ 4351 IPR007545 \ Lysine-oxoglutarate reductase/saccharopine dehydrogenase (LOR/SDH) is a bifunctional enzyme. This conserved region is commonly found immediately N-terminal to the saccharopine dehydrogenase conserved region () in eukaryotes PUBMED:9426595, PUBMED:9654071.\ 2697 IPR002511 \ Disruption of the V1 gene in Tomato yellow leaf curl virus (TYLCV) abolished its ability to systemically infect tomato plants, suggesting that the V1 gene product is required for successful infection of the host PUBMED:9123819.\ 7117 IPR006606 \

    This repeat of unknown function has been found in Ciona intestinalis (sea squirt) COS41.4 protein,\ Caenorhabditis elegans R01H10.6 protein and Drosophila melanogaster CG1126\ protein.

    \ 4878 IPR005368 \

    This family contains small proteins of unknown function.

    \ 797 IPR011262 \

    RNA polymerase (RNAP) II, which is responsible for all mRNA synthesis in eukaryotes, consists of 12 subunits. Subunits Rpb3 and Rpb11 form a heterodimer that is functionally analogous to the archaeal RNAP D/L heterodimer, and to the prokaryotic RNAP alpha (RpoA) subunit homodimer. In each case, they play a key role in RNAP assembly by forming a platform on which the catalytic subunits (eukaryotic Rpb1/Rpb2, and prokaryotic beta/beta’) can interact PUBMED:11453250.

    \

    The dimerisation domains differ between subunit families. In eukaryotic Rpb3, archaeal D and bacterial RpoA subunits (), the dimerisation domain comprises a central insert domain, which interrupts an Rpb11-like domain (), dividing it into two halves PUBMED:9657722. In eukaryotic Rpb11 and archaeal L subunits, the insert domain is lacking, leaving the Rpb11-like domain intact and contiguous.

    \ \ 3996 IPR001108 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
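The letter convention above amounts to a small lookup table keyed on the first character of a family or clan identifier (for example "A22" or "SB"). The sketch below is illustrative only: the helper names `catalytic_type` and `nucleophile` are hypothetical and are not part of any MEROPS software, and the nucleophile mapping simply restates the two sentences above.

```python
# Hypothetical helpers (not a MEROPS API): decode the first character of a
# MEROPS family or clan identifier, following the letter convention above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clans containing families of more than one catalytic type
}

# Types that use an amino acid residue as the nucleophile (acyl intermediate)
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
# Types whose nucleophile is an activated water molecule
WATER_NUCLEOPHILE = {"A", "G", "M"}


def _type_code(identifier: str) -> str:
    """Extract and validate the catalytic type letter of an identifier."""
    code = identifier.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type in {identifier!r}")
    return code


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type named by the first character, e.g. 'A22' -> 'aspartic'."""
    return CATALYTIC_TYPES[_type_code(identifier)]


def nucleophile(identifier: str) -> str:
    """Return the kind of nucleophile implied by the catalytic type."""
    code = _type_code(identifier)
    if code in RESIDUE_NUCLEOPHILE:
        return "amino acid residue"
    if code in WATER_NUCLEOPHILE:
        return "activated water"
    return "not determined by type"  # U (unknown) and P (mixed clans)


if __name__ == "__main__":
    for ident in ("A22", "S8", "SB", "M4"):
        print(ident, catalytic_type(ident), nucleophile(ident))
```

Only the first character is consulted, so the same function works for family identifiers such as "A22" and clan identifiers such as "SB".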

    \ \

    Aspartic endopeptidases () of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179.\ Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (proteins which are evolutionarily related), and further subdivided into families, largely on the basis of their tertiary structure.

    \

    This group of aspartic peptidases belong to MEROPS peptidase family A22 (presenilin family, clan AD): subfamily A22A, the type example being presenilin 1 from Homo sapiens.

    \ \

    Presenilins are polytopic transmembrane (TM) proteins, mutations in which\ are associated with the occurrence of early-onset familial Alzheimer's\ disease, a rare form of the disease that results from a single-gene\ mutation PUBMED:9791530, PUBMED:9521418. \ The physiological functions of presenilins are unknown, but they may be related to developmental signalling, apoptotic signal transduction, or processing of selected proteins, such as the beta-amyloid precursor protein (beta-APP). A number of subtypes belong to the presenilin family. The identification of presenilin homologues in species that do not have an Alzheimer's disease correlate suggests that they may have functions unrelated to the disease; homologues have been identified in mouse, Drosophila melanogaster, Caenorhabditis elegans \ PUBMED:7566091 and other eukaryotes, including plants.

    \ 6507 IPR009565 \

    This family consists of several hypothetical mammalian proteins of around 190 residues in length. The function of this family is unknown.

    \ 3451 IPR000548 \ The myelin sheath is a multi-layered membrane, unique to the nervous system, that functions as an insulator to greatly increase the velocity of axonal impulse conduction PUBMED:2435734. Myelin basic protein (MBP) PUBMED:1710177, PUBMED:1710279 is a hydrophilic protein that may function to maintain the correct structure of myelin, interacting with the lipids in the myelin membrane by electrostatic and hydrophobic interactions. In mammals, various forms of MBP exist which are produced by the alternative splicing of a single gene; these forms differ by the presence or the absence of short (10 to 20 residues) peptides in various internal locations in the sequence. The major form of MBP is generally a protein of about 18.5 kDa (170 residues). MBP is the target of many post-translational modifications: it is N-terminally acetylated, methylated on an arginine residue, phosphorylated by various serine/threonine protein kinases, and deamidated on some glutamine residues.\ 1827 IPR007884 \ This family contains DREV protein homologues from several eukaryotes. The function of this protein is unknown PUBMED:11132146. However, these proteins appear to be related to other methyltransferases.\ 6198 IPR010473 \

    Diaphanous-related formins (Drfs) are a family of formin homology (FH) proteins that act as effectors of Rho small GTPases during growth factor-induced cytoskeletal remodelling, stress fibre formation, and cell division PUBMED:10631086. Drf proteins are characterised by a variety of shared domains: an N-terminal GTPase-binding domain (GBD), formin-homology domains FH1, FH2 () and FH3 (), and a C-terminal conserved Dia-autoregulatory domain (DAD) that binds the GBD.

    \

    This entry represents the GBD, which is a bifunctional autoinhibitory domain that interacts with and is regulated by activated Rho family members. Mammalian Drf3 contains a CRIB-like motif within its GBD for binding to Cdc42, which is required for Cdc42 to activate and guide Drf3 towards the cell cortex, where it remodels the actin cytoskeleton PUBMED:12676083.

    \ 2035 IPR007166 \ This family of archaeal proteins has no known function. They contain an N-terminal motif QXSXEXXXL that is likely to be functionally important.\ 2029 IPR007141 \

    This domain is currently found only in a small number of proteins restricted to Streptomyces spp. All have four conserved cysteines that probably form two disulphide bonds. One of these proteins, from Streptomyces nigrescens, is the well-characterised metalloproteinase inhibitor SMPI () PUBMED:2243793, PUBMED:3888972, which belongs to MEROPS proteinase inhibitor family I36, clan IU. The function of the other proteins is not known.

    \ \ \

    The structure of SMPI has been determined. It has 102 amino acid residues with two disulfide bridges and specifically inhibits metalloproteinases such as thermolysin, which belongs to MEROPS peptidase family M4. SMPI is composed of two beta-sheets, each consisting of four antiparallel beta-strands. The structure can be considered as two Greek key motifs with 2-fold internal symmetry, forming a Greek key beta-barrel. A unique structural feature of SMPI is the extension between the first and second strands of the second Greek key motif, which is known to be involved in its inhibitory activity. Despite the absence of sequence similarity, the SMPI structure shows clear similarity to both domains of the eye lens crystallins, to both domains of the calcium sensor protein S, and to the single-domain yeast killer toxin. The yeast killer toxin structure was thought to be a precursor of the two-domain beta gamma-crystallin proteins because of its structural similarity to each domain of the beta gamma-crystallins. SMPI thus provides another example of a single-domain protein structure that corresponds to the ancestral fold from which the two-domain proteins in the beta gamma-crystallin superfamily are believed to have evolved PUBMED:9735297.

    \ \ 2926 IPR005035 \ Herpes simplex virus (HSV) is a large DNA virus, the genome of which encodes approximately 80 genes. The UL3 gene of HSV-2 is predicted to encode a 233 amino acid protein\ with a molecular mass of 26 kDa. Homologues of the UL3 protein are encoded only among alphaherpesviruses. The function of the UL3 protein of HSV remains unknown but it is known to localise to the nucleus and is a phosphoprotein PUBMED:10466815.\ 748 IPR002683 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    In PSII, the oxygen-evolving complex (OEC) is responsible for catalysing the splitting of water to O2 and 4H+. The OEC is composed of a cluster of manganese, calcium and chloride ions bound to extrinsic proteins. In cyanobacteria there are five extrinsic proteins in OEC (PsbO, PsbP-like, PsbQ-like, PsbU and PsbV), while in plants there are only three (PsbO, PsbP and PsbQ), PsbU and PsbV having been lost during the evolution of green plants PUBMED:15258264.
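The overall stoichiometry of the water-splitting reaction described above can be written as a balanced half-reaction. The four electrons are removed one at a time, one per photochemical turnover of the P680 reaction centre, before O2 is released; both sides balance in atoms (4 H, 2 O) and in charge (0).

```latex
2\,\mathrm{H_2O} \;\longrightarrow\; \mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^-
```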

    \

    This family represents the PSII OEC protein PsbP. Both PsbP and PsbQ () are regulators that are necessary for the biogenesis of optically active PSII. PsbP increases the affinity of the water oxidation site for chloride ions and provides the conditions required for high affinity binding of calcium ions. The crystal structure of PsbP from Nicotiana tabacum revealed a two-domain structure, where domain 1 may play a role in the ion retention activity in PSII, the N-terminal residues being essential for calcium and chloride ion retention activity PUBMED:15031714. PsbP is encoded in the nuclear genome in plants.

    \ \ 3352 IPR003464 \ This small enzyme forms a homodecameric complex that catalyses the third step in the catabolism of catechol to succinate and acetyl-CoA in the beta-ketoadipate pathway (). The protein has a ferredoxin-like fold according to SCOP.\ 5971 IPR009313 \

    This is a family of uncharacterised Baculovirus proteins that are all about 11 kDa in size.

    \ 8087 IPR013205 \

    The tryptophan operon regulatory region of Citrobacter freundii (leader transcript) encodes a 14-residue peptide containing characteristic tandem tryptophan residues. The leader transcript is about 10 nucleotides shorter than those of Escherichia coli and Salmonella typhimurium PUBMED:6749821.

    \ 3365 IPR005300 \

    This group of proteins includes MltA, a membrane-bound, murein-degrading transglycosylase that plays an important role in the controlled growth\ of the stress-bearing sacculus of Escherichia coli PUBMED:10037771, PUBMED:9287002.

    \ 843 IPR000082 \ SEA is an extracellular domain associated with\ O-glycosylation PUBMED:7670383.\ Proteins found to contain SEA modules include agrin, enterokinase, the 63 kDa sea urchin (Strongylocentrotus purpuratus) sperm protein, perlecan (heparan sulphate proteoglycan core), mucin 1, the cell surface antigen 114/A10, and two functionally uncharacterised,\ probably extracellular, Caenorhabditis elegans proteins. Despite the functional\ diversity of these adhesive proteins, a common denominator seems to be their\ existence in heavily glycosylated environments. In addition, the better-characterised\ proteins all contain O-glycosidically linked carbohydrates such as\ heparan sulphate that contribute considerably to their molecular masses. The common\ module might regulate or assist binding to neighbouring carbohydrate moieties.\

    Enterokinase, the initiator of intestinal digestion, is a\ mosaic protease composed of a distinctive assortment of\ domains PUBMED:8052624.

    \ 4479 IPR007880 \ This family consists of spiralin proteins found in spiroplasma bacteria. Spiroplasmas are helically shaped pathogenic bacteria related to the mycoplasmas. The surface of spiroplasma bacteria is crowded with the membrane-anchored lipoprotein spiralin, whose structure and function are unknown, although its cellular function is thought to be structural and mechanical rather than catalytic PUBMED:11988221.\ 6899 IPR010763 \

    This family consists of several hypothetical bacterial proteins of around 220 residues in length. The function of this family is unknown.

    \ 1584 IPR000151 \

    Ciliary neurotrophic factor (CNTF) is a member of the gp130 family of cytokines. CNTF is a survival factor for various\ neuronal cell types and seems to prevent the degeneration of motor\ axons after axotomy, suggesting it may have therapeutic potential for treating\ neurodegeneration and nerve injury. CNTF acts on oligodendrocytes by favouring their final maturation; this effect is mediated through the 130 kDa glycoprotein receptor (gp130) common to the CNTF family and transduced through the Janus kinase pathway. The functional receptor complex of CNTF is composed of the CNTF receptor alpha (CNTFR), gp130 and the leukemia inhibitory factor receptor (LIFR).

    The structure of CNTF is a four-helix bundle\ PUBMED:7796798. CNTF acts as a homodimer. Three regions on CNTF have been identified as binding sites for its receptors. The ligand-receptor interactions are mediated through the cytokine binding domains (CBDs) and/or the immunoglobulin-like domains of the receptors. However, in the case of CNTF, the precise nature of the protein-protein contacts in the signalling complex has not yet been resolved, but there is evidence that the membrane-distal CBD (CBD1) of LIFR associates in vitro with soluble CNTFR in the absence of CNTF PUBMED:11943154, PUBMED:12417647.

    \ 995 IPR004170 \ The WWE domain is named after three of its conserved residues and is predicted to mediate specific protein-protein interactions in ubiquitin and ADP ribose conjugation systems.\ 48 IPR001464 \

    More than a thousand proteins of the annexin superfamily have been identified in major eukaryotic\ phyla, but annexins are absent from yeasts and prokaryotes PUBMED:15059252. Most eukaryotic species have 1-20 annexin (ANX) genes; even the primitive unicellular protist Giardia has\ at least seven. All annexins share a core domain made up of four similar repeats, each approximately 70 amino acids long PUBMED:1646719. Each repeat is\ made up of five alpha helices and usually contains a characteristic 'type 2' motif for binding calcium ions with the sequence\ 'GxGT-[38 residues]-D/E'. Animal and fungal annexins also have\ variable amino-terminal domains.
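As a rough illustration, the type 2 motif quoted above can be written as a regular expression, reading 'x' and the 38-residue spacer as "any residue" and D/E as Asp or Glu. This is a sketch under that literal reading only: real annexin repeats can deviate from the consensus, and the function name `find_type2_motifs` and the demonstration sequence are hypothetical.

```python
import re

# Literal reading of 'GxGT-[38 residues]-D/E'. Wrapping the pattern in a
# lookahead lets overlapping occurrences be reported as well.
TYPE2_MOTIF = re.compile(r"(?=(G.GT.{38}[DE]))")


def find_type2_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each candidate motif."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in TYPE2_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Synthetic sequence, not a real annexin: GAGT, a 38-Leu spacer, then Glu
    demo = "MK" + "GAGT" + "L" * 38 + "E" + "KK"
    for start, segment in find_type2_motifs(demo):
        print(start, len(segment))
```

Every match spans 43 residues (4 + 38 + 1), so a scan like this only flags candidate calcium-binding loops; it says nothing about the five-helix fold of the repeat.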

    The core domains of most vertebrate annexins have been analyzed by X-ray crystallography, revealing conservation of their\ secondary and tertiary structures despite only 45-55% amino-acid identity among individual members. Each annexin\ repeat is folded into five alpha-helices, and these in turn are wound into a right-handed super-helix. The four\ repeats pack into a structure that resembles a flattened disc, with a slightly convex surface on which the Ca2+-binding\ loops are located and a concave surface at which the amino and carboxyl termini come into close apposition. Annexins are traditionally thought of as calcium-dependent phospholipid-binding proteins, but recent work suggests a more\ complex set of functions.

    \

    \ 3251 IPR001916 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 22 comprises enzymes with two known activities: lysozyme type C () and alpha-lactalbumin. Asp and/or the carbonyl oxygen of the C-2 acetamido group \ of the substrate acts as the catalytic nucleophile/base.

    \ \

    Lysozyme type C and alpha-lactalbumin are similar both in terms of primary \ sequence and structure, and probably evolved from a common ancestral \ protein. There is, however, no similarity in function, as lactalbumin \ promotes the conversion of galactosyltransferase to lactose synthase and is\ essential for milk production PUBMED:6715332, while lysozyme catalyses the hydrolysis \ of bacterial cell wall polysaccharides; it has also been recruited for a \ digestive role in certain ruminants and colobine monkeys PUBMED:2738070. Another \ significant difference between the two proteins is that all lactalbumins have \ the ability to bind calcium PUBMED:3785375, while this property is restricted to only \ a few lysozymes PUBMED:3666156.

    The calcium-binding site was deduced using high-resolution \ X-ray structure analysis and was shown to consist of three aspartic acid \ residues. It was first suggested that calcium bound to lactalbumin \ stabilised the structure, but it has more recently been claimed that calcium \ controls the release of lactalbumin from the Golgi membrane and that the \ pattern of ion binding may also affect the catalytic properties of the \ lactose synthetase complex.

    \ 3464 IPR006986 \ Nab1 and Nab2 are co-repressors that specifically interact with and repress transcription mediated by the three members of the NGFI-A (Egr-1, Krox24, zif/268) family of eukaryotic (metazoa) transcription factors PUBMED:9418898. This C-terminal region is found only in the Nab1 subfamily.\ 7688 IPR012435 \

    These sequences are found in hypothetical eukaryotic proteins of unknown function. The region concerned is approximately 280 residues long.

    \ 5595 IPR008785 \ This family consists of several Poxvirus virion envelope protein A14-like sequences. A14 is a component of the virion membrane and has been found to be an H1 phosphatase substrate in vivo and in vitro. A14 is hyperphosphorylated on serine residues in the absence of H1 expression PUBMED:10729144.\ 2543 IPR001528 \ Flaviviruses encode a single polyprotein. This is cleaved into three structural and seven non-structural proteins. The NS4B protein is small and poorly conserved among the Flaviviruses. NS4B contains multiple hydrophobic potential membrane spanning regions PUBMED:2174669. NS4B may form membrane components of the viral replication complex and could be involved in membrane localisation of NS3 and NS5 (see ) PUBMED:2174669.\ 2409 IPR002076 \

    This group of eukaryotic integral membrane proteins are evolutionarily related, but their exact\ function has not yet been clearly established.\ The proteins have from 290 to 435 amino acid residues. Structurally, they seem\ to be formed of three sections: an N-terminal region with two transmembrane\ domains, a central hydrophilic loop and a C-terminal region that contains from\ one to three transmembrane domains. \ Members of this family are involved in long-chain fatty acid elongation systems that produce the 26-carbon precursors for ceramide and sphingolipid synthesis PUBMED:8027068. Predicted to be integral membrane proteins, in eukaryotes they are probably located in the endoplasmic reticulum. Yeast ELO3 () affects plasma membrane H+-ATPase activity, and may act on a glucose-signalling pathway that controls the expression of several genes that are transcriptionally regulated by glucose, such as PMA1 PUBMED:7768822.

    \ \ 1723 IPR004007 \ Dihydroxyacetone kinase (glycerone kinase) catalyses the phosphorylation of glycerone in the presence of ATP to glycerone phosphate in the glycerol utilization pathway. This is the predicted phosphatase domain of the dihydroxyacetone kinase family.\ 7652 IPR012850 \

    This domain is organised as a five-stranded anti-parallel beta-sheet PUBMED:9571044, PUBMED:8196040. It is the probable result of a decay of the common-fold.

    \ 7536 IPR011635 \ The APHP (acidic peptide-dependent hydrolases/peptidase) domain is found in a variety of different proteins.\ 3771 IPR000209 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities\ in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin\ and carboxypeptidase C clans have a catalytic triad of serine, aspartate and\ histidine in common: serine acts as a nucleophile, aspartate as an\ electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of\ the catalytic residues are similar between families, despite different\ protein folds PUBMED:7845208. The linear arrangements of the catalytic residues\ commonly reflect clan relationships. For example the catalytic triad in\ the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the\ subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
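The observation that the linear order of the triad reflects clan membership can be made concrete by sorting the three catalytic residues by sequence position. This is a toy classifier, not a MEROPS method: `triad_order` and `guess_clan` are hypothetical names, and only the three orders quoted above are known to it. The positions in the demonstration are the standard numbering for bovine chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221).

```python
# Linear (N- to C-terminal) triad orders quoted above for three serine peptidase clans
TRIAD_ORDER_TO_CLAN = {
    "HDS": "SA (chymotrypsin)",
    "DHS": "SB (subtilisin)",
    "SDH": "SC (carboxypeptidase C)",
}


def triad_order(ser_pos: int, asp_pos: int, his_pos: int) -> str:
    """Return one-letter codes of the triad residues sorted by sequence position."""
    positions = {"S": ser_pos, "D": asp_pos, "H": his_pos}
    if len(set(positions.values())) != 3:
        raise ValueError("triad residues must occupy distinct positions")
    return "".join(sorted(positions, key=positions.get))


def guess_clan(ser_pos: int, asp_pos: int, his_pos: int) -> str:
    """Map the linear triad order to a clan, if it matches one of the three above."""
    order = triad_order(ser_pos, asp_pos, his_pos)
    return TRIAD_ORDER_TO_CLAN.get(order, f"order {order} not among SA/SB/SC")


if __name__ == "__main__":
    print(guess_clan(195, 102, 57))  # chymotrypsin numbering
    print(guess_clan(221, 32, 64))   # subtilisin BPN' numbering
```

Because the residue geometry is similar across clans while the fold is not, the linear order is a convenient (though not sufficient) clue to clan membership.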

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belong to the MEROPS peptidase families S8 (subfamilies S8A (subtilisin) and S8B (kexin)) and S53 (sedolisin) both of which are members of clan SB.

    \ \

    The subtilisin family is the second largest serine protease family characterised to date. Over 200 subtilases are presently known, more than 170 of which have been fully sequenced PUBMED:9070434. The family is widespread, being found in eubacteria,\ archaebacteria, eukaryotes and viruses PUBMED:7845208. The vast majority of the family are endopeptidases, although there is an exopeptidase, tripeptidyl peptidase PUBMED:7845208, PUBMED:8439290. Structures have been determined for several members of the subtilisin family: they exploit the same catalytic triad as the chymotrypsins, although the residues occur in a different order (HDS in\ chymotrypsin and DHS in subtilisin), but the structures show no other\ similarity PUBMED:7845208, PUBMED:8439290. Some subtilisins are mosaic proteins, and others\ contain N- and C-terminal extensions that show no sequence similarity to\ any other known protein PUBMED:7845208. Based on sequence homology, a subdivision into six families has been proposed PUBMED:9070434.

    \ \

    The proprotein-processing endopeptidases kexin, furin and related enzymes\ form a distinct subfamily known as the kexin subfamily (S8B). These preferentially\ cleave C-terminally to paired basic amino acids. Members of this subfamily\ can be identified by subtly different motifs around the active site PUBMED:7845208, PUBMED:8439290.\ Members of the kexin family, along with endopeptidases R, T and K from the\ fungus Tritirachium and cuticle-degrading peptidase from Metarhizium, require\ thiol activation. This can be attributed to the presence of Cys-173 near\ the active histidine PUBMED:8439290. Only one viral member of the subtilisin family is known, a 56-kDa protease from herpes virus 1, which infects the channel catfish PUBMED:7845208.

    \ \

    Sedolisins (serine-carboxyl peptidases) are proteolytic enzymes whose fold resembles that of subtilisin; however, they\ are considerably larger, with the mature catalytic domains containing approximately 375 amino acids. The defining\ features of these enzymes are a unique catalytic triad, Ser-Glu-Asp, as well as the presence of an aspartic acid\ residue in the oxyanion hole. High-resolution crystal structures have now been solved for sedolisin from Pseudomonas\ sp. 101, as well as for kumamolisin from a thermophilic bacterium, Bacillus novo sp. MN-32. Mutations in the gene encoding the human member of this family, tripeptidyl-peptidase I (CLN2), lead to a fatal neurodegenerative disease PUBMED:12673349.

    \ 91 IPR002860 \

    Members of this family contain multiple BNR (bacterial neuraminidase repeat) repeats or Asp-boxes. The repeats are short; however, they are never found closer than 40 residues apart, suggesting that the repeat is structurally longer. These repeats are found in a variety of non-homologous proteins, including bacterial ribonucleases, sulphite oxidases, reelin, netrins, sialidases, neuraminidases, some lipoprotein receptors, and a variety of glycosyl hydrolases PUBMED:11266614.

    \ 3827 IPR006432 \

    These sequences represent a family of phage minor structural proteins. The protein is suggested to be the head-tail connector, or portal protein, on the basis of its position in the phage gene order, its presence in mature phage, its size, and its conservation across a number of complete genomes of tailed phage that lack other candidate portal proteins. Several other known portal protein families lack clear homology to this family and to each other PUBMED:10652093.

    \ 5135 IPR007972 \

    This family consists of several uncharacterised eukaryotic proteins of unknown function.

    \ 2281 IPR006946 \ This family contains a conserved region found in a number of uncharacterised plant proteins.\ 7075 IPR010828 \

    This family contains a number of alcohol acetyltransferase () enzymes approximately 500 residues long that seem to be restricted to Saccharomyces. These catalyse the esterification of isoamyl alcohol by acetyl coenzyme A PUBMED:7764365.

    \ 3522 IPR003484 \

    Rhizobial nodulation (Nod) factors are signalling molecules secreted by root-nodulating rhizobia in response to flavonoids excreted by the host plant. They induce various symbiotic responses on the roots of the leguminous host plant at low concentrations, and are required for successful infection. Rhizobial Nod factors are lipo-chitooligosaccharides carrying various substituents which are important determinants of host specificity PUBMED:11732607.

    \ \

    NodA is an N-acyl transferase which specifies the transfer of an acyl chain to the oligosaccharide backbone of Nod factor. Allelic variation of the nodA gene can contribute to the determination of host range PUBMED:8930915.

    \ \ 1366 IPR001466 \

    This is a group of diverse sequences that contain D-alanyl-D-alanine carboxypeptidase B, aminopeptidase (DmpB), alkaline D-peptidase, animal D-Ala-D-Ala carboxypeptidase homologues and the class A and C beta-lactamases and eukaryotic beta-lactamase homologs which are variously described as: transesterases, non-ribosomal peptide synthetases and hypothetical proteins. Many are serine peptidases belonging to MEROPS peptidase family S12 (D-Ala-D-Ala carboxypeptidase B family, clan SE). The beta-lactamases are classified as S12 non-peptidase homologues; these either have been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for the catalytic activity.

    \ \

    Beta-lactamase catalyses the opening and hydrolysis of the beta-lactam ring\ of beta-lactam antibiotics such as penicillins and cephalosporins PUBMED:1856867. There are four groups, classed A, B, C and D according to sequence, substrate specificity, and kinetic behaviour: class A (penicillinase-type) is the most common PUBMED:1856867. The genes for class A beta-lactamases are widely distributed in bacteria, frequently located on transmissible plasmids in Gram-negative organisms, although an equivalent chromosomal gene has been found in a few species PUBMED:2788410.

    \ \

    Class A, C and D beta-lactamases are serine-utilising hydrolases; class B enzymes utilise a catalytic zinc centre instead. The three classes of serine beta-lactamase are evolutionarily related and belong to a superfamily that also includes DD-peptidases and other penicillin-binding proteins PUBMED:3128280. All these proteins contain an S-x-x-K motif, the Ser being the active site residue. Although clearly related, the sequences of the three classes of serine beta-lactamases vary considerably outside the active site.
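Because the S-x-x-K motif fixes only two positions, it cannot identify a serine beta-lactamase or penicillin-binding protein on its own; a scan for it mainly shows how often such a short motif turns up by chance. The sketch below uses hypothetical function names and a synthetic sequence. It reports the Ser position of each occurrence and, assuming independent residues with equal frequencies for all 20 amino acids (a simplification), the number of hits expected at random.

```python
import re

# S-x-x-K: Ser, any two residues, Lys. The lookahead allows overlapping hits.
SXXK = re.compile(r"(?=S..K)")


def sxxk_sites(seq: str) -> list[int]:
    """Return 1-based positions of the Ser in every S-x-x-K occurrence."""
    return [m.start() + 1 for m in SXXK.finditer(seq.upper())]


def expected_random_hits(length: int, alphabet_size: int = 20) -> float:
    """Expected chance occurrences, assuming independent, uniformly distributed residues."""
    windows = max(length - 3, 0)  # number of 4-residue windows in the sequence
    return windows / alphabet_size ** 2  # two fixed positions: (1/20) * (1/20)


if __name__ == "__main__":
    demo = "AASTTKSAAK"  # synthetic sequence
    print(sxxk_sites(demo))
    print(expected_random_hits(403))
```

Under these assumptions a 300-residue protein would carry about 0.74 chance S-x-x-K occurrences, so a hit is consistent with, but far from diagnostic of, membership of this superfamily.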

    \ 7099 IPR009896 \

    This family consists of several Mycoplasma species-specific cytadhesin P32 and P30 proteins. P30 has been found to be membrane associated and localised on the tip organelle. It is thought to be important in cytadherence and virulence PUBMED:9632619.

    \ 1509 IPR003558 \ Escherichia coli, Haemophilus spp. and Campylobacter spp. all produce \ a toxin that is seen to cause distension in certain cell lines PUBMED:8112838, PUBMED:10203548, \ which eventually disintegrate and die. This novel toxin, termed cytolethal \ distending toxin (cdt), has three subunits: A, B and C. Their sizes are \ approx. 27.7, 29.5 and 19.9 kDa respectively PUBMED:8112838, and they appear to be \ entirely novel PUBMED:10203548. \ \

    Further research on the complete toxin has revealed that it blocks the cell\ cycle at stage G2, through inactivation of the cyclin-dependent kinase Cdk1, and without induction of DNA breaks. This leads to multipolar abortive \ mitosis and micronucleation, associated with centrosomal amplification PUBMED:10777111.\ The roles of each subunit are unclear, but it is believed that they have\ separate roles in pathogenicity.

    \ 1875 IPR004394 \ The gene iojap is a pattern-striping gene in maize, reflecting a chloroplast development defect in some cells. Maize has two RNA polymerases in plastids, but the plastid-encoded one, similar to bacterial RNA polymerases, is missing in iojap mutants. The role of iojap in chloroplast development, and the role of its bacterial orthologs modeled here, is unclear.\ 7526 IPR011623 \ This entry represents the transmembrane region of the 7TM-DISM (7TM Receptors with Diverse Intracellular Signalling Modules) PUBMED:12914674.\ 6252 IPR010491 \

    This domain is specific to the N terminus of the RNA splicing factor encoded by prp1 PUBMED:9003295, which is involved in mRNA splicing and possibly also in poly(A)+ RNA nuclear export and cell cycle progression.

    \ 3061 IPR000098 \ Interleukin-10 (IL-10) is a protein that inhibits the synthesis of a\ number of cytokines, including IFN-gamma, IL-2, IL-3, TNF and GM-CSF produced\ by activated macrophages and by helper T cells.\ Structurally, IL-10 is a protein of about 160 amino acids that contains four\ conserved cysteines involved in disulphide bonds PUBMED:8590020.\ \
    \
                    +-------------------------------------+\
                    |                  *****              |\
         xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxCxCxxxxxxxxxxxx\
                                        |                   |\
                                        +-------------------+\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    '*': position of the pattern.\
    
    \ IL-10 is highly similar to the Epstein-Barr virus BCRF1 protein, which inhibits\ the synthesis of gamma-interferon, and to Equine herpesvirus type 2 protein E7.\ It is also similar, but to a lesser degree, to human protein mda-7 PUBMED:8545104, a\ protein which has antiproliferative properties in human melanoma cells. Mda-7\ only contains two of the four cysteines of IL-10.\ 5323 IPR008913 \ This family of domains is likely to bind zinc ions. These domains contain many conserved cysteine and histidine residues. The domain has been named after the N-terminal motif CXHY. This domain can be found in isolation in some proteins, but is also often associated with a zinc finger RING domain (). One of the proteins in this family (YKI4_YEAST) is a mitochondrial intermembrane space protein called Hot13. This protein is involved in the assembly of small TIM complexes PUBMED:15294910.\ 4432 IPR007037 \

    This entry includes the vibriobactin utilization protein viuB, which is involved in the removal of iron from iron-vibriobactin complexes, as well as several hypothetical proteins.

    \ 2278 IPR006927 \

    The sequences in this family are plant proteins of unknown function.

    \ 4952 IPR000349 \ This family contains the major surface antigens of the hepatitis B viruses (Hepadnaviridae). The protein is most likely required for an early step of the life cycle involving entry or uncoating of virus particles.\ 4810 IPR001656 \

    Members of this family of proteins appear to be responsible for synthesis of pseudouridine from uracil-13 in transfer RNAs PUBMED:12756329. They are hydrophilic proteins of 39 to 77 kDa, and homologues are found in bacteria, archaea and eukarya.

    \ 3563 IPR003462 \

    This family contains the bacterial enzyme ornithine cyclodeaminase, which catalyses the deamination of ornithine to proline PUBMED:2644238. This family also contains mu-crystallin, the major component of the eye lens in several Australian marsupials; mRNA for this protein has also been found in the human retina PUBMED:1384048.

    \ 7587 IPR011670 \ This family includes sequences of largely unknown function which nevertheless share a number of features. They are expressed by bacterial species, many of which are known to associate symbiotically with plants. Moreover, the majority are encoded by plasmids, many of which are known to confer on the organism the ability to interact symbiotically with leguminous plants. An example is the symbiotic plasmid of Rhizobium sp. NGR234, which encodes Y4CF, a protein of unknown function that is a member of this family PUBMED:9163424. Other members of this family are expressed by organisms with a documented genomic similarity to plant symbionts PUBMED:12271122.\ 7028 IPR009851 \

    This family represents a conserved region approximately 150 residues long within a number of eukaryotic proteins that show homology with Drosophila melanogaster Modifier of rudimentary (Mod(r)) proteins. The N-terminal half of Mod(r) proteins is acidic, whereas the C-terminal half is basic PUBMED:7651329, and both of these regions are represented in this family.

    \ 5994 IPR009323 \

    This family consists of several putative bacterial membrane proteins. The function of this family is unclear.

    \ 768 IPR007517 \ The Mre11 complex (Mre11 Rad50 Nbs1) is central to chromosomal maintenance and functions in homologous recombination, telomere maintenance and sister chromatid association. The Rad50 coiled-coil region contains a dimer interface at the apex of the coiled coils in which pairs of conserved Cys-X-X-Cys motifs form interlocking hooks that bind one Zn ion. This alignment includes the zinc hook motif and a short stretch of coiled-coil on either side.\ 3223 IPR000044 \ Mycoplasma genitalium has the smallest known genome of any free-living \ organism. Its complete genome sequence has been determined by whole-genome random sequencing and assembly PUBMED:7569993. Only 470 putative coding regions were identified, including genes for DNA replication, transcription and\ translation, DNA repair, cellular transport and energy metabolism PUBMED:7569993. \ A hypothetical protein from the MG045 gene PUBMED:8253680 has a homologue of similarly\ unknown function in Mycoplasma pneumoniae PUBMED:8948633.\ 569 IPR004299 \ The MBOAT (membrane bound O-acyl transferase) family of membrane proteins contains a variety of acyltransferase\ enzymes. A conserved histidine has been suggested to be the active site residue PUBMED:10694878.\ 5118 IPR007955 \

    Trophinin and tastin form a cell adhesion molecule complex that potentially mediates an initial\ attachment of the blastocyst to uterine epithelial cells at the time of implantation. Trophinin and tastin\ bind to an intermediary cytoplasmic protein called bystin. Bystin may be involved in implantation and\ trophoblast invasion, because bystin is found with trophinin and tastin in the cells at human implantation sites and also in the intermediate trophoblasts at the\ invasion front in the placenta from early pregnancy PUBMED:9560222. This family also includes the\ Saccharomyces cerevisiae protein ENP1. ENP1 is an essential\ protein in S. cerevisiae and is localised in the nucleus\ PUBMED:9034325. It is thought that ENP1 plays a direct role in the early steps of rRNA processing,\ as enp1-defective S. cerevisiae cells cannot synthesise 20S\ pre-rRNA and hence 18S rRNA, which leads to reduced formation of 40S ribosomal subunits\ PUBMED:12527778.

    \ 2219 IPR006850 \ This represents a conserved region found in a number of Chlamydophila pneumoniae proteins.\ 2586 IPR004304 \ This family includes amidohydrolases of formamide and acetamide. The formamidase from Methylophilus methylotrophus forms a homotrimer, suggesting that this may be a common property of other members of this family.\ 447 IPR000203 \

    This domain has been termed the GPS domain (for GPCR proteolytic site), because it contains a cleavage site in latrophilin PUBMED:9920906. However, this region is found in many otherwise unrelated cell surface receptors PUBMED:10469603, and there is currently no evidence that it provides a cleavage site in any of these other receptors. Nevertheless, the peptide bond that is cleaved in latrophilin lies between Leu and Thr residues that are conserved in some of the other receptors PUBMED:10469603.

    \ \

    GPS domains are about 50 residues long and contain either 2 or 4 cysteine residues that are likely to form disulphide bridges. Based on conservation of these cysteines the following pairing can be predicted.

    \ \
    \
                                 +-----------------+\
                                 |                 |\
               +-----------------+---------------+ |\
               |                 |               | |\
            XXXCXXXXXXXXXXXXXXXXXCXXXXXXXXXXXXXXXCXCXXLTXXXXXXX\
                                                       ^\
                                                       cleavage site\
    
    \ 4135 IPR007739 \ This family consists of a group of proteins which are related to the Streptococcal rhamnose-glucose polysaccharide assembly protein (RgpF). Rhamnan backbones occur in several O-polysaccharides of phytopathogenic bacteria and are regarded as pathogenic factors PUBMED:12010977.\ 5867 IPR006231 \

    The membrane-associated enzyme, malate:quinone-oxidoreductase, is an alternative to the better-known NAD-dependent malate dehydrogenase as part of the TCA cycle. The reduction of a quinone rather than NAD+ makes the reaction essentially irreversible in the direction of malate oxidation to oxaloacetate. Both forms of malate dehydrogenase are active in E. coli; disruption of the quinone-dependent form causes less phenotypic change than disruption of the NAD-dependent form. In some bacteria, this form is the only or the more important malate dehydrogenase PUBMED:11092847.

    \ 6996 IPR009834 \

    This family contains fatty acid elongase 3-ketoacyl-CoA synthase 1, a plant enzyme approximately 350 residues long.

    \ 7595 IPR011686 \ The omega transcriptional repressor regulates expression of genes involved in copy number control and stable maintenance of plasmids. The omega protein belongs to the structural superfamily of MetJ/Arc repressors featuring a ribbon-helix-helix DNA-binding motif with the beta-ribbon located in and recognising the major groove of operator DNA PUBMED:11733997.\ 3499 IPR011577 \

    Cytochrome b561 is an integral membrane electron transport protein that binds two haem groups non-covalently. This domain is also found in a number of nickel-dependent hydrogenase subunits, which are also b-type cytochromes that interact with quinones and anchor the hydrogenase to the membrane.

    \ 6904 IPR009780 \

    This family consists of several short, hypothetical bacterial proteins of around 80 residues in length. Members of this family are found in Rhizobium, Agrobacterium and Brucella species. The function of this family is unknown.

    \ 1225 IPR002891 \ This enzyme catalyses the phosphorylation of adenylylsulphate to 3'-phosphoadenylylsulphate. This domain contains an ATP-binding P-loop motif PUBMED:9786849.\ 4884 IPR011612 \

    Urease (urea amidohydrolase, ) catalyses the hydrolysis of urea to form ammonia and carbamate. The subunit composition of urease from different sources varies PUBMED:7565414, but each holoenzyme consists of four domains PUBMED:7754395: three structural domains and a nickel-binding catalytic domain common to amidohydrolases PUBMED:9144792. Urease is unique among nickel metalloenzymes in that it catalyses a hydrolysis rather than a redox reaction. In Helicobacter pylori, the gamma and beta domains are fused and called the alpha subunit (). The catalytic subunit (called beta or B) has the same organization as the Klebsiella alpha subunit. Jack bean (Canavalia ensiformis) urease has a fused gamma-beta-alpha organization ().

    The N-terminal domain is a composite domain and plays a major trimer stabilising role by contacting the catalytic domain of the symmetry related alpha-subunit PUBMED:7754395.

    \ 4227 IPR001892 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues. This family of ribosomal proteins is present in prokaryotes and eukaryotes PUBMED:1872840, PUBMED:.

    \ 3289 IPR002528 \

    Characterised members of the Multi Antimicrobial Extrusion (MATE) family function as drug/sodium antiporters. These proteins mediate resistance to a wide range of cationic dyes, fluoroquinolones, aminoglycosides and other structurally diverse antibiotics and drugs. MATE proteins are found in bacteria, archaea and eukaryotes. These proteins are predicted to have 12 alpha-helical transmembrane regions, although some of the animal proteins may have an additional C-terminal helix.

    \ 7554 IPR013101 \

    This entry includes some LRRs that fail to be detected by PUBMED:7817399, PUBMED:8264799.

    \ 6540 IPR009588 \

    This family consists of several hypothetical Feline immunodeficiency virus (FIV) proteins. Members of this family are typically around 67 residues long and are often annotated as ORF3 proteins. The function of this family is unknown.

    \ 3033 IPR002652 \ This family consists of the importin alpha (karyopherin alpha), importin beta (karyopherin beta) binding domain. The domain mediates formation of the importin alpha beta complex; required for classical NLS import of proteins into the nucleus, through the nuclear pore complex and across the nuclear envelope. Also in the alignment is the NLS of importin alpha which overlaps with the IBB domain PUBMED:8692858.\ 299 IPR006735 \ This family represents several uncharacterised eukaryotic proteins.\ 5573 IPR008436 \ This family consists of several Chlamydia 15 kDa cysteine-rich outer membrane proteins which are associated with differentiation of reticulate bodies (RBs) into elementary bodies (EBs) PUBMED:3066701.\ 905 IPR001906 \

    Sequences containing this domain belong to the terpene synthase family. It has been suggested that this gene family be designated tps (for terpene synthase). Sequence comparisons reveal similarities between the monoterpene (C10) synthases, sesquiterpene (C15) synthases and the diterpene (C20) synthases. It has been split into six subgroups on the basis of phylogeny, called Tpsa-Tpsf PUBMED:9268308.

    \ \ \ \

    In the fungus Phaeosphaeria sp. L487, the synthesis of ent-kaurene from geranylgeranyl diphosphate is catalysed by a single bifunctional protein PUBMED:9268298.

    \ 7031 IPR009853 \

    This family consists of several Caenorhabditis elegans proteins of around 70-75 residues in length. The function of this family is unknown.

    \ 7143 IPR009190 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 572 IPR007747 \ MEN1, the gene responsible for multiple endocrine neoplasia type 1, is a tumour suppressor gene that encodes a protein called Menin which may be an atypical GTPase stimulated by nm23 PUBMED:12145286.\ 7646 IPR012849 \

    The region featured in this family is found towards the N-terminus of a number of adaptor proteins that interact with Abl-family tyrosine kinases PUBMED:12011975. More specifically, it is termed the homeo-domain homologous region (HHR), as it is similar to the DNA-binding region of homeo-domain proteins PUBMED:7590236. Other homeo-domain proteins have been implicated in specifying positional information during embryonic development, and in the regulation of the expression of cell-type specific genes PUBMED:7590236. The Abl-interactor proteins are thought to coordinate the cytoplasmic and nuclear functions of the Abl-family kinases, and seem to be involved in cytoskeletal reorganisation, but their precise role remains unclear PUBMED:12011975.

    \ 1150 IPR003778 \ This domain represents subunit 2 of allophanate hydrolase (AHS2).\ 3780 IPR004279 \ The perilipin family includes lipid droplet-associated protein (perilipin) and adipose differentiation-related protein\ (adipophilin). Perilipin is a modulator of adipocyte lipid metabolism, and adipophilin is involved in the development and maintenance of adipose tissue. Other proteins belonging to this group include TIP47, a cargo selection device for mannose 6-phosphate receptor trafficking PUBMED:9590177.\ 6869 IPR009759 \

    This family consists of several hypothetical bacterial proteins of around 115 residues in length, which seem to be specific to Escherichia coli. The function of this family is unknown.

    \ 3648 IPR003861 \ This is a family of Papillomavirus proteins, E4, encoded by ORF4. A splice variant, E1--E4, exists, but the function of neither E4 nor E1--E4 is known PUBMED:9454695.\ 1628 IPR004945 \ The function of the Coronavirus 6B and 7B proteins is not known.\ 5546 IPR008789 \ This family consists of several highly related Poxvirus sequences which are thought to be intermediate transcription factors PUBMED:1660196.\ 1867 IPR002849 \ This archaebacterial protein family has no known function.\ The proteins are predicted to contain two transmembrane\ helices.\ 2118 IPR007404 \ This is a family of predicted membrane-bound metal-dependent hydrolases, based on .\ 1975 IPR005047 \

    This family consists of proteins found in Caenorhabditis species. There is some evidence to suggest they may be G protein-coupled receptor-like.

    \ 5981 IPR010375 \

    This is a family of uncharacterised bacterial proteins.

    \ 5126 IPR007963 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
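The loose HEXXH motif and its stricter abXHEbbHbc form can be expressed as regular expressions. The Python sketch below is illustrative only: the residue classes chosen for 'a', 'b' and 'c' are assumptions (the text gives only "most often valine or threonine", "uncharged" and "hydrophobic"), and the helper `scan` and the toy sequence are hypothetical.

```python
import re

# Loose form: His-Glu-x-x-His. Pro is excluded from the two x positions
# because Pro is reported never to occur in this helical site.
HEXXH = re.compile(r"(?=(HE[^P][^P]H))")

# Stricter abXHEbbHbc form. The residue classes are ASSUMPTIONS chosen
# for illustration, not a published definition:
#   a = Val or Thr (the text only says "most often")
#   b = uncharged, taken here as any residue except D, E, K, R, H and P
#   c = hydrophobic, taken here as A, C, F, I, L, M, V, W, Y
#   X = any residue except P
A = "[VT]"
B = "[ACFGILMNQSTVWY]"
C = "[ACFILMVWY]"
X = "[^P]"
STRICT = re.compile(rf"(?=({A}{B}{X}HE{B}{B}H{B}{C}))")


def scan(seq: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every overlapping hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "GGVVAHELTHAVTDY"  # hypothetical toy sequence
    print(scan(toy, HEXXH))   # [(6, 'HELTH')]
    print(scan(toy, STRICT))  # [(3, 'VVAHELTHAV')]
```

The strict pattern trades sensitivity for specificity. Because its residue classes are assumptions, they should be adjusted to whatever classification a given analysis actually uses.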

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
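The naming convention for peptidase families can be captured in a small lookup table. The sketch below is a hypothetical helper, not part of any MEROPS software. It maps the first character of a family identifier such as `M61` to the catalytic type and to the nucleophile described in the paragraph above, and it rejects 'P', which the text reserves for mixed clans rather than families.

```python
# First character of a family identifier -> catalytic type.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Serine, threonine and cysteine peptidases use an amino acid as the
# nucleophile and form an acyl intermediate; aspartic, glutamic and
# metallopeptidases use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe_family(family_id: str) -> tuple[str, str]:
    """Return (catalytic type, nucleophile) for an identifier like 'M61'."""
    letter = family_id[:1].upper()
    if letter == "P":
        # 'P' labels clans containing families of more than one type.
        raise ValueError("type P applies to mixed clans, not to families")
    if letter not in CATALYTIC_TYPE:
        raise ValueError(f"unknown catalytic type: {family_id!r}")
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "unknown"
    return CATALYTIC_TYPE[letter], nucleophile


if __name__ == "__main__":
    print(describe_family("M61"))  # ('metallo', 'activated water molecule')
```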

    \ \

    This group of metallopeptidases belongs to the MEROPS peptidase family M61 (glycyl aminopeptidase family, clan MA(E)). The predicted active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH. The type example is glycyl aminopeptidase from Sphingomonas capsulata.

    \ 2738 IPR001540 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 20 comprises enzymes with several known activities: beta-hexosaminidase () and lacto-N-biosidase (). The carbonyl oxygen of the C-2 acetamido group of the substrate acts as the catalytic nucleophile/base in this family of enzymes.

    \ \

    In the brain and other tissues, beta-hexosaminidase A degrades GM2 gangliosides; specifically, the enzyme hydrolyses terminal non-reducing N-acetyl-D-hexosamine residues in N-acetyl-beta-D-hexosaminides. There are 3 forms of beta-hexosaminidase: hexosaminidase A is a trimer, with one alpha, one beta-A and one beta-B chain; hexosaminidase B is a tetramer of two beta-A and two beta-B chains; and hexosaminidase S is a homodimer of alpha chains. The two beta chains are derived from the cleavage of a precursor. Mutations in the beta-chain lead to Sandhoff disease, a lysosomal storage disorder characterised by accumulation of GM2 ganglioside PUBMED:8357844.

    \ 6947 IPR010780 \

    This family consists of several hypothetical, putative lipoproteins of around 80 residues in length. Members of this family seem to be specific to the class Gammaproteobacteria. The function of this family is unknown.

    \ 6931 IPR010775 \

    This family consists of several bacterial and plant proteins of around 250 residues in length. The function of this family is unknown.

    \ 6002 IPR009329 \

    This is a family of bacterial proteins that are related to the hypothetical protein YeeT.

    \ 1562 IPR002020 \ Citrate synthase is a member of a small family of enzymes that can directly\ form a carbon-carbon bond without the presence of metal ion cofactors PUBMED:2337600.\ It catalyses the first reaction in the Krebs cycle PUBMED:2337600:\ acetyl-CoA + oxaloacetate + H2O = citrate + CoA\ This reaction is important in both energy generation and carbon assimilation PUBMED:6343122.\ The reaction proceeds via a non-covalently bound citryl-coenzyme A intermediate, and is thus thought to be a 2-step process PUBMED:2337600. \ The enzyme exists as a globular homodimer (a hexamer in prokaryotes) that \ is almost completely helical (20 helices per monomer), which is unusual for\ such a large enzyme PUBMED:7120407. In\ eukaryotes, there are two isozymes of citrate synthase: one is found in the\ mitochondrial matrix, the second is cytoplasmic. Both seem to be dimers of\ identical chains. Each monomer is divided into a large and a small \ domain, the cleft between these domains forming the active site where both \ citrate and acetyl-coenzyme A have been shown to bind PUBMED:7120407. The enzyme\ undergoes a conformational change upon binding of the oxaloacetate, whereby\ the active site cleft closes over PUBMED:7120407.\ There are a number of regions of sequence similarity between prokaryotic and\ eukaryotic citrate synthases. One of the best conserved of these contains a histidine\ which is one of three residues shown PUBMED:2337600 to be involved in the catalytic\ mechanism of the vertebrate mitochondrial enzyme.\ 1816 IPR006591 \

    DNA-dependent RNA polymerase catalyzes the\ transcription of DNA into RNA using the four\ ribonucleoside triphosphates as substrates. Each class of RNA polymerase is assembled from 9 to 15\ different polypeptides.

    \ \

    Rbp10 (RNA polymerase CX) is a domain found in RNA polymerase subunit 10; present in RNA\ polymerase I, II and III.

    \ 3517 IPR000903 \ Myristoyl-CoA:protein N-myristoyltransferase () (Nmt) PUBMED:8322618 is the enzyme responsible \ for transferring a myristate group to the N-terminal glycine of a number of eukaryotic cellular and \ viral proteins. Nmt is a monomeric protein of about 50 to 60 kDa whose sequence appears to be well \ conserved.\ 2374 IPR004334 \ This family of poxvirus proteins is found in cytoplasmic sites of viral DNA replication PUBMED:10854161. However, its function is\ unknown.\ 1669 IPR000647 \ Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) PUBMED:2504497, PUBMED:2339052 (also\ known as TGGCA-binding proteins) are a family of vertebrate nuclear proteins which recognize and bind, as\ dimers, the palindromic DNA sequence 5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and\ cellular promoters and in the origin of DNA replication of Adenovirus type 2. The CTF/NF-I proteins were\ first identified as nuclear factor I, a collection of proteins that activate the replication of several\ Adenovirus serotypes (together with NF-II and NF-III) PUBMED:6216480. The family of proteins was also\ identified as the CTF transcription factors, before the NFI and CTF families were found to be identical\ PUBMED:3398920. The CTF/NF-I proteins are individually capable of activating transcription and DNA replication.\ In a given species, there is a large number of different CTF/NF-I proteins, generated both by alternative\ splicing and by the occurrence of four different genes. CTF/NF-I proteins contain 400 to 600 amino acids.\ The N-terminal 200 amino-acid sequence, almost perfectly conserved in all species and genes sequenced,\ mediates site-specific DNA recognition, protein dimerization and Adenovirus DNA replication. The C-terminal\ 100 amino acids contain the transcriptional activation domain. This activation domain is the target of gene\ expression regulatory pathways elicited by growth factors and it interacts with basal transcription factors\ and with histone H3 PUBMED:8543151.\ 1277 IPR007849 \ ATP10 is an inner membrane protein essential for the assembly of a functional mitochondrial ATPase complex, possibly by acting as a chaperone molecule PUBMED:2141026.\ 7843 IPR012964 \

    This family of proteins contains many bacterial proteins that are encoded by the unbL gene. The function of these proteins is unknown.

    \ 1269 IPR007471 \

    This entry represents the N-terminal region of the enzyme arginine-tRNA-protein transferase (), which catalyses the post-translational conjugation of arginine to the N terminus of a protein. In eukaryotes, this functions as part of the N-end rule pathway of protein degradation by conjugating a destabilising amino acid to the N-terminal aspartate or glutamate of a protein, targeting the protein for ubiquitin-dependent proteolysis. N-terminal cysteine is sometimes modified PUBMED:9858543. In Saccharomyces cerevisiae, Cys20, 23, 94 and/or 95 are thought to be important for activity PUBMED:7495814. Of these, only Cys94 appears to be completely conserved in this family. The C-terminal region is represented by .

    \ 3370 IPR002917 \ Human HSR1 has been localized to the human MHC class I region and is highly homologous to a putative GTP-binding protein, MMR1, from mouse. These proteins represent a new subfamily of GTP-binding proteins that has both prokaryote and eukaryote members PUBMED:8180467.\ 7997 IPR012560 \

    This is central domain A in proteins of the Ferlin family PUBMED:15112237.

    \ 5331 IPR008787 \

    This family of proteins, which includes vaccinia virus G7L and fowlpox virus FPV120, is associated with the intracellular mature virus particle. The function of this family of proteins is not known.

    \ 3240 IPR007775 \ B144/LST1 is a gene encoded in the human major histocompatibility complex that produces multiple forms of alternatively spliced mRNA and encodes peptides fewer than 100 amino acids in length. B144/LST1 is strongly expressed in dendritic cells. Transfection of B144/LST1 into a variety of cells induces morphologic changes including the production of long, thin filopodia PUBMED:11478849.\ 6424 IPR009518 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbX found in PSII, which is associated with the oxygen-evolving complex. Its expression is light-regulated. PsbX appears to be involved in the regulation of the amount of PSII PUBMED:11202442, and may be involved in the binding or turnover of quinone molecules at the Qb (PsbA) site PUBMED:11230572.

    \ 4047 IPR000109 \ The transport of peptides into cells is a well-documented biological phenomenon\ which is accomplished by specific, energy-dependent transporters found in a number\ of organisms as diverse as bacteria and humans. The PTR family of proteins is distinct\ from the ABC-type peptide transporters and was uncovered by sequence analyses of a\ number of recently discovered peptide transport proteins PUBMED:7476181.\ These proteins seem to be mainly\ involved in the intake of small peptides with the concomitant uptake of a\ proton PUBMED:7817396.\ \

    These integral membrane proteins are predicted to comprise twelve\ transmembrane regions.

    \ 5767 IPR010265 \

    This family consists of a series of phage minor tail proteins and related sequences from several bacterial species.

    \ 2532 IPR003468 \

    Cytochrome cbb3 oxidases are found almost exclusively in Proteobacteria, and represent a distinctive class of proton-pumping respiratory haem-copper oxidases (HCO) that lack many of the key structural features that contribute to the reaction cycle of the intensely studied mitochondrial cytochrome c oxidase (CcO) PUBMED:15100055. Cytochrome cbb3 oxidases are required to support symbiotic nitrogen fixation while ensuring that the oxygen-labile nitrogenase is not compromised. Cytochrome cbb3 oxidases consist of four subunits: FixN (or CcoN), FixO (or CcoO), FixP (or CcoP) and FixQ (or CcoQ). The catalytic core comprises subunits FixN, FixO and FixP, where FixN acts as the catalytic subunit, and FixO and FixP are membrane-bound mono- and di-haem cytochromes c, respectively. The FixQ subunit protects the core complex from proteolytic degradation in the presence of oxygen PUBMED:11864982. This entry represents the mono-haem FixO subunit.

    \ 4520 IPR004093 \ Staphylokinases and streptokinases are not proteases. They are involved in plasminogen activation. The three-dimensional structure of streptokinase is believed to contain two independently folded domains, each homologous to serine proteases PUBMED:6760891.\ 4781 IPR004192 \ The ubiquinol cytochrome c reductase (cytochrome bc1) complex is a respiratory chain complex that generates an electrochemical potential coupled to ATP synthesis. The bc1 complex contains 11 subunits: 3 respiratory subunits (cytochrome b, cytochrome c1 and the Rieske protein), 2 core proteins and 6 low-molecular-weight proteins. Each subunit of the cytochrome bc1 complex provides a single helix (this family) to make up the transmembrane region of the complex.\ 3681 IPR005542 \

    Pbx proteins are members of the TALE (three-amino-acid loop extension) family of atypical homeodomain proteins, whose members are characterized by a three-residue insertion in the loop between the first and second helices of the homeodomain, which is involved in their interaction with Hox proteins. Examination of Pbx1 has shown that, in addition to the homeodomain, a short 16-residue C-terminal tail is essential for maximal cooperative interactions with Hox partners as well as for maximal monomeric binding of Pbx1 to DNA.

    The PBX domain is a bipartite acidic domain PUBMED:1363814.

    \ 7277 IPR010893 \

    This family contains bacterial hydrogenase-1 expression proteins approximately 120 residues long. This includes the Escherichia coli protein HyaE, and the homologous proteins HoxO of R. eutropha and HupG of R. leguminosarum. Deletion of the hoxO gene in R. eutropha led to complete loss of the uptake [NiFe] hydrogenase activity, suggesting that it has a critical role in hydrogenase assembly PUBMED:12914940.

    \ 2102 IPR007382 \ This protein is predicted to be an integral membrane protein.\ 1249 IPR002484 \ Arteriviruses are positive-strand ssRNA viruses with no DNA stage in their replication cycle. This family contains the viral nucleocapsid protein, which encapsidates the viral ssRNA.\ 7433 IPR011462 \

    This entry represents a conserved sequence region found in hypothetical proteins from a wide range of bacteria. Shewanella oneidensis contains multiple members.

    \ 4046 IPR007115 \

    The complex organic chemistry involved in the transformation of GTP to tetrahydrobiopterin is catalysed by only three enzymes: GTP cyclohydrolase I, 6-pyruvoyltetrahydropterin synthase and sepiapterin reductase. Tetrahydrobiopterin is the cofactor for several aromatic amino acid monooxygenases and for the nitric oxide synthases. 6-Pyruvoyl tetrahydropterin synthase (PTPS) PUBMED:8137809 is a Zn(II)-dependent metalloprotein of known crystal structure that transforms dihydroneopterin triphosphate into 6-pyruvoyltetrahydropterin in the presence of Mg(II).

    \ \

    The enzyme is a homohexamer, composed of a dimer of trimers. A transition metal binding site formed by three histidine residues (His23, His48 and His50) is present in each subunit, and bound Zn(II) is responsible for the enzymatic activity. Site-directed mutagenesis of each of these three histidine residues results in a complete loss of metal binding and enzymatic activity PUBMED:7563095, PUBMED:9165069.

    \ \

    The function of the bacterial branch of the sequence lineage appears not to have been established.

    \ \ 2255 IPR006740 \ This family includes a conserved region found in several uncharacterised plant proteins.\ 3488 IPR002186 \

    This family comprises antitumour antibiotic chromoproteins, as represented by neocarzinostatin PUBMED:8235619. These chromoproteins consist of a noncovalently bound, labile enediyne chromophore and its stabilising carrier apoprotein. The chromophore, which has an unusual bicyclic dienediyne structure, intercalates into DNA, where its cycloaromatisation produces a biradical intermediate that can abstract hydrogen atoms from the sugar moiety of DNA. This causes single- and double-strand breaks in the DNA PUBMED:11491295. In addition to their ability to cleave DNA at sites specific for each chromophore, these chromoproteins also appear to possess proteolytic activity against histones, with histone H1 as the preferred substrate PUBMED:9383447.

    \

    Neocarzinostatin has 2 disulphide bridges and is kidney-shaped with 2 defined domains that hold a binding cavity. The larger domain forms a 7-stranded antiparallel beta-barrel and the smaller domain consists of 2 anti-parallel strands of beta sheet that are perpendicular to each other PUBMED:8235619. Other members of this family include macromycin, actinoxanthine, kedarcidin PUBMED:9383447, and C-1027 PUBMED:11491295.

    \ 615 IPR007053 \

    This domain is found in proteins from viruses, bacteria and eukaryotes. The domain contains a well-conserved NCEHF motif. Its function is unknown.

    \ 5977 IPR010374 \

    This is a family of uncharacterised bacterial membrane proteins.

    \ 5928 IPR010354 \

    Members of this family are thought to have structural features in common with the beta chain of the class II antigens, as well as with myosin, and may play an important role in pathogenesis PUBMED:8188369.

    \ 6802 IPR009721 \

    This entry represents the C terminus (approximately 170 residues) of a number of hypothetical plant proteins of unknown function.

    \ 6527 IPR009580 \

    Phospho-ethanolamine N-methyltransferase is involved in glycosylphosphatidylinositol (GPI) anchor biosynthesis PUBMED:12655644.

    \ 4915 IPR002490 \ This family consists of the 116kDa V-type ATPase (vacuolar (H+)-ATPase) subunits, as well as V-type ATP synthase subunit I. V-type ATPases are proton pumps that acidify intracellular compartments in eukaryotic cells, for example yeast central vacuoles, clathrin-coated vesicles and synaptic vesicles. They have important roles in membrane trafficking processes PUBMED:10224039. The 116kDa subunit (subunit a) of the V-type ATPase is part of the V0 functional domain responsible for proton transport. The a subunit is a transmembrane glycoprotein with multiple putative transmembrane helices; it has a hydrophilic N terminus and a hydrophobic C terminus PUBMED:10224039, PUBMED:10340849. It has roles in proton transport and in assembly of the V-type ATPase complex PUBMED:10224039, PUBMED:10340849. In yeast, this subunit is encoded by two homologous genes, VPH1 and STV1 PUBMED:10340849.\ 2256 IPR006750 \

    This family contains uncharacterised bacterial proteins.

    \ 7873 IPR012595 \

    This family consists of PetM, subunit VII of the cytochrome b6f complex. The cytochrome b6f complex consists of 7 subunits and contains 2 b-type haems and 1 chlorophyll a per cytochrome f. It is highly active in transferring electrons from decylplastoquinol to oxidised plastocyanin PUBMED:7493968.

    \ 7226 IPR010872 \

    This family consists of several hypothetical bacterial proteins of around 250 residues in length. Members of this family seem to be found exclusively in Streptomyces coelicolor and Mycobacterium tuberculosis. The function of this family is unknown.

    \ 3863 IPR003138 \ VP1, VP2, VP3 and VP4 are the four basic units that form the icosahedral coat of picornaviruses. Five symmetry-related N termini of coat protein VP4 form a ten-stranded, antiparallel beta barrel around the base of the icosahedral fivefold axis PUBMED:9083115.\ 6372 IPR010541 \

    This entry represents the C terminus of several eukaryotic RWD domain-containing proteins of unknown function.

    \ 4109 IPR002861 \ Extracellular matrix (ECM) proteins play an important role in early cortical development, specifically in the formation of neural connections and in controlling the cytoarchitecture of the central nervous system. The product of the reeler gene in mouse is reelin, a large extracellular protein secreted by pioneer neurons that coordinates cell positioning during neurodevelopment PUBMED:9338784. F-spondin and mindin are a family of matrix-attached adhesion molecules that share structural similarities and overlapping domains of expression. Both F-spondin and mindin promote adhesion and outgrowth of hippocampal embryonic neurons and bind to a putative receptor(s) expressed on both hippocampal and sensory neurons PUBMED:10409509.\ \

    This domain of unknown function is found at the N terminus of reelin and F-spondin (see http://www.bork.embl-heidelberg.de/Modules/07-matrix.gif).

    \ 3911 IPR005031 \ Members of this family of enzymes from Streptomyces spp. are involved in polyketide (linear poly-beta-ketones) synthesis.\ 4302 IPR000769 \

    The Rop protein regulates plasmid DNA replication by modulating the initiation of transcription of the primer RNA precursor. Processing of the precursor, RNAII, is inhibited by hydrogen bonding of RNAII to its complementary sequence in RNAI. Rop increases the affinity of RNAI for RNAII and thus decreases the rate of replication initiation events. The 3D structure of Rop has been determined by X-ray crystallography and refined to 1.7 Å resolution. Rop is a homodimer of 63-residue monomers, each consisting almost entirely of two alpha-helices, the whole molecule forming a highly regular four-alpha-helix bundle PUBMED:3681971. This can be approximated by a four-stranded rope with a radius of 7.0 Å, a left-handed helical twist and a pitch of 172.5 Å. A very compact packing of side chains in the helix interfaces of the Rop coiled-coil structure is presumed to account for its high stability PUBMED:1841691. The overall details of the structure have been confirmed by proton NMR PUBMED:1841691, PUBMED:2223771.
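    The rope radius and pitch quoted above determine how steeply each helix winds around the bundle axis. A minimal sketch, assuming an ideal helical path on a cylinder, so that one turn advances one pitch along the axis while covering 2πr around it (tan(tilt) = 2πr / pitch). It reports only the magnitude of the tilt, ignores handedness and local helix irregularities, and the function name is illustrative only.

```python
import math

# Supercoil parameters quoted for the Rop four-helix bundle, in angstroms.
SUPERCOIL_RADIUS = 7.0
SUPERCOIL_PITCH = 172.5


def helix_tilt_degrees(radius: float, pitch: float) -> float:
    """Angle between an ideal helical path on a cylinder and the cylinder axis.

    One full turn advances `pitch` along the axis while travelling
    2*pi*radius around the circumference, so tan(tilt) = 2*pi*radius / pitch.
    The sign (left- or right-handed twist) is not represented.
    """
    return math.degrees(math.atan2(2 * math.pi * radius, pitch))


tilt = helix_tilt_degrees(SUPERCOIL_RADIUS, SUPERCOIL_PITCH)
print(f"tilt of each helix relative to the bundle axis: {tilt:.1f} degrees")
```

    With these values the printed tilt is about 14.3 degrees, which fits the picture of long, gently supercoiled helices in a compact bundle.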

    \ \ 1382 IPR002662 \ VP2 is the major structural protein of birnaviruses PUBMED:8525637. The large RNA segment of birnaviruses codes for a polyprotein (N-VP2-VP4-VP3-C) PUBMED:2828658.\ 6818 IPR009729 \

    This family consists of several mammalian galactose-3-O-sulfotransferase proteins. Gal-3-O-sulfotransferase is thought to play a critical role in 3'-sulfation of N-acetyllactosamine in both O- and N-glycans PUBMED:11323440.

    \ 407 IPR003789 \ This domain, about 140 amino acid residues long, is found in GatB and in proteins related to bacterial YqeY. In GatB, which transamidates Glu-tRNA(Gln) to Gln-tRNA(Gln), the domain lies at the C terminus. The function of this domain is uncertain; however, its presence in GatB suggests that YqeY and its relatives have a role in tRNA metabolism.\ 7456 IPR011480 \

    This is a family of short hypothetical proteins found in Rhodopirellula baltica.

    \ 3797 IPR004184 \

    Pyruvate formate-lyase (also known as formate C-acetyltransferase) is an enzyme that reversibly converts pyruvate and CoA to acetyl-CoA and formate. In Escherichia coli, it uses a radical mechanism to reversibly cleave the C1-C2 bond of pyruvate, using the Gly734 radical and two cysteine residues (Cys418 and Cys419) PUBMED:10504733.
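    For reference, the overall reaction can be summarised as below. This is the standard net equation only. The double arrow reflects the reversible C1-C2 cleavage described above, and the glycyl and cysteinyl radical intermediates are not shown.

$$\text{pyruvate} + \text{CoA} \rightleftharpoons \text{acetyl-CoA} + \text{formate}$$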

    \ 1522 IPR006189 \

    The CHASE domain is an extracellular domain of 200-230 amino acids, which is found in transmembrane receptors from bacteria, lower eukaryotes and plants. It has been named CHASE (Cyclases/Histidine kinases Associated Sensory Extracellular) because of its presence in diverse receptor-like proteins with histidine kinase and nucleotide cyclase domains. The CHASE domain always occurs N-terminally in extracellular or periplasmic locations, followed by an intracellular tail housing diverse enzymatic signaling domains such as histidine kinase, adenylyl cyclase, GGDEF-type nucleotide cyclase and EAL-type phosphodiesterase domains, as well as non-enzymatic domains such as PAS, GAF, phosphohistidine and response regulatory domains. The CHASE domain is predicted to bind diverse low-molecular-weight ligands, such as cytokinin-like adenine derivatives or peptides, and to mediate signal transduction through the respective receptors PUBMED:11590001.

    \

    \ The CHASE domain has a predicted alpha+beta fold, with two extended alpha-helices at its boundaries and two central alpha-helices separated by beta-sheets. The termini are less conserved than the central part of the domain, which shows strongly conserved motifs.

    \ 1309 IPR005147 \

    Domain B5 is found in phenylalanine-tRNA synthetase beta subunits. This domain has been shown to bind DNA through a winged helix-turn-helix motif PUBMED:11152603. Phenylalanine-tRNA synthetase may influence common cellular processes via DNA binding, in addition to its aminoacylation function.

    \ \ \ 4595 IPR005496 \ This family contains a number of integral membrane proteins, including the TerC protein. TerC has been implicated in resistance to tellurium, and may be involved in the efflux of tellurium ions. The tellurite-resistant Escherichia coli strain KL53 was found during testing of a group of clinical isolates for antibiotic and heavy metal ion resistance PUBMED:10069007. The determinant of the strain's tellurite resistance was located on a large conjugative plasmid, and analyses showed that the genes terB, terC, terD and terE were essential for maintaining this resistance. Members of this family contain a number of conserved aspartates, which may be involved in metal ion binding.\ 8024 IPR012985 \

    This family of fungal proteins is involved in the processing of the membrane-bound transcription factor Stp1 PUBMED:15509782 and belongs to MEROPS peptidase family S64 (clan PA). The processing causes the signalling domain of Stp1 to be passed to the nucleus, where several permease genes are induced. The permeases are important for the uptake of amino acids, and processing of Stp1 only occurs in an amino acid-rich environment. This family is predicted to be distantly related to the trypsin family (MEROPS peptidase family S1) and to have a typical trypsin-like catalytic triad PUBMED:15509782.

    \ 7358 IPR006579 \

    This domain is present in proteins found exclusively in the arthropods, including a number of Drosophila species, the silk moth and the gypsy moth. These proteins are possibly involved in RNA binding or single-strand DNA binding.

    \ 6642 IPR009637 \

    This family represents a conserved region within eukaryotic lung seven-transmembrane receptors and related proteins.

    \ 1012 IPR001594 \ This domain is also known as NEW1 PUBMED:10231582. The DHHC Zn-finger was first isolated in the Drosophila putative transcription factor DNZ1 PUBMED:10231582. The function of this domain is unknown, but it has been predicted to be involved in protein-protein or protein-DNA interactions PUBMED:1892474.\ 1801 IPR003499 \ This family includes proteins that are probably involved in DNA packaging in herpesviruses. This domain is normally found at the N terminus of the protein.\ 338 IPR004240 \ The transmembrane 9 superfamily protein (TM9SF) may function as a channel or small-molecule transporter. Proteins in this group are endosomal integral membrane proteins.\ 5917 IPR010349 \

    This family consists of several bacterial L-asparaginase II proteins. L-asparaginase catalyses the hydrolysis of L-asparagine to L-aspartate and ammonium. Rhizobium etli possesses two asparaginases: asparaginase I, which is thermostable and constitutive, and asparaginase II, which is thermolabile, induced by asparagine and repressed by the carbon source PUBMED:10930734.
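    The hydrolysis described above can be written as a balanced net equation. This assumes the ionisation states usual near neutral pH (aspartate carrying one net negative charge, ammonia present as ammonium), so that both atoms and charge balance.

$$\text{L-asparagine} + \mathrm{H_2O} \longrightarrow \text{L-aspartate} + \mathrm{NH_4^+}$$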

    \ 5417 IPR008486 \ This family consists of several uncharacterised hypothetical proteins from Mesorhizobium loti.\ 213 IPR012308 \

    This region is found in many, but not all, ATP-dependent DNA ligase enzymes. It is thought to be involved in DNA binding and in catalysis. In human DNA ligase I and in the Saccharomyces cerevisiae enzyme, this region was necessary for catalysis and is separated from the amino terminus by targeting elements. In vaccinia virus DNA ligase, this region was not essential for catalysis, but its deletion decreased the affinity for nicked DNA and decreased the rate of strand joining at a step subsequent to enzyme-adenylate formation PUBMED:9016621.

    \ 6411 IPR010560 \

    This entry represents the C terminus of eukaryotic neogenin precursor proteins, which contains several potential phosphorylation sites PUBMED:9121761. Neogenin is a member of the N-CAM family of cell adhesion molecules (and therefore contains multiple copies of and ) and is closely related to the DCC tumour suppressor gene product - these proteins may play an integral role in regulating differentiation programmes and/or cell migration events within many adult and embryonic tissues PUBMED:9264410.

    \ 3775 IPR005322 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    This group of peptidases belongs to MEROPS peptidase family U34 (dipeptidase A family, clan U-), which appears to consist mainly of dipeptidases PUBMED:8766699.
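    The clan and family naming scheme described at the start of this entry can be decoded mechanically: the first character gives the catalytic type, and P is reserved for clans that mix types. The sketch below is minimal and assumes family identifiers made of one type letter, a family number and an optional subfamily letter (e.g. "U34", "S1A"). The function names and the regular expression are hypothetical, and codes added in later MEROPS releases are not covered.

```python
import re

# Catalytic-type letters as listed in the entry text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Clans whose families span more than one catalytic type are of type "P".
CLAN_TYPES = {**CATALYTIC_TYPES, "P": "mixed"}

# Assumed family format: type letter, family number, optional subfamily letter.
FAMILY_PATTERN = re.compile(r"([ACGMSTU])(\d+)([A-Z]?)")


def family_catalytic_type(family_id: str) -> str:
    """Return the catalytic type encoded by a family identifier such as 'C25'."""
    match = FAMILY_PATTERN.fullmatch(family_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    return CATALYTIC_TYPES[match.group(1)]


def clan_catalytic_type(clan_id: str) -> str:
    """Return the catalytic type encoded by the first character of a clan such as 'PA'."""
    clan = clan_id.strip().upper()
    if not clan or clan[0] not in CLAN_TYPES:
        raise ValueError(f"unrecognised clan identifier: {clan_id!r}")
    return CLAN_TYPES[clan[0]]


for family in ("U34", "C25", "S64", "C2"):
    print(family, family_catalytic_type(family))
print("PA", clan_catalytic_type("PA"))
```

    Separating family and clan parsing mirrors the distinction in the text. A family always has a single catalytic type, whereas only a clan can be of the mixed type P.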

    \ 1074 IPR001095 \ This protein is a subunit of the acetyl coenzyme A carboxylase complex, which catalyses the first step in the synthesis of long-chain fatty acids: the carboxylation of acetyl-CoA to malonyl-CoA. The acetyl-CoA carboxylase complex is a heterohexamer of biotin carboxyl carrier protein, biotin carboxylase and two non-identical carboxyl transferase subunits (alpha and beta) in a 2:2 association PUBMED:1355089. The reaction involves two steps:\ \ \ 8079 IPR013186 \

    The soybean early nodulin 40 (ENOD40) mRNA contains two short overlapping ORFs; in vitro translation yields two peptides of 12 and 24 amino acids PUBMED:11842184. ENOD40 genes have been proposed to function in organogenesis, for example in inducing the cortical cell divisions that lead to the initiation of nodule primordia, and in developing lateral roots and embryonic tissues. This supports the hypothesis that ENOD40 has a role in lateral organ development PUBMED:12114565.

    \ 3290 IPR002866 \ This region is found in potential plant maturases, which probably assist in splicing chloroplast group II introns. The function of this region is unknown PUBMED:8255751.\ 2291 IPR007004 \ This is a family of hypothetical proteins, the majority of which are from Beet necrotic yellow vein virus.\ 2775 IPR001701 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 9 comprises enzymes with several known activities, including endoglucanase and cellobiohydrolase. These enzymes were formerly known as cellulase family E.

    \ 8010 IPR012563 \

    This family consists of GnsA and GnsB, which are multicopy suppressors of the secG null mutation. These proteins participate in the synthesis of phospholipids, suggesting a functional relationship between SecG and membrane phospholipids. Overexpression of gnsA and gnsB causes a remarkable increase in the unsaturated fatty acid content. However, the gnsA-gnsB double null mutant shows no apparent effect. Both proteins are predicted to possess a helix-turn-helix structure PUBMED:11544213.

    \ 924 IPR005116 \

    The TOBE domain PUBMED:10829230 (Transport-associated OB) always occurs as a dimer, because the C-terminal strand of each domain is supplied by the partner. It is probably involved in the recognition of small ligands such as molybdenum and sulphate, and is found in ABC transporters immediately after the ATPase domain.

    \ 1643 IPR011759 \

    Cytochrome c oxidase PUBMED:6307356, PUBMED:8083153 is an oligomeric enzymatic complex that is a component of the respiratory chain and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane. The enzyme complex consists of 3-4 subunits in prokaryotes and up to 13 polypeptides in mammals.
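    The electron transfer from cytochrome c to oxygen can be summarised by the standard net equation below. It is written with reduced and oxidised cytochrome c shown by the haem iron charge, and it includes only the protons consumed in forming water. The additional protons pumped across the membrane are not shown.

$$4\,\text{cyt}\,c^{2+} + \mathrm{O_2} + 4\,\mathrm{H^+} \longrightarrow 4\,\text{cyt}\,c^{3+} + 2\,\mathrm{H_2O}$$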

    \

    Subunit 2 (CO II) transfers the electrons from cytochrome c to the catalytic subunit 1. It contains two adjacent transmembrane regions in its N terminus, and the major part of the protein is exposed to the periplasmic space (in prokaryotes) or to the mitochondrial intermembrane space (in eukaryotes). CO II provides the substrate-binding site and contains a copper center called Cu(A), probably the primary acceptor in cytochrome c oxidase. An exception is the corresponding subunit of the cbb3-type oxidase, which lacks the copper A redox center. Several bacterial CO II have a C-terminal extension that contains a covalently bound heme c.

    \

    The N-terminal domain of cytochrome c oxidase subunit II contains two transmembrane alpha-helices.

    \ 4961 IPR001007 \ The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins PUBMED:8412987, PUBMED:8145250, PUBMED:1864378. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport, and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation and signal transduction), involving interaction with a large array of ligands PUBMED:8412987. A number of human diseases arise from mutations in VWA domains. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands PUBMED:8145250. The domain is named after the von Willebrand factor (VWF) type C repeat, which is found in multidomain, multifunctional proteins involved in maintaining homeostasis PUBMED:3495268, PUBMED:1864378. For von Willebrand factor, the duplicated VWFC domain is thought to participate in oligomerization, but not in the initial dimerization step PUBMED:2007623. The presence of this region in a number of other complex-forming proteins points to the possible involvement of the VWFC domain in complex formation.\ 1891 IPR003743 \

    This entry describes proteins of unknown function.

    \ 1109 IPR005041 \ Adenoviruses (Ads) have evolved multiple mechanisms to evade the host immune response. Several of the immunomodulatory Ad proteins are encoded in early transcription unit 3 (E3). The E3B region is highly conserved among human Ads and codes for three proteins called 10.4K, 14.5K and 14.7K. The E3/10.4K, 14.5K and 14.7K proteins can protect cells from tumor necrosis factor alpha-mediated lysis.\ 3724 IPR001300 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C2 (calpain family, clan CA). A type example is calpain, an intracellular protease involved in many important cellular functions that are regulated by calcium PUBMED:2539381. The protein is a complex of 2 polypeptide chains (light and heavy), with three known forms in mammals PUBMED:7845226, PUBMED:2555341: a highly calcium-sensitive (i.e. micromolar range) form known as mu-calpain, mu-CANP or calpain I; a form sensitive to calcium in the millimolar range, known as m-calpain, m-CANP or calpain II; and a third form, known as p94, which is found in skeletal muscle only PUBMED:2555341.

    \ \

    All forms have identical light but different heavy chains. Both mu- and m-calpain are heterodimers containing an identical 28-kDa subunit and an 80-kDa subunit that shares 55-65% sequence homology between the two proteases PUBMED:7845226, PUBMED:2539381. The crystallographic structure of m-calpain reveals six "domains" in the 80-kDa subunit:

    \ \
    1. A 19-amino acid NH2-terminal sequence;
    2. Active site domain IIa;
    3. Active site domain IIb.

      Domain II shows low levels of sequence similarity to papain; although the catalytic His has not been located by biochemical means, it is likely that calpain and papain are related PUBMED:7845226.

    4. Domain III;
    5. An 18-amino acid extended sequence linking domain III to domain IV;
    6. Domain IV, which resembles the penta EF-hand family of polypeptides, binds calcium and regulates activity PUBMED:7845226. Ca2+ binding causes a rearrangement of the protein backbone, the net effect of which is that a Trp side chain, which acts as a wedge between catalytic domains IIa and IIb in the apo state, moves away from the active site cleft, allowing proper formation of the catalytic triad PUBMED:11914728.
    \ \ \

    Calpain-like mRNAs have been identified in other organisms, including bacteria, but the molecules encoded by these mRNAs have not been isolated, so little is known about their properties. How calpain activity is regulated in the cells of these organisms is still unclear. In metazoans, the activity of calpain is controlled by a single proteinase inhibitor, calpastatin. The calpastatin gene can produce eight or more calpastatin polypeptides ranging from 17 to 85 kDa by use of different promoters and alternative splicing events. The physiological significance of these different calpastatins is unclear, although all bind to three different sites on the calpain molecule; binding to at least two of the sites is Ca2+ dependent. The calpains ostensibly participate in a variety of cellular processes, including remodelling of cytoskeletal/membrane attachments, different signal transduction pathways and apoptosis. Deregulated calpain activity following loss of Ca2+ homeostasis results in tissue damage in response to events such as myocardial infarcts, stroke and brain trauma PUBMED:12843408.

    \ \ 4407 IPR002317 \

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology PUBMED:2203971. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold and are mostly monomeric PUBMED:10673435, while class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices PUBMED:8364025, and are mostly dimeric or multimeric. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II PUBMED:.
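    The class partition listed above amounts to a lookup table, together with the preferred ribose hydroxyl for each class. This is a minimal sketch that assumes three-letter amino acid codes as keys. The names are hypothetical, and the table follows the text's assignment only. It does not model known exceptions such as the class I lysyl-tRNA synthetases found in some archaea and bacteria.

```python
# Class assignment of the 20 aminoacyl-tRNA synthetases, keyed by the
# three-letter code of the cognate amino acid, following the text above.
CLASS_I = frozenset({"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"})
CLASS_II = frozenset({"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"})

# Ribose hydroxyl to which each class preferentially attaches the aminoacyl group.
PREFERRED_HYDROXYL = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for a three-letter amino acid code (case-insensitive)."""
    code = amino_acid.strip().capitalize()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise KeyError(f"unknown amino acid code: {amino_acid!r}")


for aa in ("Ser", "Trp", "Lys"):
    cls = synthetase_class(aa)
    print(aa, "class", cls, "->", PREFERRED_HYDROXYL[cls])
```

    For example, `synthetase_class("Ser")` returns `"II"`, which is consistent with seryl-tRNA synthetase being placed in class IIa below.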

    \ \

    The 10 class I synthetases are considered to have in common the catalytic domain structure based on the Rossmann fold, which is totally different from the class II catalytic domain structure. The class I synthetases are further divided into three subclasses, a, b and c, according to sequence homology. tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases.

    \ \

    Class II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present PUBMED:8274143, PUBMED:2053131, PUBMED:1852601.

    \ \

    Seryl-tRNA synthetase is a homodimer and belongs to class IIa PUBMED:7540217.

    \ 3643 IPR003822 \

    This family contains the paired amphipathic helix (PAH) repeat. The family contains the eukaryotic Sin3 proteins, which have at least three PAH domains (PAH1, PAH2, and PAH3). Sin3 proteins are components of a co-repressor complex that silences transcription, playing important roles in the transition between proliferation and differentiation. Sin3 proteins are recruited to the DNA by various DNA-binding transcription factors such as the Mad family of repressors, Mnt/Rox, PLZF, MeCP2, p53, REST/NRSF, MNFbeta, Sp1, TGIF and Ume6 PUBMED:11101889. Sin3 acts as a scaffold protein that in turn recruits histone-binding proteins RbAp46/RbAp48 and histone deacetylases HDAC1/HDAC2, which deacetylate the core histones resulting in a repressed state of the chromatin PUBMED:14705930. The PAH domains are protein-protein interaction domains through which Sin3 fulfils its role as a scaffold. The PAH2 domain of Sin3 can interact with a wide range of unrelated and structurally diverse transcription factors that bind using different interaction motifs. For example, the Sin3 PAH2 domain can interact with the unrelated Mad and HBP1 factors using alternative interaction motifs that involve binding in opposite helical orientations PUBMED:15235594.

    \ 356 IPR004113 \

    Some oxygen-dependent oxidoreductases are flavoproteins that contain a covalently bound FAD group which is attached to a histidine via an 8-alpha-(N3-histidyl)-riboflavin linkage. The region around the histidine that binds the FAD group is conserved in these enzymes (see ).

    \ 2327 IPR007818 \ This is a family of plant proteins of unknown function.\ 2093 IPR002723 \

    The proteins in this prokaryotic family have not been characterized. All members are 350-400 amino acids long.

    \ 3728 IPR005536 \

    This domain is found in almost all members of MEROPS peptidase family C25 (clan CD). Peptidase family C25 is found in the bacterium Porphyromonas gingivalis (Bacteroides gingivalis), a Gram-negative anaerobic species strongly associated with adult periodontitis. One of its distinguishing characteristics and putative virulence properties is the ability to agglutinate erythrocytes PUBMED:8926061. It is a highly proteolytic organism that metabolises small peptides and amino acids. Indirect evidence suggests that the proteases produced by this microorganism constitute an important virulence factor PUBMED:1322368. Protease-encoding genes have been shown to contain multiple copies of repeated nucleotide sequences. These conserved sequences have also been found in haemagglutinin genes PUBMED:9632563.

    \ 1518 IPR001099 \ Synonym(s): Chalcone synthase, Flavanone synthase, 6'-deoxychalcone synthase \

    Naringenin-chalcone synthases () and stilbene synthases (STS) (formerly known as resveratrol synthases) are related plant enzymes. Chalcone synthase (CHS) is an important enzyme in flavonoid biosynthesis and STS is a key enzyme in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or a stilbene (with STS) PUBMED:.

    \ \

    These enzymes have a conserved cysteine residue, located in the central section of the protein sequence, which is essential for the catalytic activity of both enzymes and probably represents the binding site for the 4-coumaroyl-CoA group PUBMED:2033084.

    \ 5907 IPR009283 \

    This family consists of several eukaryotic apyrase proteins (). The salivary apyrases of blood-feeding arthropods are nucleotide hydrolysing enzymes implicated in the inhibition of host platelet aggregation through the hydrolysis of extracellular adenosine diphosphate PUBMED:12234496.

    \ 1608 IPR003181 \ The virus capsid is composed of 60 icosahedral units, each of which contains one copy of each of the two coat proteins. This family contains the large coat protein (LCP) PUBMED:1546463 of the Comoviridae viral family.\ 5657 IPR008393 \ This family consists of several adenovirus late L2 mu core protein or protein X sequences PUBMED:3357209.\ 4106 IPR004590 \

    All proteins in this family for which functions are known bind single-stranded DNA and are involved in the pairing of homologous DNA. RecT from Escherichia coli is a homotetramer that binds to single-stranded DNA, promotes the renaturation (annealing) of complementary single-stranded DNA and also plays a role in recombination. It can also catalyze the formation of joint molecules PUBMED:12169595.
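    The following sketch makes the notion of "complementary single-stranded DNA" concrete. It checks whether two single strands, both written 5'->3', could pair base for base over their full length. This is only an illustration of sequence complementarity. It does not model how RecT itself promotes annealing, and the function names are hypothetical.

```python
# Watson-Crick pairing for DNA bases (A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Reverse complement of a DNA strand written 5'->3' (hypothetical helper)."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True if two 5'->3' strands can pair base for base over their whole length."""
    return len(strand_a) == len(strand_b) and reverse_complement(strand_a) == strand_b.upper()
```

    Because both strands are written 5'->3', the partner strand must equal the *reverse* complement of the first strand, not just its complement. This is why the translation step is followed by a reversal.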

    \ 996 IPR003856 \

    A number of related proteins are involved in the synthesis of lipopolysaccharide, O-antigen polysaccharide, capsule polysaccharide and exopolysaccharides. Chain length determinant protein (or Wzz protein) is involved in lipopolysaccharide (LPS) biosynthesis, conferring a modal distribution of chain length on the O-antigen component of LPS PUBMED:9573151. It reduces the number of short-chain molecules and increases the number of longer molecules, with a modal value of 20. The MPA/MPA2 proteins function in capsular polysaccharide (CPS) and exopolysaccharide (EPS) polymerization and export PUBMED:10658645.

    \ 6793 IPR004670 \

    The Escherichia coli Na+:H+ antiporter NhaA probably functions in the regulation of internal pH when the external pH is alkaline. It also uses the H+ gradient to expel Na+ from the cell. Its activity is highly pH-dependent.

    \ \ 5391 IPR008834 \ This family consists of several SpvD plasmid virulence proteins from different Salmonella species.\ 5094 IPR007931 \

    This family contains several Drosophila proteins of unknown function.

    \ 745 IPR000817 \

    Prion protein (PrP-c) PUBMED:2572197, PUBMED:1916104, PUBMED:2908696 is a small glycoprotein found in high quantity in the brains of animals infected with certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform encephalopathy (BSE), and the human dementias Creutzfeldt-Jakob disease (CJD) and Gerstmann-Straussler syndrome (GSS). PrP-c is encoded in the host genome and is expressed in both normal and infected cells. During infection, however, the PrP-c molecule becomes altered (conformationally rather than at the amino acid level) to an abnormal isoform, PrP-sc. In detergent-treated brain extracts from infected individuals, fibrils composed of polymers of PrP-sc, namely scrapie-associated fibrils or prion rods, can be observed by electron microscopy. The precise function of the normal PrP isoform in healthy individuals remains unknown. Several results, mainly obtained in transgenic animals, indicate that PrP-c might play a role in long-term potentiation, in sleep physiology, in oxidative burst compensation (PrP can bind four Cu2+ ions through its octarepeat domain), in interactions with the extracellular matrix (PrP-c can bind to the precursor of the laminin receptor, LRP), in apoptosis and in signal transduction (costimulation of PrP-c induces a modulation of Fyn kinase phosphorylation) PUBMED:12354606.

    The normal isoform, PrP-c, is anchored at the cell membrane, in rafts, through a glycosyl phosphatidylinositol (GPI) moiety; its half-life at the cell surface is 5 h, after which the protein is internalised through a caveolae-dependent mechanism and degraded in the endolysosome compartment. Conversion between PrP-c and PrP-sc likely occurs during the internalisation process.

    In humans, PrP is a 253 amino acid protein with a molecular weight of 35-36 kDa. It has two hexapeptides and repeated octapeptides at the N terminus, a disulphide bond, and is associated at the C terminus with a GPI, which anchors it to the external face of the cell membrane. The secondary structure of PrP-c is mainly composed of alpha-helices, whereas that of PrP-sc is mainly beta-sheet: transconformation of alpha-helices into beta-sheets has been proposed as the structural basis by which PrP acquires pathogenicity in transmissible spongiform encephalopathies (TSEs). The three-dimensional structure shows the protein to consist of a globular domain, which includes three alpha-helices and two small antiparallel beta-sheet structures, and a long flexible tail whose conformation depends on the biophysical parameters of the environment. Crystals of the globular domain of PrP have recently been obtained; their analysis suggests a possible dimerisation of the protein through three-dimensional swapping of the C-terminal helix 3 and rearrangement of the disulphide bond.

    \ 1920 IPR003810 \ Uncharacterized domain in proteins of unknown function.\ 4083 IPR001405 \ RadC is a DNA repair protein found in many bacteria. Members share a region of similarity at the C terminus that contains two perfectly conserved histidine residues.\ 6193 IPR009420 \

    This family consists of several Enterobacterial FlhE flagellar proteins. The exact function of this family is unknown PUBMED:9387224.

    \ 2861 IPR002522 \

    The Hepatitis C virus has a ssRNA genome. The virion is a nucleocapsid covered by a lipoprotein envelope consisting of two proteins, protein M and glycoprotein E. The nucleocapsid is a complex of protein C and mRNA.

    \ 2769 IPR001722 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 7 comprises enzymes with two known activities: endoglucanase () and cellobiohydrolase (). These enzymes were formerly known as cellulase family C.

    \ \

    Exoglucanases and cellobiohydrolases PUBMED:1886523 play a role in the conversion of cellulose to glucose by cutting the disaccharide cellobiose from the non-reducing end of the cellulose polymer chain. Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) via a linker region that is rich in proline and/or hydroxy-amino acids. In type I exoglucanases, the CBD is found at the C terminus of the enzyme (this short domain forms a hairpin loop structure stabilised by two disulphide bridges).

    \ 7320 IPR011120 \

    Neutral trehalases mobilise trehalose accumulated by fungal cells as a protective and storage carbohydrate. This family represents a calcium-binding domain similar to EF hand. Residues 97 and 108 in have been implicated in this interaction. It is thought that this domain may provide a general mechanism for regulating neutral trehalase activity in yeasts and filamentous fungi PUBMED:12943532.

    \ 7857 IPR012524 \

    This family consists of antimicrobial peptides produced by bees. These peptides have strong antimicrobial and some antifungal activity, and are homologous to abaecin, the largest proline-rich antimicrobial peptide isolated from the European bumblebee Bombus pascuorum PUBMED:9219367.

    \ 6365 IPR010538 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 7926 IPR012629 \

    This family consists of conotoxins isolated from the venom of the cone snails Conus tulipa and Conus geographus. Conotoxin TVIIA, isolated from Conus tulipa, displays little sequence homology with other well-characterised pharmacological classes of peptides, but is similar to conotoxin GS, a peptide from Conus geographus. Both peptides block skeletal muscle sodium channels, share several biochemical features and represent a distinct subgroup of the four-loop conotoxins PUBMED:10903496.

    \ 637 IPR007230 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, histidine as a base, and aspartate orients the histidine and stabilises its protonated form PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
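    The clan-specific ordering of the triad can be written as a small table. The sketch below records the three orderings given above and shows that they are permutations of the same three residues. Only the linear order in the sequence differs between clans. The table contains only the three clans named in the text, and the helper names are hypothetical.

```python
from __future__ import annotations

# Linear order (N- to C-terminal) of the catalytic triad in three serine
# peptidase clans, as given in the text.
TRIAD_ORDER = {
    "SA": "HDS",  # chymotrypsin clan
    "SB": "DHS",  # subtilisin clan
    "SC": "SDH",  # carboxypeptidase C clan
}

RESIDUE_NAMES = {"H": "histidine", "D": "aspartate", "S": "serine"}


def triad_in_sequence_order(clan: str) -> list[str]:
    """Return the triad residue names for `clan`, N-terminal residue first."""
    return [RESIDUE_NAMES[code] for code in TRIAD_ORDER[clan]]


def same_residue_set(clan_a: str, clan_b: str) -> bool:
    """True if two clans use the same triad residues, regardless of their order."""
    return sorted(TRIAD_ORDER[clan_a]) == sorted(TRIAD_ORDER[clan_b])
```

    For instance, `triad_in_sequence_order("SB")` lists aspartate, histidine and serine in that order, whereas `same_residue_set("SA", "SB")` is true. This matches the point that clans differ in residue order but not in residue identity.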

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of a catalytic amino acid residue as a nucleophile and form an acyl-enzyme intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
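    Because the first character of a family identifier encodes the catalytic type, the type and nucleophile class of a family such as S59 or C25 can be read directly from its name. The following is a minimal sketch. It assumes family identifiers consist of one type letter followed by digits. Subfamily suffixes and two-letter clan identifiers such as SC or CD are deliberately not parsed. The function names are hypothetical and not part of any MEROPS interface.

```python
import re

# First letter of a MEROPS family identifier -> catalytic type, per the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Grouping of catalytic types by nucleophile, as described in the text.
SIDE_CHAIN_NUCLEOPHILE = {"serine", "threonine", "cysteine"}
WATER_NUCLEOPHILE = {"aspartic", "glutamic acid", "metallo"}

# Assumed identifier shape: one type letter followed by digits, e.g. "S59".
FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)")


def family_catalytic_type(family_id: str) -> str:
    """Return the catalytic type encoded in a family identifier such as 'C25'."""
    match = FAMILY_ID.fullmatch(family_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    return CATALYTIC_TYPES[match.group(1)]


def nucleophile(family_id: str) -> str:
    """Classify the nucleophile used by peptidases of a given family."""
    kind = family_catalytic_type(family_id)
    if kind in SIDE_CHAIN_NUCLEOPHILE:
        return "amino acid side chain (acyl-enzyme intermediate)"
    if kind in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "unknown"
```

    Type P is omitted from the table because, as stated above, it describes clans with mixed families rather than individual families.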

    \ \

    This group of autocatalytic serine endopeptidases belongs to MEROPS peptidase family S59 (clan SP).

    \

    The nuclear pore complex proteins in this group play a role in bidirectional nucleocytoplasmic transport across the nuclear pore complex. The mammalian nuclear pore complex (NPC) is composed of approximately 50 unique proteins, collectively known as nucleoporins. Several of these are synthesised as precursors that undergo self-catalyzed cleavage.

    \ \

    The proteolytic cleavage site of yeast Nup145p has been mapped upstream of an evolutionarily conserved serine residue. Cleavage occurs at the same site when a precursor is artificially expressed in Escherichia coli. A hydroxyl-containing residue is critical for the reaction, although a thiol-containing residue offers an acceptable replacement. In vitro kinetics experiments using a purified precursor molecule demonstrate that the cleavage is self-catalyzed and that the catalytic domain lies within the N-terminal moiety. Taken together, the data are consistent with a proteolytic mechanism involving an N-to-O acyl rearrangement and a subsequent ester intermediate, as uncovered in other self-processing proteins PUBMED:10542288.

    \ \

    Nup98 is a component of the nuclear pore that plays its primary role in the export of RNAs. Nup98 is expressed in two forms, derived from alternative mRNA splicing. Both forms are processed into two peptides through autoproteolysis mediated by the C-terminal domain of hNup98. The three-dimensional structure of the C-terminal domain reveals a novel protein fold, and thus a new class of autocatalytic proteases. The structure further reveals that the suggested nucleoporin RNA-binding motif is unlikely to bind RNA PUBMED:12191480.

    \ 1424 IPR005169 \

    Helicobacter pylori infection is one of the most common infections worldwide and plays an important role in the pathogenesis of peptic ulcers. The CagA (cytotoxin-associated gene A) protein is a cell-surface antigen which may play a role in determining the relative virulence of bacterial strains.

    \ 4755 IPR002501 \ Members of this family are involved in modifying bases in RNA\ molecules. They carry out the conversion of uracil bases to\ pseudouridine. This family includes TruB, a pseudouridylate synthase\ that specifically converts uracil 55 to pseudouridine in most tRNAs.\ This family also includes Cbf5p that modifies rRNA PUBMED:9472021.\ 6615 IPR009625 \

    This is a group of proteins of unknown function.

    \ 6035 IPR009343 \

    This protein family has no known function. Its members are about 300 amino acids in length. It has so far been detected in Firmicute bacteria and some archaebacteria.

    \ 5093 IPR007930 \

    This family contains several uncharacterised proteins found exclusively in Arabidopsis thaliana.

    \ 2783 IPR005076 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 6 comprises enzymes with three known activities: alpha-1,3-galactosyltransferase (); alpha-1,3-N-acetylgalactosaminyltransferase (); and alpha-galactosyltransferase ().

    \ 7790 IPR012887 \

    In the salvage pathway of GDP-L-fucose, free cytosolic fucose is phosphorylated by L-fucokinase to form L-fucose-1-phosphate, which is then converted to GDP-L-fucose in the reaction catalysed by GDP-L-fucose pyrophosphorylase PUBMED:14686921.

    \ 6776 IPR009710 \

    This family consists of several hypothetical archaeal and one bacterial protein of around 115 residues in length. The function of this family is unknown.

    \ 5404 IPR008761 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, histidine as a base, and aspartate orients the histidine and stabilises its protonated form PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of the identifier representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of a catalytic amino acid residue as a nucleophile and form an acyl-enzyme intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \ This group of serine peptidases belongs to MEROPS peptidase family S37 (clan SC). The members of this group of secreted peptidases are restricted to bacteria. In Streptomyces lividans 66 the peptidase removes tripeptides from the N terminus of extracellular proteins (tripeptidyl aminopeptidase, Tap) PUBMED:8920189, PUBMED:7487044.\ \ \ 2081 IPR007335 \ This is a family of uncharacterised proteins.\ 3003 IPR002068 \

    Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp) PUBMED:2853609. Among them is a family of proteins with an average molecular weight of 20 kDa, known as the hsp20 proteins PUBMED:7925426. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large hetero-oligomeric aggregates. Structurally, this family is characterized by the presence of a conserved C-terminal domain of about 100 residues.

    \ 126 IPR003544 \ Within mitochondria and bacteria, a family of related proteins is involved in the assembly of periplasmic c-type cytochromes: these include CycK PUBMED:7665469, CcmF PUBMED:7635817, PUBMED:9043133, NrfE PUBMED:8057835 and CcbS PUBMED:8389979. These proteins may play a role in guiding apocytochromes and haem groups for their covalent linkage by the cytochrome-c-haem lyase. Members of the family are probably integral membrane proteins, with up to 16 predicted transmembrane (TM) helices. \ \

    The gene products of the hel and ccl loci have been shown to be required specifically for the biogenesis of c-type cytochromes in the Gram-negative photosynthetic bacterium Rhodobacter capsulatus PUBMED:1310666. Genetic and molecular analyses show that the hel locus contains at least four genes, helA, helB, helC and orf52. HelA is similar to the ABC transporters, and helA, helB and helC are proposed to encode an export complex PUBMED:8057835. It is believed that the hel-encoded proteins are required for the export of haem to the periplasm, where it is subsequently ligated to the c-type apocytochromes PUBMED:1310666. However, although CcmB and CcmC have the potential to interact with CcmA, and the three gene products probably associate to form a complex with (CcmA)2-CcmB-CcmC stoichiometry, the substrate for the putative CcmABC transporter is probably neither haem nor c-type apocytochromes PUBMED:9043133. Hydropathy analysis suggests the presence of six TM domains.

    \ 608 IPR006153 \

    The monovalent cation:proton antiporter-1 (CPA1) family is a large family of proteins derived from Gram-positive and Gram-negative bacteria, blue-green bacteria (cyanobacteria), yeast, plants and animals. Transporters from eukaryotes have been functionally characterized, and all of these catalyze Na+:H+ exchange. Their primary physiological functions may be in:

  • cytoplasmic pH regulation, extruding the H+ generated during metabolism, and
  • salt tolerance (in plants), due to Na+ uptake into vacuoles.
    \

    Na+/H+ exchange proteins eject protons from cells, effectively eliminating excess acid from actively metabolising cells. Na+/H+ exchange activity is also crucial for the regulation of cell volume and for the reabsorption of NaCl across renal, intestinal and other epithelia. These antiporters exchange Na+ for H+ in an electroneutral manner, and this activity is carried out by a family of Na+/H+ exchangers, or NHEs, which are known to be present in both prokaryotic and eukaryotic cells. In mammalian cells, Na+/H+ exchange activity is found in both the plasma membrane and the inner mitochondrial membrane. To date, six mammalian isoforms have been identified (designated NHE1-NHE6) PUBMED:9278382, PUBMED:9507001. These exchangers are highly regulated (glyco)phosphoproteins which, based on their primary structure, appear to contain 10-12 membrane-spanning regions (M) at the N terminus and a large cytoplasmic region at the C terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 and M7 regions are highly conserved and are therefore thought to be involved in the transport of sodium and hydrogen ions. The cytoplasmic region has little similarity across the family. There is some evidence that the exchangers may exist in the cell membrane as homodimers, but little is currently known about the mechanism of their antiport PUBMED:9537504.

    \ \ 7744 IPR012465 \

    This family is composed of uncharacterized proteins expressed by Methanopyrus kandleri, a hyperthermophilic archaeon.

    \ 5476 IPR008733 \ This family consists of several peroxisomal biogenesis factor 11 (PEX11) proteins from several eukaryotic species. The PEX11 peroxisomal membrane proteins promote peroxisome division in multiple eukaryotes PUBMED:12417726.\ 4476 IPR001045 \ Synonym(s): Spermidine aminopropyltransferase\

    This is a group of polyamine biosynthetic enzymes involved in the fifth (last) step in the biosynthesis of spermidine from arginine and methionine, which includes spermidine synthase (), spermine synthase () and putrescine N-methyltransferase () PUBMED:9517003.

    \

    The Thermotoga maritima spermidine synthase monomer consists of two domains: an N-terminal domain composed of six beta-strands, and a Rossmann-like C-terminal domain PUBMED:11731804. The larger C-terminal catalytic core domain consists of a seven-stranded beta-sheet flanked by nine alpha-helices. This domain resembles a topology observed in a number of nucleotide- and dinucleotide-binding enzymes, and in S-adenosyl-L-methionine (AdoMet)-dependent methyltransferases (MTases) PUBMED:11731804.

    \ \ \ 7170 IPR009942 \

    This family consists of several bacterial proteins of around 100 residues in length. Members of this family seem to be found exclusively in Staphylococcus aureus. The function of this family is unknown.

    \ 3661 IPR004897 \ Paramyxoviral P genes are able to generate more than one product, using alternative reading frames and RNA editing. The P gene encodes the structural phosphoprotein P. In addition, it encodes several non-structural proteins present in the infected cell but not in the virus particle. This family includes phosphoprotein P and the non-structural phosphoprotein V from different paramyxoviruses. Phosphoprotein P is essential for the activity of the RNA polymerase complex, which it forms with another subunit, L. Although all the catalytic activities of the polymerase are associated with the L subunit, its function requires specific interactions with phosphoprotein P PUBMED:11336555. The P and V phosphoproteins share the same N terminus but diverge at their C termini. This difference is generated by an RNA-editing mechanism in which one or two non-templated G residues are inserted into P-gene-derived mRNA. In measles virus and Sendai virus, one G residue is inserted and the edited transcript encodes the V protein. In mumps, simian virus type 5 and Newcastle disease virus, two G residues are inserted, and the edited transcript codes for the P protein PUBMED:11336555. Being phosphoproteins, both P and V are rich in serine and threonine residues over their whole lengths. In addition, the V proteins are rich in cysteine residues at their C termini PUBMED:8277263. \ \ 6952 IPR009807 \

    This family consists of several Phytoreovirus outer capsid protein P8 sequences PUBMED:9343255.

    \ 3990 IPR002496 \ Phosphoribosyl-AMP cyclohydrolase catalyses the third step in the histidine biosynthetic pathway:\ \ It requires Zn2+ ions for activity PUBMED:9931020.\ 2975 IPR006899 \ This domain consists of the N terminus of homeobox-containing transcription factor HNF-1. This region contains a dimerisation sequence PUBMED:1988016 and an acidic region that may be involved in transcription activation. Mutations and the common Ala/Val 98 polymorphism in HNF-1 cause the type 3 form of maturity-onset diabetes of the young (MODY3) PUBMED:9133564.\ 2016 IPR005625 \

    This domain represents a conserved TM helix found in bacterial proteins.

    \ 4447 IPR007011 \

    Late embryogenesis abundant (LEA) proteins accumulate to high levels during the last stage of seed formation (when natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA proteins have been grouped into at least six families on the basis of sequence similarity. Although significant similarity has not been detected between members of the different classes, a unifying and outstanding feature of these proteins is their high hydrophilicity and high percentage of glycines. Amino acid sequence analysis predicts that these proteins exist primarily as random coils. This property has been confirmed in a few cases with purified proteins and is supported by the fact that proteins of this type do not coagulate upon heating. LEA protein families have been identified in such a wide range of plant species that they can be considered ubiquitous in plants. Moreover, members of at least one of the LEA protein families, the so-called dehydrins, have been shown to be present in a range of photosynthetic organisms, including lower plants, algae and cyanobacteria. In addition, similar proteins, the hydrophilins, are induced in response to osmotic stress in a variety of non-photosynthetic taxa. All of these proteins have a high hydrophilicity index, generally greater than 1.0 PUBMED:10681550.
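    The glycine content mentioned above is a simple composition measure that can be computed from a one-letter protein sequence. The sketch below is a minimal illustration that uses a hypothetical helper, `residue_percentage`. It does not reproduce the hydrophilicity index cited in the text, because the scale used for that index is not specified here. It also applies no threshold for calling a protein glycine-rich.

```python
def residue_percentage(sequence: str, residue: str = "G") -> float:
    """Percentage of `residue` among the letters of a one-letter protein sequence.

    Non-letter characters (spaces, line breaks, digits) are ignored, so
    sequences pasted from FASTA-like text can be used directly.
    """
    letters = [aa for aa in sequence.upper() if aa.isalpha()]
    if not letters:
        raise ValueError("sequence contains no residues")
    return 100.0 * letters.count(residue.upper()) / len(letters)
```

    For example, `residue_percentage("MGGSKG")` returns `50.0`, since three of the six residues are glycine.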

    \ \

    This conserved region identifies a set of plant seed maturation proteins described as LEA D34.

    \ 4836 IPR002730 \

    The function of proteins belonging to this family was previously unknown; however, many have now been characterised. The proteins are ribonuclease P subunits involved in tRNA processing: they generate mature tRNA molecules by cleaving the 5' ends of their precursors.

    \ 390 IPR003838 \ Uncharacterized domain in proteins of unknown function. Proteins that contain this domain are often predicted permeases and hypothetical transmembrane proteins.\ 1410 IPR006889 \

    This is a family of plant-infecting tospovirus NSM proteins, a number of which contain the phospholipase A2 histidine active site ().

    \ 225 IPR007858 \ This domain is about 40 residues long and is probably formed of two alpha-helices. It is found in the Dpy-30 proteins, hence the motif's name. It may be a dimerisation motif analogous to that found in the type II regulatory subunit of cAMP-dependent protein kinase (PKA R subunit) ().\ 3844 IPR003714 \ PhoH is a cytoplasmic protein and predicted ATPase that is induced by phosphate starvation and belongs to the phosphate regulon (pho) in Escherichia coli PUBMED:8444794.\ 2877 IPR005108 \

    The HELP (Hydrophobic ELP) domain is found in EMAP and EMAP-like proteins (ELPs) PUBMED:11694528, PUBMED:7989351. Although called a domain, it contains a predicted transmembrane helix and may not form a globular domain. It is also not clear whether these proteins localize to membranes.

    \ 4820 IPR001602 \ This family contains small uncharacterised proteins of 14 to 16 kDa mainly from bacteria although the signatures also occur in a hypothetical protein from archaea and from yeast.\ 7638 IPR012492 \

    This family contains sequences that are similar to the C-terminal region of Red protein (). This and related proteins are thought to be localised to the nucleus, and contain a RED repeat which consists of a number of RE and RD sequence elements PUBMED:10216252. The region in question has several conserved NLS sequences PUBMED:10216252. The function of Red protein is unknown, but efficient sequestration to nuclear bodies suggests that its expression may be tightly regulated or that the protein self-aggregates extremely efficiently PUBMED:10216252.

    \ 1065 IPR000412 \

    ATP-binding cassette (ABC) transporters are multidomain membrane proteins, responsible for the controlled efflux and influx of substances (allocrites) across cellular membranes. They are minimally composed of four domains, with two transmembrane domains (TMDs) responsible for allocrite binding and transport and two nucleotide-binding domains (NBDs) responsible for coupling the energy of ATP hydrolysis to conformational changes in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of hydrolysis at one NBD effectively abrogates hydrolysis at the other. Hydrolysis at the two NBDs may occur in an alternating fashion, although the NBDs appear substantially functionally symmetrical in terms of their binding to diverse nucleotides PUBMED:12504680.

    \ A number of bacterial transport systems have been found to contain integral membrane components that have similar sequences PUBMED:1303751; these systems fit the characteristics of ATP-binding cassette transporters PUBMED:1659649. The proteins form homo- or hetero-oligomeric channels, allowing ATP-mediated transport. Hydropathy analysis of the proteins has revealed the presence of six possible transmembrane regions. These proteins belong to family 2 of ABC transporters.\ 7599 IPR011688 \ This is a family of sequences found in both bacteria and bacteriophages. This region is approximately 130 residues long and in some cases is found as part of the PVL (Panton-Valentine leukocidin) group of genes, which encode a member of the leukocidin group of bacterial toxins that kill leukocytes by forming pores in the cell membrane PUBMED:12044378. PVL appears to be a virulence factor associated with a number of human diseases PUBMED:10524952.\ 337 IPR000949 \

    The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. It is found in the MTA1 protein, which is part of the NuRD complex PUBMED:10226007. The domain is usually found N-terminal to a myb-like DNA-binding domain and a GATA binding domain. In some instances, ELM2 is also found associated with the ARID DNA-binding domain. This suggests that ELM2 may also be involved in DNA binding, or perhaps is a protein-protein interaction domain.

    \ \ 6181 IPR010469 \

    This family consists of several hypothetical proteins of unknown function.

    \ 3465 IPR002715 \

    Nascent polypeptide-associated complex (NAC) is among the first ribosome-associated entities to bind the nascent polypeptide after peptide bond formation. In yeast, NAC functions in targeting ribosomes to the ER membrane PUBMED:10518932. NAC may prevent ribosome-nascent chain complexes (RNCs) that lack a signal sequence from binding to yeast membranes.

    \ 1143 IPR005509 \ The AfsA family are key enzymes in A-factor biosynthesis, which is essential for streptomycin production and resistance.\ 7626 IPR012862 \

    The members of this family include sequences that are parts of hypothetical proteins expressed by plant species. The region in question is about 170 amino acids long.

    \ 5630 IPR008611 \ EspB is a type-III-secreted pore-forming protein of enteropathogenic Escherichia coli (EPEC) which is essential for EPEC pathogenesis PUBMED:12071694. EspB is also found in Citrobacter rodentium.\ 1765 IPR001381 \

    3-dehydroquinate dehydratase, or dehydroquinase, catalyzes the conversion of 3-dehydroquinate into 3-dehydroshikimate. This is the third step in the shikimate pathway, which produces chorismate, the precursor for the biosynthesis of aromatic amino acids. Two classes of dehydroquinase exist, known as types I and II.

    \

    The best studied type I enzyme is from Escherichia coli (gene aroD) and related bacteria, where it is a homodimeric protein. In fungi, dehydroquinase is part of a multifunctional enzyme that catalyzes five consecutive steps in the shikimate pathway. A histidine PUBMED:1429576 is involved in the catalytic mechanism.
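The position of dehydroquinase in the pathway can be laid out as a small table. This is a sketch under stated assumptions: the seven steps and their order follow the standard textbook presentation of the shikimate pathway (not given in this entry), the five consecutive steps of the fungal multifunctional enzyme are assumed to be steps 2 to 6, and the names `SHIKIMATE_STEPS`, `step_of` and `check_chain` are hypothetical.

```python
# Hypothetical table of the shikimate pathway (textbook order, assumed here):
# (enzyme, substrate(s), product), ending in chorismate.
SHIKIMATE_STEPS = [
    ("DAHP synthase", "PEP + erythrose 4-phosphate", "DAHP"),
    ("3-dehydroquinate synthase", "DAHP", "3-dehydroquinate"),
    ("3-dehydroquinate dehydratase", "3-dehydroquinate", "3-dehydroshikimate"),
    ("shikimate dehydrogenase", "3-dehydroshikimate", "shikimate"),
    ("shikimate kinase", "shikimate", "shikimate 3-phosphate"),
    ("EPSP synthase", "shikimate 3-phosphate + PEP", "EPSP"),
    ("chorismate synthase", "EPSP", "chorismate"),
]


def step_of(enzyme: str) -> int:
    """Return the 1-based position of an enzyme in the pathway."""
    for i, (name, _substrate, _product) in enumerate(SHIKIMATE_STEPS, start=1):
        if name == enzyme:
            return i
    raise KeyError(enzyme)


def check_chain() -> bool:
    """Check that each step's product appears among the next step's substrates."""
    return all(
        product in next_substrates
        for (_, _, product), (_, next_substrates, _) in zip(SHIKIMATE_STEPS, SHIKIMATE_STEPS[1:])
    )


print(step_of("3-dehydroquinate dehydratase"))  # 3
# Assumed span of the fungal multifunctional enzyme: steps 2 to 6.
print([name for name, _, _ in SHIKIMATE_STEPS[1:6]])
```

The chain check is only a consistency test on the hand-written table; it says nothing about the enzymes themselves.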

    \ 2486 IPR000191 \

    Formamidopyrimidine-DNA glycosylase (Fapy-DNA glycosylase; gene fpg) PUBMED:7704272 is a bacterial DNA repair enzyme that excises oxidized purine bases, releasing 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-OxoG) residues. In addition to its glycosylase activity, FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 kDa that binds zinc and requires it for activity.

    \

    The zinc-binding site (PS01242) lies in the C-terminal part of the formamidopyrimidine-DNA glycosylase enzyme, where four conserved and essential PUBMED:8473347 cysteines are located.

    \ 4191 IPR001383 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.
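The elongation cycle described above (aminoacyl-tRNA delivery to the A site, peptidyl transfer, translocation with release of the deacylated tRNA) can be summarised as a toy state machine. This is an illustrative sketch only: the class and method names are hypothetical, it assumes an initiator tRNA already sits in the P site, and it ignores EF-Tu/EF-G binding, GTP hydrolysis, decoding and proofreading.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ribosome:
    """Toy model of one elongation cycle (illustrative only, not a kinetic model)."""
    chain: List[str] = field(default_factory=list)  # peptide carried by the peptidyl-tRNA
    a_site: Optional[str] = None   # amino acid carried by the A-site aminoacyl-tRNA
    chain_on_a: bool = False       # True after peptidyl transfer, before translocation
    released_trnas: int = 0        # deacylated tRNAs that have left the ribosome

    def deliver(self, amino_acid: str) -> None:
        # Aminoacyl-tRNA (arriving as a complex with EF-Tu and GTP) binds the A site.
        if self.a_site is not None:
            raise RuntimeError("A site is occupied")
        self.a_site = amino_acid

    def peptidyl_transfer(self) -> None:
        # The chain moves from the P-site tRNA onto the A-site aminoacyl-tRNA and
        # grows by one residue; the P-site tRNA is left deacylated.
        if self.a_site is None:
            raise RuntimeError("no aminoacyl-tRNA in the A site")
        self.chain.append(self.a_site)
        self.chain_on_a = True

    def translocate(self) -> None:
        # EF-G and GTP move the new peptidyl-tRNA from the A site to the P site,
        # and the deacylated tRNA is released through an exit site.
        if not self.chain_on_a:
            raise RuntimeError("peptidyl transfer has not occurred")
        self.a_site = None
        self.chain_on_a = False
        self.released_trnas += 1


ribosome = Ribosome(chain=["Met"])  # assumption: initiator tRNA already in the P site
for aa in ["Ala", "Gly", "Ser"]:
    ribosome.deliver(aa)
    ribosome.peptidyl_transfer()
    ribosome.translocate()
print(ribosome.chain)           # ['Met', 'Ala', 'Gly', 'Ser']
print(ribosome.released_trnas)  # 3
```

Each pass through the loop corresponds to one elongation cycle, so the chain grows by one residue and one deacylated tRNA is released per cycle.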

    \ \ \

    The ribosomal L28 protein family includes proteins from bacteria and chloroplasts. The L24 protein from yeast, found in the large subunit of the mitochondrial ribosome, contains a region similar to the bacterial L28 protein.

    \ 5180 IPR008017 \

    Delta atracotoxin produces potentially fatal neurotoxic symptoms in primates by slowing the inactivation of voltage-gated sodium channels PUBMED:9384567. The structure of atracotoxin comprises a core beta region containing a triple-stranded antiparallel beta-sheet, a thumb-like extension protruding from the beta region, and a C-terminal helix. The beta region contains a cystine knot motif, a feature seen in other neurotoxic polypeptides PUBMED:9384567.

    \ 2429 IPR002050 \ Enveloped viruses such as human immunodeficiency virus 1, influenza virus and Ebola virus express a surface glycoprotein that mediates both cell attachment and fusion of viral and cellular membranes. The ENV polyprotein (coat polyprotein) usually contains two coat proteins, which differ depending on the source.\ 1860 IPR008217 \

    Members of this family have no known function and are predicted to be integral membrane proteins. The family includes the Ccc1 protein from Saccharomyces cerevisiae, which may have a role in regulating calcium levels PUBMED:7941738.

    \ 3356 IPR007605 \ E protein causes host cell lysis by inhibiting MraY, a peptidoglycan biosynthesis enzyme. This leads to cell wall failure at septation PUBMED:12100551. The N-terminal transmembrane region matches the signal peptide model and must be omitted from the family.\ 384 IPR003378 \ The Drosophila protein fringe (FNG) is an N-acetylglucosaminyltransferase that controls the response of the Notch receptor to specific ligands. FNG is localised to the Golgi apparatus PUBMED:10899003 (and is not secreted, as previously thought). Modification of Notch occurs through glycosylation by FNG. The Xenopus homologue, lunatic fringe, has been implicated in a variety of functions.\ 6575 IPR010625 \

    A conserved motif identified in the LOC118487 protein was called the CHCH motif. Alignment of this protein with related members showed the presence of three subgroups of proteins, called the S (Small), N (N-terminal extended) and C (C-terminal extended) subgroups. All three subgroups have in common a predicted conserved [coiled coil 1]-[helix 1]-[coiled coil 2]-[helix 2] domain (CHCH domain). Within each helix of the CHCH domain, two cysteines are present in a C-X9-C motif. The N subgroup contains an additional double-helix domain, and each of its helices also contains the C-X9-C motif. This family contains a number of characterised proteins. Cox19: COX19, a nuclear gene of Saccharomyces cerevisiae, codes for an 11 kDa protein (Cox19p) required for expression of cytochrome oxidase. Because cox19 mutants are able to synthesise the mitochondrial and nuclear gene products of cytochrome oxidase, Cox19p probably functions post-translationally during assembly of the enzyme. Cox19p is present in the cytoplasm and in mitochondria, where it exists as a soluble protein of the intermembrane space. This dual location is similar to that previously reported for Cox17p, a low molecular weight copper protein thought to be required for maturation of the CuA centre of subunit 2 of cytochrome oxidase. Cox19p has four conserved potential metal ligands: three cysteines and one histidine. Mrp10: belongs to the class of yeast mitochondrial ribosomal proteins that are essential for translation PUBMED:9065385. Eukaryotic NADH-ubiquinone oxidoreductase 19 kDa (NDUFA8) subunit PUBMED:9860297.
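The C-X9-C spacing described above can be scanned for with a regular expression. This is a minimal sketch under stated assumptions: the function names and the toy sequence are hypothetical, the "at least two non-overlapping motifs" check is a crude heuristic introduced here rather than a published criterion, and real assignment to this family would rely on profile or HMM searches rather than a bare pattern.

```python
import re
from typing import List, Tuple

# C, then exactly nine residues of any kind, then C. The lookahead lets
# overlapping occurrences be reported as separate hits.
CX9C = re.compile(r"(?=(C.{9}C))")


def find_cx9c(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for every C-X9-C occurrence."""
    return [(m.start(), m.group(1)) for m in CX9C.finditer(seq.upper())]


def count_non_overlapping(hits: List[Tuple[int, str]]) -> int:
    """Greedily count hits that share no residues (hits are sorted by start)."""
    count, next_free = 0, 0
    for start, segment in hits:
        if start >= next_free:
            count += 1
            next_free = start + len(segment)
    return count


# Hypothetical toy sequence: two C-X9-C motifs separated by a short linker.
toy = "MKT" + "C" + "A" * 9 + "C" + "GSG" + "C" + "L" * 9 + "C" + "K"
hits = find_cx9c(toy)
print(hits)                              # [(3, 'CAAAAAAAAAC'), (17, 'CLLLLLLLLLC')]
print(count_non_overlapping(hits) >= 2)  # True: one motif per helix, as in a CHCH domain
```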

    \ \ 3569 IPR008286 \ Pyridoxal-dependent decarboxylases are bacterial proteins acting on ornithine, lysine, arginine and related substrates PUBMED:8181483. One of the regions of sequence similarity contains a conserved lysine residue, which is the site of attachment of the pyridoxal-phosphate group.\ 106 IPR000008 \ The C2 domain is a Ca2+-dependent membrane-targeting module found in many cellular proteins involved in signal transduction or membrane trafficking. C2 domains are unique among membrane-targeting domains in that they show a wide range of lipid selectivity for the major components of cell membranes, including phosphatidylserine and phosphatidylcholine. The C2 domain is about 116 amino acid residues long and, in protein kinase C, is located between the two copies of the C1 domain (which bind phorbol esters and diacylglycerol) and the protein kinase catalytic domain. Regions with significant homology PUBMED:7559667 to the C2 domain have been found in many proteins. The C2 domain is thought to be involved in calcium-dependent phospholipid binding PUBMED:8253763 and in membrane-targeting processes such as subcellular localisation.

    The 3D structure of the C2 domain of synaptotagmin has been reported PUBMED:7697723: the domain forms an eight-stranded beta sandwich constructed around a conserved four-stranded motif, designated a C2 key PUBMED:7697723. Calcium binds in a cup-shaped depression formed by the N- and C-terminal loops of the C2-key motif. Structural analyses of several C2 domains have shown them to have similar tertiary structures, in which three Ca2+-binding loops are located at the end of an eight-stranded antiparallel beta sandwich.

    \ 4157 IPR000358 \

    Ribonucleotide reductase PUBMED:3286319, PUBMED:8511586 catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides: ribonucleoside diphosphate + reduced thioredoxin = 2'-deoxyribonucleoside diphosphate + thioredoxin disulphide + H2O. It provides the precursors necessary for DNA synthesis. RNRs are divided into three classes on the basis of their metallocofactor usage. Class I RNRs, found in eukaryotes, bacteria, bacteriophage and viruses, use a diiron-tyrosyl radical; class II RNRs, found in bacteria, bacteriophage, algae and archaea, use coenzyme B12 (adenosylcobalamin, AdoCbl); class III RNRs, found in anaerobic bacteria and bacteriophage, use an FeS cluster and S-adenosylmethionine to generate a glycyl radical. Many organisms have more than one class of RNR present in their genomes.

    \

    Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues); class II RNRs are less complex, using the small molecule B12 in place of the small chain PUBMED:11875520. The small chain binds two iron atoms PUBMED:2190093 (three Glu, one Asp and two His are involved in metal binding) and contains an active site tyrosine radical. The regions of the sequence that contain the metal-binding residues and the active site tyrosine are conserved in the ribonucleotide reductase small chain from prokaryotes, eukaryotes and viruses. We have selected one of these regions as a signature pattern. It contains the active site residue as well as a glutamate and a histidine involved in the binding of iron.

    \ 645 IPR001754 \

    Orotidine 5'-phosphate decarboxylase (OMPdecase) PUBMED:2835631, PUBMED:1730672 catalyzes the last step in the de novo biosynthesis of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes, OMPdecase forms part of a bifunctional enzyme together with orotate phosphoribosyltransferase, while the prokaryotic and fungal OMPdecases are monofunctional proteins.

    \

    Some parts of the sequence of OMPdecase are well conserved across species. The best conserved region is located in the N-terminal half of OMPdecases and is centred around a lysine residue which is essential for the catalytic function of the enzyme.

    \ 8023 IPR012569 \

    These are small, all-beta-strand domains, structurally described for the protein Internalin (InlA) and the related proteins InlB, InlE and InlH from the pathogenic bacterium Listeria monocytogenes. Their function appears to be mainly structural: they are fused to the C-terminal end of leucine-rich repeats (LRR), significantly stabilising the LRR and forming a common rigid entity with it. They are themselves not involved in protein-protein interactions but help to present the adjacent LRR domain for this purpose. These domains belong to the family of Ig-like domains in that they consist of two sandwiched beta-sheets that follow the classical connectivity of Ig domains. The beta-strands in one of the sheets are, however, much smaller than in most standard Ig-like domains, making it somewhat of an outlier PUBMED:11575932 PUBMED:12526809 PUBMED:15003459.

    \ 5291 IPR008835 \ This family contains several mammalian sclerostin (SOST) proteins. SOST is thought to suppress bone formation. Mutations of the SOST gene lead to sclerosteosis, a progressive sclerosing bone dysplasia with an autosomal recessive mode of inheritance. Radiologically, it is characterised by a generalised hyperostosis and sclerosis leading to a markedly thickened and sclerotic skull, with the mandible, ribs, clavicles and all long bones also being affected. Due to narrowing of the foramina of the cranial nerves, facial nerve palsy, hearing loss and atrophy of the optic nerves can occur. Sclerosteosis is clinically and radiologically very similar to van Buchem disease, being mainly differentiated by hand malformations and tall stature in sclerosteosis patients PUBMED:11181578.\ 1580 IPR007072 \ Members of this family are about 220 amino acids long. The CmcI protein is presumed to represent the cephalosporin 7-alpha-hydroxylase PUBMED:9696752. However, this has not been experimentally verified.\ 6065 IPR009357 \

    This is a family of uncharacterised eukaryotic proteins.

    \ 4228 IPR001209 \

    S14 is one of the proteins of the small ribosomal subunit. In Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a family of ribosomal proteins PUBMED:8441676, PUBMED: that includes bacterial, algal and plant chloroplast, yeast mitochondrial, cyanelle and archaeal (Methanococcus vannielii) S14 proteins, as well as yeast mitochondrial MRP2, yeast YS29A/B and mammalian S29.

    \ 17 IPR004147 \

    This entry includes ABC1 from yeast PUBMED:1648478 and AarF from Escherichia coli PUBMED:9422602. These proteins have a nuclear or mitochondrial subcellular location in eukaryotes. The exact molecular functions of these proteins are not clear; however, yeast ABC1 suppresses a cytochrome b mRNA translation defect and is essential for electron transfer in the bc1 complex PUBMED:1648478, and E. coli AarF is required for ubiquinone production PUBMED:9422602. It has been suggested that members of the ABC1 family are novel chaperonins PUBMED:1648478. These proteins are unrelated to the ABC transporter proteins.

    \ 3515 IPR007298 \ This family represents a bacterial outer membrane lipoprotein that is necessary for signalling by the Cpx pathway PUBMED:11830644. This pathway responds to cell envelope disturbances and increases the expression of periplasmic protein folding and degradation factors. While the molecular function of the NlpE protein is unknown, it may be involved in detecting bacterial adhesion to abiotic surfaces. NlpE from Escherichia coli and Salmonella typhi is also known to confer copper tolerance in copper-sensitive strains of E. coli, and may be involved in copper efflux and delivery of copper to copper-dependent enzymes PUBMED:7635807.\ 3688 IPR008205 \

    This family contains prokaryotic proteins that are related to PcrB. The Staphylococcus aureus chromosomal gene pcrA encodes a protein with significant similarity (40% identity) to two Escherichia coli helicases: helicase II, encoded by the uvrD gene, and the Rep helicase. The pcrB gene seems to belong to an operon containing at least one other gene, pcrBA, downstream from pcrB PUBMED:8232203. PcrB proteins often contain an FMN-binding site, although their function is still unknown.

    \ 1256 IPR002595 \ The multigene family 360 proteins are found within the African swine fever virus (ASF) genome, which consists of dsDNA and has structural features similar to those of the poxviruses PUBMED:2325203. The biological function of this family is not known PUBMED:2325203, although it has been reported to include a major structural protein PUBMED:7856088.\ 3239 IPR004830 \

    Leucine-rich repeats (LRR) are short sequence motifs present in over sixty proteins, all of which appear to be involved in protein-protein interactions PUBMED:7583641. The superfamily of leucine-rich repeat proteins can be subdivided into at least six subfamilies as characterised by their different lengths and consensus sequences, which consist of multiple tandem repeats of 20-29 residues rich in conserved leucines and other aliphatic residues. Proteins containing LRRs include hormone receptors, cell adhesion molecules, bacterial virulence factors, and DNA repair enzymes. It was proposed that the repeats from different subfamilies retain a similar superhelical fold, but differ in the three-dimensional structures of individual repeats PUBMED:9533877. This signature describes a leucine-rich repeat variant (LRV), which has a novel repetitive structural motif consisting of alternating alpha- and 3(10)-helices arranged in a right-handed superhelix, with the absence of the beta-sheets present in other LRRs PUBMED:8946850.

    \ 7248 IPR010883 \

    This family consists of several uncharacterised viral proteins from the Marek's disease-like viruses. Members of this family are typically around 400 residues in length. The function of this family is unknown.

    \ 5141 IPR007978 \

    This family consists of several baculovirus occlusion-derived virus envelope proteins (EC27 or E27), which appear to act as multifunctional cyclins during the host cell cycle. The ODV-E27 protein has distinct functional characteristics compared with cellular and viral cyclins. When associated with cdc2, it exhibits cyclin B-like activity; when associated with cdk6, the complex possesses cyclin D-like activity and binds PCNA (proliferating cell nuclear antigen) PUBMED:9736714.

    \ 798 IPR007080 \

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 1, represents the clamp domain, which is a mobile domain involved in positioning the DNA, maintenance of the transcription bubble and positioning of the nascent RNA strand PUBMED:8910400, PUBMED:11313498.

    \ 5441 IPR008644 \

    The function of the U15 herpesvirus proteins is unknown PUBMED:11069999.

    \ 1039 IPR006176 \

    3-hydroxyacyl-CoA dehydrogenase (HCDH) PUBMED:3479790 is an enzyme involved in fatty acid metabolism; it catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In peroxisomes, 3-hydroxyacyl-CoA dehydrogenase forms, with enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional enzyme in which the N-terminal domain bears the hydratase/isomerase activities and the C-terminal domain the dehydrogenase activity. There are two mitochondrial enzymes: one that is monofunctional and another that is, like its peroxisomal counterpart, multifunctional.
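As a worked equation, the dehydrogenase step is an NAD+-dependent oxidation of the 3-hydroxyl group to a ketone. The cofactor and the (S) configuration of the substrate are stated here from general beta-oxidation biochemistry, not from this entry:

```latex
\text{(S)-3-hydroxyacyl-CoA} + \mathrm{NAD}^{+}
  \rightleftharpoons \text{3-oxoacyl-CoA} + \mathrm{NADH} + \mathrm{H}^{+}
```

The two hydrogens removed from the substrate leave as a hydride (taken up by NAD+) and a proton, so atoms and charge balance on both sides.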

    \

    In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA) HCDH is part of a multifunctional enzyme which also contains an ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain PUBMED:2204034.

    \

    There are two major regions of similarity in the sequences of proteins of the HCDH family: the first, located in the N-terminal region, corresponds to the NAD-binding site; the second is located in the center of the sequence. This entry represents the C-terminal domain, which is also found in lambda crystallin. Some proteins include two copies of this domain.

    \ 6373 IPR009499 \

    This family contains hypothetical bacterial proteins of unknown function.

    \ 556 IPR007307 \

    The low-temperature viability protein LTV1 was identified in Saccharomyces cerevisiae; the exact function of this protein is unknown.

    \ 4089 IPR001806 \

    Many members of the Ras superfamily of GTPases have been implicated in the regulation of hematopoietic cells, with roles in growth, survival, differentiation, cytokine production, chemotaxis, vesicle trafficking and phagocytosis. The Ras superfamily now includes over 150 small GTPases (distinguished from the large, heterotrimeric GTPases, the G-proteins). It comprises six subfamilies: the Ras, Rho, Ran, Rab, Arf and Kir/Rem/Rad subfamilies PUBMED:10712923. They exhibit remarkable overall amino acid identity, especially in the regions interacting with the guanine nucleotide exchange factors that catalyze their activation PUBMED:12384139.

    \ \ \ 1868 IPR002852 \

    The bacterial and archaeal proteins in this family have no known function.

    \ 4591 IPR004305 \

    Proteins containing this domain are found in all three domains of life: archaebacteria, eubacteria and eukaryotes. In Bacillus subtilis, TENA is one of a number of proteins that enhance the expression of extracellular enzymes, such as alkaline protease, neutral protease and levansucrase PUBMED:1898926.

    \

    The THI-4 protein, which is involved in thiamine biosynthesis, also contains this domain. The C-terminal parts of these proteins consistently show significant sequence similarity to TENA proteins. This similarity was first noted with the Neurospora crassa THI-4 protein PUBMED:8662211. The exact molecular function of this domain is uncertain.

    \ 3533 IPR005554 \ Members of this family are nucleolar RNA-associated proteins (Nrap) which are highly conserved from yeast (Saccharomyces cerevisiae) to human. In the mouse, Nrap is ubiquitously expressed and is specifically localized in the nucleolus PUBMED:11895476. Nrap is a large nucleolar protein (of more than 1000 amino acids). Nrap appears to be associated with ribosome biogenesis by interacting with pre-rRNA primary transcript PUBMED:11895476.\ 6882 IPR009765 \

    This entry represents a repeated sequence of around 34 residues in length. This repeat is found in multiple copies in the Drosophila pericardin and other extracellular matrix proteins.

    \ 1134 IPR005502 \ This family includes enzymes that act on ADP-ribosylated substrates, such as ADP-ribosylarginine hydrolase, which cleaves ADP-ribose-L-arginine PUBMED:8349667. The family also includes dinitrogenase reductase activating glycohydrolase PUBMED:2506427 and, most surprisingly, jellyfish crystallins PUBMED:2506427, although these crystallins appear to have lost the presumed active site residues.\ 584 IPR007330 \

    The MIT domain is found in vacuolar sorting proteins, spastin (probable ATPase involved in the assembly or function of nuclear protein complexes), and a sorting nexin, which may play a role in intracellular trafficking.

    \ 4144 IPR011539 \

    The Rel homology domain (RHD) is found in a family of eukaryotic transcription factors, which includes NF-kappaB, Dorsal, Relish, NFAT, among others. Some of these transcription factors appear to form multi-protein DNA-bound complexes PUBMED:9794820. Phosphorylation of the RHD appears to play a role in the regulation of some of these transcription factors, acting to modulate the expression of their target genes PUBMED:15516339. The RHD is composed of two immunoglobulin-like beta-barrel subdomains that grip the DNA in the major groove. The N-terminal specificity domain resembles the core domain of the p53 transcription factor, and contains a recognition loop that interacts with DNA bases; the C-terminal dimerisation domain contains the site for interaction with I-kappaB PUBMED:7830764.

    \ 339 IPR001604 \ A family of bacterial and eukaryotic endonucleases (EC 3.1.30.-) share the following characteristics: they act on both DNA and RNA, cleave double-stranded and single-stranded nucleic acids, and require a divalent ion such as magnesium for their activity. A histidine has been shown PUBMED:8078761 to be essential for the activity of the Serratia marcescens nuclease. This residue is located in a conserved region that also contains an aspartic acid residue that could be implicated in binding the divalent ion.\ 2893 IPR006772 \

    This is a family of Herpesvirus proteins of unknown function.

    \ 3496 IPR002072 \ During the development of the vertebrate nervous system, many neurons become redundant (for example, because they fail to connect to target cells) and are eliminated. At the same time, developing neurons send out axon outgrowths that contact their target cells PUBMED:2369898. Target cells control their degree of innervation (the number of axon connections) by secreting various specific neurotrophic factors that are essential for neuron survival. One of these is nerve growth factor (NGF or beta-NGF), a vertebrate protein that stimulates division and differentiation of sympathetic and embryonic sensory neurons PUBMED:3589669, PUBMED:8488558. NGF is mostly found outside the central nervous system (CNS), but slight traces have been detected in adult CNS tissues, although a physiological role for this is unknown PUBMED:2369898; it has also been found in several snake venoms PUBMED:1477101, PUBMED:1995338.

    NGF is a protein of about 120 residues that is cleaved from a larger precursor molecule. It contains six cysteines, all involved in intrachain disulphide bonds. A schematic representation of the structure of NGF is shown below:


                                        +------------------------+
                                        |                        |
                                        |                        |
            xxxxxxCxxxxxxxxxxxxxxxxxxxxxCxxxxCxxxxxCxxxxxxxxxxxxxCxCxxxx
                  |                          |     |               |
                  +--------------------------|-----+               |
                                             +---------------------+

    'C': conserved cysteine involved in a disulphide bond.
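The connectivity in the schematic (first cysteine to fourth, second to fifth, third to sixth, counting from the N terminus) can be written as ordinal pairs and mapped onto any sequence with exactly six cysteines. The helper `disulphide_pairs` is hypothetical and assumes that the six cysteines present are the conserved ones; for real sequences an alignment to NGF would be needed first.

```python
from typing import List, Tuple

# Ordinal connectivity read off the schematic: cysteine 1 pairs with 4,
# 2 with 5 and 3 with 6 (1-based, in N- to C-terminal order).
NGF_PATTERN: List[Tuple[int, int]] = [(1, 4), (2, 5), (3, 6)]

# The sequence line of the schematic itself, used as an input.
SCHEMATIC = "xxxxxxCxxxxxxxxxxxxxxxxxxxxxCxxxxCxxxxxCxxxxxxxxxxxxxCxCxxxx"


def disulphide_pairs(seq: str, pattern: List[Tuple[int, int]] = NGF_PATTERN) -> List[Tuple[int, int]]:
    """Map ordinal cysteine pairs onto 0-based positions in seq."""
    cys = [i for i, residue in enumerate(seq) if residue.upper() == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    return [(cys[a - 1], cys[b - 1]) for a, b in pattern]


print(disulphide_pairs("CACACACACAC"))  # [(0, 6), (2, 8), (4, 10)]
print(disulphide_pairs(SCHEMATIC))
```

In this 1-4, 2-5, 3-6 arrangement every bond spans the start of the next one, which is why the bond lines in the drawing have to cross.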
    

    \ 7663 IPR012415 \

    Cfr10I and Bse634I are two Type II restriction endonucleases. They exhibit a conserved tetrameric architecture that is of functional importance, wherein two dimers are arranged, back-to-back, with their putative DNA-binding clefts facing opposite directions. These clefts are formed between two monomers that interact, mainly via hydrophobic interactions supported by a few hydrogen bonds, to form a U-shaped dimer. Each monomer is folded to form a compact alpha-beta structure, whose core is made up of a five-stranded mixed beta-sheet. The monomer may be split into separate N-terminal and C-terminal subdomains at a hinge located in helix alpha3 PUBMED:11842098.

    \ 994 IPR001202 \

    Synonym(s): Rsp5 or WWP domain

    \

    The WW domain is a short conserved region in a number of unrelated proteins, which folds as a stable, triple-stranded beta-sheet. This short domain of approximately 40 amino acids may be repeated up to four times in some proteins PUBMED:7846762, PUBMED:7802651, PUBMED:7828727, PUBMED:7641887. The name WW or WWP derives from the presence of two signature tryptophan residues, which are spaced 20-23 amino acids apart and are present in most WW domains known to date, as well as of a conserved Pro. The WW domain binds to proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and/or phosphoserine/phosphothreonine-containing motifs PUBMED:7644498, PUBMED:11911877. It is frequently associated with other domains typical of proteins in signal transduction processes.
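PROSITE-style patterns such as the ligand motif [AP]-P-P-[AP]-Y translate directly into regular expressions. The converter below is a minimal sketch that assumes only the basic pattern syntax (single residues, [..] alternatives, {..} exclusions, x for any residue, and repeat counts such as x(3) or x(2,4)); terminal anchors (< and >) are deliberately not handled, and the names `prosite_to_regex` and `find_motifs`, as well as the test sequence, are hypothetical.

```python
import re
from typing import List, Tuple

# One PROSITE element: a residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern (no < or > anchors) to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motifs(seq: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for every hit, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq.upper())]


print(prosite_to_regex("[AP]-P-P-[AP]-Y"))                 # [AP]PP[AP]Y
print(find_motifs("GSPPPPYLEAPPAYK", "[AP]-P-P-[AP]-Y"))  # [(2, 'PPPPY'), (9, 'APPAY')]
```

The lookahead makes overlapping occurrences visible, which matters in proline-rich stretches where several windows can satisfy the motif.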

    \ \

    A large variety of proteins containing the WW domain are known. These include: dystrophin, a multidomain cytoskeletal protein; utrophin, a dystrophin-like protein of unknown function; vertebrate YAP protein, substrate of an unknown serine kinase; mouse NEDD-4, involved in the embryonic development and differentiation of the central nervous system; yeast RSP5, similar to NEDD-4 in its molecular organization; rat FE65, a transcription-factor activator expressed preferentially in liver; tobacco DB10 protein; and others.

    \ 4714 IPR000895 \

    Transthyretin (formerly prealbumin) is one of 3 thyroid hormone-binding proteins found in the blood of vertebrates PUBMED:1833190. It is produced in the liver and circulates in the bloodstream, where it binds thyroxine (T4) and, through retinol-binding protein, retinol PUBMED:4054629. It differs from the other 2 hormone-binding proteins (T4-binding globulin and albumin) in 3 distinct ways: (1) the gene is expressed at a high rate in the brain choroid plexus; (2) it is enriched in cerebrospinal fluid; and (3) no genetically caused absence has been observed, suggesting an essential role in brain function, distinct from that played in the bloodstream PUBMED:1833190. The protein consists of around 130 amino acids, which assemble as a homotetramer that contains an internal channel in which T4 is bound. Within this complex, T4 appears to be transported across the blood-brain barrier, where, in the choroid plexus, the hormone stimulates further synthesis of transthyretin. The protein then diffuses back into the bloodstream, where it binds T4 for transport back to the brain PUBMED:1833190.

    \ 1644 IPR000298 \

    Cytochrome c oxidase () is the terminal enzyme of the respiratory chain of mitochondria and many aerobic bacteria. It catalyzes the transfer of electrons from reduced cytochrome c to molecular oxygen:

    4 ferrocytochrome c + O2 + 4 H+ -> 4 ferricytochrome c + 2 H2O

    This reaction is coupled to the pumping of four additional protons across the mitochondrial or bacterial membrane PUBMED:10563795.
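
Combining the electron-transfer chemistry with the proton pumping described above gives the overall balance per O2 reduced. This sketch assumes the usual convention: the four pumped protons are counted per O2, which is one per electron, and the four protons consumed in forming water are also taken from the inner (matrix or cytoplasmic) side. Charge balances in each line; for example, the net equation carries +16 on both sides.

```latex
\begin{aligned}
\text{chemistry:}\quad & 4\,\text{cyt }c^{2+} + \mathrm{O_2} + 4\,\mathrm{H^+_{in}} \rightarrow 4\,\text{cyt }c^{3+} + 2\,\mathrm{H_2O} \\
\text{pumping:}\quad & 4\,\mathrm{H^+_{in}} \rightarrow 4\,\mathrm{H^+_{out}} \\
\text{net:}\quad & 4\,\text{cyt }c^{2+} + \mathrm{O_2} + 8\,\mathrm{H^+_{in}} \rightarrow 4\,\text{cyt }c^{3+} + 2\,\mathrm{H_2O} + 4\,\mathrm{H^+_{out}}
\end{aligned}
```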

    \ \

    Cytochrome c oxidase is an oligomeric enzymatic complex that is located in the mitochondrial inner membrane of eukaryotes and in the plasma membrane of aerobic prokaryotes. The core structure of prokaryotic and eukaryotic cytochrome c oxidase contains three common subunits, I, II and III. In prokaryotes, subunits I and III can be fused and a fourth subunit is sometimes found, whereas in eukaryotes there are a variable number of additional small polypeptidic subunits PUBMED:8383670. The functional role of subunit III is not yet understood.

    \ \

    As bacterial respiratory systems are branched, they have a number of distinct terminal oxidases, rather than the single cytochrome c oxidase present in eukaryotic mitochondrial systems. Although cytochrome o oxidases catalyze the oxidation of quinol (ubiquinol) rather than cytochrome c, they belong to the same heme-copper oxidase superfamily as cytochrome c oxidases. Members of this family share sequence similarities in all three core subunits: subunit I is the most conserved subunit, whereas subunit II is the least conserved PUBMED:1316894, PUBMED:2162835, PUBMED:8083153.

    \ 5075 IPR007912 \

    Adenoviruses have evolved multiple mechanisms to evade the host immune response. Several of the immunomodulatory adenoviral proteins are encoded in early transcription unit 3 (E3). The E3A/19K protein interferes with antigen presentation and T cell recognition PUBMED:9707602.

    \ 793 IPR000391 \ The degradation of aromatic compounds by aerobic bacteria frequently begins with the dihydroxylation of the substrate by nonheme iron-containing dioxygenases. These enzymes consist of two or three soluble proteins that interact to form an electron-transport chain that transfers electrons from reduced nucleotides (NADH) via flavin and [2Fe-2S] redox centers to a terminal dioxygenase PUBMED:1444257. Aromatic-ring-hydroxylating dioxygenases oxidize aromatic hydrocarbons and related compounds to cis-arene diols. These enzymes utilize a mononuclear non-heme iron center to catalyze the addition of dioxygen to their respective substrates. \

    In naphthalene 1,2-dioxygenase (NDO) from Pseudomonas sp., the domain structure and iron coordination of the Rieske domain are very similar to those of the Rieske protein of the cytochrome bc1 complex. The active-site iron center of one alpha subunit is directly connected by hydrogen bonds, through a single amino acid, Asp205, to the Rieske [2Fe-2S] center in a neighboring alpha subunit. This may be the main route for electron transfer PUBMED:9634695.

    \ 2509 IPR001015 \ Synonym(s): Protoheme ferro-lyase, Iron chelatase, etc.\

    Ferrochelatase catalyzes the last step in heme biosynthesis: the insertion of ferrous iron into protoporphyrin IX to form protoheme PUBMED:2185242, PUBMED:1704134. In eukaryotic cells, it is bound to the mitochondrial inner membrane, with its active site on the matrix side of the membrane.

    \

    The X-ray structures of Bacillus subtilis and human ferrochelatase have been solved PUBMED:9384565, PUBMED:11175906. The human enzyme exists as a homodimer. Each subunit contains one [Fe2S2] cluster. The monomer is folded into two similar domains, each with a four-stranded parallel beta-sheet flanked by an alpha-helix in a beta-alpha-beta motif that is reminiscent of the fold found in the periplasmic binding proteins. The topological similarity between the domains suggests that they arose from a gene duplication event. However, significant differences exist between the two domains, including an N-terminal section (residues 80-130) that forms part of the active-site pocket, and a C-terminal extension (residues 390-423) that is involved in coordination of the [Fe2S2] cluster and in stabilization of the homodimer. The [Fe2S2] cluster ligands are Cys196, Cys403, Cys406 and Cys411. Co(II)-binding experiments show that His230 and Asp383 are part of the enzyme active site PUBMED:11175906.

    \

    Ferrochelatase seems to have a structurally conserved core region that is common to the enzyme from bacteria, plants and mammals. Porphyrin binds in the identified cleft, which also includes the metal-binding site of the enzyme. The cleft region is likely to adopt different conformations upon substrate binding and release PUBMED:9384565.

    \ 7765 IPR012904 \

    The presence of 8-oxoguanine residues in DNA can give rise to G-C to T-A transversion mutations. 8-Oxoguanine DNA glycosylase, which is found in archaeal, bacterial and eukaryotic species, is specifically responsible for removing 8-oxoguanine residues. It has DNA glycosylase activity () and DNA lyase activity () PUBMED:10706276. The region featured in this family is the N-terminal domain, which is organised into a single copy of a TBP-like fold. The domain contributes residues to the 8-oxoguanine binding pocket PUBMED:11902834.

    \ 8134 IPR012387 \

    This group represents a tRNA ligase, yeast type. Please see the following relevant references: PUBMED:12466548, PUBMED:1922054.

    \ 6829 IPR009738 \

    This entry represents the N terminus (approximately 200 residues) of the proline-rich protein BAT2. BAT2 is similar to other proteins with large proline-rich domains, such as some nuclear proteins, collagens, elastin, and synapsin PUBMED:2156268.

    \ 577 IPR011607 \

    This domain makes up the whole of methylglyoxal synthase and is also found in carbamoyl phosphate synthetase (CPS), where it forms a regulatory domain that binds the allosteric effector ornithine. This family also includes inosicase. The known structures in this family show a common phosphate-binding site PUBMED:10526357.

    \ 6287 IPR009460 \

    The release of Ca2+ ions from intracellular stores is a key step in a wide variety of cellular functions. In striated muscle, the release of Ca2+ from the sarcoplasmic reticulum (SR) leads to muscle contraction. Ca2+ release occurs through large, high-conductance Ca2+ release channels, also known as ryanodine receptors (RyRs) because they bind the plant alkaloid ryanodine with high affinity and specificity PUBMED:15110152.

    \

    This region covers TM regions 4-6 of the ryanodine receptor 1 family.

    \ 4240 IPR001976 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family contains the S24e ribosomal proteins from eukaryotes and archaebacteria. These proteins have 101 to 148 amino acids.

    \ 7615 IPR012427 \

    This is a family of 14 highly conserved sequences, from hypothetical proteins expressed by both bacterial and archaeal species.

    \ 3182 IPR007790 \

    The baculovirus Autographa californica nuclear polyhedrosis virus encodes a DNA-dependent RNA polymerase that is required for transcription of viral late genes. This polymerase is composed of four equimolar subunits, LEF-8, LEF-4, LEF-9, and p47. LEF-4 carries out all the enzymatic functions related to mRNA capping PUBMED:12124466.

    \ 1620 IPR000923 \ Blue (type 1) copper proteins are small proteins which bind a single copper atom and which are characterized by an intense electronic absorption band near 600 nm PUBMED:6698995, PUBMED:8433378. The best-known members of this class of proteins are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly related bacterial azurins, which exchange electrons with cytochrome c551. This family of proteins also includes amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on methylamine; auracyanins A and B from Chloroflexus aurantiacus PUBMED:1313011; blue copper protein from Alcaligenes faecalis; cupredoxin (CPC) from cucumber peelings PUBMED:1468551; cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber; halocyanin from Natrobacterium pharaonis PUBMED:8195126, a membrane-associated copper-binding protein; pseudoazurin from Pseudomonas; rusticyanin from Thiobacillus ferrooxidans PUBMED:1879547; stellacyanin from the Japanese lacquer tree; umecyanin from horseradish roots; and allergen Ra3 from ragweed. This pollen protein is evolutionarily related to the above proteins, but seems to have lost the ability to bind copper. Although there is an appreciable amount of divergence in the sequences of all these proteins, the copper ligand sites are conserved.\ 5580 IPR008437 \ This family consists of minor structural proteins largely from Homo sapiens calicivirus isolates. Human calicivirus causes gastroenteritis. The function of this family is unknown.\ 642 IPR004156 \

    This family consists of several eukaryotic Organic-Anion-Transporting Polypeptides (OATPs). Several have been identified, mostly in human and rat. Different OATPs vary in tissue distribution and substrate specificity. Since the numbering of different OATPs in particular species was based originally on the order of discovery, similarly numbered OATPs in humans and rats did not necessarily correspond in function, tissue distribution and substrate specificity (in spite of the name, some OATPs also transport organic cations and neutral molecules). Thus, Tamai et al. PUBMED:10873595 initiated the current scheme of using digits for rat OATPs and letters for human ones. Prostaglandin transporter (PGT) proteins are also considered to be OATP family members. In addition, the methotrexate transporter OATK is closely related to OATPs. This family also includes several predicted proteins from Caenorhabditis elegans and Drosophila melanogaster. This similarity was not previously noted. Note: Members of this family are described (in the Swiss-Prot database) as belonging to the SLC21 family of transporters.

    \ 7503 IPR011633 \ These proteins have no known function.\ 6223 IPR010481 \

    This is a calponin homology domain.

    \ 7310 IPR011087 \

    The function of these proteins is unknown. All are from Bradyrhizobium japonicum.

    \ 7779 IPR012487 \

    The sequences in this family are similar to the Dugbe virus M polyprotein precursor (), which includes glycoproteins G1 and G2. Both are thought to be inserted in the membrane of the Golgi complex of the infected host cell, and G1 is known to have a role in infection of vertebrate hosts PUBMED:1387749.

    \ 7091 IPR009889 \

    This family consists of several mammalian dentin matrix protein 1 (DMP1) sequences. The dentin matrix acidic phosphoprotein 1 (DMP1) gene has been mapped to human chromosome 4q21 PUBMED:9177774. DMP1 is a bone- and tooth-specific protein initially identified from mineralised dentin. DMP1 is primarily localised in the nuclear compartment of undifferentiated osteoblasts. In the nucleus, DMP1 acts as a transcriptional component for activation of osteoblast-specific genes like osteocalcin. During the early phase of osteoblast maturation, Ca2+ surges into the nucleus from the cytoplasm, triggering the phosphorylation of DMP1 by a nuclear isoform of casein kinase II. This phosphorylated DMP1 is then exported out into the extracellular matrix, where it regulates nucleation of hydroxyapatite. DMP1 is a unique molecule that initiates osteoblast differentiation by transcription in the nucleus and orchestrates mineralised matrix formation extracellularly, at later stages of osteoblast maturation PUBMED:12615915. The DMP1 gene has been found to be ectopically expressed in lung cancer, although the reason for this is unknown PUBMED:12929940.

    \ 5650 IPR006434 \

    This family is a small group of metazoan sequences with one sequence from Arabidopsis thaliana. The sequences from mouse are annotated as pyrimidine 5'-nucleotidases, apparently in reference to HSPC233, the human homolog. However, no such annotation can currently be found for this gene. This group of sequences was found during searches for members of the haloacid dehalogenase (HAD) superfamily (). All of the conserved catalytic motifs PUBMED:7966317 are found. The placement of the variable domain between motifs 1 and 2 indicates membership in subfamily I of the superfamily, but these sequences are sufficiently different from any of the branches of that subfamily (IA-ID) to constitute a separate branch, designated IE. Because the closest identifiable hit outside the noise range is to a phosphoserine phosphatase, this group may be considered most closely allied to subfamily IB.

    \ 5531 IPR008546 \ This family consists of several plant proteins of unknown function.\ 6061 IPR009354 \

    This is a family of bacterial proteins, referred to as Usg. Usg is found in the same operon as trpF, trpB, and trpA and is expressed in a coupled transcription-translation system PUBMED:2828322.

    \ 3742 IPR001548 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
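
The abXHEbbHbc pattern can be expressed as a regular expression. This sketch works under stated assumptions. The residue sets used for 'uncharged' (b) and 'hydrophobic' (c) are illustrative choices, not definitions from the text. Position 'a' is restricted to Val or Thr even though the text says only "most often". Proline is excluded at every position. The function name is hypothetical.

```python
import re

# Residue classes are assumptions for illustration; the text does not list them.
UNCHARGED = "[ACFGILMNQSTVWY]"  # 'b': excludes Asp, Glu, Lys, Arg, His and Pro
HYDROPHOBIC = "[ACFILMVW]"      # 'c'
A_SITE = "[VT]"                 # 'a': "most often" Val or Thr, treated as required here
ANY_BUT_PRO = "[^P]"            # 'X': any residue except Pro

# abXHEbbHbc, wrapped in a lookahead so overlapping candidates are all found.
ZINC_MOTIF = re.compile(
    "(?=("
    + A_SITE + UNCHARGED + ANY_BUT_PRO
    + "HE" + UNCHARGED + UNCHARGED + "H"
    + UNCHARGED + HYDROPHOBIC
    + "))"
)


def find_zinc_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each abXHEbbHbc candidate."""
    return [(m.start(), m.group(1)) for m in ZINC_MOTIF.finditer(seq.upper())]


print(find_zinc_motifs("GGVVAHELTHAVTD"))  # [(2, 'VVAHELTHAV')]
print(find_zinc_motifs("GGVVAHEPTHAVTD"))  # [] (Pro at a 'b' position)
```

The stricter pattern rejects many plain HEXXH hits. A candidate found this way is only a sequence-level hint, not evidence of a functional zinc site.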

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
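
The naming rule above can be captured as a lookup on the first character of an identifier. This is a minimal sketch that assumes family and clan identifiers are given as strings such as "M2", "M67" or "MA". The table and function names are hypothetical, and the code does not check that type P is used only for clans.

```python
# Catalytic-type codes as listed in the paragraph above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a family or clan identifier."""
    if not identifier:
        raise ValueError("empty identifier")
    try:
        return CATALYTIC_TYPES[identifier[0].upper()]
    except KeyError:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}") from None


print(catalytic_type("M2"))   # metallo
print(catalytic_type("M67"))  # metallo
```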

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M2 (clan MA(E)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA. The catalytic residues and zinc ligands have been identified, the zinc ion being ligated to two His residues within the motif HEXXH, showing that the enzyme belongs to the E sub-group of metalloproteases PUBMED:7674922.

    \ \ \ \ Peptidyl-dipeptidase A (angiotensin-converting enzyme) is a mammalian enzyme responsible for cleavage of dipeptides from the C-termini of proteins, notably converting angiotensin I to angiotensin II PUBMED:7674922. The enzyme exists in two differentially transcribed forms, the most common of which is from lung endothelium; this contains two homologous domains that have arisen by gene duplication PUBMED:7674922. The testis-specific form contains only the C-terminal domain, arising from a duplicated promoter region present in intron 12 of the gene PUBMED:7674922. Both enzymatic forms are membrane proteins that are anchored by means of a C-terminal transmembrane domain. Both domains of the endothelial enzyme are active, but have differing kinetic constants PUBMED:7674922, PUBMED:1851160. A number of insect enzymes have been shown to be similar to peptidyl-dipeptidase A; these contain a single catalytic domain.
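
Peptidyl-dipeptidase activity on angiotensin I can be illustrated as string manipulation on one-letter sequences. This minimal sketch assumes the angiotensin I decapeptide DRVYIHPFHL and ignores the enzyme's substrate-specificity rules. It therefore shows only which residues are removed, not whether a given peptide would actually be cleaved. The function name is hypothetical.

```python
def release_c_terminal_dipeptide(peptide: str) -> tuple[str, str]:
    """Split a peptide into (remaining chain, released C-terminal dipeptide).

    Models only the cut position of a peptidyl-dipeptidase; real substrate
    specificity is ignored.
    """
    if len(peptide) < 3:
        raise ValueError("need at least three residues to release a dipeptide")
    return peptide[:-2], peptide[-2:]


# Angiotensin I (Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu) in one-letter code.
ANGIOTENSIN_I = "DRVYIHPFHL"
angiotensin_ii, dipeptide = release_c_terminal_dipeptide(ANGIOTENSIN_I)
print(angiotensin_ii, dipeptide)  # DRVYIHPF HL
```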

    \ 4007 IPR000221 \ Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of mammalian P1 PUBMED:2808336.\ 759 IPR002483 \

    The PWI domain, found only in eukaryotic proteins, contains a conserved PWI motif at its N terminus PUBMED:10322432. PWI motifs are present in the SRm160 splicing and 3'-end cleavage-stimulatory factor, and in other known pre-mRNA processing factors. PWI domains have a four-helix bundle fold and appear to have RNA/DNA-binding capability, with an equal preference for single- and double-stranded nucleic acids PUBMED:12600940. PWI may have important functions within splicing complexes, including SRm160-dependent stimulation of 3'-end formation.

    \ 506 IPR005635 \

    This region of the inner centromere protein has been found to be necessary and sufficient for binding to aurora-related kinase. This interaction has been implicated in the coordination of chromosome segregation with cell division in yeast.

    \ 251 IPR002902 \ This domain is found in plants and has no known function. It is found in serine/threonine kinases, associated with the eukaryotic protein kinase domain. The domain contains four conserved cysteines.\ 4812 IPR001498 \ This group contains a number of uncharacterised proteins from yeast and bacteria.\ 7787 IPR012471 \

    Family of uncharacterised fungal proteins.

    \ 5614 IPR008418 \ This family consists of several Barren protein homologues from various eukaryotic organisms. In Drosophila, Barren (barr) is required for sister-chromatid segregation in mitosis. barr encodes a novel protein that is present in proliferating cells and has homologues in Saccharomyces cerevisiae and Homo sapiens. Mitotic defects in barr embryos become apparent during cycle 16, resulting in a loss of PNS and CNS neurons. Centromeres move apart at the metaphase-anaphase transition and Cyclin B is degraded, but sister chromatids remain connected, resulting in chromatin bridging. Barren protein localises to chromatin throughout mitosis. Colocalisation and biochemical experiments indicate that Barren associates with Topoisomerase II throughout mitosis and alters the activity of Topoisomerase II. It has been suggested that this association is required for proper chromosomal segregation by facilitating the decatenation of chromatids at anaphase PUBMED:8978614.\ 2639 IPR006699 \

    Glycerol enters bacterial cells via facilitated diffusion, an energy-independent transport process catalysed by the glycerol transport facilitator GlpF, an integral membrane protein of the aquaporin family. Intracellular glycerol is usually converted to glycerol-3-P in an ATP-requiring phosphorylation reaction catalysed by glycerol kinase (GlpK). Glycerol-3-P, the inducer of the glpFK operon, is not a substrate for GlpF and hence remains trapped in the cell, where it is metabolized further. In some bacterial species, for example Bacillus subtilis, glycerol-3-P activates the antiterminator GlpP PUBMED:1809833. In Bacillus subtilis, glpF and glpK are organized in an operon followed by the glycerol-3-P dehydrogenase-encoding glpD gene and preceded by glpP, which codes for an antiterminator regulating the expression of glpFK, glpD and glpTQ. Their induction requires the inducer glycerol-3-P, which activates the antiterminator GlpP by allowing it to bind to the leader region of glpD and presumably also of glpFK and glpTQ mRNAs.

    \ 3282 IPR001129 \ This family describes a widespread superfamily of membrane-associated proteins with highly divergent functions in eicosanoid and glutathione metabolism (MAPEG) PUBMED:10091672. Included are the 5-lipoxygenase-activating protein (FLAP), which seems to be required for the activation of 5-lipoxygenase; leukotriene C4 synthase (), which catalyzes the production of LTC4 from LTA4; and microsomal glutathione S-transferase II () (GST-II), which also produces LTC4 from LTA4.\ 595 IPR000555 \

    Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors. This family is also known as the MPN domain PUBMED:9150866 and PAD-1-like domain PUBMED:9605331. It has been shown that this domain occurs in prokaryotes PUBMED:9605331.

    \ \

    Mov34 proteins act as regulatory subunits of the 26S proteasome, which is involved in the ATP-dependent degradation of ubiquitinated proteins. The function of this domain is unclear, but it is found in the N terminus of the proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors.

    \ \

    A number of the proteins associated with this family belong to MEROPS peptidase family M67 (clan M-). This includes the Poh1 peptidase of Saccharomyces cerevisiae which is a component of the 19S proteasome regulatory particle.

    \ 749 IPR001406 \ Transfer RNA-pseudouridine synthetase contains one atom of zinc essential for its native conformation and tRNA recognition PUBMED:9585540 and has a strictly conserved aspartic acid that is likely to be involved in catalysis. It is involved in the formation of pseudouridine at positions 38, 39 and 40 in the anticodon stem and loop of transfer RNAs. Pseudouridine is the most abundant modified nucleoside found in all cellular RNAs.\ 7868 IPR012525 \

    This family consists of diapausin-related antimicrobial peptides. Diapause during periods of environmental adversity is an essential part of the life cycle of many organisms, and its molecular basis differs among animals. Diapause-specific peptides provide antifungal activity and act as N-type voltage-gated calcium channel blockers PUBMED:14706547.

    \ 389 IPR002877 \

    RrmJ (FtsJ) is a well-conserved heat shock protein present in bacteria, archaea and eukaryotes. RrmJ is responsible for methylating 23S rRNA at position U2552 in the aminoacyl (A) site of the ribosome PUBMED:11976298. U2552 is one of the five universally conserved A-loop residues and has been shown to be methylated at the ribose 2'-OH group in the majority of organisms investigated so far, which suggests that this modification plays an important role in A-loop function. RrmJ recognizes its methylation target only when the 23S rRNA is present in 50S ribosomal subunits, which suggests that RrmJ-mediated methylation occurs late in ribosome maturation. This is in contrast to other known 23S rRNA modifications, which occur in earlier maturation steps.

    \ \

    The 1.5 Å crystal structure of RrmJ in complex with its cofactor S-adenosylmethionine revealed that RrmJ has a methyltransferase fold. The active site of RrmJ appears to be formed by a catalytic triad consisting of two lysine residues and a negatively charged aspartate residue. Another highly conserved glutamate residue present in the active site of RrmJ appears to play only a minor role in the methyltransfer reaction in vivo PUBMED:10983982.

    \ 3433 IPR004189 \

    This transposase is essential for integration, replication-transposition and excision of Enterobacteria phage Mu DNA. Transposition requires transposase and a transposition enhancer, and the DNA can be transposed into multiple sites in bacterial genomes.

    \

    The crystal structure of the core domain of Enterobacteria phage Mu transposase, MuA, has been determined. The first of two subdomains contains the active site and, despite very limited sequence homology, exhibits a striking similarity to the core domain of Human immunodeficiency virus-1 integrase. The enzymatic activity of MuA is known to be activated by formation of a DNA-bound tetramer of the protein PUBMED:7628012.

    \ 2343 IPR002775 \

    The DNA/RNA-binding protein Alba binds double-stranded DNA tightly but without sequence specificity. It binds rRNA and mRNA in vivo, and may play a role in maintaining the structural and functional stability of RNA and, perhaps, ribosomes. It is distributed uniformly and abundantly on the chromosome. Alba has been shown to bind DNA and affect DNA supercoiling in a temperature-dependent manner PUBMED:10869069. Its binding is regulated by acetylation (alba = acetylation lowers binding affinity), and it is deacetylated by the Sir2 protein. Alba is proposed to play a role in the establishment or maintenance of chromatin architecture and thereby in transcriptional repression.

    \ 5871 IPR009268 \

    These proteins of unknown function are found in Rice black streaked dwarf virus and other viruses.

    \ 5607 IPR008228 \ There is currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 4193 IPR001854 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal proteins of 63 to 138 amino-acid residues which, on the basis of sequence similarities PUBMED:, groups:\

    \ 2659 IPR007504 \ Gar1 is a small nucleolar RNP protein that is required for pre-rRNA processing and pseudouridylation PUBMED:10690410. It is co-immunoprecipitated with the H/ACA family of snoRNAs. This family represents the conserved central region of Gar1. This region is necessary and sufficient for normal cell growth, and specifically binds two snoRNAs, snR10 and snR30. This region is also necessary for nucleolar targeting, and it is thought that the protein is co-transported to the nucleolus as part of a nucleoprotein complex PUBMED:9556561. In humans, Gar1 is also a component of telomerase in vivo PUBMED:10757788.\ 414 IPR000160 \

    This domain appears to be ubiquitous in bacteria and is often linked to a regulatory domain, such as a phosphorylation receiver or oxygen-sensing domain. Its function is to synthesize cyclic di-GMP, which is used as an intracellular signalling molecule in a wide variety of bacteria PUBMED:15075296, PUBMED:15716451. Enzymatic activity can be strongly influenced by the adjacent domains. Processes regulated by this domain include exopolysaccharide synthesis, biofilm formation, motility and cell differentiation.

    \ \

    Structural studies of PleD from Caulobacter crescentus show that this domain forms a five-stranded beta sheet surrounded by helices, similar to the catalytic core of adenylate cyclase PUBMED:15569936.

    \ 2246 IPR007658 \ This is a family of uncharacterised proteins.\ 955 IPR003613 \

    Quality control of intracellular proteins is essential for cellular homeostasis. Molecular chaperones recognise and contribute to the refolding of misfolded or unfolded proteins, whereas the ubiquitin-proteasome system mediates the degradation of such abnormal proteins. Ubiquitin-protein ligases (E3s) determine the substrate specificity for ubiquitylation and have been classified into HECT and RING-finger families. More recently, however, U-box proteins, which contain a domain (the U box) of about 70 amino acids that is conserved from yeast to humans, have been identified as a new type of E3 PUBMED:12944364.

    \ \

    Members of the U-box family of proteins constitute a class of ubiquitin-protein ligases (E3s) distinct from the HECT-type and RING finger-containing E3 families PUBMED:12944364. Using yeast two-hybrid technology, all mammalian U-box proteins have been reported to interact with molecular chaperones or co-chaperones, including Hsp90, Hsp70, DnaJc7, EKN1, CRN, and VCP. This suggests that the function of U box-type E3s is to mediate the degradation of unfolded or misfolded proteins in conjunction with molecular chaperones as receptors that recognise such abnormal proteins PUBMED:15115282, PUBMED:15189447.

    \ \

    Unlike the RING finger domain, which is stabilised by Zn2+ ions coordinated by the cysteines and a histidine, the U-box scaffold is probably stabilised by a system of salt bridges and hydrogen bonds. The charged and polar residues that participate in this network of bonds are more strongly conserved in U-box proteins than in classic RING fingers, which supports their role in maintaining the stability of the U box. Thus, the U box appears to have evolved from a RING finger domain by appropriating a new set of residues required to stabilise its structure, concomitant with the loss of the original metal-chelating residues PUBMED:10704423.

    \ \ 7455 IPR013039 \

    A region of similarity shared by several Rhodopirellula baltica cytochrome-like proteins that are predicted to be secreted. These proteins also contain , , and .

    \ 2218 IPR007570 \ This is a protein of unknown function found in a cyanobacterium, and the chloroplasts of algae.\ 4025 IPR003398 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbN found in PSII. PsbN may have a role in PSII stability; however, its actual function is unknown. PsbN does not appear to be essential for photoautotrophic growth or normal PSII function.

    \ 2600 IPR006157 \

    Dihydroneopterin aldolase catalyses the conversion of 7,8-dihydroneopterin to 6-hydroxymethyl-7,8-dihydropterin in the biosynthetic pathway of tetrahydrofolate. In the opportunistic pathogen Pneumocystis carinii, dihydroneopterin aldolase function is expressed as the N-terminal portion of the multifunctional folic acid synthesis protein (Fas). This region encompasses two domains, FasA and FasB, which are 27% identical at the amino acid level. FasA and FasB also share significant amino acid sequence similarity with bacterial dihydroneopterin aldolases.

    This region consists of two tandem sequences, each homologous to folB, which form tetramers PUBMED:9709001.

    \ 4223 IPR002768 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The ribosomal protein LX appears to be specific to archaebacteria.

    \ 2682 IPR007389 \

    This domain is found at the N terminus of D-galactarate dehydratase () which is thought to catalyse the reaction PUBMED:9772162 and altronate hydrolase (altronic acid hydratase, ), which catalyses PUBMED:9579062. As purified, both enzymes are catalytically inactive in the absence of added Fe2+, Mn2+, and beta-mercaptoethanol. Synergistic activation of altronate hydrolase activity is seen in the presence of both iron and manganese ions, suggesting that the enzyme may have two ion binding sites. Mn2+ appears to be part of the enzyme active centre, but the function of the single bound Fe2+ ion is unknown. The hydratase has no Fe-S core PUBMED:3038546. The C-terminal region is represented by .

    \ 1293 IPR005143 \

    This domain binds N-acyl homoserine lactones (AHLs), which are also known as autoinducers. These are small, diffusible molecules used as communication signals in a large variety of proteobacteria. It is almost always found in association with the DNA-binding LuxR domain (). The autoinducer binding domain forms the N-terminal region of the protein, while the DNA-binding domain forms the C-terminal region. In most cases, binding of AHL by this N-terminal domain leads to unmasking of the DNA-binding domain, allowing it to bind DNA and activate transcription PUBMED:11544353. In rare cases, LuxR proteins such as EsaR act as repressors PUBMED:12067349. In these proteins, binding of AHL to this domain leads to inactivation of the protein as a transcriptional regulator. A large number of processes have been shown to be regulated by LuxR proteins, including bioluminescence, production of virulence factors in plant and animal pathogens, antibiotic production and plasmid transfer.

    \ \

    Structural studies of TraR from Agrobacterium tumefaciens PUBMED:12087407, PUBMED:12198141 show that the functional protein is a homodimer. Binding of the cognate AHL is required for protein folding, resistance to proteases and dimerisation. The autoinducer binding domain binds its cognate AHL in an alpha/beta/alpha sandwich and provides an extensive dimerisation surface, though residues from the C-terminal region also make some contribution to dimerisation. The autoinducer binding domain is also required for interaction with RpoA, allowing transcription to occur PUBMED:15237104.

    \ \

    There are some proteins which consist solely of the autoinducer binding domain. The function of these is not known, but TrlR from Agrobacterium has been shown to inhibit the activity of TraR by the formation of inactive heterodimers PUBMED:11309123.

    \ 6981 IPR009823 \

    This family consists of several SORF3 proteins from the Marek's disease-like viruses. Members of this family are around 350 residues in length. The function of this family is unknown.

    \ 5833 IPR009252 \

    This family consists of several bacterial and archaeal hypothetical proteins of unknown function.

    \ 7946 IPR012589 \

    This family consists of small proteolipids associated with the plasma membrane H+ ATPase. Two proteolipids (PMP1 and PMP2) are associated with the ATPase, and both genes are similarly expressed in wild-type yeast; no change in the transcription level of one PMP gene is detected in a strain deleted for the other. Though both proteolipids show similarity to other small proteolipids associated with other cation-transporting ATPases, their functions remain unclear PUBMED:8063750.

    \ 5489 IPR008526 \ This family consists of several bacterial proteins of unknown function.\ 661 IPR007724 \ Poly(ADP-ribose) glycohydrolase (PARG) is a ubiquitously expressed exo- and endoglycohydrolase which mediates oxidative and excitotoxic neuronal death PUBMED:11593040.\ 121 IPR001956 \

    This domain is involved in cellulose binding PUBMED:1490597 and is found associated with a wide range of bacterial glycosyl hydrolases. The structure of this domain is known PUBMED:8918451; it forms a beta sandwich.

    \ 6586 IPR010630 \

    This entry represents several mammalian specific repeats of around 65 residues in length and is found in multiple copies in several human proteins. The function of this family is unknown.

    \ 3646 IPR003721 \ D-Pantothenate is synthesized via four enzymes from ketoisovalerate, which is an intermediate of branched-chain amino acid synthesis PUBMED:10223988. Pantoate-beta-alanine ligase, also known as pantothenate synthase (), catalyzes the formation of pantothenate from pantoate and beta-alanine in the pantothenate biosynthesis pathway PUBMED:8760912.\ 1146 IPR007733 \ The agouti protein regulates pigmentation in the mouse hair follicle, producing a black hair with a subapical yellow band. A highly homologous protein, agouti signal protein (ASIP), is present in humans and is expressed at highest levels in adipose tissue, where it may play a role in energy homeostasis and possibly human pigmentation PUBMED:11837451, PUBMED:11833005.\ 762 IPR001327 \

    This entry describes a small NADH binding domain within a larger FAD binding domain described by . It is found in both class I and class II oxidoreductases.

    \

    FAD flavoproteins belonging to the family of pyridine nucleotide-disulphide oxidoreductases (glutathione reductase, trypanothione reductase, lipoamide dehydrogenase, mercuric reductase, thioredoxin reductase, alkyl hydroperoxide reductase) share sequence similarity with a number of other flavoprotein oxidoreductases, in particular with ferredoxin-NAD+ reductases involved in oxidative metabolism of a variety of hydrocarbons (rubredoxin reductase, putidaredoxin reductase, terpredoxin reductase, ferredoxin-NAD+ reductase components of benzene 1,2-dioxygenase, toluene 1,2-dioxygenase, chlorobenzene dioxygenase, biphenyl dioxygenase), NADH oxidase and NADH peroxidase PUBMED:2319593, PUBMED:1404382, PUBMED:2067578. Comparison of the crystal structures of human glutathione reductase and Escherichia coli thioredoxin reductase reveals different locations of their active sites, suggesting that the enzymes diverged from an ancestral FAD/NAD(P)H reductase and acquired their disulphide reductase activities independently PUBMED:2067578.

    \

    \ Despite functional similarities, oxidoreductases of this family show no sequence similarity with adrenodoxin reductases PUBMED:2924777 and flavoprotein pyridine nucleotide cytochrome reductases (FPNCR) PUBMED:1748631. Assuming that disulphide reductase activity emerged later, during divergent evolution, the family can be referred to as FAD-dependent pyridine nucleotide reductases, FADPNR.

    \

    To date, 3D structures of glutathione reductase PUBMED:3656429, thioredoxin reductase PUBMED:2067578, mercuric reductase PUBMED:2067577, lipoamide dehydrogenase PUBMED:1880807, trypanothione reductase PUBMED:1924336 and NADH peroxidase PUBMED:1942054 have been solved. The enzymes share similar tertiary structures based on a doubly-wound alpha/beta fold, but the relative orientations of their FAD- and NAD(P)H-binding domains may vary significantly. In contrast to the FPNCR family, the folds of the FAD- and NAD(P)H-binding domains are similar, suggesting that the domains evolved by gene duplication PUBMED:7411611.\

    \ 2585 IPR001591 \ The Orthomyxoviridae RNA polymerase, with the subunit composition PB1-PB2-PA, is a unique multifunctional enzyme with the activities of both synthesis and cleavage of RNA, and is involved in both transcription and replication of the RNA genome. Transcription is initiated using capped RNA fragments, which are generated after cleavage of host cell mRNA by the RNA polymerase-associated capped RNA endonuclease PUBMED:8806170. Two separate segments of the PB2 subunit, one N-terminal proximal (residues 242-282) and the other C-terminal proximal (residues 538-577), appear to constitute the RNA cap-binding site of the influenza virus RNA polymerase PUBMED:10526235.\ 1681 IPR004323 \

    CutA1 is a widespread protein of about 12 kDa found in bacteria, plants, and animals, including humans PUBMED:12949080. The protein was originally identified in an Escherichia coli gene locus called cutA, involved in divalent metal tolerance PUBMED:7623666. The cutA locus consists of two operons, one containing a single gene encoding a cytoplasmic protein, CutA1, and the other composed of two genes encoding 50-kDa (CutA2) and 24-kDa (CutA3) inner membrane proteins. Molecular genetics studies of the E. coli cutA locus showed that some mutations lead to copper sensitivity due to increased copper uptake PUBMED:9260936. However, the specific function of CutA1 in E. coli is still unknown.

    \

    However, a possible role has been suggested for mammalian CutA1 in anchoring the enzyme acetylcholinesterase (AChE) in neuronal cell membranes. CutA1 does not directly interact with AChE, but the CutA1 gene is widely expressed in different regions of the brain with an expression pattern that parallels that of AChE. In addition, CutA1 co-purified with AChE from human caudate nucleus. CutA1 might thus provide an intriguing link between copper tolerance in bacteria and a complex process in the brain of the most evolved organisms.

    \

    Both rat and E. coli CutA1 have been crystallised PUBMED:12949080. Both proteins are trimeric in the crystals and in solution through an inter-subunit beta-sheet formation. Each monomer exhibits the same overall structure, adopting a ferredoxin-like fold made of an alpha-beta sandwich with antiparallel beta-sheet and containing an additional short strand and a C-terminal helix. In the beta-sheet, alternate strands are connected by helices with positive crossovers, resulting in a double beta-alpha-beta motif where the antiparallel beta-sheet packs against antiparallel alpha-helices. The C-terminal helix packs orthogonal to the N terminus.

    \ \

    \ The strong structural similarity of CutA1 with PII proteins might point to a role for CutA1 in signalling through allosteric communication between monomers. CutA1 may be involved in the tuning of a disulphide bond cascade in bacteria and mammals, acting as the PII proteins do in the nitrogen signal cascade in bacteria and plants.

    \ 5461 IPR008511 \ This family consists of several plant proteins of unknown function.\ 1758 IPR007030 \

    This conserved region is about 120 residues long, encompassing nearly the entire sequence length. It defines a family of bacterial proteins whose functions have not been determined. This family is named after the most conserved motif found in the alignment of the family members. In a single instance it is found as an N-terminal domain fused to a putative RNA polymerase sigma factor containing a Myb-like DNA-binding domain ().

    \ 2710 IPR008147 \

    Glutamine synthetase () (GS) PUBMED:2900091 plays an essential role in the metabolism of nitrogen by catalyzing the condensation of glutamate and ammonia to form glutamine.

    \

    There seem to be three different classes of GS PUBMED:8096645, PUBMED:2575672, PUBMED:7916055:\

    \

    While the three classes of GS are clearly structurally related, the sequence similarities between them are limited.

    \ 1183 IPR000120 \ It has been shown PUBMED:2254253, PUBMED:2001397, PUBMED:2263500 that several enzymes from various prokaryotic and eukaryotic organisms which are involved in the hydrolysis of amides (amidases) are evolutionarily related. All these enzymes contain in their central section a highly conserved region rich in glycine, serine, and alanine residues.\ 702 IPR006444 \

    This family represents the major capsid protein component of the heads (capsids) of bacteriophage HK97, phi-105, P27, and related phage. This group represents one of several analogous families lacking detectable sequence similarity to one another. The gene encoding this component is typically located in an operon encoding the small and large terminase subunits, the portal protein and the prohead or maturation protease.

    \ 2644 IPR003322 \

    Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes PUBMED:9657938. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, their primary sequences can be very different.

    \

    This entry represents matrix proteins from beta-retroviruses such as Mason-Pfizer monkey virus (M-PMV) and mouse mammary tumour virus (MMTV) PUBMED:15113883, PUBMED:9499052. This entry also identifies matrix proteins from several eukaryotic endogenous retroviruses, which arise when one or more copies of the retroviral genome become integrated into the host genome PUBMED:12876457.

    \ 7719 IPR012852 \

    Proteins found in this family are similar to the coiled-coil transcriptional coactivator protein expressed by Mus musculus (CoCoA, ). This protein binds to a highly conserved N-terminal domain of p160 coactivators, such as GRIP1 (), and thus enhances transcriptional activation by a number of nuclear receptors. CoCoA has a central coiled-coil region with three leucine zipper motifs, which is required for its interaction with GRIP1 and may regulate the autonomous transcriptional activation activity of the C-terminal region PUBMED:14690606.

    \ 1883 IPR003730 \

    This entry describes proteins of unknown function.

    \ 581 IPR004166 \ This domain defines a novel group of eukaryotic protein kinase catalytic domains, which have no detectable similarity to conventional kinases. Proteins containing it include myosin heavy chain kinases PUBMED:7822274, PUBMED:9054368, elongation factor-2 kinase and a bifunctional ion channel PUBMED:11161216.\ 5068 IPR007905 \

    Emopamil binding protein (EBP) is a nonglycosylated type I integral membrane protein of the endoplasmic reticulum and shows high-level expression in epithelial tissues. The EBP protein has emopamil binding domains, including the sterol acceptor site and the catalytic centre, which show Delta8-Delta7 sterol isomerase activity. Human sterol isomerase, a homologue of mouse EBP, is suggested not only to play a role in cholesterol biosynthesis, but also to affect lipoprotein internalisation. In humans, mutations of EBP are known to cause the genetic disorder X-linked dominant chondrodysplasia punctata (CDPX2). This syndrome is lethal in most males, and affected females display asymmetric hyperkeratotic skin and skeletal abnormalities PUBMED:11471053.

    \ 3864 IPR000081 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
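    The identifier convention described above can be made concrete with a small lookup. The sketch below is hypothetical (the helper `describe` and its output strings are not part of MEROPS or InterPro); it only decodes the first character of a family or clan identifier such as C3 or PA, and reports the nucleophile that this entry associates with each catalytic type.

```python
# Hypothetical helper illustrating the MEROPS convention described above:
# the first character of a family or clan identifier gives its catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan with families of more than one type)",
}

# Serine, threonine and cysteine peptidases use part of an amino acid as the
# nucleophile and form an acyl intermediate.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
# Aspartic, glutamic and metallopeptidases use an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe(identifier: str) -> str:
    """Decode the catalytic type of a MEROPS-style identifier such as 'C3' or 'PA'."""
    if not identifier:
        raise ValueError("empty identifier")
    letter = identifier[0].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unknown catalytic type letter: {letter!r}")
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino-acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        # 'U' (unknown) and 'P' (mixed clan) do not imply a single nucleophile.
        nucleophile = "not implied by the type letter"
    return f"{identifier}: {CATALYTIC_TYPES[letter]}; nucleophile: {nucleophile}"


if __name__ == "__main__":
    for ident in ("C3", "S1", "M10", "PA"):
        print(describe(ident))
```

    For example, family C3 (the picornain family discussed below) decodes as a cysteine peptidase family, while its clan PA decodes as mixed, since PA contains both serine and cysteine families.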

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This domain defines cysteine peptidases belonging to MEROPS peptidase family C3 (picornain, clan PA(C)), subfamilies 3CA and 3CB. The protein fold of the peptidase domain of members of this family resembles that of the serine peptidase chymotrypsin PUBMED:8164744, the type example for clan PA.

    \ \

    Picornaviral proteins are expressed as a single polyprotein which is cleaved by the viral 3C cysteine protease PUBMED:9460917. The poliovirus polyprotein is selectively cleaved at Gln-|-Gly bonds. In other picornavirus reactions Glu may be substituted for Gln, and Ser or Thr for Gly.\
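    The P1-P1' rule above can be expressed as a sequence pattern. The sketch below is only an illustration under a strong simplifying assumption: it scans a one-letter protein sequence for Gln or Glu followed by Gly, Ser or Thr and nothing else. Real 3C specificity depends on more than these two residues, so a scan like this will flag many more bonds than are actually cleaved. The function names `candidate_cuts` and `fragments` are hypothetical.

```python
import re

# Simplified P1-P1' rule from the text: Gln (Q) or Glu (E) at P1, followed by
# Gly (G), Ser (S) or Thr (T) at P1'. The zero-width lookahead lets adjacent
# candidate sites overlap without consuming residues.
CANDIDATE_SITE = re.compile(r"(?=[QE][GST])")


def candidate_cuts(polyprotein: str) -> list[int]:
    """Return cut positions i, meaning the bond between residues i-1 and i (0-based)."""
    return [m.start() + 1 for m in CANDIDATE_SITE.finditer(polyprotein.upper())]


def fragments(polyprotein: str, cuts: list[int]) -> list[str]:
    """Split the sequence at the given cut positions."""
    bounds = [0, *cuts, len(polyprotein)]
    return [polyprotein[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAQGLLETKKQA"  # made-up sequence, not a real polyprotein
    cuts = candidate_cuts(toy)
    print(cuts, fragments(toy, cuts))
```

    On the made-up sequence AAQGLLETKKQA this flags the Q-|-G and E-|-T bonds and yields the fragments AAQ, GLLE and TKKQA; the final Q-|-A bond is not flagged because Ala is not an accepted P1' residue under this rule.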

    \ 6244 IPR010489 \

    This family consists of several hypothetical bacterial proteins exclusive to Escherichia coli and Salmonella typhi. The function of this family is unknown.

    \ 7483 IPR006209 \ A sequence about thirty to forty amino-acid residues long, found in the sequence of epidermal growth factor (EGF), has been shown PUBMED:, PUBMED:3282918, PUBMED:6607417, PUBMED:2288911, PUBMED:6334307 to be present, in a more or less conserved form, in a large number of other, mostly animal proteins. The list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied. The functional significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). The EGF domain includes six cysteine residues which have been shown (in EGF) to be involved in disulphide bonds. The main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary in length.\ 5168 IPR008005 \

    This family consists of several uncharacterised nucleopolyhedrovirus proteins of unknown function.

    \ 5735 IPR008584 \ This family consists of a number of hypothetical eukaryotic proteins of unknown function with an average length of around 165 residues.\ 5815 IPR010295 \

    This family consists of several bacterial proteins of unknown function. Some of the family, including YjgN, are putative transmembrane proteins.

    \ 3639 IPR005149 \

    Phenolic acids, also called substituted hydroxycinnamic acids, are abundant in the plant kingdom because they are involved in the structure of plant cell walls and are present in some vacuoles. In plant-soil ecosystems they are released as free acids by hemicellulases produced by several fungi and bacteria. Of these weak acids, the most abundant are p-coumaric, ferulic, and caffeic acids, considered to be natural toxins that inhibit the growth of microorganisms, especially at low pH. In spite of this chemical stress, some bacteria can use phenolic acids as a sole source of carbon. For other microorganisms, these compounds induce a specific response by which the organism adapts to its environment. The ubiquitous lactic acid bacterium Lactobacillus plantarum exhibits an inducible phenolic acid decarboxylase (PAD) activity which converts these substrates into less-toxic vinyl phenol derivatives. PadR acts as a repressor of padA gene expression in the phenolic acid stress response PUBMED:15066807.

    \ 2060 IPR007257 \

    DNA replication in eukaryotes results from a highly coordinated interaction between proteins, often as part of protein complexes, and the DNA template. One of the key early steps leading to DNA replication is formation of the prereplication complex, or pre-RC. The pre-RC is formed by the sequential binding of the origin recognition complex (ORC), Cdc6 and Cdt1 proteins, and the MCM complex. Activation of the pre-RC into the initiation complex (IC) is achieved via the action of S-phase kinases, eventually leading to the loading of the replication machinery.

    \

    Recently, a novel replication complex, GINS (for Go, Ichi, Nii, and San; five, one, two, and three in Japanese), has been identified PUBMED:12730133, PUBMED:12730134. \ \ The precise function of GINS is not known. However, genetic and two-hybrid interactions indicate that it mediates the loading of the enzymatic replication machinery at a step after the action of the S-phase kinases PUBMED:12730134. Furthermore, GINS may be a part of the replication machinery itself, since it is found associated with replicating DNA PUBMED:12730133, PUBMED:12730134. Electron microscopy of GINS shows that it forms a ring-like structure PUBMED:12730133, reminiscent of the structure of PCNA PUBMED:8001157, the DNA polymerase delta replication clamp. This observation, coupled with the observed interactions for GINS, indicates that the complex may represent the replication clamp for DNA polymerase epsilon PUBMED:12730133.

    \ \ \

    The GINS complex is essential for initiation of DNA replication in Xenopus egg extracts PUBMED:12730133. This 100 kDa stable complex includes Sld5, Psf1, Psf2, and Psf3. Homologues of these components are found also in other eukaryotes. This family of proteins represents the Psf2 component.

    \ 4469 IPR004261 \ The hepatitis E virus structural protein 2 has a high basic amino acid content suggesting that it may play a role in viral genomic RNA encapsidation.\ 2240 IPR007610 \

    This region represents the N terminus of the bromovirus 2a protein, and is always found N-terminal to a predicted RNA-dependent RNA polymerase region ().

    \ 1193 IPR003393 \

    Ammonia monooxygenase and the particulate methane monooxygenase are both integral membrane proteins, occurring in ammonia oxidisers and methanotrophs respectively, which are thought to be evolutionarily related PUBMED:7590173. These enzymes have a relatively wide substrate specificity and can catalyse the oxidation of a range of substrates including ammonia, methane, halogenated hydrocarbons and aromatic molecules PUBMED:12209257. These enzymes are composed of three subunits, A (), B () and C (), and contain various metal centres, including copper. Particulate methane monooxygenase from Methylococcus capsulatus (Bath) is an ABC homotrimer, which contains mononuclear and dinuclear copper metal centres, and a third metal centre containing a metal ion whose identity in vivo is not certain PUBMED:15674245.

    \

    The A subunit from Methylococcus capsulatus (Bath) resides primarily within the membrane and consists of 7 transmembrane helices and a beta-hairpin which interacts with the soluble region of the B subunit. A conserved glutamate residue is thought to contribute to a metal centre PUBMED:15674245.

    \ 752 IPR007557 \ This region is present in both eukaryotes and eubacteria. The yeast PSP1 protein is involved in suppressing mutations in the DNA polymerase alpha subunit PUBMED:9529527.\ 296 IPR007632 \ This family contains several uncharacterised eukaryotic proteins.\ 1671 IPR003329 \

    Synonym(s): CMP-N-acetylneuraminic acid synthetase

    \

    Acylneuraminate cytidylyltransferase () (CMP-NeuAc synthetase) catalyzes the reaction of CTP and NeuAc to form CMP-NeuAc, which is the nucleotide sugar donor used by sialyltransferases PUBMED:8663048. The outer membrane lipooligosaccharides of some microorganisms contain terminal sialic acid attached to N-acetyllactosamine and so this modification may be important in pathogenesis.

    \ 7535 IPR011660 \ This entry represents the Rv0623 ()-like group of transcription factors associated with the PSK operon PUBMED:14659018.\ 1982 IPR005104 \ This domain is always found with a pair of CBS domains . This region may be distantly related to the HrcA proteins of prokaryotes.\ 7472 IPR011503 \

    This conserved sequence is centered around an invariant motif of PGAMP in several short hypothetical proteins from the planctomycete Rhodopirellula baltica. The motif also occurs twice in .

    \ 6350 IPR010530 \

    This family consists of several plant specific B12D proteins. The function of this protein is unknown but in barley B12D transcripts are expressed mainly during seed maturation and germination PUBMED:11473698.

    \ 167 IPR005480 \

    Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine () or ammonia () and bicarbonate PUBMED:1972379. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and pyrimidines. Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines.

    In bacteria such as Escherichia coli, a \ single enzyme is involved in both biosynthetic pathways, while other bacteria have separate enzymes. The \ bacterial enzymes are formed of two subunits: a small chain (carA) that provides the glutamine amidotransferase \ (GATase) activity needed to release ammonia from the amide group of glutamine, and a large chain (carB)\ that provides CPSase activity. The large subunit consists of\ four structural units: the carboxyphosphate synthetic component, the oligomerization domain, the carbamoyl phosphate synthetic\ component and the allosteric domain PUBMED:10089390. Such a structure is also present in fungi for arginine biosynthesis (CPA1 \ and CPA2).

    Two main CPSases have been identified in mammals. CPSase I is mitochondrial, is found at \ high levels in the liver and is involved in arginine biosynthesis, while CPSase II is cytosolic, is \ associated with aspartate carbamoyltransferase (ATCase) and dihydroorotase (DHOase) and is involved in \ pyrimidine biosynthesis. In the pyrimidine pathway in most eukaryotes, CPSase is found as a domain in a \ multi-functional protein, which also has GATase, ATCase and DHOase activity. Ammonia-dependent CPSase \ (CPSase I) is involved in the urea cycle in ureolytic vertebrates and is a monofunctional protein located \ in the mitochondrial matrix. The CPSase domain is typically 120 kDa in size and has arisen from the \ duplication of an ancestral subdomain of about 500 amino acids. Each subdomain independently binds ATP, \ and it is suggested that the two homologous halves act separately, one to catalyze the phosphorylation of \ bicarbonate to carboxyphosphate and the other that of carbamate to carbamyl phosphate. The CPSase subdomain \ is also present in a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase (ACC), \ propionyl-CoA carboxylase (PCCase), pyruvate carboxylase (PC) and urea carboxylase.
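The two-half-site scheme described above can be summarised as a series of partial reactions whose sum gives the overall stoichiometry. This is the usual textbook decomposition, sketched here rather than taken from the cited papers. For the glutamine-dependent enzymes, the ammonia is assumed to come from the GATase reaction on the first line. Charges and protons are not balanced.

```latex
\begin{align*}
\text{glutamine} + \mathrm{H_2O} &\longrightarrow \text{glutamate} + \mathrm{NH_3}
  && \text{(GATase; glutamine-dependent enzymes only)} \\
\mathrm{HCO_3^-} + \mathrm{ATP} &\longrightarrow \text{carboxyphosphate} + \mathrm{ADP} \\
\text{carboxyphosphate} + \mathrm{NH_3} &\longrightarrow \text{carbamate} + \mathrm{P_i} \\
\text{carbamate} + \mathrm{ATP} &\longrightarrow \text{carbamoyl phosphate} + \mathrm{ADP} \\[4pt]
\text{net (from } \mathrm{NH_3}\text{):}\quad
2\,\mathrm{ATP} + \mathrm{HCO_3^-} + \mathrm{NH_3} &\longrightarrow 2\,\mathrm{ADP} + \mathrm{P_i} + \text{carbamoyl phosphate}
\end{align*}
```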

    \ 2492 IPR000146 \ Inositol polyphosphate 1-phosphatase (1PTASE) and inositol monophosphatase\ (MPTASE) are enzymes of the inositol signalling pathway that share similar\ enzymatic activity PUBMED:7761465. Both enzymes exhibit an absolute requirement for\ metal ions (Mg2+ is preferred), and both are uncompetitively inhibited by\ submillimolar concentrations of Li+. Their amino acid sequences contain\ a number of conserved motifs, which are also shared by several other \ proteins related to MPTASE (including products of fungal QaX and qutG,\ bacterial suhB and cysQ, and yeast hal2) PUBMED:7761465. \

    Structural analysis of these proteins has revealed a common core of 155\ residues: the core comprises 5 alpha-helices and 11 beta-strands, and\ includes residues essential for metal binding and catalysis. While the\ core has been conserved presumably to impart catalytic function, the\ loops and regions of structure outside the core have evolved unique\ regulatory domains PUBMED:7761465.

    \

    An interesting property of the enzymes of this family is their sensitivity\ to Li+ at levels achieved in patients undergoing therapy for manic\ depression. The targets and mechanism of action of Li+ are unknown, but\ overactive inositol phosphate signalling may account for symptoms of the\ disease PUBMED:2553271. It has been proposed that these Li+-sensitive proteins \ could represent targets for Li+ in manic depressive disease PUBMED:7761465. \ Recently, the fold of fructose 1,6-bisphosphatase (FBPTASE) was noted to\ be identical to that of MPTASE PUBMED:8382485. FBPTASE is a critical enzyme in the\ gluconeogenic pathway that removes the 1-phosphate from fructose 1,6-bisphosphate\ to form fructose 6-phosphate PUBMED:2159755, PUBMED:3008716. FBPTASE also requires metal\ ions for catalysis (Mg2+ and Mn2+ being preferred) and the enzyme is \ potently inhibited by Li+.\ 1PTASE, MPTASE and FBPTASE share a sequence motif (Asp-Pro-Ile/Leu-Asp-Gly/Ser-Thr/Ser),\ which has been shown to bind metal ions and participate\ in catalysis. This motif is also found in the distantly related fungal,\ bacterial and yeast MPTASE homologues. It has been suggested that these\ proteins define an ancient, structurally conserved family involved in\ diverse metabolic pathways, including inositol signalling, gluconeogenesis,\ sulphate assimilation and possibly quinone metabolism PUBMED:7761465.
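The shared metal-binding motif can be written as a regular expression in one-letter amino-acid codes: D-P-[IL]-D-[GS]-[TS]. The sketch below scans a sequence for it. The function name `find_metal_motif` and the toy sequence are hypothetical. A match is only a sequence-level hint and is not evidence of metal binding or membership in this family.

```python
import re

# Asp-Pro-(Ile|Leu)-Asp-(Gly|Ser)-(Thr|Ser) in one-letter amino-acid codes.
METAL_MOTIF = re.compile(r"DP[IL]D[GS][TS]")


def find_metal_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group()) for m in METAL_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Made-up toy sequence for illustration, not a real protein.
    toy = "MKTAWDPIDGTSLQEEDPLDSSA"
    for start, residues in find_metal_motif(toy):
        print(f"motif {residues} at position {start}")
```

The pattern starts with Asp, and the only other Asp is at the fourth position, followed by Gly/Ser rather than Pro. Two hits therefore cannot overlap, so the non-overlapping scan done by `re.finditer` misses nothing.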

    \ 5060 IPR007897 \

    Proteins containing this domain are typically involved in regulating polymer accumulation in\ bacteria, for example the production of poly-beta-hydroxybutyrate (PHB), which is formed by the polymerization of D(-)-3-hydroxybutyryl-CoA PUBMED:9922249. The function of\ this domain is unknown.

    \ 1304 IPR003132 \ This family contains the B domain of staphylococcal protein A, which specifically binds to the Fc portion of immunoglobulin G.\ 800 IPR007066 \ RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 3, represents the pore domain. The 3' end of RNA is positioned close to this domain. The pore delimited by this domain is thought to act as a channel through which nucleotides enter the active site and/or through which the 3' end of the RNA may be extruded during back-tracking PUBMED:8910400, PUBMED:11313498.\ 5645 IPR008814 \ This family consists of several eukaryotic Ribophorin II (RPN2) proteins. The mammalian oligosaccharyltransferase (OST) is a protein complex that effects the cotranslational N-glycosylation of newly synthesised polypeptides, and is composed of at least four rough ER-specific membrane proteins: ribophorins I and II (RI and RII), OST48, and DAD1. The mechanism(s) by which the subunits of this complex are retained in the ER are not well understood PUBMED:10826490.\ 6698 IPR009668 \

    Saccharomyces cerevisiae A49 is a specific subunit associated with RNA polymerase I (Pol I) in eukaryotes. Pol I maintains transcription activity in A49 deletion mutants; however, such mutants are deficient in transcription activity at low temperatures. Deletion analysis of the fission yeast homologue indicates that only the C-terminal two thirds of the protein are required for function. Transcript analysis has demonstrated that A49 is required for maximal transcription of ribosomal DNA PUBMED:12893961.

    \ 3896 IPR003342 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates and related proteins into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

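    Dolichyl-phosphate-mannose-protein mannosyltransferase proteins belong to glycosyltransferase family 39 and are responsible for the O-linked glycosylation of proteins. They catalyse the transfer of a mannosyl residue from dolichyl phosphate mannose to a serine or threonine residue of the acceptor protein, releasing dolichyl phosphate.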

    \

    The transfer of mannose to seryl and threonyl residues of secretory proteins is catalyzed by a family of protein mannosyltransferases in Saccharomyces cerevisiae coded for by seven genes (PMT1-7). Protein O-glycosylation is essential for cell wall rigidity and cell integrity and this protein modification is vital for S. cerevisiae PUBMED:8918452.

    \ 7305 IPR011099 \

    Alpha-glucuronidases, components of an ensemble of enzymes central to the recycling of photosynthetic biomass, remove the alpha-1,2-linked 4-O-methyl glucuronic acid from xylans. This family represents the C-terminal region of alpha-glucuronidase, which is mainly alpha-helical. It wraps around the catalytic domain, makes additional interactions with the N-terminal domain of its parent monomer, and forms the majority of the dimer surface with the equivalent C-terminal domain of the other monomer PUBMED:11937059.

    \ 1151 IPR002695 \ This is a family of bifunctional enzymes catalysing the last two steps in de novo purine biosynthesis. The bifunctional enzyme is found in both prokaryotes and eukaryotes. The penultimate step is catalysed by 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase (AICARFT), which formylates 5-aminoimidazole-4-carboxamide ribonucleotide (AICAR) with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate PUBMED:9332377. The last step is catalysed by IMP (inosine monophosphate) cyclohydrolase (IMPCHase), which cyclizes FAICAR (5-formylaminoimidazole-4-carboxamide ribonucleotide) to IMP PUBMED:9332377.\ 7076 IPR006611 \

    This cysteine-rich family of proteins has currently only been identified in Drosophila species.

    \ 3581 IPR007049 \

    The carbohydrate-selective porin OprB family includes the Pseudomonas aeruginosa porin B, a substrate-selective channel for a variety of different sugars. This protein may facilitate the diffusion of a variety of compounds, but is probably restricted to carbohydrates, and does facilitate glucose diffusion across the outer membrane.

    \ \ 1697 IPR002541 \ This entry consists of various proteins involved in cytochrome c\ assembly from mitochondria and bacteria: CycK from Rhizobium leguminosarum PUBMED:7665469, \ CcmC from Escherichia coli and Paracoccus denitrificans PUBMED:7635817, PUBMED:9043133\ and orf240 from Triticum aestivum (wheat) mitochondria PUBMED:7529870. \ The members of this family are probably integral membrane proteins\ with six predicted transmembrane helices that may comprise the membrane component of an \ ABC (ATP-binding cassette) transporter complex. This transporter may be necessary for transport of some component \ needed for cytochrome c assembly. \

    One member, R. leguminosarum CycK, contains a putative heme-binding motif PUBMED:7665469. Wheat \ orf240 also contains a putative heme-binding motif and is a proposed \ ABC transporter with c-type heme as its proposed substrate PUBMED:7529870.\ However it seems unlikely that all members of this family transport\ heme or c-type apocytochromes because P. denitrificans CcmC transports neither PUBMED:9043133.

    \ 1207 IPR003679 \ This family consists of bacterial aminoglycoside 3-N-acetyltransferases, which catalyse the acetyl-CoA-dependent acetylation of aminoglycoside antibiotics PUBMED:1761222.\ The enzyme\ can use a range of antibiotics with 2-deoxystreptamine rings as acceptors for its acetyltransferase activity; this\ inactivates them and confers resistance to gentamicin, kanamycin, tobramycin, neomycin and apramycin, amongst others. For the kanamycin-group antibiotics, acetylation was found to occur at the 3"-amino group in arbekacin and amikacin, but at the 3-amino group in dibekacin, as in the case of kanamycin. This difference reflects the effect of the (S)-4-amino-2-hydroxybutyryl side chain, which is present in arbekacin and amikacin but absent in dibekacin and kanamycin PUBMED:9766465.\ 4045 IPR007482 \ This family includes the mammalian protein tyrosine phosphatase-like protein, PTPLA. A significant variation of PTPLA from other protein tyrosine phosphatases is the presence of proline instead of the catalytic arginine at the active site. It is thought that PTPLA proteins have a role in the development, differentiation, and maintenance of a number of tissue types PUBMED:10644438.\ 4632 IPR000398 \ Thymidylate synthase PUBMED:6996564, PUBMED:2117882\ catalyzes the reductive methylation\ of dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate\ to dihydrofolate.\ This reaction provides the sole de novo pathway for \ production of dTMP, and the enzyme is the only one in folate metabolism in which the\ 5,10-methylenetetrahydrofolate is oxidised during one-carbon transfer PUBMED:3099389.\ The enzyme is essential for regulating the balanced supply of the four DNA\ precursors in normal DNA replication: defects in the enzyme activity\ affecting the regulation process cause various biological and genetic\ abnormalities, such as thymineless death PUBMED:2243092. The enzyme is an important target for certain chemotherapeutic drugs. 
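The thymidylate synthase reaction described above can be written as an equation. Here DHF stands for dihydrofolate, an abbreviation introduced for compactness. The methylene carbon becomes the methyl group of dTMP. The folate cofactor supplies the extra reducing equivalents, which is why it leaves as dihydrofolate and is the oxidised partner in this one-carbon transfer.

```latex
\[
\mathrm{dUMP} + \text{5,10-methylenetetrahydrofolate} \;\longrightarrow\; \mathrm{dTMP} + \mathrm{DHF}
\]
```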
\ Thymidylate synthase is an enzyme of about 30 to 35 kDa in most species, except\ in protozoa and plants, where it exists as a bifunctional enzyme that includes\ a dihydrofolate reductase domain PUBMED:3099389.\ A cysteine residue is involved in the catalytic mechanism (it covalently binds\ the 5,6-dihydro-dUMP intermediate). The sequence around the active site of\ this enzyme is conserved from phages to vertebrates.\ 7332 IPR011096 \

    The FTP domain is found in the propeptide region of bacterial and fungal metallopeptidases belonging to MEROPS peptidase families M4 and M36, respectively. In bacteria the FTP domain is N-terminal to this entry, the PepSY domain; in fungi the M36 peptidases do not contain the PepSY domain. Propeptide swapping experiments have shown that the propeptides of the M4 and M36 families are not functionally interchangeable PUBMED:12589825.

    \ \

    The function of the propeptide in M36 peptidases has not been described, but, as in other related peptidases, it is likely to be involved in targeting, to act as a chaperone and to inhibit peptidase activity so as to prevent premature activation PUBMED:12589825, PUBMED:8636020.

    \ \ 6237 IPR009443 \

    This family consists of a series of primate-specific nuclear pore complex interacting protein (NPIP) sequences. The function of this family is unknown, but its members are well conserved from African apes to humans PUBMED:11586358.

    \ 4975 IPR005379 \

    The XH (rice gene X Homology) domain is found in a family of plant proteins including Oryza sativa . The molecular function of these proteins is unknown; however, these proteins usually contain an XS domain that is also found in the PTGS protein SGS3. As the XS and XH domains are fused in most of these proteins, these two\ domains may interact. The XH domain is between 124 and 145 residues in\ length and contains a conserved glutamate residue that may be functionally important PUBMED:12162795.

    \ 7653 IPR012497 \

    The members of this family resemble neurotoxin B-IV, a crustacean-selective neurotoxin produced by the marine worm Cerebratulus lacteus. This highly cationic peptide is approximately 55 residues long and folds into two antiparallel helices connected by a well-defined loop in a hairpin structure. The branches of the hairpin are linked by four disulphide bonds. Three residues identified as being important for activity, namely Arg-17, -25 and -34, are found on the same face of the molecule, while another residue important for activity, Trp-30, is on the opposite side. The protein's mode of action is not entirely understood, but it may act on voltage-gated sodium channels, possibly by binding to an as yet uncharacterised site on these proteins. Its site of interaction may also be less specific; for example, it may interact with negatively charged membrane lipids PUBMED:9180379.

    \ 6863 IPR009755 \

    This entry represents the C terminus (approximately 160 residues) of a number of proteins that resemble colon cancer-associated protein Mic1.

    \ 1056 IPR002890 \ The proteinase-binding alpha-macroglobulins (A2M) PUBMED:2473064 are large glycoproteins found in the plasma of vertebrates, in the hemolymph of some invertebrates and in reptilian and avian egg white. A2M-like proteins are able to inhibit all four classes of proteinases by a 'trapping' mechanism. They have a peptide stretch, called the 'bait region', which contains specific cleavage sites for different proteinases. When a proteinase cleaves the bait region, a conformational change is induced in the protein, thus trapping the proteinase. The entrapped enzyme remains active against low-molecular-weight substrates, whilst its activity toward larger substrates is greatly reduced due to steric hindrance. Following cleavage in the bait region, a thiol ester bond, formed between the side chains of a cysteine and a glutamine, is cleaved and mediates the covalent binding of the A2M-like protein to the proteinase. This family includes the N-terminal region of the alpha-2-macroglobulin family.\ 2972 IPR008260 \ Hydroxymethylglutaryl-coenzyme A synthase (HMG-CoA synthase) catalyzes the condensation of\ acetyl-CoA with acetoacetyl-CoA to produce HMG-CoA and CoA PUBMED:7913309. A cysteine is known to act as the catalytic\ nucleophile in the first step of the reaction, the acetylation of the enzyme by acetyl-CoA. In vertebrates there are\ two isozymes located in different subcellular compartments: a cytosolic form, which is the starting point of the\ mevalonate pathway leading to cholesterol and other sterolic and isoprenoid compounds, and a mitochondrial form\ responsible for ketone body biosynthesis. HMG-CoA synthase is also found in other eukaryotes such as insects, plants and fungi.\ 2995 IPR007065 \ These proteins are integral membrane proteins with four transmembrane spanning helices. The most conserved region in an alignment of these proteins is an HPP motif. 
The function of these proteins is uncertain, but they may be transporters.\ 2909 IPR000912 \ The Herpesvirus major capsid protein (MCP) is the principal protein of the icosahedral capsid, forming\ the main component of the hexavalent and probably the pentavalent capsomeres. It shares similarity with\ all other Herpesvirus major capsid proteins.\ 2664 IPR004886 \ This family is a group of yeast glycolipid-anchored membrane proteins. It includes the Candida albicans pH-regulated protein, which is required for apical growth and plays a role in morphogenesis, and the Saccharomyces cerevisiae glycolipid-anchored surface protein.\ 3454 IPR002928 \

    Muscle contraction is caused by sliding between the thick and thin filaments of the myofibril. Myosin is a major component of thick filaments and exists as a hexamer of 2 heavy chains PUBMED:1939027, 2 alkali light chains, and 2 regulatory light chains. The heavy chain can be subdivided into the N-terminal globular head and the C-terminal coiled-coil rod-like tail, although some forms have a globular region at their C terminus. There are many cell-specific isoforms of myosin heavy chains, coded for by a multi-gene family PUBMED:2806546. Myosin interacts with actin to convert chemical energy, in the form of ATP, to mechanical energy PUBMED:3540939. The 3-D structure of the head portion of myosin has been determined PUBMED:8316857 and a model for the actin-myosin complex has been constructed PUBMED:8316858.

    \

    This family consists of the coiled-coil myosin heavy chain tail region.\ The coiled-coil is composed of the tail from two molecules of myosin.\ These can then assemble into the macromolecular thick filament PUBMED:3783701.\ The coiled-coil region provides the structural backbone of the thick filament PUBMED:3783701.

    \ 2550 IPR004924 \

    The flagellar basal body consists of four rings (L, P, S and M) surrounding the flagellar rod, which is believed to transmit motor rotation to the filament PUBMED:2129540. The M ring is integral to the inner membrane of the cell, and may be connected to the rod via the S (supramembrane) ring, which lies just distal to it. The L and P rings reside in the outer membrane and periplasmic space, respectively. The FlgA protein is involved in the assembly of the flagellar P-ring. It may associate with FlgF on the rod, constituting a structure essential for P-ring assembly, or may act as a modulator protein for P-ring assembly.

    \ 7263 IPR009996 \

    This family contains the bacterial protein YycH (approximately 450 residues long). The function of this protein is not known PUBMED:9829949.

    \ 6434 IPR009522 \

    This family consists of several Phlebovirus nucleocapsid (N) proteins.

    \ 2341 IPR002766 \ This family contains archaebacterial proteins of unknown function. Members of this\ family may be transmembrane proteins. It seems that all archaebacteria contain two members\ of this family.\ 7953 IPR012637 \

    This family consists of the lethal peptides (waglerins) that are found in the venom of Trimeresurus wagleri. Waglerins are 22-24-residue lethal peptides that act as competitive antagonists of the muscle nicotinic acetylcholine receptor (nAChR). Waglerin-1 possesses a distinctive selectivity for the alpha-epsilon interface binding site of the mouse nAChR PUBMED:8533138.

    \ 3572 IPR003112 \

    The olfactomedin-domain was first identified in olfactomedin, an extracellular matrix protein of the olfactory neuroepithelium PUBMED:12615070. Members of this extracellular domain-family have since been shown to be present in several metazoan proteins, such as latrophilins, myocilins, optimedins and noelins, the latter being involved in the generation of neural crest cells. Myocilin is of considerable interest, as mutations in its olfactomedin-domain can lead to glaucoma PUBMED:15070869. The olfactomedin-domains in myocilin and optimedin are essential for the interaction between these two proteins PUBMED:12019210.

    \ 5793 IPR003889 \ The "FY-rich" domain C-terminal region is sometimes closely juxtaposed with the N-terminal region, but is sometimes far distant from it. It is of unknown function, but occurs frequently in chromatin-associated proteins like trithorax and its homologues.\ 2770 IPR003318 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \ Glucosyltransferases or sucrose 6-glycosyl transferases (GTF-S) catalyse the transfer of D-glucopyranosyl units from sucrose onto acceptor molecules PUBMED:8982063. This signature roughly corresponds to the N-terminal catalytic domain of the enzyme. Members of this group also contain the putative cell wall binding repeat.\ 3867 IPR001829 \

    Most Gram-negative bacteria possess supramolecular structures, the pili, on their surface that mediate attachment to specific receptors. Many interactive subunits are required to assemble pili, but their assembly only takes place after translocation across the cytoplasmic membrane.

    \

    Periplasmic chaperones assist pili assembly by binding to the subunits, thereby preventing premature aggregation PUBMED:8670884, PUBMED:1683764. This family of chaperones is structurally, and possibly evolutionarily, related to the immunoglobulin superfamily PUBMED:1348692: its members contain two globular domains, each with a topology identical to an immunoglobulin fold.

    \ 6520 IPR009575 \

    This family consists of several Melon necrotic spot virus (MNSV) P7B proteins. The function of this family is unknown.

    \ 4902 IPR007144 \ This protein is found to be part of a large ribonucleoprotein complex containing the U3 snoRNA PUBMED:12068309. Depletion of the Utp proteins impedes production of the 18S rRNA, indicating that they are part of the active pre-rRNA processing complex. This large RNP complex has been termed the small subunit (SSU) processome PUBMED:12068309.\ 2129 IPR007421 \

    This family is related to , and presumably has the same function (ATP-binding). A number of the archaeal members of this group are annotated as ATP-dependent DNA helicases (EC 3.6.1.-).

    \ \ 6023 IPR010397 \

    This is a family of uncharacterised bacterial and archaeal proteins.

    \ 5700 IPR008571 \ Members of this family have the P-loop containing nucleoside triphosphate hydrolase fold. This family is restricted to bacterial proteins, none of which have yet been characterised.\ 1686 IPR005172 \

    This entry includes proteins that have two copies of a cysteine-rich motif as follows: C-X-C-X4-C-X3-YC-X-C-X6-C-X3-C-X-C-X2-C. The family includes Tesmin PUBMED:10191092 and TSO1 PUBMED:10769245. The region containing these motifs is called a CXC domain in PUBMED:10769245.
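The motif above can be translated directly into a regular expression, where X stands for any residue and Xn for exactly n residues. This assumes the pattern as written describes one copy of the motif. The helpers `motif_to_regex` and `count_cxc_copies` below are a hypothetical sketch. They translate simple dash-separated patterns and count non-overlapping copies in a sequence, ignoring alternatives, ranges and exclusions. The test sequence used with them is made up.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a simple dash-separated motif such as 'C-X-C-X4-C' into a regex.

    'X' matches any single residue and 'Xn' exactly n residues; every other
    token (e.g. 'C' or 'YC') is matched literally. Alternatives such as
    [ST], ranges such as X(2,4) and exclusions are deliberately not handled.
    """
    parts = []
    for token in motif.split("-"):
        gap = re.fullmatch(r"X(\d*)", token)
        if gap:
            n = gap.group(1)
            parts.append(f".{{{n}}}" if n else ".")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


# The cysteine-rich motif quoted above, assumed to describe a single copy.
CXC_MOTIF = "C-X-C-X4-C-X3-YC-X-C-X6-C-X3-C-X-C-X2-C"
CXC_REGEX = motif_to_regex(CXC_MOTIF)


def count_cxc_copies(sequence: str) -> int:
    """Count non-overlapping matches of the motif in an amino-acid sequence."""
    return len(CXC_REGEX.findall(sequence.upper()))


if __name__ == "__main__":
    print(CXC_REGEX.pattern)
```

Under that single-copy assumption, a protein carrying the CXC domain would be expected to give a count of 2. Real sequences may deviate slightly from the consensus, however, so a strict regular expression can under-count.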

    \ 5687 IPR008685 \ Kinetochores are the chromosomal sites for spindle interaction and play a vital role in chromosome segregation. The fission yeast (Schizosaccharomyces pombe) kinetochore protein Mis12 is required for correct spindle morphogenesis, determining metaphase spindle length PUBMED:10398680. Metaphase spindles are 35 to 60 percent longer in Mis12 mutants PUBMED:10398680. It has been shown that Mis12 may interact genetically with Mal2p PUBMED:12242294.\ 1105 IPR008131 \

    The E3B 14.5 kDa protein was first identified in human adenovirus type 5. It is an integral membrane protein oriented with its C terminus in the cytoplasm. It functions to down-regulate the epidermal growth factor receptor and to prevent tumour necrosis factor cytolysis. It achieves this through its interaction with the E3 10.4 kDa protein PUBMED:9488477, PUBMED:1531370.

    \ 932 IPR002931 \

    This domain is found in many proteins known to have transglutaminase activity, that is, to cross-link proteins through an\ acyl-transfer reaction between the gamma-carboxamide group of peptide-bound glutamine and the\ epsilon-amino group of peptide-bound lysine, resulting in an epsilon-(gamma-glutamyl)lysine isopeptide bond. Transglutaminases have been found in a diverse range of species, from bacteria through to mammals. The enzymes require calcium binding, and their activity leads to post-translational\ modification of proteins through acyl-transfer reactions, involving peptidyl glutamine residues\ as acyl donors and a variety of primary amines as acyl acceptors, with the generation of\ proteinase-resistant isopeptide bonds PUBMED:12366374.
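The acyl-transfer reaction described above can be written schematically, with R_Gln and R_Lys standing for the remainder of the peptide-bound glutamine and lysine residues (notation introduced here). The text does not state that ammonia is released from the carboxamide group, but this follows from the acyl transfer. The calcium requirement is shown over the arrow.

```latex
\[
R_{\mathrm{Gln}}\text{--}\mathrm{CONH_2} \;+\; \mathrm{H_2N}\text{--}R_{\mathrm{Lys}}
\;\xrightarrow{\ \mathrm{Ca^{2+}}\ }\;
R_{\mathrm{Gln}}\text{--}\mathrm{CONH}\text{--}R_{\mathrm{Lys}} \;+\; \mathrm{NH_3}
\]
```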

    Sequence conservation in this superfamily primarily involves three motifs that center around conserved cysteine, histidine, and aspartate residues that form the catalytic triad in the structurally characterized transglutaminase, the human blood clotting factor XIIIa' PUBMED:7913750. On the basis of the experimentally demonstrated activity of the Methanobacterium phage pseudomurein endoisopeptidase PUBMED:9791169, it is proposed that many, if not all, microbial homologs of the transglutaminases are proteases and that the eukaryotic transglutaminases have evolved from an ancestral\ protease PUBMED:10452618.

    \

    The structure of the A subunit of plasma Factor XIII revealed that each Factor XIIIA subunit is\ composed of four domains (termed N-terminal beta-sandwich, core domain (containing the\ catalytic and the regulatory sites), and C-terminal beta-barrels 1 and 2) and that two monomers\ assemble into the native dimer through the surfaces in domains 1 and 2, in opposite\ orientation. This organization in four domains is highly conserved during evolution among\ transglutaminase isoforms PUBMED:12366374.

    \ 5317 IPR008861 \ This domain is found in a family of phage tail proteins. Sequence analysis suggests that they are related to which suggests a general peptidoglycan binding function.\ 6570 IPR010621 \

    This entry represents the C-terminal region of several hypothetical proteins of unknown function. Proteins in this entry are mostly bacterial, but a few are also found in eukaryotes and archaea.

    \ 3889 IPR001101 \

    Plectin may have a role in cross-linking intermediate filaments, in inter-linking intermediate filaments with microtubules and microfilaments and in anchoring intermediate filaments to the plasma and nuclear membranes. Plectin is recruited into hemidesmosomes, multiprotein complexes that facilitate adhesion of epithelia to the basement membrane, thereby linking the intracellular keratin filaments to the laminins of the extracellular matrix. Plectin binds to hemidesmosomes through association of its actin-binding domain with the first pair of fibronectin type III repeats and a small part of the connecting segment of the integrin-beta4 subunit, the latter (integrin-alpha6,beta4) acting as a receptor for the extracellular matrix component laminin-5.

    \

    The plectin repeat is also seen in the cell adhesion junction plaque proteins desmoplakin, envoplakin and bullous pemphigoid antigen. The domains in plakins show considerable sequence homology. The N-terminus consists of a plakin domain containing a number of subdomains with high alpha-helical content, while the central coiled-coil domain is composed of heptad repeats involved in the dimerisation of plakin, and the C-terminus contains one or more homologous repeat sequences referred to as plectin repeats PUBMED:14668477. This entry represents the plectin repeats found in the C-terminus of plakin proteins.

    \ \ 5219 IPR008669 \ This short motif is found at the C terminus of Prp24 proteins and probably interacts with the Lsm proteins to promote U4/U6 formation PUBMED:12458792.\ 5193 IPR008028 \

    Sarcolipin is a 31 amino acid integral membrane protein that regulates Ca-ATPase activity in\ skeletal muscle PUBMED:11781085.

    \ 3089 IPR000471 \ Interferons PUBMED:3022999 are proteins which produce antiviral and antiproliferative\ responses in cells. On the basis of their sequence interferons are classified\ into five groups: alpha, alpha-II (or omega), beta, delta (or trophoblast).\ The sequence differences may possibly cause different responses to various inducers, or \ result in the recognition of different target cell types PUBMED:6170983. The main\ conserved structural feature of interferons is a disulphide bond that, \ except in mouse beta interferon, occurs in all alpha, beta and omega\ sequences.\ 2765 IPR005194 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    This family of glycosyl hydrolases contains this domain and includes vacuolar acid trehalase and maltose phosphorylases. Maltose phosphorylase (MP) is a dimeric enzyme that catalyzes the conversion of maltose and inorganic phosphate into beta-D-glucose-1-phosphate and glucose. The C-terminal domain forms a two-layered jelly roll motif. This domain is situated at the base of the catalytic domain; however, its function remains unknown PUBMED:11587643.

    \ 6205 IPR009425 \

    This family consists of several hypothetical bacterial and phage proteins of unknown function.

    \ 7844 IPR012062 \

    Escherichia coli and other enteric bacteria contain two closely related D-tagatose 1,6-bisphosphate (TagBP)-specific aldolases involved in catabolism of galactitol (genes gatY, gatZ) and of N-acetyl-galactosamine and D-galactosamine (genes kbaY, kbaZ, also called agaY, agaZ). The catalytic subunits GatY/KbaY alone are sufficient to show aldolase activity and contain most or all of the residues that have been identified as essential in substrate/product recognition and catalysis for class II aldolases PUBMED:11976750, PUBMED:8955298. However, these aldolases differ from other class II aldolases (which are homodimeric enzymes) in that they require subunits GatZ/KbaZ for full activity and for good in vivo and in vitro stability. The Z subunits alone do not show any aldolase activity PUBMED:11976750. It should be noted that the previous suggestion of a tagatose 6P-kinase function for AgaZ PUBMED:8932697 and other members of this family turned out to be erroneous PUBMED:10931310, PUBMED:11976750.

    \ 3081 IPR000760 \ It has been shown that several proteins share two sequence motifs PUBMED:1660408. Two of these proteins, vertebrate and plant inositol monophosphatase () and vertebrate inositol polyphosphate 1-phosphatase (), are enzymes of the inositol phosphate second messenger signalling pathway and share similar enzyme activity. Both enzymes exhibit an absolute requirement for metal ions (Mg2+ is preferred), and their amino acid sequences contain a number of conserved motifs, which are also shared by several other proteins related to MPTASE (including products of fungal QaX and qutG, bacterial suhB and cysQ, and yeast hal2) PUBMED:7761465. The function of the other proteins is not yet clear, but it is suggested that they may act by enhancing the synthesis or degradation of phosphorylated messenger molecules PUBMED:1660408. Structural analysis of these proteins has revealed a common core of 155 residues, which includes residues essential for metal binding and catalysis. An interesting property of the enzymes of this family is their sensitivity to Li+. The targets and mechanism of action of Li+ are unknown, but overactive inositol phosphate signalling may account for symptoms of manic depression PUBMED:2553271.\ 3992 IPR004895 \ This family includes yeast hypothetical proteins and the uncharacterised rat prenylated rab acceptor protein PRA1.\ 3485 IPR006635 \

    This domain identifies a small family of proteins of unknown function that are found exclusively in bacteria.

    \ 5507 IPR008428 \

    This family represents chondroitin N-acetylgalactosaminyltransferase. Members have a type II transmembrane topology.\ \ The enzyme is involved in the biosynthetic initiation and elongation of chondroitin sulphate and is the key enzyme responsible for the selective chain assembly of chondroitin/dermatan sulphate on the linkage region tetrasaccharide common to various proteoglycans containing chondroitin/dermatan sulphate or heparin/heparan sulphate chains.\

    \ 3314 IPR002116 \

    Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. In this system, a designation is composed of the first three letters of the genus, a space, the first letter of the species name, a space, and an Arabic numeral. If two species names would have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation.
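    The designation rule above is mechanical enough to sketch in code. The Python function below is a minimal, hypothetical illustration: the name `allergen_designation` and the `species_letters` parameter are assumptions introduced here, not part of the WHO/IUIS specification. In particular, it assumes that the extra letters used to separate clashing species are taken from the start of the species name, which the text above does not specify.

```python
def allergen_designation(genus: str, species: str, number: int,
                         species_letters: int = 1) -> str:
    """Return a WHO/IUIS-style allergen designation such as "Api m 3".

    Hypothetical helper: species_letters > 1 is an assumed way of adding
    extra letters when two species would otherwise share a designation.
    """
    if len(genus) < 3:
        raise ValueError("genus must have at least three letters")
    if species_letters < 1 or len(species) < species_letters:
        raise ValueError("species name is too short for the requested letters")
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    genus_part = genus[:3].capitalize()               # first three letters of the genus
    species_part = species[:species_letters].lower()  # first letter(s) of the species
    return f"{genus_part} {species_part} {number}"    # space-separated, Arabic numeral last


if __name__ == "__main__":
    print(allergen_designation("Apis", "mellifera", 3))  # Api m 3
```

    For example, honeybee (Apis mellifera) allergens come out in the same "Api m N" form as the designation listed for this family.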

    \

    The allergens in this family include allergens with the following designations: Api m 3.

    \ \

    Melittin is the principal protein component of the venom of the honeybee, Apis mellifera. It inhibits protein kinase C, Ca2+/calmodulin-dependent protein kinase II, myosin light chain kinase and Na+/K+-ATPase (synaptosomal membrane) and is a cell membrane lytic factor. Melittin is a small peptide with no disulphide bridge; the N-terminal part of the molecule is predominantly hydrophobic and the C-terminal part is hydrophilic and strongly basic.

    \

    The molecular mechanisms underlying the various effects of melittin on membranes have not been completely defined and much of the evidence indicates that different molecular mechanisms may underlie different actions of the peptide PUBMED:2187536.

    \

    Extensive work with melittin has shown that the peptide has multiple effects, probably as a result of its interaction with negatively charged phospholipids. It inhibits well-known transport pumps such as the Na+-K+-ATPase and the H+-K+-ATPase. Melittin increases the permeability of cell membranes to ions, particularly Na+ and, indirectly through Na+-Ca2+ exchange, Ca2+. This effect results in marked morphological and functional changes, particularly in excitable tissues such as cardiac myocytes. In some other tissues, such as the cornea, melittin increases Cl- permeability as well as Na+ permeability. Effects on the H+-K+-ATPase similar to those of melittin have been found with the synthetic amphipathic polypeptide Trp-3 PUBMED:10072885.

    \

    The study of melittin in model membranes has been useful for developing methods to determine membrane protein structures. A molecular dynamics simulation of melittin in a hydrated dipalmitoylphosphatidylcholine (DPPC) bilayer showed that the effect of melittin on the surrounding membrane was localized to its immediate vicinity. The asymmetry of this effect with respect to the two layers may result from the fact that melittin does not fully span the membrane. Melittin's hydrophilic C terminus anchors it at the extracellular interface, leaving the N terminus "loose" in the lower layer of the membrane PUBMED:10692322.

    \ 4066 IPR003379 \ This domain represents a conserved region in pyruvate carboxylase (PYC) (), oxaloacetate decarboxylase alpha chain (OADA) (), and transcarboxylase 5S subunit (). The domain is found adjacent to the HMGL-like domain () and often close to the biotin_lipoyl domain () of biotin-requiring enzymes.\ 4677 IPR012000 \

    A number of enzymes require thiamine pyrophosphate (TPP), a derivative of vitamin B1, as a cofactor. It has been shown PUBMED:8604141 that some of these enzymes are structurally related. This central domain of TPP enzymes contains a two-fold Rossmann fold.

    \ 7290 IPR010901 \

    This entry represents the C-terminal region of merozoite surface protein 1 (MSP-1), which is found in a number of Plasmodium species. MSP-1 is a 200 kDa protein expressed on the surface of the P. vivax merozoite. MSP-1 of Plasmodium species is synthesised as a high-molecular-weight precursor and then processed into several fragments. At the time of red cell invasion by the merozoite, only the 19 kDa C-terminal fragment (MSP-1(19)), which contains two epidermal growth factor-like domains, remains on the surface. Antibodies against MSP-1(19) inhibit merozoite entry into red cells, and immunisation with MSP-1(19) protects monkeys from challenge infections. Hence, MSP-1(19) is considered a promising vaccine candidate PUBMED:12466500.

    \ 6721 IPR010689 \

    This family consists of Bacteriophage Mu P proteins and related sequences. The function of this family is unknown.

    \ 7212 IPR009972 \

    This family consists of several phage and bacterial proteins of around 59 residues in length. Members of this family seem to be found exclusively in Lactococcus lactis and the bacteriophages that infect this organism. The function of this family is unknown.

    \ 5927 IPR010353 \

    This family consists of several bacterial phenol hydroxylase subunit proteins, which are part of a multicomponent phenol hydroxylase. Some bacteria can utilise phenol or some of its methylated derivatives as their sole source of carbon and energy. The first step in this process is the conversion of phenol into catechol. Catechol is then further metabolised via the meta-cleavage pathway into TCA cycle intermediates PUBMED:7753034.

    \ 7596 IPR011684 \ This is a group of sequences found exclusively in plants. They are similar to kinase interacting protein 1 (KIP1), which has been found to interact with the kinase domain of PRK1, a receptor-like kinase PUBMED:11500547. This particular region contains two coiled-coils, which are described as motifs involved in protein-protein interactions PUBMED:11500547. It has also been suggested that the coiled-coils of the protein allow it to dimerise in vivo PUBMED:11500547.\ 6905 IPR009781 \

    This family consists of several hypothetical bacterial proteins of around 230 residues in length. The function of this family is unknown.

    \ 3622 IPR000648 \ A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found PUBMED:8017104 to be evolutionarily related. These include mammalian oxysterol-binding protein (OSBP), a protein of about 800 amino-acid residues that binds a variety of oxysterols (oxygenated derivatives of cholesterol); yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis; yeast proteins HES1 and KES1, highly related proteins of 434 residues that seem to play a role in ergosterol synthesis; and yeast hypothetical proteins YHR001w, YHR073w and YKR003w.\ 7494 IPR011640 \ Escherichia coli has an iron(II) transport system (feo) which may make an important contribution to the iron supply of the cell under anaerobic conditions PUBMED:8407793. FeoB has been identified as part of this transport system. FeoB is a large 700-800 amino acid integral membrane protein. Its N terminus was previously described, erroneously, as ATP-binding PUBMED:8407793. Recent work shows that it is similar to eukaryotic G-proteins and that it is a GTPase PUBMED:12446835.\ 4788 IPR004854 \ Post-translational ubiquitin-protein conjugates are recognized for degradation by the ubiquitin fusion degradation (UFD) pathway. Several proteins involved in this pathway have been identified PUBMED:7615550. This family includes UFD1, a 40 kDa protein that is essential for vegetative cell viability PUBMED:7615550. The human UFD1 gene is expressed at high levels during embryogenesis, especially in the eyes and in the inner ear primordia, and is thought to be important in the determination of ectoderm-derived structures, including neural crest cells. In addition, this gene is deleted in the CATCH-22 (cardiac defects, abnormal facies, thymic hypoplasia, cleft palate and hypocalcaemia with deletions on chromosome 22) syndrome. 
This clinical syndrome is associated with a variety of developmental defects, all characterised by microdeletions on 22q11.2. Two such developmental defects are the DiGeorge syndrome OMIM:188400 and the velo-cardio-facial syndrome OMIM:145410. Several of the abnormalities associated with these conditions are thought to be due to defective neural crest cell differentiation PUBMED:9063746. \ 6581 IPR009608 \

    This family consists of several bradykinin sequences specific to Bombina species. The skins of anuran amphibians contain, in addition to mucus glands, highly specialised poison glands, which, in reaction to stress or attack, exude a complex noxious cocktail of biologically active molecules. These secretions often contain a plethora of peptides, among which bradykinin or structural variants of it have been identified PUBMED:12230583.

    \ 6673 IPR010669 \

    This region identifies several bacterial myo-inositol catabolism (IolB) proteins. The Bacillus subtilis inositol operon (iolABCDEFGHIJ) is involved in myo-inositol catabolism. Glucose repression of the iol operon induced by inositol is exerted through catabolite repression mediated by CcpA and the iol induction system mediated by IolR PUBMED:11566986. The exact function of IolB is unknown.

    \ 3412 IPR002628 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    In PSII, the oxygen-evolving complex (OEC) is responsible for catalysing the splitting of water to O2 and 4H+. The OEC is composed of a cluster of manganese, calcium and chloride ions bound to extrinsic proteins. In cyanobacteria there are five extrinsic proteins in OEC (PsbO, PsbP-like, PsbQ-like, PsbU and PsbV), while in plants there are only three (PsbO, PsbP and PsbQ), PsbU and PsbV having been lost during the evolution of green plants PUBMED:15258264.

    \

    This family represents the PSII OEC protein PsbO, which appears to be the most important extrinsic protein for oxygen evolution. PsbO lies closest to the Mn cluster where water oxidation occurs, and has a stabilising effect on the Mn cluster. As a result, PsbO is often referred to as the Mn-stabilising protein (MSP), although none of its amino acids are likely ligands for Mn. Calcium ions were found to modify the conformation of PsbO in solution PUBMED:14529295.

    \ \ 2962 IPR001310 \ The Histidine Triad (HIT) motif, His-phi-His-phi-His-phi-phi (phi, a hydrophobic amino acid), was identified as being highly conserved in a variety of organisms PUBMED:1472710. The crystal structure of rabbit Hint, purified as an adenosine and AMP-binding protein, showed that proteins in the HIT superfamily are conserved as nucleotide-binding proteins and that Hint homologs, which are found in all forms of life, are structurally related to Fhit homologs and GalT-related enzymes, which have more restricted phylogenetic profiles PUBMED:9164465. Hint homologs, including rabbit Hint and yeast Hnt1, hydrolyze adenosine 5' monophosphoramide substrates such as AMP-NH2 and AMP-lysine to AMP plus the amine product, and function as positive regulators of Cdk7/Kin28 in vivo PUBMED:11805111. Fhit homologs are diadenosine polyphosphate hydrolases PUBMED:8794732 and function as tumor suppressors in human and mouse PUBMED:10758156, though the tumor-suppressing function of Fhit does not depend on ApppA hydrolysis PUBMED:9576908. The third branch of the HIT superfamily, which includes GalT homologs, contains a related His-X-His-X-Gln motif and transfers nucleoside monophosphate moieties to phosphorylated second substrates rather than hydrolyzing them PUBMED:12119013.\ \ \ 481 IPR001092 \ Basic helix-loop-helix proteins (bHLH) are a group of eukaryotic transcription factors that exert a determinative influence in a variety of developmental pathways. These transcription factors are characterised by a highly evolutionarily conserved bHLH domain that mediates specific dimerisation PUBMED:7553065. They facilitate the conversion of inactive monomers to trans-activating dimers at appropriate stages of development PUBMED:1755826. \

    The bHLH proteins can be classified into discrete categories. One such subdivision, according to dimerisation, DNA binding and expression characteristics, defines seven groups PUBMED:8018712. Class I proteins form dimers within the group or with class II proteins. Class II proteins can only form heterodimers with class I factors. Class III factors are characterised by the presence of a leucine zipper () adjacent to the bHLH domain. Class IV factors may form homodimers or heterodimers with class III proteins. Class V and class VI proteins act as regulators of class I and class II factors, and class VII proteins have a PAS domain ().\

    \ 1434 IPR000557 \ Calponin PUBMED:8130072, PUBMED:8144658 is a thin filament-associated protein that is implicated in the regulation and modulation of smooth muscle contraction. It is capable of binding to actin, calmodulin, troponin C and tropomyosin. The interaction of calponin with actin inhibits the actomyosin MgATPase activity. Calponin is a basic protein of approximately 34 kDa. Multiple isoforms are found in smooth muscles. Calponin contains three repeats of a well-conserved 26 amino acid domain. Such a domain is also found in vertebrate smooth muscle protein SM22 (transgelin) and in a number of other proteins whose physiological role is not yet established, including Drosophila synchronous flight muscle protein SM20, Caenorhabditis elegans unc-87 protein PUBMED:7929573, rat neuronal protein NP25 PUBMED:8015377, and an Onchocerca volvulus antigen PUBMED:7935620.\ 4695 IPR002560 \ Autonomous mobile genetic elements such as transposons or insertion sequences (IS) encode an enzyme, transposase, that is required for excising and inserting the mobile element. Transposases have been grouped into various families PUBMED:8041625, PUBMED:1310791, PUBMED:1718819. This family includes the IS204 PUBMED:8196545, IS1001 PUBMED:8093238, IS1096 PUBMED:1660454 and IS1165 PUBMED:1325060 transposases.\ 869 IPR005329 \

    Sorting nexins (SNXs) are hydrophilic molecules that are localized in the cytoplasm and have the potential for membrane association either through their lipid-binding PX domains () or through protein-protein interactions with membrane-associated protein complexes PUBMED:12461558. Indeed, several SNXs require more than one targeting motif for their appropriate cellular localization. In almost every case studied, mammalian SNXs can be shown to have a role in protein sorting, with the most commonly used experimental model being plasma-membrane receptor endocytosis and sorting through the endosomal pathway. However, it is equally probable that SNXs sort vesicles that are not derived from the plasma membrane, and have a function in the accurate targeting of these vesicles and their cargo.

    The N-terminal domain appears to be specific to sorting nexins 1 and 2. SNX1 is both membrane-associated and cytosolic; in the cytosol it probably exists as a tetramer in large protein complexes and may hetero-oligomerize with SNX2.

    \ 2098 IPR007374 \ This is an archaeal protein of unknown function.\ 7944 IPR012508 \

    This family consists of epsilon subunits of the ATP synthase. The ATP synthase complex is composed of an oligomeric transmembrane sector (CF0) and a catalytic core (CF1). CF1 is composed of five subunits, of which the epsilon subunit functions as a potent inhibitor of ATPase activity in both soluble and membrane-bound CF1. High ATPase activity is detected only when the epsilon inhibition is relieved PUBMED:12231816.

    \ 7476 IPR011497 \

    This domain is usually indicative of serine protease inhibitors that belong to MEROPS inhibitor families I1, I2, I17 and I31. However, Kazal-like domains are also seen in the extracellular part of agrins, which are not known to be protease inhibitors. Kazal domains often occur in tandem arrays and have a small alpha+beta fold containing three disulphide bridges.

    \ 3879 IPR000961 \ Protein kinases are responsible for the phosphorylation of proteins, potentially regulating their activity. This domain is found in a large variety of protein kinases with different functions and dependencies. Protein kinase C, for example, is a calcium-activated, phospholipid-dependent serine- and threonine-specific enzyme. It is activated by diacylglycerol and, in turn, phosphorylates a range of cellular proteins. This domain is most often found associated with .\ 509 IPR005821 \

    This family is found in sodium, potassium and calcium ion channel proteins. The proteins have six transmembrane helices, of which the last two flank a loop that determines ion selectivity. In some Na+ channel proteins the domain is repeated four times, whereas in others (e.g. K+ channels) the protein forms a tetramer in the membrane. A bacterial structure of the protein is known for the last two helices, but it is not included in the Pfam family because it lacks the first four helices.

    \ 3295 IPR005153 \

    This domain is found in the MbtH protein as well as at the N-terminus of the antibiotic synthesis protein NIKP1. This domain is about 70 amino acids long and contains 3 fully conserved tryptophan residues. Many of the members of this family are found in known antibiotic synthesis gene clusters.

    \ 6211 IPR009428 \

    This family consists of several eukaryotic beta-catenin-interacting (ICAT) proteins. Beta-catenin is a multifunctional protein involved in both cell adhesion and transcriptional activation. Transcription mediated by the beta-catenin/Tcf complex is involved in embryological development and is upregulated in various cancers. ICAT selectively inhibits beta-catenin/Tcf binding in vivo, without disrupting beta-catenin/cadherin interactions PUBMED:12408824.

    \ 1154 IPR000728 \ This family includes hydrogen expression/formation protein HypE, which may be involved in the maturation of [NiFe] hydrogenase; AIR synthase and FGAM synthase, which are involved in de novo purine biosynthesis; and selenide, water dikinase, an enzyme which synthesizes selenophosphate from selenide and ATP.\ 537 IPR001279 \ Apart from the beta-lactamases, a number of other proteins contain this domain PUBMED:7588620. These proteins include thiolesterases, members of the glyoxalase II family, that catalyse the hydrolysis of S-D-lactoyl-glutathione to form glutathione and D-lactic acid, and a competence protein that is essential for natural transformation in Neisseria gonorrhoeae and could be a transporter involved in DNA uptake. Except for the competence protein, these proteins bind two zinc ions per molecule as cofactor.\ 4265 IPR001574 \ A number of bacterial and plant toxins act by inhibiting protein synthesis in eukaryotic cells. The toxins of the Shiga and ricin family inactivate 60S ribosomal subunits by an N-glycosidic cleavage which releases a specific adenine base from the sugar-phosphate backbone of 28S rRNA PUBMED:3276522, PUBMED:2714255, PUBMED:1742358. Members of the family include Shiga and Shiga-like toxins, and type I (e.g. trichosanthin and luffin) and type II (e.g. ricin, agglutinin and abrin) ribosome-inactivating proteins (RIPs). All these toxins are structurally related. RIPs have been of considerable interest because of their potential use, conjugated with monoclonal antibodies, as immunotoxins to treat cancers. Further, trichosanthin has been shown to have potent activity against HIV-1-infected T cells and macrophages PUBMED:8066085. Elucidation of the structure-function relationships of RIPs has therefore become a major research effort. 
A conserved glutamic acid residue has been implicated in the catalytic mechanism PUBMED:3357883; this lies near a conserved arginine, which also plays a role in catalysis PUBMED:8411176.\ 4014 IPR003817 \ Phosphatidylserine decarboxylase plays a pivotal role in the synthesis of phospholipid by the mitochondria. The substrate phosphatidylserine is synthesized extramitochondrially and must be translocated to the mitochondria prior to decarboxylation PUBMED:8407984. Phosphatidylserine decarboxylase is responsible for the conversion of phosphatidylserine to phosphatidylethanolamine and plays a central role in the biosynthesis of aminophospholipids PUBMED:7890740.\ 2352 IPR002790 \

    This entry describes archaebacterial proteins of unknown function.

    \ 4093 IPR000651 \ This domain is found in several guanine nucleotide exchange factors for Ras-like small GTPases, and lies N-terminal to the RasGef (Cdc25-like) domain. Proteins belonging to this family include guanine nucleotide dissociation stimulator, which stimulates the dissociation of GDP from the Ras-related RalA and RalB GTPases and allows GTP binding and activation of the GTPases; GTPase-activating protein (GAP) for Rho1 and Rho2, which is involved in the control of cellular morphogenesis; and the yeast cell division control protein, which promotes the exchange of Ras-bound GDP for GTP and controls the level of cAMP when the cell division cycle is triggered. Also included is the son of sevenless protein, which promotes the exchange of Ras-bound GDP for GTP during neuronal development.\ 1455 IPR004917 \ This protein is found in various caulimoviruses. It is an 18 kDa protein (PII) that is dispensable for infection but required for aphid transmission of the virus PUBMED:6311674. This protein interacts with the PIII protein PUBMED:10601029. \ \ 4891 IPR002042 \ Uricase () (urate oxidase) PUBMED:3182808 is the peroxisomal enzyme responsible for the degradation of urate into allantoin:\ \ Some species, like primates and birds, have lost the gene for uricase and are therefore unable to degrade urate PUBMED:2594778. Uricase is a protein of 300 to 400 amino acids whose sequence is well conserved. It is mainly localised in the liver, where it forms a large electron-dense paracrystalline core in many peroxisomes PUBMED:2338140. The enzyme exists as a tetramer of identical subunits, each containing a possible type 2 copper-binding site PUBMED:2594778. 
In legumes, two forms of uricase are found: in the roots, the tetrameric form; and, in the uninfected cells of root nodules, a monomeric form, which plays an important role in nitrogen fixation PUBMED:.\ 3340 IPR002084 \ Binding of a specific DNA fragment and S-adenosyl methionine (SAM) co-repressor molecules to the E. coli methionine repressor (MetJ) leads to a significant reduction in dynamic flexibility of the ternary complex, with considerable entropy-enthalpy compensation, not necessarily involving any overall conformational change PUBMED:8026581. MetJ is a regulatory protein which, when combined with S-adenosylmethionine (SAM), represses the expression of the methionine regulon and of enzymes involved in SAM synthesis. It is also autoregulated.\

    The crystal structure of the met repressor-operator complex shows two dimeric repressor molecules bound to adjacent sites 8 base pairs apart on an 18-base-pair DNA fragment. Sequence specificity is achieved by insertion of double-stranded antiparallel protein beta-ribbons into the major groove of B-form DNA, with direct hydrogen-bonding between amino-acid side chains and the base pairs. The repressor also recognizes sequence-dependent distortion or flexibility of the operator phosphate backbone, conferring specificity even for inaccessible base pairs PUBMED:1406951.

    \ 5130 IPR007967 \

    This family consists of several uncharacterised eukaryotic proteins of unknown function.

    \ 7510 IPR011636 \ Thiosulphate:quinone oxidoreductase (TQO) catalyses one of the early steps in elemental sulphur oxidation. A novel TQO enzyme was purified from the thermo-acidophilic archaeon Acidianus ambivalens and shown to consist of a large subunit (DoxD) and a smaller subunit (DoxA). The DoxD- and DoxA-like subunits are fused together in a single polypeptide in .\ 2751 IPR001088 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 4 comprises enzymes with several known activities: 6-phospho-beta-glucosidase (); 6-phospho-alpha-glucosidase (); and alpha-galactosidase ().

    \ \ \

    6-phospho-alpha-glucosidase requires both NAD(H) and divalent metal (Mn2+, Fe2+, Co2+, or Ni2+) for activity PUBMED:9765262.

    \ 2400 IPR004700 \ Bacterial PTS transporters transport and concomitantly phosphorylate their sugar substrates, and typically consist of multiple subunits or protein domains. The Man family is unique in several respects among PTS permease families.\
  • It is the only PTS family in which members possess a IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.
    \

    The mannose permease of Escherichia coli, for example, can transport and phosphorylate glucose, mannose, fructose, glucosamine, N-acetylglucosamine, and other sugars. Other members of this family can transport sorbose, fructose and N-acetylglucosamine.

    \

    This family comprises the sorbose-specific IIC subunits of the Man family of PTS transporters.

    \ 6319 IPR009474 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 1708 IPR002326 \ Cytochrome bc1 complex (ubiquinol:ferricytochrome c oxidoreductase) is found in mitochondria, photosynthetic bacteria and other prokaryotes PUBMED:. It is minimally composed of three subunits: cytochrome b, carrying a low- and a high-potential haem group; cytochrome c1 (cyt c1); and a high-potential Rieske iron-sulphur protein. The general function of the complex is electron transfer between two mobile redox carriers, ubiquinol and cytochrome c; the electron transfer is coupled with proton translocation across the membrane, thus generating proton-motive force in the form of an electrochemical potential that can drive ATP synthesis. In its structure and functions, the cytochrome bc1 complex bears extensive analogy to the cytochrome b6f complex of chloroplasts and cyanobacteria; cyt c1 plays an analogous role to cytochrome f, in spite of their different structures PUBMED:7631417.\ 7193 IPR009958 \

    This family consists of several alpha conotoxin precursor proteins from a number of Conus species. The alpha-conotoxins are small peptide neurotoxins from the venom of fish-hunting cone snails which block nicotinic acetylcholine receptors (nAChRs) PUBMED:3196703.

    \ 4398 IPR007671 \ SelP is the only known eukaryotic selenoprotein that contains multiple selenocysteine (Sec) residues, and accounts for more than 50% of the selenium content of rat and human plasma PUBMED:10775431. It is thought to be glycosylated PUBMED:11168591. SelP may have antioxidant properties. It can attach to epithelial cells, and may protect vascular endothelial cells against peroxynitrite toxicity PUBMED:10775431. The high selenium content of SelP suggests that it may be involved in selenium intercellular transport or storage PUBMED:11168591. The promoter structure of bovine SelP suggests that it may be involved in countering heavy metal intoxication, and may also have a developmental function PUBMED:9358058. The N-terminal region of SelP can exist independently of the C-terminal region. Zebrafish selenoprotein Pb () lacks the C-terminal Sec-rich region, and a protein encoded by the rat SelP gene and lacking this region has also been reported PUBMED:11168591. The N-terminal region contains a conserved SecxxCys motif, which is similar to the CysxxCys found in thioredoxins. It is speculated that the N-terminal region may adopt a thioredoxin fold and catalyse redox reactions PUBMED:11168591. The N-terminal region also contains a His-rich region, which is thought to mediate heparin binding. Binding to heparan proteoglycans could account for the membrane binding properties of SelP PUBMED:10775431.\ 5372 IPR008794 \ This family consists of proline racemase () proteins which catalyse the interconversion of L- and D-proline in bacteria PUBMED:3755058. This family also contains several similar eukaryotic proteins including a sequence with B-cell mitogenic properties which has been characterised as a co-factor-independent proline racemase PUBMED:10932226.\ 7894 IPR012586 \

    This characteristic repeat of proliferating cell nuclear antigen P120 is found in three copies PUBMED:15112237.

    \ 6236 IPR009442 \

    This family consists of several homospermidine synthase proteins. Homospermidine synthase (HSS) catalyses the synthesis of the polyamine homospermidine from two molecules of putrescine in an NAD+-dependent reaction PUBMED:8841401.

    \ 5619 IPR008414 \ This family consists of several Bacillus haemolytic enterotoxins (HblC, HblD, HblA, NheA, and NheB) which can cause food poisoning in humans PUBMED:12039781.\ 4026 IPR006814 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a. It uses light energy to oxidise (split) water molecules and to drive ATP production via a proton gradient. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll a. It takes the electrons (and associated protons) donated from PSII and uses them to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose, using the hydrogen extracted from water by PSII. Oxygen is released as a by-product of water splitting.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB). These pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD), which bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water, supplying the electrons that are ultimately passed on to PSI. It consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight intrinsic protein PsbR found in PSII, which is also known as the 10 kDa polypeptide. The PsbR gene is found only in the nucleus of green algae and higher plants. PsbR may provide a binding site for the extrinsic oxygen-evolving complex protein PsbP to the thylakoid membrane. PsbR has a transmembrane domain to anchor it to the thylakoid membrane, and a charged N-terminal domain capable of forming ion bridges with extrinsic proteins, allowing PsbR to act as a docking protein. PsbR may be a pH-dependent stabilising protein that functions at both donor and acceptor sides of PSII PUBMED:1697267.

    \ \ \ 5897 IPR009277 \

    PerC is a transcriptional activator of EaeA/BfpA expression in enteropathogenic bacteria PUBMED:7729884.

    \ 6 IPR007513 \ Members of this family are short proteins that are rich in aspartate, glutamate, lysine and arginine. Although the function of these proteins is unknown, they are found to be ubiquitously expressed PUBMED:9731538.\ 2438 IPR000133 \

    Proteins resident in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide that acts as a signal for their retrieval from later compartments of the secretory pathway. This tetrapeptide is commonly Lys-Asp-Glu-Leu (KDEL) in mammals and His-Asp-Glu-Leu (HDEL) in the yeast Saccharomyces cerevisiae. The receptor for this signal is a ~26 kDa Golgi membrane protein, first identified as the ERD2 gene product in S. cerevisiae. The receptor molecule is known variously as the ER lumen protein retaining receptor or the 'KDEL receptor', and is believed to cycle between the cis side of the Golgi apparatus and the ER. It has also been characterised in a number of other species, including plants, Plasmodium, Drosophila and mammals. In mammals, two highly related forms of the receptor are known.
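    The retrieval rule described above depends on a specific tetrapeptide at the extreme C terminus. It can be illustrated with a minimal Python sketch. The function name `has_er_retrieval_signal` and the toy sequences are hypothetical. Only the KDEL and HDEL forms named above are checked; real ER retrieval signals include further variants that this sketch does not attempt to cover.

```python
# Minimal sketch (hypothetical helper): test whether a protein sequence ends
# in one of the ER retrieval tetrapeptides named in the text. Only KDEL
# (mammals) and HDEL (S. cerevisiae) are checked; other variants are ignored.
RETRIEVAL_SIGNALS = frozenset({"KDEL", "HDEL"})


def has_er_retrieval_signal(sequence: str, signals=RETRIEVAL_SIGNALS) -> bool:
    """Return True if the last four residues match a known retrieval signal."""
    # The signal must sit at the extreme C terminus, so only the tail matters.
    tail = sequence.strip().upper()[-4:]
    return tail in signals


if __name__ == "__main__":
    # Toy sequences invented for illustration, not real proteins.
    examples = {
        "toy_lumenal": "MKWVTFISLLFLFSSAYSKDEL",
        "toy_yeast_lumenal": "MLLSKLLAGHDEL",
        "toy_secreted": "MKWVTFISLLFLFSSAYSRGV",
        "toy_internal_kdel": "MKDELAAAAG",
    }
    for name, seq in examples.items():
        print(name, has_er_retrieval_signal(seq))
```

    Because only the tail is inspected, an internal KDEL (as in the last toy sequence) is not counted as a retrieval signal.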

    \ \

    The KDEL receptor is a highly hydrophobic protein of 220 residues. Its sequence exhibits seven hydrophobic regions, all of which were initially suggested to traverse the membrane PUBMED:8392934. More recently, however, it has been suggested that only six of these regions are transmembrane (TM). This would place both the N- and C-termini on the cytoplasmic side of the membrane.
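    Hydrophobic regions of the kind counted above are often located with a sliding-window hydropathy scan. The Python sketch below illustrates that general technique and is not the analysis used in the cited work. It averages Kyte-Doolittle hydropathy values over a window and reports runs of windows above a cutoff. The 19-residue window and 1.6 cutoff are values commonly paired with this scale for flagging candidate transmembrane segments; they are assumptions here, as is the hypothetical function name `hydrophobic_segments`. A scan like this cannot by itself settle whether six or seven segments actually cross the membrane.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(sequence, window=19, threshold=1.6):
    """Return 0-based, half-open (start, end) residue ranges covered by
    consecutive windows whose mean hydropathy exceeds `threshold`.

    Hypothetical helper for illustration; non-standard residues score 0.
    """
    seq = sequence.upper()
    scores = [KD_SCALE.get(aa, 0.0) for aa in seq]
    segments = []
    run_start = None  # index of the first window in the current hydrophobic run
    for i in range(len(seq) - window + 1):
        mean = sum(scores[i:i + window]) / window
        if mean > threshold and run_start is None:
            run_start = i
        elif mean <= threshold and run_start is not None:
            # Window i - 1 was the last one in the run; it ends at i - 1 + window.
            segments.append((run_start, i - 1 + window))
            run_start = None
    if run_start is not None:
        # The run continued to the last window, which ends at the sequence end.
        segments.append((run_start, len(seq)))
    return segments


if __name__ == "__main__":
    # Toy sequence: two 20-residue Leu stretches separated by charged Asp runs.
    toy = "D" * 10 + "L" * 20 + "D" * 25 + "L" * 20 + "D" * 10
    print(hydrophobic_segments(toy))  # [(5, 35), (50, 80)]
```

    Window smoothing widens each reported range beyond the hydrophobic stretch itself. In the toy case, each 20-residue Leu stretch is reported as a 30-residue range, so the output is only a rough guide to segment boundaries.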

    \ 4367 IPR000914 \ Bacterial high-affinity transport systems are involved in the active transport of solutes across the cytoplasmic membrane. The protein components of these traffic systems include one or two transmembrane protein components, one or two membrane-associated ATP-binding proteins and a high-affinity periplasmic solute-binding protein. The latter are thought to bind the substrate in the vicinity of the inner membrane, and to transfer it to a complex of inner membrane proteins for concentration into the cytoplasm. Gram-positive bacteria are surrounded by a single membrane and therefore have no periplasmic region; in these bacteria, the equivalent proteins are bound to the membrane via an N-terminal lipid anchor. These homologous proteins do not play an integral role in the transport process per se. Instead, they probably serve as receptors that trigger or initiate translocation of the solute through the membrane by binding to external sites of the integral membrane proteins of the efflux system. In addition, at least some solute-binding proteins function in the initiation of sensory transduction pathways. On the basis of sequence similarities, the vast majority of these solute-binding proteins can be grouped PUBMED:8336670 into eight families or clusters, which generally correlate with the nature of the solute bound. Family 5 currently includes:
    - periplasmic oligopeptide-binding proteins (oppA) of Gram-negative bacteria and homologous lipoproteins in Gram-positive bacteria (oppA, amiA or appA);
    - periplasmic dipeptide-binding proteins of Escherichia coli (dppA) and Bacillus subtilis (dppE);
    - the periplasmic murein peptide-binding protein of E. coli (mppA);
    - periplasmic peptide-binding proteins sapA of E. coli, Salmonella typhimurium and Haemophilus influenzae;
    - the periplasmic nickel-binding protein (nikA) of E. coli;
    - the heme-binding lipoprotein (hbpA or dppA) from H. influenzae;
    - lipoprotein xP55 from Streptomyces lividans;
    - hypothetical proteins from H. influenzae (HI0213) and the Rhizobium strain NGR234 symbiotic plasmid (y4tO and y4wM).\ 4207 IPR000473 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP. At the same time, the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. The crucial activities of decoding and peptide transfer are RNA-based. Even so, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \ Ribosomal protein L36 is the smallest protein of the large subunit of the prokaryotic ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:, can be grouped into bacterial L36, algal and plant chloroplast L36, and cyanelle L36. L36 is a small, basic, cysteine-rich protein of 37 amino acid residues.\ \ 1114 IPR000736 \ Hexon is the major coat protein of adenovirus type 2, and is synthesised during late infection. It forms a homotrimer. The 240 copies of the hexon trimer are organised so that 12 lie on each of the 20 facets. The central 9 hexons in a facet are cemented together by 12 copies of polypeptide IX. A penton complex lies at each of the 12 vertices. It consists of the penton base, which holds a fibre in place, surrounded by the peripentonal hexons PUBMED:7932702.\ 5838 IPR009255 \

    This domain of unknown function is found in several eukaryotic transcriptional co-activators.

    \ 7167 IPR009940 \

    This family consists of several enterobacterial proteins of around 125 residues in length, which contain six highly conserved cysteine residues. The function of this family is unknown.

    \ 7929 IPR012631 \

    This family consists of the T-superfamily of conotoxins. Eight different T-superfamily peptides were identified from five Conus species. These peptides share a consensus signal sequence and a conserved arrangement of cysteine residues. T-superfamily peptides were found to be expressed in the venom ducts of all major feeding types of Conus. This suggests that the T-superfamily is a large and diverse group of peptides, widely distributed across the approximately 500 Conus species PUBMED:10521453.

    \ 7803 IPR004465 \ Ribonucleotide reductases (RNRs) are enzymes that provide the precursors for DNA synthesis. The three characterized classes of RNRs differ in their metal cofactor and their stable organic radical. Class Ib RNR is encoded by four different genes: nrdH, nrdI, nrdE and nrdF PUBMED:12686643. The exact function of NrdI within the ribonucleotide reductases has not yet been fully characterised.\ 1479 IPR005087 \

    This domain is found in endoglucanases, in association with the signature for glycoside hydrolase family 5.

    \ 1140 IPR001203 \

    Enzymes of the aldehyde ferredoxin oxidoreductase (AOR) family PUBMED:9242907 contain a tungsten cofactor and a 4Fe-4S cluster, and catalyse the interconversion of aldehydes and carboxylates PUBMED:8672295. This family includes:
    - AOR, formaldehyde ferredoxin oxidoreductase (FOR) and glyceraldehyde-3-phosphate ferredoxin oxidoreductase (GAPOR), all isolated from hyperthermophilic archaea PUBMED:9242907;
    - carboxylic acid reductase found in clostridia PUBMED:2550230;
    - hydroxycarboxylate viologen oxidoreductase from Proteus vulgaris, the sole member of the AOR family containing molybdenum PUBMED:8026480.
    GAPOR may be involved in glycolysis PUBMED:7721730, but the functions of the other proteins are not yet clear. AOR has been proposed to be the primary enzyme responsible for oxidising the aldehydes that are produced by the 2-keto acid oxidoreductases PUBMED:9275170.

    \ 1398 IPR002009 \ This family consists of bromovirus coat proteins. RNA-protein interactions stabilize many viruses and also the nucleoprotein cores of enveloped animal viruses (e.g. retroviruses). The nucleoprotein particles are frequently pleomorphic and generally unstable due to the lack of strong protein-protein interactions in their capsids.\

    The structure is known for cowpea chlorotic mottle virus PUBMED:7743132. It shows novel quaternary structure interactions based on interwoven carboxy-terminal polypeptides that extend from canonical capsid beta-barrel subunits. Additional particle stability is provided by intercapsomere contacts between metal-ion-mediated carboxyl cages and by protein interactions with regions of ordered RNA.

    \ 5500 IPR008604 \ The organisation of microtubules varies with the cell type and is presumably controlled by tissue-specific microtubule-associated proteins (MAPs). The 115 kDa epithelial MAP (E-MAP-115) has been identified as a microtubule-stabilising protein predominantly expressed in cell lines of epithelial origin PUBMED:9745708. The binding of this microtubule-associated protein is nucleotide-independent PUBMED:8408219.\ 239 IPR008166 \

    This family contains Caenorhabditis elegans proteins of unknown function.

    \ 5748 IPR008422 \

    This family consists of several mating-type alpha and beta proteins from Coprinus cinereus (inky cap fungus) as well as a related sequence from Schizophyllum commune (bracket fungus). The A mating-type locus of the fungus Coprinus cinereus is a complex, multigenic locus which regulates compatibility and subsequent sexual development.

    \ 5800 IPR009240 \

    The 15 aa repeat is found in the APC protein family. It is involved in binding beta-catenin PUBMED:9823329 along with the repeats. Many human cancer mutations map to the region around these motifs and may disrupt their binding of beta-catenin.

    \ 4757 IPR001812 \ The trypanosome parasite expresses these proteins to evade the immune response PUBMED:2231728. The variant surface glycoprotein (VSG) of Trypanosoma brucei forms a coat on the surface of the parasite; by the expression of a series of antigenically distinct VSGs in the surface coat the parasite escapes the host immune response. \

    The 2.9 Å resolution crystal structure of the N-terminal domain of one variant, MITat 1.2, has been determined PUBMED:2231728. The "top" of the protein may be exposed to the external environment in the surface coat. It is formed from the ends of the two long helices, a short three-stranded beta-sheet, and a strand of irregular conformation that packs above these secondary structure elements. Two conserved disulphide bridges lie in this part of the molecule. Several elements of the MITat 1.2 sequence that contribute to the formation of the helix-bundle structure have been identified. These elements can be found in the sequences of several different VSGs, suggesting that the VSG structure is conserved to some extent in those variants PUBMED:9574925.

    \ 2100 IPR007381 \ This is an archaeal protein of unknown function.\ 629 IPR007282 \ NOT1, NOT2, NOT3, NOT4 and NOT5 form a nuclear complex that negatively regulates the basal and activated transcription of many genes. This family includes NOT2, NOT3 and NOT5.\ 6385 IPR008304 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 5524 IPR008544 \ This family consists of several enterobacterial and siphoviral sequences of unknown function.\ 6259 IPR010932 \

    This domain of unknown function is found in virus antigenic proteins which may play a role in the initiation of DNA unwinding and replication.

    \ 784 IPR007676 \ Ribophorin I is an essential subunit of oligosaccharyltransferase (OST), which is also known as dolichyl-diphosphooligosaccharide--protein glycosyltransferase. OST catalyses the transfer of an oligosaccharide from dolichol pyrophosphate to selected asparagine residues of nascent polypeptides as they are translocated into the lumen of the rough endoplasmic reticulum. Ribophorin I and OST48 are thought to be responsible for OST catalytic activity PUBMED:11443278. Both yeast and mammalian proteins are glycosylated, but the sites are not conserved. Glycosylation may contribute to general solubility but is unlikely to be involved in a specific biochemical function PUBMED:7720878. Most family members are predicted to have a transmembrane helix at the C terminus of this region.\ 3449 IPR003327 \ This family consists of the leucine zipper dimerisation domain found in both cellular c-Myc proto-oncogenes and viral v-Myc oncogenes. Dimerisation via the leucine zipper motif with other basic helix-loop-helix-leucine zipper (b/HLH/lz) proteins is required for efficient DNA binding PUBMED:9680483. The Myc-Max dimer is a transactivating complex that activates the expression of growth-related genes, promoting cell proliferation. Dimerisation is mediated by interdigitating leucine residues at every seventh position of the alpha helix. Repulsion between like charges on adjacent residues in this region perturbs the formation of homodimers. In contrast, attraction between opposite charges promotes the formation of heterodimers. It has been demonstrated in transgenic mice that the balance between oncogene-induced proliferation and apoptosis in a given tissue can be a critical determinant in the initiation and maintenance of the tumor PUBMED:10679391.\ 4681 IPR003688 \ The TRAG family are bacterial conjugation proteins. These proteins aid the transfer of DNA from the plasmid into the host bacterial chromosome, although the exact mechanism of action is unknown.\ 8051 IPR013262 \

    The TOM13 family of proteins are mitochondrial outer membrane proteins that mediate the assembly of beta-barrel proteins PUBMED:15326197.

    \ 1195 IPR007820 \

    Members of this family are annotated as putative ammonia monooxygenases by the COG database (http://www.ncbi.nlm.nih.gov), a compilation of orthologous groups of proteins from completely sequenced organisms. Ammonia monooxygenase catalyzes the oxidation of NH(3) to NH(2)OH.

    \ 2712 IPR004445 \ This is a family of sodium/glutamate symporters (glutamate permeases), which catalyse the sodium-dependent uptake of extracellular glutamate. The protein is located in the inner membrane.\ 6936 IPR010777 \

    This family consists of several Salmonella PipA (pathogenicity island-encoded protein A) and related phage sequences. PipA is thought to contribute to enteric but not to systemic salmonellosis PUBMED:9723926.

    \ 947 IPR003612 \ This domain is found in several proteins, including plant lipid transfer protein, seed storage protein and trypsin-alpha amylase inhibitor. The domain forms a four-helix bundle with an internal cavity.\ 8111 IPR013196 \

    This region defines helix-turn-helix domains in a wide variety of proteins.

    \ 6685 IPR009662 \

    This family consists of several bacterial malonate decarboxylase delta subunit (MdcD) proteins. Malonate decarboxylase of Klebsiella pneumoniae consists of four different subunits and catalyses the conversion of malonate plus H+ to acetate and CO2. The catalysis proceeds via acetyl and malonyl thioester residues with the phosphoribosyl-dephospho-CoA prosthetic group of the acyl carrier protein (ACP) subunit. MdcC is the (apo) ACP subunit PUBMED:9208947.

    \ 6475 IPR010593 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 3805 IPR005846 \

    Phosphoglucomutase (PGM) is an enzyme responsible for the conversion of D-glucose 1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose. Phosphomannomutase (PMM) is an enzyme responsible for the conversion of D-mannose 1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria.

    \

    This domain is contained in both proteins.

    \ 4324 IPR007359 \ This bacterial family of integral membrane proteins represents RseC/MucC, a positive regulator of the sigma(E) transcription factor. Sigma(E) is up-regulated by misfolding of cell envelope proteins, and regulates the expression of genes that are collectively termed ECF (devoted to extracytoplasmic functions) PUBMED:9159522. In Pseudomonas aeruginosa, derepression of sigma(E) is associated with the alginate-overproducing phenotype characteristic of chronic respiratory tract colonization in cystic fibrosis patients. The mechanism by which RseC/MucC positively regulates the sigma(E) transcription factor is unknown. RseC is also thought to have a role in thiamine biosynthesis in Salmonella typhimurium PUBMED:9335303. In addition, this family includes the N-terminal part of RnfF, a Rhodobacter capsulatus protein of unknown function that is essential for nitrogen fixation. RnfF also contains a domain found in the ApbE protein, which is itself involved in thiamine biosynthesis.\ 6513 IPR010602 \

    This family consists of several hypothetical bacterial proteins of around 250 residues in length and is found in several Chlamydia and Anabaena species. The function of this family is unknown.

    \ 4549 IPR002566 \ This family includes a number of bacterial antigens expressed on the surface of pathogens. The Anaplasma marginale surface proteins are targets of protective immune responses but are antigenically polymorphic PUBMED:8063397, PUBMED:8294020.\ 2072 IPR007297 \

    This is a domain of unknown function. It often occurs as the N-terminal domain, in combination with one or two other domains of unknown function, DUF403 and DUF407.

    \ \ \ \ 4273 IPR011260 \

    The core of the bacterial RNA polymerase (RNAP) consists of four subunits, two alpha, a beta and a beta', which are conserved from bacteria to mammals. The alpha subunit (RpoA) initiates RNAP assembly by dimerising to form a platform on which the beta subunits can interact. The alpha subunit consists of an N-terminal domain (NTD) and a C-terminal domain (CTD), connected by a short linker. The NTD is essential for RNAP assembly, while the CTD is necessary for transcription regulation, interacting with transcription factors and promoter upstream elements. In Escherichia coli, the catabolite activator protein (CAP or CRP) was shown to exert its effect through its interactions with the CTD. At class I CAP-dependent promoters, CAP binding to the CTD promotes RNAP binding to promoter DNA, thereby stimulating transcription initiation. At class II CAP-dependent promoters, the interaction of CAP with the CTD is one of multiple interactions involved in activation PUBMED:12202833.

    \

    The CTD has a compact structure of four helices and two long arms enclosing its hydrophobic core, making its folding topology distinct from most other binding proteins. The upstream promoter element-binding site is formed from helices 1 and 4 PUBMED:7491496.

    \ \ 6908 IPR009783 \

    This family consists of several highly conserved hypothetical proteins of around 150 residues in length. The function of this family is unknown.

    \ 1045 IPR001708 \

    This family of proteins is required for the insertion of integral membrane proteins into cellular membranes. Many of these integral membrane proteins are associated with respiratory chain complexes, for example a large number of members of this family play an essential role in the activity and assembly of cytochrome c oxidase.

    \ Stage III sporulation protein J (SP3J) is a probable lipoprotein, rich in basic and hydrophobic amino acids. Mutations in the protein abolish the transcription of prespore-specific genes transcribed by the sigma G form of RNA polymerase PUBMED:1487728. SP3J could be involved in a signal transduction pathway coupling gene expression in the prespore to events in the mother cell, or it may be necessary for essential metabolic interactions between the two cells PUBMED:1487728. The protein shows a high degree of similarity to Bacillus subtilis YQJG, to yeast OXA1 and also to bacterial 60 kDa inner-membrane proteins PUBMED:7686882, PUBMED:7542800, PUBMED:1552862, PUBMED:8071197.

    \ 513 IPR002404 \ Insulin receptor substrate-1 proteins contain both a pleckstrin homology domain and a phosphotyrosine binding (PTB) domain. These domains facilitate interaction with the activated, tyrosine-phosphorylated insulin receptor. The PTB domain is situated towards the N terminus. Two arginines in this domain are responsible for hydrogen bonding to the phosphotyrosine residue of an Ac-LYASSNPApY-NH2 peptide from the juxtamembrane region of the insulin receptor. Further interactions via 'bridged' water molecules are coordinated by an Asn and a Ser residue PUBMED:8646778.\

    The PTB domain has a compact, 7-stranded beta-sandwich structure, capped by a C-terminal helix. The substrate peptide fits into an L-shaped surface cleft formed from the C-terminal helix and strands 5 and 6 PUBMED:8599766.

    \ 5861 IPR009264 \

    This family consists of several nucleopolyhedrovirus proteins of unknown function.

    \ 5256 IPR008435 \ This family consists of several eukaryotic corticotropin-releasing factor binding proteins (CRF-BP or CRH-BP). Corticotropin-releasing hormone (CRH) plays multiple roles in vertebrate species. In mammals, it is the major hypothalamic releasing factor for pituitary adrenocorticotropin secretion, and is a neurotransmitter or neuromodulator at other sites in the central nervous system. In non-mammalian vertebrates, CRH acts not only as a neurotransmitter and hypophysiotropin but also as a potent thyrotropin-releasing factor. This allows CRH to regulate both the adrenal and thyroid axes, especially in development. CRH-BP is thought to play an inhibitory role, binding CRH and other CRH-like ligands and preventing the activation of CRH receptors. There is, however, evidence that CRH-BP may also have diverse extra- and intracellular roles in a cell-specific fashion and at specific times in development PUBMED:12379493.\ 4617 IPR002616 \ This is a family of queuine, archaeosine and general tRNA-ribosyltransferases, also known as tRNA-guanine transglycosylases and guanine insertion enzymes. Queuine tRNA-ribosyltransferase modifies tRNAs for asparagine, aspartic acid, histidine and tyrosine with queuosine at position 34 in bacterial tRNAs, and with archaeosine at position 15 in archaeal tRNAs. In bacteria, it catalyses the exchange of guanine-34 at the wobble position with 7-aminomethyl-7-deazaguanine. It also catalyses the addition of a cyclopentenediol moiety to 7-aminomethyl-7-deazaguanine-34 tRNA, giving the hypermodified base queuine in the wobble position PUBMED:8654383, PUBMED:8323579. The aligned region contains a zinc-binding motif, C-x-C-x2-C-x29-H, and important tRNA and 7-aminomethyl-7-deazaguanine binding residues PUBMED:8654383.\ 2280 IPR006936 \ This conserved region is found in plant proteins, including the resistance protein-like protein.\ 1896 IPR003750 \

    This entry describes proteins of unknown function.

    \ 6971 IPR010791 \

    This family consists of several purple photosynthetic bacterial hydroxyneurosporene synthase (CrtC) proteins. The enzyme catalyses the conversion of various acyclic carotenes including 1-hydroxy derivatives. This broad substrate specificity reflects the participation of CrtC in 1'-HO-spheroidene and in spirilloxanthin biosynthesis PUBMED:12745254.

    \ 7014 IPR010804 \

    This family consists of several TraB proteins, which seem to be found exclusively in Agrobacterium species. TraB is known to be involved in conjugal transfer PUBMED:8763953. This family does not appear to be related to or .

    \ 5788 IPR010281 \

    This family consists of hypothetical bacterial proteins.

    \ 2840 IPR004046 \

    In eukaryotes, glutathione S-transferases (GSTs) participate in the detoxification of reactive electrophilic compounds by catalysing their conjugation to glutathione. The GST domain is also found in S-crystallins from squid. It also occurs in proteins with no known GST activity, such as eukaryotic elongation factors 1-gamma and the HSP26 family of stress-related proteins. This family includes auxin-regulated proteins in plants and stringent starvation proteins in Escherichia coli. The major lens polypeptide of cephalopods is also a GST PUBMED:9074797, PUBMED:10783391, PUBMED:11035031, PUBMED:10416260.

    \

    Bacterial GSTs of known function often have a specific, growth-supporting role in biodegradative metabolism. Examples of the reactions they catalyse include epoxide ring opening and tetrachlorohydroquinone reductive dehalogenation. Some regulatory proteins, like the stringent starvation proteins, also belong to the GST family PUBMED:11327815, PUBMED:9045797. GSTs appear to be absent from Archaea, in which gamma-glutamylcysteine replaces glutathione as the major thiol.

    \

    Glutathione S-transferases form homodimers, but in eukaryotes they can also form heterodimers of the A1 and A2 or YC1 and YC2 subunits. The homodimeric enzymes display a conserved structural fold. Each monomer is composed of a distinct N-terminal sub-domain, which adopts the thioredoxin fold, and a C-terminal all-helical sub-domain. This entry represents the C-terminal domain.

    \ 4598 IPR005630 \

    Sequences containing this domain belong to the terpene synthase family. It has been suggested that this gene family be designated tps (for terpene synthase). Sequence comparisons reveal similarities between the monoterpene (C10) synthases, sesquiterpene (C15) synthases and diterpene (C20) synthases. The family has been divided into six subgroups on the basis of phylogeny, designated Tpsa-Tpsf PUBMED:9268308.

    \ \ \ \

    In the fungus Phaeosphaeria sp. L487, the synthesis of ent-kaurene from geranylgeranyl diphosphate is catalysed by a single bifunctional protein PUBMED:9268298.

    \ 5624 IPR008446 \ This family consists of several Chordopoxvirus isatin-beta-thiosemicarbazone dependent protein (protein G2) sequences. Inactivation of the gene coding for this protein renders the virus dependent upon isatin-beta-thiosemicarbazone (IBT) for growth PUBMED:2024483.\ 1469 IPR003723 \

    Cobalamins (vitamin B12), both as deoxyadenosylcobalamin and methylcobalamin, are involved as cofactors in a variety of enzymatic reactions and are synthesized by some bacteria and archaea. About thirty enzymes are required to manufacture cobalamins, some of the most complex nonpolymeric molecules biosynthesized in the cell. Cobalamin biosynthesis can be divided into three distinct sections:
    - The first results in the synthesis of the corrin ring component, cobinamide, from the ubiquitous tetrapyrrole primogenitor uroporphyrinogen III. This involves a series of reactions including eight S-adenosyl-L-methionine-dependent methylations, ring contraction, cobalt chelation, decarboxylation, amidations, and 1-amino-2-propanol attachment.
    - The second results in the synthesis of the lower axial ligand, dimethylbenzimidazole (DMB).
    - The third results in the assembly of the final coenzyme. The corrin ring is attached to the DMB, and the upper coordinating ligand for the cobalt (either an adenosyl or a methyl group) is added PUBMED:9742225.

    \

    \ A number of bacteria synthesize cobalamin (vitamin B12) by an anaerobic pathway, in which cobalt is added at an early stage and molecular oxygen is not required PUBMED:9742225. Of the 30 cobalamin synthetic genes, 25 are clustered in one operon, cob. They are arranged in three groups, each encoding enzymes for a biochemically distinct portion of the biosynthetic pathway PUBMED:8501034. Precorrin-6X reductase (CbiJ/CobK) catalyses the reduction of the macrocycle of precorrin-6X to precorrin-6Y.

    \ 7611 IPR012920 \

    This presumed domain is found at the C-terminus of a family of FtsJ-like methyltransferases. Members of this family are involved in 60S ribosomal biogenesis, for example PUBMED:10556316.

    \ 4893 IPR000193 \ Urocanase PUBMED:7944380 (also known as imidazolonepropionate hydrolase or urocanate hydratase) is the enzyme that catalyzes the second step in the degradation of histidine, the hydration of urocanate into imidazolonepropionate.\ \ Urocanase is found in some bacteria (gene hutU) and in the liver of many vertebrates. It has also been found in the plant Trifolium repens (white clover). Urocanase is a protein of about 60 kDa. It binds tightly to NAD+ and uses it as an electrophilic cofactor. A conserved cysteine has been found to be important for the catalytic mechanism and could be involved in the binding of the NAD+.\ 469 IPR003754 \ Uroporphyrinogen III synthase (HEM4) catalyses the fourth step in the heme biosynthetic pathway in eukaryotes, bacteria and archaea PUBMED:7597845.\

    Congenital erythropoietic porphyria (CEP) is an autosomal recessive inborn error of metabolism that results from the markedly deficient activity of HEM4 PUBMED:8829650.

    \ 2708 IPR006097 \

    Glutamate, leucine, phenylalanine and valine dehydrogenases are structurally and functionally related. They share a Gly-rich region containing a conserved Lys residue, which has been implicated in the catalytic activity, in each case a reversible oxidative deamination reaction.

    \

    Glutamate dehydrogenases (GluDH) are enzymes that catalyze the NAD- and/or NADP-dependent reversible deamination of L-glutamate into alpha-ketoglutarate PUBMED:1358610, PUBMED:8315654. GluDH isozymes are generally involved with either ammonia assimilation or glutamate catabolism. Two separate enzymes are present in yeasts: the NADP-dependent enzyme, which catalyses the amination of alpha-ketoglutarate to L-glutamate; and the NAD-dependent enzyme, which catalyses the reverse reaction PUBMED:2989290. This latter form links the L-amino acids with the Krebs cycle, which provides a major pathway for metabolic interconversion of alpha-amino acids and alpha-keto acids PUBMED:3368458.

    \

    Leucine dehydrogenase (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible deamination of leucine and several other aliphatic amino acids to their keto analogues PUBMED:3069133. Each subunit of this octameric enzyme from Bacillus sphaericus contains 364 amino acids and folds into two domains, separated by a deep cleft. The nicotinamide ring of the NAD+ cofactor binds deep in this cleft, which is thought to close during the hydride transfer step of the catalytic cycle.

    \ \ 6038 IPR010406 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 1972 IPR002876 \ This domain is found in bacterial, plant and yeast proteins. It comprises the entire length or central region of most of the proteins in the family, all of which are hypothetical with no known function. The average length of this domain is approximately 230 amino acids.\ 5246 IPR008404 \ This family consists of several avian apovitellenin I sequences. As part of the avian reproductive effort, large quantities of triglyceride-rich very-low-density lipoprotein (VLDL) particles are transported by receptor-mediated endocytosis into the female germ cells. Although the oocytes are surrounded by a layer of granulosa cells harbouring high levels of active lipoprotein lipase, non-lipolysed VLDL is transported into the yolk. This is because VLDL particles from laying chickens (Gallus gallus) are protected from lipolysis by apolipoprotein (apo)-VLDL-II, a potent dimeric lipoprotein lipase inhibitor PUBMED:8713091. Apo-VLDL-II is produced in the liver and secreted into the blood stream when induced by estrogen production in female birds.\ 285 IPR002724 \

    This family was previously of unknown function; however, many of its proteins have since been characterised as pyruvoyl-dependent arginine decarboxylases, which convert arginine to agmatine. Archaeoglobus fulgidus contains three copies of this 80-residue domain. These three copies, one of which is only half-length and excluded from the seed alignment, are very closely related and clearly arose by duplication after the separation from well-studied species. The other completed archaeal genomes each contain a single copy.

    \ 4057 IPR004716 \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) PUBMED:8246840, PUBMED:2197982 is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates concomitantly with their translocation across the cell membrane, which makes the PTS a link between the uptake and the metabolism of sugars.

    \ \

    The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred via a signal transduction pathway to enzyme I (EI), which in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme 2 (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains, IIA, IIB and IIC PUBMED:1537788. These can either be fused together in a single polypeptide chain or exist as two or three interacting chains, formerly called enzymes II (EII) and III (EIII).

    \ \

    The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine which is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with the sugar uptake processed by the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site.
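    The phosphoryl relay described above (PEP, EI, HPr, IIA, IIB, then the sugar as IIC translocates it) can be modelled as a chain of carriers that pass one phosphoryl group along, each donor losing it as the next acceptor gains it. This is a minimal sketch, not a kinetic model. The `Carrier` class and the `build_chain` and `relay` functions are hypothetical. The histidine labels for EI and HPr are added for illustration, and the IIB residue is a parameter because the text notes it is a cysteine or a histidine depending on the sugar.

```python
from dataclasses import dataclass


@dataclass
class Carrier:
    """One station in the PTS phosphoryl relay (hypothetical model)."""
    name: str
    residue: str  # residue that accepts the phosphoryl group ("" for PEP/sugar)
    phosphorylated: bool = False


def build_chain(iib_residue: str = "Cys") -> list[Carrier]:
    """PEP -> EI -> HPr -> IIA -> IIB -> sugar; IIB may use Cys or His."""
    return [
        Carrier("PEP", "", phosphorylated=True),  # initial phosphoryl donor
        Carrier("EI", "His"),
        Carrier("HPr", "His"),
        Carrier("IIA", "His"),
        Carrier("IIB", iib_residue),
        Carrier("sugar", ""),  # phosphorylated as IIC translocates it
    ]


def relay(chain: list[Carrier]) -> list[str]:
    """Pass one phosphoryl group down the chain and log each transfer."""
    log = []
    for donor, acceptor in zip(chain, chain[1:]):
        if not donor.phosphorylated:
            raise RuntimeError(f"{donor.name} has no phosphoryl group to pass on")
        donor.phosphorylated = False
        acceptor.phosphorylated = True
        log.append(f"{donor.name} -> {acceptor.name}")
    return log


for step in relay(build_chain()):
    print(step)
```

    Calling `build_chain("His")` instead models a permease whose IIB constituent is phosphorylated on a histidyl residue.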

    \ \

    An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose PUBMED:8246840, PUBMED:1537788, PUBMED:7815935, PUBMED:11361063.

    \ \

    The Man family is unique in several respects among PTS permease families.

  • It is the only PTS family in which members possess a IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteinyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.

    This family consists only of glucitol-specific transporters, which occur in both Gram-negative and Gram-positive bacteria. The system in Escherichia coli consists of a IIA protein and a IIBC protein.

    This family is specific for the IIA component.

    \ 3734 IPR001456 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
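    The single-letter convention can be sketched as a lookup that decodes a MEROPS family identifier into its catalytic type and the nucleophile class described above. The identifiers C6, S30, S11 and M52 all appear in this document. The function name and its return format are hypothetical, and the 'P' type is not handled because it applies to mixed clans rather than to families.

```python
# Catalytic-type letters as listed above ("P" is a clan-level type only).
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}  # amino acid nucleophile, acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}    # activated water molecule as nucleophile


def describe_family(family_id: str) -> tuple[str, str]:
    """Decode an identifier such as 'S11' into (catalytic type, nucleophile).

    Hypothetical helper for illustration; not part of any MEROPS software.
    """
    letter, number = family_id[:1].upper(), family_id[1:]
    if letter not in CATALYTIC_TYPES or not number.isdigit():
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "unknown"
    return CATALYTIC_TYPES[letter], nucleophile


for fam in ("C6", "S30", "S11", "M52"):
    print(fam, *describe_family(fam))
```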

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This family represents the potyvirus helper component protease found in genome polyproteins of potyviruses. It is a cysteine peptidase belonging to the MEROPS peptidase family C6 (clan CA).

    \ \

    The genome polyprotein contains: an N-terminal peptidase belonging to MEROPS peptidase family S30 (protein P1); the helper component protease (HC-PRO), MEROPS peptidase family C6; protein P3; 6KD protein (6K1); cytoplasmic inclusion protein (CI); 6KD protein 2 (6K2); genome-linked protein (VPG); nuclear inclusion protein A; nuclear inclusion protein B; and coat protein (CP).

    \ \

    The helper component-proteinase is required for aphid transmission.

    \ 4401 IPR000534 \ The semialdehyde dehydrogenase family is found in N-acetylglutamate semialdehyde dehydrogenase (ArgC), which is involved in arginine biosynthesis, and in aspartate-semialdehyde dehydrogenase PUBMED:10369777, an enzyme involved in the biosynthesis of various amino acids from aspartate. This family is also found in the yeast and fungal Arg5,6 protein, which is cleaved into the enzymes N-acetyl-gamma-glutamyl-phosphate reductase and acetylglutamate kinase. These are also involved in arginine biosynthesis. All proteins in this entry contain a NAD binding region of semialdehyde dehydrogenase.\ \ 6690 IPR010678 \

    This family is defined by a C-terminal region of approximately 500 residues, which occurs in several hypothetical eukaryotic proteins of unknown function.

    \ 3417 IPR001369 \

    Phosphorylases that belong to the same family include purine nucleoside phosphorylase (PNP) from mammals as well as from some bacteria (gene deoD), which catalyzes the cleavage of guanosine or inosine to the respective bases and sugar-1-phosphate molecules PUBMED:2104852; 5'-methylthioadenosine phosphorylase (MTA phosphorylase) from eukaryotes PUBMED:8687427; and xanthosine phosphorylase from Escherichia coli (gene xapA), which degrades all purine nucleosides except adenosine and deoxyadenosine PUBMED:7559336. Most bacterial PNP and archaebacterial MTA phosphorylases belong to a different group of phosphorylases . A number of uncharacterized proteins also belong to this group.

    \ 3337 IPR005493 \ Demethylmenaquinone methyltransferases convert demethylmenaquinone (DMK) to menaquinone (MK) in the final step of menaquinone biosynthesis. This region is also found at the C-terminus of the DlpA protein .\ 4745 IPR007639 \ This is a region found N-terminal to the catalytic domain of glutaminyl-tRNA synthetase in eukaryotes but not in Escherichia coli. This region is thought to bind RNA in a non-specific manner, enhancing interactions between the tRNA and enzyme, but is not essential for enzyme function PUBMED:10347214.\ 370 IPR003812 \ cAMP may be a regulatory factor in cell division in some bacteria. The Fic (filamentation induced by cAMP) protein is involved in the synthesis of PAB or folate. It would appear that the Fic protein and cAMP are involved in a regulatory mechanism of cell division via folate metabolism, and that in these organisms cell division could be controlled by coordination of cAMP, Fic and Fts proteins PUBMED:1656497.\ 526 IPR005824 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The KOW (Kyprides, Ouzounis, Woese) motif is found in a variety of ribosomal proteins and in the bacterial transcription antitermination protein NusG PUBMED:8987397. Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. In their mature form, these proteins have 103 to 150 amino-acid residues. \

    \ 3841 IPR006756 \ Under aerobic conditions, phenol is usually hydroxylated to catechol and degraded via the meta or ortho pathways. Two types of phenol hydroxylase are known: one is a multi-component enzyme, the other a single-component monooxygenase. This region is found in both types of enzyme PUBMED:2254258, PUBMED:11571188.\ 6751 IPR009693 \

    This family consists of several glucitol operon activator (GutM) proteins. Expression of the glucitol (gut) operon in Escherichia coli is regulated by an unusual, complex system, which consists of an activator (encoded by the gutM gene) and a repressor (encoded by the gutR gene) in addition to the cAMP-CRP complex (CRP, cAMP receptor protein). Synthesis of the mRNA, which initiates at the promoter specific to the gutR gene, occurs within the gutM gene. Expressional control of the gut operon appears to occur as a consequence of the antagonistic action of the products of the autogenously regulated gutM and gutR genes PUBMED:3062173.

    \ 1441 IPR001148 \

    Synonym(s): Carbonic dehydratase, Carbonic anhydrase

    \ \ Carbonate dehydratases (CA) are zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in vertebrates: three cytosolic isozymes (CA-I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a mitochondrial form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme.\ 7005 IPR006573 \

    NEUZ is a domain of unknown function found in neuralized proteins, i.e. proteins involved in the specification of the neuroblast during cellular differentiation.

    \ 5117 IPR007954 \

    This entry contains the Baculovirus immediate-early protein IE-0.

    \ 3760 IPR001967 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
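    The "linear arrangement" of a triad is simply the N-to-C order of the three catalytic residues. A small sketch makes this concrete: sort the residues by sequence position, then look up the clan from the three orderings given above. The function name is hypothetical. The example positions use the standard chymotrypsin numbering (His57, Asp102, Ser195) and subtilisin BPN' numbering (Asp32, His64, Ser221).

```python
# Linear order of catalytic residues -> clan, as listed in the text.
ORDER_TO_CLAN = {
    "HDS": "SA (chymotrypsin)",
    "DHS": "SB (subtilisin)",
    "SDH": "SC (carboxypeptidase C)",
}


def triad_order(positions: dict[str, int]) -> str:
    """Return the N-to-C order of the Ser (S), His (H) and Asp (D) residues."""
    if set(positions) != {"S", "H", "D"}:
        raise ValueError("expected sequence positions for S, H and D")
    return "".join(sorted(positions, key=positions.__getitem__))


# Chymotrypsin numbering: His57, Asp102, Ser195.
order = triad_order({"S": 195, "H": 57, "D": 102})
print(order, ORDER_TO_CLAN[order])  # HDS SA (chymotrypsin)
```

    The ordering is only a hint. As the text says, it "commonly" reflects clan relationships, so a structural comparison is still needed to assign a clan.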

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S11 (D-Ala-D-Ala carboxypeptidase A family, clan SE). The protein fold of the peptidase domain for members of this family resembles that of D-Ala-D-Ala carboxypeptidase B, the type example for clan SE.

    \

    Bacterial cell walls are complex structures containing amino acids and amino sugars, with chains of alternating N-acetylglucosamine and N-acetylmuramic acid units linked by short peptides PUBMED:7845208: the link peptide in Escherichia coli is L-alanyl-D-isoglutamyl-L-meso-diaminopimelyl-D-alanine. The chains are usually cross-linked between the carboxyl of D-alanine and the free amino group of diaminopimelate. During the synthesis of peptidoglycan, the precursor has the described tetrapeptide sequence with an added C-terminal D-alanine PUBMED:7845208.

    \

    D-Ala-D-Ala carboxypeptidase A is involved in the metabolism of cell components PUBMED:1741619; it is synthesised with a leader peptide to target it to the cell membrane PUBMED:7845208. After cleavage of the leader peptide, the enzyme is retained in the membrane by a C-terminal anchor. There are three families of serine-type D-Ala-D-Ala peptidase, which are also known as low molecular weight penicillin-binding proteins.
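    The substrate relationship can be illustrated by writing the stem peptide as a list of residues. The precursor carries the E. coli tetrapeptide plus an extra C-terminal D-alanine, and a D-Ala-D-Ala carboxypeptidase removes that terminal residue. This is a toy sketch only. The residue labels and the `dd_carboxypeptidase` function are illustrative, and the sketch ignores cross-linking, transpeptidation and the sugar backbone.

```python
# E. coli stem peptide, as given in the text (labels are illustrative).
TETRAPEPTIDE = ["L-Ala", "D-isoGlu", "meso-DAP", "D-Ala"]
PRECURSOR = TETRAPEPTIDE + ["D-Ala"]  # precursor: extra C-terminal D-Ala


def dd_carboxypeptidase(stem: list[str]) -> list[str]:
    """Remove the C-terminal D-Ala only if the stem ends in D-Ala-D-Ala."""
    if stem[-2:] == ["D-Ala", "D-Ala"]:
        return stem[:-1]
    return list(stem)  # not a substrate: returned unchanged (as a copy)


print(dd_carboxypeptidase(PRECURSOR))
```

    Applying the function to the already-trimmed tetrapeptide leaves it unchanged, because it no longer ends in a D-Ala-D-Ala pair.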

    \

    Family S11 contains only D-Ala-D-Ala peptidases, unlike families S12 and S13, which contain other enzymes, such as class C beta-lactamases and D-aminopeptidases PUBMED:7845208. Although these enzymes are serine peptidases, some members of family S11 are partially inhibited by thiol-blocking agents PUBMED:1930140.

    \ 6441 IPR010575 \

    This domain is found in several KorB transcriptional repressor proteins. The korB gene is a major regulatory element in the replication and maintenance of broad host-range plasmid RK2. It negatively controls the replication gene trfA, the host-lethal determinants kilA and kilB, and the korA-korB operon PUBMED:3430606. This family is found in conjunction with .

    \ 1019 IPR006794 \ Zfx and Zfy are transcription factors implicated in mammalian sex determination. This region is found N-terminal to multiple copies of a C2H2 zinc finger. This region has been shown to activate transcription when fused to a GAL4 DNA-binding domain PUBMED:2105457.\ 5348 IPR008452 \

    This family contains the P18 proteins of citrus tristeza virus (CTV). CTV is a member of the closterovirus group and is one of the more complex single-stranded RNA viruses. Assembly of the viral genome into virions is a critical process of the virus life cycle, often defining the ability of the virus to move within the plant and to be transmitted horizontally to other plants. Closteroviridae virions are polar helical rods assembled primarily by a major coat protein, but with a related minor coat protein at one end. It is the only virus family that encodes a protein with similarity to cellular chaperones, a 70-kDa heat-shock protein homolog (HSP70h). Deletion mutagenesis reveals that the p33, p6, p18, p13, p20 and p23 genes are not needed for virion formation. Their function is unknown PUBMED:11112500.

    \ 4301 IPR006064 \

    These enzymes correspond to Agrobacterium rolC and were characterized along with rolB. RolB and RolC were originally classified as glycoside hydrolase families 40 and 41, respectively. RolB has subsequently been shown PUBMED:8596628 to have tyrosine phosphatase activity.

    \ 7682 IPR012926 \

    A number of members of this family are annotated as being transmembrane proteins induced by tumour necrosis factor alpha, but no literature was found to support this.

    \ 1331 IPR004941 \ The function of the FP protein is not known. The protein is missing in baculovirus FP (few polyhedra) mutants PUBMED:8760443. \ 3022 IPR000671 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
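    Both the plain HEXXH motif and the stricter abXHEbbHbc definition can be written as regular expressions. This is a minimal sketch, and its residue classes are assumptions not spelled out in the text. 'Uncharged' is taken to exclude D, E, H, K and R. 'Hydrophobic' is taken as A, F, I, L, M, V, W and Y. 'a' is restricted to V or T, although the text says only "most often". Proline is excluded from every position of the strict motif. Zero-width lookaheads let overlapping matches be reported as 0-based start positions. The test peptides are invented.

```python
import re

# Assumed residue classes (illustrative, not defined in the source text).
UNCHARGED = "ACFGILMNQSTVWY"         # excludes D, E, H, K, R and P
HYDROPHOBIC = "AFILMVWY"             # excludes P
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"  # X: any residue except proline

# Plain HEXXH, and the stricter abXHEbbHbc form ('a' restricted to V/T here).
HEXXH = re.compile(r"(?=HE..H)")
STRICT = re.compile(
    f"(?=[VT][{UNCHARGED}][{ANY_BUT_PRO}]HE"
    f"[{UNCHARGED}][{UNCHARGED}]H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def motif_starts(pattern: re.Pattern, seq: str) -> list[int]:
    """0-based start positions of (possibly overlapping) motif matches."""
    return [m.start() for m in pattern.finditer(seq.upper())]


print(motif_starts(HEXXH, "GGVVAHELTHAVTDY"))   # [5]
print(motif_starts(STRICT, "GGVVAHELTHAVTDY"))  # [2]
```

    Replacing the L after HE with a proline still satisfies HEXXH, but the strict pattern no longer matches. This mirrors the observation that proline is never found in this site.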

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The large subunit of [NiFe]-hydrogenase, as well as other nickel metalloenzymes, is synthesized as a precursor devoid of the metalloenzyme active site. This precursor undergoes a complex post-translational maturation process that requires a number of accessory proteins PUBMED:11336840, PUBMED:12196162, PUBMED:10226043. At one step of this process, after nickel incorporation, each hydrogenase isoenzyme is processed by proteolytic cleavage at the C-terminal end by the corresponding hydrogenase maturation endopeptidase PUBMED:10727938. For example, Escherichia coli HycI is involved in processing of pre-HycE (the large subunit of hydrogenase 3) PUBMED:8125094, PUBMED:10795682; HybD is involved in processing of pre-HybC (the large subunit of hydrogenase 2) PUBMED:10331925; and HyaD is assumed to be involved in processing of the large subunit of hydrogenase 1. This group represents metallopeptidases of the MEROPS peptidase family M52 (HybD endopeptidase family, clan ML).

    \ \

    The cleavage site is after a His or an Arg, liberating a short peptide PUBMED:8405419, PUBMED:8125094. This cleavage occurs only in the presence of nickel, and the endopeptidase probably uses the metal in the large subunit of [NiFe]-hydrogenases as a recognition motif PUBMED:10727938. There is no direct evidence for the active site or substrate-binding site, but there are predictions based on an available structure PUBMED:10331925.
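    The two processing conditions stated above can be expressed as precondition checks: nickel must already be incorporated, and cleavage must fall after a His or an Arg, releasing a short C-terminal peptide. This is a sketch under assumptions. The function name is hypothetical, and the cleavage position is passed in explicitly because the text does not say how the particular His or Arg is selected. The example sequence is invented.

```python
def process_large_subunit(precursor: str, cut_after: int,
                          nickel_inserted: bool) -> tuple[str, str]:
    """Return (mature large subunit, released C-terminal peptide).

    cut_after: hypothetical 0-based index of the residue after which the
    maturation endopeptidase cleaves.
    """
    if not nickel_inserted:
        raise ValueError("not processed: nickel has not been incorporated")
    if not 0 <= cut_after < len(precursor) - 1:
        raise ValueError("cleavage must leave a non-empty C-terminal peptide")
    if precursor[cut_after] not in "HR":
        raise ValueError("cleavage occurs only after His or Arg")
    return precursor[: cut_after + 1], precursor[cut_after + 1:]


print(process_large_subunit("MEGHVAALRK", 3, nickel_inserted=True))
```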

    \ \

    Nomenclature note: the following names are used in different organisms for members of this group: HycI, HybD, HyaD, HoxM, HoxW, HupD, HynC, HupM, VhoD, VhtD PUBMED:11336840. Gene/protein names are sometimes used interchangeably to designate various "hydrogenase cluster" proteins unrelated to each other in various organisms. For example, the following names are used for members of this group, but also for unrelated proteins: HupD is used in Azotobacter chroococcum and Anabaena sp. to designate an unrelated hydrogenase maturation factor; HydD is used to designate hydrogenase structural genes in Thermococcus litoralis, Pyrococcus abyssi, and other species.

    \ 7935 IPR012527 \

    This family consists of the uperin family of antimicrobial peptides. Uperin is a wide-spectrum antibiotic peptide isolated from the Australian toadlet, Uperoleia mjobergii. Being only 17 amino acid residues long, it is smaller than most other wide-spectrum antibiotic peptides isolated from amphibians. Uperin adopts a well-defined amphipathic alpha-helix with distinct hydrophilic and hydrophobic faces PUBMED:10461748.
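    Amphipathicity of a helix is often quantified as a hydrophobic moment: each residue's hydrophobicity is placed as a vector around an ideal alpha-helix at 100 degrees per residue, and the magnitude of the vector sum is taken. A helix with distinct hydrophobic and hydrophilic faces gives a large value. This sketch uses an assumed, simplified +1/-1 hydrophobicity scale in place of the graded scales used in practice. The uperin sequence is not given here, so the test peptides are hypothetical.

```python
import cmath
import math

# Simplified binary hydrophobicity scale (an assumption for illustration;
# published analyses use graded scales such as Eisenberg's consensus scale).
HYDROPHOBIC = set("AILMFVWC")


def hydrophobic_moment(seq: str, degrees_per_residue: float = 100.0) -> float:
    """Magnitude of the vector sum of +1/-1 hydrophobicities placed around
    an ideal alpha-helix (100 degrees per residue by default)."""
    step = math.radians(degrees_per_residue)
    total = sum(
        (1.0 if aa in HYDROPHOBIC else -1.0) * cmath.exp(1j * step * i)
        for i, aa in enumerate(seq.upper())
    )
    return abs(total)


# Hypothetical 18-mer with Leu on one helical face and Lys on the other.
print(hydrophobic_moment("LKKLLKKLLKLLKKLLKK"))
```

    A uniform 18-residue stretch, such as poly-Leu, scores essentially zero because 18 x 100 degrees is exactly five turns, so its vectors cancel. The face-segregated peptide scores well above that.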

    \ 4879 IPR005370 \

    The members of this family are small uncharacterised proteins.

    \ 614 IPR007229 \ Nicotinate phosphoribosyltransferase is the rate-limiting enzyme that catalyses the first reaction in the NAD salvage pathway. This family also contains a number of closely related proteins for which a catalytic activity has not been experimentally demonstrated.\ 5157 IPR007994 \

    This family contains several uncharacterised human proteins. The function of this family is unknown; however, the family member FKSG56 is a hepatocellular carcinoma-associated antigen.

    \ 6111 IPR009381 \

    This family consists of several bacterial ThuA like proteins. The function of the family is unknown.

    \ 4890 IPR007247 \ Ureidoglycolate hydrolase carries out the third step in the degradation of allantoin.\ 2663 IPR000115 \ Phosphoribosylglycinamide synthetase (GARS) (phosphoribosylamine--glycine ligase) PUBMED:2687276 catalyzes the second step in the de novo biosynthesis of purines: ATP + 5-phospho-D-ribosylamine + glycine = ADP + phosphate + N1-(5-phospho-D-ribosyl)glycinamide.\ \ In bacteria GARS is a monofunctional enzyme (encoded by the purD gene); in yeast it is part, together with phosphoribosylformylglycinamidine cyclo-ligase (AIRS), of a bifunctional enzyme (encoded by the ADE5,7 gene); in higher eukaryotes it is part, with AIRS and phosphoribosylglycinamide formyltransferase (GART), of a trifunctional enzyme (GARS-AIRS-GART).\ 3427 IPR005780 \

    This model describes subunit E of N5-methyltetrahydromethanopterin:coenzyme M methyltransferase in methanogenic archaea. This methyltransferase is a membrane-associated enzyme complex that uses a methyl-transfer reaction to drive a sodium-ion pump. \ \ Archaea have evolved energy-yielding pathways marked by one-carbon biochemistry featuring novel cofactors and enzymes. This transferase (encoded by subunit A) is involved in the transfer of the methyl group from N5-methyltetrahydromethanopterin to coenzyme M. In an accompanying reaction, methane is produced by two-electron reduction of methyl-coenzyme M by another enzyme, methyl-coenzyme M reductase.

    \ \ 6605 IPR010643 \

    This domain represents a conserved region within a number of eukaryotic DNA repair helicases.

    \ 5252 IPR008602 \ This family contains several Plasmodium Duffy binding proteins. Plasmodium vivax and Plasmodium knowlesi merozoites invade Homo sapiens erythrocytes that express Duffy blood group surface determinants. The Duffy receptor family is localised in micronemes, an organelle found in all organisms of the phylum Apicomplexa PUBMED:2170017.\ 3710 IPR008209 \ Phosphoenolpyruvate carboxykinase (GTP) (PEPCK) PUBMED:2110163 catalyzes the formation of phosphoenolpyruvate by decarboxylation of oxaloacetate while hydrolyzing GTP: oxaloacetate + GTP = phosphoenolpyruvate + GDP + CO2.\ \ This is a rate-limiting step in gluconeogenesis (the biosynthesis of glucose). In vertebrates there are two isozymes: a cytosolic form whose activity is affected by hormones regulating this metabolic process (such as glucagon or insulin) and a mitochondrial form. An essential cysteine residue has been proposed PUBMED:2909519 to be implicated in the catalytic mechanism; this residue is located in the central part of PEPCK.\ 601 IPR000535 \

    Major sperm proteins (MSP) are central components in molecular interactions underlying sperm motility in Caenorhabditis elegans, whose sperm employ an amoeboid crawling motion using an MSP-containing lamellipod, rather than the flagellar-based swimming motion associated with other sperm. These proteins oligomerise to form an extensive filament system that extends from sperm villipoda along the leading edge of the pseudopod. About 30 MSP isoforms may exist in C. elegans.

    \

    MSPs form a fibrous network, whereby MSP dimers form helical subfilaments that coil around one another to produce filaments, which in turn form supercoils to produce bundles. The crystal structure of MSP from C. elegans reveals an immunoglobulin (Ig)-like seven-stranded beta sandwich fold PUBMED:12051923.

    \ 7992 IPR012581 \

    This C-terminal domain is found in nucleolar proteins PUBMED:15112237.

    \ 5722 IPR008784 \ This family consists of several DNA encapsidation protein (Gp16) sequences from the phi-29-like viruses. Gene product 16 catalyses the in vivo and in vitro genome-encapsidation reaction PUBMED:3879485.\ 49 IPR005561 \

    ANTAR (AmiR and NasR transcription antitermination regulators) is an RNA-binding domain found in bacterial transcription antitermination regulatory proteins PUBMED:11796212. This domain has been detected in various response regulators of two-component systems, which are structured around two proteins, a histidine kinase and a response regulator. This domain is also found in one-component sensory regulators from a variety of bacteria. Most response regulators interact with DNA; however, ANTAR-containing regulators interact with RNA. The majority of the domain consists of a coiled-coil.

    \ 209 IPR002818 \

    This signature defines a diverse group of protein families, which include proteins involved in RNA-protein interaction regulation, thiamine biosynthesis, Ras-related signal transduction, and those with protease activity. Examples of annotation are:

    \ \

    \ 7736 IPR012914 \

    This domain is found in the purine catabolism regulatory protein expressed by Bacillus subtilis (PucR). PucR is thought to be a transcriptional regulator of genes involved in the purine degradation pathway, and may contain a LysR-like DNA-binding domain. It is similar to LysR-type regulators in that it represses its own expression PUBMED:11344136. The other members of this family are also putative regulatory proteins.

    \ 3491 IPR005306 \

    The members of this family are derived from nepoviruses. Together with comoviruses and picornaviruses, nepoviruses are classified in the picornavirus superfamily of positive-strand RNA viruses. This family aligns several nepovirus coat protein sequences. In several cases, the coat protein region is found at the C terminus of the RNA2-encoded viral polyprotein. The coat protein consists of three trapezoid-shaped beta-barrel domains and forms a pseudo T = 3 icosahedral capsid structure PUBMED:9519407.

    \ 6123 IPR009386 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6138 IPR010450 \

    This family consists of mammalian neurexophilin proteins. Mammalian brains contain four different neurexophilin proteins. Neurexophilins form a family of related glycoproteins that are proteolytically processed after synthesis and bind to alpha-neurexins. The structure and characteristics of neurexophilins indicate that they function as neuropeptides that may signal via alpha-neurexins PUBMED:9570794.

    \ 2499 IPR001367 \ The diphtheria toxin repressor protein (DTXR) is a member of this group PUBMED:7568230. In \ Corynebacterium diphtheriae, where it has been studied in some detail, this protein acts\ as an iron-binding repressor of diphtheria toxin gene expression and may serve as a \ global regulator of gene expression. The N terminus may be involved in iron binding and\ may associate with the Tox operator. Binding of DTXR to the Tox operator requires a divalent\ metal ion such as cobalt, iron, manganese or nickel, whereas zinc gives only weak \ activation PUBMED:7743135.\ 2635 IPR001748 \ A Xenopus protein known as G10 PUBMED:2568313 has been found to be highly conserved in a wide range of eukaryotic species. The function of G10 is still unknown. G10 is a hydrophilic protein of about 17 to 18 kDa (143 to 157 residues) whose C-terminal half is rich in cysteines and could be involved in metal binding.\ 7231 IPR009981 \

    This family consists of several uncharacterised Caenorhabditis elegans proteins of around 115 residues in length. Members of this family contain 6 highly conserved cysteine residues. The function of this family is unknown.

    \ 700 IPR011611 \

    This entry includes a variety of carbohydrate and pyrimidine kinases. The family includes phosphomethylpyrimidine kinase. This enzyme is part of the thiamine pyrophosphate (TPP) synthesis pathway; TPP is an essential cofactor for many enzymes PUBMED:9519409.

    \ 5564 IPR008722 \ This domain represents the presumed membrane-spanning region of the OmpF proteins. This region is involved in channel formation and is thought to form an 8-stranded beta-barrel PUBMED:11034289.\ 993 IPR003125 \ This domain has no known function and is found in Caenorhabditis elegans proteins, usually at the N terminus.\ 1814 IPR007238 \ DNA primase is the polymerase that synthesises small RNA primers for the Okazaki fragments made during discontinuous DNA replication. DNA primase is a heterodimer of two subunits, the small subunit Pri1 (48 kDa in yeast) and the large subunit Pri2 (58 kDa in the yeast Saccharomyces cerevisiae) PUBMED:2528682. Both subunits participate in the formation of the active site, but the ATP-binding site is located on the small subunit PUBMED:2023935. Primase function has also been demonstrated for human and mouse primase subunits PUBMED:8026492.\ 2717 IPR002932 \

    Ferredoxin-dependent glutamate synthases have been implicated in a number of functions, including photorespiration in Arabidopsis, where they may also play a role in primary nitrogen assimilation in roots PUBMED:9596633. In archaebacteria this region is expressed as a separate polypeptide (the glutamate synthase alpha subunit), whereas in other organisms it is part of a large multidomain enzyme.

    \

    The aligned region of these proteins contains a putative FMN-binding site and an Fe-S cluster.

    \ 2551 IPR005648 \ FlgD is known to be absolutely required for hook assembly, yet it has not been detected in the mature flagellum PUBMED:8157595. It appears to act as a hook-capping protein to enable assembly of hook protein subunits PUBMED:8157595.\ 4950 IPR006743 \ This repeat is found in the extracellular (C-terminal) region of the variant surface antigen A (VlpA) of Mycoplasma hyorhinis. Mutations that change the number of repeats in the protein are involved in antigenic variation and immune evasion of this swine pathogen PUBMED:10671459.\ 3284 IPR002101 \

    Myristoylated alanine-rich C-kinase substrate (MARCKS) is a predominant\ cellular substrate for protein kinase C (PKC) that has been implicated in the regulation of brain development, \ macrophage activation, neurosecretion and growth factor-dependent\ mitogenesis PUBMED:8420923, PUBMED:11829734. The N-terminal glycine is the site of myristoylation, \ which allows effective binding of the protein to the plasma membrane, where\ it co-localises with PKC PUBMED:2034276. MARCKS binds calmodulin in a calcium-dependent\ manner; the region responsible for calmodulin binding is a highly basic domain\ of about 25 amino acids known as the PSD or effector domain, which also contains the PKC\ phosphorylation sites and has been shown to contribute to membrane binding. When not phosphorylated, the effector domain can bind\ to filamentous actin PUBMED:1560845. MARCKS is thought to act as a regulated \ crossbridge between actin and the plasma membrane; modulation of its actin\ cross-linking activity by calmodulin and phosphorylation represents a\ potential convergence of the calcium-calmodulin and PKC signal transduction\ pathways in regulation of the actin cytoskeleton. MARCKS also contains an MH2 domain of unknown function.

    \

    MARCKS-related protein (MRP) is similar to MARCKS in properties\ such as myristoylation, phosphorylation and calmodulin binding, and\ shares a high degree of sequence similarity with it. The two regions that show the highest\ similarity are the protein kinase C phosphorylation site domain and the N-terminal\ region containing the myristoylation site PUBMED:1864362. MARCKS and MRP amino acid \ compositions are similar, but the alanine content of the latter is lower. MARCKS proteins appear to adopt a natively unfolded conformation, i.e. randomly coiled chains arranged in non-classical extended conformations, in common with other substrates of PKC.

    \ 4806 IPR001727 \

    A number of uncharacterized proteins share regions of similarity. These include:\

    \

    These are hydrophobic proteins of 200 to 320 amino acids that seem to contain six or seven transmembrane domains.

    \ 1903 IPR003773 \

    This entry describes proteins of unknown function.

    \ 8093 IPR013165 \

    A total of 20 peptides of the allatostatin superfamily were isolated from the shore crab Carcinus maenas. They are named carcinustatin 1 to 20 and range in length from 5 to 27 amino acids. This family includes carcinustatins 8, 9, 15 and 16 PUBMED:9461295.

    \ 7148 IPR010847 \

    This entry contains a number of plant harpin-induced 1 (Hin1) proteins, which are involved in the plant hypersensitive response (HR) PUBMED:8893538.

    \ 7056 IPR010822 \

    This family contains a number of bacterial stage II sporulation E proteins. These are required for formation of a normal polar septum during sporulation. The N-terminal region is hydrophobic and is expected to contain up to 12 membrane-spanning segments PUBMED:8830262.

    \ 2140 IPR007431 \ This family includes several bacterial proteins of uncharacterised function.\ 812 IPR001247 \ This domain is found in the 3'-5' exoribonucleases ribonuclease PH, which contains a single \ copy of the domain and removes nucleotide residues following the -CCA terminus of \ tRNA, and polyribonucleotide nucleotidyltransferase (PNPase), which contains two tandem \ copies of the domain and is involved in mRNA degradation in the 3'-5' direction. PNPase\ is part of the RNA degradosome, a multi-enzyme complex important in RNA processing \ and messenger RNA degradation. In yeast, these proteins are components of the exosome, \ a 3'-5' exoribonuclease complex that is required for 3' processing of the 5.8S rRNA\ PUBMED:9390555.\ 4877 IPR005367 \

    This is a small family of mainly hypothetical bacterial proteins of unknown function.

    \ 5345 IPR008907 \ This family consists of a 25 kDa protein that is phosphorylated by a Ser/Thr-Pro kinase PUBMED:1909972. It has been described as a brain-specific protein, but it is also found in Tetrahymena thermophila.\ 7390 IPR011500 \

    This conserved sequence contains several highly conserved Cys residues that are predicted to form disulphide bridges. It is predicted to lie outside the cell membrane, tethered to it, in several receptor proteins.

    \ 432 IPR005154 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    This represents a family of alpha-glucuronidases. Deletion mutants have indicated that the central region is responsible for the catalytic activity. Within this central domain, the invariant Glu and Asp (residues 391 and 364, respectively, in the B. stearothermophilus enzyme) are thought to form the catalytic centre PUBMED:11358519.

    \ 1991 IPR005185 \

    These proteins contain a domain that occurs as one or more copies in a small family of putative membrane proteins.

    \ 2186 IPR007474 \

    This domain is found in the bacterial protein ApaG and at the C termini of some F-box proteins. F-box proteins contain a carboxy-terminal domain that interacts with protein substrates PUBMED:10531037. The ApaG domain is ~125 amino acids in length and is named after the bacterial ApaG protein, of which it forms the core. The Salmonella typhimurium ApaG domain protein CorD is involved in Co(2+) resistance and Mg(2+) efflux. Tertiary structures of different ApaG proteins show a fold composed of several β-sheets. The ApaG domain may be involved in protein-protein interactions that could be implicated in \ substrate specificity PUBMED:1779764, PUBMED:10945468, PUBMED:15213450.

    \ 5805 IPR010290 \

    This family consists of uncharacterised bacterial proteins that are putative permeases belonging to the major facilitator superfamily. One member, DitE, is linked to the genes involved in the degradation of abietane diterpenoids in Pseudomonas abietaniphila BKME-9 PUBMED:10850995.

    \ 8100 IPR013187 \

    This domain occurs in a diverse superfamily of genes in plants. Most examples are found C-terminal to an F-box, a 60 amino acid motif involved in ubiquitination of target proteins to mark them for degradation. Two-hybrid experiments support the idea that most members are interchangeable F-box subunits of SCF E3 complexes PUBMED:12169662. Some members have two copies of this domain.

    \ 3507 IPR007415 \ This short protein is found in the nif (nitrogen fixation) operon. Its function is unknown but it is probably involved in nitrogen fixation or regulating some component of this process. The 75 residue region that defines these proteins is found in isolation in some members and in the N-terminal half of the longer NifZ proteins.\ 435 IPR007235 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 28 comprises enzymes with a number of known activities: 1,2-diacylglycerol 3-beta-galactosyltransferase, 1,2-diacylglycerol 3-beta-glucosyltransferase and beta-N-acetylglucosamine transferase.\ Structural analysis suggests that the C-terminal domain contains the UDP-GlcNAc binding site.

    \ 6594 IPR010636 \

    This is a domain found in fungal hydrophobins that seems to be restricted to ascomycetes. These are small, moderately hydrophobic extracellular proteins that have eight cysteine residues arranged in a strictly conserved motif. Hydrophobins are generally found on the outer surface of conidia and of the hyphal wall, and may be involved in mediating contact and communication between the fungus and its environment PUBMED:11343402. Note that some family members contain multiple copies of the domain.

    \ 632 IPR007717 \ The HRD4 gene is identical to NPL4, a gene previously implicated in nuclear transport. Using a diverse set of substrates and direct ubiquitination assays, analysis revealed that HRD4/NPL4 is required for a poorly characterized step in ER-associated degradation following ubiquitination of target proteins but preceding their recognition by the 26S proteasome PUBMED:11739805. Npl4p physically associates with Cdc48p via Ufd1p to form a Cdc48p-Ufd1p-Npl4p complex. The Cdc48-Ufd1-Npl4 complex functions in the recognition of several polyubiquitin-tagged proteins and facilitates their presentation to the 26S proteasome for processive degradation or even more specific processing.\ 5018 IPR003851 \

    This family consists of proteins containing a Dof domain, which is a zinc finger DNA-binding domain that shows resemblance to the Cys2 zinc finger, although it has a longer putative loop where an extra Cys residue is conserved PUBMED:9688549. AOBP, a DNA-binding protein in pumpkin (Cucurbita maxima), contains a 52 amino acid Dof domain, which is highly conserved in several\ DNA-binding proteins of higher plants.

    \ 5300 IPR008865 \ This family contains several bacterial Ter proteins. The Ter protein specifically binds to DNA replication terminus sites on the host and plasmid genome and then blocks progress of the DNA replication fork PUBMED:2687269.\ 2622 IPR003510 \ Fumarate reductase is a membrane-bound flavoenzyme consisting of four subunits, A to D. A and B comprise the membrane-extrinsic catalytic domain, and C and D link the catalytic centers to the electron-transport chain. This family consists of the 15 kDa hydrophobic subunit C.\ 5901 IPR008318 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 84 IPR005158 \

    Found in the DNRI/REDD/AFSR family of regulators, this region of AFSR, along with the C-terminal region, is capable of independently directing actinorhodin production.

    \ 564 IPR012302 \

    Malic enzymes (malate oxidoreductases) catalyse the oxidative decarboxylation of malate to form pyruvate PUBMED:, a reaction important in a number of metabolic pathways; e.g. carbon dioxide released from the reaction may be used in sugar production during the Calvin cycle of photosynthesis PUBMED:8300616. There are three forms of the enzyme PUBMED:1993674: an NAD-dependent form that decarboxylates oxaloacetate; an NAD-dependent form that does not decarboxylate oxaloacetate; and an NADPH-dependent form PUBMED:8300616. Other proteins known to be similar to malic enzymes are the Escherichia coli scfA protein; an enzyme from Zea mays (maize), formerly thought to be cinnamyl-alcohol dehydrogenase PUBMED:2103472; and the hypothetical Saccharomyces cerevisiae protein YKL029c.
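
    The oxidative decarboxylation described above can be summarised as a single balanced equation. This is a sketch of the standard textbook net reaction, not an equation taken from the cited references; NAD(P) is written to cover both the NAD- and NADP-dependent forms, and the divalent metal ion the enzyme requires is omitted.

```latex
% Net reaction of malic enzyme (oxidative decarboxylation of malate)
\[
\text{L-malate} + \text{NAD(P)}^{+}
  \longrightarrow \text{pyruvate} + \text{CO}_{2} + \text{NAD(P)H}
\]
```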

    \

    Studies on the duck liver malic enzyme reveal that it can be alkylated by bromopyruvate, resulting in the loss of oxidative decarboxylation and the subsequent enhancement of pyruvate reductase activity PUBMED:1911848. The alkylated form is able to bind NADPH but not L-malate, indicating impaired substrate- or divalent metal ion-binding in the active site PUBMED:1911848. Sequence analysis has highlighted a cysteine residue as the point of alkylation, suggesting that it may play an important role in the activity of the enzyme PUBMED:1911848, although it is absent in the sequences from some species.

    \

    There are three well-conserved regions in the enzyme sequences. Two of them seem to be involved in binding NAD or NADP. The significance of the third one, located in the central part of the enzymes, is not yet known.

    \ 6498 IPR009556 \

    This family consists of several Microneme protein Etmic-2 sequences from Eimeria tenella. Etmic-2 is a 50 kDa acidic protein, which is found within the microneme organelles of Eimeria tenella sporozoites and merozoites PUBMED:8855556.

    \ 6162 IPR009403 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6104 IPR008090 \

    Iron is essential for growth in both bacteria and mammals. Controlling the \ amount of free iron in solution is often used as a tactic by hosts to limit \ invasion of pathogenic microbes; binding iron tightly within protein \ molecules can accomplish this. Such iron-protein complexes include haem in \ blood, lactoferrin in tears/saliva, and transferrin in blood plasma. Some \ bacteria express surface receptors to capture eukaryotic iron-binding \ compounds, while others have evolved siderophores (enterobactins) to \ scavenge iron from iron-binding host proteins PUBMED:8057905. \

    \

    The control of such siderophore gene expression in Escherichia coli is under \ the regulation of the negative repressor protein FUR PUBMED:9990318. When complexed \ with Fe2+, FUR down-regulates the transcription not only of the siderophore \ genes, but also of the genes for proteins that release Fe2+ ions bound to the hydroxamate \ enterobactin proteins in the microbial cytoplasm PUBMED:9990318. An example of \ the latter is FhuF from the Gram-negative microbes Yersinia pestis, \ Salmonella typhi and Escherichia coli PUBMED:9990318. In conjunction with the \ siderophore system, this gene has been demonstrated to be essential for \ growth and virulence in pathogenic enterobacteria PUBMED:9990318.\

    \

    \ FhuF is a member of the [2Fe-2S] ferric iron reductase family. However,\ in place of the symmetrical tetrahedral arrangement at the ferric iron\ binding site, an unusual Cys-Cys C-terminal group distorts the site in this\ protein PUBMED:10322040. This property makes FhuF inherently unstable, and another set\ of regulatory genes, designated "suf", is thought to maintain its activity\ in the cytoplasm.\

    \ \ 174 IPR007274 \

    The redox active metal copper is an essential cofactor in critical biological processes such as respiration, iron transport, oxidative stress protection, hormone production, and pigmentation. A widely conserved family of high-affinity copper transport proteins (Ctr proteins) mediates copper uptake at the plasma membrane. A series of clustered methionine residues in the hydrophilic extracellular domain, and an MXXXM motif in the second transmembrane domain, are important for copper uptake. These methionines probably coordinate copper during the process of metal transport.

    \ 5082 IPR007919 \

    This family of proteins is functionally uncharacterised.

    \ 1016 IPR007716 \ The HRD4 gene is identical to NPL4, a gene previously implicated in nuclear transport. Using a diverse set of substrates and direct ubiquitination assays, analysis revealed that HRD4/NPL4 is required for a poorly characterized step in ER-associated degradation after ubiquitination of target proteins but before their recognition by the 26S proteasome PUBMED:11739805. This region of the protein possibly contains two zinc-binding motifs. Npl4p physically associates with Cdc48p via Ufd1p to form a Cdc48p-Ufd1p-Npl4p complex. The Cdc48-Ufd1-Npl4 complex functions in the recognition of several polyubiquitin-tagged proteins and facilitates their presentation to the 26S proteasome for processive degradation or even more specific processing.\ 4051 IPR002178 \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) PUBMED:8246840, PUBMED:2197982 is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates concomitantly with their translocation across the cell membrane, which makes the PTS a link between the uptake and metabolism of sugars.

    \ \

    The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred via a signal transduction pathway to enzyme I (EI), which in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme 2 (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains: IIA, IIB and IIC PUBMED:1537788. These can either be fused together in a single polypeptide chain or exist as two or three interacting chains, formerly called enzymes II (EII) and III (EIII).

    \ \

    The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine which is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with the sugar uptake processed by the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site.
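
    The phosphoryl relay described in the preceding paragraphs can be summarised as the chain below, in which each arrow denotes transfer of the phosphoryl group to the next carrier, followed by the net transport reaction. This is a schematic sketch only: it treats IIA, IIB and IIC as domains regardless of whether they are fused or separate chains, and the "out"/"in" subscripts are assumed notation (not from the source) marking the two sides of the membrane.

```latex
% Phosphoryl relay of the PTS (each arrow = phosphoryl transfer)
\[
\text{PEP} \rightarrow \text{EI} \rightarrow \text{HPr}
  \rightarrow \text{IIA} \rightarrow \text{IIB} \rightarrow \text{sugar}
\]
% Net reaction: uptake is coupled to phosphorylation (IIC forms the channel)
\[
\text{PEP} + \text{sugar}_{\text{out}}
  \longrightarrow \text{pyruvate} + \text{sugar-P}_{\text{in}}
\]
```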

    \ \

    An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose PUBMED:8246840, PUBMED:1537788, PUBMED:7815935, PUBMED:11361063.

    \ \

    \

    \ 1324 IPR003362 \ This family represents a conserved region from a number of different bacterial sugar transferases, involved in diverse biosynthesis pathways. Examples include galactosyl-P-P-undecaprenol synthetase, which transfers galactose-1-phosphate to the lipid precursor undecaprenol phosphate in the first steps of O-polysaccharide biosynthesis; UDP-galactose-lipid carrier transferase, which is involved in the biosynthesis of amylovoran; and galactosyl transferase CpsD, which is essential for assembly of the group B Streptococci (GBS) type III capsular polysaccharide.\ 874 IPR004331 \

    The SPX domain is named after the SYG1/Pho81/XPR1 proteins. This domain of about 180 residues is found at\ the amino terminus of a variety of proteins. In the yeast protein SYG1, the N terminus directly binds to the G-protein\ beta subunit and inhibits transduction of the mating pheromone signal PUBMED:7592711, suggesting that all the members of this\ family are involved in G-protein associated signal transduction. The C termini of these proteins often contain an EXS domain PUBMED:9990033.

    \

    The N termini of several proteins involved in the\ regulation of phosphate transport, including the putative phosphate level sensors PHO81 from\ Saccharomyces cerevisiae and NUC-2 from Neurospora crassa, are also members of this family PUBMED:8918192, PUBMED:11069666. NUC-2 contains several ankyrin repeats.

    \

    Several members of this family are the XPR1 proteins: the\ xenotropic and polytropic retrovirus receptor confers susceptibility to infection with murine leukaemia viruses (MLV) PUBMED:9990033.\ The similarity between SYG1, phosphate regulators and XPR1 sequences has been previously noted, as has the\ additional similarity to several predicted proteins, of unknown function, from Drosophila melanogaster, Arabidopsis\ thaliana, Caenorhabditis elegans, Schizosaccharomyces pombe, and Saccharomyces cerevisiae PUBMED:9990033, PUBMED:9927670. In addition, given\ the similarities between XPR1 and SYG1 and phosphate regulatory proteins, it has been proposed that XPR1 might be\ involved in G-protein associated signal transduction and may itself function as a phosphate sensor PUBMED:9990033.

    \ 8042 IPR013158 \

    This domain is found at the N terminus of the Apolipoprotein B mRNA editing enzyme. Apobec-1 catalyzes C to U editing of apolipoprotein B (apoB) mRNA in the mammalian intestine.

    The N-terminal domain of APOBEC-1-like proteins is the catalytic domain, while the C-terminal domain is a pseudocatalytic domain. More specifically, the catalytic domain is a zinc-dependent deaminase domain and is essential for cytidine deamination. APOBEC-3-like members contain two copies of this domain. This family also includes the functionally homologous activation-induced deaminase, which is essential for the development of antibody diversity in B lymphocytes. RNA editing by APOBEC-1 requires homodimerisation, and this complex interacts with RNA-binding proteins to form the editosome PUBMED:12683974 (and references therein).

    \ 570 IPR004092 \

    The function of the malignant brain tumor (MBT) repeat is unknown, but the repeat is found in a number of nuclear proteins involved in transcriptional repression. The repeat contains a completely\ conserved glutamate at its amino terminus that may be important for function.

    The crystal structure of the two MBT repeats of human SCM-like 2 protein has been reported. Each repeat consists of an extended "arm" and a globular core. The arm of the first repeat packs against the core of the second repeat and vice versa. The structure of the core-interacting part of each arm consists of an N-terminal alpha-helix and a turn of 3(10) helix connected by a short beta-strand. The core consists of an Src homology 3-like five-stranded beta-barrel followed by a C-terminal alpha-helix and another short beta-strand. Each arm interacts with its partner core in a similar way, with the orientation of the N-terminal helix relative to the barrel varying slightly. There are also extensive interactions between the two barrels PUBMED:12952983.

    \ 2443 IPR005142 \

    This domain is found in the release factor eRF1, which terminates protein biosynthesis by recognizing stop codons at the A site of the ribosome and stimulating\ peptidyl-tRNA bond hydrolysis at the peptidyl transferase center. The crystal structure of human eRF1 is known PUBMED:10676813. The overall\ shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop,\ aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip\ of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl\ transferase center. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site PUBMED:10676813.

    \ \

    This domain is also found in other proteins which may also be involved in translation termination but this awaits experimental verification.

    \ 6150 IPR010455 \

    This family consists of several phage antitermination protein Q and related bacterial sequences. Phage 82 gene Q encodes a phage-specific positive regulator of late gene expression, thought, by analogy to the corresponding gene of phage lambda, to be a transcription antiterminator PUBMED:3624233.

    \ 2529 IPR007540 \ Fimbriae, also known as pili, form filaments radiating from the surface of the bacterium to a length of 0.5-1.5 micrometres. They enable the cell to colonise host epithelia. This family constitutes the major subunits of CS1-like pili, including CS2 and CFA1 from Escherichia coli, and also the Cable type II pilin major subunit from Burkholderia cepacia PUBMED:10094617. The major subunit of CS1 pili is called CooA. Periplasmic CooA is mostly complexed with the assembly protein CooB. In addition, a small pool of CooA multimers and CooA-CooD complexes exists, but its functional significance is unknown PUBMED:10094617. A member of this family has also been identified in Salmonella typhi and Salmonella enterica PUBMED:10417651.\ 5466 IPR008514 \ This family consists of several bacterial proteins of unknown function.\ 731 IPR007117 \

    Expansins are unusual proteins that mediate cell wall extension in plants PUBMED:7568110. They are believed to act as a sort of chemical grease, allowing polymers to slide past one another by disrupting non-covalent hydrogen bonds that hold many wall polymers to one another. This process is not\ degradative and hence does not weaken the wall, which could otherwise rupture under internal pressure during growth.

    \

    Sequence comparisons indicate at least four distinct expansin cDNAs in rice and at least six in Arabidopsis. The proteins are highly conserved in\ size and sequence (75-95% amino acid sequence similarity between any pairwise comparison), and phylogenetic trees indicate that this multigene\ family formed before the evolutionary divergence of monocotyledons and dicotyledons PUBMED:7568110. Sequence and motif analyses show no similarities to known functional domains that might account for expansin action on wall extension. It is thought that several highly conserved tryptophans may function in expansin binding to cellulose or other glycans. The high conservation of the family indicates that the mechanism by which expansins promote wall extension tolerates little variation in protein structure.

    \

    Grass pollens, such as pollen from timothy grass, represent a major cause of type I allergy PUBMED:7930302. Interestingly, expansins share a high degree of\ sequence similarity with the Lol p I family of allergens. This entry represents the C-terminal domain.

    \ 4402 IPR012280 \ This entry contains N-acetyl-glutamate semialdehyde dehydrogenase (ArgC), which is involved in arginine biosynthesis, and aspartate-semialdehyde dehydrogenase PUBMED:10369777, an enzyme involved in the biosynthesis of various amino acids from aspartate. It also contains the yeast and fungal Arg5,6 protein, which is cleaved into the enzymes N-acetyl-gamma-glutamyl-phosphate reductase and acetylglutamate kinase. These are also involved in arginine biosynthesis. All proteins in this entry contain a dimerisation domain of semialdehyde dehydrogenase.\ 2013 IPR005602 \

    This is a family of proteins with no characterised function, found on Staphylococcus aureus plasmids.

    \ 6257 IPR010930 \

    This entry consists of a number of C-terminal domains of unknown function. This domain seems to be specific to flagellar basal-body rod and flagellar hook proteins, in which it is often present at the extreme C terminus.

    \ 658 IPR000326 \ This family of enzymes includes phosphatidylglycerophosphatase B from Escherichia coli and other bacteria, type 2 phosphatidic acid\ phosphatase (PAP2) as well as other phosphoesterases.\

    Other proteins that contain this domain include a bacitracin transport permease from Bacillus licheniformis and a glucose-6-phosphatase from rat.

    \ 2640 IPR001282 \

    Glucose-6-phosphate dehydrogenase (G6PDH) is a ubiquitous protein, present\ in bacteria and all eukaryotic cell types PUBMED:2838391. The enzyme catalyses\ the first step in the pentose phosphate pathway, i.e. the conversion of glucose-6-phosphate to \ gluconolactone 6-phosphate in the presence of NADP, producing NADPH. The ubiquitous \ expression of the enzyme gives it a major role in the production of NADPH for the many \ NADPH-mediated reductive processes in all cells PUBMED:3393536. Deficiency of G6PDH is \ a common genetic abnormality affecting millions of people worldwide. Many sequence variants, most caused by single point mutations, are known, exhibiting a wide variety of \ phenotypes PUBMED:3393536.
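
    Written as a balanced equation, the reaction catalysed by G6PDH is given below. This is the standard textbook form, offered as a sketch rather than quoted from the cited references; the lactone product is the "gluconolactone 6-phosphate" named in the text.

```latex
% First (oxidative) step of the pentose phosphate pathway
\[
\text{D-glucose 6-phosphate} + \text{NADP}^{+}
  \longrightarrow \text{6-phospho-D-glucono-1,5-lactone} + \text{NADPH} + \text{H}^{+}
\]
```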

    \ 2596 IPR004233 \

    FokI () is a member of an unusual class of bipartite restriction enzymes that recognize a specific DNA sequence and cleave DNA nonspecifically a short distance away from it. It is a type IIS restriction endonuclease PUBMED:9724744. FokI contains amino- and carboxy-terminal domains corresponding to the DNA-recognition () and cleavage functions, respectively.

    \

    The catalytic domain contains only a single catalytic centre, raising the question of how monomeric FokI manages to cleave both DNA strands. The catalytic domain is sequestered in a 'piggyback' fashion by the recognition domain PUBMED:9214510.

    \ 2168 IPR007465 \ This is a family of uncharacterised proteins from Caenorhabditis elegans.\ 644 IPR007702 \ This family comprises the Ocnus, Janus-A and Janus-B proteins. These proteins have been found to be testis-specific in Drosophila melanogaster PUBMED:11319264.\ 2139 IPR007429 \ This family contains uncharacterised proteins encoded on trypanosomal kinetoplast minicircles.\ 6664 IPR010665 \

    This family consists of a number of hypothetical putative membrane proteins which seem to be specific to Yersinia pestis. The function of this family is unknown.

    \ 5825 IPR010301 \

    Nop52 is believed to be involved in the generation of 28S rRNA PUBMED:10341208.

    \ 310 IPR006868 \ This region is sometimes found at the N terminus of putative plant bZIP proteins. The function of this conserved region is not known.\ 6621 IPR008321 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 5175 IPR008012 \

    UMP1 is a short-lived chaperone present in the precursor form of the 20S proteasome and absent from the mature complex. UMP1 is required for the correct assembly and enzymatic activation of the proteasome, and it seems to be degraded by the proteasome once assembly is complete.

    \ 4138 IPR005010 \

    This is a family of phosphoproteins of unknown function expressed by rhadinoviruses.

    \ \ 3269 IPR004315 \ The accessory gland of male insects is a genital tissue that secretes many components of the ejaculatory fluid, some of which affect the female's receptivity to courtship and her rate of oviposition. This protein is expressed exclusively in the male accessory glands of adult Drosophila melanogaster. During copulation it is transferred to the female genital tract, where it is rapidly altered PUBMED:3142802.\ 7425 IPR011454 \

    This is a small family of short hypothetical proteins in Rhodopirellula baltica.

    \ 6415 IPR010564 \

    This family consists of several hypothetical proteins specific to Chlamydia species. The function of this family is unknown.

    \ 7283 IPR010006 \

    This family contains a number of phage polarity suppression proteins (Psu) (approximately 190 residues long). The Psu protein of bacteriophage P4 causes suppression of transcriptional polarity in Escherichia coli by overcoming Rho termination factor activity PUBMED:9007066.

    \ 1913 IPR003798 \ Uncharacterized domain in proteins of unknown function.\ 346 IPR005135 \

    This domain is found in a large number of proteins, including magnesium-dependent endonucleases and phosphatases involved in intracellular signalling PUBMED:10838565. Proteins containing this domain include AP endonuclease proteins (), DNase I proteins (), synaptojanin, an inositol-1,4,5-trisphosphate phosphatase (), and sphingomyelinase ().

    \ 2036 IPR007160 \ This domain is found in some iron-sulphur proteins.\ 5200 IPR008035 \

    Iron(II)/2-oxoglutarate (2-OG)-dependent oxygenases catalyse oxidative reactions in a range of metabolic processes. Proline 3-hydroxylase hydroxylates proline at position 3, and its structure was the first determined for a 2-OG oxygenase that catalyses oxidation of a free alpha-amino acid. The structure contains conserved motifs present in other 2-OG oxygenases, including a jelly-roll strand core and residues binding iron and 2-oxoglutarate, consistent with divergent evolution within the extended family. The structure differs significantly from many other 2-OG oxygenases in possessing a discrete C-terminal helical domain.

    \ 2061 IPR007263 \

    The DCC family, named after the conserved N-terminal DxxCxxC motif, encompasses . Proteins in this family have a thioredoxin fold and most likely function as thiol-disulphide oxidoreductases PUBMED:15236740.
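
    The DxxCxxC motif that gives this family its name can be located in a sequence with a simple pattern scan. The following is a minimal Python sketch. It assumes protein sequences are plain one-letter-code strings. The name find_dcc_motifs is hypothetical, and a motif hit alone does not establish membership in this family.

```python
import re

# D, any two residues, C, any two residues, C (the DxxCxxC motif).
# The zero-width lookahead allows overlapping occurrences to be reported.
DCC_MOTIF = re.compile(r"(?=(D..C..C))")


def find_dcc_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every DxxCxxC occurrence."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in DCC_MOTIF.finditer(seq)]


if __name__ == "__main__":
    print(find_dcc_motifs("MADKLCAGCW"))  # [(3, 'DKLCAGC')]
```

    Because the text describes the motif as N-terminal, a natural refinement would be to report only hits near the start of the sequence. That cut-off would be an additional assumption.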

    \ 2754 IPR000334 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped into 'clans'.

    \

    Glycoside hydrolase family 45 comprises enzymes with only one known activity; endoglucanase ().

    \ \

    The microbial degradation of cellulose and xylans requires several types of enzymes, such as endoglucanases, cellobiohydrolases () (exoglucanases), or xylanases () PUBMED:2252383, PUBMED:1886523. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as cellulase family K or glycosyl hydrolase family 45 PUBMED:8352747. The best-conserved region in these enzymes is located in the N-terminal section. It contains an aspartic acid residue which has been shown PUBMED:8377830 to act as a nucleophile in the catalytic mechanism. This region also contains several cysteines that are involved in forming disulphide bridges.

    \ 4668 IPR003571 \

    Snake toxins belong to a family of proteins PUBMED:6433031, PUBMED:, PUBMED: which groups short and long neurotoxins, cytotoxins and short toxins, as well as other miscellaneous venom peptides. Most of these toxins act by binding to the nicotinic acetylcholine receptors in the postsynaptic membrane of skeletal muscles and preventing the binding of acetylcholine, thereby blocking the excitation of muscles.

    \

    Snake toxins are proteins of 60 to 75 amino acids. Among the invariant residues are eight cysteines, all involved in disulphide bonds. The structure is small, disulphide-rich and almost entirely beta sheet.

    \ 7772 IPR012897 \

    This entry features the tandem inactivation domain found at the N-terminus of the Kv1.4 potassium channel. It is composed of two subdomains. Inactivation domain 1 (ID1, residues 1-38) consists of a flexible N-terminus anchored at a 5-turn helix, and is thought to work by occluding the ion pathway, as is the case with a classical ball domain. Inactivation domain 2 (ID2, residues 40-50) is a 2.5 turn helix with a high proportion of hydrophobic residues that probably serves to attach ID1 to the cytoplasmic face of the channel. In this way, it can promote rapid access of ID1 to the receptor site in the open channel. ID1 and ID2 function together to bring about fast inactivation of the Kv1.4 channel, which is important for the role of the channel in short-term plasticity PUBMED:12590144.

    \ 3819 IPR006485 \

    Phage proteins for bacterial lysis typically include a membrane-disrupting protein, or holin, and one or more cell wall degrading enzymes that reach the cell wall because of holin action. Holins are found in a large number of mutually non-homologous families.

    \ 437 IPR001830 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 20 comprises enzymes with only one known activity; alpha, alpha-trehalose-phosphate synthase [UDP-forming] ().

    \ \ \

    Synthesis of trehalose in the yeast Saccharomyces cerevisiae is catalysed by the trehalose-6-phosphate (Tre6P) synthase/phosphatase complex, which is composed of at least three different subunits encoded by the genes TPS1, TPS2, and TSL1. Tps1 and Tps2 carry the catalytic activities of trehalose synthesis, namely Tre6P synthase (Tps1) and Tre6P phosphatase (Tps2), while Tsl1 has regulatory functions. There is some evidence that Tsl1 and Tps3 may share a common function with respect to regulation and/or structural stabilization of the Tre6P synthase/phosphatase complex in exponentially growing, heat-shocked cells PUBMED:9194697.

    \

    OtsA (trehalose-6-phosphate synthase) from Escherichia coli has homology to the full-length TPS1, the N-terminal part of TPS2 and an internal region of TPS3 (TSL1) of yeast PUBMED:8045430.

    \ 831 IPR003034 \ The SAP (after SAF-A/B, Acinus and PIAS) motif is a putative DNA-binding domain found in diverse nuclear proteins involved in chromosomal organization PUBMED:10694879.\ 2114 IPR007400 \ FldA () is thought to be involved in the degradation of the polyaromatic hydrocarbon fluorene by Sphingomonas LB126 PUBMED:11766961.\ 2362 IPR003821 \ 1-deoxy-D-xylulose 5-phosphate reductoisomerase synthesizes 2-C-methyl-D-erythritol 4-phosphate from 1-deoxy-D-xylulose 5-phosphate in a single step by intramolecular rearrangement and reduction, and is responsible for terpenoid biosynthesis in some organisms PUBMED:9707569. In Arabidopsis thaliana, 1-deoxy-D-xylulose 5-phosphate reductoisomerase is the first committed enzyme of the non-mevalonate pathway for isoprenoid biosynthesis.\ 6247 IPR010490 \

    COG6 is a component of the conserved oligomeric Golgi complex, which is composed of eight different subunits and is required for normal Golgi morphology and localisation.

    \ 788 IPR000702 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    L6 is a protein from the large (50S) subunit. In Escherichia coli, it is located in the aminoacyl-tRNA binding site of the peptidyltransferase centre and is known to bind directly to 23S rRNA. It belongs to a family of ribosomal proteins that includes L6 from bacteria, cyanelles (structures that perform similar functions to chloroplasts but have structural and biochemical characteristics of cyanobacteria) and mitochondria, and L9 from mammals, Drosophila, plants and yeast. L6 comprises two almost identical folds, suggesting that it was derived by duplication of an ancient RNA-binding protein gene. Analysis reveals several sites on the protein surface where interactions with other ribosome components may occur, with the N-terminus involved in protein-protein interactions and the C-terminus containing possible RNA-binding sites PUBMED:8262035.

    \ 2356 IPR002795 \ A highly diverged class of S-adenosylmethionine synthetases has been identified in the archaea. S-adenosylmethionine is the primary alkylating agent in all known organisms. ATP:L-methionine S-adenosyltransferase (MAT) catalyzes the only known biosynthetic route to this central metabolite. Although the amino acid sequence of MAT is strongly conserved among bacteria and eukarya (see ), no homologs had been recognized in the completed genome sequences of any archaea. The identification of a second major class of MAT emphasizes the long evolutionary history of the archaeal lineage and the structural diversity found even in crucial metabolic enzymes PUBMED:10660563. Three bacterial genomes encode both the archaeal and the eukaryotic/bacterial types of MAT PUBMED:10660563.\ 4266 IPR007040 \ This protein associates with 70S ribosomes and converts them to a dimeric form (100S ribosomes), which appears during the transition from the exponential growth phase to the stationary phase of Escherichia coli cells.\ 877 IPR000897 \

    The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a 7S RNA and six protein subunits. One of these subunits, the 54 kDa protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 has a two-domain structure: the G-domain, which binds GTP, and the M-domain (see ), which binds the 7S RNA and also binds the signal sequence. The N-terminal 300 residues of SRP54 include the GTP-binding site (G-domain) and are evolutionarily related to similar domains in other proteins PUBMED:7518075.

    \

    These proteins include the Escherichia coli and Bacillus subtilis Ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; the signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which, in conjunction with SRP, ensures the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; the bacterial FtsY protein, which is believed to play a role similar to that of the docking protein in eukaryotes; the PilA protein from Neisseria gonorrhoeae, the homolog of FtsY; and the bacterial flagellar biosynthesis protein FlhF.

    \ 170 IPR004871 \ This family includes a region that lies towards the C-terminus of the cleavage and polyadenylation specificity factor (CPSF) A (160 kDa) subunit. CPSF is involved in mRNA polyadenylation and binds the conserved AAUAAA sequence in pre-mRNA. CPSF has also been found to be necessary for splicing of single-intron pre-mRNAs PUBMED:11421366. The function of the aligned region is unknown, but it may be involved in RNA/DNA binding. \ \ 3253 IPR005269 \

    This family of conserved hypothetical proteins has no known function.

    \ 8074 IPR013217 \

    Members of this family are SAM dependent methyltransferases.

    \ 3698 IPR002022 \

    Pectate lyase is an enzyme involved in the maceration and soft rotting of plant tissue. Pectate lyase is responsible for the eliminative cleavage of pectate, yielding oligosaccharides with 4-deoxy-alpha-D-mann-4-enuronosyl groups at their non-reducing ends. The protein is maximally expressed late in pollen development. It has been suggested that the expression of pectate lyase genes in pollen might relate to a requirement for pectin degradation during pollen tube growth PUBMED:1983191.

    \ \

    The structure and folding kinetics of one member of this family, pectate lyase C (pelC) from Erwinia chrysanthemi, have been investigated in some detail PUBMED:11926834, PUBMED:8502994. PelC contains a parallel beta-helix folding motif. The majority of the regular secondary structure is composed of parallel beta-sheets (about 30%). The individual strands of the sheets are connected by unordered loops of varying length. The backbone is then formed by a large helix composed of beta-sheets. There are two disulphide bonds in pelC and 12 proline residues. One of these prolines, Pro220, is involved in a cis peptide bond. The folding mechanism of pelC involves two slow phases that have been attributed to proline isomerization.

    \ \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. In this system, a designation is composed of the first three letters of the genus, a space, the first letter of the species name, a space, and an arabic number. If two species names would have identical designations, they are distinguished from one another by adding one or more letters (as necessary) to each species designation.
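
    The designation rule described above is mechanical enough to express as a short function. This is a hypothetical Python helper, not an official WHO/IUIS tool. The name allergen_designation is illustrative. The species_letters parameter is also an assumption: it models the rule that extra species letters are added when two species would otherwise share a designation.

```python
def allergen_designation(genus: str, species: str, number: int,
                         species_letters: int = 1) -> str:
    """Return a designation such as 'Amb a 1' for a binomial species name."""
    if number < 1:
        raise ValueError("the allergen number must be a positive integer")
    if species_letters < 1:
        raise ValueError("at least one species letter is required")
    # First three letters of the genus, with an initial capital.
    genus_part = genus[:3].capitalize()
    # First letter(s) of the species name; more than one letter is only
    # used (hypothetically) to separate species whose designations clash.
    species_part = species[:species_letters].lower()
    # Genus part, species part and arabic number are separated by spaces.
    return f"{genus_part} {species_part} {number}"


if __name__ == "__main__":
    print(allergen_designation("Ambrosia", "artemisiifolia", 1))  # Amb a 1
    print(allergen_designation("Cryptomeria", "japonica", 1))     # Cry j 1
```

    Applied to short ragweed (Ambrosia artemisiifolia) and Japanese cedar (Cryptomeria japonica), the sketch reproduces the designations Amb a 1 and Cry j 1 used for allergens in this family.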

    \

    The allergens in this family include allergens with the following designations: Amb a 1, Amb a 2, Amb a 3, Cha o 1, Cup a 1, Cry j 1, Jun a 1.

    \ \

    Two of the major allergens in the pollen of short ragweed (Ambrosia artemisiifolia) are Amb aI and Amb aII. The primary structure of Amb aII has been deduced and shown to share ~65% sequence identity with the Amb aI multigene family of allergens PUBMED:1717566. Members of the Amb aI/aII family include tobacco pectate lyase, which is similar to the deduced amino acid sequences of two pollen-specific pectate lyase genes identified in tomato PUBMED:1421152; Cry jI, a major allergenic glycoprotein of Cryptomeria japonica (Japanese cedar), the most common pollen allergen in Japan PUBMED:7920021; and P56 and P59, which share sequence similarity with pectate lyases of plant pathogenic bacteria PUBMED:1983191.

    \ \ \ 7024 IPR010810 \

    The function of this region is not clear, but it is found in many flagellar hook proteins, including FliD homologues PUBMED:11230454. It is normally repeated, but is also seen singly. A conserved Ile-Asn is seen at the centre of the motif. The diversity of these motifs makes it likely that some members of the family are not identified.

    \ 5878 IPR010327 \

    Degradation of glutamate via the hydroxyglutarate pathway involves the syn-elimination of water from 2-hydroxyglutaryl-CoA. This anaerobic process is catalysed by 2-hydroxyglutaryl-CoA dehydratase, an enzyme with two components (A and D) that reversibly associate during reaction cycles. This component contains one non-reducible [4Fe-4S]2+ cluster and a reduced riboflavin 5'-monophosphate PUBMED:11980491.

    \ 6933 IPR009797 \

    This family consists of several highly conserved, hypothetical bacterial and phage proteins of around 200 residues in length. The function of this family is unknown.

    \ 4237 IPR002583 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Bacterial ribosomal protein S20 forms part of the 30S ribosomal subunit, and interacts with 16S rRNA PUBMED:3373529. This family is found in bacteria and eukaryotes.

    \ 2347 IPR002782 \

    This prokaryotic family of proteins has no known function. The proteins contain four conserved cysteines that may be involved in metal binding or disulphide bridges.

    \ 394 IPR000467 \ The D111/G-patch domain PUBMED:10470032 is a short conserved region of about 40 amino acids which occurs in a number of putative RNA-binding proteins, including tumor suppressor and DNA-damage-repair proteins, suggesting that this domain may have an RNA-binding function. This domain has seven highly conserved glycines. A multiple alignment of a small subset of D111/G-patch domains is shown in Fig. 2b of PUBMED:10353602.\ 2306 IPR007748 \ This is a family of uncharacterised viral proteins.\ 2675 IPR003681 \

    The glycophorin-binding protein contains a tandem repeat. The repeated sequence determines the binding domain for an erythrocyte receptor binding protein of Plasmodium falciparum, the malarial parasite PUBMED:7891744. Erythrocyte invasion by the malarial merozoite is a receptor-mediated process, an obligatory step in the development of the parasite. The P. falciparum protein binds to the erythrocyte receptor glycophorin.

    \ 4404 IPR007455 \ Serglycin is the most prevalent proteoglycan produced in haemopoietic cells. Serglycin is a proteinase resistant secretory granule proteoglycan PUBMED:2261494.\ 1888 IPR002862 \

    Proteins that contain this domain are of unknown function. It appears to occur towards the C-terminus of proteins from Mycoplasma pneumoniae PUBMED:8948633.

    \ 7243 IPR010881 \

    This family consists of several Gammaherpesvirus latent membrane protein (LMP2) proteins. Epstein-Barr virus is a human Gammaherpesvirus that infects and establishes latency in B lymphocytes in vivo. The latent membrane protein 2 (LMP2) gene is expressed in latently infected B cells and encodes two protein isoforms, LMP2A and LMP2B, that are identical except for an additional N-terminal 119 aa cytoplasmic domain which is present in the LMP2A isoform. LMP2A is thought to play a key role in either the establishment or the maintenance of latency and/or the reactivation of productive infection from the latent state. The significance of LMP2B and its role in pathogenesis remain unclear PUBMED:11961256.

    \ 1482 IPR005089 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped into 'clans'.

    \ Carbohydrate-binding module family 25 (CAZy GH_25) has a starch-binding function, as demonstrated in one case.\ \ 6094 IPR009371 \

    The species Pseudomonas syringae encompasses plant pathogens with differing host specificities and corresponding pathovar designations. P. syringae requires the Hrp (type III protein secretion) system, encoded by a 25-kb cluster of hrp and hrc genes, in order to elicit the hypersensitive response (HR) in nonhosts or to be pathogenic in hosts. The exact function of HrpF is unknown but the protein is needed for pathogenicity PUBMED:9721291.

    \ 4532 IPR006270 \

    The sequences represented in this group are identified by a domain which consists of the N-terminal half of a family of Streptococcal proteins that contain a signal peptide and then up to five repeats of a region that includes a His-X-X-His-X-His (histidine triad) motif. Additional copies of the repeats are found in more poorly conserved regions. Members of this family from Streptococcus pneumoniae are suggested to cleave human C3, and the member PhpA has been shown in vaccine studies to be a protective antigen in mice PUBMED:11349048.
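
    The His-X-X-His-X-His (histidine triad) motif can be counted with a pattern scan, which gives a rough indication of how many repeats a sequence carries. This is a minimal Python sketch that assumes sequences are one-letter-code strings. The names histidine_triads and count_histidine_triads are hypothetical. The text notes that further copies sit in poorly conserved regions, so an exact-pattern count may miss degenerate repeats.

```python
import re

# His, any two residues, His, any residue, His. The zero-width lookahead
# lets overlapping occurrences be reported separately.
HIS_TRIAD = re.compile(r"(?=(H..H.H))")


def histidine_triads(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each HxxHxH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HIS_TRIAD.finditer(seq.upper())]


def count_histidine_triads(seq: str) -> int:
    """Number of exact HxxHxH motifs found in the sequence."""
    return len(histidine_triads(seq))


if __name__ == "__main__":
    demo = "MKAHKLHAHGSSAHGEHLH"
    print(histidine_triads(demo))  # [(4, 'HKLHAH'), (14, 'HGEHLH')]
```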

    \ 1349 IPR004194 \ Type II restriction endonucleases are characterized by their specificity for recognising and cleaving specific DNA sequences. The sequences of these endonucleases are surprisingly unrelated; however, the structure of restriction endonuclease BamHI, determined at 1.95 Å resolution, shows a resemblance to the structure of endonuclease EcoRI PUBMED:8145855.\ 1212 IPR002575 \ This entry consists of bacterial antibiotic resistance proteins which confer resistance to various aminoglycosides. They include aminoglycoside 3'-phosphotransferase (kanamycin kinase, neomycin-kanamycin phosphotransferase) and streptomycin 3''-kinase (streptomycin 3''-phosphotransferase). The aminoglycoside phosphotransferases inactivate aminoglycoside antibiotics via phosphorylation PUBMED:2167474.\ 1813 IPR000201 \ This domain is at the N-terminus of hepadnavirus P proteins and covers the so-called terminal protein and the spacer region of the protein. This domain is always associated with and .\ 1983 IPR005105 \

    This domain is found associated with an N-terminal cyclic nucleotide-binding domain () and two CBS domains (). This domain, which normally represents the C-terminal region, is uncharacterised; however, it seems to be similar to the nucleotidyltransferase domain (), conserving the DXD motif, which strongly suggests that proteins containing this domain are also nucleotidyltransferases.

    \ 7551 IPR011716 \

    This entry includes tetratricopeptide-like repeats found in the LcrH/SycD-like chaperones PUBMED:12799000.

    \ 5088 IPR007925 \

    The TraM protein is an essential part of the DNA transfer machinery of the conjugative resistance plasmid R1 (IncFII). Mutational analyses showed that the essential transfer protein TraM has at least two functions. First, a functional TraM protein was found to be required for normal levels of transfer gene expression. Second, experimental evidence was obtained that TraM stimulates efficient site-specific single-stranded DNA cleavage at the oriT in vivo. Furthermore, a specific interaction of the cytoplasmic TraM protein with the membrane protein TraD was demonstrated, suggesting that the TraM protein creates a physical link between the relaxosomal nucleoprotein complex and the membrane-bound DNA transfer apparatus PUBMED:11258958.

    \ 7586 IPR011669 \ This entry contains a number of hypothetical bacterial and archaeal proteins. The region is approximately 350 residues long. A member of this family () is thought to associate with another subunit to form an H+-transporting ATPase, but no evidence has been found to support this.\ 971 IPR007319 \ Utp21 is a subunit of U3 snoRNP, which is essential for synthesis of 18S rRNA.\ 7919 IPR012623 \

    This family consists of members of the conotoxin O-superfamily. The O-superfamily of conotoxins consists of three groups of Conus peptides that belong to the same structural group. These groups differ in their pharmacological properties: the omega-conotoxins inhibit calcium channels, the delta-conotoxins slow down the inactivation rate of voltage-sensitive sodium channels, and the muO-conotoxins block voltage-sensitive sodium currents PUBMED:7622492.

    \ 3244 IPR006755 \ This is a family of uncharacterised plant pathogen luteovirus proteins.\ 6481 IPR009545 \

    This family consists of several Caenorhabditis elegans specific proteins of unknown function.

    \ 4246 IPR006846 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    This entry is for the ribosomal protein S30.

    \ 7037 IPR009858 \

    This family consists of several hypothetical bacterial proteins of around 180 residues in length. The function of this family is unknown.

    \ 7598 IPR011687 \ This entry contains sequences that bear similarity to the glioma tumour suppressor candidate region gene 2 protein (p60) PUBMED:10708517. This protein has been found to interact with herpes simplex virus type 1 regulatory proteins, but its exact role in the life cycle of the virus is not known PUBMED:10196275.\ 6526 IPR009579 \

    This family consists of several short, hypothetical, bacterial proteins of around 60 residues in length. The function of this family is unknown.

    \ 3675 IPR005543 \

    The PASTA domain is found at the C-termini of several penicillin-binding proteins (PBP) and bacterial serine/threonine kinases. It binds the beta-lactam stem, which implicates it in sensing D-alanyl-D-alanine, the PBP transpeptidase substrate. In PknB of Mycobacterium tuberculosis, the entire extracellular portion is predicted to be made up of four PASTA domains, which strongly suggests that it is a signal-binding sensor domain. The domain has also been found in proteins involved in cell wall biosynthesis, where it is implicated in localizing the biosynthesis complex to unlinked peptidoglycan.

    PASTA is a small globular fold consisting of 3 beta-sheets and an alpha-helix, with a loop region of variable length between the first and second beta-strands. The name PASTA is derived from PBP and Serine/Threonine kinase Associated domain PUBMED:12217513.

    \ 7269 IPR010001 \

    This family contains the sigmaK-factor processing regulatory protein BofA (Bypass-of-forespore protein A) (approximately 80 residues long). During sporulation in Bacillus subtilis, transcription is controlled in the developing sporangium by a cascade of sporulation-specific transcription factors (sigma factors). Following engulfment, processing of sigmaK is inhibited by BofA. It has been suggested that this effect is exerted by alteration of the level of the SpoIVFA protein PUBMED:10464210.

    \ 5050 IPR007887 \

    The multiple antibiotic resistance of methicillin-resistant strains of Staphylococcus aureus (MRSA) has become a major clinical problem worldwide. Methicillin resistance in MRSA strains is due to the acquisition, by horizontal transfer from an unidentified species, of the mecA gene, which encodes penicillin-binding protein 2a (PBP2a).

    The structure of the N-terminal domain from MecA is known PUBMED:12389036 and is similar to that of NTF2. The length of the PBP2a N-terminal domain (which positions the transpeptidase active site more than 100 Å from the expected C terminus of the transmembrane anchor) suggests a possible structural role and potentially gives the transpeptidase domain substantial reach from the cell membrane. This domain seems unlikely to have an enzymatic function.

    \ 4053 IPR003352 \ The bacterial phosphoenolpyruvate:sugar phosphotransferase system (PTS) is a multi-protein system involved in the regulation of a variety of metabolic and transcriptional processes. The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred to enzyme I (EI) of the PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, which consists of at least three structurally distinct domains (IIA, IIB, and IIC) PUBMED:1537788 that can either be fused together in a single polypeptide chain or exist as two or three interacting chains, formerly called enzymes II (EII) and III (EIII). The IIC domain catalyzes the transfer of a phosphoryl group from IIB to the sugar substrate.\ 8039 IPR013185 \

    This entry represents the N-terminal domain of homologues of elongation factor P, which probably are translation initiation factors.

    \ 6966 IPR009815 \

    This family consists of several hypothetical Nucleopolyhedrovirus proteins of around 350 residues in length. The function of this family is unknown.

    \ 6924 IPR009792 \

    This family consists of several hypothetical eukaryotic proteins of around 125 residues in length. The function of this family is unknown.

    \ 4408 IPR007333 \ This family consists of bacterial transmembrane proteins with a putative sugar-specific permease function, analogous to the IIC component of the PTS. It has been suggested that this permease may form part of an L-ascorbate utilization pathway, with proposed specificity for 3-keto-L-gulonate (formed by hydrolysis of L-ascorbate) PUBMED:11741871.\ 2703 IPR002218 \ Bacterial glucose-inhibited division protein A (gene gidA) is a highly conserved protein of 70 kDa whose function is not yet known. It is evolutionarily related to the yeast hypothetical protein YGL236C, the Caenorhabditis elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein called gid (which is distinct from B. subtilis gidA).\ 368 IPR000413 \

    Integrins are the major metazoan receptors for cell adhesion to extracellular matrix proteins and, in vertebrates, also play important roles in certain cell-cell adhesions, make transmembrane connections to the cytoskeleton and activate many intracellular signaling pathways PUBMED:12297042. Integrins are alpha-beta heterodimers; each subunit crosses the membrane once, so that most of each polypeptide lies in the extracellular space and the heterodimer has two short cytoplasmic domains. Most integrins recognise relatively short peptide motifs and, in general, require an acidic amino acid to be present. Ligand specificity depends on both the alpha and beta subunits. Many integrins are expressed on cell surfaces in an inactive state in which they do not bind ligands and do not signal. Integrins frequently intercommunicate, and the engagement of one may lead to the activation or inhibition of another.

    The structure of unliganded alphaV beta3 showed the molecule to be folded, with the head bent over towards the C termini of the legs, which would normally be inserted into the membrane. The head comprises a beta-propeller domain at the N terminus of the alphaV subunit and an I/A domain inserted into a loop on the top of the hybrid domain in the beta subunit. The I/A domain consists of a Rossmann fold with a core of parallel beta-sheet surrounded by amphipathic alpha helices.

    Some alpha subunits are cleaved post-translationally to produce a heavy and a light chain linked by a disulphide bond PUBMED:3028640, PUBMED:2199285. Integrin alpha chains share a conserved sequence which is found at the beginning of the cytoplasmic domain, just after the end of the transmembrane region. Within the N-terminal domain of alpha subunits, seven sequence repeats, each of approximately 60 amino acids, have been found PUBMED:3327687. It has been predicted that these repeats assume the beta-propeller fold. The domains contain seven four-stranded beta-sheets arranged in a torus around a pseudosymmetry axis PUBMED:8990162. Integrin ligands and a putative Mg2+ ion are predicted to bind to the upper face of the propeller, in a manner analogous to the way in which the trimeric G-protein beta subunit (G beta) (which also has a beta-propeller fold) binds the G protein alpha subunit PUBMED:8990162.

    Integrin cytoplasmic domains are normally less than 50 amino acids in length, with the beta-subunit sequences exhibiting greater homology to each other than the alpha-subunit sequences PUBMED:12826403. This is consistent with current evidence that the beta subunit is the principal site for binding of cytoskeletal and signalling molecules, whereas the alpha subunit has a regulatory role. The first ten residues of the alpha-subunit cytoplasmic domain appear to form an alpha helix that is terminated by a proline residue. The remainder of the domain is highly acidic in nature and loops back to contact the membrane-proximal lysine anchor residue.

    \ 7050 IPR009865 \

    This family consists of several mammalian-specific proacrosin-binding protein sp32 sequences. sp32 is a sperm-specific protein which is known to bind the 55- and 53-kDa proacrosins and the 49-kDa acrosin intermediate. The exact function of sp32 is unclear; it is thought, however, that the binding of sp32 to proacrosin may be involved in packaging the acrosin zymogen into the acrosomal matrix PUBMED:8144514.

    \ 4475 IPR002017 \ Spectrin repeats PUBMED:8266097 are found in several proteins involved in cytoskeletal structure. These include spectrin, alpha-actinin and dystrophin. The spectrin repeat forms a three-helix bundle. The second helix is interrupted by proline in some sequences. The repeats are defined by a characteristic tryptophan (W) residue at position 17 in helix A and a leucine (L) 2 residues from the carboxyl end of helix C.\ 6617 IPR009626 \

    This is a group of proteins of unknown function.

    \ 5005 IPR006182 \ This bacterial family includes proteins that are related to the YscJ lipoprotein, and the amino terminus of FliF, the flagellar M-ring protein. The members of the YscJ family are thought to be involved in secretion of several proteins. The FliF protein ring is thought to be part of the export apparatus for flagellar proteins, based on the similarity to YscJ proteins PUBMED:10049798.\ 232 IPR003732 \

    This homodimeric enzyme appears able to cleave any D-amino acid (and glycine, which does not have distinct D/L forms) from charged tRNA. The name reflects characterization with respect to D-Tyr on tRNA(Tyr) as established in the literature, but substrate specificity seems much broader.

    \ 7230 IPR009980 \

    This family consists of several Human herpesvirus U26 proteins of around 300 residues in length. The function of this family is unknown.

    \ 8066 IPR013255 \

    Spc25 is a conserved eukaryotic kinetochore protein involved in cell division. In fungi the Spc25 protein is a subunit of the Nuf2-Ndc80 complex PUBMED:15728720, and in vertebrates it forms part of the Ndc80 complex PUBMED:14738735. It is required for chromosome segregation.

    \ 6792 IPR010720 \

    This entry represents the C terminus (approximately 200 residues) of bacterial and eukaryotic alpha-L-arabinofuranosidase. The enzyme catalyses the hydrolysis of non-reducing terminal alpha-L-arabinofuranosidic linkages in L-arabinose-containing polysaccharides PUBMED:7887599.

    \ 2092 IPR007362 \ This is a family of uncharacterised proteins.\ 2759 IPR001968 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    Glycoside hydrolase family 56 comprises enzymes with only one known activity: hyaluronidase.

    The venom of honeybees contains several biologically active peptides and two enzymes, one of which is a hyaluronidase PUBMED:7682712. Bee venom hyaluronidase is 349 amino acids long and includes four cysteines and a number of potential glycosylation sites PUBMED:7682712. The sequence shows a high degree of similarity to PH-20, a membrane protein of mammalian sperm involved in sperm-egg adhesion, supporting the view that hyaluronidases play a role in fertilisation PUBMED:7682712.

    PH-20 is required for sperm adhesion to the egg zona pellucida; it is located on both the sperm plasma membrane and the acrosomal membrane PUBMED:2269661. The mature protein is 468 amino acids long and includes six potential N-linked glycosylation sites and twelve cysteines, eight of which are tightly clustered near the C-terminus PUBMED:2269661.

    \ 207 IPR003351 \

    Wnt proteins constitute a large family of secreted signalling molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Wg) (Drosophila) PUBMED:9891778. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals. Wnt signalling has been implicated in tumorigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease PUBMED:10967351.

    Wnt signal transduction begins with the binding of Wnt proteins to their cell surface receptors, the frizzled proteins. This activates the signalling functions of beta-catenin and regulates the expression of specific genes important in development PUBMED:10733430. More recently, however, several non-canonical Wnt signalling pathways have been elucidated that act independently of beta-catenin. In both cases, the transduction mechanism requires dishevelled protein (Dsh), a cytoplasmic phosphoprotein that acts directly downstream of frizzled PUBMED:12072470. In addition to its role in Wnt signalling, Dsh is also involved in generating planar polarity in Drosophila and has been implicated in the Notch signal transduction cascade. Three human and mouse homologues of Dsh have been cloned (DVL-1 to 3); it is believed that these proteins, like their Drosophila counterpart, are involved in signal transduction. Human and murine orthologues share more than 95% sequence identity and are each 40-50% identical to Drosophila Dsh.

    Sequence similarity amongst Dsh proteins is concentrated around three conserved domains: at the N-terminus lies a DIX domain (mutations mapping to this region reduce or completely disrupt Wg signalling); a PDZ (or DHR) domain, often found in proteins involved in protein-protein interactions, lies within the central portion of the protein (point mutations within this module have been shown to have little effect on Wg-mediated signal transduction); and a DEP domain is located towards the C-terminus and is conserved among a set of proteins that regulate various GTPases (whilst genetic and molecular assays have shown this module to be dispensable for Wg signalling, it is thought to be important in planar polarity signalling in flies PUBMED:12072470).

    This domain is specific to the signaling protein dishevelled. In Drosophila melanogaster, the dishevelled segment polarity protein is required to establish coherent arrays of polarized cells and segments in embryos. It plays a role in wingless signaling, possibly through the reception of the wingless signal by target cells and subsequent redistribution of arm protein in response to that signal in embryos. The domain is found adjacent to the PDZ domain, often in conjunction with the DEP and DIX domains.\ 292 IPR007590 \ This is a family of eukaryotic proteins with undetermined function.\ 1437 IPR003644 \

    The calx-beta motif is present as a tandem repeat in the cytoplasmic domains of Calx Na-Ca exchangers, which are used to expel calcium from cells. This motif overlaps domains used for calcium binding and regulation. The calx-beta motif is also present in the cytoplasmic tail of mammalian integrin-beta4, which mediates the bi-directional transfer of signals across the plasma membrane, as well as in some cyanobacterial proteins. This motif contains a series of beta-strands and turns that form a self-contained beta-sheet PUBMED:9294196, PUBMED:10390612.

    \ 548 IPR004183 \

    Dioxygenases catalyse the incorporation of both atoms of molecular oxygen into substrates using a variety of reaction mechanisms. Cleavage of aromatic rings is one of the most important functions of dioxygenases, which play key roles in the degradation of aromatic compounds. The substrates of ring-cleavage dioxygenases can be classified into two groups according to the mode of scission of the aromatic ring. Intradiol enzymes use a non-haem Fe(III) to cleave the aromatic ring between two hydroxyl groups (ortho-cleavage), whereas extradiol enzymes use a non-haem Fe(II) to cleave the aromatic ring between a hydroxylated carbon and an adjacent non-hydroxylated carbon (meta-cleavage) PUBMED:10730195, PUBMED:15264822. These two subfamilies differ in sequence, structural fold, iron ligands, and the orientation of second sphere active site amino acid residues. Extradiol dioxygenases are usually homo-multimeric, bind one atom of ferrous ion per subunit and have a subunit size of about 33 kDa. Extradiol dioxygenases can be divided into three classes. Class I and II enzymes show sequence similarity, with the two-domain class II enzymes having evolved from a class I enzyme through gene duplication. Class III enzymes are different in sequence and structure, but they do share several common active-site characteristics with the class II enzymes; in particular, the coordination sphere and the disposition of the putative catalytic base are very similar. Class III enzymes usually have two subunits, designated A and B. This entry represents the extradiol dioxygenase class III enzymes, subunit B.
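
    The ortho/meta distinction above depends only on whether both carbons of the cleaved ring bond carry hydroxyl groups. The sketch below encodes just that rule; the function name `classify_ring_cleavage` and the representation of a ring as a list of per-carbon hydroxylation flags (0-based indices, not IUPAC numbering) are hypothetical, and the code says nothing about which extradiol class (I, II or III) is involved.

```python
from typing import Sequence


def classify_ring_cleavage(hydroxylated: Sequence[bool], bond: int) -> str:
    """Classify cleavage of the ring bond between carbon `bond` and the next carbon.

    `hydroxylated[i]` is True if ring carbon i carries an -OH group. The ring is
    cyclic, so the last carbon is bonded back to carbon 0.
    """
    n = len(hydroxylated)
    left, right = hydroxylated[bond % n], hydroxylated[(bond + 1) % n]
    if left and right:
        # Cleavage between the two hydroxylated carbons: ortho (intradiol), Fe(III).
        return "intradiol (ortho), non-haem Fe(III)"
    if left or right:
        # Cleavage next to exactly one hydroxylated carbon: meta (extradiol), Fe(II).
        return "extradiol (meta), non-haem Fe(II)"
    return "neither: no hydroxylated carbon at this bond"
```

    For a catechol-like ring, `[True, True, False, False, False, False]`, bond 0 (between the two hydroxylated carbons) is classified as intradiol, while bonds 1 and 5 (each adjacent to one hydroxylated carbon) are classified as extradiol.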

    Enzymes that belong to the extradiol class III family include protocatechuate 4,5-dioxygenase (4,5-PCD; LigAB) PUBMED:10467151, of which LigB is represented by this entry; and 2'-aminobiphenyl-2,3-diol 1,2-dioxygenase (CarBaBb) PUBMED:12728990, of which CarBb is represented by this entry.

    \ 1041 IPR002698 \ 5-formyltetrahydrofolate cyclo-ligase, or methenyl-THF synthetase, catalyses the conversion of 5-formyltetrahydrofolate (5-FTHF) to 5,10-methenyltetrahydrofolate; this reaction requires ATP and Mg2+ PUBMED:8522195. 5-FTHF is used in chemotherapy, where it is clinically known as Leucovorin PUBMED:8034591.\ 1125 IPR000274 \ Adenylate cyclase is the enzyme responsible for the synthesis of cAMP from ATP. From sequence data, it has been proposed that there are three different classes of adenylate cyclases PUBMED:7863008. Class I cyclases are found in enterobacteria and related Gram-negative bacteria. They are proteins of about 850 residues that consist of two functional domains: an N-terminal catalytic domain and a C-terminal regulatory domain. There are two highly conserved regions: the first is located in the catalytic domain and the second in the regulatory domain. The second region includes a conserved histidine which could be phosphorylated by a PTS IIA enzyme, thus leading to the activation of the cyclase.\ 3123 IPR000622 \

    The K-Cl co-transporter (KCC) mediates the coupled movement of K+ and Cl- ions across the plasma membrane of many animal cells. This transport is involved in the regulatory volume decrease in response to cell swelling in red blood cells, and has been proposed to play a role in the vectorial movement of Cl- across kidney epithelia. The transport process involves one-for-one electroneutral movement of K+ together with Cl-, and, in all known mammalian cells, the net movement is outward PUBMED:8663127.

    In neurones, the K-Cl co-transporter appears to play a unique role in maintaining a low intracellular Cl- concentration, which is required for the functioning of chloride-dependent fast synaptic inhibition mediated by certain neurotransmitters, such as gamma-aminobutyric acid (GABA) and glycine.

    Two isoforms of the K-Cl co-transporter have been described, termed KCC1 and KCC2, containing 1085 and 1116 amino acids, respectively. They are both predicted to have 12 transmembrane (TM) regions in a central hydrophobic domain, together with hydrophilic N- and C-termini that are likely cytoplasmic. Comparison of their sequences with those of other ion-transporting membrane proteins reveals that they are part of a new superfamily of cation-chloride co-transporters, which includes the Na-Cl and Na-K-2Cl co-transporters. KCC1 is widely expressed in human tissues, while KCC2 is expressed only in brain neurones, making it likely that this is the isoform responsible for maintaining low Cl- concentration in neurones PUBMED:8663311, PUBMED:9930699.

    When heterologously expressed, KCC1 possesses the functional characteristics of the well-studied red blood cell K-Cl co-transporter, including stimulation by both swelling and N-ethylmaleimide. Several splice variants have also been identified.

    \ 1507 IPR004944 \ These proteins are activators of cyclin-dependent kinase 5 (CDK5). The active kinase is a heterodimer of a catalytic subunit (CDK5) and a regulatory subunit (an activator of this family). \ 2101 IPR007376 \ This family consists of uncharacterised bacterial proteins.\ 4417 IPR007001 \ This domain represents the highly similar N-terminal constant region shared by shufflon proteins. Shufflon proteins are produced from a clustered inversion region; they retain a constant N-terminal domain but have different C-terminal domains.\ 7355 IPR003650 \ This domain confers specificity among members of the Hairy/E(SPL) family. HES-2 (hairy and enhancer of split 2) is a transcription factor, and the hairy protein is a pair-rule protein that regulates embryonic segmentation and adult bristle patterning. These proteins are transcriptional repressors of genes that require bHLH proteins for their transcription.\ 4968 IPR004982 \

    This is a family of mainly hypothetical Schizosaccharomyces pombe proteins. Their function is unknown, but the family includes at least one protein up-regulated during meiosis.

    \ 4137 IPR001903 \ Different families of ssRNA negative-strand viruses contain glycoproteins responsible for forming spikes on the surface of the virion. The glycoprotein spike is made up of a trimer of glycoproteins. These proteins are frequently abbreviated to G protein. The channel formed by the glycoprotein spike is thought to function in a similar manner to the influenza virus M2 protein channel, allowing a signal to pass across the viral membrane to trigger viral uncoating PUBMED:1660200, PUBMED:9000093.\ 3442 IPR004230 \ MutS, MutL and MutH are the three essential proteins for initiation of methyl-directed DNA mismatch repair to correct mistakes made during DNA replication in Escherichia coli. MutH cleaves a newly synthesized and unmethylated daughter strand 5' to the sequence d(GATC) in a hemi-methylated duplex. Activation of MutH requires the recognition of a DNA mismatch by MutS and MutL PUBMED:9482749.\ 6732 IPR009683 \

    This entry represents the C terminus (approx. 120 residues) of a number of bacterial extensin-like proteins. Extensins are cell wall glycoproteins normally associated with plants, where they strengthen the cell wall in response to mechanical stress PUBMED:8148875. Many proteins in this entry are hypothetical.

    \ 3299 IPR004243 \ This minor capsid protein may act as a link between the external capsid and the internal DNA-protein core. Residues at the C-terminal end of the protein may act as a protease cofactor leading to activation of the adenovirus proteinase PUBMED:3959314.\ 7200 IPR009964 \

    This family consists of several bacterial proteins of around 115 residues in length. Members of this family seem to be found exclusively in the alphaproteobacteria. The function of this family is unknown.

    \ 6395 IPR009505 \

    This entry represents the C-terminal cytoplasmic domain of vertebrate neural chondroitin sulphate proteoglycans that contain EGF modules. Evidence has been accumulated to support the idea that neural proteoglycans are involved in various cellular events including mitogenesis, differentiation, axonal outgrowth and synaptogenesis PUBMED:9321696. This domain contains a number of potential sites of phosphorylation by protein kinase C PUBMED:9950058.

    \ 3962 IPR005023 \

    All members of this family show similarity to the vaccinia virus late protein H2, which is often referred to by its gene name H2R. Members of this family all belong to the viral taxon Poxviridae.

    \ 4023 IPR003372 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    This family represents the low molecular weight transmembrane protein PsbL found in PSII. PsbL is located in a gene cluster with PsbE, PsbF and PsbJ (PsbEFJL). Both PsbL and PsbJ are essential for proper assembly of the OEC. Mutations in PsbL prevent the formation of both PSII core dimers and PSII-light harvesting complex PUBMED:14686923. In addition, both PsbL and PsbJ are involved in the unidirectional flow of electrons, where PsbJ regulates the forward electron flow from D2 (Qa) to the plastoquinone pool, and PsbL prevents the reduction of PSII by back electron flow from plastoquinol, protecting PSII from photo-inactivation PUBMED:14979726.

    \ 3730 IPR000855 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
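
    The naming scheme above amounts to a lookup on the first character of a family or clan identifier, plus one rule about the nucleophile. The Python sketch below encodes just that; the enum, the function name `describe` and the identifier strings used in the example are illustrative assumptions, not a MEROPS API, and the P code is only meaningful for clans.

```python
from enum import Enum


class CatalyticType(Enum):
    """Catalytic type encoded by the first character of a family or clan identifier."""
    A = "aspartic"
    C = "cysteine"
    G = "glutamic acid"
    M = "metallo"
    S = "serine"
    T = "threonine"
    U = "unknown"
    P = "mixed (clan contains families of more than one type)"


# Nucleophile is part of an amino acid residue; an acyl intermediate is formed.
AMINO_ACID_NUCLEOPHILE = {CatalyticType.S, CatalyticType.T, CatalyticType.C}
# Nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {CatalyticType.A, CatalyticType.G, CatalyticType.M}


def describe(identifier: str) -> str:
    """Describe an identifier such as 'C5' (family) or 'CE' (clan) by its first character."""
    ctype = CatalyticType[identifier[0]]  # raises KeyError for an unknown letter
    if ctype in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate; can also act as transferase)"
    elif ctype in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not defined by this rule"
    return f"{identifier}: {ctype.value} peptidase; nucleophile: {nucleophile}"
```

    For example, `describe("C5")` reports a cysteine peptidase whose nucleophile is an amino acid residue, while `describe("M1")` reports an activated water molecule.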

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    This group of cysteine peptidases belongs to peptidase family C5 (adenain family, clan CE). Several adenovirus proteins are synthesised as precursors, requiring processing by a protease before the virion is assembled PUBMED:7845226, PUBMED:3052288. Until recently, the adenovirus endopeptidase was classified as a serine protease, having been reported to be inhibited by serine protease inhibitors PUBMED:7845226, PUBMED:462815. However, it has since been shown to be inhibited by cysteine protease inhibitors, and the catalytic residues are believed to be His-54 and Cys-104 PUBMED:7845226, PUBMED:3052288.

    \ 1428 IPR006931 \ Calcipressin is also known as calcineurin-binding protein, since it inhibits calcineurin-mediated transcriptional modulation by binding to the catalytic domain of calcineurin PUBMED:12039863.\ 7855 IPR012523 \

    This family consists of the ponericin family of antimicrobial peptides isolated from the predatory ant Pachycondyla goeldii. The ponericin peptides may adopt an amphipathic alpha-helical structure in polar environments. In the ant colony, these peptides play a defensive role against microbial pathogens arising from prey introduction and/or ingestion PUBMED:11279030.

    \ 6604 IPR010642 \

    This family consists of several invasion-associated locus B (IalB) proteins and related sequences. IalB is known to be a major virulence factor in Bartonella bacilliformis, where it was shown to have a direct role in human erythrocyte parasitism. IalB is upregulated in response to environmental cues signaling vector-to-host transmission. Such environmental cues would include, but not be limited to, temperature, pH, oxidative stress, and haemin limitation. It is also thought that IalB would aid B. bacilliformis survival under stress-inducing environmental conditions PUBMED:12668141. The role of this protein in other bacterial species is unknown.

    \ 3263 IPR001862 \

    The membrane-attack complex (MAC) of the complement system forms transmembrane channels. These channels disrupt the phospholipid bilayer of target cells, leading to cell lysis and death PUBMED:1722985, PUBMED:. A number of proteins participate in the assembly of the MAC. Freshly activated C5b binds to C6 to form a C5b-6 complex, then to C7, forming the C5b-7 complex. The C5b-7 complex binds to C8, which is composed of three chains (alpha, beta, and gamma), thus forming the C5b-8 complex. C5b-8 subsequently binds to C9 PUBMED:3219351, PUBMED:4018030, PUBMED:6095282 and acts as a catalyst in the polymerization of C9. Active MAC has a subunit composition of C5b-C6-C7-C8-(C9)n.
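
    The assembly described above is strictly sequential, with each intermediate named after the last component to bind. A minimal Python sketch of that ordering, assuming C8 binds as a single unit and treating the number of polymerized C9 molecules as a free parameter (the text gives it only as n); the function name `mac_assembly_stages` is hypothetical.

```python
from typing import List

# Components that join C5b one at a time, in the order given in the text.
SEQUENTIAL_COMPONENTS = ["C6", "C7", "C8"]


def mac_assembly_stages(n_c9: int) -> List[str]:
    """List the named complexes formed on the way to the membrane-attack complex.

    C8 is added as one unit here, although it consists of alpha, beta and gamma
    chains. `n_c9` is the number of polymerized C9 molecules (an assumed input).
    """
    if n_c9 < 1:
        # Assumption: a complex without any C9 is not counted as an active MAC.
        raise ValueError("the MAC contains at least one C9 molecule")
    # C5b-6, C5b-7, C5b-8: each intermediate is named after the last component to bind.
    stages = [f"C5b-{component[1:]}" for component in SEQUENTIAL_COMPONENTS]
    # C5b-8 then binds C9 and catalyses its polymerization.
    stages.append(f"C5b-C6-C7-C8-(C9){n_c9}")
    return stages
```

    For example, `mac_assembly_stages(3)` lists `C5b-6`, `C5b-7` and `C5b-8` before the final `C5b-C6-C7-C8-(C9)3` complex.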

    Perforin PUBMED:1722985, PUBMED:3419519 is a protein found in cytolytic T cells and natural killer cells. In the presence of calcium, perforin polymerizes into transmembrane tubules and is capable of lysing, non-specifically, a variety of target cells PUBMED:2395434.

    There are a number of regions of similarity in the sequences of complement components C6, C7, C8-alpha, C8-beta, C9 and perforin.

    \ 1895 IPR003748 \

    This entry describes proteins of unknown function.

    \ 7547 IPR011700 \

    The basic-leucine zipper (bZIP) transcription factors PUBMED:7780801, PUBMED: of eukaryotes are proteins that contain a basic region mediating sequence-specific DNA binding, followed by a leucine zipper region, which is required for dimerization.

    \ 6063 IPR009356 \

    This family consists of NADH dehydrogenase subunit 4L (NAD4L) proteins from the mitochondria of several parasitic flatworms.

    \ 116 IPR005043 \ Mammalian cellular apoptosis susceptibility (CAS) proteins are homologous to the yeast chromosome-segregation protein CSE1 PUBMED:7479798. This family aligns approximately the C-terminal halves of these proteins. CAS is involved in both cellular apoptosis and proliferation PUBMED:8639641, PUBMED:8610099. Apoptosis is inhibited in CAS-depleted cells, while the expression of CAS correlates with the degree of cellular proliferation. Like CSE1, it is essential for the mitotic checkpoint in the cell cycle (CAS depletion blocks the cell in the G2 phase), and has been shown to be associated with the microtubule network and the mitotic spindle PUBMED:8610099, as is the protein MEK, which is thought to regulate the intracellular localization (predominantly nuclear vs. predominantly cytosolic) of CAS. In the nucleus, CAS acts as a nuclear transport factor in the importin pathway PUBMED:9323134. The importin pathway mediates the nuclear transport of several proteins that are necessary for mitosis and further progression. CAS is therefore thought to affect the cell cycle through its effect on the nuclear transport of these proteins PUBMED:9323134. Since apoptosis also requires the nuclear import of several proteins (such as P53 and transcription factors), it has been suggested that CAS also enables apoptosis by facilitating the nuclear import of at least a subset of these essential proteins PUBMED:9497270.\ 4751 IPR006905 \ Tryptophan halogenase catalyses the chlorination of tryptophan to form 7-chlorotryptophan. This is the first step in the biosynthesis of pyrrolnitrin, an antibiotic with broad-spectrum anti-fungal activity. Tryptophan halogenase is NADH-dependent PUBMED:10547442.\ 1619 IPR007348 \ CopC is a bacterial blue copper protein that binds 1 atom of copper per protein molecule. Along with CopA, CopC mediates copper resistance by sequestration of copper in the periplasm PUBMED:1924351.\ 6672 IPR009654 \

    This family consists of several short bacterial proteins of around 100 residues in length. The function of this family is unknown.

    \ 72 IPR000793 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder formed by the alpha3beta3 subunits, driven by a cycle of ATP binding and hydrolysis PUBMED:11309608.

    \ \ \ \ \ \

    Vacuolar ATPases PUBMED:2531737 (V-ATPases) are responsible for acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector (70 kDa) is related to that of the F-ATPase beta subunit, while a 60 kDa subunit from the same sector is related to the F-ATPase alpha subunit PUBMED:2528146. Archaebacterial membrane-associated ATPases are composed of three subunits. The alpha chain is related to the F-ATPase beta chain and the beta chain is related to the F-ATPase alpha chain PUBMED:2528146. A protein highly similar to F-ATPase beta subunits is found in some bacterial systems involved in a specialized protein export pathway that proceeds without signal peptide cleavage PUBMED:8491729. This protein is known as fliI in Bacillus subtilis and Salmonella typhimurium, Spa47 (mxiB) in Shigella flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia pestis virulence plasmids.

    \

    In bacteria the alpha chain is the regulatory subunit and the beta chain is the catalytic subunit. In V-type ATP synthase the archaeal alpha chain is the catalytic subunit while the beta chain is the regulatory subunit.

    \ \ 684 IPR003653 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of a catalytic amino acid residue as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
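
The one-letter catalytic-type codes above can be captured in a small lookup table. The following is a minimal Python sketch, not a MEROPS tool: the helper names `describe_family` and `clan_type` are hypothetical, and the parser assumes simple identifiers such as `C48` (a family) or `CE` (a clan), ignoring any further details of MEROPS naming. For each type it also records the nucleophile described in the paragraph above.

```python
import re

# Catalytic-type letters as listed in the text; P applies only to clans.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan contains families of more than one type)",
}

# Types whose nucleophile is an amino acid side chain (acyl intermediate formed).
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe_family(family_id: str) -> dict:
    """Decode a simple family identifier such as 'C48' (hypothetical helper)."""
    match = re.fullmatch(r"([ACGMSTU])(\d+)", family_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    letter, number = match.group(1), int(match.group(2))
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid side chain (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "unknown"
    return {"type": CATALYTIC_TYPES[letter], "number": number, "nucleophile": nucleophile}


def clan_type(clan_id: str) -> str:
    """Return the catalytic type implied by a clan's first letter, e.g. 'CE'."""
    return CATALYTIC_TYPES[clan_id.strip().upper()[0]]


print(describe_family("C48"))
print(clan_type("CE"))
```

Family C48 and clan CE, both discussed in this entry, decode to the cysteine type; a clan such as MA(E) decodes to metallo from its first letter alone.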

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related families), which are further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of proteins contains cysteine peptidases belonging to MEROPS peptidase family C48 (Ulp1 endopeptidase family, clan CE). The protein fold of the peptidase domain for members of this family resembles that of adenain, the type example for clan CE. This group of sequences also contains a number of hypothetical proteins, which have not yet been characterised, and non-peptidase homologues. These are proteins that have either been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for the catalytic activity of the peptidases in the family.

    \ \

    The Ulp1 endopeptidase family contains deubiquitinating enzymes (DUBs) that can de-conjugate ubiquitin or ubiquitin-like proteins from ubiquitin-conjugated proteins. DUBs can be classified into three families according to sequence homology PUBMED:10603300, PUBMED:8982460: ubiquitin carboxyl-terminal hydrolases (UCH), ubiquitin-specific processing proteases (UBP), and ubiquitin-like proteases (ULP) specific for de-conjugating ubiquitin-like proteins. In contrast to the UBP pathway, which is highly redundant (16 UBP enzymes in yeast), there are few ubiquitin-like proteases (only one in yeast, Ulp1).

    \

    Ulp1 catalyses two critical functions in the SUMO/Smt3 pathway via its cysteine protease activity. Ulp1 processes the Smt3 C-terminal sequence (-GGATY) to its mature form (-GG), and it de-conjugates Smt3 from the lysine epsilon-amino group of the target protein PUBMED:10094048.
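
The maturation step above amounts to trimming the precursor immediately after its C-terminal Gly-Gly. A minimal Python sketch, assuming the cleavage site can be located simply as the last `GG` in a one-letter sequence string; real Ulp1 substrate recognition involves far more of the Smt3 surface. The function `mature_smt3` and the example peptide are hypothetical.

```python
def mature_smt3(precursor: str) -> str:
    """Trim a precursor after its last Gly-Gly, mimicking Ulp1 maturation.

    Simplification (assumption): only the 'GG' motif is used to find the
    cleavage site; nothing else about the substrate is checked.
    """
    site = precursor.rfind("GG")
    if site == -1:
        raise ValueError("no Gly-Gly motif found")
    # Keep everything up to and including the Gly-Gly; drop the tail (e.g. -ATY).
    return precursor[: site + 2]


print(mature_smt3("EQIGGATY"))  # hypothetical peptide ending in -GGATY
```

The de-conjugation activity (cleaving Smt3 off a lysine epsilon-amino group) acts on an isopeptide bond rather than a linear sequence and is not modelled here.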

    \

    The crystal structure of yeast Ulp1 bound to Smt3 PUBMED:10882122 revealed that the catalytic and interaction interface is situated in a shallow, narrow cleft where conserved residues recognise the Gly-Gly motif at the C-terminal extremity of the Smt3 protein. Ulp1 adopts a novel architecture despite some structural similarity with other cysteine proteases. The secondary structure is composed of seven alpha helices and seven beta strands. The catalytic domain includes the central alpha helix, beta strands 4 to 6, and the catalytic triad (Cys-His-Asp). This profile is directed against the C-terminal part of ULP proteins, which displays full proteolytic activity PUBMED:10882122.

    \ 3631 IPR006730 \

    Exposure of mammalian cells to hypoxia, radiation and certain chemotherapeutic agents promotes cell cycle arrest and/or apoptosis. Activation of p53-responsive genes is believed to play an important role in mediating such responses. PA26 is differentially induced by genotoxic stress (UV, gamma-irradiation and cytotoxic drugs) in a p53-dependent manner. The PA26 gene is a novel p53 target gene with properties common to the GADD family of growth arrest and DNA damage-inducible stress-response genes, and is thus a potential novel regulator of cellular growth PUBMED:9926927. A homologue found in Xenopus, XPA26, was initially detected in the anterior portion of the developing notochord at neurula stages, and later in the entire notochord except its posterior region at tailbud stages PUBMED:11165487.

    \ 5766 IPR010264 \

    This family consists of a series of plant proteins which are related to the Papaver rhoeas S1 self-incompatibility protein. Self-incompatibility (SI) is the single most important outbreeding device found in angiosperms and is a mechanism that regulates the acceptance or rejection of pollen. S1 is known to exhibit specific pollen-inhibitory properties PUBMED:8134385.

    \ 4194 IPR002673 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L29e forms part of the 60S ribosomal subunit PUBMED:1840484. This family is found in eukaryotes. There are 20 to 22 copies of the L29 gene in the rat. Rat L29 is related to yeast ribosomal protein YL43 PUBMED:8484767.

    \ 6801 IPR009720 \

    This entry represents the C terminus (approximately 200 residues) of a number of archaeal proteins of unknown function. One member is annotated as being a possible carboligase enzyme.

    \ 3708 IPR005075 \

    This signature, PepSY, is found in the propeptide of members of the MEROPS peptidase family M4 (clan MA(E)), which contains the thermostable thermolysins and related thermolabile neutral proteases (bacillolysins) from various species of Bacillus. It is also found in many non-peptidase proteins, including the Bacillus subtilis YpeB protein, a regulator of the SleB spore cortex lytic enzyme, and a large number of eubacterial and archaeal cell-wall-associated and secreted proteins, which are mostly annotated as hypothetical proteins.

    \ \ \

    Many extracellular bacterial proteases are produced as proenzymes. The propeptides usually have a dual function: they act as an intramolecular chaperone required for the folding of the polypeptide and as an inhibitor preventing premature activation of the enzyme. Analysis of the propeptide region of the M4 family of peptidases reveals two regions of conservation, the PepSY domain and a second domain, proximate to the N terminus, the FTP domain, which is also found in isolation in the propeptide of eukaryotic peptidases belonging to MEROPS peptidase family M36.

    \ \

    In propeptide domain-swapping experiments, replacing the propeptide of PA protease with that of vibrolysin (both propeptides contain the FTP and PepSY domains) allows the PA protease domain to fold correctly and inhibits the C-terminal autoprocessing activity. However, replacing the propeptide of PA protease with the thermolysin propeptide facilitates neither the correct folding nor the processing of the chimaeric protein into an active peptidase PUBMED:12589825. Mutational analysis of the Pseudomonas aeruginosa elastase gene revealed two mutations in the propeptide that resulted in the loss of inhibitory activity but not chaperone activity: A-15V and T-153I (where +1 is defined as the first residue of the mature peptidase). Both mutations resulted in peptidase activity, the T-153I mutation being much less effective than the A-15V mutation in activating peptidase activity PUBMED:11021931. The T-153I mutation lies N-terminal to the FTP domain, while the A-15V mutation is C-terminal to the PepSY domain.
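
The mutation codes in this entry (for example A-15V) number residues relative to the start of the mature peptidase, so propeptide positions are negative and a more negative number lies further towards the N terminus. A minimal Python sketch of that convention, assuming codes of the form wild-type letter, signed position, mutant letter; `parse_mutation` and `in_propeptide` are hypothetical helpers.

```python
import re


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a code such as 'A-15V' into (wild-type residue, position, mutant residue)."""
    match = re.fullmatch(r"([A-Z])(-?\d+)([A-Z])", code.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def in_propeptide(code: str) -> bool:
    """With +1 as the first mature residue, propeptide residues are numbered < 0."""
    return parse_mutation(code)[1] < 0


print(parse_mutation("A-15V"), in_propeptide("A-15V"))
```

Sorting codes by position (e.g. `sorted(codes, key=lambda c: parse_mutation(c)[1])`) lists them from the N terminus towards the mature peptidase, which is why a position such as -153 lies N-terminal to -15.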

    \ \

    Because both domains also occur in isolation in a diverse range of other proteins, the exact function of each is still unclear, though it has been proposed that the PepSY domain primarily has inhibitory activity and acts in conjunction with the FTP domain in chaperone activity.

    \ \ \ 1612 IPR011538 \

    Respiratory-chain NADH dehydrogenase PUBMED:2029890 (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane, which also seems to exist in the chloroplast and in cyanobacteria (as an NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this bioenergetic enzyme complex, there is one with a molecular weight of 51 kDa (in mammals), which is the second largest subunit of complex I and is a component of the iron-sulphur (IP) fragment of the enzyme. It seems to bind NAD, FMN, and a 2Fe-2S cluster. The 51 kDa subunit and the bacterial hydrogenase alpha subunit share three regions of sequence similarity. The first most probably corresponds to the NAD-binding site, the second to the FMN-binding site, and the third, which contains three cysteines, to the iron-sulphur binding region.

    \ 2984 IPR001342 \

    Homoserine dehydrogenase (HDh) catalyzes the NAD-dependent reduction of aspartate beta-semialdehyde to homoserine PUBMED:8500624, PUBMED:8395899. This reaction is the third step in a pathway leading from aspartate to homoserine. The latter participates in the biosynthesis of threonine, and then isoleucine, as well as in that of methionine.

    \

    HDh is found either as a single-chain protein, as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal aspartokinase domain and a C-terminal HDh domain, as in bacteria such as Escherichia coli and in plants.

    \ 3970 IPR004974 \

    The Poxvirus RNA polymerase-associated transcription specificity factor Rap94 associates with RNA polymerase and may mediate binding of the core polymerase to VetF. It is required for transcription of early genes.

    \ 2322 IPR007800 \ This family consists of uncharacterised proteins from Borrelia burgdorferi.\ 1488 IPR007026 \ This short domain contains four conserved cysteines that are probably required for the formation of two disulphide bonds. The domain is named after the characteristic CC motif.\ 2588 IPR000960 \

    Flavin-containing monooxygenases (FMOs) constitute a family of xenobiotic-metabolising enzymes PUBMED:8311461. Using an NADPH cofactor and FAD prosthetic group, these microsomal proteins catalyse the oxygenation of nucleophilic nitrogen, sulphur, phosphorus and selenium atoms in a range of structurally diverse compounds. FMOs have been implicated in the metabolism of a number of pharmaceuticals, pesticides and toxicants. In man, lack of hepatic FMO-catalysed trimethylamine metabolism results in trimethylaminuria (fish odour syndrome). Five mammalian forms of FMO are now known and have been designated FMO1-FMO5 PUBMED:1712018, PUBMED:2318837, PUBMED:1542660, PUBMED:1417778, PUBMED:8486656. This is a recent nomenclature based on comparison of amino acid sequences, and has been introduced in an attempt to eliminate confusion inherent in multiple, laboratory-specific designations and tissue-based classifications PUBMED:8311461. Following the determination of the complete nucleotide sequence of S. cerevisiae PUBMED:8091229, a novel gene was found to encode a protein with similarity to mammalian monooxygenases.

    \ \ 5101 IPR007938 \

    This family consists of several nucleopolyhedrovirus occlusion-derived virus envelope E25 proteins. The N terminus of this protein is extremely hydrophobic; studies suggest that this hydrophobic domain is sufficient to direct the protein to induced membrane microvesicles within a baculovirus-infected cell nucleus and to the viral envelope. They also suggest that movement of the protein into the nuclear envelope may begin at cytoplasmic membranes, such as the endoplasmic reticulum, and that transport into the nucleus may be mediated through the outer and inner nuclear membranes PUBMED:9108103.

    \ 7531 IPR011644 \ The HNOB (Heme NO Binding) domain is a predominantly alpha-helical domain that binds heme via a covalent linkage to histidine. The HNOB domain is predicted to function as a heme-dependent sensor for gaseous ligands and to transduce diverse downstream signals, in both bacteria and animals.\ 6311 IPR009470 \

    This ~170 aa region is found at the C terminus of .

    \ 24 IPR002656 \ This entry contains a range of acyltransferase enzymes as well as yet uncharacterised proteins from Caenorhabditis elegans.\ 2237 IPR007606 \ This family contains several uncharacterised chlamydial proteins.\ 9 IPR000337 \

    G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence PUBMED:8170923. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.

    \

    The metabotropic glutamate receptors are functionally and pharmacologically distinct from the ionotropic glutamate receptors. They are coupled to G-proteins and stimulate the inositol phosphate/Ca2+ intracellular signalling pathway PUBMED:1847995, PUBMED:1656524, PUBMED:1320017, PUBMED:1309649. The amino acid sequences of the receptors contain high proportions of hydrophobic residues grouped into 7 domains, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins. However, while a similar 3D framework has been proposed to account for this, there is no significant sequence identity between these and receptors of the rhodopsin-type family: the metabotropic glutamate receptors thus bear their own distinctive '7TM' signature. This 7TM signature is also shared by the calcium-sensing receptors, and GABA (gamma-amino-butyric acid) type B (GABA(B)) receptors.

    \

    \ 7865 IPR012990 \

    This domain is the Sec23/Sec24 beta-sandwich domain.

    \ 761 IPR003114 \ This domain is found associated with PX domains. The PX (phox) domain PUBMED:8931154 occurs in a variety of eukaryotic proteins associated with intracellular signaling pathways.\ 4052 IPR001996 \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) PUBMED:8246840, PUBMED:2197982 is a major carbohydrate transport system in bacteria. The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred to enzyme I (EI) of the PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, which consists of at least three structurally distinct domains (IIA, IIB, and IIC) PUBMED:1537788 that can either be fused together in a single polypeptide chain or exist as two or three interacting chains, formerly called enzymes II (EII) and III (EIII).

    \ \

    The first domain (IIA) carries the first permease-specific phosphorylation site, a histidine, which is phosphorylated by phospho-HPr. The second domain (IIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the permease. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate in a process catalyzed by the IIC domain; this process is coupled to the transmembrane transport of the sugar.
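
The phosphoryl relay described above is a strict chain in which each acceptor becomes the next donor. A minimal Python sketch that records that order, assuming one uptake event per call; `Transfer` and `pts_relay` are hypothetical names, and a residue is recorded only where the text names one (His on IIA, Cys or His on IIB).

```python
from typing import NamedTuple, Optional


class Transfer(NamedTuple):
    donor: str
    acceptor: str
    residue: Optional[str]  # phosphorylated residue, where the text names one


def pts_relay(sugar: str, iib_residue: str = "Cys") -> list[Transfer]:
    """Return the ordered phosphoryl transfers for one PTS uptake event."""
    if iib_residue not in ("Cys", "His"):
        raise ValueError("IIB is phosphorylated on a cysteinyl or histidyl residue")
    return [
        Transfer("PEP", "EI", None),
        Transfer("EI", "HPr", None),
        Transfer("HPr", "IIA", "His"),        # first permease-specific site
        Transfer("IIA", "IIB", iib_residue),  # residue depends on the permease
        # Catalysed by IIC and coupled to translocation of the sugar.
        Transfer("IIB", sugar, None),
    ]


for step in pts_relay("glucose"):
    print(f"{step.donor} -> {step.acceptor}")
```

Representing the pathway as data rather than prose makes the invariant explicit: the acceptor of each step is the donor of the next, ending at the sugar.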

    \ 1832 IPR003834 \ DsbA and DsbC, periplasmic proteins of Escherichia coli, are two key players involved in disulphide bond formation. DsbD generates a reducing source in the periplasm, which is required for maintaining proper redox conditions PUBMED:7628442. DipZ is essential for maintaining cytochrome c apoproteins in the correct conformations for the covalent attachment of haem groups to the appropriate pairs of cysteine residues PUBMED:7623667.\ 63 IPR011021 \

    G protein-coupled receptors are a large family of signalling molecules that respond to a wide variety of extracellular stimuli. The receptors relay the information encoded by the ligand through the activation of heterotrimeric G proteins and intracellular effector molecules. To ensure the appropriate regulation of the signalling cascade, it is vital to properly inactivate the receptor. This inactivation is achieved, in part, by the binding of a soluble protein, arrestin, which uncouples the receptor from the downstream G protein after the receptor is phosphorylated by G protein-coupled receptor kinases. In addition to the inactivation of G protein-coupled receptors, arrestins have also been implicated in the endocytosis of receptors and cross talk with other signalling pathways. Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin PUBMED:15335861. The protein binds calcium, and shows similarity in its C-terminus to alpha-transducin and other purine nucleotide-binding proteins. In mammals, arrestin is associated with autoimmune uveitis.

    \ Arrestins comprise a family of closely related proteins that includes beta-arrestin-1 and -2, which regulate the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity to activate G(S) proteins. The crystal structure of bovine retinal arrestin comprises two domains of antiparallel beta-sheets connected through a hinge region and one short alpha-helix on the back of the amino-terminal fold PUBMED:9495348. The binding region for phosphorylated light-activated rhodopsin is located at the N-terminal domain, as indicated by the docking of the photoreceptor to the three-dimensional structure of arrestin. The N-terminal domain is a sandwich formed by several beta-sheets in addition to the short alpha helix.

    \ 511 IPR000048 \

    Calmodulin (CaM) is recognized as a major calcium sensor and orchestrator of regulatory events through its interaction with a diverse group of cellular proteins. Three classes of recognition motifs exist for many of the known CaM-binding proteins: the IQ motif, a consensus for Ca2+-independent binding, and two related motifs for Ca2+-dependent binding, termed 1-8-14 and 1-5-10 based on the positions of conserved hydrophobic residues PUBMED:9141499.
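
The 1-8-14 and 1-5-10 names encode the spacing of the anchoring hydrophobic residues (positions 1, 8, 14 and 1, 5, 10 of a window). A minimal Python sketch that scans a one-letter sequence for windows whose anchor positions are all hydrophobic. The residue set `HYDROPHOBIC` is an assumption chosen for illustration, and real motif definitions carry further requirements, so this is a toy spacing check rather than a CaM-binding predictor; `scan_cam_motifs` is a hypothetical helper.

```python
# Assumed set of bulky hydrophobic residues (illustrative only).
HYDROPHOBIC = set("FILMVW")

# 0-based offsets of the anchor residues for each motif class.
MOTIFS = {
    "1-8-14": (0, 7, 13),
    "1-5-10": (0, 4, 9),
}


def scan_cam_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 1-based start positions of windows matching each spacing pattern."""
    seq = sequence.upper()
    hits = {}
    for name, offsets in MOTIFS.items():
        span = offsets[-1]
        hits[name] = [
            start + 1  # report 1-based positions
            for start in range(len(seq) - span)
            if all(seq[start + off] in HYDROPHOBIC for off in offsets)
        ]
    return hits


print(scan_cam_motifs("LAAAVAAAAF"))
```

The IQ motif is a sequence consensus rather than a pure spacing rule and is deliberately left out of this sketch.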

    \

    The regulatory domain of scallop myosin is a three-chain protein complex that switches on this motor in response to Ca2+ binding. Side-chain interactions link the two light chains in tandem to adjacent segments of the heavy chain bearing the IQ-sequence motif. The Ca2+-binding site is a novel EF-hand motif on the essential light chain and is stabilized by linkages involving the heavy chain and both light chains, accounting for the requirement of all three chains for Ca2+ binding and regulation in the intact myosin molecule PUBMED:8127365.

    \ 4081 IPR007232 \ The DNA single-strand annealing proteins (SSAPs), such as RecT, Red-beta, ERF and Rad52, function in RecA-dependent and RecA-independent DNA recombination pathways. This family includes proteins related to Rad52. These proteins contain two helix-hairpin-helix motifs PUBMED:11914131.\ 6228 IPR009066 \

    The alpha-2-macroglobulin receptor-associated protein (RAP) is a glycoprotein that binds to the alpha-2-macroglobulin receptor, as well as to other members of the low density lipoprotein receptor family. RAP inhibits the binding of all known ligands for these receptors, and may prevent receptor aggregation and degradation in the endoplasmic reticulum, thereby acting as a molecular chaperone PUBMED:9207124. RAP may be under the regulatory control of calmodulin, since it is able to bind calmodulin and be phosphorylated by calmodulin-dependent kinase II.

    \

    RAP comprises three domains. Both domains 1 and 3 are involved in binding to the alpha-2-macroglobulin receptor, while domain 1 is also involved in inhibiting the binding of activated alpha-2-macroglobulin. Structural studies have revealed that RAP domain 1 consists of a partly opened bundle of three helices, the first being shorter than the other two.

    \ \ 5401 IPR008719 \ NosL is one of the accessory proteins of the nos (nitrous oxide reductase) gene cluster. NosL is a monomeric protein of 18,540 Da that specifically and stoichiometrically binds Cu(I). The copper ion in NosL is ligated by a Cys residue, and one Met and one His are thought to serve as the other ligands. It is possible that NosL is a copper chaperone involved in metallocenter assembly PUBMED:11293413.\ 4062 IPR006628 \

    The Pur protein family consists of four known members in humans and is strongly conserved throughout evolution. Pur-alpha is a highly conserved, sequence-specific DNA- and RNA-binding protein involved in diverse cellular and viral functions including transcription, replication, and cell growth. Pur-alpha has a modular structure with alternating three basic aromatic class I and two acidic leucine-rich class II repeats in the central region of the protein PUBMED:1448097.

    \ \ \

    In addition to its involvement in basic cellular functions, Pur-alpha has been implicated in the development of blood cells and cells of the central nervous system; it has also been implicated in the inhibition of oncogenic transformation and, along with Pur-beta, in myelodysplastic syndrome progressing to acute myelogenous leukemia. Pur-alpha can influence viral processes through functional associations, for example with the Tat protein and TAR RNA of HIV-1, and with the large T-antigen and DNA regulatory regions of JC virus. JC virus causes opportunistic infections in the brains of certain HIV-1-infected individuals PUBMED:12894583.

    \ \ 4887 IPR002669 \ UreD is a urease accessory protein. Urease hydrolyses urea into ammonia and carbamic acid PUBMED:8550495. UreD is involved in activation of the urease enzyme via the UreD-UreF-UreG-urease complex PUBMED:9209019 and is required for urease nickel metallocenter assembly PUBMED:7909161. See also UreF and UreG.\ 2849 IPR000879 \ Guanylin, a 15-amino-acid peptide, is an endogenous ligand of the intestinal receptor guanylate cyclase-C, known as STaR PUBMED:7713512, PUBMED:1409606. Upon receptor binding, guanylin increases the intracellular concentration of cGMP, induces chloride secretion and decreases intestinal fluid absorption, ultimately causing diarrhoea PUBMED:1346555. The peptide stimulates the enzyme through the same receptor binding region as the heat-stable enterotoxins PUBMED:1409606.\ 2896 IPR003404 \ Glycoprotein E (gE) of Alphaherpesvirus forms a complex with glycoprotein I (gI), functioning as an immunoglobulin G (IgG) Fc binding protein. gE is involved in virus spread but is not essential for propagation PUBMED:10881679.\ 3531 IPR004338 \ This family of proteins describes the Nqr2 (NqrB) subunit of the bacterial six-subunit sodium-translocating NADH-ubiquinone oxidoreductase (i.e. a respiration-linked sodium pump). In Vibrio cholerae, it negatively regulates the expression of virulence factors by inhibiting (through an unknown mechanism) the transcription of the transcriptional activator ToxT PUBMED:10077658. The family also includes RnfD, which is involved in nitrogen fixation. The similarity of RnfD to NADH-ubiquinone oxidoreductases was previously noted PUBMED:9154934.\ 7450 IPR011475 \

    Several Rhodopirellula baltica proteins share this probable domain. Most of these proteins are predicted to be secreted or membrane-associated.

    \ 2484 IPR000686 \ Fanconi anaemia (FA) PUBMED:8490620, PUBMED:7929819, PUBMED:1574115 is a recessive inherited disease characterised by defective DNA repair. FA cells are sensitive to DNA cross-linking agents that cause chromosomal instability and cell death. The disease is manifested clinically by progressive pancytopenia, variable physical anomalies, and predisposition to malignancy. Four complementation groups have been identified, designated A to D. The gene for group C (FACC) has been cloned. Expression of the FACC cDNA corrects the phenotypic defect of FA(C) cells, resulting in normalized cell growth in the presence of DNA cross-linking agents such as mitomycin C (MMC). Gene transfer of the FACC gene should provide a survival advantage to transduced hematopoietic cells, suggesting that FA might be an ideal candidate for gene therapy PUBMED:7929819. The function of the FACC gene is not known. Immunofluorescence and sub-cellular fractionation studies of human cell lines, and COS-7 cells transiently expressing human FACC, showed the protein to be located primarily in the cytoplasm. Yet, placement of a nuclear localisation signal at the N-terminus of FACC directed the hybrid protein to the nuclei of transfected COS-7 cells. Such findings suggest an indirect role for FACC in regulating DNA repair in this group of Fanconi anaemia PUBMED:8058745.\ 7970 IPR012608 \

    This family consists of Sex Peptides (SP) that are found in Drosophila. On mating, a Drosophila female decreases her remating rate and increases her egg-laying rate, due in part to the transfer of SP from the male to the female. SP is found in the seminal fluid transferred from the male to the female during mating. The male seminal fluid proteins are referred to as accessory gland proteins (Acps). SP is one of the most interesting Acps and plays an important role in reproduction PUBMED:12913117.

    \ 6167 IPR009405 \

    This family consists of several Vibrio cholerae toxin co-regulated pilus biosynthesis protein F (TcpF) sequences. TcpF is known to be a secreted virulence protein but its exact function is unknown PUBMED:11466276.

    \ 5064 IPR007901 \

    This putative domain is found in the MoeZ and MoeB proteins. The domain has two CXXC motifs that are only partly conserved. MoeZ is necessary for the synthesis of pyridine-2,6-bis(thiocarboxylic acid), a small secreted metabolite that has a high affinity for transition metals, increases iron uptake efficiency by 20% in Pseudomonas stutzeri, is able to reduce both soluble and mineral forms of iron, and has antimicrobial activity towards several species of bacteria. MoeB is the molybdopterin synthase activating enzyme in the molybdopterin cofactor biosynthesis pathway. Both enzymes are members of a superfamily of related but structurally distinct proteins that participate in pathways involved in the transfer of sulphur-containing moieties to metabolites PUBMED:11972321, and both also contain the UBA/THIF-type NAD/FAD binding fold.

    \ 4112 IPR005518 \

    Remorin binds both simple and complex galacturonides. The N-terminal region of remorin is proline-rich, while the C-terminal region has been predicted to form a coiled coil that is expected to interact with other macromolecules, most likely DNA. Functional similarities between the behavior of these proteins and that of viral proteins involved in intercellular communication have been noted PUBMED:8989883.

    \ 4678 IPR011766 \

    A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown PUBMED:8604141 that some of these enzymes are structurally related. This represents the C-terminal TPP binding domain of TPP enzymes.

    \ 2361 IPR008180 \

    Synonym(s): dUTP diphosphatase, Deoxyuridine-triphosphatase

    \

    The essential enzyme dUTP pyrophosphatase () is specific for dUTP and is critical for the fidelity of DNA replication and repair. dUTPase hydrolyses dUTP to dUMP and pyrophosphate, simultaneously reducing dUTP levels and providing the dUMP for dTTP biosynthesis. By lowering the intracellular concentration of dUTP, dUTPase prevents uracil from being incorporated into DNA PUBMED:8805593.
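
    The hydrolysis described above can be written as a balanced overall reaction. This is the standard stoichiometry, with water as the co-substrate and PP_i denoting inorganic pyrophosphate:

```latex
\mathrm{dUTP} + \mathrm{H_2O} \longrightarrow \mathrm{dUMP} + \mathrm{PP_i}
```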

    \

    The crystal structure of human dUTPase reveals that each subunit of the dUTPase trimer folds into an eight-stranded jelly-roll beta barrel, with the C-terminal beta strands interchanged among the subunits. The structure is similar to that of the Escherichia coli enzyme, despite low sequence homology between the two enzymes PUBMED:8805593.

    \

    Other enzymes that specifically bind uridine, such as deoxycytidine triphosphate (dCTP) deaminase (), also belong to this group, suggesting that the signature may recognise a putative uridine-binding motif.

    \

    Some retroviruses encode dUTPases. Retroviral dUTPase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, dUTPase and RNase H.

    \ 5851 IPR009260 \

    This family consists of several strongly conserved archaeal proteins of unknown function that are associated with CRISPR (Clustered, Regularly Interspaced Short Palindromic Repeats).

    \ \ \ 5145 IPR007982 \

    This family consists of several Tombusvirus movement proteins. These proteins allow the virus to move from cell to cell and allow host-specific systemic spread PUBMED:11483749.

    \ 5753 IPR008108 \

    Some Gram-negative animal enteropathogens express a specialised secretion system to directly "inject" exotoxins into the cytoplasm of host cells. Dubbed the type III secretion system, it is of specific interest to researchers, as the components of such a system are only expressed in pathogenic strains PUBMED:11018143. The system is composed of structural proteins and exotoxin effectors; these are often encoded on large virulence plasmids or on the bacterial chromosome itself PUBMED:11018143.

    \

    The Shigella flexneri invasion plasmid antigen (ipa) genes are found on such a plasmid and are arranged into an operon. Directly upstream of this operon is another cluster of type III genes, termed ipgD, E and F PUBMED:8478058. Deletion mutational studies of all three genes showed that they were essential for virulence in S. flexneri, and that IpgD is secreted by the type III needle to the outside of the bacterial cell PUBMED:8478058. Further analysis of the ipg operon confirmed that the IpgD gene product is chaperoned by the IpgE protein while in the bacterial cytoplasm PUBMED:11029686.

    \

    More recently, a large study of the spread of the ipa/mxi/ipg pathogenicity islands through their relevant plasmid has revealed that homologues exist in many different Shigella strains, as well as in enteroinvasive Escherichia coli and Salmonella spp. PUBMED:11553574. There is evidence that the genes were acquired from Shigella through lateral transfer, like most of the other type III secretion system virulence plasmids.

    \ \ 6046 IPR009348 \

    This family of regulators is involved in the post-translational control of nitrogen permease.

    \ 903 IPR000814 \

    The TATA-box binding protein (TBP) is required for the initiation of transcription by RNA polymerases I, II and III, from promoters with or without a TATA box PUBMED:12782648, PUBMED:10974559. TBP associates with a host of factors, including the general transcription factors TFIIA, -B, -D, -E, and -H, to form huge multi-subunit pre-initiation complexes on the core promoter. Through its association with different transcription factors, TBP can initiate transcription from different RNA polymerases. There are several related TBPs, including TBP-like (TBPL) proteins PUBMED:12878007.

    \

    The C-terminal core of TBP (~180 residues) is highly conserved and contains two 77-amino acid repeats that produce a saddle-shaped structure that straddles the DNA; this region binds to the TATA box and interacts with transcription factors and regulatory proteins PUBMED:1436073. By contrast, the N-terminal region varies in both length and sequence.

    \ \ 6647 IPR009639 \

    This region is of unknown function but is found in some archaeal . It is predicted to be of mixed alpha/beta secondary structure by JPred.

    \ 273 IPR007137 \ This domain normally occurs as tandem repeats; however it is found as a single copy in the Saccharomyces cerevisiae DNA-binding nuclear protein YCR593 ().\ 1486 IPR005085 \

    Carbohydrate-binding module, family 25 PUBMED: has a starch-binding function as demonstrated in one case.

    \ 4534 IPR005145 \

    The function of this domain is unknown, it is found in and its relatives. It is found C-terminal to the .

    \ 8150 IPR013167 \

    This region is found in yeast conserved oligomeric Golgi complex component 4, which is involved in ER-to-Golgi and intra-Golgi transport PUBMED:12006647.

    \ 2807 IPR000173 \

    Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) plays an important role in glycolysis and gluconeogenesis PUBMED:2716055 by reversibly catalysing the oxidation and phosphorylation of D-glyceraldehyde-3-phosphate to 1,3-diphosphoglycerate. The enzyme exists as a tetramer of identical subunits, each containing two conserved functional domains: an NAD-binding domain and a highly conserved catalytic domain PUBMED:6303388. The enzyme has been found to bind to actin and tropomyosin, and may thus have a role in cytoskeleton assembly. Alternatively, the cytoskeleton may provide a framework for precise positioning of the glycolytic enzymes, thus permitting efficient passage of metabolites from enzyme to enzyme PUBMED:6303388.
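
    For reference, the reversible reaction described above is usually written with its co-substrates NAD+ and inorganic phosphate (P_i), which the paragraph does not spell out. The standard overall stoichiometry is:

```latex
\text{D-glyceraldehyde-3-phosphate} + \mathrm{NAD^{+}} + \mathrm{P_i}
  \rightleftharpoons \text{1,3-diphosphoglycerate} + \mathrm{NADH} + \mathrm{H^{+}}
```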

    \

    GAPDH displays diverse non-glycolytic functions as well, its role depending upon its subcellular location. For instance, the translocation of GAPDH to the nucleus acts as a signalling mechanism for programmed cell death, or apoptosis PUBMED:10740219. The accumulation of GAPDH within the nucleus is involved in the induction of apoptosis, where GAPDH functions in the activation of transcription. The presence of GAPDH is associated with the synthesis of pro-apoptotic proteins like BAX, c-JUN and GAPDH itself.

    \

    GAPDH has been implicated in certain neurological diseases: GAPDH is able to bind to the gene products from neurodegenerative disorders such as Huntington’s disease, Alzheimer’s disease, Parkinson’s disease and Machado-Joseph disease through stretches encoded by their CAG repeats. Abnormal neuronal apoptosis is associated with these diseases. Propargylamines such as deprenyl increase neuronal survival by interfering with apoptosis signalling pathways via their binding to GAPDH, which decreases the synthesis of pro-apoptotic proteins PUBMED:12721812.

    \ \ \ 6379 IPR010546 \

    This family consists of several bacterial proteins, at least one of which is involved in enzyme induction following nitrogen deprivation. The exact function of this family is unknown.

    \ 5892 IPR009274 \

    The Gam protein inhibits RecBCD nuclease and is found in both bacteria and bacteriophage PUBMED:8335632.

    \ 8013 IPR012554 \

    This family consists of the DegQ (formerly sacQ) regulatory peptides. The DegQ family of peptides controls the rates of synthesis of a class of both secreted and intracellular degradative enzymes in Bacillus subtilis. DegQ is 46 amino acids long and activates the synthesis of degradative enzymes. The expression of this peptide has been shown to be subject both to catabolite repression and to DegS-DegU-mediated control, allowing an increase in the rate of synthesis of degQ under conditions of nitrogen starvation PUBMED:1688843.

    \ 3157 IPR008211 \

    Laminin is a large molecular weight glycoprotein present only in basement membranes in almost every animal tissue. Laminin is thought to mediate the attachment, migration and organisation of cells into tissues during embryonic development by interacting with other extracellular matrix components PUBMED:1975589. Each laminin is a heterotrimer assembled from alpha, beta and gamma chain subunits, secreted and incorporated into cell-associated extracellular matrices PUBMED:10842354.

    \

    Basement membrane assembly is a cooperative process in which laminins polymerise through their N-terminal domain (LN or domain VI) and anchor to the cell surface through their G domains. Netrins may also associate with this network through heterotypic LN domain interactions PUBMED:8349613. This leads to cell signalling through integrins and dystroglycan (and possibly other receptors) recruited to the adherent laminin. This LN-domain-dependent self-assembly is considered to be crucial for the integrity of basement membranes, as highlighted by genetic forms of muscular dystrophy in which the LN module is deleted from the laminin alpha 2 chain PUBMED:7874173. The laminin N-terminal domain is found in all laminin and netrin subunits except laminin alpha 3A, alpha 4 and gamma 2.

    \ \ \ 406 IPR006075 \

    Glutamyl-tRNA(Gln) amidotransferase subunit B () PUBMED:9342321 is a microbial enzyme that furnishes a means for formation of correctly charged Gln-tRNA(Gln) through the transamidation of misacylated Glu-tRNA(Gln) in organisms which lack glutaminyl-tRNA synthetase. The reaction takes place in the presence of glutamine and ATP through an activated gamma-phospho-Glu-tRNA(Gln). The enzyme is composed of three subunits: A (an amidase), B and C. It also exists in eukaryotes as a protein targeted to the mitochondria.\

    \ 1909 IPR002542 \ This domain has no known function. It is found in one or two copies in several Caenorhabditis elegans proteins. It is roughly 130 amino acids long and contains 12 conserved cysteines.\ 7373 IPR011520 \

    The mammalian TEF and the Drosophila scalloped genes belong to a conserved family of transcriptional factors that possesses a TEA/ATTS DNA-binding domain. Transcriptional activation by these proteins likely requires interactions with specific coactivators. In Drosophila, Scalloped (Sd) interacts with Vestigial (Vg) to form a complex, which binds DNA through the Sd TEA/ATTS domain. The Sd-Vg heterodimer is a key regulator of wing development, which directly controls several target genes and is able to induce wing outgrowth when ectopically expressed. This short conserved region is needed for interaction with Sd PUBMED:10518497.

    \ 7098 IPR009895 \

    This family consists of several hypothetical proteins of around 170 residues in length, which appear to be mouse specific. The function of this family is unknown.

    \ 7808 IPR013114 \

    Fatty acid biosynthesis occurs by two distinct pathways: in fungi, mammals and mycobacteria, type I or associative fatty acid biosynthesis (type I FAS) is accomplished by multifunctional proteins in which distinct domains catalyse specific reactions; in plants and most bacteria, type II or dissociative fatty acid biosynthesis (type II FAS) is accomplished by distinct enzymes PUBMED:14684903.

    \

    Both FabZ and FabA catalyse the dehydration of beta-hydroxyacyl acyl carrier protein (ACP) to trans 2-enoyl ACP. However, FabZ and FabA display subtle differences in substrate specificities, whereby FabA is most effective on acyl ACPs of 9-11 carbon atoms in length, while FabZ is less specific. Unlike FabA, FabZ does not function as an isomerase and cannot initiate unsaturated fatty acid biosynthesis. However, only FabZ can act during the elongation of unsaturated fatty acid chains.

    \ \ \

    This enzyme domain has a HotDog fold.

    \ 6360 IPR009495 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 8101 IPR013163 \

    Cache is a signaling domain common to animal Ca2+-channel subunits and a class of prokaryotic chemotaxis receptors PUBMED:11084361.

    \ 5326 IPR008801 \ RALF, a 5 kDa ubiquitous polypeptide in plants, arrests root growth and development.\ 2723 IPR001419 \

    Gluten is the protein component of wheat flour. It consists of numerous proteins, which are of two different types responsible for different physical properties of dough: the glutenins, which are primarily responsible for the elasticity, and the gliadins, which contribute to the extensibility.

    \

    The glutenins are of two different types, termed low (LMW) and high molecular weight (HMW) subunits PUBMED:3840588. The glutenin high molecular weight subunits are classified as elastomeric proteins, because the glutenin network can withstand significant deformations without breaking, and return to the original conformation when the stress is removed. Elastomeric proteins differ considerably in amino acid sequence, but they are all polymers whose subunits consist of elastomeric domains, composed of repeated motifs, and non-elastic domains that mediate cross-linking between the subunits. The elastomeric domain motifs are all rich in glycine residues in addition to other hydrophobic residues. High molecular weight glutenin subunits have an extensive central elastomeric domain, flanked by two terminal non-elastic domains that form disulphide cross-links. The central elastomeric domain is characterised by the following three repeated motifs: PGQGQQ, GYYPTS[P/L]QQ and GQQ. It possesses overlapping beta-turns within and between the repeated motifs, and assumes a regular helical secondary structure with a diameter of approx. 1.9 nm and a pitch of approx. 1.5 nm PUBMED:11084370.
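
    The three repeat motifs listed above lend themselves to a simple pattern scan. The sketch below, assuming plain one-letter protein sequences, counts each motif with Python's standard re module, encoding [P/L] as the character class [PL]. The function name count_repeat_motifs and the toy fragment are hypothetical illustrations, not a real glutenin sequence or an InterPro tool.

```python
import re

# Repeat motifs of the HMW glutenin central domain, as listed in the text;
# the [P/L] alternative is written as the regex character class [PL].
MOTIFS = {
    "PGQGQQ": re.compile(r"PGQGQQ"),
    "GYYPTS[P/L]QQ": re.compile(r"GYYPTS[PL]QQ"),
    "GQQ": re.compile(r"GQQ"),
}


def count_repeat_motifs(seq: str) -> dict:
    """Count non-overlapping matches of each motif, scanning for each motif
    independently. GQQ also occurs inside PGQGQQ, so counts are not exclusive."""
    seq = seq.upper()
    return {name: len(pattern.findall(seq)) for name, pattern in MOTIFS.items()}


# Hypothetical toy fragment, not a real glutenin sequence.
fragment = "PGQGQQGYYPTSPQQGQQ"
print(count_repeat_motifs(fragment))
```

    Scanning each motif independently keeps the sketch simple; a real repeat annotation would also need to decide how to assign the GQQ that sits at the end of every PGQGQQ.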

    \ 6902 IPR009779 \

    This family consists of several eukaryotic translocon-associated protein, gamma subunit (TRAP-gamma) sequences. The translocation site (translocon), at which nascent polypeptides pass through the endoplasmic reticulum membrane, contains a component previously called 'signal sequence receptor' that has been renamed 'translocon-associated protein' (TRAP). The TRAP complex comprises four membrane proteins, alpha, beta, gamma and delta, which are present in a stoichiometric relation and are genuine neighbours in intact microsomes. The gamma subunit is predicted to span the membrane four times PUBMED:7916687.

    \ 488 IPR002121 \ The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations in the HRDC domain associated with the human BLM gene result in Bloom Syndrome (BS), an autosomal recessive disorder characterized by proportionate pre- and postnatal growth deficiency; sun-sensitive, telangiectatic, hypo- and hyperpigmented skin; predisposition to malignancy; and chromosomal instability PUBMED:9397680.\ 140 IPR003492 \

    Batten's disease, the juvenile variant of neuronal ceroid lipofuscinosis (NCL), is a recessively inherited disorder affecting children of 5-10 years of age. The disease is characterised by progressive loss of vision, seizures and psychomotor disturbances. Biochemically, the disease is characterised by lysosomal accumulation of hydrophobic material, mainly ATP synthase subunit C, largely in the brain but also in other tissues. The disease is fatal within a decade PUBMED:7553855.

    \

    Mutations in the CLN3 gene are believed to cause Batten's disease PUBMED:7553855. The CLN3 gene, with a predicted 438-residue product, maps to chromosome 16p12.1. The gene contains at least 15 exons spanning 15 kb and is highly conserved in mammals PUBMED:2142158. A 1.02 kb deletion in the CLN3 gene, occurring in either one or both alleles, is found in 85% of Batten disease chromosomes; it causes a frameshift generating a predicted translated product of 181 amino acid residues PUBMED:7553855, PUBMED:10191115. Twenty-two other mutations, including deletions, insertions and point mutations, have been reported. It has been suggested that such mutations result in severely truncated CLN3 proteins, or affect the structure/conformation of the protein PUBMED:7553855, PUBMED:9311735.

    \

    CLN3 proteins, which are believed to associate in complexes, are heavily glycosylated lysosomal membrane proteins PUBMED:10191115, containing complex Asn-linked oligosaccharides PUBMED:2142158. Extensive glycosylation is important for the stability of these lysosomal proteins in the highly hydrolytic lysosomal lumen. Lysosomal membrane proteins are involved in the sequestration of active lysosomal enzymes, the transport of degraded molecules from the lysosomes, and fusion and fission between lysosomes and other organelles. The CLN3 protein is a 43 kDa, highly hydrophobic, multi-transmembrane (TM), phosphorylated protein PUBMED:10191115. Hydrophobicity analysis predicts 6-9 TM segments, suggesting that CLN3 is a TM protein that may function as a chaperone or signal transducer. The majority of putative phosphorylation sites are found in the N-terminal domain, which encompasses 150 residues PUBMED:10191114. Phosphorylation is believed to be important for membrane compartment interaction, the formation of functional complexes, and regulation of and interactions with other proteins PUBMED:1482112.
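
    The text does not say which hydrophobicity method produced the 6-9 TM prediction. As an illustration only, the sketch below assumes the widely used Kyte-Doolittle scale with a 19-residue sliding window and a 1.6 threshold, and reports runs of above-threshold windows as candidate TM spans. The function names are hypothetical, only the 20 standard one-letter residues are accepted, and the toy sequence is synthetic rather than CLN3.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy per window; entry i covers residues i..i+window-1 (0-based)."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def predicted_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Return 1-based (start, end) residue spans covered by consecutive
    windows whose mean hydropathy is at or above the threshold."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i  # first window of a hydrophobic run
        elif value < threshold and start is not None:
            # the run ended at window i-1, whose last residue is (i-1)+window (1-based)
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:  # run extends to the final window
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


# Hypothetical test sequence: one 25-residue hydrophobic stretch in a polar context.
toy = "K" * 20 + "L" * 25 + "K" * 20
print(predicted_tm_segments(toy))
```

    Window-based spans tend to overhang the true hydrophobic stretch by a few residues, which is one reason such predictions are usually quoted as a range (here, 6-9 segments) rather than an exact count.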

    \

    CLN3 contains several motifs that may undergo lipid post-translational modifications (PTMs). PTMs contribute to targeting and anchoring of modified proteins to distinct biological membranes PUBMED:7716512. There are three general classes of lipid modification: N-terminal myristoylation, C-terminal prenylation, and palmitoylation of cysteine residues. Such modifications are believed to be a common form of PTM, occurring in 0.5% of all cellular proteins, including those of brain tissue PUBMED:10191112. The C terminus of CLN3 contains various lipid modification sites: C435, a target for prenylation; G419, a target for myristoylation; and C414, a target for palmitoylation PUBMED:9384607. Prenylation increases protein hydrophobicity, influences interaction with upstream regulatory proteins and downstream effectors, facilitates protein-protein interaction (multisubunit assembly) and promotes anchoring to membrane lipids. The prenylation motif, Cys-A-A-X, is highly conserved within CLN3 protein sequences of different species PUBMED:10191112. Species with known CLN3 protein homologues include Homo sapiens, Canis familiaris, Mus musculus, Saccharomyces cerevisiae and Drosophila melanogaster.
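
    Because the Cys-A-A-X prenylation box sits at the very C terminus, checking for it reduces to inspecting the last four residues. In the sketch below, "A" is assumed to mean one of the aliphatic residues A, V, L or I; that set is a simplifying assumption, not a definition from the source. It returns the 1-based Cys position, which for a 438-residue product ending in such a box is 435, consistent with the C435 prenylation site above. The function name and the toy sequence are hypothetical.

```python
# Assumption: the "A" positions of Cys-A-A-X are taken to be these aliphatic residues.
ALIPHATIC = frozenset("AVLI")


def caax_cys_position(seq: str, aliphatic=ALIPHATIC):
    """Return the 1-based position of the Cys in a C-terminal Cys-A-A-X box, or None."""
    seq = seq.upper()
    if len(seq) < 4:
        return None
    cys, a1, a2, _x = seq[-4:]  # X may be any residue
    if cys == "C" and a1 in aliphatic and a2 in aliphatic:
        return len(seq) - 3
    return None


# Hypothetical 438-residue sequence ending in CVLS (not the real CLN3 sequence).
toy = "M" * 434 + "CVLS"
print(caax_cys_position(toy))  # 435
```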

    \ \ 8076 IPR013189 \

    This domain corresponds to the C terminal domain of glycosyl hydrolase family 32. It forms a beta sandwich module PUBMED:14973124.

    \ 844 IPR007225 \

    Sec15 is a component of the exocyst complex, which is involved in the docking of exocytic vesicles with a fusion site on the plasma membrane. The exocyst complex is composed of Sec3, Sec5, Sec6, Sec8, Sec10, Sec15, Exo70 and Exo84.

    \ 613 IPR003441 \

    The NAC domain (named for Petunia hybrida NAM and Arabidopsis ATAF1, ATAF2 and CUC2) is an N-terminal module of ~160 amino acids, which is found in proteins of the NAC family of plant-specific transcriptional regulators (no apical meristem (NAM) proteins) PUBMED:9212461. NAC proteins are involved in developmental processes, including formation of the shoot apical meristem, floral organs and lateral shoots, as well as in plant hormonal control and defence. The NAC domain is accompanied by diverse C-terminal transcriptional activation domains. The NAC domain has been shown to be a DNA-binding domain (DBD) and a dimerization domain PUBMED:11114891, PUBMED:12175016.

    \ \

    The NAC domain can be subdivided into five subdomains (A-E). Each subdomain is distinguishable by blocks of heterogeneous amino acids or gaps. While the NAC domain as a whole is rich in basic amino acids (R, K and H), the distribution of positive and negative amino acids in each subdomain is unequal. Subdomains C and D are rich in basic amino acids but poor in acidic amino acids, while subdomain B contains a high proportion of acidic amino acids. Putative nuclear localization signals (NLS) have been detected in subdomains C and D PUBMED:10660065. The DBD is contained within a 60-amino-acid region located within subdomains D and E PUBMED:12175016. The overall structure of the NAC domain monomer consists of a very twisted antiparallel beta-sheet, which packs against an N-terminal alpha-helix on one side and one shorter helix on the other side, surrounded by a few helical elements. The structure suggests that the NAC domain mediates dimerization through conserved interactions including a salt bridge, and DNA binding through the NAC dimer face rich in positive charges PUBMED:15083810.
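
    The basic versus acidic bias of the subdomains can be quantified with a simple composition count. The sketch below counts R, K and H as basic and D and E as acidic, as in the text, and reports both fractions per subdomain. The subdomain boundaries are not given here, so the caller must supply the subdomain sequences; the names and example fragments are hypothetical and are not real NAC sequences.

```python
BASIC = frozenset("RKH")
ACIDIC = frozenset("DE")


def charge_composition(seq: str) -> tuple:
    """Return (basic fraction, acidic fraction) of a one-letter sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    basic = sum(aa in BASIC for aa in seq)
    acidic = sum(aa in ACIDIC for aa in seq)
    return basic / len(seq), acidic / len(seq)


def compare_subdomains(subdomains: dict) -> dict:
    """Map each subdomain name to its (basic fraction, acidic fraction)."""
    return {name: charge_composition(s) for name, s in subdomains.items()}


# Hypothetical fragments, NOT real NAC subdomain sequences.
example = {"B": "DEEDSLAEGD", "C": "KRKHPSGKRA"}
for name, (basic, acidic) in compare_subdomains(example).items():
    print(f"subdomain {name}: basic {basic:.2f}, acidic {acidic:.2f}")
```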

    \ 6409 IPR009511 \

    This family of proteins may function to silence the spindle checkpoint and allow mitosis to proceed through anaphase by binding to MAD2L1 after it has become dissociated from the MAD2L1-CDC20 complex. During early mitosis, the protein is unevenly distributed throughout the nucleoplasm. From metaphase to anaphase, it is concentrated on the spindle.

    \ 4153 IPR007756 \ This domain is about 85 residues in length and very rich in charged residues, hence the name RICH (Rich In CHarged residues). It is found in secreted proteins such as PspC, SpsA and IgA FC receptor from Streptococcus agalactiae. This domain could be involved in bacterial adherence or cell wall binding.\ 1165 IPR005164 \

    This family is found in pairs in Allantoicases, forming the majority of the protein. These proteins allow the use of purines as secondary nitrogen sources in nitrogen-limiting conditions through the reaction:

    \ 575 IPR002903 \ This is a family of methyltransferases. Methyltransferases are responsible for the transfer of methyl groups between two molecules.\ 7540 IPR011714 \

    This repeat is found in some Plasmodium and Theileria proteins.

    \ 6878 IPR009763 \

    This family consists of several hypothetical bacterial proteins of around 145 residues in length. Members of this family appear to be specific to the orders Bacillales and Lactobacillales. The function of this family is unknown.

    \ 3484 IPR001564 \

    Nucleoside diphosphate kinases () (NDK) PUBMED: are enzymes required for the synthesis of nucleoside triphosphates (NTP) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization.

    \

    In eukaryotes, there seems to be a small family of NDK isozymes each of which acts in a different subcellular compartment and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and B) PUBMED:1851158. By random association (A6, A5B...AB5, B6), these two kinds of chain form isoenzymes differing in their isoelectric point.
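
    If the A and B chains associate at random, the seven hexameric isoforms (A6 through B6) follow a binomial distribution in the fraction of A chains available. The sketch below assumes independent, unbiased assembly, which is an idealisation not stated in the source. It uses Python's standard math.comb and labels the isoforms in the same style as the text; the function names are hypothetical.

```python
from math import comb


def _label(k: int, n: int) -> str:
    """Format a composition in the text's style: A6, A5B, ..., AB5, B6."""
    def part(chain: str, count: int) -> str:
        if count == 0:
            return ""
        return chain if count == 1 else f"{chain}{count}"
    return part("A", k) + part("B", n - k)


def hexamer_distribution(frac_a: float, n: int = 6) -> dict:
    """Expected isoform fractions if each of n subunits is independently
    chain A with probability frac_a (binomial random-association model)."""
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must lie in [0, 1]")
    return {
        _label(k, n): comb(n, k) * frac_a**k * (1.0 - frac_a) ** (n - k)
        for k in range(n, -1, -1)
    }


for isoform, fraction in hexamer_distribution(0.5).items():
    print(f"{isoform}: {fraction:.4f}")
```

    With equal amounts of both chains the mixed A3B3 form is the most abundant (20/64 of hexamers), while skewing frac_a shifts the population toward the homohexamer of the more abundant chain.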

    \

    NDKs are proteins of 17 kDa that act via a ping-pong mechanism, in which a histidine residue is phosphorylated by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme can transfer its phosphate group to any NDP to produce an NTP.
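
    The ping-pong mechanism amounts to a two-state cycle: the enzyme's histidine is either free or phosphorylated, and each half-reaction flips that state. The toy model below, a hypothetical illustration rather than a kinetic model, represents nucleotides as plain strings such as "ATP" or "GDP" and requires magnesium for the transfer step as the text describes. The class and method names are illustrative assumptions.

```python
class NDKToy:
    """Toy two-state model of the NDK ping-pong mechanism.

    State: whether the catalytic histidine carries a phosphate.
    Nucleotides are plain strings such as "ATP" or "GDP" (illustrative only).
    """

    def __init__(self) -> None:
        self.phospho_his = False

    def accept_phosphate(self, donor_ntp: str) -> str:
        """First half-reaction: E + NTP -> E-P + NDP."""
        if self.phospho_his:
            raise RuntimeError("histidine is already phosphorylated")
        if not donor_ntp.endswith("TP"):
            raise ValueError("donor must be a nucleoside triphosphate")
        self.phospho_his = True
        return donor_ntp[:-2] + "DP"

    def donate_phosphate(self, acceptor_ndp: str, magnesium: bool = True) -> str:
        """Second half-reaction: E-P + NDP -> E + NTP (requires Mg2+ here)."""
        if not self.phospho_his:
            raise RuntimeError("enzyme is not phosphorylated")
        if not magnesium:
            raise RuntimeError("this toy model requires magnesium for transfer")
        if not acceptor_ndp.endswith("DP"):
            raise ValueError("acceptor must be a nucleoside diphosphate")
        self.phospho_his = False
        return acceptor_ndp[:-2] + "TP"


ndk = NDKToy()
print(ndk.accept_phosphate("ATP"))  # ADP
print(ndk.donate_phosphate("GDP"))  # GTP
```

    The point the model makes is that the two substrates never need to be bound at the same time: ATP leaves as ADP before the acceptor NDP arrives, and the phosphate is carried between them by the enzyme itself.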

    \

    NDK isozymes have been sequenced from prokaryotic and eukaryotic sources. It has also been shown PUBMED:2175255 that the Drosophila awd (abnormal wing discs) protein is a microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of NDK has been highly conserved through evolution. There is a single histidine residue conserved in all known NDK isozymes, which is involved in the catalytic mechanism PUBMED:1851158. Our signature pattern contains this residue.

    \ 304 IPR006836 \ This family includes several uncharacterised proteins from Caenorhabditis elegans.\ 1683 IPR000675 \

    Aerial plant organs are protected by a cuticle composed of an insoluble polymeric structural compound, cutin, which is a polyester composed of hydroxy and hydroxyepoxy fatty acids PUBMED:. Plant pathogenic fungi produce extracellular degradative enzymes PUBMED:1557023 that play an important role in pathogenesis. They include cutinase, which hydrolyses cutin, facilitating fungal penetration through the cuticle. Inhibition of the enzyme can prevent fungal infection through intact cuticles. Cutin monomers released from the cuticle by small amounts of cutinase on fungal spore surfaces can greatly increase the amount of cutinase secreted by the spore, by a mechanism that is as yet unknown PUBMED:, PUBMED:1557023.

    \

    Cutinase is a serine esterase containing the classical Ser, His, Asp triad of serine hydrolases PUBMED:. The protein belongs to the alpha-beta class, with a central beta-sheet of 5 parallel strands covered by 5 helices on either side of the sheet. The active site cleft is partly covered by 2 thin bridges formed by amino acid side chains, in contrast to the hydrophobic lid possessed by other lipases PUBMED:1560844. The protein also contains 2 disulphide bridges, which are essential for activity; their cleavage results in complete loss of enzymatic activity PUBMED:. Two cutinase-like proteins (MtCY39.35 and MtCY339.08c) have been found in the genome of the bacterium Mycobacterium tuberculosis.

    \ 3714 IPR005313 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
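
    The naming rule above (first character of a family or clan identifier gives the catalytic type) and the two nucleophile classes can be captured in a small lookup table. The sketch below encodes exactly the letters and groupings listed in the paragraph; the function name classify is hypothetical and is not part of any MEROPS API, and it does not validate the characters after the first.

```python
# First character of a MEROPS family/clan identifier -> catalytic type (from the text).
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing families of more than one type)",
}

# Nucleophile classes as described in the text.
AMINO_ACID_NUCLEOPHILE = {"serine", "threonine", "cysteine"}
WATER_NUCLEOPHILE = {"aspartic", "glutamic acid", "metallo"}


def classify(identifier: str) -> tuple:
    """Return (catalytic type, nucleophile) for an identifier such as 'A21' or 'AB'."""
    ident = identifier.strip().upper()
    if not ident or ident[0] not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    ctype = CATALYTIC_TYPES[ident[0]]
    if ctype in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid side chain (acyl intermediate)"
    elif ctype in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = None  # unknown or mixed catalytic type
    return ctype, nucleophile


print(classify("A21"))  # the family described in this entry
print(classify("AB"))   # its clan
```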

    \ \

    Aspartic endopeptidases () of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (groups of evolutionarily related proteins) and further subdivided into families, largely on the basis of their tertiary structure.

    \

    This group of aspartic peptidases belongs to the MEROPS family A21 (clan AB). The protein fold of the peptidase active site domain for members of this family is that of the nodavirus endopeptidase, the type example for clan AB. The type example for the family is the tetravirus endopeptidase from Nudaurelia capensis omega virus. Members of this family are found as a capsid protein in some of the tetraviridae.

    \ 3838 IPR006516 \

    This set of sequences represents a family of phage and plasmid replication proteins. In bacteriophage IKe and related phage, the full-length protein is designated gene II protein. A much shorter protein of unknown function, translated from a conserved in-frame alternative initiator, is designated gene X protein. Members of this family also include plasmid replication proteins.

    \ 2028 IPR007140 \ This motif occurs in a small set of bacterial proteins. It has two transmembrane regions and often occurs as tandem repeats. There are no conserved catalytic residues.\ 8046 IPR013178 \

    This is a fungal family of proteins whose function is unknown.

    \ 6541 IPR009589 \

    This family consists of several hypothetical proteins specific to Oceanobacillus and Bacillus species. Members of this family are typically around 130 residues in length. The function of this family is unknown.

    \ 5653 IPR008735 \ This family consists of the mammalian specific protein beta-microseminoprotein. Prostatic secretory protein of 94 amino acids (PSP94), also called beta-microseminoprotein, is a small, nonglycosylated protein, rich in cysteine residues. It was first isolated as a major protein from Homo sapiens seminal plasma PUBMED:10639193. The exact function of this protein is unknown.\ 6758 IPR009698 \

    This family consists of several hypothetical proteins of around 200 residues in length. The function of this family is unknown although a number of family members are thought to be putative membrane proteins.

    \ 4866 IPR005359 \

    This family contains a set of short bacterial proteins of unknown function.

    \ 6632 IPR010653 \

    This family consists of a number of bacterial lipoproteins often known as NlpB or DapX. This lipoprotein is detected in outer membrane vesicles in Escherichia coli and appears to be nonessential PUBMED:1885529.

    \ 3605 IPR003483 \ This is a family of outer surface proteins (Osp) from the Borrelia spp. spirochete PUBMED:8982001. The family includes OspE, OspF, and OspEF-related proteins (Erp) PUBMED:8655548. These proteins are coded for on different circular plasmids in the Borrelia genome.\ 3906 IPR000959 \

    A subgroup of serine/threonine protein kinases, Polo or Polo-like kinases play multiple roles during the cell cycle. Polo kinases are required at several key points through mitosis, starting from control of the G2/M transition through phosphorylation of Cdc25C and mitotic cyclins. Polo kinases are characterised by an amino-terminal catalytic domain and a carboxy-terminal non-catalytic domain consisting of three blocks of conserved sequences known as polo boxes, which form one single functional domain PUBMED:9914175. The domain is named after its founding member, encoded by the polo gene of Drosophila melanogaster PUBMED:1660828. This domain of around 70 amino acids has been found in species ranging from yeast to mammals. Polo boxes appear to mediate interaction with multiple proteins through protein:protein interactions; some but not all of these proteins are substrates for the kinase domain of the molecule PUBMED:12615979.

    \

    The crystal structure of the polo domain of the murine protein Sak is dimeric, consisting of two alpha-helices and two six-stranded beta-sheets PUBMED:12352953. The topology of one polypeptide subunit of the dimer consists of, from its N- to C-terminus, an extended strand segment, five beta-strands, one alpha-helix (A) and a C-terminal beta-strand. Beta-strands from one subunit form a contiguous antiparallel beta-sheet with beta-strands from the second subunit. The two beta-sheets pack with a crossing angle of 110°, orienting the hydrophobic surfaces inward and the hydrophilic surfaces outward. Helix A, which is colinear with beta-strand 6 of the same polypeptide, buries a large portion of the non-overlapping hydrophobic beta-sheet surfaces. Interactions involving the two A helices make up the majority of the hydrophobic core and also the dimer interface.

    \

    Point mutations in the Polo box of the budding yeast Cdc5 protein abolish the ability of overexpressed Cdc5 to interact with the spindle poles and to organize cytokinetic structures PUBMED:10594031.

    \ 1899 IPR003769 \

    In the bacterial cytosol, ATP-dependent protein degradation is performed by several different chaperone-protease pairs, including ClpAP. ClpS directly influences the ClpAP machine by binding to the N-terminal domain of the chaperone ClpA. The degradation of ClpAP substrates, both SsrA-tagged proteins and ClpA itself, is specifically inhibited by ClpS. ClpS modifies ClpA substrate specificity, potentially redirecting degradation by ClpAP toward aggregated proteins PUBMED:11931773.

    \

    ClpS is a small alpha/beta protein that consists of three alpha-helices connected to three antiparallel beta-strands PUBMED:12426582. The protein has a globular shape, with a curved layer of three antiparallel alpha-helices over a twisted antiparallel beta-sheet. Dimerization of ClpS may occur through its N-terminal domain. This short extended N-terminal region in ClpS is followed by the central seven-residue beta-strand, which is flanked by two other beta-strands in a small beta-sheet.

    \ 5913 IPR010346 \

    This family consists of several bacterial and phage lipoprotein Rz1 precursors. Rz1 is a proline-rich lipoprotein from bacteriophage lambda that is known to have fusogenic properties. Rz1-induced liposome fusion is thought to be mediated primarily by the generation of local perturbations in the lipid bilayer and, to a lesser extent, by electrostatic forces PUBMED:10651816.

    \ 5436 IPR008496 \ This family consists of several eukaryotic proteins of unknown function.\ 1404 IPR007520 \

    This domain is the C terminus of Saccharomyces cerevisiae Bul1. Bul1 binds the ubiquitin ligase Rsp5 via an N-terminal PPSY motif (residues 157-160) PUBMED:9931424. The complex containing Bul1 and Rsp5 is involved in intracellular trafficking of the general amino acid permease Gap1 PUBMED:11500494, degradation of Rog1 in cooperation with Bul2 and GSK-3 PUBMED:10958669, and mitochondrial inheritance PUBMED:10366593. Bul1 may contain HEAT repeats. The N terminus is .

    \ 2198 IPR002739 \

    These archaebacterial proteins have no known function.

    \ 6676 IPR010671 \

    This entry describes several repeats that appear to be specific to archaea of the genus Methanosarcina and are often found in multiple copies in disaggregatase proteins. Members of this family are also found as single copies in several hypothetical proteins.

    \ 7628 IPR012438 \

    This approximately 50-residue region is found in a number of sequences derived from hypothetical plant proteins. This region features a highly basic 5 amino-acid stretch towards its centre.

    \ 2910 IPR006882 \ This is a family of Herpesvirus proteins sharing a conserved region present in the ORF11 protein.\ 6196 IPR009104 \

    Sea anemones are a rich source of lethal pore-forming peptides and proteins, known collectively as cytolysins or actinoporins. There are several different groups of cytolysins based on their structure and function PUBMED:11689232. This entry represents the most numerous group, the 20-kDa highly basic peptides. These cytolysins form cation-selective pores in sphingomyelin-containing membranes. Examples include equinatoxins (from Actinia equina), sticholysins (from Stichodactyla helianthus), magnificalysins (from Heteractis magnifica), and tenebrosins (from Actinia tenebrosa), which exhibit pore-forming, haemolytic, cytotoxic, and heart stimulatory activities.

    \

    Cytolysins adopt a stable soluble structure that undergoes a conformational change on contact with a membrane, leading to an active, membrane-bound form that inserts spontaneously into the membrane. They often oligomerise on the membrane surface before puncturing the lipid bilayer, causing the cell to lyse. The 20-kDa sea anemone cytolysins require a phosphocholine lipid headgroup for binding; however, sphingomyelin is required for the toxin to promote membrane permeability PUBMED:14604518. The crystal structures of equinatoxin II PUBMED:11827489 and sticholysin II PUBMED:14604522 both revealed a compact beta-sandwich consisting of ten strands in two sheets, flanked on each side by two short alpha-helices, a topology similar to that of osmotin. The beta-sandwich is believed to attach to the membrane, while a three-turn alpha-helix lying on the surface of the beta-sheet may be involved in pore formation, possibly by penetrating the membrane.

    \ \ 7342 IPR011119 \

    The members of this family are restricted to the Gammaproteobacteria. Some members have been annotated as helicases, conjugative relaxases or nickases. The majority contain an HD domain, which is found in a superfamily of enzymes with predicted or known phosphohydrolase activity. These enzymes appear to be involved in nucleic acid metabolism, signal transduction and possibly other functions in bacteria, archaea and eukaryotes.

    \ \ \ 7647 IPR012442 \

    These sequences are derived from a number of hypothetical plant proteins. The region in question is approximately 270 amino acids long. Some members of this family are annotated as yeast pheromone receptor protein AR781, but no literature was found to support this annotation.

    \ 7077 IPR010829 \

    This family contains a number of fungal cerato-platanin phytotoxic proteins approximately 150 residues long. Cerato-platanin contains four cysteine residues that form two disulphide bonds PUBMED:10455173.

    \ 2260 IPR006851 \ This is a family of chloroplast proteins of unknown function. Some members have two copies of the conserved region.\ 5216 IPR008442 \

    This signature is found at the N terminus of serine carboxypeptidases, which belong to MEROPS peptidase family S10. This region contains the signal peptide and pro-peptide regions PUBMED:8789258,PUBMED:10077185.

    \ 848 IPR004179 \ This domain was named after the yeast Sec63 (or NPL1) protein in which it was found. This protein is required for preprotein translocation. Other yeast proteins containing this domain include pre-mRNA splicing helicase BRR2, HFM1 protein and putative helicases.\ 1618 IPR004398 \ This is a family of conserved hypothetical proteins, which includes a putative methylase.\ 6085 IPR009366 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 4033 IPR004928 \

    Photosystem I, a membrane complex found in the chloroplasts of plants and in cyanobacteria, uses light energy to transfer electrons from plastocyanin to ferredoxin. The electron transfer components of the photosystem include the primary electron donor chlorophyll P-700 and five electron acceptors: chlorophyll (A0), phylloquinone (A1) and three 4Fe-4S iron-sulphur centres, designated Fx, Fa and Fb. This protein, subunit VI (PsaH), may be involved in docking the light-harvesting complex I antenna to the core complex.

    \ 4552 IPR002013 \ Synaptic vesicles are recycled with remarkable speed and precision in nerve terminals. A major recycling pathway involves clathrin-mediated endocytosis at endocytic zones located around sites of release. Different 'accessory' proteins linked to this pathway have been shown to alter the shape and composition of lipid membranes, to modify membrane-coat protein interactions, and to influence actin polymerization. These include the GTPase dynamin, the lysophosphatidic acid acyl transferase endophilin, and the phosphoinositide phosphatase synaptojanin PUBMED:9851978. \

    The yeast protein that acts as a recessive suppressor of secretory defects in the Golgi and of defects in actin function belongs to this family. This protein may be involved in coordinating the activities of the secretory pathway and the actin cytoskeleton.

    \ \

    Human synaptojanin, which may be localised on coated endocytic intermediates in nerve terminals, also belongs to this family.

    \ 3256 IPR001695 \

    Lysyl oxidase (LOX) PUBMED:8104038 is an extracellular copper-dependent enzyme that catalyzes the oxidative deamination of peptidyl lysine residues in precursors of various collagens and elastins, yielding alpha-aminoadipic-delta-semialdehyde. The deaminated lysines are then able to form semialdehyde cross-links, resulting in the formation of insoluble collagen and elastin fibres in the extracellular matrix PUBMED:1357535.

    \

    The active site of LOX lies towards the C terminus; this region also binds a single copper atom in an octahedral coordination complex involving at least three His residues PUBMED:1352776. Four histidine residues are clustered in a central region of the enzyme. This region is thought to be involved in copper binding and is called the 'copper-talon' PUBMED:8104038.

    \ 7642 IPR012495 \

    The members of this family are similar to a region of the protein product of the bacterial tadE locus. In various bacterial species, the tad locus is closely linked to flp-like genes, which encode proteins required for the production of pili involved in adherence to surfaces PUBMED:11553455. The tad loci are thought to encode proteins that assemble or export an Flp pilus in various bacteria PUBMED:11553455. All tad locus products except TadA have putative transmembrane regions PUBMED:11553455, and indeed the region covered by this family has a high proportion of hydrophobic amino acid residues.

    \ 6148 IPR008110 \

    Periodontal disease in humans is a major health problem in the developed world and is caused by a number of specialised pathogens that inhabit the oral cavity. Among the bacterial species culturable from periodontal lesions are the streptococci Streptococcus mutans and Streptococcus sobrinus, and the Gram-negative anaerobe Porphyromonas (Bacteroides) gingivalis PUBMED:2895100. The latter bacterium has been implicated as the causative agent of periodontitis, pulpal infections and tonsillar abscesses PUBMED:2895100.\

    \

    Adherence of Porphyromonas gingivalis to the periodontal surface is mediated by its fimbriae, a major virulence factor PUBMED:1987052. These fimbriae differ from the polymeric type I and type IV fimbriae/pili of other pathogenic Gram-negative bacteria in being much simpler, consisting of only a single repeating fimbrillin subunit, Fma1/FimA. Fma1/FimA has a molecular weight of 43 kDa and can exhibit antigenic diversity among different Porphyromonas gingivalis strains PUBMED:1987052. Unusually, this form of fimbrillin possesses a much longer leader peptide than the fimbrial subunits of other bacteria PUBMED:1987052. It has been hypothesised that this allows for the maturation of the preprotein during secretion PUBMED:1987052.\

    \

    Recently, a study of the different antigenic types of P. gingivalis fimbrillin classified them into five distinct groups based on their gene sequences PUBMED:11748193. Investigations into the functional differences of each type revealed that in the majority of periodontitis cases, bacterial strains possessing type II Fma1/FimA were the most prevalent PUBMED:11748193; in healthy adults, type I strains were the most common. This suggests that strains carrying particular fimbrillin types are associated with periodontal disease.\

    \ \ 1253 IPR007290 \ Arv1 is a transmembrane protein with potential zinc-binding motifs. ARV1 is a novel mediator of eukaryotic sterol homeostasis PUBMED:11063737.\ 6910 IPR010765 \

    This family consists of several hypothetical proteins from both cyanobacteria and plants. Members of this family are typically around 250 residues in length. The function of this family is unknown but the species distribution indicates that the family may be involved in photosynthesis.

    \ 2420 IPR004954 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
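    As an illustration of the motif definition above, the core HEXXH pattern can be located in a protein sequence with a regular expression. This is a minimal sketch, not how InterPro or MEROPS actually detect metalloproteases: it encodes only the HEXXH core plus the stated exclusion of proline, and the function name `find_hexxh` is hypothetical. The stricter abXHEbbHbc form would additionally need explicit residue sets for the 'b' and 'c' classes, which are not enumerated here.

```python
import re

# Core zinc-binding motif His-Glu-X-X-His. The two X positions exclude
# proline, following the note that proline is never found in this helical site.
# The zero-width lookahead lets overlapping candidates be reported.
HEXXH = re.compile(r"(?=(HE[^P][^P]H))")


def find_hexxh(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for every HEXXH hit in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in HEXXH.finditer(seq)]


if __name__ == "__main__":
    # Made-up test sequence: one valid motif, one rejected because of proline.
    print(find_hexxh("MKVTHELGHAAVHEPAH"))
```

    For the made-up sequence in the example, only the hit at position 4 (`HELGH`) is reported; the second candidate, `HEPAH`, is rejected because it has proline at an X position. A match on its own is only a hint, since the text notes that the HEXXH motif is relatively common.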

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
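    The naming scheme described above amounts to a lookup on the first character of a family or clan identifier. The following minimal Python sketch encodes the letters and nucleophile types exactly as listed in the text. The dictionaries and the helper names `catalytic_type` and `nucleophile` are hypothetical conveniences, not part of any MEROPS software. The example identifiers are families M60 and S10, which are mentioned elsewhere in this document.

```python
# First character of a MEROPS family or clan identifier -> catalytic type,
# as listed in the text. "P" marks a clan whose families span several types.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan only)",
}

# Serine, threonine and cysteine peptidases use an amino acid residue as the
# nucleophile (acyl intermediate); aspartic, glutamic and metallopeptidases
# use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Map an identifier such as 'M60' or 'S10' to its catalytic type."""
    return CATALYTIC_TYPES[identifier[0].upper()]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the identifier's catalytic type."""
    letter = identifier[0].upper()
    if letter in RESIDUE_NUCLEOPHILE:
        return "amino acid residue"
    if letter in WATER_NUCLEOPHILE:
        return "activated water"
    return "unspecified"


if __name__ == "__main__":
    for fam in ("M60", "S10"):
        print(fam, catalytic_type(fam), nucleophile(fam))
```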

    \ \

    This group of metallopeptidases belongs to the MEROPS peptidase family M60 (enhancin family, clan MA(E)). The active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH. \ The viral enhancin protein, or enhancing factor, is involved in disruption of the peritrophic membrane and fusion of nucleocapsids with mid-gut cells.

    \ \ 3973 IPR005059 \

    The DNA-dependent RNA polymerase from vaccinia virions has a molecular weight of approximately 500 kDa and can be dissociated into putative subunits of 140, 137, 37, 35, 31, 22 and 17 kDa. This group represents the 35 kDa subunit of this DNA-directed RNA polymerase. DNA-dependent RNA polymerases are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA.

    \ 7208 IPR009969 \

    This family consists of several Pneumovirus M2 proteins. The M2-1 protein of respiratory syncytial virus (RSV) is a transcription processivity factor that is essential for virus replication PUBMED:12692207.

    \ 6330 IPR009480 \

    This family consists of several equine infectious anaemia virus S2 proteins. The function of this family is unknown.

    \ 415 IPR004993 \ Transcription of the GH3 gene family has been shown to be specifically induced by the plant hormone auxin. The auxin-responsive GH3 gene promoter is composed of multiple auxin response elements (AuxREs), and each AuxRE contributes incrementally to the strong auxin inducibility of the promoter.\ 5999 IPR009326 \

    This is a family of bacterial proteins with unknown function.

    \ 5212 IPR008595 \ This is a group of Bacillus DegS proteins. The DegS-DegU two-component regulatory system of Bacillus subtilis controls various processes that characterise the transition from the exponential to the stationary growth phase, including the induction of extracellular degradative enzymes, expression of late competence genes and down-regulation of the sigma D regulon PUBMED:12471443. The entry also contains one sequence from Thermoanaerobacter tengcongensis which is described as a sensory transduction histidine kinase.\ 1674 IPR000269 \

    Amine oxidases (AO) are enzymes that catalyze the oxidation of a wide range of biogenic amines, including many neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing and copper-containing. Copper-containing AOs act as disulphide-linked homodimers. They catalyse the oxidation of primary amines to aldehydes, with the subsequent release of ammonia and hydrogen peroxide (RCH2NH2 + O2 + H2O → RCHO + NH3 + H2O2), a reaction that requires one copper ion per subunit and topaquinone as cofactor PUBMED:8591028. Copper-containing amine oxidases are found in bacteria, fungi, plants and animals. In prokaryotes, the enzyme enables various amine substrates to be used as sources of carbon and nitrogen PUBMED:9048544, PUBMED:9405045. In eukaryotes they have a broader range of functions, including cell differentiation and growth, wound healing, detoxification and cell signalling PUBMED:8805580.

    \

    The copper amine oxidases occur as mushroom-shaped homodimers of 70-95 kDa, each monomer containing a copper ion and a covalently bound redox cofactor, topaquinone (TPQ). TPQ is formed by post-translational modification of a conserved tyrosine residue. The copper ion is coordinated with three histidine residues and two water molecules in a distorted square pyramidal geometry, and has a dual function in catalysis and TPQ biogenesis. The catalytic domain is the largest of the 3-4 domains found in copper amine oxidases, and consists of a beta sandwich of 18 strands in two sheets. The active site is buried and requires a conformational change to allow the substrate access.

    \ 4489 IPR005562 \

    Members of this family are all transcribed from the spoVA operon. These proteins are poorly characterised, but are thought to be involved in dipicolinic acid transport into the developing forespore during sporulation PUBMED:11751839.

    \ 7527 IPR011622 \ This entry represents one of two distinct types of extracellular domain found in the 7TM-DISM (7TM Receptors with Diverse Intracellular Signalling Modules) bacterial transmembrane proteins PUBMED:12914674. It is possible that this domain adopts a jelly roll fold and acts as a receptor for carbohydrates and their derivatives PUBMED:12914674.\ 3526 IPR007264 \ Nop10p is a nucleolar protein that is specifically associated with H/ACA snoRNAs. It is essential for normal 18S rRNA production and rRNA pseudouridylation by the ribonucleoprotein particles containing H/ACA snoRNAs (H/ACA snoRNPs). Nop10p is probably necessary for the stability of these RNPs PUBMED:9843512.\ 833 IPR008138 \ Saposins are small lysosomal proteins that serve as activators of various lysosomal lipid-degrading enzymes PUBMED:7595087. They probably act by isolating the lipid substrate from the membrane surroundings, thus making it more accessible to the soluble degradative enzymes. All mammalian saposins are synthesized as a single precursor molecule (prosaposin), which contains four Saposin-B domains that yield the active saposins after proteolytic cleavage, and two Saposin-A domains that are removed in the activation reaction. \ The Saposin-B domains also occur in other proteins, many of them active in the lysis of membranes PUBMED:8003971, PUBMED:8868085.\ \ 459 IPR007502 \ This presumed domain is about 90 amino acid residues in length. It is found in a diverse set of RNA helicases. Its function is unknown; however, it seems likely to be involved in nucleic acid binding.\ 6187 IPR009064 \

    Protozoan pheromones are cell-type-specific protein signals. Representatives of this family include Er-1, Er-2, Er-10, Er-11 and Er-22 from the ciliated protozoan Euplotes raikovi, which are constitutively secreted and bound back in an autocrine fashion with a positive effect on mitotic cell growth. The mitogenic activity induced by Er pheromone autocrine signalling can be inhibited by cAMP PUBMED:12681291. The NMR structure reveals a closed up-and-down bundle of three helices with a left-handed twist PUBMED:7833812. In some cases, these pheromones can compete with each other for binding to their cell-surface receptors.

    \ \ 1759 IPR000422 \ 3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the biosynthetic precursor for the xylene ring of riboflavin PUBMED:9211332. The enzyme that forms it, 3,4-dihydroxy-2-butanone 4-phosphate (DHBP) synthase, is sometimes found as part of a bifunctional enzyme together with GTP cyclohydrolase II, which catalyses the first committed step in the biosynthesis of riboflavin.\

    No sequences with significant homology to DHBP synthase are found in the metazoa.

    \ 6894 IPR009773 \

    This family consists of several Lactococcus phage middle-3 (M3) proteins of around 160 residues in length. The function of this family is unknown.

    \ 3799 IPR013078 \

    Phosphoglycerate mutase (PGAM) and bisphosphoglycerate mutase (BPGM) are structurally related enzymes that catalyse reactions involving the transfer of phospho groups between the three carbon atoms of phosphoglycerate PUBMED:2847721, PUBMED:2831102, PUBMED:10958932. Both enzymes can catalyse three different reactions with different specificities: the isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate (2,3-DPG) as the primer of the reaction, the synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer, and the degradation of 2,3-DPG to 3-PGA (phosphatase activity).
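    The three activities listed above can be summarised as net reactions. This summary is a sketch: the primers are written over the arrows, and P_i (inorganic phosphate) is notation introduced here rather than taken from the original entry.

```latex
\begin{align*}
\text{2-PGA} &\overset{\text{2,3-DPG (primer)}}{\rightleftharpoons} \text{3-PGA} && \text{(mutase)}\\
\text{1,3-DPG} &\overset{\text{3-PGA (primer)}}{\longrightarrow} \text{2,3-DPG} && \text{(synthase)}\\
\text{2,3-DPG} + \mathrm{H_2O} &\longrightarrow \text{3-PGA} + \mathrm{P_i} && \text{(phosphatase)}
\end{align*}
```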

    \

    In mammals, PGAM is a dimeric protein with two isoforms, the M (muscle) and B (brain) forms. In yeast, PGAM is a tetrameric protein.

    BPGM is a dimeric protein and is found mainly in erythrocytes where it plays a major role in regulating haemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration. The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate PUBMED:6294454.

    A number of other proteins contain this domain, including the bifunctional enzyme 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase PUBMED:2557623, which catalyses both the synthesis and the degradation of fructose-2,6-bisphosphate, and bacterial alpha-ribazole-5'-phosphate phosphatase, which is involved in cobalamin biosynthesis PUBMED:7929373.

    \ 5822 IPR008081 \

    Cytoplasmic fragile X mental retardation protein (FMRP) interacting protein belongs to a highly conserved but, as yet, functionally uncharacterised family. Absence of FMRP is responsible for pathologic manifestations in Fragile X Syndrome, the most frequent cause of inherited mental retardation PUBMED:10449408. FMRP is an RNA-binding protein that may have a role in local protein translation at neuronal dendrites and in dendritic spine maturation PUBMED:10449408. CYFIP1 and CYFIP2, which share a high level of sequence identity, have recently been identified as cytoplasmic FMRP interacting proteins PUBMED:10449408. CYFIP2 interacts with FMRP-related proteins FXR1P/2P, while CYFIP1 interacts exclusively with FMRP. The FMRP-CYFIP interaction involves the domain of FMRP that also mediates homo- and heteromerisation, suggesting competition between the various interaction partners. CYFIP1 also interacts with the small GTPase Rac1 implicated in development and maintenance of neuronal structures. CYFIP1/2 are both present in synaptosomal extracts PUBMED:10449408. \

    \

    PIR121 (121F-specific p53 inducible RNA) is another functionally uncharacterised member of this family. The PIR121 gene maps to human chromosome 5q34, a region frequently translocated in acute myeloid leukaemia but not known to be amplified or deleted in solid tumours. Interaction between PIR121 and FMRP has been demonstrated, and hence PIR121 has also been termed CYFIP2 (Cytoplasmic FMRP Interacting Protein 2) PUBMED:10449408, PUBMED:9756361.\

    \

    Shyc (Selective HYbridizing Clone) is a cytoplasmic protein of unknown function, expressed in the developing and embryonic nervous system. The protein has also been designated CYFIP1 due to its high sequence identity (98.7%) to its human orthologue. The CYFIP orthologues in Caenorhabditis elegans and Drosophila melanogaster share about 51% and 67% sequence identity with the human proteins, respectively PUBMED:10449408. The high level of conservation throughout the entire CYFIP sequence among various orthologues suggests the presence of several functionally or structurally important domains.

    \ 4499 IPR001190 \

    The egg peptide speract receptor is a transmembrane glycoprotein PUBMED:8140623. Other members of this family include the macrophage scavenger receptor type I (a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during atherogenesis), an enteropeptidase and T-cell surface glycoprotein CD5 (which may act as a receptor in regulating T-cell proliferation).

    \ 1142 IPR003460 \

    Antifreeze proteins (AFPs) are a class of proteins that are able to bind to and inhibit the growth of macromolecular ice, thereby permitting an organism to survive subzero temperatures by decreasing the probability of ice nucleation in their bodies PUBMED:15291806. These proteins have been characterized from a variety of organisms, including fish, plants, bacteria, fungi and arthropods. This entry represents insect AFPs of the type found in the yellow mealworm Tenebrio molitor and in the pyrochroid beetle Dendroides canadensis.

    \

    The structure of these AFPs consists of a right-handed beta-helix with 12 residues per coil. The beta-helices of insect AFPs present a highly rigid array of threonine residues and bound water molecules that can effectively mimic the ice lattice. As such, beta-helical AFPs provide a more effective coverage of the ice surface compared to the alpha-helical fish AFPs PUBMED:10917536.

    \

    A second insect antifreeze protein, from Choristoneura fumiferana, also consists of beta-helices; however, in these proteins the beta-helices are left-handed. These proteins show no sequence homology to the current entry but may act by a similar mechanism. The beta-helix motif may be used as an AFP structural motif in non-homologous proteins from other (non-fish) organisms as well.

    \ \ 5543 IPR008899 \ This (predicted) zinc finger is found in the bassoon and piccolo proteins. There are eight conserved cysteines, suggesting that it coordinates two zinc ions.\ 2236 IPR007650 \ This is a family of uncharacterised proteins.\ 1229 IPR003313 \ This entry defines the arabinose-binding and dimerisation domain of the bacterial gene regulatory protein AraC. \ The crystal structure of the arabinose-binding and dimerization domain of the Escherichia coli gene regulatory protein AraC was determined in the presence and absence of L-arabinose. The arabinose-bound molecule shows that the protein adopts an unusual fold, binding sugar within a beta barrel and completely burying the arabinose with the amino-terminal arm of the protein. Dimer contacts in the presence of arabinose are mediated by an antiparallel coiled-coil. In the uncomplexed protein, the amino-terminal arm is disordered, uncovering the sugar-binding pocket and allowing it to serve as an oligomerization interface PUBMED:9103202.\ 2674 IPR003463 \ This family includes short insect peptides (23 amino acids) that contain a single disulphide bridge. The family includes growth-blocking peptide (GBP) of Pseudaletia separata and the paralytic peptides from Manduca sexta, Heliothis virescens and Spodoptera exigua PUBMED:2071576, as well as plasmatocyte-spreading peptide (PSP1) PUBMED:9988679. These peptides function to halt metamorphosis from larvae to pupae.\ 7879 IPR012565 \

    This family consists of the leader peptide of the histidine (his) operon. The his operon contains all the genes necessary for histidine biosynthesis. The region corresponding to the untranslated 5' end of the transcript, named the his leader region, displays the typical features of the T box transcriptional attenuation mechanism, which is involved in the regulation of many amino acid biosynthetic operons PUBMED:10094678.

    \ 910 IPR007582 \ This region, possibly a domain, is found in subunits of transcription factor TFIID. The function of this region is unknown.\ 6689 IPR010677 \

    This family consists of several BALF1 proteins, which seem to be specific to the Lymphocryptoviruses. BALF1 inhibits the antiapoptotic activity of EBV BHRF1 and of KSBcl-2 PUBMED:11836425.

    \ 1613 IPR001242 \ This domain is found in many multi-domain enzymes that synthesize peptide antibiotics. It catalyses a condensation reaction to form peptide bonds in non-ribosomal peptide biosynthesis. It is usually found on the carboxy-terminal side of a phosphopantetheine-binding (pp-binding) domain. Mutations in the HHXXXDG motif have been shown to abolish activity, suggesting that this motif is part of the active site PUBMED:9712910. \ 2075 IPR007302 \

    This is a domain of unknown function. It sometimes occurs in combination with two other domains of unknown function, DUF403 and DUF404.

    \ 179 IPR006045 \

    This family represents the conserved barrel domain of the 'cupin' superfamily ('cupa' is the Latin term for a small barrel). This family contains 11S and 7S plant seed storage proteins, and germins. Plant seed storage proteins provide the major nitrogen source for the developing plant.

    \ 2149 IPR007438 \ This family includes several proteins of uncharacterised function.\ 4789 IPR003670 \ The UK protein is an African swine fever virus (ASFV) protein that is highly conserved amongst strains. Data indicate that the highly conserved UK gene of ASFV, while nonessential for growth in macrophages in vitro, is an important viral virulence determinant for domestic pigs PUBMED:9444996.\ 4239 IPR001931 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. These proteins have 82 to 87 amino acids. The amino termini are all N alpha-acetylated. The N-terminal halves of the protein molecules are highly conserved, in contrast to the carboxy-terminal parts PUBMED:3910104.

    \ 4494 IPR005530 \

    A short repeat found in a small family of membrane-bound proteins. This repeat contains a conserved SPW motif in the first of two transmembrane helices.

    \ 3436 IPR007406 \

    This is the N-terminal region of MukB. MukB is involved in the segregation and condensation of prokaryotic chromosomes. MukE and MukF interact with MukB in vivo, forming a complex that is required for chromosome condensation and segregation in Escherichia coli PUBMED:10545099. The Muk complex appears to be similar to the SMC-ScpA-ScpB complex in other prokaryotes, with MukB being the homologue of SMC PUBMED:12065423. ScpA and ScpB have little sequence similarity to MukE or MukF, though they are predicted to be structurally similar, being predominantly alpha-helical with coiled coil regions.

    \ \ \

    The structure of the N-terminal domain consists of an antiparallel six-stranded beta sheet surrounded by one helix on one side and by five helices on the other side PUBMED:10545328. It contains an exposed Walker A loop in an unexpected helix-loop-helix motif. In other proteins, Walker A motifs generally adopt a P loop conformation as part of a strand-loop-helix motif embedded in a conserved topology of alternating helices and (parallel) beta strands PUBMED:10545328.

    \ \ 426 IPR000757 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 16 comprises enzymes with a number of known activities: lichenase (), xyloglucan xyloglucosyltransferase (), agarase (), kappa-carrageenase (), endo-beta-1,3-glucanase (), endo-beta-1,3-1,4-glucanase () and endo-beta-galactosidase ().

    \ 1755 IPR007677 \ The precise function of this protein is unknown. A deletion/insertion mutation is associated with an autosomal dominant form of non-syndromic hearing impairment PUBMED:9771715. This protein has also been found to contribute to acquired etoposide resistance in melanoma cells PUBMED:11297734.\ 627 IPR000940 \

    Methyl transfer from the ubiquitous S-adenosyl-L-methionine (AdoMet) to either nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms ranging from bacteria to plants and mammals. The reaction is catalyzed by methyltransferases (Mtases) and modifies DNA, RNA, proteins and small molecules, such as catechol, for regulatory purposes. The various aspects of the role of DNA methylation in prokaryotic restriction-modification systems and in a number of cellular processes in eukaryotes, including gene regulation and differentiation, are well documented.

    \ \

    Three classes of DNA Mtases transfer the methyl group from AdoMet to the target base to form either N-6-methyladenine, N-4-methylcytosine or C-5-methylcytosine. In C-5-cytosine Mtases, ten conserved motifs are arranged in the same order PUBMED:8127644. Motif I (a glycine-rich or closely related consensus sequence; FAGxGG in M.HhaI PUBMED:8343957), shared by other AdoMet-Mtases PUBMED:2684970, is part of the cofactor binding site, and motif IV (PCQ) is part of the catalytic site. In contrast, sequence comparison among N-6-adenine and N-4-cytosine Mtases revealed two conserved segments PUBMED:2690010, although more may be present. One of them corresponds to motif I in C-5-cytosine Mtases, and the other is named (D/N/S)PP(Y/F). Crystal structures are known for a number of Mtases PUBMED:7607476, PUBMED:8343957, PUBMED:8127644, PUBMED:7971991. The cofactor binding sites are almost identical and the essential catalytic amino acids coincide. The comparable protein folding and the existence of equivalent amino acids in similar secondary and tertiary positions indicate that many (if not all) AdoMet-Mtases have a common catalytic domain structure. This permits tertiary structure prediction of other DNA, RNA, protein, and small-molecule AdoMet-Mtases from their amino acid sequences PUBMED:7897657.
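    The consensus notation used for these motifs can be read as a simple pattern language: 'x' stands for any residue, and a bracketed list such as (D/N/S) stands for any one of the listed residues. As an illustration only, the Python sketch below converts such a consensus into a regular expression and reports where it occurs in a protein sequence. It is not an InterPro or PROSITE tool; the function names `consensus_to_regex` and `find_motif` and the toy sequence are hypothetical, and the notation rules are an assumption based on the motifs quoted above.

```python
import re
from typing import List, Tuple


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Convert a simple consensus string into a compiled regex.

    Assumed notation (hypothetical, modelled on the motifs in the text):
      - 'x' matches any single residue
      - '(A/B/C)' matches exactly one of the listed residues
      - any other letter matches itself
    """
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            end = consensus.index(")", i)           # closing bracket of the choice list
            choices = consensus[i + 1:end].split("/")
            parts.append("[" + "".join(choices) + "]")
            i = end + 1
        elif ch == "x":
            parts.append(".")                       # wildcard residue
            i += 1
        else:
            parts.append(re.escape(ch))             # literal residue
            i += 1
    return re.compile("".join(parts))


def find_motif(sequence: str, consensus: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for each non-overlapping hit."""
    regex = consensus_to_regex(consensus)
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MKTDPPYLLFAGCGGQ"  # hypothetical sequence, not a real Mtase
    print(find_motif(toy, "(D/N/S)PP(Y/F)"))  # N-6/N-4 Mtase motif
    print(find_motif(toy, "FAGxGG"))          # motif I as in M.HhaI
```

For this hypothetical sequence, the (D/N/S)PP(Y/F) pattern is found at position 4 ("DPPY") and FAGxGG at position 10 ("FAGCGG"). Curated motif searches normally rely on profiles or carefully tuned patterns rather than a bare regex, so the sketch only shows how the notation maps onto sequence positions.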

    \ \

    Several cytoplasmic vertebrate methyltransferases are evolutionarily related PUBMED:8182091, including nicotinamide N-methyltransferase () (NNMT); phenylethanolamine N-methyltransferase () (PNMT); and thioether S-methyltransferase () (TEMT). NNMT catalyzes the N-methylation of nicotinamide and other pyridines to form pyridinium ions. This activity is important for the biotransformation of many drugs and xenobiotic compounds. PNMT catalyzes the last step in catecholamine biosynthesis, the conversion of noradrenalin to adrenalin; and TEMT catalyzes the methylation of dimethyl sulphide into trimethylsulphonium. All three enzymes use S-adenosyl-L-methionine as the methyl donor. They are proteins of 30 to 32 kDa.

    \ 7559 IPR011711 \

    Many bacterial transcription regulation proteins bind DNA through a helix-turn-helix (HTH) motif, which can be classified into subfamilies on the basis of sequence similarities. The HTH GntR family has many members distributed among diverse bacterial groups that regulate various biological processes. It was named GntR after the Bacillus subtilis repressor of the gluconate operon PUBMED:2060763. In general, these proteins contain a DNA-binding HTH domain at the N terminus, and an effector binding or oligomerisation domain at the C terminus. The winged-helix DNA-binding domain is well conserved in structure across the whole GntR family (), and is similar in structure to other transcriptional regulator families. The C-terminal effector-binding and oligomerisation domains are more variable and are consequently used to define the subfamilies. Based on the sequence and structure of the C-terminal domains, the GntR family can be divided into four major groups, as represented by FadR (), HutC, MocR and YtrA, as well as some minor groups such as those represented by AraR and PlmA PUBMED:11756427.

    \

    This entry represents the C-terminal ligand binding domain of many members of the GntR family. This domain probably binds to a range of effector molecules that regulate the transcription of genes through the action of the N-terminal DNA-binding domain. This domain is found in and that are regulators of sugar biosynthesis operons.

    \ \ 1530 IPR002545 \ CheW proteins are part of the chemotaxis signaling mechanism in bacteria. CheW interacts with the methyl-accepting chemotaxis proteins (MCPs) and relays signals to CheY, which affects flagellar rotation. This family includes CheW and other related proteins that are involved in chemotaxis. The CheW-like regulatory domain in CheA PUBMED:9989504 binds to CheW, suggesting that these domains can interact with each other.\ 1824 IPR007249 \ DopA is the founding member of the Dopey family and is required for correct cell morphology and spatiotemporal organisation of multicellular structures in the filamentous fungus Aspergillus nidulans. DopA homologues are found in mammals. Saccharomyces cerevisiae DOP1 is essential for viability and affects cellular morphogenesis PUBMED:10931277.\ 4741 IPR006677 \

    tRNA-intron endonucleases () cleave pre-tRNA, producing 5'-hydroxyl and 2',3'-cyclic phosphate termini, and specifically remove the intron PUBMED:9200602. This entry represents the C-terminal domain of tRNA-intron endonuclease.

    \ 5380 IPR008910 \ This alignment represents a conserved transmembrane helix as well as some flanking sequence. It is often found in association with .\ 5671 IPR008847 \ This domain consists of several eukaryotic suppressor of forked (Suf)-like proteins. The Drosophila melanogaster suppressor of forked [Su(f)] protein shares homology with the Saccharomyces cerevisiae RNA14 protein and the 77 kDa subunit of Homo sapiens cleavage stimulation factor, which are proteins involved in mRNA 3' end formation. This suggests a role for Su(f) in mRNA 3' end formation in Drosophila. The su(f) gene produces three transcripts; two of them are polyadenylated at the end of the transcription unit, and one is a truncated transcript, polyadenylated in intron 4. It is thought that su(f) plays a role in the regulation of poly(A) site utilisation and that the GU-rich sequence is important for this regulation to occur PUBMED:9826695.\ 4676 IPR007327 \ The hD52 gene was originally identified through its elevated expression level in human breast carcinoma. Cloning of D52 homologues from other species has indicated that D52 may play roles in calcium-mediated signal transduction and cell proliferation. Two human homologues of hD52, hD53 and hD54, have also been identified, demonstrating the existence of a novel gene/protein family PUBMED:9484778. These proteins have an N-terminal coiled-coil that allows members to form homo- and heterodimers with each other PUBMED:9484778.\ 7047 IPR010817 \

    This entry represents the N terminus (approximately 150 residues) of bacterial HemY porphyrin biosynthesis proteins. These are membrane proteins involved in a late step of protoheme IX synthesis PUBMED:7928957.

    \ 7537 IPR011715 \

    This region contains a probable site of ubiquitination that ensures rapid degradation of tyrosine aminotransferase (TAT) in rats. The half-life of the enzyme in vivo is about 2-4 hours. In addition, unpublished information identifies at least two phosphorylation sites, including a CAPK site at Ser29 and, at the other end of the protein, a casein kinase II site at S*QEECDK. This region of TAT is probably primarily related to regulatory events. Most other transaminases are much more stable and are not phosphorylated.

    \ 2462 IPR004992 \

    This is a family of related bacterial proteins with roles in ethanolamine and carbon dioxide metabolism.

    \ 7258 IPR009994 \

    This domain represents a conserved region approximately 200 residues long, four copies of which are found within the plant phloem filament protein PP1. This is one of the constituents of the proteinaceous filaments found in the sieve elements of Cucurbita phloem PUBMED:9263452.

    \ 4688 IPR002702 \ The translational regulator protein regA is encoded by the T4 bacteriophage and binds to a region of messenger RNA (mRNA) that includes the initiator codon. RegA is unusual in that it represses the translation of about 35 early T4 mRNAs but does not affect nearly 200 other mRNAs PUBMED:7761833.\ 7420 IPR011518 \

    These transposases are found in the planctomycete Rhodopirellula baltica, the cyanobacterium Nostoc, and the Gram-positive bacterium Streptomyces.

    \ 5268 IPR008683 \ This family contains several microvirus A* proteins. The A* protein binds to double-stranded DNA and protects it from hydrolysis by nucleases PUBMED:158588.\ 5125 IPR007962 \

    This family consists of bombinin and maximin proteins from Bombina maxima. Two groups of antimicrobial peptides have been isolated from skin secretions of B. maxima. Peptides in the first group, named maximins 1, 2, 3, 4 and 5, are structurally related to bombinin-like peptides (BLPs). Unlike in BLPs, sequence variations in maximins occur throughout the molecules. In addition to the potent antimicrobial activity, cytotoxicity against tumour cells and spermicidal action of the maximins, maximin 3 showed significant activity against simian-human immunodeficiency virus. Maximins 1 and 3 have been found to be toxic to mice. Peptides in the second group, termed maximins H1, H2, H3 and H4, are homologous with bombinin H peptides PUBMED:11835991.

    \ 886 IPR004262 \ This family represents the C-terminal region of the male sterility protein in a number of organisms. The Arabidopsis thaliana male sterility 2 (MS2) protein is involved in male gametogenesis. The MS2 protein shows sequence similarity to a jojoba protein (also a member of this group) that converts wax fatty acids to fatty alcohols. It has been suggested that MS2 may function as a fatty acyl reductase in the formation of pollen wall substances PUBMED:9351246.\ 6765 IPR009704 \

    This family consists of several animal EURL proteins. EURL is preferentially expressed in chick retinal precursor cells as well as in the anterior epithelial cells of the lens at early stages of development. EURL transcripts are found primarily in the peripheral dorsal retina, i.e., the most undifferentiated part of the dorsal retina. EURL transcripts are also detected in the lens at stage 18 and remain abundant in the proliferating epithelial cells of the lens until at least day 11. The distribution pattern of EURL in the developing retina and lens suggests a role before the events leading to cell determination and differentiation PUBMED:12815627.

    \ 1940 IPR004180 \ Members of this protein family are found in Borrelia burgdorferi and Borrelia garinii. The proteins are about 190 amino acids long and have no known function.\ 6743 IPR010701 \

    This family consists of several hypothetical plant specific proteins of around 150 residues in length. Members of this family contain several conserved cysteine residues. The function of the family is unknown.

    \ 1276 IPR000131 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase () is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as the driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \

    The gamma subunit is believed to be important in regulating ATPase activity and the flow of protons through the CF(0) complex. The best conserved region of the gamma subunit PUBMED:2896606 is its C terminus, which seems to be essential for assembly and catalysis.

    \ 1702 IPR013082 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    The alpha subunit (PsbE) of cytochrome b559 forms a haem-binding heterodimer with the beta subunit (PsbF) () within the reaction centre core of PSII. Both PsbE and PsbF are essential components for PSII assembly and are probably involved in secondary electron transport mechanisms that help to protect PSII from photo-damage PUBMED:12560096.

    \ \

    This domain occurs in the lumenal region of the alpha subunit. It is usually found in conjunction with an N-terminal domain ().

    \ 5640 IPR008642 \ This family consists of several herpes virus BLRF2 proteins. The family also contains the C-terminal region of and (hypothetical Homo sapiens and Mus musculus sequences), which align with the N terminus of the viral sequences.\ 1184 IPR002901 \ This family includes mannosyl-glycoprotein endo-beta-N-acetylglucosaminidase . Also included in this family is the flagellar protein J, which has been shown to hydrolyse peptidoglycan PUBMED:10049388.\ 183 IPR002619 \ This domain has no known function. It is found in several Caenorhabditis elegans proteins. The domain contains six conserved cysteines that probably form three disulphide bridges.\ 448 IPR004182 \ The GRAM domain is found in glucosyltransferases, myotubularins and other putative membrane-associated proteins.\ 2069 IPR007294 \ Members of this family are predicted to have 10 transmembrane regions.\ 2480 IPR002938 \ Monooxygenases incorporate one hydroxyl group into substrates and are found in many metabolic pathways. In this reaction, the two atoms of dioxygen are reduced to one hydroxyl group and one H2O molecule by the concomitant oxidation of NAD(P)H PUBMED:1444267. P-hydroxybenzoate hydroxylase from Pseudomonas fluorescens contains this sequence motif (present in flavoprotein hydroxylases) with a putative dual function in FAD and NADPH binding PUBMED:10025942.\ 4555 IPR001388 \

    Synaptobrevin is an intrinsic membrane protein of small synaptic vesicles PUBMED:2560644, specialised secretory organelles of neurons that actively accumulate neurotransmitters and participate in their calcium-dependent release by exocytosis. Vesicle function is mediated by proteins in their membranes, although the precise nature of the protein-protein interactions underlying this is still uncertain PUBMED:1976629. Synaptobrevin may play a role in the molecular events underlying neurotransmitter release and vesicle recycling and may be involved in the regulation of membrane flow in the nerve terminal, a process mediated by interaction with low molecular weight GTP-binding proteins PUBMED:8406010. Synaptic vesicle-associated membrane proteins (VAMPs) from Torpedo californica (electric ray) and SNC1 from yeast are related to synaptobrevin.

    \ 3543 IPR003423 \ Members of the OEP (outer membrane efflux protein) family form trimeric channels that allow export of a variety of substrates in Gram-negative bacteria. Each member of this family is composed of two repeats. The trimeric channel is composed of a 12-stranded all-beta-sheet barrel that spans the outer membrane, and a long all-helical barrel that spans the periplasm. Examples include the Escherichia coli TolC outer membrane protein, which is required for proper expression of outer membrane protein genes; the Rhizobium nodulation protein; and the Pseudomonas FusA protein, which is involved in resistance to fusaric acid.\ 6561 IPR010614 \

    This represents a conserved region within a number of RAD3-like DNA-binding helicases that are seemingly ubiquitous; members include proteins of eukaryotic, bacterial and archaeal origin. RAD3 is involved in nucleotide excision repair, and forms part of the transcription factor TFIIH in yeast PUBMED:10915862.

    \ 7692 IPR012503 \

    This family is found at the N-terminus of the Tropheryma whipplei WisP family proteins PUBMED:12606174.

    \ 1070 IPR003140 \ This family consists of both phospholipases PUBMED:9644627 and carboxylesterases with broad substrate specificity, and is structurally related to alpha/beta hydrolases PUBMED:9438866.\ 4256 IPR000235 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind directly to part of the 3' end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins that have been grouped on the basis of sequence similarities PUBMED:8338632, PUBMED:, PUBMED:8524651. The structure of S7 is known PUBMED:9331418.

    \ 4268 IPR004278 \ Caliciviruses are a small round-structured virus group defined by RNA-dependent RNA polymerase and capsid diversity.\ 3578 IPR000839 \

    The outer membrane-spanning (Oms) proteins of Borrelia burgdorferi have been isolated and their porin activities characterised; 0.6-nS porin activity was found to reside in a 28 kDa protein, designated Oms28 PUBMED:8759855. The oms28 gene sequence was found to encode a 257-amino-acid precursor protein with a putative 24-amino-acid leader peptidase I signal sequence PUBMED:8759855. The Oms28 protein partly fractionated to the outer membrane and was characterised by an average single-channel conductance of 1.1 nS in a planar lipid bilayer assay, confirming Oms28 to be a porin PUBMED:8759855.

    \ 4018 IPR003757 \ The trimeric photosystem I of the cyanobacterium Synechococcus elongatus comprises 11 protein subunits. Subunit XI, PsaL, from plants and bacteria is one of the smaller subunits, with only two transmembrane alpha helices. PsaL interacts closely with PsaI PUBMED:8901876.\ 429 IPR000322 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 31 comprises enzymes with several known activities: alpha-glucosidase (), alpha-galactosidase (), glucoamylase (), sucrase-isomaltase () (), alpha-xylosidase () and alpha-glucan lyase ().

    \

    Glycoside hydrolase family 31 groups a number of glycosyl hydrolases on the basis of sequence similarities PUBMED:1747104, PUBMED:1761061, PUBMED:1743281. An aspartic acid has been implicated PUBMED:1856189 in the catalytic activity of sucrase, isomaltase, and lysosomal alpha-glucosidase.

    \ 1471 IPR003705 \

    The cobalt transport protein CbiN is part of the active cobalt transport system responsible for the uptake of cobalt into the cell for cobalamin (vitamin B12) biosynthesis. It has been suggested that CbiN may function as the periplasmic binding protein component of the active cobalt transport system PUBMED:8501034.

    \ 6984 IPR010795 \

    This family contains prenylcysteine lyases () that are approximately 500 residues long. Prenylcysteine lyase is a FAD-dependent thioether oxidase that degrades a variety of prenylcysteines, producing free cysteine, an isoprenoid aldehyde and hydrogen peroxide as products of the reaction PUBMED:12186880. It has been noted that this enzyme has considerable homology with ClP55, a 55 kDa protein that is associated with chloride ion pumps PUBMED:11716481.

    \ 4670 IPR002061 \

    Scorpion toxins, which may be mammal- or insect-specific, bind to sodium channels, inhibiting the inactivation of activated channels and blocking neuronal transmission. The complete covalent structure of the toxins has been deduced: it comprises around 66 amino acid residues and is cross-linked by four disulphide bridges PUBMED:2311768, PUBMED:6845379. An anti-epilepsy peptide isolated from scorpion venom PUBMED:2930463 shows similarity to both scorpion neurotoxins and anti-insect toxins.

    \ \

    This family also contains a group of proteinase inhibitors from Arabidopsis thaliana and Brassica spp., which belong to MEROPS inhibitor family I18, clan I-. The Brassica napus (oil seed rape) and Sinapis alba (Brassica alba, white mustard) inhibitors PUBMED:8143882, PUBMED:1451776 inhibit the catalytic activity of bovine beta-trypsin and bovine alpha-chymotrypsin, which belong to MEROPS peptidase family S1 () PUBMED:14705960.

    \ 3667 IPR001415 \ Parathyroid hormone (PTH) is a polypeptide hormone that elevates blood calcium levels by dissolving the salts in bone and preventing their renal excretion.\ \ The 'parathyroid hormone-related protein' (PTH-rP) is structurally related to PTH PUBMED:2682846 and seems to play a physiological role in lactation, possibly as a hormone for the mobilization and/or transfer of calcium to the milk. PTH and PTH-rP bind to the same G-protein coupled receptor.\ 2524 IPR006821 \ This domain represents the N-terminal head region of intermediate filaments. Intermediate filament heads bind DNA PUBMED:11513613. Vimentin heads are able to alter nuclear architecture and chromatin distribution, and the liberation of heads by HIV-1 protease may play an important role in HIV-1-associated cytopathogenesis and carcinogenesis PUBMED:11160829. Phosphorylation of the head region can affect filament stability PUBMED:12177195. The head has been shown to interact with the rod domain of the same protein PUBMED:12064937.\ 3583 IPR004813 \ Members of the OPT transporter family transport small oligopeptides, as demonstrated experimentally in three different species of yeast. OPT1 is not a member of the ABC or PTR membrane transport families PUBMED:9043116.\ 6580 IPR009607 \

    This entry represents the C terminus of eukaryotic enhancer of polycomb proteins, which have roles in heterochromatin formation PUBMED:9735366. This family contains several conserved motifs.

    \ 3802 IPR005843 \

    Phosphoglucomutase (, PGM) is an enzyme responsible for the conversion of D-glucose 1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose. Phosphomannomutase (, PMM) is an enzyme responsible for the conversion of D-mannose 1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria.

    \

    This domain is found in the C-terminal region of both proteins.

    \ 8112 IPR013199 \

    Mga is a DNA-binding protein that activates the expression of several important virulence genes in group A streptococcus in response to changing environmental conditions PUBMED:11952907.

    \ 5366 IPR008439 \ This family consists of Campylobacter major outer membrane proteins. The major outer membrane protein (MOMP), a putative porin and a multifunctional surface protein of Campylobacter jejuni, may play an important role in the adaptation of the organism to various host environments PUBMED:10992471.\ 4110 IPR007337 \

    Plasmids may be maintained stably in bacterial populations through the action of addiction modules, in which a toxin and antidote are encoded in a cassette on the plasmid. In any daughter cell that lacks the plasmid, the toxin persists and is lethal after the antidote protein is depleted. Toxin/antitoxin pairs are also found on main chromosomes, and likely represent selfish DNA. All sequences in the seed for this alignment were found adjacent to toxin genes. Several toxin/antitoxin pairs may occur in a single species. \ RelE and RelB form a toxin-antitoxin system; RelE represses translation, probably through binding ribosomes PUBMED:11274135, PUBMED:12123459. RelB stably binds RelE, presumably deactivating it.

    \ 7571 IPR011696 \ HaTx1 is a 35-amino-acid peptide toxin that was isolated from the venom of the Chilean tarantula (Grammostola spatulata). It inhibits the drk1 voltage-gated K(+) channel not by blocking the pore but by altering the energetics of gating PUBMED:10731427.\ 7105 IPR010836 \

    This family contains a number of bacterial SapC proteins approximately 250 residues long. In Campylobacter fetus, SapC forms part of a paracrystalline surface layer (S-layer) that confers serum resistance PUBMED:9851986.

    \ 6946 IPR009804 \

    This family consists of several hypothetical Sulfolobus virus proteins of around 100 residues in length. The function of this family is unknown.

    \ 3088 IPR002369 \

    Integrins are the major metazoan receptors for cell adhesion to extracellular matrix proteins and, in vertebrates, also play important roles in certain cell-cell adhesions, make transmembrane connections to the cytoskeleton and activate many intracellular signaling pathways PUBMED:12297042. Integrins are alpha-beta heterodimers; each subunit crosses the membrane once, with most of the polypeptide in the extracellular space and a short cytoplasmic domain. Most integrins recognise relatively short peptide motifs and, in general, require an acidic amino acid to be present. Ligand specificity depends on both the alpha and beta subunits. Many integrins are expressed on cell surfaces in an inactive state in which they do not bind ligands and do not signal. Integrins frequently intercommunicate, and the engagement of one may lead to the activation or inhibition of another.

    \

    The structure of unliganded alphaV beta3 showed the molecule to be folded, with the head bent over towards the C termini of the legs, which would normally be inserted into the membrane. The head comprises a beta-propeller domain at the N terminus of the alphaV subunit and an I/A domain inserted into a loop on the top of the hybrid domain in the beta subunit. The I/A domain consists of a Rossmann fold with a central parallel beta sheet surrounded by amphipathic alpha helices.

    \ Integrins are important therapeutic targets in conditions such as atherosclerosis, thrombosis, cancer and asthma PUBMED:2199285.\ \

    At the N terminus of the beta subunit is a cysteine-containing domain reminiscent of that found in presenilins and semaphorins, which has hence been termed the PSI domain. C-terminal to the PSI domain is an A-domain, which has been predicted to adopt a Rossmann fold similar to that of the alpha subunit, but with additional loops between the second and third beta strands PUBMED:9009218. The murine gene Pactolus shares significant similarity with the beta subunit PUBMED:9535848, but lacks either one or both of the inserted loops. \ The C-terminal portion of the beta subunit extracellular domain contains an internally disulphide-bonded cysteine-rich region, while the intracellular tail contains putative sites of interaction with a variety of intracellular signalling and cytoskeletal proteins, such as focal adhesion kinase and alpha-actinin, respectively PUBMED:9818167. Integrin cytoplasmic domains are normally fewer than 50 amino acids in length, with the beta-subunit sequences exhibiting greater homology to each other than the alpha-subunit sequences. This is consistent with current evidence that the beta subunit is the principal site for binding of cytoskeletal and signalling molecules, whereas the alpha subunit has a regulatory role. The first 20 amino acids of the beta-subunit cytoplasmic domain are also alpha-helical, but the final 25 residues are disordered and, apart from a turn that follows a conserved NPxY motif, appear to lack defined structure, suggesting that a defined structure is adopted on effector binding. The two membrane-proximal helices mediate the link between the subunits via a series of hydrophobic and electrostatic contacts.

    \ 3147 IPR003386 \ Lecithin:cholesterol acyltransferase (LCAT), also known as phosphatidylcholine-sterol acyltransferase, is involved in the extracellular metabolism of plasma lipoproteins, including cholesterol. It esterifies the free cholesterol transported in plasma lipoproteins and is activated by apolipoprotein A-I. Defects in LCAT cause Norum disease and fish-eye disease.\ 3735 IPR002704 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
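The one-letter catalytic-type convention described above can be sketched as a simple lookup. This is a minimal illustration, assuming identifiers arrive as plain strings such as "C7", "U4" or "CA"; the helper names `catalytic_type` and `nucleophile` are hypothetical and not part of any MEROPS interface.

```python
# A minimal sketch (hypothetical helper names, not a MEROPS API): decode the
# catalytic-type letter that begins a MEROPS family or clan identifier.

CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clans containing families of more than one type
}

# Types that use an amino-acid side chain as the nucleophile (acyl intermediate)
# versus an activated water molecule, as summarised in the text.
SIDE_CHAIN_NUCLEOPHILE = {"serine", "threonine", "cysteine"}
WATER_NUCLEOPHILE = {"aspartic", "glutamic", "metallo"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character, e.g. 'C7' -> 'cysteine'."""
    letter = identifier.strip()[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    return CATALYTIC_TYPES[letter]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the catalytic type."""
    ctype = catalytic_type(identifier)
    if ctype in SIDE_CHAIN_NUCLEOPHILE:
        return f"{ctype} side chain (acyl intermediate)"
    if ctype in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not determined"  # unknown or mixed catalytic type


if __name__ == "__main__":
    for ident in ("C7", "U4", "CA", "M10"):
        print(ident, catalytic_type(ident), "|", nucleophile(ident))
```

Because clan identifiers such as "CA" share the same leading letter convention, the same lookup works for both families and clans in this sketch.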

    \ \

    Cysteine peptidases are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. They have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related families), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \ This group of cysteine peptidases belongs to MEROPS peptidase family C7 (clan CA). These peptidases are found in fungi and viruses (Hypoviridae). They are involved in transmissible hypovirulence and may indicate the origins of hypovirulence-associated dsRNAs PUBMED:2009854.\ 7491 IPR011652 \ This entry represents an apparent variant of the repeat (personal obs:C Yeats).\ 4120 IPR000525 \ RepB is an initiator of plasmid replication and possesses nicking-closing (topoisomerase I-like) activity. The protein can also perform a strand transfer reaction on ssDNA that contains its target.\ 4766 IPR004118 \ A non-enveloped, single-stranded DNA virus designated TT virus (TTV) has been reported from Japan in association with hepatitis of unknown etiology PUBMED:10388667.\ 3086 IPR004191 \ The integrase family of site-specific recombinases catalyses a diverse array of DNA rearrangements in archaebacteria, eubacteria and yeast. The structure of the DNA-binding domain of the conjugative transposon Tn916 integrase protein was determined using NMR spectroscopy. The N-terminal domain was found to be structurally similar to the double-stranded RNA-binding domain (dsRBD). Experimental evidence suggests that the integrase protein interacts with DNA using residues located on the face of its three-stranded beta-sheet PUBMED:9665166.\ 5508 IPR008386 \ This family consists of several ATP synthase E chain sequences, which are components of the CF(0) subunit PUBMED:8011660.\ 3776 IPR005081 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    This group of peptidases belongs to MEROPS peptidase family U4 (SpoIIGA peptidase family, clan U-).

    \ \

    Sporulation in bacteria such as Bacillus subtilis involves the formation of a polar septum, which divides the sporangium into a mother cell and a forespore. The sigma E factor, which is encoded within the spoIIG operon, is a cell-specific regulatory protein that directs gene transcription in the mother cell. Sigma E is synthesised as an inactive proprotein pro-sigma E, which is converted to the mature factor by the putative processing enzyme SpoIIGA PUBMED:11849534.

    \ 3580 IPR006024 \

    Vertebrate endogenous opioid neuropeptides are released by post-translational proteolytic cleavage of precursor proteins. The precursors consist of the following components: a signal sequence that precedes a conserved region of about 50 residues; a variable-length region; and the sequence of the neuropeptide itself. Three types of precursor are known: preproenkephalin A (gene PENK), which is processed to produce 6 copies of Met-enkephalin plus Leu-enkephalin; preproenkephalin B (gene PDYN), which is processed to produce neoendorphin, dynorphin, leumorphin, rimorphin and Leu-enkephalin; and prepronociceptin (gene PNOC), whose processing produces nociceptin (orphanin FQ) and two other potential neuropeptides.

    \

    Sequence analysis reveals that the conserved N-terminal region of the precursors contains 6 cysteines, which are probably involved in disulphide bond formation. It is speculated that this region might be important for neuropeptide processing PUBMED:8710928.

    \ 3241 IPR001783 \

    The following proteins have been shown PUBMED:1996310, PUBMED:1560772 to be structurally and evolutionarily related:\

    \

    These proteins seem to have evolved from the duplication of a domain of about 100 residues. In its C-terminal section, this domain contains a conserved motif, [KR]-V-N-[LI]-E, which has been proposed to be the binding site for lumazine (Lum) and some of its derivatives. RS-alpha, which binds two molecules of Lum, has two perfect copies of this motif, while LumP, which binds one molecule of Lum, has a Glu instead of Lys/Arg in the first position of the second copy of the motif. Similarly, YFP, which binds one molecule of FMN, also seems to have a potentially dysfunctional binding site, with Gly substituted for Glu in the last position of the first copy of the motif.
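To illustrate how a motif written in this PROSITE-style notation can be scanned, here is a minimal sketch that converts a small subset of the syntax to a Python regular expression and reports where the putative lumazine-binding motif occurs. The helpers `prosite_to_regex` and `motif_sites` are hypothetical, the converter handles only the simple elements listed in its docstring, and the test sequence is synthetic rather than a real protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Illustrative subset only: single residues, [..] alternatives, {..}
    exclusions, 'x' (any residue) and repeat counts such as x(2) or x(2,4).
    Anchors ('<', '>') and other PROSITE features are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# The putative lumazine-binding motif quoted in the text.
LUMAZINE_MOTIF = re.compile(prosite_to_regex("[KR]-V-N-[LI]-E"))


def motif_sites(sequence: str) -> list[int]:
    """Return 1-based start positions of non-overlapping motif matches."""
    return [m.start() + 1 for m in LUMAZINE_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Synthetic sequence (not a real protein): one K-type and one R-type copy.
    print(motif_sites("AAKVNLEGGRVNIEAA"))
```

In this sketch a copy with Glu in the first position (as described for the second copy in LumP) simply fails to match, which mirrors the text's point that such a substitution breaks the consensus.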

    \ \ 6654 IPR006512 \

    These sequences contain a domain that is duplicated in HI0035 of Haemophilus influenzae, in YidE and YbjL of E. coli, and in a number of other putative transporters. Member proteins may have 0, 1, or 2 copies of the TrkA-C potassium uptake domain between the duplications. The duplication appears distantly related to both the N- and the C-terminal domains of the sodium/hydrogen exchanger family. The domain contains several apparent transmembrane regions and is proposed to act in transport.

    \ 7144 IPR009925 \

    This family consists of several hypothetical bacterial proteins of around 140 residues in length. Members of this family seem to be found exclusively in Borrelia burgdorferi (Lyme disease spirochete). The function of this family is unknown.

    \ 267 IPR005183 \

    This domain is found in a small family of bacterial secreted proteins of unknown function. It is also found in Paramecium bursaria chlorella virus 1. The domain is short and occurs in one or two copies. It has a conserved HH motif that may be functionally important.

    \ 1329 IPR006733 \

    This family represents the E56 protein, which is localized to the occlusion derived virus (ODV) envelope, but not to the budded virus (BV) envelope PUBMED:8599240. Signals necessary for transport and/or retention into this structure are believed to be found within the C-terminal portion of ODV-E56.

    \ 507 IPR007306 \

    This enzyme modifies exclusively the initiator tRNA in position 64, using 5'-phosphoribosyl-1'-pyrophosphate as the modification donor. As the initiator tRNA participates both in the initiation and elongation of translation, the 2'-O-ribosyl phosphate modification discriminates the initiator tRNAs from the elongator tRNAs. \

    \ 2460 IPR001925 \

    The major protein of the outer mitochondrial membrane of eukaryotes is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore for small hydrophilic molecules PUBMED:8031826, PUBMED:1384178, PUBMED:1689252, PUBMED:2442148. The channel adopts an open conformation at low or zero membrane potential and a closed conformation at potentials above 30-40 mV.

    \

    This protein contains about 280 amino acids, and its sequence forms between 12 and 16 beta-strands that span the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates have at least three members (genes VDAC1, VDAC2 and VDAC3) PUBMED:8812436.

    \ 7689 IPR013100 \

    Epoxide hydrolases catalyse the hydrolysis of epoxides to the corresponding diols, which is important in detoxification, synthesis of signal molecules, or metabolism. Limonene-1,2-epoxide hydrolase (LEH) differs from many other epoxide hydrolases in its structure and its novel one-step catalytic mechanism. Its main fold consists of a six-stranded mixed beta-sheet, with three N-terminal alpha helices packed to one side to create a pocket that extends into the protein core. A fourth helix lies in such a way that it acts as a rim to this pocket. Although mainly lined by hydrophobic residues, this pocket features a cluster of polar groups that lie at its deepest point and constitute the enzyme's active site PUBMED:12773375.

    \ 1035 IPR002562 \

    This domain is responsible for the 3'-5' exonuclease proofreading activity of Escherichia coli DNA polymerase I (polI) and other enzymes; it catalyses the hydrolysis of unpaired or mismatched nucleotides. In E. coli polI, this domain consists of the amino-terminal half of the Klenow fragment. It is also found in the Werner syndrome helicase (WRN), focus forming activity 1 protein (FFA-1) and ribonuclease D (RNase D) PUBMED:9697700.

    \ 3199 IPR002691 \ The LIM-domain binding protein binds to the LIM domain of LIM homeodomain proteins, which are transcriptional regulators of development. Nuclear LIM interactor (NLI) / LIM domain-binding protein 1 (LDB1) is located in the nuclei of neuronal cells during development; it is co-expressed with Isl1 in early motor neuron differentiation and has a suggested role in the Isl1-dependent development of motor neurons PUBMED:8876198. It is suggested that these proteins act synergistically to enhance transcriptional efficiency by acting as co-factors for LIM homeodomain and Otx class transcription factors, both of which have essential roles in development PUBMED:9192866. The Drosophila melanogaster protein Chip is required for segmentation and for the activity of a remote wing margin enhancer PUBMED:9334334. Chip is a ubiquitous chromosomal factor required for normal expression of diverse genes at many stages of development PUBMED:9334334. It is suggested that Chip cooperates with different LIM domain proteins and other factors to structurally support remote enhancer-promoter interactions PUBMED:9334334.\ 871 IPR006570 \

    SPK is a domain of unknown function found in SET and PHD domain-containing proteins and in protein kinases.

    \ 6990 IPR010796 \

    This family represents a conserved region approximately 100 residues long within the eukaryotic protein B9. B9 has been isolated from endothelial precursor cells.

    \ 4175 IPR000456 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. Bacterial L17 is a protein of 120 to 130 amino-acid residues, while yeast YmL8 is twice as large (238 residues). The N-terminal half of YmL8 is colinear with the sequence of L17 from Escherichia coli.

    \ 1299 IPR003676 \ This family consists of the protein products of a gene cluster that encodes a group of auxin-regulated RNAs (small auxin up RNAs, SAURs) PUBMED:2485235. Proteins from this ARG7 auxin-responsive gene family have no identified functional role PUBMED:10524760.\ 5332 IPR008844 \ The GerAC protein of the Bacillus subtilis spore is required for the germination response to L-alanine. Members of this family are thought to be located in the inner spore membrane. Although the function of this family is unclear, its members are likely to be components of the germination apparatus that respond directly to this germinant, mediating the spore's response PUBMED:11418573.\ 2276 IPR006915 \

    This group of sequences from Pseudomonas aeruginosa and Neisseria meningitidis contains a conserved region which is often associated with a second conserved domain. These proteins may have hemagglutinin or hemolysin activity.

    \ 1264 IPR007041 \ Arginine N-succinyltransferase catalyzes the transfer of the succinyl group from succinyl-CoA to arginine to produce succinylarginine. This is the first step in arginine catabolism by the arginine succinyltransferase pathway.\ 3858 IPR000909 \ Phosphatidylinositol-specific phospholipase C (PI-PLC), a eukaryotic intracellular enzyme, plays an important role in signal transduction processes PUBMED:1849017. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-4,5-bisphosphate into the second messenger molecules diacylglycerol and inositol-1,4,5-trisphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins PUBMED:1419362, PUBMED:1319994, PUBMED:1335185. In mammals, there are at least 6 different isoforms of PI-PLC, which differ in their domain structure, their regulation and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC. All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as the 'X-box' and 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for catalytic activity. Profile analysis has shown that sequences with significant similarity to the X-box domain also occur in prokaryotic and trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their eukaryotic counterparts.\ 3826 IPR005563 \

    The single-stranded RNA genome of bacteriophage MS2 is 3,569 nt long and contains 4 genes. Their products are necessary for phage maturation, encapsidation, lysis of the host and phage RNA replication, respectively. The maturation protein is required for the typical attachment of the phage to the side of the bacterial pili. It accompanies the viral RNA into the cell.

    \ \ 5359 IPR008702 \ This family consists of several nucleopolyhedrovirus P10 proteins which are thought to be involved in the morphogenesis of the polyhedra PUBMED:9634101.\ 6888 IPR010759 \

    This family contains a number of ProFAR isomerase-like proteins found in eukaryotes, bacteria and archaea. ProFAR isomerase is involved in the biosynthesis of the amino acid histidine, through catalysis of the irreversible isomerisation of an amino-aldose to an amino-ketose PUBMED:10944186.

    \ 5460 IPR008510 \ This family consists of several hypothetical proteins found in Borrelia burgdorferi and Borrelia garinii.\ 2005 IPR003225 \

    This family is found in hypothetical proteins of viruses.

    \ 7579 IPR011682 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 38 comprises enzymes with only one known activity: alpha-mannosidase. This domain is found at the C terminus of glycosyl hydrolases from family 38.

    \ 6745 IPR010702 \

    This family consists of several Enterobacterial periplasmic pectate lyase proteins. A major virulence determinant of the plant-pathogenic enterobacterium Erwinia chrysanthemi is the production of pectate lyase enzymes that degrade plant cell walls PUBMED:12423024.

    \ 5059 IPR007896 \

    This domain represents a conserved pair of transmembrane helices. It appears to be found as two tandem repeats in a family of hypothetical proteins.

    \ 2416 IPR001928 \

    Endothelins (ETs) are the most potent vasoconstrictors known PUBMED:2690429, PUBMED:2168326, PUBMED:1916094. They stimulate cardiac contraction, regulate release of vasoactive substances, and stimulate mitogenesis in blood vessels in primary culture. They also stimulate contraction in almost all other smooth muscles (e.g., uterus, bronchus, vas deferens and stomach) and stimulate secretion in several tissues (e.g., kidney, liver and adrenals). Endothelin receptors have also been found in the brain, e.g. cerebral cortex, cerebellum and glial cells. Endothelins have been implicated in a variety of pathophysiological conditions associated with stress, including hypertension, myocardial infarction, subarachnoid haemorrhage and renal failure.

    \

    Endothelins are synthesised by proteolysis of large preproendothelins, which are cleaved to 'big endothelins' before being processed to the mature peptide.

    \

    Sarafotoxins (SRTX) and bibrotoxin (BTX) are cardiotoxins from the venom of snakes of the genus Atractaspis; they are structurally and functionally similar to endothelin PUBMED:2549664, PUBMED:1656557.

    \

    As shown in the following schematic representation, these 21-residue peptides contain two intramolecular disulphide bonds.\

    \
                            +-------------+
                            |             |
                            CxCxxxxxxxCxxxCxxxxxx
                              |       |
                              +-------+
    'C': conserved cysteine involved in a disulphide bond.
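The cysteine spacing and connectivity in the schematic can be captured in a few lines. This is a minimal sketch, assuming a peptide is supplied as a one-letter string; `disulphide_pairs` is a hypothetical helper, the framework regex is read directly off the schematic (Cys at positions 1, 3, 11 and 15 of 21), and the pairing it returns is the one drawn there (Cys1-Cys15 outer, Cys3-Cys11 inner), not a structure prediction.

```python
import re

# Cysteine framework read off the schematic: Cys at positions 1, 3, 11 and 15
# of a 21-residue peptide; 'x' positions may hold any residue.
ENDOTHELIN_FRAMEWORK = re.compile(r"C.C.{7}C.{3}C.{6}")

# Disulphide connectivity shown in the schematic (1-based positions):
# the outer bracket joins Cys1-Cys15, the inner bracket joins Cys3-Cys11.
DISULPHIDE_PAIRS = ((1, 15), (3, 11))


def disulphide_pairs(peptide: str) -> list[tuple[int, int]]:
    """Return the schematic's disulphide pairs if the peptide fits the framework, else []."""
    if ENDOTHELIN_FRAMEWORK.fullmatch(peptide.upper()) is None:
        return []
    return list(DISULPHIDE_PAIRS)


if __name__ == "__main__":
    # Synthetic 21-mer built from the framework (alanine at every 'x' position).
    demo = "CAC" + "A" * 7 + "C" + "A" * 3 + "C" + "A" * 6
    print(len(demo), disulphide_pairs(demo))
```

Using `fullmatch` enforces the 21-residue length as well as the spacing, so a peptide that is one residue too long or short is rejected.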
    

    \ 2814 IPR001702 \

    The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins known as porins PUBMED:2901351 are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general diffusion channels that allow any solute up to a certain size (the exclusion limit) to cross the membrane, while other porins are specific for a solute and contain a binding site for that solute inside the pore (these are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites for the binding of phages and bacteriocins.

    \

    General diffusion porins generally assemble as trimers in the membrane, and the transmembrane core of these proteins is composed exclusively of beta strands PUBMED:2178269. It has been shown PUBMED:1662760 that a number of general porins are evolutionarily related; these porins are:\

    \ 6014 IPR009333 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 1192 IPR003211 \ This family includes the proton-gated urea channel UreI, as well as putative amide transporters PUBMED:10642549.\ 5394 IPR008791 \ Interleukin-18 (IL-18) is a proinflammatory cytokine that plays a key role in the activation of natural killer and T helper 1 cell responses, principally by inducing interferon-gamma (IFN-gamma). Several poxvirus genes encode proteins with sequence similarity to IL-18 binding proteins (IL-18BPs). It has been shown that vaccinia, ectromelia and cowpox viruses secrete a soluble IL-18BP (vIL-18BP) from infected cells that may modulate the host antiviral response. The expression of vIL-18BPs by distinct poxvirus genera that cause local or general viral dissemination, or persistent or acute infections in the host, emphasises the importance of IL-18 in the response to viral infections PUBMED:10769064.\ 5553 IPR008615 \ This repeat is approximately 22 residues long and is only found in Dictyostelium discoideum. It appears to be related to (personal obs:C Yeats). The alignment consists of two tandem repeats. It is termed the FNIP repeat after the pattern of conserved residues.\ 7791 IPR012420 \

    The CBP4 gene in Saccharomyces cerevisiae is essential for the expression and activity of ubiquinol-cytochrome c reductase PUBMED:8063753, PUBMED:8811190. This family appears to be fungal specific.

    \ 1653 IPR001476 \

    The chaperonins are 'helper' molecules required for correct folding and subsequent assembly of some proteins PUBMED:1349837. These are required for normal cell growth PUBMED:2897629, and are stress-induced, acting to stabilise or protect disassembled polypeptides under heat-shock conditions. Type I chaperonins present in eubacteria, mitochondria and chloroplasts require the concerted action of 2 proteins, chaperonin 60 (cpn60) and chaperonin 10 (cpn10) PUBMED:12354603.

    \

    The 10 kDa chaperonin (cpn10, or GroES in bacteria) exists as a ring-shaped oligomer of six to eight identical subunits, while the 60 kDa chaperonin (cpn60, or GroEL in bacteria) forms a structure comprising 2 stacked rings, each ring containing 7 identical subunits PUBMED:2897629. These ring structures assemble by self-stimulation in the presence of Mg2+-ATP. The central cavity of the cylindrical cpn60 tetradecamer provides an isolated environment for protein folding, whilst cpn10 binds to cpn60 and synchronizes the release of the folded protein in an Mg2+-ATP-dependent manner PUBMED:1350777. The binding of cpn10 to cpn60 inhibits the weak ATPase activity of cpn60.

    \

    Escherichia coli GroES has also been shown to bind ATP cooperatively, and with an affinity comparable to that of GroEL PUBMED:7901771. Each GroEL subunit contains three structurally distinct domains: an apical, an intermediate and an equatorial domain. The apical domain contains the binding sites for both GroES and the unfolded protein substrate. The equatorial domain contains the ATP-binding site and most of the oligomeric contacts. The intermediate domain links the apical and equatorial domains and transfers allosteric information between them. The GroEL oligomer is a cylindrical tetradecamer organized as two heptameric rings stacked back to back. Each GroEL ring contains a central cavity, known as the 'Anfinsen cage', that provides an isolated environment for protein folding. The identical 10 kDa subunits of GroES form a dome-like heptameric oligomer in solution. ATP binding to GroES may be important in charging the seven subunits of the interacting GroEL ring with ATP, to facilitate cooperative ATP binding and hydrolysis for substrate protein release.

    \ 7096 IPR009894 \

    This family consists of a number of exported protein precursor (EppA and BapA) sequences which seem to be specific to Borrelia burgdorferi (Lyme disease spirochete). bapA gene sequences are quite stable but the encoded proteins do not provoke a strong immune response in most individuals. Conversely, EppA proteins are much more antigenic but are more variable in sequence. It is thought that BapA and EppA play important roles during the Borrelia burgdorferi infectious cycle PUBMED:12724373.

    \ 8084 IPR013203 \

    This family comprises leader peptides involved in the regulation of the glutaminase subunit (small subunit) of arginine-specific carbamoyl phosphate synthetase. In Neurospora crassa, the leader peptide is encoded by a small 24-codon ORF upstream of the arg-2 locus PUBMED:2141606. In yeast, it is the leader peptide of the CPA1 gene: the 5' region of CPA1 mRNA contains a 25-codon upstream open reading frame. The leader peptide, the product of this upstream open reading frame, plays an essential, negative role in the specific repression of CPA1 by arginine PUBMED:3555844.

    \ 7722 IPR012870 \

    These sequences are derived from hypothetical plant proteins of unknown function. The region in question is approximately 250 residues long.

    \ 6813 IPR010730 \

    This entry represents a conserved region approximately 150 residues long within various heterokaryon incompatibility proteins that seem to be restricted to ascomycete fungi. Genetic differences in specific het genes prevent a viable heterokaryotic fungal cell from being formed by the fusion of filaments from two different wild-type strains PUBMED:12019224. Many proteins of this entry also contain the WD domain, G-beta repeat and the NACHT domain.

    \ 1927 IPR003830 \

    Methanogenic archaea produce methane via the anaerobic reduction of acetate or single carbon compounds PUBMED:12440773. Coenzyme M (CoM; 2-mercaptoethanesulfonic acid) serves as the terminal methyl carrier for this process. Previously thought to be unique to methanogenic archaea, CoM has also been found in methylotrophic bacteria.

    \ \

    Biosynthesis of CoM begins with the Michael addition of sulfite to phosphoenolpyruvate, forming 2-phospho-3-sulfolactate (PSL). This reaction is catalyzed by members of this family, PSL synthase (ComA) PUBMED:11830598. Subsequently, PSL is dephosphorylated by phosphosulfolactate phosphatase (ComB) to form 3-sulfolactate PUBMED:11589710, which is then converted to 3-sulfopyruvate by L-sulfolactate dehydrogenase (ComC) PUBMED:10850983. Sulfopyruvate decarboxylase (ComDE) converts 3-sulfopyruvate to sulfoacetaldehyde PUBMED:10940029. Reductive thiolation of sulfoacetaldehyde is the final step.
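The CoM route described above is a linear chain in which each product is the next step's substrate. The sketch below encodes that chain as data and checks the hand-offs; it is illustrative only. The `Step` type and `describe` helper are hypothetical, the final product is written as coenzyme M because the text presents reductive thiolation as the last step, and the enzyme for that step is left as `None` because the text does not name it.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    substrate: str
    enzyme: Optional[str]  # None where the text does not name the enzyme
    product: str


# The CoM biosynthesis steps as listed in the text.
COM_PATHWAY = [
    Step("phosphoenolpyruvate + sulfite", "ComA (PSL synthase)", "2-phospho-3-sulfolactate"),
    Step("2-phospho-3-sulfolactate", "ComB (phosphosulfolactate phosphatase)", "3-sulfolactate"),
    Step("3-sulfolactate", "ComC (L-sulfolactate dehydrogenase)", "3-sulfopyruvate"),
    Step("3-sulfopyruvate", "ComDE (sulfopyruvate decarboxylase)", "sulfoacetaldehyde"),
    Step("sulfoacetaldehyde", None, "coenzyme M"),  # reductive thiolation
]


def describe(steps: list[Step]) -> str:
    """Check that each product is the next step's substrate, then render the route."""
    for prev, nxt in zip(steps, steps[1:]):
        if prev.product != nxt.substrate:
            raise ValueError(f"{prev.product!r} does not feed {nxt.substrate!r}")
    return " -> ".join([steps[0].substrate] + [s.product for s in steps])


if __name__ == "__main__":
    print(describe(COM_PATHWAY))
    for step in COM_PATHWAY:
        print(f"{step.substrate} --[{step.enzyme or 'not named'}]--> {step.product}")
```

Keeping the steps as plain records makes the consistency check trivial: if a step were dropped or reordered, `describe` would raise instead of rendering a broken route.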

    \ 5786 IPR010280 \

    This family consists of (uracil-5-)-methyltransferases from bacteria, archaea and eukaryotes.

    \ \ \

    A 5-methyluridine (m(5)U) residue at position 54 is a conserved feature of bacterial and eukaryotic tRNAs. The methylation of U54 is catalysed by the tRNA(m5U54)methyltransferase, which in Saccharomyces cerevisiae is encoded by the nonessential TRM2 gene. It is thought that tRNA modification enzymes might have a role in tRNA maturation not necessarily linked to their known catalytic activity PUBMED:12003492.

    \ \

    This protein family also contains the 23S rRNA methyltransferases, first proposed to be RNA methyltransferases by homology to the TrmA family. The member from Escherichia coli has now been shown to act as the 23S rRNA methyltransferase for the conserved U1939. The gene is now designated rumA and was previously designated ygcA PUBMED:11779873.

    \ \ 901 IPR005637 \ The vertebrate Tap protein is a member of the NXF family of shuttling transport receptors for nuclear export of mRNA. Tap has a modular structure, and its most C-terminal domain is important for binding to FG repeat-containing nuclear pore proteins (FG-nucleoporins) and is sufficient to mediate nuclear shuttling PUBMED:11875519. The structure of the C-terminal domain is composed of four helices PUBMED:11875519. The structure is related to the UBA domain.\ 8119 IPR013251 \

    Spc19 is a component of the DASH complex. The DASH complex associates with the spindle pole body and is important for spindle and kinetochore integrity during cell division PUBMED:11799062, PUBMED:11782438.

    \ 1470 IPR002751 \ This integral membrane protein is involved in cobalamin synthesis PUBMED:8501034. Two pathways for corrin ring formation have been found, an aerobic pathway (in Pseudomonas denitrificans) and an anaerobic pathway (in Propionibacterium freudenreichii subsp. shermanii and Salmonella typhimurium); they differ in the point of cobalt insertion. Analysis of B12 transport in Escherichia coli reveals two systems: one (with two proteins) for the outer membrane, and one (with three proteins) for the inner membrane PUBMED:8905078.\ 3628 IPR001128 \

    The cytochrome P450 enzymes constitute a superfamily of haem-thiolate proteins. P450 enzymes usually act as terminal oxidases in multicomponent electron transfer chains, called P450-containing monooxygenase systems, and are involved in the metabolism of a plethora of both exogenous and endogenous compounds. P450-containing monooxygenase systems primarily fall into two major classes: bacterial/mitochondrial (type I) and microsomal (type II). All P450 enzymes can be categorised into two main groups, the so-called B- and E-classes: P450 proteins of prokaryotic 3-component systems and fungal P450nor (CYP55) belong to the B-class; all other known P450 proteins from distinct systems are of the E-class PUBMED:7678494.

    \ \

    \ 4073 IPR003699 \

    Queuosine is a hypermodified nucleoside that usually occurs in the first position of the anticodon of tRNAs specifying the amino acids asparagine, aspartate, histidine and tyrosine. The hypermodified nucleoside is found in bacteria and eukaryotes PUBMED:8347586. Queuosine is synthesized de novo exclusively in bacteria; for eukaryotes the compound is a nutrient factor. Queuosine biosynthesis protein, or S-adenosylmethionine:tRNA ribosyltransferase-isomerase, is required for the synthesis of the queuosine precursor epoxyqueuosine (oQ).

    \ \ 2784 IPR002201 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 9 comprises enzymes with two known activities: lipopolysaccharide N-acetylglucosaminyltransferase () and heptosyltransferase ().

    \ \

    Heptosyltransferase I is thought to add L-glycero-D-manno-heptose to the inner 3-deoxy-D-manno-octulosonic acid (Kdo) residue of the lipopolysaccharide core PUBMED:9446588.\ Heptosyltransferase II is a glycosyltransferase involved in the synthesis of the inner core region of lipopolysaccharide PUBMED:11054112. Lipopolysaccharide is a major component of the outer leaflet of the outer membrane in Gram-negative bacteria. It is composed of three domains: lipid A, the core oligosaccharide and the O-antigen. These enzymes transfer heptose to the lipopolysaccharide core PUBMED:9446588.

    \ 4633 IPR000062 \

    Thymidylate kinase (; dTMP kinase) catalyzes the phosphorylation of thymidine 5'-monophosphate (dTMP) to form thymidine 5'-diphosphate (dTDP) in the presence of ATP and magnesium:
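
The reaction introduced above can be written as a balanced equation. The stoichiometry below is supplied from standard biochemistry (ATP as phosphate donor, ADP as co-product) rather than quoted from the entry itself:

```latex
\mathrm{ATP} + \mathrm{dTMP} \xrightarrow{\;\mathrm{Mg}^{2+}\;} \mathrm{ADP} + \mathrm{dTDP}
```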

    \ \

    Thymidylate kinase is a ubiquitous enzyme of about 25 kDa and is important in the dTTP synthesis pathway for DNA synthesis. What is known about the function of dTMP kinase in eukaryotes comes from the study of a cell cycle mutant, cdc8, in Saccharomyces cerevisiae. Structural and functional analyses suggest that the human cDNA codes for authentic human dTMP kinase; its mRNA levels and enzyme activity correspond to cell cycle progression and cell growth stages PUBMED:8024690.

    \ \ 7012 IPR010803 \

    This family consists of several Citrus tristeza virus (CTV) P33 proteins. The function of P33 is unclear although it is known that the protein is not needed for virion formation PUBMED:11112500.

    \ 5304 IPR008732 \ The nuclear PET122 gene of Saccharomyces cerevisiae encodes a mitochondrial-localised protein that activates initiation of translation of the mitochondrial mRNA from the COX3 gene, which encodes subunit III of cytochrome c oxidase PUBMED:10410243.\ 2870 IPR006712 \

    Homeodomain leucine zipper (HDZip) genes encode putative transcription factors that are unique to plants. Their restriction to plants suggests that homeobox-leucine zipper genes evolved after the divergence of plants and animals, perhaps to mediate specific regulatory events PUBMED:7915839.

    \ \ This domain is found at the N terminus of plant homeobox-leucine zipper proteins. Its function is unknown.

    \ 5846 IPR010312 \

    This family consists of several bacterial GTP-sensing transcriptional pleiotropic repressor CodY proteins. CodY has been found to repress the dipeptide transport operon (dpp) of Bacillus subtilis in nutrient-rich conditions PUBMED:7783641. The CodY protein also has a repressor effect on many genes in Lactococcus lactis during growth in milk PUBMED:11401725.

    \ 1341 IPR006962 \

    This family comprises the Baculovirus P48 proteins. They contain two possible membrane-spanning domains and a cysteine-rich domain that are conserved in all of the proteins. The Bombyx mori nuclear polyhedrosis virus protein, , has been described as a putative DNA helicase.

    \ 4210 IPR002675 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L38e forms part of the 60S ribosomal subunit PUBMED:1840484. This family is found in eukaryotes.

    \ 2066 IPR007314 \ No function is known for any member of this family.\ 3624 IPR005919 \

    Phosphomevalonate kinase () catalyzes the phosphorylation of 5-phosphomevalonate to 5-diphosphomevalonate, an essential step in isoprenoid biosynthesis via the mevalonate pathway. In an example of nonorthologous gene displacement, two different types of phosphomevalonate kinase are found: the higher eukaryotic form and the ERG8 type. This model represents the form of the enzyme found in animals.

    \ \ 3908 IPR002646 \

    This group includes nucleic acid-independent RNA polymerases, such as polynucleotide adenylyltransferase (), which adds the poly(A) tail to mRNA. This group also includes the tRNA nucleotidyltransferase that adds CCA to the 3' end of the tRNA.

    \ 4609 IPR002853 \ The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha and TFIIE-beta, and joins the preinitiation complex after RNA polymerase II and TFIIF PUBMED:1956403. This family consists of the conserved amino-terminal region of eukaryotic TFIIE-alpha and of proteins from archaebacteria that are also presumed to be TFIIE-alpha subunits PUBMED:9389475.\ 2350 IPR008201 \

    This entry describes archaebacterial proteins of unknown function.

    \ 787 IPR001569 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of proteins of 56 to 96 amino-acid residues that share a highly conserved region located in the N-terminal part.

    \ 7286 IPR010007 \

    This family contains human sperm proteins associated with the nucleus and mapped to the X chromosome (SPAN-X), which are approximately 100 residues long. SPAN-X proteins are cancer-testis antigens (CTAs) and thus represent potential targets for cancer immunotherapy, because they are widely distributed in tumours but absent from normal tissues other than testis. They are highly insoluble, acidic and polymorphic PUBMED:11133693.

    \ 816 IPR000228 \ RNA cyclases are a family of RNA-modifying enzymes that are conserved in eukaryotes, bacteria and archaea. RNA 3'-terminal phosphate cyclase () PUBMED:9184239, PUBMED:2199762 catalyses the conversion of 3'-phosphate to a 2',3'-cyclic phosphodiester at the end of RNA.\ \ These enzymes might be responsible for production of the cyclic phosphate RNA ends that are known to be required by many RNA ligases in both prokaryotes and eukaryotes.\

    RNA cyclase is a protein of 36 to 42 kDa. The best conserved region is a glycine-rich stretch of residues located in the central part of the sequence, which is reminiscent of various ATP, GTP or AMP glycine-rich loops.

    \

    The crystal structure of RNA 3'-terminal phosphate cyclase shows that each molecule consists of two domains. The larger domain contains three repeats of a folding unit comprising two parallel alpha helices and a four-stranded beta sheet; this fold was previously identified in translation initiation factor 3 (IF3). The large domain is similar to one of the two domains of 5-enolpyruvylshikimate-3-phosphate synthase and UDP-N-acetylglucosamine enolpyruvyl transferase. The smaller domain uses a similar secondary structure element with different topology, observed in many other proteins such as thioredoxin PUBMED:10673421. Although the active site of this enzyme could not be unambiguously assigned, it can be mapped to a region surrounding His309, an adenylate acceptor, in which a number of amino acids are highly conserved in the enzyme from different sources PUBMED:10673421.

    \ 3746 IPR001567 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis.\ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
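
The stringent abXHEbbHbc form of the motif can be expressed as a regular expression for scanning sequences. The Python sketch below is a minimal illustration only: the exact residue sets assigned to the 'b' (uncharged) and 'c' (hydrophobic) classes are assumptions chosen for the example, not an official MEROPS or PROSITE definition, and the function name `find_hexxh_sites` is hypothetical.

```python
import re
from typing import List, Tuple

# Residue classes for the pattern abXHEbbHbc. The membership of the
# 'b' and 'c' classes is an illustrative ASSUMPTION. Proline is left out
# of every class, since it is never found in this site.
A_CLASS = "VT"                     # 'a': most often Val or Thr
B_CLASS = "ACFGILMNQSTVWY"         # 'b': assumed uncharged residues
C_CLASS = "ACFILMVWY"              # 'c': assumed hydrophobic residues
X_CLASS = "ACDEFGHIKLMNQRSTVWY"    # 'X': any residue except Pro

PATTERN = (
    f"[{A_CLASS}][{B_CLASS}][{X_CLASS}]"  # a b X
    f"HE[{B_CLASS}]{{2}}H"                # H E b b H (the HEXXH core)
    f"[{B_CLASS}][{C_CLASS}]"             # b c
)
# A zero-width lookahead lets overlapping candidate sites be reported.
MOTIF = re.compile(f"(?=({PATTERN}))")


def find_hexxh_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each abXHEbbHbc match."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Synthetic example sequence, not a real protein.
print(find_hexxh_sites("MKVAAHEALHALGR"))  # → [(2, 'VAAHEALHAL')]
```

Because every class excludes proline, placing a Pro anywhere in the ten-residue window (for example "MKVAPHEALHALGR") yields no match, mirroring the observation in the text.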

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
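
The naming convention described above can be sketched as a small lookup table. This is an illustrative Python sketch, not a MEROPS API: the helper names `catalytic_type` and `nucleophile` are hypothetical, and the example identifiers (M3, S39, MA, PA) are the ones that appear in this section.

```python
# Catalytic-type letters used as the first character of MEROPS clan and
# family identifiers, as listed in the entry text. 'P' applies only to
# clans that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Types whose nucleophile is part of an amino acid (forming an acyl
# intermediate) versus types whose nucleophile is an activated water.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> str:
    """Decode the catalytic type from an identifier such as 'M3' or 'PA'."""
    if not identifier:
        raise ValueError("empty identifier")
    letter = identifier[0].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unknown catalytic-type letter {letter!r}")
    return CATALYTIC_TYPES[letter]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the catalytic type, where stated."""
    letter = identifier[0].upper() if identifier else ""
    if letter in RESIDUE_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "not stated"


for ident in ("M3", "S39", "MA", "PA"):
    print(ident, catalytic_type(ident), nucleophile(ident), sep=" | ")
```

Only the first character is decoded; the remaining characters (family number, clan letter, or a subclan suffix such as the "(E)" in MA(E)) are not interpreted by this sketch.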

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M3 (clan MA(E)), subfamilies M3A and M3B. The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA.

    \ \ \

    The Thimet oligopeptidase family is a large family of archaeal, bacterial and eukaryotic oligopeptidases that cleave medium-sized peptides. The group contains:

    \ \ \ 5019 IPR006456 \

    This group of sequences is defined by a 54-residue domain found in the N-terminal region of plant proteins, the vast majority of which contain a ZF-HD class homeobox domain toward the C terminus. The region between the two domains is typically rich in low-complexity sequence. The companion ZF-HD homeobox domain is described in .

    \ 7938 IPR012514 \

    This family consists of the formaecins, antimicrobial peptides isolated from the bulldog ant Myrmecia gulosa in response to bacterial infection. Formaecins are inducible peptide antibiotics that are active against growing Escherichia coli but inactive against other Gram-negative and Gram-positive bacteria. Formaecin peptides are 16 amino acids long, are rich in proline and have N-acetylgalactosamine O-linked to a conserved threonine PUBMED:9497332.

    \ 5916 IPR009092 \

    The gene for ACMNPV (Autographa californica nuclear polyhedrosis virus) telokin-like protein (TLP20) lies in a region of the baculoviral genome that is expressed late in the viral replication cycle; however, the function of TLP20 is unknown. TLP20 was discovered using anti-telokin antibodies, telokin being the C-terminal domain of smooth-muscle myosin light-chain kinase PUBMED:7517434. Both TLP20 and telokin display a seven-stranded antiparallel beta-barrel structure, although the 3-dimensional structures of the beta-barrels are different and there is no sequence homology between the two. TLP20 is structurally similar to dUTPase in its fold and trimeric assembly.

    \ \ 6200 IPR009106 \

    The cocaine and amphetamine regulated transcript (CART) is a brain-localised peptide that acts as a satiety factor in appetite regulation. CART was found to inhibit both normal and starvation-induced feeding and to completely block the feeding response induced by neuropeptide Y. CART is regulated by leptin in the hypothalamus, and can be transcriptionally induced after cocaine or amphetamine administration PUBMED:9590691. Posttranslational processing of CART produces an N-terminal CART peptide and a C-terminal CART peptide. The C-terminal CART peptide has been isolated from the hypothalamus, nucleus accumbens, and the anterior pituitary lobe in rats. C-terminal CART is the biologically active part of the molecule affecting food intake. The structure of C-terminal CART consists of a disulphide-bound fold containing a beta-hairpin and two adjacent disulphide bridges PUBMED:11478874.

    \ \ 1754 IPR002742 \

    Desulfoferrodoxin contains two types of iron: an Fe-S4 site very similar to that found in desulforedoxin from Desulfovibrio gigas, and an octahedrally coordinated high-spin ferrous site, most probably with nitrogen/oxygen-containing ligands. Because of this rather unusual combination of active centres, the protein was named desulfoferrodoxin PUBMED:2174880.

    \

    This domain comprises essentially the full length of neelaredoxin (, PUBMED:8001576), a monomeric, blue, non-haem iron protein of D. gigas said to bind two iron atoms per monomer with identical spectral properties. Neelaredoxin was shown recently to have significant superoxide dismutase activity PUBMED:9914498. This domain is also found, in a form in which the distance between the motifs H[HWYF]IXW and CN[IL]HGXW is somewhat shorter, as the C-terminal domain of desulfoferrodoxin, which is said to bind a single ferrous iron atom.\ The N-terminal domain of desulfoferrodoxin is described by .
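
One way to compare the motif spacing mentioned above is to translate the two motifs into regular expressions (reading X as any residue) and measure the gap between matches. The function `motif_spacing` below is a hypothetical sketch, not part of any published tool; it uses the first match of each motif and synthetic sequences rather than real neelaredoxin or desulfoferrodoxin data.

```python
import re
from typing import Optional

# The two motifs quoted in the entry, in PROSITE-like notation where X is
# any residue, translated to Python regular expressions ('.' for X).
MOTIF_1 = re.compile(r"H[HWYF]I.W")   # H[HWYF]IXW
MOTIF_2 = re.compile(r"CN[IL]HG.W")   # CN[IL]HGXW


def motif_spacing(sequence: str) -> Optional[int]:
    """Residues between the end of the first H[HWYF]IXW match and the start
    of the next CN[IL]HGXW match, or None if either motif is missing."""
    seq = sequence.upper()
    first = MOTIF_1.search(seq)
    if first is None:
        return None
    # Search for the second motif only downstream of the first one.
    second = MOTIF_2.search(seq, first.end())
    if second is None:
        return None
    return second.start() - first.end()


# Synthetic sequences with 10 and 6 linker residues, respectively.
print(motif_spacing("HHIAW" + "G" * 10 + "CNIHGAW"))  # → 10
print(motif_spacing("HHIAW" + "G" * 6 + "CNLHGAW"))   # → 6
```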

    \ \ 4908 IPR004628 \

    This Fe2+-requiring enzyme plays a role in D-glucuronate catabolism in Escherichia coli. Mannonate dehydratase converts D-mannonate to 2-dehydro-3-deoxy-D-gluconate. An apparent equivalog is found in a glucuronate utilization operon in Bacillus stearothermophilus T-6.
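
For reference, the conversion described above can be written as the following equation. The explicit water product is inferred from the enzyme being a dehydratase and is not stated in the entry:

```latex
\text{D-mannonate} \xrightarrow{\;\mathrm{Fe}^{2+}\;} \text{2-dehydro-3-deoxy-D-gluconate} + \mathrm{H_2O}
```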

    \ 7709 IPR012869 \

    The proteins in this family have not been characterised, but contain a ribbon-helix-helix domain, making them a family of putative repressors.

    \ 5526 IPR008880 \ In the Escherichia coli cytosol, a fraction of the newly synthesised proteins requires the activity of molecular chaperones for folding to the native state. The major chaperones implicated in this folding process are the ribosome-associated Trigger Factor (TF), and the DnaK and GroEL chaperones with their respective co-chaperones. Trigger Factor is an ATP-independent chaperone and displays chaperone and peptidyl-prolyl-cis-trans-isomerase (PPIase) activities in vitro. It is composed of at least three domains: an N-terminal domain which mediates association with the large ribosomal subunit, a central substrate-binding and PPIase domain with homology to FKBP proteins, and a C-terminal domain of unknown function. The positioning of TF at the peptide exit channel, together with its ability to interact with nascent chains as short as 57 residues, makes TF a prime candidate for being the first chaperone that binds to the nascent polypeptide chains PUBMED:12603737. These sequences contain the C-terminal domain.\ 5231 IPR008779 \ This family consists of several histidine-rich protein II and III sequences from Plasmodium falciparum PUBMED:8432609, PUBMED:3016741.\ 6016 IPR009335 \

    This family consists of several bacterial HrpE proteins. The exact function of this family is unknown but it is thought that HrpE is involved in the secretion of HrpZ (harpinPss) PUBMED:7579617.

    \ 2794 IPR007867 \ The glucose-methanol-choline (GMC) oxidoreductases are FAD-containing flavoprotein oxidoreductases. The function of this domain is currently unknown.\ 113 IPR000938 \

    Cytoskeleton-associated proteins (CAPs) are made of three distinct parts: an N-terminal section that is most probably globular and contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and, finally, a short C-terminal globular domain. The CAP-Gly domain is a conserved, glycine-rich domain of about 42 residues found in some CAPs PUBMED:8480366. Proteins known to contain this domain include restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 kDa protein that is associated with intermediate filaments and links endocytic vesicles to microtubules; vertebrate dynactin (150 kDa dynein-associated polypeptide; DAP) and Drosophila glued, a major component of activator I; yeast protein BIK1, which seems to be required for the formation or stabilisation of microtubules during mitosis and for spindle pole body fusion during conjugation; yeast protein NIP100 (NIP80); human protein CKAP1/TFCB; Schizosaccharomyces pombe protein alp11; and Caenorhabditis elegans hypothetical protein F53F4.3. The last three proteins contain an N-terminal ubiquitin domain and a C-terminal CAP-Gly domain.

    \ \

    The crystal structure of the CAP-Gly domain of C. elegans F53F4.3 protein, solved by single wavelength sulphur-anomalous phasing, revealed a novel protein fold containing three beta-sheets. The most conserved sequence, GKNDG, is located in two consecutive sharp turns on the surface, forming the entrance to a groove. Residues in the groove are highly conserved as measured from the information content of the aligned sequences. The C-terminal tail of another molecule in the crystal is bound in this groove PUBMED:12221106.

    \ 7678 IPR012864 \

    This family contains many eukaryotic hypothetical proteins. The region featured in this family is approximately 120 residues long. Members of this family may belong to the cupin superfamily.

    \ 3243 IPR000382 \

    ORF2 of potato leafroll luteovirus (PLRV) encodes a polyprotein which is translated following a -1 frameshift. The polyprotein has a putative linear arrangement of membrane anchor-VPg-peptidase-polymerase domains. The serine peptidase domain which is found in this group of sequences belongs to MEROPS peptidase family S39 (clan PA(S)), subfamily S39B. It is likely that the peptidase domain is involved in the cleavage of the polyprotein PUBMED:9714253.

    \ \ \ \

    The nucleotide sequence for the RNA of potato leafroll luteovirus (PLRV) has been determined PUBMED:2732710, PUBMED:2466700. The sequence contains six large open reading frames (ORFs). The 5' coding region encodes two polypeptides of 28K and 70K, which overlap in different reading frames; it is suggested that the third ORF in the 5' block is translated by frameshift readthrough near the end of the 70K protein, yielding a 118K polypeptide PUBMED:2732710. Segments of the predicted amino acid sequences of these ORFs resemble those of known viral RNA polymerases, ATP-binding proteins and viral genome-linked proteins.\ The nucleotide sequence of the genomic RNA of beet western yellow virus (BWYV) has been determined PUBMED:3194229. The sequence contains six long ORFs. A cluster of three of these ORFs, including the coat protein cistron, displays extensive amino acid sequence similarity to corresponding ORFs of a second luteovirus, the PAV isolate of barley yellow dwarf virus (BYDV) PUBMED:3194229.

    \ \ 7111 IPR010837 \

    This family contains TrbH, a bacterial conjugal transfer protein approximately 150 residues long. It contains a putative membrane lipoprotein lipid attachment site PUBMED:9829924.

    \ 3482 IPR007574 \ In the cyanobacterium Synechococcus species PCC 7942 (), nblA triggers degradation of light-harvesting phycobiliproteins in response to deprivation of nutrients, including nitrogen, phosphorus and sulphur. The mechanism of nblA function is not known, but it has been hypothesised that nblA may act by disrupting phycobilisome structure, activating a protease or tagging phycobiliproteins for proteolysis. Members of this family have also been identified in the chloroplasts of some red algae.\ 2798 IPR003474 \ This is a family of integral membrane permeases that are involved in gluconate uptake. Escherichia coli contains several members of this family, including GntU, a low-affinity transporter PUBMED:9135111, and GntT, a high-affinity transporter PUBMED:9045817.\ 5114 IPR007951 \

    This family consists of several mouse anagen-specific proteins, mKAP13 (PMG1 and PMG2). PMG1 and PMG2 contain characteristic repeats reminiscent of the keratin-associated proteins (KAPs). Both genes are expressed in growing hair follicles in skin as well as in sebaceous and eccrine sweat glands. Interestingly, expression is also detected in the mammary epithelium, where it is limited to the onset of the pubertal growth phase and is independent of ovarian hormones. Their broad, developmentally controlled expression pattern, together with their unique amino acid composition, demonstrates that pmg-1 and pmg-2 constitute a novel KAP gene family participating in the differentiation of all epithelial cells forming the epidermal appendages PUBMED:10446281.

    \ 4847 IPR002549 \

    This is a family of hypothetical proteins. A number of the sequence records state they are transmembrane proteins or putative permeases. It is not clear what source suggested that these proteins might be permeases and this information should be treated with caution.

    \ \ 840 IPR003452 \ Stem cell factor (SCF) is a homodimer involved in hematopoiesis. SCF binds to and activates the SCF receptor (SCFR), a receptor tyrosine kinase. SCF stimulates the proliferation of mast cells and is able to augment the proliferation of both myeloid and lymphoid hematopoietic progenitors in bone marrow culture. It also mediates cell-cell adhesion and acts synergistically with other cytokines. SCF is a type I membrane protein, but is also found in a secretable, soluble form. The crystal structure of human SCF has been resolved and a potential receptor-binding site identified PUBMED:10884405.\ 353 IPR001810 \

    The F-box domain was first described as a sequence motif found in cyclin-F that interacts with the protein SKP1 PUBMED:8706131, PUBMED:9346238. This relatively conserved structural motif is present in numerous proteins and serves as a link between a target protein and a ubiquitin-conjugating enzyme. The SCF complex (i.e. Skp1-Cullin-F-box) acts as an E3 ligase in the ubiquitin protein degradation pathway PUBMED:9499404, PUBMED:9635407. Different F-box proteins, as part of the SCF complex, recruit particular substrates for ubiquitination through specific protein-protein interaction domains.

    \ \

    Many mammalian F-box proteins contain leucine-rich or WD-40 repeats (). However, several F-box proteins either have other previously described domains, such as the Sec7 domain found in the FBS protein, or do not contain defined protein-protein interaction domains or motifs.

    \ 1915 IPR003801 \

    This entry describes proteins of unknown function.

    \ 2827 IPR004887 \ Glutathione synthetase () (GSS) catalyses the conversion of gamma-L-glutamyl-L-cysteine and glycine to glutathione in the presence of ATP, which is hydrolysed to ADP and phosphate. This is the second step in glutathione biosynthesis. In humans, defects in GSS are inherited in an autosomal recessive manner and cause severe metabolic acidosis, 5-oxoprolinuria, an increased rate of hemolysis and defective function of the central nervous system.\ \ 7133 IPR009918 \

    This family consists of several Enterobacterial sequences of around 200 residues in length, which are often known as YiiQ proteins. The function of this family is unknown.

    \ 600 IPR006797 \

    These proteins contain a conserved region found in the product of the yeast YLR168C gene, MSF1. The function of this protein is unknown, though it is thought to be involved in intra-mitochondrial protein sorting. This region is also found in a number of other eukaryotic proteins. The PRELI/MSF1 domain is a eukaryotic protein module which occurs in stand-alone form in several proteins, including the human PRELI protein and the yeast MSF1 protein, and as an amino-terminal domain in an orthologous group of proteins typified by human SEC14L1, which is conserved in all animals. In this group of proteins, the PRELI/MSF1 domain co-occurs with the CRAL-TRIO (see ) and the GOLD domains (see ). The PRELI/MSF1 domain is approximately 170 residues long and is predicted to assume a globular alpha + beta fold with six beta strands and four alpha helices. It has been suggested that the PRELI/MSF1 domain may have a function associated with cellular membranes PUBMED:12049664.

    \ \ 802 IPR007081 \ RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 5, represents the discontinuous cleft domain that is required to form the central cleft or channel where the DNA is bound PUBMED:8910400, PUBMED:11313498.\ 3655 IPR004270 \ The E5 protein from papillomaviruses is about 80 amino acids long and contains three regions that have been predicted to be transmembrane alpha helices. The function of this protein is unknown.\ 3128 IPR003852 \ This is a family of KdpD sensor kinase proteins that regulate the kdpFABC operon responsible for potassium transport PUBMED:9226259. The aligned region corresponds to the N-terminal cytoplasmic part of the protein, which may be the sensor domain responsible for sensing turgor pressure PUBMED:1532388.\ 4127 IPR001789 \

    Bipartite response regulator proteins are involved in a two-component signal transduction system in bacteria, and certain eukaryotes like protozoa, that functions to detect and respond to environmental changes PUBMED:7699720. These systems have been detected during host invasion, drug resistance, motility, phosphate uptake, osmoregulation, and nitrogen fixation, amongst others PUBMED:12015152. The two-component system consists of a histidine protein kinase environmental sensor that phosphorylates the receiver domain of a response regulator protein; phosphorylation induces a conformational change in the response regulator, which activates the effector domain, triggering the cellular response PUBMED:10966457. The domains of the two-component proteins are highly modular, but the core structures and activities are maintained.

    \

    The response regulators act as phosphorylation-activated switches to affect a cellular response, usually by transcriptional regulation. Most of these proteins consist of two domains, an N-terminal response regulator receiver domain, and a variable C-terminal effector domain with DNA-binding activity. This entry represents the response regulator receiver domain, which belongs to the CheY family, and receives the signal from the sensor partner in the two-component system.

    \ \ 4030 IPR001302 \ PsaI has a crucial role in aiding the normal structural organization of PsaL within the photosystem I complex, and the absence of PsaI alters PsaL organization, leading to a small but physiologically significant defect in photosystem I function PUBMED:7608190.\ PsaL is a subunit of photosystem I and is necessary for trimerization of photosystem I. PsaL may constitute the trimer-forming domain in the structure of photosystem I PUBMED:8262256.\ 5906 IPR009282 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 2603 IPR004227 \

    This domain represents the tetrahydrofolate (THF) dependent glutamate formiminotransferase involved in the histidine utilization pathway. This enzyme interconverts L-glutamate and N-formimino-L-glutamate. The enzyme is bifunctional as it also catalyzes the cyclodeaminase reaction on N-formimino-THF, converting it to 5,10-methenyl-THF and releasing ammonia; part of the process of regenerating THF. This model covers enzymes from metazoa as well as Gram-positive bacteria and archaea. In humans, deficiency of this enzyme results in a disease phenotype PUBMED:12815595. The crystal structure of the enzyme has been studied in the context of the catalytic mechanism PUBMED:10673422.
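
The two activities described above can be summarised as a pair of reactions. This sketch assumes the formimino group is carried at position 5 of THF (the entry writes "N-formimino-THF") and omits proton bookkeeping:

```latex
\begin{aligned}
\text{N-formimino-L-glutamate} + \mathrm{THF} &\rightleftharpoons \text{L-glutamate} + \text{5-formimino-THF} \\
\text{5-formimino-THF} &\longrightarrow \text{5,10-methenyl-THF} + \mathrm{NH_3}
\end{aligned}
```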

    \ \ 5342 IPR008637 \ This is a family of plant proteins that are associated with the hypersensitive response (HR) pathway of defence against plant pathogens.\ 4313 IPR004294 \

    Carotenoids such as beta-carotene, lycopene, lutein and beta-cryptoxanthin are produced in plants and certain bacteria, algae and fungi, where they function as accessory photosynthetic pigments and as scavengers of oxygen radicals for photoprotection. They are also essential dietary nutrients in animals. Carotenoid oxygenases cleave a variety of carotenoids into a range of biologically important products, including apocarotenoids in plants that function as hormones, pigments, flavours, floral scents and defence compounds, and retinoids in animals that function as vitamins, visual pigments and signalling molecules PUBMED:14704328. Examples of carotenoid oxygenases include:

    \

    \ \ \ 6494 IPR009553 \

    This family contains a group of hypothetical bacterial proteins that contain three conserved cysteine residues towards the N terminus. The function of these proteins is unknown.

    \ 2996 IPR000550 \ All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most microorganisms must synthesise folate de novo because they lack the active transport system that allows higher vertebrate cells to use dietary folates. Enzymes involved in folate biosynthesis are therefore targets for a variety of antimicrobial agents such as trimethoprim or sulphonamides. 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (HPPK) catalyzes the attachment of pyrophosphate to 6-hydroxymethyl-7,8-dihydropterin to form 6-hydroxymethyl-7,8-dihydropteridine pyrophosphate. This is the first step in a three-step pathway leading to 7,8-dihydrofolate. Bacterial HPPK (gene folK or sulD) PUBMED:1325970 is a protein of 160 to 270 amino acids. In the lower eukaryote Pneumocystis carinii, HPPK is the central domain of a multifunctional folate synthesis enzyme (gene fas) PUBMED:1313386.\ 7756 IPR012915 \

    The sequences in this family are similar to the reoviral minor core protein lambda 3, which functions as an RNA-dependent RNA polymerase within the protein capsid. It is organised into three domains. The N- and C-terminal domains create a "cage" which encloses a conserved central catalytic domain within a hollow centre. This catalytic domain is arranged to form finger, palm and thumb subdomains. Unlike other RNA polymerases, such as HIV reverse transcriptase and T7 RNA polymerase, the lambda 3 protein binds template and substrate with only localised rearrangements, and catalytic activity can occur with little structural change. However, the structure of the catalytic complex is similar to that of other polymerase catalytic complexes with known structure PUBMED:12464184.

    \ 5535 IPR008900 \ This family consists of bacterial and viral proteins which are very similar to the Zonular occludens toxin (Zot). Zot is elaborated by bacteriophage present in toxigenic strains of Vibrio cholerae. Zot is a single polypeptide chain of 44.8 kDa, with the ability to reversibly alter intestinal epithelial tight junctions, allowing the passage of macromolecules through mucosal barriers.\ 1292 IPR005521 \

    This domain is found in attacin, sarcotoxin and diptericin. All of these are insect antibacterial proteins that are induced in the fat body and subsequently secreted into the hemolymph, where they act synergistically to kill invading microorganisms PUBMED:7772280.

    \ 1390 IPR000874 \ Bombesin-like peptides comprise a large family of peptides that were initially isolated from amphibian skin, where they stimulate smooth muscle contraction. They were later found to be widely distributed in mammalian neural and endocrine cells. The amphibian peptides that belong to this family are currently classified into three subfamilies PUBMED:6141890, PUBMED:3868775: the Bombesin group, which includes bombesin and alytesin; the Ranatensin group, which includes ranatensins, litorin, and Rohdei litorin; and the Phyllolitorin group, which includes Leu(8)- and Phe(8)-phyllolitorins. In mammals and birds, two categories of bombesin-like peptides are known PUBMED:1726343, PUBMED:2458345: gastrin-releasing peptide (GRP), which stimulates the release of gastrin as well as other gastrointestinal hormones, and neuromedin B (NMB), a neuropeptide whose function is not yet clear. Bombesin-like peptides, like many other active peptides, are synthesized as larger protein precursors that are enzymatically converted to their mature forms. The final peptides are eight to fourteen residues long.\ 846 IPR006896 \

    COPII (coat protein complex II)-coated vesicles carry proteins from the endoplasmic reticulum (ER) to the Golgi complex PUBMED:11535824. COPII-coated vesicles form on the ER by the stepwise recruitment of three cytosolic components: Sar1-GTP to initiate coat formation, Sec23/24 heterodimer to select SNARE and cargo molecules, and Sec13/31 to induce coat polymerisation and membrane deformation PUBMED:12239560.

    \

    Sec23p and Sec24p are structurally related, folding into five distinct domains: a beta-barrel, a zinc-finger, an alpha/beta trunk domain, an all-helical region, and a C-terminal gelsolin-like domain. This entry describes the Sec23/24 alpha/beta trunk domain, which is formed from a single, approximately 250-residue segment inserted into the beta-barrel between strands beta-1 and beta-19. The trunk has an alpha/beta fold with a vWA topology and forms the dimer interface, primarily involving strand beta-14 on Sec23 and Sec24; in addition, the trunk domain of Sec23 contacts Sar1.

    \ \ 779 IPR007614 \ This is a domain of Drosophila proteins related to the C-terminal region of the fly Retinin protein. The conserved region is found towards the C terminus of the member proteins.\ 7225 IPR010871 \

    This family consists of a number of repeats of around 34 residues in length. Members of this family seem to be found exclusively in three hypothetical Murid herpesvirus 4 proteins. The function of this family is unknown.

    \ 4293 IPR002759 \

    This family contains proteins found in some eukaryotes and archaebacteria that are related to yeast ribonuclease P. This enzyme is essential for tRNA processing, generating the 5' termini of mature tRNA molecules PUBMED:7731988. In humans, the tRNA-processing enzyme ribonuclease P (RNase P) consists of an RNA molecule associated with at least eight protein subunits: hPop1, Rpp14, Rpp20, Rpp25, Rpp29, Rpp30, Rpp38 and Rpp40 PUBMED:10024167.

    \ 26 IPR013149 \

    This region is the C-terminal domain of the Zinc-binding alcohol dehydrogenases.

    \ 3750 IPR001570 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
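
    The two motif definitions in the paragraph above (the loose HEXXH motif and the stricter abXHEbbHbc form) can be sketched as regular expressions. The Python example below is illustrative only. The residue groupings are simplifying assumptions: D, E, H, K and R are treated as charged, and A, V, I, L, M, F and W as hydrophobic. Position 'a' is restricted to Val or Thr even though the text only says these are the most common residues there, and proline is excluded from every variable position. The function name find_motifs and the test peptides are hypothetical.

```python
import re

# Simplified residue groupings (assumptions for illustration only).
CHARGED = "DEHKR"        # residues treated as charged
HYDROPHOBIC = "AVILMFW"  # residues treated as hydrophobic


def residue_class(allowed=None, excluded=""):
    """Return a regex character class; proline is never allowed."""
    if allowed is not None:
        return "[" + "".join(r for r in allowed if r != "P") + "]"
    return "[^" + excluded + "P]"


A = residue_class(allowed="VT")          # 'a': usually Val or Thr (simplified to V/T only)
B = residue_class(excluded=CHARGED)      # 'b': uncharged residue
C = residue_class(allowed=HYDROPHOBIC)   # 'c': hydrophobic residue
X = residue_class()                      # 'X': any residue except proline

# Loose HEXXH motif versus the stricter abXHEbbHbc form.
LOOSE = re.compile("HE..H")
STRICT = re.compile(A + B + X + "HE" + B + B + "H" + B + C)


def find_motifs(sequence):
    """Return (loose, strict) lists of (start, match) pairs for a protein sequence."""
    seq = sequence.upper()
    loose = [(m.start(), m.group()) for m in LOOSE.finditer(seq)]
    strict = [(m.start(), m.group()) for m in STRICT.finditer(seq)]
    return loose, strict


if __name__ == "__main__":
    # Made-up peptides, not real protein sequences.
    print(find_motifs("MKTVAHELTHAVGG"))  # -> ([(5, 'HELTH')], [(2, 'TVAHELTHAV')])
    print(find_motifs("GGVVAHEPTHAVGG"))  # -> ([(5, 'HEPTH')], []): proline in a 'b' position
```

    The second peptide shows why the stricter form is useful. HEXXH alone accepts any residue between the two histidines, whereas the extended pattern rejects the proline that the text says never occurs in this site.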

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
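
    The naming rule described above can be expressed as a small lookup table. Under this rule, the first character of a MEROPS family or clan identifier gives the catalytic type, and that type determines the nucleophile. This Python sketch is illustrative. The function names catalytic_type and nucleophile are hypothetical and not part of any MEROPS interface, and the nucleophile strings simply restate the paragraph above.

```python
from typing import Optional

# First character of a MEROPS family/clan identifier -> catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # only used for clans containing families of more than one type
}

# Types whose nucleophile is part of an amino acid (acyl intermediate formed).
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}
# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def _type_code(identifier: str) -> str:
    """Extract and validate the leading catalytic-type letter."""
    code = identifier.strip()[:1].upper()
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised MEROPS identifier: {identifier!r}")
    return code


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by an identifier such as 'M4' or 'U62'."""
    return CATALYTIC_TYPES[_type_code(identifier)]


def nucleophile(identifier: str) -> Optional[str]:
    """Return the nucleophile for the identifier's type, or None if not stated."""
    code = _type_code(identifier)
    if code in RESIDUE_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    if code in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return None  # unknown (U) or mixed (P)


for ident in ["M4", "C19", "U62", "MA(E)"]:
    print(ident, catalytic_type(ident), nucleophile(ident))
```

    Because clan identifiers follow the same first-letter convention, the same lookup also applies to clans such as MA(E).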

    \ \

    This group of metallopeptidases constitutes the MEROPS peptidase family M4 (thermolysin family, clan MA(E)). The protein fold of the peptidase domain of thermolysin is the type example for members of clan MA. The thermolysin family is composed only of secreted eubacterial endopeptidases. The zinc-binding residues are H-142, H-146 and E-166, with E-143 acting as the catalytic residue. Thermolysin also contains four calcium-binding sites, which contribute to its unusual thermostability. The family also includes enzymes from a number of pathogens, including Legionella and Listeria, and the protein pseudolysin, all with a substrate specificity for an aromatic residue in the P1' position. Three-dimensional structure analysis has shown that the enzymes undergo a hinge-bend motion during catalysis. Pseudolysin has a broader specificity, acting on large molecules such as elastin and collagen, possibly due to its wider active site cleft PUBMED:7674922.

    \ 3471 IPR003694 \ NAD+ synthase catalyzes the last step in the biosynthesis of nicotinamide adenine dinucleotide and is induced by stress factors such as heat shock and glucose limitation. The three-dimensional structure of NH3-dependent NAD+ synthetase from Bacillus subtilis, in its free form and in complex with ATP, shows that the enzyme consists of a tight homodimer with alpha/beta subunit topology PUBMED:8895556.\ 4189 IPR001684 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    L27 is a protein from the large (50S) subunit; it is essential for ribosome function, but its exact role is unclear. It belongs to a family of ribosomal proteins, examples of which are found in bacteria, chloroplasts of plants and red algae and the mitochondria of fungi (e.g. MRP7 from yeast mitochondria). The schematic relationship between these groups of proteins is shown below.\

    \
    Bacterial L27           Nxxxxxxxxx\
    Algal L27               Nxxxxxxxxx\
    Plant L27          tttttNxxxxxxxxxxxxx\
    Yeast MRP7           tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\
    \
    't': transit peptide.\
    'N': N-terminal of mature protein.\
    

    \ 7184 IPR010858 \

    This family consists of several hypothetical bacterial proteins of around 230 residues in length. Members of this family are often referred to as YjaH and are found in the orders Vibrionales and Enterobacteriales. The function of this family is unknown.

    \ 3219 IPR001595 \

    This family of lipoproteins is Mycoplasma-specific and includes a variety of hypothetical proteins PUBMED:8948633. They all have a prokaryotic membrane lipoprotein lipid attachment site, which probably acts as a membrane anchor.

    \ 4588 IPR011564 \

    In ciliates, the telomere-binding protein is a heterodimer consisting of an alpha and a beta subunit. This complex may function as a protective cap for the single-stranded telomeric overhang. The alpha subunit consists of three structural domains, all with the same beta-barrel OB fold.

    \ 4906 IPR000212 \

    Members of this family are helicases that catalyse ATP-dependent unwinding of double-stranded DNA to single-stranded DNA. The family includes both Rep and UvrD helicases. The Rep family helicases are composed of four structural domains PUBMED:9288744. The Rep proteins function as dimers.

    \ 5511 IPR004647 \

    A number of Fe-S cluster-containing hydro-lyases share a conserved motif, including argininosuccinate lyase, adenylosuccinate lyase, aspartase, class I fumarate hydratase (fumarase), and tartrate dehydratase. Proteins in this group represent a subset of closely related proteins or modules, including the Escherichia coli tartrate dehydratase beta chain and the C-terminal region of the class I fumarase (where the N-terminal region is homologous to the tartrate dehydratase alpha chain). The activity of the archaeal proteins in this group is unknown.

    \ \ 2051 IPR007211 \ These are predicted membrane proteins of unknown function. The majority of the proteins have two predicted transmembrane regions.\ 1365 IPR001255 \ Beta-amyloid protein (beta-APP) is a 40-residue peptide implicated in the pathogenesis of Alzheimer's disease (AD) and of Down's syndrome in ageing individuals (a condition caused by the acquisition of an additional copy of chromosome 21) PUBMED:8425535, PUBMED:8380642, PUBMED:1363811. The peptide is a proteolytic product of the much larger amyloid precursor protein (APP), encoded by a gene on chromosome 21. APP comprises a large extracellular N-terminal domain and a short hydrophobic membrane-spanning domain, followed by a short C-terminal region. The beta-APP sequence both precedes and forms part of the transmembrane region. \

    Pathologically, the AD brain is characterised by extracellular amyloid plaques, intraneuronal neurofibrillary tangles, and vascular and neuronal damage. The major protein found within these deposits is a small, highly aggregating peptide (beta-APP), which is thought to be derived from aberrant catabolism of its precursor.\

    \ 3901 IPR000845 \ The following phosphorylases belong to the same family: purine nucleoside phosphorylase (PNP) from most bacteria (gene deoD), which catalyzes the cleavage of guanosine or inosine to the respective bases and sugar-1-phosphate molecules PUBMED:8534998; uridine phosphorylase (UdRPase) from bacteria (gene udp) and mammals, which catalyzes the cleavage of uridine into uracil and ribose-1-phosphate, products that are used either as carbon and energy sources or in the rescue of pyrimidine bases for nucleotide synthesis PUBMED:7744869; and 5'-methylthioadenosine phosphorylase (MTA phosphorylase) from Sulfolobus solfataricus PUBMED:7929153. It should be noted that mammalian and some bacterial PNPs, as well as eukaryotic MTA phosphorylase, belong to a different family of phosphorylases.\ 4501 IPR000609 \

    Animals recognise a wide variety of chemicals using their senses of taste and smell. The nematode Caenorhabditis elegans has only 14 types of chemosensory neuron, yet is able to respond to dozens of chemicals because each neuron detects several stimuli. More than 40 highly divergent transmembrane proteins that could contribute to this functional diversity have been described. Most of the candidate receptor genes are in clusters of similar genes; 11 of these appear to be expressed in small subsets of chemosensory neurons. A single type of neuron can potentially express at least four different receptor genes. Some of these might encode receptors for water-soluble attractants, repellents and pheromones, which may be divergent members of the G-protein-coupled receptor family PUBMED:7585938.

    \ \

    This entry contains sequences of the str and stl gene families which encode seven-transmembrane G-protein-coupled or serpentine receptors in Caenorhabditis elegans and C. briggsae PUBMED:9582190. These can be distinguished from other 7TM proteins (especially those known to couple G-proteins) by their own characteristic TM signatures.

    \ 1286 IPR006721 \

    This family constitutes the mitochondrial ATP synthase epsilon subunit. This is not to be confused with the bacterial epsilon subunit, which is homologous to the mitochondrial delta subunit. ATP synthase produces ATP from ADP and Pi by using the transmembrane proton motive force generated by oxidative phosphorylation or photosynthesis. It is composed of two major parts: a cytoplasmic F1 part that includes the three catalytic sites for ATP synthesis/hydrolysis, and a membrane-embedded F0 part that constitutes a proton channel. These two parts are structurally connected by two stalks: a central stalk of the gamma and epsilon subunits, and an outer stalk. A regulatory protein, IF1, is also found in isolated mitochondrial ATP synthases.

    \ The epsilon subunit is located in the extrinsic F1 section, which carries the catalytic sites for ATP synthesis. The epsilon subunit was not well ordered in the crystal structure of bovine F1 PUBMED:8065448, but it is known to be located in the stalk region of F1 PUBMED:10727396. The epsilon subunit acts as an inhibitor of the ATPase activity of isolated F1, with all of the inhibitory effect caused by the C-terminal helix-turn-helix domain of the epsilon subunit. Recent studies have also demonstrated a role for the subunit in inhibiting the ATPase activity of EF1F0. The epsilon subunit can adopt two very different conformations within EF1F0, which allow it to function as a ratchet that differentially regulates ATP hydrolysis and ATP synthesis PUBMED:10727396.\ 5599 IPR008786 \

    This family contains the vaccinia virus A31R protein, the function of which is not known.

    \ 381 IPR008254 \

    This domain is found in a number of proteins including flavodoxin and nitric oxide synthase. Flavodoxins are electron-transfer proteins that function in various electron transport systems. They bind one FMN molecule, which serves as a redox-active prosthetic group PUBMED:2597140, and are functionally interchangeable with ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. Nitric oxide synthase produces nitric oxide from L-arginine and NADPH. Nitric oxide acts as a messenger molecule in the body.

    \ 6068 IPR010419 \

    The CO dehydrogenase structural genes coxMSL are flanked by nine accessory genes arranged as the cox gene cluster. The cox genes are specifically and coordinately transcribed under chemolithoautotrophic conditions in the presence of CO as carbon and energy source PUBMED:10433972.

    \ 3051 IPR001322 \

    Intermediate filaments (IF) are primordial components of the cytoskeleton and the nuclear envelope PUBMED:8771189. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene family that has been subdivided into five major subgroups: type I, acidic cytokeratins; type II, basic cytokeratins; type III, vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin and plasticin; type IV, neurofilaments L, H and M, alpha-internexin and nestin; and type V, nuclear lamins A, B1, B2 and C. The lamins are components of the nuclear lamina, a fibrous layer on the nucleoplasmic side of the inner nuclear membrane that may provide a framework for the nuclear envelope and may interact with chromatin.

    \

    All IF proteins are structurally similar in that they consist of a central rod domain arranged in coiled-coil alpha-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail), which is also non-helical and shows extreme length variation between different IF proteins. The C-terminal domain has been characterised for the lamins.

    \ 7964 IPR012511 \

    This family consists of the S-adenosyl-L-methionine decarboxylase (AdoMetDC) leader peptides. AdoMetDC is a key regulatory enzyme in the biosynthesis of polyamines. All expressed plant AdoMetDC mRNA 5' leader sequences contain a highly conserved pair of upstream ORFs (uORFs) that overlap by one base. The sequences of the small uORFs are highly conserved between monocot, dicot and gymnosperm AdoMetDC mRNA species, suggesting a translational regulatory mechanism PUBMED:11139406.

    \ 7386 IPR011516 \

    Shugoshin-like proteins contain a conserved sequence at the N terminus. Shugoshin (Sgo1) protects Rec8 at centromeres during meiotic anaphase I so that sister chromatids remain tethered PUBMED:14730319.

    \ 4164 IPR001197 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A variety of eukaryotic ribosomal L10E proteins, including those from plants, can be grouped together. This family consists of vertebrate L10 (QM) PUBMED:8780716, plant L10, Caenorhabditis elegans L10, yeast L10 (QSR1) and Methanococcus jannaschii MJ0543.

    \ 7635 IPR012893 \

    The members of this entry are similar to a region close to the C terminus of the HipA protein expressed by various bacterial species. This protein is known to be involved in high-frequency persistence to the lethal effects of inhibition of either DNA or peptidoglycan synthesis PUBMED:1715862. When expressed alone, it is toxic to bacterial cells PUBMED:1715862, but it is usually tightly associated with HipB PUBMED:8021189, and the HipA-HipB complex may be involved in autoregulation of the hip operon. The hip proteins may be involved in cell division control and may interact with cell division genes or their products PUBMED:8021189.

    \ 360 IPR004330 \

    This domain was first identified in an Arabidopsis mutant, far1 (far-red-impaired response), which has reduced responsiveness to continuous far-red light but responds normally to other light wavelengths. The FAR1 gene encodes a protein with no significant sequence similarity to any proteins of known function PUBMED:10444599. The FAR1 protein contains a predicted nuclear localization signal and is targeted to the nucleus in transient transfection assays.

    \ \

    This domain is also found in other plant proteins, from species such as Arabidopsis thaliana and Oryza sativa (rice).

    \ 1049 IPR003411 \

    This family consists of a 7 kDa coat protein from carlavirus and potexvirus PUBMED:8010191.

    \ 4894 IPR001483 \ Urotensin II, a small peptide that contains a disulphide bridge, was originally isolated from the caudal portion of the spinal cord of teleost and elasmobranch fish PUBMED:1620290. The peptide has also been found in the brain of frogs PUBMED:1445302. Urotensin II seems to be involved in smooth muscle stimulation.\ \ 5934 IPR009295 \

    This family consists of several hypothetical proteins from different Staphylococcus species. The function of this family is unknown.

    \ 5631 IPR008447 \ This family consists of several Chordopoxvirus L2 proteins.\ 3301 IPR003183 \

    Methyl-coenzyme M reductase (MCR) is the enzyme responsible for microbial formation of methane. It is a hexamer composed of 2 alpha, 2 beta, and 2 gamma subunits with two identical nickel porphinoid active sites PUBMED:9367957.

    \

    The N-terminal domain has a ferredoxin-like fold.

    \ 6164 IPR010460 \

    This domain, of unknown function, is found associated with ubiquitin carboxyl-terminal hydrolase family 2 (MEROPS peptidase family C19). Members of that family are 100 to 200 kDa proteins and include the Ubp1 ubiquitin peptidase from yeast.

    \ 3408 IPR007560 \ This is a prokaryotic family found in type II restriction enzymes containing the hallmark (D/E)-(D/E)XK active site. Presence of catalytic residues implicates this region in the enzymatic cleavage of DNA PUBMED:1650347, PUBMED:11313145.\ 3148 IPR007464 \ This is a family of bacteriocins from lactic acid bacteria.\ 3890 IPR002510 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

    \

    This group of peptidases belongs to MEROPS peptidase family U62 (clan U-). The type example is microcin-processing peptidase 1 from Escherichia coli, which is the product of the gene pmbA. It has been suggested that the pmbA gene product acts to inhibit the interaction between the letD protein and the A subunit of DNA gyrase. The letA (ccdA) and letD (ccdB) genes of the F plasmid, located just outside the sequence essential for F-plasmid replication, contribute to stable maintenance of the plasmid in E. coli cells. The letD gene product inhibits partitioning of chromosomal DNA and cell division by inhibiting DNA gyrase activity, whereas the letA gene product reverses the inhibitory activity of the letD gene product PUBMED:8604133.

    \ \ It has also been proposed that PmbA facilitates the secretion of microcin B17 (MccB17) by completing its maturation PUBMED:2082149. MccB17 is a peptide antibiotic produced by E. coli strains harbouring plasmid pMccB17.\

    \ 6438 IPR010574 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 4029 IPR006145 \ Pseudouridine synthases are responsible for synthesis of pseudouridine from uracil in 23S rRNA. Proteins belonging to the family of pseudouridine synthases have been shown to share regions of similarity PUBMED:9660827. These include Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA), C (gene rluC) and D (gene rluD); yeast DRAP deaminase (gene RIB2); Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein; Bacillus subtilis hypothetical proteins yhcT, yjbO and ylyB; Helicobacter pylori hypothetical proteins HP0347, HP0745 and HP0956; Mycoplasma genitalium hypothetical proteins MG209 and MG370; Synechocystis strain PCC 6803 hypothetical proteins slr1592 and slr1629; yeast hypothetical proteins YDL036c, YGR169c and SpAC18B11.02c; and Caenorhabditis elegans hypothetical protein K07E8.7. These proteins range from 21 to 50 kDa and contain a number of conserved regions in their central section. This domain includes members of both the Rsu and Rlu families.\ 3829 IPR006429 \

    This group of sequences represents one of several distantly related families of phage portal protein. This protein forms a hole, or portal, that enables DNA passage during packaging and ejection. It also forms the junction between the phage head (capsid) and the tail proteins. It functions as a dodecamer of a single polypeptide with an average molecular weight of 40-90 kDa.

    \ 6272 IPR010933 \

    This entry represents the C-terminal region specific to the eukaryotic NADH dehydrogenase subunit 2 protein and is found in conjunction with .

    \ 2456 IPR000049 \ The electron transfer flavoprotein (ETF) PUBMED:2326318, PUBMED:8525056 serves as a specific electron acceptor for various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer. A similar system also exists in some bacteria. The beta subunit of ETF is a protein of about 28 kDa which is structurally related to the bacterial nitrogen fixation protein fixA, which could play a role in a redox process and feed electrons to ferredoxin. The beta subunit is distantly related to, and forms a heterodimer with, the alpha subunit.\ 6548 IPR009595 \

    This family consists of several bacteriophage phi-29 early protein GP16.7 sequences of around 130 residues in length. The function of this family is unknown.

    \ 3939 IPR011608 \

    The PRD domain (PTS Regulation Domain) is the phosphorylatable regulatory domain found in bacterial transcriptional antiterminators of the BglG family as well as in activators such as MtlR and LevR. The PRD domain is phosphorylated on a conserved histidine residue. PRD-containing proteins are involved in the regulation of catabolic operons in Gram+ and Gram- bacteria and are often characterised by a short N-terminal effector domain that binds to either RNA (CAT-RBD for antiterminators, ) or DNA (for activators), and a duplicated PRD module which is phosphorylated on conserved histidines by the sugar phosphotransferase system (PTS) in response to the availability of carbon sources. The phosphorylations are thought to modify the stability of the dimeric proteins and thereby the RNA- or DNA-binding activity of the effector domain PUBMED:11751049, PUBMED:11733988, PUBMED:11447120.

    \ 3296 IPR003209 \

    Methenyltetrahydromethanopterin cyclohydrolase catalyses the interconversion of methenyltetrahydromethanopterin and N(5)-formyltetrahydromethanopterin and is found in both archaea and bacteria. In methanogenic archaea, such as Methanobacterium autotrophicum, this enzyme is involved in the production of methane from carbon dioxide PUBMED:8617278. In the sulphate-reducer Archaeoglobus fulgidus, this enzyme is involved in the tetrahydromethanopterin-dependent oxidation of lactate PUBMED:8481088. In Gram-negative methylotrophic bacteria, this enzyme is involved in the tetrahydromethanopterin-dependent oxidation of formaldehyde to formate PUBMED:10482517.

    \ \ 2439 IPR006166 \ This domain is predicted to be a nuclease domain, and is found in DNA repair proteins and proteins involved in recombination events during meiosis in Drosophila melanogaster.\ 1102 IPR002924 \

    This family consists of the adenovirus E1B 19 kDa protein, or small t-antigen. The E1B 19 kDa protein inhibits E1A-induced apoptosis and hence prolongs the viability of the host cell PUBMED:8083992. It can also inhibit apoptosis mediated by tumor necrosis factor alpha and Fas antigen PUBMED:8083992. E1B 19 kDa blocks apoptosis by interacting with and inhibiting the p53-inducible and death-promoting Bax protein PUBMED:8600029. The E1B region of adenovirus encodes two proteins: E1B 19 kDa, the small t-antigen found in this family, and E1B 55 kDa, the large t-antigen, which is not found in this family. Both of these proteins inhibit E1A-induced apoptosis PUBMED:8083992.

    \ 6907 IPR010764 \

    This family consists of several hypothetical bacterial proteins of around 610 residues in length. Members of this family are highly conserved and seem to be specific to Chlamydia species. The function of this family is unknown.

    \ 2265 IPR006873 \ This is a family of uncharacterised proteins.\ 6852 IPR009211 \

    This entry contains proteins of unknown function that occur in bacteria that interact with and manipulate eukaryotic cells PUBMED:12437215.

    \

    Salmonella enterica SciE is encoded in the centisome 7 genomic island (SCI) PUBMED:10417651. Deletion of the entire island affects the ability of bacteria to enter eukaryotic cells PUBMED:12437215. Therefore, SciE and other SCI proteins may be involved in virulence.

    \

    Interestingly, another member of this family, Rhizobium leguminosarum ImpE, has been reported to be encoded by an avirulence locus involved in temperature-dependent protein secretion PUBMED:12580282. The imp locus is believed to be involved in the secretion into the environment of proteins, including the periplasmic RbsB protein, that block Rhizobium leguminosarum infection of plants PUBMED:12580282.

    \ 132 IPR006823 \ This family represents a group of neutral/alkaline ceramidases found in both bacteria and eukaryotes PUBMED:10753931, PUBMED:10781606, PUBMED:10593963.\ 7902 IPR012542 \

    The DTCHT region is the C-terminal part of DNA gyrase B/topoisomerase IV/HATPase proteins PUBMED:15112237. This region consists largely of low-complexity sequence.

    \ 5842 IPR009256 \

    This family consists of several short bacterial proteins of unknown function.

    \ 2981 IPR007869 \ Homing endonucleases are encoded by mobile DNA elements that are found inserted within host genes in all domains of life. The crystal structure of the homing nuclease PI-Sce PUBMED:12219083 revealed two domains: an endonucleolytic centre resembling the C-terminal domain of Drosophila melanogaster Hedgehog protein, and a second domain containing the protein-splicing active site. This domain corresponds to the C-terminal domain, which has structural similarity to .\ 4290 IPR007811 \ This family comprises a specific subunit for Pol III, the tRNA-specific polymerase.\ 603 IPR001005 \ The retroviral oncogene v-myb and its cellular counterpart c-myb encode nuclear DNA-binding proteins. These belong to the SANT domain family and specifically recognize the sequence YAAC(G/T)G PUBMED:3185713, PUBMED:8882580. In Myb, one of the most conserved regions, consisting of three tandem repeats, has been shown to be involved in DNA binding PUBMED:2824190.\ 7152 IPR010849 \

    This family contains DiGeorge syndrome critical region 6 (DGCR6) proteins (approximately 200 residues long) from a number of vertebrates. DGCR6 is a candidate for involvement in DiGeorge syndrome pathology through a possible role in neural crest cell migration into the third and fourth pharyngeal pouches, the structures from which the organs affected in DiGeorge syndrome are derived PUBMED:8733130. Also found in this family is the Drosophila melanogaster gonadal protein gdl.

    \ 3144 IPR007682 \ Lantibiotics are antibiotic peptides distinguished by the presence of the rare thioether amino acids lanthionine and/or methyllanthionine. They are produced by Gram-positive bacteria as gene-encoded precursor peptides and undergo post-translational modification to generate the mature peptide. Based on their structural and functional features, lantibiotics are currently divided into two major groups: the flexible amphiphilic type-A and the rather rigid and globular type-B. Type-A lantibiotics act primarily by pore formation in the bacterial membrane, by a mechanism involving interaction with specific docking molecules such as the membrane precursor lipid II PUBMED:7601145.\ 4321 IPR007448 \ This family includes bacterial transcriptional regulators that are thought to act through an interaction with the conserved region 4 of the sigma(70) subunit of RNA polymerase. The Pseudomonas aeruginosa homologue, AlgQ, positively regulates virulence gene expression and is associated with the mucoid phenotype observed in Pseudomonas aeruginosa isolates from cystic fibrosis patients.\ 1887 IPR003738 \

    This entry describes proteins of unknown function.

    \ 2284 IPR006959 \ This family contains uncharacterised proteins from Vibrio cholerae.\ 4086 IPR006985 \ The calcitonin-receptor-like receptor can function as either a calcitonin-gene-related peptide or an adrenomedullin receptor. The receptor's function is modified by receptor activity-modifying proteins (RAMPs). RAMPs are single-transmembrane-domain proteins PUBMED:9620797.\ 5715 IPR008882 \ This family consists of several Trypanosoma brucei procyclic acidic repetitive protein (PARP)-like sequences. The procyclic acidic repetitive protein (parp) genes of T. brucei encode a small family of abundant surface proteins whose expression is restricted to the procyclic form of the parasite. They are found at two unlinked loci, parpA and parpB; transcription of both loci is developmentally regulated PUBMED:2342468.\ 5743 IPR008590 \ This family consists of several uncharacterised eukaryotic proteins. The function of this family is unknown.\ 6845 IPR009746 \

    This family consists of several bacterial antimicrobial peptide resistance and lipid A acylation (PagP) proteins. The bacterial outer membrane enzyme PagP transfers a palmitate chain from a phospholipid to lipid A. In a number of pathogenic Gram-negative bacteria, PagP confers resistance to certain cationic antimicrobial peptides produced during the host innate immune response.

    \ 1727 IPR000846 \

    Dihydrodipicolinate reductase () catalyzes the second step in the biosynthesis of diaminopimelic acid and lysine: the NADH- or NADPH-dependent reduction of 2,3-dihydrodipicolinate to 2,3,4,5-tetrahydrodipicolinate.

    \ 5946 IPR010361 \

    This family consists of a number of archaea-specific PaREP8 proteins. The function of this protein is unknown.

    \ 5734 IPR008583 \ This family consists of a series of 12 repeats, each 35 amino acids in length, which are found exclusively in human herpesvirus 7. The function of this family is unknown.\ 609 IPR003841 \ This family includes the mammalian type II renal Na+/Pi-cotransporters and other proteins from lower eukaryotes and bacteria, some of which are also Na+/Pi-cotransporters. In the kidney these proteins may be involved in actively transporting phosphate into cells via Na+ cotransport in the renal brush border membrane PUBMED:8327470.\ 3266 IPR004691 \ The MSS family includes the monobasic malonate:Na+ symporter of Malonomonas rubra. It consists of two integral membrane proteins, MadL and MadM. The transporter is believed to catalyze the electroneutral reversible uptake of H+-malonate with one Na+, and both subunits have been shown to be essential for activity.\ 2749 IPR000514 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 39 comprises enzymes with several known activities: alpha-L-iduronidase () and beta-xylosidase ().

    \ \ \

    The most highly conserved regions in these enzymes are located in their N-terminal sections. These contain a glutamic acid residue which, on the basis of similarities with other families of glycosyl hydrolases PUBMED:7624375, probably acts as the proton donor in their catalytic mechanism.

    \ 1217 IPR007243 \ In yeast, 15 Apg proteins coordinate the formation of autophagosomes. Autophagy is a bulk degradation process induced by starvation in eukaryotic cells PUBMED:11689437. Apg6/Vps30p has two distinct functions: one in the autophagic process, in association with the membrane, and one in a retrieval step of the carboxypeptidase Y sorting pathway PUBMED:9712845.\ 723 IPR000601 \ The PKD domain was first identified in the polycystic kidney disease protein PKD1, and contains an Ig-like fold. PKD1 is involved in adhesive protein-protein and protein-carbohydrate interactions; however, it is not clear whether the PKD domains mediate any of these interactions. Most of these domains are present in the extracellular parts of proteins involved in interactions with other proteins. The domain is most often found in proteins from archaebacteria and some vertebrates.\ 6832 IPR010736 \

    This represents a short conserved region (approximately 30 residues long) that is repeated in several eukaryotic proteins of unknown function. One member of this family is annotated as possibly being related to alpha collagen.

    \ 5472 IPR008518 \ This family consists of several eukaryotic proteins of unknown function.\ 4754 IPR002091 \ Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell. A number of such proteins have been found to be evolutionarily related PUBMED:3146645, PUBMED:2687114, PUBMED:8382989. \

    Aromatic amino acids are concentrated in the cytoplasm of Escherichia coli by four distinct transport systems: a general aromatic amino acid permease, and a specific permease for each of the three types (Phe, Tyr and Trp) PUBMED:1987112. It has been shown PUBMED:2022620 that some permeases in E. coli and related bacteria are evolutionarily related. These permeases are proteins of about 400 to 420 amino acids that are located in the cytoplasmic membrane and, like bacterial sugar/cation transporters, are thought to contain 12 transmembrane (TM) regions PUBMED:1987112; hydropathy analysis, however, is inconclusive, suggesting the possibility of 10 to 12 membrane-spanning domains PUBMED:2022620. The best conserved domain is a stretch of 20 residues which seems to be located in a cytoplasmic loop between the first and second transmembrane regions.

    \ 254 IPR004950 \

    This family of proteins, from Caenorhabditis species, has not been characterised, though a number of members are annotated as 'serpentine receptor, class r' proteins.

    \ 4064 IPR007190 \ This domain is found in PWP2, a member of the WD-repeat family of proteins, which is an essential Saccharomyces cerevisiae protein involved in cell separation.\ 1352 IPR000468 \ Barnase is the extracellular ribonuclease of Bacillus amyloliquefaciens, and barstar its specific intracellular inhibitor PUBMED:2696173, PUBMED:3050134. Expression of barstar is necessary to counter the lethal effect of expressed active barnase. The structure of the barnase-barstar complex is known PUBMED:8043575.\ 6429 IPR009521 \

    This family consists of several Orthopoxvirus F6L proteins, whose function is unknown.

    \ 5385 IPR008672 \ This family consists of several eukaryotic mitotic checkpoint (Mitotic arrest deficient or MAD) proteins. The mitotic spindle checkpoint monitors proper attachment of the bipolar spindle to the kinetochores of aligned sister chromatids and causes a cell cycle arrest in prometaphase when failures occur. Multiple components of the mitotic spindle checkpoint have been identified in Saccharomyces cerevisiae and higher eukaryotes. In Saccharomyces cerevisiae, the existence of a Mad1-dependent complex containing Mad2, Mad3, Bub3 and Cdc20 has been demonstrated PUBMED:12574116.\ 3048 IPR000649 \

    Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit. The eukaryotic translation initiation factor EIF-2B is a complex made up of five different subunits, alpha, beta, gamma, delta and epsilon, and catalyzes the exchange of EIF-2-bound GDP for GTP. This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from archaebacteria and IF-2 from prokaryotes. It also contains a subfamily of proteins in eukaryotes, archaea (e.g. Pyrococcus furiosus) and eubacteria such as Bacillus subtilis and Thermotoga maritima. Many of these proteins were initially annotated as putative translation initiation factors, despite the lack of evidence that an IF2 recycling factor is required in prokaryotic translation initiation. Recently, one of these proteins from Bacillus subtilis has been functionally characterized as a 5-methylthioribose-1-phosphate isomerase (MTNA) PUBMED:14551435. This enzyme participates in the methionine salvage pathway, catalyzing the isomerization of 5-methylthioribose-1-phosphate to 5-methylthioribulose-1-phosphate PUBMED:15215245. The methionine salvage pathway leads to the synthesis of methionine from methylthioadenosine, the end product of spermidine and spermine anabolism in many species.

    \ 1932 IPR003847 \

    This entry describes proteins of unknown function.

    \ 2930 IPR005210 \

    The UL36 open reading frame (ORF) encodes the largest herpes simplex virus type 1 (HSV-1) protein, a 270 kDa polypeptide designated VP1/2, which is also a component of the virion tegument. A null mutation in the UL36 gene of HSV-1 results in the accumulation of unenveloped DNA-filled capsids in the cytoplasm of infected cells PUBMED:1331541. The region that defines these sequences covers only a small central part of this large protein.

    \ 5779 IPR010275 \

    This family consists of a series of hypothetical bacterial proteins of unknown function.

    \ 6131 IPR009390 \

    This family consists of several uncharacterised bacterial proteins of unknown function.

    \ 4905 IPR004601 \

    Schizosaccharomyces pombe ultraviolet damage endonuclease (UVDE or Uve1p) performs the initial step in an alternative excision repair pathway for UV-induced DNA damage. This DNA repair pathway was originally thought to be specific for UV damage; however, in addition to UV-induced bipyrimidine photoadducts, Uve1p also recognizes other, non-UV-induced DNA adducts PUBMED:10801329.

    The Deinococcus radiodurans UVSE protein has also been shown to be a UV DNA damage endonuclease that catalyzes repair of UV-induced DNA damage by a similar mechanism PUBMED:11807060.

    \ 4431 IPR006886 \ This is a family of higher eukaryotic proteins. SIN was identified as a protein that interacts specifically with SXL (sex lethal) in a yeast two-hybrid assay. The interaction is mediated by one of the SXL RNA-binding domains PUBMED:10521666.\ 6351 IPR009490 \

    This family consists of several hypothetical bacterial proteins found in Escherichia coli and Citrobacter rodentium. The function of this family is unknown.

    \ 7334 IPR011111 \

    This family includes proteins with sequence similarity to the RepB partitioning protein of the large Ti (tumour-inducing) plasmids of Agrobacterium tumefaciens PUBMED:10613878, PUBMED:9524202.

    \ 1717 IPR000829 \ Diacylglycerol kinase () (DAGK) is an enzyme that catalyzes the formation of phosphatidic acid from diacylglycerol and ATP, an important step in phospholipid biosynthesis. In bacteria, DAGK is a very small (13 to 15 kDa) membrane protein that seems to contain three transmembrane domains PUBMED:8071224. The best conserved region is a stretch of 12 residues located in a cytoplasmic loop between the second and third transmembrane domains.\ 2056 IPR005234 \

    This family represents ScpB, which along with ScpA () interacts with SMC in vivo, forming a complex that is required for chromosome condensation and segregation PUBMED:12065423, PUBMED:12897137. The SMC-Scp complex appears to be similar to the MukB-MukE-MukF complex in Escherichia coli PUBMED:10545099, where MukB () is the homologue of SMC. Although ScpA and ScpB have little sequence similarity to MukE () or MukF (), they are predicted to be structurally similar, being predominantly alpha-helical with coiled-coil regions.

    \ \ \

    In most bacterial genomes, scpA and scpB form an operon. Flanking genes are highly variable, suggesting that the operon has moved during evolution. Bacteria containing an smc gene also contain scpA or scpB, but not necessarily both. An exception is found in Deinococcus radiodurans, which contains scpB but neither smc nor scpA. In the archaea, the gene order SMC-ScpA is conserved in nearly all species, as is the very short distance between the two genes, indicating that both are co-transcribed across archaeal genera and arguing that interaction of the gene products is not confined to the homologues in Bacillus subtilis. Taken together, these studies suggest that SMC, ScpA and ScpB proteins, or their homologues, act together in chromosome condensation and segregation in all prokaryotes PUBMED:12100548.

    \ \ 1789 IPR002180 \ This family includes the beta chain of 6,7-dimethyl-8-ribityllumazine synthase , an enzyme involved in riboflavin biosynthesis. The family also includes a subfamily of distant archaebacterial proteins that may also have the same function for example .\ 4238 IPR001911 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Evidence suggests that, in prokaryotes, the peptidyl transferase reaction is performed by the large subunit 23S rRNA, whereas proteins probably have a greater role in eukaryote ribosomes. Most of the proteins lie close to, or on the surface of, the 30S subunit, arranged peripherally around the rRNA PUBMED:9281425. The small subunit ribosomal proteins can be categorised as primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA, but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins. The small ribosomal subunit protein S21 contains 55-70 amino acid residues and has only been found in eubacteria to date, though it has been reported that plant chloroplasts and mammalian mitochondria contain ribosomal subunit protein S21. Experimental evidence has revealed that S21 is well exposed on the surface of the Escherichia coli ribosome PUBMED:9371771, and is one of the 'split proteins': these are a discrete group that are selectively removed from 30S subunits under low salt conditions and are required for the formation of activated 30S reconstitution intermediate (RI*) particles.

    \ 3719 IPR000200 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
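The catalytic-type convention above can be illustrated with a small lookup. This is a minimal sketch, assuming only that a family or clan identifier (such as "C10" or "CA", named later in this entry) begins with its catalytic-type letter; the function name `catalytic_type` and the returned labels are hypothetical and are not part of any MEROPS software.

```python
from typing import Optional, Tuple

# Catalytic-type letters as listed in the text; "P" marks a clan that
# contains families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Serine, threonine and cysteine peptidases use part of an amino acid as
# the nucleophile and form an acyl intermediate.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
# Aspartic, glutamic and metallopeptidases use an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(identifier: str) -> Tuple[str, Optional[str]]:
    """Return (catalytic type, nucleophile) for an identifier such as 'C10'.

    Only the first character is inspected. The nucleophile is None where
    the text gives no rule (types U and P).
    """
    letter = identifier.strip()[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type letter in {identifier!r}")
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile: Optional[str] = "amino acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = None
    return CATALYTIC_TYPES[letter], nucleophile


if __name__ == "__main__":
    # "C10" and "CA" appear in this entry; "M10" and "PA" are arbitrary examples.
    for identifier in ("C10", "CA", "M10", "PA"):
        print(identifier, catalytic_type(identifier))
```

Because only the leading letter is read, the same helper works for family and clan identifiers alike; it says nothing about evolutionary relationships, which the clan assignment itself encodes.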

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    \ \ \

    This group of cysteine peptidases belongs to MEROPS peptidase family C10 (streptopain family, clan CA). Streptopain is a cysteine protease found in Streptococcus pyogenes that shows some structural and functional similarity to papain (family C1) PUBMED:7845226, PUBMED:1270417. The order of the catalytic cysteine/histidine dyad is the same and the surrounding sequences are similar. The two proteins also show similar specificities, both preferring a hydrophobic residue at the P2 site PUBMED:7845226, PUBMED:4683008.

    \ \

    Streptopain shows a high degree of sequence similarity to the S. pyogenes exotoxin B, and strong similarity to the prtT gene product of Porphyromonas gingivalis, both of which have been included in the family PUBMED:7845226.

    \ 2203 IPR007531 \

    Dysbindin is an evolutionarily conserved 40-kDa coiled-coil-containing protein that binds to alpha- and beta-dystrobrevin in muscle and brain. Dystrophin and alpha-dystrobrevin are co-immunoprecipitated with dysbindin, indicating that dysbindin is associated with the dystrophin-associated protein complex (DPC) in muscle. Dysbindin co-localises with alpha-dystrobrevin at the sarcolemma and is up-regulated in dystrophin-deficient muscle. In the brain, dysbindin is found primarily in axon bundles and especially in certain axon terminals, notably mossy fibre synaptic terminals in the cerebellum and hippocampus. Dysbindin may have implications for the molecular pathology of Duchenne muscular dystrophy and may provide an alternative route for anchoring dystrobrevin and the DPC to the muscle membrane PUBMED:2098102. Genetic variation in the human dysbindin gene is also thought to be associated with schizophrenia PUBMED:11316798.

    \ 7299 IPR010906 \

    Terminase, the DNA packaging enzyme of bacteriophage lambda, is a heteromultimer composed of subunits Nu1 and A. The smaller Nu1 terminase subunit has a low-affinity ATPase activity that is stimulated by non-specific DNA PUBMED:10600592.

    \ 4721 IPR005118 \

    This domain is found in proteins necessary for strand-specific DNA repair, such as TRCF in E. coli. A lesion in the template strand blocks the RNA polymerase complex (RNAP). The RNAP-DNA-RNA complex is specifically recognised by TRCF, which releases RNAP and the truncated transcript.

    \ \ 6406 IPR009510 \

    This family consists of several YscK proteins. The function of this protein is unknown but it belongs to an operon involved in the secretion of Yop proteins across bacterial membranes.

    \ 6286 IPR009459 \

    This entry represents a series of repeated sequences of around 50 residues in length. The repeat is found in bacterial peptidoglycan bound proteins and is often found in conjunction with and .

    \ 4864 IPR005357 \

    This family of small proteins is uncharacterised. In this domain is found next to a DNA binding helix-turn-helix domain , which suggests that this is some kind of ligand binding domain.

    \ 7601 IPR011691 \ This is a group of sequences derived from eukaryotic proteins. They are similar to a region of a SNARE-like protein required for traffic through the Golgi complex, SFT2 protein () PUBMED:7596416. This is a conserved protein with four putative transmembrane helices, thought to be involved in vesicular transport in later Golgi compartments PUBMED:10406798. The members of this entry also show four putative transmembrane regions.\ 4315 IPR007485 \ The Escherichia coli family member has been named Rare lipoprotein B (RplB). Thioglyceride and N-fatty acyl residues may be attached to the N-terminal cysteine, which is conserved in this family. RplB is speculated to be involved in cell duplication PUBMED:3316191.\ 6771 IPR005735 \

    This model describes a putative zinc finger domain found in three closely spaced copies in Arabidopsis protein LSD1 and in two copies in other proteins from the same species. The motif resembles CxxCRxxLMYxxGASxVxCxxC PUBMED:9054508. This domain may play a role in the regulation of transcription, via either repression of a prodeath pathway or activation of an antideath pathway, in response to signals emanating from cells undergoing pathogen-induced hypersensitive cell death.
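The x-notation motif quoted above (x = any residue) maps directly onto a regular expression, which makes it easy to count how many copies a sequence carries. This is a minimal sketch: the helper names are hypothetical, each 'x' is assumed to stand for exactly one residue, and the test sequence is synthetic rather than the real LSD1 sequence.

```python
import re
from typing import List


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif written in x-notation into a regular expression.

    Upper-case letters are literal residues; each lower-case 'x' stands for
    exactly one residue of any kind.
    """
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


# Motif quoted in the entry text for the LSD1-type zinc finger.
LSD1_ZF = motif_to_regex("CxxCRxxLMYxxGASxVxCxxC")


def motif_starts(sequence: str, pattern: "re.Pattern[str]") -> List[int]:
    """Return 0-based start positions of non-overlapping matches."""
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Synthetic test sequence (not a real protein): three copies of one
    # motif instance separated by short glycine linkers.
    copy = "CAACRAALMYAAGASAVACAAC"
    sequence = "MS" + copy + "GG" + copy + "GGG" + copy
    print(motif_starts(sequence, LSD1_ZF))  # → [2, 26, 51]
```

A fixed-width pattern like this only finds exact instances of the consensus; real domain annotation relies on profile models, so the sketch is useful for spotting closely spaced copies rather than for classifying proteins.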

    \ \ 6710 IPR010685 \

    This entry represents a conserved region located towards the C terminus of a number of proteins of unknown function that seem to be specific to Oryza sativa.

    \ 1782 IPR007793 \ The Bacillus subtilis divIVA1 mutation causes misplacement of the septum during cell division, resulting in the formation of small, circular, anucleate minicells PUBMED:9045828. Inactivation of divIVA produces a minicell phenotype, whereas overproduction of DivIVA results in a filamentation phenotype PUBMED:9045828. These proteins appear to contain coiled-coils.\ 150 IPR002018 \ Higher eukaryotes have many distinct esterases. Among the different types are those which act on carboxylic esters (). Carboxylesterases have been classified into three categories (A, B and C) on the basis of differential patterns of inhibition by organophosphates. The sequences of a number of type-B carboxylesterases indicate PUBMED:3163407, PUBMED:1862088, PUBMED:8453375 that the majority are evolutionarily related. As is the case for lipases and serine proteases, the catalytic apparatus of esterases involves three residues (catalytic triad): a serine, a glutamate or aspartate, and a histidine.\ 2176 IPR007509 \ This is a family of hypothetical archaeal proteins.\ 4647 IPR001187 \ Tissue factor (TF) is an integral membrane glycoprotein that initiates blood coagulation by forming a complex with circulating factor VII or VIIa. The complex then activates factors IX or X by specific limited proteolysis PUBMED:8609606, PUBMED:1840552. TF plays a role in normal hemostasis by initiating the cell-surface assembly and propagation of the coagulation protease cascade.\ 4967 IPR000976 \ Wilms' tumour (WT) is an embryonal malignancy of the kidney, affecting around 1 in 10,000 infants. It occurs in both sporadic and hereditary forms. Inactivation of WT1 is one of the causes of Wilms' tumour. Defects in the WT1 gene are also associated with Denys-Drash Syndrome (DDS), which is characterised by typical nephropathy and genital abnormalities. 
The WT1 gene product shows similarity to the zinc fingers of the mammalian growth-regulated EGR1 and EGR2 proteins PUBMED:8393820, PUBMED:1671709, PUBMED:2154702, PUBMED:1317572.\ 2998 IPR008207 \ This domain is present at the N terminus of proteins that undergo autophosphorylation. The group includes the gliding motility regulatory protein from Myxococcus xanthus and a number of bacterial chemotaxis proteins.\ 5615 IPR008557 \ This family consists of bacterial proteins of unknown function.\ 6136 IPR009391 \

    This family consists of several very short bacterial 23S rRNA methylase leader peptide (ErmC) sequences. ermC confers resistance to macrolide-lincosamide-streptogramin B antibiotics by specifying a ribosomal RNA methylase, which results in decreased ribosomal affinity for these antibiotics. ermC expression is induced by exposure to erythromycin PUBMED:4018035.

    \ 6388 IPR010551 \

    This family consists of several bacterial and archaeal glucose-6-phosphate isomerase (PGI) proteins (), which are involved in glycolysis and gluconeogenesis and catalyse the conversion of D-glucose 6-phosphate to D-fructose 6-phosphate. The deduced amino acid sequence of the first archaeal PGI, isolated from Pyrococcus furiosus, revealed that it is not related to its eukaryotic and many of its bacterial counterparts. Instead, this archaeal PGI shares similarity with the cupin superfamily, which consists of a variety of proteins that are generally involved in sugar metabolism in both prokaryotes and eukaryotes PUBMED:11533028.

    \ \ 4844 IPR005341 \

    This is a small family of proteins of unknown function.

    \ 2688 IPR006892 \

    This family consists mostly of geminivirus AC4 and AC5 proteins PUBMED:7844539.

    \ 2030 IPR007147 \ This is a family of proteins of unknown function found in yeast.\ 2023 IPR005660 \

    This presumed domain is found in one or two copies per protein. The domain is about 230 amino acids in length and has many conserved motifs that are probably functionally important.

    \ 3759 IPR002169 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
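The stringent abXHEbbHbc definition can be turned into a sequence scan. The sketch below uses Python's standard `re` module; the membership of the "uncharged" and "hydrophobic" residue classes is an assumption (definitions vary), 'a' is left permissive rather than restricted to valine or threonine, proline is excluded from every variable position as the text states, and `find_hexxh_sites` and the test sequence are hypothetical.

```python
import re

# Assumed residue classes (one-letter codes); proline is excluded everywhere
# because it is never found in this site.
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"
UNCHARGED = "ACFGILMNQSTVWY"   # assumption: drops D, E, K, R, H (and P)
HYDROPHOBIC = "AFILMVWY"       # assumption: one common hydrophobic set

# Positions a b X H E b b H b c; 'a' is most often V or T but kept permissive.
MOTIF = (
    f"[{ANY_BUT_PRO}][{UNCHARGED}][{ANY_BUT_PRO}]"
    f"HE[{UNCHARGED}][{UNCHARGED}]H[{UNCHARGED}][{HYDROPHOBIC}]"
)
# A zero-width lookahead reports overlapping candidate sites as well.
PATTERN = re.compile(f"(?=({MOTIF}))")


def find_hexxh_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, 10-residue window) for each candidate site."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence, not taken from a real protein.
    print(find_hexxh_sites("MKTVVAHELTHAVGG"))  # [(3, 'VVAHELTHAV')]
```

A bare HEXXH search would also accept windows with, for example, a charged residue in a 'b' position or proline anywhere, which the stricter pattern rejects.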

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
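The naming rule above (the first character gives the catalytic type, and "P" is used only for clans mixing several types) can be expressed as a small lookup. This is a minimal sketch that handles only the leading letter. The function name `catalytic_type` and its `is_clan` flag are hypothetical, and parenthetical suffixes such as the "(E)" in MA(E) or the "(C)" in PA(C) are deliberately ignored.

```python
# Catalytic-type letters exactly as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}
# "P" is reserved for clans that contain families of more than one type.
MIXED_CLAN = "P"


def catalytic_type(identifier: str, is_clan: bool = False) -> str:
    """Decode the leading letter of a family (e.g. M9) or clan (e.g. PA) name."""
    code = identifier.strip().upper()[:1]
    if is_clan and code == MIXED_CLAN:
        return "mixed (type P)"
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    return CATALYTIC_TYPES[code]


if __name__ == "__main__":
    print(catalytic_type("M9"))                   # metallo
    print(catalytic_type("PA(C)", is_clan=True))  # mixed (type P)
```

A family name starting with "P" is rejected, because only clans can be of mixed type.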

    \ \

    This group of metallopeptidases constitutes the MEROPS peptidase family M9, subfamily M9A (clan MA(E)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA and the predicted active site residues for members of this family and thermolysin occur in the motif HEXXH PUBMED:7674922.

    \ \

    Microbial collagenases have been identified from bacteria of both the\ Vibrio and Clostridium genera. Collagenase is used during bacterial attack to degrade the collagen barrier of the host during invasion. Vibrio bacteria are non-pathogenic, and are sometimes used in hospitals to remove dead tissue from burns and ulcers. Clostridium histolyticum is a pathogen that causes gas gangrene;\ nevertheless, the isolated collagenase has been used to treat bed sores.\ Collagen cleavage occurs at Xaa+Gly bonds in Vibrio bacteria and at Yaa+Gly\ bonds in Clostridium collagenases PUBMED:.

    \ \

    Analysis of the primary structure of the gene product from Clostridium perfringens has revealed that the enzyme is produced with a stretch of 86 residues that contain a putative signal sequence PUBMED:8282691. Within this stretch is found PLGP, an amino acid sequence typical of collagenase substrates. This sequence may thus be implicated in self-processing of the collagenase PUBMED:8282691.

    \ 1128 IPR006713 \ This family of adhesins binds to the Dr blood group antigen component of decay-accelerating factor. This mediates adherence of uropathogenic Escherichia coli to the urinary tract. This family contains both fimbriated and afimbriated adherence structures PUBMED:9169726. This protein also confers the phenotype of mannose-resistant hemagglutination, which can be inhibited by chloramphenicol. The N-terminal portion of the protein is thought to be responsible for chloramphenicol sensitivity PUBMED:1670929.\ 4122 IPR006881 \ This is a family of plasmid-encoded proteins involved in plasmid replication. The role of RepA in the replication process is not clearly understood PUBMED:11914352.\ 1693 IPR006208 \

    This domain is found at the C terminus of glycoprotein hormones and various extracellular proteins. It is believed to be involved in disulphide-linked dimerisation.

    \ 4457 IPR002161 \

    Members of this family are involved in the pyridoxine biosynthetic pathway PUBMED:8955308, PUBMED:9791124. The regulation of cellular growth and proliferation in response to environmental cues is\ critical for development and the maintenance of viability in all organisms. In unicellular\ organisms, such as the budding yeast Saccharomyces cerevisiae, growth and proliferation\ are regulated by nutrient availability.

    \ 85 IPR001025 \

    The BAH (bromo-adjacent homology) family contains proteins such as eukaryotic DNA (cytosine-5) methyltransferases, the origin recognition complex 1 (Orc1) proteins, as well as several proteins involved in transcriptional regulation. The BAH domain appears to act as a protein-protein interaction module specialized in gene silencing, as suggested, for example, by the interaction of yeast Orc1p with the silent information regulator Sir1p. The BAH module might therefore play an important role by linking DNA methylation, replication and transcriptional regulation PUBMED:10100640.

    \ \ 2444 IPR005490 \ This family of proteins is found in a range of bacteria. The conserved region contains a histidine and a cysteine, suggesting that these proteins have an enzymatic activity. Several members of this family contain peptidoglycan-binding domains and therefore may use peptidoglycan or a precursor as their substrate.\ 3391 IPR005302 \

    The MOSC (MOCO sulphurase C-terminal) domain is a superfamily of beta-strand-rich domains identified in the molybdenum cofactor sulphurase and several other proteins from both prokaryotes and eukaryotes. These MOSC domains contain an absolutely conserved cysteine and occur either as stand-alone forms or fused to other domains, such as the NifS-like catalytic domain in molybdenum cofactor sulphurase. The MOSC domain is predicted to be a sulphur-carrier domain that receives, on its conserved cysteine, sulphur abstracted by the pyridoxal phosphate-dependent NifS-like enzymes, and delivers it for the formation of diverse sulphur-metal clusters.

    \ 1549 IPR003370 \ Members of this family probably act as chromate transporters PUBMED:2152903, PUBMED:2180932, and are found in both bacteria and archaebacteria. The protein reduces chromate accumulation and is essential for chromate resistance. They are composed of one or two copies of this region. The alignment contains two conserved motifs, FGG and PGP.\ 5053 IPR007890 \

    CHASE2 is an extracellular sensory domain, which is present in various classes of\ transmembrane receptors that are upstream of signal transduction pathways in bacteria. Specifically,\ CHASE2 domains are found in histidine kinases, adenylate cyclases, serine/threonine kinases and\ predicted diguanylate cyclases/phosphodiesterases. Environmental factors that are recognized by\ CHASE2 domains are not known at this time PUBMED:12486065.

    \ 2507 IPR000392 \ Nitrogen fixing bacteria possess a nitrogenase enzyme complex PUBMED:2672439 that\ comprises 2 components, which catalyse the reduction of molecular nitrogen\ to ammonia PUBMED:6327620, PUBMED:: component I (nitrogenase MoFe protein or dinitrogenase)\ contains 2 molecules each of 2 non-identical subunits; component II \ (nitrogenase Fe protein or dinitrogenase reductase) is a homodimer, the\ monomer being coded for by the nifH gene PUBMED:6327620. Component II has 2 ATP-binding\ domains and one 4Fe-4S cluster per homodimer: it supplies energy by ATP \ hydrolysis, and transfers electrons from reduced ferredoxin or flavodoxin\ to component I for the reduction of molecular nitrogen to ammonia PUBMED:2491672.\ There are a number of conserved regions in the sequence of these proteins: in\ the N-terminal section there is an ATP-binding site motif 'A' (P-loop) and in\ the central section there are two conserved cysteines which have been shown,\ in nifH, to be the ligands of the 4Fe-4S cluster.\ 5038 IPR000433 \ Skeletal muscle dystrophin is a 427 kDa protein thought to act as a link between\ the actin cytoskeleton and the extracellular matrix. Perturbations of the dystrophin-associated complex, for example, between dystrophin and the transmembrane glycoprotein beta-dystroglycan, may lead to muscular dystrophy.\ Previously, the cysteine-rich region and first half of the carboxy-terminal domain of dystrophin were shown to interact with beta-dystroglycan through a stretch of fifteen amino acids at the carboxy-terminus of beta-dystroglycan. This region of dystrophin implicated in binding beta-dystroglycan contains four modular protein domains: a WW domain, two putative Ca2+-binding EF-hand motifs, and a putative zinc finger ZZ domain PUBMED:8848831, PUBMED:10355629.\ 1078 IPR001447 \

    Arylamine N-acetyltransferase (NAT) is a cytosolic enzyme of approximately 30 kDa. It facilitates the transfer of an acetyl\ group from acetyl coenzyme A on to a wide range of arylamines, N-hydroxyarylamines and hydrazines. Acetylation of\ these compounds generally results in inactivation. NAT is found in many species from mycobacteria (Mycobacterium tuberculosis, Mycobacterium\ smegmatis, etc.) to Homo sapiens. It was the first enzyme to be observed to have polymorphic activity amongst human individuals.\ NAT is responsible for the inactivation of isoniazid (a drug used to treat tuberculosis) in humans. The NAT protein has\ also been shown to be involved in the breakdown of folic acid. NAT catalyses the reaction:

    \ Acetyl-CoA + an arylamine = CoA + an N-acetylarylamine \

    NAT is the target of a common genetic polymorphism of clinical relevance in\ humans. The N-acetylation polymorphism is determined by low or high NAT\ activity in liver. NAT has been implicated in the action and toxicity \ of amine-containing drugs, and in the susceptibility to cancer and\ systemic lupus erythematosus. Two highly similar human genes for NAT, \ termed NAT1 and NAT2, encode genetically invariant and variant NAT proteins,\ respectively.

    \ 7854 IPR012521 \

    This family consists of the major classes of antimicrobial peptides secreted from the skin of frogs that protect the frogs against invading microbes. They are typically between 10 and 50 amino acids long and are derived from proteolytic cleavage of larger precursors. Major classes of peptides such as esculentin, gaegurin, brevinin, rugosin and ranatuerin are included in this family PUBMED:12470734.

    \ 6515 IPR009571 \

    This family consists of several fungus-specific SUR7 proteins. In Saccharomyces cerevisiae the SUR7 gene encodes a putative integral membrane protein with four transmembrane domains. It has been suggested that the Rvs161 and Rvs167 proteins act together in association with SUR7. The transmembrane character of SUR7 suggests a membrane localisation of the Rvs function, a localisation that is consistent with the different rvs phenotypes and the actin-Rvs167p interaction PUBMED:9219339. It has also been suggested that SUR7 may play a role in sporulation PUBMED:11784867.

    \ 3498 IPR003168 \ Nitrile hydratases are unusual metalloenzymes that catalyse the hydration of nitriles to their corresponding amides. They are used as biocatalysts in acrylamide production, one of the few commercial scale bioprocesses, as well as in environmental remediation for the removal of nitriles from waste streams. Nitrile hydratases are composed of two subunits, alpha and beta, and they contain one iron atom per alpha beta unit PUBMED:9195885.\ 1610 IPR002023 \

    Respiratory-chain NADH dehydrogenase (ubiquinone) PUBMED:, PUBMED:2029890 (also known as complex\ I or NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex\ located in the inner mitochondrial membrane which also seems to exist in\ the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase).\ Among the 25 to 30 polypeptide subunits of this bioenergetic enzyme complex\ there is one with a molecular weight of 24 kDa (in mammals), which is a\ component of the iron-sulphur (IP) fragment of the enzyme. It seems to bind a\ 2Fe-2S iron-sulphur cluster. The 24 kDa subunit is nuclear-encoded and is synthesised as a\ precursor with a transit peptide in mammals and in Neurospora crassa.\ There is a highly conserved region located in the\ central section of this subunit that contains two conserved cysteines,\ which are probably involved in the binding of the 2Fe-2S center.\ The 24 kDa subunit is highly similar to PUBMED:7690854, PUBMED:1445936:\

    \

    \ \ 5982 IPR009320 \

    This family of proteins includes three Escherichia coli proteins: YagB, YeeU and YfjZ. The function of these proteins is unknown. They are about 120 amino acids in length.

    \ 2414 IPR007346 \ This family includes bacterial periplasmic or secreted endonucleases. Escherichia coli endonuclease I (EndoI) is a sequence-independent endonuclease located in the periplasm. It is inhibited by different RNA species. It is thought to normally generate double-strand breaks in DNA, except in the presence of high salt concentrations and RNA, when it generates single-strand breaks. Its biological role is unknown PUBMED:7867949. Other family members are known to be extracellular PUBMED:3036665. This family also includes a non-specific, Mg2+-activated ribonuclease precursor PUBMED:1396690.\ 73 IPR004100 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0,F1 and CF1), F0F1-ATPase, F1-ATPase, \ F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \

    The overall structures of the bacterial PUBMED:9261073, mitochondrial PUBMED:11062563 and chloroplast PUBMED:11032839 F1-ATPase alpha and beta subunits are very similar to one another, suggesting a common catalytic mechanism. Both the non-catalytic alpha and catalytic beta subunits have almost identical folds, and are arranged alternately about a central axis. The N-terminal domain consists of a closed six-stranded beta-barrel with a Greek key topology, and is almost always associated with the central region.

    \

    Other proteins are related to the F-ATPases. Vacuolar ATPases (V-ATPases), which are responsible for acidifying intracellular compartments in eukaryotic cells, contain 70 kDa and 60 kDa subunits that are related to the F-ATPase beta and alpha subunits, respectively PUBMED:2531737. Archaebacterial membrane-associated ATPases are composed of three subunits, where the alpha chain is related to the F-ATPase beta chain, and the beta chain is related to the F1-ATPase alpha chain PUBMED:2528146. FliI in Bacillus and Salmonella, Spa47 in Shigella flexneri, HrpB6 in Xanthomonas campestris, and YscN in Yersinia virulence plasmids are specialised export proteins that are related to the F-ATPase beta subunit PUBMED:8491729.

    \ \ 1189 IPR005814 \ Aminotransferases share certain mechanistic features with other pyridoxal-phosphate-dependent enzymes, such \ as the covalent binding of the pyridoxal phosphate group to a lysine residue. On the basis of sequence \ similarity, these various enzymes can be grouped PUBMED:1618757 into subfamilies. One of these, called \ class-III, includes acetylornithine aminotransferase, which catalyzes the transfer of an \ amino group from acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and \ glutamic acid; ornithine aminotransferase, which catalyzes the transfer of an amino group \ from ornithine to alpha-ketoglutarate, yielding glutamic-5-semi-aldehyde and glutamic acid; omega-amino \ acid--pyruvate aminotransferase, which catalyzes transamination between a variety of \ omega-amino acids, mono- and diamines, and pyruvate; 4-aminobutyrate aminotransferase (GABA \ transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding \ succinate semialdehyde and glutamic acid; DAPA aminotransferase, a bacterial enzyme (bioA), \ which catalyzes an intermediate step in the biosynthesis of biotin, the transamination of \ 7-keto-8-aminopelargonic acid to form 7,8-diaminopelargonic acid; 2,2-dialkylglycine decarboxylase, \ a Pseudomonas cepacia enzyme (dgdA) that catalyzes the decarboxylating amino transfer of\ 2,2-dialkylglycine and pyruvate to dialkyl ketone, alanine and carbon dioxide; glutamate-1-semialdehyde \ aminotransferase (GSA); Bacillus subtilis aminotransferases yhxA and yodT; Haemophilus \ influenzae aminotransferase HI0949; and Caenorhabditis elegans aminotransferase T01B11.2.\ 1845 IPR002811 \

    This group contains aspartate dehydrogenases that belong to a unique class of amino acid\ dehydrogenases.

    The structure of Thermotoga maritima TM1643 has been found to\ contain an N-terminal Rossmann fold domain (which binds the NAD(+) cofactor) and a C-terminal\ alpha/beta domain PUBMED:12496312. This suggested that TM1643 may be a dehydrogenase with the active\ site located at the interface between the two domains. Enzymatic characterisation of TM1643 revealed that it\ possesses NAD- or NADP-dependent dehydrogenase activity toward L-aspartate but no aspartate oxidase activity\ PUBMED:12496312. As with aspartate oxidase, the product of the aspartate dehydrogenase activity is iminoaspartate. It has\ been suggested that two different enzymes, an oxidase and a dehydrogenase, may have evolved to catalyse the\ first step of NAD biosynthesis PUBMED:12496312. Members of this group share some structural\ similarity to several other NAD(P)+-dependent oxidoreductases, including inositol 1-phosphate\ synthase, dihydrodipicolinate reductase, and ASA-DH PUBMED:12496312.

    It has been proposed that\ in Thermotoga maritima, TM1643 catalyses the first reaction of de novo biosynthesis\ of NAD from aspartate, and it produces iminoaspartate required for this pathway. The formation of an enzyme\ complex between TM1643 and NadA, the next enzyme of the pathway, may allow the channeling of this unstable\ product directly to the NadA active site PUBMED:12496312.

    The same domain is present in animals\ (e.g., Caenorhabditis elegans F17C8.3 protein).

    \ 1516 IPR005579 \

    Members of this family are coiled-coil proteins that are involved in pre-rRNA processing PUBMED:11932453.

    \ 3057 IPR000807 \ Imidazoleglycerol-phosphate dehydratase is the enzyme that catalyzes the seventh \ step in the biosynthesis of histidine in bacteria, fungi and plants. In most organisms it is a \ monofunctional protein of about 22 to 29 kDa. In some bacteria such as Escherichia coli, it is the \ C-terminal domain of a bifunctional protein that includes a histidinol-phosphatase domain \ PUBMED:3062174.\ 3438 IPR000390 \

    Characterised members of this family belong to the small multidrug resistance (Smr) protein family and are integral membrane proteins. They confer resistance to a wide range of toxic compounds by removing them from the cells. The efflux is coupled to an influx of protons.\ An example is Escherichia coli mvrC, which prevents the incorporation of methyl viologen into cells PUBMED:1320256 and is involved in ethidium bromide efflux PUBMED:1936950.

    \ 6657 IPR009645 \

    This family consists of several hypothetical bacterial proteins of around 325 residues in length. The function of this family is unknown.

    \ 7393 IPR003020 \

    Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH\ in animal cells. Such transport also plays a vital role in acid-base\ movements in the stomach, pancreas, intestine, kidney, reproductive organs\ and the central nervous system. Functional studies have suggested four\ different HCO3- transport modes. Anion exchanger proteins exchange\ HCO3- for Cl- in a reversible, electroneutral manner PUBMED:2289848.\ Na+/HCO3- co-transport\ proteins mediate the coupled movement of Na+ and HCO3- across plasma\ membranes, often in an electrogenic manner PUBMED:9261985. Na-\ driven Cl-/HCO3- exchange\ and K+/HCO3- exchange activities have also been detected in\ certain cell types, although the molecular identities of the proteins\ responsible remain to be determined.

    \

    Sequence analysis of the two families of HCO3- transporters that have been\ cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals\ that they are homologous. This is not entirely unexpected, given that they\ both transport HCO3- and are inhibited by a class of pharmacological agents\ called disulphonic stilbenes PUBMED:9235899. They share ~25-30% sequence\ identity, which is distributed along their entire sequence length, and have\ similar predicted membrane topologies, each with ~10\ transmembrane (TM) domains.

    \ 3726 IPR000317 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \ The two signatures that define this group of calicivirus polyproteins identify a cysteine peptidase that belongs to MEROPS peptidase family C24 (clan PA(C)). \ \

    Caliciviruses are positive-stranded ssRNA viruses that cause gastroenteritis. The calicivirus genome contains two open reading frames, ORF1 and ORF2. ORF2 encodes a structural protein PUBMED:8892921, while \ ORF1 encodes a non-structural polypeptide, which has RNA helicase, cysteine\ protease and RNA polymerase activity. The regions of the polyprotein in\ which these activities lie are similar to proteins produced by the picornaviruses. Two different families of caliciviruses can be distinguished on the basis of sequence similarity, namely those classified as small round structured viruses (SRSVs) and those classed as non-SRSVs.

    \ \

    Calicivirus proteases from the non-SRSV group, which are members of the PA\ protease clan, constitute family C24 of the cysteine proteases (proteases\ from SRSVs belong to the C37 family). As mentioned above, the protease\ activity resides within a polyprotein. The enzyme cleaves the polyprotein\ at sites N-terminal to itself, liberating the polyprotein helicase.

    \ 5351 IPR008894 \

    This group contains proteins which have a wide range of Swiss-Prot annotations, from 'hypothetical protein', 'lipopolysaccharide biosynthesis protein' to 'bifunctional acetyl transferase/isomerase'.

    \ 801 IPR007083 \ RNA polymerases catalyse the DNA dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This entry, domain 4, represents the funnel domain. The funnel domain contains the binding site for some elongation factors PUBMED:8910400, PUBMED:11313498.\ 5211 IPR006630 \

    Human Ro ribonucleoproteins (RNPs) are composed of one of the four small Y RNAs and at least two proteins, Ro60 and La. The La protein is a 47 kDa polypeptide that frequently acts as an autoantigen in systemic lupus erythematosus and Sjogren's syndrome PUBMED:15016896. In the nucleus, La acts as a RNA polymerase III (RNAP III) transcription factor, while in the cytoplasm, La acts as a translation factor PUBMED:14636586. In the nucleus, La binds to the 3’UTR of nascent RNAP III transcripts to assist in folding and maturation PUBMED:15004549. In the cytoplasm, La recognises specific classes of mRNAs that contain a 5’-terminal oligopyrimidine (5’TOP) motif known to control protein synthesis PUBMED:14690589. The specific recognition is mediated by the N-terminal domain of La, which comprises a La motif and a RNA recognition motif (RRM). The La motif adopts an alpha/beta fold that comprises a winged-helix motif PUBMED:15048103.

    \

    Homologous La domain-containing proteins have been identified in a wide range of organisms except Archaea, bacteria and viruses PUBMED:7799435.

    \ \ 4128 IPR004028 \

    The Gag polyprotein directs the assembly and release of virus particles from infected cells. The Gag polyprotein has three domains required for activity: an N-terminal membrane-binding (M) domain that directs Gag to the plasma membrane, an interaction (I) domain involved in Gag aggregation, and a late assembly (L) domain that mediates the budding process PUBMED:10590103. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix, capsid, nucleoprotein, and some smaller peptides. In Rous sarcoma virus (RSV), the M domain consists of the first 85 residues of the matrix protein. However, unlike other Gag polyproteins, the M domain of RSV Gag is not myristoylated, but retains full activity PUBMED:11070020. This domain forms an alpha-helical bundle structure PUBMED:9642071.

    \

    This entry represents the M domain of the Gag polyprotein found in avian retroviruses. This entry also identifies Gag polyproteins from several avian endogenous retroviruses, which arise when one or more copies of the retroviral genome become integrated into the host genome PUBMED:14680291.

    \ \ 4196 IPR000597 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to\ bind to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It\ belongs to a family of ribosomal proteins which, on the basis of sequence similarities, includes bacterial, red algal, cyanelle, \ mammalian, yeast and Arabidopsis thaliana L3 proteins; archaeal Haloarcula marismortui\ HmaL3 (HL1); and yeast mitochondrial YmL9 PUBMED:1597181, PUBMED:1499563, PUBMED:2406244, PUBMED:.

    \ 4155 IPR000788 \

    Ribonucleotide reductase (RNR) PUBMED:3286319, PUBMED:8511586 catalyzes the reductive\ synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides\ the precursors necessary for DNA synthesis. RNRs divide into three classes on the basis of their\ metallocofactor usage. Class I RNRs, found in eukaryotes, bacteria, bacteriophage and viruses, use a diiron-tyrosyl\ radical. Class II RNRs, found in bacteria, bacteriophage, algae and archaea, use coenzyme B12\ (adenosylcobalamin, AdoCbl). Class III RNRs, found in anaerobic bacteria and bacteriophage, use an FeS cluster\ and S-adenosylmethionine to generate a glycyl radical. Many organisms have more than one class of RNR present in\ their genomes.

    \

    Ribonucleotide reductase is an oligomeric\ enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to\ 400 residues) - class II RNRs are less complex, using the small molecule B12 in place of the small\ chain PUBMED:11875520.

    The reduction of ribonucleotides to deoxyribonucleotides involves the transfer of free radicals;\ the function of\ each metallocofactor is to generate an active site thiyl radical. This thiyl radical then initiates the nucleotide reduction\ process by hydrogen atom abstraction from the ribonucleotide PUBMED:9309223. The radical-based reaction involves five\ cysteines: two of these are located at adjacent anti-parallel strands in a\ new type of ten-stranded alpha/beta-barrel; two others reside at the\ carboxyl end in a flexible arm; and the fifth, in a loop in the centre of\ the barrel, is positioned to initiate the radical reaction PUBMED:8052308. There are several regions of similarity in the sequence of the large \ chain of prokaryotes, eukaryotes and viruses spread across 3 domains:\ an N-terminal domain common to the mammalian and bacterial enzymes; a\ C-terminal domain common to the mammalian and viral ribonucleotide \ reductases; and a central domain common to all three PUBMED:9309223.

    \ 2207 IPR007538 \ This entry represents the N terminus of a protein of unknown function, found in dsDNA viruses with no RNA stage, including bacteriophages lambda and P22, and also in some Escherichia coli prophages.\ 2760 IPR004300 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because protein fold is better conserved than sequence, some of the families can be grouped into 'clans'.

    Glycoside hydrolase family 57 comprises enzymes with two known activities: alpha-amylase () and 4-alpha-glucanotransferase ().

    \ 6318 IPR010518 \

    This domain is found at the N terminus of a subset of sigma54-dependent transcriptional activators that are involved in regulation of flagellar motility e.g. FleQ in Pseudomonas aeruginosa. It is clearly related to , but lacks the conserved aspartate residue that undergoes phosphorylation in the classic two-component system response regulator ().

    \ 2095 IPR007368 \ This is a family of uncharacterised proteins.\ 5356 IPR008458 \ Avian infectious bronchitis virus, a member of the family Coronaviridae, has a single-stranded, positive-sense RNA genome 27 kb in length. Gene 5 contains two open reading frames (5a and 5b). The function of the 5a and 5b proteins is unknown PUBMED:9168126.\ 8031 IPR013184 \

    This is a family of short conserved proteins of 37 amino acids, described in Lactococcus bacteriophage. The function of these proteins is unknown.

    \ 3023 IPR002821 \ This family includes the enzymes hydantoinase and oxoprolinase (). Both reactions open 5-membered rings by hydrolysing their internal imide bonds PUBMED:8943290.\ 5855 IPR009262 \

    This family consists of several hypothetical proteins of unknown function. Some of the sequences in this family are annotated as putative membrane proteins.

    \ 5155 IPR007992 \

    This family consists of several eukaryotic succinate dehydrogenase [ubiquinone] cytochrome B small subunit, mitochondrial precursor (CybS) proteins. SDHD encodes the small subunit (cybS) of cytochrome b in succinate-ubiquinone oxidoreductase (mitochondrial complex II). Mitochondrial complex II is involved in the Krebs cycle and in the aerobic electron transport chain. It contains four proteins. The catalytic core consists of a flavoprotein and an iron-sulphur protein; these proteins are anchored to the mitochondrial inner membrane by the large subunit of cytochrome b (cybL) and cybS, which together comprise the heme-protein cytochrome b. Mutations in the SDHD gene can lead to hereditary paraganglioma, characterised by the development of benign, vascularised tumours in the head and neck PUBMED:10657297.

    \ 4730 IPR002905 \ This enzyme uses S-adenosyl-L-methionine to methylate tRNA. The TRM1 gene of Saccharomyces cerevisiae is necessary for the N2,N2-dimethylguanosine modification of both mitochondrial and cytoplasmic tRNAs PUBMED:9685492. The enzyme is found in both eukaryotes and archaea PUBMED:3299379.\ 1802 IPR001001 \ This entry describes the beta chain of DNA polymerase III, a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria. The beta chain is required for initiation of replication from an RNA primer, with nucleotides from deoxynucleoside triphosphates (dNTPs) being added to the 3'-end of the growing DNA chain.\ 7479 IPR011492 \

    This is the Flavivirus DEAD domain. The domain is related to the DEAD/DEAH box helicase domain which is found in a large family of ATPases.

    \ 7501 IPR011632 \ This repeat is found in a small number of proteins and is apparently limited to Coxiella.\ 4602 IPR003025 \ Otx proteins constitute a class of vertebrate homeodomain-containing transcription factors that have been shown to be essential for anterior head formation, including brain morphogenesis. They are orthologous to the product of the Drosophila head gap gene orthodenticle (Otd) and appear to play similar roles in both lineages, since developmental abnormalities caused by disrupting these transcription factors in one organism can be rescued by substituting the factor(s) from the other. Such studies have provided strong evidence for a conserved genetic programme for insect and mammalian brain development, which presumably arose in a more primitive common ancestor PUBMED:10199636, PUBMED:10440864.

    Two vertebrate orthodenticle-related transcription factors have been identified, Otx1 and Otx2, of 355 and 289 residues respectively. They contain a bicoid-like homeodomain featuring a conserved lysine residue at position 9 of the DNA recognition helix, which is thought to confer high-affinity binding to TAATCC/T elements on DNA PUBMED:10375352. Otd-like transcription factors have also been found in zebrafish and certain lamprey species.
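The TAATCC/T element can be written as the single pattern TAATC[CT]. The sketch below scans a plain DNA string for exact matches on both strands. It is only an illustration: practical homeodomain binding-site prediction usually relies on position weight matrices and flanking context. `find_otx_sites` and `reverse_complement` are hypothetical helper names.

```python
import re

# TAATCC or TAATCT as one pattern; the lookahead lets finditer report
# overlapping matches (exact matches only, no mismatches allowed).
OTX_SITE = re.compile(r"(?=(TAATC[CT]))")
SITE_LEN = 6

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_otx_sites(dna: str) -> list[tuple[str, int]]:
    """Return (strand, start) pairs for TAATCC/T matches.

    `start` is the 0-based position of the site's leftmost base on the
    forward strand, for hits on either strand.
    """
    dna = dna.upper()
    hits = [("+", m.start()) for m in OTX_SITE.finditer(dna)]
    for m in OTX_SITE.finditer(reverse_complement(dna)):
        # A match starting at i on the reverse complement occupies
        # forward positions len(dna) - i - SITE_LEN .. len(dna) - i - 1.
        hits.append(("-", len(dna) - m.start() - SITE_LEN))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical toy sequence with one site on each strand.
    print(find_otx_sites("GGTAATCCGGAAGGATTAAA"))  # [('+', 2), ('-', 12)]
```

A reverse-strand hit appears on the forward strand as GGATTA or AGATTA, the reverse complement of the site.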

    \ 7405 IPR011443 \

    This domain appears to be found only in a small family of Chlamydia species. It is usually found repeated. The function of these proteins is not known.

    \ 1059 IPR004841 \

    Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell. A number of such proteins have been found to be evolutionarily related PUBMED:3146645, PUBMED:2687114, PUBMED:8382989. These proteins seem to contain up to 12 transmembrane segments. The best conserved region in this family is located in the second transmembrane segment.

    This domain is found in a wide variety of permeases, as well as several hypothetical proteins.

    \ 6611 IPR009621 \

    This is a group of transmembrane proteins of unknown function.

    \ 4572 IPR001130 \ The proteins in this family are related to a large superfamily of metalloenzymes PUBMED:9144792. TatD, a member of this family, has been shown experimentally to be a DNase PUBMED:10747959. Allantoinase (), N-isopropylammelide isopropyl amidohydrolase () and the SCN1 protein from fission yeast also belong to this family.\ 133 IPR005559 \

    CG-1 domains are highly conserved domains of about 130 amino acid residues that contain a predicted bipartite nuclear localisation signal (NLS). They are named after a partial cDNA clone, isolated from parsley, that encodes a sequence-specific DNA-binding protein PUBMED:8075408. CG-1 domains are associated with CAMTA (CAlModulin-binding Transcription Activator) proteins, which are transcription factors containing a calmodulin-binding domain and ankyrin (ANK) motifs (Bouche et al. 2002, J. Biol. Chem., in press).

    \ 7333 IPR011101 \

    This family contains phage proteins Gp37 (bacteriophage phiE125) and Gp68 (mycobacteriophage Che9c) and bacterial homologues.

    \ 5478 IPR008520 \ This region is found as two or more repeats in a small number of hypothetical proteins.\ 5801 IPR009241 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 4964 IPR000738 \ A conserved domain of 46 amino acids, called WHEP-TRS, has been shown PUBMED:1756734 to exist in a number of higher eukaryote aminoacyl-tRNA synthetases. This domain is present one to six times in these enzymes. There are three copies in the mammalian multifunctional aminoacyl-tRNA synthetase, in a region that separates the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain, and six copies in the intercatalytic region of the Drosophila enzyme. The domain is found at the N-terminal extremity of the mammalian tryptophanyl-tRNA synthetase and histidyl-tRNA synthetase, and of the mammalian, insect, nematode and plant glycyl-tRNA synthetases PUBMED:8463296. This domain may contain a central alpha-helical region and may play a role in the association of tRNA synthetases into multienzyme complexes.\ 1728 IPR005012 \

    Daxx is a ubiquitously expressed protein that functions, in part, as a transcriptional co-repressor through its interaction with a growing number of nuclear, DNA-associated proteins. Human Daxx (hDaxx) contains four structural domains commonly found in transcriptional regulatory proteins: two predicted paired amphipathic helices, an acid-rich domain and a Ser/Pro/Thr (SPT)-rich domain. The post-translational modification status of the SPT domain of hDaxx regulates its association with transcription factors such as Pax3 and ETS-1, effectively bringing hDaxx to sites of active transcription. There, hDaxx may associate with acetylated histones in the nucleosomes and with the chromatin-associated protein Dek. Histone deacetylases, which associate with the SPT domain of hDaxx, may also be brought to the site of active transcription. As a consequence, the histone tails of nucleosomes near the site of active transcription would be deacetylated. This allows the deacetylated tails to bind DNA, leading to an inactive chromatin structure and transcriptional repression PUBMED:12140263.

    The Daxx protein (also known as the Fas-binding protein) is thought to play a role in apoptosis as a component of nuclear promyelocytic leukemia protein (PML) oncogenic domains (PODs). Daxx associates with PODs through a direct interaction with PML, a critical component of PODs. The interaction is a dynamic, cell-cycle-regulated event and depends on the post-translational modification of PML by the small ubiquitin-related modifier SUMO-1.

    \ 266 IPR005182 \

    This domain is found in an uncharacterised family of membrane proteins. One to three copies are found in each protein, each copy flanked by transmembrane helices.

    \ 2641 IPR001282 \

    Glucose-6-phosphate dehydrogenase () (G6PDH) is a ubiquitous protein, present in bacteria and all eukaryotic cell types PUBMED:2838391. The enzyme catalyses the first step in the pentose phosphate pathway, i.e. the conversion of glucose-6-phosphate to gluconolactone 6-phosphate in the presence of NADP+, producing NADPH. Because the enzyme is expressed ubiquitously, it plays a major role in producing NADPH for the many NADPH-mediated reductive processes in all cells PUBMED:3393536. G6PDH deficiency is a common genetic abnormality affecting millions of people worldwide. Many sequence variants, most caused by single point mutations, are known, and they exhibit a wide variety of phenotypes PUBMED:3393536.
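The G6PDH step can be written as a balanced equation. The equation below uses 6-phospho-D-glucono-1,5-lactone, the systematic name of the gluconolactone 6-phosphate product, and shows the proton released alongside NADPH:

```latex
\text{D-glucose 6-phosphate} + \mathrm{NADP}^{+} \longrightarrow \text{6-phospho-D-glucono-1,5-lactone} + \mathrm{NADPH} + \mathrm{H}^{+}
```

Oxidising C1 removes two hydrogens. One goes to NADP+ as a hydride, giving NADPH; the other, from the C1 hydroxyl, is released as a free proton.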

    \ 1001 IPR006977 \ This domain defines a group of proteins of unknown function. \ 2349 IPR002786 \

    This is a family of prokaryotic proteins of unknown function.

    \ 3558 IPR005038 \ This octapeptide repeat is found in several bacterial proteins. The function of this repeat is unknown.\ 6279 IPR009456 \

    Moricin is a highly basic antibacterial peptide. Its structure shows that it consists of a long alpha-helix. The N terminus of the helix is amphipathic, and the C terminus is predominantly hydrophobic. The amphipathic N-terminal segment of the alpha-helix is mainly responsible for increasing the permeability of the bacterial membrane, which kills the bacteria PUBMED:11997013.

    \ 1508 IPR007847 \ This domain is found individually and at the N terminus of a number of multi-domain proteins, including several found in the bacterium Deinococcus radiodurans, which can survive ionizing irradiation and other DNA-damaging assaults at doses that are lethal to most other organisms.\ 2800 IPR003109 \

    In heterotrimeric G-protein signalling, cell surface receptors (GPCRs) are coupled to membrane-associated heterotrimers comprising a GTP-hydrolyzing subunit, G-alpha, and a G-beta/G-gamma dimer. In the inactive form, the GDP-bound alpha subunit is complexed with the beta and gamma subunits. When a ligand binds the receptor, GDP is displaced from G-alpha and GTP is bound. GTP-bound G-alpha dissociates from the trimer and associates with an effector until the intrinsic GTPase activity of G-alpha returns the protein to its GDP-bound form. Reassociation of GDP-bound G-alpha with the G-beta/G-gamma dimer terminates the signal. Several mechanisms regulate the signal output at different stages of the G-protein cascade. Two classes of intracellular proteins act as inhibitors of G-protein activation: GTPase-activating proteins (GAPs), which enhance GTP hydrolysis (see ), and guanine nucleotide dissociation inhibitors (GDIs), which inhibit GDP dissociation. The GoLoco or G-protein regulatory (GPR) motif found in various G-protein regulators PUBMED:10470031, PUBMED:10606204 acts as a GDI on G-alpha(i) PUBMED:11121039, PUBMED:11024022.

    The crystal structure of the GoLoco motif in complex with G-alpha(i) has been solved PUBMED:11976690. The motif consists of three small alpha-helices. The highly conserved Asp-Gln-Arg triad within the GoLoco motif participates directly in GDP binding by extending the arginine side chain into the nucleotide-binding pocket. This is highly reminiscent of the catalytic arginine finger employed by GTPase-activating proteins (see ). The additional arginine in the binding pocket affects the interaction of GDP with G-alpha and is therefore important for GoLoco GDI activity PUBMED:11976690.

    Some proteins known to contain a GoLoco motif are listed below:
  • Mammalian regulators of G-protein signaling 12 and 14 (RGS12 and RGS14), multifaceted signal transduction regulators.
  • Loco, the Drosophila RGS12 homologue.
  • Mammalian Purkinje-cell protein-2 (Pcp2). It may function as a cell-type-specific modulator of G-protein-mediated cell signaling. It is specifically expressed in cerebellar Purkinje cells and in retinal bipolar neurons.
  • Eukaryotic Rap1GAP, a GTPase activator for the nuclear ras-related regulatory protein RAP-1A.
  • Drosophila protein Rapsynoid (also known as Partner of Inscuteable, Pins) and its mammalian homologues AGS3 and LGN. They form a G-protein regulator family that also contains TPR repeats.

    \ 530 IPR005160 \

    The Ku heterodimer (composed of Ku70 and Ku80) contributes to genomic integrity through its ability to bind DNA double-strand breaks and facilitate repair by the non-homologous end-joining pathway. This entry represents the C-terminal arm, an alpha-helical region that embraces the beta-barrel domain of the opposite subunit PUBMED:11493912.

    \ 2017 IPR005631 \

    This is a family of uncharacterised small proteins.

    \ 7211 IPR009971 \

    This family consists of several bacterial proteins of around 90 residues in length. Members of this family seem to be found exclusively in the orders Vibrionales and Enterobacteriales. The function of this family is unknown.

    \ 400 IPR000922 \ The D-galactoside-binding lectin purified from the eggs of the sea urchin Anthocidaris crassispina exists as a disulphide-linked homodimer; the dimeric form is essential for hemagglutination activity PUBMED:2001368. The sea urchin egg lectin (SUEL) forms a new class of lectins. Although SUEL was first isolated as a D-galactoside-binding lectin, it was later shown to bind preferentially to L-rhamnose PUBMED:2001368, PUBMED:10564781. L-rhamnose and D-galactose share the same hydroxyl group orientation at C2 and C4 of the pyranose ring.

    A cysteine-rich domain homologous to the SUEL protein has been identified in several other proteins PUBMED:9261169, PUBMED:9668106, PUBMED:9920906.

    \ 3353 IPR003061 \

    The structural and functional relationships among independently cloned segments of the plasmid ColE1 region have been analysed PUBMED:3936034. This region regulates and codes for colicin E1 (cea), immunity (imm) and the mitomycin C-induced lethality function (lys). A model for the structure and expression of the colicin E1 operon has been proposed in which the cea and lys genes are expressed from a single inducible promoter, controlled by the LexA repressor in response to the SOS system of Escherichia coli PUBMED:3936034. The imm gene lies between the cea and lys genes and is transcribed in the opposite direction from a promoter located within the lys gene PUBMED:3936034. This arrangement indicates that the transcriptional units for all three genes overlap. It has been proposed that the formation of antisense RNA may be an important element in the coordinate regulation of gene expression in this system PUBMED:3936034.

    Hydropathy analysis of the imm gene products suggests that they have hydrophobic domains characteristic of membrane-associated proteins PUBMED:3936034. The microcin E1 immunity protein protects a cell harbouring the plasmid ColE1, which encodes colicin E1, against colicin E1. It has been reported to be essential for both autonomous replication and colicin E1 immunity PUBMED:384144.
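The text does not say which hydropathy method was applied to the imm products. As an illustration only, the sketch below assumes the widely used Kyte-Doolittle scale with a 19-residue sliding window. It uses a mean-hydropathy threshold of 1.6, a common heuristic for flagging candidate membrane-spanning segments. `hydropathy_profile` and `hydrophobic_windows` are hypothetical helper names.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry i averages protein[i:i + window].

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if window < 1 or window > len(values):
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new residue, drop the oldest.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Start positions of windows whose mean hydropathy exceeds `threshold`."""
    return [i for i, h in enumerate(hydropathy_profile(protein, window)) if h > threshold]


if __name__ == "__main__":
    # Hypothetical toy sequence: a 19-residue leucine stretch between lysines.
    toy = "K" * 20 + "L" * 19 + "K" * 20
    print(hydrophobic_windows(toy))
```

A run of consecutive flagged windows marks one candidate hydrophobic segment. Hydropathy alone suggests, but does not prove, membrane association, which matches the cautious wording above.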

    \ 434 IPR005201 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because protein fold is better conserved than sequence, some of the families can be grouped into 'clans'.

    This is a family of endo-beta-N-acetylglucosaminidases belonging to glycoside hydrolase family 85 (). These enzymes act on a broad spectrum of substrates.

    \ 3480 IPR007288 \ The NB glycoprotein is found in influenza B virus. Its function is unknown.\ 4969 IPR007016 \ This group of bacterial proteins is involved in the synthesis of the O-antigen, the polysaccharide component of the lipopolysaccharide found in the outer membrane of Gram-negative bacteria. The enzyme is encoded by the gene wzy, which is part of the O-antigen gene cluster PUBMED:12107146. Related proteins are found in Gram-positive organisms.\ 1799 IPR002099 \ Mismatch repair contributes to the overall fidelity of DNA replication. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex PUBMED:3304141. The sequences of some proteins involved in mismatch repair in different organisms have been found to be evolutionarily related.\ 4163 IPR001790 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    On the basis of sequence similarities, prokaryotic and eukaryotic ribosomal proteins in this family can be grouped together.

    \ 1364 IPR000916 \

    Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. Each designation is composed of the first three letters of the genus, a space, the first letter of the species name, a space, and an Arabic number. If two species names would have identical designations, they are distinguished from one another by adding one or more letters (as necessary) to each species designation.
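The naming rule above is simple enough to express as a function. This is a minimal sketch, not an official tool. `allergen_designation` is a hypothetical name, and the tie-breaking rule is modelled by a hypothetical `extra_letters` argument that the caller supplies. The sketch appends those letters to the species initial, which is one reading of the rule; in practice the subcommittee resolves collisions case by case.

```python
def allergen_designation(genus: str, species: str, number: int, extra_letters: str = "") -> str:
    """Build a WHO/IUIS-style designation such as 'Bet v 1'.

    Parts: first three letters of the genus, a space, the first letter of the
    species name (plus any tie-breaking letters), a space, and an Arabic number.
    """
    if len(genus) < 3 or not species:
        raise ValueError("genus needs at least three letters and species at least one")
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    genus_part = genus[:3].capitalize()
    # `extra_letters` is a hypothetical parameter modelling the tie-break rule:
    # letters are appended to the species part when two designations collide.
    species_part = (species[0] + extra_letters).lower()
    return f"{genus_part} {species_part} {number}"


if __name__ == "__main__":
    print(allergen_designation("Betula", "verrucosa", 1))  # Bet v 1
    print(allergen_designation("Malus", "domestica", 1))   # Mal d 1
```

The two calls reproduce designations listed for this family, Bet v 1 (birch) and Mal d 1 (apple).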

    This family includes allergens with the following designations: Aln g 1, Api g 1, Bet v 1, Car b 1, Cor a 1, Dau c 1, Mal d 1 and Pru a 1.

    Trees within the order Fagales possess particularly potent allergens, e.g. Bet v 1, the major birch (Betula verrucosa) pollen antigen. Bet v 1 is the main cause of type I allergies observed in early spring. Type I, or immunoglobulin E-mediated (IgE-mediated), allergies affect 1 in 5 people in Europe and North America. Commonly observed symptoms are hay fever, dermatitis, asthma and, in severe cases, anaphylactic shock. First contact with these allergens results in sensitisation. Subsequent contact cross-links IgE on mast cells, with concomitant release of histamine, and the symptoms of an allergic reaction follow.

    Recent NMR analysis PUBMED:8702605 has confirmed earlier predictions of the protein structure and of the site of the major T-cell epitope PUBMED:8660368. The Bet v 1 protein comprises 6 anti-parallel beta-strands and 3 alpha-helices. Four of the strands dominate the global fold, and 2 of the helices form a C-terminal amphipathic helical motif, which is believed to be the T-cell epitope. Other proteins belonging to this family include:
    - the major allergens from alder (Aln g 1), celery (Api g 1), hornbeam (Car b 1), hazel (Cor a 1) and apple (Mal d 1);
    - asparagus wound-induced protein AoPR1;
    - pathogenesis-related proteins from kidney bean, parsley (PR1-1 and PR1-3) and potato (STH-2 and STH-21);
    - pea disease resistance response proteins pI49, pI176 and DRRG49-C;
    - pea abscisic acid-responsive proteins ABR17 and ABR18;
    - soybean stress-induced protein SAM22.

    \ 3024 IPR003692 \ An appreciable fraction of the sulphur present in mammals occurs in the form of glutathione. Glutathione is synthesised and utilised through the reactions of the gamma-glutamyl cycle. These include the reactions catalysed by gamma-glutamylcysteine and glutathione synthetases, gamma-glutamyl transpeptidase, cysteinylglycinase, gamma-glutamyl cyclotransferase and 5-oxoprolinase PUBMED:45011.

    This family includes N-methylhydantoinase B, which converts hydantoin to N-carbamyl-amino acids, and 5-oxoprolinase, which catalyses the formation of L-glutamate from 5-oxo-L-proline. These enzymes are part of the oxoprolinase family and are related to hydantoinase_A.

    \ 635 IPR007271 \

    Members of this family of membrane proteins transport nucleotide sugars from the cytoplasm into Golgi vesicles; individual members transport CMP-sialic acid, UDP-galactose or UDP-GlcNAc. This family overlaps partially, but not completely, with the UDP-galactose transporter family.

    \ 988 IPR000697 \

    The EVH1 (WH1, RanBP1-WASP) domain is found in multi-domain proteins implicated in a diverse range of signaling, nuclear transport and cytoskeletal events. This domain of around 115 amino acids is present in species ranging from yeast to mammals. Many EVH1-containing proteins associate with actin-based structures and play a role in cytoskeletal organisation. EVH1 domains recognise and bind the proline-rich motif FPPPP with low affinity; further interactions then form between flanking residues PUBMED:11911879, PUBMED:9312002.

    WASP family proteins contain an EVH1 (WH1) domain at their N terminus, which binds proline-rich sequences in the WASP-interacting protein. Proteins of the RanBP1 family contain a WH1 domain in their N-terminal region, which seems to bind a different sequence motif present in the C-terminal part of the RanGTP protein PUBMED:9883880, PUBMED:7724562.

    The tertiary structure of the WH1 domain of the Mena protein revealed structural similarities with the pleckstrin homology (PH) domain. The overall fold consists of a compact parallel beta-sandwich, closed along one edge by a long alpha-helix. A highly conserved cluster of three surface-exposed aromatic side chains forms the recognition site for the domain's target ligands PUBMED:10338211.

    \ 1890 IPR003742 \ This is a family of uncharacterised proteins. The protein from Streptococcus pneumoniae may be a sensor regulator PUBMED:9157240.\ 5796 IPR009239 \

    This family consists of the Bacillus species-specific PapR protein. The papR gene belongs to the PlcR regulon and is located 70 bp downstream from plcR. It encodes a 48-amino-acid peptide. Disruption of the papR gene abolishes expression of the PlcR regulon, resulting in a large decrease in haemolysis and virulence in insect larvae. A processed form of PapR activates the PlcR regulon by allowing PlcR to bind to its DNA target. This activating mechanism is strain specific PUBMED:12198157.

    \ 5451 IPR008507 \ This family consists of several plant proteins of unknown function.\ 6303 IPR009467 \

    This family consists of several hypothetical bacterial proteins. The function of this family is unknown.

    \ 3399 IPR007281 \ The Mre11 complex is a multi-subunit nuclease that is composed of Mre11, Rad50 and Nbs1/Xrs2, and is involved in checkpoint signalling and DNA replication PUBMED:11988766. Mre11 has an intrinsic DNA-binding activity that is stimulated by Rad50 on its own or in combination with Nbs1 PUBMED:10823903.\ 6157 IPR010458 \

    This family consists of several fungal trichodiene synthase proteins. TRI5 encodes the enzyme trichodiene synthase, which has been shown to catalyse the first step in the trichothecene pathways of Fusarium and Trichothecium species PUBMED:9529523, PUBMED:11698643.

    \ 6402 IPR002606 \ This family consists of part of the bifunctional enzyme riboflavin kinase/FAD synthetase. These enzymes have both ATP:riboflavin 5'-phosphotransferase and ATP:FMN adenylyltransferase activities PUBMED:3023344. They catalyse the 5'-phosphorylation of riboflavin to FMN and the adenylylation of FMN to FAD PUBMED:3023344. A domain has been identified in the N-terminal region that is well conserved in all the bacterial FAD synthetases. This domain has remote similarity to nucleotidyl transferases and may therefore be involved in the adenylylation reaction of FAD synthetases PUBMED:12517446.\ 5167 IPR008004 \

    This family consists of several uncharacterised plant proteins of unknown function.

    \ 7700 IPR012451 \

    The proteins in this entry have not been characterised.

    \ 7757 IPR012929 \

    This domain is found in a number of proteins, including TPR protein () and yeast myosin-like proteins 1 (MLP1, ) and 2 (MLP2, ). These proteins share a number of features; for example, they all have coiled-coil regions and all three are associated with nuclear pores PUBMED:9024684, PUBMED:7798308, PUBMED:10617624. TPR is thought to be a component of nuclear pore complex-attached intranuclear filaments PUBMED:9024684 and is implicated in nuclear protein import PUBMED:7798308. Its N-terminal region is also involved in the activation of oncogenic kinases, possibly by mediating the dimerisation of kinase domains or by targeting these kinases to the nuclear pore complex PUBMED:7798308. MLP1 and MLP2 are involved in telomere length regulation, where they are thought to interact with proteins such as Tel1p and modulate their activity PUBMED:12490156.

    \ 2127 IPR007420 \ Family members are found in small bacterial proteins, and also in the heavy chains of eukaryotic myosin and kinesin, C-terminal to the motor domain. Members of this family may form coiled-coil structures.\ 1966 IPR004878 \

    This is a group of uncharacterised proteins from eukaryotes.

    \ 3230 IPR006817 \

    This repeating sequence, NAKVDQLSNDV, is found in the enterobacterial outer membrane lipoprotein LPP. The outer membrane lipoprotein is the most abundant protein in an Escherichia coli cell. The messenger RNA for the lipoprotein of the E. coli outer membrane codes for a putative precursor, prolipoprotein, which has 20 additional amino acid residues extending from the amino terminus of the lipoprotein.
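The simplest way to locate copies of this repeat in a protein sequence is an exact string search. The sketch below finds exact matches only. Real copies in lipoprotein sequences may deviate from this consensus, in which case a mismatch-tolerant or profile-based search would be needed. `repeat_positions` is a hypothetical helper, and the test sequence is a toy construct, not the real LPP sequence.

```python
# Repeat consensus quoted in the text for the E. coli outer membrane lipoprotein.
LPP_REPEAT = "NAKVDQLSNDV"


def repeat_positions(protein: str, motif: str = LPP_REPEAT) -> list[int]:
    """0-based start positions of every exact occurrence of `motif`, overlaps included."""
    protein, motif = protein.upper(), motif.upper()
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one residue later so overlapping copies are not skipped.
        start = protein.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical toy sequence, not the real LPP sequence.
    toy = "MK" + LPP_REPEAT + "AA" + LPP_REPEAT
    print(repeat_positions(toy))  # [2, 15]
```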

    \ 807 IPR007120 \

    RNA polymerases () catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase, compared with three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain represents the hybrid-binding domain and the wall domain PUBMED:11313498. The hybrid-binding domain binds the nascent RNA strand/template DNA strand hybrid in the Pol II transcription elongation complex. This domain contains the important structural motifs switch 3 and the flap loop, and binds an active-site metal ion PUBMED:11313498. This domain is also involved in binding to Rpb1 and Rpb3 PUBMED:11313498. Many of the bacterial members contain large insertions within this domain, known as dispensable region 2 (DRII).

    \ 2010 IPR005589 \

    The members of this family are uncharacterised proteins from a number of bacterial species. They range in size from 50 to 100 residues.

    \ 774 IPR002859 \ Sequence similarity has been known for some time between a region of polycystin-1, the autosomal dominant polycystic kidney disease (ADPKD) protein, and suREJ. SuREJ is the receptor for egg jelly, a sea urchin sperm glycoprotein involved in fertilization. The suREJ protein binds the glycoprotein coat of the egg (egg jelly), triggering the acrosome reaction, which transforms the sperm into a fusogenic cell. The sequence similarity and expression pattern suggest that the predicted human PKDREJ protein is a mammalian equivalent of the suREJ protein and may therefore have a central role in human fertilization PUBMED:9949214.\ 4635 IPR000716 \

    Thyroglobulin (Tg) is a large glycoprotein specific to the thyroid gland and is the precursor of the iodinated thyroid hormones thyroxine (T4) and triiodothyronine (T3). The N-terminal section of Tg contains 10 repeats of a domain of about 65 amino acids which is known as the Tg type-1 repeat PUBMED:3595599, PUBMED:8797845. Such a domain has also been found as a single \ or repeated sequence in the HLA class II associated invariant chain PUBMED:3038530; human pancreatic carcinoma marker proteins GA733-1 and GA733-2 PUBMED:2333300; nidogen (entactin), a sulphated glycoprotein which is widely distributed in basement membranes and that is tightly associated with laminin; insulin-like growth factor binding proteins (IGFBP) PUBMED:1709161; saxiphilin, a transferrin-like protein from Rana catesbeiana (North American bullfrog)\ that binds specifically to the neurotoxin saxitoxin PUBMED:8146142; chum salmon egg cysteine proteinase inhibitor, and equistatin, a thiol-protease inhibitor from Actinia equina (sea anemone) PUBMED:9153250. The existence of Thyr-1 domains in such a wide variety of proteins raises questions about their activity and function, and their interactions with neighbouring domains. The Thyr-1 and related domains belong to MEROPS proteinase inhibitor family I31, clan IX.

    \ \

    Equistatin from A. equina is composed of three Thyr-1 domains. Like other proteins that contain Thyr-1 domains (the thyropins), it binds reversibly and tightly to cysteine peptidases (peptidase family C1). In equistatin, inhibition of papain is a function of domain 1. Unusually, domain 2 inhibits cathepsin D, an aspartic peptidase (peptidase family A1), and has no activity against papain. Domain 3 inhibits neither papain nor cathepsin D, and its function or target peptidase has yet to be determined PUBMED:9153250, PUBMED:12650938.

    \ \ 7773 IPR012901 \

    This family features sequences that are similar to a region of hypothetical yeast gene product N2227 (). This is thought to be expressed during meiosis and may be involved in the defence response to stressful conditions PUBMED:8771715.

    \ 5029 IPR007212 \

    This is a probable metal-binding domain. It is found in a probable precorrin-3B C17-methyltransferase from Methanobacterium thermoautotrophicum, which catalyses the methylation of C-17 in precorrin-3B to form precorrin-4.

    \ 5083 IPR007920 \

    This family of proteins is functionally uncharacterised.

    \ 3455 IPR000881 \ Myotoxins PUBMED:2253781, PUBMED:1862521 are small basic peptides found in rattlesnake venom that cause severe muscle necrosis by a non-enzymatic mechanism. The peptides act very rapidly, causing instantaneous paralysis of the limbs to limit the flight of prey, and promoting death by paralysis of the diaphragm. Myotoxins have a well-conserved structure containing 6 cysteine residues, which form 3 disulphide bridges.\ 5742 IPR008805 \ This family consists of several RIB43A-like eukaryotic proteins. Ciliary and flagellar microtubules contain a specialised set of protofilaments, termed ribbons, that are composed of tubulin and several associated proteins. RIB43A was first characterised in the unicellular biflagellate Chlamydomonas reinhardtii, although highly related sequences are present in several higher eukaryotes, including humans. The function of this protein is unknown, although the structure of RIB43A and its association with the specialised protofilament ribbons and with basal bodies is relevant to the proposed role of ribbons in forming and stabilising doublet and triplet microtubules and in organising their three-dimensional structure. Human RIB43A homologues could represent a structural requirement in centriole replication in dividing cells PUBMED:10637302.\ 701 IPR001849 \

    The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins involved in intracellular signaling or as constituents of the cytoskeleton PUBMED:8500161, PUBMED:8497315, PUBMED:8236453, PUBMED:7985225, PUBMED:7531822, PUBMED:7890802, PUBMED:7583640.

    \

    The function of this domain is not clear; several putative functions have been suggested:

    \
  • binding to the beta/gamma subunit of heterotrimeric G proteins,
  • binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,
  • binding to phosphorylated Ser/Thr residues,
  • attachment to membranes by an unknown mechanism.

    It is possible that different PH domains have totally different ligand requirements.

    \

    The 3D structure of several PH domains has been determined PUBMED:7634082. All known cases have a common structure consisting of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops connecting the beta-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no totally invariant residues within the PH domain.

    \

    Proteins reported to contain one or more PH domains belong to the following families:

    \ \ 7187 IPR009955 \

    This family consists of several mammalian liver-expressed antimicrobial peptide 2 (LEAP-2) sequences. LEAP-2 is a cysteine-rich, cationic protein. It contains a core structure with two disulfide bonds formed by cysteine residues in relative 1-3 and 2-4 positions. LEAP-2 is synthesised as a 77-residue precursor, which is predominantly expressed in the liver and highly conserved among mammals. The largest native LEAP-2 form, of 40 amino acid residues, is generated from the precursor at a putative cleavage site for a furin-like endoprotease. In contrast to smaller LEAP-2 variants, this peptide exhibits dose-dependent antimicrobial activity against selected microbial model organisms PUBMED:12493837. The exact function of this family is unclear.

    \ 6571 IPR010622 \

    This family represents a conserved region of eukaryotic Fas-activated serine/threonine (FAST) kinases that contains several conserved leucine residues. FAST kinase is rapidly activated during Fas-mediated apoptosis, when it phosphorylates TIA-1, a nuclear RNA-binding protein that has been implicated as an effector of apoptosis PUBMED:7544399. Note that many family members are hypothetical proteins.

    \ 5127 IPR007964 \

    This family consists of several uncharacterised mammalian proteins of unknown function.

    \ 4017 IPR003685 \

    PsaD is a small, extrinsic polypeptide located on the stromal side (cytoplasmic side in cyanobacteria) of the photosystem I (PSI) reaction centre complex. It is required for native assembly of PSI reaction clusters and is implicated in the electrostatic binding of ferredoxin within the reaction centre PUBMED:9692933. PsaD forms a dimer in solution that is bound by PsaE; however, PsaD is monomeric in its native PSI complex PUBMED:9692933.

    \ 2448 IPR011259 \ The ERM family consists of three closely related proteins, ezrin, radixin and moesin PUBMED:9048483. Ezrin was first identified as a constituent of microvilli PUBMED:6885906, radixin as a barbed end-capping actin-modulating protein from isolated junctional fractions PUBMED:2500445, and moesin as a heparin-binding protein PUBMED:3046603. A tumour suppressor molecule responsible for neurofibromatosis type 2 (NF2) is highly similar to ERM proteins and has been designated merlin (moesin-ezrin-radixin-like protein). ERM molecules contain 3 domains: an N-terminal globular domain, an extended alpha-helical domain and a charged C-terminal domain PUBMED:9048483. Ezrin, radixin and merlin also contain a polyproline region between the helical and C-terminal domains. The N-terminal domain is highly conserved, and is also found in merlin, band 4.1 proteins and members of the band 4.1 superfamily. ERM proteins crosslink actin filaments with plasma membranes. They co-localise with CD44 at actin filament-plasma membrane interaction sites, associating with CD44 via their N-terminal domains and with actin filaments via their C-terminal domains PUBMED:9048483.\ 3518 IPR000903 \ Myristoyl-CoA:protein N-myristoyltransferase () (Nmt) PUBMED:8322618 is the enzyme responsible for transferring a myristate group to the N-terminal glycine of a number of cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to 60 kDa whose sequence appears to be well conserved.\ 7343 IPR011093 \

    This family of proteins is found in pathogenic strains of Gammaproteobacteria. Although the function of these proteins is unknown, they could be involved in pathogenesis.

    \ 4362 IPR005011 \ This family of proteins appears to contain a leucine zipper PUBMED:10887110 and may therefore be a family of transcription factors.\ 1916 IPR003802 \

    This entry describes proteins of unknown function.

    \ 262 IPR001534 \

    This apparently nematode-specific protein family has been called family 2 PUBMED:9417907. The proteins show weak similarity to transthyretin (formerly called prealbumin), which transports thyroid hormones. The specific function of this protein is unknown.

    \ \ 5475 IPR008519 \ This family consists of a series of 29-residue repeats found in a single Caenorhabditis elegans protein. The function of both the repeat and the whole sequence is unknown.\ 3868 IPR001082 \ Pilin is a subunit of the pilus, a polar flexible filament, which consists of a single polypeptide chain arranged in a helical configuration of five subunits per turn. Gram-negative bacteria produce pilin of the NMePhe type, which is characterized by the presence of a very short leader peptide of 6 to 7 residues, followed by a methylated N-terminal phenylalanine residue and a highly conserved sequence of about 24 hydrophobic residues PUBMED:2898203, PUBMED:3118043.\ 1413 IPR004873 \ The BURP domain is found at the C-terminus of several different plant proteins. It was named after the proteins in which it was first identified: the BNM2 clone-derived protein from Brassica napus; USPs and USP-like proteins (, ); RD22 from Arabidopsis thaliana; and PG1beta from Lycopersicon esculentum. This domain is around 230 amino acid residues long. It possesses the following conserved features: two phenylalanine residues at its N-terminus; two cysteine residues; and four repeated cysteine-histidine motifs, arranged as CH-X(10)-CH-X(25-27)-CH-X(25-26)-CH, where X can be any amino acid PUBMED:9790599. The function of this domain is unknown.\ 7354 IPR006563 \

    This domain is found exclusively in plant proteins, in association with HOX domains, which suggests that these proteins may be homeodomain transcription factors.

    \ 80 IPR003340 \ Two DNA-binding proteins, RAV1 and RAV2 from Arabidopsis thaliana, contain two distinct amino acid sequence domains found only in higher plant species. The N-terminal regions of RAV1 and RAV2 are homologous to the AP2 DNA-binding domain (see ) present in a family of transcription factors, while the C-terminal region exhibits homology to the highly conserved C-terminal domain, designated B3, of VP1/ABI3 transcription factors PUBMED:9862967. The AP2 and B3-like domains of RAV1 bind autonomously to the CAACA and CACCTG motifs, respectively, and together achieve a high affinity and specificity of binding. It has been suggested that the AP2 and B3-like domains of RAV1 are connected by a highly flexible structure enabling the two domains to bind to the CAACA and CACCTG motifs in various spacings and orientations PUBMED:9862967.\ 2068 IPR007293 \ This domain is found in functionally uncharacterised proteins from pathogenic bacteria such as Helicobacter pylori, Campylobacter jejuni and Vibrio cholerae. The H. pylori protein consists of two copies of this domain.\ 6325 IPR009477 \

    This family consists of several hypothetical Baculovirus proteins of unknown function.

    \ 6333 IPR010523 \

    This domain is found at the N terminus of a subset of sigma54-dependent transcriptional activators in several proteobacteria, including activators of phenol degradation such as XylR. It is found adjacent to .

    \ 4996 IPR003537 \ Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior. There have been four secretion systems described in animal enteropathogens, such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia PUBMED:9618447.\ \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself PUBMED:9618447, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.

    \ \

    Exotoxins secreted by the type III system do not possess a secretion signal, and are considered unique for this reason PUBMED:9618447. Yersinia species secrete a Rho GTPase-activating protein, YopE PUBMED:2307658, PUBMED:2191183, that disrupts the host cell actin cytoskeleton. YopE is regulated by another bacterial gene product, SycE PUBMED:10419539, which enables the exotoxin to remain soluble in the bacterial cytoplasm. A similar protein, exoenzyme S from Pseudomonas aeruginosa, has both ADP-ribosyltransferase and GTPase-activating activity PUBMED:2191183, PUBMED:10419539.

    \ 973 IPR007705 \

    A crucial step in membrane fusion is the formation of the SNARE complex, in which conserved regions called 'SNARE motifs' from individual SNAREs associate and twist to form the core complex, an all-parallel coiled coil. The neuronal SNARE complex is a heterotrimer of the vesicular (v-) SNARE VAMP-2 and the two target plasma membrane (t-) SNAREs syntaxin 1A and SNAP-25. It has been proposed that SNARE core complex formation proceeds like a zipper, beginning at the membrane-distal region and propagating toward the membrane-proximal end. SNARE complex formation is an energy-releasing process that may supply the free energy required for membrane fusion PUBMED:12740606.

    \

    This family includes the Golgi SNAP receptor (SNARE) complex protein, which is involved in transport from the endoplasmic reticulum to the Golgi apparatus and in intra-Golgi transport, and the vesicle transport v-SNARE protein, which mediates vesicle transport pathways through interaction with t-SNAREs on the target membrane.

    \ 612 IPR007781 \ Alpha-N-acetylglucosaminidase is a lysosomal enzyme required for the stepwise degradation of heparan sulphate PUBMED:10588735. Mutations in the alpha-N-acetylglucosaminidase (NAGLU) gene can lead to mucopolysaccharidosis type IIIB (MPS IIIB, or Sanfilippo syndrome type B), which is characterised by neurological dysfunction but relatively mild somatic manifestations PUBMED:12049639.\ 7639 IPR012916 \

    This domain contains sequences that are similar to the N-terminal region of Red protein (). This and related proteins contain a RED repeat, which consists of a number of RE and RD sequence elements PUBMED:10216252. The region in question has several conserved nuclear localisation signal (NLS) sequences and a putative trimeric coiled-coil region PUBMED:10216252, suggesting that these proteins localise to the nucleus PUBMED:10216252. The function of Red protein is unknown, but its efficient sequestration to nuclear bodies suggests that its expression may be tightly regulated, or that the protein self-aggregates extremely efficiently PUBMED:10216252.

    \ 1057 IPR000833 \ Alpha-amylase inhibitor specifically inhibits mammalian alpha-amylases by forming a tight stoichiometric 1:1 complex with alpha-amylase. The inhibitor has no action on plant and microbial alpha-amylases.\ 7454 IPR013036 \

    A region of similarity shared by several Rhodopirellula baltica cytochrome-like proteins that are predicted to be secreted. These proteins also contain , , and .

    \ 344 IPR006715 \ The N-terminal region of the PEA3 transcription factors is implicated in transactivation and in inhibition of DNA binding PUBMED:9259977. Transactivation is potentiated by activation of the Ras/MAP kinase and protein kinase A signalling cascades. The N-terminal region contains conserved MAP kinase phosphorylation sites PUBMED:9285689.\ 4133 IPR003150 \ RFX is a regulatory factor that binds to the X box of MHC class II genes and is essential for their expression. The DNA-binding domain of RFX is the central domain of the protein and binds ssDNA as either a monomer or a homodimer PUBMED:2253877.\ 7011 IPR009843 \

    This family consists of several hypothetical bacterial proteins of around 320 residues in length. Members of this family are mainly found in Rhizobium and Agrobacterium species. The function of this family is unknown.

    \ 5501 IPR008533 \ This family consists of several bacterial proteins of unknown function.\ 1208 IPR004914 \ This family includes various proteins that are involved in antirestriction. The ArdB protein efficiently inhibits restriction by members of the three known families of type I systems of Escherichia coli PUBMED:8393008. \ 6449 IPR010578 \

    This entry represents the C-terminal region of the eukaryotic single-minded (SIM) protein. Drosophila single-minded acts as a positive master gene regulator in central nervous system midline formation. There are two homologues in mammals: SIM1 and SIM2, which are members of the basic-helix-loop-helix PAS family of transcription factors. SIM1 and SIM2 are novel heterodimerisation partners for ARNT in vitro, and they may function both as positive and negative transcriptional regulators in vivo, during embryogenesis and in the adult organism PUBMED:9020169. SIM2 is thought to contribute to some specific Down syndrome phenotypes PUBMED:9199934. This domain is found in conjunction with a domain and associated with motif.

    \ 6235 IPR009441 \

    This family consists of several Borna disease virus P40 proteins. Borna disease (BD) is a persistent viral infection of the central nervous system caused by Borna disease virus (BDV), a non-segmented, negative-sense, single-stranded RNA virus. P40 is known to be a nucleoprotein PUBMED:9882386.

    \ 5001 IPR005229 \

    This family of conserved hypothetical proteins has no known function.

    \ 4663 IPR000380 \ Prokaryotic topoisomerase I () PUBMED:7773745, PUBMED:7770916, otherwise known as relaxing enzyme, untwisting enzyme or swivelase, catalyses the ATP-independent breakage of single-stranded DNA, followed by passage and rejoining of another single-stranded DNA region PUBMED:8114910. This reaction brings about the conversion of one topological isomer of DNA into another, e.g. relaxation of superhelical turns, interconversion of simple and knotted rings of single-stranded DNA, and intertwisting of single-stranded rings of complementary sequences PUBMED:8114910, PUBMED:2553698. Prokaryotic topoisomerase I folds in an unusual way to give 4 distinct domains, enclosing a hole large enough to accommodate a double-stranded DNA segment PUBMED:8114910. A tyrosine at the active site, which lies at the interface of 2 domains, is involved in transient breakage of a DNA strand and formation of a covalent protein-DNA intermediate PUBMED:8114910. The structure reveals a plausible mechanism by which this and related enzymes could catalyse the passage of one DNA strand through a transient break in another strand PUBMED:8114910. Escherichia coli contains 2 type I topoisomerases: topoisomerases I and III PUBMED:2553698. Topoisomerase III can be purified as a potent concatenase, but its role in DNA metabolism is still unclear PUBMED:2553698. Yeast, a eukaryote, also contains a topoisomerase that is similar in sequence and function to the prokaryotic type I topoisomerases PUBMED:2546682.\ 2390 IPR006947 \ Allicin is a thiosulphinate that gives rise to dithiines, allyl sulphides and ajoenes, the three groups of active compounds in Allium species. Allicin is synthesised from cysteine sulphoxide derivatives by alliinase, whose C-S lyase activity cleaves C(beta)-S(gamma) bonds.
It is thought that this enzyme forms part of a primitive plant defence system PUBMED:12235163.\ 3308 IPR007444 \ This family represents MdoG, a protein that is necessary for the synthesis of periplasmic glucans. The function of MdoG remains unknown; it has been suggested that it may catalyse the addition of branches to a linear glucan backbone.\ 2368 IPR001017 \ This entry includes a number of dehydrogenases, all of which use thiamine pyrophosphate as a cofactor and are members of a multienzyme complex: pyruvate dehydrogenase (), a component of the multienzyme pyruvate dehydrogenase complex; 2-oxoglutarate dehydrogenase (), a component of the multienzyme 2-oxoglutarate dehydrogenase complex, which contains multiple copies of three enzymatic components: 2-oxoglutarate dehydrogenase (E1), dihydrolipoamide succinyltransferase (E2) and lipoamide dehydrogenase (E3); and 2-oxoisovalerate dehydrogenase (), a component of the multienzyme branched-chain alpha-keto acid dehydrogenase complex.\ 2526 IPR002561 \ This family includes an extracellular region from the envelope glycoprotein of Ebola and Marburg viruses. This region is also produced as a separate transcript that gives rise to a non-structural, secreted glycoprotein, which is produced in large amounts and has an unknown function PUBMED:9576958. Processing of this protein may be involved in viral pathogenicity PUBMED:8622982.\ 3351 IPR005330 \

    The MHYT domain (~190 residues) is thought to function as a sensor domain in bacterial signalling proteins, and is named after the conserved amino acids of its motif: methionine, histidine and tyrosine. The MHYT domain consists of six predicted transmembrane (TM) segments connected by short cytoplasmic and periplasmic loops that are rich in charged residues, particularly arginine. Three of the TM segments contain the MHYT motif near the outer face of the cytoplasmic membrane. The MHYT domain has been found in several phylogenetically distinct bacteria, either as a separate, single domain or in combination with other domains, such as a LytTR-type DNA-binding helix-turn-helix (), or the signalling domains histidine kinase (), GGDEF (), EAL () or PAS (). Proteins containing this domain include CoxC () and CoxH () from Pseudomonas carboxydovorans.

    \ 2188 IPR007479 \ This is a small bacterial protein of unknown function.\ 7412 IPR011445 \

    These proteins share a highly conserved sequence at their N terminus. They include several proteins from Rhodopirellula baltica and several from proteobacteria.

    \ 3233 IPR003835 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    These enzymes belong to glycosyltransferase family 19. Lipid-A-disaccharide synthetase acts together with acyl-[acyl-carrier-protein]--UDP-N-acetylglucosamine O-acyltransferase and tetraacyldisaccharide 4'-kinase in the biosynthesis of lipid A, a phosphorylated glycolipid of the outer membrane of Escherichia coli and other bacteria. These enzymes catalyse the first disaccharide-forming step of lipid A biosynthesis, producing lipid-A-disaccharide.

    \ 4372 IPR007575 \ Members of this entry have only been identified in species of the Streptomyces genus. Two family members are known to be part of gene clusters involved in the synthesis of polyketide-based spore pigments, homologous to clusters involved in the synthesis of polyketide antibiotics. The function of this protein is unknown, but it has been speculated to contain a NAD(P) binding site PUBMED:8344517.\ 1905 IPR003775 \

    This entry describes proteins of unknown function.

    \ 8110 IPR013173 \

    Eubacterial DnaG primases interact with several factors to form the replisome. One of these factors is DnaB, a helicase. This domain has been demonstrated to be responsible for the interaction between DnaG and DnaB PUBMED:8308039.

    \ 7822 IPR013116 \

    Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This reaction is the second step in the biosynthetic pathway of the essential branched-chain amino acids valine and isoleucine.

    \ 2561 IPR005503 \ FliL controls the rotational direction of the flagella during chemotaxis PUBMED:3519573. FliL is a cytoplasmic membrane protein associated with the basal body PUBMED:10439416.\ 4919 IPR007428 \

    \ Lipoproteins in Gram-negative microbes also act as structural stabilisers, forming non-covalent bonds with peptidoglycan on the outer membrane of the cell PUBMED:7542800. Following completion of the genomes of several Gram-negative prokaryotes, a putative lipoprotein, VacJ, was identified among the predicted open reading frames. Biochemical analysis of the Shigella flexneri VacJ protein revealed it to be essential for virulence, promoting spread of bacterial cells through the intercellular space of tissues PUBMED:8145644. \

    \

    \ When VacJ is expressed in the facultative intracellular microbe, host cells form membranous protrusions containing the pathogen, allowing it to move to the cytoplasm of the next target cell. As homologues of this lipoprotein have largely been found in obligate or facultative intracellular microbial genomes, it appears to be specific for that particular lifestyle PUBMED:8145644.\

    \ \ 1167 IPR007173 \ This domain is specific to D-arabinono-1,4-lactone oxidase , which is involved in the final step of the D-erythroascorbic acid biosynthesis pathway PUBMED:10094636.\ 3435 IPR004332 \ The plant MuDR transposase domain is present in plant proteins that are presumed to be the transposases for Mutator transposable elements PUBMED:7672579, PUBMED:1661256. The function of these proteins is unknown.\ 6684 IPR009661 \

    This family consists of the N-terminal region of several hypothetical Nucleopolyhedrovirus proteins of unknown function.

    \ 1298 IPR000526 \ Auxin-binding protein is located in the lumen of the endoplasmic reticulum (ER). The primary structure contains an N-terminal hydrophobic leader sequence of 30-40 amino acids, which could represent a signal for translocation of the protein to the ER PUBMED:2555179, PUBMED:1321684. The mature protein comprises around 165 residues and contains a number of potential N-glycosylation sites. In vitro transport studies have demonstrated co-translational glycosylation PUBMED:1321684. Retention within the lumen of the ER correlates with an additional signal located at the C terminus, the sequence Lys-Asp-Glu-Leu, which is known to prevent secretion of proteins from the lumen of the ER in eukaryotic cells PUBMED:2555179, PUBMED:1321684.\ 2004 IPR005537 \

    The members of this family have no known function. They are around 300 amino acids in length and have two conserved motifs. At the N-terminus is a PXXIG motif and a more strongly conserved motif in the central region YXPGXXXKGXXR where X can be any amino acid.
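    As an illustration of the motif notation, the following minimal Python sketch scans a protein sequence for the central YXPGXXXKGXXR motif (X = any amino acid) by translating each X into a regular-expression wildcard. The helper names `motif_to_regex` and `find_motif` and the toy sequence are hypothetical and are not part of any InterPro tool.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Convert a compact motif such as 'YXPGXXXKGXXR', where X stands for
    any amino acid, into a compiled regular expression."""
    return re.compile("".join("." if ch == "X" else ch for ch in motif.upper()))


def find_motif(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for every occurrence.

    Trying each start position separately also reports overlapping hits,
    which re.finditer would skip.
    """
    pattern = motif_to_regex(motif)
    hits = []
    for start in range(len(sequence)):
        m = pattern.match(sequence, start)
        if m:
            hits.append((start + 1, m.group()))
    return hits


# Hypothetical toy sequence, not taken from any real family member.
seq = "MKTAYAPGLLAKGSTRQ"
print(find_motif(seq, "YXPGXXXKGXXR"))  # [(5, 'YAPGLLAKGSTR')]
```

    The same helpers accept the N-terminal PXXIG motif; a match on its own says nothing about family membership.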

    \ 3324 IPR000869 \

    Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, nickel, etc. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds PUBMED:3064814, PUBMED:2959513, PUBMED:1779825. The metallothionein superfamily comprises all polypeptides that resemble equine renal metallothionein in several respects, e.g. low molecular weight; high metal content; amino acid composition with high Cys and low aromatic residue content; unique sequence with characteristic distribution of cysteines; and spectroscopic manifestations indicative of metal thiolate clusters. An MT family subsumes MTs that share particular sequence-specific features and are thought to be evolutionarily related. Fifteen MT families have been characterised, each family being identified by its number and its taxonomic range.\

    Fungi-IV (family 11) MTs are proteins of about 55-56 residues, with 9 conserved cysteines. Members are recognised by the sequence pattern C-X-K-C-x-C-x(2)-C-K-C. The taxonomic range of the members extends to Ascomycotina. The protein contains a number of unusual histidine and phenylalanine residues conserved in the N-terminal part of the sequence; this N-terminal region does not contain any Cys. The protein binds to copper ions.
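    The family 11 signature above is written in PROSITE-style pattern syntax, where `x` (or `X`) stands for any residue and `x(2)` for exactly two. The minimal Python sketch below, assuming only the constructs that appear in this pattern plus simple bracketed alternatives, translates such a pattern into a regular expression and applies it to a toy sequence. `prosite_to_regex` and the sequence are hypothetical; full PROSITE syntax (exclusions in braces, `<`/`>` anchors) is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a regex string.

    Assumed subset: elements separated by '-', single residues,
    'x'/'X' for any residue, bracketed alternatives such as [ST],
    and repeat counts '(n)' or '(n,m)'.
    """
    element_re = re.compile(r"([A-Za-z]|\[[A-Za-z]+\])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue in ("x", "X") else residue  # x = any residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"  # repeat count
        parts.append(token)
    return "".join(parts)


family11 = prosite_to_regex("C-X-K-C-x-C-x(2)-C-K-C")
print(family11)  # C.KC.C.{2}CKC

# Hypothetical toy sequence containing one match.
match = re.search(family11, "AACAKCGCSTCKCAA")
if match:
    print(match.start() + 1, match.group())  # 3 CAKCGCSTCKC
```

    A regular expression of this kind only reproduces the signature; it does not by itself establish that a hit is a family 11 metallothionein.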

    \ 28 IPR005613 \

    Aip3p/Bud6p is a regulator of cell and cytoskeletal polarity in Saccharomyces cerevisiae that was previously identified as an actin-interacting protein. Actin-interacting protein 3 (Aip3p) localizes at the cell cortex, where cytoskeleton assembly must be achieved to execute polarized cell growth, and deletion of AIP3 causes gross defects in cell and cytoskeletal polarity. Aip3p localization is mediated by the secretory pathway; mutations in early- or late-acting components of the secretory apparatus lead to Aip3p mislocalization PUBMED:10679021.

    \ 4645 IPR003536 \ Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior. There have been four secretion systems described in animal enteropathogens, such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia PUBMED:9618447.\ \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself PUBMED:9618447, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.

    \ \

    Exotoxins secreted by the type III system do not possess a secretion signal, and are considered unique for this reason PUBMED:9618447. Enteropathogenic and enterohaemorrhagic Escherichia coli produce the bacterial adhesion molecule intimin PUBMED:10835344, which targets the translocated intimin receptor, Tir. Tir is secreted by the bacteria and is embedded in the target cell's plasma membrane PUBMED:10835344. This facilitates bacterial cell attachment to the host.

    \ 6923 IPR010771 \

    This family consists of several bacterial intracellular growth attenuator (IgaA) proteins. IgaA is involved in negative control of bacterial proliferation within fibroblasts. IgaA is homologous to the Escherichia coli YrfF and Proteus mirabilis UmoB proteins. Whereas the biological function of YrfF is currently unknown, UmoB has been shown elsewhere to act as a positive regulator of FlhDC, the master regulator of flagella and swarming. FlhDC has been shown to repress cell division during P. mirabilis swarming, suggesting that UmoB could repress cell division via FlhDC. This biological function, if maintained in Salmonella enterica, could sustain a putative negative control of cell division and growth exerted by IgaA in intracellular bacteria PUBMED:11553591.

    \ 4516 IPR003120 \ This family consists of transcription factors related to STE and is found associated with the C2H2 zinc finger in some proteins.\ 4882 IPR007193 \ Transcripts harbouring premature signals for translation termination are recognized and rapidly degraded by eukaryotic cells through a pathway known as nonsense-mediated mRNA decay. In Saccharomyces cerevisiae, three trans-acting factors (Upf1 to Upf3) are required for nonsense-mediated mRNA decay PUBMED:11073994.\ 2850 IPR007804 \

    Gas vesicles are intracellular, protein-coated, hollow organelles found in cyanobacteria and halophilic archaea. They are permeable to ambient gases by diffusion and provide buoyancy, enabling cells to move upwards in water to access oxygen and/or light. Proteins in this family are involved in the formation of gas vesicles PUBMED:9573198.\

    \ 6590 IPR009613 \

    This family, which includes bacterial and eukaryotic members, represents a conserved region located towards the C-terminal end of a number of hypothetical proteins of unknown function. These are possibly integral membrane proteins.

    \ 492 IPR001387 \

    This is a large family of DNA-binding helix-turn-helix proteins that includes a bacterial plasmid copy control protein, bacterial methylases, various bacteriophage transcription control proteins and a vegetative-specific protein from Dictyostelium discoideum.

    \ 6081 IPR010426 \

    This family consists of several trimethylamine methyltransferase (MTTB) proteins from numerous Rhizobium and Methanosarcina species.

    \ 5186 IPR008023 \

    This is a family of proteins of unknown function.

    \ 3385 IPR007681 \ Segregation of nuclear and cytoplasmic processes facilitates regulation of many eukaryotic cellular functions such as gene expression and cell cycle progression. Trafficking through the nuclear pore requires a number of highly conserved soluble factors that escort macromolecular substrates into and out of the nucleus. The Mog1 protein has been shown to interact with RanGTP and to stimulate its guanine nucleotide release, suggesting that Mog1 regulates the nuclear transport functions of Ran PUBMED:11733047. The human homologue of Mog1 is thought to be alternatively spliced.\ 1984 IPR005175 \

    This putative conserved domain is found in proteins that contain AT-hook motifs, suggesting a DNA-binding function for the proteins as a whole; however, the function of this domain itself is unknown. Overexpression of a protein containing this domain, , in Arabidopsis thaliana causes late flowering and modified leaf development PUBMED:10759496.

    \ 4543 IPR011547 \

    A number of proteins involved in the transport of sulphate across a membrane, as well as some as yet uncharacterised proteins, have been shown PUBMED:8140616, PUBMED:7616962 to be evolutionarily related. These proteins are:


    These proteins are highly hydrophobic and seem to contain about 12 transmembrane domains.

    \ \ 4979 IPR006086 \

    Xeroderma pigmentosum (XP) PUBMED:8160271 is a human autosomal recessive disease, characterized by a high incidence of sunlight-induced skin cancer. Skin cells from people with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are at least seven genetic complementation groups involved in this pathway: XP-A to XP-G. XP-G is one of the rarest and most phenotypically heterogeneous forms of XP, showing anything from slight to extreme dysfunction in DNA excision repair PUBMED:8464724, PUBMED:8206890. XP-G can be corrected by a 133 kDa nuclear protein, XPGC PUBMED:8160271. XPGC is an acidic protein that confers normal UV resistance in expressing cells PUBMED:8206890. It is a magnesium-dependent, single-strand DNA endonuclease that makes structure-specific endonucleolytic incisions in a DNA substrate containing a duplex region and single-stranded arms PUBMED:8206890, PUBMED:8090225. XPGC cleaves one strand of the duplex at the border with the single-stranded region PUBMED:8090225.


    XPG belongs to a family of proteins that includes RAD2 from budding yeast and rad13 from fission yeast, which are single-stranded DNA endonucleases PUBMED:8090225, PUBMED:8247134; mouse and human FEN-1, a structure-specific endonuclease; RAD2 from fission yeast and RAD27 from budding yeast; fission yeast exo1, a 5'-3' double-stranded DNA exonuclease that may act in a pathway that corrects mismatched base pairs; yeast DHS1, and yeast DIN7. Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved.
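    The conserved pentapeptide above is written in PROSITE-style notation: positions are separated by hyphens and a bracketed set such as [DE] accepts either residue. As an illustrative sketch only, the pattern can be converted to a regular expression and scanned against a protein sequence. The helper names (`prosite_to_regex`, `find_motif`) and the toy sequence are hypothetical, and the converter handles only the simple elements used here (no repeat counts or exclusions).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Handles only single residues, bracketed alternatives such as [DE],
    and 'x' (any residue); repeat counts and {..} exclusions are not supported.
    """
    parts = []
    for element in pattern.split("-"):
        # 'x' means any residue; 'E' or '[DE]' are already valid regex syntax
        parts.append("." if element.lower() == "x" else element)
    return "".join(parts)


# Conserved pentapeptide quoted in the text.
XPG_MOTIF = re.compile(prosite_to_regex("E-A-[DE]-A-[QS]"))


def find_motif(sequence: str) -> list:
    """Return (1-based position, matched residues) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in XPG_MOTIF.finditer(sequence.upper())]


# Toy sequence (hypothetical, not a real XPG-family protein).
print(find_motif("MKLEAEAQGGEADASR"))  # [(4, 'EAEAQ'), (11, 'EADAS')]
```

    Note that `finditer` reports only non-overlapping matches; overlapping hits would need a lookahead pattern.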

    \ 176 IPR000323 \ Copper type II, ascorbate-dependent monooxygenases PUBMED:2792366 are a class of enzymes that require copper as a cofactor and use ascorbate as an electron donor. This family contains two related enzymes, dopamine-beta-monooxygenase () and peptidyl-glycine alpha-amidating monooxygenase (). There are a few regions of sequence similarity between these two enzymes; two of these regions contain clusters of conserved histidine residues which are most probably involved in binding copper.\ 6108 IPR010435 \

    This domain of unknown function is present in bacterial and plant peptidases belonging to MEROPS peptidase family S8 (clan SB), subfamily S8A (subtilisin), and is found in conjunction with the PA (protease-associated) domain () and, additionally in bacteria, with the surface protein anchor domain ().

    \ 3800 IPR001672 \

    Phosphoglucose isomerase () (PGI) PUBMED:6115414, PUBMED:1593646 is a dimeric enzyme that catalyses the reversible isomerization of glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. PGI is a multifunctional protein that is also known as neuroleukin (a neurotrophic factor that mediates the differentiation of neurons), autocrine motility factor (a tumour-secreted cytokine that regulates cell motility), differentiation and maturation mediator, and myofibril-bound serine proteinase inhibitor, and it has different roles inside and outside the cell. In the cytoplasm, it catalyses the second step in glycolysis, while outside the cell it serves as a nerve growth factor and cytokine PUBMED:10653639.


    PGI from Bacillus stearothermophilus has an open twisted alpha/beta structural motif consisting of two globular domains and two protruding parts. It has been suggested that the top part of the large domain together with one of the protruding loops might participate in inducing the neurotrophic activity PUBMED:10318897. The structure of rabbit muscle phosphoglucose isomerase complexed with various inhibitors shows that the enzyme is a dimer with two alpha/beta-sandwich domains in each subunit. The location of the bound D-gluconate 6-phosphate inhibitor leads to the identification of residues involved in substrate specificity. In addition, the positions of amino acid residues that are substituted in the genetic disease nonspherocytic hemolytic anemia suggest how these substitutions can result in altered catalysis or protein stability PUBMED:10653639, PUBMED:10770936.

    \ 3015 IPR000005 \

    Many bacterial transcription regulation proteins bind DNA through a 'helix-turn-helix' (HTH) motif. One major subfamily of these proteins is related to the arabinose operon regulatory protein AraC PUBMED:8451183, PUBMED:2314271. Except for celD PUBMED:2179047, all of these proteins seem to be positive transcriptional factors.


    Although the sequences belonging to this family differ somewhat in length, in nearly every case the HTH motif is situated towards the C-terminus, in the third quarter of the sequence. The minimal DNA-binding domain spans roughly 100 residues and comprises two HTH subdomains: the classical HTH domain, and another HTH subdomain that resembles the classical one but has an insertion of one residue in the turn region. The N-terminal and central regions of these proteins are presumed to interact with effector molecules and may be involved in dimerization PUBMED:8516313.


    The known structure of MarA () shows that the AraC domain is alpha-helical and that both HTH subdomains bind the major groove of the DNA. The two HTH subdomains are separated by only 27 angstroms, which causes the cognate DNA to bend.

    \ 288 IPR007491 \ Some members of this plant protein family have one or more zinc-finger motifs towards the C terminus of the region represented in this family.\ 1829 IPR002773 \ Eukaryotic initiation factor 5A (eIF-5A) contains an unusual amino acid, hypusine [N epsilon-(4-amino-2-hydroxybutyl)lysine]. The first step in the post-translational formation of hypusine is catalysed by the enzyme deoxyhypusine synthase (DS). The enzyme catalyses the following reaction: Both the modified eIF-5A and DS are required for eukaryotic cell proliferation PUBMED:9493264. The structure of this enzyme in complex with its NAD+ cofactor has been determined PUBMED:9493264.\ 6114 IPR009382 \

    This family consists of several insect coleoptericin, acaloleptin, holotricin and rhinocerosin proteins which are all known to be antibacterial proteins PUBMED:11520352. These all appear to be short, glycine-rich molecules, inducible by infection.

    \ 6514 IPR009570 \

    This family consists of several bacterial stage III sporulation protein AC (SpoIIIAC) sequences. The exact function of this family is unknown.

    \ 5255 IPR008614 \ Acidic fibroblast growth factor (aFGF) intracellular binding protein (FIBP) is a protein found mainly in the nucleus that is thought to be involved in the intracellular function of aFGF PUBMED:11104667.\ 3036 IPR005296 \

    These proteins are the products of ORF 3C from avian infectious bronchitis virus (IBV). The function of this protein is currently unknown.

    \ 814 IPR007201 \

    This second RNA recognition motif (RRM 2) is found in the meiosis protein Mei2. It is found C-terminal to the RNA-binding region RNP-1 ().

    \ 1272 IPR003135 \ The ATP-grasp domain has an unusual nucleotide-binding fold, also referred to as palmate, and is found in a superfamily of enzymes including D-alanine-D-alanine ligase, glutathione synthetase, biotin carboxylase, carbamoyl phosphate synthetase, the ribosomal protein S6 modification enzyme (RimK), urea amidolyase, tubulin-tyrosine ligase, and three enzymes of purine biosynthesis. This family does not contain all known ATP-grasp domain members. All the enzymes of this family possess ATP-dependent carboxylate-amine ligase activity, and their catalytic mechanisms are likely to include acylphosphate intermediates.\ 7786 IPR012882 \

    This is a family of uncharacterised fungal proteins.

    \ 3876 IPR003102 \ The nuclear factor CREB activates transcription of target genes in part through direct interactions with the KIX domain of the coactivator CBP in a phosphorylation-dependent manner. CBP and P300 bind to the pKID (phosphorylated kinase-inducible domain) of CREB PUBMED:9413984.\ 1607 IPR004288 \ This family consists exclusively of streptococcal competence stimulating peptide precursors, which are generally up to 50 amino acid residues long. In all the members of this family, the leader sequence is cleaved after two conserved glycine residues; thus the leader sequence is of the double-glycine type PUBMED:9352904. Competence stimulating peptides (CSP) are small (less than 25 amino acid residues) cationic peptides. The N-terminal amino acid residue is negatively charged, either glutamate or aspartate. The C-terminal end is positively charged. The third residue is also positively charged: a highly conserved arginine PUBMED:9352904. Some ComC proteins and their precursors (not included in this family) do not fully follow the above description.

    Functionally, CSP act as pheromones, stimulating competence for genetic transformation in streptococci. In streptococci, the CSP-mediated competence response requires exponential cell growth at a critical density, a relatively simple requirement when compared to the stationary-phase requirement of Haemophilus, or the late-logarithmic-phase requirement of Bacillus PUBMED:7479953. All bacteria induced to competence by a particular CSP are said to belong to the same pherotype, because each CSP is recognized by a specific receptor (the signalling domain of the histidine kinase ComD). Pherotypes are not necessarily species-specific. In addition, an organism may change pherotype. There are two possible mechanisms for pherotype switching: horizontal gene transfer, and accumulation of point mutations. The biological significance of pherotypes and pherotype switching has not been definitively determined. Pherotype switching occurs frequently enough in naturally competent streptococci to suggest that it may be an important contributor to genetic exchange between different bacterial species PUBMED:9352904.
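    The sequence features listed for CSP precursors and mature CSPs can be expressed as a simple check. The sketch below is a hypothetical illustration, not a published classifier: the function names are invented, it assumes the first GG doublet marks the leader cleavage site, it reads "positively charged C-terminal end" as a final lysine or arginine, and the precursor is an invented toy sequence.

```python
def cleave_double_glycine_leader(precursor: str) -> str:
    """Return the sequence after the first 'GG' doublet.

    Simplifying assumption: the first GG in the precursor marks the
    leader cleavage site of a double-glycine-type leader.
    """
    site = precursor.find("GG")
    if site == -1:
        raise ValueError("no double-glycine cleavage site found")
    return precursor[site + 2:]


def matches_csp_description(mature: str) -> bool:
    """Check a mature peptide against the CSP features listed in the text."""
    if not 3 <= len(mature) < 25:  # small: fewer than 25 residues
        return False
    return (
        mature[0] in "DE"          # N-terminal residue: glutamate or aspartate
        and mature[2] == "R"       # third residue: conserved arginine
        and mature[-1] in "KR"     # assumption: positive C-terminal end read as final K or R
    )


precursor = "MKTLKQLSGGEARLKSFFKK"  # hypothetical toy precursor
mature = cleave_double_glycine_leader(precursor)
print(mature, matches_csp_description(mature))  # EARLKSFFKK True
```

    Real precursors may contain additional GG doublets or deviate from these rules (as noted for some ComC proteins), so the check is only a rough filter.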

    \ 7876 IPR012551 \

    This domain is found in a variety of Actinomycetales proteins. All of the proteins containing this domain are hypothetical and probably membrane-bound or membrane-associated. The function of this domain is currently unclear.

    \ 4306 IPR002148 \ The proteins in this family are variously described as either NSP or NS53. They are non-structural RNA-binding proteins that contain a characteristic cysteine-rich region PUBMED:8395125, PUBMED:9015101. The protein is made at low levels in infected cells, is a component of early replication, and is known to accumulate on the cytoskeleton of the infected cell.\ 3810 IPR005564 \

    Major capsid protein E plays a role in the stabilization of the condensed form of the DNA molecule in phage heads PUBMED:2522554.

    \ 5238 IPR008741 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
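    The letter-based naming described above can be illustrated with a small decoder. This is a sketch under stated assumptions: the function name `describe_family` is hypothetical, it accepts only family identifiers of the form letter + number + optional subfamily letter (for example C31 or S8A, both mentioned in this document), and it does not handle clan identifiers, including mixed-type (P) clans.

```python
import re

# Catalytic-type letters for families, as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}
# Per the text: S, T and C peptidases use an amino acid as nucleophile and form an
# acyl intermediate; A, G and M peptidases use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

# Letter, family number, optional subfamily letter (e.g. 'S8A').
FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)([A-Z]?)")


def describe_family(family_id: str) -> dict:
    """Decode a family identifier such as 'C31' or 'S8A' (hypothetical helper)."""
    match = FAMILY_ID.fullmatch(family_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    letter, number, subfamily = match.groups()
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not specified"
    return {
        "catalytic_type": CATALYTIC_TYPES[letter],
        "family": int(number),
        "subfamily": subfamily or None,
        "nucleophile": nucleophile,
    }


print(describe_family("C31"))
```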


    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.


    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:


    This group of cysteine peptidases corresponds to MEROPS peptidase family C31 (clan CA). The type example is porcine respiratory and reproductive syndrome arterivirus-type cysteine proteinase alpha (lactate dehydrogenase-elevating virus), which is involved in viral polyprotein processing PUBMED:10725411.

    \ 4456 IPR006939 \ SNF5 is a component of the yeast SWI/SNF complex, which is an ATP-dependent nucleosome-remodelling complex that regulates the transcription of a subset of yeast genes. SNF5 is a key component of all SWI/SNF-class complexes characterised so far PUBMED:10325430. This family consists of the conserved region of SNF5, including a direct repeat motif. SNF5 is essential for the assembly, promoter targeting and chromatin-remodelling activity of the SWI/SNF complex PUBMED:11390659. SNF5 is also known as SMARCB1 (SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily b, member 1) and as INI1 (integrase interactor 1). Loss-of-function mutations in SNF5 are thought to contribute to oncogenesis in malignant rhabdoid tumours (MRTs) PUBMED:9671307.\ 2489 IPR000782 \

    The FAS1 or BIgH3 domain is an extracellular module of about 140 amino acid residues. It has been suggested that the FAS1 domain represents an ancient cell adhesion domain common to plants and animals PUBMED:7925267; related FAS1 domains are also found in bacteria PUBMED:7822037. Most FAS1 domain-containing proteins are GPI-anchored and contain two or four copies of the domain. FAS1 domains of BIgH3 protein mediate cell adhesion through an interaction with alpha3/beta1 integrin. A short motif (EPDIM), located on the C-terminal side of the fourth domain, is essential for the binding to integrin PUBMED:10906123.

    The crystal structure of two FAS1 domains (FAS1 3-4) of a fas1 protein has been solved PUBMED:12575939. Each domain consists of a seven-stranded beta wedge and at least five alpha helices. Two well-ordered N-acetylglucosamine moieties attached to a conserved asparagine are located in the interface region between the two FAS1 domains.

    Some of the proteins containing a FAS1 domain are listed below:
  • Drosophila fasciclin I (fas1) protein. A cell adhesion molecule involved in axon guidance. It is attached to the membrane by a GPI anchor (4 copies).
  • Human TGF-beta-induced Ig-H3 (BIgH3) protein. Mutations in its FAS1 domains result in corneal dystrophy, due to the deposition of insoluble protein aggregates (4 copies).
  • Arabidopsis fasciclin-like arabinogalactan proteins (2 copies).
  • Volvox major cell adhesion protein (2 copies).
  • Bacterial immunogenic protein MPT70 (1 copy).
  • Human extracellular matrix protein periostin (4 copies).
  • Mammalian stabilin protein (7 copies).

    \ 6229 IPR010483 \

    The alpha-2-macroglobulin receptor-associated protein (RAP) is an intracellular glycoprotein that binds to the alpha-2-macroglobulin receptor and other members of the low density lipoprotein receptor family. The protein inhibits binding of all currently known ligands of these receptors PUBMED:9207124. Two different studies have proposed conflicting domain boundaries.

    \ 5620 IPR008440 \ This family consists of several agglutinin-like proteins from different Candida species. ALS genes of Candida albicans encode a family of cell-surface glycoproteins with a three-domain structure. Each Als protein has a relatively conserved N-terminal domain, a central domain consisting of a tandemly repeated motif, and a serine-threonine-rich C-terminal domain that is relatively variable across the family. The ALS family exhibits several types of variability that indicate the importance of considering strain and allelic differences when studying ALS genes and their encoded proteins PUBMED:11124701.\ 5677 IPR008660 \ This family consists of several moth fibroin light chain (L-fibroin) proteins. Fibroin of Bombyx mori is secreted into the lumen of the posterior silk gland (PSG) from the surrounding PSG cells as a molecular complex consisting of a heavy (H) chain of approximately 350 kDa, a light (L) chain of 25 kDa and a P25 of about 27 kDa. The H- and L-chains are disulphide-linked, but P25 is associated with the H-L complex by non-covalent interactions PUBMED:10366732.\ 1985 IPR004352 \

    Eighty-one archaeal-like genes are clustered in 15 regions of the Thermotoga maritima genome, ranging in size from 4 to 20 kb PUBMED:10360571. Conservation of gene order between Thermotoga maritima and Archaea in many of these regions suggests that lateral gene transfer may have occurred between thermophilic Eubacteria and Archaea PUBMED:10360571.


    One of the Thermotoga maritima sequences (hypothetical protein TM1410) shares similarity with Methanococcus jannaschii hypothetical protein MJ1477 and with hypothetical protein DR0705 from Deinococcus radiodurans. The sequences are characterised by relatively variable N- and C-terminal domains, and a more conserved central domain. They share no similarity with any other known, functionally or structurally characterised proteins.

    \ 2296 IPR007023 \

    This is a family of eukaryotic ribosomal biogenesis regulatory proteins.

    \ 5663 IPR008849 \ This family consists of several eukaryotic synaphin 1 and 2 proteins. Synaphin/complexin is a cytosolic protein that preferentially binds to syntaxin within the SNARE complex. Synaphin promotes SNAREs to form precomplexes that oligomerise into higher order structures. A peptide from the central, syntaxin binding domain of synaphin competitively inhibits these two proteins from interacting and prevents SNARE complexes from oligomerising. It is thought that oligomerisation of SNARE complexes into a higher order structure creates a SNARE scaffold for efficient, regulated fusion of synaptic vesicles PUBMED:11239399. Synaphin promotes neuronal exocytosis by promoting interaction between the complementary syntaxin and synaptobrevin transmembrane regions that reside in opposing membranes prior to fusion PUBMED:12200427.\ 5251 IPR008693 \ This family contains several membrane proteins from Mycobacterium species PUBMED:11891304.\ 2762 IPR001524 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.


    Glycoside hydrolase family 6 comprises enzymes with the following known activities: endoglucanase () and cellobiohydrolase (). These enzymes were formerly known as cellulase family B.


    The 3D structure of the enzymatic core of cellobiohydrolase II (CBHII) from the fungus Trichoderma reesei reveals an alpha-beta protein with a fold similar to the ubiquitous barrel topology first seen in triose phosphate isomerase PUBMED:2377893. The active site of CBHII is located at the C-terminal end of a parallel beta barrel, in an enclosed tunnel through which the cellulose threads. Two aspartic acid residues, located in the center of the tunnel, are the probable catalytic residues PUBMED:2377893.

    \ 2706 IPR005026 \ The postsynaptic density (PSD) is a specialised submembranous structure within which synaptic membrane proteins are linked to cytoskeleton and signalling proteins. Guanylate-kinase-associated protein binds PSD-95 (synapse-associated protein 90), one of the major components of the PSD, which functions as a scaffold protein for various ion channels and associated signalling molecules.\ 2364 IPR006996 \ Dynamitin is a subunit of the microtubule-dependent motor complex and is also implicated in cell adhesion through binding to macrophage-enriched myristoylated alanine-rich C kinase substrate (MacMARCKS) PUBMED:12082093.\ 6593 IPR010635 \

    This family consists of several heparan sulphate 6-O-sulphotransferase (HS6ST) proteins. HS6ST catalyses the transfer of sulphate from adenosine 3'-phosphate 5'-phosphosulphate to position 6 of the N-sulphoglucosamine residue in heparan sulphate PUBMED:12492399.

    \ 3893 IPR005599 \

    Members of this family are mannosyltransferase enzymes PUBMED:9576863, PUBMED:10954751. At least some members are localised in the endoplasmic reticulum and involved in GPI anchor biosynthesis PUBMED:12200473, PUBMED:12030331. In yeast, SMP3 (YOR149C) has been implicated in plasmid stability PUBMED:2005867.

    \ 6241 IPR010487 \

    This family consists of several mouse and human neugrin proteins. Neugrin and m-neugrin are mainly expressed in neurons in the nervous system, and are thought to play an important role in the process of neuronal differentiation PUBMED:11118320.

    \ 4154 IPR006175 \

    This domain is found in an endoribonuclease that is active on single-stranded mRNA and inhibits protein synthesis by cleaving mRNA PUBMED:10368157. It was previously thought to inhibit the initiation of protein synthesis PUBMED:8530410. This endoribonuclease may also be involved in the regulation of purine biosynthesis PUBMED:10400702.

    \ 345 IPR006863 \ Biogenesis of Fe/S clusters involves a number of essential mitochondrial proteins. Erv1p of Saccharomyces cerevisiae mitochondria is required for the maturation of Fe/S proteins in the cytosol. ALR (augmenter of liver regeneration) is a mammalian orthologue of yeast Erv1p. Both Erv1p and full-length ALR are located in the mitochondrial intermembrane space and are thought to operate downstream of the mitochondrial ABC transporter PUBMED:11493598.\ 1499 IPR003874 \ CDC45 (cell division control protein 45) is an essential gene required for initiation of DNA replication in Saccharomyces cerevisiae; its product forms a complex with MCM5/CDC46. Homologues of CDC45 have been identified in human PUBMED:9660782, mouse and the smut fungus Melampsora spp. (tsd2 protein), among others.\ 6386 IPR009502 \

    This family consists of several bacterial secretion monitor precursor (SecM) proteins. SecM is known to regulate SecA expression by translational coupling of the secM-secA operon. Translational pausing at a specific Pro residue 5 residues before the end of the protein may allow disruption of an mRNA repressor helix that normally suppresses secA translation initiation. The eubacterial protein secretion machinery consists of a number of soluble and membrane-associated components. One critical element is the SecA ATPase, which acts as a molecular motor to promote protein secretion at translocation sites that consist of SecYE, the SecA receptor, and the SecG and SecDFyajC proteins, which regulate SecA membrane cycling PUBMED:10986266.

    \ 3882 IPR004126 \ Proteins in this group inhibit basic phospholipase A2 isozymes in snake venom PUBMED:9395334.\ 3197 IPR005811 \

    This domain is found in the CoA ligases succinyl-CoA synthetase (alpha and beta chains), malate-CoA ligase and ATP-citrate lyase. Some members of the domain use ATP, while others use GTP.

    \ 2496 IPR005121 \

    This is the anticodon-binding domain found in some phenylalanyl tRNA synthetases. The domain has a ferredoxin fold PUBMED:10447505, PUBMED:9016717.

    \ 2418 IPR005638 \

    This family contains insecticidal toxins produced by Bacillus species of bacteria. During spore formation the bacteria produce crystals of this protein. When an insect ingests these proteins they are activated by proteolytic cleavage. The N-terminus is cleaved in all of the proteins and a C-terminal extension is cleaved in some members. Once activated the endotoxin binds to the gut epithelium and causes cell lysis leading to death. This activated region of the delta endotoxin is composed of three structural domains. The N-terminal helical domain is involved in membrane insertion and pore formation. The second and third domains are involved in receptor binding.

    \ 1201 IPR000663 \ Atrial natriuretic peptides (ANPs) are vertebrate hormones that play an important role in the control of cardiovascular homeostasis, and of sodium and water balance in general PUBMED:1652921, PUBMED:2536732, PUBMED:. There are different NPs that vary in length but share a common core. All are processed from a single precursor. A disulphide bond resident in the C-terminal section is required for full activity of atriopeptins. The family of NPs includes structurally related peptides that elicit similar pharmacological spectra. Amongst these are brain natriuretic peptide (BNP); C-type natriuretic peptide (CNP); ventricular natriuretic peptide (VNP) PUBMED:1828035; and green mamba natriuretic peptide (DNP) PUBMED:1352773.\ 5771 IPR010269 \

    This family consists of a number of uncharacterised bacterial proteins. The function of this family is unknown.

    \ 5495 IPR008530 \ This family consists of several eukaryotic proteins of unknown function.\ 4907 IPR003766 \ Glucuronate isomerase catalyses the conversion of D-glucuronate to D-fructuronate and also converts D-galacturonate to D-tagaturonate PUBMED:9882655.\ 4211 IPR000077 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.


    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.


    A number of eukaryotic and archaebacterial large subunit ribosomal proteins can be grouped on the basis of sequence similarities. These proteins are very basic. About 50 residues long, they are the smallest proteins of eukaryotic-type ribosomes.

    \ 1098 IPR005019 \

    This family of methyladenine glycosylases includes DNA-3-methyladenine glycosylase I (), which acts as a base excision repair enzyme by severing the N-glycosidic bond of numerous damaged bases. The enzyme is constitutively expressed and is specific for the alkylated base 3-methyladenine in DNA.

    \ 5824 IPR009249 \

    This family consists of several different but closely related proteins which include phycocyanobilin:ferredoxin oxidoreductase (PcyA), 15,16-dihydrobiliverdin:ferredoxin oxidoreductase (PebA) and phycoerythrobilin:ferredoxin oxidoreductase (PebB). Phytobilins are linear tetrapyrrole precursors of the light-harvesting prosthetic groups of the phytochrome photoreceptors of plants and the phycobiliprotein photosynthetic antennae of cyanobacteria, red algae, and cryptomonads. It is known that phytobilins are synthesised from heme via the intermediacy of biliverdin IX alpha (BV), which is reduced subsequently by ferredoxin-dependent bilin reductases with different double-bond specificities PUBMED:11283349.

    \ 1220 IPR004828 \ These antibacterial peptides are found in bees. These heat-stable, non-helical peptides are active against a wide range of plant-associated bacteria and some human pathogens PUBMED:2676519. This family contains a conserved region including the propeptide and apidaecin sequence.\ 5158 IPR007995 \

    This family consists of several uncharacterised Streptomyces proteins as well as one from Mycobacterium tuberculosis. The function of these proteins is unknown.

    \ 3659 IPR004965 \

    Paralemmin was identified in the chicken lens as a protein with a molecular weight of 65 kDa (isoform 1) and a splice variant of 60 kDa (isoform 2). Isoform 2 is predominant during infancy and levels of isoform 1 increase with age. Paralemmin is localised to the plasma membrane of fibre cells, and was not detected in the annular pad cells. Its localisation to the short side of the fibre cell and the sites of fibre cell interlocking suggests that paralemmin may play a role in the development of such interdigitating processes PUBMED:12874826. Palmitoylation is important for localising these proteins to the filopodia of dendritic cells where they have been implicated in the regulation of membrane dynamics and process outgrowth.

    \ 1118 IPR005641 \

    Hexon () is the major coat protein from adenovirus type 2. Hexon forms a homo-trimer. The 240 copies of the hexon trimer are organised so that 12 lie on each of the 20 facets. The central 9 hexons in a facet are cemented together by 12 copies of polypeptide IX.

    \ 3833 IPR007067 \ This family represents the tail sheath protein Gp18 of bacteriophage T4 and its homologues.\ 295 IPR002744 \ This family includes prokaryotic proteins of unknown function. The family also includes PhaH () from Pseudomonas putida. PhaH forms a complex with PhaF (), PhaG () and PhaI (), which hydroxylates phenylacetic acid to 2-hydroxyphenylacetic acid PUBMED:9600981. Members of this family may therefore all be components of ring-hydroxylating complexes.\ 5885 IPR010333 \

    This entry contains several bacterial VirJ virulence proteins. VirJ is thought to be involved in the type IV secretion system. Substrate proteins localised to the periplasm are thought to associate with the pilus in a VirJ-mediated manner, which suggests a two-step process for type IV secretion in Agrobacterium PUBMED:12207700.

    \ 451 IPR004212 \ This region of sequence similarity is found up to six times in a variety of proteins including GTF2I. It has been suggested that this may be a DNA binding domain PUBMED:9774679, PUBMED:10198167.\ 588 IPR005111 \

    This domain is found in proteins involved in the biosynthesis of the molybdopterin cofactor; however, the exact molecular function of this domain is uncertain. The structure of this domain is known PUBMED:11525167; it forms an incomplete beta barrel.

    \ 3133 IPR005582 \

    This family contains MukF proteins, which are involved in the segregation and condensation of prokaryotic chromosomes. MukE () and MukF interact with MukB () in vivo, forming a complex that is required for chromosome condensation and segregation in Escherichia coli PUBMED:10545099. The Muk complex appears to be similar to the SMC-ScpA-ScpB complex in other prokaryotes, where MukB is the homologue of SMC PUBMED:12065423. ScpA () and ScpB () have little sequence similarity to MukE or MukF, though they are predicted to be structurally similar, being predominantly alpha-helical with coiled-coil regions.

    \ 6495 IPR009554 \

    This family consists of several bacterial phage shock protein B (PspB) sequences. The phage shock protein (psp) operon is induced in response to heat, ethanol, osmotic shock and infection by filamentous bacteriophages PUBMED:1712397. Expression of the operon requires the alternative sigma factor sigma54 and the transcriptional activator PspF. In addition, PspA plays a negative regulatory role, and the integral-membrane proteins PspB and PspC play a positive one PUBMED:12562786.

    \ 2652 IPR002659 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates () and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    Glycosyltransferase family 31 () comprises enzymes with a number of known activities: N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase (); beta-1,3-galactosyltransferase (); fucose-specific beta-1,3-N-acetylglucosaminyltransferase (); and globotriosylceramide beta-1,3-GalNAc transferase () PUBMED:9417100, PUBMED:9417047.

    \ 1777 IPR000512 \ Diphtheria toxin () is a 58 kDa protein secreted by lysogenic strains of Corynebacterium diphtheriae. The toxin causes the disease diphtheria in humans by gaining entry into the cell cytoplasm and inhibiting protein synthesis PUBMED:8573568. The mechanism of inhibition involves transfer of the ADP-ribose group of NAD to elongation factor-2 (EF-2), rendering EF-2 inactive. The catalysed reaction is as follows: \ \ The crystal structure of the diphtheria toxin homodimer has been determined to 2.5 Å resolution PUBMED:1589020. The structure reveals a Y-shaped molecule of three domains: a catalytic domain (fragment A), whose fold is of the alpha + beta type; a transmembrane (TM) domain, which consists of 9 alpha-helices, 2 pairs of which may participate in pH-triggered membrane insertion and translocation; and a receptor-binding domain, which forms a flattened beta-barrel with a jelly-roll-like topology PUBMED:1589020. The TM and receptor-binding domains together constitute fragment B.\ 3379 IPR003448 \

    This family contains the MoaE protein, which is involved in the biosynthesis of molybdopterin PUBMED:8514782. Molybdopterin, the universal component of the pterin molybdenum cofactors, contains a dithiolene group that serves to bind Mo. Addition of the dithiolene sulphurs to a molybdopterin precursor requires the activity of converting factor, which contains the MoaE and MoaD proteins.

    \ \ 2481 IPR002346 \

    Oxidoreductases that also bind molybdopterin have essentially no similarity outside this common domain. \ They include aldehyde oxidase (), which converts an aldehyde and water to an acid and hydrogen peroxide, and xanthine dehydrogenase (), which converts xanthine to urate. These enzymes require molybdopterin and FAD as cofactors and contain two 2Fe-2S clusters. Another enzyme that contains this domain is the Pseudomonas thermocarboxydovorans carbon monoxide oxygenase.

    \ 3142 IPR003472 \ This protein family is found in poxviruses; the function of the protein is unknown.\ 7443 IPR011522 \

    This entry represents YKOF-related proteins. The domain is found in pairs in these proteins.

    \ 5581 IPR008856 \ This family consists of several eukaryotic translocon-associated protein beta (TRAPB) or signal sequence receptor beta subunit (SSR-beta) proteins. The normal translocation of nascent polypeptides into the lumen of the endoplasmic reticulum (ER) is thought to be aided in part by a translocon-associated protein (TRAP) complex consisting of 4 protein subunits. The association of mature proteins with the ER and Golgi, or other intracellular locales, such as lysosomes, depends on the initial targeting of the nascent polypeptide to the ER membrane. A similar scenario must also exist for proteins destined for secretion PUBMED:11204460.\ 1986 IPR005177 \

    This is a family of bacterial proteins with no known function.

    \ 2676 IPR004588 \

    This protein, previously of unknown biochemical function, is essential in Escherichia coli. It has now been characterised as 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase, which converts 2C-methyl-D-erythritol 2,4-cyclodiphosphate (ME-2,4CPP) into 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate in the sixth step of nonmevalonate terpenoid biosynthesis. The family is restricted to bacteria, where it is widely but not universally distributed. No homology can be detected between this family and other proteins.

    \ 3637 IPR004303 \

    In the presence of calcium ions, protein-arginine deiminase (PAD) enzymes catalyse the post-translational modification reaction responsible for the formation of citrulline residues from protein-bound arginine residues PUBMED:10092850. Four PAD isotypes have been identified in mammals, and a fifth may also exist. Non-mammalian vertebrates appear to have only a single PAD enzyme. All known natural substrates of PAD are proteins with an important structural function, such as keratin (PAD1), intermediate filaments or proteins associated with intermediate filaments. Citrullination may have consequences for the structural integrity and interactions of these proteins. Physiological levels of calcium appear to be too low to activate these enzymes, suggesting a link between PAD activation and the loss of calcium homeostasis during terminal differentiation and cell death (apoptosis).

    \ \ \ \ \ 6961 IPR010784 \

    This family consists of several Plasmodium falciparum SPAM (secreted polymorphic antigen associated with merozoites) proteins. Variation among SPAM alleles is the result of deletions and amino acid substitutions in non-repetitive sequences within and flanking the alanine heptad-repeat domain. Heptad repeats in which the a and d positions contain hydrophobic residues generate amphipathic alpha-helices, which give rise to helical bundles or coiled-coil structures in proteins. SPAM is an example of a P. falciparum antigen in which a repetitive sequence has features characteristic of a well-defined structural element PUBMED:7891748, PUBMED:7893643.
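    The heptad rule above (hydrophobic residues at positions a and d of an abcdefg repeat) can be checked mechanically. The following is a minimal sketch, not a method from the cited work: it assumes that A, I, L, M, V and F count as hydrophobic (alanine is included because the SPAM repeats are alanine-rich) and scores each of the seven possible registers by the fraction of a and d positions that are hydrophobic. The function names are hypothetical.

```python
# Minimal heptad-register scorer (illustrative sketch, not a published tool).
# Assumption: residues counted as hydrophobic are A, I, L, M, V and F.
HYDROPHOBIC = frozenset("AILMVF")


def heptad_ad_fraction(seq: str, offset: int = 0) -> float:
    """Fraction of heptad 'a' and 'd' positions that hold a hydrophobic residue.

    The register places position 'a' at index `offset`; 'd' is three residues later.
    """
    seq = seq.upper()
    # Indices whose heptad position (abcdefg -> 0..6) is 0 ('a') or 3 ('d').
    core = [i for i in range(offset, len(seq)) if (i - offset) % 7 in (0, 3)]
    if not core:
        return 0.0
    return sum(seq[i] in HYDROPHOBIC for i in core) / len(core)


def best_register(seq: str) -> tuple[int, float]:
    """Return (offset, score) for the register with the highest a/d hydrophobicity."""
    scores = [(offset, heptad_ad_fraction(seq, offset)) for offset in range(7)]
    # max() keeps the first register when several share the top score.
    return max(scores, key=lambda pair: pair[1])
```

    For a synthetic repeat such as "LKELEKK" * 3 (not a SPAM sequence), `best_register` returns `(0, 1.0)`: every a and d position holds leucine when the register starts at the first residue. A real coiled-coil predictor would use position-specific residue preferences rather than a single hydrophobic set.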

    \ 2395 IPR007783 \ This family is made up of eukaryotic translation initiation factor 3 subunit 7 (eIF-3 zeta/eIF3 p66/eIF3d). Eukaryotic initiation factor 3 is a multi-subunit complex that is required for binding of mRNA to 40S ribosomal subunits, stabilisation of ternary complex binding to 40S subunits, and dissociation of 40S and 60S subunits. These functions and the complex nature of eIF3 suggest multiple interactions with many components of the translational machinery PUBMED:11042177. The gene coding for the protein has been implicated in cancer in mammals PUBMED:11733359.\ 293 IPR007592 \ This is a family of uncharacterised proteins.\ 2684 IPR003437 \

    This family consists of glycine cleavage system P-proteins () from bacterial, mammalian and plant sources. The P protein is part of the glycine decarboxylase multienzyme complex () (GDC), also annotated as glycine cleavage system or glycine synthase. The P protein binds the alpha-amino group of glycine through its pyridoxal phosphate cofactor; carbon dioxide is released, and the remaining methylamine moiety is then transferred to the lipoamide cofactor of the H protein. GDC consists of four proteins: P, H, L and T PUBMED:8181752. The reaction catalysed by this protein is:

    \

    Glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2

    \ 4562 IPR005604 \

    The bacteriophage T7 tail complex consists of a conical tail-tube surrounded by six kinked tail-fibers, which are oligomers of the viral protein gp17.

    \ 3321 IPR004223 \

    Vitamin B12-dependent methionine synthase (5-methyltetrahydrofolate--homocysteine S-methyltransferase) catalyses the conversion of 5-methyltetrahydrofolate and L-homocysteine to tetrahydrofolate and L-methionine as the final step in de novo methionine biosynthesis. The enzyme requires methylcobalamin as a cofactor. In humans, defects in this enzyme cause autosomal recessive inherited methylcobalamin deficiency (CBLG), which causes mental retardation, macrocytic anemia and homocystinuria. Mild deficiencies in activity may result in mild hyperhomocysteinemia, and mutations in the enzyme may be involved in tumorigenesis. Vitamin B12-dependent methionine synthase is found in prokaryotes and eukaryotes, but in prokaryotes the cofactor is cobalamin.

    \

    In Escherichia coli, methionine synthase is a large enzyme composed of four structurally and functionally distinct modules: the first two modules bind homocysteine and tetrahydrofolate, the third module binds the B12 cofactor (, ), and the C-terminal module (activation domain) binds S-adenosylmethionine. The activation domain is essential for the reductive activation of the enzyme. During the catalytic cycle, the highly reactive cob(I)alamin intermediate can be oxidised to produce an inactive cob(II)alamin enzyme; the enzyme is then reactivated via reductive methylation by the activation domain PUBMED:11731805. The activation domain adopts an unusual alpha/beta fold.

    \ 6077 IPR010424 \

    The eut operon of Salmonella typhimurium encodes proteins involved in the cobalamin-dependent degradation of ethanolamine. The role of EutQ in this process is unclear PUBMED:10464203.

    \ 5592 IPR008895 \ The proteins in this family are designated YL1 PUBMED:7702631. They have been shown to be DNA-binding and may be transcription factors PUBMED:7702631.\ 1879 IPR003390 \ This domain is about 120 amino acids long. The function of this domain is unknown; however, the distribution of conserved histidines and aspartates suggests that this may be a metal-dependent phosphoesterase. This may be a nuclear domain, as the hypothetical protein YacK from Bacillus subtilis also contains a helix-hairpin-helix (HHH) motif that is characteristic of DNA-binding proteins.\ 1953 IPR004853 \ This family consists entirely of aligned regions from Drosophila melanogaster proteins. contains three repeats of this region. In other proteins, the aligned region is located towards the C-terminus. The function of the aligned region is unknown. \ \ 2766 IPR005195 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    The family of glycosyl hydrolases () that contains this domain includes vacuolar acid trehalase and maltose phosphorylase. Maltose phosphorylase (MP) is a dimeric enzyme that catalyzes the conversion of maltose and inorganic phosphate into beta-D-glucose-1-phosphate and glucose. The central domain is the catalytic domain, which binds a phosphate ion that is proximal to the highly conserved Glu. The arrangement of the phosphate and the glutamate is thought to cause nucleophilic attack on the anomeric carbon atom PUBMED:11587643. The catalytic domain also forms the majority of the dimerisation interface.

    \ 555 IPR000483 \

    Leucine-rich repeats (LRR, see ) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape PUBMED:14747988. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions PUBMED:11751054. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response.

    \ \

    LRRs are often flanked by cysteine-rich domains: an N-terminal LRR domain () and a C-terminal LRR domain. This entry represents the C-terminal LRR domain.

    \ \ 2022 IPR005651 \

    This family of short proteins has no known function. The bacterial members are about 60-70 amino acids in length and the eukaryotic examples are about 120 amino acids in length. The C-terminus contains the strongest conservation.

    \ 3196 IPR001581 \

    On the basis of functional and structural similarities, the small cytokines leukemia inhibitory factor (LIF) and oncostatin M (OSM) can be classified into a single family PUBMED:1566332, PUBMED:1717982.

    \

    It has been suggested PUBMED:1717982 that LIF and OSM can be included in the IL-6 family of cytokines (), but while all these cytokines seem to be structurally related, the sequence similarity is not high enough to allow the use of a single consensus pattern.

    \ \ 7762 IPR012853 \

    The members of this family are all similar to chloramphenicol 3-O phosphotransferase (CPT, ) expressed by Streptomyces venezuelae. Chloramphenicol (Cm) is a metabolite produced by this bacterium that can inhibit ribosomal peptidyl transferase activity and therefore protein production. By transferring a phosphate group to the C-3 hydroxyl group of Cm, CPT inactivates this potentially lethal metabolite PUBMED:11468347.

    \ 358 IPR008333 \

    These sequences contain an oxidoreductase FAD-binding domain.

    \

    To date, the 3D-structures of the flavoprotein domain of Zea mays nitrate reductase PUBMED:7812715 and of pig NADH:cytochrome b5 reductase PUBMED:7893687 have been solved. The overall fold is similar to that of ferredoxin:NADP+ reductase PUBMED:8027025: the FAD-binding domain (N-terminal) has the topology of an anti-parallel beta-barrel, while the NAD(P)-binding domain (C-terminal) has the topology of a classical pyridine dinucleotide-binding fold (i.e. a central parallel beta-sheet flanked by 2 helices on each side).

    \ \ 1336 IPR007765 \ The unidentified baculovirus protein p24 is associated with nucleocapsids of budded and polyhedra-derived virions PUBMED:11602755, PUBMED:8423444.\ 1232 IPR002774 \ Members of this family are the proteins that form the flagella in archaebacteria PUBMED:3417656. Each archaebacterium has multiple members of this family.\ 3817 IPR003515 \ This is a family of proteins from single-stranded DNA bacteriophages. The G protein is a major spike protein involved in attachment to the bacterial host cell. The virion is composed of sixty copies of each of the F, G and J proteins, and 12 copies of the H protein. Each of the twelve spikes is formed by five G proteins, each a tight beta barrel, and one H protein.\ 966 IPR005372 \

    This family contains uncharacterised integral membrane proteins.

    \ 1951 IPR004348 \ The function of this family of plant proteins is unknown.\ 4522 IPR001217 \

    The STAT (Signal Transducers and Activators of Transcription) protein family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors; hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus PUBMED:12039028. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signaling cascades (Stat3 and Stat5) is associated with cellular transformation.

    \

    Signaling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at specific receptor tyrosine residues, which then serve as docking sites for STATs and other signaling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerize, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers.

    \

    The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids. The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggests that this family arose from a single primordial gene. STATs share structurally and functionally conserved domains including: an N-terminal domain that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain that is implicated in protein-protein interactions; a DNA-binding domain with an immunoglobulin-like fold similar to p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain () that acts as a phosphorylation-dependent switch to control receptor recognition and DNA-binding; and a C-terminal transactivation domain PUBMED:9630226. The crystal structure of the N-terminus of Stat4 reveals a dimer. The interface of this dimer is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerization promotes cooperativity of binding to tandem GAS elements and with the transcriptional coactivator CBP/p300.

    \ 6873 IPR010753 \

    This family consists of several hypothetical bacterial proteins of around 90 residues in length. The function of this family is unknown.

    \ 3495 IPR003635 \ Tachykinins PUBMED:3284438, PUBMED:1969374, PUBMED:1324401 are a group of biologically active peptides which excite neurons, evoke behavioral responses, are potent vasodilators and contract (directly or indirectly) many smooth muscles. This family includes neurokinins, as well as many other peptides. Like other tachykinins, neurokinins are synthesized as larger protein precursors that are enzymatically converted to their mature forms.\ 3302 IPR003179 \ Methyl-coenzyme M reductase (MCR) is the enzyme responsible for microbial formation of methane. It is a hexamer composed of 2 alpha, 2 beta, and 2 gamma subunits with two identical nickel porphinoid active sites PUBMED:9367957.\ 7712 IPR012888 \

    Proteins containing this domain are similar to L-fucose isomerase expressed by Escherichia coli (, ). This enzyme corresponds to glucose-6-phosphate isomerase in glycolysis, and converts an aldo-hexose to a ketose to prepare it for aldol cleavage. The enzyme is a hexamer, with each subunit being wedge-shaped and composed of three domains. Both domains 1 and 2 contain central parallel beta-sheets with surrounding alpha helices. Domain 1 adopts the beta-alpha-beta-alpha-beta Rossmann fold. The active centre is shared between pairs of subunits related along the molecular three-fold axis, with domains 2 and 3 from one subunit providing most of the substrate-contacting residues, and domain 1 from the adjacent subunit contributing some other residues PUBMED:9367760.

    \ 6377 IPR009500 \

    This family consists of several hypothetical plant proteins of unknown function.

    \ 2136 IPR007417 \ This is a family of uncharacterised archaeal proteins.\ 7683 IPR012865 \

    The sequences making up this family are derived from various hypothetical phage and prophage proteins. The region in question is approximately 140 amino acids long.

    \ 880 IPR000424 \ The Escherichia coli single-strand binding protein PUBMED:2087220 (gene ssb), also known as the helix-destabilizing protein, is a protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important role in DNA replication, recombination and repair. Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids. SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens. Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication are structurally and evolutionarily related to prokaryotic SSB.\ 7161 IPR010851 \

    This family consists of a number of cysteine-rich SLR1-binding pollen coat-like proteins. Adhesion of pollen grains to the stigmatic surface is a critical step during sexual reproduction in plants. In Brassica, S locus-related glycoprotein 1 (SLR1), a stigma-specific protein belonging to the S gene family of proteins, has been shown to be involved in this step. SLR1-BP specifically binds SLR1 with high affinity. The SLR1-BP gene is specifically expressed in pollen at late stages of development and is a member of the class A pollen coat protein (PCP) family, which includes PCP-A1, an SLG (S locus glycoprotein)-binding protein PUBMED:10716697.

    \ 5017 IPR007853 \ This family contains a short presumed domain which probably binds to zinc. It is found in a number of eukaryotic proteins and is named after a short C-terminal motif of D(N/H)L. The domain is found in proteins having a novel zinc-finger essential for protein import into mitochondria PUBMED:15383543.\ 1474 IPR003153 \

    Cbl adaptor proteins are RING-type E3 ubiquitin ligases. Cbl may be involved in the negative regulation of thymocyte development, targeting its substrate for ubiquitination PUBMED:11864842. The ubiquitin ligase activity of Cbl, and of its homologue Cbl-b, plays a role in the negative regulation of upstream kinases, such as Lck, Syk and PI3K, in T and B cells PUBMED:12787751. Cbl can interact with the EGF receptor (EGFR), causing the ubiquitination of the receptor following EGF ligand binding and Grb2 association. Ubiquitination is required for ligand-induced endocytosis of the EGFR PUBMED:15194809. The N-terminal domain of Cbl is evolutionarily conserved, and is known to bind to phosphorylated tyrosine residues.

    \ 6724 IPR009681 \

    This family consists of several bacterial and phage proteins of around 115 residues in length. The function of this family is unknown.

    \ 7473 IPR013091 \

    A sequence of about forty amino-acid residues found in epidermal growth factor (EGF) has been shown PUBMED:2288911, PUBMED:6334307, PUBMED:3534958, PUBMED:6607417, PUBMED:3282918, PUBMED: to be present in a large number of membrane-bound and extracellular, mostly animal, proteins. Many of these proteins require calcium for their biological function and a calcium-binding site has been found at the N-terminus of some EGF-like domains PUBMED:1527084. Calcium-binding may be crucial for numerous protein-protein interactions.

    \

    For human coagulation factor IX it has been shown PUBMED:7606779 that the calcium-ligands form a pentagonal bipyramid. The first, third and fourth conserved negatively charged or polar residues are side chain ligands. The latter is possibly hydroxylated (see aspartic acid and asparagine hydroxylation site) PUBMED:1527084. A conserved aromatic residue, as well as the second conserved negative residue, are thought to be involved in stabilizing the calcium-binding site.

    \

    As in non-calcium binding EGF-like domains, there are six conserved cysteines and the structure of both types is very similar as calcium-binding induces only strictly local structural changes PUBMED:1527084.

    \
    \
                                 +------------------+        +---------+\
                                 |                  |        |         |\
                   nxnnC-x(3,14)-C-x(3,7)-CxxbxxxxaxC-x(1,6)-C-x(8,13)-Cx\
                       |                  | \
                       +------------------+\
    \
    'n': negatively charged or polar residue [DEQN]\
    'b': possibly beta-hydroxylated residue [DN]\
    'a': aromatic amino acid\
    'C': cysteine, involved in disulphide bond\
    'x': any amino acid.\
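    Because the schematic above is a sequence pattern, it can be transcribed into a regular expression. The sketch below is a hypothetical transcription, not an official PROSITE pattern: it follows the legend for 'n', 'b', 'C' and 'x', assumes that 'aromatic' means F, W or Y, and checks only the residue pattern, not the disulphide pairing drawn above the sequence. The names `CA_EGF` and `find_ca_egf` are illustrative.

```python
import re

# Hypothetical transcription of the calcium-binding EGF-like schematic into a regex.
# Legend mapping: 'n' -> [DEQN], 'b' -> [DN], 'C' -> C, 'x' -> any residue.
# Assumption: 'a' (aromatic) is taken to mean F, W or Y.
N = "[DEQN]"
B = "[DN]"
AROM = "[FWY]"

CA_EGF = re.compile(
    f"{N}.{N}{N}"            # n x n n
    "C.{3,14}"               # C-x(3,14)
    "C.{3,7}"                # C-x(3,7)
    f"C..{B}.{{4}}{AROM}.C"  # C x x b x x x x a x C
    ".{1,6}"                 # x(1,6)
    "C.{8,13}"               # C-x(8,13)
    "C."                     # C x
)


def find_ca_egf(sequence: str):
    """Return the (start, end) span of the first match, or None.

    Only the residue pattern is checked; which cysteines pair in
    disulphide bonds is not verified.
    """
    match = CA_EGF.search(sequence.upper())
    return match.span() if match else None
```

    On a synthetic test string built to satisfy each segment, such as `"DAEE" + "C" + "AAA" + "C" + "AAA" + "CAADAAAAFAC" + "A" + "C" + "A" * 8 + "CA"`, the function returns `(0, 35)`; replacing the 'b' residue D with A makes it return `None`. Matching a pattern like this flags candidate calcium-binding sites only; it does not establish that calcium is actually bound.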
    
    \ 6506 IPR009564 \

    This family consists of several hypothetical Caenorhabditis elegans proteins of around 106 residues in length. The function of the family is unknown.

    \ 2695 IPR000263 \ Geminiviruses are characterised by a genome of circular single-stranded DNA encapsidated in twinned (geminate) quasi-isometric particles, from which the group derives its name PUBMED:. Most geminiviruses can be divided into 2 subgroups on the basis of host range and/or insect vector: those that infect dicotyledonous plants and are transmitted by the same whitefly species, and those that infect monocotyledonous plants and are transmitted by different leafhopper vectors. \ It has been shown that the 104 N-terminal amino acids of the maize streak virus coat protein bind DNA non-specifically PUBMED:9191917.\ 2451 IPR005095 \

    EspA is the prototypical member of this family. EspA, EspB, EspD and Tir are exported by a type III secretion system. These proteins are essential for attaching and effacing lesion formation. EspA is a structural protein and a major component of a large, transiently expressed, filamentous surface organelle that forms a direct link between the bacterium and the host cell PUBMED:9545230, PUBMED:10760148.

    \ 2342 IPR002767 \ This family contains proteins of unknown function. Members of this family are found in archaebacteria, eukaryotes and eubacteria.\ 586 IPR005301 \

    Mob1 is an essential Saccharomyces cerevisiae protein, identified in a two-hybrid screen, that binds Mps1p, a protein kinase essential for spindle pole body duplication and mitotic checkpoint regulation. Mob1 contains no known structural motifs; however, MOB1 is a member of a conserved gene family and shares sequence similarity with a nonessential yeast gene, MOB2. Mob1 is a phosphoprotein in vivo and a substrate for the Mps1p kinase in vitro. Conditional alleles of MOB1 cause a late nuclear division arrest at restrictive temperature PUBMED:9436989. This family also includes phocein, a rat protein that interacts with striatin in the yeast two-hybrid system PUBMED:11251078.

    \ 1668 IPR003706 \ Escherichia coli induces the synthesis of at least 30 proteins at the onset of carbon starvation, two-thirds of which are positively regulated by the complex of cyclic AMP (cAMP) and cAMP receptor protein (CRP). \ This family consists of carbon starvation protein CstA, a predicted membrane protein. It has been suggested that CstA is involved in peptide utilization PUBMED:1848300.\ 6992 IPR009831 \

    This family consists of several putative bacterial flagellar hook-associated protein 3 (HAP3 or FlgL) sequences. Members of this family appear to be specific to the order Rhizobiales. No experimental evidence could be found to support the function assigned to family members.

    \ 6026 IPR009340 \

    This is a family of conserved Schizosaccharomyces pombe proteins with unknown function.

    \ 6531 IPR009582 \

    This family consists of several microsomal signal peptidase 25 kDa subunit proteins. Translocation of polypeptide chains across the endoplasmic reticulum (ER) membrane is triggered by signal sequences. Subsequently, signal recognition particle interacts with its membrane receptor and the ribosome-bound nascent chain is targeted to the ER, where it is transferred into a protein-conducting channel. At some point, a second signal sequence recognition event takes place in the membrane and translocation of the nascent chain through the membrane occurs. The signal sequence of most secretory and membrane proteins is cleaved off at this stage. Cleavage occurs by the signal peptidase complex (SPC) as soon as the lumenal domain of the translocating polypeptide is large enough to expose its cleavage site to the enzyme. The signal peptidase complex is possibly also involved in proteolytic events in the ER membrane other than the processing of the signal sequence, for example the further digestion of the cleaved signal peptide or the degradation of membrane proteins. Mammalian signal peptidase is a complex of five different polypeptide chains. This family represents the 25 kDa subunit (SPC25).

    \ 2428 IPR001026 \

    The ENTH (Epsin N-terminal homology) domain is approximately 150 amino acids in length and is always found at the N termini of proteins. The domain forms a compact globular structure composed of 9 alpha-helices connected by loops of varying length. The general topology is determined by three helical hairpins that are stacked consecutively with a right-hand twist PUBMED:11911874. An N-terminal helix folds back, forming a deep basic groove that forms the binding pocket for the Ins(1,4,5)P3 ligand PUBMED:12353027. The ligand is coordinated by residues from the surrounding alpha-helices, and all three phosphates are multiply coordinated. The coordination of Ins(1,4,5)P3 suggests that ENTH is specific for particular head groups.

    \

    Proteins containing this domain have been found to bind PtdIns(4,5)P2 and PtdIns(1,4,5)P3, suggesting that the domain may be a membrane-interacting module. The main function of proteins containing this domain appears to be to act as accessory clathrin adaptors in endocytosis. Epsin is able to recruit and promote clathrin polymerisation on a lipid monolayer, but may have additional roles in signalling and actin regulation PUBMED:10048338. Epsin causes a strong degree of membrane curvature and tubulation, and even fragmentation of membranes with a high PtdIns(4,5)P2 content. Epsin binding facilitates membrane deformation by inserting the N-terminal helix into the outer leaflet of the bilayer, pushing the head groups apart. This would reduce the energy needed to curve the membrane into a vesicle, making it easier for the clathrin cage to fix and stabilise the curved membrane. This points to a pioneering role for epsin in vesicle budding, as it provides both a driving force and a link between membrane invagination and clathrin polymerisation.

    \ 6774 IPR009708 \

    This family consists of several Listeria bacteriophage holin proteins and related bacterial sequences. Holins are a diverse family of proteins that cause bacterial membrane lysis during the synthesis of late proteins. The temporal precision of holin-mediated lysis is thought to arise from the build-up of a holin oligomer that triggers lysis PUBMED:11459934.

    \ 5563 IPR008859 \ This region is found at the C terminus of thrombospondin and related proteins.\ 5447 IPR008505 \

    This family consists of several hypothetical proteins of unknown function from Borrelia burgdorferi. They may be proteinases, as the majority contain a propeptide proteinase inhibitor domain, which is associated with both serine and metallopeptidases.

    \ 8103 IPR013137 \

    In eukaryotes, the initiation of transcription of protein-encoding genes by the RNA polymerase II complex (Pol II) is modulated by general and specific transcription factors. The general transcription factors operate through common promoter elements (such as the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID, -IIE, -IIF, -IIG, and -IIH PUBMED:1633439.

    \

    TFIIB and TFIID are responsible for promoter recognition and interaction with Pol II; together with Pol II, they form a minimal initiation complex capable of transcription under certain conditions. The TATA box of a Pol II promoter is bound in the initiation complex by the TBP subunit of TFIID, which bends the DNA around the C-terminal domain of TFIIB, whereas the N-terminal zinc finger of TFIIB interacts with Pol II PUBMED:8516312, PUBMED:8504927.

    \

    The TFIIB zinc finger adopts a zinc ribbon fold characterized by two β-hairpins forming two structurally similar zinc-binding sub-sites PUBMED:8564536. The zinc finger contacts the Rpb1 subunit of Pol II through its dock domain, a conserved region of about 70 amino acids located close to the polymerase active site PUBMED:15024075. In the Pol II complex this surface is located near the RNA exit groove. Interestingly, this sequence is best conserved in the three polymerases that utilize a TFIIB-like general transcription factor (Pol II, Pol III, and archaeal RNA polymerase) but not in Pol I PUBMED:15024075.

    \ 5327 IPR008468 \ DNA methylation can contribute to transcriptional silencing through several transcriptionally repressive complexes, which include methyl-CpG binding domain proteins (MBDs) and histone deacetylases (HDACs). The chief enzyme that maintains mammalian DNA methylation, DNMT1, can also establish a repressive transcription complex. The non-catalytic N terminus of DNMT1 binds to HDAC2 and DMAP1 (for DNMT1 associated protein), and can mediate transcriptional repression. DMAP1 has intrinsic transcription repressive activity, and binds to the transcriptional co-repressor TSG101. DMAP1 is targeted to replication foci through interaction with the far N terminus of DNMT1 throughout S phase, whereas HDAC2 joins DNMT1 and DMAP1 only during late S phase, providing a platform for how histones may become deacetylated in heterochromatin following replication PUBMED:10888872.\ 6911 IPR009785 \

    This family consists of several bacterial and phage proteins of around 230 residues in length. The function of this family is unknown.

    \ 2858 IPR002532 \ The medium (M) genome segment of hantaviruses (family Bunyaviridae) encodes the two virion glycoproteins, G1 and G2, as a precursor protein in the complementary-sense RNA PUBMED:3114716.\ 7083 IPR009884 \

    This family consists of several Benyvirus-specific 14 kDa proteins of around 125 residues in length. Members of this family contain 9 conserved cysteine residues. The function of this family is unknown.

    \ 4654 IPR006052 \

    The following cytokines can be grouped into a family on the basis of sequence, functional, and structural similarities PUBMED:8095800, PUBMED:1377364, PUBMED:15335677:

    \ \

    All these cytokines seem to form homotrimeric (or heterotrimeric in the case of LT-alpha/beta) complexes that are recognized by their specific receptors. The PROSITE pattern for this family is located in a beta-strand in the central section of the protein which is conserved across all members.

    \ 6140 IPR009392 \

    This family consists of several Drosophila ACP53EA accessory gland (seminal) proteins.

    \ 5044 IPR007525 \

    Coenzyme F420 hydrogenase reduces the low-potential two-electron acceptor coenzyme F420. This family contains the C termini of F420 hydrogenase and dehydrogenase beta subunits PUBMED:2207102, PUBMED:10751389. The C terminus of Methanobacterium formicicum formate dehydrogenase beta chain is also represented in this entry PUBMED:3531194. This region is often found in association with the 4Fe-4S binding domain, fer4, and the N terminus.

    \ 5034 IPR007395 \

    Members of this family of bacterial proteins are described as hypothetical proteins or zinc-dependent proteases. The majority have an HExxH zinc-binding motif characteristic of neutral zinc metallopeptidases; however, there is no evidence to support their function as metallopeptidases.

    \ 2881 IPR002051 \

    Haem oxygenase (HO) PUBMED:3290025 is the microsomal enzyme that, in animals, carries out the oxidation of haem: it cleaves the haem ring at the alpha-methene bridge to form biliverdin and carbon monoxide PUBMED:3032976. Biliverdin is subsequently converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of haem oxygenase, HO-1 to HO-3. The first two differ in their tissue expression and inducibility: HO-1 is highly inducible by its substrate haem and by various non-haem substances, while HO-2 is non-inducible. It has been suggested PUBMED:8093563 that HO-2 could be implicated in the production of carbon monoxide in the brain, where carbon monoxide is thought to act as a neurotransmitter. In the chloroplast genome of red algae, as well as in cyanobacteria, there is a haem oxygenase (gene pbsA) that is the key enzyme in the synthesis of the chromophoric part of the photosynthetic antennae PUBMED:9326680. A haem oxygenase is also present in the bacterium Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron from host haem PUBMED:9006041. The central section of these enzymes contains a well-conserved region centred on a histidine residue.

    \ 7845 IPR012548 \

    This family contains many hypothetical proteins.

    \ 6658 IPR009646 \

    The cells at the periphery of the root cap are continuously sloughed off from the root into the mucilage and are thought to be programmed to die PUBMED:10427770. This family represents a conserved region of approximately 60 residues within plant root cap proteins, which may be involved in this process.

    \ 4542 IPR004596 \

    All proteins in this family for which the functions are known are cell division inhibitors. In Escherichia coli, sulA is one of the SOS-regulated genes. Accumulation of SulA causes rapid cessation of cell division and the appearance of long, non-septate filaments. The expression of SulA is repressed by LexA. The N terminus of SulA may be involved in recognising the cell division apparatus.

    \ 5312 IPR008665 \

    This iron-sulphur cluster is found at the N terminus of some proteins containing leucine-rich repeat variant (LRV) repeats. These proteins have a two-domain structure, composed of a small N-terminal domain containing a cluster of four Cys residues that houses the 4Fe-4S cluster, and a larger C-terminal domain containing the LRV repeats PUBMED:8946850. Biochemical studies revealed that the 4Fe-4S cluster is sensitive to oxygen but does not appear to have reversible redox activity.

    \ 872 IPR007159 \ This domain is found in AbrB from Bacillus subtilis. The product of the abrB gene is an ambiactive repressor and activator of the transcription of genes expressed during the transition state between vegetative growth and the onset of stationary phase and sporulation PUBMED:2504584. AbrB is thought to interact directly with the transcription initiation regions of genes under its control PUBMED:8755877. AbrB contains a helix-turn-helix structure, but this domain ends before the helix-turn-helix begins PUBMED:1908787. The product of the Bacillus subtilis gene spoVT is another member of this family and is also a transcriptional regulator PUBMED:8755877. DNA-binding activity in this AbrB homologue requires hexamerisation PUBMED:10978510. Another family member has been isolated from Sulfolobus solfataricus and has been identified as a homologue of bacterial repressor-like proteins. The Escherichia coli family member SohA or Prl1F appears to be bifunctional and is able to regulate its own expression as well as relieve the export block imposed by high-level synthesis of beta-galactosidase hybrid proteins PUBMED:2152898.\ 7469 IPR011485 \

    This is a family of proteins for which no function is known yet.

    \ 8026 IPR013172 \

    Drosophila immune-induced molecules (DIMs) are short proteins induced during the immune response of Drosophila. This family includes DIMs 1 to 4 that have masses below 5 kDa PUBMED:9736738.

    \ 2556 IPR003481 \ The flagellar hook-associated protein 2 (HAP2 or FliD) is the capping protein for the flagella and forms the distal end of the flagella. The protein plays a role in mucin-specific adhesion of the bacteria PUBMED:9488388.\ 4834 IPR005339 \

    DNA replication in eukaryotes results from a highly coordinated interaction between proteins, often as part of protein complexes, and the DNA template. One of the key early steps leading to DNA replication is formation of the prereplication complex, or pre-RC. The pre-RC is formed by the sequential binding of the origin recognition complex (ORC), Cdc6 and Cdt1 proteins, and the MCM complex. Activation of the pre-RC into the initiation complex (IC) is achieved via the action of S-phase kinases, eventually leading to the loading of the replication machinery.

    \

    Recently, a novel replication complex, GINS (for Go, Ichi, Nii, and San; five, one, two, and three in Japanese), has been identified PUBMED:12730133, PUBMED:12730134. The precise function of GINS is not known. However, genetic and two-hybrid interactions indicate that it mediates the loading of the enzymatic replication machinery at a step after the action of the S-phase kinases PUBMED:12730134. Furthermore, GINS may be a part of the replication machinery itself, since it is found associated with replicating DNA PUBMED:12730133, PUBMED:12730134. Electron microscopy of GINS shows that it forms a ring-like structure PUBMED:12730133, reminiscent of the structure of PCNA PUBMED:8001157, the DNA polymerase delta replication clamp. This observation, coupled with the observed interactions for GINS, indicates that the complex may represent the replication clamp for DNA polymerase epsilon PUBMED:12730133.

    \

    This family of proteins represents the PSF1 component (for partner of SLD five) of the GINS complex.

    \ \ 4140 IPR005060 \

    The matrix (M) protein of rabies virus (RV) plays a key role in both assembly and budding of progeny virions. A PPPY motif (PY motif or late-budding domain) is conserved in the M proteins. These PY motifs are important for virus budding and for mediating interactions with specific cellular proteins containing WW domains.

    \ 4019 IPR001056 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight phosphoprotein PsbH found in PSII. The phosphorylation site of PsbH is located in the N-terminus, where reversible phosphorylation is light-dependent and redox-controlled. PsbH is necessary for the photoprotection of PSII, being required for: (1) the rapid degradation of photodamaged D1 core protein to prevent further oxidative damage to the PSII core, and (2) the insertion of newly synthesised D1 protein into the thylakoid membrane PUBMED:12909614. PsbH may also regulate the transfer of electrons from D2 (Qa) to D1 (Qb) in the reaction core.

    \ \ \ 1513 IPR003922 \

    An operon encoding 4 proteins required for bacterial cellulose biosynthesis\ (bcs) in Acetobacter xylinum has been isolated via genetic complementation\ with strains lacking cellulose synthase activity PUBMED:2146681. Nucleotide sequence analysis showed the cellulose synthase operon to consist of 4 genes, \ designated bcsA, bcsB, bcsC and bcsD, all of which are required for maximal bacterial cellulose synthesis in A. xylinum.

    \

    The calculated molecular mass of the protein encoded by bcsD is 17.3 kDa PUBMED:2146681. The function of BcsD is unknown.

    \ 4687 IPR005474 \

    Transketolase (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamine pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 kDa subunits. TK sequences from a variety of eukaryotic and prokaryotic sources PUBMED:1567394, PUBMED:1737042 show that the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha, there is a highly related enzyme, dihydroxyacetone synthase (DHAS) (also known as formaldehyde transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
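    As a worked check of the carbon bookkeeping in the transketolase reaction described above (carbon counts added here for illustration; phosphate groups are ignored), the two-carbon ketol unit moves from the five-carbon donor to the five-carbon acceptor:

```latex
\[
\underbrace{\text{xylulose 5-P}}_{C_5} + \underbrace{\text{ribose 5-P}}_{C_5}
\;\rightleftharpoons\;
\underbrace{\text{glyceraldehyde 3-P}}_{C_3} + \underbrace{\text{sedoheptulose 7-P}}_{C_7},
\qquad 5 + 5 = (5 - 2) + (5 + 2) = 10
\]
```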

    \ 1-deoxyxylulose-5-phosphate synthase (DXP synthase) PUBMED:9371765 is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamine pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in the biosynthetic pathway to isoprenoids, thiamine (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily related to TK. The N-terminal section contains a histidine residue which appears to function in proton transfer during catalysis PUBMED:1628611. In the central section there are conserved acidic residues that are part of the active cleft and may participate in substrate binding PUBMED:1628611. This family includes transketolase enzymes and also partially matches the 2-oxoisovalerate dehydrogenase beta subunit. Both these enzymes utilise thiamine pyrophosphate as a cofactor, suggesting there may be common aspects in their mechanism of catalysis.
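    A similar carbon balance can be written for the DXP synthase reaction. The release of the pyruvate carboxyl group as CO2 is not stated in the text above; it is added here as the standard account of this thiamine pyrophosphate-dependent decarboxylating condensation, and is shown only to make the carbon count close:

```latex
\[
\underbrace{\text{pyruvate}}_{C_3} + \underbrace{\text{glyceraldehyde 3-P}}_{C_3}
\;\longrightarrow\;
\underbrace{\text{1-deoxy-D-xylulose 5-P}}_{C_5} + \underbrace{\mathrm{CO_2}}_{C_1},
\qquad 3 + 3 = 5 + 1
\]
```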

    \ 2120 IPR007407 \ This is a putative periplasmic protein.\ 1327 IPR006725 \ This family includes several hypothetical baculoviral proteins, with predicted molecular weights of approximately 44 kDa.\ 5902 IPR009279 \

    This family consists of several bacterial proteins of unknown function as well as the Bacteriophage Mu gp29 protein.

    \ 3550 IPR007187 \ RNA undergoing nuclear export first encounters the basket of the nuclear pore. Nup133 is a nucleoporin accessible on the basket side of the pore.\ 3330 IPR002629 \ This is a domain of vitamin B12-independent methionine synthases, or 5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferases, from bacteria and plants. Plants are the only higher eukaryotes that have the required enzymes for methionine synthesis PUBMED:9636232. This enzyme catalyses the last step in the production of methionine by transferring a methyl group from 5-methyltetrahydrofolate to homocysteine PUBMED:9636232. The aligned region makes up the C-terminal region of the approximately 750 amino acid protein, except in some hypothetical archaeal proteins present in the family, where this region corresponds to the entire length.\ 7251 IPR009988 \

    This family consists of several hypothetical bacterial proteins of around 200 residues in length. The function of this family is unknown.

    \ 4371 IPR006160 \ Members of this family may be short chain fatty acid transporters although there has been no experimental characterisation of this function.\ 3261 IPR006395 \

    These sequences describe methylaspartate ammonia-lyase, also called beta-methylaspartase. It follows methylaspartate mutase (composed of S and E subunits) in one of several possible pathways of glutamate fermentation.

    \ 7837 IPR012963 \

    This family contains many hypothetical bacterial proteins and two putative membrane proteins.

    \ 1357 IPR003426 \ Bacteriochlorophyll A protein is involved in the energy transfer system of green photosynthetic bacteria. The protein forms a homotrimer, with each monomer unit containing seven molecules of bacteriochlorophyll A.\ 6370 IPR009497 \

    This family consists of hypothetical Caenorhabditis elegans proteins.

    \ 7542 IPR009087 \

    Rab geranylgeranyltransferase (RabGGT) catalyses the transfer of geranylgeranyl groups to the C-terminal cysteine residues of Rab proteins, Ras-related small GTPases that function in intracellular vesicular transport PUBMED:10745007. RabGGT is only able to prenylate Rab when it is complexed to the Rab escort protein (REP), after which REP remains bound to the prenylated Rab and delivers it to its target membrane. RabGGT is a member of the protein prenyltransferase family, all members of which are heterodimers consisting of alpha and beta subunits. RabGGT is distinct from other members of the prenyltransferase family because of the presence of an Ig-like insert domain in the alpha subunit that is folded into an eight-stranded sandwich between two helices in the helical domain.

    \ \ 7863 IPR001250 \

    Mannose-6-phosphate isomerase or phosphomannose isomerase (PMI) is the enzyme that catalyzes the interconversion of mannose-6-phosphate and fructose-6-phosphate. In eukaryotes, PMI is involved in the synthesis of GDP-mannose, a constituent of N- and O-linked glycans and GPI anchors; in prokaryotes, it participates in a variety of pathways, including capsular polysaccharide biosynthesis and D-mannose metabolism. PMIs belong to the cupin superfamily, whose functions range from isomerase and epimerase activities involved in the modification of cell wall carbohydrates in bacteria and plants, to non-enzymatic storage proteins in plant seeds, and transcription factors linked to congenital baldness in mammals PUBMED:11165500. Three classes of PMI have been defined PUBMED:8307007.

    \

    Type I includes eukaryotic PMI and the enzyme encoded by the manA gene in enterobacteria. PMI has a bound zinc ion, which is essential for activity.

    \

    A crystal structure of PMI from Candida albicans shows that the enzyme has three distinct domains PUBMED:8612079. The active site lies in the central domain, contains a single essential zinc atom, and forms a deep, open cavity of suitable dimensions to contain mannose-6-phosphate (M6P) or fructose-6-phosphate (F6P). The central domain is flanked by a helical domain on one side and a jelly-roll-like domain on the other.

    \ 6000 IPR009327 \

    This is a family of uncharacterised proteins found in bacteria and eukaryotes.

    \ 7739 IPR012855 \

    D-aminoacylase hydrolyses a wide variety of N-acyl derivatives of neutral D-amino acids in a zinc-dependent manner. The enzyme is composed of a small beta-barrel domain and a larger catalytic alpha/beta-barrel. The C-terminal region featured in this family forms part of the beta-barrel domain, together with a short N-terminal segment. The beta-strands of both barrels were found to superimpose well. The small beta-barrel domain does not seem to contribute to the substrate-binding site or to be involved in the catalytic process PUBMED:12454005.

    \ 4554 IPR001359 \ Synapsins are neuronal phosphoproteins that coat synaptic vesicles, bind to several \ elements of the cytoskeleton (including actin filaments), and are believed to function in \ the regulation of neurotransmitter release PUBMED:2117454, PUBMED:10578110. The synapsin family currently \ includes the highly related synapsin I and II. Both synapsins exist in two alternatively \ spliced variants, IA and IB and IIA and IIB, that only differ at the C-terminus. \ It also includes synapsin III.\ 5924 IPR009289 \

    This is a family of proteins of undetermined function from various baculoviruses.

    \ 6349 IPR009489 \

    This family consists of several plant specific PAR1 proteins from Nicotiana tabacum and Arabidopsis thaliana. The function of this family is unknown.

    \ 3191 IPR002703 \ The Levivirus coat protein forms the bacteriophage coat that encapsidates the viral RNA. 180 copies of this protein form the virion shell. The MS2 bacteriophage coat protein controls two distinct processes, sequence-specific RNA encapsidation and repression of replicase translation, by binding to an RNA stem-loop structure of 19 nucleotides containing the initiation codon of the replicase gene. The binding of a coat protein dimer to this hairpin shuts off synthesis of the viral replicase, switching the viral replication cycle to virion assembly rather than continued replication PUBMED:7523953.\ 7591 IPR011674 \ This is a group of sequences from hypothetical archaeal proteins. The region in question is approximately 330 amino acid residues long.\ 2884 IPR005204 \

    Haemocyanins are copper-containing oxygen transport proteins found in the haemolymph of many \ invertebrates. They are divided into 2 main groups, arthropodan and molluscan. These have structurally \ similar oxygen-binding centres, which are similar to the oxygen-binding centre of tyrosinases \ PUBMED:, but their quaternary structures are arranged differently. The arthropodan proteins exist \ as hexamers comprising 3 heterogeneous subunits (a, b and c) and possess 1 oxygen-binding centre per \ subunit; and the molluscan proteins exist as cylindrical oligomers of 10 to 20 subunits and possess 7 \ or 8 oxygen-binding centres per subunit PUBMED:3207675. Although the proteins have similar amino acid \ compositions, the only real similarity in their primary sequences is in the region corresponding to the\ second copper-binding domain, which also shows similarity to the copper-binding domain of tyrosinases \ PUBMED:.

    \

    Larval storage proteins (LSPs) PUBMED:2808410 are proteins from the haemolymph of insects, which may serve as a store of amino acids for the synthesis of adult proteins. There are two classes of LSPs: arylphorins, which are rich in aromatic amino acids, and methionine-rich LSPs. LSPs form hexameric complexes and are structurally related to arthropod haemocyanins.

    \ 6107 IPR010434 \

    This family consists of several hypothetical bacterial proteins. Many of the sequences in this family are annotated as putative DNA binding proteins but the function of this family is unknown.

    \ 5264 IPR008675 \ This entry contains the N-terminal regions of the Saccharomyces mating factor alpha precursor protein. All proteins in this family contain one or more copies of further toward their C terminus.\ 212 IPR012309 \

    This region is found in many but not all ATP-dependent DNA ligase enzymes. It is thought to constitute part of the catalytic core of ATP-dependent DNA ligase PUBMED:9016621.

    \ 4305 IPR003668 \

    Rotavirus non-structural protein 35 (Ns35) is a basic protein which possesses RNA-binding activity and is essential\ for genome replication PUBMED:8380660. It may also be important for viral RNA packaging.

    \ 8154 IPR013224 \

    This family consists of several Saccharomycetes origin recognition complex subunit 6 (ORC6) proteins. Despite differences in structure and sequence among eukaryotic replicators, ORC is a conserved feature of replication initiation in all eukaryotes: all DNA replication initiation is driven by this single conserved initiator complex. ORC-related genes have been identified in organisms ranging from Schizosaccharomyces pombe to plants to humans. The ORC is a six-protein complex, and its function is reviewed in PUBMED:11914271.

    \ 5639 IPR008560 \ This family consists of a number of conserved eukaryotic proteins of unknown function.\ 2611 IPR007044 \ This family is defined by the cyclodeaminase active site. In prokaryotes it is a single functional protein, whereas in animals it occurs as a C-terminal domain in the bifunctional enzyme formiminotransferase-cyclodeaminase (FTCD).\ 4693 IPR000264 \ A number of serum transport proteins are known to be evolutionarily related, including albumin, alpha-fetoprotein, vitamin D-binding protein and afamin PUBMED:2481749, PUBMED:2423133, PUBMED:7517938. Albumin is the main protein of plasma; it binds water, cations (such as Ca2+, Na+ and K+), fatty acids, hormones, bilirubin and drugs. Its main function is to regulate the colloidal osmotic pressure of blood. Alpha-fetoprotein (alpha-fetoglobulin) is a foetal plasma protein that binds various cations, fatty acids and bilirubin. Vitamin D-binding protein binds to vitamin D and its metabolites, as well as to fatty acids. The biological role of afamin (alpha-albumin) has not yet been characterised. The 3D structure of human serum albumin has been determined by X-ray crystallography to a resolution of 2.8 Å PUBMED:1630489. It comprises three homologous domains that assemble to form a heart-shaped molecule PUBMED:1630489. Each domain consists of two subdomains that possess common structural motifs PUBMED:1630489. The principal regions of ligand binding to human serum albumin are located in hydrophobic cavities in subdomains IIA and IIIA, which exhibit similar chemistry. Structurally, the serum albumins are similar, each domain containing five or six internal disulphide bonds, as shown schematically below:\
    \
                        +---+          +----+                        +-----+\
                        |   |          |    |                        |     |\
     xxCxxxxxxxxxxxxxxxxCCxxCxxxxCxxxxxCCxxxCxxxxxxxxxCxxxxxxxxxxxxxxCCxxxxCxxxx\
       |                 |       |     |              |               |\
       +-----------------+       +-----+              +---------------+\
    
    \ 6222 IPR010480 \

    The members of this group of proteins belong to MEROPS inhibitor family I33, clan IR: the nematode aspartyl protease inhibitors, or Aspins. They are restricted to parasitic nematode species. Structural features common to the nematode Aspins include the presence of a signal peptide sequence and the conservation of all four cysteine residues in the mature protein. The Y[V.A]RDLT sequence motif has been suggested to be of crucial functional importance in several filarial nematode inhibitors PUBMED:8433724; however, this sequence is not conserved in Tco-API-1 from Trichostrongylus colubriformis, and it has been demonstrated that Tco-API-1 is not an Aspin, as it does not inhibit porcine pepsin PUBMED:13678638. Related inhibitors from Onchocerca volvulus (Ov33) PUBMED:9392607 and Ascaris suum (PI-3) PUBMED:9654082 inhibit the in vitro activity of aspartyl proteases such as pepsin and cathepsin E (MEROPS peptidase family A1).
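    The motif test described above can be sketched as a simple pattern search. This Python sketch is illustrative only: it assumes that Y[V.A]RDLT means tyrosine, then valine or alanine, then R-D-L-T, and the function name has_aspin_motif and the toy sequences are hypothetical. A match is at most a hint; it is not evidence that a protein inhibits pepsin.

```python
import re

# Assumption: "Y[V.A]RDLT" is read as Y, then V or A, then R-D-L-T.
ASPIN_MOTIF = re.compile(r"Y[VA]RDLT")

def has_aspin_motif(sequence: str) -> bool:
    """Return True if a one-letter protein sequence contains the motif."""
    return ASPIN_MOTIF.search(sequence.upper()) is not None

# Toy sequences for illustration only (not real Aspin sequences).
print(has_aspin_motif("MKYVRDLTQE"))  # True
print(has_aspin_motif("MKYGRDLTQE"))  # False
```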

    \ \

    The three-dimensional structures of pepsin inhibitor-3 (PI-3) from A. suum and of the complex between PI-3 and porcine pepsin at 1.75 Å and 2.45 Å resolution, respectively, have revealed the mechanism of aspartic protease inhibition. PI-3 has a new fold consisting of two identical domains, each comprising an antiparallel beta-sheet flanked by an alpha-helix. In the enzyme-inhibitor complex, the N-terminal beta-strand of PI-3 pairs with one strand of the 'active site flap' (residues 70-82) of pepsin, thus forming an eight-stranded beta-sheet that spans the two proteins. PI-3 has a novel mode of inhibition, using its N-terminal residues to occupy and therefore block the first three binding pockets in pepsin for substrate residues C-terminal to the scissile bond (S1'-S3') PUBMED:10932249.

    \ \ 8055 IPR013225 \

    This family contains proteins that are similar to the product of the paaX gene of Escherichia coli. This protein is involved in the regulation of expression of a group of proteins known to participate in the metabolism of phenylacetic acid PUBMED:10766858.

    \ 3754 IPR005317 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys; at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
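    The loose HEXXH motif and the stricter abXHEbbHbc consensus described above can be expressed as regular expressions. The Python sketch below is illustrative, not a curated definition: the residue sets chosen for 'b' (uncharged, with histidine and proline excluded) and 'c' (hydrophobic) are assumptions, 'a' is restricted to valine or threonine even though the text only says these are the most frequent, and proline is excluded from X because it is never found in the site. The pattern and function names are hypothetical.

```python
import re

# Residue classes are illustrative assumptions, not a curated definition.
UNCHARGED = "ACFGILMNQSTVWY"         # 'b': excludes D, E, K, R, H and P
HYDROPHOBIC = "AVILMFWY"             # 'c'
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"  # 'X': proline is never found in the site

# Loose motif HEXXH; the lookahead lets overlapping hits be reported.
LOOSE = re.compile(r"(?=(HE..H))")

# Stricter abXHEbbHbc, with 'a' restricted to V or T (the most common residues).
STRICT = re.compile(
    r"(?=([VT][{b}][{x}]HE[{b}][{b}]H[{b}][{c}]))".format(
        b=UNCHARGED, x=ANY_BUT_PRO, c=HYDROPHOBIC
    )
)

def find_motifs(sequence, pattern):
    """Return (0-based start, matched residues) for every hit of `pattern`."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Toy sequences (hypothetical): the second has proline at position X.
for seq in ("GGVSAHEAIHLLGG", "GGVSPHEAIHLLGG"):
    print(seq, find_motifs(seq, LOOSE), find_motifs(seq, STRICT))
```

    Both toy sequences satisfy the loose HEXXH pattern, but only the first satisfies the stricter consensus, which illustrates why the stricter form is the more discriminating test.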

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
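    The classification rule above reduces to a lookup on the first character of a family or clan identifier, such as the M49 or MA identifiers used in this document. The following is a minimal sketch, assuming identifiers are supplied as plain strings. The function name `describe_peptidase_group` is hypothetical, and this is not part of any MEROPS software.

```python
# Catalytic-type codes as listed in the entry; "P" applies only to clans
# that contain families of more than one type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan containing more than one type)",
}

# Nucleophile by catalytic type, as described in the entry.
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}  # form an acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}    # use an activated water molecule


def describe_peptidase_group(identifier: str) -> tuple[str, str]:
    """Map an identifier such as 'M49' or 'MA' to (catalytic type, nucleophile),
    using only its first character."""
    code = identifier.strip().upper()[:1]
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic-type code in {identifier!r}")
    if code in RESIDUE_NUCLEOPHILE:
        nucleophile = "catalytic amino acid residue"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "not specified"
    return CATALYTIC_TYPES[code], nucleophile
```

    For example, `describe_peptidase_group("M49")` returns `("metallo", "activated water molecule")`, matching the metallopeptidase family described in this entry.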

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M49 (dipeptidyl-peptidase III family, clan M-). The predicted active-site residues occur in the motif HEXXXH, which is unlike that in any other family. The dipeptidyl-peptidase III aminopeptidases cleave dipeptides from the N terminus of peptides of four or more amino acids and have broad specificity.

    \ 8148 IPR013226 \

    Pal1 is a membrane-associated protein that is involved in the maintenance of cylindrical cellular morphology. It localises to sites of active growth. Pal1 physically interacts, and shows overlapping localisation, with the Huntingtin-interacting-protein (Hip1)-related protein Sla2p/End4p PUBMED:15975911.

    \ 6987 IPR009828 \

    This family consists of several hypothetical eukaryotic proteins of around 320 residues in length. The function of this family is unknown.

    \ 15 IPR003959 \

    A large family of ATPases has been described PUBMED:1860879, PUBMED:1825027, PUBMED:2140770, PUBMED:8507683, PUBMED:7646486 whose key feature is that its members share a conserved region of about 220 amino acids containing an ATP-binding site. This family is now called AAA, for 'A'TPases 'A'ssociated with diverse cellular 'A'ctivities. Proteins in this family contain either one or two AAA domains.

    \

    It is proposed that, in general, the AAA domains in these proteins act as ATP-dependent protein clamps PUBMED:7646486.

    \

    In addition to the ATP-binding 'A' and 'B' motifs (see the relevant entry ), which are located in the N-terminal half of this domain, there is a highly conserved region located in the central part of the domain.

    \ 7060 IPR009870 \

    This family consists of several archaeal proteins of around 320 residues in length. Members of this family seem to be found exclusively in Halobacterium and Haloferax species. The function of this family is unknown.

    \ 2221 IPR007571 \ This is a protein of unknown function found in algal chloroplasts and in a cyanobacterium.\ 5683 IPR008666 \ This family consists of several bacterial lipooligosaccharide sialyltransferases similar to the Haemophilus ducreyi LST protein. H. ducreyi is the cause of the sexually transmitted disease chancroid and produces a lipooligosaccharide (LOS) containing a terminal sialyl N-acetyllactosamine trisaccharide PUBMED:9933604.\ 1566 IPR005553 \

    Clag (cytoadherence linked asexual gene) is a malaria surface protein that has been shown to be involved in the binding of Plasmodium falciparum-infected erythrocytes to host endothelial cells, a process termed cytoadherence. Cytoadherence is associated with the sequestration of infected erythrocytes in the blood vessels of the brain, as seen in cerebral malaria. Clag is a multi-gene family in Plasmodium falciparum, with at least 9 members identified to date. Orthologous proteins in the rodent malaria species Plasmodium chabaudi (Lawson D Unpubl. obs.) suggest that the gene family is present in other malaria species and may play a more generic role in cytoadherence.

    \ 6147 IPR009395 \

    This family consists of several eukaryotic GCN5-like protein 1 (GCN5L1) sequences. The function of this family is unknown PUBMED:8646881, PUBMED:9426003.

    \ 2130 IPR007414 \ This is a family of uncharacterised yeast proteins.\ 392 IPR000306 \ The FYVE zinc finger is named after four proteins in which it is found: Fab1, YOTB/ZK632.12, Vac1 and EEA1. The FYVE finger has been shown to bind two Zn2+ ions PUBMED:8798641 and has eight potential zinc-coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where + represents a charged residue and X any residue.\ 6206 IPR010476 \

    This family consists of several bacterial L-rhamnose-proton symport protein (RhaT) sequences PUBMED:1551902, PUBMED:8757746.

    \ 2978 IPR002711 \ HNH endonucleases are found in bacteria and viruses PUBMED:9358175, PUBMED:7920259, PUBMED:7817395. This family includes pyocins, colicins and anaredoxins.\ 2141 IPR002729 \

    Proteins in this family are found in archaea and bacteria and are, as yet, functionally uncharacterised. This is one of four protein families in prokaryotic genomes that contain multiple CRISPR elements. CRISPR is an acronym for Clustered Regularly Interspaced Short Palindromic Repeats. The cas (CRISPR-associated) genes are found near the repeats PUBMED:11952905.

    \ 2574 IPR006859 \ BM2 is synthesised in the late phase of infection and incorporated into the virion. It may be phosphorylated in vivo. The function of BM2 is unknown PUBMED:10573149.\ 7820 IPR012943 \

    Proteins with this domain associate with the spindle body during cell division PUBMED:15004232.

    \ 7173 IPR009945 \

    This family consists of several hypothetical bacterial proteins of around 100 residues in length. Members of this family are found in Bradyrhizobium, Rhizobium, Brucella and Caulobacter species. The function of this family is unknown.

    \ 2301 IPR007714 \ Proteins in this family are highly conserved in eukaryotes. Some are annotated as transcription factors, although there is currently no support for this in the literature.\ 5021 IPR001510 \

    Synonym(s): Poly(ADP-ribose) polymerase (PARP)

    \ \

    NAD(+) ADP-ribosyltransferase PUBMED:3118181, PUBMED:8016868 is a eukaryotic enzyme that catalyzes the covalent attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification of nuclear proteins is DNA-dependent. The enzyme appears to be involved in the regulation of various important cellular processes, such as differentiation, proliferation and tumor transformation, as well as in the molecular events involved in the recovery of the cell from DNA damage.

    \

    Structurally, NAD(+) ADP-ribosyltransferase consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification domain and a C-terminal NAD-binding domain.

    \

    The DNA-binding region contains a pair of PARP-type zinc finger domains which have been shown to bind DNA in a zinc-dependent manner. The PARP-type zinc finger domains seem to bind specifically to single-stranded DNA and to act as a DNA nick sensor. DNA ligase III PUBMED:7760816 contains, in its N-terminal section, a single copy of a zinc finger highly similar to those of PARP.

    \ 4985 IPR005594 \ This region represents the C-terminal 120 amino acids of a family of surface-exposed bacterial proteins. YadA, an adhesin from Yersinia, was the first member of this family to be characterized, and UspA2 from Moraxella was the second. The Eib immunoglobulin-binding proteins from E. coli were third, followed by the DsrA proteins of Haemophilus ducreyi and others. These proteins are homologous at their C termini and have predicted signal sequences, but they diverge elsewhere. The C-terminal 9 amino acids, consisting of alternating hydrophobic amino acids ending in F or W, comprise a targeting motif for the outer membrane of the Gram-negative cell envelope. This region is important for oligomerisation PUBMED:11705900.\ 6715 IPR009676 \

    This family represents a conserved region approximately 50 residues long within a number of proteins of unknown function that seem to be restricted to Caenorhabditis elegans.

    \ 805 IPR007645 \ RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase, compared with three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 3, also known as the fork domain, is proximal to the catalytic site PUBMED:11313498.\ 4121 IPR003491 \ Plasmid replication is initiated by the replication initiation factor (REP). This family represents a probable topoisomerase that makes a sequence-specific single-stranded nick in the plasmid DNA at the origin of replication. Human proteins also belong to this family, including myelin transcription factor 2 and cerebrin-50 PUBMED:7735128.\ 921 IPR005617 \

    The N-terminal domain of the Groucho/TLE co-repressor proteins is involved in oligomerisation.

    \ 6782 IPR009714 \

    This family consists of several mammalian resistin proteins. Resistin is a 12.5 kDa cysteine-rich secreted polypeptide first reported from rodent adipocytes. It belongs to a multigene family termed RELMs or FIZZ proteins. Plasma resistin levels are significantly increased in both genetically susceptible and high-fat-diet-induced obese mice. Immunoneutralisation of resistin improves hyperglycemia and insulin resistance in high-fat-diet-induced obese mice, while administration of recombinant resistin impairs glucose tolerance and insulin action in normal mice. It has been demonstrated that increases in circulating resistin levels markedly stimulate glucose production in the presence of fixed physiological insulin levels, and that insulin suppresses resistin expression. It has been suggested that resistin could be a link between obesity and type 2 diabetes PUBMED:12885401.

    \ 8133 IPR013193 \

    The molecular function of the non-structural 5a (NS5a) protein is uncertain. NS5a is phosphorylated when expressed in mammalian cells. It is thought to interact with the dsRNA-dependent (interferon-inducible) kinase PKR PUBMED:9710605, PUBMED:9143277. This region corresponds to the 1b domain PUBMED:15902263.

    \ 5148 IPR007985 \

    This family consists of haemolysin expression modulating protein (Hha) from Escherichia coli and its enterobacterial homologues, such as YmoA from Yersinia enterocolitica and RmoA encoded on the R100 plasmid. These proteins act as modulators of bacterial gene expression. Members of the Hha/YmoA/RmoA family act in conjunction with members of the H-NS family, participating in the thermoregulation of different virulence factors and in plasmid transfer PUBMED:11890540. Hha, along with the chromatin-associated protein H-NS, is involved in the regulation of expression of the toxin alpha-haemolysin in response to osmolarity and temperature PUBMED:11790731. YmoA modulates the expression of various virulence factors, such as Yop proteins and YadA adhesin, in response to temperature. RmoA is a plasmid R100 modulator involved in plasmid transfer PUBMED:9851035. Proteins of the Hha family show striking similarity to the oligomerisation domain of the H-NS proteins.

    \ 4328 IPR003995 \

    Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior PUBMED:1558765. Four principal exotoxin secretion systems have been described. In the type II and IV secretion systems, toxins are first exported to the periplasm by way of a cleaved N-terminal signal sequence; a second set of proteins is used for extracellular transport (type II), or the C-terminus of the exotoxin itself is used (type IV). Type III secretion involves at least 20 molecules that assemble into a needle; effector proteins are then translocated through this without need of a signal sequence. In the Type I system, a complete channel is formed through both membranes, and the secretion signal is carried on the C-terminus of the exotoxin.

    \

    The RTX (repeats in toxin) family of cytolytic toxins belongs to the Type I secretion system and comprises important virulence factors in Gram-negative bacteria. In addition to the C-terminal signal sequence, these toxins carry several glycine-rich repeats. These repeats are essential for binding calcium and are critical for the biological activity of the secreted toxins PUBMED:8800842. All RTX toxin operons are arranged in the order rtxCABD. RtxA is the structural component of the exotoxin, and RtxB and RtxD are both required for its export from the bacterial cell. RtxC is an acyl-carrier-protein-dependent acyl-modification enzyme required to convert RtxA to its active form PUBMED:10470043.

    \

    Escherichia coli hemolysin (HlyA) is often cited as the model for RTX toxins. Work on HlyC, the product of its cognate rtxC gene PUBMED:9521785, has shown that HlyC carries out the acylation step in the post-translational modification of two internal lysine residues of HlyA. To cause pathogenicity, the HlyA toxin must first bind Ca2+ ions at its glycine-rich repeats and then be activated by HlyC PUBMED:8808931. This has been demonstrated both in vitro and in vivo.

    \ \

    A number of the sequences in this family are metallopeptidases belonging to MEROPS peptidase family M10 (clan MA(M)), subfamily M10B: serralysin, epralysin and unassigned peptidases.

    \ 6855 IPR008309 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 5179 IPR008016 \

    The head-tail connector of bacteriophage phi29 is composed of twelve 36 kDa subunits with 12-fold symmetry. It is the central component of a rotary motor that packages the genomic dsDNA into pre-formed proheads. This motor consists of the head-tail connector, surrounded by a phi29-encoded 174-base RNA and a viral ATPase protein PUBMED:9891587.

    \ 3055 IPR000867 \ The insulin-like growth factors (IGF-I and IGF-II) bind with high affinity to specific binding proteins in extracellular fluids PUBMED:7680510, PUBMED:1725860, PUBMED:2480830. These IGF-binding proteins (IGFBPs) prolong the half-life of the IGFs and have been shown either to inhibit or to stimulate the growth-promoting effects of the IGFs on cells in culture. They seem to alter the interaction of IGFs with their cell surface receptors. There are at least six different IGFBPs, and they are structurally related. The following growth-factor-inducible proteins are structurally related to IGFBPs and could function as growth-factor binding proteins PUBMED:1654338, PUBMED:1309586: mouse protein cyr61 and its probable chicken homolog, protein CEF-10; human connective tissue growth factor (CTGF) and its mouse homolog, protein FISP-12; and vertebrate protein NOV.\ 7190 IPR009956 \

    This family consists of several Enterobacterial post-segregation antitoxin CcdA proteins. The F plasmid-carried bacterial toxin, the CcdB protein, is known to act on DNA gyrase in two different ways. CcdB poisons the gyrase-DNA complex, blocking the passage of polymerases and leading to double-strand breakage of the DNA. Alternatively, in cells that overexpress CcdB, the A subunit of DNA gyrase (GyrA) has been found as an inactive complex with CcdB. Both poisoning and inactivation can be prevented and reversed in the presence of the F plasmid-encoded antidote, the CcdA protein PUBMED:10196173.

    \ 8065 IPR013220 \

    Proteins in this family are involved in repairing double-stranded DNA breaks created by the cleavage reaction of topoisomerase II PUBMED:15718301.

    \ 6093 IPR009370 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 4947 IPR000052 \ Potexviruses and carlaviruses are plant-infecting viruses whose genome consists of a single-stranded RNA molecule encapsidated in a coat protein. The genomes of many potexviruses are known, and their coat protein sequences have been shown to be rather well conserved PUBMED:2738582. The same observation applies to the coat proteins of a variety of carlaviruses, whose sequences are related to those of potexviruses PUBMED:2732711, PUBMED:1629709. The coat proteins of potexviruses and carlaviruses contain from 190 to 300 amino acid residues. The best-conserved region of these coat proteins is located in the central part.\ 5959 IPR009306 \

    This family consists of a series of repeated sequences from one hypothetical protein found in Schizosaccharomyces pombe. The function of this family is unknown.

    \ 3423 IPR004687 \ The proteins of the MET family have four transmembrane segments (TMSs) and are located in late endosomal or lysosomal membranes. Substrates of the mouse MTP transporter include thymidine, both nucleoside and nucleobase analogues, antibiotics, anthracyclines, ionophores and steroid hormones. MET transporters may be involved in the subcellular compartmentation of steroid hormones and other compounds. Drug sensitivity conferred by mouse MET was regulated by compounds that inhibit lysosomal function, interfere with intracellular cholesterol transport, or modulate the multidrug resistance phenotype of mammalian cells. Thus, MET family members may compartmentalize diverse hydrophobic molecules, thereby affecting cellular drug sensitivity, nucleoside/nucleobase availability and steroid hormone responses.\ 6951 IPR009806 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose, using the hydrogen extracted from water by PSII; oxygen is released as a by-product of this water splitting.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbW found in PSII, where it is a subunit of the oxygen-evolving complex. PsbW appears to have several roles, including guiding PSII biogenesis and assembly, stabilising dimeric PSII PUBMED:10950961, and facilitating PSII repair after photo-inhibition PUBMED:9335523. There appear to be two classes of PsbW: class 1 is found predominantly in algae and cyanobacteria, and class 2 is found predominantly in plants. This entry represents class 2 PsbW.

    \ 2442 IPR005141 \

    This domain is found in the release factor eRF1, which terminates protein biosynthesis by recognizing stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase center. The crystal structure of human eRF1 is known PUBMED:10676813. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase center. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site PUBMED:10676813.

    \ \

    This domain is also found in other proteins that may be involved in translation termination.

    \ 4343 IPR006779 \

    S1FA is an unusual small plant peptide of only 70 amino acids with a basic domain that contains a nuclear localization signal and a putative DNA-binding helix. S1FA is highly conserved between dicotyledonous and monocotyledonous plants and may be a DNA-binding protein that specifically recognises the negative promoter element S1F PUBMED:7739894.

    \ 138 IPR007051 \ CHORD is a Zn-binding domain. Silencing of the Caenorhabditis elegans CHORD-containing gene results in semisterility and embryonic lethality, suggesting an essential function of the wild-type gene in nematode development. The CHORD domain is sometimes found N-terminal to the CS domain in metazoan proteins, but occurs separately from the CS domain in plants. This association is thought to indicate a functional interaction between the CS and CHORD domains PUBMED:10571178.\ 2403 IPR004704 \

    The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) PUBMED:8246840, PUBMED:2197982 is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates coupled with their translocation across the cell membrane, which makes the PTS a link between the uptake and the metabolism of sugars.

    \ \

    The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred, via a signal transduction pathway, to enzyme I (EI), which in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme 2 (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains, IIA, IIB and IIC PUBMED:1537788. These can either be fused together in a single polypeptide chain or exist as two or three interacting chains, formerly called enzymes II (EII) and III (EIII).

    \ \

    The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine which is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with the sugar uptake processed by the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site.
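    The order of phosphoryl transfers described above can be traced as a chain of donor-acceptor pairs. The following is a minimal illustrative sketch, not a kinetic or structural model. The function name `pts_phosphoryl_relay` and the string labels are hypothetical. The IIB residue is a parameter because, as stated above, it depends on the sugar transported.

```python
def pts_phosphoryl_relay(iib_residue: str = "Cys") -> list[str]:
    """Return the PTS phosphoryl-transfer steps as 'donor -> acceptor' strings.

    iib_residue: 'Cys' or 'His', the residue of EIIB that accepts the
    phosphoryl group from phospho-IIA (it depends on the sugar transported).
    """
    if iib_residue not in ("Cys", "His"):
        raise ValueError("EIIB is phosphorylated on a Cys or His residue")
    # Chain of carriers: PEP -> EI -> HPr -> EIIA -> EIIB -> sugar
    carriers = [
        "PEP",
        "enzyme I",
        "HPr (His)",
        "EIIA (His)",
        f"EIIB ({iib_residue})",
        "sugar (phosphorylated during uptake through EIIC)",
    ]
    # Pair each carrier with the next one to list the five transfer steps
    return [f"{donor} -> {acceptor}" for donor, acceptor in zip(carriers, carriers[1:])]


if __name__ == "__main__":
    for step in pts_phosphoryl_relay("Cys"):
        print(step)
```

    Pairing consecutive carriers makes it explicit that the phosphoryl group passes through five transfers before it reaches the sugar. Only the EIIB step changes between permeases that use a cysteinyl residue and those that use a histidyl residue.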

    \ \

    An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose PUBMED:8246840, PUBMED:1537788, PUBMED:7815935, PUBMED:11361063.

    \ \ Bacterial PTS transporters transport and concomitantly phosphorylate their sugar substrates, and typically consist of multiple subunits or protein domains. The Man family is unique in several respects among PTS permease families:\
  • It is the only PTS family in which members possess a IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteinyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.
    \ \

    The mannose permease of Escherichia coli, for example, can transport and phosphorylate glucose, mannose, fructose, glucosamine, N-acetylglucosamine and other sugars. Other members of this family can transport sorbose, fructose and N-acetylglucosamine.

    This entry is specific for the IID subunits of the Man family of PTS transporters.

    \ 3887 IPR002596 \ This family consists of conserved hypothetical proteins from Borrelia burgdorferi, the Lyme disease spirochaete, some of which are putative plasmid partition proteins PUBMED:9695920.\ 3840 IPR003430 \ Bacterial phenol hydroxylase is a multicomponent enzyme that catabolises phenol and some of its methylated derivatives. This family contains both the P1 and P3 polypeptides of phenol hydroxylase and the alpha and beta chains of methane hydroxylase protein A. Methane hydroxylase protein A is responsible for the initial oxygenation of methane to methanol in methanotrophs. It also catalyses the monohydroxylation of a variety of unactivated alkenes and of alicyclic, aromatic and heterocyclic compounds. Also included in this family is toluene-4-monooxygenase system protein A, which hydroxylates toluene to form p-cresol.\ 2746 IPR013148 \

    This domain corresponds to the N-terminal domain of glycosyl transferase family 32, which forms a five-bladed beta-propeller structure PUBMED:14973124.

    \ 4545 IPR005100 \ This short region of similarity is found in two tandem copies in Supt5 proteins that are involved in chromatin regulation. The function of this region is unknown.\ 2411 IPR000348 \

    p24 proteins are major membrane components of COPI- and COPII-coated vesicles and are implicated in cargo selectivity of ER-to-Golgi transport PUBMED:9472029, PUBMED:8947548.\ \ Multiple members of the p24 family are found in all eukaryotes, from yeast to mammals. Members of the p24 family are type I membrane proteins with a signal peptide at the amino terminus, a lumenal (extracytosolic) coiled-coil domain, a single transmembrane domain with conserved amino acids, and a short cytoplasmic tail. They may be grouped into at least three subfamilies based on primary sequence PUBMED:8663407. One subfamily comprises yeast Emp24p and mammalian p24A, another comprises yeast Erv25p and mammalian Tmp21, and the third comprises mammalian gp25L proteins.

    \ \ 1880 IPR003677 \ This domain has no known function.\ 699 IPR006785 \ This conserved region defines a group of peroxisomal membrane anchor proteins which bind the PTS1 (peroxisomal targeting signal) receptor and are required for the import of PTS1-containing proteins into peroxisomes. Loss of functional Pex14p results in defects in both the PTS1 and PTS2-dependent import pathways. Deletion analysis of this conserved region implicates it in selective peroxisome degradation. In the majority of members this region is situated at the N-terminus of the protein PUBMED:9094717, PUBMED:11564741.\ 7089 IPR010833 \

    This family consists of several bacterial replication initiation and membrane attachment (DnaB) proteins. The DnaB protein is essential for both replication initiation and membrane attachment of the origin region of the chromosome and of plasmid pUB110 in Bacillus subtilis. DnaB mutants fall into two different classes, DnaBI and DnaBII: DnaBI is essential for both chromosome and pUB110 replication, whereas DnaBII is necessary only for chromosome replication PUBMED:3027697.

    \ 1449 IPR004341 \ The Co-AntiTerminator (CAT) RNA-binding domain is found at the amino terminus of a family of transcriptional antiterminator proteins. This domain forms a dimer in the crystal structure PUBMED:9305644. Transcriptional antiterminators of the BglG/SacY family are regulatory proteins that mediate the induction of sugar-metabolizing operons in Gram-positive and Gram-negative bacteria. Upon activation, these proteins bind to specific targets in nascent mRNAs, thereby preventing abortive dissociation of the RNA polymerase from the DNA template PUBMED:10610766.\ 579 IPR001003 \

    Major Histocompatibility Complex (MHC) glycoproteins are heterodimeric cell surface receptors that function to present antigen peptide fragments to T cells responsible for cell-mediated immune responses. MHC molecules can be subdivided into two groups on the basis of structure and function: class I molecules present intracellular antigen peptide fragments (~10 amino acids) on the surface of the host cells to cytotoxic T cells; class II molecules present exogenously derived antigenic peptides (~15 amino acids) to helper T cells. MHC class I and II molecules are assembled and loaded with their peptide ligands via different mechanisms. However, both present peptide fragments rather than entire proteins to T cells, and are required to mount an immune response.

    \

    Class II MHC glycoproteins are expressed on the surface of antigen-presenting cells (APC), including macrophages, dendritic cells and B cells. MHC II proteins present peptide antigens that originate extracellularly from foreign bodies such as bacteria. Proteins from the pathogen are degraded into peptide fragments within the APC, which sequesters these fragments into the endosome so that they can bind to MHC class II proteins before being transported to the cell surface. MHC class II receptors display antigens for recognition by helper T cells (which stimulate the development of B cell clones) and inflammatory T cells (which cause the release of lymphokines that attract other cells to the site of infection) PUBMED:15120183.

    \

    MHC class II molecules are composed of two membrane-spanning chains, alpha and beta, of similar size. Both chains consist of two globular domains (N- and C-terminal) and a transmembrane segment that anchors them to the membrane PUBMED:7612235. A groove in the structure acts as the peptide-binding site. This entry represents the N-terminal domain (also called alpha-1 domain) of the alpha chain.

    \ \ 6062 IPR009355 \

    This family consists of several Toluene-4-monooxygenase system protein B (TmoB) sequences. Pseudomonas mendocina KR1 metabolises toluene as a carbon source. The initial step of the pathway is hydroxylation of toluene to form p-cresol by a multicomponent toluene-4-monooxygenase (T4MO) system PUBMED:1885512.

    \ 5974 IPR009315 \

    Phosphate-starvation-inducible E (PsiE) expression is under direct positive and negative control by PhoB and cAMP-CRP, respectively PUBMED:10986267. The function of PsiE remains to be determined.

    \ 1403 IPR007602 \ This family includes NS2 proteins from other members of the Orbivirus genus. NS2 is a non-specific single-stranded RNA-binding protein that forms large homomultimers and accumulates in viral inclusion bodies of infected cells. Three RNA-binding regions have been identified in Bluetongue virus serotype 17, at residues 2-11, 153-166 and 274-286 PUBMED:11752140. NS2 multimers also possess nucleotidyl phosphatase activity PUBMED:11162836. The precise function of NS2 is not known, but it may be involved in the transport and condensation of viral mRNAs PUBMED:11752140.\ 6651 IPR009642 \

    This family contains a number of hypothetical bacterial proteins of unknown function. Some family members contain more than one copy of the region represented by this family.

    \ 1097 IPR002553 \ This domain is the N-terminal region of various alpha, beta and gamma subunits of the AP-1, AP-2 and AP-3 adaptor protein complexes. The adaptor protein (AP) complexes are involved in the formation of clathrin-coated pits and vesicles PUBMED:9261055. The N-terminal region of the various adaptor proteins (APs) is constant compared with the C-terminal region, which is variable within members of the AP-2 family PUBMED:2495531. It has been proposed that this constant region interacts with another uniform component of the coated vesicles PUBMED:2495531.\ 3912 IPR002643 \ This family consists of the DNA-binding protein, or agnoprotein, from various polyomaviruses. This protein is highly basic and can bind single-stranded and double-stranded DNA PUBMED:6262654. Mutations in the agnoprotein produce smaller viral plaques. Its function is therefore not essential for growth in tissue culture cells, although some step in the normal replication cycle is slowed PUBMED:3027418. There is also evidence suggesting that the agnogene and agnoprotein act as regulators of structural protein synthesis PUBMED:3027418.\ 5668 IPR008766 \ This family consists of a group of bacteriophage replication gene A protein (GPA)-like sequences from both viruses and bacteria. The members of this family are likely to be endonucleases PUBMED:1701261, PUBMED:7997180, PUBMED:8510152.\ 2839 IPR007690 \ This is a family of membrane proteins involved in the secretion of a number of molecules in Gram-negative bacteria. The precise function of these proteins is unknown, though in Vibrio cholerae the EpsM protein interacts with the EpsL protein and also forms homodimers PUBMED:10322014.\ 4428 IPR004317 \

    Reoviruses are double-stranded RNA viruses that lack a membrane envelope. Their capsid is organized in two concentric icosahedral layers: an inner core and an outer capsid layer. The sigma1 protein is found in the outer capsid, and the sigma2 protein is found in the core. Besides sigma2, the core contains four other kinds of protein: lambda1, lambda2, lambda3 and mu2. Interactions between sigma2 and lambda1 and lambda3 are thought to initiate core formation, followed by mu2 and lambda2 PUBMED:9971813.

    \

    Sigma1 is a trimeric protein positioned at the 12 vertices of the icosahedral outer capsid layer. Its N-terminal fibrous tail, arranged as a triple coiled coil, anchors it in the virion, and a C-terminal globular head interacts with the cellular receptor PUBMED:11438552. These two parts are formed by separate trimerization events. The N-terminal fibrous tail forms on the polysome, without the involvement of ATP or chaperones. The post-translational assembly of the C-terminal globular head involves the chaperone activity of Hsp90, which is associated with phosphorylation of Hsp90 during the process PUBMED:11438552. Sigma1 protein acts as a cell attachment protein, and determines viral virulence, pathways of spread, and tropism. Junctional adhesion molecule has been identified as a receptor for sigma1 PUBMED:11239401. In type 3 reoviruses, a small region in the N-terminal tail, predicted to form a beta sheet, was found to bind target cell surface sialic acid (i.e. sialic acid acts as a co-receptor) and promote apoptosis PUBMED:11287552. The sigma1 protein also binds to the lambda2 core protein PUBMED:9311901.

    \ 6618 IPR009627 \

    This is a group of proteins of unknown function.

    \ 3009 IPR000847 \ Numerous bacterial transcription regulatory proteins bind DNA via a helix-turn-helix (HTH) motif. These proteins are very diverse, but for convenience may be grouped into subfamilies on the basis of sequence similarity. One such family, the lysR family, groups together a range of proteins, including ampR, catM, catR, cynR, cysB, gltC, iciA, ilvY, irgB, lysR, metR, mkaC, mleR, nahR, nhaR, nodD, nolR, oxyR, pssR, rbcR, syrM, tcbR, tfdS and trpI PUBMED:1907267, PUBMED:1592818, PUBMED:1840615, PUBMED:3413113, PUBMED:2034653. The majority of these proteins appear to be transcription activators, and most are known to negatively regulate their own expression. All possess a potential HTH DNA-binding motif towards their N-termini.\ 1487 IPR003610 \

    The carbohydrate-binding domain (CBD) is a short domain found in many different glycosyl hydrolase enzymes, such as the C-terminal cellulose-binding domain of endoglucanase Z PUBMED:9405041. The domain has a core structure consisting of a 3-stranded meander beta-sheet, which contains six aromatic groups that may be important for binding.

    \

    The overall topology of the CBD is structurally similar to that of the C-terminal chitin-binding domains (ChBD) of chitinase A1 and chitinase B; however, the binding mechanism of the ChBD may differ from that of the CBD PUBMED:10788483.

    \ \ \ 2417 IPR001178 \ This entry contains insecticidal toxins produced by Bacillus species of bacteria. During spore formation, the bacteria produce crystals of this protein. When an insect ingests these proteins, they are activated by proteolytic cleavage. The N terminus is cleaved in all of the proteins and a C-terminal extension is cleaved in some members. Once activated, the endotoxin binds to the gut epithelium and causes cell lysis, leading to death. The activated region of the delta endotoxin is composed of three structural domains. The N-terminal helical domain is involved in membrane insertion and pore formation. The second and third domains are involved in receptor binding.\ 3185 IPR007825 \ This family consists of major outer membrane protein precursors from Legionella pneumophila.\ 6293 IPR009463 \

    This is a group of proteins of unknown function.

    \ 4490 IPR007170 \ This family represents stage V sporulation protein G. It is essential for sporulation and specific to stage V sporulation in Bacillus megaterium and Bacillus subtilis PUBMED:1373326. In B. subtilis, its expression decreases after 30-60 minutes of cold shock PUBMED:8755892.\ 783 IPR000999 \ Prokaryotic ribonuclease III (gene rnc) PUBMED:3903434 is an enzyme that digests double-stranded RNA. It is involved in the processing of ribosomal RNA precursors and of some mRNAs. RNase III is evolutionarily related PUBMED:9241229 to the fission yeast pac1, a ribonuclease that probably inhibits mating and meiosis by degrading a specific mRNA required for sexual development; yeast ribonuclease III (gene RNT1), a dsRNA-specific nuclease that cleaves eukaryotic preribosomal RNA at various sites; Caenorhabditis elegans hypothetical protein F26E4.13; Paramecium bursaria chlorella virus 1 protein A464R; Synechocystis strain PCC 6803 hypothetical protein slr0346; fission yeast hypothetical protein SpAC8A4.08c, a protein with an N-terminal helicase domain and a C-terminal RNase III domain; and Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same structure as SpAC8A4.08c.\ 830 IPR000770 \

    The SAND domain (named after Sp100, AIRE-1, NucP41/75, DEAF-1) is a conserved ~80 residue region found in a number of nuclear proteins, many of which function in chromatin-dependent transcriptional control. These include proteins linked to various human diseases, such as the Sp100 (Speckled protein 100 kDa), NUDR (Nuclear DEAF-1 related), GMEB (Glucocorticoid Modulatory Element Binding) proteins and AIRE-1 (Autoimmune regulator 1) proteins.

    \

    \ Proteins containing the SAND domain have a modular structure; the SAND domain can be associated with a number of other modules, including the bromodomain, the PHD finger and the MYND finger. Because no SAND domain has been found in yeast, it is thought that the SAND domain could be restricted to animal phyla. Many SAND domain-containing proteins, including NUDR, DEAF-1 (Deformed epidermal autoregulatory factor-1) and GMEB, have been shown to bind DNA sequences specifically. The SAND domain has been proposed to mediate the DNA binding activity of these proteins PUBMED:9697411, PUBMED:11427895.

    \

    \ The 3D structure of the SAND domain from Sp100b revealed a novel alpha/beta fold. The SAND domain adopts a compact fold consisting of a strongly twisted, five-stranded antiparallel beta-sheet with four alpha-helices packing against one side of the beta-sheet. The opposite side of the beta-sheet is solvent exposed. The beta-sheet and alpha-helical parts of the structure form two distinct regions. Multiple hydrophobic residues pack between these regions to form a structural core. A conserved KDWK sequence motif is found within the alpha-helical, positively charged surface patch. The DNA binding surface has been mapped to the alpha-helical region encompassing the KDWK motif PUBMED:11427895.

    \ 1169 IPR004185 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \ Enzymes containing this domain belong to family 13 of the glycosyl hydrolases. The maltogenic alpha-amylase is an enzyme which catalyses hydrolysis of (1-4)-alpha-D-glucosidic linkages in polysaccharides so as to remove successive alpha-maltose residues from the non-reducing ends of the chains in the conversion of starch to maltose. Other enzymes include neopullulanase, which hydrolyses pullulan to panose, and cyclomaltodextrinase, which hydrolyses cyclodextrins.\ 59 IPR001164 \

    This entry describes a family of GTPase-activating proteins for small GTPases, for example ARF1-directed GTPase-activating protein and the cycle control GTPase-activating protein (GAP) GCS1, which is important for the regulation of the ADP-ribosylation factor ARF, a member of the Ras superfamily of GTP-binding proteins PUBMED:9446556. The GTP-bound form of ARF is essential for the maintenance of normal Golgi morphology; it participates in the recruitment of coat proteins, which are required for budding and fission of membranes. Before fusion with an acceptor compartment, the membrane must be uncoated. This step requires hydrolysis of the GTP bound to ARF. These proteins contain a characteristic zinc finger motif (Cys-x2-Cys-x(16,17)-Cys-x2-Cys) which displays some similarity to the C4-type GATA zinc finger. The ARFGAP domain displays no obvious similarity to other GAP proteins.

    \ \ The 3D structure of the ARFGAP domain of the PYK2-associated protein beta has been solved PUBMED:10601011. It consists of a three-stranded beta-sheet surrounded by 5 alpha helices. The domain is organized around a central zinc atom which is coordinated by 4 cysteines. The ARFGAP domain is clearly unrelated to the structures of other GAP proteins, which are exclusively helical. Classical GAP proteins accelerate GTPase activity by supplying an arginine finger to the active site. The crystal structure of ARFGAP bound to ARF revealed that the ARFGAP domain does not supply an arginine to the active site, which suggests a more indirect role of the ARFGAP domain in GTP hydrolysis PUBMED:10102276.
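    As a minimal illustration, the cysteine spacing of the ARFGAP zinc finger can be scanned for with a regular expression. This sketch assumes the C4-type spacing Cys-x2-Cys-x(16,17)-Cys-x2-Cys (four zinc-coordinating cysteines, as described above), with x standing for any residue; the function name `find_arfgap_zinc_fingers` is hypothetical, and a hit is only a candidate, not evidence of a folded ARFGAP domain.

```python
import re

# Assumed C4-type ARFGAP zinc-finger spacing: Cys-x2-Cys-x(16,17)-Cys-x2-Cys,
# where "x" is any residue. The pattern sits inside a zero-width lookahead so
# that candidates starting at different positions may overlap.
ARFGAP_ZF = re.compile(r"(?=(C.{2}C.{16,17}C.{2}C))")


def find_arfgap_zinc_fingers(seq: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (0-based start, matched segment) per candidate."""
    return [(m.start(), m.group(1)) for m in ARFGAP_ZF.finditer(seq.upper())]


if __name__ == "__main__":
    # Synthetic sequence, for illustration only (not a real ARFGAP protein).
    demo = "MA" + "CAAC" + "G" * 16 + "CAAC" + "KL"
    print(find_arfgap_zinc_fingers(demo))
```

    The input is upper-cased first, so one-letter sequences in either case are accepted; spacers of 16 or 17 residues match, while 15 or 18 do not.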

    \ \

    The Rev protein of human immunodeficiency virus type 1 (HIV-1) facilitates nuclear export of unspliced and partly-spliced viral RNAs PUBMED:7637788. Rev contains an RNA-binding domain and an effector domain; the latter is believed to interact with a cellular cofactor required for the Rev response and hence HIV-1 replication. Human Rev interacting protein (hRIP) specifically interacts with the Rev effector. The amino acid sequence of hRIP is characterised by an N-terminal, C-4 class zinc finger motif.

    \ \ \ 6824 IPR010732 \

    This family consists of several hypothetical bacterial proteins of around 300 residues in length. The function of this family is unknown, although one member from Salmonella enterica is thought to be involved in virulence PUBMED:12437215.

    \ 983 IPR005127 \ During infection, the intestinal protozoan parasite Giardia lamblia undergoes continuous antigenic variation, which is determined by diversification of the parasite's major surface antigen, named VSP (variant surface protein).\ 334 IPR002048 \ Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand. This type of domain consists of a twelve-residue loop flanked on both sides by a twelve-residue alpha-helical domain. In an EF-hand loop, the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand).\ 6260 IPR009450 \

    Glycosylphosphatidylinositol (GPI) represents an important anchoring molecule for cell surface proteins. The first step in its synthesis is the transfer of N-acetylglucosamine (GlcNAc) from UDP-N-acetylglucosamine to phosphatidylinositol (PI). This step involves the products of three genes in yeast (GPI1, GPI2 and GPI3) and four genes in mammals (GPI1, PIG A, PIG H and PIG C).

    \ 3357 IPR007567 \ This family represents a region near the C terminus of Mid2, which contains a transmembrane region. The remainder of the protein sequence is serine-rich and of low complexity, and is therefore impossible to align accurately. Mid2 is thought to act as a mechanosensor of cell wall stress. The C-terminal cytoplasmic region of Mid2 is known to interact with Rom2, a guanine nucleotide exchange factor (GEF) for Rho1, which is part of the cell wall integrity signalling pathway.\ 4258 IPR000630 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind directly to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:, groups eubacterial, algal and plant chloroplast, cyanelle, archaebacterial and Marchantia polymorpha mitochondrial S8; mammalian and plant S15A; and yeast S22 (S24) ribosomal proteins.

    \ 7621 IPR012859 \

    The sequences making up this family are derived from hypothetical proteins of unknown function expressed by various archaeal species. The region in question is approximately 160 residues long.

    \ 6970 IPR010790 \

    This entry represents a repeated motif of around 29 residues in length. This repeat is found in the variable surface lipoproteins of Mycoplasma bovis and in mammalian neurofilament triplet H (NefH or NF-H) proteins. This repeat contains several Lys-Ser-Pro (KSP) motifs, and in NefH these are thought to function as the main target for neurofilament-directed protein kinases in vivo PUBMED:3138108.
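    A minimal sketch of locating the KSP motifs mentioned above in a protein sequence follows. It assumes the motif is the exact tripeptide Lys-Ser-Pro in one-letter code; the function name `ksp_positions` is hypothetical, and the sketch does not attempt to delimit the ~29-residue repeat itself.

```python
import re


def ksp_positions(seq: str) -> list[int]:
    """Hypothetical helper: 1-based positions of the Lys in each KSP tripeptide.

    "KSP" cannot overlap with itself, so a plain non-overlapping scan finds all.
    """
    return [m.start() + 1 for m in re.finditer("KSP", seq.upper())]


if __name__ == "__main__":
    # Synthetic fragment for illustration; not a real NefH or Mycoplasma sequence.
    fragment = "AKSPEKSPAKSPVE"
    hits = ksp_positions(fragment)
    print(len(hits), hits)
```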

    \ 3590 IPR000145 \ The orbivirus VP5 protein is one of the two proteins (with VP2) which make up the virus particle outer capsid. Cryo-electron microscopy indicates that VP5 is a trimer, suggesting that there are 360 copies of VP5 per virion PUBMED:9281498.\ 514 IPR004193 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \ Enzymes containing this domain belong to family 13 of the glycosyl hydrolases. This domain is found in a range of enzymes that act on branched substrates, i.e. isoamylase, pullulanase and branching enzyme. Isoamylase hydrolyses 1,6-alpha-D-glucosidic branch linkages in glycogen, amylopectin and dextrin; 1,4-alpha-glucan branching enzyme functions in the formation of 1,6-glucosidic linkages of glycogen; and pullulanase is a starch-debranching enzyme.\ 2764 IPR005193 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    This is a family of alpha-L-arabinofuranosidases, which are all members of glycoside hydrolase family 62. This enzyme hydrolyses aryl alpha-L-arabinofuranosides and cleaves arabinosyl side chains from arabinoxylan and arabinan.

    \ 7125 IPR009914 \

    This family consists of several eukaryotic dolichol phosphate-mannose biosynthesis regulatory (DPM2) proteins. Biosynthesis of glycosylphosphatidylinositol and N-glycan precursor is dependent upon a mannosyl donor, dolichol phosphate-mannose (DPM). DPM2, an 84 amino acid membrane protein expressed in the endoplasmic reticulum (ER), makes a complex with DPM1 that is essential for the ER localisation and stable expression of DPM1. Moreover, DPM2 enhances binding of dolichol phosphate, a substrate of DPM synthase. Biosynthesis of DPM in mammalian cells is regulated by DPM2 PUBMED:9724629.

    \ 377 IPR001157 \ The flavivirus genome polypeptide contains the capsid protein C (core protein), the matrix protein (envelope protein M), the major envelope protein E, a number of small non-structural proteins (NS1, NS2A, NS2B, NS4A and NS4B), helicase and RNA-directed polymerase (NS5) PUBMED:9371625.\ 4641 IPR006906 \

    The timeless gene in Drosophila melanogaster and its homologues in a number of other insects and mammals (including human) are involved in circadian rhythm control PUBMED:11710984. This family includes related proteins from a number of fungal species and from Arabidopsis thaliana.

    \ 7280 IPR010895 \

    CHRD (after the SWISS-PROT abbreviation for chordin) is a novel domain identified in chordin, an inhibitor of bone morphogenetic proteins. This family includes bacterial homologues. It is predicted to have an immunoglobulin-like beta-barrel structure based on limited similarity to superoxide dismutases but, as yet, no clear functional prediction can be made PUBMED:13678956.

    \ 2220 IPR007657 \ This is a family of uncharacterised proteins.\ 6184 IPR009415 \

    This family consists of several omega-atracotoxin proteins specific to Hadronyche versuta (Blue Mountains funnel-web spider). Omega-Atracotoxin-Hv1a is an insect-specific neurotoxin whose phylogenetic specificity derives from its ability to antagonise insect, but not vertebrate, voltage-gated calcium channels. Two spatially proximal residues, Asn(27) and Arg(35), form a contiguous molecular surface that is essential for toxin activity. It has been proposed that this surface of the beta-hairpin is a key site for interaction of the toxin with insect calcium channels PUBMED:11313356.

    \ 916 IPR003749 \ ThiS (thiaminS) is a 66 aa protein involved in sulphur transfer. ThiS is encoded in the thiCEFSGH operon in Escherichia coli. Members of this family have two conserved glycines at the C terminus. Thiocarboxylate is formed at the last glycine during the activation process. Sulphur is transferred from ThiI to ThiS in a reaction catalysed by IscS PUBMED:10781607. MoaD, a protein involved in sulphur transfer during molybdopterin synthesis, is about the same length and shows limited sequence similarity to ThiS. Both have the conserved GG at the C-terminal end.\ 3494 IPR002154 \

    Neuregulins are a sub-family of EGF-like molecules that have been shown to play multiple essential roles in vertebrate embryogenesis including: cardiac development, Schwann cell and oligodendrocyte differentiation, some aspects of neuronal development, as well as the formation of neuromuscular synapses PUBMED:9892702, PUBMED:9208852. Included in the family are heregulin; neu differentiation factor; acetylcholine receptor synthesis stimulator; glial growth factor; and sensory and motor-neuron derived factor PUBMED:9804837. Multiple family members are generated by alternate splicing or by use of several cell type-specific transcription initiation sites. In general, they bind to and activate the erbB family of receptor tyrosine kinases (erbB2 (HER2), erbB3 (HER3), and erbB4 (HER4)), functioning both as heterodimers and homodimers.

    The transmembrane forms of neuregulin 1 (NRG1) are present within synaptic vesicles, including those containing glutamate PUBMED:12145742. After exocytosis, NRG1 is in the presynaptic membrane, where the ectodomain of NRG1 may be cleaved off. The ectodomain then migrates across the synaptic cleft and binds to and activates a member of the EGF-receptor family on the postsynaptic membrane. This has been shown to increase the expression of certain glutamate-receptor subunits. NRG1 appears to signal for glutamate-receptor subunit expression, localization, and/or phosphorylation, facilitating subsequent glutamate transmission.

    The NRG1 gene has been identified as a potential gene determining susceptibility to schizophrenia by a combination of genetic linkage and association approaches PUBMED:12145742.

    \ 791 IPR000772 \ Ricin is a lectin from the seeds of the castor bean plant, Ricinus communis. The seeds are poisonous to people, animals and insects, and just one milligram of ricin can kill an adult. \ \

    Primary structure analysis has shown the presence of a similar domain in many carbohydrate-recognition proteins like plant and bacterial AB-toxins, glycosidases or proteases PUBMED:9603958, PUBMED:7664090, PUBMED:8844840. This domain, known as the ricin B lectin domain, can be present in one or more copies and has been shown in some instances to bind simple sugars, such as galactose or lactose.

    \

    The ricin B lectin domain is composed of three homologous subdomains of 40 amino acids (alpha, beta and gamma) and a linker peptide of around 15 residues (lambda). It has been proposed that the ricin B lectin domain arose by gene triplication from a primitive 40 residue galactoside-binding peptide PUBMED:3561502, PUBMED:1881882. The most characteristic, though not completely conserved, sequence feature is the presence of a Q-x-W pattern. Consequently, the ricin B lectin domain has also been referred to as the (QxW)3 domain and the three homologous regions as the QxW repeats PUBMED:7664090, PUBMED:8844840. A disulphide bond is also conserved in some of the QxW repeats PUBMED:7664090.
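    The QxW naming can be made concrete with a short scan for Gln-x-Trp tripeptides. This is a sketch under stated assumptions: one-letter code, x as any residue, and the hypothetical function name `qxw_positions`. Because the motif is not completely conserved, a genuine (QxW)3 domain may show fewer than three hits, and hits alone do not establish the domain.

```python
import re

# Gln, any residue, Trp. The zero-width lookahead reports overlapping
# candidates too, e.g. both hits in "QQWW".
QXW = re.compile(r"(?=Q.W)")


def qxw_positions(seq: str) -> list[int]:
    """Hypothetical helper: 0-based start positions of QxW tripeptides."""
    return [m.start() for m in QXW.finditer(seq.upper())]


if __name__ == "__main__":
    # Synthetic sequence with three QxW motifs spaced 40 residues apart,
    # mimicking the ~40-residue subdomains; not a real lectin sequence.
    demo = "MQAW" + "G" * 37 + "QLW" + "G" * 37 + "QDW"
    starts = qxw_positions(demo)
    spacings = [b - a for a, b in zip(starts, starts[1:])]
    print(starts, spacings)
```

    Reporting the spacing between consecutive hits is one simple way to check whether candidate motifs fall roughly one subdomain length apart.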

    \

    The 3D structure of the ricin B chain has shown that the three QxW repeats pack around a pseudo threefold axis that is stabilised by the lambda linker PUBMED:3561502. The ricin B lectin domain has no major segments of a helix or beta sheet but each of the QxW repeats contains an omega loop PUBMED:1881882. An idealized omega-loop is a compact, contiguous segment of polypeptide that traces a 'loop-shaped' path in three-dimensional space; the main chain resembles a Greek omega.

    \ 5606 IPR008403 \ This family consists of several mammalian apolipoprotein CIII (Apo-CIII) sequences. Apolipoprotein C-III is a 79-residue glycoprotein. It is synthesised in the intestine and liver as part of the very low density lipoprotein (VLDL) and the high density lipoprotein (HDL) particles. Owing to its positive correlation with plasma triglyceride (Tg) levels, Apo-CIII is suggested to play a role in Tg metabolism and is therefore of interest regarding atherosclerosis. However, unlike other apolipoproteins such as Apo-AI, Apo E or CII, for which many naturally occurring mutations are known, the structure-function relationships of apo C-III remain a subject of debate. One possibility is that apo C-III inhibits lipoprotein lipase (LPL) activity, as shown by in vitro experiments. Another suggestion is that elevated levels of Apo-CIII displace other apolipoproteins at the lipoprotein surface, modifying their clearance from plasma PUBMED:12082170.\ 34 IPR008183 \

    Aldose 1-epimerase (mutarotase) is the enzyme responsible for the anomeric interconversion of D-glucose and other aldoses between their alpha- and beta-forms.

    \

    The sequences of mutarotase from two bacteria, Acinetobacter calcoaceticus and Streptococcus thermophilus, are available PUBMED:1694527. It has also been shown that, on the basis of extensive sequence similarities, a mutarotase domain seems to be present in the C-terminal half of the fungal GAL10 protein, whose N-terminal part encodes UDP-glucose 4-epimerase.

    \

    The best conserved region in the sequence of mutarotase is centered around a conserved histidine residue which may be involved in the catalytic mechanism.

    \ 362 IPR001060 \ This domain was first identified in cell division control protein 15 from fission yeast, where, after the onset of mitosis, it forms a ring-like structure which co-localizes with the medial actin ring. It may mediate cytoskeletal rearrangements required for cytokinesis. \

    The domain also occurs in protein-tyrosine kinases, where it may play a role in regulatory processes such as cell cycle control, and in human Rho-GAP hematopoietic protein C1, where it has an inhibitory effect on stress fiber organisation.

    \ 722 IPR003165 \

    This domain is found in the stem cell self-renewal protein Piwi and its relatives in Drosophila melanogaster PUBMED:9851978. It has been found in the C-terminal region of a number of proteins that also contain the PAZ domain in their central region, for example the Argonaute proteins. Several of these proteins have been implicated in the development and maintenance of stem cells through the RNA-mediated gene-quelling mechanisms associated with the protein DICER.

    \ 7723 IPR012460 \

    Hypothetical archaeal and bacterial proteins make up this family. A few proteins are annotated as being potential metal-binding proteins, and in fact the members of this family have four highly conserved cysteine residues, but no further literature evidence was found in this regard.

    \ 6066 IPR009358 \

    This family consists of a number of lipoproteins from the Lyme disease spirochete Borrelia burgdorferi PUBMED:8655511.

    \ 2845 IPR006762 \

    GTR1 was first identified in Saccharomyces cerevisiae as a suppressor of a mutation in RCC1. RCC1 catalyzes guanine nucleotide exchange on Ran, a well characterized nuclear Ras-like small G protein that plays an essential role in the import and export of proteins and RNAs across the nuclear membrane through the nuclear pore complex. RCC1 is located inside the nucleus, bound to chromatin. The concentration of GTP within the cell is ~30 times higher than the concentration of GDP, thus resulting in the preferential production of the GTP form of Ran by RCC1 within the nucleus.

    Gtr1p is located within both the cytoplasm and the nucleus and has been reported to play a role in cell growth. Biochemical analysis revealed that Gtr1 is in fact a G protein of the Ras family. The RagA/B proteins are the human homologues of Gtr1, and RagA and Gtr1p belong to the sixth subfamily of the Ras-like small GTPase superfamily PUBMED:11073942.

    \ 3907 IPR003715 \ The extracellular polysaccharide colanic acid (CA) is produced by species of the family Enterobacteriaceae. In Escherichia coli K12, the CA cluster comprises 19 genes. The wzx gene encodes a protein with multiple transmembrane segments that may function in export of the CA repeat unit from the cytoplasm into the periplasm in a process analogous to O-unit export. The CA gene clusters may be involved in the export of polysaccharide from the cell PUBMED:8759852.\ 698 IPR007217 \ A member of this family has been implicated in protein processing in the endoplasmic reticulum PUBMED:10831844.\ 3592 IPR001803 \

    Bluetongue virus (BTV) is a representative of the orbivirus genus of the Reoviridae PUBMED:7816101. Orbiviruses infect mammalian hosts through insect vectors, causing economically important diseases of domesticated animals PUBMED:7816101. They possess a segmented, double-stranded RNA genome within a capsid that comprises four major polypeptides, designated VP2, VP3, VP5 and VP7. On entering a target cell, an outer layer, formed from VP2 and VP5, is removed, leaving an intact core within the cell PUBMED:7816101. The core, which is 70 nm across, contains 780 copies of VP7, which together form 260 trimeric 'bristly' capsomeres clothing an inner scaffold constructed from VP3 PUBMED:7816101.

    \

    The 3D structure of VP7 reveals two domains, one a beta-sandwich, the other a bundle of alpha-helices, and a short C-terminal arm, which is thought to unite trimers during capsid formation PUBMED:7816101. A concentration of methionine residues at the core of the molecule could provide plasticity, relieving structural mismatches during assembly PUBMED:7816101.

    \

    The 3D structure of baculovirus-expressed core protein VP7 of African horse sickness virus serotype 4 (AHSV-4) has been determined to 2.3 Å resolution PUBMED:8648715. During crystallisation, the two-domain protein is cleaved, leaving only the top domain, in a manner reminiscent of BTV VP7; this suggests that connections between top and bottom domains are relatively weak for these two distinct orbiviruses PUBMED:8648715. The top domains of both BTV and AHSV VP7 are trimeric and structurally very similar. Electron density maps indicate an extra density feature along their molecular 3-fold axes, probably the result of an unidentified ion PUBMED:8648715. The characteristics of the molecular surface suggest that, for both of these orbiviruses, VP7 may attach to the cell through binding of an Arg-Gly-Asp (RGD) motif in its top domain to a cellular integrin PUBMED:8648715.

    \ 4365 IPR005597 \

    The monomer of the Satellite tobacco necrosis virus coat protein contains a "jelly-roll" motif. The narrow end of the jelly roll forms fivefold contacts about a Ca2+ ion. Electron density maps suggest that double-helical RNA segments are associated with each coat protein dimer PUBMED:8553559.

    \ 7004 IPR010802 \

    This domain is specific to cyanobacterial proteins; its function, and that of the proteins in which it occurs, is uncharacterised.

    \ 6956 IPR009811 \

    This family consists of several hypothetical bacterial proteins of around 140 residues in length. Members of this family seem to be specific to Enterobacteria. The function of this family is unknown.

    \ 2314 IPR007774 \ This family contains several uncharacterised bacterial proteins. These proteins are encoded in nitrogen fixation operons and so are likely to play a role in this process.\ 4819 IPR001890 \

    This family is composed of small proteins of unknown function.

    \ 7749 IPR012418 \

    The region featured in this family is repeated in the spinach cold acclimation protein CAP160 (). CAP160 is induced during periods of drought stress; its precise function is unknown, but it has been implicated in the stabilisation of membranes, cytoskeletal elements, and ribosomes. By acting as a compatible solute, it may reduce the toxic effects of cellular solutes that accumulate at high concentration PUBMED:9536054. Other members of this family, such as desiccation-responsive protein 29B () and CDet11-24 protein (), are also induced by water stress, abscisic acid, and/or low temperature.

    \ 5763 IPR010262 \

    This family consists of several bacterial arylsulfotransferase proteins. Arylsulfotransferase (ASST) transfers a sulphate group from phenolic sulphate esters to a phenolic acceptor substrate PUBMED:8887346.

    \ 4815 IPR001455 \

    SirA is the response regulator of a two-component system in which BarA is the sensor kinase. This system increases the expression of virulence genes and decreases the expression of motility genes PUBMED:14645287. BarA phosphorylates SirA, thereby activating the protein. Phosphorylated SirA directly activates virulence gene expression by interacting with the hilA and hilC promoters, while it represses the flagellar regulon indirectly by binding to the csrB promoter, which in turn affects flagellar gene expression. Orthologues of SirA from Salmonella spp. can be found throughout the proteobacteria, such as GacA in Pseudomonas spp., VarA in Vibrio cholerae, ExpA in Erwinia carotovora, LetA in Legionella pneumophila, and UvrY in Escherichia coli PUBMED:11768529. A sensor kinase for SirA is present in each of these organisms as well; it is known as BarA in E. coli and Salmonella spp., but has different names in other genera. In different species, SirA/BarA orthologues are required for virulence gene expression, exoenzyme and antibiotic production, motility, and biofilm formation.

    \

    The structure of SirA consists of an alpha/beta sandwich with a beta-alpha-beta-alpha-beta(2) fold, comprising a mixed four-stranded beta-sheet stacked against two alpha-helices, both of which are nearly parallel to the strands of the beta-sheet PUBMED:11080457.

    \

    Several uncharacterised bacterial proteins (73 to 81 amino-acid residues in length) that contain a well-conserved N-terminal region show structural similarity to the SirA protein, including the E. coli protein YedF and other members of the UPF0033 family.

    \ \ 3050 IPR001040 \ Eukaryotic translation initiation factor 4E (eIF-4E) PUBMED:1733496 is a protein that\ binds to the cap structure of eukaryotic cellular mRNAs. eIF-4E recognizes and binds\ the 7-methylguanosine-containing (m7Gppp) cap during an early step in the initiation\ of protein synthesis and facilitates ribosome binding to an mRNA by inducing the unwinding\ of its secondary structures. A tryptophan in the central part of the sequence of human\ eIF-4E seems to be implicated in cap binding PUBMED:1672854.\ 3008 IPR007331 \ This domain is found in HtaA, a secreted protein implicated in iron acquisition and transport PUBMED:10760164.\ 2469 IPR002189 \

    The actin filament system, a prominent part of the cytoskeleton in eukaryotic cells, is both a static structure and a dynamic network that can undergo rearrangements: it is thought to be involved in processes such as cell movement and phagocytosis PUBMED:2341404, as well as muscle contraction.

    \

    The F-actin capping protein binds in a calcium-independent manner to the fast-growing ends of actin filaments (barbed ends), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta (see ). Neither of the subunits shows sequence similarity to other filament-capping proteins PUBMED:2341404.

    \

    The alpha subunit is a protein of about 268 to 286 amino acid residues whose sequence is well conserved in eukaryotic species PUBMED:1711931.

    \ 3347 IPR006668 \

    This region is the integral membrane part of the eubacterial MgtE family of magnesium transporters. It is presumed to be an intracellular domain that may be involved in magnesium binding.

    \ 6750 IPR009692 \

    This family consists of several Citrus tristeza virus (CTV) P13 13 kDa proteins. CTV, a member of the closterovirus group, is one of the more complex single-stranded RNA viruses PUBMED:9024813. The function of this family is unknown.

    \ 1267 IPR004337 \ The astrovirus genome is apparently organized with nonstructural proteins encoded at the 5' end and structural proteins at the 3' end PUBMED:8254779. \ Proteins in this family are encoded by astrovirus ORF2, one of the three astrovirus ORFs (1a, 1b, 2). The proteins contain a viral RNA-dependent RNA polymerase motif PUBMED:8254779. The 87 kDa precursor polyprotein\ undergoes an intracellular cleavage to form a 79 kDa protein. Subsequently, extracellular trypsin cleavage yields the three\ proteins forming the infectious virion PUBMED:10644354.\ 1389 IPR002634 \ This family consists of the morpho-protein BolA from Escherichia coli and its various homologs. In E. coli, overexpression of BolA causes cells to adopt a round morphology, and the protein may be involved in switching the cell between elongation and septation systems during cell division PUBMED:10361282. The expression of BolA is growth-rate regulated and is induced during the transition into the stationary phase PUBMED:10361282. BolA is also induced by stress during early stages of growth PUBMED:10361282 and may have a general role in stress response. It has also been suggested that BolA can induce the transcription of penicillin-binding proteins 6 and 5 PUBMED:2684651, PUBMED:10361282.\ 1466 IPR003722 \

    A number of bacteria synthesize cobalamin (vitamin B12) by an anaerobic pathway, in which cobalt is added at an early stage and molecular oxygen is not required PUBMED:9742225. Of the 30 cobalamin synthetic genes, 25 are clustered in one operon, cob, and are arranged in three groups, each group encoding enzymes for a biochemically distinct portion of the biosynthetic pathway PUBMED:8501034. Precorrin-8X methylmutase (also known as precorrin isomerase), CbiC/CobH, catalyses a methyl rearrangement.

    \ 742 IPR006603 \

    This repeated motif of unknown function has been found between the transmembrane helices of cystinosin, yeast\ ERS1 and mannose-P-dolichol utilization defect\ 1. The positioning of this repeat suggests that it may be\ associated with the glycosylation machinery.

    \ 124 IPR000644 \

    CBS (cystathionine-beta-synthase) domains are small intracellular modules, mostly found in two or four copies within a protein, that occur in several different proteins in all kingdoms of life. Tandem pairs of CBS domains can act as binding domains for adenosine derivatives and may regulate the activity of attached enzymatic or other domains PUBMED:14722619. In some cases, CBS domains may act as sensors of cellular energy status by being activated by AMP and inhibited by ATP PUBMED:14722619. In chloride ion channels, the CBS domains have been implicated in intracellular targeting and trafficking, as well as in protein-protein interactions, but results vary with different channels: in the CLC-5 channel, the CBS domain was shown to be required for trafficking PUBMED:14521953, while in the CLC-1 channel, the CBS domain was shown to be critical for channel function, but not necessary for trafficking PUBMED:14718533.

    \ \

    Mutations in conserved residues within CBS domains cause a variety of human hereditary diseases, including (with the gene mutated in parentheses): homocystinuria (cystathionine beta-synthase); Wolff-Parkinson-White syndrome (gamma 2 subunit of AMP-activated protein kinase); retinitis pigmentosa (IMP dehydrogenase-1); congenital myotonia, idiopathic generalized epilepsy, hypercalciuric nephrolithiasis, and classic Bartter syndrome (CLC chloride channel family members).

    \ \ \ 4506 IPR006972 \ SseC is a secreted protein that forms a complex together with SseB and SseD on the surface of Salmonella typhimurium. All three proteins are secreted by the type III secretion system PUBMED:1156700. Many mucosal pathogens use type III secretion systems to inject effector proteins into target cells. SseB, SseC and SseD are inserted into the target cell membrane, where they form a small pore or translocon PUBMED:1156700, PUBMED:11580752. In addition to SseC, this family includes the bacterial secreted proteins PopB, PepB, YopB and EspD, which are thought to be directly involved in pore formation and in the type III secretion system translocon.\ 6688 IPR010676 \

    This family consists of several bacterial proteins of around 180 residues in length. Members of this family seem to be specific to Listeria species and the function of the family is unknown.

    \ 740 IPR000030 \ This mycobacterial family is named after a conserved amino-terminal region of about 180\ amino acids, the PPE motif. The carboxy termini of proteins belonging to the PPE family are variable, and on the basis of this region at least three groups can be distinguished. The MPTR subgroup is characterized by tandem copies of the motif NXGXGNXG. The second subgroup contains a conserved motif at about position 350.\ The third group shares similarity only in the amino-terminal region.\ The function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation in Mycobacterium tuberculosis PUBMED:9634230.\ 2560 IPR000809 \ Many flagellar proteins are exported by a flagellum-specific export pathway. Attempts have been \ made to characterise the apparatus responsible for this process by designing assays to screen \ for mutants with export defects PUBMED:1646201. Experiments involving filament removal from \ temperature-sensitive flagellar mutants of Salmonella typhimurium have shown that, while most \ mutants were able to regrow filaments, flhA, fliH, fliI and fliN mutants showed no or greatly \ reduced regrowth. This suggests that the corresponding gene products are involved in the process \ of flagellum-specific export. The sequences of fliH, fliI and the adjacent gene, fliJ, have been \ deduced. fliJ was shown to encode a protein of molecular mass 17,302 Da PUBMED:1646201. It is a \ membrane-associated protein that affects chemotactic events; mutations in FliJ result in failure \ to respond to chemotactic stimuli.\ 301 IPR006769 \ This family represents a conserved region found in several uncharacterised eukaryotic proteins.\ 7219 IPR010867 \

    This is a nine-residue repeat, named NPR (NonaPeptide Repeat). It is found in two malarial proteins and has the consensus EEhhEEhhP, where h stands for a hydrophobic amino acid.
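    The consensus can be written as a regular expression. The Python sketch below scans a protein sequence for non-overlapping matches to EEhhEEhhP; the set of residues treated as hydrophobic (h) is an assumption made for illustration, and the function name and toy sequence are hypothetical.

```python
import re

# Assumption: this residue set stands in for "h" (hydrophobic); it is an
# illustrative choice, not a definition taken from the family description.
HYDROPHOBIC = "AVLIMFWC"

# EEhhEEhhP: Glu-Glu, two hydrophobic, Glu-Glu, two hydrophobic, Pro.
NPR_PATTERN = re.compile(f"EE[{HYDROPHOBIC}]{{2}}EE[{HYDROPHOBIC}]{{2}}P")


def find_npr_repeats(sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping NPR consensus matches."""
    return [m.start() for m in NPR_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence containing two tandem repeats.
    print(find_npr_repeats("MKEELLEEVVPEEIIEEAFPK"))  # [2, 11]
```

    Because `finditer` reports non-overlapping matches, tandem copies are counted once each; a real survey would need a curated definition of "hydrophobic".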

    \ 6636 IPR010656 \

    This domain represents a conserved region located towards the N terminus of the DctM subunit of the bacterial and archaeal TRAP C4-dicarboxylate transport (Dct) system permease. In general, C4-dicarboxylate transport systems allow C4-dicarboxylates like succinate, fumarate, and malate to be taken up. TRAP C4-dicarboxylate carriers are secondary carriers that use an electrochemical H+ gradient as the driving force for transport. DctM is an integral membrane protein that is one of the constituents of TRAP carriers PUBMED:11803016, PUBMED:11524131. Note that many family members are hypothetical proteins.

    \ 184 IPR006765 \

    Aromatic polyketides are assembled by a type II (iterative) polyketide synthase in bacteria. Iterative type II polyketide synthases produce polyketide chains of variable but defined length from a\ specific starter unit and a number of extender units. They also specify the initial regiospecific folding and cyclization pattern of nascent\ polyketides either through the action of a cyclase (CYC) subunit or through the combined action of site-specific ketoreductase \ and CYC subunits. Additional CYCs and other modifications may be necessary to produce linear aromatic polyketides.

    This family represents several cyclases involved in polyketide synthesis in various actinobacterial species.

    \ 6675 IPR009655 \

    This region, of unknown function, is found at the C terminus of archaeal peptidases related to FlaK, a preflagellin aspartic acid signal peptidase PUBMED:14622420. FlaK is required for the removal of the leader peptide from archaeal flagellin. As bacterial flagellins lack a leader peptide and no peptidase is required for their export and assembly, the requirement for FlaK further emphasizes that archaeal flagella resemble type IV pili rather than bacterial flagella.

    \ \ \ \

    FlaK and its related sequences belong to the MEROPS peptidase family A24B (preflagellin peptidase, clan AD).

    \ 1108 IPR003470 \ Early region 3 (E3) of human adenoviruses (Ads) codes for proteins that appear to control viral interactions with the host PUBMED:8627757. This region, called CR1 (conserved region 1) PUBMED:8627757, is found three times in the 49 kDa E3 protein of adenovirus type 19 (a subgroup D adenovirus). CR1 is also found in the 20.1 kDa protein of subgroup B adenoviruses. The function of this 80-amino-acid region is unknown, but it is probably a divergent immunoglobulin domain.\ 7602 IPR011694 \ This entry contains a group of peptides derived from a salivary gland cDNA library of the tick Ixodes scapularis PUBMED:12177149. Also present are peptides from a related tick species, Ixodes ricinus. They are characterised by a putative signal peptide, indicative of secretion, and conserved cysteine residues.\ 7538 IPR011705 \

    This domain is found associated with and .

    \ 2988 IPR000981 \ Oxytocin and vasopressin are nine-residue, structurally and functionally related neurohypophysial peptide \ hormones. Oxytocin mediates contraction of the smooth muscle of the uterus and mammary gland, while \ vasopressin has antidiuretic action on the kidney and mediates vasoconstriction of the peripheral vessels \ PUBMED:3147712. Like most bioactive peptides, both hormones are synthesised as larger protein \ precursors that are enzymatically converted to their mature forms. Members of this family are found in birds,\ fish, reptiles and amphibians (mesotocin, isotocin, valitocin, glumitocin, aspargtocin, vasotocin, seritocin, \ asvatocin, phasvatocin), in worms (annetocin), octopi (cephalotocin), locusts (locupressin or neuropeptide\ F1/F2) and in molluscs (conopressins G and S) PUBMED:7591488.\ 2437 IPR000781 \ The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104\ residues whose function is not yet clear. It is highly conserved in evolution\ PUBMED:9074495 and is probably present in all multicellular\ eukaryotes. It has been proposed that this protein plays a role in the cell cycle.\ 1028 IPR005834 \

    This group of hydrolase enzymes is structurally different from the alpha/beta hydrolase family (abhydrolase). This group includes L-2-haloacid\ dehalogenase, epoxide hydrolases and phosphatases. The structure consists of two domains. One is an\ inserted four helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96 of\ HAD1_PSESP. The rest of the fold is composed of the core alpha/beta domain.

    \ 7839 IPR012641 \

    Members of this family are polydnaviral proteins that contain a cysteine-rich motif PUBMED:11724552. Some members of this family have multiple copies of this domain.

    \ 5026 IPR003126 \

    The N-end rule-based degradation signal, which targets a protein for ubiquitin-dependent proteolysis, comprises a destabilizing amino-terminal residue and a specific internal lysine residue. This entry describes a putative zinc finger in N-recognin, a recognition component of the N-end rule pathway.

    \ 1171 IPR008152 \

    Adaptins are components of the adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. Gamma-adaptin is a subunit of the Golgi adaptor. Alpha-adaptin is a subunit of the heterotetrameric adaptor complex that regulates clathrin-bud formation. The carboxyl-terminal appendage of the alpha subunit regulates translocation of endocytic accessory proteins to the bud site. This Ig-fold domain is found in alpha, beta and gamma adaptins and consists of a beta-sandwich containing 7 strands in 2 beta-sheets in a Greek-key topology PUBMED:10430869, PUBMED:12176391. The adaptor appendage contains an additional N-terminal strand.

    \ \ 940 IPR002513 \ This family includes transposases of Tn3, Tn21, Tn1721,\ Tn2501, Tn3926 transposons from Escherichia coli. The specific binding of the Tn3 transposase to DNA has been demonstrated.\ Sequence analysis has suggested that the invariant triad of Asp689, Asp765, Glu895 (numbering as in Tn3) may correspond to the D-D-35-E motif previously implicated in the catalytic performance of numerous transposases PUBMED:8932514.\ 4854 IPR005266 \

    The function of this family is unknown. These proteins are from 222 to 233 residues in length, lack hydrophobic stretches, and are found so far only in thermophiles.

    \ 5481 IPR008521 \ This family consists of several eukaryotic proteins of unknown function.\ 2818 IPR007583 \ GRASP55 (Golgi reassembly stacking protein of 55 kDa) and GRASP65 (its 65 kDa counterpart) are highly homologous. GRASP55 is a component of the Golgi stacking machinery. GRASP65 is an N-ethylmaleimide-sensitive membrane protein required for the stacking of Golgi cisternae in a cell-free system PUBMED:10487747.\ 195 IPR003533 \

    X-linked lissencephaly is a severe brain malformation affecting males.\ Recently it has been demonstrated that the doublecortin gene is implicated in\ this disorder PUBMED:9489699. Doublecortin was found to bind to the microtubule\ cytoskeleton. In vivo and in vitro assays show that Doublecortin stabilizes\ microtubules and causes their bundling PUBMED:10441322. Doublecortin is a basic protein with an\ isoelectric point of 10, typical of microtubule-binding proteins. However,\ its sequence contains no known microtubule-binding domain(s).

    \ \

    The detailed sequence analysis of Doublecortin and Doublecortin-like proteins\ allowed the identification of an evolutionarily conserved Doublecortin (DC)\ domain. This domain is found in the N-terminus of proteins and consists of one\ or two tandemly repeated copies of a region of around 80 amino acids. It has\ been suggested that the first DC domain of Doublecortin binds tubulin and\ enhances microtubule polymerization PUBMED:10749977.

    \ 5754 IPR009225 \

    This family consists of several phage head completion proteins (GPL) as well as related bacterial sequences. Members of this family allow the completion of filled heads by rendering newly packaged DNA in the heads resistant to DNase. The protein is thought to bind to DNA-filled capsids PUBMED:1837355.

    \ 4546 IPR002828 \

    Members of this family are acid phosphatases PUBMED:1423722. Members include proteins from the yeast Yarrowia lipolytica, the eubacterium Thermotoga maritima and the crenarchaeon Pyrobaculum aerophilum. In bacteria they may be involved in the stress response PUBMED:11709173. Escherichia coli cells with the surE gene disrupted are found to survive poorly in stationary phase. In this organism, surE is next to pcm, which encodes an L-isoaspartyl protein repair methyltransferase that is also required for stationary phase survival PUBMED:7928962.

    \ 6901 IPR009778 \

    This family consists of several bacterial modulator of Rho-dependent transcription termination (ROF) proteins. ROF binds transcription termination factor Rho and inhibits Rho-dependent termination in vivo PUBMED:9723924.

    \ 4679 IPR012001 \

    A number of enzymes require thiamine pyrophosphate (TPP), the active form of vitamin B1, as a cofactor. It has been shown PUBMED:8604141 that some of these enzymes are structurally related. This entry represents the N-terminal TPP-binding domain of TPP enzymes.

    \ 5220 IPR008433 \

    Cytochrome oxidase subunit VIIB is one of the nuclear-coded polypeptide chains of cytochrome c oxidase, the terminal oxidase in mitochondrial electron transport. The X-ray structure of azide-bound fully oxidized cytochrome c oxidase from bovine heart at 2.9 A resolution has been determined PUBMED:10771420.

    \ 4213 IPR001975 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family contains the L40 ribosomal protein from both prokaryotes and eukaryotes. Bovine ribosomal protein L40 has been identified as a secondary RNA binding protein PUBMED:3129699. L40 is fused to a ubiquitin protein PUBMED:7488009.

    \ 2729 IPR001000 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or\ more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl\ hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification\ is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than\ their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 10 \ comprises enzymes with a number of known activities; xylanase (); endo-1,3-beta-xylanase (); cellobiohydrolase (). These enzymes were formerly known as cellulase family F.

    \ \

    The microbial degradation of cellulose and xylans requires several types of\ enzymes such as endoglucanases (), cellobiohydrolases ()\ (exoglucanases), or xylanases () PUBMED:2252383, PUBMED:1886523. Fungi and bacteria produce\ a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the\ basis of sequence similarities, can be classified into families. One of these\ families is known as the cellulase family F PUBMED:2806912 or as the glycosyl hydrolases\ family 10 PUBMED:1747104.

    \ 8047 IPR013179 \

    This is a conserved group of eukaryotic proteins of unknown function.

    \ 3958 IPR005006 \

    This is a family of proteins expressed by members of the Poxviridae.

    \ 8085 IPR013204 \

    These short proteins are leader peptides (15-19 amino acids) of erm genes that code for resistance determinants in Staphylococcus aureus PUBMED:2985541.

    \ 5285 IPR008877 \

    The proteins associated with this family are annotated as either 'transposase' or 'hypothetical protein'. There is no evidence that they are directly involved in transposition.

    \ 1405 IPR007519 \

    This domain is the N terminus of Saccharomyces cerevisiae Bul1. Bul1 binds the ubiquitin ligase Rsp5, via an N-terminal PPSY motif (157-160 in ) PUBMED:9931424. The complex containing Bul1 and Rsp5 is involved in intracellular trafficking of the general amino acid permease Gap1 PUBMED:11500494, degradation of Rog1 in cooperation with Bul2 and GSK-3 PUBMED:10958669, and mitochondrial inheritance PUBMED:10366593. Bul1 may contain HEAT repeats. The C terminus is .

    \ 3632 IPR003185 \ The PA28 activator complex (also known as the 11S regulator of the 20S proteasome) is a ring-shaped hexameric structure of alternating alpha and beta subunits. This entry represents the alpha subunit. The activator complex binds to the 20S proteasome and stimulates peptidase activity in an ATP-independent manner.\ 3905 IPR006041 \

    Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee\ King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E.,\ Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of\ the first three letters of the genus; a space; the first letter of the\ species name; a space and an arabic number. In the event that two species\ names have identical designations, they are discriminated from one another\ by adding one or more letters (as necessary) to each species designation.
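    As a sketch of the designation rule described above, the Python function below builds a name from a genus, a species and a number. The function name and the `extra_species_letters` parameter are hypothetical; modelling the disambiguation step as lengthening the species abbreviation is a simplifying assumption, since in practice such cases are resolved by the nomenclature subcommittee.

```python
def allergen_designation(genus: str, species: str, number: int,
                         extra_species_letters: int = 0) -> str:
    """Build a WHO/IUIS-style allergen designation (simplified sketch).

    Rule from the text: first three letters of the genus, a space, the first
    letter of the species, a space, and an arabic number.
    Assumption: disambiguation is modelled by appending
    `extra_species_letters` further letters of the species name.
    """
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    genus_part = genus[:3].capitalize()
    species_part = species[:1 + extra_species_letters].lower()
    return f"{genus_part} {species_part} {number}"


if __name__ == "__main__":
    print(allergen_designation("Olea", "europaea", 1))  # Ole e 1
```

    For Olea europaea, the sketch yields "Ole e 1", the designation listed for this family.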

    \

    The allergens in this family include allergens with the following designations: Ole e 1.

    \ \

    A number of plant pollen proteins, whose biological function is not yet\ known, are structurally related PUBMED:8404906.\ These proteins are most probably secreted and consist of about 145 residues.\ There are six cysteines\ which are conserved in the sequence of these proteins. They seem to be\ involved in disulphide bonds.

    \ 241 IPR004127 \

    Prefoldin (PFD) is a chaperone that interacts exclusively with type II chaperonins, hetero-oligomers lacking an obligate co-chaperonin that are found only in eukaryotes (chaperonin-containing T-complex polypeptide-1 (CCT)) and archaea. Eukaryotic PFD is a multi-subunit complex containing six polypeptides in the molecular mass range of 14-23 kDa. In archaea, on the other hand, PFD is composed of two types of subunits, two alpha and four beta. The six subunits associate to form two back-to-back up-and-down eight-stranded barrels, from which hang six coiled coils. Each subunit contributes one (beta subunits) or two (alpha subunits) beta hairpin turns to the barrels. The coiled coils are formed by the N and C termini of an individual subunit. Overall, this unique arrangement resembles a jellyfish. The eukaryotic PFD hexamer is composed of six different subunits; however, these can be grouped into two alpha-like (PFD3 and -5) and four beta-like (PFD1, -2, -4, and -6) subunits based on amino acid sequence similarity with their archaeal counterparts. Eukaryotic PFD has a six-legged structure similar to that seen in the archaeal homologue PUBMED:11106732, PUBMED:12456645. This family contains the archaeal alpha subunit, eukaryotic prefoldin subunits 3 and 5 and the UXT (ubiquitously expressed transcript) family. \ \

    \ \

    Eukaryotic PFD has been shown to bind both actin and tubulin co-translationally. The chaperone then delivers the target protein to CCT, interacting with the chaperonin through the tips of the coiled coils. To date, no authentic target proteins of any archaeal PFD have been identified.

    \ 1333 IPR004955 \

    This family includes the gp64 glycoprotein from baculovirus as well as other viruses. The gp64 protein is a phosphoglycoprotein located on the surface of both infected cells and budding virions. The protein may play a role in fusion of the viral envelope with the endosomal membrane for viral entry into cells.

    \ 205 IPR002710 \ Dilute encodes a novel type of myosin heavy chain, with a tail, or C-terminal, region that has elements of both type II (alpha-helical coiled-coil) and type I (non-coiled-coil) myosin heavy chains. \ The DIL non alpha-helical domain is found in dilute myosin heavy chain proteins and other myosins. In mouse the dilute protein may play a role in the elaboration, maintenance, or function of cellular processes of melanocytes and neurons PUBMED:1996138.\ The MYO2 protein of Saccharomyces cerevisiae is implicated in vectorial vesicle transport and is homologous to the dilute protein over practically its entire length PUBMED:2016335.\ 7474 IPR011498 \

    The kelch motif was initially discovered in Kelch (). In this protein there are six copies of the motif. It has been shown that is related to galactose oxidase PUBMED:8126718 for which a structure has been solved PUBMED:2002850. The kelch motif forms a beta sheet and several of these sheets associate to form a beta propeller structure as found in , and .

    \ 900 IPR007304 \ The TOR signalling pathway activates a cell-growth program in response to nutrients PUBMED:10604478. TIP41 interacts with TAP42 and negatively regulates the TOR signaling pathway PUBMED:11741537.\ 3875 IPR004171 \ Members of this family are extremely potent competitive inhibitors of cAMP-dependent protein kinase activity. These proteins interact with the catalytic subunit of the enzyme after the cAMP-induced dissociation of its regulatory chains.\ 5437 IPR008662 \ This family contains Rattus norvegicus LAP1C proteins and several uncharacterised highly related sequences from both Mus sp. and humans. LAP1s (lamina-associated polypeptide 1s) are type 2 integral membrane proteins with a single membrane-spanning region of the inner nuclear membrane PUBMED:12061773. LAP1s bind to both A- and B-type lamins and have a putative role in the membrane attachment and assembly of the nuclear lamina PUBMED:7721789.\ 1925 IPR003829 \

    This entry represents the N-terminal domain of Pirin proteins from both eukaryotes and prokaryotes. The function of Pirin is unknown, but the human gene is expressed in all tissues, most strongly in the liver and heart. Pirin is a nuclear protein, localised exclusively within the nucleoplasm and concentrated predominantly within dot-like subnuclear structures. A tomato homologue of human Pirin has been found to be induced during programmed cell death. Human Pirin interacts with Bcl-3 and NFI and is therefore probably involved in the regulation of DNA transcription and replication. It appears to be an Fe(II)-containing member of the Cupin superfamily.

    \ 269 IPR005514 \

    This is a family of uncharacterised proteins from Caenorhabditis elegans.

    \ 6671 IPR010668 \

    This family consists of a number of hypothetical bacterial proteins of around 200 residues in length. The function of this family is unknown.

    \ 770 IPR000331 \

    The Rap/Ran-GAP domain is found in the GTPase-activating protein (GAP) that stimulates the GTPase activity of the nuclear Ras-related regulatory proteins Rap1, Rsr1 and Ran in vitro, converting them to the putatively inactive GDP-bound state PUBMED:1904317, PUBMED:7799964. Ran is an evolutionarily conserved member of the Ras superfamily that regulates all receptor-mediated transport between the nucleus and the cytoplasm. RanGAP is a leucine-rich repeat (LRR)-containing protein that forms a highly curved crescent. Each LRR forms a short beta-strand and a longer alpha-helix, resulting in a beta-alpha hairpin motif PUBMED:12019565.

    \ \

    The domain is also present in tuberin (a tuberous sclerosis homolog protein) that specifically stimulates the intrinsic GTPase activity of Ras-related protein Rap1A suggesting a possible mechanism for its role in the regulation of cellular growth.

    \ 3794 IPR006845 \

    This region is the N-terminal part of the Pex2 and Pex12 peroxisomal biogenesis proteins, which contain two predicted transmembrane segments. The majority of these proteins have a C-terminal RING finger domain.

    \ 3341 IPR006742 \

    This repeated sequence, WHWLQLKPGQPMY, characterizes the mating factor alpha-1 precursor (alpha-1 mating pheromone), which contains copies of mating factor alpha. The hormone is secreted into the culture medium by haploid cells of the alpha mating type and acts on cells of the opposite mating type (type A) by binding to a cognate G-protein-coupled receptor that is linked to a downstream signal transduction pathway. It inhibits DNA synthesis in type A cells, synchronising them with type alpha cells, and so mediates the conjugation process.
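As an illustration of what "repeated sequence" means here, the sketch below locates exact copies of the 13-residue WHWLQLKPGQPMY repeat in a protein string. It is a minimal example assuming plain exact matching; `find_repeat_copies` is a hypothetical helper, and the precursor used is a made-up toy string rather than a real sequence. Copies that differ at any residue would be missed, so a profile or alignment search would be needed for variant repeats.

```python
# The 13-residue repeat quoted in the entry text.
ALPHA_FACTOR_REPEAT = "WHWLQLKPGQPMY"

def find_repeat_copies(precursor: str, repeat: str = ALPHA_FACTOR_REPEAT) -> list[int]:
    """Return 1-based start positions of exact, non-overlapping copies of repeat."""
    if not repeat:
        raise ValueError("repeat must be a non-empty string")
    positions = []
    start = precursor.find(repeat)
    while start != -1:
        positions.append(start + 1)
        # Resume the search just past the copy that was found.
        start = precursor.find(repeat, start + len(repeat))
    return positions

# Made-up toy precursor: two copies separated by an arbitrary spacer.
toy = "MKAA" + ALPHA_FACTOR_REPEAT + "GGSGGS" + ALPHA_FACTOR_REPEAT + "GK"
print(find_repeat_copies(toy))  # prints [5, 24]
```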

    \ 1386 IPR003929 \

    Potassium channels are the most diverse group of the ion channel family PUBMED:1772658, PUBMED:1879548. They are important in shaping the action potential, and in neuronal excitability and plasticity PUBMED:2451788. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups PUBMED:2555158: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    \

    These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on its type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; and others are regulated by GTP-binding proteins or other second messengers PUBMED:2448635. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes PUBMED:1373731. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis PUBMED:11178249.

    \

    All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how the two-P-domain subunits assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned to one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK) PUBMED:11178249, PUBMED:. The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha-subunits that possess two P-domains. These are usually highly regulated K+-selective leak channels.
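The degenerate selectivity sequence T/SxxTxGxG can be written directly as a regular expression, with "x" meaning any residue. The sketch below uses a hypothetical helper, `find_selectivity_motifs` (not from any library), that reports every, possibly overlapping, match; the test fragments are illustrative strings chosen for the example, not annotated sequences. A pattern match alone does not establish that a protein is a K+ channel.

```python
import re

# T/SxxTxGxG as a regex: [TS] = Thr or Ser, "." = any residue (the "x" positions),
# literal T and G at the conserved positions. The lookahead (?=...) lets
# overlapping occurrences be reported.
K_SELECTIVITY = re.compile(r"(?=([TS]..T.G.G))")

def find_selectivity_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched 8-residue segment) for each match in seq."""
    return [(m.start() + 1, m.group(1)) for m in K_SELECTIVITY.finditer(seq.upper())]

# Illustrative fragments chosen for the example.
print(find_selectivity_motifs("ETATTVGYGDL"))  # prints [(2, 'TATTVGYG')]
print(find_selectivity_motifs("ASYGTIGYGL"))   # prints [(2, 'SYGTIGYG')]
```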

    \

    Ca2+-activated K+ channels are a diverse group of channels that are activated by an increase in intracellular Ca2+ concentration. They are found in the majority of nerve cells, where they modulate cell excitability and action potential. Three types of Ca2+-activated K+ channel have been characterised, termed small-conductance (SK), intermediate-conductance (IK) and large-conductance (BK), respectively PUBMED:9687354.

    \ \

    BK channels (also referred to as maxi-K channels) are widely expressed in the body, being found in glandular tissue, smooth and skeletal muscle, as well as in neural tissues. They have been demonstrated to regulate arteriolar and airway diameter, and also neurotransmitter release. Each channel complex is thought to be composed of two types of subunit: the pore-forming (alpha) subunits and smaller accessory (beta) subunits.

    \ \

    The alpha subunit of the BK channel was initially thought to share the characteristic 6TM organisation of the voltage-gated K+ channels. However, the molecule is now thought to possess an additional TM domain, with an extracellular N-terminus and intracellular C-terminus. This C-terminal region contains 4 predominantly hydrophobic domains, which are also thought to lie intracellularly. The extracellular N-terminus and the first TM region are required for modulation by the beta subunit. The precise location of the Ca2+-binding site that modulates channel activation remains unknown, but it is thought to lie within the C-terminal hydrophobic domains.

    \ 3375 IPR002758 \

    This family contains both characterised and uncharacterised bacterial and archaeal proteins, some of which are possibly transmembrane proteins involved in Na+/H+ or K+/H+ transport.

    \ \ \

    The characterised proteins are mnhE (Staphylococcus aureus) and phaE (Rhizobium meliloti), which are subunits of the Na+/H+ or K+/H+ antiporters required for sodium and potassium excretion, respectively PUBMED:9852009, PUBMED:9680201.

    \ \ \ 4649 IPR001889 \

    The thymidine kinase from herpes virus catalyses the reaction: ATP + thymidine = ADP + thymidine 5'-phosphate.

    \

    The enzyme is not subject to feedback inhibition by its product and the crystal structure of the enzyme from HSV type 1 has been reported PUBMED:7552712.

    \ 2335 IPR007841 \ The proteins in this family are functionally uncharacterised. The proteins are around 450 amino acids long.\ 4275 IPR007224 \

    The RNA polymerase I specific transcription initiation factor is a member of a multiprotein complex essential for the initiation of transcription by RNA polymerase I. Binding to the DNA template is dependent on the initial binding of other factors.

    \ 4881 IPR005374 \

    This is a small family of mainly hypothetical proteins of unknown function.

    \ 6168 IPR009406 \

    This family consists of several putative head-tail joining bacteriophage proteins.

    \ 4653 IPR001337 \ This family contains coat proteins from tobamoviruses. Tobamoviruses are ssRNA positive-strand viruses with no DNA stage.\

    In order to establish infections, viruses must be delivered to the cells of potential hosts and must then engage in activities that enable their genomes to be expressed and replicated. With most viruses, the events that precede the onset of production of progeny virus particles are referred to as the early events and, in the case of positive-strand RNA viruses, they include the initial interaction with and entry into host cells and the release (uncoating) of the genome from the virus particles. The uncoating process in tobacco mosaic virus may involve the bidirectional release of coat protein subunits from the viral RNA, which may be mediated by cotranslational and coreplicational disassembly mechanisms PUBMED:10212940.

    \

    The tobacco mosaic virus particle is assembled from its constituent coat protein and RNA by a complex process. The protein forms an obligatory intermediate (a cylindrical disk composed of two layers of protein units), which recognizes a specific RNA hairpin sequence. This mechanism simultaneously fulfils the physical requirement for nucleating the growth of the helical particle and the biological requirement for specific recognition of the viral RNA PUBMED:10212932.

    \ 5912 IPR009284 \

    This family consists of several human cytomegalovirus (HCMV) TRL10 proteins. TRL10 is a structural component of the virus particle and, like the other HCMV envelope glycoproteins, is present in a disulfide-linked complex PUBMED:11773418.

    \ 3173 IPR000033 \

    The low-density lipoprotein receptor (LDLR) regulates cholesterol homeostasis in mammalian cells. LDLR binds cholesterol-carrying LDL, associates with clathrin-coated pits, and is internalized into acidic endosomes where it separates from its ligand. The ligand is degraded in lysosomes, while the receptor returns to the cell surface PUBMED:3513311. The LDLR has several domains. The ligand-binding domain contains seven LDL receptor class A repeats, each with three disulfide bonds and a coordinated Ca2+ ion. The second conserved region contains two EGF repeats, followed by six YWTD or LDL receptor class B repeats and another EGF repeat PUBMED:9790844. This conserved region is critical for ligand release and recycling of the receptor PUBMED:3494949.

    \

    The structure of the six YWTD repeats of the LDL receptor has been solved PUBMED:11373616. The six YWTD repeats together fold into a six-bladed beta-propeller. Each blade of the propeller consists of four antiparallel beta-strands; the innermost strand of each blade is labeled 1 and the outermost strand 4. The sequence repeats are offset with respect to the blades of the propeller, such that any given 40-residue YWTD repeat spans strands 2-4 of one propeller blade and strand 1 of the subsequent blade. This offset ensures circularization of the propeller because the last strand of the final sequence repeat acts as the innermost strand 1 of the blade that harbors strands 2-4 from the first sequence repeat. The repeat is found in a variety of proteins that include the vitellogenin receptor from Drosophila melanogaster, the low-density lipoprotein (LDL) receptor PUBMED:6091915, preproepidermal growth factor, and nidogen (entactin).
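The offset between sequence repeats and propeller blades can be modelled as a simple index mapping. The sketch below is a toy bookkeeping model, not a structural calculation: it assumes 0-based numbering in which repeat i contributes strands 2-4 of blade i and strand 1 of blade i + 1, wrapping around with a modulo. The names `strands_from_repeat` and `assemble_propeller` are hypothetical. It shows how the wrap-around closes the propeller: every blade ends up with exactly strands 1-4.

```python
from collections import defaultdict

N_REPEATS = 6  # six YWTD repeats fold into a six-bladed propeller

def strands_from_repeat(i: int, n: int = N_REPEATS) -> list[tuple[int, int]]:
    """Return (blade, strand) pairs contributed by the 0-based YWTD repeat i.

    Assumed convention: repeat i supplies strands 2-4 of blade i and strand 1
    of the next blade; the modulo wraps the last repeat back onto blade 0.
    """
    return [(i, 2), (i, 3), (i, 4), ((i + 1) % n, 1)]

def assemble_propeller(n: int = N_REPEATS) -> dict[int, list[int]]:
    """Collect the strands each blade receives from all n sequence repeats."""
    blades: dict[int, list[int]] = defaultdict(list)
    for i in range(n):
        for blade, strand in strands_from_repeat(i, n):
            blades[blade].append(strand)
    return {blade: sorted(strands) for blade, strands in sorted(blades.items())}

print(strands_from_repeat(N_REPEATS - 1))  # prints [(5, 2), (5, 3), (5, 4), (0, 1)]
print(assemble_propeller()[0])             # prints [1, 2, 3, 4]
```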

    \ 2180 IPR007473 \ This is a bacterial protein of unknown function, possibly secreted.\ 3459 IPR001463 \

    Sodium symporters can be divided by sequence and functional similarity into various groups. One such group is the sodium/alanine symporter family, the members of which transport alanine in association with sodium ions.

    \ \

    These transporters are believed to possess 8 transmembrane (TM) helices PUBMED:1447975, PUBMED:1400476, forming a channel or pore through the cytoplasmic membrane, the interior face being hydrophilic to allow the passage of alanine molecules and sodium ions PUBMED:1447975. This family is restricted to the bacteria and archaea; examples include the alanine carrier protein from the thermophilic bacterium PS-3, the D-alanine/glycine permease from Alteromonas haloplanktis, and the hypothetical protein yaaJ from Escherichia coli.

    \ 6084 IPR009365 \

    This family consists of several Nucleopolyhedrovirus late expression factor-12 (LEF-12) proteins. The function of this family is unknown PUBMED:10814576, PUBMED:12414945.

    \ 4006 IPR003465 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    Members of the Pin2 family are proteinase inhibitors that belong to MEROPS inhibitor family I20, clan IA and are restricted to plants. They inhibit serine peptidases belonging to MEROPS peptidase family S1 PUBMED:14705960. They have a multidomain structure PUBMED:12446136, which permits circular permutation of the sequences. It has been shown that some naturally occurring Pin2 proteins have an 'ancestral' circularly permuted structure PUBMED:11604534. Circular permutations or rearrangements of sequences have also been observed between species, such as between favin from Vicia faba and the lectin concanavalin A from Canavalia ensiformis PUBMED:4506778, or amongst members of the plant aspartyl proteinases and human lung surfactant proteins PUBMED:7610480.
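Circular permutation means one sequence is a rotation of another: the same units in the same cyclic order, but starting at a different point. A minimal sketch of the idea, assuming exact string comparison, uses the classic trick that b is a rotation of a exactly when the lengths match and b occurs in a + a. The helper names are hypothetical and the letters are toy labels standing for repeats or domains; real protein comparisons would need alignment rather than exact identity.

```python
def rotation_offset(a: str, b: str) -> int:
    """Return k such that b == a[k:] + a[:k], or -1 if b is not a rotation of a.

    Exact string comparison only: any substitution or indel makes it fail.
    """
    if len(a) != len(b):
        return -1
    # Every rotation of a appears as a length-len(a) window of a + a.
    return (a + a).find(b)

def is_circular_permutation(a: str, b: str) -> bool:
    return rotation_offset(a, b) != -1

# Toy example: each letter stands for one repeat or domain (hypothetical labels).
print(rotation_offset("ABCDEF", "DEFABC"))          # prints 3
print(is_circular_permutation("ABCDEF", "ACBDEF"))  # prints False
```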

    \ \ \

    Pin2 family proteinase inhibitors are present in seeds, leaves and other organs. Perhaps the best-known representatives are the wound-induced proteinase inhibitors PUBMED:11216843, PUBMED:11351092, which contain up to eight sequence repeats (the 'IP repeats'). The sequence of the IP repeats is quite variable; only the cysteines constituting the four disulfide bridges and a single proline residue are conserved throughout all the known repeat sequences. The structure of the proteinase inhibitor complex is known PUBMED:2494344.

    \ 6542 IPR009590 \

    This domain is found at the N terminus of the Gp5 baseplate protein of bacteriophage T4. This domain binds to the Gp27 protein PUBMED:11823865. This domain has the common OB fold PUBMED:11823865.

    \ 2148 IPR007437 \ This family contains several proteins of uncharacterised function.\ 4233 IPR001648 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Evidence suggests that, in prokaryotes, the peptidyl transferase reaction is performed by the large subunit 23S rRNA, whereas proteins probably have a greater role in eukaryotic ribosomes. Most of the proteins lie close to, or on the surface of, the 30S subunit, arranged peripherally around the rRNA PUBMED:9281425. The small subunit ribosomal proteins can be categorised as primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins.

    \

    The small ribosomal subunit protein S18 is known to be involved in binding the aminoacyl-tRNA complex in Escherichia coli PUBMED:2647521, and appears to be situated at the tRNA A-site. Experimental evidence has revealed that S18 is well exposed on the surface of the E. coli ribosome, and is a secondary rRNA binding protein PUBMED:9371771. S18 belongs to a family of ribosomal proteins PUBMED:2179947 that includes: eubacterial S18; metazoan mitochondrial S18, algal and plant chloroplast S18; and cyanelle S18.

    \ \ 4839 IPR003507 \

    This signature is found in the Escherichia coli microcin C7 self-immunity protein mccF and in muramoyltetrapeptide carboxypeptidase (LD-carboxypeptidase A). LD-carboxypeptidase A belongs to MEROPS peptidase family S66 (clan SS). The entry also contains uncharacterised proteins, including hypothetical proteins from various bacteria and archaea.

    \ \ \ 6629 IPR009632 \

    This family is restricted to plants, specifically to the Solanaceae. The members have low similarity to metallocarboxypeptidase inhibitor (MCPI) proteins from potato and tomato, and there is no direct evidence to suggest that they are proteinase inhibitors, despite publications to the contrary PUBMED:11488477. In henbane (Hyoscyamus niger) the HR7 protein is found in the primordia and bases of lateral roots. Overexpression of HR7 dramatically enhances the frequency of lateral root formation, suggesting that it plays a role in lateral root initiation PUBMED:10356981.

    \ 4566 IPR007055 \

    The BON domain is typically ~60 residues long and has an alpha/beta predicted fold. There is a conserved glycine residue and several hydrophobic regions. This pattern of conservation is more suggestive of a binding or structural function than of a catalytic function. Most proteobacteria seem to possess one or two BON-containing proteins, typically of the OsmY type; outside of this group the distribution is more disparate.

    The OsmY protein is an Escherichia coli 20 kDa outer membrane or periplasmic protein that is expressed in response to a variety of stress conditions, in particular helping to provide protection against osmotic shock. One hypothesis is that OsmY prevents shrinkage of the cytoplasmic compartment by contacting the phospholipid interfaces surrounding the periplasmic space. The domain architecture of two BON domains alone suggests that these domains contact the surfaces of phospholipids, with each domain contacting a membrane PUBMED:12878000.

    \ \ 6165 IPR010461 \

    This family consists of several bacterial ComK proteins. The ComK protein of Bacillus subtilis positively regulates the transcription of several late competence genes as well as comK itself. It has been found that ClpX plays an important role in the regulation of ComK at the post-transcriptional level PUBMED:12761164.

    \ 5028 IPR000962 \

    This domain identifies the members of the DksA/TraR family that have a zinc finger.

    \ \ \

    DksA is a critical component of the rRNA transcription initiation machinery that potentiates the regulation of rRNA promoters by ppGpp and the initiating NTP. In delta-dksA mutants, rRNA promoters are unresponsive to changes in amino acid availability, growth rate, or growth phase. In vitro, DksA binds to RNAP, reduces open complex lifetime, inhibits rRNA promoter activity, and amplifies effects of ppGpp and the initiating NTP on rRNA transcription PUBMED:15294156, PUBMED:15294157. The dksA gene product suppresses the temperature-sensitive growth and filamentation of a dnaK deletion mutant of Escherichia coli. Gene knockout PUBMED:2180916 and deletion PUBMED:8063112 experiments have shown the gene to be non-essential, with mutations causing a mild sensitivity to UV light but not affecting DNA recombination PUBMED:8063112. In Pseudomonas aeruginosa, dksA is a novel regulator involved in the post-transcriptional control of extracellular virulence factor production PUBMED:12775693.

    \ \

    The proteins contain a C-terminal region thought to fold into a 4-cysteine zinc finger. Other proteins found to contain a similar domain include:

    \ \ \ 1730 IPR005580 \

    This RNA-binding domain is found at the C terminus of a number of DEAD-box helicase proteins PUBMED:10481020.

    \ 2541 IPR001492 \ Flagellin is the subunit which polymerizes to form the filaments of bacterial flagella. This N-terminal domain and the C-terminal domain always occur together.\ 4872 IPR005363 \

    The proteins in this family are about 200 amino acids long and each contain 3 CXXC motifs.
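A CXXC motif is simply a cysteine, any two residues, and a second cysteine, so motif counting reduces to a regular-expression scan. The sketch below is a minimal illustration using a hypothetical helper, `cxxc_positions`; the input is a made-up sequence, not a family member. It reports overlapping occurrences too (for example, CxxCxxC yields two motifs sharing the middle cysteine), which is a design choice worth keeping in mind when comparing counts.

```python
import re

# C, any two residues, C; the lookahead lets overlapping motifs (e.g. CxxCxxC) all be reported.
CXXC = re.compile(r"(?=(C..C))")

def cxxc_positions(seq: str) -> list[int]:
    """Return 1-based start positions of every CXXC motif in a protein sequence."""
    return [m.start() + 1 for m in CXXC.finditer(seq.upper())]

toy = "MCAKCLLLLCPGCRRRRCSTCE"  # made-up sequence for illustration
print(cxxc_positions(toy))  # prints [2, 10, 18]
```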

    \ 4279 IPR007830 \ RNA polymerase I is composed of 14 different subunits. The Rpa43 subunit is one of the subunits contacted by the transcription factor TIF-IA PUBMED:12393749.\ 1617 IPR004630 \

    This is a family of conserved hypothetical proteins found in both Gram-positive and Gram-negative species. So far, members of this family have been found in only one archaeal species, Archaeoglobus fulgidus. These proteins have a molecular weight of approximately 35 to 38 kDa.

    \ 2463 IPR003400 \ This group of proteins consists of membrane-bound transport proteins essential for ferric ion uptake in bacteria PUBMED:9371459. The family consists of ExbD and TolR, which are involved in TonB-dependent transport of various receptor-bound substrates, including colicins PUBMED:3294803.\ 3881 IPR002642 \ This family consists of lysophospholipase / phospholipase B and cytosolic phospholipase A2, which also has a C2 domain. Phospholipase B enzymes catalyse the release of fatty acids from lysophospholipids and are capable in vitro of hydrolysing all phospholipids extractable from yeast cells PUBMED:8027085. Cytosolic phospholipase A2 associates with natural membranes in response to physiological increases in Ca2+ and selectively hydrolyses arachidonyl phospholipids PUBMED:8051052; the aligned region corresponds to the carboxy-terminal Ca2+-independent catalytic domain of the protein as discussed in PUBMED:8051052.\ 75 IPR005520 \

    This domain is found in attacin and sarcotoxin, but not diptericin (which shares similarity to the C-terminal region of attacin). All these proteins are insect antibacterial proteins that are induced in the fat body and subsequently secreted into the hemolymph, where they act synergistically to kill invading microorganisms PUBMED:7772280.

    \ 6860 IPR009753 \

    This family consists of several hypothetical 9.4 kDa Borrelia burgdorferi (Lyme disease spirochete) proteins of around 78 residues in length. The function of this family is unknown.

    \ 5542 IPR008421 \ This family consists of several virulent strain associated lipoproteins from Borrelia burgdorferi.\ 1971 IPR005020 \

    This is a family of Caenorhabditis elegans proteins of unknown function.

    \ 5900 IPR009278 \

    This family consists of several US9 and related proteins from the Alphaherpesviruses. The function of the US9 protein is unknown, although in bovine herpesvirus 5, US9 is essential for the anterograde spread of the virus from the olfactory mucosa to the olfactory bulb PUBMED:11907224.

    \ 6430 IPR010569 \

    This family represents a region within eukaryotic myotubularin-related proteins that is sometimes found with . Myotubularin is a dual-specificity lipid phosphatase that dephosphorylates phosphatidylinositol 3-phosphate and phosphatidylinositol 3,5-bisphosphate PUBMED:12847286. Mutations in genes encoding myotubularin-related proteins have been associated with disease PUBMED:12045210.

    \ 6906 IPR009782 \

    This family consists of several hypothetical mammalian proteins of around 320 residues in length. The function of this family is unknown although several of the family members are annotated as putative 40-2-3 proteins.

    \ 4010 IPR007738 \ The homeobox gene Prox1 is expressed in a subpopulation of endothelial cells that, after budding from veins, gives rise to the mammalian lymphatic system PUBMED:11927535. Prox1 has been found to be an early specific marker for the developing liver and pancreas in the mammalian foregut endoderm PUBMED:12351178. This family contains an atypical homeobox domain.\ 704 IPR006944 \

    This is a family of bacteriophage and prophage portal proteins. Positioned at one of the twelve icosahedral vertices of the viral capsid is a dodecameric complex of the virus-encoded portal protein. This dodecameric complex, known as the portal or connector complex, forms the channel through which the viral DNA is packaged into the capsid, and through which it exits during infection. Although the portal proteins from different phages show relatively little sequence similarity and vary widely in molecular weight, portal complexes display significant morphological similarity as determined by electron microscopy. Morphologically, they appear as disk-like structures approximately 150 Angstroms in diameter, with radially arranged projections and a 30 Angstrom central channel. The packaging reaction is energy-dependent and typically involves several components. ATP hydrolysis provides the driving force, and it is estimated that one ATP molecule is required for every base pair that is packaged. It appears that the portal motor may represent a new and extremely powerful class of motor which couples rotation to DNA translocation PUBMED:11839289.
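The one-ATP-per-base-pair estimate turns packaging cost into simple arithmetic: ATP consumed scales linearly with genome length. The sketch below is a back-of-the-envelope helper, `atp_for_packaging`, which is hypothetical. The ratio is exposed as a parameter so that other estimates can be substituted, and the 40,000 bp genome length is an illustrative value, not a specific phage.

```python
def atp_for_packaging(genome_bp: int, atp_per_bp: float = 1.0) -> float:
    """Estimate ATP molecules hydrolysed to package a genome of genome_bp base pairs.

    atp_per_bp defaults to the one-ATP-per-base-pair estimate quoted in the text.
    """
    if genome_bp < 0 or atp_per_bp < 0:
        raise ValueError("genome length and ATP ratio must be non-negative")
    return genome_bp * atp_per_bp

# Hypothetical 40,000 bp genome (illustrative length only).
print(atp_for_packaging(40_000))  # prints 40000.0
```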

    \ 274 IPR007139 \ This motif is found singly or as up to five tandem repeats in a small set of bacterial proteins. There are two or three alpha-helices, and possibly a beta-strand.\ 4220 IPR004038 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    This family includes: Ribosomal L7A from metazoa, Ribosomal L8-A and L8-B from fungi, 30S ribosomal protein HS6 from archaebacteria, 40S ribosomal protein S12 from eukaryotes, ribosomal protein L30 from eukaryotes and archaebacteria, Gadd45 and MyD118 PUBMED:9151207.

    \ 4440 IPR004658 \ Slp superfamily members are present in the Gram-negative gamma proteobacteria Escherichia coli (which also contains a close paralog), Haemophilus influenzae, Pasteurella multocida and Vibrio cholerae. The known members of the family to date share a motif LX[GA]C near the N-terminus, which is compatible with the possibility that the protein is modified into a lipoprotein with Cys as the new N-terminus. Slp from Escherichia coli is known to be a lipoprotein of the outer membrane and to be expressed in response to carbon starvation.\ 4889 IPR002639 \ This family consists of the urease accessory protein, UreF. The urease enzyme (urea amidohydrolase) hydrolyses urea into ammonia and carbamic acid PUBMED:8550495. UreF is proposed to modulate the activation process of urease by eliminating the binding of nickel ions to noncarbamylated protein PUBMED:8808930.\ 3481 IPR002182 \

    This is the NB-ARC domain, a novel signalling motif found in bacteria and eukaryotes, shared by plant resistance gene products and regulators of cell death in animals PUBMED:9545207.

    \ 7928 IPR012534 \

    This family consists of the bombolitin peptides that are found in the venom of the bumblebee Megabombus pennsylvanicus. Bombolitins are structurally and functionally very similar. They lyse erythrocytes and liposomes, release histamine from rat peritoneal mast cells, and stimulate phospholipase A2 from different sources PUBMED:2578459.

    \ 2843 IPR000926 \

    GTP cyclohydrolase II catalyses the first committed step in the biosynthesis of riboflavin. The enzyme converts GTP and water to formate, 2,5-diamino-6-hydroxy-4-(5-phosphoribosylamino)pyrimidine and pyrophosphate, and requires magnesium as a cofactor. It is sometimes found as a bifunctional enzyme with 3,4-dihydroxy-2-butanone 4-phosphate synthase (DHBP_synthase).

    \ 1215 IPR007242 \ In yeast, 15 Apg proteins coordinate the formation of autophagosomes. Autophagy is a bulk degradation process induced by starvation in eukaryotic cells PUBMED:11689437. Apg12 is covalently bound to Apg5 PUBMED:9852036.\ 2459 IPR004991 \

    This is a family of related bacterial toxins.

    \ 2932 IPR007764 \ UL43 genes are expressed with true-late (gamma2) kinetics, and their products have been identified as virion tegument components PUBMED:12029146. Studies suggest that the N-terminal sequences target UL43 to protein aggregates and that C-terminal sequences are important for incorporation into particles.\ 8077 IPR013221 \

    In the complex process of biosynthesis of bacterial peptidoglycan, the assembly of the peptide moiety of its monomer unit has recently been the topic of numerous investigations. It is performed by a series of enzymes designated as the Mur synthetases (MurC, MurD, MurE, and MurF), which are responsible for the successive additions of L-alanine, D-glutamate, meso-diaminopimelate or L-lysine, and D-alanyl-D-alanine to UDP-N-acetylmuramic acid.
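The Mur synthetases act as an ordered assembly line, each adding one unit to the product of the previous step. The sketch below models only that ordering as string labels; it is not a chemical model. `mur_ligase_steps` is a hypothetical helper, and the shorthand "UDP-MurNAc" and "meso-DAP" are labels chosen for the example. The third residue is a parameter because the text gives meso-diaminopimelate or L-lysine as alternatives for the MurE step.

```python
def mur_ligase_steps(third_residue: str = "meso-DAP") -> list[tuple[str, str]]:
    """Return (enzyme, product label) for each Mur ligase addition, in order.

    third_residue is "meso-DAP" (meso-diaminopimelate) or "L-Lys", the two
    alternatives the text gives for the MurE step.
    """
    if third_residue not in ("meso-DAP", "L-Lys"):
        raise ValueError(f"unexpected third residue: {third_residue}")
    additions = [
        ("MurC", "L-Ala"),
        ("MurD", "D-Glu"),
        ("MurE", third_residue),
        ("MurF", "D-Ala-D-Ala"),  # added as a dipeptide in a single step
    ]
    chain = ["UDP-MurNAc"]  # UDP-N-acetylmuramic acid, the starting acceptor
    steps = []
    for enzyme, unit in additions:
        chain.append(unit)
        steps.append((enzyme, "-".join(chain)))
    return steps

for enzyme, product in mur_ligase_steps():
    print(f"{enzyme}: {product}")
```

The final line printed is `MurF: UDP-MurNAc-L-Ala-D-Glu-meso-DAP-D-Ala-D-Ala`; calling `mur_ligase_steps("L-Lys")` gives the lysine-containing variant instead.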

    \ \ \

    UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase (MurD) is a cytoplasmic enzyme involved in the biosynthesis of peptidoglycan that catalyzes the addition of D-glutamate to the nucleotide precursor UDP-N-acetylmuramoyl-L-alanine (UMA). The crystal structure of MurD in the presence of its substrate UMA has been solved to 1.9 Å resolution PUBMED:9218784. The structure comprises three domains, each with a topology reminiscent of a nucleotide-binding fold: the N- and C-terminal domains are Rossmann dinucleotide-binding folds, and the central domain is a mononucleotide-binding fold that is also observed in the GTPase family.

    \ \ 2164 IPR007546 \ This is a family of hypothetical bacterial proteins.\ 5670 IPR008612 \ This family consists of several mating pheromone proteins from Euplotes octocarinatus. Cells of the ten mating types of the ciliate Euplotes octocarinatus communicate by pheromones before they enter conjugation. The pheromones induce homotypic pairing when applied to mating types that do not secrete the same pheromone(s). Heterotypic pairs (i.e., those between cells of different mating types) are formed only when both mating types in a mixture secrete a pheromone that the other does not. The genetics of mating types is based on four codominant mating type alleles, each allele determining production of a different pheromone. The pheromones not only induce pair formation but also attract cells PUBMED:9018841.\ 7618 IPR012430 \

    Sequences making up this family are derived from hypothetical proteins expressed by both prokaryotic and eukaryotic species. The region in question is approximately 250 residues long.

    \ 4458 IPR001424 \

    Superoxide dismutases are ubiquitous metalloproteins that prevent damage by oxygen-mediated free radicals by catalysing the dismutation of superoxide into molecular oxygen and hydrogen peroxide PUBMED:2751312. Superoxide is a normal by-product of aerobic respiration and is produced by a number of reactions, including oxidative phosphorylation and photosynthesis. The dismutase enzymes have a very high catalytic efficiency due to the attraction of superoxide to the metal ions bound at the active site PUBMED:1463506, PUBMED:3891411.
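The dismutation described above can be written as a net reaction. Two superoxide anions yield one molecule of oxygen and one of hydrogen peroxide. Protons are shown explicitly and the metal cofactor is omitted:

```latex
\[
2\,\mathrm{O_2}^{\bullet-} + 2\,\mathrm{H}^{+} \longrightarrow \mathrm{O_2} + \mathrm{H_2O_2}
\]
```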

    \

    There are three forms of superoxide dismutase, depending on the metal cofactor: Cu/Zn (which binds both copper and zinc), Fe and Mn types. The Fe and Mn forms are similar in their primary, secondary and tertiary structures, but are distinct from the Cu/Zn form PUBMED:2263641. Prokaryotes and protists contain Mn, Fe or both types, while most eukaryotic organisms utilise the Cu/Zn type.

    \ 2458 IPR000418 \

    Transcription factors are protein molecules that bind to specific DNA sequences in the genome, resulting in the induction or inhibition of gene transcription PUBMED:2163347. The ets oncogene is such a factor, possessing a region of 85-90 amino acids known as the ETS (erythroblast transformation specific) domain PUBMED:2163347, PUBMED:2253872, PUBMED:14693367. This domain is rich in positively-charged and aromatic residues, and binds to purine-rich segments of DNA. The ETS domain has been identified in other transcription factors such as PU.1, human erg, human elf-1, human elk-1, GA binding protein, and a number of others PUBMED:2163347, PUBMED:2253872, PUBMED:8425553. It is generally located at the C-terminus of the protein, with the exception of ELF-1, ELK-1, ELK-3, ELK-4 and ERF, where it is found at the N-terminus.

    \

    NMR analysis of the structure of the Ets domain revealed that it contains three alpha-helices and a four-stranded beta-sheet arranged in the order alpha1-beta1-beta2-alpha2-alpha3-beta3-beta4, forming a winged helix-turn-helix (wHTH) topology PUBMED:12559563. The third alpha-helix contacts the major groove of the DNA. Different members of the Ets protein family display distinct DNA binding specificities. The Ets domains and the flanking amino acid sequences of the proteins influence the binding affinity, and the alteration of a single amino acid in the Ets domain can change its DNA binding specificity.

    \

    Avian leukemia virus E26 is a replication-defective retrovirus that induces a mixed erythroid/myeloid leukemia in chickens. E26 virus carries two distinct oncogenes, v-myb and v-ets. The ets portion of this oncogene is required for the induction of erythroblastosis. Both v-ets and c-ets-1, its cellular progenitor, have been shown PUBMED:2165853 to be nuclear DNA-binding proteins. Ets-1 differs slightly from v-ets at its carboxy-terminal region. In most species where it has been sequenced, c-ets-1 exists in various isoforms generated by alternative splicing and differential phosphorylation.

    \ 5124 IPR007961 \

    This family consists of several latent membrane protein 1 (LMP1) sequences, mostly from human herpesvirus 4 (Epstein-Barr virus, EBV). LMP1 of EBV is a 62-65 kDa plasma membrane protein possessing six membrane-spanning regions, a short cytoplasmic N-terminus and a long cytoplasmic carboxy-terminal tail of 200 amino acids. EBV LMP1 is essential for EBV-mediated transformation and has been associated with several malignancies. EBV-like viruses in Macaca fascicularis (cynomolgus monkeys) have been associated with high lymphoma rates in immunosuppressed monkeys PUBMED:12457963.

    \ 4512 IPR000969 \ Human structure-specific recognition protein, SSRP1, PUBMED:1372440 binds specifically to DNA modified with the anti-cancer drug cisplatin. An 81 kDa protein is predicted, containing several highly-charged domains and a stretch of 75 residues that share 47% identity with a portion of the high mobility group (HMG) protein HMG1. This HMG box probably constitutes the structure recognition element for cisplatin-modified DNA, the probable recognition motif being the local duplex unwinding and bending that occurs on formation of intra-strand cross-links PUBMED:1372440. SSRP1 is the human homologue of a recently identified mouse protein that binds to recombination signal sequences PUBMED:1678855. These sequences have been postulated to form stem-loop structures, further implicating local bends and unwinding in DNA as a recognition target for HMG-box proteins. A Drosophila melanogaster cDNA encoding an HMG-box-containing protein has also been isolated PUBMED:7688122, PUBMED:8479916. This protein shares 50% sequence identity with human SSRP1. In vitro binding studies using Drosophila SSRP showed that the protein binds to single-stranded DNA and RNA, with highest affinity for nucleotides G and U. Comparison of the predicted amino acid sequences among SSRP family members reveals 48% identity, with structural conservation in the C-terminus of the HMG box, as well as domains of highly charged residues. The most highly conserved regions lie in the poorly understood N-terminus, suggesting that this portion of the protein is critical for its function PUBMED:8479916.\ 2190 IPR007503 \ This is a family of hypothetical archaeal proteins.\ 7402 IPR011513 \

    Saccharomyces cerevisiae Nse1 (non-SMC element 1) forms part of a complex with SMC5-SMC6. This structural maintenance of chromosomes (SMC) complex plays an essential role in genomic stability, being involved in DNA repair and DNA metabolism PUBMED:12966087, PUBMED:11927594. It is conserved in eukaryotes from yeast to human.

    \ 8105 IPR013237 \

    This region represents the zinc binding domain. It is found in the N-terminal region of the bacteriophage P4 alpha protein, which is a multifunctional protein with origin recognition, helicase and primase activities PUBMED:8253092.

    \ 5654 IPR008774 \ This family consists of several phospholipase A2-like proteins, mostly from insects PUBMED:12167627.\ 1547 IPR002701 \

    Chorismate mutase catalyses the conversion of chorismate to prephenate in the pathway of tyrosine and phenylalanine biosynthesis. This enzyme is negatively regulated by tyrosine, tryptophan and phenylalanine PUBMED:9642265, PUBMED:9497350. Prephenate dehydratase (PDT) catalyses the decarboxylation and dehydration of prephenate to phenylpyruvate. In microorganisms, PDT is involved in the terminal pathway of phenylalanine biosynthesis. In some bacteria, such as Escherichia coli, PDT is part of a bifunctional enzyme (P-protein) that also catalyzes the transformation of chorismate into prephenate (chorismate mutase), while in other bacteria it is a monofunctional enzyme. The sequence of monofunctional chorismate mutase aligns well with the N-terminal part of P-proteins PUBMED:9642265.

    \ 3072 IPR002681 \ This family consists of various coat proteins from the ilarviruses (family Bromoviridae); members include apple mosaic virus and prune dwarf virus. The ilarvirus coat protein is required to initiate replication of the viral genome in host plants PUBMED:7730792. Members of the Bromoviridae have a positive-strand ssRNA genome with no DNA stage in their replication.\ 248 IPR004159 \

    Members of this family of hypothetical plant proteins are putative methyltransferases.

    \ 6766 IPR010707 \

    This family consists of several hypothetical bacterial proteins of around 200 residues in length. The function of this family is unknown.

    \ 5865 IPR010321 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6893 IPR009772 \

    This family contains a number of eukaryotic D123 proteins approximately 330 residues long. It has been shown that mutated variants of D123 exhibit temperature-dependent differences in their degradation rate PUBMED:11699637.

    \ 5751 IPR009223 \

    This short region is found repeated in the mid region of the adenomatous polyposis coli (APC) proteins. In the human protein, many cancer-linked SNPs are found near the first three occurrences of the motif. These repeats bind beta-catenin PUBMED:9823329.

    \ 5469 IPR008395 \ This domain is related to the TUDOR domain PUBMED:12575993. The function of the agenet domain is unknown. This signature matches one of the two Agenet domains in the FMR proteins PUBMED:12575993.\ 1520 IPR003466 \ Chalcone-flavanone isomerase is a plant enzyme responsible for the isomerisation of chalcone to naringenin, a key step in the biosynthesis of flavonoids. The Petunia hybrida genome contains two genes coding for very similar enzymes, ChiA and ChiB, but only the first seems to encode a functional chalcone-flavanone isomerase.\ 5593 IPR008643 \ Herpesvirus virions share a characteristic architecture in which the double-stranded DNA genome is surrounded by an icosahedral protein capsid, a thick tegument layer, and a lipid bilayer envelope. This large tegument protein is found in a variety of herpesviruses.\ 5648 IPR008563 \ This family consists of several highly related baculovirus proteins of unknown function.\ 7695 IPR012447 \

    The proteins in this entry have not been characterised.

    \ 1355 IPR002546 \ This basic domain is found in the MyoD family of muscle-specific proteins that control muscle development. The bHLH region of the MyoD family includes the basic domain and the helix-loop-helix (HLH) motif. The bHLH region mediates specific DNA binding PUBMED:9343420, with 12 residues of the basic domain involved in DNA binding PUBMED:8790335. The basic domain forms an extended alpha helix in the structure.\ 2965 IPR000417 \ Thiamine pyrophosphate (TPP), a required cofactor for many enzymes in the cell, is synthesised de novo in Salmonella typhimurium PUBMED:9244280. Five kinase activities have been implicated in TPP synthesis, which involves joining a 4-methyl-5-(beta-hydroxyethyl)thiazole (THZ) moiety and a 4-amino-5-hydroxymethyl-2-methylpyrimidine (HMP) moiety PUBMED:9244280, PUBMED:7982968. THZ kinase activity is involved in the salvage synthesis of TH-P from the thiazole: \ \ Hydroxyethylthiazole kinase expression is regulated at the mRNA level by intracellular thiamin pyrophosphate PUBMED:7982968.\ 3851 IPR000484 \

    This family represents the photosynthetic reaction centre L (light) and M (medium) subunits from purple photosynthetic bacteria, and the homologous D1 (PsbA) and D2 (PsbD) photosystem II (PSII) reaction centre proteins from cyanobacteria, algae and plants. The D1 and D2 proteins show only approximately 15% sequence similarity with the L and M subunits; however, the conserved amino acids correspond to the binding sites of the photochemically active cofactors. As a result, the reaction centres (RCs) of purple photosynthetic bacteria and PSII display considerable structural similarity in terms of cofactor organisation.

    \ \

    The bacterial photosynthetic RC is composed of three protein subunits (L, M and H) and a number of bound cofactors, which are anchored in the cell membrane PUBMED:11095707. Upon light excitation, an electron is transferred from the primary donor (bacteriochlorophyll dimer) via the intermediate acceptor bacteriopheophytin to the primary acceptor quinone Qa, and finally to the secondary acceptor quinone Qb, resulting in the formation of quinol QbH2 that acts as a proton carrier to the cytochrome bc1 complex, culminating in the production of ATP via a proton gradient across the membrane PUBMED:12872158, PUBMED:2676514.

    \ \

    The D1 and D2 proteins occur as a heterodimer that forms the reaction core of PSII, a multisubunit protein-pigment complex containing over forty different cofactors, which are anchored in the cell membrane in cyanobacteria and in the thylakoid membrane in algae and plants. Upon absorption of light energy, the D1/D2 heterodimer undergoes charge separation, and electrons are transferred from the primary donor (chlorophyll a) via pheophytin to the primary acceptor quinone Qa, then to the secondary acceptor Qb, which, as in the bacterial system, culminates in the production of ATP. However, PSII has an additional function compared with the bacterial system. At the oxidising side of PSII, a redox-active tyrosine residue in the D1 protein reduces the oxidised primary donor P680+; the oxidised tyrosine then withdraws electrons from a manganese cluster, which in turn withdraws electrons from water, leading to the splitting of water and the formation of molecular oxygen. PSII thus provides a source of electrons that can be used by photosystem I to produce the reducing power (NADPH) required to convert CO2 to glucose PUBMED:12518057, PUBMED:14871485.

    \ 6653 IPR009643 \

    Heat shock factor binding protein 1 (HSBP1) appears to be a negative regulator of the heat shock response PUBMED:9649501.

    \ 7995 IPR012582 \

    This is domain B in the catalytic subunit of DNA-dependent protein kinases.

    \ 1335 IPR006923 \

    This is a family of baculovirus late expression factor 5 (LEF-5) proteins, which are required for late and very late gene expression.

    \ 6426 IPR010568 \

    This entry contains a number of repeats found in Chlorovirus glycoproteins. The function of these proteins is unknown.

    \ 3972 IPR004973 \

    The poxvirus DNA-directed RNA polymerase catalyses DNA-template-directed extension of the 3'-end of an RNA strand by one nucleotide at a time. The enzyme consists of at least eight subunits; this family represents the 18 kDa subunit.

    \ 1977 IPR005049 \

    This is a protein family of unknown function.

    \ 3928 IPR006758 \

    The A32 protein is thought to be an ATPase involved in viral DNA packaging PUBMED:8470370.

    \ 2495 IPR006452 \

    This family of sequences describes an accessory protein, FdhE, that is required for the assembly of formate dehydrogenase in certain proteobacteria, although it is not present in the final complex PUBMED:2170340. The exact nature of the function of FdhE in the assembly of the complex is unknown, but given that formate dehydrogenase contains selenocysteine, molybdopterin, iron-sulphur clusters and cytochrome b556, FdhE is likely to be involved in the insertion of cofactors.

    \ 8075 IPR013256 \

    This entry includes the Saccharomyces cerevisiae protein SPT2 which is a chromatin protein involved in transcriptional regulation PUBMED:15563464.

    \ 4263 IPR002676 \

    The RimM protein is essential for efficient processing of 16S rRNA PUBMED:9422595. The RimM protein was shown to have affinity for free ribosomal 30S subunits but not for 30S subunits in the 70S ribosomes PUBMED:9422595.

    \ 136 IPR004302 \ Entomopoxviruses are a class of insect viruses whose virions are embedded in cytoplasmic occlusion bodies. The major component of these protective complexes is a protein called spheroidin/spindolin. Intermolecular disulphide bonds have been shown to play major roles in the formation and structure of these viral occlusion bodies PUBMED:2327073. Some members of this family are spindle body proteins.\ 958 IPR000626 \

    Ubiquitin is a protein of 76 amino acid residues that is found in all eukaryotic cells and whose sequence is extremely well conserved from protozoa to vertebrates. Ubiquitin acts through its post-translational attachment (ubiquitinylation) to other proteins, where these modifications alter the function, location or trafficking of the protein, or target it for destruction by the 26S proteasome PUBMED:15454246. The terminal glycine in the C-terminal 4-residue tail of ubiquitin can form an isopeptide bond with a lysine residue in the target protein, or with a lysine in another ubiquitin molecule to form a ubiquitin chain that attaches itself to a target protein. Ubiquitin has seven lysine residues, any one of which can be used to link ubiquitin molecules together, resulting in different structures that alter the target protein in different ways. It appears that Lys(11)-, Lys(29)- and Lys(48)-linked poly-ubiquitin chains target the protein to the proteasome for degradation, while mono-ubiquitinylated and Lys(6)- or Lys(63)-linked poly-ubiquitin chains signal reversible modifications in protein activity, location or trafficking PUBMED:14998368. For example, Lys(63)-linked poly-ubiquitinylation is known to be involved in DNA damage tolerance, inflammatory response, protein trafficking and signal transduction through kinase activation PUBMED:15556404. In addition, the length of the ubiquitin chain alters the fate of the target protein. Regulatory proteins such as transcription factors and histones are frequent targets of ubiquitinylation PUBMED:15525528.

    \

    Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3), which work sequentially in a cascade. There are many different E3 ligases, which are responsible for the type of ubiquitin chain formed, the specificity of the target protein, and the regulation of the ubiquitinylation process PUBMED:12646216. Ubiquitinylation is an important regulatory tool that controls the concentration of key signalling proteins, such as those involved in cell cycle control, as well as removing misfolded, damaged or mutant proteins that could be harmful to the cell. Several ubiquitin-like molecules have been discovered, such as SUMO1, NEDD8, Rad23, Elongin B and Parkin, the latter being involved in Parkinson's disease PUBMED:15564047.

    \ \ 522 IPR004088 \

    The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. It is a domain of around 70 amino acids that is present in a wide variety of quite diverse nucleic acid-binding proteins PUBMED:8036511. It has been shown to bind RNA PUBMED:9302998, PUBMED:10369774. Like many other RNA-binding motifs, KH motifs are found in one or multiple copies (14 copies in chicken vigilin) and, at least for hnRNP K (three copies) and FMR-1 (two copies), each motif is necessary for in vitro RNA binding activity, suggesting that they may function cooperatively or, in the case of single KH motif proteins (for example, Mer1p), independently PUBMED:8036511.

    \

    Structural analysis PUBMED:9302998, PUBMED:10369774, PUBMED:11160884 indicates that KH domains fall into two groups. Type-1 KH domains have a beta-alpha-alpha-beta-beta-alpha topology, whereas in type-2 KH domains the last two beta-strands are located in the N-terminal part of the domain (alpha-beta-beta-alpha-alpha-beta). Sequence similarity between these two folds is limited to a short region (VIGXXGXXI) in the RNA-binding motif. This motif is located between helices 1 and 2 in type-1 domains and between helices 2 and 3 in type-2 domains. Proteins known to contain a type-1 KH domain include bacterial polyribonucleotide nucleotidyltransferases; vertebrate fragile X mental retardation protein 1 (FMR1); eukaryotic heterogeneous nuclear ribonucleoprotein K (hnRNP K), one of at least 20 major proteins that are part of hnRNP particles in mammalian cells; mammalian poly(rC) binding proteins; Artemia salina glycine-rich protein GRP33; yeast PAB1-binding protein 2 (PBP2); vertebrate vigilin; and human high-density lipoprotein binding protein (HDL-binding protein).
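The shared signature is written as a consensus in which X marks a wildcard, so a candidate sequence can be screened with a regular expression. The Python sketch below is only illustrative. It assumes that X means "any single residue" and that every other letter must match exactly. Real KH domain detection relies on profile models rather than a fixed string. The function names and the toy sequence are hypothetical.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'VIGXXGXXI' into a regex.

    Assumption: 'X' stands for any single residue; every other letter
    must match that residue exactly.
    """
    return "".join("." if c == "X" else re.escape(c) for c in consensus.upper())


def find_motif(sequence: str, consensus: str = "VIGXXGXXI") -> list:
    """Return (0-based start, matched segment) for every hit, overlaps included."""
    # A capturing group inside a zero-width lookahead lets hits overlap.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any real protein.
print(find_motif("MKRVIGKGGETIRLLE"))  # [(3, 'VIGKGGETI')]
```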

    \ 4056 IPR003353 \ The bacterial phosphoenolpyruvate:sugar phosphotransferase system (PTS) is a multi-protein system involved in the regulation of a variety of metabolic and transcriptional processes. The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred to enzyme I (EI) of the PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, which consists of at least three structurally distinct domains (IIA, IIB, and IIC) PUBMED:1537788 that can either be fused together in a single polypeptide chain or exist as two or three interactive chains, formerly called enzymes II (EII) and III (EIII). IIB is phosphorylated by phospho-IIA before the phosphoryl group is transferred to the sugar substrate.\ 5042 IPR007454 \ This family includes several proteins of uncharacterised function.\ 4672 IPR001319 \ Nuclear transition protein 1 (TP1) is one of the spermatid-specific proteins PUBMED:2040274. TP1 is a basic protein well conserved in mammalian species. In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to the compact, nonnucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is associated with a double-protein transition. The first transition corresponds to the replacement of histones by several spermatid-specific proteins (also called transition proteins), which are themselves replaced by protamines during the second transition.\ 2914 IPR006731 \ This family includes UL25 proteins from HCMV, as well as U14 proteins from HHV-6 and HHV-7. These 85 kDa phosphoproteins appear to act as structural antigens, but their precise function is otherwise unknown.\ 3229 IPR004565 \

    This protein, LolB, is known so far only in the gamma subdivision of the Proteobacteria. It is a processed, lipid-modified outer membrane protein. In Escherichia coli, lipoproteins are anchored to the periplasmic side of either the inner or outer membrane through N-terminal lipids, depending on the lipoprotein-sorting signal present at position 2 PUBMED:12032293. Five Lol proteins are involved in the sorting and outer membrane localization of lipoproteins. LolCDE, an ATP-binding cassette (ABC) transporter in the inner membrane, releases outer membrane-directed lipoproteins from the inner membrane in an ATP-dependent manner, leading to the formation of a water-soluble complex between the lipoprotein and LolA. The LolA-lipoprotein complex crosses the periplasm and then interacts with the outer membrane receptor LolB, which is essential for the anchoring of lipoproteins to the outer membrane.

    \ 8052 IPR001034 \

    The deoR-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of about 50-60 amino acids present in transcription regulators of the deoR family, which are involved in sugar catabolism. This family of prokaryotic regulators is named after Escherichia coli deoR, a repressor of the deo operon, which encodes nucleotide and deoxyribonucleotide catabolic enzymes. DeoR also negatively regulates the expression of nupG and tsx, a nucleoside-specific transport protein and a channel-forming protein, respectively.

    \ \

    DeoR-like transcription repressors occur in diverse bacteria as regulators of sugar and nucleoside metabolic systems. The effector molecules for deoR-like regulators are generally phosphorylated intermediates of the relevant metabolic pathway. The DNA-binding deoR-type HTH domain usually occurs in the N-terminal part. The C-terminal part can contain an effector-binding domain and/or an oligomerization domain. DeoR occurs as an octamer, whilst glpR and agaR are tetramers. Several operators may be bound simultaneously, which could facilitate DNA looping PUBMED:1731335, PUBMED:14731281.

    \ \ 7363 IPR006572 \

    Zinc finger domains PUBMED:3125980, PUBMED: are nucleic acid-binding protein structures first identified in the Xenopus laevis transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues including 2 conserved Cys and 2 conserved His residues in a C-2-C-12-H-3-H type motif. The 12 residues separating the second Cys and the first His are mainly polar and basic, implicating this region in particular in nucleic acid binding. The zinc finger motif is an unusually small, self-folding domain in which Zn is a crucial component of its tertiary structure. All bind 1 atom of Zn in a tetrahedral array to yield a finger-like projection, which interacts with nucleotides in the major groove of the nucleic acid. The Zn binds to the conserved Cys and His residues. Fingers have been found to bind to about 5 base pairs of nucleic acid containing short runs of guanine residues. They have the ability to bind to both RNA and DNA, a versatility not demonstrated by the helix-turn-helix motif. The zinc finger may thus represent the original nucleic acid binding protein. It has also been suggested that a Zn-centred domain could be used in a protein interaction, e.g. in protein kinase C. Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines, while the second pair are histidines.
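The C-2-C-12-H-3-H spacing quoted above can be written directly as a regular expression. The Python sketch below assumes exactly that fixed spacing. Real C2H2 fingers tolerate some variation in spacing, so a practical scanner would allow ranges. The function name and the toy sequence are hypothetical.

```python
import re

# Fixed spacing taken from the text: Cys, 2 residues, Cys, 12 residues,
# His, 3 residues, His. '.' matches any single residue.
C2H2_PATTERN = re.compile(r"C.{2}C.{12}H.{3}H")


def find_c2h2(sequence: str) -> list:
    """Return (0-based start, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group(0)) for m in C2H2_PATTERN.finditer(sequence.upper())]


# Hypothetical toy sequence assembled to contain exactly one motif.
toy = "MS" + "CAKC" + "GKSFSRSDELTR" + "HIRTH" + "TG"
print(find_c2h2(toy))  # [(2, 'CAKCGKSFSRSDELTRHIRTH')]
```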

    \

    This domain represents a zinc finger found in DBF-like proteins, present in all eukaryotes except nematodes and plants. Proteins containing this domain may be regulators of DNA replication and the cell cycle; for example, Cdc7/Dbf4 is a protein kinase that is required for the initiation of DNA replication in eukaryotes during the G1/S cell cycle transition PUBMED:1592236, and may play a role in checkpoint function and in the maintenance of genomic integrity PUBMED:11269496.

    \ 7763 IPR012857 \

    D-aminopeptidase is a dimeric enzyme, with each monomer composed of three domains. Domain C is organised to form a beta barrel made up of eight antiparallel beta strands. It is connected to domain B by a short linker sequence, and interacts extensively with domain A, the catalytic domain. The gamma loop of domain C forms part of the wall of the catalytic pocket; domain C is in fact thought to confer substrate and inhibitor specificity on the enzyme.

    \ 2267 IPR006458 \

    This group of sequences contains an uncharacterized domain of about 70 residues found exclusively in plants, generally toward the C-terminus of proteins of 200 to 350 amino acids in length. At least 14 such proteins are found in Arabidopsis thaliana. Other regions of these proteins tend to consist largely of low-complexity sequence. The function is not known.

    \ 6800 IPR009719 \

    This family represents a conserved region approximately 60 residues long within a number of plant proteins of unknown function.

    \ 7905 IPR012958 \

    The CHD N-terminal domain is found in PHD/RING fingers and chromo domain-associated helicases PUBMED:15112237.

    \ 1233 IPR011579 \

    This domain has been found in a number of bacterial and archaeal proteins, all of which contain a conserved P-loop motif that is involved in binding ATP.

    \ 6644 IPR010660 \

    NOTCH signalling plays a fundamental role during a great number of developmental processes in multicellular animals PUBMED:10221902. NOD (NOTCH protein domain) represents a region present in many NOTCH proteins and NOTCH homologues in multiple species, such as NOTCH1, NOTCH2 and NOTCH3, LIN12, SC1 and TAN1. The role of the NOD domain remains to be elucidated.

    \ 3796 IPR000023 \ The enzyme-catalysed transfer of a phosphoryl group from ATP is an important reaction in a wide variety of biological processes PUBMED:2953977. One enzyme that utilises this reaction is phosphofructokinase (PFK), which catalyses the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate, a key regulatory step in the glycolytic pathway PUBMED:12023862, PUBMED:7825568. PFK exists as a homotetramer in bacteria and mammals (where each monomer possesses 2 similar domains), and as an octamer in yeast (where there are 4 alpha- (PFK1) and 4 beta-chains (PFK2), the latter, like the mammalian monomers, possessing 2 similar domains PUBMED:7825568).

    PFK is ~300 amino acids in length, and structural studies of the bacterial enzyme have shown that it comprises two similar (alpha/beta) lobes: one involved in ATP binding and the other housing both the substrate-binding site and the allosteric site (a regulatory binding site distinct from the active site, but that affects enzyme activity). The identical tetramer subunits adopt 2 different conformations: in a 'closed' state, the bound magnesium ion bridges the phosphoryl groups of the enzyme products (ADP and fructose-1,6-bisphosphate); and in an 'open' state, the magnesium ion binds only the ADP PUBMED:2975709, as the 2 products are now further apart. These conformations are thought to be successive stages of a reaction pathway that requires subunit closure to bring the 2 molecules sufficiently close to react PUBMED:2975709.
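The overall PFK reaction can be summarised as follows. Its two products, ADP and fructose-1,6-bisphosphate, are the molecules bridged by magnesium in the closed state described above. Protons are omitted and Mg2+ is shown only as a cofactor:

```latex
\[
\text{fructose 6-phosphate} + \text{ATP}
\xrightarrow{\ \text{PFK},\ \mathrm{Mg}^{2+}\ }
\text{fructose 1,6-bisphosphate} + \text{ADP}
\]
```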

    \

    Deficiency in PFK leads to glycogenosis type VII (Tarui's disease), an autosomal recessive disorder characterised by severe nausea, vomiting, muscle cramps and myoglobinuria in response to bursts of intense or vigorous exercise PUBMED:7825568. Sufferers are usually able to lead a reasonably ordinary life by learning to adjust activity levels PUBMED:7825568.

    \ 3467 IPR006115 \

    6-Phosphogluconate dehydrogenase (6PGD) is an oxidative decarboxylase that catalyses the NADP+-dependent oxidative decarboxylation of 6-phosphogluconate to ribulose 5-phosphate. This reaction is part of the hexose monophosphate shunt, also known as the pentose phosphate pathway (PPP) PUBMED:2113917, PUBMED:6641716. Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequences are highly conserved PUBMED:1659648. The protein is a homodimer in which the monomers act independently PUBMED:6641716: each contains a large, mainly alpha-helical domain and a smaller beta-alpha-beta domain, containing a mixed parallel and anti-parallel 6-stranded beta sheet PUBMED:6641716. NADP is bound in a cleft in the small domain, and the substrate binds in an adjacent pocket PUBMED:6641716.
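The net 6PGD reaction is summarised here with protons omitted. It makes the oxidative decarboxylation explicit: one carbon leaves as CO2 while NADP+ is reduced to NADPH.

```latex
\[
\text{6-phosphogluconate} + \mathrm{NADP}^{+}
\longrightarrow
\text{ribulose 5-phosphate} + \mathrm{CO_2} + \mathrm{NADPH}
\]
```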

    This family represents the NAD binding domain of 6-phosphogluconate dehydrogenase, which adopts a Rossmann fold. The C-terminal domain is described in .

    \ 1124 IPR008172 \

    These sequences are functionally identified as members of the adenylate cyclase family, which catalyses the conversion of ATP to 3',5'-cyclic AMP and pyrophosphate.
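In equation form, with protons omitted, the adenylate cyclase reaction described above is:

```latex
\[
\mathrm{ATP} \longrightarrow \text{3',5'-cyclic AMP} + \mathrm{PP_i}
\]
```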

    \ \

    The protein CyaB from Aeromonas hydrophila is a second adenylyl cyclase from that species, as demonstrated by complementation in Escherichia coli and by assay of the enzymatic properties of purified recombinant protein PUBMED:9642185. It has no detectable homology to any other protein of known function and has several unusual properties, including an optimal temperature of 65 °C and an optimal pH of 9.5. A cluster of uncharacterised archaeal homologs may be orthologous and may serve (under certain circumstances) to produce the regulatory metabolite cyclic AMP (cAMP).

    \ \ 560 IPR004474 \ This entry describes a domain of unknown function that is found in the predicted extracellular domain of a number of putative membrane-bound proteins. One of these is protein psr, described as a penicillin-binding protein 5 (PBP-5) synthesis repressor. Another is Bacillus subtilis LytR, described as a transcriptional attenuator of itself and of the LytABC operon, where LytC is an N-acetylmuramoyl-L-alanine amidase. A third is CpsA, a putative regulatory protein involved in exocellular polysaccharide biosynthesis. These proteins share a short putative N-terminal cytoplasmic domain and a transmembrane domain that together form a signal-anchor.\ 2772 IPR005199 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    This is a family of endo-beta-D-glucuronidases, or heparanases, belonging to glycoside hydrolase family 79 (). Heparan sulphate proteoglycans (HSPGs) play a key role in the self-assembly, insolubility and barrier properties of basement membranes and extracellular matrices. Hence, cleavage of heparan sulphate (HS) affects the integrity and functional state of tissues, and thereby fundamental normal and pathological phenomena involving cell migration and responses to changes in the extracellular microenvironment. Heparanase degrades HS at specific intrachain sites. The enzyme is synthesized as a latent protein of approximately 65 kDa that is processed at the N terminus into a highly active form of approximately 50 kDa. Experimental evidence suggests that heparanase may facilitate both tumor cell invasion and neovascularization, two critical steps in cancer progression. The enzyme is also involved in cell migration associated with inflammation and autoimmunity PUBMED:11530216.

    \ 6099 IPR010432 \

    This domain contains three highly conserved amino acids, one arginine and two aspartates, hence the name RDD domain. The region contains two predicted transmembrane helices. The arginine occurs at the N terminus of the first helix, and the first aspartate occurs in the middle of this helix. The molecular function of this region is unknown; however, it may be involved in transporting an as yet unknown set of ligands.

    \ 1597 IPR002108 \

    The ADF/cofilins are a family of actin-binding proteins expressed in all eukaryotic cells examined so far. Members of this family remodel the actin cytoskeleton through interactions with both monomeric and filamentous actin, for example during cytokinesis, when the actin-rich contractile ring shrinks as it contracts. ADF/cofilins sever actin filaments (F-actin) and/or bind actin monomers (G-actin), preventing actin polymerization by sequestering the monomers. The ADF/cofilins consist of a single folded domain, the ADF homology domain, which is also found in other actin-binding protein families. The most conserved region of these proteins is a twenty amino-acid segment that ends some 30 residues from the C-terminal end PUBMED:1313794. The main actin-binding structure is a long alpha-helix.

    \

    Plants and animals have multiple ADF/cofilin genes, which in vertebrates belong to two types, ADF and cofilins. Other eukaryotes (such as yeast, Acanthamoeba and slime moulds) have a single ADF/cofilin gene. The following proteins are evolutionarily related and belong to a family of low molecular weight (137 to 166 residues) actin-depolymerizing proteins PUBMED:8399167, PUBMED:8440472, PUBMED:8357799, PUBMED:8107682:

    \ \ \ 6195 IPR006396 \

    These sequences represent the E (epsilon) subunit of methylaspartate mutase (glutamate mutase), a cobalamin-dependent enzyme that catalyzes the first step in a pathway of glutamate fermentation.

    \ 3346 IPR003416 \ The MgtC protein is found in an operon with the Mg2+ transporter protein MgtB. The function of MgtC and its homologues is not known, but MgtC is thought to act as an accessory protein for MgtB, thus mediating magnesium influx into the cytosol. Also included in this family are the Bacillus subtilis SapB protein and several hypothetical proteins.\ 3002 IPR000232 \ Heat shock factor (HSF) is a transcriptional activator of heat shock genes PUBMED:2257625: it binds specifically to heat shock promoter elements, which are palindromic sequences rich in repetitive purine and pyrimidine motifs PUBMED:2257625. Under normal conditions, HSF is a homo-trimeric cytoplasmic protein, but heat shock activation results in relocalisation to the nucleus PUBMED:1871105. Each HSF monomer contains one C-terminal and three N-terminal leucine zipper repeats PUBMED:1871106. Point mutations in these regions disrupt cellular localisation, rendering the protein constitutively nuclear PUBMED:1871105. Two sequences flanking the N-terminal zippers fit the consensus of a bipartite nuclear localisation signal (NLS). Interaction between the N- and C-terminal zippers may result in a structure that masks the NLS sequences; following activation of HSF, these may be unmasked, resulting in relocalisation of the protein to the nucleus PUBMED:1871106. The DNA-binding component of HSF lies N-terminal to the first NLS region and is referred to as the HSF domain.\ 7962 IPR012519 \

    This family consists of the type A lantibiotic peptides. Both Pep5 and epicidin-280 are ribosomally synthesised antimicrobial peptides produced by Gram-positive bacteria and characterised by the presence of lanthionine and/or methyllanthionine residues. The lantibiotics have highly specific activity against multidrug-resistant bacteria and have potential for use in a wide range of medical applications PUBMED:2253617, PUBMED:9726851.

    \ 2990 IPR000476 \ Glycoprotein hormones (or gonadotropins) PUBMED:6267989, PUBMED:1445230 are a family of proteins that includes the mammalian hormones follitropin (FSH), lutropin (LH), thyrotropin (TSH) and the placental chorionic gonadotropins (CG), including hCG and eCG PUBMED:6314263, as well as at least two forms of fish gonadotropins. These hormones are central to the complex endocrine system that regulates normal growth, sexual development and reproductive function PUBMED:6177696. The hormones LH, FSH and TSH are secreted by the anterior pituitary gland, while hCG and eCG are secreted by the placenta PUBMED:1713773. All these hormones consist of two glycosylated chains, alpha and beta: an alpha subunit that is common to each protein dimer (well conserved within species, but differing between them PUBMED:6177696), and a unique beta subunit, which confers biological specificity PUBMED:6314263. The alpha chains are highly conserved proteins of about 100 amino acid residues, which contain ten conserved cysteines all involved in disulphide bonds PUBMED:8202136, as shown in the following schematic representation.

                            +---------------------------+
                +----------+|             +-------------|--+
                |          ||             |             |  |
            xxxxCxCxxxxxxCxCCxxxxxxxxxxxxxCCxxxxxxxxxxCxCxxCx
                  |      |                 |          |
                  +------|-----------------+          |
                         |                            |
                         +----------------------------+

    'C': conserved cysteine involved in a disulphide bond.
    
    \ Intracellular levels of free alpha subunits are greater than those of the mature glycoprotein, implying that hormone assembly is limited by the appearance of the specific beta subunits, and hence that the synthesis of alpha and beta subunits is independently regulated PUBMED:6314263.\ 3294 IPR006983 \ The MbeD and MobD proteins are plasmid encoded and are involved in plasmid mobilisation and transfer in the presence of conjugative plasmids PUBMED:2671664.\ 4717 IPR007039 \

    Conjugal transfer protein, TrbC, has been identified as a subunit of the pilus precursor in bacteria. The protein undergoes three processing steps before gaining its mature cyclic structure PUBMED:12160637.

    \ 2433 IPR003331 \ UDP-N-acetylglucosamine 2-epimerase catalyses the production of UDP-ManNAc from UDP-GlcNAc. Some of the enzymes in this family are bifunctional. In microorganisms the epimerase is involved in the synthesis of the capsule precursor UDP-ManNAcA PUBMED:9515923, PUBMED:9440531. The protein from rat liver displays both epimerase and kinase activity PUBMED:9305888.\ 8067 IPR013209 \

    This domain is found in Saccharomyces cerevisiae protein SMP2, proteins with an N-terminal lipin domain () and phosphatidylinositol transfer proteins PUBMED:8437575. SMP2 is involved in plasmid maintenance and respiration PUBMED:12376568. Lipin proteins are involved in adipose tissue development and insulin resistance PUBMED:11792863.

    \ 2099 IPR007380 \ This is a group of uncharacterised proteins.\ 55 IPR007192 \

    The anaphase-promoting complex is composed of multiple protein subunits, including BimE (APC1), CDC27 (APC3), CDC16 (APC6) and CDC23 (APC8). This entry is for CDC23.

    \ 316 IPR006984 \ This family comprises uncharacterized eukaryotic proteins.\ 7597 IPR011685 \ This is a group of mainly hypothetical eukaryotic proteins. Putative features found in LETM1, such as a transmembrane domain and CK2 and PKC phosphorylation sites PUBMED:10486213, are relatively conserved throughout the family. Deletion of LETM1 is thought to be involved in the development of Wolf-Hirschhorn syndrome in humans PUBMED:10486213. A member of this family, , is known to be expressed in the mitochondria of Drosophila melanogaster PUBMED:10071211, suggesting that this may be a group of mitochondrial proteins.\ 3252 IPR001123 \ Lysine exporter protein is involved in the efflux of excess L-lysine as a means of controlling intracellular L-lysine levels. A number of proteins belong to this family. These include the chemotactic transduction protein from Pseudomonas aeruginosa, the threonine efflux protein and a number of uncharacterised proteins from a variety of sources.\ 3764 IPR006198 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine proteases have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
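The "linear arrangement" of a triad is simply the order in which the catalytic residues occur along the sequence. A minimal sketch, assuming the conventional textbook residue numbering (chymotrypsin His57, Asp102, Ser195; subtilisin BPN' Asp32, His64, Ser221); the helper `triad_order` is hypothetical and only illustrates how the HDS and DHS labels arise.

```python
def triad_order(residues: dict[str, int]) -> str:
    """Return the one-letter codes of the catalytic residues, sorted by
    their position in the sequence (hypothetical illustrative helper).

    `residues` maps a one-letter residue code ('S', 'H', 'D') to its
    sequence position.
    """
    ordered = sorted(residues.items(), key=lambda item: item[1])
    return "".join(code for code, _position in ordered)


# Chymotrypsin, clan SA (chymotrypsinogen numbering)
chymotrypsin = {"S": 195, "H": 57, "D": 102}
# Subtilisin BPN', clan SB
subtilisin = {"S": 221, "H": 64, "D": 32}

print(triad_order(chymotrypsin))  # HDS
print(triad_order(subtilisin))    # DHS
```

Only the relative order matters, so any consistent numbering scheme for a given protein gives the same label.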

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
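Because the first character of a family or clan identifier encodes the catalytic type, the type can be read straight off an identifier such as S24 or S26A. A minimal sketch using only the letters listed above; the function name `catalytic_type` is hypothetical and not part of any MEROPS software.

```python
# Catalytic-type letters as listed in the text; "P" is used for clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (clan only)",
}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by the first character of a
    MEROPS-style family or clan identifier, e.g. 'S24' or 'SF'."""
    if not identifier:
        raise ValueError("empty identifier")
    letter = identifier[0].upper()
    try:
        return CATALYTIC_TYPES[letter]
    except KeyError:
        raise ValueError(f"unknown catalytic type letter: {letter!r}") from None


for ident in ("S24", "S26A", "S13", "SE"):
    print(ident, "->", catalytic_type(ident))
```

All four example identifiers, taken from the families and clans discussed in this section, resolve to the serine type.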

    \ \

    This signature is associated with serine peptidases belonging to MEROPS peptidase families S24 (LexA family, clan SF), S26A (signal peptidase I) and S26B (signalase). \ \ The S24 family includes:

    \

    \ \

    All of these proteins, with the possible exception of RulA, interact with RecA, which activates self-cleavage, either derepressing transcription in the case of CI and LexA PUBMED:10692372 or activating the lesion-bypass polymerase in the case of UmuD and MucA. UmuD'2 is the homodimeric component of DNA pol V and is produced from UmuD by RecA-facilitated self-cleavage, in which the first 24 N-terminal residues of UmuD are removed; pol V (UmuD'2C) is a DNA lesion bypass polymerase PUBMED:10692372, PUBMED:11483531. MucA PUBMED:9925794, PUBMED:11016960, which is plasmid encoded, is, like UmuD, a component of a DNA polymerase (pol RI) that is converted into the active lesion-bypass polymerase by a self-cleavage reaction involving RecA PUBMED:11114935.

    \ \

    The S26A and S26B families are signal peptidases (SPases), also known as leader peptidases, which remove signal peptides from secretory proteins. In prokaryotes three types of SPase are known: type I (gene lepB), which is responsible for processing the majority of exported pre-proteins; type II (gene lsp), which processes only lipoproteins; and a third type involved in processing pilin subunits.

    \ \ \

    Eukaryotic microsomal signal peptidase is involved in the removal of signal peptides from secretory proteins as they pass into the endoplasmic reticulum PUBMED:7845208. The peptidase is more complex than its mitochondrial and bacterial counterparts, containing a number of subunits, ranging from two in the chicken oviduct peptidase to five in the dog pancreas protein PUBMED:7845208. They share sequence similarity with the bacterial leader peptidases (family S26A), although activity here is mediated by a serine/histidine dyad rather than a serine/lysine dyad PUBMED:7845208. Archaeal signal peptidases also belong to this group.

    \ \ \

    This group of proteins also contains proteins classified as non-peptidase homologues as they either have been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for catalytic activity.

    \ 1654 IPR007870 \ DNA transcription, replication, repair and/or recombination require DNA accessibility to factors involved in the initiation of such processes. In addition, protein complexes whose size is large compared with a nucleosome must be able to scan the DNA packaged in chromatin. This requires sequential changes in chromatin structure. Two major mechanisms have been proposed to achieve such changes:
    1. the post-translational modification of histones; and
    2. the action of ATP-dependent chromatin remodelling complexes.
    \ The function of this particular chromatin remodelling protein is currently unknown.\ 21 IPR008278 \

    These proteins transfer the 4'-phosphopantetheine (4'-PP) moiety from coenzyme A (CoA) to the invariant serine of the phosphopantetheine-binding (PP-binding) domain. This post-translational modification renders holo-ACP capable of acyl group activation via thioesterification of the cysteamine thiol of 4'-PP PUBMED:7559576. This superfamily consists of two subtypes: the ACPS type, such as ACPS_ECOLI, and the Sfp type, such as SFP_BACSU. The structure of the Sfp type is known PUBMED:10581256 and shows that the active site accommodates a magnesium ion. The most highly conserved regions of the alignment are involved in binding the magnesium ion.

    \ 326 IPR000375 \ Dynamin is a microtubule-associated, force-producing protein of 100 kDa which is involved in the production of microtubule bundles. At the N terminus of dynamin is a GTPase domain (see ), and at the C terminus is a PH domain (see ). Between these two domains lies a central region of unknown function.\ 3470 IPR002504 \ Members of this family are ATP-NAD kinases. The enzymes catalyse the phosphorylation of NAD to NADP using ATP and other nucleoside triphosphates, as well as inorganic polyphosphate, as phosphoryl donors.\ \ 3946 IPR007027 \ These proteins belong to the poxvirus F11 family. They are early virus proteins.\ 2942 IPR002690 \ This family consists of various capsid proteins from members of the Herpesviridae. In herpes simplex virus, the capsid protein VP23 forms a triplex together with VP19C; these triplexes fit between and link together adjacent capsomers formed by VP5 and VP26 PUBMED:10400780. VP23, along with the scaffolding proteins, helps to form normal capsids by defining the curvature of the shell and the size of the particle PUBMED:10400780.\ 4652 IPR003733 \

    Thiamine monophosphate synthase (TMP) () catalyzes the substitution of the pyrophosphate of 2-methyl-4-amino-5-hydroxymethylpyrimidine pyrophosphate by 4-methyl-5-(beta-hydroxyethyl)thiazole phosphate to yield thiamine phosphate in the thiamine biosynthesis pathway PUBMED:9139923.
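Written as a reaction, the coupling step looks as follows. This sketch uses the abbreviations HMP-PP (the pyrimidine pyrophosphate) and HET-P (the thiazole phosphate), which are conventional but not used elsewhere in this entry, and omits protons.

```latex
\text{HMP-PP} + \text{HET-P} \;\longrightarrow\; \text{thiamine phosphate} + \mathrm{PP_i}
```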

    \ \

    TENI, a Bacillus subtilis protein that regulates the production of several extracellular enzymes by reducing alkaline protease production, belongs to this group PUBMED:1898926.

    \ 1461 IPR005612 \

    This domain is present in the CCAAT-binding protein, which is essential for growth and necessary for 60S ribosomal subunit biogenesis. Other proteins containing this domain stimulate transcription from the HSP70 promoter.

    \ 5973 IPR009314 \

    One of the members of this family is a 4.9 kDa protein encoded by bovine coronavirus NS1 PUBMED:2142556.

    \ 5868 IPR009266 \

    This family consists of several Adenovirus E3 proteins. The E3 protein does not seem to be essential for virus replication in cultured cells suggesting that the protein may function in virus-host interactions PUBMED:7769690.

    \ 483 IPR002202 \

    Hydroxymethylglutaryl-coenzyme A reductase () (HMG-CoA reductase) PUBMED:2491679, PUBMED:3065625 catalyzes the NADP-dependent synthesis of mevalonate from 3-hydroxy-3-methylglutaryl-CoA. In vertebrates, HMG-CoA reductase is the rate-limiting enzyme in cholesterol biosynthesis. In plants, mevalonate is the precursor of all isoprenoid compounds. The reduction of HMG-CoA to mevalonate is regulated by feedback inhibition by sterols, including cholesterol, and by non-sterol metabolites derived from mevalonate PUBMED:2991281.
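The NADP-dependent reduction described above consumes two equivalents of NADPH, because the thioester is reduced all the way to a primary alcohol. A sketch of the overall stoichiometry in the biosynthetic direction (standard textbook form, not taken from the cited papers):

```latex
\text{(S)-HMG-CoA} + 2\,\mathrm{NADPH} + 2\,\mathrm{H^{+}} \;\longrightarrow\; \text{(R)-mevalonate} + \text{CoA-SH} + 2\,\mathrm{NADP^{+}}
```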

    \

    HMG-CoA reductase is a membrane-bound glycoprotein that remains in the endoplasmic reticulum after synthesis and glycosylation PUBMED:3065625. Structurally, it consists of 3 domains: an N-terminal region that contains a variable number of transmembrane segments (7 in mammals, insects and fungi; 2 in plants), a linker region, and a C-terminal catalytic domain of approximately 400 amino acid residues. Although little sequence similarity is found between the transmembrane domains of HMG-CoA reductases from different species, the C-terminal catalytic domain is well conserved. The structure of this region is predicted to consist of amphipathic helices flanking an extended beta-pleated sheet.

    \

    In archaebacteria PUBMED:1556098, HMG-CoA reductase, which is involved in the biosynthesis of the isoprenoid side chains of lipids, seems to be cytoplasmic and lacks the N-terminal hydrophobic domain.

    \

    Some bacteria, such as Pseudomonas mevalonii, can use mevalonate as the sole carbon source. These bacteria use an NAD-dependent HMG-CoA reductase () to convert mevalonate into 3-hydroxy-3-methylglutaryl-CoA, the reverse of the biosynthetic reaction PUBMED:1556098. The Pseudomonas enzyme is structurally related to the catalytic domain of NADP-dependent HMG-CoA reductases.

    \ 6234 IPR009440 \

    This family consists of several bacterial StbA plasmid stability proteins PUBMED:1706707.

    \ 8038 IPR013223 \

    This domain includes the N-terminal OB domain found in ribonuclease B proteins in one or two copies.

    \ 4623 IPR001031 \ Thioesterase domains often occur integrated in, or associated with, peptide synthetases, which are involved in the non-ribosomal synthesis of peptide antibiotics PUBMED:9560421. Thioesterases are required for the addition of the last amino acid to the peptide antibiotic, thereby forming a cyclic antibiotic. In almost all cases, the operons encoding these enzymes are flanked by genes encoding proteins with similarity to the type II fatty acid thioesterases of vertebrates.\ 7532 IPR011645 \ The HNOBA (Heme NO Binding) domain is found associated with the HNOB domain and in soluble cyclases and signalling proteins. The HNOB domain is predicted to function as a heme-dependent sensor for gaseous ligands and to transduce diverse downstream signals in both bacteria and animals.\ 6394 IPR010555 \

    This family represents the chondroitin sulphate attachment domain of vertebrate neural transmembrane proteoglycans that contain EGF modules. Evidence has accumulated to support the idea that neural proteoglycans are involved in various cellular events, including mitogenesis, differentiation, axonal outgrowth and synaptogenesis PUBMED:9321696. This domain contains several potential sites of chondroitin sulphate attachment, as well as potential sites of N-linked glycosylation PUBMED:9950058.

    \ 53 IPR001471 \

    The pathogenesis-related genes transcriptional activator binds to the GCC-box pathogenesis-related promoter element and activates the plant's defense genes. Ethylene, chemically the simplest plant hormone, participates in a number of stress responses and developmental processes: e.g., fruit ripening, inhibition of stem and root elongation, promotion of seed germination and flowering, senescence of leaves and flowers, and sex determination PUBMED:7732375. DNA sequence elements that confer ethylene responsiveness (ethylene-responsive elements, EREs) have been shown to contain two 11-bp GCC boxes, which are necessary and sufficient for transcriptional control by ethylene. Ethylene responsive element binding proteins (EREBPs) have now been identified in a variety of plants. The proteins share a similar domain of around 59 amino acids, which interacts directly with the GCC box in the ERE.

    \ 3668 IPR003115 \ Proteins containing this domain appear to be related to the Escherichia coli plasmid protein ParB, which preferentially cleaves single-stranded DNA. ParB also nicks supercoiled plasmid DNA, preferably at sites with potential single-stranded character, such as AT-rich regions and sequences that can form cruciform structures. ParB also exhibits 5'-3' exonuclease activity.\ 4966 IPR002889 \ The WSC domain is a putative carbohydrate-binding domain. The domain contains up to eight conserved cysteine residues that may be involved in disulphide bridges. The Trichoderma harzianum beta-1,3 exoglucanase contains two copies of the WSC domain, while the yeast SLG1 protein contains only one.\ 7201 IPR010860 \

    This family consists of several bacterial CAMP factor (Cfa) proteins, which seem to be specific to Streptococcus species. The CAMP reaction is a synergistic lysis of erythrocytes by the interaction of an extracellular protein (CAMP factor) produced by some streptococcal species with the Staphylococcus aureus sphingomyelinase C (beta-toxin) PUBMED:10456923.

    \ 3862 IPR001263 \

    Phosphatidylinositol 3-kinase (PI3-kinase) () is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The role of the accessory domain of PI3-kinase is unclear; it may be involved in substrate presentation PUBMED:8248783.

    \ 2254 IPR006739 \ This family includes several uncharacterised proteins from Borrelia species.\ 7036 IPR009857 \

    This family consists of several hypothetical bacterial proteins of around 70 residues in length. Members of this family are often referred to as YejL. The function of this family is unknown.

    \ 3444 IPR007860 \ This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with several other domains, , , and . The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH 1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in hereditary non-polyposis colorectal cancer (HNPCC) and is a mismatch-binding protein PUBMED:8036718. This domain corresponds to domain II of Thermus aquaticus MutS as characterised in PUBMED:11048710, and has similarity to RNase H-like domains (see ).\ 6525 IPR010605 \

    This family contains hypothetical plant proteins of unknown function.

    \ 7449 IPR011476 \

    This is a family of hypothetical proteins found in Rhodopirellula baltica.

    \ 6620 IPR010647 \

    This is a group of proteins of unknown function.

    \ 2552 IPR000527 \ The flgH, flgI and fliF genes of Salmonella typhimurium encode the major proteins for the L, P and M rings of the flagellar basal body PUBMED:2544561. In fact, the basal body consists of four rings (L,P,S and M) surrounding the flagellar rod, which is believed to transmit motor rotation to the filament PUBMED:2129540. The M ring is integral to the inner membrane of the cell, and may be connected to the rod via the S (supramembrane) ring, which lies just distal to it. The L and P rings reside in the outer membrane and periplasmic space, respectively. FlgH and FlgI, which are exported across the cell membrane to their destinations in the outer membrane and periplasmic space, have typical N-terminal cleaved signal-peptide sequences. FlgH is predicted to have an extensive beta-sheet structure, in keeping with other outer membrane proteins PUBMED:2544561.\ 8122 IPR013195 \

    This short region is found at the N terminus of some hepatitis core proteins. Its four conserved cysteine residues suggest a zinc-binding domain.

    \ 7973 IPR012610 \

    This family consists of the small acid-soluble spore proteins (SASP) of the H type (SspH). SspH proteins are unique to spores of Bacillus subtilis and are expressed only in the forespore compartment during sporulation of this organism. The sspH genes are monocistronic and are recognised by the forespore-specific RNA polymerase sigma factor, sigma-G. The specific role of this protein is unclear, but it is thought to function in sporulation under conditions different from those of the common laboratory tests of spore properties PUBMED:10333516.

    \ 3761 IPR000667 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S27) of serine proteases have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This family of serine peptidases belongs to MEROPS peptidase family S13 (D-Ala-D-Ala carboxypeptidase C, clan SE). The predicted active-site residues for members of this family and of family S12 occur in the motif SXXK.
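In the motif notation SXXK, X stands for any residue, so candidate sites can be located with a simple pattern search. A minimal sketch, assuming a plain one-letter protein sequence; the function `find_sxxk` and the test sequence are hypothetical, and a motif match alone is of course not evidence of peptidase activity.

```python
import re

# Serine, any two residues, lysine. The lookahead makes finditer report
# overlapping occurrences as well.
SXXK = re.compile(r"(?=(S[A-Z]{2}K))")


def find_sxxk(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for every SXXK
    occurrence in a one-letter protein sequence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SXXK.finditer(seq)]


print(find_sxxk("MASTAKLLSGSKSAAK"))
```

Here the three hits start at positions 3, 9 and 13 of the made-up sequence.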

    \

    D-Ala-D-Ala carboxypeptidase C is involved in the metabolism of cell components PUBMED:1741619; it is synthesised with a leader peptide that targets it to the cell membrane PUBMED:7845208. After cleavage of the leader peptide, the enzyme is retained in the membrane by a C-terminal anchor PUBMED:7845208. There are three families of serine-type D-Ala-D-Ala peptidase (designated S11, S12 and S13), which are also known as low molecular weight penicillin-binding proteins PUBMED:7845208. Family S13 comprises D-Ala-D-Ala peptidases that have sufficient sequence similarity around their active sites to assume a distant evolutionary relationship to other clan members; members of the S13 family also bind penicillin and have D-amino-peptidase activity. Proteases of family S11 have exclusive D-Ala-D-Ala peptidase activity, while some members of S12 are class C beta-lactamases PUBMED:7845208.

    \ \ 1636 IPR001218 \ The coronavirus nucleocapsid protein. Sequence comparison of the N genes of five strains of the coronavirus mouse hepatitis virus suggests a three-domain structure for the nucleocapsid protein PUBMED:2171216. There seems to be a specific interaction between the coronavirus mouse hepatitis virus A59 nucleocapsid protein and the packaging signal PUBMED:9426448.\ 4132 IPR007668 \ The RFX family is a family of winged-helix DNA-binding proteins. RFX1 is a regulatory factor essential for expression of MHC class II genes. This region is found N-terminal to the RFX DNA-binding region () in some mammalian RFX proteins, and is thought to activate transcription when associated with DNA. Deletion analysis has identified the region 233-351 in human RFX1 () as being required for maximal activation PUBMED:9278482.\ 3846 IPR001211 \

    Phospholipase A2 (PLA2) is a small lipolytic enzyme that releases fatty acids from the second carbon (sn-2 position) of glycerol. It is involved in a number of physiologically important cellular processes, such as the liberation of arachidonic acid from membrane phospholipids PUBMED:7664098. It plays a pivotal role in the biosynthesis of prostaglandins and other mediators of inflammation. PLA2 has four to seven disulphide bonds and binds a calcium ion that is essential for activity. Within the active enzyme, the alpha-amino group is involved in a conserved hydrogen-bonding network linking the N-terminal region to the active site. The side chains of two conserved residues, His and Asp, participate in the catalytic network.

    \ \

    PLA2s are widely distributed in snakes, lizards, bees and mammals. In mammals there are at least four forms: pancreatic, membrane-associated and two less well characterized forms. The venom of most snakes contains multiple forms of PLA2. Some of them are presynaptic neurotoxins which inhibit neuromuscular transmission by blocking acetylcholine release from the nerve termini.

    \

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee: King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation.
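    The designation rule can be written as a short function. The Python sketch below is an illustration under stated assumptions, not the subcommittee's procedure: it builds names such as Api m 1 (the honeybee, Apis mellifera) and, for a given set of binomials, lengthens the species part one letter at a time until clashing designations differ. The helper names and the genera in the clash example are hypothetical, and disambiguation is only relative to the names passed in, not to the official allergen list.

```python
from collections import defaultdict


def species_code(genus: str, species: str, species_letters: int = 1) -> str:
    """First three letters of the genus, a space, then the first
    `species_letters` letters of the species epithet."""
    return f"{genus[:3].capitalize()} {species[:species_letters].lower()}"


def allergen_name(genus: str, species: str, number: int, species_letters: int = 1) -> str:
    """Genus/species code, a space, and an arabic number."""
    return f"{species_code(genus, species, species_letters)} {number}"


def disambiguated_codes(binomials):
    """Map each (genus, species) pair to a code that is unique within the
    input, adding species letters only to pairs whose codes collide."""
    pairs = sorted(set(binomials))
    letters = {pair: 1 for pair in pairs}
    while True:
        by_code = defaultdict(list)
        for pair in pairs:
            by_code[species_code(*pair, letters[pair])].append(pair)
        clashes = [group for group in by_code.values() if len(group) > 1]
        if not clashes:
            return {pair: species_code(*pair, letters[pair]) for pair in pairs}
        for group in clashes:
            for pair in group:
                # Stop if the species epithet has no letters left to add.
                if letters[pair] >= len(pair[1]):
                    raise ValueError(f"cannot disambiguate {' '.join(pair)}")
                letters[pair] += 1


print(allergen_name("Apis", "mellifera", 1))
# Hypothetical genera chosen only to force a clash.
print(disambiguated_codes([("Genusaa", "alpha"), ("Genusbb", "alba")]))
```

    Letters are added to every member of a clashing group, following the "each species designation" wording; the loop ends because each clash either lengthens a code or raises an error.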

    \

    The allergens in this family include allergens with the following designations: Api m 1.

    \ \ 3264 IPR003543 \ The egg peptide speract receptor is a transmembrane glycoprotein of about 500 amino acids PUBMED:2538832. Topologically, it comprises a large extracellular domain of about 450 residues, followed by a transmembrane domain and a short cytoplasmic region of about 12 amino acids. The extracellular domain contains 4 repeats of a well-conserved region, which spans 115 amino acids and contains 6 conserved cysteines. A similar domain is also found towards the C-terminus of macrophage scavenger receptor type I PUBMED:1978939, a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during atherogenesis, and in the CD5 glycoprotein, which acts as a receptor in regulating T-cell proliferation.\ \

    The type I and type II human scavenger receptors are similar to their bovine, rabbit and murine counterparts. They consist of 6 domains: cytoplasmic (I); membrane-spanning (II); spacer (III); alpha-helical coiled-coil (IV); collagen-like (V); and a type-specific C-terminal (VI) PUBMED:2251254. Immunohistochemical studies have indicated the presence of scavenger receptors in the macrophages of lipid-rich atherosclerotic lesions, suggesting the involvement of these receptors in atherogenesis PUBMED:2251254.

    \ \

    The macrophage scavenger receptor is trimeric and has unusual ligand-binding properties PUBMED:2300204. The trimeric structure of the bovine type I scavenger receptor contains 3 extracellular C-terminal cysteine-rich domains connected to the transmembrane domain by a long fibrous stalk. The stalk structure, which consists of an alpha-helical coiled coil and a collagen-like triple helix, has not previously been observed in an integral membrane protein PUBMED:2300204.

    \ 3300 IPR009047 \

    Methyl-coenzyme M reductase (MCR) is the enzyme responsible for microbial formation of methane. It is a hexamer composed of 2 alpha, 2 beta, and 2 gamma subunits with two identical nickel porphinoid active sites PUBMED:9367957.

    \

    The C-terminal domain consists of an all-alpha multi-helical bundle.

    \ 1752 IPR001034 \

    The deoR-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of about 50-60 amino acids present in transcription regulators of the deoR family, involved in sugar catabolism. This family of prokaryotic regulators is named after Escherichia coli deoR, a repressor of the deo operon, which encodes nucleotide and deoxyribonucleotide catabolic enzymes. DeoR also negatively regulates the expression of nupG and tsx, a nucleoside-specific transport protein and a channel-forming protein, respectively.

    \ \

    DeoR-like transcription repressors occur in diverse bacteria as regulators of sugar and nucleoside metabolic systems. The effector molecules for deoR-like regulators are generally phosphorylated intermediates of the relevant metabolic pathway. The DNA-binding deoR-type HTH domain usually occurs in the N-terminal part. The C-terminal part can contain an effector-binding domain and/or an oligomerization domain. DeoR occurs as an octamer, whilst glpR and agaR are tetramers. Several operators may be bound simultaneously, which could facilitate DNA looping PUBMED:1731335, PUBMED:14731281.

    \ \ 6791 IPR009716 \

    This family represents a conserved region approximately 100 residues long within eukaryotic Ferroportin1 (FPN1), a protein that may play a role in iron export from the cell PUBMED:11809412. This family may represent a number of transmembrane regions in Ferroportin1.

    \ 3679 IPR006170 \

    The olfactory receptors of terrestrial animals exist in an aqueous environment, yet detect odorants that are primarily hydrophobic. The aqueous solubility of hydrophobic odorants is thought to be greatly enhanced via odorant binding proteins, which exist in the extracellular fluid surrounding the odorant receptors PUBMED:2010751. This family is composed of pheromone binding proteins (PBP), which are male-specific and associate with pheromone-sensitive neurons, and general-odorant binding proteins (GOBP).

    \ 6603 IPR009617 \

    This family consists of several hypothetical eukaryotic proteins of unknown function.

    \ 3687 IPR003376 \ Peridinin-chlorophyll-protein, a water-soluble light-harvesting complex that has a blue-green absorbing carotenoid as its main pigment, is present in most photosynthetic dinoflagellates. These proteins are composed of two similar repeated domains. These domains constitute a scaffold with pseudo-twofold symmetry surrounding a hydrophobic cavity filled by two lipid, eight peridinin, and two chlorophyll a molecules PUBMED:8650577.\ 6312 IPR009471 \

    This domain is found in the intracellular N-terminal region of the Teneurin family of proteins. These proteins are encoded by 'pair-rule' genes and are involved in tissue patterning PUBMED:11146505. The intracellular domain is cleaved in response to homophilic interaction of the extracellular domain, and translocates to the nucleus PUBMED:12361962, PUBMED:10588872. There it probably carries out some transcriptional regulatory activity PUBMED:12783990. The length of this region and its conservation suggest that there may be two structural domains here.

    \ 5262 IPR008853 \ This family contains several eukaryotic transmembrane proteins which are homologous to Homo sapiens transmembrane protein 9 (TMEM9). The TMEM9 gene encodes a 183 amino-acid protein that contains an N-terminal signal peptide, a single transmembrane region, three potential N-glycosylation sites and three conserved cysteine-rich domains in the N terminus, but no known functional domains. The protein is highly conserved between species from Caenorhabditis elegans to H. sapiens and belongs to a novel family of transmembrane proteins. The exact function of TMEM9 is unknown, although it has been found to be widely expressed and localised to the late endosomes and lysosomes PUBMED:12359240. Members of this family contain CXCXC repeats in their N-terminal region.\ 2666 IPR000638 \ Gas vesicles are small, hollow, gas-filled protein structures found in several cyanobacterial and archaebacterial microorganisms PUBMED:2513809. They allow the bacteria to position themselves at the depth most favourable for growth. Gas vesicles are hollow cylindrical tubes, closed by a hollow, conical cap at each end. Both the conical end caps and the central cylinder are made up of 4-5 nm wide ribs that run at right angles to the long axis of the structure. Gas vesicles seem to be constituted of two different protein components, GVPa and GVPc. GVPa, a small protein of about 70 amino acid residues, is the main constituent of gas vesicles and forms the essential core of the structure. The sequence of GVPa is extremely well conserved. GvpJ and gvpM, two proteins encoded in the cluster of genes required for gas vesicle synthesis in the archaebacteria Halobacterium halobium and Haloferax mediterranei, have been found PUBMED:1864501 to be evolutionarily related to GVPa. The exact function of these two proteins is not known, although they could be important for determining the shape of gas vesicles. The N-terminal domain of Aphanizomenon flos-aquae protein gvpA/J is also related to GVPa.\ 6295 IPR010507 \

    MYM-type zinc fingers were identified in MYM family proteins PUBMED:9716603. Human protein is involved in a chromosomal translocation and may be responsible for X-linked retardation in Xq13.1 PUBMED:8817323. is also involved in disease. In myeloproliferative disorders it is fused to FGF receptor 1 PUBMED:9576949; in atypical myeloproliferative disorders it is rearranged PUBMED:9694738. Members of the family are generally involved in development.

    \ 3618 IPR001457 \ Bacterial proton-translocating NADH-quinone oxidoreductase (NDH-1) is composed of 14 different subunits. The chain belonging to this family is a subunit that constitutes the membrane sector of the complex. It reduces ubiquinone to ubiquinol utilising NADH.\

    \ Plant chloroplastic NADH-plastoquinone oxidoreductase reduces plastoquinone to plastoquinol. Mitochondrial NADH-ubiquinone oxidoreductase from a variety of sources reduces ubiquinone to ubiquinol.

    \ 2919 IPR007619 \ In cytomegalovirus this protein is known as UL71. This family of proteins has no known function.\ 4539 IPR007324 \

    This probable domain is found in bacterial transcriptional regulators such as DeoR and SorC. One of these proteins, , has an N-terminal helix-turn-helix that binds to DNA. This domain is probably the regulatory ligand-binding region. SorC is regulated by sorbose, and other members of this family are likely to be regulated by other sugar substrates.

    \ 5089 IPR007926 \

    This family consists of several Borrelia P83/P100 antigen proteins.

    \ 2154 IPR007456 \ Members of this family of uncharacterised proteins are often named Smg.\ 4356 IPR002657 \

    This family of proteins is found in both prokaryotes and eukaryotes. They are related to the human bile acid:sodium symporters, which are transmembrane proteins functioning in the liver in the uptake of bile acids from portal blood plasma, a process mediated by the co-transport of Na+ PUBMED:1961729.

    \

    In yeast, overexpression of the ACR3 gene confers an arsenite- but not an arsenate-resistance phenotype PUBMED:9234670.

    \ 3828 IPR006528 \

    This group of sequences is identified by a region of about 110 amino acids found exclusively in phage-related proteins, internally or toward the C terminus. One member, gp7 of phage SPP1, appears to be involved in head morphogenesis.

    \ 2238 IPR007607 \ This family contains several uncharacterised hypothetical proteins.\ 2359 IPR002802 \

    The function of the archaebacterial proteins in this family is unknown.

    \ 7552 IPR011717 \

    This entry includes tetratricopeptide-like repeats not detected by the , and models. The tetratricopeptide repeat (TPR) motif is a protein-protein interaction module found in multiple copies in a number of functionally different proteins that facilitates specific interactions with a partner protein(s) PUBMED:10517866.

    \ 5190 IPR008026 \

    US12 is a key factor in the evasion of the cellular immune response against HSV-infected cells. Specific inhibition of the transporter associated with antigen processing (TAP) by US12 prevents peptide transport into the endoplasmic reticulum and subsequent loading of major histocompatibility complex (MHC) class I molecules PUBMED:10521276. US12 consists of three helices and is associated with cellular membranes PUBMED:10521276.

    \ 1129 IPR004940 \ This family corresponds to a short region of about 100 residues found in adhesins and hypothetical adhesin-like proteins from Mycoplasmas.\ \ 679 IPR007810 \ This region is found in a number of proteins identified as being involved in Golgi function and vacuolar sorting. The molecular function of this region is unknown. Proteins containing this domain also contain a C-terminal RING finger domain.\ 3623 IPR002187 \

    In Gram-negative bacteria, the activity and concentration of glutamine synthetase (GS) are regulated in response to nitrogen source availability. P-II, a trimeric protein encoded by the glnB gene, is a component of the adenylation cascade involved in the regulation of GS activity PUBMED:1702507. In nitrogen-limiting conditions, when the ratio of glutamine to 2-ketoglutarate decreases, P-II is uridylylated on a tyrosine residue to form P-II-UMP. P-II-UMP allows the deadenylation of GS, thus activating the enzyme. Conversely, in nitrogen excess, P-II-UMP is deuridylylated and then promotes the adenylation of GS. P-II also indirectly controls the transcription of the GS gene (glnA) by preventing NR-II (ntrB) from phosphorylating NR-I (ntrC), which is the transcriptional activator of glnA. Once P-II is uridylylated, these events are reversed.
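    The cascade above can be summarised as a Boolean model. This Python sketch is a deliberate simplification under stated assumptions: nitrogen status is reduced to a single flag standing for a low glutamine:2-ketoglutarate ratio, each modification is treated as all-or-nothing, phosphorylated NR-I is assumed to be the form that activates glnA, and the enzymes that add and remove the UMP and AMP groups are not modelled. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NitrogenResponse:
    pii_uridylylated: bool     # P-II present as P-II-UMP
    gs_adenylated: bool        # glutamine synthetase (GS) adenylated
    nri_phosphorylated: bool   # NR-I (ntrC) phosphorylated by NR-II (ntrB)

    @property
    def gs_active(self) -> bool:
        # Deadenylation activates GS, so adenylated GS is treated as inactive.
        return not self.gs_adenylated

    @property
    def glna_activated(self) -> bool:
        # Assumption: phosphorylated NR-I is the form that activates glnA.
        return self.nri_phosphorylated


def pii_cascade(nitrogen_limited: bool) -> NitrogenResponse:
    """Toy steady state of the P-II cascade for a given nitrogen status."""
    # Nitrogen limitation (low glutamine:2-ketoglutarate) -> P-II-UMP;
    # nitrogen excess -> P-II-UMP is deuridylylated.
    uridylylated = nitrogen_limited
    # P-II-UMP allows deadenylation of GS; unmodified P-II promotes adenylation.
    adenylated = not uridylylated
    # Unmodified P-II prevents NR-II from phosphorylating NR-I;
    # uridylylation reverses this.
    nri_p = uridylylated
    return NitrogenResponse(uridylylated, adenylated, nri_p)


for limited in (True, False):
    state = pii_cascade(limited)
    print(f"nitrogen_limited={limited}: GS active={state.gs_active}, "
          f"glnA activated={state.glna_activated}")
```

    The model captures only the direction of each switch; the cyanobacterial variant, in which P-II is phosphorylated on serine rather than uridylylated, is outside its scope.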

    \

    P-II is an extremely well conserved protein of about 110 amino acid residues. The tyrosine that is uridylylated is located in the central part of the protein. In cyanobacteria, P-II seems to be phosphorylated on a serine residue rather than being uridylylated. In methanogenic archaebacteria, the nitrogenase iron protein gene (nifH) is followed by two open reading frames highly similar to the eubacterial P-II protein PUBMED:2068380. These proteins could be involved in the regulation of nitrogen fixation. In the red alga Porphyra purpurea, there is a glnB homolog encoded in the chloroplast genome.

    \

    Other proteins highly similar to glnB are:

    \ \ 4978 IPR000538 \ The link domain PUBMED:8318021 is a hyaluronan (HA)-binding region found in proteins of vertebrates that are involved in the assembly of extracellular matrix, cell adhesion, and migration. The structure has been shown PUBMED:8797823 to consist of two alpha helices and two antiparallel beta sheets arranged around a large hydrophobic core similar to that of C-type lectin. This domain contains four conserved cysteines involved in two disulphide bonds. The link domain has also been termed HABM PUBMED:8318021 (HA binding module) and PTR PUBMED:8690089 (proteoglycan tandem repeat). Proteins with such a domain include the proteoglycans aggrecan, brevican, neurocan and versican, which are expressed in the CNS; the cartilage link protein (LP), a proteoglycan that together with HA and aggrecan forms multimolecular aggregates; tumor necrosis factor-inducible protein TSG-6, which may be involved in cell-cell and cell-matrix interactions during inflammation and tumorigenesis; and CD44 antigen, the main cell surface receptor for HA.\ 2213 IPR007565 \ This is a family of uncharacterised, hypothetical prokaryotic proteins.\ 5232 IPR008855 \ This family consists of several eukaryotic translocon-associated protein delta subunit precursors (TRAP-delta or SSR-delta). The exact function of this protein is unknown PUBMED:7492314.\ 4248 IPR008282 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S3 is known to be involved in the binding of initiator Met-tRNA. This family of ribosomal proteins includes S3 from bacteria, algae and plant chloroplast, cyanelle, archaebacteria, plant mitochondria, vertebrates, insects, Caenorhabditis elegans and yeast PUBMED:8036511. This entry represents the N-terminal domain.

    \ 5808 IPR009243 \

    This family consists of several short spider neurotoxin proteins including many from the Funnel-web spider.

    \ 4384 IPR004728 \ Members of the NSCC2 family have been sequenced from various yeast, fungal and animal species including Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens. These proteins are the Sec62 proteins, believed to be associated with the Sec61 and Sec63 constituents of the general protein secretory systems of yeast microsomes. They are also the non-selective cation (NS) channels of the mammalian cytoplasmic membrane. The yeast Sec62 protein has been shown to be essential for cell growth. The mammalian NS channel proteins have been implicated in platelet-derived growth factor (PDGF)-dependent single channel current in fibroblasts. These channels are essentially closed in serum-deprived tissue-culture cells and are specifically opened by exposure to PDGF. These channels are reported to exhibit equal selectivity for Na+, K+ and Cs+ with low permeability to Ca2+, and no permeability to anions.\ 4292 IPR002738 \

    Members of this protein family are part of the ribonuclease P complex, which carries out the endonucleolytic cleavage of RNA that removes the extra 5' nucleotides from tRNA precursors. This process is essential for tRNA processing.

    \ 1882 IPR003729 \

    This entry describes proteins of unknown function.

    \ 7432 IPR011461 \

    This is a large family of short hypothetical proteins in Leptospira interrogans.

    \ 6443 IPR009527 \

    This family consists of several short Circovirus proteins of unknown function.

    \ 5102 IPR007939 \

    This family consists of several bacterial copper resistance proteins. Copper is essential and serves as a cofactor for more than 30 enzymes, yet a surplus of copper is toxic and leads to free radical formation and oxidation of biomolecules. Therefore, copper homeostasis is a key requisite for every organism. CopB serves to extrude copper when it approaches toxic levels PUBMED:11696373 and has been shown to act as an ATPase.

    \ 2851 IPR007805 \

    Gas vesicles are intracellular, protein-coated, and hollow organelles found in cyanobacteria and halophilic archaea. They are permeable to ambient gases by diffusion and provide buoyancy, enabling cells to move upwards in liquid to access oxygen and/or light. Proteins containing this domain are involved in the formation of gas vesicles PUBMED:1404376.

    \ 7250 IPR010884 \

    This family contains sexual stage s48/45 antigens from Plasmodium (approximately 450 residues long). These are surface proteins expressed by Plasmodium male and female gametes that have been shown to play a conserved and important role in fertilisation PUBMED:11163248.

    \ 2797 IPR004139 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates and related proteins into distinct sequence based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \ Alpha-1,3-mannosyl-glycoprotein beta-1,2-N-acetylglucosaminyltransferase (GNT-I, GLCNAC-T I) transfers N-acetyl-D-glucosamine from UDP to high-mannose glycoprotein N-oligosaccharide. This is an essential step in the synthesis of complex or hybrid-type N-linked oligosaccharides. The enzyme is an integral membrane protein localized to the Golgi apparatus, and is probably distributed in all tissues. The catalytic domain is located at the C-terminus PUBMED:10406843. These proteins are members of glycosyl transferase family 13.\ 5765 IPR009232 \

    This region at the C terminus of the APC proteins binds the microtubule-associating protein EB-1 PUBMED:11514192. At the C terminus of the alignment is also a PDZ-binding domain. A short motif in the middle of the region appears to be found in the APC2 proteins (e.g. ).

    \ 3225 IPR000907 \

    Lipoxygenases are a class of iron-containing dioxygenases which catalyze the hydroperoxidation of lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants, where they may be involved in a number of diverse aspects of plant physiology including growth and development, pest resistance, and senescence or responses to wounding PUBMED:. In mammals a number of lipoxygenase isozymes are involved in the metabolism of prostaglandins and leukotrienes PUBMED:3017195. Sequence data is available for the following lipoxygenases:

    \ \ \
  • Plant lipoxygenases. Plants express a variety of cytosolic isozymes as well as what seems to be a chloroplast isozyme PUBMED:7508918.
  • Mammalian arachidonate 5-lipoxygenase.
  • Mammalian arachidonate 12-lipoxygenase.
  • Mammalian erythroid cell-specific 15-lipoxygenase.

    The iron atom in lipoxygenases is bound by four ligands, three of which are histidine residues PUBMED:8502991. Six histidines are conserved in all lipoxygenase sequences, five of which are clustered in a stretch of 40 amino acids. This region contains two of the three iron ligands; the other histidines have been shown PUBMED:1567851 to be important for the activity of lipoxygenases.

    \

    \ 1191 IPR000192 \ Aminotransferases share certain mechanistic features with other pyridoxal-phosphate-dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped PUBMED:8482384 into subfamilies. This family is called class-V.\ 2999 IPR002571 \ In response to elevated temperature, both prokaryotic and eukaryotic cells increase expression of a small family of chaperones. The regulatory network that functions to control the transcription of the heat shock genes in bacteria includes unique structural motifs in the promoter region of these genes and the expression of alternate sigma factors. One of the conserved structural motifs, the inverted repeat CIRCE element, is found in the 5' region of many heat shock operons PUBMED:8606155.\

    In Bacillus subtilis, three classes of heat shock genes regulated by different mechanisms have been described. Regulation of class I heat shock genes (dnaK and groE operons) involves an inverted repeat (CIRCE element) which most probably serves as an operator for a repressor PUBMED:8576042.

    \ 6450 IPR009532 \

    This family consists of several enterobacterial SepQ proteins from Escherichia coli and Citrobacter rodentium. The function of this family is unclear.

    \ 1313 IPR013317 \

    This entry represents the central domain of bacterial DnaA proteins PUBMED:8110826, PUBMED:1779750, PUBMED:2558436 that play an important role in initiating and regulating chromosomal replication. DnaA is an ATP- and DNA-binding protein. It binds specifically to 9 bp nucleotide repeats known as dnaA boxes which are found in the chromosome origin of replication (oriC).

    \

    DnaA is a protein of about 50 kDa that contains two conserved regions: the first is located in the N-terminal half and corresponds to the ATP-binding domain, the second is located in the C-terminal half and could be involved in DNA-binding. The protein may also bind the RNA polymerase beta subunit, the dnaB and dnaZ proteins, and the groE gene products (chaperonins) PUBMED:2172087.

    \ 3277 IPR007704 \ PIG-M has a DXD motif. The DXD motif is found in many glycosyltransferases that utilise nucleotide sugars. It is thought that the motif is involved in the binding of a manganese ion that is required for association of the enzymes with nucleotide sugar substrates PUBMED:11226175.\ 4869 IPR005361 \

    This is a small family of hypothetical bacterial proteins of unknown function.

    \ 6770 IPR009707 \

    This family consists of several bacterial GlpM membrane proteins. GlpM is a hydrophobic protein containing 109 amino acids. It is thought that GlpM may play a role in alginate biosynthesis in Pseudomonas aeruginosa PUBMED:7642508.

    \ 6549 IPR009596 \

    This entry represents the C terminus of a number of Arabidopsis thaliana hypothetical proteins of unknown function. Family members contain a conserved DFD motif.

    \ 4230 IPR000307 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities PUBMED:, groups: \

    \ \ \ S16 proteins have about 100 amino-acid residues.

    \ \ \ \ 7328 IPR003618 \ This domain is found in the central region of transcription elongation factor S-II and in several hypothetical proteins.\ 6285 IPR009458 \

    Ectatomin is a toxin from the venom of the ant Ectatomma tuberculatum. Ectatomin can efficiently insert into the plasma membrane, where it can form channels. Ectatomin was shown to inhibit L-type calcium currents in isolated rat cardiac myocytes PUBMED:10336635. In these cells, ectatomin induces a gradual, irreversible increase in ion leakage across the membrane, which can lead to cell death.

    \

    Ectatomin is comprised of two subunits, A and B, which are homologous. The structure of ectatomin reveals that each subunit consists of two alpha helices with a connecting hinge region, which form a hairpin structure that is stabilised by disulphide bridges. A disulphide bridge between the hinge regions of the two subunits links the heterodimer together, forming a closed bundle of four helices with a left-handed twist PUBMED:7881269.

    \ 4077 IPR004582 \

    To be effective as a mechanism that preserves genomic integrity, the DNA damage checkpoint must be extremely sensitive in its ability to detect DNA damage. In Saccharomyces cerevisiae the Ddc1/Rad17/Mec3 complex and Rad24 are DNA damage checkpoint components which may promote checkpoint activation by "sensing" DNA damage directly PUBMED:11691833. Rad24 shares sequence homology with replication factor C (RF-C), a protein complex that recognizes DNA template/RNA primer hybrids during DNA replication. The Ddc1 complex has structural homology to proliferating-cell nuclear antigen (PCNA), which clamps onto DNA and confers processivity to DNA polymerases delta and epsilon. Rad24 is postulated to recognize DNA lesions and then recruit the Ddc1 complex to generate checkpoint signals.

    \ 2894 IPR007013 \ Replicative DNA polymerases are capable of polymerizing tens of thousands of nucleotides without dissociating from their DNA templates. The high processivity of these polymerases is dependent upon accessory proteins that bind to the catalytic subunit of the polymerase or to the substrate. The Epstein-Barr virus (EBV) BMRF1 protein is an essential component of the viral DNA polymerase and is absolutely required for lytic virus replication PUBMED:9934686. BMRF1 is also a transactivator PUBMED:9934686. This family is predicted to have a UL42-like structure PUBMED:10882068.\ 6069 IPR010420 \

    This is a family of uncharacterised proteins found in both eukaryotes and bacteria.

    \ 8060 IPR013241 \

    This family of fungal proteins forms a subunit of RNase P, the ribonucleoprotein enzyme that cleaves the leader sequence of precursor tRNAs to generate mature tRNAs. The structure of Pop3 has been assigned the L7Ae/L30e fold PUBMED:15613537. This RNA-binding fold is also present in human RNase P subunit Rpp38, raising the possibility that Pop3p and Rpp38 are functional homologues.

    \ 2713 IPR008164 \ This short repeat of unknown function is found in multiple copies in several Caenorhabditis elegans proteins. The repeat is five residues long and consists of XGLTT, where X can be any amino acid.\ 164 IPR004202 \ Cytochrome c oxidase, a 13-subunit complex, is the terminal oxidase in the mitochondrial electron transport chain. This family is composed of cytochrome c oxidase subunit VIIc. The yeast member of this family is called COX VIII.\ 6403 IPR009509 \

    This family consists of several hypothetical proteins from Neisseria meningitidis. The function of this family is unknown.

    \ 2727 IPR001150 \

    Synonym(s): Pyruvate formate-lyase

    \ \

    Escherichia coli formate C-acetyltransferase () (genes pflB and pflD) is a key enzyme of anaerobic glucose metabolism: it converts pyruvate and CoA into acetyl-CoA and formate. Under anaerobic conditions, this enzyme is post-translationally converted from an inactive to an active form that carries a stable radical localised to a specific glycine in the C-terminal region PUBMED:1310545. Such a glycine radical also seems to be present PUBMED:8421692 in the anaerobic ribonucleoside-triphosphate reductase () of Escherichia coli (gene nrdD) and bacteriophage T4 (gene nrdD or sunY).

    \ 763 IPR004099 \ Proteins containing this domain include both class I and class II oxidoreductases and also NADH oxidases and peroxidases.\ 1185 IPR003198 \ This entry contains glycine () and inosamine () amidinotransferases, enzymes involved in creatine and streptomycin biosynthesis, respectively.\ 4416 IPR001085 \ Synonym(s): Serine hydroxymethyltransferase, Serine aldolase, Threonine aldolase\

    Serine hydroxymethyltransferase (SHMT) is a pyridoxal phosphate (PLP)-dependent enzyme and belongs to the aspartate aminotransferase superfamily (fold type I) PUBMED:10828359. The PLP group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme PUBMED:8305478. The enzyme carries out the interconversion of serine and glycine using PLP as the cofactor. SHMT catalyses the transfer of a hydroxymethyl group from N5,N10-methylenetetrahydrofolate to glycine, resulting in the formation of serine and tetrahydrofolate. Both eukaryotic and prokaryotic SHMT enzymes form tight obligate homodimers, and the mammalian enzyme forms a homotetramer PUBMED:10828359, PUBMED:11877399. PLP-dependent enzymes were previously classified into alpha, beta and gamma classes, based on the chemical characteristics (carbon atom involved) of the reaction they catalysed. The availability of several structures allowed a comprehensive analysis of the evolutionary classification of PLP-dependent enzymes, and it was found that the functional classification did not always agree with the evolutionary history of these enzymes. Structure and sequence analysis has revealed that the PLP-dependent enzymes can be classified into five major groups of different evolutionary origin: aspartate aminotransferase superfamily (fold type I), tryptophan synthase beta superfamily (fold type II), alanine racemase superfamily (fold type III), D-amino acid aminotransferase superfamily (fold type IV) and glycogen phosphorylase family (fold type V) PUBMED:8112347, PUBMED:7748903.

    \

    In vertebrates, glycine hydroxymethyltransferase exists in a cytoplasmic and a mitochondrial form, whereas only one form is found in prokaryotes.

    \ 5835 IPR010306 \

    This family consists of several bacterial phosphonate metabolism (PhnJ) sequences. The exact role that PhnJ plays in phosphonate utilisation is unknown.

    \ 689 IPR008283 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
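    Because the HEXXH core is a simple sequence pattern, it can be located with a regular expression. The following is a minimal sketch, not an established tool: the function name `find_hexxh` and the toy sequences are hypothetical. It scans for the HEXXH core only, excludes proline from the two middle positions (following the note that proline is never found in this site), and reports overlapping hits. The stricter abXHEbbHbc form would additionally need explicit residue sets for "uncharged" and "hydrophobic", which are not defined here.

```python
import re

# HEXXH core: His, Glu, two residues, His. The two middle positions exclude
# proline, since proline is reported never to occur in this site.
# The zero-width lookahead lets overlapping occurrences be reported.
HEXXH = re.compile(r"(?=(HE[^P][^P]H))")


def find_hexxh(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for each HEXXH occurrence in a one-letter sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in HEXXH.finditer(seq)]


if __name__ == "__main__":
    # Toy sequences for illustration only, not real proteins.
    print(find_hexxh("MKVAHEIGHSL"))  # [(4, 'HEIGH')]
    print(find_hexxh("MKVAHEPGHSL"))  # [] because P sits in an X position
```

    A hit only flags a candidate: as noted above, HEXXH is relatively common, so a match alone does not establish metalloprotease activity.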

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    This group of metallopeptidases belong to the MEROPS peptidase family M17 (leucyl aminopeptidase family, clan MF), the type example being leucyl aminopeptidase from Bos taurus.

    \ \

    Aminopeptidases are exopeptidases involved in the processing and regular turnover of intracellular proteins, although their precise role in cellular metabolism is unclear PUBMED:1555602, PUBMED:2395881. Leucine aminopeptidases cleave leucine residues from the N-terminus of polypeptide chains, but substantial cleavage rates are observed for all amino acids PUBMED:2395881.

    \ \

    The enzymes exist as homo-hexamers, comprising 2 trimers stacked on top of one another PUBMED:2395881. Each monomer binds 2 zinc ions and folds into 2 alpha/beta-type quasi-spherical globular domains, producing a comma-like shape PUBMED:2395881. The N-terminal 150 residues form a 5-stranded beta-sheet with 4 parallel and 1 anti-parallel strand sandwiched between 4 alpha-helices PUBMED:2395881. An alpha-helix extends into the C-terminal domain, which comprises a central 8-stranded saddle-shaped beta-sheet sandwiched between groups of helices, forming the monomer hydrophobic core PUBMED:2395881. A 3-stranded beta-sheet resides on the surface of the monomer, where it interacts with other members of the hexamer PUBMED:2395881. The two zinc ions and the active site are entirely located in the C-terminal catalytic domain PUBMED:2395881.

    \ \ 7755 IPR012489 \

    This family consists of protein sequences that are similar to the nuclease A inhibitor expressed by bacteria of the genus Anabaena (NuiA, ). This protein adopts an alpha-beta-alpha sandwich fold, which is similar to the PR-1-like fold. NuiA interacts with nuclease A through residues located at one end of the molecule, including residues in the loop between helices III and IV and the loop between strands C and D. The mechanism by which NuiA inhibits nuclease A is not yet fully understood PUBMED:12095254.

    \ 7291 IPR010902 \

    NUMOD4 is a putative DNA-binding motif found in homing endonucleases and related proteins PUBMED:13678957.

    \ 1517 IPR006840 \ The ChaC protein is thought to be associated with the putative ChaA Ca2+/H+ cation transport protein in Escherichia coli. Its function is not known. This family also includes homologous regions from several other bacterial and eukaryotic proteins.\ 5534 IPR008425 \ This family consists of cyclin-dependent kinase inhibitor 3 or kinase-associated phosphatase proteins from several mammalian species. The cyclin-dependent kinase (Cdk)-associated protein phosphatase (KAP) is a Homo sapiens dual-specificity protein phosphatase that dephosphorylates Cdk2 on threonine 160 in a cyclin-dependent manner PUBMED:10987270, PUBMED:8127873.\ 7714 IPR013096 \

    This family represents the conserved barrel domain of the cupin superfamily PUBMED:9573603 (cupa is the Latin term for a small barrel).

    \ 3195 IPR007074 \ The LICD family of proteins show high sequence similarity and are involved in phosphorylcholine metabolism. There is evidence to show that LicD2 mutants have a reduced ability to take up choline, have decreased ability to adhere to host cells and are less virulent PUBMED:10200966.\ 2273 IPR006901 \

    This is a family of uncharacterised bacterial proteins.

    \ 2192 IPR007486 \ Some family members may be secreted or integral membrane proteins.\ 4909 IPR004907 \

    ATP synthase () is a multisubunit non-phosphorylated ATPase that is involved in the transport of ions. V-type (vacuolar) ATPases are responsible for acidifying a variety of intracellular compartments in eukaryotic cells. V-ATPase is a heteromultimeric enzyme composed of a peripheral catalytic V1 complex of components A to H, attached to an integral membrane V0 proton pore complex, components A, C, C', C'' and D. This family represents subunit C of the peripheral V1 complex of vacuolar ATPase, which is responsible for the assembly of the catalytic sector of the enzyme and probably has a specific function in its catalytic activity.

    \ 7636 IPR012894 \

    The members of this entry contain a region that is found towards the N-terminus of the HipA protein expressed by various bacterial species (for example ). This protein is known to be involved in high-frequency persistence to the lethal effects of inhibition of either DNA or peptidoglycan synthesis PUBMED:1715862. When expressed alone, it is toxic to bacterial cells PUBMED:1715862, but it is usually tightly associated with HipB PUBMED:8021189, and the HipA-HipB complex may be involved in autoregulation of the hip operon. The hip proteins may be involved in cell division control and may interact with cell division genes or their products PUBMED:8021189.

    \ 7731 IPR012872 \

    The hypothetical eukaryotic proteins found in this family are of unknown function.

    \ 726 IPR002929 \

    This family consists mainly of the potato leaf roll virus readthrough protein. This is generated by translational readthrough of the stop codon of open reading frame 3, which encodes the coat protein, allowing translation of open reading frame 5 to give an extended coat protein with a large C-terminal addition, the readthrough domain PUBMED:7513925. The readthrough protein is thought to play a role in the circulative aphid transmission of potato leaf roll virus PUBMED:7513925. Also in the family are open reading frame 6 from beet western yellows virus and potato leaf roll virus, both luteoviruses, and an unknown protein from cucurbit aphid-borne yellows virus, a closterovirus.

    \ 1540 IPR004220 \ 5-carboxymethyl-2-hydroxymuconate isomerase () transforms 5-carboxymethyl-2-hydroxy-muconic acid into 5-oxo-pent-3-ene-1,2,5-tricarboxylic acid during the third step of the homoprotocatechuate catabolic pathway.\ 465 IPR003594 \ This domain is found in several ATP-binding proteins, for example: histidine kinase, DNA gyrase B, topoisomerases, heat shock protein HSP90, phytochrome-like ATPases and DNA mismatch repair proteins.\ 1101 IPR003853 \ This is a family of adenoviral early E1A proteins. The E1A protein is 32 kDa, but it can be cleaved to yield the 28 kDa protein. The E1A protein is responsible for the transcriptional activation of the early genes within the viral genome at the start of the infection process, as well as of some cellular genes PUBMED:1835093.\ 6759 IPR009699 \

    This family consists of several Mastadenovirus E4 ORF3 proteins. Early proteins E4 ORF3 and E4 ORF6 have complementary functions during viral infection. Both proteins facilitate efficient viral DNA replication, late protein expression, and prevention of concatenation of viral genomes. A unique function of E4 ORF3 is the reorganisation of nuclear structures known as PML oncogenic domains (PODs). The function of these domains is unclear, but PODs have been implicated in a number of important cellular processes, including transcriptional regulation, apoptosis, transformation, and response to interferon PUBMED:12692231.

    \ 5425 IPR008769 \

    Polyhydroxyalkanoates (PHAs) are storage polyesters synthesised by various bacteria as intracellular carbon and energy reserve material. PHAs are accumulated as water-insoluble inclusions within the cells. This family consists of the phasins PhaF and PhaI, which act as transcriptional regulators of PHA biosynthesis genes. PhaF has been proposed to repress expression of the phaC1 gene and the phaIF operon.

    \ 1726 IPR001653 \ Diaminopimelate epimerase () catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the biosynthetic pathway leading from aspartate to lysine. This enzyme is a protein of about 30 kDa. Two conserved cysteines seem to function as the acid and base in the catalytic mechanism PUBMED:9843410.\ 4699 IPR004242 \ This family includes an En/Spm-like transposable element, Tdc1 from carrot PUBMED:9180694. The function of these proteins is unknown.\ 5512 IPR008537 \ This family contains proteins of unknown function from archaeal, bacterial and plant species.\ 7257 IPR009993 \

    This family contains the bacterial enzyme 4-alpha-L-fucosyltransferase (Fuc4NAc transferase) (approximately 360 residues long). This catalyses the synthesis of Fuc4NAc-ManNAcA-GlcNAc-PP-Und (lipid III) as part of the biosynthetic pathway of enterobacterial common antigen (ECA), a polysaccharide comprised of the trisaccharide repeat unit Fuc4NAc-ManNAcA-GlcNAc PUBMED:11673418.

    \ 6007 IPR009332 \

    This family consists of several eukaryotic Surfeit locus protein 5 (SURF5) sequences. The human Surfeit locus has been mapped on chromosome 9q34.1. The locus includes six tightly clustered housekeeping genes (Surf1-6), and the gene organisation is similar in human, mouse and chicken Surfeit locus. The exact function of this family is unknown PUBMED:11891058.

    \ 1311 IPR001470 \ Chlorosomes, which are attached to the inner surface of the cytoplasmic membrane, consist of four polypeptides and associated pigments and lipids. The principal light-harvesting pigment of the green filamentous bacterium Chloroflexus aurantiacus is bacteriochlorophyll (Bchl) c. This pigment is either bound to, or constrained by, a small approximately 80-residue polypeptide designated Bchlc-binding protein. In C. aurantiacus, a C-terminal extension is believed to play a role in proper incorporation of the protein during chlorosome assembly PUBMED:2376566. The protein has a high degree of similarity to Bchlc-binding proteins of other photosynthetic bacteria.\ 2057 IPR005220 \

    This family includes putative periplasmic proteins.

    \ 2803 IPR004295 \

    This family includes the gp36 protein from retroviruses such as mouse mammary tumor virus (MMTV) and human endogenous retroviruses (HERVs). The gp36 protein is an envelope protein that has a predicted transmembrane helix at its amino terminus.

    \ 3286 IPR000835 \

    The marR-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH) domain of about 135 amino acids present in transcription regulators of the marR/slyA family, involved in the development of antibiotic resistance. This family of transcription regulators is named after Escherichia coli marR, a repressor of genes which activate the multiple antibiotic resistance and oxidative stress regulons, and after slyA from Salmonella typhimurium and E. coli, a transcription regulator that is required for virulence and survival in the macrophage environment. Regulators with the marR-type HTH domain are present in bacteria and archaea and control a variety of biological functions, including resistance to multiple antibiotics, household disinfectants, organic solvents and oxidative stress agents, and regulation of virulence factor synthesis in pathogens of humans and plants. Many of the marR-like regulators respond to aromatic compounds PUBMED:10498949, PUBMED:10094687, PUBMED:12649270.

    \ \

    The crystal structures of marR, mexR and slyA have been determined and show a winged HTH DNA-binding core flanked by helices involved in dimerization. The DNA-binding domains are ascribed to the superfamily of winged helix proteins, containing a three (four)-helix (H) bundle and a three-stranded antiparallel beta-sheet (B) in the topology: H1-(H1')-H2-B1-H3-H4-B2-B3-H5-H6. Helices 3 and 4 comprise the helix-turn-helix motif and the beta-sheet is called the wing. Helix 4 is termed the recognition helix, as in other HTHs, where it binds the DNA major groove. Helices 1, 5 and 6 are involved in dimerization, as most marR-like transcription regulators form dimers PUBMED:12649270, PUBMED:11473263.\

    \ 7224 IPR010870 \

    This family represents a conserved region approximately 400 residues long within the bacterial phosphate-selective porins O and P. These are anion-specific porins, the binding site of which has a higher affinity for phosphate than chloride ions. Porin O has a higher affinity for polyphosphates, while porin P has a higher affinity for orthophosphate PUBMED:1370289. In P. aeruginosa, porin O was found to be expressed only under phosphate-starvation conditions during the stationary growth phase PUBMED:1406271.

    \ 7265 IPR009998 \

    This family contains the precursor of the bacterial protein YfaZ (approximately 180 residues long). Many members of this family are hypothetical proteins.

    \ 5787 IPR009235 \

    This family consists of several hypothetical baculovirus proteins of unknown function.

    \ 4508 IPR006776 \

    The precise function of SsgA is unknown. It is an acidic, cytosolic protein which has been found to be essential for spore formation, and to stimulate cell division in Streptomyces coelicolor PUBMED:11004161.

    \ 5341 IPR008873 \ Conjugative transfer of a bacteriocin plasmid, pPD1, of Enterococcus faecalis is induced in response to a peptide sex pheromone, cPD1, secreted from plasmid-free recipient cells. cPD1 is taken up by a pPD1 donor cell and binds to an intracellular receptor, TraA. Once a recipient cell acquires pPD1, it starts to produce an inhibitor of cPD1, termed iPD1, which functions as a TraA antagonist and blocks self-induction in donor cells. TraA transduces the signal of cPD1 to the mating response PUBMED:12399504.\ 2169 IPR007510 \ This is a family of hypothetical archaeal proteins.\ 5598 IPR008656 \ This family consists of several inositol 1,3,4-trisphosphate 5/6-kinase proteins. Inositol 1,3,4-trisphosphate is at a branch point in inositol phosphate metabolism. It is dephosphorylated by specific phosphatases to either inositol 3,4-bisphosphate or inositol 1,3-bisphosphate. Alternatively, it is phosphorylated to inositol 1,3,4,6-tetrakisphosphate or inositol 1,3,4,5-tetrakisphosphate by inositol trisphosphate 5/6-kinase PUBMED:8662638.\ 4592 IPR006960 \ This protein family is uncharacterized. Proteins accumulate in large amounts in tenuivirus infected cells. They are found in the inclusion bodies that are formed after infection PUBMED:8317091.\ 7627 IPR012437 \

    This family contains sequences covering an approximately 270 amino acid stretch of a group of hypothetical proteins. These proteins are expressed by archaeal species of the Methanosarcina genus.

    \ 7572 IPR011665 \ This region covers both the Brf homology II and III regions PUBMED:12660736. This region is involved in binding the TATA-binding protein PUBMED:12660736.\ 255 IPR004951 \ This family consists of proteins of unknown function found in Caenorhabditis species.\ 83 IPR002372 \ Pyrrolo-quinoline quinone (PQQ) is a redox coenzyme, which serves as a cofactor for a number of enzymes (quinoproteins) and particularly for some bacterial dehydrogenases PUBMED:2549854, PUBMED:2572081. A number of bacterial quinoproteins belong to this family.\ \

    Enzymes in this group have repeats of a beta propeller.

    \ 6163 IPR009404 \

    This family consists of several Coronavirus 5a proteins. The function of this family is unknown PUBMED:9168126.

    \ 8048 IPR013180 \

    This domain is found in eukaryotic proteins. A human nuclear protein with this domain () is thought to have a role in apoptosis PUBMED:12659813.

    \ 5065 IPR007902 \

    This family includes CHL4, which is involved in chromosome segregation PUBMED:8243998. It is required for chromosome stability but is non-essential for growth.

    \ 1519 IPR012328 \ Synonym(s): Chalcone synthase, Flavanone synthase, 6'-deoxychalcone synthase \

    Naringenin-chalcone synthases (CHS) () and stilbene synthases (STS) (formerly known as resveratrol synthases) are related plant enzymes. CHS is an important enzyme in flavonoid biosynthesis and STS is a key enzyme in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or a stilbene (with STS) PUBMED:.

    \ \

    These enzymes have a conserved cysteine residue, located in the central section of the protein sequence, which is essential for the catalytic activity of both enzymes and probably represents the binding site for the 4-coumaroyl-CoA group PUBMED:2033084.

    \

    This domain of chalcone synthase is reported to be structurally similar to domains in thiolase and beta-ketoacyl synthase. The differences in activity are accounted for by differences in the N-terminal domain.

    \ 5444 IPR008708 \ This family consists of several Neisseria meningitidis TspB virulence factor proteins.\ 1857 IPR002835 \

    The function of the prokaryotic proteins in this family is unknown.

    \ 7686 IPR012445 \

    This family is made up of sequences derived from hypothetical eukaryotic proteins of unknown function.

    \ 1282 IPR002379 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase () is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using the ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \

    Synonym(s): ATP synthase, F(1)-ATPase

    \ \

    The H(+)-transporting two-sector ATPase () produces ATP from ADP in the presence of a proton gradient across the membrane. These ATPases have two components, CF(1), the catalytic core, and CF(0), the membrane proton channel. CF(1) has five subunits, alpha (3), beta (3), gamma (1), delta (1) and epsilon (1). CF(0) seems to have nine subunits, A, B, C, D, E, F, G, F6 and 8 (or A6L).

    \

    The CF(0) C subunit (also called protein 9, proteolipid, or subunit III) PUBMED:1832049, PUBMED:1533253 is a highly hydrophobic protein of about 8 kDa which has been implicated in the proton-conducting activity of ATPase. Structurally, the C subunit consists of two long terminal hydrophobic regions, which probably span the membrane, and a central hydrophilic region. N,N'-dicyclohexylcarbodiimide (DCCD) can bind covalently to the C subunit, thereby abolishing ATPase activity. DCCD binds to a specific glutamate or aspartate residue located in the middle of the second hydrophobic region, near the C-terminus. Proteins in this family include bacterial, plasma membrane and vacuolar ATP synthases.

    \ 4980 IPR006085 \

    Xeroderma pigmentosum (XP) PUBMED:8160271 is a human autosomal recessive disease, characterized by a high incidence of sunlight-induced skin cancer. Skin cells from affected individuals are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. XP-G is one of the rarest and most phenotypically heterogeneous forms of XP, showing anything from slight to extreme dysfunction in DNA excision repair PUBMED:8464724, PUBMED:8206890. XP-G can be corrected by a 133 kDa nuclear protein, XPGC PUBMED:8160271. XPGC is an acidic protein that confers normal UV resistance in expressing cells PUBMED:8206890. It is a magnesium-dependent, single-strand DNA endonuclease that makes structure-specific endonucleolytic incisions in a DNA substrate containing a duplex region and single-stranded arms PUBMED:8206890, PUBMED:8090225. XPGC cleaves one strand of the duplex at the border with the single-stranded region PUBMED:8090225.

    \

    XPG belongs to a family of proteins that includes RAD2 from budding yeast and rad13 from fission yeast, which are single-stranded DNA endonucleases PUBMED:8090225, PUBMED:8247134; mouse and human FEN-1, a structure-specific endonuclease; RAD2 from fission yeast and RAD27 from budding yeast; fission yeast exo1, a 5'-3' double-stranded DNA exonuclease that may act in a pathway that corrects mismatched base pairs; yeast DHS1, and yeast DIN7. Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved.
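    The conserved pentapeptide E-A-[DE]-A-[QS] is written in PROSITE-style pattern notation. As a minimal sketch, the helper below (the name `prosite_to_regex` is hypothetical) translates only a subset of that notation into a Python regular expression: single residues, [..] alternatives, {..} exclusions, the x wildcard and (n) or (n,m) repeat counts. Anchors such as < and > are deliberately not handled, and the test sequence is a made-up toy string, not part of XPG.

```python
import re

# One pattern element: a residue letter, the wildcard x, an alternative set [..]
# or an exclusion set {..}, optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("E-A-[DE]-A-[QS]")
    print(regex)  # EA[DE]A[QS]
    hit = re.search(regex, "MKTEADAQLL")  # hypothetical toy sequence
    print(hit.start(), hit.group())  # 3 EADAQ
```

    Converting to a standard regex keeps the search itself in the `re` module rather than in custom matching code; extending the sketch to the full notation would mainly mean adding the terminal anchors.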

    \ 6407 IPR010558 \

    This family consists of several Caenorhabditis elegans-specific Ly-6-related HOT and ODR proteins. These proteins are involved in the olfactory system. Odr-2 mutants are defective in the ability to chemotax to odorants that are recognised by the two AWC olfactory neurons. Odr-2 encodes a membrane-associated protein related to the Ly-6 superfamily of GPI-linked signalling proteins PUBMED:11139503.

    \ 1622 IPR001260 \ Coprogen oxidase (i.e. coproporphyrin III oxidase or coproporphyrinogenase) catalyses the oxidative decarboxylation of coproporphyrinogen III to protoporphyrinogen IX in the haem and chlorophyll biosynthetic pathways PUBMED:8407975, PUBMED:8219054. The protein is a homodimer containing two internally bound iron atoms per molecule of native protein PUBMED:3516695. The enzyme is active in the presence of molecular oxygen, which acts as an electron acceptor. The enzyme is widely distributed, having been found in a variety of eukaryotic and prokaryotic sources.\ 3025 IPR001338 \ The surface of many fungal spores is covered by a hydrophobic sheath, the rodlet layer, whose main component is a protein known as the rodlet protein PUBMED:2065971, PUBMED:1459459. The rodlet proteins of Neurospora crassa (gene eas) and Emericella nidulans (gene rodA) are evolutionarily related to proteins found in the cell wall of fruiting bodies of the mushroom Schizophyllum commune PUBMED:2401401. Collectively, these low-molecular-weight, cysteine-rich (eight conserved cysteines), hydrophobic proteins are known as hydrophobins.\ 6343 IPR009485 \

    This family consists of several Borna disease virus P10 (or X) proteins. Borna disease virus (BDV) is unique among the non-segmented negative-strand RNA viruses of animals and man because it transcribes and replicates its genome in the nucleus of the infected cell. It has been suggested that the p10 protein plays a role in viral RNA synthesis or ribonucleoprotein transport PUBMED:10725419.

    \ 1964 IPR004949 \ This family includes proteins of unknown function from plants.\ 2951 IPR000320 \

    This domain identifies a group of sequences which belong to the MEROPS peptidase family C46 (clan CH). The type example is the hedgehog protein from Drosophila melanogaster, which self-processes by a one-time, cysteine-dependent self-cleavage.

    \ \ \ \

    Hedgehog is a family of secreted signal molecules required for embryonic cell differentiation. Members of this family are composed of two domains. These proteins are autocatalytically cleaved by the C-terminal domain. This entry represents the N-terminal domain, which is responsible for both local and long-range signalling activities.

    \ \

    The structure of this domain is known PUBMED:7477329 and reveals a tetrahedrally coordinated zinc ion that appears to be structurally analogous to the zinc coordination sites of zinc hydrolases, such as thermolysin and carboxypeptidase A. This putative catalytic site represents a distinct activity from the autoprocessing activity that resides in the carboxy-terminal domain.

    \ 1637 IPR002551 \ The type I glycoprotein S of coronavirus, trimers of which constitute the typical viral spikes, is assembled into virions through noncovalent interactions with the M protein. The spike glycoprotein is translated as a large polypeptide that is subsequently cleaved to S1 and S2 PUBMED:2984314. Both chimeric S proteins appeared to cause cell fusion when expressed individually, suggesting that they were biologically fully active PUBMED:10627571. The spike is a type I membrane glycoprotein that possesses a conserved transmembrane anchor and an unusual cysteine-rich (cys) domain that bridges the putative junction of the anchor and the cytoplasmic tail PUBMED:10725213.\ 7730 IPR012474 \

    This family is composed of plant proteins that are similar to FRIGIDA protein expressed by Arabidopsis thaliana (). This protein is probably nuclear and is required for the regulation of flowering time in the late-flowering phenotype. It is known to increase RNA levels of flowering locus C. Allelic variation at the FRIGIDA locus is a major determinant of natural variation in flowering time PUBMED:11030654.

    \ 3705 IPR000121 \ A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a phospho-histidine intermediate have been shown to be structurally related PUBMED:7686067, PUBMED:8973315, PUBMED:2176881, PUBMED:1557039. All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a histidine residue. The sequence around that residue is highly conserved. This domain is often found associated with the pyruvate phosphate dikinase, PEP/pyruvate-binding domain () at its N-terminus and the PEP-utilizing enzyme mobile domain.\ 3983 IPR003414 \ Polyphosphate kinase (Ppk) () catalyzes the formation of polyphosphate from ATP, with chain lengths of up to a thousand or more orthophosphate molecules. It is a membrane protein and goes through an intermediate stage during the reaction where it is autophosphorylated with a phosphate group covalently linked to a basic amino acid residue through an N-P bond.\ 6687 IPR010675 \

    This entry represents a conserved region of approximately 120 residues within eukaryotic Bicoid-interacting protein 3 (Bin3). Bin3, which shows similarity to a number of protein methyltransferases that modify RNA-binding proteins, interacts with Bicoid, which itself directs pattern formation in the early Drosophila embryo. The interaction might allow Bicoid to switch between its dual roles in transcription and translation PUBMED:10717484. Proteins in this entry contain a conserved HLN motif.

    \ 5821 IPR009113 \

    Mu1 is an outer capsid protein that acts as a reoviral penetration agent. Non-enveloped animal reoviruses must enter host cells by membrane penetration that does not involve membrane fusion, as they lack a viral membrane. Reoviruses are activated by proteolytic cleavage in the intestinal lumen, leading to infectious subviral particles. The core of the virus is coated by a layer of mu1 and sigma3 proteins. Proteases strip off sigma3 exposing mu1, which provides the membrane penetration machinery that perforates the membrane. Mu1 forms a trimer, where the three mu1 molecules are coiled around one another with a right-handed twist. The mu1 chain folds into four distinct domains: three intertwined, predominantly alpha helical domains and a jelly-roll beta-sandwich PUBMED:11832217.

    \ \ 7937 IPR012513 \

    This family consists of the metchnikowin family of antimicrobial peptides from Drosophila. Metchnikowin is a proline-rich peptide whose expression is immune-inducible. Induction of metchnikowin gene expression can be mediated either by the Toll pathway or by the imd gene product. The metchnikowin peptide is unique among the Drosophila antimicrobial peptides in that it is active against both bacteria and fungi PUBMED:9600835.

    \ 3978 IPR007532 \ The poxvirus early transcription factor (VETF), in addition to the viral RNA polymerase, is required for efficient transcription of early genes in vitro. VETF is a heterodimeric protein that binds specifically to early gene promoters. The heterodimer is composed of an 82 kDa subunit (this family) and a 70 kDa subunit.\ 4249 IPR001593 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of proteins that have from 220 to 250 amino acids.

    \ 1084 IPR001036 \

    The Escherichia coli acrA and acrB genes encode a multidrug efflux system that is believed to protect the bacterium against hydrophobic inhibitors PUBMED:8407802. E. coli AcrB is a transporter that is energized by the proton-motive force and that shows the widest substrate specificity among all known multidrug pumps, ranging from most of the currently used antibiotics, disinfectants, dyes, and detergents to simple solvents.

    \

    The structure of ligand-free AcrB shows that it is a homotrimer of 110 kD per subunit. Each subunit contains 12 transmembrane helices and two large periplasmic domains (each exceeding 300 residues) between helices 1 and 2, and helices 7 and 8. X-ray analysis of the overexpressed AcrB protein demonstrated that the three periplasmic domains form, in the center, a funnel-like structure and a connected narrow (or closed) pore. The pore is opened to the periplasm through three vestibules located at subunit interfaces. These vestibules were proposed to allow direct access of drugs from the periplasm as well as from the outer leaflet of the cytoplasmic membrane. The three transmembrane domains of AcrB protomers form a large, 30 Å-wide central cavity that spans the cytoplasmic membrane and extends to the cytoplasm.
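    As a small worked check of the per-subunit figures above (110 kD and 12 transmembrane helices per protomer), the whole-trimer totals follow by simple multiplication. The function name and its defaults are hypothetical, chosen only for illustration; the mass is an approximation that inherits the rounding of the 110 kD figure.

    ```python
    # Hypothetical helper: whole-trimer totals for AcrB, assuming the
    # per-protomer values quoted in the text (110 kD, 12 TM helices).
    def acrb_trimer_totals(n_protomers=3, kd_per_protomer=110, tm_per_protomer=12):
        """Return (approximate trimer mass in kD, total transmembrane helices)."""
        mass_kd = n_protomers * kd_per_protomer
        tm_helices = n_protomers * tm_per_protomer
        return mass_kd, tm_helices

    mass_kd, tm_helices = acrb_trimer_totals()
    print(mass_kd, tm_helices)  # 330 36
    ```

    The defaults encode the homotrimer described above; passing other values simply rescales the same arithmetic.
    
    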

    \

    X-ray crystallographic structures of the trimeric AcrB pump from Escherichia coli with four structurally diverse ligands demonstrated that three molecules of ligand bind simultaneously to the extremely large central cavity of 5000 cubic angstroms, primarily by hydrophobic, aromatic stacking and van der Waals interactions. Each ligand uses a slightly different subset of AcrB residues for binding. The bound ligand molecules often interact with each other, stabilizing the binding.

    \ 631 IPR004136 \

    2-Nitropropane dioxygenase catalyses the oxidation of nitroalkanes into their corresponding carbonyl compounds and nitrite, using either FAD or FMN as a cofactor PUBMED:15582992. This entry also includes fatty acid synthase subunit beta, which catalyses the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. The beta subunit contains domains for: [acyl-carrier protein] acetyltransferase and malonyltransferase, S-acyl fatty acid synthase thioesterase, enoyl-[acyl-carrier protein] reductase, and 3-hydroxypalmitoyl-[acyl-carrier protein] dehydratase.

    \ 6080 IPR009363 \

    This family consists of several bacterial and phage proteins of unknown function.

    \ 2725 IPR000343 \

    Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles, including porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or C4) pathway, which involves the single-step condensation of succinyl-CoA and glycine and is catalyzed by ALA synthase, and the C5 pathway, which starts from the five-carbon skeleton of glutamate. The C5 pathway operates in the chloroplasts of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.

    \ The initial step in the C5 pathway is carried out by members of this family, glutamyl-tRNA reductases (GluTR) PUBMED:1502723, which catalyze the Mg2+/NADPH-dependent conversion of glutamyl-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant release of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase. GSA is converted to ALA by GSA aminotransferase. The use of an aminoacyl-tRNA in a reaction other than peptide bond formation is highly unusual.

    \

    \ GluTR is a protein of about 50 kDa (467 to 550 residues) which contains a few conserved regions. The best conserved region is located at positions 99 to 122 in known GluTR sequences. This region seems important for the activity of the enzyme.

    \ 931 IPR004365 \

    The OB-fold (oligonucleotide/oligosaccharide-binding fold) is found in all three kingdoms and its common architecture presents a binding face that has adapted to bind different ligands. The OB-fold is a five/six-stranded closed beta-barrel formed by 70-80 amino acid residues. The strands are connected by loops of varying length which form the functional appendages of the protein. The majority of OB-fold proteins use the same face for ligand binding or as an active site. Different OB-fold proteins use this 'fold-related binding face' to, variously, bind oligosaccharides, oligonucleotides, proteins, metal ions and catalytic substrates.

    \

    This entry contains OB-fold domains that bind to nucleic acids PUBMED:10829230. It includes the anticodon-binding domain of lysyl-, aspartyl- and asparaginyl-tRNA synthetases (See ). Aminoacyl-tRNA synthetases catalyse the addition of an amino acid to the appropriate tRNA molecule (EC 6.1.1.-). This domain is also found in RecG, a helicase involved in DNA repair. Replication factor A is a heterotrimeric complex that contains a subunit in this family PUBMED:7760808, PUBMED:8990123. This domain is also found at the C terminus of the bacterial DNA polymerase III alpha chain.

    \ 361 IPR007397 \

    Proteins containing this domain are associated with F-box domains, hence the name FBA (F-box associated). This domain is probably involved in binding other proteins that will be targeted for ubiquitination. At least one member of this family is involved in binding to N-glycosylated proteins.

    \ 1375 IPR000775 \ Bindin, the major protein component of the acrosome granule of sea urchin sperm, mediates species-specific adhesion of sperm to the egg surface during fertilisation PUBMED:1991551, PUBMED:1775065. The protein coats the acrosomal process after externalisation by the acrosome reaction; it binds to sulphated, fucose-containing polysaccharides on the vitelline-layer receptor proteoglycans that cover the egg plasma membrane. Bindins from different genera show high levels of sequence similarity in both the mature bindin domain and the probindin precursor region. The most highly conserved region is a 42-residue segment in the central portion of the mature bindin protein. This domain may be responsible for conserved functions of bindin, while the more highly divergent flanking regions may be responsible for its species-specific properties PUBMED:1991551.\ 2904 IPR005205 \

    The immediate-early protein ICP4 (infected-cell polypeptide 4) is required for efficient transcription of early and late viral genes and is thus essential for productive infection. ICP4 is a large phosphoprotein that binds DNA in a sequence-specific manner as a homodimer. ICP4 represses transcription from the LAT, ICP4 and ORF-P promoters, which have a high-affinity ICP4-binding site spanning the transcription initiation site. ICP4 proteins have two highly conserved regions; this family contains the C-terminal region, which probably acts as an enhancer for the N-terminal region PUBMED:11739685.

    \ 3325 IPR000347 \ Members of this family are metallothioneins. These are cysteine-rich proteins that bind heavy metals. Members of this family appear to be closest to Class II metallothioneins.\ 4657 IPR007195 \

    TolB is a periplasmic protein from Escherichia coli that is part of the Tol-dependent translocation system used by group A and E colicins to penetrate and kill cells PUBMED:10545334, PUBMED:10673426. TolB has two domains: an alpha-helical N-terminal domain that shares structural similarity with the C-terminal domain of transfer RNA ligases, and a beta-propeller C-terminal domain that shares structural similarity with numerous members of the prolyl oligopeptidase family and, to a lesser extent, with class B metallo-beta-lactamases PUBMED:10545334. The function of the N-terminal domain is uncertain.

    \ 2001 IPR005529 \ This group of proteins may be related to the FARP (FMRFamide) family. So far, this repeat has been detected only in Arabidopsis thaliana.\ 4923 IPR007683 \ This is a family of bacterial proteins associated with virulence. They are defined by a conserved region found at the N terminus of the VapD protein PUBMED:1398971.\ 585 IPR001993 \

    A variety of substrate carrier proteins that are involved in energy transfer are found in the inner mitochondrial membrane or integral to the membrane of other eukaryotic organelles such as the peroxisome PUBMED:2158156, PUBMED:, PUBMED:8140286, PUBMED:8487299, PUBMED:8206158, PUBMED:8291088. Such proteins include: ADP,ATP carrier protein (ADP/ATP translocase); 2-oxoglutarate/malate carrier protein; phosphate carrier protein; tricarboxylate transport protein (or citrate transport protein); Graves disease carrier protein; yeast mitochondrial proteins MRS3 and MRS4; yeast mitochondrial FAD carrier protein; and many others. Structurally, these proteins can consist of up to three tandem repeats of a domain of approximately 100 residues, each domain containing two transmembrane regions.

    \ 4759 IPR000580 \ Several eukaryotic proteins are evolutionarily related and are thought to be involved in transcriptional regulation. These proteins are highly similar in a region of about 50 residues that includes a conserved leucine-zipper domain, most probably involved in homo- or hetero-dimerization. Proteins containing this signature include the vertebrate protein TSC-22 PUBMED:9022669, a transcriptional regulator which seems to act on the C-type natriuretic peptide (CNP) promoter; mammalian protein DIP (DSIP-immunoreactive peptide) PUBMED:8982256, a protein whose function is not yet known; Drosophila protein bunched PUBMED:7555710 (gene bun) (also known as shortsighted), a probable transcription factor required for peripheral nervous system morphogenesis, eye development and oogenesis; and the C. elegans hypothetical protein T18D3.7.\ 5095 IPR007932 \

    This family contains several Gp38 proteins from T-even-like phages. Gp38, together with a second phage protein, gp57, catalyses the organisation of gp37 but is absent from the phage particle. Gp37 is responsible for receptor recognition PUBMED:9680195.

    \ 1544 IPR002635 \ This family consists of the chorion superfamily proteins classes A, B, CA, CB and high-cysteine HCB from silk, gypsy and polyphemus moths. The chorion proteins make up the moth egg shell, a complex extracellular structure PUBMED:3462711.\ 4899 IPR006016 \ The universal stress protein UspA PUBMED:8152377 is a small cytoplasmic bacterial protein whose expression is enhanced when the cell is exposed to stress agents. UspA enhances the rate of cell survival during prolonged exposure to such conditions, and may provide a general "stress endurance" activity. The crystal structure of Haemophilus influenzae UspA PUBMED:11738040 reveals an alpha/beta fold similar to that of the Methanococcus jannaschii MJ0577 protein, which binds ATP PUBMED:9860944, though UspA lacks ATP-binding activity.\ 3380 IPR001668 \ With some plasmids, recombination can occur in a site-specific manner that is independent of RecA. In such cases, the recombination event requires another protein called Pre (plasmid recombination enzyme), also known as Mob (conjugative mobilization) PUBMED:2768188.\ \ 5541 IPR008687 \ This family consists of several bacterial MobC-like mobilisation proteins. MobC proteins belong to the group of relaxases. Together with MobA and MobB they bind to a single cis-active site of a mobilising plasmid, the origin of transfer (oriT) region PUBMED:11976306. The absence of MobC has several different effects on oriT DNA. Site- and strand-specific nicking by MobA protein is severely reduced, accounting for the lower frequency of mobilisation. The localised DNA strand separation required for this nicking is less affected, but becomes more sensitive to the level of active DNA gyrase in the cell. In addition, strand separation is not efficiently extended through the region containing the nick site. These effects suggest a model in which MobC acts as a molecular wedge for the relaxosome-induced melting of oriT DNA. 
The effect of MobC on strand separation may be partially complemented by the helical distortion induced by supercoiling. However, MobC extends the melted region through the nick site, thus providing the single-stranded substrate required for cleavage by MobA PUBMED:9302013.\ 4069 IPR002801 \

    Aspartate carbamoyltransferase (aspartate transcarbamylase, ATCase) exists as a dimer of catalytic trimers (3x33kDa) that are held together by three dimeric (2x17kDa) regulatory subunits ((c3)2(r2)3). ATCase plays a central role in the regulation of the pyrimidine pathway in bacteria. In (c3)2(r2)3 ATCases, the association of the catalytic subunits c3 with the regulatory subunits r2 is responsible for the establishment of positive co-operativity between catalytic sites for the binding of aspartate, and it dictates the pattern of allosteric response toward nucleotide effectors. ATCase from Escherichia coli is the most extensively studied allosteric enzyme PUBMED:7791626. Crystal structures of the T-state, the T-state with CTP bound, the R-state with N-phosphonacetyl-L-aspartate (PALA) bound, and the R-state with phosphonoacetamide plus malonate bound have been used in interpreting kinetic and mutational studies.
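    As a rough worked check of the (c3)2(r2)3 stoichiometry, the chain count and approximate holoenzyme mass follow directly from the subunit masses given above (33 kDa per catalytic chain, 17 kDa per regulatory chain). The helper name is hypothetical, and the total is only an estimate built from those rounded figures, not a measured mass.

    ```python
    # Hypothetical helper: composition of a (c3)2(r2)3 ATCase holoenzyme,
    # assuming the approximate chain masses quoted in the text.
    CATALYTIC_KDA = 33   # mass of one catalytic chain (c)
    REGULATORY_KDA = 17  # mass of one regulatory chain (r)

    def atcase_composition(n_cat_trimers=2, n_reg_dimers=3):
        """Return (catalytic chains, regulatory chains, approximate mass in kDa)."""
        n_c = 3 * n_cat_trimers   # each catalytic trimer contributes 3 chains
        n_r = 2 * n_reg_dimers    # each regulatory dimer contributes 2 chains
        mass_kda = n_c * CATALYTIC_KDA + n_r * REGULATORY_KDA
        return n_c, n_r, mass_kda

    print(atcase_composition())  # (6, 6, 300)
    ```

    With the defaults, six catalytic chains contribute 198 kDa and six regulatory chains 102 kDa, giving roughly 300 kDa for the holoenzyme.
    
    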

    \ \

    A high-resolution structure of E. coli ATCase in the presence of PALA (a bisubstrate analog) allows a detailed description of the binding at the active site of the enzyme and allows a detailed model of the tetrahedral intermediate to be constructed. The entire regulatory chain has been traced, showing that the N-terminal regions of the regulatory chains R1 and R6 are located in close proximity to each other and to the regulatory site. This portion of the molecule may be involved in the observed asymmetry between the regulatory binding sites as well as in the heterotropic response of the enzyme PUBMED:10651286.

    \ \

    ATCase from Erwinia herbicola differs from the other investigated enterobacterial ATCases in its absence of homotropic co-operativity toward the substrate aspartate and its lack of response to ATP, which is an allosteric effector (activator) of this family of enzymes. Nevertheless, the E. herbicola ATCase has the same quaternary structure as other enterobacterial ATCases, two trimers of catalytic chains with three dimers of regulatory chains ((c3)2(r2)3), and shows extensive primary structure conservation PUBMED:10600394.

    \ \ 7335 IPR011122 \

    These proteins are encoded by putative wav gene clusters, which are responsible for the synthesis of the core oligosaccharide (OS) region of Vibrio cholerae lipopolysaccharide PUBMED:11953379.

    \ 6493 IPR009552 \

    This family represents a conserved region of unknown function within NAC1 and a number of hypothetical proteins whose sequences bear resemblance to it. NAC1 is a constitutively-expressed POZ/BTB transcription factor found in mammalian neurones that can regulate behaviours associated with cocaine use PUBMED:12725910. All family members contain the domain.

    \ 6461 IPR009536 \

    This family consists of several Cucumber mosaic virus ORF IIB proteins. The function of this family is unknown.

    \ 1280 IPR000194 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, using ATP as a driving force via an ATP binding and hydrolysis cycle PUBMED:11309608.

    \ \ \ \ \ \

    This family includes the ATP synthase alpha and beta subunits and the ATP synthase associated with flagella. The sequences of the alpha and beta subunits are related, and both contain a nucleotide-binding site for ATP and ADP. The central region is almost always associated with the N-terminal domain (see ).

    \

    Vacuolar ATPases PUBMED:2531737 (V-ATPases) are responsible for acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector (70 kDa) is related to that of the F-ATPase beta subunit, while a 60 kDa subunit from the same sector is related to the F-ATPase alpha subunit PUBMED:2528146. Archaebacterial membrane-associated ATPases are composed of three subunits. The alpha chain is related to the F-ATPase beta chain and the beta chain is related to the F-ATPase alpha chain PUBMED:2528146. A protein highly similar to F-ATPase beta subunits is found PUBMED:8491729 in the apparatus of some bacteria involved in a specialized protein export pathway that proceeds without signal peptide cleavage. This protein is known as fliI in Bacillus subtilis and Salmonella typhimurium, Spa47 (mxiB) in Shigella flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia pestis virulence plasmids.

    \

    In bacteria the alpha chain is the regulatory subunit and the beta chain is the catalytic subunit. In V-type ATP synthase the archaeal alpha chain is the catalytic subunit while the beta chain is the regulatory subunit.

    \ \ 1126 IPR001114 \

    Adenylosuccinate synthetase plays an important role in purine biosynthesis by catalyzing the GTP-dependent conversion of IMP and aspartic acid to adenylosuccinate, the first committed step in the synthesis of AMP. Adenylosuccinate synthetase has been characterized from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are present: one involved in purine biosynthesis and the other in the purine nucleotide cycle.

    \

    The crystal structure of adenylosuccinate synthetase from E. coli reveals that the dominant structural element of each monomer of the homodimer is a central beta-sheet of 10 strands. The first nine strands of the sheet are mutually parallel with right-handed crossover connections between the strands. The 10th strand is antiparallel with respect to the first nine strands. In addition, the enzyme has two antiparallel beta-sheets, of two and three strands respectively, 11 alpha-helices and two short 3/10-helices. Further, it has been suggested that the similarities in the GTP-binding domains of the synthetase and the p21ras protein are an example of convergent evolution of two distinct families of GTP-binding proteins PUBMED:8244965. Structures of adenylosuccinate synthetase from Triticum aestivum and Arabidopsis thaliana, when compared with the known structures from E. coli, reveal that the overall fold is very similar to that of the E. coli protein PUBMED:10669609.

    \ \ 4005 IPR007447 \ This family includes ProQ, which is required for full activation of the osmoprotectant transporter ProP in Escherichia coli.\ 6940 IPR010778 \

    This family consists of several proteins which seem to be specific to red algal plasmids. Members of this family are typically around 415 residues in length. The function of this family is unknown.

    \ 7893 IPR012587 \

    This short region is found in two copies in p68-like RNA helicases PUBMED:15112237.

    \ 5213 IPR008680 \ This family consists of Homo sapiens and simian mastadenovirus early E4 13 kDa proteins. Human adenovirus type 9 (Ad9) is unique in eliciting exclusively estrogen-dependent mammary tumours in Rattus spp. and in not requiring viral E1 region transforming genes for tumorigenicity. E4 codes for an oncoprotein essential for tumourigenesis by Ad9 PUBMED:11134268.\ 215 IPR006134 \

    DNA is the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the life cycle of a cell. This function is performed by DNA-directed DNA polymerases, which add deoxyribonucleoside triphosphate (dNTP) residues to the 3'-end of the growing chain of DNA, using a complementary DNA chain as a template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used for the de novo synthesis of a DNA chain. Even though there are two different methods of priming, these are mediated by two very similar polymerase classes, A and B, with similar methods of chain elongation. \ \ A number of DNA polymerases have been grouped under the designation of DNA polymerase family B. Six regions of similarity (numbered from I to VI) are found in all or a subset of the B family polymerases. The most conserved region (I) includes a conserved tetrapeptide with two aspartate residues. Its function is not yet known; however, it has been suggested that it may be involved in binding a magnesium ion. All sequences in the B family contain a characteristic DTDS motif and possess many functional domains, including a 5'-3' elongation domain, a 3'-5' exonuclease domain PUBMED:8679562, a DNA-binding domain, and binding domains for both dNTPs and pyrophosphate PUBMED:9757117.

    \

    This region of DNA polymerase B appears to consist of more than one structural domain, possibly including elongation, DNA-binding and dNTP-binding activities PUBMED:9757117.

    \ 4336 IPR000699 \

    Ryanodine and inositol 1,4,5-trisphosphate (IP3) receptors are intracellular Ca2+-release channels. They become activated upon binding of their respective ligands, Ca2+ and IP3, opening an integral Ca2+ channel. Ryanodine receptor activation is a key component of muscular contraction, allowing release of Ca2+ from the sarcoplasmic reticulum. Mutations in the ryanodine receptor lead to malignant hyperthermia susceptibility and central core disease of muscle.

    \ 5780 IPR009233 \

    Natural genetic competence in Bacillus subtilis is controlled by quorum-sensing (QS). The ComP-ComA two-component system detects the signalling molecule ComX, and this signal is transduced by a conserved phosphotransfer mechanism. ComX is synthesised as an inactive precursor and is then cleaved and modified by ComQ before export to the extracellular environment PUBMED:12067344.

    \ 5424 IPR008860 \ This family consists of several antigen proteins from Taenia and Echinococcus (tapeworm) species.\ 5843 IPR009257 \

    This family consists of several short Chordopoxvirus proteins which are homologous to the A30L protein of Vaccinia virus. The vaccinia virus A30L protein is required for the association of electron-dense, granular, proteinaceous material with the concave surfaces of crescent membranes, an early step in viral morphogenesis. A30L is known to interact with the G7L protein and it has been shown that the stability of each is dependent on its association with the other PUBMED:12610117.

    \ 4631 IPR003669 \

    Two forms of microbial thymidylate synthase are known: ThyA and ThyX PUBMED:9665876. This family describes ThyX, a homotetrameric flavoprotein. Both enzymes convert dUMP to dTMP. Under oxygen-limiting conditions, thyX can complement a thyA mutation PUBMED:15046578.

    \ \ \ \ \ 2564 IPR005838 \

    Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior PUBMED:8969244. There have been four secretion systems described in animal enteropathogens such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia. The type III secretion system is of great interest as it is used to transport virulence factors from the pathogen directly into the host cell PUBMED:10334981 and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis PUBMED:10564516. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself PUBMED:10564516, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. It is believed that the family of type III inner membrane proteins are used as structural moieties in a complex with several other subunits PUBMED:9618447, including the ATPase necessary for driving the secretion system.

    \ \

    One such set of inner membrane proteins, termed "P" here for nomenclature purposes, includes the Salmonella and Shigella SpaP, the Yersinia YscR, the Erwinia HrcR, and the Xanthomonas Pro2 genes PUBMED:9618447, as well as several FliP flagellar biosynthesis genes PUBMED:10564516. FliP is a ~30 kDa protein containing three or four transmembrane (TM) regions.

    \ \ 3831 IPR006450 \

    This group of sequences represents small (~100 amino acids) proteins found in phage and in putative prophage regions of a number of bacterial genomes. The function of these sequences is unknown.

    \ 1246 IPR006660 \

    An anion-translocating ATPase has been identified as the product of the arsenical resistance operon of resistance plasmid R773 PUBMED:1704144. When expressed in Escherichia coli this ATP-driven oxyanion pump catalyses extrusion of the oxyanions arsenite, antimonite and arsenate. Maintenance of a low intracellular concentration of oxyanion produces resistance to the toxic agents. The pump is composed of two polypeptides, the products of the arsA and arsB genes. This two-subunit enzyme produces resistance to arsenite and antimonite. A third gene, arsC, expands the substrate specificity to allow for arsenate pumping and resistance. ArsC catalyzes the reduction of arsenate to arsenite.

    \ 2680 IPR006222 \ This is a family of glycine cleavage T-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes. The T-protein is an aminomethyltransferase that catalyses the following reaction:\ \ 7179 IPR009950 \

    This family consists of several hypothetical Enterobacterial proteins of around 80 residues in length. The function of this family is unknown.

    \ 446 IPR001674 \

    The amidotransferase family of enzymes utilizes the ammonia derived from the hydrolysis of glutamine for a subsequent chemical reaction catalyzed by the same enzyme. The ammonia intermediate does not dissociate into solution during the chemical transformations PUBMED:10387030. GMP synthetase is a glutamine amidotransferase from the de novo purine biosynthetic pathway. The C-terminal domain is specific to the GMP synthases. In prokaryotes this domain mediates dimerisation. Eukaryotic GMP synthases are monomers. This domain in eukaryotes includes several large insertions that may form globular domains PUBMED:8548458.

    \ \ 6874 IPR009761 \

    This family consists of a repeat of around 42 residues in length. These repeated sequences are found in multiple copies in Trypanosoma cruzi antigens; one such antigen contains 23 copies of this repeat.

    \ 4234 IPR002222 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The small subunit ribosomal proteins can be categorised as: primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA, but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins. The small ribosomal subunit protein S19 contains 88-144 amino acid residues. In Escherichia coli, S19 is known to form a complex with S13 that binds strongly to 16S ribosomal RNA. Experimental evidence PUBMED:9371771 has revealed that S19 is moderately exposed on the ribosomal surface, and it is designated a secondary rRNA binding protein. S19 belongs to a family of ribosomal proteins PUBMED:9371771, PUBMED:2044758 that includes: eubacterial S19; algal and plant chloroplast S19; cyanelle S19; archaebacterial S19; plant mitochondrial S19; and eukaryotic S15 ('rig' protein).

    \ 2058 IPR002708 \

    This domain is about 320 residues long and is found in proteins that have two C-terminal CBS domains. The function of DUF39 is unknown. The protein is described as inosine-5'-monophosphate dehydrogenase related protein VIII, based on the sequence similarity it shares with the CBS domains.

    \ 258 IPR005024 \

    This is a family of eukaryotic proteins which are variously described as hypothetical protein, developmental protein or related to yeast SNF7. The family contains human CHMP1. CHMP1 (CHromatin Modifying Protein; CHarged Multivesicular body Protein) is encoded by an alternative open reading frame in the PRSM1 gene PUBMED:8863740 and is conserved in both complex and simple eukaryotes. CHMP1 contains a predicted bipartite nuclear localisation signal and distributes as distinct forms to the cytoplasm and the nuclear matrix in all cell lines tested.

    \

    Human CHMP1 is strongly implicated in multivesicular body formation. A multivesicular body is a vesicle-filled endosome that targets proteins to the interior of lysosomes. Immunocytochemistry and biochemical fractionation localise CHMP1 to early endosomes, and CHMP1 physically interacts with SKD1/VPS4, a highly conserved protein directly linked to multivesicular body sorting in yeast. Similar to the action of a mutant SKD1 protein, overexpression of a fusion derivative of human CHMP1 dilates endosomal compartments and disrupts the normal distribution of several endosomal markers. Genetic studies in Saccharomyces cerevisiae further support a conserved role of CHMP1 in vesicle trafficking. Deletion of CHM1, the budding yeast homolog of CHMP1, results in defective sorting of carboxypeptidases S and Y and produces abnormal, multi-lamellar prevacuolar compartments. This phenotype classifies CHM1 as a member of the class E vacuolar protein sorting genes PUBMED:11559747.

    \ 6842 IPR009743 \

    This entry represents the C terminus (approximately 270 residues) of a number of plant Hs1pro-1 proteins, which are believed to confer nematode resistance PUBMED:12669798.

    \ 6562 IPR010615 \

    This domain represents a conserved region within viral UL97 phosphotransferases. UL97 participates in the phosphorylation of the nucleoside analog ganciclovir (GCV) to produce GCV-monophosphate PUBMED:9217058.

    \ 65 IPR007042 \ This conserved, predominantly C-terminal region is found in a number of proteins including arsenite-resistance protein 2, which is thought to play a role in arsenite resistance PUBMED:10069470. Arsenite is a carcinogenic compound which can act as a comutagen by inhibiting DNA repair.\ 7567 IPR003541 \ A large group of bacterial exotoxins are referred to as "A/B toxins", essentially because they are formed from two subunits. The "A" subunit possesses enzyme activity, and is transferred to the host cell following a conformational change in the membrane-bound transport "B" subunit PUBMED:8225592.\

    Bacillus anthracis, a large Gram-positive spore-forming rod, is the causative agent of anthrax PUBMED:3149607. Its two virulence factors are the poly-D-glutamate polypeptide capsule and the actual anthrax exotoxin PUBMED:1910002. The toxin comprises three factors: the protective antigen (PA); the oedema factor (EF) PUBMED:3149607; and the lethal factor (LF) PUBMED:2509294. Each is a thermolabile protein of ~80kDa. PA forms the "B" part of the exotoxin and allows passage of the "A" moiety (consisting of EF and LF) into target cells.

    \ \

    EF is necessary for the oedema-producing activity of the toxin, and is known to be an inherent adenylate cyclase. It causes dysregulation of intracellular signalling PUBMED:3149607, PUBMED:1910002. Uptake of the lethal factor LF occurs via activated heptameric PA. Once inside the host lymphocytes/macrophages, the zinc metalloprotease of LF PUBMED:7851740 cleaves MAP kinase kinases, inhibits cell proliferation, and leads to cell death PUBMED:1910002, PUBMED:2509294.

    \ 8118 IPR013252 \

    Spc24 is a component of the evolutionarily conserved kinetochore-associated Ndc80 complex and is involved in chromosome segregation PUBMED:11266451.

    \ 2083 IPR007338 \ This is a bacterial family of uncharacterised proteins.\ 2104 IPR005915 \

    The members of this family share 50% or greater sequence identity. They are found as eleven tandem genes, arranged head-to-tail, in Staphylococcus aureus strain COL. Distant full-length homologs are found in a Staphylococcus haemolyticus plasmid and in Bacillus halodurans. The function of these proteins is unknown.
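The 50% threshold above refers to pairwise sequence identity. The Python sketch below is a rough illustration only. It computes percent identity for two sequences that are assumed to be already aligned. The function name `percent_identity` is an assumption, and so is the convention of excluding gap columns from the denominator, since tools differ on this choice. The fragments are made up and are not taken from this family.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (one of several
    conventions in use; this choice is an assumption of the sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap column: not counted as match or mismatch
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not real members of this family.
pid = percent_identity("MKT-AYIAK", "MKSLAYLAK")
print(f"{pid:.1f}% identical; meets 50% threshold: {pid >= 50.0}")
# prints "75.0% identical; meets 50% threshold: True"
```

Here 6 of the 8 ungapped columns are identical. Counting the gap column as a mismatch instead would lower the value to about 66.7%, so any threshold should be read together with the convention used.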

    \ 1897 IPR003756 \

    This entry describes proteins of unknown function.

    \ 349 IPR004263 \ Hereditary multiple exostoses (EXT) is an autosomal dominant disorder that is characterized by the appearance of multiple outgrowths of the long bones (exostoses) at their epiphyses PUBMED:9473480. Mutations in two homologous genes, EXT1 and EXT2, are responsible for the EXT syndrome. The human and mouse EXT genes have at least two homologs in the invertebrate Caenorhabditis elegans, indicating that they do not function exclusively as regulators of bone growth. EXT1 and EXT2 have both been shown to encode glycosyltransferases involved in the chain elongation step of heparan sulphate biosynthesis PUBMED:9756849.\ 6983 IPR009825 \

    This family consists of several bacterial proteins of around 180 residues in length. The function of this family is unknown.

    \ 3305 IPR003178 \ Methyl-coenzyme M reductase (MCR) is the enzyme responsible for microbial formation of methane. It is a hexamer composed of 2 alpha, 2 beta, and 2 gamma subunits with two identical nickel porphinoid active sites PUBMED:9367957.\ 4780 IPR003422 \ The ubiquinol-cytochrome C reductase complex (cytochrome bc1 complex) is a respiratory multienzyme complex PUBMED:9651245. The bc1 complex contains 11 subunits; 3 respiratory subunits (cytochrome B, cytochrome C1, Rieske protein), 2 core proteins and 6 low molecular weight proteins. This family represents the 'hinge' protein of the complex which is thought to mediate formation of the cytochrome c1 and cytochrome c complex.\ 7768 IPR012468 \

    The members of this family are all hypothetical proteins of unknown function expressed by the eukaryotic parasite Encephalitozoon cuniculi GB-M1. The region in question is approximately 250 amino acids long.

    \ 3316 IPR004891 \ The mercury resistance protein, MerC, is an inner membrane protein that mediates Hg2+ transport into the cytoplasm, thereby conferring mercury resistance.\ 6991 IPR010797 \

    This family consists of Pex26 and related mammalian proteins. Pex26 is a type II peroxisomal membrane protein that recruits Pex6-Pex1 complexes to peroxisomes PUBMED:12717447. Mutations in Pex26 can lead to human disorders PUBMED:12851857.

    \ 3030 IPR002607 \ This family consists of various hydratases and 4-oxalocrotonate decarboxylases which are involved in the bacterial meta-cleavage pathways for degradation of aromatic compounds. 2-Hydroxypentadienoic acid hydratase, encoded by mhpD in Escherichia coli, is involved in the phenylpropionic acid pathway; it catalyses the conversion of 2-hydroxypentadienoate to 4-hydroxy-2-ketopentanoate and uses a Mn2+ cofactor PUBMED:9492273. OHED hydratase, encoded by hpcG in E. coli, is involved in homoprotocatechuic acid (HPC) catabolism PUBMED:7737515. XylI in Pseudomonas putida is a 4-oxalocrotonate decarboxylase PUBMED:8510667.\ 6261 IPR009451 \

    Methylamine dehydrogenase (MADH) is a periplasmic quinoprotein found in several methylotrophic bacteria PUBMED:8021187. MADH is induced when the bacteria are grown on methylamine as a carbon source, and it catalyses the oxidative deamination of amines to their corresponding aldehydes. The redox cofactor of this enzyme is tryptophan tryptophylquinone (TTQ). Electrons derived from the oxidation of methylamine are passed to an electron acceptor, which is usually the blue-copper protein amicyanin.

    \ \ \ \

    MADH is a heterotetramer composed of two heavy subunits and two light subunits. The heavy subunit forms a seven-bladed beta-propeller-like structure PUBMED:9514722.

    \ 5494 IPR008617 \

    The function of these proteins is unknown.

    \ 6659 IPR000191 \

    Formamidopyrimidine-DNA glycosylase (Fapy-DNA glycosylase, gene fpg) PUBMED:7704272 is a bacterial enzyme involved in DNA repair that excises oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-OxoG) residues. In addition to its glycosylase activity, FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 kDa that binds and requires zinc for its activity.

    \

    The zinc-binding site (PS01242) lies in the C-terminal part of the formamidopyrimidine-DNA glycosylase enzyme, where four conserved and essential cysteines are located PUBMED:8473347.

    \ 3612 IPR001155 \ The NADH:flavin oxidoreductase/NADH oxidases that belong to this family are mostly of bacterial or yeast origin and reduce a range of alternative electron acceptors. Most use FAD/FMN as a cofactor. Members of this family have a TIM barrel structure.\ 5463 IPR008815 \ This family consists of bacterial 23S rRNA proteins PUBMED:8341711. \ 4983 IPR006031 \

    This repeat is found in a wide variety of proteins and generally consists of the motif XYPPX, where X can be any amino acid. The family includes annexin VII (ANX7_DICDI) and the carboxy tail of certain rhodopsins (OPSD_LOLSU). This family also includes plaque matrix proteins; however, in FP1_MYTED the motif is embedded in a ten-residue repeat. The molecular function of this repeat is unknown. Because of its short length and biased amino acid composition, it is also not clear whether all the members of this family share a common evolutionary ancestor.
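The following Python sketch scans a protein sequence for the XYPPX motif described above. It is a minimal illustration. The helper name `find_xyppx` and the input fragment are hypothetical. The regular expression encodes "any residue, Tyr, Pro, Pro, any residue". It uses a zero-width lookahead so that overlapping occurrences in tandem repeats are not missed.

```python
import re

# X-Y-P-P-X: any character, Tyr, Pro, Pro, any character.
# The lookahead makes each match zero-width, so overlapping hits are reported.
XYPPX = re.compile(r"(?=(.YPP.))")


def find_xyppx(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, pentapeptide) for every XYPPX occurrence."""
    return [(m.start() + 1, m.group(1)) for m in XYPPX.finditer(sequence.upper())]


# Hypothetical sequence fragment for illustration only.
print(find_xyppx("GAYPPQSYPPAYPPG"))
# prints [(2, 'AYPPQ'), (7, 'SYPPA'), (11, 'AYPPG')]
```

For this fragment, a plain non-overlapping search would miss the third occurrence, because its first residue is also the last residue of the second one. Given the short, low-complexity nature of the motif, hits like these say little about homology on their own.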

    \ 5847 IPR009258 \

    This family consists of several GP30.8 proteins from the T4-like phages. The function of this family is unknown.

    \ 1523 IPR004866 \

    This domain represents the N-terminal domain in chitobiases and beta-hexosaminidases. Chitobiases degrade chitin, which forms the exoskeleton in insects and crustaceans, and which is one of the most abundant polysaccharides on earth PUBMED:8673609. Beta-hexosaminidases are composed of either a HexA/HexB heterodimer or a HexB homodimer, and can hydrolyse diverse substrates, including GM(2)-gangliosides; mutations in this enzyme are associated with Tay-Sachs disease PUBMED:12662933. HexB is structurally similar to chitobiase, consisting of a beta sandwich structure; this structure is similar to that found in the cellulose-binding domain of cellulase from Cellulomonas fimi, suggesting that it may function as a carbohydrate-binding domain.

    \ \ \ 6331 IPR010521 \

    This family consists of several hypothetical Fijivirus proteins of unknown function.

    \ 6925 IPR010772 \

    This family represents ORF4 of the abiN operon of Lactococcus lactis, and ORF27 of the temperate bacteriophage TP901 PUBMED:11312666. Members of this family are found exclusively in L. lactis and the bacteriophages that infect this species. The function of this family is unknown.

    \ 6733 IPR010695 \

    This family consists of several Fas apoptotic inhibitory molecule (FAIM) proteins. FAIM expression is upregulated in B cells by anti-Ig treatment that induces Fas-resistance, and overexpression of FAIM diminishes sensitivity to Fas-mediated apoptosis of B and non-B cell lines. FAIM is highly evolutionarily conserved and is widely expressed in murine tissues, suggesting that FAIM plays an important role in cellular physiology PUBMED:11483211.

    \ 1587 IPR003704 \ Carbon monoxide dehydrogenase (Cdh) from Methanosarcina frisia Go1 is a Ni2+-, Fe2+- and S2−-containing alpha2beta2 heterotetramer PUBMED:8662887. The CO dehydrogenase enzyme complex from Methanosarcina thermophila contains a corrinoid/iron-sulphur enzyme composed of two subunits (delta and gamma) PUBMED:8550451. This family consists of carbon monoxide dehydrogenase I/II beta subunit and CO dehydrogenase (acetyl-CoA synthase epsilon subunit).\ 3450 IPR012682 \ The class III basic helix-loop-helix (bHLH) transcription factors have proliferative and apoptotic roles and are characterised by the presence of a leucine zipper adjacent to the bHLH domain. The myc oncogene was first discovered in small-cell lung cancer cell lines, where it is found to be deregulated PUBMED:2827002. Although the biochemical function of the gene product is unknown, as a nuclear protein with a short half-life it may play a direct or indirect role in controlling gene expression PUBMED:3018999. Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved in cell replication PUBMED:9175477.\

    This entry represents the N-terminal domain found adjacent to the basic helix-loop-helix (bHLH) region ().

    \ 2704 IPR003682 \

    GidB (glucose-inhibited division protein B) appears to be present, in a single copy, in all complete eubacterial genomes sequenced so far. Its mode of action is unknown, but a methyltransferase fold is reported from the crystal structure. It may be a family of bacterial glucose-inhibited division proteins that are involved in the regulation of cell division PUBMED:9795152.

    \ 3859 IPR001711 \

    Phosphatidylinositol-specific phospholipase C (PI-PLC), a eukaryotic intracellular enzyme, plays an important role in signal transduction processes PUBMED:1849017. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-4,5-bisphosphate into the second messenger molecules diacylglycerol and inositol-1,4,5-trisphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins PUBMED:1419362, PUBMED:1319994, PUBMED:1335185.

    \

    In mammals, there are at least 6 different isoforms of PI-PLC, which differ in their domain structure, their regulation and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

    \

    All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as the 'X-box' and the 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for catalytic activity. C-terminal to the Y-box there is a C2 domain, possibly involved in Ca2+-dependent membrane attachment.
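The fixed NH2-X-Y-COOH order and the variable spacing can be expressed as a simple check on domain annotations. The Python sketch below is illustrative only. The `Region` type, the `xy_spacer` function and the coordinates are all hypothetical. Positions are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str   # "X-box" or "Y-box" (hypothetical annotation labels)
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def xy_spacer(regions: list[Region]) -> int:
    """Return the number of residues between the X-box and the Y-box.

    Raises ValueError if either box is missing or the order is not NH2-X-Y-COOH.
    """
    by_name = {r.name: r for r in regions}
    try:
        x, y = by_name["X-box"], by_name["Y-box"]
    except KeyError as missing:
        raise ValueError(f"missing region: {missing}") from None
    if x.end >= y.start:
        raise ValueError("expected the X-box to end before the Y-box starts")
    return y.start - x.end - 1


# Hypothetical coordinates, not taken from a real PI-PLC sequence.
compact = [Region("X-box", 320, 470), Region("Y-box", 545, 660)]
print(xy_spacer(compact))  # prints 74
```

A spacer in the 50-100 residue range corresponds to the compact arrangement seen in most isoforms. A much longer spacer would be expected where the gamma isoforms carry their PH, SH2 and SH3 insertions.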

    \ 2882 IPR000896 \

    Haemocyanins are copper-containing oxygen transport proteins found in the haemolymph of many invertebrates. They are divided into 2 main groups, arthropodan and molluscan. These have structurally similar oxygen-binding centres, which are similar to the oxygen-binding centre of tyrosinases PUBMED:, but their quaternary structures are arranged differently. The arthropodan proteins exist as hexamers comprising 3 heterogeneous subunits (a, b and c) and possess 1 oxygen-binding centre per subunit, whereas the molluscan proteins exist as cylindrical oligomers of 10 to 20 subunits and possess 7 or 8 oxygen-binding centres per subunit PUBMED:3207675. Although the proteins have similar amino acid compositions, the only real similarity in their primary sequences is in the region corresponding to the second copper-binding domain, which also shows similarity to the copper-binding domain of tyrosinases PUBMED:.

    \

    Larval storage proteins (LSPs) PUBMED:2808410 are proteins from the hemolymph of insects, which may serve as a store of amino acids for synthesis of adult proteins. There are two classes of LSPs: arylphorins, which are rich in aromatic amino acids, and methionine-rich LSPs. LSPs form hexameric complexes and are structurally related to arthropod hemocyanins.

    \ 2170 IPR002733 \

    The contiguous gene deletion syndrome is characterized by Alport syndrome (A), mental retardation (M), midface hypoplasia (M), and elliptocytosis (E), as well as generalized hypoplasia and cardiac abnormalities. It is caused by a deletion in Xq22.3, comprising several genes including AMME chromosomal region gene 1 (AMMECR1), which encodes a protein with a nuclear location and presently unknown function. The C-terminal region of AMMECR1 (from residue 122 to 333) is well conserved, and homologues appear in species ranging from bacteria and archaea to eukaryotes. The high level of conservation of the AMMECR1 domain points to a basic cellular function, potentially in either the transcription, replication, repair or translation machinery PUBMED:10049589, PUBMED:10828604.

    \

    \ The AMMECR1 domain contains a 6-amino-acid motif (LRGCIG) that might be functionally important since it is strikingly conserved throughout evolution PUBMED:10049589. The AMMECR1 domain consists of two distinct subdomains of different sizes. The large subdomain, which contains both the N- and C-terminal regions, consists of five α-helices and five β-strands. These five β-strands form an antiparallel β-sheet. The small subdomain consists of four α-helices and three β-strands, and these β-strands also form an antiparallel β-sheet. The conserved 'LRGCIG' motif is located at β2 and its N-terminal loop, and most of the side chains of these residues point toward the interface of the two subdomains. The two subdomains are connected by only two loops, and the interaction between the two subdomains is not strong. Thus, these subdomains may move dynamically when the substrate enters the cleft. The size of the cleft suggests that the substrate is large, e.g. a nucleic acid or protein. However, the inner side of the cleft is not lined with positively charged residues, and therefore it is unlikely that negatively charged nucleic acids such as DNA or RNA interact at this site PUBMED:15558565.

    \ \ \ 2366 IPR001177 \ Papillomavirus helicase E1 protein is an ATP-dependent DNA helicase required for initiation of viral DNA replication. It forms a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin, which contains binding sites for both proteins.\ 4312 IPR005090 \

    Proteins in this group have homology with the RepC protein of Agrobacterium Ri and Ti plasmids PUBMED:7991675. They may be involved in plasmid replication and stabilization functions.

    \ 4765 IPR004219 \ TT virus (TTV), isolated initially from a Japanese patient with hepatitis of unknown aetiology, has since been found to infect both healthy and diseased individuals and numerous prevalence studies have raised questions about its role in unexplained hepatitis. ORF1 is a large 750 residue protein.\ 6033 IPR010403 \

    This domain is found in the NvdB protein, which is involved in the production of beta-(1-->2)-glucan.

    \ 1698 IPR011994 \

    Cytidylate kinase catalyses the phosphorylation of cytidine 5'-monophosphate (CMP) or deoxycytidine 5'-monophosphate (dCMP) to the corresponding 5'-diphosphate (CDP or dCDP) in the presence of ATP or GTP.

    \ 3265 IPR004690 \ The MSS family includes the monobasic malonate:Na+ symporter of Malonomonas rubra. It consists of two integral membrane proteins, MadL and MadM. The transporter is believed to catalyze the electroneutral reversible uptake of H+-malonate with one Na+, and both subunits have been shown to be essential for activity.\ 7798 IPR012935 \

    This zinc-finger-like domain is distributed throughout the eukaryotic kingdom in NIPA (Nuclear interacting partner of ALK) and other proteins. NIPA is thought to perform an antiapoptotic role in nucleophosmin-anaplastic lymphoma kinase (ALK) mediated signalling events PUBMED:12748172. The domain is often repeated, with the second domain usually containing a large insert (approximately 90 residues) after the first three cysteine residues. The Schizosaccharomyces pombe protein containing this domain is involved in mRNA export from the nucleus PUBMED:15357289.

    \ 3596 IPR007289 \ This short protein has no known function and is found in Jaagsiekte sheep retrovirus. Jaagsiekte sheep retrovirus (JSRV) is the etiological agent of a contagious lung tumour of sheep known as sheep pulmonary adenomatosis. JSRV exhibits a simple genetic organization, characteristic of the type D and type B retroviruses, with the canonical retroviral sequences gag, pro, pol and env encoding the structural proteins of the virion and an additional open reading frame (orf-x) of approximately 500 bp overlapping pol PUBMED:10653922.\ 7051 IPR010820 \

    This family represents a conserved region approximately 350 residues long within a number of plant proteins of unknown function.

    \ 739 IPR001932 \

    This domain is found in protein phosphatase 2C, as well as in other proteins, e.g. [pyruvate dehydrogenase (lipoamide)]-phosphatase and adenylate cyclase.

    \

    Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine-specific protein phosphatases. PP2C PUBMED:1312947 is a monomeric enzyme of about 42 kDa which shows broad substrate specificity and is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least four PP2C homologs: phosphatase PTC1 PUBMED:8395005, which has weak tyrosine phosphatase activity in addition to its activity on serines; phosphatases PTC2 and PTC3; and hypothetical protein YBR125c. Isozymes of PP2C are also known from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leishmania chagasi and Paramecium tetraurelia. In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) PUBMED:7973632 is an enzyme that dephosphorylates the Ser/Thr receptor-like kinase RLK5 and contains a C-terminal PP2C domain.

    \

    PP2C does not seem to be evolutionarily related to the main family of serine/threonine phosphatases: PP1, PP2A and PP2B. However, it is significantly similar to the catalytic subunit of pyruvate dehydrogenase phosphatase (PDPC) PUBMED:8396421, which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.

    \ 7701 IPR012452 \

    This domain appears to be restricted to the Bacillales.

    \ 297 IPR007656 \ This is a family of uncharacterised proteins.\ 5411 IPR008396 \ This region is found in albicidin resistance proteins. Its boundaries were determined by its existence as a tandem repeat in Burkholderia pseudomallei protein BPSL2084.\ 1096 IPR004026 \ The Escherichia coli Ada protein repairs O6-methylguanine residues and methyl phosphotriesters in DNA by direct transfer of the methyl group to a cysteine residue. This domain contains four conserved cysteines that form a zinc binding site PUBMED:1581309, PUBMED:8500619. One of these cysteines is a methyl group acceptor. The methylated domain can then specifically bind to the ada box on a DNA duplex PUBMED:8500619.\ 7963 IPR012553 \

    This family consists of the defensin-like peptides (DLPs) isolated from platypus venom. These DLPs show a three-dimensional fold similar to that of beta-defensin-12 and the sodium-channel neurotoxin Shl. However, the side chains known to be functionally important in beta-defensin-12 and Shl are not conserved in DLPs, which suggests a different biological function. Consistent with this contention, DLPs have been shown to possess no anti-microbial properties and to have no observable activity on rat dorsal-root-ganglion sodium-channel currents PUBMED:10417345.

    \ 5525 IPR008881 \ In the Escherichia coli cytosol, a fraction of the newly synthesised proteins requires the activity of molecular chaperones for folding to the native state. The major chaperones implicated in this folding process are the ribosome-associated Trigger Factor (TF), and the DnaK and GroEL chaperones with their respective co-chaperones. Trigger Factor is an ATP-independent chaperone and displays chaperone and peptidyl-prolyl-cis-trans-isomerase (PPIase) activities in vitro. It is composed of at least three domains: an N-terminal domain which mediates association with the large ribosomal subunit, a central substrate binding and PPIase domain with homology to FKBP proteins, and a C-terminal domain of unknown function. The positioning of TF at the peptide exit channel, together with its ability to interact with nascent chains as short as 57 residues, renders TF a prime candidate for being the first chaperone that binds to the nascent polypeptide chains PUBMED:12603737. This group of sequences contains the ribosomal subunit association domain.\ 538 IPR003804 \ L-lactate permease is an integral membrane protein probably involved in L-lactate transport.\ 313 IPR006911 \ This domain is found in mammalian proteins of unknown function.\ 1081 IPR001030 \ Synonym(s): Citrate hydro-lyase, Aconitase\

    Aconitase (aconitate hydratase) is the enzyme from the tricarboxylic acid cycle that catalyzes the reversible, stereospecific isomerization of citrate to isocitrate via cis-aconitate, a non-redox-active process PUBMED:2598939, PUBMED:9020582. Aconitase, in its active form, contains a 4Fe-4S iron-sulphur cluster; three cysteine residues have been shown to be ligands of the 4Fe-4S cluster PUBMED:2726740. Unlike the majority of iron-sulphur proteins that function as electron carriers, the Fe-S cluster of aconitase reacts directly with an enzyme substrate PUBMED:8151704.

    \

    In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial matrix and the other found in the cytoplasm. The aconitase family contains a variety of proteins which include: the iron-responsive element binding protein (IRE-BP) PUBMED:8347279; alpha-isopropylmalate isomerase, an enzyme catalysing the second step in the biosynthesis of leucine; and homoaconitase.

    \

    The aconitate hydratase, N-terminal domain is almost always found along with the aconitate hydratase, C-terminal domain.

    \ 3949 IPR006798 \

    This entry represents the Poxvirus F16 proteins.

    \ 557 IPR001795 \

    The nucleotide sequence for the RNA of potato leafroll luteovirus (PLRV) has been determined PUBMED:2732710, PUBMED:2466700. The sequence contains six large open reading frames (ORFs). The 5' coding region encodes two polypeptides of 28K and 70K, which overlap in different reading frames; it is suggested that the third ORF in the 5' block is translated by frameshift readthrough near the end of the 70K protein, yielding a 118K polypeptide PUBMED:2732710. The C-terminal part of the 118K protein contains a consensus sequence for RNA-dependent RNA-polymerases PUBMED:2732710.

    \

    The genomic RNA sequence of cowpea southern bean mosaic virus (SBMV-C) has been determined PUBMED:2823471. The genome contains four ORFs. The largest ORF encodes the two largest proteins translated in cell-free extracts from full-length virion RNA PUBMED:2823471. Segments of the predicted amino acid sequence of this ORF resemble those of known viral RNA-polymerases, ATP-binding proteins and viral genome-linked proteins PUBMED:2823471.

    \

    The genome sequence of pea enation mosaic virus (PEMV) RNA 1 shows strong organisational relationships and sequence similarities to the beet western yellows virus (BWYV) and PLRV PUBMED:1875194. Sequence analysis reveals five predominant ORFs. The third ORF is characterised by a number of RNA-polymerase motifs and a helicase-like motif typical of RNA-dependent RNA-polymerases PUBMED:1875194. It overlaps (out of frame) the ORF 2 product and is proposed to be expressed by a frameshift fusion of ORF 2 and ORF 3 PUBMED:1875194.

    \

    The PLRV sequence shows some similarities to the putative polymerase of SBMV PUBMED:2823471, and more extensive similarities to the corresponding BWYV polypeptide PUBMED:3194229.

    \ 3814 IPR003514 \ This is a family of proteins from single-stranded DNA bacteriophages. Protein F is the major capsid component, sixty copies of which are found in the virion. The virion is also composed of 60 copies of each of the G and J proteins, and 12 copies of the H protein.\ 6437 IPR010573 \

    This family consists of several fungus-specific trichothecene efflux pump proteins. Many of the genes involved in trichothecene toxin biosynthesis in Fusarium sporotrichioides are present within a gene cluster. It has been suggested that TRI12 may play a role in F. sporotrichioides self-protection against trichothecenes PUBMED:10485289.

    \ 3855 IPR006918 \

    This is a family of plant proteins believed to be phytochelatin synthetases. This enzyme is responsible for the production of phytochelatins, small peptides rich in glutamic acid, cysteine and glycine that are produced in response to cadmium stress. In yeast, this activity appears to be performed by glutathione synthetase (GSH2), suggesting that yeast GSH2 encodes a bifunctional enzyme able to catalyse both the synthesis of GSH, by adding glycine to the dipeptide gammaGlu-Cys, and the synthesis of phytochelatins PUBMED:9729167, PUBMED:10219997.

    \ \ 6812 IPR010729 \

    This entry represents the N-terminal region (approximately 8 residues) of the eukaryotic mitochondrial 39-S ribosomal protein L47 (MRP-L47). Mitochondrial ribosomal proteins (MRPs) are the counterparts of the cytoplasmic ribosomal proteins, in that they fulfil similar functions in protein biosynthesis. However, they are distinct in number, features and primary structure PUBMED:9445368.

    \ 78 IPR006158 \

    The cobalamin (vitamin B12)-binding domain has an alpha/beta fold that is common to several different cobalamin-binding proteins. Proteins containing this domain include methionine synthase, the small subunit of glutamate mutase PUBMED:10467146, and the alpha and beta subunits of methylmalonyl-CoA mutase. In methionine synthase, a second, adjacent domain involved in cobalamin binding forms a 4-helical bundle cap; on conversion to the active conformation of this enzyme, the 4-helical cap rotates to allow the cobalamin cofactor to bind the activation domain PUBMED:11731805.

    \

    The core structure of the cobalamin domain consists of a five-stranded parallel beta-sheet surrounded by 4-5 alpha helices in three layers, alpha/beta/alpha PUBMED:7992050. The fold of the domain resembles that of the nucleotide-binding proteins (a Rossmann fold). Upon binding B12, important elements of the binding site appear to become structured, including an alpha-helix that forms on one side of the cleft accommodating the nucleotide 'tail' of the cofactor.

    \ \ \ 6583 IPR009610 \

    This family consists of several hypothetical proteins which seem to be specific to the enterobacteria Escherichia coli and Shigella flexneri. Family members are often known as YeeV proteins and are around 125 residues in length. The function of this family is unknown.

    \ 7580 IPR011679 \

    ERp29 is a ubiquitously expressed endoplasmic reticulum protein found in mammals PUBMED:11435111. This protein is found associated with an N-terminal thioredoxin-like domain, which is homologous to a domain of human protein disulfide isomerase (PDI). ERp29 may help mediate the chaperone function of PDI. The C-terminal ERp29 domain has a 5-helical bundle fold. ERp29 is thought to form part of the thyroglobulin folding complex PUBMED:11884402.

    \ 7698 IPR012449 \

    This family consists of proteins from the Pseudomonadaceae.

    \ 1661 IPR001064 \

    Crystallins are the dominant structural components of the eye lens. Among the different types of crystallins, the beta and gamma crystallins form a family of related proteins PUBMED:2107329, PUBMED:3064189. Structurally, beta and gamma crystallins are composed of two similar domains, connected by a short peptide, each of which is in turn composed of two similar motifs. Each motif, which is about forty amino acid residues long, is folded in a distinctive 'Greek key' pattern.

    \ \ 536 IPR003475 \ The insect proteins in this family are each about 100 amino acids long and have 6 conserved cysteine residues. They all have a predicted signal peptide and are probably secreted. The function of the proteins is unknown PUBMED:8568884.\ 5627 IPR008432 \ This family consists of plant cytochrome c oxidase subunit 5c proteins PUBMED:10586516.\ 220 IPR001623 \

    The prokaryotic heat shock protein DnaJ interacts with the chaperone hsp70-like DnaK protein PUBMED:8016869. Structurally, the DnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a glycine-rich region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR' domain) and a C-terminal region of 120 to 170 residues.

    \

    Such a structure is shown in the following schematic representation:\

    \
      +------------+-+-------+-----+-----------+--------------------------------+\
      | N-terminal | | Gly-R |     | CXXCXGXG  | C-terminal                     |\
      +------------+-+-------+-----+-----------+--------------------------------+\
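The CXXCXGXG repeat in the CRR domain can be located with a regular expression in which each X stands for any residue. This is a minimal sketch. The function name `find_crr_motifs` and the toy sequence are hypothetical, and a bare pattern match is at best a rough proxy for a CRR domain assignment.

```python
import re

# CXXCXGXG: C, any two residues, C, any residue, G, any residue, G.
# The lookahead makes overlapping occurrences reportable as well.
CRR_MOTIF = re.compile(r"(?=(C..C.G.G))")


def find_crr_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 8-mer) for each CXXCXGXG occurrence."""
    return [(m.start(), m.group(1)) for m in CRR_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence: an initial M followed by four CXXCXGXG repeats.
toy = "M" + "CAACAGAG" * 4
print(find_crr_motifs(toy))
```

For the toy sequence, matches are reported at 0-based positions 1, 9, 17 and 25, which corresponds to the four repeats.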
    

    \

    The 'J' domain of DnaJ is thought to mediate the interaction with the DnaK protein. The J and CRR domains are found, either together or separately, in many prokaryotic and eukaryotic proteins PUBMED:1585456. Proteins containing both J and CRR domains include yeast MAS5/YDJ1, MDJ1, SCJ1, XDJ1 and YNL077w, plant DnaJ homologues from leek and cucumber, and human HDJ2. Proteins containing only the J domain include Sinorhizobium fredii nolC, Escherichia coli cbpA PUBMED:8302830, yeast SEC63/NPL1, SIS1, CAJ1, YFR041c, YIR004w and YJL162c, the Plasmodium falciparum ring-infected erythrocyte surface antigen, human HDJ1 and HSJ1, and Drosophila cysteine-string protein.

    \ 8151 IPR013160 \

    This is a family of putative ER integral membrane proteins involved in cell wall organisation and biogenesis. Deletion of Saccharomyces cerevisiae BIG1 causes an approximately 95% reduction in cell wall beta-1,6-glucan, an essential polymer involved in the cell wall attachment of many surface mannoproteins PUBMED:12112232.

    \ 1835 IPR007834 \ This family contains SEM1 and DSS1, which are short acidic proteins. In Saccharomyces cerevisiae, SEM1 is a regulator of both exocyst function and pseudohyphal differentiation PUBMED:9927667. Loss of DSS1 in humans has been associated with split hand/split foot malformations PUBMED:8782053.\ 3416 IPR007757 \ MT-A70 is the S-adenosylmethionine-binding subunit of human mRNA:m6A methyltransferase (MTase), an enzyme that sequence-specifically methylates adenines in pre-mRNAs.\ 8147 IPR013169 \

    The cwf18 family is involved in mRNA splicing. It has been isolated as a subcomplex of the spliceosome in Schizosaccharomyces pombe PUBMED:11884590.

    \ 7609 IPR012485 \

    Mis6 is an essential centromere connector protein acting during the G1-S phase of the cell cycle. Mis6 is thought to be required for recruiting CENP-A, the centromere-specific histone H3 variant, which is an important event for centromere function and chromosome segregation during mitosis PUBMED:9230309, PUBMED:10864871.

    \ 6201 IPR009423 \

    This family consists of several NADH-ubiquinone oxidoreductase subunit b14.5b proteins.

    \ 385 IPR000539 \ The frizzled (fz) locus of Drosophila coordinates the cytoskeletons of epidermal cells, producing a parallel array of cuticular hairs and bristles PUBMED:2174014, PUBMED:2493583. In fz mutants, the orientation of individual hairs with respect both to their neighbours and to the organism as a whole is altered. In the wild-type wing, all hairs point towards the distal tip PUBMED:2493583. In the developing wing, fz has two functions: it is required for the proximal-distal transmission of an intracellular polarity signal, and it is required for cells to respond to the polarity signal. Fz produces an mRNA that encodes an integral membrane protein with 7 putative transmembrane (TM) domains. This protein is predicted to contain both extracellular and cytoplasmic domains, which could function in the transmission and interpretation of polarity information PUBMED:2493583. This signature is usually found downstream of the Fz domain.\ 3559 IPR003421 \

    This group of enzymes acts on the CH-NH substrate bond using NAD(+) or NADP(+) as an acceptor. This domain is found primarily in octopine dehydrogenase, nopaline dehydrogenase and lysopine dehydrogenase. NADPH is the preferred cofactor, but NADH is also used. Octopine dehydrogenase is involved in the reductive condensation of arginine and pyruvic acid to D-octopine PUBMED:9665174.

    \

    Some of these opine dehydrogenases are involved in crown gall tumours produced by Agrobacterium species, which encode the opine dehydrogenases on a Ti plasmid. These bacteria can transfer a portion of this plasmid (the T-DNA) to a susceptible plant cell; the T-DNA then integrates into the plant nuclear genome, where its genes can be expressed. Some of these genes direct the synthesis and secretion of unusual amino acid and sugar derivatives called opines, which the infecting bacteria use as a carbon and sometimes a nitrogen source.

    \ \ 4822 IPR002882 \

    This entry contains LPPG:Fo 2-phospho-L-lactate transferase (CofD) and related sequences of unknown function. CofD catalyses the fourth step in the biosynthesis of coenzyme F420, a flavin derivative found in methanogens, Mycobacteria and several other lineages. This enzyme has so far been characterised only in Methanococcus jannaschii PUBMED:11888293, but it appears restricted to F420-containing species and is predicted to carry out the same function in those other species.

    \ 6153 IPR009397 \

    This family consists of several Vesiculovirus matrix proteins. The matrix (M) protein of vesicular stomatitis virus (VSV) expressed in the absence of other viral components causes many of the cytopathic effects of VSV, including an inhibition of host gene expression and the induction of cell rounding. It has been shown that M protein also induces apoptosis in the absence of other viral components. It is thought that the activation of apoptotic pathways causes the inhibition of host gene expression and cell rounding by M protein PUBMED:12692256.

    \ 3100 IPR007062 \ Protein phosphatase inhibitor 2 (IPP-2) is a phosphoprotein conserved among all eukaryotes, and it is found in both the nucleus and the cytoplasm of tissue culture cells PUBMED:12235284.\ 5305 IPR008833 \ Surfeit locus protein 2 is encoded by one of a group of at least six sequence-unrelated genes (Surf-1 to Surf-6). The six Surfeit genes have been classified as housekeeping genes, being expressed in all tissue types tested and lacking a TATA box in their promoter region. The exact function of SURF2 is unknown PUBMED:9414319.\ 4483 IPR001543 \ Proteins in this group are involved in a secretory pathway responsible for the surface presentation of the invasion plasmid antigens (IPAs) needed for the entry of Salmonella and other species into mammalian cells PUBMED:1447979, PUBMED:8885278. They could play a role in preserving the translocation competence of the IPAs and are required for secretion of the three IPA proteins PUBMED:1312536.\ 8014 IPR012594 \

    This family consists of the pedibin and Hym-346 signalling peptides. These two peptides have been isolated from Hydra vulgaris and Hydra magnipapillata. Experiments have indicated that both cause a reduction in the positional value gradient, the principal patterning process governing the maintenance of form in the adult hydra. The peptides cause an increase in the rate of foot regeneration following bisection of the body column. Thus both play important signalling roles in patterning processes in cnidarians, and possibly in more complex metazoans PUBMED:9876180.

    \ 4337 IPR003032 \ This domain is called RyR, after the ryanodine receptor PUBMED:10664581. The domain is found in four copies in the ryanodine receptor. The function of this domain is unknown.\ 5552 IPR008854 \ This family consists of thiopurine S-methyltransferase proteins from both eukaryotes and prokaryotes. Thiopurine S-methyltransferase (TPMT) is a cytosolic enzyme that catalyses S-methylation of aromatic and heterocyclic sulphydryl compounds, including anticancer and immunosuppressive thiopurines PUBMED:9780226.\ 1521 IPR003055 \ The tsx gene of Escherichia coli encodes an outer membrane protein, Tsx, which constitutes the receptor for colicin K and Enterobacteria phage T6, and functions as a substrate-specific channel for nucleosides and deoxynucleosides PUBMED:2265760. The protein contains 294 amino acids, the first 22 of which are characteristic of a bacterial signal sequence peptide. The putative mature form of Tsx contains 272 residues with a calculated Mr of 31418. The Tsx sequence shows an even distribution of charged residues and lacks extensive hydrophobic stretches PUBMED:2265760. Tsx shows no significant similarity to the channel-forming proteins OmpC, OmpF, PhoE and LamB of the E. coli outer membrane.\ 2229 IPR006835 \ This entry represents a conserved region found in a number of Chlamydophila pneumoniae proteins.\ 828 IPR001660 \

    The sterile alpha motif (SAM) domain is a putative protein interaction module present in a wide variety of proteins PUBMED:9007998 involved in many biological processes. The SAM domain, which spans around 70 residues, is found in diverse eukaryotic organisms PUBMED:9886291. SAM domains have been shown to homo- and hetero-oligomerise, forming multiple self-association architectures, and also to bind various non-SAM domain-containing proteins PUBMED:9343432, albeit with low affinity PUBMED:9933164. SAM domains also appear to be able to bind RNA PUBMED:14659692. Smaug, a protein that helps to establish a morphogen gradient in Drosophila embryos by repressing the translation of nanos (nos) mRNA, binds to the 3' untranslated region (UTR) of nos mRNA via two similar hairpin structures. The 3D crystal structure of the Smaug RNA-binding region shows a cluster of positively charged residues on the Smaug SAM domain, which could be the RNA-binding surface. This electropositive potential is unique among all previously determined SAM domain structures and is conserved among Smaug SAM homologues. These results suggest that the SAM domain might have a primary role in RNA binding.

    \ \

    Structural analyses show that the SAM domain is arranged in a small five-helix bundle with two large interfaces PUBMED:9343432. In the case of the SAM domain of EphB2, each of these interfaces is able to form dimers. The presence of these two distinct intermonomer binding surfaces suggests that SAM could form extended polymeric structures PUBMED:9933164.

    \ \ \ 3349 IPR001132 \ This family of proteins was first identified in Caenorhabditis elegans. Mammalian dwarfins are phosphorylated in response to transforming growth factor beta and are implicated in the control of cell growth PUBMED:8799132. The dwarfin family also includes the Drosophila protein MAD, which is required for the function of decapentaplegic (DPP) and may play a role in DPP signalling. Drosophila Mad binds to DNA and directly mediates activation of vestigial by Dpp PUBMED:9230443.\ 3548 IPR002668 \

    This entry contains nucleoside transport proteins. S282_RAT is a purine-specific Na+-nucleoside cotransporter localised to the bile canalicular membrane PUBMED:8027026. S281_RAT is a Na+-dependent nucleoside transporter selective for pyrimidine nucleosides and adenosine; it also transports the antiviral nucleoside analogues AZT and ddC PUBMED:7775409.

    \ 6304 IPR009468 \

    This family consists of several bacterial proteins of unknown function; the Escherichia coli member is known as YqjC.

    \ 5701 IPR008689 \ This family consists of several mitochondrial ATP synthase D chain (ATP5H) proteins. Subunit D has no extensive hydrophobic sequences and is not apparently related to any subunit described in the simpler ATP synthases of bacteria and chloroplasts PUBMED:7509337, PUBMED:2890767.\ 7969 IPR012552 \

    This family consists of the DVL family of proteins. The DEVIL (DVL) gene was identified in a gain-of-function genetic screen for genes that influence fruit development in Arabidopsis. DVL is a small protein, and its overexpression results in pleiotropic phenotypes characterised by shortened stature, rounder rosette leaves, clustered inflorescences, shortened pedicels and siliques with pronged tips. The DVL family is a novel class of small polypeptides, and the overexpression phenotypes suggest that these polypeptides may have a role in plant development PUBMED:14871303.

    \ 41 IPR002937 \ This entry consists of various amine oxidases, including maize polyamine oxidase (PAO) PUBMED:9598979, L-amino acid oxidases (LAO) and various flavin-containing monoamine oxidases (MAO). The aligned region includes the flavin-binding site of these enzymes. In vertebrates, MAO plays an important role in regulating the intracellular levels of amines via their oxidation; these include various neurotransmitters, neurotoxins and trace amines PUBMED:9162023. In lower eukaryotes such as Aspergillus, and in bacteria, the main role of amine oxidases is to provide a source of ammonium PUBMED:7770050. PAOs in plants, bacteria and protozoa oxidise spermidine and spermine to an aminobutyral, diaminopropane and hydrogen peroxide and are involved in the catabolism of polyamines PUBMED:9598979. Other members of this family include tryptophan 2-monooxygenase, putrescine oxidase, corticosteroid-binding proteins and antibacterial glycoproteins.\ 5517 IPR008541 \ This family consists of a series of repeated sequences (of around 180 residues) found in Salmonella typhimurium, Salmonella typhi and Escherichia coli. These repeats are almost always found with . The repeats are associated with RatA and RatB, the coding sequences of which are found in the pathogenicity island of Salmonella. The sequences may be determinants of pathogenicity PUBMED:12540539, PUBMED:15347755.\ 500 IPR001288 \

    Initiation factor 3 (IF-3) (gene infC) is one of the three factors required for the initiation of protein biosynthesis in bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex, which consists of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 is a basic protein that binds to the 30S ribosomal subunit PUBMED:8405963. The chloroplast initiation factor IF-3(chl) is a protein that enhances the poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast 30S ribosomal subunits; its central section is evolutionarily related to the sequence of bacterial IF-3 PUBMED:8144528.

    \ 3576 IPR000036 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character of a family or clan name representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
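The naming rule above, in which the first character of a family or clan name gives its catalytic type, can be decoded mechanically. This is a minimal sketch. It assumes that names are plain strings such as the family "A26" and clan "AF" cited for the omptins in this entry. The helper `describe` is hypothetical and is not part of any MEROPS software. The nucleophile grouping simply restates the text.

```python
# Catalytic-type letters exactly as listed in the text; "P" applies to clans
# that contain families of more than one type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}

# Per the text: serine, threonine and cysteine peptidases use an amino acid as
# the nucleophile (acyl intermediate); aspartic, glutamic and metallopeptidases
# use an activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe(identifier: str) -> str:
    """Hypothetical helper: read the catalytic type from an identifier's first character."""
    letter = identifier[:1].upper()
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type in {identifier!r}")
    if letter in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino-acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:  # "U" and "P" do not imply a single nucleophile
        nucleophile = "not implied by type"
    return f"{identifier}: {CATALYTIC_TYPES[letter]}; nucleophile: {nucleophile}"


print(describe("A26"))  # family cited for the omptins
print(describe("AF"))   # clan cited for the omptins
```

Because the helper looks only at the first character, it mirrors the rule as stated. It does not check whether the rest of the string names a real family or clan.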

    \ \

    Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised PUBMED:1455179. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (proteins that are evolutionarily related) and further subdivided into families, largely on the basis of their tertiary structure.

    \

    This group of aspartic peptidases belongs to MEROPS peptidase family A26 (clan AF). The omptin family comprises a number of novel outer membrane-associated proteases, originally described as serine proteases, that are distinct from trypsin-like proteases in that they cleave polypeptides between two basically-charged amino acids PUBMED:3056908. The enzyme is sensitive to the serine protease inhibitor diisopropylfluorophosphate and to divalent cations such as Cu2+, Zn2+ and Fe2+ PUBMED:3056908, and is temperature regulated, with activity decreasing at lower temperatures PUBMED:3056908, PUBMED:8288530. Temperature regulation is most prominent in the Yersinia pestis coagulase/fibrinolysin protein, where coagulase activity prevails below 30 degrees Celsius and fibrinolysin (protease) activity prevails above this point, the optimum temperature being 37 degrees Celsius PUBMED:2526282. This may assist in 'flea blockage' and transmission of the bacteria to animals PUBMED:2526282.

    \ \ The Escherichia coli OmpT protease was previously classified as a serine protease with Ser(99) and His(212) as active site residues. The X-ray structure of the enzyme is inconsistent with this classification: the involvement of a nucleophilic water molecule activated by the Asp(210)/His(212) catalytic dyad classifies OmpT as an aspartic endopeptidase, whose activity is also strongly dependent on Asp(83) and Asp(85). Both of these residues may function in binding of the water molecule and/or oxyanion stabilisation. The proposed mechanism implies a novel proteolytic catalytic site PUBMED:11576541, PUBMED:11566868.\ 265 IPR005178 \

    This is a family of mainly hypothetical proteins of no known function.

    \ 6239 IPR010486 \

    HNS (histone-like nucleoid structuring)-dependent expression A (HdeA) protein is a stress response protein found in highly acid-resistant bacteria such as Shigella flexneri and Escherichia coli, but absent from mildly acid-tolerant bacteria such as Salmonella PUBMED:10623550. HdeA is one of the most abundant proteins found in the periplasmic space of E. coli, where it is one of a network of proteins that confer an acid resistance phenotype essential for the pathogenesis of enteric bacteria PUBMED:12694615. HdeA is thought to act as a chaperone, functioning to prevent the aggregation of periplasmic proteins denatured under acidic conditions. The HNS protein, a chromatin-associated protein that influences the gene expression of several environmentally-induced target genes, represses the expression of HdeA. HdeB, which is encoded within the same operon, may form heterodimers with HdeA. HdeA is a single domain protein with an overall fold that is similar to the fold of the N-terminal subdomain of the GluRS anticodon-binding domain. This entry only covers the HdeA domain.

    \ \ 7013 IPR009844 \

    This family consists of several archaeal proteins of around 180 residues in length. Members of this family seem to be found exclusively in Sulfolobus tokodaii and Sulfolobus solfataricus. The function of this family is unknown.

    \ 542 IPR007174 \ Las1 is an essential nuclear protein involved in cell morphogenesis and cell surface growth PUBMED:8582632.\ 3539 IPR002075 \

    Ran is an evolutionarily conserved member of the Ras superfamily of small GTPases that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Import receptors bind their cargos in the cytoplasm, where the concentration of RanGTP is low, and release their cargos in the nucleus, where the concentration of RanGTP is high PUBMED:12019565. Export receptors respond to RanGTP in the opposite manner.

    Nuclear transport factor 2 (NTF2) is a homodimer of approximately 14 kDa subunits that stimulates efficient nuclear import of a cargo protein. NTF2 binds to both RanGDP and FxFG repeat-containing nucleoporins. NTF2 binds to RanGDP sufficiently strongly for the complex to remain intact during transport through nuclear pore complexes (NPCs), but the interaction between NTF2 and FxFG nucleoporins is much more transient, which would enable NTF2 to move through the NPC by hopping from one repeat to another PUBMED:11129791, PUBMED:10930458.

    NTF2 folds into a cone with a deep hydrophobic cavity, the opening of which is surrounded by several negatively charged residues. RanGDP binds to NTF2 by inserting a conserved phenylalanine residue into the hydrophobic pocket of NTF2 and making electrostatic interactions with the conserved negatively charged residues that surround the cavity.

    \

    A structurally similar domain appears in other nuclear import proteins.

    \ 8019 IPR012619 \

    This family consists of myoactive tetradecapeptides that are isolated from the gut of earthworms, Eisenia foetida and Pheretima vitata. These peptides were termed ETP and PTP respectively. Both peptides showed a potent excitatory action on spontaneous contractions of the anterior gut. These peptides show similarity to Molluscan tetradecapeptides and arthropodan tridecapeptides PUBMED:8532604.

    \ 3247 IPR003815 \

    In bacteria, the regulation of gene expression in response to changes in cell density is called quorum sensing. Quorum-sensing bacteria produce, release and respond to hormone-like molecules (autoinducers) that accumulate in the external environment as the cell population grows. The LuxS protein is involved in quorum sensing and is an autoinducer-production protein PUBMED:9990077.

    \ 7708 IPR012456 \

    The proteins in this entry have not been characterised.

    \ 2915 IPR004998 \

    This is a family of early or early-intermediate transcription factors. This family includes EBV BRLF1 and similar ORF 50 proteins from other herpesviruses.

    \ 6338 IPR009482 \

    This family consists of several hypothetical archaeal proteins of unknown function.

    \ 4625 IPR002155 \

    Two different types of thiolase PUBMED:1755959, PUBMED:2191949, PUBMED:1354266 are found in both eukaryotes and prokaryotes: acetoacetyl-CoA thiolase and 3-ketoacyl-CoA thiolase. 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and is involved in biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis.

    \ \

    In eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes.

    \ \

    There are two conserved cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base involved in deprotonation in the condensation reaction.

    \ \

    Mammalian nonspecific lipid-transfer protein (nsL-TP) (also known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 kDa protein (SCP-2) and a larger 58 kDa protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport; the latter is found in peroxisomes. The C-terminal part of SCP-x is identical to SCP-2, while the N-terminal portion is evolutionarily related to thiolases PUBMED:1755959.

    \ 3386 IPR006963 \

    The molybdopterin oxidoreductase Fe4S4 domain is found in a number of reductase/dehydrogenase families, which include the periplasmic nitrate reductase precursor and the formate dehydrogenase alpha chain.

    \ 1911 IPR003795 \

    This entry describes proteins of unknown function.

    \ 7078 IPR009880 \

    This entry represents the N terminus (approximately 300 residues) of a number of plant and fungal glyoxal oxidase enzymes. Glyoxal oxidase catalyses the oxidation of aldehydes to carboxylic acids, coupled with reduction of dioxygen to hydrogen peroxide. It is an essential component of the extracellular lignin degradation pathways of the wood-rot fungus Phanerochaete chrysosporium PUBMED:10593910.

    \ 2568 IPR004851 \ Flotillins are integral membrane proteins that have been shown to be present in several subcellular components, including caveolae (invaginated plasma membrane microdomains), lipid rafts (sphingolipid- and cholesterol-rich, detergent-resistant plasma membrane microdomains) and the Golgi apparatus. The molecular function of flotillins remains uncertain. They are probably involved in organizing the structure of caveolae, lipid rafts and other detergent-resistant membrane domains. They may also be involved in signal transduction. Flotillins have been shown to accumulate in brain cells with the development of Alzheimer's pathology PUBMED:10936685. Also included in this domain are Reggie proteins, which are expressed in non-caveolar neuronal plasma membrane domains. \ \ 2534 IPR006714 \ Periplasmic flagella are the organelles of spirochete motility and are structurally different from the flagella of other motile bacteria. They reside inside the cell within the periplasmic space and confer motility in viscous gel-like media such as connective tissue PUBMED:2194955. The flagella are composed of an outer sheath of FlaA proteins and a core filament of FlaB proteins. Each species usually has several FlaA proteins PUBMED:8990312.\ 7132 IPR009917 \

    This family consists of several hypothetical mammalian steroid receptor RNA activator proteins. SRA-RNAs likely to encode stable proteins are widely expressed in breast cancer cell lines. SRA-RNA is a steroid receptor co-activator which acts as a functional RNA and is classified as belonging to the growing family of functional non-coding RNAs.

    \ 540 IPR004860 \ This is a family of site-specific DNA endonucleases encoded by DNA mobile elements. Like the homing endonuclease LAGLIDADG/HNH domain, the members of this family are LAGLIDADG endonucleases. \ 1596 IPR003203 \ This family is composed of a group of bifunctional cobalamin biosynthesis enzymes which display cobinamide kinase and cobinamide phosphate guanylyltransferase activity. The crystal structure of the enzyme reveals the molecule to be a trimer with a propeller-like shape PUBMED:9601028.\ 4786 IPR002618 \ This family consists of UTP--glucose-1-phosphate uridylyltransferases, also known as UDP-glucose pyrophosphorylases (UDPGP) and glucose-1-phosphate uridylyltransferases. UTP--glucose-1-phosphate uridylyltransferase catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi PUBMED:8631325. UDP-glucose is an important intermediate in mammalian carbohydrate interconversion, with various metabolic roles depending on tissue type PUBMED:8631325. In Dictyostelium discoideum (slime mould), mutants in this enzyme abort the development cycle PUBMED:3035502. Also within this family are UDP-N-acetylglucosamine pyrophosphorylase PUBMED:9603950 and two hypothetical proteins from Borrelia burgdorferi, the Lyme disease spirochaete.\ 4008 IPR000492 \ Protamines P1 and P2 form a family of small basic peptides that represent the major sperm proteins in placental mammals. In humans and mice, protamine P2 is one of the most abundant sperm proteins. Protamine 2 (PRM2) is a low molecular weight arginine-rich protein which is present in haploid spermatogenic cells of humans, other primates and mice. The protamine P2 gene codes for a P2 precursor, pro-P2, which is later processed by proteolytic cleavages in its N-terminal region to form the mature P2 protamines PUBMED:8513810.\

    Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis. They compact sperm DNA into a highly condensed, stable and inactive complex.

    \ 1373 IPR003540 \ A large group of bacterial exotoxins are referred to as "A/B toxins", essentially because they are formed from two subunits. The "A" subunit possesses enzyme activity and is transferred to the host cell following a conformational change in the membrane-bound transport "B" subunit PUBMED:8225592.\ \

    Clostridial species are one of the major causes of food poisoning and gastrointestinal illnesses. They are Gram-positive, spore-forming rods that occur naturally in the soil PUBMED:8225592. Members of the genus include Clostridium botulinum, which produces one of the most potent toxins in existence; Clostridium tetani, the causative agent of tetanus; and Clostridium perfringens, commonly found in wound infections and diarrhoea cases.

    \ \

    Among the toxins produced by certain Clostridium spp. are the binary exotoxins. These proteins consist of two independent polypeptides, which correspond to the A/B subunit moieties. The enzyme component (A) enters the cell through endosomes produced by the oligomeric binding/translocation protein (B), and prevents actin polymerisation through ADP-ribosylation of monomeric G-actin PUBMED:8225592, PUBMED:8645309, PUBMED:10802189.

    \ \

    Members of the "A" binary toxin family include C. perfringens iota toxin Ia PUBMED:8225592, C. botulinum C2 toxin CI PUBMED:8645309, and Clostridium difficile ADP-ribosyltransferase PUBMED:10802189. Other homologous proteins have been found in Clostridium spiroforme PUBMED:8645309, PUBMED:10802189.

    \ 2196 IPR007493 \ This family consists of several plant proteins of unknown function.\ 1791 IPR006691 \ This repeat is found as six tandem copies at the C termini of GyrA and ParC, subunits of DNA gyrase and topoisomerase IV respectively. Each copy is predicted to form four beta strands, and the repeats probably assemble into a beta-propeller structure PUBMED:11948780. This region has been shown to bind DNA non-specifically and may stabilize the DNA-topoisomerase complex PUBMED:1657531.\ 5115 IPR007952 \

    This family consists of several poxvirus A3L or A2_5L proteins. The entry of vaccinia virus (VV) into the host cell results in the delivery of the core, which contains the double-stranded DNA genome, into the cytoplasm. The core is then disassembled, releasing the viral DNA so that VV cytoplasmic transcription and DNA replication can begin. The A3L protein is a component of that core PUBMED:10729126. The A2.5L gene product is an all-alpha-helical protein with a conserved Cxx(x)C motif in the N-terminal alpha-helix. It appears to be an integral component of intracellular virions PUBMED:12350360.
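
The Cxx(x)C motif mentioned above is a simple sequence pattern: a cysteine, two or three residues of any kind, then a second cysteine. A minimal sketch of a scan for it follows; the function name `find_cxxxc` and the example sequences are hypothetical, chosen only for illustration, and the code does not check whether a hit actually lies in the N-terminal helix.

```python
import re

# Cxx(x)C: Cys, then two or three residues of any kind, then Cys.
# Wrapping the pattern in a lookahead (?=...) lets overlapping motifs be reported.
CXXXC = re.compile(r"(?=C[A-Z]{2,3}C)")


def find_cxxxc(seq: str) -> list:
    """Return the 0-based start positions of Cxx(x)C motifs in a protein sequence."""
    return [m.start() for m in CXXXC.finditer(seq.upper())]


# Made-up sequences, not real A2.5L proteins.
print(find_cxxxc("MACKLCGG"))   # [2]
print(find_cxxxc("ACAACAAAC"))  # [1, 4]
```

A sequence match alone says nothing about structure; in practice the hit position would be compared with a helix prediction.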

    \ 2761 IPR001286 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 59 comprises enzymes with only one known activity: galactocerebrosidase ().

    \ \

    Globoid cell leukodystrophy (Krabbe disease) is a severe, autosomal recessive disorder that results from deficiency of galactocerebrosidase (GALC) activity PUBMED:8661004, PUBMED:7601472, PUBMED:9434153. GALC is responsible for the lysosomal catabolism of certain galactolipids, including galactosylceramide and psychosine PUBMED:8661004.

    \ 7679 IPR012906 \

    This entry describes the N-terminal region of proteins that are similar to, and include, the product of the paaX gene of Escherichia coli (). PaaX is a transcriptional regulator that is always found in association with operons believed to be involved in the degradation of phenylacetic acid PUBMED:11260461. The gene product has been shown to bind to the promoter sites of these operons and repress their transcription PUBMED:10766858.

    \ 8120 IPR013234 \

    This domain is found on phosphatidylinositol N-acetylglucosaminyltransferase proteins. These proteins are involved in GPI anchor biosynthesis and are associated with the disease paroxysmal nocturnal haemoglobinuria PUBMED:12488505.

    \ 5858 IPR010317 \

    This family consists of putative cell surface proteins, from Firmicutes, of unknown function.

    \ 5570 IPR008551 \ This family is found in eukaryotes, prokaryotes and viruses and has no known function. has been found to be expressed during early embryogenesis in Mus sp. PUBMED:8268909.\ 2802 IPR000777 \ The entry of HIV requires the interaction of the viral envelope glycoprotein GP120 with the human T-cell surface glycoprotein CD4 and with a chemokine receptor on the cell surface. Proteins belonging to this family are found in HIV types 1 and 2 and in simian immunodeficiency virus (SIV).\ 1770 IPR001796 \

    Dihydrofolate reductase (DHFR) () catalyses the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate, an essential step in the de novo synthesis of glycine and of purines and deoxythymidine phosphate (the precursors of DNA synthesis) PUBMED:2830673, and is also important in the conversion of deoxyuridine monophosphate to deoxythymidine monophosphate. Although DHFR is found ubiquitously in prokaryotes and eukaryotes, and is present in all dividing cells, where it maintains levels of fully reduced folate coenzymes, the catabolic steps are still not well understood PUBMED:3383852.

    \

    Bacterial species possess distinct DHFR enzymes (based on their pattern of binding diaminoheterocyclic molecules), but mammalian DHFRs are highly similar PUBMED:500653. The active site is situated in the N-terminal half of the sequence, which includes a conserved Pro-Trp dipeptide; the tryptophan has been shown to be involved in the binding of substrate by the enzyme PUBMED:6815178. Its central role in DNA precursor synthesis, coupled with its inhibition by antagonists such as trimethoprim and methotrexate, which are used as antibacterial and anticancer agents respectively, has made DHFR a target of antibacterial and anticancer chemotherapy. However, resistance has developed against some drugs as a result of changes in DHFR itself PUBMED:2601715.

    \ 6011 IPR010391 \

    This family of short proteins includes DNA-damage-inducible protein I (DinI) and related proteins. The SOS response, a set of cellular phenomena exhibited by eubacteria, is initiated by various causes, including DNA-damage-induced replication arrest, and is positively regulated by the co-protease activity of RecA. Escherichia coli DinI, a LexA-regulated SOS gene product, shuts off the initiation of the SOS response when overexpressed in vivo. Biochemical and genetic studies indicated that DinI physically interacts with RecA to inhibit its co-protease activity PUBMED:12626715. The structure of DinI is known PUBMED:11152126.

    \ 2747 IPR001944 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 35 comprises enzymes with only one known activity: beta-galactosidase ().

    \ \

    Mammalian beta-galactosidase is a lysosomal enzyme (gene GLB1) that cleaves the terminal galactose from gangliosides, glycoproteins and glycosaminoglycans, and whose deficiency causes the genetic diseases GM1 gangliosidosis and Morquio disease type B.

    \ 687 IPR000718 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
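
The two motif definitions above translate directly into regular expressions. The sketch below reports matches for both the loose HEXXH pattern and the stricter abXHEbbHbc pattern. The residue sets are assumptions, not taken from the source: 'b' (uncharged) is taken as any residue except D, E, H, K, R and P; 'c' (hydrophobic) as A, F, I, L, M, V, W or Y; 'X' as any residue except P; and 'a' is restricted to V or T even though the text only says it is "most often" one of these, so the strict scan will miss some genuine sites. The names `scan`, `HEXXH` and `STRICT` and the test sequences are hypothetical.

```python
import re

# Assumed residue classes (see lead-in). Proline is excluded throughout the
# strict motif because it is never found in this site.
UNCHARGED = "ACFGILMNQSTVWY"         # 'b': not D, E, H, K, R or P
HYDROPHOBIC = "AFILMVWY"             # 'c'
ANY_BUT_PRO = "ACDEFGHIKLMNQRSTVWY"  # 'X'

# Loose motif H-E-x-x-H (lookahead so overlapping hits are all reported).
HEXXH = re.compile(r"(?=HE[A-Z]{2}H)")

# Strict motif a-b-X-H-E-b-b-H-b-c, with 'a' assumed to be V or T.
STRICT = re.compile(
    rf"(?=[VT][{UNCHARGED}][{ANY_BUT_PRO}]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def scan(pattern: re.Pattern, seq: str) -> list:
    """Return 0-based start positions of all matches of pattern in seq."""
    return [m.start() for m in pattern.finditer(seq.upper())]


# Made-up sequences for illustration.
print(scan(HEXXH, "MKVSAHELTHAVGG"), scan(STRICT, "MKVSAHELTHAVGG"))  # [5] [2]
print(scan(HEXXH, "MKVSAHEPTHAVGG"), scan(STRICT, "MKVSAHEPTHAVGG"))  # [5] []
```

The second example shows why the stricter definition is useful: a proline inside the motif still satisfies HEXXH but is rejected by abXHEbbHbc. A sequence match alone does not show that the histidines actually bind zinc.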

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
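
Under the naming convention above, the catalytic type can be read from the first character of a family or clan identifier such as M13, C58 or MA. A minimal sketch of that lookup, together with the nucleophile rule stated in the same paragraph, is given below. The helper names are hypothetical; this is an illustration of the convention, not part of any MEROPS software.

```python
# First character of a MEROPS family or clan identifier -> catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clans containing families of more than one type
}

# Types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"A", "G", "M"}
# Types whose nucleophile is part of an amino acid (an acyl intermediate forms).
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type encoded by an identifier such as 'M13', 'C58' or 'MA'."""
    return CATALYTIC_TYPES[identifier.strip().upper()[0]]


def nucleophile(identifier: str) -> str:
    """Describe the nucleophile implied by the catalytic type letter."""
    letter = identifier.strip().upper()[0]
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    if letter in RESIDUE_NUCLEOPHILE:
        return "amino acid residue (acyl intermediate)"
    return "not determined by the type letter"


print(catalytic_type("M13"), "/", nucleophile("M13"))  # metallo / activated water molecule
print(catalytic_type("C58"), "/", nucleophile("C58"))  # cysteine / amino acid residue (acyl intermediate)
```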

    \ \

    This group of metallopeptidases belongs to the MEROPS peptidase family M13 (neprilysin family, clan MA(E)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active site residues for members of this family and thermolysin occur in the motif HEXXH PUBMED:7674922.

    \ \

    M13 peptidases are well-studied proteases found in a wide range of organisms including mammals and bacteria. In mammals they participate in processes such as cardiovascular development, blood-pressure regulation, nervous control of respiration, and regulation of neuropeptide function in the central nervous system. In bacteria they may be used for the digestion of milk PUBMED:11223883, PUBMED:7674922. The family includes eukaryotic and prokaryotic oligopeptidases, as well as some of the proteins that form the molecular basis of blood group antigens, e.g. Kell PUBMED:7674922.

    \ \

    Neprilysin () is another member of this group. It is variously known as common acute lymphoblastic leukemia antigen (CALLA), enkephalinase (gp100) and neutral endopeptidase (NEP). It is a plasma membrane-bound mammalian enzyme that is able to digest biologically active peptides, including enkephalins PUBMED:7674922. The zinc ligands of neprilysin are known and are analogous to those in thermolysin, a related peptidase PUBMED:7674922, PUBMED:8099556. Neprilysins, like thermolysin, are inhibited by phosphoramidon, which appears to selectively inhibit this family in mammals. The enzymes are all oligopeptidases, digesting oligo- and polypeptides, but not proteins PUBMED:7674922. Neprilysin consists of a short cytoplasmic domain, a membrane-spanning region and a large extracellular domain. The cytoplasmic domain contains a conformationally restrained octapeptide, which is thought to act as a stop transfer sequence that prevents proteolysis and secretion PUBMED:7674922, PUBMED:3555489.

    \ \ \ 5128 IPR007965 \

    This family consists of several uncharacterised eukaryotic proteins of unknown function.

    \ 4995 IPR003951 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases corresponds to MEROPS peptidase family C58 (clan CA). They are found in bacteria that include plant pathogens (Pseudomonas syringae), root nodule bacteria, and intracellular pathogens (e.g. Yersinia pestis, Haemophilus ducreyi, Pasteurella multocida, Chlamydia trachomatis) of animal hosts. The peptidase domain features a catalytic triad of Cys, His and Asp. Sequences can be extremely divergent outside of a few well-conserved motifs. YopT, a virulence effector protein of Yersinia pestis, cleaves and releases host cell Rho GTPases from the membrane, thereby disrupting the actin cytoskeleton. Members of the family from pathogenic bacteria are likely to be pathogenesis factors PUBMED:12062101.

    \ \ Secretion of virulence factors in Gram-negative bacteria involves transport of the protein across two membranes to reach the cell exterior. Four secretion systems have been described in animal enteropathogens such as Salmonella and Yersinia, with further sequence homologues found in plant pathogens such as Ralstonia and Erwinia PUBMED:9618447. \

    The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagella itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure PUBMED:9618447.

    \

    Exotoxins secreted by the type III system do not possess a secretion signal, and are considered unique because of this PUBMED:9618447. Yersinia pestis secretes such a protein, YopT PUBMED:9746557. YopT is injected into the host cell upon contact, and is therefore considered to be a virulence factor. Haemophilus spp. express a similar toxin on their surface, a 76 kDa antigen PUBMED:9746557.

    \ 204 IPR007006 \ The ALG10 gene from Saccharomyces cerevisiae encodes the alpha-1,2 glucosyltransferase of the endoplasmic reticulum. A homologue of this protein has been characterised in rat as potassium channel regulator 1 PUBMED:9722534.\ 1288 IPR006995 \ This is one of the chains of the nonenzymatic component (CF(0) subunit) of the mitochondrial ATPase complex.\ 7931 IPR012632 \

    This family consists of the calcine family of scorpion toxins, which comprises Maurocalcine and Imperatoxin. These toxins have been shown to be potent effectors of the ryanodine-sensitive calcium channel of skeletal muscle, and are therefore useful for studies of the interaction between the dihydropyridine receptor and the ryanodine receptor PUBMED:10861934, PUBMED:11867448.

    \ 6115 IPR010437 \

    This family describes a small protein, always shorter than 100 amino acids, encoded in pathogenicity islands for bacterial type III secretion systems in various strains of Yersinia, Salmonella and enteropathogenic Escherichia coli, as well as in Chromobacterium violaceum and Citrobacter rodentium. Although strictly associated with type III secretion systems, this protein does not yet appear to have been characterised as part of the apparatus or as an effector protein.

    \ 7069 IPR009877 \

    This family consists of several hypothetical bacterial proteins of around 90 residues in length. The function of this family is unknown.

    \ 1743 IPR002366 \

    Defensins are 2-6 kDa cationic microbicidal peptides containing three pairs of intramolecular disulphide bonds PUBMED:12072367; they are active against many Gram-negative and Gram-positive bacteria, fungi and enveloped viruses PUBMED:8528769. On the basis of their size and pattern of disulphide bonding, mammalian defensins are classified into alpha, beta and theta categories. Alpha-defensins, which have been identified in humans, monkeys and several rodent species, are particularly abundant in neutrophils, certain macrophage populations and Paneth cells of the small intestine. Every mammalian species explored thus far has beta-defensins. In cows, as many as 13 beta-defensins exist in neutrophils. However, in other species, beta-defensins are more often produced by epithelial cells lining various organs (e.g. the epidermis, bronchial tree and genitourinary tract). Theta-defensins are cyclic and have so far only been identified in primate phagocytes.

    Defensins are produced constitutively and/or in response to microbial products or proinflammatory cytokines. Some defensins are also called corticostatins (CS) because they inhibit corticotropin-stimulated corticosteroid production. The mechanism(s) by which defensins kill and/or inactivate microorganisms are not completely understood. However, it is generally believed that killing is a consequence of disruption of the microbial membrane. The polar topology of defensins, with spatially separated charged and hydrophobic regions, allows them to insert into phospholipid membranes so that their hydrophobic regions are buried within the lipid membrane interior and their charged (mostly cationic) regions interact with anionic phospholipid head groups and water. Subsequently, some defensins can aggregate to form 'channel-like' pores; others might bind to and cover the microbial membrane in a 'carpet-like' manner. The net outcome is the disruption of membrane integrity and function, which ultimately leads to the lysis of microorganisms. Some defensins are synthesized as propeptides, which may be relevant to this process: in neutrophils only the mature peptides have been identified, but in Paneth cells the propeptide is stored in vesicles PUBMED:12021776 and appears to be cleaved by trypsin on activation.

    \ 6716 IPR006384 \

    This group of sequences belongs to the IB subfamily of the haloacid dehalogenase (HAD) superfamily of aspartate-nucleophile hydrolases. With the exception of sequences from Bacillus subtilis and Clostridium acetobutylicum, the members of this group are all eukaryotic, spanning metazoa, plants and fungi.

    \ 2162 IPR007511 \ This is a family of uncharacterised bacterial proteins.\ 2832 IPR001992 \ A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of proteins (also called the type II pathway) PUBMED:8438237, have been found to be evolutionarily related. These are proteins of about 400 amino acids that are highly hydrophobic and are thought to be integral proteins of the inner membrane.\ 3226 IPR006864 \ This repeated sequence element is found in the LMP group of surface-located membrane proteins of Mycoplasma hominis. The number of repeats in the protein affects the tendency of cells to aggregate spontaneously. Agglutination may be an important factor in colonization. Non-agglutinating microorganisms might be more easily disseminated, whereas aggregation might provide a better chance of avoiding an antibody response, since some of the epitopes may be buried PUBMED:7543881.\ 4832 IPR005338 \

    The proteins in this family are about 370 amino acids long and have no known function.

    \ 1236 IPR006689 \

    The small ADP-ribosylation factor (Arf) GTP-binding proteins are major regulators of vesicle biogenesis in intracellular traffic PUBMED:12429613. They are the founding members of a growing family that includes Arl (Arf-like), Arp (Arf-related proteins) and the remotely related Sar (Secretion-associated and Ras-related) proteins. Arf proteins cycle between inactive GDP-bound and active GTP-bound forms that bind selectively to effectors. The classical structural GDP/GTP switch is characterized by conformational changes at the so-called switch 1 and switch 2 regions, which bind tightly to the gamma-phosphate of GTP but poorly or not at all to the GDP nucleotide. Structural studies of Arf1 and Arf6 have revealed that although these proteins feature the switch 1 and 2 conformational changes, they depart from other small GTP-binding proteins in that they use an additional, unique switch to propagate structural information from one side of the protein to the other.

    The GDP/GTP structural cycles of human Arf1 and Arf6 feature a unique conformational change that affects the beta2beta3 strands connecting switch 1 and switch 2 (interswitch) and also the amphipathic helical N-terminus. In GDP-bound Arf1 and Arf6, the interswitch is retracted and forms a pocket to which the N-terminal helix binds, the latter serving as a molecular hasp to maintain the inactive conformation. In the GTP-bound form of these proteins, the interswitch undergoes a two-residue register shift that pulls switch 1 and switch 2 up, restoring an active conformation that can bind GTP. In this conformation, the interswitch projects out of the protein and extrudes the N-terminal hasp by occluding its binding pocket.

    \ 4722 IPR003993 \

    Treacher Collins Syndrome (TCS) is an autosomal dominant disorder of craniofacial development, the features of which include conductive hearing loss and cleft palate PUBMED:9096354, PUBMED:9042910; it is the most common of the human mandibulo-facial dysostosis disorders PUBMED:9096354. The TCS locus has been mapped to human chromosome 5q31.3-32 and the mutated gene identified (TCOF1) PUBMED:9042910. To date, 35 mutations have been reported in TCOF1, all but one of which result in the introduction of a premature-termination codon into the predicted protein, Treacle. The observed mutational spectrum supports the hypothesis that TCS results from haploinsufficiency.
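
The mutational spectrum described above is defined by premature-termination codons. As an illustration only (not a variant-annotation method), a premature stop can be flagged by walking the reading frame codon by codon and checking whether the first in-frame stop codon comes before the last codon. The sketch assumes a DNA coding sequence that is in frame from its first base and the standard-code stop codons TAA, TAG and TGA; the function names and sequences are hypothetical and are not taken from TCOF1.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the final codon."""
    idx = first_stop_codon(cds)
    return idx is not None and idx < len(cds) // 3 - 1


wild_type = "ATGGCCTGGTAA"  # ATG GCC TGG TAA: the stop is the last codon
mutant = "ATGGCCTGATAA"     # TGG -> TGA: a stop now appears at codon 2
print(has_premature_stop(wild_type), has_premature_stop(mutant))  # False True
```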

    \

    Treacle is a low-complexity protein of 1,411 amino acids whose predicted structure contains a set of highly polar repeated motifs PUBMED:9096354. These motifs are common to nucleolar trafficking proteins in other species and are predicted to be phosphorylated by casein kinase. Consistent with this observation, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localisation signals PUBMED:9096354. Mutations in TCS families, as well as several polymorphisms, are found throughout the open reading frame. It has thus been suggested that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development.

    \ 1783 IPR001158 \ Dishevelled (Dsh) protein is an important component of the Wnt signal-transduction pathway. It has three relatively conserved domains: DIX, PDZ and DEP. The DIX domain of Dvl-1 (a mammalian Dishevelled homolog) shares 37% identity with the C-terminal region of Axin. Dsh can interact with the Axin/APC/GSK3/beta-catenin complex, and may thus modulate its activity PUBMED:10330181.\

    The Wnt signaling pathway is conserved in various species from worms to mammals, and plays important roles in development, cellular proliferation, and differentiation. The molecular mechanisms by which the Wnt signal regulates cellular functions are becoming increasingly well understood. Wnt stabilizes cytoplasmic beta-catenin, which stimulates the expression of genes including c-myc, c-jun, fra-1, and cyclin D1. Axin and its homolog Axil are components of the Wnt signaling pathway that negatively regulate this pathway. Other components of the Wnt signaling pathway, including Dvl, glycogen synthase kinase-3beta (GSK-3beta), beta-catenin, and adenomatous polyposis coli (APC), interact with Axin, and the phosphorylation and stability of beta-catenin are regulated in the Axin complex. Axil has similar functions to Axin. Thus, Axin and Axil act as scaffold proteins in the Wnt signaling pathway, thereby modulating the Wnt-dependent cellular functions PUBMED:10647780.

    \ 2883 IPR005203 \

    Haemocyanins are copper-containing oxygen transport proteins found in the haemolymph of many invertebrates. They are divided into 2 main groups, arthropodan and molluscan. These have structurally similar oxygen-binding centres, which are similar to the oxygen-binding centre of tyrosinases PUBMED:, but their quaternary structures are arranged differently. The arthropodan proteins exist as hexamers comprising 3 heterogeneous subunits (a, b and c) and possess 1 oxygen-binding centre per subunit; the molluscan proteins exist as cylindrical oligomers of 10 to 20 subunits and possess 7 or 8 oxygen-binding centres per subunit PUBMED:3207675. Although the proteins have similar amino acid compositions, the only real similarity in their primary sequences is in the region corresponding to the second copper-binding domain, which also shows similarity to the copper-binding domain of tyrosinases PUBMED:.

    \

    Larval storage proteins (LSPs) PUBMED:2808410 are proteins from the hemolymph of insects, which may serve as a store of amino acids for the synthesis of adult proteins. There are two classes of LSPs: arylphorins, which are rich in aromatic amino acids, and methionine-rich LSPs. LSPs form hexameric complexes and are structurally related to arthropod hemocyanins.

    \ \ 5915 IPR010347 \

    Covalent intermediates between topoisomerase I and DNA can become dead-end complexes that lead to cell death. Tyrosyl-DNA phosphodiesterase can hydrolyse the bond between topoisomerase I and DNA PUBMED:10521354.

    \ 466 IPR011531 \

    This family contains Band 3 anion exchange proteins that exchange Cl-/HCO3-, such as . This family also includes Na+/HCO3- cotransporters, such as , and a number of putative transporters from plants and fungi.

    \ 4651 IPR007713 \ This short repeat consists of the motif WXXh, where X can be any residue and h is a hydrophobic residue. The repeat is named TMP after its occurrence in the tape measure protein (TMP). The tape measure protein is a component of the phage tail and probably forms a beta-helix. Truncated forms of TMP lead to shortened tail fibres PUBMED:11040123. This repeat is also found in non-phage proteins, where it may play a structural role.\ 5378 IPR008751 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \ This group of cysteine peptidases belongs to MEROPS peptidase family C53 (clan C-). The active site residues occur in the order E, H, C in the sequence, an order unlike that in any other family. These peptidases are unique to pestiviruses. The N-terminal cysteine peptidase (Npro) encoded by the bovine viral diarrhoea virus genome is responsible for the self-cleavage that releases the N terminus of the core protein. This unique protease is dispensable for viral replication, and its coding region can be replaced by a ubiquitin gene fused directly in frame to the core PUBMED:11711606, PUBMED:10864644, PUBMED:9499122, PUBMED:8972567.\ 2375 IPR001334 \

    The papillomavirus E6 oncoproteins are small zinc-binding proteins that share a conserved zinc-binding CXXC motif and have no identified intrinsic enzymatic activity. E6 proteins are thought to act as adapter proteins, thereby altering the function of E6-associated cellular proteins. This model for E6 function is best supported by observations of human papillomavirus type 16 (HPV-16) E6 (16E6), which can alter the metabolism of the p53 tumor suppressor through association with a cellular E3 ubiquitin ligase called E6AP. HPV-16 E6 interacts with an 18-amino-acid sequence in E6AP, and in an as yet ill-defined fashion the E6AP-16E6 complex binds to p53, inducing the ubiquitin-dependent degradation of the trimolecular complex. 16E6 apparently functions as an adapter protein in the complex with p53, since E6AP does not interact with p53 in the absence of E6 and since the degradation of p53 requires both E6 and E6AP.

    \ \

    Despite the similarity in structure of the E6 oncoproteins, studies have indicated surprising biochemical diversity among E6 oncoproteins of different papillomavirus types. E6 proteins from the cancer-associated human papillomaviruses (HPVs) form a complex with a cellular protein termed E6-AP and, together with E6-AP, bind to the p53 tumor suppressor protein, thereby degrading p53 through ubiquitin-mediated proteolysis. E6 proteins from the non-cancer-associated HPV types do not bind E6-AP or degrade p53. Bovine papillomavirus E6 (BE6) binds E6-AP but fails either to complex with p53 or to degrade associated proteins, implying that BE6 might transform cells through a mechanism different from that of the HPVs. In addition to targeting p53, the E6 proteins of both cancer-associated HPVs and BPV-1 have been shown to associate with a calcium-binding cellular protein localized to the endoplasmic reticulum PUBMED:10623743, PUBMED:9151888.

    \ 6727 IPR010691 \

    This family consists of several WzyE proteins, which appear to be specific to Enterobacteria. Members of this family have been described as putative ECA polymerases, but this has been found to be incorrect PUBMED:11673418. The function of this family is unknown.

    \ 5562 IPR008550 \ This family consists of several gammaherpesvirus proteins of unknown function.\ 4042 IPR000489 \

    All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most microorganisms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells that allows these organisms to use dietary folates. Proteins containing this domain include dihydropteroate synthase () as well as a group of methyltransferase enzymes including methyltetrahydrofolate:corrinoid/iron-sulphur protein methyltransferase (MeTr), which catalyses a key step in the Wood-Ljungdahl pathway of carbon dioxide fixation.

    \ \

    Dihydropteroate synthase () (DHPS) catalyses the condensation of 6-hydroxymethyl-7,8-dihydropteridine pyrophosphate with para-aminobenzoic acid to form 7,8-dihydropteroate. This is the second step in the three-step pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulphonamides, which are substrate analogues that compete with para-aminobenzoic acid. Bacterial DHPS (gene sul or folP) PUBMED:2123867 is a protein of about 275 to 315 amino acid residues that is either chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) PUBMED:1313386.

    \ 1558 IPR007576 \ CITED, CBP/p300-interacting transactivator with ED-rich tail, is characterised by a conserved 32-amino acid sequence at the C terminus. CITED protein does not bind DNA directly and is thought to function as a transcriptional co-activator PUBMED:11744733.\ 2545 IPR002535 \ The flaviviruses are small enveloped animal viruses containing a single\ positive-strand genomic RNA PUBMED:2174669. The genome contains one large ORF encoding a\ polyprotein that undergoes proteolytic processing into mature viral \ peptide chains.\ This entry consists of a propeptide region of approximately 90 amino\ acids in length.\ 7442 IPR011471 \

    This is a family of hypothetical proteins found in Leptospira interrogans.

    \ 7413 IPR011446 \

    This is a family of proteins identified in Rhodopirellula baltica. The function is not known.

    \ 5048 IPR007372 \ Escherichia coli YceI is a base-induced periplasmic protein. Its function has not yet been characterised PUBMED:12107143.\ 773 IPR000494 \ The type-1 insulin-like growth-factor receptor (IGF-1R) and insulin receptor (IR) are closely related members of the tyrosine-kinase receptor superfamily. IR is essential for glucose homeostasis, whereas IGF-1R is involved both in normal growth and development and in malignant transformation. Homologues of these\ receptors are found in animals as simple as cnidarians. The epidermal growth-factor receptor (EGFR) family is closely related to the IR family and has\ significant sequence identity to the first three domains of the extracellular portion of IGF-1R (L1-Cys-rich-L2). \

    The L domains each consist of a single-stranded right-handed beta-helix. The Cys-rich region is composed of eight disulphide-bonded modules, seven of which form a rod-shaped domain with modules associated in an unusual manner. The three domains surround a central space of sufficient size to accommodate a ligand molecule. Although this fragment (residues 1-462) does not bind ligand, many of the determinants responsible for hormone binding and ligand specificity map to this central site. This structure therefore shows how the IR subfamily might interact with its ligands PUBMED:9690478.

    \

    A number of receptor systems have been implicated in the\ development and progression of many human cancers. The epidermal growth\ factor (EGF) receptor tyrosine kinase family has consistently been found to play a\ leading role in tumor progression PUBMED:10579913.

    \ 819 IPR000040 \ The AML1 gene is rearranged by the t(8;21) translocation in acute myeloid\ leukemia PUBMED:7651838. The gene is highly similar to the Drosophila melanogaster segmentation \ gene runt and to the mouse transcription factor PEBP2 alpha subunit gene PUBMED:7651838.\ The region of shared similarity, known as the Runt domain, is responsible \ for DNA-binding and protein-protein interaction. \

    In addition to the highly-conserved Runt domain, the AML-1 gene product\ carries a putative ATP-binding site (GRSGRGKS), and has a C-terminal region\ rich in proline and serine residues. The protein (known as acute myeloid \ leukemia 1 protein, oncogene AML-1, core-binding factor (CBF), alpha-B \ subunit, etc.) binds to the core site, 5'-pygpyggt-3', of a number of\ enhancers and promoters.
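    The core site above is written with "py" for a pyrimidine (C or T). One way to search a sequence for such a degenerate site is to translate it into a regular expression. The sketch below assumes IUPAC single-letter codes, with Y standing for C/T, so 5'-pygpyggt-3' becomes YGYGGT. The names iupac_to_regex, find_sites and CORE_SITE are hypothetical. The sketch reads only the strand it is given and is an illustration only, not a validated binding-site predictor.

```python
import re

# Subset of IUPAC nucleotide codes; "Y" (pyrimidine, C or T) covers the
# "py" positions of the core site. Codes not listed here raise KeyError.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC-coded DNA motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_sites(sequence: str, motif: str) -> list:
    """Return 0-based start positions of motif hits on the given strand.

    The zero-width lookahead lets overlapping hits be reported; the
    reverse strand is not scanned.
    """
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical IUPAC encoding of the core site 5'-pygpyggt-3'.
CORE_SITE = "YGYGGT"

print(iupac_to_regex(CORE_SITE))           # [CT]G[CT]GGT
print(find_sites("AATGTGGTA", CORE_SITE))  # [2]
```

    The example sequence is made up. To scan the other strand as well, search the reverse complement of the sequence.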

    \

    The protein is a heterodimer of alpha- and beta-subunits. The alpha-subunit\ binds DNA as a monomer and appears to have a role in the development of\ normal hematopoiesis. CBF is a nuclear protein expressed in numerous tissue\ types except brain and heart; the highest levels are found in \ thymus, bone marrow and peripheral blood.

    \ 6324 IPR009476 \

    This family consists of several bacterial putative membrane proteins.

    \ 7831 IPR012544 \

    This family contains many bacterial hypothetical proteins.

    \ 95 IPR003406 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates and related proteins into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    This is glycosyltransferase family 14, a family of two different beta-1,6-N-acetylglucosaminyltransferases: I-branching enzyme and core-2 branching enzyme. I-branching enzyme, an integral membrane protein, converts linear into branched poly-N-acetyllactosaminoglycans in the glycosylation pathway and is responsible for the production of the blood group I antigen during embryonic development PUBMED:8449405. Core-2 branching enzyme, also an integral membrane protein, forms crucial side-chain branches in O-glycans in the glycosylation pathway PUBMED:9915862.

    \ 5798 IPR010285 \

    The majority of members in this family have no known function. Most of the sequences in the family are described as hypothetical; however, some are putative helicases and some have a nucleic acid-binding fold.

    \ 2501 IPR003149 \ This family represents the small subunit of the Fe-only hydrogenases. The subunit consists of alternating random-coil and alpha-helical structures that encompass the large subunit in a novel protein fold PUBMED:10368269.\ 6250 IPR010929 \

    ABC (ATP-binding cassette) transporters are involved in the uptake of nutrients, the secretion of signalling molecules and toxins, and multidrug resistance PUBMED:12504680. The many different ABC transporters contain highly conserved sequences for ATP binding and hydrolysis. In yeast, the PDR and CDR ABC transporters display extensive sequence homology and confer resistance to several anti-fungal compounds by actively transporting their substrates out of the cell. These transporters have two homologous halves, each with an N-terminal intracellular hydrophilic region that contains an ATP-binding site, followed by a C-terminal membrane-associated region containing six transmembrane segments PUBMED:12709320. This entry represents a domain of the PDR/CDR ABC transporter comprising extracellular loop 3, transmembrane segment 6 and a linker region.

    \ 6602 IPR010641 \

    This entry represents the N terminus of a number of archaeal proteins that are putative transposases. Note that many family members are annotated as hypothetical proteins and a few as neutral proteinases.

    \ 6477 IPR010595 \

    This family consists of several short, hypothetical bacterial proteins of unknown function.

    \ 1250 IPR002556 \ This family consists of viral envelope proteins from the\ arterivirus genus; this includes porcine reproductive and \ respiratory syndrome virus (PRRSV) envelope protein GP3 and lactate \ dehydrogenase-elevating virus (LDV) structural glycoprotein.\ Arteriviruses have a positive-sense single-stranded RNA genome and do not have a DNA\ stage.\ 2612 IPR000559 \

    Formate--tetrahydrofolate ligase (formyltetrahydrofolate synthetase, FTHFS) is one of the enzymes\ participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. In many of\ these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). In eukaryotes\ the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase (C1-THF synthase), which\ also has dehydrogenase and cyclohydrolase activities. Two forms of C1-THF synthase are known PUBMED:2836393:\ one is located in the mitochondrial matrix, while the other is cytoplasmic. In both forms the FTHFS domain\ consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF synthase. In prokaryotes\ FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino acid residues PUBMED:2200509.

    \

    The crystal structure of N(10)-formyltetrahydrofolate synthetase from Moorella thermoacetica shows that each subunit is composed of three domains organized around three mixed beta-sheets. There are two cavities between adjacent domains; one of them was identified as the nucleotide-binding site by\ homology modeling. The large domain contains a seven-stranded beta-sheet surrounded by helices on both sides. The second domain contains a five-stranded beta-sheet with two\ alpha-helices packed on one side, while the other two form a wall of the active-site cavity. The\ third domain contains a four-stranded beta-sheet forming a half-barrel. The concave side is\ covered by two helices, while the convex side forms another wall of the large cavity. Arg 97 is likely\ involved in formyl phosphate binding. The tetrameric molecule is relatively flat, with the shape of\ the letter X, and the active sites are located at the ends of the subunits, far from the subunit interface PUBMED:10747779.

    \ 486 IPR004925 \ The hpaB gene encodes part of the 4-hydroxyphenylacetate 3-hydroxylase from Escherichia coli PUBMED:8077235. HpaB is part of a heterodimeric\ enzyme that also requires HpaC. The enzyme is NADH-dependent and uses FAD as the redox chromophore. This family also includes\ PvcC, which may play a role in one of the proposed hydroxylation steps of pyoverdine chromophore biosynthesis PUBMED:10383985. \ \ 287 IPR006502 \

    This family of uncharacterised plant proteins is defined by a region found toward the C-terminus. This region is strongly conserved (greater than 30% sequence identity between most pairs of members) but is flanked by highly divergent regions, including stretches of low-complexity sequence.

    \ 5288 IPR008721 \ This family consists of several eukaryotic origin recognition complex subunit 6 (ORC6) proteins. In eukaryotes, DNA replication initiation is driven by a single conserved initiator complex termed the origin recognition complex (ORC). The ORC is a six-protein complex. The function of ORC is reviewed in PUBMED:11914271.\ 3524 IPR003387 \ Nodulin is a plant protein of unknown function. It is induced during nodulation in legume roots after rhizobial infection.\ 4344 IPR006380 \

    This family of sequences represents sucrose phosphate phosphohydrolase (SPP) from plants and cyanobacteria PUBMED:11050182. SPP is a member of the Class IIB subfamily of the haloacid dehalogenase (HAD) superfamily of aspartate-nucleophile hydrolases. SPP catalyzes the final step in the biosynthesis of sucrose, a critically important molecule for plants. Sucrose phosphate synthase (SPS), which catalyzes the preceding step in sucrose biosynthesis, contains a domain that exhibits considerable similarity to SPP, albeit without conservation of the catalytic residues. The catalytic machinery of the synthase resides in another domain. It seems likely that the phosphatase-like domain is involved in substrate binding, possibly binding both substrates in a "product-like" orientation prior to ligation by the synthase catalytic domain.

    \ 1591 IPR006692 \

    RET1P, the alpha-subunit of the coatomer complex in\ Saccharomyces cerevisiae, participates in membrane transport between the endoplasmic reticulum and Golgi apparatus. The protein contains six WD-40 repeat motifs in its N-terminal region PUBMED:8647451.

    \ 3570 IPR005308 \

    This domain has a flavodoxin-like fold and is termed the "wing" domain because of its position in the overall 3D structure. Ornithine decarboxylase from Lactobacillus 30a (L30a OrnDC) is representative of the large, pyridoxal-5'-phosphate-dependent\ decarboxylases that act on lysine, arginine or ornithine. The crystal structure of L30a OrnDC has been solved to 3.0 Å resolution. Six dimers related by C6 symmetry compose the enzymatically active\ dodecamer (approximately 10^6 Da). Each monomer of L30a OrnDC can be described in terms of five sequential folding domains.\ The amino-terminal domain, residues 1 to 107, consists of a five-stranded beta-sheet termed the "wing" domain. Two wing domains of\ each dimer project inward towards the center of the dodecamer and contribute to dodecamer stabilization PUBMED:7563080.

    \ 6088 IPR009368 \

    This family consists of several hypothetical Staphylococcus aureus and Staphylococcus aureus phage phi proteins. The function of this family is unknown.

    \ 7492 IPR011657 \ This entry consists of nucleoside transport proteins. is a purine-specific Na+-nucleoside cotransporter localised to the bile canalicular membrane PUBMED:7775409. is a Na+-dependent nucleoside transporter selective for pyrimidine nucleosides and adenosine. It also transports the anti-viral nucleoside analogues AZT and ddC PUBMED:8027026. This entry covers the C terminus of this family of transporters.\ 3232 IPR007443 \ This family includes several bacterial outer membrane antigens, whose molecular function is unknown.\ 1941 IPR004239 \

    This group comprises proteins of unknown function from Borrelia burgdorferi, the causative organism of Lyme disease.

    \ 6387 IPR010550 \

    This family consists of several bacterial 2'-deoxycytidine 5'-triphosphate deaminase proteins.

    \ 2956 IPR000170 \ High potential iron-sulphur proteins (HiPIP) PUBMED:1917989 are a specific class of\ high-redox-potential 4Fe-4S ferredoxins that function in anaerobic electron\ transport and occur in photosynthetic bacteria and in Paracoccus\ denitrificans.\ The HiPIPs are small proteins that show significant variation in their\ sequences, their sizes (from 63 to 85 amino acids) and their oxidation-reduction\ potentials. As shown in the following schematic representation, the\ iron-sulphur cluster is bound by four conserved cysteine residues.\
    \
                               [ 4Fe-4S cluster]\
                               | |       |     |\
            xxxxxxxxxxxxxxxxxxxCxCxxxxxxxCxxxxxCxxxx\
    \
    'C': conserved cysteine involved in the binding of the iron-sulphur cluster.\
    
    \ 7276 IPR010892 \

    This family represents a conserved region approximately 140 residues long within secreted phosphoprotein 24 (Spp-24), which seems to be restricted to vertebrates. This is a non-collagenous protein found in bone that is related in sequence to the cystatin family of thiol protease inhibitors. This suggests that Spp-24 could function to modulate the thiol protease activities known to be involved in bone turnover. It is also possible that the intact form of Spp-24 found in bone could be a precursor to a biologically active peptide that coordinates an aspect of bone turnover PUBMED:7814406.

    \ 898 IPR001699 \

    Transcription factors of the T-box family are required both for early cell-fate decisions, such as those necessary for formation of\ the basic vertebrate body plan, and for differentiation and organogenesis PUBMED:12093383. The T-box is defined as the minimal region within the T-box protein that is both necessary and sufficient for sequence-specific\ DNA binding; all members of the family so far examined bind to the DNA consensus sequence TCACACCT. The T-box is a relatively large DNA-binding domain, generally comprising about a third of the entire protein (17-26 kDa).
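    A double-stranded DNA site can appear on either strand. A search for the TCACACCT consensus therefore has to look for the consensus itself and for its reverse complement. The sketch below does this with exact string matching only. The names reverse_complement, scan_both_strands and TBOX_CONSENSUS are hypothetical, and real T-box sites may deviate from the consensus.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
TBOX_CONSENSUS = "TCACACCT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence made of A, C, G and T only."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def scan_both_strands(seq: str, site: str = TBOX_CONSENSUS) -> list:
    """Exact matches of `site` on either strand as (strand, start) pairs.

    Minus-strand hits are reported at the forward-strand coordinate of the
    leftmost base of the reverse-complemented site. A palindromic site
    would be reported twice, once per strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(pattern, start + 1)
    return hits


print(reverse_complement(TBOX_CONSENSUS))  # AGGTGTGA
print(scan_both_strands("GGTCACACCTGG"))   # [('+', 2)]
print(scan_both_strands("AAGGTGTGAA"))     # [('-', 1)]
```

    The example sequences are made up. For degenerate or weaker sites, a position weight matrix would replace the exact match.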

    \

    These genes were uncovered on the basis of similarity to the DNA-binding domain PUBMED:9504043 of the murine Brachyury (T) gene product; this similarity is the defining feature of the family. The Brachyury gene is named for its phenotype, which was identified 70 years ago as a mutant mouse strain with a short, blunted tail. The gene and its paralogues have become a well-studied model for the family, and hence much of what is known about the T-box family is derived from the murine Brachyury gene.

    \

    Consistent with its nuclear location, Brachyury protein has a sequence-specific DNA-binding activity and can act as a transcriptional regulator PUBMED:9503012. Homozygous mutants show extensive developmental anomalies, and the mutation is lethal PUBMED:9395282. Brachyury is postulated to act as a transcription factor that regulates the specification and differentiation of posterior mesoderm during gastrulation in a dose-dependent manner PUBMED:9504043.

    \

    T-box proteins tend to be expressed in specific organs or cell types, especially during development, and they are generally required for the development of\ those tissues. For example, Brachyury is expressed in posterior mesoderm and in the developing notochord, and it is required for\ the formation of these cells in mice PUBMED:9196325.

    \ 2966 IPR005565 \

    Haemolysin (HlyA) and related toxins are secreted across both the cytoplasmic and outer membranes of Gram-negative bacteria in a process which proceeds without a periplasmic intermediate. HlyA is directed by an uncleaved C-terminal targeting signal and the HlyD and HlyB translocator proteins PUBMED:1419114.

    \ 4000 IPR002872 \ The proline oxidase/dehydrogenase is responsible for the first step in the conversion of proline to glutamate for use as a carbon and nitrogen source. The enzyme requires FAD as a cofactor, and is induced by proline.\ It is found in combination with in bacteria.\ 3904 IPR002914 \

    Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee\ King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E.,\ Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of\ the first three letters of the genus; a space; the first letter of the\ species name; a space and an arabic number. In the event that two species\ names have identical designations, they are discriminated from one another\ by adding one or more letters (as necessary) to each species designation.
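    The designation rule above is mechanical enough to express as a small function. The sketch below, with the hypothetical name allergen_designation, joins the first three letters of the genus, the first letter of the species and an arabic number, separated by spaces. It deliberately does not model the extra letters added when two species collide, because that tie-breaking needs a lookup of existing names.

```python
def allergen_designation(genus: str, species: str, number: int) -> str:
    """Compose a designation: first three letters of the genus, a space,
    the first letter of the species name, a space and an arabic number.

    Collisions between species with identical designations are not
    handled here.
    """
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


print(allergen_designation("Phleum", "pratense", 5))  # Phl p 5
print(allergen_designation("Poa", "pratensis", 9))    # Poa p 9
```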

    \

    The allergens in this family include allergens with the following designations: Lol p 5, Pha a 5, Phl p 5, Phl p 6, Phl p 11 and Poa p 9.

    \ \ \

    Grass pollen allergens are one of the major causes of type I allergies (including allergic rhinoconjunctivitis, allergic bronchial asthma and hay fever), afflicting 15-20% of a genetically predisposed population PUBMED:1702432. The predicted molecular masses of the known pollen allergen proteins range from 28.3 to 37.8 kDa PUBMED:1702432. Northern analysis indicates that expression of the genes is confined to pollen tissue. A low level of similarity is observed between the Phl p 5 allergens and the N-terminal sequences of Poa pratensis p 9 proteins PUBMED:2051020.

    \

    The N-terminal region of P. pratensis p 9 has been shown to possess epitopes that cross-react with the acidic group V allergens of Phleum pratense PUBMED:2051020. Comparison of amino acid sequences of recombinant P. pratensis p 9 proteins with those of Lol p 5 isoallergens revealed a low level of similarity between the N-terminal sequences of these proteins PUBMED:2051020. A C-terminal region, conserved in P. pratensis p 9 allergens, appears to contain epitopes unique to these proteins PUBMED:2051020.

    \ 3074 IPR000581 \ Two dehydratases, dihydroxy-acid dehydratase (gene ilvD or ILV3) and 6-phosphogluconate\ dehydratase (gene edd), have been shown to be evolutionarily related PUBMED:1624451. Dihydroxy-acid\ dehydratase catalyzes the fourth step in the biosynthesis of isoleucine and valine, the dehydration of\ 2,3-dihydroxy-isovaleric acid into alpha-ketoisovaleric acid. 6-Phosphogluconate dehydratase catalyzes the\ first step in the Entner-Doudoroff pathway, the dehydration of 6-phospho-D-gluconate into \ 6-phospho-2-dehydro-3-deoxy-D-gluconate. Another protein containing this signature is the Escherichia coli hypothetical protein\ yjhG. The N-terminal part of the proteins contains a cysteine that could be involved in the binding of a\ 2Fe-2S iron-sulphur cluster PUBMED:8299945.\ 3171 IPR003767 \

    The malate dehydrogenase (MDH) of some extremophiles is more similar to the L-lactate dehydrogenases (L-LDH) from various sources than to other MDHs PUBMED:8476859.

    \ \

    This family consists of bacterial and archaeal malate/L-lactate dehydrogenases. The archaeal malate dehydrogenase differs from the eubacterial and eukaryotic enzymes in having low selectivity for the coenzyme (NAD(H) or NADP(H)) and in catalyzing the reduction of oxaloacetate to malate more efficiently than the reverse reaction PUBMED:2110059.

    \ 2573 IPR004213 \ The flt3 (fms-related tyrosine kinase 3) ligand is a short-chain cytokine with a four-helical bundle fold. It is a type I membrane protein that stimulates the proliferation of early hematopoietic cells and synergises well with other colony-stimulating factors and interleukins.\ 4896 IPR000015 \ In Gram-negative bacteria the biogenesis of fimbriae (or pili) requires a two-component assembly and transport system, which is composed of a periplasmic\ chaperone and an outer membrane protein that has been\ termed a molecular 'usher' PUBMED:7909802, PUBMED:7906265, PUBMED:7906046.

    The usher protein is rather large (from 86 to\ 100 kDa) and seems to be composed mainly of membrane-spanning beta-sheets, a\ structure reminiscent of porins. \ Although the degree of sequence similarity among these proteins is not very high,\ they share a number of characteristics. One of these is the presence of two pairs\ of cysteines, the first located in the N-terminal part and the second\ at the C-terminal extremity, that are probably involved in disulphide bonds.\ The best conserved region is located in the central part of these proteins.

    \ 2372 IPR003316 \ The mammalian transcription factor E2F plays an important role in regulating the\ expression of genes that are required for passage through the cell cycle. Multiple E2F family members have been identified that bind to DNA as heterodimers, interacting with proteins known as DP - the dimerisation partners PUBMED:7739537.\ 4099 IPR003435 \

    The RbcX protein has been identified as having a possible chaperonin-like function PUBMED:9642201. The rbcX gene is juxtaposed to and cotranscribed with rbcL and rbcS encoding RubisCO in Anabaena sp. CA. RbcX has been shown to possess a chaperonin-like function assisting correct folding of RubisCO in Escherichia coli expression studies and is needed for RubisCO to reach its maximal activity PUBMED:9171433.

    \ 5859 IPR009263 \

    This entry represents a novel motif designated as SERTA (for SEI-1, RBT1, and TARA), corresponding to the largest conserved region among TRIP-Br proteins PUBMED:11861561. The function of this motif is uncertain, but the CDK4-interacting segment of p34SEI-1 (amino acid residues 44-161) includes most of the SERTA motif PUBMED:10580009.

    \ 3665 IPR003875 \ This family consists of the polymerase accessory protein C from members of the paramyxoviridae.\ 5243 IPR008746 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
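    The one-letter scheme above can be read off a family or clan identifier. The sketch below maps the leading letter of an identifier such as C36 or SC to its catalytic type and to the nucleophile class the paragraph associates with that type. CATALYTIC_TYPES and classify are hypothetical names. This is not an interface to the MEROPS database; it only encodes the letters listed in the text.

```python
# Leading letter of a MEROPS family or clan identifier -> catalytic type.
CATALYTIC_TYPES = {
    "A": "aspartic", "C": "cysteine", "G": "glutamic acid", "M": "metallo",
    "S": "serine", "T": "threonine", "U": "unknown", "P": "mixed",
}
# Types whose nucleophile is part of an amino acid (acyl intermediate) ...
RESIDUE_NUCLEOPHILE = {"serine", "threonine", "cysteine"}
# ... and types whose nucleophile is an activated water molecule.
WATER_NUCLEOPHILE = {"aspartic", "glutamic acid", "metallo"}


def classify(identifier: str) -> tuple:
    """Return (catalytic type, nucleophile class) for e.g. 'C36' or 'SC'."""
    ctype = CATALYTIC_TYPES[identifier[0].upper()]
    if ctype in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif ctype in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        # Type U (unknown) and type P (mixed clans) do not imply one nucleophile.
        nucleophile = "not implied by the type letter"
    return ctype, nucleophile


print(classify("C36"))  # ('cysteine', 'amino acid residue (acyl intermediate)')
print(classify("S10"))  # ('serine', 'amino acid residue (acyl intermediate)')
```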

    \ \

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    \ \

    Cysteine proteases are divided into clans (proteins that are evolutionarily related) and further subdivided into families on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925:

    \ \ \

    This group of cysteine peptidases corresponds to MEROPS peptidase family C36 (clan CA). The type example is beet necrotic yellow vein furovirus-type papain-like endopeptidase (beet necrotic yellow vein virus), which is involved in processing the viral polyprotein.

    \ 5367 IPR008781 \

    This family of proteins contains the major surface glycoprotein of turkey rhinotracheitis virus (TRTV) or avian pneumovirus (APV), the aetiological agent of turkey rhinotracheitis (TRT), and of other metapneumoviruses. The major surface glycoprotein is the attachment (G) protein, which, by analogy with respiratory\ syncytial virus (RSV), has been proposed to be responsible for virus binding to its cell receptor. The APV G gene and its predicted protein have several features in common with their RSV counterparts. Both G proteins are type II glycoproteins, and both the RSV G and APV G proteins are heavily O-glycosylated. In both RSV and APV, the G protein is the most variable protein and is a major target for neutralizing antibodies PUBMED:11038385.

    \ 5185 IPR008022 \

    DicB is part of the dic operon, which resides on cryptic prophage Kim. Under normal\ conditions, expression of dicB is actively repressed. When expression is induced, however, cell\ division rapidly ceases, and this division block is dependent on MinC with which it interacts\ PUBMED:12003935.

    \ 4547 IPR002994 \

    The surfeit locus 1 gene (SURF1 or surf-1) encodes a conserved protein of\ about 300 amino-acid residues that seems to be involved in the biogenesis of\ cytochrome c oxidase PUBMED:9843204. Vertebrate SURF1 is evolutionarily related to yeast\ protein SHY1. There appear to be two transmembrane regions in these proteins,\ one in the N-terminal region and the other in the C-terminal region.\ Rickettsia prowazekii protein RP733 is also a member of this protein family.

    \ 1826 IPR007357 \ This family appears to be related to DNA photolyases.\ 403 IPR000679 \ A number of transcription factors (including erythroid-specific transcription factor and nitrogen regulatory\ proteins) specifically bind the DNA sequence (A/T)GATA(A/G) PUBMED:2249770 in the regulatory regions of genes.\ They are consequently termed GATA-binding transcription factors. The interactions occur via highly conserved\ zinc finger domains in which the zinc ion is coordinated by four cysteine residues PUBMED:2776214, PUBMED:8332909.\ NMR studies have shown the core of the zinc finger to comprise two irregular antiparallel beta-sheets and an\ alpha-helix, followed by a long loop to the C-terminal end of the finger. The N-terminal part, which includes\ the helix, is similar in structure, but not sequence, to the N-terminal zinc module of the glucocorticoid\ receptor DNA-binding domain. The helix and the loop connecting the two beta-sheets interact with the major\ groove of the DNA, while the C-terminal tail wraps around into the minor groove. It is this tail that is the\ essential determinant of specific binding. Interactions between the zinc finger and DNA are mainly hydrophobic,\ explaining the preponderance of thymines in the binding site; a large number of interactions with the\ phosphate backbone have also been observed PUBMED:8332909. Two GATA zinc fingers are found in the GATA\ transcription factors; however, several proteins contain only a single copy of the domain.\ 4800 IPR004289 \ Members of this family are functionally uncharacterised proteins from herpesviruses.\ 3836 IPR006848 \ This family represents a number of putative transcription repressor proteins found in several Lactococcus bacteriophages. Horizontal transfer may account for the presence of similar proteins in Lactococcus PUBMED:11337471.\ 4597 IPR005335 \

    Packaging of double-stranded viral DNA concatemers requires interaction of the prohead with virus DNA. This process is mediated by a phage-encoded DNA recognition and terminase protein. The terminase enzymes described so far, which are hetero-oligomers composed of a small and a large subunit, do not have a significant level of sequence homology. The small terminase subunit is thought to form a nucleoprotein structure that helps to position the terminase large subunit at the packaging initiation site PUBMED:2679356.

    \ 1936 IPR003864 \ This domain is found in a family of hypothetical transmembrane proteins, none of which has any known function; the aligned region is at most 538 residues long.\ 6242 IPR010488 \

    This family consists of several bacterial zeta toxin proteins. Zeta toxin is thought to be part of a postsegregational killing system in bacteria. Such systems rely on toxin/antitoxin pairs that secure stable inheritance of low- and medium-copy-number plasmids during cell division by killing cells that have lost the plasmid PUBMED:12571357.

    \ 4814 IPR002033 \

    Proteins encoded by the mttABC operon (formerly yigTUW) mediate a novel Sec-independent membrane targeting and translocation system in Escherichia coli that interacts with cofactor-containing redox proteins carrying an S/TRRXFLK "twin-arginine" leader motif. This family contains the Escherichia coli mttB gene product (TatC) PUBMED:9546395.
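    The S/TRRXFLK leader motif can be written as a regular expression, with X standing for any residue. The sketch below applies a strict reading of that consensus. Many genuine Tat signal peptides deviate from the full consensus at the positions flanking the two arginines, so this exact-match filter would miss them. TWIN_ARG and find_twin_arginine are hypothetical names, and the example peptide is made up.

```python
import re

# Strict reading of the S/TRRXFLK consensus; "." stands for any residue X.
# The lookahead reports overlapping matches.
TWIN_ARG = re.compile(r"(?=[ST]RR.FLK)")


def find_twin_arginine(protein: str) -> list:
    """0-based start positions of exact S/TRRXFLK matches."""
    return [m.start() for m in TWIN_ARG.finditer(protein.upper())]


print(find_twin_arginine("MASRRSFLKAL"))  # [2]
```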

    \ \

    A functional Tat system or Delta pH-dependent pathway requires three integral membrane proteins: TatA/Tha4, TatB/Hcf106 and TatC/cpTatC. The TatC protein is essential for the function of both pathways. It might be involved in twin-arginine signal peptide recognition, protein translocation and proton translocation. Sequence analysis predicts that TatC contains six transmembrane helices (TMHs), and experimental data have confirmed that the N and C termini of TatC or cpTatC are exposed to the cytoplasmic or stromal face of the membrane. The cytoplasmic N terminus and the first cytoplasmic loop region of the Escherichia coli TatC protein are essential for protein export. At least two TatC molecules co-exist within each Tat translocon PUBMED:9649434, PUBMED:12163163.

    \ 7998 IPR012980 \

    This domain is found in a novel family of nucleolar proteins PUBMED:15112237.

    \ 4405 IPR001563 \

    Proteolytic enzymes that exploit serine in their catalytic activity are\ ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They\ include a wide range of peptidase activity, including exopeptidase, endopeptidase,\ oligopeptidase and omega-peptidase activity. Over 20 families\ (denoted S1 - S27) of serine protease have been identified, these being\ grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural\ similarity and other functional evidence PUBMED:7845208. Structures are known for four\ of the clans (SA, SB, SC and SE): these appear to be totally unrelated,\ suggesting at least four evolutionary origins of serine peptidases and\ possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities\ in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin\ and carboxypeptidase C clans have a catalytic triad of serine, aspartate and\ histidine in common: serine acts as a nucleophile, aspartate as an\ electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of\ the catalytic residues are similar between families, despite different\ protein folds PUBMED:7845208. The linear arrangements of the catalytic residues\ commonly reflect clan relationships. For example the catalytic triad in\ the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the\ subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
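    The clan-specific ordering of the triad is simply the order in which the three catalytic residues occur along the chain. The sketch below sorts residue positions and looks the resulting string up in a table built from the three orderings given above. It uses the standard chymotrypsin numbering (His57, Asp102, Ser195) and subtilisin BPN' numbering (Asp32, His64, Ser221). CLAN_BY_ORDER and triad_order are hypothetical names. The ordering is only a correlate of clan membership; it does not establish membership on its own.

```python
# Linear order of the catalytic residues (N- to C-terminus) for the three
# clans named in the text.
CLAN_BY_ORDER = {
    "HDS": "SA (chymotrypsin)",
    "DHS": "SB (subtilisin)",
    "SDH": "SC (carboxypeptidase)",
}


def triad_order(positions: dict) -> str:
    """One-letter codes of the catalytic residues sorted by sequence position."""
    return "".join(res for res, _ in sorted(positions.items(), key=lambda item: item[1]))


chymotrypsin = {"S": 195, "H": 57, "D": 102}
subtilisin = {"S": 221, "H": 64, "D": 32}

for name, triad in (("chymotrypsin", chymotrypsin), ("subtilisin", subtilisin)):
    order = triad_order(triad)
    print(name, order, CLAN_BY_ORDER.get(order, "no match"))
# chymotrypsin HDS SA (chymotrypsin)
# subtilisin DHS SB (subtilisin)
```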

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by catalytic type, which is indicated by the first character of the family identifier: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of a catalytic amino acid residue as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
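The naming convention above can be decoded mechanically, because the first character of a family or clan identifier gives its catalytic type. This is a minimal sketch under the following assumptions:

- Family identifiers look like `S10` or `C39` (both appear in this document), optionally followed by a subfamily letter.
- Clan identifiers are two letters, such as `SC` or `CA`.
- `parse_family`, `nucleophile` and `clan_type` are hypothetical helpers, not part of any MEROPS software.

```python
import re
from typing import Tuple

# First-character codes for catalytic type, as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Nucleophile classes as described in the text.
ACYL_INTERMEDIATE = {"C", "S", "T"}   # amino acid side chain, acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}   # activated water molecule

# Assumed family shape: type letter, number, optional subfamily letter.
FAMILY_RE = re.compile(r"^([ACGMSTU])(\d+)[A-Z]?$")


def parse_family(family_id: str) -> Tuple[str, int]:
    """Split a family identifier such as 'S10' into (catalytic type, number)."""
    m = FAMILY_RE.match(family_id.strip().upper())
    if m is None:
        raise ValueError(f"not a peptidase family identifier: {family_id!r}")
    return CATALYTIC_TYPES[m.group(1)], int(m.group(2))


def nucleophile(family_id: str) -> str:
    """Describe the nucleophile implied by the family's catalytic type."""
    parse_family(family_id)  # raises ValueError on malformed input
    letter = family_id.strip().upper()[0]
    if letter in ACYL_INTERMEDIATE:
        return "amino acid side chain (acyl intermediate)"
    if letter in WATER_NUCLEOPHILE:
        return "activated water molecule"
    return "unknown"


def clan_type(clan_id: str) -> str:
    """Catalytic type of a clan such as 'SC'; 'P' marks mixed-type clans."""
    letter = clan_id.strip().upper()[:1]
    if letter == "P":
        return "mixed (families of more than one type)"
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"unknown clan identifier: {clan_id!r}")
    return CATALYTIC_TYPES[letter]


print(parse_family("S10"))  # ('serine', 10)
print(clan_type("CA"))      # cysteine
```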

    This group of serine peptidases belongs to MEROPS peptidase family S10 (clan SC). The type example is carboxypeptidase Y from Saccharomyces cerevisiae PUBMED:7845208.

    All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases (EC 3.4.16.5 and EC 3.4.16.6). The catalytic activity of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine PUBMED:2324088. The sequences surrounding the active site serine and histidine residues are highly conserved in all the serine carboxypeptidases.

    \ 4960 IPR004603 \

    All proteins in this family for which functions are known are G:T mismatch endonucleases that function in a specialized mismatch repair process, usually used to repair G:T mismatches in specific sections of the genome. G:T mismatches arise from deamination of 5-methylcytosine in DNA and can lead to C-to-T transition mutations if not repaired. Vsr (very short patch repair protein) repairs these mismatches in favour of the G-containing strand. In Escherichia coli, this endonuclease nicks double-stranded DNA within the sequence CT(A/T)GN or NT(A/T)GG next to the thymidine residue that is mismatched to 2'-deoxyguanosine. The incision is mismatch-dependent and strand-specific.
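The two recognition sequences above can be written as a single regular expression, with (A/T) as a character class and N as any base. A lookahead is used so that overlapping candidates are all reported. This is a minimal sketch, assuming a plain A/C/G/T string for one strand. The helper name `find_vsr_sites` is hypothetical. The scan flags sequence context only: it cannot see the G:T mismatch itself, which exists only in the duplex.

```python
import re
from typing import List, Tuple

# CT(A/T)GN or NT(A/T)GG, with N = any base. The lookahead lets
# overlapping candidate sites be reported separately.
VSR_SITE = re.compile(r"(?=(CT[AT]G[ACGT]|[ACGT]T[AT]GG))")


def find_vsr_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched 5-mer) for candidate sites on this strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in VSR_SITE.finditer(seq)]


print(find_vsr_sites("GACTAGGTC"))  # [(2, 'CTAGG')]
```

A sequence such as CTAGG satisfies both alternatives. Because it is matched once at its start position, it is reported only once.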

    \ 5107 IPR007944 \

    This family consists of several bacterial flagellar transcriptional activator (FlhC) proteins. FlhC combines with FlhD to form a regulatory complex in Escherichia coli; this complex has been shown to be a global regulator involved in many cellular processes as well as a flagellar transcriptional activator PUBMED:11287152.

    \ 3713 IPR005312 \

    This is a small family of proteins of unknown function found in the metazoa.

    \ 2422 IPR000941 \

    Enolase (2-phospho-D-glycerate hydro-lyase) is an essential glycolytic enzyme that catalyses the interconversion of 2-phosphoglycerate and phosphoenolpyruvate PUBMED:1859865, PUBMED:1840492. In vertebrates, there are 3 different, tissue-specific isoenzymes, designated alpha, beta and gamma. Alpha is present in most tissues, beta is localised in muscle tissue, and gamma is found only in nervous tissue. The functional enzyme exists as a dimer of any 2 isoforms. In immature organs and in adult liver, it is usually an alpha homodimer; in adult skeletal muscle, a beta homodimer; and in adult neurons, a gamma homodimer. In developing muscle, it is usually an alpha/beta heterodimer, and in the developing nervous system, an alpha/gamma heterodimer PUBMED:3390159. The tissue-specific forms display minor kinetic differences. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown to be evolutionarily related to enolase PUBMED:3589669.
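If the functional enzyme can be a dimer of any two of the three isoforms, the possible dimers are the unordered pairs with repetition. That gives C(3 + 2 - 1, 2) = 6 dimers: three homodimers and three heterodimers. The sketch below only enumerates these combinatorial possibilities. Which forms actually predominate in a given tissue is the biological observation described above and is not modelled.

```python
from itertools import combinations_with_replacement

ISOFORMS = ("alpha", "beta", "gamma")

# Unordered pairs, allowing the same isoform twice (homodimers).
dimers = list(combinations_with_replacement(ISOFORMS, 2))
homodimers = [d for d in dimers if d[0] == d[1]]
heterodimers = [d for d in dimers if d[0] != d[1]]

print(len(dimers), len(homodimers), len(heterodimers))  # 6 3 3
```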

    Neuron-specific enolase is released in a variety of neurological diseases, such as multiple sclerosis, and after seizures or acute stroke. Several types of tumour cell have also been found positive for neuron-specific enolase. Beta-enolase deficiency is associated with glycogen storage disease type XIII.

    \ 1703 IPR003321 \

    The enzyme cytochrome c nitrite reductase (c552) catalyses the six-electron reduction of nitrite to ammonia, one of the key steps in the biological nitrogen cycle, where it participates in the anaerobic energy metabolism of dissimilatory nitrate ammonification. Cytochrome c nitrite reductase from Sulfurospirillum deleyianum is a functional dimer, with 10 close-packed haem groups of type c and an unusual lysine-coordinated high-spin haem at the active site PUBMED:10440380.

    \ 169 IPR002474 \

    The carbamoyl-phosphate synthase domain is in the N terminus of the protein. Carbamoyl-phosphate synthase catalyses the ATP-dependent synthesis of carbamoyl-phosphate from glutamine or ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or pyrimidines PUBMED:1972379. The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and a large chain. The small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl phosphate.

    \ 2268 IPR006938 \

    This family consists of uncharacterised or hypothetical bacterial proteins.

    \ 3135 IPR003101 \

    The nuclear factor CREB activates transcription of target genes in part through direct interactions with the KIX domain of the coactivator CBP in a phosphorylation-dependent manner PUBMED:9413984. This provides a model for activator:coactivator interactions. The KIX domain of CBP also binds to transactivation domains of other nuclear factors, including Myb and Jun.

    \ 6592 IPR010634 \

    This family consists of several hypothetical proteins of around 250 residues in length, which are found in both plants and bacteria. The function of this family is unknown.

    \ 6455 IPR009534 \

    This family consists of several short, hypothetical bacterial proteins of unknown function.

    \ 7378 IPR011432 \

    This domain is found duplicated in proteins of unknown function. The proteins typically also contain leucine-rich repeats.

    \ 2271 IPR006865 \

    This domain represents a region of several plant proteins of unknown function. A C2H2 zinc finger is predicted in this region in some family members, but the spacing between the cysteine residues is not conserved throughout the family.

    \ 1889 IPR003741 \

    This entry describes proteins of unknown function.

    \ 4088 IPR004318 \

    Members of this family are found in the parasite Babesia bigemina. Other rhoptry-associated proteins are found in Plasmodium falciparum but these do not belong to this family. Animal infection with Babesia bigemina may produce a pattern similar to human malaria PUBMED:10614497. Rhoptry organelles form part of the apical complex in apicomplexan parasites.

    Rhoptry-associated proteins (RAPs) are antigenic and generate partially protective immune responses in infected mammals. Thus RAPs are among the targeted vaccine antigens for babesial (and malarial) parasites. However, RAP-1 proteins are encoded by a multigene family; thus RAP-1 proteins are polymorphic, with B and T cell epitopes that are conserved among strains, but not across species PUBMED:9662706, PUBMED:9476795, PUBMED:9529082. Antibodies to Babesia bigemina RAP-1 may also be helpful in the serological detection of Babesia bigemina infections PUBMED:10364599.

    \ 1958 IPR005489 \

    This family of proteins is of unknown function.

    \ 5621 IPR008851 \

    Transcription initiation factor IIF, alpha subunit (TFIIF-alpha), also called RNA polymerase II-associating protein 74 (RAP74), is the large subunit of transcription factor IIF (TFIIF), which is essential for accurate initiation and stimulates elongation by RNA polymerase II PUBMED:12354769.

    \ 913 IPR000672 \

    Enzymes that participate in the transfer of one-carbon units require the coenzyme tetrahydrofolate (THF). Various reactions generate one-carbon derivatives of THF, which can be interconverted between different oxidation states by methylene-THF dehydrogenase (), methenyl-THF cyclohydrolase () and formyl-THF synthetase () PUBMED:2541774, PUBMED:8485162. The dehydrogenase and cyclohydrolase activities are expressed by a variety of multifunctional enzymes, including the tri-functional eukaryotic C1-tetrahydrofolate synthase PUBMED:2541774, a bifunctional eukaryotic mitochondrial protein, and the bifunctional Escherichia coli folD protein PUBMED:2541774, PUBMED:8485162. Methylene-tetrahydrofolate dehydrogenase and methenyltetrahydrofolate cyclohydrolase share an overlapping active site PUBMED:2541774, and as such are usually located together in proteins, acting in tandem on the carbon-nitrogen bonds of substrates other than peptide bonds.

    \ 1401 IPR006832 \

    This is a family of herpes virus proteins of unknown function.

    \ 3005 IPR013126 \

    The hsp70 proteins are a family of heat shock proteins with an average molecular weight of 70 kDa PUBMED:2686623, PUBMED:2944601, PUBMED:3282176. In most species, there are many proteins that belong to the hsp70 family. Some of these are only expressed under stress conditions (strictly inducible), while some are present in cells under normal growth conditions and are not heat-inducible (constitutive or cognate) PUBMED:2143562, PUBMED:2841196. Hsp70 proteins can be found in different cellular compartments (nuclear, cytosolic, mitochondrial, endoplasmic reticulum, etc.).

    Little is known of the function of hsp70 proteins. Some evidence suggests that the constitutive members have a role in the disassembly of clathrin cages PUBMED:2143562 and may also participate in the post-translational transmembrane targeting of proteins to cellular organelles. No specific activities or associations have been found for the inducible members PUBMED:2143562, although it has been suggested that they may accept incoming precursor proteins, keep them unfolded, then pass them on to the hsp60/hsp10 (cpn60/cpn10) complex for folding and assembly.

    \ 550 IPR002921 \

    Triglyceride lipases are lipolytic enzymes that hydrolyse ester linkages of triglycerides PUBMED:3147715. Lipases are widely distributed in animals, plants and prokaryotes. This family of lipases has been called Class 3 because its members are not closely related to other lipase families.

    \ 2088 IPR007353 \

    This family of uncharacterised proteins is known as the YDFR family.

    \ 7387 IPR011491 \

    This domain is found in several bacterial FlaE flagellar proteins. These proteins are part of the flagellar basal body rod complex.

    \ 1431 IPR004005 \

    Caliciviruses are positive-stranded ssRNA viruses that cause gastroenteritis PUBMED:1840711. The calicivirus genome contains two open reading frames, ORF1 and ORF2 PUBMED:8892921, PUBMED:8642693. ORF1 encodes a non-structural polypeptide, which has RNA helicase, cysteine protease and RNA polymerase activity. The regions of the polyprotein in which these activities lie are similar to proteins produced by the picornaviruses PUBMED:8892921, PUBMED:1551442. ORF2 encodes a structural protein PUBMED:8892921. This signature identifies ORF2, the structural coat protein. Two different families of caliciviruses can be distinguished on the basis of sequence similarity: those classified as small round structured viruses (SRSVs) and those classed as non-SRSVs.

    \ 3693 IPR000072 \

    Platelet-derived growth factor (PDGF) PUBMED:2546599, PUBMED:1425569 is a potent mitogen for cells of mesenchymal origin, including smooth muscle cells and glial cells. In both mouse and human, the PDGF signalling network consists of four ligands, PDGFA-D, and two receptors, PDGFRalpha and PDGFRbeta. All PDGFs function as secreted, disulfide-linked homodimers, but only PDGFA and B can form functional heterodimers. PDGFRs also function as homo- and heterodimers. All known PDGFs have characteristic 'PDGF domains', which include eight conserved cysteines that are involved in inter- and intramolecular bonds. Alternative splicing of the A chain transcript can give rise to two different forms that differ only in their C-terminal extremity. The transforming protein of simian sarcoma virus (SSV), encoded by the v-sis oncogene, is derived from the B chain of PDGF.

    PDGFs are mitogenic during early developmental stages, driving the proliferation of undifferentiated mesenchyme and some progenitor populations. During later maturation stages, PDGF signalling has been implicated in tissue remodelling and cellular differentiation, and in inductive events involved in patterning and morphogenesis. In addition to driving mesenchymal proliferation, PDGFs have been shown to direct the migration, differentiation and function of a variety of specialised mesenchymal and migratory cell types, both during development and in the adult animal PUBMED:12952899. Other growth factors in this family include vascular endothelial growth factors B and C (VEGF-B, VEGF-C) PUBMED:8637916, PUBMED:8617204, which are active in angiogenesis and endothelial cell growth, and placenta growth factor (PlGF), which is also active in angiogenesis PUBMED:7681160.

    PDGF is structurally related to a number of other growth factors which also form disulphide-linked homo- or heterodimers.

    \ 1409 IPR004915 \

    This family represents the Bunyavirus NS-S family. The bunyavirus genome has three segments: small (S), middle-sized (M), and large (L). The S segment encodes the nucleocapsid and a non-structural protein. The M segment codes for two glycoproteins, G1 and G2, and another non-structural protein (NSm). The L segment codes for an RNA polymerase.

    \ 8061 IPR013248 \

    Members of this family are membrane-localised chaperones that are required for correct plasma membrane localisation of amino acid permeases (AAPs) PUBMED:15623581. Shr3 prevents AAPs from aggregating and assists in their correct folding. In the absence of Shr3, AAPs are retained in the ER.

    \ 5945 IPR009301 \

    This family consists of several hypothetical proteins from Escherichia coli, Salmonella typhi, Shigella flexneri and Proteus vulgaris. The function of this family is unknown.

    \ 5254 IPR008397 \

    This family contains several bacterial alginate lyase proteins. Alginate is a family of 1-4-linked copolymers of beta-D-mannuronic acid (M) and alpha-L-guluronic acid (G). It is produced by brown algae and by some bacteria belonging to the genera Azotobacter and Pseudomonas. Alginate lyases catalyse the depolymerisation of alginates by beta-elimination, generating a molecule containing 4-deoxy-L-erythro-hex-4-enepyranosyluronate at the non-reducing end PUBMED:9683471.

    \ 3538 IPR004850 \

    Agrin is a multidomain heparan sulphate proteoglycan that is a key organizer for the induction of postsynaptic specializations at the neuromuscular junction. Binding of agrin to basement membranes requires the amino-terminal (NtA) domain PUBMED:9321698. This region mediates high-affinity interaction with the coiled-coil domain of laminins. The binding of agrin to laminins via the NtA domain is subject to tissue-specific regulation. The NtA domain-containing form of agrin is expressed in non-neuronal cells or in neurons that project to non-neuronal cells, such as motor neurons. The NtA domain forms the most N-terminal part, followed by 9 Kazal-like domains and 2 LE domains. The C-terminal part consists of a SEA domain, 4 EGF-like domains and 3 Laminin G domains, responsible for the clustering of acetylcholine receptors PUBMED:11473262.

    Tertiary structures show that the NtA domain folds as a beta-barrel core flanked by N- and C-terminal helical regions. The core of the domain consists of 5 beta-strands that form 2 beta-sheets. The structure belongs to the OB fold family and shows similarity with the protease inhibition domain of TIMP-1, suggesting alternative functions for agrin in addition to synaptogenic activity PUBMED:11473262. Residues Leu 117 and Val 124 in helix 3 of the NtA domain are essential for binding to the laminin gamma1 chain PUBMED:12554653.

    \ 4642 IPR007725 \

    The timeless (tim) gene is essential for circadian function in Drosophila. Putative homologues of Drosophila tim have been identified in both mice and humans (mTim and hTIM, respectively). Mammalian TIM is not the true orthologue of Drosophila TIM, but is the likely orthologue of a fly gene, timeout (also called tim-2) PUBMED:11237000. mTim has been shown to be essential for embryonic development, but does not have substantiated circadian function PUBMED:10903565. Some family members contain a SANT domain in this region.

    \ 2750 IPR002772 \

    Glycoside hydrolase family 3 comprises enzymes with a number of known activities: beta-glucosidase (); beta-xylosidase (); N-acetyl beta-glucosaminidase (); glucan beta-1,3-glucosidase (); cellodextrinase (); exo-1,3-1,4-glucanase ().

    These enzymes are two-domain globular proteins that are N-glycosylated at three sites PUBMED:10368285. This domain is often C-terminal to the glycoside hydrolase family 3, N-terminal domain.

    \ 5772 IPR010270 \

    This family consists of several phage small terminase subunit proteins as well as some related bacterial sequences PUBMED:1837355.

    \ 2470 IPR001558 \

    Human immunodeficiency virus (HIV) negative factor (Nef protein) accelerates virulent progression of acquired immunodeficiency syndrome (AIDS) by its interaction with specific cellular proteins involved in signal transduction and host cell activation. Nef has been shown to bind specifically to a subset of the Src family of kinases PUBMED:9351809.

    \ 2450 IPR004922 \

    Trypanosoma brucei is the causative agent of sleeping sickness in humans and nagana in cattle. The parasite lives extracellularly in the blood and tissue fluids of the mammalian host, and is transmitted by the bite of infected tsetse flies. Each variant surface glycoprotein (VSG) expression site (ES) in bloodstream-form Trypanosoma brucei is a polycistronic transcription unit containing several distinct expression site-associated genes (ESAGs) in addition to a single vsg gene. The ESAGs are co-transcribed with the vsg gene, whose product, VSG, forms the surface coat of the parasite.

    ESAG1 genes from different ESs encode a highly polymorphic family of membrane-associated glycoproteins, whose function is unknown PUBMED:8892306.

    \ 5370 IPR008479 \

    This family contains several uncharacterised plant proteins.

    \ 4463 IPR003127 \

    Sorbin is an active peptide present in the digestive tract, where it has pro-absorptive and anti-secretory effects in different parts of the intestine, including the ability to decrease VIP (vasoactive intestinal peptide) and cholera toxin-induced secretion. It is expressed in some intestinal and pancreatic endocrine tumours in humans PUBMED:10704721.

    Sorbin-homology domains are found in adaptor proteins such as vinexin, CAP/ponsin and argBP2, which regulate various cellular functions, including cell adhesion, cytoskeletal organisation, and growth factor signalling PUBMED:11937713. In addition to the sorbin domain, these proteins contain three SH3 (src homology 3) domains. The sorbin homology domain mediates the interaction of vinexin and CAP with flotillin, which is crucial for the localisation of SH3-binding proteins to the lipid raft, a region of the plasma membrane rich in cholesterol and sphingolipids that acts to concentrate certain signalling molecules. The sorbin homology domain of adaptor proteins may mediate interactions with the lipid raft that are crucial to intracellular communication PUBMED:11481476.

    \ 5983 IPR010376 \

    This family consists of several short bacterial proteins and one sequence () from Oryza sativa. The function of this family is unknown.

    \ 5250 IPR008828 \

    This family consists of several stress-activated MAP kinase interacting protein 1 (MAPKAP1 or SIN1) sequences. The Schizosaccharomyces pombe Sty1/Spc1 mitogen-activated protein (MAP) kinase is a member of the eukaryotic stress-activated MAP kinase (SAPK) family. Sin1 interacts with Sty1/Spc1. Cells lacking Sin1 display many, but not all, of the phenotypes of cells lacking the Sty1/Spc1 MAP kinase, including sterility, multiple stress sensitivity and a cell-cycle delay. Sin1 is phosphorylated after stress, but this phosphorylation is not Sty1/Spc1-dependent PUBMED:10428959.

    \ 3729 IPR005074 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by catalytic type, which is indicated by the first character of the family identifier: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases use the side chain of a catalytic amino acid residue as the nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in MEROPS this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue.

    Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad PUBMED:11517925.

    The sequences defined by this cysteine peptidase domain belong to MEROPS peptidase family C39 (clan CA). The domain is found in a wide range of ABC transporters that act as maturation proteases for peptide bacteriocins, with the proteolytic domain residing in the N-terminal region of the protein PUBMED:7674922. A number of the proteins are classified as non-peptidase homologues, as they either have been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.

    Lantibiotic and non-lantibiotic bacteriocins are synthesised as precursor peptides containing N-terminal extensions (leader peptides), which are cleaved off during maturation. Most non-lantibiotics and also some lantibiotics have leader peptides of the so-called double-glycine type. These leader peptides share consensus sequences and also a common processing site with two conserved glycine residues in positions -1 and -2. The double-glycine-type leader peptides are unrelated to the N-terminal signal sequences which direct proteins across the cytoplasmic membrane via the Sec pathway. Their processing sites are also different from typical signal peptidase cleavage sites, suggesting that a different processing enzyme is involved.
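The double-glycine processing site lends itself to a simple sequence check. Cleavage occurs on the C-terminal side of the glycines at positions -2 and -1, separating the leader from the mature peptide. This is a minimal sketch under several assumptions:

- The hypothetical helper `split_double_glycine_leader` takes the first Gly-Gly pair that leaves a leader of at least `min_leader_len` residues. That default is an arbitrary illustrative choice.
- Real processing sites are defined by the broader leader consensus and by the peptidase, not by a Gly-Gly pair alone.
- The example precursor is an invented sequence, not a real bacteriocin.

```python
from typing import Optional, Tuple


def split_double_glycine_leader(
    precursor: str, min_leader_len: int = 10
) -> Optional[Tuple[str, str]]:
    """Split a precursor after the first qualifying Gly-Gly pair.

    The residues at positions -2 and -1 relative to the cleavage site must
    both be glycine. Returns (leader, mature), or None if no pair qualifies.
    """
    seq = precursor.upper()
    # i indexes the residue at position -1; i - 1 is position -2.
    # Starting at min_leader_len - 1 keeps the leader at least that long;
    # stopping before len(seq) - 1 leaves at least one mature residue.
    for i in range(max(1, min_leader_len - 1), len(seq) - 1):
        if seq[i - 1] == "G" and seq[i] == "G":
            return seq[: i + 1], seq[i + 1 :]
    return None


# Invented precursor, for illustration only
print(split_double_glycine_leader("MSKLAEEQLNAVGGAWSTKA"))
# ('MSKLAEEQLNAVGG', 'AWSTKA')
```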

    \ 4600 IPR004979 \

    Activator protein-2 (AP-2) transcription factors constitute a family of closely related and evolutionarily conserved proteins that bind to the DNA consensus sequence GCCNNNGGC and stimulate target gene transcription PUBMED:2010091, PUBMED:1998122. Four different isoforms of AP-2 have been identified in mammals, termed AP-2 alpha, beta, gamma and delta. Each family member shares a common structure, possessing a proline/glutamine-rich domain in the N-terminal region, which is responsible for transcriptional activation PUBMED:2010091, and a helix-span-helix domain in the C-terminal region, which mediates dimerisation and site-specific DNA binding PUBMED:1998122.
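The AP-2 consensus GCCNNNGGC is its own reverse complement. A match on one strand therefore corresponds to a match on the other strand at the same position, so scanning a single strand finds sites in either orientation. This is a minimal sketch, assuming an A/C/G/T input with N meaning any base. The helper names are hypothetical, and a consensus match is only a candidate site, not evidence of binding.

```python
import re
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N maps to N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


AP2_CONSENSUS = "GCCNNNGGC"
# N = any base; the lookahead reports overlapping candidate sites.
AP2_SITE = re.compile(r"(?=(GCC[ACGT]{3}GGC))")


def find_ap2_sites(seq: str) -> List[int]:
    """0-based start positions of GCCNNNGGC matches on the given strand."""
    return [m.start() for m in AP2_SITE.finditer(seq.upper())]


# The consensus is palindromic, so one strand is enough to scan.
assert reverse_complement(AP2_CONSENSUS) == AP2_CONSENSUS

print(find_ap2_sites("TTGCCTAAGGCAA"))  # [2]
```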

    The AP-2 family members have been shown to be critical regulators of gene expression during embryogenesis. They regulate the development of the facial prominences and limb buds, and are essential for cranial closure and development of the lens PUBMED:11137286; they have also been implicated in tumorigenesis. AP-2 protein expression levels have been found to affect cell transformation, tumour growth and metastasis, and may predict survival in some types of cancer PUBMED:9632718, PUBMED:10864206.

    \ 2172 IPR007468 \

    This is a bacterial protein of unknown function.

    \ 3574 IPR000498 \

    The ompA-like transmembrane domain is present in a number of different outer membrane proteins of several Gram-negative bacteria. Many of the proteins having this domain in the N-terminal region also have the conserved bacterial outer membrane protein domain at the C terminus. The outer membrane protein A of Escherichia coli (OmpA) is one of the most studied proteins in this group PUBMED:10554771. It has multiple functions. OmpA is required for the action of colicins K and L and for the stabilization of mating aggregates in conjugation. It also serves as a receptor for a number of T-even-like phages and can act as a low-permeability porin that allows slow penetration of small solutes PUBMED:1974149.

    OmpA consists of a regular, extended eight-stranded beta-barrel and appears to be constructed like an inverse micelle with large water-filled cavities, but does not form a pore. The cavities seem to be highly conserved during evolution. The structure corroborates the concept that all outer membrane proteins consist of beta-barrels PUBMED:9808047. The beta-barrel membrane anchor appears to be the outer membrane equivalent of the single-chain alpha-helix anchor of the inner membrane.

    \ 6139 IPR010448 \

    This family consists of several eukaryotic torsin proteins. Torsion dystonia is an autosomal dominant movement disorder characterised by involuntary, repetitive muscle contractions and twisted postures. The most severe early-onset form of dystonia has been linked to mutations in the human DYT1 (TOR1A) gene encoding a protein termed torsinA. While causative genetic alterations have been identified, the function of torsin proteins and the molecular mechanism underlying dystonia remain unknown. Phylogenetic analysis of the torsin protein family indicates these proteins share distant sequence similarity with the large and diverse family of AAA ATPase, central region containing proteins () proteins. It has been suggested that torsins play a role in effectively managing protein folding and that possible breakdown in a neuroprotective mechanism that is, in part, mediated by torsins may be responsible for the neuronal dysfunction associated with dystonia PUBMED:12554684.

    \ 5804 IPR010289 \

    This entry identifies sequences from gamma and beta proteobacteria, cyanobacteria and firmicutes that belong to the yccS/yhfK family. These proteins are more than 700 amino acids long and many have been annotated as putative membrane proteins. The gene from Salmonella has been annotated as a putative efflux transporter. The gene from Escherichia coli has been annotated as yccS. The YccS hypothetical equivalog is found in beta and gamma proteobacteria, while the smaller YhfK group is only found in E. coli, Salmonella and Yersinia.

    \ 4692 IPR001207 \

    Autonomous mobile genetic elements such as transposons or insertion sequences (IS) encode an enzyme, transposase, that is required for excising and inserting the mobile element. Transposases have been grouped into various families PUBMED:8041625, PUBMED:1310791, PUBMED:1718819. The mutator family of transposases consists of a number of elements that include Mutator from maize, IST2 from Thiobacillus ferrooxidans, IS256 from Staphylococcus aureus, IS1201 from Lactobacillus helveticus, IS1081 from Mycobacterium bovis, ISRm3 from Rhizobium meliloti and others.

    \ 1830 IPR001853 \

    DsbA is a subfamily of the thioredoxin family PUBMED:9149147. The efficient and correct folding of bacterial disulphide-bonded proteins in vivo is dependent upon a class of periplasmic oxidoreductase proteins called DsbA, after the Escherichia coli enzyme. The bacterial protein-folding factor DsbA is the most oxidizing member of the thioredoxin family. DsbA catalyzes disulphide-bond formation during the folding of secreted proteins. The extremely oxidizing nature of DsbA has been proposed to result from either domain motion or stabilizing active-site interactions in the reduced form. DsbA's highly oxidizing nature is a result of hydrogen bond, electrostatic and helix-dipole interactions that favour the thiolate over the disulphide at the active site PUBMED:9655827. In the pathogenic bacterium Vibrio cholerae, the DsbA homologue (TcpG) is responsible for the folding, maturation and secretion of virulence factors.
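Schematically, the disulphide-bond formation catalysed by DsbA is a thiol-disulphide exchange. The oxidised enzyme passes its active-site disulphide to a reduced substrate protein P and is itself left reduced. The notation below is an illustrative summary, not taken from the cited papers:

```latex
\mathrm{DsbA}_{\mathrm{ox}}\,(\text{S--S}) + \mathrm{P}(\mathrm{SH})_2
  \;\rightleftharpoons\;
\mathrm{DsbA}_{\mathrm{red}}\,(\mathrm{SH})_2 + \mathrm{P}_{\mathrm{ox}}\,(\text{S--S})
```

Because DsbA is strongly oxidizing, an exchange of this kind favours the oxidised substrate, consistent with its role in forming disulphide bonds in secreted proteins.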

    While the overall architecture of TcpG and DsbA is similar and the surface features are retained in TcpG, there are significant differences. For example, the kinked active site helix results from a three-residue loop in DsbA, but is caused by a proline in TcpG (making TcpG more similar to thioredoxin in this respect). Furthermore, the proposed peptide binding groove of TcpG is substantially shortened compared with that of DsbA due to a six-residue deletion. Also, the hydrophobic pocket of TcpG is shallower and the acidic patch is much less extensive than that of E. coli DsbA PUBMED:9149147.

    \ 1318 IPR011251 \

    Bacterial luciferase is a flavin monooxygenase that catalyses the oxidation of long-chain aldehydes, releasing energy in the form of visible light; it uses flavin as a substrate rather than as a cofactor PUBMED:8703001. Bacterial luciferase is an alpha/beta (LuxA/LuxB) heterodimer, where each individual subunit folds into a single TIM (beta/alpha)8-barrel domain. There are structural similarities between bacterial luciferase and nonfluorescent flavoproteins (LuxF, FP390), alkanesulfonate monooxygenase (SsuD), and coenzyme F420-dependent tetrahydromethanopterin reductase, which make up clearly related families with somewhat different folds PUBMED:7776372, PUBMED:12445781, PUBMED:10891279.
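For reference, the overall bacterial luciferase reaction is usually written as follows. RCHO is the long-chain aldehyde, RCOOH the corresponding fatty acid, and hν the emitted light. This is the standard textbook stoichiometry, not a detail drawn from the cited structural papers:

```latex
\mathrm{FMNH_2} + \mathrm{O_2} + \mathrm{RCHO}
  \longrightarrow
\mathrm{FMN} + \mathrm{RCOOH} + \mathrm{H_2O} + h\nu
```

Writing reduced flavin (FMNH2) on the left and FMN on the right makes the point above explicit. The flavin is consumed in each turnover rather than remaining bound as a cofactor.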

    \ 6967 IPR009816 \

    This family represents a conserved region approximately 300 residues long within a number of hypothetical proteins of unknown function that seem to be restricted to mammals.

    \ 3505 IPR002871 \

    Pioneering investigations on the maturation of Fe-S proteins were performed in bacteria and have led to the identification of two operons termed nif (nitrogen fixation) and isc (iron-sulfur cluster assembly) that function in Fe-S-cluster biosynthesis. The nif operon encodes proteins that execute specific functions in the assembly of nitrogenase, a complex metalloenzyme that catalyses the fixation of nitrogen; some of the Nif proteins are specifically involved in the formation of the Fe-S cluster of nitrogenase and these are found in organisms that do not fix nitrogen PUBMED:8875867, PUBMED:8048161. The isc operon encodes proteins necessary for the maturation of bacterial Fe-S proteins.

    In a number of organisms, for example Azotobacter vinelandii, NifU is a protein associated with the nif operon. It contains two domains: the N-terminal domain, presented in this entry, and the C-terminal domain (). These domains exist either together or on different polypeptides, and both domains are found in organisms that do not fix nitrogen, e.g. yeast, so they have a broader significance in the cell than nitrogen fixation. It has been proposed that they are specifically required for the formation and maturation of Fe-S clusters, which in eukaryotes occurs in the mitochondrial matrix. In yeast, for example, deletion of the C-terminal domain does not markedly affect Fe-S biosynthesis, but in combination with inactivation of ISU1 it causes a defect in mitochondrial Fe-S protein maturation.

    \ 4994 IPR005033 \

    Named the YEATS family, after 'YNK7', 'ENL', 'AF-9', and 'TFIIF small subunit', this family also contains the GAS41 protein. All these proteins are thought to have a transcription stimulatory activity.

    \ 6835 IPR009739 \

    This family consists of several bacterial proteins of around 120 residues in length. Members of this family contain four highly conserved cysteine residues. The function of this family is unknown.

    \ 2313 IPR007773 \ This family consists of uncharacterised baculovirus proteins.\ 3440 IPR004101 \

    Proteins containing this domain include a number of related ligase enzymes that catalyse consecutive steps in the synthesis of peptidoglycan. They also include folylpolyglutamate synthase, which transfers glutamate to folylpolyglutamate, and cyanophycin synthetase, which catalyses the biosynthesis of the cyanobacterial reserve material multi-L-arginyl-poly-L-aspartate (cyanophycin) PUBMED:9652408.

    \

    This C-terminal domain is almost always found together with the N-terminal domain of the cytoplasmic peptidoglycan synthetases (see ).

    \ 2841 IPR004045 \

    In eukaryotes, glutathione S-transferases (GSTs) participate in the detoxification of reactive electrophilic compounds by catalysing their conjugation to glutathione. The GST domain is also found in S-crystallins from squid and in proteins with no known GST activity, such as eukaryotic elongation factors 1-gamma and the HSP26 family of stress-related proteins, which include auxin-regulated proteins in plants and stringent starvation proteins in Escherichia coli. The major lens polypeptide of Cephalopoda is also a GST PUBMED:9074797, PUBMED:10783391, PUBMED:11035031, PUBMED:10416260.

    \

    Bacterial GSTs of known function often have a specific, growth-supporting role in biodegradative metabolism: epoxide ring opening and tetrachlorohydroquinone reductive dehalogenation are two examples of the reactions catalysed by these bacterial GSTs. Some regulatory proteins, like the stringent starvation proteins, also belong to the GST family PUBMED:11327815, PUBMED:9045797. GSTs seem to be absent from Archaea, in which gamma-glutamylcysteine substitutes for glutathione as the major thiol.

    \

    Glutathione S-transferases form homodimers, but in eukaryotes they can also form heterodimers of the A1 and A2 or YC1 and YC2 subunits. The homodimeric enzymes display a conserved structural fold. Each monomer is composed of a distinct N-terminal sub-domain, which adopts the thioredoxin fold, and a C-terminal all-helical sub-domain. This entry represents the N-terminal domain.

    \ 835 IPR007146 \ This family contains Sas10, which has been identified as a regulator of chromatin silencing PUBMED:9611201. The family also contains Utp3, a component of the U3 ribonucleoprotein complex PUBMED:12068309. The exact molecular function of this family is unknown.\ 6555 IPR009599 \

    This family consists of a number of hypothetical bacterial proteins of around 410 residues in length, which seem to be specific to Chlamydia species. The function of this family is unknown.

    \ 286 IPR007427 \ This protein is predicted to be an integral membrane protein with multiple membrane spans.\ 3378 IPR002820 \

    DMSO reductase of Rhodobacter capsulatus contains a pterin molybdenum cofactor (Moco) that is located in the periplasm. Four genes are involved in the biosynthesis of Moco (moaA, moaD, moeB and moaC) PUBMED:10411269. MoaA and MoaC from Escherichia coli catalyse the first steps in Moco synthesis PUBMED:9731530.

    \ 1828 IPR003787 \ Four small, soluble proteins (DsrE, DsrF, DsrH and DsrC) are encoded in the dsr gene region of the phototrophic sulphur bacterium Chromatium vinosum D. The dsrAB genes encoding dissimilatory sulphite reductase are part of the gene cluster dsrABEFHCMK. The remaining encoded proteins are a transmembrane protein (DsrM) with similarity to haem-b-binding polypeptides and a soluble protein (DsrK) resembling the [4Fe-4S]-cluster-containing heterodisulphide reductase of methanogenic archaea. \ DsrE is a small soluble protein involved in intracellular sulphur reduction PUBMED:9695921.\ 1219 IPR006748 \

    The aminoglycosides are a large group of biologically active bacterial secondary metabolites, best known for their antibiotic properties PUBMED:9211644. Aminoglycoside phosphotransferases inactivate these antibiotics by ATP-dependent phosphorylation. Likewise, hydroxyurea is inactivated by phosphorylation of the hydroxy group in the hydroxylamine moiety.

    \ 3587 IPR001742 \ This family contains the outer capsid, VP2 proteins from the orbiviruses; these are dsRNA viruses belonging to the Reoviridae. VP2 acts as an anchor for VP1 and VP3 and contains a non-specific DNA and RNA binding domain in the N-terminus PUBMED:9311813, PUBMED:9281498.\ 1881 IPR003728 \

    This entry describes proteins of unknown function.

    \ 4380 IPR004027 \ The SEC-C motif is found at the C terminus of the SecA protein, in the middle of some SWI2 ATPases, and on its own in several proteins. The motif is predicted to chelate zinc with the CXC and C[HC] pairs that constitute its most conserved feature. It is predicted to be a nucleic acid-binding domain.\ 6973 IPR009818 \

    This entry represents a conserved region approximately 250 residues long located towards the C terminus of eukaryotic ataxin-2. Ataxin-2 is a protein of unknown function, within which expansion of a polyglutamine tract (due to expansion of unstable CAG repeats in the coding region of the SCA2 gene) causes spinocerebellar ataxia type 2 (SCA2), a late-onset neurodegenerative disorder PUBMED:9339681. The expanded polyglutamine repeat in ataxin-2 causes disruption of the normal morphology of the Golgi complex and increased incidence of cell death PUBMED:12812977. Ataxin-2 is predicted to consist of mostly non-globular domains PUBMED:9462862.
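    The CAG expansion described above is often summarised as the length of the longest uninterrupted CAG run. The sketch below uses a hypothetical helper, `longest_cag_run`, to illustrate the idea. It is not a clinical method and not taken from the cited studies. It ignores reading frame, and it treats any other triplet, including CAA (which also encodes glutamine), as ending a run. The example sequence is synthetic.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the length, in CAG triplets, of the longest uninterrupted CAG run.

    Runs are found wherever they occur, without reference to a reading frame;
    any other triplet (including CAA, which also encodes glutamine) ends a run.
    """
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Synthetic example: 22 CAG repeats, one CAA interruption, then 8 more CAGs
example = "ATG" + "CAG" * 22 + "CAA" + "CAG" * 8
print(longest_cag_run(example))  # 22
```

    Because a CAG run cannot overlap another CAG run in a different frame, a single left-to-right regex scan finds every maximal run.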

    \ 5741 IPR008589 \ This family consists of several conserved hypothetical proteins from bacteria and archaea. The function of this family is unknown though a number are annotated as outer surface proteins.\ 3242 IPR001517 \

    Barley yellow dwarf virus (BYDV) can be separated into two groups based on serological relationships, presumably governed by the viral capsid structure PUBMED:2273382. Coding regions of coat proteins have been identified for the MAV-PS1, P-PAV (group 1) and NY-RPV (group 2) isolates of BYDV. Group 1 proteins show 71% sequence similarity to each other, 51% similarity to those of group 2, and a high degree of similarity to those from other luteoviruses (including coat proteins from beet western yellow virus (BWYV) PUBMED:3194229 and potato leafroll virus (PLRV) PUBMED:2732704, PUBMED:2732710).

    \

    Among luteovirus coat protein sequences in general, several highly conserved domains can be identified, while other domains differentiate group 1 isolates from group 2 and other luteoviruses. Sequence comparisons between the genomes of PLRV, BWYV and BYDV have revealed ~65% protein sequence similarity between the capsid proteins of BWYV and PLRV and ~45% similarity between BYDV and PLRV PUBMED:2273382. The N-terminal regions of these sequences, like those of many plant virus capsid proteins, are highly basic and may be involved in protein-RNA interaction.

    \ 5714 IPR008726 \ This family consists of several Orthopoxvirus F8 proteins. The function of this family is unknown.\ 6516 IPR009572 \

    This family consists of several short, hypothetical bacterial proteins of around 62 residues in length. Members of this family are found in Escherichia coli and Salmonella typhi. The function of this family is unknown.

    \ 3994 IPR005652 \

    The family corresponds to the photosynthetic reaction centre H subunit of non-oxygenic photosynthetic bacteria. The reaction centre is an integral membrane pigment-protein complex that carries out light-driven electron transfer reactions in photosynthetic bacteria. At the core of the reaction centre is a collection of light-harvesting cofactors and closely associated polypeptides. The core protein complex is made of L, M and H subunits PUBMED:11005826. The common cofactors include bacteriochlorophyll, bacteriopheophytins, ubiquinone and non-haem ferrous iron. The net result of the electron transfer reactions is the establishment of a proton electrochemical gradient and the production of reducing equivalents in the form of NADH. Ultimately, the process results in the reduction of CO2 to carbohydrates (C6H12O6). In non-oxygenic organisms, the electron donor is an organic acid rather than water. Much of our current functional understanding of photosynthesis comes from structural determination and spectroscopic studies of the reaction centre of Rhodobacter sphaeroides.

    \ 3957 IPR006987 \ The vaccinia virus interferon (IFN)-gamma receptor (IFN-gammaR) is a 43 kDa soluble glycoprotein that is secreted from infected cells early during infection. The IFN-gammaRs of vaccinia virus, cowpox virus and camelpox virus exist naturally as homodimers, whereas the cellular IFN-gammaR dimerizes only upon binding the homodimeric IFN-gamma. The existence of the virus protein as a dimer in the absence of ligand may provide an advantage to the virus in efficient binding and inhibition of IFN-gamma in solution PUBMED:11842249.\ 4901 IPR006038 \

    Uteroglobin PUBMED:2704039 is a protein that seems to be specific to lagomorphs (rabbit, hare and pika) and that binds progesterone specifically and with high affinity. It may regulate progesterone concentrations reaching the blastocyst. Uteroglobin is also a potent inhibitor of phospholipase A2. It is a protein of 70 amino acids that forms antiparallel disulphide-linked dimers. The progesterone-binding site is formed by a cavity between the monomeric subunits. A schematic representation of the location of the two disulphide bonds in the antiparallel dimer is shown below:\

    \
     NH2-xxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCx-COOH\
           |                                                              |      \
     COOH-xCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxx-NH2\
    
    \ \ The precise role of uteroglobin has still to be elucidated PUBMED:7770456.

    \ 5498 IPR008532 \ This domain occurs in proteins that have been annotated as fibronectin/fibrinogen-binding proteins by similarity. This annotation is based on a protein in which the N-terminal region is involved in this activity PUBMED:8063411; the activity of this C-terminal domain is therefore unknown. This domain contains a conserved motif, D/E-X-W/Y-X-H, that may be functionally important.\ 847 IPR007265 \

    Sec34 and Sec35 form a sub-complex in a seven-protein complex that includes Dor1. This complex is thought to be important for tethering vesicles to the Golgi PUBMED:11703943.

    \ 1567 IPR000804 \ Clathrin-coated vesicles (CCVs) mediate intracellular membrane traffic such as receptor-mediated endocytosis. In addition to clathrin, CCVs contain a number of other components, including oligomeric complexes known as adaptor or clathrin assembly protein (AP) complexes PUBMED:2177341. The adaptor complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals, two types of adaptor complex are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains, the adaptins (gamma and beta' in AP-1; alpha and beta in AP-2), a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain (AP19 in AP-1; AP17 in AP-2). The small chains of AP-1 and AP-2 are evolutionarily related proteins of about 18 kDa. Homologs of AP17 and AP19 have also been found in yeast (genes APS1/YAP19 and APS2/YAP17) PUBMED:2040623, PUBMED:8373805, PUBMED:8157009. AP17 and AP19 are also related to the zeta chain PUBMED:8276893 of coatomer (zeta-COP), a cytosolic protein complex that reversibly associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport from the endoplasmic reticulum, via the Golgi, to the trans-Golgi network.\ 4817 IPR005336 \

    This is a family of proteins of unknown function.

    \ 5693 IPR008464 \ This family consists of several Cypovirus polyhedrin proteins. Polyhedrin is known to form a crystalline matrix (polyhedra) in infected insect cells PUBMED:8286955.\ 3042 IPR000918 \

    Isocitrate lyase () PUBMED:2696959, PUBMED:2361956 is an enzyme that catalyzes the conversion of isocitrate to succinate and glyoxylate. This is the first step in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants. A cysteine, a histidine and a glutamate or aspartate have been found to be important for the enzyme's catalytic activity. Only one cysteine residue is conserved between the sequences of the fungal, plant and bacterial enzymes; it is located in the middle of a conserved hexapeptide.

    \

    Other enzymes also belong to this family, including carboxyvinyl-carboxyphosphonate phosphorylmutase (), which catalyses the conversion of 1-carboxyvinyl carboxyphosphonate to 3-(hydrohydroxyphosphoryl)pyruvate and carbon dioxide, and phosphoenolpyruvate mutase (), which is involved in the biosynthesis of phosphinothricin tripeptide antibiotics.

    \ 5350 IPR001396 \

    Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, nickel, etc. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds PUBMED:1779825, PUBMED:2959513. An empirical classification into three classes has been proposed by Fowler and coworkers PUBMED:2959504 and Kojima PUBMED:1779826. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units PUBMED:2959504.

    \

    This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Consequently, all class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically-related and thus alignable sequences. This system subdivides the MT superfamily into families, subfamilies, subgroups, and isolated isoforms and alleles.

    \

    The metallothionein superfamily comprises all polypeptides that resemble equine renal metallothionein in several respects PUBMED:2959504: e.g., low molecular weight; high metal content; amino acid composition with high Cys and low aromatic residue content; unique sequence with characteristic distribution of cysteines; and spectroscopic manifestations indicative of metal thiolate clusters. An MT family subsumes MTs that share particular sequence-specific features and are thought to be evolutionarily related. The inclusion of an MT within a family presupposes that its amino acid sequence is alignable with those of all members. Fifteen MT families [http://www.unizh.ch/~mtpage/MT.html] have been characterised, each family being identified by its number and its taxonomic range: e.g., Family 1: vertebrate MTs.

    \

    Echinoidea (sea urchin, family 4) MTs are proteins of 64-67 residues. Members of this family are recognised by the sequence pattern P-D-x-K-C-[V,F]-C-C-x(5)-C-x-C-x(4)-C-C-x(4)-C-C-x(4,6)-C-C located near the N terminus. The taxonomic range of the members extends to sea urchins (Echinoidea). The protein sequence is divided into two structural domains, containing 9 and 11 Cys residues that bind 3 and 4 bivalent metal ions, respectively. Family 4 includes two subfamilies, e1 and e2, which form separate phylogenetic groups.
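    A PROSITE-style pattern like the family 4 signature can be tested against a sequence by translating it into a regular expression. The following is a minimal sketch. `prosite_to_regex` is a hypothetical helper, not part of any library, and it handles only the notation used in this entry: single residues, `x` for any residue, bracketed alternatives such as `[V,F]`, and repeat counts such as `x(5)` or `x(4,6)`. Other PROSITE syntax, such as exclusions and terminal anchors, is not supported. The demo sequence is synthetic and is not a real metallothionein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Supports only single residues, 'x' (any residue), bracketed alternatives
    such as [V,F], and repeat counts such as x(5) or x(4,6).
    """
    element_re = re.compile(r"(\[[^\]]+\]|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue in ("x", "X"):
            token = "."  # any residue
        elif residue.startswith("["):
            # [V,F] -> [VF]: drop the comma separators between alternatives
            token = "[" + residue[1:-1].replace(",", "") + "]"
        else:
            token = residue
        if low is not None:
            token += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(token)
    return "".join(parts)


# Family 4 (Echinoidea) pattern as given in this entry
FAMILY4 = "P-D-x-K-C-[V,F]-C-C-x(5)-C-x-C-x(4)-C-C-x(4)-C-C-x(4,6)-C-C"
family4_re = re.compile(prosite_to_regex(FAMILY4))

# Synthetic test sequence (not a real MT) built to satisfy the pattern
demo = "M" + "PDAKCVCC" + "AAAAA" + "CAC" + "AAAA" + "CC" + "AAAA" + "CC" + "AAAAA" + "CC"
match = family4_re.search(demo)
print(family4_re.pattern)                # PD.KC[VF]CC.{5}C.C.{4}CC.{4}CC.{4,6}CC
print(match.start() if match else None)  # 1
```

    The variable spacer `x(4,6)` becomes `.{4,6}`. The regex engine backtracks over it, so a match is found whenever any spacer length in that range fits.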

    \ 8037 IPR013162 \

    This domain belongs to the immunoglobulin superfamily.

    \ 1924 IPR003828 \

    This entry describes proteins of unknown function.

    \ 8141 IPR013211 \

    This repeat is found in bacterial and archaeal cell surface proteins, many of which are hypothetical. The secondary structure corresponding to this repeat is predicted to comprise 4 beta-strands, which may associate to form a beta-propeller PUBMED:. The repeat copy number varies from 2 to 14. This repeat is sometimes found with the PKD domain.

    \ 6619 IPR009628 \

    This entry represents a conserved region located towards the N-terminal end of prophage tail length tape measure protein (TMP). TMP is important for assembly of phage tails and is involved in tail length determination. Mutated forms of TMP cause tail fibres to be shortened PUBMED:11040123.

    \ 1572 IPR003058 \

    Bacteriocins are protein antibiotics that kill bacteria closely related to the producing species. Colicins are a subgroup of bacteriocins that are produced by and target Escherichia coli. The lethal action of most colicins is exerted either by formation of a pore in the cytoplasmic membrane of the target cell, or by an enzymatic nuclease digestion mechanism. Most colicins are able to translocate across the outer membrane by a two-receptor system, where one receptor is used for the initial binding and the second for translocation. The initial binding is to cell surface receptors such as the porin OmpF and the receptors FepA, BtuB, Cir and FhuA; colicins have been classified according to which receptors they bind. The presence of specific periplasmic proteins, such as TolA, TolB, TolC or TonB, is required for translocation across the membrane PUBMED:12423783. Cloacin DF13 is a bacteriocin that inactivates ribosomes by hydrolysing 16S rRNA in 30S ribosomal subunits at a specific site PUBMED:6344017.

    \

    Colicins are composed of domains with distinct functional roles. In general they contain a central R (receptor) domain that mediates receptor binding, an N-terminal T (translocation) domain that mediates translocation of the protein from the outer membrane receptor to the colicin's target within the cell, and a C-terminal C (catalytic) domain that performs the catalytic cleavage PUBMED:12409205.

    \ \ 4845 IPR005155 \

    This family includes both eukaryotic and archaeal proteins. Most of these proteins contain a PUA (PseudoUridine synthase and Archaeosine transglycosylase) domain, which is predicted to bind RNA molecules with complex folded structures PUBMED:10093218. The only characterised member of this family, , also known as Nip7, is an essential nucleolar protein from Saccharomyces cerevisiae PUBMED:9271378, PUBMED:9891085. This protein is required for efficient 60S ribosome subunit biogenesis and has been shown to interact with another essential nucleolar protein, Nop8p, and the exosome subunit Rrp43p. These three proteins are required for 60S subunit synthesis and may be part of a dynamic complex involved in this process.

    \ 7392 IPR011439 \

    This domain is found in several cell surface proteins. Some are involved in antibiotic resistance (e.g. and ) PUBMED:10332717 and/or cellular adhesion (e.g. ) PUBMED:12438342. In some proteins it is repeated more than fifteen times.

    \ 5998 IPR009325 \

    This family consists of several bacterial proteins of unknown function.

    \ 6825 IPR009735 \

    This family consists of a number of bacterial and phage proteins of around 250 residues in length. The family contains several hypothetical proteins and the Gp17 protein from bacteriophage A118 (). The function of this family is unknown.

    \ 1484 IPR002044 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    This domain binds to starch and is often found at the C terminus of a variety of glycosyl hydrolases that act on polysaccharides more rapidly than on oligosaccharides. Reactions include: the hydrolysis of terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains with release of beta-D-glucose; the degradation of starch to cyclodextrins by formation of a 1,4-alpha-D-glucosidic bond; and the hydrolysis of 1,4-alpha-glucosidic linkages in polysaccharides to remove successive maltose units from the non-reducing ends of the chains.

    \ \ 280 IPR005240 \

    This family of conserved hypothetical proteins has no known function. It includes potential integral membrane proteins.

    \ 5174 IPR008011 \

    This family of short proteins includes proteins from the NADH-ubiquinone oxidoreductase complex I. The family includes the B14 subunit of bovine NADH-ubiquinone oxidoreductase and the B22 subunit of the human enzyme. All the members of this family are predicted to be components of complex I. The family has been named LYR after a highly conserved tripeptide motif close to the N terminus of these proteins.

    \ 8049 IPR013176 \

    The function of this fungal family of proteins is unknown.

    \ 3068 IPR000186 \ Interleukin-5 (IL5), also known as eosinophil differentiation factor (EDF), is a lineage-specific cytokine for eosinopoiesis PUBMED:3498940, PUBMED:8483502. It regulates eosinophil growth and activation PUBMED:3498940, and thus plays an important role in diseases associated with increased levels of eosinophils, including asthma PUBMED:8483502. \ IL5 has a similar overall fold to other cytokines (e.g., IL2, IL4 and GCSF) PUBMED:8483502, but while these exist as monomers, IL5 is a homodimer. The fold contains an antiparallel four-alpha-helix bundle with a left-handed twist, connected by a two-stranded antiparallel beta-sheet PUBMED:8483502, PUBMED:2037074. The monomers are held together by two interchain disulphide bonds PUBMED:2037074.\ 4783 IPR001732 \

    The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes that possess the ability to catalyze the NAD-dependent twofold oxidation of an alcohol to an acid without the release of an aldehyde intermediate PUBMED:2470755, PUBMED:9013585.

    \ \

    The enzymes have a wide range of functions. In plants UDP-glucose dehydrogenase, , is an important enzyme in the synthesis of hemicellulose and pectin PUBMED:12031484, which are the components of newly formed cell walls; while in zebrafish UDP-glucose dehydrogenase is required for cardiac valve formation PUBMED:11533493. In Xanthomonas campestris, a plant pathogen, UDP-glucose dehydrogenase is required for virulence PUBMED:11554764.

    \ \

    GDP-mannose dehydrogenase, , catalyzes the formation of GDP-mannuronic acid, which is the monomeric unit from which the exopolysaccharide alginate is formed. Alginate is secreted by a number of bacteria, including the pathogenic bacterium Pseudomonas aeruginosa and Azotobacter vinelandii. In Pseudomonas aeruginosa, alginate is believed to play an important role in the bacterium's resistance to antibiotics and to the host immune response PUBMED:12135385, while in Azotobacter vinelandii it is essential for the encystment process PUBMED:9864323.

    \ 3609 IPR006131 \

    This family contains two related enzymes:\

    1. Aspartate carbamoyltransferase () (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides PUBMED:3015959. In prokaryotes, ATCase consists of two subunits, a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes it is a domain in a multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals PUBMED:8098212) that also catalyzes other steps of pyrimidine biosynthesis.
    2. Ornithine carbamoyltransferase () (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate to citrulline. In mammals this enzyme participates in the urea cycle PUBMED:2662961 and is located in the mitochondrial matrix. In prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it is also involved in the degradation of arginine PUBMED:3109911 (the arginine deiminase pathway).
    \ It has been shown PUBMED:6379651 that these two enzymes are evolutionarily related. Their predicted secondary structures are similar and there are some regions of sequence similarity. One of these regions includes three residues that have been shown by crystallographic studies PUBMED:6377306 to be implicated in binding the phosphoryl group of carbamoyl phosphate; this region is described by . The C-terminal aspartate/ornithine-binding domain is connected to the N-terminal domain by two alpha-helices, which form a hinge between the domains PUBMED:10318893.

    \ 1527 IPR005659 \

    This chemotaxis protein stimulates methylation of MCP proteins.

    \ 6641 IPR010658 \

    This entry represents a conserved region within plant nodulin-like proteins.

    \ 1684 IPR002479 \ This repeat is found in multiple tandem copies in several proteins. The repeat is 20 amino acid residues long. It has been suggested that these repeats in  might be responsible for the specific recognition of choline-containing cell walls PUBMED:3422470. Similar repeats are found in the glucosyltransferases and glucan-binding protein of oral streptococci, the dextransucrases of Leuconostoc mesenteroides and the toxins of Clostridium difficile PUBMED:15576779.\ 5626 IPR008765 \ This family consists of bacteriophage FRD3 proteins.\ 2317 IPR007784 \ This family consists of uncharacterised baculovirus proteins.\ 4303 IPR001385 \ Rotaviruses consist of three concentric protein shells. The intermediate (middle) protein layer contains VP6, the major internal structural protein. VP6 is the most abundant protein in the virion and is involved in virion assembly; it can interact with VP2, VP4 and VP7 PUBMED:9266993, PUBMED:8057471.\ 5854 IPR004788 \ Ribose 5-phosphate isomerase (), also known as phosphoriboisomerase, catalyses the conversion of D-ribose 5-phosphate to D-ribulose 5-phosphate in the non-oxidative branch of the pentose phosphate pathway.\ 4139 IPR006870 \ M protein is involved in condensing and targeting the ribonucleoprotein (RNP) coil to the plasma membrane. M interacts specifically with the transmembrane spike protein (G) and is important for the incorporation of G protein into budding virions PUBMED:9847327.\ 957 IPR000537 \ The COX10/ctaB/cyoE signature is found in prenyltransferases, including bacterial 4-hydroxybenzoate octaprenyltransferase (gene ubiA); yeast mitochondrial para-hydroxybenzoate--polyprenyltransferase (gene COQ2); and protoheme IX farnesyltransferase (heme O synthase) from yeast and mammals (gene COX10) and from bacteria (genes cyoE or ctaB) PUBMED:8155731, PUBMED:7885224. These are integral membrane proteins, which probably contain seven transmembrane segments.
The signature is also found in cytochrome c oxidase assembly factors. Because cytochrome c oxidase is so complex, its assembly requires the assistance of these factors.\ 7884 IPR012602 \

    This family consists of the pyrBI operon leader peptides. The expression of the pyrBI operon, which encodes the subunits of the pyrimidine biosynthetic enzyme aspartate transcarbamylase, is regulated primarily through a UTP-sensitive transcriptional attenuation control mechanism. In this mechanism, the concentration of UTP determines the extent of coupling between transcription and translation within the pyrBI leader region, and hence the level of rho-independent transcriptional termination at an attenuator preceding the pyrB gene PUBMED:7517939.

    \ 6146 IPR009394 \

    This family consists of several bacterial proteins of unknown function.

    \ 6109 IPR009379 \

    Sulfolobus spindle-shaped virus 1 (SSV1) and its fusellovirus homologues can be found in many acidic (pH below 4.0) hot springs (above 70 degrees C) around the world. SSV1 contains a 15.5-kb double-stranded DNA genome that encodes 34 proteins of more than 50 amino acids PUBMED:1926776. A site-specific integrase and a DnaA-like protein have been identified by sequence homology, and three structural proteins (VP1, VP2 and VP3) have been isolated from purified virus and identified by N-terminal sequencing.

    \ 6918 IPR010768 \

    This family consists of several hypothetical bacterial proteins of around 250 residues in length. The function of this family is unknown.

    \ 2844 IPR001474 \

    GTP cyclohydrolase I () catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate from GTP. This reaction is the first step in the biosynthesis of tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in vertebrates, and of pteridine-containing pigments in insects. The comparison of the sequence of the enzyme from bacterial and eukaryotic sources shows that the structure of this enzyme has been extremely well conserved throughout evolution PUBMED:7542887.

    \ 2063 IPR007272 \ This entry includes YeeE and YedE from Escherichia coli. These proteins are integral membrane proteins of unknown function. Many of these proteins contain two homologous regions that are represented by this entry. This region contains several conserved glycines and an invariant cysteine that is probably an important functional residue.\ 1736 IPR005013 \

    Members of this family are involved in asparagine-linked protein glycosylation. In particular, dolichyl-diphosphooligosaccharide-protein glycosyltransferase (DDOST), also known as oligosaccharyltransferase (), transfers the high-mannose sugar GlcNAc(2)-Man(9)-Glc(3) from a dolichol-linked donor to an asparagine acceptor in a consensus Asn-X-Ser/Thr motif. In most eukaryotes, the DDOST complex is composed of three subunits, which in humans are described as a 48 kDa subunit, ribophorin I and ribophorin II. However, the yeast DDOST appears to consist of six subunits (alpha, beta, gamma, delta, epsilon, zeta). The yeast beta subunit is a 45 kDa polypeptide, previously discovered as the Wbp1 protein, with known sequence similarity to the human 48 kDa subunit and the other orthologues. This family includes the 48 kDa-like subunits from several eukaryotes; it also includes the yeast DDOST beta subunit Wbp1.
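    Candidate acceptor sites matching the Asn-X-Ser/Thr consensus can be listed with a regular expression. This is a minimal sketch with a hypothetical helper, `find_sequons`. It also assumes the widely used refinement that X may be any residue except proline, which this entry does not state. It reports sequence matches only. Whether a given asparagine is actually glycosylated by DDOST depends on more than the motif.

```python
import re

# Asn-X-Ser/Thr with X != Pro. The zero-width lookahead lets overlapping
# sequons (e.g. N-N-T-S) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 0-based positions of the acceptor Asn in each candidate sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


print(find_sequons("MNGSANPTQNNTS"))  # [1, 9, 10]
```

    In the example, the Asn at position 5 is skipped because it is followed by Pro. Both asparagines in N-N-T-S are reported because the lookahead does not consume characters.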

    \ 5903 IPR010341 \

    This family consists of several hypothetical proteins from plants. The function of this family is unknown.

    \ 602 IPR004686 \ The MTC family consists of a limited number of homologues, all from eukaryotes. One member of the family has been functionally characterized as a tricarboxylate carrier from rat liver mitochondria. The rat liver mitochondrial tricarboxylate carrier has been reported to transport citrate, cis-aconitate, threo-D-isocitrate, D- and L-tartrate, malate, succinate and phosphoenolpyruvate. It presumably functions by a proton symport mechanism. The rest of the characterized proteins appear to be sideroflexins involved in iron transport.\ 7186 IPR009954 \

    This family consists of several Enterobacterial proteins of around 60 residues in length. The function of this family is unknown.

    \ 3523 IPR002687 \ This domain is present in various pre-mRNA processing ribonucleoproteins. The function of the domain is unknown; however, it may be a common RNA-, snoRNA- or Nop1p-binding domain.\

    Proteins have been implicated in an expanding variety of functions during pre-mRNA splicing. Molecular cloning has identified genes encoding spliceosomal proteins that potentially act as novel RNA helicases, GTPases, or protein isomerases. Novel protein-protein and protein-RNA interactions that are required for functional spliceosome formation have also been described. Finally, growing evidence suggests that proteins may contribute directly to the spliceosome's active sites PUBMED:9159080.

    \ 6041 IPR006538 \

    These proteins are CobT subunits of the aerobic cobalt chelatase (aerobic cobalamin biosynthesis pathway). Pseudomonas denitrificans CobT has been experimentally characterized PUBMED:1917840, PUBMED:1429466. Aerobic cobalt chelatase consists of three subunits: CobT, CobN and CobS.

    \

    Cobalamin (vitamin B12) can be complexed with metal via ATP-dependent reactions (aerobic pathway; e.g., in Pseudomonas denitrificans) or via ATP-independent reactions (anaerobic pathway; e.g., in Salmonella typhimurium) PUBMED:8905078, PUBMED:11469861. The corresponding cobalt chelatases are not homologous. However, the aerobic cobalt chelatase subunits CobN and CobS are homologous to the Mg-chelatase subunits BchH and BchI, respectively PUBMED:11469861. CobT, too, has been found to be remotely related to the third subunit of Mg-chelatase, BchD (involved in bacteriochlorophyll synthesis, e.g., in Rhodobacter capsulatus) PUBMED:11469861.

    \

    Nomenclature note: CobT of the aerobic pathway in Pseudomonas denitrificans is not a homologue of CobT of the anaerobic pathway (Salmonella typhimurium, Escherichia coli). Therefore, annotating any member of this family as a nicotinate-mononucleotide--5,6-dimethylbenzimidazole phosphoribosyltransferase is erroneous.

    \ \ 7244 IPR009985 \

    This family consists of several Crinivirus P26 proteins which seem to be found exclusively in the Lettuce infectious yellows virus. The function of this family is unknown.

    \ 4843 IPR005242 \

    This family of conserved hypothetical proteins has no known function. It includes potential integral membrane proteins.

    \ 8102 IPR013236 \

    Mga is a DNA-binding protein that activates the expression of several important virulence genes in group A streptococcus in response to changing environmental conditions PUBMED:11952907. This region corresponds to the PRD-like region.

    \ 2355 IPR002794 \ Many members of this family have no known function and are predicted to be integral membrane proteins.\ 4443 IPR007236 \ The SlyX protein has no known function. It is short (fewer than 80 amino acids), and its gene is found close to the slyD gene. The SlyX protein has a conserved PPH(Y/W) motif at its C terminus and may form a coiled-coil structure.\ 531 IPR005161 \

    The Ku heterodimer (composed of Ku70 and Ku80) contributes to genomic integrity through its ability to bind DNA double-strand breaks and facilitate repair by the non-homologous end-joining pathway. This is the N-terminal alpha/beta domain, which makes only a small contribution to the dimer interface. The domain comprises a six-stranded beta-sheet of the Rossmann fold PUBMED:10191092.

    \ 953 IPR008191 \ There are multiple copies of this domain in the Drosophila melanogaster tudor protein, and it has been identified in several RNA-binding proteins PUBMED:9048482. Although the function of this domain is unknown, in Drosophila melanogaster the tudor protein is required during oogenesis for the formation of primordial germ cells and for normal abdominal segmentation PUBMED:9003410.\ 1747 IPR003208 \ This family contains the medium subunit of the trimeric diol dehydratases and glycerol dehydratases. These enzymes are produced by some enterobacteria in response to growth substances.\ 3575 IPR005632 \

    This family contains proteins annotated as OmpH (outer membrane protein H). OmpH is a major structural protein of the outer membrane. In Pasteurella multocida it acts as a channel-forming transmembrane porin PUBMED:9401047. Porins act as molecular sieves that allow small hydrophilic solutes to diffuse through the outer membrane, and they also act as receptors for bacteriophages and bacteriocins. Porins are highly immunogenic and are conserved in bacterial families, making them attractive vaccine candidates PUBMED:10067687.

    \ \ \

    The 17-kDa protein (Skp, OmpH) of Escherichia coli is a homotrimeric periplasmic chaperone for newly synthesised outer-membrane proteins. Its X-ray structure has been reported at resolutions of 2.35 Å and 2.30 Å PUBMED:15361861, PUBMED:15304217. Three hairpin-shaped alpha-helical extensions reach out by approximately 60 Å from a trimerisation domain, which is composed of three intersubunit beta-sheets that wind around a central axis. The alpha-helical extensions approach each other at their distal turns, resulting in a fold that resembles a 'three-pronged grasping forceps'. The overall shape of Skp is reminiscent of the cytosolic chaperone prefoldin, although it is based on a radically different topology. The peculiar architecture, with apparent plasticity of the prongs and distinct electrostatic and hydrophobic surface properties, supports the recently proposed biochemical mechanism of this chaperone: formation of a Skp(3)-Omp complex protects the outer membrane protein from aggregation during passage through the bacterial periplasm.

    \ \

    The ability of Skp to prevent the aggregation of model substrates in vitro is independent of ATP. Skp can interact directly with membrane lipids and lipopolysaccharide. These interactions are needed for efficient Skp-assisted folding of membrane proteins PUBMED:15304217.

    \ \ 5196 IPR008031 \

    Monomethylamine methyltransferase (MtmB) of the archaebacterium Methanosarcina barkeri contains a novel amino acid, pyrrolysine, encoded by the termination codon UAG PUBMED:12121639. The structure of the enzyme reveals a homohexamer composed of individual subunits with a TIM barrel fold. MtmB initiates the metabolism of monomethylamine by catalysing the transfer of the methyl group from monomethylamine to the corrinoid cofactor of MtmC.
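    The UAG recoding described above can be illustrated with a toy translator in which UAG is optionally read as pyrrolysine (one-letter code O) instead of as a stop codon. This is a sketch under stated assumptions. `translate` and `read_uag_as_pyl` are hypothetical names, and the codon table is the standard genetic code. The model also ignores the dedicated tRNA(Pyl)/pyrrolysyl-tRNA synthetase machinery and the sequence context that govern real UAG decoding.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str, read_uag_as_pyl: bool = False) -> str:
    """Translate an in-frame coding sequence up to the first stop codon.

    With read_uag_as_pyl=True, TAG (UAG in mRNA) is read as pyrrolysine ('O').
    This is a toy model: real UAG decoding depends on cellular machinery
    that is not represented here.
    """
    dna = dna.upper().replace("U", "T")  # accept RNA or DNA input
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "TAG" and read_uag_as_pyl:
            protein.append("O")  # one-letter code for pyrrolysine
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break  # any other stop codon ends translation
        protein.append(aa)
    return "".join(protein)

print(translate("ATGTAGAAATAA"))                        # M
print(translate("ATGTAGAAATAA", read_uag_as_pyl=True))  # MOK
```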

    \ 5393 IPR008840 \ This family contains both viral and bacterial proteins which are related to the Gp157 protein of the Streptococcus thermophilus SFi bacteriophage. It is thought that bacteria possessing the gene coding for this protein have an increased resistance to the bacteriophage PUBMED:9792848.\ 5949 IPR010363 \

    The function of this N-terminal domain has not been characterised; the domain is not expressed in the 'short' isoform of collagen XVIII PUBMED:9503365.

    \ 1159 IPR001731 \ Delta-aminolevulinic acid dehydratase (ALAD) PUBMED:2656410 catalyzes the second step in the biosynthesis of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen. The enzyme is an oligomer composed of eight identical subunits. Each subunit binds an atom of zinc or, in plants, of magnesium. A lysine has been implicated in the catalytic mechanism PUBMED:3092810. The sequence of the region in the vicinity of the active-site residue is conserved in ALAD from various prokaryotic and eukaryotic species. Inactivating mutations in the human enzyme are responsible for an inherited porphyria, and enzyme inactivation has also been implicated in acute lead poisoning. The enzyme has also been reported to be a regulatory component within the 26S proteasome.\ 3487 IPR007396 \ In Bacillus subtilis, the family member PAI 2 is involved in the negative regulation of protease synthesis and sporulation PUBMED:2108124.\ 6288 IPR009461 \

    This family covers the NSP13 region of the coronavirus polyprotein. This protein has the predicted function of an mRNA cap-1 methyltransferase PUBMED:12809601. The human coronavirus 229E (HCoV-229E) replicase gene-encoded nonstructural protein 13 (nsp13) contains an N-terminal zinc-binding domain and a C-terminal superfamily 1 helicase domain PUBMED:15220459. All natural ribonucleotides and deoxyribonucleotides are substrates of nsp13, with ATP, dATP and GTP being hydrolyzed most efficiently. Using the NTPase active site, HCoV-229E nsp13 also mediates RNA 5'-triphosphatase activity, which may be involved in the capping of viral RNAs.

    \ 935 IPR005476 \

    Transketolase (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamine pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 kDa subunits. TK sequences from a variety of eukaryotic and prokaryotic sources PUBMED:1567394, PUBMED:1737042 show that the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha, there is a highly related enzyme, dihydroxyacetone synthase (DHAS) (also known as formaldehyde transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
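    Counting carbon atoms only, the transketolase reaction described above balances as follows. This is a bookkeeping sketch derived from the text; phosphate groups and hydrogens are not tracked.

$$
\underbrace{\text{xylulose 5-phosphate}}_{\mathrm{C}_5} + \underbrace{\text{ribose 5-phosphate}}_{\mathrm{C}_5}
\;\rightleftharpoons\;
\underbrace{\text{glyceraldehyde 3-phosphate}}_{\mathrm{C}_{5-2}\,=\,\mathrm{C}_3} + \underbrace{\text{sedoheptulose 7-phosphate}}_{\mathrm{C}_{5+2}\,=\,\mathrm{C}_7}
$$

    The donor loses the two-carbon ketol unit (5 - 2 = 3) and the acceptor gains it (5 + 2 = 7), so 10 carbons enter and 10 carbons leave.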

    \ 1-deoxyxylulose-5-phosphate synthase (DXP synthase) PUBMED:9371765 is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) that catalyzes the thiamine pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose 5-phosphate (DXP), a precursor in the biosynthetic pathway to isoprenoids, thiamine (vitamin B1) and pyridoxol (vitamin B6). DXP synthase is evolutionarily related to TK. The N-terminal section contains a histidine residue which appears to function in proton transfer during catalysis PUBMED:1628611. In the central section there are conserved acidic residues that are part of the active-site cleft and may participate in substrate binding PUBMED:1628611. This family includes transketolase enzymes and also partially matches the 2-oxoisovalerate dehydrogenase beta subunit. Both these enzymes utilise thiamine pyrophosphate as a cofactor, suggesting there may be common aspects in their mechanism of catalysis.
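    The carbon balance of the DXP synthase reaction can be sketched the same way. The release of CO2 (from C1 of pyruvate) is not stated in this entry. It is added here from the standard description of the reaction.

$$
\underbrace{\text{pyruvate}}_{\mathrm{C}_3} + \underbrace{\text{glyceraldehyde 3-phosphate}}_{\mathrm{C}_3}
\;\longrightarrow\;
\underbrace{\text{1-deoxy-D-xylulose 5-phosphate}}_{\mathrm{C}_5} + \underbrace{\mathrm{CO_2}}_{\mathrm{C}_1}
$$

    Carbons 2 and 3 of pyruvate join the three carbons of glyceraldehyde 3-phosphate to give the five-carbon product (3 + 3 = 5 + 1).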

    \ 303 IPR006816 \

    This domain is a conserved region found in a number of eukaryotic proteins involved in the cytoskeletal rearrangements required for phagocytosis of apoptotic cells and for cell motility, including CED-12, ELMO1 and ELMO2. ELMO1 is a component of signalling pathways that regulate phagocytosis and cell migration and is the mammalian orthologue of the C. elegans gene ced-12. CED-12 is required for the engulfment of dying cells and for cell migration.

    \ 5933 IPR009294 \

    This family consists of several eukaryotic Aph-1 proteins. Gamma-secretase catalyses the intramembrane proteolysis of Notch, beta-amyloid precursor protein, and other substrates as part of a new signalling paradigm and as a key step in the pathogenesis of Alzheimer's disease. It is thought that the presenilin heterodimer comprises the catalytic site and that a highly glycosylated form of nicastrin associates with it. Aph-1 and Pen-2, two membrane proteins genetically linked to gamma-secretase, associate directly with presenilin and nicastrin in the active protease complex. Co-expression of all four proteins leads to marked increases in presenilin heterodimers, full glycosylation of nicastrin, and enhanced gamma-secretase activity PUBMED:12740439.

    \ 6854 IPR009750 \

    This family consists of several hypothetical bacterial and phage proteins of around 60 residues in length. The function of this family is unknown.

    \ 970 IPR004029 \

    Urease and other nickel metalloenzymes are synthesised as precursors devoid of the metalloenzyme active site. These precursors then undergo a complex post-translational maturation process that requires a number of accessory proteins.

    \ \

    Members of this group are nickel-binding proteins required for urease metallocenter assembly PUBMED:8318889. They are believed to function as metallochaperones to deliver nickel to urease apoprotein PUBMED:12072968, PUBMED:10753863. It has been shown by yeast two-hybrid analysis that UreE forms a dimeric complex with UreG in Helicobacter pylori PUBMED:12388207. The UreDFG-apoenzyme complex has also been shown to exist PUBMED:11157956, PUBMED:7721685 and is believed to be, with the addition of UreE, the assembly system for active urease PUBMED:7721685. The complexes, rather than the individual proteins, presumably bind to UreB via UreE/H recognition sites.

    \ \

    The structure of Klebsiella aerogenes UreE reveals a unique two-domain architecture. The N-terminal domain is structurally related to a heat shock protein, while the C-terminal domain shows homology to the Atx1 copper metallochaperone PUBMED:11591723, PUBMED:11602602. Significantly, despite the relationship between these proteins, the metal-binding sites in UreE and Atx1 differ in location and in the types of residues involved. In addition, the mechanism by which UreE activates urease is proposed to differ from the thiol ligand exchange mechanism used by the copper metallochaperones.

    \ \

    The N-terminal domain is termed the peptide-binding domain. Deletion of this domain does not eliminate enzymatic activity, and the truncated protein can still activate urease PUBMED:15866948.

    \ \ 7916 IPR012984 \

    The PROCT domain is the C-terminal domain of pre-mRNA splicing factors of the PRO8 family PUBMED:15112237.

    \ 2665 IPR003108 \

    The growth-arrest-specific protein 2 domain is found associated with the spectrin repeat, calponin homology domain and EF hand in many proteins.

    \ 987 IPR001680 \

    WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD-containing proteins have 4 to 16 repeating units, all of which are thought to form a circularised beta-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. The underlying common function of all WD-repeat proteins is coordinating multi-protein complex assemblies, where the repeating units serve as a rigid scaffold for protein interactions. The specificity of the proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (beta subunit is a beta-propeller), TAFII transcription factor, and E3 ubiquitin ligase PUBMED:11814058, PUBMED:10322433.
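    As a rough illustration of the repeat arithmetic (units of ~40 residues, often ending in Trp-Asp, 4 to 16 units per protein), the sketch below looks for chains of WD dipeptides spaced roughly one repeat apart. It is a toy heuristic, not a WD-40 detector. The function name `longest_wd_run` and the 30 to 50 residue spacing window are assumptions. Because many repeats do not end in WD, real annotation relies on profile or HMM searches rather than on dipeptide spacing.

```python
def longest_wd_run(seq: str, lo: int = 30, hi: int = 50) -> int:
    """Length of the longest chain of Trp-Asp dipeptides spaced lo..hi residues apart."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "WD"]
    best = run = 1 if positions else 0
    for a, b in zip(positions, positions[1:]):
        # Extend the chain if the gap fits a ~40-residue repeat; otherwise restart it.
        run = run + 1 if lo <= b - a <= hi else 1
        best = max(best, run)
    return best

# Synthetic example: five 40-residue units, each ending in "WD".
toy = ("A" * 38 + "WD") * 5
print(longest_wd_run(toy))       # 5
print(longest_wd_run(toy) >= 4)  # True: meets the 4-unit minimum quoted above
```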

    \ 1068 IPR000073 \ The alpha/beta hydrolase fold PUBMED:1409539 is common to a number of hydrolytic enzymes of widely differing phylogenetic origin and catalytic function. The core of each enzyme is an alpha/beta-sheet (rather than a barrel), containing 8 strands connected by helices PUBMED:1409539. The enzymes are believed to have diverged from a common ancestor, preserving the arrangement of the catalytic residues. All have a catalytic triad, the elements of which are borne on loops, which are the best conserved structural features of the fold.\ 1491 IPR004329 \ CcmE is the product of one of a cluster of Ccm genes that are necessary for cytochrome c biosynthesis in eubacteria. Expression of these proteins is induced when the organisms are grown under anaerobic conditions with nitrate or nitrite as the final electron acceptor.\ 366 IPR001450 \

    Ferredoxins are iron-sulphur proteins that mediate electron transfer in a range of metabolic reactions; they fall into several subgroups according to the nature of their iron-sulphur cluster(s) PUBMED:3932661, PUBMED:2506358. One group, originally found in bacteria, has been termed "bacterial-type"; in this group the active centre is a 4Fe-4S cluster. 4Fe-4S ferredoxins may in turn be subdivided into further groups, based on their sequence properties. Most contain at least one conserved domain, including four Cys residues that bind to a 4Fe-4S centre.

    \

    During the evolution of bacterial-type ferredoxins, intrasequence gene duplication, transposition and fusion events occurred, resulting in the appearance of proteins with multiple iron-sulphur centres: e.g. dicluster-type (2[4Fe-4S]) and polyferredoxins, iron-sulphur subunits of bacterial succinate dehydrogenase/fumarate reductase, formate hydrogenlyase and formate dehydrogenase complexes, pyruvate-flavodoxin oxidoreductase, NADH:ubiquinone reductase and others. In some bacterial ferredoxins, one of the duplicated domains has lost one or more of the four conserved Cys residues. These domains have either lost their iron-sulphur binding property or bind a 3Fe-4S centre instead of a 4Fe-4S centre. 3D structures are now known both for a number of monocluster-type PUBMED:2600971 and dicluster-type PUBMED:7966291 4Fe-4S ferredoxins.

    \

    CAUTION: PRINTS signature in the current entry is known to miss protein matches and should be updated in the near future.

    \ 2968 IPR006143 \

    Gram-negative bacteria produce a number of proteins which are secreted into the growth medium by a mechanism that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, require the help of two or more proteins for their secretion across the cell envelope. These secretion proteins include members of the ABC transporter family (see the relevant entry) and a protein belonging to a family that currently comprises the following members PUBMED:2249654, PUBMED:2184029, PUBMED:1622271, PUBMED:1427098, PUBMED:9301333:

    \
    \
     Gene  Species                  Protein which is exported\
     ----  ----------------------   --------------------------------------------\
     hlyD  Escherichia coli         Hemolysin\
     appD  A.pleuropneumoniae       Hemolysin\
     lcnD  Lactococcus lactis       Lactococcin A\
     lktD  A.actinomycetemcomitans  Leukotoxin\
           Pasteurella haemolytica\
     rtxD  A.pleuropneumoniae       Toxin-III\
     cyaD  Bordetella pertussis     Calmodulin-sensitive adenylate cyclase-\
                                    hemolysin (cyclolysin)\
     cvaA  Escherichia coli         Colicin V\
     prtE  Erwinia chrysanthemi     Extracellular proteases B and C\
     aprE  Pseudomonas aeruginosa   Alkaline protease\
     emrA  Escherichia coli         Drugs and toxins\
     yjcR  Escherichia coli         Unknown\
    
    \

    The secretion proteins are evolutionarily related and consist of 390 to 480 amino acid residues. They seem to be anchored in the inner membrane by an N-terminal transmembrane region. Their exact role in the secretion process is not yet known.

    \ 4020 IPR003686 \

    Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll a that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product.

    \ \ \

    PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane PUBMED:12518057, PUBMED:15100025. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection PUBMED:14871485.

    \ \ \

    This family represents the low molecular weight transmembrane protein PsbI, which is tightly associated with the D1/D2 heterodimer in PSII. The function of PsbI is unknown, but it may be involved in the assembly, dimerisation or stabilisation of PSII dimers PUBMED:8544827.

    \ 7892 IPR012976 \

    This is the central domain in Nop56/SIK1-like proteins PUBMED:15112237.

    \ 7188 IPR010859 \

    This family consists of several bacterial and phage proteins of around 410 residues in length. Bacterial members of this family seem to be found exclusively in Streptococcus species. The function of this family is unknown.

    \ 7191 IPR009957 \

    This family consists of several hypothetical bacterial proteins of around 110 residues in length. Members of this family appear to be found exclusively in Ralstonia solanacearum. The function of this family is unknown.

    \ 7566 IPR008243 \

    Chorismate mutase (CM) catalyses the reaction at the branch point of the biosynthetic pathway leading to the three aromatic amino acids, phenylalanine, tryptophan and tyrosine (chorismic acid is the last common intermediate, and CM leads to the L-phenylalanine/L-tyrosine branch). It is part of the shikimate pathway, which is present only in bacteria, fungi and plants.

    \ \

    This entry represents a family of monofunctional (non-fused) chorismate mutases from Gram-positive bacteria (Firmicutes) and cyanobacteria. Trusted members of the family are found in operons with other enzymes of the chorismate pathways, both up- and downstream of CM (Listeria, Bacillus, Oceanobacillus) or are the sole CM in the genome where the other members of the chorismate pathways are found elsewhere in the genome (Nostoc, Thermosynechococcus).\ \ They are monofunctional, homotrimeric, nonallosteric enzymes and are not regulated by the end-product aromatic amino acids.

    \ \ \

    The three types of CM are the AroQ class, prokaryotic type; the AroQ class, eukaryotic type; and the AroH class. They fall into two structural folds (AroQ class and AroH class) which are completely unrelated PUBMED:11528003. The two types of the AroQ structural class (the Escherichia coli CM dimer and the yeast CM monomer) can be structurally superimposed, and the topology of the four-helix bundle forming the active site is conserved PUBMED:11528003.

    For additional information please see PUBMED:8046752, PUBMED:8061004, PUBMED:2105742, PUBMED:8378335, PUBMED:10818343, PUBMED:11450855, PUBMED:9383421.

    \ \ 3311 IPR004354 \

    REC114 is one of 10 genes required for initiation of meiotic recombination in Saccharomyces cerevisiae PUBMED:9267437. Located on chromosome XIII, it is transcribed only in meiosis and has no detectable function in mitosis PUBMED:8385581.

    \

    REC114 has been shown to possess an intron and is one of only three genes in yeast with 3' introns PUBMED:9267437. The 3' splice site utilised in REC114 is a very rare AAG sequence; only three other genes in yeast use this non-consensus sequence PUBMED:9267437. It appears that the intron is not essential for expression of REC114 and is not absolutely required for meiotic function. Nevertheless, it is conserved in evolution: two other species of yeast contain an intron at the same location in their REC114 genes PUBMED:9267437.

    \ 210 IPR001275 \ This domain was first discovered in the doublesex proteins of Drosophila melanogaster and is also seen in proteins from C. elegans PUBMED:9490411. In Drosophila the doublesex gene controls somatic sexual differentiation by producing alternatively spliced mRNAs encoding related sex-specific polypeptides PUBMED:8978051. These proteins are believed to function as transcription factors on downstream sex-determination genes, in particular regulating neuroblast differentiation and the transcription of yolk protein genes PUBMED:1907913, PUBMED:3046751. The DM domain binds DNA as a dimer, allowing the recognition of pseudopalindromic sequences PUBMED:8978051, PUBMED:9927589, PUBMED:10898790. The NMR analysis of the DSX DM domain PUBMED:10898790 revealed a novel zinc module containing 'intertwined' CCHC and HCCC zinc-binding sites. Recognition of the DNA requires the carboxy-terminal basic tail, which contacts the minor groove of the target sequence.\ 5201 IPR008036 \

    Mu-conotoxins are a family of peptides from the venoms of predatory cone snails. Mu-conotoxins are peptide inhibitors of voltage-sensitive sodium channels, preferentially in skeletal muscle. Conotoxin gm9a, a putative 27-residue polypeptide encoded by Conus gloriamaris, has been shown to adopt an inhibitory cystine knot motif constrained by three disulphide bonds PUBMED:12006587, PUBMED:12193600.

    \ 2527 IPR002953 \ The Filoviridae are a group of viruses that cause haemorrhagic fevers with a high mortality rate. The family currently contains three viruses: Ebola, Marburg and Reston, named after their corresponding outbreak regions. They possess negative-stranded RNA genomes, which encode at least 7 proteins. The VP35 protein is found in the genomes of all filoviruses. Its function is presently unknown, but because of its position in the genome it is thought to share the function of the phosphorylated proteins (polymerase subunits) of rhabdoviruses and paramyxoviruses. There is, however, no evidence to suggest that VP35 is phosphorylated PUBMED:, PUBMED:8482365.\ 3585 IPR002630 \ This family consists of orbivirus non-structural protein NS1, or hydrophobic tubular protein. NS1 has no specific function in virus replication; however, it is thought to play a role in the transport of mature virus particles from virus inclusion bodies to the cell membrane PUBMED:9152425. Orbiviruses are part of the larger family Reoviridae, whose members have a dsRNA genome of at least 10 segments encoding at least 10 viral proteins PUBMED:9152425. Orbiviruses found in this family include bluetongue virus and African horse sickness virus.\ 52 IPR004094 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This group of serine protease inhibitors belongs to MEROPS inhibitor family I15, clan IO. They inhibit serine peptidases of the S1 family PUBMED:14705960 and are characterized by a well-conserved pattern of cysteine residues. Many of the proteins that belong to this family are anticoagulants.

    \ 4423 IPR007634 \ The assignment of this DNA-binding domain is based on peptide fragmentation data. This domain is proximal to DNA in the promoter/holoenzyme complex. Furthermore, this region contains a putative helix-turn-helix motif. At the C terminus there is a highly conserved region known as the RpoN box, which is the signature of the sigma-54 proteins PUBMED:10894718.\ 4627 IPR006662 \ Thioredoxins PUBMED:3896121, PUBMED:2668278, PUBMED:7788289, PUBMED:7788290 are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of 2 cysteine thiol groups to a disulphide, accompanied by the transfer of 2 electrons and 2 protons. The net result is the covalent interconversion of a disulphide and a dithiol. \ \ \ \ In the NADPH-dependent protein disulphide reduction, thioredoxin reductase (TR) catalyses reduction of oxidised thioredoxin (trx) by NADPH using FAD and its redox-active disulphide (steps 1 and 2). Reduced thioredoxin then directly reduces the disulphide in the substrate protein (step 3) PUBMED:3896121. Protein disulphide isomerase (PDI), a resident foldase of the endoplasmic reticulum, is a multi-functional protein that catalyses the formation and isomerisation of disulphide bonds during protein folding PUBMED:7913469, PUBMED:7983029. PDI contains 2 redox-active domains, near the N- and C-termini, that are similar to thioredoxin: both contribute to disulphide isomerase activity but are functionally non-equivalent PUBMED:7983029. Interestingly, a mutant PDI with all 4 of the active cysteines replaced by serine displays a low but detectable level of disulphide isomerase activity PUBMED:7983029. Moreover, PDI exhibits chaperone-like activity towards proteins that contain no disulphide bonds, that is, independently of its disulphide isomerase activity PUBMED:7635143. A number of endoplasmic reticulum proteins that differ from the major PDI isozyme contain 2 (ERp60, ERp5) or 3 (ERp72 PUBMED:2295602) thioredoxin domains; all of them seem to be PDIs. 3D structures have been determined for a number of thioredoxins PUBMED:8590004. The molecule has a doubly-wound alternating alpha/beta fold, consisting of a 5-stranded parallel beta-sheet core enclosed by 4 alpha-helices. The active-site disulphide is located at the N-terminus of helix 2 in a short segment that is separated from the rest of the helix by a kink caused by a conserved proline. The 14-membered disulphide ring is located on the surface of the protein. A flat hydrophobic surface lies adjacent to the disulphide, which presumably facilitates interaction with other proteins.
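    The NADPH-dependent reduction described above (steps 1 to 3) can be summarised as follows. This is a sketch consistent with the text rather than the original reaction scheme. The notation TR-S2/TR-(SH)2 and trx-S2/trx-(SH)2 for the oxidised and reduced forms is assumed here, and the FAD-mediated electron transfer within thioredoxin reductase is folded into step 1.

$$
\begin{aligned}
&(1)\quad \mathrm{NADPH} + \mathrm{H^+} + \text{TR-S}_2 \longrightarrow \mathrm{NADP^+} + \text{TR-(SH)}_2\\
&(2)\quad \text{TR-(SH)}_2 + \text{trx-S}_2 \longrightarrow \text{TR-S}_2 + \text{trx-(SH)}_2\\
&(3)\quad \text{trx-(SH)}_2 + \text{protein-S}_2 \longrightarrow \text{trx-S}_2 + \text{protein-(SH)}_2\\
&\text{net:}\quad \mathrm{NADPH} + \mathrm{H^+} + \text{protein-S}_2 \longrightarrow \mathrm{NADP^+} + \text{protein-(SH)}_2
\end{aligned}
$$

    Adding the three steps cancels the reductase and thioredoxin terms. NADPH + H+ is therefore the ultimate source of the 2 electrons and 2 protons that reduce the substrate disulphide.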

    One invariant feature of all thioredoxins is a cis-proline located in a loop preceding beta-strand 4. This residue is positioned in van der Waals contact with the active site cysteines and is important both for stability and function PUBMED:8590004. Thioredoxin belongs to a structural family that includes glutaredoxin, glutathione peroxidase, bacterial protein disulphide isomerase DsbA, and the N-terminal domain of glutathione transferase PUBMED:7788290. Thioredoxins have a beta-alpha unit preceding the motif common to all these proteins.

    \ 7813 IPR012564 \

    Members of this family are viral glycoproteins that form part of an envelope complex PUBMED:9733861.

    \ 974 IPR002014 \

    The VHS domain is a ~140-residue domain whose name is derived from its occurrence in VPS-27, Hrs and STAM. Based on regions surrounding the domain, VHS proteins can be divided into 4 groups PUBMED:11911875.

    The VHS domain is always found at the N-terminus of proteins, suggesting that such topology is important for function. The domain is considered to have a general membrane targeting/cargo recognition role in vesicular trafficking PUBMED:10985773.

    \ \

    Resolution of the crystal structure of the VHS domain of Drosophila Hrs and human Tom1 revealed that it consists of eight helices arranged in a double-layer superhelix PUBMED:10693761. The existence of conserved patches of residues on the domain surface suggests that VHS domains may be involved in protein-protein recognition and docking. Overall, sequence similarity is low (approximately 25%) amongst domain family members.

    \ \ \ \ 5612 IPR008649 \ This family represents the N-terminal region of the Betaherpesvirus UL82 and UL83 proteins. Because viruses rely on their host cells to provide a suitable environment for replication, many have evolved mechanisms to alter intracellular conditions to suit their own needs. Human cytomegalovirus induces quiescent cells to enter the cell cycle and then arrests them in late G(1), before they enter the S phase, a cell cycle compartment that is presumably favourable for viral replication. The protein product of the human cytomegalovirus UL82 gene, pp71, can accelerate the movement of cells through the G(1) phase of the cell cycle. This activity would help infected cells reach the late G(1) arrest point sooner and thus may stimulate the infectious cycle. pp71 also induces DNA synthesis in quiescent cells, but a pp71 mutant protein that is unable to induce quiescent cells to enter the cell cycle still retains the ability to accelerate the G(1) phase. Thus, the mechanism through which pp71 accelerates G(1) cell cycle progression appears to be distinct from the one that it employs to induce quiescent cells to exit G(0) and subsequently enter the S phase PUBMED:12610120.\ 5096 IPR007933 \

    This family consists of several phage CII regulatory proteins. CII plays a key role in the lysis-lysogeny decision in bacteriophage lambda and related phages PUBMED:12397182.

    \ 1638 IPR002552 \

    The type I glycoprotein S of coronavirus, trimers of which constitute the typical viral spikes, is assembled into virions through noncovalent interactions with the M protein. The spike glycoprotein is translated as a large polypeptide that is subsequently cleaved into S1 and S2 PUBMED:2984314. In one study, two chimeric S proteins each caused cell fusion when expressed individually, suggesting that both were fully biologically active PUBMED:10627571. The spike is a type I membrane glycoprotein that possesses a conserved transmembrane anchor and an unusual cysteine-rich (cys) domain that bridges the putative junction of the anchor and the cytoplasmic tail PUBMED:10725213.

    \ 6106 IPR009378 \

    This family consists of several conserved eukaryotic proteins of unknown function.

    \ 1976 IPR005048 \

    This is a domain of unknown function, DUF287, found in proteins of unknown function.

    \ 8132 IPR013192 \

    The molecular function of the non-structural 5a protein is uncertain. The NS5a protein is phosphorylated when expressed in mammalian cells. It is thought to interact with the dsRNA-dependent (interferon-inducible) kinase PKR PUBMED:9710605, PUBMED:9143277. This region corresponds to the N-terminal zinc-binding domain (1a) PUBMED:15902263.

    \ 7853 IPR013112 \

    This FAD-binding domain is associated with ferric reductase NAD-binding proteins and with the heavy chain of cytochrome b-245.

    \ 7158 IPR009935 \

    This family consists of several bacterial proteins of around 90 residues in length. The function of this family is unknown.

    \ 4419 IPR005328 \

    Serotype M1 group A Streptococcus strains cause epidemic waves of human infections. This family includes the Sic protein (streptococcal inhibitor of complement), an extracellular protein that inhibits human complement PUBMED:10426317. The exact mechanism of inhibition has not been fully elucidated, but Sic is incorporated into the complement membrane-attack complex (C5b-C9) responsible for target killing. Preliminary analysis of variation in the sic gene in M1 group A streptococcal strains identified a level of polymorphism far exceeding that of other genes in these organisms. Selection of new Sic structural variants on mucosal surfaces generates a very large pool of subclones in the course of epidemic waves, a process that may help to sustain and enlarge them.

    \ 2032 IPR007154 \ Members of this family are around 120 amino acids in length and are found in some archaebacteria. The function of this family is unknown; however, it contains a conserved motif, IHPPAH, that may be involved in its function.\ 2332 IPR007840 \ This is a family of eubacterial hypothetical proteins.\ 708 IPR001204 \ The PHO-4 family of transporters includes the phosphate-repressible phosphate permease (PHO-4) from Neurospora crassa, which is probably a sodium-phosphate symporter PUBMED:7732001. This family also includes the human leukemia virus receptor.\ 1334 IPR006824 \ This is a family of Baculovirus DNA helicases, which are essential for the initiation of viral DNA replication and may contribute to other functions, such as controlling the switch to the late phase and inhibiting host protein synthesis.\ 6432 IPR010571 \

    This family consists of several bacterial outer membrane lipoprotein omp19 sequences PUBMED:10456959.

    \ 2363 IPR001401 \

    Membrane transport between compartments in eukaryotic cells requires proteins that allow the budding and scission of nascent cargo vesicles from one compartment and their targeting and fusion with another. Dynamins are large GTPases that belong to a protein superfamily PUBMED:15040446 that, in eukaryotic cells, includes classical dynamins, dynamin-like proteins, OPA1, Mx proteins, mitofusins and guanylate-binding proteins/atlastins PUBMED:2142876, PUBMED:2112425, PUBMED:1532158, PUBMED:2607176, and are involved in the scission of a wide range of vesicles and organelles. They play a role in many processes including budding of transport vesicles, division of organelles, cytokinesis and pathogen resistance.

    The minimal distinguishing architectural features that are common to all dynamins and distinct from other GTPases are the structure of the large GTPase domain (300 amino acids) and the presence of two additional domains, the middle domain and the GTPase effector domain (GED), which are involved in oligomerisation and regulation of the GTPase activity. The GTPase domain contains the GTP-binding motifs needed for guanine-nucleotide binding and hydrolysis. The conservation of these motifs is absolute except for the final motif in guanylate-binding proteins. The GTPase catalytic activity can be stimulated by oligomerisation of the protein, which is mediated by interactions between the GTPase domain, the middle domain and the GED.

    \ 422 IPR007043 \ This family of enzymes deaminates glutamine to glutamate .\ 589 IPR005110 \

    Proteins in this family contain two structural domains, one of which contains the conserved DGXA motif. This region is found in proteins involved in biosynthesis of the molybdopterin cofactor; however, its exact molecular function is uncertain.

    \ 1775 IPR002728 \ Members of this family include , a candidate tumour suppressor gene PUBMED:8603384, and DPH2 from yeast PUBMED:8406038, which confers resistance to diphtheria toxin and has been found to be involved in diphthamide synthesis. Diphtheria toxin inhibits eukaryotic protein synthesis by ADP-ribosylating diphthamide, a post-translationally modified histidine residue present in EF2. The exact function of the members of this family is unknown.\ 3462 IPR000402 \ The sodium pump (Na+,K+ ATPase), located in the plasma membrane of all animal cells PUBMED:1645948, is a heterotrimer of a catalytic subunit (alpha chain), a glycoprotein subunit of about 34 kDa (beta chain) and a small hydrophobic protein of about 6 kDa. The beta subunit seems to regulate, through the assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane PUBMED:2156741. Structurally, the beta subunit is composed of a charged cytoplasmic domain of about 35 residues, followed by a transmembrane region, and a large extracellular domain that contains three disulphide bonds and glycosylation sites. This structure is schematically represented in the figure below.\
    \
                                    +----+ +--+       +-----------+\
                                    |    | |  |       |           |                                \
            xxxxxxxxxxxxxxxxxxxxxxxxCxxxxCxCxxCxxxxxxxCxxxxxxxxxxxCxxxx\
            |-Cyt-||TM||------------Extracellular---------------------|\
    \
    'C': conserved cysteine involved in a disulphide bond.\
    
    \ 4174 IPR000114 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal protein L16 is one of the proteins of the large ribosomal subunit. In Escherichia coli, L16 is known to bind the 23S rRNA directly and to be located at the A site of the peptidyltransferase center. L16 is a protein of 133 to 185 amino acid residues.

    \ \ 6221 IPR010479 \

    BID is a member of the Bcl-2 superfamily of proteins, which are key regulators of programmed cell death; hence this family is related to the Apoptosis regulator Bcl-2 protein BH domain. BID is a pro-apoptotic member of the Bcl-2 superfamily and as such possesses the ability to target intracellular membranes and contains the BH3 death domain. The activity of BID is regulated by a caspase-8-mediated cleavage event that exposes the BH3 domain and significantly changes the surface charge and hydrophobicity, causing a change in cellular localisation PUBMED:10089878.

    \ 5265 IPR008407 \

    This family consists of a number of bacterial and archaeal branched-chain amino acid transport proteins. Mutational analysis has shown that AzlD, a member of this group, is involved in branched-chain amino acid transport and in conferring resistance to 4-azaleucine PUBMED:9287000, although its exact role in these processes is not yet clear PUBMED:9287000. On the basis of its hydropathy profile, it has been suggested to be a membrane protein PUBMED:9287000.

    \ \ \ 1241 IPR002813 \

    Members of the ArgJ family catalyse the first and fifth steps in arginine biosynthesis PUBMED:8473852.

    \ 2755 IPR000400 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 46 comprises enzymes with only one known activity; chitosanase ().

    \ \

    Chitosanase enzymes catalyse the endohydrolysis of beta-1,4-linkages between N-acetyl-D-glucosamine and D-glucosamine residues in a partly acetylated chitosan.

    \ 457 IPR004011 \ The GYR motif is found in several Drosophila melanogaster proteins. Its function is unknown; however, the presence of completely conserved tyrosine residues suggests that it could be a substrate for tyrosine kinases.\ 2993 IPR003678 \ Helicobacter pylori is a causative agent of gastritis and peptic ulceration in humans. As a first step towards developing a vaccine against H. pylori infection, many attempts have been made to identify protective antigens. Potential targets for vaccine development would be H. pylori-specific proteins that are surface-exposed and highly antigenic.\

    This family consists of putative outer membrane proteins from Helicobacter pylori.

    \ 4875 IPR005366 \

    This is a small family of proteins of unknown function.

    \ 7823 IPR013027 \ This entry describes both class I and class II oxidoreductases. FAD flavoproteins belonging to the family of pyridine nucleotide-disulphide oxidoreductases (glutathione reductase, trypanothione reductase, lipoamide dehydrogenase, mercuric reductase, thioredoxin reductase, alkyl hydroperoxide reductase) share sequence similarity with a number of other flavoprotein oxidoreductases, in particular with ferredoxin-NAD+ reductases involved in oxidative metabolism of a variety of hydrocarbons (rubredoxin reductase, putidaredoxin reductase, terpredoxin reductase, ferredoxin-NAD+ reductase components of benzene 1,2-dioxygenase, toluene 1,2-dioxygenase, chlorobenzene dioxygenase, biphenyl dioxygenase), NADH oxidase and NADH peroxidase PUBMED:2319593, PUBMED:1404382, PUBMED:2067578. Comparison of the crystal structures of human glutathione reductase and Escherichia coli thioredoxin reductase reveals different locations of their active sites, suggesting that the enzymes diverged from an ancestral FAD/NAD(P)H reductase and acquired their disulphide reductase activities independently PUBMED:2067578. \

    \ Despite functional similarities, oxidoreductases of this family show no sequence similarity with either adrenodoxin reductases PUBMED:2924777 or flavoprotein pyridine nucleotide cytochrome reductases (FPNCR) PUBMED:1748631. Assuming that disulphide reductase activity emerged later, during divergent evolution, the family can be referred to as FAD-dependent pyridine nucleotide reductases, FADPNR.

    \

    To date, 3D structures of glutathione reductase PUBMED:3656429, thioredoxin reductase PUBMED:2067578, mercuric reductase PUBMED:2067577, lipoamide dehydrogenase PUBMED:1880807, trypanothione reductase PUBMED:1924336 and NADH peroxidase PUBMED:1942054 have been solved. The enzymes share similar tertiary structures based on a doubly-wound alpha/beta fold, but the relative orientations of their FAD- and NAD(P)H-binding domains may vary significantly. In contrast to the FPNCR family, the folds of the FAD- and NAD(P)H-binding domains are similar, suggesting that the domains evolved by gene duplication PUBMED:7411611.\

    \ 7172 IPR009944 \

    This family contains the eukaryotic surface glycoprotein amastin (approximately 180 residues long). In Trypanosoma cruzi, amastin is particularly abundant during the amastigote stage.

    \ 2340 IPR002765 \

    This family of bacterial proteins has not been characterized.

    \ 3968 IPR004972 \

    This family is the Poxvirus P4B major core protein. It is a precursor for one of the two most abundant structural components of the virion (major core proteins 4A and 4B).

    \ 7090 IPR009888 \

    This family consists of several hypothetical bacterial proteins of around 160 residues in length. The function of this family is unknown.

    \ 4298 IPR003667 \

    The rnf genes of Rhodobacter capsulatus, essential for nitrogen fixation, are thought to encode a system for electron transport to nitrogenase. The rnfABCDGEH operon comprises seven genes that show similarities in gene arrangement and deduced protein sequences to homologous regions in the genomes of Haemophilus influenzae and Escherichia coli. Four of the rnf gene products were found to be similar in sequence to components of an Na+-dependent NADH:ubiquinone oxidoreductase (NQR) from Vibrio alginolyticus PUBMED:9492268. The NQR-type enzyme of Klebsiella pneumoniae was shown to catalyse sodium-dependent NADH oxidation in the respiratory chain PUBMED:15063750.

    \ 7651 IPR012930 \

    The members of this family are sequences that are similar to TraC () from Rhizobium etli CFN42. The gene encoding this protein is one of a group of genes found on plasmid p42a of Rhizobium etli CFN42 that are thought to be involved in the process of plasmid self-transmission. Mobilisation of plasmid p42a is of importance as it is required for transfer of plasmid p42d, the symbiotic plasmid which carries most of the genes required for nodulation and nitrogen fixation by this symbiotic bacterium. The predicted protein products of p42a are similar to known transfer proteins of Agrobacterium tumefaciens plasmid pTiC58 PUBMED:12591886.

    \ 6880 IPR010756 \

    This family represents a conserved region approximately 100 residues long within mammalian hepatocellular carcinoma-associated antigen 59 and similar proteins. Family members are found in a variety of eukaryotes, mainly as hypothetical proteins.

    \ 5682 IPR008674 \ This family consists of archaeal chromosomal protein MC1 sequences, which protect DNA against thermal denaturation PUBMED:2503033.\ 1511 IPR000875 \ Cecropins PUBMED:3318666, PUBMED:2015623, PUBMED:1915368 are potent antibacterial proteins that constitute a main part of the cell-free immunity of insects. Cecropins are small proteins of about 35 amino acid residues that are active against both Gram-positive and Gram-negative bacteria. They seem to exert a lytic action on bacterial membranes. Cecropins isolated from insects other than Hyalophora cecropia have been given various names: bactericidin, lepidopteran, sarcotoxin, etc. All of these peptides are structurally related. Cecropin P1, an intestinal antibacterial peptide from Sus scrofa (pig), also belongs to this family.\ 5377 IPR008398 \

    This family of sequences contains the 40 kDa polypeptides from garlic viruses (Allexiviruses), which do not resemble any other plant virus gene products reported so far PUBMED:8376963.

    \ 4971 IPR006043 \

    This family includes permeases for diverse substrates such as xanthine, uracil and vitamin C. However many members of this family are functionally uncharacterised and may transport other substrates. Members of this family have ten predicted transmembrane helices.

    \ 2222 IPR007572 \ This is a predicted transmembrane protein found in plants, chloroplasts and cyanobacteria. This family is also known as YCF20.\ 5147 IPR007984 \

    The poxvirus DNA-directed RNA polymerase () catalyses the transcription of DNA into RNA. It consists of at least eight subunits; this family represents the 19 kDa subunit.

    \ 4683 IPR013150 \

    This is a cyclin related domain associated with TFIIB.

    \ 2079 IPR007332 \

    The function of the members of this bacterial protein family is unknown. Some members may be involved in conferring cation resistance.

    \ 3259 IPR003345 \ This short repeat is found in multiple copies in bacterial M proteins. The M proteins bind to IgA and are closely associated with virulence.\ The M protein has been postulated to be a major group A streptococcal (GAS) virulence factor because of its contribution to the bacterial resistance to opsonophagocytosis PUBMED:8830235.\ 6116 IPR009383 \

    This family consists of several bacterial YihD proteins of unknown function PUBMED:9868784.

    \ 3473 IPR003486 \ The nucleoprotein of the ssRNA negative-strand Nairovirus is an internal part of the virus particle.\ 4113 IPR008257 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue, which may play an electrophilic role, is required for catalysis. \ Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
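
The two motif definitions in this paragraph can be written as regular expressions. Below is a minimal Python sketch for illustration, not a validated motif scanner. Several choices in it are assumptions: the residue sets used for 'uncharged' and 'hydrophobic', the restriction of 'a' to valine or threonine only, and the hypothetical function name `find_zinc_motifs`. Proline is excluded from every variable position, following the statement that proline is never found in this site.

```python
import re

# Illustrative residue classes (assumptions, not a formal definition).
NOT_PRO = "ACDEFGHIKLMNQRSTVWY"  # 'X': any standard residue except proline
UNCHARGED = "ACFGILMNQSTVWY"     # 'b': excludes D, E, K, R, H and P
HYDROPHOBIC = "ACFILMVW"         # 'c': assumed hydrophobic set, no P

# Loose motif HEXXH; the zero-width lookahead lets overlapping hits be reported.
HEXXH = re.compile(f"(?=HE[{NOT_PRO}]{{2}}H)")

# Stricter form abXHEbbHbc, with 'a' restricted here to V or T.
STRICT = re.compile(
    f"(?=[VT][{UNCHARGED}][{NOT_PRO}]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def find_zinc_motifs(seq: str) -> tuple[list[int], list[int]]:
    """Return 0-based start positions of loose (HEXXH) and strict matches."""
    seq = seq.upper()
    loose = [m.start() for m in HEXXH.finditer(seq)]
    strict = [m.start() for m in STRICT.finditer(seq)]
    return loose, strict


# Made-up test sequence (not a real protein).
print(find_zinc_motifs("MKGVVAHELTHAVTDYG"))  # ([6], [3])
```

The strict pattern reports the position of residue 'a'. That position lies three residues before the HE pair reported by the loose pattern.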

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
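
The one-letter naming rule described here can be expressed as a small lookup. This Python sketch is hypothetical: `describe_peptidase_id` is not part of any MEROPS software. It also simplifies identifier parsing. An identifier whose second character is a digit (for example M19 or M12B) is treated as a family, and anything else (for example MJ or PA) is treated as a clan.

```python
# Catalytic-type letters as described in the text; 'P' is reserved for clans
# that contain families of more than one catalytic type.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",
}
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}  # nucleophile is an amino acid; acyl intermediate
WATER_NUCLEOPHILE = {"A", "G", "M"}    # nucleophile is an activated water molecule


def describe_peptidase_id(identifier: str) -> tuple[str, str, str]:
    """Return (level, catalytic type, nucleophile) for an ID such as 'M19' or 'MJ'."""
    code = identifier.strip().upper()
    letter, rest = code[0], code[1:]
    # Simplifying assumption: families carry a number after the type letter.
    level = "family" if rest[:1].isdigit() else "clan"
    if letter == "P" and level == "family":
        raise ValueError("type P applies to clans, not families")
    if letter in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif letter in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "not specified"
    return level, CATALYTIC_TYPE[letter], nucleophile


print(describe_peptidase_id("M19"))  # ('family', 'metallo', 'activated water')
```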

    \ \

    This group of peptidases belongs to the MEROPS peptidase family M19 (membrane dipeptidase family, clan MJ). The protein fold of the peptidase domain for members of this family resembles that of Klebsiella urease, the type example for clan MJ.

    \ \ \

    Renal dipeptidase (rDP) (), also known as microsomal dipeptidase, is a zinc-dependent metalloenzyme that hydrolyzes a wide range of dipeptides. It is involved in renal metabolism of glutathione and its conjugates. It is a homodimeric, disulphide-linked glycoprotein attached to the renal brush border microvilli membrane by a GPI anchor. A glutamate residue has been shown to be important for the catalytic activity of rDP PUBMED:8097406. rDP seems to be evolutionarily related to hypothetical proteins in the PQQ biosynthesis operons of Acinetobacter calcoaceticus and Klebsiella pneumoniae.

    \ 6683 IPR010673 \

    This family consists of several short hypothetical bacterial proteins of around 70 residues in length. Members of this family seem to all belong to the order Bacillales or Lactobacillales. The function of this family is unknown.

    \ 7097 IPR010834 \

    This family consists of several VirB7 proteins from Agrobacterium and Rhizobium species. The virulence genes of the Agrobacterium tumefaciens Ti plasmid are grouped into six transcription units and direct the transfer of T-DNA into plant cells. VirB is the largest vir operon from the Ti plasmid pTiA6NC. It is thought that VirB proteins are involved in the formation of a transmembrane structure which mediates the passage of the transferred T-DNA molecule through the bacterial and plant cell membranes PUBMED:3281947.

    \ 3258 IPR007492 \

    The LytTr domain is a DNA-binding, potential winged helix-turn-helix domain (~100 residues) present in a variety of bacterial transcriptional regulators of the algR/agrA/lytR family. It is named after the lytR response regulators involved in the regulation of cell autolysis. The LytTr domain binds to a specific DNA sequence pattern in the upstream regions of target genes PUBMED:12034833. The N-terminal region of the protein contains a response regulator receiver domain ().

    \ 7742 IPR012876 \

    The sequences found in this family are all derived from hypothetical plant proteins of unknown function. The region features a number of highly conserved cysteine residues.

    \ 2895 IPR002597 \ This family consists of probable major envelope glycoproteins from members of the Herpesviridae, including herpes simplex virus, human cytomegalovirus and varicella-zoster virus. Members of the Herpesviridae have a dsDNA genome and do not have an RNA stage during their replication.\ 4117 IPR000989 \ Replication proteins (Rep) are involved in plasmid replication. The Rep protein binds to the plasmid DNA and nicks it at the double-strand origin (dso) of replication. The 3'-hydroxyl end created is extended by the host DNA replicase, and the 5' end is displaced during synthesis. At the end of one replication round, Rep introduces a second single-stranded break at the dso and ligates the ssDNA extremities, generating one double-stranded plasmid and one circular ssDNA form. Complementary strand synthesis of the circular ssDNA is usually initiated at the single-stranded origin by the host RNA polymerase PUBMED:9570403.\ 3113 IPR001910 \

    Inosine-uridine preferring nucleoside hydrolase () (IU-nucleoside hydrolase or IUNH) is an enzyme, first identified in protozoa PUBMED:8634237, that catalyzes the hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides into ribose and the associated base, but has a preference for inosine and uridine as substrates. This enzyme is important for these parasitic protozoa, which are deficient in de novo purine synthesis, to salvage host purine nucleosides. IUNH from Crithidia fasciculata has been sequenced and characterized; it is a homotetrameric enzyme with subunits of 34 kDa. A histidine has been shown to be important for the catalytic mechanism; it acts as a proton donor to activate the hypoxanthine leaving group.

    \

    A highly conserved region located in the N-terminal extremity contains four conserved aspartates that have been shown PUBMED:8634238 to be located in the active site cavity.

    \

    IUNH is evolutionarily related to a number of uncharacterized proteins from various biological sources.

    \ 8054 IPR013198 \

    This family consists of the C-terminal helix-turn-helix domain found in several bacterial GTP-sensing transcriptional pleiotropic repressor CodY proteins. CodY has been found to repress the dipeptide transport operon (dpp) of Bacillus subtilis in nutrient-rich conditions PUBMED:7783641. The CodY protein also has a repressor effect on many genes in Lactococcus lactis during growth in milk PUBMED:11401725.

    \ 7556 IPR011703 \

    This entry includes some of the AAA proteins not detected by the model.

    \ 6367 IPR010539 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 894 IPR003121 \

    The SWI/SNF family of complexes, which are conserved from yeast to humans, are ATP-dependent chromatin-remodelling proteins that facilitate transcription activation PUBMED:11147808. The mammalian complexes are made up of 9-12 proteins called BAFs (BRG1-associated factors). The BAF60 family have at least three members: BAF60a, which is ubiquitous, BAF60b and BAF60c, which are expressed in muscle and pancreatic tissues, respectively. BAF60b is present in alternative forms of the SWI/SNF complex, including complex B (SWIB), which lacks BAF60a. The SWIB domain is a conserved region found within the BAF60b proteins PUBMED:12016060, and can be found fused to the C-terminus of DNA topoisomerase in Chlamydia.

    \

    The MDM2 oncoprotein contains a conserved MDM2 domain that is able to bind to and suppress the p53 tumour suppressor transcription factor by blocking its transactivation domain PUBMED:8875929. The SWIB and MDM2 domains are homologous and share a common fold.

    \ \ 554 IPR001611 \

    Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape PUBMED:14747988. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions PUBMED:11751054. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response.

    \ \

    Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur together in the same protein and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel beta sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterized by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the beta-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins. The 3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with alpha-helices, supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions PUBMED:11751054. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats PUBMED:11967365.
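
The 11-residue consensus LxxLxLxxN/CxL and the shorter LxxLxL pattern can be turned into regular expressions that flag candidate repeat segments. The Python sketch below is a simplification under stated assumptions. It treats every L as leucine only, although real repeats often carry other hydrophobic residues at these positions, so it will miss many genuine LRRs. The function name `lrr_segments` is hypothetical.

```python
import re

# 'x' (any residue) becomes '.', and N/C at position 9 becomes the class [NC].
# A lookahead with a capture group reports overlapping candidate segments.
LRR_11 = re.compile(r"(?=(L..L.L..[NC].L))")
LRR_6 = re.compile(r"(?=(L..L.L))")


def lrr_segments(seq: str, pattern: re.Pattern = LRR_11) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each candidate LRR segment."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Made-up sequence that satisfies the 11-residue consensus once.
print(lrr_segments("LTKLDLSNNKL"))  # [(0, 'LTKLDLSNNKL')]
```

Passing `LRR_6` instead checks only the shorter LxxLxL core. That core is less specific, so it will also match segments lacking the conserved N/C position.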

    \ \ \ 4922 IPR006976 \

    This family contains several examples of the VanZ protein, but also contains examples of phosphotransbutyrylases. VanZ confers low-level resistance to the glycopeptide antibiotic teicoplanin (Te). Analysis of cytoplasmic peptidoglycan precursors, accumulated in the presence of ramoplanin, showed that VanZ-mediated Te resistance does not involve incorporation of a substituent of D-alanine into the peptidoglycan precursors PUBMED:7867956.

    \ 6781 IPR010712 \

    This family consists of several bacterial arsenical resistance operon trans-acting repressor ArsD proteins. ArsD is a trans-acting repressor of the arsRDABC operon that confers resistance to arsenicals and antimonials in Escherichia coli. It possesses two pairs of vicinal cysteine residues, Cys(12)-Cys(13) and Cys(112)-Cys(113), that potentially form separate binding sites for the metalloids that trigger dissociation of ArsD from the operon. However, as a homodimer it has four vicinal cysteine pairs PUBMED:11980902.

    \ 5143 IPR007980 \

    This family consists of the Saccharomyces cerevisiae mitochondrial ribosomal protein VAR1 sequences. Mitochondria possess their own ribosomes, responsible for the synthesis of a small number of proteins encoded by the mitochondrial genome. In S. cerevisiae the two ribosomal RNAs and a single ribosomal protein, VAR1, are products of mitochondrial genes, and the remaining approximately 80 ribosomal proteins are encoded in the nucleus PUBMED:8988258. VAR1, along with 15S rRNA, is necessary for the formation of mature 37S subunits PUBMED:7770043.

    \ 5533 IPR008547 \ This family consists of several uncharacterised eukaryotic proteins.\ 3421 IPR007761 \ The mannitol operon of Escherichia coli, encoding the mannitol-specific enzyme II of the phosphotransferase system (MtlA) and mannitol phosphate dehydrogenase (MtlD) contains an additional downstream open reading frame which encodes the mannitol repressor (MtlR).\ 2913 IPR006930 \ Members of this family contain a conserved region found in most herpesvirus pp38 phosphoproteins.\ 3707 IPR002870 \

    This signature covers the region of the propeptide for members of the MEROPS peptidase family M12B (clan MA(M), adamalysin family). The propeptide contains a sequence motif similar to the "cysteine switch" of the matrixins, which mediate cell-cell or cell-matrix interactions.

    \ 1481 IPR005086 \ This domain is found in a number of alkaline cellulases.\ 4715 IPR005595 \

    The alpha subunit of the translocon-associated protein (TRAP) complex (TRAP alpha) is a single-spanning membrane protein of the endoplasmic reticulum (ER) that is found in proximity to nascent polypeptide chains translocating across the membrane PUBMED:8050590.

    \ 6950 IPR009805 \

    This entry represents a 29-residue repeated sequence that seems to be specific to the Ehrlichia chaffeensis variable length PCR target (VLPT) protein. Ehrlichia chaffeensis is a tick-transmitted rickettsial agent responsible for human monocytic ehrlichiosis (HME). The function of this family is unknown PUBMED:12496165.

    \ 4586 IPR000818 \ Transcriptional enhancer activators are nuclear proteins that contain a TEA/ATTS domain, a DNA-binding region of 66-68 amino acids. The TEA/ATTS domain is found in the N-termini of certain gene regulatory proteins, such as the SV40 enhancer factor TEF-1, the yeast trans-acting factor TEC-1 (which is required for TY1 enhancer activity), and the Aspergillus abaA regulatory gene product. SV40 and retroviral enhancers, and those to which TEF-1, TEC-1 and abaA proteins bind, contain GT-IIC sites. The TEA/ATTS domain may therefore recognise and bind such sites. Secondary structure predictions suggest the presence of 3 helices but have not confirmed the presence of the helix-turn-helix motif characteristic of many DNA-binding proteins, so DNA binding may be effected by a different mechanism PUBMED:2070413.\ 1088 IPR002864 \ This family is found in plants. It consists of various acyl-acyl carrier protein (ACP) thioesterases (TE), which terminate fatty acyl group extension by hydrolysing the thioester bond between the acyl group and ACP, releasing a free fatty acid PUBMED:7479856.\ 4956 IPR005377 \

    The movement of lipid and protein components between intracellular organelles requires the regulated interactions of many molecules. Vacuolar protein sorting-associated protein 5 (Vps5) is a yeast protein that forms a subunit of a large multimeric complex, termed the retromer complex, involved in retrograde transport of proteins from endosomes to the trans-Golgi network. Sorting nexins 1 and 2 (SNX1 and SNX2) are its mammalian orthologs PUBMED:11102511.

    \ \

    To carry out its biological functions, Vps5 forms the retromer complex with at least four other proteins: Vps17, Vps26, Vps29 and Vps35 PUBMED:11102511. This family of Vps26 proteins also includes Down syndrome critical region 3/A.

    \ 5357 IPR008417 \ Bap31 is a polytopic integral protein of the endoplasmic reticulum membrane and a substrate of caspase-8. Bap31 is cleaved within its cytosolic domain, generating pro-apoptotic p20 Bap31 PUBMED:11917123.\ 664 IPR003100 \

    This domain is named after the proteins Piwi, Argonaute and Zwille. It is also found in the CAF protein from Arabidopsis thaliana. The function of the domain is unknown, but it has been found in the middle region of a number of members of the Argonaute protein family, which also contain the Piwi domain in their C-terminal region PUBMED:12906857. Several members of this family have been implicated in the development and maintenance of stem cells through the RNA-mediated gene-quelling mechanisms associated with the protein DICER.

    \ \ 3194 IPR000066 \ In photosynthetic bacteria the antenna complexes function as light-harvesting systems that absorb light radiation and transfer the excitation energy to the reaction centers. The antenna complexes are generally composed of two polypeptides (alpha and beta chains), two or three bacteriochlorophyll (BChl) molecules and some carotenoids PUBMED:1577009, PUBMED:1460542. Both the alpha and the beta chains of antenna complexes are small proteins of 42 to 68 residues that share a three-domain organization: an N-terminal hydrophilic cytoplasmic domain, followed by a transmembrane region and a C-terminal hydrophilic periplasmic domain. The transmembrane region of both chains contains a conserved histidine that is most probably involved in binding the magnesium atom of a bacteriochlorophyll group. The beta chains contain an additional conserved histidine, located at the C-terminal extremity of the cytoplasmic domain, which is also thought to be involved in bacteriochlorophyll binding.\ 4505 IPR007718 \ This presumed domain is found at the C terminus of the Saccharomyces cerevisiae SRP40 protein and its homologues. SRP40/Nopp140 is a chaperone involved in nucleocytoplasmic transport. SRP40 is also a suppressor of mutations in the AC40 subunit of RNA polymerases I and III.\ 2730 IPR001137 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 11 comprises enzymes with only one known activity, xylanase. These enzymes were formerly known as cellulase family G.

    \ 7509 IPR013098 \

    The proteins in this entry contain the Immunoglobulin I-set domain.

    \ 2739 IPR001540 \

    O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families PUBMED:1747104, PUBMED:8352747, PUBMED:8687420, PUBMED:1732212, PUBMED:8535779, PUBMED:. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    \

    Glycoside hydrolase family 20 comprises enzymes with several known activities, including beta-hexosaminidase and lacto-N-biosidase. The carbonyl oxygen of the substrate's C-2 acetamido group acts as the catalytic nucleophile/base in this family of enzymes.

    \ \

    In the brain and other tissues, beta-hexosaminidase A degrades GM2 gangliosides; specifically, the enzyme hydrolyses terminal non-reducing N-acetyl-D-hexosamine residues in N-acetyl-beta-D-hexosaminides. There are 3 forms of beta-hexosaminidase: hexosaminidase A is a trimer, with one alpha, one beta-A and one beta-B chain; hexosaminidase B is a tetramer of two beta-A and two beta-B chains; and hexosaminidase S is a homodimer of alpha chains. The two beta chains are derived from the cleavage of a precursor. Mutations in the beta-chain lead to Sandhoff disease, a lysosomal storage disorder characterised by accumulation of GM2 ganglioside PUBMED:8357844.

    \ 7878 IPR012567 \

    This family consists of the leader peptides of the ilvGEDA operon. Expression of the ilvGEDA operon of E. coli K-12 is multivalently controlled by the three branched-chain amino acids. Regulation is thought to occur by attenuation of transcription in response to changing levels of the cognate tRNAs. Transcription of this operon is usually terminated at the end of the leader (regulatory) region PUBMED:3900037.

    \ 7155 IPR010850 \

    This family consists of several locust specific neuroparsin proteins. Neuroparsins are produced by the A1 type of protocerebral median neurosecretory cells of the PI-CC system and display pleiotropic activities: inhibition of the effect of juvenile hormone, stimulation of fluid reabsorption of isolated recta, induction of an increase in hemolymph lipid and trehalose levels, and neurotrophic effects PUBMED:9114464.

    \ 4939 IPR000635 \ Although the overall picture of human cytomegalovirus (HCMV) DNA synthesis appears typical of the herpesviruses, some novel features are emerging. Six genes common to the herpesvirus group encode proteins that probably constitute the replication fork machinery, including a two-subunit DNA polymerase, a helicase-primase complex and a single-stranded DNA-binding protein PUBMED:9130047. \

    The herpes simplex virus type-1 single-strand DNA-binding protein ICP8 is a 128 kDa zinc metalloprotein. Photoaffinity labeling has shown that the region encompassing residues 368-902 contains the single-strand DNA-binding site of ICP8 PUBMED:10529391. The herpes simplex virus type-1 UL5, UL8 and UL52 genes encode an essential heterotrimeric DNA helicase-primase that is responsible for concomitant DNA unwinding and primer synthesis at the viral DNA replication fork. ICP8 may stimulate DNA unwinding and enable bypass of cisplatin-damaged DNA by recruiting the helicase-primase to the DNA PUBMED:9593724.

    \ 5605 IPR008392 \ This family consists of accessory gland-specific 26Ab peptides or male accessory gland secretory protein 355B from different Drosophila species. Drosophila males, like males of most other insects, transfer a group of specific proteins (Acp26Ab and Acp26Aa in Drosophila) to the females during mating. These proteins are produced primarily in the accessory gland and are likely to influence the female's reproduction PUBMED:1361475.\ 2235 IPR007603 \ This is a family of uncharacterised proteins.\ 2693 IPR001530 \

    Geminiviruses are characterised by a genome of circular single-stranded DNA encapsidated in twinned (geminate) quasi-isometric particles, from which the group derives its name PUBMED:. Most geminiviruses can be divided into 2 subgroups on the basis of host range and/or insect vector: those that infect dicotyledonous plants and are transmitted by the same whitefly species, and those that infect monocotyledonous plants and are transmitted by different leafhopper vectors. The whitefly-transmitted cassava latent (CLV), tomato golden mosaic (TGMV) and bean golden mosaic (BGMV) viruses possess bipartite genomes. By contrast, only a single DNA component has been identified for the leafhopper-transmitted maize streak (MSV) and wheat dwarf (WDV) viruses PUBMED:6526009, PUBMED:2829117.

    \

    Beet curly top (BCTV), bean summer death and tobacco yellow dwarf viruses belong to a possible third subgroup. Like MSV and WDV, BCTV is transmitted by a specific leafhopper species, yet like the whitefly-transmitted geminiviruses it has a host range confined to dicotyledonous plants.

    \

    Sequence comparison of the whitefly-transmitted squash leaf curl PUBMED:1984668 and tomato yellow leaf curl viruses PUBMED:1840676, PUBMED:1926771 with the genomic components of TGMV and BGMV reveals a close evolutionary relationship PUBMED:1984668. Amino acid sequence alignments of potato yellow mosaic viral (PYMV) proteins with those encoded by other geminiviruses show that PYMV is closely related to geminiviruses isolated from the New World, especially in the putative coat protein gene regions PUBMED:1856690.

    \ 897 IPR006011 \

    Syntaxins A and B are nervous system-specific proteins implicated in the docking of synaptic vesicles with the presynaptic plasma membrane. Syntaxins are a family of receptors for intracellular transport vesicles. Each target membrane may be identified by a specific member of the syntaxin family PUBMED:7690687. Members of the syntaxin family PUBMED:8493722, PUBMED:8490959 range in size from 30 to 40 kDa. They have a highly hydrophobic C-terminal extremity that anchors the protein on the cytoplasmic surface of cellular membranes, and a central, well-conserved region that seems to adopt a coiled-coil conformation.\

    \ 2045 IPR007171 \ This is an archaeal family of unknown function.\ 4076 IPR003021 \ REC1 of Ustilago maydis plays a key role in regulating the genetic system of the fungus. REC1 mutants are very sensitive to UV light. Mutation leads to a complex phenotype with alterations in DNA repair, recombination, mutagenesis, meiosis and cell division PUBMED:8276878. The predicted product of the REC1 gene is a polypeptide of 522 amino acid residues with a molecular mass of 57 kDa. The protein shows 3'-5' exonuclease activity, but only in cells overexpressing REC1 PUBMED:8276878. Although it is distinguishable from the major bacterial nucleases, the protein has certain enzymatic features in common with epsilon, the proofreading exonuclease subunit of Escherichia coli DNA polymerase III holoenzyme PUBMED:8276878. The rad1 gene of Schizosaccharomyces pombe comprises three exons and encodes a 37 kDa protein that shows partial similarity to the product of the U. maydis REC1 gene PUBMED:7926829. The two genes share putative functional similarities in their respective organisms.\ 7789 IPR012469 \

    A family of uncharacterised fungal proteins.

    \ 315 IPR006968 \

    This is a family of proteins of unknown function, restricted to eukaryotes.

    \ 946 IPR001978 \ The troponin (Tn) complex regulates Ca2+-induced muscle contraction. Tn contains three subunits: Ca2+-binding (TnC), inhibitory (TnI) and tropomyosin-binding (TnT). This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin PUBMED:3102969, PUBMED:7852318, PUBMED:7601340.\ 3530 IPR004249 \ The RPT2 protein is a signal transducer of the phototropic response in Arabidopsis thaliana. The RPT2 gene is light-inducible and encodes a novel protein with putative phosphorylation sites, a nuclear localization signal, a BTB/POZ domain and a coiled-coil domain. RPT2 belongs to a large gene family that includes the recently isolated NPH3 gene PUBMED:10662859. The NPH3 protein is an NPH1 photoreceptor-interacting protein that is essential for phototropism. Phototropism of A. thaliana seedlings in response to a blue light source is initiated by nonphototropic hypocotyl 1 (NPH1), a light-activated serine-threonine protein kinase PUBMED:10542152. NPH3 is a member of a large protein family, apparently specific to higher plants, and may function as an adapter or scaffold protein to bring together the enzymatic components of an NPH1-activated phosphorelay PUBMED:10542152. Many of the proteins in this group also contain the BTB/POZ domain at the N terminus.\ 4400 IPR001627 \

    The Sema domain occurs in semaphorins, which are a large family of secreted and transmembrane proteins, some of which function as repellent signals during axon guidance. Sema domains also occur in a hepatocyte growth factor receptor, in SEX protein PUBMED:9875845 and in viral proteins.

    \ \

    CD100 (also called SEMA4D) is associated with PTPase and serine kinase activity. CD100 increases PMA-, CD3- and CD2-induced T cell proliferation, increases CD45-induced T cell adhesion, induces B cell homotypic adhesion and down-regulates B cell expression of CD23.

    \

    \ The Sema domain is characterised by a conserved set of cysteine residues, which form four disulfide bonds to stabilise the structure. The Sema domain fold is a variation of the beta propeller topology, with seven blades radially arranged around a central axis. Each blade contains a four-stranded (strands A to D) antiparallel beta sheet. The inner strand of each blade (A) lines the channel at the center of the propeller, with strands B and C of the same repeat radiating outward, and strand D of the next repeat forming the outer edge of the blade. The large size of the Sema domain is not due to a single inserted domain but results from the presence of additional secondary structure elements inserted in most of the blades. The Sema domain uses a 'loop and hook' system to close the circle between the first and the last blades. The blades are constructed sequentially, with an N-terminal beta-strand closing the circle by providing the outermost strand (D) of the seventh (C-terminal) blade. The beta-propeller is further stabilized by an extension of the N terminus, which provides an additional, fifth beta-strand on the outer edge of blade 6 PUBMED:12925274, PUBMED:12958590, PUBMED:15167892.

    \

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at http://www.ncbi.nlm.nih.gov/PROW/guide/45277084.htm \

    \ \ 7661 IPR012911 \

    Protein phosphatase 2C (PP2C) is involved in regulating cellular responses to stress in various eukaryotes. It consists of two domains: an N-terminal catalytic domain and a C-terminal domain characteristic of mammalian PP2Cs. The C-terminal domain consists of three antiparallel alpha helices, one of which packs against two corresponding alpha helices of the N-terminal domain. It does not seem to play a role in catalysis, but it may confer protein substrate specificity through the cleft formed between it and the catalytic domain PUBMED:9003755.

    \ 5676 IPR008463 \ This family consists of several Firmicute CtsR (transcriptional repressor of class III stress genes) proteins. CtsR of Listeria monocytogenes negatively regulates the clpC, clpP and clpE genes belonging to the CtsR regulon PUBMED:10692157.\ 2312 IPR007772 \ This family contains uncharacterised beak and feather disease virus proteins.\ 5138 IPR007975 \

    Autographa californica nucleopolyhedrovirus p31 is a nuclear phosphoprotein that accumulates in the virogenic stroma, the viral replication centre in the infected-cell nucleus. The protein binds DNA and serves as a late expression factor PUBMED:8794314.

    \ 2042 IPR007256 \ Proteins of this family have no known function.\ 7815 IPR012937 \

    This domain occurs in many hypothetical proteins. It also occurs in some prion-like proteins.

    \ 2961 IPR007125 \

    The core histones, together with some other DNA-binding proteins, appear to form a superfamily defined by a common fold and distant sequence similarities PUBMED:7651829, PUBMED:9016552. Some proteins contain local homology domains related to the histone fold PUBMED:9305837.

    \ 456 IPR002489 \

    Glutamate synthase (GltS) is a complex iron-sulphur flavoprotein that catalyses the reductive synthesis of L-glutamate from 2-oxoglutarate and L-glutamine via intramolecular channelling of ammonia, a reaction that forms part of the ammonia assimilation pathways of bacteria, yeast and plants PUBMED:11188694. GltS is a multifunctional enzyme that functions through three distinct active centres carrying out multiple reaction steps: L-glutamine hydrolysis, conversion of 2-oxoglutarate into L-glutamate, and electron uptake from an electron donor. The active centres are synchronised to avoid the wasteful consumption of L-glutamine PUBMED:11967268. There are three classes of GltS, which share many functional properties: bacterial NADPH-dependent GltS, ferredoxin-dependent GltS from photosynthetic cells, and NAD(P)H-dependent GltS from yeast, fungi and lower animals.

    \

    The alpha subunits, which form a dimer, each consist of four domains: the N-terminal amidotransferase domain, the central domain, the FMN-binding domain and the C-terminal domain. The C-terminal domain forms a right-handed beta-helix that comprises seven helical turns PUBMED:11188694. Each helical turn has a sharp bend associated with a repeated sequence motif, G-XX-G-XXX-G. This domain does not contain any residues directly involved in catalysis, but it has a crucial structural role.
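The G-XX-G-XXX-G repeat can be located in a sequence by simple pattern matching. The Python sketch below is a minimal illustration, assuming that X means any single residue and that overlapping hits should all be reported. The function name `find_gxxgxxxg` and the toy sequence are hypothetical and do not come from a real GltS sequence.

```python
import re

# G, any two residues, G, any three residues, G (8 residues in total).
# The pattern sits inside a lookahead so that overlapping hits are all reported.
MOTIF = re.compile(r"(?=(G..G...G))")


def find_gxxgxxxg(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched 8-mer) for every G-XX-G-XXX-G occurrence."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq)]


# Toy sequence (hypothetical, not a real GltS fragment).
print(find_gxxgxxxg("AAGKLGDEVGAA"))  # [(3, 'GKLGDEVG')]
```

A regular-expression hit only identifies a candidate occurrence. Short glycine-rich patterns like this one can also occur by chance in unrelated sequences.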

    \

    This domain is also found in proteins such as subunit C of formylmethanofuran dehydrogenase, which catalyses the first step in methane formation from carbon dioxide in methanogenic archaea. There are two isoenzymes of formylmethanofuran dehydrogenase: a tungsten-containing isoenzyme (FwdC) and a molybdenum-containing isoenzyme (FmdC). The tungsten isoenzyme is constitutively transcribed, whereas transcription of the molybdenum operon is induced by molybdate PUBMED:9818358.

    \ \ 4225 IPR001971 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \ Ribosomal protein S11 PUBMED:3191988 plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on the large lobe of the small ribosomal subunit. On the basis of sequence similarities, S11 belongs to a family of bacterial, archaeal and eukaryotic ribosomal proteins PUBMED:.\ 284 IPR007367 \ This is a family of uncharacterised proteins.\ 2048 IPR007172 \ This is a bacterial domain of unknown function.\ 2974 IPR000665 \ Hemagglutinin is responsible for attaching viruses to cell receptors and for initiating infection. Neuraminidase activity helps the efficient spread of the virus by dissociating the mature virions from the neuraminic acid-containing glycoproteins. Hemagglutinin-neuraminidase is an external protein, anchored to the envelope by its N-terminal hydrophobic sequence. Proteins belonging to this family are from ssRNA negative-strand viruses.\ 4200 IPR001515 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The L32e family consists of proteins that have 135 to 240 amino-acid residues.

    \ 1788 IPR005173 \

    This region is found C-terminal to the DM DNA-binding domain PUBMED:10729224. DM-domain proteins with this motif are known as DMRTA proteins. The function of this region is unknown.

    \ 7126 IPR009915 \

    This family consists of several plant and bacterial NnrU proteins. NnrU is thought to be involved in the reduction of nitric oxide, although its exact function is unclear. NnrU, and perhaps NnrT, are thought to be required for expression of both nirK and nor PUBMED:9171397.

    \ 7676 IPR012863 \

    The sequences featured in this family are derived from a number of hypothetical prokaryotic proteins. The region in question is approximately 130 amino acids long.

    \ 7585 IPR011668 \ This domain is found in archaeal species. It is likely to bind zinc via its four well-conserved cysteine residues.\ 4236 IPR001865 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    Ribosomal S2 proteins have been shown to belong to a family that includes the 40S ribosomal subunit 40 kDa proteins, putative laminin-binding proteins, the NAB-1 protein and the 29.3 kDa protein from Haloarcula marismortui PUBMED:1531984, PUBMED:8119397. The laminin-receptor proteins are thus predicted to be the eukaryotic homologues of the eubacterial S2 ribosomal proteins PUBMED:7899076.

    \ 6294 IPR009464 \

    This region is spliced out of isoform 2. It is predicted to adopt a mixed alpha/beta fold, although it is predominantly helical.

    \ 1380 IPR001370 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    The baculovirus inhibitor of apoptosis protein repeat (BIR) is a domain of tandem repeats separated by a variable-length linker that seems to confer cell death-preventing activity PUBMED:8139034, PUBMED:8552191. The BIR domains characterise the Inhibitor of Apoptosis (IAP) family of proteins (MEROPS proteinase inhibitor family I32, clan IV), which suppress apoptosis by interacting with and inhibiting the enzymatic activity of both initiator and effector caspases (MEROPS peptidase family C14). Several distinct mammalian IAPs, including XIAP, c-IAP1, c-IAP2 and ML-IAP, have been identified, and they all exhibit antiapoptotic activity in cell culture. The functional unit in each IAP protein is the baculoviral IAP repeat (BIR), which contains approximately 80 amino acids folded around a zinc atom. Most mammalian IAPs have more than one BIR domain, with the different BIR domains performing distinct functions. For example, in XIAP, the third BIR domain (BIR3) potently inhibits the catalytic activity of caspase-9, whereas the linker sequence immediately preceding the second BIR domain (BIR2) selectively targets caspase-3 and caspase-7.

    \

    Homologs of most components in the mammalian apoptotic pathway have been identified in fruit flies. The Drosophila Apaf-1, known as Dapaf-1, HAC-1 or Dark, shares significant sequence similarity with its mammalian counterpart and is critically important for the activation of the Drosophila initiator caspase Dronc. Dronc, in turn, cleaves and activates the effector caspase DrICE. The Drosophila IAP, DIAP1, binds to and inactivates both DrICE and Dronc through its BIR1 and BIR2 domains. During apoptosis, the anti-death function of DIAP1 is countered by at least four pro-apoptotic proteins, Reaper, Hid, Grim and Sickle, through direct physical interactions. These four proteins represent the functional homologs of the mammalian protein Smac, and they all share a conserved IAP-binding motif at their N termini. The three proteins Reaper, Hid and Grim are collectively referred to as the RHG proteins PUBMED:11511363, PUBMED:15273300.

    \

    Both XIAP and DIAP1 contain a RING domain at their C termini and can act as E3 ubiquitin ligases. Indeed, both XIAP and DIAP1 have been shown to promote self-ubiquitination and degradation as well as to negatively regulate the target caspases. Nonetheless, important differences exist between XIAP and DIAP1. The primary function of XIAP is thought to be inhibition of the catalytic activities of caspases; to what extent the ubiquitinating activity of XIAP contributes to its function remains unclear. For DIAP1, however, the ubiquitinating activity appears to be essential for its function.

    \

    Recently, a Drosophila p53 protein has been identified that mediates apoptosis via a novel pathway involving activation of the Reaper gene and subsequent inhibition of the inhibitors of apoptosis (IAPs). CIAP1, a major mammalian homolog of Drosophila IAPs, is irreversibly inhibited (cleaved) during p53-dependent apoptosis, and this cleavage is mediated by a serine protease. Serine protease inhibitors that block CIAP1 cleavage inhibit p53-dependent apoptosis. Furthermore, activation of p53 increases transcription of the HTRA2 gene, which encodes a serine protease that interacts with CIAP1 and potentiates apoptosis. Therefore, mammalian p53 activates apoptosis through a novel pathway functionally similar to that in Drosophila, involving HTRA2 and the subsequent inhibition of CIAP1 by cleavage PUBMED:12569127.

    \ \ 13 IPR001365 \

    Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine, and AMP deaminase catalyzes the hydrolytic deamination of AMP into IMP. It has been shown PUBMED:1998686 that these two enzymes share three regions of sequence similarity; these regions are centered on residues that are proposed to play an important role in the catalytic mechanism of both enzymes.

    \ 327 IPR004273 \

    Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force-generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules.

    \

    Dynein is composed of a number of ATP-binding large subunits, intermediate-size subunits and small subunits. This family represents the C-terminal region of the dynein heavy chain. The dynein heavy chain also exhibits ATPase activity and microtubule-binding ability, and acts as a motor for the movement of organelles and vesicles along microtubules.

    \ 1858 IPR002837 \

    This archaeal domain has no known function. In Methanococcus jannaschii it occurs with an endonuclease domain.

    \ 1764 IPR002658 \

    The 3-dehydroquinate synthase domain is present in isolation in various bacterial 3-dehydroquinate synthases and also as a domain in the pentafunctional AROM polypeptide PUBMED:7556173. 3-dehydroquinate (DHQ) synthase catalyses the formation of DHQ and orthophosphate from 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) PUBMED:9613570. This reaction is part of the shikimate pathway, which is involved in the biosynthesis of aromatic amino acids.

    \ 3519 IPR008199 \

    Neuromedin U (NmU) PUBMED:3239891, PUBMED:1455013 is a vertebrate peptide which stimulates uterine smooth muscle contraction and causes selective vasoconstriction. Like most other active peptides, it is proteolytically processed from a larger precursor protein. The mature peptides are 8 (NmU-8) to 25 (NmU-25) residues long and C-terminally amidated.

    \

    The sequence of the C-terminal extremity of NmU is extremely well conserved.

    \ 7463 IPR011429 \

    These proteins share a region of homology at their N terminus that contains the C-{CPWHF}-{CPWR}-C-H-{CFYW} motif typical of cytochromes C.
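The motif above is written in PROSITE pattern syntax: elements are separated by hyphens and `{XYZ}` means "any residue except X, Y or Z". As a minimal sketch (the helper names are illustrative, not part of any InterPro or PROSITE tool), the pattern can be translated into a Python regular expression and used to scan a protein sequence. The test sequence is illustrative only.

```python
import re

# Motif quoted in the text, in PROSITE pattern syntax:
# elements are separated by "-", and {XYZ} means "any residue except X, Y, Z".
CYTC_MOTIF = "C-{CPWHF}-{CPWR}-C-H-{CFYW}"


def prosite_to_regex(pattern: str) -> str:
    """Translate the small PROSITE subset used here (literal residues,
    [..] choices, {..} exclusions, x wildcards) into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        if element.startswith("{") and element.endswith("}"):
            parts.append("[^" + element[1:-1] + "]")  # excluded residues
        elif element == "x":
            parts.append(".")                       # any residue
        else:
            parts.append(element)                   # literal residue or [..] choice
    return "".join(parts)


def find_motifs(sequence: str, pattern: str = CYTC_MOTIF):
    """Return (1-based start, matched residues) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex(CYTC_MOTIF))            # C[^CPWHF][^CPWR]CH[^CFYW]
    print(find_motifs("GDVEKGKKIFVQKCAQCHTVEK"))  # [(14, 'CAQCHT')]
```

Only the exclusion syntax needed for this motif is handled; full PROSITE patterns also allow repeat counts and terminal anchors, which this sketch does not parse.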

    \ 4283 IPR007646 \ RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase, compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 4 is also known as the external 2 domain PUBMED:11313498.\ 3778 IPR002142 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine peptidase (denoted S1-S27) have been identified; these are grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangement of the catalytic residues commonly reflects clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
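To make the "linear order reflects clan" idea concrete, here is a hypothetical helper that sorts the triad residues by sequence position and compares the resulting string with the three orders quoted above. The chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221) positions are the conventional numbering for those enzymes; everything else is an illustrative assumption, and a match is only suggestive, since the text says the order *commonly* reflects clan membership.

```python
from typing import Dict, List

# Linear order of the catalytic Ser/Asp/His triad, as quoted in the text.
CLAN_TRIAD_ORDER = {"SA": "HDS", "SB": "DHS", "SC": "SDH"}


def triad_order(positions: Dict[str, int]) -> str:
    """Order the one-letter codes 'S', 'D', 'H' by their sequence position."""
    return "".join(sorted(positions, key=positions.get))


def consistent_clans(positions: Dict[str, int]) -> List[str]:
    """Clans whose characteristic triad order matches the given positions.
    A match is suggestive only; it does not prove clan membership."""
    order = triad_order(positions)
    return [clan for clan, expected in CLAN_TRIAD_ORDER.items() if expected == order]


if __name__ == "__main__":
    chymotrypsin = {"H": 57, "D": 102, "S": 195}  # chymotrypsin numbering
    subtilisin = {"D": 32, "H": 64, "S": 221}     # subtilisin BPN' numbering
    print(triad_order(chymotrypsin), consistent_clans(chymotrypsin))  # HDS ['SA']
    print(triad_order(subtilisin), consistent_clans(subtilisin))      # DHS ['SB']
```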

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
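The naming scheme above can be read mechanically: the first character of a family or clan identifier gives the catalytic type, and the catalytic type determines the kind of nucleophile. The sketch below encodes only what the paragraph states; the function name is hypothetical and it is not a MEROPS API. Identifiers starting with other letters, such as the inhibitor family I32 mentioned for the IAPs, are deliberately rejected.

```python
from typing import Optional, Tuple

# Catalytic-type codes quoted in the text (first character of the identifier).
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed",  # clan containing families of more than one catalytic type
}

# Per the text: Ser, Thr and Cys peptidases use an amino acid as the nucleophile
# and form an acyl intermediate; aspartic, glutamic and metallopeptidases use an
# activated water molecule.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def describe(identifier: str) -> Optional[Tuple[str, str]]:
    """Return (catalytic type, nucleophile) for an identifier such as "S49",
    "T2" or "PB", or None if it does not start with a peptidase code
    (e.g. the inhibitor family "I32")."""
    code = identifier[:1].upper()
    if code not in CATALYTIC_TYPES:
        return None
    if code in RESIDUE_NUCLEOPHILE:
        nucleophile = "amino acid residue (acyl intermediate)"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water molecule"
    else:
        nucleophile = "varies or unknown"
    return CATALYTIC_TYPES[code], nucleophile


if __name__ == "__main__":
    for ident in ("S49", "T2", "C14", "PB", "I32"):
        print(ident, describe(ident))
```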

    \ \

    This group of serine peptidases belongs to MEROPS peptidase family S49 (protease IV family, clan S-). The predicted active-site serine for members of this family occurs in a transmembrane domain.

    \ \

    The domain defines sequences in viruses, archaea, bacteria and plants. These sequences are variously annotated in the different taxonomic groups; examples are:

    \ \

    \ \

    This group also contains proteins classified as non-peptidase homologues that either have been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for the catalytic activity of peptidases. Related proteins, non-peptidase homologs and unclassified S49 members are also to be found in .

    \ \ 2787 IPR000234 \ Members of this family are surface glycoproteins of various herpesviruses. The glycoprotein is anchored to the lipid envelope of the virus by a transmembrane region.\ 2360 IPR001269 \

    Members of this family catalyse the reduction of the 5,6-double bond of a uridine residue on tRNA. Dihydrouridine modification of tRNA is widely observed in prokaryotes and eukaryotes, and also in some archaea. Most dihydrouridines are found in the D loop of tRNAs. The role of dihydrouridine in tRNA is currently unknown, but it may increase the conformational flexibility of the tRNA. It is likely that different family members have different, possibly overlapping, substrate specificities. Dus 1 from Saccharomyces cerevisiae acts on pre-tRNA-Phe, while Dus 2 acts on pre-tRNA-Tyr and pre-tRNA-Leu. Dus 1 is active as a single subunit, requires NADPH or NADH, and is stimulated by the presence of FAD PUBMED:12003496. Some family members may be targeted to mitochondria and may function there PUBMED:12003496.

    \ 1551 IPR008251 \

    The chromo shadow domain is distantly related to the chromo domain. It is always found in association with a chromo domain.

    \

    The CHROMO (CHRomatin Organization MOdifier) domain PUBMED:1982376, PUBMED:1708124, PUBMED:7667093, PUBMED:7501439 is a conserved region of around 60 amino acids, originally identified in Drosophila modifiers of variegation. These are proteins that alter the structure of chromatin to the condensed morphology of heterochromatin, a cytologically visible condition in which gene expression is repressed. In one of these proteins, Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain a chromo domain appear to fall into three classes. The first class includes proteins having an N-terminal chromo domain followed by a region termed the chromo shadow domain PUBMED:7667093, e.g. Drosophila and human heterochromatin protein Su(var)205 (HP1), and mammalian modifier 1 and modifier 2. The second class includes proteins with a single chromo domain, e.g. Drosophila protein Polycomb (Pc), mammalian modifier 3, human Mi-2 autoantigen, and several yeast and Caenorhabditis elegans hypothetical proteins. In the third class, paired tandem chromo domains are found, e.g. in mammalian DNA-binding/helicase proteins CHD-1 to CHD-4 and yeast protein CHD1.
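The three-class scheme is essentially a rule over domain architecture, so it can be sketched as a small classifier. The input is assumed to be a list of domain labels in N- to C-terminal order; the labels ("chromo", "chromo_shadow", "helicase") are hypothetical strings chosen for illustration, not identifiers from any annotation tool.

```python
from typing import Optional, Sequence


def chromo_class(architecture: Sequence[str]) -> Optional[int]:
    """Assign one of the three classes described in the text from a list of
    domain labels ordered from N- to C-terminus (illustrative labels:
    "chromo" for a chromo domain, "chromo_shadow" for a chromo shadow domain)."""
    chromo_positions = [i for i, d in enumerate(architecture) if d == "chromo"]
    if len(chromo_positions) == 1:
        first = chromo_positions[0]
        # Class 1 (e.g. HP1): a chromo domain followed by a chromo shadow domain.
        if "chromo_shadow" in architecture[first + 1:]:
            return 1
        # Class 2 (e.g. Polycomb): a single chromo domain.
        return 2
    if len(chromo_positions) == 2 and chromo_positions[1] == chromo_positions[0] + 1:
        # Class 3 (e.g. CHD proteins): paired tandem chromo domains.
        return 3
    return None  # no chromo domain, or an arrangement the text does not describe


if __name__ == "__main__":
    print(chromo_class(["chromo", "chromo_shadow"]))         # 1
    print(chromo_class(["chromo"]))                          # 2
    print(chromo_class(["chromo", "chromo", "helicase"]))    # 3
```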

    \ 633 IPR002934 \

    A small region that overlaps with a nuclear localization signal and binds to the RNA primer contains three aspartates that are essential for catalysis. Sequence and secondary structure comparisons of regions surrounding these aspartates with sequences of other polymerases revealed significant homology to the palm structure of DNA polymerase beta, terminal deoxynucleotidyltransferase and DNA polymerase IV of Saccharomyces cerevisiae, all members of family X of polymerases. This homology extends as far as CCA:tRNA nucleotidyltransferase and streptomycin adenylyltransferase, an antibiotic resistance factor PUBMED:7482698, PUBMED:8665867.

    \

    Proteins containing this domain include kanamycin nucleotidyltransferase (KNTase), a plasmid-encoded enzyme responsible for some types of bacterial resistance to aminoglycosides. KNTase inactivates antibiotics by catalysing the addition of a nucleotidyl group onto the drug. In cross-linking experiments with the ATP analogue 8-azido-ATP, Mn2+ strongly stimulated the reaction owing to a 50-fold lower Ki for 8-azido-ATP in the presence of Mn2+. Mutations of the highly conserved Asp residues 113, 115 and 167, critical for metal binding in the catalytic domain of bovine poly(A) polymerase, led to a strong reduction of cross-linking efficiency, and Mn2+ no longer stimulated the reaction. Mutations in the region of the "helical turn motif" (a domain binding the triphosphate moiety of the nucleotide) and in the suspected nucleotide-binding helix of bovine poly(A) polymerase impaired ATP binding and catalysis. The results indicate that ATP is bound in part by the helical turn motif and in part by a region that may be a structural analogue of the fingers domain found in many polymerases.

    \ 4965 IPR005817 \

    Wnt-1 (previously known as int-1) is a proto-oncogene activated by integration of the mouse mammary tumor virus. It is thought to play a role in intercellular communication and seems to be a signalling molecule important in the development of the central nervous system (CNS). The sequence of wnt-1 is highly conserved in mammals, fish and amphibians. Wnt-1 is a member of a large family of related proteins that are all thought to be developmental regulators. These proteins are known as wnt-2 (also known as irp) and wnt-3 to wnt-15. At least four members of this family are present in Drosophila. One of them, wingless (wg), is implicated in segment polarity. All these proteins share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters.

    \ 5712 IPR008574 \ This family consists of proteins of unknown function found in Caenorhabditis species.\ 1023 IPR007698 \

    Alanine dehydrogenases and pyridine nucleotide transhydrogenase have been shown to share regions of similarity PUBMED:8439307. Alanine dehydrogenase catalyzes the NAD-dependent reversible reductive amination of pyruvate into alanine. Pyridine nucleotide transhydrogenase catalyzes the reduction of NADP+ to NADPH with the concomitant oxidation of NADH to NAD+. This enzyme is located in the plasma membrane of prokaryotes and in the inner membrane of the mitochondria of eukaryotes. The transhydrogenation between NADH and NADP is coupled with the translocation of a proton across the membrane. In prokaryotes the enzyme is composed of two different subunits, an alpha chain (gene pntA) and a beta chain (gene pntB), while in eukaryotes it is a single-chain protein. The sequences of alanine dehydrogenases from several bacterial species are related to those of the alpha subunit of bacterial pyridine nucleotide transhydrogenase and of the N-terminal half of the eukaryotic enzyme. The two most conserved regions correspond, respectively, to the N-terminal extremity of these proteins and to a central glycine-rich region that is part of the NAD(H)-binding site.

    \

    This is a C-terminal domain of alanine dehydrogenases. This domain is also found in the lysine 2-oxoglutarate reductases.

    \ 5768 IPR010266 \

    This family consists of several bacterial NnrS-like proteins. NnrS is a putative haem-Cu protein and a member of the short-chain dehydrogenase family PUBMED:12618453. Expression of nnrS is dependent on the transcriptional regulator NnrR, which also regulates expression of genes required for the reduction of nitrite to nitrous oxide, including nirK and nor. NnrS is a haem- and copper-containing membrane protein. Genes encoding putative orthologues of NnrS are sometimes, but not always, found in bacteria encoding nitrite and/or nitric oxide reductase PUBMED:11882718.

    \ 6418 IPR009513 \

    This family consists of several PerB or BfpV proteins found specifically in Escherichia coli. PerB is thought to play a role in regulating the expression of BfpA PUBMED:7729884.

    \ 7810 IPR013117 \

    This domain is found at the C terminus of intimin. Its structure has been solved and shown to have a C-type lectin fold PUBMED:10835344. Intimin is a bacterial adhesion molecule involved in intimate attachment of enteropathogenic and enterohemorrhagic Escherichia coli to mammalian host cells. Intimin targets the translocated intimin receptor (Tir), which is exported by the bacteria and integrated into the host cell plasma membrane.

    \ 4594 IPR007791 \ This family contains the TerB tellurite resistance proteins from a number of bacteria.\ 1871 IPR002855 \

    The archaeal proteins in this family have no known function.

    \ 6092 IPR006453 \

    This family describes a small protein of about 100 amino acids found in bacteriophage and in bacterial prophage regions. The function of these proteins is not known.

    \ 6183 IPR009414 \

    This family consists of several phage and bacterial proteins of unknown function.

    \ 2311 IPR007771 \ This family contains uncharacterised proteins which seem to be found exclusively in Mesorhizobium loti.\ 3602 IPR000711 \

    Synonym(s): ATP synthase, bacterial Ca2+/Mg2+ ATPase, chloroplast ATPase, coupling factors (F0, F1 and CF1), F0F1-ATPase, F1-ATPase, F1F0H+-ATPase, H+-ATPase, H+-translocating ATPase, H+-transporting ATPase, mitochondrial ATPase, proton-ATP.

    \ \

    The H(+)-transporting two-sector ATPase is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATP synthase complex is composed of a nine-subunit (A-G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (alpha, beta, gamma, delta, epsilon) catalytic core for ATP synthesis (F1-ATPase). The F1-ATPase uses the transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and phosphate. The F1-ATPase has been shown to be a rotary motor in which the central gamma subunit rotates inside the cylinder made of alpha3beta3 subunits, driven by a cycle of ATP binding and hydrolysis PUBMED:11309608.

    \ \ \ \ \ \

    The delta subunit of the catalytic core (referred to as oligomycin sensitivity conferral protein, OSCP, in mitochondria) appears to be part of the stalk that links CF0 and CF1, where it either transmits conformational changes from CF0 into CF1 or is implicated in proton conduction PUBMED:2154253. Delta subunits contain around 200 amino acids, and the proteins from different sources exhibit only moderate sequence similarity.

    \ 2239 IPR007608 \ This family contains several uncharacterised proteins.\ 2108 IPR007386 \ This is an archaeal protein of unknown function.\ 5884 IPR010332 \

    Proteins in this family are annotated as ATPase subunits of phage terminase, following PUBMED:10949585. Terminases are viral proteins that are involved in packaging viral DNA into the capsid.

    \ 3651 IPR006880 \ This is a group of proteins with a conserved C-terminal region, which is found in PAPA-1, a PAP-1 binding protein. \ 3837 IPR006498 \

    The tails of some phage are contractile. These sequences represent the tail tube, or tail core, protein of the contractile tail of phage P2, and homologous proteins from other phage.

    \ 5052 IPR007889 \

    This DNA-binding motif is found in four copies in the pipsqueak protein of Drosophila melanogaster PUBMED:9774480. In pipsqueak, this domain binds to the GAGA sequence PUBMED:9774480. The pipsqueak family, which includes proteins from fungi, sea urchins, nematodes, insects and vertebrates, appears to be essential for sequence-specific targeting of a Polycomb group protein complex PUBMED:12167718.
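As a rough illustration of what "binds to the GAGA sequence" means at the sequence level, the sketch below lists candidate GAGA tetranucleotides on both strands of a DNA string (GAGA on the complementary strand appears as TCTC on the given strand). It assumes the literal tetranucleotide is the target; real binding specificity depends on flanking sequence and context, so this only enumerates candidate sites. The function name is hypothetical.

```python
import re
from typing import List, Tuple


def gaga_sites(dna: str) -> Tuple[List[int], List[int]]:
    """Return 1-based start positions of GAGA on the given strand and of TCTC
    (i.e. GAGA on the complementary strand). A lookahead is used so that
    overlapping occurrences are reported: GAGAGA yields two forward sites."""
    seq = dna.upper()
    forward = [m.start() + 1 for m in re.finditer(r"(?=GAGA)", seq)]
    reverse = [m.start() + 1 for m in re.finditer(r"(?=TCTC)", seq)]
    return forward, reverse


if __name__ == "__main__":
    print(gaga_sites("ttGAGAGAcgTCTCa"))  # ([3, 5], [11])
```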

    \ 1495 IPR007593 \ This family includes the human leukocyte antigen CD225, which is an interferon-inducible transmembrane protein and is associated with interferon-induced cell growth suppression PUBMED:7559564.\ 2393 IPR001361 \ Equine infectious anemia virus (EIAV) belongs to the family Retroviridae. EIAV gp90 is hypervariable in the carboxyl-terminal region and more stable in the amino-terminal region. This variability is a pathogenicity factor that allows evasion of the host's immune response PUBMED:1649329.\ 6958 IPR009813 \

    This family consists of several bacterial YebG proteins of around 75 residues in length. The exact function of this protein is unknown, but it is thought to be involved in the SOS response. Induction of the yebG gene occurs as cells enter the stationary growth phase and is dependent on cyclic AMP and H-NS PUBMED:10474193.

    \ 4169 IPR001380 \

    Ribosomes are the particles that catalyze mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites PUBMED:11297922, PUBMED:11290319. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
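The elongation cycle described above (A-site delivery by EF-Tu.GTP, peptidyl transfer, EF-G.GTP-driven translocation) can be walked through with a toy loop. This is a deliberately simplified model, not a simulation of ribosome kinetics: initiation and termination are not described in the text and are only stubbed, and the codon table contains just the handful of standard-code entries the example needs.

```python
from typing import Tuple

# Partial standard genetic code, just enough for the example; None = stop codon.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K",
    "UAA": None, "UAG": None, "UGA": None,
}


def translate(mrna: str) -> Tuple[str, int]:
    """Toy walk through the elongation cycle; returns (peptide, elongation cycles)."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    if not codons or codons[0] != "AUG":
        raise ValueError("expected an AUG start codon")
    # Initiation (not modelled) leaves the first aminoacyl-tRNA, carrying Met,
    # in the P site.
    peptide = "M"
    cycles = 0
    for codon in codons[1:]:
        residue = CODON_TABLE[codon]  # KeyError for codons missing from this toy table
        if residue is None:
            break  # stop codon: termination (release factors) is not modelled
        # 1. EF-Tu.GTP delivers the matching aminoacyl-tRNA to the A site.
        # 2. The chain on the P-site peptidyl-tRNA is transferred to it,
        #    giving a new peptidyl-tRNA extended by one residue.
        peptide += residue
        # 3. EF-G.GTP translocates the new peptidyl-tRNA to the P site and the
        #    deacylated tRNA leaves through an exit site.
        cycles += 1
    return peptide, cycles


if __name__ == "__main__":
    print(translate("AUGUUUGGCAAAUAA"))  # ('MFGK', 3)
```

Each pass through the loop corresponds to one elongation cycle, so an n-residue peptide needs n - 1 cycles after initiation.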

    \

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilize its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organize and stabilize the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome PUBMED:11290319, PUBMED:11114498.

    \ \ \

    The ribosomal protein L13e is widely found in vertebrates PUBMED:8198561, Drosophila melanogaster, plants, yeast and others.

    \ 1200 IPR004349 \

    The nitrogenase complex catalyses the conversion of molecular nitrogen to ammonia (nitrogen fixation). The complex is hexameric, consisting of 2 alpha, 2 beta, and 2 delta subunits.

    \ \

    This family represents the delta subunit of a group of nitrogenases that do not utilise molybdenum (Mo) as a cofactor, but instead use either vanadium (V nitrogenases) or iron (alternative nitrogenases).

    \ 6067 IPR010418 \

    Activation of NF-kappaB as a consequence of signaling through the Toll and IL-1 receptors is a major element of innate immune responses. ECSIT plays an important role in signalling to NF-kappaB, functioning as the intermediate in the signaling pathways between TRAF-6 and MEKK-1 PUBMED:10465784.

    \ 1204 IPR006805 \ Anthranilate synthase catalyses the first step in the biosynthesis of tryptophan. Component I catalyses the formation of anthranilate using ammonia and chorismate. The catalytic site lies in the adjacent region, described in the chorismate binding enzyme family. This region is involved in feedback inhibition by tryptophan PUBMED:11371633. This family also contains a region of para-aminobenzoate synthase component I.\ 7608 IPR012900 \

    This region is found N-terminal to a transcription factor domain. It is between 150 and 200 amino acids in length. The N-terminal half is rather rich in proline residues and has been termed the PRD (proline-rich domain) PUBMED:11722549, whereas the C-terminal half is more polar and has been called the MFMR (multifunctional mosaic region). It has been suggested that this family is composed of three subfamilies, called A, B and C, classified according to motif composition PUBMED:8127687, and that some of these motifs may be involved in mediating protein-protein interactions PUBMED:8127687. The MFMR contains a nuclear localisation signal in bZIP opaque and GBF-2 PUBMED:11722549. The MFMR also contains a transregulatory activity in TAF-1. The MFMR in CPRF-2 contains cytoplasmic retention signals PUBMED:11722549.

    \ 6128 IPR010443 \

    This family consists of several type II restriction enzymes.

    \ 1443 IPR001315 \

    The caspase recruitment domain (CARD) is a homotypic protein interaction module composed of a bundle of six alpha-helices. CARD is related in sequence and structure to the death domain (DD) and the death effector domain (DED), which work in similar pathways and show similar interaction properties PUBMED:11504623. The CARD domain typically associates with other CARD-containing proteins, forming either dimers or trimers. CARD domains can be found in isolation or in combination with other domains. Domains associated with CARD include NACHT (in Nal1 and Bir1), NB-ARC (in Apaf-1), pyrin/dapin domains (in Nal1), leucine-rich repeats (in Nal1), WD repeats (in Apaf-1), Src homology domains, PDZ, RING, kinase and DD domains PUBMED:15226512.

    \

    CARD-containing proteins are involved in apoptosis through their regulation of caspases that contain CARDs in their N-terminal pro-domains, including human caspases 1, 2, 9, 11 and 12 PUBMED:9175472. CARD-containing proteins are also involved in inflammation through their regulation of NF-kappaB PUBMED:12101092. The mechanisms by which CARDs activate caspases and NF-kappaB involve the assembly of multi-protein complexes, which can facilitate dimerisation or serve as scaffolds on which proteases and kinases are assembled and activated.

    \ \ \ \ 5116 IPR007953 \

    This family consists of several borrelial hemolysin accessory proteins (BlyB). BlyB was originally proposed to be an accessory protein in a hemolysis system, but BlyA and BlyB are now thought to function instead as a prophage-encoded holin or holin-like system PUBMED:11073925.

    \ 3919 IPR000860 \

    Porphobilinogen deaminase (PBGD), or hydroxymethylbilane synthase, is the third enzyme in the biosynthetic pathway of tetrapyrroles, which include the vitally important macrocycles haem, chlorophyll and corrin PUBMED:. PBGD catalyses the head-to-tail polymerisation of 4 molecules of porphobilinogen to assemble the open-chain tetrapyrrole hydroxymethylbilane. PBGD is a ubiquitously occurring, monomeric protein, showing high sequence conservation among proteins from bacteria, fungi, plants and mammals. The protein contains a dipyrromethane cofactor, which is covalently attached to a cysteine side chain. The structure of PBGD shows the same chain fold as proteins from 2 classes of binding protein, the transferrins and the group-II periplasmic receptors (the sulphate-, phosphate-, maltodextrin- and lysine/arginine/ornithine-binding proteins). Despite these structural similarities, there is no significant identity between their sequences.

    \ \ 4916 IPR003436 \ This is a family of viral fusion proteins from the chordopoxviruses. A 14-kDa vaccinia virus protein has been shown to function as a viral fusion protein, mediating cell fusion at endosomal (low) pH PUBMED:2389560. The protein, found in the envelope fraction of the virions, is required for fusing the outermost of the two Golgi-derived membranes enveloping the virus with the plasma membrane, and for the subsequent extracellular release of the virus. The N-terminal proximal region is essential for its fusion ability.\ 2214 IPR007566 \ This is a family of uncharacterised, hypothetical archaeal proteins.\ 605 IPR000857 \

    The microtubule-based kinesin motors and actin-based myosin motors generate movements required for intracellular trafficking, cell division, and muscle contraction. In general, these proteins consist of a motor domain that generates movement and a tail region that varies widely from class to class and is thought to mediate many of the regulatory or cargo-binding functions specific to each class of motor PUBMED:11212352. The Myosin Tail Homology 4 (MyTH4) domain has been identified as a conserved domain in the tail domains of several different unconventional myosins PUBMED:11401444 and a plant kinesin-like protein PUBMED:1074599, but has more recently been found in several non-motor proteins PUBMED:12062040. Although its function is not yet fully understood, there is evidence that the MyTH4 domain of myosin-X (Myo10) binds to microtubules and could thus provide a link between an actin-based motor protein and the microtubule cytoskeleton PUBMED:15372037.

    \ \

    The MyTH4 domain is found in one or two copies associated with other domains, such as myosin head, kinesin motor, FERM, PH, SH3 and IQ. The domain is predicted to be largely alpha-helical, interrupted by three or four turns. The MyTH4 domain contains four highly conserved regions designated MGD (consensus L(K/R)(F/Y)MGDhP), LRDE (consensus LRDEhYCQhhKQHxxxN), RGW (consensus RGWxLh) and ELEA (consensus RxxPPSxhELEA), where h indicates a hydrophobic residue and x is any residue PUBMED:11401444.
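The four consensus strings can be turned into regular expressions for a quick sequence scan. The text does not define which residues count as hydrophobic ("h"), so the set used below (A, C, F, I, L, M, V, W, Y) is an assumption chosen for illustration, as are the function names and the test sequence. A hit is only a candidate match to one conserved region, not evidence of a MyTH4 domain.

```python
import re
from typing import Dict, List

# ASSUMPTION: the text does not define "hydrophobic"; this residue set is illustrative.
HYDROPHOBIC = "ACFILMVWY"

# Consensus strings quoted in the text: uppercase letters are literal residues,
# (K/R) is a choice, h is a hydrophobic residue and x is any residue.
MYTH4_CONSENSUS = {
    "MGD": "L(K/R)(F/Y)MGDhP",
    "LRDE": "LRDEhYCQhhKQHxxxN",
    "RGW": "RGWxLh",
    "ELEA": "RxxPPSxhELEA",
}


def consensus_to_regex(consensus: str) -> str:
    """Convert the consensus notation above into a Python regex."""
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            end = consensus.index(")", i)
            parts.append("[" + consensus[i + 1:end].replace("/", "") + "]")
            i = end + 1
            continue
        if ch == "h":
            parts.append("[" + HYDROPHOBIC + "]")  # lowercase h: hydrophobic class
        elif ch == "x":
            parts.append(".")                       # any residue
        else:
            parts.append(ch)                        # literal residue (uppercase H = His)
        i += 1
    return "".join(parts)


def scan_myth4(sequence: str) -> Dict[str, List[int]]:
    """1-based start positions of each conserved region found in the sequence."""
    return {
        name: [m.start() + 1 for m in re.finditer(consensus_to_regex(cons), sequence)]
        for name, cons in MYTH4_CONSENSUS.items()
    }


if __name__ == "__main__":
    print(consensus_to_regex(MYTH4_CONSENSUS["MGD"]))  # L[KR][FY]MGD[ACFILMVWY]P
    print(scan_myth4("AALRYMGDVPAARGWELLA"))
```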

    \ \ 5004 IPR003319 \

    Mitochondria, organelles specialized in energy conservation reactions in eukaryotic cells, have evolved from bacteria-like endosymbionts whose closest known relatives are the Rickettsia group of alphaproteobacteria. A primitive mitochondrial genome in the freshwater protozoon Reclinomonas americana has been described PUBMED:9168110; it seems to contain genes for 5S ribosomal RNA, the RNA component of RNase P, and at least 18 proteins not previously known to be encoded in mitochondria.

    \

    This domain represents ATPase subunit 8, which is part of the Fo component of the mitochondrial ATP synthase PUBMED:12681508, PUBMED:12671689, PUBMED:9461442. This domain is also known as orfB or ymf19. In higher plants, it is sometimes found in association with, and N-terminal to, another domain.

    \ 7198 IPR008320 \ There are currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function.\ 351 IPR004342 \

    The EXS domain is named after ERD1/XPR1/SYG1. Sequences containing this domain include the C-terminal region of the SYG1 G-protein-associated signal transduction protein from Saccharomyces cerevisiae, and sequences that are thought to be murine leukaemia virus (MLV) receptors (XPR1). The N-terminal regions of these proteins often have an SPX domain PUBMED:9990033.

    \

    While the N-terminal region is thought to be involved in signal transduction, the role of the C-terminal region is not known. This region of similarity contains several predicted transmembrane helices. This family also includes the ERD1 (ERD: ER retention defective) proteins of Saccharomyces cerevisiae. ERD1 proteins are involved in the localization of endogenous endoplasmic reticulum (ER) proteins. erd1 null mutants secrete such proteins even though they possess the C-terminal HDEL ER lumen localization label sequence. Null mutants also exhibit defects in the Golgi-dependent processing of several glycoproteins, which led to the suggestion that the sorting of luminal ER proteins actually occurs in the Golgi, with subsequent return of these proteins to the ER via 'salvage' vesicles PUBMED:2178921.

    \ 5662 IPR008671 \ This family consists of lycopene beta and epsilon cyclase proteins. Carotenoids with cyclic end groups are essential components of the photosynthetic membranes in all plants, algae and cyanobacteria. These lipid-soluble compounds protect against photo-oxidation, harvest light for photosynthesis, and dissipate excess light energy absorbed by the antenna pigments. The cyclisation of lycopene (psi, psi-carotene) is a key branch point in the pathway of carotenoid biosynthesis. Two types of cyclic end groups are found in higher plant carotenoids: the beta and epsilon rings. Carotenoids with two beta rings are ubiquitous, and those with one beta and one epsilon ring are common; however, carotenoids with two epsilon rings are rare PUBMED:8837512.\ 4504 IPR000992 \ It has recently been shown PUBMED:1304897 that three yeast proteins, two of which are known to be induced by various stress conditions, are structurally related and are probably part of a larger family. These proteins include cold-shock-inducible protein TIR1 (also known as serine-rich protein 1, SRP1), which is induced by glucose PUBMED:3139887 and cold shock PUBMED:7746155; temperature-shock-inducible protein 1 (SRP2) PUBMED:7746155; seripauperins, which are closely related proteins of about 13 kDa (120 to 124 residues) and are generally encoded at the extremities of yeast chromosomes (e.g. PAU1, PAU2, PAU3, PAU4, PAU5, PAU6, YBR301w, YGL261c, YGR294w, YHL046c, YIL176c, YIR041w and YKL224c) PUBMED:7926827; and hypothetical proteins YIL011w, YJR150c and YJR151c. These proteins all seem to start with a putative signal sequence followed by a conserved domain of about 90 residues. In TIR1, TIR2, TIP1, YIL011w, YJR150c and YJR151c, this domain is followed by a repetitive serine- and alanine-rich region absent in the other members of this family.\ 4482 IPR005605 \

    The function of Saccharomyces cerevisiae Spo7 is unknown, but it has a role in the formation of a spherical nucleus and in meiotic division PUBMED:9822591.

    \ 1263 IPR000246 \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
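    The letter-coded scheme above can be captured as a small lookup table. The Python sketch below is illustrative only: it assumes that a family identifier is a single catalytic-type letter followed by a number (as in T2, S3, S49 and M14, which appear in this document), the helper name `describe_family` is hypothetical, and clans of mixed type P are deliberately not handled.

```python
import re

# Catalytic-type codes for peptidase families, as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Serine, threonine and cysteine peptidases use an amino-acid nucleophile and
# form an acyl intermediate; aspartic, glutamic and metallo peptidases use an
# activated water molecule.
AMINO_ACID_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

# Assumed identifier shape: one type letter followed by digits, e.g. "T2".
FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)")


def describe_family(family_id: str) -> tuple:
    """Return (catalytic type, nucleophile) for a family identifier."""
    match = FAMILY_ID.fullmatch(family_id)
    if match is None:
        raise ValueError(f"unrecognised family identifier: {family_id!r}")
    code = match.group(1)
    if code in AMINO_ACID_NUCLEOPHILE:
        nucleophile = "amino acid (acyl intermediate)"
    elif code in WATER_NUCLEOPHILE:
        nucleophile = "activated water"
    else:
        nucleophile = "unknown"
    return CATALYTIC_TYPES[code], nucleophile


for family in ("T2", "S3", "S49", "M14"):
    print(family, *describe_family(family))
```

    Keeping the nucleophile rule separate from the letter table mirrors the text, where the two facts are stated independently.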

    \ \

    Threonine peptidases are characterized by a threonine nucleophile at the N terminus of the mature enzyme. The type example for this clan is the archaean proteasome beta component of Thermoplasma acidophilum.

    \

    These sequences have a signature that places them in MEROPS peptidase family T2 (clan PB(T)). The glycosylasparaginases are threonine peptidases. Also in this family is L-asparaginase, which catalyses the hydrolysis of L-asparagine to L-aspartate and ammonia.

    \

    Glycosylasparaginase cleaves the GlcNAc-Asn bond that links oligosaccharides to asparagine in N-linked glycoproteins. The enzyme is composed of two non-identical subunits, alpha and beta, held together by strong non-covalent forces, and has one glycosylation site, located in the alpha subunit PUBMED:8877373. It plays a major role in the degradation of glycoproteins.

    \ 6753 IPR009695 \

    This family represents a conserved region of approximately 180 residues within plant and bacterial monogalactosyldiacylglycerol (MGDG) synthases. In Arabidopsis, there are two types of MGDG synthase, type A and type B, which differ in their N-terminal portion PUBMED:11553816.

    \ 919 IPR007303 \ The TOR signalling pathway activates a cell-growth program in response to nutrients PUBMED:10604478. TIP41 interacts with TAP42 and negatively regulates the TOR signaling pathway PUBMED:11741537.\ 1844 IPR002810 \

    This entry describes archaeal and bacterial proteins of unknown function that are variously annotated, for example as nodulation protein, nodulation efficiency protein D (nfeD), hypothetical protein or membrane-bound serine protease (ClpP class). A number of these proteins are classified in MEROPS peptidase family S49 as non-peptidase homologues or as unassigned peptidases.

    \ \

    The nfe genes (nfeA, nfeB and nfeD) are involved in the nodulation efficiency and competitiveness of Sinorhizobium meliloti strain GR4 on alfalfa roots PUBMED:10830257. The specific function of this family is unknown, although it is unlikely that NfeD is specifically involved in nodulation, as the family contains proteins from several different archaeal and bacterial species, most of which are not symbionts.

    \ 2923 IPR007629 \ UL20 is predicted to be a transmembrane protein with multiple membrane spans. It is involved in the trans-cellular transport of enveloped virions, and is therefore important for viral egress. However, UL20 operates in different cellular compartments and at different stages of egress in pseudorabies virus and herpes simplex virus, which is thought to reflect differences in the egress pathways of these two viruses PUBMED:9188641.\ 4801 IPR004280 \ Members of this family are functionally uncharacterised proteins from herpesviruses.\ 3119 IPR004121 \ Current genotyping systems for human herpesvirus 8 (HHV-8) are based on the highly variable gene encoding the K1 glycoprotein PUBMED:11172090.\ 3934 IPR004966 \ The Pox virus Ag35 surface protein is an envelope protein known as protein H5.\ 3987 IPR002088 \

    Protein prenylation is the posttranslational attachment of either a farnesyl group or a geranylgeranyl group via a thioether linkage (-C-S-C-) to a cysteine at or near the carboxyl terminus of the protein. Farnesyl and geranylgeranyl groups are polyisoprenes, unsaturated hydrocarbons with a multiple of five carbons; the chain is 15 carbons long in the farnesyl moiety and 20 carbons long in the geranylgeranyl moiety. There are three different protein prenyltransferases in humans: farnesyltransferase (FT) and geranylgeranyltransferase 1 (GGT1) share the same motif (the CaaX box) around the cysteine in their substrates, and are thus called CaaX prenyltransferases, whereas geranylgeranyltransferase 2 (GGT2, also called Rab geranylgeranyltransferase) recognizes a different motif and is thus called a non-CaaX prenyltransferase. Protein prenyltransferases are currently known only in eukaryotes, but they are widespread, being found in vertebrates, insects, nematodes, plants, fungi and protozoa, including several parasites.
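    Two points above lend themselves to a quick computational check: the chain lengths follow from five carbons per isoprene unit (three units for farnesyl, four for geranylgeranyl), and CaaX substrates can be screened by their C-terminal motif. The Python sketch below is illustrative only; the function names are hypothetical, and restricting the "a" positions to A, V, L or I is an assumption, since the text does not define that residue set.

```python
import re

# Assumption for this sketch: the "a" positions of the CaaX box are limited to
# the aliphatic residues A, V, L and I; X may be any residue letter.
ALIPHATIC = "AVLI"
CAAX_BOX = re.compile(rf"C[{ALIPHATIC}]{{2}}[A-Z]$")


def has_caax_box(sequence: str) -> bool:
    """True if a one-letter protein sequence ends in a Cys-a-a-X motif."""
    return CAAX_BOX.search(sequence.strip().upper()) is not None


def prenyl_chain_carbons(isoprene_units: int) -> int:
    """Polyisoprene chains are built from five-carbon isoprene units."""
    return 5 * isoprene_units


# Toy sequences (not real proteins): the first ends in CVLS, the second in
# GGCC, which does not fit the CaaX pattern.
print(has_caax_box("MTEYKLVVCVLS"), has_caax_box("MTEYKLVVGGCC"))
print(prenyl_chain_carbons(3), prenyl_chain_carbons(4))  # farnesyl, geranylgeranyl
```

    A motif match of this kind only flags a candidate; it says nothing about whether FT or GGT1 actually modifies the protein.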

    Each enzyme consists of two subunits, alpha and beta; the alpha subunit of FT and GGT1 is encoded by the same gene, FNTA. The alpha subunit is thought to participate in a stable complex with the isoprenyl substrate, while the beta subunit binds the peptide substrate. In the alpha subunits of both types of protein prenyltransferase, seven tetratricopeptide repeats are formed by pairs of helices that are stabilized by conserved intercalating residues. The alpha subunits of GGT2 in mammals and plants also have an immunoglobulin-like domain between the fifth and sixth tetratricopeptide repeats, as well as leucine-rich repeats at the carboxyl terminus. The functions of these additional domains in GGT2 are as yet undefined, but they are apparently not directly involved in the interaction with substrates and Rab escort proteins. The tetratricopeptide repeats of the alpha subunit form a right-handed superhelix, which embraces the (alpha-alpha)6 barrel of the beta subunit PUBMED:1622936.

    \ 1172 IPR000930 \

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes PUBMED:7845208. They cover a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine protease (denoted S1-S27) have been identified; these are grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence PUBMED:7845208. Structures are known for four of the clans (SA, SB, SC and SE): these appear to be totally unrelated, suggesting at least four evolutionary origins of serine peptidases and possibly many more PUBMED:7845208.

    \ \

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. The chymotrypsin, subtilisin and carboxypeptidase C clans have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base PUBMED:7845208. The geometric orientations of the catalytic residues are similar between families, despite different protein folds PUBMED:7845208. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) PUBMED:7845208, PUBMED:8439290.
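    Because the linear order of the triad is what distinguishes the clans in the example above, it can be derived mechanically from residue positions. The Python sketch below assumes a hypothetical input that maps one-letter codes (S, H, D) to sequence positions; the residue numbers in the usage lines are the conventional ones for chymotrypsin and subtilisin BPN'. Mapping an order to a clan is only the rule of thumb stated in the text, not a classification method.

```python
# Linear triad orders per clan, as given in the text above.
CLAN_BY_ORDER = {
    "HDS": "SA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase clan)",
}


def triad_order(positions: dict) -> str:
    """Join the one-letter codes of the catalytic residues in sequence order."""
    return "".join(sorted(positions, key=positions.__getitem__))


def suggest_clan(positions: dict) -> str:
    """Heuristic only: the linear order commonly, not always, reflects the clan."""
    return CLAN_BY_ORDER.get(triad_order(positions), "no matching clan order")


# Chymotrypsin numbering: His57, Asp102, Ser195.
print(suggest_clan({"S": 195, "H": 57, "D": 102}))
# Subtilisin BPN' numbering: Asp32, His64, Ser221.
print(suggest_clan({"S": 221, "H": 64, "D": 32}))
```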

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.

    \ \

    Togavirin, also known as Sindbis virus core endopeptidase, is a serine protease resident at the N terminus of the p130 polyprotein of togaviruses PUBMED:7845208. The endopeptidase signature identifies the peptidase as belonging to MEROPS peptidase family S3 (togavirin family, clan PA(S)). The polyprotein also includes structural proteins for the nucleocapsid core and for the glycoprotein spikes PUBMED:7845208. Togavirin is active only while part of the polyprotein; cleavage at a Trp-Ser bond results in total loss of activity PUBMED:7845208. Mutagenesis studies have identified the location of the His-Asp-Ser catalytic triad, and X-ray studies have revealed the protein fold to be similar to that of chymotrypsin PUBMED:7845208, PUBMED:1944569.

    \ 6610 IPR009620 \

    This is a group of proteins of unknown function.

    \ 5647 IPR008718 \ This family consists of Rhizobium NolX and Xanthomonas HrpF proteins. The interaction between the plant pathogen Xanthomonas campestris pv. vesicatoria and its host plants is controlled by hrp genes (hypersensitive reaction and pathogenicity), which encode a type III protein secretion system. Among type III-secreted proteins are avirulence proteins, effectors involved in the induction of plant defence reactions. HrpF is dispensable for protein secretion but is required for AvrBs3 recognition in planta, and is thought to function as a translocator of effector proteins into the host cell PUBMED:11115117. NolX, a Glycine max cultivar specificity protein, is secreted by a type III secretion system (TTSS) and shows homology to HrpF of the plant pathogen Xanthomonas campestris pv. vesicatoria. It is not known whether NolX functions at the bacterium-plant interface or acts inside the host cell. NolX is expressed in planta only during the early stages of nodule development PUBMED:11790754.\ 5810 IPR010293 \

    This is a family of bacterial proteins with unknown function.

    \ 7990 IPR012977 \

    This N-terminal domain is found in a novel nucleolar protein family PUBMED:15112237.

    \ 5325 IPR008597 \ Destabilase is an endo-epsilon(gamma-Glu)-Lys isopeptidase, which cleaves isopeptide bonds formed by transglutaminase (Factor XIIIa) between glutamine gamma-carboxamide and the epsilon-amino group of lysine PUBMED:9003282.\ 2687 IPR003130 \

    Dynamin GTPase effector domain found in proteins related to dynamin.

    \ \

    Dynamin is a GTP-hydrolysing protein that is an essential participant in clathrin-mediated endocytosis. It self-assembles in vivo into 'collars' at the necks of invaginated coated pits, a process coordinated by its GTPase domain. Mutation studies indicate that dynamin functions as a molecular regulator of receptor-mediated endocytosis PUBMED:10206643.

    \ \ 6273 IPR010499 \

    This domain is found in the probable effector-binding domain of a number of different bacterial transcription activators PUBMED:10802742 and is also present in some DNA gyrase inhibitors. The absence of an HTH motif from the DNA gyrase inhibitors is thought to indicate that these proteins do not bind DNA.

    \ 5918 IPR009286 \

    This is a family of eukaryotic proteins with unknown function.

    \ 5217 IPR008681 \ This family contains several bacterial MecA proteins. The development of competence in Bacillus subtilis is regulated by growth conditions and several regulatory genes. In complex media, competence development is poor, and there is little or no expression of late competence genes. Mec mutations permit competence development and late competence gene expression in complex media, bypassing the requirements for many of the competence regulatory genes. The mecA gene product acts negatively in the development of competence. Null mutations in mecA allow expression of the late competence gene comG under conditions where it is not normally expressed, including in complex media and in cells mutant for several competence regulatory genes. Overexpression of MecA inhibits comG transcription PUBMED:11004200, PUBMED:12028382, PUBMED:8412687.\ 160 IPR003780 \ Cytochrome aa3 is one of two terminal oxidase complexes in the Bacillus subtilis electron transport chain. CtaA is required for cytochrome aa3 biosynthesis and sporulation in Bacillus subtilis PUBMED:2549006. In yeast the COX15 protein is required for cytochrome c oxidase assembly.\ 7023 IPR010809 \

    The flagellar hook-associated protein 2 (HAP2 or FliD) forms the distal end of the flagella, and plays a role in the mucin-specific adhesion of the bacteria PUBMED:9488388. This alignment covers the C-terminal region of the flagellar hook-associated protein 2.

    \ 930 IPR003480 \ This family includes a number of transferase enzymes, among them anthranilate N-hydroxycinnamoyl/benzoyltransferase, which catalyzes the first committed reaction of phytoalexin biosynthesis PUBMED:9426598. Deacetylvindoline 4-O-acetyltransferase, which catalyzes the last step in vindoline biosynthesis, is also a member of this family PUBMED:9681034. The motif HXXXD is probably part of the active site. The family also includes trichothecene 3-O-acetyltransferase.\ 6482 IPR009546 \

    This family represents a conserved region approximately 150 residues long within a number of hypothetical Oryza sativa proteins of unknown function.

    \ 5221 IPR008652 \ This family consists of several early glycoproteins from human adenoviruses.\ 1392 IPR004874 \ This is a group of Borrelia proteins that have not yet been characterised, but contain repeated regions.\ 1807 IPR001098 \ Synonym(s): DNA nucleotidyltransferase (DNA-directed) \

    DNA-directed DNA polymerases are the key enzymes catalyzing the accurate replication of DNA. They require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. A number of polymerases belong to this family.

    \ 5578 IPR008819 \ Rubella virus is an enveloped positive-strand RNA virus of the family Togaviridae. Virions are composed of three structural proteins: a capsid and two membrane-spanning glycoproteins, E2 and E1. During virus assembly, the capsid interacts with genomic RNA to form nucleocapsids. It has been discovered that capsid phosphorylation serves to negatively regulate binding of viral genomic RNA. This may delay the initiation of nucleocapsid assembly until sufficient amounts of virus glycoproteins accumulate at the budding site and/or prevent non-specific binding to cellular RNA when levels of genomic RNA are low. It follows that at a late stage in replication, the capsid may undergo dephosphorylation before nucleocapsid assembly occurs PUBMED:12525610. This family is found together with and .\ 6317 IPR009473 \

    This family consists of several Orthopoxvirus A49R proteins. The function of this family is unknown.

    \ 2024 IPR007126 \

    This family consists of several REV proteins from Borrelia burgdorferi (Lyme disease spirochete) and Borrelia garinii. The function of REV is unknown, although it has been shown that the gene is induced during ingestion of host blood, suggesting a role in the metabolic activation of borreliae in adapting to physiological stimuli PUBMED:11580974.

    \ \ 4837 IPR004255 \ This family of uncharacterised proteins is greatly expanded in Mycobacterium tuberculosis.\ 7713 IPR012889 \

    Proteins containing this domain are similar to L-fucose isomerase from Escherichia coli. This enzyme corresponds to glucose-6-phosphate isomerase in glycolysis, and converts an aldohexose to a ketose to prepare it for aldol cleavage. The enzyme is a hexamer, with each subunit being wedge-shaped and composed of three domains. Domains 1 and 2 both contain central parallel beta-sheets with surrounding alpha helices. The active centre is shared between pairs of subunits related along the molecular three-fold axis, with domains 2 and 3 from one subunit providing most of the substrate-contacting residues PUBMED:9367760.

    \ 7175 IPR009947 \

    This family contains the eukaryotic NADH:ubiquinone oxidoreductase subunit B14.5a (Complex I-B14.5a). This subunit is approximately 100 residues long and forms part of a multiprotein complex that resides on the inner mitochondrial membrane. The main function of the complex is the transport of electrons from NADH to ubiquinone, accompanied by translocation of protons from the mitochondrial matrix to the intermembrane space PUBMED:9878551.

    \ 3181 IPR005640 \

    Animal lectins display a wide variety of architectures. They are classified according to the carbohydrate-recognition domain (CRD), of which there are two main types, S-type and C-type.

    \

    C-type lectins display a wide range of specificities. They require Ca2+ for their activity. They are found predominantly but not exclusively in vertebrates.

    \

    This entry represents an N-terminal domain found in C-type lectins.

    \ 6723 IPR009680 \

    This family consists of several Lactococcus lactis and Lactococcus phage proteins of around 74 residues in length. The function of this family is unknown.

    \ 5952 IPR009303 \

    This family consists of several hypothetical proteins from several species of Staphylococcus. The function of this family is unknown.

    \ 4470 IPR002954 \ The Salmonella typhimurium Surface Presentation of Antigens M gene (SpaM) is one of 12 genes that form a cluster responsible for invasion properties PUBMED:8404849. The gene product is required for entry of the bacterium into epithelial cells, and is thus considered to be a virulence factor PUBMED:8404849. Other Spa genes in the cluster are related to invasion (Inv) genes in similar Salmonella and Shigella species PUBMED:7752894, and to flagellar biosynthesis genes in Helicobacter pylori PUBMED:10066464.\ \

    A homologue of this protein has recently been found in Salmonella enterica PUBMED:9068645. The protein, named InvI, is required by the organism to gain access to mammalian epithelial cells, and InvI- mutants failed to infect these cells. It has also been found that the inv-spa loci of this species encode a type III protein secretion system, essential in the bacterium's host cell invasion process PUBMED:8751894.

    \ 8025 IPR013175 \

    This is a family of conserved fungal proteins of unknown function.

    \ 6005 IPR010387 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 6568 IPR009603 \

    This family represents a short conserved repeat within Drosophila melanogaster proteins of unknown function. Approximately 50 copies of this repeat are present in each protein.

    \ 5685 IPR008875 \ This family consists of several bacterial TraX proteins. TraX is responsible for the N-terminal acetylation of F-pilin subunits PUBMED:8444800.\ 1442 IPR004231 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This family is represented by the well-characterised metallocarboxypeptidase A inhibitor (MCPI) from potatoes, which belongs to MEROPS inhibitor family I37, clan IE. It inhibits metallopeptidases belonging to MEROPS peptidase family M14, carboxypeptidase A. In Russet Burbank potatoes, it is a mixture of approximately equal amounts of two polypeptide chains containing 38 or 39 amino acid residues. The chains differ only in their amino-terminal sequence PUBMED:1122280 and are resistant to fragmentation by proteases PUBMED:444453. The structure of the complex between bovine carboxypeptidase A and the 39-amino-acid carboxypeptidase A inhibitor from potatoes has been determined at 2.5 Å resolution PUBMED:6933511.

    \ \

    The potato inhibitor is synthesised as a precursor, having a 29-residue N-terminal signal peptide, a 27-residue pro-peptide, the 39-residue mature inhibitor region and a 7-residue C-terminal extension. The C-terminal extension is involved in inhibitor inactivation and may be required for targeting to the vacuole, where the mature active inhibitor accumulates PUBMED:9862450.
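    The segment lengths above add up to a 102-residue precursor (29 + 27 + 39 + 7). The short Python sketch below converts the lengths into residue ranges; the 1-based, inclusive numbering starting at the first precursor residue is an assumption of this illustration rather than something stated in the source.

```python
# Segment lengths of the potato inhibitor precursor, in N- to C-terminal order.
PRECURSOR_SEGMENTS = [
    ("signal peptide", 29),
    ("pro-peptide", 27),
    ("mature inhibitor", 39),
    ("C-terminal extension", 7),
]


def segment_ranges(segments):
    """Return (name, first, last) tuples using 1-based, inclusive numbering."""
    ranges = []
    start = 1
    for name, length in segments:
        end = start + length - 1
        ranges.append((name, start, end))
        start = end + 1
    return ranges


for name, first, last in segment_ranges(PRECURSOR_SEGMENTS):
    print(f"{name}: residues {first}-{last}")
print("precursor length:", sum(length for _, length in PRECURSOR_SEGMENTS))
```

    Under this numbering the mature inhibitor spans residues 57-95 of the precursor.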

    \ \

    The N-terminal region and the mature inhibitor are weakly related to other solanaceous proteins found in this entry, from potato, tomato and henbane, which have been incorrectly described as metallocarboxypeptidase inhibitors PUBMED:11488477.

    \ \ 5795 IPR010283 \

    This family consists of several conserved eukaryotic proteins of unknown function that contain a TLC domain with at least five transmembrane alpha-helices. Proteins containing this domain may have multiple functions, such as lipid trafficking, metabolism or sensing.

    \ 2537 IPR003223 \ Flagellin is the subunit that polymerizes to form the filaments of bacterial flagella. The proteins in this family are transcriptional repressors of phase-1 flagellin genes.\ 6861 IPR009754 \

    This family consists of several Orthopoxvirus B11R proteins of around 70 residues in length. The function of this family is unknown.

    \ 4821 IPR000825 \

    Proteins containing [Fe-S] clusters perform essential functions in all domains of life. Of particular interest are the sufB, sufC and sufD genes, which are conserved among eubacteria, archaea, plants and parasites. The sufABCDSE operon of the Gram-negative bacterium Escherichia coli is induced by oxidative stress and iron deprivation. The sufABCDSE operon is also necessary for virulence of the plant pathogen Erwinia chrysanthemi.

    \ \

    SufE protein and the SufBCD complex act synergistically to modulate the cysteine desulphurase activity of SufS. SufBCD is essential for iron acquisition via chrysobactin, a siderophore of major importance in virulence PUBMED:12554644. These proteins also contribute to bacterial pathogenicity through their role in the assembly of [Fe-S] clusters under oxidative stress and iron limitation, and may be important for limiting sulphide release during oxidative stress in vivo.

    \ \ 5450 IPR008636 \ This family consists of several HOOK1, 2 and 3 proteins from different eukaryotic organisms. The human members of the family are HOOK1, HOOK2 and HOOK3. Different domains have been identified in the three human HOOK proteins: the highly conserved N-terminal domain mediates attachment to microtubules, whereas the central coiled-coil motif mediates homodimerisation and the more divergent C-terminal domains are involved in binding to specific organelles (organelle-binding domains). Endogenous HOOK3 has been shown to bind to Golgi membranes PUBMED:11238449, whereas both HOOK1 and HOOK2 are localised to discrete but unidentified cellular structures. In mice, the Hook1 gene is predominantly expressed in the testis. Hook1 function is necessary for the correct positioning of microtubular structures within the haploid germ cell. Disruption of Hook1 function in mice causes abnormal sperm head shape and fragile attachment of the flagellum to the sperm head PUBMED:12075009.\ 3787 IPR004899 \

    Bordetella pertussis is a Gram-negative, aerobic coccobacillus that causes pertussis (whooping cough), especially in young children PUBMED:2542937. Once present in the lungs, the bacterium attaches to ciliated pulmonary epithelial cells via a collection of outer membrane proteins, all of which are virulence factors.

    \

    Pertactin, or P69 protein, is one of these virulence factors. Pertactin and filamentous haemagglutinin have been identified as Bordetella adhesins PUBMED:1527510. Both proteins contain an Arg-Gly-Asp (RGD) motif that promotes binding to integrins, which are known to be important in cell mobility and development. The production of most Bordetella virulence factors (including pertactin) is controlled by a two-component signal transduction system, comprising the BvgA regulator and the BvgS sensor PUBMED:10943406. Pertactin shares a high level of similarity with other Bordetella adhesins, such as BrkA. The protein is first produced as a 93 kDa precursor. Upon secretion into the extracellular environment, a 30 kDa domain at the C terminus remains in the outer membrane, while the mature 60.4 kDa pertactin molecule is released PUBMED:8609998.

    \

    The crystal structure of mature pertactin has been determined to 2.5 Å resolution by X-ray diffraction. The fold is characterised by a 16-stranded parallel beta-helix with a V-shaped cross-section. Several between-strand amino-acid repeats form internal and external ladders. The helical structure is interrupted by several protruding loops that contain motifs associated with the activity of the protein. One such sequence, [GGXXP]5, appears directly after the RGD motif and may mediate interaction with epithelial cells. The C-terminal region of P.69 pertactin contains a [PQP]5 motif loop, which contains the major immunoprotective epitope PUBMED:8609998.

    \

    The superfamily also includes immunoglobulin A1 protease and adhesion penetration protein HAP.

    \ 2922 IPR007640 \ UL17 protein is required for DNA cleavage and packaging in herpesviruses. It has been shown to associate with immature B-type capsids PUBMED:10752563, and is required for the localisation of capsids and capsid proteins to the intranuclear sites where viral DNA is cleaved and packaged PUBMED:9875322. In the virion, UL17 is a component of the tegument, a protein layer surrounding the viral capsid PUBMED:9557660.\ 3012 IPR000281 \ This domain contains a helix-turn-helix motif PUBMED:8576032.\ Every member of this family is N-terminal to a SIS domain. Members of this family are probably regulators of genes involved in phosphosugar metabolism.\ 173 IPR000374 \ Phosphatidate cytidylyltransferase PUBMED:2995359, PUBMED:8557688, PUBMED:9083091 (also known as CDP-diacylglycerol synthase, CDS) is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA): CTP + phosphatidate = diphosphate + CDP-diacylglycerol. CDP-diacylglycerol is an important branch-point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound enzyme.\ 6573 IPR010624 \

    This family represents a conserved region within bacterial and archaeal proteins, most of which are hypothetical. More than one copy is sometimes found in each protein. This family includes KaiC, one of the Kai proteins; direct protein-protein association among the Kai proteins may be a critical process in the generation of circadian rhythms in cyanobacteria PUBMED:10064581.

    \ 5178 IPR008015 \

    GMP-PDE delta subunit was originally identified as a fourth subunit of rod-specific cGMP phosphodiesterase (PDE). The precise function of the PDE delta subunit in the rod-specific GMP-PDE complex is unclear. In addition, the PDE delta subunit is not confined to photoreceptor cells but is widely distributed in different tissues. The PDE delta subunit is thought to be a specific soluble transport factor for certain prenylated proteins, and Arl2-GTP a regulator of PDE-mediated transport PUBMED:11980706.

    \ 300 IPR006745 \

    This family contains proteins from the Eukaryota; functionally they are uncharacterised.

    \ 3783 IPR002016 \ Peroxidases are haem-containing enzymes that use hydrogen peroxide as the electron acceptor to catalyse a number of oxidative reactions. Most haem peroxidases follow the reaction scheme:\ \ \ \ \

    In this mechanism, the enzyme reacts with one equivalent of H2O2 to give [Fe4+=O]R' (compound I). This is a two-electron oxidation/reduction reaction in which H2O2 is reduced to water and the enzyme is oxidised. One oxidising equivalent resides on iron, giving the oxyferryl PUBMED:8062820 intermediate, while in many peroxidases the porphyrin (R) is oxidised to the porphyrin pi-cation radical (R'). Compound I then oxidises an organic substrate to give a substrate radical PUBMED:7922023.
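    Written out as equations, the mechanism described above is usually given as a three-step cycle in which compound I returns to the resting enzyme through two one-electron reductions, the second starting from compound II. In the sketch below, AH2 stands for a generic reducing substrate and AH• for the substrate radical; this notation is an assumption of the sketch, and protons are omitted for clarity.

```latex
\begin{aligned}
\mathrm{Fe^{3+}R} + \mathrm{H_2O_2} &\longrightarrow [\mathrm{Fe^{4+}{=}O}]\mathrm{R'} + \mathrm{H_2O} && \text{(compound I)}\\
[\mathrm{Fe^{4+}{=}O}]\mathrm{R'} + \mathrm{AH_2} &\longrightarrow [\mathrm{Fe^{4+}{=}O}]\mathrm{R} + \mathrm{AH}^{\bullet} && \text{(compound II)}\\
[\mathrm{Fe^{4+}{=}O}]\mathrm{R} + \mathrm{AH_2} &\longrightarrow \mathrm{Fe^{3+}R} + \mathrm{AH}^{\bullet} + \mathrm{H_2O} && \text{(resting enzyme)}
\end{aligned}
```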

    \ \

    Haem peroxidases include two superfamilies: one found in bacteria, fungi and plants, and the second found in animals. The first can be viewed as consisting of 3 major classes PUBMED:. Class I, the intracellular peroxidases, includes: yeast cytochrome c peroxidase (CCP), a soluble protein found in the mitochondrial electron transport chain, where it probably protects against toxic peroxides; ascorbate peroxidase (AP), the main enzyme responsible for hydrogen peroxide removal in chloroplasts and cytosol of higher plants PUBMED:; and bacterial catalase-peroxidases, exhibiting both peroxidase and catalase activities. It is thought that catalase-peroxidase provides protection to cells under oxidative stress PUBMED:1954228.

    \

    Class II consists of secretory fungal peroxidases: ligninases, or lignin peroxidases (LiPs), and manganese-dependent peroxidases (MnPs). These are monomeric glycoproteins involved in the degradation of lignin. In MnP, Mn2+ serves as the reducing substrate PUBMED:8167033. Class II proteins contain four conserved disulphide bridges and two conserved calcium-binding sites.

    \

    Class III consists of the secretory plant peroxidases, which have multiple tissue-specific functions: e.g., removal of hydrogen peroxide from chloroplasts and cytosol; oxidation of toxic compounds; biosynthesis of the cell wall; defence responses towards wounding; indole-3-acetic acid (IAA) catabolism; ethylene biosynthesis; and so on PUBMED:. Class III proteins are also monomeric glycoproteins, containing four conserved disulphide bridges and two calcium ions, although the placement of the disulphides differs from that in class II enzymes.

    \

    The crystal structures of a number of these proteins show that they share the same architecture - two all-alpha domains between which the haem group is embedded.

    \ 4523 IPR001217 \

    The STAT (Signal Transducers and Activators of Transcription) protein family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors; hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus PUBMED:12039028. Binding of cytokines and growth factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signaling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation.

    \

    Signaling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at specific receptor tyrosine residues, which then serve as docking sites for STATs and other signaling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs, on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerize, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers.
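    The ordered events above read naturally as a linear sequence of states. The Python sketch below encodes them as an enum purely as a reading aid: the step names are hypothetical labels chosen for this illustration, and the model deliberately ignores branching, feedback, timing and the other signaling molecules that also dock on the receptor.

```python
from enum import Enum
from typing import List, Optional


class StatStep(Enum):
    """Steps of JAK/STAT signaling in the order given in the text."""
    CYTOKINE_BINDS_RECEPTOR = 1
    JAK_ACTIVATED = 2
    RECEPTOR_TYROSINES_PHOSPHORYLATED = 3
    STAT_DOCKS_ON_RECEPTOR = 4
    STAT_TYROSINE_PHOSPHORYLATED = 5
    STAT_DISSOCIATES_AND_DIMERIZES = 6
    DIMER_ENTERS_NUCLEUS = 7
    DIMER_BINDS_GAS_ELEMENT = 8


# Pathway order, taken from the enum values.
PATHWAY = sorted(StatStep, key=lambda step: step.value)


def next_step(step: StatStep) -> Optional[StatStep]:
    """Return the step that follows `step`, or None after GAS binding."""
    index = PATHWAY.index(step)
    return PATHWAY[index + 1] if index + 1 < len(PATHWAY) else None


def follows_pathway(observed: List[StatStep]) -> bool:
    """True if the observed steps are a gap-free prefix of the pathway order."""
    return observed == PATHWAY[: len(observed)]


print(next_step(StatStep.STAT_DOCKS_ON_RECEPTOR).name)
```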

    \

    The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids. The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggests that this family arose from a single primordial gene. STATs share structurally and functionally conserved domains, including: an N-terminal domain that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain that is implicated in protein-protein interactions; a DNA-binding domain with an immunoglobulin-like fold similar to that of the p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain that acts as a phosphorylation-dependent switch to control receptor recognition and DNA binding; and a C-terminal transactivation domain PUBMED:9630226. The crystal structure of the N-terminal domain of Stat4 reveals a dimer whose interface is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerization promotes cooperative binding to tandem GAS elements and interaction with the transcriptional coactivator CBP/p300.

    \ 3506 IPR004893 \ Nitrogenase is a complex metalloenzyme composed of two proteins designated the Fe-protein and the MoFe-protein. Apart from these two proteins, a number of accessory proteins are essential for the maturation and assembly of nitrogenase. Even though experimental evidence suggests that these accessory proteins are required for nitrogenase activity, the exact roles played by many of them are unclear PUBMED:9514861. Yeast two-hybrid screening has shown that NifW can interact both with itself and with NifZ. \ \ 3627 IPR007188 \ The Arp2/3 protein complex has been implicated in the control of actin polymerisation in cells. The human complex consists of seven subunits: the actin-related proteins Arp2 and Arp3, and five others referred to as p41-Arc, p34-Arc, p21-Arc, p20-Arc and p16-Arc PUBMED:9230079. This family represents the p34-Arc subunit.\ 3516 IPR006419 \

    The PnuC protein of Escherichia coli is a membrane protein responsible for nicotinamide mononucleotide transport, subject to regulation by interaction with the NadR (also called NadI) protein. The extreme N- and C-terminal regions are poorly conserved.

    \ 2791 IPR000312 \

    The glycosyl transferase family includes anthranilate phosphoribosyltransferase (TrpD) and thymidine phosphorylase. All these proteins can transfer a phosphorylated ribose substrate. Thymidine phosphorylase catalyses the reversible phosphorolysis of thymidine, deoxyuridine and their analogues to their respective bases and 2-deoxyribose 1-phosphate. This enzyme regulates the availability of thymidine and is therefore essential to nucleic acid metabolism.

    \ \ \ \ 5311 IPR008827 \ Synaptonemal complex protein 1 (SCP-1) is the major component of the transverse filaments of the synaptonemal complex. Synaptonemal complexes are structures that form between homologous chromosomes during meiotic prophase PUBMED:1464329.\ 4502 IPR003210 \ The signal recognition particle (SRP) is a multimeric ribonucleoprotein complex involved in targeting secretory proteins to the rough endoplasmic reticulum membrane. SRP14 and SRP9 form a complex essential for SRP RNA binding. \ 7114 IPR009906 \

    This family represents a conserved region approximately 150 residues long within a number of hypothetical bacterial and eukaryotic proteins of unknown function.

    \ 1731 IPR007387 \ The function of the members of this family is unknown, but DctQ homologues are invariably found in the tripartite ATP-independent periplasmic transporters PUBMED:10627041.\ 2542 IPR000404 \ Flaviviruses encode a single polyprotein, which is cleaved into three structural and seven non-structural proteins. The NS4A protein is small and poorly conserved among the flaviviruses. NS4A contains multiple hydrophobic, potentially membrane-spanning regions PUBMED:2174669. NS4A has only been found in cells infected by Kunjin virus PUBMED:2541547.\ 4816 IPR003444 \

    This family is characterized by a 70-amino-acid region. Its members are probably enzymes; they contain a conserved DXXXR motif that may form part of the active site.

    \ 7481 IPR011511 \

    SH3 (Src homology 3) domains are often indicative of a protein involved in signal transduction related to cytoskeletal organisation. They were first described in the Src cytoplasmic tyrosine kinase. The structure is a partly opened beta barrel.

    \ 7541 IPR010991 \

    The p53 protein is a tetrameric transcription factor that plays a central role in the prevention of neoplastic transformation PUBMED:7878469. Oligomerization appears to be essential for the tumour-suppressing activity of p53. p53 can be divided into different functional domains: an N-terminal transactivation domain, a proline-rich domain, a DNA-binding domain, a tetramerisation domain and a C-terminal regulatory region. The tetramerisation domain of human p53 extends from residues 325 to 356 and has a four-helical bundle fold. The tetramerisation domain is essential for DNA binding, protein-protein interactions, post-translational modifications and p53 degradation PUBMED:11420672.

    \ \ 7285 IPR010899 \

    This family contains a number of hypothetical bacterial proteins of unknown function, approximately 120 residues long.

    \ 7617 IPR012429 \

    These sequences are found in hypothetical proteins of unknown function expressed by bacterial and archaeal species. The region in question is approximately 230 residues long.

    \ 3167 IPR004043 \

    The LCCL domain is named after the best-characterised proteins found to contain it, namely Limulus factor C, Coch-5b2 and Lgl1. It is a domain of about 100 amino acids whose C-terminal part contains a highly conserved histidine within the conserved motif YxxxSxxCxAAVHxGVI. The LCCL module is thought to be an autonomously folding domain that has been used for the construction of various modular proteins through exon shuffling. It has been found in various metazoan proteins in association with complement B-type domains, C-type lectin domains, von Willebrand type A domains, CUB domains, discoidin lectin domains or CAP domains. It has been proposed that the LCCL domain could be involved in lipopolysaccharide (LPS) binding PUBMED:10971586, PUBMED:9806553. Secondary structure prediction suggests that the LCCL domain contains six beta strands and two alpha helices PUBMED:10971586.
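    The consensus motif quoted above can be expressed as a regular expression in which each 'x' matches any residue. The Python sketch below assumes exactly that translation; the function name `find_lccl_motif` and the toy sequence are hypothetical, and real LCCL domains may deviate from the strict consensus, so a profile-based search is the more reliable way to detect the domain.

```python
import re

# Consensus from the text: YxxxSxxCxAAVHxGVI, with 'x' = any residue.
# Mapping each 'x' to '.' is an illustrative assumption; real LCCL
# domains can deviate from this strict consensus.
LCCL_MOTIF = re.compile("Y...S..C.AAVH.GVI")


def find_lccl_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in LCCL_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence with the motif embedded at position 2.
    print(find_lccl_motif("MGYKLMSTRCNAAVHEGVIGG"))
```

    A miss against this pattern does not rule out an LCCL domain; it only means the strict consensus is not present.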

    \

    Proteins known to contain an LCCL domain include Limulus factor C, an LPS endotoxin-sensitive, trypsin-type serine protease that serves to protect the organism from bacterial infection; the vertebrate cochlear protein cochlin (coch-5b2), which is probably secreted and in which mutations affecting the LCCL domain cause the deafness disorder DFNA9 in humans; and the mammalian late gestation lung protein Lgl1, which contains two tandem copies of the LCCL domain PUBMED:10362728.

    \ 4159 IPR004664 \

    Members of this entry include ribonuclease BN (rbn) from Escherichia coli and homologues from a number of bacteria, including the largely uncharacterised BrkB (Bordetella spp. resist killing by serum B) from Bordetella pertussis. Some members have an additional C-terminal domain. Paralogs from Escherichia coli (yhjD) and Mycobacterium tuberculosis (Rv3335c) are part of a smaller, related subfamily that forms its own cluster. Ribonuclease BN is a homodimer in E. coli and does not contain a nucleic acid component. Enterobacteria phage T4 encodes several tRNAs that require this host ribonuclease for maturation. However, host tRNAs with the normal universal 3' sequence of CCA do not appear to be substrates. The substrate specificity of RNase BN appears to be very narrow and its biological role is uncertain. It is one of five ribonucleases in E. coli, any one of which can confer viability, with the order of efficacy being RNase T > RNase PH > RNase D > RNase II > RNase BN.

    \ 2189 IPR007480 \ This entry represents a repeated region found in several Theileria parva proteins.\ 873 IPR003877 \ The SPRY domain is of unknown function. Distant homologues are domains in butyrophilin/marenostrin/pyrin PUBMED:9204703. Ca2+ release from the sarcoplasmic or endoplasmic reticulum, the intracellular Ca2+ store, is mediated by the ryanodine receptor (RyR) and/or the inositol trisphosphate receptor (IP3R).\ 477 IPR001767 \

    This domain identifies a group of cysteine peptidases corresponding to MEROPS peptidase family C46 (clan CH). The type example is the Hedgehog protein from Drosophila melanogaster. These proteins are involved in intercellular signalling required for a variety of patterning events during development.

    \ \

    The hedgehog family of proteins self-process by a cysteine-dependent mechanism, which is a one-time autolytic cleavage. It is differentiated from a typical peptidase reaction by the fact that the newly formed carboxyl group is esterified with cholesterol rather than being left free. The three-dimensional structure of the autolytic domain of the hedgehog protein of Drosophila melanogaster shows that it is formed from two divergent copies of a module that also occurs in inteins, called a ‘Hint’ domain PUBMED:9335337, PUBMED:9489693.

    \ \ \ 3029 IPR003410 \ This domain is known as the HYR (Hyalin Repeat) domain, after the protein hyalin, which is composed exclusively of this repeat. This domain probably corresponds to a new superfamily within the immunoglobulin fold. The function of this domain is uncertain; it may be involved in cell adhesion. In the Sushi repeat-containing protein (SrpX), this domain is found between two sushi repeats.\ 4308 IPR002873 \ This family consists of rotaviral non-structural RNA-binding protein 34 (NS34 or NSP3). The NSP3 protein has been shown to bind viral RNA. The NSP3 protein consists of three conserved functional domains: a basic region that binds ssRNA, a region containing heptapeptide repeats that mediates oligomerisation, and a leucine zipper motif PUBMED:1326821. NSP3 may play a central role in replication and assembly of genomic RNA structures PUBMED:1326821. Rotaviruses have a dsRNA genome and are a major cause of acute gastroenteritis in the young of many species PUBMED:7871749.\ 6635 IPR010655 \

    This family consists of several pre-mRNA cleavage complex II Clp1 (or HeaB) proteins. Six different protein factors are required in vitro for 3' end formation of mammalian pre-mRNAs by endonucleolytic cleavage and polyadenylation. Clp1 is a subunit of cleavage complex IIA, which is required for cleavage, but not for polyadenylation of pre-mRNA PUBMED:11060040.

    \ 6505 IPR009563 \

    This family consists of several Sjogren's syndrome/scleroderma autoantigen 1 (autoantigen p27) sequences. The potential association of anti-p27 with anti-centromere antibodies suggests that autoantigen p27 might play a role in mitosis PUBMED:9486406.

    \ 2686 IPR003438 \

    Glial cell line-derived neurotrophic factor (GDNF) and its related factors neurturin (NTN), artemin (ART) and persephin (PSP) are members of the GDNF family of neurotrophic factors. They form a sub-group in the transforming growth factor-beta (TGF-beta) superfamily. These factors promote neurone survival, exerting their effects through specific receptors.

    \

    The GDNF family receptors (GFRs) are glycosyl-phosphatidylinositol-linked cell surface receptors PUBMED:10356294. Four receptor subtypes, termed GFRalpha-1 to 4, are currently recognised. GFRalpha-1 and GFRalpha-2 are activated by GDNF and NTN, respectively, although some degree of ligand promiscuity is thought to occur PUBMED:9192684. Homologues for these receptor subtypes have been cloned from mammalian and avian tissue. The principal ligand for GFRalpha-3 is artemin. This receptor subtype is currently described only in mammals PUBMED:9576965. GFRalpha-4 is activated by persephin and has so far only been found in chicken PUBMED:9647690. This entry is general for types 1 to 3.

    \

    Activation of GFR family members triggers their interaction with the membrane-bound receptor tyrosine kinase Ret. This induces Ret homodimerisation, triggering a cascade of intracellular signalling events such as activation of the Ras-mitogen-activated protein kinase (MAPK), phosphoinositide 3-kinase (PI3K), Jun N-terminal kinase (JNK) and phospholipase C gamma (PLC gamma) dependent pathways PUBMED:10356294.

    \ 5925 IPR010352 \

    This family consists of several hypothetical bacterial proteins of unknown function.

    \ 4643 IPR001820 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    Tissue inhibitors of metalloproteinases (TIMPs, PUBMED:2793861, PUBMED:1850705, PUBMED:1512267) and their target matrix metalloproteinases (MMPs, MEROPS peptidase family M10A) are important in connective tissue remodelling in diseases of the cardiovascular system and in the physiological degradation of connective tissue, as well as in pathological states such as tumor invasion and arthritis. TIMPs belong to MEROPS proteinase inhibitor family I35, clan IT.

    \ \

    TIMPs complex with extracellular matrix metalloproteinases (such as collagenases) and irreversibly inactivate them. Members of this family are common in extracellular regions of vertebrate species PUBMED:7918391. TIMPs are proteins of about 200 amino acid residues, 12 of which are cysteines involved in disulphide bonds PUBMED:2163605. The basic structure of this type of inhibitor is shown in the following schematic representation:


              +-----------------------------+         +--------------+
              |                             |         |              |
            CxCxCxxxxxxxxxxxxxxxxxCxxxxxxxxxCxxxxxxxCxCxCxCxCxxxxxCxxCxxx
            |   |                 |                 |   | | |     |
            |   +-----------------|-----------------+   +-+ +-----+
            +---------------------+

    'C': conserved cysteine involved in a disulphide bond.
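    The schematic encodes a specific disulphide connectivity. As a small illustration, the Python sketch below records that connectivity as pairs of cysteine ordinals (1 = leftmost cysteine), read off the diagram by hand, and checks that the 12 cysteines form six disulphides with each cysteine used exactly once. The string is the schematic itself, not a real TIMP sequence, and the function names are hypothetical.

```python
# Schematic copied from the diagram; 'C' marks a conserved cysteine.
SCHEMATIC = "CxCxCxxxxxxxxxxxxxxxxxCxxxxxxxxxCxxxxxxxCxCxCxCxCxxxxxCxxCxxx"

# Disulphide pairs read off the diagram by hand. Each pair (n, m) means the
# nth and mth cysteine counted from the left (N-terminal) end, starting at 1.
DISULPHIDES = [(1, 4), (2, 5), (3, 6), (7, 12), (8, 9), (10, 11)]


def cysteine_positions(schematic: str) -> list[int]:
    """0-based indices of every 'C' in the schematic string."""
    return [i for i, residue in enumerate(schematic) if residue == "C"]


def is_complete_pairing(n_cysteines: int, bonds: list[tuple[int, int]]) -> bool:
    """True if every cysteine 1..n_cysteines appears in exactly one bond."""
    used = sorted(c for pair in bonds for c in pair)
    return used == list(range(1, n_cysteines + 1))


def bonds_as_positions(schematic: str, bonds: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Translate ordinal pairs into 0-based indices within the schematic."""
    positions = cysteine_positions(schematic)
    return [(positions[a - 1], positions[b - 1]) for a, b in bonds]


if __name__ == "__main__":
    n = len(cysteine_positions(SCHEMATIC))
    print(n, "cysteines,", len(DISULPHIDES), "disulphides")
    print("complete pairing:", is_complete_pairing(n, DISULPHIDES))
```

    The first three pairs (1-4, 2-5, 3-6) link the N-terminal cluster to cysteines further along, while the last three (7-12, 8-9, 10-11) are confined to the C-terminal cluster.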
    
    \ \

    The crystal structure of the human proMMP-2/TIMP-2 complex reveals an interaction between the hemopexin domain of proMMP-2 and the C-terminal domain of TIMP-2, leaving the catalytic site of MMP-2 and the inhibitory site of TIMP-2 distant and spatially isolated. The interfacial contact of these two proteins is characterised by two distinct binding regions composed of alternating hydrophobic and hydrophilic interactions. This unique structure provides insight into how specificity is achieved in the formation of non-inhibitory MMP/TIMP complexes PUBMED:12032297.

    \ 4634 IPR001152 \ Thymosin beta-4 is a small polypeptide whose exact physiological role is not yet known PUBMED:4088087. It was first isolated as a thymic hormone that induces terminal deoxynucleotidyltransferase. It is found in high quantity in thymus and spleen but is widely distributed in many tissues. It has also been shown to bind to actin monomers and thus to inhibit actin polymerization PUBMED:15336106.\ \ \ \

    A number of peptides closely related to thymosin beta-4 belong to this family. They include thymosin beta-9 (and beta-8) in Bos taurus and Sus scrofa (pig), thymosin beta-10 in human and Rattus norvegicus (rat), thymosin beta-11 and beta-12 in Oncorhynchus mykiss (rainbow trout), and human Nb thymosin beta.

    \ 3834 IPR006724 \

    This family contains major tail proteins from bacteriophages.

    \ 6371 IPR009498 \

    This entry represents the C terminus of Lactococcus bacteriophage repressor proteins.

    \ 3122 IPR002350 \

    Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen.

    \ \ \

    This family of Kazal inhibitors belongs to MEROPS inhibitor family I1, clan IA. Its members inhibit serine peptidases of the S1 family PUBMED:14705960. The members are primarily metazoan, but the family includes exceptions in the Alveolata (Apicomplexa), stramenopiles, higher plants and bacteria.

    \ \ \

    Kazal inhibitors, which inhibit a number of serine proteases (such as trypsin and elastase), belong to a family of proteins that includes pancreatic secretory trypsin inhibitor, avian ovomucoid, acrosin inhibitor and elastase inhibitor. These proteins contain between 1 and 7 Kazal-type inhibitor repeats PUBMED:6699915, PUBMED:3828298.

    The structure of the Kazal repeat includes a large proportion of extended chain, two short alpha-helices and a three-stranded antiparallel beta-sheet PUBMED:6699915. The inhibitor makes 11 contacts with its target enzyme; unusually, 8 of these contact residues are hypervariable PUBMED:3828298. Altering the enzyme-contact residues, and especially the residue at the active-site bond, affects the strength of inhibition and the specificity of the inhibitor for particular serine proteases PUBMED:3828298, PUBMED:7046785. The presence of this Pfam domain is usually indicative of serine protease inhibitors; however, Kazal-like domains are also seen in the extracellular part of agrins, which are not known to be proteinase inhibitors.

    \ 2718 IPR003440 \

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These are enzymes that catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugars, nucleotide monophospho-sugars and sugar phosphates, and of related proteins, into distinct sequence-based families has been described PUBMED:9334165. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    \ \

    This is glycosyltransferase family 48, which consists of various 1,3-beta-glucan synthase components, including Gls1, Gls2 and Gls3 from yeast. 1,3-beta-glucan synthase, also known as callose synthase, catalyses the formation of a beta-1,3-glucan polymer that is a major component of the fungal cell wall PUBMED:9209021. The reaction catalysed is:

    UDP-glucose + {(1,3)-beta-D-glucosyl}(N) = UDP + {(1,3)-beta-D-glucosyl}(N+1).

    \ 4374 IPR003033 \

    This domain is involved in binding sterols. The human sterol carrier protein 2 (SCP2) is a basic protein that is believed to participate in the intracellular transport of cholesterol and various other lipids PUBMED:8243660. The unc-24 protein of Caenorhabditis elegans contains a domain similar to part of two ion channel regulators (the erythrocyte integral membrane protein stomatin and the C. elegans neuronal protein MEC-2) juxtaposed to a domain similar to nonspecific lipid transfer protein (nsLTP; also called sterol carrier protein 2) PUBMED:8667025.

    \ \ 3773 IPR005080 \

    Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date PUBMED:7674922. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site PUBMED:7674922. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases PUBMED:7674922.
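    Both the generic HEXXH motif and the stricter abXHEbbHbc form can be scanned for with regular expressions. The Python sketch below assumes residue classes that the text does not spell out: 'b' (uncharged) as A, C, F, G, I, L, M, N, Q, S, T, V, W, Y and 'c' (hydrophobic) as A, C, F, I, L, M, V, W, Y. It also restricts 'a' to V or T, although the text only says "most often". The function name `scan_zinc_motifs` is hypothetical, and hits are candidates only, not evidence of a metalloprotease.

```python
import re

# Assumed residue classes (not defined in the text). Proline is left out of
# every class because the text says it is never found in this site.
UNCHARGED = "ACFGILMNQSTVWY"   # 'b': excludes charged D, E, K, R and H
HYDROPHOBIC = "ACFILMVWY"      # 'c'
A_POSITION = "VT"              # 'a': "most often" V or T, so this is strict

# Generic motif: H, E, any two residues, H.
LOOSE_HEXXH = re.compile(r"HE..H")

# Stricter form abXHEbbHbc; 'X' is taken as any residue other than proline.
STRICT_MOTIF = re.compile(
    f"[{A_POSITION}][{UNCHARGED}][^P]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]"
)


def scan_zinc_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 0-based start positions of non-overlapping hits for each pattern."""
    seq = sequence.upper()
    return {
        "HEXXH": [m.start() for m in LOOSE_HEXXH.finditer(seq)],
        "abXHEbbHbc": [m.start() for m in STRICT_MOTIF.finditer(seq)],
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    print(scan_zinc_motifs("MKVAAHELGHALK"))
    print(scan_zinc_motifs("MKVAPHELGHALK"))  # proline at 'X' blocks the strict form
```

    The second toy sequence still matches the loose HEXXH pattern but not the stricter form, which mirrors the point that HEXXH alone is relatively common.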

    \

    Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
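    The family-naming convention described above is regular enough to express in code. The Python sketch below assumes identifiers of the form used in this document (a catalytic-type letter, a family number and an optional subfamily letter, as in C46, M10A or M63) and encodes the nucleophile rule stated in the text. The function names are hypothetical; this is not an API provided by the MEROPS database.

```python
import re

# Catalytic-type letters as listed in the text. 'P' describes clans that mix
# types, so it is deliberately not accepted for a family identifier here.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Per the text: serine, threonine and cysteine peptidases use an amino acid
# as the nucleophile (acyl intermediate); aspartic, glutamic and
# metallopeptidases use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"S", "T", "C"}
WATER_NUCLEOPHILE = {"A", "G", "M"}

# Assumed identifier format: type letter, family number, optional subfamily.
FAMILY_ID = re.compile(r"([ACGMSTU])(\d+)([A-Z]?)")


def parse_family(family_id: str) -> tuple[str, int, str]:
    """Split e.g. 'M10A' into (catalytic type, family number, subfamily)."""
    match = FAMILY_ID.fullmatch(family_id)
    if match is None:
        raise ValueError(f"not a peptidase family identifier: {family_id!r}")
    letter, number, subfamily = match.groups()
    return CATALYTIC_TYPES[letter], int(number), subfamily


def nucleophile(family_id: str) -> str:
    """Report the nucleophile class implied by the family's catalytic type."""
    parse_family(family_id)  # raises ValueError for malformed identifiers
    letter = family_id[0]
    if letter in RESIDUE_NUCLEOPHILE:
        return "amino acid residue"
    if letter in WATER_NUCLEOPHILE:
        return "activated water"
    return "unknown"


if __name__ == "__main__":
    for fid in ("C46", "M10A", "M63", "S1"):
        print(fid, parse_family(fid), nucleophile(fid))
```

    Inhibitor families such as I35 use a separate naming scheme and are rejected by this parser.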

    \ \

    This group of metallopeptidases belongs to MEROPS peptidase family M63 (gpr protease family, clan ML).\ \ These are tetrameric proteases that make the rate-limiting first cut in the small, acid-soluble spore proteins (SASP) of Bacillus subtilis and related species during spore germination. The enzyme lacks clear homology to other known proteases. It processes its own amino terminus before becoming active to cleave SASPs.

    \ 6518 IPR009573 \

    This family consists of several hypothetical archaeal proteins of around 260 residues in length, which seem to be specific to Methanobacterium, Methanococcus and Methanopyrus species. The function of this family is unknown.

    \ 743 IPR007728 \ This protein motif is a zinc binding motif PUBMED:12389037. It contains 9 conserved cysteines that coordinate three zinc ions. It is thought that this region plays a structural role in stabilising SET domains.\ 5439 IPR008498 \ This family consists of several short proteins of unknown function found in Caenorhabditis species.\ 4606 IPR003194 \ Accurate transcription in vivo requires at least six general transcription initiation factors, in addition to RNA polymerase II. Transcription initiation factor IIA (TFIIA) is a multimeric protein which facilitates the binding of TFIID to the TATA box. \ 896 IPR007526 \

    The SWIRM domain is a small alpha-helical domain of about 85 amino acid residues found in eukaryotic chromosomal proteins. It is named after the proteins SWI3, RSC8 and MOIRA, in which it was first recognised. This domain is predicted to mediate protein-protein interactions in the assembly of chromatin-protein complexes. The SWIRM domain can be linked to different domains, such as the ZZ-type zinc finger, the Myb DNA-binding domain, the HORMA domain, the amino-oxidase domain, the chromo domain and the JAB1/PAD1 domain.

    \ 864 IPR001119 \ S-layers are paracrystalline mono-layered assemblies of (glyco)proteins that coat the surface of bacteria. Several S-layer proteins and some other cell wall proteins contain one or more copies of a domain of about 50-60 residues, which has been called SLH (for S-layer homology). There is strong evidence that this domain serves as an anchor to the peptidoglycan PUBMED:8113161, PUBMED:7730277. The SLH domain is present in a variety of S-layer proteins from different sources, as well as in the outer membrane protein Omp-alpha from Thermotoga maritima, the cellulosome anchoring protein (gene ancA) from Clostridium thermocellum, amylopullulanases, xylanase A (gene xynA) from Thermoanaerobacter saccharolyticum and many others.\ 6532 IPR009583 \

    This family consists of several DspF and related sequences from several plant pathogenic bacteria. The 'disease-specific' (dsp) region next to the hrp gene cluster of Erwinia amylovora is required for pathogenicity but not for elicitation of the hypersensitive reaction. DspF and AvrF are small (16 kDa and 14 kDa, respectively) and acidic, with predicted amphipathic alpha helices in their C termini; they resemble chaperones for virulence factors secreted by type III secretion systems of animal pathogens PUBMED:9448330.

    \ 7838 IPR013109 \

    This family contains many hypothetical proteins that belong to the cupin superfamily.

    \ 6993 IPR009832 \

    This family consists of several insect-specific proteins. One member is annotated as a haemolymph glycoprotein precursor. The function of this family is unknown PUBMED:7742978.

    \ 3407 IPR007208 \ Members of the PhaF/MrpF family are predicted to be integral membrane proteins with three transmembrane regions, involved in regulation of pH. PhaF is part of a potassium efflux system involved in pH regulation. It is also involved in symbiosis in Rhizobium meliloti PUBMED:11356194. MrpF is a part of a Na+/H+ antiporter complex, also involved in pH homeostasis. MrpF is thought to be an efflux system for Na+ and cholate PUBMED:10198001. The Mrp system in Gram-positive species may also have primary energisation capacities PUBMED:9680201.\ 6025 IPR009339 \

    This is a family of conserved archaeal proteins.

    \ 4648 IPR001267 \

    Thymidine kinase (TK) is a ubiquitous enzyme that catalyzes the ATP-dependent phosphorylation of thymidine. Two different families of TK have been identified PUBMED:3027984, PUBMED:2389555 and are included in this family: one groups together TK from herpesviruses as well as cellular thymidylate kinases, and the second groups TK from various sources that include vertebrates, bacteria, bacteriophage T4, poxviruses, African swine fever virus (ASF) and fish lymphocystis disease virus (FLDV). The major capsid protein of insect iridescent viruses also belongs to this family. The Prosite pattern recognises only the cellular type of thymidine kinases.

    \ 7846 IPR012597 \

    This family corresponds to mating-type pheromone proteins. The homobasidiomycetes, or mushroom fungi, have arguably the most complex mating system of all known organisms. Many species possess a mating system known as bifactorial incompatibility, in which two unlinked incompatibility loci (the A and B mating-type loci) control the mating type of an individual. Each A mating-type sublocus encodes a pair of divergently transcribed homeodomain transcription factors, while the genes responsible for B mating-type activity encode lipopeptide pheromones and G-protein-coupled pheromone receptors PUBMED:15219565.

    \ 268 IPR005508 \ This is a family of proteins from Arabidopsis thaliana with uncharacterised function.\ 8129 IPR013268 \

    This family of proteins is associated with U3 snoRNA PUBMED:12068309. U3 snoRNA is required for nucleolar processing of pre-18S ribosomal RNA.

    \ 4640 IPR003397 \

    The membrane-embedded multi-protein complexes of mitochondria mediate the transport of nuclear-encoded proteins across and into the outer or inner mitochondrial membranes PUBMED:15232570. The TOM (translocase of the outer mitochondrial membrane) complex consists of cytosol-exposed receptors and a pore-forming core, and mediates the transport of proteins from the cytosol across and into the outer mitochondrial membrane. A novel protein complex in the outer membrane of mitochondria, called the SAM complex (sorting and assembly machinery), is involved in the biogenesis of beta-barrel proteins of the outer membrane. Two translocases of the inner mitochondrial membrane (TIM complexes) mediate protein transport at the inner membrane.

    The TIM23 complex (a presequence translocase) mediates the transport of presequence-containing proteins across and into the inner membrane. TIM17 forms part of this complex, although its role is not yet fully understood. The TIM22 complex (a twin-pore carrier translocase) mediates the insertion into the inner membrane of multi-spanning proteins that have internal targeting signals, and it uses the membrane potential as an external driving force. The Tim22 subunit of the mitochondrial import inner membrane translocase is included in this family.

    \ 7470 IPR011486 \

    This is a family of proteins for which no function is known yet.

    \ 4704 IPR005063 \ Transposase proteins are necessary for efficient DNA transposition. This family represents bacterial IS1 transposases. \ 7720 IPR012458 \

    The members of this family are hypothetical plant proteins of unknown function. The region featured in this family is approximately 100 amino acids long.

    \