InteracDome

InteracDome is a resource consisting of inferred interaction sites in 4,128 protein domains from Pfam (v31.0). InteracDome models domain interactions with DNA, RNA, small molecules, peptides and ions.

S.N. Kobren and M. Singh (2018) "Systematic domain-based aggregation of protein structures highlights DNA-, RNA-, and other ligand-binding positions." Nucleic Acids Res, 47(2): 582-593. [doi: 10.1093/nar/gky1224]

View Per-Position Binding Frequencies

In this interactive web browser, we only display per-domain-position binding frequencies

  • from the representable-NR set (see Download page for description)
  • that model interactions with DNA base, DNA backbone, RNA base, RNA backbone, peptides, and the 'ion', 'metabolite', and 'small molecule' groups.

The complete set of domain interactions (including with particular ions or small molecules) is available for download.

Frequently Asked Questions

Q1: How do I use this site?

InteracDome, like Pfam, is primarily a domain database, not a sequence or structure database. As such, we invite you to search for a domain by Pfam name or identifier on the homepage to see how positions within that domain interact with various ligand types, if at all. The values displayed on the homepage are "binding frequencies", which are described in the answer to the next question.

You also have the ability to query a protein sequence for instances of InteracDome domains in order to determine which positions within your sequence have some domain-based evidence of binding various ligands. This functionality is available under the "Query by Sequence" tab.

Q2: What is a "binding frequency"?

The "binding frequency" of a position within a domain is the (weighted) fraction of structural instances in which that position was found to be in contact with a particular ligand. Specifically, a residue at a given protein position is "in contact" with a ligand if a non-hydrogen atom from its side chain is within 3.6 Å of the ligand in a crystal structure. Structural instances are weighted according to the relative uniqueness of their sequence: structures with highly similar sequences are down-weighted and those with more unique sequences are up-weighted. Binding frequency is shown on the y-axis of the plots generated on the homepage.
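As a rough illustration of the definition above, the following Python sketch computes a contact flag and a weighted binding frequency for one domain position. It is not the InteracDome implementation: the function names and the input layout are hypothetical, and the sequence-uniqueness weights are taken as given because the weighting scheme is not spelled out here.

```python
from math import dist

CONTACT_CUTOFF = 3.6  # distance cutoff in Å, as stated in the answer above


def in_contact(side_chain_atoms, ligand_atoms, cutoff=CONTACT_CUTOFF):
    """True if any side-chain heavy atom lies within `cutoff` Å of any ligand atom.

    Both arguments are sequences of (x, y, z) coordinates; hydrogens are
    assumed to have been removed by the caller.
    """
    return any(dist(a, b) <= cutoff for a in side_chain_atoms for b in ligand_atoms)


def binding_frequency(instances):
    """Weighted fraction of structural instances in which the position binds.

    `instances` is a list of (weight, contacted) pairs, one per structural
    instance of the domain. The weights are an input here (assumption).
    """
    total = sum(weight for weight, _ in instances)
    if total == 0:
        return 0.0
    bound = sum(weight for weight, contacted in instances if contacted)
    return bound / total


# Three instances of one position: two redundant sequences share weight 0.5 each.
example = [(1.0, True), (0.5, False), (0.5, True)]
frequency = binding_frequency(example)  # (1.0 + 0.5) / 2.0
```

Passing the weights in separately keeps the sketch independent of how sequence redundancy is actually scored.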

Q3: How can I find binding sites in my sequence?

We have not currently implemented a search by sequence name or identifier. However, you can obtain the amino acid sequence for your protein of interest through UniProt, Ensembl, and/or other databases. Copy and paste the sequence into the box on the "Query by Sequence" page and click "Go". This will return a list of Pfam domains found in your sequence and, if applicable, the corresponding binding frequency values annotated to your sequence. In addition, we have provided some sample input by protein name on the same page.

Q4: How can I search using a PDB sequence?

If you are interested in finding domain matches in the protein chain sequence(s) from your structure and determining which of those have corresponding binding frequency values, copy and paste the protein chain sequence into the box on the "Query by Sequence" page and click "Go". To get the corresponding protein sequence for your structure, visit http://www.rcsb.org/pdb/download/viewFastaFiles.do?structureIdList=XXXX&compressionType=uncompressed replacing the "XXXX" in the URL with a PDB identifier.
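If you prefer to script this step, a minimal Python sketch along these lines fills in the URL template quoted above and splits the returned FASTA text into per-chain sequences that can be pasted into the query box. The helper names are hypothetical, the identifier check assumes classic four-character PDB identifiers, and the sketch does not fetch anything or verify that the URL is still served.

```python
FASTA_URL_TEMPLATE = (
    "http://www.rcsb.org/pdb/download/viewFastaFiles.do"
    "?structureIdList=XXXX&compressionType=uncompressed"
)


def fasta_url(pdb_id):
    """Substitute a PDB identifier for the XXXX placeholder in the template."""
    pdb_id = pdb_id.strip().upper()
    # Assumption: classic four-character alphanumeric PDB identifiers.
    if len(pdb_id) != 4 or not (pdb_id.isascii() and pdb_id.isalnum()):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    return FASTA_URL_TEMPLATE.replace("XXXX", pdb_id)


def chain_sequences(fasta_text):
    """Map each FASTA header (without '>') to its sequence on a single line."""
    records = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records
```

Each value returned by chain_sequences is one chain's sequence, ready to paste into the "Query by Sequence" box.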

Alternatively, if you are interested in seeing which PDB co-complex structures are associated with each domain-ligand interaction, you can find this information in the downloadable tab-delimited file(s) from the Downloads page. Exact mappings of PDB protein chain position to domain match state can be found here.

Q5: How do you determine which ligands to include?

All ligands that we consider to be biologically relevant are obtained from the BioLiP database; details about when ligands are presumed to be biologically relevant are provided in the Methods section of their paper (J. Yang, A. Roy and Y. Zhang, Nucleic Acids Res, 2012). Briefly, any ligand that is present more than 15 times in a particular structure and appears on a list of ligands that are observed more than 20 times across all protein structures (i.e., on an "artifact" list) is discarded. Any remaining ligand that contacts fewer than 2 protein residues or has only consecutive residue contacts is discarded. Remaining ligands that are not on the artifact list, or that are explicitly mentioned in the PubMed abstract of the paper reporting the structure, are retained.
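The filtering rules summarized above can be sketched in Python as follows. This is only an illustration of the brief description given here, not BioLiP's actual code: the record fields and function names are hypothetical, and "only consecutive residue contacts" is interpreted as all contacted residues forming a single unbroken run of residue numbers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LigandRecord:
    """Hypothetical summary of one ligand type within one structure."""
    ligand_id: str
    copies_in_structure: int
    contact_residue_numbers: Sequence[int]
    mentioned_in_abstract: bool


def only_consecutive(residue_numbers):
    """True if the distinct residue numbers form one unbroken run."""
    ordered = sorted(set(residue_numbers))
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


def is_retained(ligand, artifact_list):
    """Apply the three rules in the order given in the text."""
    on_artifact_list = ligand.ligand_id in artifact_list
    # Rule 1: abundant copies of a ligand from the artifact list are discarded.
    if on_artifact_list and ligand.copies_in_structure > 15:
        return False
    # Rule 2: too few contacts, or contacts only on consecutive residues.
    contacts = set(ligand.contact_residue_numbers)
    if len(contacts) < 2 or only_consecutive(contacts):
        return False
    # Rule 3: keep non-artifact ligands, or ligands named in the abstract.
    return (not on_artifact_list) or ligand.mentioned_in_abstract
```

For example, with an artifact list containing glycerol ("GOL"), a glycerol ligand that passes the first two rules would be kept only if the abstract mentions it.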

Q6: What if a domain binds multiple ligands using the same interface?

In modeling per-domain-position binding frequencies, we do not require domain positions to interact exclusively with particular ligands. Indeed, the same domain might interact with multiple different ligands using the same subset of domain positions across structures. For instance, in the RRM domain (PF00076), position 2 and positions 27-45 are involved in binding both DNA and RNA. Such domains would be included in multiple ligand-binding categories.

To identify such multi-ligand binding interfaces in a given domain, we include the "ALL_" ligand category in the downloadable InteracDome files indicating how each domain position is involved in binding any ligand type. For those domains with multiple binding partners, high binding frequencies in the "ALL_" category may indicate that those positions are involved in binding multiple ligand types.
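Once per-position frequencies have been loaded for several ligand types, candidate shared interfaces can also be looked for directly. The Python sketch below assumes the frequencies are already in memory as one list per ligand type; the function name, the dictionary keys, and the 0.5 threshold are all illustrative choices and are not part of InteracDome.

```python
def shared_positions(freqs_by_ligand, threshold=0.5, min_ligand_types=2):
    """Return 1-indexed domain positions bound by several ligand types.

    `freqs_by_ligand` maps a ligand-type label to a list of per-position
    binding frequencies; all lists must cover the same domain positions.
    A position counts as bound by a ligand type when its frequency is at
    least `threshold` (an arbitrary cutoff chosen for illustration).
    """
    lengths = {len(freqs) for freqs in freqs_by_ligand.values()}
    if len(lengths) > 1:
        raise ValueError("all ligand types must cover the same positions")
    n_positions = lengths.pop() if lengths else 0
    shared = []
    for index in range(n_positions):
        n_binding = sum(
            1 for freqs in freqs_by_ligand.values() if freqs[index] >= threshold
        )
        if n_binding >= min_ligand_types:
            shared.append(index + 1)
    return shared


# Made-up frequencies for a three-position domain.
example = {"DNA": [0.0, 0.6, 0.1], "RNA": [0.2, 0.7, 0.0]}
positions = shared_positions(example)
```

Unlike a high value in the "ALL_" category, which can come from a single ligand type, this check requires each of at least two ligand types to exceed the cutoff at the same position.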

Q7: Why can't I see how my domain interacts with specific ions or small molecules?

Only domain interactions with DNA base and backbone, RNA base and backbone, peptides, and curated groups of "ions", "metabolites" and "small molecules" are available for interactive browsing on the webserver. Domain interactions with specific small molecule or ion ligands (e.g., ATP or zinc) are available for download only. The reason for this difference is that certain domains can interact with tens or more highly similar small molecules, and the webserver slows down substantially when generating nearly identical binding frequency plots for each of these. We feel that a reasonable compromise is to interactively generate plots only for the ligand groups that we describe in our main paper, and to make these plus all specific small molecule and ion interactions available for download.

Something else?

If you did not find an answer to your question here, please send a message to interacdome@princeton.edu.

Search Protein Sequence for InteracDome Domains

Paste your amino acid protein sequence into the box below to search for instances of interaction domains and view corresponding per-position ligand-binding frequencies.

Sample input: CTCF

Download Database Files

We provide the following sets of per-domain-position binding frequency values, which are described in detail in our paper. Briefly, these sets are:

  • Representable-NR Interactions correspond to domain-ligand interactions that had nonredundant instances across three or more distinct PDB structures. We recommend using this collection to learn more about domain binding properties.

    Representable-NR Domain-Ligand Interactions

  • Confident Interactions correspond to domain-ligand interactions that had nonredundant instances across three or more distinct PDB entries and achieved a cross-validated precision of at least 0.5. We recommend using this collection to annotate potential ligand-binding positions in protein sequences.

    Confident Domain-Ligand Interactions

  • Representable Interactions correspond to domain-ligand interactions that have at least one representative structure in the PDB.

    All (Representable) Domain-Ligand Interactions
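The relationship between the three sets can be summarized with a small Python sketch. It only restates the thresholds listed above; the function name and its inputs are hypothetical, and the counts and cross-validated precision are assumed to have been computed elsewhere.

```python
def interaction_sets(n_structures, n_nonredundant_structures, cv_precision=None):
    """Name the download sets a domain-ligand interaction would fall into.

    n_structures: number of PDB structures with this interaction.
    n_nonredundant_structures: number of distinct PDB structures that
        contribute nonredundant instances.
    cv_precision: cross-validated precision, or None if not evaluated.
    """
    sets = []
    if n_structures >= 1:
        sets.append("representable")
        if n_nonredundant_structures >= 3:
            sets.append("representable-NR")
            if cv_precision is not None and cv_precision >= 0.5:
                sets.append("confident")
    return sets
```

Under this reading, every confident interaction is also in the representable-NR set, which in turn is a subset of the representable set.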

Download Source Code

All source code to regenerate InteracDome values from co-complex structures is available at http://github.com/Singh-Lab/InteracDome.