Description of DroID - the Drosophila Interactions Database (Data version 2018_08 updated 29 August 2018)

DroID assembles gene and protein interaction data from a variety of sources into one location. Drosophila interactome data in DroID can be accessed and downloaded at the DroID home page, http://www.droidb.org. The data can also be searched, integrated, graphed, and downloaded using IM Browser or the DroID Cytoscape plugin (for Cytoscape 2.x).

DroID is updated periodically. The current version is described in this document. Previous versions are described and available for downloading on the version history page.

Flybase gene IDs have been updated to Flybase release FB2018_03. As in previous versions, FBgn's were removed if they were ambiguous. If an old FBgn split into two new primary FBgns, interaction records involving it were deleted. Because of this it is possible for some data sets to have fewer interactions (or genes) than the previous version of DroID. Refer to Flybase Document for more information about primary and secondary FBgn's.
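The ID update described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual DroID pipeline: the function name, the structure of the mapping table, and the placeholder FBgn IDs are hypothetical, and the sketch also drops IDs that are missing from the mapping table.

```python
from typing import Dict, List, Set, Tuple


def update_interactions(
    interactions: List[Tuple[str, str]],
    old_to_primary: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Map each FBgn to its current primary FBgn and drop ambiguous records.

    old_to_primary (hypothetical structure) maps an old FBgn to the set of
    primary FBgns it corresponds to in the new Flybase release. An ID that
    maps to more than one primary FBgn (a split gene), or to none, is
    treated as ambiguous.
    """
    updated = []
    for gene_a, gene_b in interactions:
        targets_a = old_to_primary.get(gene_a, set())
        targets_b = old_to_primary.get(gene_b, set())
        if len(targets_a) != 1 or len(targets_b) != 1:
            continue  # ambiguous ID: the interaction record is deleted
        updated.append((next(iter(targets_a)), next(iter(targets_b))))
    return updated


# Placeholder IDs, for illustration only.
mapping = {
    "FBgn9000001": {"FBgn9000001"},                 # unchanged primary ID
    "FBgn9000002": {"FBgn9000010"},                 # secondary -> primary
    "FBgn9000003": {"FBgn9000011", "FBgn9000012"},  # split into two genes
}
records = [("FBgn9000001", "FBgn9000002"), ("FBgn9000001", "FBgn9000003")]
print(update_interactions(records, mapping))
```

Only the first record survives in this example, which is why a data set can shrink between DroID versions.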

Summary of DroID v2018_08 Data (updated 29 Aug 2018)*
Data set                             Number of interactions   Number of genes
PPI: Curagen yeast two-hybrid
PPI: Finley Lab yeast two-hybrid
PPI: Hybrigenics yeast two-hybrid
PPI: Perrimon Lab co-AP/MS
PPI: DPIM co-AP/MS                   17652                    3732
PPI: Flybase curated                 31491                    5125
PPI: From other databases**
PPI: Human interologs
PPI: Yeast interologs
PPI: Worm interologs
Total Unique Protein Interactions    327960                   11790
PDI: Transcription Factor-gene
RRI: miR-Gene
GI: Genetic Interactions - Flybase

*PDI and RRI were not updated for DroID_v2018_08.

** Protein-protein interactions from other major databases (listed below)

Below is a brief description of the various data sets. Definitions of the fields in each data set can be found here.

Normalized gene expression values - the Percent Max or pmax scale

As of DroID v2014_01, DroID includes normalized gene expression values that can be used to filter interaction data, as described in Murali et al. 2014 (PMID: 24913703). Based on FlyAtlas gene expression data, a gene is expressed in a particular tissue at some percentage of its maximal level across all tissues. In DroID, each gene has a 'percent maximum' or pmax value for each tissue. Likewise, based on modENCODE RNA-seq data, a gene is expressed at a particular developmental time point at some percentage of its maximal level across all time points. Thus, in DroID each gene also has a pmax value for each developmental time point. Filtering interactome data using the pmax values can reveal gene and protein networks that are likely to operate in specific tissues or developmental time points.
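As a worked illustration of the pmax scale, the following Python sketch converts one gene's expression values into percentages of its maximum across tissues (or time points). The function name and the example values are hypothetical; the actual normalization used in DroID is described in Murali et al. 2014 and may differ in detail.

```python
from typing import Dict


def pmax(expression: Dict[str, float]) -> Dict[str, float]:
    """Express each tissue (or time point) value as a percentage of the gene's maximum."""
    peak = max(expression.values(), default=0.0)
    if peak <= 0:
        # Gene not detected anywhere: report 0 rather than dividing by zero.
        return {condition: 0.0 for condition in expression}
    return {condition: 100.0 * value / peak for condition, value in expression.items()}


# Hypothetical expression values for one gene across three tissues.
example = pmax({"brain": 50.0, "ovary": 200.0, "testis": 20.0})
print(example)  # {'brain': 25.0, 'ovary': 100.0, 'testis': 10.0}
```

In this example the gene is maximally expressed in ovary, so a pmax filter of 20 or higher would retain it in brain and ovary but not in testis.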

Gene expression correlation values

As of DroID v2013_02, gene expression correlation values are calculated based on developmental expression profiles from modENCODE and tissue expression profiles from FlyAtlas. Both expression data sets can also be used to filter interaction data (see below). The Weighted Correlation value for each gene pair provides a measure of how frequently the two genes are expressed together across tissues or developmental time. The Weighted Correlation values can be displayed and used to filter lists of gene or protein interactions.

Gene expression data from modENCODE and FlyAtlas

First added to DroID v2011_08. Interaction data can be filtered using the gene expression data from the modENCODE project (PMID: 21179090) or from FlyAtlas (PMID: 17534367). The modENCODE data are from RNA-seq for 30 developmental time points from early embryos to adult males and females. Expression levels are represented as "fragments (sequenced) per kilobase of transcript per million fragments mapped" or FPKM. Interaction data can be filtered at different FPKM values for each developmental time point by selecting the "Life Cycle Expression" link at the top of a list of interactions. More information at modENCODE.org. FlyAtlas data are from DNA microarray-based gene expression profiling of ~25 tissues. More information at flyatlas.org.
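The expression unit and the filtering step can be illustrated with a short Python sketch. The first function follows the standard definition of fragments per kilobase of transcript per million fragments mapped (FPKM). The second is an assumption about how a threshold filter could work: it keeps an interaction only if both genes reach the threshold at the chosen time point, which may not match the rule used by the DroID web interface. The function names, time point labels, and values are hypothetical.

```python
from typing import Dict, List, Tuple


def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million fragments mapped."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def filter_by_expression(
    interactions: List[Tuple[str, str]],
    fpkm_by_gene: Dict[str, Dict[str, float]],
    time_point: str,
    min_fpkm: float,
) -> List[Tuple[str, str]]:
    """Keep interactions whose two genes both reach min_fpkm at time_point (assumed rule)."""
    def expressed(gene: str) -> bool:
        return fpkm_by_gene.get(gene, {}).get(time_point, 0.0) >= min_fpkm

    return [(a, b) for a, b in interactions if expressed(a) and expressed(b)]


# 500 fragments on a 2,000 bp transcript, 10 million mapped fragments in total.
print(fpkm(500, 2000, 10_000_000))  # 25.0

# Hypothetical genes and time point labels.
levels = {
    "geneA": {"embryo_0-2h": 30.0},
    "geneB": {"embryo_0-2h": 12.0},
    "geneC": {"embryo_0-2h": 1.5},
}
pairs = [("geneA", "geneB"), ("geneA", "geneC")]
print(filter_by_expression(pairs, levels, "embryo_0-2h", 10.0))  # [('geneA', 'geneB')]
```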

Transcription Factor (TF) - Gene Interactions

First added to DroID v2010_08. This table contains interactions between TFs and specific genes that they may regulate. The current release includes high quality, curated TF-gene interactions from the REDFly database (PMID: 18039705) and experimentally determined TF-gene interactions from the modENCODE project (PMID: 21177974). These are a subclass of protein-DNA interactions (PDI) for which there is experimental evidence that the TF binds to the gene and regulates its transcription. The modENCODE interactions have been inferred from the binding profiles of TFs in the genomic regions of the target genes using genome-wide location analysis (ChIP-chip and ChIP-seq). Links to REDFly and modENCODE and the original literature citations are included in the data.

microRNA - Gene Interactions

First added to DroID v2010_08. This table includes putative regulatory interactions between microRNAs or miRs and their target genes. Since miRs regulate their targets by base pairing with target RNA, in molecular interaction terms these are RNA-RNA interactions (RRI). The miR-Gene interactions currently in DroID are from TargetScanFly, MinoTar and the modENCODE project. TargetScanFly interactions are predicted based on base complementarity in the 3’ UTRs of target genes (PMID: 15652477; PMID: 17989254), MinoTar interactions are predicted based on conserved miR targeting within protein-coding regions (PMID: 20729470), and the modENCODE interactions (PMID: 21177974) are predicted based on the genome-wide occurrence of evolutionarily conserved miR seed motifs.

Phenotype and Gene Expression terms

First added to DroID v2010_08. Genes can be searched and interaction data can be filtered based on gene expression and phenotype terms from Flybase controlled vocabularies (CV). To search for genes on the DroID Home/Search page click on the "Phenotype" or "Gene Expression" check box and enter a term, such as "female sterile" or "eye disc". Similarly, these terms can be used to filter interactions from a list of interactions page. Phenotype and Gene Expression data are from Flybase. For a small set of genes the phenotype terms were modified from Flybase to enable efficient searching and filtering. These are: FBgn0004635 FBgn0003731 FBgn0011674 FBgn0000492 FBgn0000014 FBgn0003205 FBgn0000490 FBgn0004644 FBgn0003944 FBgn0000463 FBgn0004647 FBgn0002973 FBgn0004009.

Genetic interactions - from Flybase

Gene-gene interactions downloaded from Flybase. These represent interactions between two gene alleles. For example, an allele of one gene may enhance or suppress the phenotype of an allele in another gene. Alternatively, the combination of two alleles may result in a "synthetic" phenotype not observed for either of the individual alleles.

Protein-protein interactions

DroID includes protein-protein (PPI) and protein-DNA (PDI; i.e., transcription factor-gene) interactions. Although a gene may encode multiple proteins, the methods used to detect PPI and PDI rarely record which protein variant from a gene was used. Thus, interactions involving proteins are represented by pairs of genes. The precise way to interpret a protein interaction represented as "gene 1 - gene 2" is that one or more proteins encoded by gene 1 interact with one or more proteins encoded by gene 2. The gene identifiers used in this database are Flybase Gene Numbers, FBgn.

Protein-protein interactions from other databases - these are experimentally derived physical interactions other than those from the major high throughput datasets listed separately below. These interactions are collected from other large databases (BioGRID, IntAct, MINT, and BIND) at each refresh of DroID. As of DroID v2014_10, interactions curated by MINT are obtained from IntAct and not from MINT directly. The original database source and information is available for each interaction. This includes links to original publications for each interaction.

Flybase protein-protein interactions - these are experimentally derived physical interactions curated from the literature by Flybase.

Perrimon coAP complex - Protein interactions determined in large-scale co-affinity purification (co-AP)/MS screens in the Perrimon Lab. The co-complex data was converted to binary interactions using the hub-spoke model, where baits are predicted to interact with each of the co-purified proteins.

Perrimon CoAP complex data - 11/23/2011. Data from Friedman et al., 2011. Includes 384 interactions among 252 proteins determined using 15 canonical components of RTK/Ras/ERK pathways as baits. C-terminally TAP-tagged bait proteins were expressed in stably transfected S2R+ cells at baseline or stimulated with either insulin or EGF, complexes were affinity purified, and associated proteins were determined by LC-MS/MS. The data includes dataset-specific confidence scores (SAINT scores). This dataset includes interactions above the author-determined SAINT score cutoff of 0.83 and FDR of 7.2%. Friedman et al. correlated this interactome data with RNAi screens designed to detect genes required for EGF- or insulin-stimulated ERK activation. The interaction data can be searched or filtered using the RNAi data with IM Browser. Friedman et al., Science Signaling 25 October 2011 (PMID: 22028469).

DPIM coAP complex - Protein interactions determined in large-scale co-affinity purification (co-AP)/MS screens by the Drosophila Protein Interaction Mapping (DPIM) project, a collaboration among the laboratories of Spyros Artavanis-Tsakonas, Steven Gygi, Susan Celniker, and K. Vijay Raghavan. The co-complex data was converted to binary interactions using the hub-spoke model, where baits are predicted to interact with each of the co-purified proteins. This data set does not include matrix interactions, which are predicted interactions between co-purifying prey proteins.
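The difference between the hub-spoke model and the matrix model can be sketched in Python as follows. This is a minimal illustration of the two general conversions, not the code used to build DroID; the function names and the bait and prey labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def spoke_pairs(purifications: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Hub-spoke model: the bait is paired with each co-purified protein."""
    return {
        (bait, prey)
        for bait, preys in purifications.items()
        for prey in preys
        if prey != bait
    }


def matrix_pairs(purifications: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Matrix model: co-purified prey proteins are also paired with each other."""
    pairs = set()
    for bait, preys in purifications.items():
        others = sorted(set(preys) - {bait})
        pairs.update(combinations(others, 2))
    return pairs


# One hypothetical purification: a bait and three co-purified proteins.
data = {"baitA": ["prey1", "prey2", "prey3"]}
print(sorted(spoke_pairs(data)))
# [('baitA', 'prey1'), ('baitA', 'prey2'), ('baitA', 'prey3')]
print(sorted(matrix_pairs(data)))
# [('prey1', 'prey2'), ('prey1', 'prey3'), ('prey2', 'prey3')]
```

Only pairs of the first kind are included in the co-AP/MS data sets described here.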

DPIM coAP complex data - DroID_v2017_08 update. Data from Guruharsha et al., 2011. C-terminally FLAG-HA-tagged bait proteins were expressed in transiently transfected S2R+ cells, the bait and associated proteins were immunoaffinity purified with anti-HA resin, and proteins were identified by LC-MS/MS. 4,273 full-length bait clones were used, 3,488 of which resulted in successful purifications. This dataset includes only bait-prey interactions that were found in both Table S1 and Table S3 and that involve proteins that interacted with fewer than 10% (313) of all bait proteins. The data includes dataset-specific confidence scores (HGSCores) and other information originally obtained from the DPIM project web site. Guruharsha et al., Cell 28 October 2011 (PMID: 22036573). High confidence matrix interactions from Guruharsha et al. can be found in PPI curated by Flybase and other databases.
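The two selection criteria above (presence in both tables, and exclusion of proteins that interact with a large fraction of baits) could be applied roughly as in the following Python sketch. It rests on assumptions that the description does not specify: the bait count per protein is computed on the pairs shared by both tables, and the frequency limit is applied to the prey protein. The function name and the labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (bait, prey)


def select_bait_prey(
    pairs_table_a: Iterable[Pair],
    pairs_table_b: Iterable[Pair],
    n_baits: int,
    max_fraction: float = 0.10,
) -> Set[Pair]:
    """Keep pairs found in both tables whose prey is seen with < max_fraction of baits."""
    shared = set(pairs_table_a) & set(pairs_table_b)

    # Count the distinct baits each prey co-purified with (assumed counting rule).
    baits_per_prey: Dict[str, Set[str]] = defaultdict(set)
    for bait, prey in shared:
        baits_per_prey[prey].add(bait)

    limit = max_fraction * n_baits
    return {(bait, prey) for bait, prey in shared if len(baits_per_prey[prey]) < limit}


# Hypothetical example with 20 baits, so the limit is 2 baits per prey.
table_a = [("b1", "sticky"), ("b2", "sticky"), ("b3", "sticky"), ("b1", "p"), ("b3", "q")]
table_b = [("b1", "sticky"), ("b2", "sticky"), ("b3", "sticky"), ("b1", "p")]
print(select_bait_prey(table_a, table_b, n_baits=20))  # {('b1', 'p')}
```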

Finley YTH - Includes protein interaction data generated in the Finley laboratory using the LexA yeast two-hybrid system, mostly from high throughput screens. The project is described here and is ongoing. Data versions are as follows.

Finley YTH v1.0 - 08/01/2004 - 423 interactions detected in a pilot screen using randomly selected Drosophila "bait" BD proteins. A list of the BD proteins used is here. (Zhong, Patel, Zhang, Mangiola, Stanyon, Finley, unpublished).
Finley YTH v2.5 - 12/10/2004 - Added 1,814 interactions detected in screens with 152 proteins related to cell cycle regulators. This data is described in Stanyon et al., 2004, Genome Biology, 5(12):R96. (PMID: 15575970)
Finley YTH v2.6 - 2/16/2007 - Secondary FBgn's mapped to primary FBgn's. Ambiguous FBgn's removed.
Finley YTH v3.0 - 7/2/2008 - Added results from a Y2H screen that tested computationally predicted protein-protein interactions. Described in Schwartz et al., 2009 (PMID: 19079254). Two different types of predictions were tested, distinguished by data in the SCREEN field: either "Test of combined evidence predictions (JY) 9_2006" or "Test of conservation-based predictions from Sharan 2005 PMID:1568750". A number of random pairs were also tested and found positive, indicated by "Test of random pairs 9_2006".
Finley YTH v4.0 - 9/18/2010 - Added results from two ongoing projects, including ~4,000 interactions detected in a genome-wide screen using baits with no previously detected YTH interactions ("Untouched Proteome 2010" screen) and ~2,000 interactions detected in tests of interactions originally reported using the Gal4 system by Hybrigenics or Curagen (see below) ("Gal4 retests in LexA system" screen). These two datasets are unpublished. When using them please cite the DroID web site.

Curagen YTH - Protein interactions detected in a high throughput yeast two-hybrid screen conducted at Curagen (New Haven, CT) in collaboration with the Finley lab. All of the interactions were assigned dataset-specific confidence scores, with roughly one quarter of them falling into the high confidence set (scores >0.5). This data was described in Giot et al., 2003, Science 302, 1727-1736. PMID: 14605208

Hybrigenics YTH - Protein interactions detected in high throughput yeast two-hybrid screens conducted at Hybrigenics (Paris, France). They used 102 bait proteins to detect >2,300 interactions, and assigned 710 of these to a high confidence group. This data was described in Formstecher et al., 2005, Genome Research 15, 376-384. PMID: 15710747. Hybrigenics provides interaction data based on internal coding sequence ids, some of which could not be mapped to protein coding FBgns.

Interolog data

Predicted interactions between Drosophila proteins based on experimental evidence for interactions between orthologous proteins in other species. At each refresh of DroID we collect interactions for yeast, worm, and human from online interaction databases (noted below). Proteins for each species are mapped to Fly orthologs using the orthology mapping algorithms InParanoid or DIOPT. The dates that original data was downloaded are noted in each table.

Yeast Interologs - Yeast interactions were downloaded from BioGRID and IntAct. The integrated interaction set was then mapped to Fly interologs using DIOPT. For each interolog, the source databases containing the original yeast interaction and the associated PubMed IDs are given.

Worm Interologs - Worm interactions were downloaded from BioGRID and IntAct. The integrated interaction set was then mapped to Fly interologs using DIOPT. For each interolog, the source databases containing the original worm interaction and the associated PubMed IDs are given.

Human Interologs - Experimentally determined human protein interactions were downloaded from BioGRID and IntAct. Genes in the integrated interaction set were mapped to Drosophila orthologs using DIOPT. For each interolog, the source databases containing the original human interaction and the associated PubMed IDs are given.
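The general idea of interolog mapping can be sketched in Python as follows. This is an illustration only: it assumes that every combination of fly orthologs of the two interacting proteins is predicted to interact, which may be more permissive than the procedure used for DroID. The function name, the ortholog table structure, and all identifiers are placeholders, and the ortholog table stands in for output from a tool such as DIOPT.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def map_interologs(
    source_interactions: Iterable[Tuple[str, str, str]],
    fly_orthologs: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Transfer interactions from another species to fly gene pairs.

    source_interactions holds (gene_a, gene_b, pubmed_id) records from the
    other species. The result maps each predicted fly gene pair to the
    PubMed IDs that support the original interaction.
    """
    predicted: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for gene_a, gene_b, pubmed_id in source_interactions:
        for fly_a in fly_orthologs.get(gene_a, set()):
            for fly_b in fly_orthologs.get(gene_b, set()):
                pair = tuple(sorted((fly_a, fly_b)))  # unordered gene pair
                predicted[pair].add(pubmed_id)
    return dict(predicted)


# Placeholder identifiers, for illustration only.
orthologs = {
    "YGENE1": {"FBgn9000001"},
    "YGENE2": {"FBgn9000002", "FBgn9000003"},
}
yeast = [("YGENE1", "YGENE2", "PMID:0000001"), ("YGENE1", "YGENE3", "PMID:0000002")]
for pair, evidence in sorted(map_interologs(yeast, orthologs).items()):
    print(pair, sorted(evidence))
```

The second yeast interaction yields no interolog because one of its partners has no fly ortholog in the table.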