The experiments and analyses described in this study were based on a representative subset of the complete SGA dataset, unless otherwise noted. The subset includes ~19 million double mutants covering more than 80% of all double mutants tested, and consists of ~460,000 negative and ~275,000 positive genetic interactions, identified at an intermediate confidence threshold on the genetic interaction score that was previously defined (1). The nonessential x nonessential (NxN), essential x essential (ExE) and essential x nonessential (ExN) genetic networks based on this subset of data are described in detail below and are provided in various formats as Supplementary Data Files (Data Files S1-S2). We note that Data Files S1-S2 contain raw interaction data corresponding to all tested gene pairs. These data should be filtered for specific applications. We suggest three different thresholds [lenient (P < 0.05), intermediate (P < 0.05 and |ε| > 0.08), and stringent confidence (P < 0.05 and either ε > 0.16 or ε < -0.12)] that strike different balances between false negatives and false positives, as described in our previous study (1). Quality estimates for the data produced at each of these thresholds were computed as described below and are provided in fig. S2. Pearson Correlation Coefficient (PCC) matrices used to generate the essential, nonessential and global similarity networks (Fig. 1-2) are also provided (Data File S3). The dataset can also be browsed interactively at http://thecellmap.org/.
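The suggested thresholds can be applied to the raw scores along the following lines. This is a minimal sketch, not the pipeline used in the study: the function name `passes_threshold` and the representation of a gene pair as a score ε plus a P-value are assumptions made for illustration, and the stringent cutoff is read as ε > 0.16 or ε < -0.12.

```python
def passes_threshold(eps, p_value, level="intermediate"):
    """Return True if a gene pair meets the named confidence threshold.

    eps     -- genetic interaction score (negative or positive)
    p_value -- P-value associated with the score
    level   -- "lenient", "intermediate" or "stringent"
    (hypothetical helper; names are illustrative, not from the data files)
    """
    if p_value >= 0.05:
        # All three thresholds require P < 0.05.
        return False
    if level == "lenient":
        return True
    if level == "intermediate":
        return abs(eps) > 0.08
    if level == "stringent":
        # Asymmetric cutoffs for positive and negative interactions.
        return eps > 0.16 or eps < -0.12
    raise ValueError(f"unknown threshold level: {level}")
```

For example, a pair with ε = -0.10 and P = 0.01 passes the intermediate threshold but not the stringent one.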
The complete genetic interaction dataset, based on analysis of ~23 million double mutants, maps ~550,000 negative and ~350,000 positive genetic interactions and covers ~90% of all yeast genes as array and/or query mutants. It can be downloaded from http://thecellmap.org/costanzo2016/.
Genes on the DMA and TSA with the highest 20% (Table S1) and lowest 20% (Table S2) negative GI degrees, based on the intermediate threshold, were tested for associations with gene features. In cases where multiple alleles of one query or array gene were screened (about one third of the genes represented on the TSA), one allele was selected randomly before calculating degree. Part 1 of each table (S1-1 and S2-1) shows results of two-tailed hypergeometric tests of enrichment and depletion for the presence of binary gene features in the gene set. All P-values are doubled to correct for performing two tests for each feature. Only the lower of the two P-values is shown in the table and, if significant (P < 0.05), the “Test result” column indicates which test was used. Part 2 of each table (S1-2 and S2-2) shows results of Wilcoxon rank-sum tests on continuous and ordinal gene features, comparing genes in the high- and low-20% set to all other genes. The “Test result” column indicates how the median degree of the focus gene set compares to other genes if the P-value is significant; if the medians are equal and the P-value is significant, the mean degrees were compared instead and the result column says “(mean)”. “N/A” indicates that < 60% of data is present for genes on the array and the test was not done. Uncorrected P-values are listed for all features. Given that analysis of different features required using different statistical tests and some features are not expected to be independent of each other, no multiple hypotheses correction procedures were used. We do note that 31 gene features were tested.
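The enrichment and depletion tests for a binary feature can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function names are hypothetical, and the two one-sided hypergeometric tail probabilities are each doubled, with the lower one reported together with its direction.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n genes from N, of which K carry the feature."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_depletion_test(k, N, K, n):
    """Return (doubled P-value, direction) for k feature-positive genes
    observed in a gene set of size n.

    N -- number of genes in the background
    K -- number of background genes carrying the feature
    """
    lo = max(0, n - (N - K))   # smallest possible overlap
    hi = min(K, n)             # largest possible overlap
    p_enrich = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    p_deplete = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    # Double each P-value to account for running two tests per feature,
    # then report only the lower of the two.
    if p_enrich <= p_deplete:
        return min(1.0, 2 * p_enrich), "enrichment"
    return min(1.0, 2 * p_deplete), "depletion"
```

Capping the doubled value at 1 is a choice made here for readability; the tables may report the doubled values differently.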
Gene pairs corresponding to a subset of 302 positive genetic interactions involving a proteasome component (100 interactions) or gene pairs not involving a proteasome gene (202 interactions) were selected and confirmed by spot dilution growth assays on solid agar medium. Positive interactions involving the proteasome exhibited SGA scores (ε) ranging from 0.09 to 0.36, whereas SGA scores (ε) for non-proteasome gene pairs ranged from 0.19 to 0.44. Despite a lower score magnitude, 32% of positive interactions involving the proteasome were classified as genetic suppression compared to only 24% of positive interactions not involving the proteasome. Based on these results, we estimated that the proteasome participates in 1168 genetic suppression interactions compared to 544 suppression interactions involving non-proteasome genes in the ExE network.
(A) Scatter plot comparing negative genetic interaction (GI) degree of DAmP and TS alleles of the same gene. (B) Scatter plot comparing positive GI degree of DAmP and TS alleles for the same gene. The Spearman correlation is noted on each plot and illustrates that TS and DAmP alleles for the same gene do not tend to show similar numbers of negative and positive genetic interactions. (C) Distribution of negative GI degree of all TS and DAmP queries against the essential genes on the TSA. (D) Distribution of positive GI degree of all TS and DAmP queries against the essential genes on the TSA. In (C) and (D) the header shows the P-value derived from a rank-sum test. TS alleles have higher positive and negative genetic interaction degree than DAmP mutants. (E) Single mutant fitness (SMF) distribution of all TS and DAmP query alleles screened against the TSA. The header shows the P-value derived from a rank-sum test. TS alleles exhibit greater fitness defects than DAmP alleles. (F) Negative query strain GI degree of TS and DAmP alleles binned by SMF. (G) Positive query strain GI degree of TS and DAmP alleles binned by SMF. All GI degrees were averaged across 100 randomizations where a single array allele per gene was randomly selected.
(A) Average precision and recall of genetic interactions (GI) derived from screening both essential and nonessential gene queries against the essential TS array (TSA) and nonessential deletion mutant array (DMA) at different SGA score cutoffs. These estimates are based on a subset of 40 screens (14 queries against the TSA and 26 queries against the DMA) that were each replicated 5 times and true interactions were assumed to be interactions that were found in two or more of the replicates. Error bars indicate the standard deviation. In general, negative and positive interactions derived from screens against the same mutant array exhibit similar levels of technical reproducibility. Positive and negative GIs involving essential genes tend to be more reproducible than nonessential gene interactions. (B) Fraction of GIs that were identified at least 1, 2, 3, 4, or 5 times out of 5 screens. The average fraction of GIs for queries with at least 5 replicates is shown for actual called interactions (blue lines for negative GIs; yellow lines for positive GIs) and randomly selected gene pairs (grey lines). These results are based on an intermediate genetic interaction score cutoff (genetic interaction score, |ε| ≥ 0.08, P < 0.05). (C) Fraction of interacting genes that are co-annotated to the same GO biological process term. GIs were grouped by the number of times they were found in the 5 replicates for a given query. The average across queries with at least 5 replicates is shown. The dashed line indicates the background co-annotation rate for non-interacting pairs and (*) indicates statistically significant differences from background (P < 0.05). These results are based on an intermediate genetic interaction score cutoff (genetic interaction score, |ε| ≥ 0.08, P < 0.05). 
These results show that highly reproducible negative interactions (blue) are more likely to connect gene pairs that are annotated to the same GO biological process terms for both essential and non-essential genes. The relationship between functional relationships and reproducibility of positive interactions was weaker for non-essential genes and a similar trend between functional relationship and interaction reproducibility was not observed for essential genes. See Methods Section entitled, “Estimating reproducibility of genetic interactions”.
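The replicate-based quantities in panels (A) and (B) can be illustrated with a short sketch. It assumes each replicate screen is represented as a set of called gene pairs and that the reference set contains pairs called in two or more replicates, as stated above; how each individual screen was scored against that reference in the study is not specified here, so the helper names and the evaluation of single replicates are assumptions.

```python
from collections import Counter

def replicate_counts(replicates):
    """Count in how many replicate screens each gene pair was called."""
    counts = Counter()
    for calls in replicates:
        counts.update(set(calls))
    return counts

def reference_set(counts, min_replicates=2):
    """Pairs treated as true interactions: found in >= min_replicates screens."""
    return {pair for pair, c in counts.items() if c >= min_replicates}

def precision_recall(calls, reference):
    """Precision and recall of one set of calls against the reference set."""
    calls = set(calls)
    tp = len(calls & reference)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

def fraction_found_at_least(counts, k):
    """Fraction of all called pairs that appear in at least k replicates."""
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c >= k) / len(counts)
```

Averaging `precision_recall` over replicates and queries at a series of score cutoffs would give curves of the kind summarized in panel (A).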
Circles represent all unique pairs of genes, unique essential gene pairs (ExE), unique essential-nonessential gene pairs (ExN) and unique nonessential gene pairs (NxN) encoded in the S. cerevisiae genome. The fraction (%) of gene pairs tested for genetic interactions in the complete SGA dataset, ExE, ExN and NxN networks described in this study (http://thecellmap.org/costanzo2016/) are shown in blue, and the fraction of tested pairs relative to all annotated genes is indicated, whereas the fraction of all tested pairs relative to the number of gene pairs that could be tested by SGA is indicated in brackets. The fraction of gene pairs which have not yet been tested is shown in white and the fraction of gene pairs which could not be tested is shown in grey. The fraction of “Untestable” gene pairs includes mutant strains that are incompatible with SGA technology (e.g. sterile mutants, histidine, arginine and lysine auxotrophs etc.), mutant strains that do not survive SGA selection steps due to extreme fitness defects, and genes that are not represented in our array and/or query mutant collections.
Genes with varying degrees of genetic interaction profile similarity were evaluated for overlap with either Gene Ontology biological process co-annotations or protein-protein interactions using precision-recall analysis as described in (62). For all plots, gene pairs were sorted by Pearson correlation coefficients reflecting similarity in their genetic interaction profiles. The dashed lines show the background rate of co-annotation for the relevant set of gene pairs. Precision-recall analysis was completed separately for genetic interaction profiles derived from the essential, nonessential and the combined genetic interaction profile similarity networks. (A) Gene pairs evaluated against a GO biological process standard. In general, a genetic interaction profile similarity network based on both nonessential and essential gene profiles (global similarity network) identified functionally related gene pairs with the greatest accuracy. (B) Gene pairs evaluated against a protein-protein interaction standard. See Methods Section entitled, “Comparison of SGA genetic interactions with other genomic datasets and precision-recall analysis”.
Shown are the distributions of Pearson correlation coefficients (PCC) measuring genetic interaction (GI) profile similarity for all pairs of essential genes (purple triangles) and all pairs of nonessential genes (orange triangles). Also shown are PCCs for GI profiles corresponding to pairs of genes belonging to the same protein complex (purple circles for essential genes, orange circles for nonessential genes). Distributions were normalized (total area of 1) and smoothed over a 3-bin window. This analysis illustrates that GI profiles associated with essential genes encoding members of the same complex tend to share greater similarity to each other compared to GI profiles of nonessential genes encoding members of the same complex. This indicates that essential GIs, in general, tend to be more coherent than nonessential GIs.
We predicted Gene Ontology annotations for a set of GO biological process terms defined in (62). (A) Performance when predicting function for nonessential genes. Terms for which nonessential gene predictions were more successful when based on interactions with essential genes fall above the diagonal while nonessential gene predictions that were more accurate when based on interactions with other nonessential genes fall below the diagonal. Biological process terms that clearly fall in either area of the graph are labeled. Prediction performance was summarized for each GO term by computing the precision at 25% recall. The X-axis represents performance when only nonessential query mutant profiles were used to calculate array-array gene pair similarities. The Y-axis plots performance when only interactions from essential query mutants are used. (B) Performance when predicting function for essential genes. Terms for which predictions were more successful when based on interactions with other essential genes fall above the diagonal. Terms for which predictions were more successful when based on interactions with nonessential genes fall below the diagonal. (C) Complete precision-recall curves for the indicated terms are shown. The performance of all essential, an equal subset of nonessential queries, the complete set of nonessential queries, and all queries are shown. Predictions were made 50 times with different random allele samplings and error bars represent the standard error on precision over 50 iterations. For the complete dataset, all data (i.e. all alleles) were used, thus only a single set of results is shown. (D) Cumulative performance across all included GO terms of the K-nearest neighbor classifier is summarized for the various query mutant subsets used to calculate array similarities. The left panel shows results for nonessential genes while the right shows results when predicting function for essential genes. 
In general, we found that essential gene interaction profiles provided higher accuracy gene function predictions across a diverse set of bioprocesses when compared to nonessential gene interaction profiles. However, either the essential or nonessential similarity networks uniquely predicted certain functions. For example, interactions with nonessential genes were highly predictive of vacuolar transport and peroxisome functions, whereas interactions with essential genes were more informative for predicting chromosome segregation and RNA splicing related functions. Optimal functional prediction performance was achieved by using the global similarity network (black lines on graph). See Methods Section entitled, “Predicting gene function from essential versus nonessential genetic interaction profile similarity networks”.
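The per-term summary statistic used in panels (A) and (B), precision at 25% recall, can be computed from a ranked list of predictions as sketched below. The function name and the input representation (booleans ordered from the most to the least confident prediction, True marking a gene truly annotated to the term) are assumptions for illustration; the classifier that produces the ranking is not reproduced here.

```python
def precision_at_recall(ranked_labels, target_recall=0.25):
    """Precision at the first rank where recall reaches target_recall.

    ranked_labels -- booleans ordered from most to least confident
                     prediction; True marks a correct prediction.
    Returns None if there are no positives in the list.
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return None
    tp = 0
    for rank, is_pos in enumerate(ranked_labels, start=1):
        tp += is_pos
        if tp / total_pos >= target_recall:
            return tp / rank
    return None
```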
(A) A gene hierarchy based on a subset of high-confidence genetic interaction profiles corresponding to ~1000 genes, including both essential and nonessential genes that shared a highly similar genetic interaction profile (PCC > 0.6) with at least one other gene (see section above, “A genetic profile similarity-derived functional hierarchy” for details). Clusters identified at different levels of the hierarchy represent functional relationships of different specificity as indicated by the colored lines on the dendrogram. The colored heatmaps are colored to match the lines indicating their level of specificity on the hierarchy and the alternating shades of each color reflect different clusters (e.g. 10 clusters for the cell compartment level of the hierarchy). The grey-scale heatmap summarizes enrichment observed for the functional standards described in (B) and the Z-scores are calculated from histograms shown in (B). (B) To assess the concordance of hierarchical clusters with established functional data, we performed a hypergeometric enrichment test for each cluster identified at the same hierarchical level, using several functional standards including a protein localization standard based on automated image analysis of the yeast GFP collection (17, 119), GO biological process annotations (14), a protein complex standard (Data File S12) and a pathway standard, KEGG (120). We then measured the tendency of clusters with one or more annotation(s) in common to merge together at higher hierarchical levels, which correspond to weaker shared profile similarity. Finally, we randomized the parent assignments of each cluster and counted the number of sibling pairs sharing an annotation after each randomization. This randomization allows the distribution of clusters, their membership, their sizes, and the number of sibling pairs under each parent to remain fixed, while randomizing only the hierarchical relationships of the clusters themselves. 
Put differently, we simply shuffled the parent assignments across the children clusters. In any random instance, an existing sibling relationship can be preserved (with low probability) and each parent has the same number of children as it did before the shuffling, though their identities (along with their annotations) will have changed. We repeated the randomization process 1000 times to derive empirical distributions of expected annotation overlap with a randomized hierarchy (shown in gray). The observed number of siblings enriched for at least one common term derived from the real hierarchy is indicated with the black arrow along with the corresponding P-value determined empirically from the proportion of random iterations where the test statistic meets or exceeds the observed value. Each column represents a different functional annotation standard and rows correspond to results of the test at varying depths in the hierarchy. Rows 1, 2, and 3 show scores for the children of hierarchy levels corresponding to cell compartment (PCC=0.05), bioprocesses (PCC=0.2), and protein pathways/complexes (PCC=0.4) thresholds, respectively. This analysis illustrates that the hierarchical structure suggested by genetic interaction profiles is supported by all tested functional standards at the deepest levels of the hierarchy, including the most specific standards (protein complexes and pathways). Larger clusters formed at intermediate levels of the hierarchy are most strongly supported by GO biological processes. Finally, the structure at the highest levels of the hierarchy (weaker but significant genetic interaction profile similarity) was supported by a protein localization standard corresponding to coherent cellular compartments. See Methods Section entitled, “A genetic profile similarity-derived functional hierarchy”.
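The parent-shuffling randomization described in (B) can be sketched as a permutation test. This is a simplified illustration under stated assumptions: each child cluster is represented by its parent label and by the set of annotation terms for which it is enriched, and the function names, data layout and fixed random seed are hypothetical rather than taken from the study.

```python
import random
from itertools import combinations

def count_sibling_pairs(parents, annotations):
    """Count sibling pairs (same parent) sharing at least one annotation.

    parents     -- parents[i] is the parent label of child cluster i
    annotations -- annotations[i] is the set of enriched terms of cluster i
    """
    by_parent = {}
    for child, parent in enumerate(parents):
        by_parent.setdefault(parent, []).append(child)
    total = 0
    for siblings in by_parent.values():
        for a, b in combinations(siblings, 2):
            if annotations[a] & annotations[b]:
                total += 1
    return total

def empirical_p(parents, annotations, n_iter=1000, seed=0):
    """Shuffle parent labels across children and return the observed count
    and the fraction of shuffles whose count meets or exceeds it."""
    rng = random.Random(seed)
    observed = count_sibling_pairs(parents, annotations)
    shuffled = list(parents)
    hits = 0
    for _ in range(n_iter):
        # Permuting the labels keeps the number of children per parent
        # fixed while changing which clusters are siblings.
        rng.shuffle(shuffled)
        if count_sibling_pairs(shuffled, annotations) >= observed:
            hits += 1
    return observed, hits / n_iter
```

Because only the labels are permuted, cluster membership, cluster sizes and the number of sibling pairs under each parent are unchanged in every random instance, matching the constraints described above.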
(A) Processing extracts were prepared from wild type (WT) as well as ipa1-5001, cft2-1 and pcf11-2 TS mutants grown at 30°C or grown at 30°C and then shifted to 37°C for 2 hours where indicated. Assays were conducted for 20 min at 30°C. For the coupled cleavage-polyadenylation assays, extracts were incubated with ATP and 32P-labeled, full-length GAL7-1 RNA. For cleavage only assays, the reactions were performed as described above except that 3’-dATP was used instead of ATP. For poly(A) addition assays, the reactions were performed as above except that precleaved GAL7-9 RNA, ending at the poly(A) site, was used as a precursor. RNA products were resolved on a denaturing 5% polyacrylamide gel and visualized with a phosphorimager. The positions of substrate and products are depicted on the side of the images, and the lane marked “precursor” indicates the unreacted substrate for each reaction. (B) Distribution of identified poly(A) reads with respect to annotated transcription termination sites (69). The cumulative distribution is shown in the bottom panel. Significant differences between mutants and BY4741 were computed using the Wilcoxon rank-sum test for the 200 bp surrounding the annotated transcription termination site. These results show that strains expressing a mutant TS allele of IPA1 exhibit mRNA processing defects both in vitro and in vivo. See Methods section entitled, “IPA1 experimental validation”.
(A) Triple mutant (TM) genetic interaction analysis of MTC pathway genes. Individual MTC pathway deletion mutants were mated to double mutant strains carrying mutations in redundant components of the aromatic amino acid biosynthesis pathway. Diploid strains (genotype indicated) were sporulated and 15-20 tetrads from each cross were dissected. Representative tetrads from each cross are shown below the table. The growth of haploid progeny carrying all three selectable markers is summarized (TM phenotype and comments column). (B) Representative fluorescence micrographs highlighting yeast cells with wild-type and aberrant Bap2-GFP localization in may24 and mtc2 mutants. Bap2-GFP and FM4-64 vacuolar membrane staining, as well as an image of the fluorescent signal overlay are shown. See Methods section entitled, “MTC pathway experimental validation”.
Relative genetic interaction (GI) degrees of all strains on the TSA and DMA were calculated by counting interactions that met the intermediate thresholds and dividing by the total number of query mutants screened against the relevant array. (A) Distributions of negative GI degree for TS array (TSA) mutants. (B) Distributions of positive GI degree for TS array (TSA) mutants. (C) Distributions of negative GI degree for deletion array (DMA) mutants. (D) Distributions of positive GI degree for deletion array (DMA) mutants.
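The relative degree calculation can be written compactly. The sketch below assumes that the scores for one array strain are available as (ε, P-value) pairs, one per query mutant screened against that array, and uses the intermediate cutoffs (|ε| > 0.08, P < 0.05); the function name and input layout are illustrative, not the format of the data files.

```python
def relative_gi_degrees(scores, eps_cutoff=0.08, p_cutoff=0.05):
    """Return (negative_degree, positive_degree) for one array strain.

    scores -- list of (eps, p_value) tuples, one per query mutant
              screened against the array carrying this strain
    Each degree is the number of interactions passing the cutoffs
    divided by the total number of queries screened.
    """
    n_queries = len(scores)
    if n_queries == 0:
        return 0.0, 0.0
    neg = sum(1 for eps, p in scores if p < p_cutoff and eps < -eps_cutoff)
    pos = sum(1 for eps, p in scores if p < p_cutoff and eps > eps_cutoff)
    return neg / n_queries, pos / n_queries
```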
Negative (fig. S11) and positive (fig. S12) genetic interaction (GI) degrees were calculated for all strains on the DMA and TSA using the intermediate genetic interaction score threshold (genetic interaction score, |ε| ≥ 0.08, P < 0.05) and counting any interacting query strain. Wilcoxon rank-sum tests compared the GI degree in paired gene sets defined by absence and presence of each binary feature tested. (A) For uncorrected P-values meeting a P<0.05 threshold, the “Test result” column describes the degree of the set of genes for which the listed binary feature is true (compared to the set for which the feature is false). Tests were not performed, indicated by “N/A”, if data were present for fewer than 50 strains and strains with missing data were excluded from the tests. (B) Pearson’s correlation was used to measure associations between GI degree and features that are continuous or count-based. Error bars reflect the 95% confidence interval on the correlation coefficient. For all correlation analyses, only strains corresponding to essential genes on the TSA were included. Given that analysis of different features required using different statistical tests and some features are not expected to be independent of each other, no multiple hypotheses correction procedures were used. We do note that 29 gene features were tested. These analyses identify a set of physiological and evolutionary features associated with the frequency of negative and positive GIs for each gene, which can be exploited to predict genes that serve as highly connected genetic network hubs and thus general genetic modifiers in other organisms. The features examined in this analysis are described in the Methods section entitled “Genetic interaction degree and frequency analysis”. The same analysis was repeated based on negative and positive GI degree associated with nonessential and essential query mutant strains. 
The results were consistent with those shown here and the data are provided as Data File S10.
Genetic interaction (GI) frequencies were evaluated for genes annotated to specific GO Molecular Function terms, excluding any term annotated to fewer than five genes. The GI frequency associated with each GO term was determined using pairs of array gene mutants annotated to that GO term and was calculated as the ratio of negative (blue) or positive (yellow) GIs to the total number of array gene pairs screened. (A) The GI frequency among nonessential genes annotated to the specified GO term. (B) The GI frequency among essential genes annotated to the specified GO term. One random allele was selected in instances where multiple mutant alleles were available for the same gene. The dotted lines represent background frequency of negative (blue) and positive (yellow) GIs in the nonessential and essential genetic networks, which were calculated by adding all interactions and screened pairs that were counted for individual GO terms, then dividing the sums. Gene assignments to GO Molecular Function terms were obtained from the S. cerevisiae-specific GO slim terms, which were downloaded from http://geneontology.org/page/go-slim-and-subset-guide in Jan 2013. Only interactions that met the intermediate threshold (genetic interaction score, |ε| > 0.08, P < 0.05) were considered. See Methods section entitled “Genetic interaction degree and frequency analysis”.
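The per-term frequency and the pooled background can be illustrated as follows. The sketch assumes that, for each term, the number of interactions and the number of screened array gene pairs have already been counted; the function name and the dictionary layout are hypothetical.

```python
def gi_frequencies(term_counts):
    """Per-term GI frequencies and the pooled background frequency.

    term_counts -- {term: (n_interactions, n_pairs_screened)}
    The background is the sum of interactions over all terms divided by
    the sum of screened pairs over all terms.
    """
    per_term = {
        term: n_int / n_pairs
        for term, (n_int, n_pairs) in term_counts.items()
        if n_pairs > 0
    }
    total_int = sum(n_int for n_int, _ in term_counts.values())
    total_pairs = sum(n_pairs for _, n_pairs in term_counts.values())
    background = total_int / total_pairs if total_pairs else 0.0
    return per_term, background
```

Negative and positive interactions would be passed through this function separately to obtain the blue and yellow values.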
Genetic interaction (GI) frequencies were evaluated for genes encoding proteins containing a common PFAM-annotated domain, excluding any domain that appeared in fewer than five proteins. (A) The GI frequency among nonessential genes annotated to the specified PFAM domain. (B) The GI frequency among essential genes annotated to the specified PFAM domain. The GI frequency associated with each PFAM domain was determined using pairs of array gene mutants annotated to a specific PFAM domain and was calculated as the ratio of negative (blue) or positive (yellow) GIs to the total number of array gene pairs screened. One random allele was selected in instances where multiple mutant alleles were available for the same gene. The dotted lines represent background frequency of negative (blue) and positive (yellow) GIs in the nonessential and essential genetic networks, which were calculated by adding all interactions and screened pairs that were counted for individual PFAM domains, then dividing the sums. Gene assignments to PFAM groups were downloaded from http://pfam.xfam.org/proteome?taxId=559292 in May 2014. Only interactions that met the intermediate threshold (genetic interaction score, |ε| > 0.08, P < 0.05) were considered. See Methods section entitled “Genetic interaction degree and frequency analysis”.
Genetic interactions (GIs) were evaluated for overlap with either Gene Ontology biological process co-annotations or protein-protein interactions using precision-recall analysis as described in (62). For all plots, negative and positive GIs were thresholded at the intermediate cutoff (genetic interaction score, |ε| ≥ 0.08, P < 0.05) and sorted by |ε|. The dashed line indicates the background rate of co-annotation for the relevant gene pairs. Precision-recall analysis was completed separately for positive (yellow lines) and negative (blue lines) GIs on different sets of gene pairs and different standards. (A) Essential-nonessential (ExN) gene pairs evaluated against a GO biological process standard. (B) Nonessential-nonessential (NxN) gene pairs evaluated against a protein-protein interaction standard. (C) Essential-essential (ExE) gene pairs evaluated against a protein-protein interaction standard. (D) Essential-nonessential gene pairs (ExN) evaluated against a protein-protein interaction standard. See Methods Section entitled “Comparison of SGA genetic interactions with other genomic datasets and precision-recall analysis”.
Genetic interactions (GIs) and GI profile similarities were evaluated for overlap with either Gene Ontology biological process co-annotations or protein-protein interactions using precision-recall analysis as described in (62). For all analysis of direct GIs, negative GIs were thresholded at the intermediate cutoff (genetic interaction score, |ε| ≥ 0.08, P < 0.05) and sorted by |ε|. Pearson correlation coefficients (PCCs) were calculated for array-array mutant pairs (matrix columns) and only a single allele of each gene (selected at random) was used. (A) Comparison of functional information associated with direct negative GIs (blue line) versus profile similarity (Pearson, orange line) for essential genes evaluated against a GO biological process co-annotation standard. (B) Comparison of functional information associated with direct negative GIs (blue line) versus GI profile similarity (Pearson, orange line) for essential genes evaluated against a protein-protein interaction standard. (C) Comparison of functional information associated with direct negative GIs (blue line) versus profile similarity (Pearson, orange line) for nonessential genes evaluated against a GO biological process co-annotation standard. (D) Comparison of functional information associated with direct negative GIs (blue line) versus GI profile similarity (Pearson, orange line) for nonessential genes evaluated against protein-protein interaction standard. See Methods Section entitled “Comparison of SGA genetic interactions with other genomic datasets and precision-recall analysis”.
(A) Jaccard similarity coefficient between different alleles of the same gene (left) and between alleles of different genes (right). The median value across different alleles is shown for each gene. This analysis shows that alleles of the same gene tend to share more positive genetic interactions (GIs) in common than random pairs of alleles of different essential genes. (B) The same analysis as shown in (A), but restricted to profile similarity between query mutants belonging to genes comprising “Nuclear-related” functional clusters defined in Fig. 5D. Profile similarity in this analysis was based on positive GIs with genes found in the “Cytosolic/Vesicle traffic-related” functional clusters also defined in Fig. 5D. (C) The reciprocal analysis of (B), where profile similarity of query mutants in the “Cytosolic/Vesicle traffic-related” functional clusters was evaluated based on positive GIs with genes in the “Nuclear-related” functional cluster defined in Fig. 5D. P-values were calculated with a Wilcoxon rank-sum test. (D) Precision-recall curve for predicting protein complex membership based on similarity of positive GI profiles alone. Pairs of query mutants were sorted based on their Jaccard similarity coefficient calculated using binarized profiles of positive GIs. (E) The same analysis as in (D) but query mutants were restricted to those in the “Nuclear-related” functional cluster and GI profiles were based only on positive GIs involving genes belonging to the “Cytosolic/Vesicle traffic-related” functional cluster defined in Fig. 5D. (F) The reciprocal analysis where query mutants were restricted to those in the “Cytosolic/Vesicle traffic-related” functional cluster and the GI profiles were based only on positive GIs involving genes in the “Nuclear-related” functional cluster defined in Fig. 5D. The dashed grey line shows the background precision expected from randomly selected gene pairs. A list of genes and alleles used in this analysis is provided in Data File S6. 
See Methods section entitled “Evaluating functional coherence of positive interactions”.
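The Jaccard similarity used throughout this figure can be illustrated with binarized profiles represented as sets of interacting array genes. This is a minimal sketch: the function names are hypothetical, and returning 0 for two empty profiles is a convention chosen here, not one stated in the study.

```python
from itertools import combinations
from statistics import median

def jaccard(profile_a, profile_b):
    """Jaccard coefficient between two binarized positive GI profiles,
    each given as a collection of interacting array genes."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty profiles
    return len(a & b) / len(union)

def median_allele_similarity(allele_profiles):
    """Median Jaccard coefficient over all pairs of alleles of one gene.
    Returns None if fewer than two alleles are available."""
    pairs = list(combinations(allele_profiles, 2))
    if not pairs:
        return None
    return median(jaccard(a, b) for a, b in pairs)
```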
A pooled DAmP allele collection was screened against 92 diverse compounds. (A) Compared to the average score per compound (filled circles), a ynl181w DAmP allele mutant (open circle) exhibited a wide range of responses to different compounds, from strong resistance to strong sensitivity. For example, the ynl181w mutant strain was resistant to cycloheximide (translation inhibitor) and miconazole (ergosterol biosynthesis inhibitor) but highly sensitive to amphotericin (binds ergosterol) and micafungin (inhibits glucan synthesis). A list of the compounds screened and the corresponding Chemical Genetic Interaction (CGI) scores are provided in Data File S17. (B) Growth of a ynl181w-DAmP allele strain in liquid culture in the presence of the indicated compounds relative to an isogenic wild type strain grown in the same condition. The ynl181w mutant strain was sensitive to amphotericin B (binds ergosterol) and caspofungin (inhibits glucan synthesis) as predicted from the pooled experiment. The DAmP mutant showed greater resistance to poacic acid (binds glucan) and tunicamycin (inhibits glycosylation). Error bars indicate the standard error based on three biological replicates. See Methods section entitled “YNL181W chemical genetic screens”.
(A) Negative genetic interaction (GI) frequency in the essential (ExE) network for genes encoding members of the 19S and 20S proteasome subunits and negative GI frequency between 19S and 20S proteasome subunit genes. (B) The boxplot shows query gene profile similarity (PCC) between the query gene profiles of (i) the 19S proteasome subunit; (ii) the 20S proteasome subunit; (iii) the 19S and 20S proteasome subunits; (iv) the 19S subunit and all query genes not annotated to the proteasome; and (v) the 20S subunit and all query genes not annotated to the proteasome. Significance was assessed with Wilcoxon rank-sum tests. Results shown correspond to a representative randomization where a single query and allele were selected per gene for genes that have more than one allele. (C) Heterozygous deletion of yeast genes encoding 20S proteasome subunits showed increased sensitivity to the proteasome inhibitor, Bortezomib, compared to strains heterozygous for deletion of 19S proteasome genes. Yeast chemical-genetic interactions for Bortezomib were obtained from (30). Frequency distribution of the heterozygous chemical-genetic interaction profile scores (MADL) for Bortezomib at 200 nM for genes annotated to the 20S proteasome core particle (purple), 19S proteasome regulatory particle (orange) and all other heterozygous mutants (black). Heterozygous mutants in the 20S proteasome core particle show significantly more sensitivity to Bortezomib than the background (Wilcoxon rank-sum P-value < 10⁻¹⁰). Conversely, 19S proteasome regulatory particle gene mutants show significantly more resistance to Bortezomib than the background (Wilcoxon rank-sum P-value = 0.003).
(A) Cell fitness (ATP content in cell population) upon knockdown of components of the 19S (blue) or 20S (purple) subunit of the 26S proteasome without Bortezomib treatment as a fraction of the non-targeting RNAi reagent (targeting Rluc) effect (dashed line). Each dot represents the median of 6 measurements. The two subunits were not separated by the knockdown effects of their components (P = 0.976, Wilcoxon rank sum test, two-sided). (B) Dose-response curves to Bortezomib upon knockdown of quality-filtered 26S proteasome components. The effects of each knockdown to Bortezomib were normalized to the control (0 nM) treatment. Box plots represent 36 measurements of the non-targeting RNAi reagent at each Bortezomib concentration. For each knockdown of a 26S proteasome component, the lines connect the median of 6 measurements per knockdown and Bortezomib concentration. Dashed lines connect the median absolute deviation of the 6 measurements per knockdown and Bortezomib concentration. Purple lines represent 20S, blue lines 19S components. (C) Distinct sensitivities of 19S and 20S subunit components upon Bortezomib treatment. Boxplots represent the median values (of 6 measurements) for all components of the 20S (purple) or 19S (blue) subunit of the 26S proteasome. Members of the 19S and 20S subunits were separated from the control RNAi measurement at Bortezomib concentrations from 2 nM to 8 nM (P < 0.05, Wilcoxon rank sum test, two-sided), and members of those two proteasome subunits were separated from each other between 2 nM and 16 nM (P < 0.05, Wilcoxon rank sum test, two-sided). See Methods section entitled “RNAi and cell fitness measurements in Drosophila cell culture”.
Fold enrichment in genes containing at least 1, 2, or 3 structural domains among the 10% of genes that show the highest number of negative (blue) and positive (yellow) interactions with proteasome-encoding genes. A domain was only considered once in cases where a gene encoded multiple repeats of the same domain. The dotted line reflects the expected fraction of genes in the top 10% interactors with the proteasome and “*” marks significant differences from the background fraction. See Methods section entitled “Protein features associated with proteasome interacting genes”.
(A) Distribution of protein complexes enriched (yellow) or depleted (grey) for positive genetic interactions (GIs) in the nonessential (NxN) network (see above, Methods section entitled “Analysis of protein complexes enriched for positive interactions” for more details and Data file S15). Protein complexes that met the chosen threshold (N_fold_pos > 1.5, Data File S15) were evaluated, using Fisher's exact test, and found to be enriched for proteostasis-related functions (118). (B) Distribution of the positive:negative interaction ratio for protein complexes enriched for positive interactions in (A). Genes belonging to complexes biased for positive interactions in the nonessential genetic network (N_fold_pos > 1X and posGI_bias_with_N > 1.5X, Data File S15) did not show significant functional enrichment for genes with cell cycle phenotypes. (C) Genes belonging to complexes biased for positive interactions in the ExE network are enriched for negative genetic interactions with cell cycle checkpoint-related genes (www.yeastgenome.org). P-values shown are based on a one-sided Fisher’s test.
We applied an intermediate threshold on the genetic interaction score (|ε| ≥ 0.08, P < 0.05) and computed Jaccard similarity coefficients between TSA-derived positive genetic interaction (GI) profiles for query strains corresponding to members of the CCT, prefoldin and proteasome complexes. (A) The boxplot shows the Jaccard similarity coefficient of specific query mutants with the proteasome complex. The left box (Within) shows similarity for genes in the proteasome complex; the center box (CCT & prefoldin) shows similarity between genes of the CCT and prefoldin complexes with genes of the proteasome complex; the right box (Other) shows similarity of all other query mutants in the global network with genes in the proteasome complex. Similarity of query mutants for the same gene against the proteasome complex was averaged. (B) Similar to (A), but Jaccard similarities were calculated with respect to the CCT complex. (C) Similar to (A), but Jaccard similarities were calculated with respect to the prefoldin complex. (A-C) suggest that members of the CCT, proteasome and prefoldin complexes share positive GIs. (D) Precision-recall curve for predicting protein folding-related genes (118) using query mutant positive GI profile similarity (Jaccard) to the proteasome complex. This analysis shows that genes involved in protein folding-related functions (118) tend to share positive GIs with the proteasome. Query genes annotated to the proteasome, CCT, and prefoldin complexes were excluded from this analysis. See Methods section entitled “Evaluating functional coherence of positive interactions”.
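As a rough illustration of the similarity measure used in this figure, the Jaccard coefficient between two positive GI profiles can be computed by treating each profile as the set of array strains with which a query shows a positive interaction. The sketch below is a minimal Python example only: the function name, the strain labels, and the convention of returning 0.0 when both profiles are empty are assumptions made for illustration and are not taken from the analysis pipeline.

```python
def jaccard(profile_a, profile_b):
    """Jaccard coefficient between two sets of interacting array strains."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        # Assumption: two empty profiles are assigned a similarity of 0.0
        return 0.0
    return len(a & b) / len(union)


# Hypothetical positive GI profiles (array strain labels are made up)
query_1 = {"arr1", "arr2", "arr3", "arr4"}
query_2 = {"arr3", "arr4", "arr5"}

# 2 shared array strains out of 5 distinct strains in total
similarity = jaccard(query_1, query_2)
```

In this toy case the two queries share 2 of 5 distinct interacting array strains, giving a coefficient of 0.4.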
(A) Enrichment analysis for positive and negative genetic interactions (GIs) with the proteasome among DAmP and TS query mutants. Fold enrichment was calculated as the frequency of GIs between DAmP or TS query mutant alleles with the proteasome divided by the frequency of GIs between DAmP and TS query mutant allele interactions with all mutants on the TS array (TSA). GIs between different members of the proteasome complex were excluded. GI frequencies were averaged across 100 randomizations where a single query and array allele per gene were randomly selected. The dashed line indicates the expected frequency of interactions. Fold enrichment and significance of the enrichment with respect to background are shown on top of each bar. (B) Similar to (A), but GI frequency was calculated against mRNA decay-annotated essential TS array genes.
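The fold enrichment described in this legend is a ratio of two interaction frequencies. The following Python sketch shows that arithmetic for a single randomization, under the assumption that each frequency is the number of interactions divided by the number of tested pairs; the function names and all counts are hypothetical and serve only to illustrate the calculation.

```python
def interaction_frequency(n_interactions, n_tested):
    """Fraction of tested mutant pairs that show an interaction."""
    if n_tested <= 0:
        raise ValueError("at least one tested pair is required")
    return n_interactions / n_tested


def fold_enrichment(fg_hits, fg_tested, bg_hits, bg_tested):
    """Foreground GI frequency divided by background GI frequency."""
    background = interaction_frequency(bg_hits, bg_tested)
    if background == 0:
        raise ValueError("background frequency is zero")
    return interaction_frequency(fg_hits, fg_tested) / background


# Hypothetical counts: 30 GIs in 200 tested query-proteasome pairs,
# versus 500 GIs in 10,000 tested pairs against the whole array.
fold = fold_enrichment(30, 200, 500, 10_000)  # 0.15 / 0.05, approximately 3
```

A value near 1 would correspond to the expected frequency indicated by the dashed line; averaging over the randomizations mentioned in the legend is not shown here.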
This folder contains complete SGA genetic interaction data for the following:
The interaction datasets are provided in a tab-delimited format with 11 columns:
This folder contains complete SGA genetic interaction data matrix for the following:
Matrix files containing genetic interaction profile similarity values (as measured by Pearson correlation) for every pair of mutant strains in the dataset. Similarity values were computed for essential (ExE), non-essential (NxN) and the global similarity network derived from a combined set of all genetic interactions (ExE, NxN, ExN) as described above (see "Constructing genetic interaction profile similarity networks"). Each matrix contains 2 sets of row and column headers, providing a unique allele name for every mutant strain (row & column header #1) as well as a systematic ORF name (row & column header #2).
This file reports the performance of gene function prediction for non-essential or essential genes based on genetic interaction profiles. For both classes of genes (either nonessential or essential), the performance of a KNN classifier is reported as the Precision at 25% Recall based on interactions derived from TS queries (PR_TSQ) or nonessential deletion queries (PR25_SN). Although analyses were performed using complete genetic interaction profiles (i.e. negative and positive genetic interactions), similar prediction performance was obtained using genetic interaction profiles based on negative interactions alone.
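For readers unfamiliar with the metric, the sketch below shows one common way to compute precision at a fixed recall from a ranked list of predictions. It is an illustrative Python example under stated assumptions: it takes precision at the first rank where recall reaches the target, it does not reproduce the KNN classifier itself, and the function name, scores and labels are hypothetical.

```python
def precision_at_recall(scores, labels, target_recall=0.25):
    """Precision at the first rank where recall reaches target_recall."""
    # Rank predictions from highest to lowest score
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, is_positive in ranked if is_positive)
    if total_positives == 0:
        raise ValueError("no positive labels to recall")

    true_positives = 0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            true_positives += 1
            if true_positives / total_positives >= target_recall:
                return true_positives / rank
    raise ValueError("target recall was never reached")


# Hypothetical classifier scores and true annotations (1 = annotated)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [0, 1, 1, 0, 0, 1, 0, 1]
pr25 = precision_at_recall(scores, labels)
```

With 4 true positives, 25% recall is reached at the first correct prediction, which sits at rank 2, so the precision in this toy example is 0.5.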
This file lists the results from SAFE analysis of the global genetic profile similarity network (Fig. 1 and Fig. 2). Functional terms enriched within specific network clusters associated with GO biological processes (14) and/or protein complexes (Data File S12) are listed. A list of genes comprising each bioprocess-enriched cluster shown on the global similarity network is also provided. Functional terms enriched within specific network clusters associated with cell compartments (17, 119) are all shown on Fig. 2B.
The first tab (“Gene to hierarchy cluster mapping”) lists the clusters identified at each level of the genetic interaction-based hierarchy and the deletion and TS allele array mutants assigned to each cluster. Examples of clusters described in the main text are highlighted. The subsequent 9 tabs indicate enrichment of clusters resolved at the specified profile similarity range for specific cell compartments (Cyclops_enrich), biological processes (GO BP_enrich), protein complexes (complex_enrich) and KEGG pathways (KEGG_enrich). The final tab in the file indicates the clusters used to map the functional distribution of negative and positive interactions shown in Fig. 5D.
This file lists nonessential and essential query genes associated with high confidence pleiotropy scores based on their genetic interactions derived from the TSA (Essential derived pleiotropy) and DMA (Nonessential derived pleiotropy). The file also contains a second list of nonessential and essential query genes that participate in many genetic interactions but exhibited low pleiotropy scores indicating that these genes are more functionally specific.
This file lists proteins identified with high confidence as specific physical interactors of Ipa1 in strains expressing Ipa1-GFP from its endogenous locus or Ipa1-HA from a galactose-inducible plasmid.
This file lists the negative and positive interaction degree associated with every nonessential deletion (sn#), essential TS (tsq#), and DAmP (damp#) query mutant strain screened against the DMA (“query degree X DMA” tab) and/or TSA (“query degree X TSA” tab). A subset of strains were found to carry a second, spontaneous suppressor mutation that affected fitness of the query mutant strain. Strains carrying a suppressor mutation mapped through SGA analysis are indicated (“-supp”). Query mutants comprising the 20% highest and lowest degree groups of strains are indicated. Furthermore, a “Co-batch signal” rank is provided for every query (see “Co-batch filtering of query mutant strains”). Low ranks correspond to evidence for lingering batch effects. Another column, “Gene with correlated GI profiles that are co-annotated with the query gene (%)”, provides the percent of correlated gene pairs that are co-annotated to the particular query. A low negative interaction degree (e.g. 20% lowest negative interaction degree) coupled with a low co-batch rank (e.g. < ~0.2) and a low fraction of correlated pairs that share a similar functional annotation with a given query strain (e.g. < ~0.15) may be indicative of a low confidence screen. However, these criteria should be considered as loose indicators and not definitive metrics of screen quality and thus, should not be used as strict filters on the global interaction dataset. Another list (“Queries removed - batch effects” tab) indicates ~300 query strains that exhibited severe systematic batch effects and thus were removed from the indicated data set. Finally, two additional tabs provide the negative and positive interaction degree associated with every nonessential (“nonessential array degree” tab) and essential (“essential array degree” tab) array mutant, respectively.
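The three loose indicators described above can be combined into a simple flag for manual review. The Python sketch below is an illustration only: the function and parameter names are hypothetical, the cutoffs are the approximate example values quoted in the text, and the co-annotation value is assumed to be expressed as a fraction between 0 and 1 even though the column header is given in percent. In keeping with the caution above, the flag is meant to prompt closer inspection of a screen, not to filter the dataset.

```python
def flag_for_review(in_lowest_20pct_negative_degree,
                    cobatch_rank,
                    coannotated_fraction,
                    rank_cutoff=0.2,
                    coannotation_cutoff=0.15):
    """Return True only when all three loose indicators coincide."""
    return (in_lowest_20pct_negative_degree
            and cobatch_rank < rank_cutoff
            and coannotated_fraction < coannotation_cutoff)


# Hypothetical query strains (values are made up for illustration)
suspect = flag_for_review(True, 0.05, 0.10)      # all three indicators low
low_degree_only = flag_for_review(True, 0.80, 0.40)
```

Requiring all three indicators together reflects the wording "coupled with" in the description; a low degree alone is not treated as evidence of a poor screen.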
As a complement to analysis of array strains (fig. S11-S12), GI degrees were calculated for query strains by counting negative interactions (tab 1, interactions with DMA strains; tab 2, interactions with TSA strains) and by counting positive interactions (tab 3, interactions with DMA strains; tab 4, interactions with TSA strains). Essential and nonessential queries were analyzed separately and results are labeled by grouped column headers. Wilcoxon rank-sum tests compared the GI degree in paired gene sets defined by absence and presence of each binary feature tested (top table). If the P-value is significant (< 0.05), the “Test result” column describes the degree of the set of genes for which the listed binary feature is true (compared to the set for which the feature is false). Tests were not performed, indicated by “N/A”, if data were present for fewer than 50 strains; strains with missing data were excluded from the tests. Pearson’s correlation (column labeled “r”) was used to measure associations between GI degree and features that are continuous or counts (bottom table). Uncorrected P-values are shown. The features examined in this analysis are described above (see Methods section entitled “Genetic interaction degree and frequency analysis”). Given that analysis of different features required using different statistical tests and some features are not expected to be independent of each other, no multiple hypotheses correction procedures were used. We do note that 31 gene features were tested.
This file lists GO biological process, molecular function and cellular component terms that are enriched among the 10% of array strains with the most negative and the 10% of array strains with the most positive interactions identified in the ExE network. Enrichments are also included for the 5% of array strains with the most negative interactions and the 5% of array strains with the most positive interactions in the NxN genetic interaction network.
This file provides a list of protein complexes compiled from two sources: Baryshnikova 2010 (5) and Benschop 2010 (117).
This file lists all possible pairs of protein complexes tested in the ExE, NxN and ExN networks. Enrichment for negative and positive interactions between genes in the same complex or between genes in different complexes is indicated. In addition, enrichment for genetic interactions in general (Combined interaction enrichment), regardless of type, is also indicated. Finally, Interaction Sign Bias indicates the distribution of interactions between genes within the same complex or between genes in different complexes. The interaction sign bias is computed as the mean over all interactions within a given complex or between a pair of complexes. For example, an interaction sign bias of -1 indicates that all interactions identified between a set of genes encoding complex members are negative, whereas a score of 1 indicates that only positive interactions were identified between a particular set of protein complex-encoding genes. Rows highlighted in blue indicate complex-complex pairs enriched for negative interactions where greater than 75% of all interactions detected were negative. Rows highlighted in yellow indicate complex-complex pairs enriched for positive interactions where greater than 75% of all interactions detected were positive. The analysis shown in Fig. 7 is based on a subset of complexes composed of 75% essential genes (i.e. considered essential complexes) or 75% nonessential genes (i.e. considered nonessential complexes). The complexes used for this analysis and their enrichment results are listed in the tabs labeled “_filtered”. The tabs named “all” list within and between complex enrichment for all protein complexes without prior filtering of complexes composed of less than 75% essential or 75% nonessential genes.
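The interaction sign bias can be illustrated with a short calculation. The Python sketch below assumes that each detected interaction is coded as -1 (negative) or +1 (positive) before averaging, which is consistent with the endpoints of -1 and 1 described above but is otherwise an assumption; the function name and example values are hypothetical.

```python
def interaction_sign_bias(signs):
    """Mean of interaction signs, each coded as -1 or +1.

    Returns None when no interactions were detected.
    """
    if not signs:
        return None
    if any(sign not in (-1, 1) for sign in signs):
        raise ValueError("signs must be -1 or +1")
    return sum(signs) / len(signs)


# Hypothetical set of detected interactions: three negative, one positive
bias = interaction_sign_bias([-1, -1, -1, 1])  # (-3 + 1) / 4
```

Under this coding, a set in which a fraction f of interactions is negative has a bias of 1 - 2f, so "greater than 75% negative" corresponds to a bias below -0.5 and "greater than 75% positive" to a bias above 0.5. The example above sits exactly at the -0.5 boundary.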
This file provides D. melanogaster S2 cell fitness upon RNAi-mediated 26S proteasome depletion and Bortezomib treatment.
This file indicates fold enrichment and biases in positive vs. negative interaction frequency for protein complexes and is described in detail above (see “Analysis of protein complexes exhibiting a positive interaction enrichment bias”). Rows highlighted in yellow indicate protein complexes that show > 1.5X enrichment for positive interactions (“E_fold_pos”) and stronger enrichment for positive versus negative interactions when screened against the essential TSA. The file consists of the following columns:
This file includes raw data from spot dilution growth assays to identify positive interactions that can be classified as genetic suppression. The suppression score is based on visual assessment of double mutant strain growth relative to wild-type and single mutant control strains. The score reflects the strength of suppression: a score of 4 indicates a strong suppression interaction where double mutant growth exceeded growth of the sickest single mutant, and a score of 0 indicates failure to confirm a suppression interaction.
Relative growth of a YNL181W-DAmP strain (CG score) measured in the presence of 92 different compounds.