Help

What is GeneMANIA?

GeneMANIA finds other genes that are related to a set of input genes, using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. You can use GeneMANIA to find new members of a pathway or complex, find additional genes you may have missed in your screen or find new genes with a specific function, such as protein kinases. Your question is defined by the set of genes you input.

What biological questions can GeneMANIA answer?

Find more genes like these:

  • If members of your gene list make up a protein complex, GeneMANIA will return more potential members of the protein complex.
  • If your gene list consists of kinases, GeneMANIA will return more kinase genes.
  • If members of your gene list are involved in a specific disease, GeneMANIA will return more genes putatively involved in the same disease.
  • If you plan to perform a genetic screen with a phenotypic readout, enter the genes that you already know are likely to come out of your screen, for instance, genes known to underlie a specific phenotype. GeneMANIA will predict other genes underlying the phenotype, which can then be screened.

Tell me about my gene(s):

  • If you enter a single gene, GeneMANIA will find interactions in which your gene participates, within the selected datasets.
  • If you enter a gene list, GeneMANIA will return connections between your genes, within the selected datasets.
  • If you enter a gene list, GeneMANIA will show you in which datasets your gene list is highly connected.
  • If you enter genes from a protein interaction screen, like a pull down or yeast two-hybrid, and select only physical interaction networks, GeneMANIA will predict true positives (those more highly connected) and false positives (those less well connected).

Interpreting the results

GeneMANIA returns:

  • A list of genes with associated scores, including your input genes and predicted related genes.
  • A network that shows the relationships between genes in the list. This is a composite of all of the networks chosen from the database in a way that best connects related genes. Nodes represent genes and links represent networks. Genes can be linked by more than one type of network.
  • A list of networks weighted by their ability to connect related genes. This weighting is a measure of how 'informative' the network is for the given set of input genes.

Predicted related genes can be, for instance, in the same pathway or complex as your input genes, can be co-expressed or have similar enzymatic function. To determine how predicted genes are related to your input genes, you need to study the links in the network to find out how your input genes are connected to each other and how new genes are related to your input genes.

GeneMANIA search tips

  • GeneMANIA works best if most of the input genes are functionally related. If they are not, a disconnected network will result and the network weighting will not be optimal. It does not matter which function they are related by, as long as that function is captured somehow by some functional association networks in the GeneMANIA system.
  • If your query list consists of 6 or more genes, GeneMANIA will calculate gene list-specific weights. If your query list has fewer than 6 genes, GeneMANIA will make gene function predictions based on GO annotation patterns.
  • GeneMANIA will be slower with an input gene list of more than 50 genes; we recommend using a gene list of no more than 100 genes. The GeneMANIA Cytoscape plugin can handle larger gene lists.

Recognized gene identifiers

GeneMANIA recognizes Entrez, Ensembl, UniProt/Swiss-Prot and RefSeq identifiers, as well as standard gene symbols and unique gene names.

GeneMANIA network categories

GeneMANIA searches many large, publicly available biological datasets to find related genes. These include protein-protein, protein-DNA and genetic interactions, pathways, reactions, gene and protein expression data, protein domains and phenotypic screening profiles. Data is regularly updated.

Network names describe the data source and are either generated from the PubMed entry associated with the data source (first author-last author-year), or simply the name of the data source (BioGRID, PathwayCommons-(original data source), Pfam).

  • Co-expression: Gene expression data. Two genes are linked if their expression levels are similar across conditions in a gene expression study. Most of these data are collected from the Gene Expression Omnibus (GEO); we only collect data associated with a publication.
  • Physical Interaction: Protein-protein interaction data. Two gene products are linked if they were found to interact in a protein-protein interaction study. These data are collected from primary studies found in protein interaction databases, including BioGRID and PathwayCommons.
  • Genetic interaction: Genetic interaction data. Two genes are functionally associated if the effects of perturbing one gene were found to be modified by perturbations to a second gene. These data are collected from primary studies and BioGRID.
  • Shared protein domains: Protein domain data. Two gene products are linked if they have the same protein domain. These data are collected from domain databases, such as InterPro, SMART and Pfam.
  • Co-localization: Genes expressed in the same tissue, or proteins found in the same location. Two genes are linked if they are both expressed in the same tissue or if their gene products are both identified in the same cellular location.
  • Pathway: Pathway data. Two gene products are linked if they participate in the same reaction within a pathway. These data are collected from various source databases, such as Reactome and BioCyc, via PathwayCommons.
  • Predicted: Predicted functional relationships between genes, often protein interactions. A major source of predicted data is mapping known functional relationships from another organism via orthology. For instance, two proteins are predicted to interact if their orthologs are known to interact in another organism. In these cases, network names describe the original data source of experimentally measured interactions and the organism the interactions were mapped from, e.g., a mouse network predicted from a human network: Barrios-Rodiles-Wrana-2005-Human2Mouse. We also include predicted functional associations from other groups that combine multiple data sources for a given organism, e.g., the entire YeastNet predicted network: Lee-Marcotte-2007 YeastNet; the genetic interaction data used to generate YeastNet: Lee-Marcotte-2007 Genetic interactions. In these cases, the network name indicates the original publication detailing the predicted network and, in some cases, the individual network that was used to generate the entire predicted network (as in the latter example above). Some predicted networks include data from other organisms. In these cases, the original organism is mentioned in the network name, e.g., the yeast protein interaction data used to generate WormNet: Lee-Marcotte-2008 Protein interactions yeast2worm.
  • Other: Networks that do not fit into any of the above categories. Examples include phenotype correlations from Ensembl, disease information from OMIM and chemical genomics data.

Uploading your own network

You can upload your network to GeneMANIA and analyze it in the context of all publicly available networks that GeneMANIA knows about. Your network is deleted from the GeneMANIA server after your session ends, or within 24 hours. Please see our privacy policy for more information.

The upload network button can be found in the advanced options panel. Your network must be for one of the GeneMANIA supported organisms, be tab-delimited text, and be in the format GeneID <tab> GeneID <tab> Score. The score will vary depending on the type of network, but in general is a number ranging from zero (no interaction) to 1 (strong interaction). For an interaction network or a pathway where interactions either exist or don't exist, the score is 1 for all links. For a gene expression network, the score could be the Pearson correlation coefficient for the gene pair, representing the expression level similarity across several experiments.

Note that networks are normalized to reduce the effect of highly connected nodes, so scores may change slightly once uploaded.
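
The upload format described above can be produced with a few lines of Python. This is a minimal sketch: the function name `write_network` and the example edges are hypothetical, and the range check on the score is an assumption based on the "zero to 1" guidance above, not a documented server rule.

```python
import csv
import io

def write_network(edges, handle):
    """Write (gene_a, gene_b, score) triples as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene_a, gene_b, score in edges:
        # Assumption: scores lie between 0 (no interaction) and 1 (strong).
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {gene_a}-{gene_b}: {score}")
        writer.writerow([gene_a, gene_b, score])

# Hypothetical binary interaction network: every link gets a score of 1.
edges = [("PHYA", "HFR1", 1), ("PHYA", "COP1", 1), ("COP1", "HY5", 1)]

buffer = io.StringIO()
write_network(edges, buffer)
network_text = buffer.getvalue()
print(network_text)
```

To create a file for upload, pass an open file handle (for example `open("my_network.txt", "w", newline="")`) instead of the in-memory buffer.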

Choosing an appropriate network weighting option

GeneMANIA can use several methods to weight networks when combining them into the final composite network that results from a search. The default settings are usually appropriate, but you can choose a weighting method in the advanced options panel.

Query-dependent weighting

  • Automatically selected weighting method (default): the network weights are chosen based on the size of your input gene list. If your input gene list has fewer than 5 genes, the default network weighting method is 'Gene Ontology (GO)-based weighting, Biological Process based'. This weighting method assumes the input gene list is related in terms of biological processes (as defined by GO). If your input gene list has 5 or more genes, GeneMANIA assigns weights to maximize connectivity between all input genes using the 'assigned based on query gene' strategy.
  • Assigned based on query gene: the weights are chosen automatically using linear regression, to make genes on your list interact as much as possible with each other, and as little as possible with genes not on your list. This is the default method if your input gene list contains 5 or more genes.
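
The automatic selection rule can be pictured as a simple size check. The sketch below is illustrative only: the function name is hypothetical, the threshold of 5 is taken from the description above and exposed as a parameter, counting only unique genes is an assumption, and the return values reuse the method codes `bp` and `automatic_relevance` from the linking parameters.

```python
def select_weighting_method(query_genes, threshold=5):
    """Pick a weighting method code from the size of the query list."""
    # Assumption: duplicate entries in the query count only once.
    unique_genes = set(query_genes)
    if len(unique_genes) < threshold:
        return "bp"  # GO Biological Process based weighting
    return "automatic_relevance"  # assigned based on query genes

print(select_weighting_method(["CIP1"]))
print(select_weighting_method(["DET1", "HY5", "CIP1", "CIP8", "PHYA", "HFR1"]))
```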

Gene Ontology (GO)-based weighting

These weighting methods are based on GO terms that have between 3 and 300 genes associated with them. Only the most reliable annotations were used (i.e. all annotations with an IEA evidence code were removed, as these are less reliable). There is one weighting method per GO branch.

  • Biological Process based: assumes the input gene list is related through GO biological processes. This is the default method if your input gene list contains fewer than 5 genes.
  • Molecular Function based: assumes the input gene list is related through the GO molecular functions.
  • Cellular Component based: assumes the input gene list is related through the GO cellular components.

Equal weighting

  • Equal by network: all networks are assigned an equal weight. This is useful if you want to see all networks that connect your input genes.
  • Equal by data type: all network categories are assigned an equal weight, with the weighting also evenly distributed among networks within each category.
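
The difference between the two equal weighting options is easiest to see with numbers. The following sketch uses hypothetical network names and assumes the weights are normalized to sum to 1; it is not taken from the GeneMANIA source code.

```python
from collections import defaultdict

def equal_by_network(networks):
    """Give every network the same weight."""
    weight = 1.0 / len(networks)
    return {name: weight for name, _category in networks}

def equal_by_data_type(networks):
    """Give every category the same share, split evenly among its networks."""
    by_category = defaultdict(list)
    for name, category in networks:
        by_category[category].append(name)
    category_share = 1.0 / len(by_category)
    return {
        name: category_share / len(names)
        for names in by_category.values()
        for name in names
    }

# Hypothetical selection: three co-expression networks, one physical interaction network.
networks = [
    ("coexp-A", "Co-expression"),
    ("coexp-B", "Co-expression"),
    ("coexp-C", "Co-expression"),
    ("ppi-A", "Physical Interaction"),
]

by_network = equal_by_network(networks)      # every network gets 1/4
by_data_type = equal_by_data_type(networks)  # ppi-A gets 1/2, each coexp network 1/6
print(by_network)
print(by_data_type)
```

With equal weighting by data type, a category that contains many networks does not dominate the composite network simply because of its size.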

Network data sources

Each network data source is represented as a weighted interaction network where each pair of genes is assigned an association weight, which is either zero, indicating no interaction, or a positive value that reflects the strength of interaction or the reliability of the observation that they interact. For example, the association of a pair of genes in a gene expression dataset is the Pearson correlation coefficient of their expression levels across multiple conditions in an experiment. The more the genes are co-expressed, the higher the weight they are linked by, ranging up to 1.0, meaning perfectly correlated expression.

Direct interactions are used for networks where binary information is available (like protein interactions). When two proteins interact, their network link has a weight of 1.

Shared neighbours are used for networks where the profile of one gene is compared to that of a second gene and the Pearson correlation coefficient is calculated (like protein domain data).
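
As a rough illustration of a profile-based network, the sketch below computes the Pearson correlation coefficient for every pair of genes and keeps only positive values as link weights. The gene names and profiles are made up, and dropping non-positive correlations is an assumption based on the statement that weights are either zero or positive; the actual GeneMANIA pipeline may differ.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as no link
    return cov / sqrt(var_x * var_y)

def profile_network(profiles, min_weight=0.0):
    """Link gene pairs whose profiles correlate above min_weight."""
    edges = {}
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r > min_weight:
            edges[(gene_a, gene_b)] = r
    return edges

# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
network = profile_network(profiles)
print(network)
```

Here geneA and geneB rise together and are linked, while geneC is anti-correlated with both and receives no link.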

The GeneMANIA database consists of genomics and proteomics data from a variety of sources, including data from gene and protein expression profiling studies and primary and curated molecular interaction networks and pathways. GeneMANIA relies on the following data sources:

We maintain a complete list of networks currently in the GeneMANIA system.

GeneMANIA algorithm

The "MANIA" in GeneMANIA stands for Multiple Association Network Integration Algorithm.

The GeneMANIA algorithm consists of two parts:

  1. A linear regression-based algorithm that calculates a single composite functional association network from multiple data sources.
  2. A label propagation algorithm for predicting gene function given the composite functional association network.

GeneMANIA treats gene function prediction as a binary classification problem. As such, each functional association network derived from the data sources is assigned a positive weight, reflecting the data source's usefulness in predicting the function. The weighted average of the association networks forms a function-specific association network. GeneMANIA uses separate objective functions to fit the weights; this simplifies the optimization problem and decreases the run time.

GeneMANIA predicts gene function from the composite network using a variation of the Gaussian field label propagation algorithm that is appropriate for gene function prediction, where there are typically relatively few positive examples. Label propagation algorithms assign a score (the discriminant value) to each node in the network. This score reflects the computed strength of association that the node has to the seed list defining the given function. This value can be thresholded to enable predictions of a given gene function.
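
The two steps can be sketched in a few lines. This is a generic, simplified illustration and not GeneMANIA's implementation: the network weights are supplied by hand rather than fitted by regression, seed genes are labelled 1 and all other genes 0, and the scores are obtained by Jacobi iteration on the linear system (I + L) f = y, where L is the graph Laplacian of the composite network. All names and numbers are hypothetical.

```python
def combine_networks(networks, weights):
    """Weighted sum of networks, each a dict {(gene_a, gene_b): weight}."""
    composite = {}
    for name, edges in networks.items():
        for (a, b), w in edges.items():
            key = tuple(sorted((a, b)))
            composite[key] = composite.get(key, 0.0) + weights[name] * w
    return composite

def propagate_labels(composite, seeds, iterations=200):
    """Score every gene by its association with the seed genes."""
    neighbours = {}
    for (a, b), w in composite.items():
        neighbours.setdefault(a, {})[b] = w
        neighbours.setdefault(b, {})[a] = w
    for gene in seeds:
        neighbours.setdefault(gene, {})
    labels = {g: (1.0 if g in seeds else 0.0) for g in neighbours}
    scores = dict(labels)
    for _ in range(iterations):
        # Jacobi update for (I + L) f = y: each score is pulled towards the
        # gene's own label and the weighted scores of its neighbours.
        scores = {
            g: (labels[g] + sum(w * scores[n] for n, w in nbrs.items()))
            / (1.0 + sum(nbrs.values()))
            for g, nbrs in neighbours.items()
        }
    return scores

# Hypothetical data: two small networks and hand-picked network weights.
networks = {
    "ppi": {("A", "B"): 1.0, ("B", "C"): 1.0},
    "coexp": {("A", "B"): 0.5, ("C", "D"): 0.8},
}
weights = {"ppi": 0.7, "coexp": 0.3}

composite = combine_networks(networks, weights)
scores = propagate_labels(composite, seeds={"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Genes closer to the seed in the composite network receive higher scores, which is the behaviour the discriminant value is meant to capture.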

GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function
Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q (2008)
Genome Biology 9: S4.

Computer requirements

Web browser: GeneMANIA supports the latest versions of Chrome, Firefox, Safari and Internet Explorer. For a faster, smoother experience with GeneMANIA, we recommend you use a standards-compliant browser, such as Chrome or Firefox.

  • Very well supported: Chrome 5+ and Firefox 3.6+ (Windows); Chrome 5+, Firefox 3.6+ and Safari 5+ (Mac OS)
  • Reasonably well supported: Internet Explorer 8+ (Windows)
  • May work: Chrome 5+ and Firefox 3.6+ (Linux)
  • Not supported: older versions and other browsers (Windows, Mac OS and Linux)

Internet Connection: A fast internet connection such as DSL, Cable or T1.

Computer: A modern computer with at least a 1GHz CPU, 1GB RAM and a modern video card.

How do I cite GeneMANIA?

We recommend citing the NAR webserver issue, as follows.

The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function
Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT, Maitland A, Mostafavi S, Montojo J, Shao Q, Wright G, Bader GD, Morris Q
Nucleic Acids Res. 2010 Jul 1;38 Suppl:W214-20

What makes GeneMANIA unique?

Other gene function prediction programs are available, including STRING, bioPIXIE, Funcassociate, and FunCoup.

GeneMANIA has the advantage of flexibility, accuracy, and often speed of response over these other systems. In particular, in a competition on yeast (Mostafavi et al., 2008) and mouse (Pena-Castillo et al., 2008), GeneMANIA was shown to be more accurate than other gene function prediction methods, and it is generally faster, producing predictions within seconds. Because of this speed, GeneMANIA can produce results while you wait. Users can select arbitrary subsets of networks to query, and GeneMANIA automatically selects network weights based on the input gene list, generating a network specific to the user's gene list. Unlike other systems, GeneMANIA lets users upload their own network and also compensates for redundancies in the data, so users don't have to worry about double-counting interactions.

Linking to GeneMANIA

The linking URL in its simplest form is http://genemania.org/link?o=<tid>&g=<genes>, where:

  • <tid> : NCBI taxonomy id for organism (A. thaliana=3702, C. elegans=6239, D. melanogaster=7227, H. sapiens=9606, M. musculus=10090, S. cerevisiae=4932)
  • <genes> : one or more gene symbols separated by pipes ("|")

Examples of the simplest form:

  • one gene : http://genemania.org/link?o=3702&g=rad50
  • multiple genes : http://genemania.org/link?o=3702&g=PHYB|ELF3|COP1|SPA1|FUS9
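
Links in this simplest form can also be generated programmatically. The helper below is a small sketch; the function name `build_link` and the lookup table are hypothetical conveniences built from the taxonomy ids listed above, and the pipe separator is deliberately left unescaped to match the examples.

```python
from urllib.parse import quote

# NCBI taxonomy ids as listed above.
TAXONOMY_IDS = {
    "A. thaliana": 3702,
    "C. elegans": 6239,
    "D. melanogaster": 7227,
    "H. sapiens": 9606,
    "M. musculus": 10090,
    "S. cerevisiae": 4932,
}

def build_link(organism, genes):
    """Build a GeneMANIA link URL in its simplest form."""
    if not genes:
        raise ValueError("at least one gene must be specified")
    tid = TAXONOMY_IDS[organism]
    # Escape unusual characters in gene symbols but keep the pipe separator.
    gene_list = quote("|".join(genes), safe="|")
    return f"http://genemania.org/link?o={tid}&g={gene_list}"

print(build_link("A. thaliana", ["rad50"]))
print(build_link("A. thaliana", ["PHYB", "ELF3", "COP1", "SPA1", "FUS9"]))
```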

Optional Parameters:

GeneMANIA linking supports some optional parameters (see the section on choosing a network weighting option for the meaning of the various weighting methods):

  • m : network combining method; must be one of the following:
    • automatic_relevance : Assigned based on query genes
    • automatic : Automatically selected weighting method
    • bp : biological process based
    • mf : molecular function based
    • cc : cellular component based
    • average : Equal by network
    • average_category : Equal by data type
  • r : the number of results generated by GeneMANIA; must be a number in the range 1..100.

If no optional parameters are provided, GeneMANIA assumes the default values: m=automatic; r=10.

Examples using optional parameters:

The following query runs the GeneMANIA algorithm for A. thaliana using 6 genes as input, the "average" method and returns 50 more genes: http://genemania.org/link?o=3702&g=DET1|HY5|CIP1|CIP8|PHYA|HFR1&m=average&r=50

The following query runs the GeneMANIA algorithm for A. thaliana's CIP1 gene using the "biological process based" method and returns 100 more genes: http://genemania.org/link?o=3702&g=CIP1&m=bp&r=100

Invalid queries:

  • http://genemania.org/link?o=3702 : at least one gene must be specified
  • http://genemania.org/link?o=1000 : invalid taxonomy id
  • http://genemania.org/link?o=3702&g=PHYA&m=super_smart&r=50 : invalid method
  • http://genemania.org/link?o=3702&g=det1&r=1000 : results must be in the range 1..100
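
If you generate links automatically, it can help to check them before publishing. The following sketch applies the rules described above to a link URL; the function name `check_link` is hypothetical, and the checks are our reading of this help page rather than the server's actual validation logic.

```python
from urllib.parse import urlsplit, parse_qs

VALID_TAXONOMY_IDS = {"3702", "6239", "7227", "9606", "10090", "4932"}
VALID_METHODS = {
    "automatic_relevance", "automatic", "bp", "mf", "cc",
    "average", "average_category",
}

def check_link(url):
    """Return a list of problems found in a link URL (empty if none)."""
    params = parse_qs(urlsplit(url).query)
    problems = []
    if params.get("o", [""])[0] not in VALID_TAXONOMY_IDS:
        problems.append("invalid taxonomy id")
    genes = [g for g in params.get("g", [""])[0].split("|") if g]
    if not genes:
        problems.append("at least one gene must be specified")
    # Missing optional parameters fall back to the documented defaults.
    method = params.get("m", ["automatic"])[0]
    if method not in VALID_METHODS:
        problems.append("invalid method")
    results = params.get("r", ["10"])[0]
    if not (results.isdecimal() and 1 <= int(results) <= 100):
        problems.append("results must be in the range 1..100")
    return problems

print(check_link("http://genemania.org/link?o=3702&g=CIP1&m=bp&r=100"))
print(check_link("http://genemania.org/link?o=3702&g=det1&r=1000"))
```

An empty list means the link passed every check in this sketch.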

Happy linking!