Volcano plot (statistics)

Volcano plot showing metabolomic data. The red arrows indicate points-of-interest that display both large-magnitude fold-changes (x-axis) as well as high statistical significance (-log10 of p-value, y-axis). The dashed red-line shows where p = 0.05 with points above the line having p < 0.05 and points below the line having p > 0.05. This plot is colored such that those points having a fold-change less than 2 (log2 = 1) are shown in gray.

In statistics, a volcano plot is a type of scatter-plot that is used to quickly identify changes in large datasets composed of replicate data.[1] It plots significance versus fold-change on the y- and x-axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics where one often has a list of many thousands of replicate datapoints between two conditions and one wishes to quickly identify the most-meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change enabling quick visual identification of those data-points (genes, etc.) that display large-magnitude changes that are also statistically significant.

A volcano plot is constructed by plotting the negative log of the p-value on the y-axis (usually base 10). This results in datapoints with low p-values (highly significant) appearing toward the top of the plot. The x-axis is the log of the fold change between the two conditions. The log of the fold-change is used so that changes in both directions appear equidistant from the center. Plotting points in this way results in two regions of interest in the plot: those points that are found toward the top of the plot that are far to either the left- or the right-hand side. These represent values that display large magnitude fold changes (hence being left- or right- of center) as well as high statistical significance (hence being toward the top).

Additional information can be added by coloring the points according to a third dimension of data (such as signal-intensity) but this is not uniformly employed. Volcano plot is also used to graphically display a significance analysis of microarrays (SAM) gene selection criterion, an example of regularization.[2]

The concept of volcano plot can be generalized to other applications, where the x-axis is related to a measure of the strength of a statistical signal, and y-axis is related to a measure of the statistical significance of the signal. For example, in a genetic association case-control study, such as Genome-wide association study, a point in a volcano plot represents a single-nucleotide polymorphism. Its x value can be the odds ratio and its y value can be -log10 of the p-value from Chi-square test or a Chi-square test statistic.[3]

References

  1. Cui, X.; Churchill, G. A. (2003). "Statistical tests for differential expression in cDNA microarray experiments". Genome Biology 4 (4): 210. doi:10.1186/gb-2003-4-4-210. PMC 154570. PMID 12702200.
  2. Li, W. (2012). "Volcano plots in analyzing differential expressions with mRNA microarrays". Journal of Bioinformatics and Computational Biology 10 (6): 1231003. doi:10.1142/S0219720012310038. PMID 23075208.
  3. Li, W.; Freudenberg, J.; Suh, Y. J.; Yang, Y. (2014). "Using volcano plots and regularized-chi statistics in genetic association studies". Computational Biology and Chemistry 48: 77–83. doi:10.1016/j.compbiolchem.2013.02.003. PMID 23602812.

External links

This article is issued from Wikipedia - version of the Saturday, January 16, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.