Mining Differential Correlation

In many statistical problems, a common set of variables is measured under two different experimental conditions (usually modeled by two underlying distributions), and it is of interest to find variables that behave differently under one condition than under the other. For high dimensional data, where the number of variables is large, it may happen that only a small set of variables behaves differently under the two conditions. In most cases, researchers carry out a first order analysis in which they look separately for changes in individual variables. A classic example of first order analysis is the study of differential expression (changes in mean) in microarray studies.
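
As a rough illustration of a first order analysis, the sketch below computes a Welch t-statistic for each variable separately, comparing its mean under the two conditions. The function name `welch_t_per_variable` and the simulated data are assumptions made for this example only; real differential expression studies typically add multiple-testing correction and more careful variance modeling.

```python
import numpy as np

def welch_t_per_variable(x1, x2):
    """Welch t-statistic for each column (variable); rows are samples.

    Hypothetical helper for illustration, not part of any published method.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    v1, v2 = x1.var(axis=0, ddof=1), x2.var(axis=0, ddof=1)
    # Standard error of the difference in means, without assuming equal variances
    se = np.sqrt(v1 / x1.shape[0] + v2 / x2.shape[0])
    return (m1 - m2) / se

# Simulated data (assumption): 5 variables, only variable 0 changes in mean
rng = np.random.default_rng(0)
cond1 = rng.normal(size=(50, 5))
cond2 = rng.normal(size=(60, 5))
cond2[:, 0] += 2.0

t_stats = welch_t_per_variable(cond1, cond2)
# The variable with the largest |t| is the strongest first order candidate
top_variable = int(np.argmax(np.abs(t_stats)))
```

Each variable is scored on its own, so an analysis of this kind cannot see changes that appear only in how variables relate to one another.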

In contrast to first order analyses, second order analyses look for changes in the pairwise association of variables, or equivalently, for variables that are associated differently under one experimental condition than under the other. In practice, second order analyses can identify structure that is complementary to, and not revealed by, that found in first order analyses. We are currently working on a special case of second order analysis, called differential correlation mining, in which the goal is to identify sets of variables having higher average pairwise correlation under one sample condition than under the other. We have developed a method (called DCM) for differential correlation mining that is based on iterative testing. DCM is applicable to both low and high dimensional datasets: in applications to gene expression and brain fMRI data, it finds useful variable sets that are different from those found by first order (mean based) analyses.
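
To make the target quantity concrete, the following sketch computes the difference in average pairwise correlation of a given variable set between two conditions. It is not the DCM procedure itself, which searches for such sets by iterative testing; it only scores a set that is supplied by hand. The function names and the simulated data are assumptions made for this example.

```python
import numpy as np

def mean_pairwise_corr(x, idx):
    """Average Pearson correlation over all distinct pairs of variables in idx.

    Rows of x are samples, columns are variables. Hypothetical helper.
    """
    sub = np.asarray(x, dtype=float)[:, idx]
    r = np.corrcoef(sub, rowvar=False)
    upper = np.triu_indices(len(idx), k=1)  # each pair once, diagonal excluded
    return r[upper].mean()

def differential_corr_score(x1, x2, idx):
    """Average pairwise correlation under condition 1 minus that under condition 2."""
    return mean_pairwise_corr(x1, idx) - mean_pairwise_corr(x2, idx)

# Simulated data (assumption): 8 variables, all with mean zero in both conditions.
# Under condition 1, variables 0-3 share a latent factor; under condition 2,
# all variables are independent.
rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=(n, 1))
cond1 = rng.normal(size=(n, 8))
cond1[:, :4] += latent
cond2 = rng.normal(size=(n, 8))

score_a = differential_corr_score(cond1, cond2, [0, 1, 2, 3])  # differentially correlated set
score_b = differential_corr_score(cond1, cond2, [4, 5, 6, 7])  # no change in correlation
```

In this simulation the means are identical across conditions, so a mean based analysis would have nothing to find, while the score for the first set is clearly positive. Scoring every possible subset this way is infeasible for high dimensional data, which is why a search strategy is needed in practice.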