Documentation

This resource is a multi-organism collection of genome-wide transcriptome or gene expression data obtained from publicly available repositories, then uniformly processed and normalized. It allows biologists, clinicians, and machine learning researchers to search for experiments from different source repositories in one place and to build custom datasets for their questions of interest. It is well suited for quickly assessing whether signals are present in particular datasets, for identifying and obtaining datasets to accelerate validation of findings, and for building large compendia to train machine learning models that can adequately handle the technical noise introduced by integrating multiple experiments and platforms. It is not a substitute for experiments and processing pipelines tailored to specific biological questions, or for input from relevant experts (e.g., statisticians). Rather, it is a repository of samples processed with standardized pipelines that were selected for their wide-ranging utility.

For examples of how to use the data, please see Downstream Analysis with Examples.

Frequently asked questions

What type of data is supported?

How do you process the data?

How can I find out what versions of software/packages were used to process the data?


Please use the following citation:

Casey S. Greene, Dongbo Hu, Richard W. W. Jones, Stephanie Liu, David S. Mejia, Rob Patro, Stephen R. Piccolo, Ariel Rodriguez Romero, Hirak Sarkar, Candace L. Savonen, Jaclyn N. Taroni, William E. Vauclain, Deepashree Venkatesh Prasad, Kurt G. Wheeler. a resource of uniformly processed publicly available gene expression datasets. URL:

Note that the contributor list is in alphabetical order as we prepare a manuscript for submission.


If you have a question or comment, please file an issue on GitHub or send us an email at