refine.bio Documentation

refine.bio is a multi-organism collection of genome-wide transcriptome or gene expression data that has been obtained from publicly available repositories and uniformly processed and normalized. refine.bio allows biologists, clinicians, and machine learning researchers to search for experiments from different source repositories all in one place and build custom datasets for their questions of interest.

refine.bio is well suited for quickly assessing whether signals are present in particular datasets, for identifying and obtaining datasets to accelerate the validation of findings, and for building large compendia to train machine learning models that can adequately handle the technical noise associated with integrating multiple experiments and platforms.

refine.bio is not a substitute for experiments and processing pipelines tailored to specific biological questions, or for input from relevant experts (e.g., those with statistics expertise). Rather, it is a repository of samples processed with standardized pipelines that were selected for their wide-ranging utility.

For examples of how to use refine.bio data, please see Downstream Analysis with refine.bio Examples.

Frequently asked questions

What type of data does refine.bio support?

How do you process the data?

How can I find out what versions of software/packages were used to process the data?

Citing refine.bio

Please use the following citation:

Casey S. Greene, Dongbo Hu, Richard W. W. Jones, Stephanie Liu, David S. Mejia, Rob Patro, Stephen R. Piccolo, Ariel Rodriguez Romero, Hirak Sarkar, Candace L. Savonen, Jaclyn N. Taroni, William E. Vauclain, Deepashree Venkatesh Prasad, Kurt G. Wheeler. refine.bio: a resource of uniformly processed publicly available gene expression datasets. URL: https://www.refine.bio

Note that the contributor list is in alphabetical order while we prepare a manuscript for submission.

Questions/Feedback?

If you have a question or comment, please file an issue on GitHub or send us an email at requests@ccdatalab.org.