refine.bio Documentation
refine.bio is a multi-organism collection of genome-wide transcriptome or gene expression data that has been obtained from publicly available repositories and uniformly processed and normalized. refine.bio allows biologists, clinicians, and machine learning researchers to search for experiments from different source repositories all in one place and build custom datasets for their questions of interest.
refine.bio is well-suited for quickly assessing whether signals are present in particular datasets, for identifying and obtaining datasets to accelerate validation of findings, and for building large compendia for training machine learning models that can adequately handle the technical noise associated with integrating multiple experiments and platforms.
refine.bio is not a substitute for experiments and processing pipelines tailored to answer specific biological questions of interest, or for input from relevant experts (e.g., those with statistics expertise). Rather, it is a repository of samples processed with standardized pipelines that have been selected based on their wide-ranging utility.
For examples of how to use refine.bio data, please see Downstream Analysis with refine.bio Examples.
Frequently asked questions
What type of data does refine.bio support?
How can I find out what versions of software/packages were used to process the data?
- Source Data
- Processing Information
- Downloadable Files
- refine.bio Compendia
- API
- Downstream Analysis with refine.bio Examples
  - Getting Started with a refine.bio dataset
  - Getting Started with Normalized Compendia
- FAQ
  - What is the difference between refine.bio-processed and submitter-processed datasets?
  - How do you process the data?
  - What type of data does refine.bio support?
  - What does “corrected” metadata mean?
  - Why do the values differ a little bit if I download different datasets?
  - Why do I get a limited number of genes back when I aggregate samples from different experiments?
  - Why can’t I add certain samples to my dataset?
  - Why do the genes included in RNA-seq experiments change between experiments from the same organism?
  - How can I find out what genome build and release were used to process RNA-seq data?
  - How can I find out what versions of software/packages were used to process the data?
  - Are refine.bio datasets I download batch corrected?
  - Why are the expression values different if I regenerate a dataset?
  - What does it mean to skip quantile normalization for RNA-seq samples?
  - How do I cite refine.bio?
- License
Citing refine.bio
Please cite refine.bio as follows:
Casey S. Greene, Dongbo Hu, Richard W. W. Jones, Stephanie Liu, David S. Mejia, Rob Patro, Stephen R. Piccolo, Ariel Rodriguez Romero, Hirak Sarkar, Candace L. Savonen, Jaclyn N. Taroni, William E. Vauclain, Deepashree Venkatesh Prasad, Kurt G. Wheeler. refine.bio: a resource of uniformly processed publicly available gene expression datasets. URL: https://www.refine.bio
Note that the contributor list is in alphabetical order while we prepare a manuscript for submission.
Questions/Feedback?
If you have a question or comment, please file an issue on GitHub or send us an email at requests@ccdatalab.org.