Seminar Series: Data Clouds, Data Commons and Data Ecosystems

How Data Science is Changing How We Integrate, Analyze and Share Data and Reproduce Research
Biomedical datasets have grown so large that most research groups can no longer host and analyze the data from large projects themselves. Data commons provide an alternative: they co-locate data, storage and computing resources with commonly used software services, applications and tools for managing, analyzing and sharing data, creating an interoperable resource for the research community. We give an overview of data commons and describe some lessons learned from the NCI Genomic Data Commons, Bionimbus, the BloodPAC Data Commons and the BRAIN Commons. We also describe how second-generation data commons are providing the foundation for data ecosystems and supporting reproducible research. We conclude with an overview of how an organization can set up its own commons.