A proposed data standard for breath sample data and metadata

Abstract

Background Data management in breath research is underdeveloped. General metabolomics data standards exist, but they do not cover detailed provenance and quality assurance for individual samples, both of which are particularly important in breathomics. To facilitate quality control and data sharing, we propose a new data standard that includes sample-specific metadata as a key component.

Methods The proposed standard extends the ISATAB data archive standard with support for sample-specific metadata and reports describing the processing, provenance and quality of individual samples. An informatics platform was developed to simplify data archive creation, management, sample report generation and data sharing. Reusable, flexible scaffolds and templates were used to handle different equipment and processes, and could be adopted by any study requiring consistent metadata collection.

Results To demonstrate the applicability of the proposed standard and the informatics platform, we used breath sample data collected during the East Midlands Breathomics Pathology Node project. Collaborating researchers employed a pre-defined scaffold to integrate their existing data files into a template, from which they produced two outputs: a data archive (containing data and metadata) and ‘data header’ PDF reports for individual samples. The latter were uploaded to a local repository.

Conclusions The proposed data standard and informatics platform simplified the metadata collection and reporting process. Easily shareable PDF-based data headers allowed researchers to understand each step a sample had gone through and to repeat the relevant analytical processes. Furthermore, the ISATAB-compatible data archive supports uploading to existing data repositories while also providing for the reporting of individual sample metadata.
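As an illustration of what sample-specific metadata stored alongside a tab-delimited archive might look like, the following minimal Python sketch defines a per-sample record and serialises a list of such records to tab-separated text. It is not the platform described above: the class name SampleHeader, the function headers_to_tsv, and every field name are hypothetical and are not taken from the ISATAB specification or from the proposed standard.

```python
import csv
import io
from dataclasses import dataclass, field, asdict


@dataclass
class SampleHeader:
    """Hypothetical per-sample metadata record (field names are assumptions)."""
    sample_id: str
    collected_at: str          # ISO 8601 timestamp as plain text
    instrument: str
    processing_steps: list = field(default_factory=list)
    qc_passed: bool = False


# Column order used for the tab-delimited output (an assumption for this sketch).
COLUMNS = ["sample_id", "collected_at", "instrument", "processing_steps", "qc_passed"]


def headers_to_tsv(headers):
    """Serialise sample headers to tab-separated text, one row per sample."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for header in headers:
        row = asdict(header)
        # Flatten the ordered list of processing steps into a single cell.
        row["processing_steps"] = ";".join(row["processing_steps"])
        writer.writerow([row[name] for name in COLUMNS])
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only; not real study data.
    example = SampleHeader(
        sample_id="S001",
        collected_at="2019-01-15T09:30:00",
        instrument="GC-MS",
        processing_steps=["collection", "thermal desorption"],
        qc_passed=True,
    )
    print(headers_to_tsv([example]))
```

Keeping the processing steps as an ordered list mirrors the idea that a data header should let a reader follow each step a sample went through; a real implementation would take its field names and file layout from the standard itself.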

Publication
International Association of Breath Research Meeting 2019