Management of research data
Projects within our SFB generate large datasets. These include data obtained within our proteomics and genomics service projects as well as results from, for example, genome-wide analyses of DNA methylation or gene expression. A range of systems is available at JGU, TUD, GU and LMU to facilitate the processing and storage of such data and to enable data sharing within and outside of these institutions.
Data processing and storage
At JGU, MOGON II is among the top 100 high-performance computers in the world. It provides researchers at JGU Mainz and associated institutions with a computing infrastructure that enables increasingly large datasets to be processed and analysed. Well-maintained systems for the short-, medium- and long-term storage of electronic data are provided by JGU’s ZDV (“Zentrale Datenverarbeitung”). In addition to maintaining a substantial backup system, the ZDV's IT specialists monitor the data storage process to ensure the integrity of data for at least ten years.
Comparable facilities and services exist at the other institutions within our SFB. The Hessian Competence Center for High Performance Computing (HKHLR), for example, provides a powerful network of high-performance computers across Hessen, together with support and training for their use. The Hessian research data infrastructure network (HeFDI) is intended to initiate and coordinate the organisational and technological processes needed to anchor research data management at the participating universities. Both GU Frankfurt and TU Darmstadt are members of HeFDI.
Data sharing
Electronic data is easily and securely shared between groups within and outside of the SFB via file transfer protocol (FTP) servers, the file-sharing software Seafile (hosted at JGU), or the content and document management software SharePoint (hosted at the participating universities).
To make the data generated as part of this SFB accessible to the wider scientific community, publication of results in open access journals and databases is encouraged. Where possible, for example for proteomics projects, full datasets are included as standard supplementary material in any publication. Furthermore, researchers use public repositories, such as Gene Expression Omnibus or Sequence Read Archive, to make all additional relevant data associated with a publication publicly available.
Within SFB 1361, we provide training in data management to our scientists. To this end, we cooperate with the JGU competence team “Research Data”.