Personal Health Train - PADME
Person-related data, especially in the medical domain, is often distributed across multiple sites that collect and manage their data independently of each other. Of special interest for medical research is the patient care data that each hospital records and documents about a patient and that is eventually integrated by Data Integration Centers: structured data such as anamnesis forms and laboratory data, images (e.g., MRI and CT), and texts (e.g., findings and discharge letters). The conventional approach of transferring all relevant data to a single location and integrating it there is not feasible for every analysis scenario. This holds especially when the data volume is high (e.g., images and genetic data), when the data forms a patient's footprint (e.g., discharge letters) and therefore raises privacy concerns, or when the partner sites do not want to lose sovereignty (control) over their data.
The Personal Health Train (PHT) is a concept for privacy-preserving distributed analysis of person-related data, such as the data available in the medical domain. The fundamental principle is to keep the data at the partner sites, the so-called stations, which already manage the collected data. The analyses, in the form of specific analysis scripts and programs called Analysis Trains, are sent to these stations. The stations execute them, and the trains carry onward only the results, which have no relationship to the persons whose data was used.
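The principle can be illustrated with a minimal sketch. It is not PADME code: the `Train` and `Station` classes, the station names, and the values are hypothetical, and the analysis is reduced to computing a mean. The point it shows is that each station adds only aggregates to the train, while the individual values never leave the station object.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Train:
    """Hypothetical analysis train: carries aggregates only, never records."""
    count: int = 0
    total: float = 0.0
    visited: list[str] = field(default_factory=list)

    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


@dataclass
class Station:
    """Hypothetical station: the person-level values stay inside this object."""
    name: str
    _values: list[float]

    def execute(self, train: Train) -> Train:
        # The analysis runs locally; only a count and a sum are added to the train.
        train.count += len(self._values)
        train.total += sum(self._values)
        train.visited.append(self.name)
        return train


def run_route(train: Train, route: list[Station]) -> Train:
    # The train visits the stations one after another.
    for station in route:
        train = station.execute(train)
    return train


stations = [
    Station("station_a", [5.0, 7.0]),
    Station("station_b", [6.0]),
    Station("station_c", [4.0, 8.0, 6.0]),
]
result = run_route(Train(), stations)
print(result.visited, result.count, result.mean())
```

The scientist who receives `result` sees the visited stations, the total count, and the mean, but has no way to reconstruct which value came from which person.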
The PADME platform is an implementation of the PHT concept. The platform itself does not limit the range of analyses that can be run; in practice, the range often depends on methodological limitations and on the availability of implementations. PADME offers a central access point for all scientists who want to run an analysis, but it requires the analysis program to be managed in a repository that the medical community can access, so that the community can check and rate it. All Analysis Trains are encrypted with a public-key infrastructure when they are transported from one station to the next. In this way, PADME aims to facilitate medical studies by bridging the gap between data scientists and data providers and by offering a way to conduct data analytics on sensitive data in a manner compliant with the General Data Protection Regulation (GDPR). PADME is currently at Technology Readiness Level (TRL) 7.
PADME is an analysis platform for distributed sensitive data, such as the data available in the medical domain. Instead of combining all relevant data on a centralized server, PADME follows the paradigm of “bringing the analysis to the data”. Data is managed at data-holding organizations, the so-called stations, and is analyzed there by analysis trains. Each analysis train consists of the specific analysis program together with its execution environment. This shortens execution time and reduces the preparation effort on the station side, since all dependencies required to execute the analysis (except the data itself) are included in the analysis train.
Each data-holding organization (station) has full control over the analysis execution: it can attach the required data before the analysis starts, check the obtained results, or reject the analysis execution altogether. Intermediate results are shipped within the analysis trains. Therefore, all analysis trains are encrypted using a state-of-the-art public-key infrastructure to protect the privacy and the origin of the analysis trains, in particular of the analysis programs and intermediate results they contain.
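The three control points a station has (reject the train, attach its own data, check the results before release) can be sketched as a small decision function. Everything here is an assumption made for illustration: the `handle_train` function, the `Outcome` values, and the minimum-count release rule are hypothetical and not taken from the PADME station software. Encryption is deliberately left out of the sketch.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    REJECTED = "rejected"
    WITHHELD = "results withheld"
    RELEASED = "results released"


def handle_train(
    analysis: Callable[[list[dict]], dict],
    local_data: list[dict],
    accept: bool,
    approve_results: Callable[[dict], bool],
) -> tuple[Outcome, Optional[dict]]:
    # Control point 1: the station may refuse to execute the train at all.
    if not accept:
        return Outcome.REJECTED, None
    # Control point 2: the station attaches its own data and runs the analysis locally.
    results = analysis(local_data)
    # Control point 3: the station checks the results before anything leaves the site.
    if not approve_results(results):
        return Outcome.WITHHELD, None
    return Outcome.RELEASED, results


def count_over_60(records: list[dict]) -> dict:
    # Hypothetical analysis: an aggregate count, no person-level output.
    return {"n_over_60": sum(1 for r in records if r["age"] > 60)}


def at_least_five(results: dict) -> bool:
    # Hypothetical release rule: withhold counts small enough to single out persons.
    return all(value >= 5 for value in results.values())


records = [{"age": a} for a in (34, 61, 72, 65, 58, 80, 66, 90)]

released = handle_train(count_over_60, records, True, at_least_five)
withheld = handle_train(count_over_60, records[:3], True, at_least_five)
rejected = handle_train(count_over_60, records, False, at_least_five)
print(released, withheld, rejected)
```

In the second call the count is too small to pass the release rule, so the station keeps the result; in the third call the analysis is never executed.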
The PADME platform consists of two main parts, the central services and the station software that runs at each station:
1) Station Registry (central service): Registration service for all stations. Institutions that want to participate in the analysis network using the PADME platform can register themselves. Once registration is complete, an automatic onboarding service guides institution managers through the installation of the Station Software (see 6).
2) Playground: Sandbox service for developing and testing analyses before they are executed in distributed mode. With the help of the Playground, researchers can re-model the distributed data sets and simulate analyses on synthetic data. This allows them to find errors before the analysis is executed at the stations.
3) Creator: Service for the (step-by-step) creation and uploading of analyses to the PHT platform.
4) Storehouse: "App" store for the analyses. Analyses that have not yet been published can be viewed and evaluated here. The Storehouse serves as an audit platform for reviewing the analyses.
5) Requester (central service): Central dashboard for sending analyses to the data-holding institutions. All available institutions are listed here and can be selected. After selection, the analyses are automatically dispatched to the selected institutions.
6) Station Software: Software for the data-holding institutions. This software receives analyses and serves as a "remote control" for managing them. In particular, the station software establishes the connection to the data requested by the analyses. Furthermore, it lets the institution inspect how the data carried by an analysis has changed before the analysis is sent back to the central server.
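How the Station Registry and the Requester relate to each other can be sketched as follows. This is a simplified illustration, not the platform's API: the `StationRegistry` class, the `build_route` function, and the station names are hypothetical. The sketch assumes that only stations that have completed onboarding can be selected for a route.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StationRegistry:
    """Hypothetical stand-in for the Station Registry."""
    _stations: dict[str, bool] = field(default_factory=dict)

    def register(self, name: str) -> None:
        # A newly registered station still has to install the station software.
        self._stations[name] = False

    def complete_onboarding(self, name: str) -> None:
        if name not in self._stations:
            raise KeyError(f"unknown station: {name}")
        self._stations[name] = True

    def available(self) -> list[str]:
        # Only onboarded stations are offered for selection.
        return sorted(n for n, ready in self._stations.items() if ready)


def build_route(registry: StationRegistry, selected: list[str]) -> list[str]:
    """Hypothetical Requester step: validate the selection and fix the visiting order."""
    available = set(registry.available())
    unavailable = [name for name in selected if name not in available]
    if unavailable:
        raise ValueError(f"stations not available: {unavailable}")
    return list(selected)


registry = StationRegistry()
for name in ("station_a", "station_b", "station_c"):
    registry.register(name)
registry.complete_onboarding("station_a")
registry.complete_onboarding("station_b")

route = build_route(registry, ["station_b", "station_a"])
print(route)
```

Selecting `station_c` in this example would raise a `ValueError`, because that station has registered but not yet completed onboarding.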
The product in use
The PADME PHT platform is already in use at RWTH, Leipzig University, and the Leipzig University Medical Center. As of the beginning of 2023, it has been used for data analysis in several research projects, including MII CORD and LEUKO-Expert.
The PADME team welcomes any feedback on this infrastructure, whether questions or comments about its application in specific (medical) use cases, about technical solutions and usability, or about further open requirements.