TRACY has received funding from the European Health and Digital Executive Agency (HADEA) under the Commission Digital Europe Programme (DIGITAL) with Grant Agreement No 101102641.
DISCLAIMER: Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Health and Digital Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
TRACY Common Model: From Chaos to Clarity
(By PT – Michaela Antonopoulou)
The Challenge
In today’s data-driven landscape, collecting information is just the beginning. The real challenge begins when that data comes from multiple sources, each with its own format, structure, and level of completeness. Integrating this fragmented data manually is not only time-consuming but also error-prone. During TRACY, we faced exactly this problem: the data received from MNOs arrived with different schemas, naming conventions, and granularity levels. We solved it by introducing a Common Data Model and a fully automated ingestion pipeline.
The Turning Point
We realized that the key to solving our ingestion challenge wasn’t just automation; it was standardization. The Common Model allowed us to retain all critical information from each source while transforming it into a consistent structure. By aligning fields, applying naming conventions, and defining transformation rules, we made it possible to treat diverse datasets as part of a single ecosystem. Once standardized, the data could be enriched further, enabling descriptive analytics, predictive models, and heuristic insights.
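As a simplified illustration of what this standardization means in practice, the sketch below renames provider-specific fields onto a shared schema. The provider names, field names, and mapping rules are hypothetical placeholders, not the actual TRACY Common Model.

```python
# Illustrative sketch only: providers and field names are hypothetical,
# not the real TRACY Common Model schema or its transformation rules.
FIELD_MAPPINGS = {
    "provider_a": {
        "msisdn": "subscriber_id",       # provider-specific name -> common model name
        "cell_id": "cell_identifier",
        "start_ts": "event_timestamp",
    },
    "provider_b": {
        "phone_number": "subscriber_id",
        "cellId": "cell_identifier",
        "timestamp": "event_timestamp",
    },
}

def to_common_model(record: dict, provider: str) -> dict:
    """Rename provider-specific fields to the common schema, keeping unmapped fields as-is."""
    mapping = FIELD_MAPPINGS.get(provider, {})
    return {mapping.get(key, key): value for key, value in record.items()}

# Example: the same logical record from two providers ends up in one structure.
raw = {"msisdn": "306900000000", "cell_id": "A1", "start_ts": "2024-01-01T10:00:00Z"}
print(to_common_model(raw, "provider_a"))
# {'subscriber_id': '306900000000', 'cell_identifier': 'A1', 'event_timestamp': '2024-01-01T10:00:00Z'}
```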
A Fully Automated Pipeline
With the Common Model in place, we built a robust, automated pipeline to manage the entire ingestion process from start to finish. It all begins with MinIO, our high-performance object storage platform, which acts as the central repository for all incoming data. Each case is assigned to a dedicated directory, where files are uploaded and organized by provider, ensuring clarity and traceability. From there, Apache Airflow takes over. This orchestration tool runs our ingestion workflow automatically: collecting the raw files, transforming them according to the Common Model, and loading the clean data into our analytics database. The entire process is scheduled to run periodically. It requires minimal human intervention, supports both batch and near-real-time ingestion, and ensures that data is always ready for use, whether in dashboards, reports, or machine learning pipelines.
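To make the flow more concrete, here is a minimal sketch of how such an ingestion workflow could be wired up as an Airflow DAG reading from a MinIO bucket, assuming Airflow's TaskFlow API and the Python minio client. The bucket name, case directory layout, credentials, schedule, and task bodies are illustrative assumptions, not the production TRACY pipeline.

```python
# Minimal sketch of a collect -> transform -> load DAG; names and paths are illustrative.
from datetime import datetime

from airflow.decorators import dag, task
from minio import Minio


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def mno_ingestion():

    @task
    def collect_raw_files(case_id: str = "case-001") -> list[str]:
        """List the provider files uploaded for a case in the MinIO object store."""
        client = Minio("minio:9000", access_key="...", secret_key="...", secure=False)
        return [
            obj.object_name
            for obj in client.list_objects("tracy-cases", prefix=f"{case_id}/", recursive=True)
        ]

    @task
    def transform(object_names: list[str]) -> list[dict]:
        """Map each raw file's records onto the Common Model (see the field-mapping sketch above)."""
        records = []
        for name in object_names:
            provider = name.split("/")[1]  # e.g. case-001/provider_a/file.csv -> provider_a
            # ... download, parse, and apply the provider's field mapping here ...
            records.append({"source_object": name, "provider": provider})
        return records

    @task
    def load(records: list[dict]) -> None:
        """Insert the standardized records into the analytics database."""
        # e.g. bulk-insert via SQLAlchemy or an Airflow database hook
        print(f"Loaded {len(records)} records")

    load(transform(collect_raw_files()))


mno_ingestion()
```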
The Result: Scalable, Reliable, Insight-Ready Data
This shift from manual, fragmented workflows to an automated, standardized ingestion process has dramatically improved LEA operations. LEAs can now onboard new data sources in hours rather than days. We’ve minimized errors, accelerated time-to-insight, and enabled more sophisticated analytics across the board.