Discovery and Analysis of Clinical Pathways in Health Data

Duration: 01.01.2023 – 31.12.2027

Real-World Data in healthcare consists of electronic medical records, health insurance claims, prescriptions and other routinely collected data points observed or recorded during common medical practice. International efforts to create common standards and tools for utilising such data are happening within the OHDSI community. While some analyses have been well established in epidemiology, many challenges remain ahead. For example, the temporal aspects of various clinically relevant events described by raw data remain underused. We will systematically describe the mathematical models and algorithmic approaches for defining the relevant events represented by raw data and develop methods for discovering and analysing common pathways followed by many patients. The resulting knowledge will improve healthcare, allow better cost estimation, and provide novel opportunities for comparing quality and running clinical studies internationally.
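As a rough illustration of what "discovering common pathways" can mean in practice, the sketch below orders each patient's events by date and counts identical event sequences across patients. It is a minimal example under stated assumptions, not the project's method: the record layout (patient id, date, event name), the function names `extract_pathways` and `common_pathways`, and the toy data are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical toy records: (patient_id, event_date, event_name).
EVENTS = [
    (1, date(2020, 1, 5), "diagnosis"),
    (1, date(2020, 2, 1), "surgery"),
    (1, date(2020, 3, 1), "chemotherapy"),
    (2, date(2021, 3, 1), "surgery"),
    (2, date(2021, 2, 1), "diagnosis"),
    (2, date(2021, 4, 1), "chemotherapy"),
    (2, date(2021, 5, 1), "chemotherapy"),
    (3, date(2020, 6, 1), "diagnosis"),
    (3, date(2020, 7, 1), "chemotherapy"),
]


def extract_pathways(events):
    """Return {patient_id: tuple of event names in chronological order}.

    Consecutive repeats of the same event are collapsed into one step
    (an assumption made for this sketch only).
    """
    by_patient = {}
    for patient_id, event_date, event_name in events:
        by_patient.setdefault(patient_id, []).append((event_date, event_name))

    pathways = {}
    for patient_id, records in by_patient.items():
        records.sort(key=lambda record: record[0])  # order by date
        sequence = []
        for _, name in records:
            if not sequence or sequence[-1] != name:
                sequence.append(name)
        pathways[patient_id] = tuple(sequence)
    return pathways


def common_pathways(pathways, min_patients=2):
    """Return (pathway, patient_count) pairs followed by at least min_patients."""
    counts = Counter(pathways.values())
    return [(path, n) for path, n in counts.most_common() if n >= min_patients]


if __name__ == "__main__":
    for path, n in common_pathways(extract_pathways(EVENTS)):
        print(" -> ".join(path), f"({n} patients)")
```

Real pathway analysis would also need to handle event definitions, time gaps between steps, and near-identical sequences, which exact matching ignores.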

This work was supported by the Estonian Research Council grant (PRG1844).



Duration: 1 Oct 2021 - 30 Sep 2026

The goal of the Optima Oncology project is to tackle cancer through real-world data and artificial intelligence, and to develop a clinical decision support platform for the optimal treatment of patients with breast, lung and prostate cancer. The project is a strong alliance of 36 public and private partners from 6 countries. Its vision is that every patient should have access to the most up-to-date individualised treatments and innovative therapies. By strengthening shared decision-making through dynamic computer-interpretable guidelines (CIGs), innovative access to broad data sets, and AI-driven technology and tools, we envision revolutionising oncology care in Europe.



Duration: Nov 2018 – Apr 2024

The Electronic Health Data in a European Network (EHDEN) is an EU project whose objective is to provide all the services needed for a distributed European data network to perform fast, scalable and highly reproducible research. The core of EHDEN is the use of a common data model (OMOP CDM), standardised outcome assessment (ICHOM), and transparent open-source analytics (OHDSI). Among other objectives, it aims to map more than 100 million patient records from different geographic areas and data sources across Europe to OMOP CDM. Our research team is part of the EHDEN Consortium, focusing on technical implementation (security aspects of the platform) and personalised medicine (analysing disease pathways). We are also the pioneers of using OMOP CDM in Estonia, and we participate in various research studies and study-a-thons organised by EHDEN.


Duration: Mar 2019 – June 2023

PerMed is an IT-infrastructure project to bring personalised medicine into common clinical practice in Estonia. While many proof-of-principle solutions, such as polygenic risk scores and extensive pharmacogenetic testing, have been effectively demonstrated in science projects, new IT components need to be developed and deployed in the national health system to bring these into everyday clinical practice. In the PerMed project, our team is building three main components for this: a national genetic database, a system for managing computational models, and a scalable computing environment.

Health Sense

Duration: Mar 2021 – Jan 2023

The goal of the Health Sense project is to develop a secure toolkit for data storage, integration, access and analysis that provides large, complex and detailed sets of health and lifecycle data to the public sector, the private sector and R&D institutions. Part of the project is to build a software tool that obfuscates patient-level health data to preserve the privacy of the patients, so that the data complies with the limitations and responsibilities described in the General Data Protection Regulation (GDPR). Our research team is responsible for building that tool.



Duration: 2009 - Aug 2022

The University of Tartu is a founding member of the private company STACC (previously known as the Software Technology and Applications Competence Center). Its mission is to conduct high-level applied research in data science and machine learning in cooperation with a consortium of scientific, government, industrial, and technology partners. Since the beginning, personalised medicine, and health data analysis in particular, has been one of the main focus areas at STACC. The University of Tartu was the main contributor to these tasks, developing a wide range of health data management and analysis tools over the years. Our health informatics research group has grown out of the STACC project. We still use and keep improving many of these tools, such as a data anonymisation tool, methods for fact extraction from free text (natural language processing), data visualisation tools, and many others.


Duration: May 2018 – April 2022

With PRECISE4Q we set out to minimize the burden of stroke both for the individual and for society. To that end we will create multi-dimensional data-driven predictive simulation computer models. This will, for the first time, enable personalized stroke treatment and address the needs of the patient in each stage of the disease (1. Prevention, 2. Acute treatment, 3. Rehabilitation, 4. Reintegration). Stroke is one of the most severe medical problems, with far-reaching public health and socio-economic impact, and its burden will grow in an aging society. We will integrate heterogeneous input data from multidisciplinary sources: genomics/microbiomics and biochemical data; imaging data, including mechanistic biophysiological models of brain perfusion/function; social, lifestyle and gender data; economic and worklife data. Data will be collected over a patient’s life, and the models will enable the patient to report wellbeing, outcome and quality of life. PRECISE4Q will output different decision support systems depending on the life stage the patient is in. We will enable the user to optimize prevention and treatment strategies over time. We will provide coping strategies and support well-being and reintegration into social life and work. The predictive capability and clinical precision of PRECISE4Q will be validated with real clinical data generated by 1. prospective clinical studies and 2. retrospective analyses of big data sets such as health registries, cohort studies, health insurance data and electronic health records. PRECISE4Q will have a clinically measurable and sustainable impact and will lead to a better understanding of risk, health and resilience factors. It will allow us to measure the impact of interventions on different scales and in different stages of a patient’s life.
In contrast to current schematic therapy guidelines, PRECISE4Q will support patients throughout their life-long journey with personalized strategies for their individual and specific needs.

RITA CORIVA

Duration: Sep 2020 – Mar 2022

At the beginning of the COVID-19 pandemic, much was unknown about the disease, its risk factors, progression dynamics and consequences. The goal of the CORIVA project was to study these questions using administrative health data and a small cohort of COVID-19 patients. The project was led by the University of Tartu Faculty of Medicine. Within the project, we mapped the administrative data of the COVID-19 patients to OMOP CDM and developed risk models for the progression of the disease. We are continuing to study various aspects of the disease using this cohort of patients.


Duration: Oct 2019 – Feb 2022

RITA MAITT was a feasibility study for introducing machine learning and AI-powered solutions in state-provided services. Our team was responsible for the health domain. We integrated patient-level data from three central health databases (insurance bills, digital prescriptions, and discharge summaries) and brought a random sample of 10% of the data to the OMOP common data model. Through more than ten clinical use cases, we demonstrated how this integrated common dataset could be an effective tool for solving various tasks in the public health domain.
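When several databases are sampled together, the sample is usually drawn per patient rather than per record, so that each selected patient keeps their records from every source. The sketch below shows one possible way to do this with a salted hash of the patient identifier. It is an illustrative assumption, not a description of how the RITA MAITT sample was actually drawn; the field name `patient_id`, the salt value and the function names are hypothetical.

```python
import hashlib


def in_sample(patient_id, fraction=0.10, salt="demo-salt"):
    """Decide deterministically whether a patient belongs to the sample.

    The salted SHA-256 hash of the id is mapped to a number in [0, 1);
    patients whose number falls below `fraction` are selected.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def sample_records(records, fraction=0.10):
    """Keep only the records that belong to sampled patients."""
    return [r for r in records if in_sample(r["patient_id"], fraction)]


if __name__ == "__main__":
    # Hypothetical stand-ins for two of the source databases.
    claims = [{"patient_id": f"P{i}", "bill": i} for i in range(1000)]
    prescriptions = [{"patient_id": f"P{i}", "drug": "X"} for i in range(1000)]

    sampled_claims = sample_records(claims)
    sampled_prescriptions = sample_records(prescriptions)
    print(len(sampled_claims), "of", len(claims), "claim records kept")
```

Because the decision depends only on the patient identifier, the same patients are selected from every source without sharing a list of sampled ids between systems.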

Algorithmic and Artificial Intelligence Approaches for Digital Health

Duration: Jan 2021 – Dec 2021

Digital health data opens opportunities for applying algorithmic and artificial intelligence techniques to the analysis of these rich and complex data. Estonia is at the forefront of collecting health data in electronic centralised databases. We propose to study those data and develop better fundamental approaches for analysing such complex data. First, we will convert data into OHDSI/OMOP formats and define important high-level concepts. Secondly, we will develop patient-group-level comparison approaches for disease trajectories. Thirdly, we will develop methods and tools to improve the interpretability of complex multidimensional health data. Last but not least, we will continue data collection, analysis and international collaboration on COVID-19, the disease caused by the SARS-CoV-2 coronavirus. We have set up a survey and tools, and will carry on this research based on the survey as well as on emerging virus RNA sequencing data and human genetic traits.


Duration: Jan 2013 – Jun 2018

It became clear in the 2010s that huge volumes of health data were already being collected and stored in electronic health records. However, the secondary use of these data is challenging, as they exist in disparate locations and systems and are generally used in isolation. The European Medical Information Framework (EMIF) was an ambitious project to improve access to human health data across Europe. To this end, a common Information Framework (EMIF-Platform) was developed to facilitate access to diverse medical and research data sources. It was the first project in Europe to introduce OMOP CDM, and it eventually led to the EHDEN project.

The methods, environments, and applications for solving large and complex computational problems

Duration: 1.01.2006 – 31.12.2011

The goal of the research is to develop, in an integrated manner, novel methods and tools for solving large-scale and complex computational problems in distributed environments such as GRID. We will develop methods for formal validation, data security and protection, and middleware, as well as algorithms and methods for different applications that require large-scale data analysis. Overall, we will 1) develop data mining, pattern discovery, and machine learning algorithms and tools, 2) continue developing the DOUG solver (Domain Decomposition on Unstructured Grids) for solving very large systems of linear equations, 3) develop formal methods and practical approaches for ensuring the correctness, robustness, and data protection of GRID computations, 4) develop end-user interfaces and study user training aspects, and, last but not least, 5) apply the developed methods to various problems in several application areas, including bioinformatic analyses of gene regulatory networks and gene transcriptional control, computer systems log analysis, and large database analysis.