Grants and projects - RESEARCH GROUP OF HEALTH INFORMATICS

Health Informatics Projects

Estonian Centre of Excellence in Artificial Intelligence (EXAI)

01.01.2024–31.12.2030

Artificial intelligence (AI) has a strong and growing potential in helping to address challenges in many societal contexts. The capabilities of AI in interpreting complex data and generating solutions have been amplified through the use of foundation models such as large language models. The Estonian Centre of Excellence in Artificial Intelligence (EXAI) focuses on advancing innovative methodologies for

leveraging foundation models in building efficient and trustworthy analysis and prediction systems;
implementing control mechanisms and guardrails to ensure that the advanced AI systems follow their specification;
adapting and enhancing AI systems for improved performance in targeted application contexts; and
achieving end-to-end security and privacy assurance of AI systems.

We apply these methodologies to advance AI capabilities in key Estonian sectors, including e-governance, healthcare, business process management, and cybersecurity.

Data Analysis and Real World Interrogation Network (DARWIN EU®)

2023–2027

The European Medicines Agency’s (EMA) DARWIN project is building a continuously expanding data network aimed at conducting high-quality observational studies. The results of these studies are shared with EU agencies to support regulatory decision-making.

The DARWIN network consists of a central coordination center and data partners who hold real-world health data harmonized into a common format. The University of Tartu is one of the data partners.

Research questions are defined by the EMA, based on which the coordination center develops the study protocol and the software required to conduct the study. Data partners run the software on their own datasets and share the results with the coordination center.
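The division of labour described above can be illustrated with a minimal sketch. It is not the DARWIN software: the record layout, the function names, and the small-cell suppression threshold are assumptions made for illustration only. The point it shows is that patient-level records stay with each data partner and only aggregate results reach the coordination center.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical patient-level records; in this sketch they never leave the data partner.
PartnerData = List[Dict[str, str]]


def run_study_locally(records: PartnerData, min_cell_count: int = 5) -> Dict[str, int]:
    """Run the shared analysis on one partner's data and return aggregates only."""
    counts = Counter(record["drug"] for record in records)
    # Assumed disclosure-control rule: drop counts below the threshold.
    return {drug: n for drug, n in counts.items() if n >= min_cell_count}


def combine_results(partner_results: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Coordination-center step: sum the aggregate counts shared by partners."""
    total: Counter = Counter()
    for result in partner_results:
        total.update(result)
    return dict(total)


# Two hypothetical data partners with made-up records.
partner_a = [{"drug": "A"}] * 6 + [{"drug": "B"}] * 2
partner_b = [{"drug": "A"}] * 5 + [{"drug": "B"}] * 7

shared = [run_study_locally(partner_a), run_study_locally(partner_b)]
combined = combine_results(shared)
```

Because every partner runs the same code against data in the same common format, the coordination center can combine the results without ever seeing individual records.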

The University of Tartu aims to participate in these studies using data from the Estonian Biobank (EBB) in order to support better decision-making throughout the entire lifecycle of medicines, by providing up-to-date and reliable evidence from everyday healthcare practice.

Enhancing the Capability of Secondary Usage of Health Data (TAKS)

01.01.2024–31.12.2028

Although Estonia has pioneered electronic health record collection, these data are rarely reused. This significantly hinders our ability to develop better healthcare products, services, and clinical guidelines. There is no comprehensive overview of what data are available and at what quality. No secondary-use infrastructure has been created that would enable data access and analysis in a quick and standardized way.

Our research group has developed extract-transform-load (ETL) workflows and tools that enable complex analysis routines independent of underlying studies. In this project, we will

1) bring ETL workflows to a qualitatively new level using artificial intelligence,

2) develop bespoke analysis tools to meet our partners’ analytical needs, and

3) bring these to end users in collaboration with clinical researchers, the public sector, and the private sector.

This will increase stakeholders’ ability to re-use healthcare data, thereby improving the healthcare innovation ecosystem in Estonia.

The project is co-funded by the European Union and the Estonian Ministry of Education and Research (project TEM-TA72).

TeamPerMed Center for Personalized Medicine: Transforming Healthcare in Estonia

TeamPerMed is committed to developing clinical guidelines and decision support tools for healthcare systems in Estonia and the EU. Utilizing extensive databases and electronic health records, we create tools for early health risk identification and prevention. Our goal is to validate at least six personalized medicine tools in the next five years.

In partnership with Erasmus Medical Centre, Erasmus University Rotterdam, and the University of Helsinki, and supported by the European Commission and the Republic of Estonia, TeamPerMed aims to develop a scalable framework for translating genomic and health data into personalized medicine tools that can be used in clinical practice to guide prevention and treatment strategies for chronic and hereditary diseases.

Discovery and Analysis of Clinical Pathways in Health Data

01.01.2023 – 31.12.2027

Real-World Data in healthcare consists of electronic medical records, health insurance claims, prescriptions and other routinely collected data points observed or recorded during common medical practice. International efforts to create common standards and tools for utilising such data are happening within the OHDSI community. While some analyses have been well established in epidemiology, many challenges remain ahead. For example, the temporal aspects of various clinically relevant events described by raw data remain underused. We will systematically describe the mathematical models and algorithmic approaches for defining the relevant events represented by raw data and develop methods for discovering and analysing common pathways followed by many patients. The resulting knowledge will improve healthcare, allow better cost estimation, and provide novel opportunities for comparing quality and running clinical studies internationally.
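As a rough illustration of what discovering common pathways can mean in practice, the sketch below orders each patient's events by date, collapses consecutive repeats, and counts how many patients share the same sequence. It is a simplified assumption, not the method developed in the project; the event names, the tuple layout, and the min_patients threshold are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical event record: (patient_id, event_date, event_name)
Event = Tuple[str, date, str]


def build_pathways(events: List[Event]) -> Dict[str, Tuple[str, ...]]:
    """Turn raw events into one time-ordered pathway per patient."""
    per_patient = defaultdict(list)
    for patient_id, event_date, name in events:
        per_patient[patient_id].append((event_date, name))

    pathways = {}
    for patient_id, items in per_patient.items():
        items.sort()  # order by date (ties broken by event name)
        ordered: List[str] = []
        for _, name in items:
            # Collapse consecutive repeats of the same event.
            if not ordered or ordered[-1] != name:
                ordered.append(name)
        pathways[patient_id] = tuple(ordered)
    return pathways


def common_pathways(events: List[Event], min_patients: int = 2):
    """Return pathways followed by at least min_patients patients, most common first."""
    counts = Counter(build_pathways(events).values())
    return [(path, n) for path, n in counts.most_common() if n >= min_patients]


# Made-up example data for three patients.
example_events = [
    ("p1", date(2020, 1, 1), "diagnosis"),
    ("p1", date(2020, 2, 1), "surgery"),
    ("p1", date(2020, 3, 1), "chemotherapy"),
    ("p2", date(2021, 5, 3), "surgery"),
    ("p2", date(2021, 4, 2), "diagnosis"),
    ("p2", date(2021, 6, 9), "chemotherapy"),
    ("p3", date(2022, 1, 5), "diagnosis"),
    ("p3", date(2022, 1, 20), "chemotherapy"),
]
result = common_pathways(example_events)
```

Exact sequence matching is the simplest possible choice; real data would likely need the event definitions and temporal tolerances that the paragraph above identifies as open problems.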

This project is supported by the Estonian Research Council grant (PRG1844).

OPTIMA ONCOLOGY

The Optima Oncology project aims to tackle cancer through real-world data and artificial intelligence by developing a clinical decision support platform for the optimal treatment of patients with breast, lung, and prostate cancer. It is an alliance of 36 public and private partners from 6 countries. The Optima Oncology vision is that every patient should have access to the most up-to-date individualised treatments and innovative therapies. By strengthening shared decision-making through dynamic computer-interpretable guidelines (CIGs), innovative access to broad data sets, and AI-driven technology and tools, we envision revolutionising oncology care in Europe.

The Electronic Health Data in a European Network (EHDEN)

The Electronic Health Data in a European Network (EHDEN) is an EU project whose objective is to provide all the services needed for a distributed European data network to perform fast, scalable, and highly reproducible research. The core of EHDEN is the use of a common data model (OMOP CDM), standardised outcome assessment (ICHOM), and transparent open-source analytics (OHDSI). Among other objectives, it aims to map more than 100 million patient records from different geographic areas and data sources across Europe to OMOP CDM. Our research team is part of the EHDEN Consortium, focusing on the technical implementation (security aspects of the platform) and personalised medicine (analysing disease pathways). We are also the pioneers of using OMOP CDM in Estonia and participate in various research studies and study-a-thons organised by EHDEN.

Health Sense

Duration: Mar 2021 – Apr 2024

The Health Sense project aims to develop a secure data storage, integration, access, and analysis toolkit to provide large, complex, and detailed sets of health and lifecycle data for public, private, and R&D institutions. Part of the project is to build a software tool that obfuscates patient-level health data to preserve patients’ privacy, so that the data meet the restrictions and obligations set out in the General Data Protection Regulation (GDPR). Our research team was responsible for building that tool. Currently, we are focusing on optimally storing the information originally retrieved as HL7 CDA documents for Estonia’s central Health Information System.

Software Technology and Applications Competence Centre (STACC)

Description of the project: The vision of the Software Technology and Applications Competence Centre (STACC) is to become a leading R&D organisation where companies and public sector agencies can access expertise in (big) data analytics and co-develop visionary technology products. STACC’s main business is providing data analytics and data privacy protection services to help companies bring high-quality services to the market faster. STACC has four strategic development areas: 1) Data Analytics for Software and Systems Optimization, 2) Spatio-Temporal Data Analytics, 3) Big Data and Security, and 4) E-Health and Personalised Medicine. STACC strongly contributes to Estonian IT education and enhances research partners’ capacity in the field of data analytics and its applications.

Objective: The main objective of the project is to turn STACC into a leading and economically independent R&D organisation where companies can access expertise in data analytics and co-develop visionary technology products.

Outcome:

1. Estonia has a world-class technology competence centre with substantial expertise in data analytics that supports the Estonian ICT sector.

2. STACC helps 23+ companies bring their scalable technology products to the global market faster within the framework of the CC Program and helps more than 50 companies by providing commercial services in the field of data analytics.

3. The ratio of sales generated from new products for partner enterprises rises by 41%, their R&D costs increase on average by 4%, and added value per employee by 12.5% per year.

4. STACC strongly contributes to Estonian IT education and enhances research partners’ capacity in the field of data analytics and its applications.

PerMed

Duration: Mar 2019 – June 2023

PerMed is an IT infrastructure project to bring personalised medicine into common clinical practice in Estonia. While many proof-of-principle solutions, such as polygenic risk scores and extensive pharmacogenetic testing, have been effectively demonstrated in science projects, new IT components need to be developed and deployed to the national health system to bring these into everyday clinical practice. In the PerMed project, our team is building three main components for this: a national genetic database, a system for managing computational models, and a scalable computing environment.

STACC

The University of Tartu is a founding member of the private company STACC (previously known as the Software Technology and Applications Competence Center). Its mission is to conduct high-level applied research in the field of data science and machine learning in cooperation with a consortium of scientific, government, industrial, and technology partners. Since the beginning, personalised medicine, especially health data analysis, has been one of the main focus areas at STACC. The University of Tartu was the main contributor to these tasks, developing a wide range of health data management and analysis tools over the years. Our health informatics research group can be said to have grown out of the STACC project. We still use and keep improving many of these tools, such as a data anonymisation tool, methods for fact extraction from free text (natural language processing), data visualisation tools, and many others.

Precise4Q

Duration: May 2018 – April 2022

With PRECISE4Q we set out to minimize the burden of stroke both for the individual and for society. To that end we will create multi-dimensional, data-driven predictive simulation computer models. This will – for the first time – enable personalized stroke treatment and address the needs of the patient at each stage of the disease (1. prevention, 2. acute treatment, 3. rehabilitation, 4. reintegration). Stroke is one of the most severe medical problems, with far-reaching public health and socio-economic impact, and its impact will grow in an aging society. We will integrate heterogeneous input data from multidisciplinary sources: genomics/microbiomics and biochemical data; imaging data, including mechanistic biophysiological models of brain perfusion/function; social, lifestyle, and gender data; and economic and worklife data. Data will be collected over a patient’s life, and the models will enable the patient to report wellbeing, outcome, and quality of life. PRECISE4Q will output different decision support systems depending on the life stage the patient is in. We will enable the user to optimize prevention and treatment strategies over time. We will provide coping strategies and support well-being and reintegration into social life and work. The predictive capability and clinical precision of PRECISE4Q will be validated with real clinical data generated by (1) prospective clinical studies and (2) retrospective analyses of large datasets such as health registries, cohort studies, health insurance data, and electronic health records. PRECISE4Q will have a clinically measurable and sustainable impact and will lead to a better understanding of risk, health, and resilience factors. It will allow us to measure the impact of interventions on different scales and at different stages in a patient’s life.
In contrast to current schematic therapy guidelines, PRECISE4Q will support patients throughout their lifelong journey with personalized strategies for their individual and specific needs.

RITA Coriva

Duration: Sep 2020 – Mar 2022

At the beginning of the COVID-19 pandemic, much was unknown about the disease, its risk factors, progression dynamics, and consequences. The goal of the CORIVA project was to study these questions using administrative health data and a small cohort of COVID-19 patients. The project was led by the University of Tartu Faculty of Medicine. Within the project, we mapped the administrative data of the COVID-19 patients to OMOP CDM and developed risk models for the progression of the disease. We are continuing to study various aspects of the disease using this cohort of patients.

RITA MAITT

Duration: Oct 2019 – Feb 2022

RITA MAITT was a feasibility study for introducing machine learning and AI-powered solutions in state-provided services. Our team was responsible for the health domain. We integrated patient-level data from three central health databases (insurance bills, digital prescriptions, and discharge summaries) and converted a random 10% sample of the data to the OMOP common data model. We demonstrated via more than ten clinical use cases how this integrated common dataset could be an effective tool for solving various tasks in the public health domain.

Algorithmic and Artificial Intelligence Approaches for Digital Health

Duration: Jan 2021 – Dec 2021

Digital health data open opportunities for applying algorithmic and artificial intelligence techniques to the analysis of rich and complex data. Estonia is at the forefront of collecting health data in centralised electronic databases. We propose to study those data and develop better fundamental approaches for analysing such complex data. First, we will convert data into OHDSI/OMOP formats and define important high-level concepts. Secondly, we will develop patient-group-level comparison approaches for disease trajectories. Thirdly, we will develop methods and tools to improve the interpretability of complex multidimensional health data. Last but not least, we will continue data collection, analysis, and international collaboration on COVID-19, the disease caused by the coronavirus SARS-CoV-2. We have set up a survey and tools at koroona.ut.ee and will carry on this research based on the survey as well as on emerging virus RNA sequencing data and human genetic traits.

The European Medical Information Framework (EMIF)

Duration: Jan 2013 – Jun 2018

It became clear in the 2010s that huge volumes of health data are already being collected and stored in electronic health records. However, the secondary use of these data is challenging as these exist in disparate locations and systems, and are generally used in isolation. The European Medical Information Framework (EMIF) was an ambitious project to improve access to human health data across Europe. To this aim, a common Information Framework (EMIF-Platform) was developed to facilitate access to diverse medical and research data sources. It was the first project in Europe that introduced OMOP CDM, and eventually led to the EHDEN project.

The methods, environments, and applications for solving large and complex computational problems

Duration: 01.01.2006 – 31.12.2011

The goal of the research is to develop, in an integrated manner, novel methods and tools for solving large-scale and complex computational problems in distributed environments such as GRID. We will develop methods for formal validation, data security and protection, and middleware, as well as algorithms and methods for different applications that require large-scale data analysis. Overall, we will 1) develop data mining, pattern discovery, and machine learning algorithms and tools, 2) continue developing the DOUG (Domain Decomposition on Unstructured Grids) solver for very large systems of linear equations, 3) develop formal methods and practical approaches for ensuring the correctness, robustness, and data protection of GRID computations, 4) develop end-user interfaces and study user training aspects, and last but not least, 5) apply the developed methods to various problems in several application areas, including bioinformatic analyses of gene regulatory networks and gene transcriptional control, computer system log analysis, and large database analysis.

Estonian Centre of Excellence in Artificial Intelligence (EXAI)

Data Analysis and Real World Interrogation Network (DARWIN EU®)

Enhancing the Capability of Secondary Usage of Health Data (TAKS)

TeamPerMed Center for Personalized Medicine: Transforming Healthcare in Estonia

Discovery and Analysis of Clinical Pathways in Health Data

OPTIMA ONCOLOGY

The Electronic Health Data in a European Network (EHDEN)

Health Sense

Software Technology and Applications Competence Centre (STACC)

PerMed

STACC

Precise4Q

RITA Coriva

RITA MAITT

Algorithmic and Artificial Intelligence Approaches for Digital Health

The European Medical Information Framework (EMIF)

The methods, environments, and applications for solving large and complex computational problems
