Tasks
iDPP CLEF 2023 offers two evaluation tasks focused on predicting the progression of Multiple Sclerosis (MS) and one position-paper task on the impact of exposure to pollutants on predicting the progression of Amyotrophic Lateral Sclerosis (ALS).
For Tasks 1 and 2 on MS, participants are given a dataset containing 2.5 years of visits. This dataset comes from two clinical institutions, one in Pavia, Italy, and the other in Turin, Italy, and contains fully anonymized data about real patients. For Task 3 on ALS, participants are given a dataset containing 6 months of visits. This dataset comes from two clinical institutions, one in Lisbon, Portugal, and the other in Turin, Italy, and also contains fully anonymized data about real patients.
All the datasets are highly curated and are produced from the BRAINTEASER Ontology (BTO), developed by the BRAINTEASER project, which ensures the consistency of the represented data. Moreover, several checks have been performed to ensure that all instances are clean, contain values within the expected ranges, and are free of contradictions.
Task 1 - Predicting Risk of Disease Worsening (Multiple Sclerosis)
Task 1 focuses on ranking subjects based on their risk of worsening, framing the problem as a survival analysis task. More specifically, the risk of worsening predicted by the algorithm should range between 0 and 1 and reflect how early a patient experiences the "worsening" event: the earlier the event, the higher the risk.
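Since the task only requires that scores lie in [0, 1] and reflect how early the event occurs, the following minimal Python sketch shows one common way to check such a ranking, Harrell's concordance index, together with a min-max rescaling into [0, 1]. The function names are hypothetical, and this is not necessarily the official evaluation measure of the challenge.

```python
from itertools import combinations
from typing import List, Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ordered correctly.

    A pair is comparable when the subject with the shorter time had the
    event (events[i] == 1). It is concordant when that subject also has the
    higher predicted risk; tied risks count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times or censored earlier subject: skip the pair
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def to_unit_interval(scores: Sequence[float]) -> List[float]:
    """Min-max rescale raw model scores into [0, 1], preserving their order."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)  # all scores equal: no ranking information
    return [(s - lo) / (hi - lo) for s in scores]
```

Because the C-index depends only on the relative order of the scores, the rescaling changes nothing for a ranking measure of this kind; it only satisfies the [0, 1] range requirement.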
Worsening is defined on the basis of the Expanded Disability Status Scale (EDSS), according to clinical standards.
In particular, we consider two different definitions of worsening corresponding to two different sub-tasks:
- Task1a: the patient crosses the threshold EDSS ≥ 3 at least twice within a one-year interval;
- Task1b: worsening is defined relative to the baseline, i.e., the first recorded EDSS value, according to current clinical protocols:
  - if the baseline is EDSS < 1, then the worsening event occurs when an increase of EDSS by 1.5 points is first observed;
  - if the baseline is 1 ≤ EDSS < 5.5, then the worsening event occurs when an increase of EDSS by 1 point is first observed;
  - if the baseline is EDSS ≥ 5.5, then the worsening event occurs when an increase of EDSS by 0.5 points is first observed.
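The two worsening definitions can be sketched in Python as follows. This is a hypothetical illustration, not the organizers' code: it assumes each patient's visits are (days since first visit, EDSS) pairs, treats the first visit as the baseline, and, for Task1a, assumes the event time is the later of the two EDSS ≥ 3 visits falling within 365 days.

```python
from typing import List, Optional, Tuple

# (days since the first visit, EDSS score); hypothetical input layout.
Visit = Tuple[float, float]


def worsening_task1a(visits: List[Visit], window_days: float = 365.0) -> Optional[float]:
    """Task1a: EDSS >= 3 observed at least twice within one year.

    Assumption: the event time is the later of the two qualifying visits.
    Returns None if no event is observed (censored patient).
    """
    high = [t for t, edss in sorted(visits) if edss >= 3.0]
    # For each qualifying visit, its closest earlier qualifying visit is the
    # previous one, so checking consecutive pairs finds the earliest event.
    for earlier, later in zip(high, high[1:]):
        if later - earlier <= window_days:
            return later
    return None


def required_increase(baseline: float) -> float:
    """EDSS increase that counts as worsening for a given baseline."""
    if baseline < 1.0:
        return 1.5
    if baseline < 5.5:
        return 1.0
    return 0.5


def worsening_task1b(visits: List[Visit]) -> Optional[float]:
    """Task1b: first visit whose EDSS exceeds the baseline by the required step."""
    ordered = sorted(visits)
    baseline = ordered[0][1]  # first recorded EDSS value
    step = required_increase(baseline)
    for t, edss in ordered[1:]:
        if edss - baseline >= step:
            return t
    return None  # no worsening observed (censored patient)
```

In the challenge, the event occurrence and time are pre-computed by the organizers, so a sketch like this is mainly useful for exploring how the two definitions behave on the data.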
For each sub-task, participants are given a dataset containing 2.5 years of visits, with the occurrence of the worsening event and the time of occurrence pre-computed by the challenge organizers.
The training data are available upon registration for the challenge and the test data will be available one week before the run submission deadline. Please see the Datasets and Important Dates sections for more information.
Task 2 - Predicting Cumulative Probability of Worsening (Multiple Sclerosis)
Task 2 refines Task 1 by asking participants to explicitly assign the cumulative probability of worsening over different time windows, i.e., between years 0 and 2, 0 and 4, 0 and 6, 0 and 8, and 0 and 10.
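The cumulative probability of worsening by time t is 1 - S(t), where S is a survival function. As a population-level reference point (not a per-patient model), a Kaplan-Meier estimate can be turned into values for the five windows; the minimal sketch below assumes event times are expressed in years and that `events` is 1 for an observed worsening and 0 for censoring. The function name is hypothetical.

```python
from typing import Dict, Sequence

HORIZONS_YEARS = (2, 4, 6, 8, 10)


def km_cumulative_probability(times: Sequence[float], events: Sequence[int],
                              horizons: Sequence[float] = HORIZONS_YEARS) -> Dict[float, float]:
    """Kaplan-Meier estimate of P(worsening by t) = 1 - S(t) for each horizon t."""
    # Distinct times at which at least one event was observed, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []  # (event time, S just after that time)
    for u in event_times:
        at_risk = sum(1 for t in times if t >= u)
        n_events = sum(1 for t, e in zip(times, events) if e and t == u)
        survival *= 1.0 - n_events / at_risk
        curve.append((u, survival))
    result = {}
    for h in horizons:
        s = 1.0  # S(t) = 1 before the first event time
        for u, s_u in curve:
            if u > h:
                break
            s = s_u
        result[h] = 1.0 - s
    return result
```

Because 1 - S(t) can never decrease as t grows, per-patient predictions for the windows 0-2, 0-4, ..., 0-10 should likewise be non-decreasing.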
In particular, as in Task 1, we consider two different definitions of worsening corresponding to two different sub-tasks:
- Task2a: the patient crosses the threshold EDSS ≥ 3 at least twice within a one-year interval;
- Task2b: worsening is defined relative to the baseline, i.e., the first recorded EDSS value, according to current clinical protocols:
  - if the baseline is EDSS < 1, then the worsening event occurs when an increase of EDSS by 1.5 points is first observed;
  - if the baseline is 1 ≤ EDSS < 5.5, then the worsening event occurs when an increase of EDSS by 1 point is first observed;
  - if the baseline is EDSS ≥ 5.5, then the worsening event occurs when an increase of EDSS by 0.5 points is first observed.
For each sub-task, participants are given a dataset containing 2.5 years of visits, with the occurrence of the worsening event and the time of occurrence pre-computed by the challenge organizers.
The training data are available upon registration for the challenge and the test data will be available one week before the run submission deadline. Please see the Datasets and Important Dates sections for more information.
Task 3 - Position Papers on Impact of Exposition to Pollutants (Amyotrophic Lateral Sclerosis)
We will evaluate proposals of different approaches to assess whether exposure to different pollutants is a useful variable for predicting the time to Percutaneous Endoscopic Gastrostomy (PEG), Non-Invasive Ventilation (NIV), and death in ALS patients.
This task is based on the same data and the same design as Task 1 of iDPP CLEF 2022. Therefore, both training and test data are available immediately, and you can build on the experience gained last year. The difference with respect to last year's task is that we complement those data with environmental data to investigate the impact of exposure to pollutants on the prediction of disease progression.
Since both training and test data are immediately available, we consider these submissions as position papers. However, we expect participants to use training and test data in the usual way, as if it were a regular challenge and the test data were released at the "last minute" without ground truth, i.e., with no possibility of overfitting to the test set.
Participants are asked to rank subjects based on the risk of early occurrence of:
- Task3a: NIV (Non-Invasive Ventilation) or (competing event) Death, whichever occurs first;
- Task3b: PEG (Percutaneous Endoscopic Gastrostomy) or (competing event) Death, whichever occurs first;
- Task3c: Death.
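The "whichever occurs first" rule of Task3a and Task3b amounts to a composite endpoint. The sketch below is a hypothetical helper, assuming all times share the same origin and that `None` marks an event that was not observed; on a tie it keeps the clinical event, which is an arbitrary choice. For Task3c, pass `None` as the clinical event time so that only death counts.

```python
from typing import Optional, Tuple


def composite_endpoint(event_time: Optional[float],
                       death_time: Optional[float],
                       follow_up_end: float) -> Tuple[float, int, str]:
    """Return (time, observed flag, cause) for 'event or death, whichever first'.

    `event_time` is the NIV (Task3a) or PEG (Task3b) time; None means not observed.
    """
    candidates = [(t, cause)
                  for t, cause in ((event_time, "event"), (death_time, "death"))
                  if t is not None]
    if not candidates:
        # Neither event observed: censored at the end of follow-up.
        return follow_up_end, 0, "censored"
    # min() with a key returns the first minimum, so a tie keeps the clinical event.
    t, cause = min(candidates, key=lambda c: c[0])
    return t, 1, cause
```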
For each of these tasks, participants are given a dataset containing 6 months of visits and are asked to rank patients on the risk of occurrence of one of the above events after month 6. Participants are also given environmental data series, such as PM10 (particulate matter with a diameter of 10 microns or less), CO (carbon monoxide), or NO2 (nitrogen dioxide).
Ranges for environmental data series vary depending on the pollutant but, for most patients and on average, they cover from (up to) 90 months before Time 0 up to 6 months after Time 0, where Time 0 is the time of the first ALSFRS-R questionnaire.
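Aligning environmental observations with Time 0 reduces to computing month offsets. The minimal sketch below assumes monthly series keyed by calendar date (the actual data layout is described in the Datasets section and may differ); both function names are hypothetical, and the [-90, +6] default only mirrors the typical coverage stated above, since ranges vary by pollutant.

```python
from datetime import date


def months_from_time0(obs: date, time0: date) -> int:
    """Whole calendar months from Time 0 to an observation (negative = before)."""
    return (obs.year - time0.year) * 12 + (obs.month - time0.month)


def within_coverage(offset: int, before: int = 90, after: int = 6) -> bool:
    """True if a month offset lies in the typical [-90, +6] coverage range."""
    return -before <= offset <= after
```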
Please see the Datasets and the Important Dates sections for more information.
In particular, for each subtask, we ask participants for three types of submissions:
- baseline submissions without using any environmental data. Ideally, any submission using environmental data (see below) should be accompanied by a baseline submission, so that the performance gap due to the use of environmental data can be measured;
- submissions using only environmental data from 6 months before to 6 months after Time 0;
- submissions using any time window of environmental data, as preferred by participants.
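The three submission types differ only in which slice of the environmental series feeds the model. The sketch below assumes a hypothetical per-patient dictionary mapping each pollutant to (month offset from Time 0, value) pairs and summarizes each pollutant by its mean inside the window; the mean is just one possible summary, and the `(-90, 6)` entry is only an example of a participant-chosen window.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: pollutant name -> list of (month offset from Time 0, value).
Series = Dict[str, List[Tuple[int, float]]]

# Windows in months relative to Time 0; None means "no environmental data".
SUBMISSION_WINDOWS: Dict[str, Optional[Tuple[int, int]]] = {
    "baseline": None,
    "env_6m": (-6, 6),
    "env_custom": (-90, 6),  # example choice; participants pick their own window
}


def environmental_features(series: Series,
                           window: Optional[Tuple[int, int]]) -> Dict[str, float]:
    """Mean value of each pollutant inside the window (empty for the baseline)."""
    if window is None:
        return {}
    start, end = window
    features = {}
    for pollutant, points in series.items():
        values = [v for m, v in points if start <= m <= end]
        if values:  # skip pollutants with no observation in the window
            features[f"{pollutant}_mean"] = mean(values)
    return features
```

Building all three feature sets from the same clinical features makes the performance gap between the baseline and the environmental runs directly attributable to the environmental inputs.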
Participating
To participate in iDPP CLEF 2023, groups need to register at the following link: