Participation Guidelines

iDPP CLEF 2023

Tasks

iDPP CLEF 2023 offers two evaluation tasks focused on predicting the progression of Multiple Sclerosis (MS) and one position paper task on the impact of exposure to pollutants on predicting the progression of Amyotrophic Lateral Sclerosis (ALS).

For Tasks 1 and 2 on MS, participants are given a dataset containing 2.5 years of visits. This dataset comes from two clinical institutions, one in Pavia, Italy, and the other in Turin, Italy, and it contains data about real patients, fully anonymized.
For Task 3 on ALS, participants are given a dataset containing 6 months of visits. This dataset comes from two clinical institutions, one in Lisbon, Portugal, and the other in Turin, Italy, and it contains data about real patients, fully anonymized.

All the datasets are highly curated and they are produced from the BRAINTEASER Ontology (BTO), developed by the BRAINTEASER project, which ensures the consistency of the data represented. Moreover, several checks have been performed to ensure that all the instances are clean, contain proper values in the expected ranges, and do not have contradictions.

Task 1 - Predicting Risk of Disease Worsening (Multiple Sclerosis)

Task 1 focuses on ranking subjects based on the risk of worsening, setting the problem as a survival analysis task. More specifically, the risk of worsening predicted by the algorithm should reflect how early a patient experiences the "worsening" event, and should range between 0 and 1.

Worsening is defined on the basis of the Expanded Disability Status Scale (EDSS), according to clinical standards.

In particular, we consider two different definitions of worsening corresponding to two different sub-tasks:

  • Task1a: the patient crosses the threshold EDSS ≥ 3 at least twice within a one-year interval;
  • Task1b: the second definition of worsening depends on the first recorded value, according to current clinical protocols:
    • if the baseline is EDSS < 1, then the worsening event occurs when an increase of EDSS by 1.5 points is first observed;
    • if the baseline is 1 ≤ EDSS < 5.5, then the worsening event occurs when an increase of EDSS by 1 point is first observed;
    • if the baseline is EDSS ≥ 5.5, then the worsening event occurs when an increase of EDSS by 0.5 points is first observed.
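
The two definitions can be illustrated with a minimal Python sketch. It assumes each visit is a hypothetical (time in years, EDSS) pair, reads "at least twice within a one-year interval" as two visits with EDSS ≥ 3 at most one year apart, and reads "an increase by x points" as an increase of at least x points over the first recorded value. All function names are illustrative. The event labels and times in the datasets are the ones pre-computed by the organizers, and this sketch does not claim to reproduce them.

```python
from typing import Optional, Sequence, Tuple

Visit = Tuple[float, float]  # (time in years, EDSS score); hypothetical layout


def worsening_a(visits: Sequence[Visit], threshold: float = 3.0,
                window_years: float = 1.0) -> bool:
    """Definition (a): EDSS >= threshold on at least two visits within one year."""
    high = sorted(t for t, edss in visits if edss >= threshold)
    # If any two qualifying visits are close enough, two adjacent ones are too.
    return any(later - earlier <= window_years
               for earlier, later in zip(high, high[1:]))


def required_increase(baseline_edss: float) -> float:
    """Definition (b): the required EDSS increase depends on the baseline."""
    if baseline_edss < 1.0:
        return 1.5
    if baseline_edss < 5.5:
        return 1.0
    return 0.5


def worsening_b(visits: Sequence[Visit]) -> Optional[float]:
    """Time of the first visit meeting definition (b), or None if there is none."""
    ordered = sorted(visits)
    if not ordered:
        return None
    baseline = ordered[0][1]  # first recorded EDSS value
    delta = required_increase(baseline)
    for t, edss in ordered[1:]:
        if edss - baseline >= delta:
            return t
    return None
```

For example, a patient with baseline EDSS 2.0 who reaches 3.0 at year 2 meets definition (b) at that visit, since the required increase for a baseline between 1 and 5.5 is 1 point.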

For each sub-task, participants are given a dataset containing 2.5 years of visits, with the occurrence of the worsening event and the time of occurrence pre-computed by the challenge organizers.

The training data are available upon registration for the challenge and the test data will be available one week before the run submission deadline. Please see the Datasets and Important Dates sections for more information.

Task 2 - Predicting Cumulative Probability of Worsening (Multiple Sclerosis)

Task 2 refines Task 1 by asking participants to explicitly assign the cumulative probability of worsening at different time windows, i.e., between years 0 and 2, 0 and 4, 0 and 6, 0 and 8, 0 and 10.
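
One possible way to obtain such values, sketched below in Python, is to evaluate 1 − S(t) at each horizon, where S(t) is a survival curve estimated by the participant's model. This is only an illustration: the step-function representation of S, the function name, and the rounding to three decimals are assumptions, and the guidelines do not prescribe any particular model.

```python
from bisect import bisect_right
from typing import List, Sequence

HORIZONS_YEARS = (2, 4, 6, 8, 10)  # ends of the windows 0-2, 0-4, ..., 0-10


def cumulative_probabilities(times: Sequence[float],
                             survival: Sequence[float],
                             horizons: Sequence[float] = HORIZONS_YEARS) -> List[float]:
    """Evaluate 1 - S(t) at each horizon for a right-continuous step function S.

    `times` must be sorted in increasing order; survival[i] is S(t) for
    t >= times[i]. Before the first time point, S(t) is taken to be 1.
    """
    probs = []
    for horizon in horizons:
        idx = bisect_right(times, horizon) - 1  # last step at or before horizon
        s = 1.0 if idx < 0 else survival[idx]
        probs.append(round(1.0 - s, 3))
    return probs
```

Because a survival curve never increases, the resulting probabilities never decrease from one window to the next.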

In particular, as in Task 1, we consider two different definitions of worsening corresponding to two different sub-tasks:

  • Task2a: the patient crosses the threshold EDSS ≥ 3 at least twice within a one-year interval;
  • Task2b: the second definition of worsening depends on the first recorded value, according to current clinical protocols:
    • if the baseline is EDSS < 1, then the worsening event occurs when an increase of EDSS by 1.5 points is first observed;
    • if the baseline is 1 ≤ EDSS < 5.5, then the worsening event occurs when an increase of EDSS by 1 point is first observed;
    • if the baseline is EDSS ≥ 5.5, then the worsening event occurs when an increase of EDSS by 0.5 points is first observed.

For each sub-task, participants are given a dataset containing 2.5 years of visits, with the occurrence of the worsening event and the time of occurrence pre-computed by the challenge organizers.

The training data are available upon registration for the challenge and the test data will be available one week before the run submission deadline. Please see the Datasets and Important Dates sections for more information.

Task 3 - Position Papers on Impact of Exposition to Pollutants (Amyotrophic Lateral Sclerosis)

We will evaluate proposals of different approaches to assess if exposure to different pollutants is a useful variable to predict time to Percutaneous Endoscopic Gastrostomy (PEG), Non-Invasive Ventilation (NIV), and death in ALS patients.

This task will be based on the same data and the same design as Task 1 in iDPP CLEF 2022. Therefore, both training and test data are available immediately and you can build on the experience gained last year. The difference with respect to last year's task is that we complement those data with environmental data to investigate the impact of exposure to pollutants on the prediction of disease progression.

Since both training and test data are immediately available, we consider these submissions as position papers. However, we expect participants to use training and test data in the usual way, as if it were a regular challenge and the test data were released at the "last minute" without ground-truth, i.e., without the possibility of overfitting.

Participants are asked to rank subjects based on the risk of early occurrence of:

  • Task3a: NIV (Non-Invasive Ventilation) or (competing event) Death, whichever occurs first;
  • Task3b: PEG (Percutaneous Endoscopic Gastrostomy) or (competing event) Death, whichever occurs first;
  • Task3c: Death.

For each of these tasks, participants are given a dataset containing 6 months of visits and are asked to rank patients on the risk of occurrence of one of the above events after month 6. Participants are also given a series of environmental data, such as PM10 (particulate matter with a diameter of 10 microns or less), CO (carbon monoxide), or NO2 (nitrogen dioxide). Ranges for environmental data series vary depending on the pollutant but, for most patients and on average, they cover a period from up to 90 months before Time 0 to 6 months after Time 0, where Time 0 is the time of the first ALSFRS-R questionnaire. Please see the Datasets and the Important Dates sections for more information.

In particular, for each subtask, we ask three types of submissions from participants:

  • baseline submissions without using any environmental data. Ideally, any submission using environmental data (see below) should be accompanied by a baseline submission, in order to have the possibility of measuring the performance gap due to the use of environmental data;
  • submissions using only environmental data 6 months before and after Time 0;
  • submissions using any time window of environmental data, as preferred by participants.
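
The second and third submission types differ only in the time window of environmental data fed to the model. A minimal Python sketch of that selection step follows. It assumes each reading is a hypothetical (months relative to Time 0, pollutant value) pair. The actual layout of the environmental files is not described here, so the names and the data structure are illustrative only.

```python
from typing import Iterable, List, Optional, Tuple

Reading = Tuple[float, float]  # (months relative to Time 0, value); hypothetical


def select_window(readings: Iterable[Reading],
                  months_before: Optional[float] = 6.0,
                  months_after: float = 6.0) -> List[Reading]:
    """Keep readings in [-months_before, +months_after] around Time 0.

    Passing months_before=None keeps all available history before Time 0.
    """
    kept = []
    for offset, value in readings:
        if offset > months_after:
            continue  # after the end of the window
        if months_before is not None and offset < -months_before:
            continue  # before the start of the window
        kept.append((offset, value))
    return kept
```

With the defaults the function keeps the 6 months before and after Time 0. Other values of `months_before` give a participant-chosen window.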

Participating

To participate in iDPP CLEF 2023, groups need to register at the following link:

Register

Important Dates

Datasets

Tasks 1 & 2 Datasets (MS)

Tasks 1 and 2 share the same datasets.
The following data are available for the training and test sets of both sub-tasks (a and b):

For a reference on the EDSS score, please, visit here.

The following data are available as ground-truth:

Training and test datasets follow a (roughly) 80-20% proportion.

Training Dataset

There is a separate dataset for each subtask (a or b).

Test Dataset

There is a separate dataset for each subtask (a or b).

Details on the Datasets

Each of the datasets consists of the following files:

In more detail, the outcomes.csv file adopts the following format:


100619256189067386770484450960632124211 1 9.4
101600333961427115125266345521826407539 0 6.4
102874795308599532461878597137083911508 0 0.8
123988288044597922158182615705447150224 1 3.3
100381996772220382021070974955176218231 0 15
...

where:

Task 3 Dataset (ALS)

The following data are available for both the training and the test set:

The following data are available as ground-truth:

Training and test datasets follow a (roughly) 80%-20% proportion.

Training Dataset

There is a separate dataset for each of the possible subtasks (a, b, or c).

Test Dataset

There is a separate dataset for each of the possible subtasks (a, b, or c).

Details on the Datasets

Each of the datasets consists of the following files:

In more detail, the outcome.csv file adopts the following format:


0x4bed50627d141453da7499a7f6ae84ab 1 PEG 20.5
0x4d0e8370abe97d0fdedbded6787ebcfc 1 PEG 18.3
0x5bbf2927feefd8617b58b5005f75fc0d 1 DEATH 17.6
0x814ec836b32264453c04bb989f7825d4 0 NONE 37.4
0x71dabb094f55fab5fc719e348dffc85 1 PEG 8.2
...

where:

Participant Repository

Participants are provided with a single repository for all the tasks they take part in. The repository contains the runs, resources, and possibly the code produced by each participant in order to promote reproducibility and open science.

The repository is organised as follows:

The submission and score folders are organized into sub-folders for each task as follows:

Participants who do not take part in a given task can simply delete the corresponding sub-folders.

Here, you can find a sample participant repository to get a better idea of its layout.

The goal of iDPP CLEF 2023 is to speed up the creation of systems and resources for MS and ALS progression prediction and to share them as openly as possible. Therefore, participants are strongly encouraged to share their code and any additional resources they have used or created.

All the contents of these repositories are released under the Creative Commons Attribution-ShareAlike 4.0 International License.

Run Submission Guidelines

Participating teams should satisfy the following guidelines:

Task 1 Run Format

Runs should be submitted as a text file (.txt) with the following format:


100619256189067386770484450960632124211 0.897 upd_T1a_survRF
101600333961427115125266345521826407539 0.773 upd_T1a_survRF
102874795308599532461878597137083911508 0.773 upd_T1a_survRF
123988288044597922158182615705447150224 0.615 upd_T1a_survRF
100381996772220382021070974955176218231 0.317 upd_T1a_survRF
...

where:

It is important to include all the columns and have a white space delimiter between the columns. No specific ordering is expected among patients (rows) in the submission file.
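
As an illustration, the Python sketch below builds lines in the layout shown in the sample above. It assumes the three columns are the patient identifier, the predicted risk score, and the run identifier, which is how the sample reads. The function name and the three-decimal formatting are choices made for this example only.

```python
from typing import Dict, List


def format_task1_run(scores: Dict[str, float], run_id: str) -> List[str]:
    """Build one 'patient_id score run_id' line per patient."""
    lines = []
    for patient_id, score in scores.items():
        # Task 1 asks for a risk score between 0 and 1.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {patient_id} outside [0, 1]: {score}")
        # Columns are separated by a single space.
        lines.append(f"{patient_id} {score:.3f} {run_id}")
    return lines
```

The returned lines can then be joined with newlines and written to the run file. Row order does not matter.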

Task 2 Run Format

Runs should be submitted as a text file (.txt) with the following format:


100619256189067386770484450960632124211 0.221 0.437 0.515 0.817 0.916 upd_T2b_survRF
101600333961427115125266345521826407539 0.213 0.617 0.713 0.799 0.822 upd_T2b_survRF
102874795308599532461878597137083911508 0.205 0.312 0.418 0.781 0.856 upd_T2b_survRF
123988288044597922158182615705447150224 0.197 0.517 0.617 0.921 0.978 upd_T2b_survRF
100381996772220382021070974955176218231 0.184 0.197 0.315 0.763 0.901 upd_T2b_survRF
...

where:

It is important to include all the columns and have a white space delimiter between the columns. No specific ordering is expected among patients (rows) in the submission file.
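
A line in this layout could be produced as in the Python sketch below. It assumes the five numeric columns are the cumulative probabilities for the windows 0-2, 0-4, 0-6, 0-8, and 0-10 years, in that order. The check that the values never decrease is a sanity check added for this example, since a cumulative probability cannot go down over time. It is not a rule stated in these guidelines, and the function name is illustrative.

```python
from typing import Sequence

N_WINDOWS = 5  # years 0-2, 0-4, 0-6, 0-8, 0-10


def format_task2_line(patient_id: str, probs: Sequence[float], run_id: str) -> str:
    """Build 'patient_id p2 p4 p6 p8 p10 run_id' for one patient."""
    if len(probs) != N_WINDOWS:
        raise ValueError(f"expected {N_WINDOWS} probabilities, got {len(probs)}")
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    # Sanity check (assumption): cumulative probabilities should not decrease.
    if any(later < earlier for earlier, later in zip(probs, probs[1:])):
        raise ValueError("cumulative probabilities should not decrease over time")
    return " ".join([patient_id, *(f"{p:.3f}" for p in probs), run_id])
```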

Task 3 Run Format

Runs should be submitted as a text file (.txt) with the following format:


0x4bed50627d141453da7499a7f6ae84ab 0.897 upd_T3a_EW6_survRF
0x4d0e8370abe97d0fdedbded6787ebcfc 0.773 upd_T3a_EW6_survRF
0x5bbf2927feefd8617b58b5005f75fc0d 0.773 upd_T3a_EW6_survRF
0x814ec836b32264453c04bb989f7825d4 0.615 upd_T3a_EW6_survRF
0x71dabb094f55fab5fc719e348dffc85x 0.317 upd_T3a_EW6_survRF
...

where:

It is important to include all the columns and have a white space delimiter between the columns. No specific ordering is expected among patients (rows) in the submission file.

Since different time windows may be considered, participants are allowed to submit predictions for a variable number of patients. We encourage participants to submit predictions for as many patients as possible. To avoid favoring runs that consider only a few patients, submitted runs will be evaluated based on their correctness as well as the number of patients included. The number of patients included is also reported in the output of the provided evaluation scripts.

Submission Upload

Runs should be uploaded in the repository provided by the organizers. Following the repository structure discussed above, for example, a run submitted for the first task should be included in submission/task1.

Runs should be uploaded using the following naming convention for their identifiers:

<teamname>_T<1|2|3><a|b|c>_[type_]<freefield>

where:

For example, a complete run identifier may look like

upd_T3a_EW6_survRF

where

The name of the text file containing the run must be the identifier of the run followed by the .txt extension. In the above example:

upd_T3a_EW6_survRF.txt
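
A run identifier can be checked against this convention before upload. The Python sketch below is one possible validator. It assumes the team name contains no underscore and leaves the optional type and the free field together in a single `rest` group, because the two cannot be told apart without the list of allowed type values. The pattern and function names are illustrative.

```python
import re
from typing import Dict

# <teamname>_T<1|2|3><a|b|c>_[type_]<freefield>
RUN_ID = re.compile(
    r"^(?P<team>[^_]+)_T(?P<task>[123])(?P<subtask>[abc])_(?P<rest>.+)$"
)


def parse_run_id(run_id: str) -> Dict[str, str]:
    """Split a run identifier into team, task, subtask, and the remainder."""
    match = RUN_ID.match(run_id)
    if match is None:
        raise ValueError(f"not a valid run identifier: {run_id!r}")
    parts = match.groupdict()
    # Tasks 1 and 2 have sub-tasks a and b only; sub-task c belongs to Task 3.
    if parts["subtask"] == "c" and parts["task"] != "3":
        raise ValueError("sub-task c exists only for Task 3")
    return parts


def run_file_name(run_id: str) -> str:
    """Return the file name for a validated run identifier."""
    parse_run_id(run_id)
    return f"{run_id}.txt"
```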

Run Scores

Performance scores for the submitted runs will be returned by the organizers in the score folder, which follows the same structure as the submission folder.

For each submitted run, participants will find a file named

<teamname>_T<1|2|3><a|b|c>_[type_]<freefield>.score.txt

where <teamname>_T<1|2|3><a|b|c>_[type_]<freefield> matches the corresponding run. The file will contain performance scores for each of the evaluation measures described below. In the above example:

upd_T3a_EW6_survRF.score.txt

Evaluation Measures

Task 1 - Predicting Risk of Disease Worsening (Multiple Sclerosis)

The effectiveness of the submitted runs will be evaluated using Harrell's Concordance Index (C-index).
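
For intuition, the C-index is the fraction of comparable patient pairs in which the patient with the earlier observed event also received the higher risk score. The Python sketch below is a plain quadratic-time illustration of that idea, with ties in the risk score counted as half-concordant. It is not the organizers' evaluation script, and its handling of tied times is a simplifying assumption.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's C-index for right-censored data (higher risk = earlier event)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:  # subject i had the event before time j
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A perfect ranking yields 1.0, a fully reversed ranking yields 0.0, and identical scores for all patients yield 0.5.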

Task 2 - Predicting Cumulative Probability of Worsening (Multiple Sclerosis)

The effectiveness of the submitted runs will be evaluated with the following measures:

To compute the AUROC and O/E Ratio, we applied censoring to the ground truth values using the following schema. Let A, B, C, and D be four subjects, where:

We report the outcome occurrence label and outcome time for each possible scenario of censoring time, which we refer to as t1, t2, and t3 in the table presented below.

* Since the outcome of the subject at censoring time ti is unknown, the subject cannot be considered for evaluation at censoring time ti.
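
One reading of this rule is sketched below in Python. It is an interpretation and not the organizers' exact schema: a subject counts as positive if the event was observed by the censoring time, as negative if the subject was still being followed and event-free beyond it, and is excluded if follow-up ended without an event at or before it. The function names and the treatment of times exactly equal to the censoring time are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def label_at(occurred: int, time: float, censoring_time: float) -> Optional[int]:
    """Binary label at a censoring time, or None when the subject is excluded."""
    if time > censoring_time:
        return 0  # still event-free at the censoring time
    if occurred:
        return 1  # event observed by the censoring time
    return None  # follow-up ended earlier without an event: outcome unknown


def evaluable(outcomes: Sequence[Tuple[str, int, float]],
              censoring_time: float) -> List[Tuple[str, int]]:
    """Keep only subjects whose label is known at the censoring time."""
    kept = []
    for subject, occurred, time in outcomes:
        label = label_at(occurred, time, censoring_time)
        if label is not None:
            kept.append((subject, label))
    return kept
```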

Task 3 - Position Papers on Impact of Exposition to Pollutants (Amyotrophic Lateral Sclerosis)

The effectiveness of the submitted runs will be evaluated with the following measures:

Computing the Evaluation Measures

Scripts for computing the above evaluation measures are available in the following repository.

Participant Papers

Participants are expected to write a report describing their participation in iDPP CLEF 2023, their proposed solution, the features used for prediction, the analysis of the experimental results, and the insights derived from them.

Participant reports will be peer-reviewed and accepted reports will be published in the CLEF 2023 Working Notes at CEUR-WS, indexed in DBLP and Scopus.

A template for the report is contained in the report folder of each participant repository. The template not only defines the layout of the report but also provides a suggested structure for it and illustrates the kind of expected contents.
Participants are strongly encouraged to use LaTeX, whose template provides all the details about the report. However, Word may also be used, and a basic template is provided for it as well.

Participant papers are expected to be 10-20 pages in length, excluding references.

The schedule for submission, revision, and camera-ready version of the participant report is detailed above in the Important Dates section.

Participant papers can be submitted via Easychair at the following link https://easychair.org/conferences/?conf=clef2023 by choosing the Intelligent Disease Progression Prediction (iDPP) track.

Get in Contact

If you need any additional information, please contact us by writing to Nicola Ferro, ferrodei.unipd.it.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the grant agreement No GA101017598. The European Commission’s support for this project and the production of this website does not constitute an endorsement of the contents, which reflect the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.