Data Issues
Here is a list of data issues to be aware of:
Disease activity definitions vary between studies
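The variables that record disease activity differ between GI-DAMPs, MUSIC, and Mini-MUSIC. They differ both in field name (`ibd_status`, `disease_activity`, `physician_global_assessment`) and in the categories they use.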
Current Variations in Disease Activity Classifications by Study
| Study | Field Name | Categories |
|---|---|---|
| GI-DAMPs | `ibd_status` | Biochemical remission (normal CRP AND FC < 250); Remission; Active; Highly active (admission for IV steroids); Not applicable |
| MUSIC | `physician_global_assessment` | Remission; Mildly active; Moderately active; Severely active |
| MUSIC | `disease_activity` | Biochemical remission; Remission; Active; Biochemically Active; Not applicable |
| Mini-MUSIC | `physician_global_assessment` | Biochemical remission; Clinical remission; Mildly active; Moderately active; Severely active |
| Mini-MUSIC | `disease_activity` | Remission; Mild; Moderate; Severe; Not applicable |
Potential Standardized Definition
Disease activity could be standardised using these common variables:
- `has_active_symptoms` (Boolean)
- `crp` (mg/L, threshold > 5)
- `calprotectin` (μg/g, threshold > 250)
Example Implementation Logic
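The following is a minimal sketch of how the three variables above might be combined into a single label. Several parts of it are assumptions:

- the function name `classify_disease_activity`;
- the label set, which is borrowed from the GI-DAMPs and MUSIC categories;
- the rule order, which checks symptoms first and then biomarkers;
- the reading of "normal CRP" as CRP ≤ 5 mg/L.

"Biochemical remission" requires both biomarkers to be measured and not raised, mirroring "normal CRP AND FC < 250".

```python
import math
from typing import Optional

# Thresholds from the standardised definition above
CRP_THRESHOLD_MG_L = 5.0              # crp > 5 mg/L counts as raised
CALPROTECTIN_THRESHOLD_UG_G = 250.0   # calprotectin > 250 μg/g counts as raised


def _is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as a missing measurement."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_disease_activity(
    has_active_symptoms: bool,
    crp: Optional[float] = None,
    calprotectin: Optional[float] = None,
) -> str:
    """Hypothetical scheme mapping symptoms and biomarkers to one label."""
    if has_active_symptoms:
        return "Active"

    # Keep only the biomarkers that were actually measured
    measured = [
        (value, threshold)
        for value, threshold in (
            (crp, CRP_THRESHOLD_MG_L),
            (calprotectin, CALPROTECTIN_THRESHOLD_UG_G),
        )
        if not _is_missing(value)
    ]
    if any(value > threshold for value, threshold in measured):
        return "Biochemically active"
    if len(measured) == 2:
        # No symptoms, and both CRP and calprotectin below threshold
        return "Biochemical remission"
    # No symptoms, but at least one biomarker missing: clinical remission only
    return "Remission"
```

For example, `classify_disease_activity(False, crp=3.0, calprotectin=120.0)` returns `"Biochemical remission"`. Dropping the calprotectin value gives `"Remission"` instead.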
Example Usage with DataFrame
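This is a vectorised sketch for a whole DataFrame using `numpy.select`, which assigns each row the label of the first condition that holds. It assumes float `crp` and `calprotectin` columns with NaN for missing values. The rules are assumptions:

1. Symptoms give "Active".
2. Otherwise, any raised biomarker gives "Biochemically active".
3. Otherwise, both biomarkers measured give "Biochemical remission".
4. Anything else gives "Remission".

The helper `add_standardised_activity`, the output column `activity_standardised`, and treating a missing `has_active_symptoms` as "no symptoms" are all hypothetical.

```python
import numpy as np
import pandas as pd

CRP_THRESHOLD_MG_L = 5.0              # crp > 5 mg/L counts as raised
CALPROTECTIN_THRESHOLD_UG_G = 250.0   # calprotectin > 250 μg/g counts as raised


def add_standardised_activity(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a hypothetical `activity_standardised` column."""
    out = df.copy()
    # Missing symptom data compares unequal to True, so it counts as "no symptoms"
    symptomatic = out["has_active_symptoms"].eq(True)
    # NaN > threshold evaluates to False, so missing values never count as raised
    crp_raised = out["crp"] > CRP_THRESHOLD_MG_L
    fc_raised = out["calprotectin"] > CALPROTECTIN_THRESHOLD_UG_G
    both_measured = out["crp"].notna() & out["calprotectin"].notna()

    # Order matters: numpy.select uses the first condition that is True
    conditions = [symptomatic, crp_raised | fc_raised, both_measured]
    labels = ["Active", "Biochemically active", "Biochemical remission"]
    out["activity_standardised"] = np.select(conditions, labels, default="Remission")
    return out
```

Because of the ordering, a symptomatic row is labelled "Active" even when its biomarkers are also raised.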
GI-DAMPs Study ID Format Evolution
Historical Format
- Initial format: `GID-xxx-P` or `GID-xxx-HC`
    - `xxx`: integer with inconsistent leading zeros (e.g., `GID-001-P` vs `GID-1-P`)
    - `P`: Patient, `HC`: Healthy Control
Multi-Center Expansion
- Center-specific formats introduced:
    - Edinburgh: `GID-x`
    - Glasgow: `GID-136-x`
    - Dundee: `GID-138-x`
Known Issues
- Potential conflicts between the Edinburgh legacy IDs `GID-136-P` and `GID-138-P` and the Glasgow (`GID-136-x`) and Dundee (`GID-138-x`) prefixes
- Inconsistent formats at the Glasgow and Dundee sites (e.g., `GID-136-x-P`)
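One way to avoid the Edinburgh/Glasgow/Dundee ambiguity might be to strip any `-P`/`-HC` suffix before looking at the site prefix. Once the suffix is gone, a legacy Edinburgh ID such as `GID-136-P` has a single numeric part, while Glasgow and Dundee IDs have two. The sketch below assumes that only the formats listed above occur. `infer_site` is a hypothetical helper, not part of any existing study tooling.

```python
import re

# Hypothetical helper: legacy -P / -HC suffixes are removed *before* the site
# prefix is matched, so Edinburgh's GID-136-P / GID-138-P are not mistaken for
# Glasgow (GID-136-x) or Dundee (GID-138-x) IDs.
_SUFFIX = re.compile(r"-(?:P|HC)$")
_ID = re.compile(r"^GID-(\d+)(?:-(\d+))?$")
_SITE_CODES = {136: "Glasgow", 138: "Dundee"}


def infer_site(study_id: str) -> str:
    core = _SUFFIX.sub("", study_id.strip())
    match = _ID.match(core)
    if match is None:
        raise ValueError(f"Unrecognised study_id format: {study_id!r}")
    first, second = match.groups()
    if second is None:
        # A single numeric part is an Edinburgh GID-x ID (including GID-136 / GID-138)
        return "Edinburgh"
    site = _SITE_CODES.get(int(first))
    if site is None:
        raise ValueError(f"Unknown site code in study_id: {study_id!r}")
    return site
```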
Current Standard (December 2024)
- Remove the `-P` and `-HC` suffixes (use the `study_group` column instead)
- Use the `GID-` prefix only
- No leading zeros
- Center-specific formats:
    - Edinburgh: `GID-x`
    - Glasgow: `GID-136-x`
    - Dundee: `GID-138-x`
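Here is a minimal sketch of converting legacy IDs to the current standard. It assumes the only variants are those described above: optional leading zeros, an optional `-P`/`-HC` suffix, and an optional `136`/`138` site prefix. The function `normalise_study_id` is hypothetical. It returns the stripped suffix separately so it can be recorded in `study_group`; the actual values used in that column are not specified here.

```python
import re
from typing import Optional, Tuple

# Matches legacy and current GI-DAMPs IDs, e.g. GID-001-P, GID-1-HC,
# GID-136-07-P, GID-138-12. Group 3 captures the legacy suffix, if present.
_STUDY_ID = re.compile(r"^GID-(\d+)(?:-(\d+))?(?:-(P|HC))?$")
_SITE_CODES = {"136", "138"}  # Glasgow and Dundee prefixes


def normalise_study_id(raw_id: str) -> Tuple[str, Optional[str]]:
    """Return (current-standard ID, legacy suffix or None)."""
    match = _STUDY_ID.match(raw_id.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised study_id format: {raw_id!r}")
    first, second, suffix = match.groups()
    # str(int(...)) drops leading zeros, e.g. "001" -> "1"
    parts = [str(int(first))]
    if second is not None:
        if parts[0] not in _SITE_CODES:
            raise ValueError(f"Unknown site code in study_id: {raw_id!r}")
        parts.append(str(int(second)))
    return "GID-" + "-".join(parts), suffix
```

For example, `normalise_study_id("GID-001-P")` returns `("GID-1", "P")`, and `normalise_study_id("GID-136-07-P")` returns `("GID-136-7", "P")`.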
⚠️ Important: When merging GI-DAMPs data, carefully check the `study_id` column for legacy formats.
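Before merging, one quick check is to list the rows whose `study_id` does not match the current standard. This sketch assumes pandas and a hypothetical helper `find_legacy_study_ids`. The regular expression encodes the December 2024 rules: an optional `136-`/`138-` site code, no leading zeros, and no suffix. `GID-136` and `GID-138` are accepted because they are valid Edinburgh IDs under the current standard. Missing IDs are flagged as well.

```python
import pandas as pd

# Current standard: GID-x, GID-136-x or GID-138-x, with no leading zeros
# and no -P / -HC suffix.
CURRENT_ID_PATTERN = r"GID-(?:13[68]-)?[1-9]\d*"


def find_legacy_study_ids(df: pd.DataFrame, column: str = "study_id") -> pd.DataFrame:
    """Return the rows whose study_id does not follow the current standard."""
    ids = df[column].astype(str).str.strip()
    # fillna(False) makes missing IDs count as non-conforming
    is_current = ids.str.fullmatch(CURRENT_ID_PATTERN).fillna(False).astype(bool)
    return df.loc[~is_current]
```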