The CAMDA Contest Challenges
For 2022, we present:
- The Extended Literature AI for Drug Induced Liver Injury Challenge provides biomedical publications curated by FDA experts on DILI. Build their digital twin to identify rare positives and demonstrate robustness under distributional shifts!
- The Anti-Microbial Resistance Forensics Challenge features diverse meta-genomics profiles from urban and non-urban areas. Track emerging AMR and its relationship with phages!
- The Disease Maps to Modelling COVID-19 Challenge provides highly detailed expert-curated molecular mechanistic maps for COVID-19. Combine them with available omic data to expand the current biological knowledge on COVID-19 mechanism of infection and downstream consequences. The main topic for this year’s challenge is drug repurposing with the possibility of Real World Data based validation of the most promising candidates suggested!
CAMDA encourages an open contest, where all analyses of the contest data sets are of interest, not limited to the questions suggested here. There is an online forum for the free discussion of the contest data sets and their analysis, in which you are encouraged to participate.
We look forward to a lively contest!
Extended Literature AI for Drug Induced Liver Injury
Unexpected Drug-Induced Liver Injury (DILI) is still one of the main killers of promising novel drug candidates. It is a clinically significant disease that can lead to severe outcomes such as acute liver failure and even death. Even today, it remains one of the primary liabilities in drug development and regulatory clearance because of the limited performance of mandated preclinical models. The free text of scientific publications is still the main medium carrying DILI results from clinical practice or experimental studies, and this textual data still has to be analysed manually. This process is tedious and prone to human mistakes or omissions, as results are very rarely available in a standardized or organized form. There is thus great hope that modern techniques from machine learning or natural language processing could provide powerful tools to better process free-form texts and derive the underlying knowledge. The pressing need to process potential drug candidates faster in the current COVID-19 pandemic, combined with recent advances in Artificial Intelligence for text processing, makes this Challenge particularly topical.
We have compiled a large set of PubMed papers relevant to DILI (positives) to be contrasted with a challenging set of unrelated papers (negatives). Both titles and abstracts have been collected. Can you build a classifier using modern AI or NLP techniques to identify the relevant papers?
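As a point of reference for the classification task described above, here is a minimal bag-of-words Naive Bayes baseline in plain Python. It is only a sketch under stated assumptions: the four training texts are invented for illustration and are not taken from the contest data, the class name `NaiveBayesBaseline` is hypothetical, and a classical baseline like this is a starting point for comparison rather than one of the modern AI or NLP techniques the Challenge asks for.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return TOKEN_PATTERN.findall(text.lower())


class NaiveBayesBaseline:
    """Hypothetical two-class multinomial Naive Bayes (1 = DILI related, 0 = unrelated)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def log_odds(self, text):
        """Log-odds of the positive class; values above 0 favour 'DILI related'."""
        score = math.log(self.doc_counts[1] / self.doc_counts[0])
        totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        vocab_size = len(self.vocab)
        for token in tokenize(text):
            if token not in self.vocab:
                continue  # tokens never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            p_pos = (self.word_counts[1][token] + 1) / (totals[1] + vocab_size)
            p_neg = (self.word_counts[0][token] + 1) / (totals[0] + vocab_size)
            score += math.log(p_pos / p_neg)
        return score


# Invented toy examples, NOT contest data.
toy_texts = [
    "drug induced liver injury after treatment",
    "acute liver failure linked to drug",
    "gene expression in plant roots",
    "climate model of ocean currents",
]
toy_labels = [1, 1, 0, 0]
model = NaiveBayesBaseline().fit(toy_texts, toy_labels)
print(model.log_odds("liver injury from a new drug") > 0)
```

In practice, the title and abstract of each paper would be concatenated into one text per paper before training, and the log-odds threshold would be tuned on held-out data.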
This year's contest adds the following challenges reflecting the difficulty of the real-world task:
- In total, perhaps 1% of all manuscripts are DILI related. Can you build models that find the sparse real positives? We provide a range of test sets that are unbalanced to different degrees, i.e., include more and more true negatives to reflect the difficulty of the real world task. This year we provide a dynamic automatic leaderboard where you can submit your predictions for automatic scoring.
- We employ two concepts to help identify over-fitting in models: a) There will be additional private leaderboards opening close to the submission deadline; and b) One of these will feature abstracts from a different source, testing the robustness of models under distributional domain shifts.
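To see why the increasingly unbalanced test sets are harder, it helps to work out how precision depends on the share of true positives. The sketch below assumes a classifier whose sensitivity and specificity stay fixed while only the prevalence of DILI-related papers changes; the function names and the 90% figures are illustrative assumptions, not contest parameters.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision (positive predictive value) at a given prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


def f1_at_prevalence(sensitivity, specificity, prevalence):
    """Expected F1 score; recall equals sensitivity by definition."""
    precision = precision_at_prevalence(sensitivity, specificity, prevalence)
    recall = sensitivity
    return 2.0 * precision * recall / (precision + recall)


# Assumed operating point: 90% sensitivity and 90% specificity.
for prevalence in (0.5, 0.1, 0.01):
    p = precision_at_prevalence(0.9, 0.9, prevalence)
    f1 = f1_at_prevalence(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f} precision={p:.3f} F1={f1:.3f}")
```

Under these assumptions, precision drops from 0.90 on a balanced set to roughly 0.08 when only 1% of the papers are positives, even though the classifier itself has not changed.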
As in the previous year,
- The positive reference data set was pulled from over 14,000 DILI related papers referenced in the NIH LiverTox database, which have been validated by a panel of DILI experts.
- The realistic, non-trivial negative reference data set incorporates over 14,000 papers, highly enriched in manuscripts that are not relevant to DILI. Obvious negatives and any positives we could identify have been removed by filtering for keywords and through well-established language models, followed by a selective manual review by DILI experts at the FDA.
Together, this recreates the problem faced by human experts: after the obvious, easy negatives and positives have been removed by basic algorithms, how can we identify true positives and negatives among the less obvious cases?
Data are provided in the form of text tables. Both files contain paper titles and abstracts (where available).
Please sign up to announcements from the CAMDA toxicity forum for alerts.
Please read and accept the data download agreement for access to the Literature AI for Drug Induced Liver Injury Download Site.
We thank the Institute of Advanced Research in Artificial Intelligence (IARAI) for its support in the preparation of this Challenge.
Anti-Microbial Resistance Forensics
Bacteriophages, a recurring mystery in the history of science, are believed to be the key to understanding microbial evolution and the transfer of AMR genes. Recent studies show a significant correlation between the occurrence of phages and AMR genes, indicating that phages do take part in spreading them. While they take part in AMR dissemination, phages are also considered a potential alternative to antibiotics. Given this contradiction, there is huge potential as well as an urgent need for precise classification, description, and analysis of their capabilities. Driven by the SARS-CoV-2 pandemic, advances in phylogenetic algorithms and k-mer based methods have been extremely rapid, and those improvements are waiting to be adapted to other branches of the life sciences.
To further our knowledge of the phage world and its co-evolution with prokaryotes, we provide datasets, the first of which (urban microbiome based) contains:
- 62 samples with high level of AMR genes
- 62 samples with low level of AMR genes
Samples are placed in 124 tar-compressed folders with names corresponding to their IDs and AMR class (high or low). Within each sample you will find:
- fastq files compressed with dsrc
- binary alignments of AMR genes (some samples may not have BAM files)
- all sorts of contextual tabular metadata regarding those genes
Depth of sequencing may vary, and we cannot wait to hear what you think about this property in terms of phage metagenomics!
Data is based on an initial large-scale analysis of anti-microbial resistance of the MetaSUB International Consortium.
Complementary non-urban samples will come shortly.
Analysis suggestions for this exploratory study include (but are not limited to):
* Exploration and search for phage-like and prophage-like k-mers and sequences and their correspondence to AMR genes
* Search for novel k-mer and information based methods of classification of phage and prophage sequences
* Search for microevolution events in phages, prophages and prokaryotes
* Advancement of algorithms for the identification and discovery of phages and prophages from bacterial genomes, especially from metagenomic samples; assessment of such novel algorithms (performance, validation, …)
* Applications and assessments of relation mining for the occurrence of phages and bacteria in the context of AMR
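As a starting point for the k-mer based suggestions above, the following sketch counts k-mers in FASTQ reads and reports which fraction of them also occur in a set of reference k-mers. It assumes the dsrc-compressed files have already been decompressed to plain-text FASTQ with four lines per record; the function names, the toy reads and the toy reference sequence are hypothetical, and reverse complements are deliberately not handled.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip().upper()


def count_kmers(sequences, k):
    """Count all k-mers made up of A, C, G, T only (k-mers containing N are skipped)."""
    counts = Counter()
    for sequence in sequences:
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def shared_kmer_fraction(read_counts, reference_kmers):
    """Fraction of read k-mer occurrences that also appear in the reference set."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for kmer, n in read_counts.items() if kmer in reference_kmers)
    return hits / total


# Toy input standing in for a decompressed FASTQ file (invented reads).
toy_fastq = io.StringIO("@r1\nACGTAC\n+\nIIIIII\n@r2\nACGNAC\n+\nIIIIII\n")
read_counts = count_kmers(read_fastq_sequences(toy_fastq), k=3)

# Toy reference, e.g. k-mers from a known phage sequence (hypothetical).
reference_kmers = set(count_kmers(["ACGT"], k=3))
print(shared_kmer_fraction(read_counts, reference_kmers))
```

For real samples, a larger k and a canonical k-mer representation (the lexicographically smaller of a k-mer and its reverse complement) would usually be needed, and dedicated k-mer counters scale much better than this dictionary-based version.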
The FASTQ files containing raw metagenomics reads of the aforementioned samples are made available for the first time, together with the corresponding metadata and the results of the MetaSUB AMR analysis.
Please sign up to announcements from the CAMDA metagenomics forum for alerts.
Please read and accept the data download agreement for access.
Disease Maps to Modelling COVID-19
The Disease Maps to Modelling COVID-19 Challenge provides highly detailed expert-curated molecular mechanistic maps for COVID-19. Combine them with available omic data to expand the current biological knowledge on COVID-19 mechanism of infection and downstream consequences. The main topic for this year’s challenge is drug repurposing with the possibility of Real World Data based validation of the most promising candidates suggested.
Now an updated version of the COVID-19 disease map, the product of a collaborative effort involving over 230 biocurators, domain experts, modelers and data analysts from 120 institutions in 30 countries, has been released.
The new COVID-19 Disease Map (C19DMap) is an open-access collection of curated computational diagrams and models of molecular mechanisms implicated in the disease (see Ostaszewski et al., Molecular Systems Biology, 2021, 17:e10387). This represents the first model of the cellular response to infection from a mechanistic perspective. Mechanistic pathway models can then provide a causal bridge from variations in gene activity or integrity to consequential changes in phenotype, making these models a useful tool for the identification of deregulated mechanisms and functions in the search for candidate targets or intervention points that might reverse the phenotype or slow down the progression of the disease.
More information and tools: This year, in addition to the COVID-19 mechanistic map, we present two new resources for modeling COVID-19: the CoV-HiPathia tool (Rian et al., BioData Mining 14, 5, 2021) and the SIGNOR database (Licata et al., NAR 48, D504, 2019).
Possibility of validation of candidates for drug repurposing: This year, data from a retrospective cohort of 16,000 COVID-19 patients from the Andalusian Population Health Database are available to the organizers, which offers the possibility of Real World Evidence (RWE) validations for promising candidates. Recently, we have used RWE to prove the protective effect of vitamin D metabolites by analyzing the effect of drug consumption on COVID-19 patient survival. See details in:
Consumption of other drugs, or other hypotheses on co-morbidities, or any other clinical parameter can be tested in this dataset. This dataset cannot be made public, but can be used by the challenge organizers to validate hypotheses.
The main challenge suggested is to use the COVID-19 disease map to suggest drug candidates for repurposing that could be tested using the RWD dataset.
However, CAMDA is an open-ended contest and we challenge participants to expand the mechanistic understanding of COVID-19 in other creative ways that include (but are not restricted to):
- Improved functional annotation of the current COVID-19 maps will support more accurate inference in the context of COVID-19. Improved definition of COVID-characteristic processes will help our understanding of the disease and its progression, the cellular mechanisms involved, and help identify new ways to counter or minimize the effects of these processes.
- Application of modeling to identify new therapeutic targets and drug candidates or predict disease outcomes, such as response to treatments or risk of developing severe symptoms
- Patient stratification: How can molecular footprints from patient data be used in the context of the COVID-19 disease map to better stratify patients?
- Cross-species: What are the singular and common disease mechanisms of action in SARS-CoV infection and other comparable viruses?
COVID-19 mechanistic map
Check out and download the Disease Maps COVID-19 network in GPML, SBML, SBGN-ML or SIF format.
To download a Disease Maps COVID-19 sub-map, zoom in and select the sub-map you are interested in (example: PAMP signalling associated submap), then right-click on the map and select the option that best suits you (GPML, SBML, SBGN-ML). The Simple Interaction File (SIF) resulting from each Disease Maps COVID-19 sub-map can be downloaded here
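Once a sub-map has been downloaded as SIF, it can be loaded into a simple adjacency structure. The sketch below assumes the common SIF layout, where each line holds a source node, an interaction type and one or more target nodes, separated by tabs or whitespace; the node names and interaction labels in the toy input are placeholders and not the labels actually used in the COVID-19 Disease Map files.

```python
def parse_sif(lines):
    """Build {source: [(interaction, target), ...]} from SIF-formatted lines."""
    graph = {}
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue
        # If the line contains tabs, treat tabs as the delimiter so that
        # names containing spaces survive; otherwise split on whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        source = fields[0]
        graph.setdefault(source, [])
        if len(fields) >= 3:
            interaction = fields[1]
            for target in fields[2:]:
                graph[source].append((interaction, target))
                graph.setdefault(target, [])  # register nodes without outgoing edges
    return graph


# Placeholder content standing in for a downloaded SIF file.
toy_sif = [
    "A\tactivation\tB",
    "B\tinhibition\tC\tD",
    "E",  # a line with a single field declares an isolated node
]
toy_graph = parse_sif(toy_sif)
for node, edges in toy_graph.items():
    print(node, edges)
```

With a real file, the same function can be applied to an open file handle, for example `parse_sif(open(path, encoding="utf-8"))`, where `path` points to the downloaded SIF file.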
COVID-19 SIGNOR map
The SIGnaling Network Open Resource 2.0 (SIGNOR 2.0) is a public repository that stores signaling information as binary causal relationships between biological entities. The captured information is represented graphically as a signed directed graph. Each signaling relationship is associated with an effect (up/down-regulation) and the mechanism (e.g. binding, phosphorylation, transcriptional activation, etc.) causing the up/down-regulation of the target entity. SIGNOR has curated the causal relationships that, according to available evidence, are likely to be relevant for the COVID-19 pathology. The perturbations caused by viral infection are integrated into the global cell network.
Check out and download the COVID-19 causal Network from SIGNOR db in .tsv format.
To download the diagram, go to SIGNOR db and select the COVID-19 causal Network from the “Download pathway data” box (third box). Detailed information about the entities composing the COVID-19 causal Network can be downloaded from the “Download SIGNOR entity data” box.
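Because SIGNOR represents signaling as a signed directed graph, a downloaded .tsv file can be turned into signed edges and the net effect along a path obtained by multiplying the signs. The following is a sketch under assumptions: the column names `ENTITYA`, `ENTITYB` and `EFFECT`, and effect labels beginning with "up-regulates" or "down-regulates", should be checked against the header and contents of the file you actually download; the toy rows are invented.

```python
import csv
import io


def effect_sign(effect):
    """Map an effect label to +1, -1, or 0 (unsigned or unknown)."""
    label = effect.strip().lower()
    if label.startswith("up-regulates"):
        return 1
    if label.startswith("down-regulates"):
        return -1
    return 0


def load_signed_edges(handle, source_col="ENTITYA", target_col="ENTITYB",
                      effect_col="EFFECT"):
    """Read a tab-separated table and return {(source, target): sign}.

    The default column names are assumptions about the file layout.
    """
    edges = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sign = effect_sign(row[effect_col])
        if sign != 0:  # keep only relationships with a clear direction of effect
            edges[(row[source_col], row[target_col])] = sign
    return edges


def path_sign(edges, path):
    """Net sign of a path: the product of the signs of its consecutive edges."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= edges[(source, target)]
    return sign


# Invented toy table; entity names are placeholders.
toy_tsv = io.StringIO(
    "ENTITYA\tENTITYB\tEFFECT\tMECHANISM\n"
    "X\tY\tdown-regulates activity\tbinding\n"
    "Y\tZ\tdown-regulates\tphosphorylation\n"
    "Z\tW\tunknown\t\n"
)
toy_edges = load_signed_edges(toy_tsv)
print(path_sign(toy_edges, ["X", "Y", "Z"]))
```

In the toy table, X inhibits Y and Y inhibits Z, so the net effect of X on Z along that path is positive. A path that uses an edge without a clear sign raises a `KeyError` here, which makes such gaps visible instead of silently guessing a direction.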
CoV-HiPathia web tool
This web tool implements a mechanistic model of human signaling for the interpretation of the consequences of the combined changes of gene expression levels and/or genomic mutations in the context of signaling pathways known to be involved in the infection by SARS-CoV-2, which are updated with the curated versions released by the COVID-19 Disease Map curation project.
You can analyze your COVID-19 data on the CoV-HiPathia web tool here.
Please sign up to announcements from the CAMDA general forum for alerts.
Please read and accept the data download agreement for access.