SMITID Project

Statistical Methods for Inferring Transmissions of Infectious Diseases from deep sequencing data


Viruses can cause high-impact epidemics in developing and developed countries alike. For such pathogens, inferring transmission links within a host population or between host populations (e.g. for zoonoses) is crucial for building epidemiological predictions and control strategies. To this end, for fast-evolving pathogens, one can take advantage of the statistical analysis of pathogen sequence data, because these data indicate which hosts contain the most closely related pathogen variants. However, existing models have so far mostly exploited a limited amount of information from sequencing data, such as consensus Sanger sequences, although deep Sanger sequencing (DSS; based on amplicon cloning) and high-throughput sequencing (HTS) techniques can reveal the polymorphic nature of within-host pathogen populations. In this project, we propose an innovative modelling and statistical approach that will exploit DSS and HTS data to infer disease transmission links for fast-evolving pathogens, such as viruses, and to infer relationships between transmissions and the environment.

ANR summary of the project

Keywords: Computational biology; Quantitative molecular epidemiology; Model; Statistical inference; Disease transmissions


An ANR-funded project - Work Programme 2016
Funding extended to March 31, 2022


2022-03-31: End of the project
2021-09-15: Mathilde Siegwart starts a post-master's position funded by SMITID
2020-12-14: Mélina Ribaud gives a talk during the ModStatSAP webinar about the identification of environmental factors favoring disease propagation
2020-12-11: Maryam Alamil defends her PhD thesis. Congratulations, Maryam!
2020-02-18: Maryam Alamil presents a poster at CIRM, investigating the performance of SLAFEEL
2020-02-18: Samuel Soubeyrand gives a talk at CIRM about SLAFEEL
2019-10-14: Mélina Ribaud starts a postdoc funded by SMITID, with a co-supervision by Edith Gabriel
2019-06-06: Samuel Soubeyrand gives a talk about SLAFEEL for the French Biometrics Society at SFdS days, Nancy
2019-05-26: Maryam Alamil gives a talk about SLAFEEL at the Mathematical and Computational Evolutionary Biology meeting, Porquerolles
2019-05-13: Maryam Alamil gives a talk about SLAFEEL at the meeting of the GDR Ecostat, Avignon
2019-05-01: Maryam Alamil gives a talk about SLAFEEL at the meeting of young statisticians at Porquerolles
2019-03-12: Maryam Alamil gives a talk about SLAFEEL at the ModStatSAP meeting, Paris
2019-05-06: Our manuscript corresponding to the methodological core of the SMITID project is published in Philosophical Transactions B - Alamil et al. (2019)
2019-02-19: R packages SMITIDstruct and SMITIDvisu are on CRAN
2019-02-01: Julien Boge joins us to work on the SMITID software work package during his Master's internship, supervised by Jean-François Rey
2019-01-30: Maryam Alamil gives a talk about SLAFEEL at the annual workshop on Statistical Methods for Post Genomic Data held in Barcelona
2018-09-11: R scripts for estimating epidemiological links are available in the Zenodo citable archive, along with the Ebola, swine influenza and potyvirus data used in our analyses
2018-06-25: First external PhD monitoring meeting for Maryam Alamil 
2018-06-22: Maryam Alamil presents a poster at the 3rd Mathematical Biology Modelling days of Besançon
2018-01-08: Karine Berthier will visit BioSP one day per week, in particular to work on the SMITID project
2017-10-01: Maryam Alamil begins her PhD funded by SMITID
2017-06-29: Meije Gawinowski starts her summer internship funded by SMITID
2017-06-07: Our review for characterizing plant virus spread using molecular epidemiology is online in Annual Review of Phytopathology
2017-05-02: Maryam Alamil starts her master internship funded by SMITID
2017-02-28: SMITID was presented at INRA’s conference for the 2017 Paris International Agricultural Show (SIA) - Symposium on People, Animals and the Environment: One Health 
2017-02-09-10: Discussion about the links between the research carried out in SMITID and the topics of interest in the STrATEGE network (STATistics in Ecology and GEnomic data)
2016-11-25: Kick-off meeting at the ANR headquarters (Paris) to launch the 2016 projects of the "Emerging pathogens-OneHealth" committee (CES 35)
2016-11-04: Kick-off meeting at Avignon gathering the project members and a few other colleagues
2016-11-01: Starting date of the project 


Samuel Soubeyrand (PI)
Jean-François Rey
Karine Berthier
Cécile Desbiez
Gaël Thébaud
Joseph Hughes


Siegwart M., Alamil M., Soubeyrand S. (2022). Tracing infectious disease transmissions from deep sequencing data with machine learning. In preparation.

Richard H., Lercier D., Martinetti D., Morris C., Soubeyrand S. (2022). Tropolink: A web application for computing geographic networks generated by air-mass movement. In preparation.

Alamil M., Thébaud G., Berthier K., Soubeyrand S. (2022). Characterizing viral within-host diversity in fast and non-equilibrium demo-genetic dynamics. Frontiers in Microbiology.

Ribaud M., Gabriel E., Hughes J., Soubeyrand S. (2021). Identifying potential significant factors impacting zero-inflated proportions data. ⟨hal-02936779v3⟩

Alamil M., Bruchou C., Ribaud M., Thébaud G., Soubeyrand S. (2020). A study of factors influencing the performance of the reconstruction of transmissions in disease outbreaks. Poster at CIRM, Mathematical Modeling and Statistical Analysis of Infectious Disease Outbreaks, Marseille, France, 17-21/02/2020.

Alamil M., Hughes J., Berthier K., Desbiez C., Thébaud G., Soubeyrand S. (2019). Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases. Philosophical Transactions of the Royal Society B: Biological Sciences 374: 20180258. doi:10.1098/rstb.2018.0258

Picard C., Dallot S., Brunker K., Berthier K., Roumagnac P., Soubeyrand S., Jacquot E., Thébaud G. (2017). Exploiting Genetic Information to Trace Plant Virus Dispersal in Landscapes. Annual Review of Phytopathology 55. doi:10.1146/annurev-phyto-080516-035616


Alamil M., Thébaud G., Berthier K., Soubeyrand S. (2022). MoWPP: Model of Within-host Pathogen Population dynamics (1.0). Zenodo.

Ribaud M. (2021). ZIprop: Permutations tests and performance indicator for zero-inflated proportions response (0.1.1).

Ribaud M., Martinetti D., Soubeyrand S. (2021). Data for the comparison of COVID-19 mortality in European and North American geographic entities [Data set]. Zenodo.

Rey J.-F., Boge J. (2021). SMITIDvisu: Visualize Data for Host and Viral Population from 'SMITIDstruct' using 'HTMLwidgets' (0.0.9).

Alamil M., Hughes J., Berthier K., Desbiez C., Thébaud G., Soubeyrand S. (2019). SLAFEEL: R scripts and reformatted data analyzed by Alamil et al. (2019) (1.5) [Data set]. Zenodo.

Rey J.-F. (2019). SMITIDstruct: Data Structure and Manipulations Tool for Host and Viral Population (0.0.5).


Alamil, M. (2020). Reconstruction of the transmission of a virus during an epidemic by statistical learning on genomic data. PhD Thesis, Aix-Marseille University.

Alamil, M. (2017). Modélisation de la cinétique et de l'évolution virale intra-hôte. Master 2 Internship Report, BioSP, INRA. Under the supervision of S. Soubeyrand.

Gawinowski, M. (2017). A simulation model for the kinetics, evolution and transmission of viral populations. Master 1 Internship Report, BioSP, INRA. Under the supervision of S. Soubeyrand.

Boge, J. (2019). Développement d’un composant de visualisation spatio-temporelle d’une épidémie. Master 2 Internship Report, BioSP, INRA. Under the supervision of J.-F. Rey.


Samuel Soubeyrand (PI).