An ecosystem that improves healthcare with data science

MIT Critical Data develops local AI capacity in healthcare by building open data & software, fostering community engagement through datathons, and advocating for AI equity in research.

  • Data & Software

    We develop and share open source health data and software to foster healthcare innovation.

  • Community

    We believe in developing data communities and local AI capacity. We host datathons globally to achieve this.

  • Research

    We strongly believe in AI health equity. We lead the research community in how to achieve this goal.


What we do

Our Mission

Our mission is to unite clinicians, data scientists, engineers, and research communities around the world to revolutionize healthcare in a way that is democratised, decentralised, and equitable. We achieve this by publishing open health data and software tools; training local communities to build AI capacity through datathons; and advocating for diversity and equity in AI.

Its Impact

Open health datasets have been shared through our PhysioNet platform.
People have used our datasets, leading to 2000 new publications in the last 4 years.
50 datathons hosted in 21 countries have been organised to bring together clinicians, researchers, and engineers to leverage open health data within local communities to build AI capacity globally.

Decentralize Medical Knowledge

Develop AI capacity globally

We work with and develop multidisciplinary teams from around the world to build local capacity.

Democratize Clinical Data sharing

Advocate for open data sharing

We believe that sharing more clinical data is vital for innovation in healthcare.

Drive Health Equity Research

Promote health equity

We leverage relevant clinical data and data science to address health disparities.

Datasets and Software

We share open health datasets and develop open source software tools to foster innovation in healthcare.


Our team created and maintains the MIMIC and eICU-CRD databases, all freely available. MIMIC is one of the largest and most widely cited ICU datasets available.


Our team created and maintains the PhysioNet platform, which hosts hundreds of health datasets and open source software tools.

Open health data and software are critical to addressing the disparities that currently pervade healthcare. For example, the insights produced by the global community from our open source MIMIC datasets are profound, with over 9000 citations and counting.

Community Building

We host datathons globally to build Health AI capacity in local communities

Every month we travel around the world to train the global community to leverage open data, develop AI models, and evaluate AI tools in healthcare. Safe, equitable, and democratised Health AI can only be achieved reliably through collective expertise. We build that global AI capacity through datathons hosted worldwide.

We have worked with thousands of amazing people worldwide

“Engaging diverse communities into AI is the best defence we have against AI bias in healthcare.”

Leo Anthony Celi
Laboratory of Computational Physiology

Research Contributions

We produce high impact research to advance AI equity & the democratization of Health AI.

  • 2016, Springer Open
    Secondary Analysis of Electronic Health Records
    Charlton P, Ghassemi M, Johnson AE, Komorowski M, Marshall D, Neumann T, Paik K,...
    This is the first book published that describes the process of curating, exploring and analyzing messy electronic health record data. Like a cookbook, it comes with SQL and R code and queries to demo...
  • May 24, 2016, Nature Scientific Data
    MIMIC-III, a freely accessible critical care database.
    Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovit...
    This paper is a description of the MIMIC-III dataset. It has been cited more than 5000 times since its publication in 2016. Leo Celi, lead ICU clinician, worked with the engineers at MIT who design and...
  • Jul 11, 2022, JAMA Internal Medicine
    Assessment of Racial and Ethnic Differences in Oxygen Supplementation Among Patients in the Intensive Care Unit
    Gottlieb ER, Ziegler J, Morley K, Rush B, Celi LA
    The manuscript reported that Asian, Black, and Hispanic patients received less supplemental oxygen than White patients after adjusting for clinical confounding. The gap is fully explained by the pulse...
  • Jul 03, 2021, JAMA Network Open
    Discrepancies Between Pulse Oximetry and Arterial Oxygen Saturation Measurements by Race and Ethnicity and Association With Organ Dysfunction and Mortality.
    Wong AI, Charpignon M, Kim H, Josef C, de Hond AAH, Fojas JJ, Tabaie A, Liu X, M...
    In this study, there was greater variability in oxygen saturation levels for a given SpO2 level in patients who self-identified as Black, followed by Hispanic, Asian, and White. Patients with and with...
  • Oct 22, 2018, Nature Medicine
    The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care
    Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA.
    This is the first paper published that employed reinforcement learning on electronic health record data. This article has been cited more than 800 times according to Google Scholar, and is in the 99th...
  • May 11, 2022, The Lancet Digital Health
    AI recognition of patient race in medical imaging: a modelling study
    Gichoya JW, Banerjee I, Bhimireddy AR, Burns JL, Celi LA, Chen LC, Correa R, Dul...
    In this paper, we demonstrated that computers can learn the race-ethnicity from medical images in the absence of any clinical data. What was most perplexing was that we (and others who read the paper)...
  • Jan 07, 2019, Nature Medicine
    Guidelines for reinforcement learning in healthcare
    Gottesman O, Johansson F, Komorowski M, Faisal A, Sontag D, Doshi-Velez F, Celi ...
    This paper describes the best practices in the application of reinforcement learning on healthcare data. It has been cited 352 times so far and is in the 98th percentile of the 445,021 articles tracke...
  • Jul 05, 2022, Springer Journal of Digital Imaging
    Developing and Validating Multi-Modal Models for Mortality Prediction in COVID-19 Patients: a Multi-center Retrospective Study
    Wu JT*, de la Hoz MÁA*, Kuo PC*, Paguio JA, Yao JS, Dee EC, Yeung W, Jurado J, M...
    We have published numerous papers in machine learning not just in critical care medicine but across different specialties such as ophthalmology, radiology, surgery, nursing, bioethics, among others. W...
  • Aug 14, 2023, PLOS Global Public Health
    A new tool for evaluating health equity in academic journals; the Diversity Factor.
    Gallifant J, Zhang J, Whebell S, Quion J, Escobar B, Gichoya J, Herrera K, Jina ...
    We introduce the concept of a diversity factor for journals. Given that journals play a huge role in the creation and dissemination of knowledge, it’s time to downgrade the importance of the impact fa...
  • Sep 2020, The Lancet Digital Health
    The myth of generalisability in clinical research and machine learning in health care.
    Futoma J, Simons M, Panch T, Doshi-Velez F, Celi LA.
    This paper, which has been cited 212 times so far, addresses the second most critical issue in artificial intelligence (after data bias). The pursuit of model generalizability is misguided given that ...