An ecosystem that improves healthcare with data science

MIT Critical Data develops local AI capacity in healthcare by building open data & software, fostering community engagement through datathons, and advocating for AI equity in research.

  • Research

    We strongly believe in AI health equity. We lead the research community in how to achieve this goal.

  • Events

    We believe in developing data communities and local AI capacity. We host datathons globally to achieve this.

  • Resources

    We develop and share open source health data, software, and resources to foster healthcare innovation.

Our Mission

Our mission is to unite clinicians, data scientists, engineers, and research communities around the world to revolutionize healthcare in a way that is democratised, decentralised, and equitable. We achieve this by publishing open health data and software tools; training local communities to build AI capacity through datathons; and advocating for diversity and equity in AI.

We work with and develop multidisciplinary teams from around the world to build local capacity.

Our Impact

A key part of our mission is to build global capacity in Health AI and data science.

  • Datasets: 260+

    Open health datasets have been shared through our PhysioNet platform. This has led to more than 2,000 new publications in the last 4 years.

  • MIMIC Footprint: 10k+ citations

    The MIMIC dataset has been cited more than 10,000 times in scientific publications.

  • AI Capacity Building: 100+ events

    We have trained 5,000+ people in data science and AI through our events and workshops. These events have been hosted in 39 countries, training a generation of AI-ready clinicians, researchers, and engineers.

  • Community: 80k+ people

    More than 80,000 people have used our datasets, leading to a phenomenal scientific footprint.

1. Our Research

We produce high-impact research to advance AI equity and the democratisation of Health AI.

  • 2016-01-01, Springer Open
    Secondary Analysis of Electronic Health Records
    Charlton P, Ghassemi M, Johnson AE, Komorowski M, Marshall D, Neumann T, Paik K, ...
    This is the first published book that describes the process of curating, exploring and analyzing messy electronic health record data. Like a cookbook, it comes with SQL and R code and queries to demo...
  • 2016-05-24, Nature Scientific Data
    MIMIC-III, a freely accessible critical care database.
    Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovit...
    This paper is a description of the MIMIC-III dataset. It has been cited more than 5000 times since its publication in 2016. Leo Celi, lead ICU clinician, worked with the engineers at MIT who design and...
  • 2022-07-11, JAMA Internal Medicine
    Assessment of Racial and Ethnic Differences in Oxygen Supplementation Among Patients in the Intensive Care Unit
    Gottlieb ER, Ziegler J, Morley K, Rush B, Celi LA
    The manuscript reported that Asian, Black, and Hispanic patients received less supplemental oxygen than White patients after adjusting for clinical confounding. The gap is fully explained by the pulse...
  • 2021-07-03, JAMA Network Open
    Discrepancies Between Pulse Oximetry and Arterial Oxygen Saturation Measurements by Race and Ethnicity and Association With Organ Dysfunction and Mortality.
    Wong AI, Charpignon M, Kim H, Josef C, de Hond AAH, Fojas JJ, Tabaie A, Liu X, M...
    In this study, there was greater variability in oxygen saturation levels for a given SpO2 level in patients who self-identified as Black, followed by Hispanic, Asian, and White. Patients with and with...
  • 2018-10-22, Nature Medicine
    The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care
    Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA.
    This is the first paper published that employed reinforcement learning on electronic health record data. This article has been cited more than 800 times according to Google Scholar, and is in the 99th...
  • 2022-05-11, The Lancet Digital Health
    AI recognition of patient race in medical imaging: a modelling study
    Gichoya JW, Banerjee I, Bhimireddy AR, Burns JL, Celi LA, Chen LC, Correa R, Dul...
    In this paper, we demonstrated that computers can learn the race-ethnicity from medical images in the absence of any clinical data. What was most perplexing was that we (and others who read the paper)...
  • 2022-07-05, Springer Journal of Digital Imaging
    Developing and Validating Multi-Modal Models for Mortality Prediction in COVID-19 Patients: a Multi-center Retrospective Study
    Wu JT*, de la Hoz MÁA*, Kuo PC*, Paguio JA, Yao JS, Dee EC, Yeung W, Jurado J, M...
    We have published numerous papers in machine learning not just in critical care medicine but across different specialties such as ophthalmology, radiology, surgery, nursing, bioethics, among others. W...
  • 2023-08-14, PLOS Global Public Health
    A new tool for evaluating health equity in academic journals; the Diversity Factor.
    Gallifant J, Zhang J, Whebell S, Quion J, Escobar B, Gichoya J, Herrera K, Jina ...
    We introduce the concept of a diversity factor for journals. Given that journals play a huge role in the creation and dissemination of knowledge, it’s time to downgrade the importance of the impact fa...
  • 2020-09-01, The Lancet Digital Health
    The myth of generalisability in clinical research and machine learning in health care.
    Futoma J, Simons M, Panch T, Doshi-Velez F, Celi LA.
    This paper, which has been cited 212 times so far, addresses the second most critical issue in artificial intelligence (after data bias). The pursuit of model generalizability is misguided given that ...

2. Community Building

We host events globally to build Health AI capacity in local communities.

Every month we travel to train the global community to leverage open data, develop AI models, and evaluate AI tools in healthcare. Safe, equitable, and democratised Health AI can only be achieved reliably through collective expertise. We build that global AI capacity through datathons and workshops on health and systems thinking for equity, hosted worldwide.

3. Resources

We share open health datasets, develop open source software tools, and share learning material to foster innovation in healthcare.

Datasets

Our team created and maintains the MIMIC and eICU-CRD databases, both freely available. MIMIC is one of the largest and most widely cited ICU datasets available.

Software

Our team created and maintains the PhysioNet platform, which hosts hundreds of health datasets and open source software tools.