An ecosystem that
improves healthcare with data science

MIT Critical Data develops local AI capacity in healthcare by building open data & software, fostering community engagement through datathons, and advocating for AI equity in research.

Research
We strongly believe in AI health equity, and we lead the research community in working out how to achieve it.
Events
We believe in developing data communities and local AI capacity. We host datathons globally to achieve this.
Resources
We develop and share open source health data, software, and resources to foster healthcare innovation.

Our Mission

Our mission is to unite clinicians, data scientists, engineers, and research communities around the world to revolutionize healthcare in a way that is democratised, decentralised, and equitable. We achieve this by publishing open health data and software tools; training local communities to build AI capacity through datathons; and advocating for diversity and equity in AI.

We work with and develop multidisciplinary teams from around the world to build local capacity.

Our Impact

A key part of our mission is to build global capacity in Health AI and data science.

Datasets

260+

Open health datasets have been shared through our PhysioNet platform.

This has led to more than 2000 new publications in the last 4 years.

MIMIC Footprint

10k+ citations

The MIMIC dataset has been cited more than 10,000 times in scientific publications.

AI Capacity Building

100+ events

We have trained 5000+ people in data science and AI through our events and workshops.

Our events have been hosted in 39 countries, training a generation of AI-ready clinicians, researchers, and engineers.

Community

80k+ people

More than 80,000 people have used our datasets, leading to a phenomenal scientific footprint.

The MIMIC dataset alone has been cited more than 10,000 times in scientific publications.

Our Research

We produce high impact research to advance AI equity & the democratization of Health AI.

2023-08-14 | PLOS Global Public Health
A new tool for evaluating health equity in academic journals; the Diversity Factor.
Gallifant J, Zhang J, Whebell S, Quion J, Escobar B, Gichoya J, Herrera K, Jina ...
We introduce the concept of a diversity factor for journals. Given that journals play a huge role in the creation and dissemination of knowledge, it’s time to downgrade the importance of the impact fa...
2022-07-11 | JAMA Internal Medicine
Assessment of Racial and Ethnic Differences in Oxygen Supplementation Among Patients in the Intensive Care Unit
Gottlieb ER, Ziegler J, Morley K, Rush B, Celi LA
The manuscript reported that Asian, Black, and Hispanic patients received less supplemental oxygen than White patients after adjusting for clinical confounding. The gap is fully explained by the pulse...
2022-07-05 | Springer Journal of Digital Imaging
Developing and Validating Multi-Modal Models for Mortality Prediction in COVID-19 Patients: a Multi-center Retrospective Study
Wu JT*, de la Hoz MÁA*, Kuo PC*, Paguio JA, Yao JS, Dee EC, Yeung W, Jurado J, M...
We have published numerous papers in machine learning not just in critical care medicine but across different specialties such as ophthalmology, radiology, surgery, nursing, bioethics, among others. W...
2022-05-11 | The Lancet Digital Health
AI recognition of patient race in medical imaging: a modelling study
Gichoya JW, Banerjee I, Bhimireddy AR, Burns JL, Celi LA, Chen LC, Correa R, Dul...
In this paper, we demonstrated that computers can learn the race-ethnicity from medical images in the absence of any clinical data. What was most perplexing was that we (and others who read the paper)...
2021-07-03 | JAMA Network Open
Discrepancies Between Pulse Oximetry and Arterial Oxygen Saturation Measurements by Race and Ethnicity and Association With Organ Dysfunction and Mortality.
Wong AI, Charpignon M, Kim H, Josef C, de Hond AAH, Fojas JJ, Tabaie A, Liu X, M...
In this study, there was greater variability in oxygen saturation levels for a given Spo2 level in patients who self-identified as Black, followed by Hispanic, Asian, and White. Patients with and with...
2020-09-01 | The Lancet Digital Health
The myth of generalisability in clinical research and machine learning in health care.
Futoma J, Simons M, Panch T, Doshi-Velez F, Celi LA.
This paper, which has been cited 212 times so far, addresses the second most critical issue in artificial intelligence (after data bias). The pursuit of model generalizability is misguided given that ...
2018-10-22 | Nature Medicine
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care.
Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA.
This is the first paper published that employed reinforcement learning on electronic health record data. This article has been cited more than 800 times according to Google Scholar, and is in the 99th...
2016-06-30 | Springer Open
Secondary Analysis of Electronic Health Records
Charlton P, Ghassemi M, Johnson AE, Komorowski M, Marshall D, Neumann T, Paik K,...
This is the first book published that describes the process of curating, exploring and analyzing messy electronic health record data. Like a cookbook, it comes with SQL and R code and queries to demo...
2016-05-24 | Nature Scientific Data
MIMIC-III, a freely accessible critical care database.
Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovit...
This paper is a description of the MIMIC-III dataset. It has been cited more than 5000 times since its publication in 2016. Leo Celi, lead ICU clinician, worked with the engineers at MIT who design and...

Community Building

We host events globally to build Health AI capacity in local communities

Every month we travel around the world to train the global community on how to leverage open data, develop AI models, and evaluate AI tools in healthcare. Safe, equitable, and democratised Health AI can only be achieved reliably by working together and pooling expertise. We build that global AI capacity through datathons and workshops on health and systems thinking for equity, hosted worldwide.

Resources

We share open health datasets, develop open source software tools, and publish learning material to foster innovation in healthcare.

Datasets

Our team created and maintains the MIMIC and eICU-CRD databases, both freely available. MIMIC is one of the largest and most widely cited ICU datasets available.

Software

Our team created and maintains the PhysioNet platform, which hosts hundreds of health datasets and open source software packages.

Our Mission

1. We develop AI capacity globally
2. We democratize clinical data sharing
3. We drive health equity research