2019.HST.953: Collaborative Data Science in Medicine

HST.953: Collaborative Data Science in Medicine is a guide for students who are interested in performing retrospective research using data from electronic health records (Medical Information Mart for Intensive Care [MIMIC] database and eICU Collaborative Research Database [eICU-CRD]). The course covers steps of parsing a clinical question into a study design and methodology for data analysis and interpretation, with emphasis on the data curation process that is required before any analysis can be performed. Understanding and navigating the databases requires working closely with the clinicians who work in intensive care units, and can be much more challenging than the statistics and machine learning tasks. Activities include reviewing case studies from the MIMIC and eICU-CRD databases and a collaborative research project. Student teams will choose a question and clinician to work with for their project. Students will meet weekly with clinician mentors at pre-arranged times.


While clinical trials are the best tool for inferring causality, they are not adept at demonstrating small effect sizes across a population, which are typical given heterogeneity of treatment effect. Moreover, clinical trials typically exclude important subgroups (older patients, those with chronic diseases), so findings may not be generalizable to the real world. Because of the limitations of clinical trials, including cost, many practice guidelines are supported by low-quality evidence. To make matters worse, these guidelines are often adopted in countries where funding for research is limited. The digitalization of healthcare data provides an opportunity to develop locally relevant practice guidelines rather than adopting those based on research in populations whose results may not generalize locally. Digital data is proliferating in diverse forms within the healthcare field, not only because of the adoption of electronic health records, but also because of the growing use of wireless technologies for ambulatory monitoring. Since clinical trials may be too expensive to perform in most countries, digital health data provides an opportunity to conduct locally relevant research. Rigorous observational studies have been shown to correlate well with clinical trials across the medical literature in terms of estimates of risk and effect size. The world is abuzz with applications of machine learning in almost every field: commerce, transportation, banking, and more recently, healthcare. These breakthroughs are due to rediscovered algorithms, powerful computers to run them, and most importantly, the availability of bigger and better data to train the algorithms.

Course Information

A variety of datasets will be available, including MIMIC-III and Philips eICU from the USA.


There are no prerequisites for this course for MIT, Harvard and Wellesley students. For all other students, we require some experience with R, Python and/or SQL. Everyone is required to complete online human subjects training (if they haven’t already done so) and sign a Data Use Agreement to obtain access to MIMIC and the eICU Collaborative Research Database. This is a project-based course, and all students are required to participate in clinical research using one or both of the databases.


For more information, please contact us: HST953 Faculty


Aldo Arevalo

Miguel Armengol

Lucas Bulgarelli

Leo Anthony Celi

Kotaro Ebina

Marta Fernandes

Ryan Kindle

Alistair Johnson

Regina Leung

Xiaoli Liu

Ming Yu Lu

Ned McCague

Anthony O’Brien

Kenneth Paik

Tom Pollard

Jesse Raffa

Andre Silva

Wei-Hung Weng


Blood glucose management among patients with sepsis admitted on the weekend versus a weekday

Aldo Arevalo, Jason Maley

The intensive care unit (ICU) is among the costliest areas in the hospital to receive care. However, the value, or health benefit relative to cost, is unknown for many treatments in the ICU. In many cases, the delivery of more treatments, tests, and monitoring does not necessarily result in better outcomes for patients. One example of this is management of blood glucose levels. Aggressive treatment of high blood glucose with insulin in the ICU has proven harmful in randomized clinical trials. However, the optimal target range for blood glucose is unknown; therefore resources are still overutilized to monitor blood glucose and provide insulin in ICU patients. In preliminary analyses of MIMIC data, we have observed that glucose readings and treatments with insulin occur less frequently on the weekend than on weekdays (i.e., patients are pseudo-randomized to “less aggressive treatment”). We believe this is an opportunity to study a natural experiment of different intensities of treatment. We aim to examine outcomes of patients with sepsis who are admitted on the weekend as compared to a weekday and therefore receive different intensities of glucose management. This natural experiment may also provide a useful model for comparing other critical care treatments.

Time-limited trial of aggressive care for very elderly patients

Anthony O’Brien, Jason Maley

Treatment within an intensive care unit (ICU) is costly and ICU resources are in high demand, so the appropriate allocation of ICU care to patients who stand to benefit is essential to providing the highest-value care for the sickest patients. Historically, mortality rates for patients with cirrhosis (severe liver disease caused by alcohol, hepatitis C virus, and other factors) have been as high as 90% in the ICU. Aggressive treatment within the ICU may reduce these rates by up to 44%. Lack of aggressive intensive care treatment is one of the strongest predictors of mortality even after adjusting for illness severity. Unfortunately, the response of cirrhotic patients to treatment within the ICU is unpredictable and costly, and starting or stopping invasive therapy can be taxing to the patient, family, and treating physician. As an alternative to unrestricted intensive care (life-saving care in the ICU without placing any limitation on quantity or duration), the use of time-limited trials (TLTs) of care to evaluate a patient’s potential for recovery, or deterioration, in the ICU is becoming more common. A common example of this would be a physician, patient, and family agreeing to administer aggressive treatment while monitoring the patient’s outcome over 5 days of treatment, to see if ICU treatments can save the patient or if the patient does not improve despite the most aggressive care. If the patient improves during a TLT, treatment is continued, but if the patient worsens, the patient is normally transferred to palliative care. However, given the ethical dilemma of randomly assigning unlimited life-saving treatment versus limited life-saving treatment to patients in clinical trials, there is a gap in the literature on the optimal duration for TLTs. Accordingly, simulation models of a patient’s trajectory of illness in the ICU may overcome this limitation and inform physicians about the optimal duration of a TLT.
Therefore, to address the gap, we can simulate the course of cirrhotic patients in the ICU receiving different durations of TLTs versus unrestricted intensive care, to determine the best TLT duration option.

Patient non-adherence

Joy Wu, Felipe Batalini, Shrey Lakhotia

Patient non-adherence (NA), which is estimated to cost around $100 billion annually in the United States healthcare system, is an example of clinical information that is recorded exclusively in unstructured clinical notes. NA is behavior in which patients do not follow medical advice. It is thought that NA increases the risk of undiagnosed and poorly treated illnesses, with consequences ranging from poorer patient-centered outcomes to biased results of clinical drug trials (when not properly controlled for). Extracting complex medical concepts such as NA from free-text notes with Natural Language Processing (NLP) typically requires significant programming and clinical resources. With the latest Expert-in-the-loop NLP approaches, we have already created an NA dataset with reasonable accuracy when applied to MIMIC patients based on their notes (documented NA only). This NA dataset specifies which types of NA (medication, dietary, appointment, refusal of medical advice, not otherwise specified) a patient may have. Based on the dataset, we have seen the data support demographic patterns previously described only anecdotally, such as married patients being more adherent. We are looking for further collaboration on 1) building a model that predicts future NA, or using NA to better predict certain clinical outcomes, and 2) incorporating deep learning/machine learning (possibly with active learning) to further improve the NLP classification pipeline.

Provider bias

Joy Wu, Adrian Wong

Documented “provider bias” is an anecdotal provider behavior that is hard to pin down in recorded clinical data (e.g., “doctors provide poorer care to left-handed patients”). This project explores the current limits of Natural Language Processing (NLP) tools in the clinical domain for investigating such clinical “sentiment”. For example, are there any differences in how providers write notes for alcoholic cirrhosis and non-alcoholic cirrhosis patients (since the former involves an element of “choice” in the patient’s actions)? Additionally, we would like to explore whether there are differences in the structured data that may indicate biased treatment of patients due to possible social prejudice related to their disease process. Several possible comparison cohorts have already been extracted from the MIMIC-III dataset (e.g., alcoholic vs non-alcoholic cirrhosis, morbidly obese vs not obese, young MVA vs young drug addict, and different ethnic groups). The approaches we would like to use to analyze the notes and structured data for “sentiment bias”, or just “sentiment differences”, are: 1) open-domain sentiment analysis tool(s), 2) topic modeling (supervised NLP), 3) occurrence analysis of sentiment-related vocabulary (lexicons) curated from the MIMIC notes using an Expert-in-the-loop NLP tool, and 4) other ideas, to which we are open. The goal of the study is two-fold: first, to explore in depth for evidence of documented “provider bias” (negative results are acceptable) in MIMIC’s ICU population; and second, to use the challenges encountered during this process as material for a review of the maturity of different NLP tools for aiding clinical studies using unstructured notes.

Hypernatremia and serum sodium prediction

Eric Mlodzinski, Raffi Sherak

Hypernatremia, defined as a serum sodium concentration > 145 mEq/L, is a common clinical entity in patients across all hospital settings. The current standard of care for managing hypernatremia involves calculating a “free water deficit” to be replaced by free water and hypotonic fluids. This method does not always yield expected results, especially among more critically ill patients in the ICU. Inadequate serum sodium correction rates are associated with increased mortality, and several studies have found that the majority of hospitalized patients with hypernatremia are corrected at an inadequate rate. The goal of this project is to develop a prescriptive model for the amount of free water to administer to patients with hypernatremia based on a patient’s clinical characteristics. This involves describing the characteristics, treatment, and outcomes of patients with hypernatremia and identifying predictive and influential factors of serum sodium correction in the MIMIC and eICU databases. We then will use machine learning methods to develop a more reliable formula that can be applied clinically to improve outcomes for these patients and compare it to the current standard of care.
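For readers unfamiliar with the “free water deficit” mentioned above, the following is a minimal sketch of the commonly taught textbook form of the calculation, not the project's model. The total-body-water fraction (0.6 by default here) and the target sodium of 140 mEq/L are assumptions that vary by patient and by source, and the function name is hypothetical.

```python
def free_water_deficit_liters(weight_kg, serum_na_meq_l,
                              tbw_fraction=0.6, target_na_meq_l=140.0):
    """Textbook estimate: deficit = TBW * (Na / target_Na - 1).

    tbw_fraction and target_na_meq_l are assumed defaults for illustration;
    clinical references use different fractions by age and sex.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    total_body_water_l = weight_kg * tbw_fraction
    return total_body_water_l * (serum_na_meq_l / target_na_meq_l - 1.0)


# A 70 kg patient with a serum sodium of 154 mEq/L: roughly 4.2 L
deficit = free_water_deficit_liters(70, 154)
print(f"Estimated free water deficit: {deficit:.1f} L")
```

The formula uses only weight and serum sodium, which is one reason it may fit critically ill patients poorly and why a model based on richer clinical characteristics is of interest.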

Capillary leak index in sepsis

David Kaufman, Miguel Armengol, Leo Anthony Celi

Sepsis is an important public health problem. It is a leading cause of morbidity and mortality, and recent models suggest that caring for patients with sepsis costs around $1.5 billion annually in the USA. Infusion of intravenous (IV) fluid is a fundamental component of caring for patients with sepsis. Increased permeability of the vasculature is a key characteristic of sepsis. The endothelium (the layer of cells that lines blood vessels) loses its ability to prevent the leakage of fluid and protein from the blood into the tissues. This phenomenon, sometimes called the “capillary leak syndrome,” leads to crucial physiologic derangements, such as low circulating blood volume and shock (from the loss of fluids), impaired drug-binding capacity (from loss of proteins), and organ failure (from swelling of organs due to excessive buildup of water inside them). Measuring the leakage of fluid and protein from the blood into the tissues requires special equipment that is not widely available in clinical practice in the USA. Intuitively, inferring the degree of capillary leak should be possible by observing the concentration of hemoglobin (a large blood protein that does not leak out of the blood vessels) as intravenous fluids are infused. If the infused fluid remains within the vasculature, the concentration of hemoglobin should decline as dilution occurs. Conversely, if the fluid leaks out of the vasculature, the hemoglobin concentration should not decline much as IV fluids are infused. In this study, we aim to derive and validate a “capillary leak index” by evaluating the concentration of hemoglobin over time and comparing it to the volume of IV fluids infused. We hypothesize that patients with a higher capillary leak index will have a higher risk for morbidity and mortality.
The goal of this project is to use the eICU and MIMIC-III databases to test this hypothesis and formulate an easily used bedside estimate of capillary leak that clinicians can use to guide care and prognosis.
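The dilution reasoning above can be made concrete with simple mass-balance arithmetic. The sketch below is illustrative only: the project description does not define the capillary leak index, so the function names and the “apparent retained fraction” are hypothetical. It assumes hemoglobin mass is constant (no bleeding or transfusion) and that the starting blood volume is known, which in practice it is not.

```python
def expected_hb_if_fluid_retained(hb_before_g_dl, blood_volume_l, infused_l):
    """Hemoglobin expected if all infused fluid stayed in the vasculature.

    Assumes constant hemoglobin mass: Hb_after = Hb_before * BV / (BV + V).
    """
    return hb_before_g_dl * blood_volume_l / (blood_volume_l + infused_l)


def apparent_retained_fraction(hb_before_g_dl, hb_after_g_dl,
                               blood_volume_l, infused_l):
    """Fraction of infused fluid that appears to have stayed intravascular.

    Solves Hb_after = Hb_before * BV / (BV + f * V) for f. A hypothetical
    quantity for illustration, not the project's index.
    """
    implied_added_volume_l = blood_volume_l * (hb_before_g_dl / hb_after_g_dl - 1.0)
    return implied_added_volume_l / infused_l


# Assumed inputs: Hb 12 g/dL, blood volume 5 L, 1 L of IV fluid infused
print(expected_hb_if_fluid_retained(12.0, 5.0, 1.0))      # full dilution case
print(apparent_retained_fraction(12.0, 11.0, 5.0, 1.0))   # partial dilution case
```

A smaller retained fraction would correspond to more apparent leak, but any real index would need to account for fluid output, timing of measurements, and changes in red cell mass.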

Predicting mortality in the cardiac ICU

Neel Butala, Darin Rosen

Care of patients with acute cardiac issues has transformed dramatically over the past couple of decades, and the cardiac intensive care unit (ICU) has evolved to accommodate patients with increasing complexity and acuity. Scoring systems to predict outcomes such as mortality in the ICU are important for clinical quality improvement and research. In particular, predictive scoring systems enable clinicians to benchmark care across institutions and permit researchers to ensure similar baseline risks between populations compared in clinical trials. Multiple validated predictive scoring systems exist for patients in the general ICU. However, there are no predictive scoring systems for the cardiac ICU, where patients often present with uniquely acute life-threatening issues and can have very rapid changes in hemodynamics and clinical status. This project involves leveraging the rich clinical data in both the MIMIC database and the Philips eICU database to develop a predictive scoring system for mortality in the cardiac ICU. Both traditional statistical modeling and advanced machine learning methods will be used to identify models with the highest accuracy in predicting death. Such a model could be used not only to improve quality of care across cardiac ICUs, but also to fuel clinical trials that improve the care of cardiac ICU patients more broadly.

Blood transfusion thresholds

Chris Worsham, Kangli Wu, Kotaro Ebina

Practices around blood transfusion in the intensive care unit vary from physician to physician, from unit to unit, and from institution to institution. While evidence-based guidelines are available for common conditions like upper GI bleeding, little is known about best practices for transfusion of red blood cells to critically ill patients with anemia not due to acute bleeding. A hemoglobin level of 7.0 g/dL is a commonly cited threshold below which many will choose to transfuse a patient regardless of whether they are bleeding or not, a threshold that arose from a randomized controlled trial only of patients with upper GI bleeding. Our goal is to examine transfusion practices for non-bleeding patients in medical centers in the eICU database, using a regression discontinuity design around this hemoglobin threshold of 7.0 g/dL to assess clinical consequences and outcomes following blood transfusions in non-bleeding patients.
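As a rough illustration of the regression discontinuity idea, the sketch below fits a separate straight line on each side of the cutoff within a bandwidth and compares the two fitted values at the cutoff. It is a simplified “sharp” design under stated assumptions: the function names, the bandwidth, and the data are all hypothetical, the numbers are synthetic and carry no clinical meaning, and a real analysis would likely need a fuzzy design because transfusion below 7.0 g/dL is not guaranteed.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(running, outcome, cutoff=7.0, bandwidth=1.0):
    """Estimated outcome just below the cutoff minus just above it."""
    below = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff - bandwidth <= x < cutoff]
    above = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    if len(below) < 2 or len(above) < 2:
        raise ValueError("need at least two observations on each side")
    # With x centered at the cutoff, each intercept is the fitted value there.
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_below - intercept_above


# Synthetic data: a smooth trend in hemoglobin plus a jump of 1.0 below 7.0
hb = [6.2, 6.4, 6.6, 6.8, 7.0, 7.2, 7.4, 7.6]
y = [2.0 + 0.5 * (x - 7.0) + (1.0 if x < 7.0 else 0.0) for x in hb]
print(rd_jump(hb, y))  # recovers a jump close to 1.0
```

The design relies on patients just above and just below the threshold being otherwise comparable, which is the assumption that would need the most scrutiny in the eICU data.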

Assessing the role of relative hypoglycemia on ICU outcomes

Jesse Raffa, Wei-Hung Weng, Brian Xia

Maintenance of glucose levels in the ICU has largely focused on retaining the absolute levels of serum glucose in an optimal range. Recent studies have found that intense glycemic control resulting in hypoglycemia is associated with increased risk of death in the ICU. The exact mechanism of this increased risk of mortality remains unclear. We hypothesize that although the occurrence of absolute hypoglycemia may play a role, relative hypoglycemia (a sudden drop of glucose levels) may also increase the risk of death, even in patients not experiencing hypoglycemia throughout the ICU stay. This project involves modeling multiple dependent processes including: glucose testing, insulin administration, the occurrence of relative hypoglycemia and how these relate to mortality in the ICU. This project has the potential to have an impact on a large number of patients, as glycemic control is important for diabetic and non-diabetic patients in the ICU. It also requires a large degree of clinical and data science expertise as the process by which glucose levels are affected and controlled is complex, requiring collaboration from both domains to fully understand.
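One way to make “relative hypoglycemia” operational is to flag large fractional drops between consecutive glucose readings. The sketch below is illustrative only: the 30% drop and the 6-hour maximum gap are assumed thresholds, not definitions from the project, and the function name is hypothetical.

```python
def flag_relative_drops(readings, min_fractional_drop=0.3, max_gap_hours=6.0):
    """Flag sudden glucose drops between consecutive readings.

    readings: list of (hours_since_admission, glucose_mg_dl), sorted by time.
    Returns (time_of_later_reading, fractional_drop) for each flagged pair.
    Both thresholds are assumptions chosen for illustration.
    """
    flagged = []
    for (t_prev, g_prev), (t_cur, g_cur) in zip(readings, readings[1:]):
        gap_hours = t_cur - t_prev
        if 0 < gap_hours <= max_gap_hours and g_prev > 0:
            fractional_drop = (g_prev - g_cur) / g_prev
            if fractional_drop >= min_fractional_drop:
                flagged.append((t_cur, round(fractional_drop, 3)))
    return flagged


# Synthetic readings: the fall from 240 to 150 mg/dL is a 37.5% drop
readings = [(0, 250), (2, 240), (4, 150), (8, 140)]
print(flag_relative_drops(readings))  # [(4, 0.375)]
```

In this synthetic example the flagged value of 150 mg/dL is well above the range usually considered hypoglycemic, which is the situation the hypothesis is concerned with.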

Prediction of moderate hypoxemia

Marie Charpignon, Ryan Kindle, Cong Feng

Mechanical ventilation is one of the most common interventions implemented in the ICU, with more than half of admitted ICU patients being ventilated in the first 24 hours. Mechanical ventilators are machines that can “breathe” for a patient, but can also be used to simply support a patient’s oxygen intake. Ventilation is usually administered to help patients recover from a variety of conditions, including acute respiratory failure, surgery, and the focus of this project, hypoxemia. Hypoxemia is an oxygen deficiency in the arterial blood and affects more than half of ICU patients. It can quickly become a life-threatening condition when not treated in a timely manner, especially for patients already in critical condition. In this work, we will use gradient boosted trees to predict the onset of hypoxemia 1-6 hours before the window in which it occurs and evaluate the performance against a baseline logistic regression model. Additionally, we will explore deep learning models such as convolutional neural networks (CNNs) and long short-term memory networks (LSTMs), and assess potential performance gains. If time permits, we could extend the model to predict hypoxemia severity instead of just event occurrence. Another approach could be to develop a multitask model to predict the occurrence of various complications of mechanical ventilation among ICU patients, hypoxemia being one of them.
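Predicting an event some hours ahead requires turning event times into labels for each prediction time. The sketch below shows one possible labeling scheme under the assumption that a prediction made at time t is positive if hypoxemia onset falls between 1 and 6 hours later. The function name and the exact window handling are hypothetical, and how hypoxemia onset itself is defined is left to the project.

```python
def horizon_labels(event_hours, prediction_hours, min_lead=1.0, max_lead=6.0):
    """Label each prediction time 1 if an event occurs min_lead to max_lead
    hours later (inclusive), else 0.

    event_hours: onset times of the event for one patient.
    prediction_hours: times at which the model would make a prediction.
    """
    labels = []
    for t in prediction_hours:
        in_window = any(t + min_lead <= e <= t + max_lead for e in event_hours)
        labels.append(int(in_window))
    return labels


# Synthetic example: a single onset at hour 10
print(horizon_labels([10.0], [0, 2, 4, 6, 8, 9, 10]))  # [0, 0, 1, 1, 1, 1, 0]
```

Labels built this way can feed either the baseline logistic regression or the tree-based and deep learning models, so that all are evaluated on the same task.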

Hemodilution effect

Nikhil Shankar, Ming Lu, Andre Duarte Silva

In intensive care medicine, we frequently trend laboratory values over time to help predict a patient’s clinical course. Hemoglobin, a measure of red blood cell concentration, is measured frequently; a sudden decrease on serial measurement could suggest, for instance, intra-abdominal bleeding that would need expedited surgical repair, or even the onset of increased capillary permeability and the development of sepsis. Intravenous (IV) fluid is one of the most common medications administered in the ICU and has been associated with an artificial dilution of the hemoglobin level (so-called ‘hemodilution’). Recent evidence strongly suggests that this phenomenon exists, but there is no model that can predict the expected drop in hemoglobin given a certain amount of infused IV fluid. We aim to create such a predictive model using the MIMIC and eICU databases over the first 24-hour period of a patient’s ICU stay. Using this model, clinicians would be able to easily identify patients in whom hemoglobin drops more than expected, and could more quickly perform life-saving interventions such as surgery or early vasopressors.

MIMIC-CXR Project #1

Wei-Hung Weng, Jesse Raffa, Regina Leung


MIMIC-CXR Project #2

Alistair Johnson, Tom Pollard