2018.HST.953: Collaborative Data Science in Medicine

July 24, 2018


HST.953: Collaborative Data Science in Medicine is a course that focuses on the secondary analysis of clinical data routinely collected in the process of care. In this fall course, students will work with Boston-area clinicians on research projects, with the goal of producing a publication-ready manuscript by the end of the semester. Several papers from the last two courses have already been published in high-impact clinical journals.


While clinical trials are the best tool for inferring causality, they are not adept at demonstrating small effect sizes across a population, which are typical given heterogeneity of treatment effect. Moreover, clinical trials typically exclude important subgroups (older patients, those with chronic diseases), so findings may not be generalizable to the real world. Because of the limitations of clinical trials, including cost, many practice guidelines are supported by low-quality evidence. To make matters worse, these guidelines are often adopted in countries where funding for research is limited.

The digitalization of healthcare data may provide an opportunity to develop locally relevant practice guidelines, rather than adopting guidelines based on research in populations whose results may not generalize to the local setting. Digital data is proliferating in diverse forms within the healthcare field, not only because of the adoption of electronic health records, but also because of the growing use of wireless technologies for ambulatory monitoring. Since clinical trials may be too expensive to perform in most countries, digital health data provides an opportunity to conduct locally relevant research. Rigorous observational studies have been shown to correlate well with clinical trials across the medical literature in terms of estimates of risk and effect size.

The world is abuzz with applications of machine learning in almost every field: commerce, transportation, banking, and more recently, healthcare. These breakthroughs are due to rediscovered algorithms, powerful computers to run them, and most importantly, the availability of bigger and better data to train the algorithms.

Course Information

A variety of datasets will be available, including MIMIC-III and Philips eICU from the USA. Please complete the Expression of Interest Form to be added to our mailing list and receive email notifications.


There are no prerequisites for this course for MIT, Harvard and Wellesley students. All other participants are required to have some experience with R, Python and/or SQL. Everyone is required to complete online human subjects training (if they haven’t already done so) and sign a Data Use Agreement to obtain access to the MIMIC and eICU Collaborative Research Databases. This is a project-based course, and all students are required to participate in clinical research using one or both of the databases.


Please download our syllabus here: Syllabus

For more information, please contact us: HST953 Faculty

Faculty (Listed Alphabetically)

Miguel Ángel Armengol

Leo Anthony Celi

Christina Chen

Alon Dagan

Marta Fernandes

Julian Euma

Alistair Johnson

Ryan Kindle

Ken Paik

Tom Pollard

Jesse Raffa

Nikhil Shankar

Shawn Sturland

Wei-Hung Weng
