2017.HST.953: Collaborative Data Science in Medicine

September 8, 2017


HST.953: Collaborative Data Science in Medicine focuses on the secondary analysis of clinical data that is routinely collected in the process of care. In this course, students work with Boston-area clinicians on research projects, with the goal of producing a publication-ready manuscript by the end of the semester. Three of the 15 papers from last fall are already under review at clinical journals, while the rest are on track for submission over the next few months.


While clinical trials are the best tool for inferring causality, they are not well suited to demonstrating small effect sizes across a population, which are typical given heterogeneity of treatment effect. Moreover, clinical trials typically exclude important subgroups (older patients, those with chronic diseases), so their findings may not be generalizable to the real world. Because of the limitations of clinical trials, including cost, many practice guidelines are supported by low-quality evidence. To make matters worse, these guidelines are often adopted in countries where funding for research is limited.

Digitalization of healthcare data may provide an opportunity to develop locally relevant practice guidelines rather than adopting those from other countries. Digital data is proliferating in diverse forms within the healthcare field, not only because of the adoption of electronic health records, but also because of the growing use of wireless technologies for ambulatory monitoring. Since clinical trials may be too expensive to perform in most countries, digital health data offers a way to conduct locally relevant research. Rigorous observational studies have been shown to correlate well with clinical trials across the medical literature in terms of estimates of risk and effect size. Meanwhile, the world is abuzz with applications of machine learning in almost every field: commerce, transportation, banking, and more recently, healthcare. These breakthroughs are due to rediscovered algorithms, powerful computers to run them, and most importantly, the availability of bigger and better data to train the algorithms.

Course Information

Syllabus, schedule, and more information will be coming soon. Complete the Expression of Interest Form to be added to our mailing list and receive email notifications.


There are no prerequisites for this course other than the permission of the instructor. We recommend some familiarity with basic programming and statistics, but students with biomedical backgrounds are encouraged to attend. Students wishing to audit or participate in the course are required to complete human subjects training (if they haven’t already done so) and to submit proof of approval to work with the MIMIC-III database, but they will not have to complete the exercises or other assignments. All students, regardless of their status, are expected to join a final project group and contribute to a final project.


Course Directors:

  • Dr. Leo Anthony Celi
  • Dr. Alistair Johnson
  • Dr. Tom Pollard
  • Dr. Jesse Raffa


  • Dr. Jerome Aboab
  • Dr. Christina Chen
  • Dr. Alon Dagan
  • Tristan Naumann
  • Dr. Kenneth Paik