UC Berkeley Statisticians use Galileo to estimate COVID-19 fatality rates more accurately, improve data collection

by Abhishek Priya | Sep 14, 2021 | Molecular Dynamics (page)

The other researchers involved in the study are Michael I. Jordan of UC Berkeley (named by Science in 2016 as the most influential living computer scientist), Reese Pathak of UC Berkeley, and Rohit Varma of the Southern California Eye Institute.

Read about the study, results, and Galileo in VentureBeat.

Lead author Angelopoulos turned to Galileo to run the statistical models for this project so that he could concentrate on the extremely time-sensitive work at hand rather than spend time and energy on computing infrastructure:

“Galileo allowed me to run my expectation maximization algorithm in 1/3 of the time, and all I had to do was drag and drop my code. The experience was excellent.”

While the lab at Berkeley is clearly well-provisioned, Angelopoulos found that it was easier, faster, and more reliable to use Galileo than to rely solely on university resources. Before turning to Galileo, he lost an entire day of work when the university scheduling system assigned him to a GPU that was already in use!

The relative case fatality ratio is an important measure for data-driven policy and decision making because it can help to guide the allocation of scarce resources to the populations at the highest risk of death, which is critical during this pandemic. Other crucial measures include hospitalization rate and disease prevalence.

Because the data is spotty, all methods for estimating the fatality rate are at risk of introducing bias. But consistently looking at the relative fatality ratios between different groups gives policymakers the information they need in order to make decisions to protect human life.
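One simple case in which relative ratios hold up better than absolute ones can be sketched in a few lines of Python. The sketch assumes that all deaths are reported and that both groups under-report cases at the same rate. Both are illustrative assumptions, not findings of the study, and all counts below are hypothetical. Under those assumptions the absolute ratio for each group is inflated, but the under-reporting factor cancels in the ratio between groups.

```python
import math


def relative_cfr(deaths_a, cases_a, deaths_b, cases_b):
    """Ratio of group A's case fatality ratio to group B's."""
    return (deaths_a / cases_a) / (deaths_b / cases_b)


# Hypothetical true counts for two groups (illustrative numbers only).
true_cases_a, deaths_a = 10_000, 500
true_cases_b, deaths_b = 20_000, 200

# Assumption: only a quarter of cases are reported, equally in both groups,
# while every death is reported.
reporting_rate = 0.25
reported_cases_a = int(true_cases_a * reporting_rate)
reported_cases_b = int(true_cases_b * reporting_rate)

# The absolute ratio for group A is inflated by under-reporting ...
true_cfr_a = deaths_a / true_cases_a
reported_cfr_a = deaths_a / reported_cases_a

# ... but the ratio between groups is unchanged.
true_ratio = relative_cfr(deaths_a, true_cases_a, deaths_b, true_cases_b)
reported_ratio = relative_cfr(deaths_a, reported_cases_a, deaths_b, reported_cases_b)

print(f"group A CFR: true {true_cfr_a:.3f}, from reported cases {reported_cfr_a:.3f}")
print(f"relative CFR (A vs B): true {true_ratio:.2f}, from reported cases {reported_ratio:.2f}")
```

If the reporting rate differs between the groups, the cancellation no longer holds and the relative ratio is biased as well.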

The researchers sought to examine and correct biases in the calculation of case fatality ratios, especially biases related to time variation in case reporting across fatal and non-fatal cases. They also looked at the tendency towards under-reporting of cases that result in recovery.
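The timing bias can be illustrated with a small deterministic sketch. It assumes a fixed delay between a case being reported and the corresponding death, exponentially growing case counts, and a known true fatality ratio. All of these are hypothetical simplifications chosen for illustration. The lag adjustment shown is a basic textbook-style correction, not the expectation maximization approach used in the study.

```python
import math
from itertools import accumulate

# Hypothetical parameters (illustrative only).
true_cfr = 0.02      # assumed true probability that a case is fatal
delay_days = 14      # assumed fixed delay from case report to death
days = 60
daily_growth = 1.1   # assumed 10% daily growth in new cases

new_cases = [100 * daily_growth ** t for t in range(days)]

# Deaths on day t come from cases reported delay_days earlier.
new_deaths = [
    true_cfr * new_cases[t - delay_days] if t >= delay_days else 0.0
    for t in range(days)
]

cum_cases = list(accumulate(new_cases))
cum_deaths = list(accumulate(new_deaths))

# Naive estimate: deaths so far divided by cases so far.
# Recent cases have not yet had time to result in death, so this is too low.
naive = cum_deaths[-1] / cum_cases[-1]

# Lag-adjusted estimate: compare deaths so far with the cases
# that had been reported delay_days ago.
lag_adjusted = cum_deaths[-1] / cum_cases[-1 - delay_days]

print(f"true {true_cfr:.4f}, naive {naive:.4f}, lag-adjusted {lag_adjusted:.4f}")
```

In this idealized setting the lag-adjusted estimate recovers the true ratio, while the naive estimate understates it. Real delays vary from case to case, which is one reason a more careful statistical model is needed.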

Read the published study in Harvard Data Science Review.

To obtain more accurate estimates, the researchers recommend randomized data collection through contact tracing. This involves testing everyone who has recently come into contact with a known COVID-19 positive individual.

Graphical model from the study that captures aspects of the data generation process for COVID-19 surveillance data.

They explain: “We suggest that all of these contacts should be tested for COVID-19 one incubation period after exposure, regardless of whether or not they are symptomatic. … The population sampled using this strategy would be closer to the target population, since it would include asymptomatic cases.”

They show that this improved method of data collection would improve even the “naive” estimators of the case fatality rate, which simply divide the number of deaths by the total number of cases.
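A minimal sketch of the naive estimator shows why the choice of who gets tested matters. The counts are hypothetical, and the sketch assumes that no asymptomatic case is fatal and that each testing strategy detects every case in the group it covers. These assumptions are for illustration only and do not come from the study.

```python
import math


def naive_cfr(deaths, cases):
    """Naive case fatality rate: number of deaths divided by number of cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical infected population (illustrative numbers only).
symptomatic_cases = 6_000
asymptomatic_cases = 4_000
deaths = 60  # assumption: all deaths occur among symptomatic cases

# Symptom-based testing only counts symptomatic cases.
cfr_symptom_based = naive_cfr(deaths, symptomatic_cases)

# Testing all contacts regardless of symptoms also counts asymptomatic cases.
cfr_all_contacts = naive_cfr(deaths, symptomatic_cases + asymptomatic_cases)

print(f"symptom-based sample: {cfr_symptom_based:.4f}")
print(f"all contacts tested:  {cfr_all_contacts:.4f}")
```

The formula is the same in both cases. Only the sampled population changes, and the sample that includes asymptomatic cases gives an estimate closer to the fatality rate among all infected people.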

If you are interested in running statistical analyses in the cloud, please contact us at galileo@hypernetlabs.io, or access the app and start running immediately: Galileo App.