Data Science newsletter – March 11, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for March 11, 2020


Data Science News

HHS releases final rules on interoperability

MedCity News, Elise Reuter


The Department of Health and Human Services released two much-anticipated final rules on interoperability. The rules require health IT companies to give patients access to their data and will penalize providers who don’t make patient data available.

Better Treatment through Big Data

Boston University, BU Today, The Brink


If you have a sore shoulder, physical therapy is likely to make you feel better. But traditionally, it has been difficult for physical therapists to quantify exactly how much better a patient gets.

Now imagine you enter a clinic, and after describing your symptoms and having an exam, the physical therapist shows you a graph charting the progress of people with the same condition. The curve’s trajectory details how you will improve and how many sessions it will take based on data from hundreds or thousands of patients who share your profile.

Patients in the Sargent College of Health & Rehabilitation Science BU Physical Therapy Center at the Ryan Center for Sports Medicine & Rehabilitation have started receiving such personalized charts. It’s part of a quest for sharper, data-fueled insights into patient outcomes.

Can an artificial intelligence algorithm be sued for malpractice?

STAT, Saurabh Jha


The rapid entry of artificial intelligence is stretching the boundaries of medicine. It will also test the limits of the law.

Artificial intelligence (AI) is being used in health care to flag abnormalities in head CT scans, cull actionable information from electronic health records, and help patients understand their symptoms.

At some point, AI is bound to make a mistake that harms a patient. When that happens, who — or what — is liable?

Exclusive: Email crash impeded HHS response to coronavirus

POLITICO, Dan Diamond


As health department officials worked quickly to negotiate an emergency funding package to fight the spreading coronavirus outbreak on Feb. 23, they came to a frustrating realization: Their email system had crashed.

The outage in the Health and Human Services secretary’s office stretched on much of the day, with some messages delayed up to 11 hours, creating frustration and slowing the Trump administration’s coronavirus response.

Twitter, Karim B Boughida


best source of university responses to covid-19

Circulatory failure is predictable

ETH Zurich, Press Release


Researchers at ETH Zurich and Bern University Hospital have developed a method for predicting circulatory failure in patients in intensive care units – enabling clinicians to intervene at an early stage. Their approach uses machine learning methods to evaluate an extensive body of patient data.

OSU gets $8.7M machine learning grant

Corvallis Gazette Times (OR)


The Defense Department is giving Oregon State University almost $9 million to help robots learn more effectively by teaching them to explore their world like a toddler, the university announced on Monday.

Funded through a four-year, $8.7 million grant from the Defense Advanced Research Projects Agency, the OSU-led project is tasked with developing and training a “machine common sense service” that will learn about its environment much the way a toddler does.

OSU machine learning researcher Alan Fern is collaborating with roboticist Tucker Hermans of the University of Utah and behavioral psychologist Karen Adolph of New York University on the research project, which aims to inject elements of common sense into artificial intelligence and robotic systems.

Predicting Price Changes on Amazon Marketplace

Dartmouth College, Tuck School of Business


“The challenge for 3P sellers is that many of them are not established retailers like Amazon,” [Sharmistha] Sikdar says. “Many are really small businesses operating out of a garage, and they don’t have the technical capabilities to change prices or even know when they should change prices.”

Sikdar’s paper—“Price Dynamics on Amazon Marketplace: A Multivariate Random Forest Variable Selection Approach”—is one of the first studies of price dynamics in this context. An important methodological challenge that Sikdar and her co-authors faced while studying this topic is the statistical problem of a high dimensional covariate space. In layman’s terms, this means there was a large number of variables that could potentially be used to predict the price changes. They studied a single category of electrical kitchenware, analyzing 14 brands sold by more than 200 3P sellers. Without a way to narrow down the factors influencing price changes, this study would have had to do a regression analysis on hundreds of variables—a dataset too large and untargeted to yield helpful results. To get around this, the researchers created a variable selection algorithm using a machine learning tool called the multivariate random forest (MVRF) that winnows the variables down to those that are most influential—reducing them by 86 percent—which then allows a traditional regression analysis to be more effective. Furthermore, the prediction accuracy of the variables selected by the proposed algorithm when applied to the generalized additive regression model is higher than those of traditional methods, such as LASSO.
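The two-stage idea described above (screen the candidate variables first, then fit a regression on the survivors) can be sketched in a few lines. This is a minimal illustration only: it swaps in a simple absolute-correlation ranking for the multivariate random forest importance measure, so it is not the authors' algorithm. The data is synthetic, and names such as `screen_variables` and `keep_fraction` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def screen_variables(columns, target, keep_fraction):
    """Rank variables by |correlation| with the target and keep the top fraction.

    Stand-in for a forest-based importance score (an assumption for illustration).
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    n_keep = max(1, round(len(columns) * keep_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Synthetic data: 50 candidate predictors, only x3 and x17 drive the "price change".
rng = random.Random(0)
n_obs, n_vars = 200, 50
columns = {f"x{j}": [rng.gauss(0, 1) for _ in range(n_obs)] for j in range(n_vars)}
target = [
    2.0 * columns["x3"][i] - 1.5 * columns["x17"][i] + rng.gauss(0, 0.1)
    for i in range(n_obs)
]

# Keeping 14 percent of the variables mirrors the 86 percent reduction cited above.
selected = screen_variables(columns, target, keep_fraction=0.14)
print(selected)
```

A conventional regression would then be fit on `selected` only, rather than on all 50 columns.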

Creating Peru’s next generation of data scientists

MIT News, Institute for Data, Systems, and Society


“Participating in the MIT MicroMasters in Statistics and Data Science, I have discovered new concepts and skills that will allow me to become a data scientist,” says Karen Velasquez. “I am excited to apply what I have learned to challenges that will help NGOs in Peru.”

When Velasquez graduated with a bachelor’s degree in statistical engineering from the Universidad Nacional de Ingeniería in Lima, Peru, she was among the top 10 percent of students in her class. Now, while working for a marketing and intelligence company in Peru, she’s expanding her education as one of the first 25 participants in Aporta’s Advanced Program in Data Science and Global Skills, which supports a cohort of Peruvians through the MIT MicroMasters Program in Statistics and Data Science.

The 10 most innovative data science companies of 2020

Fast Company


Over the last few years, the misleading and unethical ways that companies use consumer data have come under fire. As a result, regulators are starting to move toward ensuring that data is collected and shared in a transparent, consent-based way, with the country’s strictest data privacy law now in effect in California.

But many startups and well-established corporations are also using cleverly captured data sets in a way that doesn’t violate privacy—to answer more questions about our world and make business more efficient, from assessing hog health in China to anticipating dangers on construction sites to predicting weather down to the city block. These are the honorees on this year’s Most Innovative Companies Data Science list.

1. Strava

Nanoscale 4D Printing Technique May Speed Development of New Therapeutics

City University of New York, Advanced Science Research Center


Researchers at the Advanced Science Research Center at The Graduate Center, CUNY (CUNY ASRC) and Northwestern University have created a 4D printer capable of constructing patterned surfaces that recreate the complexity of cell surfaces. The technology, detailed in a newly published paper in Nature Communications (DOI: 10.1038/s41467-020-14990-x), allows scientists to combine organic chemistry, surface science, and nanolithography to construct precisely designed nanopatterned surfaces that are decorated with delicate organic or biological molecules. The surfaces will have a wide variety of uses, including in drug research, biosensor development, and advanced optics. Importantly, this technology can create surfaces with different materials, and these materials can be patterned across the surface without the use of expensive photomasks or tedious clean room processes.

“I am often asked if I’ve used this instrument to print a specific chemical or prepare a particular system,” said the study’s primary investigator Adam Braunschweig, a faculty member with the CUNY ASRC Nanoscience Initiative and The Graduate Center and Hunter College Chemistry Departments. “My response is that we’ve created a new tool for performing organic chemistry on surfaces, and its usage and application are only limited by the imagination of the user and their knowledge of organic chemistry.”

Dressing for the Surveillance Age

The New Yorker, John Seabrook


As cities become ever more packed with cameras that always see, public anonymity could disappear. Can stealth streetwear evade electronic eyes?

Through apps, not warrants, ‘Locate X’ allows federal law enforcement to track phones

Protocol, Charles Levinson


U.S. law enforcement agencies signed millions of dollars worth of contracts with a Virginia company after it rolled out a powerful tool that uses data from popular mobile apps to track the movement of people’s cell phones, according to federal contracting records and six people familiar with the software.

The product, called Locate X and sold by Babel Street, allows investigators to draw a digital fence around an address or area, pinpoint mobile devices that were within that area, and see where else those devices have traveled, going back months, the sources told Protocol.

They said the tool tracks the location of devices anonymously, using data that popular cell phone apps collect to enable features like mapping or targeted ads, or simply to sell it on to data brokers.
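The "digital fence" query described above is, at its core, a radius filter over timestamped location pings followed by a lookup of each matching device's full trail. The sketch below shows that general technique under stated assumptions: the `Ping` record, the function names, and the sample coordinates are all hypothetical, and nothing here reflects how Locate X is actually implemented.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass(frozen=True)
class Ping:
    """One location report from an app (hypothetical record layout)."""
    device_id: str
    timestamp: int  # Unix seconds
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def devices_in_fence(pings, center_lat, center_lon, radius_m):
    """Return the IDs of devices with at least one ping inside the circular fence."""
    return {
        p.device_id
        for p in pings
        if haversine_m(p.lat, p.lon, center_lat, center_lon) <= radius_m
    }


def trails(pings, device_ids):
    """Return each selected device's pings in time order, inside or outside the fence."""
    result = {device_id: [] for device_id in device_ids}
    for p in sorted(pings, key=lambda p: p.timestamp):
        if p.device_id in result:
            result[p.device_id].append((p.timestamp, p.lat, p.lon))
    return result


# Made-up sample data: device "a" passes near the fence center, device "b" never does.
pings = [
    Ping("a", 1_583_000_000, 38.9000, -77.0400),
    Ping("a", 1_583_003_600, 38.9500, -77.1000),
    Ping("b", 1_583_000_100, 40.0000, -75.0000),
]

matched = devices_in_fence(pings, center_lat=38.9001, center_lon=-77.0401, radius_m=100)
history = trails(pings, matched)
print(matched, history)
```

The second step is why this kind of data raises privacy concerns even without names attached: once a device ID matches a fence, its entire movement history comes along with it.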

College Towns to Clear Out as Virus Upends Local Businesses

Bloomberg Economics, Janet Lorin


  • Harvard, Stanford among schools to go virtual or cut visits
  • Economic damage could last long if U.S. campuses stay shut

    Google tracked his bike ride past a burglarized home. That made him a suspect.

    NBC News, Jon Schuppe


    [Zachary] McCoy examined his phone. An avid biker, he used an exercise-tracking app, RunKeeper, to record his rides. The app relied on his phone’s location services, which fed his movements to Google. He looked up his route on the day of the March 29, 2019, burglary and saw that he had passed the victim’s house three times within an hour, part of his frequent loops through his neighborhood, he said.

    “It was a nightmare scenario,” McCoy recalled. “I was using an app to see how many miles I rode my bike and now it was putting me at the scene of the crime. And I was the lead suspect.”


    GlobusWorld 2020

    University of Chicago, Argonne National Laboratory


    Chicago, IL April 29-30. “GlobusWorld brings together researchers, systems administrators, developers and IT leaders from top computing centers, labs and universities around the U.S. The event is designed to provide new insights into managing research data and storage at scale, as well as building and using cutting-edge applications, portals and gateways.” [$$$]


    Call for Nominations: ISLS Early Career Award

    “The International Society of the Learning Sciences (ISLS) Early Career Award recognizes learning scientists who have made exciting and innovative contributions to the field during the early years following their Ph.D. Each year ISLS will select up to two awardees. Nominations can only be made by an ISLS member. Recipients will be announced at the annual ISLS conference.” Deadline for application materials is April 1.

    Tools & Resources

    I’ve started a slack called “gone virtual” — mostly around teaching online (which is something many of us will have to do very soon, many for the first time)

    Twitter, Zach Lieberman


    <-- if you want to join it, drop your info on this form and I'll send invites -->

    Google Cloud launches machine images to simplify data science workflows

    VentureBeat, Kyle Wiggers


    Google today announced machine images, a new type of Compute Engine resource in Google Cloud that contains all the information required to create, back up, or restore a virtual machine. The company claims it will reduce the time network admins and data scientists spend managing their cloud environments by eliminating extra steps and streamlining operations.

    Three Tricks to Amplify Small Data for Deep Learning

    datanami, Alex Woodie


    With the right techniques, data scientists and machine learning engineers can get in on the deep learning action, but without a huge corpus of training data up front.

    One of the technologists with a lot of experience making the most of smaller data is Vaibhav Nivargi, the CTO and co-founder of Moveworks, which develops IT ticket automation software. The company is just three-and-a-half years old and is backed by some of the top venture capitalists in Silicon Valley. But it doesn’t have a ton of IT tickets on which to train its predictive models, which posed a challenge to Nivargi.

    “IT tickets are not really the most voluminous types of data,” Nivargi says. “Even if you have a customer with several thousand employees, it’s a relatively infrequent activity to file an IT ticket. The data sets are relatively sparse and small. To be able to leverage more sophisticated techniques, running at very high levels of accuracy and precision, is highly non-trivial.”

    National Wilderness Areas

    ArcGIS Hub


    “A map service depicting parcels of Forest Service land congressionally designated as wilderness such as National Wilderness Areas. This map service provides display, identification, and analysis tools for determining current boundary information for Forest Service managers, GIS Specialists, and others.”


    Internships and other temporary positions

    part-time grad student work

    New York University, Marron Institute; New York, NY, or Berlin, Germany

    Full-time positions outside academia

    Quantitative Researcher & Developer

    Hudson and Thames Quantitative Research; New York, NY
