Data Science newsletter – February 16, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for February 16, 2017


Data Science News

Sensely raises $8M for AI-powered virtual nurse app, eyes large-scale partnerships

MobiHealthNews, Heather Mack


The app deploys a fleet of AI-powered nurse avatars to clinics and patients with the aim of keeping both parties in communication between office visits, as well as preventing hospital readmission and developing ongoing lifestyle behaviors that promote health. The company has a core, rule-based engine algorithm built around clinical demarcations and protocols for specific diseases, and Odessky said it is always looking to expand its capabilities.

“It’s always improving, learning more and more from patients, and as we partner with more organizations, bring in more disease states and research, we’ll layer in more protocols and content,” he said.

Government Data Science News

Scott Pruitt, a climate change denier, has been appointed head of the Environmental Protection Agency despite objections from EPA staff. A bill to eliminate the EPA has been introduced in Congress.

The Department of Homeland Security (DHS) is “scrambling” to install facial recognition technology at US airports. This has attracted the attention of the American Civil Liberties Union, whose legislative counsel on this case, Neema Singh Guliani, used to work for DHS.

Is the United States going to lose scientists to other countries without travel bans? Will we see climate science silenced? Under [former Canadian Prime Minister] Harper, the Canadian government actively silenced climate scientists, forcing them to pass all interview transcripts and conference presentations through federal offices for pre-approval. Approval was often denied.

Existing and impending European Union (EU) regulations on data continue to challenge the burgeoning field of personal AI. Fortune reported from a panel of EU-AI experts that, “the [upcoming] EU regulation leans heavily on protecting the personal data of an individual, so companies operating in the European Union have to be extremely careful with how they handle and process their customer data. If the EU determines that a company’s use of data analytic and automation technologies ends up discriminating against certain groups of people, or if a business is unable to fix potential problems with their technologies, they face stiff fines that can cost them up to 4 percent of their overall revenue.”

University Data Science News

An unnamed university was attacked by a seafood-loving IoT botnet running from its vending machines, light bulbs, etc. I cannot make this stuff up… but I did warn you about IoT security issues several weeks ago. And here’s a study that found 170m unsecured IoT devices in the US.

Seventeen schools joined the New York State Attorney General’s lawsuit against Trump’s travel ban. NYU was not invited to join the suit.

New York University’s David Heeger published a paper that “attempt[s] to push computer vision and visual neuroscience/computational neuroscience forward” by developing a “unified theory” of cortical functioning.

Women are getting promoted to administrator positions more often, but even in academia, there’s a pay gap problem.

Laura Haas has been promoted to Dean of UMass Amherst’s College of Information and Computer Sciences.

Brian Wansink’s Food and Brand Lab at Cornell University may have fabricated analyses in four or more papers. This comes after he publicly flamed a postdoc for not “unhesitatingly” saying “yes” to working on said papers.

UC-Santa Barbara geographer Tim DeVries published an explanation for ocean acidification in Nature last week: less overturning of deep CO2-rich water enables greater absorption of CO2 at the surface, increasing overall acidification.

Marine biologists at Duke University and McGill are using drones to surveil whales, penguins, and microbes in Antarctica. The biggest upside? Fewer marine biologists are dying in small plane crashes while trying to capture data.

The University of Massachusetts-Amherst bought an expensive 400 GPU cluster hoping to attract students interested in machine learning. In my humble experience here at NYU, while it’s nice to have massive hardware, students are better off finding schools with top-notch curriculum and faculty who care about teaching and mentoring. The fancy new GPUs may keep a professor or two from fleeing to industry, though, which is a different kind of win.

Researchers at the Cancer Research UK Cambridge Institute and the Wellcome Trust Sanger Institute were able to predict remission rates in acute myelogenous leukemia with 100 percent accuracy.

MIT “researchers have built a low-power chip specialized for automatic speech recognition” that should make voice recognition ubiquitous. To celebrate or to quiver in surveillant fear, that is the question.

Perimeter Institute researchers used TensorFlow to detect phase changes in matter. They call the project Phasebook. Right. Of course they do.

A trio of researchers finds that songbirds in arid regions may face die-off events if climate change triggers more heat waves, as has been predicted.

Several of my colleagues and I at NYU are excited about Data for Democracy, a group of data scientists working on projects that promote healthy civic values.

ASAPbio has gotten enough funding to make its dream come true; it wants to be the one-stop shop for life sciences preprints. Now to convince people to stop using -> biology, bioRxiv, chemRxiv, psyArXiv…

A survey of NSF-funded biological PIs found that what they need most is not a pre-print archive, but “training in data integration, data management, and scaling analyses for HPC”.

The Gates Foundation partnered with the American Association for the Advancement of Science to allow all of its grantees to publish in Science journals under a Creative Commons Attribution License (CC BY). Meanwhile, Ross Mounce found that Elsevier continues to sell access to articles that the Wellcome Trust paid to make available via open access, a charge leveled at Elsevier in 2014, 2015, 2016, and now 2017.

Do your papers get more attention if you promote them on social media? According to Xianwen Wang, yes: the impact is strong but ephemeral.

Revealing physical interaction networks from statistics of collective dynamics

Science Advances; Mor Nitzan, Jose Casadiego and Marc Timme


Revealing physical interactions in complex systems from observed collective dynamics constitutes a fundamental inverse problem in science. Current reconstruction methods require access to a system’s model or dynamical data at a level of detail often not available. We exploit changes in invariant measures, in particular distributions of sampled states of the system in response to driving signals, and use compressed sensing to reveal physical interaction networks. Dynamical observations following driving suffice to infer physical connectivity even if they are temporally disordered, are acquired at large sampling intervals, and stem from different experiments. Testing various nonlinear dynamic processes emerging on artificial and real network topologies indicates high reconstruction quality for existence as well as type of interactions. These results advance our ability to reveal physical interaction networks in complex synthetic and natural systems.
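
For readers wondering what "compressed sensing" looks like in practice, here is a minimal sketch. It is not the authors' method: it only illustrates the general idea of recovering a sparse vector (here, a made-up set of incoming links for one node) from fewer linear observations than unknowns. The solver is plain iterative soft thresholding (ISTA), and the matrix `A`, the planted links, the penalty `lam`, and all sizes are assumptions chosen for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(A, y, lam=0.05, iters=2000):
    """Approximately minimise 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft thresholding."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm of A is an upper bound on the largest
    # eigenvalue of A^T A, so 1/L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        gradient = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - gradient[j] / L, lam / L) for j in range(n)]
    return x


def demo(seed=0):
    """Plant two hypothetical links into one node and try to recover them."""
    rng = random.Random(seed)
    n_nodes, n_obs = 40, 30  # fewer observations than unknowns
    true_links = [0.0] * n_nodes
    true_links[3], true_links[17] = 1.5, -2.0  # made-up coupling strengths
    # Random observation matrix (an assumption; real data would come from experiments).
    A = [[rng.gauss(0.0, 1.0) / n_obs ** 0.5 for _ in range(n_nodes)]
         for _ in range(n_obs)]
    y = [sum(A[i][j] * true_links[j] for j in range(n_nodes)) for i in range(n_obs)]
    estimate = ista(A, y)
    # Rank nodes by estimated coupling magnitude, strongest first.
    ranked = sorted(range(n_nodes), key=lambda j: abs(estimate[j]), reverse=True)
    return true_links, estimate, ranked
```

Calling `demo()` returns the planted links, the estimate, and the node indices ranked by estimated strength. With settings like these, the planted links should rank at the top, though the L1 penalty shrinks the estimated magnitudes somewhat.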

Diehard Coders Just Rescued NASA’s Earth Science Data

WIRED, Science, Megan Molteni


On Saturday morning, the white stone buildings on UC Berkeley’s campus radiated with unfiltered sunshine. The sky was blue, the campanile was chiming. But instead of enjoying the beautiful day, 200 adults had willingly sardined themselves into a fluorescent-lit room in the bowels of Doe Library to rescue federal climate data.

Like similar groups across the country—in more than 20 cities—they believe that the Trump administration might want to disappear this data down a memory hole. So these hackers, scientists, and students are collecting it to save outside government servers.

Job trends for R and Python

R-bloggers, David Smith


When we last looked at job trends from, job listings for “R statistics” were on the rise but were still around half the volume of listings for “SAS statistics”. Three-and-a-half years later, R has overtaken SAS in job listings for “statistics”.

AI’s Factions Get Feisty. But Really, They’re All on the Same Team

WIRED, Business, Cade Metz


Artificial intelligence is not one thing, but many, spanning several schools of thought. In his book The Master Algorithm, Pedro Domingos calls them the tribes of AI.

As the University of Washington computer scientist explains, each tribe fashions what would seem to be very different technology. Evolutionists, for example, believe they can build AI by recreating natural selection in the digital realm. Symbolists spend their time coding specific knowledge into machines, one rule at a time.

Right now, the connectionists get all the press. They nurtured the rise of deep neural networks, the pattern recognition systems reinventing the likes of Google, Facebook, and Microsoft. But whatever the press says, the other tribes will play their own role in the rise of AI.

The Benefits of a “Central Service” for Biology Preprints



Eleven major international funding agencies (Wellcome Trust, National Institutes of Health, Medical Research Council (UK), Helmsley Trust, Howard Hughes Medical Institute (HHMI), European Research Council, Simons Foundation, Canadian Institutes for Health Research, Alfred P. Sloan Foundation, Department of Biotechnology (Government of India), Laura and John Arnold Foundation) have released a statement calling for further technology development and the creation of a central resource for preprints, which is being provisionally called the Central Service (CS). The CS will be a database that aggregates preprints from multiple sources, making them easier to read by humans and machines. These features will enable scientists to find new knowledge that can accelerate their research. The CS will be overseen by a scientist-led governing body, which will ensure its mission in serving the scientific community and the public good.

5 Great Places to Live as a Data Scientist

Coursera blog


The demand for data scientists is growing in nearly every field worldwide. While San Francisco remains a mecca for tech jobs, data-driven corporations are setting up shop around the globe. Whether you want to work in an established research facility or with an emerging innovative startup, in the U.S. or abroad, begin your search in one of these five exciting cities that are bursting with career opportunities for current and aspiring data scientists.

1. Raleigh, NC

Tracking the digital footprints to scholarly articles: the fast accumulation and rapid decay of social media referrals

Impact of Social Sciences, Xianwen Wang


Academics are increasingly encouraged to share their scholarly articles via social media, as part of a wider drive to maximize their dissemination and engagement. But what effect does this have? Xianwen Wang has studied the referral data of academic papers, with particular focus on social media referrals and how these change over time. Referrals from social media do indeed account for a significant number of visits to articles, especially in the days immediately following publication. But this fast initial accumulation soon gives way to a rapid decay.


The vital role of government statistics: Strengthening research, governance, and innovation

American Enterprise Institute


Washington, DC On March 2, the Hamilton Project at Brookings Institution and the American Enterprise Institute will convene a policy luncheon discussion, highlighting the important role of government statistics. [rsvp required]

2017 Annual Workshop of the Northeast Big Data Innovation Hub

Northeast Big Data Innovation Hub


New York, NY February 24, Columbia University. [free, registration required]

Beyond the Professoriate – A Virtual Conference for PhDs in Career Transition

Jennifer Polk (From PhD to Life) and Maren Wood (Lilli Research Group).


Online May 6 and May 13, organized by Jennifer Polk and Maren Wood. [$$]

The 51st CREST Open Workshop – Tutorial on Landscape Analysis

University College London, Centre for Research on Evolution, Search and Testing


London, England February 27-28. Understanding the nature of the underlying structure of the search landscape is important for Search Based Software Engineering (SBSE). The purpose of this pair of distinguished tutorials is to help the Software Engineering community to raise its appreciation and potential application of landscape analysis to problems in SBSE. [register by email]

Audacious Innovation: Crossing Sector Boundaries to Create a New World

Harvard Kennedy School, Center for Public Leadership


Cambridge, MA A panel composed of experts on cross-sector and cross-industry innovation and environment creation. February 24, 12 noon, at Harvard Kennedy School. [rsvp required]

ACM at SXSW Interactive



Austin, TX ACM will host an official SXSW party featuring successful tech entrepreneurs and investors, and sponsored sessions with Eric Horvitz and Jeff Heer. March 11-13. [$$$$]

ASA Sponsors DSAA2017

4th IEEE International Conference on Data Science and Advanced Analytics


Tokyo, Japan Recognizing that statistics is one of three foundational areas of data science, the American Statistical Association is sponsoring the 4th IEEE International Conference on Data Science and Advanced Analytics (DSAA2017) October 19–21 in Tokyo, Japan.

Tools & Resources

Apache Kafka: The Cornerstone of an Internet-of-Things Data Platform

The New Stack, Janakiram MSV


“If you are a developer considering IoT as a career option, it is time for you to start investing in Apache Kafka. This article explores the role that Apache Kafka plays in deploying a scalable IoT solution.”

early-career-hawaii/ at master

GitHub – DDD-Moore


Long list of tools & resources from the recent Early Careers workshop put on by Moore-Sloan Data Science Environment.

Spanner, the Google Database That Mastered Time, Is Now Open to Everyone

WIRED, Business, Cade Metz


Spanner “was the first global database, a way of storing information across millions of machines in dozens of data centers spanning multiple continents, and it now underpins everything from Gmail to AdWords, the company’s primary moneymaker. But it’s not just the size of this creation that boggles the mind. The real trick is that, even though Spanner stretches across the globe, it behaves as if it’s in one place.”

Release TensorFlow 1.0.0

GitHub – tensorflow



  • “XLA (experimental): initial release of XLA, a domain-specific compiler for TensorFlow graphs, that targets CPUs and GPUs.”
  • “TensorFlow Debugger (tfdbg): command-line interface and API.”
  • And more.

[1702.01715] Software Engineering at Google

arXiv, Computer Science > Software Engineering; Fergus Henderson


“We catalog and describe Google’s key software engineering practices.”


Full-time positions outside academia

Instagram, Software Engineer – Machine Learning

Facebook; Menlo Park, CA

Research Engineer (AI)

Facebook; Menlo Park, CA, Paris, France, and New York, NY

Postdoctoral Researcher (AI)

Facebook; Menlo Park, CA, Paris, France, Seattle, WA, and New York, NY

Open Postdoctoral Research Position in Programming Languages and Security

University of Maryland, Laboratory for Programming Languages Research; College Park, MD

Internships and other temporary positions

Research Scientist, PhD University Grad (Machine Learning)

Facebook; Seattle, WA

AI Safety Research Internship

University of California-Berkeley, Machine Intelligence Research Institute; Berkeley, CA

The Alan Turing Institute Internship Programme 2017

The Alan Turing Institute; London, England

Tenured and tenure track faculty positions

Chair, School of Interactive Computing

Georgia Institute of Technology; Atlanta, GA
