Data Science newsletter – May 16, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for May 16, 2018


Data Science News

Open Letter: On the Shortage of Computer Science Faculty

Haverford Clerk, Daniel Feshbach


Dear Education Policy Committee,

By now, many of you have probably seen the April Fools’ Day decorations that the Computer Science (CS) Majors put up this year, centering around our department’s critical shortage of faculty. In case you have not, we have included some pictures here in order to highlight the urgency communicated in these efforts. As computing has become increasingly central to our society and understanding it has also become increasingly important to a well-rounded education, our dire shortage of faculty has created what can only be described as a crisis for students interested in Computer Science, lotterying us out of required introductory and upper-level classes. Without more tenure-track hiring, the enrollment situation will only get worse. The lack of faculty leaves glaring gaps in our curricular offerings — as we lack experts in crucial areas of CS — and effectively destroys the department’s ability to be part of a balanced liberal arts education for non-majors.

We are writing to implore the Education Policy Committee to approve the Computer Science department’s request to conduct a search for at least two tenure-track faculty positions.

Tweet of the Week

Twitter, Arthur Charpentier


63: Take your data and go home

Make Me Smart with Kai and Molly podcast


What would a Hippocratic oath look like for the people we trust with our data? That’s one of the questions NYU professor Laura Norén asks in her course “Ethics for Data Science.” Consumers should be pushing for more empowered, informed consent, she says, because right now they have two choices: blindly agree to give up your data to [insert social media or digital platform here] or quit altogether. We’ll start there, and somehow end up at trans-humanism — it’s sci-fi stuff, but it’s where the privacy conversation wants to go, if you let it. But first, speaking of Facebook: Let’s talk about that apology ad the company is running everywhere. Check out for more, including that ad and the latest from the privacy fiasco. [audio, 42:07]

Probing for data: How a new backcountry citizen science effort is growing the snow-depth database

Backcountry magazine, Louise Lintilhac


Data collection for scientific research can be a tedious task, especially when it comes to gathering information from remote, wintry climates. But a group of scientists based at the University of Oregon, the University of Washington and the University of Alaska have joined forces to mobilize an army of backcountry citizen scientists to gather crowd-sourced snow-depth data.

This broad, new citizen science-based program—dubbed Community Snow Observations and funded by NASA’s earth science division—is simple. When skiers and riders head into the backcountry, they can use their probes to take snow-depth readings wherever they are and then upload the measurement to one of two preexisting crowd-source apps: Mountain Hub, a platform for non-professional recreationists, or Snow Pilot, an app geared toward forecasters and guides. The data is then assembled and used by scientists to assess and interpret information gathered through aerial means by NASA and other agencies to improve water runoff models.

Professor David Hill at the University of Oregon is one of the scientists working on this new crowd-sourced data-collection model, and he sees great potential for this new way of activating citizens in the field to get a large amount of data—and quickly—so the information can then be plugged into a number of different models.
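The article doesn’t describe how the uploaded readings are assembled. As a minimal sketch, assuming each reading arrives as a (latitude, longitude, depth in cm) tuple, point measurements could be binned into fixed lat/lon cells and summarized per cell so they can be lined up against gridded airborne estimates. The function name, the 0.01-degree cell size and the use of the median are illustrative choices, not the Community Snow Observations pipeline.

```python
import math
from collections import defaultdict
from statistics import median


def grid_snow_depth(readings, cell_deg=0.01):
    """Bin (lat, lon, depth_cm) probe readings into square lat/lon cells.

    Returns a dict mapping an integer (row, col) cell index to the median
    depth of all readings in that cell. The cell size is an illustrative
    assumption, not a project parameter.
    """
    cells = defaultdict(list)
    for lat, lon, depth_cm in readings:
        # math.floor (not int) so negative longitudes land in the right cell.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(depth_cm)
    # The median keeps one mis-entered probe depth from dominating a cell.
    return {key: median(depths) for key, depths in cells.items()}


# Usage with hypothetical readings: three nearby probes share a cell,
# a distant one gets its own.
readings = [
    (44.0015, -121.7015, 150),
    (44.0025, -121.7025, 170),
    (44.0035, -121.7035, 160),
    (44.5012, -121.5012, 90),
]
print(grid_snow_depth(readings))
```

A real pipeline would also need to account for when each reading was taken, since snow depth changes through the season.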

Biglaw Firms Become Tech Incubators

Above the Law, Robert Ambrogi


Increasingly, innovative legal technology is emerging from an incongruous breeding ground — large law firms. And the firms that are incubating these companies are most likely to be found not in the U.S., but in the U.K.

Three of these incubators have been in the news lately, as they have announced their second startup cohorts, and all are based in London: Fuse, a tech innovation space opened by Allen & Overy last September; MDR LAB, founded in May 2017 by Mishcon de Reya; and Fintech Fast Forward, launched in March 2017 by Slaughter and May.

Alternative models for sharing confidential biomedical data

Nature Biotechnology, Justin Guinney & Julio Saez-Rodriguez


Although much discussion has been focused on the need for more data sharing in the biomedical community, less attention has been paid to new kinds of biomedical data sharing, particularly the sharing of confidential patient data. In the traditional paradigm of data sharing, researchers transfer their data directly to data modelers. Here we describe an alternative model that allows the protection of confidential data through a process we term ‘model to data’ (MTD). In the MTD model, the flow of information between data generators and data modelers is reversed. This new sharing paradigm has been successfully demonstrated in crowdsourced competitions and represents a promising alternative for increasing the use of data that cannot—or will not—be more broadly shared.

Biomedical studies generate vast clinical, radiologic, cellular and molecular data sets, and enable new basic and translational science. However, there is substantial disagreement around the best ways to share these valuable assets, particularly in the context of clinical trials.
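The reversed flow of “model to data” can be illustrated with a toy host object: the modeler hands over a prediction function, the data holder runs it on records the modeler never sees, and only an aggregate score comes back. This is a minimal in-process sketch under that assumption; the class name, the accuracy metric and the `marker` field are hypothetical, and a real deployment would need to run submitted code in an isolated environment rather than as a Python callable.

```python
from typing import Callable, Dict, List


class DataHost:
    """Toy 'model to data' host: keeps confidential records, returns only scores."""

    def __init__(self, records: List[Dict], label_key: str = "label"):
        # The leading underscore is only a convention; in practice the records
        # would sit on infrastructure the modeler cannot access at all.
        self._records = list(records)
        self._label_key = label_key

    def evaluate(self, model: Callable[[Dict], int]) -> float:
        """Run a submitted model on every private record; return accuracy only."""
        correct = 0
        for record in self._records:
            # Remove the label so the model sees features only.
            features = {k: v for k, v in record.items() if k != self._label_key}
            if model(features) == record[self._label_key]:
                correct += 1
        return correct / len(self._records)


# The modeler writes a model without ever seeing the data...
def threshold_model(features: Dict) -> int:
    return int(features["marker"] > 0.5)


# ...and the data holder runs it, sharing back a single number.
host = DataHost([
    {"marker": 0.9, "label": 1},
    {"marker": 0.2, "label": 0},
    {"marker": 0.7, "label": 0},
])
print(host.evaluate(threshold_model))
```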

Credit score ratings: Is artificial intelligence scoring more fair?

Quartz, Rachel O'Dwyer


Money2020, the largest finance tradeshow in the world, takes place each year in the Venetian Hotel in Las Vegas. At a recent gathering, above the din of slot machines on the casino floor downstairs, cryptocurrency startups pitched their latest coin offerings, while on the main stage, PayPal President and CEO Dan Schulman made an impassioned speech to thousands about the globe’s working poor and their need for access to banking and credit. The future, according to PayPal and many other companies, is algorithmic credit scoring, where payments and social media data coupled to machine learning will make lending decisions that another enthusiast argues are “better at picking people than people could ever be.”

Credit in China is now in the hands of a company called Alipay, which uses thousands of consumer data points—including what they purchase, what type of phone they use, what augmented reality games they play, and their friends on social media—to determine a credit score. In a culture where the elderly casually pull out their phones to pay for groceries and even the homeless don QR codes to accept donations, there’s plenty of data to draw on. And while the credit score can dictate the terms of a loan, it also acts as a proxy for general good character. In China, having a high credit rank can help your chances of accessing employment, for example, or of getting a visa to travel within Europe, and even finding a partner via online dating. One Chinese dating site offers greater visibility to users with high credit scores.

And all of it is dictated by the algorithm.

Researcher studies importance of synchrony in ecological systems such as food webs

The University of Kansas, KU News Service


A recent $426,609, three-year grant from the National Science Foundation will enable a new investigation of how populations in different places react to changes in environmental factors (like temperature, rainfall) over time and space, which researchers dub “synchrony.” Spatial synchrony of population dynamics can be seen in organisms as diverse as mammals and single-cell organisms, at distances up to thousands of miles, and relates to resource shortages and pest outbreaks.

“Synchrony is when multiple populations in different places all fluctuate in the same way,” said Daniel Reuman, professor of ecology & evolutionary biology at the University of Kansas and senior scientist with the Kansas Biological Survey. “One of the causes of population synchrony is synchrony of environmental variables — like the amount of rainfall. When rainfall in different areas is correlated through time, like a drought that’s spread across a large area, that will affect all populations across that area in similar ways.”
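A simple way to put a number on the synchrony Reuman describes is the mean pairwise Pearson correlation between population time series from different sites: values near 1 mean the populations rise and fall together, values near 0 mean they fluctuate independently. The sketch below assumes equal-length, non-constant series sampled at the same times; it is the most basic summary, not the method used in the funded project, and the site counts are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_pairwise_synchrony(series):
    """Average Pearson correlation over every pair of site time series."""
    pairs = list(combinations(series, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical yearly counts at three sites; the first two track each other.
sites = [
    [120, 95, 140, 80, 130],
    [60, 50, 75, 35, 70],
    [40, 42, 38, 41, 39],
]
print(round(mean_pairwise_synchrony(sites), 2))
```

Correlating each site’s series with a shared environmental driver, such as rainfall, in the same way would be a first check on the cause Reuman mentions.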

Alligators on Beaches May Become the Norm

JSTOR Daily, Lina Zeldovich


In the recent past, alligators, wolves, leopards, mountain lions, and other large predators have been increasingly spotted in places where people didn’t expect them. Thanks to conservation efforts, these large predators—vital to their ecosystems and previously hunted nearly to extinction—are rebounding. But that also means that humans may find themselves face to face with them more often.

Nobody anticipated this development. Scientists expected that when these predators rebounded, they would expand their ranges and venture into new land in search of food. People didn’t really think that the animals would try to return to the lands now occupied by humans. But a new study by Brian Silliman, associate professor of marine conservation biology at Duke’s Nicholas School of the Environment, and his colleagues found otherwise. Silliman and his team synthesized data from several recent scientific studies and government reports. They found that alligators, sea otters, river otters, gray whales, gray wolves, mountain lions, orangutans and bald eagles, among other large predators, are increasingly expanding their habitats, and some are encroaching on human locales.

“We can no longer chalk up a large alligator on a beach or coral reef as an aberrant sighting,” said Silliman. “It’s not an outlier or short-term blip. It’s the old norm, the way it used to be before we pushed these species onto their last legs in hard-to-reach refuges. Now, they are returning.”

Greater Productivity Main Driver for Legal AI Adoption: Survey

Artificial Lawyer


The survey by IQPC, ahead of its Legal AI Forum event later this year (18-19 September in London), gathered feedback from around 200 legal professionals. Here is what it found:

  • The biggest driver for the use of AI and automation is to increase productive capacity, that is to say, to increase the overall capability of the firm or in-house team to do more work. Also, although not mentioned in the responses, being able to do more will tend to mean a group of lawyers can automate more process-level tasks and in turn be more productive when focusing on higher-level, more complex matters. In short, get more done where it matters most. Interestingly, cost alone was not the biggest driver, nor was efficiency, although improvements to both would nevertheless be directly linked to any increase in productivity. It’s especially positive to see lawyers treat the use of AI and automation as very much an economic and business issue, which fundamentally it is, as it changes ‘the means of production’. We can discuss the technical aspects of AI, or its impact on data analysis, for as long as we like, but ultimately this is about economics and productivity, or, as Artificial Lawyer likes to put it, ‘the industrialisation of the law’.

San Francisco will use new technology to automatically reduce marijuana convictions under Prop. 64: ‘It’s about leveling the playing field’

Los Angeles Times, Sarah Parvini


When recreational marijuana use was legalized in California, it presented an opportunity to reduce or expunge convictions for possession crimes that made it harder for some people to get ahead in life.

Since then, some counties have worked to address those convictions, taking on the lengthy bureaucratic process so that people would not have to wade through the legal world on their own.

San Francisco led the charge, announcing in January that the district attorney’s office would retroactively apply the new marijuana law to prior convictions dating as far back as 1975. But for prosecutors, the chance to change those convictions also came with a challenge: It required a lot of resources to plow through thousands of cases.

On Tuesday, Dist. Atty. George Gascón announced what he believes to be the solution. San Francisco is working with a nonprofit organization to create a program that would automatically clear eligible convictions under California’s new marijuana legalization law.

Despite Investment in Deep Learning, AI Talent Pool is Shallow

Dataversity, Ben Lorica


According to the Global AI Talent Report 2018, there are currently 22,000 PhD-educated AI researchers. Another report from Tencent claims that there are over 200,000 active developers in the industry, along with another 100,000 students and academic researchers. While this sounds like a bright outlook, for deep learning to succeed in the enterprise, there’s a need for developers who aren’t PhDs – and we need millions of them, not thousands.

Making deep learning accessible to developers and to domain experts in other disciplines who don’t hold doctorates is essential for progress. And it’s not all bad news: although a large majority of survey respondents reported that a lack of skilled people was a bottleneck, deep learning tools have become easier to use, and the underlying math used in its applications is within reach of most developers. Additionally, training is becoming a bigger focus for organizations, whether it’s provided in-house or by a third-party company. Looking for solutions beyond the PhD pool is vital to closing the AI skills gap at the pace of technological change.

Face recognition police tools ‘staggeringly inaccurate’

BBC News, Chris Fox


Police must address concerns over the use of facial recognition systems or may face legal action, the UK’s privacy watchdog says.

Information Commissioner Elizabeth Denham said the issue had become a “priority” for her office.

An investigation by campaign group Big Brother Watch suggested the technology flagged up a “staggering” number of innocent people as suspects.

But police have defended the technology and say safeguards are in place.
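A “staggering” share of false flags can arise even from a system that rarely errs on any single face, because almost everyone scanned in a crowd is not on a watchlist. The arithmetic below illustrates this base-rate effect; the function name is hypothetical, and every number in the usage example is made up rather than taken from the Big Brother Watch report or police figures.

```python
def false_alert_share(scanned, on_watchlist, hit_rate, false_match_rate):
    """Fraction of alerts that flag people who are NOT on the watchlist.

    scanned          -- number of faces the system checks
    on_watchlist     -- how many of those faces really are on the watchlist
    hit_rate         -- chance a watchlisted face triggers an alert
    false_match_rate -- chance any other face triggers an alert
    Assumes the inputs produce at least some expected alerts (no zero division).
    """
    true_alerts = on_watchlist * hit_rate
    false_alerts = (scanned - on_watchlist) * false_match_rate
    return false_alerts / (true_alerts + false_alerts)


print(false_alert_share(100_000, 20, 0.9, 0.001))
```

With 100,000 faces scanned, 20 of them on the watchlist, a 90% hit rate and a 0.1% false-match rate, about 18 genuine alerts are swamped by about 100 false ones, so roughly 85% of all alerts point at innocent people.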

This start-up made connected toothbrushes — now it aims to overthrow the ‘primitive’ dental insurance industry

CNBC, Christina Farr


Alex Frommeyer comes from a family of dentists. Now, he has a start-up that sells dental insurance, but with a twist.

The Columbus, Ohio, company, called Beam Dental, started out by selling Bluetooth-connected toothbrushes, meaning that their product communicates with an app.

Once they got people using it, the founders saw an opportunity to use more than a year’s worth of data they collected to move into the dental insurance market. The idea is to figure out which of their users are regularly flossing and brushing their teeth, and therefore less likely to run into expensive problems like root canals and cavities, and offer them cheaper rates and other incentives.

That insurance product, which is now available in 16 states, earned the company a fresh round of $22.5 million in financing led by Kleiner Perkins.

Walmart’s next healthcare move: Using data to identify bad doctors

TechRepublic, Veronica Combs


Data analysis is at the heart of Walmart’s new approach to healthcare, which will only include doctors who practice evidence-based medicine.


IEEE VIS 2018

Berlin, Germany; October 21-26. “The worldwide, largest and most important conference on Scientific Visualization, Information Visualization and Visual Analytics.” [$$$]


2018 Conference on Cognitive Computational Neuroscience

Philadelphia, PA; September 5-8. “CCN is a forum for discussion among cognitive science, neuroscience, and artificial intelligence researchers dedicated to understanding the computations that underlie complex behavior.” Deadline for paper submissions is May 25.

Voice Assistants for All Challenge and Hackathon

“The challenge is to use a home-based voice controlled intelligent personal assistant driven by artificial intelligence (AI) to develop a health advocate for all populations.” Deadline is June 12. Grand prize is $10,000.

New course models the future of humans and the planet

“Supported by the James S. McDonnell Foundation, this interdisciplinary, collaborative introduction to modeling and evaluating the future of human populations and their environments will bring leading scientists together with graduate students and other early career researchers over four days this October 13-16.” Deadline for applications is July 11.

Tools & Resources

A collection of public transport network data sets for 25 cities

Nature, Scientific Data; Rainer Kujala, Christoffer Weckström, Richard K. Darst, Miloš N Mladenović & Jari Saramäki


“To lower the barrier for using GTFS data in research, we publish a curated collection of 25 cities’ public transport networks in multiple easy-to-use formats including network edge lists, temporal network event lists, SQLite databases, GeoJSON files, and the GTFS data format. This collection promotes the study of how PT is organized across the globe, and also provides a testbed for developing tools for PT network analysis and PT routing algorithms.”
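As a starting point for the network edge lists, the sketch below loads a delimited edge list into an adjacency map and runs a breadth-first search to find every stop reachable from a given stop. The semicolon delimiter and the `from_stop_I`/`to_stop_I` column names are assumptions about the file layout, exposed as parameters so they can be adjusted after checking the dataset’s own documentation. A static edge list also ignores timetables; time-respecting routes would need the temporal event lists instead.

```python
import csv
import io
from collections import defaultdict, deque


def load_edges(text, delimiter=";", src="from_stop_I", dst="to_stop_I"):
    """Build a directed adjacency map {stop: {next_stop, ...}} from CSV text.

    The default delimiter and column names are assumptions, not a spec.
    """
    adj = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        adj[row[src]].add(row[dst])
    return adj


def reachable(adj, start):
    """Breadth-first search: every stop reachable from start, ignoring timetables."""
    seen = {start}
    queue = deque([start])
    while queue:
        stop = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict.
        for nxt in adj.get(stop, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical three-edge sample in the assumed layout.
sample = "from_stop_I;to_stop_I;d\n1;2;300\n2;3;450\n4;1;200\n"
print(sorted(reachable(load_edges(sample), "4")))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` before calling `load_edges`.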

Coordinate systems in ggplot2: easily overlooked and rather underrated

STATWORX Blog, Lea Waniek


“All plots have coordinate systems. Perhaps because they are such an integral element of plots, they are easily overlooked. However, in ggplot2, there are several very useful options to customize the coordinate systems of plots, which we will not overlook but explore in this blog post.”

Software Engineering Takeaways

John Regehr, Embedded in Academia blog


I had a great time this spring teaching a software engineering course for a new professional masters degree program created by my department. Since I didn’t use slides or hand out lecture notes, some students were asking if maybe I could write up a summary of what I wanted them to learn in the course. This sounded like a good idea and I figured I’d make a blog post out of it.

This course didn’t have a textbook, but we ended up going through much of Writing Solid Code and also several chapters of Code Complete. I particularly like Appendix D of the 2nd edition of Writing Solid Code, where Maguire describes how he conducts an interview for a programming position. The students also completed about a dozen (mostly) programming assignments, individually and in groups, that I’m not going to talk about here.


Full-time positions outside academia

Research Scientist, Control Systems

United Technologies Research Center; Cork, Ireland
