Data Science newsletter – May 18, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for May 18, 2019

Data Science News



What’s the Price for Eating Healthy?

Westat


How healthy and thrifty are Americans’ food purchases? Until recently, that was a tough question to answer, but no longer. With the release of the Purchase-to-Plate Crosswalk (PPC) tool, researchers can now link the foods we buy to their nutritional content.

The PPC provides the previously missing link between barcoded retail data and nutritional databases maintained by the U.S. Department of Agriculture (USDA). With this link, researchers can evaluate what we buy and how it measures up to established dietary guidelines, providing a significant resource for food economics research and policy evidence, according to the USDA.
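A crosswalk of this kind is, at its core, a join between two tables. The sketch below is a minimal illustration under stated assumptions: every UPC, food code, and calorie value is invented, and `link_purchases` is a hypothetical helper, not part of the PPC. Purchases with no crosswalk entry are reported rather than silently dropped, since an unmatched item cannot be evaluated against dietary guidelines.

```python
# Hypothetical sketch of a purchase-to-plate style crosswalk.
# All UPCs, food codes, and nutrient values are made up for illustration;
# they are not taken from the PPC or any USDA database.

# Barcoded retail purchases: (UPC, units bought).
purchases = [("000111", 2), ("000222", 1), ("000333", 3)]

# Crosswalk: UPC -> nutrition-database food code.
crosswalk = {"000111": "F100", "000222": "F200"}

# Nutrition table: food code -> kcal per unit.
nutrients = {"F100": 150.0, "F200": 90.0}


def link_purchases(purchases, crosswalk, nutrients):
    """Return (total kcal for linked items, list of UPCs that could not be linked)."""
    total_kcal = 0.0
    unmatched = []
    for upc, qty in purchases:
        food_code = crosswalk.get(upc)
        if food_code is None or food_code not in nutrients:
            unmatched.append(upc)  # no link, so this item cannot be evaluated
            continue
        total_kcal += qty * nutrients[food_code]
    return total_kcal, unmatched


total, missing = link_purchases(purchases, crosswalk, nutrients)
print(total, missing)  # 390.0 ['000333']
```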


As fintechs make inroads with customers, legacy banks push back

MIT Sloan School of Management, Kent Wosepka


Nicolas Kopp, the U.S. CEO of N26, a German online bank that began establishing a beachhead in the United States just this year, said it’s most important to “keep financial services as simple as possible for your customers. The long-term vision is not just to replicate a traditional incumbent bank today,” it’s to make better products that are easier to use and understand.

Kopp described how N26 uses machine learning and data mining to “provide certain products only to a select group of people. We can filter out in the background people for whom maybe a savings product is not relevant because they don’t have any money to save.”

Fintechs are likewise leveraging technology to offer customers additional products that fit their needs in the moment. Dan Westgarth, the U.S. CEO of the digital bank Revolut, which specializes in global fee-free spending, said, “If a customer enters an airport geolocation, we might offer them travel insurance. Similarly, if they break their phone and change SIM cards, we can push them mobile [phone] insurance.”
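The two quoted examples amount to simple trigger rules plus an eligibility filter. As a minimal sketch, assuming hypothetical event names and a fixed balance threshold, it could look like the following. N26 describes using machine learning for the filtering step, so the threshold here is a deliberately simplified stand-in, and nothing below reflects either bank’s actual system.

```python
# Hypothetical sketch of event-triggered offers plus an eligibility filter.
# Event kinds, offers, and the balance threshold are illustrative assumptions;
# a fixed threshold stands in for the machine-learning filter N26 describes.
from dataclasses import dataclass


@dataclass
class Event:
    customer_id: str
    kind: str  # e.g. "airport_geolocation" or "sim_change"


# Trigger rules modeled on the examples quoted above.
OFFER_RULES = {
    "airport_geolocation": "travel insurance",
    "sim_change": "mobile phone insurance",
}


def offers_for(events, balances, min_savings_balance=100.0):
    """Return (customer_id, offer) pairs for triggered and eligible offers."""
    offers = []
    # Event-triggered offers: unknown event kinds produce nothing.
    for event in events:
        offer = OFFER_RULES.get(event.kind)
        if offer is not None:
            offers.append((event.customer_id, offer))
    # Eligibility filter: only suggest savings to customers with money to save.
    for customer_id, balance in balances.items():
        if balance >= min_savings_balance:
            offers.append((customer_id, "savings product"))
    return offers


events = [Event("a", "airport_geolocation"), Event("b", "sim_change"), Event("c", "login")]
balances = {"a": 500.0, "b": 0.0}
print(offers_for(events, balances))
# [('a', 'travel insurance'), ('b', 'mobile phone insurance'), ('a', 'savings product')]
```

Keeping the triggers in a lookup table means a new offer is a one-line change rather than new branching logic.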


Digital Strategy for the Library of Congress

Library of Congress


The Library of Congress’s mission is to engage, inspire, and inform the Congress and the American people with a universal and enduring source of knowledge and creativity. To accomplish that mission, the Library is adopting a digital-forward strategy that harnesses technology to bridge geographical divides, expand our reach, and enhance our services. This document describes how we will secure the Library’s position in an increasingly digital world as we realize our vision that all Americans are connected to the Library of Congress.


Don Norman on how design fails older consumers

Fast Company, Don Norman


More people than ever are living long, healthy lives. According to the U.S. Department of Health and Human Services, the average life expectancy is 78.6 years for men and 81.1 for women. More relevant, however, is that as people grow older, their total life expectancy increases. So for those who are now 65, the average life expectancy is 83 for men and over 85 for women. And because I’m 83, I’m expected to live past 90 (but I’m aiming a lot higher than that). And these are averages, which means that perhaps half of us will live even longer.

Those of us who are still active and healthy at advanced ages–I qualify–discover that we aren’t quite as capable as our younger selves. That doesn’t mean that we aren’t healthy and workable–I still have a very active job and travel on business around the world, but I have to admit that I’m getting slower and weaker, with diminished eyesight, hearing, taste, touch, and, well, almost everything physical. The number of active, healthy oldsters is large–and increasing. We are not a niche market. And businesses should take note: We are good customers, often with more free time and discretionary income than younger people.


Behind Twitter’s Plan To Get People To Stop Yelling At Each Other

BuzzFeed News, Nicole Nguyen


Twitter’s leadership knows that it’s fundamentally broken, and its latest attempt to fix itself shows how daunting a task that will be.


Smart Parking and Sensors in the Age of IoT

Semi, Becky Oh


Unplanned or inadequately managed urban expansion leads to rapid sprawl, pollution and environmental degradation. Due to the lack of parking-space availability, for example, Japan is ranked among the most expensive countries for paid parking. If left unaddressed, poor parking management will continue to plague cities, both large and small. Fortunately, Smart City Internet of Things (IoT) initiatives are helping cities to address their parking issues.


Peeking behind the curtain of science legislation in the U.S.

Massive Science, Madison Hansen


The U.S. House of Representatives has a committee on Science, Space, and Technology. The committee discusses issues relating to science research and brings its findings back to the rest of the House. It is currently chaired by Representative Eddie Bernice Johnson (D-TX), the first woman and African-American to chair the committee. You can read a great interview with Rep. Johnson on her vision for Congress’ role in setting science policy here.

The committee’s job is to connect the representatives of the American people to science research agencies, boosting the benefit of science research to the country. You can watch video of the hearings on the committee’s website. Or, if you are in the Washington, D.C., area, you can also attend the hearings in person.


Culture club: Why Danone is opening access to thousands of its yogurt strains

Food Dive, Cathy Siegner


Danone will open its collection of 1,800 yogurt strains for research to mark the 100th anniversary of the development of its first yogurt. The move underscores the company’s commitment to promoting open science and transparency in research, Danone said in a release.


Humanwide pilot project brings Precision Health vision to life

Stanford Medicine, Scope Blog


The idea behind Humanwide is simple: clinicians partner with patients to pull together their individual data, from lifestyle to DNA, and create a comprehensive picture of their health. The care team then helps the patient manage current health conditions and address future risks through a plan aligned with his or her personal goals.


The global internet is disintegrating. What comes next?

BBC – Future, Sally Adee


Nearly a decade on, that borderless spirit seems like a quaint memory. The nations who left the UN empty-handed had not been disabused of the notion that you could put a wall around your corner of cyberspace. They’ve simply spent the past decade pursuing better ways to make it happen.

Indeed, Russia is already exploring a novel approach to creating a digital border wall, and last month it passed two bills that mandate technological and legal steps to isolate the Russian internet. It is one of a growing number of countries that have had enough of the Western-built, Western-controlled internet backbone. And while Russia’s efforts are hardly the first attempt to control exactly what information can and can’t enter a country, its approach is a fundamental departure from past efforts.

“This is different,” says Robert Morgus, a senior cybersecurity analyst at the New America Foundation. “Russia’s ambitions are to go further than anyone with the possible exceptions of North Korea and Iran in fracturing the global internet.”


The ‘3.5% rule’: how a small minority can change the world

BBC – Future, David Robson


There are, of course, many ethical reasons to use nonviolent strategies. But compelling research by Erica Chenoweth, a political scientist at Harvard University, confirms that civil disobedience is not only the moral choice; it is also the most powerful way of shaping world politics – by a long way.

Looking at hundreds of campaigns over the last century, Chenoweth found that nonviolent campaigns are twice as likely to achieve their goals as violent campaigns. And although the exact dynamics will depend on many factors, she has shown it takes around 3.5% of the population actively participating in the protests to ensure serious political change.
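The threshold itself is simple arithmetic. A minimal sketch, assuming a hypothetical population figure, computes the head count that reaches 3.5%. Keep in mind the figure is an empirical pattern in Chenoweth’s data, not a guarantee; integer arithmetic is used so the ceiling is not thrown off by floating-point rounding.

```python
def participants_needed(population: int, share_per_mille: int = 35) -> int:
    """Smallest head count that is at least share_per_mille / 1000 of population.

    The default of 35 per mille is the 3.5% figure; integer ceiling division
    avoids floating-point rounding at the boundary.
    """
    return -(-population * share_per_mille // 1000)


# Hypothetical population of one million people.
print(participants_needed(1_000_000))  # 35000
```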

Chenoweth’s influence can be seen in the recent Extinction Rebellion protests, whose founders say they have been directly inspired by her findings. So just how did she come to these conclusions?


Alex Stamos on the security problems of the platforms’ content moderation, and what to do about them

Boing Boing, Cory Doctorow


After departing Facebook, Stamos started a new phase of his career as an academic in Stanford’s information warfare group, and in that capacity he recently gave a talk called “The Platform Challenge: Balancing Safety, Privacy and Freedom” at the UC Berkeley School of Information’s Dataedge 2019 conference.

The talk is an absolute must-watch.


Academic Data Science Alliance

Micaela Parker


The Academic Data Science Alliance (ADSA) is a newly launched initiative to advance the uptake of data science best practices in higher education. ADSA seeks to support efforts by and for university researchers to learn, use, and teach data-intensive tools and responsible applications, including the institutional changes needed to integrate data science into university research and training. By building networks of academic data science practitioners (including faculty, students, staff, and administrators), ADSA enables better sharing of knowledge, ideas, and lessons learned. ADSA grew out of the Moore-Sloan Data Science Environments and borrows heavily from the culture and values of this partnership. The ADSA initiative is funded for the next 3+ years, but we expect to secure additional funding to extend the work of the Alliance beyond the initial start-up phase.


Apple, Google scrap for scarce AI expertise through acquisitions

Axios, Kaveh Waddell


Big Tech has snapped up more than 50 AI companies since 2010, carving out another front in the nonstop war among the giants for AI talent, data and ideas.

The big picture: The clamor reflects a scarcity of AI expertise, as we’ve reported in the past. But it also allows Big Tech companies to reinforce their advantage over the upstarts, each time making it harder for a new entrant to strike gold.

What’s happening: Several of the top AI researchers and most lucrative products at leading tech firms came from acquisitions, according to data compiled by CB Insights.

 
Events



Data & Ethics: In the Public Sector for the Public Good?

NYU Institute of Human Development and Social Change


New York, NY May 29, starting at 12 p.m., NYU Hemmerdinger Hall (100 Washington Square East). “This research-policy-practice conference will explore these critical ethical questions and data challenges as they relate to education, criminal justice, and interdisciplinary research.” [registration required]


Surge – Technology for Industry conference

Mid-Atlantic Venture Association


College Park, MD October 2 at the University of Maryland. “SURGE will feature an exclusive, invitation-only session focused on key industry trends and areas of mutual interest to leaders of industry, investment professionals, academia and high-growth companies.” [save the date]


Physics in the Dark: Searching for the Universe’s Missing Matter

World Science Festival


New York, NY May 29, starting at 8 p.m., part of the World Science Festival. “Join leading physicists on a scientific treasure hunt that has proved more challenging than anyone expected, and may ultimately require rethinking some of our most fundamental ideas about the universe.”

 
Deadlines



Why R? 2019

Warsaw, Poland September 26-29. Deadline for workshop submissions is May 31. Deadline for abstract submissions is June 28.
 
Tools & Resources



Census Academy – Your Virtual Hub for Learning Data Skills

U.S. Census Bureau


“The Census Academy is our free training hub with a focus on digital content. Users of all skill levels can learn how to access and use Census Bureau data through courses, how-to videos, and webinars.”


The Turing Way

Alan Turing Institute, Kirstie Whitaker


“The Turing Way is a lightly opinionated guide to reproducible data science. Our goal is to provide all the information that researchers need at the start of their projects to ensure that they are easy to reproduce at the end.”


Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model

Google AI Blog, Ye Jia and Ron Weiss


In “Direct speech-to-speech translation with a sequence-to-sequence model”, we propose an experimental new system that is based on a single attentive sequence-to-sequence model for direct speech-to-speech translation without relying on intermediate text representation. Dubbed Translatotron, this system avoids dividing the task into separate stages, providing a few advantages over cascaded systems, including faster inference speed, naturally avoiding compounding errors between recognition and translation, making it straightforward to retain the voice of the original speaker after translation, and better handling of words that do not need to be translated (e.g., names and proper nouns).


Data Curation Network Primers

University of Minnesota Libraries, Digital Conservancy


“Archived primers from the 2018-2020 Specialized Data Curation Workshops presented by the Data Curation Network and funded by a grant from the Institute of Museum and Library Services (IMLS RE-85-18-0040-18). Data curation primers are interactive, living documents that detail a specific subject, disciplinary area or curation task and that can be used as a reference to curate research data.”


Small Files, Big Foils: Addressing the Associated Metadata and Application Challenges

Cloudera Engineering Blog, Shashank Naik and Bhagya Gummalla


Small files are a common challenge in the Apache Hadoop world and, when not handled with care, can lead to a number of complications. The Apache Hadoop Distributed File System (HDFS) was developed to store and process large data sets over the range of terabytes and petabytes. However, HDFS stores small files inefficiently, leading to poor NameNode memory utilization, excess RPC calls, block scanning throughput degradation, and reduced application layer performance. In this blog post, we will define the issue of small file storage and examine ways to tackle it while keeping the complications at bay.
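To see why many small files strain the NameNode, a back-of-the-envelope estimate helps. This sketch assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per file or block object and a 128 MiB block size (a common `dfs.blocksize` default); directories are ignored, and `namenode_objects` is a hypothetical helper, not a Hadoop API.

```python
# Back-of-the-envelope NameNode heap estimate for HDFS small files.
# ASSUMPTIONS: ~150 bytes of heap per namespace object (a rule of thumb,
# not an exact figure) and a 128 MiB block size; directories are ignored.
# namenode_objects is a hypothetical helper, not a Hadoop API.

BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB


def namenode_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Count file objects plus block objects for num_files files of file_size bytes."""
    blocks_per_file = -(-file_size // block_size)  # ceiling division; 0 for empty files
    return num_files * (1 + blocks_per_file)


# The same 1 GiB of data, stored two ways.
small = namenode_objects(1024 * 1024, 1024)     # 1,048,576 files of 1 KiB
large = namenode_objects(8, 128 * 1024 * 1024)  # 8 files of 128 MiB

print(small * BYTES_PER_OBJECT)  # 314572800 (about 300 MiB)
print(large * BYTES_PER_OBJECT)  # 2400
```

Under these assumptions, the same gibibyte of data costs about 300 MiB of NameNode heap as a million 1 KiB files but only about 2 KB as eight full-block files, which is why combining small files into larger ones helps.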

 
Careers


Full-time positions outside academia

Sr. Data Analyst



Joymode; Los Angeles, CA

Behavioral Scientist



Ancestry; San Francisco, CA

Full-time, non-tenured academic positions

Research Associate in Reinforcement Learning



University of Oxford, Department of Computer Science; Oxford, England
