Data Science newsletter – April 29, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for April 29, 2019


 
 
Data Science News



How Technology Could Revolutionize Refugee Resettlement

The Atlantic, Krishnadev Calamur



The rise of populism has brought with it a concomitant hostility toward immigrants in general and refugees in particular. Last October, a gunman who had previously posted anti-Semitic messages online against HIAS killed 11 worshippers in a Pittsburgh synagogue. Many of the policy arguments over resettlement have shifted focus from humanitarian relief to security threats and cost. The Trump administration has drastically cut the number of refugees the United States accepts, and large parts of Europe are following suit.

If it works, Annie could change that dynamic. Developed at Worcester Polytechnic Institute in Massachusetts, Lund University in Sweden, and the University of Oxford in Britain, the software uses what’s known as a matching algorithm to allocate refugees with no ties to the United States to their new homes. (Refugees with ties to the United States are resettled in places where they have family or community support; software isn’t involved in the process.)

Annie’s algorithm is based on a machine learning model in which a computer is fed huge piles of data from past placements, so that the program can refine its future recommendations. The system examines a series of variables—physical ailments, age, levels of education and languages spoken, for example—related to each refugee case. In other words, the software uses previous outcomes and current constraints to recommend where a refugee is most likely to succeed.
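To make the idea of combining predicted outcomes with placement constraints concrete, here is a minimal sketch of capacity-constrained matching. It is not Annie's actual algorithm, which the excerpt does not describe in detail. The case names, site names, predicted probabilities, and capacities are all invented for illustration, and the brute-force search is only practical for tiny inputs.

```python
from itertools import permutations

# Hypothetical model output: predicted probability that each refugee case
# succeeds (e.g. finds employment) at each site. All numbers are invented.
predicted = {
    "case_a": {"site_x": 0.60, "site_y": 0.30},
    "case_b": {"site_x": 0.50, "site_y": 0.45},
    "case_c": {"site_x": 0.20, "site_y": 0.40},
}
# Hypothetical constraint: how many cases each site can accept.
capacity = {"site_x": 1, "site_y": 2}


def best_assignment(predicted, capacity):
    """Exhaustively search for the placement with the highest total predicted success."""
    cases = sorted(predicted)
    # Expand each site into one slot per unit of capacity.
    slots = [site for site in sorted(capacity) for _ in range(capacity[site])]
    best_score, best_plan = float("-inf"), None
    # Each distinct ordering of slots is one feasible placement of all cases.
    for ordering in set(permutations(slots, len(cases))):
        score = sum(predicted[c][s] for c, s in zip(cases, ordering))
        if score > best_score:
            best_score, best_plan = score, dict(zip(cases, ordering))
    return best_plan, best_score


plan, score = best_assignment(predicted, capacity)
print(plan, round(score, 2))
```

With these invented numbers, case_a and case_b would both individually prefer site_x, but the capacity limit forces a trade-off, so the search places case_a at site_x and the other two cases at site_y.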


Academic air travel has a limited influence on professional success

Journal of Cleaner Production; Seth Wynes, Simon D. Donner, Steuart Tannason, Noni Nabors



Lowering the growth in greenhouse gas emissions from air travel may be critical for avoiding dangerous levels of climate change, and yet some individuals perceive frequent air travel to be critical to their professional success. Using a sample of 705 travellers at the University of British Columbia, we investigated the influence of career stage, research productivity, field of expertise, and other variables on academic air travel and the associated emissions. This is the first time that research has evaluated the link between observed air travel and academic success. First, we compared air travel behaviour at different career stages and found that individuals at the start of their careers were responsible for fewer emissions from air travel than senior academics. Second, since career advancement may depend on an academic’s ability to form partnerships and disseminate their research abroad, we investigated the relationship between air travel emissions and publicly available bibliometric measurements. We found no relationship between air travel emissions and metrics of academic productivity including hIa (h-index adjusted for academic age and discipline). There was, however, a relationship between emissions and salary that remains significant even when controlling for seniority. Finally, based on the premise that academics studying topics related to sustainability may have greater responsibility or motivation to reduce their emissions, we coded 165 researchers in our sample as either “Green” or “Not-green.” We found no significant difference between Green and Not-green academics in total air travel emissions, or in the types of emissions that might be easiest to avoid. Taken together, this preliminary evidence suggests that there may be opportunities, especially for academics who study topics related to climate and sustainability, to reduce their emissions from air travel while maintaining productive careers. [abstract]


How Twitter Users Compare to the General Public

Pew Research Center, Stefan Wojcik and Adam Hughes



U.S. adult Twitter users are younger and more likely to be Democrats than the general public. Most users rarely tweet, but the most prolific 10% create 80% of tweets from adult U.S. users.
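A concentration figure like the "10% create 80%" statistic can be computed from per-user tweet counts. The sketch below is an illustration only: the function name and the sample counts are invented, and it does not use Pew's data or methodology.

```python
def top_share(tweet_counts, fraction=0.10):
    """Return the share of all tweets produced by the most prolific `fraction` of users."""
    ranked = sorted(tweet_counts, reverse=True)
    # Always include at least one user in the top group.
    top_n = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Invented sample: 10 users, one of whom tweets far more than the rest.
counts = [400, 20, 15, 15, 12, 10, 10, 8, 6, 4]
share = top_share(counts)
print(f"Top 10% of users produce {share:.0%} of tweets")
```

In this invented sample the single most active user accounts for 400 of 500 tweets, which reproduces the 80% share.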


IBM expands universities in its quantum computing research network

ZDNet, Between the Lines blog, Larry Dignan



While these research areas are fluid and at an early stage, IBM is laying the groundwork for quantum advances and ensuring there are people able to work in the field and ultimately expand it.


The impact @mer__edith has in AI ethics is second to none.

Twitter, Moritz Hardt



What happens to her at Google will be a gauge for the wellbeing of the entire field. Watch closely.


DataCamp Teachers Boycott Their Own Classes Following Sexual Misconduct by Executive

VICE, Motherboard, Arielle Gordon



Instructors say that the online data science learning platform has failed to meaningfully address an incident that happened at a company gathering in 2017.


[D] “Everyone building machine learning products has a responsibility to understand that many users have misconceptions about the accuracy and ‘objectivity’ of ML”

reddit.com/r/MachineLearning



Rachel Thomas on Twitter has identified cases of misuse and abuse of ML. The linked thread from her tweet includes examples worth sharing and being aware of. [120 comments as of April 29]


Vue.ai raises $17M in Series B

Vue.ai blog, Ashwini Asokan



It’s a big milestone here at the Vue.ai headquarters. We’ve raised $17M in a series B led by Falcon Edge Capital, with participation from existing investors Sequoia and Global Brains (KDDI Japan).

At Vue.ai, our mission is to leverage the power of Image Recognition and Data Science to help retailers grow rapidly and become 10x more efficient across functions.


Internet of Things: Energy boon or bane?

Science, Perspective, Eric Hittinger and Paulina Jaramillo



Since the dawn of the internet, a digital revolution has transformed life for millions of people. Digital files have replaced paper, email has replaced letters, and cell phones provide access to many services that facilitate daily life. This digital revolution is not over, and there is now a growing deployment of technologies grouped under the term “Internet of Things” (IoT)—a worldwide network of interconnected objects that are uniquely addressable via standard communication protocols (1). By 2020, there may be as many as 30 billion objects connected to the internet (2), all of which require energy. These devices may yield direct energy savings (3, 4), but it is much less clear what their net effect on the broader energy system will be. Scientists and regulators will need to work together to ensure that the IoT’s benefits do not come at the expense of rising energy use.


Are the dead taking over Facebook? A Big Data approach to the future of death online

Big Data & Society journal; Carl J Öhman, David Watson



We project the future accumulation of profiles belonging to deceased Facebook users. Our analysis suggests that a minimum of 1.4 billion users will pass away before 2100 if Facebook ceases to attract new users as of 2018. If the network continues expanding at current rates, however, this number will exceed 4.9 billion. In both cases, a majority of the profiles will belong to non-Western users. In discussing our findings, we draw on the emerging scholarship on digital preservation and stress the challenges arising from curating the profiles of the deceased. We argue that an exclusively commercial approach to data preservation poses important ethical and political risks that demand urgent consideration. We call for a scalable, sustainable, and dignified curation model that incorporates the interests of multiple stakeholders. [full text]


the distribution of student loan debt

scatterplot blog, Dan Hirschman



Sometimes it’s hard to find basic descriptive statistics. Even when data exist that could provide the relevant descriptives, it still takes work to analyze them. Social scientists don’t have much incentive to produce high quality descriptive statistics, except as incidental by-products of our typical, more complicated statistical analyses (usually aimed at teasing out causation or explanations). And yet, sometimes, all you (or, in this case, a friend) want to know is something seemingly straightforward, like: What’s the relationship between income and having student loan debt? How much do people owe in student loans given their incomes?

I took this question to Twitter and got some helpful responses in terms of pointers to complicated data sets where the answer could be found, but no existing figures. Raph Charron-Chénier very graciously offered to produce a pair of graphs based on analysis of the Survey of Consumer Finances. The first looks at the percentage of individuals with any student loans by income quintile, and separately by age.


Computational models for the evolution of world cuisines

arXiv, Physics > Physics and Society; Rudraksh Tuwani, Nutan Sahoo, Navjot Singh, Ganesh Bagler



Cooking is a unique endeavor that forms the core of our cultural identity. Culinary systems across the world have evolved over time against the backdrop of a complex interplay of diverse sociocultural factors, including geographic, climatic and genetic influences. Data-driven investigations can offer interesting insights into the structural and organizational principles of cuisines. Herein, we use a comprehensive repertoire of 158544 recipes from 25 geo-cultural regions across the world to investigate the statistical patterns in the usage of ingredients and their categories. Further, we develop computational models for the evolution of cuisines. Our analysis reveals copy-mutation as a plausible mechanism of culinary evolution. As the world copes with the challenges of diet-linked disorders, knowledge of the key determinants of culinary evolution can drive the creation of novel recipe generation algorithms aimed at dietary interventions for better nutrition and health.
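The copy-mutation idea can be illustrated with a toy simulation: start from a few seed recipes, then repeatedly copy an existing recipe and swap one of its ingredients for another. This is a generic sketch, not the authors' model. The function name, parameters, and ingredient list are invented, and the abstract does not specify how the paper selects recipes or mutations.

```python
import random


def evolve_cuisine(ingredients, n_recipes, recipe_size, n_seed=3, mutations=1, seed=0):
    """Grow a recipe pool by repeatedly copying a recipe and mutating the copy."""
    if recipe_size >= len(ingredients):
        raise ValueError("need more ingredients than the recipe size to mutate")
    rng = random.Random(seed)
    # Seed the pool with a few random recipes.
    pool = [set(rng.sample(ingredients, recipe_size)) for _ in range(n_seed)]
    while len(pool) < n_recipes:
        child = set(rng.choice(pool))  # copy step: duplicate an existing recipe
        for _ in range(mutations):
            # mutation step: replace one ingredient with one not yet in the recipe
            removed = rng.choice(sorted(child))
            candidates = [i for i in ingredients if i not in child]
            child.remove(removed)
            child.add(rng.choice(candidates))
        pool.append(child)
    return pool


# Invented ingredient list for illustration.
ingredients = ["garlic", "onion", "tomato", "cumin", "rice", "butter", "chili", "ginger"]
pool = evolve_cuisine(ingredients, n_recipes=20, recipe_size=4)
print(len(pool), "recipes, e.g.", sorted(pool[-1]))
```

Because each new recipe descends from an existing one, ingredients that are common early on tend to stay common, which is the kind of usage pattern such models are compared against.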


Who Owns Huawei? The Company Tried to Explain. It Got Complicated.

The New York Times, Raymond Zhong



For one of the world’s largest technology companies, it should be a simple question: Who owns Huawei?

As the Chinese smartphone and telecommunications equipment giant battles the United States government over whether it should be allowed to build the world’s mobile networks, the company has been going to great lengths to present itself as open, transparent and trustworthy.

It has not always worked out. One reason is that certain simple questions about Huawei do not have simple answers.


Continuity and Change in Internet Law

Communications of the ACM, James Grimmelmann



This is my first column as editor for Communications’ Law and Technology column. I am taking over from the very capable Stefan Bechtold, who established the column in its current form and imbued it with his high standards of rigor, relevance, and readability. I thought I might mark this transition with some historical reflections on how the field of Internet law has changed over the last few decades, and what has stayed the same.

Start with the continuity. The basic issues around intellectual property rights in software have been the same for a very long time. In 2014, the U.S. Supreme Court expressed serious skepticism about patents to “do X on a computer” and a federal appeals court allowed Oracle to assert copyright in the Java APIs. Neither issue is new. The Supreme Court was just as skeptical about software patents in 1972 and 1978, and a different federal appeals court held in 1995 that Lotus 1-2-3’s macro interface was uncopyrightable.


U.S. universities reassess collaborations with foreign scientists in wake of NIH letters

Science, Jeffrey Mervis



Adam Kuspa tries to anticipate queries from his institution’s largest source of research funding, the U.S. National Institutes of Health (NIH) in Bethesda, Maryland. “We like to tell NIH things before they ask us,” says Kuspa, senior vice president and dean of research at Baylor College of Medicine (BCM) in Houston, Texas.

In August 2018, NIH Director Francis Collins asked BCM and thousands of other institutions to be more vigilant in defending the U.S. research enterprise against efforts by unscrupulous foreign governments to steal ideas and technology. Kuspa had just attended a classified Federal Bureau of Investigation briefing on the topic for Houston-area academic leaders and figured the issue was heating up. So he ordered up an audit of the foreign affiliations of every BCM faculty member with current NIH funding. The review, which won’t be finished until the end of the year, has meant poking into the professional lives of roughly 500 of the college’s 3500 scientists.

But Kuspa’s attempt to stay ahead of NIH came to naught. A few months into the audit, BCM received letters from NIH asking about four scientists it believed had violated the agency’s rule requiring them to disclose all foreign ties relating to their research.

 
Events



Cybersecurity Research Transition to Practice Workshop Slated for June 19

Internet2



Chicago, IL June 19. “The event is sponsored by the NSF Cybersecurity Center of Excellence at Indiana University under NSF Grant ACI-1547272, Indiana University Research and Technology Corporation (IURTC), and hosted by Microsoft.”

 
Deadlines



NISS 2019 Jerome Sacks Award: Call for Nominations

“Nominations are being sought for the 2019 National Institute of Statistical Sciences’ (NISS) Jerome Sacks Award for Outstanding Cross-Disciplinary Research. The prize recognizes sustained, high-quality, cross-disciplinary research involving the statistical sciences.” Deadline for nominations is April 30.

Workshop on Multi-Task and Lifelong Reinforcement Learning

Long Beach, CA June 15, at ICML. Deadline for submissions is May 3.

Mini-grant Funding Opportunity in Digital Agriculture

“The University of Nebraska-Lincoln and Iowa State University are continuing a mini-grant funding opportunity to address data challenges associated with digital agriculture.” Deadline for submissions is May 10.

Health Informatics Career Paths Project survey

“Thank you for showing an interest in the research we are doing on health informatics career pathways. Please feel free to explore this website for further information about the project. Please note we have used the term ‘health informatics’ throughout but we appreciate different terminology may be in use.”

Mozilla Research Grants 2019H1

“a program to help us keep the Internet safe, open, and accessible to all, as it evolves. Where appropriate, we are particularly looking for proposals that support our aim to grow an internet that truly puts people first, where individuals can shape their own experience and are empowered, safe and independent.” Deadline for submissions is May 31.

Deep Math 2019 – Conference on Mathematical Theory of Deep Neural Networks

New York, NY October 31-November 1. Deadline for submissions is June 28.

VAST Challenge 2019: Disaster at St. Himark!

“The VAST Challenge 2019 presents three mini-challenges and a grand challenge for you to apply your visual analytics research and technologies to help a city grapple with the aftermath of an earthquake that damages their nuclear power plant.” Deadline for submissions is July 12.

The Scientist Magazine: Top 10 Innovations Competition is Accepting Submissions

“From disruptive sequencing technology to transformational tweaks on tried-and-true methods, each year’s winners have represented the future of biomedical research. We want to hear from you: whether you are an individual tinkering away at the bench or a company with a dedicated R&D team. What have you brought to market in the past year that could benefit the life sciences? Submit your product or technique for consideration in this year’s competition.” Deadline for submissions is August 16.

Special Issue – Computational Political Communication: Theory, Applications, and Interdisciplinary Challenges

“The goal of this Special Issue (SI) is to foreground the theoretical, empirical, and institutional opportunities and challenges of CCS that are relevant to the political communication community. We focus on research that develops, synthesizes and applies data collection and analysis techniques relying primarily on computational methods and tools, with the objective to answer substantive theory-driven questions in the field of political communication.” Deadline for submissions is October 1.
 
Tools & Resources



MuseNet

OpenAI, Christine Payne



We’ve created MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. MuseNet was not explicitly programmed with our understanding of music, but instead discovered patterns of harmony, rhythm, and style by learning to predict the next token in hundreds of thousands of MIDI files. MuseNet uses the same general-purpose unsupervised technology as GPT-2, a large-scale transformer model trained to predict the next token in a sequence, whether audio or text.


Cheatsheets

RStudio



The cheat sheets below make it easy to learn about and use some of our favorite packages. From time to time, we will add new cheat sheets to the gallery. If you’d like us to drop you an email when we do, let us know by clicking the button to the right.

 
Careers


Full-time, non-tenured academic positions

Data Services Librarian



New York University School of Medicine, Medical Library; New York, NY

Director of Financial Strategy and Compliance



Mathematical Sciences Research Institute; Berkeley, CA

Full-time positions outside academia

Data Analyst



The FA; St. George’s Park, England

Business Intelligence Analyst



Bolton NHS Foundation Trust; Bolton, England

Postdocs

Postdoctoral Fellow in the Sociology of Food and Agriculture



University of Otago, Centre for Sustainability; Dunedin, New Zealand

Postdoctoral Appointment – Information Studies



Syracuse University, School of Information Studies; Syracuse, NY
