Data Science newsletter – February 7, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for February 7, 2019


 
 
Data Science News



Traffic updates: Saying a lot while revealing a little

Microsoft Research, John Krumm and Eric Horvitz



Our project, described in our paper, Traffic Updates: Saying a Lot While Revealing a Little, to be presented at the 33rd AAAI Conference on Artificial Intelligence in Honolulu, Hawaii later this month, is aimed at significantly reducing the number of speed reports while still maintaining an accurate estimate of how fast traffic is moving on all the roads. We also explore principles around the joint use of central and distributed predictive models and the opportunity to make inferences in the absence of communication.
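As a rough illustration of how inference without communication can cut reporting, here is a minimal sketch. It is not the method from the paper: the `SharedModel` class, the fixed tolerance, and the speed values are all hypothetical. It assumes the vehicle and the server hold the same predictor, so the vehicle reports only when its observed speed strays from the prediction, and the server reads silence as confirmation.

```python
from dataclasses import dataclass


@dataclass
class SharedModel:
    """Hypothetical predictor assumed to be known to both vehicle and server."""
    typical_speed_kmh: float

    def predict(self) -> float:
        return self.typical_speed_kmh


def vehicle_should_report(observed_kmh, model, tolerance_kmh):
    """The vehicle stays silent when the shared prediction is close enough."""
    return abs(observed_kmh - model.predict()) > tolerance_kmh


def server_estimate(report_kmh, model):
    """Silence is informative: no report means the prediction is within tolerance."""
    return model.predict() if report_kmh is None else report_kmh


def simulate(observations, model, tolerance_kmh):
    """Return the server's estimates and the number of reports actually sent."""
    estimates, reports_sent = [], 0
    for observed in observations:
        if vehicle_should_report(observed, model, tolerance_kmh):
            reports_sent += 1
            estimates.append(server_estimate(observed, model))
        else:
            estimates.append(server_estimate(None, model))
    return estimates, reports_sent


# Illustrative speeds only; one of four observations deviates enough to be sent.
estimates, reports_sent = simulate(
    [58.0, 61.0, 35.0, 60.0], SharedModel(60.0), tolerance_kmh=5.0
)
print(estimates, reports_sent)
```

In this toy setup the server's estimate is never off by more than the tolerance, even though most observations are never transmitted.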


U engineers will develop sensors that know when cornfields are getting attacked by insects or weeds

University of Utah, UNews



Plants have important things to say, and University of Utah electrical and computer engineering associate professor Hanseup Kim wants to listen to them.

He leads a team of U engineers who have received a $2.2-million grant from the U.S. Department of Energy to design and build small chemical sensors that can “sniff” out when a plant is being attacked by insects or overwhelmed by weeds. The research is to benefit the production of biomass from corn, which is used to create ethanol for gasoline.

“I want to communicate with plants, and I want to know what they are feeling or know what they are going through,” Kim says. “I believe that my research team can do this by building a communication loop with these sensors and the corn.”


People admit they trust news stories that contradict their views – for a price

Cornell University, Cornell Chronicle



When it comes to news, we believe what we want to believe – even though deep down we may know better.

Cornell Tech researchers and colleagues have found that people are far more likely to say that news stories are true if they align with their own political views regardless of the outlet. But when offered a cash bonus for correctly evaluating the stories’ accuracy, participants were more likely to say they believed the news stories that countered their views.

“There’s an issue of expressive responding, where people say what they want to be true rather than what they actually believe to be true,” said Mor Naaman.


How Algorithms Could Save the Planet

University of California-Berkeley, Cal Alumni Magazine



New algorithms can also help achieve complex conservation goals that involve multiple stakeholders, says [Carl] Boettiger, citing marine protected areas (MPAs) as an example. MPAs are coastal zones that are off-limits to fishing. The sanctuaries are designed to allow fish to reproduce undisturbed in the hope they will ultimately repopulate adjacent areas. Legislation authorizing MPAs was passed by the California Legislature in 1999, and numerous areas along the state’s coast, including the Channel Islands, have since been closed to fishing—but not without controversy. Determining MPA boundaries and the species targeted for protection are hotly disputed issues. Algorithms can help avoid some of the squabbling, says Boettiger.

“They allow you to utilize all available data and target broad objectives, balancing everything from economic values to ecological resilience, to different stakeholder needs,” says Boettiger, who is only half-joking when he suggests the Endangered Species Act might ultimately be replaced with an Endangered Species Algorithm Act.


Data 8 Thrives – On and Off Campus

Division of Data Sciences at UC Berkeley



More than 1,600 students signed up for Data 8: Foundations of Data Science this semester, making this innovative introductory course the largest on campus. The growing enrollment is just one indication of the continuing enthusiasm for Data 8, which launched as a pilot in 2015 with fewer than 100 students. Universities and colleges around the country, and even internationally, are creating their own versions of the course, and the second run of the popular online version, Data 8X, went live this week.


How Facebook Has Changed Computing

WIRED, Business, Klint Finley



“Facebook has been a driving force in open source for years, sharing many critical pieces of technology with the broader community,” says Jim Zemlin, executive director at the Linux Foundation. The foundation hosts an organization dedicated to GraphQL, a query language created by Facebook to handle communications between apps and servers that is now used by many other companies.

Because of Facebook’s openness, many of its technologies have become industry standards. “Facebook has played a tremendous role in shifting not only in the way we build our servers, but also how we write code for browsers and phones,” says Adam Neary, a tech lead at Airbnb. “The entire ecosystem is driven by technologies that Facebook pioneered and then open sourced.”


Reddit is raising a huge round near a $3 billion valuation

TechCrunch, Josh Constine



Reddit is raising $150 million to $300 million to keep the front page of the internet running, multiple sources tell TechCrunch. The forthcoming Series D round is said to be led by Chinese tech giant Tencent at a $2.7 billion pre-money valuation. Depending on how much follow-on cash Reddit drums up from Silicon Valley investors and beyond, its post-money valuation could reach an epic $3 billion.
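The valuation figures follow from the usual convention that post-money valuation equals pre-money valuation plus the new investment. A quick check of the reported range, using only the numbers quoted above (the function name is just for illustration):

```python
def post_money(pre_money_usd: float, investment_usd: float) -> float:
    """Post-money valuation is the pre-money valuation plus the cash raised."""
    return pre_money_usd + investment_usd


PRE_MONEY = 2.7e9  # reported pre-money valuation

low = post_money(PRE_MONEY, 150e6)   # smallest reported round size
high = post_money(PRE_MONEY, 300e6)  # largest reported round size

print(f"${low / 1e9:.2f}B to ${high / 1e9:.2f}B")  # prints "$2.85B to $3.00B"
```

Only the full $300 million round would bring the post-money figure to $3 billion.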


Imperial wins funding to train hundreds of PhD students at six new centres

Imperial College London, Imperial News



Imperial was successful in securing funding for three new Centres for Doctoral Training (CDTs) and three renewal grants for existing CDTs. The EPSRC will fund 40 PhD students per CDT over five years, with extra funding coming from industry partners and the College.

Professor Nick Jennings, Vice-Provost (Research and Enterprise) at Imperial, said: “I am very pleased that we will be leading six centres that will train around 300 PhD students in the coming years.

“We are also partnering in six more centres led by other institutions, highlighting our collaborative nature and wide participation across the scheme. We are also pleased to support the centres with £1m of bursary support from the College to support their important activities.”


Davidson College and Tresata Announce Research Partnership to Accelerate Analytics Curriculum and Innovation

tresata, Press Release



Davidson College and Tresata, analytics software pioneer, announced today the signing of an innovative research partnership that will enable collaborative research and enhance academic opportunities.

The partnership officially launches today, January 31, 2019, with the christening of the Tresata Analytics Factory’s anchor space in Davidson’s Jay Hurt Hub for Innovation & Entrepreneurship. The partnership, as well as the 6-month-old Hurt Hub, spotlights Davidson’s distinctive advances among liberal arts colleges into analytics and the entrepreneur economy.

“Davidson students already work with data every day. They use analytics to help guide team strategy for the Wildcats’ basketball teams or to find new ways of looking at social issues like crime and economic opportunity in Charlotte,” said Laurie Heyer, Faculty Director for the Hurt Hub @ Davidson. “The Tresata Analytics Factory will super-charge their ability to turn data into knowledge with imagination and empathy.”


How eavesdropping on elephants is keeping them safe

BBC, Machine Minds, Rachel Nuwer



“Our goal is to better understand and protect forest elephants, a keystone species roaming the second largest tropical rainforest on earth,” says Peter Wrege, a behavioral biologist at Cornell University who is part of a team attempting to decipher the elephants’ calls. “We are using technology to improve their chance of survival and, in doing so, to conserve the biodiversity of their forests.”

Wrege and his colleagues recently teamed up with a company called Conservation Metrics to leverage technology on behalf of elephant survival. The aim: to find the location of the elephants – and the poachers who seek to kill them – so the animals can be kept safe. Wrege and his colleagues have collected around 900,000 hours of recordings from central African forests, thousands of hours of which include elephant vocalisations. They have found, for example, that low frequency rumbles keep groups in contact with each other, while long, overlapping rumbles serve as greetings.


To prepare for the future battlefield, the Army has opened its AI Task Force at CMU

Pittsburgh Post-Gazette, Courtney Linder



To prepare the armed forces for the “future battlefield” of 2028 or 2035, the U.S. Army is setting up shop at Carnegie Mellon University.

On Friday, the Army officially activated its new Artificial Intelligence Task Force at the National Robotics Engineering Center in Lawrenceville before a crowd of politicians and researchers from nearly a dozen universities.

The task force will become a national network of experts in academia and private industry, building out solutions that the Army can use not only on the battlefield but also in rescue missions and in protecting civilians.

CMU is the home base, but the task force will eventually include other partners.


In UI’s ‘Next 150,’ ‘we are planning to tear down many academic walls’

The News-Gazette (Champaign, IL), Julie Wurth



On the first day of Black History Month, University of Illinois Professor Ruby Mendenhall cited the civil-rights leader as she and a half dozen other UI faculty members explained Friday how they hope to fulfill the goals of the campus strategic plan, “The Next 150.”

“We are planning to tear down many academic walls and engage the community in unprecedented ways that will transform Champaign-Urbana and hopefully the larger society,” said Mendenhall, a professor of sociology and African-American studies and assistant dean at the new Carle Illinois College of Medicine.

A crowd of several hundred people gathered at the Illini Union for the launch of the five-year plan, which calls for an emphasis on diversity and public service in key areas such as the arts, data sciences, the humanities, globalization, food security, undergraduate education, health sciences and sustainability.


How Do Word Meanings Significantly Change Over Time and In Context? Data Science Has an Answer

Medium, NYU Center for Data Science



In a new paper, Rujun Han of the University of Southern California, a CDS alum; Michael Gill of Facebook, a former CDS Faculty Fellow; Arthur Spirling, Associate Professor of Politics and Data Science; and Kyunghyun Cho, Assistant Professor of Computer Science and Data Science, propose a novel method that leverages document metadata to comprehensively model how a word’s meaning changes over time and in relation to similar terms. They focus on the problem of how to think about similarities between word meaning vectors in a statistical way. The researchers’ new approach allows for testing of hypotheses about the meanings of terms, determinations of whether a term is near or far from another, and assessments of the statistical significance of one word’s meaning relative to another.


New UA-affiliated Corporation Targets Defense Research

University of Arizona, UA News



The University of Arizona has launched a new research corporation focused on solving complex national security problems.

The University of Arizona Applied Research Corp., or UA-ARC, will leverage existing and emerging UA research strengths including optics, hypersonics, quantum information science, artificial intelligence and machine learning, cybersecurity, aerospace, and medicine to solve some of the most complex problems facing the nation.

“The University of Arizona is already a recognized leader in these critical technologies,” said UA President Robert C. Robbins. “This corporation allows us to leverage these strengths and dedicate resources, facilities and intellectual capital to the development of the next several generations of technology that will drive our economy and impact our lives in myriad ways in the Fourth Industrial Revolution. In turn, the UA-ARC will further lift our research profile and provide access to problem sets that traditionally have not been our primary focus.”


Databricks raises $250M at a $2.75B valuation for its analytics platform

TechCrunch, Frederic Lardinois



While Databricks is obviously known for its contributions to Apache Spark, the company itself monetizes that work by offering its Unified Analytics platform on top of it. This platform allows enterprises to build their data pipelines across data storage systems and prepare data sets for data scientists and engineers. To do this, Databricks offers shared notebooks and tools for building, managing and monitoring data pipelines, and then uses that data to build machine learning models, for example. Indeed, training and deploying these models is one of the company’s focus areas these days, which makes sense, given that this is one of the main use cases for big data, after all.

On top of that, Databricks also offers a fully managed service for hosting all of these tools.

 
Events



Love Data Week 2019

University of California-Berkeley, Berkeley Library



Berkeley, CA February 11-15. “Love Your Data Week is a nationwide campaign designed to raise awareness about data management, security, sharing, and preservation. Students, researchers, librarians, and data specialists are invited to attend these events to gain hands on experience, learn about resources, and engage in discussion around data needs throughout the research process.”


Hack the Solar System at the American Museum of Natural History

SpaceApps NYC



New York, NY “The fifth annual BridgeUP: STEM hackathon at the American Museum of Natural History will take place on Friday February 8th through Sunday February 10th.” [application required]

 
Deadlines



Seeking input: Challenges to sustainability of open source research data tools

“We are working on a research project with the goal of identifying systemic challenges to the sustainability of data driven tooling in science and scholarship – and we want your input.” … “If you have 15 minutes, we’d love to talk about your experiences in open source research and data-centric projects.” [link to survey]

Content Policy Research on Social Media Platforms request for proposals

Facebook requests research proposals designed to examine online content and ways to reduce harmful content. This includes, but is not limited to, research that will help us develop better content policies, assess possible interventions, including experimental and observational studies, along with qualitative and analytic work to understand the mechanisms by which online rhetoric influences offline events.

In 2019, Facebook will grant awards of $50K – $100K USD per awardee to fund projects of up to 12 months in duration. This RFP challenge is open to applicants worldwide from academic and/or research institutions that are eligible for research funding.

Deadline for applications is March 15.


Dissertation Grant – Microsoft Research

“The Microsoft Research Dissertation Grant aims to increase the pipeline of diverse talent receiving advanced degrees in computing-related fields by providing a research funding opportunity for doctoral students who are underrepresented in the field of computing, which include those who self-identify as a woman, African American, Black, Hispanic, Latinx, American Indian, Alaska Native, Native Hawaiian, Pacific Islander, and/or people with disabilities.” Deadline for applications is March 31.

2019 R Finance Call for Presentations

Chicago, IL “The eleventh annual R/Finance conference for applied finance using R will be held on May 17 and 18, 2019 in Chicago, IL, USA at the University of Illinois at Chicago.” Deadline for submissions is March 1.

DengAI: Predicting Disease Spread

“This is an intermediate-level practice competition. Your task is to predict the number of dengue cases each week (in each location) based on environmental variables describing changes in temperature, precipitation, vegetation, and more.” Competition end date is June 28.
 
Tools & Resources



Agency plus automation: Designing artificial intelligence into interactive systems

Proceedings of the National Academy of Sciences; Jeffrey Heer



Much contemporary rhetoric regards the prospects and pitfalls of using artificial intelligence techniques to automate an increasing range of tasks, especially those once considered the purview of people alone. These accounts are often wildly optimistic, understating outstanding challenges while turning a blind eye to the human labor that undergirds and sustains ostensibly “automated” services. This long-standing focus on purely automated methods unnecessarily cedes a promising design space: one in which computational assistance augments and enriches, rather than replaces, people’s intellectual work. This tension between human agency and machine automation poses vital challenges for design and engineering. In this work, we consider the design of systems that enable rich, adaptive interaction between people and algorithms. We seek to balance the often-complementary strengths and weaknesses of each, while promoting human control and skillful action. We share case studies of interactive systems we have developed in three arenas—data wrangling, exploratory analysis, and natural language translation—that integrate proactive computational support into interactive systems. To improve outcomes and support learning by both people and machines, we describe the use of shared representations of tasks augmented with predictive models of human capabilities and actions. We conclude with a discussion of future prospects and scientific frontiers for intelligence augmentation research. [full text]


Facebook’s Tool for Automated Testing at 2 Billion Users Scale

The New Stack, Jennifer Riggins



Sapienz is an application of search-based software engineering (SBSE) principles to automate testing at scale at Facebook (and beyond, once it’s open source.) Sapienz attempts to apply search techniques to automatically discover test sequences and then notify the developer of potential bugs. And now it’s moving toward suggesting the best fixes for the faults it finds.

It all begs the question — what would developers be able to accomplish if they didn’t have to worry about the boringly necessary that’s unnecessarily boring?
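To give a feel for the search-based idea described above, here is a toy sketch. It is not Sapienz and makes no claim about how Sapienz works internally: the event names, the planted fault, and the plain random search are all assumptions chosen for brevity, and production tools use far more sophisticated search strategies. The sketch generates random event sequences against a pretend app until one triggers the fault, then shrinks the sequence to a minimal reproduction a developer could act on.

```python
import random

# Hypothetical UI events and a planted fault, for illustration only.
EVENTS = ["open", "scroll", "rotate", "back"]
FAULT = ["open", "rotate", "back"]


def crashes(sequence):
    """Toy app under test: it faults when FAULT appears as consecutive events."""
    n = len(FAULT)
    return any(sequence[i:i + n] == FAULT for i in range(len(sequence) - n + 1))


def search_for_crash(rng, attempts=5000, length=6):
    """Plain random search over event sequences; returns the first crashing one."""
    for _ in range(attempts):
        candidate = [rng.choice(EVENTS) for _ in range(length)]
        if crashes(candidate):
            return candidate
    return None


def shrink(sequence):
    """Drop one event at a time for as long as the sequence still crashes."""
    current = list(sequence)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            shorter = current[:i] + current[i + 1:]
            if crashes(shorter):
                current, changed = shorter, True
                break
    return current


found = search_for_crash(random.Random(0))
minimal = shrink(found) if found else None
print(found, minimal)
```

The shrinking step matters as much as the search: a three-event reproduction is far easier to debug than the longer random sequence that first exposed the fault.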


Cross-lingual pretraining sets new state of the art for natural language understanding

Facebook Code



This research offers a new approach to pretraining cross-lingual models for natural language processing (NLP) tasks. Our method delivers a significant improvement over the previous state of the art in both supervised and unsupervised machine translation, as well as in cross-lingual text classification of low-resource languages. For example, we set new accuracy marks for unsupervised German-English machine translation and supervised Romanian-English translation. Specifically, this pretraining technique pushes the performance of unsupervised machine translation systems closer to that of supervised ones and even surpasses them on low-resource language pairs, such as Romanian-English.

We are providing code and pretrained models to the community so that others can use and build upon this work.

 
Careers


Full-time, non-tenured academic positions

Lecturer In Computing



Charles Sturt University; Bathurst, NSW, Australia

Postdocs

Postdoctoral Associate



Cornell University, Information Science Department; Ithaca, NY
