Data Science newsletter – August 13, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 13, 2018


 
 
Data Science News



Growth X 200

State Street Corporation, Fathom



With insight from our more than two centuries serving clients, we’re exploring the trends and events that have shaped the growth of the financial system, from the exponential explosion of data to the scale, complexity and connectivity of global markets.


Major Quantum Computing Advance Made Obsolete by Teenager

Quanta Magazine, Kevin Hartnett



18-year-old Ewin Tang has proven that classical computers can solve the “recommendation problem” nearly as fast as quantum computers. The result eliminates one of the best examples of quantum speedup.


There’s Waldo is a robot that finds Waldo

YouTube, redpepper



There’s Waldo is a robot built to find Waldo and point at him. The robot arm is controlled by a Raspberry Pi using the PYARM Python library for the UARM Metal. Once initialized, the arm is instructed to extend and take a photo of the canvas below. It then uses OpenCV to find and extract faces from the photo. The faces are sent to the Google Auto ML Vision service, which compares each one against the trained Waldo model. If a match with a confidence of 95% (0.95) or higher is found, the robot arm is instructed to extend to the coordinates of the matching face and point at it. If there are multiple Waldos in a photo, it will point to each one.
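The selection step described above can be sketched roughly as follows. This is only an illustration, not redpepper's code: the `FaceCandidate` type and the `select_targets` function are hypothetical names, and the sketch assumes that face detection and classification have already produced a bounding box and a confidence score for each face. Only the 0.95 threshold and the "point at every match" behavior come from the description.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Threshold taken from the description: 95% (0.95) or higher counts as a match.
CONFIDENCE_THRESHOLD = 0.95


@dataclass
class FaceCandidate:
    """Hypothetical record for one detected face (not from the original project)."""
    x: int          # left edge of the bounding box, in pixels
    y: int          # top edge of the bounding box, in pixels
    width: int      # bounding box width, in pixels
    height: int     # bounding box height, in pixels
    score: float    # classifier confidence that this face is Waldo


def select_targets(
    candidates: List[FaceCandidate],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[Tuple[float, float]]:
    """Return the center point of every face scoring at or above the threshold."""
    targets = []
    for face in candidates:
        if face.score >= threshold:
            center_x = face.x + face.width / 2
            center_y = face.y + face.height / 2
            targets.append((center_x, center_y))
    return targets


if __name__ == "__main__":
    # Made-up example scores, for illustration only.
    faces = [
        FaceCandidate(10, 20, 40, 40, 0.97),
        FaceCandidate(100, 50, 30, 30, 0.60),
        FaceCandidate(200, 80, 20, 20, 0.95),
    ]
    for point in select_targets(faces):
        print("point arm at", point)
```

Returning a list rather than a single best match mirrors the stated behavior that the arm points to each Waldo when a photo contains more than one.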


Clarkson University Offering New Bachelor’s in Data Science

Clarkson University, CU News & Events



Across business, industry, government, and nearly all scientific efforts, the explosion of data drives a growing need to manage, analyze, and extract insight and knowledge from that data. Clarkson University’s new Bachelor of Science degree in Data Science provides students the opportunity to develop an interdisciplinary, data-focused skillset and apply those skills to real-world challenges.


Machine Learning Can Identify the Authors of Anonymous Code

WIRED, Security, Louise Matsakis



Researchers who study stylometry—the statistical analysis of linguistic style—have long known that writing is a unique, individualistic process. The vocabulary you select, your syntax, and your grammatical decisions leave behind a signature. Automated tools can now accurately identify the author of a forum post for example, as long as they have adequate training data to work with. But newer research shows that stylometry can also apply to artificial language samples, like code. Software developers, it turns out, leave behind a fingerprint as well.

Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt’s former PhD student and now an assistant professor at George Washington University, have found that code, like other forms of stylistic expression, is not anonymous. At the DefCon hacking conference Friday, the pair will present a number of studies they’ve conducted using machine learning techniques to de-anonymize the authors of code samples. Their work could be useful in a plagiarism dispute, for instance, but it also has privacy implications, especially for the thousands of developers who contribute open source code to the world.


We Researched Hundreds Of Races. Here’s Who Democrats Are Nominating.

FiveThirtyEight; Meredith Conroy, Mai Nguyen and Nathaniel Rakich



You’ve probably already heard that 2018 is a new “Year of the Woman” in Democratic primaries. Women are being nominated in record numbers across the country. But that fact only tells you so much. To what extent is a candidate’s gender really affecting voters’ choices? And do other candidate characteristics make a difference?

Through Aug. 7, 811 people have appeared on the ballot this year in Democratic primaries for Senate, House and governor, not counting races featuring a Democratic incumbent. In partnership with ABC News and Ballotpedia, we collected data on every single one of those candidates.


The Pentagon finally has an AI strategy — too little too late?

The Next Web, Tristan Greene



The US Department of Defense’s recently established Joint Artificial Intelligence Center (JAIC) has finally debuted and will oversee the Pentagon’s AI strategy. Better late than never, right?

After a couple of years’ worth of paperwork drills, Deputy Defense Secretary Patrick Shanahan released a document at the end of June establishing the JAIC, and stating it would take the next 30 days to figure out what its national mission objectives and personnel needs are. The specifics of how the center will carry out its duties going forward haven’t been publicly released, but that’s par for the course when dealing with military strategy.


Machine Learning to Assess the Scientific Soundness of Medical Papers

Oracle DataScience.com, Matt Michelson and Guilherme Del Fiol



Clearly there is a need to be able to filter the good research from the not-so-good, and to do so as automatically as possible (without losing any results) in order for the doctor to find the best research with the least amount of effort.

A group of us from Evid Science, the University of Utah and McMaster University recently proposed a machine learning approach to this problem in our paper, “A Deep Learning Method to Automatically Identify Reports of Scientifically Rigorous Clinical Research from the Biomedical Literature: Comparative Analytic Study.”

The original paper was meant for biomedical informatics professionals, but the topic is applicable for data scientists and aspiring data scientists alike. Here we’ll summarize what we did and why we did it, and perhaps even inspire the next generation of work to classify the scientific soundness of medical papers.


How we judge research outputs when making funding decisions

Wellcome Trust



Robert Kiley, Head of Open Research, and Jim Smith, Director of Science, discuss the steps Wellcome is taking to fulfil the principles of the San Francisco Declaration on Research Assessment (DORA).

When we published our open access policy over a decade ago, we made it clear that what counts when we make funding decisions is the intrinsic merit of the work and not the journal or publisher.

Despite significant progress in making our research open access, we know many researchers remain unconvinced that it’s the work that counts and not where it’s published. We’re worried about this and so we’re continuing our efforts to promote best practice in research assessment.


Can sound help save a dwindling elephant population? Scientists using AI think so.

Microsoft, On the Issues blog



Conservation Metrics, a Microsoft AI for Earth grantee based in Santa Cruz, California, uses machine learning to monitor wildlife and evaluate conservation efforts. It is applying its sophisticated algorithms to help the Elephant Listening Project, based at Cornell University’s Lab of Ornithology, distinguish between forest elephant calls and all the other sounds in a noisy tropical rainforest. It’s a perfect job for AI — looking for these rare patterns in terabytes of data that would take humans years.

Researchers use the elephant call data to build more accurate and frequent population estimates, track the elephants’ movements, provide better security and potentially identify individual animals, which can’t be easily seen from the air.

 
Events



14th annual Large Synoptic Survey Telescope Workshop

Arizona Daily Star, Mikayla Mace



Tucson, AZ August 16, starting at 7 p.m. “Locals will have the opportunity to attend a free public lecture on the status of the next-generation telescope and learn how to get involved in the process of scientific discovery driven by LSST data.”


Oceanhackweek 2018

University of Washington



Seattle, WA August 20-24. “The Oceanhackweek is a 5-day learning hackathon aimed at exploring, creating and promoting effective computation and analysis workflows for large and complex oceanographic data.” [Applications have closed.]

 
Deadlines



1st Official LXAI Research Workshop – Creating Harmony and Opportunity for Latinx in AI

Montreal, QC, Canada December 6, co-located with NIPS 2018. Deadline for abstract submissions is August 30.

Artificial Intelligence Insights into Regulations (24062-19-096)

“The Canada School of Public Service is engaging organizations that employ artificial intelligence methods, including advanced data analytics and machine learning approaches, to offer new insight and analysis on federal regulations.” Deadline for proposals is September 14.

Microsoft Research India: Call for Collaborative Projects on Cloud and AI Technologies for Societal Impact

“Microsoft Research India is creating a new program to work with academia, government and entrepreneurs to help incubate technologies for societal impact, with special emphasis on Cloud and AI platforms.” Deadline for proposals is September 15.

DARPA Subterranean Challenge

“Drive novel approaches and technologies to allow warfighters and first responders to rapidly map, navigate, and search dynamic underground environments.” Registration opens on August 15.
 
Tools & Resources



FastText: stepping through the code

Medium, Maria Mestre



FastText is a library developed by Facebook for text classification, but it can also be used to learn word embeddings. Since becoming open-sourced in 2016, it has been widely adopted due to its training speed as well as its high performance.

In spite of reading the (very sparse) documentation, I realised that there were many parts of the algorithm that were obscure to me so I decided to go through the exercise of trying to understand how the model works.


Software Deposit Guidance for Researchers

Software Sustainability Institute, Mike Jackson



The Software Sustainability Institute has published a set of guides about depositing research software into digital repositories. These guides cover the main aspects of software deposit including why software should be deposited, when to deposit software, where to deposit it, how to make a deposit, what to deposit (and what not to deposit!), how to describe a deposit (metadata), how to choose a software licence and how to review a software deposit. The guides, development of which was funded by Jisc, are intended for researchers, principal investigators and research leaders and research data and digital repository managers.


Books on Computer Science for Data Scientists, recommended by Hadley Wickham

Five Books, Edouard Mathieu



“Data science is often said to be built on three pillars: domain expertise, statistics, and programming. Hadley Wickham, Chief Scientist at RStudio and creator of many packages for the R programming language, chooses the best books to help aspiring data scientists build solid computer science fundamentals.”


Analysing the R Twitterverse

Perry Stephenson



“In a previous blog post I wrote about scraping Twitter to get 400,000 tweets about R as part of my capstone project at the University of Technology, Sydney. I’ve got some big plans for network analysis with this dataset, but before I start untangling hairballs I thought I might as well take a look for any interesting stories that can help me understand the structure of the R Twitter community.”


Four lessons for starting a research lab

Science, Alona Fyshe



As September approaches, a new cohort of junior faculty members are taking up their first positions as research group leaders. I was there 3 years ago, making career-shaping decisions—sometimes without much mentoring or support. I learned a lot in my first years—how to write a grant, manage rejection, and supervise students, to name just a few—and it was all trial by fire. Though I made it through and had some successes along the way, I certainly could have used advice about how to set up and run my lab. I’ve learned that my experience is the norm, which inspired me and a group of other early-career principal investigators to interview leaders in our fields about how they built successful research groups. Here are some of the lessons they shared.

 
Careers


Tenured and tenure track faculty positions

Faculty Positions in Data Science for Large Astronomical Surveys



Penn State University; State College, PA
