Data Science newsletter – February 21, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for February 21, 2017

GROUP CURATION: N/A

 
 
Data Science News



Audio Super-Resolution using Neural Networks

OpenReview.net; Volodymyr Kuleshov, S. Zayd Enam, Stefano Ermon


We propose a neural network-based technique for enhancing the quality of audio signals such as speech or music by transforming inputs encoded at low sampling rates into higher-quality signals with an increased resolution in the time domain. This amounts to generating the missing samples within the low-resolution signal in a process akin to image super-resolution. On standard speech and music datasets, this approach outperforms baselines at 2x, 4x, and 6x upscaling ratios.
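To make the task concrete: the input is a signal recorded at 1/r of the original sampling rate, and the goal is to regenerate the missing samples. The sketch below simulates this by keeping every r-th sample, fills the gaps with plain linear interpolation, and scores the reconstruction by signal-to-noise ratio (SNR). Linear interpolation is only a naive stand-in chosen for illustration. It is not the authors' neural network and not necessarily one of their baselines. The function names, the 16 kHz rate and the 440 Hz test tone are likewise illustrative assumptions.

```python
import math

def downsample(signal, r):
    """Simulate a low sampling rate by keeping every r-th sample."""
    return signal[::r]

def linear_upsample(low, r, n):
    """Rebuild n samples from the low-rate signal, filling the r - 1 missing
    samples between each kept pair by linear interpolation (a naive baseline)."""
    out = []
    for i in range(n):
        j, frac = divmod(i, r)  # j: index of the kept sample to the left of i
        left = low[j]
        # Past the last kept sample there is no right neighbour, so hold the value.
        right = low[j + 1] if j + 1 < len(low) else left
        out.append(left + (right - left) * frac / r)
    return out

def snr_db(reference, estimate):
    """Signal-to-noise ratio of a reconstruction, in decibels."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if noise_power == 0:
        return float("inf")
    return 10 * math.log10(signal_power / noise_power)

if __name__ == "__main__":
    rate = 16000  # hypothetical high-resolution sampling rate, in Hz
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(1600)]
    for r in (2, 4, 6):  # the upscaling ratios mentioned in the abstract
        estimate = linear_upsample(downsample(tone, r), r, len(tone))
        print(f"{r}x upscaling: SNR {snr_db(tone, estimate):.1f} dB")
```

The SNR drops as the ratio grows, because each interpolated sample sits farther from real data. A learned model aims to beat exactly this kind of simple baseline.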


MD Anderson Benches IBM Watson In Setback For Artificial Intelligence In Medicine

Forbes, Matthew Herper


The partnership between IBM and one of the world’s top cancer research institutions is falling apart. The project is on hold, MD Anderson confirms, and has been since late last year. MD Anderson is actively requesting bids from other contractors who might replace IBM in future efforts. And a scathing report from auditors at the University of Texas says the project cost MD Anderson more than $62 million and yet did not meet its goals. The report, however, states: “Results stated herein should not be interpreted as an opinion on the scientific basis or functional capabilities of the system in its current state.”


5 Things I Learned from Studying Facebook Benchmark Data

Medium, Madrona Venture Group, Ryan Metzger


Through my role at Madrona Venture Group advising startups on growth, I meet a lot of marketers using Facebook to acquire and engage with customers. It’s a powerful platform. Unfortunately, many companies are falling short of their goals and are left frustrated with what they see as unrealized potential.

Recently I began a Facebook benchmarking project with the goal of providing marketers in our portfolio with aggregate performance data on different stages of the funnel. This project used 2015 and 2016 data from companies who were targeting consumers. B2B companies were not included in this study, though many have successfully marketed on Facebook and many of the same trends likely apply.

Through this process, I came up with 5 recommendations that companies should consider as they try to get the most out of this large and increasingly competitive channel.


The CS Capacity Program – New Tools and SIGCSE 2017

Google Research Blog; Chris Stephenson


The CS Capacity program was launched in March of 2015 to help address a dramatic increase in undergraduate computer science enrollments that is creating serious resource and pedagogical challenges for many colleges and universities. Over the last two years, a diverse group of universities have been working to develop successful strategies that support the expansion of high-quality CS programs at the undergraduate level. Their work focuses on innovations in teaching and technologies that support scaling while ensuring the engagement of women and underrepresented students. These innovations could provide assistance to many other institutions that are challenged to provide a high-quality educational experience to an increasing number of introductory-level students.


How we’re getting things done

Data Carpentry, Erin Becker


The Data and Software Carpentry staff have been working together to make progress on projects that are important for our community. To help us do this, we’re trying out a new work process based on Basecamp’s six-week work cycle. You can read their blog post if you’re interested in the details of how the cycle is structured. We’re picking a small handful of projects to focus on for each six-week cycle, with each staff member working on one or two projects. For each project, we’re setting realistic goals we know we can accomplish before the end of the cycle and holding ourselves accountable for meeting those goals. We’re spending the first two weeks of the cycle planning those goals, dividing up the work into teams, and setting timelines to make sure we stay on track.


The one critical skill many data scientists are missing

VentureBeat, Emma Walker


Data science is a new career for the age of Big Data (whatever that means this week), but you can see that it’s at the intersection of qualities many people have been developing for years. As a graduate of the Science to Data Science (S2DS) summer school, I know people who have come to data science from a wide variety of backgrounds and found a new niche for themselves.

However, I believe there’s something missing from this picture, a vital skill that comes in many forms and needs constant practice and adaptation to the situation at hand: communication.

This isn’t just a “soft” or “secondary” skill that’s nice to have. It’s a must-have for good data scientists.


Where are our data backups heading?

SAGE Connection – Insight, Ian Mulvany


We should be moving our research data into cloud services, or at least backing up our local copies to a cloud service.

Storing those hypothetical 28TB of research papers in Google’s cloud platform costs $6,800 per year, and if you only need to access a portion of that data while keeping most of it in a “warm” state, that cost halves. It’s a bit more than the magical one-dollar mark, but it’s possibly within reach of a research budget, and given that most small-scale data comes nowhere near this size, the argument for hosting at least a copy of your data in the cloud becomes quite strong.
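The arithmetic behind a figure like $6,800 is easy to reproduce: size in GB, times the monthly per-GB price, times 12. The sketch below assumes a price of $0.02 per GB per month for standard storage and half that for a colder, less frequently accessed tier, and counts 1TB as 1,000GB. These prices are illustrative assumptions, not quotes from the article, so check the provider’s current price sheet before budgeting.

```python
GB_PER_TB = 1000  # cloud storage is usually billed in decimal gigabytes

def annual_cost(terabytes, usd_per_gb_month):
    """Yearly storage bill: size in GB x monthly per-GB price x 12 months."""
    return terabytes * GB_PER_TB * usd_per_gb_month * 12

# Assumed prices, for illustration only.
STANDARD_USD_PER_GB_MONTH = 0.02
COLD_USD_PER_GB_MONTH = 0.01

if __name__ == "__main__":
    size_tb = 28
    print(f"standard tier: ${annual_cost(size_tb, STANDARD_USD_PER_GB_MONTH):,.0f} per year")
    print(f"colder tier:   ${annual_cost(size_tb, COLD_USD_PER_GB_MONTH):,.0f} per year")
```

At these assumed prices, 28TB comes to $6,720 a year in the standard tier, close to the quoted $6,800, and $3,360 in the colder tier, which is the halving described above.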


Beyond ‘Hidden Figures’: Nurturing New Black and Latino Math Whizzes

The New York Times, Amy Harmon


A free math camp for middle-school students from New York’s poorest neighborhoods was an effort to increase the number of blacks and Latinos with advanced math degrees.


Swarms of ‘Robotic Plankton’ Help Scientists Track Marine Wildlife, Oil Spills

The New Stack, Kimberly Mok


Despite their tiny size, plankton are a major food source for larger aquatic animals like whales and fish. Astonishingly, plankton also produce half of the planet’s oxygen. Despite their ecological importance, it’s not totally clear how these critters congregate and are pushed around by ocean currents, and how that affects their feeding and mating behaviors.

But that uncertainty is lifting, thanks to a swarm of underwater robots programmed to imitate the movements of plankton. Scientists at the Scripps Institution of Oceanography at the University of California San Diego built these Miniature Autonomous Underwater Explorers (M-AUEs) and are now using them to better understand how plankton move in three dimensions underwater. The team’s findings could someday also help humanity mitigate threats like oil spills and harmful algal blooms, which can cause massive fish die-offs and make drinking water toxic for humans.


An AI-Fueled Credit Formula Might Help You Get a Loan

MIT Technology Review, Nanette Byrnes


Startup ZestFinance says it has built a machine-learning system that’s smart enough to find new borrowers and keep bias out of its credit analysis.


Whatever you do, don’t call this an ‘interesting’ idea

Aeon Ideas, Simson L Garfinkel


Replication and repeatability are thought by many laypersons to be a shared ideal among many scientists. In practice, few scientific studies are ever replicated. Last year, a survey by Vox.com of 270 scientists found few attempting replication studies because of the difficulty in funding and publishing. Funding agencies pride themselves on sponsoring transformative, breakthrough research – interesting work that, almost by definition, doesn’t repeat (read: replicate) what’s been done before. And journals generally don’t print articles that merely replicate findings that have been previously published; such articles aren’t considered sufficiently interesting.

The results are bad for the practice of science, because the scientific method relies on replication. Without it, it takes a lot longer for erroneous studies to be corrected. But getting things right is not interesting, it’s pedantic.


The race to map the human body — one cell at a time

Nature News & Comment, Heidi Ledford


The first time molecular biologist Greg Hannon flew through a tumour, he was astonished — and inspired. Using a virtual-reality model, Hannon and his colleagues at the University of Cambridge, UK, flew in and out of blood vessels, took stock of infiltrating immune cells and hatched an idea for an unprecedented tumour atlas.

“Holy crap!” he recalls thinking. “This is going to be just amazing.”

On 10 February, the London-based charity Cancer Research UK announced that Hannon’s team of molecular biologists, astronomers and game designers would receive up to £20 million (US$25 million) over the next five years to develop its interactive virtual-reality map of breast cancers.


George Loewenstein and the NEW New Economics of Information

YouTube, CMUHSS


Carnegie Mellon University’s George Loewenstein presented findings, most from work with students and colleagues in the Social and Decision Sciences Department, that challenge traditional economic accounts of how people deal with information. In some cases, motivated by curiosity, people seek out information that has no value for decision making. In other situations, people avoid information that could inform decisions if it threatens to be painful. And rather than updating their beliefs rationally, people often defend their beliefs as they would defend material possessions. [video, 1:17:39]

 
Events



Re-Boot Camp 2017! | .txtLAB @ mcgill

McGill University, .txtLAB


Montreal, Quebec, Canada, June 12-16. Come spend a week in Montreal to learn new skills in the computational analysis of literature at McGill University’s .txtLAB. [$$$]


DeepLearn 2017

University of Deusto, Rovira i Virgili University


Bilbao, Spain. The International Summer School on Deep Learning will take place July 17-21.

 
Deadlines



ComSciCon 2017 National Workshop

Cambridge, MA. We are excited to announce the 5th annual ComSciCon National Workshop, happening June 8th-10th, 2017, in Cambridge, MA! Applications are now open and will be accepted through March 1st.

Cybersecurity Factory Application

A summer program for security startups, in collaboration with Highland Capital Partners. Program dates are June 12-August 18. We will accept applications starting March 6, on a rolling basis until March 20.
 
Tools & Resources



NIPS 2016 Workshop videos

YouTube, David Lopez-Paz


7 videos. 30-45 minutes in length.


Making Git and Jupyter Notebooks play nice

Tim Staley


“jq rocks for speedy JSON mangling. Use it to make powerful git clean filters, e.g. when stripping out unwanted cached-data from Jupyter notebooks. You can find the documentation of git ‘clean’ and ‘smudge’ filters buried in the page on git-attributes, or see my example setup.”
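Tim Staley’s setup uses jq. For readers without jq, the same clean-filter idea can be sketched in Python. Git pipes a file’s contents to the clean filter’s stdin whenever the file is staged and stores whatever the filter writes to stdout, so stripping outputs there keeps cached results out of commits while leaving your working copy untouched. This is an illustrative sketch, not his filter: the script name nbstrip.py, the filter name nbstrip, the clean subcommand, and the choice to strip only outputs and execution counts (from an nbformat 4 notebook) are all assumptions.

```python
"""nbstrip.py (hypothetical name): a git clean filter for Jupyter notebooks.

Reads a notebook from stdin and writes it to stdout with cached outputs and
execution counts removed, so only code and text end up in the repository.
"""
import json
import sys

def strip_notebook(nb):
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

def main():
    nb = strip_notebook(json.load(sys.stdin))
    # Jupyter typically saves notebooks with sorted keys and a one-space indent;
    # matching that keeps diffs against hand-saved notebooks small.
    json.dump(nb, sys.stdout, indent=1, sort_keys=True, ensure_ascii=False)
    sys.stdout.write("\n")

# Git should be configured to run `python nbstrip.py clean`; requiring the
# subcommand means an accidental bare run does not sit waiting on stdin.
if __name__ == "__main__" and sys.argv[1:] == ["clean"]:
    main()
```

To wire it up, add the line `*.ipynb filter=nbstrip` to .gitattributes, then run `git config filter.nbstrip.clean "python nbstrip.py clean"` and `git config filter.nbstrip.smudge cat` (an absolute path to the script is the safest choice). The `cat` smudge filter passes checked-out files through unchanged, since there is nothing to restore.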


Debug TensorFlow Models with tfdbg

Google Developers Blog, Shanqing Cai


“We are excited to share TensorFlow Debugger (tfdbg), a tool that makes debugging of machine learning models (ML) in TensorFlow easier.”

 
Careers


Full-time positions outside academia

Director of Data Visualization



Bocoup; Boston, MA

Data Science Manager



DataKind; New York, NY

Research Scientist, News Feed Science



Facebook; Menlo Park, CA

Full-time, non-tenured academic positions

Data Services Manager



Johns Hopkins University; Baltimore, MD
