Data Science newsletter – August 16, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for August 16, 2017

Data Science News

Ask HN: Is Georgia Tech’s Online Master in CS Worth It?

Hacker News

from August 15, 2017

Current working professional and an OMSCS student here. It highly depends on the context. Biggest pros are: 1. This is perhaps the cheapest Computer Science masters in the United States from a premier school. The degree is exactly the same as the one offered in the residential program and the credits acquired are all legit and transferable to other universities. I had friends who transferred from OMSCS to a regular school and skipped one full semester due to the credits earned. 2. An OMSCS qualification holds way more water than if you do random MOOC qualifications on Coursera and others. 3. The coursework is the same as the residential program.

Database provider MongoDB has filed confidentially for IPO

TechCrunch, Katie Roof

from August 15, 2017

MongoDB has filed confidentially for IPO, sources tell TechCrunch. The company has submitted an S-1 filing in the past few weeks and is aiming to go public before the end of the year.

New York-based MongoDB helps companies including Adobe, eBay and Citigroup manage databases. Some of its offerings include its name-bearing MongoDB open source database and the Atlas database-as-a-service offering.

Automated fingerprint analysis is one step closer to reality

EurekAlert! Science News, NIST

from August 14, 2017

Scientists from the National Institute of Standards and Technology (NIST) and Michigan State University report that they have developed an algorithm that automates a key step in the fingerprint analysis process. Their research has been published in IEEE Transactions on Information Forensics and Security.

“We know that when humans analyze a crime scene fingerprint, the process is inherently subjective,” said Elham Tabassi, a computer engineer at NIST and a co-author of the study. “By reducing the human subjectivity, we can make fingerprint analysis more reliable and more efficient.”

Inside the Increasingly Complex Algorithms That Get Packages to Your Door

MIT Technology Review, Jamie Condliffe

from August 15, 2017

Optimizing deliveries in the real world is thornier than the traveling-salesman problem on several levels. First, distances between locations need to be calculated, and as anyone who’s used Google Maps will know, there’s always more than one way to make a journey. Will Salter, managing director at Paragon, which provides routing and scheduling software in 60 countries for clients like the U.K. supermarket giant Tesco, says path-finding algorithms used in delivery planning are “highly customized” compared with a Google Maps algorithm. They’ll take into account road conditions, changing traffic flows at different times of day—even the marginal benefits of making a right turn at a junction.
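For readers unfamiliar with the traveling-salesman baseline the article compares against, here is a minimal sketch of the classic nearest-neighbor heuristic. It is a toy illustration only: it assumes straight-line distances between made-up stops, and the function names are hypothetical. It does not represent Paragon's software, which per the article also accounts for road conditions, traffic and turn costs.

```python
import math


def nearest_neighbor_tour(points, start=0):
    """Greedy tour: from the current stop, always drive to the closest unvisited stop."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def tour_length(points, order):
    """Total length of the closed tour, including the leg back to the first stop."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


# Hypothetical stops on a line, given as (x, y) coordinates.
stops = [(0, 0), (4, 0), (1, 0), (3, 0)]
route = nearest_neighbor_tour(stops)
print(route, tour_length(stops, route))
```

The greedy rule is fast but can produce poor tours, which is one reason production routing systems replace straight-line distance with travel-time estimates and use more elaborate search.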

It is time to replace genotyping arrays with sequencing

Medium, The Gencove Blog, Joe Pickrell

from August 14, 2017

The benefits and limitations of genetic studies using this technology (often confusingly called genome-wide association studies [1]) have been debated since before anyone even tried one (see examples from 2000, 2008, and 2017). But it’s generally believed that, if one accepts the basic goals and assumptions of a genome-wide association study, the most cost-effective and powerful technology to use is a genotyping array.

In this post, I argue that this is no longer the case: genotyping arrays are now less effective and more expensive than sequencing technologies.

Inside the Lab That’s Quantifying Happiness

Outside Online, Rowan Jacobsen

from August 11, 2017

At the University of Vermont, mathematicians in the Computational Story Lab are reading your tweets and learning a lot about our collective well-being.

Paul Allen’s AI unit doubles down on startup support

The Seattle Times, Matt Day

from August 15, 2017

In addition to $250,000 in seed funding for startups, the Allen Institute for Artificial Intelligence is offering office workspace at its Wallingford headquarters, and counsel from its experts in machine learning, computer vision and hardware engineering.

How Big Data Saved the Mountain Town

Outside Online, Abe Streep

from August 04, 2017

How does a town go from logging and livestock to bits and bytes? Tiny Prineville, Oregon, is finding out as huge data centers from Apple and Facebook transform the timber town into a recreational hub of mountain bikers and craft brewers.

A graph exploration method for identifying influential spreaders in complex networks

Applied Network Science; Nikos Salamanos, Elli Voudigari and Emmanuel J. Yannakoudakis

from August 14, 2017

The problem of identifying the influential spreaders – the important nodes – in a real world network is of high importance due to its theoretical interest as well as its practical applications, such as the acceleration of information diffusion, the control of the spread of a disease and the improvement of the resilience of networks to external attacks. In this paper, we propose a graph exploration sampling method that accurately identifies the influential spreaders in a complex network, without any prior knowledge of the original graph, apart from the collected samples/subgraphs. The method explores the graph, following a deterministic selection rule and outputs a graph sample – the set of edges that have been crossed. The proposed method is based on a version of Rank Degree graph sampling algorithm. We conduct extensive experiments in eight real world networks by simulating the susceptible-infected-recovered (SIR) and susceptible-infected-susceptible (SIS) epidemic models which serve as ground truth identifiers of nodes spreading efficiency. Experimentally, we show that by exploring only the 20% of the network and using the degree centrality as well as the k-core measure, we are able to identify the influential spreaders with at least the same accuracy as in the full information case, namely, the case where we have access to the original graph and in that graph, we compute the centrality measures. Finally and more importantly, we present strong evidence that the degree centrality – the degree of nodes in the collected samples – is almost as accurate as the k-core values obtained from the original graph. [full text]
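The two measures named in the abstract, degree centrality and the k-core index, can be computed from nothing more than a list of sampled edges. The sketch below assumes the sample has already been collected as undirected edge pairs; it does not implement the paper's Rank Degree exploration rule, and the toy edge list and function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Raw degree of each node within the sampled subgraph."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def core_numbers(adj):
    """k-core index of each node, found by repeatedly peeling a minimum-degree node."""
    remaining = {node: set(nbrs) for node, nbrs in adj.items()}
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: (len(remaining[n]), str(n)))
        # The core index never decreases as peeling proceeds.
        k = max(k, len(remaining[node]))
        core[node] = k
        for nbr in remaining.pop(node):
            remaining[nbr].discard(node)
    return core


def top_k(scores, k):
    """The k highest-scoring nodes, with ties broken by node name."""
    return sorted(scores, key=lambda n: (-scores[n], str(n)))[:k]


# Hypothetical sample: a triangle a-b-c with a two-node tail d-e hanging off a.
sample_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "e")]
adj = build_adjacency(sample_edges)
print(top_k(degree_centrality(adj), 2))
print(core_numbers(adj))
```

Comparing the `top_k` ranking from the sample against the same ranking on the full graph is, roughly, the kind of check the abstract describes, with epidemic simulations serving as the ground truth in the paper itself.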

Demystifying Data Science

Metis

from September 27, 2017

Online Live Online Conference for Aspiring Data Scientists on September 27. [free]

2017 All-Hands Meeting – Midwest Big Data Hub

Midwest Big Data Hub

from October 02, 2017

Omaha, NE MBDH All-Hands Meeting is October 2-3. [free, registration required]

Events

2017 Conference on Digital Experimentation (CODE)

MIT Initiative on the Digital Economy

from October 27, 2017

Cambridge, MA October 27, organized by MIT Initiative on the Digital Economy. [$$$]

StatFest 2017 Will Be Held in Atlanta

Emory University School of Public Health

from September 23, 2017

Atlanta, GA StatFest 2017—a one-day event aimed at encouraging undergraduate students from under-represented groups to consider graduate studies and careers in the statistical sciences—will be held September 23. [free]

Toronto Machine Learning Summit

TMLS

from November 02, 2017

Toronto, Canada November 2-3. A 2-day, 1-night creative exploration of Big Data, ML, and AI in Toronto. [$$$]

Deadlines

Conference on Statistical Practice 2018

Portland, OR We invite you to submit your abstract for an electronic poster presentation at the Conference on Statistical Practice 2018. The submission deadline is August 31.

ICLR 2018

Vancouver, Canada ICLR 2018 will be held at the Vancouver Convention Center, from April 30 to May 3, 2018 (notice the extra day!). The paper submission deadline is October 27.

NYU Center for Data Science News

NYU Scientists Invent New Protein for Regenerative Medicine

Medium, NYU Center for Data Science

from August 15, 2017

A major challenge facing the field, however, is creating biosynthetic materials that can not only deliver targeted drug therapy or tissue engineering to a damaged area, but also non-invasively visualize the surrounding cells so that doctors can monitor their patient’s treatment progress. Although some advances have been made, a full solution has not yet been discovered — but it looks like Associate Professor Jin Montclare from NYU Tandon, CDS’s Director Richard Bonneau, and Associate Professor Youssef Zaim Wadghiri from NYU School of Medicine are on their way to cracking the case.

Tools & Resources

Cloud Speech API improves longform audio recognition and adds 30 new language variants

Google Cloud Platform Blog

from August 14, 2017

Since its launch in 2016, businesses have used the Google Cloud Speech API to improve speech recognition for everything from voice-activated commands to call center routing to data analytics. And since then, we’ve gotten a lot of feedback that our users would like even more functionality and control. That’s why today we’re announcing Cloud Speech API features that expand support for long-form audio and further extend our language support to help even more customers inject AI into their businesses.

schmook comments on [D] How do you read math-heavy machine learning papers?

reddit.com/r/machinelearning

from August 05, 2017

The secret for reading algebra-heavy papers is NOT trying to follow the algebra on the first read. This is a mistake most students make. You don’t need to understand all steps of a long calculation on the first read. You skim through the algebra and assume it is correct, and take a deep look at key steps along the way. Read that thing written in English between the equations. Read the results. Read the conclusion. When you made sense of what this fucking paper is talking about generally, then you decide if you’re going to waste your time with the algebra. Don’t get bogged down on the steps you don’t understand. Assume they are correct and carry on. Go back to them later. Repeat until you get it all.

ipydatawidgets

GitHub – vidartf

from August 15, 2017

ipydatawidgets is a set of widgets to help facilitate reuse of large datasets across different widgets, and different packages.

APIs as infrastructure: future-proofing Stripe with versioning

Stripe Blog > Engineering

from August 15, 2017

An API represents a contract for communication that can’t be changed without considerable cooperation and effort. Because so many businesses rely on Stripe as infrastructure, we’ve been thinking about these contracts since Stripe started. To date, we’ve maintained compatibility with every version of our API since the company’s inception in 2011. In this article, we’d like to share how we manage API versions at Stripe.

Building Mobile Applications with TensorFlow

O'Reilly Publishing, Pete Warden

from August 15, 2017

Aimed specifically at developers who already have a TensorFlow model successfully working in a desktop environment, this report shows you through hands-on examples how to deploy mobile AI applications that are small, fast, and easy to build. You’ll explore use cases for on-device deep learning—such as speech, image, and object recognition—and learn how to deliver interactive applications that complement cloud services. [free eBook]

In-Database Analytics for Large Array Data

Intel Science & Technology Center for Big Data

from August 15, 2017

The split between the analytics (compute) and reliable storage worlds is clear, and bridging the two commonly includes data copies to fit the needs. But with data growth, the copies become a significant overhead with increasingly high associated costs of data extraction, redistribution, and copy-back operations.

We will show how in-database analytics addresses this problem directly with the use of modern database systems and established numerical algorithms.

Careers

Full-time, non-tenured academic positions

Training and Outreach Coordinator: NSF Arctic Data Center at NCEAS

UC-Santa Barbara, National Center for Ecological Analysis and Synthesis; Santa Barbara, CA
