Cornell has determined that a prominent food researcher — accused earlier this year of using poor methods of data analysis, authoring papers with scores of data inconsistencies and reusing text in multiple publications — handled data inappropriately but that his errors “did not constitute scientific misconduct.”
Brian Wansink, the popular founder and director of the Cornell Food and Brand Lab, has faced a wave of criticism since publishing a blog post in November in which, according to his critics, he endorsed shoddy research practices.
In recent years, scientists searching for ways to restore hearing have made a number of promising discoveries. There are also increasingly effective methods of preventing damage in the first place, and of compensating for it once it’s occurred. The natural human tendency, though, is to do nothing and hope for the best, usually while pretending that nothing is wrong. (People who notice they’re having hearing problems typically wait more than ten years before doing anything about them.) I recently heard a joke about a man who was worried his wife was going deaf. He told his doctor, who suggested a simple test. When the man got home, he stood at the door of the kitchen, where his wife was at the stove, and asked, “Honey, what’s for dinner?” She didn’t respond, so he moved closer and asked again. She still didn’t respond, so he stood directly behind her and asked one more time. She turned around and snapped, “For the third time, chicken!”
from Science Advances; Ruth García-Gavilanes, Anders Mollgaard, Milena Tsvetkova and Taha Yasseri
Recently developed information communication technologies, particularly the Internet, have affected how we, both as individuals and as a society, create, store, and recall information. The Internet also provides us with a great opportunity to study memory using transactional large-scale data in a quantitative framework similar to the practice in natural sciences. We make use of online data by analyzing viewership statistics of Wikipedia articles on aircraft crashes. We study the relation between recent events and past events and particularly focus on understanding memory-triggering patterns. We devise a quantitative model that explains the flow of viewership from a current event to past events based on similarity in time, geography, topic, and the hyperlink structure of Wikipedia articles. We show that, on average, the secondary flow of attention to past events generated by these remembering processes is larger than the primary attention flow to the current event. We report these previously unknown cascading effects.
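The paper fits its own model to Wikipedia viewership data; as a rough, hypothetical illustration of how a single score might combine the four similarity components the abstract names (time, geography, topic, and hyperlinks), here is a minimal sketch. The weights, decay scale, and event features below are invented for illustration, not the authors' fitted model:

```python
import math

def attention_flow_score(current, past, weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine four similarity components into a single score in [0, 1]."""
    w_time, w_geo, w_topic, w_link = weights
    # Recency: events closer in time are remembered more readily (10-year decay scale).
    time_sim = math.exp(-abs(current["year"] - past["year"]) / 10.0)
    # Geography: crude binary match on region.
    geo_sim = 1.0 if current["region"] == past["region"] else 0.0
    # Topic: Jaccard overlap of category tags.
    a, b = set(current["topics"]), set(past["topics"])
    topic_sim = len(a & b) / len(a | b) if a | b else 0.0
    # Hyperlink structure: does the current article link to the past one?
    link_sim = 1.0 if past["title"] in current["links"] else 0.0
    return w_time * time_sim + w_geo * geo_sim + w_topic * topic_sim + w_link * link_sim

crash_2015 = {"year": 2015, "region": "Europe", "topics": {"aviation", "accident"},
              "links": {"Air France Flight 447"}, "title": "Germanwings Flight 9525"}
crash_2009 = {"year": 2009, "region": "Europe", "topics": {"aviation", "accident"},
              "links": set(), "title": "Air France Flight 447"}

print(round(attention_flow_score(crash_2015, crash_2009), 3))
```

A model of this shape makes the paper's headline finding easy to state: summing such flows over all past events can exceed the attention paid to the current event itself.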
Machine learning is quickly making inroads across the healthcare industry, from clinical care to pharmaceutical discovery.
The latest development on the journey from big data aggregation to true artificial intelligence comes from IBM Research, which has secured a patent for an association engine and visual analytics system that harnesses cognitive computing to predict the impact and side effects of new drugs.
Millions of student, staff and faculty email addresses and passwords from 300 of the largest universities in the United States have been stolen and are being circulated by cyber criminals on the dark web, according to a recent report.
Hacktivists, scam artists and even terrorists intend to sell, trade or just give away the addresses and passwords, said the Digital Citizens Alliance report.
With more than 12,500 patents, eight Nobel prizes, and a 140-year history of field-testing crazy ideas, it should surprise no one that AT&T is an important player in artificial intelligence.
“AT&T is a backbone of the internet,” explains Nadia Morris, head of Innovation at the AT&T Connected Health Foundry. The company manages wireless, landline, and even private secure networks to power connectivity for both individuals and corporations. All of these networks generate enormous volumes of data that are ripe for machine analysis.
AT&T has built AI and machine learning systems for decades, using algorithms to automate operations such as common call center procedures and the analysis and correction of network outages. On the entertainment side, AT&T’s DirecTV division leverages users’ rating histories, viewing behaviors, and other factors to anticipate the next films they’ll watch.
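The article doesn't describe DirecTV's actual algorithm, but the general technique it alludes to — recommending titles from the rating histories of similar users — can be sketched in a few lines. Everything below (the cosine-similarity choice, the names, the ratings) is an invented illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two {film: rating} dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[f] * v[f] for f in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(target, others):
    """Suggest unseen, highly rated films from the most similar user's history."""
    best = max(others, key=lambda u: cosine(target, u))
    return sorted((f for f, r in best.items() if f not in target and r >= 4),
                  key=lambda f: -best[f])

alice = {"Alien": 5, "Arrival": 4, "Up": 2}
others = [{"Alien": 5, "Arrival": 5, "Blade Runner": 5, "Up": 1},
          {"Up": 5, "Frozen": 4, "Coco": 5}]
print(recommend(alice, others))  # → ['Blade Runner']
```

Production systems add many more signals (viewing behavior, recency, popularity priors), but the core "people like you watched X" logic looks like this.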
The morning kicked off with talks on privacy, security, and civic tech, with SIPA’s Hollie Russon Gilman, Columbia Engineering’s Suman Jana, and Columbia Journalism School’s Susan McGregor. Graduate School of Architecture, Planning and Preservation’s Laura Kurgan moderated.
Google created the PhD Fellowship program in 2009 to recognize and support outstanding graduate students doing exceptional research in Computer Science and related disciplines. Now in its eighth year, our fellowship program has supported hundreds of future faculty, industry researchers, innovators and entrepreneurs.
Reflecting our continuing commitment to supporting and building relationships with the academic community, we are excited to announce the 33 recipients from North America, Europe and the Middle East. We offer our sincere congratulations to the 2017 class of Google PhD Fellows.
We had the opportunity to attend the NYU Future Labs AI Summit on Wednesday this week, and it was incredible to feel the strength of the NYC AI community in one place, particularly among the university, corporate, startup and investment leaders there. From watching Future Labs alum Anand Sanwal present CB Insights’ latest AI industry landscape, to Matt Zeiler walking through the power of Clarifai, to legends like Yann LeCun and Gary Marcus presenting, it was a great reminder that NYC has a robust network of AI industry innovators. As John Frankel of ffVC, one of the event organizers, said: “New York City has all the elements in place to become one of the world’s AI hubs.”
MIT researchers have developed a new system for efficiently allocating bandwidth in data center networks. The system, which would make the components of web pages load faster, was presented at the USENIX Symposium on Networked Systems Design and Implementation (NSDI ’17).
Language is always changing, but some basic principles hold true no matter what language you speak. It is possible that the ways language affects culture and the brain can be studied to help steer the future direction of AI.
For the past few years, social scientists have been buzzing over a particular topic in molecular biology—gene regulation. The hype has been building steam for some time, but recently, it rocketed to the forefront of public discussion due to a widely circulated piece in the New Yorker. Articles on the topic are almost always fascinating: They often give the impression that this particular area of biology stands poised to solve huge mysteries of human development. While that conclusion may be appropriate in fields like medicine and other related disciplines, a number of enthusiasts have openly speculated about its ability to also explain lingering social ills like poverty, crime, and obesity. The trouble is, this last bit isn’t a view shared by many genetics experts.
Social scientists’ excitement surrounds what we can refer to broadly as transgenerational epigenetics. To understand why social scientists have become enamored with it, we must first consider basic genetics. Many metaphors exist for describing and understanding the genome; they all capture the reality that genes provide the information for building and running biological machinery like the human body.
In this episode of the ARCHITECHT Show, Derrick Harris speaks with Apache Kafka creator and Confluent co-founder Jay Kreps about a wide range of topics including, obviously, Kafka and Confluent. Kreps also tackles digital transformation and real-time data in enterprises; Microsoft’s LinkedIn acquisition; the importance of network infrastructure in microservices and data-centric environments; the limitations of big data frameworks; and more. [audio, 1:06:12]
I’ve found that the overwhelming majority of online information on artificial intelligence research falls into one of two categories: the first is aimed at explaining advances to lay audiences, and the second is aimed at explaining advances to other researchers. I haven’t found a good resource for people with a technical background who are unfamiliar with the more advanced concepts and are looking for someone to fill them in. This is my attempt to bridge that gap, by providing approachable yet (relatively) detailed explanations. In this post, I explain the titular paper – Neural Turing Machines.
I initially didn’t intend to cover this paper, but another paper that I did want to cover wasn’t making any sense to me, and since it used a modification of the NTM architecture, I decided to make sure that I really understood NTMs before moving on. Having done that, I’m now of the opinion that the other paper is just poorly motivated. In contrast, the NTM paper is very well written, and I highly recommend reading it.
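As a small taste of what such a walkthrough covers: one of the NTM's core mechanisms is content-based addressing, which softly focuses on the memory rows most similar to a key vector — a softmax over cosine similarities, sharpened by a key-strength parameter β. The toy memory contents below are invented for illustration:

```python
import math

def content_addressing(memory, key, beta):
    """NTM-style content addressing: softmax over cosine similarities
    between the key and each memory row, sharpened by key strength beta."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    scores = [math.exp(beta * cos(row, key)) for row in memory]
    total = sum(scores)
    return [s / total for s in scores]

memory = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.9, 0.1, 0.0]]
w = content_addressing(memory, [1.0, 0.0, 0.0], beta=10.0)
# Rows 0 and 2, which resemble the key, receive nearly all of the weight.
print([round(x, 3) for x in w])
```

Because the weighting is soft and differentiable, the whole read/write mechanism can be trained end to end with gradient descent — the central trick of the paper.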
I work on monitoring. In particular on Prometheus, a monitoring system that includes a custom time series database, and its integration with Kubernetes.
In many ways Kubernetes represents all the things Prometheus was designed for. It makes continuous deployments, auto-scaling, and other features of highly dynamic environments easily accessible. The query language and operational model, among many other conceptual decisions, make Prometheus particularly well suited for such environments. Yet if monitored workloads become significantly more dynamic, this also puts new strain on the monitoring system itself. With this in mind, rather than doubling back on problems Prometheus already solves well, we specifically aim to increase its performance in environments with highly dynamic, or transient, services.
We built a U.S. Census plugin for Dataiku, which connects to the USCB raw data, builds the dataset directly into Dataiku, and automatically chooses the most explanatory variables or builds a custom dataset with a set of variables chosen by the user. The amount of time this saves, especially for users who aren’t well acquainted with the content and structure of the U.S. Census data, is massive.
In this blog post, we’ll illustrate how the plugin works using a project to estimate house prices in five New England states, and check whether adding the U.S. Census data improves the performance of this predictive model (the answer is, of course, yes, it does!). We’ll use Dataiku Data Science Studio (DSS) for the entire project, from scraping the real estate data from websites to creating and applying a predictive model.