Data Science newsletter – September 1, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for September 1, 2017

Data Science News



Python overtakes R as the leading language in Data Science

reddit.com/r/datascience, KDnuggets


reddit discussion of KDnuggets article


Train, Score, Repeat, Watch Out! Zillow’s Andrew Martin on modeling pitfalls in a dynamic world.

Kaggle, No Free Hunch blog


The $1 Million Zillow Prize is a Kaggle competition challenging data scientists to push the accuracy of Zestimates (automated home value estimates). As the competition heats up, we’ve invited Andrew Martin, Sr. Data Science Manager at Zillow, to write about how his team handles the challenges of delivering new predictions on a daily basis and how the mechanics of the Zillow Prize competition have been structured to account for these challenges.


Bitcoin’s Academic Pedigree

ACM Queue, Arvind Narayanan and Jeremy Clark


If you’ve read about bitcoin in the press and have some familiarity with academic research in the field of cryptography, you might reasonably come away with the following impression: Several decades’ worth of research on digital cash, beginning with David Chaum,10,12 did not lead to commercial success because it required a centralized, banklike server controlling the system, and no banks wanted to sign on. Along came bitcoin, a radically different proposal for a decentralized cryptocurrency that didn’t need the banks, and digital cash finally succeeded. Its inventor, the mysterious Satoshi Nakamoto, was an academic outsider, and bitcoin bears no resemblance to earlier academic proposals.

This article challenges that view by showing that nearly all of the technical components of bitcoin originated in the academic literature of the 1980s and ’90s (see figure 1). This is not to diminish Nakamoto’s achievement but to point out that he stood on the shoulders of giants. Indeed, by tracing the origins of the ideas in bitcoin, we can zero in on Nakamoto’s true leap of insight—the specific, complex way in which the underlying components are put together. This helps explain why bitcoin took so long to be invented. Readers already familiar with how bitcoin works may gain a deeper understanding from this historical presentation. (For an introduction, see Bitcoin and Cryptocurrency Technologies by Arvind Narayanan et al.36) Bitcoin’s intellectual history also serves as a case study demonstrating the relationships among academia, outside researchers, and practitioners, and offers lessons on how these groups can benefit from one another.


The Center for Open Science, Alternative to Elsevier, Announces New Preprint Services Today

Ithaka, Roger C. Schonfeld


The past year has been a momentous period for preprint-driven open access. Elsevier has made two major acquisitions, of SSRN with its edited research networks and of bepress with its Digital Commons institutional repository service. Springer Nature sibling Digital Science has worked to develop its presence too, expanding figshare as not just a data repository but as a full institutional solution and more recently improving its support for preprints. As commercial providers buy and build their way into the institutional repository and preprint marketplace, the not-for-profit Center for Open Science (COS) is offering an alternative by expanding what it calls the preprint services it powers through its platform. Today, COS announced the availability of six new services, including a national repository for Indonesia and a variety of new disciplinary services. While for now relatively small in scale, COS is building a platform for the research community that is controlled by a not-for-profit and therefore presents an intriguing and potentially powerful alternative.


Robotic system monitors specific neurons

MIT News


Recording electrical signals from inside a neuron in the living brain can reveal a great deal of information about that neuron’s function and how it coordinates with other cells in the brain. However, performing this kind of recording is extremely difficult, so only a handful of neuroscience labs around the world do it.

To make this technique more widely available, MIT engineers have now devised a way to automate the process, using a computer algorithm that analyzes microscope images and guides a robotic arm to the target cell.


The 2020 Census may be wildly inaccurate—and it matters more than you think

The Brookings Institution, Robert Shapiro


The Census Bureau prepared to ramp up funding in 2017 and 2018, as it normally did, under the $12.5 billion cap. Enter the Trump administration, which cut the Obama administration’s 2017 budget request for the Census Bureau by 10 percent and then, this past April, flat-lined the funding for 2018. It is no coincidence that the Director of the Census Bureau, John Thompson, resigned in May, effective in June. It’s a serious loss, since Dr. Thompson directed the 2000 decennial count and is probably the most able person available to contain the coming damage to the 2020 count. For its part, the administration hasn’t even identified, much less nominated, his successor. It is no surprise that the Government Accountability Office recently designated the 2020 Census as one of a handful of federal programs at “High Risk” of failure.

The costs of starving the decennial Census could be great. It not only paints the country’s changing demographic and geographic portrait every 10 years. Its state-by-state counts determine how the 435 members of the House of Representatives are allocated among the states, and its counts by “Census block” (roughly a neighborhood) shape how members of state legislatures and many city councils are allocated in those jurisdictions. That’s just the beginning.


‘Learning database’ speeds queries from hours to seconds

University of Michigan, Michigan News


A tool that makes large databases work smarter, not harder, could unlock the potential of big data to drive medical research, inform business decisions and speed up a slew of other applications that today are mired in a worldwide data glut.

University of Michigan researchers developed software called Verdict that enables existing databases to learn from each query a user submits, finding accurate answers without trawling through the same data again and again. Verdict allows databases to deliver answers more than 200 times faster while maintaining 99 percent accuracy. In a research environment, that could mean getting answers in seconds instead of hours or days.

When speed isn’t required, it can be set to save electricity, using 200 times less than a traditional database.


Norway sets national open access goals

Science Library Pad blog, Richard Akerman


The Norwegian Ministry of Education and Research (Kunnskapsdepartementet) released National goals and guidelines for open access to research articles. It sets out not just an open access mandate, but a number of accompanying steps and activities.


The Virtual Reality Game Gathering Data for Dementia Researchers

Bloomberg Technology, Jeremy Kahn


Navigating an ice-walled lake or scouring a swamp for a hidden monster may sound like a fun premise for a virtual-reality video game. But there’s a serious purpose behind the new game Sea Hero Quest VR: helping neuroscientists design a new test for dementia.

London-based game design firm Glitchers worked with researchers from British and Swiss universities, as well as dementia and Alzheimer’s charities, to create Sea Hero Quest VR. Development of the free-to-download game, which is being released Tuesday for Samsung’s Gear VR headset and Facebook Inc.’s Oculus Rift, was funded by German mobile carrier Deutsche Telekom AG.


How Apple Plans to Change the Way You Use the Next iPhone

Bloomberg Technology, Mark Gurman


Apple Inc. plans to transform the way people use its next high-end iPhone by eliminating the concept of a home button and making other adjustments to a flagship device that’s becoming almost all screen, according to images of the new device viewed by Bloomberg News and people familiar with the gadget.

The home button is the key to the iPhone and the design hasn’t changed much since it launched in 2007. Currently, users click it to return to the starting app grid that greets them multiple times a day. They hold it down to talk to the Siri digital assistant. Double-click it and you get multitasking, where different app screens can be swiped through like a carousel.


Neural networks meet space

symmetry magazine, Manuel Gnida


Researchers from the Department of Energy’s SLAC National Accelerator Laboratory and Stanford University have for the first time shown that neural networks—a form of artificial intelligence—can accurately analyze the complex distortions in spacetime known as gravitational lenses 10 million times faster than traditional methods.

“Analyses that typically take weeks to months to complete, that require the input of experts and that are computationally demanding, can be done by neural nets within a fraction of a second, in a fully automated way and, in principle, on a cell phone’s computer chip,” says postdoctoral fellow Laurence Perreault Levasseur, a co-author of a study published today in Nature.


Facebook to open new office in Kendall Square, adding hundreds of jobs

The Boston Globe, Janelle Nanos


Facebook has a status update: The social network will open a new office in Cambridge next year and plans to hire more than 500 employees, bringing the staff to 650.

The company, which founder Mark Zuckerberg launched at Harvard University before decamping for the West Coast, established its first Boston-area team nearly four years ago, with a small group of employees sharing a workspace. Today, that team has grown to more than 100 people in a Kendall Square office, and space is getting tight, said Ryan Mack, who leads the Facebook Boston office.


[1708.09843] Predicting Cardiovascular Risk Factors from Retinal Fundus Photographs using Deep Learning

arXiv, Computer Science > Computer Vision and Pattern Recognition; Ryan Poplin, Avinash V. Varadarajan, Katy Blumer, Yun Liu, Michael V. McConnell, Greg S. Corrado, Lily Peng, Dale R. Webster


Traditionally, medical discoveries are made by observing associations and then designing experiments to test these hypotheses. However, observing and quantifying associations in images can be difficult because of the wide variety of features, patterns, colors, values, shapes in real data. In this paper, we use deep learning, a machine learning technique that learns its own features, to discover new knowledge from retinal fundus images. Using models trained on data from 284,335 patients, and validated on two independent datasets of 12,026 and 999 patients, we predict cardiovascular risk factors not previously thought to be present or quantifiable in retinal images, such as age (within 3.26 years), gender (0.97 AUC), smoking status (0.71 AUC), HbA1c (within 1.39%), systolic blood pressure (within 11.23 mmHg) as well as major adverse cardiac events (0.70 AUC). We further show that our models used distinct aspects of the anatomy to generate each prediction, such as the optic disc or blood vessels, opening avenues of further research.

 
Events



Explore Smart Cities of the future in Harvard’s Future Cities Program

Harvard University


Cambridge, MA September 21-22 at Harvard Graduate School of Design [$$$$]


StanCon 2018

StanCon Organizing Committee


Pacific Grove, CA January 10-12, 2018 at Asilomar Conference Grounds [$$$]

 
Deadlines



N+1 fish, N+2 fish

Making these fisheries sustainable means accurately counting all fish caught, including those thrown back at sea because they’re the wrong size or species. Managers require fishermen to monitor that discarded catch, and some fishermen recently started carrying video cameras that record fish as they’re returned to the water. But humans still have to watch hours and hours of video footage to extract the number, size, and species of discarded catch. Can you help automate the video review and make it cheaper and easier to keep track of the fish in (and out) of the sea? Deadline for competition entries is October 30.
 
Tools & Resources



Moving Real-Time Data Flow Across Cloud Providers

Metamarkets, Charles Allen


“Eventually in the course of data growth, a company needs to make a major migration of data or processes from one physical location to another. This post is the story of how we moved a real-time data flow across cloud providers using Kafka, Samza, and some creative engineering.”


Collecting Data When the Going Gets Tough

Abt Associates, Abt Perspectives blog, Kate Hausdorff


In March 2017, two days before I was supposed to leave for Northern Nigeria to set up a final round of survey data collection for an evaluation, two German expats were kidnapped from a village in the area and held for ransom. My role as an analyst on the multi-donor AgResults initiative – a project to encourage and reward high-impact agricultural innovations that promote global food security, health and nutrition – was to supervise training and the beginning of field work with the local survey firm we hired to ensure that we received clean, usable data.

We met with our security team several times, and they gave us satellite phones and guidance on areas to avoid, including instructions to stay in the city. Though I was assured the city itself was safe and peaceful, this trip would clearly be quite different than my previous survey launches. Conducting field work in an insecure and high-threat climate produces challenges that require adaptation and contingency plans.


Deep Learning (DLSS) and Reinforcement Learning (RLSS) Summer School, Montreal 2017

VideoLectures.NET


The Deep Learning Summer School (DLSS) is aimed at graduate students and industrial engineers and researchers who already have some basic knowledge of machine learning (and possibly but not necessarily of deep learning) and wish to learn more about this rapidly growing field of research. [27 video lectures available]


Introducing KSQL: Open Source Streaming SQL for Apache Kafka

Confluent, Neha Narkhede


“I’m really excited to announce KSQL, a streaming SQL engine for Apache Kafka™. KSQL lowers the entry bar to the world of stream processing, providing a simple and completely interactive SQL interface for processing data in Kafka. You no longer need to write code in a programming language such as Java or Python! KSQL is open-source (Apache 2.0 licensed), distributed, scalable, reliable, and real-time. It supports a wide range of powerful stream processing operations including aggregations, joins, windowing, sessionization, and much more.”
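
For readers new to stream processing, the "windowing" and "aggregations" mentioned in the announcement can be illustrated with a small sketch. This is plain Python, not KSQL and not how KSQL is implemented; the function name `tumbling_window_counts` and the sample events are hypothetical, chosen only to show the idea of counting events per key in fixed, non-overlapping time windows.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    """
    counts = Counter()
    for timestamp, key in events:
        # Floor the timestamp to the start of its window, so every
        # event lands in exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical page-view events: (seconds since start, region)
events = [(0, "us"), (12, "us"), (29, "eu"), (30, "us"), (61, "eu")]
result = tumbling_window_counts(events, window_seconds=30)

for (window_start, region), count in sorted(result.items()):
    print(window_start, region, count)
```

A streaming engine would do the equivalent continuously over an unbounded stream and would also have to handle late or out-of-order events, which this batch sketch ignores.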


How to ‘Flip’ a Classroom

The George Washington University, GW Today


The George Washington University Teaching and Learning Center hosted a workshop on flipped learning … “‘Flipping’ is a response to a basic structural problem of traditional teaching models, in which students generally do the easiest work—learning vocabulary, new ideas and basic principles—in class, where they have the most access to help from peers and teachers.”

 
Careers


Postdocs

Postdoctoral Fellowship: Simulation of Information Diffusion in Online Social Networks



Indiana University, Center for Complex Networks and Systems Research; Bloomington, IN

Postdoc



Northwestern University, Northwestern Institute on Complex Systems (NICO) and Kellogg School of Management; Evanston, IL

Postdoctoral Position (2)



University of California-Berkeley, Department of Mathematics; Berkeley, CA

Tenured and tenure track faculty positions

Tenure-track faculty member



Harvard University, Center for Brain Science; Cambridge, MA
