Data Science newsletter – January 24, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for January 24, 2017


Data Science News

How Disordered Proteins Are Upending Molecular Biology

Quanta Magazine, Alla Katsnelson


Structure equals function: If there’s one thing we all learned about proteins in high school biology, that would be it. According to the textbook story of the cell, a protein’s three-dimensional shape determines what it does — drive chemical reactions, pass signals up and down the cell’s information superhighway, or maybe hang molecular tags onto DNA. For more than a century, biologists have thought that the proteins carrying out these functions are like rigid cogs in the cell’s machinery.

Of course, exceptions would occasionally crop up. A scientist might bump into a protein that performed its functions perfectly well yet didn’t have a rigid structure. Most researchers chalked these cases up to experimental error, or dismissed them as insignificant outliers.

More recently, however, biologists have begun paying attention to these shapeshifters. Their findings are tearing down the structure-function dogma.

Ground: A data context service

Adrian Colyer, the morning paper


Hellerstein et al. argue that in addition to the three V’s of big data, we should be worrying about the ‘ABCs’ of data context: Applications, Behaviour, and Change.

Fashion’s Next Revolution Is All About Machine Learning

Minutehack, Eric Bowman


Are [consumers] just browsing? Looking for something specific? Going deeper, we can analyze data from returns and integrate consumer feedback about size, taste, and shape to enhance our understanding of the individual.

Machines can analyse this information, weaving it together to create a tapestry of consumer tastes and preferences. Over time, machines will begin to understand what a consumer is searching for or what they want to achieve. And then they will be able to offer recommendations that are on par with those of a highly skilled stylist.

University Data Science News

Shulamit Kahn, an economist at Boston University, found that doing a postdoc comes at a cost, with postdocs “giv[ing] up about one-fifth of their earning potential in the first 15 years after finishing their doctorates—which, for those who end up in industry, amounts to $239,970”. Roughly forty percent of postdocs will not find positions in academia.

Long read of the week: How statistics lost their power – and why we should fear what comes next by William Davies, who argues, “The declining authority of statistics – and the experts who analyse them – is at the heart of the crisis that has become known as ‘post-truth’ politics”.

David Baker’s lab at the University of Washington-Seattle has used metagenome sequencing data to turbo-charge our understanding of how proteins fold. They found 140 previously unknown protein folds and predicted structural models for an additional 474. Helpfully, Quanta’s Alla Katsnelson has a great piece of science writing on protein folding.

Columbia University has launched a “MicroMasters in AI” in a rebrand of the degree formerly known as a certificate. Most credits from the MicroMasters will transfer to a full master’s degree, if students wish to continue.

DeepMind, now owned by Alphabet, has announced it will partner with University College London on a master’s degree module in machine learning. Beat that, competing master’s programs.

Cornell Tech’s first tenant in The Bridge will be investment firm Two Sigma.

UCLA iSchool faculty scraped climate science data from federal websites, fearing it would disappear under Trump’s administration. A similar event is taking place at NYU on February 4th, and both were inspired by a hackathon held at the University of Toronto last December to preserve US environmental data.

NYU announced plans to invest $500m in its new-ish Brooklyn campus, home to the Tandon School of Engineering and the Center for Urban Science and Progress (CUSP).

Harvard Medical School researchers used ~750,000 records and predictive algorithms developed by Israeli company MedAware to flag erroneous prescriptions. The study had a false positive rate of 25 percent: better than the status quo, but with plenty of room for improvement.

Geoff Hinton, a major figure in the development of neural networks who now splits his time between Google and the University of Toronto, gave an interview to Gigaom. He worries about what will happen when authoritarian leaders have access to armies of weaponized bots. Don’t we all.

The Laura and John Arnold Foundation, a new-ish Texas-based grantor that gives to open science initiatives, was profiled in Wired.

UC-Berkeley launches RISELab, a machine learning center and successor to AMPLab. It will be funded by a cornucopia of leading tech companies and banks: Amazon Web Services, Ant Financial, Capital One, Ericsson, GE Digital, Google, Huawei, Intel, IBM, Microsoft, and VMware.

Alexander Petersen at UC-Merced found troubling evidence of behavior ranging from “editorial apathy to misconduct” at megajournal PLOS ONE between 2006 and 2015.

The University of Rochester Medical School will receive up to $9m from the Centers for Disease Control and Prevention to surveil the globe for signs of emerging infectious disease.

Harvey Mudd has a CS cohort made up of 55 percent women, well above the 16 percent average across universities. They changed their curriculum, outreach strategy, and gender mix of senior faculty to make their department more attractive and welcoming to women students.

Nafundi receives multiyear grant to integrate malaria data



In 2015, the World Health Organization (WHO) announced a new strategy for malaria control and elimination, one key goal of which was eliminating malaria in at least 10 countries by 2020. In the Eliminating Malaria report that WHO published, action-oriented surveillance data was noted as one of the core needs.

Malaria surveillance data can only trigger effective actions when the collected data are monitored and used. In countries where malaria elimination is a priority, collected data are stored across multiple organizations and in multiple systems. The distributed nature of these systems prevents the fast and integrated analysis that is critical for data-driven decision making.

Highlights from the 2017 Neural Computation and Engineering Connection

Michael Beyeler, Ask a Swiss blog


Once a year, researchers meet at the University of Washington (UW) in Seattle as part of the Neural Computation and Engineering Connection to discuss what’s new in neuroengineering and computational neuroscience. Organized by the UW Institute for Neuroengineering, this year’s topics ranged from brain-computer interfaces to rehabilitative robotics and deep learning, with plenary speakers such as Marcia O’Malley (Rice), Maria Geffen (University of Pennsylvania), and Michael Berry (Princeton).

AI Is Your Doctor’s Next Best Friend

Medium, Mike McCormick


Last week the annual JPMorgan Healthcare Conference descended on San Francisco along with hundreds of badge-toting corporate execs and startups hoping to collaborate with them — or perhaps put them out of business.

The conference got me thinking more deeply about the future of healthcare — specifically how the technologies we’re interested in at Comet Labs (i.e. AI and robotics) will affect the industry. Having spoken with dozens of found

Two Sigma to be Inaugural Tenant at The Bridge at Cornell Tech

Cornell Tech, News & Views


Forest City Ratner Companies today announced that Two Sigma Investments, LP, the pioneering tech and investment firm, is the first company to be selected to locate at The Bridge at Cornell Tech. The Bridge is a first-of-its-kind building that will house an extraordinary mix of cutting-edge companies working alongside groundbreaking Cornell academic teams: from recent Cornell Tech graduates hustling to commercialize a new idea, to start-ups on the verge of explosive growth, and established companies developing leading edge technologies and products.

This App Offers Personal Recommendations While Keeping Your Data Private

MIT Technology Review, Rachel Metz


Personalization shapes your apps, services, and digital personal assistants, but to do it they tend to gather a lot of data about you. Researchers have come up with one potential alternative: an app that uses information about you to suggest news stories you may want to read, yet doesn’t send your personal information to any remote servers.

PrIA (for “Private Intelligent Assistant”) was created by researchers at Stony Brook University and IBM. For now, it requires both a laptop and an Android smartphone. The smartphone and laptop collect data from your Twitter, Facebook, and Chrome activity to create a user profile stored locally, not in the cloud. The laptop app downloads stories from Google News—without logging in to a Google account—and then, based on the profile stored only on your devices, it makes recommendations for which stories it thinks you’d want to look at. It sends those to an app on the smartphone.
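The article does not say how PrIA actually ranks stories. As a rough illustration of the general idea of on-device recommendation, here is a minimal content-based sketch, assuming the profile is a bag of word counts built from the user’s own activity text and that headlines are ranked by cosine similarity to that profile. The function names (`build_profile`, `recommend`) and the sample texts are hypothetical, not PrIA’s code.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Lowercase word tokens; a real system would also drop stop words.
    return re.findall(r"[a-z']+", text.lower())


def build_profile(activity_texts):
    # Hypothetical local profile: word counts from the user's own activity
    # (posts, likes, page titles). It lives only in this process; nothing
    # here sends it over the network.
    profile = Counter()
    for text in activity_texts:
        profile.update(tokenize(text))
    return profile


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(profile, stories, k=2):
    # Score each downloaded headline against the local profile and return
    # the top k headlines that share at least one word with it.
    scored = [(cosine(profile, Counter(tokenize(s))), s) for s in stories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]


if __name__ == "__main__":
    profile = build_profile([
        "Reading about neural networks and deep learning",
        "Shared a post on machine learning for protein folding",
    ])
    stories = [
        "New deep learning model predicts protein structures",
        "Local team wins football championship",
        "Machine learning startups raise funding",
    ]
    print(recommend(profile, stories))
```

Because the profile is an ordinary in-memory `Counter`, it can stay on the user’s devices while only the public news feed is fetched. A production system would likely add stop-word removal and TF-IDF weighting rather than raw counts.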

Ethics — the next frontier for artificial intelligence

TechCrunch, Don Basile


AI’s next frontier requires ethics built through policy. Will Donald Trump deliver?

With one foot in its science fiction past and the other in the new frontier of science and tech innovations, AI occupies a unique place in our cultural imagination. Will we live in a future where machines are as intelligent as humans, or, frighteningly, more so? We have already witnessed AI predict the outcome of the latest U.S. presidential election when many policy wonks failed.

Perhaps we are further along than we thought.

The effects of melting glaciers on tropical communities

Penn State University, Penn State News


Karl Zimmerer, professor of geography, conducts research focused on the impacts of climate change on glaciers in tropical mountains, and how this impacts agrobiodiversity and food security in communities. … Zimmerer develops and tests models of these human-environment interactions of agrobiodiversity that he and his collaborators then apply to climate change challenges, as well as to the adaptation and resilience of sustainable food systems.

Where the Cloud Won’t Work: Machine Learning for the Industrial Internet of Things

The New Stack, Mark Boyd


A quiet race is going on to set up the infrastructure needed for the industrial Internet of Things (IoT). It is generally agreed that the cloud model won’t work to manage sensor data in real time, so instead hardware and network providers are rushing to evolve their technologies and sign up industrial customers to pilot and early implementation initiatives in edge processing.

Stage one of the race is well underway, with the current focus on enabling edge processing on hardware gateways located in the field (factories, workplaces, cities, farms and buildings). To do that, many are leveraging Dockerized containers (and Moore’s Law) to do more powerful data processing. Once this infrastructure has a little more robustness behind it, introducing machine learning (ML) at the edge will spark a second wave of the race.

GEOINT and Machine Learning

Trajectory Magazine, Kristin Quinn


Dr. Nina Berry, a science and technology advisor with the Joint Improvised-Threat Defeat Organization (JIDO), gave a presentation discussing the relevance of AI and ML for the warfighter. She highlighted JIDO’s video intelligence (VIDINT) framework, which helps analysts process video to identify individuals.

“The AI we’ve been doing so far in VIDINT is showing great promise,” Berry said, adding the initiative supports the Pentagon’s Third Offset Strategy and its goal to implement more automation technology.

Dr. Catherine Cotell, the National Geospatial-Intelligence Agency’s (NGA) research anticipatory analytics lead, also discussed AI and the Third Offset Strategy. Cotell said that, in the face of a “tsunami” of commercial and small-sat data, NGA is “highly motivated” to automate not just the collection of the images but also the interpretation of them.

DARPA wants to simulate how social media spreads info like wildfire

Network World, Michael Cooney


The goal of the program, Computational Simulation of Online Social Behavior (SocialSim), is to develop what the agency calls “innovative technologies for high-fidelity computational simulation of online social behavior. SocialSim will focus specifically on information spread and evolution. Current computational approaches to social and behavioral simulation are limited in this regard.”

DARPA said such highly accurate, scalable computerized simulation of how online information spreads would help analyze strategic disinformation campaigns by adversaries and deliver critical information to local populations during disaster relief operations, and could potentially contribute to other critical missions in the online information domain.

U.S. Soldiers Will Be Armed With Machine Translators to Kill Communication Woes

Quartz, Slator


Communication is critical in most professions, but in some, clear verbal exchanges can mean the difference between life and death. Soldiers in the US Army stationed abroad, for example, must convey and understand information in languages other than English, but don’t always have access to interpreters to help them communicate.

To assist with this, the military has for years been working on a Machine Foreign Language Translation System (MFLTS), rolling out pieces of the software since 2011. The Army will “soon reach full deployment” of the platform, which runs on military and commercial devices, like laptops and phones, the translation industry publication Slator reported in December.

It’s Not the Algorithm, It’s the Data

Communications of the ACM, News, Keith Kirkpatrick


Crime in the U.S. has fallen dramatically over the past three decades, with 2014 statistics from the Federal Bureau of Investigation (FBI) noting that the number of violent crimes committed per 100,000 people in 2013 (368) was less than half the level seen in 1991 (758).

Nevertheless, the debate continues over how to maintain these lower crime rates while addressing issues of fairness in the way communities are policed, as well as how to effectively and fairly use risk-assessment tools that can be relied upon by sentencing courts or parole boards.

There are two primary issues at stake: risk-assessment algorithms, which weigh a variety of factors related to recidivism, or the likelihood an individual will commit another crime and wind up back behind bars; and predictive policing, which has been described as using data analytics and algorithms to better pinpoint where and when a crime might occur, so police resources can be more efficiently deployed.

Can artificial intelligence expand health care access?

MIT Sloan School of Management, Amy McMillan Bankson


With the future of the Affordable Care Act in question and debate about health care costs, coverage, and delivery methods continuing around the country, an increasing number of people in the United States are relying on telemedicine for their health needs. According to a report by the research firm Tractica, telemedicine use, or clinical services provided in a remote setting, is projected to increase 700 percent by 2020.

Remedy, co-founded by several MIT and Princeton alumni, is trying to ease the pain of high-deductible plans by offering affordable access to doctors, augmented by artificial intelligence. “Remy,” the company’s automated medical companion, similar to a chatbot, can already perform tasks such as collecting and summarizing patient medical history and complaints for physicians.

Cancer Studies Are Fatally Flawed. Meet the Young Billionaire Who’s Exposing the Truth About Bad Science

WIRED, Science, Sam Apple


In July 2012, [Brian] Nosek received an email from an institution whose name he didn’t recognize: the Laura and John Arnold Foundation. A Google search told him that the Arnolds were a young billionaire couple in Houston. John, Nosek learned, had made his first millions as a wunderkind natural gas trader at Enron, the infamous energy company, and he’d managed to walk away from Enron’s 2001 collapse with a seven-figure bonus and no accusations of wrongdoing attached to his name. After that Arnold started his own hedge fund, Centaurus Energy, where he became, in the words of one hedge fund competitor, “the best trader that ever lived, full stop.” Then Arnold had abruptly retired at the ripe age of 38 to focus full time on philanthropy.

As Nosek tells it, John Arnold had read about the Reproducibility Project in The Chronicle of Higher Education and wanted to talk. By the following year, Nosek was cofounding an institution called the Center for Open Science with an initial $5.25 million grant from the Arnold Foundation. More than $10 million more in Arnold Foundation grants have come since.

Four Questions For: Geoff Hinton

Gigaom, Byron Reese


You’ve been referred to as the “godfather of neural networks.” Do you believe you’ll see true artificial intelligence in your lifetime?

It depends on what you mean by true artificial intelligence. If you mean autonomous agents with human level abilities at perception, natural language, reasoning and motor control, probably not. However, it’s very hard to see more than about 5 years into the future so I would not rule it out. Ten years ago, most people in AI would have been very confident that there was no hope of doing machine translation using neural nets that have to get all their linguistic knowledge from the raw training data. But that is now the approach that works best and it has just halved the gap in quality between machine translations and human translations.


NYU Spring Welcome 2017

New York, NY Spring Welcome takes place during the first three weeks of the spring semester and is a series of events, info sessions, and more for all students to kick-start the new semester. Whether you are at NYU for the first time or back for the spring semester, you won’t want to miss this. This guide is your resource for all things NYU and the more than 200 exciting events going on.

AAAI-17 AI in Practice

San Francisco, CA The Thirty-First Conference on Artificial Intelligence, February 4–9 [$$$$]

2017 ASA Conference on Statistical Practice

Jacksonville, FL February 23-25 [$$$]

HHS Health Datapalooza

Washington, DC April 27-28 [$$$$]

CSV,Conf is back in 2017! Submit talk proposals on the art of data collaboration.

Portland, OR Conference will be May 2-3. Talk proposals for CSV,Conf close February 15, so don’t delay, submit today! The deadline is fast approaching and we want to hear from a diverse range of voices from the data community.

Tools & Resources

2016 JavaScript Rising Stars

Michael Rambeau


The JavaScript community is going full speed on innovation: what was once trendy becomes old-fashioned a few months later.

2016 is over. Do you think you missed something important? Don’t worry, we are going to review the main trends.

The State of Information Visualization, 2017

Robert Kosara, Eager Eyes blog


Between Surprise Maps, Dear Data, and election graphics, 2016 was an exciting year. There have also been some gaps, however, notably around storytelling. How far have we come, and where is visualization going?

How Bloomberg Integrated Learning-to-Rank into Apache Solr

Tech at Bloomberg


“The latest milestone in open source development at Bloomberg is the incorporation of the Learning-to-Rank (LTR) plug-in into Apache Solr 6.4.0, which shipped this week.”


Internships and other temporary positions

Freelance Contributions

The Pudding; New York, NY

Full-time, non-tenured academic positions

Program Support Analyst, MetroLab Network

Heinz College, Carnegie Mellon University; Washington, DC
