Data Science newsletter – May 26, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 26, 2017


Data Science News

Improving official statistics in emerging markets using machine learning and mobile phone data

EPJ Data Science, Eaman Jahani


Mobile phones are one of the fastest growing technologies in the developing world with global penetration rates reaching 90%. Mobile phone data, also called call detail records (CDR), are generated every time phones are used and recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, the fact that most phones in developing countries are prepaid means that the data lacks key information about the user, including gender and other demographic variables. This precludes numerous uses of this data in social science and development economics research. It furthermore severely prevents the development of humanitarian applications such as the use of mobile phone data to target aid towards the most vulnerable groups during crises. We developed a framework to extract more than 1400 features from standard mobile phone data and used them to predict useful individual characteristics and group estimates. We here present a systematic cross-country study of the applicability of machine learning for dataset augmentation at low cost. We validate our framework by showing how it can be used to reliably predict gender and other information for more than half a million people in two countries. We show how standard machine learning algorithms trained on only 10,000 users are sufficient to predict individuals’ gender with an accuracy ranging from 74.3 to 88.4% in a developed country and from 74.5 to 79.7% in a developing country using only metadata. This is significantly higher than previous approaches and, once calibrated, gives highly accurate estimates of gender balance in groups. Performance suffers only marginally if we reduce the training size to 5,000, but significantly decreases in a smaller training set. We finally show that our indicators capture a large range of behavioral traits using factor analysis and that the framework can be used to predict other indicators of vulnerability such as age or socio-economic status.
Mobile phone data has a great potential for good and our framework allows this data to be augmented with vulnerability and other information at a fraction of the cost. [full text]
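
The abstract says individual predictions, "once calibrated," give accurate group-level gender balance but does not spell out the calibration. As a rough illustration only, here is one standard correction (often called adjusted classify-and-count), which may well differ from what the authors did. It assumes the classifier's true and false positive rates were measured on a labeled validation sample; the function names and the toy labels are hypothetical.

```python
def error_rates(true_labels, predicted_labels):
    """Return (true positive rate, false positive rate) on a labeled sample.

    Labels are 1 for the class of interest and 0 otherwise.
    """
    pairs = list(zip(true_labels, predicted_labels))
    positives = sum(1 for t, _ in pairs if t == 1)
    negatives = sum(1 for t, _ in pairs if t == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("validation sample needs both classes")
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    return true_pos / positives, false_pos / negatives


def adjusted_share(predicted_labels, tpr, fpr):
    """Correct the raw predicted share of class 1 for known classifier error.

    The observed share equals tpr * p + fpr * (1 - p), where p is the true
    share, so p = (observed - fpr) / (tpr - fpr). The result is clipped to
    the interval [0, 1].
    """
    if not predicted_labels:
        raise ValueError("group is empty")
    if tpr <= fpr:
        raise ValueError("classifier must beat chance (tpr > fpr)")
    observed = sum(predicted_labels) / len(predicted_labels)
    corrected = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, corrected))


# Hypothetical validation sample: 5 users of each class.
validation_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
validation_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
tpr, fpr = error_rates(validation_true, validation_pred)

# Hypothetical unlabeled group: 62 of 100 users predicted as class 1.
group_pred = [1] * 62 + [0] * 38
print(round(adjusted_share(group_pred, tpr, fpr), 3))
```

The point of the correction is that a classifier that is only 75 to 90% accurate on individuals can still yield a much tighter group estimate, because its errors are systematic and can be backed out.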

Vast set of public CVs reveals the world’s most migratory scientists

Science, Latest News, John Bohannon


Scientists are migratory beasts. It’s just the nature of the job: You spend your days at the border of human knowledge. Depending on the topic, only a dozen people may deeply understand your research—let alone help you push it further—and they are scattered across the world. For many, completing a Ph.D., doing postdoctoral research, and landing a permanent job all in one country is impossible. And so you wander.

Consider Rimantas Kodzius, possibly the most migratory scientist alive. Since leaving his home country of Lithuania in 1995 for graduate school in Austria, Kodzius, 42, has crossed 10 national borders. “I arrived in China just a week ago,” the synthetic biologist wrote in an email in March. “My home is wherever I work, where I live.”

ImageNet Classification with Deep Convolutional Neural Networks

Communications of the ACM; Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton


We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. [full text]
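
To make the "60 million parameters" figure concrete, here is a small back-of-the-envelope count. The abstract only gives the layer counts, so the kernel sizes and channel widths below are assumptions taken from the architecture section of the original paper (including its two-GPU split, which halves the input channels seen by some convolutional layers). Treat it as a sanity check, not a reimplementation.

```python
# Layer shapes are assumed from the original AlexNet paper, not from the
# abstract above. Each conv entry is
# (name, kernel_height, kernel_width, input_channels_per_kernel, num_kernels).
CONV_LAYERS = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 48, 256),    # 48 = 96 / 2 because of the two-GPU split
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 192, 384),   # 192 = 384 / 2
    ("conv5", 3, 3, 192, 256),
]

# Each fully connected entry is (name, inputs, outputs).
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),         # feeds the 1000-way softmax
]


def conv_params(kernel_h, kernel_w, in_channels, num_kernels):
    """Weights plus one bias per kernel."""
    return kernel_h * kernel_w * in_channels * num_kernels + num_kernels


def fc_params(inputs, outputs):
    """Weights plus one bias per output unit."""
    return inputs * outputs + outputs


def count_parameters():
    """Return a dict of per-layer parameter counts."""
    counts = {}
    for name, kh, kw, cin, n in CONV_LAYERS:
        counts[name] = conv_params(kh, kw, cin, n)
    for name, inputs, outputs in FC_LAYERS:
        counts[name] = fc_params(inputs, outputs)
    return counts


counts = count_parameters()
total = sum(counts.values())
fc_total = sum(counts[name] for name, _, _ in FC_LAYERS)

for name, value in counts.items():
    print(f"{name}: {value:,}")
print(f"total: {total:,}")
print(f"share in fully connected layers: {fc_total / total:.1%}")
```

Under these assumed shapes the total lands at roughly 61 million, in line with the abstract, and the three fully connected layers account for the vast majority of the weights. That is consistent with the authors applying dropout to those layers specifically.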

University Data Science News

A group of 14 neurologists, psychologists, and an anthropologist has published a short exhortation to share code and data in Nature Neuroscience. They conclude by noting that “the sociological barriers to code sharing are harder to overcome than the technical ones. Currently, academic success is strongly linked to publications and there is little recognition for producing and sharing code. Code may also be seen as providing a private competitive advantage to researchers.” In other words, for all the lip service that reproducible workflows, including code and data sharing, receive, there is deep normative resistance to sharing. Institutional habits for which there are no positive incentives but overt and covert negative incentives (time consuming, suspicion of being scooped) are unlikely to change.

Joon Son Chung at the University of Oxford is working on an audiovisual AI application that allows video of the mouths of speakers to be properly synced with the audio, even if the audio is dubbed. This could make it much more pleasant to watch dubbed foreign language TV and movies. (Personal confession: I learned Spanish slang by watching Sex and the City – originally taped in English so I could read their lips when necessary – dubbed with Spanish speakers who may have done work on telenovelas.)

Elsewhere at the University of Oxford, Philip Howard and Robert Gorwa penned an op-ed accusing Facebook of impeding research into the way fake Facebook accounts that appear to be grassroots activist hubs could be signs of foreign meddling in domestic elections. The authors note that while Facebook has removed 30,000 fake accounts ahead of the French elections and thousands more ahead of the Brits’ next election, the company is not doing enough. “Facebook’s refusal to collaborate with scientists and share data has made it difficult to know how many voters are affected or where this election interference comes from.” How do we get industry partners to adopt more reproducible research standards?

Twitter is a friendlier platform for researchers, though it is certainly not free from malfeasance. A group of researchers at Indiana University have created a repository of public datasets and tools to detect, study, and/or avoid bots because “deception and detection technologies are in an arms race.”

James Gosling, father of Java, has joined Amazon Web Services as Distinguished Engineer. Interestingly, Gosling has warned about cloud lock-in before.

NYU invited Pharrell Williams to be our university-wide commencement speaker at Yankee Stadium. UC-Berkeley Statistics Department had DJ Khaled for their much more intimate, ~400 person departmental graduation. Graduation speaking so fly.

Brown University computer scientists have added a safeguard to prevent researchers from wandering through their data in visualization software and casually clicking their way to strong correlations that are actually “completely bogus.” My geeky side is very excited about this.

Astronomers are able to predict the ages of red giants with 99 percent accuracy by converting spectrum data into an image the computer can recognize.

When the public assesses scientists, bias rears its confounding head. Researchers at the University of Essex and the University of Cambridge measured the correlation between scientists’ facial appearance and the level of interest members of the public had in their research. Oh, dear. Why go asking questions like this…but of course we want to know what they found: “People were more interested in learning about the work of scientists who were physically attractive and who appeared competent and moral….interest was somewhat stronger for older scientists and slightly lower for females than for males, with little difference between white and nonwhite scientists and no consistent effects of participant gender or age.” How does a person even “appear” moral? What does that entail?

Scientists have international, migratory careers. We all know that already, I assume, but there’s a cool set of visualizations here, from Science. The UK sends the largest share (32%) of newly minted PhDs out of the country, with the US (19%) close behind…and also happy to receive foreign scientists.

Geoff Hinton et al. have a paper out in Communications of the ACM that explains how they did “considerably better than the previous state-of-the-art” model for classifying the ImageNet database.

Why AI gets the language of games but sucks at translating languages

VentureBeat, Hannes Ben and Forward3D


As seen at Google DeepMind’s conference this week, machine learning with AI has seeped into a number of industries in recent years.

Whereas in the past it was more a topic of discussion on theoretical applications, we now see machine learning being applied in smart cars, video games, digital marketing, virtual personal assistants, chatbots, and other areas of daily life. As AI moves to disrupt and improve more sectors, there are still barriers to overcome before we need to fear for our jobs. In a recent translation competition, human beings beat AI, but it’s only a matter of time before machines become digital babel fish.

Virus Hunters Draw a Map of Zika’s Spread With DNA

WIRED, Science, Megan Molteni


In April 2015, researchers in Brazil reported the first case of Zika virus—finally putting a name to the mysterious rash, fever, and joint pain-causing illness that had been swarming the northeast corner of the country. By the time the World Health Organization declared Zika a global health emergency nearly a year later, the outbreak had spread to 26 countries and territories in the Americas, infecting hundreds of thousands of people and leaving many babies with an incurable developmental defect called microcephaly.

Since then, researchers have been racing to develop treatments and vaccines, the first of which entered mid-stage human trials at the end of March. But according to new genetic evidence published today, public health efforts to contain and fight the disease could have—and should have—gotten underway much sooner.

GOAI: Keeping Databases, Analytics, And Machine Learning All On The GPU

The Next Platform, Timothy Prickett Morgan


Moving data is the biggest problem in computing, and probably has been since there was data processing if we really want to be honest about it. Because of the cost of bandwidth, latency, energy, and iron to do multiple stages of processing on information in a modern application that might include a database as well as machine learning algorithms against stuff stored there as well as from other sources, you want to try to do all your computation from the memory of one set of devices.

That, in a nutshell, is what the GPU Open Analytics Initiative is laying the foundation for with regard to GPU-accelerated computing. The effort, which was gently nudged into being behind the scenes by Nvidia, includes as founding members MapD, the creator of one of the upstart GPU databases that not coincidentally is open sourcing its core database as part of the GOAI launch; Continuum Analytics, a creator of Python tools for GPUs; and H2O, which provides machine learning algorithms that run on GPUs. Others including BlazingDB, another GPU database maker, Graphistry, a provider of visual analytics tools, and the Gunrock project for graph processing at the University of California Davis, are contributing their expertise to the initiative.

With $7.5M in funding, Viz will use AI for time-sensitive image analysis

MobiHealthNews, Jonah Comstock


Viz, a San Francisco-based company looking to apply artificial intelligence to medical imaging, has raised $7.5 million in first-round funding. The round includes two tranches — a $4.7 million round that just closed and a $2.5 million round from last fall. The other $300,000 was reserved for employees to contribute. The funding round was led by DHVC and Alphabet Chairman Eric Schmidt’s Innovation Endeavors. AME Cloud Ventures and Susa Ventures also contributed.

“AI in the form of deep learning is actually relatively simple when you think of it,” CEO Dr. Chris Mansi told MobiHealthNews. “It’s just training an algorithm to identify patterns. And doctors, when we look at medical images, we are looking for patterns that we’ve learned about and seen before and categorizing those patterns into diagnoses. So what we do at Viz is we use deep learning to help doctors make that diagnosis faster and get the right patient to the right doctor at the right time.”

[1705.08807] When Will AI Exceed Human Performance? Evidence from AI Experts

arXiv, Computer Science > Artificial Intelligence; Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans


Advances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the next ten years, such as translating languages (by 2024), writing high-school essays (by 2026), driving a truck (by 2027), working in retail (by 2031), writing a bestselling book (by 2049), and working as a surgeon (by 2053). Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.

Why we need to create AI that thinks in ways we can’t even imagine

IDEAS.TED.COM, Kevin Kelly


Because of a quirk in our evolutionary history, we are cruising as the only self-conscious species on our planet, leaving us with the incorrect idea that human intelligence is singular.

It is not.

Our own intelligence is a society of intelligences, and this suite occupies only a small corner of the many types of intelligences and consciousnesses that are possible in the universe. We like to call our human intelligence “general purpose,” because compared with other kinds of minds we have met, it can solve more types of problems, but as we build more and more synthetic minds we’ll come to realize that human thinking is not general at all. It is only one species of thinking.

The kind of thinking done by the emerging AIs today is already somewhat unlike human thinking.

Using AI to create new jobs

O'Reilly Radar, Tim O'Reilly


Tim O’Reilly delves into past technological transitions, speculates on the possibilities of AI, and looks at what’s keeping us from making the right choices to govern our creations. [video, 3:52]

Andreessen, Eric Schmidt, other VC firms looking for biotech experts

CNBC, Christina Farr


Silicon Valley VC funds are realizing they need to hire investors with medical backgrounds to make good bets in the multi-trillion health market.

Much of this hiring is happening quietly, but sources said that Andreessen Horowitz has quietly expanded its bio fund led by Vijay Pande (a rare example of an investor with a background in medicine and tech); Menlo Ventures is looking for investors with a biotech background; and CRV recently made a hire for its bio-engineering fund run by tech investor George Zachary.

Eric Schmidt’s fund Innovation Endeavors recently publicly posted a job for “someone who lives in the intersection of life sciences and computer science.”


Research Seminar and Happy Hour with CUSP Alumnus Patrick Atwater

NYU Center for Urban Science and Progress


Brooklyn, NY May 30, starting at 4 p.m., CUSP (1 MetroTech Center, 19th) [free, registration required]


MTA Genius Transit Challenge: Reimagining New York’s Transportation System

Competitors will be given comprehensive briefings on the transit system and will be challenged to find solutions that can be implemented with speed and efficiency. The competition will be judged by a panel of technology and transportation experts, and the State of New York will provide a $1 million “Genius” award for the best ideas in three categories.
