NYU Data Science newsletter – August 23, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 23, 2016


 
Data Science News



Of prediction and policy – Governments have much to gain from applying algorithms to public policy, but controversies loom

The Economist


from August 20, 2016

FOR frazzled teachers struggling to decide what to watch on an evening off, help is at hand. An online streaming service’s software predicts what they might enjoy, based on the past choices of similar people. When those same teachers try to work out which children are most at risk of dropping out of school, they get no such aid. But, as Sendhil Mullainathan of Harvard University notes, these types of problem are alike. They require predictions based, implicitly or explicitly, on lots of data. Many areas of policy, he suggests, could do with a dose of machine learning.

Machine-learning systems excel at prediction. A common approach is to train a system by showing it a vast quantity of data on, say, students and their achievements. The software chews through the examples and learns which characteristics are most helpful in predicting whether a student will drop out. Once trained, it can study a different group and accurately pick those at risk. By helping to allocate scarce public funds more accurately, machine learning could save governments significant sums. According to Stephen Goldsmith, a professor at Harvard and a former mayor of Indianapolis, it could also transform almost every sector of public policy.
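The train-then-predict loop described above can be sketched in a few lines. This is only an illustrative toy, not the method used by any system mentioned in the article: the single feature (absence rate), the data values, and the function names `train_threshold` and `predict_at_risk` are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: (absence_rate, dropped_out). Not from the article.
Example = Tuple[float, bool]


def train_threshold(examples: List[Example]) -> float:
    """Pick the absence-rate cutoff that misclassifies the fewest training examples."""
    candidates = sorted({rate for rate, _ in examples})
    best_cutoff, best_errors = candidates[0], len(examples) + 1
    for cutoff in candidates:
        # A student is predicted "at risk" when the rate is at or above the cutoff.
        errors = sum((rate >= cutoff) != dropped for rate, dropped in examples)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


def predict_at_risk(cutoff: float, absence_rate: float) -> bool:
    """Apply the learned cutoff to a student the model has not seen before."""
    return absence_rate >= cutoff


# "Training": learn from students whose outcomes are already known.
training = [
    (0.02, False), (0.05, False), (0.08, False), (0.12, False),
    (0.20, True), (0.25, True), (0.30, True),
]
cutoff = train_threshold(training)

# "Prediction": flag students in a different group.
new_students: Dict[str, float] = {"A": 0.03, "B": 0.28}
flags = {name: predict_at_risk(cutoff, rate) for name, rate in new_students.items()}
print(cutoff, flags)
```

Real systems learn from many characteristics at once and are judged on held-out data, but the shape is the same: fit on known outcomes, then score a new group.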

 

Tweet of the Week

Twitter


from August 23, 2016

 

NodeXL Graph Gallery: Graph Details

NodeXL Graph Gallery, Marc Smith


from August 22, 2016


The graph represents a network of 2,322 Twitter users whose tweets in the requested range contained “datamining”, or who were replied to or mentioned in those tweets. The network was obtained from the NodeXL Graph Server on Monday, 22 August 2016 at 05:50 UTC.

 

How The Brightest Minds In AI Are Using Gaming to Reinforce Learning

The Machine Learning Conference


from August 19, 2016

Kurzweil’s Law Of Accelerating Returns can be summed up by saying “Humans use the best tools (technology) available to create the next generation of technology”, enabling exponential technological improvement. Open Source projects and the proliferation of knowledge sharing continue to accelerate these effects. We’re excited and inspired by some of these recent developments in gaming and AI!

 

Databite No. 80: Paula Kift + Mark Latonero

Data & Society


from August 10, 2016

Phones, social media, mobile apps, online maps, instant messaging, translation websites, wire money transfers, cell phone charging stations, and Wi-Fi hotspots have all created a new digital infrastructure for global movement. This infrastructure is as critical to refugees today as roads or railways. But digital infrastructures for movement can just as easily be turned into infrastructures for control by governments, corporations, and even criminals. Indeed, governments are increasingly experimenting with similar digital technologies to reinforce their border controls—to collect, process, and instrumentalize data in order to interfere with the movement of “undesirable” migrants.

Mark and Paula will explore these tensions and discuss how this new digital infrastructure 1) facilitates and constrains the flow of data and people, 2) conceals and constructs identity and status, and 3) affects refugees’ fundamental rights to privacy, data protection, and asylum. [video, 1:05:17]

 

Understanding and documenting the differences between software and data in the context of citation

GitHub – danielskatz


from August 18, 2016

This repository is intended to be used to discuss and document the differences between software and data in the context of citation in the research record. … We start with the idea that software, while similar to data in terms of not traditionally having been cited in publications, is also different than data. In the context of research (e.g., in science), the term “data” usually refers to electronic records of observations made in the course of a research study (“raw data”) or to information derived from such observations by some form of processing (“processed data”), as well as the output of simulation or modeling software (“simulated data”). In the following, we use the term “data” in this specific sense.

 

UW student project taps ORCA cards, unlocks data trove

Seattle Times


from August 19, 2016

Students in a UW summer fellowship program called Data Science for Social Good work to coax valuable information from overlooked data, and one potential upshot might be improved bus service.

 

Could data help solve Seattle’s transportation challenges?

GeekWire


from August 20, 2016

For Seattleites, traffic and transportation can be serious obstacles. Seattle’s booming population has led to busier streets and buses, and for many residents that poses a challenge to working, getting to school, or taking care of their kids.

But the solutions to Seattle’s transportation troubles may be closer than we imagined — buried deep in data sets. This summer, two groups at the University of Washington’s Data Science for Social Good (DSSG) program set out to harness this data, and create tools that will help citizens and transport officials tackle transportation problems.

 

How This Hedge Fund Robot Outsmarted Its Human Master

Bloomberg


from August 21, 2016

Yoshinori Nomura felt like weeping. It was the morning of June 24, Brexit day, and markets were moving against him.

Well, not against him, exactly. It was the hedge fund manager’s self-learning computer program that had placed the bet, selling Japanese stock-index futures before a sizable market advance. Nomura had anticipated a rally, but decided not to interfere, and his fund was paying the price.

Then, in an instant, everything changed. When new vote counts signaled Britain was going to leave the European Union, a burst of selling sent Japanese shares to their biggest drop in five years. By luck or design, Nomura’s Simplex Equity Futures Strategy Fund ended the day with a 3.4 percent gain, one of its best results in three months of trading.

“The machine was right after all,” said Nomura, who spent more than three years refining his trading program and now oversees about 3.5 billion yen ($35 million) in the fund, one of the first in Japan to utilize artificial intelligence technology.

 

Apple Acquires Personal Health Data Startup Gliimpse

Fast Company


from August 22, 2016

Apple’s ambitions in the health sector continue to expand, with its digital health team making its first known acquisition—personal health data startup Gliimpse, Fast Company has learned.

Silicon Valley-based Gliimpse has built a personal health data platform that enables any American to collect, personalize, and share a picture of their health data. The company was started in 2013 by Anil Sethi and Karthik Hariharan. Sethi is a serial entrepreneur who has spent the past decade working with health startups, after taking his company Sequoia Software public in 2000. He got his start as a systems engineer at Apple in the late 1980s.

 

Digital Feeding Frenzy Erupts: Internet of Things, Analytics Drive M&A Activity To Record Levels

Forbes, Joe McKendrick


from August 18, 2016

Just when you thought the tech sector couldn’t get any more hyper than it already is, word comes that the Internet of Things (IoT) and data analytics — two emerging cornerstones of the digital economy — are fueling a feeding frenzy of unparalleled proportions in this space. The second quarter of 2016, just wrapped up, broke the all-time record for technology mergers and acquisitions (M&A) valued at or above $1 billion. A lot of this interaction was driven by the push to adopt Internet of Things and data analytics capabilities.

That’s the word from EY, which just released its latest summation of M&A activity, listing 28 deals that took place between April and June of this year at or above the $1 billion mark.

 

Why all findings from clinical trials need to get published

STAT


from August 19, 2016

When a bearded dude at a Brooklyn coffee house says he’s shopping his novel around to agents, no one expects to see it on bookstore shelves in their lifetime. When a scientist says she’s got a pile of data ready to publish, well, that seems like it should see the light of day. But it just as likely won’t.

New research suggests that nearly half of all clinical trials involving kids go unfinished or unpublished — either because the researchers lose interest in the work or take up more pressing projects, or, in some cases, because the companies that funded the studies don’t want the results to get out.

That news won’t come as a surprise to anyone who has followed the fate of studies in general. But it should catch the notice of the FDA: The pharma industry currently gets a special bonus, in the form of extended exclusive marketing rights, for testing their drugs in kids — a rule that was implemented to accelerate research into childhood ailments.

 

Snapchat’s Move Into Search Was Only a Matter of Time

The Motley Fool


from August 20, 2016

Though its exact application isn’t clear at the moment, Snapchat’s recent purchase of search app Vurb likely helps further its budding rivalry with the likes of Facebook, Alphabet, and others.

 
Events



DNA.Land User Group Meeting




 

Talkabot – The conference for botmakers. September 28 and 29 in Austin, TX.



Austin, TX Talkabot is a two-day convergence where we will explore the past, present and future of bots in commerce, journalism and entertainment. Talkabot is for developers, entrepreneurs and technologists looking to understand more about what it takes to build a successful messaging experience. [$$$]
 

PAPIs ’16 — PAPIs.io — Where makers of Predictive Applications & APIs meet



Boston “PAPIs ’16 will be the 3rd International Conference on Predictive Applications and APIs” — Monday-Wednesday, October 10-12. [$$]
 
Deadlines



Take O’Reilly’s 2017 Data Science Salary Survey

deadline: Survey

As a data professional, you are invited to share your valuable insights. Help us gain insight into the demographics, work environments, tools, and compensation of practitioners in our growing field. All responses are reported in aggregate to assure your anonymity.

 

Web, Social Media, and Cellphone Data for Demographic Research

deadline: Conference

Bellevue, WA “There is unfortunately very limited communication between population researchers and data scientists. This workshop is intended to foster communication and exchange between the two communities.” — Workshop precedes Socinfo 2016 on Monday, November 14.

Deadline for submissions is Friday, September 30.

 
Tools & Resources



Spark for Scale: Machine Learning for Big Data

SocialCops Blog


from August 23, 2016

Recently we shared an introduction to machine learning. While making machines learn from data is fun, the data from real-world scenarios often gets out of hand if you try to implement traditional machine-learning techniques on your computer. To actually use machine learning with big data, it’s crucial to learn how to deal with data that is too big to store or compute on a single computing machine.

Today we will discuss fundamental concepts for working with big data using distributed computing, then introduce the tools you need to build machine learning models. We’ll start with some naive methods of solving problems, which are meant only as an example. As we move forward, we will make things more realistic.
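As a rough illustration of the idea behind distributed computing (not code from the SocialCops post, and not the Spark API), the sketch below computes a mean over data that is split into partitions. Each partition is summarized independently, as a separate worker could do, and only the small summaries are combined. The function names and the sample data are assumptions made up for this example.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Partial = Tuple[float, int]  # (sum of values, number of values)


def partial_sum_count(partition: List[float]) -> Partial:
    """'Map' step: summarize one partition without looking at any other."""
    return (sum(partition), len(partition))


def combine(a: Partial, b: Partial) -> Partial:
    """'Reduce' step: merge two summaries into one."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions: Iterable[List[float]]) -> float:
    """Mean of all values, computed from per-partition summaries only."""
    partials = map(partial_sum_count, partitions)
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count


# Hypothetical data, pretending each inner list lives on a different machine.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = distributed_mean(partitions)
print(mean)
```

The point of the design is that no single step needs the whole dataset in memory: the per-partition work can run in parallel, and the combine step only handles one small tuple per partition.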

 

Google Brain Robotics Data

Google Research Blog, Sergey Levine


from August 23, 2016

While we’ve recently seen great strides in robotic capability, the gap between human and robot motor skills remains vast. Machines still have a very long way to go to match human proficiency even at basic sensorimotor skills like grasping. However, by linking learning with continuous feedback and control, we might begin to bridge that gap, and in so doing make it possible for robots to intelligently and reliably handle the complexities of the real world. [The data used in this research is now available.]

 
Careers


Tenured and tenure track faculty positions

Assistant Professor or Associate Professor, Department of Communication
 

University of California-Davis; Davis, CA
 

Assistant Professor, Management and Organizations, Kellogg School of Management
 

Northwestern University; Evanston, IL
 
Full-time positions outside academia

Data Scientist, Marketplace Belonging
 

Airbnb; San Francisco, CA
 

Data Analyst or Director
 

Turnaround for Children; New York, NY
 
Postdocs

Strategic Data Project Data Fellow, Center for Education Policy Research
 

Society for Research on Educational Effectiveness; Cambridge, MA
 
Internships and other temporary positions

Change in NSF Graduate Research Fellowship Program (GRFP) Solicitation
 

Computing Community Consortium, CCC Blog
 

PhD Studentship: Interfacing Scientific Data And Complex Policy
 

University of Warwick; Coventry, England
 
