Sarah Brayne, a sociology professor at the University of Texas at Austin, conducted more than 100 interviews with officers and civilian employees. She went on ride-alongs in patrol cars and a helicopter, and watched data analysts answer queries from detectives. Brayne also observed divisions adopt the new technologies.
Her results were published online in the American Sociological Review last month.
Experts say that Brayne’s work is a window into the future of law enforcement. It illuminates the promise big data holds for making police work more efficient. But it also shows its perils: how data, which is generally thought to be objective and fair, can exacerbate biases.
Back in 2014, former Brookings scholar Robert Litan presciently warned that regulating broadband providers like public utilities in order to protect Net Neutrality "could one day boomerang on certain major tech companies, too." Three years later, that boomerang is coming back with a vengeance. As progressive luminaries like Tim Wu and Susan Crawford continue fighting for utility-style regulations for broadband providers, prominent conservatives like Tucker Carlson and Steve Bannon have begun demanding similar utility-style regulations for other internet "gatekeepers," including major websites and online platforms like Google and Facebook.
The targets are different, but the arguments attempting to justify these regulations are surprisingly similar. In a nutshell: big corporations have too much control over the free flow of information online, so the government must regulate internet gatekeepers like public utilities in order to protect users from harmful censorship or other discriminatory behavior.
At 9.24pm (and one second) on the night of Wednesday 18 December 2013, from the second arrondissement of Paris, I wrote “Hello!” to my first ever Tinder match. Since that day I’ve fired up the app 920 times and matched with 870 different people. I recall a few of them very well: the ones who either became lovers, friends or terrible first dates. I’ve forgotten all the others. But Tinder has not.
The dating app has 800 pages of information on me, and probably on you too if you are also one of its 50 million users. In March I asked Tinder to grant me access to my personal data. Every European citizen is allowed to do so under EU data protection law, yet very few actually do, according to Tinder.
Microsoft, just like many of its competitors, has gone all in on machine learning. That emphasis is on full display at the company's Ignite conference this week, where the company today announced a number of new tools both for developers who want to build new A.I. models and for users who simply want to make use of pre-existing models — either from their own teams or from Microsoft.
For developers, the company launched three major new tools today: the Azure Machine Learning Experimentation service, the Azure Machine Learning Workbench and the Azure Machine Learning Model Management service.
In addition, Microsoft also launched a new set of tools for developers who want to use its Visual Studio Code IDE for building models with CNTK, TensorFlow, Theano, Keras and Caffe2.
In an increasingly complex and changing world, the US Army is facing more challenges than ever: a rising China, a creative Russia, a wayward North Korea and, perhaps the most difficult of them all – legacy infrastructure.
Faced with overhauling decades of IT buildup and consolidating its data centers, the largest military the world has ever known has struggled, and is behind schedule despite several new initiatives. Here, the private sector eyes an opportunity with huge, multi-year contracts up for grabs.
Fighting for a large slice of this pie is IBM, a company with deep ties to the US government and a significant share of the cloud market.
The Fu Foundation School of Engineering & Applied Science, Columbia University
With the recent Hurricanes Harvey, Irma, and now Maria, which ravaged much of Texas, Florida, and Puerto Rico, as well as Hurricane Katrina and Superstorm Sandy, from which NYC infrastructure is still recovering, it has become clear that addressing threats to infrastructure is critical to keeping our communities safe, functional, and healthy. Storm surge has emerged as one of the most destructive forces on infrastructure, especially interconnected structures in cities. To address this issue, Columbia Engineering Professors George Deodatis, Daniel Bienstock, and Kyle Mandli were recently awarded a two-year $500,000 National Science Foundation (NSF) grant to study storm surge threats to New York City infrastructure.
“Events like these powerful hurricanes have underscored the need for comprehensive plans to protect our infrastructure,” says Deodatis.
1. Science needs some tough love (fields vary, but some enable and encourage unhealthy habits). And “good cop” approaches aren’t fixing “phantom patterns” and “noise mining” (explained below).
2. Although everyone’s doing what seems “scientifically reasonable” the result is a “machine for producing and publicizing random patterns,” statistician Andrew Gelman says.
3. Gelman is too kind; the “reproducibility crisis” is really a producibility problem—professional practices reward production and publication of unsound studies.
Every day in New York City, millions of commuters take part in a giant race to determine transportation supremacy. Cars, bikes, subways, buses, ferries, and more all compete against one another, but we never get much explicit feedback as to who “wins.” I’ve previously written about NYC’s public taxi data and Citi Bike share data, and it occurred to me that these datasets can help identify who’s fastest, at least between cars and bikes. In fact, I’ve built an interactive guide that shows when a Citi Bike is faster than a taxi, depending on the route and the time of day.
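The core of a comparison like this is straightforward: bucket trips by route and time of day, then compare median durations across modes. A minimal sketch, using invented toy trip records rather than the author's actual taxi and Citi Bike datasets:

```python
# Illustrative sketch (not the author's actual analysis): bucket trip records
# by (route, hour) and compare median door-to-door durations per mode.
from statistics import median
from collections import defaultdict

def median_durations(trips):
    """trips: (mode, route, hour, minutes) tuples -> {(route, hour): {mode: median minutes}}"""
    buckets = defaultdict(lambda: defaultdict(list))
    for mode, route, hour, minutes in trips:
        buckets[(route, hour)][mode].append(minutes)
    return {key: {mode: median(vals) for mode, vals in modes.items()}
            for key, modes in buckets.items()}

# Hypothetical sample: Midtown -> East Village trips in the 6 p.m. hour.
trips = [
    ("taxi", "midtown-east_village", 18, 22.0),
    ("taxi", "midtown-east_village", 18, 27.5),
    ("taxi", "midtown-east_village", 18, 25.0),
    ("bike", "midtown-east_village", 18, 19.0),
    ("bike", "midtown-east_village", 18, 21.0),
    ("bike", "midtown-east_village", 18, 20.0),
]
result = median_durations(trips)
print(result[("midtown-east_village", 18)])  # {'taxi': 25.0, 'bike': 20.0}
```

Medians rather than means keep a few outlier trips (traffic jams, meandering rides) from dominating the comparison.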
The 30 was a really nice convenience in the summer, mainly because of its clean schedule. Whereas the 31 runs every 35 minutes during the summer, the 30 runs a tidy every half hour. Of course, they both run every 15 minutes during the school year, when most people use them, but still…the clockface headways were great.
After some confusion about where the heck the Puffton Village stop was (it doesn’t have a sign), Sam and I hopped aboard the 30 at its northern terminus, a student apartment complex. From there, we headed onto North Pleasant Street, going by more apartment complexes before reaching the roundabout just north of the UMass campus. We served the three main stops on campus, then we made the quick trip to Amherst Center.
The NIH’s massive “All Of Us” project will push what’s called precision medicine, using the traits that make each of us unique to forecast health and treat disease. Partly it’s genetics. What genes do you harbor that raise your risk of, say, heart disease or Type 2 diabetes or various cancers?
But other factors affect that genetic risk: what you eat, how you sleep, if you grew up in smog or fresh air, if you sit at a desk all day or bike around town, if your blood pressure is fine at a check-up but soars on the job, what medications you take.
Startupbootcamp has launched an insurtech accelerator in Hartford, intended to recruit new entrepreneurial and tech talent to the insurance hub.
The incubator program, now accepting applications, is part of The Hartford Insurtech Hub, an initiative created by the insurer to support startups by providing connections to industry partners and investors that will help spur innovation in the region.
Seattle, WA: The Health Sciences Library is launching the first meeting of the Virtual Reality User Group with executive sponsors Dr. Edward Verrier, Professor and Chief of Surgery at UW School of Medicine, and Tania Bardyn, Associate Dean for University Libraries and Director of the Health Sciences Library. This first meeting will take place on October 5th, 2017 from 11 a.m. to 12 p.m. in the TRAIL room (T216) of the UW Health Sciences Library.
University of Oxford researchers want to know how automatable certain tasks are: “We are looking for your opinion: Do you believe that technology exists today that could automate these tasks?”
Through its Fellowship Programs, the Ford Foundation seeks to increase the diversity of the nation’s college and university faculties by increasing their ethnic and racial diversity, to maximize the educational benefits of diversity, and to increase the number of professors who can and will use diversity as a resource for enriching the education of all students. Deadline for applications (Dissertation, Postdoctoral) is December 7.
Moore-Sloan Data Environment, NYU Center for Data Science
New York, NY: Tuesday, October 10, starting at 4 p.m., NYU Center for Data Science (60 5th Avenue, 7th Floor). “The Statistics Showcase will provide an overview of statistics research around the university to the Moore-Sloan community and to bring together researchers and students involved in such research.” [free, registration required]
The new series is called What’s Going On in This Graph? and is part of the Times’ Learning Network. They publish a chart once a month with much of its context removed (but enough to figure it out), and ask students to interpret what they are seeing. This is done not in collaboration with the visualization community, but with the American Statistical Association (ASA).
I wanted to learn more about this series, so I contacted Michael Gonchar and Katherine Schulten, who run it on the Times side. Michael Gonchar kindly agreed to talk to me about it. He is a former English and social studies teacher, so he understands the people they make this content for.
“In this post, you will discover a suite of standard datasets for natural language processing tasks that you can use when getting started with deep learning.”
Google recently announced that Google Cloud Dataprep—the new managed data wrangling service developed in collaboration with Trifacta—is now available in public beta. This service enables analysts and data scientists to visually explore and prepare data for analysis in seconds within the Google Cloud Platform.
Now that the Google Cloud Dataprep beta is open to the public, more companies can experience the benefits of Trifacta’s data preparation platform. From predictive transformation to interactive exploration, Trifacta’s intuitive workflow has accelerated the data preparation process for Google Cloud Dataprep customers who have tried it out within private beta.
The difference between good automated content and bad automated content can be boiled down to the number of scenarios the programmer creates to turn ordinary data into beautiful prose.
Data variability, which is predicated upon the number and the depth of insights driven by changes in the data, is the key quality driver in Natural Language Generation (NLG). And to do NLG data variability right, you have to create a lot of scenarios.
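To make the idea concrete, here is a minimal sketch of scenario-driven generation. The rules, thresholds, and templates are hypothetical illustrations, not any vendor's actual NLG engine: each "scenario" pairs a condition on the data with a sentence template, and adding scenarios is what lets the same data schema produce varied, insight-specific prose.

```python
# Hypothetical scenario rules: condition on the data -> sentence template.
# More (and finer-grained) scenarios = more varied output for the same data.
SCENARIOS = [
    (lambda d: d["change_pct"] >= 10,  "{team} surged, posting a {change_pct:.0f}% jump in sales."),
    (lambda d: d["change_pct"] > 0,    "{team} edged up {change_pct:.0f}% on the quarter."),
    (lambda d: d["change_pct"] == 0,   "{team} held flat quarter over quarter."),
    (lambda d: True,                   "{team} slipped {abs_change:.0f}% from last quarter."),
]

def narrate(record):
    record = dict(record, abs_change=abs(record["change_pct"]))
    for condition, template in SCENARIOS:
        if condition(record):  # first matching scenario wins
            return template.format(**record)

print(narrate({"team": "Northeast", "change_pct": 12.0}))  # surged...
print(narrate({"team": "Midwest", "change_pct": -4.0}))    # slipped...
```

With only the final catch-all rule, every record would read the same; each added scenario captures one more "change in the data" worth narrating, which is exactly the variability the paragraph above describes.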
“Agent-based models are being used for computer experiments in epidemiology, transportation, migration, climate change, and urban studies. Researchers use the models to experiment on simulated human behavior with a synthesized population in a controlled environment. What population synthesis methods are currently being used in ABMs, and how have these synthetic populations been used?”
“Population synthesis is the process of creating agent representations of the model population based on available data. Sample-based methods are more traditional, but new methods also create synthetic populations sample-free.”
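As a toy illustration of the sample-based approach the quote mentions (an assumed setup, not the paper's method): expand a weighted microdata sample into individual agents by drawing records in proportion to their survey weights.

```python
# Toy sample-based population synthesis: replicate weighted microdata records
# into individual agents so aggregate shares approximate the survey weights.
import random

def synthesize(sample, target_size, seed=0):
    """sample: list of (attributes_dict, weight); returns target_size agent dicts."""
    rng = random.Random(seed)
    records = [attrs for attrs, _ in sample]
    weights = [w for _, w in sample]
    # dict(...) copies each drawn record so agents can later be mutated independently
    return [dict(rng.choices(records, weights=weights)[0]) for _ in range(target_size)]

# Hypothetical microdata: two household types with survey weights.
sample = [({"size": 1, "has_car": False}, 30.0),
          ({"size": 4, "has_car": True}, 70.0)]
agents = synthesize(sample, target_size=1000)
share_with_car = sum(a["has_car"] for a in agents) / len(agents)
print(round(share_with_car, 2))  # close to 0.7, matching the weights
```

Sample-free methods, by contrast, build agents directly from marginal distributions (e.g. census tables) without any microdata sample to resample from.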