Data Science newsletter – September 28, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for September 28, 2017

Data Science News



A rare look inside LAPD’s use of data

CNN Tech, Matt McFarland


Sarah Brayne, a sociology professor at the University of Texas in Austin, conducted more than 100 interviews of officers and civilian employees. She went on ride-alongs in patrol cars and a helicopter, and watched data analysts answer queries from detectives. Brayne also observed divisions adopt the new technologies.

Her results were published online in the American Sociological Review last month.

Experts say that Brayne’s work is a window into the future of law enforcement. It illuminates the promise big data holds for making police work more efficient. But it also shows its perils: how data, which is generally thought to be objective and fair, can exacerbate biases.


For internet gatekeepers, consumer protection laws are better than utility-style regulation

The Brookings Institution, Tom Struble


Back in 2014, former Brookings scholar Robert Litan presciently warned that regulating broadband providers like public utilities in order to protect Net Neutrality “could one day boomerang on certain major tech companies, too.” Three years later, that boomerang is now coming back with a vengeance. As progressive luminaries like Tim Wu and Susan Crawford continue fighting for utility-style regulations for broadband providers, prominent conservatives like Tucker Carlson and Steve Bannon have begun demanding similar utility-style regulations for other internet “gatekeepers,” including major websites and online platforms like Google and Facebook.

The targets are different, but the arguments attempting to justify these regulations are surprisingly similar. In a nutshell: big corporations have too much control over the free flow of information online, so the government must regulate internet gatekeepers like public utilities in order to protect users from harmful censorship or other discriminatory behavior.


People Are Worried About DHS Plans To Gather Social Media

BuzzFeed News, Adolfo Flores


“There’s a growing trend at the Department of Homeland Security to be snooping on the social media of immigrants.”


I asked Tinder for my data. It sent me 800 pages of my deepest, darkest secrets

The Guardian, Judith Duportail


At 9.24pm (and one second) on the night of Wednesday 18 December 2013, from the second arrondissement of Paris, I wrote “Hello!” to my first ever Tinder match. Since that day I’ve fired up the app 920 times and matched with 870 different people. I recall a few of them very well: the ones who either became lovers, friends or terrible first dates. I’ve forgotten all the others. But Tinder has not.

The dating app has 800 pages of information on me, and probably on you too if you are also one of its 50 million users. In March I asked Tinder to grant me access to my personal data. Every European citizen is allowed to do so under EU data protection law, yet very few actually do, according to Tinder.


Microsoft launches new machine learning tools

TechCrunch, Frederic Lardinois


Microsoft, just like many of its competitors, has gone all in on machine learning. That emphasis is on full display at the company’s Ignite conference this week, where the company today announced a number of new tools for developers who want to build new A.I. models and users who simply want to make use of these pre-existing models, either from their own teams or from Microsoft.

For developers, the company launched three major new tools today: the Azure Machine Learning Experimentation service, the Azure Machine Learning Workbench and the Azure Machine Learning Model Management service.

In addition, Microsoft also launched a new set of tools for developers who want to use its Visual Studio Code IDE for building models with CNTK, TensorFlow, Theano, Keras and Caffe2.


Enlisting Watson: IBM on winning US Army private cloud contracts

DatacenterDynamics, Sebastian Moss


In an increasingly complex and changing world, the US Army is facing more challenges than ever: a rising China, a creative Russia, a wayward North Korea and, perhaps the most difficult of them all – legacy infrastructure.

Faced with overhauling decades of IT buildup and consolidating its data centers, the largest military the world has ever known has struggled, and is behind schedule despite several new initiatives. Here, the private sector eyes an opportunity with huge, multi-year contracts up for grabs.

Fighting for a large slice of this pie is IBM, a company with deep ties to the US government and a significant share of the cloud market.


Columbia Engineers Win NSF Grant to Study NYC Storm Surge Infrastructure Resilience

The Fu Foundation School of Engineering & Applied Science, Columbia University


With the recent Hurricanes Harvey, Irma, and now Maria, which ravaged much of Texas, Florida, and Puerto Rico, as well as Hurricane Katrina and Superstorm Sandy, from which NYC infrastructure is still recovering, it has become clear that addressing threats to infrastructure is critical to keeping our communities safe, functional, and healthy. Storm surge has emerged as one of the most destructive forces on infrastructure, especially interconnected structures in cities. To address this issue, Columbia Engineering Professors George Deodatis, Daniel Bienstock, and Kyle Mandli were recently awarded a two-year $500,000 National Science Foundation (NSF) grant to study storm surge threats to New York City infrastructure.

“Events like these powerful hurricanes have underscored the need for comprehensive plans to protect our infrastructure,” says Deodatis.


Does the Crisis in Science Show That It Is Broken, or Self-Correcting?

Big Think, Jag Bhalla


1. Science needs some tough love (fields vary, but some enable and encourage unhealthy habits). And “good cop” approaches aren’t fixing “phantom patterns” and “noise mining” (explained below).

2. Although everyone’s doing what seems “scientifically reasonable” the result is a “machine for producing and publicizing random patterns,” statistician Andrew Gelman says.

3. Gelman is too kind; the “reproducibility crisis” is really a producibility problem—professional practices reward production and publication of unsound studies.


When Are Citi Bikes Faster Than Taxis in New York City?

Todd W. Schneider


Every day in New York City, millions of commuters take part in a giant race to determine transportation supremacy. Cars, bikes, subways, buses, ferries, and more all compete against one another, but we never get much explicit feedback as to who “wins.” I’ve previously written about NYC’s public taxi data and Citi Bike share data, and it occurred to me that these datasets can help identify who’s fastest, at least between cars and bikes. In fact, I’ve built an interactive guide that shows when a Citi Bike is faster than a taxi, depending on the route and the time of day.


Miles on the MBTA: And I’m going to ride all 1,280 of them

Miles from Boston


The 30 was a really nice convenience in the summer, mainly because of its clean schedule. Whereas the 31 runs every 35 minutes during the summer, the 30 is a nice every half hour. Of course, they both run every 15 minutes during the school year, when most people use them, but still…the clockface headways were great.

After some confusion about where the heck the Puffton Village stop was (it doesn’t have a sign), Sam and I hopped aboard the 30 at its northern terminus, a student apartment complex. From there, we headed onto North Pleasant Street, going by more apartment complexes before reaching the roundabout just north of the UMass campus. We served the three main stops on campus, then we made the quick trip to Amherst Center.


Wanted: 1 million people to study genes, habits and health

Associated Press, Lauran Neergaard


The NIH’s massive “All Of Us” project will push what’s called precision medicine, using traits that make us unique in learning to forecast health and treat disease. Partly it’s genetics. What genes do you harbor that raise your risk of, say, heart disease or Type 2 diabetes or various cancers?

But other factors affect that genetic risk: what you eat, how you sleep, if you grew up in smog or fresh air, if you sit at a desk all day or bike around town, if your blood pressure is fine at a check-up but soars on the job, what medications you take.


Startupbootcamp launches insurtech accelerator in Hartford

Digital Insurance, Danni Santana


Startupbootcamp has launched an insurtech accelerator in Hartford, intended to recruit new entrepreneurial and tech talent to the insurance hub.

The incubator program, now accepting applications, is part of The Hartford Insurtech Hub, an initiative created by the insurer to support startups by providing connections to industry partners and investors that will help spur innovation in the region.

 
Events



Data @ Libs: Inaugural Virtual Reality User Group meeting, 10/5

University of Washington, UW Libraries


Seattle, WA. The Health Sciences Library is launching the first meeting of the Virtual Reality User Group with executive sponsors Dr. Edward Verrier, Professor and Chief of Surgery at UW School of Medicine, and Tania Bardyn, Associate Dean for University Libraries and Director of the Health Sciences Library. This first meeting will take place on October 5, 2017, from 11 a.m. to 12 p.m. in the TRAIL room (T216) of the UW Health Sciences Library.

 
Deadlines



Technology and Employment Survey

University of Oxford researchers want to know how automatable certain tasks are: “We are looking for your opinion: Do you believe that technology exists today that could automate these tasks?”

Ford Foundation Fellowship Program

Through its Fellowship Programs, the Ford Foundation seeks to increase the diversity of the nation’s college and university faculties by increasing their ethnic and racial diversity, to maximize the educational benefits of diversity, and to increase the number of professors who can and will use diversity as a resource for enriching the education of all students. Deadline for applications (Dissertation, Postdoctoral) is December 7.

IJCAI-ECAI-18

Stockholm, Sweden, July 13-19, 2018. Deadline for paper submissions is January 31, 2018.
 
NYU Center for Data Science News



Using Neural Networks to Address Mental Health Crises

Medium, NYU Center for Data Science


Columbia University grad student Rohan Kshirsagar collaborates with Sam Bowman and web app maker Koko to invent an online crisis detector.


5 Minutes with Director Richard Bonneau

Medium, NYU Center for Data Science


The Director of the NYU Center for Data Science discusses his exciting vision for the year ahead.


10th Data Science Showcase

Moore-Sloan Data Environment, NYU Center for Data Science


New York, NY. Tuesday, October 10, starting at 4 p.m., NYU Center for Data Science (60 5th Avenue, 7th Floor). “The Statistics Showcase will provide an overview of statistics research around the university to the Moore-Sloan community and to bring together researchers and students involved in such research.” [free, registration required]

 
Tools & Resources



Visualization as a Field Is Still Invisible

Robert Kosara, Eager Eyes blog


The new series is called What’s Going On in This Graph? and is part of the Times’ Learning Network. They publish a chart once a month with much of its context removed (but enough to figure it out), and ask students to interpret what they are seeing. This is done not in collaboration with the visualization community, but with the American Statistical Association (ASA).

I wanted to learn more about this series, so I contacted Michael Gonchar and Katherine Schulten, who run it on the Times side. Michael Gonchar kindly agreed to talk to me about it. He is a former English and social studies teacher, so he understands the people they make this content for.


Datasets for Natural Language Processing

Machine Learning Mastery, Jason Brownlee


“In this post, you will discover a suite of standard datasets for natural language processing tasks that you can use when getting started with deep learning.”


Google Launches Public Beta of Cloud Dataprep, Built in Collaboration With Trifacta

Trifacta, Sean Ma


Google recently announced that Google Cloud Dataprep—the new managed data wrangling service developed in collaboration with Trifacta—is now available in public beta. This service enables analysts and data scientists to visually explore and prepare data for analysis in seconds within the Google Cloud Platform.

Now that the Google Cloud Dataprep beta is open to the public, more companies can experience the benefits of Trifacta’s data preparation platform. From predictive transformation to interactive exploration, Trifacta’s intuitive workflow has accelerated the data preparation process for Google Cloud Dataprep customers who have tried it out within private beta.


Natural Language Generation | Good vs. Bad Automated Content? It’s In the Context Layer

Automated Insights, Inc., Joe Procopio


The difference between good automated content and bad automated content can be boiled down to the number of scenarios the programmer creates to turn ordinary data into beautiful prose.

Data variability, which is predicated upon the number and the depth of insights driven by changes in the data, is the key quality driver in Natural Language Generation (NLG). And to do NLG data variability right, you have to create a lot of scenarios.


Synthetic Populations for ABMs

Annetta Burger


“Agent-based models are being used for computer experiments in epidemiology, transportation, migration, climate change, and urban studies. Researchers use the models to experiment on simulated human behavior with a synthesized population in a controlled environment. What population synthesis methods are currently being used in ABMs, and how have these synthetic populations been used?”

“Population synthesis is the process of creating agent representations of the model population based on available data. Sample-based methods are more traditional, but new methods also create synthetic populations sample-free.”

 
Careers


Full-time positions outside academia

Innovation Project Leader, Data Science

GSK; Rockville, MD

Tenured and tenure track faculty positions

Assistant Professor, Interactive Media & Game Development

Worcester Polytechnic Institute; Worcester, MA
