Data Science newsletter – June 4, 2018

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 4, 2018

Data Science News



Vermont passes first law to crack down on data brokers

TechCrunch, Devin Coldewey


While Facebook and Cambridge Analytica are hogging the spotlight, data brokers that collect your information from hundreds of sources and sell it wholesale are laughing all the way to the bank. But they’re not laughing in Vermont, where a first-of-its-kind law hems in these dangerous data mongers and gives the state’s citizens much-needed protections.

Data brokers in Vermont will now have to register as such with the state; they must take standard security measures and notify authorities of security breaches (no, they weren’t before); and using their data for criminal purposes like fraud is now its own actionable offense.


Bloomberg’s Amanda Stent on NLP, Research, and Ethics

Tech at Bloomberg


When Amanda Stent, a Natural Language Processing (NLP) Architect at Bloomberg, took on the role of program co-chair for a prestigious conference in computational linguistics, she knew it would be a lot of work. “I had been an area chair several times and I’d developed a lot of ideas about what I wanted to do,” she says. As Stent approaches the culmination of this effort – the opening of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) today in New Orleans – she says, “Of course, you never really know what you’re getting yourself into.” Her colleagues from Bloomberg will be presenting three papers at the event.


Wars and clan structure may explain a strange biological event 7,000 years ago, Stanford researchers find

Stanford University, Stanford News


It’s not unprecedented for human genetic diversity to take a nosedive once in a while, but the Y-chromosome bottleneck, which was inferred from genetic patterns in modern humans, was an odd one. First, it was observed only in men – more precisely, it was detected only through genes on the Y chromosome, which fathers pass to their sons. Second, the bottleneck is much more recent than other biologically similar events, hinting that its origins might have something to do with changing social structures.

Certainly, the researchers point out, social structures were changing. After the onset of farming and herding around 12,000 years ago, societies grew increasingly organized around extended kinship groups, many of them patrilineal clans – a cultural fact with potentially significant biological consequences. The key is how clan members are related to each other. While women may have married into a clan, men in such clans are all related through male ancestors and therefore tend to have the same Y chromosomes. From the point of view of those chromosomes at least, it’s almost as if everyone in a clan has the same father.

That only applies within one clan, however, and there could still be considerable variation between clans. To explain why even between-clan variation might have declined during the bottleneck, the researchers hypothesized that wars, if they repeatedly wiped out entire clans over time, would also wipe out a good many male lineages and their unique Y chromosomes in the process.


Buying GitHub Would Take Microsoft Back to Its Roots

Bloomberg Deals; Dina Bass, Eric Newcomer, and Mark Bergen


For Microsoft Corp., acquiring GitHub Inc. would be both a return to the company’s earliest roots and a sharp turnaround from where it was a decade ago.

The software maker has agreed to acquire GitHub, the code-repository company popular with many software developers, and could announce the deal as soon as Monday, according to people familiar with the matter.

Microsoft’s origin story lies in the market for software-development tools. Decades before former Chief Executive Officer Steve Ballmer jumped up and down on a stage, cheering for “developers, developers, developers,” Bill Gates and Paul Allen co-founded the company to give hobbyists a way to program a new micro-computer kit, the MITS Altair.


Revolutionizing everyday products with artificial intelligence

MIT News, Department of Mechanical Engineering


Mechanical engineering researchers are using AI and machine learning technologies to enhance the products we use in everyday life.


Google Plans Not to Renew Its Contract for Project Maven, a Controversial Pentagon Drone AI Imaging Program

Gizmodo, Kate Conger


Google will not seek another contract for its controversial work providing artificial intelligence to the U.S. Department of Defense for analyzing drone footage after its current contract expires.

Google Cloud CEO Diane Greene announced the decision at a meeting with employees Friday morning, three sources told Gizmodo. The current contract expires in 2019 and there will not be a follow-up contract, Greene said. The meeting, dubbed Weather Report, is a weekly update on Google Cloud’s business.


Mcity Driverless Shuttle to launch June 4 on North Campus

University of Michigan, The University Record


The Mcity Driverless Shuttle will begin operating on North Campus on June 4. It is the nation’s first driverless vehicle research project focusing on user behavior and data collection.

The project will use two fully automated, 11-passenger, all-electric shuttles manufactured by French firm NAVYA. The shuttles will cover a roughly one-mile round-trip route confined to the North Campus Research Complex, carrying students, faculty and staff to the complex from more distant parking.


Teens, Social Media & Technology 2018

Pew Research Center, Monica Anderson and Jingjing Jiang


YouTube, Instagram and Snapchat are the most popular online platforms among teens. Fully 95% of teens have access to a smartphone, and 45% say they are online ‘almost constantly.’


This tiny camera aims to catch poachers — before they kill

Mongabay, Shreya Dasgupta


A Tanzanian game reserve has successfully tested the TrailGuard cryptic camera and 24/7 electronic surveillance system to detect and capture wildlife poachers and their snares.


China Turns to Robotic Policing

OZY, Fast Forward, Ben Halder


At Beijing’s Tiananmen Square, stun gun-wielding robots patrol crowds of tourists. While the robots negotiate their own path along designated routes, the stun guns are activated by an officer controlling the bot remotely. In Zhengzhou, the capital of China’s central Henan Province, similar police robots that look like armless Daleks roam the high-speed train station. They use facial-recognition software to help officers identify suspects, interact with customers and answer their questions. Police officers at the station wear facial-recognition sunglasses, developed by Beijing-based tech company LLVision, which pick out fake IDs and identify wanted criminals. And in the central metropolis of Wuhan, the Ministry of Public Security has teamed up with tech giant Tencent to develop a fully automated police station driven by the latter’s AI technology.


How AI Makes Real-Time Analytics More Real

RTInsights, Joe McKendrick


Lately, there’s been a lot of progress with real-time analytics, especially when applied against streaming data from systems or devices. But with artificial intelligence coming into the picture, we ain’t seen nothing yet.

That’s the word from a group of McKinsey Global Institute analysts, led by Michael Chui, who connected the dots between AI and hundreds of use cases from across 20 industries in a recent study. Notably, they observe, the most value coming from AI, as indicated by more than two-thirds of projects studied (69%), is in improving the performance of existing analytics efforts. For purposes of clarity, the analysts define AI as “deep learning techniques using artificial neural networks.”


Device allows a personal computer to process huge graphs

MIT News


Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have now designed a device that uses cheap flash storage — the type used in smartphones — to process massive graphs using only a single personal computer.

Flash storage is typically far slower than DRAM at processing graph data. But the researchers developed a device consisting of a flash chip array and a computation “accelerator” that helps flash achieve DRAM-like performance.

Powering the device is a novel algorithm that sorts all access requests for graph data into a sequential order that flash can access quickly and easily. It also merges some requests to reduce the overhead — the combined computation time, memory, bandwidth, and other computing resources — of sorting.
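The sort-and-merge idea described above can be illustrated with a minimal Python sketch. This is not the CSAIL researchers' algorithm or hardware design; it only shows the general pattern of ordering update requests by vertex ID so storage can be read sequentially, and combining requests that target the same vertex. The function name `sort_and_merge`, the `(vertex_id, value)` request format, and the default addition-based `combine` are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_merge(requests, combine=lambda a, b: a + b):
    """Sort (vertex_id, value) requests by vertex id and merge duplicates.

    Hypothetical helper for illustration only; not taken from the MIT work.
    """
    # Sorting turns scattered random accesses into one sequential pass.
    ordered = sorted(requests, key=itemgetter(0))
    merged = []
    # After sorting, requests for the same vertex are adjacent, so each
    # group can be collapsed into a single request.
    for vertex_id, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]
        total = values[0]
        for value in values[1:]:
            total = combine(total, value)
        merged.append((vertex_id, total))
    return merged


# Five scattered requests touching three distinct vertices.
requests = [(7, 1.0), (2, 0.5), (7, 2.0), (3, 1.5), (2, 0.25)]
print(sort_and_merge(requests))  # [(2, 0.75), (3, 1.5), (7, 3.0)]
```

In this toy example, five requests shrink to three, and the surviving requests are already in ascending vertex order, which is the access pattern the article says flash handles well.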


Dryad partnering with CDL to accelerate data publishing

Dryad Digital Repository, Dryad news and views


Dryad is thrilled to announce a strategic partnership with California Digital Library (CDL) to address researcher needs by leading an open, community-supported initiative in research data curation and publishing.


NIH Recommends Precision Medicine Approach to Treat Alzheimer’s

HealthIT Analytics, Jessica Kent


Precision medicine, big data analytics, and stakeholder collaboration could improve treatment and prevention of Alzheimer’s disease and related dementias, according to recommendations from the National Institutes of Health (NIH).

The recommendations were the result of the Alzheimer’s Disease Research Summit 2018: Path to Treatment and Prevention, convened by the National Institute on Aging (NIA), part of NIH.

More than 80 leading experts from government, academia, industry, and non-profit organizations gathered to guide continued efforts to build a collaborative, multi-stakeholder research environment capable of delivering urgently needed cures for people at all stages of Alzheimer’s.


Pentagon to Upgrade DCGS ‘Brain’ Using Commercial Software

Bloomberg Government, Chris Cornillie


The U.S. Army has launched the next round of upgrades to its battlefield intelligence platform, known as Distributed Common Ground System-Army (DCGS-A), in another procurement likely to pit commercial software vendors against the defense industrial base.

On May 23, the Army released a request for information for the second in a series of incremental updates, or “capability drops,” to DCGS-A, which uses advanced analytics to provide commanders with situational awareness about threats, weather, and terrain.

“Capability Drop-2” (CD-2) calls for commercially proven solutions to upgrade the system’s core enterprise data warehouse component, more commonly known as “the Brain.”


Starting a Second Machine Learning Tools Company, Ten Years Later

Medium, Lukas Biewald


I started Weights and Biases because, for the second time in my career, I have deep conviction about what the AI field needs. Ten years ago training data was the biggest problem holding back real world machine learning. Today, the biggest pain is a lack of basic software and best practices to manage a completely new style of coding. Andrej Karpathy describes machine learning as the new kind of programming that needs a reinvented IDE. Pete Warden writes about AI’s reproducibility crisis — there’s no version control for machine learning models and it’s incredibly hard to reproduce one’s own work, let alone someone else’s. As machine learning rapidly evolves from research projects to critical real-world deployed software we suddenly have an acute need for a new set of developer tools.


Tweet of the Week

Twitter, J. Jennifer Espinoza



Data Visualization of the Week

Twitter, Brent Toderian


 
Events



Beautiful Games? Putting the World Back in the World Cup

NYU, Institute for Public Knowledge


New York, NY June 7, starting at 2 p.m., NYU, Institute for Public Knowledge (20 Cooper Square, 5th Floor). “Join an all-star team of leading soccer scholars, scribes and talking heads as they discuss the global phenomenon that is the World Cup.” [free, rsvp required]

 
Tools & Resources



BDD100K: A large-scale diverse driving video database

Robohub, Fisher Yu


“TL;DR, we released the largest and most diverse driving video dataset with rich annotations called BDD100K.”


Glad You Asked: A Snapshot of the Current State of Data Citation

DataCite Blog, Kristian Garza and Martin Fenner


“In this blog post we will use data from the Event Data service, with a focus on links between Crossref DOIs and DataCite DOIs, to obtain a snapshot of the current state of data citation. A version of this work was presented at the RDA Plenary Berlin in March 2018, as part of the RDA/WDS Scholarly Link Exchange Working Group session.”

 
Careers


Full-time, non-tenured academic positions

Full Stack Developer



Carnegie Mellon University, Human-Computer Interaction Institute; Pittsburgh, PA

Data Analyst



New York University: Institute of Human Development and Social Change; New York, NY
Full-time positions outside academia

Data Coordinator



Murdoch Children’s Research Institute, Genomics Health Alliances; Parkville, VIC, Australia
