Data Science newsletter – November 30, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for November 30, 2017

 
 
Data Science News



WPI professors: Data science can help us fight human trafficking

Worcester Sun, Renata Konrad and Andrew C. Trapp


Analytics, the mathematical search for insights in data, could help law enforcement combat human trafficking. Human trafficking is essentially a supply chain in which the “supply” (human victims) moves through a network to meet “demand” (for cheap, vulnerable and illegal labor). Traffickers leave a data trail, however faint or broken, despite their efforts to operate off the grid and in the shadows.

There is an opportunity – albeit a challenging one – to use the bits of information we can get on the distribution of victims, traffickers, buyers and exploiters, and disrupt the supply chain wherever and however we can. In our latest study, we have detailed how this might work.


How artificial intelligence will self-manage the data center

Network World, Patrick Nelson


The reality of a self-managing data center is getting closer with HPE’s announcement last week of what it claims to be the first artificial intelligence (AI) predictive engine for trouble in the data center.

HPE says next year it will offer an AI recommendation engine add-on that’s designed to predict and stop storage- and general-infrastructure trouble before it starts. It’s one of a number of autonomous data center components that we should expect to see soon from players. Other AI and machine learning systems geared towards data centers will be available from companies such as Litbit (which I wrote about in the summer) and Oracle, among others.


‘Share, Test and Refine’ – Researchers receive a $3.4 million grant from the NSF’s Office of Advanced Cyberinfrastructure to improve scientific image processing

University of California-Santa Barbara, The UCSB Current


Working with scientific images is arduous, and no simple platform exists for sharing them. That is all about to change.

UC Santa Barbara engineers and researchers have been awarded a $3.4 million grant from the National Science Foundation’s Office of Advanced Cyberinfrastructure to build a large-scale distributed image-processing infrastructure (LIMPID) through a broad, interdisciplinary collaboration. Encompassing databases, image analysis and various scientific disciplines, their creation, BisQue, is an image informatics platform that makes it easy to share, distribute and collaborate with large image datasets.

“Think of BisQue as Google Docs for scientific images,” said UCSB principal investigator B. S. Manjunath, who directs the campus’s Center for Multimodal Big Data Science and Healthcare. “Imaging data is ubiquitous and much of big-data science is image-centric. Working with such data should be as simple as working with text files in Google Docs.”


Google Big Data Mining The Doctor’s Office? It May Save A Lot Of Time

American Council on Science and Health, Chuck Dinerstein


One of the enormous problems with electronic health records (EHR) is that they require lots of data entry. When physicians perform this task, it takes away valuable time and expertise that could be given over to patients, not screens.

Google, in some exploratory work, is showing how the same type of speech recognition systems in Google Home can be adapted to medical care.

The preprint paper by Chung-Cheng Chiu et al. made use of two deep learning models to abstract medically relevant information from the conversations of physicians and patients.


UC Berkeley graduate students win $100K in world data science contest

The Daily Californian, Mary Kelly Ford


A team of UC Berkeley graduate students used data science and analysis to win $100,000 in an international data science competition hosted by hedge fund Citadel and talent solutions tech company Correlation One on Monday.

The students, Eric Munsing, Sören Künzel, Jake Soloff and Allen Tang, beat teams from around the world, including from Harvard University and Massachusetts Institute of Technology, to win the Data Open championship. Their winning data research proposal analyzed the impact of charter schools on education.


How machine learning creates new professions — and problems

FT.com, Richard Waters


This is not a discipline open only to computer scientists. As a field that revolves around the collection, collation and analysis of data, it spans several fields. Maths, statistics and programming all play a part.

Many non-specialist executives will also find themselves in need of more than a passing familiarity with the field, as they interact with machine learning experts on the front lines of business.

There is no single job description that encompasses this emerging field. Many people who have traditionally been called “data analysts” aspire these days to the title of “data scientist”, says Anthony Goldbloom, founder and chief executive of Kaggle. His company, which was acquired this year by Google, maintains an informal network of experts around the world. He adds that data scientists, in turn, aspire to become machine learning experts.


Kindred, UofL form partnership to develop health care technology solutions

Kindred Healthcare


Kindred Healthcare Inc. and the University of Louisville have formed an innovative partnership focused on creating health care technology solutions that improve the lives of the aging population.

The partnership, which is being called HIVE, is housed in 3,500 square feet of the former K-I Lumber & Building Materials office building at Floyd and Lee streets, near UofL’s Belknap Campus. The University of Louisville Foundation purchased that property two years ago.

At the site, Kindred employees work with students and faculty from UofL’s J.B. Speed School of Engineering to develop apps that will benefit the aging community. To date, the partners have built Kindred’s first mobile compliance tracking app. They currently are working on an app that would allow nurses to assess a patient’s eligibility for hospice care. Several other projects are in the pipeline.


Artificial intelligence startup spun out of Duke University hires new CEO

The Herald Sun (Durham, NC), Zachary Eanes


[Robbie] Allen has been named the new CEO of Infinia ML, an artificial intelligence company that was founded earlier this year in Durham by Lawrence Carin, Duke University’s vice provost for research and a computer engineering professor. The company also boasts $10 million in financing from Carrick Capital Partners.

“I have really enjoyed (the Ph.D. program), which is why I am surprised I am doing this,” Allen said. “I said it would have to be about perfect opportunity to stop what I am doing with my Ph.D.”


With ‘material robotics,’ intelligent products won’t even look like robots

Oregon State University, News and Research Communications


Robots as inconspicuous as they are ubiquitous represent the vision of researchers in the new and burgeoning field of material robotics.

In an invited perspective paper published today in Science Robotics, Oregon State University researcher Yiğit Mengüç and three co-authors argue against looking at robotics as a “dichotomy of brain versus body.”

Mengüç and collaborators from the University of Colorado, Yale University and École Polytechnique Fédérale de Lausanne take a view that seeks to dissolve the basic assumption that robots are either “machines that run bits of code” or “software ‘bots’ interacting with the world through a physical instrument.”


At Berkeley, a New Generation of Ethical Hackers Learns the Do’s and Don’ts of Cybersecurity

The New Yorker, Anna Wiener


“Whenever I teach a security class, it happens that there is something going on in the news cycle that ties into it,” Doug Tygar, a computer-science professor at the University of California, Berkeley, told me recently. Pedagogically speaking, this has been an especially fruitful year. So far in 2017, the Identity Theft Resource Center, an American nonprofit, has tallied more than eleven hundred data breaches, the highest number since 2005. The organization’s running list of victims includes health-care providers, fast-food franchises, multinational banks, public high schools and private colleges, a family-run chocolatier, an e-cigarette distributor, and the U.S. Air Force. In all, at least a hundred and seventy-one million records have been compromised. Nearly eighty-five per cent of those can be traced to a single catastrophic breach at the credit-reporting agency Equifax. That hack was reported in early September, just as Tygar and his students were settling into the third week of a new course called “Cyberwar.”

The purpose of the course, according to Tygar’s faculty Web page, is to teach Berkeley’s budding computer scientists to “forensically examine real cyberwar attacks” with an eye toward preventing them. Occasionally, this might mean mounting attacks of their own. Penal codes around the U.S. are not especially lenient when it comes to cybercrime; in some states, certain computer crimes are considered Class C felonies, on par with arson and kidnapping. So, for the hands-on portion of their studies, Tygar’s students rely on HackerOne, a sort of marketplace-cum-social-network devoted to “ethical hacking.” Companies, organizations, and government agencies use the site to solicit help identifying vulnerabilities in their products––or, as Tygar put it, “subject themselves to the indignity of having undergraduate students try to hack them.” In exchange for information about what they’re doing wrong, many of these clients offer monetary rewards, known as bug bounties.


The machine itself: The state of AI in the data center

DatacenterDynamics, Sebastian Moss


Machine learning, a subset of artificial intelligence, is about to transform everything – if you believe breathless reports from the government and in the media. It will change or replace the workforce, create new businesses, and revolutionize existing industries with new innovations, strategies and dangers.

Data centers are standing by to provide a home where those AI systems will be trained and run. But how will the algorithms transform the data centers themselves? Machine learning promises to unlock efficiencies, but that will mean handing over control to the machines. Are we ready for that?

“You can call it AI, you can call it machine learning, you can call it different names – but at the end of the day what you’re trying to do is predict a future outcome based on past data,” IBM’s VP of analytics development, Dinesh Nirmal, told DCD.


How the Supreme Court could keep police from using your cellphone to spy on you

The Washington Post, Bruce Schneier


Traditionally, information that was most precious to us was physically close to us. It was on our bodies, in our homes and offices, in our cars. Because of that, the courts gave that information extra protections. Information that we stored far away from us, or gave to other people, was afforded fewer protections. Police searches have been governed by the “third-party doctrine,” which explicitly says that information we share with others is not considered private.

The Internet has turned that thinking upside-down. Our cellphones know who we talk to and, if we’re talking via text or email, what we say. They track our location constantly, so they know where we live and work. Because they’re the first and last thing we check every day, they know when we go to sleep and when we wake up. Because everyone has one, they know whom we sleep with. And because of how those phones work, all that information is naturally shared with third parties.


Amazon Is Asking Indians To Hand Over Their Aadhaar, India’s Controversial Biometric ID, To Track Lost Packages

BuzzFeed News, Pranav Dixit


The move has baffled Amazon’s Indian customers, who are already grappling with being forced to link their Aadhaar numbers to open bank accounts, get life insurance, and much more.


UW’s Sounding Board wins inaugural Amazon Alexa Prize

University of Washington, Allen School News


A team of students and faculty from the Allen School and the University of Washington Department of Electrical Engineering has won the inaugural Alexa Prize from Amazon for Sounding Board, a socialbot that interacts with users through engaging and informative conversation. Sounding Board was chosen from among three finalists as part of a worldwide university competition designed to advance the state of the art in conversational artificial intelligence.


AI Is Dreaming Up New Kinds of Video Games

MIT Technology Review, Simon Parkin


Michael Cook, a 30-year-old senior research fellow at the University of Falmouth, has built an AI capable of imagining new video games from scratch. Cook calls the machine Angelina, a recursive acronym that stands for “A Novel Game-Evolving Labrat I’ve Named Angelina” (a joke that Cook says got old pretty quickly). Since its earliest form, in 2011, it has created hundreds of experimental video games, received acclaim in an international game-making competition, and had its work featured in a New York gallery exhibit.

Game-making algorithms are almost as old as video games, but their use has typically been limited to generating terrain and other simple digital art. The next frontier is using increasingly sophisticated machine-learning techniques to design entirely new kinds of games that have, to date, evaded the human imagination. Although in some domains—such as law, ethics, and health care—ignorance of tradition and precedent is a clear weakness for an AI, in game design it is a strength that could unlock new creativity. Moreover, it could help AI designers lower the cost of making games.


Controversial software is proving surprisingly accurate at spotting errors in psychology papers

Science, Dalmeet Singh Chawla


When Dutch researchers developed an open-source algorithm designed to flag statistical errors in psychology papers, it received a mixed reaction from the research community, especially after the free tool was run on tens of thousands of papers and the results were posted online. Many questioned the accuracy of the algorithm, named statcheck, or said the exercise amounted to public shaming.

But statcheck actually gets it right in more than 95% of cases, its developers claim in a study posted on the preprint server PsyArXiv on 16 November. Some outsiders agree, and are calling for routine use. “The new paper convincingly shows that statcheck is indeed robust,” says Casper Albers, a psychometrician at the University of Groningen in the Netherlands. Others still aren’t convinced.


An artificial intelligence algorithm developed by Stanford researchers can determine a neighborhood’s political leanings by its cars

Stanford University, Stanford News


Stanford researchers are using computer algorithms that can see and learn to analyze millions of publicly available images on Google Street View to determine the political leanings of a given neighborhood just by looking at the cars on the streets.


Precision Medicine: A Second Opinion

American College of Cardiology, Michael Joyner


“Precision medicine (PM),” formerly known as “personalized medicine,” is a series of ideas that stem in part from enthusiasm for the Human Genome Project (HGP). The idea is that through insights from the HGP and follow-up genomic studies, it should be possible to “read” a given individual’s genetic “code,” and based on variations in that code engage in “P4 medicine;” that is, predictive, preventive, personalized, and participatory. Additionally, many elements of this paradigm seem remote and/or upstream from the geriatric patient with multiple medical problems which frequently include iatrogenic issues associated with polypharmacy.

That said, it seems to me that when any new claims of improved diagnostic testing and downstream paradigms emerge, they should be evaluated by asking four simple questions: 1) Is what is being discussed measurable? 2) If measurable, is it meaningful? 3) If meaningful, is it actionable? 4) And, finally, if actionable, is it durable?


Madison Avenue Joins NBCU Summit to Address TV Problems

Variety, Brian Steinberg


A parade of Madison Avenue’s top executives streamed into a posh Manhattan restaurant in hopes of drumming up new remedies for some of the industry’s longest and most festering problems.

NBCUniversal called the meeting – a rare convocation of executives from fierce rivals – earlier this month, as the industry faces a mounting list of challenges to its ability to monetize advertising. Technology has granted couch potatoes a dizzying array of new ways to unhitch themselves from their traditional TV set – and the media buyers and sellers can’t agree upon a standard set of rules governing how ads are placed in emerging venues like mobile screens and streaming video, nor can they reach consensus on how to measure the viewers who watch them.

“We’ve got a problem,” said Linda Yaccarino, chairman of ad sales and client partnerships at NBCU. If the industry can’t come together, she said, it risks seeing the current ad-supported ecosystem that supports the industry erode over the next decade. “We can’t leave without a meaningful plan for action and follow-up.”


This Is Why WeWork Is Buying Meetup

WIRED, Backchannel, Jessi Hempel


I was surprised to learn that Meetup has just sold itself to the poster child of hyper-growth: WeWork. After raising $4.4 billion from Softbank’s humongous Vision Fund, WeWork is now valued at close to $20 billion, putting it in league with Uber and Airbnb as one of the most highly valued private US tech startups. (There are new reports surfacing that WeWork executives traveled to Israel over the holidays to raise more money.)

WeWork’s cofounder and CEO Adam Neumann is rushing to build out a company that endeavors to control the future of physical space. Its executives talk in sweeping terms about an addressable market that encompasses every last square foot of office space in the world. (“Tokyo is a billion. New York City is 400 million. Kansas City is 50 million.”)

Ask Neumann what he’s building, however, and he’ll describe a community-manufacturing machine: a startup that, according to its mission, is a place where people “work to make a life, not just a living.”

 
Events



Single Cell Genomics Day: A Practical Workshop

Satija Lab at the New York Genome Center and NYU


New York, NY Friday, January 19, 2018, starting at 9:30 a.m., NYU Center For Genomics and Systems Biology (12 Waverly Place). [free, registration required]


Southern Data Science Conference

SDSC


Atlanta, GA April 13-14, 2018. “SDSC is a special data science R&D conference that brings experts and researchers from the top data science companies and institutes to present their work and share their best practices in data science.” [$$$]


Diversity of Backgrounds: A Career Transition into Engineering

Bloomberg Women in Technology


New York, NY December 5, starting at 6 p.m. “ANITAB.org New York and Bloomberg Women in Technology invite you for a panel discussion in which five software engineers from Bloomberg share personal stories about their transition into the tech industry.”


Time Machine 2017

SparkCognition


Austin, TX December 13, starting at 8 a.m. “Join SparkCognition, along with a few of the world’s foremost AI leaders in tech, government, business, and research for our first annual Time Machine Summit.” [$$$]

 
Deadlines



State of the Stats 2017

Welcome to the third annual State of the Stats survey! Every year I try to gather people’s hopes, fears, bantz and snidery into one set of data to provide some insight into the health of the [soccer] analytics community.

Seeking a PhD Student

“We are seeking a new PhD student for the Fall of 2018 working on our Natural Visualization project that is sponsored by the National Science Foundation.” Deadline to apply to Georgia Tech CS PhD program is December 15.

NASA Datanauts: Apply to the Spring 2018 Class! | openNASA

“Are you a storyteller? We’d love you to explore new ways to share stories about data investigations you and your fellow Datanauts undertake together.” Deadline to apply is December 22.
 
NYU Center for Data Science News



PhD Candidate Profile: Xintian Han

Medium, NYU Center for Data Science


Who are our PhD students? Where do they come from, what are they studying now, and where do they hope to go in the future? Find out more about one of our PhD candidates, Xintian Han!


What kind of data do location search apps like Foursquare use?

Medium, NYU Center for Data Science


Foursquare’s Staff Scientist Enrique Cruz tells us how data science transformed the app into a major location search tool


PhD Candidate Profile: Vladimir Kobzar

Medium, NYU Center for Data Science


I am a Ph.D. candidate at the NYU Center for Data Science, where I am a member of the Math and Data Group. I earned my MS in Mathematics at the Courant Institute under the advisement of Professor Afonso Bandeira, as well as my JD and LLM from the NYU School of Law.

Before joining the CDS, I worked as a researcher at Argonne National Laboratory on the development of machine learning models for analysis of time-resolved X-ray scattering data.

 
Tools & Resources



[1711.09534] Neural Text Generation: A Practical Guide

arXiv, Computer Science > Computation and Language; Ziang Xie


Deep learning methods have recently achieved great empirical success on machine translation, dialogue response generation, summarization, and other text generation tasks. At a high level, the technique has been to train end-to-end neural network models consisting of an encoder model to produce a hidden representation of the source text, followed by a decoder model to generate the target. While such models have significantly fewer pieces than earlier systems, significant tuning is still required to achieve good performance. For text generation models in particular, the decoder can behave in undesired ways, such as by generating truncated or repetitive outputs, outputting bland and generic responses, or in some cases producing ungrammatical gibberish. This paper is intended as a practical guide for resolving such undesired behavior in text generation models, with the aim of helping enable real-world applications.


My Favorite Tool: Rasterio

Data Carpentry, Robert Sare


“Rasterio makes manipulating gridded spatial data (rasters) simple and brings these data into the Python ecosystem.”


Announcing a New rOpenSci Software Review Collaboration

rOpenSci


rOpenSci is pleased to announce a new collaboration with Methods in Ecology and Evolution (MEE), a journal of the British Ecological Society, published by Wiley press. Publications destined for MEE that include the development of a scientific R package will now have the option of a joint review process whereby the R package is reviewed by rOpenSci, followed by fast-tracked review of the manuscript by MEE. Authors opting for this process will be recognized via a mark on both web and print versions of their paper.


Open University Learning Analytics dataset

Nature, Scientific Data, Jakub Kuzilek, Martin Hlosta & Zdenek Zdrahal


“Learning Analytics focuses on the collection and analysis of learners’ data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University. What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students’ interactions in the Virtual Learning Environment (VLE).”


Sentiment analysis methods for understanding large-scale texts: A case for using continuum-scored words and word shift graphs

SpringerOpen blog, Andrew Reagan


“Due to the emergence and continuously increasing usage of social media services all over the world, it is now possible to estimate in real-time how entire groups of people are feeling at a given point. However, in order to be able to interpret the available data correctly, the right tools and methods need to be used. A new article published in EPJ Data Science examines a range of such methods and shows their ability but also their limitations.”

 
Careers


Full-time positions outside academia

Technical account manager



Figshare; North London, England

Tenured and tenure track faculty positions

Positions in the Information Systems Group



NYU Stern School of Business; New York, NY
