Data Science newsletter – February 14, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for February 14, 2018

GROUP CURATION: N/A

 
 
Data Science News



Crowd Workers, AI Make Conversational Agents Smarter

Carnegie Mellon University, News



Conversational agents such as Siri, Alexa and Cortana are great at giving you the weather, but are flummoxed when asked for unusual information, or follow-up questions. By adding humans to the loop, Carnegie Mellon University researchers have created a conversational agent that is tough to stump.

The chatbot system, called Evorus, is not the first to use human brainpower to answer a broad range of questions. What sets it apart, says Jeff Bigham, associate professor in the Human-Computer Interaction Institute, is that humans are simultaneously training the system’s artificial intelligence, making it gradually less dependent on people.

Like an earlier CMU agent called Chorus, Evorus recruits crowd workers on demand from Amazon Mechanical Turk to answer questions from users, with the crowd workers voting on the best answer. Evorus also keeps track of questions asked and answered and, over time, begins to suggest these answers for subsequent questions. The researchers also have developed a process by which the AI can help to approve a message with less crowd worker involvement.

“Companies have put a lot of effort into teaching people how to talk to conversational agents, given the devices’ limited command of speech and topics,” Bigham said. “Now, we’re letting people speak more freely and it’s the agent that must learn to accommodate them.”
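The excerpt names two mechanisms: crowd workers voting on the best reply, and the system reusing earlier answers for later questions. It doesn't say how Evorus implements either. The following is a minimal Python sketch under our own assumptions. The names `pick_by_vote` and `AnswerMemory`, the word-overlap (Jaccard) similarity, and the 0.5 threshold are all hypothetical choices, not CMU's method.

```python
from collections import Counter


def pick_by_vote(votes):
    """Return the candidate answer with the most crowd votes (None if no votes).

    Ties go to the answer that received its first vote earliest.
    """
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


def tokenize(text):
    # Crude lowercase bag of words; strips only "?" and "," for brevity.
    return set(text.lower().replace("?", " ").replace(",", " ").split())


def jaccard(a, b):
    # Word-overlap similarity between two token sets, from 0.0 to 1.0.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class AnswerMemory:
    """Hypothetical store of past question/answer pairs approved by the crowd."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.pairs = []  # list of (token_set, answer)

    def record(self, question, answer):
        self.pairs.append((tokenize(question), answer))

    def suggest(self, question):
        """Return the answer to the most similar past question, if similar enough."""
        q = tokenize(question)
        best_score, best_answer = 0.0, None
        for tokens, answer in self.pairs:
            score = jaccard(q, tokens)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


# Workers vote, the winning reply is stored, and a later paraphrase reuses it.
memory = AnswerMemory()
winner = pick_by_vote(["It opens at 9am", "It opens at 9am", "Not sure"])
memory.record("What time does the library open?", winner)
print(memory.suggest("what time does the library open today?"))
```

A production system would replace word overlap with a learned model. The point here is only the loop the article describes: human votes produce approved answers, and approved answers gradually reduce how often humans are needed.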


The 5G Revolution Will Be Televised

Sports Innovation Lab



As we watch the Olympics, a revolution is unfolding before our very eyes: the arrival of 5G. Korea Telecom has developed a high-speed, low-latency network that can support mass video streaming and deliver Wi-Fi to 250,000 devices simultaneously. It is also partnering with Intel and Samsung on tablet-based video demonstrations of the network. The need for 5G innovation is about more than live-streaming sports content: applications for this type of technology will also be visible in communication between emergency responders and emergency room staff, in autonomous vehicles, and in connected security technology.


The Scanpy software processes huge amounts of single-cell data

Helmholtz Zentrum München – German Research Center for Environmental Health



Scientists from the Helmholtz Zentrum München have developed a program that helps manage and analyze enormous single-cell datasets. The software, named Scanpy, is a candidate for analyzing the Human Cell Atlas and has recently been published in ‘Genome Biology’.


FCC report finds almost no broadband competition at 100Mbps speeds

Ars Technica, Jon Brodkin



If you live in the US and want home Internet service at speeds of at least 100Mbps, you will likely find one Internet service provider in your area or none at all.

The latest Internet Access Services report was released by the Federal Communications Commission last week. The report’s broadband competition chart shows that 44 percent of developed Census blocks had zero home broadband providers offering download speeds of at least 100Mbps and upload speeds of at least 10Mbps.

Forty-one percent of developed Census blocks had one ISP offering such speeds, for a total of 85 percent with zero or one ISP. The remaining 15 percent had two or three providers at that level as of the end of 2016. That’s up a bit from June 30, 2016, when about 12 percent of Census blocks had at least two providers of 100Mbps services.
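The shares above add up as follows. This quick Python check uses only the figures quoted from the FCC report. The June 2016 figure is "about 12 percent," so the computed change is approximate too.

```python
# Shares (percent) of developed Census blocks by number of ISPs offering at least
# 100 Mbps down / 10 Mbps up at the end of 2016, as quoted above.
shares = {"zero": 44, "one": 41}

zero_or_one = shares["zero"] + shares["one"]  # 85 percent with zero or one ISP
two_or_more = 100 - zero_or_one               # 15 percent with two or three ISPs

# June 30, 2016: "about 12 percent" had at least two such providers.
june_2016_two_or_more = 12
change_points = two_or_more - june_2016_two_or_more  # roughly 3 percentage points

print(f"zero or one ISP: {zero_or_one}%")
print(f"two or more ISPs: {two_or_more}%")
print(f"change since June 2016: +{change_points} percentage points")
```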


IBM-Microsoft Spat Elevates Diversity to Tech-Secret Level

Bloomberg Tech, Chris Dolmetsch



International Business Machines Corp. called foul on Microsoft Corp.’s hiring of its chief diversity officer in a case that elevates recruiting and promotion of an inclusive workforce to the level of safeguarding proprietary technology.

IBM claims the information that Lindsay-Rae McIntyre possesses — including confidential data about diversity, strategies and initiatives — can cause “real and immediate competitive harm” if she’s allowed to move immediately to Microsoft. IBM sued to enforce a one-year non-competition agreement.

While the lawsuit highlights the contention that can ensue when a senior employee bolts for a rival, it also shines a light on the increasing role that diversity measures play in corporate America. Technology and financial companies have reserved those fights in the past to employees who possessed key technical or strategic knowledge, not those entrusted to make decisions on hiring and the makeup of the workforce.


Tech’s Ethical ‘Dark Side’: Harvard, Stanford and Others Want to Address It

The New York Times, Natasha Singer



Laura Noren, who teaches a data science ethics course at New York University, said, “You can patch the software, but you can’t patch a person if you, you know, damage someone’s reputation.”


Extra Extra

Read this contemporary horror story before buying a smart coffee maker.

Team USA sports data analysts want to tip the scales in the competition for medals at the Winter Olympics. (Canada too.)



Sergei Iljukov, of the Research Institute for Olympic Sports in Finland, and Yorck Schumacher, of the Aspetar Orthopaedic and Sports Medicine Hospital in Qatar, have been using data science to identify outliers in individual athletes' performance records. Rooting out cheaters is a worthy goal. As an armchair sports ethnographer, I argue that one of the key collective cultural benefits of sports is that they continuously reaffirm a collective commitment to rules that are the same for everyone.
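The item doesn't say which method Iljukov and Schumacher use. As a generic illustration of flagging outliers in one athlete's record, here is a minimal Python sketch of a standard robust technique: the modified z-score built from the median and the median absolute deviation (MAD), with the common 3.5 cutoff. The function name and the race times are hypothetical.

```python
from statistics import median


def robust_outliers(results, threshold=3.5):
    """Flag results whose modified z-score exceeds `threshold` in absolute value.

    Uses the median and the median absolute deviation (MAD), which, unlike the
    mean and standard deviation, are not pulled toward the outlier itself.
    Returns a list of (index, value, score) tuples.
    """
    med = median(results)
    mad = median(abs(x - med) for x in results)
    if mad == 0:
        return []  # no spread to measure against
    flagged = []
    for i, x in enumerate(results):
        score = 0.6745 * (x - med) / mad  # 0.6745 scales MAD to a std-dev-like unit
        if abs(score) > threshold:
            flagged.append((i, x, round(score, 2)))
    return flagged


# Hypothetical 5000 m times (seconds) for one athlete; lower is faster.
times = [812.4, 809.9, 815.1, 811.0, 808.7, 790.2, 813.3]
print(robust_outliers(times))
```

A flag is only a prompt for a closer look. An unusually fast race can have innocent explanations (conditions, course, a genuine breakthrough).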



I got my pictures in The New York Times last week. Cool.


Modeling human behavior with Airbnb

EPFL, Medicom



Researchers at Idiap and EPFL have been working with psychologists to understand how people form first impressions from photos. They focused on how people respond to properties available on Airbnb. Better analysis of human behavior should allow scientists to program machines capable of making more “human” decisions.

With just a few clicks on TripAdvisor or Airbnb, you can book a romantic apartment for a weekend away with your partner, or a stylish restaurant for a business lunch. The rapid decisions involved, based mainly on images, are far from trivial given their commercial importance and the economic revolution represented by the advent of on-demand economy websites like Airbnb. But what is it about an image that prompts us to describe an interior as “trendy”, “colorful” or “practical”? To answer that question, researchers at Idiap Research Institute and at EPFL have been working with psychologists from the University of Lausanne. They want to gain a better understanding of social media users’ perceptions and behavior and then use this knowledge to program computers capable of making decisions in a more human way. “In the era of big data, machines are increasingly behind a large number of decisions,” explains Daniel Gatica-Perez, adjunct professor at EPFL School of Engineering and Digital Humanities Institute. “Our aim is to make them as similar as possible to human decisions.”


Company Data Science News

Facebook has gotten on journalism’s bad side, maybe by redesigning the news feed algorithm to downplay journalists’ content, but more probably by being opaque, which seems to fuel the journalistic passion to discover, uncover, and sometimes go undercover to reveal corporate practices. This week, Wired interviewed 51 current or former Facebook employees, revealing extensive corporate surveillance (not that I think Facebook is unique here) and a company struggling to meet all of the communications responsibilities that come with its 2 billion accounts.

In Germany, a court found Facebook in violation of a German law very similar to the upcoming GDPR. The German Federal Data Protection Act requires that users be given “clear, easy to understand information about the intended use of their data.” Facebook’s default privacy settings, including preactivated location services, were found to violate this law. Rough times for the tech giant.



The AMP (Accelerated Mobile Pages) technology that Google developed for serving news articles quickly has been gaining reader traction, according to Chartbeat.



IBM has gone to court to keep Microsoft from immediately hiring IBM’s recently departed chief diversity officer, Lindsay-Rae McIntyre. IBM wants the one-year non-compete in McIntyre’s contract enforced, arguing that her immediate move would cause “real and immediate competitive harm” to IBM. This is the first time I’ve ever seen such a battle over a chief diversity officer. Meanwhile, Timnit Gebru is clamoring for more diversity in data science, declaring a “diversity crisis.” I join the clamor, Timnit.



The Rand Corporation has a new report out on “Truth Decay” that takes a systematic look at four potential causes: “cognitive bias; changes in the information system, including social media and the 24-hour news cycle; competing demands on the education system that diminish time spent on media literacy and critical thinking; and polarization, both political and demographic.”



Do NOT use Aetna’s health insurance data as training data. A former medical director at Aetna admitted under oath that he never looked at patients’ medical records when deciding whether to approve or deny care. This is bad!



Google’s Cloud Tensor Processing Unit (TPU) chip is now available to researchers. This heats up the simmering showdown between Nvidia and Google over hardware designed for machine learning tasks. According to Tony Peng, Google’s TPUs are “15 to 30 times faster than contemporary GPUs and CPUs in inferencing, and delivered a 30–80 times improvement in TOPS/Watt measure.”

The International Federation of Robotics introduced a new measure that is so apt I can’t believe it doesn’t already exist: robot density, the number of robots per 10,000 workers. Europe has the highest robot density, at 99 per 10,000 workers, followed by the Americas at 84 per 10,000 workers. Expect this metric to enter your vocabulary immediately, and to increase rapidly.
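Robot density is simple arithmetic: robots divided by workers, scaled to 10,000 workers. Here is a minimal Python sketch. The helper name and the inputs are hypothetical, not IFR data.

```python
def robot_density(robots, workers):
    """Robots per 10,000 workers (hypothetical helper)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    # Multiply before dividing so whole-number inputs stay exact where possible.
    return robots * 10_000 / workers


# Made-up example: 2,000 robots in a workforce of 250,000.
print(robot_density(2_000, 250_000))  # 80.0
```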



Chris Wiggins, Chief Data Scientist at The New York Times, and his nytDEMO team released a couple of new data science tools for news advertising last week.



Chemists at Merck and Princeton used a high-throughput reaction technology to produce enough data to train a machine learning algorithm that could predict the outcome of thousands of reactions. This technique promises to dramatically accelerate organic chemistry.
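The item doesn't specify the team's model. To show the general idea of learning reaction outcomes from screening data, here is a minimal Python sketch that swaps in a simple k-nearest-neighbors regressor. It describes each reaction as a numeric descriptor vector and predicts a new reaction's yield from its most similar measured neighbors. The descriptors and yields below are invented for illustration.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a yield as the mean yield of the k most similar known reactions.

    `train` is a list of (descriptor_vector, yield_percent) pairs; similarity is
    Euclidean distance between descriptor vectors (math.dist, Python 3.8+).
    """
    if k <= 0 or k > len(train):
        raise ValueError("k must be between 1 and len(train)")
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical scaled descriptors (e.g., catalyst, base, additive properties)
# paired with made-up yields from a high-throughput screen.
screen = [
    ((0.1, 0.8, 0.3), 72.0),
    ((0.2, 0.7, 0.4), 68.0),
    ((0.9, 0.1, 0.5), 12.0),
    ((0.8, 0.2, 0.6), 15.0),
    ((0.15, 0.75, 0.35), 70.0),
]
print(knn_predict(screen, (0.12, 0.78, 0.32), k=3))  # 70.0
```

The high-throughput part matters because methods like this one only predict well in regions of descriptor space that the training reactions already cover.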



Truth Decay – An Initial Exploration of the Diminishing Role of Facts and Analysis in American Public Life

Rand Corporation, Jennifer Kavanagh and Michael D. Rich



Over the past two decades, national political and civil discourse in the United States has been characterized by “Truth Decay,” defined as a set of four interrelated trends: an increasing disagreement about facts and analytical interpretations of facts and data; a blurring of the line between opinion and fact; an increase in the relative volume, and resulting influence, of opinion and personal experience over fact; and lowered trust in formerly respected sources of factual information. These trends have many causes, but this report focuses on four: characteristics of human cognitive processing, such as cognitive bias; changes in the information system, including social media and the 24-hour news cycle; competing demands on the education system that diminish time spent on media literacy and critical thinking; and polarization, both political and demographic. The most damaging consequences of Truth Decay include the erosion of civil discourse, political paralysis, alienation and disengagement of individuals from political and civic institutions, and uncertainty over national policy.

This report explores the causes and consequences of Truth Decay and how they are interrelated, and examines past eras of U.S. history to identify evidence of Truth Decay’s four trends and observe similarities with and differences from the current period. It also outlines a research agenda, a strategy for investigating the causes of Truth Decay and determining what can be done to address its causes and consequences.


State launches Aetna probe after stunning admission – CNN

CNN; Wayne Drash, Elizabeth Cohen and John Bonifield



California’s insurance commissioner has launched an investigation into Aetna after learning a former medical director for the insurer admitted under oath he never looked at patients’ records when deciding whether to approve or deny care.

California Insurance Commissioner Dave Jones expressed outrage after CNN showed him a transcript of the testimony and said his office is looking into how widespread the practice is within Aetna.

“If the health insurer is making decisions to deny coverage without a physician actually ever reviewing medical records, that’s of significant concern to me as insurance commissioner in California — and potentially a violation of law,” he said.


PLOS and Cold Spring Harbor Laboratory Enter Agreement to Enable Preprint Posting on bioRxiv

Public Library of Science



Public Library of Science (PLOS) and Cold Spring Harbor Laboratory (CSHL) announce an agreement that enables the automatic posting of research articles submitted to PLOS journals on bioRxiv, CSHL’s preprint server for the life sciences. This collaboration between bioRxiv and PLOS empowers authors to share their work on a trusted platform before peer review, accelerating the pace of biomedical research.

 
Events



Defending Democracy: Civil and Military Responses to Weaponized Information

Princeton CITP



Princeton, NJ April 7, starting at 11 a.m., Friend Center (65 Olden Street). Co-hosted by Princeton’s Woodrow Wilson School of Public and International Affairs, the Princeton School of Engineering and Applied Science, and the Princeton Veterans Alumni Association. [$$]


Predict Conference

Predict Conference



Dublin, Ireland October 2. “We created Predict because no conference out there reflected the current state of technology, where data and artificial intelligence are at the centre of innovation. These are currently the most heavily invested startup areas. Predict attendees will meet innovators and founders operating at the forefront of data and technology.” [$$$]


The Creating Reality Hackathon

University of Southern California



Los Angeles, CA March 12-15 at University of Southern California, Tutor Center. [registration required]

 
Deadlines



Deep Learning and Reinforcement Learning Summer School 2018

Toronto, Canada. Programs will run July 25-August 3 at the University of Toronto, Rotman School of Management. Applications will be accepted starting in March.
 
NYU Center for Data Science News



And the Emmy Award Goes to … Professor Claudio Silva

NYU Tandon School of Engineering



There may not have been a red carpet involved, but Professor of Computer Science and Engineering Claudio Silva had cause for celebration when the National Academy of Television Arts & Sciences announced the winners of the 2018 Technology and Engineering Emmy Awards, which honor breakthroughs in technology that have a significant effect on television engineering.

Silva is the developer of a visual analytics tool now being used in Major League Baseball stadiums across the country as part of the Emmy-winning Statcast, which combines the tool with a radar-based ball-tracking system and an optical player-tracking solution.

 
Tools & Resources



[1802.01071] Hierarchical Adversarially Learned Inference

arXiv, Statistics > Machine Learning; Mohamed Ishmael Belghazi, Sai Rajeswar, Olivier Mastropietro, Negar Rostamzadeh, Jovana Mitrovic, Aaron Courville



We propose a novel hierarchical generative model with a simple Markovian structure and a corresponding inference model. Both the generative and inference model are trained using the adversarial learning paradigm. We demonstrate that the hierarchical structure supports the learning of progressively more abstract representations as well as providing semantically meaningful reconstructions with different levels of fidelity. Furthermore, we show that minimizing the Jensen-Shannon divergence between the generative and inference network is enough to minimize the reconstruction error. The resulting semantically meaningful hierarchical latent structure discovery is exemplified on the CelebA dataset. There, we show that the features learned by our model in an unsupervised way outperform the best handcrafted features. Furthermore, the extracted features remain competitive when compared to several recent deep supervised approaches on an attribute prediction task on CelebA. Finally, we leverage the model’s inference network to achieve state-of-the-art performance on a semi-supervised variant of the MNIST digit classification task.
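The abstract's central claim concerns the Jensen-Shannon divergence. Here is a minimal Python sketch of the quantity itself, for small discrete distributions: JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M), where M = ½(P + Q), measured in bits. The paper works with distributions over images and latent codes and minimizes the divergence through adversarial training rather than computing it directly. Treat this only as an illustration of the definition.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m.

    Symmetric, and bounded between 0 and 1 bit.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 wherever p_i > 0 or q_i > 0, so the KL terms never divide by zero.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical distributions
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0: disjoint supports
```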


SLM Lab

Wah Loon Keng



A research framework for Deep Reinforcement Learning using Unity, OpenAI Gym, PyTorch, and TensorFlow.


ORCID: The number that every academic needs | NCSU Libraries

NCSU Libraries News, Debbie Currie



“The Open Researcher and Contributor identifier—or ORCID—easily connects a researcher to his or her research output and allows others to access and share that body of work. ORCID streamlines publication submission and enhances discoverability. And, increasingly, granting bodies are requiring the ORCID as part of their application process.”
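The quote doesn't describe the number's format. Per ORCID's own documentation, an ORCID iD has 16 characters, usually written in four hyphen-separated groups. Its last character is a check digit (0-9 or X) computed with the ISO 7064 MOD 11-2 algorithm. Here is a minimal Python sketch of that check. The function names are ours, and the test iD is the sample used in ORCID's documentation. The check only catches typos; it does not confirm that an iD is registered.

```python
def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check length, digits, and check character of an iD like '0000-0002-1825-0097'."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not all(c in "0123456789" for c in digits[:15]):
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```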


Introduction to Learning to Trade with Reinforcement Learning

Denny Britz, WILDML blog



“In this post, I’m going to argue that training Reinforcement Learning agents to trade in the financial (and cryptocurrency) markets can be an extremely interesting research problem.”
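To make "training agents to trade" concrete, here is a minimal, hypothetical Python environment in the reset/step style popularized by OpenAI Gym, without depending on Gym itself. It is not taken from Britz's post. The agent is either flat (all cash) or long one unit of a single asset, and the reward is the step-to-step change in portfolio value. It ignores fees, slippage, and position sizing, all of which matter in real markets.

```python
import random


class ToyTradingEnv:
    """Hypothetical single-asset environment.

    Actions: 0 = keep current position, 1 = go long one unit, 2 = go flat.
    Reward: position * (next price - current price).
    """

    def __init__(self, prices):
        if len(prices) < 2:
            raise ValueError("need at least two prices")
        self.prices = prices

    def reset(self):
        self.t = 0
        self.position = 0  # 0 = flat (cash), 1 = long one unit
        return (self.prices[self.t], self.position)

    def step(self, action):
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = 0
        price_now = self.prices[self.t]
        self.t += 1
        price_next = self.prices[self.t]
        reward = self.position * (price_next - price_now)
        done = self.t == len(self.prices) - 1
        return (price_next, self.position), reward, done


def run_episode(env, policy):
    """Roll out one episode with `policy(state) -> action` and return total reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


prices = [100.0, 101.0, 99.5, 102.0, 103.5]
print(run_episode(ToyTradingEnv(prices), lambda s: 1))  # 3.5: always long
print(run_episode(ToyTradingEnv(prices), lambda s: random.choice([0, 1, 2])))
```

A learning agent would replace the fixed and random policies with one that improves from the rewards it observes.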

 
Careers


Internships and other temporary positions

Annual fellowships for aspiring data analysts in soccer/football



North Yard Analytics; New York, NY
