Data Science newsletter – January 25, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for January 25, 2019


 
 
Data Science News



In Davos, U.S. executives warn that China is winning the AI race

The Washington Post, Heather Long



Top business leaders here at the World Economic Forum are monitoring China’s slowing economy, but they are eager to make another point: China has taken the lead on the artificial intelligence revolution.

Blackstone chief executive Stephen Schwarzman, who travels frequently to Asia, said he sees an explosion of new AI businesses in China.


Look before you blockchain

Gordon and Betty Moore Foundation



The Gordon and Betty Moore Foundation funded “Tracing the Supply Chain: How blockchain can enable traceability in the food industry” as part of its efforts to assess the feasibility and costs associated with accelerating blockchain for traceability in beef, soy, farmed shrimp and wild capture tuna supply chains — namely, soy and beef in Latin America and aquaculture in South and Southeast Asia. The report benefited from advisors, including representatives from SAP, Microsoft, IBM, Republic Systems, Synapse Nexus and WWF.


Identifying artificial intelligence “blind spots”

MIT News



A novel model developed by MIT and Microsoft researchers identifies instances in which autonomous systems have “learned” from training examples that don’t match what’s actually happening in the real world. Engineers could use this model to improve the safety of artificial intelligence systems, such as driverless vehicles and autonomous robots.

The AI systems powering driverless cars, for example, are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. But sometimes the car makes an unexpected error in the real world because an event occurs that should, but doesn’t, alter the car’s behavior.
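
The published method is more elaborate (ambiguous human feedback is treated as noisy labels), but the core idea can be sketched roughly as follows, with made-up state features and a stand-in classifier rather than the authors' actual model: collect cases where a human corrected the simulator-trained policy, and learn to predict where such corrections happen.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy feedback log: (state_features, action_from_simulation_training, action_a_human_took).
# In the paper the feedback comes from human demonstrations and corrections.
feedback_log = [
    ([0.9, 0.1], "proceed", "proceed"),  # matches training: no correction
    ([0.2, 0.8], "proceed", "brake"),    # human overrides: candidate blind spot
    ([0.3, 0.7], "proceed", "brake"),
    ([0.8, 0.2], "proceed", "proceed"),
]

X = [state for state, _, _ in feedback_log]
y = [int(agent != human) for _, agent, human in feedback_log]  # 1 = disagreement

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict_proba([[0.25, 0.75]]))  # estimated blind-spot risk for a new state
```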


NBA and Turner sign up to new Warner innovation lab

SportsPro Media, Steven Impey



The two organisations have signed a collaboration agreement to help further shape the future of the consumer experience across the co-managed NBA Digital strand, an extensive cross-platform portfolio. Straddling live matches and shoulder content, its platforms include NBA TV, the NBA App, NBA.com, NBA League Pass and the NBA G League website.

“We are always exploring what is next for sports media and what it means for the future NBA experience,” said NBA Commissioner Adam Silver. “This collaboration with WarnerMedia will help identify cutting edge ways to use technology to deliver more immersive experiences to NBA fans.”

Additionally, Warner’s new innovations hub will foster further collaboration between Turner and Xandr, AT&T’s newly launched advertising company, in a bid to improve the delivery of data-driven advertising.


Information theory holds surprises for machine learning

Santa Fe Institute



New SFI research challenges a popular conception of how machine learning algorithms “think” about certain tasks.

The conception goes something like this: because of their ability to discard useless information, a class of machine learning algorithms called deep neural networks can learn general concepts from raw data — like identifying cats generally after encountering tens of thousands of images of different cats in different situations. This seemingly human ability is said to arise as a byproduct of the networks’ layered architecture. Early layers encode the “cat” label along with all of the raw information needed for prediction. Subsequent layers then compress the information, as if through a bottleneck. Irrelevant data, like the color of the cat’s coat, or the saucer of milk beside it, is forgotten, leaving only general features behind. Information theory provides bounds on just how optimal each layer is, in terms of how well it can balance the competing demands of compression and prediction.
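
The bounds referenced here come from the information bottleneck framework. In its standard formulation (standard notation, not taken from the SFI paper itself), each layer's representation T of the input X should be as compressed as possible while staying predictive of the label Y:

```latex
% Information bottleneck objective: I(\cdot;\cdot) is mutual information,
% T is a layer's representation of the input X, Y is the label, and \beta
% sets the tradeoff between compression and prediction.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y)
```

Since the SFI result challenges how well this picture explains trained networks, the objective should be read as the conception under test, not as settled mechanism.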


Government shutdown slows University science research

The Brown Daily Herald, Cate Ryan



Since Dec. 22, the University and other research institutions have scrambled to adapt to the unprecedented political stalemate over the federal government shutdown, which has dried up funds for federally supported research projects and now threatens long-term scientific advancement across the nation.

When the government shut down before the holidays, major agencies like the National Science Foundation, the National Endowment for the Humanities, the National Oceanic and Atmospheric Administration and the National Aeronautics and Space Administration all closed without approved appropriations for 2019. Research institutions rely on these agencies not only for funding but also for technological, data-based and research support.

“The federal shutdown is causing uncertainty and complications about research funding at Brown … In some cases the shutdown is slowing or halting research itself,” Vice President for Research Jill Pipher wrote in a statement to The Herald.


Canada, EU, and Africa combine to allow researchers to analyze health data on the largest, most diverse scale

McGill University, Newsroom



African, Canadian, and EU researchers have announced the CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project, an unprecedented multi-continental effort to build shared infrastructure: data standards, technical protocols, and software. The goal is to allow queries and analyses over the distributed data sets made available by each partner, while giving those partners complete control over the patient data entrusted to them.

Canada’s health data system has always necessarily been federated, and the experience of the Canadian Distributed Infrastructure for Genomics (CanDIG) with building federated queries and analyses over locally controlled private health data is essential to the project. CanDIG member institutions SickKids and McGill University are directly involved with CINECA, and CanDIG as a whole will bring its experience to bear by leading the work of building standard methods for federating queries, and actively participating in building compatible and interoperable systems for login, access control, and running complex distributed analyses.

“CanDIG is already connecting several important Canadian health data sets in cancer research,” said Guillaume Bourque, Director of the Centre for Computational Genomics at McGill and Co-PI of CanDIG.
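
The announcement doesn't specify CINECA's protocols, so the following is only a schematic sketch of the federated-query pattern it describes: a coordinator fans the same query out to each partner's endpoint and aggregates only the summary statistics each site chooses to return. The URLs and response field are hypothetical.

```python
import requests

# Hypothetical partner endpoints; CINECA's real API is not described
# in the announcement.
SITES = [
    "https://site-a.example.org/api/query",
    "https://site-b.example.org/api/query",
]

query = {"phenotype": "breast carcinoma", "gene": "BRCA2"}

total = 0
for url in SITES:
    resp = requests.post(url, json=query, timeout=30)
    total += resp.json()["case_count"]  # sites return aggregate counts, never patient records

print(f"Matching cases across all sites: {total}")
```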


The Hard Part of Computer Science? Getting Into Class

The New York Times, Natasha Singer



Lured by the prospect of high-salary, high-status jobs, college students are rushing in record numbers to study computer science.

Now, if only they could get a seat in class.

On campuses across the country, from major state universities to small private colleges, the surge in student demand for computer science courses is far outstripping the supply of professors, as the tech industry snaps up talent. At some schools, the shortage is creating an undergraduate divide of computing haves and have-nots — potentially narrowing a path for some minority and female students to an industry that has struggled with diversity.

The number of undergraduates majoring in the subject more than doubled from 2013 to 2017, to over 106,000, while tenure-track faculty ranks rose about 17 percent, according to the Computing Research Association, a nonprofit that gathers data from about 200 universities.


Award-Winning Artificial Intelligence Company Yewno Launches Innovative Investment Research Platform, Yewno|Edge

Business Wire, Yewno



Yewno, Inc., a leading provider of Artificial Intelligence solutions, launched its revolutionary investment research platform, Yewno|Edge. This game-changing tool empowers investment managers, analysts, and advisors to stay ahead of the market by delivering deeper knowledge of the factors impacting performance. Yewno|Edge extracts insights from both fundamental and alternative data, unlocking risk exposures and investment opportunities that would otherwise go undetected.


Fake news on Twitter during the 2016 U.S. presidential election

Science; Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, David Lazer



The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets.


Stevens Institute for AI Poised to Spur Research Collaboration and Industry Partnerships

NJ Tech Weekly, Esther Surden



In late November, Stevens Institute of Technology (Hoboken) celebrated the launch of its Stevens Institute for Artificial Intelligence (SIAI), with an expo of student and faculty work in this area and a roster of speakers discussing the importance of bringing all AI and machine learning research under one roof.

The university assembled an audience for this event that included key industry and government representatives from NASA, IBM, Pfizer and elsewhere.


Digital Economies at Global Margins

Edited by Mark Graham



Over the last five years, more than one billion people became new Internet users. Once, digital connectivity was confined to economically prosperous parts of the world; now Internet users make up a majority of the world’s population. In this book, contributors from a range of disciplines and locations investigate the impact of increased digital connectivity on people and places at the world’s economic margins. Does the advent of a digitalized economy mean that those in economic peripheries can transcend spatial, organizational, social, and political constraints—or do digital tools and techniques tend to reinforce existing inequalities?


World First: Huawei Unveils Groundbreaking 5G Chipset “Tiangang”

Synced



Huawei has announced the world’s first core chip for 5G base stations. The Chinese tech giant unveiled the new “Tiangang” chipset today in Beijing.

President of Huawei’s Carrier Business Group Ryan Ding says the chip makes breakthroughs in integration, computing capability and spectral bandwidth. The Tiangang chip’s computing capability is 2.5 times more powerful than previous chips, and it supports the 200 MHz high spectral bandwidth that will be required for future network deployment.


Paper maps versus digital maps: Which are better?

Fast Company, Meredith Broussard



Ted Florence is ready for his family trip to Botswana. He has looked up his hotel on Google Maps and downloaded a digital map of the country to his phone. He has also packed a large paper map. “I travel all over the world,” says Florence, president of Avenza, a digital map software company, and of the international board of the International Map Industry Association. “Everywhere I go, my routine is the same: I get a paper map, and I keep it in my back pocket.”

With the proliferation of smartphones, it’s easy to assume that the era of the paper map is over. That attitude, that digital is better than print, is what I call “technochauvinism.” In my book, Artificial Unintelligence: How Computers Misunderstand the World, I look at how technochauvinism has been used to create an unnecessary, occasionally harmful bias for digital over print or any other kind of interface. A glance at the research reveals that the paper map still thrives in the digital era, and there are distinct advantages to using print maps.


How Artificial Intelligence Could Improve Access to Legal Information

The Pew Charitable Trusts, Erika Rickard & Lester Bird



The Stanford and Suffolk teams, with funding and support from The Pew Charitable Trusts, have collected thousands of online questions about possible legal issues to start developing a data set that can serve to train a natural language processor (NLP)—a subset of AI focused on understanding context in speech. An NLP could recognize that people who seek information online about getting “kicked out” of their rental property, without using the legal term “eviction,” need insight into eviction law.

NLPs work by examining data sets of questions and looking for patterns in words and phrases that match up to a legal term. For example, if hundreds of questions in a database used the phrase “I’m being kicked out” and those questions were all identified by experts as dealing with the legal issue of eviction, then the NLPs would learn that if a person says “I’m getting kicked out,” there is a good chance he or she is being evicted.
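
As a toy illustration of that training loop (illustrative examples, not the Stanford/Suffolk data or code), a simple bag-of-words classifier over expert-labeled questions picks up the same association between everyday phrasing and a legal issue:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative expert-labeled questions, not the project's data set.
questions = [
    "My landlord says I'm being kicked out next week",
    "I got a notice to leave my apartment, what can I do?",
    "Can my boss fire me without warning?",
    "My employer cut my hours after I complained",
]
labels = ["eviction", "eviction", "employment", "employment"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(questions, labels)

# A new question phrased in everyday language; likely classified as eviction.
print(model.predict(["I'm getting kicked out of my rental"]))
```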

 
Events



Registration for a2-dlearn 4 – Feb. 23, 2019

Michigan AI Lab, Michigan Institute for Data Science



Ann Arbor, MI. “We are pleased to present our fourth annual a2-dlearn — a collaboration between the Ann Arbor – Natural Language Processing and Ann Arbor Machine Learning meetups, the Michigan AI Lab and the Michigan Institute for Data Science.” [registration required]

 
Tools & Resources



Amazon open-sources Neo-AI, a framework for optimizing AI models

VentureBeat, Kyle Wiggers



At last year’s re:Invent 2018 conference in Las Vegas, Amazon took the wraps off SageMaker Neo, a feature that enabled developers to train machine learning models and deploy them virtually anywhere their hearts desired, either in the cloud or on-premises. It worked as advertised, but the benefits were necessarily limited to AWS customers — Neo was strictly a closed-source, proprietary affair. That changed this week.

Amazon yesterday announced that it’s publishing Neo’s underlying code under the Apache Software License as Neo-AI and making it freely available in a repository on GitHub. This step, it says, will help usher in “new and independent innovations” on a “wide variety” of hardware platforms, from third-party processor vendors and device manufacturers to deep learning practitioners.
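
For readers who want to poke at the release, the GitHub organization includes both the compiler and the DLR runtime for executing Neo-compiled models. A minimal sketch of inference with DLR follows; the model directory and the input tensor name ("data") are assumptions that depend on how your particular model was compiled.

```python
import numpy as np
import dlr  # runtime from https://github.com/neo-ai/neo-ai-dlr

# "./compiled_model" stands in for a directory produced by Neo compilation,
# and the input name "data" is an assumption; both depend on your model.
model = dlr.DLRModel("./compiled_model", dev_type="cpu")
x = np.random.rand(1, 3, 224, 224).astype("float32")  # e.g. an image-classifier input
outputs = model.run({"data": x})  # returns a list of output arrays
print(outputs[0].shape)
```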


The Bucket List – A Growing and Incomplete List of Future Complexity Explorables

Dirk Brockmann



“The Bucket List below contains a few ideas on models and systems that are on the agenda for implementation.”

“If you have ideas about models or systems that are suitable for this site or that are missing, contact me and I’ll try to implement them.”


Our data lake story: How Woot.com built a serverless data lake on AWS

AWS Big Data Blog, Karthik Kumar Odapally and Chaya Carey



In this post, we talk about designing a cloud-native data warehouse as a replacement for our legacy data warehouse built on a relational database.


Data and Learning Hub for Science (DLHub)

Argonne National Laboratory



“A simple way to find, share, publish, and run machine learning models and discover training data for science”


Natural Questions: a New Corpus and Challenge for Question Answering Research

Google AI Blog; Tom Kwiatkowski and Michael Collins



Open-domain question answering (QA) is a benchmark task in natural language understanding (NLU) that aims to emulate how people look for information, finding answers to questions by reading and understanding entire documents. Given a question expressed in natural language (“Why is the sky blue?”), a QA system should be able to read the web (such as this Wikipedia page) and return the correct answer, even if the answer is somewhat complicated and long. However, there are currently no large, publicly available sources of naturally occurring questions (i.e. questions asked by a person seeking information) and answers that can be used to train and evaluate QA models. This is because assembling a high-quality dataset for question answering requires a large source of real questions and significant human effort in finding correct answers.

To help spur research advances in QA, we are excited to announce Natural Questions (NQ), a new, large-scale corpus for training and evaluating open-domain question answering systems, and the first to replicate the end-to-end process in which people find answers to questions.
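
The corpus ships as gzipped JSONL. A rough sketch of inspecting it might look like the following; the file name here is a placeholder, and the field names reflect my understanding of the released format and should be checked against the official documentation.

```python
import gzip
import json

# "nq-train.jsonl.gz" is a placeholder file name; verify the field names
# against the official release notes before relying on them.
with gzip.open("nq-train.jsonl.gz", "rt") as f:
    for line in f:
        example = json.loads(line)
        print(example["question_text"])        # a naturally occurring query
        annotation = example["annotations"][0]
        print(annotation["long_answer"])       # passage-level answer span
        print(annotation["short_answers"])     # short answer span(s), if any
        break
```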

 
Careers


Tenured and tenure track faculty positions

Assistant Professor in Human Augmentation



University of California-San Diego, Cognitive Science Department; La Jolla, CA

Postdocs

Postdoctoral Position in Creativity and K–12 Computer Science



Harvard University, Graduate School of Education; Cambridge, MA
