Data Science newsletter – February 2, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for February 2, 2018

GROUP CURATION: N/A

 
 
Data Science News



Data Visualization of the Week

The Economist




Tweet of the Week

Twitter, Katie Mack




The Shallowness of Google Translate

The Atlantic, Douglas Hofstadter



The program uses state-of-the-art AI techniques, but simple tests show that it’s a long way from real understanding.


How Amazon Rebuilt Itself Around Artificial Intelligence

WIRED, Backchannel, Steven Levy



How deep learning came to power Alexa, Amazon Web Services, and nearly every other division of the company. [long read]


Artificial Intelligence: The Complete Guide

WIRED, Business, Tom Simonite



For most of us, the most obvious results of the improved powers of AI are neat new gadgets and experiences such as smart speakers, or being able to unlock your iPhone with your face. But AI is also poised to reinvent other areas of life. One is health care. Hospitals in India are testing software that checks images of a person’s retina for signs of diabetic retinopathy, a condition frequently diagnosed too late to prevent vision loss. Machine learning is vital to projects in autonomous driving, where it allows a vehicle to make sense of its surroundings.

There’s evidence that AI can make us happier and healthier. But there’s also reason for caution. Incidents in which algorithms picked up or amplified societal biases around race or gender show that an AI-enhanced future won’t automatically be a better one.


University Data Science News

MIT launched MIT Intelligence Quest, a two-part effort that will require hundreds of millions of dollars from industry partners. It is structured around The Core, an effort to develop the basic scientific underpinnings of data science, and The Bridge, a unit that will get this science ready for real-world application.



MIT Sloan fellow Michael Schrage warns against overconfidence in machine learning. I’m guessing my readers are already equipped with healthy skepticism around ML hype, but it’s still valuable to understand Schrage’s concerns. He sees ML in business contexts all the time and notes, “It’s precisely because machine learning and artificial intelligence platforms are supposed to be ‘smart’ that they pose uniquely challenging organizational risks. They are likelier to inspire false and/or misplaced confidence in their findings; to amplify or further entrench data-based biases; and to reinforce — or even exacerbate — the very human flaws of the people who deploy them.” Now that I’ve got my ethics course into our curriculum, I’m trying to develop a course that teaches students how to put their findings into practice safely and articulately.



Jytte Klausen and Anura Jayasumana were awarded $731,000 over two years from the National Institute of Justice to predict, in real time, the radicalization of would-be instigators of violence using a multimedia data archive and network science. The goal is to inform local law enforcement about violence before it happens. I know these projects are designed to stop violence, a worthy goal, but they all sound like they could lead to a Minority Report situation, or to a repeat of what happened with the DACA list: data originally gathered for good reasons being used to deport people. I would also note that the only funding reports I see are for countering Islamist terror. Nobody seems to be getting grants to detect domestic terror with data science.



Yoshua Bengio and his team at the Montreal Institute for Learning Algorithms put out an arXiv paper on MILABOT, a deep reinforcement learning chatbot developed for the Amazon Alexa Prize. It “performed significantly better than other systems” at small talk. Right. I know that sounds like a strange way to measure success, but small talk is hard.



Fei-Fei Li was at Davos and posted a Twitter thread summarizing her takeaways on AI. Spoiler alert: it wasn’t geeky enough for her, but she still had fun.

Mark Hansen, Director of the Brown Institute for Media Innovation at Columbia, was part of the team that created a seriously excellent investigative report in The New York Times on “social media’s black market.” We all knew about fake followers and paying for engagement (e.g., buying likes and retweets). The article does a great job of showing how the affordances of the platform (e.g., Twitter) are part of a sociotechnical system in which there is a brute logic to being fake. Even Michael Dell’s authenticated Twitter account has fake followers. Dell declined to comment. In class, several students reported concern that they may be addicted to Instagram.

Julia Dressel and Hany Farid of Dartmouth show that the controversial COMPAS recidivism prediction algorithm is no better at predicting recidivism than 1) people who don’t know much about recidivism or 2) a linear model with two features.
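For anyone who hasn’t built one, here is roughly what “a linear model with two features” looks like: a minimal logistic-regression sketch in plain Python, fit by batch gradient descent. It assumes the two features are age and number of prior convictions, and the ten toy records are invented for illustration. They are not COMPAS data or the study’s data, and this is not Dressel and Farid’s code.

```python
import math

# Invented toy records: (age, prior_convictions, reoffended).
# Illustration only: NOT COMPAS data and NOT the data used by Dressel and Farid.
TOY_DATA = [
    (19, 4, 1), (22, 3, 1), (25, 5, 1), (21, 2, 1), (30, 6, 1),
    (45, 0, 0), (52, 1, 0), (38, 0, 0), (60, 2, 0), (41, 1, 0),
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(age: float, priors: float) -> list[float]:
    """Intercept term, age rescaled to decades, and the raw count of prior convictions."""
    return [1.0, age / 10.0, float(priors)]


def fit_logistic(data, lr: float = 0.1, epochs: int = 5000, l2: float = 0.01) -> list[float]:
    """Fit p(reoffend) = sigmoid(w0 + w1*age/10 + w2*priors) by batch gradient descent.

    A small L2 penalty on w1 and w2 keeps the weights finite, because this
    toy data set happens to be perfectly separable.
    """
    w = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for age, priors, label in data:
            x = features(age, priors)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - label
            for j in range(3):
                grad[j] += err * x[j] / n
        for j in range(1, 3):  # the intercept is not penalized
            grad[j] += l2 * w[j]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def predict(w: list[float], age: float, priors: float) -> float:
    """Predicted probability of reoffending under the fitted toy model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(age, priors))))


if __name__ == "__main__":
    weights = fit_logistic(TOY_DATA)
    print("weights (intercept, age/10, priors):", [round(v, 3) for v in weights])
    print("p(reoffend | age 20, 5 priors):", round(predict(weights, 20, 5), 3))
    print("p(reoffend | age 55, 0 priors):", round(predict(weights, 55, 0), 3))
```

The striking part of the result is that COMPAS did no better than a model this small; the sketch above does not reproduce any of the study’s numbers.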



The Engineering and Physical Sciences Research Council in the UK announced £14m in new funding for data science to be spread across four universities.



NYU’s GovLab has launched a curated set of papers on the Open Governance Research Exchange.



The World Bank Data Team’s 12 funded projects, all of which require a government or non-profit partner, may end up publishing on the new Open Governance Research Exchange.



Howard University’s three-month pilot partnership with Google at the Mountain View campus was a success and will expand to a full year.



In other Googley university partnerships, DeepMind has announced it will endow a chair in computer science at the University of Alberta.



Over the weekend I will be reading this careful, rigorous AI and Jobs paper by Boston University School of Law economist James Bessen, and this one on AI, Labor Productivity, and the Need for Firm-Level Data by Robert Seamans and Manav Raj (both of NYU Stern School of Business).



Michael Dietze (Boston University) wants to improve your relationship with nature by offering ecological forecasts of things like whether or not you can expect an algal bloom at the beach, ticks in the woods, or fall foliage in the hills. The utility for hedonic individuals is obvious, but the broader purpose is to improve ecosystems by allowing farmers, private landowners, and local governments to better craft their land-use strategies.



Physicists have a powerful new data science detection apparatus for finding anomalies in the universe. Black holes, ahoy!



Washington University in St. Louis now offers a PhD in imaging science. Rochester Institute of Technology already offers a PhD in Imaging Science.

At Yale University, nearly one-quarter of the undergraduate class is currently enrolled in Psychology and the Good Life. NYU offers a similar course, The Science of Happiness, which has yet to go viral, though the NYU class is reportedly “like a TED talk.”



Indian Institute of Technology Professor Pushpak Bhattacharyya and a team of four doctoral students have cracked the code on detecting sarcasm, a known obstacle to accurate sentiment analysis. The problem may have been exacerbated by current geopolitical trends, since political discourse (and disbelief) can be a source of defensive sarcasm.



Luis Ceze, a computer scientist at the University of Washington, wants to build a search engine out of DNA. His team is collecting 10,000 images to encode in DNA, using it as a memory device, which has already been done (and is super exciting). The new challenge is to use DNA as a querying device, which is even more exciting. This ties for moonshot project of the week with the neuromorphic technology coming out of NIST (below in Government News).


Darpa Wants to Build an Image Search Engine out of DNA

WIRED, Science, Megan Molteni



Most people use Google’s search-by-image feature to either look for copyright infringement, or for shopping. See some shoes you like on a frenemy’s Instagram? Search will pull up all the matching images on the web, including from sites that will sell you the same pair. In order to do that, Google’s computer vision algorithms had to be trained to extract identifying features like colors, textures, and shapes from a vast catalogue of images. Luis Ceze, a computer scientist at the University of Washington, wants to encode that same process directly in DNA, making the molecules themselves carry out that computer vision work. And he wants to do it using your photos.

On Wednesday, Ceze’s team at UW launched a social media campaign to collect 10,000 images from around the world and preserve their pixels in the As, Ts, Cs and Gs that make up the building blocks of life. They’ve done this sort of thing before; in 2016 they encoded an entire OK Go music video—setting the record for most amount of data stored in DNA. But this time they decided to crowdsource the data, building a website where people can submit photos and encouraging people to share their images on social media with the hashtag #MemoriesInDNA. “DNA can last thousands of years,” says Ceze. “So this is essentially a time capsule. What do you want to preserve forever?”
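The basic idea of preserving pixels “in the As, Ts, Cs and Gs” can be sketched with the simplest possible mapping: two bits per nucleotide. This is a hypothetical illustration, not the UW team’s encoding; real DNA storage systems add error correction, addressing, and constraints such as avoiding long runs of the same base.

```python
# Hypothetical naive codec: every 2 bits map to one nucleotide.
# Real DNA storage encodings are far more involved (error correction,
# addressing, avoiding long runs of one base); this only shows the idea.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): read two bits per base, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

At two bits per base, a 1 MB photo needs roughly four million nucleotides before any of that extra overhead.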


Lack of Vision Among Robocar Visionaries

EE Times, Junko Yoshida



Hagai Zyss, CEO of the Israeli company Autotalks, is one executive not afraid to confront the lack of vision among automotive automation visionaries.

At the North American International Auto Show in Detroit earlier this month, he addressed the audience about a “lack of coordination between autonomous vehicles and manned vehicles.” This, he said, “fails the driverless vision.”

I talked recently with Zyss on the phone, and asked how his speech went over. “It was very well received. Automakers — on the ground — know that a dedicated vehicle-to-vehicle communication must be put in place. It’s another layer of safety measures everyone can use to start saving lives today.”


Cybersecurity Firm Plans New Cyber Analytics Center In Prince George’s County

Baltimore Citybizlist



BlueVoyant, a global cybersecurity firm, today announced plans to establish a Global Cyber Analytics Center at the University of Maryland, College Park in early 2018. Currently operating out of a temporary site in College Park, the company will employ 25 highly-skilled analysts and data scientists and plans to add more team members with the move to the new facility in the university’s Discovery District.


Tracing the steps of nearly 10,000 U of T PhDs after graduation

University of Toronto, UofT News



In a first-ever survey of PhD graduates from the University of Toronto, the School of Graduate Studies found that alumni who graduated between 2000 and 2015 are working in 97 different countries, with careers ranging from neurosurgeon to user-experience researcher.

About 60 per cent of graduates across all disciplines found work in academia, and roughly a third hold tenure-stream positions. However, the data suggest PhD graduates are increasingly ending up outside the academy. Comparing the cohorts of 2015 to 2000, nearly twice the proportion of PhDs were employed in the private sector (23 per cent as opposed to 13 per cent).


Google Thinks It Can Accurately Predict Your Next Flight Delay

Futurism, Patrick Caughill



Google is updating its Google Flights feature with machine learning software that hopes to accurately predict if your upcoming flight will be delayed. The company recently revealed the update in a blog post, saying “Using historic flight status data, our machine learning algorithms can predict some delays even when this information isn’t available from airlines yet—and delays are only flagged when we’re at least 80% confident in the prediction.” The post does, however, give the company a little wiggle room for predictions that don’t ultimately materialize: Google still advises passengers to arrive on time to the airport.
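The “only flagged when we’re at least 80% confident” rule amounts to a decision threshold applied to a model’s predicted probability. A minimal sketch, assuming some hypothetical model has already produced a delay probability per flight (the flight codes and probabilities are made up, and none of this is Google’s code):

```python
from dataclasses import dataclass


@dataclass
class DelayPrediction:
    flight: str               # hypothetical flight identifier
    delay_probability: float  # model's estimated chance the flight is delayed


def flag_delays(predictions: list[DelayPrediction], threshold: float = 0.8) -> list[str]:
    """Return flights whose predicted delay probability meets or exceeds the threshold."""
    return [p.flight for p in predictions if p.delay_probability >= threshold]


if __name__ == "__main__":
    preds = [
        DelayPrediction("XX101", 0.92),
        DelayPrediction("XX202", 0.79),
        DelayPrediction("XX303", 0.80),
    ]
    print(flag_delays(preds))                 # ['XX101', 'XX303']
    print(flag_delays(preds, threshold=0.5))  # ['XX101', 'XX202', 'XX303']
```

A high threshold means fewer false alarms but more delays that go unflagged, so a flight without a warning can still end up late.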


A Code of Ethics for Data Science

Medium, DJ Patil



With the old adage that with great power comes great responsibility, it’s time for the data science community to take a leadership role in defining right from wrong. Much like the Hippocratic Oath defines Do No Harm for the medical profession, the data science community must have a set of principles to guide and hold each other accountable as data science professionals. To collectively understand the difference between helpful and harmful. To guide and push each other in putting responsible behaviors into practice. And to help empower the masses rather than to disenfranchise them. Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see.

So how do we do it? First, there is no single voice that determines these choices. This MUST be community effort. Data Science is a team sport and we’ve got to decide what kind of team we want to be.

To start we need to engage in conversation and spend much more time talking about the changes that are about to take place (to those who have been doing this, thank you!).


AI summer camp for underrepresented youth goes international, tripling AI4ALL’s impact

Medium, AI4ALL



“Stanford AI4ALL didn’t just teach me that the power of AI was at my fingertips, but that the ability to make a bigger impact in my community was right in front of me,” said Seattle-area 10th grader Archika Dogra, Stanford AI4ALL 2017 alumna in a recent blog post. Archika’s nonprofit EduSTEM, launched last year, has already impacted over 100 low-income and underserved elementary school students in her community through computer science education.

With the goal of enabling more stories like Archika’s, AI4ALL is pleased to announce the expansion of their summer AI education programs for underrepresented high school students to 4 new sites in summer 2018, tripling their current offering and reaching 175+ new students. The new programs will launch at Carnegie Mellon University, Princeton University, Simon Fraser University, and Boston University, joining the existing programs at Stanford University and UC Berkeley.


Once, twice, six times a grocery shopper – Olin study shows most consumers visit multiple chain stores — for different products

Washington University in St. Louis



If Americans fulfilled their java urges the same way they carefully shopped for groceries, they would visit five to seven various chain coffee shops regularly — for a blend of different categories.

In fact, it turns out that grocery categories such as dessert toppings, motor oil, candles and refrigerated ethnic foods were some of the leading products that lure customers to separate stores.

In the first test of detailed consumer-buying habits by categories at more than one chain store selling groceries, a team of business school researchers led by Washington University in St. Louis found that shoppers weren’t monogamist or bigamist but rather polygamist in their choice of outlets.


Ceremony marks beginning of McGill-Kyoto joint PhD program

McGill University, McGill Reporter



And so, the warm tone was set for the agreement exchange ceremony marking the creation of the Kyoto-McGill International Collaborative Program in Genomic Medicine. It marks the first time McGill has entered into a joint Ph.D. program with another university.

Participating students will study and conduct research in the field of human biology. By working with biological big data, such as genomic information, intermediate phenotypes, lifestyles and environmental factors, these students will be trained to identify disease-predicting biomarkers and disease-susceptibility genes. Such techniques will be indispensable for the future development of “personalized medicine.”

“The issue of using genomic and related data, big data, is very much at the centre of medical science in the 21st century. We have made it a top priority for our University and certainly for our Faculty,” said Dean Eidelman. “We think that this is going to be critical for health care and for health research.”

 
Events



Cosyne 2018 Program

COSYNE



Denver, CO, March 1-4 (main meeting) and March 5-6 (workshops). “The annual Cosyne meeting provides an inclusive forum for the exchange of experimental and theoretical/computational approaches to problems in systems neuroscience.” [$$$]


ASA Symposium on Data Science & Statistics

American Statistical Association



Reston, VA May 16-19. The Symposium on Data Science and Statistics is designed for data scientists, computer scientists, and statisticians who analyze and visualize complex data. Deadline for e-poster submissions is March 8.


Register – February 22-23 Public Session on Reproducibility and Replicability in Science

National Academies of Sciences, Engineering, and Medicine



Washington, DC, and online. “The Committee on Reproducibility and Replicability in Science will hold its second meeting on February 22-23, 2018. A public session will be held on February 22 from 12:00 p.m.-5:30 p.m. The session will continue on February 23 from 8:30 a.m.-10:00 a.m.” [registration required]

 
Deadlines



NYU Center for Urban Science + Progress graduate programs

The deadline to apply for the Advanced Certificate or the Master of Science in Applied Urban Science and Informatics offered at CUSP is February 15.

MIT Hacking Medicine’s Grand Hack 2018 Application

Details on Grand Hack 2018. Deadline for applications is March 18.
 
Tools & Resources



Kaggle Past Competitions

Eliot Andres



“Sortable and searchable compilation of solutions to past Kaggle competitions.”


How the Data That Internet Companies Collect Can Be Used for the Public Good

Harvard Business Review, Stefaan G. Verhulst and Andrew Young



A new year has arrived, along with the usual air of optimism. Yet the 21st century is already shaping up to be a challenging one. From climate change to terrorism, the difficulties confronting policy makers are unprecedented in their variety, but also in their complexity. Our existing policy tool kit seems stale and outdated. Increasingly, it is clear, we need not only new solutions but also new methods for arriving at solutions.

Data, and new methods for organizations to collaborate in order to extract insights from data, is likely to become more central to meeting these challenges. We live in a quantified era. It is estimated that 90% of the world’s data was generated in the last two years — from which entirely new inferences can be extracted and applied to help address some of today’s most vexing problems.

In particular, the vast streams of data generated through social media platforms, when analyzed responsibly, can offer insights into societal patterns and behaviors. These types of behaviors are hard to generate with existing social science methods. All this information poses its own problems, of complexity and noise, of risks to privacy and security, but it also represents tremendous potential for mobilizing new forms of intelligence.
