Data Science newsletter – August 22, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for August 22, 2019


 
 
Data Science News



AI is in danger of becoming too male – new research

The Conversation, Juan Mateos-Garcia and Joysy John


Artificial Intelligence (AI) systems are becoming smarter every day, beating world champions in games like Go, identifying tumours in medical scans better than human radiologists, and increasing the efficiency of electricity-hungry data centres. Some economists are comparing the transformative potential of AI with other “general purpose technologies” such as the steam engine, electricity or the transistor.

But current AI systems are far from perfect. They tend to reflect the biases of the data used to train them and to break down when they face unexpected situations. They can be gamed, as we have seen with the controversies surrounding misinformation on social media, violent content posted on YouTube, or the famous case of Tay, the Microsoft chatbot, which was manipulated into making racist and sexist statements within hours.

So do we really want to turn these bias-prone, brittle technologies into the foundation stones of tomorrow’s economy?


An AI privacy conundrum? The neural net knows more than it says

ZDNet, Tiernan Ray


What is output by a neural network, such as an image classifier, is only a small portion of what the network in a sense “knows” about the data. And the prospect that the network is retaining sensitive user data such as social security numbers could open a new area of concern about AI.
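
To make the claim concrete, here is a rough sketch, invented for illustration rather than drawn from the article: a small scikit-learn network is trained only to say whether a digit image is even or odd, and a linear probe on its hidden layer then recovers which digit it was – information the network’s output never exposes.

# Rough illustrative sketch (not from the article): a network trained on a
# narrow task still encodes far more about each input than its output reveals.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, digit = load_digits(return_X_y=True)
is_odd = digit % 2                              # the only label the network sees
X_tr, X_te, d_tr, d_te, o_tr, o_te = train_test_split(
    X, digit, is_odd, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=600, random_state=0)
net.fit(X_tr, o_tr)                             # task: predict even vs. odd only

def hidden(data):
    # activations of the single ReLU hidden layer
    return np.maximum(0, data @ net.coefs_[0] + net.intercepts_[0])

# A linear "probe" on those activations recovers which digit each image shows,
# information that is never present in the network's even/odd output.
probe = LogisticRegression(max_iter=2000).fit(hidden(X_tr), d_tr)
print("even/odd accuracy on held-out data:", net.score(X_te, o_te))
print("digit recovered from the hidden layer:", probe.score(hidden(X_te), d_te))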


AI is breathing new life into the intelligence community

FedScoop, Jackson Barnett


“The field of intelligence analysis is at an inflection point,” Joseph Gartin, CIA’s deputy associate director for learning, wrote recently in the journal Studies in Intelligence. “Behind us, several decades of accomplishment and innovation, chastened at times by errors and shaped by cautious incrementalism. Ahead, a future—as in all knowledge industries—still coming into view but shaped by the powerful and potentially disruptive effects of artificial intelligence, big data, and machine learning.”


$15 million supports quest for personalized leukemia therapies

Washington University School of Medicine in St. Louis


Washington University investigators at Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine in St. Louis have been awarded a $15 million grant to better understand the genetic changes that drive acute myeloid leukemia (AML), a deadly blood cancer, and predict patients’ responses to therapy. The findings also may enable investigators to develop more effective therapies tailored to patients, based on the genetic characteristics of their cancer cells.

The grant, funded by the National Cancer Institute of the National Institutes of Health (NIH), allows for the continuation of work by a team of researchers with complementary expertise in AML, drawing on their unique strengths in basic and translational research.


Yale names inaugural executive director of Tobin Center for Economic Policy

Yale University, YaleNews


David Wilkinson ’06 J.D., a former senior White House official and a prominent voice for data-driven, evidence-based policymaking, has been appointed the founding executive director of the Tobin Center for Economic Policy at Yale.

Wilkinson, who will join the Tobin Center on Sept. 4, directed the White House Office of Social Innovation and served in other White House senior policy roles from 2013 to 2017, leading successful national efforts to advance novel, data-driven approaches to social problems. More recently, he served in cabinet-level roles for two Connecticut governors, in which he launched results-based initiatives featured by The New York Times and the Aspen Institute. At the federal and state level, he has advanced bipartisan legislation to harness data and evidence to generate better, more effective policy.


Cargill to open Innovation Lab at University of Illinois at Urbana-Champaign

PR Newswire, Cargill


University of Illinois students focused on solving some of the world’s greatest challenges across the global food and agriculture systems will soon have a new space to test their work thanks to a partnership with Cargill. The Research Park at the Urbana-Champaign campus will be the home of the newest Cargill Innovation Lab, the company announced today.

Led by Cargill’s digital labs team of data scientists, Innovation Lab students and staff will work directly with Cargill business leaders, operations teams and customers to help move digital ideas – in weeks, not months – to test mode.


After a failed attempt to bring Amazon to College Park, UMD is bringing itself to Amazon

University of Maryland, The Diamondback student newspaper, Matt McDonald


The University of Maryland is opening a new space for “innovation and collaboration” in Crystal City, Virginia, near Amazon’s new HQ2, the university announced Wednesday.

The university has leased 8,000 square feet in a building that will be a part of the larger Amazon headquarters, said Ken Ulman, the university’s chief strategy officer for economic development. The space, which will host symposiums, workshops and other learning events for several colleges at this university, will be dubbed the Discovery Center.


Rejecting a Sex Offender’s Contributions

Inside Higher Ed, Colleen Flaherty


Two MIT researchers resign from the Media Lab over its ties to Jeffrey Epstein.


The five technical challenges Cerebras overcame in building the first trillion-transistor chip

TechCrunch, Danny Crichton


The chip has made a big splash here at Stanford University at the Hot Chips conference, one of the silicon industry’s big confabs for product introductions and roadmaps, with various levels of oohs and aahs among attendees. You can read more about the chip from Tiernan Ray at ZDNet and read the white paper from Cerebras itself.

Superlatives aside though, the technical challenges that Cerebras had to overcome to reach this milestone are, I think, the more interesting story here. I sat down with founder and CEO Andrew Feldman this afternoon to discuss what his 173 engineers have been building quietly just down the street here these past few years, with $112 million in venture capital funding from Benchmark and others.


Human Impact: Climate Change and Citizen Science

Discover magazine, Citizen Science Salon blog


This blog post is an edited excerpt from Human Impact, a new publication from Science Connected. Edited by Kate Stone and Shayna Keyles, Human Impact delivers 17 true tales of how humanity has changed the Earth, for better or for worse. This chapter appears in Human Impact as “Act Now: Engaging in Citizen Science,” and includes contributions from Caroline Nickerson, Kristin Butler, and Julia Travers.

Act Now: Engaging in Citizen Science

Citizen science is a field as broad as science itself. It can encompass advocacy, policy, education – all undergirded by rigorous scientific standards for data collection and inquiry (Cavalier and Kennedy, 2016, The Rightful Place of Science: Citizen Science). Citizen science is public engagement in scientific research. As Jennifer Shirk, Interim Executive Director of the Citizen Science Association, said, “Citizen science offers the power of science to everyone, and the power of everyone to science.”


Google DeepMind Co-Founder Placed on Leave From AI Lab

Bloomberg Technology


The co-founder of DeepMind, the high-profile artificial intelligence lab owned by Google, has been placed on leave after controversy over some of the projects he led.

Mustafa Suleyman runs DeepMind’s “applied” division, which seeks practical uses for the lab’s research in health, energy and other fields. Suleyman is also a key public face for DeepMind, speaking to officials and at events about the promise of AI and the ethical guardrails needed to limit malicious use of the technology.

“Mustafa is taking time out right now after 10 hectic years,” a DeepMind spokeswoman said. She didn’t say why he was put on leave.


Will China lead the world in AI by 2030?

Nature, News, Sarah O'Meara


The country’s artificial-intelligence research is growing in quality, but the field is still playing catch-up with the United States in terms of high-impact papers, people and ethics.


BuzzFeed’s new MoodFeed recommends content based on how you’re feeling

TechCrunch, Anthony Ha


BuzzFeed is offering readers a new approach to finding content that fits the way they’re feeling right now.

It’s not the boring old approach of following a link on social media or search, or of typing BuzzFeed.com into your browser. Instead, on MoodFeed, readers can identify their mood, then they’ll get a list of articles that match those feelings.

There are currently six options — curious, stressed, bored, nostalgic, joyful or hungry. If you select “curious,” you’ll see a list of BuzzFeed posts about strange facts, life hacks and the like. If, on the other hand, you go with “nostalgic,” you’ll get lots of headlines about pop culture history. And if you’re not sure, you can just give the mood wheel a spin and see what it lands on.


Bernie Sanders’s call to ban facial recognition AI for policing

Vox, Sigal Samuel


Sen. Bernie Sanders is the first presidential candidate to call for a total ban on the use of facial recognition software for policing. He’s also calling for a moratorium on the use of algorithmic risk assessment tools that aim to predict which criminals will reoffend.

His pledge to institute these changes if elected is part of his broader criminal justice reform plan, released over the weekend, which includes other ideas like ending programs that provide military equipment to cops and creating federal standards for body cameras.


What is ‘blobology’ and how is it transforming biology?

BBC Science & Environment, Pallab Ghosh


Spectacularly detailed videos from an advanced microscope are sparking a biology “revolution”, scientists say.

The technique was once termed “blobology” because its images were so indistinct.

But now, it is being used to create videos of the body’s inner-workings at a hitherto unseen level of detail.

 
Events



Data, Designed – London

Jonathan Schwabish


London, England, November 21, starting at 9 a.m. “Bring your data to life using innovative visualization techniques in this one-day workshop unlike any other, and learn from world leaders that come from two sides of data visualization: data science and design. Explore styles, techniques, and best practices to create meaningful and beautiful visualizations.” [$$$]


TextXD 2019

Berkeley Institute for Data Science


Berkeley, CA, December 3-6, at Berkeley Haas School of Business. “TextXD brings together researchers from across a wide range of disciplines, who work with text as a primary source of data. We work to identify common principles, algorithms and tools to advance text-intensive research, and break down the boundaries between domains, to foster exchange and new collaborations among like-minded researchers.” [save the date]

 
Deadlines



The 2019 DII National Data Science Challenge

“UTHealth School of Biomedical Informatics is hosting The 2019 DII National Data Science Challenge to enable participating teams to leverage a subset of the de-identified, EHR-derived Cerner Health Facts® database for the purpose of solving a clinically relevant problem to advance human health.” Deadline for challenge solutions is September 7.
 
Tools & Resources



The Essential Tools of Scientific Machine Learning (Scientific ML)

The Winnower, Christopher Rackauckas


Scientific machine learning is a burgeoning discipline which blends scientific computing and machine learning. Traditionally, scientific computing focuses on large-scale mechanistic models, usually differential equations, that are derived from scientific laws that simplified and explained phenomena. On the other hand, machine learning focuses on developing non-mechanistic data-driven models which require minimal knowledge and prior assumptions. The two sides have their pros and cons: differential equation models are great at extrapolating, the terms are explainable, and they can be fit with small data and few parameters. Machine learning models, on the other hand, require “big data” and lots of parameters but are not biased by the scientists’ ability to correctly identify valid laws and assumptions.

However, the recent trend has been to merge the two disciplines, allowing explainable models that are data-driven, require less data than traditional machine learning, and utilize the knowledge encapsulated in centuries of scientific literature.
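
As a minimal sketch of the contrast (an invented example, not taken from the essay): a two-parameter mechanistic decay model and a neural network are fit to the same handful of noisy measurements, then both are asked to extrapolate beyond the observed time window.

# Minimal sketch (illustrative only): mechanistic vs. data-driven fits to the
# same small, noisy dataset, then extrapolation outside the training window.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t_train = np.linspace(0, 2, 8)                       # small data: 8 points
y_train = 5.0 * np.exp(-1.3 * t_train) + rng.normal(0, 0.05, t_train.size)

# Mechanistic model: the solution of dy/dt = -k*y, only two parameters (y0, k).
def decay(t, y0, k):
    return y0 * np.exp(-k * t)

(y0_hat, k_hat), _ = curve_fit(decay, t_train, y_train, p0=(1.0, 1.0))

# Data-driven model: a small neural network with many more parameters.
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
nn.fit(t_train.reshape(-1, 1), y_train)

# Extrapolate to t = 4, well outside the observed window [0, 2].
print("true value at t=4:", 5.0 * np.exp(-1.3 * 4.0))
print("ODE model at t=4: ", decay(4.0, y0_hat, k_hat))
print("neural net at t=4:", float(nn.predict([[4.0]])[0]))

The mechanistic form typically extrapolates sensibly because the decay law is built into it; the network has no such constraint, which is the gap scientific ML aims to close by combining the two.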


Teaching Replication to Graduate Students

SAGE Journals, Teaching Sociology; Dragana Stojmenovska, Thijs Bol, Thomas Leopold


Replicating published studies promotes active learning of quantitative research skills. Drawing on experiences from a replication course, we provide practical tips and reflections for teachers who consider incorporating replication in their courses. We discuss teaching practices and challenges we encountered at three stages of a replication course: student recruitment, course structure and proceedings, and learning outcomes. We highlight that by engaging in replication, students learn from established scholarly work in a collaborative and reflective manner. Students not only improve their quantitative literacy but also learn more generally about the scientific method and the production of research.


Data Visualization Curriculum

GitHub – uwdata


A data visualization curriculum of interactive notebooks, using Vega-Lite and Altair. This repository contains a series of Python-based Jupyter notebooks; a corresponding set of JavaScript notebooks is available online on Observable.
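
For orientation, a minimal Altair chart of the kind the notebooks build toward might look like the following (an invented example, not taken from the curriculum; it assumes altair and pandas are installed):

# Invented minimal example: a Vega-Lite scatter plot expressed through Altair.
import altair as alt
import pandas as pd

cars = pd.DataFrame({
    "horsepower": [130, 165, 150, 95, 105],
    "mpg": [18, 15, 16, 24, 22],
    "origin": ["USA", "USA", "USA", "Japan", "Europe"],
})

chart = (
    alt.Chart(cars)
    .mark_point()
    .encode(x="horsepower:Q", y="mpg:Q", color="origin:N")
)
chart.save("cars.html")   # or render inline in a Jupyter notebook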


Natural Language Processing and Machine Learning for Law and Policy Texts

SSRN, John Nay


Almost all law is expressed in natural language; therefore, natural language processing (NLP) is a key component of understanding and predicting law at scale. NLP converts unstructured text into a formal representation that computers can understand and analyze. The intersection of NLP and law is poised for innovation because there are (i.) a growing number of repositories of digitized machine-readable legal text data, (ii.) advances in NLP methods driven by algorithmic and hardware improvements, and (iii.) the potential to improve the effectiveness of legal services due to inefficiencies in its current practice.

NLP is a large field and like many research areas related to computer science, it is rapidly evolving. Within NLP, this paper focuses primarily on statistical machine learning techniques because they demonstrate significant promise for advancing text informatics systems and will likely be relevant in the foreseeable future.

First, we provide a brief overview of the different types of legal texts and the different types of machine learning methods to process those texts. We introduce the core idea of representing words and documents as numbers. Then we describe NLP tools for leveraging legal text data to accomplish tasks. Along the way, we define important NLP terms in italics and offer examples to illustrate the utility of these tools. We describe methods for automatically summarizing content (sentiment analyses, text summaries, topic models, extracting attributes and relations, document relevance scoring), predicting outcomes, and answering questions.
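
As a toy illustration of the “words and documents as numbers” idea (invented snippets, not an example from the paper): each document becomes a TF-IDF vector, and vector similarity then reflects which documents concern the same subject matter.

# Toy illustration: representing documents as numbers with TF-IDF and
# comparing them by cosine similarity (hypothetical legal-style snippets).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The lessee shall pay rent on the first day of each month.",
    "Rent is due from the tenant at the start of every month.",
    "The agency shall publish the proposed rule in the Federal Register.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)      # each row is one document as a vector

print(X.shape)                          # (3 documents, vocabulary size)
print(cosine_similarity(X))             # the two lease clauses should score as most alike

The more elaborate tools surveyed in the paper – topic models, relation extraction, outcome prediction – build on this same starting point of mapping text into numerical representations.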

 
Careers


Full-time positions outside academia

Backend Engineer



Motryx; Halifax, NS, Canada

Director of Design



Sage Bionetworks; Seattle, WA
