Data Science newsletter – July 8, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for July 8, 2019


 
 
Data Science News



OPM Announces New ‘Data Scientist’ Job Title

Government Executive, Erich Wagner


Officials at the Office of Personnel Management last week announced a new initiative to improve data management and implement evidence-based policy making across the federal government: establishing a “data scientist” job title for a variety of positions and employees.

In a June 27 memorandum to agency human resources directors, OPM Associate Director for Employee Services Mark Reinhold said agencies may now add data science titles to a number of positions within their organizations, as part of a broader effort to establish the role of chief data officers and their support staff.

“Data scientist work is multifaceted and requires talent from interdisciplinary backgrounds,” Reinhold wrote. “Data scientists are defined as practitioners with sufficient knowledge in the areas of business needs, domain knowledge, analytical skills and software and systems engineering to manage the end-to-end data processes in the data life cycle.”


Alberta’s artificial intelligence sector to continue its wait for cash

Edmonton Journal, Emma Graney


Alberta’s nation-leading artificial intelligence sector will have to wait to find out whether or not $100 million in promised provincial funding will be coming its way.

Economic Development Minister Tanya Fir said Tuesday she doesn’t know when the sector will get an answer on her government’s funding review.

“We understand the AI sector is an important sector in terms of diversifying our economy, but there are many sectors right now who are hurting for jobs,” she told Postmedia.


3 Sky Surveys Completed in Preparation for Dark Energy Spectroscopic Instrument

Lawrence Berkeley Laboratory, News Center


It took three sky surveys – conducted at telescopes on two continents, covering one-third of the visible sky, and requiring almost 1,000 observing nights – to prepare for a new project that will create the largest 3D map of the universe’s galaxies and glean new insights about the universe’s accelerating expansion.


EU AI Will Rely on Emerging Memory Technologies

EE Times Asia, Nitin Dahad


Imec, the Belgium-based nanoelectronics and digital technologies research center, is leading a European Union program to develop low-power edge artificial intelligence (AI) chips based on several emerging memory technologies.

The three-year program, called Tempo (Technology & hardware for nEuromorphic coMPuting), is a cross-border collaboration between 19 research and industrial partners, including CEA-Leti of France and the Fraunhofer Group of Germany. The joint effort aims to develop process technology and hardware platforms leveraging emerging memory technologies for neuromorphic computing. The goal is to develop a new way to support applications in mobile devices that need complex machine-learning algorithms.


Bankers are rushing to take Oxford University’s fintech courses before robots take their jobs

Business Insider, Markets, Theron Mohamed


More than 9,000 people from upwards of 135 countries have taken the online open courses, which focus on digital transformation in business, at the university’s Saïd Business School, a spokesperson told Markets Insider.

The fintech course, the first of five to be launched, has run 12 times and attracted nearly 4,300 students in less than two years. The average age of participants across the courses is 39, and two-thirds of them came from the financial services sector, suggesting experienced professionals are returning to school to understand how their industry is being disrupted and learn the skills needed to weather the changes.


Meta-Research: Centralized scientific communities are less likely to generate replicable results

eLife, Computational and Systems Biology; Valentin Danchev, Andrey Rzhetsky, James A Evans


Concerns have been expressed about the robustness of experimental findings in several areas of science, but these matters have not been evaluated at scale. Here we identify a large sample of published drug-gene interaction claims curated in the Comparative Toxicogenomics Database (for example, benzo(a)pyrene decreases expression of SLC22A3) and evaluate these claims by connecting them with high-throughput experiments from the LINCS L1000 program. Our sample included 60,159 supporting findings and 4253 opposing findings about 51,292 drug-gene interaction claims in 3363 scientific articles. We show that claims reported in a single paper replicate 19.0% (95% confidence interval [CI], 16.9–21.2%) more frequently than expected, while claims reported in multiple papers replicate 45.5% (95% CI, 21.8–74.2%) more frequently than expected. We also analyze the subsample of interactions with two or more published findings (2493 claims; 6272 supporting findings; 339 opposing findings; 1282 research articles), and show that centralized scientific communities, which use similar methods and involve shared authors who contribute to many articles, propagate less replicable claims than decentralized communities, which use more diverse methods and contain more independent teams. Our findings suggest how policies that foster decentralized collaboration will increase the robustness of scientific findings in biomedical research.


Harvard says fighting climate change is a top priority. But it still won’t divest from fossil fuels.

The Washington Post, Steven Mufson


Harvard University prides itself on being on the cutting edge of climate policy and research. Its students and faculty have deployed drones over the Amazon, worked on a “bionic leaf” to turn sunlight and water into fuel and fertilizer, and searched for a cheaper electrochemical method of capturing carbon dioxide.

But there’s at least one step on climate change that Harvard has not taken: divesting the university’s $39 billion endowment of investments in fossil fuels.


Project Aims to Map What We Do and Don’t Know About the ‘Insect Apocalypse’

WCAI, Living Lab Radio, Heather Goldstone & Elsa Partan


The newly launched EntoGEM project aims to put that debate to rest by comprehensively mapping what we do and, perhaps even more importantly, don’t know about the status of insects.

David Wagner, an entomologist and professor of ecology and evolutionary biology at the University of Connecticut, says that it’s a hard question to answer, in part because there are so many insects.

“We’re talking about millions of species,” Wagner said. “It’s hard pegging data to those individual lineages or taxa or species.” [audio, 11:50]


$100M grant aims to make UI leader in quantum engineering

The News-Gazette (Champaign, IL), Julie Wurth


A $100 million investment in a quantum engineering project designed to make Illinois a world leader in that emerging technology is among $3 billion in higher education allocations in the new Rebuild Illinois infrastructure plan.

Gov. J.B. Pritzker, key legislators and state university presidents gathered Wednesday at the Illini Union to celebrate the $45 billion state capital plan — the first in a decade. Funded by tax increases and new user fees, it will send more than $3 billion to colleges and universities over the next six years for projects ranging from library renovations to new science buildings to long-deferred maintenance.


Where America’s diversity is increasing the fastest

Axios, Lazaro Gamio


America is becoming more racially and ethnically diverse, on its way to becoming majority non-white in 2045 — but some parts of the country are changing more rapidly than the rest.

Why it matters: That’s worth remembering as we celebrate July 4 at a time of accelerating changes and political upheaval. The counties seeing the greatest relative increase in racial and ethnic diversity are among the least diverse places in the country — particularly in the Midwest.


Will California’s New Bot Law Strengthen Democracy?

The New Yorker, Noam Cohen


When you ask experts how bots influence politics—that is, what specifically these bits of computer code that purport to be human can accomplish during an election—they will give you a list: bots can smear the opposition through personal attacks; they can exaggerate voters’ fears and anger by repeating short simple slogans; they can overstate popularity; they can derail conversations and draw attention to symbolic and ultimately meaningless ideas; they can spread false narratives. In other words, they are an especially useful tool, considering how politics is played today.

On July 1st, California became the first state in the nation to try to reduce the power of bots by requiring that they reveal their “artificial identity” when they are used to sell a product or influence a voter. Violators could face fines under state statutes related to unfair competition. Just as pharmaceutical companies must disclose that the happy people who say a new drug has miraculously improved their lives are paid actors, bots in California—or rather, the people who deploy them—will have to level with their audience.

“It’s literally taking these high-end technological concepts and bringing them home to basic common-law principles,” Robert Hertzberg, a California state senator who is the author of the bot-disclosure law, told me.


NSLComm Launches 1st Nano-Satellite Fitted with Expanding Antenna Into Space

No Camels, Francine Levy


Israeli communications tech startup NSLComm launched its first nano-satellite, the NSLSat-1, into space on Friday, July 5, fitted with an innovative antenna that expands and offers high performance and affordability.

The company developed a fabric-like, flexible and expandable dish antenna to offer high-throughput communications for small satellites, up to 100 times faster than today’s best-performing nano-satellites, NSLComm said in a statement ahead of the launch this week.


How to Protect Our Kids’ Data and Privacy

WIRED, Opinion, Sophie Allaert, Mélina Cardinal-Bradette and Elif Sert


The Convention on the Rights of the Child, the most ratified human rights treaty ever, protects children as individuals. But modern technology raises new questions: Will children self-censor on the internet because they don’t know how their data will be used? How is access to information limited when social media platforms use algorithms to display personalized and targeted content? We don’t know what ramifications widespread data collection could have on future generations of kids.

To protect children’s fundamental rights, we need a new data protection framework: one based on how the data is used, not who owns it.


The boy was dying. Zebrafish helped save his life

STAT, Sharon Begley


[Hakon] Hakonarson and his colleagues identified the mutation causing the boy’s illness, in which lymphatic vessels proliferate out of control and leak fluid into the lungs and other organs; engineered the mutation into zebrafish; waited for the fish to develop a piscine version of the boy’s disease; tested multiple drugs on the fish; found one that stopped the vessels’ kudzu-like growth; and got permission from federal health officials to try it on their young patient.

It worked, they reported in Nature Medicine on Monday, a success that shows “precision medicine” can be expanded well beyond cancers, where it has shown the greatest promise.


NASA changes how it divvies up telescope time to reduce gender bias

Nature, News, Alexandra Witze


NASA has changed the way that reviewers evaluate requests for viewing time on the agency’s space telescopes in an effort to reduce gender and other biases. It now uses a ‘double-blind’ system, in which neither the proposer nor the reviewer knows who the other is.

The space agency made the announcement, effective immediately, in an e-mail to astronomers last month. The policy applies to all upcoming proposals to use telescopes including the Chandra X-ray Observatory, the Transiting Exoplanet Survey Satellite, the NuSTAR (Nuclear Spectroscopic Telescope Array) and NICER (Neutron Star Interior Composition Explorer) X-ray telescopes, the Swift Observatory and the Fermi γ-ray telescope.

The switch to double-blind review, which NASA calls dual-anonymous review, is designed to eliminate a host of systemic biases that have plagued the agency’s review of research proposals.

 
Events



SF Scala Meetup – Building microservices with Finagle and Scala

Meetup, SF Scala Meetup


San Francisco, CA July 17, starting at 6 p.m., Strava (208 Utah St.). “This talk will cover Strava’s usage of Finagle and Scala, how it enables building backend services that are fast, efficient, and scalable, and how Scala and Finagle fit in with their long-term platform architecture, vision and growth.” [rsvp required]


ACM SIGIR International Conference on the Theory of Information Retrieval

ACM SIGIR


Santa Clara, CA October 2-5, “a forum for the presentation and discussion of research related to the foundational aspects of Information Retrieval (IR).” [$$$]

 
Deadlines



NBA Hackathon

“APPLICATIONS FOR THE 2019 NBA HACKATHON WILL CLOSE ON TUESDAY, JULY 16, 2019 AT 11:59 PM.”

CellSignal – NeurIPS 2019 Competition

“Using the RxRx1 dataset, we are sponsoring a NeurIPS 2019 competition called CellSignal to encourage researchers to explore methods of separating biological and technical factors in biological data.”

“The task is to correctly classify the perturbation present in each image in a held out set of experiments that were run in batches different from the experiments in the training set. Thus, in order for the classifier to generalize well to unseen batches, it must learn to separate biological and technical factors and make predictions only on the biology of the perturbation.” Kaggle Team Merger deadline is July 22.
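A minimal sketch of the evaluation setup described above, assuming a synthetic feature matrix, perturbation labels, and batch IDs (none of this is the organizers’ code): holding out entire experimental batches during validation forces a model to generalize on the biology rather than on batch-specific technical signal.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold

# Synthetic stand-ins: image embeddings, perturbation labels, and experiment-batch IDs.
rng = np.random.default_rng(0)
n_images, n_features, n_classes, n_batches = 600, 50, 10, 6
X = rng.normal(size=(n_images, n_features))
y = rng.integers(0, n_classes, size=n_images)
batch = rng.integers(0, n_batches, size=n_images)

# GroupKFold keeps every image from a given batch on one side of the split,
# so validation accuracy reflects generalization to unseen batches.
scores = []
for train_idx, val_idx in GroupKFold(n_splits=3).split(X, y, groups=batch):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[val_idx], y[val_idx]))
print("held-out-batch accuracy:", np.mean(scores))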

Data Science Fellowship

“The Data Incubator is a Cornell-funded data science training organization. We run a free advanced 8-week fellowship (think data science bootcamp) for PhDs looking to enter industry.” … “The next program (both in-person and online) will be 2019-09-16 – 2019-11-08.”
 
Tools & Resources



See Google’s first guidelines for data visualization

Fast Company, Mark Wilson


Google does everything from building smartphones to creating driverless car companies. So it’s easy to forget that its official mission is “to organize the world’s information and make it universally accessible and useful.” Google is and will always be a company anchored in the business of gathering data and serving it up in a comprehensible way, whether that’s through Search results, Google News alerts, or restaurants you might like automatically popping up on Google Maps.

So it’s perhaps no surprise that Google now wants to help others package their data, too. Manuel Lima—a celebrated information designer who founded and heads Google’s data visualization team (formed in 2018)—has published Google’s Six Principles for Designing Any Chart, a cheat sheet to clear, accessible data visualizations.


Fighting complexity in software development

Anish Kandikar, GitHub – CardManagement


After working on different projects, I’ve noticed that every one of them had some common problems, regardless of domain, architecture, code convention and so on. Those problems weren’t challenging, just a tedious routine: making sure you didn’t miss anything stupid and obvious. Instead of doing this routine on a daily basis, I became obsessed with seeking a solution: some development approach, code convention, or whatever else that would help me design a project in a way that prevents those problems from happening, so I can focus on the interesting stuff. That’s the goal of this article: to describe those problems and show you the mix of tools and approaches that I found to solve them.


This is changing my life. Mathpix’s “Snip” will let you screenshot an equation, and it will return the LaTeX code.

Twitter, Sean Carroll


Works passably on handwriting, and nearly flawlessly on pdf equations.
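For instance, snipping an image of the quadratic formula would return LaTeX along these lines (an illustrative example, not actual Mathpix output):

x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}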


Input Uncertainty and Gaussian Processes – Uncertain Input GPs Through the Ages

J. Emmanuel Johnson


When applying Gaussian processes (GPs) for regression, we always assume that there is some noise σ_y^2 in the output measurements y. We rarely ever assume that there are errors in the input points x. This assumption does not hold, as we can definitely have errors in the inputs as well. For example, most remote sensing satellites measure spectral radiances, and they produce spectral measurements that have well-calibrated errors. Many models that measure things like temperature or other climate/meteorological phenomena take these sensor errors into account.

This chain of error moving through models is known as error propagation, something well known to any physics 101 student who has ever taken a laboratory class. In this review, I would like to go through some of the more important algorithms in the context of GPs and dissect some of the mathematics behind them, so that we can arrive at some understanding of how uncertain inputs are handled in GPs.
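As a rough illustration of the point above, here is a minimal sketch (my own toy example, not code from the review) of folding input noise into a GP’s effective output noise via first-order error propagation, σ_eff^2 ≈ σ_y^2 + (df/dx)^2 σ_x^2; the rbf_kernel and gp_posterior helpers and the sine-wave data are illustrative assumptions.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(a, b) = variance * exp(-(a - b)^2 / (2 * lengthscale^2)).
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, noise_var):
    # Standard GP regression posterior mean/variance with per-point output noise.
    K = rbf_kernel(X, X) + np.diag(noise_var)
    K_s = rbf_kernel(X, X_star)
    K_ss = rbf_kernel(X_star, X_star)
    alpha = np.linalg.solve(K, y)
    mean = K_s.T @ alpha
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)

rng = np.random.default_rng(0)
sigma_y, sigma_x = 0.1, 0.2                      # output and input noise (std dev)
X = np.linspace(0, 6, 30)
y = np.sin(X + sigma_x * rng.normal(size=X.shape)) + sigma_y * rng.normal(size=X.shape)

# First-order propagation: noisy inputs add (df/dx)^2 * sigma_x^2 to the output noise.
slope = np.gradient(y, X)                        # crude df/dx estimate from the data
noise_naive = np.full_like(X, sigma_y**2)        # ignores input noise entirely
noise_prop = sigma_y**2 + slope**2 * sigma_x**2

X_star = np.linspace(0, 6, 100)
_, var_naive = gp_posterior(X, y, X_star, noise_naive)
_, var_prop = gp_posterior(X, y, X_star, noise_prop)
print("mean predictive std, naive vs. propagated:",
      np.sqrt(var_naive.mean()), np.sqrt(var_prop.mean()))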

 
Careers


Full-time positions outside academia

Graphics editor for newsletters



The Washington Post; Washington, DC

Team Lead Data Science



Grubhub; Chicago, IL

Deputy Division Director, Division of Environmental Biology, BIO



National Science Foundation; Alexandria, VA

Full-time, non-tenured academic positions

Director of Research Programs



Stanford University, Human-Centered Artificial Intelligence Institute; Palo Alto, CA

University Library Specialist, Learning Spaces & Services



North Carolina State University, University Libraries; Raleigh, NC
