Data Science newsletter – July 29, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for July 29, 2019


 
 
Data Science News



The moral side of cloud-based data science

InfoWorld, David S. Linthicum


Technology once again makes the difference here. We’re tracking more things every day, and IoT is making that tracking both commoditized and instantaneous. Our understanding of data science’s potential is growing off the charts, and compute and storage on demand are approaching the price of free.

The point I’m trying to make is that we have some questions to ask. It’s not simply that we can do wonderful things with our data now, and that there is much more meaningful data around. It’s really about how we deal with the forthcoming ethical questions. I hope we make the right bets.


Yale ISPS receives grant funding for Data CuRe project

Yale University, YaleNews


Yale’s Institution for Social and Policy Studies (ISPS) has received a grant from The Laura Bush 21st Century Librarian Program, part of the Institute of Museum and Library Services (IMLS), to support the sharing and development of reproducible research practices.

ISPS was one of only 24 recipients selected by the program, receiving a two-year project grant titled “Implementing Data Curation for Reproducibility (Data CuRe) Training Program.” The project aims to expand the capacity of research libraries to support reproducible research practices. “The goal is to strengthen the role of libraries in sustaining the value of research assets and to elevate their role as key research partners in preserving the integrity of the scientific record,” said Limor Peer, ISPS associate director for research and principal investigator on the grant.

This project builds on a collaboration ISPS formed with experts at the Odum Institute at the University of North Carolina-Chapel Hill and the Cornell Institute for Social and Economic Research. The three organizations founded the CURE Consortium in 2016 to promote the curation of data and code for the purpose of ensuring computational reproducibility.


Researchers deploy new tech to explore depths of Gulf of Mexico

Florida International University, FIU News


FIU marine scientist Kevin Boswell and a multi-institution research team will deploy experimental technology next week to explore the deep scattering layers of the ocean.

They are looking for information about animals in the Gulf of Mexico that make up the scattering layers — those that undergo daily vertical migrations of 100 to 1,000 meters. These animals represent the largest organized animal migration on the planet, yet little is known about them. What scientists do know is that these animals are major players in the global carbon cycle, transporting carbon to deeper waters as they migrate. Some are also part of a global discussion about whether they hold economic potential from a fisheries standpoint.

The research team will deploy an autonomous glider modified with sonar technology to collect up-close and personal data on the migrating animals in the water column.


Partnership with Auburn University gives Tuskegee students supply chain, RFID experience

Tuskegee University, News


A partnership between Tuskegee University’s Brimmer College of Business and Information Science and Auburn University’s RFID Lab is giving Tuskegee supply chain management students hands-on experience managing real-world data for some of the retail and apparel industry’s best-known companies.


‘There’s no doubt left’ about scientific consensus on global warming anymore

Grist, The Guardian, Jonathan Watts


The scientific consensus that humans are causing global warming is likely to have passed 99 percent, according to the lead author of the most authoritative study on the subject, and could rise further after separate research that clears up some of the remaining doubts.

Three studies published in Nature and Nature Geoscience use extensive historical data to show there has never been a period in the last 2,000 years when temperature changes have been as fast and extensive as in recent decades.

It had previously been thought that similarly dramatic peaks and troughs might have occurred in the past, including in periods dubbed the Little Ice Age and the Medieval Climate Anomaly. But the three studies use reconstructions based on 700 proxy records of temperature change, such as tree rings, ice cores, and sediments, from all continents, which indicate that none of these shifts took place across more than half the globe at any one time.


A new tool uses AI to spot text written by AI

MIT Technology Review, Will Knight


Researchers from Harvard University and the MIT-IBM Watson AI Lab have developed a new tool for spotting text that has been generated using AI. Called the Giant Language Model Test Room (GLTR), it exploits the fact that AI text generators rely on statistical patterns in text, as opposed to the actual meaning of words and sentences. In other words, the tool can tell if the words you’re reading seem too predictable to have been written by a human hand.
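For readers curious about the mechanics, the core idea is easy to sketch: run a language model over a passage and check how highly the model ranks each actual token given the preceding context. Text whose tokens are almost all top-ranked predictions looks suspiciously machine-like. The sketch below uses GPT-2 via the Hugging Face transformers library as a stand-in model; it illustrates the underlying idea, not the GLTR codebase itself.

```python
# A minimal sketch of the per-token predictability check GLTR relies on,
# using GPT-2 as a stand-in model. This illustrates the idea only; it is
# not the GLTR implementation.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text):
    """Return (token, rank) pairs; rank 1 means the model's top guess."""
    ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids).logits          # (1, seq_len, vocab_size)
    ranks = []
    for pos in range(1, ids.shape[1]):
        scores = logits[0, pos - 1]         # scores for the token at `pos`
        actual = int(ids[0, pos])
        # rank = number of tokens the model scored higher, plus one
        rank = int((scores > scores[actual]).sum()) + 1
        ranks.append((tokenizer.decode([actual]), rank))
    return ranks

# Human prose tends to mix high and low ranks; generated text skews
# heavily toward rank 1-10 tokens.
for tok, rank in token_ranks("The quick brown fox jumps over the lazy dog."):
    print(repr(tok), rank)
```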


Meet the robots being used to help solve America’s recycling crisis

CNBC, Lori Ioannou and Magdalena Petrova


To tackle this environmental catastrophe, U.S. companies and researchers are developing AI-assisted robotic technology that can work with humans in processing plants and improve quality control. The goal is to have robots do a better job at sorting garbage and reduce the contamination and health hazards human workers face in recycling plants every day. Sorting trash is a dirty and dangerous job. Recycling workers are more than twice as likely as other workers to be injured on the job, according to a report from the University of Illinois School of Public Health. The profession also has high fatality rates.

The way the robots work is simple. Guided by cameras and computer systems trained to recognize specific objects, the robots’ arms glide over moving conveyor belts until they reach their target. Oversized tongs or fingers with sensors that are attached to the arms snag cans, glass, plastic containers, and other recyclable items out of the rubbish and place them into nearby bins. [video, 15:06]
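The pipeline is simple enough to caricature in a few lines of code: detect, decide which bin, pick. In the toy sketch below, every name is a hypothetical stand-in (the detector returns canned results and the arm commands just print), but it shows the shape of the camera-to-tongs loop the video describes.

```python
# A toy, runnable sketch of the sorting loop described above. The detector
# and arm commands are hypothetical stand-ins; a real system would call a
# trained vision model and a robot-arm API instead.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str   # class predicted by the camera/vision system
    x: float     # position on the conveyor belt
    y: float

BIN_FOR_CLASS = {"aluminum_can": "metals",
                 "glass_bottle": "glass",
                 "pet_container": "plastics"}

def detect_objects(frame):
    """Stand-in for a trained object detector; returns fake detections."""
    return [Detection("aluminum_can", 0.4, 1.2),
            Detection("food_waste", 0.6, 1.3),
            Detection("pet_container", 0.9, 1.1)]

def sort_frame(frame):
    for obj in detect_objects(frame):
        bin_name = BIN_FOR_CLASS.get(obj.label)
        if bin_name is None:
            continue  # unrecognized or non-recyclable: leave it on the belt
        print(f"move arm to ({obj.x}, {obj.y}), grip {obj.label}, "
              f"drop in {bin_name} bin")

sort_frame(frame=None)
```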


The Hidden Costs of Automated Thinking

The New Yorker, Jonathan Zittrain


This approach to discovery—answers first, explanations later—accrues what I call intellectual debt. It’s possible to discover what works without knowing why it works, and then to put that insight to use immediately, assuming that the underlying mechanism will be figured out later. In some cases, we pay off this intellectual debt quickly. But, in others, we let it compound, relying, for decades, on knowledge that’s not fully known.

In the past, intellectual debt has been confined to a few areas amenable to trial-and-error discovery, such as medicine. But that may be changing, as new techniques in artificial intelligence—specifically, machine learning—increase our collective intellectual credit line. Machine-learning systems work by identifying patterns in oceans of data. Using those patterns, they hazard answers to fuzzy, open-ended questions. Provide a neural network with labelled pictures of cats and other, non-feline objects, and it will learn to distinguish cats from everything else; give it access to medical records, and it can attempt to predict a new hospital patient’s likelihood of dying. And yet, most machine-learning systems don’t uncover causal mechanisms. They are statistical-correlation engines. They can’t explain why they think some patients are more likely to die, because they don’t “think” in any colloquial sense of the word—they only answer. As we begin to integrate their insights into our lives, we will, collectively, begin to rack up more and more intellectual debt.
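A tiny example makes the "answers without explanations" point concrete. The model below, fit on synthetic records, happily scores a new case, yet the only artifact it can surrender is a vector of learned correlations, not a mechanism. The data and setup are invented purely for illustration.

```python
# An illustrative sketch of a "statistical-correlation engine": a model fit
# on synthetic patient-like records that outputs risk scores without any
# causal account of why. All data here is made up for demonstration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # five anonymous measurements per record
# Hidden rule the model never "knows": outcome driven by features 0 and 3.
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000)) > 1

model = LogisticRegression().fit(X, y)
new_patient = rng.normal(size=(1, 5))
print("predicted risk:", model.predict_proba(new_patient)[0, 1])
# The model answers, but it cannot say why: the coefficients summarize
# correlations in the training data, not mechanisms.
print("learned correlations:", model.coef_)
```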


LinkedIn, a Data Center Innovation Powerhouse, Is Moving to Azure

Data Center Knowledge, Scott Fulton III


As the unique system LinkedIn engineers built over the years gets replaced with a public cloud, how much of the innovation will Microsoft retain?


Behind “Substitution” – Six years of research reveals the fingerprint of substitutive systems

Nature Behavioural & Social Sciences, Ching Jin


When I mention this work to friends, the first question I’m asked is why I’ve made substitution the primary focus of this research. There’s even a little confusion sometimes. “A six-year-long project about substitution? Why substitution?” And, honestly, when we launched the project in 2013, I would have never imagined that “people’s behaviour of replacing one item with another” would ultimately become the main topic of our paper. In fact, it would be two years before we even started to use the term in our conversations.

Our goal in the beginning was simple: to gain further insight into the topic of “diffusion of innovations” by using a large-scale cellphone dataset (D1 in our paper, thanks to Johannes and Geoffrey). Working intensively over two years, we discovered several new macroscopic properties of the adoption behaviour in the system. We even developed an entire modeling framework (thanks to Chaoming and Dashun) that could accurately capture various complex phenomena in the dataset. This was exciting, and as a junior PhD student, I felt that we had already gathered enough raw material for a paper, although the mechanisms behind the model were not yet clear.


The T-Mobile Sprint merger could mean the end of the physical SIM card

The Verge, Dieter Bohn


The Department of Justice may have just done more to eliminate those little plastic SIM cards you have to use to get your phone to work on a wireless carrier than all of the efforts of Big Tech over the past four years. That’s because it is requiring Dish and T-Mobile to support eSIM technology as a condition of its merger approval.

What may seem like a wonky side detail or extra technical requirement in a blockbuster merger approval announcement could end up changing not just how your phone gets online, but also (eventually) the way phones are built. It’s not going to happen overnight — the process will probably take years — but this small proviso in the merger approval could affect much more than who can sell wireless service in America.

Embedded SIM (eSIM) is the technology that allows wireless devices to be activated on a network through software. In theory, it makes it much easier for consumers to switch networks because they don’t have to acquire a physical thing (the SIM card) from the network they want to switch to.


The global internet is powered by vast undersea cables. But they’re vulnerable.

CNN, James Griffiths


“Most people are absolutely amazed” by the degree to which the internet is still cable-based, said Byron Clatterbuck, chief executive of Seacom, a multinational telecommunications firm responsible for laying many of the undersea cables connecting Africa to the rest of the world.

“People are so mobile and always looking for Wi-Fi,” he said. “They don’t think about it, they don’t understand the workings of this massive mesh of cables working together.

“They only notice when it’s cut.”


The “Math Prefresher” and The Collective Future of Political Science Graduate Training

Gary King, Shiro Kuriwaki, and Yon Soo Park


The political science math prefresher arose a quarter century ago and has now spread to many of our discipline’s Ph.D. programs. Incoming students arrive for graduate school a few weeks early for ungraded instruction in math, statistics, and computer science as they are useful for political science. The prefresher’s benefits, however, go beyond the technical material taught: it develops lasting camaraderie within each entering class, facilitates connections with senior graduate students, opens pathways to mastering methods necessary for research, and eases the transition to the increasingly collaborative nature of graduate work. The prefresher also shows how faculty across a highly diverse discipline can work together to train the next generation. We review this program, highlight its collaborative aspects, and try to take the idea to the next level by building infrastructure to share teaching materials across universities so separate programs can build on each other’s work and improve all our programs.


How economics can raise its game

Tim Harford


How can economics become a more insightful discipline? Should it aim to be more like physics, with its precision and predictive power? Or should economists emulate anthropologists or historians, immersing themselves in the details of the particular and the unquantifiable? … The fact that both views have the ring of plausibility suggests that this is a tougher challenge than it might appear from the sidelines. Now a new paper addresses the question from the heart of academic economics: Nobel laureate George Akerlof, writing in the Journal of Economic Literature.


Why Proper Metadata in Music Is So Important

Medium, The Startup, Karl Fowlkes


Metadata seems like a complex concept, but it’s really not: just get the proper names and information from the people involved in making the record and from the proper backend business partners. That’s it. Yet if you check the song credits on a streaming platform right now, you’ll find artists you support who have uploaded tracks without entering proper credits and metadata. This usually happens when an artist is independent, dropping a project without much fanfare, or, for lack of a better word, lazy. On the other end, for a major label to release a song or project without proper metadata is completely unacceptable.

These mishaps in metadata happen at every level, and part of the problem is the lack of a standard across the industry.
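To make "proper metadata" concrete, here is a sketch of the kind of backend credits a correctly tagged track carries. The field names and values are illustrative rather than an industry schema; ISRC and ISWC are real industry identifiers, but the values shown are placeholders.

```python
# An illustrative sketch of the credits a properly tagged release carries.
# Field names and values are made up for illustration; this is not a
# formal industry schema.
track_metadata = {
    "title": "Example Song",
    "isrc": "US-XXX-19-00001",   # recording identifier (placeholder value)
    "iswc": "T-000000000-0",     # composition identifier (placeholder value)
    "performers": ["Example Artist"],
    "songwriters": [{"name": "Jane Writer", "split": 0.50},
                    {"name": "John Writer", "split": 0.50}],
    "producers": ["Example Producer"],
    "publisher": "Example Publishing Co.",
    "label": "Example Records",
}

# Missing any of these makes it hard for streaming platforms to display
# credits, or for royalties to reach the right people.
```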

 
Events



ODSC West 2019

Open Data Science Conference


San Francisco, CA October 29-November 1. “ODSC is the best community data science event on the planet. There are other events that cover special topics, or industries, etc., but ODSC is comprehensive and totally community-focused: it’s the conference to engage, to build, to develop, and to learn from the whole data science community.” [$$$$]

 
Deadlines



Woot! The #AI4SocialGood workshop was accepted at #NeurIPS2019!

I'm pumped to see all the great projects and to be one of the organizers this time around 🙂
 
Tools & Resources



Elements of Informed Consent

Sage Bionetworks


The Elements of Informed Consent is a toolkit that will help researchers think through what information participants should receive as part of the consenting process in order to make an informed decision about whether or not to join a study. Developed by the Governance Team at Sage Bionetworks, the toolkit shares basic information and best practices for developing an effective consenting process. We believe that informed participants make the best participants, because they understand the study, its risks and benefits, and how their data will be treated.


I’ve started a LinkedIn group for supporting underrepresented people in tech speaking at conferences.

Twitter, Suzanne Axtell


Would love some new members! If you have public speaking/preso resources and calls for speakers to share, check it out!


Why I livestream my science

Science, Share, S. Hussain Ather


I arrive at the lab where I work as a postbaccalaureate researcher and, like many scientists starting their days, I open my laptop, check my email, and plan my agenda. But then, when I’m ready to get to work, I turn on my laptop camera. I perform computational research studying the neuroscience of zebrafish—and I use an online streaming service to share it with the world in real time. I had started to use the platform, called Twitch, a few years ago to watch people play video games. But until last year, it had never occurred to me that I could use it for research, too.


NIH-funded Researchers Invited to Use NIH Figshare

Data Science at NIH blog


As part of the NIH Strategic Plan for Data Science, the NIH is committed to making datasets resulting from NIH investigator publications more accessible. Researchers sometimes find themselves with a requirement to share data but cannot identify a specific repository to use. This situation often occurs when researchers have datasets underlying publication figures and tables, or data not associated with a publication, that they would like to share. The NIH Figshare pilot helps address this issue by providing a generalist repository option for these data.

Data submitted to NIH Figshare will be reviewed to ensure there is no personally identifiable information in the data and metadata prior to being published and made discoverable. Review will also ensure the data and metadata are findable, accessible, interoperable, and reusable – or aligned with the FAIR principles.
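As a purely hypothetical illustration of the kinds of checks involved, a depositor could run a simple self-screen before submission: scan files for obvious PII patterns and confirm that the descriptive metadata making a record findable and reusable is filled in. The actual NIH Figshare review is a curation process performed by people, not this script.

```python
# A hypothetical pre-submission self-check, sketched under the assumptions
# above: flag obvious PII patterns and missing descriptive metadata. The
# field list and patterns are illustrative, not NIH Figshare requirements.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # SSN-like numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email addresses
]
REQUIRED_FIELDS = ["title", "description", "authors", "license", "keywords"]

def prescreen(text, metadata):
    """Return a list of problems found in the data text and its metadata."""
    problems = []
    for pattern in PII_PATTERNS:
        if pattern.search(text):
            problems.append(f"possible PII matching {pattern.pattern!r}")
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing metadata field {field!r}")
    return problems

print(prescreen("contact: jane@example.org",
                {"title": "Figure 2 source data", "authors": ["J. Doe"]}))
```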
