Data Science newsletter – May 4, 2021

Newsletter features journalism, research papers and tools/software for May 4, 2021

 

Google Research, Beckman researchers tap artificial intelligence to learn more about how intelligence is created in the human brain

University of Illinois, Beckman Institute for Advanced Science & Technology, News



Much like the sweeping blue ocean covering our world, the human brain is vast, largely unexplored and mostly a mystery, despite being essential to humanity. The biological origins of intelligence – and how it is created by the network architecture of the human brain – remain largely unknown, but Beckman researchers and Google Research are one step closer to finding answers with their recent collaboration.

A team of researchers is turning to advances in computer science and engineering to uncover the secrets. The interdisciplinary team is jointly led by Lav Varshney, associate professor of electrical and computer engineering, neuroscience, and computer science, and Aron Barbey, professor of psychology, neuroscience, and bioengineering, in partnership with Been Kim, a research scientist at Google Research.


UCSD Launches Study Using Oura To Explore Pregnancy

Oura, News



The goal of this research is to find answers and make them publicly available, together. In doing so, UC San Diego and Oura hope to generate useful information for clinicians and individuals alike, while revealing the power of participants to drive research in underserved areas. All findings will be made public, regardless of whether or not you have access to an Oura Ring.

More specifically, UC San Diego is inviting Oura Ring users with a history of pregnancy (and their partners) to contribute their historical Oura data and personal observations to research. UC San Diego will then compile the data from those who opt-in to construct models of what patterns appear “normal” for pregnancy vs. what might change ahead of common complications.


How a university got itself banned from the Linux kernel

The Verge, Monica Chin



On the evening of April 6th, a student emailed a patch to a list of developers. Fifteen days later, the University of Minnesota was banned from contributing to the Linux kernel.

“I suggest you find a different community to do experiments on,” wrote Linux Foundation fellow Greg Kroah-Hartman in a livid email. “You are not welcome here.”

How did one email lead to a university-wide ban? I’ve spent the past week digging into this world — the players, the jargon, the university’s turbulent history with open-source software, the devoted and principled Linux kernel community. None of the University of Minnesota researchers would talk to me for this story. But among the other major characters — the Linux developers — there was no such hesitancy. This was a community eager to speak; it was a community betrayed.


New online tool creates a searchable database of UW–Madison research and scholarship

University of Wisconsin-Madison, News



Over the last three years, researchers at the University of Wisconsin–Madison have authored more than 21,400 articles, presented at nearly 1,500 conferences, earned almost 500 patents and generated nearly $2 billion in grant funding.

These are among just some of the activities now captured in a new campus tool called Research at UW–Madison, a searchable website to identify research and research projects, foster research collaborations, and highlight the achievements of UW–Madison scholars.

“We decided to invest in this tool, in part, to help our stakeholders who want to better understand the range of research and scholarly activity of UW–Madison researchers,” says Provost Karl Scholz. “The site showcases not only the range but the extraordinary depth and volume of scholarly activity at UW–Madison.”


The Computers Are Getting Better at Writing, Thanks to Artificial Intelligence

The New Yorker, Stephen Marche



GPT-3 hints at a world in which machines can generate language. The consequences are vertiginous. To spend ten minutes with Sudowrite is to recognize that the undergraduate essay, the basic pedagogical mode of all humanities, will soon be under severe pressure. Take an A paper, change a few words in the first paragraph, push buttons three times, and you have an essay that fits the assignment. Whatever field you are in, if it uses language, it is about to be transformed. The changes that are coming are fundamental to every method of speaking and writing that presently exists.

Amit Gupta is one of the founders of Sudowrite. He left Silicon Valley for a career as a science-fiction writer and found a glorious fusion of the two when GPT-3 appeared. “Other artists have had tools like this for a long time,” he said. “There are really sophisticated tools for visual artists—whether it’s Photoshop or a 3‑D tool, there’s all these things that you can apply. People who are using 3‑D models aren’t using clay. They’re not using the tools of the past. They’re using really advanced tools that automate a lot of the processes. But writing has been stuck in the past. We’re not using paper and pen, but we’re not much better off than that.” Sudowrite accesses GPT-3’s interface and turns it into a legible tool that any writer can use. Currently, Sudowrite is in beta. Gupta imagines the product turning into a resource that writers will pay fifteen to twenty dollars per month to use.


Inside Netflix’s Quest to End Scrolling – How the Company is Working to Solve One of its Biggest Threats: Decision Fatigue.

New York Magazine, Vulture, Josef Adalian



Today, the company is launching Play Something, a new viewing mode designed to make it easier for the indecisive among us to quickly find something to watch. As with those early forays into instant-playing content, the goal of this new shuffle feature is to eliminate, or at least ease, the Peak TV-era anxiety so many of us feel while trying to find something to watch. But unlike its past attempt, it won’t be automatic: You’ll have to opt in — either at start-up or when you’re browsing your home page. If you do, the usual page upon page of box art and show descriptions disappears. Instead, the Netflix matrix chooses something it thinks you’ll be into and just starts streaming it, along with an onscreen graphic briefly explaining why it chose that title. Don’t like what you see? A quick button press skips ahead to another selection. If you suddenly decide an earlier selection is actually a better pick, you can also go backward. (The feature will initially be available on all Netflix


Online privacy notices don’t work. Here are 9 alternatives

World Economic Forum; Jen King, Andreas Katsanevas, Anne Flanagan



For the past two decades, we have become accustomed to the tiresome ritual of clicking through privacy policies online, but like lengthy terms of service documents, these are generally not written with the average person in mind. They are foremost legal documents, and, after all, who has time?

As people, we assume that we can and want to make informed decisions about who collects data about us and how. Yet numerous studies have identified how infeasible it is for people to have the capacity, let alone will, to manage the multitude of relationships they have with data processors.

What we have here is not just a legal problem, but a human-technology interaction problem, one that must be addressed with due consideration of human limitations with respect to technology, as well as the broader consequences for society.


Salmon have shrunk so much that Whole Foods redid its guidelines

The Seattle Times, Bloomberg, Kim Chapman



“It’s very irritating when the supplier sends it to me and I see the size of these fillets,” he said. “In the last eight to 10 years, the salmon sizes have started to get smaller and smaller.”

These disruptions are, for now, more a nuisance than a serious problem. But they almost certainly presage more costly changes to come and, much more importantly, raise alarm bells about the growing crisis in some key salmon populations that is being driven, according to many scientists, by climate change and more competition for food. Decades after the Atlantic cod fisheries collapsed, concern is now mounting among experts that wild Pacific salmon could face a similar fate.

“The whole thing is out of whack,” said Laurie Weitkamp, a U.S. fisheries biologist at the National Oceanic and Atmospheric Administration. “Salmon managers are realizing that climate change is impacting their stocks and it is generally not favorable and it’s only going to get worse.”


Judges hear arguments over contentious Census privacy tool

Associated Press, Kim Chandler and Mike Schneider



The fight over whether the U.S. Census Bureau can use a controversial statistical technique to keep people’s information private in the numbers used for drawing political districts went before a judicial panel on Monday. The panel must decide if the method provides enough data accuracy.

A panel of three federal judges heard arguments on whether the method known as “differential privacy” meets the federal legal requirement for keeping private the personal information of people who participated in the 2020 census while still allowing the numbers to be sufficiently accurate for the highly partisan process of redrawing congressional and legislative districts. Differential privacy adds mathematical “noise,” or intentional errors, to the data to obscure any given individual’s identity while still providing statistically valid information.

Because a panel of three federal judges will decide the matter, any appeal of the lawsuit brought by the state of Alabama could go straight to the Supreme Court.
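
The idea of adding "noise" to a count, as described above, can be illustrated with the textbook Laplace mechanism. This is only a minimal sketch under stated assumptions, not the Census Bureau's actual method, which is considerably more elaborate. The function names, the example population of 120 and the epsilon values are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng=None):
    """Return a count perturbed by Laplace noise (illustrative only).

    `epsilon` is the privacy-loss parameter: smaller values add more noise
    and hide individuals better, larger values keep the count more accurate.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    block_population = 120  # hypothetical census-block count
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: reported count ~ {noisy_count(block_population, eps, rng):.1f}")
```

Running the loop a few times shows the trade-off at the center of the lawsuit: with a small epsilon the reported count can land far from 120, while with a large epsilon it stays close to the true value but offers weaker protection for any one individual.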


‘I was terrible at crosswords so I built an AI to do them’

BBC News, Jane Wakefield



“I was with 700 people who were really good at solving crossword puzzles and it annoyed me that I was so terrible, so I decided to write a computer program that would get even on my behalf,” he told the BBC.

And finally he did. After 10 failed attempts, Dr Fill – as the program is known – has just won its first competition.


Education gives NC an edge in drawing tech companies

Raleigh News & Observer, Randy Woodson



To meet the growing needs of data science research, education and workforce development in North Carolina, NC State recently launched our Data Science Academy — an incubator for training and long-term research partnerships.

Access to talent remains dominant in corporate real estate decisions, outweighing cost and other factors. States with robust education systems provide strong talent pipelines and, as a result, positive economic impact.

In Raleigh, nearly half of the population over the age of 25 holds a bachelor’s degree or higher, ahead of Austin, Atlanta and Nashville, and slightly behind San Francisco and Boston. Raleigh is also high in growth across STEM completions at 40.3%, again ahead of Austin, Atlanta and Nashville. This data is based on an independent study commissioned by Wake County Economic Development that contrasts Raleigh with the 100 largest U.S. metro areas.


The Drive to Make Machine Learning Greener

Discover Magazine, The Physics arXiv Blog



At issue is an important question: How can the carbon emissions from this number-crunching be reduced?

Now we have an answer thanks to the work of David Patterson at the University of California, Berkeley, with a group from Google whom he also advises. This team says there is significant room for improvement and that straightforward changes can reduce the carbon footprint of machine learning by three orders of magnitude.


Vision for ultra-precision agriculture includes machine-learning enabled sensing, modeling, robots tending crops

Iowa State University, News Service



The ideas behind COALESCE have been bubbling around the Iowa State campus for years and have led to the creation of a core research team: Sarkar; Singh; Baskar Ganapathysubramanian, the Joseph C. and Elizabeth A. Anderlik Professor in Engineering; and Arti Singh, an assistant professor of agronomy.

The ideas have also attracted several competitive grants, including an initial grant to the core team from the Iowa Soybean Association with Arti Singh as the principal investigator. There was also a three-year seed grant to the core team from Iowa State’s Presidential Initiative for Interdisciplinary Research. These grants helped build the team, make initial discoveries and connect with other researchers.

An illustration from the seed project – a project called “Data Driven Discoveries for Agricultural Innovation” – shows an airplane, three drones and four robots collecting data from a field to help the farmer standing to the side.

How can all that data help a farmer?

“Data science isn’t just about assembling data and making predictions,” Ganapathysubramanian said. “It’s also about making decisions.”


Why American medicine needs its own DARPA

TheHill, Opinion, Kenneth Dekleva



President Biden’s recent nomination of Dr. Stacey Dixon as the Principal Deputy Director of National Intelligence is historic. She is an exemplary leader who holds a doctorate in engineering and has extensive experience in national security technology, highlighting the Biden administration’s desire to place technology at the lead of intelligence gathering and national security. America is in a technology race, where it confronts formidable adversaries such as China, Russia, North Korea and Iran who espouse hybrid warfare, whole-of-government approaches and aggressive drives, such as “Made in China 2025,” to dominate novel technologies — artificial intelligence (AI), robotics, hypersonic weapons, quantum computing, biomedicine and machine learning.


LTU launching interdisciplinary Master of Science in Artificial Intelligence

Lawrence Technological University (Southfield, MI)



Explosive growth in the use of artificial intelligence technologies in a wide variety of industries, such as automotive, healthcare, retail, finance, science and manufacturing, has led Lawrence Technological University (LTU) to create a new degree, the Master of Science in Artificial Intelligence (MSAI). AI and big data are now being successfully used in the battle against the pandemic by providing solutions to tracking, predicting, contact tracing, early diagnosis and treatment of COVID-19.

The new program will be a joint effort between the Department of Electrical and Computer Engineering (ECE) in LTU’s College of Engineering (CoE) and the Department of Mathematics and Computer Science (MCS) in LTU’s College of Arts and Sciences (CoAS).


Deadlines



arXiv’s Giving Week is May 2 – 8, 2021

“arXiv is not free to operate, and, as a nonprofit, we depend on the generosity of foundations, members, donors, volunteers, and individuals like you to survive and thrive. If arXiv matters to you and you have the means to contribute, we humbly ask you to join arXiv’s global community of supporters with a donation during arXiv’s Giving Week, May 2 – 8, 2021.”

SPONSORED CONTENT





The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



Announcing the NASA Openscapes Framework

Openscapes, NASA



“Together with the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California Santa Barbara, University of Washington eScience Institute, Metadata Game Changers, 2i2c, and The Carpentries, we are answering the call from NASA to support researchers from the NASA Distributed Active Archive Centers (DAACs) migrating workflows to the cloud.”


Switching From Spreadsheets to Neptune.ai and How It Pushed My Model Building Process to the Next Level

neptune.ai, Nikita Kozodoi



This tool allowed me to save a lot of time and focus on modeling decisions, which helped me to earn three medals in Kaggle competitions.

In this post, I will share my story of switching from spreadsheets to Neptune for experiment tracking. I will describe a few disadvantages of spreadsheets, explain how Neptune helps to address them, and give a couple of tips on using Neptune for Kaggle.


A first look at torch for R

R-bloggers, R on Will Hipson



In this post, I explore torch – a package for R that mirrors the PyTorch framework for deep learning.

Motivation

I’ve been a bit reluctant to join in on the deep learning hype for some time. Much of this I attribute to my lack of enthusiasm toward Python frameworks for deep learning. Don’t get me wrong. Tensorflow + Keras offers an intuitive API for neural nets, but can I just be frank and say I like R better for everything else?


Heiko Müller, Sonia Castelo, Munaf Qazi, and Juliana Freire: The openclean Open-Source Data Cleaning Library

Heiko Müller, Sonia Castelo, Munaf Qazi, and Juliana Freire



Over the years, many tools for profiling, preparing, and cleaning data have been developed, both in academia and industry. These approaches were developed in isolation and in different programming languages with no standardized interfaces. Thus, it is difficult for data scientists to combine existing tools and re-use them in their data processing pipelines.

Inspired by the wide adoption of generic machine learning frameworks such as scikit-learn, TensorFlow, and PyTorch, we are currently developing openclean, an open-source Python library for data profiling and data cleaning. The source code for openclean is available on GitHub.

Our goals for openclean are twofold. First, we aim to provide a unified framework for practitioners that brings together open-source data profiling and data cleaning tools into an easy-to-use environment. By making existing tools available to a large user-community, and through the integration with the rich Python ecosystem, openclean has the potential to simplify data cleaning tasks. Second, by providing a structured, extensible framework, openclean can serve as a platform to which researchers and developers can contribute their techniques.


Careers


Full-time positions outside academia

Data and Applied Scientist – Natural Language Processing – Microsoft Research



Microsoft Research, Natural Language Processing group; Redmond, WA

Internships and other temporary positions

Application Developer



DataCite; Remote (Europe, U.S.)
