Data Science newsletter – August 18, 2021

Newsletter features journalism, research papers and tools/software for August 18, 2021

 

The state of data scientists: Overwhelmed and underfunded

TechRepublic, Veronica Combs


from

Data science initiatives need a strategic makeover to break down silos, support long-term thinking and improve daily operations, according to a new survey.

Three hundred data executives in the U.S. identified a wide range of problems in Domino Data Lab’s report, “Data Science Needs to Grow Up: The 2021 Domino Data Lab Maturity Index.”

A majority of respondents (82%) were concerned about the impact of both of these issues:

  • A major revenue loss or a hit to brand reputation stemming from bad or failing models.
  • A trend toward splashy investments that have short-term payoffs.

    AI researchers trust international, scientific organizations most

    Cornell University, Cornell Chronicle


    from

    Researchers working in the areas of machine learning and artificial intelligence trust international and scientific organizations the most to shape the development and use of AI in the public interest.

    But who do they trust the least? National militaries, Chinese tech companies and Facebook.

    Those are some of the results of a new study led by Baobao Zhang, a Klarman postdoctoral fellow in the College of Arts and Sciences. The paper, “Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers,” was published Aug. 2 in the Journal of Artificial Intelligence Research.


    How open-source software shapes AI policy

    The Brookings Institution, Alex Engler


    from

    Open-source software quietly affects nearly every issue in AI policy, yet it is largely absent from policy discussions; policymakers need to consider OSS’s role in AI more actively.


    Smart Cities, Bad Metaphors, and a Better Urban Future

    WIRED, Science, Adam Rogers


    from

    The people who write about cities—I’ve done it myself—also tend to grope for organizing metaphors in current science. A city is a machine, a city is an animal, a city is an ecosystem. Or maybe a city is like a computer. To the urbanist and media studies writer Shannon Mattern, that’s the dangerous one.

    Mattern’s new book, A City Is Not a Computer: Other Urban Intelligences, comes out August 10; it collects (with revisions and updates) some of her very smart work for Places Journal. In it, Mattern wrestles with the ways that particular metaphor has screwed up the design, planning, and living-in of cities in the 20th century. It happens at every scale, from surveilling individual people as if they were bits to monitoring the widescreen data necessary to keep a city functioning for the good of its inhabitants. Of all the ways information can travel through an urban network, Mattern says, it’d probably be better to have public libraries be the nodes than the panopticon-like centralized dashboards so many cities try to build. The problem is that the metrics people choose to track become targets to achieve. They become their own kind of metaphors, and they’re usually wrong.


    Our new report looks at #COVID19 vaccine #misinformation and its link to vaccine attitudes. A fifth of Americans report believing #vaccine falsehoods.

    Twitter, Katherine Ognyanova


    from


    U.S. prisons mull AI to analyze inmate phone calls

    Reuters, David Sherfinski and Avi Asher-Schapiro


    from

    For people like Heather Bollin, a 43-year-old woman in Texas engaged to a man who is currently incarcerated, constant surveillance is a fact of life: the three daily phone calls they have together are subject to monitoring by prison officials.

    “We are never able to communicate without being under surveillance,” she told the Thomson Reuters Foundation in a phone interview, asking that the prison her fiance is in remain anonymous because she fears retaliation.

    Prisons in the United States could get more high-tech help keeping tabs on what inmates are saying, after a key House of Representatives panel pressed for a report to study the use of artificial intelligence (AI) to analyze prisoners’ phone calls.


    My 2 cents on Interactive Art and Reinforcement Learning

    Medium, Ishan Nangia


    from

    Even though a lot of work has been done, and I am confident it is only the tip of the iceberg of what is possible in this space, I haven’t personally come across any approaches that use Reinforcement Learning (RL) to redefine and automate interaction with an art piece. In my view, RL would really let us change how different art pieces are viewed and used; giving an artwork almost sentient capabilities should be possible through RL. The paucity of work combining RL with art thus didn’t square with the potentially limitless avenues that mixing these two fields would open. So I thought a little about it, and below are my reasons.

    My reasoning for why RL hasn’t been used in this field is threefold. The first reason is that Reinforcement Learning (RL) is a much more esoteric field compared to what AI represents these days. In my experience it is a tougher, more black-box, less accessible field, and it isn’t as commonplace as its relatively pervasive siblings, supervised and unsupervised learning. This in turn makes it harder for artists to find researchers and engineers proficient in RL and minimizes possible collaborations.


    Can technology help authors write a book?

    BBC News, Bernd Debusmann


    from

    For Michael Green, a US data scientist turned novelist, the need to use technology to simplify and streamline the writing process came when he was in the middle of writing his first book.
    Image caption: Michael Green came up with the idea for the digital platform Lynit to help solve his own writing problems.

    With 500 pages of a complex story written, he recalls that the process had become difficult to manage: “In the midst of editing, I got to the point where I started feeling like I had a lot of plots and characters.”

    “I had all these documents on the deeper aspects of the world I was creating. I was worried about being able to keep track of it all. That’s when I switched into my more data science-minded approach to solving a complex problem with a lot of different pieces.”


    Major U.K. science funder to require grantees to make papers immediately free to all

    Science, Cathleen O'Grady


    from

    The United Kingdom currently has one of the highest rates of open-access publication in the world, with many researchers posting their research papers on websites that make them publicly available for free. But the country’s leading funding agency today announced a new policy that will push open access even further by mandating that all research it funds must be freely available for anyone to read upon publication.

    The policy by the funder, UK Research and Innovation (UKRI), will expand on existing rules covering all research papers produced from its £8 billion in annual funding. About three-quarters of papers recently published from U.K. universities are open access, and UKRI’s current policy gives scholars two routes to comply: Pay journals for “gold” open access, which makes a paper free to read on the publisher’s website, or choose the “green” route, which allows them to deposit a near-final version of the paper on a public repository, after a waiting period of up to 1 year. Publishers have insisted that an embargo period is necessary to prevent the free papers from peeling away their subscribers.

    But starting in April 2022, that yearlong delay will no longer be permitted: Researchers choosing green open access must deposit the paper immediately when it is published.


    How police monitor social media to find crime and track suspects

    MLive.com, Malachi Barrett


    from

    Police are scrolling through social media to find crime and check up on potential suspects, raising concerns about surveillance in an increasingly online world.

    Monitoring public posts on platforms like Facebook, Twitter, and Instagram is a common way police departments across the country collect information about individuals or specific types of activities. Representatives of the Michigan State Police and Detroit Police Department said officers manually search public posts and also feed social media photos to facial recognition algorithms that hunt for similarities between millions of faces collected in police databases.


    Policymakers Must Regulate Big Tech’s AI Dominance

    Foreign Policy, Bhaskar Chakravorti


    from

    Google CEO Sundar Pichai has suggested—more than once—that artificial intelligence (AI) will affect humanity’s development more profoundly than humanity’s harnessing of fire. He was speaking, of course, of AI as a technology that gives machines or software the ability to mimic human intelligence to complete ever more complex tasks with little or no human input at all.

    You may laugh off Pichai’s comparison as the usual Silicon Valley hype, but the company’s dealmakers aren’t laughing. Since 2007, Google has bought at least 30 AI companies working on everything from image recognition to more human-sounding computer voices—more than any of its Big Tech peers. One of these acquisitions, DeepMind, which Google bought in 2014, just announced that it can predict the structure of every protein in the human body from the DNA of cells—an achievement that could fire up numerous breakthroughs in biological and medical research. These breakthroughs will of course only happen if Google allows broad access to DeepMind’s knowledge, but the good news is that Google has decided it will. However, there is a “but.”

    For one, Google isn’t the only gatekeeper whose decisions will largely determine the direction AI technology takes. The roster of companies snatching up AI startups globally is also dominated by the familiar Big Tech names that so often accompany the search and advertising giant: Apple, Facebook, Microsoft, and Amazon. In 2016, this group, along with Chinese mega-players such as Baidu, spent $20 billion to $30 billion out of an estimated global total of $26 billion to $39 billion on AI-related research, development, and acquisitions. With dominance in search, social media, online retail, and app stores, these companies have near-monopolies on user data.


    Researchers create “master faces” — AI-generated artificial photos of human faces that are generic enough that they can pass as many different people in face-recognition systems

    Twitter, Clive Thompson


    from


    CMU Misinformation Researchers Zero in on Climate Change

    Carnegie Mellon University, News


    from

    This week, the Intergovernmental Panel on Climate Change released a report outlining faster and more irreversible changes due to carbon dioxide emissions than previously feared. [Aman] Tyagi, who earned his Ph.D. in engineering and public policy at CMU, chose to focus on climate misinformation in order to apply his knowledge of computer science to this societal problem.

    “There’s no lack of scientific evidence for climate change, but it’s still unfortunately being debated on social media as a partisan issue,” Tyagi said. “If people wrongly believe, then, that it isn’t man-made, we probably won’t see any major policies which successfully curb climate change. That’s the motivation behind this work.”

    Tyagi and [Kathleen] Carley collected a data set of 38 million unique tweets over 100 weeks. Using a state-of-the-art stance detection method, they were able to sort the data set into groups of believers and disbelievers of climate change science. Of the seven major conspiracy theories they identified, disbelievers primarily shared two: the chemtrails theory, which claims that the trails following high-altitude jets are chemical agents being sprayed for nefarious purposes; and the geoengineering theory, which claims that government experiments are causing climate change.
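    The stance-detection step described above can be sketched in a much-simplified form. The cue phrases, labels, and example tweets below are invented for illustration; they are not the CMU study’s actual method or data, which used a far more sophisticated model over 38 million tweets.

```python
# Toy sketch of stance detection: score a tweet's stance toward
# climate-change science using hand-picked cue phrases.
# Cue lists and tweets are hypothetical, not the study's real pipeline.

BELIEVER_CUES = {"climate crisis", "carbon emissions", "climate action"}
DISBELIEVER_CUES = {"chemtrails", "hoax", "geoengineering experiments"}

def stance(tweet: str) -> str:
    """Classify a tweet as 'believer', 'disbeliever', or 'neutral'."""
    text = tweet.lower()
    b = sum(cue in text for cue in BELIEVER_CUES)
    d = sum(cue in text for cue in DISBELIEVER_CUES)
    if b > d:
        return "believer"
    if d > b:
        return "disbeliever"
    return "neutral"

tweets = [
    "We need climate action now, carbon emissions keep rising",
    "Those trails in the sky are chemtrails, wake up",
    "Nice weather today",
]
print([stance(t) for t in tweets])
```

    A real stance detector would learn these associations from labeled data rather than from a fixed keyword list, but the input/output shape of the task is the same.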


    Two numbers: 1,000 = acquisitions made by GAFAM (Google, Amazon, Facebook, Apple, and Microsoft) in the past 20 years.

    Twitter, Julia Powles


    from

    Zero = acquisitions actually blocked by regulators worldwide.

    Nice piece by @TomValletti, Chief Competition Economist of the European Commission 2016–2019.


    Understanding extreme weather

    Stanford University, Stanford News


    from

    From lake-draining drought in California to bridge-breaking floods in China, extreme weather is wreaking havoc. Preparing for weather extremes in a changing climate remains a challenge, however, because their causes are complex and their response to global warming is often not well understood. Now, Stanford researchers have developed a machine learning tool to identify conditions for extreme precipitation events in the Midwest, which account for over half of all major U.S. flood disasters. Published in Geophysical Research Letters, their approach is one of the first examples using AI to analyze causes of long-term changes in extreme events and could help make projections of such events more accurate.
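    The general idea of learning which conditions precede extreme precipitation can be sketched very roughly as a supervised classification problem. Everything below is an invented illustration: the features, data, and threshold model are hypothetical and do not reflect the Stanford study’s actual approach, which is described in Geophysical Research Letters.

```python
# Much-simplified illustration: learn which atmospheric conditions
# precede extreme precipitation from labeled historical examples.
# Features (moisture, instability), values, and the model are invented.

# Each sample: (column moisture, vertical instability), label 1 = extreme rain
samples = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.7), 1),
    ((0.2, 0.3), 0), ((0.4, 0.1), 0), ((0.3, 0.2), 0),
]

# "Train" the simplest possible model: a threshold on the mean feature
# value, placed halfway between the two class averages.
def class_mean(label):
    vals = [sum(x) / len(x) for x, y in samples if y == label]
    return sum(vals) / len(vals)

threshold = (class_mean(0) + class_mean(1)) / 2

def predict(features):
    """Flag conditions as extreme-precipitation-like (1) or not (0)."""
    return 1 if sum(features) / len(features) > threshold else 0

print(predict((0.85, 0.9)), predict((0.1, 0.2)))  # → 1 0
```

    The research version replaces this toy threshold with a trained machine learning model over real atmospheric variables, but the framing (conditions in, extreme-event likelihood out) is the same.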


    Deadlines



    Designing for Data Awareness: Addressing Privacy and Security Concerns About “Smart” Technologies

    “[In] this one-day workshop, participants will work together to brainstorm potential solutions for making smart device data more visible and interpretable for consumers.” Deadline for submissions is September 1.

    Apply to Participate in Virtual Office Hours for Biomechanical Modeling or Machine Learning Research Questions

    “We are pleased to announce we will be holding Virtual Office Hours to support researchers working with wearable sensors, video technology, and other modalities in rehabilitation research. These office hours are offered as part of the training efforts of our Mobilize Center. We will have two tracks for the office hours: 1) Biomechanical modeling with OpenSim and IMUs or video, and 2) Machine learning, including video analysis. All phases of a research project are supported, such as formulating a research question, choosing and planning appropriate methods, and addressing issues with carrying out the study.” Apply for appointments by September 3.

    SPONSORED CONTENT





    The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

     


    Tools & Resources



    [2108.02497] How to avoid machine learning pitfalls: a guide for academic researchers

    arXiv, Computer Science > Machine Learning; Michael A. Lones


    from

    This document gives a concise outline of some of the common mistakes that occur when using machine learning techniques, and what can be done to avoid them. It is intended primarily as a guide for research students, and focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions. It covers five stages of the machine learning process: what to do before model building, how to reliably build models, how to robustly evaluate models, how to compare models fairly, and how to report results.
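    One pitfall that spans the “before model building” and “evaluate robustly” stages is data leakage: fitting any preprocessing step on the full dataset lets test-set statistics contaminate training. The sketch below uses a hypothetical standardization step and invented numbers; it is an illustration of the general pitfall, not an example taken from the guide itself.

```python
# Data-leakage sketch: preprocessing statistics must come from the
# training split only, never from data that includes the test set.

def fit_scaler(values):
    """Return (mean, std) computed from the given values only."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5

def transform(values, mean, std):
    return [(v - mean) / std for v in values]

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # last point is the held-out test set
train, test = data[:4], data[4:]

# Wrong: the scaler sees the test point, so its statistics leak
# into everything trained on the transformed data.
leaky_mean, leaky_std = fit_scaler(data)

# Right: fit on the training split only, then apply to both splits.
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)

print(round(leaky_mean, 1), round(mean, 1))  # 22.0 vs 2.5
```

    The single outlier in the test split shifts the leaky mean from 2.5 to 22.0, which is exactly the kind of silent contamination that makes reported results unreproducible.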


    POSH: A Data-Aware Shell

    Micah Lerner


    from

    This week’s paper review covers POSH, a system capable of achieving dramatic speedups for unmodified shell scripts that perform large amounts of I/O; intriguing use cases of POSH are log analysis and the git workflows of large software projects. In particular, POSH shines in environments that use distributed file systems like Network File System (NFS) mounts; I’ve included a link to a great overview of NFS in the sidebar (or if you are on mobile, you can click the number “4” to reveal it).
