Data Science newsletter – January 22, 2021

Newsletter features journalism, research papers and tools/software for January 22, 2021


 

Each year, the NSF Survey of Earned Doctorates releases the demographics of new PhDs in STEM, and data is important to show trends quantitatively.

Twitter, Dr. Laura Lopez



For example, here’s the data of Black, Latinx, and Native American astronomy & astrophysics PhDs vs total in the last 5 yrs.


An Algorithm Is Helping a Community Detect Lead Pipes

WIRED, Business, Sidney Fussell



The model had shown promise in Flint before officials rebelled. Now Toledo is using it, while incorporating more public input.


James DiCarlo named director of the MIT Quest for Intelligence

MIT News, MIT Schwarzman College of Computing



James DiCarlo, the Peter de Florez Professor of Neuroscience, has been appointed to the role of director of the MIT Quest for Intelligence. MIT Quest was launched in 2018 to discover the basis of natural intelligence, create new foundations for machine intelligence, and deliver new tools and technologies for humanity.

As director, DiCarlo will forge new collaborations with researchers within MIT and beyond to accelerate progress in understanding intelligence and developing the next generation of intelligence tools.


Cell’s publisher invites statements in papers about studies’ diversity and inclusion efforts

Science, Sofia Moutinho



Last week, Cell Press, publisher of more than 50 journals, became one of the first publishers to invite authors to do so. The publisher—whose titles include the prestigious Cell, as well as Neuron and Current Biology—said the declarations can highlight authors’ personal characteristics or elements of a study’s design. These can include, for example, researchers’ use of ethnically diverse cell lines and genomic data sets and efforts to ensure gender balance in study samples and on lists of citations and authors.

Submitting such a declaration is not mandatory and will not affect whether submitted manuscripts are accepted for publication, Cell Press said.

“We see it as a way to acknowledge current authors who have worked to improve inclusion and diversity in science and encourage others to do more going forward,” said Deborah Sweet, vice president of editorial, in a statement. The company plans to track the diversity disclosures during the coming year to measure progress toward that outcome.


Amazon hires Seattle U computer science chair, then makes big donation to find replacement

GeekWire, Lisa Stiffler



Amazon is making a significant donation to Seattle University to create an endowed chair to lead its Computer Science Department, a move that promises to help the private Jesuit university raise the profile and expand the capabilities of its computer science program.

Seattle University announced the gift on Thursday morning with much gratitude to Amazon — but there’s more to the story.

The university lost its existing chair, Roshanak Roshandel, to the tech giant in December. She resigned from the role to take a full-time position at Amazon as a principal product manager – technical (PMT) for Alexa Experience.


Cornell Day of Data to focus on research collaboration

Cornell University, Cornell Chronicle



The annual Cornell Day of Data gathers strength in numbers by bringing together professors, researchers and students across the university to share techniques, tools and insights in working with data.

This year’s virtual event, Jan. 27-28, focuses on the theme of “Scholarship through Collaboration.”

Shane G. Henderson, the Charles W. Lake, Jr. Professor in Productivity in the School of Operations Research and Information Engineering, will deliver the keynote presentation, “Yes, Data Is Great, But …”, about decision-making under uncertain conditions.


NULab Co-Director David Lazer’s Latest Survey Finds Mask-Wearing And Social Distancing Have Improved; More Needed to Stop COVID-19

Northeastern University, NULab for texts, maps and networks



David Lazer, NULab co-director and University Distinguished Professor of political science and computer and information sciences at Northeastern, has just released the January 2021 report of The COVID States Project. Lazer co-authored the findings with colleagues from Northeastern University, Harvard University, Rutgers University, and Northwestern University. The latest survey collected responses from 25,640 people across all 50 states and Washington, D.C., between December 16, 2020 and January 11, 2021. Eighty percent of respondents “very closely” followed recommendations to wear a face mask outside the home. Compared to the results from November and early December, fewer people surveyed said that in the preceding 24 hours they had visited a gym, restaurant, workplace, or crowded space with people outside their household.

But the survey found that there are still not enough people following public health recommendations to the extent necessary to stop the spread of COVID-19. The latest survey found an upward trend in mask-wearing and social distancing behaviors, but neither is at the level surveys found in the spring. David Lazer told NPR’s Morning Edition that “It’s good news-bad news. The good news is we’ve improved a lot in terms of mask-wearing and social distancing. The bad news is, to bend the curve they really need to be much better.”


Identification of effective spreaders in contact networks using dynamical influence

Applied Network Science, Ruaridh A. Clark & Malcolm Macdonald



Contact networks provide insights on disease spread due to the duration of close proximity interactions. For systems governed by consensus dynamics, network structure is key to optimising the spread of information. For disease spread over contact networks, the structure would be expected to be similarly influential. However, metrics that are essentially agnostic to the network’s structure, such as weighted degree (strength) centrality and its variants, perform near-optimally in selecting effective spreaders. These degree-based metrics outperform eigenvector centrality, despite disease spread over a network being a random walk process. This paper improves eigenvector-based spreader selection by introducing the non-linear relationship between contact time and the probability of disease transmission into the assessment of network dynamics. This approximation of disease spread dynamics is achieved by altering the Laplacian matrix, which in turn highlights why nodes with a high degree are such influential disease spreaders. From this approach, a trichotomy emerges on the definition of an effective spreader where, for susceptible-infected simulations, eigenvector-based selections can either optimise the initial rate of infection, the average rate of infection, or produce the fastest time to full infection of the network. Simulated and real-world human contact networks are examined, with insights also drawn on the effective adaptation of ant colony contact networks to reduce pathogen spread and protect the queen ant. [full text]
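The contrast the abstract draws between weighted degree (strength) centrality and eigenvector centrality can be sketched on a toy network. This is a minimal illustration, not the authors' method: the five-node network, the contact durations, the `rate` parameter and the saturating map `1 - exp(-rate * minutes)` from contact time to transmission probability are all assumptions made for the example, and the paper's altered Laplacian is not reproduced here.

```python
import math

# Hypothetical toy contact network (not from the paper):
# each edge weight is a total contact duration in minutes.
CONTACT_MINUTES = {
    ("a", "b"): 30.0,
    ("a", "c"): 20.0,
    ("b", "c"): 15.0,
    ("a", "d"): 10.0,
    ("d", "e"): 5.0,
}


def build_adjacency(edges):
    """Return (nodes, adj) for an undirected weighted graph."""
    nodes = sorted({node for edge in edges for node in edge})
    adj = {u: {v: 0.0 for v in nodes} for u in nodes}
    for (u, v), weight in edges.items():
        adj[u][v] = weight
        adj[v][u] = weight
    return nodes, adj


def strength_centrality(nodes, adj):
    """Weighted degree: the sum of edge weights at each node."""
    return {u: sum(adj[u].values()) for u in nodes}


def eigenvector_centrality(nodes, adj, iterations=500):
    """Power iteration on (A + I); the shift avoids oscillation on bipartite graphs."""
    x = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        nxt = {u: x[u] + sum(adj[u][v] * x[v] for v in nodes) for u in nodes}
        norm = math.sqrt(sum(value * value for value in nxt.values()))
        x = {u: value / norm for u, value in nxt.items()}
    return x


def transmission_weights(edges, rate=0.05):
    """Assumed saturating map from contact minutes to transmission probability."""
    return {edge: 1.0 - math.exp(-rate * minutes) for edge, minutes in edges.items()}


nodes, adj = build_adjacency(CONTACT_MINUTES)
strength = strength_centrality(nodes, adj)
eigen = eigenvector_centrality(nodes, adj)

# Same ranking exercise after replacing raw durations with assumed probabilities.
_, adj_prob = build_adjacency(transmission_weights(CONTACT_MINUTES))
eigen_prob = eigenvector_centrality(nodes, adj_prob)

for node in nodes:
    print(node, strength[node], round(eigen[node], 3), round(eigen_prob[node], 3))
```

Strength only looks at a node's own contacts, while the eigenvector score also rewards being connected to well-connected nodes. Swapping durations for saturating probabilities compresses the difference between long and very long contacts, which is one simple way to see why a non-linear weighting can change which nodes look influential.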


A Skeleton Key for AI Hardware Experimentation

The Next Platform, Nicole Hemsoth



For those who are at the intersection of AI hardware and software, the open source Apache TVM effort is already well known and used among a number of chipmakers as well as developers. It is a machine learning compiler framework that can meet devices at the edge to datacenter with optimized configurations no matter the target hardware.

If it wasn’t already in use by AMD, Qualcomm, Arm, Xilinx, Amazon, and many others, it might smack of that “magic compiler” mojo some of the AI chip startups began with a few years ago. The idea that machine learning models don’t need to be uniquely hand-tailored to individual hardware devices expands the potential for hardware startups and established vendors alike. And now might be its time to really shine as a standard base for new AI hardware to roll into production without the heavy burden on users to adopt an architecture-specific approach. There are, after all, plenty of devices for ML acceleration to choose from.

“There has been a proliferation of hardware targets and that has been fragmented, and so too has the software ecosystem around those. There’s TensorFlow, Keras, PyTorch, and so on not to mention the increasingly complex interplay between ML models, software frameworks, and hardware,” says Luis Ceze, professor at the University of Washington and co-founder and co-CEO of TVM-driven startup, OctoML.


The pandemic is teaching us to use data in new ways

Wired UK, Rhonda Vonshay Sharpe



In 2021, we will see academics, activists and grassroots organisations unite to advocate for policies that provide workers with a living wage, paid leave and a pathway to wealth accumulation. These policies will have a central theme: the redistribution of wealth with the intent to reduce economic inequality and increase economic security.

Economic insecurity will take centre stage as we enter 2021 in a recession. Politicians will blame each other for not enacting policies that prioritise the safety of frontline workers, renew consumer confidence or stabilise disruptions to the supply chain. As the death toll increases from Covid-19, we will hear a familiar term: “disproportionately” – this is code for how institutional racism and anti-Black policies have impacted Black, Hispanic and Indigenous communities.

We will also hear a new term, “disaggregated data” – in other words, data that has been divided into detailed sub-categories – which will allow us to understand who exactly Covid-19 has killed, who was denied or benefited from relief packages and who was completely forgotten. Disaggregated data will also allow for the identification of the nuances in the characteristics and outcomes that define vulnerability. We will require that data be collected and reported with the purpose of being separated into its component parts. This will give us the accountability we need to measure progress in the wellbeing of vulnerable populations.
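To make "disaggregated data" concrete, the sketch below compares a single aggregate rate with the same rate split by sub-group. The group names and all counts are made up purely for illustration and do not describe any real population.

```python
from collections import defaultdict

# Made-up records for illustration only: (group, population, deaths).
RECORDS = [
    ("group_a", 50_000, 50),
    ("group_b", 30_000, 60),
    ("group_c", 20_000, 90),
]


def rate_per_100k(deaths, population):
    """Deaths per 100,000 people."""
    return 100_000 * deaths / population


def aggregate_rate(records):
    """One headline number for the whole population."""
    total_population = sum(population for _, population, _ in records)
    total_deaths = sum(deaths for _, _, deaths in records)
    return rate_per_100k(total_deaths, total_population)


def disaggregated_rates(records):
    """The same measure, reported separately for each sub-group."""
    totals = defaultdict(lambda: [0, 0])
    for group, population, deaths in records:
        totals[group][0] += population
        totals[group][1] += deaths
    return {
        group: rate_per_100k(deaths, population)
        for group, (population, deaths) in totals.items()
    }


print("aggregate:", aggregate_rate(RECORDS))
for group, rate in disaggregated_rates(RECORDS).items():
    print(group, rate)
```

In this toy data the headline rate sits in the middle of the range, so on its own it would hide that one group's rate is several times another's. That gap only becomes visible if the data are collected with the group field attached in the first place.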


The new B.1.1.7 is a ‘super-spreader’ strain. Here’s how the U.S. can control it, says Dr. Eric Topol

MarketWatch, Jaimy Lee



Of all the news about the virus that has wreaked havoc on our lives for the last year, the emergence of several possibly more infectious variants in different parts of the world is worrying.

Why? Some of the new variants, including the B.1.1.7 identified by health officials in the United Kingdom, the P1 in Brazil, and the B.1.351 in South Africa, are thought to be more transmissible, which then raises questions about infection rates, the severity of disease, and the burden on hospitals already struggling with an ever-growing number of cases.

“When they’re hyper-transmissible, they double every week,” Dr. Eric Topol, founder and director of the Scripps Research Translational Institute, said in a Jan. 13 interview. “They go exponential.”

This is why Topol, an expert on many things digital health and a longtime advocate of genomic sequencing, is arguing for wider spread utilization of genomic sequencing surveillance in the U.S. Until recently, not many people were making the case for this type of broad use of the technology, but that’s already changing, driven by fear of the new B.1.1.7 strain.
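Topol's "double every week" remark can be turned into simple arithmetic. The sketch below assumes idealized, unchecked exponential growth with a fixed seven-day doubling time; the starting count of 100 is a hypothetical number chosen for the example, not a figure from the interview.

```python
def projected_cases(initial_cases, days, doubling_time_days=7.0):
    """Case count after `days` of growth that doubles every `doubling_time_days`."""
    return initial_cases * 2 ** (days / doubling_time_days)


# Hypothetical starting point of 100 cases, checked every two weeks.
for week in range(0, 9, 2):
    print(f"week {week}: {projected_cases(100, 7 * week):,.0f} cases")
```

Under these assumptions the count grows sixteen-fold in four weeks and 256-fold in eight. Real epidemics do not sustain a fixed doubling time for long, so this is an upper-bound intuition rather than a forecast.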


Where, how you live matter when it comes to COVID-19

The Daily Memphian, Jane Roberts



[Cherisse] Madlock-Brown, assistant professor at the University of Tennessee Health Science Center, is now leading a team of national data experts digging into the medical records of 2.5 million people who were tested for COVID in dozens of clinical centers across the nation.

The members are gleaning medical evidence from 147 million doctor visits, overlaying it with Google Maps and public databases on employment, food security and access to medical care, to tease out factors that may prove why some people are more vulnerable to COVID, or what life conditions hamper recovery, which could include poor air quality from neighborhood fixtures, like factories.


Where surveillance cameras work but the justice system doesn’t

Rest of World, Madeleine Wattenbarger



Mexico City has one of the most ambitious and sophisticated video surveillance systems in the world. But it hasn’t stopped crime.


Guidelines for military and non-military use of Artificial Intelligence

European Parliament, News



  • AI can replace neither human decision-making nor human contact
  • EU strategy prohibiting lethal autonomous weapon systems needed
  • Call for a ban on “highly intrusive social scoring applications” by public authorities; concerns over “deepfakes”

    Researcher awarded for work on intelligent reinforced concrete pipes

    Innovation in Textiles blog



    In 2020, ITA doctoral candidate Kira Heins was awarded the Hanns Voith Foundation Prize 2020 in the field of New Materials, worth €5,000. Ms. Heins received the prize for her Master’s thesis, “Development and Realisation of Concepts for Manufacturing of Intelligent Textile Reinforced Concrete Pipes.”

    With her work, Ms. Heins is researching, for the first time, the implementation of textile-reinforced concrete pipes with leakage detection integrated into the reinforcement.


    Economist Nouriel Roubini: “Twitter and the Other Platforms Are Bad, Facebook Is Worse”

    Der Spiegel, Tim Bartz



    Star economist Nouriel Roubini believes that President-elect Joseph Biden’s first term will be overshadowed by civil unrest at home and cyberattacks from abroad. He believes social media platforms must be more strictly regulated.


    Events



    2020-21 Translational Data Analytics Institute Speaker Series

    The Ohio State University, Translational Data Analytics Institute



    Online January 28, starting at 12 p.m. Eastern time. Speaker: Monica Sanders, University of Delaware, “AI, Big (or Bespoke) Data and Disasters” [registration required]


    Deadlines



    Data Science and Applied AI Postdoctoral Scholars Program

    “This unique program provides postdocs with the opportunity to pursue original research on significant questions in data science, while also developing specialized domain expertise in one or more complementary areas such as behavioral science, healthcare, and public policy.” Deadline for applications is February 1.

    SPONSORED CONTENT





    The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

     


    Tools & Resources



    Make Your Own Internet Archive With Archive Box

    NixIntel



    One of the biggest challenges of internet investigation is preserving data once you’ve found it. We have access to more information than ever before, but so much of it can be easily lost if we don’t take steps to archive it. If you’ve ever bookmarked an important resource only to come back later and see that it’s no longer available, you’ll know how frustrating it can be. I wrote about this problem last year in this post about the Attrition of Information In OSINT along with some suggestions about how to preserve internet material as well as how to recover data when it has been removed.


    Fine-tuning a *3-billion* parameter model on a single GPU?

    Twitter, Hugging Face



    Now possible in transformers, thanks to the DeepSpeed/Fairscale integrations!

    Thank you @StasBekman for the seamless integration, and thanks to @microsoft and @facebookai teams for their support!


    HOT NEWS!! “Code as Creative Medium” by @golan & @tegabrain — featuring creative coding exercises and assignments collected from across our community — is now out! If you teach computational new media art and design, we made this for you!

    Twitter, Golan Levin



    “Code as Creative Medium” features a foreword by @reas; interviews with @tchoi8 @hdeweyh @RLukeDuBois @polishedsolid @zachlieberman @runemadsen @laurenleemack @aparrish @phoenixperry @shiffman @siusoon @sugi2000 & @blprnt; and computational book design by @Q_Shim & @MinsunEo!


    Careers


    Full-time positions outside academia

    Senior Research SDE



    Microsoft Research; Montreal, QC, Canada
