Data Science newsletter – September 29, 2021

Newsletter features journalism, research papers and tools/software for September 29, 2021

 

Accelerating Discovery and Innovation at the University of Alabama with the Dell ‘Cheaha’ Supercomputer

insideHPC


Working closely with HPC specialists from Dell Technologies and NVIDIA, the University of Alabama at Birmingham (UAB) team recently expanded its research computing resources.

This UAB addition incorporates four DGX A100 systems with 2nd Gen AMD EPYC processors. Each DGX system can allocate up to 56 GPU instances using NVIDIA multi-instance GPU (MIG) technology. Each server can be allocated to different virtual machines via an OpenStack cloud layer.

“It gives us a solution for machine learning, both the training and inference aspects, as well as exploratory uses for data analysis,” says Ralph Zottola, Ph.D., Assistant Vice President for Research Computing at UAB.


The Misalignment Problem: Social Media Stance vs. Public Opinion Surveys

Medium, NYU Center for Data Science


CDS Faculty Fellow Sarah Shugars and colleagues recently published a paper, “(Mis)alignment Between Stance Expressed in Social Media Data and Public Opinion Surveys,” describing the misalignment between the stance a person expresses on social media and their stance as measured by public opinion polls.

The research aimed to answer two questions: 1) when do annotators (people who categorize data) agree with each other on how to categorize a stance, and 2) how well does human behavior map to other ways of inferring a person’s attitude (i.e., surveys)?

They examined stances toward four targets: Donald Trump, COVID-19 lockdowns, face masks, and COVID-19 vaccinations. For each topic, they collected survey responses and extracted data from each respondent’s Twitter account, using annotators to label the stance expressed in the tweets.
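The blurb does not say which agreement statistic the paper used. As a hedged illustration of the first research question, the sketch below computes Cohen's kappa, one standard chance-corrected measure of agreement between two annotators. The stance labels (`favor`, `against`, `none`) and the six example tweets are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently at their own label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used the same single label for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical stance labels for six tweets about face masks.
annotator_1 = ["favor", "favor", "against", "none", "against", "favor"]
annotator_2 = ["favor", "against", "against", "none", "against", "favor"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.739
```

Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance. With more than two annotators per item, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be needed instead.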


An NSF grant will support undergraduates as they research, design and build a game that teaches ethical thinking in data science and artificial intelligence.

Twitter, UW eScience Institute


The project is led by Cecilia Aragon and Bernease Herman from UW, and Sarah Evans at @UNTsocial.


Study suggests R rate for tracking pandemic should be dropped in favour of ‘nowcasts’

University of Cambridge (UK), Research News


When the COVID-19 pandemic emerged in 2020, the R rate became well-known shorthand for the reproduction of the disease. Yet a new study suggests it’s time for ‘A Farewell to R’ in favour of a different approach based on the growth rate of infection rather than contagiousness.

The study, published in the Journal of the Royal Society Interface and led by researchers from the University of Cambridge, is based on time series models developed using classical statistical methods. The models produce nowcasts and forecasts of the daily number of new cases and deaths that have already proved successful in predicting new COVID-19 waves and spikes in Germany, Florida and several states in India.

The study is co-authored by Andrew Harvey and Paul Kattuman, whose time series model reflecting epidemic trajectories, known as the Harvey-Kattuman model, was introduced last year in a paper published in Harvard Data Science Review.
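The Harvey-Kattuman model is a full statistical time-series model and is not reproduced here. As a much simpler, hedged illustration of what a growth rate of infection means, the sketch below smooths daily new-case counts with a 7-day trailing average and takes the day-to-day change in the logarithm. The function names and the synthetic data are assumptions for illustration only.

```python
import math


def trailing_mean(values, window=7):
    """Trailing moving average; the first window-1 days have no value and are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window for i in range(window - 1, len(values))]


def daily_growth_rates(new_cases, window=7):
    """Growth rate g_t = ln(y_t) - ln(y_{t-1}) of smoothed daily new cases y_t."""
    smoothed = trailing_mean(new_cases, window)
    if any(y <= 0 for y in smoothed):
        raise ValueError("smoothed counts must be positive to take logarithms")
    return [math.log(curr) - math.log(prev) for prev, curr in zip(smoothed, smoothed[1:])]


def doubling_time(growth_rate):
    """Days for cases to double at a constant positive daily growth rate."""
    return math.log(2) / growth_rate


# Synthetic epidemic growing 10% per day.
cases = [100 * 1.1 ** t for t in range(21)]
rates = daily_growth_rates(cases)
print(round(rates[-1], 4), round(doubling_time(rates[-1]), 1))  # 0.0953 7.3
```

A negative growth rate indicates a shrinking epidemic.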


Mammoth AI report says era of deep learning may fade, but that’s unlikely

ZDNet, Tiernan Ray


A report put out last week by Stanford University, in conjunction with multiple institutions, argues that the dominance of the deep learning approach may fade in coming years, as it runs out of answers for tough questions of building AI.

“The recent dominance of deep learning may be coming to an end,” write the authors of the AI100 report. “To continue making progress, AI researchers will likely need to embrace both general- and special-purpose hand-coded methods, as well as ever faster processors and bigger data.”


SOM’s asset management program begins its first year

Yale Daily News, Brian Zhang


The School of Management welcomed to campus its inaugural class of students pursuing master’s degrees in asset management.

The new program’s inaugural class contains 56 students hailing from 13 different countries. Classes officially started on Sept. 1, following two weeks of boot camp and orientation in which students honed quantitative skills, learned about the program through sample lectures and got the opportunity to meet one another in recreational settings. The program had been slated to begin in the 2020-21 academic year, but its inaugural year was delayed due to the COVID-19 pandemic.


How College and University Leaders Are Getting Their Campuses Vaccinated

National Academy of Sciences, Megan Lowry


College-aged individuals are in a unique phase of development. Although many live independently as adults, manage their own lives, or even have children of their own, what motivates someone in college can be very different from what motivates adults in their 30s or 40s.

Findings from developmental psychology and brain research about adolescent and young adult behavior indicate that college-aged individuals are socially driven, with a strong desire for reward and acceptance. Identity, agency, and autonomy are centrally important during the college years. A rapid expert consultation published last year by the National Academies says campus leaders may find success in their COVID-19 prevention strategies by tying them to these characteristics and motivations.


NSF establishes new institutes for harnessing the data revolution

National Science Foundation


The U.S. National Science Foundation announced a $75 million investment to establish five new Harnessing the Data Revolution Institutes. The institutes support convergence between science and engineering research communities, bringing together expertise in data science foundations, systems, applications and cyberinfrastructure. Together, they will enable breakthroughs through collaborative, co-designed programs to formulate innovative data-intensive approaches for addressing critical national challenges. … A summary of each award is provided below.

  • NSF Institute for a New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning — Led by The Ohio State University
  • NSF Institute for Accelerated AI Algorithms for Data-Driven Discovery — Led by the University of Washington
  • NSF Institute for Harnessing Data and Model Revolution in the Polar Regions — Led by the University of Maryland, Baltimore County
  • NSF Institute for Data Driven Dynamical Design — Led by the Colorado School of Mines
  • NSF Institute for Geospatial Understanding through an Integrative Discovery Environment, I-GUIDE — Led by the University of Illinois Urbana-Champaign

    Ohio State grant establishes new center for maternal and pediatric drug research

    Ohio State Medical Center


    A new, five-year $17.1 million grant awarded to The Ohio State University College of Medicine will fund the creation of a knowledge bank to support the study of drugs and medical treatments for pregnant and lactating women and children.

    The National Institute of Child Health and Human Development grant to the departments of Biomedical Informatics, Obstetrics and Gynecology, and Pediatrics will help establish the Maternal and Pediatric Precision in Therapeutics Data, Model, Knowledge and Research Coordination Center. This newly created center will serve as a national resource for multidisciplinary expertise and knowledge in maternal and pediatric therapeutics.


    3 lessons from IBM on designing ethical AI technology

    World Economic Forum, Daniel Lim and Brian Green


    Over the past two years, the World Economic Forum has been working with a multi-stakeholder group to advance ethics in technology under a project titled Responsible Use of Technology. This group has identified a need to highlight and share best practices in the responsible design, development, deployment and use of technology. To this end, we have embarked on publishing a series of case studies that feature organizations that have made meaningful contributions and progress in technology ethics. Earlier this year, we began this series with a deep dive into Microsoft’s approach to responsible innovation.

    In the second edition of this series, we focus on IBM’s journey towards ethical AI technology. The insights from this effort are detailed in a report titled Responsible Use of Technology: The IBM Case Study, which is jointly authored by the World Economic Forum and the Markkula Center for Applied Ethics at Santa Clara University. Below are the key lessons learned from our research, along with a brief overview of IBM’s historical journey towards ethical technology.

    1. Trusting your employees to think and act ethically


    AI could help to predict the next virus to jump from animals to humans

    BBC Science Focus Magazine, Jason Goodyer


    Researchers at the University of Glasgow have developed a method that uses machine learning – a type of artificial intelligence – to comb through viral genomes and predict how likely each virus is to infect humans.

    The method may help scientists to develop vaccines to protect against the most likely candidates before they make the jump from animals to humans.


    The changing face of fraud and how artificial intelligence is helping to counter it [Q&A]

    Beta News, Ian Barker


    Financial crime is on the increase, and attackers must be stopped in their tracks. Financial services organizations, new and old, need to ensure they have the right technology in place to predict, detect and deter fraud, whilst ensuring minimal disruption to the customer journey.

    We spoke to Martin Rehak, CEO of Resistant AI, to find out how sophisticated artificial intelligence can detect known criminal practices and, more importantly, predict unknown, emerging patterns of financial crime.


    Alumni gifts will support Cornell Bowers CIS

    Cornell University, Cornell Chronicle


    Gifts totaling $10 million – $5 million from Steve Conine ’95 and his wife Alexi Conine ’96, and $5 million from Niraj Shah ’95 and his wife Jill Shah – will support the construction of a leading-edge new building for the Cornell Ann S. Bowers College of Computing and Information Science.

    The combined gift, led by the co-founders of the online home-goods retailer Wayfair, helps bolster growth of the college and supports increasing student demand and interest in computing and information science. Enrollment in Cornell Bowers CIS has grown sixfold over the past decade, with an anticipated 2,000 majors this year in the college’s disciplines.


    Aptamer-nanopore sensor detects only infectious viruses

    Chemical & Engineering News, Mark Peplow


    By decorating tiny nanopores with short DNA strands called aptamers, researchers have created a sensor that distinguishes between infectious and non-infectious viruses (Sci. Adv. 2021, DOI: 10.1126/sciadv.abh2848). The device can detect a model virus that mimics SARS-CoV-2, for example, with a sensitivity close to the best conventional diagnostics.

    The gold-standard method for detecting viruses is based on reverse transcription polymerase chain reaction (RT-PCR). But this tends to be time-consuming and expensive, and its high sensitivity can also be an Achilles’ heel. That’s because RT-PCR can pick up genetic material in the long-dead viral debris that may linger in patients for weeks after an infection. In the current pandemic, that means some patients are flagged as COVID-positive when they are no longer infectious.

    “We really need a rapid method to tell people not only if they have a virus, but whether that virus is infectious or not,” says Yi Lu of the University of Illinois Urbana-Champaign, part of the team behind the new sensor. “To me, that is a really critical missing link.”


    KeyBank Invests $750,000 in Diversity Program at Case Western Reserve

    WTAM (Cleveland), Steve Bailey


    KeyBank is investing $750,000 to create a new program at Case Western Reserve University (CWRU), the KeyBank Partnership for Equity through Education and Community Impact at CWRU, which will help address issues of diversity, the digital workforce and more equitable neighborhoods. This grant is part of KeyBank’s $40 billion National Community Benefits Plan and is symbolic of the bank’s longstanding commitment to diversity, equity and inclusion.


    Poorly Circulated Room Air Raises Potential Exposure to Contaminants by up to 6 Times

    Lawrence Berkeley Laboratory, News Center


    The importance of good room ventilation for diluting and dispersing indoor air pollutants has long been recognized, and the COVID-19 pandemic has heightened it further. But new experiments by indoor air researchers at Lawrence Berkeley National Laboratory (Berkeley Lab) show that certain conditions can result in poor mixing of room air, meaning airborne contaminants may not be effectively dispersed and removed by building-level ventilation.

    Using CO2 as a tracer for small respiratory aerosols that travel with air currents in a room, the Berkeley Lab team found that when overhead vents (or diffusers) supply heated air, they create thermally stratified conditions that block the flow of clean air down to the “breathing zone” at the middle height of the room. As a result, even when people are sitting more than 6 feet apart, some occupants may be exposed to respiratory aerosols from others at a rate 5 to 6 times higher than if the same room were well mixed. Their study, “Measured influence of overhead HVAC on exposure to airborne contaminants from simulated speaking in a meeting and a classroom,” was published recently in the journal Indoor Air.
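The release does not give the study's formulas. Purely as a hedged sketch of how a tracer gas can express exposure relative to a well-mixed room, the code below assumes steady state, supply air at the background CO2 level, and perfect mixing as the reference case, in which the excess CO2 equals the source emission rate divided by the ventilation airflow. All function names and numbers are hypothetical, chosen only to show what a ratio of about 6 would look like.

```python
def well_mixed_excess_ppm(source_lps, ventilation_lps):
    """Steady-state CO2 rise (ppm) above supply air in a perfectly mixed room."""
    # Volume fraction of CO2 added to the room air, converted to parts per million.
    return source_lps / ventilation_lps * 1e6


def exposure_ratio(measured_ppm, background_ppm, source_lps, ventilation_lps):
    """Excess tracer at a seat, relative to the excess a well-mixed room would show."""
    measured_excess = measured_ppm - background_ppm
    return measured_excess / well_mixed_excess_ppm(source_lps, ventilation_lps)


# Hypothetical values: one tracer source releasing 0.005 L/s of CO2 into 50 L/s of
# ventilation air, and a listener's breathing zone reading 1,020 ppm against 420 ppm outdoors.
ratio = exposure_ratio(measured_ppm=1020, background_ppm=420, source_lps=0.005, ventilation_lps=50)
print(f"exposure relative to well-mixed: {ratio:.1f}x")  # 6.0x
```

A ratio of 1 means the seat receives what perfect mixing would deliver; values above 1 mean the occupant breathes more of the source's air than mixing alone would predict.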


    Virginia Tech Launches National Security Institute

    The Roanoke Star News


    Virginia Tech has announced the formation of the Virginia Tech National Security Institute, aspiring to become the nation’s preeminent academic organization at the nexus of interdisciplinary research, technology, policy, and talent development to advance national security.

    “As chairman of the Senate’s Select Committee on Intelligence, I am pleased that Virginia Tech is strategically organizing and prioritizing its national security research and workforce efforts,” said U.S. Sen. Mark Warner. “Given the university’s nearly $50 million Department of Defense research portfolio, and its strategic locations in Northern Virginia near key national intelligence agencies and the Pentagon, this purposeful focusing of Virginia Tech’s efforts in national security is welcome news. Virginia Tech’s new National Security Institute will help our nation develop new security-related technological advancements while helping train the future generations of intelligence leaders.”


    Grant Opens Doors for Students to Enter Growing Field of Data Science

    Cal State Fullerton, CSUF News


    Don’t be afraid of data science.

    That’s the message that Sam Behseta, professor of mathematics, and his colleague Jessica Jaynes, associate professor of mathematics, hope will resonate with Cal State Fullerton students who want to learn more about this growing field.

    Recently, CSUF, in partnership with the University of California, Irvine, and Cypress Community College, received a $1.5 million grant from the National Science Foundation to reach out to underrepresented students and encourage them to consider careers in data science.


    UCF Joins New National Consortium, Computing Alliance to Support Latino Students

    University of Central Florida News


    The University of Central Florida has joined a new, national consortium that will advance scholarship of Latinx students and the field of Latino studies, and an alliance that supports the growth and success of women and Latinos in computing fields.

    The consortium includes 16 U.S. Hispanic Serving Institutions that are top-tier doctoral universities with very high research activity, as designated by the Carnegie Classification of Institutions of Higher Education. The consortium’s first initiative, “Crossing Latinidades: Emerging Scholars and New Comparative Directions,” was just awarded a three-year, $5 million grant by The Andrew W. Mellon Foundation to support its mission of increasing the number of Latinx students pursuing terminal degrees and advancing to careers in academia.

    The initiative has three main components: support and training for students as they develop their dissertation proposals; research working groups that will train graduate students, support junior faculty, and advance knowledge of Latino Humanities; and a web portal where faculty and students of all the consortium’s universities can network and collaborate on ongoing Latino research studies. The initiative is led by the University of Illinois Chicago (UIC).


    $3.25m DOE Grant Funds UChicago/Argonne Research on AI Models of Physics Simulations

    HPC Wire


    As the world’s climate changes, extreme events considered “once-in-a-century” emergencies appear much more frequently than the name suggests. From prolonged cold snaps and heat waves to infrastructure collapses such as floods and blackouts, these rare and hard-to-predict events challenge governments and industries in their preparation for worst-case scenarios. But a new collaboration between University of Chicago and Argonne National Laboratory researchers will apply artificial intelligence to accelerate the scientific simulation of complex physical systems, with the potential to more accurately determine the probability of these extremes.

    The project, funded through a $3.25 million grant from the U.S. Department of Energy (DOE), will explore the fundamentals of “surrogate models”: simplified models built using artificial intelligence that stand in for complex, computationally expensive scientific models of climate, energy infrastructure, and other systems. By allowing researchers to run many more simulations in the same amount of time, these surrogates enable better quantification of the risk of extreme events and make computer modeling practical for rapid decision-making, among other advantages.
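The DOE project's surrogates are AI models whose design is not described here. The sketch below swaps in a deliberately simple surrogate, a piecewise-linear interpolant built from a handful of runs of a hypothetical "expensive" model, to show the workflow the paragraph describes: pay for a few full simulations once, then use the cheap surrogate inside a Monte Carlo loop to estimate the probability of an extreme outcome. The model, the threshold, and the standard-normal input distribution are all assumptions.

```python
import bisect
import math
import random


def expensive_model(x):
    """Hypothetical stand-in for a costly simulation mapping a forcing x to a response."""
    return math.exp(0.5 * x) + 0.1 * x ** 2


def build_surrogate(model, lo, hi, n_runs):
    """Piecewise-linear interpolant of `model` built from n_runs evenly spaced runs on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n_runs - 1) for i in range(n_runs)]
    ys = [model(x) for x in xs]  # the only calls to the expensive model

    def surrogate(x):
        x = min(max(x, lo), hi)  # inputs outside the sampled range are clamped to its edges
        i = min(max(bisect.bisect_right(xs, x), 1), n_runs - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return surrogate


def exceedance_probability(model, threshold, n_samples, seed=0):
    """Monte Carlo estimate of P(model(X) > threshold) for X ~ Normal(0, 1)."""
    rng = random.Random(seed)
    hits = sum(model(rng.gauss(0.0, 1.0)) > threshold for _ in range(n_samples))
    return hits / n_samples


surrogate = build_surrogate(expensive_model, lo=-4.0, hi=4.0, n_runs=41)
# Same seed, so both estimates see identical random inputs.
p_full = exceedance_probability(expensive_model, threshold=5.0, n_samples=100_000)
p_fast = exceedance_probability(surrogate, threshold=5.0, n_samples=100_000)
print(p_full, p_fast)
```

Here 41 runs of the full model stand in for 100,000. A real surrogate would need to be validated against held-out full simulations, especially in the tails where the extremes lie.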


    Events



    Join the new @Stanford Department of Health Policy for an important panel discussion on health equity

    Twitter, Sherri Rose


    Online October 29, starting at 12 p.m. Pacific. Panelists: @SArtiga2 (@KFF), @KBibbinsDomingo (@UCSF_Epibiostat), Jeremy Goldhaber-Fiebert (@StanfordHP) and @PerssonPetra (Stanford Economics).


    Deadlines



    We are looking forward to submissions to our special issue on Machine Learning Approaches in Big Data Visualization.

    Submission Deadline: October 29, 2021. There is still a month left to finish your submissions.

    SPONSORED CONTENT


    The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

     


    Tools & Resources



    Social Media Focal Events

    GitHub, ryanjgallagher


    It is often difficult to organize data from multiple API queries. For example, we may collect tweets when a hashtag starts trending by using Twitter’s filter stream. Later, we may make a separate query to the search endpoint to backfill our stream with what we missed before we started it, or update it with tweets that occurred since we stopped it. We may also want to get reply threads, quote tweets, or user timelines based on the tweets we collected. All of these queries are related to a common focal event—the hashtag—but they require several separate calls to the API. It is easy for these multiple queries to result in many disjoint files, making it difficult to organize, merge, update, backfill, and preprocess them quickly and reliably.

    To address these issues, focalevents can be used to organize social media focal event data collected from Twitter’s v2 API using academic credentials and PostgreSQL.
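focalevents' own schema and commands are not shown here. As a hedged sketch of the core idea (merging overlapping queries about one focal event without duplicates), the code below uses Python's built-in sqlite3 as a stand-in for PostgreSQL and keys every tweet on its ID, so a search backfill that overlaps an earlier stream is absorbed rather than duplicated. The table layout, function names and sample tweets are hypothetical.

```python
import sqlite3


def open_store():
    """In-memory store with one row per tweet, keyed on the tweet ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tweets (
               tweet_id TEXT PRIMARY KEY,  -- Twitter's v2 API returns IDs as strings
               event    TEXT NOT NULL,     -- focal event the query belonged to
               source   TEXT NOT NULL,     -- which query first returned the tweet
               body     TEXT NOT NULL)"""
    )
    return conn


def add_tweets(conn, event, source, tweets):
    """Insert one query's results, skipping IDs already stored; returns rows added."""
    before = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    with conn:  # commit the batch as one transaction
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (tweet_id, event, source, body) VALUES (?, ?, ?, ?)",
            [(t["id"], event, source, t["text"]) for t in tweets],
        )
    after = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    return after - before


def event_tweet_ids(conn, event):
    """Stored tweet IDs for one focal event, in numeric (roughly chronological) order."""
    rows = conn.execute(
        "SELECT tweet_id FROM tweets WHERE event = ? ORDER BY CAST(tweet_id AS INTEGER)",
        (event,),
    )
    return [row[0] for row in rows]


conn = open_store()
stream = [{"id": "1002", "text": "first streamed tweet"}, {"id": "1003", "text": "second"}]
backfill = [{"id": "1001", "text": "posted before the stream"}, {"id": "1002", "text": "first streamed tweet"}]
print(add_tweets(conn, "#hashtag", "filter_stream", stream))  # 2
print(add_tweets(conn, "#hashtag", "search_backfill", backfill))  # 1
print(event_tweet_ids(conn, "#hashtag"))  # ['1001', '1002', '1003']
```

Keying on the tweet ID makes every later backfill or update idempotent, which is what keeps repeated queries from producing the disjoint, overlapping files described above.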


    Introducing LingHacks Resources! Wondering where to start learning NLP or do NLP research/activities?

    Twitter, LingHacks


    Check out our growing collection of resources at http://linghacks.tech/resources. Want to submit a resource for the community? Fill out the form.


    PASS: An ImageNet replacement for self-supervised pretraining without humans

    arXiv, Computer Science > Computer Vision and Pattern Recognition, Yuki M. Asano et al.


    Computer vision has long relied on ImageNet and other large datasets of images sampled from the Internet for pretraining models. However, these datasets have ethical and technical shortcomings, such as containing personal information taken without consent, unclear license usage, biases, and, in some cases, even problematic image content. On the other hand, state-of-the-art pretraining is nowadays obtained with unsupervised methods, meaning that labelled datasets such as ImageNet may not be necessary, or perhaps not even optimal, for model pretraining. We thus propose an unlabelled dataset PASS: Pictures without humAns for Self-Supervision. PASS only contains images with CC-BY license and complete attribution metadata, addressing the copyright issue. Most importantly, it contains no images of people at all, and also avoids other types of images that are problematic for data protection or ethics. We show that PASS can be used for pretraining with methods such as MoCo-v2, SwAV and DINO. In the transfer learning setting, it yields similar downstream performances to ImageNet pretraining even on tasks that involve humans, such as human pose estimation. PASS does not make existing datasets obsolete, as for instance it is insufficient for benchmarking. However, it shows that model pretraining is often possible while using safer data, and it also provides the basis for a more robust evaluation of pretraining methods.
