Data Science newsletter – January 7, 2022

Newsletter features journalism, research papers and tools/software for January 7, 2022

 

Terra takes the pain out of ‘omics’ computing in the cloud

Nature, Technology Feature, Jeffrey M. Perkel


Advances in DNA sequencing mean it’s much easier to collect genetic data than it is to analyse them. As data sets have ballooned, [Elinor] Karlsson says, “We went from it being inconvenient to copy data to my server, to being cost-prohibitive. Institutions didn’t have the resources to keep up.”

So the project turned to a system that could: Terra.

Terra began life as a service called FireCloud. It was developed by the Data Sciences Platform (DSP) team at the Broad Institute of MIT and Harvard in Cambridge, Massachusetts, in collaboration with Microsoft and Verily Life Sciences — a subsidiary of Google’s parent company, Alphabet. In 2019 it was renamed and folded into the National Human Genome Research Institute’s Analysis, Visualization and Informatics Lab-Space (AnVIL). Terra’s web-based interface provides user-friendly access to scalable workflows, collaboration and analysis tools. It allows researchers to quickly integrate and computationally analyse vast ‘omics’ data sets on the Google Cloud Platform.


Where’s Today’s Beethoven?

Cold Takes blog, Holden Karnofsky


The broad theme is that across a variety of areas in both art and science, we see a form of “innovation stagnation”: the best-regarded figures are disproportionately from long ago, and our era seems to “punch below its weight” when considering the rise in population, education, etc. Since the patterns look fairly similar for art and science, and both are forms of innovation, I think it’s worth thinking about potential common factors.

Below, I will:

  • List the three main hypotheses people offer to answer “Where’s Today’s Beethoven?”: the “golden age” hypothesis (people in the past were better at innovation), the “bad taste” hypothesis (Beethoven and others don’t deserve their reputations), and the “innovation as mining” hypothesis (ideas naturally get harder to find over time, and we should expect art and science to keep slowing down by default). Importantly, I think each of these has interesting and not-widely-accepted implications of its own.
  • Examine systematic data on trends in innovation in a number of domains, bringing together (a) long-run data on both art and science over hundreds of years and more; (b) recent data on technology and more modern art/entertainment genres (film, rock music, TV shows, video games). I think this is the first piece to look at this broad a set of trends of this form.

    American businesses need to invest in tech education

    Fortune, Rose Stuckey Kirk


    Computer and mathematical occupations are expected to see fast employment growth “as strong demand is expected for IT security and software development, in part due to increased prevalence of telework spurred by the COVID-19 pandemic,” according to a recent report from the Bureau of Labor Statistics.

    Meanwhile, the landmark $1.2 trillion infrastructure bill includes $65 billion to expand broadband access, especially to bridge the digital divide in America and reach people and places too long left behind.

    U.S. businesses can leverage this national investment in digital access to build the 21st-century digital-savvy workforce we urgently need by investing in today’s youth, including those who live in often overlooked and underrepresented areas.


    Long-Reads and Powerful Algorithms Identify “Invisible” Microbes

    University of California, San Diego, UC San Diego News


    Microbes are everywhere – in our guts, on our skin, permeating the environments around us. Studying these microbial communities has delivered tremendous insights into disease and good health, but identifying all the distinct species in a sample can be challenging.

    Now, a study by an international research team has shown that highly accurate, long-read genomic sequencing technology (HiFi) can shine a light on this previously hidden biology.

    Researchers at the University of California San Diego Department of Computer Science and Engineering, the U.S. Department of Agriculture, the biotechnology company Pacific Biosciences and labs in Russia, Israel and the Netherlands have shown that HiFi, combined with advanced algorithms, can differentiate between nearly identical organisms, allowing researchers to more completely catalogue microbial communities.


    System recognizes hand gestures to expand computer input on a keyboard

    University of Waterloo (Canada), Waterloo News


    Researchers are developing a new technology that uses hand gestures to carry out commands on computers.

    The prototype, called “Typealike,” works through a regular laptop webcam with a simple affixed mirror. The program recognizes the user’s hands beside or near the keyboard and prompts operations based on different hand positions.

    A user could, for example, place their right hand with the thumb pointing up beside the keyboard, and the program would recognize this as a signal to increase the volume. Different gestures and different combinations of gestures can be programmed to carry out a wide range of operations.


    College Admissions in an Era of Uncertainty

    Public Policy Institute of California, Niu Gao and Hans Johnson


    In spring 2020, the University of California (UC) and the California State University (CSU) announced new admissions policies to address some of the academic effects of the pandemic. Many high schools switched from letter grades to pass/no pass or course credit systems during the pandemic; those non-letter grades will now satisfy A–G requirements during the 2019–20 and 2020–21 school years but will not be calculated in GPAs. UC will continue to calculate GPAs using letter grades earned in grades 10 and 11.

    In addition, UC eliminated the use of standardized tests such as the SAT and ACT for admission decisions, and expanded the number of courses that meet math requirements to include advanced courses such as introduction to data science and statistics. CSU suspended the use of ACT/SAT scores for the fall 2021, winter 2022, and spring 2022 admission cycles.

    Early evidence suggests that these policy changes have had a mostly positive impact.


    The White House OSTP is seeking information on how to improve community health and health equity through digital health technologies

    Twitter, Georgia Tech Institute for People and Technology


    Visit http://federalregister.gov/d/2021-28193 and submit a formal comment to guide future policies in this area.


    Seeing the plasma edge of fusion experiments in new ways with artificial intelligence

    MIT News, Plasma Science and Fusion Center


    To better understand edge conditions, scientists focus on modeling turbulence at this boundary using numerical simulations that will help predict the plasma’s behavior. However, “first principles” simulations of this region are among the most challenging and time-consuming computations in fusion research. Progress could be accelerated if researchers could develop “reduced” computer models that run much faster, but with quantified levels of accuracy.

    For decades, tokamak physicists have regularly used a reduced “two-fluid theory” rather than higher-fidelity models to simulate boundary plasmas in experiment, despite uncertainty about accuracy. In a pair of recent publications, [Abhilash] Mathews begins directly testing the accuracy of this reduced plasma turbulence model in a new way: he combines physics with machine learning.


    WTF is automatic content recognition?

    Digiday, Tim Peterson


    Your TV is listening. And watching.

    Whether a TV is listening or watching along with a viewer depends on whether it’s a smart TV, but an increasing share of TV watch time is happening on smart TVs. In the third quarter of 2021, the amount of time people spent watching video programming on a smart TV increased by 64% year over year, according to video measurement and analytics firm Conviva. And by 2026, 51% of households worldwide are expected to own a smart TV, according to consulting firm Strategy Analytics.

    That means there is a growing amount of TV viewership data being accumulated via a technological process called automatic content recognition, which companies including LG, Roku, Samsung and Vizio enable on their smart TVs or the smart TVs they power.


    Alteryx Announces Acquisition of Trifacta

    Alteryx


    “We’re incredibly excited to join forces with Alteryx to create the industry’s leading independent cloud analytics provider,” said Adam Wilson, CEO of Trifacta. “Together, we have the opportunity to enable thousands of customers globally to unlock powerful business insights with the combination of Trifacta’s Data Engineering Cloud and Alteryx’s Analytics Automation platform.”

    Alteryx will acquire Trifacta for $400 million in cash, subject to customary purchase price adjustments.


    HKU launches Institute of Data Science with HK$150 million donation from The Musketeers Foundation

    The University of Hong Kong, Media


    The University of Hong Kong (HKU) received a HK$150 million donation from The Musketeers Education and Culture Charitable Foundation (the Foundation) to establish the “Musketeers Foundation Institute of Data Science” (the Institute), marking a major milestone in the University’s innovation and technology development. The Institute will be anchored at the Tech Landmark, headquarters of the HKU InnoTech initiative.

    The Institute will be the command centre based at the HKU Main Campus with branches, satellite centres, labs, and institutes in the Mainland, in particular in the Greater Bay Area and the Lok Ma Chau Loop, to facilitate collaborations with local partners in industry, business, government sectors and other research institutions where data is collected. Our goal is to establish a premier Institute, and attract and nurture the best talents from across the globe.


    The AI Forecaster: Machine Learning Takes on Weather Prediction

    Eos, Aaron Sidder


    Although weather prediction keeps improving year to year for shorter-term forecasts, forecast skill decreases in the 2-week to 2-month time frame. These longer-timescale forecasts can play a critical role for many sectors, including water conservation, energy demand, and disaster preparedness.

    In a new study, Weyn et al. set out to improve this subseasonal to seasonal forecasting, as it is known, using a novel approach to weather prediction. Using a convolutional neural network, the authors developed a machine learning weather prediction system called Deep Learning Weather Prediction (DLWP). The model is trained on past weather data, which differs from standard numerical weather prediction models that create mathematical representations of physical laws. DLWP projects 2–6 weeks into the future for the entire globe.

    The authors compared the DLWP model to current state-of-the-art numerical weather models. The evaluation showed that the standard numerical forecasts perform better for short lead times, for instance, 2–3 weeks out. However, the DLWP model compared well when projecting 4–6 weeks down the line.


    Gen Z are not ‘coddled.’ They are highly collaborative, self-reliant and pragmatic, according to new Stanford-affiliated research

    Stanford University, Stanford News


    Generation Z – also known as Gen Z, iGen or postmillennial – is a highly collaborative cohort that cares deeply about others and has a pragmatic attitude about how to address a set of inherited issues like climate change, according to research by Roberta Katz, a senior research scholar at Stanford’s Center for Advanced Study in the Behavioral Sciences (CASBS).

    Since 2017, Katz, along with her co-authors, Sarah Ogilvie, a linguist at the University of Oxford and formerly at Stanford; Jane Shaw, a historian who is the principal of Harris Manchester College at Oxford and was previously dean for Religious Life at Stanford; and Linda Woodhead, a sociologist at King’s College London, collaborated as part of a multi-year CASBS research project to better understand a generation who, born between the mid-1990s and around 2010, grew up with digital tools always at their fingertips.


    OSTP Launches AI Researchers Portal

    Computing Community Consortium, The CCC Blog, Maddy Hunter


    The Office of Science and Technology Policy’s (OSTP) AI.gov just launched the AI Researchers Portal. This portal was established by the National AI Initiative Office, in partnership with Federal departments and agencies and the Networking and Information Technology Research and Development National Coordination Office. The platform provides a central location for researchers to explore Federal AI resources. This exciting new resource will make it easier for new and experienced researchers to explore what is available to those working on AI innovations and research.


    Interesting Insights into How Endowments Invest

    alpha architect, Elisabetta Basilico


    Endowments have distinctive characteristics: they are truly long-term investors and they don’t have to pay taxes. One of the first-ever articles on endowment investing goes back to the mid-’50s with Carpenter (1956), and in 1974 Tobin defined endowments as “investment funds that aim to meet the needs of their beneficiaries over multiple generations and to adhere to the principle of intergenerational equity.” Despite their popularity and the ease of access to university-based endowments, there is little in the academic literature about the history of endowment investing and how endowments invest. In this article, the authors aim to fill this gap.


    I’m currently reading my 5th or 6th article today that includes in it some form of looking back on the pandemic and its lessons. *Every* *single* *one* includes a variation of one or more of the following from scientists, public health agency heads, journ

    Twitter, Tara Haelle


    —“I didn’t expect so much public loss of trust”
    —“I didn’t expect political leaders to eschew public health advice bc of ratings/donors/etc”
    —“I didn’t expect big swaths of public opposition to vaccines/masks/etc”
    —“I didn’t expect ppl to ignore public health recs” 2/x


    Events



    Panel: Data Science Education | Leading the conversation and the democratization of data science

    Harvard Data Science Institute


    Online January 18, starting at 12 p.m. Eastern. [rsvp required]


    Deadlines



    We Robot 2022 Call for Papers

    “We invite submissions for the 11th annual robotics law and policy conference—We Robot—to be held at the University of Washington in Seattle, Washington, on September 14-16, 2022. Currently we are planning for an in-person event.” Deadline for abstracts submissions is March 7.

    SPONSORED CONTENT


    The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

     


    Tools & Resources



    New publication: Data Science in the Library

    Research Information, Library News


    Data Science in the Library: Tools and Strategies for Supporting Data-Driven Research and Instruction, from Facet Publishing, brings together an international group of librarians and faculty to consider the opportunities afforded by data science for research libraries. Using practical examples, each chapter focuses on data science instruction, reproducible research, establishing data science services and key data science partnerships.


    Thoughts on Design Projects in Visualization Courses

    As I See It blog, John Stasko


    A big part of our graduate visualization courses has been a semester-long group design project in which a team of 2-4 students picks a topic and data set(s), then designs and creates an interactive visualization of that topic. In our CS 7450—InfoVis course in the past, students learned D3 and used it to build their visualizations. Here are some of the top projects from the Fall 2019 version of the course as examples.

    Recently, we started teaching a new graduate course, CS 6730-Data Visualization: Principles and Applications. This course now serves as our introductory class on data visualization for graduate students. We sought to make it accessible to students from all units on campus, so there are no prerequisites. Also, instead of web programming with D3, students learn and use various visualization applications such as Datawrapper, RAWGraphs, and especially, Tableau. For their semester design project, most groups used Tableau, but a few used other tools such as Figma and infogram.


    Best Paper Awards in Computer Science

    Jeff Huang


    This is a collection of best paper awards from the main conferences in each computer science subfield, starting from 1996. This data was entered by hand from sources found online (many of them no longer available), so please email bestpaper@jeffhuang.com if you notice any errors or omissions.
