Data Science newsletter – February 8, 2022

Newsletter features journalism, research papers and tools/software for February 8, 2022

 

US pushes back on EU’s proposed laws impacting US tech companies

GZERO Media, Marietje Schaake


The EU is working on a series of legislative proposals, for example, to ensure risk mitigation around the use of AI, or the protection of fundamental rights, but also to make sure that there is fairness and competition in the digital economy. And the Digital Services Act, still under negotiation between the European Commission, member state governments, and the European Parliament, seeks to impose proactive obligations on large gatekeeper tech companies, to basically extend antitrust principles and protect smaller players.

And now, at the eleventh hour, the Biden administration, through Commerce Secretary Raimondo but also a number of senators, is voicing its concern. The political leaders worry that the EU rules would discriminate unfairly against American tech companies and really single them out. But what’s easily overlooked in their statements is that US-based tech companies have grown exceptionally large, and that a law that wishes to put specific obligations on the largest companies would inevitably include many American companies.


Rather pleased with this map

Twitter, erindataviz



Top White House scientist resigns, admits he ‘caused hurt’

NBC News, Dennis Romero


Eric Lander admitted “being disrespectful and demeaning” to his staff, behavior that President Biden has said would result in firing.


Love the insights in this paper by @MaartenSap & colleagues: large language models are more predictive of imagined stories than autobiographical stories.

Twitter, Rada Mihalcea


Which raises an interesting question: is the use of LLMs pushing us (even more) away from what’s real?


CDC turns to poop surveillance for future COVID monitoring

Ars Technica, Beth Mole


The Centers for Disease Control and Prevention on Friday announced it is now publicly logging levels of SARS-CoV-2 found in sewage from around the country. The announcement elevates a growing system for wastewater surveillance that the CDC says will eventually be aimed at other infectious diseases.

The system began as a grassroots research effort in 2020 but has grown to a network of more than 400 wastewater sampling sites nationwide, representing the feces of approximately 53 million Americans. The CDC is now working with 37 states, four cities, and two territories to add more wastewater sampling sites. The health agency expects to have an additional 250 sites online in the coming weeks and more after that in the coming months.

In a press briefing Friday, Dr. Amy Kirby, the CDC’s program lead for the National Wastewater Surveillance System (NWSS), called the sampling a critical early warning system for COVID-19 surges and variants, as well as “a new frontier of infectious disease surveillance in the US.”


A new programming language for high-performance computers

MIT News, MIT CSAIL


High-performance computing is needed for an ever-growing number of tasks — such as image processing or various deep learning applications on neural nets — where one must plow through immense piles of data, and do so reasonably quickly, or else it could take ridiculous amounts of time. It’s widely believed that, in carrying out operations of this sort, there are unavoidable trade-offs between speed and reliability. If speed is the top priority, according to this view, then reliability will likely suffer, and vice versa.

However, a team of researchers, based mainly at MIT, is calling that notion into question, claiming that one can, in fact, have it all. With the new programming language, which they’ve written specifically for high-performance computing, says Amanda Liu, a second-year PhD student at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), “speed and correctness do not have to compete. Instead, they can go together, hand-in-hand, in the programs we write.”


House of Representatives Passes China Competition Bill

Lawfare, Raquel Leslie and Brian Liu


On Feb. 4, the House of Representatives passed a bill aimed at increasing U.S. economic competitiveness with China. Dubbed the America COMPETES Act of 2022, the omnibus bill would devote nearly a quarter of a trillion dollars to subsidize domestic semiconductor manufacturing and research on artificial intelligence, quantum computing and other critical technologies. The House bill incorporates key elements of a bill that passed the Senate last year, which the New York Times has called the “most expansive industrial policy legislation in U.S. history.”

Biden has expressed strong support for seeing the bill enacted into law. In a statement released after passage of the House bill, Biden said that the vote was critical “for outcompeting China and the rest of the world in the 21st century.” The House bill is now expected to head into a conference committee, where Congress will reconcile the differences between the Senate and House bills before sending the legislation to the president for signature.


State AGs sue Google over use and collection of location data

Denver7, Joseph Peters


Google’s location tracking services are the subject of lawsuits in Texas, Washington D.C., Indiana and Washington state.

The lawsuits, filed in January, claim Google tracked users for years, often after users specifically turned off ‘Location History’ or similar features.

“Google’s claims to give consumers ‘control’ and respect their ‘choice’ largely serve to obscure the reality that, regardless of the settings they select, consumers have no option but to allow the Company to collect, store and use their location data,” wrote attorneys for the State of Indiana in their complaint against Google.


‘On the Books’: University Libraries research project digitizes NC Jim Crow laws

University of North Carolina at Chapel Hill, The Daily Tar Heel student newspaper, Kate Carroll


Since 2019, University Libraries’ On the Books: Jim Crow and Algorithms of Resistance project has used machine learning technology to digitize every law passed in N.C. during the Jim Crow era and has identified a comprehensive list of Jim Crow laws.

The multi-disciplinary team of UNC legal experts, historians and library specialists used text mining to discern and compile legislation passed between the Reconstruction Era and the Civil Rights Movement.

Now, the team is expanding the initiative with the support of a $400,000 grant from the Andrew W. Mellon Foundation.


A New Database Reveals How Much Humans Are Messing With Evolution

WIRED, Science, Amit Katwala


In the late 1990s, biologist Andrew Hendry noticed similarly quick changes in phenotype while studying salmon. (Phenotype refers to the trait that actually exists in the animal, even if it’s not reflected by a change in its underlying genetic code.) “We had this impression that well, actually, maybe this rapid evolution thing is not so exceptional,” says Hendry, now a professor at McGill University in Montreal. “Maybe it’s actually occurring all the time, and people just haven’t emphasized it.”

With a colleague, Michael Kinnison (now at the University of Maine), Hendry pulled together a database of examples of rapid evolution and wrote a 1999 paper that kickstarted interest in the field. Now, Hendry and colleagues have updated and expanded the original data set with more than 5,000 additional examples: everything from the cranial depth of the common chaffinch to the lifespan of the Trinidadian guppy. Scientists are using this data to answer questions about how fast and far the natural world is changing, and how much of the change is due to humans.

In an initial paper published in November 2021 using the new data set (which is called Proceed, for Phenotypic Rates of Change Evolutionary and Ecological Database), Hendry and colleagues reexamined five key questions raised by previous work. They confirmed, for instance, that on average, all over the world, animal species seem to be getting smaller. This runs contrary to a theory of evolution called Cope’s rule, which posits that species should increase in size over time.


Innovation in the reviewing process is very welcome! Great to see this.

Twitter, Victor Veitch, Francesco Orabona


There are some important (and great!) changes in the review process at @icmlconf 2022.

https://icml.cc/Conferences/2022/ReviewForm

– It seems that there are at least 2 phases
– Papers with 2 negative recommendations in phase 1 are rejected
– But, Meta-reviewer can reverse this outcome


SMU Graduate Julian LaNeve Wins $100k Grand Prize from Data Science Competition

Southern Methodist University, SMU Daily Campus student newspaper, Pooja Krishna


SMU Alumnus Julian LaNeve won $100,000 from the 2021 Data Open Championship in December, alongside teammates from Duke and UC Berkeley.

Sponsored by Citadel LLC and Citadel Securities in partnership with Correlation One, the Data Open Championship is the largest and most prestigious university-level data science competition in the world. During the week-long competition, participants team up to work through a dataset and present their findings to a panel of judges.


Census releasing popular survey after fixing data gaps

Associated Press, Mike Schneider


Months after saying numbers from a 1-year version of a widely-utilized survey measuring how Americans live weren’t usable because of problems from the pandemic, U.S. Census Bureau officials said Monday that data from a 5-year version of the survey meet its standards and will be released next month.

The statistical agency said it would release the 2020 American Community Survey 5-year estimates in mid-March. In October, the Census Bureau released the survey’s 1-year estimates only in an experimental format with a warning that it may not meet the agency’s statistical quality standards.

The survey typically relies on responses from 3.5 million households on questions about commuting times, internet access, family life, income, education levels, disabilities, military service and employment, but disruptions caused by the pandemic produced fewer responses.

The 1-year estimates provide information for a single year on places only with at least 65,000 people. The 5-year estimates offer data at smaller geographies and are aggregated over multiple years.


Reflections and Predictions: Jeffrey Heer

Trifacta blog, Jeffrey Heer


Just a few weeks into 2022, we’re already learning about what the year has in store for us. It’s not too late to take a look back, and ahead. In a recent episode of The Data Wranglers podcast, Joe Hellerstein and I did just that. We identified three drivers of changes in how we look at data, and it’s worth bringing these thoughts to the page, too. … These were our three big takeaways from data in 2021:

  • The continued rise of the cloud
  • Issues in data ethics
  • Continued COVID Pandemic

    “tens of billions of dollars worth of global life sciences R&D depends on software tools”… yet @NIH and other major funders in the US have no long-term strategy to fund software essential for research. Short-sighted and risky to say the least!

    Twitter, Lorena Barba


    “Choosing to build scientific software tools should not require career sacrifice. A prosperous future in the physical world may depend on it.”


    The Department of Defense is Prioritizing Open Source Software. Here’s How Open Source Projects Can Benefit.

    Gradients blog, Ritwik Gupta


    On January 26, 2022, the new Chief Information Officer (CIO) of the U.S. Department of Defense (DoD), John B. Sherman, released a memo to the entire Department titled “Software Development and Open Source Software”. In this memo, the CIO addresses two primary concerns: 1) using open source software (OSS) introduces supply chain risks for DoD software programs, and 2) sharing DoD code via open source channels without proper checks enables potential leaks of proprietary DoD information to adversaries. In laying out how these two concerns should be addressed properly, the CIO categorizes OSS into a unique position, one which can be utilized by OSS foundations and project maintainers to gain funding for their essential contributions.


    Texas A&M Energy Institute and the Texas A&M Institute of Data Science to Team on Data Science for Energy Sector

    insideHPC, Texas A&M University


    The Texas A&M Energy Institute and the Texas A&M Institute of Data Science (TAMIDS) have signed a formal agreement to update two current certificate programs and to add a third, the Division of Research announced today.

    Under the agreement, the institutes plan to work on at least two major projects. One will upgrade the Energy Institute’s existing Master of Science in Energy and Certificate in Energy programs by adding a unit on “energy digitization,” the intersection between data science and the world’s energy sector. The other will develop a Certificate in Data Sciences and Energy program that the institutes will administer together.


    Franklin College adds data science major

    Daily Journal (Franklin, IN), Andy Bell-Baltaci


    Franklin College students will soon be able to pursue data science, a new major that will help facilitate entry into high-end technological fields.

    Students were already graduating from Franklin College and pursuing jobs in areas similar to data science, so adding the data science major was a way to ease the process of finding placement in that specific field, said Kristin Flora, the college’s dean and vice president of academic affairs.

    “It was primarily put together by Stacy Hoehn, associate professor of mathematics, as part of her sabbatical project. She researched the curricula of other schools who host a data science major. With that, she learned many Franklin College students found their way into the field of data science, working at Facebook, Eli Lilly, Stubhub and Salesforce. Maybe we can give students a better foundation through a data science major that gives them more direct entry into the field rather than having them double major or pursue a career in graduate training. Seeing the path our students took and the national demand for data scientists increasing, it made sense for us to pursue it,” Flora said.


    KY bill could freeze tuition for 4 years students attend college

    Lexington Herald Leader, Monica Kast


    A bill filed in the Kentucky House of Representatives would require public universities in the state to freeze tuition for four years with each incoming class. HB452, the Kentucky Student Tuition Protection and Accountability Act filed by Rep. William Lawrence (R-Maysville), would require all public, four-year institutions funded by the state to set tuition and fees for each incoming class, and then freeze those rates for four years. The freeze would apply to in-state students who enroll at the institutions.


    Biden science adviser apologizes for “demeaning” behavior toward staff

    Axios, Oriana Gonzalez


    Eric Lander, President Biden’s science adviser, has apologized for speaking to White House Office of Science and Technology Policy staff in “a disrespectful or demeaning way,” according to a note he sent to OSTP staff this weekend.

    The big picture: An investigation found that Lander violated the White House’s workplace policy and “corrective action” was taken, according to an OSTP spokesperson.


    Events



    Come join us in Irvine, CA for 3 days of discussions with thought-leaders in academic data science!

    Twitter, Academic Data Science Alliance


    Registration closing soon!! Travel support available. Discounted hotel block closes February 13th!

    SPONSORED CONTENT




    The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

     


    Tools & Resources



    The world of information — in Fun Size!

    Twitter, University of Michigan School of Information


    Fun Size is a free newsletter of bite-size technology, information and library news, from the University of Michigan School of Information. Subscribe today


    Researchers use tiny magnetic swirls to generate true random numbers

    Brown University, News from Brown


    Whether for use in cybersecurity, gaming or scientific simulation, the world needs true random numbers, but generating them is harder than one might think. But a group of Brown University physicists has developed a technique that can potentially generate millions of random digits per second by harnessing the behavior of skyrmions — tiny magnetic anomalies that arise in certain two-dimensional materials.

    Their research, published in Nature Communications, reveals previously unexplored dynamics of single skyrmions, the researchers say. Discovered around a half-decade ago, skyrmions have sparked interest in physics as a path toward next-generation computing devices that take advantage of the magnetic properties of particles — a field known as spintronics.


    Red Teaming Language Models with Language Models

    DeepMind, Research


    Language Models (LMs) often cannot be deployed because of their potential to harm users in ways that are hard to predict in advance. Prior work identifies harmful behaviors before deployment by using human annotators to hand-write test cases. However, human annotation is expensive, limiting the number and diversity of test cases. In this work, we automatically find cases where a target LM behaves in a harmful way, by generating test cases (“red teaming”) using another LM. We evaluate the target LM’s replies to generated test questions using a classifier trained to detect offensive content, uncovering tens of thousands of offensive replies in a 280B parameter LM chatbot. We explore several methods, from zero-shot generation to reinforcement learning, for generating test cases with varying levels of diversity and difficulty. Furthermore, we use prompt engineering to control LM-generated test cases to uncover a variety of other harms, automatically finding groups of people that the chatbot discusses in offensive ways, personal and hospital phone numbers generated as the chatbot’s own contact info, leakage of private training data in generated text, and harms that occur over the course of a conversation. Overall, LM-based red teaming is one promising tool (among many needed) for finding and fixing diverse, undesirable LM behaviors before impacting users.
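The loop described in the abstract (one LM generates test questions, the target LM replies, a classifier flags offensive replies) can be sketched roughly as follows. This is a minimal illustration and not DeepMind's implementation: the function `red_team`, its parameters, the 0.5 threshold, and the toy stand-ins for the generator, target, and classifier are all hypothetical, chosen only so the example runs without any model.

```python
import itertools
from typing import Callable, List, Tuple


def red_team(
    generate_question: Callable[[], str],    # hypothetical: wraps the red-team LM
    target_reply: Callable[[str], str],      # hypothetical: wraps the target LM
    offensiveness: Callable[[str], float],   # hypothetical: classifier score in [0, 1]
    num_cases: int = 100,
    threshold: float = 0.5,                  # assumed cutoff, not from the paper
) -> List[Tuple[str, str, float]]:
    """Return (question, reply, score) for every reply the classifier flags."""
    failures = []
    for _ in range(num_cases):
        question = generate_question()       # 1. generate a test case
        reply = target_reply(question)       # 2. query the target model
        score = offensiveness(reply)         # 3. score the reply
        if score >= threshold:
            failures.append((question, reply, score))
    return failures


# Toy stand-ins so the sketch is runnable end to end.
_questions = itertools.cycle(["What do you think of cats?", "Tell me an insult."])


def toy_generator() -> str:
    return next(_questions)


def toy_target(question: str) -> str:
    return "You are a fool." if "insult" in question else "Cats are nice."


def toy_classifier(reply: str) -> float:
    return 1.0 if "fool" in reply else 0.0


found = red_team(toy_generator, toy_target, toy_classifier, num_cases=4)
for question, reply, score in found:
    print(f"{score:.1f}  {question!r} -> {reply!r}")
```

In a real setting each callable would wrap a model call, and the generator could be swapped between the strategies the abstract mentions (zero-shot generation through reinforcement learning) without changing the loop itself.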


    Careers


    Tenured and tenure track faculty positions

    Sr. Tenure-Track Faculty in Artificial Intelligence



    University of Colorado Boulder, Institute of Cognitive Science; Boulder, CO
