Data Science newsletter – March 5, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for March 5, 2020

Data Science News



Georgia Tech physicists unlock the secret to perfect wok-tossed fried rice

Ars Technica, Jennifer Ouellette


The trick is a timely combination of side-to-side and see-sawing motions.


UT announces two new degrees for Fall 2020

Toledo Blade, Bri'on Whiteside


Responding to the evolution of the workforce, the University of Toledo will debut two new undergraduate degrees in data science and analytics to address what university administrators say is rising demand for data-savvy professionals.

In fall 2020, UT will offer a bachelor of arts degree in data analytics in the College of Arts and Letters, and a bachelor of science degree in data science in the College of Natural Sciences and Mathematics.


Si2 group for AI and machine learning in chip design tools

eeNews Europe, Nick Flaherty


The Silicon Integration Initiative (Si2) has launched a special interest group to focus on the growing needs and opportunities in artificial intelligence (AI) and machine learning (ML) for electronic design automation.

The Si2 group will be chaired by researchers from IBM and Samsung and aims to identify current solutions and technology gaps in AI and ML strategies for EDA digital design.

“AI and ML are changing semiconductor design and improving performance and time to market,” said Leigh Anne Clevenger, Si2 design automation data scientist. “Based on member company interest, we expect the SIG to propose prototype projects to accelerate the development of standards in areas such as machine learning training, and data handling and sharing.”


Announcing the 2019 Google Faculty Research Award Recipients

Google AI Blog, Maggie Johnson and Negar Saei


This year we received 917 proposals from ~50 countries and over 330 universities, and had the opportunity to increase our investment in several research areas related to Health, Accessibility, AI for Social Good, and ML Fairness. All proposals went through an extensive review process involving 1100 expert reviewers across Google who assessed the proposals on merit, innovation, connection to Google’s products/services and alignment with our overall research philosophy.

As a result of these reviews, Google is funding 150 promising proposals across a wide range of research areas, from Machine Learning, Systems, Human Computer Interaction and many more, with 26% of the funding awarded to universities outside the United States. Additionally, 27% of our recipients this year identified as a historically underrepresented group within technology. This is just the beginning of a larger investment in underrepresented communities and we are looking forward to sharing our 2020 initiatives soon.


The Big Story: The Difference In Differential Privacy

AdExchanger


If advertising wants to keep using big data sets filled with consumer information, they’ll have to do a better job protecting them.

And differential privacy is one of the most promising weapons in the privacy-protecting arsenal.

This week on The Big Story, senior editor Allison Schiff dishes on differential privacy. It might sound like an arcane academic concept – which it is – but she’ll give you the elevator pitch in a way that anyone can understand. [audio, 27:41]


Federally protected lands reduce habitat loss and protect endangered species, study finds

Tufts University, Tufts Now


Using more than 30 years of earth satellite images, scientists at Tufts University and the non-profit conservation organization Defenders of Wildlife have discovered that habitat loss for imperiled species in the U.S. over this period was more than twice as great on non-protected private lands than on federally protected lands. As wildlife face a host of survival threats ranging from habitat destruction to global climate change, the study, published today in Frontiers in Ecology and the Environment, provides evidence that federal land protection and listing under the U.S. Endangered Species Act are effective tools for stemming losses in species habitat.


This Company Is Using Racially-Biased Algorithms to Select Jurors

VICE, Motherboard, Todd Feathers


Momus Analytics’ predictive scoring system is using race to grade potential jurors on vague qualities like “leadership” and “personal responsibility.”


New projects broaden the search for alien signals from space

Knowable Magazine, Tom Siegfried


Sure, after decades of listening, there is still no message. But with more data to sift through, and new technologies with superior search capabilities, odds of hearing from E.T. are rapidly improving. If the probability in the decade 2011–2021 were x percent, it’s going to be 1,000 times x in the following decade, says Andrew Siemion, director of the Berkeley SETI Research Center. (SETI stands for Search for Extra-Terrestrial Intelligence.)

The reason for E.T. optimism stems largely from several new projects in the works, enhanced with advanced methods for discerning an actual message hidden in the static of cosmic cacophony.

Siemion, speaking in Seattle on February 15 at the annual meeting of the American Association for the Advancement of Science, reported a new release of data from Breakthrough Listen, a major enterprise for recording radio signals from space. Now available for others to analyze, the data dump contains 2 petabytes of information (to store that much, you’d need 2,000 of today’s typical PCs with their puny 1 terabyte hard drives).


Civic Participation in a Datafied Society

YouTube, SAGE Ocean


“Drawing on the on-going collaborative project ‘Towards Democratic Auditing’ carried out by the Data Justice Lab, in this talk Lina Dencik will engage with the question of advancing civic participation in a context of rapid technological and social transformation. Lina will discuss experiments in new democratic practices to ensure legitimacy, transparency, accountability and intervention in relation to data-driven governance and outline emerging terrains for developing citizen agency in a datafied society.” [video, 39:20]


Building Hardware to Enable Continuous Data Protections

DARPA


The safety and security of critical information – whether it is sensitive intellectual property (IP), financial information, personally identifiable information (PII), intelligence insight, or beyond – is of vital importance. Conventional data encryption methods or cryptographic solutions, such as the Advanced Encryption Standard (AES), translate data into a secret “code” that can only be decoded by people with access to a decryption key. These methods protect data as it is transmitted across a network or at rest while in storage. Processing or computing on this data, however, requires that it first be decrypted, exposing it to numerous vulnerabilities and threats. Fully homomorphic encryption (FHE) offers a solution to this challenge. FHE enables computation on encrypted data, or ciphertext, rather than plaintext, or unencrypted data – essentially keeping data protected at all times. The benefits of FHE are significant, from enabling the use of untrusted networks to enhancing data privacy. Despite its potential, FHE requires enormous computation time to perform even simple operations, making it exceedingly impractical to implement with traditional processing hardware.
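
To make "computation on encrypted data" concrete, here is a minimal sketch using the Paillier cryptosystem. This is an illustration only, not the approach DARPA describes: Paillier is only additively homomorphic (it supports adding ciphertexts, not arbitrary computation as FHE does), and the primes, function names, and example values below are assumptions chosen for readability. The primes are far too small for real security.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two primes (assumed prime, tiny here)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


# Hypothetical example values: two numbers an untrusted server should never see.
n, priv = keygen(1009, 1013)
c1 = encrypt(n, 1200)
c2 = encrypt(n, 345)

# The server combines the ciphertexts without holding the private key.
c_sum = add_encrypted(n, c1, c2)

# Only the key holder can read the result.
assert decrypt(n, priv, c_sum) == 1545
```

The party doing the addition never sees 1200, 345, or 1545. Fully homomorphic schemes extend this idea to both addition and multiplication, which is what makes them general-purpose and also what makes them so expensive to compute.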


What artificial intelligence tells us about happiness and mental health

World Economic Forum, Darren Thackeray


Researchers from Canada’s University of Waterloo gathered more than 700,000 anonymous online journal entries written by over 67,000 users of a mobile mood tracking app. They then developed an AI computer model that could identify keywords within the text.


Top Computer Science Schools in 2020

ACM Careers, QS World University Rankings


In this year’s QS Subject Rankings for computer science, 104 of the 601 schools featured are situated in the United States, with five of these in the top 10 for computer science worldwide (the most of any country). Outside of the top 10, there are plenty of top computer science universities available, including Princeton University (11th), University of California, Los Angeles (UCLA, 15th), and University of Washington (18th). Three schools, Columbia University, Cornell University and New York University (NYU), are tied in 19th place worldwide.


Forget Math; Language Skills Are Stronger Predictor of Programming Ability

ACM Careers, University of Washington


New research from the University of Washington finds that a natural aptitude for learning languages is a stronger predictor of learning to program than basic math knowledge, or numeracy. That’s because writing code also involves learning a second language, an ability to learn that language’s vocabulary and grammar, and how they work together to communicate ideas and intentions. Other cognitive functions tied to both areas, such as problem solving and the use of working memory, also play key roles.

“Many barriers to programming, from prerequisite courses to stereotypes of what a good programmer looks like, are centered around the idea that programming relies heavily on math abilities, and that idea is not borne out in our data,” says Chantel Prat.


SF poised to pass Prop E, which could significantly reduce new supply of startup office space

TechCrunch, Lucas Matney


San Francisco is poised to pass a controversial proposition that would almost certainly limit further office space development in the city, perhaps pushing more tech companies and startups to set up their HQs elsewhere.

Prop E’s passing, which seemed likely Wednesday afternoon following Tuesday’s election, ties office development approval to the city’s ability to meet affordable housing goals, something that the city and its developers haven’t proven themselves all that capable of doing in recent years. Amid skyrocketing rents and a homeless crisis, there have been ample concerns that the structures in the city are being overstressed, low and moderate income residents are being pushed out and that the influx of tech startups is exacerbating the problem.


The Official Coronavirus Numbers Are Wrong, and Everyone Knows It

The Atlantic, Alexis C. Madrigal


We know, irrefutably, one thing about the coronavirus in the United States: The number of cases reported in every chart and table is far too low.

The data are untrustworthy because the processes we used to get them were flawed. The Centers for Disease Control and Prevention’s testing procedures missed the bulk of the cases. They focused exclusively on travelers, rather than testing more broadly, because that seemed like the best way to catch cases entering the country.

 
Deadlines



Startup Showcase: MinneWiADS – Women in Analytics & Data Science Conference

“MinneAnalytics is proud to announce MinneWiADS” … “The conference will include a Startup Showcase session with pitches from early-stage companies involved in analytics, AI, machine learning, and other emerging data technology that have at least one female or non-binary cofounder.”

Task – SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflection

“In this task, participants will create computational models that map a sequence of “graphemes”—characters—representing a word to a transcription of that word’s pronunciation.” Deadline for submissions is April 27.

Barclays Accelerator, powered by Techstars – New York

“The Barclays Accelerator, powered by Techstars, is a top-tier fintech accelerator for early stage startups. Every year we invest in 10 fintech companies and help them get their businesses to the next level.” Deadline for applications is May 10.

Announcing the 2020 Frictionless Data Tool Fund

“Open Knowledge Foundation is launching the second round of the Frictionless Data Tool Fund, a mini-grant scheme offering grants of $5,000 to support individuals or organisations in developing an open tool for reproducible science or research built using the Frictionless Data specifications and software. We welcome submissions of interest until 17th May 2020.”

DCASE2020 Challenge – IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events

“Sounds carry a large amount of information about our everyday environment and physical events that take place in it. We can perceive the sound scene we are within (busy street, office, etc.), and recognize individual sound sources (car passing by, footsteps, etc.). Developing signal processing methods to automatically extract this information has huge potential in several applications.” Deadline for challenge submissions is June 15.
 
Tools & Resources



New Open Source App: Data Science Education

datanami, George Leopold


“An open source project shepherded by the Linux Foundation aims to accelerate data science curricula while benefitting from the contributions of students and teachers. OpenDS4All is funded by IBM and is being developed by the University of Pennsylvania. The effort would give educators free access to information needed to develop data science coursework. In return, successful approaches would be folded back into what project promoters call ‘constantly evolving and improving’ curricula.”

“A starter ‘curriculum kit’ includes a set of open source building blocks that could be used to launch data science programs. Based on the Python programming language, the tools and frameworks include code, documentation and data sets, organizers said.”


Yoshua Bengio and Team Introduce GNN Benchmarking Framework

Synced


“A new study introduces a reproducible graph neural network (GNN) benchmarking framework to study and quantify the impact of theoretical developments for GNNs. In the field of analyzing and learning from data on graphs, GNNs have become an essential tool. With promising applications in different domains such as chemistry, physics, social sciences, knowledge graphs, recommendation, and neuroscience, how to study and build more powerful GNNs is a hot topic.”


[2002.12327] A Primer in BERTology: What we know about how BERT works

arXiv, Computer Science > Computation and Language; Anna Rogers, Olga Kovaleva, Anna Rumshisky


Transformer-based models are now widely used in NLP, but we still do not understand a lot about their inner workings. This paper describes what is known to date about the famous BERT model (Devlin et al. 2019), synthesizing over 40 analysis studies. We also provide an overview of the proposed modifications to the model and its training regime. We then outline the directions for further research.


Sno — Distributed version-control for geospatial and tabular data

Koordinates


“Sno stores geospatial and tabular data in Git, providing version control at the row and cell level.”


A brief introduction to fuzzing and why it’s an important tool for developers

Microsoft Research, Patrice Godefroid


“In the February 2020 issue of Communications of the ACM, Microsoft researcher Patrice Godefroid published a new review article entitled ‘Fuzzing: Hack, Art, and Science,’ which presents an overview of the main automated testing techniques in use today for finding security vulnerabilities in software.”

 
Careers


Full-time positions outside academia

Sr. Grants Management Specialist (Senior Programs Officer)



National Foundation on the Arts and the Humanities, Institute of Museum and Library Services; Washington, DC

Natural Resources Specialist (Monitoring Data Coordinator)



Department of the Interior, Bureau of Land Management; Lakewood, CO

Internships and other temporary positions

Congressional Innovation Scholars Program



TechCongress; Washington, DC
