from MIT Technology Review, Karen Hao and Jonathan Stray
Last year, just as I was beginning to cover artificial intelligence, the AI world was getting a major wake-up call. There were some incredible advancements in AI research in 2018—from reinforcement learning to generative adversarial networks (GANs) to better natural-language understanding. But the year also saw several high-profile illustrations of the harm these systems can cause when they are deployed too hastily.
A Tesla crashed on Autopilot, killing the driver, and a self-driving Uber crashed, killing a pedestrian. Commercial face recognition systems performed terribly in audits on dark-skinned people, but tech giants continued to peddle them anyway.
Police forces, hospitals and councils struggle to understand how to use artificial intelligence because of a lack of clear ethical guidance from the government, according to the country’s only surveillance regulator.
The surveillance camera commissioner, Tony Porter, said he received requests for guidance all the time from public bodies which do not know where the limits lie when it comes to the use of facial, biometric and lip-reading technology.
“Facial recognition technology is now being sold as standard in CCTV systems, for example, so hospitals are having to work out if they should use it,” Porter said. “Police are increasingly wearing body cameras. What are the appropriate limits for their use?
“The problem is that there is insufficient guidance for public bodies to know what is appropriate and what is not, and the public have no idea what is going on because there is no real transparency.”
Facebook is creating a new pilot program in the U.S. that will leverage part-time contracted “community reviewers” to expedite its fact-checking process.
The big picture: The community reviewers will help to corroborate or debunk stories that Facebook’s machine learning tools flag as potential misinformation. This will make it easier for Facebook’s fact-checking partners to quickly debunk false claims.
Drawing on the university’s strengths in bioinformatics, data science, and machine learning, the program will include a unique focus on computational neuroscience.
AI is no longer poised to change the world someday; it’s changing the world now. As we begin a new year and decade, VentureBeat turned to some of the keenest minds in AI to revisit progress made in 2019 and look ahead to how machine learning will mature in 2020. We spoke with PyTorch creator Soumith Chintala, University of California professor Celeste Kidd, Google AI chief Jeff Dean, Nvidia director of machine learning research Anima Anandkumar, and IBM Research director Dario Gil.
In this paper we propose a novel image representation called face X-ray for detecting forgery in face images. The face X-ray of an input face image is a greyscale image that reveals whether the input image can be decomposed into the blending of two images from different sources. It does so by showing the blending boundary for a forged image and the absence of blending for a real image. We observe that most existing face manipulation methods share a common step: blending the altered face into an existing background image. For this reason, face X-ray provides an effective way for detecting forgery generated by most existing face manipulation algorithms. Face X-ray is general in the sense that it only assumes the existence of a blending step and does not rely on any knowledge of the artifacts associated with a specific face manipulation technique. Indeed, the algorithm for computing face X-ray can be trained without fake images generated by any of the state-of-the-art face manipulation methods. Extensive experiments show that face X-ray remains effective when applied to forgery generated by unseen face manipulation techniques, while most existing face forgery detection algorithms experience a significant performance drop.
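The core idea above can be sketched in a few lines. One common formulation for a face X-ray (an assumption here; the paper's full pipeline involves learned detection, not just this formula) derives it from the soft blending mask M used to composite the altered face onto the background: B = 4·M·(1−M), which is zero wherever a pixel comes entirely from a single source image and peaks along the blending boundary where M ≈ 0.5.

```python
import numpy as np

def face_xray(mask):
    """Sketch of a face X-ray from a soft blending mask M in [0, 1].

    B = 4 * M * (1 - M): 0 where a pixel comes entirely from one
    source image, maximal (1.0) on the blending boundary (M = 0.5).
    """
    mask = np.asarray(mask, dtype=float)
    return 4.0 * mask * (1.0 - mask)

# A purely real image corresponds to a constant mask (all 0 or all 1),
# so its face X-ray is all zeros -- the "absence of blending".
real_xray = face_xray(np.zeros((4, 4)))

# A forged image has a soft transition between the two sources;
# the X-ray lights up exactly along that transition.
m = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
xray = face_xray(m)  # peaks where m crosses 0.5
```

Because this signal depends only on the existence of a blending step, a detector trained to predict it need not see fakes from any particular manipulation method, which is the source of the generality the paper claims.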
Some educators are concerned that, despite this technology being intended to help students, it may also be taking opportunities away from them.
“I always say higher education is a time of exploration, where you start to figure out who you want to become for the later period of your life,” Kyle Jones, an assistant professor at Indiana University, told American Public Media. But with predictive algorithms being used to help students navigate their education, Jones argues that data can steer students too narrowly, limiting them from exploring opportunities for which they may appear less qualified.
The scornful comments of “reviewer 2” have become a running joke in academia. But a new artificial intelligence system – dubbed a “TripAdvisor for peer review” – may soon be able to test whether the scathing remarks of anonymous referees are being handed out fairly or not.
Amid concerns that peer reviewers are harsher in their criticisms of female researchers or those from less prestigious institutions, PeerJudge scans peer review reports for keywords – either positive or negative – to see whether reviewers are unduly tough when assessing certain types of researchers.
The program, which was created by technology company F1000 and researchers at the University of Wolverhampton, also checks whether reviewer comments correspond with the final recommendation to accept or reject the paper – an area of grievance for researchers when broadly positive comments on their manuscripts are followed by a call to reject.
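The two checks described above can be illustrated with a small sketch. Everything here is an illustrative assumption, not PeerJudge's actual lexicon or scoring: a crude keyword-based tone score, plus a flag when the tone of the comments disagrees with the accept/reject recommendation.

```python
import re

# Hypothetical keyword lists -- stand-ins for whatever lexicon a tool
# like PeerJudge actually uses.
POSITIVE = {"novel", "rigorous", "clear", "convincing", "important"}
NEGATIVE = {"flawed", "unconvincing", "weak", "incremental", "unclear"}

def tone_score(report: str) -> int:
    """Positive minus negative keyword count: a crude sentiment proxy."""
    words = re.findall(r"[a-z]+", report.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mismatch(report: str, recommendation: str) -> bool:
    """Flag broadly positive comments followed by a call to reject,
    or harshly negative comments followed by an accept."""
    score = tone_score(report)
    return (score > 0 and recommendation == "reject") or \
           (score < 0 and recommendation == "accept")

print(mismatch("A novel, rigorous and convincing study.", "reject"))  # True
```

Aggregating mismatch rates across reviewer or author demographics is then what would let such a tool test whether certain groups are being judged unduly harshly.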
Russia has begun testing a national internet system that would function as an alternative to the broader web, according to local news reports. Exactly what stage the country has reached is unclear, but certainly the goal of a resilient — and perhaps more easily controlled — internet is being pursued.
The internet, of course, is made up of a global web of infrastructure that must interface physically, virtually and, increasingly, politically with the countries to which it connects. Some countries, like China, have opted to very carefully regulate that interface, controlling which websites, apps and services can be accessed from the local side of that interface.
Russia has increasingly leaned toward that approach, with President Putin signing a law earlier this year establishing the so-called Runet, which would build the infrastructure necessary to maintain, essentially, a separate internal internet should such a thing become necessary (or convenient).
Virtu Financial Inc. complained to regulators of “arbitrary and opportunistic” price increases by the London Stock Exchange, after fees it pays for market data jumped 36%. German trade association BVI fumed over hikes of 199% by Deutsche Boerse.
Now Europe’s market watchdog has reached the same conclusion: Two years after being implemented, the rules overhaul known as MiFID II has failed to bring down the costs of market data. Nor has it made it easier for brokers and fund managers to figure out what they’re paying for.
For my first work-related tweet of 2020, I am totally bummed to announce that we have retracted last year’s paper on enzymatic synthesis of beta-lactams. The work has not been reproducible.
NASA’s upcoming Wide Field Infrared Survey Telescope (WFIRST), scheduled for launch in the mid-2020s, will have the power to survey the sky 1,000 times faster than the Hubble Space Telescope, with Hubble-quality detail, in the near-infrared.
A simulated image of a 34,000-light-year swath across our neighboring galaxy Andromeda showcases WFIRST’s unique detector configuration, expansive field of view and high resolution. The image was generated using data collected by Hubble, and shows the red and infrared light of more than 50 million individual stars in Andromeda, as they would appear with WFIRST.
WFIRST is designed to address key questions across a wide range of topics, including dark energy, exoplanets, and general astrophysics spanning from our solar system to the most distant galaxies in the observable universe. WFIRST is expected to amass more than 4 petabytes of information per year, all of which will be non-proprietary and immediately accessible to the public.
New York, NY February 7-12. “The Thirty-Second Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-20) will focus on successful applications that showcase novel uses of AI technology.” [$$$$]
“The National Heart, Lung, and Blood Institute recently announced the BioData Catalyst Fellows Program. This program will provide researchers the opportunity to receive funding to help support research on novel and innovative data science and data-focused research programs. Preference for this one-year position will be given to early career researchers – graduate students, postdocs, and junior faculty.” Deadline to apply is January 17.
“The objective is to estimate yield performance of the cross between inbred and tester combinations in a given holdout set. Specifically, we are asking for the mean yield performance of each inbred by tester combination in the holdout set.” Deadline for entries is January 21.
National Institute of Allergy and Infectious Disease “has established the Emerging Leaders in Data Science Fellowship Program to develop a cadre of talented data scientists who have a strong interest in applying their bioinformatics and data science expertise to advance research on infectious and immune-mediated diseases. This fellowship is managed by NIAID’s Office of Data Science and Emerging Technologies.” Deadline to apply is February 8.
Eden Prairie, MN May 15. “MinneWiADS brings together leaders, experts, and interested individuals in analytics and data science, especially those who identify female and non-binary, to share knowledge, to make connections, and to inspire one another.” Deadline for speaker proposals is March 1.
A research team has developed an open-source data-management system that the scientists hope will solve these data-sharing problems. The researchers outlined their system today in the journal PLOS ONE.
“We wanted to create a file format and a dataset model that would encapsulate the majority of datasets we work on, on all the instruments in a lab,” said Philip Grandinetti, professor of chemistry at The Ohio State University and senior author of the paper. “There’s this long-standing problem, pervasive among scientists, that you buy a multimillion-dollar instrument and the companies that make that instrument have their own proprietary format, and it’s a nightmare to share with anyone else.”
As data scientists, we have a top priority: delivering accurate insights. If you’re like me, a data scientist who’s also working on search (or any real-time application for that matter), you’ve got to manage two competing priorities that sometimes butt heads: accuracy and speed.
I’ll walk you through some of the things I’ve learned through trial and error as a data scientist in search, including tips on how to smooth out some of the friction that can pop up when you’re building a tool that demands cross-collaboration with other domain disciplines.