Data Science newsletter – September 3, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for September 3, 2019


 
 
Data Science News



Text Sophistication and Sophisticated Investors

SSRN; Juha Joenväärä, Jari Karppinen, Melvyn Teo, Cristian Ioan Tiu



We show that two novel measures of text sophistication, applied to hedge fund strategy descriptions, encapsulate incremental information about funds. Consistent with the linguistics literature, hedge funds with lexically diverse strategy descriptions outperform, eschew tail risk, and encounter fewer regulatory problems. In line with the literature, hedge funds with syntactically complex strategy descriptions report more regulatory violations and trigger more severe infractions. Fund investors recognize the dichotomy and direct flows accordingly, but not enough to erode away the alphas of lexically diverse funds. Our findings suggest that text sophistication measures provide texture on the cognitive ability and trustworthiness of sophisticated investors.


Taking Machine-Learning Models in Health Care from Concept to Bedside

University of Michigan Precision Health



Machine learning—an application of artificial intelligence in which large amounts of data are processed, patterns are identified, and the information then applied to help answer real-life questions—holds enormous promise for precision health applications. But implementing machine learning in clinical healthcare has been challenging. As the authors of a recent Nature Medicine article point out, “The potential impact of machine learning (ML) in health care warrants genuine enthusiasm, but its limited adoption in clinical care to date indicates that many of the current strategies are far from optimal.”

The authors provide an overview of common challenges to implementing ML in a health-care setting, and describe the necessity of breaking down the silos in ML and of engaging an interdisciplinary team of collaborators “from beginning (problem formulation) to end (widespread deployment)” of the ML process.


Amazon fires: What we know and what we can do

Earth Innovation Institute, Daniel Nepstad



The fires now burning in Brazil’s Amazon, which have prompted international outcry, are a classic example of a “chronic emergency”. Brazilian President Jair Bolsonaro, after casting blame for the fires on international NGOs, acknowledged his administration is ill-equipped to battle the flames and soon after called in Brazil’s armed services to assist in that fight. But to truly address the situation, what is needed is a systemic, long-term strategy for prevention that feeds into a broader regional development plan. The current focus on Amazon fire, within Brazil and internationally, opens an opportunity to make that shift and to postpone or even avoid what scientists refer to as the Amazon forest “tipping point”, whereby the forest is no longer able to sustain itself.

This blog post draws on 30 years of research on Amazon fire, including the world’s largest tropical forest fire experiment, to propose eight recommendations for Brazil and the international community to fight fires in the short term and prevent them in the long term.


More Math for Admission

Inside Higher Ed, Greta Anderson



California State University System weighs adding a quantitative reasoning requirement that advocates for underrepresented students worry could worsen gaps in access.


In a Data Literacy Crisis, Librarians Could Be the Experts We Need

Fortune, Gwen Moran



Improving data literacy is going to require work from schools, employers, and individuals, Barrow says. The onus is on everyone to get better with information. While they’re often time- and resource-strapped, teachers can introduce more nonfiction texts with statistics and data points into their curricula and discuss how to understand, question, and investigate numbers to see the context and potential flaws, Fontichiaro suggests. Employers can invest in more training for workers to better understand data, Barrow adds.

The good news is that a hard-working legion of data literacy experts is working on equipping a new army of information warriors in the data literacy fight: librarians.

Berson-Michelson and Fontichiaro collaborated on a data literacy project for students and librarians. The project explained basic statistics, as well as concepts such as how to evaluate source quality, logical fallacies, and emotionally evocative language. It helps librarians and others spot bias and bad info, helping them fine-tune their “BS meter,” Berson-Michelson says.


RecSys 2019 — sold out at record pace

Medium, Alan Said



If you are reading this, you are likely aware of the fact that the 13th ACM Conference on Recommender Systems (RecSys 2019) is sold out. It sold out prior to the early bird registration deadline. This is a short description of how that happened, and how unprepared we were for it happening.


What Sci-Fi Can Teach Computer Science About Ethics

WIRED, Culture, Gregory Barber



Schools are adding ethics classes to their computer-science curricula. The reading assignments: science fiction.


Governor Gavin Newsom Announces Members of the Future of Work Commission

California Governor Gavin Newsom



Governor Gavin Newsom today announced members of his Future of Work Commission, which include prominent leaders from technology, labor, business, education and other sectors across the state. The Commission will be co-chaired by James Manyika, chairman and director of the McKinsey Global Institute, and Mary Kay Henry, president of the Service Employees International Union, and will be guided by senior members of the Governor’s team, including Secretary of the Labor and Workforce Development Agency Julie Su, Chief Economic and Business Advisor Lenny Mendonca, and Senior Policy Advisor Lande Ajose.


Sinan Aral on the Threats of Social Media Manipulation on Elections

Medium, MIT Initiative on the Digital Economy



In an August 30 article in Science, MIT IDE lead Sinan Aral and Assistant Professor Dean Eckles write about the critical issue of “Protecting Elections from Social Media Manipulation.” They present new research on measuring manipulation of elections, which may be one of the biggest threats to democracy in the modern era.

In the article, they advocate a four-point research agenda for measuring social media manipulation, highlight underutilized approaches to rigorous causal inference, and discuss political, legal, and ethical implications of undertaking such analysis. They also describe “analysis of the indirect, systemic effects of social media manipulation on campaign messaging and the news cycle.”


UK Biobank Gets Geneticists To Cooperate, Not Compete

NPR, Shots blog, Richard Harris



There’s an astonishing outpouring of new information linking genes and health, thanks to the efforts of humble Englishmen and women such as Christopher Fletcher. The 70-year-old man recently drove 90 miles from his home in Nottingham to a radiology clinic outside the city of Manchester.

He is one of half a million Brits who have donated time, blood and access to their medical records to a remarkable resource called UK Biobank. The biobank, in turn, has become a resource for more than a thousand scientists around the world who are interested in delving into the link between genes, behaviors and health.

Popularity of the resource is snowballing. Just this week, a major study using the data explored the genetics of same-sex sexual behavior. And as researchers discover the biobank’s value, there’s a strong incentive to add to the database to make it even richer.


Hiring committees with an implicit gender bias hire fewer women, says study

Physics World, Belle Dumé



Members of scientific evaluation committees who deny that gender discrimination is a problem tend to unconsciously hire fewer women than men. That is according to a study by social scientists and cognitive psychologists in France who say that education and training are required to overcome the biases that lead to this gender discrimination during the recruitment process.

Women are underrepresented in many areas of science, technology, engineering and mathematics (STEM), particularly in physics. At the French National Centre for Scientific Research (CNRS), for example, women make up just 35% of all researchers – a figure that decreases for senior research positions. A team led by Isabelle Régner and Pascal Huguet from Aix-Marseille Université therefore decided to see if this disparity is caused by the intrinsic beliefs of the hirers themselves regarding gender roles.


arXiv Code of Conduct

arXiv e-print repository



arXiv is a community of authors, readers, moderators, advisory board members, supporting members (libraries, research institutions, etc.), donors, staff, and collaborators in scholarly communication and software development. arXiv is also the platform where our community engages with one another. We expect all community members to act in accordance with our core values outlined here.

Failure to comply with this Code of Conduct may result in a warning or suspension from participation in arXiv.


The Fry’s Era

Monday Note, Jean-Louis Gassée



A couple of weeks back, I went to the Palo Alto Fry’s to buy a Microsoft mouse for my 11” iPad Pro (as recounted in the July 21st Monday Note). First surprise, the store that I expected to be active at 8am for its hard-working tech clientèle didn’t open its doors until 10. Worse, what this lover of retail saw inside the store was frightening: bare and half-empty shelves. In the trade, this is a no-no. A store must radiate a happy image of abundance, not one of sad poverty.

A little later, I saw stories in the local press about the upcoming closing of Palo Alto’s Fry’s. I visited the Sunnyvale store where I once worked; it has fared no better: empty parking lot, empty shelves, bored employees, two customers. The Palo Alto store is no accident; Fry’s has been in serious decline for a while.

What happened? Amazon.


Pentagon admits China could outpace US on AI without changes

Business Insider, Ryan Pickrell



“If we do not find a way to strengthen the bonds between the United States government and industry and academia, then I would say we do have the real risk of not moving as fast as China when it comes to” artificial intelligence, Lt. Gen. Jack Shanahan said, responding to Insider’s queries at a Pentagon press briefing Friday.

Shanahan, the director of the Pentagon’s Joint Artificial Intelligence Center, said that China’s civil-military integration “does give them a leg up,” adding that the Department of Defense will “have to work hard on strengthening the relationships we have with commercial industry.”


What the heck is Banjo? The artificial intelligence firm in your backyard that’s raised $250 million

Park Record (Park City, UT), James Hoyt



Damien Patton likes to fly under the radar.

Paradoxically, the founder and CEO of Banjo, a Park City-based artificial intelligence firm, takes leaps in his personal life as an offroading and skiing enthusiast.

Unlike the visibility Skullcandy’s hilltop perch near S.R. 224 affords the consumer headphone brand, it’s not readily apparent to the vacationers, athletes and commuters passing through that Kimball Junction is the nerve center of a global data operation that just raised more than $100 million in its most recent round of funding.

“You’re always excited to see the fact that you’re being successful with it, but actually, I don’t dwell on it,” Patton said in an interview, indicating he prefers instead to think about solving the problems of today and the future — like homelessness, gun violence and the opioid crisis — with big data.

Petabytes of data pass through the unassuming Basin office park that houses Banjo’s global headquarters and around 100 of its employees. An array of monitors displaying a grid of traffic cameras, a heatmap of 911 calls across the country, a news ticker of global events written by a machine-learning algorithm and countless other visualizations of data loom over a bullpen of Banjo employees working at their desks. The voice of a worker speaking on the phone abruptly cuts off as he steps into a booth in the corner and slides a glass door shut, doing what the spies of “Get Smart” couldn’t with their Cone of Silence.

 
Events



Business Analytics Innovation Summit

University of Akron, College of Business Administration



Akron, OH September 27 at University of Akron. “This year’s Business Analytics Innovation Summit will tackle the critical question of how to produce benefits from Business Analytics initiatives.” [$$]


BetaKit’s Guide to Elevate AI

Elevate



Toronto, ON, Canada September 25. “Taking place at the MaRS Discovery District, Elevate AI will investigate the commercialization and responsible use of artificial intelligence, machine learning, and deep learning right now. Academics, AI founders, and corporate leaders will also discuss the Canadian AI ecosystem, and the roles the public and private sectors must play to develop and maintain Canada’s AI advantage.” [$$$]


Record Linkage Workshop

University of Minnesota, Institute for Research on Statistics and its Applications



Minneapolis, MN September 20, starting at 9 a.m., University of Minnesota. “Explore how subject matter experts in four distinct fields of work use machine learning methods to link databases together. What works? What doesn’t? Leverage their collective knowledge and discover new approaches for overcoming common methodological challenges.” [$$]


NYC AI Workshop

New York University, Machine Learning for Language Lab



New York, NY October 26-28 at NYU’s Machine Learning for Language Lab. “The NYC AI Workshop is a two-day introductory workshop on artificial intelligence and machine learning designed for early undergraduate students in New York City area.” [registration required]


AI and The Work of the Future Congress

MIT, CSAIL



Cambridge, MA November 21-22 at MIT. “The MIT Computer Science and Artificial Intelligence Lab (CSAIL), MIT Work of the Future (WOTF), MIT Initiative on the Digital Economy (IDE) present the MIT AI & the Work of the Future Congress.” [$$$]


Industry Seminar: Justin Boelio, Director of Data Science, Penguin Random House

Harvard Data Science Initiative



Cambridge, MA September 5, starting at 5 p.m., Harvard Science Center. [free, registration required]


Metascience 2019 Symposium – The Emerging Field of Research on the Scientific Process

Brian Nosek, Jonathan Schooler, Jon Krosnick, Leif Nelson, Jan Walleczek



Stanford, CA September 5-8 at Stanford University. “This symposium serves as a formative meeting for metascience as a discipline.” [free, registration required]

 
Deadlines



Reproducibility Challenge @ NeurIPS 2019

“One of the challenges in machine learning research is to ensure that published results are reliable and reproducible. In support of this, the goal of this challenge is to investigate reproducibility of empirical results submitted to the 2019 Neural Information Processing Systems (NeurIPS). We are choosing NeurIPS for this challenge because the timing is right for course-based participants (see below), and this time we focus on accepted papers which will be automatically made available publicly on Open Review on acceptance notification.” Deadline for registration is November 1.
 
Tools & Resources



The Functional Art: An Introduction to Information Graphics and Visualization: Data ‘essays’, not data ‘stories’

Alberto Cairo, The Functional Art blog



Eric William Lin favors the term “data essay”, rather than “data story”, to refer to data and visualization-driven narrative and explanatory pieces, particularly those in news media or in publications such as The Pudding, which indeed calls its projects “essays”. In the same conversation on Twitter, Kim Rees replied that she prefers “data documentary”; she has a talk about it.


Statistical Insights: The ADIMA database on Multinational Enterprises

OECD Statistics and Data Directorate, Graham Pilgrim



OECD has begun to develop a new database – the Analytical Database on Individual Multinationals and Affiliates (ADIMA) – using a number of open “big data” sources that can provide new insights on individual MNEs and their global profiles.


Behavioral Risk Factor Surveillance System (BRFSS) Historical Questions | Chronic Disease and Health Promotion Data & Indicators

Centers for Disease Control and Prevention



The BRFSS is a continuous, state-based surveillance system that collects information about modifiable risk factors for chronic diseases and other leading causes of death.

 
Careers


Tenured and tenure track faculty positions

Assistant Professor



University of North Carolina-Chapel Hill, Department of Sociology; Chapel Hill, NC

Full-time, non-tenured academic positions

Research Scientist in High Performance Data Analytics



New Jersey Institute of Technology, Institute for Data Science; Newark, NJ
