Data Science newsletter – February 8, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for February 8, 2019

 
 
Data Science News



Measuring all the Factors around Social Science Impact

Social Science Space, SAGE


As part of a larger effort to help social scientists achieve and demonstrate impact, SAGE Publishing brought together 14 individuals who are both passionate about social science’s impact and intimately involved in improving its measurement for a one-day workshop held at Google’s main campus. With representation from Google Scholar, Clarivate, Altmetric, the Sloan Foundation, universities, and social science organizations, the group spent the day mapping out the impact landscape, making connections, and recommending opportunities for assessment of impact.

The first step saw the group reach a consensus on which constituents of social science are affected by existing and potential metrics.


Walgreens Tests New Smart Coolers

The Atlantic, Sidney Fussell


Walgreens is piloting a new line of “smart coolers”: fridges equipped with cameras that scan shoppers’ faces and make inferences about their age and gender. On January 14, the company announced its first trial at a store in Chicago, and it plans to equip stores in New York and San Francisco with the tech.

Demographic information is key to retail shopping. Retailers want to know what people are buying, segmenting shoppers by gender, age, and income (to name a few characteristics) and then targeting them precisely. To that end, these smart coolers are a marvel.


How Do Word Meanings Significantly Change Over Time and In Context? Data Science Has an Answer

Medium, NYU Center for Data Science


Linguists, social scientists, and natural language processing (NLP) researchers want to understand whether and how meanings of words vary across textual and historical contexts. In NLP research, “word embedding” refers to mapping words to vectors of real numbers so that the closeness of two vectors represents how similar the words are in meaning. This allows researchers to determine how the meaning of a word like “awesome” may have shifted in a statistically significant way over time. Data scientists have already developed word embedding techniques to analyze language evolution, but most models split training sets according to decades or marked shifts in meaning of a particular word.

In a new paper, Rujun Han (University of Southern California, CDS alum), Michael Gill (Facebook, former CDS Faculty Fellow), Arthur Spirling (Associate Professor of Politics and Data Science), and Kyunghyun Cho (Assistant Professor of Computer Science and Data Science) propose a novel method that leverages document metadata to comprehensively model how a word’s meaning changes over time and in relation to similar terms. They focus on the problem of how to think about similarities between word meaning vectors in a statistical way.
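
As a rough illustration of what “similarity between word meaning vectors” means, the sketch below computes cosine similarity between toy embeddings. It is not the method from the paper: the three-dimensional vectors, the period labels, and the reference words are all made-up assumptions, and it presumes the vectors from both periods live in one shared space.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings (invented for illustration only).
embeddings = {
    "awesome_1900": [0.9, 0.1, 0.2],  # assumed older sense, close to "dreadful"
    "awesome_2000": [0.1, 0.9, 0.3],  # assumed newer sense, close to "great"
    "dreadful": [0.8, 0.2, 0.1],
    "great": [0.2, 0.8, 0.4],
}

# Compare each period's vector for "awesome" with the two reference words.
for period in ("awesome_1900", "awesome_2000"):
    for reference in ("dreadful", "great"):
        score = cosine_similarity(embeddings[period], embeddings[reference])
        print(f"{period} vs {reference}: {score:.2f}")
```

With these toy numbers the older vector sits nearer “dreadful” and the newer one nearer “great”; deciding whether such a shift is statistically significant is the harder question the paper addresses.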


Thousands of scientists run up against Elsevier’s paywall

Nature, News, Holly Else


Researchers have been left without access to new papers as libraries and the major publisher fail to agree on subscription deals.


Google and Waze Must Stop Sharing Drunken-Driving Checkpoints, New York Police Demand

Streetsblog, The New York Times, Michael Gold


One feature has Waze in conflict with law enforcement officials across the country: how the app marks the location of police officers on the roads ahead or stationed at drunken-driving checkpoints.

Over the weekend, the New York Police Department, the largest force in the nation, joined the fray, sending a letter to Google demanding that the tech giant pull that feature from Waze.


Detecting Breast Cancer with Convolutional Neural Nets

Medium, NYU Center for Data Science


Breast tissue is divided into four categories: “fatty”, “scattered fibroglandular density”, “heterogeneously dense” and “extremely dense.” The latter two of these classifications are considered “dense,” which indicates the presence of more fibroglandular tissue. Tissue is normally “dense” in younger women and becomes more fatty after menopause. Of women above age 40, around 50% have “dense” breast tissue. Current research suggests that women with dense breast tissue are at higher risk for developing tumors. This is especially dangerous because dense breast tissue can conceal indicators of cancer on a mammogram.

Contributors to a recent publication on breast density classification include the Center for Data Science’s Nan Wu, Krzysztof J. Geras (also at the School of Medicine), Yiqiu Shen, Jingyi Su, and Kyunghyun Cho.


Soil Moisture Data Sets Become Fertile Ground for Applications

Eos, Rupesh Shrestha and Alison G. Boyer


We developed the Soil Moisture Visualizer (SMV), an integrated visualization and data distribution platform, to harmonize and organize the wide variety of soil moisture data available for North America (Figure 1). Bringing these disparate data into a single system not only adds value to the existing data but also facilitates exploratory analysis and data discovery for different groups of users. The SMV is provided as an open and free data access tool from NASA’s Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC).


Chasing down that list of potential Predpol customers reveals dozens of cities that have secretly experimented with “predictive policing”

Boing Boing, Cory Doctorow, and Motherboard, Caroline Haskins


Last October, I published a list of cities that appeared to have contracted with Predpol, a “predictive policing” company, based on research provided to me by an anonymous source who used clever methods to uncover the customer list.

Now, Motherboard’s Caroline Haskins has used that list as a jumping-off point for a wide array of public records requests that have revealed a pattern of extensive, secret experimentation with precrime in dozens of cities that allowed Predpol to access policing data and predict where more crimes would occur.


NC State, Infosys collaborate on training new hires

Raleigh News & Observer, Zachery Eanes


One of the reasons North Carolina was able to convince information technology company Infosys to expand in Raleigh — besides $25 million in incentives — was access to the state’s vibrant universities.

Now, less than a year after opening its office in Wake County, Infosys is already striking a partnership with North Carolina State University.

N.C. State and Infosys began a three-year collaboration this year to train some of Infosys’s newest hires in the applications of data science, a field in which N.C. State has made a name for itself nationally.


New disease surveillance tool helps detect any human virus

Broad Institute


A new computational method developed by Broad Institute scientists helps overcome this hurdle. Built in the lab of Broad Institute researcher Pardis Sabeti, the “CATCH” method can be used to design molecular “baits” for any virus known to infect humans and all their known strains, including those that are present in low abundance in clinical samples, such as Zika. The approach can help small sequencing centers around the globe conduct disease surveillance more efficiently and cost-effectively, which can provide crucial information for controlling outbreaks.

The new study was led by MIT graduate student Hayden Metsky and postdoctoral researcher Katie Siddle, and it appears online in Nature Biotechnology.

“As genomic sequencing becomes a critical part of disease surveillance, tools like CATCH will help us and others detect outbreaks earlier and generate more data on pathogens that can be shared with the wider scientific and medical research communities,” said Christian Matranga, a co-senior author of the new study who has joined a local biotech startup.


Hypergiant deal boosts AI in Space City

Houston Chronicle, Andrea Leinfelder


Artificial intelligence company Hypergiant Industries will use satellites as its next vantage point for collecting data and helping customers predict the future.

Austin-based Hypergiant announced Tuesday that it has acquired Satellite & Extraterrestrial Operations & Procedures, a Houston company that deploys satellites into space. Hypergiant also said it will open a new aerospace division in Texas’ hub for space exploration.


Eternal Market Patience Offers Eternal Rewards

Bloomberg Opinion, John Authers


“Genius is eternal patience.” So said Michelangelo, and his comment continues to resonate so strongly that it greeted Bloomberg users Wednesday morning when they signed on to the terminal. And his comment fits neatly with the remarkable efforts now afoot as the quants mine ever-greater mounds of historical information in search of patterns.

The reason for doing so is that any given statistical finding grows far more robust if it can be shown to hold over a larger sample. And the latest entry to the field is remarkable. A group of quants at Robeco Groep, the Dutch investment house, have produced research into the global performance of stocks, bonds, currencies and commodities going back to 1800. The numbers they processed all came from the enormous data set built by Global Financial Data.


What’s Behind JPMorgan Chase’s Big Bet on Artificial Intelligence?

Knowledge@Wharton


When America’s biggest bank, JPMorgan Chase, hired Apoorv Saxena in August 2018 as its global head of AI and machine-learning services based in San Mateo, Calif., finance industry watchers saw that as a sign that the bank was making a big bet on artificial intelligence to shape its future strategies. Saxena previously headed product management for cloud-based artificial intelligence at Google. At JPMorgan Chase, he also oversees asset and wealth management artificial intelligence technology.

According to Saxena, AI will help financial services companies expand banking penetration worldwide, launch new products and deepen customer engagements. [audio, 21:40]


IBM plans $2 billion expansion in New York for artificial intelligence

LoHud.com, Chad Arnold


IBM will invest $2 billion in New York to develop artificial intelligence hardware and bolster research opportunities in the field with the state’s publicly funded university system, the company announced Thursday.

The technology company will be partnering with the State University of New York to develop an AI Hardware Center at SUNY Polytechnic Institute in Albany.

And it comes with a hefty state subsidy of $300 million.


The Challenge of America’s First Online Census

WIRED, Security, Issie Lapowsky


Despite the country’s ballooning population and advances in automation, the crucial process of counting every person living in the United States hasn’t changed all that much in the course of the census’ 230-year history. Until now, it’s mostly come down to distributing paper questionnaires to every home and hiring an army of clipboard-carrying canvassers to knock on the door of anyone who doesn’t respond. In 2020, that will change. For the first time ever, the bureau is asking the majority of people to answer the census online. Not only that, but behind the scenes the entire process of running the census is getting a high-tech facelift.

 
Events



Cascadia R Conference 2019

Cascadia R


Redmond, WA June 8, starting at 8 a.m., Microsoft Conference Center. “The third annual Cascadia R Conference is a one day event where Pacific Northwest #rstats programming enthusiasts gather to learn how organizations from Vancouver to Portland are innovating with R.” [$$]


VOICE 2019

Modev


Newark, NJ July 22-26. “VOICE Summit sponsored by Amazon Alexa is the world’s largest voice-tech conference attracting 5,000+ developers, conversational designers, startups, brands, agencies and executives at the forefront of the voice-first era.” [$$$]


ICLR 2019 Conference

ICLR


New Orleans, LA May 6-9. “Seventh International Conference on Learning Representations” [registration required]


DIDS: Fostering diverse & inclusive data science at Berkeley

Berkeley Institute for Data Science


Berkeley, CA February 13, starting at 3:30 p.m., Berkeley Institute for Data Science. “An event for under-represented undergraduate students” [free, registration required]

 
Deadlines



Activities in Extended Videos Prize Challenge (ActEV-PC)

“The Activities in Extended Videos Prize Challenge (ActEV-PC) seeks to encourage the development of robust automatic activity detection algorithms for an extended video. ActEV-PC is operated by NIST and sponsored by IARPA.” The Open Leaderboard evaluation ends on February 28.

Are you looking for a mentor? I am doing research on mentoring.

Please complete this brief survey, whether you are in #DataScience or any other field.
 
Tools & Resources



Flickr-Faces-HQ Dataset (FFHQ)

Github – NVlabs


Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN).


TensorFlow.js: machine learning for the web and beyond

Adrian Colyer, the morning paper blog


TensorFlow.js brings TensorFlow and Keras to the JavaScript ecosystem, supporting both Node.js and browser-based applications. As well as programmer accessibility and ease of integration, running on-device means that in many cases user data never has to leave the device.


Being Persuasive: Lessons from Lawyers That All Scientists Need

Eos, Tamee Albrecht and Amy Hudson


In summer 2017, we participated in the Expert Witness Training Academy – Effectively Communicating Science, a workshop hosted by Mitchell Hamline School of Law, a law school in St. Paul, Minn., and funded by the Paleoclimate Program at the National Science Foundation. Over an intensive weeklong training, two dozen scientists joined law students in a mock trial [Thorstad et al., 2017]. Each participant practiced the roles of lawyer and expert witness. Law faculty coached us in the trial’s main components: opening statements, direct examination, cross-examination, and closing arguments.

Talk about being out of your comfort zone. The lawyer’s role was unfamiliar, to say the least. What did we know about delivering an opening statement? How are cross-examinations structured? When do we yell “Objection!”?


Data visualization literacy: Definitions, conceptual frameworks, exercises, and assessments

Proceedings of the National Academy of Sciences; Katy Börner, Andreas Bueckle, and Michael Ginda


In the information age, the ability to read and construct data visualizations becomes as important as the ability to read and write text. However, while standard definitions and theoretical frameworks to teach and assess textual, mathematical, and visual literacy exist, current data visualization literacy (DVL) definitions and frameworks are not comprehensive enough to guide the design of DVL teaching and assessment. This paper introduces a data visualization literacy framework (DVL-FW) that was specifically developed to define, teach, and assess DVL. The holistic DVL-FW promotes both the reading and construction of data visualizations, a pairing analogous to that of both reading and writing in textual literacy and understanding and applying in mathematical literacy. Specifically, the DVL-FW defines a hierarchical typology of core concepts and details the process steps that are required to extract insights from data. Advancing the state of the art, the DVL-FW interlinks theoretical and procedural knowledge and showcases how both can be combined to design curricula and assessment measures for DVL. Earlier versions of the DVL-FW have been used to teach DVL to more than 8,500 residential and online students, and results from this effort have helped revise and validate the DVL-FW presented here.


BBC Visual and Data Journalism cookbook for R graphics

BBC


At the BBC data team, we have developed an R package and an R cookbook to make the process of creating publication-ready graphics in our in-house style using R’s ggplot2 library a more reproducible process, as well as making it easier for people new to R to create graphics.


Release of 2019-02-06 · baseballdatabank

Github – chadwickbureau


The first stable release for 2018. All significant changes relative to 2018 have been made, and all tables updated. Further releases in v2018.x will be “bug fixes” only. Note that some awards tables and the salaries table have not been updated due to no contributions; these remain welcomed and will be included in a subsequent release if received! The lack of interest in these tables suggests they are not widely used, and we are considering deprecating them or relegating them to appendices in future years.

 
Careers


Full-time positions outside academia

Data Scientist



Thorn; San Francisco, CA

Tenured and tenure track faculty positions

Professor of Cultural Data Analytics



Tallinn University, School of Baltic Film, Media, Arts and Communication; Tallinn, Estonia
