Data Science newsletter – September 25, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for September 25, 2019


 
 
Data Science News



Automation Isn’t About to Make Truckers Obsolete

Harvard Business Review, Maury Gittleman and Kristen Monaco



Hardly a day goes by without someone suggesting that technologies like AI, machine learning, and robotics will transform the 21st-century labor market. A prominent example of this has been in truck driving, an occupation that spans multiple industries and moves over 70% of U.S. freight by weight, which many speculate will see a widespread loss of jobs with the rise of self-driving technology. Some have forecast that autonomous vehicles will eliminate 2-3 million trucking jobs over the next several years. But in looking at the data, we believe that, while the risk of job loss from automation is very real, the projections that often get touted are overstated. In a study in the Industrial and Labor Relations Review, we argue that there are three key reasons why:

Reason #1: Truck drivers do more than drive trucks.


Machine learning you can dance to

MIT News, Office of the Vice Chancellor



“There’s a lot of manual searching to get the right musical result, which can be distracting and time-consuming,” says Justin Swaney, a PhD student in the MIT Department of Chemical Engineering, a music producer, and co-creator of a new tool that uses machine learning to help producers find just the perfect sound.

Called Samply, Swaney’s visual sample-library explorer combines music and machine learning into a new technology for producers. The top winner at the MIT Stephen A. Schwarzman College of Computing Machine Learning Across Disciplines Challenge at the Hello World celebration last winter, the tool uses a convolutional neural network to analyze audio waveforms.

“Samply organizes samples based on their sonic characteristics,” explains Swaney. “The result is an interactive plot where similar sounds are closer together and different sounds are farther apart. Samply allows multiple sample libraries to be visualized simultaneously, shortening the lag between imagining a sound in your head and finding it.”


Boundless Bio launches to exploit burgeoning understanding of extrachromosomal DNA

Chemical & Engineering News, Lisa M. Jarvis



Boundless Bio has formally launched with $46.4 million in funding from Arch Venture Partners and City Hill Ventures. The San Diego–based biotech firm hopes to prevent the proliferation of extrachromosomal DNA, gene fragments that recently were found to help tumors diversify, evolve, and develop resistance to cancer drugs.

In cancer cells, chunks of DNA break off from chromosomes to form circular structures called extrachromosomal DNA, or ecDNA. Although ecDNA was identified decades ago, researchers lacked good tools to study its role in cancer biology.

In the past 5 years, University of California, San Diego, professor Paul Mischel has shown that the rapidly replicating fragments contain many copies of the genetic stretches encoding for oncogenes, allowing them to be preferentially passed down from cell to cell. And in a 2017 paper (Nature, DOI: 10.1038/nature21356), his lab reported that ecDNA is present in some 40% of tumor tissues, while nearly absent in healthy tissues.


Artificial intelligence uses too much energy to train. Researchers want to fix that.

Slate, Covering Climate Now, April Glaser



Some researchers are starting to think about how to change that. In a paper published in July, Schwartz and three other researchers (Jesse Dodge from Carnegie Mellon University, Noah Smith from the University of Washington, and Oren Etzioni from the Allen Institute) make the case for standardizing how we measure just how carbon-intensive artificial intelligence is. The team wants all future A.I. research to tack on such a measurement, and believes that doing so will lead to new innovations in making the technology less taxing on the environment. In an interview, Schwartz lamented that most A.I. systems are graded only on how much better they get at doing their job. But it can’t just be about accuracy anymore, he says. It has to be about energy efficiency too. “We don’t want to reach a state where A.I. will become a significant contributor to global warming,” he said.


Palantir seeks big valuation jump in new funding push

Silicon Valley Business Journal, Cromwell Schubarth



Palantir Technologies Inc. is reportedly seeking to raise between $1 billion and $3 billion in new funding at a valuation of at least $26 billion.

Reuters cited unnamed sources in a report Friday who said that the Palo Alto data analytics business has held talks with Japan’s SoftBank Group Corp., sovereign wealth funds and others about raising its first investment round in four years.

Those sources said that the new valuation could be as high as $30 billion, a 50 percent jump from the $20 billion the company was figured to be worth in the last funding.


Bears Against ICE: Tell EECS to cancel info-session with Palantir, the tech giant behind ICE’s deportation machine

The Action Network



Palantir’s surveillance technology is “mission critical” to ICE’s operations, according to the agency itself. This year, Palantir was revealed to be involved in the arrests of at least 443 people: mothers, fathers, cousins, and other family members and sponsors who were arrested when ICE investigated children who crossed the border alone. Palantir was shown to be intimately involved in workplace raids, like the raids led by ICE this month that arrested almost 700 people in Mississippi, the largest such raid in a decade. These raids have increased by 650% under President Trump, targeting thousands of people annually for arrest and deportation. The company has faced frequent protests and direct actions calling to shut down operations and drop ICE contracts at its headquarters this past year from Palo Alto to New York and internal worker demands that it stop facilitating ICE’s deportation machine.


Alumni, Industry Leaders to Return to Ithaca to Celebrate 20th Anniversary of Computing and Information Science

The Cornell Daily Sun, Shruti Juneja



Twenty years after Cornell launched an interdisciplinary Computing and Information Science unit to “mold the next generation of tech innovators and thought leaders,” some of those leaders will return to Ithaca to share their vision for the next 20 years — just a few days before Homecoming weekend.

CIS houses the departments of computer science, information science, and statistics and data science, with faculty hailing from multiple colleges and two Cornell campuses. It is centered in the Bill and Melinda Gates Hall in Ithaca and the Cornell Tech campus in New York City, which opened in 2017.

“In 1999, the world was wary that the Y2K bug would wreak havoc on global computer systems, the Blackberry and Matrix was released, and Prince was realizing his ultimate party year,” the CIS@20 website states. “Here in Ithaca, New York, we were announcing the advent of Cornell Computing and Information Science [CIS], which was created to respond uniquely to the educational and scientific challenges of the advancing information age.”


Nebraska’s ‘monumental’ data project key in improving health outcomes, reducing costs, officials say

Omaha World-Herald, Julie Anderson



Partners in Nebraska’s health information system are taking next steps to improve health outcomes — and the state’s IT workforce — through data.

The Nebraska Health Information Initiative, or NEHII, has been working to link patient records collected by physicians, hospitals, pharmacies and other health care entities across the state for roughly a decade. An ER doctor in Kearney, for instance, now can check the medications and health history of a visiting Omahan who lands in her hospital.

But so far, that data has been put to limited use with questions involving larger groups of patients, known as population health. That may involve checking whether at-risk groups, such as diabetics, are getting recommended eye and foot screenings on time and reaching out to those who aren’t.


NIH funds centers for advancing the reference sequence of the human genome

National Institutes of Health, News Releases



New grants totaling approximately $29.5 million will enable scientists to generate and maintain the most comprehensive reference sequence of the human genome. The awards, made over five years pending the availability of funds, are managed by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health.

The currently available reference sequence of the human genome is becoming obsolete. The funds are necessary for making advances in DNA sequencing technology and computational methods possible. As a result, NHGRI will fund two centers as a part of a new Human Genome Reference Program (HGRP).

“It has grown more and more important to have a high-quality, highly usable human genome reference sequence that represents the diversity of human populations. The proposed improvements will serve the growing basic and clinical genomics research communities by helping them interpret both research and patient genome sequences,” said Adam Felsenfeld, Ph.D., NHGRI program director in the Division of Genome Sciences.


How Massachusetts Built A Booming Biotech Ecosystem

FiveThirtyEight, Satta Sarmah-Hightower



In 2018, Massachusetts biopharma companies surpassed $4.8 billion in venture capital investment, up from $900 million in 2012, according to Robert Coughlin, CEO of the Massachusetts Biotechnology Council (MassBio), an industry trade group with more than 1,200 members that span biotech and life sciences companies, research institutions and hospitals. Today, there are more than 113,000 biopharma and biotech research and development jobs and more than 30 million square feet of lab space throughout the state.

From investing in innovation to collaborative partnerships between government, academia, hospitals and the private sector, this is how Massachusetts has built, bric


UW creates new school for computer, data and information sciences to meet rising demand

University of Wisconsin, The Badger Herald, Arushi Gupta



For the first time in two decades, the University of Wisconsin announced the creation of a new division of the College of Letters & Science on September 5 — the School of Computer, Data and Information Sciences.

The school was created in response to rising interest of both employers and students in computer science. According to a UW news website, computer science has grown to be the most popular major at UW and has 1,560 students.

Interim Dean of L&S Eric Wilcots said there are many reasons for creating the new school. Besides the fact that computer science is the largest major on campus, Wilcots said that over the past few years, there has been an enormous ramp-up in student interest in computer science.


600,000 Images Removed from AI Database After Art Project Exposes Racist Bias

Hyperallergic, Zachary Small



The image tagging system that went viral on social media was part of artist Trevor Paglen and AI researcher Kate Crawford’s attempts to publicize how prejudiced technology can be.


Exclusive: Fitbit considers whether it should explore a sale – sources

Reuters, Greg Roumeliotis



Fitbit has held discussions with investment bank Qatalyst Partners about whether it should engage with potential acquirers, the sources said.

Fitbit has not yet decided to pursue a sale and there is no certainty it will do so, the sources said. Qatalyst has been seeking to persuade Fitbit to explore its options for several weeks, arguing it could attract acquisition interest from Google owner Alphabet Inc (GOOGL.O) as well as private equity firms, one of the sources added.

 
Tools & Resources



Packaging and Publishing in Python

GitHub – carpentries-incubator



“This is to be a collaboration among scientific computing researchers and educators to develop accessible tutorial materials and minimal templates that empower researchers developing new analysis techniques to release them in formats that encourage community adoption.” … “This tutorial will cover some next steps for getting your python-based data analysis project organized and ready to share.”


Reimagining Experimentation Analysis at Netflix

Medium, Netflix Tech Blog; Mao, Sri Sri Perangur, Colin McFarland



“With our new platform for experimentation analysis, it’s easy for scientists to perfectly recreate analyses on their laptops in a notebook. They can then choose from a library of statistics and visualizations or contribute their own to get a deeper understanding of the metrics.”


In-depth Introduction to Survey Analysis

MonkeyLearn Blog, Federico Pascual



“If you’ve obtained results for closed-ended survey questions, such as responses to scaled and multiple-choice questions, this data will be easy to quantify. If you want to analyze a qualitative survey, the scenario is a bit more complicated.”

“First, you need to build up a solid tagging structure to categorize your survey texts and make sure the whole team is on board, which becomes even more difficult if you handle considerable volumes of data. Then, you’ll need to make sense of all this tagged data by analyzing it manually.”


[1909.09428] A Critical Analysis of Biased Parsers in Unsupervised Parsing

arXiv, Computer Science > Computation and Language; Chris Dyer, Gábor Melis, Phil Blunsom



A series of recent papers has used a parsing algorithm due to Shen et al. (2018) to recover phrase-structure trees based on proxies for “syntactic depth.” These proxy depths are obtained from the representations learned by recurrent language models augmented with mechanisms that encourage the (unsupervised) discovery of hierarchical structure latent in natural language sentences. Using the same parser, we show that proxies derived from a conventional LSTM language model produce trees comparably well to the specialized architectures used in previous work. However, we also provide a detailed analysis of the parsing algorithm, showing (1) that it is incomplete—that is, it can recover only a fraction of possible trees—and (2) that it has a marked bias for right-branching structures which results in inflated performance in right-branching languages like English. Our analysis shows that evaluating with biased parsing algorithms can inflate the apparent structural competence of language models.
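
To make the idea of recovering a tree from per-position scores concrete, here is a minimal Python sketch of a generic greedy top-down split. It is a simplified stand-in, not the exact procedure from Shen et al. (2018) that the paper analyzes. The function name `greedy_split_parse` and the convention that `scores[i]` measures the break between `words[i]` and `words[i + 1]` are assumptions made for illustration.

```python
from typing import Sequence, Union

# A tree is either a single word or a two-element list [left, right].
Tree = Union[str, list]


def greedy_split_parse(words: Sequence[str], scores: Sequence[float]) -> Tree:
    """Build a binary tree by recursively splitting at the largest score.

    Assumption (hypothetical convention): scores[i] is a proxy for how
    strong the syntactic break is between words[i] and words[i + 1],
    so len(scores) must equal len(words) - 1.
    """
    if len(words) != len(scores) + 1:
        raise ValueError("expected exactly one score per adjacent word pair")
    if len(words) == 1:
        return words[0]
    # Pick the strongest break; max() returns the leftmost index on ties.
    split = max(range(len(scores)), key=lambda i: scores[i])
    left = greedy_split_parse(words[: split + 1], scores[:split])
    right = greedy_split_parse(words[split + 1 :], scores[split + 1 :])
    return [left, right]


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "down"]
    # Informative scores: the strongest break is between "cat" and "sat".
    print(greedy_split_parse(sentence, [1.0, 3.0, 2.0]))
    # Uninformative (tied) scores: leftmost tie-breaking peels off one word
    # at a time from the left, which yields a right-branching tree.
    print(greedy_split_parse(sentence, [1.0, 1.0, 1.0]))
```

In this sketch, tied scores fall back to a right-branching tree purely because of the leftmost tie-breaking rule. That is a property of this toy version only, but it hints at how a design choice inside a parsing procedure can favor one tree shape regardless of what the language model has learned.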


Data science for all – an open source approach to education

IBM, Data Science Community, Global Data Science Forum, Ana Echeverri



With IBM’s investment and industry expertise, University of Pennsylvania’s long-standing academic leadership and the Linux Foundation as a premier open source consortium, we are creating a curriculum kit comprised of a set of open source building blocks for teaching the core concepts of data science in undergraduate and graduate programs. These building blocks are based on Python and open source tools and frameworks, and include slides, documentation, code, and data sets that could be adopted or updated by anyone.

This idea of open source Data Science education is personal to me. Access to education changed my life. Coming from a small town in Colombia, South America, education gave me the opportunity to work with cutting edge Data Science and AI technologies at one of the best companies in the world (IBM). I believe this project will provide a foundation of building blocks for schools to supplement, strengthen and start up their data science programs. And most importantly, because this is open source, it enables any institution on earth to use it, thus providing more opportunities for learners to participate in the AI Economy like I did.

 
Careers


Tenured and tenure track faculty positions

Assistant Professor: Society and Ethics in Data Science



University of California-San Diego, Department of Communication; La Jolla, CA

Faculty Position in Educational Data Science



Stanford University, Graduate School of Education; Palo Alto, CA


Full-time, non-tenured academic positions

LEO Lecturer III



University of Michigan, School of Information; Ann Arbor, MI
