Data Science newsletter – August 16, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for August 16, 2017

 
 
Data Science News



On Thin Ice: How AI Predicts Melting of Sea Ice

NVIDIA Blog, Jamie Beckett


Arctic sea ice that once foiled bold explorers and led sailors to their deaths is melting as the Earth warms.



Big data yields surprising connections between diseases

University of Chicago, Medicine & Biological Sciences


Using health insurance claims data from more than 480,000 people in nearly 130,000 families, researchers at the University of Chicago have created a new classification of common diseases based on how often they occur among genetically-related individuals.

Researchers hope the work, published this week in Nature Genetics, will help physicians make better diagnoses and treat root causes instead of symptoms.

“Understanding genetic similarities between diseases may mean that drugs that are effective for one disease may be effective for another one,” said Andrey Rzhetsky, PhD, the Edna K. Papazian Professor of Medicine and Human Genetics at UChicago who was the paper’s senior author. “And for those diseases with a large environmental component, that means we can perhaps prevent them by changing the environment.”


Machine learning for music discovery (workshop) @ICML2017, Sydney

Keunwoo Choi


Another ICML, another ML4MD workshop! This year was the third Machine Learning for Music Discovery workshop, and as expected it featured many awesome talks. I’ll briefly summarise who talked about what.


Low Budgets and High Expectations: Machine Learning Startups

Medium, Towards Data Science, Daniel Shapiro


This article is about the smaller and newer companies out there. Let me preface this whole thing by saying that not all startups are the same. Everyone is a special flower that grows or wilts just so. Now, let’s generalize. There exists a strange dynamic in the startup world where artificial intelligence is really hot, and yet the alignment between expectations and reality is way out of whack when it comes to pricing expectations. An enterprise client will typically need some quick solutions to deploy Apache Spark or Hadoop or whatever, just to get at their data in a timely manner. That’s before even thinking about machine learning. For startups it is just as challenging.


Science and data science

Proceedings of the National Academy of Sciences; David M. Blei and Padhraic Smyth


Data science has attracted a lot of attention, promising to turn vast amounts of data into useful predictions and insights. In this article, we ask why scientists should care about data science. To answer, we discuss data science from three perspectives: statistical, computational, and human. Although each of the three is a critical component of data science, we argue that the effective combination of all three components is the essence of what data science is about.


Design Lab Faculty to Launch NSF-Funded Graduate Education Project

CalIT2


On September 1, the Design Lab at UC San Diego will launch a new project to help teach incoming graduate students how to program in the era of big data. The project is funded by the National Science Foundation (NSF) Innovations in Graduate Education (IGE) program, and the Design Lab project is one of 10 new IGE grants awarded a total of $4.8 million to “pilot, test and validate innovative and potentially transformative ways to teach science, technology, engineering and mathematics (STEM).”

The UC San Diego team will receive approximately $500,000 over three years to develop a new data-science teaching approach via “Augmenting, Piloting and Scaling Computational Notebooks to Train New Graduate Researchers in Data-Centric Programming.”


Walsh named interim co-director of the HathiTrust Research Center

Indiana University Bloomington, School of Informatics and Computing


John Walsh, an associate professor of information and library science at the School of Informatics and Computing, has been named interim co-director of the HathiTrust Research Center.

Walsh will replace Professor of Informatics and Computing Beth Plale, who will be taking a temporary assignment as science advisor for public access for the National Science Foundation. Plale was among the founders of the HTRC in 2011, which was opened to develop cutting-edge software tools and cyberinfrastructure to enable advanced computational access to the growing digital record of human knowledge in the HathiTrust Digital Library (HTDL).


High-quality online video with less rebuffering

MIT News, CSAIL


“Studies show that users abandon video sessions if the quality is too low, leading to major losses in ad revenue for content providers,” says MIT Professor Mohammad Alizadeh. “Sites constantly have to be looking for new ways to innovate.”

Along those lines, Alizadeh and his team at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed “Pensieve,” an artificial intelligence (AI) system that uses machine learning to pick different algorithms depending on network conditions. In doing so, it has been shown to deliver a higher-quality streaming experience with less rebuffering than existing systems.


Google paying Apple $3 billion to remain default search – Bernstein

CNBC, Todd Haselton


Google is paying Apple billions of dollars to remain the default search engine on iPhones and iPads, Bernstein said in a note to investors on Monday.

The firm believes that Google will pay Apple about $3 billion this year, up from $1 billion just three years ago, and that Google’s licensing fees make up a large share of Apple’s services business.


Swift creator Chris Lattner joins Google Brain after Tesla Autopilot stint

TechCrunch, Darrell Etherington


Chris Lattner, one of the key creators behind the Apple programming language Swift, is on the move again. After a short six-month stay at Tesla, which he joined last year from Apple to act as VP of Autopilot Software, Lattner announced on Twitter today that his next stop is Google Brain.


U.S. judge says LinkedIn cannot block startup from public profile data

Reuters, Salvador Rodriguez


A U.S. federal judge on Monday ruled that Microsoft Corp’s (MSFT.O) LinkedIn unit cannot prevent a startup from accessing public profile data, in a test of how much control a social media site can wield over information its users have deemed to be public.

U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles.


Amazon Could Probably Conquer Drugstores, Too

Bloomberg View, Megan McArdle


Can Amazon do to the pharmacy business what it’s done to … well, everything else?

Rumor has it they’re thinking about doing just that. They’ve reportedly created a new general manager position to look into such an expansion.


Cardiologist-level arrhythmia detection with convolutional neural networks

Adrian Colyer, the morning paper


This is a story very much of our times: development and deployment of better devices/sensors (in this case an iRhythm Zio) leads to collection of much larger data sets than have been available previously. Apply state of the art deep learning techniques trained on those data sets, and you get a system that outperforms human experts.


Chicago-Based Civic Tech Fellowship Gives Fledgling Government Data Scientists a Head Start

Government Technology, Zack Quaintance


Shortly after [Nick] Lucius started with the city, however, he was picked for an inaugural program run by a Chicago data science company called Uptake. Dubbed beyond.uptake, the six-month fellowship included training sessions on methodologies, cybersecurity, machine learning, agile development, data visualization and more. Participants were also paired with three mentors from Uptake’s expert staff: a data scientist, an engineer, and a business or agile coach. They spent the next six months working on a project, checking in regularly with their mentors as they went.

Tom Schenk, Chicago’s chief data officer, said the program helped bridge the gap between the experience and technical knowledge Lucius brought to the work and the esoteric nature of the field.


To prevent traffic fatalities, Alabama invests $1.8 million in data science

statescoop, Jason Shueh


Alabama Gov. Kay Ivey is investing $1.8 million into new digital tools and data collection efforts to stop a surge of traffic fatalities throughout the state.

Ivey awarded the funding Monday through three grants to the University of Alabama’s Center for Advanced Public Safety (CAPS), a group that has served the state through its collection of traffic data, its traffic survey software and its efforts to develop a state highway safety plan. Ivey said in a statement she hopes the additional funding will lower the state’s traffic fatalities, which jumped nearly 24 percent from 788 in 2015 to 975 in 2016.


New program teaches data science for energy and environment research

University of Chicago, UChicago News


For the future of the planet, there are few research subjects more important than the global supplies of food, water and energy. To comprehensively study, understand and inform policy around these complex systems, the next generation of researchers in the physical, social and biological sciences will need fluency with data analysis methods that traverse traditional academic boundaries.

A new interdisciplinary curriculum will train graduate students from geosciences, economics, computer science, public policy and other programs in computational and data science techniques critical for modern science. With a $3 million award from the National Science Foundation, the new research traineeship grant will combine expertise from across UChicago and Argonne National Laboratory in computing, statistics, social science, climate and agriculture.


Is Python or R Better for a Data Science Educational Program?

Tech.co, Boulder, Matthew Drury


A frequent question from students who are considering joining our immersive program is “Why does the immersive program teach data science in Python instead of R?” I have a lot of experience writing both Python and R, ranging from small one-off projects and analysis to large efforts involving many developers, and I have formulated some well-balanced opinions on Python vs. R. I do believe that Python is the correct choice for an educational program in Data Science.


Researchers to develop mobile sensor technology, improve job performance

University of Minnesota Twin Cities


Mobile sensors for workplace productivity? Activity trackers for meetings? It’s the future of mobile sensor technology and two University of Minnesota faculty are making it possible, ushering in a new era of productivity and efficiency in the workplace.

Deniz Ones, professor of psychology at the University’s College of Liberal Arts, and Mustafa al’Absi, professor of behavioral medicine at the University’s Medical School, join a six-university team which will use the $13.8 million project to develop and test mPerf, a system of mobile sensors and software that can be used to objectively assess everyday job performance. The project was funded by the Intelligence Advanced Research Projects Activity’s (IARPA) Multimodal Objective Sensing to Assess Individuals with Context (MOSAIC) program.


Artificial Intelligence Policy: A Roadmap

SSRN, Ryan Calo


Talk of artificial intelligence is everywhere. People marvel at the capacity of machines to translate any language and master any game. Others condemn the use of secret algorithms to sentence criminal defendants or recoil at the prospect of machines gunning for blue, pink, and white-collar jobs. Some worry aloud that artificial intelligence will be humankind’s “final invention.”

This essay, prepared in connection with UC Davis Law Review’s 50th anniversary symposium, explains why AI is suddenly on everyone’s mind and provides a roadmap to the major policy questions AI raises. The essay is designed to help policymakers, investors, technologists, scholars, and students understand the contemporary policy environment around AI at least well enough to initiate their own exploration.


Scientists Hack a Computer Using DNA

MIT Technology Review, Antonio Regalado


In what appears to be the first successful hack of a software program using DNA, researchers say malware they incorporated into a genetic molecule allowed them to take control of a computer used to analyze it.

The biological malware was created by scientists at the University of Washington in Seattle, who call it the first “DNA-based exploit of a computer system.”

To carry out the hack, researchers led by Tadayoshi Kohno (see “Innovators Under 35, 2007”) and Luis Ceze encoded malicious software in a short stretch of DNA they purchased online. They then used it to gain “full control” over a computer that tried to process the genetic data after it was read by a DNA sequencing machine.


Artificial intelligence identifies plant species for science

Nature News & Comment, Heidi Ledford


Computer algorithms trained on the images of thousands of preserved plants have learned to automatically identify species that have been pressed, dried and mounted on herbarium sheets, researchers report.

The work, published in BMC Evolutionary Biology on 11 August, is the first attempt to use deep learning — an artificial-intelligence technique that teaches neural networks using large, complex data sets — to tackle the difficult taxonomic task of identifying species in natural-history collections.

It’s unlikely to be the last attempt, says palaeobotanist Peter Wilf of Pennsylvania State University in University Park. “This kind of work is the future; this is where we’re going in natural history.”

 
Events



BigSurv18

Universitat Pompeu Fabra


Barcelona, Spain October 25-27, 2018. Exploring new statistical frontiers at the intersection of survey science and big data. [save the date]


Digital Building Lab Annual Symposium

Georgia Institute of Technology


Atlanta, GA Tuesday, October 3. Our seventh annual symposium showcases two days of panels with leading international experts, presentations, and an exhibit of AEC research being conducted across the Georgia Tech community. [$$$]


EuroSciPy 2017

SciPy


Erlangen, Germany August 28-September 1. The EuroSciPy meeting is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research. [$$$]


Medidata NEXT Global 2017 – NYC

Cvent


New York, NY November 8-10. [$$$]

 
Deadlines



» CfP: Online Othering: Exploring the Dark Side of the Web

The Sociological Imagination

This edited collection explores the othering and discrimination propagated and encountered by individuals online and in social media contexts and cultures. It problematizes and analyses the dichotomy presented between real and virtual worlds (and spaces) by exploring the motivations behind certain offending and othering behaviours, and the impact this has on the targets of online abuse and hate speech. Deadline for interested contributors is August 31.
 
NYU Center for Data Science News



[1708.02254] Asking Too Much? The Rhetorical Role of Questions in Political Discourse

arXiv, Computer Science > Computation and Language; Justine Zhang, Arthur Spirling, Cristian Danescu-Niculescu-Mizil


Questions play a prominent role in social interactions, performing rhetorical functions that go beyond that of simple informational exchange. The surface form of a question can signal the intention and background of the person asking it, as well as the nature of their relation with the interlocutor. While the informational nature of questions has been extensively examined in the context of question-answering applications, their rhetorical aspects have been largely understudied.

In this work we introduce an unsupervised methodology for extracting surface motifs that recur in questions, and for grouping them according to their latent rhetorical role. By applying this framework to the setting of question sessions in the UK parliament, we show that the resulting typology encodes key aspects of the political discourse—such as the bifurcation in questioning behavior between government and opposition parties—and reveals new insights into the effects of a legislator’s tenure and political career ambitions.


Researchers Use Machine Learning to Spot Counterfeit Consumer Products

NYU News


A team of researchers has developed a new mechanism that uses machine-learning algorithms to distinguish between genuine and counterfeit versions of the same product.


Data Science Study Session @Lifion

Meetup, New York Data Science Study Group


New York, NY Monday, August 21. Bruno Gonçalves, Moore-Sloan Fellow at NYU’s Center for Data Science, will explain the intuition behind word embeddings and the word2vec family of algorithms. [rsvp required]

 
Tools & Resources



SentEval

GitHub – facebookresearch


SentEval is a library for evaluating the quality of sentence embeddings. We assess their generalization power by using them as features on a broad and diverse set of “transfer” tasks (more details here). Our goal is to ease the study and the development of general-purpose fixed-size sentence representations.


Genie in a Box: Making Spark Easy for Stitch Fix Data Scientists

Stitch Fix Technology, Multithreaded blog, Neelesh Salian


The transient nature of the clusters should not be an issue or concern for Data Scientists. They should not worry about the execution environment and how to interact with it. Genie and the tools around it help us make that possible.


Practically Managing A Data Science Team

Medium, Towards Data Science, Seattle Data Guy


We wanted to offer some great tips that will help your data science team be more successful. This has nothing to do with algorithms and models, and everything to do with how data specialists need to operate in a business.

 
Careers


Full-time, non-tenured academic positions

Full Stack Research Software Engineer



Caltech and Simons Collaboration on the Nonperturbative Bootstrap; Pasadena, CA
