Data Science newsletter – October 10, 2016

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for October 10, 2016

 
 
Data Science News





Researchers Trace “New World” Rice Back to Africa, Suggesting New Link between Crop Movement and the Slave Trade

NYU News


from October 03, 2016

A team of researchers has traced a type of rice grown in the South American country of Suriname to a similar type found in the fields of Ivory Coast, revealing additional linkages between “New World” and “Old World” crops that likely resulted from the slave trade.

“This evidence is an incentive to re-examine historical sources to trace the way African rice ended up in the New World and why it was cultivated by enslaved Africans and their descendants,” write the authors of the study, which was led by researchers at Wageningen University in the Netherlands.

The work, which appears in the journal Nature Plants, also included NYU Biology Professor Michael Purugganan.




Will the API Kill the Data Scientist?

ProgrammableWeb


from October 05, 2016

Since PipelineDB released the developer preview of its realtime analytics API, Stride, it is tempting to ask whether machine learning-backed APIs may replace the growing role of human data scientists in the data analytics space. To further explore this concept, ProgrammableWeb caught up with PipelineDB’s President and Co-Founder, Jeff Ferguson. Ferguson and his team believe that machine learning can easily replace the role humans play in a number of data analytics tasks, and that this shift has already begun to take place.




Precision medicine: This AI field is red hot right now, with huge social implications

Precision medicine describes the practice of drawing insights from large amounts of data to develop therapeutic protocols for individuals or small groups of patients with unique characteristics. It can involve looking at genes, but the biggest advances right now involve aggregating patient records and/or clinical trial data. Progress toward building infrastructure and incentives for accessing well-anonymized, shared clinical trial data continues full of conflict and tech-utopianism. But, hey, it continues. At the moment, because of these legitimate bottlenecks in sharing procedures, many of the datasets are smaller than they will be in the future.

NIH has committed $207M to collect genetic and behavioral data from 1 million people. The Kavli HUMAN Project at NYU is doing something similar with 10,000 New Yorkers over a ten-year period. Both projects are promising because they rest on robust sampling methodologies and long-term observation, but they will not yield results any time soon. Most human health outcomes unfold over years, not days.

Using existing patient records, researchers at Columbia University’s Data Science Institute have already discovered one potentially fatal, previously unknown drug-drug interaction and are hunting down three other suspiciously negative drug-drug pairs. (If you are taking the antibiotic Rocephin and the OTC drug Prevacid at the same time, you might want to stop. And see a doctor. Today.)

On 60 Minutes, Charlie Rose looked at how a collaboration between University of North Carolina’s Lineberger Comprehensive Cancer Center and IBM’s Watson team was able to train a computer to read 25 million medical journal articles. Comparing the literature to patient records, the AI model proposed new treatments for a whopping 30 percent of the patients in their sample. Cancer care is likely to see much more AI activity in the wake of Joe Biden’s cancer moonshot mission and the White House’s Precision Medicine Initiative.

As the ExAC database of protein-coding genetic variants at the Broad Institute of MIT and Harvard continues to develop, the field of precision medicine will start to move beyond patient records and other text-analytics methods into computational biology. There are significant challenges associated with combining deeply heterogeneous genotypic data with text-based patient/clinician reports and health metrics (e.g., blood pressure, temperature, cardiac events).

When it comes to artificial intelligence, mainstream news has a lopsided tendency to focus on driverless cars. However, precision medicine is currently one of the most robust applications of artificial intelligence to society that has both significant, positive social impacts AND serious ethical discussions about privacy, social engineering, and just responses to human suffering.




AI for good…?

In conferences around the world, professors and tech-world leaders are wondering whether and how AI can operate for the social good, or at least can be made to avoid doubling down on existing inequality, racial segregation, and our human propensity to invent things that are sort of good, but sometimes fatal (e.g. cars, our diets).

The University of Southern California has announced a Center on Artificial Intelligence for Society to leverage AI for broad social impact. Stanford earlier announced its One Hundred Year Study to “collect and connect reflections about AI and its influences as the field advances”.

Philip Koopman, a professor of electrical and computer engineering at Carnegie Mellon University, explained that self-driving cars may never be verifiably safe because machine learning techniques learn by induction, not by optimizing for a set of known requirements. Because cities and humans are not fully predictable, self-driving cars will always be encountering something for the first time. Mileage may vary. But that could be the least of our concerns.

DataKind, the non-profit that matches other non-profits with data scientists to optimize their social impact, received a $2m, 3-year commitment from the Omidyar Network.

Huawei committed $1m to Berkeley’s Artificial Intelligence Laboratory. There was no mention of social impact, only a commitment to advancing basic science.

Why isn’t advancing basic science always tied to social impact? That’s a question worth considering for a country responsible for The Manhattan Project.




FACT SHEET: The Opportunity Project – Unleashing the power of open data to build stronger ladders of opportunity for all Americans | whitehouse.gov

The White House, Office of the Press Secretary


from October 06, 2016

By providing easy access to curated federal and local datasets at opportunity.census.gov, and facilitating collaboration between technologists, issue experts, and community leaders, the Opportunity Project is transforming government data into digital tools that create more just and equitable communities and help people solve problems in their everyday lives.




AI For A Better Tomorrow

USC, Viterbi School of Engineering


from October 07, 2016

To advance artificial intelligence research, Tambe and Professor Eric Rice of the USC Suzanne Dworak-Peck School of Social Work have joined forces to co-direct the Center on Artificial Intelligence for Society, or CAIS. At the center, one of the first such university-based institutes dedicated to studying AI as a force for good, researchers will leverage artificial intelligence to address myriad problems ranging from climate change to security to health to homelessness.




Computing Research and the Emerging Field of Data Science

Computing Research Association, CRA Bulletin


from October 07, 2016

Our ability to collect, manipulate, analyze, and act on vast amounts of data is having a profound impact on all aspects of society. This transformation has led to the emergence of data science as a new discipline[1]. The explosive growth of interest in this area has been driven by research in social, natural, and physical sciences with access to data at an unprecedented scale and variety, by industry assembling huge amounts of operational and behavioral information to create new services and sources of revenue, and by government, social services and non-profits leveraging data for social good. This emerging discipline relies on a novel mix of mathematical and statistical modeling, computational thinking and methods, data representation and management, and domain expertise. While computing fields already provide many principles, tools and techniques to support data science applications and use cases, the computer science community also has the opportunity to contribute to the new research needed to further drive the development of the field. In addition, the community has the obligation to engage in developing guidelines for the responsible use of data science.




Disaggregating the evidence linking biodiversity and ecosystem services

Nature Communications; Taylor H. Ricketts et al.


from October 07, 2016

Ecosystem services (ES) are an increasingly popular policy framework for connecting biodiversity with human well-being. These efforts typically assume that biodiversity and ES covary, but the relationship between them remains remarkably unclear. Here we analyse >500 recent papers and show that reported relationships differ among ES, methods of measuring biodiversity and ES, and three different approaches to linking them (spatial correlations, management comparisons and functional experiments). For spatial correlations, biodiversity relates more strongly to measures of ES supply than to resulting human benefits. For management comparisons, biodiversity of ‘service providers’ predicts ES more often than biodiversity of functionally unrelated taxa, but the opposite is true for spatial correlations. Functional experiments occur at smaller spatial scales than management and spatial studies, which show contrasting responses to scale. Our results illuminate the varying dynamics relating biodiversity to ES, and show the importance of matching management efforts to the most relevant scientific evidence. [full text]




Ron Wyden Discusses Encryption, Data Privacy and Security

The New York Times


from October 09, 2016

Ron Wyden, a Democratic senator from Oregon, has been a leading voice on the side of encryption and against giving the Justice Department more power to get consumer data from tech companies. Mr. Wyden, a member of the Senate Select Committee on Intelligence, recently talked to The New York Times about the privacy-versus-security debate.




Real AI products arrive

O'Reilly Media, Jon Bruner, Bots podcast


from October 06, 2016

Something remarkable is happening in the world of artificial intelligence. At the O’Reilly AI Conference in New York, people weren’t just talking about AI as a far-off dream; they were talking about AI as something that exists in real products today.

In this episode of the O’Reilly Bots podcast, I talk with three artificial-intelligence practitioners about the real practice of AI: Hilary Mason, Jimi Smoot, and Roger Chen. [audio, 32:26]

 
Events



National Data Service Consortium Workshop



Pittsburgh, PA Our theme is New Frontiers in Data Discovery: Collaboration with Research Libraries. Wednesday-Friday, October 19-21. [free]

Join CTSP for social impact Un-Pitch Day on October 21st



Berkeley, CA Join us for Un-Pitch Day on October 21st from 3 – 7pm, where Berkeley graduate students will offer their technical expertise to help solve your organization’s pressing technology challenges.

Ekphrasis: A Symposium on Data and Information Visualization in the Arts and Sciences



New York, NY Friday, November 18, 2016. Deadline to submit projects to participate is Friday, October 21.

8th Global Deep Learning Summit



San Francisco, CA Thursday-Friday, January 26-27. [$$$$]
 
NYU Center for Data Science News





PhD Program in Data Science – NYU Center for Data Science

NYU Center for Data Science


from January 04, 2017

Applications for inaugural class of PhD students accepted now for fall 2017 start date.

Admitted students are guaranteed financial support in the form of tuition and a stipend in the fall and spring semesters for 5 years. [video, 4:28]

 
Tools & Resources



traces — traces 0.3.1 documentation

Mike Stringer


from September 28, 2016

A Python library for unevenly-spaced time series analysis.
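The core operation such a library provides — looking up the value in effect at an arbitrary time, given measurements that arrive at irregular intervals — can be sketched with the standard library alone. This is a conceptual illustration of the idea, not the traces API itself; the `UnevenSeries` class and its methods are hypothetical names for this sketch:

```python
from bisect import bisect_right
from datetime import datetime

class UnevenSeries:
    """Minimal sketch of an unevenly-spaced time series: each lookup
    returns the most recent measurement at or before the requested
    time (a zero-order hold)."""

    def __init__(self):
        self._times = []   # sorted measurement timestamps
        self._values = []  # values observed at those timestamps

    def set(self, t, value):
        # Insert the measurement while keeping timestamps sorted.
        i = bisect_right(self._times, t)
        self._times.insert(i, t)
        self._values.insert(i, value)

    def get(self, t):
        # Find the last measurement at or before time t.
        i = bisect_right(self._times, t)
        if i == 0:
            raise KeyError("no measurement at or before %r" % t)
        return self._values[i - 1]

# Irregular sensor readings: a switch flips on at 07:45, off at 09:10.
ts = UnevenSeries()
ts.set(datetime(2016, 10, 1, 6, 0), 0)
ts.set(datetime(2016, 10, 1, 7, 45), 1)
ts.set(datetime(2016, 10, 1, 9, 10), 0)

print(ts.get(datetime(2016, 10, 1, 8, 0)))  # value in effect at 08:00 -> 1
```

A library like traces builds on this kind of lookup to support aggregations (e.g., time-weighted averages) over arbitrarily spaced observations without first resampling to a regular grid.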


A Dramatic Tour through Python’s Data Visualization Landscape (including ggplot and Altair) – Regress to Impress

Dan Saber, Regress to Impress blog


from October 02, 2016

I’m using my discovery of Altair as an opportunity to step back — to investigate how Python’s statistical visualization options hang together. I hope this investigation proves helpful for you as well.


Learning R Versus d3.js for Visualization

Nathan Yau, Flowing Data blog


from October 07, 2016


The path of software development craftsmanship

Dmitri Pavlutin


from September 21, 2016

If you want to be a great software developer, you should without doubt constantly master trending technologies (like React, Redux, Angular) and, in the meantime, put the same amount of effort into regularly learning fundamentals like algorithms, data structures, and the principles of writing scalable, clean code.

The article contains my own thoughts on how to deal with development complexity. You’ll also find an interesting division of a developer’s progress toward mastery into phases I name the upside-down triangle, the rectangle, and the triangle.

 
Careers


Internships and other temporary positions

Japan Society for the Promotion of Science Fellowship



Social Science Research Council; United States
Postdocs

Postdoc; Data and Network Sciences



Network Science Institute, Northeastern University; Boston, MA
