NYU Data Science newsletter – August 2, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 2, 2016

Data Science News

Machines v. hackers: Cybersecurity’s artificial intelligence future

Christian Science Monitor, CSMonitor.com

from July 25, 2016

It’s a common refrain after any recent high-profile breach into federal computers and corporate networks: There aren’t enough skilled cybersecurity professionals to outwit criminal hackers.

That message from officials, executives, and industry experts isn’t just grousing, either. According to industry estimates, the US needs about 200,000 more workers to fill current cybersecurity roles. Globally, the gap is five times higher – an estimated 1 million workers.

How To Fool AI Into Seeing Something That Isn’t There

WIRED, Security

from July 29, 2016

Our machines are littered with security holes, because programmers are human. Humans make mistakes. In building the software that drives these computing systems, they allow code to run in the wrong place. They let the wrong data into the right place. They let in too much data. All this opens doors through which hackers can attack, and they do.

But even when artificial intelligence supplants those human programmers, risks remain. AI makes mistakes, too. As described in a new paper from researchers at Google and OpenAI, the artificial intelligence startup recently bootstrapped by Tesla founder Elon Musk, these risks are apparent in the new breed of AI that is rapidly reinventing our computing systems, and they could be particularly problematic as AI moves into security cameras, sensors, and other devices spread across the physical world. “This is really something that everyone should be thinking about,” says OpenAI researcher and ex-Googler Ian Goodfellow, who wrote the paper alongside current Google researchers Alexey Kurakin and Samy Bengio.

Data Science Summit 2016 – conference videos

YouTube, Turi

from August 01, 2016

The Data Science Summit is the premier event for data scientists and developers to learn the latest innovations and research from industry innovators and academic experts in data science, applied machine learning and building intelligent applications.

The Double Game of Digital Management: Managing in Times of Big Data and Analytics

bcg.perspectives

from July 27, 2016

Digital technologies are profoundly transforming not just products, services, and business processes but also management itself and the inner workings of companies. A new generation of executives is relying on data analytics and artificial intelligence to support and make decisions.

Our aim is to demystify these once-arcane fields and show how they have moved out of the laboratory and into the executive suite.

Trump vs Clinton: worlds apart on science

Nature News & Comment

from July 26, 2016

Presidential candidates begin to make clear their stark differences on climate change, energy production and stem-cell research.

How a happy moment for neuroscience is a sad moment for science

Medium, Mark Humphries

from August 01, 2016

The Allen Institute for Brain Science released a landmark set of data in June. Entitled the “Allen Brain Observatory”, it contains a vast array of recordings from the bit of cortex that deals with vision, while the eyes attached to that bit of cortex were looking at patterns. Not too exciting, you say. In some respects you’d be right: some mouse brain cells became active when shown some frankly boring pictures. Experimental neuroscience is eternally lucky that mice have a very high boredom threshold.

The release of this data took a privately funded institute. It could not have come from a publicly-funded scientist. It is a striking case-study in how modern science is worryingly broken, because it prioritises private achievement over the public good.

What Does the Universe Do When We’re Not Looking?

Universe Today, Fraser Cain

from July 19, 2016

I invite you to become as obsessed as I am about the LSST. … The Large Synoptic Survey Telescope is all about time. Watching the sky over and over, night after night, watching for anything that changes. [video, 7:15]

[1607.08237] The population of long-period transiting exoplanets

arXiv, Astrophysics > Earth and Planetary Astrophysics; Daniel Foreman-Mackey, Timothy D. Morton, David W. Hogg, Eric Agol, Bernhard Schölkopf

from July 27, 2016

The Kepler Mission has discovered thousands of exoplanets and revolutionized our understanding of their population. This large, homogeneous catalog of discoveries has enabled rigorous studies of the occurrence rate of exoplanets and planetary systems as a function of their physical properties. … we perform a fully automated search for long-period exoplanets with only one or two transits in the archival Kepler light curves. When applied to the 40,000 brightest Sun-like target stars, this search produces 16 long-period exoplanet candidates. Of these candidates, 6 are novel discoveries and 5 are in systems with inner short-period transiting planets.

More exoplanets:

Astronomers have released a list of the 20 most “Earth-like” planets (August 05, Wired UK)

What Does the Universe Do When We’re Not Looking? (July 19, Universe Today, Fraser Cain)

Make Algorithms Accountable

The New York Times, Julia Angwin

from August 01, 2016

Algorithms are ubiquitous in our lives. They map out the best route to our destination and help us find new music based on what we listen to now. But they are also being employed to inform fundamental decisions about our lives.

Companies use them to sort through stacks of résumés from job seekers. Credit agencies use them to determine our credit scores. And the criminal justice system is increasingly using algorithms to predict a defendant’s future criminality.

Those computer-generated criminal “risk scores” were at the center of a recent Wisconsin Supreme Court decision that set the first significant limits on the use of risk algorithms in sentencing.

Stanford researchers teach human ethics to autonomous cars

Stanford News, Selina Pan

from August 01, 2016

To actually integrate autonomous vehicles into everyday life, researchers need to teach the cars how to make the safe driving decisions that come intuitively to humans. Stanford engineers are conducting experiments to translate social behavior into algorithms so that self-driving cars will maintain vehicle safety and passenger comfort. [video, 2:21]

[1607.08723] Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner

arXiv, Computer Science > Computation and Language; Emmanuel Dupoux

from July 29, 2016

During their first years of life, infants learn the language(s) of their environment at an amazing speed despite large cross-cultural variations in amount and complexity of the available language input. Understanding this simple fact still escapes current cognitive and linguistic theories. Recently, spectacular progress in the engineering sciences, notably machine learning and wearable technology, offers the promise of revolutionizing the study of cognitive development. Machine learning offers powerful learning algorithms that can achieve human-like performance on many linguistic tasks. Wearable sensors can capture vast amounts of data, which enables the reconstruction of the sensory experience of infants in their natural environment. The project of ‘reverse engineering’ language development, i.e., of building an effective system that mimics infants’ achievements, therefore appears to be within reach.

Here, we analyze the conditions under which such a project can contribute to our scientific understanding of early language development. We argue that instead of defining a sub-problem or simplifying the data, computational models should address the full complexity of the learning situation, and take as input the raw sensory signals available to infants. This implies that (1) accessible but privacy-preserving repositories of home data be set up and widely shared, (2) models be evaluated at different linguistic levels through a benchmark of psycholinguistic tests that can be passed by machines and humans alike, and (3) linguistically and psychologically plausible learning architectures be scaled up to real data using probabilistic/optimization principles from machine learning. We discuss the feasibility of this approach and present preliminary results.

Projected spread of Zika virus in the Americas | bioRxiv

bioRxiv; Alessandro Vespignani et al.

from July 28, 2016

We use a data-driven global stochastic epidemic model to project past and future spread of the Zika virus (ZIKV) in the Americas. The model has high spatial and temporal resolution, and integrates real-world demographic, human mobility, socioeconomic, temperature, and vector density data. We estimate that the first introduction of ZIKV to Brazil likely occurred between August 2013 and April 2014. We provide simulated epidemic profiles of incident ZIKV infections for several countries in the Americas through December 2016. The ZIKV epidemic is characterized by slow growth and high spatial and seasonal heterogeneity, attributable to the dynamics of the mosquito vector and to the characteristics and mobility of the human populations. We project the expected timing and number of cases of microcephaly assuming three levels of risk associated with ZIKV infection during the first trimester of pregnancy. Our approach represents an early modeling effort aimed at projecting the potential magnitude and timing of the ZIKV epidemic that might be refined as new and more accurate data from the region will be available.

Bayesian second law of thermodynamics

Physical Review E; Anthony Bartolotta, Sean M. Carroll, Stefan Leichenauer, and Jason Pollack

from August 01, 2016

We derive a generalization of the second law of thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution. By allowing an experimenter’s knowledge to be updated by the measurement process, this formulation resolves a tension between the fact that the entropy of a statistical system can sometimes fluctuate downward and the information-theoretic idea that knowledge of a stochastically evolving system degrades over time. The Bayesian second law can be written as $\Delta H(\rho_m, \rho) + \langle Q \rangle_{F|m} \geq 0$, where $\Delta H(\rho_m, \rho)$ is the change in the cross entropy between the original phase-space probability distribution $\rho$ and the measurement-updated distribution $\rho_m$ and $\langle Q \rangle_{F|m}$ is the expectation value of a generalized heat flow out of the system. We also derive refined versions of the second law that bound the entropy increase from below by a non-negative number, as well as Bayesian versions of integral fluctuation theorems. We demonstrate the formalism using simple analytical and numerical examples.

Events

Open Data Maker’s Hackathon

During the GODAN Summit 2016 happening September 15-16, 2016, GODAN [Global Open Data for Agriculture & Nutrition] is hosting the Open Data Maker’s Hackathon, bringing together software and agricultural innovators to create the beginnings of practical solutions that allow for better utilizing, collecting, and/or making more accessible open data to improve our food system. In an effort to support young, next generation innovators, this hackathon is intended for current university students and/or entrepreneurs 26 years old and younger.

New York, NY Thursday-Friday, September 15-16, at the New York Hilton Midtown.

Deadlines

The Lightning Challenge: Doctoral Students

Applicants to The Lightning Challenge: NYU Doctoral Students are asked to make and upload a 3-minute video of themselves talking about their dissertation research. This video and the accompanying application will be evaluated by the selection committee, which will choose Lightning Challenge participants from those who apply.

Applications are due by Monday, August 15, 2016.

Call for Proposals: LTER NCO Synthesis Working Groups | NCEAS

To promote analysis and synthesis of LTER data, the NCO requests proposals for Synthesis Working Groups, with research to begin before May 2017. Funding is available for 2-4 projects of up to 2 years in duration. The Project Selection Committee is most interested in proposals anticipated to return high impact results in the five core LTER thematic areas (primary production, population studies, organic matter dynamics, mineral cycling, disturbance patterns and processes) as well as the urban LTER themes of land use change and human-environment interactions, but proposals in other areas will also be considered. Principal investigators are strongly encouraged to contact the NCO (proposals@lternet.edu) to briefly discuss ideas before submitting proposals.

The deadline for proposal submission is Wednesday, October 5, 2016.

CDS News

Vasant Dhar on Using Data Science to Tackle Educational Problems

NYU Center for Data Science

from August 01, 2016

Dhar is mostly known for his ongoing work in the financial sector, but he recently began to investigate the role of data science in tracking educational standards, specifically in his home country, India. One of his ongoing research projects is exploring the possibility of using educational smartphone games to assist in childhood education.

Tools & Resources

Personal Data Resources

Research Data Network

from August 02, 2016

Curated list of resources for managing personal data and best practice for anonymisation and preservation.

Deep Reinforcement Learning for Keras.

GitHub – matthiasplappert

from July 31, 2016

keras-rl implements some state-of-the-art deep reinforcement learning algorithms in Python and seamlessly integrates with the deep learning library Keras. Just like Keras, it works with either Theano or TensorFlow, which means that you can train your algorithm efficiently either on CPU or GPU. Furthermore, keras-rl works with OpenAI Gym out of the box. This means that evaluating and playing around with different algorithms is easy.

Faculty research and public records laws

Society of American Archivists' Records Management Roundtable, The Schedule blog

from July 28, 2016

Working in higher education records management, I am fascinated by how different states treat records commonly associated with universities. One of the most unpredictable areas is state law regarding the public record status of public university faculty research.

Content for fifth elephant workshop 2016. Pandas, Luigi, Spark & Flask

GitHub – unnati-xyz

from July 31, 2016

In this workshop, you will learn how to build a scalable data science platform: set up and conduct data engineering using Pandas and Luigi, build a machine learning model with Apache Spark, and deploy it as a predictive API with Flask.

Careers

4 Reasons Not To Get That Masters In Data Science

Forbes, Meta S. Brown
