NYU Data Science newsletter – May 6, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 6, 2016


 
Data Science News



Google Is Feeding Romance Novels To Its Artificial Intelligence Engine To Make Its Products More Conversational

BuzzFeed News, Alex Kantrowitz


from May 04, 2016

For the past few months, Google has been feeding text [from romance novels] to an AI engine — all of it taken from steamy romance novels with titles like Unconditional Love, Ignited, Fatal Desire, and Jacked Up. Google’s AI has read them all — every randy, bodice-ripping page — because the researchers overseeing its development have determined that parsing the text of romance novels could be a great way of enhancing the company’s technology with some of the personality and conversational skills it lacks.

 

Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights | whitehouse.gov

whitehouse.gov; Megan Smith, DJ Patil, Cecilia Muñoz


from May 04, 2016

As technology advances and our economic, social, and civic lives become increasingly digital, we are faced with ethical questions of great consequence. Big data and associated technologies create enormous new opportunities to revisit assumptions and instead make data-driven decisions. Properly harnessed, big data can be a tool for overcoming longstanding bias and rooting out discrimination.

The era of big data is also full of risk. The algorithmic systems that turn data into information are not infallible—they rely on the imperfect inputs, logic, probability, and people who design them. Predictors of success can become barriers to entry; careful marketing can be rooted in stereotype. Without deliberate care, these innovations can easily hardwire discrimination, reinforce bias, and mask opportunity.

 

Google Is Helping Flint Prioritize Which Lead Pipes Need to Be Replaced

Gizmodo


from May 03, 2016

Using building records as well as the data collected by water testers, scientists created a predictive algorithm that makes a good guess about the state of each of Flint’s service lines based on certain property indicators, like the age, size, and type of building.

With the information they do know, the scientists will be able to deploy this algorithm across the entire city using machine learning, creating a fairly accurate map which recommends how repairs should be prioritized based on the number of people they will benefit.

 

New algorithm puts time-scrambled data into chronological order

physicsworld.com


from May 04, 2016

An international team of scientists has developed an algorithm that can put data with large time uncertainty into chronological order. After applying statistical techniques to data obtained with a 300 fs (300 × 10⁻¹⁵ s) timing uncertainty, the team was able to describe the laser-driven explosion of a nitrogen molecule with 1 fs resolution – an improvement in time resolution of two orders of magnitude. Because the algorithm is based on statistics, it could potentially be applied to other disciplines with timing uncertainty, such as climate science and astronomy.

 

How computers find naked people

Kill Screen, David Rudin


from May 04, 2016

It’s 2016, and as Clarifai “data scientist and NSFW enthusiast” Ryan Compton (yes, that’s his title) puts it: “the discovery of nude pictures has been a central problem in computer vision for over two decades.” Potter Stewart’s “I know it when I see it” maxim doesn’t apply to technology. Devices don’t really see things, not in any conventional sense. Computers can recognize patterns, but that raises the question of what nudity looks like as a pattern. Humanity, as anyone with an Internet connection can tell you, has conjured all sorts of imaginative ways of being nude, so good luck with that one.

 

New Textio tool uses machine learning to find the most effective words

GeekWire


from May 04, 2016

Picking the right words is a key component of many jobs, and most of the time, we rely on our own vocabulary. But a new feature from Seattle-based tech startup Textio uses the power of machine learning to find the most effective words, thanks to a vast database and the power of a supercomputer.

Textio’s new predictive engine, called Opportunities, was announced today. Designed for job posts and emails to candidates, Opportunities adds to the existing Textio analysis tool that helps companies find diverse employees. The tool already spots language that historically attracts more male candidates and suggests replacements that are more neutral.

 

What We Mean When We Talk About EvGen Part II: Building Out a National System for Evidence Generation

FDA Voice, Rachel E. Sherman


from May 03, 2016

In an earlier FDA Voice blog post, we discussed a pair of concepts – interoperability and connectivity – that are essential prerequisites for the creation of a successful national system for evidence generation (or “EvGen”). In this post, we take a look at how we would apply these constructs as we go about building such a system.

 
Events



Complexity: A Science of the Future?



In recent years, the advent of sophisticated computer technology has allowed studies of complex systems, in which large collections of components interact in nonlinear ways, such as cell colonies, neurons in the brain, the immune system, economic markets, and social groups. In complex systems, simple, nonlinear interactions are iterated over time and give rise to self-organization, evolution, learning, and adaptation—phenomena that eluded explanation until now, or as the physicist Phil Anderson stated, “More is different.” This panel, including selected physicists and mathematicians, will explore whether studies of complexity complement traditional physics or may upend science as we know it.

New York, NY. Monday, May 9, at The New York Academy of Sciences (7 World Trade Center, 250 Greenwich Street, 40th floor), starting at 7:00 p.m.

 

UCLA Information Studies to Host Discussion on Use of Data in Policing



The UCLA Department of Information Studies will host a panel titled, “Community Safety & Self Determination in a Police State: The Impact of Data, Prediction, & Policing in Poor and Working Class Communities.”

Virginia Eubanks, co-founder of the grassroots anti-poverty organization Our Knowledge, Our Power, and members of the Stop LAPD Spying Coalition will discuss the use of data for policing and surveillance of local communities.

Thursday, May 12, at UCLA, starting at 3 p.m. in the GSE&IS Building, Room 111, North Campus.

 
CDS News



Is Big Data Taking Us Closer to the Deeper Questions in Artificial Intelligence?

Edge.org, Gary Marcus


from May 04, 2016

What I’m worried about and what I’m thinking about these days is if we’re really making progress in AI. I’m also interested in the same kind of question in neuroscience, which is that we feel like we’re making progress, but are we?

Let’s take AI first. There’s huge progress in AI, or at least huge interest in AI—a bigger interest than there’s ever been in my lifetime. I’ve been interested in AI since I was a little kid trying to program computers to play chess, and do natural language databases, and things like that, though not very well. [video, 39:01]

 
Tools & Resources



BTrDB: Optimizing Storage System Design for Timeseries Processing

Adrian Colyer, the morning paper blog


from May 04, 2016

It turns out you can accomplish quite a lot with 4,709 lines of Go code! How about a full time-series database implementation, robust enough to be run in production for a year where it stored 2.1 trillion data points, and supporting 119M queries per second (53M inserts per second) in a four-node cluster? Statistical queries over the data complete in 100-250 ms while summarizing up to 4 billion points. It’s pretty space-efficient too, with a 2.9x compression ratio. At the heart of these impressive results is a data structure supporting a novel abstraction for time-series data: a time-partitioning, copy-on-write, version-annotated k-ary tree.
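To make the data structure named above a little more concrete, here is a minimal Python sketch of a time-partitioning, copy-on-write k-ary tree that keeps one root per version. It is an illustration only, not BTrDB's actual Go implementation: the fan-out `K`, the depth `DEPTH`, the integer timestamps, and the names `Node`, `insert`, `summarize`, and `Store` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

K = 4        # fan-out per node (hypothetical value, chosen for illustration)
DEPTH = 3    # the tree covers integer timestamps in [0, K**DEPTH)


@dataclass(frozen=True)
class Node:
    """Immutable node; each node carries summary statistics of its subtree."""
    children: Tuple[Optional["Node"], ...] = (None,) * K  # internal nodes
    points: Tuple[Tuple[int, float], ...] = ()            # leaves
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")


def insert(node, t, v, level=DEPTH, start=0):
    """Return a NEW node containing (t, v); the old node is never modified."""
    node = node or Node()
    stats = dict(count=node.count + 1, total=node.total + v,
                 lo=min(node.lo, v), hi=max(node.hi, v))
    if level == 0:
        return Node(points=node.points + ((t, v),), **stats)
    width = K ** (level - 1)          # time span covered by each child
    i = (t - start) // width          # child whose time slice contains t
    kids = list(node.children)        # copy the child pointers, not the subtrees
    kids[i] = insert(kids[i], t, v, level - 1, start + i * width)
    return Node(children=tuple(kids), **stats)


def summarize(node, t0, t1, level=DEPTH, start=0):
    """Return (count, total) for points with t0 <= t < t1 (integer bounds)."""
    if node is None:
        return (0, 0.0)
    end = start + K ** level
    if t1 <= start or end <= t0:      # no overlap with this subtree
        return (0, 0.0)
    if t0 <= start and end <= t1:     # fully covered: use the stored summary
        return (node.count, node.total)
    width = K ** (level - 1)
    count, total = 0, 0.0
    for i, child in enumerate(node.children):
        c, s = summarize(child, t0, t1, level - 1, start + i * width)
        count, total = count + c, total + s
    return (count, total)


class Store:
    """Keeps every version's root; version 0 is the empty tree."""

    def __init__(self):
        self.versions = [Node()]

    def insert(self, t, v):
        if not 0 <= t < K ** DEPTH:
            raise ValueError("timestamp outside the range covered by the tree")
        self.versions.append(insert(self.versions[-1], t, v))
        return len(self.versions) - 1  # the new version number


store = Store()
v1 = store.insert(5, 2.0)
v2 = store.insert(40, 4.0)
v3 = store.insert(6, 6.0)
```

Each insert copies only the nodes on the path from the root to the affected leaf, so older versions stay readable and untouched subtrees are shared between versions. Because every node stores a summary of its subtree, a range query can stop descending as soon as a node's time slice is fully inside the requested window, which is the general idea behind answering statistical queries without visiting every point.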

 

IU data scientists launch free tools to analyze online trends, memes

IU Bloomington Newsroom


from May 05, 2016

Did more people see #thedress as blue and black or white and gold? How many Twitter users wanted pop star Katy Perry to take the #icebucketchallenge?

The power to explore online social media movements — from the pop cultural to the political — with the same algorithmic sophistication as top experts in the field is now available to journalists, researchers and members of the public from a free, user-friendly online software suite released today by scientists at Indiana University.

The Web-based tools, called the Observatory on Social Media, or “OSoMe” (pronounced “awesome”), provide anyone with an Internet connection the power to analyze online trends, memes and other online bursts of viral activity.

 

A library for probabilistic modeling, inference, and criticism. Deep generative models, variational inference. Runs on TensorFlow.

GitHub – blei-lab/edward, Dustin Tran


from May 05, 2016

Edward is a Python library for probabilistic modeling, inference, and criticism. It enables black box inference for models with discrete and continuous latent variables, neural network parameterizations, and infinite dimensional parameter spaces. Edward serves as a fusion of three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming.

 
