Data Science newsletter – January 31, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for January 31, 2018


 
 
Data Science News



Harvard student helps crack mystery of Inca code

The Boston Globe, Cristela Guerra



“The only history we have of the Inca Empire are ones that were written by Spaniards after they conquered the Incas,” said Urton. “And those have all sorts of problems about the Spaniards writing from their own viewpoint and with their own prejudices. It seemed to me that the khipus represented the Incas’s own histories of themselves.”

Alas, there is no Rosetta Stone for khipus, no translation for what the patterns of knots represent, and no match between the Spanish documents and the khipus themselves. What did exist was the Harvard Khipu Database Project, which Urton established in 2002 to collect all known information about khipus into one centralized repository.


Purdue-affiliated startup designing next-generation hardware, software to propel computer intelligence to next level

Purdue University, Research Foundation News



A Purdue University-affiliated startup is designing next-generation hardware and software for deep learning aimed at enabling computers to understand the world in the same way humans do.

FWDNXT, based in the Purdue Research Park, has developed a low-power mobile coprocessor called Snowflake for accelerating deep neural networks effective at image recognition and classification. Snowflake was designed with the primary goal of optimizing computational efficiency by processing multiple streams of information to mix deep learning and artificial intelligence techniques with augmented reality applications.


Pittsburgh’s Metro21 partnership predicts fire risk with data analytics

statescoop, Jason Shueh



A partnership between the Pittsburgh Bureau of Fire and Carnegie Mellon University’s smart city initiative, Metro21, has developed new tech that can more accurately show a building’s fire risk.

The collaboration has led Metro21’s team of technologists to develop a machine learning model that uses historical data to predict where structure fires are most likely to occur in nonresidential buildings. In a status update on Metro21’s site on Jan. 3, organizers reported that the modeling has proven to be more accurate than the best modeling previously available in the industry.

“For any given 6-month window, we are able to accurately detect … 57 percent of the fire incidents that occurred,” the group’s most recent report reads.
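The 57 percent figure quoted above reads as a detection rate (recall): the share of fires that actually occurred which the model had flagged. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not taken from the Metro21 report.

```python
def recall(detected: int, missed: int) -> float:
    """Share of actual incidents that the model flagged in advance."""
    total = detected + missed
    if total == 0:
        raise ValueError("no incidents in the window")
    return detected / total


# Hypothetical counts for one six-month window (invented for illustration).
detected_fires = 57
missed_fires = 43
print(f"recall = {recall(detected_fires, missed_fires):.2f}")
```

Recall says nothing about how many flagged buildings never had a fire, so it is not the same thing as overall accuracy.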


NIST’s Superconducting Synapse May Be Missing Piece for ‘Artificial Brains’

NIST



Researchers at the National Institute of Standards and Technology (NIST) have built a superconducting switch that “learns” like a biological system and could connect processors and store memories in future computers operating like the human brain.

The NIST switch, described in Science Advances, is called a synapse, like its biological counterpart, and it supplies a missing piece for so-called neuromorphic computers. Envisioned as a new type of artificial intelligence, such computers could boost perception and decision-making for applications such as self-driving cars and cancer diagnosis.


RBC funds new cybersecurity lab at University of Waterloo

CBC News



Increasing online attacks on financial institutions, including banks, is the main reason the Royal Bank of Canada is funding a new cybersecurity lab and research at the University of Waterloo.

RBC is investing $1.78 million to develop advanced cybersecurity and privacy tools in the face of malicious attacks and botnets that target clients’ personal data.


Amazon is becoming its own university

Quartz, Amy X. Wang



The e-commerce/retail/food/books/cloud-computing/etc company made another move this week that, while nowhere near as flashy as the above efforts, tells of curious things to come. Amazon has hired Candace Thille, a leader in learning science, cognitive science, and open education at Stanford University, to be “director of learning science and engineering.” A spokesperson told Inside Higher Ed that Thille will work “with our Global Learning Development Team to scale and innovate workplace learning at Amazon”; Thille herself said she is “not really at liberty to discuss” her new project.

What could Amazon want with a higher education expert? The company already has footholds in the learning market, running several educational resource platforms. But Thille is famous specifically for her data-driven work, conducted at Stanford and Carnegie Mellon University, on nontraditional ways of learning, teaching, and training—all of which are perfect, perhaps even necessary, for the education of employees.


The accuracy, fairness, and limits of predicting recidivism

Science Advances, Julia Dressel and Hany Farid



Algorithms for predicting recidivism are commonly used to assess a criminal defendant’s likelihood of committing a crime. These predictions are used in pretrial, parole, and sentencing decisions. Proponents of these systems argue that big data and advanced machine learning make these analyses more accurate and less biased than humans. We show, however, that the widely used commercial risk assessment software COMPAS is no more accurate or fair than predictions made by people with little or no criminal justice expertise. We further show that a simple linear predictor provided with only two features is nearly equivalent to COMPAS with its 137 features.
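To make "a simple linear predictor provided with only two features" concrete, here is a minimal sketch of a two-feature logistic regression fitted by gradient descent. The excerpt does not say which features or fitting method the authors used, so the feature choice here (age and number of prior convictions), the toy data, and all function names are assumptions for illustration only.

```python
import math

# Hypothetical toy data, invented for illustration (not from the study).
# Each row is (age in decades, number of prior convictions); label 1 = reoffended.
ROWS = [(2.0, 5), (2.2, 4), (2.5, 6), (3.0, 7),
        (4.5, 0), (5.0, 1), (3.8, 0), (6.0, 2)]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, labels, lr=0.1, epochs=3000):
    """Fit a two-feature logistic regression with batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(rows)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(rows, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y  # prediction error
            g1 += err * x1
            g2 += err * x2
            gb += err
        # Step against the average gradient of the log loss.
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return (w1, w2), b


def predict(w, b, row):
    """Return 1 if the linear score is non-negative, else 0."""
    score = w[0] * row[0] + w[1] * row[1] + b
    return 1 if score >= 0 else 0


weights, bias = fit(ROWS, LABELS)
correct = sum(predict(weights, bias, r) == y for r, y in zip(ROWS, LABELS))
print(f"training accuracy: {correct}/{len(ROWS)}")
```

The whole model is three numbers (two weights and a bias), which is the point of the comparison with a 137-feature system.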


Facebook’s Privacy Principles

Facebook



Facebook was built to bring people closer together. We help you connect with friends and family, discover local events and find groups to join. We recognize that people use Facebook to connect, but not everyone wants to share everything with everyone – including with us. It’s important that you have choices when it comes to how your data is used. These are the principles that guide how we approach privacy at Facebook.


EPSRC gives £14 million to projects that take new approaches to Data Science

UK Engineering and Physical Sciences Research Council



Five new research projects that take novel approaches to challenges in data science were announced today. The successful proposals were responding to a call (New Approaches to Data Science) issued by the Engineering and Physical Sciences Research Council (EPSRC), which closed in January last year.

The research will be led by four universities: the University of Glasgow, Lancaster University, the University of Liverpool, and the University of Oxford.

Co-investigators will be drawn from the universities of Bristol, Cambridge, Liverpool, Oxford, and Warwick, Swansea University, the Natural Environment Research Council (NERC) Centre for Ecology and Hydrology, and Science and Technology Facilities Council’s (STFC) Hartree Centre.


Artificial Intelligence May Have Cracked Freaky 600-Year-Old Manuscript

Gizmodo, George Dvorsky



For Greg Kondrak, an expert in natural language processing at the University of Alberta, this seemed a perfect task for artificial intelligence. With the help of his grad student Bradley Hauer, the computer scientists have taken a big step in cracking the code, discovering that the text is written in what appears to be the Hebrew language, and with letters arranged in a fixed pattern. To be fair, the researchers still don’t know the meaning of the Voynich manuscript, but the stage is now set for other experts to join the investigation.

The first step was to figure out the language of the ciphered text. To that end, an AI studied the text of the “Universal Declaration of Human Rights” as it was written in 380 different languages, looking for patterns. Following this training, the AI analyzed the Voynich gibberish, concluding with a high rate of certainty that the text was written in encoded Hebrew. Kondrak and Hauer were taken aback, as they went into the project thinking it was formed from Arabic.


How Artificial Intelligence Realized Chipotle’s ‘Worst Queso Scenario’

Forbes, Christina Troitino



Following a tough series of highly publicized norovirus outbreaks over the summer, Chipotle Mexican Grill’s lackluster queso debut last September perfectly positioned the chain for a flurry of social-media-based schadenfreude. And while viral waves of backlash have been normalized in our current social media climate, Chipotle’s recent queso mishap has caught the attention of artificial intelligence-wielding financial analysts.

For a note titled “Worst Queso Scenario?” RBC trained a machine to assess how social media backlash impacted Chipotle. David Palmer, RBC Capital Markets’ restaurant and packaged food analyst, outlined how RBC Capital Markets’ research department uniquely used artificial intelligence to robustly understand how social media actions, like tweets and Google searches, could be linked to the chain’s future business health. Among other findings, the bank’s machine learning group found that negative tweets outnumbered positive tweets in the weeks after the product’s launch.


The Open Governance Research eXchange (OGRX) expands to include 999 Papers and 18 Partners – Help us meet new milestones in 2018!

NYU GovLab



The Open Governance Research eXchange (OGRX) began with a few ideas and a few key partners seeking to develop answers and solutions to an important question: How to make governance innovation more evidence based by making key findings on what works and why more accessible?

An ever-changing landscape not only requires innovation in how we seek to address major social problems. We also need to move away from faith based innovations to more evidence based ones – promoting and leveraging research across disciplines, and sharing the findings that emerge in a succinct and streamlined manner. Enter OGRX, the first and only platform where innovators can access and search curated research from different disciplines and countries on opening governance.

Today, The Governance Lab at the NYU Tandon School of Engineering and its partners are pleased to announce two exciting milestones for the OGRX community: (1) the addition of three new partners, and (2) the growth of our repository of curated publications.


Announcing Funding for 12 Development Data Innovation Projects

World Bank Data Team



We’re pleased to announce support for 12 projects which seek to improve the way development data are produced, managed, and used. They bring together diverse teams of collaborators from around the world, and are focused on solving challenges in low and lower middle-income countries in Sub-Saharan Africa, East Asia, Latin America, and South Asia.

Following the success of the first round of funding in 2016, in August 2017 we announced a $2.5M fund to support Collaborative Data Innovations for Sustainable Development. The World Bank’s Development Data group, together with the Global Partnership for Sustainable Development Data, called for ideas to improve the production, management, and use of data in the two thematic areas of “Leave No One Behind” and the environment. To ensure funding went to projects that solved real people’s problems, and built solutions that were context-specific and relevant to its audience, applicants were required to include the user, in most cases a government or public entity, in the project team. We were also looking for projects that have the potential to generate learning and knowledge that can be shared, adapted, and reused in other settings.


Animals worldwide stick close to home when humans move in

Nature, News, Emma Marris



Animals living in landscapes used intensively by people travel, on average, only half to one-third as far as animals in more remote areas do, a pattern that’s consistent across dozens of species worldwide. The finding, published today in Science, has implications for important ecological processes linked to animal movement, such as seed transport and nutrient cycling. And it could spell trouble for the animals themselves as the climate changes.

More than 100 scientists around the world shared satellite-tracking data for 803 mammals from 57 species, from impala (Aepyceros melampus) to olive baboons (Papio anubis) and grizzly bears (Ursus arctos). The data charted the animals’ movements over timescales of up to ten days, and were correlated with a Human Footprint Index that measures how deeply our species has impacted a place, using metrics such as population density and the presence of roads and night-time lights.


Government Data Science News

In the government surveillance horror story of the week, Immigration and Customs Enforcement (ICE) now has agency-wide access to a real-time Vigilant Solutions database of every license plate in the US. Vigilant would not comment, but The Verge reports that they get their data from camera-equipped police cars and auto-repossession firms.



The Trump Administration is considering a nationalized 5G network paid for and operated by the federal government. Undertaken reportedly to guard against China, it looks a lot like a Chinese state-owned enterprise. Expect concerted pushback from the big telecoms.



The FBI dropped key tables from the 2016 Uniform Crime Report. The missing tables included data about the relationship between homicide victims and the people who killed them, potentially obscuring the decrease in domestic homicides that appears to correlate with the decline in marriage rates and the increased economic status of women. Tables on drug arrests by type of drug are also missing. It is difficult to reliably study the social impact of the opioid epidemic without drug arrest data. FBI Director Christopher Wray said in testimony to the House Judiciary Committee that these tables would eventually be released. However, some in the White House think Wray may resign over the release of a memo written by House Intelligence Chairman Devin Nunes, a Republican, that makes the FBI look shady, part of what appears to be a Trump administration campaign to discredit the FBI while it conducts an investigation into Russian influence in the 2016 election. Meanwhile, the science community awaits the full release of the 2016 UCR.


Brenda Fitzgerald, current Director of the Centers for Disease Control, is stepping down due to a series of recusals and a refusal to testify in front of Congress. The reason? Her financial interests, including investments in cancer detection and certain opioid concerns, make it difficult for her to appear unbiased. Additionally, Politico reported that she purchased shares in a tobacco company after taking the top job at the CDC.



Michael Schneider, a NIST researcher, published a Science Advances paper solving an efficiency challenge for neuromorphic chips. Designed to run ‘brain-like’ software more quickly and efficiently, neuromorphic chips “accumulate small amounts of information from multiple sources, alter it to produce a different type of signal and fire a burst of electricity only when needed,” just like mammalian neurons. One big remaining issue: the current iteration has to run at temperatures near absolute zero. This type of hardware will race quantum computing to market; by my estimates both are 8-10 years out.

Pittsburgh, in partnership with researchers from Carnegie Mellon University, has a new model that shows a non-residential building’s fire risk more accurately than existing techniques. For any given six-month window, it detects about 57 percent of the fire incidents that occur. Now fire inspectors can target high-risk buildings and rectify hazardous conditions.


Filmmakers at Sundance Explore New Narratives Using Artificial Intelligence

Intel, iQ blog, Deb Miller Landau



At Sundance, visitors to the Intel Tech Lodge on Park City’s Main Street could see demos by Vancouver-based Ziva Dynamics, whose software uses machine learning algorithms to create computer-generated simulations that can transform the way special effects studios create characters.

“Our software equips people with the tools to create the most realistic representations of creatures and characters as possible,” said VFX pioneer James Jacobs, Ziva’s co-CEO, who won a SciTech Academy Award in 2013 for the engineering that powered characters in Avatar, The Hobbit and Planet of the Apes, among others.

Ziva’s software uses offline finite element simulations combined with geometric warping and machine learning to create real-time characters that can progressively learn, according to Jacobs. Instead of learning from time consuming and expensive scans, the software can synthesize data from the simulation itself — in minutes.

 
Events



Love Data Week – formerly Love Your Data Week

Heather Coates; Heidi Beke-Harrigan; Carla Graebner; Megan Hardeman; Julie Vecchio; Stephanie Wright



Worldwide February 12-16. “Similar to Open Access Week, the purpose of the Love Data Week (LDW) event is to raise awareness and build a community to engage on topics related to research data management, sharing, preservation, reuse, and library-based research data services. We will share practical tips, resources, and stories to help researchers at any stage in their career use good data practices.”


Panel Discussion Set for February 25: “The Role of the University in an Era of Science Skepticism and ‘Fake News’”

Virginia Tech, Center for Communicating Science



Blacksburg, VA “The Center for Communicating Science is honored to present a panel discussion on a topic critical for our time: ‘The Role of the University in an Era of Science Skepticism and Fake News.’ Featuring panelists from on and off campus and moderated by science communicator and comedian Brian Malow, the event will take place at 3 p.m. Sunday, February 25, in the Anne and Ellen Fife Theatre at the Moss Arts Center.” [free]

 
Tools & Resources



Heuristics for Scientific Writing (a Machine Learning Perspective)

Zachary C. Lipton, Approximately Correct blog



“These days, as I work with younger students, teaching them how to write clear scientific prose, I find myself repeating these one-liners, and occasionally inventing new ones.”

“The following list consists of easy-to-memorize dictates, each with a short explanation. Some address language, some address positioning, and others address aesthetics. Most are just heuristics so take each with a grain of salt, especially when they come into conflict. But if you’re going to violate one of them, have a good reason. This can be a living document, if you have some gems, please leave a comment.”


minigo: An open-source implementation of the AlphaGoZero algorithm

GitHub – tensorflow



“This is a pure Python implementation of a neural-network based Go AI, using TensorFlow. While inspired by DeepMind’s AlphaGo algorithm, this project is not a DeepMind project nor is it affiliated with the official AlphaGo project.”


From big data to fast data

O'Reilly Radar, Raul Estrada



Enterprise data needs change constantly but at inconsistent rates, and in recent years change has come at an increasing clip. Tools once considered useful for big data applications are no longer sufficient. When batch operations predominated, Hadoop could handle most of an organization’s needs. Developments in other IT areas (think IoT, geolocation, etc.) have changed the way data is collected, stored, distributed, processed and analyzed. Real-time decision needs complicate this scenario and new tools and architectures are needed to handle these challenges efficiently.

Think of the 3 V’s of data: volume, velocity, and variety. For a while big data emphasized data volume; now fast data applications mean velocity and variety are key. Two tendencies have emerged from this evolution: first, the variety and velocity of data that enterprises need for decision making continue to grow. This data includes not only transactional information, but also business data, IoT metrics, operational information, and application logs. Second, modern enterprises need to make those decisions in real time, based on all that collected data. This need is best clarified by looking at how modern shopping websites work.

 
Careers


Full-time positions outside academia

Community manager for books



Collaborative Knowledge Foundation; San Francisco, CA

Director, Tech and Society Solutions Lab



Omidyar Network; Redwood City, CA
Postdocs

Machine Learning for Time Domain​ Science



University of California-Berkeley, Department of Astronomy; Berkeley, CA
