Data Science Newsletter – May 8, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 8, 2017

GROUP CURATION: N/A

 
 
Data Science News



Another Group of Google Veterans Starts a Self-Driving Technology Company

Bloomberg Technology, Alex Webb


from

A mile away from where Google builds the maps used by people around the world, a 25-person startup is trying to do something similar for robots.

DeepMap Inc., which was founded by mapping veterans of Alphabet Inc., is building systems enabling self-driving cars to steer through complex cityscapes. DeepMap plans to license its map-building software to automakers and technology companies looking to teach cars how to drive.


Company Data Science News

Paperspace is a start-up offering cheap machine-learning-as-a-service (MLaaS) in secure shell or web-based Linux instances. Students and postdocs may want to give it a try – it’s easier and cheaper than AWS. While some are skeptical that machine learning practitioners are thwarted by getting set up in AWS, Caroline Sinders makes the case that machine learning desperately needs better UX. Her argument is congruent with those calling for more algorithmic transparency and explainability.

Time Warner is trying to make its three-year-old Turner subsidiary the Spotify of TV. Turner calls itself a “fan engagement company” and intends to fight for our attention by remastering the mobile experience – clickable CNN feeds? – allowing us to create TV playlists, then using our playlists + machine learning to predict and construct live linear feeds of content snippets drawn from across the network.

Speaking of Spotify, Wired tells the story of how a relatively unknown artist, Starley Hope, hit the Billboard Hot 100 chart thanks to Spotify’s human + algorithmic cooperative boost.

Brendan Frey, a University of Toronto professor and founder of Deep Genomics, is pivoting to drug development. Frey will start by focusing on “drugs for Mendelian disorders, inherited diseases that result from a single genetic mutation”. I can see a strong case for allowing professors to found start-ups, something that is typically frowned upon because it takes professors away from teaching. I imagine many postdocs will not only receive excellent training from working with Frey, but may also find that Deep Genomics is better positioned than academia to offer them full-time, permanent employment.

Entrupy is a start-up that uses computer vision algorithms to detect counterfeit products. Here’s the kicker: Yann LeCun is an investor. Cashing in after surviving the bitter years of AI winter. Nice.

Grammarly raised $110m from VCs to instantly correct grammatical errors as users type. (With the lessons we’ve learned from auto-correct and Microsoft’s talking paperclip Clippy, I anticipate a mix of comedy, confusion, irritation, and occasional gratitude.) The company seems to have dropped a serious ad spend on YouTube (source: my personal viewing experience). As a freshman writing professor, I wonder how much impact dropping $110m on better grammar instruction would have. I find my students eager – even demanding – when it comes to pumping me for information about the rules of grammar and style. It is possible I have the world’s best students.

Microsoft Ventures invested in start-up Agolo out of Columbia University, a company that wants to “defeat information overload”. Don’t you love how aggressive the language is in tech? A synonym might be: summarize. Or, if you’re feeling meditative, which is so hot right now, you may prefer distill.

Facebook is launching a convolutional neural net (CNN) approach to language translation.

Nvidia is opening the Deep Learning Institute, through which it aims to train 2,000 developers and data scientists in applied AI techniques. Courses cost only $30! Where academia cannot keep up with STEM educational demands, industry may step in and carry some of the training load.


Every data point has a human story

World Bank, The Data Blog, Raka Banerjee


from

Citizens4Earth follows Talip Kilic from the World Bank’s Living Standards Measurement Study program as he travels to far-flung rural communities in central and southwestern Uganda, along with the survey teams for the Uganda National Panel Survey (UNPS). In the episode, James Muwonge (Director of Socioeconomic Surveys at the Uganda Bureau of Statistics) explains why household surveys like the UNPS are so important for investment decisions and policy-making, particularly in developing countries like Uganda.


How Technology is Transforming Media Consumption

Cornell Tech, News & Views


from

Turner will soon be able to use technology to serve individualized content based on people’s preferences. “Personalized networking is going to completely transform how people enjoy media. It takes on demand to a whole new level,” Martin said. “The content you watch will be increasingly customized to you.”

Martin gave the example of Spotify, highlighting that Turner wants to bring that level of customization to video. As the technology evolves, people will be able to create content lists, archive them, put them in libraries, and access them whenever and wherever they want. If people don’t want to do it on an on-demand basis, they can be served different live linear streams.


An AI-Driven Genomics Company Is Turning to Drugs

MIT Technology Review, Will Knight


from

Deep Genomics, a Canadian company that uses machine learning to trace potential genetic causes for disease, announced Tuesday that it’s getting into drug development. It joins a growing list of AI companies betting that their techniques can help produce powerful new drugs by finding subtle signals in huge quantities of genomic data.

Deep Genomics was founded by Brendan Frey, a professor at the University of Toronto who specializes in both machine learning and genomic medicine. His company uses deep learning, or very large neural networks, to analyze genomic data. Identifying one or more genes responsible for a disease can help researchers develop a drug that addresses the behavior of the faulty genes.


The future is in interactive storytelling

The Conversation, Noah Wardrip-Fruin and Michael Mateas


from

What is out there for the player who wants to explore on his or her own in rich universes like the ones created by Marvel? Not much. Not yet. But the future of media is coming.

As longtime experimenters and scholars in interactive narrative who are now building a new academic discipline we call “computational media,” we are working to create new forms of interactive storytelling, strongly shaped by the choices of the audience. People want to explore, through play, themes like those in Marvel’s stories, about creating family, valuing diversity and living responsibly.

These experiences will need compelling computer-generated characters, not the husks that now speak to us from smartphones and home assistants. And they’ll need virtual environments that are more than just simulated space – environments that feel alive, responsive and emotionally meaningful.


Seven Ways to Transform 21st-Century Economics — and Economists

Evonomics, Kate Raworth


from

What if we started economics with humanity’s goals for the 21st century, and then asked what economic mindset would give us half a chance of achieving them?

Spurred on by this question, I pushed aside my old economics textbooks and sought out the best emerging ideas that I could find, drawing on diverse schools of thought including complexity, ecological, feminist, behavioural and institutional economics, and set out to discover what happens when they all dance on the same page. The insights that I drew out imply that the economic future will be fascinating, but wildly unlike the past, so long as we equip ourselves with the mindset needed to take it on. So here are seven ways in which I believe we can all start to think like 21st century economists:

1. Change the goal: from GDP growth to the Doughnut.


Microsoft CEO Says Company’s Next Smartphone Might Not Look Like One

Digital Trends, Kyle Wiggers


from

Microsoft’s next phone may not look like a phone. That is according to company CEO Satya Nadella, who sat down with Marketplace’s “Make Me Smart” for a lengthy interview about Microsoft’s new education platforms.

“We make phones today, we have OEMs like HP making phones and others and we picked a very specific area to focus on which is management, security, and this one particular feature that we have called Continuum, which is a phone that can even be a desktop,” Nadella explained. “So when you say, ‘When will we make more phones,’ I’m sure we’ll make more phones. But they may not look like phones that are there today.”


The Secret Hit-Making Power of the Spotify Playlist

WIRED, Gear, David Pierce


from

Starley Hope thought she was done with the music biz after writing “Call on Me.” The song felt like her last shot at success after spending years trying to make it as a songwriter. “I decided I was going to give it up and get a ‘real job,’” she told the website Music Love. She released the track as her debut single in July, expecting it to be her swan song.

It wasn’t. “Call on Me” currently sits at No. 80 on the Billboard Hot 100. It hit number one on the Swedish charts, and went gold or platinum in nine countries. Hope wrote a certified hit single, one dwarfed only by the remix from Aussie DJ Ryan Riback. Although you hear it all over the radio now, Spotify users found themselves bouncing along to her synthy strains seven months ago.

The song owes much of its success to Spotify playlists.


The fintech revolution is nigh. Our next move is critical.

The Hill, Allan Grody


from

The U.S. needs a comprehensive fintech and regtech (regulatory technology) plan to efficiently deploy DLT. Without such a plan, we may have set in motion competitive forces prematurely, while pulling things out of current regulations that have significant implications for the financial system’s digital future.

For example, the Dodd-Frank replacement bill, The Financial CHOICE Act, eliminates the Office of Financial Research (OFR). The rationale for elimination focuses almost exclusively on its duplication of economic analysis done by multiple federal agencies. This analysis fails to recognize the OFR’s key role in driving data standardization, a necessity for making both DLT and systemic risk analysis possible.


Physiognomy’s New Clothes

Medium, Blaise Aguera y Arcas


from

The practice of using people’s outer appearance to infer inner character is called physiognomy. While today it is understood to be pseudoscience, the folk belief that there are inferior “types” of people, identifiable by their facial features and body measurements, has at various times been codified into country-wide law, providing a basis to acquire land, block immigration, justify slavery, and permit genocide. When put into practice, the pseudoscience of physiognomy becomes the pseudoscience of scientific racism.

Rapid developments in artificial intelligence and machine learning have enabled scientific racism to enter a new era, in which machine-learned models embed biases present in the human behavior used for model development. Whether intentional or not, this “laundering” of human prejudice through computer algorithms can make those biases appear to be justified objectively.

A recent case in point is Xiaolin Wu and Xi Zhang’s paper, “Automated Inference on Criminality Using Face Images.”


University Data Science News

Oregon State University professor Mas Subramanian created a new blue pigment while looking for new materials to use in electronics. It is called YInMn blue because it is made of yttrium, indium, and manganese, so we should not be surprised that Crayola is hosting a contest to come up with a better name. Subramanian, true to his engineering background, has filed a patent. He will not be the first to own a hue of blue, joining artist Yves Klein, who ‘owns’ International Klein Blue, though Klein’s rights are recognized only in France.


Ed Lazowska, professor of computer science at the University of Washington, was named Geek of the Year at the GeekWire awards. Congrats Ed!! Well-deserved. I think. I don’t quite know what it means to win Geek of the Year, but it sounds (mostly) good. UW has truly been a leader among the three Moore-Sloan Data Science Environment schools.

Let’s get it together when it comes to positive-results publication bias. Tania Bubela, a professor at the University of Alberta in Canada, found that only 45 percent of completed stem cell clinical trials were ever published in academic journals. Unsurprisingly (but so not OK), “Bubela and her colleagues found that 67.3 percent of the studies reported positive outcomes even though the trials were early-stage, safety-focused studies.”

Daniel Katz from the law school at Illinois Institute of Technology used a random forest model to predict US Supreme Court case outcomes with 70 percent accuracy. This is better than what human legal scholars have been able to do, but I know some of you reading this are thinking you can do better with a different model. Right? 😉
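
For readers who want to tinker, here is a minimal sketch of the general approach, assuming scikit-learn and entirely made-up case features; it is not Katz’s actual feature set or pipeline.

    # Minimal sketch of case-outcome prediction with a random forest.
    # Features and labels are synthetic; a real model would use engineered
    # features from the Supreme Court Database.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_cases = 500

    X = np.column_stack([
        rng.integers(0, 14, n_cases),       # hypothetical issue-area code
        rng.integers(0, 2, n_cases),        # lower-court decision direction
        rng.integers(0, 10, n_cases),       # petitioner category
        rng.integers(1953, 2013, n_cases),  # term year
    ])
    y = rng.integers(0, 2, n_cases)         # 1 = lower court reversed

    model = RandomForestClassifier(n_estimators=300, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)
    print("mean cross-validated accuracy: %.2f" % scores.mean())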

UC-Berkeley is bridging a known training gap by offering data science short courses to professors and other instructors.

The University of Pittsburgh has a new School of Computing and Information, and Paul R. Cohen was just named its dean. Cohen was previously a program manager at DARPA specializing in AI.

Daniel Larremore (Santa Fe Institute) and co-authors published a paper in Nature last week that proves community detection in networks will never be ‘solved’ by a universal algorithm. Hallelujah. In a nutshell, their argument is: “any algorithm that’s exceptionally good at finding communities in one type of network must be exceptionally bad at finding communities in another.”
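
As a quick, informal illustration of the point (not the paper’s proof), two standard community-detection algorithms in networkx will often carve the very same graph into different partitions:

    # Two off-the-shelf community detectors applied to the same small graph.
    # Their disagreement is a hands-on hint at why no universal algorithm
    # can win on every kind of network.
    import networkx as nx
    from networkx.algorithms import community

    G = nx.karate_club_graph()

    modularity_comms = community.greedy_modularity_communities(G)
    label_prop_comms = list(community.label_propagation_communities(G))

    print("greedy modularity found", len(modularity_comms), "communities")
    print("label propagation found", len(label_prop_comms), "communities")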

The University of Texas system bought a piece of land in Houston for $215m to house a data science institute. Instead, the pricey purchase seriously upset Texas lawmakers and UT heavyweights, who shut the plan down. It may be resuscitated, this time with Rice, Texas Southern University, Texas A&M University, and UT joining UH. University politics can be dark and twisty, but this one seems to have much larger financial implications than most. Everything is bigger in Texas.

Bad management is rumored to be a problem at Alphabet company Verily, which has just lost Thomas Insel, its lead behavioral scientist and former head of the National Institute of Mental Health. His departure was preceded by those of Vikram Bajaj, Verily’s chief scientific officer at the time, and Dr. Mark Lee. STAT News reports that Verily CEO Andrew Conrad frustrates his colleagues by launching projects that aren’t feasible and by being imperious. Good management is absolutely key to any fledgling endeavor.

Biotia, a Cornell Tech and Weill Cornell Medicine start-up, aims to monitor microbial environments in hospitals by swabbing and genetically sequencing “their high-risk environments, monitoring hygiene, identifying pathogens and tracking antibiotic resistance”. The company is ready for a Series A round of funding as soon as 2018. This is an excellent example of the current and near-term explosion of precision medicine applications coming to fruition.

At Stanford, a precision medicine application allows doctors to titrate drug dosages to individuals in real-time. In this case, the individuals are mice.


Whiskey and Blackjack: What Machine Learning Teaches Humans about Learning

YouTube, Stanford Graduate School of Business


from

Many of the latest developments in artificial intelligence have been about teaching machines to learn like humans—but is there something machines can teach us about learning? Eugene Nho, MBA/MS student in intelligent systems (CS/EE/Neuroscience) at Stanford, explores how a family of machine learning methods called reinforcement learning can shed light on the way we learn from experience.
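
The talk itself isn’t transcribed here, but to get a feel for what “learning from experience” means in reinforcement learning, here is a toy sketch of tabular Q-learning with an epsilon-greedy policy on a heavily simplified blackjack-style game. Every rule and number below is invented for illustration; none of it comes from the talk.

    # Toy reinforcement learning: tabular Q-learning on a crude one-player
    # blackjack-like game. The agent improves purely by trial and error.
    import random
    from collections import defaultdict

    ACTIONS = ["hit", "stand"]
    Q = defaultdict(float)              # Q[(player_total, action)] -> estimate
    alpha, epsilon, episodes = 0.1, 0.1, 50000

    def step(total, action):
        """Return (next_total, reward, done) under made-up rules."""
        if action == "hit":
            total += random.randint(1, 10)
            return (total, -1.0, True) if total > 21 else (total, 0.0, False)
        # stand: win against a fixed notional dealer score of 17
        return (total, 1.0 if total >= 18 else -1.0, True)

    for _ in range(episodes):
        total, done = random.randint(12, 16), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)                     # explore
            else:
                action = max(ACTIONS, key=lambda a: Q[(total, a)])  # exploit
            nxt, reward, done = step(total, action)
            target = reward if done else max(Q[(nxt, a)] for a in ACTIONS)
            Q[(total, action)] += alpha * (target - Q[(total, action)])
            total = nxt

    # The learned policy ends up roughly: hit on low totals, stand near 21.
    for s in range(12, 22):
        print(s, max(ACTIONS, key=lambda a: Q[(s, a)]))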


Stem Cell Trial Data Mostly Go Unpublished

The Scientist Magazine®, Anna Azvolinsky


from

Less than half of completed stem cell studies in humans are published in peer-reviewed journals, according to an analysis of regenerative medicine trials.


When Ocean and Atmosphere Couple, the Climate Wobbles

Eos, Emily Underwood


from

Does an atmospheric jet weaken because of changes in ocean current or temperature, for example, or does it work the other way around, with the ocean responding to changes in the atmosphere? Or is it some of both? In a new study, Vannitsem and Ghil present a new mathematical tool for determining what drives the system: ocean, atmosphere, or the two combined.


Deep learning tech identifies counterfeit goods

CNBC, Nyshka Chandran


from

Entrupy is a portable scanning device that instantly detects imitation designer bags by taking microscopic pictures that capture details of the material, processing, workmanship, serial number, and wear and tear. It then uses deep learning to compare the images against a vast database that includes top luxury brands, and if the bag is deemed authentic, users immediately get a Certificate of Authenticity.

After launching as a paid service in September 2016, the New York-based venture now has over 130 paid customers, almost all of whom are American businesses drawn to the 97.1 percent accuracy rate, explained Entrupy CEO Vidyuth Srinivasan.
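
Entrupy’s system is proprietary, so here is only a generic sketch of the underlying idea – extract features from a photo with a pretrained CNN and compare them against a reference database – assuming PyTorch/torchvision; the file paths and the similarity threshold are hypothetical.

    # Generic image-against-database comparison with a pretrained CNN.
    # This is not Entrupy's pipeline; paths and threshold are placeholders.
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    backbone = models.resnet18(pretrained=True)
    extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def embed(path):
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            return extractor(img).flatten()

    # Hypothetical reference photos of known-authentic items.
    reference = [embed(p) for p in ["authentic_1.jpg", "authentic_2.jpg"]]
    query = embed("query_bag.jpg")

    cos = torch.nn.functional.cosine_similarity
    best = max(cos(query.unsqueeze(0), r.unsqueeze(0)).item() for r in reference)
    print("closest match similarity: %.3f" % best)
    print("looks authentic?", best > 0.8)   # threshold chosen arbitrarily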


In controversial move, Brazil may outsource Amazon deforestation monitoring

Science, ScienceInsider, Erik Stokstad


from

In a major change, Brazil’s Ministry of the Environment is looking for a company to help it monitor deforestation in the Amazon. “This is a surprise for everyone … crazy stuff,” says Tasso Azevedo, coordinator of the Greenhouse Gas Emission Estimate System and Observatório do Clima in São Paulo and former head of the Brazilian Forest Service. The controversial proposal led to the firing of one of the ministry’s top scientists, who is a vice president of the Intergovernmental Panel on Climate Change (IPCC).


Artificial intelligence prevails at predicting Supreme Court decisions

Science, Latest News, Matthew Hutson


from

“See you in the Supreme Court!” President Donald Trump tweeted last week, responding to lower court holds on his national security policies. But is taking cases all the way to the highest court in the land a good idea? Artificial intelligence may soon have the answer. A new study shows that computers can do a better job than legal scholars at predicting Supreme Court decisions, even with less information.

Several other studies have guessed at justices’ behavior with algorithms. A 2011 project, for example, used the votes of any eight justices from 1953 to 2004 to predict the vote of the ninth in those same cases, with 83% accuracy. A 2004 paper tried seeing into the future, by using decisions from the nine justices who’d been on the court since 1994 to predict the outcomes of cases in the 2002 term. That method had an accuracy of 75%.

 
Events



Winner Takes All: How AI Is Learning About Us By Playing Classic Video Games and Why the Results Are So Shocking (workshop) – Tech2025

Tech 2025


from

New York, NY – Interactive workshop featuring guest instructor Julian Togelius (Associate Professor, Artificial Intelligence in Games, NYU Tandon School of Engineering) on training algorithms using various types of games. May 25 at location TBA. [$$]

 
Deadlines



Face Recognition Prize Challenge (FRPC)

From the Challenge.gov webpage, participants will be directed to register with the National Institute of Standards and Technology. Registration closes on June 15, 2017.

Big Data on Human and Social Sciences – History, Issues and Challenges

Lisbon, Portugal – The Instituto de História Contemporânea / Institute for Contemporary History and the History Lab at Columbia University will host an international conference examining the challenges and impact of ‘Big Data’ in the human and social sciences. The conference is November 6-7. Deadline for paper submissions is July 31.
 
NYU Center for Data Science News



The Writer partners with New York University to deploy AI to discern meaning, tone and sentiment to words

The Drum, Laurie Fullerton


from

The language consultancy, The Writer, has today announced its collaboration with New York University to support research into Natural Language Processing (NLP) – the branch of computer science that deals with analyzing and generating human language.

 
Tools & Resources



IARPA Announces Publication of Data from the Good Judgment Project

Office of the Director of National Intelligence


from

The Intelligence Advanced Research Projects Activity, within the Office of the Director of National Intelligence, announced that forecasting data collected during IARPA’s Aggregative Contingent Estimation (ACE) program by team Good Judgment is now available for use by the public and the research community via https://dataverse.harvard.edu/dataverse/gjp. The data set includes millions of forecasts made by participants over the four years of the ACE forecasting competition and has led to many noteworthy findings regarding the accuracy of human judgment about geopolitical events, including the finding that averaging the most historically accurate forecasters outperforms individual forecasters.
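
That last finding is easy to demo on synthetic data. The sketch below is a toy, not the Good Judgment analysis: score forecasters on past questions with the Brier score, keep the historically best ten, and average their forecasts on new questions.

    # Toy "average the best forecasters" demo on synthetic binary questions.
    import numpy as np

    rng = np.random.default_rng(1)
    n_forecasters, n_past, n_new = 100, 200, 50

    skill = rng.uniform(0.05, 0.35, n_forecasters)   # per-forecaster noise level
    past_truth = rng.integers(0, 2, n_past)
    new_truth = rng.integers(0, 2, n_new)

    def forecasts(truth):
        # probability estimates = truth corrupted by forecaster-specific noise
        noise = rng.normal(0.0, skill[:, None], (n_forecasters, truth.size))
        return np.clip(truth + noise, 0, 1)

    past, new = forecasts(past_truth), forecasts(new_truth)

    # Brier scores on past questions identify the historically best forecasters.
    past_brier = ((past - past_truth) ** 2).mean(axis=1)
    top = np.argsort(past_brier)[:10]

    def brier(p, truth):
        return ((p - truth) ** 2).mean()

    print("crowd average Brier:  %.3f" % brier(new.mean(axis=0), new_truth))
    print("top-10 average Brier: %.3f" % brier(new[top].mean(axis=0), new_truth))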


News articles scraped from posts during 2012 and 2013 in the popular news community on reddit: r/worldnews.

GitHub – BenjaminDHorne


from

We extract posts from 2012 and 2013 for one popular news community on reddit: r/worldnews. Once we extract all posts, we extract the voting score, number of comments, post title, and news story URLs from each post. These news story URLs are used to scrape a sample of news articles, including the body text and title text, using a mix of our own code and the Python Goose library. We filter out any article that is under 100 characters or blocked by a paywall. This reddit data comes from Tan and Lee’s reddit post data set (Tan and Lee 2015) and Hessel et al.’s full comment tree extension to that reddit dataset (Hessel, Tan, and Lee 2016).
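
A rough sketch of the scrape-and-filter step described above, assuming goose3 (a maintained fork of the Python Goose library the authors mention) and placeholder URLs rather than the actual r/worldnews link dumps; it is not the repository’s own code.

    # Scrape article title/body for a list of story URLs, then drop short
    # extractions, mirroring the filtering described above.
    from goose3 import Goose

    urls = [
        "https://example.com/some-world-news-story",   # placeholder URLs
        "https://example.com/another-story",
    ]

    g = Goose()
    articles = []
    for url in urls:
        try:
            article = g.extract(url=url)
        except Exception:
            continue                    # skip unreachable or unparsable pages
        body, title = article.cleaned_text, article.title
        if len(body) < 100:             # under 100 characters: likely paywalled
            continue
        articles.append({"url": url, "title": title, "text": body})

    print(len(articles), "articles kept")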


[1705.01509] Neural Models for Information Retrieval

arXiv, Computer Science > Information Retrieval; Bhaskar Mitra, Nick Craswell


from

Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query. Traditional learning to rank models employ machine learning techniques over hand-crafted IR features. By contrast, neural models learn representations of language from raw text that can bridge the gap between query and document vocabulary. Unlike classical IR models, these new machine learning based approaches are data-hungry, requiring large scale training data before they can be deployed. This tutorial introduces basic concepts and intuitions behind neural IR models, and places them in the context of traditional retrieval models. We begin by introducing fundamental concepts of IR and different neural and non-neural approaches to learning vector representations of text. We then review shallow neural IR methods that employ pre-trained neural term embeddings without learning the IR task end-to-end. We introduce deep neural networks next, discussing popular deep architectures. Finally, we review the current DNN models for information retrieval. We conclude with a discussion on potential future directions for neural IR.
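
One of the simplest ideas covered in the tutorial, ranking with pre-trained term embeddings rather than an end-to-end model, fits in a few lines. The tiny embedding table below is made up; in practice you would load word2vec or GloVe vectors.

    # Rank documents by cosine similarity between averaged term embeddings
    # of the query and of each document (a shallow, non-end-to-end approach).
    import numpy as np

    emb = {                                    # toy 4-d term embeddings
        "neural":   np.array([0.9, 0.1, 0.0, 0.2]),
        "ranking":  np.array([0.7, 0.3, 0.1, 0.0]),
        "models":   np.array([0.6, 0.2, 0.2, 0.1]),
        "ocean":    np.array([0.0, 0.1, 0.9, 0.8]),
        "currents": np.array([0.1, 0.0, 0.8, 0.9]),
    }

    def text_vector(text):
        vecs = [emb[t] for t in text.lower().split() if t in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(4)

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    docs = ["neural ranking models", "ocean currents"]
    query = "neural models"
    qv = text_vector(query)

    ranked = sorted(docs, key=lambda d: cosine(qv, text_vector(d)), reverse=True)
    print(ranked)   # the neural IR document should rank first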

 
Careers


Postdocs

Postdoctoral Fellowships – Biomedicine and Bioinformatics – University of Cambridge



University of Cambridge; Cambridge, England
Full-time positions outside academia

Visual Information and Production Specialist



Congressional Budget Office; Washington, DC
