Data Science newsletter – May 16, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 16, 2017

 
 
Data Science News



With Isaac, Nvidia Trains Robots in Virtual Environments

Design News, Chris Wiltz


It’s like training robots using The Matrix. Nvidia has announced Isaac, a virtual simulator that it says will allow robots to learn significantly faster than in the physical world.


Under Trump, inconvenient data is being sidelined

The Washington Post, Juliet Eilperin


The Trump administration has removed or tucked away a wide variety of information that until recently was provided to the public, limiting access, for instance, to disclosures about workplace violations, energy efficiency, and animal welfare abuses.

Some of the information relates to enforcement actions taken by federal agencies against companies and other employers. By lessening access, the administration is sheltering them from the kind of “naming and shaming” that federal officials previously used to influence company behavior, according to digital experts, activists and former Obama administration officials.

The Occupational Safety and Health Administration, for instance, has dramatically scaled back on publicizing its fines against firms.


Germany greenlights self-driving vehicle tests on public roads

ReadWrite, David Curry


Germany’s large automotive industry will soon be able to test self-driving cars in its home country. The upper house of parliament approved a law on Friday that enables manufacturers to test vehicles on public roads for the first time.

The legislation requires a human driver to remain in the vehicle at all times and a black box must be kept inside the car to ascertain when the computer takes control of the vehicle.


The Sloan Foundation funds OpenCitations

OpenCitations


The Alfred P. Sloan Foundation, which funds research and education in science, technology, engineering, mathematics and economics, including a number of key technology projects relating to scholarly communication, has agreed to fund The OpenCitations Enhancement Project, a new project to develop and enhance the OpenCitations Corpus.

As readers of this blog will know, the OpenCitations Corpus is an open scholarly citation database that freely and legally makes available accurate citation data (academic references) to assist scholars with their academic studies, and to serve knowledge to the wider public.


Penn Medicine’s Alzheimer’s Genetics Data Storage Site Receives Five-Year Renewal to Total $7.7 Million

Penn Medicine News


Penn Medicine’s Alzheimer’s disease genetics data repository has received a five-year funding renewal from the National Institute on Aging (NIA) of the National Institutes of Health. The award is expected to total $7.7 million over five years.

A one-stop portal, the NIA Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS), gathers, processes, and organizes massive stores of genetic (and limited, but significant, phenotypic) data and related findings for researchers to explore. This enables in-depth analysis, potentially leading to the discovery of causal genetic variants, disease mechanisms, drug targets, and possible prevention and cure strategies. The site receives data from NIA-funded and other Alzheimer’s disease and Alzheimer’s disease related dementias genetics studies nationwide that it shares with qualified investigators. Much of this research is conducted at 32 NIA-funded nationwide Alzheimer’s Disease Centers, including the Penn Alzheimer’s Disease Core Center.


Why Amazon is eating the world

TechCrunch, Zack Kanter


I co-founded a software startup in December. Each month, I send out an update to our investors to keep them updated on our progress. But the past month was a bit different — our industry (retail) is going through a transformation.

Instead of just writing about our “internal” news, I wrote about the impending apocalypse in the broader world of retail. More specifically, I included some thoughts on Amazon and why their commanding lead is only going to get larger. Amazon is the most impressive company on earth, and I think it is one of the least understood. A few people suggested that I post this publicly, so here goes.


Don’t Grade Teachers With a Bad Algorithm

Bloomberg View, Cathy O'Neil


For more than a decade, a glitchy and unaccountable algorithm has been making life difficult for America’s teachers. The good news is that its reign of terror might finally be drawing to a close.


Engineering researchers apply data science to better predict effect of weather and other conditions on solar panels

Case Western Reserve University, The Daily


In a new study, a team of researchers from Case Western Reserve University and Gebze Technical University (GTU) in Turkey used data science to determine and predict the effects of exposure to weather and other conditions on materials in solar panels.

Using data science to predict the deterioration of such materials could lead to finding new ways to extend their lifetime, the researchers say.

“This makes the cost of solar energy—in this case—better and easier to understand,” said Case Western Reserve Professor Laura Bruckman, who directed the study with Abdulkerim Gok, a research associate at GTU formerly with Case Western Reserve.


UT, Oak Ridge National Laboratory to launch new data science doctoral program

WATE 6


The University of Tennessee and Oak Ridge National Laboratory announced the launch of a new doctoral program in data science Monday.

The data science and engineering program will be a part of the Bredesen Center for Interdisciplinary Research and Graduate Education. It is expected to begin in the fall.


Company Data Science News

CrowdStrike raised $100m in a new funding round. CrowdStrike is led by heavy-hitters in the cybersecurity space: “CEO [George Kurtz] is the former chief technology officer of McAfee, and his co-founder and chief technology officer Dmitri Alperovitch is McAfee’s former head of threat research.”

IBM has new quantum chips that use 17 qubits (up from five), proving that there are advantages to being an old kid on the block. The wise ones among us know how crucial hardware is to technological revolutions like machine learning and artificial intelligence. Google earlier trumpeted plans to build and test a 49-qubit device at some point during 2017.

Etsy is a Brooklyn-based craft sales marketplace and certified b-corp. Being a b-corp allows it to have Brooklyn-y habits legally. For instance, they can pay employees to improve workplace culture and compost food waste. Etsy is one of only two publicly traded companies to have b-corp status…and it may well lose that status as we tick into the next calendar year. If you want to understand the nuts and bolts of inserting positive social impact into corporations where “positive” is usually synonymous with “revenue” or better yet, “profit”, then this is a must-read.

Improbable, a startup in the UK founded by two computer science grads from the University of Cambridge, reeled in a colossal amount of VC funding ($502m). The company simulates systems at scale. So far they have “debuted a working simulation of the internet’s entire underlying infrastructure” and “a working mock-up of the city of Cambridge, with 130,000 virtual inhabitants…traffic and public-transport networks, utilities, power lines and mobile-phone and internet systems”. This company demonstrates aspects of data science coming together like very few others I have seen. Get familiar with Improbable now.

Andreessen Horowitz published a web-based primer on AI. It’s good. There’s an accompanying GitHub repo. Color me surprised by a16z. I’m reconsidering my ideas about which institutions and individuals are doing the community-building work in data science.

The Sidewalk Labs project based at Google NYC has been quieter than I expected. This month Dan Doctoroff hinted that Sidewalk Labs may be partnering with Toronto to build part of Quayside, a 2,000-acre development. Pulling language from the RFP, the tech of the project would “serve as a test bed for emerging technologies, materials, approaches and processes that will address these challenges and create a new global benchmark for sustainable, inclusive and accessible urban development”. That type of world-making is definitely what I expected.

The Partnership on AI added eight new corporate members (Intel, Salesforce, eBay, Sony, SAP, McKinsey & Company, Zalando and Cogitai) and 14 new non-profit members (Allen Institute for Artificial Intelligence, the AI Forum of New Zealand, the Center for Democracy & Technology, the Centre for Internet and Society – India, Data & Society Research Institute, the Digital Asia Hub, the Electronic Frontier Foundation, the Future of Humanity Institute, the Future of Privacy Forum, Human Rights Watch, the Leverhulme Centre for the Future of Intelligence, UNICEF, Upturn, and the XPRIZE Foundation).

Cardiogram, in partnership with University of California San Francisco, modified an Apple Watch so that it can “detect atrial fibrillation with 98.04% sensitivity and 90.2% specificity” using a deep neural network.

Microsoft is now offering certifications in data science, encroaching on the credentialing role of the university. If unis don’t offer the training industry needs fast enough and for the types of people who want it, industry will step in and do it itself. Cloudera and Hortonworks offer Hadoop certifications, which meet a real need.

Do some architectural gawking at Apple’s newly finished headquarters.

You may not want to gawk, but Apple is also building a giant data center in the desert outside Reno.

In the section we reserve for world-leading tech people who spoke publicly last week: Bill Gates wrote a letter to this year’s graduating class in which he lists artificial intelligence as the top field in which to make a difference right now. Jeff Bezos said much the same in a fireside chat with the World Economic Forum. He called AI an “enabler” (in a good way! not in a ‘have another cookie/drink/episode’ kind of way) and went on to enthuse that AI “will empower and improve every business, every government organization, every philanthropy”.

Zillow is getting sued by a homeowner in Chicago over its Zestimate product. Zestimate uses algorithms to estimate the market value of homes and displays them next to the list price. It will be interesting to see if the legal challenge – that Zillow needs to be licensed in order to offer housing appraisals – will hold. This case may have widespread implications for data science applications in general.


‘MARRVEL’ new way to compile big data in human-centric way

Baylor College of Medicine


When a group of researchers in the Undiagnosed Disease Network at Baylor College of Medicine realized they were spending days combing through databases searching for information regarding gene variants, they decided to do something about it. By creating MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) they are now able to help not only their own lab but also researchers everywhere search databases all at once and in a matter of minutes.

This collaborative effort among Baylor, the Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital (NRI) and Harvard Medical School is described in the latest online edition of the American Journal of Human Genetics.


$25 million gift to fund unique new retail school

McGill University, McGill Reporter


The proposed Bensadoun School of Retail Management will offer academic programs from the undergraduate to the PhD levels with input from global retailers, conduct research on the fast-changing sector, and feature a retail innovation lab to test out new ideas, technologies and products within an academic setting.


Why AI researchers like video games

The Economist


Last year Artur Filipowicz, a computer scientist at Princeton University, had a stop-sign problem. Dr Filipowicz is teaching cars how to see and interpret the world, with a view to them being able to drive themselves around unaided. One quality they will need is an ability to recognise stop signs. To that end, he was trying to train an appropriate algorithm. Such training meant showing this algorithm (or, rather, the computer running it) lots of pictures of lots of stop signs in lots of different circumstances: old signs and new signs; clean signs and dirty signs; signs partly obscured by lorries or buildings; signs in sunny places, in rainy places and in foggy ones; signs in the day, at dusk and at night.

Obtaining all these images from photo libraries would have been hard. Going out into the world and shooting them in person would have been tedious. Instead, Dr Filipowicz turned to “Grand Theft Auto V”, the most recent release of a well-known series of video games. “Grand Theft Auto V” is controversial because of its realistic portrayal of crime and violence—but from Dr Filipowicz’s point of view it was ideal, because it also features realistic stop signs. By tinkering with the game’s software, he persuaded it to spit out thousands of pictures of these signs, in all sorts of situations, for his algorithm to digest.


Building a smarter Hacker News

TechCrunch, Ron Miller


With only 24 hours to build an app, it’s never easy to come up with a viable idea, never mind build one that involves training a machine learning algorithm. Yet that’s exactly what a team of four friends accomplished this weekend at the annual TechCrunch Disrupt New York Hackathon.

The team — made up of twin brothers Daniel and David Robinson, Nathan Gould and Chris Riederer — built a website called Tagger News that takes a subset of Hacker News articles, analyzes the content using machine learning algorithms and applies appropriate subject tags. The team included two data scientists, a product architect at a blockchain company and a Columbia computer science PhD student. Oh, and David Robinson co-wrote a book called Text Mining with R for O’Reilly Media. Clearly they didn’t lack skills or brain power.


Dow Jones launches Data, News and Analytics (DNA) Platform

insideBIGDATA


Dow Jones has announced the beta launch of DNA, a platform that empowers third-party businesses to access and integrate Dow Jones news and data into their own solutions.

With the launch of DNA (“Data, News and Analytics”), Dow Jones is opening up its unrivaled archive of premium news and data to customers, partners and their developers for the first time in its 130-year history.

This single platform gives customers and partners the enhanced ability to integrate Dow Jones information into their systems and workflows to power custom solutions such as process automation, predictive modeling and data mining.


Making Progress Toward Open Data: Reflections on Data Sharing at PLOS ONE

PLOS, EveryONE blog, Meg Byrne


Since its inception, PLOS has encouraged data sharing; our original data policy (2003 – March 2014) required authors to share data upon request after publication. In line with PLOS’ ethos of open science and accelerating scientific progress, and in consultation with members of the wider scientific community, PLOS journals strengthened their data policy in March 2014 to further promote transparency and reproducibility.[1] This move was viewed as controversial by many, particularly for PLOS ONE, the largest and most multidisciplinary journal to ever undertake such a mandate. In this post, we look at our experience so far.

In an effort to make data sharing a more integral part of publication and scientific output, PLOS’ data availability policy requires data underlying the results presented in papers to be fully available at time of publication, unless certain restrictions apply.


The Year Ahead for GPU Accelerated Supercomputing

The Next Platform, Nicole Hemsoth


GPU computing has deep roots in supercomputing, but Nvidia is using that springboard to dive head first into the future of deep learning.

This changes the outward-facing focus of the company’s Tesla business from high-end supers to machine learning systems with the expectation that those two formerly distinct areas will find new ways to merge together given the similarity in machine, scalability, and performance requirements. This is not to say that Nvidia is failing the HPC set, but there is a shift in attention from what GPUs can do for Top 500 class machines to what graphics processors can do for AI supercomputers.

The last several annual GPU Technology Conference (GTC) events have focused on Nvidia’s expanding supercomputing presence, beginning in earnest with the launch of the Titan machine at Oak Ridge National Laboratory in 2012, which was the top supercomputer in the world—and the first of its kind in terms of its GPU count. At this year’s GTC, however, we heard far more about slinging together DGX-1 appliances to build AI supercomputers than we did about future Volta-based GPU supercomputers like the forthcoming Summit and Sierra machines.


A16Z AI Playbook

Andreessen Horowitz


Artificial Intelligence (AI) is a set of computer science techniques that, as Stanford professor Andrew Ng is fond of saying, gives your software super powers.

Building on our Primer on Artificial Intelligence, this microsite is intended to help newcomers (both non-technical and technical) begin exploring what’s possible with AI. We’ve met with hundreds of Fortune 500 / Global 2000 companies, startups, and government policy makers asking: “How do I get started with artificial intelligence?” and “What can I do with AI in my own product or company?”

 
Events



International Conference for Free and Open Source Software for Geospatial

The Open Source Geospatial Foundation


Boston, MA FOSS4G Boston 2017 is August 14-19 [$$$]


O’Reilly Artificial Intelligence Conference

O'Reilly Media


San Francisco, CA September 17-20 [$$$$]


Cambridge Data Science Summit

Cambridge Spark


Cambridge, England June 9-10 at Sanger Institute [$$$]

 
Deadlines



NIPS 2017 Dates and Deadlines

Long Beach, CA NIPS 2017 is December 4-9. Deadline for paper submissions is May 19.

ASA Theory Section offering free grad student memberships

The ASA Theory Section is looking to reach out to graduate students who may have theoretical interests but have not joined the section.
 
Tools & Resources



You should make an R package for your paper

Spencer Fox


I had wanted to get into R package creation for a while now, but finally got a chance to do so for my lab’s recent Zika paper, Assessing Real-time Zika Risk in the United States. If you’re interested in learning about the content from that paper you can check out the blogpost hosted on the BMC Infectious Diseases blog. I’ve also given an introduction to the package on the rtZIKVrisk GitHub. Here I wanted to talk a bit about some of the advantages of putting your papers into packages (both for yourself and other researchers).


What’s New in Hadoop 3.0

Edureka, Shubham Sinha


“This ‘What’s New in Hadoop 3.0’ blog focuses on the changes that are expected in Hadoop 3, as it’s still in alpha phase.”


ParlAI: A new software platform for dialog research

Facebook Code, Engineering Blog; Jason Weston, Alexander Miller, Will Feng


“ParlAI (pronounced “par-lay”) is a one-stop shop for dialog research, where researchers can submit new tasks and training algorithms to a single, shared repository.”

 
Careers


Internships and other temporary positions

PULSE Fellowship in Artificial Intelligence, Law, and Policy



UCLA; Los Angeles, CA

Full-time, non-tenured academic positions

Full Stack Research Software Engineer



NYU Center for Urban Science and Progress; New York, NY

Digital Scholarship Librarian



NYU Libraries; New York, NY

Developers to work on NumPy



University of California-Berkeley, Berkeley Institute for Data Science; Berkeley, CA

Postdocs

Linking genetic predictors of high-dimensional imaging phenotypes and disease outcomes



EMBL-European Bioinformatics Institute, NIHR Cambridge Biomedical Research Centre, University of Cambridge; Cambridge, England
