Data Science newsletter – May 24, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for May 24, 2018


 
 
Data Science News



UNL among universities pushing for more federal ag research dollars

Lincoln Journal Star, Eric Gregory



A consortium of 16 public and private universities, co-led by University of Nebraska-Lincoln Chancellor Ronnie Green, is seeking to reverse the trend of shrinking research dollars in the 2018 version of the farm bill now slogging its way through Congress.

FedByScience aims to share the stories of how scientific discoveries have improved food production in the U.S., while also communicating how future findings could help farmers thrive in the future amid turbulent commodity markets, more extreme weather and an uncertain economy.

“A stronger investment in agricultural research can provide the science and innovation that farmers need to navigate these obstacles,” Green said at the effort’s launch last month.


Business Class – Inside the strange, uniform politics of today’s MBA programs—and what it says about America’s elites

New Republic, John Benjamin



MBA programs are not the open forums advertised in admissions brochures. Behind this façade, they are ideological institutions committed to a strict blend of social liberalism and economic conservatism. Though this fusion may be the favorite of American elites—the kinds of people who might repeat that tired line “I’m socially liberal but fiscally conservative”—it takes a strange form in business school. Elite business schooling is tailored to promote two types of solutions to the big problems that arise in society: either greater innovation or freer markets. Proposals other than what’s essentially more business are brushed aside, or else patched over with a type of liberal politics that’s heavy on rhetorical flair but light on relevance outside privileged circles.

It is in this closed ideological loop that we wannabe masters of the universe often struggle to think clearly about the common good or what it takes to achieve it. Today’s MBA programs, insofar as they churn out graduates riveted to this worldview, limit the vision of future leaders at a time when public dissatisfaction with business and its institutions makes our complacency a danger.


The (holy) ghost in the machine: Catholic thinkers tackle the ethics of artificial intelligence

Deseret News, Religion News Service, Jack Jenkins



In a conversation in Rome facilitated by the Vatican last year, Cardinal Gianfranco Ravasi, president of the Pontifical Council for Culture, reportedly weighed whether or not the phrase “artificial intelligence” is an oxymoron and criticized AI scientists’ use of the term “electronic person.” That came on the heels of a July 2017 roundtable on the topic at the Dominican School of Philosophy and Theology in Berkeley, Calif., where speakers from business and technology mixed with academics and theologians.

“The church … has always understood that technology is judged by morality,” the Rev. Eric Salobir, an organizer of the forum, said at the time.

In April, at Santa Clara University, a Jesuit college in California’s Bay Area, University of Notre Dame professor Mark Graves delivered the dean’s lecture on “Robots Reading Theology: AI, Cultural Analytics and Machine Ethics,” citing Thomas Aquinas to describe a form of “machine ethics” that would apply to AI.


This UK startup thinks it can win the self-driving car race with better machine learning

TechCrunch, Steve O'Hear



A new U.K. self-driving car startup founded by Amar Shah and Alex Kendall, two machine learning PhDs from University of Cambridge, is de-cloaking today. Wayve — backed by New York-based Compound, Europe’s Fly Ventures, and Brent Hoberman’s Firstminute Capital — is building what it describes as “end-to-end machine learning algorithms” to make autonomous vehicles a reality, an approach it claims is different to much of the conventional thinking on self-driving cars.

Specifically, as Wayve CEO Shah explained in a call last week, the young company believes that the key to making an autonomous vehicle that is truly just that (i.e. able to drive safely in any environment it is asked to), is a much greater emphasis on the self-learning capability of its software. In other words, self-driving cars is an AI problem first and foremost, and one that he and co-founder Kendall argue requires a very specific machine-learning development skill set.


Company Data Science News

My favorite new technology of the week is a field-ready robot that uses image recognition as it putters up and down rows of crops, spotting weeds and squirting them with the appropriate herbicide from what I’m imagining is a repurposed Super Soaker. This type of technology will allow for a huge reduction in herbicide use because squirt guns use far less liquid than sprinklers.



Microsoft acquired Semantic Machines, a talent-heavy natural language processing group of about 25 people. Their technology and skills are impressive; Microsoft is planning to integrate the tech into its voice assistant, Cortana. Two of the new hires – Dan Klein and Percy Liang – are affiliated with UC Berkeley and Stanford, respectively. Again, this is an interesting version of brain drain, in which talented people split time between industry and academia. Seems promising, but the difficulty is in the details.



The American Civil Liberties Union has asked Amazon to stop selling its low-cost facial recognition technology for image and video data to law enforcement. Why? Because there is no way to stop law enforcement from using the technology to track anyone, rather than tracking only those suspected of criminal behavior. We have certainly seen evidence that everyday cops are tempted to track ex-partners and other people of personal interest. Because it only costs $400 to set up and $6 per month in ongoing fees, there is concern that the technology could rapidly blanket the entire US. The letter calling for discontinuation of Rekognition was also signed by the Electronic Frontier Foundation and Human Rights Watch.



If you are famous, you will struggle to maintain any shred of anonymity. British broadcaster Sky News used the same Amazon Rekognition technology during the royal wedding proceedings to spot celebrities among attendees.



Samsung, avoiding the US and Canada this time around, has announced it will open a new AI research shop in Cambridge, UK. The center will be led, naturally, by Andrew Blake, who is a professor.



Yann LeCun notes that Facebook can detect about 75% of hate speech automatically. Sadly, even the best data science cannot yet detect the difference between fact and fake news. He gives an honest, straightforward interview, in which he says, “there is a myth that if you have more data, you win in terms of progress towards AI. You don’t. The main limitation of making machines more intelligent is not data: it’s new concepts, new algorithms, new math, etc.” He also notes that academics in Europe are “very badly paid,” which is so, so true.



Facebook has released a 12-minute documentary style video to discuss how it has contributed to the spread of fake news and what it plans to do next. The speakers are product managers, engineers, and data scientists.



Amazon Web Services (AWS) introduced AWS DeepLens, a deep-learning-enabled video camera that will be released on 14 June 2018. It is designed for programmers and comes loaded with “pre-trained models, tutorials, and code.” If you buy one of these, let me know how it works.



Elsewhere at Amazon, Alexa is getting creepy – recording conversations and sending them to unwitting recipients, getting triggered by words that do not sound at all like “Alexa” or from tones within television and radio programs that people cannot hear. Watch this SNL clip and then you’ll know what I mean when I say, I don’t know about that.


Computer Memory: Visualizing a Century of Oral History

Medium, Fathom Information Design



As part of our ongoing exploration into representing and understanding large document sets, we dove into the Computer History Museum’s interview archive. From the co-founders of Ethernet and Devo to professors at Carnegie Mellon, the archive includes 800+ oral histories of individuals involved in all aspects of computing over the last century.

Going into this project, our goal was to design an initial landscape overview of the documents. We wanted to find a way to see the nature, volume, and density of the content and connections within the archive.


The Cybersecurity 202: We surveyed 100 security experts. Almost all said state election systems were vulnerable.

The Washington Post, PowerPost, Derek Hawkins



The midterm elections are less than six months away, but an overwhelming 95 percent of digital security experts surveyed by The Cybersecurity 202 say state election systems are not sufficiently protected against cyberthreats.


[1805.06983] Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology

arXiv, Computer Science > Computer Vision and Pattern Recognition; Gabriele Campanella, Vitor Werneck Krauss Silva, Thomas J. Fuchs



In the field of computational pathology, the use of decision support systems powered by state-of-the-art deep learning solutions has been hampered by the lack of large labeled datasets. Until recently, studies relied on datasets on the order of a few hundred slides, which is not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset consisting of 12,160 slides, two orders of magnitude larger than previous datasets in pathology and equivalent to 25 times the pixel count of the entire ImageNet dataset. Given the size of our dataset it is possible for us to train a deep learning model under the Multiple Instance Learning (MIL) assumption where only the overall slide diagnosis is necessary for training, avoiding all the expensive pixel-wise annotations that are usually part of supervised learning approaches. We test our framework on a complex task, that of prostate cancer diagnosis on needle biopsies. We performed a thorough evaluation of the performance of our MIL pipeline under several conditions, achieving an AUC of 0.98 on a held-out test set of 1,824 slides. These results open the way for training accurate diagnosis prediction models at scale, laying the foundation for decision support system deployment in the clinic.
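
For readers new to the MIL assumption mentioned in the abstract, here is a minimal sketch in Python. It assumes the standard formulation, in which a slide (a "bag") is positive if at least one of its tiles (the "instances") is positive, so tile scores can be max-pooled into a slide score. The function names, the 0.5 threshold, and the example probabilities are hypothetical; the abstract does not describe the authors' actual pipeline, and this is not their code.

```python
from typing import List


def slide_score(tile_probs: List[float]) -> float:
    """Max-pool tile-level probabilities into one slide-level score.

    Standard MIL assumption: a slide is positive if any tile is positive,
    so the most suspicious tile determines the slide's score.
    """
    if not tile_probs:
        raise ValueError("a slide (bag) must contain at least one tile")
    return max(tile_probs)


def slide_prediction(tile_probs: List[float], threshold: float = 0.5) -> int:
    """Return 1 (positive) or 0 (negative) for the whole slide.

    The 0.5 threshold is an illustrative default, not a value from the paper.
    """
    return int(slide_score(tile_probs) >= threshold)


def top_tile_index(tile_probs: List[float]) -> int:
    """Index of the highest-scoring tile, a rough localization cue."""
    return max(range(len(tile_probs)), key=tile_probs.__getitem__)


# Hypothetical tile probabilities from some tile-level classifier.
benign_slide = [0.05, 0.10, 0.20]
tumor_slide = [0.05, 0.92, 0.30]

print(slide_prediction(benign_slide))  # no tile crosses the threshold
print(slide_prediction(tumor_slide))   # one suspicious tile flags the slide
print(top_tile_index(tumor_slide))     # which tile drove the decision
```

The appeal of this setup is that training only needs the slide-level diagnosis as a label, which is the point the abstract makes about avoiding pixel-wise annotation.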


Hundreds of Apps Can Empower Stalkers to Track Their Victims

The New York Times, Jennifer Valentino-DeVries



KidGuard is a phone app that markets itself as a tool for keeping tabs on children. But it has also promoted its surveillance for other purposes and run blog posts with headlines like “How to Read Deleted Texts on Your Lover’s Phone.”

A similar app, mSpy, offered advice to a woman on secretly monitoring her husband. Still another, Spyzie, ran ads on Google alongside results for search terms like “catch cheating girlfriend iPhone.”

As digital tools that gather cellphone data for tracking children, friends or lost phones have multiplied in recent years, so have the options for people who abuse the technology to track others without consent.


How the Math Men Overthrew the Mad Men

The New Yorker, Ken Auletta



Once, Mad Men ruled advertising. They’ve now been eclipsed by Math Men—the engineers and data scientists whose province is machines, algorithms, pureed data, and artificial intelligence. Yet Math Men are beleaguered, as Mark Zuckerberg demonstrated when he humbled himself before Congress, in April. Math Men’s adoration of data—coupled with their truculence and an arrogant conviction that their “science” is nearly flawless—has aroused government anger, much as Microsoft did two decades ago.

The power of Math Men is awesome. Google and Facebook each has a market value exceeding the combined value of the six largest advertising and marketing holding companies. Together, they claim six out of every ten dollars spent on digital advertising, and nine out of ten new digital ad dollars. They have become more dominant in what is estimated to be an up to two-trillion-dollar annual global advertising and marketing business. Facebook alone generates more ad dollars than all of America’s newspapers, and Google has twice the ad revenues of Facebook.


A Sweeping Plan to Fix the Subways Comes With a $19 Billion Price Tag

The New York Times, Emma G. Fitzsimmons



A sweeping proposal to overhaul New York City’s subway and improve the broader transit system is expected to cost more than $19 billion, according to two people who were briefed on Tuesday, and goes far beyond the emergency repair plan that was unveiled last summer after the subway fell into crisis.

The proposal by the subway’s new leader, Andy Byford, will be announced on Wednesday in a highly anticipated presentation before the Metropolitan Transportation Authority’s board.

Mr. Byford has warned that the subway needs major upgrades to reverse its precipitous slide and the work will require short-term pain for millions of subway riders. His plan will focus on speeding up the rollout of a new signal system to replace the subway’s current antiquated equipment.


How Has Taylor Swift’s Word Choice Changed Over Time?

R-bloggers, Deeply Trivial blog, Sara Locatelli



Sunday night was a big night for Taylor Swift – not only was she nominated for multiple Billboard Music Awards; she took home Top Female Artist and Top Selling Album. So I thought it was a good time for some more Taylor Swift-themed statistical analysis.

When I started this blog back in 2011, my goal was to write deep thoughts on trivial topics – specifically, to overthink and overanalyze pop culture and related topics that appear fluffy until you really dig into them. Recently, I’ve been blogging more about statistics, research, R, and data science, and I’ve loved getting to teach and share.

But sometimes, you just want to overthink and overanalyze pop culture.

So in a similar vein to the text analysis I’ve been demonstrating on my blog, I decided to answer a question I’m sure we all have – as Taylor Swift moved from country sweetheart to mega pop star, how have the words she uses in her songs changed?


Samsung is opening a massive AI research center in the UK

The Next Web, Matthew Hughes



Deep-pocketed Korean conglomerate Samsung today announced it will open a massive AI research center, based in leafy (and deeply tech-centric) Cambridge.

The Samsung AI Center, Cambridge (SAIC-Cambridge) will be led by Professor Andrew Blake, who is one of Britain’s leading academics in the AI space.


Baidu spins out its global ad business to sharpen its focus on artificial intelligence

TechCrunch, Jon Russell



Baidu, the Chinese search giant, is spinning out its business unit responsible for utility apps and its mobile ad business to sharpen its focus on artificial intelligence.

As part of the spin-out, Baidu is selling a large chunk of its equity in the ‘Global DU’ business to as-yet-undisclosed investors. The plan is to sell “a majority equity stake” in order to take Global DU independent. Once the deal is completed — it is targeted at a Q3 2018 timeframe — Baidu’s share of the business will drop to around 34 percent. Further, the business is likely to raise additional capital for growth.

Spinning out business units is commonplace among Chinese tech companies; Baidu itself recently did so with its financial services business.


Russia Tries to Get Smart about Artificial Intelligence

The Wilson Quarterly, Jill Dougherty & Molly Jay



Spurred from the top, could Russia one day muster real competition with the great powers in the AI field? The implications of the question are great, as indeed, current and potential applications of the technology cross sectors and range from the beneficial to the malign. Russia’s recent social-media-based propaganda campaigns to influence Western elections employed only relatively basic AI, but achieved certain impact. Today, it is unclear whether the country can achieve top AI status, but from Washington to Beijing, other capitals are tracking Moscow’s efforts.

The undisputed world leaders in artificial intelligence are the United States and China. Compared to their investment in the field, the Russian government lags far behind. According to the Russian tech website cnews.ru, the size of the AI/machine learning market in the country was less than $12 million in 2017. That’s estimated to grow, and markedly, to some $460 million by 2020. Still, the figure is dwarfed by the roughly $7.4 billion that the Pentagon budgeted last year on AI and allied fields like big data and cloud computing.

 
Events



ICML 2018 Accepted Papers

ICML



Stockholm, Sweden July 10-15. [$$$]


Games for Change Festival

Games for Change



New York, NY June 28-30. Produced in collaboration with Parsons School of Design. [$$$]


Hacking Dermatology

MIT Hacking Medicine



Cambridge, MA June 8-10. “You and other hackers will dream up solutions and create teams to compete for over $75k in grants. Your projects will be designed to address the most stubborn challenges related to inflammatory skin conditions, including psoriasis and atopic dermatitis, and related patient experience pain points.” [application required]

 
Deadlines



Digital Health Breakthrough Network

“The Digital Health Breakthrough Network is a research incubator that helps startups conduct rapid pilot studies to test their products and move them closer to market. We give you data to help scale or pivot your technology as quickly as possible.” Deadline for the current round of applications is June 15.

Facebook invites submissions for Crisis Informatics research awards

“To support new and innovative research in the area of crisis informatics to make social media tools more useful to people responding to or impacted by a disaster, we invite the academic community to respond to a call for research proposals on crisis informatics and the role of social media in disaster response and recovery.” Deadline for applications is September 20.
 
Tools & Resources



PyTorch Glow

GitHub – pytorch



“This document provides a short description about producing ahead-of-time compiled executable bundles. The motivation for this work is to remove the cost of compile time by allowing the users of Glow to compile the package ahead of time.”


Easily Access Amazon SageMaker Example Notebooks Through the Jupyter Interface

Amazon AWS, About AWS



“You can now access all of the example Jupyter notebooks provided through Amazon SageMaker from a new ‘SageMaker Examples’ tab on the Jupyter interface console to help you get started using machine learning even faster. These examples cover topics like machine learning fundamentals, in-depth instruction on specific algorithms and frameworks, advanced SageMaker features, and integration with Apache Spark.”


Observable: Downloading and Embedding Notebooks

Observable, Jeremy Ashkenas



“Observable provides an open-source runtime, which stitches together a notebook’s cells into a dependency graph and brings them to life through evaluation; a standard library, which provides helpful functions for working with HTML, SVG, generators, files and promises among other useful sundries; and an inspector, which implements the default strategy for rendering DOM and JavaScript values into a live web page — although you’re free to write your own.”
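
To make the "cells stitched into a dependency graph" idea concrete, here is a rough sketch in Python. It is not Observable's runtime or API (which is JavaScript); it only illustrates the general technique of evaluating named cells in dependency order, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `evaluate` function and the example cells are hypothetical.

```python
from graphlib import TopologicalSorter


def evaluate(cells):
    """Evaluate notebook-style cells in dependency order.

    `cells` maps a cell name to (dependency_names, function). Each function
    receives the values of its dependencies as positional arguments.
    """
    # Map each cell to the set of cells it depends on (its predecessors).
    graph = {name: set(deps) for name, (deps, _) in cells.items()}
    values = {}
    # static_order() yields every cell after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        deps, fn = cells[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


# Cells may be declared in any order, as in a notebook.
cells = {
    "area": (("width", "height"), lambda w, h: w * h),
    "width": ((), lambda: 3),
    "height": ((), lambda: 4),
    "label": (("area",), lambda a: f"area = {a}"),
}

result = evaluate(cells)
print(result["label"])
```

A live notebook would additionally re-run only the downstream cells when one value changes; this sketch evaluates everything once.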


Deploy JupyterHub for teaching

GitHub – jupyterhub



“The goal of this repository is to produce a reference deployment of JupyterHub for teaching with nbgrader.”


ZoteroBib

Corporation for Digital Scholarship



“ZoteroBib helps you build a bibliography instantly from any computer or device, without creating an account or installing any software.”


When to Raise a Series A

Y Combinator, Aaron Harris



“One of the hardest questions to answer when considering an A is “when is my company ready?” This is another one of those questions for which there are hundreds of answers on the internet, none of which are particularly satisfying. The reason these answers don’t work is that each rule has so many exceptions as to make the rule seem silly.”


Winds 2.0: It’s Time to Revive RSS

The Stream Blog, Thierry Schellenbach



“Stream is releasing Winds 2.0, an open-source Podcast & RSS Reader. You can download Winds here or head over to GitHub for the code.”

 
Careers


Full-time, non-tenured academic positions

Data Librarian / Community Manager



Northwestern University, Clinical and Translational Sciences Institute; Chicago, IL

Full-time positions outside academia

Special Project Archivist



City of New York, Department of Cultural Affairs; New York, NY