Data Science newsletter – November 29, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for November 29, 2017


 
 
Data Science News



Merton on fintech, retirement, more

top1000funds.com, Amanda White



At the Fiduciary Investors Symposium at the Massachusetts Institute of Technology in October 2017, top1000funds.com interviewed Nobel laureate Robert Merton, Distinguished Professor of Finance at MIT Sloan School of Management.

Merton generously shared his personal journey and lessons he learned from working at MIT and Harvard University over the last 50 years, retirement solutions, his prediction for how fintech will influence advice, and the future of active management (listen to the podcast).


AI-enabled Klarity helps companies identify risks in contracts

MIT Sloan School of Management, Newsroom



New software out of MIT uses artificial intelligence to speed up and reduce the cost of contract review for companies. Klarity is taking on nondisclosure agreements first, part of a strategy to build customer trust and expand its offerings.

In a usual scenario today, a salesperson might receive a draft of a nondisclosure agreement from a potential customer and forward it to a company’s in-house lawyers. It could take a couple days for the legal team to review the contract and send it back — or a couple of weeks. As the salesperson waits, he or she loses the ability to move the deal forward.


Meet the Spreadsheet That Can Solve NYC Transit (and the Man Who Made It)

VICE, Motherboard, Aaron Gordon



Charles Komanoff’s ‘Balanced Transportation Analyzer’ is the most comprehensive accounting possible of how a congestion charge in Manhattan would affect New York City.


U of T geophysicists uncover new evidence for an alternative style of plate tectonics

University of Toronto, UofT News



“It seems the heavy base of the tectonic plate has ‘dripped’ off into the mantle, leaving a massive gap in the plate beneath Central Anatolia. Essentially, by dropping this dense lithospheric anchor, there has been an upward bobbing of the entire land mass across hundreds of kilometres,” said Oğuz H. Göğüş of the Eurasia Institute of Earth Sciences at Istanbul Technical University (ITU), lead author of the study.

It’s a new idea where plate shortening initially squeezed and folded a mountain belt, triggering the thickening and dripping of the deep lithosphere, and then increasing the elevation of most of central Turkey. Puzzled by the presence of such a process at a significant distance away from regular plate tectonic boundaries, the research team set about identifying why, in an area of high heating and high elevation, the lithosphere below is completely gone – something that was recently discovered from seismology.


University Data Science News

“The move from pies to bars has brought better understanding,” said five statistically minded professors arguing for transforming the way we teach. They were not, however, advocating that we move our classes into the bar (though some professors have nonetheless done so). Rather, they were arguing that “data analysis is not purely computational and algorithmic — it is a human behaviour. In this case, the behaviour is made worse by training that was developed for a data-poor era.”



“Because of the frequent shortages of surface water, California has to resort to pumping out water from the ground, but that’s not sustainable because there isn’t enough rain to replenish the amount we’re taking out. It won’t last,” says Caltech grad student Armeen Taeb. He and his colleague Venkat Chandrasekaran at Caltech and John Reager and Michael Turmon at NASA Jet Propulsion Lab have an empirical model of the interconnected statewide water reservoir system. The idea is that the model will allow researchers to collect observations and make predictions about the severity of impending or ongoing droughts with enough time to take measures that would prevent catastrophic localized water unavailability.



Northwestern University’s Neil Kelleher and Philip D. Compton are able to use native mass spectrometry for “native proteomics”. They take protein complexes from cells and tissue and are then able to analyze them at this larger scale with top-down mass spectrometry. The technique preserves certain features and interactions that would be lost in current standard protein imaging practice, which can denature some proteins. While this is probably fairly technical for those of you who aren’t chemists or biologists, this is another big advance in a field that is kicking out big advances all over the place.



Tell your students to take notes in lecture using a pen and paper. New research out of Princeton University, UCLA, York University, McMaster University, and the United States Military Academy suggests that students who take notes on laptops perform worse on tests than those who write their notes on paper. What’s more surprising is that students who are merely sitting near the laptop users also perform worse than those farther away. Typing can be more automatic and thoughtless than writing; Facebook can be directly and indirectly distracting, it appears.

NYU Tandon’s Steve Kuyan talked about the academic brain drain in AI and how this impacts what we teach in universities.



While we’re talking about teaching, it’s good for students (and probably the rest of us) to write about science for a broad audience. They tend to discover better ideas when asked to take a holistic viewpoint.



Research funders like the Wellcome Trust and the Bill and Melinda Gates Foundation have tied grant funding to requirements that researchers share data. Now they’ve penned a missive in the New England Journal of Medicine emphasizing their frustration with the ICMJE’s rather floppy directives on data sharing. They note, “The ICMJE has not mandated data sharing as a requirement for publication, and we find the example statements it provides to be vague and open to interpretation”. Score one for the funders! And science!

Geoff Hinton and Sara Sabour (an Iranian researcher who was denied a visa into the US to work at UW-Seattle) are now implementing capsule networks in Toronto. I’m not so nationalistic that I’m sad the talented Sabour is in Canada. Still, I wanted to share this story because it highlights why scientists need to care about policy. See: Government Data Science News here, every week.



For-profit journals, your business model has always been shady and is now looking shaky. Timothy Gowers writes in celebration of arXiv that, “in mathematics at least, journals have become almost irrelevant: their main purpose is to provide a stamp of approval, and even then one that gives only an imprecise and unreliable indication of how good a paper actually is.”

Like so many other universities we’ve highlighted, Colorado School of Mines, Metropolitan State Denver and CU-Boulder are struggling to keep up with a sky-rocketing demand among undergraduates for computer science instruction.



I’ve heard about people suggesting that the way to protect privacy is to put a price on personal data. I never really thought that would work because the value of any given individual’s data is likely lower than the cost of a buying/selling transaction. But, hey, who am I to speculate? There’s a new paper that takes a much more rigorous approach and determines that making personal data salable might make it more secure? Maybe.



Andreas Mueller of Columbia University’s Data Science Institute helped dermatologist Lynn Petukhova understand the underlying causal structure of hair disorders. Yes, hair disorders are problems worth solving.



Lest you think researchers at Columbia University aren’t working on the most important problems, please note that Yaniv Erlich and his co-authors have developed a method that can quickly and accurately identify people and cell lines from DNA. This is particularly useful in cancer research where cell lines often experience contamination from other types of cells, rendering final results unintelligible.

My colleagues up at NYU Langone Medical School have been working on the City Health Dashboard that delivers data about social determinants of health to cities and the general public. Things like teen birth rate, opioid deaths, walkability, air quality, and a few measures of overall clinical care rates are displayed for four cities. This is a pilot phase that will soon scale to 500 cities. All the data are downloadable. All the visualizations are designed to communicate well to a broad audience.



Physics professors, take note: only 5% of physics BS degree holders go on to become professors. At the PhD level almost half take positions outside academia right away with more leaving academia for industry or government jobs after postdocs. Physics professors Laurie McNeil and Paula Heron argue that this means the way physics is taught has to include “technical writing, programming…and managing”.



UC-Santa Barbara engineers and researchers received $3.4 million from NSF to build “Google docs for scientific images” called BisQue. The goal is to let scientists “share, test, and refine methods that have common underlying algorithms and methods.”


A New Way for Machines to See, Taking Shape in Toronto

The New York Times, Cade Metz



Geoff Hinton and Sara Sabour, a young Google researcher, are exploring an alternative mathematical technique that he calls a capsule network. The idea is to build a system that sees more like a human. If a neural network sees the world in two dimensions, a capsule network can see it in three.

Mr. Hinton, a 69-year-old British expatriate, opened Google’s artificial intelligence lab in Toronto this year. The new lab is emblematic of what some believe to be the future of cutting-edge tech research: Much of it is expected to happen outside the United States in Europe, China and longtime A.I. research centers, like Toronto, that are more welcoming to immigrant researchers.
Ms. Sabour is an Iranian researcher who wound up in Toronto after the United States government denied her a visa to study computer vision at the University of Washington.


The end of an error?

The Times Literary Supplement (UK), Timothy Gowers



What would the world be like without formal peer review? One can get some idea by looking at what the world is already like for many mathematicians. These days, the arXiv is how we disseminate our work, and the arXiv is how we establish priority. A typical pattern is to post a preprint to the arXiv, wait for feedback from other mathematicians who might be interested, post a revised version of the preprint, and send the revised version to a journal. The time between submitting a paper to a journal and its appearing is often a year or two, so by the time it appears in print, it has already been thoroughly assimilated. Furthermore, looking a paper up on the arXiv is much simpler than grappling with most journal websites, so even after publication it is often the arXiv preprint that is read and not the journal’s formatted version. Thus, in mathematics at least, journals have become almost irrelevant: their main purpose is to provide a stamp of approval, and even then one that gives only an imprecise and unreliable indication of how good a paper actually is.

Of course, different disciplines have different needs and very different publishing cultures.


This Antarctic Selfie Is Helping Build the Interplanetary Internet

VICE, Motherboard, Daniel Oberhaus



The selfie above is remarkable for a number of reasons. In the first place, it features three NASA engineers at the National Science Foundation’s McMurdo Station, the largest research base in Antarctica. There’s also the picture inside the picture featuring Vint Cerf, best known as the “father of the internet” for his role in creating the protocols that determine how information is sent on the web.

However, the most remarkable thing about this photo is what’s not pictured: the interplanetary internet it is helping to create.

After taking this photo on November 20, the NASA engineers sent it from McMurdo to the International Space Station. Although many of us trade selfies on a daily basis, sending a snapshot between two of the most remote human habitats in existence requires more than just opening up your photo app of choice. In this case, these NASA researchers were reliant on an experimental way of routing data called Delay (or Disruption) Tolerant Networking, and it’s the future of the internet in interplanetary space.


Scotland at the heart of advances in machine learning

The Scotsman, Graeme Smith



It is no coincidence that 12 years ago, Amazon chose Edinburgh for its first development centre outside North America. Today, our scientists, engineers and designers are harnessing the latest technology to create inventions that help hundreds of millions of customers all over the world. We thrive in Scotland’s capital because of the talent here and because great technologists from all over the world want to come and live in Edinburgh. The University of Edinburgh is an important partner for us and other technology companies in Scotland. Their School of Informatics produces more world-leading and internationally excellent research in computer science and informatics than any other university in the UK.

Access to world-leading research talent in Edinburgh has allowed us to build a team of scientists and engineers that bring state-of-the-art machine learning to more of our business.


Colorado colleges overflowing with huge wave of computer science students

The Denver Post, Monte Whaley



A massive influx of computer science majors in Colorado and across the country is overwhelming college and university classrooms as students opt to gain the skills required to fill nearly 500,000 open jobs in cybersecurity, data science and machine learning.

Freshman James Schreiner said nearly all of his computer science classes at the Colorado School of Mines in Golden are full and his fellow students are brimming with optimism about what tasks and projects they will take on after graduation.

“Computer science is so variable, and you can do almost anything with it,” said Schreiner, who could earn up to $90,000 as a software developer in the first year after graduation, according to industry estimates. “Of course, I am part of a generation born in an age of technology and it’s so easy for us to connect with new ideas.”


New York Startup Unleashes Big Data on Art Investing

Bloomberg Technology, Molly Schuetz



Hedge funds and some of the world’s biggest banks have embraced the predictive properties of machine learning to spot patterns and guide their investment decisions. Could this branch of artificial intelligence be used to divine the vagaries of the art market? A New York startup says it can.

Arthena analyzes hundreds of thousands of data points on works of art—artist, style, medium, size and so forth. Adding a touch of human insight, the company picks pieces it says will generate handsome returns for investors. Arthena currently manages several funds, ranging from low-risk ones that invest in modern art to higher-risk funds that buy works from emerging artists. The startup, which is backed by Foundation Capital, Beamonte Investments and Y Combinator, recently teamed up with brokerage Charles Schwab, which offers a suite of alternative investment offerings.


The Economics Data Revolution Has Growing Pains

BloombergView, Noah Smith



The empirical revolution is a good thing — it will make people take economists more seriously as scientists, and result in fewer nasty surprises for policy makers who in the past might have relied too much on speculative theory.

But rapid growth usually comes with growing pains, and empirical economics is no different. As evidence becomes more and more important to the discipline, it was inevitable that the methods empirical researchers use would come under increasing scrutiny. And that scrutiny was bound to find some systematic mistakes and methodological issues.


Internet’s Top Cop Under Trump May Struggle to Run at Web Speeds

Bloomberg Politics, Todd Shields and David McLaughlin



Federal Communications Commission Chairman Ajit Pai’s plan to gut Obama-era net neutrality rules calls for handing off the job of policing broadband service to an agency with different powers and a different mandate.

Giving the Federal Trade Commission oversight for the web can make sense from Pai’s perspective: It’s a consumer-protection agency that already has taken action against high-speed internet providers.

But, there’s a key difference: The FCC sets rules designed to prevent bad behavior, while the FTC acts after wrongdoing has occurred. That distinction has become a flash point in the debate over Pai’s proposal, which would change the way the government regulates the internet with far-reaching implications for a host of industries.

Opponents say that reactive nature means the trade commission is too slow to oversee the rapidly evolving digital economy.


WeWork To Acquire Meetup

Crunchbase News, Gabriela Barkho



WeWork’s acquisition spree continues.

WeWork will acquire Meetup, Crunchbase News has learned, adding another purchase to the rapidly expanding rental startup’s roster.

The $20 billion coworking space pioneer is set to acquire New York City-based community platform Meetup, according to an internal Meetup meeting that took place today. Meetup CEO and co-founder Scott Heiferman announced the move internally, telling employees “we are being acquired by WeWork.”


RBC joins tech giants like Facebook, Google, Microsoft in setting up AI lab in Montreal

Toronto Star, The Canadian Press, Ross Marowits



The Royal Bank of Canada will open a Borealis AI lab in the new year, joining labs in Toronto and Edmonton. It hopes to have 10 researchers on staff in the first year of operation.


AI faces a power struggle if it is to be used more broadly

New Electronics (UK)



In neural networks, the transfer of data and neuron weights to and from memory dominates power consumption. Bringing data and processing power closer together makes a big difference. A further optimisation is to make the arrays smaller to limit their capacitance. At ISSCC 2017, University of Michigan researchers, led by Professors David Blaauw and Dennis Sylvester, showed a hierarchical way to bring memory and processing closer together for ML.


Government Data Science News

The 2020 Census may change the way it collects data on race and ethnicity by possibly adding a checkbox for people with roots in the Middle East or North Africa, among other proposals. The Census is akin to a sacred text as far as statisticians and social scientists are concerned. This is a big deal.



Kathy Partin, who has led the US Office of Research Integrity (ORI) for what have been described as two “tumultuous” years, was forced out. Details are slim, but she is reportedly fighting to keep her job, one that seems inherently full of conflict. The office’s main objective is to investigate reports of scientific fraud. (Filing under: jobs I am glad someone else does.)

The US Patent Office is raising its filing fee 10% starting 16 January 2018. That should clear up some of the backlog, but it likely won’t provide much of a deterrent to frivolous patent filers.

Story time: A man with a spreadsheet walked into the MTA and congestion pricing was passed into law! No. That didn’t happen. But Charles Komanoff does have a 5.2 MB, 72-worksheet masterpiece that explains how congestion pricing in Manhattan would impact New York City. He determined that a congestion pricing fee of $16 could make buses and subways FREE. For everyone. All the time. The spreadsheet is available for download if you want to poke around with the parameters. Paging Mayor Bill de Blasio and Governor Andrew Cuomo: try not to kill each other while considering this data-driven plan to fund the crippled MTA.

San Francisco could be the first city to have all of its parking prices set dynamically. The San Francisco MTA’s board of directors will vote on this plan next Tuesday.

The Washington DC lobbying of global food conglomerates (Nestle, Kraft, General Mills, etc.) led by the Grocery Manufacturers Association has fallen out of sync, a result of how consumers’ tastes change with hyperspeed as purchase data gets captured and analyzed.



How Bots Broke the FCC’s Public Comment System During the Net Neutrality Debate

WIRED, Security, Issie Lapowsky



On a single day in late May, hundreds of thousands of public comments poured into the Federal Communications Commission regarding its plans to roll back net neutrality protections. A week and a half later, on June 3, hundreds of thousands more followed. The spikes weren’t the voices of pro-net neutrality Americans, worried what will happen if the FCC allows internet service providers to block and throttle content whenever it so chooses. In fact, they weren’t really voices at all.

According to multiple researchers, more than one million of the record 22 million comments the FCC received were from bots that used natural language generation to artificially amplify the call to repeal net neutrality protections. That number may only represent a fraction of the actual bot submissions. The New York Attorney General’s office is currently investigating their source.

But while reports so far have focused on bad actors flooding the FCC with phony content, some of those same techniques also allowed legitimate groups, like the Electronic Frontier Foundation, to tell their members to click a button and send an auto-generated—albeit earnest—comment to the FCC, creating a groundswell of activism among actual humans. The result: A net neutrality comment period that garnered more input from the public than all previous comment periods across all government agencies—combined.


A.I.-powered Ara electric toothbrush keeps track of how well you’re brushing

Digital Trends, Lulu Chang



You may not want to think about your dental hygiene over the holidays, but without a healthy set of pearly whites, how could you possibly eat all that delicious food you’ve been dreaming about since last holiday season? Here to help you avoid the withering stare of your dentist, even in the midst of gluttony, is the Ara from Kolibree, a new AI-powered toothbrush that claims to help your teeth stay on the nice list this season.

This electric toothbrush will ensure that you’re brushing for 2 minutes by staying on for 2 minutes. While you may think that you’re spending 120 seconds at the sink twice a day, data suggests that the average brushing time in the U.S. is only 45 to 70 seconds long (and consequently, that 60 to 90 percent of children and nearly all adults have cavities). But aside from pure time spent brushing, Ara also comes with a Coach+ feature that shows you where you should be expending more effort.


Ohio State University linguistics team using Ohio Supercomputer Center to translate lesser-known languages

Ohio Supercomputer Center



William Schuler, Ph.D., a linguistics professor at The Ohio State University, is part of a project called Low Resource Languages for Emergent Incidents (LORELEI), an initiative through the Defense Advanced Research Projects Agency (DARPA). The LORELEI program’s goal is to develop technology for languages about which translators and linguists know nothing.

As part of LORELEI, Schuler and his team are using the Ohio Supercomputer Center’s Owens Cluster to develop a grammar acquisition algorithm to discover the rules of lesser-known languages, learning the grammars without supervision so disaster relief teams can react quickly.

 
Events



Cutter Lecture on Data Sharing in the Context of Clinical Trials

Harvard T.H. Chan School of Public Health



Boston, MA Friday, December 8, starting at 1:30 p.m., Kresge G1 – Snyder Auditorium (677 Huntington Ave) [free]


Roundtable on Data Science Postsecondary Education Meeting #5: Integrating Ethics and Privacy Concerns into Data Science Education

National Academies of Sciences, Engineering, and Medicine



Washington, DC, and Online Friday, December 8, organized by the National Academies of Sciences, Engineering, and Medicine. “This meeting will bring together data scientists and educators to discuss how to define and strengthen existing data science programs and how to best engage and retain data science students.”

 
Deadlines



Choose your breakthrough of the year!

Pick your favorite breakthrough from the list of candidates below by 3 December. Then check back the next day, when we will start a second round of voting with your four top picks.

Announcing AWS Machine Learning Research Awards

The AWS Machine Learning Research Awards program funds university departments, faculty, PhD students, and post-docs that are conducting novel research in machine learning (ML).



Our goal is to accelerate the development of innovative algorithms, publications, and source code across a wide variety of ML applications and focus areas.

Call for Applications: Join us in Seattle to Become a Data Carpentry Instructor

“Together with Data Carpentry, the West Big Data Innovation Hub is excited to host hands-on training for community members to become Data Carpentry instructors. Training will take place March 22-23, 2018, in Seattle at University of Washington eScience Institute.” Deadline to apply is December 15.

IEEE BIOROB 2018: Call for Workshops and Tutorials

Enschede, The Netherlands Conference is August 26-29, 2018. Deadline for workshop and tutorial submissions is February 15, 2018.
 
Tools & Resources



Future of pandas

Jeff Reback



Architecture overview for the future of pandas, the Python data analysis library [53 slides, from PyData NYC 2017]


Sequence Modeling with CTC

Distill, Awni Hannun



A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.
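
One piece of CTC that is easy to show in code is the collapsing rule that turns a frame-by-frame alignment into an output sequence: merge runs of repeated symbols, then drop the blank token. What follows is a minimal Python sketch of that rule only, not of the full CTC loss and not taken from the guide itself; the function name `ctc_collapse` and the choice of `_` as the blank symbol are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, chosen only for this sketch


def ctc_collapse(alignment, blank=BLANK):
    """Map a per-frame alignment to its output string.

    Step 1: merge consecutive repeats of the same symbol.
    Step 2: remove every blank symbol.
    """
    merged = [symbol for symbol, _ in groupby(alignment)]
    return "".join(symbol for symbol in merged if symbol != blank)


if __name__ == "__main__":
    print(ctc_collapse("hh_e_ll_lo"))
    print(ctc_collapse("hheelllo"))
```

The blank is what lets the output contain doubled letters: `hh_e_ll_lo` collapses to `hello`, whereas `hheelllo`, with no blank between the l's, collapses to `helo`.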


Text Mining “Pitching in a Pinch”

Jim Albert, Exploring Baseball Data with R



[Christy] Mathewson wrote about his baseball pitching experiences in the book Pitching in a Pinch, copyrighted in 1912. This book is freely available as part of Project Gutenberg. It is an interesting read — much of the book is devoted to the pitcher/batter matchups that we focus on today. Anyway, Mathewson’s book is a good springboard for introducing the tidy approach to text mining described in the new book by Julia Silge and David Robinson.
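
The tidy approach to text mining treats a document as a table with one token per row, which can then be filtered and counted like any other data. The post works in R; the following is only a rough Python analogue of the idea using the standard library. The function names, the tiny stop-word set, and the sample sentence are all hypothetical and are not drawn from Mathewson's book or from the post.

```python
import re
from collections import Counter


def tidy_tokens(text):
    """Return one (line_number, word) row per token, lowercased."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[a-z']+", line.lower()):
            rows.append((line_no, word))
    return rows


def word_counts(rows, stop_words=frozenset()):
    """Count words across all rows, skipping any stop words."""
    return Counter(word for _, word in rows if word not in stop_words)


if __name__ == "__main__":
    # Made-up sample text, used only to exercise the functions.
    sample = "The pitcher threw.\nThe batter swung at the pitch."
    rows = tidy_tokens(sample)
    counts = word_counts(rows, stop_words={"the", "at"})
    print(counts.most_common(3))
```

Keeping the line number in each row is what makes later steps, such as counting words per chapter, a simple grouping operation.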


AWS launches Amazon Sumerian to build AR, VR and 3D apps quickly

TechCrunch, Ingrid Lunden



We’d heard months ago that Amazon would be using its Re:Invent AWS event to roll out a new service related to building in mixed reality: augmented reality and virtual reality. And on the eve of the conference kicking off, it’s done just that. Today the company announced Amazon Sumerian, a new platform for developers to build and host VR, AR and 3D apps quickly and with minimal coding, for smartphones and tablets, head-based displays, digital signage and web browsers. As with many other AWS services, Sumerian is “free” to use: you pay only for the storage for what you create.

 
Careers


Postdocs

Postdoctoral Fellow in Energy Data Visualization and Curation



University of California-Berkeley, Library; Berkeley, CA

Full-time, non-tenured academic positions

Research Fellow in Responsible Artificial Intelligence



University of Southampton; Southampton, England

Internships and other temporary positions

UX/IXD/ID/Prototyping Internship



PARC; Palo Alto, CA
