Data Science newsletter – September 18, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for September 18, 2017


 
 
Data Science News



Should Journals Be Responsible for Reproducibility?

Footnote, William G. Jacoby, Sophia Lafferty-Hess and Thu-Mai Christian



Science is an inherently social enterprise. Progress only occurs when new results are communicated to and accepted by the relevant scientific communities. The major lines of communication run through professional journals and the double-blind peer review process. Academic journals are also a main currency of scholarly success, as publication in a top journal can be a make-or-break career moment for a researcher.

Because of their central role in academic communication and career advancement, journals help set the rules of how research is evaluated and rewarded. At the American Journal of Political Science (AJPS), we work closely with our partners at the Odum Institute for Research in Social Science at University of North Carolina-Chapel Hill and the Qualitative Data Repository at Syracuse University to promote reproducibility and transparency as cornerstones of high-quality research.

While the political science discipline has long paid lip service to the importance of these issues, the AJPS’s Replication & Verification Policy requires scholars to “practice what we preach” by incorporating reproducibility and data-sharing into the academic publication process.¹ Our goal is to establish a standard for the information that must be made available about the research that appears in our journal.


Government Data Science News

Transport for London has decided not to renew Uber’s license to operate in the city. In a truly British turn of phrase, the company was described as being “not fit and proper” to continue operations. It is nearly impossible to go two weeks without including news about Uber.

President Trump’s Office of Science and Technology Policy is still struggling to staff up, with only 42 staffers. That’s down from 130+ during the Obama years. The OSTP has historically been very active in disaster preparedness, which is necessary during hurricane season.

NASA’s experiment in which six people lived in isolation in a lava field in Hawaii for six months has ended. Much fresh fruit was consumed. The project was designed to explore the impact of isolation on cohorts in preparation for sending humans to Mars.

The Department of Defense budget bill contained a clause mandating that the federal government must make certain data openly available to the public. “If, as expected, the DoD budget bill becomes law, the open data amendment provides a legal mandate for the government to adopt open data practices. How that requirement would be met remains unclear” following apparent diminution of the amount of climate change data available. The lack of clarity is a hallmark of the Trump administration.

Santa Monica has a Chief Well-Being Officer whose job is to measure and improve upon the well-being of people living in Santa Monica. They’ve created a well-being index composed of “more than 100 data elements drawn from a survey of individual well-being, publicly available social-media data, and city data on everything from crime to voting and library usage”. It’s a first-of-its-kind approach that could be a harbinger of better city responsiveness in affluent communities like Santa Monica. It’s hard for me to believe that cash-strapped cities will be able to support positions like this. Still, it is quite interesting to note how data can be used for good.


University Data Science News

Geoffrey Hinton, Professor Emeritus at the University of Toronto and one of the frontiersmen of modern AI, is “deeply suspicious” of back-propagation methods. His recommendation for the future of AI methods: “Throw it all away and start again.”

The Institute for Health Metrics and Evaluation at the UW-Seattle released a new global health dataset with a visualization tool.

Germany’s Geisenheim University, a first-timer in the data science newsletter, has installed a Free Air Carbon Dioxide Enrichment (FACE) system that regulates “CO2 levels (and for the vegetables, air temperature as well) over defined areas, making it possible to simulate the conditions expected between now and 2050…to develop climate change adjustment strategies”. Click through for pictures of FACE systems deployed in fields and vineyards.

Stevens Institute of Technology CS researchers have shown that AI makes it much easier to guess passwords, using GANs that correctly guessed 27% of LinkedIn passwords when used with HashCat, a password-cracking tool.

UMass Amherst professors Rosie Cowell and David Huber just won an NIH grant ($2.36m) to develop computational mathematics tools to better interpret fMRI data. As it is, fMRI data is averaged across too many neurons to be mappable to specific brain activations. With this grant, they plan to develop software that does fine-grained analyses.

A new Taulbee Survey used course-level data to track enrollment of women and under-represented minorities in Computer Science courses. They found a “trend of decreasing representation of women at higher course levels”. This is congruent with previous findings, but the authors caution that they have many non-majors in their mid- and upper-level courses so a decrease in women does not mean they are leaving CS. They may not have been CS majors at all.

David Simchi-Levi, an Engineering Systems professor at MIT, developed a machine learning algorithm that helped the online retailer Rue La La increase their revenue by 10 percent.

California Digital Library won an NSF EAGER award to turn data management plans into a key component of a networked research data management ecosystem. Sounds promising.

UC San Diego EE PhD Janelle Shane offered pro-bono machine learning expertise to a local animal shelter that had to name a bunch of newborn guinea pigs quickly. Her algo trained on a set of existing guinea pig names (guys, how is there data on everything!?) and from that training it was able to produce gems like Popchop, Princess Pow, and Fuzzable.

ResearchGate has been pressured by a publishing trade group to limit the distribution of papers by scanning them at upload to “determine if they can be shared publicly or privately”. “Private sharing” in this case means sharing with your co-authors and the vaguely labeled “other private research groups”. Looks like ResearchGate was afraid of being sued for copyright infringement or otherwise badgered by wealthy for-profit publishers.

Yale University has put together an interactive map showing how Americans feel about climate change. It identifies North Dakota as the state with the fewest people who are “very worried” about it.

MIT geophysicist Daniel Rothman has modeled all five of Earth’s known mass extinction events. He predicts that the Earth could tip into another mass extinction pattern in 2100, based on the amount of carbon that will then be in the oceans. It would take another 10,000 years for the changes to play out, but his point is that after 2100 we will be in “unknown territory”.


Vocalytics analyzes your body language to make you a better public speaker

TechCrunch, Frederic Lardinois



Vocalytics, one of the many projects that came out of our Disrupt SF hackathon this weekend, wants to make you a better public speaker.

The project uses machine learning to analyze videos of your performances and gives you feedback on your body language. The team trained the system to look for your hand gestures and pose, but the plan is to expand this project to also look at your eye gaze, facial animation and other non-verbal gestures.

The ultimate goal of Vocalytics, which was built by Danish Dhamani and Paritosh Gupta, is to build an A.I. that can give you feedback on your body language that’s on par with what you’d get from a professional coach.


Company Data Science News

Facebook has opened a new Facebook AI Research lab in Montreal headed by McGill Professor Joelle Pineau. The company also announced it will invest $7 million “to support AI research at academic institutions in Montreal”. This is not at all surprising: it’s becoming more difficult to hire AI talent; Toronto (where Fujitsu Research will build a new R&D center with help from the University of Toronto) and Montreal have a long history of incubating advanced AI techniques and remain strong; and Canadian immigration laws are easier for knowledge workers to navigate.

Apple has given consumers pretty good privacy protection via full-disk encryption in the iPhone and end-to-end encryption in iMessage. However, three researchers spent six months reverse engineering Apple’s code to determine the strength of the differential privacy protection and found it woefully bad. For reference, differential privacy is a technique in which user data is splattered with noise in order to preserve the overall parameters of the dataset but prevent reidentification. As you can imagine, there is a tradeoff: more noise splattering means stronger privacy protection but less useful data. The strength of the protection is summarized by a parameter called epsilon, where lower means more private. Academics prefer epsilon scores of one or lower. Apple’s technology has a score of 6 (macOS) and 14 (iOS 10). And epsilon works on an exponential scale, so a score of 14 is basically indicative of no meaningful privacy protection.
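To make the epsilon scale concrete, here is a minimal sketch of randomized response, a textbook differentially private mechanism for a single yes/no value. It is an illustration only and not Apple’s implementation (which the researchers had to reverse engineer); the function names are hypothetical. Under this mechanism each report is truthful with probability e^epsilon / (1 + e^epsilon), and e^epsilon bounds how far a single report can shift an observer’s odds about the true value.

```python
import math
import random


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    if rng.random() < truth_probability(epsilon):
        return true_bit
    return not true_bit


def likelihood_ratio_bound(epsilon: float) -> float:
    """Worst-case factor by which one report can shift an observer's odds."""
    return math.exp(epsilon)


if __name__ == "__main__":
    # Compare the academic target (1) with the scores reported for Apple (6, 14).
    for eps in (1, 6, 14):
        print(
            f"epsilon={eps:>2}  "
            f"P(truthful report)={truth_probability(eps):.6f}  "
            f"odds bound={likelihood_ratio_bound(eps):,.0f}"
        )
```

At epsilon = 1 a report is truthful about 73% of the time, which leaves each user real deniability. At epsilon = 14 the bound e^epsilon is roughly 1.2 million, so reports are almost never flipped, which is why a score that high offers little protection.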



Microsoft Azure announced a new level of “confidential computing” in the cloud, potentially solving a business pain point associated with housing sensitive information in third-party clouds. The new guarantee offers protection from malicious insiders with administrative privileges, hackers and malware, and other third parties who do not have consent.

“In July, the Director of the World Wide Web Consortium overruled dozens of members’ objections to publishing a DRM standard without a compromise to protect accessibility, security research, archiving, and competition.” The Electronic Frontier Foundation appealed that decision, lost, and then resigned, writing, “Today, the W3C bequeaths a legally unauditable attack-surface to browsers used by billions of people. They give media companies the power to sue or intimidate away those who might re-purpose video for people with disabilities. They side against the archivists who are scrambling to preserve the public record of our era. The W3C process has been abused by companies that made their fortunes by upsetting the established order, and now, thanks to EME, they’ll be able to ensure no one ever subjects them to the same innovative pressures.”

Kaspersky Lab, a company we wrote about last week because the federal government officially banned its use on government computers over fears of Russian government spying, has opened a pop-up store in London. The store carries t-shirts, mugs, and art by Ben Eine but *people can only pay with their personal data*. No cash. Just your WhatsApp, chat, or email screenshots. Um…what?

Facebook has been receiving criticism from academics and policy watchers for its admission that Russian operatives had been running campaign ads on the site. Zuckerberg has announced that changes in the way the platform handles political advertising are in the pipeline.


How One of Apple’s Key Privacy Safeguards Falls Short

WIRED, Security, Andy Greenberg



For the past year, Apple has touted a mathematical tool that it describes as a solution to a paradoxical problem: mining user data while simultaneously protecting user privacy. That secret weapon is “differential privacy,” a novel field of data science that focuses on carefully adding random noise to an individual user’s information before it’s uploaded to the cloud. That way, a company such as Apple’s total dataset reveals meaningful results without any one person’s secrets being spilled.

But differential privacy isn’t a simple toggle switch between total privacy and no-holds-barred invasiveness. And a new study, which delves deeply into how Apple actually implements the technique, suggests the company has ratcheted that dial further toward aggressive data-mining than its public promises imply.


Jill Pipher tapped as new vice president for research

Brown Daily Herald, Jonathan Douglas



Pipher outlined three main goals for her tenure: increasing funding for research, expanding commercial engagement and continuing “operational excellence” within her office. … Pipher was previously the founding director of the Institute of Computational and Experimental Research in Mathematics, a position she held from 2010 to 2016. Under her direction, the institute received both founding and operational grants from the National Science Foundation — establishing it as one of eight NSF-funded mathematical sciences institutes in the country — as well as a number of other grants from both federal and commercial sources. Pipher also coordinated numerous semester long programs, which brought together over a thousand researchers during her tenure.


Chips Off the Old Block: Computers Are Taking Design Cues From Human Brains

The New York Times, Cade Metz



New technologies are testing the limits of computer semiconductors.
To deal with that, researchers have gone looking for ideas from nature.


Using Data to Help People Live Healthier and Longer Lives

Medium, Dr. Tyrone Grandison



At 3:30pm today (September 14th, 2017), the Institute for Health Metrics and Evaluation (IHME) launched the 2016 Global Burden of Disease (GBD) — a comprehensive data set on the risks, injuries and diseases that impact the number of healthy years due to various factors; across gender, age groups, and time.


How Venture Capital Propelled tolingo from Startup to Profitable Expansion

Slator



German-based tolingo was founded in 2007 as a pay-as-you-go translation platform with a seamless technological interface. Venture capital (VC) firms were attracted by the startup’s original premise that the translation process can be aided on so many levels by technology. Among the VCs that invested in tolingo was Acton Capital Partners.

“Acton was the third institutional investor in an institutional round worth a few million Euro in 2011,” said Acton Investment Director Marcus Polke. The role VCs played, however, extended beyond funding.


Amazon Is Hungry and It’s Coming for Your Cable Channels

NBC News, Claire Atkinson



Amazon already accounts for about a quarter of all online sales in the United States. Now the company is holding talks to supersize its video-channel business, not just in the U.S. but around the globe.

In the past few weeks, Amazon has started talks to buy scores of small television channels, several major program providers confirmed to NBC News. A representative for Amazon declined to comment, but hinted there will be much to say in the coming weeks about its efforts in online video.

Currently, subscribers to Amazon Prime get TV, movies and music, as well as free shipping on online purchases. They can also pay extra for premium channels such as HBO and Showtime, along with a host of niche-interest services on topics such as health or horror.


VAST 2007: 10 Year Test of Time Award

IEEE VIS



Jigsaw is a visual analytics system, developed by the Georgia Institute of Technology, for enabling analysts and researchers to explore, analyze, and make sense of document collections. This 2007 VAST paper, co-authored by John Stasko, Carsten Görg, Zhicheng Liu, and Kanupriya Singhal, brought Jigsaw to light for the first time. Since then, the software has been used extensively by investigators, analysts, and researchers in many fields, including visualization, text analysis, journalism, law enforcement, finance, and so on. According to Google Scholar, the paper has received over 400 citations. While a number of papers published in IEEE VAST 2007 have made significant impact because of their novel scientific contributions, this VAST paper on Jigsaw stands the test of time with the highest impact.


Agriculture Goes High Tech

EE Times, Alix Paultre



For an endeavor often associated with simple tools, modern agriculture is a hive of engineering activity. Some of the development work targets mitigation strategies for environmental challenges; other research focuses on tech solutions for cultivation and harvesting. Advanced sensors, software, and systems that can operate precisely and under harsh conditions in the field are critical for monitoring and managing the multiple aspects of growing productive plants.

Germany, a country known for both its agriculture and its high tech, had an opportunity to showcase both at a recent open house at Geisenheim University, an agricultural institute situated in one of the best wine regions in Europe (Figure 1). Unsurprisingly, much of the research conducted at Geisenheim focuses on viticulture (Figure 2).


Mapping a Future Where Robots Have Taken Over the Supply Chain

Bloomberg, Justin Bachman



This future journey is an attempt to show one how different technologies may reshape the retail world. In a not-too-distant future, current tools and ideas may have reached a point where fulfilling our online consumption no longer requires many—or any—humans at all.

Of course, there is no guarantee this particular scenario will come to pass. Some unknown hurdle could cripple regulators’ appetite for self-driving vehicles. Low-margin shoe assembly might flee Asia if it’s cheaper to “print” footwear in local markets using 3D additive manufacturing tools. But the shipping and logistics worlds are keenly invested in the potential of these technologies. Some version of this reality is close.


Full video: Matt Zeiler of Clarifai at Code Commerce

Recode, Johana Bhuiyan



While there are a lot of uses for image recognition — you’ve probably most recently interacted with facial recognition, for instance, on Facebook — Zeiler said online retail stands to benefit significantly from using AI.

“You have to be thinking about it in your business,” Zeiler said at Recode’s Code Commerce event in New York. “This is going to change every interaction with your customers.” [video, 11:16]


At Stanford MedicineX a DIY diabetes e-patient becomes study PI

MedCity News, Arundhati Parmar



Dana Lewis has been a vocal proponent of the empowered patient movement and has assiduously chronicled her efforts in managing her Type 1 diabetes including building a better alarm for her continuous glucose monitor when traditional medtech manufacturers gave her unsatisfying answers.

On Friday, at the annual Stanford Medicine X conference at Stanford University in California that kicked off Friday and continues through the weekend, Lewis announced a new role for herself: the patient as principal investigator.

The study/project funded by Robert Wood Johnson Foundation is called “Learning to not wait: Opening pathways for discovery, research, and innovation in health and healthcare.” For the next 18 months, the project aims at studying data and innovation coming out of the diabetes community with a goal to eventually identify its needs and then create a resource framework that can help to scale such a community patient effort.


Cornell Tech will help make computers ‘accountable’

Cornell Chronicle



If computers are going to run everything, we probably should keep an eye on how they do it. A nationwide team of computer scientists has launched a project to make automated decision-making systems “accountable,” ensuring they respect privacy and make decisions fairly.

“There’s a lot of new technology being deployed in a variety of important settings, and we don’t fully understand all the ramifications,” said Thomas Ristenpart, associate professor of computer science at Cornell Tech.

Ristenpart and Helen Nissenbaum, Cornell Tech professor of information science, will be co-principal investigators for the project, which also includes scientists at Carnegie Mellon University and the International Computer Science Institute in Berkeley, California. The work will be supported by a $3 million, five-year grant from the National Science Foundation (NSF).

 
Events



AI for good: Big data challenges for disaster response and recovery.

PyData SoCal



Culver City, CA PyData SoCal Meetup on Thursday, October 12, starting at 6:30 p.m., DataScience.com (200 Corporate Pointe). [rsvp required]


DevFest DC 2017

Google Developer Group DC



Tysons, VA Monday, October 9, starting at 9 a.m., Sheraton Tysons Hotel (8661 Leesburg Pike). [$$]

 
Deadlines



CMU Sports Analytics Conference

Pittsburgh, PA Conference is October 28. Deadline for research poster submissions is October 1.

Black in AI

Long Beach, CA The first Black in AI event will take place at NIPS 2017 on December 8. Deadline for submissions is October 13.
 
Tools & Resources



Data Security for Data Scientists

Medium, Andrew Therriault



Security is no longer just a niche specialty of database admins and network engineers. Everyone who creates, manages, analyzes, or even just has access to data is a potential point of failure in an organization’s security plan. So if you use data which is at all sensitive — that is, any data you wouldn’t freely give out to any random stranger on the internet — then it’s your responsibility to make sure that data is protected appropriately.

I had the importance of data security hammered home for me in 2016. While the DNC hack targeted the organization’s email servers (which my team didn’t interact with, except as normal email users), it’s not hard to imagine that someone who could get into those systems could also have found their way into our databases of voters and campaigns.


Introducing Azure confidential computing

Microsoft Azure, Mark Russinovich



I’m excited to announce that Microsoft Azure is the first cloud to offer new data security capabilities with a collection of features and services called Azure confidential computing. Put simply, confidential computing offers a protection that to date has been missing from public clouds, encryption of data while in use. This means that data can be processed in the cloud with the assurance that it is always under customer control. The Azure team, along with Microsoft Research, Intel, Windows, and our Developer Tools group, have been working on confidential computing software and hardware technologies for over four years. The bottom of this post includes a list of Microsoft Research papers related to confidential computing. Today we take that cutting edge one step further by now making it available to customers via an Early Access program.


[1709.04396] A Tutorial on Deep Learning for Music Information Retrieval

arXiv, Computer Science > Computer Vision and Pattern Recognition; Keunwoo Choi, György Fazekas, Kyunghyun Cho, Mark Sandler



“The goal of this paper is to boost the interest of beginners by providing a comprehensive tutorial and reducing the barriers to entry into deep learning for MIR. We lay out the basic principles and review prominent works in this hard to navigate field. We then outline the network structures that have been successful in MIR problems and facilitate the selection of building blocks for the problems at hand.”


This Font Makes Graphics Out Of Numbers In Seconds

Fast Company, Katharine Schwab



“Inspired by Edward Tufte’s ‘sparklines,’ the typeface turns typed numerals into inline data visualizations.”

 
Careers


Internships and other temporary positions

Internship, Periodic Table of Life (PeTaL)



NASA, Glenn Research Center; Cleveland, OH

Full-time positions outside academia

Data Engineer



MGH & BWH Center for Clinical Data Science; Boston, MA

Postdocs

Post-Doctoral Researcher – BOTFIND project



University of Oxford, Oxford Internet Institute; Oxford, England

Tenured and tenure track faculty positions

Assistant Professor – Statistics (2)



Syracuse University; Syracuse, NY
