Data Science newsletter – September 25, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for September 25, 2017

GROUP CURATION: N/A

 
 
Data Science News



University Data Science News

Naftali Tishby of the Hebrew University of Jerusalem presented his information bottleneck theory of learning, which could explain how deep neural networks ‘learn.’ There’s a YouTube video in which Tishby explains how “the most important part of learning is actually forgetting.”
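For readers who want the formal version, the information bottleneck is usually stated as the trade-off below. A representation T of the input X should keep as little information about X as possible (the “forgetting”) while preserving as much information as possible about the target Y. Here I(·;·) is mutual information and β > 0 sets the balance. This is the standard textbook statement of the objective, not a summary of the specific deep-learning claims in Tishby’s talk.

```latex
% T is computed from X alone, so Y -> X -> T forms a Markov chain
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y), \qquad \beta > 0
```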

The Moore Foundation is funding a new project called Dat in the Lab, a collaboration between the Dat Project and the California Digital Library, that will pilot integrating Dat into existing research data management workflows.

Single-blind vs. double-blind peer review is getting attention. From the authors’ perspective, a massive study of Nature journals found that “only one in eight authors actually chose to have their reviewers blinded when given the option.” A new paper on arXiv (authors: Claire Le Goues, Yuriy Brun, Sven Apel, Emery Berger, Sarfraz Khurshid, Yannis Smaragdakis) suggests that double-blind reviewing does work: 74%-90% of reviews contained no correct guess of the authors’ identities. Further, all program committee chairs in the study “were strongly supportive of continuing to use double-blind review in the future. All felt that double-blind review mitigated effects of (subconscious) bias.”



Georgia Institute of Technology and University of Michigan researchers received an NSF grant to combat computerized market manipulation in the financial system.

Cohen Veterans Bioscience announced a partnership with the Open Commons Consortium and the University of Chicago to establish the Brain Commons, a cloud-hosted platform designed to aggregate imaging data, genomic data, data from wearables, and clinical data to better understand brain conditions. It’s a little like the genomics projects at the Broad Institute, but focused on brain conditions rather than genomics.

UC Irvine took a $200m donation from a wealthy family to start an ‘interdisciplinary integrative health’ school. They are now taking heat from the medical community for allowing money to turn the school towards “junk science.” “They called integrative medicine a rebranding of alternative medicine — a collection of practices not supported by science, like homeopathy, and of ideas stolen from mainstream medicine, like nutrition. Those stolen ideas do not need to be integrated into mainstream medicine because they are already part of it,” wrote one critic, Steven Novella, on the blog Science-Based Medicine.

Stanford University has spun out a startup in the precision medicine/health care analytics space led by two professors. It’s very interesting to see so much entrepreneurialism exploding out of universities these days.

The p-hacking debate continues this week with a new paper, “Abandon Statistical Significance.” Its authors want a future for science in which “p-values are just one of many pieces of information with no privileged role in scientific publication and decision making.” The paper was posted to arXiv (Statistics), and Andrew Gelman, a co-author, adds more on his blog.


Moore Foundation supports new collaboration with California Digital Library (and a new team member!)

Dat Project


We are excited to announce a new project called Dat in the Lab. The project is a collaboration between us and the California Digital Library (CDL) with support from the Gordon and Betty Moore Foundation. Through this project, we are also happy to welcome Danielle Robinson to the Code for Science & Society team as our Scientific and Partnerships Director. Dat in the Lab will pilot integrating Dat into existing research data management workflows. This $180k grant will support us as we pilot Dat for research data management in close collaboration with two University of California research groups and with the expert support of CDL. We will work with these researchers to develop workflows to manage, sync, version, and publish datasets.


It’s Getting Hard to Tell If a Painting Was Made by a Computer or a Human

Artsy, Rene Chun


Cultural pundits can close the book on 2017: The biggest artistic achievement of the year has already taken place. It didn’t happen in a paint-splattered studio on the outskirts of Beijing, Singapore, or Berlin. It didn’t happen at the Venice Biennale. It happened in New Brunswick, New Jersey, just off Exit 9 on the Turnpike.

That’s the home of the main campus of Rutgers University—all four square miles and 640 buildings of it, including the school’s Art and Artificial Intelligence Lab (AAIL). Nobody would mistake this place as an incubator for fine art. It looks like a bootstrap startup, all cubicles and gray carpet, with lots of cheap Dell monitors and cork boards filled with tech gibberish.


Extra Extra

There’s a great story in The Atlantic that follows four recent astronomical detections of gravitational waves. The waves form when black holes collide, creating ripples in the space-time continuum.

On Earth, sustainable transportation deserves attention, especially when you consider all of the words, time and money spent on autonomous cars. First, a look at NYC’s citywide bike travel times. Next, a Boston-area college student is blogging his travels on all 1280 miles of the MBTA public transit system.

And Michelin has reinvented the wheel out of biodegradable materials without air-filled tubes. No flats possible. The wheel is not manufacturable yet, but they are working on it.


[1709.01609] Effectiveness of Anonymization in Double-Blind Review

arXiv, Computer Science > Digital Libraries; Claire Le Goues, Yuriy Brun, Sven Apel, Emery Berger, Sarfraz Khurshid, Yannis Smaragdakis


Double-blind review relies on the authors’ ability and willingness to effectively anonymize their submissions. We explore anonymization effectiveness at ASE 2016, OOPSLA 2016, and PLDI 2016 by asking reviewers if they can guess author identities. We find that 74%-90% of reviews contain no correct guess and that reviewers who self-identify as experts on a paper’s topic are more likely to attempt to guess, but no more likely to guess correctly. We present our findings, summarize the PC chairs’ comments about administering double-blind review, discuss the advantages and disadvantages of revealing author identities part of the way through the process, and conclude by advocating for the continued use of double-blind review.


CSRankings: Computer Science Rankings

Emery Berger


This ranking of top computer science schools is designed to identify institutions and faculty actively engaged in research across a number of areas of computer science, based on the number of publications by faculty that have appeared at the most selective conferences in each area of computer science (see the FAQ for more details).
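The ranking boils down to counting faculty publications at selective venues. As a rough sketch of that bookkeeping, the Python below credits each paper once, splits that credit evenly among its authors’ institutions, and sums the credit per institution. The even split, the plain sum across areas, and all paper records and institution names are hypothetical simplifications for illustration; the site’s FAQ describes how the real ranking combines counts.

```python
from collections import defaultdict

# Hypothetical publication records: venue area plus (author, institution) pairs.
papers = [
    {"area": "plt", "authors": [("Alice", "Univ A"), ("Bob", "Univ B")]},
    {"area": "plt", "authors": [("Alice", "Univ A")]},
    {"area": "se", "authors": [("Carol", "Univ B"), ("Dan", "Univ B"), ("Eve", "Univ A")]},
]

def adjusted_counts(papers):
    """Split each paper's single unit of credit evenly among its authors, per area."""
    counts = defaultdict(lambda: defaultdict(float))
    for paper in papers:
        share = 1.0 / len(paper["authors"])
        for _author, institution in paper["authors"]:
            counts[institution][paper["area"]] += share
    return counts

def rank(papers):
    """Sum credit across areas (a simplification) and sort institutions by total."""
    totals = {inst: sum(areas.values()) for inst, areas in adjusted_counts(papers).items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

for institution, score in rank(papers):
    print(f"{institution}: {score:.2f}")
```

Splitting credit by author count keeps a heavily co-authored paper from counting several times over, so the per-institution totals always add up to the number of papers.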


NVIDIA CEO Jensen Huang Kicks Off Berkeley AI Event

The Official NVIDIA Blog


Highlighting the growing ties between NVIDIA and researchers at elite universities, NVIDIA founder and CEO Jensen Huang spoke to a gathering of AI researchers at the University of California, Berkeley, Wednesday.

The talk kicked off BAIR NVIDIA AI Day at UC Berkeley’s Memorial Stadium, which brought together researchers from the university and the company for talks and demos.

“You here at UC Berkeley are at the intersection of artificial intelligence, computer science and autonomous machines,” Jensen, clad in his trademark black leather jacket, told more than 400 students, faculty and researchers. “Really, it can’t get any better than that.”


Using social media data to help measure smoke exposure

Facebook Research; Moira Burke, Bonne Ford, Jeffrey Pierce


It’s difficult to pinpoint how far particulate matter from fires travels. Ground-based monitors are sparse in parts of the U.S., and satellites can’t always tell the altitude of the smoke particles.

Posts on Facebook can help signal how far the particles from a fire have spread and can help researchers and health organizations understand the effects of smoke exposure on a population.


How salaries for federal software engineers stack up against the private sector

Fifth Domain, Armin Haracic


Tech-recruiting firm Cyber Coders revealed new data that showcased the most in-demand IT, accounting and construction vertical jobs and salaries. The information technology section in particular showed that IT workers in the private sector tended to have higher salaries than federal IT workers.

Software Engineering was found to be the most in-demand job in information technology, with an annual salary range of $99,106 to $139,165.


Government Data Science News

The U.S. federal government informed 21 states that their election systems had been targeted by hackers, possibly Russian operatives. The notice from the government said that “in most of the 21 states the targeting was preparatory activity such as scanning computer systems…including voter registration systems.”

Oh, Canada! The country is experiencing a brain gain following the election of President Trump. Axios reports that “numerous startups in the tech hub of Toronto say they have had steady, double-digit increases in job applications from the United States.”

Another feather in Canada’s science cap: Mona Nemer, a cardiovascular researcher and Vice-President of Research at the University of Ottawa, has just been named Canada’s new Chief Science Advisor. Justin Trudeau said that “scientists need to have a voice,” which is yet another reason to love Trudeau and his delightful Canadian scientists.

Washington DC has a rat problem. I mean actual furry, four-legged, long-tailed rats. This is not a euphemism for weaselly politicians. The city is using smart trash cans and predictive analytics to route sanitation pick-ups to receptacles as they get full, leaving poorer foraging grounds for the vermin.
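One simple way to route pick-ups to receptacles as they get full is to select the bins whose sensors report a fill level above a threshold and then visit them in nearest-neighbor order from the depot. The sketch below assumes hypothetical bin IDs, coordinates, and an 80% threshold. It uses current readings rather than predicting future fill levels, and nearest-neighbor routing is a quick heuristic rather than an optimal route, so this is not a description of the system D.C. actually runs.

```python
import math

# Hypothetical sensor readings: bin id -> ((x, y) location, fill fraction 0..1).
bins = {
    "A": ((0.0, 1.0), 0.90),
    "B": ((5.0, 5.0), 0.30),
    "C": ((2.0, 0.0), 0.85),
    "D": ((6.0, 1.0), 0.95),
}

def bins_to_empty(bins, threshold=0.8):
    """Return ids of bins whose reported fill level is at or above the threshold."""
    return [bin_id for bin_id, (_, fill) in bins.items() if fill >= threshold]

def greedy_route(bins, depot=(0.0, 0.0), threshold=0.8):
    """Order the full bins by repeatedly driving to the nearest unvisited one."""
    remaining = set(bins_to_empty(bins, threshold))
    route, position = [], depot
    while remaining:
        # Ties in distance are broken by bin id so the route is deterministic.
        nearest = min(remaining, key=lambda b: (math.dist(position, bins[b][0]), b))
        route.append(nearest)
        position = bins[nearest][0]
        remaining.remove(nearest)
    return route

print(greedy_route(bins))  # ['A', 'C', 'D']
```

Half-empty bins like “B” are simply skipped, which is where the savings (and the poorer foraging for rats) come from.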



The Los Angeles Police Department was open-minded enough to let sociologist Sarah Brayne conduct fieldwork among its officers as they adopted Palantir technology to track potential criminals. She found much of what we would expect: low-income populations are more heavily surveilled and thus more likely to be arrested. Not all police love the technology, although a decent shared database does make it easier to track crimes across jurisdictions. This is the type of ethnographic engagement that can help us figure out how to design data interventions in actual organizations going forward. Hooray for ethnography.



Speaking of government databases for tracking individuals, the Department of Homeland Security is apparently capturing the social media data and search results of all immigrants, including permanent residents and naturalized citizens. (Wow. Even naturalized citizens?) Civil rights groups oppose this move as a violation of privacy with a chilling impact on the democratic exercise of free speech. DHS notes that it has been conducting this type of surveillance and will continue to do so, as it believes it has the right to.

Columbia University School of Social Work assistant professor Desmond Patton is taking a less reflexive approach to data science in crime settings by capturing gang members’ social media posts. The goal of the project is to reduce gang violence by building an “aggression-indicator.” The project is funded by DARPA. Hopefully the project is also mindful that there may be unintended consequences that befall people who are being tracked and flagged for aggressive tendencies.



The NIH’s “All of Us” program, which aims to advance precision medicine through genetics, is trying to recruit more than 1 million people. If you want to participate, you can sign up on the program’s website.

But big projects like “All of Us” cost a lot of money. Representatives Elise Stefanik (R-NY) and Scott Peters (D-CA) are heading up a bipartisan group pushing to fully fund the NIH. They note that funding has shrunk since 2003, which has left “NIH’s purchasing power eroded by nearly 25 percent and forced NIH to abandon half of its promising research every year.” I seriously hope they are successful.


Facebook’s Ad Scandal Isn’t a ‘Fail,’ It’s a Feature

The New York Times, Zeynep Tufekci


Anyone who understands how Facebook works shouldn’t have been surprised. That’s because the same digital platform that offers us social interaction, news, entertainment and shopping all in one place makes its money by making it cheap and easy to send us commercial or political messages, often guided by algorithms. The recent scandal is just a reminder.

Almost every feature on Facebook is designed to make the site engaging — to encourage you to spend time there while Facebook serves more ads to you. At the same time, users can share “promoted posts” — targeted messages that advertisers pay Facebook to place in their feeds — merging pay-for-play content with the natural flow of information among friends and family. It’s a powerful combination.


This Is What Airbnb Should Do With Resy

Skift, Deanna Ting


When Airbnb announced it led a $13 million Series A investment in New York City-based dining reservations platform Resy in January, it was readily assumed that we’d see some sort of synergy between the two companies — aside from the fact that Airbnb CEO Brian Chesky’s sister, Allison Chesky, also happens to be the managing editor of Resy.

And on Wednesday, we did, when Airbnb announced an expansion of an earlier pilot program it ran in May, allowing Airbnb app users to book restaurant reservations, powered by Resy (see video below for an explanation of how it works).


Few authors choose anonymous peer review, massive study of Nature journals shows

Science, News, Martin Enserink


Most papers are reviewed in single-blind fashion—that is, the reviewers know who the authors are, but not vice versa. In theory, that knowledge allows them to exercise a conscious or unconscious bias against researchers from certain countries, ethnic minorities, or women, and be kinder to people who are already well-known in their field. Double-blind reviews, the argument goes, would remove those prejudices. A 2007 study of Behavioral Ecology found that the journal published more articles by female authors when using double-blind reviews—although that conclusion was challenged by other researchers a year later. In a survey of more than 4000 researchers published in 2013, three-quarters said they thought double-blind review is “the most effective method.”


‘A more efficient university’: UT-Austin raises on-time graduation rate.

The Washington Post, Nick Anderson


Nearly two-thirds of students who entered the University of Texas at Austin as freshmen in 2013 graduated on time, reflecting significant improvement for the state’s flagship campus amid a national push to get more students through college in four years.

Data released Wednesday show that UT-Austin’s four-year graduation rate rose from 52 percent in 2013 to 66 percent this year. The growth spanned racial groups and family income levels, the university said. Students whose finances made them eligible for Pell Grants made especially large gains, narrowing historic gaps in degree attainment.


‘Substantial cost’: University of California foots major security bill for free speech

The Washington Post, Nick Anderson


Universities around the country are learning this year that free speech is far from free.

University of California President Janet Napolitano said Wednesday that the UC System plans to reimburse its Berkeley campus for half of the cost of security for conservative commentator Ben Shapiro’s speech there last week. Splitting the bill, Napolitano said, means that her office will pay about $300,000 while UC Berkeley pays an equal amount.


Company Data Science News

Microsoft’s CEO Satya Nadella has a new book out, Hit Refresh: The Quest to Rediscover Microsoft’s Soul and Imagine a Better Future for Everyone, and he is doing a bit of a publicity tour around it. This is also the one-year anniversary of Microsoft’s AI and Research group, which now employs 8,000 people. Nadella’s aim is to “democratize AI,” which is a promising vision. The company released new machine learning tools to help spread the power of data science to the people.

Data for Democracy, Bloomberg, and BrightHive have pledged to develop a code of ethics for data science. They’re calling it the Community Principles on Ethical Data Sharing and want it to be akin to the Hippocratic Oath that doctors take. It’s so good to see the time and thought going into data science ethics.



Hilary Mason talks about moving from Fast Forward Labs, a data science company she founded, to Cloudera (which acquired the company). She outlines the organizational structure that has to be in place for data science to be effective, which is one of my favorite topics.

Deep Genomics, a Toronto drug development startup powered by deep learning, is expanding rapidly and just raised another $13m in venture capital. The company was co-founded by University of Toronto professor Brendan Frey and is just one example of the burgeoning deep learning, AI, and data science tech scene in Canada.



A Tinder user living in Europe requested her data history, as European citizens have a right to do. Since she started using the app in late December 2013, she had generated 800 pages of highly personal data. She was not thrilled. Online dating is riddled with pitfalls. Wondering which companies know your whereabouts (people type in the addresses of places to meet up) and your sexual tastes does give pause, no?


Microsoft launches new healthcare division based on artificial intelligence software

The Telegraph (UK), Cara McGoogan


Microsoft is setting up a new healthcare department at its Cambridge research facility, as part of plans to use its artificial intelligence software to enter the health market.

The computer giant has created the division as part of its commitment to “transform healthcare” using technologies such as machine learning and cloud computing.

Its research plans include monitoring systems that can help keep patients out of hospitals and alert them in a timely manner about problems, and large studies into diseases such as diabetes.


7 Ways the 4IR Can De-stress the Planet

World Economic Forum, Celine Herweijer


While some of the most potentially game-changing and pervasive 4IR solutions have not been invented yet, we have a good idea of some of the 4IR tech-related trends that are looking inevitable over the next 20-30 years. Here’s seven that venture capitalists have front of mind as they look to find the world’s most important companies of tomorrow, with examples of the opportunities they present for a sustainable Fourth Industrial Revolution.

1. Proliferation of AI. The convergence of machine learning, big data and hardware advances speeding up computations (graphical processing units – GPUs – and soon deep learning chips) have brought AI from “in vitro” into everyday life. Because of AI, the 4IR is an intelligence and productivity revolution.


Georgia Tech, Michigan Researchers Funded to Deter Financial Market Manipulation

Georgia Institute of Technology, College of Computing


Researchers at the Georgia Institute of Technology and the University of Michigan will develop innovative approaches to detecting and deterring the computerized manipulation of financial markets under a $1 million grant from the National Science Foundation’s Big Data program.

Market manipulation, also known as price manipulation, is the deliberate use of misleading information with the intent of deceiving investors about the supply or demand of a security for financial gain or other advantage. Increasingly, manipulators attack market integrity through complex computer-controlled attacks.
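Detecting computer-driven manipulation is an open research problem, which is exactly what the grant funds. For a flavor of the simplest kind of signal analysts look at, the sketch below flags traders whose large orders are mostly cancelled within a fraction of a second, a pattern often associated with spoofing. The order format, the thresholds, and the heuristic itself are illustrative assumptions, not the Georgia Tech and Michigan team’s approach.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Order:
    trader: str
    size: int
    placed_at: float               # seconds since market open
    cancelled_at: Optional[float]  # None if the order was filled or is still live

def flag_possible_spoofers(orders, min_size=1000, max_lifetime=1.0, min_ratio=0.9):
    """Flag traders whose large orders are mostly cancelled within max_lifetime seconds."""
    large = defaultdict(int)         # trader -> number of large orders
    fast_cancels = defaultdict(int)  # trader -> large orders cancelled quickly
    for order in orders:
        if order.size < min_size:
            continue  # small orders are ignored by this heuristic
        large[order.trader] += 1
        if order.cancelled_at is not None and order.cancelled_at - order.placed_at <= max_lifetime:
            fast_cancels[order.trader] += 1
    return sorted(t for t, n in large.items() if fast_cancels[t] / n >= min_ratio)
```

A rule this crude would produce false positives (legitimate market makers cancel orders constantly), which is one reason the problem needs research rather than a threshold.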

 
Events



Visual Trumpery – a lecture by Alberto Cairo

Data Visualization New York Meetup


New York, NY Thursday, October 12, starting at 7:30 p.m., The New School (66 West 12th St.) [free, registration required]


EGG2017: Innovate. Get Ahead. Disrupt. And Embrace Unconventionality.

Dataiku


New York, NY November 30. “Dataiku’s first large-scale data science and analytics conference.” [$$$]


Pfizer, Veritas, MGH Join Xconomy’s Healthcare + A.I. Conference

Xconomy


Cambridge, MA November 2. “The half-day conference, called Healthcare + A.I., is happening at Pfizer’s offices in Cambridge.” [$$$]


HudsonAlpha’s Tech Challenge

HudsonAlpha Institute for Biotechnology


Huntsville, AL November 4-5. “Goal: Inspire creativity through collaboration, and energize the current and next generation of problem-solvers to conceive and construct innovative solutions to biotech challenges.” [$$]

 
Deadlines



Apply to be a fellow – Delta Analytics

Delta Data and Teaching Fellowship is a 6-month commitment that you make to your team and to your non-profit client. The typical time commitment is 5-10 hours per week, with monthly cohort-wide Sunday working sessions. Additionally, at this time we are only accepting applicants living in the San Francisco Bay Area. Deadline for applications is September 30.

NIAID Emerging Leaders in Data Science Fellowship

As an ORISE Research Participant in the NIAID Emerging Leaders in Data Science Fellowship Program in the OSMO Office of Data Science and Emerging Technologies, the participant will receive training and hands-on-experience in applying and managing big data, bioinformatics strategies and computational platforms and tool development to study infectious, immunological, and allergic diseases in both extramural and intramural divisions at NIAID and will receive training in the intersection and management of big data and disease-oriented data-intense research efforts. Deadline to apply is October 15.

MACHINE DECEPTION – NIPS 2017 workshop

Long Beach, CA December 8. “Machine deception refers to the capacity for machines to act as a medium through which human and other machine agents may be manipulated to believe, act upon or otherwise accept false information.” Deadline for submissions is November 1.
 
Tools & Resources



Stack Overflow salary calculator for developers

FlowingData, Nathan Yau, Stack Overflow


Stack Overflow used data from their developer survey to build a prediction model for salary, based on role, location, education, experience, and skills. The result was a salary calculator that you can use to gauge how much you should be making.
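A common way to build this kind of model is a regression on the logarithm of salary, so that each feature acts as a multiplier on a baseline. The sketch below only shows how such a model turns a developer’s profile into a prediction. The log-linear form is an assumption about the approach, and every coefficient is made up for illustration rather than taken from Stack Overflow’s survey.

```python
import math

# Hypothetical log-scale coefficients, made up for illustration only;
# they are NOT Stack Overflow's fitted values.
INTERCEPT = math.log(50_000)
COEFFICIENTS = {
    "role": {"web developer": 0.0, "data scientist": 0.10},
    "location": {"Berlin": 0.05, "San Francisco": 0.40},
    "education": {"bachelor's": 0.0, "master's": 0.05},
}
PER_YEAR_EXPERIENCE = 0.03
SKILL_BONUS = {"python": 0.05, "sql": 0.02}

def predict_salary(role, location, education, years_experience, skills=()):
    """Add feature effects on the log scale, then exponentiate back to dollars.

    Unknown role/location/education values raise KeyError; unknown skills add nothing.
    """
    log_salary = (
        INTERCEPT
        + COEFFICIENTS["role"][role]
        + COEFFICIENTS["location"][location]
        + COEFFICIENTS["education"][education]
        + PER_YEAR_EXPERIENCE * years_experience
        + sum(SKILL_BONUS.get(skill, 0.0) for skill in skills)
    )
    return math.exp(log_salary)

print(round(predict_salary("data scientist", "San Francisco", "master's", 5, ["python"])))
```

Because effects add in log space, moving from Berlin to San Francisco in this toy model multiplies the prediction by exp(0.40 - 0.05), regardless of role or experience.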


Visualizing A Convolutional Neural Network’s Predictions

Pranav Rajpurkar


Visualizations can confer useful information about what a network is learning. When building a Convolutional Neural Network to identify objects in images, we might want to be able to interpret the model’s predictions. For example, we might want to explain why the network classifies a particular image as a spaceship. In this post, we look at papers that introduce visualization techniques for CNN-based image classification models.

We first look at Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

This paper introduces two ideas for visualizing the workings of the neural network. Both of them require computing of the gradient of the output with respect to the input image.
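The saliency-map idea can be shown without a deep-learning framework. The sketch below uses a tiny one-hidden-layer network as a stand-in for the CNN, backpropagates by hand to get the gradient of the class score with respect to each input value, and then takes, for each pixel, the maximum absolute gradient across color channels, as the saliency-map paper does for RGB images. The model, its weights, and the function names are hypothetical; a real implementation would let an autodiff library compute the gradient through the trained CNN.

```python
import math

def class_score_and_gradient(x, W, v):
    """Score S(x) = sum_k v[k] * tanh(W[k] . x) and its gradient dS/dx, by hand."""
    pre = [sum(w_kp * x_p for w_kp, x_p in zip(row, x)) for row in W]
    hidden = [math.tanh(a) for a in pre]
    score = sum(v_k * h_k for v_k, h_k in zip(v, hidden))
    # Chain rule: dS/dx_p = sum_k v[k] * (1 - tanh(a_k)^2) * W[k][p]
    grad = [
        sum(v[k] * (1.0 - hidden[k] ** 2) * W[k][p] for k in range(len(W)))
        for p in range(len(x))
    ]
    return score, grad

def saliency_map(grad, channels, height, width):
    """For each pixel, the max over channels of |dS/dx|.

    grad is flattened in (channel, row, column) order.
    """
    return [
        [
            max(abs(grad[c * height * width + i * width + j]) for c in range(channels))
            for j in range(width)
        ]
        for i in range(height)
    ]
```

Pixels with large values are those where a small change would most move the class score, which is why the map highlights the object the network is relying on.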


What’s Going On in This Graph?

The New York Times, AMSTAT


Graphs, maps and charts from The New York Times. And an invitation to students to discuss them live.


Credible sources of accurate information about AI

fast.ai, Rachel Thomas


I want to recommend several credible sources of accurate information. Most of the writing on this list is intended to be accessible to anyone—even if you aren’t a programmer or don’t work in tech.

 
Careers


Internships and other temporary positions

Internship – Data Science and Modern Visualization



NASA, Glenn Research Center; Cleveland, OH

Postdocs

Center for Population Biology Postdoc Fellowship



University of California-Davis; Davis, CA

Tenured and tenure track faculty positions

Assistant Clinical Professor of Law & Associate Director – Samuelson Law, Technology and Public Policy Clinic



University of California-Berkeley, School of Law; Berkeley, CA

Full-time positions outside academia

Research Scientist (AI)



Facebook AI Research; Montreal, Canada

Data Scientist, Machine Learning and AI



Etsy; Brooklyn, NY
