The Moore Foundation has a new project called Dat in the Lab with the California Digital Library that will pilot adding Dat to existing data management plans.
Single-blind vs. double-blind peer review is getting attention. From the authors’ perspective, “only one in eight authors actually chose to have their reviewers blinded when given the option.” A new paper on the arXiv (authors: Claire Le Goues, Yuriy Brun, Sven Apel, Emery Berger, Sarfraz Khurshid, Yannis Smaragdakis) suggests that double-blind reviewing does work: in 74%-90% of reviews, reviewers could not correctly guess the authors of the blinded papers. Further, all program committee chairs in the study “were strongly supportive of continuing to use double-blind review in the future. All felt that double-blind review mitigated effects of (subconscious) bias.”
Cohen Veterans Bioscience announced its partnership with the Open Commons Consortium and the University of Chicago to establish the Brain Commons. The Brain Commons is a cloud-hosted platform designed to aggregate imaging data, genomic data, data from wearables, and clinical data to better understand brain conditions. It’s a little like the genomics project at Broad, but with a focus on brain conditions more than genomics.
UC Irvine took a $200m donation from a wealthy family to start an ‘interdisciplinary integrative health’ school. They are now taking heat from the medical community for allowing money to turn the school towards “junk science.” “They called integrative medicine a rebranding of alternative medicine — a collection of practices not supported by science, like homeopathy, and of ideas stolen from mainstream medicine, like nutrition. Those stolen ideas do not need to be integrated into mainstream medicine because they are already part of it,” wrote one critic, Steven Novella, on the blog Science-Based Medicine.
The p-hacking debate continues this week with a new paper, posted to the arXiv (Statistics), arguing that we should Abandon Statistical Significance. Its authors want a science future in which “p-values are just one of many pieces of information with no privileged role in scientific publication and decision making.” Andrew Gelman, a co-author, adds more on his blog.
We are excited to announce a new project called Dat in the Lab. The project is a collaboration between us and the California Digital Library (CDL), with support from the Gordon and Betty Moore Foundation. Through this project, we are also happy to welcome Danielle Robinson to the Code for Science & Society team as our Scientific and Partnerships Director. The $180k grant will support us as we pilot integrating Dat into existing research data management workflows, in close collaboration with two University of California research groups and with the expert support of CDL. We will work with these researchers to develop workflows to manage, sync, version, and publish datasets.
Cultural pundits can close the book on 2017: The biggest artistic achievement of the year has already taken place. It didn’t happen in a paint-splattered studio on the outskirts of Beijing, Singapore, or Berlin. It didn’t happen at the Venice Biennale. It happened in New Brunswick, New Jersey, just off Exit 9 on the Turnpike.
That’s the home of the main campus of Rutgers University—all four square miles and 640 buildings of it, including the school’s Art and Artificial Intelligence Lab (AAIL). Nobody would mistake this place for an incubator of fine art. It looks like a bootstrap startup, all cubicles and gray carpet, with lots of cheap Dell monitors and cork boards filled with tech gibberish.
On Earth, sustainable transportation deserves attention, especially when you consider all of the words, time and money spent on autonomous cars. First, a look at NYC’s citywide bike travel times. Next, a Boston-area college student is blogging his travels on all 1280 miles of the MBTA public transit system.
arXiv, Computer Science > Digital Libraries; Claire Le Goues, Yuriy Brun, Sven Apel, Emery Berger, Sarfraz Khurshid, Yannis Smaragdakis
Double-blind review relies on the authors’ ability and willingness to effectively anonymize their submissions. We explore anonymization effectiveness at ASE 2016, OOPSLA 2016, and PLDI 2016 by asking reviewers if they can guess author identities. We find that 74%-90% of reviews contain no correct guess and that reviewers who self-identify as experts on a paper’s topic are more likely to attempt to guess, but no more likely to guess correctly. We present our findings, summarize the PC chairs’ comments about administering double-blind review, discuss the advantages and disadvantages of revealing author identities part of the way through the process, and conclude by advocating for the continued use of double-blind review.
This ranking of top computer science schools is designed to identify institutions and faculty actively engaged in research across a number of areas of computer science, based on the number of publications by faculty that have appeared at the most selective conferences in each area of computer science (see the FAQ for more details).
Highlighting the growing ties between NVIDIA and researchers at elite universities, NVIDIA founder and CEO Jensen Huang spoke to a gathering of AI researchers at the University of California, Berkeley, Wednesday.
The talk kicked off BAIR NVIDIA AI Day at UC Berkeley’s Memorial Stadium, which brought together researchers from the university and the company for talks and demos.
“You here at UC Berkeley are at the intersection of artificial intelligence, computer science and autonomous machines,” Jensen, clad in his trademark black leather jacket, told more than 400 students, faculty and researchers. “Really, it can’t get any better than that.”
It’s difficult to pinpoint how far particulate matter from fires travels. Ground-based monitors are sparse in parts of the U.S., and satellites can’t always tell the altitude of the smoke particles.
Posts on Facebook can help signal how far the particles from a fire have spread and can help researchers and health organizations understand the effects of smoke exposure on a population.
Tech-recruiting firm Cyber Coders revealed new data that showcased the most in-demand IT, accounting and construction vertical jobs and salaries. The information technology section in particular showed that IT workers in the private sector tended to have higher salaries than federal IT workers.
Software Engineering was found to be the most in-demand job in information technology, with an annual salary range of $99,106 to $139,165.
Government Data Science News
The US Federal government informed 21 states that their election systems had been targeted, possibly by Russian operatives. The government’s advisory noted that “in most of the 21 states the targeting was preparatory activity such as scanning computer systems…including voter registration systems.”
Oh, Canada! The country is experiencing a brain gain following the US election of President Trump. Axios reports that, “numerous startups in the tech hub of Toronto say they have had steady, double-digit increases in job applications from the United States.”
Another feather in Canada’s science cap is Mona Nemer, a cardiologist and VP of Research at the University of Ottawa, who has just been named the new Chief Science Adviser. Justin Trudeau said “scientists need to have a voice,” which is yet another reason to love Trudeau and his delightful Canadian scientists.
Washington DC has a rat problem. I mean actual furry, four-legged, long-tailed rats. This is not a euphemism for weaselly politicians. The city is using smart trash cans and predictive analytics to route sanitation pick-ups to receptacles as they get full, leaving poorer foraging grounds for the vermin.
The Los Angeles Police Department was open-minded enough to let sociologist Sarah Brayne conduct field work among them as they adopted Palantir technology to track potential criminals. She found much of what we would expect: low-income populations are more heavily surveilled and thus more likely to be arrested. Not all police love the technology, though a decent shared database does make it easier to track crimes across jurisdictions. This is the type of ethnographic engagement that can help us figure out how to design data interventions in actual organizations going forward. Hooray for ethnography.
Speaking of government databases for tracking individuals, the Department of Homeland Security is apparently capturing the social media data and search results of all immigrants, including permanent residents and naturalized citizens. (Wow. Even naturalized citizens?) Civil rights groups oppose this move as a violation of privacy with a chilling impact on the democratic exercise of free speech. DHS notes they have been conducting this type of surveillance and will continue to do so as they feel it is their right.
Columbia University School of Social Work assistant professor Desmond Patton is taking a less reflexive approach to data science in crime settings by capturing gang members’ social media posts. The goal of the project is to reduce gang violence by building an “aggression-indicator.” The project is funded by DARPA. Hopefully the project is also mindful of the unintended consequences that may befall people who are tracked and flagged for aggressive tendencies.
But big projects like “All of Us” cost a lot of money. House Representatives Elise Stefanik (R-NY) and Scott Peters (D-CA) are heading up a bipartisan group to fully fund the NIH. They note that funding has shrunk since 2003. This left “NIH’s purchasing power eroded by nearly 25 percent and forced NIH to abandon half of its promising research every year.” I seriously hope they are successful.
Anyone who understands how Facebook works shouldn’t have been surprised. That’s because the same digital platform that offers us social interaction, news, entertainment and shopping all in one place makes its money by making it cheap and easy to send us commercial or political messages, often guided by algorithms. The recent scandal is just a reminder.
Almost every feature on Facebook is designed to make the site engaging — to encourage you to spend time there while Facebook serves more ads to you. At the same time, users can share “promoted posts” — targeted messages that advertisers pay Facebook to place in their feeds — merging pay-for-play content with the natural flow of information among friends and family. It’s a powerful combination.
When Airbnb announced it led a $13 million Series A investment in New York City-based dining reservations platform Resy in January, it was readily assumed that we’d see some sort of synergy between the two companies — aside from the fact that Airbnb CEO Brian Chesky’s sister, Allison Chesky, also happens to be the managing editor of Resy.
And on Wednesday, we did, when Airbnb announced an expansion of an earlier pilot program it ran in May, allowing Airbnb app users to book restaurant reservations, powered by Resy (see video below for an explanation of how it works).
Most papers are reviewed in single-blind fashion—that is, the reviewers know who the authors are, but not vice versa. In theory, that knowledge allows them to exercise a conscious or unconscious bias against researchers from certain countries, ethnic minorities, or women, and be kinder to people who are already well-known in their field. Double-blind reviews, the argument goes, would remove those prejudices. A 2007 study of Behavioral Ecology found that the journal published more articles by female authors when using double-blind reviews—although that conclusion was challenged by other researchers a year later. In a survey of more than 4000 researchers published in 2013, three-quarters said they thought double-blind review is “the most effective method.”
Nearly two-thirds of students who entered the University of Texas at Austin as freshmen in 2013 graduated on time, reflecting significant improvement for the state’s flagship campus amid a national push to get more students through college in four years.
Data released Wednesday show that UT-Austin’s four-year graduation rate rose from 52 percent in 2013 to 66 percent this year. The growth spanned racial groups and family income levels, the university said. Students whose finances made them eligible for Pell Grants made especially large gains, narrowing historic gaps in degree attainment.
Universities around the country are learning this year that free speech is far from free.
University of California President Janet Napolitano said Wednesday that the UC System plans to reimburse its Berkeley campus for half of the cost of security for conservative commentator Ben Shapiro’s speech there last week. Splitting the bill, Napolitano said, means that her office will pay about $300,000 while UC Berkeley pays an equal amount.
Data for Democracy, Bloomberg, and BrightHive have pledged to develop a code of ethics for data science. They’re calling it the Community Principles on Ethical Data Sharing and want it to be akin to the Hippocratic Oath doctors use. It’s so good to see the time and thought going into data science ethics.
Hilary Mason talks about moving from Fast Forward Labs, a data science company she founded, to Cloudera (who acquired the company). She outlines the organizational structure that has to be in place for data science to be effective, which is one of my favorite topics.
Deep Genomics, a Toronto drug-development startup powered by deep learning, is expanding rapidly and just raised another $13m in venture capital. The company was co-founded by University of Toronto professor Brendan Frey and is just one example of the burgeoning deep learning, AI, and data science tech scene in Canada.
A Tinder user living in Europe requested her data history, as European citizens have a right to do. She started using the app in late December of 2013 and generated 800 pages of highly personal data. She was not thrilled. Online dating is riddled with pitfalls. Worrying about which companies know your whereabouts – because people have to type in addresses of places to meet up – and sexual tastes does give pause, no?
Microsoft is setting up a new healthcare department at its Cambridge research facility, as part of plans to use its artificial intelligence software to enter the health market.
The computer giant has created the division as part of its commitment to “transform healthcare” using technologies such as machine learning and cloud computing.
Its research plans include monitoring systems that can help keep patients out of hospitals and alert them in a timely manner about problems, and large studies into diseases such as diabetes.
While some of the most potentially game-changing and pervasive 4IR solutions have not been invented yet, we have a good idea of some of the 4IR tech-related trends that look inevitable over the next 20-30 years. Here are seven that venture capitalists have front of mind as they look to find the world’s most important companies of tomorrow, with examples of the opportunities they present for a sustainable Fourth Industrial Revolution.
1. Proliferation of AI. The convergence of machine learning, big data and hardware advances speeding up computations (graphical processing units – GPUs – and soon deep learning chips) have brought AI from “in vitro” into everyday life. Because of AI, the 4IR is an intelligence and productivity revolution.
Georgia Institute of Technology, College of Computing
Researchers at the Georgia Institute of Technology and the University of Michigan will develop innovative approaches to detecting and deterring the computerized manipulation of financial markets under a $1 million grant from the National Science Foundation’s Big Data program.
Market manipulation, also known as price manipulation, is the deliberate use of misleading information with the intent of deceiving investors about the supply or demand of a security for financial gain or other advantage. Increasingly, manipulators attack market integrity through complex computer-controlled attacks.
Huntsville, AL November 4-5. “Goal: Inspire creativity through collaboration, and energize the current and next generation of problem-solvers to conceive and construct innovative solutions to biotech challenges.” [$$]
Delta Data and Teaching Fellowship is a 6-month commitment that you make to your team and to your non-profit client. The typical time commitment is 5-10 hours per week, with monthly cohort-wide Sunday working sessions. Additionally, at this time we are only accepting applicants living in the San Francisco Bay Area. Deadline for applications is September 30.
As an ORISE Research Participant in the NIAID Emerging Leaders in Data Science Fellowship Program, in the OSMO Office of Data Science and Emerging Technologies, the participant will receive training and hands-on experience in applying and managing big data, bioinformatics strategies, and computational platforms and tool development to study infectious, immunological, and allergic diseases in both extramural and intramural divisions at NIAID. The participant will also receive training in the intersection and management of big data and disease-oriented, data-intense research efforts. Deadline to apply is October 15.
Long Beach, CA December 8. “Machine deception refers to the capacity for machines to act as a medium through which human and other machine agents may be manipulated to believe, act upon or otherwise accept false information.” Deadline for submissions is November 1.
“Stack Overflow used data from their developer survey to build a prediction model for salary, based on role, location, education, experience, and skills. The result was a salary calculator that you can use to gauge how much you should be making.”
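Stack Overflow hasn’t published the exact model behind the calculator, but the basic idea of regressing salary on survey features can be sketched with a toy example. Everything here is hypothetical: the two features (years of experience and a remote-work flag) stand in for the survey’s role, location, education, experience, and skills, and the data is made up.

```python
import numpy as np

# Hypothetical training data: years of experience and a remote-work flag,
# stand-ins for the survey's richer feature set. Salaries are in $1k.
experience = np.array([1, 3, 5, 8, 10, 15], dtype=float)
remote     = np.array([0, 0, 1, 1, 0, 1], dtype=float)
salary     = np.array([60, 75, 95, 110, 115, 140], dtype=float)

# Design matrix with an intercept column; fit by ordinary least squares.
X = np.column_stack([np.ones_like(experience), experience, remote])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

def predict_salary(years, is_remote):
    """Predict salary (in $1k) from the fitted linear model."""
    return coef @ np.array([1.0, float(years), float(is_remote)])
```

A production calculator would use far more features (one-hot encoded roles and locations, skill indicators) and likely a regularized or nonlinear model, but the fit-then-predict shape is the same.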
Visualizations can confer useful information about what a network is learning. When building a Convolutional Neural Network to identify objects in images, we might want to be able to interpret the model’s predictions. For example, we might want to explain why the network classifies a particular image as a spaceship. In this post, we look at papers that introduce visualization techniques for CNN-based image classification models.
We first look at Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. This paper introduces two ideas for visualizing the workings of a neural network. Both require computing the gradient of the output with respect to the input image.
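To make the gradient idea concrete, here is a minimal sketch of a saliency map for a toy linear classifier on random data. For a real CNN you would differentiate a trained network’s class score via a framework’s autodiff; for a linear model the gradient of a class score with respect to the pixels is simply that class’s weight row, which keeps the example self-contained. The image, weights, and class count here are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8x8 grayscale "image" and a toy linear classifier with 3 classes.
image = rng.random((8, 8))
W = rng.standard_normal((3, 64))   # one weight row per class
b = np.zeros(3)

scores = W @ image.ravel() + b     # unnormalized class scores
target = int(np.argmax(scores))    # visualize the predicted class

# d(score_target)/d(pixel) for a linear model is the target's weight row;
# the saliency map is the gradient magnitude, reshaped to image dimensions.
grad = W[target]
saliency = np.abs(grad).reshape(image.shape)

# Normalize to [0, 1] so the map can be displayed as a heatmap.
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min())
```

The per-pixel magnitudes indicate which inputs most affect the class score, which is exactly what the paper’s saliency maps show for deep networks.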
I want to recommend several credible sources of accurate information. Most of the writing on this list is intended to be accessible to anyone—even if you aren’t a programmer or don’t work in tech.