Data Science newsletter – August 29, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 29, 2018


Data Science News

The Untold Story of NotPetya, the Most Devastating Cyberattack in History

WIRED, Security, Andy Greenberg


Crippled ports. Paralyzed corporations. Frozen government agencies. How a single piece of code crashed the world.

How Facebook, Apple and Microsoft are contributing to an openly licensed map of the world

The Open Data Institute


At the State of the Map conference in Milan, the teams from Microsoft, Apple and Facebook presented their projects, describing how they are working with communities.

RENCI to Lead Two $1 Million Grants to Support Data-Intensive Scientific Research

University of North Carolina at Chapel Hill, Renaissance Computing Institute


Two new $1 million awards from the National Science Foundation aim to help researchers take advantage of the latest advances in data science, networking and computation while protecting the integrity of their scientific work. The Renaissance Computing Institute (RENCI) of the University of North Carolina at Chapel Hill will serve as lead institution on both projects.

Many scientists today use sophisticated data-intensive approaches to combine and analyze large data sets from scientific instruments and data stores all over the country. While these techniques hold great value for discovery and innovation, integrating the necessary data and tools into a scientist’s workflow is often a complex undertaking. In addition, errors can be introduced when data is moved or analyzed; if those errors go undetected, they can compromise the science.

With the two new grants, NSF’s Office of Advanced Cyberinfrastructure is supporting RENCI and its collaborators in developing a high-performance, end-to-end platform to facilitate the use and processing of scientific data, and in developing a system to detect and diagnose unintentional data errors.

Government Data Science News

The National Institutes of Health sent a letter to 10,000 researchers last week reminding them to be extremely cautious about sharing the proposals they review. The main concern is to protect innovative American scholarship from falling into foreign hands. The agency is investigating six unnamed labs for potentially sharing research with their shadow labs in other countries. I find it very interesting that the NIH is all atwitter about what seems to be a relatively small amount of potential leakage to foreign labs when the review system invites lots and lots of intellectual “borrowing” by reviewers of the projects under review. I’m sure everyone is always totally totally honest and has never been “inspired” to conduct research very similar to the projects outlined in the proposals they just turned down. Never happens…

The NIH also announced a new partnership with Google Cloud to distribute cloud computing to NIH collaborators at 2,500 institutions. This type of cloud computing partnership is critical for the advancement of data science in research universities. Thank you, Google Cloud.

The City of New York passed a law (45-0, no deadlock here) requiring Airbnb to share the names and addresses of hosts in order to crack down on illegal subletting. Retaliating swiftly, Airbnb has sued to avoid turning over the data. Expedia’s HomeAway division has filed a similar suit. Airbnb and HomeAway argue the law violates users’ constitutional rights to privacy. New York City alleges it cannot enforce existing rules designed to keep housing costs from skyrocketing without access to the companies’ data.

Tech-savvy Senator Ron Wyden (D-OR) sent a letter to the Department of Justice asking for an investigation into the use of cell site simulators by law enforcement. The devices have conclusively been shown to disrupt 911 service, posing a threat to public safety. Typically, cell site simulators are used to intercept communications from a target phone, but “a cell-site simulator wouldn’t just disrupt service for the specific person or persons being tracked but would likely disrupt service for every mobile device in the area as it tricks every phone in the area into connecting to the fake base station in search of the target phone. This could be especially dangerous during a natural disaster”.

The National Science Foundation gave nearly $5 million to develop the Research Security and Operations Center (ResearchSOC) and provide cybersecurity for scientific computing. Unless the computing is happening in a centralized, generally controlled environment, I feel like this is going to have an extremely limited impact. Maybe the proof of concept will be compelling.

And the NSF is worried that facilities costs for big projects like the Large Synoptic Survey Telescope (LSST) are so high that supporting them with the current budget could leave no money to conduct research. Director France Córdova is investigating strategies to increase the operating and maintenance budgets across a variety of disciplines, but astronomy is the most pressing problem, with the LSST scheduled to go live in 2020.

California’s Assembly and Senate have passed a bill that would make all peer-reviewed research funded by the state of California open to the public no more than one year after publication. Governor Brown is now considering it. For-profit publisher Sage has a large office outside of Los Angeles and would be negatively impacted by the bill. The public, located all over the state, would be positively impacted.

Germany is starting its own version of DARPA, which will initially focus only on cybersecurity with a little AI thrown in to keep it spicy. The goal is to be independent in cybersecurity so the country can detect and fend off attacks without relying on private companies or other countries.

Is machine learning overhyped?

Chemical & Engineering News, Sam Lemonick


If machine learning is less valuable than it’s been claimed to be, says George Schatz, a physical chemist at Northwestern University, “people end up wasting time and effort” testing it in their labs. Scientists who invest training, lab time, and money into machine learning could find themselves in a difficult position if the tool doesn’t solve a problem as promised.

On the other hand, if machine learning is the wave of the future, chemists who aren’t using it risk falling behind their peers.

It probably won’t be possible to definitively answer the question “Is machine learning overhyped?” without the benefit of hindsight. But after conducting dozens of conversations with chemists, C&EN has found that a consensus about the current state of machine learning emerges.

Introducing High School Women to the World of Artificial Intelligence

Boston University, BU Today


On a recent afternoon, 25 young high school women gathered in a BU computer science (CS) classroom to learn about artificial intelligence (AI). Half of them were learning the basics of Computer Vision for American Sign Language alphabet recognition. They watched as a photograph of a house transformed on-screen into a numerical code that the artificial intelligence software could understand. The rest of the students were engaged in a project called Twitter Classification for Disaster Relief, where they were using a Naive Bayes classifier to classify by category tweets relating to Hurricane Sandy.

The young women were enrolled in Boston University AI4ALL, a new summer program designed to promote greater diversity and inclusion in the overwhelmingly male AI field. The three-week program, which concluded earlier this month, introduced the rising juniors and seniors from the Boston area to AI through a series of team projects, industry field trips, and presentations from guest speakers, capped off with a group research project and presentation to friends and family.

Airbnb Sues New York City to Block User-Data Bill Over Privacy

Bloomberg Technology, Olivia Zaleski and Gerrit De Vynck


Home-rental providers Airbnb Inc. and HomeAway are suing New York City over a recently passed law that allows the collection of data on the websites’ hosts, claiming the ordinance violates users’ constitutional rights.

Airbnb is hoping to avoid millions in losses when the law, designed to police short-term home rentals, takes effect this winter. The New York City legislation, which passed with a 45-0 vote, would require Airbnb and similar sites to share the names and addresses of their hosts with the city’s Office of Special Enforcement. After Airbnb sued Friday morning, Expedia Inc.’s HomeAway filed a similar case, saying the new legislation violates privacy laws.

How Microsoft plans to save the internet from bad actors with AccountGuard

diginomica, Jerry Bowles


Buried in the avalanche of hot political news this week are details of how Microsoft (yes, that Microsoft) is leading the charge to root out and destroy efforts by bad actors to disrupt free elections. Is the colossus of Redmond positioning itself as the internet’s top cop?

Microsoft’s Digital Crimes Unit (DCU) successfully executed a court order last week to disrupt and transfer control of six internet domains created by a group widely associated with the Russian government and known as Strontium, APT28, or more colorfully, Fancy Bear.

Social science replication crisis: studies in top journals keep failing to replicate

Vox, Brian Resnick


One of the cornerstone principles of science is replication. This is the idea that experiments need to be repeated to find out if the results will be consistent. The fact that an experiment can be replicated is how we know its results contain a nugget of truth. Without replication, we can’t be sure.

For the past several years, social scientists have been deeply worried about the replicability of their findings. Incredibly influential, textbook findings in psychology — like the “ego depletion” theory of willpower, or the “marshmallow test” — have been bending or breaking under rigorous retests. And the scientists have learned that what they used to consider commonplace methodological practices were really just recipes to generate false positives. This period has been called the “replication crisis” by some.

And the reckoning is still underway. Recently, a team of social scientists — spanning psychologists and economists — attempted to replicate 21 findings published in the most prestigious general science journals: Nature and Science. Some of the retested studies have been widely influential in science and in pop culture, like a 2011 paper on whether access to search engines hinders our memories, or whether reading books improves a child’s theory of mind (meaning their ability to understand that other people have thoughts and intentions different from their own).

On Monday, they’re publishing their results in the journal Nature Human Behaviour.

The Strategic Value of Library Carpentry and The Carpentries to Research Libraries

The Carpentries, Elaine L. Westbrooks


The Data Science Community Newsletter, a helpful resource by Laura Noren and Brad Stenger, helps us track all the data science initiatives sweeping through higher ed. Brown, Harvard, NYU, Stanford, UC Irvine, UIUC, University of Michigan, and others have launched data science initiatives in the past few years, all of which have been announced in the newsletter.

A recent paper titled Creating Institutional Change in Data Science from the Moore-Sloan Data Science Environments (MSDSE), NYU, UC Berkeley, and the University of Washington, highlights the importance of library services and spaces in driving these initiatives on campuses. “Partnering with our libraries”, according to the paper, “has been an important component of these efforts.” The report also notes that libraries’ engagement with The Carpentries has strengthened ties with groups on campus.

As the MSDSE report shows, libraries can serve as the institutional home for data science initiatives within their communities. In general, libraries are a natural hub for researchers, citizens, students, programmers: we help bring people together to tackle difficult challenges.

Bharat Anand named vice provost for advances in learning

Harvard Gazette


Since 2013, Anand has been the faculty chair of HBX, the Harvard Business School digital learning initiative that he helped to create. Anand said that he is looking forward to the challenge and opportunities of his University-wide role.

“Harvard has been a hotbed of innovations in pedagogy and learning during the last few years,” he said. “It’s a good time to take stock of what we’ve learned from these various projects and how this might inform our future efforts, while also recognizing that we are still probably in the early stages of imagining and shaping what the future of higher education will eventually look like. I’m looking forward to working with the many colleagues across the University who care deeply about these questions, and seeing how I can help with those efforts.”

Sidewalk Labs Reveals What Its Toronto Smart City May Look Like

Metropolis Magazine, Zachary Edelson


While questions surrounding data collection loom, the renderings hint at what the technology company hopes to achieve on the Toronto waterfront.

UW-Madison to Upgrade Engineering Campus With $100M Foxconn Gift

Xconomy, Jeff Buchanan


Foxconn, a leading Taiwanese contract manufacturer constructing a huge electronic display assembly plant in Southeastern Wisconsin, announced a $100 million gift to the state’s flagship public university Monday. The company’s gift to the University of Wisconsin-Madison will support research and development of new technologies statewide, Foxconn said.

The nine-figure gift from Foxconn is one of the largest in UW-Madison’s history, chancellor Rebecca Blank said during a signing ceremony held on the school’s campus Monday.

The Foxconn gift will allow the university to replace a building on its engineering campus with a new facility that will serve engineering students, faculty, and staff, as well as researchers from other disciplines.


OpenML Fall Hackathon

Open Machine Learning


Paris, France September 17-21, co-located with COSEAL. [registration required]

Cornell’s Tech/Law Colloquium

Cornell Computing and Information Science, Cornell Law School, Artificial Intelligence, Policy, and Practice initiative


New York, NY September 4, starting at 7 p.m. in Myron Taylor Hall, room 182. Speaker: Karen Levy (Cornell University). [free]

Cory Doctorow with Jad Abumrad

Columbia University, The Brown Institute for Media Innovation


New York, NY September 27 starting at 5 p.m. A public lecture by Cory Doctorow followed by a Q&A with Jad Abumrad organized by Brown Institute, Columbia University. [free, registration required]

Tools & Resources

A search engine for color palettes

FlowingData, Picular


“Picular is a simple tool that lets you search for a topic, and with Google Images as source, outputs a set of colors related to your query. This is going to be a great timesaver.”


Internships and other temporary positions

Scholarly Communication Assistant

New York University, NYU Libraries; New York, NY

Project Instructor (Part-time)

DataCamp; Anywhere

Full-time, non-tenured academic positions

Professor of Practice – Future of Libraries, Distinguished Practitioner in Residence

University of Washington Information School; Seattle, WA

Research Associate in Reinforcement Learning

University of Oxford, Department of Computer Science; Oxford, England

Post-doc in systems/cognitive neuroscience

University of California-San Diego, Department of Cognitive Science; San Diego, CA

Full-time positions outside academia

Data Visualization Designer

Capital One; Vienna, Virginia, or New York, New York
