Data Science newsletter – June 1, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for June 1, 2018

GROUP CURATION: N/A

 
 
Data Science News



#ethicalCS (engagement)

Twitter, Moments



A summary of the #ethicalCS Twitter edchat on May 30th, 2018 about engaging others in ethically-minded Computer Science education


Carroll College’s new data major aims to make students more marketable

Helena Independent Record, Erin Loranger



Carroll College announced a new degree program in data science Tuesday that will help prepare students to enter one of the fastest growing career paths in the country.

The data science major and minor will combine computer science, math, statistics, and business to analyze and understand data in various forms.

“We are at the forefront of what other colleges and universities around Montana are doing to train students in the skills required to be a Data Scientist,” Eric Sullivan, associate professor of mathematics, said in a statement. “Coming out of our program, our students will understand data, be able to write programs to make sense of it, and will be able to apply it to the specific demands of whichever industry they’re working in.”


Locking the lab: Globalization raises concerns about research data security on campus

Madison.com, Pat Schneider



The haul from university computer systems by an Iranian hacking ring was mind-boggling: 31 terabytes (31 trillion bytes) of data and intellectual property. It was valued at $3.4 billion and originated from more than 300 institutions, 144 of them in the United States.

Weeks after federal prosecutors brought indictments in the case, a congressional hearing on April 11 asked how universities might better protect intellectual property and national security data from theft by adversarial nations.

Universities need to beef up security and lock their laboratories to prevent losses of what is often publicly funded research, said members of subcommittees of the House Science, Space and Technology Committee.


New tool improves fishing efficiency and sustainability

Stanford University, Stanford News



New software targets the most abundant fishing grounds and reduces catch of unwanted or protected species using satellite data, maps and observations.


Intermountain, University of Michigan advance precision medicine for cardiovascular care, cancer

Healthcare IT News, Bernie Monegain



Two new studies shine a light on the promise and accelerating speed of precision medicine and genomics capabilities to transform healthcare faster than many observers might imagine.

Intermountain Healthcare and the University of Michigan separately conducted research projects that uncovered a genetic link in a rare cardiovascular condition and a flaw in androgen-specific therapies, respectively.

Taken together, the two give a glimpse into the future of precision medicine tactics pinpointing incredibly specific findings clinicians can use to personalize treatment regimens.


This Brewery Is Using Machine Learning to Create the Ideal IPA

Food & Wine, Mike Pomranz



As computers became integrated into everyday life, a romanticism emerged: the idea that they might be able to do everything perfectly—from handling your finances to finding you a mate. And as the field of artificial intelligence continues to grow, a brewery in Virginia has even used this technology to create what it hopes could be the perfect IPA—and the methodology they used is certainly intriguing.

Charlottesville’s Champion Brewing Company recently teamed up with the nearby machine learning company Metis Machine to brew their new ML IPA—a computer’s vision of what should essentially be the ideal IPA. And since the project is based in science, Champion was very specific about what data it chose to feed into the computer.

“We provided the parameters on which IPAs are judged at the Great American Beer Festival (SRM, ABV, IBU) and matched that range with the 10-best-selling IPAs nationally, as well as the 10 worst-selling IPAs at a local retailer, and Metis came up with the results,” Hunter Smith, owner of Champion Brewing Company, said in announcing the beer. “We are stoked to be the first to use this method of creating a beer recipe.”
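The article doesn’t describe Metis Machine’s actual model, only the inputs: judging parameters (SRM, ABV, IBU) for best- and worst-selling IPAs. A purely illustrative sketch of one way such labeled data could steer a recipe—nearest-centroid scoring over those three features, with made-up numbers—looks like this:

```python
# Illustrative only: the real model behind ML IPA is not described in the
# article. This simply scores a candidate recipe by whether its (SRM, ABV,
# IBU) triple sits closer to the "best sellers" than the "worst sellers".
def centroid(recipes):
    """Average each (SRM, ABV, IBU) dimension across a list of recipes."""
    n = len(recipes)
    return tuple(sum(r[i] for r in recipes) / n for i in range(3))

def prefers_best(candidate, best_sellers, worst_sellers):
    """True if the candidate is nearer the best-selling centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sq_dist(candidate, centroid(best_sellers)) < \
           sq_dist(candidate, centroid(worst_sellers))

# Hypothetical (SRM, ABV, IBU) triples, not real sales data.
best = [(6.0, 6.5, 60), (7.0, 7.0, 65)]
worst = [(12.0, 9.0, 100), (14.0, 9.5, 110)]
```

A candidate like `(6.5, 6.8, 62)` would score as "best-like" under this toy rule; a real system would presumably learn far richer structure from the sales data.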


Predictive algorithms are infiltrating schools — here’s why that’s a good thing

The Next Web, Germain Chastel



‘Risk identification’ has become a highly in-vogue term in the American public school system. Districts want to decrease dropout rates, and a slew of data-driven tools have cropped up that use predictive analytics derived from questionably acquired data to identify at-risk students. Preventing dropout, like preventing crime, is a noble and worthy goal. But the reason that predictive policing has inspired outrage is that algorithms tend to perpetuate systemic bias, and only work by collecting swaths of data points — data that people may not know is being collected.

The rise of predictive analytics in institutions such as schools and criminal justice systems raises a series of ethical questions which I’ve outlined below. But the fact is these algorithms are here to stay — and, I argue, that’s a good thing. The questions they raise — about racism, data ownership, and the ethics of predicting crime at all — are ones we should have been examining for decades.


Open-source style community engagement for the Data Commons Pilot Phase Consortium

Living in an Ivory Basement blog, Rayna Harris



In November 2017, the National Institutes of Health (NIH) announced the formation of a Data Commons Pilot Phase Consortium (DCPPC) to accelerate biomedical discovery by making big biomedical data more findable and usable.

It’s called a consortium because the awardees are all working together in concert and collaboration to achieve the larger goal. Those awardees (big cats who run academic research labs or companies) have each brought on numerous students, postdocs, and staff, so the size of the consortium has already grown to over 300 people! That’s a lot of cats to herd.

So, how are we keeping everyone in the community coordinated and engaged? Here’s a little insight into our approach, which was first outlined by Titus in this blog post.


Here’s Mary Meeker’s essential 2018 Internet Trends report

TechCrunch, Josh Constine



Want to understand all the most important tech stats and trends? Legendary venture capitalist Mary Meeker has just released the 2018 version of her famous Internet Trends report. It covers everything from mobile to commerce to the competition between tech giants. Check out the full report below, and we’ll add some highlights soon.


Google will always do evil – but its employees won’t.

Engadget, Jessica Conditt



One day in late April or early May, Google removed the phrase “don’t be evil” from its code of conduct. After 18 years as the company’s motto, those three words and chunks of their accompanying corporate clauses were unceremoniously deleted from the record, save for a solitary, uncontextualized mention in the document’s final sentence.

Google didn’t advertise this change. In fact, the code of conduct states it was last updated April 5th. The “don’t be evil” exorcism clearly took place well after that date.

Google has chosen to actively distance itself from the uncontroversial, totally accepted tenet of not being evil, and it’s doing so in a shady (and therefore completely fitting) way. After nearly two decades of trying to live up to its motto, it looks like Google is ready to face reality.

In order for Google to be Google, it has to do evil.


How Spotify Is Killing Jazz, Soul, Classical Music

hypebot, Chris Rizik



Is it time for a Fair Streaming movement? Chris Rizik explores how the structure of the new streaming economy, and the way in which artists are paid, could spell bad news for Jazz, Soul, Classical and other niche music genres.


Prestige drives epistemic inequality in the diffusion of scientific ideas

arXiv, Computer Science > Social and Information Networks; Allison C. Morgan, Dimitrios Economou, Samuel F. Way, Aaron Clauset



The spread of ideas in the scientific community is often viewed as a competition, in which good ideas spread further because of greater intrinsic fitness. As a result, it is commonly believed that publication venue and citation counts correlate with importance and impact. However, relatively little is known about how structural factors influence the spread of ideas, and specifically how where an idea originates can influence how it spreads. Here, we investigate the role of faculty hiring networks, which embody the set of researcher transitions from doctoral to faculty institutions, in shaping the spread of ideas in computer science, and the importance of where in the network an idea originates. We consider comprehensive data on the hiring events of 5,032 faculty at all 205 Ph.D.-granting departments of computer science in the U.S. and Canada, and on the timing and titles of 200,476 associated publications. Analyzing three popular research topics, we show empirically that faculty hiring plays a significant role in driving the spread of ideas across the community. We then use epidemic models to simulate the generic spread of research ideas and quantify the consequences of where an idea originates on its long-term diffusion across the network. We find that research from prestigious institutions spreads more quickly and completely than work of similar quality originating from less prestigious institutions. Our analyses establish the theoretical trade-offs between university prestige and the quality of ideas necessary for efficient circulation. These results suggest a lower bound for epistemic inequality, identify a mechanism for the persistent epistemic advantage observed for elite institutions, and highlight limitations for meritocratic ideals.
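The epidemic framing in the abstract can be sketched with a minimal susceptible–infected (SI) simulation on a toy hiring network. The network, transmission rule, and parameters below are illustrative assumptions, not the authors’ actual model; the point is only that where an idea is seeded changes how far it can travel:

```python
import random

def si_spread(graph, seed, p_transmit=0.5, steps=10, rng=None):
    """Simulate susceptible-infected (SI) spread of an idea.

    graph: dict mapping node -> list of downstream nodes
           (edges point from doctoral to hiring institutions)
    seed:  node where the idea originates
    Returns the set of nodes that adopted the idea.
    """
    rng = rng or random.Random(0)
    adopted = {seed}
    for _ in range(steps):
        newly = set()
        for node in adopted:
            for neighbor in graph.get(node, []):
                if neighbor not in adopted and rng.random() < p_transmit:
                    newly.add(neighbor)
        adopted |= newly
    return adopted

# Toy "hiring network" of four hypothetical institutions.
toy_network = {
    "Elite U": ["State U", "Tech U", "Liberal Arts C"],
    "State U": ["Liberal Arts C"],
    "Tech U": ["State U"],
    "Liberal Arts C": [],
}

# With certain transmission, an idea seeded at the well-connected
# institution reaches everyone; seeded downstream, it goes nowhere.
reach = si_spread(toy_network, seed="Elite U", p_transmit=1.0, steps=5)
```

Because hiring edges flow overwhelmingly out of prestigious departments, even an idea of equal quality seeded at "Liberal Arts C" in this toy graph never leaves its origin—the structural asymmetry the paper quantifies.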


Predicting the temporal activity patterns of new venues

EPJ Data Science; Krittika D’Silva et al.



Estimating revenue and business demand of a newly opened venue is paramount as these early stages often involve critical decisions such as first rounds of staffing and resource allocation. Traditionally, this estimation has been performed through coarse-grained measures such as observing numbers in local venues or venues at similar places (e.g., coffee shops around another station in the same city). The advent of crowdsourced data from devices and services carried by individuals on a daily basis has opened up the possibility of performing better predictions of temporal visitation patterns for locations and venues. In this paper, using mobility data from Foursquare, a location-centric platform, we treat venue categories as proxies for urban activities and analyze how they become popular over time. The main contribution of this work is a prediction framework able to use characteristic temporal signatures of places together with k-nearest neighbor metrics capturing similarities among urban regions, to forecast weekly popularity dynamics of a new venue establishment in a city neighborhood. We further show how we are able to forecast the popularity of the new venue after one month following its opening by using locality and temporal similarity as features. For the evaluation of our approach we focus on London. We show that temporally similar areas of the city can be successfully used as inputs of predictions of the visit patterns of new venues, with an improvement of 41% compared to a random selection of wards as a training set for the prediction task. We apply these concepts of temporally similar areas and locality to the real-time predictions related to new venues and show that these features can effectively be used to predict the future trends of a venue. Our findings have the potential to impact the design of location-based technologies and decisions made by new business owners. [full text]
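The core of the framework described above—forecasting a new venue’s weekly pattern from the k most temporally similar areas—can be sketched in a few lines. The distance metric, toy signatures, and k value here are simplifying assumptions, not the paper’s exact setup:

```python
import math

def knn_forecast(new_signature, known_areas, k=2):
    """Forecast a venue's weekly popularity as the mean pattern of the
    k areas whose early temporal signatures are most similar.

    new_signature: observed early visit counts for the new venue
    known_areas:   list of (early_signature, full_weekly_pattern) pairs
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Rank candidate areas by similarity of their early signatures.
    nearest = sorted(known_areas, key=lambda area: dist(area[0], new_signature))[:k]

    # Average the full weekly patterns of the k nearest areas.
    n_weeks = len(nearest[0][1])
    return [sum(area[1][w] for area in nearest) / k for w in range(n_weeks)]

# Hypothetical data: (first-days signature, weekly visits over 3 weeks).
areas = [([1, 2], [10, 20, 30]),
         ([1, 3], [12, 22, 32]),
         ([9, 9], [100, 100, 100])]
forecast = knn_forecast([1, 2], areas, k=2)
```

Here the new venue’s early signature matches the first two areas, so its forecast is their averaged trajectory rather than the dissimilar third area’s—the same locality-and-similarity intuition the authors evaluate on London wards.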

 
Deadlines



Generative Design in Minecraft

“The Settlement Generation Challenge is about writing an algorithm that can create a settlement for a given, unknown Minecraft map. The challenge is to produce an algorithm that is adaptive towards the provided map, creates a settlement that satisfies a range of functional requirements – but also looks good and evokes an interesting narrative. The goal is basically to produce an algorithm that can rival the state of the art of what humans can produce.” Deadline for submissions is June 30.
 
Tools & Resources



Introduction to Recommender Systems in 2018

RE•WORK, Tryolabs, Gastón Rodríguez



In this blog post, we’ll describe the broad types of the most popular recommender systems and give insights into how they work, going through a few examples.

To give some motivation on the subject and help decide whether it’s a worthwhile investment, we’ll point to some real-life case studies, talk about the high level requirements for implementing recommender systems, and discuss how they can be evaluated fairly.
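For a flavor of the most common family the post covers, here is a minimal user-based collaborative filtering sketch: score a user’s unrated items by the similarity-weighted ratings of other users. The cosine metric and the toy ratings are standard textbook choices, not drawn from the linked post:

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 where unrated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=1):
    """Recommend unrated items for `user`, scored by similarity-weighted
    ratings from other users (user-based collaborative filtering).

    ratings: dict user -> dict item -> rating
    """
    items = sorted({i for r in ratings.values() for i in r})

    def vec(u):
        return [ratings[u].get(i, 0) for i in items]

    scores = {}
    for item in items:
        if item in ratings[user]:
            continue  # only score items the user hasn't rated
        num = den = 0.0
        for other in ratings:
            if other == user or item not in ratings[other]:
                continue
            sim = cosine(vec(user), vec(other))
            num += sim * ratings[other][item]
            den += sim
        if den:
            scores[item] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy example: ana's tastes track bob's, so bob's love of item "c" wins out.
ratings = {"ana": {"a": 5, "b": 4},
           "bob": {"a": 5, "b": 4, "c": 5},
           "eve": {"a": 1, "c": 1}}
```

Content-based and hybrid recommenders, the other broad types the post describes, replace the user–user similarity here with item-feature similarity or a blend of both.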


Public Data Release of Stack Overflow’s 2018 Developer Survey

Stack Overflow, Kaggle



Starting today, you can access the public data release for Stack Overflow’s 2018 Developer Survey. Over 100,000 developers from around the world shared their opinions about everything from their favorite technologies to job preferences, and this data is now available for you to analyze yourself. This year, we are partnering with Kaggle to publish and highlight this dataset. This means you can access the data both here on our site and on Kaggle Datasets, and that on Kaggle, you can explore the dataset using Kernels. Kaggle is awarding two $1,000 awards over the next two weeks to authors of top Kernels on the Stack Overflow dataset.
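A first analysis of the release usually starts by tallying the survey’s multi-select columns, which ship as semicolon-delimited strings. The sketch below uses a tiny inline stand-in for `survey_results_public.csv`; the column names follow the 2018 schema but should be treated as assumptions to check against the schema file published alongside the data:

```python
import csv
import io
from collections import Counter

# Tiny stand-in for survey_results_public.csv. Column names and the
# semicolon-delimited multi-select format are assumptions based on the
# 2018 release; verify against the accompanying schema file on Kaggle.
sample = io.StringIO(
    "Respondent,Country,LanguageWorkedWith\n"
    "1,United States,Python;JavaScript\n"
    "2,India,JavaScript\n"
    "3,Germany,Python\n"
)

# Split each multi-select answer and count language mentions.
language_counts = Counter()
for row in csv.DictReader(sample):
    for lang in row["LanguageWorkedWith"].split(";"):
        language_counts[lang] += 1
```

Swapping the `StringIO` for an `open(...)` call on the real CSV gives the full-survey tallies; on Kaggle the same loop runs directly inside a Kernel.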

 
Careers


Full-time positions outside academia

Front End Software Engineer



Google; Cambridge, MA

Tenured and tenure track faculty positions

Director, Center for Social Science



Western Washington University; Bellingham, WA
