Data Science newsletter – March 27, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for March 27, 2018


 
 
Data Science News



Company Data Science News

Apple entered into agreements to store health care data from 13 hospital systems on its platform back in January. This week, it is up to 39 partners, including NYU Langone. The effort would allow all of the electronic health data to become “liquid”, flowing from provider to provider with patients as they navigate their health care needs. This could be a big step towards delivering some of the consumer-facing “goods” that were promised when electronic health record technology was introduced. Medical histories no longer need to be obscure, stored in a paper file at an office with limited hours, unavailable to the emergency room or wherever a patient may land in need of urgent, accurate assistance. Historically, electronic health records have also been obscure. Now, if they’re in standardized formats on patients’ phones, critical care may receive a quality boost, which is a big deal considering how many health outcomes depend on the speed and accuracy of early interventions.



Google is building a techtopia in Toronto called Sidewalk. Project lead Rohit Aggarwala laid out a data non-hoarding principle theorized by Ann Cavoukian: “privacy by design is that you don’t ever start the idea of hoovering up a bunch of information and then figur[ing] out what it’s there for….you figure out what is the minimum amount of information that we need to affect this problem” we are facing. That still leaves a lot of room to define the problem broadly and thereby collect vast amounts of information, but at least it moves us past the ‘more is more’ triumphalism of big data.

In other coverage, people in Toronto are pressuring Google and the one Toronto official who has seen the Sidewalk + Toronto agreement to make the details public. One of the main issues is what will be done with users’/residents’ data.

Baidu has a pocket speech-to-speech translator that impressed people at MIT Tech Review’s EmTech Digital conference in San Francisco. It was able to translate back and forth between Mandarin and English. I haven’t had a chance to try it myself, but I wish I could have.



Adobe was also at that conference showing off a technology that can find and replace portions of an image with an upgrade (or downgrade). Don’t like the sky in your selfie? Replace it with a quiet sunrise. Wish those buildings behind you were less strip mall-ish? Swap out the Subway, McDonald’s, and dental office for a string of wine bars, boutiques, and independent bookstores straight out of Brooklyn. Point being: where Photoshop introduced doubt about the veracity of the digital image, this new technology will make the entire question of truth in images moot.

Pew Research finds that roughly half of Americans do not trust social media companies or the government to properly safeguard their data. This is a big deal for anyone planning to collect, store, or use data gathered from users. If you think the reckoning happening now between Facebook, Cambridge Analytica, and US regulators is somehow outside of the realm of data science research, think again. Any regulations will likely impact data scientists, too. Any breaches of trust perpetrated by others may also reduce public trust in researchers, especially in this case where a professor led the initial data collection.



Francois Chollet of Google unspooled a Twitter thread explaining that the problem with Facebook is “not *just* the loss of your privacy and the fact that it can be used as a totalitarian panopticon. The more worrying issue, in my opinion, is its use of digital information consumption as a psychological control vector.” We should all perk up when someone from within Google is worried about the “totalitarian panopticon.” He goes on, “If Facebook gets to decide, over the span of many years, which news you will see (real or fake), whose political status updates you’ll see, and who will see yours, then Facebook is in effect in control of your political beliefs and your worldview.” The thread is here, and a longer Medium post is here.



Google may owe Oracle $8.8bn for using its Java APIs in the Android product. The case has gone back and forth and back again, but it is here as a reminder of just how much skin these companies have in this game.



Turner Broadcasting wants to merge with AT&T to get its data.



Microsoft is continuing to grow its AI footprint, which will likely keep the race for AI talent ridiculously tight.



Under pressure from just about everyone, Facebook announced that it will stop using data from third-party data brokers like Experian and Acxiom. Advertisers on the platform are unhappy. Consumers may never have understood how data brokers were involved in the ecosystem, making it hard for them to tell how to react.


New Mapping Tool to Help Appalachian Counties Direct Resources for Opioid Crisis

EfficientGov, Andrea Fox



NORC at the University of Chicago and its Federal and state partners developed a data-driven tool that maps socioeconomic factors underlying the opioid epidemic in hard hit Appalachian counties.


Gun deaths could become easier to study thanks to the new spending bill

The Verge, Rachel Becker



The Centers for Disease Control and Prevention now have the government’s permission to resume gun violence research, in writing: the massive omnibus spending bill that President Donald Trump signed today clarifies that a 22-year-old ban on using federal funds to advocate or promote gun control doesn’t actually ban research.

While the bill is a step in the right direction, researchers will only believe that the landscape of gun violence research is actually changing when they see money for it in the CDC’s budget. “It’s not bad news — it’s good news,” says Jeffrey Swanson, a professor in psychiatry and behavioral sciences at Duke University. “But I’m skeptical that it’s going to really turn things around without some money being made available.”


Baidu shows off its instant pocket translator

MIT Technology Review, James Temple



Baidu showed off the speed of its pocket translator for the first time in the United States during an afternoon presentation at MIT Technology Review’s EmTech Digital conference in San Francisco.

The Chinese Internet giant has made significant strides improving machine language translation since 2015, using an advanced form of artificial intelligence known as deep learning, said Hua Wu, the company’s chief scientist focused on natural-language processing.


Adobe bets that AI tools can foster real creativity

MIT Technology Review, Rachel Metz



On Monday at MIT Technology Review’s yearly EmTech Digital conference in San Francisco, Gavin Miller, head of Adobe Research, explored this question, showing off videos of several AI-using prototypes from software maker Adobe’s Creative Intelligence Lab.

One of the prototypes, called Project Scene Stitch, illustrates how an algorithm could be used to replace ugly buildings in the foreground of a photo—a user would enter some key words, and the algorithm would find another image that would fit naturally into the space the user wanted to fill.


Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science

OpenReview Anonymous Preprint, Anonymous



In this position paper, we propose data statements as a practice that NLP technologists, in both research and development, can adopt to begin to address critical scientific and ethical issues that result from the use of data from certain populations in the development of technology for other populations. We describe the form data statements can take and explore the implications of adopting them as part of our regular practice. We argue that they will help alleviate issues related to exclusion and bias in language technology; lead to better precision in claims about how NLP research can generalize and thus better engineering results; protect companies from public embarrassment; and ultimately lead to language technology that meets its users in their own preferred linguistic style and furthermore does not misrepresent them to others.


It’s Time for Data Ethics Conversations at Your Dinner Table

Tech at Bloomberg



This opinion piece has been contributed by Lucy C. Erickson, Natalie Evans Harris and Meredith M. Lee, three members of the D4GX community. All three authors contributed equally to this piece.

With 2.5 quintillion records of data created every day, people are being defined by how they travel, surf the Internet, eat, and live their lives. We are in the midst of a “data revolution,” where individuals and organizations can store and analyze massive amounts of information. Leveraging data can allow for surprising discoveries and innovations with the power to fundamentally alter society: from applying machine learning to cancer research to harnessing data to create “smart” cities, data science efforts are increasingly surfacing new insights ‒ and new questions.

Working with large databases, new analytical tools, and data-enabled methods promises to bring many benefits to society. However, “data-driven technologies also challenge the fundamental assumptions upon which our societies are built,” says Margo Boenig-Liptsin, co-instructor of UC Berkeley’s “Human Contexts and Ethics of Data” course. Boenig-Liptsin notes, “In this time of rapid social and technological change, concepts like ‘privacy,’ ‘fairness,’ and ‘representation’ are reconstituted.” Indeed, bias in algorithms may favor some groups over others, as evidenced by notorious cases such as the finding by MIT Researcher Joy Buolamwini that certain facial recognition software fails to work for those with dark skin tones. Moreover, lack of transparency and data misuse at ever-larger scales has prompted calls for greater scrutiny on behalf of more than 50 million Facebook users.


It’s not just Facebook. Thousands of companies are spying on you (opinion)

CNN.com, Bruce Schneier



In the wake of the Cambridge Analytica scandal, news articles and commentators have focused on what Facebook knows about us. A lot, it turns out. It collects data from our posts, our likes, our photos, things we type and delete without posting, and things we do while not on Facebook and even when we’re offline. It buys data about us from others. And it can infer even more: our sexual orientation, political beliefs, relationship status, drug use, and other personality traits — even if we didn’t take the personality test that Cambridge Analytica developed.

But for every article about Facebook’s creepy stalker behavior, thousands of other companies are breathing a collective sigh of relief that it’s Facebook and not them in the spotlight. Because while Facebook is one of the biggest players in this space, there are thousands of other companies that spy on and manipulate us for profit.

Harvard Business School professor Shoshana Zuboff calls it “surveillance capitalism.” And as creepy as Facebook is turning out to be, the entire industry is far creepier. It has existed in secret far too long, and it’s up to lawmakers to force these companies into the public spotlight, where we can all decide if this is how we want society to operate and — if not — what to do about it.


Government Data Science News

US Immigration and Customs Enforcement (ICE) is using Facebook data to track individuals, according to a new report by Lee Fang at The Intercept. The report also indicates that Palantir is working with ICE. That relationship is a matter of public record, but it is likely far more consequential than ICE’s use of Facebook data.



The 2020 Census is in deep trouble. The Trump Administration has pushed forward an idea to require respondents to identify their citizenship. Twelve states are suing to prevent the new question from appearing. It is hard to blame immigrants for avoiding government surveillance as the promises they were given about the DACA registry have not been met. Further diminishing trust, the Census 2020 trial run now underway in Rhode Island is not going well. In the largely Latino community, “people are afraid.”

NASA has pushed back the launch date for the James Webb Space Telescope to May 2020. The telescope was originally slated to launch this October, and agency director Robert Lightfoot also warned that it may cost more than the $8bn budget. NASA has been discussed as a case of a “permanently failing organization” in which the objectives of key stakeholders are misaligned, resulting in persistently poor performance (Meyer & Zucker, 1989).



France has made a €1.5bn commitment to fund research and development in AI, citing no interest in falling behind the US and China. Notably, President Macron included ethical design as a key focus. Then he gave an interview to Wired.



Waymo appears to be winning the self-driving car race, having had no recent accidents resulting in death or dismemberment. Instead, it ordered 20,000 self-driving Jaguars, which will allow the fleet to complete 1 million self-driven trips in 2020.



The National Science Foundation is revising its 2014 Geosciences road map. The comments will be open until April 15th. In particular, they are looking for input on “imperatives in research, community resources and infrastructure, data and infrastructure, and education and diversity while also looking at research frontiers.”


Tenacious new gun researchers are determined to break cycle of mass shootings

The Washington Post, William Wan



After a two-decade recruiting drought, gun researchers say they are suddenly seeing a wave of young scientists entering their field — an unforeseen consequence of recent mass shootings.

Unlike past generations, the new scientists appear undeterred by the field’s lack of funding, dearth of data and hostile political climate. The new contingent has brought energy and fresh approaches to a beleaguered, intractable domain, longtime experts say.

Their work coincides with a resurgence of gun control activism — led by the teenage Parkland, Fla., student survivors who mounted this weekend’s March for Our Lives — as well as with increased interest from private foundations and state-level governments in funding such research. And on Wednesday, congressional leaders took baby steps toward lifting what has amounted to a ban on federally funded gun research by issuing clarifying language — but no new money — supporting such research.

Together, the infusion of talent and momentum represents a tantalizing opportunity to finally answer questions that have long plagued the U.S. gun debate: questions about the causes of gun violence, its effects on society and — most crucial — the best ways to prevent this brand of violence.


Sidewalk Toronto is still figuring out how to govern and protect privacy

Betakit, Sameer Chhabra



Sidewalk Labs is currently in the process of negotiating provisions to guarantee the privacy and respectful use of the data that will eventually flow through the Sidewalk Toronto development.

At two consecutive March 20 presentations at the Metro Toronto Convention Centre, Sidewalk Labs’ head of urban development Rohit Aggarwala discussed the company’s commitment to protecting individual privacy.

“What matters in a city is when people feel comfortable, people feel respected, people feel a sense of ownership in that place,” said Aggarwala. “And they can’t do that if they feel they’re being monitored for the wrong reasons or for any reasons.”

During the presentation — itself meant to bring attendees up-to-date on the latest developments since Sidewalk Toronto held its open hall forum earlier this year — Aggarwala stated that Sidewalk Labs is not “the kind of firm that’s going to get a lot of data and figure out what to do with it [later].”


The 2020 Census May Undercount Millions of Texans

CityLab, Alexa Ura and Chris Essig



The massive state is full of people most likely to be undercounted—Hispanic immigrants, people living in poverty, and hurricane victims who’ve been displaced from their homes.


The Proposal to Lower P Value Thresholds to .005

JAMA, The JAMA Network, Viewpoint; John P. A. Ioannidis



P values and accompanying methods of statistical significance testing are creating challenges in biomedical science and other disciplines. The vast majority (96%) of articles that report P values in the abstract, full text, or both include some values of .05 or less. However, many of the claims that these reports highlight are likely false. Recognizing the major importance of the statistical significance conundrum, the American Statistical Association (ASA) published a statement on P values in 2016. The status quo is widely believed to be problematic, but how exactly to fix the problem is far more contentious. The contributors to the ASA statement also wrote 20 independent, accompanying commentaries focusing on different aspects and prioritizing different solutions. Another large coalition of 72 methodologists recently proposed a specific, simple move: lowering the routine P value threshold for claiming statistical significance from .05 to .005 for new discoveries. The proposal met with strong endorsement in some circles and concerns in others.
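
To make the stakes of the threshold change concrete, here is a rough back-of-the-envelope sketch (ours, not from the JAMA Viewpoint). It estimates what share of "significant" results would be false positives at each threshold. The 80% power and the 10% share of tested hypotheses that are actually true are illustrative assumptions, not figures from the article, and power is held fixed for simplicity even though a stricter threshold would lower it at the same sample size.

```python
def false_positive_share(alpha, power, prior_true):
    """Share of statistically significant findings that are false positives.

    alpha: significance threshold (chance a null effect is flagged significant)
    power: chance a real effect is flagged significant
    prior_true: assumed fraction of tested hypotheses that are real effects
    """
    false_pos = alpha * (1.0 - prior_true)  # null effects that pass the threshold
    true_pos = power * prior_true           # real effects that pass the threshold
    return false_pos / (false_pos + true_pos)


# Illustrative assumptions only; neither number comes from the article.
ASSUMED_POWER = 0.8
ASSUMED_PRIOR_TRUE = 0.1

for alpha in (0.05, 0.005):
    share = false_positive_share(alpha, ASSUMED_POWER, ASSUMED_PRIOR_TRUE)
    print(f"threshold {alpha}: {share:.1%} of significant results are false positives")
```

Under these assumptions, roughly a third of significant results are false at .05 and about one in twenty at .005. Different assumptions about power or the prior would shift both numbers.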

 
Events



University of British Columbia CS50 Data Science Symposium

University of British Columbia Data Science Institute



Vancouver, BC, Canada Friday, May 11. “The Data Science Institute is sponsoring one of the symposiums at the UBC Computer Science 50th Anniversary Celebration.” [$$]


CMU Baseball Analytics Workshop

Carnegie Mellon Sports Analytics



Pittsburgh, PA April 7. “An interactive workshop event for students and anyone interested in learning more about baseball analytics.” [$$]


Princeton MediHack

Princeton MediHack



Princeton, NJ April 20-22. “During this 36-hour event, students will team up to collaborate on a project to be presented to a panel of judges by the end of the weekend. Experts in the field of medicine, including doctors and healthcare entrepreneurs, will serve as speakers and mentors.” [application required]


2018 Midwest Undergraduate Data Analytics Competition

Winona State University, MinneAnalytics and the Minnesota Wild



Winona, MN April 7-8. “MUDAC brings together about 150 undergraduates from colleges and universities around the Midwest. Participants spend 24 hours working on a data analytics problem provided by our data sponsor. The challenge this year will focus on gaining a deeper understanding of the characteristics that influence season ticket sales for the Minnesota Wild.” [registration required for employers, judges, teams]

 
Deadlines



Berkeley Institute for Data Science – Call for Data Science Fellows

“Successful applicants will join our current cohort of fellows, building a community of postdoctoral scholars, graduate student researchers, and research staff with excellent credentials in their fields as well as a strong interest in advancing data science and data-intensive research across the campus and beyond.” Deadline to submit letter of intent is April 2.

Workshop for Natural Language Processing Open Source Software (NLP-OSS)

Melbourne, Australia July 19 or 20, co-located with ACL 2018. Deadline for paper submissions is April 8.

ICML 2018: The Debates – Call for Papers

Stockholm, Sweden Workshop for ICML 2018, July 10-15. “Machine Learning: The Great Debates (ML-GD) will be a day-long series of unscripted discussions that will match up researchers to present their best arguments and points of view on major issues within the field and around the future of the technology.” Deadline for submissions is April 23.
 
Moore-Sloan Data Science Environment News



5 Minutes with Sharon Weinberg

Medium, NYU Center for Data Science



Former Vice Provost for Faculty Affairs at NYU talks data science and higher education

 
Tools & Resources



As Design And Engineering Blur, Figma Wants To Be Their Platform

Fast Company, Harry McCracken



When Figma debuted in 2015, a Fast Company story described it as a Google Docs for designers—which was as good a way as any to describe the basic concept. A tool for creating user-interface designs, it competes with Photoshop, Sketch, and InVision, all of which have their loyalists. However, instead of making you download and install desktop software, Figma runs in a web browser and stores designs in the cloud. That streamlines adoption, encourages collaboration, and—perhaps most important—ensures that everybody is always working with the latest version of a file.

But Figma’s internet-native DNA opens up other opportunities beyond those that are immediate and obvious. Today, the company is unleashing one of them by turning itself into a platform.

At launch, that means that Figma has created an API which makes it relatively simple for a developer to reach into a Figma repository and specify a design (or lots of designs) to use in a third-party app or service.

 
Careers


Full-time, non-tenured academic positions

Lecturer in Computer Games



Macquarie University; Sydney, Australia

Research position in Computational Social Science



GESIS – Leibniz Institute for the Social Sciences; Cologne, Germany
