Data Science newsletter – March 25, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for March 25, 2019


 
 
Data Science News



You can help “rescue” weather data from the 1860s

Ars Technica, Scott K. Johnson



“Weather Rescue” sounds like it could be a Baywatch-style TV show about the adventures of an emergency response team. But the Weather Rescue project led by University of Reading researcher Ed Hawkins is actually focused on data that needs rescuing.


Scientists rise up against statistical significance

Nature, Comment; Valentin Amrhein, Sander Greenland & Blake McShane



Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.


How the tragic death of Do Not Track ruined the web for everyone

Fast Company, Glenn Fleishman



A decade ago–long before the current controversies over what big companies are doing with our data–a lot of people were already irate about ad networks that followed their activity across sites in order ever more precisely to target marketing messages. A feature called Do Not Track arose as a simple, comprehensible way for browser users to take back their privacy. To opt out of being tracked, you’d check a box in your browser’s settings.

Notably, this didn’t opt out of advertising–just the technology used to target ads. With Do Not Track checked, no web server or embedded code would associate your behavior at a given site with actions elsewhere on the web. It was a great idea.

And now it’s dead.


How a driveway chat resulted in the Institute for Human-Centered Artificial Intelligence

Stanford University, Stanford News



It was the summer of 2016.

“John,” [Fei-Fei] said, “As Stanford’s provost, you’ve led an effort to draw an arrow from technology to the humanities, to help humanists innovate their methodology. It’s time to build another arrow coming back the other direction. It should become a complete feedback loop. We need to bring the humanities and social thinking into tech.”

She went on to explain an epiphany she had recently had — a problem she could no longer ignore. The people building the future all seemed to come from similar backgrounds: math, computer science and engineering. There were not enough philosophers, historians or behavioral scientists influencing new technology. There were very few women or people from underrepresented groups.


AI.gov launched for U.S. artificial intelligence initiatives

SlashGear, Matthew DeCarlo



AI.gov will provide a central location for information about AI-related projects occurring across various agencies, including policy initiatives, regulatory hurdles, as well as fact sheets and other resources. Along with helping inform the public and preparing people for the many anticipated effects that AI will have on the American economy and workforce, the new website will aid agencies in coordinating their research and development efforts.


Open access task force releases draft recommendations

MIT News, MIT Libraries



The Ad Hoc Task Force on Open Access to MIT’s Research has released a set of draft recommendations that aim to support and increase the open sharing of MIT publications, data, software, and educational materials. They are available for public comment until April 17.

The recommendations include ratifying an Institute-wide set of principles for open science; broadening the MIT Faculty Open Access Policy to cover all MIT authors; adopting an open access (OA) policy for monographs; and asking department heads to develop discipline-specific plans to encourage and support open sharing from their faculty, students, and staff.


Changes in Ocean ‘Conveyor Belt’ Foretold Abrupt Climate Changes by Four Centuries

Columbia University, Earth Institute, Sarah Fecht



In the Atlantic Ocean, a giant ‘conveyor belt’ carries warm waters from the tropics into the North Atlantic, where they cool and sink and then return southwards in the deep ocean. This circulation pattern is an important player in the global climate, regulating weather patterns in the Arctic, Europe, and around the world. Evidence increasingly suggests that this system is slowing down, and some scientists fear it could have major effects, such as causing temperatures to dive in Europe and warming the waters off the East coast of the United States, potentially harming fisheries and exacerbating hurricanes.

A new study published in Nature Communications provides insight into how quickly these changes could take effect if the system continues weakening. Led by scientists at Columbia’s Lamont-Doherty Earth Observatory in collaboration with the Norwegian Research Centre, the study is the first to precisely determine the time lags between past changes to the ocean conveyor belt and major climate changes.


Fake News on Facebook: A Rare Occurrence

Medium, NYU Center for Data Science



The dissemination of fake news received widespread attention from journalists, scholars, and the public after the 2016 election, with some people even ascribing the election outcome to the spread of false information on social media. But a new study from CDS affiliate faculty members Jonathan Nagler and Joshua Tucker, Professors of Politics and co-Directors of the NYU Social Media and Political Participation (SMaPP) lab, and Andrew Guess of Princeton University, a former Moore Sloan Data Science Environment postdoc, reveals that fake news sharing on Facebook was actually quite rare: over 90% of people in the researchers’ sample shared no links to stories from fake news websites during the sample period.


Historic Hardware

Caltech Magazine



Caltech is a treasure trove of vintage computing equipment … if you know where to look. “You see these e-waste piles all around campus and people just stick old equipment in them,” says Albert Tseng, a sophomore majoring in computer science. “Most of it is actual waste, but sometimes you find really interesting things being discarded.”

Tseng teamed up recently with fellow sophomore Hongsen Qin, whose particular passion is old keyboards. The pair recruited junior Karthik Karnik and, together, the three undergrads turned their hobby into a new student club dedicated to preserving, collecting, and showcasing vintage computing hardware used on the Caltech campus.

Through word of mouth and social media, the Vintage Computing Club has garnered interest from a broad swath of students and alumni.


Moshe Vardi on Social Implications of Technology and Our Responsibility As Academics

Communications of the ACM, News, Changing Academic Life



Moshe Vardi is a professor of computer science at Rice University in the U.S. and holds numerous honors and awards. In this conversation he talks about the impact of technologies on society and how this challenges what computer science should be concerned about and our responsibilities to engage in these issues. What he has to say speaks not only to computer scientists but to all academics.

Side note: This is the first part of a much longer conversation. Part 2, coming out as a separate podcast, discusses the changes and challenges in academia more generally.


Data sharing practices of medicines related apps and the mobile ecosystem: traffic, content, and network analysis

The BMJ, Quinn Grundy et al.



Journalists recently revealed that Australia’s most popular medical appointment booking app, HealthEngine, routinely shared hundreds of users’ private medical information with personal injury law firms as part of a referral partnership contract. Although the company claimed this was only done with users’ consent, these practices were not included in the privacy policy but in a separate “collection notice,” and there was no opportunity for users to opt out if they wished to use the application (app).1

Mobile health apps are a booming market targeted at both patients and health professionals. These apps claim to offer tailored and cost effective health promotion, but they pose unprecedented risk to consumers’ privacy given their ability to collect user data, including sensitive information. Health app developers routinely, and legally, share consumer data with third parties in exchange for services that enhance the user’s experience (eg, connecting to social media) or to monetise the app (eg, hosted advertisements). Little transparency exists around third party data sharing, and health apps routinely fail to provide privacy assurances, despite collecting and transmitting multiple forms of personal and identifying information.


Model learns how individual amino acids determine protein function

MIT News



In a paper being presented at the International Conference on Learning Representations in May, the MIT researchers develop a method for “learning” easily computable representations of each amino acid position in a protein sequence, initially using 3-D protein structure as a training guide. Researchers can then use those representations as inputs that help machine-learning models predict the functions of individual amino acid segments — without ever again needing any data on the protein’s structure.

In the future, the model could be used for improved protein engineering, by giving researchers a chance to better zero in on and modify specific amino acid segments. The model might even steer researchers away from protein structure prediction altogether.


In the Sky and on the Ground, Collaboration Vital to DARPA’s CODE for Success

DARPA



On a brisk February morning in the Yuma, Arizona, desert, a swarm of unmanned aerial vehicles equipped with DARPA’s Collaborative Operations in Denied Environment system, or CODE, successfully carried out mission objectives, even when communications were offline and GPS was unavailable.

One-by-one, six RQ-23 Tigersharks lifted off, fitted with an array of sensors onboard. Next to the runway at the U.S. Army’s Yuma Proving Ground, the mission team inside a small operations center tracked the aircraft and as many as 14 additional virtual planes on an aerial map. The capstone demonstration paired program performer Raytheon’s software and autonomy algorithms and Johns Hopkins University Applied Physics Laboratory’s White Force Network to create a realistic, live/virtual/constructive test environment. During four demonstration runs, the team activated a variety of virtual targets, threats, and countermeasures to see how well the Tigersharks could complete their objectives in suboptimal conditions.


Iowa’s Real Population

Chris Jones



Iowa has around 3 million people, a total that has not changed much over the last 80-90 years. People are large animals, and as such our bodies produce a lot of waste. That being said, we produce much less waste than the animals that we eat. Take hogs, for example. A feeder pig is about the same size as a human being, but it excretes 3 times as much nitrogen (N), 5 times as much phosphorus (P), and 3.5 times as much solid matter (TS-total solids) (1). Some of this is because of the pig’s diet and some of it is because modern hogs grow really, really fast (these things relate to each other, obviously). A pig weighs about 3 pounds at birth and about 250 pounds at slaughter a mere 6 months later, so it is gaining more than one pound per day. By comparison, a human infant gains a pound about every 20 days. There’s a reason we use the word “hog” metaphorically and pejoratively: hogs consume anything and everything in sight virtually non-stop, which is one reason why they make for a good food animal.
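The growth-rate comparison in the paragraph above can be checked with a few lines of arithmetic. This is only a rough sketch: the weights and the 20-day figure come from the text, while the average month length of 30.4 days and all variable names are assumptions made for illustration.

```python
# Figures quoted in the paragraph above.
BIRTH_WEIGHT_LB = 3.0        # pig weight at birth
SLAUGHTER_WEIGHT_LB = 250.0  # pig weight at slaughter
MONTHS_TO_SLAUGHTER = 6

# Assumption (not from the text): an average month is about 30.4 days.
DAYS_PER_MONTH = 30.4


def daily_gain(start_lb: float, end_lb: float, days: float) -> float:
    """Average weight gained per day, in pounds."""
    return (end_lb - start_lb) / days


pig_gain = daily_gain(
    BIRTH_WEIGHT_LB, SLAUGHTER_WEIGHT_LB, MONTHS_TO_SLAUGHTER * DAYS_PER_MONTH
)

# "A human infant gains a pound about every 20 days."
infant_gain = 1.0 / 20.0

# How many times faster the pig gains weight than the infant.
ratio = pig_gain / infant_gain

print(f"pig: {pig_gain:.2f} lb/day")
print(f"infant: {infant_gain:.2f} lb/day")
print(f"pig gains roughly {ratio:.0f} times faster")
```

Under these assumptions the pig's average gain comes out a little above one pound per day, consistent with the claim in the text, and a couple of dozen times the infant's rate.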

Everybody knows Iowa has a lot of livestock. If you’re like me, maybe you have heard from time to time that our state has enough animals to effectively be as populous as California, using one common example. As you will soon see, it’s bigger than that. Much bigger.


How Google Scholar Judges Research

Social Science Space, Louis Coiffait



The last two decades have seen dramatic changes in scholarly communication. Journals have moved online, many of the world’s documents have been digitized, and comprehensive, relevance-ranked search has become available to almost everyone. Researchers almost everywhere can now find, and often access, nearly all of the world’s scholarly outputs. This has profound implications for the research process, academic careers (and wellbeing), as well as attempts to measure and incentivise research (including its ‘impact’ on other research, and on policy or practice).

In the next three articles, I’d like to summarise what I’ve discovered about one of the key services making this happen, Google Scholar. Many of the findings apply to other platforms, too, and indeed the wider citation and research ‘impact’ system. But to understand some of the politics and power of the current system, as well as the unanswered questions, we need to deep-dive into the details. In addition to testing the Scholar service, I reviewed some of the relevant literature and interviewed in-the-know experts. As I’m a policy person, not an academic, this is written from that perspective.

 
Events



Uncertainty, Information, and Narrative: A Statistical Perspective on Scientific Storytelling

Carnegie Mellon University, The Humanities Center



Pittsburgh, PA March 28, starting at 4:30 p.m., Carnegie Mellon University Giant Eagle Auditorium. Speaker: Robert Kass. [free]


2019 Conference on Cognitive Computational Neuroscience

CCN 2019



Berlin, Germany September 13-16, at the Technical University of Berlin. [save the date]


Stanford Colloquium on Machine Learning and Causal Inference

Society for Epidemiologic Research



Stanford, CA April 29, starting at 9 a.m. [free, registration required]


Book Launch: All Data Are Local by Yanni Alexander Loukissas

Georgia Institute of Technology, Institute for People and Technology



Atlanta, GA April 18, starting at 4:30 p.m., Georgia Tech Institute for People and Technology (Centergy Building, Suite 600). “This celebration is part of IPaT’s signature event, Industry Innovation Day, on Thursday, April 18 in Tech Square.” [free, registration required]


2019 Brain Health & Performance Summit

The Ohio State University, Wexner Medical Center



Columbus, OH June 5-7. “Attendees will find enlightening discussion, CME opportunities and practical applications for new research, presented by leaders driving discoveries in brain health and how it affects human performance.” [$$$]

 
Deadlines



Workshop: Call for Papers – COMMUNITY NORMS

“We invite submissions for a one-day workshop called “Managing and Designing for Norms in Online Communities” held at ICWSM on June 11, 2019 in Munich, Germany. Our workshop will bring together an interdisciplinary group of researchers and practitioners to discuss and design socio-technical solutions for managing norms in online communities.” Deadline for submissions is April 5.

EY announces the launch of a global data science challenge to identify and develop top talent in analytics and artificial intelligence

“The EY NextWave Data Science Challenge is designed to test the skills of data science students who will use data from Skyhook, one of the pioneers in location technology and intelligence, to solve issues related to the future of mobility and smart cities.” Deadline for submissions is May 10.
 
Tools & Resources



Preload, prefetch and other tags: what they do and when to use them

PerfPerfPerf, Ivan Akulov



There’re multiple ways to improve web performance (see Web Performance 101 for a full overview). One of those ways is to preload content you’ll need later in advance. Prefetch a CSS file, prerender a full page, or resolve a domain ahead of time – and you won’t have to wait for it when it’s actually needed! Sounds cool.

What’s even cooler is that browsers have a simple built-in way to do all these things. There’re five tags that instruct the browser to preload something.
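As a rough illustration of what such tags look like, the sketch below builds `<link>` elements for the resource-hint `rel` values that browsers commonly recognize. The excerpt does not name the five tags, so the list of `rel` values here is an assumption about what the article covers, and the helper name `resource_hint` and all URLs are hypothetical.

```python
from html import escape
from typing import Optional

# Assumption: these are the resource-hint rel values the article refers to.
REL_VALUES = ("preload", "prefetch", "preconnect", "dns-prefetch", "prerender")


def resource_hint(rel: str, href: str, as_type: Optional[str] = None) -> str:
    """Build a <link> tag asking the browser to fetch or prepare a resource early."""
    if rel not in REL_VALUES:
        raise ValueError(f"unsupported rel value: {rel!r}")
    attrs = [f'rel="{rel}"', f'href="{escape(href, quote=True)}"']
    # preload takes an "as" attribute describing the resource type.
    if as_type is not None:
        attrs.append(f'as="{escape(as_type, quote=True)}"')
    return "<link " + " ".join(attrs) + ">"


# Hypothetical URLs, for illustration only.
hints = [
    resource_hint("preload", "/fonts/body.woff2", as_type="font"),
    resource_hint("prefetch", "/styles/next-page.css"),
    resource_hint("preconnect", "https://api.example.com"),
    resource_hint("dns-prefetch", "https://cdn.example.com"),
    resource_hint("prerender", "/checkout"),
]

for tag in hints:
    print(tag)
```

The generated tags would go in a page's `<head>`. Browser support for these hints varies, so which one to use, and when, is a separate question.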


Meet the newest Techstars Seattle class: 10 founders share their pitch and startup tips

GeekWire, Taylor Soper



GeekWire caught up with the entrepreneurs who are a part of the tenth Techstars Seattle cohort, a milestone for the 3-month accelerator that has graduated 100 companies to date over the past decade.

Alumni of the organization — companies such as Remitly, Outreach, Skilljar, Bizible, Leanplum and Zipline — have collectively raised more than $700 million in investment capital. Most have built their startups in the Pacific Northwest, helping expand the entrepreneurial clout in the region.


Introducing Orchestra 1.0, an open-source system for Robotic Process Automation

Medium, Team B12



Today we’re excited to unveil an updated version of Orchestra that makes project management and team communication even easier. We initially open-sourced Orchestra v0.1.0 in September 2015 with key features like human + machine workflows, hierarchical review, and expertise certification. After years of engineering and design, we’re psyched to announce Orchestra 1.0!

Orchestra is an open-source Robotic Process Automation (RPA) system that organizes teams of human experts and machines to work together. Orchestra automates repetitive tasks in team-based knowledge work so teams work faster and experts spend more time on creative and analytical tasks.


How I’m able to take notes in mathematics lectures using LaTeX and Vim

Gilles Castel



A while back I answered a question on Quora: Can people actually keep up with note-taking in Mathematics lectures with LaTeX. There, I explained my workflow of taking lecture notes in LaTeX using Vim and how I draw figures in Inkscape. However, a lot has changed since then and I’d like to write a few blog posts explaining my workflow.

 
Careers


Full-time positions outside academia

Data Scientist



ViriCiti; Amsterdam, Netherlands

Data Scientist



Aspen Mesh; Boulder, CO

Research Assistant



Microsoft Research New England, Social Media Collective; Cambridge, MA

Full-time, non-tenured academic positions

Staff Data Scientist



Texas A&M University; College Station, TX

Internships and other temporary positions

AI Residency Program



IBM Research
