Data Science newsletter – July 25, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 25, 2018

Data Science News



The age you feel means more than your actual age

BBC – Future, David Robson


Imagine, for a moment, that you had no birth certificate and your age was simply based on the way you feel inside. How old would you say you are?

Like your height or shoe size, the number of years that have passed since you first entered the world is an unchangeable fact. But everyday experience suggests that we often don’t experience ageing the same way, with many people feeling older or younger than they really are.

Scientists are increasingly interested in this quality. They are finding that your ‘subjective age’ may be essential for understanding the reasons that some people appear to flourish as they age – while others fade. “The extent to which older adults feel much younger than they are may determine important daily or life decisions for what they will do next,” says Brian Nosek at the University of Virginia.


Kamala Harris leads potential 2020 Democratic contenders in Facebook ad spending

Santa Cruz Sentinel, Bay Area News Group, Casey Tolan


Sen. Kamala Harris is spending more money on Facebook ads than any other U.S. Senator or potential Democratic presidential candidate, helping her grow a digital presence that would be a powerful asset if she decides to run for the White House.

Over a two-month snapshot from early May through early July, Harris’ ads on Facebook have been viewed at least 7.8 million times, and her campaign has spent at least $134,500 on the platform, according to data from a recent New York University study provided to the Bay Area News Group. That’s a larger sum than any U.S. politician other than President Trump and Texas Rep. Beto O’Rourke, who’s running for Senate.


The Hunt for Earth’s Deep Hidden Oceans

Quanta Magazine, Marcus Woo


Water-bearing minerals reveal that Earth’s mantle could hold more water than all its oceans. Researchers now ask: Where did it all come from?


1 big thing: The childless, aging future

Axios, Mike Allen


Since the dawn of humans, we have had to figure out how to support a rising population. But in a momentous change, that age-old challenge is reversing: The population of most countries is shrinking — for many of them, at an alarming pace — and at the same time aging, writes Axios Future Editor Steve LeVine.

Much of the world teeters on the cusp of a childless, elderly future.

Why it matters: A growing, youthful population is typically a bedrock sign of vitality. In the industrial age, that’s included a growing economy, greater opportunity, advancing technology, and a more comfortable retirement for older people. The turnaround on all continents except Africa means supporting an increasing number of retired people with many fewer workers.


Toward an Open Science Enterprise

The National Academies, Board on Research Data and Information


Openness and sharing of information are fundamental to the progress of science and to the effective functioning of the research enterprise. The advent of scientific journals in the 17th century helped power the Scientific Revolution by allowing researchers to communicate across time and space, using the technologies of that era to generate reliable knowledge more quickly and efficiently. Harnessing today’s stunning, ongoing advances in information technologies, the global research enterprise and its stakeholders are moving toward a new open science ecosystem. Open science aims to ensure the free availability and usability of scholarly publications, the data that result from scholarly research, and the methodologies, including code or algorithms, that were used to generate those data.


UPS is turning to predictive analytics

Business Insider, Nicholas Shields


Parcel delivery giant UPS recently started using a new predictive analytics algorithm to optimize the movement of parcels throughout its entire delivery network, The Wall Street Journal reports.

The algorithm will collect and analyze more than 1 billion data points per day on factors like package weight, shape, and size. From there, it’ll cross reference the data with historical delivery trends to estimate capacity, package volumes, and customer demand.


Duke partners with Vivli to engage in a new era of open science

Duke Clinical Research Institute


The DCRI’s Frank Rockhold, PhD, will serve on the steering committee for the new data sharing and analytics platform.

Duke University School of Medicine has announced a partnership with Vivli, a nonprofit organization that today launched a new data sharing and analytics platform. With the launch of the Vivli platform, researchers worldwide will have a simple yet secure way to discover, share and analyze data from clinical trials, regardless of who sponsored the research or where the research took place. Vivli, Duke and other partners hope that sharing data from clinical trials will expedite the development of cures and treatments for diabetes, Alzheimer’s, malaria and other diseases.


Goldman Sachs leads $37M round for Boston-based AI startup Cogito

Boston Business Journal, Kelly J. O'Brien


Boston-based Cogito Corp., an MIT-born company that puts artificial intelligence technology to work in call centers, has raised $37 million in a round led by the growth equity division of Goldman Sachs (NYSE: GS).

Cogito has now raised more than $57 million since it was born out of the MIT Human Dynamics Lab in 2007. Most of that funding has come in the past three years as Cogito has transitioned from a research and development shop supported by DARPA, a major funder of military technology, to a company that sells software to some of the largest call center operators in the world.

The software, which combines machine learning, voice analysis and behavioral science, is used by companies including Humana and MetLife.


From professor-in-waiting to florist: Why some PhDs are quitting academia for unconventional jobs

CBC Radio, Donya Ziaee


When Jennifer Fowlow started her PhD program in women’s studies, her dream was to become a professor.

But by her third year, she says, she was fed up.

“When I look back … to me it was just one big, stressful guilt trip,” says Fowlow.

“There was always another article I should have read, another book I should have gotten, more notes I should have written.” [audio, 32:09]


University Data Science News

Dear for-profit textbook publishers: SUNY-Buffalo (and many other universities) is looking for ways to reduce student costs by using free and open source teaching material. I have been saying for over a year now that we need the equivalent of ResearchGate for syllabi, course modules, and teaching aids of all kinds. Contact me (laura.noren@nyu.edu) if you want to collectively make this happen in a non-profit way. It would be smart to integrate a tool for understanding which educational materials are most useful, like this usage visualization tool from Purdue University.


The Duke University School of Medicine is collaborating with the non-profit Vivli on infrastructure designed to help serve, share, and combine data from clinical trials, no matter who conducted them. It costs money to be a contributor: $2,000 for storage and sharing; $4,500 for storage, sharing, and Institutional Review Board review. There are academic discounts of unknown proportion by arrangement. I wish there were a way to do this for free, but I understand that there are costs associated with servers, maintenance, review, and cybersecurity. This is one of a very small number of portals for sharing clinical trial data and one of the only ones actively reaching out to industry, so I expect a fair amount of uptake.



Collaboration opportunity on under-reported status of postdocs. Why? Because it’s my birthday!! (That’s why the newsletter is going out today, August 2nd.) Who wants to help visualize all the postdoctoral life science fellow data from the ten schools voluntarily reporting it to the Coalition for Next Generation Life Science? Right now, there are unified reporting standards but no unified dataset. Let’s unify and visualize postdocs by the numbers. BTW: MIT needs to have a long think about its 72/28 postdoc gender ratio. Only ten schools are opting in, so if you’re more of an activist type, you could put pressure on your own institute or a bunch of other institutes to report. This is a first step in getting more attention to the underpaid, longer-than-you-think, more commonly occurring stage of academic development that I call postdoc purgatory. Make it out on top and end up in a tenure-track job. Make it out on the bottom and realize you spent your 20s getting a PhD, your 30s getting a postdoc, and now you’re poor, effectively deported and looking for work in your home country where you have no connections because you haven’t lived there for 2 decades or more. (Most postdocs are foreign born.)



Speaking of academia, “to me it was just one big, stressful guilt trip,” said Jennifer Fowlow, former Canadian women’s studies doctoral candidate, now successful florist. In Canada, only 50% of PhD candidates in humanities complete their doctorates (it goes up to 80% in health sciences). Of those who do finish, only one in five has a full-time teaching job.



Michael Mauskapf of Columbia University and Noah Askin of INSEAD analyzed the audio structure and cultural attributes (danceability, anyone?) of 60 years of Billboard’s Top 40 hit songs. Turns out Top 40 songs are more likely to be different from the rest of what’s on the Top 40 at that time, but not too different. Yes, that last sentence nicely contradicts itself. “Be different, but not too different” is advice meant to paralyze anyone trying to pick a song, an outfit, a meal, a style of speech, an opening line, or *a predictive algorithm.* That’s why we’re talking about this. It strikes me that the way we listen to music now, aided by algorithms (different) and influencer-constructed playlists (not that different from disc jockeys), could morph us away from the modalities that brought us the hits in the past. What do you think: are Spotify and Pandora changing how hit songs come to be hits? (See DVotW.) Relatedly, audio on-demand streams grew 45.5 percent to 268.3 billion, according to a new analysis by Nielsen Music.

Ah, the fruit fly. This model research organism joins the nematode worm as an organism that has had its entire neurological structure imaged and reconstructed. The fly brain team is led by Davi Bock and a crew from the Howard Hughes Medical Institute at Janelia Farm. Having such a detailed resolution of the brain is somewhat akin to producing a platform as a service for research.



Ten years ago (2008-2009), Yale University had a data breach that leaked “names, Social Security numbers, dates of birth, Yale email addresses, and in some cases the physical addresses” for 119,000 individuals. They still don’t know who did it, but that same server was hacked again sometime between March 2016 and June 2018. Only 33 individuals were exposed that time. The moral of the story here is the length of time it takes to investigate security breaches. Years. Sometimes decades?? Please take precautions. Detection is a very hard problem.



Tufts University is offering a new Master of Science in Data Analytics spanning “the sciences, arts, and humanities.”

Brian Nosek at the University of Virginia has evidence suggesting that the age people feel they are correlates with their overall health. Most people tend to feel older than they are until the tipping point at age 25 when people start to feel younger than they are. The few people who feel older than they are have a higher risk of death. The write-up doesn’t assess whether there’s an unobserved variable, a major-bummer factor like a terminal illness, that could be causing both the aged feeling and the increased risk of death.



Brent Hecht, a computer scientist chairing the Future of Computing Academy (which sits within the Association for Computing Machinery), proposes that peer reviewers need to include an assessment of the societal impact of the research under review. This is a step in the right direction, but I wish that the review panels would then include social scientists who are trained to assess social impact working in conjunction with computer scientists.



University of California – Irvine neuroradiologists have launched the UCI Center for Artificial Intelligence in Diagnostic Medicine. The center will focus on AI interventions in diagnostics that will deliver near-term impacts to patients.



In another new Canadian partnership, University of Waterloo has partnered with Neuberger Berman to develop data science techniques, this time for investment management.


CMU, Bossa Nova to apply AI to retail analytics

Carnegie Mellon University, College of Engineering


Carnegie Mellon University has announced a research partnership with Bossa Nova, the leading provider of real-time, on-shelf product data for the global retail industry, to develop and integrate artificial intelligence into service robots in retail stores nationwide.


CEO Sundar Pichai: Google Translate handles 143 billion words daily

Business Insider, Troy Wolverton


During Google’s second-quarter earnings conference call on Monday, CEO Sundar Pichai revealed an intriguing piece of information that hints at the translation product’s moneymaking potential. The app translates a staggering 143 billion words every day, Pichai said. And, he added, it got a big boost during this summer’s World Cup soccer tournament.


Security Experts Warn Census Bureau: Beware of Hackers

CityLab, Kriston Capps


The U.S. is planning an experiment in democracy: The 2020 census will be the first in the nation’s history to be conducted electronically. The Census Bureau expects more households than not to participate in the process online using computers and even smartphones.

By ditching paper questionnaires, the bureau hopes to cut costs, streamline operations, and modernize the constitutionally mandated decennial count. But the decision to go from analog to digital couldn’t come at a worse time. Russia’s interference in the 2016 presidential election has raised root-level questions about the government’s readiness (and willingness) to shore up its cybersecurity protocols ahead of the midterms.

That’s why a murderer’s row of national security experts wrote to Commerce Department Secretary Wilbur Ross and Census Bureau Acting Director Ron Jarmin this week to ask for details about the bureau’s strategy for protecting its data. Putting sensitive data about every American within the potential reach of a foreign power’s hackers could undermine public confidence in the census—or worse.


Data sharing and the future of science

Nature Communications, Editorial


Who benefits from sharing data? The scientists of the future do, as data sharing today enables new science tomorrow. Far from being mere rehashes of old datasets, evidence shows that studies based on analyses of previously published data can achieve just as much impact as original projects.


New Coalition Is Dedicated to Transparency About Ph.D. Outcomes

Chronicle of Higher Education, Letters


We read with interest your article, “Colleges Can’t — or Won’t — Track Where Ph.D.s Land Jobs. Should Disciplinary Associations?” (The Chronicle, July 17). We applaud the work of the American Historical Association in tracking and reporting the career outcomes of 93 percent of the history Ph.D.s who graduated 2004-2013.

While you suggest universities are not investing necessary resources in collecting and publicizing information about career outcomes for their Ph.D. students, we call your attention to the work of the newly formed Coalition for Next Generation Life Science, an alliance of universities (and a research institute) committed to data transparency about outcomes for Ph.D. students and postdocs in the life sciences.

 
Events



UAI 2018 – Conference on Uncertainty in AI

Association for Uncertainty in Artificial Intelligence


Monterey, CA August 6-10. “The Conference on Uncertainty in Artificial Intelligence (UAI) is one of the premier international conferences on research related to knowledge representation, learning, and reasoning in the presence of uncertainty.” [$$$]

 
Deadlines



picoCTF – CMU Cybersecurity Competition

“picoCTF is the largest hacking contest ever. We are offering a limited number of exclusive sponsorship opportunities.”

Launch of the 3rd edition of CNIL-Inria Prize

“The third edition of the CNIL-INRIA “Privacy Protection” Prize starts on 16 May 2018. It will reward a scientific paper on privacy and personal data protection published in 2016-2017.” Deadline for submissions is September 1.
 
Tools & Resources



Cornell University Library Repository Principles and Strategies Handbook

CUL Public Wiki


In order to better coordinate a repository ecology that includes multitudinous individual systems, and synthesize staff knowledge and expertise that spans decades, the Repository Principles subgroup of CUL’s Repository Executive Group (RepoExec) has created this open handbook of repository principles and strategies.

The handbook provides support for both new and existing repository managers, comprising both recommended practices and specifically identified action steps that will allow them to track their progress and identify gaps. Each section of the handbook covers a different strategic area of repository management, standing largely on its own and linking to other sections when appropriate. Although there is no primary section order, we recommend starting with Repository Scope and Service Planning.


The Data Transfer Project

GitHub – google


The Data Transfer Project makes it easy for people to transfer their data between online service providers. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers. http://datatransferproject.dev


Animation with CPPNs

Observable, Emily Reif


CPPNs (or Compositional Pattern Producing Networks) are in essence quite simple: a model is trained on a single image, the input being an (x, y) coordinate and the output being the (r, g, b) value at that index.

As you can imagine, if you make the neural net big enough and train it for long enough, it could completely reconstruct the image!
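
To make the idea concrete, here is a minimal sketch of the coordinate-to-colour setup described above. It is not the notebook's implementation: it assumes a plain-Python network with one tanh hidden layer trained by stochastic gradient descent, and the names TinyCPPN, make_image, pixels, mean_loss and fit are all hypothetical, chosen only for illustration.

```python
import math
import random

def make_image(size):
    """Toy target (an assumption): size x size image with a smooth colour gradient."""
    return [[(x / (size - 1), y / (size - 1), 0.5) for x in range(size)]
            for y in range(size)]

class TinyCPPN:
    """Hypothetical one-hidden-layer network mapping (x, y) -> (r, g, b)."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(3)]
        self.b2 = [0.0] * 3

    def forward(self, x, y):
        # Hidden activations use tanh; outputs use a sigmoid so colours stay in [0, 1].
        h = [math.tanh(w[0] * x + w[1] * y + b) for w, b in zip(self.w1, self.b1)]
        out = [1 / (1 + math.exp(-(sum(wi * hi for wi, hi in zip(w, h)) + b)))
               for w, b in zip(self.w2, self.b2)]
        return h, out

    def train_step(self, x, y, target, lr):
        h, out = self.forward(x, y)
        # Squared-error gradient through the sigmoid (constant factor folded into lr).
        d_out = [(o - t) * o * (1 - o) for o, t in zip(out, target)]
        # Backpropagate to the hidden layer before updating w2.
        d_h = [(1 - h[j] ** 2) * sum(d_out[k] * self.w2[k][j] for k in range(3))
               for j in range(len(h))]
        for k in range(3):
            for j in range(len(h)):
                self.w2[k][j] -= lr * d_out[k] * h[j]
            self.b2[k] -= lr * d_out[k]
        for j in range(len(h)):
            self.w1[j][0] -= lr * d_h[j] * x
            self.w1[j][1] -= lr * d_h[j] * y
            self.b1[j] -= lr * d_h[j]

def pixels(image):
    """Yield (x, y, rgb) for every pixel, with coordinates scaled to [-1, 1]."""
    size = len(image)
    for row in range(size):
        for col in range(size):
            yield 2 * col / (size - 1) - 1, 2 * row / (size - 1) - 1, image[row][col]

def mean_loss(net, image):
    """Mean squared error per pixel between the network's output and the image."""
    total, n = 0.0, 0
    for x, y, target in pixels(image):
        _, out = net.forward(x, y)
        total += sum((o - t) ** 2 for o, t in zip(out, target))
        n += 1
    return total / n

def fit(image, epochs=500, lr=0.5, seed=0):
    """Train on a single image; return the net plus the loss before and after."""
    net = TinyCPPN(seed=seed)
    before = mean_loss(net, image)
    for _ in range(epochs):
        for x, y, target in pixels(image):
            net.train_step(x, y, target, lr)
    return net, before, mean_loss(net, image)
```

Calling fit(make_image(4)) trains on a 4 x 4 toy image; evaluating net.forward(x, y) at each pixel coordinate then gives the network's reconstruction. A real image would likely need a much larger network and far more training than this sketch uses.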


Top 20 Python AI and Machine Learning Open Source Projects

Dataquest


“Many aspiring professionals and enthusiasts find it hard to establish a proper path into the field, given the enormous amount of resources available today. The field is evolving constantly and it is crucial that we keep up with the pace of this rapid development. In order to cope with this overwhelming speed of evolution and innovation, a good way to stay updated and knowledgeable on the advances of ML, is to engage with the community by contributing to the many open-source projects and tools that are used daily by advanced professionals.”


NLP-progress

Sebastian Ruder


“Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.”


Responsible sharing of biomedical data and biospecimens via the “Automatable Discovery and Access Matrix” (ADA-M)

npj, Genomic Medicine, Anthony J. Brookes et al.


Given the data-rich nature of modern biomedical research, there is a pressing need for a systematic, structured, computer-readable way to capture, communicate, and manage sharing rules that apply to biomedical resources. This is essential for responsible recording, versioning, communication, querying, and actioning of resource sharing plans. However, lack of a common “information model” for rules and conditions that govern the sharing of materials, methods, software, data, and knowledge creates a fundamental barrier. Without this, it can be virtually impossible for Research Ethics Committees (RECs), Institutional Review Boards (IRBs), Data Access Committees (DACs), biobanks, and end users to confidently track, manage, and interpret applicable legal and ethical requirements. This raises costs and burdens of data stewardship and decreases efficient and responsible access to data, biospecimens, and other resources. To address this, the GA4GH and IRDiRC organizations sponsored the creation of the Automatable Discovery and Access Matrix (ADA-M, read simply as “Adam”). ADA-M is a comprehensive information model that provides the basis for producing structured metadata “Profiles” of regulatory conditions, thereby enabling efficient application of those conditions across regulatory spheres. Widespread use of ADA-M will aid researchers in globally searching and prescreening potential data and/or biospecimen resources for compatibility with their research plans in a responsible and efficient manner, increasing likelihood of timely DAC approvals while also significantly reducing time and effort DACs, RECs, and IRBs spend evaluating resource requests and research proposals. Extensive online documentation, software support, video guides, and an Application Programming Interface (API) for ADA-M have been made available.

 
Careers


Full-time positions outside academia

Toxicologist

Office of Management and Budget; Washington, DC

Tenured and tenure track faculty positions

Assistant Professor of Anthropology

Brandeis University; Waltham, MA

Full-time, non-tenured academic positions

Deputy Director – Engelberg Center on Innovation Law & Policy

New York University, School of Law; New York, NY
