Data Science newsletter – June 6, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for June 6, 2018

GROUP CURATION: N/A

 
 
Data Science News



Government Data Science News

The National Institutes of Health has announced a strategic plan to “modernize the NIH-funded biomedical data ecosystem”. It starts with hiring a Chief Data Strategist who will orchestrate data strategy for the All of Us Program (precision medicine), the BRAIN Initiative (neuroscience), the Cancer Moonshot, and the Opioid Initiative. In essence, this is the role of US biomedical data czar, and the data volumes involved are enormous: “By 2025, the total amount of genomics data alone is expected to equal or exceed totals from the three other major producers of large amounts of data: astronomy, YouTube, and Twitter.” (Don’t ask me how Twitter jumped ahead of the rest of Google or Facebook in that quote. They are biomedical scientists, not social media experts.) I’m extremely optimistic that they will strike the appropriate balance between advancing scientific goals as quickly as possible and protecting patients’ rights. There is a long history of involving biomedical ethicists and developing professional protocols in biomedical research.



NIH released findings from the National Cancer Institute’s Molecular Analysis for Therapy Choice (NCI-MATCH) program. Patient participation is 62.5%, much higher than the predicted 25%. Three drugs used to treat different cancers showed some efficacy, allowing further research to proceed. In all cases, the drugs were matched to patients using recommendations from precision medicine approaches.

Taiwan’s Ministry of Science and Technology forged a new decade-long partnership with GPU maker Nvidia. The extensive partnership includes supercomputing infrastructure, research, education, a start-up accelerator program, and investment in Taiwanese manufacturing, healthcare, urban safety, and transportation. One cool project already underway: Tainan City is using drones to inspect bridges for structural integrity.



Washington State’s Attorney General, Bob Ferguson, is suing Facebook and Google for breaking state law by failing to maintain and make available data about political advertisers on their sites. Neither company has commented. According to a story in TechCrunch, Facebook took in over three times as much from these political ads as Google did.



Canada is fighting hard to prevent brain drain to Silicon Valley and New York with a new article touting all the benefits of staying in Canada. Oddly, they fail to mention the Trudeau-Trump gradient.



US Chief Technology Advisor Michael Kratsios announced that the administration would do “anything we can do to unlock government data” for artificial intelligence research. He warmly encouraged “any academic that has any insights” to get in touch about which data could be useful. I am hopeful that this will turn into a meaningful commitment, but because he didn’t mention any specific data, timeline, partnership plans, or other clear strategy, it feels like an empty promise. Let’s hope I am wrong.



Singapore, on the other hand, has released a detailed strategy for weaving ethical considerations into the development of machine learning and other AI technology. All that Singaporean law-making is looking pretty good right now.


Many Breast Cancer Patients Can Safely Skip Chemo, Large Trial Confirms

Forbes, Elaine Schattner



A large study confirms that many patients with early-stage breast cancer don’t need chemotherapy after surgery. The TAILORx trial evaluated a pathology test called Oncotype DX (Genomic Health), which predicts risk of recurrence, and found the test to be reliable.

The TAILORx trial involves over 10,000 women with the most common form of breast cancer. All had hormone receptor (HR) positive (+), HER2-negative disease, with tumors not involving lymph nodes. All had Oncotype testing of their tumors. Now with 8 years of follow-up, the main finding is that for patients with low or intermediate Oncotype results, taking chemotherapy in addition to hormone-blocking endocrine treatment does not affect survival or lower the likelihood of cancer’s recurrence.

As a former practicing oncologist and survivor of early-stage breast cancer, I think the TAILORx findings are among the most important reported since I entered the field. These results have the potential to reduce overtreatment of breast cancer—stopping patients’ unnecessary exposure to chemotherapy—in a large fraction of cases.


Company Data Science News

Facebook shared user data with third parties without obtaining users’ consent, even though it was operating under a 2011 consent decree from the US Federal Trade Commission that prohibited the social media behemoth from sharing user data with third parties unless the users had explicitly consented. The New York Times investigation found that Facebook executives considered BlackBerry, Apple, Samsung, Amazon, and 56 other device makers to be “extensions of Facebook,” thereby exempting them from the FTC consent decree that applied to data-sharing partnerships.

If you are feeling confused and angry at Facebook’s disingenuous rule-bending, so is former FTC official Jessica Rich, who spoke bluntly in the Times article: “Under Facebook’s interpretation, the exception swallows the rule.” At stake is a fundamental, perhaps deliberate, misunderstanding of what consent means. Let me clarify how consent works: only first parties can give consent. I can give consent to Samsung to copy and mine my data. I cannot give consent to Samsung to mine data from my Facebook friends.

Still confused? Imagine if we applied Facebook’s version of consent to sexual relationships. It would look a lot like Gayle Rubin’s assessment that when men (usually fathers) were able to give consent to sexual access to women (usually their daughters), women became less than full people. They became valuable physical and symbolic links that protected the economic status of the men in their lives. They were ‘trafficked’ to build kinship ties between men, playing a key role in the patriarchal structure of wealth transfer. A German regulator, who did not cite Gayle Rubin’s discussion of commoditized people, commented that “BlackBerry users seem to have been turned into data dealers, unknowingly and unwillingly.” I won’t claim that using Facebook supports the patriarchy, though at a couple of levels of remove, this is true. What I do argue is that corporate powerhouses – device makers and social media behemoths – are enacting an age-old strategy to protect their own dominant positions. Turn people into commodities without their consent. Leave them with no obvious way to refuse, abstain, or see clearly that they should want to do so.



Speaking of data sharing: Two companies – Luna DNA and Nebula Genomics – want to collect DNA and pay people to share it with pharmaceutical companies. Nebula, based in San Francisco, wants to pay its customers in cryptocurrency. Neat!



This brings me to a second key requirement when obtaining consent: consent can only be obtained when the person is fully informed of the consequences of their participation. Neither of these companies has explained basic consequences such as: how much users will receive, what would happen if their data were leaked or stolen, and what would happen if the US rescinds its current commitment to the idea that pre-existing conditions cannot be considered when setting health insurance prices. Oh, and I have no idea how to consent someone to being paid in cryptocurrency, given the volatility and short history of that form of payment.



A Canadian company, integrate.ai, is going to confront these consent questions immediately as they aim to be an ethical AI company while relying on data sourced from social media sites. They are one to watch – no word yet on how they plan to address the pain point that arises at the overlap between ethical consenting practice, available social media data, and business models relying on the ingestion thereof.



Alphabet shareholders – many of whom are executives at the company – voted against several proposals put forward by employees to link pay to diversity goals and pressure Google to release more information about its content moderation strategies, especially on YouTube. Keep fighting the good fight, Googlers. This isn’t a defeat; it’s just one more step on the path to success.



The day after this shareholder meeting, Google CEO Sundar Pichai announced the underlying principles guiding AI at Google. The fourth principle is “be accountable to people”. I am scratching my head trying to figure out how this will work, given that they literally *just* refused to share information on how content moderation works. [See the government section for how they are refusing to share political campaign ad data with the state of Washington, which has gone ahead and sued the company.] One of our fellow readers asked me, “will they really not use ML to reinforce stereotypes? It’s literally all their technology currently does! And will they really DESIGN for privacy? It would require discarding their entire codebase and restarting from scratch.” Please, please, let me know what you think. I want to be excited that Google is trying to care – it seems better than Facebook – but capitalism’s demand for growth seems to be the rock against which all good intentions are smashed.

Intel is working on Low Power Display Technology that should extend laptop battery life, possibly to as much as 28 hours.



Microsoft dropped a data center into the ocean near Scotland. The goal is to use cold ocean water to naturally cool the 864 servers, reducing demand for electricity by up to 20 percent. The data center will be connected to land by a cable that provides both a fiber connection and electrical power. This data center is considered experimental and will be monitored for computational and environmental impact.

Spotify is planning to make a “category defining” hardware product. So far, there’s only speculation on what it will be, but they have registered with the Federal Communications Commission, a key step in the process.


Backstory: How we honed in on the idea of selling your own DNA for profit

The San Diego Union-Tribune, Gary Robbins



There’s been so much news lately about people’s genetic makeup it’s been hard to choose which story to pull aside for a closer look.

The alleged Golden State Killer was identified, in part, because his DNA was uploaded to a genealogy database.

The new book “FutureFace” makes it clear that consumers won’t always get identical results if they submit their DNA to different personal genomics companies to learn about their ancestry.

Then we learned that a couple of startups — Luna DNA of Solana Beach and Nebula Genomics of San Francisco — plan to help consumers get paid for sharing their genetic material with scientists.


University Data Science News

Kipp Johnson (Mount Sinai), Jessica Torres Soto (Stanford), and others who have worked with anchor author Joel Dudley (Mount Sinai School of Medicine) published a paper explaining why and how supervised and unsupervised learning will supercharge doctors treating cardiovascular patients. Because cardiologists often have more data than other types of specialists, they are likely to reap the rewards of data-voracious machine learning methods more quickly. This is a *great* paper to use if you are trying to teach interdisciplinary data science. It walks through various types of machine learning models, compares them to linear regression, and situates them in the context of cardiovascular medicine. Smashingly well done.



University of Manchester researchers have developed a new behavior-based biometric system for airport security. By analyzing the amount of pressure, the distance between footsteps, the timing of footsteps, and pressure points, the researchers were able to correctly identify individuals with only a 0.7% error rate (I was unable to access the entire paper to investigate these impressive claims further). The database they used had 20,000 footsteps from 127 people. The researchers are also planning to apply gait analysis to identify the onset of cognitive decline in aging patients. Expect to see many more behavior-based strategies replacing the heavily exploited, threadbare security protocols built around names, Social Security numbers, dates of birth, and PINs. The future of security is nearly here and it will be so much better.
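For context on what an "error rate" can mean here: biometric verification systems are often summarized by an equal error rate (EER), the operating point where the share of impostors wrongly accepted equals the share of genuine users wrongly rejected. The coverage doesn't say whether the 0.7% figure is an EER, so treat this as background rather than a description of the Manchester system. Below is a minimal Python sketch, assuming similarity scores where higher means a better match and a claim is accepted when its score meets the threshold; the function name and score values are hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the equal error rate (EER) of a verification system.

    Assumes similarity scores (higher = more likely the claimed person);
    a claim is accepted when score >= threshold.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("need at least one genuine and one impostor score")
    best_gap, best_eer = float("inf"), None
    # Try every observed score as a candidate decision threshold.
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance rate: impostor scores that clear the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine scores that fall below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores: genuine users and impostors are perfectly separated.
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # prints 0.0
```

Because the sketch only tries observed scores as thresholds, it approximates the EER; interpolating between the two error curves is a common refinement.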



Ripple, the blockchain company behind the cryptocurrency XRP, has committed $50m to its University Blockchain Research Initiative (UBRI) to, you know, study the blockchain. Seventeen universities – including UC-Berkeley, Penn’s Wharton School and Princeton – have already been awarded funding. Because Ripple is a blockchain company and not just a cryptocurrency, there is some hope it will be able to survive the cryptocurrency hype, but I wouldn’t count on another $50m becoming available annually.



The University of Washington announced a restructuring of the admissions process for Computer Science (CS) majors. In the past, students were admitted first to UW and declared their CS major after a year of coursework. Now, students interested in CS will be admitted directly to the CS major right out of high school. Students who don’t get into the major will have a hard time getting into CS classes because course enrollment policies prioritize students in the major. This is one way to transparently, if bluntly, address the rising demand for CS degrees: let students know they need to commit to the major early and go to a different school if they don’t get in. It would be even better if they could hire more CS faculty, but we all know how hard it is to compete for top talent there.



MIT announced a new joint urban science and computer science major, course 11-6 for those familiar with the Institute’s numbering system. In their words, the major will combine “urban planning and public policy, design and visualization, data analysis, machine learning, and artificial intelligence, pervasive sensor technology, robotics, and other aspects of both computer science and city planning.” My take: It’s a major that focuses on the digitization of the experience of living and working in cities, explicitly considering the human and social impacts of infrastructural change.



The National Academy of Sciences put together a committee to figure out which core competencies undergraduates should develop in data science programs. They came up with a smart, nuanced set of guidelines that emphasize interdisciplinarity: “Data education should expand throughout academic disciplines, but it is necessary that students trained in data science are exposed to continual curricular and co-curricular opportunities to hone vital communication and teamwork skills. Inclusivity refers not only to diversifying student’s exposure to data, but also requires faculty and institutions to think inclusively about collaboration.” Note that they do not think of interdisciplinarity as a set of canonical methods or hard problems; they think of it as a practice, almost as a method in itself. In my experience, this is an area that is severely, cripplingly lacking in data science education. Please, let’s stop thinking of these as (pejorative) soft skills and start thinking of them as professional skills.



Two Swiss scholars, Cristina Kadar and Irena Pletikosa, combined NYC MTA subway data, Foursquare check-in data, American Community Survey (i.e., US Census Bureau) data, and NYPD open crime data to improve upon current crime forecasting models. They were able to take the current best-in-class R-squared from 65% to 89% by including data about inflows of people into neighborhoods. This was particularly effective for grand larceny; the Census data was already capable of explaining assault rates fairly well. Of course, what is missing from the underlying data is the comparison between rates of crime committed and rates of crime reported. Many crimes go unreported, which is likely related to different expectations of apprehension rates. If citizens don’t expect criminals to be prosecuted – say, for instance, women who know their sexual assaults will devolve into “he said it was consensual” or “you shouldn’t have worn/done/said X, Y, or Z” so they stay quiet, or communities of color that haven’t been able to rely on police to investigate *homicides* – they are less likely to report crimes in the first place. I’m a bit guilty of blaming the authors for writing the wrong paper, a common academic faux pas, but I am concerned about increasing our ability to ratchet up police pressure on communities that are already overrepresented in existing datasets. The consequences of overfitting here are serious.
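For readers less familiar with the metric: R-squared is the share of the variation in observed outcomes (here, crime counts across areas) that a model's predictions account for, so going from 65% to 89% means the unexplained share fell from roughly a third to about a tenth. A minimal Python sketch of the standard formula follows; the counts are invented for illustration and do not come from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    # Total variation of the observations around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    # Variation the model leaves unexplained (sum of squared prediction errors).
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


# Invented counts for four hypothetical neighborhoods, for illustration only.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(round(r_squared(observed, predicted), 3))  # prints 0.948
```

In a forecasting study the predictions would come from a fitted model, ideally scored on held-out data; the sketch covers only the scoring step.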


Scientists race to reveal how surging wildfire smoke is affecting climate and health

Science, Warren Cornwall



Emily Fischer is likely one of the few people whose summer plans were buoyed by a recent forecast that much of the western United States faces another worse-than-normal wildfire season. Unusually warm weather and drought, together with plenty of dry grass and brush, are expected to create prime conditions for blazes this summer, federal officials announced on 10 May.

The forecast has local officials bracing for the worst. But it represents an opportunity for Fischer, an atmospheric scientist at Colorado State University in Fort Collins who is preparing to spend the summer flying through plumes of wildfire smoke aboard a C-130 cargo plane jammed full of scientific equipment. The flights are the highlight of an unprecedented effort, costing more than $30 million, that involves aircraft, satellites, instrumented vans, and even researchers traveling on foot. Over the next 2 years, two coordinated campaigns—one funded by the National Science Foundation (NSF), and the other by NASA and the National Oceanic and Atmospheric Administration (NOAA)—aim to better understand the chemistry and physics of wildfire smoke, as well as how it affects climate, air pollution, and human health.


Footsteps, Pressure Sensors, and AI: The Next Step in Airport Security

EE Times, Nitin Dahad



The University of Manchester’s school of electrical and electronic engineering, in collaboration with the University of Madrid, has developed a behavioral biometric verification system that can measure a human’s individual gait or walking pattern, enabling successful identification of a person simply as they walk on a pressure pad in the floor and analyzing the footstep’s 3D and time-based data.

With this system, the researchers claim that the way a person walks and analysis of that individual’s footsteps could be used as a biometric at airport security instead of fingerprinting and eye-scanning, providing a non-intrusive method of identity verification.

The results, published in a machine-learning research journal, the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), earlier this year showed that, on average, the AI system developed correctly identified an individual almost 100% of the time, with just a 0.7% error rate.


The ‘Moneyball’ Effect: Alumni Find Fulfilling Careers in Major League Baseball

University of Virginia, UVA Today



In the last several years, a number of University of Virginia graduates have joined their ranks, crunching numbers for Major League Baseball teams.

“‘Moneyball’ got everybody thinking about sports analytics,” said Arlyn Burgess, the associate director for operations and strategic initiatives at UVA’s Data Science Institute. “It is certainly a niche that we want to make sure that we’re covering because we see such an excitement from students.”


Chinese universities to offer new cutting-edge science courses to freshmen

Global Times (CN), Yin Han



Hundreds of new college majors on cutting-edge science will be available to college freshmen, which is expected to help narrow China’s talent gap in high-tech fields, expert said.

The Ministry of Education approved in March a total of 2,311 new majors, including those on data science, robotic engineering, artificial intelligence (AI) and cyber security. Many will be available the coming fall term, chinanews.com reported.

A data science and big data technology major will be available in 250 universities and colleges in China, including Nankai University, Tongji University and Xiamen University.


Stanford makes a startling new discovery. Ethics

ZDNet, Chris Matyszczyk



Now, in a glorious moment of chest-beating and head-bobbing, Stanford University president Marc Tessier-Lavigne has admitted that his university — which spawned so many young, great tech titans, such as the founders of Google, Instagram and LinkedIn — failed to make titanic efforts in the area of ethics.

In an interview with the Financial Times, he revealed that the university now intends to explore the teaching of “ethics, society and technology.”

As we survey the political and social carnage that seems to have been enabled by technology over the last few years, it’s remarkable that this wasn’t thought of before.


World-Renowned AI Scientists, Dr. Sebastian Seung and Dr. Daniel Lee Join Samsung Research

Samsung Global Newsroom



Samsung Electronics today announced that it is adding prominent artificial intelligence (AI) experts Dr. H. Sebastian Seung, the Evnin Professor in the Neuroscience Institute and Department of Computer Science at Princeton University, and Dr. Daniel D. Lee, the UPS Foundation Chair Professor in the School of Engineering and Applied Science at the University of Pennsylvania, to expand its global AI R&D capabilities.

At Samsung Research, Drs. Seung and Lee will play a central role in building up fundamental research on AI that will advance human knowledge with the potential for revolutionary business impact. “Samsung is a company with a long history of pursuing innovation, and is committed to tapping the full potential of artificial intelligence,” said Dr. Seung. “I look forward to working at Samsung to help discover what lies ahead in AI.”


Ripple, Blockchain-Based Payment Network, to Grant $50M to 17 Universities for Blockchain, Cryptocurrency Research, Workforce Development

The 74, Emmeline Zhao



A blockchain-based payment network is granting $50 million to 17 universities around the world as an investment in shaping the workforce of the future.

Ripple, a network that bills itself as using blockchain technology to enable individuals, banks, and payment providers to exchange payments globally and instantly, announced Monday the launch of the University Blockchain Research Initiative. Through UBRI, Ripple will donate more than $50 million to 17 initial university partners around the world to “support academic research, technical development, and innovation in blockchain, cryptocurrency, and digital payments,” the company said in a statement.

 
Events



Big Data Toronto Brings Canada to the Centre Stage in Big Data and AI

Corp Agency



Toronto, Ontario, Canada June 12-13. “Big Data focuses on the skills, software and leadership needed to implement data insights. AI Toronto is dedicated to Toronto’s growing AI and deep learning communities.” [$$$]


The Proliferation of Artificial Intelligence – Will AI Be the Biggest Disruption in the Next 10 Years?

VLAB



Stanford, CA June 19, starting at 6 p.m., Stanford Faculty Club. Panelists: Patrick Ehlen from Loop AI Labs, SriSatish Ambati from H2O.ai, Shubho Sengupta from Facebook AI Research and Jishnu Bhattacharjee from Nexus Venture Partners. [$$]


NYCML ’18

NYC Media Lab



New York, NY September 20. “NYCML’18 features keynotes and discussions that explore the intersection of technology and civic engagement—including the key trends that will drive how media is produced, consumed, distributed and monetized.” [$$$]

 
Deadlines



Sage Digital Health Catalyst Program

“The often costly nature of leveraging emerging technologies creates a high barrier to entry for researchers wanting to effectively deploy digital health technologies at scale. The Digital Health (DH) Catalyst Program aims to address this by providing pro bono consulting and in-kind infrastructure to support innovative ideas for biomedical research studies that leverage digital health technologies to answer a pressing scientific question.” Deadline for First Round applications is July 31.
 
Moore-Sloan Data Science Environment News



Wow! Stunned to win #polmeth emerging scholar awd, and deeply appreciate all the lovely tweets, emails people sent. Thanks to every person (incl R2!) who helped me along the way!

Twitter, Arthur Spirling



 
Tools & Resources



HEI making MOSES ozone data available to public

Green Car Congress



“The Health Effects Institute (HEI) is making available to the public the database and material of the Multicenter Ozone Study in oldEr Subjects (MOSES). MOSES measured a large number of cardiovascular and respiratory endpoints in 87 healthy participants (60 years old on average) who were exposed to 0, 70, or 120 parts per billion ozone for 3 hours while exercising moderately.”


How to Explore Apple Watch Data in R

DataScience.com, inertia7, Raul Eulogio



“One of the first challenges is making sense of the vast amount of data available. In this post, I will conduct an exploratory analysis of data from a friend’s Apple Watch. The data is anonymized and I will focus on exploring one factor that was measured by the app: active energy burned (although, in future iterations, I would like to include other categories gathered by the watch).”
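The post does its analysis in R. For readers who want to try the same first step in Python, here is a minimal sketch, assuming the data comes from the iPhone Health app's data export, whose export.xml stores each measurement as a Record element with type, startDate, and value attributes (the post itself may use a different source or format). It sums active energy burned per calendar day; the function name is hypothetical, and every record is assumed to use the same unit (e.g. kcal).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Record type used for active energy in Apple Health exports (assumed format).
ACTIVE_ENERGY = "HKQuantityTypeIdentifierActiveEnergyBurned"


def daily_active_energy(xml_source):
    """Sum active energy burned per day from an Apple Health export.xml.

    xml_source may be a file path or a binary file object. Values are summed
    as-is, so all records are assumed to share one unit (e.g. kcal).
    """
    totals = defaultdict(float)
    # iterparse streams the document instead of loading it all at once,
    # which helps because full health exports can be very large.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == ACTIVE_ENERGY:
            # startDate looks like "2018-05-01 08:00:00 -0700"; keep the date.
            day = elem.get("startDate", "")[:10]
            totals[day] += float(elem.get("value", "0"))
        elem.clear()  # drop children/attributes of elements already processed
    return dict(totals)
```

Usage would look like `daily_active_energy("export.xml")`, where the path is whatever the unzipped export is called on your machine. Grouping by the date prefix of startDate keeps each reading on the local day it was recorded, which is usually what an exploratory daily plot wants.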


Nvidia launches Isaac robot platform with Jetson Xavier robot processor

VentureBeat, Dean Takahashi



Nvidia launched its Nvidia Isaac robot platform today to power the next generation of autonomous machines, bringing artificial intelligence capabilities to robots for manufacturing, logistics, agriculture, construction, and many other industries.

Launched at Computex 2018 in Taiwan by Nvidia CEO Jensen Huang, the Nvidia Isaac platform includes new hardware, software, and a virtual-world robot simulator that makes it easy for developers to create new kinds of robots.


A database of biological and geomorphological sea-level markers from the Last Glacial Maximum to present

Nature, Scientific Data; F.D. Hibbert, F.H. Williams, S.J. Fallon & E.J. Rohling



“The database is global in scope, internally consistent, and contains U-series and radiocarbon dated indicators from both biological and geomorphological archives. We focus on far-field data (i.e., away from the sites of the former continental ice sheets), but some key intermediate (i.e., from the Caribbean) data are also included. All primary fields (i.e., sample location, elevation, age and context) possess quantified uncertainties, which—in conjunction with available metadata—allows the reconstructed sea levels to be interpreted within both their uncertainties and geological context.”

 
Careers


Postdocs

Post-doc



Rutgers University, Department of Computer Science; Piscataway, NJ

AI Research Fellow



The Alan Turing Institute; London, England

Full-time, non-tenured academic positions

Social Science Research Coordinator



Stanford University, School of Humanities and Sciences; Stanford, CA

Developer



University of Michigan, Institute for Social Research; Ann Arbor, MI

Program Coordinator



University of Chicago, Harris School of Public Policy and the Department of Computer Science; Chicago, IL

Full-time positions outside academia

Executive Director



Simply Secure; Berlin, Germany

Software Engineer Manager, TensorFlow Extended Open Source



Google; Mountain View, CA
