Data Science newsletter – May 9, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for May 9, 2018


 
 
Data Science News



Introducing Google AI

Google AI Blog, Christian Howard


To better reflect this commitment, we’re unifying our efforts under “Google AI”, which encompasses all the state-of-the-art research happening across Google. As part of this, we have expanded the Google AI website, and are renaming our existing Google Research channels, including this blog and the affiliated Twitter and Google+ channels, to Google AI. And if you’re looking for information that existed on research.google.com or the affiliated social channels, don’t fret, it’s all still there. Any links to previous Google Research website content, blog posts or tweets will redirect appropriately.

The Google AI channels will continue to showcase the breadth of Google research, innovation and publications, in addition to a lot more new and exciting content to come. We encourage you to explore! We look forward to continuing to bring you the latest updates and results from Google, in AI and across many other areas of research.


Government Data Science News

Canada has the support of its general public when it comes to using science to understand the environmental impact of policy decisions, according to lead author Aerin Jacob. Refreshing! Sixty percent of industry respondents, however, were opposed to “strengthening scientific rigour.” That finding is simultaneously boring and infuriating. How’s this for a new coinage: furistasis, the feeling of perpetual fury over an event, set of related events, or reported attitude that has become predictable.



The National Institutes of Health has been recruiting heavily for its “All of Us” precision medicine initiative. It has to recruit heavily to hit its goal of 1 million Americans. The initiative has one of these new-fangled informed consent procedures in which the NIH can ‘inform’ enrollees that their data will be used to advance science (!) by doing, you know, precision medicine research. Now, some people are totally OK with that nebulous promise, but Professor Tim Caulfield takes issue with the heady PR language NIH is using (which is related to, but different from, its informed consent procedure). No letter of informed consent that I have ever seen has been allowed to promise that the research would “heal the sick and eradicate disease” or that “the next great breakthrough will be found” in the subject.

The National Science Foundation is selling the research ship Marcus G. Langseth in 2020, leaving researchers who image the crust beneath the ocean floor scrambling for alternatives. Like so many data scientific fields, this type of marine science is tied closely to data collecting instrumentation. Without the Langseth, essentially no comparable ships are available to American researchers. The problem? It costs $70,000 per research day to keep the Langseth going.



The 2020 Census is going to undercount people. Vox has a smart, cute comic strip explainer that demonstrates why, how, and with what consequences the Census became a political issue.


Protecting confidentiality in genomic studies

MIT News


Cryptographic system could enable “crowdsourced” genomics, with volunteers contributing information to privacy-protected databases.


Mobile data usage overtakes Wi-Fi, and CBRS threatens

Network World, Patrick Nelson


Wi-Fi could be in for a rough ride. It’s getting hit by cellular data, plus a new chunk of data-friendly spectrum: the Citizens Broadband Radio Service (CBRS).


Scientists warn against losing a crucial research ship: The National Science Foundation ‘has betrayed us’

The Washington Post, Ben Guarino


Marine scientists are bracing for the loss of the world-class research vessel Marcus G. Langseth. The National Science Foundation plans to sell the 235-foot ship in 2020, according to a “Dear Colleague” letter published on the agency’s website last month. Without a vessel to replace the Langseth, ocean seismologists fear their field will suffer.

“We’re not trying to save the Langseth at all costs,” said James Austin, a geoscientist at the University of Texas at Austin. “We’re trying to save deep-ocean crustal imaging.”

Deep-ocean crustal imaging is where the Langseth excels. It is no ordinary ship. Its sophisticated array of pneumatic guns generates a blast that bounces off the Earth’s crust and penetrates dozens of miles into the planet. Unspooled behind the ship, miles of cables strung with microphones capture the blast’s reflection. This sonic bounce creates maps of mid-ocean-ridge magma chambers and tectonic plate edges, features that are otherwise difficult, if not impossible, to survey.


Microsoft Charts Its Own Path on Artificial Intelligence

WIRED, Business, Tom Simonite


At its annual Build conference Monday, Microsoft will suggest companies with big AI ambitions should steer clear of chips like Google’s. It says machine learning is evolving so fast that it doesn’t make sense to burn today’s ideas permanently into silicon chips that could soon prove limiting or obsolete.

Instead, Microsoft is pitching the idea of running AI projects atop chips called FPGAs, whose designs can be reprogrammed to support new forms of software on the fly. That allows Microsoft to avoid having to dabble in designing silicon for its servers—it buys its FPGAs from chip giant Intel. Some companies are already buying into Microsoft’s vision.


Fifty years of statistics at Penn State

Penn State University, Penn State News


Penn State’s Department of Statistics is celebrating its 50th anniversary this month. From humble beginnings in the 1960s, it has grown to become one of the leading centers in the world for research in statistical theory and applications. The department also educates thousands of students across the University — students in numerous other disciplines as well as its own majors — and is nationally recognized for statistics education, theoretical statistics research, and statistical approaches to solving problems in a large variety of scientific disciplines.

It is reasonable to ask what an academic statistics department does; after all, nearly everyone encounters statistics in their lives, and many people even take an introductory statistics course or two, but very few are ever immersed in the life of a research department like this one. The late John Tukey — a highly influential and colorful statistician, and the person who coined the terms “software” and “bit” — once said that the best thing about being a statistician is that you get to play in everyone’s backyard. His point was that statistics is highly interdisciplinary, and this is one of the aspects of the subject that attracts so many quantitatively minded scientists to the field.


Computation counts

MIT News, School of Engineering


Students are flocking to 6.00 (Introduction to Computer Science and Programming Using Python), where they learn not just coding but computational thinking.


The Salaries Of Data Scientists Remain Steady But Still Sexy With AI On The Horizon

Forbes, Gil Press


The just-published 5th-annual Burtch Works Study: Salaries of Data Scientists provides fresh insights into the compensation trends for those holding the sexiest job of the 21st century:

  • Similar to last year’s figures, the median base salaries of data scientists have remained relatively steady. After rising rapidly in 2015 and 2016, median base salaries at all job levels changed by a single-digit percentage point or not at all from March 2017 to March 2018.
  • The median base salaries range from $95,000 at level 1 (0-3 years of experience) to $165,000 at level 3 (9+ years) for individual contributors and from $145,000 at level 1 (1-3 reports) to $250,000 at level 3 (10+ reports) for managers.

    Nakasone takes helm at NSA and newly elevated Cyber Command

    Fifth Domain, Mark Pomerleau


    Gen. Paul Nakasone assumed the directorship of the National Security Agency and Cyber Command, now officially a unified combatant command, from Adm. Michael Rogers in a ceremony May 4.

    In doing so, Nakasone became “the primary guardian of our nation’s cyber domain,” said Patrick Shanahan, deputy secretary of defense.


    Who’s Winning the Self-Driving Car Race?

    Bloomberg, Hyperdrive, David Welch and Elisabeth Behrmann


    A scorecard breaking down everyone from Alphabet’s Waymo to Zoox.


    [1805.00400] Computing Environments for Reproducibility: Capturing the “Whole Tale”

    arXiv, Computer Science > Computers and Society; Adam Brinckman et al.


    The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations to the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale project aims to address these barriers by connecting computational, data-intensive research efforts with the larger research process–transforming the knowledge discovery and dissemination process into one where data products are united with research articles to create “living publications” or “tales”. The Whole Tale focuses on the full spectrum of science, empowering users in the long tail of science, and power users with demands for access to big data and compute resources. We report here on the design, architecture, and implementation of the Whole Tale environment.


    Extra Extra

    Hello, journalists and data scientists who have to explain your field to your parents and others who are excited about AI and robots (wowee!), but still don’t understand what you actually do. Pete Warden, academic turned Googler, has a lovely blog post that describes tomorrow’s robots as similar to pets. His viewpoint is equal parts accessible, accurate, and endearing.



    The American Red Cross maintains a crucial part of medical infrastructure: the blood supply. They decided to test for Zika, which was expensive ($137 million) but logical. Now they are being blamed for spending too much because not enough people who donated blood tested positive for Zika. Is this backwards thinking or what?


    Data Science for Undergraduates: Opportunities and Options

    National Academies of Sciences, Engineering, and Medicine; Division on Engineering and Physical Sciences. CSTB


    As our economy, society, and daily life become increasingly dependent on data, work across nearly all fields is becoming more data driven, affecting both the jobs that are available and the skills that are required. At the request of the National Science Foundation, the National Academies of Sciences, Engineering, and Medicine were asked to set forth a vision for the emerging discipline of data science at the undergraduate level. The study committee considered the core principles and skills undergraduates should learn and discussed the pedagogical issues that must be addressed to build effective data science education programs. Data Science for Undergraduates: Opportunities and Options underscores the importance of preparing undergraduates for a data-enabled world and recommends that academic institutions and other stakeholders take steps to meet the evolving data science needs of students.


    Seven Visualization Talks That Terrified Me At CHI

    Eager Eyes blog, Michael Correll


    I recently attended CHI 2018 in Montréal, QC. Normally conferences leave me excited for the next idea or the next direction (and also physically exhausted). This was one of the first where I was left feeling terrified: A lot of the work did an excellent job of highlighting core problems about our assumptions as visualization researchers, and poked at big intractable issues that I had mostly been ignoring for a long time. There were also exactly 666 accepted papers, which is not a good omen either.

    Of course, there was lots of other interesting vis work at CHI (they even let me give a couple talks this year), but these were the talks I was pondering on the flight home.

     
    Events



    Best Practices for Scaling Data Science Across the Organization

    KDnuggets, Forrester, Anaconda


    Online “Join Forrester and Anaconda for a webinar on Thursday, May 17, at 2:00 PM CT, to learn best practices for scaling data science across your entire organization. Learn how to tackle five key challenges facing organizations today!”


    Data Science Journalist Nate Silver and Former U.S. Assistant Health Secretary Karen DeSalvo Headline NYU Langone’s Population Health Conference on Data Science and Public Action

    NYU Langone Health


    New York, NY “The Department of Population Health at NYU Langone Health is holding a conference on May 21, 2018, about how the healthcare and public health sectors can work with other partners in leveraging data science to advance health outcomes and health equity.” [sold out, waiting list registration]


    3rd Annual Crops in silico Symposium & Hackathon

    University of Illinois


    Urbana, IL July 31-August 3, organized by University of Illinois researchers in Plant Biology and Information Science. [registration required]

     
    Deadlines



    Summer Institute in Computational Social Science Partner Site

    Boulder, CO August 13-17 at University of Colorado. “The purpose of the Summer Institute is to bring together graduate students, postdoctoral researchers, and beginning faculty interested in computational social science. The Summer Institute is for both social scientists (broadly conceived) and data scientists (broadly conceived). This satellite program is co-organized by Brian Keegan and Allie Morgan.” Deadline for applications is June 1.

    2018 Lasi-Nordic

    Copenhagen, Denmark August 29-30. “LASI-Nordic draws together the Nordic Learning Analytics community to learn from experts, to share research and experience, and to investigate if there is a Nordic approach to learning analytics. Over the course of two days, there will be keynote talks, hands-on workshops, tutorials, panels, and a firehose & poster/demo session in which to participate.” Deadline for abstract submissions is July 1.

    TrackML Particle Tracking Challenge

    “A team of Machine Learning experts and physics scientists working at CERN (the world’s largest high energy physics laboratory) has partnered with Kaggle and prestigious sponsors to answer the question: can machine learning assist high energy physics in discovering and characterizing new particles?”

    “Specifically, in this competition, you’re challenged to build an algorithm that quickly reconstructs particle tracks from 3D points left in the silicon detectors.” Deadline for entries is August 6.

     
    Tools & Resources



    Should I use an algorithm here? EFF’s 5-point checklist

    Boing Boing, Cory Doctorow


    The Electronic Frontier Foundation’s Jamie Williams and Lena Gunn have drawn up an annotated five-point list of questions to ask yourself before using a machine-learning algorithm to make predictions and guide outcomes.

    The list draws heavily on two essential recent books on the subject: Cathy O’Neil’s Weapons of Math Destruction and Virginia Eubanks’s Automating Inequality, both of which are essential reads.


    ORNL Data Scientists Release pbdR 1.0

    HPC Wire


    Scientists at the Department of Energy’s Oak Ridge National Laboratory have released pbdR 1.0, a full suite of software packages they developed to make the R programming language easy to use for high-performance computing. R is the most commonly used statistical data analysis software among academic researchers.

    Intended to enable big data analysis — analyzing large groups of data on leadership-class supercomputers such as ORNL’s Titan and Summit — among research disciplines that typically work with small datasets, pbdR offers ease of installation and use, computational speed and capability across multiple operating systems.

     
    Careers


    Tenured and tenure track faculty positions

    Associate Director of the Data Science in Public Policy Program

    Georgetown University, McCourt School of Public Policy; Washington, DC


    Full-time positions outside academia

    Vice President, Data Science & Analytics

    HomeAway; Austin, TX
