Data Science newsletter – May 11, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for May 11, 2018

GROUP CURATION: N/A

 
 
Data Science News



AI in Finance: The latest news & tech

REWORK


from

  • Sony Bank to use AI to bolster lending operations
  • AppZen, which promises to transform back office enterprise operations, announces new partnership with Coupa
  • Baidu raises more than US$1.9 billion for artificial intelligence-powered finance arm
  • South African Bank Becomes Biometric ATM Pioneer
  • Goldman Sachs Buys Personal Finance App Clarity Money

  • Study: There’s still very little evidence that health apps work

    MobiHealthNews, Jonah Comstock



    A new overview of systematic reviews of published randomized controlled trials of mobile health apps found that just 23 RCTs of currently available apps have been conducted, and fewer than half of those showed a positive health effect from the app in question.

    A group of researchers from the Centre for Research in Evidence-Based Practice at Bond University in Queensland, Australia conducted the review, which was published in Nature’s new Digital Medicine journal.

    “Smartphone popularity and mHealth apps provide a huge potential to improve health outcomes for millions of patients,” the researchers wrote. “However, we found only a small fraction of the available mHealth apps had been tested and the body of evidence was of very low quality. Our recommendations for improving the quality of evidence and reducing research waste and potential harm in this nascent field include encouraging app effectiveness testing prior to release, designing less biased trials, and conducting better reviews with robust risk of bias assessments. Without adequate evidence to back it up, digital medicine and app ‘prescribability’ might stall in its infancy for some time to come.”


    Max Planck-Yale Center for Biodiversity Movement and Global Change launched

    Yale University, YaleNews



    In a world of changing climate and shifting habitat, species of all sorts are on the move — and so is Yale’s biodiversity research.

    On May 2, Yale launched the Max Planck–Yale Center for Biodiversity Movement and Global Change with a daylong symposium. The center, a partnership with the Max Planck Society, will channel new resources, technology, and expertise into advancing scientific knowledge of species movement through time.

    “The establishment of this center emphasizes the importance of international collaborations in understanding and solving challenges we all share,” said Yale President Peter Salovey. “In today’s increasingly complex and interconnected landscape, investigators across disciplines and around the world must work together to produce research and technological innovations.”


    Testing for Zika in blood donors finds few infections — at $5.3 million each

    STAT, Helen Branswell



    An expensive screening program designed to keep the Zika virus out of the Red Cross’ blood supply has caught fewer than a dozen infected donations, a new study published Wednesday revealed.

    The program, which costs roughly $137 million a year to operate, detected only eight units that tested positive for the virus between June 2016 and September 2017. And half of those units contained Zika antibodies as well as virus, which suggests they probably would not have been able to infect a recipient, if anyone had been transfused with them.

    The high screening cost and low number of positive detections work out to about $5.3 million for each positive unit the Red Cross pulled from the system, said the study, published in the New England Journal of Medicine.


    Allen Integrated Cell is a powerful tool for visualizing biology in 3D

    TechCrunch, Devin Coldewey



    What does a cell look like? If you had to draw one, you’d probably do the usual thing: a sort of fried egg with a nucleus yolk and a couple of ribosomes peppered around, maybe a rough endoplasmic reticulum if you’re fancy. But cells are vastly more complicated than that, not to mention three-dimensional. Allen Integrated Cell is a new tool that lets anyone visualize cells the way they actually exist in the body.

    It’s from the Allen Institute for Cell Science, a Seattle research outfit founded by Microsoft co-founder Paul Allen. The institute has been doing research in this direction for some time, but today is the first public release of the Integrated Cell program, which you can try on the web.


    University Data Science News

    MIT and Stanford University researchers Bonnie Berger, Hyunghoon Cho, and David Wu published a paper describing a way to increase the confidentiality of genome databases, using a technique that could be replicated for other sensitive data. The technique stores data across multiple servers. No single server will have all of the data, nor will it be possible to tell which server has accurate data. Each server's dataset will appear complete, but the servers must work together through what the researchers call “secret sharing” (love that) to compute anything useful. Secret sharing, in this case, is not a computationally intensive cryptographic technique requiring any reinvestment in fracking or coal mining to carry off. Instead, to store a given number x, a secret-sharing system sends a random number r to one server and x-r to the other. Using multiplication adds useful complexity without much additional compute power. This is a technique to watch among those who care about privacy, security, and efficient use of natural resources (aka compute resources).
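    To make the r and x-r idea concrete, here is a minimal Python sketch of two-party additive secret sharing. It assumes arithmetic modulo a prime (2**61 - 1 is chosen here only for illustration). The function names share, reconstruct, and add_locally are hypothetical and are not the researchers' code. The sketch shows that either share on its own is just a random number, and that the two servers can add shared values without ever seeing them. Multiplying shared values requires extra machinery, which this sketch leaves out.

```python
import secrets

# Assumed modulus for illustration: the Mersenne prime 2**61 - 1.
# Inputs are expected to lie in [0, PRIME); working modulo a prime keeps
# each individual share uniformly random.
PRIME = 2**61 - 1


def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares: a random r and (x - r) mod PRIME."""
    r = secrets.randbelow(PRIME)
    return r, (x - r) % PRIME


def reconstruct(share_a: int, share_b: int) -> int:
    """Recombine both shares; neither share alone reveals anything about x."""
    return (share_a + share_b) % PRIME


def add_locally(share_of_x: int, share_of_y: int) -> int:
    """A single server adds the two shares it holds, never seeing x or y."""
    return (share_of_x + share_of_y) % PRIME


if __name__ == "__main__":
    x, y = 1234, 5678
    x_a, x_b = share(x)  # server A receives x_a, server B receives x_b
    y_a, y_b = share(y)

    # Each server computes only on its own shares.
    sum_a = add_locally(x_a, y_a)
    sum_b = add_locally(x_b, y_b)

    print(reconstruct(x_a, x_b))      # 1234
    print(reconstruct(sum_a, sum_b))  # 6912, i.e. x + y
```

    This design keeps the per-operation cost to a few integer additions. That is the contrast the paragraph draws with heavier cryptographic approaches.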



    MIT, like many other colleges and universities, is seeing a huge uptick in students taking intro computer science coursework. I’ll spare you the course numbering system unique to MIT (but, seriously, it’s superior to other systems once learned) and simply say that demand has doubled for the most intro-y of intro classes. It being MIT and all, the original intro class was too challenging, and maybe not practical enough, for many students. So about a decade ago they added an intro-before-intro class to prepare students for it. That class later split in two. Now those classes are burgeoning.



    In our Tweet of the Week (below), Joi Ito, Director of MIT’s Media Lab, wonders whether academic articles are losing their unchallenged primacy because blogs can be decent. In a follow-up to The Whole Tale workshop, a group of attendees released a paper on arXiv that further advances the argument that academic articles are losing their primacy because they aren’t executable. Articles fail to include “the whole tale” of the research process, so the authors present Whole Tale, a research environment designed to capture processes, not just papers.


    Loyola University of Maryland launched an undergrad data science major. Its courses are taught across 15 different departments, signaling a curriculum that may have been assembled to meet market demand for data science without taking the time to structure and order courses in the most pedagogically beneficial way. These kinds of radically interdisciplinary majors are often difficult for advisors to navigate at first, let alone students. I don’t have any direct experience with Loyola of Maryland, but I have seen data science potpourri before.



    Carnegie Mellon University has certainly earned its esteem as a top engineering and computer science university. No question about that. They have announced that they will be the first to offer an undergraduate degree in artificial intelligence. Forgive me for being the grouchy old hype police, but I feel like their computer science degree would have been a splendid preparation for a career in data science. But they are the first to offer an undergrad AI degree!



    Canadian researchers Zachary Spicer and Nicole Goodman are worried about losing their top students to US companies. (No word on whether they are worried about losing academic talent to US universities.) Canada does see 24 to 66 percent of its STEM students move to the US. Why? “Higher pay, firm reputation, and the perceived greater variety in terms of scope of work.” At the University of Waterloo, there is a “Cali or bust” peer pressure. (January high temperatures in Waterloo average 26 degrees F.) California *is* very nice.



    Have you ever wondered what university diversity officers actually do? Well, The Economist has a doozy of a piece suggesting a latent explanation for the proliferation of these roles:

      1) administrators above the new diversity officers want new administrators below them because a new administrator is easier to control than a new faculty member
      2) in this moment, there is some regulatory pressure to hire diversity bureaucrats and a lot of space to voluntarily expand on those regulations. This suggests actual progress on diversity initiatives may be flaccid at best.



    Linköping University in Sweden is opening a four-petaflop ClusterVision supercomputer named Tetralith. I rather fancy the moniker.



    A new Bond University study found that, because of poor research design in trials of health apps, there is little evidence that they work to combat conditions like diabetes, obesity, and mental illness, and essentially no credible evidence that they can do anything else. Common problems: small sample sizes, and testing the app versus care rather than testing the treatment app versus a placebo app. According to two research clinicians, Jessica Lipschitz and John Torous, the lack of a quality digital placebo leads to overstatements about the therapeutic value of apps like Headspace.


    NC State Named First University-Based IBM Q Hub in North America

    North Carolina State University, NC State News



    NC State will be the first university in North America to establish an IBM Q Hub as part of the global IBM Q Network, a collaboration between tech powerhouse IBM and top Fortune 500 companies, national research labs and leading universities to advance quantum computing.

    The network provides early access to IBM’s quantum computing systems, with the goal of exploring practical applications important to business and science.

    Starting this fall, NC State will have access to IBM Q commercial quantum computing devices, including the most advanced and scalable universal systems available. The current 20-qubit IBM Q system will be followed by a 50-qubit prototype in the next generation.


    University of Arizona to be part of ambitious $1.45 billion U.S. health study

    tucson.com, Arizona Daily Star, Stephanie Innes



    “What we’re doing with the All of Us Research Program is intersecting with other fundamental changes in medicine and research to empower Americans to live healthier lives,” National Institutes of Health director Dr. Francis S. Collins said in a prepared statement.

    UA officials say the UA Health Sciences/Banner effort will focus on the inclusion of Hispanic/Latino and American Indian/Alaska Native participants and communities to gain greater insight into disease prevention and treatment.

    “Keep in mind, many groups of people are underrepresented in biomedical research data and therefore, things we discover don’t necessarily work for them,” Theodorou said.


    DNA sequencing to become part of Geisinger’s routine clinical care

    Geisinger Health System



    Mammograms, colonoscopies and cholesterol checks are just a few of the routine screenings saving lives by detecting cancers and heart disease early.

    Geisinger patients will soon add DNA sequencing to that list.

    Geisinger President and CEO David T. Feinberg, M.D., MBA, announced Sunday that the Pennsylvania-based health system is expanding its successful genomics program beyond the realm of research and into everyday, preventive care.


    Empowering Rural Women and Girls Through Data

    United Nations Foundation, Eleanor Carey



    Rural women and girls – and what is needed to ensure their progress – are the focus of this year’s UN Commission on the Status of Women (CSW). One area that must get attention is data. To improve the lives of women and girls in rural areas, we need to improve how we collect and produce data.

    The UN Secretary-General’s report for CSW recognizes an important reciprocal relationship between data systems and development outcomes – a connection that has been underlined repeatedly since the Sustainable Development Goals called for an accompanying data revolution.

    We know from the data we do have that rural women constitute one of the groups left furthest behind, lagging behind both men and urban women in nearly every gender equality and development indicator.

    But we also know that we do not have nearly enough data about the nuances of rural women’s lives to inform innovative policies and programs to achieve better outcomes.


    Carnegie Mellon Launches Undergraduate Degree in Artificial Intelligence

    Carnegie Mellon School of Computer Science



    Carnegie Mellon University’s School of Computer Science will offer a new undergraduate degree in artificial intelligence beginning this fall, providing students with in-depth knowledge of how to transform large amounts of data into actionable decisions.

    SCS has created the new AI degree, the first offered by a U.S. university, in response to extraordinary technical breakthroughs in AI and the growing demand by students and employers for training that prepares people for careers in AI.

    “Specialists in artificial intelligence have never been more important, in shorter supply or in greater demand by employers,” said Andrew Moore, dean of the School of Computer Science. “Carnegie Mellon has an unmatched depth of expertise in AI, making us uniquely qualified to address this need for graduates who understand how the power of AI can be leveraged to help people.”


    What Astronomers Are Learning From Gaia’s New Milky Way Map

    Quanta Magazine, Abstractions blog, Natalie Wolchover



    A roundup of some of the most important discoveries gleaned so far from the Gaia space observatory’s new map of the galaxy.


    Any task that can be solved by getting an ML-powered bot to talk to a human on the phone should probably be turned into a website, and probably was, a decade ago.

    Twitter, Benedict Evans



    Why are you asking your phone to call a restaurant and not using opentable?


    XNOR.ai raises $12M and unveils self-service platform to help developers bring AI to devices

    GeekWire, Alan Boyle



    “AI for Everyone, Everywhere” may sound like a science-fiction slogan, but it’s actually the name given to software from XNOR.ai that’s already making devices smarter in the real world.

    The self-service software development platform is a new product for the Seattle startup, which is also announcing a $12 million Series A funding round led by Madrona Venture Group.

    XNOR CEO Ali Farhadi says the new investment will help his company, which was spun out from the Allen Institute for Artificial Intelligence last year, develop a high-end version of the “AI for Everyone, Everywhere” platform for enterprise-level applications.

     
    Deadlines



    Speed Conference / Cornell Tech

    New York, NY, September 28-29. “Chaired by James Grimmelmann and Helen Nissenbaum, this 1.5-day conference will explore the challenges of algorithmic governance in an accelerated age. Participants will be drawn from information science; law; computer science and electrical engineering; communications; economics and finance; philosophy; science and technology studies; and other disciplines. They will discuss current, forthcoming, and potential changes in algorithmic speed in their respective domains and techniques for avoiding, detecting, mitigating, and responding to speed-abetted algorithmic mistakes and abuses.” Deadline for abstract submissions is July 9.
     
    Moore-Sloan Data Science Environment News



    Working side by side: PhD Program alum Ariel Rokem is a data scientist for academia

    University of California-Berkeley, Berkeley Neuroscience



    Ariel Rokem is a Data Scientist at the University of Washington eScience Institute, where he collaborates with researchers from diverse fields to develop and maintain software for the analysis of large data sets. Rokem describes his path from experimental scientist to data scientist as a gradual shift, but there were signs of his interest in computational analysis early in his research career. While studying to get his BS in Biology and Psychology from the Hebrew University of Jerusalem, Rokem wanted to work with Idan Segev, a computational neuroscientist. Segev turned him down, saying he hadn’t taken enough math. Rokem set out to remedy that and took extra classes in math and programming.

    Putting this fresh knowledge to use, Rokem spent a year in the lab of Andreas Herz at Humboldt-Universität zu Berlin, engaged in electrophysiological and computational studies of the grasshopper auditory system. He then returned to Hebrew University and completed a master’s degree in Cognitive Psychology, studying the differences in hearing and memory between blind and sighted people in Merav Ahissar’s lab.

    Rokem joined the Berkeley Neuroscience PhD Program in 2005 and became Michael Silver’s first graduate student.

     
    Tools & Resources



    It’s time to realize Apache Kafka’s full potential, spanning past and present

    Pyrostore, Michael Drogalis



    Kafka users enjoy a broad sweet spot, one that can naturally grow in the context of use cases and in concert with the organization that runs it. You can simply run it as a message bus. Or, it can drive reactive microservices. In its most sophisticated form, Kafka promises to be the central nervous system (CNS) for a business, turning the database inside out.

    We can go further. Kafka’s power inspires an ideal system, where all data is stored in Kafka – forever. In such a setting, topics spanning live and historical data can be seamlessly consumed (and queried) as a single source of truth, or replayed in one brush without gluing together auxiliary storage. With a CNS at the heart of event streams that keep our data indefinitely, where else could Kafka take us?
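    The idea of a log that keeps every record and lets consumers read history and live data through one path can be sketched as a toy in-memory model. This is a hypothetical Python illustration, not Kafka's API. The class names RetainedTopic and Consumer are made up for this sketch, and real Kafka adds partitions, replication, durable storage, and configurable retention, none of which appear here. The point it shows is that a consumer tracks only an offset, so "replay" is just seeking back to an earlier offset.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RetainedTopic:
    """Toy append-only topic: records are never deleted; each gets an offset."""
    records: list[Any] = field(default_factory=list)

    def append(self, value: Any) -> int:
        """Append a record and return the offset it was stored at."""
        self.records.append(value)
        return len(self.records) - 1


class Consumer:
    """Reads a RetainedTopic from a remembered offset; seek() enables replay."""

    def __init__(self, topic: RetainedTopic) -> None:
        self.topic = topic
        self.position = 0  # offset of the next record to read

    def poll(self) -> list[Any]:
        """Return every record from the current position to the end of the log."""
        batch = self.topic.records[self.position:]
        self.position = len(self.topic.records)
        return batch

    def seek(self, offset: int) -> None:
        """Move to any offset; historical and live records share one log."""
        self.position = offset


if __name__ == "__main__":
    topic = RetainedTopic()
    for event in ["signup", "order", "refund"]:
        topic.append(event)

    consumer = Consumer(topic)
    print(consumer.poll())  # ['signup', 'order', 'refund'] (historical)

    topic.append("order")
    print(consumer.poll())  # ['order'] (only the new, live record)

    consumer.seek(0)        # replay the full history in one pass
    print(consumer.poll())  # ['signup', 'order', 'refund', 'order']
```

    Because nothing is ever deleted, no auxiliary store is needed to rebuild state. A new consumer starting at offset 0 sees the same sequence as one that has been following along live.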


    Announcing Rust 1.26

    The Rust Programming Language Blog, The Rust Core Team



    The past few releases have had a steady stream of relatively minor additions. We’ve been working on a lot of stuff, however, and it’s all starting to land in stable. 1.26 is possibly the most feature-packed release since Rust 1.0.


    To gain an edge, embrace a ‘multi-cloud’ strategy, university IT leaders urge

    Edscoop, Emily Tate



    At the Internet2 Global Summit, attendees outlined the strengths and weaknesses of AWS, GCP and Azure — and imagined using all three vendors at once.


    What is data visualization for?

    Stamen, Eric Rodenbeck



    It’s past time for those of us who do this work to realize that telling stories with data is not a sub-genre of computer science, or of anything else for that matter. Dataviz is dataviz. It’s not a sub-genre of anything, any more than movies are a sub-genre of live theater, or photography is a sub-genre of painting. Computer science has about as much to say about the medium of data visualization as photographic paper engineering has to say about Instagram. Computer science can be the technical underpinnings of dataviz, sometimes. Dataviz is the medium.

    Dataviz! Data visualization! I don’t think it’s for anything! I don’t believe it’s meaningful to say that dataviz is for one thing, any more than it’s meaningful to say that architecture is for any one thing. Or that photography is for one thing, that it has a purpose that can be defined in a sentence or two. Or that movies are for one thing, one that you could win an argument about.


    Scientific debt

    David Robinson, Variance Explained blog



    In my new job as Chief Data Scientist at DataCamp, I’ve been thinking about the role of data science within a business, and discussing this with other professionals in the field. On a panel earlier this year, I realized that data scientists have a rough equivalent to the software engineering concept of technical debt: “scientific debt.”

    Scientific debt is when a team takes shortcuts in data analysis, experimental practices, and monitoring that could have long-term negative consequences.


    3 emerging Kubernetes trends

    The Enterprisers Project, Gordon Haff



    Kubernetes continues to gain steam in enterprises, and for good reason: It tames the complexity that arises as you begin to use containers at scale. It automates and orchestrates Linux container operations, eliminating many manual tasks involved in deploying and scaling containerized applications.

    Where is Kubernetes headed next? I got a good look at last week’s KubeCon conference in Copenhagen.

     
    Careers


    Internships and other temporary positions

    Data Clinic – Part-time Internship Fall 2018
    Two Sigma Data Clinic; New York, NY

    Full-time positions outside academia

    Data Scientist
    Two Sigma Data Clinic; New York, NY

    Full-time, non-tenured academic positions

    Metadata and Data Curation Specialist
    University of Victoria, Ocean Networks Canada; Victoria, BC, Canada
