Data Science newsletter – November 2, 2016

Newsletter features journalism, research papers, events, tools/software, and jobs for November 2, 2016

GROUP CURATION: N/A

 
 
Data Science News



Major Roadblocks on the Path to Machine Learning

The Next Platform, Thomas Dinsmore


from October 31, 2016

Even in analytically sophisticated organizations, machine learning often operates in “silos of expertise.” For example, the financial crimes unit in a bank may use advanced techniques to detect money laundering; the credit risk team uses completely different and incompatible tools to predict loan defaults and set risk-based pricing; while treasury uses still other tools to predict cash flow. Meanwhile, customer service and branch operations do not use machine learning at all because they lack the critical mass of specialists and software.


Big Data and Expected Returns

Bloomberg View, Matt Levine


from November 01, 2016

Here’s a keynote speech that Commissioner Kara Stein of the Securities and Exchange Commission gave to the “Big Data in Finance Conference” last week, titled “A Vision for Data at the SEC.” It’s about how the SEC does and should use data to catch crimes and prevent crashes and so forth. But it seems to me that Stein raises and then passes over the most interesting questions.


Cat and mouse with data: FinTech begets RegTech

The article that made me want to high-five my neighbor Jim (because he was the closest human to me) detailed how the SEC used blue sheet trading data to catch inside traders. Watch out, finance mice: the cat has sharpened its claws! Oddly, one of the inside traders was a plumber (who had been getting tips from a banker friend at Barclays). But boy, is that a triumph for the SEC’s model. Humans would never have thought to look at a plumber as a possible financial fraudster, right? Right. Go get ’em, RegTech robot.

Snappy, funny Bloomberg View finance columnist Matt Levine still feels that SEC Commissioner Kara Stein does not answer the most interesting questions about precisely what kind of information counts as “material,” the legal term for information a reasonable investor would consider important when deciding whether to trade. In a data-saturated environment, this could be anything from a weather report to a tip from your friend at Barclays that a big pharma merger is about to be announced.


Drowning in data or swimming in sensors, U.S. Army to adopt AI

Bloomberg Government, Cameron Leuthy


from October 20, 2016

The U.S. Army is coming to grips with a problem faced by the intelligence community: finding actionable insight in a rising tide of data. Artificial intelligence promises to help, though how and when AI will be factored into military requirements is an open question.

The number of sensors and communications modes now deployed is drowning soldiers in data and making it difficult to make decisions. Lieutenant General Robert Ashley Jr., the Army’s intelligence chief, wants to use AI to help soldiers process data and share it with partners, according to ExecutiveGov.


The Polls of the Future Are Reproducible and Open Source

Slate, Andrew Gelman


from November 01, 2016

For the past few election cycles, it’s become generally understood that single surveys can be noisy and that averaging across polls gives a better sense of trends in public opinion. But how do you do your averaging? Our current election forecasts and poll aggregators fall in one of two camps: either transparent and simple (for example, the moving averages of RealClearPolitics) or complicated with proprietary “special sauce” (for example, whatever Nate Silver is currently doing).

But the model I’m talking about here is the best of both worlds. It includes state and national poll data, models of correlations between states, a pre-election forecast, state-level and electoral college forecasts, and pretty graphics—but all in an open-source framework, so anyone can check any part of the data or model.
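To make the “transparent and simple” camp concrete, here is a minimal sketch of a poll average in Python. It assumes an unweighted mean of the k most recent polls, ordered by the last day each poll was in the field; the `Poll` type, the `simple_poll_average` function, the window size, and the poll numbers are all hypothetical, and this is not RealClearPolitics’ documented methodology.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poll:
    end_date: date      # last day the poll was in the field (hypothetical field)
    candidate_a: float  # share for candidate A, in percent
    candidate_b: float  # share for candidate B, in percent


def simple_poll_average(polls, k=5):
    """Unweighted mean of the k most recent polls, plus the A-minus-B margin.

    Hypothetical illustration only: real aggregators may weight by sample
    size, recency, or pollster, which this sketch deliberately ignores.
    """
    if not polls:
        raise ValueError("need at least one poll")
    if k < 1:
        raise ValueError("window size k must be at least 1")
    # Keep only the k polls with the latest end dates.
    recent = sorted(polls, key=lambda p: p.end_date, reverse=True)[:k]
    n = len(recent)
    avg_a = sum(p.candidate_a for p in recent) / n
    avg_b = sum(p.candidate_b for p in recent) / n
    return avg_a, avg_b, avg_a - avg_b


# Made-up numbers for illustration only, not real 2016 polls.
polls = [
    Poll(date(2016, 10, 20), 50.0, 40.0),  # oldest; falls outside a 3-poll window
    Poll(date(2016, 10, 25), 47.0, 42.0),
    Poll(date(2016, 10, 27), 44.0, 45.0),
    Poll(date(2016, 10, 29), 47.0, 42.0),
]
print(simple_poll_average(polls, k=3))  # prints (46.0, 43.0, 3.0)
```

The appeal of this kind of average is that anyone can recompute it by hand; the cost is that every poll counts equally and nothing links one state to another. The open-source model Gelman describes adds exactly those missing pieces (correlations between states and a pre-election forecast) while keeping the code and data inspectable.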


Roundup: University Data Science

  • New Research Center to Explore Ethics of Artificial Intelligence (November 01, The New York Times, John Markoff)
  • Center for Spatial Data Science opens at UChicago (October 31, University of Chicago, UChicago News)
  • NSF Announces New Program to Support Institutes for Theoretical Foundations of Data Science (Deadline: January 19, Computing Community Consortium, The CCC blog, Tracy Kimbrel)
  • UC Santa Cruz now offering graduate degrees in computational media (November 03, University of California-Santa Cruz, Newscenter, Tim Stephens)

    Center for Spatial Data Science opens at UChicago

    University of Chicago, UChicago News


    from October 31, 2016

    A new center at the University of Chicago will make connections in areas as diverse as health care, real estate and social networks by focusing on space.

    The Center for Spatial Data Science is concerned with geography, proximity and other location-based measures that hold great promise for unlocking discoveries from data across many fields. While methods based on time have long dominated statistical analysis, the opening of the center—and arrival of its director Luc Anselin, the Stein-Freiler Distinguished Service Professor in Sociology—will expand the use of spatial data science in research across campus.


    Beyond Silicon: Squeezing More Out of Chips

    The New York Times, John Markoff


    from October 30, 2016

    If silicon has its limits, ingenuity may not. Better algorithms and new kinds of hardware circuits could help scientists continue to make computers that can do more and at a lower cost.

    “It’s been a fun ride,” said Thomas M. Conte, an electrical engineer at the Georgia Institute of Technology. “Today you’re entering this patchwork world where you are going to find a better solution for a particular problem, and that’s how we’re going to advance in the future.”


    In defense of data sharing, but done in the right way

    STAT, Jeffrey M. Drazen and Isaac S. Kohane


    from October 31, 2016

    As electronic medical records (EMRs) began to enter the scene, clinicians were asked to document their care of patients in more detail than ever before. The goals for this effort were to improve communication among caregivers and to inform payment reimbursement. But, as with the asthma trial the authors discuss earlier in their piece, these data are having another life.

    Today, EMR data are being used to predict domestic abuse, characterize genetic risks for rheumatoid arthritis and non-responsiveness to antidepressants, as well as to identify distinct subgroups of children with autism. Unknown signals can often lurk in a mountain of electronic data.


    Genetic testing fumbles, revealing ‘dark side’ of precision medicine

    STAT, Sharon Begley


    from October 31, 2016

    Enthusiasm for precision medicine, from the White House down to everyday physicians, is at an all-time high. But serious problems with the databases used to interpret patients’ genetic profiles can lead to “inappropriate treatment” with “devastating consequences,” researchers at the Mayo Clinic warned on Monday.

    Their report describes the cases of some two dozen people who were told they had a potentially fatal illness and one who had a heart defibrillator surgically implanted but, it turns out, never needed it. The individuals were family members who underwent genetic testing after a young relative died of a heart syndrome. Test results indicated that they carried a mutation in a heart-related gene — and the database that the testing company used indicated it caused a serious disorder.

     
    Events



    Open Source Computational Acoustics and Filter Design



    Berkeley, CA Perrin Meyer (Director of Digital Research, Meyer Sound Laboratories Inc.), November 4, 1:10 p.m., 190 Doe Library

    Ann Arbor Deep Learning



    Ann Arbor, MI Saturday, 12 November 2016. Annual event to bring together deep learning enthusiasts, researchers and practitioners from a variety of backgrounds. 1670 Bob and Betty Beyster Building; 2260 Hayward St. [free]

    Kavli HUMAN Project San Diego Town Hall at SfN 2016



    San Diego, CA 6:30 – 8:30 pm Sunday 13 November 2016. Please join us for a town hall-style discussion about the Project at the Society for Neuroscience meeting. Marriott Marquis San Diego Marina (333 West Harbor Drive). [free]

    Digital-Life-Design salon with DLD Fellow Axel Kuhn



    Palo Alto, CA Lunch Monday 14 November 2016. Dive into the hottest Silicon Valley topics during our traditional DLDsalon. MacArthur Park Restaurant (27 University Avenue). [application required]

    GIS Day 2016 San Francisco



    San Francisco, CA Celebrate all things mappy with some of the region’s leading geo-luminaries. Afternoon and evening events on Wednesday, November 16. [free, registration required]

    Hockey Graphs and Vancouver Canucks Co-Host Vancouver Hockey Analytics Conference 2017



    Vancouver, BC, Canada Saturday, 11 March 2017. Registration opens soon and will be capped at 100. Rogers Arena.
     
    Deadlines



    Open Calls for Commentary

    Cambridge University Press. Commentary Proposals for “Building Machines That Learn and Think Like People” by Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. Deadline is Tuesday 22 November 2016.

    8th Conference on Complex Networks – Call for Papers

    Dubrovnik, Croatia March 21-24, 2017. Abstract/Paper submission deadline: Sunday, November 27.

    Request for Proposals: Graduate Student Workshop on Socio-Environmental Synthesis (Spring 2017) | SESYNC

    Workshop Application Deadline: January 13, 2017

    NSF Announces New Program to Support Institutes for Theoretical Foundations of Data Science

    NSF recently released a new program solicitation, Transdisciplinary Research in Principles of Data Science (TRIPODS), to support collaborative institutes bringing together the theoretical computer science, statistics, and mathematics communities.

    Phase I proposals may request up to $500,000 per year for 3 years. Letters of intent are required and are due January 19, 2017. Proposals are due March 15, 2017.


    MathSport International 2017 Conference

    Padua, Italy MathSport International 2017 is hosted by the University of Padua and organized by the Department of Mathematics from Monday 26th June to Wednesday 28th June 2017. Deadline for submissions is Monday, January 30, 2017.
     
    Tools & Resources



    How To Curate Content For Learning

    Anders Pink blog


    from November 01, 2016

    The pace of change these days means that L&D staff simply can’t create all the content that learners require. The good news is there is a huge amount of content readily available on the web that can be accessed and made available to support learners. Let’s look at what it takes to be a great content curator for learning, and some resources that can help you.


    5 lessons learned from building a data pipeline

    Computerworld, Andrew Montalenti


    from October 31, 2016

    Whether you decide to build or buy, you can benefit from the lessons we learned while building our own real-time pipeline. At least that way you’ll be armed with what you need to meet the challenge.

    Lesson #1: With analytics data, scale matters


    Open Visualization Research from VIS 2016

    Benjamin Bach


    from October 31, 2016

    This site contains links to interactive demos, websites, evaluation data, and other complementary material about research presented at the IEEE VIS conference, Oct 2016, in Baltimore. We hope this website helps you find inspiration by browsing all the amazing output the scientific visualization community is producing besides publishing scientific papers.


    Practical advice for analysis of large, complex data sets

    The Unofficial Google Data Science Blog, Patrick Riley


    from October 31, 2016

    For a number of years, I led the data science team for Google Search logs. We were often asked to make sense of confusing results, measure new phenomena from logged behavior, validate analyses done by others, and interpret metrics of user behavior. Some people seemed to be naturally good at doing this kind of high quality data analysis. These engineers and analysts were often described as “careful” and “methodical”. But what do those adjectives actually mean? What actions earn you these labels?

    To answer those questions, I put together a document shared Google-wide which I optimistically and simply titled “Good Data Analysis.” To my surprise, this document has been read more than anything else I’ve done at Google over the last eleven years. Even four years after the last major update, I find that there are multiple Googlers with the document open any time I check.

     
    Careers


    Internships and other temporary positions

    A+E Networks Digital Media Storytelling Fellowship



    A+E Networks; New York, NY

    Pre- and Post-Doctoral Fellowships



    Linda Hall Library; Kansas City, MO

    2017 Research Experiences for Undergraduates internship program



    Santa Fe Institute; Santa Fe, NM

    Full-time positions outside academia

    Research Assistant — United States Holocaust Memorial Museum



    United States Holocaust Memorial Museum; Washington, DC

    Data engineer and back-end developer



    Chimu Systems; Northbrook, IL
