Data Science newsletter – November 1, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for November 1, 2017


 
 
Data Science News



Report: 59% of employed data scientists learned skills on their own or via a MOOC

Tech Republic, Alison DeNisco Rayome



The majority of employed data scientists gained their skills through self-learning or a Massive Open Online Course (MOOC) rather than a traditional computer science degree, according to a survey from data scientist community Kaggle, which was acquired by Google Cloud earlier this year.

Some 32% of full-time data scientists started learning machine learning or data science through a MOOC, while 27% said that they began picking up the needed skills on their own, the 2017 State of Data Science & Machine Learning Survey report found. Some 30% got their start in data science at a university, according to the survey of more than 16,000 people in the field.

More than half of currently employed data scientists still use MOOCs for ongoing education and skillbuilding, the report found, demonstrating the potential of these courses for helping people gain real world skills.


Introducing Kaggle’s State of Data Science & Machine Learning Report, 2017 | No Free Hunch

Kaggle, No Free Hunch blog, Mark McDonald



Over 16,000 survey responses were submitted, with over 6 full months of aggregated time spent completing them (an average response time of more than 16 minutes). Today we’re publicly releasing:

  • This interactive report featuring a few initial insights from the survey. We put this together with the folks from the Polygraph. It includes interactive visualizations so you can easily cut the data to find out exactly what you want to know. The report is focused on a few key areas that are important to our team: who is working with data, what’s happening at the cutting edge of machine learning across industries, and how new data scientists can best break into the field. But, there’s a ton more to learn from the survey’s dataset. So we’re also releasing…
  • The code behind the interactive report so that data scientists can build off our analysis to discover further insights without starting from scratch.
  • The raw, anonymized dataset of survey responses, so that data scientists can dig into the data itself to create their own reports, analyses, and opinions about the state of data science and machine learning.
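
As a rough check on the figures quoted above, the aggregate completion time and the average response time can be related with simple arithmetic. The sketch below assumes exactly 16,000 responses, exactly 6 months of total time, and a 30.44-day average month; these are simplifying assumptions, since the post only says "over 16,000" and "over 6 full months".

```python
# Rough consistency check of the survey figures quoted above.
# Assumptions (not from the source): exactly 16,000 responses,
# exactly 6 months of total time, and a 30.44-day average month.
RESPONSES = 16_000
MONTHS = 6
DAYS_PER_MONTH = 30.44
MINUTES_PER_DAY = 24 * 60


def average_minutes_per_response(months, responses):
    """Spread the aggregate completion time evenly over all responses."""
    total_minutes = months * DAYS_PER_MONTH * MINUTES_PER_DAY
    return total_minutes / responses


avg = average_minutes_per_response(MONTHS, RESPONSES)
print(f"{avg:.1f} minutes per response")
```

Under these assumptions the average comes out a little above 16 minutes, which is consistent with the "more than 16 minutes" stated in the post.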

    Vector Institute doubles AI faculty with 10 new additions

    betakit, Douglas Soltys



    Toronto-based Vector Institute announced today that it has effectively doubled its faculty of AI researchers. The non-profit said that ten new machine learning, deep learning, and AI researchers have joined its faculty from both Canadian and international institutions. The total headcount, when including Chief Scientific Advisor Geoffrey Hinton and Research Director Richard Zemel, sits now at 20.

    “Increasingly, the world’s most promising researchers in deep learning and other AI subfields are looking at Canada as a hub with many opportunities to collaborate, advance research and develop applications,” said Hinton. “Vector is a pillar of the Canadian AI ecosystem and I’m very excited that the team is expanding with highly sought-after talent, some of whom I’ve had the pleasure of working with. This team will drive Vector’s excellence in research, education and industry collaboration.”


    Groundbreaking to be held for Fort Meade cyberspace facility

    Fifth Domain, The Associated Press



    Maryland Gov. Larry Hogan will be the keynote speaker at a groundbreaking ceremony for the 175th Cyberspace Operations Squadron Facility at Fort Meade.

    The new building will provide operational and command space for missions aligned to the state of Maryland, the National Security Agency and the United States Cyber Command. The 175th Wing Contracting Office is responsible for contract execution and administration.

    The squadron’s federal mission will be to defend and protect Air Force and U.S. critical infrastructure against cyberattacks.


    Empowering human-centric organizations using AI driven emotion analytics – The Influential Series Issue #4

    LinkedIn, Jeffrey Postles



    For this issue of The Influential Series my esteemed guests are Lana Novikova and Hollie Gordon from Heartbeat Ai. Lana is the founder & CEO and Hollie is the Director of Business Development & Social Impact. Heartbeat Ai was founded two years ago and is an award winning software-as-a-service that reveals human emotions so you can truly understand customers, shoppers, employees, patients, etc. Heartbeat Ai transforms text input from any source into ten primary and one hundred secondary emotion categories. These emotion categories offer revealing insight into the “deep why” behind behaviour and are accessible through a user-friendly dashboard.


    China’s AI Superstar Face++ Finishes Series C Funding Round With a Staggering US$460 Million

    Medium, Synced



    China’s computer vision company Megvii, a.k.a. Face++ (旷视科技), has closed its Series C funding round at a staggering US$460 million. The investment was led by China’s state-owned venture capital Guo Feng Fund, joined by Alibaba’s Ant Financial and the Foxconn Group. Other investors include Russia-China Investment Fund (RCIF), Sunshine Insurance Group, and SK Group. Tengda Capital is the exclusive financial advisor for the deal.

    Face++, established in 2011, received its A round financing from Lenovo’s Comet Labs. It’s been a rich year for Chinese AI startups — computer vision company YITU closed their C round investment at US$380 million this May, while SenseTime’s B round financing totaled US$410 million.


    Rousing Masses to Fight Cancer with Open Source Machine Learning

    Georgia Tech News Center



    Here’s an open invitation to steal. It goes out to cancer fighters and tempts them with a new program that predicts cancer drug effectiveness via machine learning and raw genetic data.

    The researchers who built the program at the Georgia Institute of Technology would like cancer fighters to take it for free, or even just swipe parts of their programming code, so they’ve made it open source. They hope to attract a crowd of researchers who will also share their own cancer and computer expertise and data to improve upon the program and save more lives together.

    The researchers’ invitation to take their code is also a gauntlet.

    They’re challenging others to come beat them at their own game and help hone a formidable software tool for the greater good.


    Hundreds of humpback whales are massing in a tiny spot of ocean. Here’s why

    Science, Elizabeth Pennisi



    Marine biologists Mduduzi Seakamela of the National Department of Environmental Affairs in Cape Town, South Africa, and Kenneth Findlay at the Cape Peninsula University of Technology, also in Cape Town, didn’t know all this for sure when they recruited graduate student David Cade to help them pin down what the whales were doing off their coastline. With the proper permits in hand, Cade, a Stanford University student who works at the Hopkins Marine Station in Pacific Grove, California, helped them outfit a dozen whales in the area with the research equivalent of a GoPro: a video camera that came with a motion detector and a depth finder. Together, these instruments revealed what the whales were doing underwater. What’s more, the ship where the researchers were working could monitor the density of prey in the water underneath it, so Cade could figure out what the whales were eating.


    A communal catalogue reveals Earth’s multiscale microbial diversity

    Nature, Earth Microbiome Project



    Our growing awareness of the microbial world’s importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth’s microbial diversity. [full text]


    [1710.10742] Implicit Causal Models for Genome-wide Association Studies

    arXiv, Statistics > Machine Learning; Dustin Tran, David M. Blei



    Progress in probabilistic generative models has accelerated, developing richer models with neural architectures, implicit densities, and with scalable algorithms for their Bayesian inference. However, there has been limited progress in models that capture causal relationships, for example, how individual genetic factors cause major human diseases. In this work, we focus on two challenges in particular: How do we build richer causal models, which can capture highly nonlinear relationships and interactions between multiple causes? How do we adjust for latent confounders, which are variables influencing both cause and effect and which prevent learning of causal relationships? To address these challenges, we synthesize ideas from causality and modern probabilistic modeling. For the first, we describe implicit causal models, a class of causal models that leverages neural architectures with an implicit density. For the second, we describe an implicit causal model that adjusts for confounders by sharing strength across examples. In experiments, we scale Bayesian inference on up to a billion genetic measurements. We achieve state of the art accuracy for identifying causal factors: we significantly outperform existing genetics methods by an absolute difference of 15-45.3%.


    ‘We can’t compete’: why universities are losing their best AI scientists

    The Guardian, Ian Sample



    “He was offered such a huge amount of money that he simply stopped everything and left,” said Maja Pantic, professor of affective and behavioural computing at Imperial. “It’s five times the salary I can offer. It’s unbelievable. We cannot compete.”

    It is not an isolated event. Across the country, talented computer scientists are being lured from academia by private sector offers that are hard to turn down. According to a Guardian survey of Britain’s top ranking research universities, tech firms are hiring AI experts at a prodigious rate, fuelling a brain drain that has already hit research and teaching. One university executive warned of a “missing generation” of academics who would normally teach students and be the creative force behind research projects.


    New Scripps ResearchKit app uses 23andMe data to predict heart disease risk | MobiHealthNews

    MobiHealthNews, Tom Sullivan



    The Scripps Research Institute on Wednesday announced a new smartphone app that can calculate a user’s risk of heart disease.

    Hospitals will want to keep an eye on the research study that Scripps conducts about the effectiveness of giving patients predictions concerning adverse health events and what impact that could have on influencing healthy behaviors and lifestyle changes.

    Scripps MyGeneRank enables users to connect to 23andMe’s API and, using that as well as survey and ResearchKit data, compile a 10-year risk score for a heart attack or other coronary events.


    The College Kids Doing What Twitter Won’t | WIRED

    WIRED, Backchannel, Lauren Smiley



    Bhat and Phadte, 20-year-old students who study computer science at UC Berkeley, decided to launch a data-driven counterattack, aiming to do what Twitter itself has not: publicly expose alleged bot accounts right there on the platform for the world to see. This week, the duo launched a Google Chrome browser extension that inserts a button onto every Twitter profile and tweet that reads, snappily, “Botcheck.me.” Click it, and you get a diagnosis of whether the account appears to be run by a person or by some sort of automation, based on the duo’s own machine learning model. Their model is targeted exclusively to hunt propaganda bots about US politics. (It would not be able to detect a bot that, say, tweeted out a cat picture every minute.)

    The duo joins a cadre of outside investigators who, in the absence of more public action from Twitter, are providing their own analyses of the bot epidemic.

     
    Events



    CyberCon 2017

    Fifth Domain



    Pentagon City, VA November 28, starting at 7:30 a.m., organized by Fifth Domain. “Defending in cyberspace takes a holistic approach, encompassing technology, policy and people. That’s why we’re bringing together military, intelligence community and homeland security leaders to discuss the latest in cyber defense and offense.” [free for government, $$$ for industry]


    The Big Picture: What’s At Stake in Trump’s America

    Public Books, NYU’s Institute for Public Knowledge



    New York, NY November 7, Tishman Auditorium at NYU School of Law. “Public Books and NYU’s Institute for Public Knowledge invite you to join us for an all-day symposium investigating the rise of Trump and America’s turn toward authoritarian rule.” [rsvp required]

     
    Deadlines



    Free Software Foundation “Social Benefit” Award Nominations

    “The Free Software Foundation is now accepting nominations for the 2017 ‘Project of Social Benefit Award,’ presented to the project or team responsible for applying free software, or the ideas of the free software movement, in a project that intentionally and significantly benefits society in other aspects of life.” The deadline for nominations is November 5.

    Do you work w/#Tensorflow? Take a few minutes to fill out this short survey & we’ll give you a free O’Reilly ebook

    From O’Reilly Strata: “Answer a few questions about Tensorflow—and, as our thank you gift, you’ll be able to select a free ebook from those pictured below.”

    NASA looking for private organizations to take over Spitzer mission – SpaceNews.com

    NASA is considering handing over operations of one of its flagship “Great Observatories” to a private organization that would also be responsible for funding the spacecraft.

    In a request for information (RFI) issued Oct. 12, NASA said it was looking for U.S.-based institutions interested in taking over operations of the Spitzer Space Telescope after NASA funding for the infrared observatory ends in 2019. Responses are due by Nov. 17.


    ICML 2018 Call for Papers

    Stockholm, Sweden ICML takes place July 10-15, 2018. ICML will adopt a single reviewing cycle, with a single paper deadline on February 9.
     
    Tools & Resources



    dask-ml 0.1 documentation

    GitHub – dask



    “Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn.”


    [1710.11573] Deep Learning as a Mixed Convex-Combinatorial Optimization Problem

    arXiv, Computer Science > Learning; Abram L. Friesen, Pedro Domingos



    As neural networks grow deeper and wider, learning networks with hard-threshold activations is becoming increasingly important, both for network quantization, which can drastically reduce time and energy requirements, and for creating large integrated systems of deep networks, which may have non-differentiable components and must avoid vanishing and exploding gradients for effective learning. However, since gradient descent is not applicable to hard-threshold functions, it is not clear how to learn them in a principled way. We address this problem by observing that setting targets for hard-threshold hidden units in order to minimize loss is a discrete optimization problem, and can be solved as such. The discrete optimization goal is to find a set of targets such that each unit, including the output, has a linearly separable problem to solve. Given these targets, the network decomposes into individual perceptrons, which can then be learned with standard convex approaches. Based on this, we develop a recursive mini-batch algorithm for learning deep hard-threshold networks that includes the popular but poorly justified straight-through estimator as a special case. Empirically, we show that our algorithm improves classification accuracy in a number of settings, including for AlexNet and ResNet-18 on ImageNet, when compared to the straight-through estimator.


    Researchers Unveil Tool to Debug ‘Black Box’ Deep Learning Algorithms | Columbia News

    Columbia University, Columbia News



    Researchers at Columbia and Lehigh universities have come up with a way to automatically error-check the thousands to millions of neurons in a deep learning neural network. Their tool, DeepXplore, feeds confusing, real-world inputs into the network to expose rare instances of flawed reasoning by clusters of neurons. Researchers present it on Oct. 29 at ACM’s Symposium on Operating Systems Principles in Shanghai.

    “You can think of our testing process as reverse engineering the learning process to understand its logic,” said co-developer Suman Jana, a computer scientist at Columbia Engineering and a member of the Data Science Institute. “This gives you some visibility into what the system is doing and where it’s going wrong.”


    Introducing Vega-Lite 2.0

    Medium, UW Interactive Data Lab



    “Vega-Lite enables concise descriptions of visualizations as a set of encodings that map data fields to the properties of graphical marks. Vega-Lite uses a portable JSON format that compiles to full specifications in the larger Vega language.”
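
To make the idea of "encodings that map data fields to the properties of graphical marks" concrete, here is a minimal sketch of a Vega-Lite bar chart specification, built as a Python dict and serialized to JSON. The field names `category` and `count`, the data values, and the helper `encoded_fields` are made up for illustration; the schema URL is assumed to be the one published for version 2.

```python
import json

# A minimal Vega-Lite bar chart specification, written as a Python dict.
# The data values and the field names "category" and "count" are
# hypothetical; the "$schema" URL is assumed to be the v2 schema location.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
    "data": {
        "values": [
            {"category": "A", "count": 28},
            {"category": "B", "count": 55},
            {"category": "C", "count": 43},
        ]
    },
    # The mark is the graphical primitive drawn for each data row.
    "mark": "bar",
    # Each encoding channel maps one data field to a visual property.
    "encoding": {
        "x": {"field": "category", "type": "ordinal"},
        "y": {"field": "count", "type": "quantitative"},
    },
}


def encoded_fields(spec):
    """Return a mapping of encoding channel -> data field (illustrative helper)."""
    return {channel: enc["field"] for channel, enc in spec["encoding"].items()}


# The JSON text is what a Vega-Lite renderer would consume.
print(json.dumps(spec, indent=2))
```

Because the specification is plain JSON, it can be generated from any language and handed to a Vega-Lite renderer unchanged.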


    Eager Execution: An imperative, define-by-run interface to TensorFlow

    Google Research Blog; Asim Shankar and Wolff Dobson



    “Eager execution is an imperative, define-by-run interface where operations are executed immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive.”
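
The contrast between define-by-run and a deferred, graph-based style can be sketched in plain Python. This is a toy illustration only, not TensorFlow code: the `Deferred` class and the `add` and `mul` helpers are hypothetical names invented here to show the difference in when computation happens.

```python
# Toy illustration only: none of these names come from TensorFlow.


class Deferred:
    """A node in a deferred (define-then-run) computation graph."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self):
        # Evaluate upstream nodes first, then apply this node's function.
        args = [i.run() if isinstance(i, Deferred) else i for i in self.inputs]
        return self.fn(*args)


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


# Deferred style: building the graph computes nothing yet.
graph = Deferred(mul, Deferred(add, 2, 3), 4)
deferred_result = graph.run()  # computation happens only here

# Eager (define-by-run) style: each call returns a concrete value at once,
# so intermediate results can be inspected with ordinary Python tools.
eager_sum = add(2, 3)
eager_result = mul(eager_sum, 4)

print(deferred_result, eager_result)
```

In the eager style, `eager_sum` is an ordinary value that can be printed or stepped through in a debugger, which is the kind of convenience the quoted description points to.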

     
    Careers


    Tenured and tenure track faculty positions

    Department of Brain & Cognitive Science



    Massachusetts Institute of Technology; Cambridge, MA

    Full-time positions outside academia

    Data Scientist (Social Network Analysis)



    Humanyze; Palo Alto, CA

    Postdocs

    Computational analysis of genomics, epigenomics, transcriptomics profile



    Weill Cornell Medicine, Caryl and Israel Englander Institute for Precision Medicine; New York, NY
