Data Science newsletter – April 5, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for April 5, 2019

GROUP CURATION: N/A


Data Science News



Improving intro CS by explicitly teaching programming skills

Medium, Bits and Behavior, Benjie Xie

You’re in your first day of your first computer science (CS1) course. The instructor wants to “dive right in.” They want you to open up your computer, go to the “IDE” you were supposed to download before class, and copy some code projected on the screen. You copy the code. They tell you “don’t try to understand [the code] too much.” That’s funny to you because you don’t understand any of the magic that just occurred! The code was written in English words but not in typical sentence structure. After you pressed a button to run the code, “Hello world!” appeared in a funny font. What did you just write, and why did you get “Hello world” while your neighbor got many lines of scary red text? How did the code you copied make “Hello world!” appear? And was that it, or was there something more? So much to infer, so little instruction on how to do so!

I intended for the above example to demonstrate that programming requires many skills which we don’t necessarily teach.


Using AI to Make Better AI

IEEE Spectrum, Mark Anderson

Since 2017, AI researchers have been using AI neural networks to help design better and faster AI neural networks. Applying AI in pursuit of better AI has, to date, been a largely academic pursuit—mainly because this approach requires tens of thousands of GPU hours. If that’s what it takes, it’s likely quicker and simpler to design real-world AI applications with the fallible guidance of educated guesswork.

Next month, however, a team of MIT researchers will be presenting a so-called “neural architecture search” algorithm that can speed up the AI-optimized AI design process by 240 times or more. That would put faster and more accurate AI within practical reach for a broad class of image recognition algorithms and other related applications.

“There are all kinds of tradeoffs between model size, inference latency, accuracy, and model capacity,” says Song Han, assistant professor of electrical engineering and computer science at MIT.


Are PhDs just cheap labour for universities?

The Guardian, Rachel Hall

PhD grads complain that there are too few academic jobs. Should universities support their postgrads into alternative careers?


New machine learning model describes dynamics of cell development

Helmholtz Zentrum München (German Research Center for Environmental Health)

From their birth through to their death, cells lead an eventful existence. Thanks to single-cell genomics, their destiny in large cell populations can now be analyzed. However, this method destroys the cell, which makes it difficult to draw conclusions about the dynamics of cell development. In order to address this problem, researchers at the Helmholtz Zentrum München and the University of Massachusetts use pseudodynamics, a mathematical model that estimates developmental processes from single-cell time series observations. Their report has been published in ‘Nature Biotechnology’.


Google cancels A.I. ethics panel after uproar

CNBC, Jordan Novet

  • Google identified the people who would serve on its Advanced Technology External Advisory Council last week.
  • The involvement of the president of the Heritage Foundation was called into question early on.


AI pioneer: ‘The dangers of abuse are very real’

Nature, News Q&A, Davide Castelvecchi

Nature sat down with [Yoshua] Bengio in London in January.

Do you see a lot of companies or states using AI irresponsibly?

There is a lot of this, and there could be a lot more, so we have to raise flags before bad things happen. A lot of what is most concerning is not happening in broad daylight. It’s happening in military labs, in security organizations, in private companies providing services to governments or the police.


Boston’s new centralized data platform is the city’s starting point for predictive analytics

StateScoop, Ryan Johnson

The lifeblood of a “smart city” is data: internal government data, mobility data, public safety data and anything else that can be measured and optimized on a city street. Boston’s new data warehouse platform, launched internally through the city’s analytics team this month, will set the city up for success as future emerging technologies are implemented, according to the city’s chief data officer, Stefanie Costa Leabo.

The new platform, Costa Leabo said, is a commitment to centralizing all of the city’s currently disparate data stores. Boston partnered with Civis Analytics to concentrate the government’s data in one location maintained by Costa Leabo’s team and housed in the Department of Innovation and Technology. It will also automate the city’s open data pipelines, pushing out more data to the general public more accurately and with less effort.


The Corporations Devouring American Colleges

The Huffington Post, Highline, Kevin Carey

The price of college is breaking America. At a moment when Hollywood celebrities and private equity titans have allegedly been spending hundreds of thousands in bribes to get their children into elite schools, it seems quaint to recall that higher learning is supposed to be an engine of social mobility. Today, the country’s best colleges are an overpriced gated community whose benefits accrue mostly to the wealthy. At 38 colleges, including Yale, Princeton, Brown and Penn, there are more students from the top 1 percent than the bottom 60 percent.

Tuition prices aren’t the only reason for this, but they’re a major one. Public university tuition has doubled in the last two decades and tripled in the last three. Prestige-hungry universities admit large numbers of students who can pay ever-increasing fees and only a relative handful of low-income students. The U.S. now has more student loan debt than credit card debt: upward of $1.5 trillion. Nearly 40 percent of borrowers who entered college in the 2003 academic year could default on their loans by 2023, the Brookings Institution predicts.


Elite U.S. school MIT cuts ties with Chinese tech firms Huawei, ZTE

Reuters

The Massachusetts Institute of Technology has severed ties with Huawei Technologies and ZTE Corp as U.S. authorities investigate the Chinese firms for alleged sanctions violations, the school said on Wednesday.


Elsevier’s Presence on Campuses Spans More Than Journals. That Has Some Scholars Worried.

The Chronicle of Higher Education, Lindsay Ellis

On a recent panel on challenges to the future of teaching and research, Colleen Lyon outlined what was, to her, a “dangerous” dynamic in the world of academic publishing.

Lyon, a librarian of scholarly communications at the University of Texas at Austin, listed scholarly-publishing tools that had been acquired by the journal publishing giant Elsevier. In 2013, the company bought Mendeley, a free reference manager. It acquired the Social Science Research Network, an e-library with more than 850,000 papers, in 2016. And it acquired the online tools Pure and Bepress (which visualize research) in 2012 and 2017, respectively.


Interview: The Bengio Brothers

Eye on AI

Eye on AI sat down with the two recently for their first interview together. They talked about their unconventional parents, their first computer (an Atari 800) and the divergent paths they have taken in the past few years, with Samy joining Google while Yoshua started his own company, Element AI, and remained committed to academic research by founding Mila, the Montreal Institute for Learning Algorithms. [video, 30:42]


Researchers developed algorithms that mimic the human brain (and the results don’t suck)

The Next Web, Tristan Greene

A pair of researchers recently developed a method for successfully conducting unsupervised machine learning that mimics how scientists believe certain aspects of the human brain work. These biologically feasible algorithms could provide an alternate path forward for the field of AI.

IBM researcher Dmitry Krotov and John J. Hopfield, inventor of the associative neural network, developed a set of algorithms that teach machines in the same loose, unfettered way humans learn. Their algorithms allow machines to learn in an unsupervised manner, without the labelled datasets that modern deep learning relies on.


Apple hires AI expert Ian Goodfellow from Google

CNBC, Jordan Novet

Ian Goodfellow joined Apple’s Special Projects Group as a director of machine learning last month.


Guest Post – Rob Johnson on Shifting Relationship Dynamics and Imbalances in an Open Access World

The Scholarly Kitchen, Nicola Poser

What kinds of constraints/opportunities do you think stakeholders (publishers in particular) will face in a gold open access (OA) world?

I think the biggest constraint and opportunity is that we aren’t going to be in a fully gold-OA world any time soon, but rather we’ll see an increasingly mixed economy in publishing. The rapid growth of preprints, early moves towards the syndication of content and the cancellation of big deals are all pointers to what may lie ahead, but at this point it’s hard to know how any of these will play out. We can say with some confidence that it will be a golder world in future, but the next few years are likely to be characterized more by fragmentation and increased complexity than a global flip to OA.

That said, there’s no doubt that gold OA is driving significant change in the scholarly publishing market. Some years ago, for example, the expectation was that the APC (article processing charge) model would result in authors becoming more price-sensitive when deciding where to publish. It was hoped that this would address the problem of ‘intermediation’ in the subscription market, where researchers act as both producers and consumers of research, but the actual purchase of content is undertaken by academic libraries.

     
Events



d3.southBay('Meetup')

Meetup, Bay Area d3 User Group

Sunnyvale, CA April 9, starting at 6 p.m., LinkedIn Building R (950 W Maude Ave). “Join us for a behind-the-scenes conversation with some of the authors from the Machine Learning Research Journal distill.pub (https://distill.pub/about/).” [RSVP required]

     
Deadlines



Tool Foundry

“Tool Foundry is a Luminary Labs initiative funded by grants from the Gordon and Betty Moore Foundation and Schmidt Futures to expand access to science.”

“The Tool Foundry accelerator is a four-month program designed to help a diverse cohort of teams iterate and scale accessible and powerful scientific discovery tools.” Deadline to apply is May 30.

     
Tools & Resources



[D] Amazon to release largest social conversation and knowledge dataset

reddit.com/r/MachineLearning

From the blog post:

Today I am happy to announce our intention to make available the Topical Chat dataset, a corpus of human-human social conversations collected from crowd workers that will be released publicly on September 17, 2019.

[26 comments]


Together with @Theteamatx and @GoogleAI, we have recently proposed the Randomized-to-Canonical Adaptation Network (RCAN)

Twitter, DeepMind

RCAN: a real2sim image translator trained with domain randomization. It achieves SoTA performance on robotic grasping with no real data.


Today we’re releasing a large-scale extendable dataset of mathematical questions, for training (and evaluating the abilities of) neural models that can reason algebraically.

Twitter, DeepMind

Paper: https://t.co/D8g477gcQ4
Code and data: https://t.co/QvR2WkK7j2… https://t.co/SZRSMvEeO3


AmpliGraph: Python library for Representation Learning on Knowledge Graphs

GitHub – Accenture

AmpliGraph is a suite of neural machine learning models for relational learning, a branch of machine learning that deals with supervised learning on knowledge graphs.
