Data Science newsletter – December 27, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for December 27, 2019

GROUP CURATION: N/A

 
 
Data Science News



Cornell University academics use machine learning to analyse Oktoberfest data

ZDNet, Aimee Chanthadavong



Academics from Cornell University have concluded that machine learning could potentially be used for applications such as automated checkout systems following an experiment using the technology to identify what food and drink Oktoberfest-goers preferred.

The published datasets [PDF] revealed that 1,100 images were analysed as part of the research. The images were captured over 11 days using a video camera that was installed above a checkout in a beer tent during the world’s largest beer festival in Munich, Germany.
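For readers curious about the shape of such a pipeline, here is a minimal sketch of running a pretrained object detector over checkout images and tallying what it sees. It is illustrative only: the model, directory name, and confidence threshold are assumptions, and the study trained its own detector on festival-specific classes.

    # Illustrative sketch only; not the authors' code. Assumes torchvision.
    from pathlib import Path

    import torch
    import torchvision
    from PIL import Image
    from torchvision.transforms.functional import to_tensor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    counts = {}
    for path in Path("checkout_frames").glob("*.jpg"):  # hypothetical directory
        image = to_tensor(Image.open(path).convert("RGB"))
        with torch.no_grad():
            detections = model([image])[0]  # dict with boxes, labels, scores
        for label, score in zip(detections["labels"], detections["scores"]):
            if score >= 0.8:  # keep only confident detections
                counts[int(label)] = counts.get(int(label), 0) + 1

    print(counts)  # per-class detection counts across all frames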


The Economic Puzzle of Transformative Technologies

Medium, MIT Initiative on the Digital Economy, Irving Wladawsky-Berger



“General purpose technologies (GPTs) are engines for growth…” wrote Erik Brynjolfsson, Daniel Rock, and Chad Syverson in The Productivity J-Curve, a working paper recently published by the National Bureau of Economic Research (NBER). “These are the defining technologies of their times and can radically change the economic environment. They have great potential from the outset, but realizing that potential requires larger intangible and often unmeasured investments and a fundamental rethinking of the organization of production itself.” [Read a summary of this research paper here.]

As we’ve learned over the past two centuries, there’s generally been a significant time lag between the broad acceptance of a major new transformative technology and its ensuing impact on companies, governments, and other institutions. Even after reaching a tipping point of market acceptance, it takes considerable time — often decades — for these new technologies and business models to be widely embraced across economies and societies — and for their full benefits to be realized.


Central Banking in Challenging Times

SSRN, BIS Working Paper, Claudio E. V. Borio



Since the Great Financial Crisis, central banks have been facing a triple challenge: economic, intellectual and institutional. The institutional challenge is that central bank independence – a valuable institution – has come in for greater criticism. This essay takes a historical perspective and draws parallels with the previous waxing and waning of central bank independence. It suggests that this institution is closely tied to globalisation, as both spring from the same fountainhead: an intellectual and political environment that supports an open system in which countries adhere to the same principles and governments remain at arm's length from the functioning of a market economy. This suggests that the fortunes of independence are also tied to those of globalisation. The essay then proceeds to explore ways that can help safeguard independence. A key one is to narrow the growing expectations gap between what central banks are expected to deliver and what they can actually deliver. In that context, it also considers and dismisses the usefulness of recently proposed schemes that involve controlled deficit monetisation.


Statistic of the decade: The massive deforestation of the Amazon

Rappler, Liberty Vittert



This year, I was on the judging panel for the Royal Statistical Society’s International Statistic of the Decade.

Much like Oxford English Dictionary’s “Word of the Year” competition, the international statistic is meant to capture the zeitgeist of this decade. The judging panel accepted nominations from the statistical community and the public at large for a statistic that shines a light on the decade’s most pressing issues.

On December 23, we announced the winner: the 8.4 million soccer fields of land deforested in the Amazon over the past decade. That’s 24,000 square miles, or about 10.3 million American football fields.
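The conversions are easy to sanity-check. A back-of-the-envelope calculation, assuming a standard 105 m × 68 m soccer pitch and a 360 ft × 160 ft American football field (the judges' exact field sizes are not given, so both are assumptions), lands close to the quoted figures:

    # Rough check of the statistic; field sizes are assumptions.
    SOCCER_PITCH_M2 = 105 * 68                    # 7,140 m^2 per pitch
    FOOTBALL_FIELD_M2 = 360 * 160 * 0.3048 ** 2   # ~5,351 m^2 incl. end zones
    M2_PER_SQ_MILE = 1609.344 ** 2

    area_m2 = 8.4e6 * SOCCER_PITCH_M2
    print(round(area_m2 / M2_PER_SQ_MILE))        # ~23,200, i.e. "about 24,000"
    print(round(area_m2 / FOOTBALL_FIELD_M2))     # ~11.2 million football fields

The square-mile figure comes out within a few percent of the quoted 24,000; the football-field count lands nearer 11 million than 10.3 million, so the judges presumably assumed a somewhat larger field.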


Without a National Artificial Intelligence Strategy, the United States Risks Missing Out on All the Technology’s Benefits—And Falling Behind Rivals Such as China

Foreign Policy, Martijn Rasser



In the coming years, artificial intelligence will dramatically affect every aspect of human life. AI—the technologies that simulate intelligent behavior in machines—will change how we process, understand, and analyze information; it will make some jobs obsolete, transform most others, and create whole new industries; it will change how we teach, grow our food, and treat our sick. The technology will also change how we wage war. For all of these reasons, leadership in AI, more than any other emerging technology, will confer economic, political, and military strength in this century—and that is why it is essential for the United States to get it right.

That begins with creating a national strategy for AI—a whole-of-society effort that can create the opportunities, shape the outcome, and prepare for the inevitable challenges for U.S. society that this new technological era will bring. The United States has taken important steps in this direction. In February, the White House launched the American AI Initiative, which articulates a comprehensive vision for American leadership in AI. Last month, the Congressionally-mandated National Security Commission on Artificial Intelligence (NSCAI) released its interim report, which outlines five lines of effort to help ensure U.S. technological leadership.

What is still missing, however, is a true framework for a national AI strategy. The United States needs to think big and take bold action to harness the technology’s potential and address the challenges. On its current trajectory, the United States is poised to fall behind its competitors. China, in particular, is spending growing sums on AI research and development and is outpacing the United States in the deployment of AI-based systems. China is also luring more expats back home to join the AI ecosystem. There are three key areas where the United States should act to reverse this dynamic and lay the foundation for long-term leadership in AI.


How the Internet Spans the Globe

Communications of the ACM, Logan Kugler



Tech giants like Google (which owns YouTube) and Facebook “have surpassed Internet backbone providers—the traditional telecom carriers—as the largest users of international capacity,” according to TeleGeography. The firm estimates the amount of international capacity deployed by tech companies grew eight times from 2014 to 2018.

These undersea cables connect almost the entire globe. Google is a part or sole owner of 15 different undersea cables, with end points terminating in places that include Chile and France. Facebook is a major capacity buyer or part owner of 10 cables with end points in Singapore, China, and the U.S. (among other countries). Amazon is part owner or a major capacity buyer of five cables, with end points in countries that include Malaysia. Microsoft is part owner or a major capacity buyer of four different cables, with end points in countries that include Spain.

One of the fastest undersea cables in operation today is the Marea cable, partially owned by Microsoft, Amazon, and Facebook. It transmits data at 160 terabits per second, which is “16 million times faster than the average home Internet connection,” says Microsoft. The Marea cable has enough capacity to “stream 71 million high-definition videos simultaneously,” according to the company.
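Those two comparisons also pin down what Microsoft assumed, and the implied bitrates are easy to back out (the derived numbers below are ours, not Microsoft's):

    # Back out the bitrates implied by Microsoft's two Marea comparisons.
    MAREA_BPS = 160e12                  # 160 terabits per second

    home_bps = MAREA_BPS / 16e6         # "16 million times faster than home"
    print(home_bps / 1e6)               # 10.0 -> implies a 10 Mbps home line

    stream_bps = MAREA_BPS / 71e6       # "71 million HD videos at once"
    print(stream_bps / 1e6)             # ~2.25 -> implies ~2.25 Mbps per stream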


What’s unique about CS education compared to other DBERs?

Mark Guzdial, Computing Education Research Blog



I was recently asked by an NSF program officer to answer the questions, “What makes CS education different than other discipline-based education research (DBER, like math ed, physics ed, or engineering ed)? What research questions might we ask (that we might not currently be asking) to improve practice in CS education?” If I’m going to write a long-form email reply, I might as well make it a blog post. I’m using the specific term, computer science education, not my preferred and more general computing education because the question was framed specifically about CS programs, not necessarily about information technology, information systems, cybersecurity, or other computing programs.


Big data, small state: R.I. database offers the potential for efficient, improved health care

Brown University, News from Brown



From the machine learning that allows Alexa to understand users who ask “What’s the weather today?” to the systems that predict the paths and impacts of hurricanes, big data is being used to accomplish remarkable tasks across a wide range of sectors.

The field of health care is no exception — physicians are using artificial intelligence tools to better identify breast cancer, scholars are analyzing patient data to identify inefficiencies within the health care system, and the possibilities for using data to solve problems in health care are nearly boundless.

Through a new partnership between the Rhode Island Department of Health and Brown-based, federally funded Advance Clinical and Translational Research (Advance-CTR), five health care researchers affiliated with Brown University, Lifespan and the University of Rhode Island now have access to the state’s All-Payer Claims Database (APCD). APCDs are mandated, state-run archives that contain information from medical and pharmacy bills for the state’s residents, whether they’re insured by Medicaid, Medicare or private insurance companies.


5G arrived in Chicago in 2019. What will happen with the fifth generation of wireless in 2020?

Chicago Tribune, Ally Marotti



Unlike consumers, businesses will likely start using 5G in 2020, said Joseph Doering, who leads Accenture’s communications, media and telecom practice in the Midwest.

In an Accenture survey of 100 Chicago business executives, 46% of respondents said they believed 5G will have a significant effect on their business in one to three years.

“You have every major business out there looking at how that (5G) is going to be impacting their business, not just in 2020, but beyond,” Doering said. “It’s very strategic and competitive … how do they get there and get here first.”


Peer Review of Research Data Submissions to ScholarsArchive@OSU: How can we improve the curation of research datasets to enhance reusability?

Journal of eScience Librarianship, Clara Llebot and Steven Van Tuyl



Objective: Best practices such as the FAIR Principles (Findability, Accessibility, Interoperability, Reusability) were developed to ensure that published datasets are reusable. While we employ best practices in the curation of datasets, we want to learn how domain experts view the reusability of datasets in our institutional repository, ScholarsArchive@OSU. Curation workflows are designed by data curators based on their own recommendations, but research data is extremely specialized, and such workflows are rarely evaluated by researchers. In this project we used peer review by domain experts to evaluate the reusability of the datasets in our institutional repository, with the goal of informing our curation methods and ensuring that the limited resources of our library are maximizing the reusability of research data.

Methods: We asked all researchers who have datasets submitted in Oregon State University’s repository to refer us to domain experts who could review the reusability of their data sets. Two data curators who are non-experts also reviewed the same datasets. We gave both groups review guidelines based on the guidelines of several journals. Eleven domain experts and two data curators reviewed eight datasets. The review included the quality of the repository record, the quality of the documentation, and the quality of the data. We then compared the comments given by the two groups.

Results: Domain experts and non-expert data curators largely converged on similar scores for reviewed datasets, but the focus of critique by domain experts was somewhat divergent. A few broad issues common across reviews were: insufficient documentation, the use of links to journal articles in the place of documentation, and concerns about duplication of effort in creating documentation and metadata. Reviews also reflected the background and skills of the reviewer. Domain experts expressed a lack of expertise in data curation practices and data curators expressed their lack of expertise in the research domain.

Conclusions: The results of this investigation could help guide future research data curation activities and align domain expert and data curator expectations for reusability of datasets. We recommend further exploration of these common issues and additional domain-expert peer-review projects to further refine and align expectations for research data reusability.


State funding approved for cyber research nodes, including one run by VCU

Richmond Times-Dispatch, Michael Martz



Virginia has fulfilled its initial budget promise for a statewide cybersecurity research initiative by approving $10 million in state funding for four regional research operations, including one led by Virginia Commonwealth University. The participating institutions already have raised money to more than match the state funds, especially in research.

The Virginia Research Investment Committee approved $2.5 million for each of the four “nodes” created under the Commonwealth Cyber Initiative last year, as well as an additional $500,000 each in capital funding for equipment to carry out collaborative research that will help educate students to fill jobs in the fast-growing field.

The state already has provided $10 million to Virginia Tech to establish and run the hub operation for the initiative, which involves 39 colleges and universities, as well as 65 private partners, at the four regional nodes in central, southwestern, eastern and Northern Virginia.


NYU Center for Data Science: What is intelligence?

Kyunghyun Cho



A few weeks ago there was an open house at the NYU Center for Data Science intended for NYU faculty members. As one of the early members of the Center (I know! Already!) I was given an opportunity to share with the audience why I joined the Center and what my experience there has been so far. Although I'm much more familiar with giving a research talk using a set of slides, I decided to try something new and give a talk without any slides. Of course, this was totally new to me, and I couldn't help but prepare a script in advance. I didn't really stick to the script during my talk, but thought it's not a bad idea to share it with a broader community beyond NYU.


Artificial Intelligence Is Rushing Into Patient Care – And Could Raise Risks

Scientific American, Kaiser Health News, Liz Szabo



AI systems are not as rigorously tested as other medical devices, and have already made serious mistakes


Mike Bloomberg uses Hawkfish, tech firm he founded, for White House bid

CNBC, Brian Schwartz and Tucker Higgins



  • A technology company called Hawkfish, with former executives from Facebook and Foursquare, is helping Bloomberg’s 2020 presidential bid.
  • Soon after Hawkfish incorporated in the spring, it started working on races in Virginia and Kentucky, where Democrats scored key victories in November.

Colleges are turning students’ phones into surveillance machines, tracking the locations of hundreds of thousands

Santa Rosa Press Democrat, The Washington Post, Drew Harwell

When Syracuse University freshmen walk into professor Jeff Rubin’s Introduction to Information Technologies class, seven small Bluetooth beacons hidden around the Grant Auditorium lecture hall connect with an app on their smartphones and boost their “attendance points.”

And when they skip class? The SpotterEDU app sees that, too, logging their absence into a campus database that tracks them over time and can sink their grade. It also alerts Rubin, who later contacts students to ask where they’ve been. His 340-person lecture has never been so full.

“They want those points,” he said. “They know I’m watching and acting on it. So, behaviorally, they change.”
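SpotterEDU's internals are not public, but the mechanics the story describes (beacon sightings mapped to a classroom, points for presence, absences flagged for follow-up) are simple to model. The sketch below is entirely hypothetical; every name and rule in it is invented for illustration:

    # Hypothetical model of beacon-based attendance; SpotterEDU's actual
    # data model and scoring rules are not public.
    from dataclasses import dataclass, field

    ROOM_BEACONS = {"grant_auditorium": {"b1", "b2", "b3", "b4", "b5", "b6", "b7"}}

    @dataclass
    class AttendanceRecord:
        points: int = 0
        absences: list = field(default_factory=list)

    def log_session(record, session_date, sighted, room="grant_auditorium"):
        """Credit a point if the phone saw any room beacon; else flag an absence."""
        if sighted & ROOM_BEACONS[room]:
            record.points += 1
        else:
            record.absences.append(session_date)  # basis for instructor follow-up

    record = AttendanceRecord()
    log_session(record, "2019-12-02", {"b3", "b7"})  # phone heard two beacons
    log_session(record, "2019-12-04", set())         # skipped class
    print(record.points, record.absences)            # 1 ['2019-12-04']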

     
Events



MIT Deep Learning: Lectures and Tutorials for Introduction to and Overview of the Field

Lex Fridman

Cambridge, MA. Starting January 6, 2020. “This page is a collection of MIT courses and lectures on deep learning, deep reinforcement learning, autonomous vehicles, and artificial intelligence organized by Lex Fridman.” Speakers include: Andrew Trask (OpenMined and University of Oxford), Vladimir Vapnik (Columbia University), Francois Chollet (Google), Vivienne Sze (MIT).

     
Deadlines



Take the Data Science Survey from the University of Pisa

“A research team from the University of Pisa seeks your participation in their Data Science Survey. The survey is mainly addressed at those who work every day in the field of data science, either on their own or for a company.”

Tools & Resources



JSON on the command line with jq

George Ornbo

A series of how-to examples on using jq, a command-line JSON processor.
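As a taste of the one-liners the post covers, the snippet below pipes a small JSON document through two basic jq filters, field access and array iteration, driving the jq binary from Python (it assumes jq is installed and on the PATH):

    # Two basic jq filters run over a small JSON document.
    import json
    import subprocess

    doc = json.dumps({"users": [{"name": "ada", "id": 1}, {"name": "grace", "id": 2}]})

    def jq(filter_expr, data):
        """Run a jq filter over a JSON string and return its raw text output."""
        result = subprocess.run(
            ["jq", "-r", filter_expr],
            input=data, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    print(jq(".users[0].name", doc))  # field access    -> ada
    print(jq(".users[] | .id", doc))  # array iteration -> 1 and 2, one per line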


workflowr 1.6.0 is now on CRAN

Twitter, John Blischak

New function wflow_run() to run code directly in the R console … New vignette on options for including large data files.


Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art

arXiv, Computer Science > Computer Vision and Pattern Recognition; Joel Janai, Fatma Güney, Aseem Behl, Andreas Geiger

Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This book attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. We also discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.


NeurIPS 2019 Notes [pdf] (david-abel.github.io)

Hacker News

Link to David Abel’s comprehensive notes (PDF) from NeurIPS 2019, along with Y Combinator Hacker News comments on this year’s conference.

     
Careers


Full-time positions outside academia

Data Scientist

Altice; Long Island City, NY

Product Manager

Crossref; Lynnfield, MA


Internships and other temporary positions

Intern with rOpenSci’s Community Manager

rOpenSci; Remote
