Data Science newsletter – December 28, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for December 28, 2018


Data Science News

Artificial Intelligence Creates Realistic Photos of People, None of Whom Actually Exist

Open Culture, Petapixel, Colin Marshall


These neural networks have, it seems, competed their way to generating images of fabricated human faces that genuine humans have trouble distinguishing from images of the real deal. Their architecture, described in a paper by the Nvidia researchers who developed it, “leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis.” In other words, what they’ve come up with has not only made it easier than ever to create fake faces but also made those faces more customizable than ever.
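The Nvidia paper quoted here describes the StyleGAN generator, which applies a "style" at every resolution through adaptive instance normalization (AdaIN). Each channel of a feature map is normalized to zero mean and unit variance and then rescaled and shifted by a per-channel style scale and bias. Because every layer gets its own scale and bias, coarse layers (pose, face shape) and fine layers (colour, texture) can be steered separately, which is roughly what "scale-specific control" refers to. The stochastic details such as freckles come from separate per-layer noise inputs, not from AdaIN. Below is a minimal pure-Python sketch of the AdaIN step only. It assumes a feature map given as a list of flattened channels, with the style scale and bias passed in directly. In the real network the styles come from a learned affine transform of an intermediate latent vector, and everything runs on tensors. The function name `adain` and the toy values are illustrative.

```python
import math
from typing import List

Channel = List[float]  # one flattened H x W channel of a feature map


def adain(feature_map: List[Channel],
          style_scale: List[float],
          style_bias: List[float],
          eps: float = 1e-8) -> List[Channel]:
    """Adaptive instance normalization: per-channel normalize, then apply the style."""
    if not (len(feature_map) == len(style_scale) == len(style_bias)):
        raise ValueError("need exactly one style scale and bias per channel")
    styled = []
    for channel, scale, bias in zip(feature_map, style_scale, style_bias):
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)  # eps guards against constant channels
        # The normalized channel has mean 0 and std ~1; the style then sets both.
        styled.append([scale * (v - mean) / std + bias for v in channel])
    return styled


if __name__ == "__main__":
    # Toy 2-channel feature map (values made up for illustration).
    features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]]
    out = adain(features, style_scale=[2.0, 0.5], style_bias=[1.0, -3.0])
    print(out[1])  # constant channel collapses to its bias: [-3.0, -3.0, -3.0, -3.0]
```

After AdaIN, each channel's mean equals its style bias and its standard deviation (approximately) equals its style scale. Whatever statistics the previous layer produced are overwritten, which is why a style only influences the layer it is applied to.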

How Much of the Internet Is Fake?

New York Magazine, Intelligencer, Max Read


How much of the internet is fake? Studies generally suggest that, year after year, less than 60 percent of web traffic is human; some years, according to some researchers, a healthy majority of it is bot. For a period of time in 2013, the Times reported this year, a full half of YouTube traffic was “bots masquerading as people,” a portion so high that employees feared an inflection point after which YouTube’s systems for detecting fraudulent traffic would begin to regard bot traffic as real and human traffic as fake. They called this hypothetical event “the Inversion.”

Mapping City Trees With Artificial Intelligence

CityLab, Linda Poon


How many trees are in your city?

It might seem like a straightforward question, but finding the answer can be a monumental task. New York City’s 2015-2016 tree census, for example, took nearly two years (12,000 hours total) and more than 2,200 volunteers. Seattle’s tree inventory won’t be complete until at least 2024. Such efforts aren’t done in vain; in the short term, they allow cities to better maintain their urban trees. And over the long run, they lay the foundation for various initiatives that address everything from climate change to public health.

Aiera Bags $3.5M for A.I. to Help Wall Street Analysts Pick Stocks

Xconomy, Jeff Engel


The latest move comes from Aiera (pronounced like “era”), a New York- and Boston-based company that announced today it took in $3.5 million in seed funding. Flybridge Capital Partners led the investment and was joined by Macquarie Group, Innovation Endeavors, Fin Venture Capital, and WGI Group.

Aiera, which stands for “artificially intelligent equity research analyst,” wants to use A.I. technologies to help active investment managers make decisions, in part by forecasting stock performance and recommending whether to buy, hold, or sell particular stocks.

One Giant Step for a Chess-Playing Machine

The New York Times, Science, Essay, Steven Strogatz


The stunning success of AlphaZero, a deep-learning algorithm, heralds a new age of insight — one that, for humans, may not last long.

Fortnite: The hot, new social network

Axios; Mike Allen, Sara Fischer, Felix Salmon


For tween and teen young men (and even college students, young celebrities and pro athletes), “Fortnite: Battle Royale” in 2018 became more than a live-streaming game: it’s now so consuming that it has taken on the hallmarks of a social network. Think of it as the new Snapchat.

Expanding an innovation ecosystem: Princeton leads the way in 2018

Princeton University, Office of Communications


As President Christopher L. Eisgruber noted in July, standing alongside New Jersey Gov. Phil Murphy at Princeton Innovation Center BioLabs, “We believe that these cross-sector partnerships will foster a vibrant innovation ecosystem with the potential to generate transformative ideas, benefit the regional economy and attract top talent — outcomes that will make central New Jersey an increasingly attractive place for all who live, learn and work here.”

The year kicked off with the opening of the BioLabs facility in Plainsboro at the Princeton Forrestal Center, and closed with Google and the University announcing that a new artificial intelligence laboratory would open in the town of Princeton in January. In between were other developments, such as an agreement between the Princeton Catalysis Initiative (PCI) and Celgene Corp., and a Princeton-Microsoft agreement to research biofilms.

Xavier Amatriain’s answer to What were the most significant machine learning/AI advances in 2018? – Quora

Quora, Xavier Amatriain


If I had to summarize the main highlights of machine learning advances in 2018 in a few headlines, these are the ones that I would probably come up with:

  • AI hype and fearmongering cool down
  • More focus on concrete issues like fairness, interpretability, or causality
  • Deep learning is here to stay and is useful in practice for more than image classification (particularly for NLP)
  • The battle on the AI frameworks front is heating up, and if you want to be someone, you had better publish a few frameworks of your own

Congress votes to make open government data the default in the United States

E Pluribus Unum blog, Alex Howard


On December 21, 2018, the United States House of Representatives voted to enact H.R. 4174, the Foundations for Evidence-Based Policymaking Act of 2017, in a historic win for open government in the United States of America.

The Open, Public, Electronic, and Necessary Government Data Act (AKA the OPEN Government Data Act) is about to become law as a result. This codifies two canonical principles for democracy in the 21st century:

  • public information should be open by default to the public in a machine-readable format, where such publication doesn’t harm privacy or security
  • federal agencies should use evidence when they make public policy


Tools & Resources

How to Moderate a Panel Discussion

Harvard Business Review, Dorie Clark


First, it’s important to prepare your panelists in advance for what to expect. At one recent conference where I was a panelist, my moderator didn’t contact me until the morning of our session. “Unfortunately I couldn’t find your email address in my mailbox,” he wrote me, “and I couldn’t obtain it from the [conference organizers]. They’ve been a bit overwhelmed I guess these last few days. But [fellow panelist] gave it to me this morning and so here is the outline. Let me know if it works and see you later today!”

OrbitDB: Peer-to-Peer Databases for the Decentralized Web

GitHub – orbitdb


OrbitDB is a serverless, distributed, peer-to-peer database. OrbitDB uses IPFS as its data storage and IPFS Pubsub to automatically sync databases with peers. It’s an eventually consistent database that uses CRDTs for conflict-free database merges, making OrbitDB an excellent choice for decentralized apps (dApps), blockchain applications and offline-first web applications.
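The key phrase in the OrbitDB blurb is "conflict-free merges". OrbitDB's own merge logic lives in its operation log and is not reproduced here. Instead, the sketch below illustrates the general CRDT idea with a grow-only counter (G-Counter), one of the simplest state-based CRDTs. Each peer increments only its own slot, and merging takes the slot-wise maximum. As a result, peers that update independently while offline still converge to the same value once they sync, regardless of order. The `GCounter` class and the peer IDs are hypothetical and are not part of OrbitDB's API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: a simple state-based CRDT (not OrbitDB's API)."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Each replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Slot-wise max is commutative, associative and idempotent, so
        # replicas converge whatever order (and however often) they sync in.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


if __name__ == "__main__":
    a, b = GCounter("peer-a"), GCounter("peer-b")  # hypothetical peer IDs
    a.increment(3)  # concurrent updates made while the peers are apart
    b.increment(2)
    b.increment()
    a.merge(b)
    b.merge(a)
    a.merge(b)  # repeating a merge changes nothing
    print(a.value(), b.value())  # 6 6
```

A counter that also needs to decrement is usually built from two G-Counters, one for increments and one for decrements (a PN-Counter).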