Data Science newsletter – June 17, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for June 17, 2019

GROUP CURATION: N/A

 
 
Data Science News



Announcing the 2019 Bloomberg Data Science Research Grant Winners

Tech at Bloomberg blog



  • Marti A. Hearst (University of California, Berkeley)
    Unsupervised Abstractive News Summarization
  • Maria-Florina Balcan (Carnegie Mellon University)
    Data-Driven Transfer Clustering
  • Stefano Ermon (Stanford University)
    Differentiable ranking losses
  • Walter Lasecki and Jonathan Kummerfeld (University of Michigan)
    An Adaptive Crowdsourcing System for Real-Time Domain Adaptation
  • Eduardo Blanco (University of North Texas)
    Extracting Spatial Timelines from Text
  • Jeff Dalton (University of Glasgow)
    A Multi-task Model for Information Extraction and Entity-Centric Ranking tasks

    Authenticity Under Fire – Researchers are calling into question authenticity as a scientifically viable concept.

    Scientific American, Scott Barry Kaufman



    Authenticity is one of the most valued characteristics in our society. As children we are taught to just “be ourselves”, and as adults we can choose from a large number of self-help books that will tell us how important it is to get in touch with our “real self”. It’s taken as a given by everyone that authenticity is a real thing and that it is worth cultivating.

    Even the science of authenticity has surged in recent years, with hundreds of journal articles, conferences, and workshops. However, the more that researchers have put authenticity under the microscope, the more muddied the waters of authenticity have become. Many common ideas about authenticity are being overturned. Turns out, authenticity is a real mess.


    The Hacker Learns to Trust

    Medium, Connor Leahy



    A few days ago, I trained a 1.5bn parameter language model, with the goal of replicating GPT2 (though subsequent analysis showed that although my model is identical in most technical details, its performance characteristics are significantly worse) and wrote an extended essay on why I wanted to release it to the public. (I find it amusing that some have called it a “manifesto”) Since then, I have received a huge amount of thoughtful, relevant arguments and information, and a decent amount of much less helpful kinds of feedback as well, that have made me rethink my arguments and views very carefully. Instead of replying to all the points and comments as they arose I wanted to take my time and formulate my thoughts right. Not communicating clearly enough or leaving too much up for interpretation can be as harmful as not communicating at all.


    Socialist critics of surveillance capitalism: corporations shouldn’t own and control your data

    Twitter, Aral Balkan and Peter Wang



    … But it’s OK if governments own and control your data if we’re the ones in government…


    Chairwoman Johnson and Rep. González-Colón Introduce the Vera Rubin Survey Telescope Designation Act

    U.S. Congress, House Committee on Science, Space and Technology



    Chairwoman Eddie Bernice Johnson (D-TX) and Congresswoman Jenniffer González-Colón (R-PR) introduced H.R. 3196, the “Vera Rubin Survey Telescope Designation Act.” H.R. 3196 would designate the Large Synoptic Survey Telescope (LSST), a National Science Foundation and Department of Energy funded telescope that is currently under construction on a mountaintop site in northern Chile, as the “Vera Rubin Survey Telescope.”


    A new business in small satellites orbiting the Earth

    The Economist



    IN MAY 1999 a group of researchers from the Technical University of Berlin launched an unusual satellite. At a time when most of the machinery in orbit weighed thousands of kilograms, TUBSAT was a petite 45kg. A box that measured 32cm on each side, it carried three video cameras, the idea being to test whether such a titchy spacecraft could capture useful imagery of Earth. The researchers cited low mass, and the resultant low costs, as the benefits of such comparatively tiny satellites. They promised to open up “new market areas” for Earth observation.

    It took around 15 years for them to be proved right. A few such “smallsats”, sometimes called nanosats or CubeSats, were launched every year in the decade up until 2014, when numbers spiked. Planet Labs, a Californian company founded by ex-Nasa engineers, launched 33 smallsats that year, each weighing just a few kilos. Planet’s satellites are spiritual successors of TUBSAT, designed to gather imagery of the Earth’s surface. The firm sells its customers images from around 150 active satellites it has in orbit.


    New commission to explore Alabama artificial intelligence industry

    Birmingham Business Journal (AL), Tyler Patchen



    A new commission in Alabama will explore the growth of artificial intelligence in the state and look at ways to grow the emerging technology in Alabama.

    A bill sponsored by Alabama Sen. Jabo Waggoner was passed by Alabama lawmakers during the recently completed session.

    The commission will advise state leaders on the growth of the AI industry and make recommendations on policy to advance AI technology efforts.


    AI created to translate babies’ cries

    BBC Science Focus Magazine, James Lloyd



    Babies can cry because they are feeling ill or in pain, but they will also often let out a whimper if they are feeling hungry or sleepy. This makes it incredibly difficult for parents, especially first-time parents, to know exactly why their little ones are snivelling. Now, a group of researchers based at Northern Illinois University in the States has created a method for distinguishing between normal cry signals and abnormal ones, such as those resulting from an underlying illness, using artificial intelligence.

    The method could be useful both for parents at home and for doctors who need to discern cries among sick children, they say.


    MIT Open Documentary Lab publishes first major analysis of collective media creation, including AI, launches Co-Creation Studio

    MIT Media Lab, MIT Open Documentary Lab



    The Co-Creation Studio at MIT Open Documentary Lab is publishing Collective Wisdom: Co-Creating Media within Communities, across Disciplines and with Algorithms. This first-of-its-kind field study of the media industry highlights trends, opportunities, and challenges to help advance the understanding and recognition of co-created works and practices—efforts that function outside the limits of singular authorship.


    Is the Beach Safe to Swim? A Faster Water Test

    CityLab, Leslie Nemo



    This method required new expertise and equipment. At the time, “there probably weren’t any [commercial] labs that could do seven-day-a-week testing and turn results around,” said environmental scientist Sam Dorevitch, who had already been studying this kind of beach testing and leads one such well-suited lab at the University of Illinois at Chicago. If the Park District, which had contracted out the old-school water assessment method to a nearby lab, wanted to adopt the protocol, it needed to hire a new facility. And what do you know—Dorevitch’s lab was there. “I suppose that’s sort of happy luck on our part,” acknowledged Breitenbach.

    After trials in 2015 and 2016 to see how the same-day technique compared to the traditional method, the district went all in. The department pays Dorevitch’s lab about $300,000 a year to sample, analyze, and report results from water collected at 20 different lakefront locations every day from Memorial Day weekend through Labor Day.


    Trustees sign off on Purdue budget, eighth year of frozen tuition

    Purdue University News



    Purdue trustees on Friday (June 14) approved the tuition and fee schedule for the 2020 and 2021 fiscal years and the university’s operating budget for the 2020 fiscal year.

    Tuition will remain frozen on Purdue’s main campus for the seventh and eighth consecutive years — through the 2020-21 academic year — keeping base undergraduate tuition and fees for Indiana residents under $10,000 per year. With the exception of flight program fees, which will increase by 3% in fiscal year 2020 and 5.6% in fiscal year 2021, all general fees also will be held steady.


    Seven new degrees approved for seven IU campuses

    Indiana University, News at IU



    A Bachelor of Science in data science at IU Bloomington has also been approved by the trustees. The 120-credit program will be jointly offered by the School of Informatics, Computing and Engineering in collaboration with the Departments of Statistics, Mathematics and Linguistics in the College of Arts and Sciences. Students will choose from five specialization tracks (foundational data science, data systems, network and applied data science, data sciences design, and biological and health data science) and will complete a two-semester capstone project within that specialization area in their fourth year.


    Breaking down the walls of scientific secrecy

    CBC News, Second Opinion (Canada), Kelly Crowe



    Sharing science in real time is the radical concept behind the Open Lab Notebooks initiative by the Structural Genomics Consortium (SGC), a Canadian research group at the University of Toronto with a team that also does work out of Oxford University in the U.K. The SGC is trying to break down the walls of scientific secrecy by encouraging its scientists to post their research notes every few weeks.

    The goal is to speed up science and develop low-cost drugs faster in a patent-free system, where research discoveries are immediately available for anyone in the world to use.

    But opening their research notes to the public is daunting for many scientists in biomedical research.


    UF has plans for $2.2B in projects in next 10 years

    Gainesville Sun, Daniel Smithson



    New academic buildings, some of which have already received state funding, will also take up a large chunk of the $1.5 billion. UF plans to spend $495 million on academic projects, $390 million on non-academic projects, $378 million on UF Health projects, and $323 million on utilities, landscape and transportation during the next five years.

    UF wants to spend $183 million on a new UF College of Dentistry and $125 million on its new Data Science and Information Technology academic building.


    University of Washington computer science leader Hank Levy reflects on 13 years of massive change

    GeekWire, Todd Bishop



    During his tenure, the program positioned itself as one of the top 5 computer science programs in the country, after MIT, Stanford, Berkeley, and Carnegie Mellon in the minds of many in the industry. It grew its faculty by 30 positions, or 70 percent; doubled its space with the addition of the Bill & Melinda Gates Center for Computer Science & Engineering; tripled its undergraduate enrollment and doubled its graduate enrollment; and developed strengths in areas such as robotics, data science, security, sensors and machine learning. [audio, 38:10]

     
    Deadlines



    BOOK PRIZE CONTEST 1 OF 4: #PETSWINPRIZES

    “Simply put, what I’m [Andy Kirk] looking for are submissions of photographs of animals alongside the first edition of my book. The animals don’t need to actually be your pets, they can be a local rat, a random cow, a nearby earthworm or a passing fly. The best compositions, however, will be those where I’m getting a clear sense that the animal is engaging with the book’s contents: I want to believe the animal is benefiting from the learning, that it now feels equipped to take on a visualisation challenge. Equally, if the animal appears entirely indifferent, that is ok, I won’t judge it or hold a grudge. How you convey whatever type of animal-book engagement is entirely up to you!” Deadline for submissions is June 24.
     
    Tools & Resources



    A Survey of Reinforcement Learning Informed by Natural Language

    arXiv, Computer Science > Machine Learning; Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel



    “To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular.”


    voila – From Jupyter notebooks to standalone applications

    GitHub – QuantStack



    “The following two examples show how a standalone Jupyter notebook can be turned into a separate app, from the command-line integration.”


    Fans Are Better Than Tech at Organizing Information Online

    WIRED, Ideas, Gretchen McCulloch



    The Archive of Our Own has none of these problems. It uses a third tagging system, one that blends the best elements of both styles.

    On AO3, users can put in whatever tags they want. (Autocomplete is there to help, but they don’t have to use it.) Then behind the scenes, human volunteers look up any new tags that no one else has used before and match them with any applicable existing tags, a process known as tag wrangling. Wrangling means that you don’t need to know whether the most popular tag for your new fanfic featuring Sherlock Holmes and John Watson is Johnlock or Sherwatson or John/Sherlock or Sherlock/John or Holmes/Watson or anything else. And you definitely don’t need to tag your fic with all of them just in case. Instead, you pick whichever one you like, the tag wranglers do their work behind the scenes, and readers looking for any of these synonyms will still be able to find you.

    AO3’s trick is that it involves humans by design—around 350 volunteer tag wranglers in 2019, up from 160 people in 2012—who each spend a few hours a week deciding whether new tags should be treated as synonyms or subsets of existing tags, or simply left alone.
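
    To make the wrangling process described above concrete, here is a toy Python model. It is a minimal sketch under stated assumptions, not AO3's actual implementation: the class and method names (TagWrangler, wrangle_synonym, wrangle_subtag), the example tags "BBC Sherlock" and "Sherlock (TV)", and the work ids are all hypothetical, and the tag hierarchy is assumed to have no cycles. Free-form tags are stored exactly as typed; volunteers then map a tag to a canonical synonym, nest a canonical tag under a broader one, or leave it alone, and searches resolve through those links.

```python
from collections import defaultdict


class TagWrangler:
    """Toy model of AO3-style tag wrangling (hypothetical, not AO3's code).

    Users attach free-form tags to works. Volunteers later either mark a tag as
    a synonym of a canonical tag, make a canonical tag a subtag of a broader one,
    or leave it alone. Searches resolve through those links.
    """

    def __init__(self) -> None:
        self.synonym_of: dict[str, str] = {}  # free-form tag -> canonical tag
        self.parent_of: dict[str, str] = {}   # canonical subtag -> broader canonical tag
        self.works: dict[str, set[str]] = defaultdict(set)  # tag as typed -> work ids

    def tag_work(self, work_id: str, tag: str) -> None:
        # Authors type whatever they like; nothing is rejected or rewritten.
        self.works[tag].add(work_id)

    def canonical(self, tag: str) -> str:
        # Unwrangled tags ("simply left alone") just stand for themselves.
        return self.synonym_of.get(tag, tag)

    def wrangle_synonym(self, tag: str, canonical_tag: str) -> None:
        self.synonym_of[tag] = canonical_tag

    def wrangle_subtag(self, subtag: str, parent: str) -> None:
        self.parent_of[subtag] = parent

    def search(self, query: str) -> set[str]:
        target = self.canonical(query)
        results: set[str] = set()
        for tag, ids in self.works.items():
            current = self.canonical(tag)
            # Climb the subtag chain so works under narrower tags also match.
            # Assumes the hierarchy has no cycles.
            while current is not None:
                if current == target:
                    results |= ids
                    break
                current = self.parent_of.get(current)
        return results


# Hypothetical example data.
w = TagWrangler()
for work_id, tag in [("fic1", "Johnlock"), ("fic2", "Sherlock/John"),
                     ("fic3", "Holmes/Watson"), ("fic4", "BBC Sherlock")]:
    w.tag_work(work_id, tag)

pairing = "Sherlock Holmes/John Watson"
for variant in ["Johnlock", "Sherlock/John", "Holmes/Watson"]:
    w.wrangle_synonym(variant, pairing)
w.wrangle_synonym("BBC Sherlock", "Sherlock (TV)")
w.wrangle_subtag("Sherlock (TV)", "Sherlock Holmes & Related Fandoms")

print(sorted(w.search("Sherlock/John")))  # ['fic1', 'fic2', 'fic3']
print(sorted(w.search("Sherlock Holmes & Related Fandoms")))  # ['fic4']
```

    Because each author's tag is kept verbatim and only the mapping tables change, wranglers can add or correct synonyms later without editing any work, which is the property the excerpt credits for readers finding fics under any spelling of a pairing.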


    Here’s How to Forge Effective AI Regulation

    Medium, MIT Initiative on the Digital Economy, Paula Klein and Gillian Hadfield



    “What’s needed is a dual approach to addressing these challenges: First, build AI systems that can interact with human norms, rules, and law. Then, build ‘a novel regulatory structure – third-party regulatory markets – to spur the development and deployment of innovative regulatory technologies that can keep up with the speed and complexity of advances in AI.’”

     
    Careers


    Full-time, non-tenured academic positions

    Operations Director, Leeds Institute for Data Analytics



    University of Leeds; Leeds, England

    Deputy Director



    University of California-Santa Barbara, National Center for Ecological Analysis and Synthesis (NCEAS); Santa Barbara, CA
    Full-time positions outside academia

    Products Insights Manager, Personalization



    Spotify; New York, NY
    Internships and other temporary positions

    Innovator in Residence Program



    U.S. Library of Congress, Washington, DC
