Data Science newsletter – September 10, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for September 10, 2019

Data Science News



Northwestern Mutual Data Science Institute names directors from Marquette, UWM

Milwaukee Biz Times


A Marquette University geneticist and University of Wisconsin-Milwaukee marketing professor have been named directors of the new Northwestern Mutual Data Science Institute.

As directors, Edward Blumenthal of Marquette and Purush Papatla of UWM will provide strategic direction for the data science institute, a partnership of the Milwaukee-based life insurance company and two universities.


Information gerrymandering in social networks skews collective decision-making

Nature, News and Views, Carl T. Bergstrom & Joseph B. Bak-Coleman


An analysis shows that information flow between individuals in a social network can be ‘gerrymandered’ to skew perceptions of how others in the community will vote — which can alter the outcomes of elections.


China strengthens its campaign against scientific misconduct

Chemical & Engineering News, Hepeng Jia


The State Administration of Press and Publication, the agency in charge of China’s publishing sector, released and adopted in July the Academic Publishing Specification—Definition of Academic Misconduct for Journals. Other standards developed by the agency cover citation and translation practices and the use of ancient Chinese.


A Farewell

Lucidworks, Grant Ingersoll


A little over 12 years ago to the day, I took the single greatest leap of faith in my working life: I quit my comfortable job at Syracuse University doing search and NLP research engineering to join up with Marc, Yonik, and Erik to begin the journey that is Lucidworks. Today that journey is nearing an end as I’m announcing my departure from Lucidworks as a full-time employee and beginning a new journey as shareholder and cheerleader.

Back in 2007, the four of us founders set out to build a business around the large and vibrant Apache Solr community using an open-source 1.0 business model. Over the years, through countless ups and downs, that vision has evolved and expanded in ways we never imagined. I’m incredibly proud of what we’ve built at Lucidworks and am also excited for where the team is headed, even if it is without me in a day-to-day role.


Etsy’s free shipping push asks sellers to compete with Amazon

Vox, Kaitlyn Tiffany


What choice does Etsy have but to try to thrive in the new normal? It’s advertising on television now, gearing up for a marketing push around free shipping in mid-September, and implying to sellers that all of their financial concerns will vanish once Etsy is a true household name.

“Why would you expect free shipping from the nice lady in Iowa who hand-knits afghans? Why would you think that she would be able to do that for you?” Sandberg asks, hypothetically, of the customers Etsy says it has polled in robust surveys over the past few years. “If I were sending something to her, it would cost me $12. I understand that. I’m a grown-up.”

The lady hand-making afghans in Iowa has enough working against her, you know? Plus, as Sandberg explains, channeling the sentiment of Etsy’s 2 million sellers: “Etsy was supposed to be different.”


Under pressure, US universities start the year seeking to curb foreign influence fears

South China Morning Post, Simone McCarthy


  • Colleges respond to concern from lawmakers and federal agencies with measures including updated protocols for foreign visitors on campuses
  • They are becoming ‘much more cautious’ about contractual relationships with ‘foreign entities’

    Google says it received document request from DOJ

    TheHill, Harper Neidig


    “On August 30, 2019, Alphabet received a civil investigative demand from the DOJ requesting information and documents relating to our prior antitrust investigations in the United States and elsewhere,” the company said in a regulatory filing submitted on Friday. “We expect to receive in the future similar investigative demands from state attorneys general. We continue to cooperate with the DOJ, federal and state regulators in the United States, and other regulators around the world.”

    The Justice Department announced earlier this summer that it was launching an antitrust review of major technology companies, after months of political pressure from Silicon Valley’s critics on Capitol Hill.


    Influencers are fighting for attention as Instagram tests removing likes from its platform: ‘There’s no audience applause at the end of a performance’

    Business Insider, Paige Leskin


  • Instagram has been testing in some countries what its platform would look like without likes appearing on posts.
  • The removal of likes is designed to improve the lives of consumers, but influencers are starting to feel the impact of the change on their accounts and their brands.
  • But brands value “authentic” influencers who are their “true selves” more than engagement metrics such as likes, influencer marketers tell Business Insider.

    Scraping A Public Website Doesn’t Violate the CFAA, Ninth Circuit (Mostly) Holds

    Reason.com, The Volokh Conspiracy blog, Orin S. Kerr


    The Ninth Circuit Court of Appeals has handed down a groundbreaking decision today on the federal computer hacking law, the Computer Fraud and Abuse Act (CFAA). In HiQ Labs v. LinkedIn, the court held that scraping a public website is likely not a CFAA violation.

    Under the new decision, violating the CFAA requires “circumvent[ing] a computer’s generally applicable rules regarding access permissions, such as username and password requirements,” that thus “demarcate[]” the information “as private using such an authorization system.” If the data is available to the general public, the court says, it’s not an unauthorized access to view it—even when the computer owner has sent a cease-and-desist letter to the visitor telling them not to visit the website.

    This is a major case that will be of interest to a lot of people and a lot of companies. But it’s also pretty complicated and easy to misunderstand. This post will go through it carefully, trying to explain what it says and what it doesn’t say.


    DOE announces funding for Argonne projects on better materials and chemistry through data science

    Argonne National Laboratory


    The U.S. Department of Energy (DOE) Office of Science (Basic Energy Sciences) has announced that its Argonne National Laboratory will be receiving funding for two new projects that will use data science to accelerate discovery in chemistry and material sciences.

    Argonne is one of five national laboratories and fourteen universities awarded three-year grants under a DOE Funding Opportunity titled “Data Science for Discovery in Chemical and Materials Sciences.” Argonne was awarded funding for two research projects. Total funding will be nearly $4.75 million over three years.


    YouTube vowed to halt comments on videos with young kids. It hasn’t

    CNET, Joan E. Solsman


    An outcry about pedophilia made YouTube promise it would shut off comments on videos with kids age 13 and younger. Six months later, they’re still easy to find.


    YouTube Videos Are a Gold Mine for Health Researchers

    The Atlantic, Sidney Fussell


    Earlier this summer, a team at England’s Keele University published a behavioral study on children with autism. But it didn’t do it by interviewing subjects, or administering questionnaires. Instead, it used YouTube videos. Bappaditya Mandal and his colleagues trained an artificial intelligence to study the body movements of children with autism, using it to classify their behaviors as either typical or atypical. The researchers’ goal, Mandal told me, is to use computers to more quickly evaluate edge cases that might normally require lab equipment or invasive tactile sensors.

    Mandal’s research builds on algorithms that track the appearance of tremors or seizures in children with epilepsy. Epilepsy is slightly more common in people with autism, and vice versa. Video analysis can help scientists and families establish a narrative—when behaviors appear, what triggers them, and which parts of the body are most affected—“all the things the doctors need to know in order to do a good diagnosis,” Mandal explained.


    List: Honest Academic Job Postings [satire]

    McSweeney’s Internet Tendency, Ryan Weber


    English department seeks a tenure-track assistant professor specializing in Shakespeare, Romanticism, Victorianism, modernism, post-modernism, post-colonialism, southern literature, Appalachian literature, African American literature, Caribbean literature, Irish literature before 1200, Croatian literature after 1853, Joyce, Chaucer, Hemingway, Morrison, Milton’s lesser works, those damn Ayn Rand novels our male sophomores want to read, the non-sexy D.H. Lawrence books, and Soviet-era science fiction after Khrushchev. Candidates are expected to teach a 4/4 load of freshman composition.


    The Shocking Paper Predicting the End of Democracy

    POLITICO, Rick Shenkman


    Human brains aren’t built for self-rule, says Shawn Rosenberg. That’s more evident than ever.


    Why internet use estimates differ between surveys

    Pew Research Center, Arnold Lau


    How many U.S. adults use the internet? This might seem like a straightforward question, with a straightforward way to find out: Just ask. Indeed, there is a lot of information available from large, high response rate federal surveys as well as from surveys conducted by Pew Research Center and other organizations. However, these different sources of information measure internet use in ways that can be tricky to reconcile. Depending on the source, the estimated share of Americans who were online as of 2017 and 2018 is as low as 79% or as high as 89%.

    Pew Research Center has been tracking Americans’ internet behavior for years, all the while focusing on personal use from any location. While our approach has evolved over time, the current measurement consists of two questions: “Do you use the internet or email, at least occasionally?” and “Do you access the internet on a cellphone, tablet or other mobile handheld device, at least occasionally?” In a January 2018 phone survey, 89% of U.S. adults said “yes” to at least one of these questions.

     
    Events



    Santa Fe Institute presents SFI’s Stanislaw Ulam Memorial Lecture Series, with Lauren Ancel Meyers

    Santa Fe Institute


    Santa Fe, NM September 23-24. Lauren Ancel Meyers, Professor of Integrative Biology at The University of Texas at Austin, on how the science of networks, using social media & internet search data, might help prevent the next pandemic. [free, tickets required]


    Socinfo2019 – 11th International Conference on Social Informatics

    Social Informatics 2019


    Doha, Qatar November 18-21. “SocInfo is an interdisciplinary venue for researchers from Computer Science, Informatics, Social Sciences and Management Sciences to share ideas and opinions, and present original research work on studying the interplay between socially-centric platforms and social phenomena.”

     
    Deadlines



    eLife Latest: Highlighting meta-research

    “eLife welcomes the submission of more studies in meta-research via our website. Authors should select ‘Initial Submission: Feature Article’ when submitting, and are requested to include a list of 4-6 potential referees in their cover letter: for each potential referee, please include a one-sentence explanation of why they are suitable.”
     
    Tools & Resources



    Data Hub: A Generalized Metadata Search & Discovery Tool

    LinkedIn Engineering; Mars Lan, Seyi Adebajo, Shirshanka Das


    As the operator of the world’s largest professional network and the Economic Graph, LinkedIn’s Data team is constantly working on scaling its infrastructure to meet the demands of our ever-growing big data ecosystem. As the data grows in volume and richness, it becomes increasingly challenging for data scientists and engineers to discover the data assets available, understand their provenances, and take appropriate actions based on the insights. To help us continue scaling productivity and innovation in data alongside this growth, we created a generalized metadata search and discovery tool, Data Hub.


    Workflow systems turn raw data into scientific knowledge

    Nature, Toolbox, Jeffrey M. Perkel


    CWL [Common Workflow Language] is a way of describing analytical pipelines and computational tools — one of more than 250 systems now available, including such popular options as Snakemake, Nextflow and Galaxy. Although they speak different languages and support different features, these systems have a common aim: to make computational methods reproducible, portable, maintainable and shareable. CWL is essentially an exchange language that researchers can use to share pipelines for whichever system. For Finn, that language brought sanity to his codebase, reducing it by around 73%. Importantly, it has made it easier to test, execute and share new methods, and to run them on the cloud.

    There is a learning curve to adopting workflow languages. But, says Brian Naughton, data lead and co-founder of the drug-discovery firm Hexagon Bio in Menlo Park, California, “the energy that you expend learning is more than made up for by the energy you save in having your code be reproducible.”


    3 Ways to Manage Human Bias in the Analytics Process

    KDnuggets, Kalina Angelova


    As businesses turn to machine learning to automate processes, questions have been raised about the ethical implications of computers making decisions. How do we address the potential for bias? What’s less talked about, but equally important, is the topic of human bias as it relates to analytics and business decision making. Human bias can enter the analytics process every step of the way. As business decision-makers start looking to predictive analytics for specific advice on what action to take next, it’s important that data and methods are leveraged as objectively as possible. The responsibility of monitoring human bias in analytics is a great one, and it all starts with the people building models from the ground up: data scientists.

    As a data scientist navigating human bias in analytics every day, here are three areas I focus on to manage bias throughout the process—from evaluating the initial request and collecting information to building the model and mining for insights.

     
    Careers


    Full-time positions outside academia

    Lead Data Scientist



    Girl Scouts; New York, NY

    Solutions Engineer



    Fred Hutchinson Cancer Research Center, Hutch Data Commonwealth; Seattle, WA
