NYU Data Science newsletter – June 30, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 30, 2016

 
Data Science News



Tweet of the Week

Twitter, Peter Krafft


from June 24, 2016

 

Meet the challenge of interdisciplinary science

Nature News & Comment, Editorial


from June 29, 2016

To tackle society’s challenges through research requires the engagement of multiple disciplines. … To highlight the issues that arise in such research, imagine an integrated project to determine the causes of destructive risk-taking in inner-city adolescents and to identify appropriate interventions. Such a programme might combine disciplines ranging from anthropology, sociology, psychology, law, economics and ethics to psychiatry, health systems, urban design and developmental neurobiology.

 

Tweet of the Week

Twitter, Micah Blake McCurdy


from April 07, 2015

 

Why Scientists Are So Worried about Brexit

MIT Technology Review


from June 20, 2016

A report by the House of Lords reported in April that “the overwhelming balance of opinion from the UK science community” opposed Brexit. … Why? Partly because the EU funds a lot of science and technology research for its member countries, with 74.8 billion euros budgeted from 2014 to 2020.

More Brexit:

  • Brexit polling: What went wrong? (June 24, Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science blog)
  • Brexit: voter turnout by age (June 24, Financial Times, John Burn-Murdoch)
  • After Brexit, the race is on to replace London as Europe’s startup capital (June 27, Quartz, Joon Ian Wong)
  • Channelling Brexit anger (June 30, Times Higher Education)

    Biden threatens funding cuts for researchers who fail to report clinical trial results

    STAT


    from June 29, 2016

    At a national cancer summit Wednesday, Vice President Joe Biden threatened to cut funds to medical research institutions that don’t report their clinical trial results in a timely manner.

    “Under the law, it says you must report. If you don’t report, the law says you shouldn’t get funding,” Biden said, citing a STAT investigation that found widespread reporting lapses.

     

    Google says machine learning is the future. So I tried it myself

    The Guardian, Technology


    from June 28, 2016

    If deep learning will be as big as the internet, it’s time for everyone to start looking closely at it.

     

    Where Will All the New Neuroscientists Go?

    Scientific American Blog Network, Gary Stix


    from June 19, 2016

    Leaders in the field highlight the need for new career paths to accommodate a flood of PhDs.

    Also in neuroscience:

  • Brain Scanning Just Got Very Good—and Very Unsettling (June 21, IEEE Spectrum)

    Why Google Stores Billions of Lines of Code in a Single Repository

    Communications of the ACM


    from July 01, 2016

    Early Google employees decided to work with a shared codebase managed through a centralized source control system. This approach has served Google well for more than 16 years, and today the vast majority of Google’s software assets continues to be stored in a single, shared repository. Meanwhile, the number of Google software developers has steadily increased, and the size of the Google codebase has grown exponentially (see Figure 1). As a result, the technology used to host the codebase has also evolved significantly.

     

    Microsoft CEO Satya Nadella: Humans and A.I. can work together to solve society’s challenges.

    Slate


    from June 28, 2016

    Microsoft’s CEO explores how humans and A.I. can work together to solve society’s greatest challenges.

     

    Researchers Sue the Government Over Computer Hacking Law

    WIRED, Security


    from June 29, 2016

    In the age of big data analytics, the proprietary algorithms web sites use to determine what data to display to visitors have the potential to illegally discriminate against users. This is particularly troublesome when it comes to employment and real estate sites, which could prevent users from having a fair crack at jobs and housing simply by failing to display certain listings to them based on their race or gender.

    But four academic researchers who specialize in uncovering algorithmic discrimination say that a decades-old federal anti-hacking statute is preventing them from doing work to detect such discrimination. They say a provision of the Computer Fraud and Abuse Act could be used to criminally prosecute them for research that involves scraping publicly available data from these sites or creating anonymous user accounts on them, if the sites’ terms of service prohibit this activity.

     

    [1606.08562] Complex Systems and a Computational Social Science Perspective on the Labor Market

    arXiv, Computer Science > Social and Information Networks; Abdullah Almaatouq


    from June 28, 2016

    Labor market institutions are central for modern economies, and their policies can directly affect unemployment rates and economic growth. At the individual level, unemployment often has a detrimental impact on people’s well-being and health. At the national level, high employment is one of the central goals of any economic policy, due to its close association with national prosperity. The main goal of this thesis is to highlight the need for frameworks that take into account the complex structure of labor market interactions. In particular, we explore the benefits of leveraging tools from computational social science, network science, and data-driven theories to measure the flow of opportunities and information in the context of the labor market. First, we investigate our key hypothesis, which is that opportunity/information flow through weak ties, and this is a key determinant of the length of unemployment. We then extend the idea of opportunity/information flow to clusters of other economic activities, where we expect the flow within clusters of related activities to be higher than within isolated activities. This captures the intuition that within related activities there are more “capitals” involved and that such activities require similar “capabilities.” Therefore, more extensive clusters of economic activities should generate greater growth through exploiting the greater flow of opportunities and information. We quantify the opportunity/information flow using a complexity measure of two economic activities (i.e. jobs and exports).

     

    [1606.08813] EU regulations on algorithmic decision-making and a “right to explanation”

    arXiv, Statistics > Machine Learning; Bryce Goodman, Seth Flaxman


    from June 28, 2016

    We summarize the potential impact that the European Union’s new General Data Protection Regulation will have on the routine use of machine learning algorithms. Slated to take effect as law across the EU in 2018, it will restrict automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which “significantly affect” users. The law will also create a “right to explanation,” whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for machine learning researchers to take the lead in designing algorithms and evaluation frameworks which avoid discrimination.

     

    Brexit polling: What went wrong?

    Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science blog


    from June 24, 2016

    I could’ve just as well titled this, “Brexit prediction markets: What went wrong?” But it seems pretty clear that the prediction markets were following the polls.

    More Brexit:

  • Why Scientists Are So Worried about Brexit (June 20, MIT Technology Review)
  • Brexit: voter turnout by age (June 24, Financial Times, John Burn-Murdoch)
  • After Brexit, the race is on to replace London as Europe’s startup capital (June 27, Quartz, Joon Ian Wong)
  • Channelling Brexit anger (June 30, Times Higher Education)

    Peter Scholze And The Future Of Arithmetic Geometry

    Quanta Magazine, Erica Klarreich


    from June 28, 2016

    In 2010, a startling rumor filtered through the number theory community and reached Jared Weinstein. Apparently, some graduate student at the University of Bonn in Germany had written a paper that redid “Harris-Taylor” — a 288-page book dedicated to a single impenetrable proof in number theory — in only 37 pages. The 22-year-old student, Peter Scholze, had found a way to sidestep one of the most complicated parts of the proof, which deals with a sweeping connection between number theory and geometry.

    “It was just so stunning for someone so young to have done something so revolutionary,” said Weinstein, a 34-year-old number theorist now at Boston University. “It was extremely humbling.”

    Mathematicians at the University of Bonn, who made Scholze a full professor just two years later, were already aware of his extraordinary mathematical mind. After he posted his Harris-Taylor paper, experts in number theory and geometry started to notice Scholze too.

     

    Suddenly Everybody Is Obsessed with A.I.—Even If Investors Don’t Get It

    Vanity Fair, The Hive blog


    from June 29, 2016

    As Silicon Valley investors and tech giants continue to pour cash into burgeoning artificial intelligence technologies such as machine learning and chatbots, the relatively nascent A.I. industry is emerging as the latest mega-hot new ticket in town—the heir to online delivery apps, anything-hailing services, and virtual reality start-ups. But much like another buzz-worthy predecessor, Big Data, many A.I. cheerleaders and investment check signatories probably don’t quite understand it. But in Silicon Valley, when has that ever stopped anyone?

     

    NOAA establishes new panel to guide sustained National Climate Assessment

    NOAA


    from June 29, 2016

    NOAA today announced the appointment of 15 members to the new Advisory Committee for the Sustained National Climate Assessment. The committee will advise NOAA on sustained climate assessment activities and products, including engagement of stakeholders. NOAA will ensure the committee’s advice is provided to the White House Office of Science and Technology Policy (OSTP) for use by the United States Global Change Research Program (USGCRP), a confederation of the research arms of 13 federal departments and agencies, which carry out research and develop and maintain capabilities to support the Nation’s understanding and response to global change. OSTP requested NOAA lead the federal advisory committee.

     

    How Emailing “I Love You” Translated Into $1 Million In Data Analysis Revenue

    Fast Company


    from June 29, 2016

    CB Insights added personality to its newsletter, and it’s become a real way to attract customers.

     

    Brain Scanning Just Got Very Good—and Very Unsettling

    IEEE Spectrum


    from June 21, 2016

    Seven years ago, the U.S. National Institutes of Health (NIH) decided to map all the connections in the brain. In 2010, the Human Connectome Project (HCP) was born. It has provided funding to the tune of $40 million to two collaborating consortia whose aim was to acquire and share high-resolution data of structural and functional connections in the human brain. The researchers have sought to understand, on a scale never before attempted, the neural pathways that make us human, and how changes in those pathways make us sick.

    At a symposium yesterday at the NIH campus in Bethesda, Maryland, top researchers from the HCP came together to provide an update on the project’s achievements and future directions.

     
    Events



    O’Reilly Artificial Intelligence Conference



    Discover the real-world opportunities of applied artificial intelligence

    New York, NY Monday-Tuesday, September 26-27.

     

    1-day Reproducibility Conference Coming to Columbia University December 2016!



    Columbia University and other New York City research institutions are hosting a one-day symposium to showcase a robust discussion of reproducibility and research integrity among leading experts, high-profile journal editors, funders and researchers.

    New York, NY Friday, December 9, at Columbia University. Registration coming soon.

     
    Deadlines



    COLING 2016

    Osaka, Japan COLING 2016, the 26th International Conference on Computational Linguistics, will be organized by the Association for Natural Language Processing (ANLP) from Sunday-Friday, December 11-16.

    Deadline for submissions is Friday, July 15.

     

    2016 Workshop on Visualization for the Digital Humanities

    Baltimore, MD We invite contributions for the 2016 Workshop on Visualization for the Digital Humanities. This will be a one day workshop taking place as part of IEEE VIS 2016.

    Deadline for submissions is Saturday, July 30.

     

    NYC Digital Humanities – Third Annual Graduate Student Project Award

    We are pleased to announce our third annual cross-institutional NYCDH digital humanities graduate student project award. We invite all graduate students attending an institution in New York City and the metropolitan area to apply.

    Deadline to apply is Monday, August 15.

     

    Leamer-Rosenthal Prizes for Open Social Science

    In order to promote transparent research, and to offer recognition and visibility to scholars practicing open social science, the John Templeton Foundation is generously supporting the Berkeley Initiative for Transparency in the Social Sciences to launch prizes named for pioneers who helped lay the foundations of research transparency: economist Edward E. Leamer and psychologist Robert Rosenthal.

    Deadline for nominations is Friday, September 16.

     

    Call for Computer Vision Research Proposals with New Amazon Bin Image Data Set

    The Amazon Academic Research Awards (AARA) program is soliciting computer vision research proposals for the first time. The AARA program funds academic research and related contributions to open source projects by top academic researchers throughout the world.

    Deadline for application submissions is Saturday, October 1.

     
    Tools & Resources



    working with spatial data – workshop materials

    GitHub – enjalot


    from June 25, 2016

    This workshop is designed to be very hands-on, with many examples that can be extended as exercises. It would be impossible to touch everything that we could find interesting in web mapping, so the hope is that after going through these three acts you will feel empowered to swap in your own data and leverage hundreds of examples in your own data visualization projects!

     

    Communicating data science: A guide to presenting your work

    Kaggle, no free hunch blog


    from June 29, 2016

    See the forest, see the trees. Here lies the challenge in both performing and presenting an analysis. As data scientists, analysts, and machine learning engineers faced with fulfilling business objectives, we find ourselves bridging the gap between The Two Cultures: sciences and humanities. After spending countless hours at the terminal devising a creative and elegant solution to a difficult problem, the insights and business applications are obvious in our minds. But how do you distill them into something you can communicate?

     

    Going beyond full utilization: The inside scoop on Nervana’s Winograd kernels

    Nervana Systems, Urs Köster and Scott Gray


    from June 28, 2016

    This is part 2 of a series of posts on how Nervana uses the Winograd algorithm to make convolutional networks faster than ever before. In the first part we focused on benchmarks demonstrating a 2-3x algorithmic speedup. This part will get a bit more technical and dive into the guts of how the Winograd algorithm works, and how we optimized it for GPUs.

     

    10 tips to make the most of a Datathon

    Juan Bernabe, Big Data Doctor blog


    from June 28, 2016

    A Datathon is the place where Data Scientists come to “work-out”, to release these endorphins and share it with other Data Lovers. It’s like a standard Hackathon, usually in the same format, but where the main character is the data.

    Motivated for the upcoming Telefónica Data Challenge, where I’m going to be on the other side (-unfortunately- in the jury, not as a participant), I’d like to share a few tips for Datathon participants, to make the most of the event.

     

    Wide & Deep Learning: Better Together with TensorFlow

    Google Research Blog, Heng-Tze Cheng


    from June 29, 2016

    The human brain is a sophisticated learning machine, forming rules by memorizing everyday events (“sparrows can fly” and “pigeons can fly”) and generalizing those learnings to apply to things we haven’t seen before (“animals with wings can fly”). Perhaps more powerfully, memorization also allows us to further refine our generalized rules with exceptions (“penguins can’t fly”). As we were exploring how to advance machine intelligence, we asked ourselves the question—can we teach computers to learn like humans do, by combining the power of memorization and generalization?

    It’s not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning.

     
