NYU Data Science newsletter – May 31, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 31, 2016

 
Data Science News



Jeremy M. Berg Named Editor-in-Chief of the Science Family of Journals at AAAS

AAAS


from May 25, 2016

Jeremy Berg, Associate Senior Vice Chancellor for Science Strategy and Planning in the Health Sciences at the University of Pittsburgh and former director of the National Institute of General Medical Sciences at the U.S. National Institutes of Health (NIH), has been named by the Board of Directors of the American Association for the Advancement of Science (AAAS) to serve as editor-in-chief of the Science family of journals, beginning 1 July 2016.

Berg, who also holds positions as Pittsburgh Foundation Professor and Director of the Institute for Personalized Medicine, Professor of Computational and Systems Biology, and Professor of Chemistry at the University of Pittsburgh, will become the 20th editor-in-chief of Science since the journal’s inception in 1880.

 

The White House Is Finally Prepping for an AI-Powered Future | WIRED

WIRED, Business


from May 30, 2016

Researchers disagree on when artificial intelligence that displays something like human understanding might arrive. But the Obama administration isn’t waiting to find out. The White House says the government needs to start thinking about how to regulate and use the powerful technology while it is still dependent on humans.

“The public should have an accurate mental model of what we mean when we say artificial intelligence,” says Ryan Calo, who teaches law at University of Washington. Calo spoke last week at the first of four workshops the White House hosts this summer to examine how to address an increasingly AI-powered world.

 

Doctors Test Tools to Predict Your Odds of a Disease

Wall Street Journal


from May 30, 2016

Thomas McGinn, chairman of medicine at a major New York hospital system, is betting he can predict if a patient has strep, pneumonia or other ailments not by ordering traditional lab tests or imaging scans, but by calculating probabilities with a software program.

Also, in healthcare:

  • The Economic Consequences of Hospital Admissions (May 30, National Bureau of Economic Research)
  • Are patients giving away too much data with wearable devices? (May 26, MedCity News)
  • Hi Reddit, we’re Nick and Cori Ruktanonchai, and we published a paper in PLOS Computational Biology on how mobile phone data can target malaria elimination efforts — Ask Us Anything! (May 25, reddit.com/r/science)

    When a Robot Books Your Airline Ticket

    The New York Times


    from May 30, 2016

    Jay Baer, a digital marketing consultant in Bloomington, Ind., spends half his time traveling on business. That means he also has to spend hours each week coordinating that travel.

    Help has arrived with the Pana app, which employs artificial intelligence to aid customers.

    Virtual travel assistant services — some from established companies like Facebook, IBM and Expedia, and others from new entrants like Pana and HelloGbye — are now popping up worldwide, just as major hotel chains like Starwood and Hilton are incorporating robots into their everyday operations.

     

    SAGE Open five years on: Lessons learned and future thoughts on open access in humanities and social sciences.

    London School of Economics, The Impact Blog; Dave Ross


    from May 30, 2016

    SAGE Open is celebrating its 5th birthday. When SAGE Publishing launched SAGE Open in 2010, the humanities and social sciences were still grappling with how to approach open access (OA). Through its mega-journal, well over 1000 articles have now been published OA, and it is one of SAGE’s most-used journals. Dave Ross looks back at the journal’s growth and lessons learned.

     

    We need to know the algorithms the government uses to make important decisions about us

    The Conversation, Nick Diakopoulos


    from May 23, 2016

    In criminal justice systems, credit markets, employment arenas, higher education admissions processes and even social media networks, data-driven algorithms now drive decision-making in ways that touch our economic, social and civic lives. These software systems rank, classify, associate or filter information, using human-crafted or data-induced rules that allow for consistent treatment across large populations.

    But while there may be efficiency gains from these techniques, they can also harbor biases against disadvantaged groups or reinforce structural discrimination. In terms of criminal justice, for example, is it fair to make judgments on an individual’s parole based on statistical tendencies measured across a wide group of people? Could discrimination arise from applying a statistical model developed for one state’s population to another, demographically different population?

     

    Artificial Intelligence Requires Thoughtful Policymaking, Experts Say

    AAAS


    from May 17, 2016

    With appropriate policies in place, robots should become our “best friends,” not our “worst nightmare,” experts said at the 41st Annual AAAS Forum on Science & Technology Policy on 14 April.

    During a panel titled “Best Friend or Worst Nightmare? Autonomy and AI in the Lab and in Society,” experts on artificial intelligence (AI) spoke about the role of policy in integrating new technologies into people’s lives. They praised current AI advancements and urged more policymaking in the arena of autonomous systems, particularly for disaster relief, sustainability, and the military, among other applications. [video, 1:28]

     

    Mariano Sigman: Your words may predict your future mental health

    TED Talk, TED.com


    from May 25, 2016

    Can the way you speak and write today predict your future mental state, even the onset of psychosis? In this fascinating talk, neuroscientist Mariano Sigman reflects on ancient Greece and the origins of introspection to investigate how our words hint at our inner lives and details a word-mapping algorithm that could predict the development of schizophrenia. “We may be seeing in the future a very different form of mental health,” Sigman says, “based on objective, quantitative and automated analysis of the words we write, of the words we say.”

     

    Machine-Learning Radars May Come to Automotive

    EE Times


    from May 26, 2016

    The IMEC research institute (Heverlee, Belgium) plans to turn future sensors, specifically radar sensors, into devices that extract useful information locally and even become learning machines.

    IMEC is already working with automotive radar market leader Infineon Technologies AG at 79GHz in 28nm CMOS. Now it wants to go to a yet smaller wavelength and add machine learning to the back end of its sensors, said Wim van Thillo, program director for perceptive systems at IMEC, speaking at the IMEC Technology Forum.

     

    Why Do So Many Studies Fail to Replicate?

    The New York Times, SundayReview, Jan Van Bavel


    from May 27, 2016

    Last year, a colleague asked me if I would send her the materials needed to try to replicate one of my published papers — that is, to rerun the study to see if its findings held up. “I’m not trying to attack you or anything,” she added apologetically.

    I laughed. To a scientist, replication is like breathing. Successful replications strengthen findings. Failed replications root out false claims and help refine imprecise ones. Testing and retesting make science what it is.

    But I understood why my colleague was being delicate. Around that time, the largest replication project in the history of psychology was underway. This initiative, called the Reproducibility Project, reran 100 studies published in prominent psychology journals.

     

    [1605.08535] Deep API Learning

    arXiv, Computer Science > Software Engineering; Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim


    from May 27, 2016

    Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bag-of-words (i.e., keyword matching or word-to-word alignment) and lack a deep understanding of the semantics of the query.
    We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length context vector, and generates an API sequence based on the context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms the related approaches.
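
A small illustration of why the abstract contrasts bag-of-words matching with sequence learning: two queries can contain exactly the same words yet ask for opposite things. The sketch below is not DeepAPI itself (no neural model is involved), and the two example queries are made up for illustration. It only shows that a word-count representation cannot tell them apart, while the word order can.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Represent a query as unordered word counts (word order is discarded)."""
    return Counter(query.lower().split())


def word_sequence(query: str) -> list:
    """Represent a query as an ordered list of words (word order is kept)."""
    return query.lower().split()


# Hypothetical queries chosen for illustration; they are not from the paper's data.
query_a = "convert int to string"
query_b = "convert string to int"

same_bag = bag_of_words(query_a) == bag_of_words(query_b)
same_sequence = word_sequence(query_a) == word_sequence(query_b)

print("identical as bags of words:", same_bag)
print("identical as word sequences:", same_sequence)
```

A sequence model such as the RNN Encoder-Decoder described in the abstract consumes the ordered form, which is what lets it distinguish queries like these.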

     

    Are patients giving away too much data with wearable devices?

    MedCity News


    from May 26, 2016

    New technologies even help us monitor our moods and the quality of our sleep. Are you more depressed during the winter months? Does your stress level rise before a meeting with your boss? Do you get sleepy after your lunch break? Armed with this information, users can gain a better understanding of themselves, identifying patterns and using that data to make lifestyle changes.

    All good, right?

    That all depends on who benefits from these tools, who is accessing the data and why.

     

    The First Visual Search Engine for Scientific Diagrams

    MIT Technology Review


    from May 27, 2016

    A machine-vision algorithm has learned to analyze and categorize scientific figures.

     

    Here’s how text analysis is transforming social-science research

    The Washington Post, Monkey Cage blog; Joshua Tucker and Margaret Roberts


    from May 27, 2016

    The journal Political Analysis has recently published a “virtual issue” on “Recent Innovations in Text Analysis for Social Science.” In addition to the guest editor’s introduction, there are seven papers in the virtual issue. All of the papers are available for free reading online, for a limited time. I spoke to University of California at San Diego political scientist Margaret Roberts, who edited the issue, about the subject matter. What follows is a lightly edited version of our discussion.

    Also, in text analysis:

  • Mariano Sigman: Your words may predict your future mental health (May 25, TED Talk, TED.com)
  • Text as Data 2016 research conference (June 20, Northeastern University)
    Events



    TreesCount! Data Jam – Jun 4, 2016 : NYC Parks



    To spark and sustain public engagement, NYC Parks launched the TreesCount! 2015 campaign. To date, more than 2,300 New Yorkers have volunteered to help complete the first comprehensive map of our city’s street trees. Now, we are looking for data scientists, statisticians, developers, designers, visualizers, cartographers, and quants to help us transform the data gathered thus far into actionable insights. Bring your skills, questions, and creativity to this data jam!

    New York, NY Saturday, June 4, starting at 10 a.m., Civic Hall (156 Fifth Avenue) [$]

     
    Deadlines



    Workshop on Algorithms for Modern Massive Data Sets.

    Registration fees are waived for students (non postdoc) with an approved poster presentation.

    Berkeley, CA Tuesday-Friday, June 21-24 in Stanley Hall.

    Deadline for submissions is Sunday, June 12.

     
    Tools & Resources



    Introducing BoxArt: A Library to Help Build HTML Games

    Bocoup


    from May 23, 2016

    We’ve been busy building some Open Web Games at Bocoup. As we did so, we realized there was a dearth of resources for making performant, fun web games using the DOM. Most material aimed at game developers focuses on canvas rendering, and there aren’t many resources for web developers that show them how to use the accessible and responsive HTML they already know to build games. To address this we are excited to announce BoxArt to share the lessons we have learned while building modern DOM games.

     

    How to Spot Bullshit: A Primer by Princeton Philosopher Harry Frankfurt

    Open Culture


    from May 30, 2016

    … The bullshit artist’s approach is far more vague. It’s about creating a general impression.

    There are times when I admit to welcoming this sort of manure. As a maker of low budget theater, your honest opinion of any show I have Little Red Hen’ed into existence is the last thing I want to hear upon emerging from the cramped dressing room, unless you truly loved it. [video, 5:50]

     

    [1605.07723] Data Programming: Creating Large Training Sets, Quickly

    arXiv, Statistics > Machine Learning; Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré


    from May 25, 2016

    Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users provide a set of labeling functions, which are programs that heuristically label large subsets of data points, albeit noisily. By viewing these labeling functions as implicitly describing a generative model for this noise, we show that we can recover the parameters of this model to “denoise” the training set. Then, we show how to modify a discriminative loss function to make it noise-aware.
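
To make the idea of labeling functions concrete, here is a minimal sketch. It assumes a toy spam-detection task, and the three labeling functions and their names are hypothetical. It also combines their votes with a plain majority vote, which is a deliberately simpler stand-in for the generative noise model the abstract describes, not the authors' method.

```python
from typing import Callable, List

# Label convention assumed for this sketch: +1 positive, -1 negative, 0 abstain.
POS, NEG, ABSTAIN = 1, -1, 0


def lf_contains_free(text: str) -> int:
    """Hypothetical heuristic: the word 'free' suggests spam."""
    return POS if "free" in text.lower() else ABSTAIN


def lf_many_exclamations(text: str) -> int:
    """Hypothetical heuristic: three or more '!' characters suggest spam."""
    return POS if text.count("!") >= 3 else ABSTAIN


def lf_personal_greeting(text: str) -> int:
    """Hypothetical heuristic: a personal greeting suggests a normal message."""
    return NEG if text.lower().startswith(("hi ", "hello ")) else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_free,
    lf_many_exclamations,
    lf_personal_greeting,
]


def majority_vote(text: str, lfs: List[Callable[[str], int]]) -> int:
    """Combine noisy votes by summing them; ties and all-abstain yield ABSTAIN."""
    total = sum(lf(text) for lf in lfs)
    if total > 0:
        return POS
    if total < 0:
        return NEG
    return ABSTAIN


for message in ["FREE prize!!! Click now", "Hello Bob, see you at lunch"]:
    print(message, "->", majority_vote(message, LABELING_FUNCTIONS))
```

Each function labels only the subset of messages its heuristic covers and abstains elsewhere, which is the "heuristically label large subsets of data points, albeit noisily" behavior from the abstract. The paper's contribution is to replace the naive vote with a learned model of each function's accuracy.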

     
    Careers



    Research Fellow (74572-056) at University of Warwick
     

    University of Warwick, Tobias Preis
     

    Lesson learned 1 week as a data analyst
     

    Medium, Robin Lee
     
