NYU Data Science newsletter – November 3, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for November 3, 2015

 
Data Science News



Who is allowed to call themselves a data scientist?

Win-Vector Blog, John Mount


from October 30, 2015

Given we agree data science exists, who is allowed to call themselves a data scientist?

There is a school of thought that you cannot call yourself a data scientist unless you master all of the following:

  • Statistical learning theory
  • High dimensional geometry
  • Optimization theory
  • Petabyte scale operations
  • Advanced programming
  • Combinatorics and algebra
  • Theoretical computer science
  • Measure theory
  • All of statistics
  • SQL
  • noSQL
  • Distributed system design

    The Power of Open-Data

    Bloomberg L.P., Arnaud Sahuguet


    from October 27, 2015

    After I recently participated in Bloomberg’s 2015 Data for Good Exchange, it became even more apparent to me that making decisions about public problems should be (1) data-driven, (2) collaborative and (3) participatory.

    Data-driven approaches are the foundation of any kind of science. While companies, governments and non-profits have embraced this thinking, collaboration is rare. Why?

    Issues of funding and attribution often misalign incentives between parties. In the private sector, collaborations exist, but usually only between the parties to a legal contract. In the non-profit world, each player is most often convinced that his or her solution is better. Inside government, we see little collaboration because of turf wars, but also because of ignorance.

     

    Do one thing…

    O'Reilly Radar, Mike Loukides


    from October 30, 2015

    I’ve been lamenting the demise of the Unix philosophy: tools should do one thing, and do it well. The ability to connect many small tools is better than having a single tool that does everything poorly.

    That philosophy was great, but hasn’t survived into the Web age. Unfortunately, nothing better has come along to replace it. Instead, we have “convergence”: a lot of tools converging on doing all the same things poorly.

    The poster child for this blight is Evernote.
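    The Unix philosophy described above can be illustrated in a few lines of Python. This is a hypothetical sketch, not anything from Loukides’s post: the function names (`read_lines`, `grep`, `sort_lines`, `uniq`, `pipe`) are made up to mirror the shell commands they imitate, and each one does exactly one job so they can be chained much like `cat notes | grep todo | sort | uniq`.

```python
# Hypothetical sketch of "small tools, connected": each function does one job,
# and pipe() chains them left to right the way a shell pipe chains commands.
from functools import reduce
from typing import Callable, Iterable, Iterator


def read_lines(text: str) -> Iterator[str]:
    """Split raw text into lines (plays the role of `cat`)."""
    return iter(text.splitlines())


def grep(pattern: str) -> Callable[[Iterable[str]], Iterator[str]]:
    """Return a stage that keeps only lines containing `pattern` (like `grep`)."""
    return lambda lines: (line for line in lines if pattern in line)


def sort_lines(lines: Iterable[str]) -> Iterator[str]:
    """Sort lines lexically (like `sort`)."""
    return iter(sorted(lines))


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Drop adjacent duplicate lines (like `uniq`)."""
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


def pipe(*stages: Callable) -> Callable:
    """Compose stages left to right, so pipe(a, b, c)(x) == c(b(a(x)))."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


notes = "todo: buy milk\nidea: newsletter\ntodo: buy milk\ntodo: call Ann"
todo_list = pipe(read_lines, grep("todo"), sort_lines, uniq)
print(list(todo_list(notes)))  # ['todo: buy milk', 'todo: call Ann']
```

    Adding or swapping a step means writing one more small function rather than growing a single tool that does everything, which is the contrast Loukides draws with “convergence.”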

     

    Open Access at a Crossroads

    The Scholarly Kitchen


    from October 28, 2015

    Last week marked the annual celebration/marketing event that is Open Access Week, and this year it seemed something of a mixed bag. Open access (OA) is growing into maturity, and has rapidly become integrated into the scholarly publishing landscape over the last fifteen or so years. We have now reached a point where experiments have been in place for a while and results can be analyzed. Early assumptions can now be measured and the move to OA seems to have reached something of a crossroads.

    Make no mistake, OA is here to stay, and there is no crisis of confidence, at least as far as the continuing growth in access to the research literature is concerned. But the repercussions of the business models and methodologies chosen for OA are beginning to be recognized.

     

    Data mining Instagram feeds can point to teenage drinking patterns

    University of Rochester, News Center


    from October 29, 2015

    Instagram could offer a novel way of monitoring the drinking habits of teenagers.

    Using photos and text from Instagram, a team of researchers from the University of Rochester has shown that this data can not only expose patterns of underage drinking more cheaply and quickly than conventional surveys, but also reveal new patterns, such as which alcohol brands or types are favored by different demographic groups. The researchers say they hope exposing these patterns could help develop effective interventions.

    Instagram is very popular among teenagers and it offers large amounts of information about this target population in the form of photos and text. As Jiebo Luo, professor of computer science at the University of Rochester, and his colleagues describe in a new paper, underage drinkers “are willing to share their alcohol consumption experience” in social media. Studying the social media behavior of this group allows the researchers to observe it passively in an “undisturbed state.”

     

    A New and Stunning Metric from NIH Reveals the Real Nature of Scientific Impact

    ASCB, The Activation Energy Blog


    from October 22, 2015

    What if I told you that nearly 90 percent of the publications which have profoundly influenced the life sciences did not appear in a high-impact factor journal? If you signed the San Francisco Declaration on Research Assessment, you probably aren’t surprised. If you haven’t signed DORA, it may be time for you to reconsider the connection between true breakthrough papers and so-called journal impact factors (JIFs).

    Today we received strong evidence that significant scientific impact is not tied to the publishing journal’s JIF score. First results from a new analytical method that the National Institutes of Health (NIH) is calling the Relative Citation Ratio (RCR) reveal that almost 90% of breakthrough papers first appeared in journals with relatively modest journal impact factors. According to the RCR, these papers exerted major influence within their fields yet their impact was overlooked, not because of their irrelevance, but because of the widespread use of the wrong metrics to rank science.

    In the initial RCR analysis carried out by NIH, high impact factor journals (JIF ≥ 28) account for only 11% of papers that have high RCR (3 or above). Here is hard evidence for what DORA supporters have been saying since 2012. Using the JIF to credit influential work means overlooking 89% of similarly influential papers published in less prestigious venues.
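    The 11% / 89% split above is a conditional share: among papers with RCR of 3 or above, what fraction appeared in journals with JIF ≥ 28? The Python sketch below computes that share for a small made-up set of papers. It takes each paper’s RCR as given; NIH’s actual RCR calculation, which normalizes an article’s citation rate against its field, is not reproduced here, and the `Paper` record and `high_jif_share` function are hypothetical names.

```python
# Hypothetical sketch: of the papers with high RCR, what fraction were
# published in high-JIF journals? Thresholds follow the excerpt
# (JIF >= 28, RCR >= 3); the toy data below are invented for illustration.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Paper:
    jif: float  # impact factor of the journal that published the paper
    rcr: float  # Relative Citation Ratio of the paper itself (taken as given)


def high_jif_share(papers: Iterable[Paper],
                   jif_cutoff: float = 28.0,
                   rcr_cutoff: float = 3.0) -> float:
    """Fraction of high-RCR papers that appeared in high-JIF journals."""
    influential = [p for p in papers if p.rcr >= rcr_cutoff]
    if not influential:
        return 0.0  # no high-RCR papers, so the share is defined as zero here
    in_top_journals = sum(1 for p in influential if p.jif >= jif_cutoff)
    return in_top_journals / len(influential)


toy_papers = [
    Paper(jif=30.0, rcr=4.1),  # influential, high-JIF journal
    Paper(jif=4.2, rcr=3.5),   # influential, modest journal
    Paper(jif=2.9, rcr=5.0),   # influential, modest journal
    Paper(jif=6.1, rcr=3.0),   # influential, modest journal
    Paper(jif=35.0, rcr=1.2),  # high-JIF journal, but not high RCR
    Paper(jif=3.3, rcr=0.8),   # neither
]
share = high_jif_share(toy_papers)
print(f"{share:.0%} in high-JIF journals, {1 - share:.0%} elsewhere")
# prints: 25% in high-JIF journals, 75% elsewhere
```

    On the toy data, one of the four high-RCR papers sits in a high-JIF journal, so the function returns 0.25; NIH’s reported figure for its real dataset was 11%.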

     

    NYU Medical School adds big data to course requirements

    MedCity News


    from November 01, 2015

    For generations, physicians have been trained in basic science and human anatomy to diagnose and treat the individual patient.

    But now, massive stores of data about what works for which patients are literally changing the way medicine is practiced. “That’s how we make decisions; we make them based on the truth and the evidence that are present in those data,” says Marc Triola, an associate dean at New York University School of Medicine.

    Figuring out how to access and interpret all that data is not a skill that most physicians learned in medical school. In fact, it hasn’t even been taught in medical school, but that’s changing.

     

    Two paths to the future of artificial intelligence

    Ken Perlin, Ken's Blog


    from October 31, 2015

    Halloween is an appropriate time to talk about a potentially very scary topic: Possible future paths toward Artificial Intelligence.

    All human and animal intelligence has evolved according to one principle: The fittest species survives. And this principle infuses every member of every species. Both collectively and as individuals, our most powerful instinct is to continue to stay alive.

    So it stands to reason that as computers continue to increase in power, and artificial intelligence is therefore able to come ever closer to the level of richness and complexity that we associate with natural intelligence, there are at least two possible ways we can achieve “sentient” level AI.

     

    Malaysia testing ‘artificial intelligence’ for prisons

    GovInsider.Asia


    from October 30, 2015

    A police station in Malaysia has a new system that uses video analysis techniques to detect when inmates get into a fight or try to escape.

    The Smart Lock-Up system analyses footage from surveillance cameras in cells, corridors and along the prison’s perimeter, detecting behaviour like climbing, loitering, fighting, suicide attempts and vandalism. It then sends an alert to the authorities.

     

    NYU Pollack Center and Cornerstone Debut Database to Track SEC Enforcement Actions

    NYU News


    from October 27, 2015

    The NYU Pollack Center for Law & Business, in partnership with Cornerstone Research, announces the launch of the Securities Enforcement Empirical Database (SEED), located at seed.law.nyu.edu. SEED tracks and records information for US Securities and Exchange Commission (SEC) enforcement actions against public companies. With data on SEC actions filed since October 1, 2009, SEED offers insight into multiyear trends and priorities in federal securities enforcement.

    “Our goal is to shed light on the SEC’s securities law enforcement decisions,” says Stephen Choi, Murray and Kathleen Bring Professor of Law and director of the Pollack Center. “SEED is the first public database to provide easily searchable and verified data to researchers, counsel, and corporations, as well as regular reports on developments and trends.”

     

    Accelerating scientific publication in biology

    Proceedings of the National Academy of Sciences


    from October 26, 2015

    Scientific publications enable results and ideas to be transmitted throughout the scientific community. The number and type of journal publications also have become the primary criteria used in evaluating career advancement. Our analysis suggests that publication practices have changed considerably in the life sciences over the past 30 years. More experimental data are now required for publication, and the average time required for graduate students to publish their first paper has increased and is approaching the desirable duration of PhD training. Because publication is generally a requirement for career progression, schemes to reduce the time of graduate student and postdoctoral training may be difficult to implement without also considering new mechanisms for accelerating communication of their work. The increasing time to publication also delays potential catalytic effects that ensue when many scientists have access to new information. The time has come for life scientists, funding agencies, and publishers to discuss how to communicate new findings in a way that best serves the interests of the public and the scientific community.

     

    Apple’s Secrecy Hurts Its AI Software Development

    Bloomberg Business


    from October 29, 2015

    In the world of artificial intelligence, one of the year’s biggest coming-out parties is the Neural Information Processing Systems conference. Thousands of researchers from universities and software companies gather to share their work and wrestle with new ways to tailor software to people’s habits. At last year’s conference in Montreal, employees of Google, Microsoft, and IBM presented papers on teaching computers to work faster and smarter, such as by reading the house numbers in a photo to determine an address. But one player was conspicuously absent: Apple. This year, Chinese search giant Baidu and Facebook, along with Google and Microsoft, are slated to present papers. Apple isn’t.

    Apple researchers attended the Montreal conference last year but kept a low profile and didn’t say who they worked for unless asked, says Yoshua Bengio, an AI pioneer and professor of computer science at the University of Montreal. This is typical of the company’s appearances at the field’s big AI conferences, say Bengio and other prominent researchers. “Apple is off the scale in terms of secrecy,” says Richard Zemel, a professor in the computer science department at the University of Toronto. “They’re completely out of the loop.” Apple declined to comment for this story.

     
    Deadlines



    KDD Cup 2016 Call for Proposals

    We invite organizations to submit proposals for KDD Cup 2016. Since 1997, KDD Cup has been the most prestigious annual data mining competition held in conjunction with the ACM SIGKDD conference on Knowledge Discovery and Data Mining.

    Deadline for Proposals is Monday, December 7.

     

    Developing a Research Agenda for Human-Centered Data Science Workshop

    The study and analysis of large and complex data sets offer a wealth of insights in a variety of applications. Computational approaches provide researchers access to broad assemblages of data, but the insights extracted may lack the rich detail that qualitative approaches have brought to the understanding of sociotechnical phenomena. How do we preserve the richness associated with traditional qualitative methods while utilizing the power of large data sets? How do we uncover social nuances or consider ethics and values in data use?

    Deadline for Workshop Submissions is Friday, December 11.

     
