NYU Data Science newsletter – March 3, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for March 3, 2016


 
Data Science News



New emphasis on data science valuable to students in all fields

The Daily Californian


from March 01, 2016

Universities are occasionally accused — especially by those who see technology as predestined to disrupt all traditional modes of doing business — of being too resistant to change. Our learning environments, some say, look the same as they did 500 years ago, with approaches to pedagogy mired in threadbare tradition and hidebound custom. The university is committed only to the preservation of its own authority and practices, say the critics, noting that professors still actually don medieval robes (O.K., admittedly, this last part is true … but only for special occasions!).

The reality is that change and innovation are unceasing at pre-eminent universities such as UC Berkeley. We are constantly reinventing our teaching methods to meet new student needs and redesigning our physical spaces to accommodate new technology. And, within the greater framework of a liberal education, we are continually re-evaluating our academic offerings to ensure that we can provide each generation of students with the skills and abilities they need to flourish once they graduate.

On this last front, one particularly salient change here on our campus was the launch of the Berkeley Data Science Education Program (DSEP) this past semester.

 

Former Facebook Data Scientist Shares How to Wrangle Your Data

Inc.com, Bill Carmody


from March 02, 2016

While the deep technical expertise of a data scientist is necessary on some projects, you don’t need them to build a culture of data-driven decision-making. Here’s how to empower your team.

 

The Big List of Internet of Things Companies | App Development Marketplace

AppIndex


from March 01, 2016

… There are, of course, many parts to the internet of things: the sensors and their operating systems, the networks over which the sensors communicate and their protocols, the servers to receive and store all the data and the software to either automate or analyse it all. Below we have listed some of the best companies providing each of these services, both on a general level and in more specific sectors such as the smart home, security and fitness: the latter has had the most media coverage of IoT applications thanks to things like Fitbit and other fitness-tracking devices.

 

The Mirage of a Citizen Data Scientist

KDnuggets, Gregory Piatetsky


from March 02, 2016

… I dislike “citizen data scientist” for two reasons.

First, the word “citizen” has very misleading connotations, especially given the heated debate now in the US regarding immigration.

Having organized and attended many US conferences on data mining and data science, I observed that the majority of researchers and attendees are actually immigrants or visitors. Whether they have become US citizens like me, or are permanent residents, or visitors, has very little relation to the quality of their work.

Second, the term “citizen” Data Scientist implies that people without much training can do the work of a Data Scientist.

 

Precision Medicine at one year: A soaring White House summit and the potholes ahead

HealthNewsReview.org


from February 29, 2016

Last week there was a big shindig at the White House reviewing progress from the first year of the million-person Precision Medicine Initiative (PMI). … This is an exciting scientific undertaking — one that merits the attention these outlets have devoted to it. But the coverage sounded mostly like cheerleading, and none of these stories included a skeptical word about the many challenges ahead and how they could thwart the initiative’s lofty objectives. I watched the webcast of the event with a critical eye and took notes as I was watching. Here are six of the things that I thought journalists should have been thinking about and writing about as they covered the event.

Also:

  • Precision medicine for understudied populations (TEDMED blog, Roxanne Dameshjou, March 2)
  • Vanderbilt, Google’s Verily to Launch Precision Medicine Initiative Cohort (GEN News Highlights, February 25)
  • Obama Is Using The Bully Pulpit To Set Patient Data Free (Forbes, Matthew Herper, February 25)
  • President Weighs In on Data From Genes (The New York Times, February 25)
  • Biden’s cancer moonshot effort looks at Utah database (STAT, February 26)

    Precision medicine for understudied populations

    TEDMED Blog, Roxanne Dameshjou


    from March 02, 2016

    Precision medicine, which leverages a patient’s genetics to help make medical decisions, has the potential to revolutionize medicine. Its applications are numerous: from predicting who may have an adverse reaction to a medication, to allowing targeted therapies of cancer with particular mutations. In 2015, President Obama’s State of the Union announced an initiative to further our understanding of precision medicine and to build the infrastructure to implement it. An important part of this initiative is building a large diverse research cohort to help discover disease-gene and drug-gene associations. The key word is diverse – because genetic risk factors can be population-specific. In the past, individuals of African, Hispanic, and Middle Eastern ancestry have been understudied. Only by including individuals from all different ancestral backgrounds can we hope to implement precision medicine in an inclusive way.

     

    Renter Beware: Study Finds Craigslist Catches Barely Half of Scam Rental Listings

    NYU, Tandon School of Engineering


    from March 01, 2016

    Apartment hunters in big cities know the drill: They spot a listing for a well-priced, attractive place and make an inquiry, only to be met with demands for an instant credit check or an upfront fee to access the full listing. Savvier home hunters spot these scams immediately, but others fall through the cracks, making popular rental listing sites like Craigslist a highly lucrative spot for fraud.

    A new study by researchers at the New York University Tandon School of Engineering finds that Craigslist fails to identify more than half of scam rental listings on the site’s pages and that suspicious postings often linger for as long as 20 hours before being removed—more than enough time to snare victims, especially in competitive housing markets.

    The research team was led by Damon McCoy, an assistant professor of computer science and engineering, along with Elaine Shi, an assistant professor of computer science at Cornell University, and Youngsam Park, a doctoral student at the University of Maryland.

     

    Computational Social Science: Toward a Collaborative Future

    Microsoft Research, Hannah Wallach


    from March 01, 2016

    Fifteen years ago, as an undergraduate computer science student in the UK, I read a popular science article (Matthews, 1999) profiling the research of my now colleague, Duncan Watts. This article, about the science of small-world networks, changed my life. To understand why, though, it’s necessary to know that in the UK, there is (or at least was during the 1980s and 1990s) a profound “them-versus-us” split between the STEM fields and all other disciplines. This split is amplified or perhaps even caused by the fact that people specialize very young—choosing, at age fifteen or sixteen, whether they will ever take another math course or write another essay again. I, like everyone else in my degree program, had chosen STEM, but my decision hadn’t been easy—I had also wanted to study the social sciences. The article about Duncan’s research changed my life because it had never before occurred to me that math and computers could be used to study social phenomena. For the first time, I realized that rather than studying either computer science or the social sciences, perhaps I could study both. This, then, became my motivating goal

     

    Questions for Thomas Bourgeron: In search of ‘second hits’

    Spectrum – Autism Research News


    from March 01, 2016

    Thomas Bourgeron, professor of genetics at the Institut Pasteur in Paris, is among a group of researchers taking a second look at people who carry a harmful mutation in a known autism gene. Scientists often exclude these individuals from studies under the assumption that the genetic cause of their autism has already been identified, he says. Bourgeron says scouring their genomes may reveal additional mutations or ‘second hits’ that help to explain how autism is inherited and why it is so diverse.

    We asked Bourgeron how painting a more detailed picture of autism’s genetic landscape will help to advance the field.

     

    Patients key to making sense of medical data

    MIT Sloan School of Management


    from March 01, 2016

    From brain activity to muscle performance, the human body produces two terabytes worth of data in a given day. This data provides valuable insight into body and mind activity, said Ben Schlatka, vice president of corporate development and co-founder of MC10—but no physician today is willing or able to process that much information.

    “You need to abstract [data] in a simplified way if you want to do business in a real-world ambulatory setting,” Schlatka said during a panel discussion on the future of medical device analytics at the Feb. 26 MIT Sloan Healthcare and Bioinnovations Conference, held at the MIT Media Lab.

    With so many companies trying to provide data analytics services to the health care industry, Schlatka and his fellow panelists spoke about how important it is for firms to build a viable business model.

     
    Events



    How Research is Improving Government



    The Center for Public Scholarship (CPS) at The New School presents “From Social Science to Social Policy: How Research is Improving Government,” a conversation about President Obama’s executive order mandating the application of social science findings to make government policies more effective.

    In a conversation moderated by Kenneth Prewitt (Columbia University), Cass Sunstein (Harvard Law School) and Daniel Kahneman (Princeton University) reflect on the changes this executive order is likely to produce and the kinds of social science findings that are most relevant to this effort to make policy more effective as well as whether there may be a downside.

    Thursday, March 3, starting at 6 p.m., The New School, Theresa Lang Community and Student Center, 2nd Floor, 55 West 13th St.

     

    The 2016 Machine Learning Conference NYC



    The Machine Learning Conference (MLconf) began in 2012, as a partnership with Carnegie Mellon University’s GraphLab team, to gather the thought leaders in Machine Learning, specifically Graph Databases.

    Friday, April 15, 230 Fifth Ave, NYC

     
    Deadlines



    Open Web Fellows Program


    The Open Web Fellows program, a collaboration between the Ford Foundation and Mozilla, is an international leadership initiative that brings together technology talent and civil society organizations to advance and protect the open Web.

    Deadline to apply is Sunday, March 20.

     

    Call for Proposals – 2016 Data Science for Social Good


    The UW eScience Institute is excited to announce its second offering of the Data Science for Social Good summer program.

    We invite short proposals (1-2 pages) for 10-week data-intensive research projects requiring collaboration in scalable data management, statistical analysis and machine learning, open source software development, cloud and cluster computing, and/or visualization. Accepted projects will be undertaken by a team of four interdisciplinary students and co-led by the proposal PI and a Data Scientist in the eScience Institute (see Project Teams below).

    Deadline for proposals is Wednesday, March 30.

     

    Call for Data Science Fellow Applications


    The Berkeley Institute for Data Science (BIDS) invites applications for our Data Science Fellow Program.

    Successful applicants will join our current cohort of fellows in helping make data analysis easier in the research sciences. BIDS data science fellows are postdoctoral scholars, graduate student researchers, or staff with excellent credentials in their fields as well as strong interests in advancing data-analysis approaches with a community of like-minded individuals from across campus.

    Deadline to apply is Friday, April 1.

     

    Data & Society is hiring two postdoctoral scholars.


    Data & Society is looking for two postdocs whose work complements research projects currently underway at the Institute. Postdocs will be expected to spend half of their time directly contributing to an existing Data & Society project, under the supervision of the project lead, and half to their own research, under the supervision of danah boyd.

    Postdocs will publish research and advance scholarly conversations while engaging with broader constituencies across numerous disciplines and perspectives. They will be expected to be active members of the Data & Society community – participating in seminars, research exchange, and events.

    Deadline to submit materials is Friday, April 1.

     

    Call for Participation – DML2016


    Join us October 5-7, 2016 for the 7th annual Digital Media and Learning Conference. This international gathering brings together a vibrant and diverse community of innovators, thinkers, and progressive educators to delve into leading-edge topics in digital media and learning. We build connections across research, design, and practice in the service of progressive, equitable, and youth-centered approaches to learning with technology.

    Deadline to apply to participate is Monday, April 4.

     
    Tools & Resources



    Kafka: The Definitive Guide 

    O'Reilly Media; Neha Narkhede, Gwen Shapira, Todd Palino


    from February 29, 2016

    Learn how to take full advantage of Apache Kafka, the distributed, publish-subscribe queue for handling real-time data feeds. With this comprehensive book, you’ll understand how Kafka works and how it’s designed. Authors Neha Narkhede, Gwen Shapira, and Todd Palino show you how to deploy production Kafka clusters; secure, tune, and monitor them; write rock-solid applications that use Kafka; and build scalable stream-processing applications.

     

    Create standalone simulation tools to facilitate collaboration between HEP and machine learning community

    The Journal of Brief Ideas; Pierre Baldi et al.


    from March 03, 2016

    Discussions at recent workshops have made it clear that one of the key barriers to collaboration between high energy physics and the machine learning community is access to training data. Recent successes in data sharing through the HiggsML and Flavours of Physics Kaggle challenges have borne much fruit, but required significant effort to coordinate.

    While static simulated datasets are useful for challenges, in the course of investigating new machine learning techniques it is advantageous to be able to generate training data on demand (e.g. Refs. 1, 2, 3). Therefore we recommend efforts be made to produce the ingredients required to facilitate such collaboration.

     

    TensorFlow machine learning with financial data on Google Cloud Platform

    Google Cloud Platform Blog


    from March 02, 2016

    If you knew what happened in the London markets, how accurately could you predict what will happen in New York? It turns out, this is a great scenario to be tackled by machine learning!

    The premise for this problem is that by following the sun and using data from markets that close earlier, such as London, which closes 4.5 hours ahead of New York, you could more accurately predict market behaviors 7 out of 10 times.

    We’ve published a new solution, TensorFlow Machine Learning with Financial Data on Google Cloud Platform, that looks at this problem. We hope you’ll enjoy exploring it with us interactively in the Google Cloud Datalab notebook we provide.
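
    To make the "follow the sun" premise concrete, here is a minimal sketch of a naive baseline: assume the later-closing market moves in the same direction as the earlier-closing market did that day, and count how often that holds. This is not the published Google solution, which trains a TensorFlow model. The function names are hypothetical and the prices are made up for illustration, not real market data.

```python
import math


def log_returns(closes):
    """Day-over-day log returns for a list of closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def directional_hit_rate(early_closes, late_closes):
    """Fraction of days on which the later-closing market moved in the same
    direction as the earlier-closing market did on the same calendar day.

    Days on which either market is exactly flat are skipped.
    """
    early = log_returns(early_closes)
    late = log_returns(late_closes)
    pairs = [(e, l) for e, l in zip(early, late) if e != 0 and l != 0]
    if not pairs:
        raise ValueError("no days with a non-zero move in both markets")
    hits = sum(1 for e, l in pairs if (e > 0) == (l > 0))
    return hits / len(pairs)


# Made-up closing prices for illustration only (NOT real market data).
london = [100.0, 101.0, 100.5, 102.0, 101.0, 103.0]
new_york = [200.0, 201.5, 201.0, 200.0, 199.0, 202.0]

rate = directional_hit_rate(london, new_york)
print(f"Same-direction hit rate: {rate:.2f}")
```

    A learned model would be judged against a baseline like this one: it is only interesting if it beats the simple rule of copying the earlier market's direction.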

     
