Data Science newsletter – February 15, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for February 15, 2017

GROUP CURATION: N/A

 
 
Data Science News



Why 2017 is Fitbit’s year of machine learning

Mixpanel, The Signal blog



Fitbit’s product has a very good problem. The amount of data it tracks can be overwhelming. The devices ingest fitness data and feed it into the applications, where it’s paired with data from the user’s app interactions. From these data sets, the data science team can build a comprehensive user profile. There are occupational hazards, though. What you gain in information, you can lose in clarity.

“The volume of data does make it challenging,” Raj says. “We have to make sure that we’re scaling both our hardware and our ETL processes. Storage is essentially a solved problem, so we focus our efforts towards compute time and processing.”


I am excited to announce that in March I will be running The Engine. Can’t wait to build the team.

Twitter, Katie Rae




The Mole Mapper Study, mobile phone skin imaging and melanoma risk data collected using ResearchKit

Nature, Scientific Data; Sancy A. Leachman, Brian M. Bot et al.



Sensor-embedded phones are an emerging facilitator for participant-driven research studies. Skin cancer research is particularly amenable to this approach, as phone cameras enable self-examination and documentation of mole abnormalities that may signal a progression towards melanoma. Aggregation and open sharing of this participant-collected data can be foundational for research and the development of early cancer detection tools. Here we describe data from Mole Mapper, an iPhone-based observational study built using the Apple ResearchKit framework. The Mole Mapper app was designed to collect participant-provided images and measurements of moles, together with demographic and behavioral information relating to melanoma risk. The study cohort includes 2,069 participants who contributed 1,920 demographic surveys, 3,274 mole measurements, and 2,422 curated mole images. Survey data recapitulates associations between melanoma and known demographic risks, with red hair as the most significant factor in this cohort. Participant-provided mole measurements indicate an average mole size of 3.95 mm. These data have been made available to engage researchers in a collaborative, multidisciplinary effort to better understand and prevent melanoma.


Next-generation 3D Graphics on the Web

Apple, WebKit, Dean Jackson



Apple’s WebKit team today proposed a new Community Group at the W3C to discuss the future of 3D graphics on the Web, and to develop a standard API that exposes modern GPU features including low-level graphics and general purpose computation. W3C Community Groups allow all to freely participate, and we invite browser engineers, GPU hardware vendors, software developers and the Web community to join us.

To kick off the discussion, we’re sharing an API proposal, and a prototype of that API for the WebKit Open Source project. We hope this is a useful starting point, and look forward to seeing the API evolve as discussions proceed in the Community Group.


AAAS and Gates Foundation Partnership Announcement

Science



The American Association for the Advancement of Science (AAAS) and the Bill & Melinda Gates Foundation have formed a partnership to advance scientific communication and open access publishing. The partnership will also ensure open access to research funded by the Gates Foundation and published in the Science family of journals.


The Neural Network: How artificial intelligence is fuelling ‘Phasebook’

Perimeter Institute, Inside the Perimeter



A machine learning algorithm designed to teach computers how to recognize photos, speech patterns, and hand-written digits has now been applied to a vastly different set of data: identifying different phases of condensed matter.

In a project half-jokingly called “Phasebook,” two Perimeter researchers showed that a neural network system – a standard part of today’s powerful artificial intelligence (AI) algorithms – can also identify phase transitions between states of matter. The research, published today in the journal Nature Physics, validates the idea that the relationship between theoretical physics and AI can be a fruitful, two-way exchange.


What’s The Best Path To Becoming A Data Scientist?

Forbes, Quora, Monica Rogati



There’s a lot of interest in becoming a data scientist, and for good reasons: high impact, high job satisfaction, high salaries, high demand. A quick search yields a plethora of possible resources that could help — MOOCs, blogs, Quora answers to this exact question, books, Master’s programs, bootcamps, self-directed curricula, articles, forums and podcasts. Their quality is highly variable; some are excellent resources and programs, some are click-bait laundry lists. Since this is a relatively new role and there’s no universal agreement on what a data scientist does, it’s difficult for a beginner to know where to start, and it’s easy to get overwhelmed.


Scientific data used to track and protect animals is vulnerable to hacking

CBC Radio, Day 6 with Brett Bambury



Animal tracking data is a useful tool for understanding how animals behave, and in the case of this bison project, how well they’re adapting to their new environment.

However, as useful as it is, a recent report in the journal Conservation Biology has found that this data is vulnerable to hackers, poachers, and members of the public looking to interfere with the animals. As the report’s lead author, Steven Cooke, puts it, “There’re always going to be people with nefarious purposes.” [audio, 7:22]


Study: Climate change puts songbirds at risk

The Recorder, Greenfield MA



A new study of songbird dehydration and survival risk during heat waves in the United States desert Southwest suggests that some birds are at risk of lethal dehydration and mass die-offs when water is scarce, and the risk is expected to increase as climate change advances.

Using physiological data, hourly temperature maps, and modeling, first author Tom Albright at the University of Nevada, Reno, with Blair Wolf at the University of New Mexico and Alexander Gerson at the University of Massachusetts Amherst, investigated how rates of evaporative water loss varied across five bird species of differing body mass. They mapped the potential effects of current and future heat waves on lethal dehydration risk for songbirds in the Southwest, and how rapidly dehydration can set in for each species. The findings appear in Proceedings of the National Academy of Sciences.


Twitter hopes machine learning can save it from oblivion

VentureBeat, Chris O'Brien



[CEO Jack] Dorsey credited the company’s machine learning efforts for that success. Now he hopes they can take that further, beyond just the timeline, and expand it to other parts of Twitter.

“The other thing that we’re investing a lot in is making sure that we apply machine learning more broadly around our entire experience,” Dorsey said. “We can get a lot smarter and provide more magical experiences for people around showing them what’s breaking in real time and giving them a sense of what’s going on without having to do as much work as they currently have to do on the platform.”

The goal seems to be to use machine learning to pull away even farther from the real-time experience to a more relevant, curated one.


Artificial Intelligence Music and Soundtracks are a thing now

Knowmail, Matthew T. Alan



Artificially intelligent music and soundtracks? Yeah, that’s a thing now. AI is slowly but steadily moving in on the music industry. From composing to collaborating with humans, AI is proving to be a huge player in music creation. While the field is still in its early stages, artificial intelligence and music make quite a compelling duo… at least from what we can see so far.

AI technologies such as Jukedeck, Sony’s Flow Machines, and IBM’s Watson Beat are giving brands new ways to get unique-sounding music for their marketing videos. While hiring a musician to write a jingle can cost anywhere from a few hundred to a couple of thousand dollars, an AI-composed jingle from Jukedeck will run you only about $21.99.


The First Two Months of D4D

Medium, Data For Democracy



Since its inception in December 2016, the small community of Data For Democracy volunteers has grown into a network of over 700 people, spanning a range of locations, timezones, and backgrounds — as you can see on our brand new website.

This group of passionate and civic-minded people is applying a diverse set of skills and knowledge to an equally varied selection of projects, and has made remarkable progress in the past two months! Here are some highlights of what we’ve been up to.


The Next Big Blue-Collar Job Is Coding

WIRED, Business, Clive Thompson



When I ask people to picture a coder, they usually imagine someone like Mark Zuckerberg: a hoodied college dropout who builds an app in a feverish 72-hour programming jag—with the goal of getting insanely rich and, as they say, “changing the world.”

But this Silicon Valley stereotype isn’t even geographically accurate. The Valley employs only 8 percent of the nation’s coders. All the other millions? They’re more like Devon, a programmer I met who helps maintain a security-software service in Portland, Oregon. He isn’t going to get fabulously rich, but his job is stable and rewarding: It’s 40 hours a week, well paid, and intellectually challenging. “My dad was a blue-collar guy,” he tells me—and in many ways, Devon is too.


WooCast: How the Digital Cookies Crumble

Woodrow Wilson School of Public and International Affairs



Can liberty survive the digital age? In this episode, Princeton University professors Jennifer Rexford and Janet Vertesi discuss internet infrastructure and its effect on how people use the web as a vehicle for communication and information. [audio, 42:15]


Melinda Gates on the importance (and lack) of big data in global health

GeekWire, Clare McGrane



Speaking at an event marking the tenth anniversary of the University of Washington’s Department of Global Health on Wednesday night, Gates said collaboration between science, tech, and global health organizations is one reason they’re starting to see more data.

“When you start to see how biology is coming together with computer science; how computer science and mathematical modeling are coming together with diseases; how we’re taking global health and looking at our local population… the amount of data that we’re finally getting and combing through is making us so much more knowledgeable about where the problems are,” she said. “Not just the problems at a scale level, but the problems at a community level and how to actually intervene.”


Machine Learning Predicts Leukemia Remission with 100% Accuracy

HealthIT Analytics



Researchers at Indiana University-Purdue University Indianapolis have developed a machine learning algorithm that can correctly predict relapse rates for acute myelogenous leukemia (AML) with 90 percent accuracy – and remission rates 100 percent of the time.

Using bone marrow and medical history data from AML patients along with information from healthy individuals, the small study highlighted the potential for machine learning to eventually replace traditional manual flow cytometry data analysis for highly accurate and timely results.


AI Software Juggles Probabilities to Learn from Less Data

MIT Technology Review, Will Knight



A Boston-based startup called Gamalon has developed technology that lets computers learn from far less data in some situations, and it is releasing two products Tuesday based on the approach.

If the underlying technique can be applied to many other tasks, then it could have a big impact. The ability to learn from less data could let robots explore and understand new environments very quickly, or allow computers to learn about your preferences without sharing your data.
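The general idea of reasoning with probabilities so that a handful of observations still yields a sensible estimate can be illustrated with a textbook Bayesian update. This is a minimal sketch assuming a conjugate Beta prior on a single unknown success probability; it is generic Bayesian inference, not Gamalon’s own method, and the class name `BetaBelief`, the prior, and the counts are hypothetical.

```python
# Minimal sketch: Bayesian updating with a conjugate Beta prior.
# Generic textbook probability, NOT Gamalon's technique; the class name,
# prior parameters, and observation counts are hypothetical.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown success probability p, encoded as Beta(alpha, beta)."""

    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> BetaBelief:
        # Conjugacy: a Beta prior combined with binomial data gives a Beta
        # posterior, so updating just adds the observed counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        # Expected value of p under the current belief.
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    prior = BetaBelief(alpha=2.0, beta=2.0)  # weak prior centred on 0.5
    posterior = prior.update(successes=3, failures=0)  # only three observations
    print(f"prior mean {prior.mean():.3f}, posterior mean {posterior.mean():.3f}")
```

Three successes out of three would give a raw frequency of 1.0; the weak Beta(2, 2) prior tempers that to a posterior mean of 5/7 (about 0.714), and the estimate sharpens as more counts arrive.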


The dirty secret of machine learning

O'Reilly Radar, Jenn Webb and David Beyer



“Too many businesses now are pitching AI almost as though it’s batteries included. I think that’s dangerous because it’s going to potentially lead to over-investment in things that overpromise. Then when they under-deliver, it has a deflationary effect on people’s attitudes toward the space. It almost belittles the problem itself. Not everything requires the latest whiz-bang technology. In fact, the dirty secret of machine learning—and, in a way, venture capital—is so many problems could be solved by just applying simple regression analysis. Yet, very few people, very few industries do the bare minimum.” [video, 26:48]
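To show how little machinery the “simple regression analysis” mentioned in the quote requires, here is a minimal sketch of one-predictor ordinary least squares in plain Python. The function name `fit_simple_regression` and the ad-spend data are hypothetical, not taken from the interview.

```python
# Minimal sketch: simple (one-predictor) linear regression by ordinary
# least squares. Function name and data are hypothetical illustrations.
from __future__ import annotations

from statistics import fmean


def fit_simple_regression(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) minimising the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical data: weekly ad spend vs. sales, both in thousands.
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [3.1, 4.9, 7.2, 8.8, 11.0]
    b0, b1 = fit_simple_regression(spend, sales)
    print(f"sales ~ {b0:.2f} + {b1:.2f} * spend")
```

With this toy data the fitted slope is 19.7 / 10 = 1.97 and the intercept is 1.09. For several predictors the same least-squares idea applies, and NumPy’s `numpy.linalg.lstsq` solves it directly.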


Roger Penrose and the vision thing

Prospect Magazine, Philip Ball



Penrose is still defining the way we see the universe. But, in today’s world of ultra-specialised science, could a thinker of such breadth ever emerge again?


Heavyweight funders back central site for life-sciences preprints

Nature News & Comment, Ewan Callaway



Life scientists keen to share their findings online before peer review are spoilt for choice. Whereas physicists gravitate to one repository — the ‘preprint’ server arXiv — life sciences has a fast-growing roster of venues for preprints. There’s the biology-focused bioRxiv, and a biology section on arXiv too. But other sites have sprouted up in the past year, or soon will do, and these too provide opportunities for life sciences: ChemRxiv for chemistry, psyArXiv for psychology; even AgriXiv for agricultural sciences and paleorXiv for palaeontology.

Now, a coalition of biomedical funders and scientists is throwing its weight behind a ‘one-stop shop’ for all life-sciences preprints — a move that its backers argue should clarify any confusion and make it easier to mine the preprint literature for insights. On 13 February, ASAPbio, a grassroots group of biologists that advocates for preprints, issued a funding call to build a central preprint site; the US National Institutes of Health (NIH), the Wellcome Trust and several other leading funders announced their support for the concept.

 
Events



CodeX FutureLaw 2017

Stanford Law School



Palo Alto, CA: On April 6, 2017, CodeX – the Stanford Center for Legal Informatics – will host CodeX FutureLaw 2017, its fifth annual conference on how technology is changing the legal profession and the law itself, and how these changes affect us all. [$$$]


Looking forward to the 2017 Five College DataFest

Five College DataFest



Amherst, MA: Dates have been set for the 2017 Five College DataFest! The event will take place March 31 – April 2, 2017. Students from the five colleges (Smith, Mount Holyoke, Amherst, Hampshire, and UMass Amherst) are invited to participate. Registration will open soon.

 
Deadlines



MDST announces Detroit blight data challenge

The Michigan Data Science Team and the Michigan Student Symposium for Interdisciplinary Statistical Sciences have partnered with the City of Detroit on a data challenge that seeks to answer the question: How can blight ticket compliance be increased? The competition runs through March 15.

Call for Big Boulder 2017 Speakers

Boulder, CO: The conference is June 1–2. The deadline for submissions is Friday, April 28.
 
NYU Center for Data Science News



Two grants to Sam Bowman

New York University, Department of Linguistics



Sam Bowman’s research group was awarded two grants this month:

  • A Google Faculty Research Award to Bowman, Adina Williams, and Nikita Nangia, entitled “A Corpus and Challenge for Cross-Genre Natural Language Understanding,” will support the creation of a new public benchmark dataset for sentence understanding technologies.
  • An NVIDIA GPU Seed Grant will provide hardware to support three projects, headed by WooJin Chung, Andrew Drozdov and Kelly Zhang.


Tools & Resources



Big Data Meets Computation workshop

IPAM


Videos now available for most presentations.


Deep Learning & Parameter Tuning with MXnet, H2o Package in R

HackerEarth Blog


“In this tutorial, I’ll start with the basics of neural networks and deep learning (from scratch). Along with theory, we’ll also learn to build deep learning models in R using MXNet and H2O package.”


Twitter CLI

GitHub – sferik


A command-line power tool for Twitter.


Exploring New Technologies for Making Maps. Vector Tiles & WebGL (Part One)

Bocoup, Yannick Assogba


“This blog post, and the next one in the series, aim to share what I learned in exploring Tangram. This post will cover basic setup and concepts and also explore visualizing data on a choropleth map.”
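A choropleth map shades each region according to a data value, usually after grouping the values into a few classes. The sketch below shows only that classification step, in plain Python with quantile breaks; it is not Tangram’s API, and the region names, values, and four-colour palette are hypothetical.

```python
# Minimal sketch: the classification step behind a choropleth map.
# Values are grouped into quantile classes and each class gets a colour.
# NOT Tangram's API; regions, values, and palette are hypothetical.
from __future__ import annotations

from bisect import bisect_right
from statistics import quantiles

PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]  # light to dark reds


def quantile_breaks(values: list[float], k: int) -> list[float]:
    """Return the k-1 cut points splitting values into k roughly equal-count classes."""
    return quantiles(values, n=k, method="inclusive")


def classify(value: float, breaks: list[float]) -> int:
    """Index of the class a value falls into (0 = lowest class)."""
    return bisect_right(breaks, value)


def choropleth_colours(data: dict[str, float], palette: list[str]) -> dict[str, str]:
    """Map each region to the palette colour of its quantile class."""
    breaks = quantile_breaks(list(data.values()), len(palette))
    return {region: palette[classify(v, breaks)] for region, v in data.items()}


if __name__ == "__main__":
    density = {"A": 12.0, "B": 45.0, "C": 80.0, "D": 150.0,
               "E": 300.0, "F": 22.0, "G": 95.0, "H": 510.0}
    for region, colour in choropleth_colours(density, PALETTE).items():
        print(region, colour)
```

Quantile breaks put roughly the same number of regions in each colour class, so one extreme value (region H here) does not wash out the rest of the map; equal-interval breaks are the common alternative.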

     
Careers


Postdocs

Postdoctoral fellow position to study microbiome-behavior interactions

Champalimaud Neuroscience Programme; Lisbon, Portugal
