Fitbit has a good problem to have: the amount of data its products track can be overwhelming. The devices ingest fitness data and feed it into the applications, where it’s paired with data from the user’s app interactions. From these data sets, the data science team can build a comprehensive user profile. There are occupational hazards, though: what you gain in information, you can lose in clarity.
“The volume of data does make it challenging,” Raj says. “We have to make sure that we’re scaling both our hardware and our ETL processes. Storage is essentially a solved problem, so we focus our efforts on compute time and processing.”
from Nature, Scientific Data; Sancy A. Leachman, Brian M. Bot et al.
Sensor-embedded phones are an emerging facilitator for participant-driven research studies. Skin cancer research is particularly amenable to this approach, as phone cameras enable self-examination and documentation of mole abnormalities that may signal a progression towards melanoma. Aggregation and open sharing of this participant-collected data can be foundational for research and the development of early cancer detection tools. Here we describe data from Mole Mapper, an iPhone-based observational study built using the Apple ResearchKit framework. The Mole Mapper app was designed to collect participant-provided images and measurements of moles, together with demographic and behavioral information relating to melanoma risk. The study cohort includes 2,069 participants who contributed 1,920 demographic surveys, 3,274 mole measurements, and 2,422 curated mole images. Survey data recapitulates associations between melanoma and known demographic risks, with red hair as the most significant factor in this cohort. Participant-provided mole measurements indicate an average mole size of 3.95 mm. These data have been made available to engage researchers in a collaborative, multidisciplinary effort to better understand and prevent melanoma.
Apple’s WebKit team today proposed a new Community Group at the W3C to discuss the future of 3D graphics on the Web, and to develop a standard API that exposes modern GPU features including low-level graphics and general purpose computation. W3C Community Groups allow all to freely participate, and we invite browser engineers, GPU hardware vendors, software developers and the Web community to join us.
To kick off the discussion, we’re sharing an API proposal, and a prototype of that API for the WebKit Open Source project. We hope this is a useful starting point, and look forward to seeing the API evolve as discussions proceed in the Community Group.
The American Association for the Advancement of Science (AAAS) and the Bill & Melinda Gates Foundation have formed a partnership to advance scientific communication and open access publishing. The partnership will also ensure open access to research funded by the Gates Foundation and published in the Science family of journals.
A machine learning algorithm designed to teach computers how to recognize photos, speech patterns, and hand-written digits has now been applied to a vastly different set of data: identifying different phases of condensed matter.
In a project half-jokingly called “Phasebook,” two Perimeter researchers showed that a neural network system – a standard part of today’s powerful artificial intelligence (AI) algorithms – can also identify phase transitions between states of matter. The research, published today in the journal Nature Physics, validates the idea that the relationship between theoretical physics and AI can be a fruitful, two-way exchange.
There’s a lot of interest in becoming a data scientist, and for good reasons: high impact, high job satisfaction, high salaries, high demand. A quick search yields a plethora of possible resources that could help — MOOCs, blogs, Quora answers to this exact question, books, Master’s programs, bootcamps, self-directed curricula, articles, forums and podcasts. Their quality is highly variable; some are excellent resources and programs, some are click-bait laundry lists. Since this is a relatively new role and there’s no universal agreement on what a data scientist does, it’s difficult for a beginner to know where to start, and it’s easy to get overwhelmed.
Animal tracking data is a useful tool for understanding how animals behave, and in the case of this bison project, how well they’re adapting to their new environment.
However, as useful as it is, a recent report in the journal Conservation Biology has found that this data is vulnerable to hackers, poachers, and members of the public looking to interfere with the animals. As Steven Cooke, the lead author of that report, puts it, “There’re always going to be people with nefarious purposes.” [audio, 7:22]
A new study of songbird dehydration and survival risk during heat waves in the United States desert Southwest suggests that some birds are at risk of lethal dehydration and mass die-offs when water is scarce, and the risk is expected to increase as climate change advances.
Using physiological data, hourly temperature maps and modeling, first author Tom Albright at the University of Nevada, Reno, with Blair Wolf at the University of New Mexico and Alexander Gerson at the University of Massachusetts Amherst investigated how rates of evaporative water loss varied among five bird species of differing body mass. They mapped potential effects of current and future heat waves on lethal dehydration risk for songbirds in the Southwest and how rapidly this can occur in each species. Details appear in the Proceedings of the National Academy of Sciences.
[CEO Jack] Dorsey credited the company’s machine learning efforts for that success. Now he hopes they can take that further, beyond just the timeline, and expand it to other parts of Twitter.
“The other thing that we’re investing a lot in is making sure that we apply machine learning more broadly around our entire experience,” Dorsey said. “We can get a lot smarter and provide more magical experiences for people around showing them what’s breaking in real time and giving them a sense of what’s going on without having to do as much work as they currently have to do on the platform.”
The goal seems to be to use machine learning to pull away even farther from the real-time experience to a more relevant, curated one.
Artificially intelligent music and soundtracks? Yeah, that’s a thing now. AI is slowly but steadily moving in on the music industry. From composing to collaborating with humans, AI is proving to be a huge player in music creation. While still in its early stages, artificial intelligence and music make quite a compelling duo… at least from what we can see so far.
Artificially intelligent technologies like Jukedeck, Sony’s Flow Machine, and IBM’s Watson Beat are providing new ways for brands to get access to unique style tones for their marketing videos. While a musician can charge anywhere from a few hundred to a couple of thousand dollars for a jingle, an AI-composed jingle from Jukedeck will only run you about $21.99.
Since its inception in December 2016, the small community of Data For Democracy volunteers has grown into a network of over 700 people, spanning a range of locations, timezones, and backgrounds — as you can see on our brand new website.
This group of passionate and civic-minded people is applying a diverse set of skills and knowledge to an equally varied selection of projects, and has made remarkable progress in the past two months! Here are some highlights of what we’ve been up to.
When I ask people to picture a coder, they usually imagine someone like Mark Zuckerberg: a hoodied college dropout who builds an app in a feverish 72-hour programming jag—with the goal of getting insanely rich and, as they say, “changing the world.”
But this Silicon Valley stereotype isn’t even geographically accurate. The Valley employs only 8 percent of the nation’s coders. All the other millions? They’re more like Devon, a programmer I met who helps maintain a security-software service in Portland, Oregon. He isn’t going to get fabulously rich, but his job is stable and rewarding: It’s 40 hours a week, well paid, and intellectually challenging. “My dad was a blue-collar guy,” he tells me—and in many ways, Devon is too.
from the Woodrow Wilson School of Public and International Affairs
Can liberty survive the digital age? In this episode, Princeton University professors Jennifer Rexford and Janet Vertesi discuss internet infrastructure and its effect on how people use the web as a vehicle for communication and information. [audio, 42:15]
Speaking at an event marking the tenth anniversary of the University of Washington’s Department of Global Health on Wednesday night, Gates said collaboration between science, tech, and global health organizations is one reason they’re starting to see more data.
“When you start to see how biology is coming together with computer science; how computer science and mathematical modeling are coming together with diseases; how we’re taking global health and looking at our local population… the amount of data that we’re finally getting and combing through is making us so much more knowledgeable about where the problems are,” she said. “Not just the problems at a scale level, but the problems at a community level and how to actually intervene.”
Researchers at Indiana University-Purdue University Indianapolis have developed a machine learning algorithm that can correctly predict relapse rates for acute myelogenous leukemia (AML) with 90 percent accuracy – and remission rates 100 percent of the time.
Using bone marrow and medical history data from AML patients along with information from healthy individuals, the small study highlighted the potential for machine learning to eventually replace traditional manual flow cytometry data analysis for highly accurate and timely results.
A Boston-based startup called Gamalon has developed technology that lets computers learn from far less data in some situations, and it is releasing two products Tuesday based on the approach.
If the underlying technique can be applied to many other tasks, then it could have a big impact. The ability to learn from less data could let robots explore and understand new environments very quickly, or allow computers to learn about your preferences without sharing your data.
“Too many businesses now are pitching AI almost as though it’s batteries included. I think that’s dangerous because it’s going to potentially lead to over-investment in things that overpromise. Then when they under-deliver, it has a deflationary effect on people’s attitudes toward the space. It almost belittles the problem itself. Not everything requires the latest whiz-bang technology. In fact, the dirty secret of machine learning—and, in a way, venture capital—is so many problems could be solved by just applying simple regression analysis. Yet, very few people, very few industries do the bare minimum.” [video, 26:48]
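The “simple regression” point is easy to make concrete. Here’s a minimal ordinary least-squares fit in Python with NumPy; the data is synthetic and purely illustrative, not from any of the businesses discussed:

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=x.shape)

# Ordinary least squares: stack a column of ones so the
# model is y = intercept + slope * x, then solve in one call.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print(f"intercept ≈ {intercept:.2f}, slope ≈ {slope:.2f}")
```

That’s the whole “bare minimum”: a dozen lines, no GPUs, and for a surprising number of business questions it’s all the model you need.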
Penrose is still defining the way we see the universe. But, in today’s world of ultra-specialised science, could a thinker of such breadth ever emerge again?
Life scientists keen to share their findings online before peer review are spoilt for choice. Whereas physicists gravitate to one repository — the ‘preprint’ server arXiv — the life sciences have a fast-growing roster of venues for preprints. There’s the biology-focused bioRxiv, and a biology section on arXiv too. But other sites have sprouted up in the past year, or soon will do, and these too provide opportunities for life sciences: ChemRxiv for chemistry, psyArXiv for psychology; even AgriXiv for agricultural sciences and paleorXiv for palaeontology.
Now, a coalition of biomedical funders and scientists is throwing its weight behind a ‘one-stop shop’ for all life-sciences preprints — a move that its backers argue should clarify any confusion and make it easier to mine the preprint literature for insights. On 13 February, ASAPbio, a grassroots group of biologists that advocates for preprints, issued a funding call to build a central preprint site; the US National Institutes of Health (NIH), the Wellcome Trust and several other leading funders announced their support for the concept.
Palo Alto, CA: On April 6, 2017, CodeX, the Stanford Center for Legal Informatics, will host CodeX FutureLaw 2017, the center’s fifth annual conference focusing on how technology is changing the landscape of the legal profession, the law itself, and how these changes impact us all. [$$$]
Amherst, MA: Dates have been set for the 2017 Five College DataFest! The event will take place March 31 – April 2, 2017. Students from the five colleges (Smith, Mount Holyoke, Amherst, Hampshire, and UMass Amherst) are invited to participate. Registration will open soon.
The Michigan Data Science Team and the Michigan Student Symposium for Interdisciplinary Statistical Sciences have partnered with the City of Detroit on a data challenge that seeks to answer the question: How can blight ticket compliance be increased? The competition runs through March 15.
Sam Bowman’s research group was awarded two grants this month:
A Google Faculty Research Award entitled “A Corpus and Challenge for Cross-Genre Natural Language Understanding” to Bowman, Adina Williams, and Nikita Nangia will support the creation of a new public benchmark dataset for sentence understanding technologies.
An NVIDIA GPU Seed Grant will provide hardware to support three projects, headed by WooJin Chung, Andrew Drozdov and Kelly Zhang.
“In this tutorial, I’ll start with the basics of neural networks and deep learning (from scratch). Along with theory, we’ll also learn to build deep learning models in R using the MXNet and H2O packages.”
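The tutorial itself is in R, but the “from scratch” part translates directly to any array language. As a rough sketch of what such a from-scratch model involves (the architecture, learning rate, and iteration count below are our own arbitrary choices, not the tutorial’s), here is a tiny two-layer network trained on XOR in Python/NumPy:

```python
import numpy as np

# A tiny two-layer network trained on XOR with plain gradient descent.
rng = np.random.default_rng(42)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8))   # input -> hidden weights
W2 = rng.normal(0, 1, (8, 1))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(2000):
    # Forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: gradients of squared error through the sigmoids
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 1.0 * h.T @ d_out
    W1 -= 1.0 * X.T @ d_h

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Frameworks like MXNet and H2O compute those backward-pass gradients for you; writing one by hand once is what makes the rest of the tutorial click.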
“This blog post, and the next one in the series, aim to share what I learned in exploring Tangram. This post will cover basic setup and concepts and also explore visualizing data on a choropleth map.”