Data collaboratives are partnerships to share and use data for the public good. They bring together organizations that hold data, organizations that can derive insights from that data, and organizations that can take action or make decisions informed by those insights. They often involve collaborations between companies, university researchers, non-profit organizations, and government agencies.
Much of the excitement about data collaboratives stems from the flood of data from the private sector, such as social media data, mobile call detail records, satellite imagery, and e-commerce transactions. There are many scenarios in which private sector data can be reused to create additional public value by helping to solve some important societal problem.
A portable research lab developed by University of Wisconsin–Madison scientists and engineers recently returned home to Madison following a 22,000-mile journey to the Philippine Sea and back during the heart of monsoon season.
The lab, known as SPARCLET, traveled aboard the research vessel Thomas G. Thompson for two months to aid in a study called the Propagation of Intra-Seasonal Tropical Oscillations, or PISTON. It is aimed at better understanding how pollutants and turbulent conditions over the Philippine Sea affect the region and influence global weather.
If you want official numbers on how 2018 ranks in the annals of recent record-breaking temperatures, you’ll have to wait.
One result of the government shutdown, now in its fourth week, is that NASA and the National Oceanic and Atmospheric Administration are unable to issue their annual temperature analysis. And, because that data is so widely used, neither can some other governments.
Facebook became an incredibly successful advertising platform in part because it allows marketers to show people ads using fine-grained categories, which are generated based on an individual’s behavior. The company says this allows it to show users ads that are more relevant to their interests. But its data collection practices also have led to a series of privacy scandals over the past several years, along with increased scrutiny from lawmakers around the globe.
In response to questions about its targeting practices, Facebook has said that anyone can use the platform's ad preferences menu to see and control how Facebook has categorized them. But a new survey from Pew Research Center suggests that the vast majority of US users aren't aware that Facebook tracks their interests and traits this way. When respondents found out, most said they were uncomfortable with the assumptions the social network had made.
The promise of so-called “low code/no code” software-development tools is to enable anyone to create business applications around their custom needs. It sounds like Amazon Web Services is getting ready to extend that idea to everyone.
Based on several LinkedIn resumes and a recent tech talk, it now seems like more than 50 engineers are working on a secretive low-code/no-code project that's part of an effort called AWS For Everyone. Earlier reports indicated that AWS has for some time been working on a cloud service that would allow people with little to no software development experience to create simple business applications without having to call up the IT department, but it wasn't clear what that entailed.
The age of artificial intelligence is upon us. AI is no longer a future technology but a present one. The AI revolution is highly global, with nations such as China playing a leading role in AI innovation. The 116th Congress has a valuable role to play in ensuring continued American competitiveness in AI innovation, especially through human capital development and smart, sensible regulation.
The U.S. lacks a comprehensive national AI strategy. By contrast, over a dozen other nations and international organizations have published AI strategies. For example, the European Union has released its AI strategy with a focus on investing in its innovation ecosystem, developing talent, building a common data space in compliance with data principles, and developing ethics to create trust. According to the EU Commission, “the ambition is then to bring Europe’s ethical approach to the global stage.”
If you’re looking for a window into contemporary youth culture, there is nothing better than TikTok. The social short-video app’s primary feature, enabled by licensing agreements with music rights holders, lets users record themselves lip-syncing to popular music, but it also plays host to a rapidly flourishing meme ecosystem. Spend a modicum of time with its videos and you’ll notice recurring motifs: Fortnite dances, T-poses, salutes, kids tying nooses (made of toilet paper) around their necks. But most pervasive is that essential tradition of youth: irony.
If you download TikTok and flip through the creative, lighthearted video clips trending on the app’s own network, you might feel relaxed. It’s just people goofing around and having fun, remixing soundbites and running jokes! TikTok can often seem like an oasis, a retreat from the more toxic sectors of the internet.
Data Artisans was founded in 2014 by the team leading the development of Apache Flink, an open source large-scale data processing technology. The startup offers its own dA Platform, with open source Apache Flink and Application Manager, to enterprise customers that include Netflix, ING, Uber and Alibaba itself.
The Chinese e-commerce giant has been working with Data Artisans since 2016, through support and open source work to help the architecture and performance of the software, both companies said in statements. Data Artisans is on record as raising $6.5 million over two rounds, most recently a Series A in 2016 led by Intel Capital. But according to a blog post from Data Artisans co-founders Kostas Tzoumas and Stephan Ewen, there was a seemingly unannounced Series B that closed last year, and it appears Alibaba was involved.
Can machine learning address the kind of ethnic and racial disparities in the criminal justice system that propelled the Black Lives Matter movement? Or help develop personalized treatments for the “silent epidemic” of brain injury, which affects 1.5 million Americans every year?
Two teams of faculty from seven disciplines will begin answering those questions with the second set of Phase I grants awarded by the University of Miami Laboratory for Integrative Knowledge, or U-LINK. A key initiative of the Roadmap to Our New Century, U-LINK was launched two years ago to foster interdisciplinary collaborations and new approaches to complex problems across the University.
There is something that is weighing heavily on the minds of some infectious diseases scientists these days. It’s not the challenging Ebola outbreak in the Democratic Republic of the Congo, though that is deeply concerning. It’s not a new flu virus or slashed research budgets or laboratory safety violations.
It’s an international treaty. More specifically, it’s an agreement within a treaty that could, depending on how negotiations play out, make it extraordinarily difficult to conduct disease surveillance or forge research collaborations around the world.
The agreement — known as the Nagoya Protocol — could drown researchers in oceans of paperwork and hobble the world’s scientists when they must next race to combat a new disease disaster, some fear.
The editorial board of an influential scientometrics journal — the Journal of Informetrics — has resigned in protest over the open-access policies of its publisher, Elsevier, and launched a competing publication.
The board told Nature that given the journal’s subject matter — the assessment and dissemination of science — it felt it needed to be at the forefront of open publishing practices, which it says includes making bibliographic references freely available for analysis and reuse, and being open access and owned by the community.
Bellevue, WA May 29-June 1, 2019. “SDSS provides a unique opportunity for data scientists, computer scientists, and statisticians to come together and exchange ideas.” [$$$]
London, England Feb. 11-13 at Imperial College London. “The intersection of the fields of dynamical systems and machine learning is largely unexplored, and the goal of this symposium is to bring together researchers from these fields to fill the gap between the theories of dynamical systems and machine learning.” [Pre-registration required]
San Francisco, CA July 10-11. “Transform’s content is focused on the strategic and practical applications of AI. These include case studies, panels, and workshops.” [$$$$]
Austin, TX January 25 at AT&T Conference Center. “We decided to create a new event dedicated solely to Artificial Intelligence – The Texas AI Summit – to be held on the day before Data Day Texas. This allows folks the opportunity to attend Data Day, or The Texas AI Summit — or both.” [$$$]
University of California-Berkeley, RISE Lab; Sanjay Krishnan, Zongheng Yang, Joe Hellerstein, and Ion Stoica
What is the role of machine learning in the design and implementation of a modern database system? This question has sparked considerable recent introspection in the data management community, and the epicenter of this debate is the core database problem of query optimization, where the database system finds the best physical execution path for an SQL query.
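The core idea can be sketched in a few lines: a classical optimizer enumerates candidate execution plans and ranks them with a cost model, and learned approaches replace that hand-tuned cost model with a trained one. Below is a minimal illustrative sketch, with made-up table cardinalities and a crude selectivity factor that stand in for a real (or learned) cost estimator; it is not the paper's method.

```python
from itertools import permutations

# Hypothetical table cardinalities (illustrative only).
CARDINALITY = {"users": 1_000_000, "orders": 5_000_000, "items": 200}

def estimated_cost(join_order):
    """Toy cost model: score a left-deep join plan by the sum of its
    intermediate result sizes, approximated with a fixed selectivity.
    A learned optimizer would replace this function with a trained model."""
    cost = 0.0
    running = CARDINALITY[join_order[0]]
    for table in join_order[1:]:
        running *= CARDINALITY[table] * 1e-6  # crude join selectivity
        cost += running
    return cost

def optimize(tables):
    """Exhaustively enumerate join orders and return the cheapest plan."""
    return min(permutations(tables), key=estimated_cost)

best = optimize(["users", "orders", "items"])
# Joining the small "items" table early keeps intermediate results small.
```

The enumeration here is brute force, which only works for a handful of tables; real optimizers prune the search space, and the debate the abstract describes is about whether learning can improve the cost estimates that guide that search.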
We present a global dataset of anthropogenic carbon dioxide (CO2) emissions for 343 cities. The dataset builds upon data from CDP (187 cities, few in developing countries), the Bonn Center for Local Climate Action and Reporting (73 cities, mainly in developing countries), and data collected by Peking University (83 cities in China). Because the CDP data are self-reported by cities, we applied quality control procedures, documented the type of emissions and reporting method used, and made a correction to separate CO2 emissions from those of other greenhouse gases. Further, a set of ancillary data that have a direct or potentially indirect impact on CO2 emissions were collected from other datasets (e.g. socio-economic and traffic indices) or calculated (climate indices, urban area expansion), then combined with the emission data. We applied several quality controls and validation comparisons with independent datasets. The dataset presented here is not intended to be comprehensive or a representative sample of cities in general, as the choice of cities is based on self-reporting, not a designed sampling procedure.
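The workflow the abstract describes, combining self-reported emissions records with ancillary indicators and flagging records that fail basic quality control, can be sketched as follows. All field names and the QC rule here are hypothetical stand-ins, not the dataset's actual schema or procedures.

```python
# Illustrative sketch: join city emissions records with ancillary data
# and flag records failing a simple quality-control check.
# (Hypothetical fields: "co2_kt", "population"; not the real schema.)

emissions = [
    {"city": "CityA", "co2_kt": 5200.0, "source": "CDP"},
    {"city": "CityB", "co2_kt": -10.0, "source": "CDP"},  # bad record
]
ancillary = {
    "CityA": {"population": 1_200_000},
    "CityB": {"population": 800_000},
}

def combine(emissions, ancillary):
    """Merge each emissions record with its city's ancillary indicators
    and attach a QC flag; a real pipeline would use richer checks."""
    combined = []
    for rec in emissions:
        row = dict(rec)
        row.update(ancillary.get(rec["city"], {}))
        row["qc_pass"] = rec["co2_kt"] > 0  # toy QC rule: must be positive
        combined.append(row)
    return combined

rows = combine(emissions, ancillary)
```

Keeping failed records with a flag, rather than dropping them, mirrors the documentation-oriented approach the abstract describes, where reporting methods and corrections are recorded alongside the data.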
Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane
The class on which this book is based was created in response to a very real challenge: how to introduce new ideas and methodologies about economic and social measurement into a workplace focused on producing high-quality statistics. We are deeply grateful for the inspiration and support of Census Bureau Director John Thompson and Deputy Director Nancy Potok in designing and implementing the class content and structure.