Bloomberg Philanthropies is introducing a new certification initiative that aims to set the national standard for how local governments enact evidence-based policies. Think of it as a Good Housekeeping Seal, or LEED certification, but for data-smart cities.
The New York Times, Mike McPhate, Derek Watkins and Jim Wilson
Knowing with precision how much snow has accumulated is crucial for farmers and water managers.
That’s where a mapping project at NASA’s Jet Propulsion Laboratory known as the Airborne Snow Observatory comes in. Using measurements gathered by specialized instruments on a plane, scientists have been able to gain an unprecedented understanding of the amount of water present in the Sierra’s snow.
Toronto will soon get the Vector Institute for Artificial Intelligence, geared to fuelling “Canada’s amazing AI momentum”.
The new research facility, which will be officially launched on Thursday, will be dedicated to expanding the applications of AI through explorations in deep learning and other forms of machine learning. It has received about C$170m (US$127m/£102m) in funding from the Canadian and Ontario governments and a group of 30 businesses, including Google and RBC.
New research by Warwick Analytics shows that the number of data scientists in the UK is set to grow by around 50 percent in 2017. However, skilled people are scarce, and a key constraint is the amount of time data scientists spend manually processing data. This means that the key to unlocking the full potential of the UK data-science market is not just to train more data scientists, but to speed up the manual processes that transform and prepare data for analysis.
The change illustrates the new skepticism with which major marketers are approaching online ad platforms and the automated technology placing their brands on millions of websites. In recent years, advertisers have increasingly shunned buying ads on individual sites in favor of cheaply targeting groups of people across the web based on their browsing habits, a process known as programmatic advertising — enabling, say, a Gerber ad to show up on a local mother’s blog, or a purse in an online shopping cart to follow a person around the internet for weeks.
The chip maker Nvidia is riding the current artificial-intelligence boom with hardware designed to power cutting-edge learning algorithms. And the company sees health care and medicine as the next big market for its technology.
Kimberly Powell, who leads Nvidia’s efforts in health care, says the company is working with medical researchers in a range of areas and will look to expand these efforts in coming years.
Hedge funds have been trying to teach computers to think like traders for years.
Now, after many false dawns, an artificial intelligence technology called deep learning that loosely mimics the neurons in our brains is holding out promise for firms. WorldQuant is using it for small-scale trading, said a person with knowledge of the firm. Man AHL may soon begin betting with it too. Winton and Two Sigma are also getting into the brain game.
The quant firms hope this A.I. — a kind of machine learning on steroids — will give them an edge in the escalating technological arms race in global finance.
Responding to the elephant poaching crisis illustrated in 2016’s Great Elephant Census (GEC), philanthropist Paul G. Allen and his team of technologists and conservation experts are partnering with park managers across Africa to provide a new technology platform to better protect this iconic species and other wildlife threatened by human activities.
A right to data. It’s worth taking a moment to let that sink in. This is not a random luxury, but a right we are all entitled to.
Open data requires open data journalism to make sense of it all, of course. Journalists are uniquely placed to make sense of the numbers and open them up for readers desperate for facts and a greater understanding of what’s going on around them. Whether it’s crime data or health statistics, the promise of open data was always too great to be fulfilled in just a few years, but it points to a new era of greater awareness for all of us.
arXiv, Physics > Physics and Society; Eitan Adam Pechenick, Christopher M. Danforth, Peter Sheridan Dodds
It is tempting to treat frequency trends from the Google Books data sets as indicators of the “true” popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. We highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800–2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets. Our findings emphasize the need to fully characterize the dynamics of the Google Books corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
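The decade-to-decade comparisons the abstract describes rest on a statistical divergence between word-frequency distributions. A minimal sketch using the Jensen-Shannon divergence, a common choice for comparing corpora (the frequencies below are invented for illustration, not Google Books data):

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) terms contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical relative frequencies of the same four words in two decades
freq_1900s = [0.40, 0.35, 0.15, 0.10]
freq_1990s = [0.25, 0.30, 0.25, 0.20]
divergence = js_divergence(freq_1900s, freq_1990s)
```

Words contributing most to such a divergence between decades are the "major contributions" the abstract examines; a surge of citation-like phrases shows up as a large contribution from scientific texts.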
IEEE Spectrum, Raja Chatila, Kay Firth-Butterfield, John C. Havens and Konstantinos Karachalios
Algorithms with learning abilities collect personal data that are then used without users’ consent and even without their knowledge; autonomous weapons are under discussion in the United Nations; robots simulating emotions are deployed with vulnerable people; research projects are funded to develop humanoid robots; and artificial intelligence-based systems are used to evaluate people. One can consider these examples of AI and autonomous systems (AS) as great achievements or claim that they are endangering human freedom and dignity.
To benefit fully from the potential of these technologies, we need to make sure they are aligned with human moral values and ethical principles.
Columbia University President Lee Bollinger welcomed staff and press to a preview of the first two buildings at the university’s new Manhattanville Campus in Harlem. At more than seventeen acres, this ambitious and controversial project, the university’s largest expansion in over a century, has been in the making for nearly fifteen years. Yet it is clear the university hopes that the Manhattanville Campus will not only provide a much-needed, modernized counterpart to its McKim, Mead and White campus in Morningside Heights but also serve as a community partner for the neighborhood.
Recently, astronomers announced the discovery that a star called TRAPPIST-1 is orbited by seven Earth-size planets. Three of the planets reside in the “habitable zone,” the region around a star where liquid water is most likely to exist on the surface of a rocky planet. Other potentially habitable worlds have also been discovered in recent years, leaving many people wondering: How do we find out if these planets actually host life?
For the first time, computers have taught themselves how to cooperate in games in which the objective is to reach the best possible outcome for all players. The feat is far harder than training artificial intelligence (AI) to triumph in a win-lose game such as chess or checkers, researchers say. The advance could help enhance human-machine cooperation.
In the past, the emphasis on machine learning scalability for single-model runs has often been on node counts. This is useful for some applications, but as neural networks become more integrated into existing workflows, including those in HPC, there is another way to consider scalability. Interestingly, the lesson comes from an HPC application area, weather modeling, where, instead of one monolithic model to predict climate, an ensemble of forecasts run in parallel on a massive supercomputer is meshed together for the best result. Using this ensemble method on deep neural networks allows for scalability across thousands of nodes, with the end result derived from an average of the ensemble, something that is acceptable in an area that does not require the kind of precision (in more ways than one) that some HPC calculations do.
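The averaging step this describes can be sketched with toy models standing in for full networks; the linear `member_predict` function and all numbers below are invented for illustration, not Oak Ridge's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def member_predict(weights, x):
    """One ensemble member: a toy linear model standing in for a trained DNN."""
    return x @ weights

# Hypothetical setup: each node trains its own model independently; here we
# fake that by perturbing a reference weight vector once per member.
n_members, n_features = 8, 4
reference_w = np.array([1.0, -2.0, 0.5, 3.0])
members = [reference_w + rng.normal(scale=0.3, size=n_features)
           for _ in range(n_members)]

x = rng.normal(size=n_features)
predictions = np.array([member_predict(w, x) for w in members])

# The ensemble result is the average of the independent members. Because the
# members never communicate during training, the work parallelizes trivially
# across thousands of nodes.
ensemble = predictions.mean()
```

The appeal for HPC is exactly this independence: the only cross-node step is the final reduction over member outputs.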
This approach has been used on the Titan supercomputer at Oak Ridge, which is a powerhouse for deep learning training given its high GPU counts. Titan’s 18,688 Tesla K20X GPUs have proven useful for a large number of scientific simulations and are now pulling double-duty on deep learning frameworks, including Caffe, to boost the capabilities of HPC simulations (classification, filtering of noise, etc.).
Projects will be selected by the Incubator Advisory Board with guidance from the NYU Game Center faculty and staff. Online applications will be reviewed first; the deadline for applications is Sunday, April 16. A subset of applicants will be invited to pitch in person at the Game Center on the evening of April 26.
Bud Mishra, an affiliated professor at CDS, Gelin Gao (NYU Courant), and Daniele Ramazzotti (Stanford) have therefore suggested a new method for stress testing financial portfolios in their newly released paper, Efficient Simulation of Financial Stress Testing Scenarios with Suppes-Bayes Causal Networks (SBCNs).
“In this post we will look at a recently developed architecture, Adversarial Autoencoders, which are inspired by VAEs but give us more flexibility in how we map our data into a latent dimension (if this is not clear right now, don’t worry, we will revisit the idea throughout the post). One of the most interesting ideas in Adversarial Autoencoders is how to impose a prior distribution on the output of a neural network by using adversarial learning.”
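The adversarial regularization the post describes can be sketched in plain NumPy. Everything here is a toy stand-in, not the post's actual architecture: random vectors play the role of encoder outputs, and a fixed linear discriminator replaces the small network an AAE would train:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy setup: 2-D latent codes from an "encoder" (random vectors standing in
# for a network's output) vs. samples from the prior we want to impose,
# here a standard Gaussian.
z_encoder = rng.normal(loc=2.0, scale=0.5, size=(128, 2))  # fake posterior
z_prior   = rng.normal(loc=0.0, scale=1.0, size=(128, 2))  # target prior

# A linear discriminator D(z) = sigmoid(z @ w + b); a real AAE trains a
# small network here, jointly with the encoder.
w, b = rng.normal(size=2), 0.0
d_prior   = sigmoid(z_prior @ w + b)    # trained toward 1 ("real")
d_encoder = sigmoid(z_encoder @ w + b)  # trained toward 0 ("fake")

# Discriminator loss: classify prior samples as real, encoder codes as fake.
d_loss = -np.mean(np.log(d_prior + 1e-9)) - np.mean(np.log(1.0 - d_encoder + 1e-9))

# Encoder ("generator") loss: fool the discriminator, pushing the aggregated
# posterior over latent codes toward the chosen prior.
g_loss = -np.mean(np.log(d_encoder + 1e-9))
```

Alternating gradient steps on `d_loss` and `g_loss` is what shapes the latent space, in place of the KL term a VAE would use.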
Re-reading all of these posts made me wonder about putting some Scrabble data together into a package so that I and others could do some fun data analysis. The result is the package scrabblr, in which I collated every turn played by two ‘expert’-level computer sims playing against each other.
arXiv, Astrophysics > Instrumentation and Methods for Astrophysics; Daniel Foreman-Mackey, Eric Agol, Ruth Angus, Sivaram Ambikasaran
The growing field of large-scale time domain astronomy requires methods for probabilistic data analysis that are computationally tractable, even with large datasets. Gaussian Processes are a popular class of models used for this purpose but, since the computational cost scales as the cube of the number of data points, their application has been limited to relatively small datasets. In this paper, we present a method for Gaussian Process modeling in one-dimension where the computational requirements scale linearly with the size of the dataset. We demonstrate the method by applying it to simulated and real astronomical time series datasets. These demonstrations are examples of probabilistic inference of stellar rotation periods, asteroseismic oscillation spectra, and transiting planet parameters. The method exploits structure in the problem when the covariance function is expressed as a mixture of complex exponentials, without requiring evenly spaced observations or uniform noise. This form of covariance arises naturally when the process is a mixture of stochastically-driven damped harmonic oscillators – providing a physical motivation for and interpretation of this choice – but we also demonstrate that it is effective in many other cases. We present a mathematical description of the method, the details of the implementation, and a comparison to existing scalable Gaussian Process methods. The method is flexible, fast, and most importantly, interpretable, with a wide range of potential applications within astronomical data analysis and beyond. We provide well-tested and documented open-source implementations of this method in C++, Python, and Julia.
“This post is the first in a two-part series on stock data analysis using Python, based on a lecture I gave on the subject for MATH 3900 (Data Science) at the University of Utah. In these posts, I will discuss basics such as obtaining the data from Yahoo! Finance using pandas, visualizing stock data, moving averages, developing a moving-average crossover strategy, backtesting, and benchmarking.”
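A moving-average crossover strategy of the kind the series builds can be sketched in a few lines; the synthetic random-walk prices and the window lengths below are stand-ins for data downloaded from Yahoo! Finance, not the post's actual figures:

```python
import numpy as np

def moving_average(x, window):
    """Trailing simple moving average; the first window-1 entries are NaN."""
    out = np.full(len(x), np.nan)
    c = np.cumsum(np.insert(np.asarray(x, dtype=float), 0, 0.0))
    out[window - 1:] = (c[window:] - c[:-window]) / window
    return out

# Hypothetical daily closing prices: a random walk around 100.
rng = np.random.default_rng(3)
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=250))

fast = moving_average(prices, 20)
slow = moving_average(prices, 50)

# Crossover rule: go long (+1) while the fast average sits above the slow
# one, short (-1) otherwise; the NaN warm-up region compares False -> -1.
signal = np.where(fast > slow, 1, -1)

# Trades occur wherever the signal flips sign.
crossovers = np.flatnonzero(np.diff(signal) != 0) + 1
```

Backtesting then amounts to applying `signal` (lagged by one day to avoid lookahead) to the price returns and comparing the result against a buy-and-hold benchmark.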