ASTRONOMER MEREDITH RAWLS was in an astronomy master’s program at San Diego State University in 2008 when her professor threw a curveball. “We’re going to need to do some coding,” he said to her class. “Do you know how to do that?”
Not really, the students said.
And so he taught them—at lunch, working around their regular class schedule. But what he meant by “coding” was Fortran, a language IBM developed in the 1950s. Later, working on her PhD at New Mexico State, Rawls decided her official training wasn’t going to cut it. She set out to learn a more modern language called Python, which she saw other astronomers switching to. “It’s going to suck,” she remembers telling herself, “but I’m just going to do it.”
And so she started teaching herself, and signed up for a workshop called SciCoder.
from University of Oxford, Oxford Internet Institute, The Policy and Internet Blog
Community-based approaches are widely employed in programmes that monitor and promote socioeconomic development. And building the “capacity” of a community — i.e. the ability of people to act individually or collectively to benefit the community — is key to these approaches. The various definitions of community capacity all agree that it comprises a number of dimensions — including opportunities and skills development, resource mobilization, leadership, participatory decision making, etc. — all of which can be measured in order to understand and monitor the implementation of community-based policy. However, measuring these dimensions (typically using surveys) is time consuming and expensive, and the absence of such measurements is reflected in a greater focus in the literature on describing the process of community capacity building, rather than on describing how it’s actually measured.
A cheaper way to measure these dimensions, for example by applying predictive algorithms to existing secondary data like socioeconomic characteristics, socio-demographics, and condition of housing stock, would certainly help policy makers gain a better understanding of local communities.
To serve personalized ads, tech giants such as Google and Facebook try to infer their users' personalities from their social media posts, so predicting personality from written text is valuable for social networking applications. Existing approaches, however, require too much input data to be used realistically. In this paper, the authors developed a model that can predict personality with a reduced data requirement. The model achieves better performance than state-of-the-art techniques while requiring 8 times less data.
A computer scientist discusses the evolution of computational linguistics and where it’s headed next. He was recently named the Thomas M. Siebel Professor in Machine Learning.
Farm equipment company John Deere is no stranger to the internet of things. It was connecting sensors and actuators on the farm twenty years ago. The next big thing in farming is what connected devices enable: precision agriculture. Precision agriculture combines connected devices with machine learning so they can make faster and more precise decisions, possibly without a farmer’s input.
To make precision ag the new reality, John Deere needs Silicon Valley skills. That’s why last week it opened an office in the SoMa neighborhood of San Francisco to connect with local talent.
from The New York Times, The Upshot blog, Aaron E. Carroll
Research is hard, and rarely perfect. A better understanding of methodology, and the flaws inherent within, might yield more reproducible work.
The research environment, and its incentives, compound the problem. Academics are rewarded professionally when they publish in a high-profile journal. Those journals are more likely to publish new and exciting work. That’s what funders want as well. This means there is an incentive, barely hidden, to achieve new and exciting results in experiments.
Some researchers may be tempted to make sure that they achieve “new and exciting results.” This is fraud. As much as we want to believe it never happens, it does. Clearly, fabricated results are not going to be replicable in follow-up experiments.
Ballmer’s NBA team, the Los Angeles Clippers, will partner with Los Angeles-based startup Second Spectrum to release a new product that overlays the traditional TV broadcast with new data and animations largely driven by artificial intelligence.
Led by two former USC professors, Second Spectrum has built technology that essentially allows computers to watch sports and track player/ball movement at a granular level. It then applies machine learning and AI to help derive new insights for coaches, players, and even media-related customers.
When machines control all the world’s finances and run factory floors, what will humans be left to do?
We’ll make art, says Kai-Fu Lee, a former Google and Microsoft executive who has since launched VC firm Sinovation Ventures.
“Art and beauty is very hard to replicate with AI. Given AI is more objective, analytical, data driven, maybe it’s time for some of us to switch to the humanities, liberal arts, and beauty,” Lee told Quartz editor-in-chief Kevin Delaney during a live Q&A session. “Maybe professions where it’s hard to find a job might be good to study.”
from The New York Times, Jess Bidgood and Jeremy B. Merrill
“There’s a lot of discussion about it, both inside a department as well as across the field,” said Randy H. Katz, a professor in the electrical engineering and computer science department at the University of California, Berkeley, who discovered in one year that about 100 of his roughly 700 students in one class had violated the course policy on collaborating or copying code.
Computer science professors are now delivering stern warnings at the start of each course, and, like colleagues in other subjects, deploy software to flag plagiarism. They have unearthed numerous examples of suspected cheating.
At Brown University, more than half the 49 allegations of academic code violations last year involved cheating in computer science.
Only a few outliers in the banking sector, such as Capital One, have been able to ship AI products as quickly as their counterparts in Silicon Valley. While many financial institutions have publicly announced ambitious plans to integrate artificial intelligence and machine learning, customers are still waiting months later for these proposed products and services.
So why are banks – typically among the most capable and tech-intensive players in the business world – acting like Luddites with AI? And how can AI entrepreneurs and developers building products for the industry nail their pitches and close deals?
Patients play Litesprite’s video game, and clinics get information about which coping mechanisms seem to work best for each person. Health professionals can then decide which treatment options are best.
London, England, September 21-22 at the Alan Turing Institute. This workshop aims to address the gap between research methodologies in NLP/ML and the humanities and the social sciences. Deadline for attendee applications is June 7.
“Trifacta has released Principles of Data Wrangling: Practical Techniques for Data Preparation, the first how-to guide on data wrangling. But why should you read this book? It’s simple – because your time is as valuable as your data.” [free to download]
from University of California-Berkeley, RISELab, Joe Hellerstein
“Part of the Berkeley tradition—and the RISELab mission—is to release open source software as part of our research agenda. Six months after launching the lab, we’re excited to announce initial v0.1 releases of three RISElab open-source systems: Clipper, Ground and Ray.”
“How do I load my multiple gigabyte data file? Algorithms crash when I try to run my dataset; what should I do? Can you help me with out-of-memory errors? In this post, I want to offer some common suggestions you may want to consider.”
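One of the most common suggestions for this situation is to stream a large file in fixed-size chunks rather than loading it all into memory at once. A minimal sketch of that pattern (the function name and chunk size here are illustrative, not from the post):

```python
import csv
import io

def process_csv_in_chunks(fileobj, chunk_size=100_000):
    """Read a large CSV one chunk of rows at a time, so memory use is
    bounded by chunk_size rather than by the size of the file."""
    reader = csv.reader(fileobj)
    header = next(reader)
    total_rows = 0
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            total_rows += len(chunk)  # replace with real per-chunk work
            chunk.clear()
    total_rows += len(chunk)  # handle the final partial chunk
    return header, total_rows

# Small in-memory file standing in for a multi-gigabyte CSV on disk.
data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
header, n = process_csv_in_chunks(data, chunk_size=2)
```

The same idea appears in many libraries (for example, the `chunksize` parameter of `pandas.read_csv`), so the standard-library version above is just the simplest form of it.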
“The fundamental unit in PyTorch is the Tensor. This post will serve as an overview for how we implement Tensors in PyTorch, such that the user can interact with it from the Python shell. In particular, we want to answer four main questions:”
1. How does PyTorch extend the Python interpreter to define a Tensor type that can be manipulated from Python code?
2. How does PyTorch wrap the C libraries that actually define the Tensor’s properties and methods?
3. How does PyTorch cwrap work to generate code for Tensor methods?
4. How does PyTorch’s build system take all of these components to compile and generate a workable application?
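The wrapping pattern the post walks through can be seen in miniature with a toy pure-Python analogue (this is not PyTorch's actual implementation; `Storage`, `numel`, and the method names are just stand-ins): a Python-visible type whose properties and methods are thin shims over a lower-level storage object, the way PyTorch's Python `Tensor` forwards to its C libraries.

```python
import array

class Storage:
    """Stands in for the C-level buffer that actually holds the data."""
    def __init__(self, values):
        self._buf = array.array("d", values)

    def size(self):
        return len(self._buf)

    def get(self, i):
        return self._buf[i]

class Tensor:
    """Python-facing wrapper; each method delegates to Storage,
    mirroring how generated wrapper code forwards to C routines."""
    def __init__(self, values):
        self._storage = Storage(values)

    def numel(self):
        return self._storage.size()

    def __getitem__(self, i):
        return self._storage.get(i)

t = Tensor([1.0, 2.0, 3.0])
```

In real PyTorch the delegation layer is generated (that is what the cwrap question above refers to) and the storage lives in C, but the shape of the interaction from the Python shell is the same.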