Data Science newsletter – January 5, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for January 5, 2018


 
 
Data Science News



Transitioning From Academia to Industry: Perspectives from Indeed’s Data Scientists

Medium, Indeed Data Science, Robyn Rap


I still remember the moment I told my advisor that I was considering leaving academia. The stress. The fear. Saying the words, “I don’t think I want this for myself” out loud. And afterwards, the relief.

At the time, I was juggling work on my dissertation proposal, multiple publications, teaching responsibilities, and a research assistantship. Meanwhile, I had begun researching data science, a field where I could use the skills I had learned in my doctoral program, but in a setting that better suited how I wanted to live and work. So, I made the decision to leave academia, pivot my skillset, and look for data science work in the private sector.

Of course, my experience was not unique


Are your machine learning models good enough?

Eric D. Brown


While machine learning, deep learning and big data can provide an enormous amount of value to an organization, there is ample opportunity to mess things up dramatically. There are plenty of times where small errors (and even massive errors) can be introduced into the process. For example, during the data munging / exploration phase, a simple error can introduce changes in the data, which could cause massive changes in the results of any modeling.

Additionally, bias can easily be introduced to the process (either on purpose or by accident). This bias can push the results to tell a story that people want the data / models to tell. It is very easy to fall into the “let’s use statistics to support our view” trap that many fall into. Rather than look for data and/or outputs to support your view (and hence build an illusion), your machine learning initiatives (and any other data projects) should be as bias free as possible.


AI System Sorts News Articles By Whether or Not They Contain Actual Information

VICE, Motherboard, Michael Byrne


In a recent paper published in the Journal of Artificial Intelligence Research, computer scientists Ani Nenkova and Yinfei Yang, of Google and the University of Pennsylvania, respectively, describe a new machine learning approach to classifying written journalism according to a formalized idea of “content density.” With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles.


New Report Highlights Growing Role of Learning Outcomes in U.S. Doctoral Education

US Council of Graduate Schools


Universities and the graduate education community are paying closer attention to the intended learning outcomes of doctoral education, according to a new report from the Council of Graduate Schools (CGS). Once associated with undergraduate education, learning outcomes—the knowledge, skills, attitudes and competencies that a degree holder can expect to attain by the end of a degree program—are becoming more central to graduate programs, including PhD programs. The finding is based on a CGS study supported by Lumina Foundation that involved surveys, interviews with leaders in higher education, and a day-long convening of higher education leaders.

According to a 2016 CGS Survey of graduate schools, the majority (65%) of responding institutions reported that all or most of their doctoral programs had developed learning outcomes. The report posits that this widespread use is at least partly tied to an increasing interest on the part of accreditors in documenting and measuring these outcomes. A CGS poll of chief officers of accrediting bodies that accredit doctoral programs found that nearly three out of four accreditors (72%) believe they are paying closer attention to outcomes assessment in doctoral education than they did in 2011.


Integrating two types of crop models to predict the effect of climate change on crop yields

University of Illinois Urbana-Champaign


Scientists now have a new tool to predict the future effects of climate change on crop yields. Researchers from University of Illinois are attempting to bridge two types of computational crop models to become more reliable predictors of crop production in the U.S. Corn Belt.

“One class of crop models is agronomy-based and the other is embedded in climate models or earth system models. They are developed for different purposes and applied at different scales,” says Kaiyu Guan, an environmental scientist at the University of Illinois and the principal investigator on the research. “Because each has its own strengths and weaknesses, our simple idea is to combine the strengths of both types of models to make a new crop model with improved prediction performance.”


[1801.00631] Deep Learning: A Critical Appraisal

arXiv, Computer Science > Artificial Intelligence; Gary Marcus


Although deep learning has historical roots going back decades, neither the term “deep learning” nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton’s now classic (2012) deep network model of Imagenet. What has the field discovered in the five subsequent years? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the popular press, I present ten concerns for deep learning, and suggest that deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.


Source{d} Applies Machine Learning to Help Companies Manage Their Code Bases

The New Stack, Susan Hall


If you go to GitHub, the most popular developer platform today, and search for a piece of code, it is a plain-text search.

“It’s like how we used to search on the web in 1996,” said Eiso Kant, CEO and co-founder at source{d}, a startup focused on applying machine learning on top of source code.

“We have been writing trillions of lines of source code across the world, but none of the systems or developer tools or programming languages we’ve designed actually learn from all the source code we have written.”


Combining Experiments, Models Boosts Social Behavior Research

North Carolina State University, NC State News


Researchers from North Carolina State University and Northwestern University are outlining a new approach to behavioral research that draws on experimental studies and computer models to offer new insights into organizational and group behavior.

“Social research has a history of using both small-scale experiments and computer models to explore questions about human behavior – but there are very few examples of how to use these two techniques in concert,” says William Rand, a computer scientist and assistant professor of business management in NC State’s Poole College of Management who is co-lead author of a paper describing the work.

“This paper details an approach that we feel capitalizes on the best aspects of both research techniques to advance our understanding of the behavior of large groups and advance the field,” says Ned Smith, an associate professor of management and organizations at Northwestern University’s Kellogg School of Management, who is co-lead author of the paper.


What is Chicago’s Array of Things Initiative?

GovTech, Data-Smart City Solutions, Sean Thornton


If you’re a frequent reader of all things civic tech, then you may have already come across the Array of Things (AoT). Launched in 2016, the project, which consists of a network of sensor boxes mounted on light posts, has now begun collecting a host of real-time data on Chicago’s environmental surroundings and urban activity. After installing a small number of sensors downtown and elsewhere in 2016, Chicago is now adding additional sensors across the city and the city’s data portal currently lists locations for all of AoT’s active and yet-to-be installed sensors. Next year, data collected from AoT will be accessible online, providing valuable information for researchers, urban planners, and the general public.

AoT’s public engagement campaign has been picking up steam as well, with a recent community event held this fall. As a non-proprietary project, AoT is being implemented as a tool to improve not just urban planning and sustainability efforts, but quality of life for residents and communities. To engage with the public, project leaders have held meetings and workshops to build relationships with residents and identify community priorities.


Satellites Predict a Cholera Outbreak Weeks in Advance

Scientific American, Sarah Derouin


A test in Yemen showed satellite data could foresee an outbreak four weeks before it exploded


Weather satellites and scientists may face funding drought despite devastating hurricanes

TheHill; Jordan Gerth


In November, the first government satellite in the Joint Polar Satellite System series, JPSS-1, an effort of the National Oceanic and Atmospheric Administration (NOAA) and the National Aeronautics and Space Administration (NASA), was successfully launched from Vandenberg Air Force Base in California.

And it began its orbit just in time. A month later, wildfires ravaged southern California and arctic cold beset the Midwest and East Coast just as the JPSS-1 instruments were starting to capture imagery. In doing so, JPSS-1 advances the capabilities of a constellation of civilian weather-monitoring satellites that orbit the earth from pole to pole.

The recurrence of extreme weather is nearly assured. Funding for the JPSS program is not. The JPSS program will continue until 2038 — if Congress continues to fund it.


A Saucy App Knows China’s Taste in News. The Censors Are Worried.

The New York Times, Raymond Zhong


One of the world’s most valuable start-ups got that way by using artificial intelligence to satisfy Chinese internet users’ voracious appetite for news and entertainment. Every day, its smartphone app feeds 120 million people personalized streams of buzzy news stories, videos of dogs frolicking in snow, GIFs of traffic mishaps and listicles such as “The World’s Ugliest Celebrities.”

Now the company is discovering the risks involved, under China’s censorship regime, in giving the people exactly what they want.

The makers of the popular news app Jinri Toutiao unveiled moves this week to allay rising concerns from the authorities. Last week, the Beijing bureau of China’s top internet regulator accused Toutiao of “spreading pornographic and vulgar information” and “causing a negative impact on public opinion online,” and it ordered that updates to several popular sections of the app be halted for 24 hours.

In response, the app’s parent company, Beijing Bytedance Technology, took down or temporarily suspended the accounts of more than 1,100 bloggers that it said had been publishing “low-quality content” on the app. It also replaced Toutiao’s “Society” section with a new section called “New Era,” which is heavy on state media coverage of government decisions.


FDA Seeks Patient Engagement for Clinical Trials, Precision Medicine

HealthIT Analytics, Jessica Kent


The FDA is seeking representatives for its Patient Engagement Collaborative, which will work to involve patients in activities such as clinical trials and precision medicine efforts.


Company Data Science News

All Intel chips released since 1995 contain a security vulnerability called Meltdown. Typically, malware exploits target software, but this time the problem is in the way chip architecture has been designed to speed things up. The fear is now two-fold: 1. If this security weakness has been exploited, heaps and heaps of sensitive info (e.g. anything you’ve ever done or stored on your computer) could be in bad hands. That’s one of those giant amorphous fears known for inducing generalized anxiety across broad swaths of the population. (Did you think you could crawl away from generalized population-level anxiety in 2018? Why? Why would you think that?) 2. The patches available for this problem may slow processing times. This is potentially not that bad for your internet browsing and Netflix binging habits, but it is hair-pullingly frustrating for data scientists trying to run cloud apps.

There is another chip-based security gap called Spectre that impacts more chips, cannot be patched, but can only read random snippets of your computer’s memory at a time.



Thank you to the many who were frantically scrambling to address this security problem in December. Huge debts of gratitude to the researchers who discovered these flaws: Jann Horn, Daniel Gruss, Moritz Lipp, Yuval Yarom, Paul Kocher, Daniel Genkin, Michael Schwarz, Mike Hamburg, Stefan Mangard, Thomas Prescher, and Werner Haas. You are way better than James Bond, in my opinion.



Nissan is developing a technology that would read drivers’ brain waves to help them make evasive maneuvers faster by 0.2 to 0.5 precious seconds. This is much more compelling than working on a technique to shave milliseconds off a financial or consumer transaction.

Google “is using deep learning to predict a person’s blood pressure, age and smoking status by analyzing a photograph of their retina” using convolutional neural nets. See the arXiv paper here.

Envision Genomics internist and CEO Grace Terrell, MD asserts that “patients are not prototypes, and no one is average.” That’s why she started a precision genomics company, to address the needs of patients with rare diseases, unique reactions to typical protocols that work for the average patient, and missed diagnoses. As I have been saying for quite some time, expect to see market-ready precision medicine. This area is one of the leaders when it comes to delivering on the promise of data science (notice I did not say artificial intelligence). As with any data-driven decision-making process, there are concerns with data collection integrity and quality control.



EY (formerly Ernst & Young) ran a survey that found there is a talent shortage in data science (though they called it AI). I hope they didn’t spend too much on that part of the survey, because: obvious. However, they took the opportunity to ask about the impact of the lack of gender diversity on machine learning biases. Forty-one percent of respondents were concerned. Smart to worry. If you work at a company that wants to run through a workshop on avoiding biases in machine learning, you know where to find me.



Because there’s a bookie for every unpredictable situation, note that Austin and Atlanta have the highest odds (3-to-1) of landing Amazon HQ2, according to an Irish betting site. Boston is in third with 7-to-1 odds, and The Boston Globe ran a story on the backroom politics that went into the city’s courtship of Bezos’ behemoth.



Mapzen, a 40 person mapping start-up, is shutting down as of 1 February 2018. It’s tough to be a start-up. At least all their software is open source [GitHub]. Planetizen has more of the context on the shutdown.



In start-ups that are going somewhere: source{d} is looking at code as if it is natural language (so true!) to help characterize what a given piece of code — or the entire corpus of code in public repos on GitHub — can do. To quote the aptly named co-founder Eiso Kant, “When we look at the future of search, the future of code suggestion, the future of compilers, it comes a lot down to understanding natural language, understanding what the intent of the developer actually is and what they’re trying to do with a piece of code that they’re writing.” Such a cool problem.


Ancient Americans arrived in a single wave, Alaskan infant’s genome suggests

Science, Michael Price


A rare smidgen of ancient DNA has sharpened the picture of one of humanity’s greatest migrations. Some 15,000 to 25,000 years ago, people wandered from Asia to North America across a now-submerged land called Beringia, which once connected Siberia and Alaska. But exactly when these ancient settlers crossed and how many migrations occurred are hotly debated. Now, the oldest full genome to be sequenced from the Americas suggests that some settlers stayed in Beringia while another group headed south and formed the population from which all living Native Americans descend.

“This is an important study that significantly narrows the subset of possibilities [for how the Americas were peopled],” says David Reich, a geneticist at Harvard Medical School in Boston. “It’s very exciting.”


As Flow of Foreign Students Wanes, U.S. Universities Feel the Sting

The New York Times, Stephanie Saul


Just as many universities believed that the financial wreckage left by the 2008 recession was behind them, campuses across the country have been forced to make new rounds of cuts, this time brought on, in large part, by a loss of international students.

Schools in the Midwest have been particularly hard hit — many of them non-flagship public universities that had come to rely heavily on tuition from foreign students, who generally pay more than in-state students.


China is building a giant $2.1 billion AI research park

CNBC, Arjun Kharpal


China is planning to build a 13.8 billion yuan ($2.1 billion) technology park dedicated to developing artificial intelligence (AI), state-backed news agency Xinhua reported Wednesday.

The campus will be constructed within five years and situated in the suburban Mentougou district in western Beijing. It will cover 54.87 hectares, Xinhua said.

The technology park will be home to around 400 businesses and is expected to create an annual output value of about 50 billion yuan.


Delaware to sue EPA over upwind air pollution

Reuters


The state of Delaware on Tuesday said it would sue the U.S. Environmental Protection Agency for failing to ensure that nearby Pennsylvania and West Virginia curbed upwind air pollution from their power plants.

“The Clean Air Act entitles Delaware to relief from upwind pollution and the remedy we are seeking is reasonable and within EPA’s authority and responsibility to grant,” said Delaware Governor John Carney.

Delaware says more than 90 percent of ground-level ozone, or smog, in Delaware originates from emissions in upwind states. Despite previously petitioning the EPA four times in 2016 to crack down on the upwind pollution, the governor said the state has not gotten relief.


Deep learning sharpens views of cells and genes

Nature, News, Amy Maxmen


“It was unrealistic to apply machine learning to many areas of biology before,” says Philip Nelson, a director of engineering at Google Research in Mountain View, California. “Now you can — but even more exciting, machines can now see things that humans might not have seen before.”

Convolutional neural networks allow computers to process an image efficiently and holistically, without splitting it into parts. The approach took off in the tech sector around 2012, enabled by advances in computer power and storage; for example, Facebook uses this type of deep learning to identify faces in photographs. But scientists struggled to apply the networks to biology, in part because of cultural differences between fields. “Take a group of smart biologists and put them in a room of smart computer scientists and they will talk two different languages to each other, and have different mindsets,” says Daphne Koller, chief computing officer at Calico — a biotechnology company in San Francisco, California, that is backed by Google’s parent, Alphabet.


Mapzen to Close at the End of January

Planetizen, James Brasuell


The mapping community was sad to learn this week that the Mapzen open source mapping platform would cease operation at the end of the month. Many of Mapzen’s open source tools will live on, for those that know how to use them.

 
Events



Boundary Spanning: Advances in Socio-Environmental Systems Research – An International Symposium

SESYNC


Annapolis, MD June 11-13. “An international symposium to bring together leaders, emerging scholars, and other key individuals interested in innovating research and processes for solving socio-environmental problems.”

 
Deadlines



Data & Society Fellows Program

Data & Society is assembling its fifth class of fellows to join us from September 1. Deadline to apply is January 15.

From AO to AI: Predicting How Points End in Tennis

“Tennis Australia’s Game Insight Group is seeking an automated solution for predicting the type of point ending in tennis using tracking data. This contest represents the first public release of a large amount of tracking data from professional tennis matches. A successful solution has the potential to revolutionize the way that tennis uses data science to collect match statistics and make a huge impact on the sport.” Deadline for submissions is January 23.

L’Oréal USA Fellowships for Women in Science

The annual program recognizes and rewards five U.S.-based women researchers at the beginning of their scientific careers. “Recipients each receive up to $60,000 that must be put towards their postdoctoral research.” Deadline for applications is February 2.
