Data Science newsletter – January 28, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for January 28, 2019

Data Science News



CS+X major program to no longer accept new students

Stanford Daily, Elena Shao and Emily Wan


Just four years after its inception, the CS+X joint major pilot program will be “discontinued” due to limited interest. The academic program allows students to attain a Bachelor’s of Arts and Sciences degree in computer science and a humanities discipline.

All students currently enrolled in the program will be allowed to complete their degrees, according to an email sent by CS+X faculty to a declared student. The University later clarified in a statement that the students who plan to declare the CS+X joint major must do so by June 18, 2019, the end of spring quarter, at which point no new CS+X major declarations will be accepted.

Since its launch as a six-year pilot program in fall 2014, the CS+X has expanded from two joint majors, CS+English and CS+Music, to 14 majors combining computer science with various humanities disciplines. However, the program conferred fewer than 15 degrees in the 2017-2018 academic year and has drawn criticism for its lack of depth and interdisciplinary cohesion.


Coding competencies: Princeton’s evolving approach to teaching software best practices to graduate students

Princeton University, Office of Communications


Princeton wants to shake up how graduate students learn the basic tools and principles of writing good code for computationally intensive scientific research, and Gabe Perez-Giz is emphatically on board.


Virginia Tech Program Trains Data Scientists to Solve Urban Problems

Diverse Issues in Higher Education, LaMont Jones


Data will become a more important part of problem-solving as America’s urban centers continue to gain a greater share of the U.S. population, and students at Virginia Tech are part of a forward-looking research program that uses data in an interdisciplinary approach to address areas of concern to urban planners, government policymakers and local communities.

Virginia Tech created UrbComp, a 12-credit urban computing graduate certificate program, to train students to become interdisciplinary and ethical data scientists who will help society cope with problems that will become more complex in areas from transportation to epidemiology as more American cities grow into vast metropolitan areas over the next 20 years.


In the dark: The cost of Canada’s data deficit

The Globe and Mail, Eric Andrew-Gee and Tavia Grant


When it comes to basic data about its own citizens – from divorce rates to driving patterns to labour trends – Canada simply doesn’t have the answers. If information is power, this country has a big problem.


A new study suggests Amazon’s Rekognition software sucks at detecting dark-skinned women

Orlando Weekly, Bloggytown blog, Monivette Cordeiro and Joey Roulette


Amazon’s Rekognition facial-analysis software, which is being tested in Orlando as a surveillance system, misidentifies darker-skinned women as men roughly a third of the time, according to a new study.

Led by MIT Media Lab researcher Joy Buolamwini and University of Toronto researcher Deborah Raji, the study, published Thursday, looked at face-scanning software from Amazon, Microsoft, IBM, Face++ and the Florida-based company Kairos.

Amazon’s facial analysis software made no errors in identifying the faces of lighter-skinned men. But it misidentified the gender of lighter-skinned women about 7 percent of the time.


How Bots Ruined Buying Sneakers

Complex, Tommy Battle


I’ve never seen a business where loyal customers anticipate not being able to buy anything.

In fact, lottery culture is as synonymous with sneaker and streetwear culture as the product retailers shell out on a weekly basis. But you’re not in line against the kid who spends all night camped out with his friends; now you’re up against technology, where the camp-outs no longer work. The process is nearly sadistic at this point.


Advances in weather prediction

Science, Perspective; Richard B. Alley, Kerry A. Emanuel and Fuqing Zhang


Several features of the weather system are more persistent than day-to-day weather, allowing accurate predictions further into the future, from subseasonal to seasonal, annual, and interannual time scales and beyond, with even greater scope for improvement. For example, the Madden-Julian Oscillation (MJO) moves eastward around the tropics over 30 to 90 days, affecting rain, wind, clouds, air pressure, the onset and demise of summer monsoons, and more, with important agricultural and other implications. Weather prediction models have now shown predictive skills for the MJO phenomena up to 5 weeks (5).

In parallel with improving forecasts, communication of the growing wealth of weather data has expanded greatly, enabling a timely flow of ever more detailed and accurate information to a rich diversity of users.


DeepMind Beats Pros at StarCraft in Another Triumph for Bots

WIRED, Business, Tom Simonite


In London last month, a team from Alphabet’s UK-based artificial intelligence research unit DeepMind quietly laid a new marker in the contest between humans and computers. On Thursday it revealed the achievement in a three-hour YouTube stream, in which aliens and robots fought to the death.

DeepMind’s broadcast showed its artificial intelligence bot, AlphaStar, defeating a professional player at the complex real-time strategy videogame StarCraft II. Humanity’s champion, 25-year-old Grzegorz Komincz of Poland, lost 5-0. The machine-learning-powered software appeared to have discovered strategies unknown to the pros who compete for millions of dollars in prizes offered each year in one of e-sports’ most lucrative games. “It was different from any StarCraft that I have played,” said Komincz, known professionally as MaNa.


Oracle Opens Next-Gen Data Center in Canada

RTInsights, Sue Walsh


Oracle has opened a data center in Toronto, Canada to support demand in the region for their public cloud, Oracle Cloud Infrastructure. The next-generation cloud infrastructure offers flexibility to companies, allowing them to run both cloud and traditional workloads on the same platform. Oracle’s cloud regions give them the ability to offer a broad, fast-growing suite of cloud applications, security, blockchain, and artificial intelligence services and Oracle Autonomous Database on enterprise-grade cloud infrastructure.


Automation and Artificial Intelligence: How machines are affecting people and places

The Brookings Institution; Mark Muro, Robert Maxim and Jacob Whiton


At first, technologists issued dystopian alarms about the power of automation and artificial intelligence (AI) to destroy jobs. Then came a correction, with a wave of reassurances. Now, the discourse appears to be arriving at a more complicated understanding, suggesting that automation will bring neither apocalypse nor utopia, but instead both benefits and stress alike. Such is the ambiguous and sometimes disembodied nature of the “future of work” discussion.

Hence the analysis presented here. Intended to bring often-inscrutable trends down to earth, the following report develops both backward- and forward-looking analyses of the impacts of automation over the years 1980 to 2016 and 2016 to 2030 to assess past and upcoming trends as they affect both people and communities in the United States.

The report focuses on areas of potential occupational change rather than net employment losses or gains. Special attention is applied to digging beneath national top-line statistics to explore industry, geographical, and demographic variations.


Apple car layoffs signal focus on AI

CNBC, Steve Kovach


It’s not often you hear about layoffs at Apple.

So it came as a surprise Wednesday when CNBC learned that Apple was removing 200 employees from its self-driving car unit. Apple confirmed the staffing change, but reading between the lines of a spokesperson’s statement, it sounds like the move is the latest in the company’s broader goal to improve its artificial intelligence and machine learning capabilities as it faces increased competition from rivals Google and Amazon.

“As the team focuses their work on several key areas for 2019, some groups are being moved to projects in other parts of the company, where they will support machine learning and other initiatives, across all of Apple,” the company spokesperson said in a statement to CNBC Wednesday.


[1901.08152] Three principles of data science: predictability, computability, and stability (PCS)

arXiv, Statistics > Machine Learning; Bin Yu, Karl Kumbier


We propose the predictability, computability, and stability (PCS) framework to extract reproducible knowledge from data that can guide scientific hypothesis generation and experimental design. The PCS framework builds on key ideas in machine learning, using predictability as a reality check and evaluating computational considerations in data collection, data storage, and algorithm design. It augments PC with an overarching stability principle, which largely expands traditional statistical uncertainty considerations. In particular, stability assesses how results vary with respect to choices (or perturbations) made across the data science life cycle, including problem formulation, pre-processing, modeling (data and algorithm perturbations), and exploratory data analysis (EDA) before and after modeling.
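The stability idea in the abstract above can be illustrated with a small sketch. This is not the authors' procedure: it assumes a bootstrap resample as the only data perturbation, a one-variable least-squares fit as the "model", and synthetic data; the function names `fit_slope` and `bootstrap_slopes` are hypothetical. It refits the model on each perturbed dataset and reports how much the estimated slope moves.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples=200, seed=0):
    """Refit the slope on bootstrap resamples (one kind of data perturbation)."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        # Draw n row indices with replacement to build a perturbed dataset.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            # Slope is undefined when every resampled x is identical.
            continue
        slopes.append(fit_slope(bx, by))
    return slopes


# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + data_rng.gauss(0, 0.5) for x in xs]

slopes = bootstrap_slopes(xs, ys)
print("mean slope across perturbations:", statistics.fmean(slopes))
print("spread (std dev) across perturbations:", statistics.stdev(slopes))
```

A small spread relative to the estimate suggests the conclusion is stable under this one perturbation; the abstract's notion of stability is broader and also covers choices in problem formulation, pre-processing, and modeling.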


Driverless cars now allowed on Ontario roads as part of pilot program

National Post, The Canadian Press


Participants in Ontario’s automated vehicle pilot program can test driverless cars on public roadways, under strict conditions.


Harvard works to embed ethics in computer science curriculum

Harvard Gazette


Barbara Grosz has a fantasy that every time a computer scientist logs on to write an algorithm or build a system, a message will flash across the screen that asks, “Have you thought about the ethical implications of what you’re doing?”

Until that day arrives, Grosz, the Higgins Professor of Natural Sciences at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS), is working to instill in the next generation of computer scientists a mindset that considers the societal impact of their work, and the ethical reasoning and communications skills to do so.

“Ethics permeates the design of almost every computer system or algorithm that’s going out in the world,” Grosz said. “We want to educate our students to think not only about what systems they could build, but whether they should build those systems and how they should design those systems.”

 
Events



Turing Fest

Turing Fest


Edinburgh, Scotland, August 27-29. “Turing Fest brings the smartest people in global tech to Scotland’s iconic capital city. Be part of a diverse audience of founders, engineers, product managers, marketers, investors and more, coming together to learn, connect and share a unique experience.” [registration begins in February]

 
Deadlines



Wellcome Data Re-use Prize: Malaria

“Submissions to the prize should generate a new insight, tool, or health application from the vast amount of data held by [Malaria Atlas Project]. Participants can pursue any research question or innovation that makes best use of the data.” Deadline for applications is March 15.

Circular Economy 2030 contest

“In 2018, global demand for resources was 1.7 times what the earth can support in one year and is only expected to rise. Google Cloud and SAP are coming together to help address this challenge and host a social entrepreneurship contest for a sustainable future.” Deadline for applications is March 17.
 
Tools & Resources



Leveraging Unannotated Data to Bootstrap Alexa Functions More Quickly

Amazon, Alexa Blogs, Anuj Goyal


Developing a new natural-language-understanding system usually requires training it on thousands of sample utterances, which can be costly and time-consuming to collect and annotate. That’s particularly burdensome for small developers, like many who have contributed to the library of more than 70,000 third-party skills now available for Alexa.

One way to make training more efficient is transfer learning, in which a neural network trained on huge collections of previously annotated data is then retrained on the comparatively sparse data in a new area. Last year, my colleagues and I showed that, for low volumes of training data, transfer learning could reduce the error rate of natural-language-understanding (NLU) systems by an average of 14%.

This year, at the 33rd conference of the Association for the Advancement of Artificial Intelligence (AAAI), we will present a method for reducing the error rate by an additional 8% — again, for low volumes of training data — by leveraging millions of unannotated interactions with Alexa.


Patsy

GitHub – pydata


Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. Patsy brings the convenience of R “formulas” to Python.


How NOT to design RESTful APIs

Noteworthy – The Journal Blog, Rob Konarski


I was helping out a friend who needed to integrate housing availability from a property management system with his client’s website. Luckily, the property management system had an API. Unfortunately, everything about it was wrong.

The goal of this story is not to give a bad advertisement for a used system, but to share how things should NOT be developed, as well as learn the right approaches when it comes to designing APIs.


AWS launches Neo-AI, an open-source tool for tuning ML models

TechCrunch, Frederic Lardinois


AWS isn’t exactly known as an open-source powerhouse, but maybe change is in the air. Amazon’s cloud computing unit today announced the launch of Neo-AI, a new open-source project under the Apache Software License. The new tool takes some of the technologies that the company developed and used for its SageMaker Neo machine learning service and brings them (back) to the open-source ecosystem.

The main goal here is to make it easier to optimize models for deployments on multiple platforms — and in the AWS context, that’s mostly machines that will run these models at the edge.

 
Careers


Full-time positions outside academia

Associate Sports Analytics Specialist

ESPN; Bristol, CT


Internships and other temporary positions

Information Security Intern

The New York Times; New York, NY
