Data Science newsletter – July 10, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for July 10, 2017


Data Science News

Why technology hasn’t solved our parking problems

The Boston Globe, Scott Kirsner


A few years back, it seemed like one of the perennial headaches of urban life — parking — was about to be vanquished by the equivalent of an extra-strength aspirin.

Startups installed sensors in the pavement that would alert your smartphone when a space opened up, or hired armies of valets who would scoot over to pick up your car and whisk it into a garage. One company, FlightCar, even made use of your vehicle when you left it in an airport lot, by renting it out to incoming travelers — and giving you free parking and a car wash in exchange.

Many of the companies, including FlightCar, have since gone out of business, or discontinued their original services while rushing to figure out something else that might work.

AI, people, and society

Science, Editorial, Eric Horvitz


In an essay about his science fiction, Isaac Asimov reflected that “it became very common…to picture robots as dangerous devices that invariably destroyed their creators.” He rejected this view and formulated the “laws of robotics,” aimed at ensuring the safety and benevolence of robotic systems. Asimov’s stories about the relationship between people and robots were only a few years old when the phrase “artificial intelligence” (AI) was used for the first time in a 1955 proposal for a study on using computers to “…solve kinds of problems now reserved for humans.” Over the half-century since that study, AI has matured into subdisciplines that have yielded a constellation of methods that enable perception, learning, reasoning, and natural language understanding.

Banks Deploy AI to Cut Off Terrorists’ Funding

WIRED, Business, Issie Lapowsky


“It doesn’t take much to survive in a hostel in Belgium while waiting to be moved to another location,” says Dan Stitt, who’s spent two decades in the financial crimes industry, with stints at the Drug Enforcement Agency and the Export-Import Bank of the United States. The pattern of small transactions a terrorist in hiding makes might not raise red flags for the usual anti-money-laundering systems.

Unless those systems use artificial intelligence.

To tackle Google’s power, regulators have to go after its ownership of data

The Guardian, Evgeny Morozov


The problem with regulating technology companies is that, faced with tough new rules, they can eventually innovate their way out, often by switching to newer, unregulated technologies. The risk of targeted regulation informed by little other than economic doctrines might even be fuelling a corporate quest for eternal disruption: instead of surrendering to the regulators, technology firms prefer to abandon their old business model.

It’s through this lens that we should interpret the likely fallout from the €2.4bn fine imposed on Alphabet, Google’s parent company, by the European commission. It arrives after a lengthy, seven-year investigation into whether the company abused its dominance to promote its own online shopping service above search results. The commission’s case seems sound; the sad fate of small online retailers, unable to compete with Alphabet over the past decade, suggests as much.

However, one should not mistake the factual correctness of the commission’s case for an informed strategic vision: if it has a clue about effective ways to limit the power of data platforms, it’s not showing it.

The Netflix Prize: How a $1 Million Coding Contest Changed Streaming

Thrillist, Dan Jackson


Lester Mackey, who speaks in a soft monotone, becomes audibly excited when discussing the end of the competition. It was almost a decade ago, but he can take you right back to the final days. By that point, Dinosaur Planet had been folded into a 30-person mega-team called The Ensemble, itself formed to compete with the super-group BellKor’s Pragmatic Chaos. (Yes, most of these team names sound like titles for sci-fi novels.)

The BellKor squad included Chris Volinsky and his AT&T colleagues Robert Bell and Yehuda Koren, along with four other engineers from the United States, Austria, Canada and Israel. The global effort paid off: On June 26, 2009, BellKor’s Pragmatic Chaos finally crossed the 10% finish line, triggering a 30-day window to submit a better algorithm. Mackey and his Ensemble teammates were feeling the heat.

“We had, of course, not slept for the past two days because the contest was about to end,” says Mackey. “We knew we were close to BellKor’s Pragmatic Chaos but we didn’t know if we had passed them for a day. Days prior we had actually managed to inch ahead of them. But knowing that team, we knew they could easily strike back with something better.”

Why artificial intelligence is far too human

The Boston Globe, Ideas, Emily Kumler


Have you ever wondered how the Waze app knows shortcuts in your neighborhood better than you? It’s because Waze acts like a superhuman air traffic controller — it measures distance and traffic patterns, it listens to feedback from drivers, and it compiles massive data sets to get you to your location as quickly as possible.

Even as we grow more reliant on these kinds of innovations, we still want assurances that we’re in charge, because we still believe our humanity elevates us above computers. Movies such as “2001: A Space Odyssey” and the “Terminator” franchise teach us to fear computers programmed without any understanding of humanity; when a human sobs, Arnold Schwarzenegger’s robotic character asks, “What’s wrong with your eyes?” They always end with the machines turning on their makers.

What most people don’t know is that artificial intelligence ethicists worry the opposite is happening: We are putting too much of ourselves, not too little, into the decision-making machines of our future.

AI Research Seeks to Grow Trust between Humans and Computers

Boston University, Research


In an age of self-driving cars and autonomous drones, artificial intelligence (AI) is becoming a bigger part of our lives. It’s also getting increasingly savvy. Today, AI can recognize text, distinguish people by their faces, and even identify physical objects, to some degree. But even the best AI systems still get things wrong much of the time.

That poses a big problem, says Kate Saenko, assistant professor of computer science at Boston University. “If an AI tool makes mistakes, human users quickly learn to discount it, and eventually stop using it altogether,” she says. “I think that humans by nature are not likely to just accept things that a machine tells them.”

A further complication, she adds, is that as AI becomes more powerful, the algorithms that drive it have become increasingly opaque to human users. Information goes into one end of a computational “black box,” and an answer comes out the other side—yet the set of rules and reasoning used to find that answer are obscured.

Saenko is working to change that relationship.

How Nature Solves Problems Through Computation

Quanta Magazine, Joshua Sokol


The evolutionary biologist Jessica Flack seeks the computational rules that groups of organisms use to solve problems.

Big Data in the Social Sciences: An interview featuring Ian Mulvany



How big is Big Data, and how does it apply to the realm of social sciences? In this latest PLOSCast episode, Elizabeth explores that question with Ian Mulvany, the Head of Product Innovation at SAGE Publications. They discuss his journey from astrophysics into scientific publishing, transitioning from editorial into product management. He describes his passion for investigating ways to use technology to improve the research process, and how Big Data can add new instruments to social sciences. Finally, they consider how change is made in scholarly publishing and what that could look like in the future. [audio, 34:58]

Information Science at CU Boulder

ACM Interactions magazine


What is a unique feature of your lab? The Department of Information Science at the University of Colorado Boulder (CU) is brand-new and small, so we have had the opportunity to make our space and processes highly collaborative. As such, our “lab” is an entire department! We have faculty and students working in areas that include computational social science, digital humanities, crisis informatics, digital identity, personal information management, philanthropic informatics, data ethics and policy, network science, machine learning, information visualization, and diversity and inclusion in STEM. One theme that cuts across some of our research (and serves to translate research into undergraduate education) is human-data interaction, as an expansion of human-computer interaction. We combine areas of expertise for the benefit of the classroom, Ph.D. student advising, and numerous research collaborations. In addition, the entirety of the faculty and Ph.D. student group is working together on a project to understand and promote the practice of human-centered data science across several academic disciplines at CU. Though we all have research agendas of our own, having a collective identity is important to our mission and our operations. We really do feel like a research group as much as a department.

What shortcomings do you see with deep learning?

Quora, Oren Etzioni


… the overblown claims about the impressive success of AlphaGo are a case of a person climbing to the top of the tree and shouting “I’m on my way to the moon!”

Here’s why:

1. AlphaGo relied on a massive amount of labeled data, which is easily available in games, and often unavailable in other contexts.

How we tracked and analyzed over 200,000 people’s footsteps at MIT

Medium, Moin Nadeem


Most people think of their phone as a receiver: it connects to cellular and WiFi networks and, for all practical purposes, is only functional when connected. However, when phones are searching for WiFi networks, they commonly also send out small packets of information called probe requests.

These probe requests send snippets of information such as a unique MAC address (similar to a fingerprint), RSSI signal (logarithmic signal strength), and a list of previous SSIDs encountered. As each phone will send out one MAC address (excluding recent attempts at anonymization), we can easily leverage these to track students walking around campus.
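
As a rough illustration of the idea in this excerpt (not the author's actual pipeline), the sketch below groups captured probe requests by MAC address and turns each device's sightings into a time-ordered trail. The `ProbeRequest` record layout, the `access_point` field, and the sample values are all hypothetical, and the sketch ignores RSSI and MAC randomization entirely.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeRequest:
    """One captured probe request (hypothetical record layout)."""
    timestamp: float     # seconds since an arbitrary start time
    mac: str             # MAC address reported by the device
    rssi_dbm: int        # signal strength in dBm (closer to 0 is stronger)
    access_point: str    # name of the sensor that heard the probe


def build_trails(probes):
    """Group probes by MAC and return each device's time-ordered sightings."""
    by_mac = defaultdict(list)
    for p in probes:
        by_mac[p.mac].append(p)

    trails = {}
    for mac, seen in by_mac.items():
        seen.sort(key=lambda p: p.timestamp)
        trail = []
        for p in seen:
            # Record a new stop only when the device shows up at a different sensor.
            if not trail or trail[-1][1] != p.access_point:
                trail.append((p.timestamp, p.access_point))
        trails[mac] = trail
    return trails


# Made-up sample data for illustration only.
sample = [
    ProbeRequest(10.0, "aa:bb:cc:00:00:01", -48, "library"),
    ProbeRequest(12.0, "aa:bb:cc:00:00:02", -70, "library"),
    ProbeRequest(95.0, "aa:bb:cc:00:00:01", -60, "student-center"),
    ProbeRequest(40.0, "aa:bb:cc:00:00:01", -52, "library"),
]
trails = build_trails(sample)
print(trails["aa:bb:cc:00:00:01"])
```

A real deployment would also have to decide which sensor "wins" when several hear the same probe, which is where the RSSI value would come in.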

The Word Choices That Explain Why Jane Austen Endures

The New York Times, The Upshot blog, Kathleen A. Flynn and Josh Katz


Two hundred years after her death, Jane Austen commands a cultural empire — fan fiction, adaptations, merchandise — with her six novels at the center. It raises the question: Why her, as opposed to someone else?

Franco Moretti, founder of the Stanford Literary Lab, which applies data analysis to the study of fiction, argues that certain books survive through the choices of ordinary readers, a process something like evolution: “Literary history is shaped by the fact that readers select a literary work, keeping it alive across the generations, because they like some of its prominent traits.”

What traits make Austen special, and can they be measured with data? Can literary genius be graphed?

How FDA Plans to Help Consumers Capitalize on Advances in Science

US FDA, FDA Voice blog, Scott Gottlieb


FDA will soon unveil a comprehensive Innovation Initiative. It will be aimed at making sure our regulatory processes are modern and efficient, so that safe and effective new technologies can reach patients in a timely fashion. We need to make sure that our regulatory principles are efficient and informed by the most up-to-date science. We don’t want to present regulatory barriers to beneficial new medical innovations that add to the time, cost, and uncertainty of bringing these technologies forward if they don’t add to our understanding of the product’s safety and benefits.

This imperative is driven by our mandate to promote the public health. It includes a responsibility to make sure that we’re taking steps, within the scope of our existing responsibilities, to also help facilitate access to new innovations once FDA approves them.

Amazon Prime is on pace to become more popular than cable TV

Recode, Jason Del Rey and Rani Molla


Someday soon, more U.S. households will be subscribers of Amazon Prime than cable or satellite TV, according to recent estimates of Amazon’s popular shipping and entertainment service.

According to estimates from Morningstar, nearly 79 million U.S. households now have an Amazon Prime membership, up from around 66 million at the end of last year.

Michigan’s New Motor City: Ann Arbor as a Driverless-Car Hub

The New York Times, Neal E. Boudette


As the world looks ahead to a future of interconnected, self-driving cars, this college town 40 miles west of Detroit has emerged as a one-of-a-kind, living laboratory for the technologies that will pave the way.

Here, it is not uncommon to see self-driving Ford Fusions or Lexus sedans winding their way through downtown streets and busy intersections, occupied by engineers with eyes focused more on laptops and test equipment than the roadway.

Soon students and staff members at the University of Michigan will be able to get around the engineering campus on fully automated, driverless shuttle buses provided by a French company drawn to Ann Arbor by the university’s autonomous-car test track, known as MCity.

How Instagram Learns From Your Likes To Keep You Hooked

Fast Company, John Paul Titlow


Instagram’s Explore tab whittles billions of images down to just the ones you’ll like by tapping your behavior and metadata–not the image content itself.

Dr. Brenda Fitzgerald is named new CDC director

STAT, Helen Branswell


Georgia’s public health commissioner, an OB-GYN and two-time Republican candidate for Congress, has been named the next director of the Centers for Disease Control and Prevention.

Dr. Brenda Fitzgerald will replace Dr. Tom Frieden, who served as CDC director for eight years before stepping down in January.

Fitzgerald joined the Atlanta-based agency effective Friday, its acting director, Dr. Anne Schuchat, told CDC staff in an email. Schuchat, a CDC veteran and principal deputy director under Frieden, assured staff she will work closely with Fitzgerald to effect a smooth transition.

Stanford: Hacking for Defense class innovates for spies and warriors

San Jose Mercury News, Ethan Baron


It’s not every college class that leaves students soaked in seawater, caked with sand and in a world of hurt at the Navy SEALs’ Special Warfare Center.

It’s certainly not every class whose final presentations draw an audience combining U.S. Marines, Air Force pilots and a partner at famed venture capital firm Andreessen Horowitz.

And it’s most definitely not every class where a visiting reporter is told it’s OK to mention that one of the projects is sponsored by the CIA — but it’s not OK to say which one.

As surprising as the content and supporters of Stanford University’s Hacking for Defense class may be, it fits right into Silicon Valley: This is a course built on the “lean startup” method that has become a ubiquitous road map for building a company fast.

West Elm’s New AI Tool Scans Pinterest To Recommend Furnishings

Fast Company, Sean Captain


From a handful of images, neural networks learn someone’s style and match it to furniture, rugs, curtains, and more in a few seconds.

The World May Be Headed for a Fragmented ‘Splinternet’

WIRED, Business, Davey Alba


The rulings on online speech are coming down all over the world. Most recently, on June 30, Germany passed a law that orders social media companies operating in the country to delete hate speech within 24 hours of it being posted, or face fines of up to $57 million per instance. That came two days after a Canada Supreme Court ruling that Google must scrub search results about pirated products. And in May a court in Austria ruled that Facebook must take down specific posts that were considered hateful toward the country’s Green party leader. Each of those rulings mandated that companies remove the content not just in the countries where it was posted, but globally. Currently, in France, the country’s privacy regulator is fighting Google in the courts to get the tech giant to apply Europe’s “right to be forgotten” laws worldwide. And, around the world, dozens of similar cases are pending.

The trend of courts applying country-specific social media laws worldwide could radically change what is allowed to be on the internet, setting a troubling precedent. What happens to the global internet when countries with different cultures have sharply diverging definitions of what is acceptable online speech? What happens when one country’s idea of acceptable speech clashes with another’s idea of hate speech? Experts worry the biggest risk is that the whole internet will be forced to comport with the strictest legal limitations.

“There’s a risk of a race to the bottom here,” says Vivek Krishnamurthy, assistant director of Harvard Law School’s Cyberlaw Clinic, who specializes in international internet governance.

Collaboration across (baseball) fields leads to Amazonian rivers

Cornell Chronicle


An ambitious project that deploys big data and uses machine learning to understand the ecological impacts of hydropower dams in the Amazon Basin started in a mundane enough setting: on the sidelines at youth baseball games.

Conversations initially sparked when they were parents at local games ultimately led Alex Flecker, professor of ecology and evolutionary biology in the College of Agriculture and Life Sciences, and Carla Gomes, professor of computer science and director of the Institute for Computational Sustainability, to a collaborative effort to solve problems facing one of the most biodiverse areas of the world.

Using a seed grant from the Atkinson Center for a Sustainable Future and additional support from the National Science Foundation’s Expeditions in Computing initiative, the researchers are collaborating across disciplines – from hands-in-the-dirt ecology to the computing power of networks – crunching big data related to Amazonian rivers and waterways as they evaluate the cumulative economic and environmental impacts of dams.


CSF2017 – 30th IEEE Computer Security Foundations Symposium



Santa Barbara, CA August 21-25, Co-located with CRYPTO’17. [$$$]

TDWI Leadership Summit – Achieve Faster Time to Value with Analytics and Business Intelligence

Transforming Data With Intelligence


Anaheim, CA August 7-8 at Disneyland Hotel, co-located with TDWI Anaheim. [$$$$]

Public Discussion of the Value of Social, Behavioral, and Economic Sciences to National Priorities

The National Academies of Sciences, Engineering, Medicine


Washington, DC Wednesday, July 19, starting at 9 a.m., National Academy of Sciences. [free, registration required]


Survey: Advanced Analytics: Moving Towards Machine Learning, NLP, and AI (TDWI)

“This 5-7 minute survey asks questions about your organization’s strategies, plans, and opinions about advanced analytics – specifically AI, machine learning, natural language processing, and other ‘new’ technologies in the market.”


Tools & Resources

Introducing ELF: An extensive, lightweight and flexible platform for game research

Facebook Code, Engineering Blog; Yuandong Tian


Despite the many benefits of using games for training, it can be difficult for individuals to conduct AI research in a game environment. Due to the limitations of many current learning algorithms, hundreds of thousands of rounds of gameplay are required, which is not possible without a sufficient supply of computational resources, e.g., high-performance computing platforms equipped with many CPUs, GPUs, or specialized hardware. Furthermore, the relevant algorithms are complex and delicate to tune. These problems compound as the complexity of the training environment increases and multiple AI agents are introduced.

To address these issues and democratize AI research efforts, the FAIR team has created ELF: an Extensive, Lightweight, and Flexible platform for game research. ELF allows researchers to test their algorithms in various game environments, including board games, Atari games (via the Arcade Learning Environment), and custom-made, real-time strategy games. Not only does it run on a laptop with a GPU, it also supports training AI in more complicated game environments, such as real-time strategy games, in just one day using only six CPUs and one GPU.

Why Angular 2/4 Is Too Little, Too Late

Medium, Chris Cordle


It’s 2017. You’re looking to build or rewrite your front-end architecture. You’ve heard a lot about React vs. Angular and how they contrast and compare. The debates have raged on over the years. A lot of people will tell you that this is a Coke vs. Pepsi debate, framework vs. library, tradeoffs, pros/cons, etc, etc. but in 2017, it’s easy to see that Angular 2/4 will never reach the top-tier status of its popular AngularJS predecessor.

UseRSportTutorial: Tutorial material for Sports Analytics with R presented at the 2017 Melbourne Data Science Week

GitHub – skoval


“This repo contains the tutorial material used as part of the workshop on Sports Analytics with R that was given at the 2017 Melbourne Data Science Week”

Kubernetes’s days may be numbered as open source changes

InfoWorld, Matt Asay


“Kubernetes (and a host of other new-school open source projects) turns this model on its head, however. It started life as open source and thereby rendered the initial price as $0.00. This is OK for now while Google funds Kubernetes to provide an API-laden path to its cloud, but what happens if Google finds more efficient ways to drive paying customers to the Google Cloud Platform and other services?”

Constrained Policy Optimization

The Berkeley Artificial Intelligence Research Blog, Joshua Achiam


Deep reinforcement learning (RL) has enabled some remarkable achievements in hard control problems: with deep RL, agents have learned to play video games directly from pixels, to control robots in simulation and in the real world, to learn object manipulation from demonstrations, and even to beat human grandmasters at Go. Hopefully, we’ll soon be able to take deep RL out of the lab and put it into practical, everyday technologies, like UAV control and household robots. But before we can do that, we have to address the most important concern: safety.

We recently developed a principled way to incorporate safety requirements and other constraints directly into a family of state-of-the-art deep RL algorithms. Our approach, Constrained Policy Optimization (CPO), makes sure that the agent satisfies constraints at every step of the learning process. Specifically, we try to satisfy constraints on costs: the designer assigns a cost and a limit for each outcome that the agent should avoid, and the agent learns to keep all of its costs below their limits.
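
To make the "cost and limit" idea concrete, here is a minimal sketch of the bookkeeping it implies: average the discounted cost of each constraint over a batch of rollouts and compare it with that constraint's limit. This is only the constraint check, not the CPO policy update described in the post. The function names, constraint names, rollout data, and limits are all hypothetical.

```python
def discounted_cost(costs, gamma=0.99):
    """Discounted sum of per-step costs for one trajectory."""
    total = 0.0
    for c in reversed(costs):
        total = c + gamma * total
    return total


def constraint_report(cost_trajectories, limits, gamma=0.99):
    """Average discounted cost per constraint, compared with its limit.

    cost_trajectories: {constraint name: list of per-step cost lists}
    limits:            {constraint name: maximum allowed average cost}
    Returns {constraint name: (average cost, True if within the limit)}.
    """
    report = {}
    for name, trajectories in cost_trajectories.items():
        mean_cost = sum(discounted_cost(t, gamma) for t in trajectories) / len(trajectories)
        report[name] = (mean_cost, mean_cost <= limits[name])
    return report


# Made-up rollouts: each inner list holds the per-step costs of one episode.
rollouts = {
    "left_safe_zone": [[0, 1, 0], [0, 0, 0]],
    "collision": [[0, 0, 1], [0, 1, 1]],
}
limits = {"left_safe_zone": 0.5, "collision": 0.5}

report = constraint_report(rollouts, limits, gamma=1.0)
for name, (cost, ok) in report.items():
    print(f"{name}: average cost {cost:.2f}, within limit: {ok}")
```

A constrained method would use a report like this during training, steering updates so that every entry stays within its limit rather than only checking afterwards.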


Internships and other temporary positions

Part investment associate/part community builder

First Round venture capital; New York, NY
