Data Science newsletter – June 26, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 26, 2017


Data Science News

Tweet of the Week

Twitter, ali


Data Visualization of the Week

Twitter, Stanford Journalism


As a researcher…I’m a bit bloody fed up with Data Management

Cameron Neylon


The following will come across as a rant. Which it is. But it’s a well intentioned rant. Please bear in mind that I care about good practice in data sharing, documentation, and preservation. I know there are many people working to support it, generally under-funded, often having to justify their existence to higher-ups who care more about the next Glam Mag article than whether there’s any evidence to support the findings. But, and it’s an important but, those political fights won’t become easier until researchers know those people exist, value their opinions and input, and internalise the training and practice they provide. The best way for that to happen is to provide the discovery points, tools and support where researchers will find them, in a way that convinces us that you understand our needs. This rant is an attempt to illustrate how large that gap is at the moment.

Nonprofits, not Silicon Valley startups, are creating AI apps for the greater good

Recode, Shannon Farley


Predictions for the potential of artificial intelligence wax poetic — solutions from climate change to curing disease — but the everyday applications make it seem far more mundane, like a glorified clock radio.

Thankfully, the future may be closer than we think. And the miraculous feats are not happening in Silicon Valley X-Labs — in a plot twist, nonprofits are leading the charge in creating human-centered applications of the hottest AI technologies. From the simplest automated communications to contextual learnings based on analysis of deep data, these technologies have the potential to rapidly scale and improve the lives of our most underserved communities.

MIT and Google researchers have made AI that can link sound, sight, and text to understand the world

Quartz, Dave Gershgorn


If we ever want future robots to do our bidding, they’ll have to understand the world around them in a complete way—if a robot hears a barking noise, what’s making it? What does a dog look like, and what do dogs need?

AI research has typically treated the ability to recognize images, identify noises, and understand text as three different problems, and built algorithms suited to each individual task. Imagine if you could only use one sense at a time, and couldn’t match anything you heard to anything you saw. That’s AI today, and part of the reason why we’re so far from creating an algorithm that can learn like a human. But two new papers from MIT and Google explain first steps for making AI see, hear, and read in a holistic way—an approach that could upend how we teach our machines about the world.

“It doesn’t matter if you see a car or hear an engine, you instantly recognize the same concept. The information in our brain is aligned naturally,” says Yusuf Aytar, a post-doctoral AI researcher at MIT who co-authored the paper.

Two Conjectures Collide, Endangering the Naked Singularity

WIRED, Science, Quanta Magazine, Natalie Wolchover


Physicists have wondered for decades whether infinitely dense points known as singularities can ever exist outside black holes, which would expose the mysteries of quantum gravity for all to see. Singularities—snags in the otherwise smooth fabric of space and time where Albert Einstein’s classical gravity theory breaks down and the unknown quantum theory of gravity is needed—seem to always come cloaked in darkness, hiding from view behind the event horizons of black holes. The British physicist and mathematician Sir Roger Penrose conjectured in 1969 that visible or “naked” singularities are actually forbidden from forming in nature, in a kind of cosmic censorship. But why should quantum gravity censor itself?

Now, new theoretical calculations provide a possible explanation for why naked singularities do not exist—in a particular model universe, at least.

Harvard launches data science master’s degree program

Harvard University, John A. Paulson School of Engineering and Applied Sciences


Harvard will offer a Master of Science (SM) degree in Data Science beginning in fall of 2018. The new degree, under the joint academic leadership of the faculties of Computer Science in the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS), and of Statistics in the Faculty of Arts and Sciences (FAS), will train students in the rapidly growing field of data science.

IU trustees approve three new graduate degree programs

News at IU Bloomington


The Indiana University Board of Trustees has approved three new graduate degree programs, including a Master of Science in intelligent systems engineering, the final component of the university’s new program in intelligent systems engineering established in 2015.

The intelligent systems engineering program, the first engineering program in the history of the IU Bloomington campus, is housed within the IU School of Informatics and Computing. The program already offers bachelor’s and Ph.D. degrees.

No One Has the Data to Prevent the Next Flint

WIRED, Science, Issie Lapowsky


You have no real way of knowing if your town, your family, or your children face the kind of water contamination that exposed everyone in Flint, Michigan, to lead poisoning. Not because Flint is an outlier–it may, in fact, be the norm—but because no one has enough data to say for sure.

Five state and local officials in Flint face involuntary manslaughter charges for failing to alert the public to the looming health crisis there. Yet a recent Reuters report found 3,000 geographic areas in the US with lead poisoning rates twice that of Flint. But you would be hard-pressed to determine whether you lived in one of them because the United States lacks the data—and data collection requirements—needed to know for sure whether people are being poisoned by their drinking water. President Trump’s proposed cuts to the Environmental Protection Agency could make it even harder to know.

Policy and population behavior in the age of Big Data

Current Opinion in Behavioral Sciences; Kai Ruggeri et al.


Policies are large-scale interventions that typically aim to influence behaviors and decision-making across entire populations to obtain a desired outcome. With the rapid increase in Big Data and its growing influence on policy, there is an emerging opportunity to produce meaningful and efficient mechanisms for improving public policy outcomes. However, there are still considerable gaps between existing theories in the behavioral sciences and evidence generated by Big Data, including the representation of key groups within the population. We outline the need for replicating established behavioral insights through Big Data that should coincide with clear ethical standards for implementing such approaches through evidence-based policymaking. [full text]

Nine ideas for teaching Computing at School from the 2017 CAS conference

Duncan Hull, O'Really? blog


The Computing At School (CAS) conference is an annual event for educators, mostly primary and secondary school teachers from the public and private sector in the UK. Now in its ninth year, it attracts over 300 delegates from across the UK and beyond to the University of Birmingham, see the brochure for details. One of the purposes of the conference is to give teachers new ideas to use in their classrooms to teach Computer Science and Computational Thinking. I went along for my first time (*blushes*) seeking ideas to use in an after school Code Club (ages 7-10) I’ve been running for a few years and also for approaches that undergraduate students in Computer Science (age 20+) at the University of Manchester could use in their final year Computer Science Education projects. So here are nine ideas (in random brain dump order) I’ll be putting to immediate use in clubs, classrooms, labs and lecture theatres:

1. Linda Liukas demonstrated some intriguing ideas from her children’s books that are based on Montessori education. I shall be trying some of these out (particularly the storytelling stuff) at code club to keep girls involved.

When will Alexa, Google Assistant, and other ‘chatbots’ finally talk to us like real people?

Science, Latest News, Matthew Hutson


Nearly every tech company worth its silicon now has a virtual assistant: Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana, and Google Assistant, among others. What do these digital helpers—known as chatbots—have in store for us? Science talked with Alexander Rudnicky, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania, who studies human-machine conversations, about what chatbots can and can’t do, how they learn from us, and whether we can prevent them from adopting our worst behaviors.

Q&A: Why a top mathematician has joined Emmanuel Macron’s revolution

Science, ScienceInsider, Elisabeth Pain


Science talked to one of En Marche!’s new National Assembly members, mathematician and Fields medalist Cédric Villani, 43, who won 69% of the vote in a constituency south of Paris. Villani, who heads the Henri Poincaré Institute in the capital, won a book prize from the American Mathematical Society in 2014 and joined the prestigious Pontifical Academy of Sciences last year. Frequent media appearances over the past decade—and his trademark silk ascot and spider brooch—have made him one of France’s best-known scientists.

Computer chip mimics human brain, with light beams for neurons

Science, Latest News, Matthew Hutson


Artificial neural networks, computer algorithms that take inspiration from the human brain, have demonstrated fancy feats such as detecting lies, recognizing faces, and predicting heart attacks. But most computers can’t run them efficiently. Now, a team of engineers has designed a computer chip that uses beams of light to mimic neurons. Such “optical neural networks” could make any application of so-called deep learning—from virtual assistants to language translators—many times faster and more efficient.

“It works brilliantly,” says Daniel Brunner, a physicist at the FEMTO-ST Institute in Besançon, France, who was not involved in the work. “But I think the really interesting things are yet to come.”

Nissan, eyeing fully self-driving cars, offers a glimpse of its new semi-autonomous Leaf

The Verge, Andrew J. Hawkins


Nissan says the next generation of its all-electric Leaf vehicles will feature the Japanese automaker’s most advanced driver assistance system yet. Nissan first revealed its plan to include the ProPilot technology, which allows single-lane autonomous driving, in the forthcoming version of the Leaf back at CES last January. Now the automaker is offering a sneak peek of its latest effort to take on Tesla in the race to sell more advanced, mass-market electric vehicles.

The glimpse of the interior, behind-the-steering-wheel display looks similar to Tesla’s Autopilot, which offers similar functionality through its Autosteer and Traffic Aware Cruise Control features.

China’s genomics giant to make stock-market debut

Nature News & Comment, David Cyranoski


China’s genomics giant BGI, once the world leader in DNA sequencing for basic science, is going public — capping off a dramatic transformation into a mainly biomedical firm with a focus on reproductive health.

A financial prospectus document released to support the initial public offering (IPO) details how BGI, squeezed by its rivals and the plummeting cost of sequencing, has been drawn to more-profitable pursuits, such as prenatal genetic testing, in China’s expanding medical market. The shift is also in line with the Chinese government’s multibillion-yuan drive to promote precision medicine, an effort to use the reams of genomic and other medical data being created to tailor treatments.

The fight to save thousands of lives with sea-floor sensors

Nature News & Comment, Alexandra Witze


Geophysicists are ramping up their efforts to monitor major undersea faults for movement, and search for signs of the next catastrophic quake.

Zendrive puts the skids under traffic fatalities

Diginomica, Jerry Bowles


In 2013, Jonathan Matus, who helped turn Google’s Android into the world’s dominant mobile software and later led the launch of Facebook’s mobile platform, teamed up with another Google engineer, Pankaj Risbood, to found Zendrive, a tech company that uses sensors in smartphones to capture, analyze and then coach a driver on what they can do to be safer while behind the wheel.

Phones enabled with Zendrive’s technology can relay data not only about when drivers are looking at their phones, but also how fast they’re driving at any given moment, or if they make hard turns at intersections. It detects collisions, aggressive driving, distracted driving, and more, including whether you’ve recently stopped at a pub.

Will Mars Go Mute? NASA’s Aging Orbiters May Not Last Long Enough to Support Future Exploration

Scientific American, Leonard David


The space agency does not have a replacement in the works to take over their vital role of relaying communications.

The Man Who Helped Turn Toronto Into a High-Tech Hotbed

The New York Times, Craig S. Smith


As an undergraduate at Cambridge University, Geoffrey Everest Hinton thought a lot about the brain. He wanted to better understand how it worked but was frustrated that no field of study — from physiology and psychology to physics and chemistry — offered real answers.

So he set about building his own computer models to mimic the brain’s process.

“People just thought I was crazy,” said Dr. Hinton, now 69, a Google fellow who is also a professor emeritus of computer science at the University of Toronto.


Proposers’ Day Announcement for Finding Engineering-Linked Indicators (FELIX)



Washington, DC The Intelligence Advanced Research Projects Activity will host a Proposers’ Day Conference for the FELIX program on July 27 … “The FELIX program aims to develop new capabilities that can detect engineered changes within biological systems to expedite appropriate mitigation responses to unlawful or accidental release of organisms.” [registration required]

What artificial intelligence can learn from the brain, and vice versa: An Evening with Dr. Gary Marcus

Allen Institute for Brain Science, Djavad Mowafaghian Centre for Brain Health


Vancouver, Canada Sunday, July 9, starting at 6 p.m. Kickoff to Neurofutures 2017 conference. [free, registration required]


International Conference on Health Policy Statistics – Submit an Abstract

Charleston, SC Conference is January 10-12, 2018. Deadline for abstracts is June 29.

William G. Hunter Award

June 30 is the application deadline for the William G. Hunter Award, sponsored by the Statistics Division of the American Society for Quality. Presented annually, the award was designed to encourage the creative development and application of statistical techniques to problem solving in the quality field.

ASA Science Policy Fellowships

The application deadline is July 21.

Amazon Research Awards | Call For Proposals

Full-time faculty members of institutions in North America and Europe granting PhD degrees in fields related to Machine Learning are eligible to apply. Deadline for proposals is September 15.

Tools & Resources

[1706.07094] Constrained Bayesian Optimization with Noisy Experiments

arXiv, Statistics > Machine Learning; Benjamin Letham, Brian Karrer, Guilherme Ottoni, Eytan Bakshy


Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems, including Internet services. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for optimizing multiple continuous parameters for field experiments, but existing approaches degrade in performance when the noise level is high. We derive an exact expression for expected improvement under greedy batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. Experiments with synthetic functions show that optimization performance on noisy, constrained problems outperforms existing methods. We further demonstrate the effectiveness of the method with two real experiments conducted at Facebook: optimizing a production ranking system, and optimizing web server compiler flags.

Seeing Like a Bike: Towards Integrating the Sensor System

Data Science for Social Good – Atlanta


The Seeing Like a Bike team is now wrapping up the sensor box and integrating its parts into a system. Each level of the sensor system design, from hardware to software, is undergoing iterative refinement to collect better data and to provide a seamless experience to end users.

How I went from failing every interview to a job at Amazon

Reginald Long


“Over time, I’ve learned that interviewing is a skill. If you treat interviewing like a skill, and not an inherent capability, then you have the agency to improve enough to get your dream tech job. It’s easy to lament the state of software engineering interviews, but it’s harder to accept it and create a plan for success. But would you rather complain and feel good about yourself or have a job?”

Caffeine :: Livecode the Web!

thisContext blog


With the arrival of Bert Freudenberg’s SqueakJS, it was finally time for me to revisit the weird and wonderful world of JavaScript and web development. My previous experiences with it, in my consulting work, were marked by awkward development tools, chaotic frameworks, and scattered documentation. Since I ultimately rely on debuggers to make sense of things, my first question when evaluating a development environment is “What is debugging like?”

A scalable time-series database that supports SQL

O'Reilly Radar, Ben Lorica


“In this episode of the Data Show, I spoke with Michael Freedman, CTO of Timescale and professor of computer science at Princeton University. … The early incarnation of Timescale was a startup focused on IoT, and it was while building tools for the IoT problem space that Freedman and the rest of the Timescale team came to realize that the database they needed wasn’t available (at least out in open source). Specifically, they wanted a database that could easily support complex queries and the sort of real-time applications many have come to associate with streaming platforms.” [audio, 49:19]

Be a Better Communicator by Embarrassing Yourself

New York Magazine, Science of Us blog, Thomas MacMillan


Tina Fey joined Alan Alda onstage at the New-York Historical Society to talk about his new book — If I Understood You, Would I Have This Look on My Face? — as part of a World Science Festival event. The actors — both with backgrounds in comedy and improvisational theater — spoke about how improv techniques can make you a better communicator by teaching you to think fast, trust yourself, and connect with your collaborators and your audience. In the course of the evening, a packed crowd of scientists and science enthusiasts cheered as the two actors played catch with an invisible beach ball, babbled in nonsense language, and enlisted a volunteer to carry a glass of water across the stage — each a lesson about fundamental aspects of good communication, like listening, relating, empathizing, and storytelling.



Postdoctoral Associate – Machine Learning in Computational Biology

Broad Institute; Cambridge, MA
