Data Science newsletter – July 13, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 13, 2018


Data Science News

UC opens doors to record number of Californians, led by growth in transfer students

Los Angeles Times, Teresa Watanabe


The University of California opened its doors to a record number of Californians for fall 2018, led by growth in transfer students from across the state, according to preliminary data released Wednesday.

The public research university’s nine undergraduate campuses offered seats to 95,654 Californians, nearly 3,000 more students than last year. Overall, UC admitted about three-fifths of the 221,788 California, out-of-state and international students who applied.

Data sciences program blends interdisciplinary training for a growing industry

Penn State University, Penn State News


While Vince Trost was working as a stock clerk at his hometown grocery store near Pittsburgh, his manager constantly tasked him with putting mayonnaise on the shelves.

“I started to ask myself ‘why do we need so much mayonnaise?’” said Trost. “The greater Cranberry area didn’t need so much mayo.”

After learning that his manager indiscriminately ordered the stock based on estimated demand for upcoming sales, Trost turned to Google. He wanted to know if there was a company that could optimize orders based on historical sales data. That’s when he discovered the up-and-coming field of data sciences.

“I found all these cool articles about companies using data to solve really big problems,” he said.

Rice adopting boot camp model for big data

Houston Chronicle, Andrea Leinfelder


Rice University is providing an express lane for data analysis jobs and their potential $60,000-plus salary.

The Houston campus is taking the coding boot camp model, popularized in recent years for its quick cultivation of web developers, and adapting it to big data analysis. The program meets a need for local employers that struggled to fill more than 50,000 positions requiring data proficiency in the past year.

Local data analysts make an average of $62,118 a year. Data scientists average $105,726, according to Glassdoor.

“The job demands are massive, and it’s across the spectrum,” said Robert Bruce, dean of the Susanne M. Glasscock School of Continuing Studies at Rice.

Using AI To Transform Hollywood’s Marketing And Much More

TV[R]EV, David Bloom


Entertainment companies are entering the Age of Data, where they’ll have access to more information than ever about their products, their audiences and how to create, market and distribute one to the other. Now, those companies and their leadership have to be ready to embrace the coming huge opportunities, especially as data-driven competitors such as Netflix, MoviePass and Amazon transform the industry.

That was one message this morning from Stephen F. DeAngelis, CEO and founder of AI provider Enterra Solutions, speaking before a group of Hollywood technology executives in Beverly Hills.

He noted wryly that Hollywood has portrayed AI technologies in dark or at least complicated ways over the years, from the murderous HAL 9000 in 2001: A Space Odyssey to the world-ending SkyNet in the Terminator films to the runaway AIs of Ex Machina and Her.

We’re quite a ways still from AI with that kind of power and autonomy, DeAngelis said, but he cautioned that people think of AI tools in overly limited ways.

Machine learning boosts Swiss startup’s shot at human-powered land speed record

TechCrunch, Devin Coldewey


The current world speed record for riding a bike down a straight, flat road was set in 2012 by a Dutch team, but the Swiss have a plan to topple their rivals — with a little help from machine learning. An algorithm trained on aerodynamics could streamline their bike, perhaps cutting air resistance by enough to set a new record.

Currently the record is held by Sebastiaan Bowier, who in 2012 set a record of 133.78 km/h, or just over 83 mph. It’s hard to imagine how his bike, which looked more like a tiny landbound rocket than any kind of bicycle, could be significantly improved on.

But every little bit counts when records are measured down to a hundredth of a unit, and anyway, who knows but that some strange new shape might totally change the game?

To pursue this, researchers at the École Polytechnique Fédérale de Lausanne’s Computer Vision Laboratory developed a machine learning algorithm that, trained on 3D shapes and their aerodynamic qualities, “learns to develop an intuition about the laws of physics,” as the university’s Pierre Baqué said.

What happens when you take a bunch of medications? A new algorithm could help doctors figure it out – Scope

Stanford Medicine, Scope Blog


People take a lot of drugs: According to the CDC, about a quarter of us took two prescription drugs in the last month, and people over 65 routinely take five or more, some as many as 20. And the thing is, doctors don’t always know what side effects some combinations might cause, because there are just too many combinations to test. What we know about those side effects is largely discovered by accident and recorded in adverse-event reporting systems.

Fortunately, computer scientists may be able to help. As I explain in a Stanford News story, researchers Marinka Zitnik, PhD; Monica Agrawal; and Jure Leskovec, PhD, have designed a new system to deal with the literally billions of possibilities when considering any two drugs — out of about 5,000 on the U.S. market — and one of around a thousand different side effects. Their system builds a partial picture of those possibilities, starting with what data is already available about how proteins in our bodies interact with drugs and what adverse side effects have already been reported.
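The "literally billions" figure can be checked with a short calculation. The sketch below assumes the round numbers quoted in the excerpt (about 5,000 drugs and about 1,000 side effects) and counts unordered drug pairs; the function name is hypothetical and is not part of the Stanford system.

```python
from math import comb

# Approximate figures quoted in the excerpt above (assumptions, not exact counts).
N_DRUGS = 5_000
N_SIDE_EFFECTS = 1_000


def count_candidates(n_drugs: int, n_side_effects: int) -> int:
    """Count (unordered drug pair, side effect) combinations."""
    return comb(n_drugs, 2) * n_side_effects


total = count_candidates(N_DRUGS, N_SIDE_EFFECTS)
print(f"{total:,} possible pair/side-effect combinations")
```

With these inputs the count comes to roughly 12.5 billion, which is consistent with the claim that there are too many combinations to test one by one.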

Should the ‘gig economy’ factor into a college education?

Christensen Institute, Michael B. Horn


Tom Vander Ark, CEO of Getting Smart, a learning design firm, and his team have been running a series on their blog called “It’s a Project-Based World.” The series focused on making recommendations for educators, but the changes Vander Ark and his team have been writing about have big implications for graduates and employers as well. I caught up with Vander Ark and asked him his thoughts on aligning education to the gig economy.

Blazar-made intergalactic ‘ghost’ heralds new era in astronomy

NOVA Next, Tim Treuer


To begin unravelling a century-long mystery in cosmology, astronomers first had to catch a ghost.

And recently, they did, detecting a single neutrino, a particle so small that it was historically treated as a dimensionless point without width or volume. The detection pointed researchers to a supermassive black hole in a distant galaxy, suggesting an intergalactic source for a variety of exotic and enigmatic cosmic rays that have puzzled the scientific community since the early 20th century.

OpenHouse.AI: Disrupting Real Estate through Transparency

Forbes, Cognitive World, Hessie Jones


How does technology alter an industry that is heavily regulated, with strong checks and balances and where information is heavily guarded? The real estate industry is comprised mainly of independent realtors, unified under brokerage houses and a nationwide real estate association. Within the confines of this oligopoly are rules to protect its players and ensure the market maintains dependence on the structure that has been defined for decades.

In the meantime, the market has emerged into an on-demand economy that has been driven by information. Today home buyers’ increasing access to information allows them, in some ways, to circumvent the agent. The need for instant gratification and knowledge to find the best value at the lowest cost is slowly evolving this industry. How real estate succeeds in the next decade will fully rely on the changing habits of the home buyer, with an acquiescence to slowly dismantling this market structure to improve buyer access to information, while creating new sources of value.

One start-up based in Calgary and Toronto is paving the way for this disruption and is challenging the players, both the realtors and the home buyers, to think differently about these transactions. OpenHouse.AI is a cloud-based, data- and feature-rich web platform for residential real estate that empowers buyers and sellers to make better and faster real estate decisions through clarity and transparency.

Bloomberg Offers Advanced Online Course in Machine Learning

Dice, Nick Kolakowski


Bloomberg is the latest firm to launch a (free) online course in machine learning. And it’s definitely not meant for folks who aren’t great at math.

Indeed, Bloomberg’s Foundations of Machine Learning is meant for those tech pros with strong experience in machine learning and mathematics, such as financial analysts, senior project managers, and engineers with quite a few years of experience under their (proverbial) belts. It’s an offshoot of an internal “Machine Learning EDU” initiative already taken by 1,000 Bloomberg software engineers.

The course is taught by David Rosenberg, a data scientist in the Office of the CTO at Bloomberg; he’s also an instructor at NYU’s Center for Data Science. “In doing this, our goal is to help make valuable machine learning skills more accessible to people with a strong math background, including experienced software developers, experimental scientists, engineers and financial professionals,” read a new Bloomberg blog posting about the program.

Georgia Tech Solves ‘Texture Fill’ Problem with Machine Learning

Georgia Tech, College of Computing


A new machine learning technique developed at Georgia Tech may soon give budding fashionistas and other designers the freedom to create realistic, high-resolution visual content without relying on complicated 3-D rendering programs.

TextureGAN is the first deep image synthesis method that can realistically spread multiple textures across an object. With this new approach, users drag one or more texture patches onto a sketch — say of a handbag or a skirt — and the network texturizes the sketch to accurately account for 3-D surfaces and lighting.

The California Consumer Privacy Act of 2018: a deep dive

Dr. Sanebaits Thenball, Digifesto blog


I have given the California Consumer Privacy Act of 2018 a close read.

In summary, the act grants consumers a right to request that businesses disclose the categories of information about them that they collect and sell, and gives consumers the right to have businesses delete their information and to opt out of its sale.

What follows are points I found particularly interesting. Quotations from the Act (that’s what I’ll call it) will be in bold. Questions (meaning, questions that I don’t have an answer to at the time of writing) will be in italics.

Don’t Just Lecture Robots—Make Them Learn

WIRED, Science, Matt Simon


New research out of UC Berkeley is making learning way easier on both the human and machine: By drawing on prior experience, a humanoid-ish robot called PR2 can watch a human pick up an apple and drop it in a bowl, then do the same itself in one try, even if it’s never seen an apple before. It’s not the most complex of tasks, but it’s a big step toward making machines rapidly adapt to our needs, fruit-related or otherwise.

Australian experiment wipes out over 80% of disease-carrying mosquitoes

CNN, Jesse Yeung


In an experiment with global implications, Australian scientists have successfully wiped out more than 80% of disease-carrying mosquitoes in trial locations across north Queensland.

The experiment, conducted by scientists from the Commonwealth Scientific and Industrial Research Organization (CSIRO) and James Cook University (JCU), targeted Aedes aegypti mosquitoes, which spread deadly diseases such as dengue fever and Zika.

In JCU laboratories, researchers bred almost 20 million mosquitoes, infecting males with bacteria that made them sterile. Then, last summer, they released over three million of them in three towns on the Cassowary Coast.

Music streams in the US grew by over 118bn in the first half of 2018 – the biggest jump in history

Music Business Worldwide, Tim Ingham


Here’s an encouraging stat for the global music business: the amount of music streamed in the United States in the first half of this year grew by the largest volume ever seen.

According to Nielsen Music’s new mid-year report on the US market, there were 403.4bn total music streams in the States in the six months to end of June, up 41.6%.

Of that number, 268.2bn streams were on audio platforms (+45.4%) and 135.2bn were on video platforms (+34.7%).
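The headline increase of "over 118bn" is not stated directly in the Nielsen figures above, but it can be recovered from the total and the growth rate. This is a minimal sketch of that arithmetic; the helper name is hypothetical, and the inputs are the rounded numbers from the excerpt, so the results are approximate.

```python
def prior_value(current: float, pct_growth: float) -> float:
    """Back out last year's value from this year's value and the percent growth."""
    return current / (1 + pct_growth / 100)


# Figures in billions of streams, as quoted in the excerpt.
total_2018 = 403.4
audio_2018 = 268.2
video_2018 = 135.2

total_2017 = prior_value(total_2018, 41.6)
increase = total_2018 - total_2017

print(f"H1 2017 total (implied): {total_2017:.1f}bn")
print(f"Year-over-year increase: {increase:.1f}bn")
print(f"Audio + video: {audio_2018 + video_2018:.1f}bn")
```

The implied increase lands between 118bn and 119bn, matching the headline, and the audio and video figures sum to the reported total.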


The Unlinkable Data Challenge: Advancing Methods in Differential Privacy

“This competition is about creating new methods, or improving existing methods of data de-identification, in a way that makes de-identification of privacy-sensitive datasets practical. A first phase hosted on HeroX will ask for ideas and concepts, while later phases executed on Topcoder will focus on the performance of developed algorithms.” Deadline for submissions is July 26.


Tools & Resources


Google, Colaboratory


“We call them ‘seeds.’ Each seed is a machine learning example you can start playing with. Explore, learn and grow them into whatever you like.”

Foundations of Machine Learning

GitHub – Bloomberg


“Bloomberg presents ‘Foundations of Machine Learning,’ a training course that was initially delivered internally to the company’s software engineers as part of its ‘Machine Learning EDU’ initiative. This course covers a wide variety of topics in machine learning and statistical modeling. The primary goal of the class is to help participants gain a deep understanding of the concepts, techniques and mathematical frameworks used by experts in machine learning.”

Company Data Science News

Harvard Dataverse

Facebook has published a dataset of all of the URLs shared on Facebook at least 20 times, with at least one public share. These URLs are tied to information about the person sharing them – gender, location, etc. – and the data will be accessible by researchers whose RFPs are approved. This is part of the effort led by Alondra Nelson and the Social Science Research Council, Facebook, and researchers like Gary King from Harvard.

Milena Marinova has moved from Intel to Pearson, where she will be using data science to personalize education. While I believe there may be some value in personalized education, I am also worried that it will yield negative unintended consequences. We have tried placing students in educational tracks before. It tends to benefit gifted students but increase certain challenges for those who aren’t labeled gifted. Marinova is highly educated but appears to have spent zero years as a teacher. Further, Pearson tends to incite suspicion among academics who feel that the company puts profit motives over pedagogical excellence.

Andy Patrizio, writing for Ars Technica, has an overview of the ongoing “arms race” for AI-specific chip technology. To recap, Google, Facebook, and Microsoft are developing AI-specific chips but will likely never sell them to consumers, preferring to push that traffic to their cloud-based web services. Intel, with Spring Crest and Lake Crest, and Nvidia, with Jetson Xavier, Drive Pegasus, and Drive Xavier, are releasing new chips to the consumer market. Both of these companies already have deep consumer-chip experience.

Twitter is deleting bots and other fake accounts. This will surely improve the user experience. Only idiotic investors will clutch their pearls over a one-time dip in user numbers. Jeremy Ashkenas is tracking Twitter follower counts at Observable.

Hinge, the “relationship app” startup based in New York City, is now using data science to suggest your “most compatible” match based on the people you’ve right-swiped in the past. Does anyone else see a huge problem with the training dataset here? Use this in class as an example and ask which of the following it is likely to produce: false positives or false negatives.


Full-time positions outside academia

Research Scientist, Google Brain

Google; Amsterdam, Netherlands


Tenured and tenure track faculty positions

Associate Professor – Critical Journalism Studies

University of California-San Diego, Department of Communication; San Diego, CA
