Data Science newsletter – June 20, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for June 20, 2018


 
 
Data Science News



Dems seek to seize on data privacy as midterm issue

TheHill, Ali Breland



Democrats are looking to seize on growing public worries about data privacy to rally voters in the upcoming midterms.

Tech policy typically ranks low on an average voter’s list of priorities, prompting skepticism that data privacy can move the needle in an election.

But in a couple key Senate races in Nevada and Arizona, Democratic campaign groups say public frustration over how corporations are able to suck up increasing amounts of consumer data could make a difference.

They specifically want to use last year’s congressional vote to repeal broadband privacy regulations to attack Republicans.


Participation in AP Computer Science Principles Grows Again

Education Week, Sasha Jones



Since its launch, AP Computer Science Principles has become one of the most dramatically expanding AP courses, increasing access for both underrepresented minorities and female students to the field of computer science. And it looks like that growth continued into the 2018 testing cycle, according to an early examination of the 2018 exam data released by the College Board this week.

Between 2017 and 2018, the number of students taking the AP CSP exam increased from 50,000 to 76,000—about 50 percent, according to the new data.

In part, this is due to Nevada and Kentucky partnerships with the College Board to bring AP CSP to every school district within their states. In the 2017-2018 school year, Kentucky doubled the number of schools offering the course.

Unlike other computer science courses, AP CSP consists of projects that students create throughout the year and submit, in addition to the end-of-the-year exam, to receive an AP score for college credit. While AP CSP teaches programming and computing, it emphasizes creativity and design, providing students with the time and tools needed to explore problem-solving through computing.

“The success of AP CSP is thanks to the tireless work of educators and AP teachers around the country who have made access to computer science education for all students a priority,” said Trevor Packer, who leads the AP Program. “Students are flocking to these courses because they know they will be using computer science in whatever career they choose.”


The Quest to Decode the Brain

Wall Street Journal, Daniela Hernandez



New tools that can give a more detailed look at how neurons work are leading the research of brain scientists in new directions


From rust belt to robot belt: Turning AI into jobs in the US heartland

MIT Technology Review, David Rotman



The vast vacant lot along the Monongahela River has been a scar from Pittsburgh’s industrial past for decades. It was once the site of the Jones and Laughlin steelworks, one of the largest such facilities in the city back when steel was the dominant industry there. Most of the massive structures are long gone, leaving behind empty fields pocked with occasional remnants of steelmaking and a few odd buildings. It all stares down the river at downtown Pittsburgh.

Next to the sprawling site is one of Pittsburgh’s poorer neighborhoods, Hazelwood, where a house can go for less than $50,000. As with many of the towns that stretch south along the river toward West Virginia, like McKeesport and Duquesne, the economic reasons for its existence—steel and coal—are a fading memory.

These days the old steel site, called Hazelwood Green by its developers, is coming back to life. At one edge, fenced off from prying eyes, is a test area for Uber’s self-driving cars. A new road, still closed to the public, traverses the 178 acres of the site, complete with parking signs, fire hydrants, a paved bike path, and a sidewalk. It doesn’t take much imagination to picture it bustling with visitors to the planned park along the riverfront.


How Netflix sent the biggest media companies into a frenzy, and why Netflix thinks some are getting it wrong

CNBC, Alex Sherman



The success of Netflix in the market is why we’re seeing “the greatest rearranging of the media industry chessboard in history,” according to BTIG media analyst Rich Greenfield.

But chasing scale isn’t the answer for every media company, according to Netflix CFO David Wells.

“Not everybody’s going to get big,” Wells said in an interview. “The strategic question is, ‘what type of business do I want to be in the next five or 10 years?'”

So legacy giants are now beginning to contemplate how to beat Netflix at its own game. Comcast, which owns CNBC parent NBCUniversal, has had preliminary talks with AT&T to start an over-the-top digital streaming service with NBCUniversal and Warner Bros. content, according to people familiar with the matter. Discovery is also pondering its own OTT service, potentially with a global technology company, said other sources. Disney is debuting its streaming service next year.


Company Data Science News

Facebook has been building a streaming service and will spend $1 billion to produce custom content for it. I rage quit Facebook a while back and it has crippled my capacity to comment on the likely success of this maneuver. It’s still true that people are glued to their devices and often too lazy to leave the platforms they are on at the moment…so I reluctantly admit I think Facebook will succeed in increasing the length of time users spend on the platform. This will, of course, result in more advertising dollars.



Netflix realized years ago that there’s a business model for data-informed entertainment media and the company is setting the pace. Facebook can try to catch up. Amazon, among others, will also try.

Great news: Verizon, AT&T, Sprint and T-Mobile have decided not to give information on U.S. phone owners’ locations to data brokers. Bad news: this is a totally voluntary move and there’s no legislation to back it up. I am heartened that the companies have moved in the direction of increasing their protection of user data. Applause to the legal teams of all four companies.



Nvidia is opening a new AI research hub in Toronto, one of the hottest Canadian cities for AI research. The lab will be run by Sanja Fidler, who is also a professor at the University of Toronto.

Adobe will also be opening an artificial intelligence research center in Canada. The company is searching for someone Toronto-based to lead it.



IBM’s brand new Project Debater chatbot held its own against humans. Honestly, I’m not sure if this is a huge leap for chatbot technology or a sad commentary on the status of human debate skills.



Smart home devices are being used to perpetrate domestic abuse. Usually the perpetrators are men and the victims are women.



Microsoft acquired four-year-old reinforcement learning startup Bonsai. Not sure how much money changed hands.

Atul Gawande, a surgeon and author of many best-selling books on medicine, will become the CEO of a new healthcare collaboration formed by Amazon, Berkshire Hathaway, and JPMorgan Chase.

Marc Benioff thinks Silicon Valley tech people should start fighting for user privacy: “as an industry, we must recognize that the failure to protect personal data is not only a danger to consumers, it also poses one of the greatest threats to the long-term health of the high-tech industry itself, and, by extension, our innovation-based economy.”


The Brilliant Ways UPS Uses Artificial Intelligence, Machine Learning And Big Data

Forbes, Bernard Marr



In a business where shaving off a mile per day per driver can result in savings of up to $50 million per year, UPS has plenty of incentive to incorporate technology to drive efficiencies in every area of its operations. According to UPS’ Chief Technology Officer Juan Perez, “Our business drives technology at UPS.” Here are just a few of the ways UPS uses big data and artificial intelligence (AI) to prepare for the 4th Industrial Revolution.


How Spirit AI uses artificial intelligence to level up game communities

VentureBeat, Dean Takahashi



Spirit AI is using artificial intelligence to combat toxic behavior in game communities. The London company has created its Ally social intelligence tool to decipher online conversations and monitor whether cyberbullying is taking place.

It is the brainchild of researchers at New York University, according to Mitu Khandaker, creative partnerships director at Spirit AI and an assistant arts professor at the NYU game center. The company uses AI, natural language understanding, and machine learning to help data science and customer service teams to understand the general tenor of an online community. It also helps predict problems before they escalate.

Ally considers context, nuance and the relationships between users versus seeking and blocking keywords. The software uses natural language understanding and AI to identify the intent of a message, and then analyzes the behavior and reactions to determine its impact. It can do things like identify cyberbullying.


Extra Extra

San Francisco is a uniquely challenging place to live. If you find an apartment you can afford, you might want to make sure it isn’t in one of these 39 high rises that are likely to collapse in the next major earthquake.



All those claims about brain chemistry? Yeah, there is still no exacting strategy for measuring minute changes in brain chemistry, according to NIH and university researchers. We may have one soon, but I hope all of my readers are aware that the talk about serotonin and dopamine is usually a little ducky (quack science).



The stories you have heard and *I have taught* about Philip Zimbardo’s prison studies were fake. He trained his students to act, then claimed they fell into these roles with no more prompting than the guard or prisoner labels he had applied to them. And then he let us all believe his fake study – even did a documentary about it – for decades.


The Pipeline – A wide-ranging, unabashedly honest conversation about why tech companies aren’t diverse — and how to fix them

California Sunday, As told to Joy Shan and Elise Craig



When people talk about tech’s lack of diversity, they often talk about the pipeline problem: the idea that there aren’t enough qualified women and people of color to hire, which is why the industry is so homogeneous. But to some, the so-called pipeline problem is an excuse. Tech companies, they believe, simply aren’t trying hard enough. To better understand the contours of this debate, we talked to teachers, first-time startup founders, recruiters, engineers, venture capitalists, diversity and inclusion consultants, aspiring coders, and programmers who can’t wait to quit.


Everything big data claims to know about you could be wrong

University of California-Berkeley, Berkeley News



When it comes to understanding what makes people tick — and get sick — medical science has long assumed that the bigger the sample of human subjects, the better. But new research led by UC Berkeley suggests this big-data approach may be wildly off the mark.

That’s largely because emotions, behavior and physiology vary markedly from one person to the next and one moment to the next. So averaging out data collected from a large group of human subjects at a given instant offers only a snapshot, and a fuzzy one at that, researchers said.

The findings, published this week in the Proceedings of the National Academy of Sciences journal, have implications for everything from mining social media data to customizing health therapies, and could change the way researchers and clinicians analyze, diagnose and treat mental and physical disorders.

“If you want to know what individuals feel or how they become sick, you have to conduct research on individuals, not on groups,” said study lead author Aaron Fisher, an assistant professor of psychology at UC Berkeley. “Diseases, mental disorders, emotions, and behaviors are expressed within individual people, over time. A snapshot of many people at one moment in time can’t capture these phenomena.”
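To make the group-versus-individual point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the data are synthetic, the variable names are made up, and it is not the study's data or method. It builds five hypothetical people whose two measurements move in opposite directions over time, then shows that pooling everyone together reverses the sign of the relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic data (hypothetical, not from the study): five people,
# five time points each. Each person has their own baseline; within
# a person, x rises over time while y falls.
people = []
for person in range(5):
    baseline = 10 * person
    xs = [baseline + t for t in range(5)]
    ys = [baseline - t for t in range(5)]
    people.append((xs, ys))

# Correlation computed separately inside each person's time series.
within = [pearson(xs, ys) for xs, ys in people]

# Correlation computed after pooling every observation into one group.
pooled_x = [x for xs, _ in people for x in xs]
pooled_y = [y for _, ys in people for y in ys]
pooled = pearson(pooled_x, pooled_y)

print("within-person correlations:", [round(r, 2) for r in within])
print("pooled group correlation:", round(pooled, 2))
```

In this made-up data set the pooled correlation is strongly positive while every within-person correlation is negative, because differences between people's baselines swamp the changes within each person.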


University Students Create Spatial Analysis Tools to Help Cities Do More with Data

Government Technology, Ben Levine



Graduate students from the University of Pennsylvania’s Master of Urban Spatial Analytics Practicum are working with city officials in Philadelphia, Providence, R.I., and Minneapolis to develop data science tools to improve safety, health and quality of life for residents.


Government Data Science News

The UK has a new data ethics framework designed to help policy makers and data science practitioners do well with data science. The first of its seven principles is “Start with clear user need and public benefit.” Well, I certainly damn-well hope this is the first consideration you’ve all implicitly been using.



Nevada and Kentucky are offering AP computer science in every district in their respective states. There has been a noticeable uptick in the number of students taking the AP CS exam, which is actually more of a project-based, year-long immersive learning experience than a one-off exam preceded by copious amounts of memorization. At the same time, elite private secondary schools in Washington, D.C. will stop teaching AP classes. Better AP classes, and better-prepared AP students, stand to erode some of the competitive advantage held by elite pre-college institutions.



Joseph Simons, the new head of the FTC, announced that the big tech companies better be ready because he’s coming for them. Simons has scheduled 15-20 hearings on “internet privacy, big data and platform companies’ market power.” Public comments are welcome.



California is considering a bill that would require all data and findings funded by the state be made available to the public no more than one year after publication. This state-level open access policy is great, especially in combination with similar policies at private foundations and some federal agencies.



Orlando International Airport will be the first to scan faces of arriving and departing passengers on all international flights. The airport’s chief executive described the announcement as “like Christmas in June.” What happens to the scans? The “face scan is compared to a Department of Homeland Security biometric database that has passport images of people who should be on the flight.” Problematic people will be investigated; others will speedily proceed with their day. Technically, US citizens can opt out, but there is no known procedure for doing so.



California’s net neutrality law passed, but not before being watered down at the last minute by Democratic Assemblymember Miguel Santiago, who appeared to be fed by AT&T executives. He introduced amendments that were not read or voted on by the rest of the committee before the vote was called.


3 Big Benefits of Self-Service Data Access

Socrata, Inc., Justin Bruce



To truly transform their programs, front-line staff often need self-service access to raw data more than they need roll-ups of aggregated data.

Aggregate data shows sweeping trends: Are crime rates up or down? Has the number of buildings damaged by blight changed? How are graduation rates trending? This is valuable, of course, but for on-the-ground knowledge workers, access to raw data is even more meaningful.

With self-service access to the more granular information that makes up the aggregate data — the timing of parking tickets, for instance, or locations of blighted buildings — government staffers can understand what’s causing trends. The nitty-gritty details can also shape strategic plans. Yet, too often, on-the-ground departmental employees can’t get eyes on data without requesting a report from IT.


Mobile carriers cut off flow of location data to brokers

Associated Press, Frank Bajak



Verizon, AT&T, Sprint and T-Mobile have pledged to stop providing information on U.S. phone owners’ locations to data brokers, stepping back from a business practice that has drawn criticism for endangering privacy.

The data has apparently allowed outside companies to pinpoint the location of wireless devices without their owners’ knowledge or consent. Verizon said that about 75 companies have been obtaining its customer data from two little-known California-based brokers that Verizon supplies directly — LocationSmart and Zumigo.

Verizon was the first major carrier to declare it would end sales of such data to brokers that then provide it to others. It did so in a June 15 letter to Sen. Ron Wyden, an Oregon Democrat who has been probing the phone location-tracking market. AT&T, T-Mobile and Sprint followed suit Tuesday after The Associated Press reported the Verizon move.

None of the carriers said they are getting out of the business of selling location data. The carriers together have more than 300 million U.S. subscribers.


Measuring Brain Chemistry

NIH, Director's Blog, Dr. Francis Collins



Serotonin is one of the chemical messengers that nerve cells in the brain use to communicate. Modifying serotonin levels is one way that antidepressant and anti-anxiety medications are thought to work and help people feel better. But the precise nature of serotonin’s role in the brain is largely unknown.

That’s why Anne Andrews set out in the mid-1990s as a fellow at NIH’s National Institute of Mental Health to explore changes in serotonin levels in the brains of anxious mice. But she quickly realized it wasn’t possible. The tools available for measuring serotonin—and most other neurochemicals in the brain—couldn’t offer the needed precision to conduct her studies.

Instead of giving up, Andrews did something about it. In the late 1990s, she began formulating an idea for a neural probe to make direct and precise measurements of brain chemistry. Her progress was initially slow, partly because the probe she envisioned was technologically ahead of its time. Now at the University of California, Los Angeles (UCLA) more than 15 years later, she’s nearly there. Buoyed by recent scientific breakthroughs, the right team to get the job done, and the support of a 2017 NIH Director’s Transformative Research Award, Andrews expects to have the first fully functional devices ready within the next two years.

 
Events



USGIF Announces New trajectoryXyzt Event

Trajectory Magazine, USGIF



Santa Monica, CA September 19-20. “The first ever conference about the commercial applications of geospatial intelligence (GEOINT). The Foundation is creating trajectoryXyzt in response to the rapid emergence of multiple technologies enabling GEOINT in the commercial space.” [save the date, registration opening soon]

 
Deadlines



13th INFORMS Workshop on Data Mining & Decision Analytics

Phoenix, AZ November 3, in conjunction with INFORMS National Meeting. Deadline for submissions is September 1.

PANAMM 2018: Workshop on Positive and Negative Adversaries in Multimedia

Seoul, Korea. Satellite to ACM Multimedia 2018, which takes place October 22-26. Deadline for submissions is September 8.
 
Tools & Resources



How to test accessibility of emails

Campaign Monitor Engineering, Samantha Connelly



Testing for accessibility in your emails doesn’t have to be a chore. Making accessible and inclusive emails will increase your target audience (1, 2, 3) and will have other unintended benefits. These other benefits are known as the Curb-Cut effect. When people first introduced curb cuts (i.e. sloped ramps) at intersections for wheelchair users, they discovered that many other people benefited too: people pushing strollers or shopping trolleys, people with deliveries and people with walkers all used these new curb cuts to help navigate their urban environments.


How to Build Your Own Laugh Detector

IDEO Labs, Nat Steinsultz



At their best, machine learning algorithms are only capable of reproducing patterns that exist in the training set, so it’s important to consider any existing structure or bias that could fundamentally characterize the way that the trained algorithm behaves during inference.

I decided to use Audioset as the basis of my laughter detection algorithm, largely because of its accessibility and size. However, the entire dataset was too large and diverse for my problem; I really only care about detecting one kind of noise. So I created my own subset of the Audioset data that contained all the examples of laughter, along with an equal number of examples that did not contain laughter. I limited the non-laughter examples to human sounds. I skipped sounds like musical instruments, farm animals, and explosions, since we’re unlikely to record them in a project space. Including those sounds in the training set might lead to an algorithm that achieves a high accuracy during training, but underperforms during inference.
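The subset-building step described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the function name `balanced_subset`, the in-memory dictionary of clip labels, and the toy label names are all assumptions, and a real pipeline would read labels from the dataset's own metadata files.

```python
import random
from typing import Dict, List, Set


def balanced_subset(
    clips: Dict[str, Set[str]],
    target: str,
    allowed_negatives: Set[str],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Keep every clip labeled `target`, plus an equal number of
    clips whose labels all fall inside `allowed_negatives`."""
    positives = sorted(cid for cid, labels in clips.items() if target in labels)

    # Negative candidates: no target label, and every label on the
    # clip belongs to the allowed set (e.g. human sounds only).
    candidates = sorted(
        cid
        for cid, labels in clips.items()
        if labels and target not in labels and labels <= allowed_negatives
    )
    if len(candidates) < len(positives):
        raise ValueError("not enough negative candidates to balance the subset")

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    negatives = sorted(rng.sample(candidates, len(positives)))
    return {"positive": positives, "negative": negatives}


# Toy labels (hypothetical clip ids and label names).
clips = {
    "a": {"Laughter"},
    "b": {"Laughter", "Speech"},
    "c": {"Speech"},
    "d": {"Cough"},
    "e": {"Guitar"},               # not a human sound: excluded
    "f": {"Speech", "Explosion"},  # contains a non-human sound: excluded
    "g": {"Sneeze"},
}
human_sounds = {"Speech", "Cough", "Sneeze"}

subset = balanced_subset(clips, "Laughter", human_sounds)
print(subset)
```

Requiring every label on a negative clip to be in the allowed set is one possible reading of "limited the non-laughter examples to human sounds"; a looser filter that only requires at least one human-sound label would admit clips like "f" above.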


If We Assume: The “Curve of Knowledge”

James Davenport, If We Assume blog



As a student I was worried that I wouldn’t be able to come up with new ideas, or discover the Next Cool Thing!… I think a lot of people worry about that.

Honestly: I still worry that I don’t have another “amazing” idea in me…

To help ease that fear as a PhD student, I jotted down a concept I called the “Curve of Knowledge”. It works like this:

  • Come up with new ideas or notions for projects as they occur to you naturally
  • Spend a little time vetting the idea, making sure it’s not obviously flawed
  • Look through the literature, assuming somebody already had this idea decades ago
  • When you inevitably find that the idea has been done, see what the date on the paper says.


Twitter meets TensorFlow

Twitter, Engineering blog, Nicholas Léonard and Cibele Montez Halasz


Machine learning enables Twitter to drive engagement, surface content most relevant to our users, and promote healthier conversations. As part of its purpose of advancing AI for Twitter in an ethical way, Twitter Cortex is the core team responsible for facilitating machine learning endeavors within the company. With first-hand experience running machine learning models in production, Cortex seeks to streamline difficult ML processes, freeing engineers to focus on modeling, experimentation, and user experience. Our mission is to empower internal teams to efficiently leverage artificial intelligence by providing a platform and unifying, educating, and advancing the state of the art in ML technologies within Twitter. Indeed, Cortex is Twitter’s ML platform team.

In this blog post, we will discuss the history, evolution, and future of our modeling/testing/serving framework, internally referred to as Deepbird, applying ML to Twitter data, and the challenges of serving ML in production settings. Indeed, Twitter handles large amounts of data and custom data formats. Twitter has a specific infrastructure stack, latency constraints, and a large request volume.

     
Careers


Full-time positions outside academia

Software Engineer

Skopos Labs; Brooklyn, NY

Microbial Genomics Scientist

One Codex; San Francisco, CA

Research and Evaluation Assistant

National Socio-Environmental Synthesis Center; Annapolis, MD


Full-time, non-tenured academic positions

Research Officer in Analytic Software Development

London School of Economics, Departments of Methodology and International Relations; London, England

Policy Analyst (Data Scientist)

University of Chicago, Crime Lab New York; New York, NY
