Data Science newsletter – March 14, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for March 14, 2018


 
 
Data Science News



PolyU and Alibaba Join Hands to Promote Integration of Fashion and Artificial Intelligence

Hong Kong Polytechnic University



The Institute of Textiles and Clothing (ITC) of The Hong Kong Polytechnic University (PolyU) collaborates with the Vision and Beauty Team at Alibaba Group, which specializes in vision intelligence and applications, to establish the first-of-its-kind “FashionAI Dataset” for systematic analysis and labelling of fashion images based on “fashion attributes” (fashion characteristics) and “key points” of an apparel. By integrating fashion knowledge and machine learning formulation, the establishment of the Dataset will enable machine to better understand fashion, bringing a new horizon to the fashion retail industry through the application of AI.

“Transforming fashion knowledge into determination of fashion related attributes and fashion item categorization of the fashion image database is a very complicated and challenging task, while it is the most fundamental task in deep learning applications. ITC is pleased to collaborate with Alibaba to address the needs of fashion retailers and consumers,” said Professor Calvin Wong, Cheng Yik Hung Professor in Fashion and Associate Head of ITC.


Government Data Science News

China just published results of a decade-long project to boost agricultural yields while reducing reliance on fertilizers. While they haven’t exactly produced the second Green Revolution, their results are certainly uplifting: “corn, rice and wheat output grew by some 11% over that decade, whereas the use of damaging and expensive fertilizers decreased by between 15% and 18%, depending on the crop.” Score: 1 for “moving in the right direction.”



US District Court Judge Lucy Koh is putting the hurt on Yahoo! following their data breach in 2014 that seems to have been orchestrated by Russian operatives. Now that Verizon bought Yahoo!, the combined company, Oath, will be responsible for damages that could be in the hundreds of millions. A Verizon spokesman, Bob Varettoni, believes “the claims are baseless.” I don’t know about that, Bob.



The US Army awarded Palantir an $876m contract. This comes after Palantir, co-founded by Peter Thiel, sued the government over how the Army would award this particular contract. Click through for an oddly chosen picture of Thiel.



But in New Orleans, Mayor Mitch Landrieu has decided not to renew Palantir’s contract. Landrieu stated only that Palantir’s technology was no longer being used on a daily basis. The technology has led to calls for retrials in high-profile gang cases because of the difficulty of getting Palantir to participate fully in the discovery process. Palantir refused to report whether a suspect was or was not in their database.

The Office for National Statistics in the UK has a new report that spells out how many of the United Nations’ 244 Sustainable Development Goal indicators can be measured. They have “headline data” for 114 of the indicators, may be able to wrestle up data on 86 more, but admit there are legitimate “data gaps” in which no data are likely to be forthcoming for 10 more. Sustainable Cities and Communities (Goal 11) and Life on Land (Goal 15) are two of the goals facing data gaps.


PSU receives $300,000 grant to help students use big data to guide Portland’s growth

Portland State University



The National Science Foundation has awarded Portland State University a three-year, $300,000 grant to help undergraduate student researchers use big data to guide growth in the Portland metropolitan region.

PSU’s “Computational Modeling Serving the City” project will bring eight college students to PSU from throughout the U.S. this summer to work on research projects involving transportation, pollution, water sources and other topics important to Portland’s quality of life.


Hedge Funds That Use AI Just Had Their Worst Month Ever

Bloomberg Technology, Dani Burger



Chalk one up for the humans.

Hedge funds that use artificial intelligence and machine learning in their trading process posted the worst month on record in February, according to a Eurekahedge index that’s tracked the industry from 2011. The first equity correction in two years upended their strategies as once-reliable cross-asset correlations shifted.

While computerized programs are feared for their potential to render human traders obsolete, the AI quants lagged behind their discretionary counterparts. The AI index fell 7.3 percent last month, compared to a 2.4 percent decline for the broader Hedge Fund Research index.


Extra Extra

There’s another missive written by a former Google employee, this time discussing internal organization with choice phraseology such as: “Google was awash in money and happy to spray it all over its employees.”



I am a survey geek so I’m excited about Delphia’s new survey tool that guides people to meaningful decisions about what kind of workplace culture best suits them or which college to attend.



Heads up: You can use AI to tell if your papayas are ripe. This technology was clearly developed outside the US. Here we experience widespread anxiety over the timing associated with ripening avocados.


Chinese project offers a brighter farming future

Nature, Editorial



A massive, decade-long experiment involving millions of Chinese farmers demonstrates an evidence-based approach to sustainability.


Toward a universal decoder of linguistic meaning from brain activation

Nature Communications; Francisco Pereira & Evelina Fedorenko et al.



Prior work decoding linguistic meaning from imaging data has been largely limited to concrete nouns, using similar stimuli for training and testing, from a relatively small number of semantic categories. Here we present a new approach for building a brain decoding system in which words and sentences are represented as vectors in a semantic space constructed from massive text corpora. By efficiently sampling this space to select training stimuli shown to subjects, we maximize the ability to generalize to new meanings from limited imaging data. To validate this approach, we train the system on imaging data of individual concepts, and show it can decode semantic vector representations from imaging data of sentences about a wide variety of both concrete and abstract topics from two separate datasets. These decoded representations are sufficiently detailed to distinguish even semantically similar sentences, and to capture the similarity structure of meaning relationships between sentences.


Amazon Is Hiring More Developers For Alexa Than Google Is Hiring For Everything

Forbes, John Koetsier



Amazon is hiring 1,147 people for its Alexa business unit alone, says Citi Research in a new report. That’s more than Google is hiring for product and technical roles across the entire Alphabet conglomerate, including YouTube, Waymo, Google Fiber, and — of course — the main money maker in the Alphabet empire: the original Google.

“Key takeaways from the job openings at Alphabet include that the company’s pace of hiring relative to its current headcount (3%) is among the lowest in our coverage,” said Citi analysts Mark May and Caleb Siegel.

That could bode well for Google’s margins, Citi says.


Laptops in the Classroom: An Open and Closed Case

FutureEd blog



At first I welcomed this development, grateful that my 80-plus students would have ready access to course readings and an efficient way to capture our conversations. But soon students began to complain about being distracted by the steady flow of news alerts and social media activity on their peers’ screens.

Three years ago, I decided to require that students disable their laptops’ wireless capabilities, convinced that the root of the problem was the temptation of being connected—and the costs giving into that temptation imposed on other students. My “no-WiFi” policy proved difficult to enforce, however, and did nothing about the possibility that laptop-use might hinder learning in other ways.

This semester, I went a step further.


2018 will be the year of self service data science for the enterprise

Computer Business Review, Nick Jewell



According to Gartner, in their Magic Quadrant for Business Intelligence and Analytics Platforms, the number of ‘citizen data scientists’ will grow five times faster than the number of data scientists. Citizen data scientists (a technical term that could be described in more friendly terms as an everyday or everyman analyst) are the data analysts of tomorrow. Gartner defines a citizen data scientist ‘as a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.’ In plainer terms, it’s people who use data properly to do their jobs better.

With more and more citizen data scientists and increasingly efficient data analytics tools to match, 2018 will be the year of self-service data analytics for the enterprise. They will be armed with technologies that allow them to add and blend data, then extract insight from data sets without needing a background in coding or certified skill-sets. Given the consumerisation of so many technologies, it’s no surprise that data science is coming out of the IT department and into the hands of business people.


Company Data Science News

Tim Berners-Lee, father of the World Wide Web, came out with a piece in The Guardian a week before Facebook found itself in scalding-hot, reputation-melting water over the way their platform allows app makers like Cambridge Analytica to harvest all sorts of user data for…well, mostly for whatever they want because Facebook doesn’t routinely conduct audits, nor does anyone else. Facebook has, at this point, terminated Cambridge Analytica’s access. The New York Times is reporting that Facebook’s Chief Information Security Officer, Alex Stamos, just resigned over disputes about how the company has handled Russian interference in the US election and misinformation.



Oh, Facebook. Being one of the first giant tech companies means you have to figure out how to deal with all sorts of unprecedented ethical and legal situations. Please at least actively try. Sir Berners-Lee advocates we go ahead and smash the golden calves of our time (e.g. giant tech companies) by deflating two foundational myths. He writes that our “collective imagination” has allowed “the myth that advertising is the only possible business model for online companies, and the myth that it’s too late to change the way platforms operate” to govern the information commons. He encourages everyone “to be a little more creative.” I’m guessing the EU Commission that is bringing forth the General Data Protection Regulation in little more than a month will gleefully take up this particular call for creative destruction.



Speaking of the new wrath upon Facebook, all of the media places were aflame over the weekend when major news outlets in the US (The New York Times) and UK (Channel 4 undercover reporting) pumped out scathing stories explaining how a psychology professor working for Cambridge Analytica used a psychometric profiling quiz app installed on the Facebook platform to harvest user data. Cambridge Analytica then used the data they gathered to… Facebook, of course, didn’t really care because they tend to take a mostly hands-off approach to app makers, limiting mostly those that traffic in nudies and hate speech.



Did you miss Uber? For a while they were in this newsletter every single week, but not recently. Today, I am sad to report that a woman died after she was run over by a member of Uber’s self-driving car fleet in Tempe, Arizona. The woman was jaywalking at the time, which makes her totally normal. The car was operating in autonomous mode with a human passenger/driver.



The Eurekahedge index tracks 15 AI-driven hedge funds. It tanked big time last month. Bottom line: Using AI techniques well is really hard.



Amazon is hiring over 1,100 people to work on Alexa. This is more than the total number of people Google is hiring for product and technical roles across all of Alphabet. That point Tim Berners-Lee made about the myth that the internet runs on ads? He is right. It can run quite well on retail!



Madison Square Garden has been using facial recognition to identify attendees at events including Knicks and Rangers games for an undisclosed amount of time. In a statement, the arena’s owners described this as “a safe and wonderful experience for our guests.” The NBA spokesperson seems to like the high-tech part: “our teams are exploring the use of all state-of-the-art technology, including facial recognition.” The NHL had nothing to say.



Microsoft has announced a technique for translating Chinese news into English with the same accuracy as humans using the dataset newstest2017.

Larry Page invented a hybrid drone/plane and has been conducting test flights of his flying taxi above New Zealand, where all of the wealthy technorati hang out. Lucky New Zealand!



If you’re young, may want to have kids someday, and are thinking of taking a job at a tech company, you might want to check out their parental leave policies. They vary considerably, especially for the non-primary parent. Hat tip to Daniela Huppenkothen for the link.

Big tech companies and their acqui-death strategies may be suffocating innovation in AI. Why is it only an anti-trust case if two big companies merge? What about what happens when five or ten big companies buy hundreds of potential competitors before they are able to make a dent in the market?



Spotify’s IPO is super interesting from a financial perspective. They are forgoing the typical route in which bankers make a book of institutional investors who agree to buy the stock in return for which the bankers earn lots of money. Instead, they are skipping the book making for a different route in which it looks like bankers will also earn lots of money. (By ‘earn’ I mean receive.) Bankers ‘earned’ bucket loads of money by underpricing the Zscaler IPO last week. It listed at $16 and shot to a peak of $33.37 on its first trading day. Bankers!!!


Latest tools in machine learning aim to accelerate precision medicine through American Heart Association and Duke Clinical Research alliance

American Heart Association



A new strategic alliance will target the prediction, prevention and treatment of cardiovascular diseases using artificial intelligence computing and big data, the American Heart Association (AHA) and the Duke Clinical Research Institute (DCRI) announced today.

The AHA’s Institute for Precision Cardiovascular Medicine™ together with the DCRI’s data science team, under the direction of Michael Pencina, Ph.D., and Lawrence Carin, Ph.D., will develop and test machine learning methods on the AHA Precision Medicine Platform, which is powered by Amazon Web Services.


The Final Frontier’s Final Frontier

University of Pittsburgh, Swanson School of Engineering



In T minus 8,760 hours, or roughly one year, the Space Test Program-Houston 6 (STP-H6) hybrid and reconfigurable space supercomputer will board the International Space Station. The newest mission to the ISS featuring research and technology from the University of Pittsburgh’s NSF Center for Space, High-performance, and Resilient Computing (SHREC) will bring an unprecedented amount of computing power into space and invaluable research opportunities from the ground station on Pitt’s Oakland campus.

“Computer engineering for space is the ultimate challenge,” says Alan George, SHREC founder and the Mickle Chair Professor of Electrical and Computer Engineering (ECE) at Pitt’s Swanson School of Engineering. “Space computing has become a principal challenge in all spacecraft, since remote sensing and autonomous operation are the main purposes of spacecraft and both demand high-performance computing.” This new mission experiment is the work of an outstanding team of graduate and undergraduate students studying at Pitt, led by Chris Wilson.


Madison Square Garden Has Used Face-Scanning Technology on Customers

The New York Times, Kevin Draper



Madison Square Garden has quietly used facial-recognition technology to bolster security and identify those entering the building, according to multiple people familiar with the arena’s security procedures.

The technology uses cameras to capture images of people, and then an algorithm compares the images to a database of photographs to help identify the person and, when used for security purposes, to determine if the person is considered a problem. The technology, which is sometimes used for marketing and promotions, has raised concerns over personal privacy and the security of any data that is stored by the system.
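The compare-against-a-database step described above can be sketched in a few lines. This is only an illustration, not how the arena's system works: it assumes each face has already been reduced to a numeric vector (an "embedding"), and the vectors, names, `best_match` helper and 0.9 threshold are all made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(probe, gallery, threshold=0.9):
    """Return (name, score) for the closest gallery entry,
    or (None, score) if nothing clears the threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Hypothetical database of stored photos, already converted to vectors.
gallery = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.0, 1.0, 0.2],
}

# A new camera image that closely resembles person_a's stored photo.
match, score = best_match([0.88, 0.15, 0.02], gallery)

# A face that resembles nobody in the database.
unknown, unknown_score = best_match([0.0, 0.1, 1.0], gallery)
```

Where the threshold sits is the policy question: set it lower and more people are "identified", including more people identified wrongly.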

 
Events



Public talk by Stuart Geiger at the IT University of Copenhagen.

IT University of Copenhagen, Ethos Lab



Copenhagen, Denmark March 26 at 1 p.m. “Computational Ethnography and the Ethnography of Computation: The Case for Context” [free]


Machine Eatable

Microsoft Research, DataKind



New York, NY “Microsoft Cities and DataKind are re-launching Machine Eatable, a lunchtime data science discussion at Civic Hall.” … “Our lunch on Friday, March 23rd will feature Jake Hofman, a researcher at Microsoft.” [free, registration required]


UC Berkeley Professor Abbeel to speak at TechCrunch Sessions: Robotics, May 11 at Zellerbach Hall

TechCrunch



Berkeley, CA “TechCrunch’s upcoming TC Sessions: Robotics is fast taking shape. The single-day event, held on May 11, will focus on the crossroads of the latest AI and robotics technology and the startup ecosystem. We’re really pleased to announce two sessions for the show — as well as throw open the application for an early-stage robotics startup pitch competition.” [$$$]


The Secret Science Club: Secret Science Club presents Social Neuroscientist Jay Van Bavel on “Our Partisan Brains” Monday, March 19, 8PM @ the Bell House, FREE!

The Secret Science Club



Brooklyn, NY “Secret Science Club presents Social Neuroscientist Jay Van Bavel on “Our Partisan Brains” Monday, March 19, 8PM @ the Bell House” [free]


Data Natives 2018

Andrea Baronchelli, Abeer ElBahrawy and Laura Alessandretti



London, England April 19 at City, University of London. “Join us for relaxed day of networking and presentations on human behaviour, cities, computational social science and complex networks, among the other things.” [free, registration required]

 
Deadlines



SFU helps close STEM gender gap through summer AI program

“Twenty-four female Grade 11 students from across Canada will be immersed in life as an AI researcher on SFU’s Burnaby campus from July 8-20.” Deadline to apply is April 3.

ACIC 2018 Causal Inference Challenge

Pittsburgh, PA “This year, the Atlantic Causal Inference Conference 2018 will take place at Carnegie Mellon University between May 21st to May 23rd.” … “The challenge focuses on computational methods of inferring causal effects from real-world healthcare-related data.” Deadline for submissions is April 30.

Call for Papers – CoNLL 2018

Brussels, Belgium October 31-November 1. “SIGNLL, the Association for Computational Linguistics’ Special Interest Group on Natural Language Learning, invites you to submit your papers to CoNLL 2018” Deadline for paper submissions is June 1.
 
Moore-Sloan Data Science Environment News



Could machine learning get its own new programming language?

Medium, NYU Center for Data Science



Stefan Karpinski, a research engineer for the Moore Sloan Data Science Environment at CDS, explores the prospect of a new programming language for machine learning

 
Tools & Resources



Civiqs Results

Civiqs



“Civiqs operates a nationally representative online survey panel for scientific, real-time public opinion research.”


Will Lyon – Graph Analysis of Russian Twitter Trolls

YouTube, Global Data Geeks



“As part of the US House Intelligence Committee investigation into how Russia may have influenced the 2016 US election, Twitter released the screen names of nearly 3000 Twitter accounts tied to Russia’s Internet Research Agency. These accounts were immediately suspended, deleting the data from Twitter.com and Twitter’s developer API. In this talk we show how we can reconstruct a subset of the Twitter network of these Russian troll accounts and apply graph analytics to the data to try to uncover how these accounts were spreading fake news.”
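For a rough feel of what "apply graph analytics" can mean here, the sketch below counts how often each account is mentioned by the others, one of the simplest network measures (in-degree). It is not the talk's method or tooling: the account names and edges are invented, and plain Python stands in for whatever graph software the speaker used.

```python
from collections import defaultdict

# Hypothetical (source_account, mentioned_account) pairs, standing in
# for retweets or mentions recovered from archived tweets.
edges = [
    ("troll_1", "troll_2"),
    ("troll_3", "troll_2"),
    ("troll_2", "troll_1"),
    ("troll_4", "troll_2"),
]

def in_degree(edges):
    """Count incoming edges per account; accounts that are never
    mentioned still appear, with a count of zero."""
    counts = defaultdict(int)
    for source, target in edges:
        counts[source] += 0  # register the source account
        counts[target] += 1
    return dict(counts)

degrees = in_degree(edges)

# The account the rest of the network amplifies most often.
most_amplified = max(degrees, key=degrees.get)
```

Accounts with unusually high in-degree are natural places to start looking for the hubs through which content was pushed.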


Tandem. Decentralized, cross-editor, collaborative text-editing!

GitHub – typeintandem



Tandem is a decentralized, collaborative text-editing solution. Tandem works with native text editors, works across different editors, and uses peer-to-peer connections to facilitate communication.

Tandem exists as a set of plugins for native text editors. We currently support Sublime Text 3 and Neovim. We also unofficially support Vim.


The API Economy — The Why, What, and How – Andreessen Horowitz

a16z Podcast; Cristina Cordova, Augusto Marietti, Laura Behrens Wu, and Sonal Chokshi



“APIs (application programming interfaces), observe the guests in this episode of the a16z Podcast, can be described as everything from Lego building blocks to Tetris to front doors to even veins in the human body. Because the defining property of APIs is that they’re ways to send and receive information between different parts, that is, communicate between software applications (which often map onto different organizational functions/services in a company too). APIs therefore give companies access to data and competencies they wouldn’t otherwise have — or better yet, that they no longer need — by letting even non-tech and small companies combine these building blocks to get exactly what they want.”


Announcing Machine Learning Model Export in Databricks

The Databricks Blog, Wayne Chan



“This Databricks feature furthers our efforts to unify analytics across data engineering and data science by allowing you to export models and full machine learning pipelines from Apache Spark MLlib. These exported models and pipelines can be imported into other (Spark and non-Spark) platforms to do scoring and make predictions.”

 
Careers


Full-time positions outside academia

Primary Research Scientist



Legendary, Applied Analytics division; Boston, MA

Account Manager – Digital Analytics Consultant



E-Nor; United States or Canada

Postdocs

PostDoc – Learning models parameters at scale



Inria; Sophia Antipolis, France
