Data Science newsletter – November 3, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for November 3, 2017


 
 
Data Science News



Data analytics have revolutionized the NBA

Quartz, Dan Kopf



… More sophisticated analysis led to the other big change in the NBA. Teams are now much better at evaluating defense. Using the granular tracking data, analysts can now identify which players are best at altering or discouraging the most efficient types of shots (three-pointers and dunks). Analysts have also used the most sophisticated statistical techniques, like Bayesian networks, to establish how much better a team’s overall defense is when a given player is on the court.

All this has led to the near-extinction of certain types of players that were highly valued in the past. Teams are no longer as interested in players who take a lot of inefficient two-point shots, and don’t grade out as strong defenders. Also, all players are now expected to be good passers, who can identify which other players on the court might have an opportunity for an efficient shot.

In addition to changing the perception about which kinds of basketball players are most valuable, the NBA’s data revolution has also changed how much “valuable” players actually play. Teams are increasingly giving players days off, or playing them less, when data show they are at risk of injury. Silver has said that teams are even testing saliva for signs of fatigue.
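The on/off comparison the article describes can be sketched in a few lines: compute opponent points per 100 possessions with a given player on versus off the court. The stint data and numbers below are invented for illustration; real analyses use play-by-play or tracking data and adjust for teammates and opponents with far more sophisticated (e.g. Bayesian) models.

```python
# Toy on/off defensive comparison: opponent points allowed per 100
# defensive possessions with a player on vs. off the court.
# All stint values below are hypothetical.

def def_rating(stints):
    """Opponent points allowed per 100 defensive possessions."""
    pts = sum(s["opp_pts"] for s in stints)
    poss = sum(s["opp_poss"] for s in stints)
    return 100.0 * pts / poss

stints = [
    {"opp_pts": 18, "opp_poss": 20, "on_court": True},
    {"opp_pts": 25, "opp_poss": 22, "on_court": False},
    {"opp_pts": 15, "opp_poss": 19, "on_court": True},
    {"opp_pts": 24, "opp_poss": 21, "on_court": False},
]

on = def_rating([s for s in stints if s["on_court"]])
off = def_rating([s for s in stints if not s["on_court"]])

# A negative on/off difference means opponents score less with the player on.
print(f"on: {on:.1f}, off: {off:.1f}, on/off: {on - off:+.1f}")
```

In this made-up example, the team allows fewer points per 100 possessions with the player on the floor, which is exactly the kind of signal the granular defensive evaluations rely on.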


Tweet of the Week

Twitter, RogueNASA




Data Visualization of the Week

Twitter, NPR visuals team




Lockheed Martin places big bets on defense start-ups

CNBC, Andrew Zaleski



Up until last summer, Lockheed Martin Ventures had typically tailored its investments to companies in the later stages of development. Moran has refocused the company’s funding to target earlier-stage start-ups and get in at the ground level of young companies that might still be trying to figure out their business paths but are generating innovative technology in the process.

So far this year, Moran’s Ventures division has invested in start-ups working in several key areas. Among them: Terran Orbital, a manufacturer of nanosatellites; Cybereason, which makes cybersecurity software; and Peloton Technology, a maker of autonomous technology for trucks that raised a $60 million round in April that included Lockheed funding. All told, Lockheed Martin Ventures has funded companies to the tune of $20 million over the last year.


Welcome from Director Magdalena Balazinska

University of Washington, eScience Institute



Our data scientists, postdoctoral fellows, faculty, and students won numerous prizes and awards, and published world-class research. Other flagship activities including the Winter Incubator program, seminars, poster sessions, workshops, and working groups have attracted large crowds and campus-wide interest. We have created new data science courses to meet the growing needs and interests of our students. We advertised and disseminated our education programs across campus, with new departments joining our cross-university initiatives.

As the new year begins, we are happy to welcome our new cohort of postdoctoral fellows and graduate students! The graduate students are pursuing Advanced Data Science Options in multiple units on campus, while coming together as a cohort through data science classes, our seminars, and other events in the WRF Data Science Studio. The postdoctoral fellows bring with them invigorating outside expertise and creativity. They push the state-of-the-art in their domain while advancing data science methods.


NSF Doubles Down on Data Science

datanami, George Leopold



The National Science Foundation has awarded two grants to the University of California at Berkeley, the first to deepen the theoretical foundations of data science, the second to address the big data skills gap.

The first NSF award supports the creation of a Foundations of Data Analysis (FODA) Institute that brings together basic research in applied mathematics, theoretical computer science, and theoretical statistics, the university announced.

The award also will fund two national data science workshops designed to develop curriculum “anchored in the actual practice of data science work,” organizers said. The curricula materials, including course modules and exercises, will be publicly available.


Nvidia expands Deep Learning Institute with new partnerships and courses

ZDNet, Larry Dignan



Nvidia said it has expanded partnerships and added industry-specific courses to teach deep learning and artificial intelligence under its Deep Learning Institute effort.

The Deep Learning Institute launched about a year ago, with the aim of teaching new skills to data scientists, students and developers.

Under the expansion, the Deep Learning Institute (DLI) will partner with Booz Allen Hamilton and deeplearning.ai to offer more training. In addition, DLI is forming a university ambassador program to help instructors teach AI.


Architectures Battle for Deep Learning

EE Times, Linley Gwennap



In the brief history of deep neural networks (DNNs), users have tried several hardware archi­tec­tures to increase their performance. General-purpose CPUs are the easiest to program but are the least efficient in performance per watt. GPUs are optimized for parallel floating-point computa­tion and provide several times better performance than CPUs. As GPU vendors discovered a siza­ble new customer base, they began to enhance their designs to further improve DNN throughput. For example, Nvidia’s new Volta architecture adds dedicated matrix-multiply units, accelerating a common DNN operation.

Even these enhanced GPUs remain burdened by their graphics-specific logic. Furthermore, the recent trend is to use integer math for DNN inference, although most training continues to use floating-point computations.
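The shift from floating-point training to integer inference mentioned above can be illustrated with a minimal symmetric 8-bit quantization sketch. The weights, helper names, and scale choice here are hypothetical; production frameworks use calibrated scales and fused integer kernels rather than this naive per-tensor scheme.

```python
# Minimal sketch of symmetric 8-bit quantization, the kind of integer
# math increasingly used for DNN inference. Floats are mapped to int8
# values in [-127, 127] via a single scale factor. Values are illustrative.

def quantize(weights):
    """Map floats to int8 range [-127, 127] with a symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer representation."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.083, 0.915, -0.556]
q, scale = quantize(weights)
approx = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_err = max(abs(a - w) for a, w in zip(approx, weights))
print(f"quantized: {q}, max abs error: {max_err:.5f}")
```

The integer values need only 8 bits each instead of 32, which is the silicon and power saving the article describes; the cost is the small, bounded rounding error computed above.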


The Devil Is in the Detail of Deep Learning Hardware

Electronic Design, James Morra



To identify skin cancer, perceive human speech, and run other deep learning tasks, chipmakers are designing processors to work with lower-precision numbers. These numbers contain fewer bits than their higher-precision counterparts, which require heavier lifting from computers.

Intel’s Nervana unit plans to release a special processor before the end of the year that trains neural networks faster than other architectures. But in addition to improving memory and interconnects, Intel created a new way of formatting numbers for lower-precision math. The numbers occupy fewer bits, so the hardware can use less silicon, less computing power, and less electricity.


Nvidia Eyes Gov’t Market with Deep Learning Expansion

datanami, George Leopold



Nvidia said it is expanding its deep learning partnerships this week with government contractor Booz Allen Hamilton to help train U.S. military personnel along with AI pioneer Andrew Ng to train developers working in fields such as natural language processing and video analytics.


Cornell Tech Dreams Big on Roosevelt Island

Architectural Record, Joann Gonchar



The concept for what would become Cornell Tech—the new tech-focused graduate school on New York’s Roosevelt Island—was born during the last recession. Former mayor Michael Bloomberg was behind the push to accelerate growth of the city’s tech sector and make the economy less dependent on Wall Street and the financial-services industry. At the inauguration of the first phase of the campus in mid-September, Bloomberg summed up his sweeping vision. “The companies and innovations spawned by Cornell Tech graduates will . . . help our city compete with tech centers around the world, from Silicon Valley to Seoul.”

This new institution, which offers graduate-level degrees in disciplines such as information science, electrical engineering, and connective media, and combines cutting-edge research and education with entrepreneurship, had, since 2012, been housed in Google’s building in Manhattan. But now its 300 students and 30 faculty members have a home of their own, with three completed structures: an academic building by Morphosis Architects; a “co-location” facility by Weiss/Manfredi Architects that provides space for collaboration between academia and industry; and a residential tower by Handel Architects.


Artificial intelligence in business: Separating the real from the hype

McKinsey & Company, Simon London



The potential for AI to infuse business and value chains across various industries is greater than ever before—but where should executives start?

Typically understood as being all about robots and whiz-bang machines, artificial intelligence (AI) can be tough for executives to wrap their business minds around. In this episode of the McKinsey Podcast, senior partner Peter Breuer and McKinsey Global Institute partner Michael Chui speak with McKinsey Publishing’s Simon London about burgeoning business applications of artificial intelligence, the line between hype and true use cases for AI, and how business leaders can separate one from the other. [audio, 23:16]


University Data Science News

Just 150 US universities receive 90% of US federal research grant dollars. This is small compared to, say, the business equivalent, which is arguably the Fortune 500. (Does anyone talk about an Academic 150?) In an ongoing project led by a team including Julia Lane, these universities are being recruited to submit data about who they hire, degrees they grant, grants submitted, and expenditures on services to the UMETRICS database. The goals of the service are to evaluate the impact of higher education and to allow for a broader understanding of the economic role of science.

Kate Crawford and Meredith Whittaker of the new-ish AI Now released a report on ethics in AI a couple weeks ago. They emphasize framing the social impact of AI in the context of power dynamics, an opinion with which this sociologist wholeheartedly agrees. As for actionable recommendations: hire more domain experts: “Who gets a seat at the table in the design of these systems? At the moment it’s driven by engineering and computer science experts who are designing systems that touch everything from criminal justice to healthcare to education. But in the same way that we wouldn’t expect a federal judge to optimize a neural network, we shouldn’t be expecting an engineer to understand the workings of the criminal justice system.”

Anthony Townsend, author of the 2013 book Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, has a new piece out in which he argues that smart cities need to be researched and understood far better or we’ll end up with a slew of unintended negative consequences like phone-distracted drivers. He disagrees with Shannon Mattern who earlier argued that cities should not be used as laboratories, positing instead that the upside for positive social impact stemming from cities-as-laboratories is too great to pass up. He noted, provocatively, that “anachronistic, Western notions of individual privacy are questionably robust when stacked against the greater good that is possible” in intensively surveilled, sensored, “smart” cities.

Shlomo Zilberstein and Joydeep Biswas at the University of Massachusetts Amherst won a $700,000 grant from the NSF to develop more advanced autonomous robotic systems “that can learn from experience, recognize when they need human help and ask for it.” Current industrial robots have to be caged to keep people from getting too close to them because they aren’t smart enough to be careful (apparently, this applies to the people and the robots, hence the cages). Future robots will be trained to take care around their bumbling humans.

From human-robot collaboration to robot-robot collaboration: the University of Pennsylvania received a five-year, $27 million grant from the Army Research Lab to develop teams of robots that can learn from and collaborate with each other. This idea holds a ton of promise and I can’t wait to check in again in a year or two. Or five.



Also at Penn, the demand for Computer and Information Science courses has increased so much (e.g., from 120 students per class to 350) that the school is struggling to find physical space for all the students and TAs, especially during office hours. It’s a problem, for sure, but a good one to have. There’s a new report out by the National Academies of Sciences, Engineering, and Medicine on the growth of student demand for CS degrees.

Aomawa Shields, University of California-Irvine professor, won a grant from the NASA Habitable Worlds project to study the surface reflectivity of exoplanets.



Super interesting project by Carnegie Mellon’s Marcel Just and the University of Pittsburgh’s David Brent. They have developed a way to identify suicidal individuals through brain imaging. They elicit responses to concepts such as death, cruelty, and trouble and are then able to distinguish those with suicidal ideation from those without. This could be a huge win in the data science for social good category if it is deployed well.

Georgia Tech cancer drug researchers John McDonald and Fredrik Vannberg have released their machine learning and data normalization code so that other researchers can accelerate the path to more effective drug development. “With our project, we’re advertising that sharing should be what everybody does,” Vannberg said. “This can be a win for everybody, but really it’s a win for the cancer patients.”


The Earth Microbiome Project is demonstrating the powerful scientific advantages of well-managed open source data sharing. I’ll let them summarize: “The Earth Microbiome Project was founded in 2010 to sample the Earth’s microbial communities at an unprecedented scale…. We recognized that open and collaborative science, including scientific crowdsourcing and standardized methods, would help to reduce technical variation among individual studies, which can overwhelm biological variation and make general trends difficult to detect. Comprising around 100 studies…the EMP has now dwarfed by 100-fold the sampling and sequencing depth of earlier meta-analysis efforts; concurrently, powerful analysis tools have been developed, opening a new and larger window into the distribution of microbial diversity on Earth.”

UC-Berkeley computer science students Ash Bhat and Rohan Phadte created a browser plug-in to detect whether a given Twitter account is actually a political propaganda bot.



Columbia University’s Junfeng Yang and Suman Jana along with Lehigh University’s Yinzhi Cao have developed DeepXplore, a tool to make deep learning more transparent. The tool feeds confusing real-world inputs into the network to help understand how thoroughly the logic has been tested for errors.
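The DeepXplore paper by these authors popularized "neuron coverage," the fraction of neurons activated above a threshold by at least one test input, as a way to measure how thoroughly a network's logic has been exercised. A toy sketch of that metric, using a tiny hypothetical random network rather than anything from the tool itself:

```python
# Toy illustration of the neuron-coverage metric: what fraction of
# neurons fire above a threshold on at least one test input?
# The 3-input, 4-neuron network below is purely hypothetical.
import random

random.seed(0)

def relu(x):
    return max(0.0, x)

# Hypothetical single hidden layer: 4 neurons, 3 inputs each.
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

def activations(x):
    """ReLU activations of each hidden neuron for input x."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W]

def neuron_coverage(inputs, threshold=0.1):
    """Fraction of neurons exceeding the threshold on some input."""
    covered = set()
    for x in inputs:
        for i, a in enumerate(activations(x)):
            if a > threshold:
                covered.add(i)
    return len(covered) / len(W)

tests = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, -1, -1]]
print(f"neuron coverage: {neuron_coverage(tests):.2f}")
```

DeepXplore's contribution is then searching for realistic inputs that raise this coverage while exposing disagreements between models; the sketch above only shows how the metric itself is scored.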

NYU Tandon School of Engineering has announced a partnership with the Indian Institute of Technology, Kanpur. The effort aims to incubate and launch engineering start-ups by making flex desk space available for short visits and providing insights into each country’s respective markets. This entrepreneurial university-based activity follows a larger trend in academia to promote economic development through new business creation.


Colleges and Universities Should Take Action to Address Surge of Enrollments in Computer Science

National Academies of Science, Engineering and Medicine



U.S. colleges and universities should respond with urgency to the current surge in undergraduate enrollments in computer science courses and degree programs, which is straining resources at many institutions, says a new report from the National Academies of Sciences, Engineering, and Medicine. The report examines the benefits and drawbacks of a range of strategies that institutions could pursue in response – such as adding faculty and resources, imposing targeted controls on enrollment, or using innovative technologies to deliver instruction to large numbers of students, among many other options.

An important factor driving the enrollment surge is the labor market, where the number of computing jobs far exceeds the number of computer science graduates being produced. According to the Bureau of Labor Statistics, employment in computer occupations in and beyond the technology sector grew by nearly a factor of 20 between 1975 and 2015 – nearly twice as fast as the production of bachelor’s degrees in computer and information science and support services. In particular, expertise in cybersecurity, data science, and machine learning is in high demand.


Falling Walls: The Past, Present and Future of Artificial Intelligence

Scientific American Blog Network, Jürgen Schmidhuber



The Falling Walls Conference is an annual, global gathering of forward thinking individuals from 80 countries organized by the Falling Walls Foundation. Each year, on November 9—the anniversary of the fall of the Berlin Wall—20 of the world’s leading scientists are invited to Berlin to present their current breakthrough research. The aim of the conference is to address two questions: Which will be the next walls to fall? And how will that change our lives? The author of the following essay is speaking at this year’s Falling Walls gathering.

 
Tools & Resources



[1710.06451] A Bayesian Perspective on Generalization and Stochastic Gradient Descent

arXiv, Computer Science > Learning; Samuel L. Smith, Quoc V. Le



This paper tackles two related questions at the heart of machine learning: how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work is inspired by Zhang et al. (2017), who showed deep networks can easily memorize randomly labeled training data, despite generalizing well when shown real labels of the same inputs. We show here that the same phenomenon occurs in small linear models. These observations are explained by evaluating the Bayesian evidence, which penalizes sharp minima but is invariant to model parameterization. We also explore the “generalization gap” between small and large batch training, identifying an optimum batch size which maximizes the test set accuracy. Interpreting stochastic gradient descent as a stochastic differential equation, we identify a “noise scale” g = ε(N/B − 1) ≈ εN/B, where ε is the learning rate, N the training set size, and B the batch size. Consequently, the optimum batch size is proportional to the learning rate and the training set size, B_opt ∝ εN. We verify these predictions empirically.
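The abstract's noise-scale formula and its batch-size consequence can be checked numerically. The learning rate, dataset size, and batch size below are illustrative placeholders, not values from the paper's experiments:

```python
# Numerical check of the noise scale g = eps * (N/B - 1) from the
# abstract, and its approximation g ~ eps * N / B when N >> B.
# eps, N, B values here are illustrative.

def noise_scale(eps, N, B):
    """Noise scale of SGD viewed as a stochastic differential equation."""
    return eps * (N / B - 1)

eps, N, B = 0.1, 50_000, 128   # learning rate, training set size, batch size
g = noise_scale(eps, N, B)
g_approx = eps * N / B          # close to g because N >> B

# Holding g fixed gives B_opt ~ eps * N / g, i.e. B_opt is proportional
# to both the learning rate and the training set size.
def optimum_batch(eps, N, g_target):
    return eps * N / g_target

print(f"g = {g:.4f}, approximation = {g_approx:.4f}")
print(f"B_opt at eps: {optimum_batch(eps, N, g):.1f}, "
      f"at 2*eps: {optimum_batch(2 * eps, N, g):.1f}")
```

The second print line shows the linear-scaling rule B_opt ∝ ε directly: doubling the learning rate at a fixed target noise scale doubles the implied batch size.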


Clicks are a vanity metric. Here are some story analytics you should be worrying about.

Storybench, Cat McGloin



“Whatever the reason, a lack of clicks rarely has anything to do with the standard of your work. So why, if the second scenario plays out – and lots of people click – do we congratulate ourselves and take this as a sign that our articles are great when all it really means is that this time around you nailed that digital ad?”

 
Careers


Full-time, non-tenured academic positions

Manager of Industry Engagement and Outreach



Columbia University, Data Science Institute; New York, NY

Postdocs

Postdoctoral Research in Extragalactic Astronomy



Haverford College; Haverford, PA

Full-time positions outside academia

Deputy Head of Audience Engagement



Financial Times; London, England
