U.S. institutions awarded 54,904 research doctorate degrees in 2016, only five fewer than the previous year’s record high, according to the Survey of Earned Doctorates (SED), a federally sponsored annual census of research degree recipients.
SED provides data for Doctorate Recipients from U.S. Universities, a report published by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation (NSF) that supplies data and analysis of the American system of doctoral education, a vital U.S. economic interest.
Since the SED began collecting data in 1957, the number of research doctorates awarded in science and engineering (S&E) fields has exceeded the number of non-S&E doctorates, and the gap is widening. In 1957, S&E doctorates made up 65 percent of all doctorates awarded. In 2016, S&E doctorates made up 75 percent.
University Data Science News
The Social Science Research Council is coordinating a new academia-industry research partnership, the Social Data Initiative. Principal investigators whose studies are approved will gain access to de-identified Facebook data far more comprehensive than the data researchers typically have. The details of how the data, the power, and publication decisions will flow are spelled out by Harvard’s Gary King and Stanford Law’s Nathaniel Persily. Academics who want to get involved can either serve on the advisory committee or apply for grants, much as they would apply for any other grant. Funding comes from private foundations including the Alfred P. Sloan Foundation, the William and Flora Hewlett Foundation, the Omidyar Network, and the Charles Koch Foundation. The initial goal of the Social Data Initiative is to investigate how Facebook use may be changing democratic practice, including in the 2016 US elections.
The National Science Foundation just published its annual report on doctoral education in the US, based on exit surveys with newly minted doctorates. The number of doctorates granted in the US in 2016 — 54,904 — was nearly identical to the 2015 high of 54,909. Hot takes:
nine of the top ten granting institutions are flagship state universities (Texas, Wisconsin, Michigan, California (UCLA and Berkeley), Minnesota, Florida, Indiana, Ohio)
women earned 46% of all doctorates granted, continuing to move towards parity. There are still substantial within-field gender gaps in math and engineering,
after graduation, science, engineering, and math (SEM) doctorate holders with “firm employment plans” went to academic employment (32.6%), postdocs (38.5%), or industry (29%). Industry positions pay almost 1.5 times as much as assistant professor positions and a little more than double postdoc salaries in SEM fields,
about 39% of science, engineering, and math PhDs had no firm employment plans when they submitted their dissertations and took the survey.
The University of Pennsylvania is now offering an ethics class in the Computer and Information Science Department, led by Ani Nenkova and Michael Kearns. What sets the course apart is its plan to teach ethics through technical skills rather than discussion alone, a welcome shift.
Alan Mislove and Christo Wilson, associate and assistant professors of computer science at Northeastern University, may move forward with their challenge to a federal computer crimes law that criminalizes breaches of websites’ terms of service, a federal district court judge has ruled. The computer scientists argue that they need to be able to submit false information to websites in order to test for gender, racial, and other types of discrimination. They are represented by the American Civil Liberties Union (ACLU). The claims of the other plaintiffs, Christian Sandvig of the University of Michigan, Karrie Karahalios of the University of Illinois, and First Look Media, were dismissed.
More and more universities are starting master’s and doctoral programs in data science and analytics—fields in which statistics is foundational—due to increasing interest from students and employers. Amstat News reached out to those in the statistical community who are involved in such programs to find out more about them. Given their interdisciplinary nature, we identified programs involving faculty with expertise in different disciplines to jointly reply to our questions. We have profiled many universities in our April, June, and December 2017 issues and January 2018 issue; here are several more.
Ian Buck doesn’t just run the Tesla accelerated computing business at Nvidia, one of the fastest-growing and most profitable businesses in the company’s twenty-five-year history. The work that Buck and other researchers started at Stanford University in 2000 and then continued at Nvidia helped to transform a graphics card shader into a parallel compute engine that is helping to solve some of the world’s toughest simulation and machine learning problems.
Nvidia held its annual GPU Technology Conference last week, and we sat down with Buck to chat about GPU-accelerated systems: what is driving the adoption of GPU computing, and technologies such as the new NVSwitch, which helps boost the performance of machine learning and perhaps HPC workloads.
It is just about impossible to seriously regulate data use with these current practices in place. We need to build security and privacy controls into software tools. Researchers have been developing techniques for doing precisely this. There exist techniques that can, for instance, ensure that an app can read camera information but not send it across the network to anybody else. My research group, in collaboration with researchers at UC Santa Cruz, UC San Diego, Harvard and MIT, is working on a set of techniques that allow programmers to attach precise, complex rules about data use — like “only my friends near me can see my location between 9 a.m. Monday and 5 p.m. Friday” — directly to sensitive data values, allowing developers to write these kinds of policies in one place and auditors to check such policies by looking in a single location. (Full disclosure: Facebook has contributed funding to my research group, and we collaborate with two Facebook employees on a non-privacy related aspect of the work. I also worked on backend privacy at Facebook as an intern in 2012.) This is part of a broader context of researchers at places like Cornell, Stanford and MIT, where there are also groups actively working on information flow security techniques for preventing these kinds of leaks. Requiring a software company like Facebook to use such techniques would make it much easier to enforce higher-level regulation.
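The core idea here, attaching a use policy directly to a sensitive value so that every read is checked against it, can be sketched in a few lines. This is a toy illustration, not the research group's actual system; the class and policy names are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Labeled:
    """A sensitive value bundled with the rule governing its use."""
    value: Any
    policy: Callable[[dict], bool]  # maps a viewer context to allowed/denied

    def read(self, ctx: dict) -> Any:
        # The policy travels with the data: every access point enforces it,
        # so the rule is written once and auditable in one place.
        if not self.policy(ctx):
            raise PermissionError("policy forbids this access")
        return self.value

# Example policy: only friends within 5 km may see the value
def friends_nearby(ctx: dict) -> bool:
    return bool(ctx.get("is_friend")) and ctx.get("distance_km", float("inf")) < 5

location = Labeled((40.7, -74.0), friends_nearby)
print(location.read({"is_friend": True, "distance_km": 2}))  # (40.7, -74.0)
# location.read({"is_friend": False, "distance_km": 2})  -> raises PermissionError
```

Real information-flow systems go further, tracking labels through computations and across the network, but the principle is the same: the policy is part of the data, not scattered through application code.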
I spoke with Zuckerberg on Friday about the state of his company, the implications of its global influence, and how he sees the problems ahead of him.
“I think we will dig through this hole, but it will take a few years,” Zuckerberg said. “I wish I could solve all these issues in three months or six months, but I just think the reality is that solving some of these questions is just going to take a longer period of time.”
But what happens then? What has this past year meant for Facebook’s future? In a 2017 manifesto, Zuckerberg argued that Facebook would help humanity take its “next step” by becoming “the social infrastructure” for a truly global community.
As a software engineer at Microsoft, Elena Voyloshnikova’s job is to make informed recommendations about how to improve the performance of software engineering tools.
But too often, she spends her days manually analyzing the data she needs to make those decisions. Lately, her team has been discussing the potential of building machine learning models to automate that task – creating more time to focus on the decision-making.
That’s why she was intrigued when she received an email announcing an upcoming AI training session for Microsoft employees.
“I asked my manager, ‘Can I go to this?’” she said. “I thought it looked like a good overview of things I would like to know.”
Berkeley, Calif., is considering shutting out of city contracts companies that help federal immigration officials create databases and registries used to target immigrants and religious minorities.
The Sanctuary City Contracting and Investment Ordinance, to be taken up by the City Council April 3, wouldn’t allow contracts with any vendor that’s working with U.S. Immigration and Customs Enforcement to create a database that could be used to identify and round up immigrants. It would also prohibit city investments in such vendors.
The gigantic data centers that power the internet consume vast amounts of electricity and emit 3 percent of global CO2 emissions. To change that, data companies need to turn to clean energy sources and dramatically improve energy efficiency.
Apple has hired Google’s chief of search and artificial intelligence, John Giannandrea, a major coup in its bid to catch up to the artificial intelligence technology of its rivals.
Apple said on Tuesday that Mr. Giannandrea will run Apple’s “machine learning and A.I. strategy,” and become one of 16 executives who report directly to Apple’s chief executive, Timothy D. Cook.
The sonification of big data will help people better understand and analyze big data, as well as detect anomalies in the data, say researchers at Virginia Tech.
The two most important words you need to know to understand the fate of our coastlines are “grounding line.” Those words describe where Antarctica’s voluminous ice shelves begin to float, holding back a wall of ice on land.
A study published on Monday in Nature Geoscience is among the first to create a detailed snapshot of how warming ocean waters are eating away at grounding lines around the continent. Over just five years, the continent lost 564 square miles of grounded ice, an area equivalent to roughly 25 Manhattans, 12 San Franciscos, or 4 Philadelphias. This is not good news for any of those or other coastal cities.
Goldman Sachs has hired a senior employee from Amazon to run the bank’s artificial-intelligence efforts.
Charles Elkan has joined Goldman Sachs as a managing director leading the firm’s machine learning and AI strategies, according to an internal memo viewed by Business Insider.
Elkan comes from Amazon, where he was responsible for the Artificial Intelligence Laboratory at Amazon Web Services, according to the memo. He previously led the retailing giant’s Seattle-based central machine-learning team.
U.S. Army, Communications-Electronics Research, Development and Engineering Center
Aberdeen Proving Ground, MD May 2-4. “CERDEC is the Army’s applied research and advanced technology development center for command, control, communications, computers, cyber, intelligence, surveillance and reconnaissance (C5ISR) technologies and systems.” [registration required]
San Francisco, CA May 30-31. “Rev is for data science leaders and practitioners, offering interactive sessions, stimulating conversations, and tutorials about how to run, manage, and accelerate data science as an organizational capability.” [$$$]
One of the most frequent questions and topics that I see come up on community resources such as StackOverflow, the Confluent Platform mailing list, and the Confluent Community Slack group is getting data from a database into Apache Kafka®, and vice versa. Often it’s Oracle, SQL Server, DB2, and so on—but regardless of the actual technology, the options for doing it are broadly the same. In this post we’ll look at each of those options and discuss the considerations around each. It may be obvious to readers, but it’s worth restating anyway: since this is Kafka—a streaming platform—we are talking about streaming integration of data, not just bulk static copies of the data.
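One common route the post covers is Kafka Connect with the JDBC source connector, which polls a table and streams new rows into a topic. As a rough sketch only—the connection URL, table name, and key column below are placeholders you would replace with your own—a source that picks up rows by an auto-incrementing ID might be configured like this:

```json
{
  "name": "jdbc-source-orders",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
    "connection.user": "kafka_connect",
    "connection.password": "********",
    "table.whitelist": "orders",
    "mode": "incrementing",
    "incrementing.column.name": "order_id",
    "topic.prefix": "db-",
    "poll.interval.ms": "5000"
  }
}
```

With `mode` set to `incrementing`, the connector only emits rows whose `order_id` exceeds the last one it saw; timestamp-based and timestamp+incrementing modes exist for tables where rows are updated in place. Note that JDBC polling misses deletes—one reason the post also weighs log-based change data capture.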
“Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP). Most stuff here is just raw unstructured text data, if you are looking for annotated corpora or Treebanks refer to the sources at the bottom.”