Data Science newsletter – June 11, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for June 11, 2019

 
 
Data Science News



Why Are We Hooked on Self-Tracking Fitness Devices?

Psychology Today, Pirkko Markula


Personal health-tracking technologies are gaining popularity: Consumer studies show that 15% of Americans currently use wearable technologies daily and a further 56% want to monitor their health behavior through such devices (Sanders, 2017). The expanding technology with internet connection, such as my mobile phone, now links even uninformed citizens unintentionally into their net. Having been caught, I am also wondering what the exact benefits of knowing my step counts are. Can there also be unintended consequences when detecting and sharing one’s personal fitness data? The popularity of fitness tracking, or “sensor mania” as Swan (2012) has described it, has also awakened researchers to comment on the use and meaning of these devices in the current health and fitness conscious era.


Google to Acquire Looker

Google Cloud Blog, Thomas Kurian


Looker extends our business analytics offering with two important capabilities—first, the ability to define business metrics once in a consistent way across data sources. This makes it easy for anyone to query data while maintaining consistent definitions in their calculations, ensuring teams get accurate results. Second, Looker also provides users with a powerful analytics platform that delivers applications for business intelligence and use-case specific solutions such as Sales Analytics, as well as a flexible, embedded analytics product to collaborate on business decisions. The addition of Looker to Google Cloud will help us offer customers a more complete analytics solution from ingesting data to visualizing results and integrating data and insights into their daily workflows. It will also help us deliver industry specific analytics solutions in our key verticals, whether that’s supply chain analytics in retailing; media analytics in entertainment; or healthcare analytics at global scale.


The economics of privacy

Marginal Revolution blog, Tyler Cowen


Perhaps the biggest complaint about tech companies today is that they do not respect our privacy. They gather and store data on us, and in some cases, such as Facebook, they charge companies for the ability to send targeted ads to us. They induce us to self-reveal on the internet, often in ways that are more public than we might at first expect. Furthermore, tech data practices are not entirely appropriate, as for instance Facebook recently stored user passwords in an insecure, plain text format.

This entire debate is overblown, and the major tech companies are much less of a threat to our actual privacy than is typically assumed.

For most people, gossip from friends, relatives, colleagues, and acquaintances is a bigger privacy risk than is information garnered on-line. Gossip is an age-old problem, and still today many of the biggest privacy harms come through very traditional channels. And unlike false charges planted on social media, often there is no way to strike back against secretive whisperings behind one’s back. In the workplace, one employee may tell the boss that another employee does not work hard enough, or high school gossip may destroy reputations and torment loners and non-conformists, to give two common examples of many.


Hadoop runs out of gas

InfoWorld, Matt Asay


Big data remains a big deal, but that fact is somewhat obscured by the recent stumbling of its former poster children: Cloudera, Hortonworks, and MapR. Once the darlings of data, able to raise gargantuan piles of cash—Intel pumped $766 million into Cloudera in just one investment round!—the heavyweights have been forced to skinny down, whether by merging (Cloudera and Hortonworks) or cutting heads (MapR).

Meanwhile, other open source big data vendors like Elastic and MongoDB are soaring. What gives? There is, of course, a variety of reasons, among them the fact that the erstwhile Hadoop vendors bet big on the wrong audience, namely architects bound to the data center, while the market shifted to developers seeking freedom in the cloud.


Artificial intelligence is coming. Will Congress be ready?

Roll Call, Kate Ackley


Members of Congress are waking up to the potential dangers of widespread use of AI technologies. They have drafted bills that would not only require more transparency and accountability over these automated systems, but also allow users to withhold certain information from the large data sets that drive artificial intelligence.

“It’s a fundamental way in which decisions are made now — algorithms and computers,” said Sen. Ron Wyden, an Oregon Democrat who is co-sponsoring a bill dubbed the Algorithmic Accountability Act, introduced in April. “And it seems to me that there’s not much transparency, not much disclosure, and that’s what we sought to do in our bill.”

The bill would require the Federal Trade Commission to prepare rules requiring companies to test their AI-powered systems for accuracy, fairness, bias, discrimination, privacy and security, and to correct errors if they find them. The bill is backed by Sen. Cory Booker, a New Jersey Democrat running for president, and New York Democratic Rep. Yvette D. Clarke.


16 Things OMB Wants Agencies to Accomplish in the Data Strategy’s First Year

Nextgov, Aaron Boyd


The Trump administration Tuesday released a draft of the first one-year action plan, which will establish hard deliverables for agencies to attain as part of the Federal Data Strategy.

Along with the draft action plan, the Office of Management and Budget also released the final versions of the principles and practices it expects agencies to follow with regard to gathering, using, protecting and engaging with data.

The finalized strategy issued Tuesday covers the first two-thirds of the larger Federal Data Strategy: the principles that guide how agencies think about data and specific practices that can further those efforts. The third pillar—the action plan—is only in draft form, with a comment period open for the next 30 days.


Opinion | The Anti-College Is on the Rise

The New York Times, Molly Worthen


A small band of students will travel to Sitka, Alaska, this month to help reinvent higher education. They won’t be taking online courses, or abandoning the humanities in favor of classes in business or STEM, or paying high tuition to fund the salaries of more Assistant Vice Provosts for Student Life. They represent a growing movement of students, teachers and reformers who are trying to compensate for mainstream higher education’s failure to help young people find a calling: to figure out what life is really for.

These students will read works by authors ranging from Plato and Herbert Marcuse to Tlingit writers. The point is to “develop and flex a more rigorous political imagination,” according to one course syllabus. They will take on 15 to 20 hours a week of manual labor in Sitka, and set their group’s rules on everything from curfews to cellphones. Last summer’s cohort discouraged the use of phones during class and service hours and ordered everyone to turn off the internet at 10 p.m.

This is Outer Coast, one of an expanding number of educational experiments born out of a deepening sense that mainstream American colleges are too expensive, too bureaucratic, too careerist and too intellectually fragmented to help students figure out their place in the universe and their moral obligations to fellow humans.


Master of Science (MS) in Cyber Intelligence

University of Dallas


The Master of Science (M.S.) in Cyber Intelligence program at the University of Dallas specializes in the application of the most sought-after skills in the key fields of business analytics and cybersecurity. As a STEM-designated program, the M.S. in Cyber Intelligence combines cybersecurity principles — such as cryptography, penetration testing, forensics and strategic management perspective — with the most relevant statistical and programming techniques. UD’s cyber program is designated a National Center of Academic Excellence in Cyber Defense Education by the National Security Agency (NSA) and the U.S. Department of Homeland Security (DHS).


Dodgeball is a tool of ‘oppression’ used to ‘dehumanize’ others, researchers argue

The Washington Post, Jacob Bogage


One of gym classes’ most common games is being used as a tool of “oppression,” according to a team of Canadian researchers.

Dodgeball in phys-ed classes teaches students to dehumanize and harm their peers, professors from three Canadian universities said in a presentation this week at the Congress of the Humanities and Social Sciences in Vancouver. A paper on the subject is set to appear in the journal European Physical Education Review.


First Systematic Study Of The Advice People Would Give To Their Younger Selves

The British Psychological Society, Research Digest, Christian Jarrett


The question is an old favourite – if you could travel back in time, what advice would you give to your younger self? Yet despite the popularity of this thought experiment, no one has, until now, actually studied what people would tell themselves.

Reporting their findings in The Journal of Social Psychology, Robin Kowalski and Annie McCord at Clemson University have done just that in two surveys of hundreds of participants on Amazon’s Mechanical Turk website. Their findings show that people’s advice to their younger selves is overwhelmingly focused on prior relationships, educational opportunities and personal worth, echoing similar results derived from research into people’s most common regrets in life. Moreover, participants who said they had followed the advice they would give to their younger selves were more likely to say that they had become the kind of person that their younger self would admire. “…[W]e should consult ourselves for advice we would offer to our younger selves,” the researchers said. “The data indicate that there is much to be learned that can facilitate wellbeing and bring us more in line with the person that we would like to be should we follow that advice.”


Salesforce is buying data visualization company Tableau for $15.7B in all-stock deal

TechCrunch, Ingrid Lunden


On the heels of Google buying analytics startup Looker last week for $2.6 billion, Salesforce today announced a huge piece of news in a bid to step up its own work in data visualization and (more generally) tools to help enterprises make sense of the sea of data that they use and amass: Salesforce is buying Tableau for $15.7 billion in an all-stock deal.


Data center boosters hope new tax incentives ‘stop the bleeding,’ keep tech sites in Illinois

Chicago Tribune, Ally Marotti


Illinois lawmakers approved a data center tax incentive proponents say could make tech giants and other firms think twice about locating their data storage facilities anywhere else.

The incentive, passed as part of a $45 billion capital construction plan, would exempt qualifying data centers from state and local sales tax on equipment inside the facilities, such as cooling and heating equipment, servers and storage racks.


Microsoft launches first AI hub to reverse talent flow to the coasts and bring non-PhDs into AI jobs

CNET, Jason Hiner


The question about the implications of artificial intelligence, not just for the future of technology but the future of jobs and our communities, is something Microsoft is investing new energy and resources to answer. On Friday, the company announced that it’ll open its “first regional hub for AI” in Louisville, Kentucky. The project will be an experiment to upskill people on how to work alongside AI and robots, and it’ll create a learning lab to prepare communities for the years ahead when AI and automation are going to disrupt the economy and the workforce.

Microsoft chose Louisville because 28% of the jobs in the community are at risk from automation, according to research cited by Microsoft CTO Jennifer Byrne at the press conference announcing the partnership. Louisville has a particularly heavy manufacturing base, with 12% of all jobs being in the manufacturing sector.


Greg Morrisett named dean of Cornell Tech

Cornell University, Cornell Chronicle


Greg Morrisett, dean of Cornell’s Faculty of Computing and Information Science and an international expert in software security, has been named the Jack and Rilla Neafsey Dean and Vice Provost of Cornell Tech, Provost Michael Kotlikoff announced June 7. The five-year appointment, approved by the Executive Committee of the Cornell Board of Trustees, is effective Aug. 1.

Morrisett has stewarded CIS through four years of explosive growth, recruiting world-renowned faculty, integrating the computer and information science departments, and spearheading the development of the new Department of Statistics and Data Science. His research focuses on building provably correct and secure software, with an emphasis on cryptographic schemes, machine learning and compilers.


UVA Board Approves Establishment of School of Data Science

University of Virginia, UVA Today


“It’s not every day that we have an opportunity to start a new school at the University of Virginia,” Executive Vice President and Provost Liz Magill said. “It’s positively thrilling, especially because the SDS is on the cutting edge of an emerging field. The world is evolving and we are at the leading edge of the evolution in data science. The design of the school itself – both its open approach to research and discovery and its interdisciplinarity – is an exciting and innovative addition to our rich field here at UVA.”

The final step to establish the school will be approval from the State Council of Higher Education for Virginia, with a proposal to be submitted next week. The UVA Faculty Senate voted unanimously in favor of the school in May.

 
Events



Workshop: Data scholarship for the humanities with the HathiTrust Digital Library

HathiTrust, University of Oxford


Oxford, England July 5, starting at 10 a.m. “This workshop will introduce attendees to a suite of tools and services for computational text analysis research through the HathiTrust Research Center (HTRC). The HTRC leverages the scope and scale of the 17 million volume HathiTrust Digital Library collection to create opportunities for researchers to perform text data mining on subsets of the corpus.” [free, registration required]


Machine Learning for Music Discovery Workshop

Erik M. Schmidt, Oriol Nieto, Katherine M. Kinnaird, Fabien Gouyon, Gert Lanckriet


Long Beach, CA June 15, starting at 9 a.m., part of ICML.


HILDA 2019: Workshop on Human-In-the-Loop Data Analytics

Leilani Battle, Surajit Chaudhuri, Arnab Nandi


Amsterdam, The Netherlands July 5, starting at 9 a.m., co-located with SIGMOD 2019.


Medical Sensors & Design Conference

Sensors Online


San Jose, CA June 25, starting at 9 a.m. “The Medical Sensors Design Conference creates unique educational and networking opportunities that drive sensor development and foster innovation in medical and healthcare applications. Co-located with Sensors Expo & Conference 2019, the Medical Sensors Design Conference provides the perfect context for design engineers and OEMs to learn about underlying sensor technologies and emerging applications.” [$$$]

 
Tools & Resources



Algorithmic Impact Assessment

Government of Canada


The AIA is a questionnaire designed to help you assess and mitigate the impacts associated with deploying an automated decision system. The AIA also helps identify the impact level of your automated decision system under the Directive on Automated Decision-Making. The questions are focused on your business processes, your data, and your system design decisions.


GRANDstack Starter

GitHub – grand-stack


This project is a starter for building a GRANDstack (GraphQL, React, Apollo, Neo4j Database) application. There are two components to the starter, the UI application (a React app) and the API app (GraphQL server).


Visualizing and Measuring the Geometry of BERT

arXiv, Computer Science > Machine Learning; Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg


Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.


Have you listened to @goodcodepodcast from @dlicornelltech yet?

Twitter, Cornell Tech


“Tune in every week to hear experts talk about ethical dilemmas posed by today’s evolving technology and how we can prevent them.”
