Data Science newsletter – August 9, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 9, 2018


 
 
Data Science News



IEEE Language Rankings 2018

R-bloggers, Revolutions, David Smith



Python retains its top spot in the fifth annual IEEE Spectrum top programming language rankings, and also gains a designation as an “embedded language”. Data science language R remains the only domain-specific slot in the top 10 (where it is listed as an “enterprise language”) and drops one place compared to its 2017 ranking to take the #7 spot.

Looking at other data-oriented languages, Matlab is at #11 (up 3 places), SQL is at #24 (down 1), Julia at #32 (down 1) and SAS at #40 (down 3).


UVA to Lead Statewide ‘Biocomplexity Initiative’

University of Virginia, UVA Today



The University of Virginia is developing a new statewide, multi-site “Biocomplexity Initiative” that seeks to employ the principles of interdisciplinary team science to integrate such diverse disciplines as molecular science and public policy analysis in order to solve complex and multidimensional problems in medicine and society.

Chris Barrett, a global leader in applying computer science concepts and tools to making new discoveries in complex systems involving human health, habitat and well-being, will lead the initiative. He has directed the Biocomplexity Institute at Virginia Tech for the past four years and is a professor of computer science. He comes to UVA in September to establish an institute that will collaborate with researchers across Virginia.


Mayo, University of Illinois tap genomic data with new informatics tool

Health Data Management, Greg Slabodkin



A newly developed tool combines different types of genomic data to predict patients’ specific responses to therapeutic drugs.

The application, developed by the University of Illinois and the Mayo Clinic, is based on information contained in human genomes and enables researchers to predict patient reactions.

“We all know treatment outcomes for complex diseases like cancers vary dramatically among individuals, from lacking of efficacy resulting in disease recurring to severe toxicity resulting in noncompliance in patients who cannot tolerate these life-saving drugs,” says Leiwei Wang, professor of pharmacology at the Mayo Clinic. “Therefore, it is extremely important for us to understand better of how and why patients respond differently, so that we can truly individualize their therapies by choosing the right drug at the right dose.”


Measuring the Potential Impact of Amazon HQ2 on DC and Other Candidate Sites

CARTO Blog, Dave Bryson



We don’t yet know where it will end up, but Amazon HQ2 is going to vastly change the physical, societal, and economic landscape of the chosen site. For the candidate cities, Amazon itself, and other local economic interests, the impact of HQ2 will be significant. But what insights from Amazon’s impact on the city of Seattle, Washington can municipal leaders and other local officials consider to anticipate changes if their city is selected by the e-commerce giant?

Using Location Intelligence and multiple data streams, we can gain significant insights on the impact of development at a hyper-local level.


Getty Images launches a new AI tool that helps publishers find the right picture for the story

Journalism.co.uk, Marcela Kunova



Getty Images has released an artificial intelligence (AI) tool for publishers that recommends the best choice of images to accompany a news story.

‘Panels by Getty Images’ uses customisable filters and a self-improving algorithm that learns how an editor selects an image and optimises its performance over time.

“The AI starts working for you as soon as you copy/paste your article into the tool,” says Getty Images senior vice president of data and insights Andrew Hamilton.


Waste Heat: The Overlooked Energy Problem, and How to Solve It

SingularityHub, Anastasiya Ernays-Eskarsa



As anyone who has taken elementary physics knows, energy can’t be created or destroyed. In a closed ecosystem, the amount of energy stays the same, merely changing between high-quality and low-quality forms, namely heat. Low-quality heat energy can be an environmental pollutant, just like the plastic detritus with which our oceans are now awash.


University Data Science News

The field of psychology is working hard to emerge from its recent reproducibility crisis. There is still plenty of evidence of non-reproducible work and data hoarding. Morton Ann Gernsbacher, a psychologist at UW-Madison, put together three handy lists of ways to reward research transparency during the hiring process (4 tips), during the promotion/tenure evaluation (2 tips), and through awards/honors (2 tips).



In Scientific American, Yale psychiatrist Daniel Barron addressed a controversial, flaming-hot open question about open access. Just exactly when does a scientist have to share data to comply with best practices in open science? Immediately upon publication of the first paper on a given study? Can the lab wait to publish all of the papers related to the study – likely supporting the careers of postdocs and grad students – before sharing all the data? Barron, a clinician, points out that most research data does not have an immediate impact on patient care. The inflamed arguments about patients dying while some greedy scientist holds onto her study data for the months and years it takes to publish all the studies are…inaccurate and manipulatively melodramatic. He writes, “I remain unconvinced that there is an immediacy to sharing most forms of scientific data—especially an immediacy in the name of the public interest. I am convinced that other scientists feel an immediate need to analyze data sets that they do not own—especially if the results of a particularly excellent data set can be published in Nature and make them famous.” He said it so I don’t have to.

ALSO in a heated Twitter-based fight about open access to data, William Gunn, Director of Scholarly Communications for Elsevier, arrived at a uniquely phrased tweet defending Elsevier’s for-profit publishing strategy. Ashley Farley, writing about patients with rare diseases trying to read paywalled studies: “People shouldn’t have to jump through additional hoops to access information because they aren’t privileged enough to be associated with an institution that can (nowadays its barely) afford subscriptions.” Mr. Gunn responded, “Yes, everyone should have rainbows, unicorns, & puppies delivered to their doorstep by volunteers. Y’all keep wishing for that, I’ll keep working on producing the best knowledge and distributing it as best we can.” Vince Madai summarized what many academics were thinking, “You mean like you receive our research, reviews and edits for free? Or do you mean scientific results that in most cases the tax payers already have paid for and thus have an intrinsic right to have access to without finding them behind a paywall that makes some publishers rich?” Gunn did not give an inch. Readers, check the thread.

Research hack courtesy of this Google team of researchers. Have oodles of contributing authors? Set up an email account for the paper! (h/t Andy Pavlo)

Caltech engineers built a drone that can herd birds away from airports (and their deaths). It wins the “data science for good” award of the week.



Python is still the top programming language in the IEEE ranking of programming languages, followed closely by C++ and C.



In a wide-ranging interview, Yoshua Bengio admits he felt “overloaded with teaching and starting a lab” and he “regretted working so hard when [his] second baby was born.” Obsidian just sent one of ours off to a postdoc with Bengio. About choosing postdocs, he said to select “postdocs who already know the stuff, so they don’t need two years to get up to speed and can help manage a larger team.” Hard to scale that to all labs, but it seems to be working for Bengio.



Writing quietly at the arg min blog, Ben Recht, David A. Forsyth, and Alexei Efros argue against “co-employment” in which computer scientists spend 20% of their time teaching in universities and 80% of their time conducting research in industry. They make the case that the best and brightest should be teaching, but as I just pointed out, Bengio had a reduced teaching load from the very beginning. The best and brightest who don’t like teaching will find a way to avoid most of their teaching responsibilities. Frankly, if they aren’t good teachers, this may be better for the students, though at the cost of overburdening the remaining faculty. They make other salient points: academic freedom doesn’t exist in companies, and much of the research output is difficult to share from a corporate base.



Yann LeCun disagrees. He splits time between NYU (20%) and Facebook AI Research (80%) and writes (with the help of a ghost writer? Some of these phrases do not sound like LeCun.) “Dual affiliation is a boon. It benefits not just the AI economy but individual academics — both researchers and students — as well as industry….At FAIR, we openly share our advances as much as we can, as fast as we can in the form of technical papers, open source code and teaching material.” He doesn’t dwell on critiques that co-employment takes away from teaching. I will be the first to deflect any potential criticisms of his lecture coverage. He wrote his own slides and showed up to teach at NYU while I was there. Students seemed to like being in his lectures and they were not overly shy with criticism, generally speaking. Co-employment may work. I haven’t seen enough evidence that it is systematically worse than the current academic arrangement.



Sekar Kathiresan, who holds positions at Mass General Hospital, Harvard Medical School, and the Broad Institute, published a paper in Nature that made an important advance in the field of genetics. We know that very few diseases are caused by single genes. Kathiresan and collaborators developed polygenic indicators for five common diseases (in the US): coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer (pdf + paywall). Possessing polygenic indicators makes these individuals three times more likely to develop the respective disease.

Vanderbilt University has launched a new data science initiative that will establish a master’s degree in data science and build data science courses for undergrads. They came to visit NYU at one point, so I like to think the Moore-Sloan Data Science Environment may have contributed an idea or two to Vanderbilt’s initiative.



Clarkson University is offering a new undergraduate degree in data science.

The Wellcome Trust has published a thorough review of the way it assesses potential grants and grantees, with an increasing emphasis on social impact. For research, this means that research impact will not only be measured by citation count. For PIs, this means their impact on trainees will be taken into consideration. Let’s see which other foundations follow their lead.



Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt’s former PhD student, now assistant professor at George Washington University, used machine learning to de-anonymize sections of code. In other words, each developer writes code differently but these differences are consistent within a coder. Thus, it is possible to predict who wrote what, given a sufficiently comprehensive training data set. Cool, but unsurprising.
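
To make the idea concrete, here is a toy sketch of authorship attribution. It is not the method Greenstadt and Caliskan used. It assumes that character trigram counts are a good-enough proxy for coding style, and it attributes a snippet to the author whose known code has the most similar trigram profile (by cosine similarity). The function names, author names, and sample snippets are all hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_profile(code: str) -> Counter:
    """Count overlapping character trigrams, a crude stand-in for coding style."""
    return Counter(code[i:i + 3] for i in range(len(code) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(snippet: str, known: dict[str, list[str]]) -> str:
    """Return the author whose combined known code is most similar to the snippet."""
    target = trigram_profile(snippet)
    profiles = {
        author: sum((trigram_profile(s) for s in samples), Counter())
        for author, samples in known.items()
    }
    return max(profiles, key=lambda author: cosine(target, profiles[author]))


# Hypothetical training data: two authors with visibly different habits.
KNOWN = {
    "alice": ["for i in range(n):\n    total += i\n", "def add(a, b):\n    return a + b\n"],
    "bob": ["for(idx=0;idx<n;idx++){tot+=idx;}", "int add(int a,int b){return a+b;}"],
}

guess = attribute("def mul(a, b):\n    return a * b\n", KNOWN)
print(guess)
```

A real system would need far more training code per author and richer features than raw trigrams; the point here is only that consistent habits leave a measurable signature.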



UC-Riverside professor Evangelos Papalexakis got an additional infusion of cash – $7,000 from Snap, Inc. (i.e., Snapchat) – to continue improving his fake news detection algorithm, which is currently ~75% accurate. Seems like a small amount of money for a large problem.



The Charles Koch Foundation gives millions of dollars of grant money to colleges, but has been criticized by investigative reporters including Jane Mayer for using grants to colleges to launder its libertarian ideology through the supposedly unbiased halls of academe. (Clever!) Now, the foundation has decided to be more transparent, releasing the text of grant agreements with new grantees. These new documents indicate that the university (Arizona State), not the foundation, will direct the research and make hiring decisions. Koch watchers are uneasy, as this transparency appears to be another game. They want to see the details of the arrangements with George Mason University’s Mercatus Center, the locus of the most pointed criticisms in Mayer’s book Dark Money.



University of Pennsylvania researchers have found structural abnormalities in the brain that correlate with psychopathology: “the findings highlighted four distinct dimensions of psychopathology – mood, psychosis, fear, and disruptive behavior – all of which were associated with a distinct pattern of abnormal connectivity across the brain.” This is a big deal for psychology because diagnoses are typically made based on observed and reported behavior, with no reliance on structural data.


Why Sequoia And Coatue Just Poured $40 Million More Into This Data Science Startup

Forbes, Alex Konrad



To hear Nick Elprin tell it, technology is going through a third seismic shift in computing.

First, the CEO of Domino Data Lab argues, there was hardware design, better microchips that enabled computers to work faster and handle more information. Then came the claim that “software is eating the world” as applications changed how businesses of all kinds operated on desktop computers and smartphones. “Data science and companies building models, that’s the third wave,” Elprin says.

The entrepreneur has a lot riding on that belief. He and cofounders Chris Yang and Matthew Granade gave up lucrative careers in finance at Bridgewater five years ago to start Domino, a software startup looking to help businesses conduct and make use of data science. And with $40 million in fresh funding from Sequoia Capital and Coatue Management, Elprin and Domino are racing to build what they think could be the Salesforce or Marketo of data science.


Who’s Ahead in AI Research? Insights from NIPS, Most Prestigious AI Conference

Noteworthy – The Journal Blog, Gleb Chuvpilo



Despite what the media might make you think, the West is far, far ahead in Artificial Intelligence research, and there is no imminent threat of China taking over. We know this from our deep dive into the most prestigious international AI research venue, the Conference on Neural Information Processing Systems (NIPS). The latest one, NIPS 2017, was held in Long Beach, California, attracting over 8,000 registered attendees, up 2,000 from the prior year. 679 papers out of 3,240 submitted were accepted for a 21% acceptance rate. Using these conference proceedings, we went into each of the 679 accepted papers and compiled a complete list of 2,497 authors and their affiliated organizations (many repeating, of course), and then calculated something we called the Publication Index.

What you can see below is the resulting chart roughly following the Power Law distribution with a factor of 3: the United States is in clear lead with a Publication Index of 414, followed by Western Europe with an index of 136, and China being the distant third, with an index of 39. (As an aside, we define Western Europe as the EEA (European Economic Area) + Switzerland, where the EEA includes the European Union, Norway, and Luxembourg; we feel that it’s fair to bundle these European countries together due to healthy coordination of research funding and multinational cooperation between them.)
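
The figures quoted above can be checked with a few lines of arithmetic. This sketch uses only the numbers in the excerpt, and the variable names are ours. It does not attempt to reproduce the Publication Index itself, since the excerpt does not give its formula.

```python
# Figures quoted in the excerpt.
accepted, submitted = 679, 3240
publication_index = {"United States": 414, "Western Europe": 136, "China": 39}

acceptance_rate = accepted / submitted  # about 0.21, matching the quoted 21%

# Ratio between each region and the next one down the ranking.
ranked = sorted(publication_index.items(), key=lambda kv: kv[1], reverse=True)
step_ratios = [
    (upper[0], lower[0], upper[1] / lower[1])
    for upper, lower in zip(ranked, ranked[1:])
]

print(f"acceptance rate: {acceptance_rate:.1%}")
for upper_name, lower_name, ratio in step_ratios:
    print(f"{upper_name} / {lower_name}: {ratio:.2f}")
```

The two step ratios come out at roughly 3.0 and 3.5, which is consistent with the "factor of 3" description, though two steps are too few to establish a power law.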


Billion-dollar start-up Darktrace is fighting cybercrime with A.I.

CNBC, Master Class, Catherine Clifford



Pit the machines against the machines to keep your data safe.

That’s the philosophy of cybersecurity start-up Darktrace, which uses artificial intelligence to fight cybercrime against corporations.

“It’s clear that we’re now in the midst of a cyberarms race, and the battlefield is going to be inside of every company network and we’re going to see a war against algorithms,” Darktrace co-founder and CEO Nicole Eagan tells CNBC Make It.


Samsung plans $22 billion for artificial intelligence, autos

Associated Press, Youkyung Lee



Samsung Electronics plans to spend a whopping $22 billion over the next three years on artificial intelligence, auto components and other future businesses as the company maps out its strategy under the restored leadership of Vice Chairman Lee Jae-yong after he was freed from prison.

The announcement Wednesday by South Korea’s biggest business group was welcome news, coming at a time of deepening unease over slowing growth in Asia’s fourth largest economy. It also highlights that South Korea’s decades-old reliance on big businesses for jobs and investments is unlikely to end anytime soon since the country lacks a strong pool of small and medium-sized companies found in Japan or Germany.


Husker DogFest to launch canine cognition lab

University of Nebraska-Lincoln, Nebraska Today



A new University of Nebraska–Lincoln lab is going to the dogs — to study canine psychology and related interactions with humans.

Nebraska’s Canine Cognition and Human Interaction Lab will showcase its work during Husker DogFest, a free, open-to-the-public event, 10 a.m. to 2 p.m. Aug. 11 in the greenspace immediately south of Manter Hall. The event will include lab tours, professional dog demonstrations, dog activities, pet-related vendors and food vendors.

The lab is a research program created by Jeffrey Stevens, associate professor of psychology. Its research focuses on understanding both dog psychology and how interacting with dogs influences human behavior and psychology. The dog-focused part of the research will involve owners bringing their dogs into the lab to essentially play games for treats while researchers record data about their decision making and cognition. The other part of the research will have people interacting with dogs to see how it influences the people.


How Accessible is Psychology Data?

Discover, Blogs, Neuroskeptic



In a slightly depressing new paper, two researchers describe how they tried to get access to the data behind 111 of the most cited psychology and psychiatry papers published in the past decade. The researchers, Tom E. Hardwicke and John P. A. Ioannidis of Stanford, wanted to place the data into a ‘Data Ark‘ to ensure its continued preservation for science. Unfortunately, in most cases, the data was not made available.

The paper is called Populating the Data Ark and it’s out now in PLoS ONE.

Hardwicke and Ioannidis wrote to the authors of each of the highly-cited articles, explaining the idea behind the Data Ark and requesting the raw data – including the option to give the data to the researchers but with restrictions on who could access it.


Engineers Taught a Drone to Herd Birds Away From Airports

Caltech, News



Engineers at Caltech have developed a new control algorithm that enables a single drone to herd an entire flock of birds away from the airspace of an airport. The algorithm is presented in a study in IEEE Transactions on Robotics.


Providing Equitable Access to Computing Education

Communications of the ACM; Mark Guzdial and Amy Bruckman



We propose that CS departments who offer MOOCs must balance the opportunities they are offering to advantaged students (like MOOCs) by pairing them with opportunities for less-advantaged students. CS MOOCs fill a need and should be offered and even expanded. But they do not meet the definition of Rawlsian justice. CS departments should offer interventions that measurably reach advantaged and less-advantaged students equally. Dollar for dollar, student for student, initiatives that reach more advantaged students need to be matched with those that reach less-advantaged ones.

The U.S. National Science Foundation has launched a new pilot effort to expand engagement in broadening participation in computing (BPC) activities by awardees in their Computer and Information Science and Engineering (CISE) directorate. They aim to increase the number of computer scientists who are working to make computing education more accessible. Some CISE proposals already require BPC plans, and more proposals will be required in the future. Proposal writers will be provided a set of resources, and they will be encouraged to participate in meaningful activities that have successfully reached underrepresented populations. Example programs include the Distributed Research Experiences for Undergraduates (DREU) program from the Computing Research Association’s Committee on the Status of Women in Computing Research (CRA-W) and the NCWIT Aspirations award. There are things we can do that have a measurable impact on increasing equitable access to computing education, and it is the responsibility of the entire CS community to do them and assess whether they are working.

 
Events



Preconference on Politics and Computational Social Science (PaCCS)

Northeastern University



Boston, MA August 29, starting at 9 a.m., Northeastern University (716 Columbus Ave, 6th fl). [$$$]


3rd Bay Area Postdoc Symposium

Bay Area Postdocs



Berkeley, CA September 8, starting at 8:30 a.m., Lawrence Berkeley National Lab (1 Cyclotron Road). [$$, space is limited]

 
Deadlines



Economic Graph Research Program: Insights and Updates

“We will be accepting a new round of Economic Graph Research proposals next month. Starting August 20th, interested researchers from universities, think tanks, and other non-profit entities can submit their proposals for Economic Graph Research projects to the team at LinkedIn.”

Apply Now — SAGE Ocean | Supporting Social Scientists Working with Big Data & Tech

“SAGE’s Concept Grants program has been developed as part of the SAGE Ocean initiative to fund innovative software ideas that support social science researchers to work with big data and new technologies.” Deadline for applications is February 15, 2019.
 
Tools & Resources



New Arabic NLP findings – An Arabic speech corpus and an Arabic Root Finder neural net

Tarek Hoteit



“Published a new blog post on Arabic.Computer about two projects I found on the Internet that are useful for Arabic NLP initiatives. The first one is Arabic Speech Corpus in a Damascian accent provided by Nawar Halabi and is offered under a non-commercial license. The other one is Arabic Root Finder, a useful Keras/Scikit-Learn neural network for finding Arabic word roots and is offered by Tyler Boyd under a GPL-3 license.”


Report from Hacking and Making at Time-Bounded Events at CHI 2018 – Blog Detail

Science Gateways Community Institute, Mona Wong



SGCI co-hosted the Hacking and Making at Time-Bounded Events: Current Trends and Next Steps in Research and Event Design Workshop at the CHI 2018 Conference in Montreal, Canada on April 22. The purpose of the one-day workshop was to bring together researchers and event organizers to share and learn from one another on the phenomenon of time-bounded collaborations such as hackathons, sprints, codefests, etc.

The workshop brought together 23 attendees, all with various backgrounds and ranging in experience from none at all (myself) to researchers interested in studying these events from a social perspective and also event organizers with varying degrees of experience.


Image Tagging in SAGE Journals – Part Two

SAGE Ocean, James Siddle



In part one of this series we introduced the topic of automated image tagging and showed how Cloud Vision APIs such as Clarifai can be used to classify images into different categories. We showed examples of SAGE images and the tags assigned by different Cloud Vision APIs, then discussed use cases for this innovative technology—primarily in discoverability and accessibility.

In this follow-on post, we focus on data analysis and specifically co-occurrence networks. By way of example we present a co-occurrence network derived from Clarifai image tags, which represents a kind of mental model of the SAGE journal images we processed.
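
As a rough illustration of what a tag co-occurrence network is, the sketch below counts how often two tags are assigned to the same image and keeps each pair as a weighted edge. The image names and tags are invented for the example. This is not SAGE's code and it does not call the Clarifai API; it only assumes that tags have already been fetched for each image.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagger output: image id -> tags assigned to that image.
IMAGE_TAGS = {
    "fig1.png": ["chart", "data", "diagram"],
    "fig2.png": ["chart", "data"],
    "fig3.png": ["people", "portrait"],
    "fig4.png": ["chart", "diagram"],
}


def cooccurrence_edges(image_tags: dict[str, list[str]]) -> Counter:
    """Count, for every unordered tag pair, the images in which both tags appear."""
    edges: Counter = Counter()
    for tags in image_tags.values():
        # Sort and de-duplicate so ("chart", "data") and ("data", "chart") are one edge.
        for pair in combinations(sorted(set(tags)), 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(IMAGE_TAGS)
for (tag_a, tag_b), weight in edges.most_common():
    print(f"{tag_a} -- {tag_b}: {weight}")
```

Each key is an edge and each count is its weight, so the result can be handed to a graph or plotting library to draw the network.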

 
Careers


Full-time positions outside academia

VP of Business Strategy & Analytics



Minnesota Twins; Minneapolis, MN

Postdocs

Postdoctoral scientist



Johns Hopkins University, Computational Cognitive Neuroscience Lab; Baltimore, MD
