Data Science newsletter – April 18, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for April 18, 2017


 
 
Data Science News



Building a genome from scratch: an interview with Dr. Leslie Mitchell

EUSynBioS blog, Konstantinos Vavitsas



I had the pleasure to meet Leslie about two years ago during a summer course in Italy. She was one of the instructors, and her lecture about DNA synthesis in massive amounts and with different techniques made me realize I was doing something wrong with my cloning. She is also a very cheerful person, with a very positive attitude. So when I saw the recent synthetic chromosome articles, I contacted her and she was very kind to answer my questions.


Data science workshop introduces students to programming, teaches valuable skills

SUNY Geneseo, The Lamron student newspaper, Annie Renaud



Assistant Director and Manager for Systems and Networking Research Technologist Kirk Anne hosted a data science workshop for the programming and data science club on Sunday, April 9, in which students analyzed a dataset to determine whether passengers on the Titanic were likely to survive.


Case Study: Ad Targeting With Real-Time Analytics for Higher ROAS

RTInsights



PMG worked with MediaMath to collect data into a centralized platform and turn the raw data into usable marketing signals. As data was ingested, PMG was able to use a series of four key “adaptive segments” to create and build precise consumer profiles that populated retroactively, and update those profiles in real-time. This provided PMG and its client the ability to quickly identify customers at various stages of search and transactions.

For campaigns to “co-op-based segments,” PMG customized audience segments based on users that browsed the applicable co-op brand and products within the retailer’s site, and used segment sizing to understand how many people they would actually target for a specific brand.


How to Build an AR/VR Studio: Four Lessons from the Harvard Innovation Labs

The Harvard Innovation Labs



After looking at potential solutions ready “out-of-the-box,” we quickly decided that these options would severely limit the use of a diverse set of VR systems, AR headsets, software, and design tools. If we wanted to create an AR/VR Studio that didn’t implicitly pick winners in this highly competitive space, the only option was to build the studio from scratch.

In building the studio over the course of six months, here are a few lessons that we learned along the way.


Your new, future robot co-worker

The Daily Illini | Longform, Emily Scott



Brain-computer interfaces provide for direct communication between the brain and an external device. It’s an area of research that has been studied for decades, an area that entrepreneur Elon Musk stresses as an important tool to develop artificial intelligence further.

In this case, the interface is being used to communicate with a robot. Kesavadas’ experiment has shown there is a way to use this interface to get a robot to perform an action based on reading brain signals from a human.

In a way, the robot is reading your mind.


University Data Science News

Leslie Mitchell at NYU Langone Medical Center is building a synthetic genome from scratch. “We are no longer limited to the study of cells that are a product of evolution,” she explains. Geneticists’ ability to “edit mammalian systems” is daunting. “It is probably naive to rely on altruism. Even the best intentions can go awry. Technical limitations will only impede progress on building increasingly complex genetic systems for so long. I’m an advocate for total transparency…and an inclusive approach.” Indeed, her team includes labs around the world.

Yuan Ji, Oded Rozenbaum and Kyle Welch of George Washington University School of Business scraped “1,112,476 employee ratings of 14,282 public firms in the period 2008-2015” from Glassdoor and found that employee ratings are a good predictor of SEC fraud violations. They hypothesize that when firms are under pressure, managers may pressure employees to meet targets, resulting in grumpy employees who write negative comments on Glassdoor. If the managers can’t squeeze enough value out of employees, they may try misreporting or other creative fraudulent behaviors to meet the firm’s goals.

Peter Szolovits at MIT CSAIL, a leading expert in using natural language processing in precision medicine applications, explained his goal. It’s not “trying to get all doctors to ‘work as well as the best’.” Instead, it is much better to get “the least-skilled doctors [to perform like] average” doctors. In a refreshingly nonchalant dismissal of IoT for healthcare practice, he reminded everyone that, “IoT today is full of security holes” and not at all “ready for prime time.” Then he captured this organizational sociologist’s heart by declaring that the unrealized potential of electronic health records is not technical, it is “institutional and policy-based.”

Arizona State University, the University of Houston, and the NSF have partnered to create an industry-academia research hub for neurotechnology called The BRAIN Center. The research will focus on improving patient outcomes for those with injuries to or degeneration of the central nervous system.

A Stanford University team has launched DAWN, a project to “democratize AI and machine learning.” Within the next five years they aim to “build out the toolbox that we believe will empower the 99.9 percent to build and deploy their own world-class data products, quickly and cheaply.”

Hahrie Han, a political scientist at UC-Santa Barbara, explains the March for Science. She knows that having disparate goals “makes it harder to translate whatever happens in the march to political influence. And related to my points about centralization or decentralization, one of the challenges is what happens to the coalition afterwards? If they’re too disparate or fragmented, it could be harder to coalesce around shared goals.” Han goes on to point out that “the thing that is most predictive of whether any pressure group is able to achieve its political goals is the extent to which it has relationships with political elites.” Charismatic scientists among us, this is the call to use that trait for the betterment of science by persuading as many politicians and voters as you can that science is worth funding.



The Gordon and Betty Moore Foundation will now require all grantees to make their grant-funded publications “openly available within 12 months of publication, either on the journal’s site or in an open access repository.” They allow grant funds to cover the cost of fees associated with open access publishing. Great decision; it follows the lead of the Bill and Melinda Gates Foundation. I wonder what the conversation was like around the decision to cover what many consider to be rather ridiculous OA publishing fees. A related fee is the cost to publish data to an open data archive like Dryad, whose organizers found that 96 percent of their users do not budget for data publication fees. One-quarter of their respondents paid the publication fees on their personal credit cards and were not reimbursed.

Elsewhere in publishing problems, PubMed is now publishing funding information in its abstracts to make potential conflicts of interest more obvious.

Duke, Stanford, and Verily (an Alphabet company) have announced the first initiative of Project Baseline, which is to recruit 10,000 participants and track their detailed health data over at least four years. The amount of data collected is extensive, consisting of: “repeat clinical visits; daily use of a wrist-worn investigational device and other sensors; and regular participation in interactive surveys and polls by using a smartphone, computer or call center.” This is similar to the Kavli Foundation’s HUMAN Project, which is studying the lives of 10,000 New Yorkers 13 or older for “decades.” Both projects seem ethically dubious, though the Duke+Stanford+Verily initiative avoids working with minors, limits participants’ surveillance period to four years, and is planning to have participant feedback on conference calls throughout the term of the study.


Government Data Science News

Pittsburgh and Boston are racing to become the key hubs for robotics research and development. The Department of Defense is housing its advanced robotics institute at Carnegie Mellon University, which has long been home to leading robotics researchers. In Boston, Northeastern University’s new Interdisciplinary Science and Engineering Complex features a major robotics research center.

US Representative Derek Kilmer (D-Washington) introduced the Open Government Data Act, which calls for the federal government to share machine-readable data by default, with exceptions for personal and security-sensitive data. A new MIT-CMU report about the impact of AI on the US job market concluded that in order to forecast changes in specific industries we need a new federal data gathering initiative.

Steve Ballmer launched USAFacts, a non-profit project to display US government statistics. So far, the website is mostly sparklines and large-font single numbers: median age in America: 37.8! He also produced a 2017 10-K PDF for the US which offers a Ballmer-esque perspective on the intersection of business and civic institutions. Note to Steve: open government advocates are not huge fans of PDFs.

The FCC auctioned off 175 radio wave frequencies, transferring the balance from old broadcast television stations to newer broadcast TV stations and internet service providers. T-Mobile, the only mobile phone provider to make significant investments, spent $8 billion.

Dawn Tilbury has been appointed Assistant Director for Engineering at NSF. Tilbury is a mechanical engineering professor at the University of Michigan working on mobile robotics and passionate about mentoring junior faculty in STEM fields, especially junior women faculty. I’m glad to see that in federal science leadership.

Mattel will develop Aristotle, an intelligent robot for children that is designed to answer their questions, nurture them, and remain age appropriate as they grow. The toy maker was planning a partnership with Amazon’s Alexa, but has dropped that option in favor of an as-yet unknown partner. Mattel is opening the platform to outside developers, one of which may have thousands of books loaded into the device, accompanied by images it can project on the ceiling to enable digital bedtime stories. Protecting child users’ privacy is part of the discussion; details are skimpy.


Last Words: Computational Linguistics and Deep Learning

MIT Press, Nautilus, Christopher Manning



The Deep Learning Tsunami

Deep Learning waves have lapped at the shores of computational linguistics for several years now, but 2015 seems like the year when the full force of the tsunami hit the major Natural Language Processing (NLP) conferences. However, some pundits are predicting that the final damage will be even worse. Accompanying ICML 2015 in Lille, France, there was another, almost as big, event: the 2015 Deep Learning Workshop. The workshop ended with a panel discussion, and at it, Neil Lawrence said, “NLP is kind of like a rabbit in the headlights of the Deep Learning machine, waiting to be flattened.” Now that is a remark that the computational linguistics community has to take seriously! Is it the end of the road for us? Where are these predictions of steamrollering coming from?


3 reasons why data scientist remains the top job in America

InfoWorld, Vivian Zhang and Chris Neimeth



A new study by CareerCast.com revealed data scientist jobs have the best growth potential over the next seven years, as they are one of the toughest jobs to fill. Statistics from rjmetrics.com show that there were anywhere from 11,400 to 19,400 data scientists in 2015, and over 50% of those roles were filled in the last four years.


NSA’s powerful Windows hacking tools leaked online

CNN Tech, Selena Larson



A hacking group has dumped a collection of spy tools allegedly used by the National Security Agency online. Experts say they are damaging.

The exploits, published by the Shadow Brokers on Friday, contain vulnerabilities in Windows computers and servers. They may have been used to target a global banking system. One collection of 15 exploits contains at least four Windows hacks that researchers have already been able to replicate.

Late Friday, Microsoft said the exploits had been patched in previous updates, or are not able to be replicated on supported platforms. Windows users should make sure their software is up to date and upgrade to Windows 7 or a newer version.


As Vehicles Increasingly Rely on Computers, Hackers Have Found Ways to Control Them – Remotely

ARCCA



While you may not remember the names, Dr. Charlie Miller and Chris Valasek, you may recall what they did in 2015. They were the two researchers who exposed the cyber-security holes in the FCA/Chrysler connectivity system that left their vehicles vulnerable to cyber-attacks.

Miller and Valasek famously conducted a test whereby they hacked into a Jeep’s system and took command of many of the vehicle’s controls – remotely – while it was traveling down a freeway. (The full story is definitely worth reading.) In further tests, by hacking remotely they could disable or slam on a victim’s brakes, turn the steering wheel or, in some cases, cause unintended acceleration.

As Dr. Miller points out, with the ever-growing race to autonomy, the risk for cyber-attacks on vehicles is a paramount issue.


Brain.fm and other music-streaming apps can now curate music based on your brainwaves

Quartz, Nona Tepper



“You’ve got Spotify looking at your choices of song and providing suggestions on things you selected before now,” says musician Eduardo Miranda. “If you have something that is more connected to your own biology, it’s another way of providing services that may be more personalized.”

Miranda is a musician and composer who is best known for his liberal use of the electroencephalogram (EEG) machine to create complex string arrangements. For his next act, he is using brainwave-imaging software to change how he creates his music. Specialists have long used EEGs to diagnose and study epilepsy, sleep disorders, and other medical issues, but Miranda is using the technology to measure how rhythm affects brain activity.


Transparency is good: Pass the Open Government Data Act

The Seattle Times Editorial Board



By making federal data available by default, the Open Government Data Act proposed by U.S. Rep. Derek Kilmer would increase transparency and support innovation.


Helping Students Learn by Sketching

Northwestern University, Northwestern Engineering



Although sketching exercises can help students learn many subjects, they are woefully underused in classrooms.

“Sketches are difficult and time-consuming to grade,” said Northwestern Engineering’s Ken Forbus. “Intelligent tutoring systems, which enable students to receive feedback on their work anywhere and anytime, rarely are capable of understanding sketches.”

Forbus, Walter P. Murphy Professor of Computer Science, and his team have developed a new solution called “Sketch Worksheets,” a software equivalent of pencil and paper worksheets commonly found in classrooms. The difference? The software can also provide on-the-spot feedback by analyzing student sketches and then comparing them to the instructor’s sketches.


New CEO wants to push one of Durham’s most successful startups to the next level

The Herald Sun, Durham NC



Marc Zionts is a very active man. He’s run seven technology companies since the 1990s, competes in extreme cycling races often and commutes from Chicago to locales across the country for work. He talks quickly and says he likes to pour himself into his work.

Now he’s bringing that energy to Durham as the new chief executive officer of the technology company Automated Insights.

Durham-based Automated Insights (Ai) is an artificial intelligence company that uses algorithms to create written reports for clients from data rather than using humans to write them.

 
Events



Columbia University Causal Inference Conference: Varying treatment effects

Andrew Gelman, Michael Sobel, Jose Zubizarreta, Jennifer Hill, Beth Tipton



New York, NY Saturday, May 6 [free, sold out]


Georgia Tech Cyber Security Town Hall: New Standards for Controlled Unclassified Information

Georgia Institute of Technology



Atlanta, GA Wednesday, April 26, starting at 10 a.m. in the Student Center Theatre. … Failure to adhere to security standards could lead to the loss of funding for contracts and other projects. In order to protect this funding, Georgia Tech Cyber Security is developing a strategy to help our community comply with these standards and is looking to engage the campus in this discussion. [free]


Geoffrey B. West: The Universal Laws of Growth and Pace

The Long Now Foundation



San Francisco, CA Tuesday, May 23, at San Francisco Jazz Center. Tickets go on sale April 23. [price TBA]


SlatorCon London

Slator



London, England May 9 starting at 2 p.m., Ace Hotel London Shoreditch [$$$]


Big Data Finance Conference 2017

NYU Center for Data Science



New York, NY May 19 starting at 8 a.m., NYU Center for Data Science [$$$]

 
NYU Center for Data Science News



Creating A Multi-Genre Corpus for Natural Language Inference

NYU Center for Data Science



Although natural language processing (NLP) has made major strides in the last few years, to what extent can an NLP algorithm understand human sentences beyond a superficial read? Although they can computationally identify, count, or regurgitate individual words, phrases, and sentences, can they capture the meaning behind the words that they are handling?

These questions are at the heart of a fledgling sub-field within NLP called Natural Language Inference (NLI), where CDS professor Sam Bowman’s work is currently located.

 
Tools & Resources



Ten simple rules for making research software more robust

PLOS Computational Biology; Morgan Taschuk and Greg Wilson



Software produced for research, published and otherwise, suffers from a number of common problems that make it difficult or impossible to run outside the original institution or even off the primary developer’s computer. We present ten simple rules to make such software robust enough to be run by anyone, anywhere, and thereby delight your users and collaborators.


JASA Editors Offer Advice to Authors

American Statistical Association



David Ruppert and Nick Jewell are co-editors of the Journal of the American Statistical Association Theory and Methods section. We asked them to offer advice for prospective authors and discuss their thoughts about JASA and journal publishing in general.
