Data Science newsletter – July 13, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for July 13, 2017

GROUP CURATION: N/A

 
 
Data Science News



Don’t pay prizes for published science

Nature News & Comment, Editorial


China and other countries should look again at how they pay bonuses and allocate grants that are based on individual research papers.


Kill the myth of the miracle machine

Nature News & Comment, Daniel Sarewitz


Unchallenged assumptions about how science works threaten its support and decrease its ability to contribute to society, says Daniel Sarewitz.


Google launches Gradient Ventures to invest in A.I. start-ups

CNBC, Jordan Novet


Alphabet has officially launched Gradient Ventures, a new firm within Google that will invest in early-stage artificial intelligence start-ups.

The announcement on Tuesday is the latest sign of Alphabet’s growing interest in AI, a futuristic technology that’s also gaining significance at Amazon, Apple, Facebook and Microsoft. Axios reported on the new program in May, but the details were not yet public.

Gradient will invest in 10 to 15 deals this year and will typically commit $1 million to $8 million in each, said Anna Patterson, founder and managing director of the firm.


Talking About Data: Lessons from Science Communication

UC3, Data Pub blog, John Borghi


As a person who worked for years in psychology and neuroscience laboratories before coming to work in academic libraries, I have particularly strong feelings about ambiguous definitions. One of my favorite anecdotes about my first year of graduate school involves watching two researchers argue about the definition of “attention” for several hours, multiple times a week, for an entire semester. One of the researchers was a clinical psychologist, the other a cognitive psychologist. Though they both devised research projects and wrote papers on the topic of attention, their theories and methods could not have been more different. The communication gap between them was so wide that they were never able to move forward productively. The punchline is that, after sitting through hours of their increasingly abstract and contentious arguments, I would go on to study attention using yet another set of theories and methods as a cognitive neuroscientist. Funny story aside, this anecdote illustrates the degree to which people with different perspectives and levels of expertise can define the same problem in strikingly different ways.


Microsoft creates an AI research lab to challenge Google and DeepMind

TechCrunch, Darrell Etherington


Microsoft has created a new research lab with a focus on developing general-purpose artificial intelligence technology, the company revealed today. The lab will be located at Microsoft’s Redmond HQ, and will include a team of more than 100 scientists working on AI, from areas including natural language processing, learning and perception systems.


How old are your cells? New method determines cell age more accurately

Johns Hopkins University, Hub


The multidisciplinary team of engineers and clinicians examined dermal cells from just underneath the surface of the skin taken from both males and females between the ages of 2 and 96.

The researchers from Johns Hopkins, Yale University, and the National Cancer Institute of the National Institutes of Health hoped to devise a system that, through computational analysis, could measure various aspects of cellular and molecular function. From that information, they hoped to determine the biological age of individuals from their cells more accurately than previous studies, which relied on gross physiology or examined cellular mechanisms such as DNA methylation.


Chicago’s WindyGrid Puts Open Data to Work

The New Stack, Susan Hall


With its WindyGrid project, the city of Chicago is using open data from multiple sources, both internal and external, to offer an unprecedented view of everything happening within the city, as Tom Schenk, Chicago’s chief data officer, demonstrated at the recent MongoDB World 2017.

WindyGrid combines data collected in more than three dozen systems within city departments (such as 911 calls, non-emergency 311 calls, building permits and health inspections) and merges it with data from other sources, such as weather data and tweets, to produce a comprehensive view of the city. It can show where police, fire and ambulance vehicles are in real time. It can plot reported potholes and the status of each complaint.


Big cuts ahead for energy innovation programs

Marketplace, Dan Boyd


House lawmakers are preparing next year’s federal finances after hearing President Trump’s budget wishlist. A House appropriations subcommittee has just moved forward with a budget for the Department of Energy that includes deep cuts to energy innovation programs.

This isn’t quite the massive paring back proposed by the Trump Administration, but this latest House proposal does slash funding for renewable energy programs by about $1 billion. It also cuts all funding to the Advanced Research Projects Agency – Energy (ARPA-E).

Since 2009, ARPA-E has been giving government grants to energy projects with serious potential that may be too early-stage to attract private investment. [audio, 2:21]


Microsoft Thinks AI Will Fill Your Blind Spots, Not Take Over Your Job

MIT Technology Review, Jamie Condliffe


Hot on the heels of Google, Microsoft has launched an initiative that it hopes will enable humans and artificial intelligence to complement each other more effectively.

At an event in London on Wednesday, Microsoft announced that it’s bringing together a new team of 100 engineers and researchers under the umbrella of Microsoft Research AI at its headquarters in Redmond, Washington. The company says that it’s an effort to break down barriers between people who have until now been working across separate areas of AI. Speaking at the event, Eric Horvitz, the managing director of Microsoft Research, said that he thinks the initiative will put Microsoft on “the path to understanding the mysteries of human intellect.”


YLS and Wikimedia Foundation Create New Initiative on Intermediaries and Information

Yale Law School


The Information Society Project (ISP) at Yale Law School and the Wikimedia Foundation have partnered to create the Wikimedia/Yale Law School Initiative on Intermediaries and Information (WII), a new initiative that aims to raise awareness and support intermediary liability protections and open internet principles globally.

As part of an ongoing collaboration between Yale Law School and the Wikimedia Foundation, the initiative will support deeper research on policies, legislation, and threats related to intermediary liability and hyperlinking legislation.


Halli Labs is joining Google!

Medium, Halli Labs


Today, we are thrilled to share the news that the Halli Labs team is joining Google. As we wrote in our introductory blog post, Halli Labs was founded with the goal of applying modern AI and ML techniques to old problems and domains — in order to help technology enable people to do whatever it is that they want to do, easier and better.

Well, what better place than Google to help us achieve this goal? We will be joining Google’s Next Billion Users team to help get more technology and information into more people’s hands around the world. We couldn’t be more excited!


Esri’s ArcGIS Hub Makes Open Data Into a Social Network

CityLab, Laura Bliss


Years in the making, a new tool aims to shift that paradigm. Launched Monday by the global GIS mapping software giant Esri, ArcGIS Hub is an online platform that clusters datasets around specific citywide initiatives, in the hope that people can more readily tap into information applicable to their lives. “Rather than ask what data should be available, this asks cities to think about what people care about,” says Andrew Turner, the chief technology officer of Esri’s R&D Center in Washington, D.C. He estimates he’s spent 15 years theorizing about this tool (and about two years building it with his team).


PARC to Develop Explainable Artificial Intelligence (XAI) Science for DARPA

Globe Newswire, PARC


PARC, a Xerox company, today announced it has been selected by the Defense Advanced Research Projects Agency (DARPA), under its Explainable Artificial Intelligence (XAI) program, to help advance the underlying science of AI. For this multi-million dollar contract, PARC will aim to develop a highly interactive sense-making system called COGLE (COmmon Ground Learning and Explanation), which may explain the learned performance capabilities of autonomous systems to human users.

The key idea behind COGLE is to establish common ground between concepts and abstractions used by humans and the capabilities learned by a machine. These learned representations would then be exposed to the human via COGLE’s rich sense-making interface, enabling people to understand and predict the behavior of an autonomous system.

For the DARPA project, COGLE will be developed using an autonomous Unmanned Aircraft System (UAS) test bed.


Prediction of employment and unemployment rates from Twitter daily rhythms in the US

EPJ Data Science, Eszter Bokányi et al.


By modeling macro-economical indicators using digital traces of human activities on mobile or social networks, we can provide important insights to processes previously assessed via paper-based surveys or polls only. We collected aggregated workday activity timelines of US counties from the normalized number of messages sent in each hour on the online social network Twitter. In this paper, we show how county employment and unemployment statistics are encoded in the daily rhythm of people by decomposing the activity timelines into a linear combination of two dominant patterns. The mixing ratio of these patterns defines a measure for each county, that correlates significantly with employment (0.46±0.02) and unemployment rates (−0.34±0.02). Thus, the two dominant activity patterns can be linked to rhythms signaling presence or lack of regular working hours of individuals. The analysis could provide policy makers a better insight into the processes governing employment, where problems could not only be identified based on the number of officially registered unemployed, but also on the basis of the digital footprints people leave on different platforms. [full text]
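The abstract's core step, expressing each county's daily activity curve as a mix of two dominant patterns and correlating the resulting mixing ratio with official statistics, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two 24-hour patterns are hypothetical step functions supplied by hand (the paper's own decomposition method may differ), the fit is ordinary least squares, the mixing ratio is taken as a / (a + b), and the county data and employment rates are made up.

```python
# Minimal sketch: fit a county's 24-hour activity timeline as a*P1 + b*P2
# and summarize it by the mixing ratio a / (a + b).
# The two patterns below are HYPOTHETICAL shapes, not the paper's patterns.

HOURS = range(24)
WORK = [2.0 if 9 <= h < 17 else 0.5 for h in HOURS]          # office-hours peak
EVENING = [2.0 if h >= 18 or h < 2 else 0.5 for h in HOURS]  # evening/night peak


def fit_two_patterns(timeline, p1, p2):
    """Ordinary least-squares weights (a, b) with timeline ~ a*p1 + b*p2."""
    # Solve the 2x2 normal equations G @ [a, b] = r by Cramer's rule.
    g11 = sum(x * x for x in p1)
    g22 = sum(y * y for y in p2)
    g12 = sum(x * y for x, y in zip(p1, p2))
    r1 = sum(x * t for x, t in zip(p1, timeline))
    r2 = sum(y * t for y, t in zip(p2, timeline))
    det = g11 * g22 - g12 * g12
    return (r1 * g22 - r2 * g12) / det, (r2 * g11 - r1 * g12) / det


def mixing_ratio(timeline, p1, p2):
    """Share of the fitted weight that falls on the first pattern."""
    a, b = fit_two_patterns(timeline, p1, p2)
    return a / (a + b)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Three synthetic counties built from known mixtures, plus made-up employment rates
timelines = [[s * w + (1 - s) * e for w, e in zip(WORK, EVENING)] for s in (0.8, 0.5, 0.3)]
ratios = [mixing_ratio(t, WORK, EVENING) for t in timelines]
employment = [0.62, 0.55, 0.48]  # hypothetical
print([round(r, 3) for r in ratios])  # [0.8, 0.5, 0.3]
print(round(pearson(ratios, employment), 3))
```

With real data, the patterns would come out of the decomposition step, and each county's mixing ratio would be compared with official employment figures, which is the kind of comparison behind the correlations reported in the abstract.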


Harvard’s John Brownstein works with Uber, Apple on health tech

CNBC, Christina Farr


Brownstein, who is also a computational epidemiologist at Boston Children’s Hospital, has quietly become a go-to for tech executives as they plot their initial steps into the health sector.

“I work with tech companies because I think a lot about how we can use these data-sets to understand population health,” he said in a recent interview with CNBC.

Brownstein is fascinated by how the growing volume of digital data aggregated by tech companies can be put to meaningful use. He studies how things like Google searches, data from wearable trackers, or social media posts can provide insight into predicting disease and improving health outcomes.


Microsoft’s new iPhone app narrates the world for blind people

The Verge, James Vincent


Microsoft has released Seeing AI, a smartphone app that uses computer vision to describe the world for the visually impaired. With the app downloaded, users can point their phone’s camera at a person and it’ll say who they are and how they’re feeling. They can also point it at a product and it’ll tell them what it is. All of this is done using artificial intelligence that runs locally on their phone.

The company showed off a prototype of Seeing AI in March last year at its Build conference, but starting today, the app is available to download for free in the US on iOS. However, there’s no word yet on when it’ll come to Android or other countries.


From the Gowanus Canal to Esteemed Startup Accelerator

NYU Tandon School of Engineering


Manifold Robotics — the brainchild of Professor of Mechanical and Aerospace Engineering Maurizio Porfiri and postdoctoral researcher Jeffrey Laut — was recently selected to participate in the prestigious Cleantech Open Northeast 2017 accelerator program. Laut and Porfiri, who is the director of the Dynamical Systems Laboratory at NYU Tandon, will join 30 other companies selected by Cleantech Open Northeast, a regional member of the national Cleantech Open — the largest cleantech startup accelerator that supports entrepreneurs and business ventures in clean technology, such as agriculture, water and waste technologies, transportation, or energy storage.

Manifold Robotics evolved out of the Brooklyn Atlantis project created by Porfiri and Associate Professor Oded Nov, which Laut worked on as a doctoral student at NYU Tandon. Combining robotics, technology and citizen science, the Brooklyn Atlantis team built robotic boats to collect data on water quality and capture images of the infamously polluted Gowanus Canal.


Government Data Science News

The National Institutes of Health with Stanford University used mobile phones to track physical activity from users in 100 countries. Check out the Data Viz of the Week to see which countries are the most active. Gently, the authors note that 5 million people die annually due to causes stemming from inactivity. I would call this couch potato syndrome, but that is offensive. There are legitimate reasons to be encouched. The big question the study raises for me is whether phones can be used to motivate people to move. Would motivation apps just sound like my mildly offensive couch potato syndrome comment? Would robotic motivation be just as likely to motivate indignant oppositional encouchment or would they actually get people going?



Data.gov, the website that flickered on and off earlier this year, has re-upped its contract with REI (the government contractor, not the outdoors retailer) to improve security, move away from its outdated CKAN open source code platform, and better integrate heterogeneous geospatial data from reporting agencies. Seems like business as usual: somewhat depressing but not alarming.

The UN’s International Telecommunication Union reports that only 38 percent of countries have published a cybersecurity strategy and a whopping, scary, stomach-churning 50 percent don’t even have a cybersecurity strategy at all!

My favorite quote of the week comes from….wait for it….fedscoop, a publication focused on the salubrious inner workings of the US federal government’s giant squid-like bureaucracy. Evann Smith, lead data scientist at Thresher, a firm contracted to work with government agencies, notes that these clients “are not trying to distinguish between a Chihuahua and a blueberry muffin.” But that is so important! Do not eat your dog for breakfast! …
Thresher, you’ll want to note, is chaired by Harvard’s Gary King. As far as I know, university professors are not allowed to serve in the C-suite of any company. The best they can do is serve as chairman of the board. I suspect King is more heavily involved in Thresher’s day-to-day work than your average chairman. I have no problem with that, seeing as how I am a citizen standing to benefit from Thresher’s contract. (Sorry, Harvard undergrads who may be seeing less of King in the classroom. Perhaps he has some data you can label instead?)

Nature’s editorial board has publicly reprimanded Chinese universities (typically backed by the Chinese government) for paying publishing prizes to their faculty. Publishing bounties are bad for science, because “handing out prizes so soon after publication rewards science that is not yet proven.” I would also imagine they result in a great crush of subpar submissions that taxes the goodwill of volunteer reviewers. (Maybe they should offer randomized payouts to reviewers, instead?)

Tom Schenk, Chicago’s chief data officer extraordinaire, showed off WindyGrid at MongoDB World 2017. WindyGrid is a data-driven dashboard providing real-time information about the entire city of Chicago. For instance, WindyGrid “can show where police, fire and ambulance vehicles are.” It can plot reported potholes, mosquitoes testing positive for West Nile, and failed restaurant health inspections; work is under way to add the ability to press “play” to display future projections. This is by far the best data-rich platform for accessing city data out there.

Esri (a private company that makes ArcGIS software) also promoted its data-driven city navigation platform, ArcGIS Hub, last week. The city of Long Beach has been using the Hub since earlier this year.

ARPA-E, the Department of Energy’s Advanced Research Projects Agency – Energy, could have its entire budget zeroed out. The House proposal also includes cuts to other DOE programs that support energy innovation research.

DARPA has chosen PARC, yes, the PARC affiliated with Xerox, to advance explainable AI (XAI). Researchers at PARC will be collaborating with faculty from Carnegie Mellon University, West Point, University of Michigan, University of Edinburgh, and the Florida Institute for Human & Machine Cognition.

Rhode Island may become the first state to limit access to the records (including emails) of public university faculty. Public records requests are intrusive and time-consuming – imagine having to submit all of your email, minus student correspondence, which is protected under FERPA, to a special interest group. The legislation is meant to protect faculty from groups who file public records requests specifically to slow their research.


This New MIT Master’s Program Doesn’t Require A College Or High School Degree

WBUR, Fred Thys


The Poverty Action Lab, officially known as the Abdul Latif Jameel Poverty Action Lab, or J-PAL, is testing more than 800 programs around the world. And now it’s part of a bold experiment by MIT: to allow students to take rigorous courses online for credit, and if they perform well on exams, to apply for a master’s degree program on campus.

“Anybody could do that,” says Esther Duflo, a co-founder of J-PAL. “At this point, you don’t need to have gone to college. For that matter, you don’t need to have gone to high school.”

The master’s program is in data, economics and development policy. Duflo says with the knowledge gained in the program, students should be able to run their own evaluation projects. They would know that most imaginative, well-thought-out programs fail, and therefore they have to be tested in the field — and they would have the tools to do that testing. [audio, 4:25]
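Field tests of this kind are typically analyzed as randomized evaluations. As a minimal sketch (not J-PAL's or the MIT course's own code), a program's average effect can be estimated as the difference between the mean outcomes of randomly assigned treatment and control groups, with a conservative standard error; the function name and the outcome numbers below are hypothetical.

```python
import math
import statistics


def difference_in_means(treated, control):
    """Estimate a program's average effect from a randomized evaluation.

    Returns (effect, standard_error): the difference between the treatment
    and control group means, and the conservative (Neyman) standard error
    sqrt(var_t / n_t + var_c / n_c) using sample variances.
    """
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("each group needs at least two observations")
    effect = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return effect, se


# Hypothetical outcomes (e.g. monthly income) for randomly assigned households
treated = [112, 98, 120, 105, 117]
control = [101, 95, 99, 108, 97]
effect, se = difference_in_means(treated, control)
# Rough 95% confidence interval using the normal approximation
print(f"effect = {effect:.1f}, 95% CI = ({effect - 1.96 * se:.1f}, {effect + 1.96 * se:.1f})")
```

Because assignment is random, the difference in means is an unbiased estimate of the average effect; a real evaluation would also have to deal with attrition, spillovers and much larger samples.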


Toyota launches venture capital fund targeting artificial intelligence startups

TechCrunch, John Mannes


Toyota is the latest Fortune 500 company to launch an AI-focused venture capital fund. The initial early-stage fund will deploy $100 million and operate as a subsidiary of the Toyota Research Institute. The automaker has positioned it as an ROI-focused rather than a strategic fund, meaning that it aims to profit like any other VC firm.

 
Events



ICME Summer Workshops

Stanford University, Institute for Computational & Mathematical Engineering


Stanford, CA. The Institute for Computational & Mathematical Engineering at Stanford offers a variety of summer workshops to students, partners, and the wider community. This year’s series of day-long workshops is happening from August 14-18. [$$$]

 
Deadlines



ParlAI request for proposals

Facebook is pleased to invite university teams to respond to this call for research proposals on chatbots and dialogue systems that make use of Facebook’s ParlAI framework; a total of 7 awards are available. Deadline for applications is August 25.
 
NYU Center for Data Science News



Seed grants to Sam Bowman

NYU Department of Linguistics


Sam Bowman recently received NYU’s University Research Challenge Fund seed grant for a project titled “Unsupervised Sentence Representation Learning.” The project covers methods for learning general-purpose vector representations for the meanings of sentences (rather than of words, as in the widely-used “word embeddings” family of techniques).
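The contrast between word and sentence representations can be made concrete with the simplest possible sentence encoder: averaging the word vectors of a sentence. This is a common baseline, not the method of Bowman's project (which concerns learning sentence representations without supervision), and the tiny 3-dimensional embedding table is made up for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors; real embeddings such as word2vec or
# GloVe are learned from large corpora and have hundreds of dimensions.
EMBEDDINGS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "man": [0.2, 0.1, 0.9],
    "sat": [0.0, 0.9, 0.1],
    "bites": [0.1, 0.8, 0.3],
}


def sentence_vector(sentence, embeddings=EMBEDDINGS):
    """Bag-of-vectors baseline: average the vectors of the known words."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("sentence contains no known words")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


a = sentence_vector("the dog bites the man")
b = sentence_vector("the man bites the dog")
print(round(cosine(a, b), 6))  # 1.0: averaging ignores word order
```

That last result shows the weakness: "the dog bites the man" and "the man bites the dog" get the same vector, which is one motivation for learning representations of whole sentences rather than pooling word embeddings.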


NYU releases the densest LiDAR dataset ever to help urban development

TechCrunch, Darrell Etherington


New York University has made available the densest public LiDAR data set ever collected, via its Center for Urban Science and Progress. The laser-scanned data, collected using aerial LiDAR instruments, has a resolution of around 300 points per square meter, about 30 times as dense as a typical data set, and covers a 1.5km square region of Dublin’s city center.

The data was collected by Professor Debra F. Laefer and her research team, and includes both a top-down view of the roofs and distribution of buildings and information about their vertical surfaces, making it possible to build 3D models of the urban landscape with detailed measurements of buildings, trees, power lines and poles, and even curb height, CUSP says.

 
Tools & Resources



Clarifai launches SDK for training AI on your iPhone

VentureBeat, Khari Johnson


Computer vision startup Clarifai has launched, in limited preview, a mobile SDK for processing data and running artificial intelligence on iOS devices.


Creating insanely fast image classifiers with MobileNet in TensorFlow

Hackernoon, Matt Harvey


“MobileNets are a new family of convolutional neural networks that are set to blow your mind, and today we’re going to train one on a custom dataset.”
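Much of MobileNets' speed comes from replacing standard convolutions with depthwise separable ones, a design described in the original MobileNets paper (Howard et al., 2017). The sketch below is a back-of-the-envelope cost comparison, not code from the linked tutorial (which trains a network in TensorFlow); the function names and the layer sizes in the example are illustrative assumptions.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds for a standard dk x dk convolution mapping m input
    channels to n output channels over a df x df output feature map."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df):
    """Multiply-adds for a depthwise separable convolution of the same shape."""
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map
std = standard_conv_cost(3, 512, 512, 14)
sep = separable_conv_cost(3, 512, 512, 14)
print(std, sep, round(std / sep, 2))  # 462422016 52283392 8.84
```

The cost ratio sep / std simplifies to 1/n + 1/dk² (here 1/512 + 1/9), so 3x3 depthwise separable layers need roughly 8 to 9 times fewer multiply-adds than standard ones of the same shape.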


Benchmarking TensorFlow on Cloud CPUs: Cheaper Deep Learning than Cloud GPUs

Max Woolf


I’ve been working on a few personal deep learning projects with Keras and TensorFlow. However, training models for deep learning with cloud services such as Amazon EC2 and Google Compute Engine isn’t free, and as someone who is currently unemployed, I have to keep an eye on extraneous spending and be as cost-efficient as possible (please support my work on Patreon!). I tried deep learning on the cheaper CPU instances instead of GPU instances to save money, and to my surprise, my model training was only slightly slower. As a result, I took a deeper look at the pricing mechanisms of these two types of instances to see if CPUs are more useful for my needs.
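The comparison the post describes comes down to simple arithmetic: a slower instance is still the cheaper choice whenever its slowdown is smaller than its price discount. A minimal sketch, where the hourly prices and epoch timings are placeholder assumptions, not actual EC2 or Compute Engine rates or Woolf's benchmark results:

```python
def training_cost(hourly_price_usd, seconds_per_epoch, epochs):
    """Dollar cost of a training run at an hourly instance price,
    prorating partial hours (i.e. assuming per-second billing)."""
    hours = seconds_per_epoch * epochs / 3600
    return hourly_price_usd * hours


# Placeholder numbers for illustration only
gpu = training_cost(hourly_price_usd=0.75, seconds_per_epoch=30, epochs=100)
cpu = training_cost(hourly_price_usd=0.40, seconds_per_epoch=45, epochs=100)
print(f"GPU: ${gpu:.3f}  CPU: ${cpu:.3f}")  # GPU: $0.625  CPU: $0.500
```

In this made-up case the CPU instance is 1.5x slower but about 1.9x cheaper per hour, so it wins; the break-even point is where the slowdown equals the price ratio.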


How to Get Fresh Perspective on a Project You’re Sick Of

New York Magazine, Science of Us blog, Alice Robb


If you hear a single word over and over, it can start to lose its meaning: linguists call the phenomenon “semantic satiation.” If you’re exposed to the same smell again and again, your nose can stop registering it. And when you read over thousands of your own words — if you’re trying to edit a book, a dissertation, a long article — a similar process kicks in. It can become almost impossible to assess a project you’ve been immersed in for weeks or months.


The Facebook Algorithm Mom Problem

Chris Aldrich


For quite a while now, I’ve been publishing most of my content to my personal website first and syndicating copies of it to social media silos like Twitter, Instagram, Google+, and Facebook. Within the Indieweb community this process is known as POSSE, an acronym for Post on your Own Site, Syndicate Elsewhere.

Anecdotally, most people in social media have long known that this type of workflow causes your content to be treated like a second-class citizen, particularly on Facebook, which greatly prefers that users post to it manually or using one of its own apps rather than via API.

This means that the Facebook algorithm that decides how big an audience a piece of content receives dings posts that aren’t posted manually within its system. Simply put, if you don’t post it manually within Facebook, not as many people are going to see it.

Generally I don’t care too much about this posting “tax” and happily use a plugin called Social Media Network Auto Poster (aka SNAP) to syndicate my content from my WordPress site to up to half a dozen social silos.

What I have been noticing over the past six or more months is an even more insidious tax being paid for posting to Facebook. I call it “The Facebook Algorithm Mom Problem”.

 
Careers


Full-time positions outside academia

Software Engineer



CognitiveScale; Austin, TX

Postdocs

Post-Doc Data Science Fellowship



University of Chicago, Center for Data Science and Public Policy Computation Institute; Chicago, IL

Post-Doc: Sensemaking and Narration



Ecole nationale d’ingénieurs de Brest (ENIB); Brest, France

Postdoctoral Research Assistant



University of Oxford, Oxford Vaccine Group; Oxford, England

Full-time, non-tenured academic positions

Data Scientist



University of Chicago, Center for Data Science and Public Policy Computation Institute; Chicago, IL

Tenured and tenure track faculty positions

Open Rank Professor of Political Science/Social Data



Penn State University; University Park, PA
