Data Science newsletter – July 14, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for July 14, 2017

GROUP CURATION: N/A

 
 
Data Science News



How deep learning can help us understand physician specialties

Amino, Shashank Shekhar


At Amino, we care about helping people find the right doctor. That’s why we decided to do an investigative project to represent physician data in a way that would allow us to discover how specialties relate to each other and determine one or more specialties for a physician.

We found that a deep learning technique helped us solve this problem effectively.


Apple Reveals Why the A.I.-Powered HomePod Will Sound So Advanced

Inverse, Mike Brown


Apple’s HomePod has the potential to blow away the competition when it launches in December. The $349 pod costs more than the $179 Amazon Echo and $129 Google Home, but Apple is pushing the speaker’s high-end audio quality that could lure consumers away. A new patent published Thursday reveals how Apple may use advanced artificial intelligence to boost the already-impressive speaker hardware even further, making music sound crystal clear no matter the setting.


It’s Time to Make Code More Tinker-Friendly

WIRED, Business, Clive Thompson


We need new tools that let everyone see, understand, and remix today’s web. We need, in other words, to reboot the culture of View Source.

The good news? A renaissance is approaching. This spring, New York’s Fog Creek Software launched Glitch, a site that hosts hundreds of simple web apps—everything from Tetris clones to databases and to-do lists—written using JavaScript. The code for each is visible and usually helpfully studded with creator comments, so noobs can grok it. You can easily spin off your own copy of the app, tweak it, and publish it.


The AI detectives

Science, News, Paul Voosen


Jason Yosinski sits in a small glass box at Uber’s San Francisco, California, headquarters, pondering the mind of an artificial intelligence. An Uber research scientist, Yosinski is performing a kind of brain surgery on the AI running on his laptop. Like many of the AIs that will soon be powering so much of modern life, including self-driving Uber cars, Yosinski’s program is a deep neural network, with an architecture loosely inspired by the brain. And like the brain, the program is hard to understand from the outside: It’s a black box.

This particular AI has been trained, using a vast sum of labeled images, to recognize objects as random as zebras, fire trucks, and seat belts. Could it recognize Yosinski and the reporter hovering in front of the webcam? Yosinski zooms in on one of the AI’s individual computational nodes—the neurons, so to speak—to see what is prompting its response. Two ghostly white ovals pop up and float on the screen. This neuron, it seems, has learned to detect the outlines of faces. “This responds to your face and my face,” he says. “It responds to different size faces, different color faces.”

No one trained this network to identify faces.


Every AI Powerhouse Wanted This Whiz Kid. He’s Taking Them On Instead

Forbes, Aaron Tilley


In the summer of 2013, as Matthew Zeiler was close to finishing a Ph.D. in artificial intelligence at New York University, he seemed to have every tech giant in the palm of his hand. Zeiler had left an internship with a Google AI group a few weeks earlier when he got a call from an unknown number while he was running along the Hudson River. It was Alan Eustace, then a senior vice president of engineering at Google, who had heard about Zeiler’s AI chops. Eustace wanted Zeiler to join permanently. To entice him, Eustace told him he would make an offer that was among the highest Google had ever made to a new graduate, Zeiler recalls. Zeiler won’t say how much he was offered, and Google declined to comment. But offers for top recruits with specific expertise can add up to several millions of dollars over four years, according to people with knowledge of the matter. Regardless, Google’s offer kicked off a bidding war for Zeiler and his know-how in deep learning, the vaunted branch of AI that’s driving major breakthroughs in computing.

Within days, Zeiler received a bigger offer from Microsoft, which Google promptly matched. Apple also wanted to chat, and when Zeiler flew out to Silicon Valley, Mark Zuckerberg personally sought to persuade him to join a new AI research group at Facebook. Zeiler respectfully turned them all down, deciding instead to start a company with an audacious goal: to compete with the giants that were courting him. “It was a crazy period,” Zeiler remembers. “I had this low-risk opportunity of joining a tech giant versus doing my own startup.” Zeiler says he knew that some of his algorithms worked better than Google’s on certain AI problems. “I knew I had to follow my gut,” he says.

Four years later, Zeiler’s New York City-based startup, Clarifai, is widely seen as one of the most promising in the crowded, buzzy field of machine learning.


Why The First Complete Map of the Ocean Floor Is Stirring Controversial Waters

Smithsonian, Kyle Frischkorn


Charting these watery depths could transform oceanography. It could also aid deep sea miners looking for profit.


How Will The Pentagon Create Its AIs? The Algorithmic-Warfare Team Is Charting a Path

Defense One, Bradley Peniston


By year’s end, advanced algorithms will be helping troops in Iraq and Syria hunt through video and still imagery to find ISIS targets — and that’s just the start for AI, said Col. Drew Cukor, who leads the Pentagon team charged with laying out a plan to begin realizing the promise of artificial intelligence for the U.S. military.

“The Department of Defense is starting with computer vision because there’s been immense breakthroughs from the research into self-driving cars, but everything will be touched,” Cukor said Thursday at the Defense One Tech Summit in Washington, D.C. “We have an acquisition community that is larger than the entire Marine Corps. We have a large maintenance corps that could be helped by the kind of predictive AI already in use by commercial aviation companies.”


Lights, camera, CRISPR: Biologists use gene editing to store movies in DNA

Nature News & Comment, Heidi Ledford


Internet users have a variety of format options in which to store their movies, and biologists have now joined the party. Researchers have used the microbial immune system CRISPR–Cas to encode a movie into the genome of the bacterium Escherichia coli.

The technical achievement, reported on 12 July in Nature, is a step towards creating cellular recording systems that are capable of encoding a series of events, says Seth Shipman, a synthetic biologist at Harvard Medical School in Boston, Massachusetts. While studying brain development, Shipman became frustrated by the lack of a technique to capture how cells in the brain take on distinct identities. This inspired him to explore the possibility of making cellular recorders.
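For readers new to DNA as a storage medium, here is a minimal sketch of the general idea rather than of the paper's method: with four nucleotides available, each base can carry two bits, so one byte of pixel data becomes four bases. The bit-to-base table and the function names are arbitrary assumptions for illustration; the approach reported in Nature relies on the CRISPR–Cas system to write sequences into the bacterial genome, which this sketch does not model.

```python
# Hypothetical 2-bits-per-base mapping; not the encoding used in the Nature paper.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn each byte into four nucleotides, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(sequence: str) -> bytes:
    """Invert encode(): read the sequence four bases (one byte) at a time."""
    if len(sequence) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)


# A tiny 2x2 grayscale "frame" of pixel intensities
frame = bytes([0, 85, 170, 255])
dna = encode(frame)
print(dna)  # AAAACCCCGGGGTTTT
```

Real DNA storage schemes add error correction and avoid troublesome patterns such as long runs of one base; this toy does neither.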


UW iSchool’s new dean on the present — and future — of what used to be called library school

The Seattle Times, Katherine Long


It used to be known as the library school, and it’s still graduating librarians. But the University of Washington’s Information School, or the iSchool, is now as much about managing data as it is about helping library patrons find the right book.

Pending approval from the University of Washington’s regents board Thursday, the iSchool will have a new dean, Anind Dey. He replaces Harry Bruce, who is stepping down after 11 years.

Dey, 47, comes to the UW from Carnegie Mellon University, where he has been a professor since 2005 and researches the use of everyday technology to develop tools and techniques for understanding and modeling human behavior — for example, whether a cellphone can be put to use monitoring a student to see if he or she is binge-drinking.


SoIC faculty earn grant to expand Data Capsule Service of the HathiTrust Digital Library

Indiana University, SoIC News


Several faculty of the School of Informatics and Computing, in partnership with research libraries and library schools, have been awarded a research grant from the Institute of Museum and Library Services to extend the Data Capsule service, which enables remote access to the HathiTrust Digital Library, to other collections managed by research libraries.

As the volume of digital content has expanded exponentially over the past several years, researchers and educators have recognized the potential of big data techniques to analyze, access, and organize digital scholarly collections. The Data Capsule service, which was developed for use in the HathiTrust Research Center (HTRC), creates virtual computers for users to access a restricted collection. Within HTRC, the Data Capsule service is used for non-consumptive analytics, which allows the computer to analyze the text but doesn’t allow the user to read or disseminate copyrighted content.
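To make “non-consumptive” concrete, here is a minimal sketch assuming a hypothetical access layer that lets analysis code compute over restricted texts but only ever hands back derived features (here, word counts), never the text itself. The class and method names are illustrative inventions, not the Data Capsule or HTRC API.

```python
import re
from collections import Counter
from typing import Dict


class NonConsumptiveCorpus:
    """Hypothetical access layer: analysis code gets derived features, never raw text."""

    def __init__(self, documents: Dict[str, str]):
        # Raw text is kept internal. The underscore is only a Python convention;
        # a real service would have to enforce this boundary outside user code.
        self._documents = documents

    def term_counts(self, doc_id: str) -> Counter:
        """Word frequencies for one document (lowercased), not the document itself."""
        return Counter(re.findall(r"[a-z']+", self._documents[doc_id].lower()))

    def corpus_frequency(self, term: str) -> int:
        """Total occurrences of a single word across the whole collection."""
        return sum(self.term_counts(doc_id)[term.lower()] for doc_id in self._documents)


corpus = NonConsumptiveCorpus(
    {"vol1": "It was the best of times, it was the worst of times."}
)
print(corpus.term_counts("vol1")["times"])  # 2
print(corpus.corpus_frequency("It"))        # 2
```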


Tacos vs Burritos Index: The Great Divide in Mexican-American Cuisine

Priceonomics, Datafiniti


We analyzed restaurant menu data from Priceonomics customer Datafiniti to see who serves Mexican food and what kind of food that actually is. With the ability to filter for cuisine as well as restaurants with available menu data, we easily found several thousand records to start our investigation. From this initial dataset, we extracted over 100,000 menu items and searched for specific instances of tacos or burritos. Finally, by grouping this data geographically, we were able to compare cities.

Ultimately we found that most major cities (e.g. NYC), as well as cities in the Southwest and California, had the most Mexican restaurants to offer. Cities in Texas, Colorado, and California reign supreme for the most restaurants per capita. In the taco vs. burrito debate, the overall skew of menu items was 56% tacos and 44% burritos nationally. Most notably, cities in Texas offered mostly taco options, while cities in the middle of the country and Northwest offered more burrito options.
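The counting step described above can be sketched in a few lines. This assumes menu items arrive as (city, item name) pairs and uses a case-insensitive substring match for "taco" and "burrito"; Priceonomics does not say exactly how it matched items, and the sample rows below are made up.

```python
from collections import defaultdict


def taco_share_by_city(menu_items):
    """Return {city: share of taco/burrito menu items that mention tacos}."""
    counts = defaultdict(lambda: {"taco": 0, "burrito": 0})
    for city, item in menu_items:
        name = item.lower()
        # Case-insensitive substring match; "Tacos" and "Taco Salad" both count.
        # A city only enters `counts` once one of its items matches.
        if "taco" in name:
            counts[city]["taco"] += 1
        if "burrito" in name:
            counts[city]["burrito"] += 1
    return {
        city: c["taco"] / (c["taco"] + c["burrito"])
        for city, c in counts.items()
    }


# Made-up sample rows, for illustration only
sample = [
    ("Austin", "Breakfast Taco"),
    ("Austin", "Fish Tacos"),
    ("Austin", "Bean Burrito"),
    ("Denver", "Smothered Burrito"),
    ("Denver", "Carne Asada Burrito"),
    ("Denver", "Street Tacos"),
    ("Denver", "Chips and Salsa"),
]
shares = taco_share_by_city(sample)
# Austin leans taco (2 of 3 matching items); Denver leans burrito (1 of 3).
```

Substring matching is crude: an item like "Taco Burrito Combo" counts toward both dishes, and spelling variants are missed, so a careful analysis would need a better matcher.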


Extra Extra

Bruce Schneier is one of the best publicly accessible writers when it comes to cybersecurity. Last week he explained “traffic shaping”, an NSA practice wherein flows of communication data are routed in ways that allow them to be legally and technically surveilled. Schneier may be pleased to note that the Girl Scouts are now offering a merit badge in information security. You go, girls!

NPR produced a story [4 min] on the impact of air pollution on human health, especially mortality. Francesca Dominici’s research finds that current EPA safety standards are not stringent enough: ~12,000 deaths per year could be prevented by lowering the level of fine particulate matter in the air by one microgram per cubic meter. The Trump administration has proposed moving in the opposite direction, loosening standards.



America’s Cup is a nail-biting, at times death-defying, high-speed catamaran race that is – like so many human + object races – as much about engineering as it is about athleticism. In this case, the boats can cost as much as a billion dollars. Weird allocation of resources! But since someone went and spent all that money, go see what it bought them.

Maksym Zavershynskyi has a compact explainer about technical debt in machine learning, the lurking giant under all of our projects’ beds. He focuses on feedback loops, correction cascades, and hobo-features. It’s perfect for teaching – short enough that students will read it, clear enough that they will get it, and organizationally ready to jump into a slide deck.

Daniel Sarewitz, co-director of the Consortium for Science, Policy and Outcomes at Arizona State University, based in Washington, DC, argues that scientists are acting like a special interest group and that we are aligned with Republicans when we fight for basic science. He argues that many socially impactful advances have come from applied science (though he doesn’t use the term applied science) and that there’s too much hype around basic science. Got my adrenaline flowing. I’m uncomfortable with any argument that continues to pit applied science against basic science.


(possible) Tweet of the Week

Twitter


If you’re not a white male, artificial intelligence’s use in healthcare could be dangerous

Quartz, Robert Hart


Healthcare inequalities are systemic and closely intertwined with social inequalities. In the US, black men and women can be expected to live a decade less than their white counterparts, and are also much more likely to die from heart disease, various types of cancer, and stroke. Rates of diabetes in Hispanic Americans are around 30% higher than in whites. Gay, lesbian, and bisexual adults are twice as likely to suffer with mental-health problems. Access to and quality of healthcare is similarly dismal when it comes to diversity, starkly cutting across racial, social, and economic divides.

If developed and used sensitively, artificial intelligence systems could go a long way to mitigating these inequalities by removing human bias. A careless approach, however, could make the situation worse.


The end of the internet startup

Vox, Timothy B. Lee


We haven’t had a major new technology company in more than 10 years.


Microsoft launches AI for Earth to give $2M in services to environmental projects

TechCrunch, Ingrid Lunden


After helping to launch the Partnership on AI with Google, Facebook and others, and doubling down on AI research, today Microsoft unveiled a new initiative that points to how it plans to target specific verticals in what can potentially be a very nebulous field — while also raising the public image of AI as some grow concerned about the implications of its encroaching influence.

Today, the company announced AI for Earth, a new program that will be dedicated to AI-based projects in the areas of agriculture, water, biodiversity and climate change, where Microsoft proposes to donate up to $2 million in Microsoft tools, services and training for projects to help them get a leg up. It will be led by Microsoft Chief Environmental Scientist Lucas Joppa.


Artificial Intelligence Helps Build Brain Atlas of Fly Behavior

Howard Hughes Medical Institute


Scientists at HHMI’s Janelia Research Campus created comprehensive brain maps linking different groups of neurons to specific behaviors, using a machine-learning program that annotated more than 225 days of videos of flies – a feat that would have taken humans some 3,800 years.


Augmenting diagnostic vision with AI

The Lancet, Giorgio Quer


In the span of their professional lives, a radiologist will read over 10 million images, a dermatologist will analyse 200 000 skin lesions, and a pathologist will review nearly 100 000 specimens. Now imagine a computer doing this work over days, rather than decades, and learning from and refining its diagnostic acumen with each new image. This is the capability that artificial intelligence (AI) will bring to medical care: the potential to interpret clinical data more accurately and more rapidly than medical specialists. [free full text with registration]

 
Events



Dr. Fei-Fei Li will keynote GHC17

Anita Borg Institute


Orlando, FL The Grace Hopper Celebration of Women in Computing (GHC), the world’s largest gathering of women technologists, takes place October 4-6. [$$$$]


FARCON

MinneAnalytics


Minneapolis, MN August 24. Minnesota’s premier conference on Analytics in the Financial and Retail Industries. [$$]

 
Deadlines



Journal of Computational Social Science

To be published in 2018, Journal of Computational Social Science is an interdisciplinary peer-reviewed journal that ties together groundbreaking research across the strata of the social sciences (sociology, economics, political science, psychology, linguistics, and other disciplines), physics, biology, management science, computer science, and data science.

Data Science for Social Good Conference 2017

Chicago, IL The conference at the University of Chicago will highlight the successes, opportunities, and challenges faced by the growing Data Science for Social Good community. Deadline for submissions is July 28.

FORCE 2017

Berlin, Germany The FORCE2017 Research Communication and e-Scholarship Conference brings together a diverse group of people interested in changing the way in which scholarly and scientific information is communicated and shared. Deadline for submissions is August 15.

Parkinson’s Disease Digital Biomarker DREAM Challenge

The Challenge is to “benchmark methods for the processing of sensor data for development of digital signatures reflective of Parkinson’s Disease.” Funders: Michael J. Fox Foundation and Robert Wood Johnson Foundation. Registration required. Deadline for submissions is October 2017.
 
NYU Center for Data Science News



Videos

Medium, NYU Center for Data Science


Learn more about CDS with two new videos. One introduces the CDS faculty; the second explains the Ph.D. program.


Dah-tah or Day-tah? Analyzing the global Americanization of British English

Medium, NYU Center for Data Science


CDS prof. Bruno Gonçalves & company use data from Twitter and Google Books to analyze the English language.


NYU Center for Urban Science and Progress Professor Releases World’s Densest Urban Aerial Laser Scanning Dataset

NYU Center for Urban Science and Progress, press release


New York University’s Center for Urban Science and Progress professor Debra F. Laefer today released the world’s densest urban aerial laser scanning (LiDAR) dataset. At over 300 points per square meter, it is more than 30 times denser than typical LiDAR data and an order of magnitude denser than any other aerial LiDAR dataset. The dataset also includes the first-ever urban scan with the full-waveform version of the data, as well as affiliated imagery and video. (TechCrunch also covered the dataset.)
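Point density, the headline figure for this dataset, is the number of laser returns per unit of ground area. A minimal sketch of one straightforward way to measure it, assuming the returns have already been projected to (x, y) coordinates in meters: bin them into square cells and divide each cell's count by the cell's area. The function name and the 1 m default cell size are illustrative choices, not part of the released dataset's tooling.

```python
import math
from collections import Counter


def density_by_cell(points, cell_size=1.0):
    """Return {(col, row): points per square meter} for a square grid of cells.

    points: iterable of (x, y) ground coordinates in meters (assumed projected).
    Only cells containing at least one point appear in the result.
    """
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    cell_area = cell_size * cell_size
    return {cell: n / cell_area for cell, n in counts.items()}


# Synthetic example: a dense 20 x 20 pattern in one 1 m cell and a
# sparser 10 x 10 pattern in the neighboring cell.
dense = [((i + 0.5) * 0.05, (j + 0.5) * 0.05) for i in range(20) for j in range(20)]
sparse = [(1 + (i + 0.5) * 0.1, (j + 0.5) * 0.1) for i in range(10) for j in range(10)]
densities = density_by_cell(dense + sparse)
print(densities)  # {(0, 0): 400.0, (1, 0): 100.0}
```

Summarizing the per-cell values (for example with a median) gives a single figure comparable to "points per square meter", and because empty cells never appear, gaps in coverage do not drag that figure down.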

 
Tools & Resources



Imagine this: Creating new visual concepts by recombining familiar ones

Google DeepMind; Alexander Lerchner, Irina Higgins and Matt Botvinick


This idea of “compositionality” is at the core of human abilities such as creativity, imagination and language-based communication. Equipped with just a small number of familiar conceptual building blocks, we are able to create a vast number of new ones on the fly. We do this naturally by placing concepts in hierarchies that run from specific to more general and then recombining different parts of the hierarchy in novel ways.

But what comes so naturally to us remains a challenge in AI research.

In our new paper, we propose a novel theoretical approach to address this problem. We also demonstrate a new neural network component called the Symbol-Concept Association Network (SCAN), that can, for the first time, learn a grounded visual concept hierarchy in a way that mimics human vision and word acquisition, enabling it to imagine novel concepts guided by language instructions.
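As a toy, purely symbolic illustration of the hierarchy idea in the excerpt: treat a concept as a set of attribute constraints, so a more general concept has fewer constraints, and form new concepts by combining constraints from familiar ones. This is not SCAN, which learns grounded visual concepts with neural networks; the attribute names and helper functions below are invented for illustration.

```python
from typing import Dict, FrozenSet, Tuple

# A concept is a set of (attribute, value) constraints; fewer constraints = more general.
Concept = FrozenSet[Tuple[str, str]]


def concept(**attrs: str) -> Concept:
    """Build a concept from keyword constraints, e.g. concept(color="red")."""
    return frozenset(attrs.items())


def is_more_general(a: Concept, b: Concept) -> bool:
    """True if every constraint of a also appears in b, i.e. a is at least as general as b."""
    return a <= b


def combine(a: Concept, b: Concept) -> Concept:
    """Recombine two concepts into a new one; reject clashing attribute values."""
    merged: Dict[str, str] = dict(a)
    for attr, value in b:
        if merged.get(attr, value) != value:
            raise ValueError(f"conflicting values for {attr!r}")
        merged[attr] = value
    return frozenset(merged.items())


def matches(c: Concept, obj: Dict[str, str]) -> bool:
    """An object is an instance of c if it satisfies every constraint of c."""
    return all(obj.get(attr) == value for attr, value in c)


red = concept(color="red")
apple = concept(color="red", shape="round", category="fruit")
cube = concept(shape="cube")
red_cube = combine(red, cube)  # a new concept assembled from familiar parts

print(is_more_general(red, apple))  # True
print(matches(red_cube, {"color": "red", "shape": "cube", "size": "small"}))  # True
```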


8 Steps to Making a Mobile-Friendly Survey

SocialCops blog


“Mobile data collection is a completely different experience than standard surveys, and a mobile-friendly survey should reflect that. Below are some strategic and technical elements to consider when designing a survey for a mobile platform.”


Making Data Science – IoT Projects fail-proof!

Medium, Towards Data Science, Harish Vadada


Data Science projects need to be set up for success with the right level of stakeholder sponsorship, both in terms of processes and in terms of the right data science exploratory tools, data and people.

In my experiences as a PM, I have seen data science projects fail due to Process reasons —

  • I. Differences between traditional software engineering and Data Sciences.


pyspark 2.2.0 : Python Package Index

Spark Developers


“Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.”


ReproZip Jupyter Extension

YouTube, ViDA-NYU


“This video demonstrates using the ReproZip extension to the Jupyter Notebook to automatically package a notebook’s environment, dependencies and data, allowing for future reproduction anywhere.”


Evaluating Interactive Graphical Encodings for Data Visualization

Medium, GT Visualization


Data representation is one of the main components of data visualization. The fundamental focus of data representation is mapping from data values to graphical representations. Visualization designers use elementary graphical units called “graphical encodings” to map data to graphical representation. Consider a case in which we visualize two numerical values using two bars with different lengths. Here, length is the primary encoding variable used to map the data values. Alternatively, angle is the primary graphical encoding for a pie chart.
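The two encodings in the excerpt's example map the same numbers differently: a bar chart makes each bar's length proportional to its value, while a pie chart makes each slice's angle proportional to its share of the total. A minimal sketch of both mappings; the function names and the 200-unit maximum bar length are arbitrary choices for illustration.

```python
def bar_lengths(values, max_length=200.0):
    """Length encoding: scale values linearly so the largest bar is max_length."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    return [max_length * v / largest for v in values]


def pie_angles(values):
    """Angle encoding: each slice gets 360 degrees times its share of the total."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return [360.0 * v / total for v in values]


data = [30, 60]
print(bar_lengths(data))  # [100.0, 200.0]
print(pie_angles(data))   # [120.0, 240.0]
```

Both mappings preserve the 1:2 ratio, but perception research (notably Cleveland and McGill) has found that people compare aligned lengths more accurately than angles, which is one reason such encodings are worth evaluating side by side.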


Five Things We Learned At . . . SciPy2017

NYU Data Services, Data Dispatch, Vicky Steeves


These are my top 5 favourite takeaways from SciPy 2017:
1. SciSheets

     
Careers


Full-time positions outside academia

Data Scientist



PUSH Inc.; Toronto, Ontario, Canada

Postdocs

Post-doctoral position in learning for IoT



Université catholique de Louvain; Louvain-la-Neuve, Belgium
