Data Science newsletter – May 13, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for May 13, 2019


 
 
Data Science News



Big Data Can Lie: Simpson’s Paradox

Mind Matters, Robert J. Marks


An underlying cause of the challenge posed by Simpson’s Paradox is clustering, where individual data points in a big data set must be grouped into distinct groups (or clusters). Clustering is a heuristic, a practical method of problem-solving that remains largely an art. In the above figure, for example, do we keep all the data in one big cluster, divide it into two groups, or five groups, or ten? If we know nothing more about the data, there is no right answer. For this particular case, we might eyeball that there are five clusters of points in the figure. But in other cases, the points are more homogeneously distributed and it will not be as clear.

Even more challenging is characterizing data that represent 25 different variables instead of only two. Directly visualizing points in a 25-dimensional space is simply not possible for a human. A seasoned practitioner with domain expertise must spend some time becoming friends with the data before figuring out which data point goes where. A human is needed to make things work.
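To make the excerpt's point concrete, here is a minimal Python sketch using made-up numbers (they are not from the article): two hypothetical clusters each show a downward trend, yet pooling them into one cluster produces an upward trend. The helper `ols_slope` is an illustrative function written for this example (a plain least-squares slope), not part of any library.

```python
# Minimal illustration of Simpson's paradox with hypothetical data:
# each cluster has a negative trend, but the pooled data trend upward.

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical clusters: B sits up and to the right of A.
cluster_a = ([1, 2, 3, 4], [4, 3, 2, 1])
cluster_b = ([6, 7, 8, 9], [9, 8, 7, 6])

# Trend within each cluster.
slope_a = ols_slope(*cluster_a)
slope_b = ols_slope(*cluster_b)

# Trend when all points are treated as one big cluster.
pooled_x = cluster_a[0] + cluster_b[0]
pooled_y = cluster_a[1] + cluster_b[1]
slope_pooled = ols_slope(pooled_x, pooled_y)

print(f"cluster A slope: {slope_a:+.2f}")
print(f"cluster B slope: {slope_b:+.2f}")
print(f"pooled slope:    {slope_pooled:+.2f}")
```

Running it prints a slope of -1.00 for each cluster and +0.67 for the pooled data, so the conclusion flips depending on whether the analyst treats the points as one group or two, which is the clustering decision the excerpt describes.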


Statistics for Data Science

American Statistical Association, Karen Kafadar


Most of us would agree the primary reason one collects data is to draw inferences and insights from it. So, our discipline plays a central role in data science, as many others have noted. But the relative contributions of mathematics, computer science, and statistics—and the disciplines of application—will inevitably shape the styles of departments or schools of data science. Just as statistics departments have distinguished themselves with different emphases, we can expect to see diversity develop in data science programs. In any data science program, however, statistics must play some role. Data science must include solid probabilistic and statistical foundations for drawing inferences from data. How much of a role?


Why you should cite open source tools

Juan Nunez-Iglesias, I Love Symposia! blog


If you are a user of open source tools in the Scientific Python stack, I have two requests for you:

1. When you publish your work, cite every library that you import. Most scientific software has a notice on its homepage or README file pointing to a paper you can cite. By definition, if you’ve imported a library, you’ve found it useful, and if you’ve found it useful, then you probably care about supporting its authors. This is a small way you can contribute to their success.

2. You are good enough to contribute. If you have an issue with an open source package you are using, look at the source code. Submit an issue to the project’s bug tracker (usually GitHub). And try your hand at fixing it. The software’s authors will usually offer guidance on how to do this, and you will improve your own skills as a result. Good software development practice is one of the most transferable skills you can gain.


Putting The ‘Art’ In Artificial Intelligence

WBUR, WAMU, The 1A


Is AI art just “found art,” or is it something more? Can AI be creative? And if so, how might that redefine what it means to create something? … GUESTS: Marcus du Sautoy, Author, The Creativity Code: Art and Innovation in the Age of AI; Professor of Mathematics at the University of Oxford. @MarcusduSautoy [audio, 34:45]


UC Merced MESA Lab Wins Big Ideas Contest

University of California-Merced, Newsroom


Now ubiquitous, machine learning has made its way into farming, thanks to an affordable, unmanned ground vehicle that recognizes and spatially maps agricultural pest infestations, treating them with “surgical precision spraying.”

New, award-winning robot-driven technology developed by a team of UC Merced graduate and undergraduate researchers from the Mechatronics, Embedded Systems and Automation (MESA) Lab is helping drive the evolution of food production.

The students’ robotic system has the potential to revolutionize farming by making it more sustainable, economically viable and healthier for the growing global population.

The robot also won the team first place in UC Berkeley’s Big Ideas contest, standing out from more than 300 submissions in the Food Services category. The Big Ideas award comes with $10,000 and recognition for “outstanding creativity, innovation and potential for social impact.”


DOE awards $5.5M to apply machine learning to geothermal exploration

Green Car Congress


The US Department of Energy (DOE) announced up to $5.5 million for 10 new projects to apply machine learning techniques to geothermal exploration and production. Machine learning—the use of advanced algorithms to identify patterns in and make inferences from data—could assist in finding and developing new geothermal resources. If applied successfully, machine learning could lead to higher success rates in exploratory drilling, greater efficiency in plant operations, and ultimately lower costs for geothermal energy.


Geoffrey Hinton discusses how AI could inform our understanding of the brain

VentureBeat, Kyle Wiggers


University of Toronto faculty member, Google Brain researcher, and recent Turing Award recipient Geoffrey Hinton spoke this afternoon during a fireside chat at Google’s I/O developer conference in Mountain View. He discussed the origin of neural networks — layers of mathematical functions modeled after biological neurons — and the feasibility and implications of AI that might someday reason like a human.

“It seems to me that there is no other way the brain could work,” said Hinton of neural networks. “[Humans] are neural nets — anything we can do they can do … better than [they have] any right to.”


On building flexible, inclusive graduate programs and why it matters

Medium, Emily M. Bender


The elements that make up the flexibility of CLMS that were critical to this student’s success on one level are simple: The program can be done full time (three classes/quarter + a summer internship; finishing in 12 months) or part time (one or two classes/quarter). Students can be in person on the UW campus, or join in the same courses online (we use Zoom to webcast the lectures). And students can move flexibly between those modes: if someone starts full time but needs to drop back to part time, that’s fine. People can move back and forth between being online and in person even within a course. Students who typically attend in person can avail themselves of the online option (or catch up later with recordings) if they’re home sick, home caring for someone who is sick, traveling, etc. Students who are typically online (e.g. living in another part of the country) are welcome to come in person if they are in Seattle.

We built this flexibility over time, because it benefitted our program: When we first started (in 2005), we imagined that the program would be full-time only, that all students would finish in 12 months (and also, terribly unrealistically, that all students would do an internship and then also complete a thesis based on that internship). In order for this self-sustaining (“fee-based”) degree to be viable, it had to attract enough students. And we quickly discovered that we could attract more students if we made it possible to attend part time.

 
Events



MIT Sloan CIO Symposium

MIT Sloan School of Management


Cambridge, MA May 22, starting at 8:25 a.m. “While IT experiences at JetBlue differ from Deutsche Bank or Eli Lilly, tech leaders from these and other companies will find common ground and exchange tips at the day-long conference. The agenda was crowdsourced by the senior IT executives who will attend the event.” [$$$]


New York Scientific Data Summit

Brookhaven National Laboratory (BNL) Computational Science Initiative and Columbia University Data Science Institute


New York, NY June 12-14, hosted by Columbia University Data Science Institute. [$$$]


International Conference on Robotics and Automation 2019

IEEE


Montreal, QC, Canada May 20-24. “As the flagship conference of the IEEE Robotics and Automation Society, ICRA will bring together the world’s top researchers and most important companies to share ideas and advances in our field.” [$$$$]


MedCity Converge

MedCity News


Philadelphia, PA June 19, starting at 8:30 a.m. “Brings together all sectors of the oncology ecosystem—health systems, insurers, digital health, investors, hospitals, pharma, entrepreneurs and other key stakeholders—to discover the latest innovations and the future of cancer care.” [$$$]


Dr. Yuval Shahar Seminar

University of British Columbia, Data Science Institute


Vancouver, BC, Canada May 28, starting at 12 p.m., University of British Columbia (ICICS/Computer Science Building). [rsvp requested]

 
Deadlines



2019 Human Development Data Visualization Challenge: Visualize Inequality

“The winner will be invited to the announcement event in Brussels, Belgium at the European Development Days on 18 June 2019. The winning data visualization will be published and showcased in the online version of the 2019 Human Development Report; other visualizations will also be showcased by HDRO in different ways, including online and social media.” Deadline for submission is May 27.

Instagram announces Request for Proposals for Well-being and Safety Research at CHI 2019

“Instagram is launching a request for proposals (RFP) for safety and well-being research at CHI 2019. Instagram is looking for proposals that investigate experiences on the platform that foster or harm the well-being and safety of our communities and societies. Find out more about Instagram’s RFP for Well-being and Safety Research in the next section.” Deadline for applications is July 3.
 
Tools & Resources



Taguette

Remi Rampin, Vicky Steeves and Sarah DeMott


“Have you ever searched ‘free qualitative research software’ only to be disappointed that nothing lets you tag your materials? Search no more! Taguette is a free and open source tool for qualitative research. You can import your research materials, highlight and tag quotes, and export the results!”


Computer scientists develop dynamic checklist so web designers can work more efficiently, creatively

Virginia Tech, Virginia Tech Daily


Virginia Tech researchers, in collaboration with Manhattan-based web design startup B12, wondered whether it would be possible to take Atul Gawande’s checklist concept and apply it to a scenario with less rigidly defined tasks, such as web design, and to apply checklists more broadly to creatively driven tasks like writing or programming.

The research team will present their findings at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems (CHI) in Glasgow, Scotland, on May 7, 2019.

Aditya Bharadwaj, a Ph.D. candidate in the Department of Computer Science in the College of Engineering and first author on the study, collaborated with B12 to study the workflow habits of designers. Many of B12’s design experts work remotely, making face-to-face review an impossibility.


People + AI Guidebook

Google, People + AI Research (PAIR)


This Guidebook will help you build human-centered AI products. It’ll enable you to avoid common mistakes, design excellent experiences, and focus on people as you build AI-driven applications.

It was written for user experience (UX) professionals and product managers as a way to help create a human-centered approach to AI on their product teams. However, this Guidebook should be useful to anyone in any role wanting to build AI products in a more human-centered way.


Guidelines for Human-AI Interaction

Microsoft Research; Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, Eric Horvitz


Advances in artificial intelligence (AI) frame opportunities and challenges for user interface design. Principles for human-AI interaction have been discussed in the human-computer interaction community for over two decades, but more study and innovation are needed in light of advances in AI and the growing uses of AI technologies in human-facing applications. We propose 18 generally applicable design guidelines for human-AI interaction. These guidelines are validated through multiple rounds of evaluation including a user study with 49 design practitioners who tested the guidelines against 20 popular AI-infused products. The results verify the relevance of the guidelines over a spectrum of interaction scenarios and reveal gaps in our knowledge, highlighting opportunities for further research. Based on the evaluations, we believe the set of design guidelines can serve as a resource to practitioners working on the design of applications and features that harness AI technologies, and to researchers interested in the further development of guidelines for human-AI interaction design.

 
Careers


Full-time positions outside academia

Data Sharing Program Manager



Digital Science Research and Solutions Inc.; Bethesda, MD

Postdocs

Research Fellow in Biomedicine, Self and Society: Beyond Bodies



The University of Edinburgh; Edinburgh, Scotland

Predoctoral Research Fellow (f/m/x)



WZB Berlin Social Science Centre; Berlin, Germany
