Data Science newsletter – January 11, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for January 11, 2017

 
 
Data Science News



Strengthening Global Sharing of Scientific Data and Research Findings

The White House, Mahlet Mesfin, Jerry Sheehan and Mark Paese


Openly accessible scientific data can be a powerful catalyst in international scientific collaboration. To inform and improve consistency among Federal departments and agencies on open scientific data sharing in support of international scientific cooperation, the Interagency Working Group on Open Data Sharing Policy released a report describing Principles for Promoting Access to Federal Government-Supported Scientific Data and Research Findings Through International Scientific Cooperation. The working group, which reports to the Subcommittee on International Issues established under the National Science and Technology Council’s Committee on Science, includes representatives from Federal science agencies involved in international scientific collaboration. The principles demonstrate the United States’ commitment to increasing access to unclassified scientific data generated by Federal agencies or resulting from Federally funded research (“government-supported scientific data”) to further international cooperation in science and technology to address global challenges.


Viral Pulpits: Clerics and the Sectarianization of the Gulf Online Sphere

George Washington University, Project on Middle East Political Science


Dubbed “Twitter Sheikhs” and “YouTube Preachers,” a diverse group of Sunni clerics have emerged as the superstars of the Gulf online sphere.[1] The four most popular Twitter accounts in both Saudi Arabia and Kuwait are Sunni religious leaders.[2] Prominent cleric Mohammed al-Arefe tops the list in Saudi Arabia, with 15.8 million followers, while televangelist Mishary Rashid’s 11.4 million followers have made him the most popular Twitter user in Kuwait. Although news outlets, royal family members, pop stars, and soccer players hold these top spots in other Gulf monarchies, clerics aren’t far behind.


The Humans Working Behind the AI Curtain

Harvard Business Review, Mary L. Gray and Siddharth Suri


Just how artificial is Artificial Intelligence? Facebook created a PR firestorm last summer when reporters discovered a human “editorial team” – rather than just unbiased algorithms – selecting stories for its trending topics section. The revelation highlighted an elephant in the room of our tech world: companies selling the magical speed, omnipotence, and neutrality of artificial intelligence (AI) often can’t make good on their promises without keeping people in the loop, often working invisibly in the background.

So who are the people behind the AI curtain?


Real-time Smart City Traffic Monitoring Using Microservices-based Streaming Architecture (Part 2)

MapR, Converge blog, Mathieu Dumoulin


This series of blog posts details my findings as I bring to production a fully modern take on Complex Event Processing, or CEP for short. In many applications, ranging from financials to retail and IoT, there is tremendous value in automating tasks that require taking action in real time. Setting aside the IT systems and frameworks that would support it, this is clearly a useful capability.


Cybersecurity expert at Waterloo Engineering takes aim at the ‘impossible’

University of Waterloo


An electrical and computer engineering professor at the University of Waterloo, Vijay Ganesh has developed a number of SMT/SAT solvers: cybersecurity programs that generate test inputs, letting developers pinpoint and address vulnerabilities before their software hits the market.

When he started his research in the early 2000s, people told him the challenge was so impossibly complex that he was wasting his time. You can never come up with an algorithm to solve these kinds of equations, his colleagues said, since the number of potential inputs is exponentially large.


Why Uber lost $2.2 billion in 9 months

Vox, Timothy B. Lee


Uber has lost money every year since it was founded in 2009, and its losses seem to keep growing. An obvious interpretation of these results is that Uber’s fundamental business is unsound, and the company will never reach profitability.

But it’s important to remember that Uber’s results during its first eight years look a lot like those of Amazon during its early years. Every year from its founding in 1994 until 2000, Amazon lost more money than the year before, leading skeptics to wonder if the company could ever turn a profit.


Trump shouldn’t close the doors on government data

The Washington Post, Catherine Rampell


What are the nation’s leading economists freaking out about these days?

Not so much plunging into recession (due to a possible trade war) or, alternatively, an overheated economy (due to a possibly ill-timed fiscal stimulus).

Instead, they’re worried that we won’t actually know whether either scenario materializes — because some of our best economic measurement tools may soon be compromised.


Andreas Müller | Bringing Machine Learning to the Masses

Columbia University, Data Science Institute


This fall Andy Müller finally gave his 13,000 Twitter fans what they’d been clamoring for: a layman’s guide to machine learning. The book, Introduction to Machine Learning with Python, is a continuation of work that has made him the go-to engineer for questions on using Python for data science.

For five years Müller has led the daily upkeep of scikit-learn, Python’s suite of data-analysis tools, and attracted a growing following on Twitter for his authoritative advice. He will soon wear a third hat, teaching Columbia’s first course in applied machine learning this spring while running a clinic for researchers grappling with big data problems. Soft-spoken and approachable, he seems to enjoy the challenge of helping problem-solvers overcome technical hurdles.


Apple plans ‘high-tech manufacturing’ of data-center gear in Arizona

Business Insider


Apple is seeking permission to conduct “high-tech manufacturing” and to build data-center server gear in a Mesa, Arizona, facility, according to a notice published Monday by the US federal government.


Searching deep and dark: Building a Google for the less visible parts of the web

The Conversation, Christian Mattmann


In today’s data-rich world, companies, governments and individuals want to analyze anything and everything they can get their hands on – and the World Wide Web has loads of information. At present, the most easily indexed material from the web is text. But as much as 89 to 96 percent of the content on the internet is actually something else – images, video, audio: in all, thousands of different kinds of nontextual data.

Further, the vast majority of online content isn’t available in a form that’s easily indexed by electronic archiving systems like Google’s. Rather, it requires a user to log in, or it is provided dynamically by a program running when a user visits the page. If we’re going to catalog online human knowledge, we need to be sure we can get to and recognize all of it, and that we can do so automatically.


Intellectual Craftsmanship and Scholarly Engagement — JSTOR’s Ideas for Redesigning the Digital Monograph

The Scholarly Kitchen, Jill O'Neill


I can’t tell you when I first encountered C. Wright Mills’s article, On Intellectual Craftsmanship, but for me — as an educated layperson — the article was memorable for its discussion of the steps a working scholar should take in beginning to integrate his or her thinking about a topic with the scholarship of those who had gone before. The critical take-away for me was Mills’s characterization of the monographs that would be generated by his young audience of social scientists as being “organized releases from the continuous work that goes into them.” The scholarly process — his ‘continuous work’ — could be expected to emerge from files of notes, annotated pages, and scribbled connections among ideas.

I was reminded of Mills’s work when I was reading the working draft of a JSTOR Labs report published in December of 2016, Reimagining the Digital Monograph: Design Thinking to Build New Tools for Researchers. (The draft is open for comment until January 31, 2017.) JSTOR Labs has already launched tools for discovering content quoting material from Shakespeare as well as for helping to understand the United States Constitution. If the Labs’ folks were revisiting the structure of the monograph, I was curious as to the outcome.


Thanks to AI, Computers Can Now See Your Health Problems

WIRED, Science, Megan Molteni


Patient Number Two was born to first-time parents, late 20s, white. The pregnancy was normal and the birth uncomplicated. But after a few months, it became clear something was wrong. The child had ear infection after ear infection and trouble breathing at night. He was small for his age, and by his fifth birthday, still hadn’t spoken. He started having seizures. Brain MRIs, molecular analyses, basic genetic testing, scores of doctors; nothing turned up answers. With no further options, in 2015 his family decided to sequence their exomes—the portion of the genome that codes for proteins—to see if he had inherited a genetic disorder from his parents. A single variant showed up: ARID1B.

The mutation suggested he had a disease called Coffin-Siris syndrome. But Patient Number Two didn’t have that disease’s typical symptoms, like sparse scalp hair and incomplete pinky fingers. So, doctors, including Karen Gripp, who met with Two’s family to discuss the exome results, hadn’t really considered it. Gripp was doubly surprised when she uploaded a photo of Two’s face to Face2Gene. The app, developed by the same programmers who taught Facebook to find your face in your friend’s photos, conducted millions of tiny calculations in rapid succession—how much slant in the eye? How narrow is that eyelid fissure? How low are the ears? Quantified, computed, and ranked to suggest the most probable syndromes associated with the facial phenotype. There’s even a heat map overlay on the photo that shows which features are most indicative of a match.


Why Big Data Can Lead to Bigger Problems

Bloomberg


“Weapons of Math Destruction” author Cathy O’Neil discusses the perils of big data. She speaks with Scarlet Fu on “Walk the Talk,” on “Bloomberg Markets.”

 
Events



AI By the Bay



San Francisco, CA March 6-8 [$$$$]

Collaborations Workshop 2017 (CW17)



Leeds, England March 27-29 at the Leeds University Business School. The theme of the workshop will be The Internet of Things (IoT) and Open Data: implications for research. [$$$]
 
Deadlines



ICMR 2017 | Call for open source software papers

Bucharest, Romania Conference is June 6-9. Deadline for submissions is Thursday, January 27.

Privacy + Security Forum | Speaker Proposal

Washington, DC The 2017 Forum will be held October 4-6.
 
Tools & Resources



Three Ways to Install Python on your Windows Computer

yhat, Elise


“One of the downsides is that despite the Python community’s attempts to make it an accessible tool for everyone, a lot of folks find the installation process daunting or confusing.”


fivethirtyeight Package

CRAN, Albert Y. Kim, Chester Ismay, and Jennifer Chunn


“We present fivethirtyeight: an R package of data and code behind the stories and interactives at FiveThirtyEight.com.”

 
Careers


Postdocs

Postdoc, Division of Biostatistics



Northwestern University, Feinberg School of Medicine; Chicago, IL
