Data Science newsletter – October 13, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for October 13, 2017

Data Science News



Company Data Science News

Facebook COO Sheryl Sandberg gave an interview to Axios in which she addressed how the platform will handle political campaign ads following revelations that Facebook sold $100,000 of political ads to Russian operatives. (To be fair, that’s a fairly small budget.) She basically notes that Facebook will stick to its core values and allow any type of speech in political ads – true, twisted, or otherwise – except for violence, porn, and libelous content. In her words: “When you allow free expression, you allow free expression.” She then freely expressed that she thinks “the Harvey Weinstein thing is abysmal. And it’s not just his behavior, it’s the behavior of everyone around him…the world is still run by men. I’m not sure it’s going that well.” So there’s freely expressed opinion for us.

Amazon has also been affected by the Harvey Weinstein scandal, suspending Roy Price, head of Amazon Studios, pending an investigation of sexual harassment that Isa Hackett says she sustained in and around Comic-Con. This is the first time I have ever linked to the Hollywood Reporter. It may not be the last. If Harvey Weinstein, a man who was immune to prosecution for over a decade, can be forced into the justice system, more women in Hollywood may feel emboldened to speak up.



Echo is Amazon’s voice-activated assistant and it can now tell the difference between individual voices. This is a catch-up move. Google Home launched a similar software upgrade last April.



Milk cows! [I’m from the Midwest. Please excuse my penchant for non-sequiturs and dairy exuberance.] SomaDetect, run by Bethany Deshpande and her husband and business partner Nicholas Clermont in Fredericton, New Brunswick, Canada, won $1m in a business plan contest. Their product is a sensor “that can quickly detect two main things: the fat content in milk, which can increase what a farmer is paid for the milk, and somatic cell counts, which can indicate disease.” Such a cool application of sensors and data science.

Baidu is increasing its US footprint and heightening competition for talented data scientists by opening another autonomous driving plant in Sunnyvale, CA.

Western Digital brought a 14TB hard drive to market. It is filled with helium and rated at 2,500,000 hours Mean Time Between Failures.
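
For a rough sense of what that MTBF rating means in practice, here is a minimal Python sketch that converts it to an annualized failure rate. It assumes a constant failure rate (an exponential lifetime model) and a drive that is powered on around the clock. Neither assumption comes from Western Digital, and the function name is made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the drive runs around the clock


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that a drive fails within one year of continuous use.

    Assumes a constant failure rate, so lifetimes are exponentially
    distributed with mean equal to the MTBF. This is a simplification,
    not a vendor-supplied model.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


afr = annualized_failure_rate(2_500_000)  # MTBF rating quoted above
print(f"Annualized failure rate: {afr:.2%}")
```

Under these assumptions the rating works out to roughly a 0.35% chance of failure per drive per year, or about 3 to 4 failures a year in a fleet of 1,000 drives.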



DeepMind is opening an office in Montreal near McGill University. Reminder, Facebook just opened an AI research lab there, too. We have been saying it here in the newsletter: Canada is a major hub for data science research and development.



DeepMind also had to release its 2016 revenue data to the UK government last week. They operated at a $162m loss with $137m going for staff-related costs including salary, travel, hardware, software, and probably stock options. It’s so hard to imagine the speed of their staff expansion if they weren’t owned by Google.



GitLab raised another $20m in a Series C round led by GV. GV used to go by the name Google Ventures. From the TechCrunch write-up: “GitLab started out as a git-based open source tool for self-hosting code repositories. Since its launch in 2014, the company has branched out, though, and added a number of more DevOps-centric services to its lineup. This includes a number of workflow tools, but also features that easily enable code review/test/release automation and even application monitoring.”



Google’s Fei-Fei Li gave an interview in which she noted that AI researchers “need to be much more human-centered. If you look at where we are in AI…it is very task-focused, it lacks contextual awareness, and it lacks the kind of flexible learning that humans have. We also want to make technology that makes humans’ lives better, our world safer, our lives more productive and better.” I’m so glad to see this many people speaking out at once from different corners about the importance of the human contexts in which our technologies operate.



Tim O’Reilly has a new book out. It’s called WTF. I can guarantee nobody would ever allow me to publish with a title like that. Actually, the title is a little longer than that: “WTF: What’s the Future and Why It’s Up to Us.” But if you read him here in Evonomics, I think it’s possible to assume your first guess about the meaning of WTF is one accurate interpretation of his meaning. It may be better than the publisher’s safe-for-work compromise, but they know what sells far better than I do. O’Reilly also gave last week’s Data Bite talk at Data & Society, which you can watch for free.



Heads up: Amazon has a clandestine office in Santa Cruz for the Alexa project. And an Amazon executive reportedly had HQ2 talks with Pittsburgh city officials. Good for surfers, and for Steelers.


Pittsburgh, Allegheny County officials land conversation with Amazon executive

Pittsburgh Post-Gazette, Adam Smeltz



Pittsburgh faces competition from dozens of other cities as it woos Amazon’s lucrative second headquarters, including up to 50,000 jobs.

On Tuesday, Mayor Bill Peduto and Allegheny County Executive Rich Fitzgerald landed a private conference call with the Amazon executive who will read all those proposals, Mr. Peduto confirmed Wednesday.


Something is bothering me about the press surrounding this article. Let me tell you why…

Twitter, Simon DeDeo



Emilio Ferrara @jabawack

First Evidence That Online Dating Is Changing the Nature of Society https://buff.ly/2wNutqa


How Will Artificial Intelligence Impact Open Technologies?

Medium, Jason Schultz



Mozilla has long been one of the strongest champions for openness in technology — from the software it produces to the web standards it adopts. As new technologies emerge, the battle over closed versus open systems continues to be one of the most important factors for a range of concerns that are critical to a healthy information ecosystem — innovation, competition, privacy, security, consumer protection — and even civil rights.

With new advances in artificial intelligence — particularly in the fields of machine learning and sensor technology — questions of “open” versus “closed” have arisen again. However, what is becoming quickly clear is that the traditional open strategies, such as permissive licensing and code/documentation publication, may not work as well or even at all.


How the 4 Largest Airlines Use Artificial Intelligence

Techemergence, Kumba Sennaar



The most popular AI applications from the top four industry leaders currently using AI appear to be:

  • AI Assistants: Responding to customer inquiries and voice commands for domestic airline flight info and ticket availability through natural-language interactions (see American Airlines and United below)
  • Smart Logistics: Machine learning algorithms are being applied to data to help automate airline operations (see Southwest below)
  • Facial Recognition: Facial recognition technology is being used to perform customer identity verification and to match passengers to their luggage through kiosks (see Delta below)

    Inside the Moonshot Effort to Finally Figure Out the Brain

    MIT Technology Review, M. Mitchell Waldrop



    AI is only loosely modeled on the brain. So what if you wanted to do it right? You’d need to do what has been impossible until now: map what actually happens in neurons and nerve fibers.


    HHS is investing in growing its in-house data science talent

    Fedscoop, Tajha Chappellet-Lanier



    Data, which holds the promise of increased efficiency and better mission outcomes for federal agencies, can’t deliver much without the help of capable data science practitioners. So the Department of Health and Human Services is looking to build out its own data science talent.

    This month the agency is launching the pilot edition of CoLab, an eight-week-long data science training program supported by the Biomedical Advanced Research and Development Authority, the Office of Business Management and Transformation, the Office of the CTO and the IDEA Lab.

    CoLab is a kind of spin-off of HHS Ignite, an accelerator-like IDEA Lab program that allows teams of HHS employees to explore innovative startup ideas. It’s being run in partnership with the D.C.-based data science education startup Data Society.


    NSF Awards TACC and University of Louisville $600,000 Grant for Data Science Education

    HPC Wire



    Colleges and universities across the US are creating data science programs to train future professionals to manage the massive amounts of digital data created by a range of sources – from web traffic to digital cameras. This data analysis frequently requires large-scale cyberinfrastructure – advanced computing systems that can deal with terabytes or even petabytes of data. However, few programs teach students how to use such resources effectively.

    A new, three-year, $600,000 grant from the National Science Foundation’s (NSF) Education and Human Resources directorate to the Texas Advanced Computing Center (TACC) and the University of Louisville (UofL) will support the development of training, tools, and a cloud-based virtual environment to teach data science at the largest scales and provide computational resources for education. The grant is part of NSF’s “Improving Undergraduate STEM Education” (IUSE) program.


    National Academy of Sciences Starts Framing Data Science Education

    Campus Technology, Dian Schaffhauser



    As the use of data proliferates through business, government and academia, a new job title — the data scientist — has emerged, sweeping away seemingly less compelling occupations in its wake. For the second year in a row, for example, job website Glassdoor named “data scientist” as the top career based on the number of job openings, salary and overall job satisfaction rating. PwC projected in 2015 that 2.3 million open jobs asked for the kind of analytics skills on which data scientists thrive.

    That kind of workforce opportunity has inspired hundreds of colleges and universities to open certificate and undergraduate- and master’s-level degree programs. Yet, no two programs are the same. As a new interim report from the National Academies of Sciences stated, the field is heaped with “new principles for data collection, storage, integration and analysis.” Plus, new tools abound in the field. Yet, “the main concepts, skills and ethics powering this emerging discipline of data science” haven’t been identified yet. Data science, as the report stated, “is still in its infancy.”


    Scientists Can Read a Bird’s Brain and Predict Its Next Song

    MIT Technology Review, Antonio Regalado



    Entrepreneurs in Silicon Valley this year set themselves an audacious new goal: creating a brain-reading device that would allow people to effortlessly send texts with their thoughts.

    In April, Elon Musk announced a secretive new brain-interface company called Neuralink. Days later, Facebook CEO Mark Zuckerberg declared that “direct brain interfaces [are] going to, eventually, let you communicate only with your mind.” The company says it has 60 engineers working on the problem.

    It’s an ambitious quest—and there are reasons to think it won’t happen anytime soon. But for at least one small, orange-beaked bird, the zebra finch, the dream just became a lot closer to reality.


    What Facebook Did to American Democracy

    The Atlantic, Alexis C. Madrigal



    Tech journalists covering Facebook had a duty to cover what was happening before, during, and after the election. Reporters tried to see past their often liberal political orientations and the unprecedented actions of Donald Trump to see how 2016 was playing out on the internet. Every component of the chaotic digital campaign has been reported on, here at The Atlantic, and elsewhere: Facebook’s enormous distribution power for political information, rapacious partisanship reinforced by distinct media information spheres, the increasing scourge of “viral” hoaxes and other kinds of misinformation that could propagate through those networks, and the Russian information ops agency.

    But no one delivered the synthesis that could have tied together all these disparate threads. It’s not that this hypothetical perfect story would have changed the outcome of the election. The real problem—for all political stripes—is understanding the set of conditions that led to Trump’s victory. The informational underpinnings of democracy have eroded, and no one has explained precisely how.


    I picked the wrong week to quit sniffing Gluon: Amazon, Microsoft hope easy AI dev tool sticks

    The Register, Thomas Claburn



    Amazon and Microsoft on Thursday rolled out open-source software called Gluon in the stated hope of simplifying the implementation of machine learning.

    Gluon provides an interface, in the form of an API, for building neural networks, something that’s not all that easy to do for those not steeped in the art. The software, available under the Apache 2.0 license, allows developers to construct machine learning models with the help of modular components.


    Ending AIG’s systemic-risk status raises questions on stability body’s role

    Reuters, Bora Yagiz



    A recent decision by the Financial Stability Oversight Council to rescind the systemically important financial institution designation of insurer AIG signals more than a change in the perceived riskiness of that particular firm; it also raises questions about the future of the council created under the 2010 Dodd-Frank Act.


    We Almost Gave Up On Building Artificial Brains

    Discover Magazine, John Wenz



    Today artificial neural networks are making art, writing speeches, identifying faces and even driving cars. It feels as if we’re riding the wave of a novel technological era, but the current rise in neural networks is actually a renaissance of sorts.

    It may be hard to believe, but artificial intelligence researchers were already beginning to see the promise in neural networks during World War II in their mathematical models. But by the 1970s, the field was ready to give up on them entirely.

    “[T]here were no impressive results until computers grew up, that is until the past 10 years,” Patrick Henry Winston, a professor at MIT who specializes in artificial intelligence, says. “It remains the most important enabler of deep learning.”


    Amazon has quietly been growing a mysterious office in a small surf town near San Francisco

    CNBC, Eugene Kim



    It’s unclear what exactly the team there is working on or why Amazon has picked Santa Cruz as an office location. But the report says the office has a group of engineers working on the Alexa voice technology and could possibly expand up to 200 people.

    It’s also interesting that Amazon didn’t take any tax breaks or economic benefits from the city of Santa Cruz for opening an office there. According to another report by the Silicon Valley Business Journal, Amazon has received more than $1.2 billion in state subsidies for the 250 facilities it’s built across the country.


    How Business Schools Can Integrate Data Analytics into the Accounting Curriculum

    The CPA Journal; Miklos A. Vasarhelyi, Norbert Tschakert, Julia Kokina, and Stephen Kozlowski



    The demand for accountants with data analytics skills is growing rapidly, providing for exceptional career opportunities. A recent McKinsey Global Institute study asserted that, even with additional university programs in data analytics becoming available, companies will need to train employees internally for many years to overcome the current skills shortage (“The Age of Analytics: Competing in a Data-Driven World,” December 2016, http://bit.ly/2fyfoVU). McKinsey also noted that, while the estimated number of data science program graduates could increase by 7% per year, the high-case scenario indicates a 12% annual growth in demand, leading to a shortfall of some 250,000 data scientists. In addition, McKinsey described the equally important role of the business translator, who acts as the link between the analytical talent and the practical application of the analytical results to address business requirements. The requirements for these business translators include not only an in-depth understanding of the data, but also organizational knowledge and industry or functional expertise that enables them to ask the data scientists the correct questions and derive the appropriate insights from their findings. These requirements appear analogous to the traditional requirements that an accountant be able to understand and explain the accounting results, but in this case, that expertise includes numerous forms of data.

     
    Deadlines



    Adobe Launches Analytics Challenge For Data Scientists

    Teams that place in the Adobe challenge will receive $60,000 in cash prizes, $35,000 of which will go to the first-place team alone. Deadline for submissions is October 16.

    Machine Learning on the Phone and other Consumer Devices

    Long Beach, CA. NIPS 2017 workshop on December 9. Deadline for submissions is October 31.
     
    NYU Center for Data Science News



    DeepMind Fellow Profile: Yassine Kadiri

    Medium, NYU Center for Data Science



    Learn more about Yassine Kadiri, an M.S. Data Science student and one of four DeepMind Fellows at NYU CDS this year.

     
    Tools & Resources



    [1710.02249] Multiresolution Consensus Clustering in Networks

    arXiv, Computer Science > Social and Information Networks; Lucas G. S. Jeub, Olaf Sporns, Santo Fortunato



    Networks often exhibit structure at disparate scales. We propose a method for identifying community structure at different scales based on multiresolution modularity and consensus clustering. Our contribution consists of two parts. First, we propose a strategy for sampling the entire range of possible resolutions for the multiresolution modularity quality function. Our approach is directly based on the properties of modularity and, in particular, provides a natural way of avoiding the need to increase the resolution parameter by several orders of magnitude to break a few remaining small communities, necessitating the introduction of ad-hoc limits to the resolution range with standard sampling approaches. Second, we propose a hierarchical consensus clustering procedure, based on a modified modularity, that allows one to construct a hierarchical consensus structure given a set of input partitions. While here we are interested in its application to partitions sampled using multiresolution modularity, this consensus clustering procedure can be applied to the output of any clustering algorithm. As such, we see many potential applications of the individual parts of our multiresolution consensus clustering procedure in addition to using the procedure itself to identify hierarchical structure in networks.


    Library of Congress Holdings by LCC

    Medium, Matt Miller



    “A couple years ago I converted the LCC classification outlines in PDF format to JSON. The goal was to have a way to coarsely place a resource into a LCC category. With the Library of Congress data release I wanted to try it with their holdings to see the shape of the collection.”

    “The LCC consists of 21 classes which narrows topics into increasingly specific categories. This means the system is hierarchical, I wanted to show that hierarchy and which parts of it held the most resources. I used the Book, Serial, Music, Map and Visual Materials MARC records.”


    NovoPayment Launches Developer Hub for LATAM Focused Fintech Innovation

    ProgrammableWeb, Eric Carter



    NovoPayment, a fintech service provider specifically targeting the Americas, has launched its Developer Hub. Within the Developer Hub, NovoPayment currently offers 37 APIs that serve a number of fintech-related operations specifically targeted at the Latin America market. NovoPayment looks to empower software developers to build financial and payment services across industries including insurance, banking, travel, transportation and much more.


    HappyFace: Identifying and Predicting Frustrating Obstacles for Learning Programming at Scale

    Philip Guo



    Wouldn’t it be awesome if a programming instructor or course designer could magically peek into the minds of learners to see exactly when they’re getting frustrated? That way, they can know what kinds of code are confusing for learners, when to best intervene to help out, and how they can improve instructional materials in the future.

    How might we go about achieving this goal? We could attach electrodes to everyone’s heads and monitor their brain activity. We could use 3D sensors like a Kinect to detect their body posture and facial expressions. We could attach eye trackers to everyone’s computer monitor and use that data to estimate frustration. We could imagine doing all sorts of monitoring, but the problem with all of these ideas is that they’re expensive and thus don’t scale. How are we ever going to attach these devices onto hundreds of students (or tens of thousands, in the case of MOOCs)? Who will calibrate or maintain them? What if they get lost or stolen?

    Instead of relying on high-tech gizmos, we decided to take the most low-tech possible approach: What if we just asked learners to tell us when they’re frustrated?

     
    Careers


    Postdocs

    Post-doctoral Research Fellow



    Interdisciplinary Affective Science Laboratory; Boston, MA

    Full-time positions outside academia

    Data Developer Job



    Sports Reference; Philadelphia, PA
