Piet Heinkade 11, 1019 BR Amsterdam, Netherlands
ADS Drinks & Data: ADS Meets CIDR
The Conference on Innovative Data Systems Research (CIDR) is a systems-oriented conference, emphasizing the systems architecture perspective. It is complementary in its mission to the mainstream database conferences like SIGMOD and VLDB.
Taking advantage of the presence of prominent data systems researchers visiting CIDR, Amsterdam Data Science is organizing a free and public meetup at the conference venue, right after the conference ends.
ADS would like to invite you to this special meetup, where we will hear from Turing Award winner Michael Stonebraker; Stanford associate professor Christopher Ré, known for his work on the Snorkel system (which he will talk about); and UvA professor by special appointment Hinda Haned.
14:30 Walk-in, Networking & Drinks
by Peter Boncz
Professor of Large-Scale Analytical Data Management, Vrije Universiteit Amsterdam & Senior Researcher in the Database Architectures Group, CWI
15:05 Invited Talk #1 “Data Science: Most of Us Are Working on the Wrong Problem”
by Michael Stonebraker
Adjunct Professor of Computer Science, M.I.T. & Chief Technology Officer, Tamr & Paradigm4
15:40 Invited Talk #2 “If you want to be rich, get a lot of money: Theory and Systems for Weak Supervision”
by Christopher Ré
Associate Professor in the Department of Computer Science, Stanford University
16:15 Invited Talk #3 “On the challenges of bringing explainable AI to practice”
by Hinda Haned
Professor by Special Appointment of AI, UvA & Lead Data Scientist, Ahold Delhaize
16:50 Drinks & Networking
Michael was the main architect of the INGRES relational DBMS and the object-relational DBMS POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for 25 years. More recently, at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, the SciDB array DBMS, and the Data Tamer data curation system. Presently, he serves as Chief Technology Officer of Paradigm4 and Tamr, Inc. He received the IEEE John von Neumann Medal in 2005 and the 2014 Turing Award, and is presently an Adjunct Professor of Computer Science at M.I.T., where he is co-director of the Intel Science and Technology Center focused on big data.
Abstract: This talk will demonstrate why most of us, who are working on building or applying algorithms for machine learning in various application areas, are solving the wrong problem. We present data to demonstrate that “algo” is a minor part of what data scientists spend their time doing. Instead, we present what we believe are the issues that are the “high pole in the tent” for data scientists to worry about.
Chris is an associate professor in the Department of Computer Science at Stanford University. He is affiliated with the Statistical Machine Learning Group and the Stanford AI Lab. His recent work aims to understand how software and hardware systems will change as a result of machine learning, along with a continuing, petulant drive to work on math problems. He cofounded a company, based on his research into machine learning systems, that was acquired by Apple in 2017. More recently, he cofounded SambaNova Systems, based in part on his work on accelerating machine learning.
Abstract: If you want to build a high-quality machine learning product, build a large, high-quality training set. At first glance, this seems as useful as the statement “if you want to be rich, get a lot of money.” However, a key idea driving our work is that new theoretical and systems concepts, including weak supervision, automatic data augmentation policies, and more, can enable engineers to build training sets more quickly and cost-effectively.
Along with state-of-the-art results on benchmarks, these concepts have allowed our group and collaborators to build a range of state-of-the-art applications including patient-care monitoring on electronic health records, automatic triage systems for radiologists, and enabling cardiologists to spot rare abnormalities in video MRI—along with widely used products from Apple and Google. This talk describes the theoretical and systems challenges that such applications create.
On the machine learning theory side, a key problem is estimating the quality and correlation of various sources of training data—but without ground truth labels. This problem connects to classical questions about estimating the covariance of latent variable models. We describe our new techniques that solve this case and can even improve fully supervised methods for estimating the structure of graphical models.
On the machine learning systems side, this theory opens up new ways to build machine learning systems. Here, we describe our recent work on systems that help engineers build and maintain machine learning products without writing low-level code in frameworks like TensorFlow. These systems draw on recent ideas in machine learning, e.g., zero-code deep learning systems, and on twists on classical data management ideas, e.g., schemas to separate the model, the supervision, and downstream serving code.
Much of this work is open source and available at http://snorkel.org.
Hinda has been lead data scientist at Ahold Delhaize since 2015, where her activities involve designing and building solutions that answer business questions with data mining and machine learning techniques. In 2018, she was named professor by special appointment at the University of Amsterdam, where she focuses on researching and developing best practices for safe and responsible machine learning. From 2010 to 2015, she worked as a research statistician at the Netherlands Forensic Institute; during this time, she developed statistical models and open source software to facilitate the interpretation of complex DNA profiles.
Abstract: Providing explanations about how a machine learning model produced a particular outcome can help enhance users’ trust and their willingness to adopt the model for high-stakes applications. Recent years have seen a surge in research on explaining AI-powered systems, but very little of this work evaluates the usefulness of the provided explanations from a practical, human-centered perspective. In this talk, Hinda will discuss some of the challenges of bringing explainability into practice, and why it needs to be thought of as a process rather than a product.
Date: Wednesday 15 January 2020
Location: Mövenpick Hotel, Amsterdam
Registration is free, but you must register in advance through our Meetup page.
The event will be in English and is open to all.
Amsterdam Data Science (ADS) accelerates data science research by connecting, sharing and showcasing world-class technology, expertise and talent from Amsterdam on a regional, national and international level. Our research enables business and society to better gather, store, analyse and present data in order to gain valuable insights and make informed decisions.
We will be taking photos of the event and posting them on the ADS website and social media channels. If you have any questions or concerns, please send an email to: email@example.com