Workshop Details

ML Summit 2021
19. - 21. April 2021 | Online
The big training event for Machine Learning & Data Science

Joerg Schad


01 Oct 2018
10:00 - 13:00

Building and Operating an Open Source Data Science Platform

Monday, 1 October 2018 | 10:00 - 13:00

There are many great tutorials on training deep learning models with TensorFlow, Keras, Spark, or one of the many other frameworks. But training is only a small part of the overall deep learning pipeline. This workshop gives an overview of building a fully automated deep learning pipeline, from exploratory analysis through training, model storage, model serving, and monitoring, and answers questions such as:

  • How can we enable data scientists to develop models exploratively?
  • How can we automate distributed training, model optimization, and serving using CI/CD?
  • How can we easily deploy these distributed deep learning frameworks on any public or private infrastructure?
  • How can we manage multiple deep learning frameworks on a single cluster, especially with heterogeneous resources such as GPUs?
  • How can we store and serve models at scale?
  • How can we monitor the entire pipeline and track the performance of the deployed models?
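Conceptually, these questions map onto pipeline stages that can be built, automated, and monitored independently. As a minimal, hypothetical sketch (not from the workshop materials; the stage functions and the toy "model" below are illustrative assumptions), the stages can be modeled as composable Python callables:

```python
# Illustrative sketch: deep learning pipeline stages as composable callables.
# Each stage (prepare -> train -> serve) is a plain function, so a CI/CD
# system could run, swap, or monitor each one as a separate job.
from typing import Callable, List


def prepare(raw: List[float]) -> List[float]:
    """Data preparation: min-max normalize raw inputs to [0, 1]."""
    lo, hi = min(raw), max(raw)
    return [(x - lo) / (hi - lo) for x in raw]


def train(data: List[float]) -> float:
    """Toy 'training': the model is just the mean of the prepared data."""
    return sum(data) / len(data)


def serve(model: float) -> Callable[[float], float]:
    """Model serving: return a prediction function closed over the model."""
    return lambda x: abs(x - model)


def run_pipeline(raw: List[float]) -> Callable[[float], float]:
    """Chain the stages end to end, as an automated pipeline run would."""
    return serve(train(prepare(raw)))


predict = run_pipeline([2.0, 4.0, 6.0])
```

In a real deployment each stage would be a distributed job (e.g. Spark for preparation, TensorFlow for training), but the composition idea is the same.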

Participants will build an end-to-end data analytics pipeline, including:

  • Data preparation using Apache Spark
  • JupyterLab self-service for data scientists
  • Data storage using HDFS
  • Distributed training
  • Automation & CI/CD using Jenkins
  • Resource sharing (including GPUs) between multiple users/jobs
  • Model and metadata storage
  • Model serving and monitoring
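As an illustration of the model and metadata storage step, here is a stdlib-only sketch (the `store_model` helper and its JSON layout are assumptions, not the workshop's actual tooling) that stores serialized model bytes under a content hash next to a metadata record a serving layer could later look up:

```python
# Hypothetical model store: persist model bytes under a content hash,
# plus a JSON metadata record (hash, metrics) for lookup by a serving layer.
import hashlib
import json
import pathlib
import tempfile


def store_model(model_bytes: bytes, metrics: dict,
                store_dir: pathlib.Path) -> dict:
    """Write the model and a sidecar metadata JSON; return the metadata."""
    digest = hashlib.sha256(model_bytes).hexdigest()
    model_path = store_dir / f"model-{digest[:12]}.bin"
    model_path.write_bytes(model_bytes)
    meta = {"path": model_path.name, "sha256": digest, "metrics": metrics}
    (store_dir / f"model-{digest[:12]}.json").write_text(
        json.dumps(meta, indent=2))
    return meta


# Usage with fake weights and a fake metric, in a throwaway directory.
store = pathlib.Path(tempfile.mkdtemp())
meta = store_model(b"fake-model-weights", {"accuracy": 0.92}, store)
```

Content-addressing by hash makes stored models immutable and easy to deduplicate; in production this directory would typically live on shared storage such as HDFS or S3.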