"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 21, 2020

Kubeflow Pipelines - Learning Notes #1

To appreciate a tool, we need to know the why, how, and what behind it.


Key Notes

Why / Necessity
  1. Monitoring of Model
  2. Training / Serving - Differences in transformations and in handling missing data
  3. Frequency to refresh the model
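
The training/serving difference in point 2 is commonly avoided by keeping one preprocessing function and calling it from both paths. A minimal sketch in plain Python, assuming a single numeric feature and mean imputation; the names fit_stats and transform are hypothetical and not part of Kubeflow:

```python
from statistics import mean


def fit_stats(values):
    """Compute imputation statistics from the training data only."""
    observed = [v for v in values if v is not None]
    return {"mean": mean(observed)}


def transform(values, stats):
    """Fill missing values with the training mean (shared by training and serving)."""
    return [stats["mean"] if v is None else v for v in values]


train = [1.0, None, 3.0]
stats = fit_stats(train)                      # learned once, at training time
train_ready = transform(train, stats)
serve_ready = transform([None, 5.0], stats)   # same code path at serving time
```

Because serving reuses the statistics learned at training time, a missing value is filled the same way in both places.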



Production System Components


Kubeflow Platform

Develop, Deploy, Manage
Pipelines, Data Management, Serving (REST endpoint)



Pipeline Component
Commands
Set up the cluster and permissions in a YAML file


Demo with screenshots




Pipelines
  • Domain-Specific Language
  • Instantiate Components
  • Define Dependency between components
  • Compile and Deploy Pipeline
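
The four points above (instantiate components, define dependencies, then compile and run) can be pictured with a toy model in plain Python. This is only a sketch of the idea, not the Kubeflow Pipelines SDK; the step functions and the run helper are hypothetical, and graphlib needs Python 3.9 or newer:

```python
from graphlib import TopologicalSorter


# Hypothetical step functions standing in for pipeline components.
def load_data():
    return [3, 1, 2]


def preprocess(data):
    return sorted(data)


def train(data):
    return {"model": "mean", "value": sum(data) / len(data)}


# Each entry: step name -> (function, names of upstream steps).
pipeline = {
    "load": (load_data, []),
    "preprocess": (preprocess, ["load"]),
    "train": (train, ["preprocess"]),
}


def run(pipeline):
    """Run steps in dependency order, feeding each step its upstream outputs."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        outputs[name] = func(*(outputs[d] for d in deps))
    return outputs


results = run(pipeline)
```

In the real SDK the dependency graph is compiled into a workflow definition and executed on the cluster rather than in-process, but the ordering logic is the same idea.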





Custom Components




There is a steep learning curve between writing ML code and writing Kubeflow code. How much time does it take to port existing work to this infrastructure? I need to experiment before I can comment. There are a lot of features, but we shouldn't end up rewriting ML code as pipeline code.

Notes #2



Codify ML Workflows
Adopt pipeline mindset
Experiment, Reproduce, Share pipeline

Define Pipeline
  • A description of an ML workflow
  • Runs on containers
  • Execution and runtime are decoupled
  • Components - one step of workflow
  • Component - Packaged as Docker image
  • Pod for Each Step
  • Pipeline SDK
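
To make "component packaged as a Docker image" and "pod for each step" concrete, here is a simplified sketch that maps each step to a Kubernetes-style Pod manifest. It is an illustration only: the Component class, the to_pod_spec helper, and the image names are hypothetical, and in practice the workflow engine creates the pods from the compiled pipeline rather than user code doing so:

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One workflow step, packaged as a container image (hypothetical model)."""
    name: str
    image: str
    command: list = field(default_factory=list)


def to_pod_spec(component):
    """Build a minimal Pod manifest (as a dict) for a single step."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": component.name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": component.image,
                    "command": component.command,
                }
            ],
        },
    }


# Placeholder images; replace with images that actually exist in a registry.
steps = [
    Component("preprocess", "example.com/preprocess:latest", ["python", "preprocess.py"]),
    Component("train", "example.com/train:latest", ["python", "train.py"]),
]
pods = [to_pod_spec(step) for step in steps]
```

One pod per step is what lets each component carry its own dependencies inside its own image.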


More Reads
pipeline sdk key notes - Link1, Link2, Link3
SDK Summary pointers



KALE (Kubeflow Automated pipeLines Engine) is a project that aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows.

An Argo workflow executor is a process that conforms to a specific interface that allows Argo to perform certain actions like monitoring pod logs, collecting artifacts, managing container lifecycles, etc.

Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports hyperparameter tuning, early stopping, and neural architecture search (NAS). Learn more about AutoML at fast.ai, Google Cloud, Microsoft Azure, or Amazon SageMaker.

Katib is agnostic to machine learning (ML) frameworks.
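
Hyperparameter tuning boils down to running many trials and keeping the best one. The sketch below shows random search in plain Python to illustrate the idea; it does not use Katib, and the objective function, the search range, and the trial count are all made-up assumptions:

```python
import random


def objective(learning_rate):
    """Hypothetical loss: lowest when learning_rate is close to 0.1."""
    return (learning_rate - 0.1) ** 2


def random_search(num_trials, seed=0):
    """Sample learning rates uniformly and record the loss of each trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        lr = rng.uniform(0.001, 1.0)
        trials.append({"learning_rate": lr, "loss": objective(lr)})
    return trials


trials = random_search(num_trials=20)
best = min(trials, key=lambda t: t["loss"])
```

A tuning service adds what this sketch leaves out: running trials in parallel on the cluster, stopping unpromising trials early, and smarter search strategies than uniform sampling.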

Ray Train is an easy-to-use library for distributed deep learning.

Dask is a flexible library for parallel computing in Python.

ML Metadata (MLMD) is a library by Google. MLMD is an integral part of TensorFlow Extended (TFX) and can also be used as a stand-alone application.

The most important entities created and stored by MLMD are:
  • Artifacts that are generated by the pipeline steps (e.g., the trained model).
  • Metadata about the executions (e.g., the step itself).
  • Metadata about the context (e.g., the whole pipeline).
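
The three entity types above can be pictured as a small in-memory data model. This is a sketch of the concept only and not the MLMD API; the class fields, the produced_by helper, and the file paths are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Something a step reads or writes, e.g. a dataset or a trained model."""
    name: str
    uri: str


@dataclass
class Execution:
    """A record of one step run, with its input and output artifacts."""
    step: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Context:
    """A grouping of executions, e.g. one whole pipeline run."""
    name: str
    executions: list = field(default_factory=list)


def produced_by(context, artifact_name):
    """Return the step that produced the named artifact, or None."""
    for execution in context.executions:
        if any(a.name == artifact_name for a in execution.outputs):
            return execution.step
    return None


run = Context("demo-pipeline-run")
dataset = Artifact("dataset", "/tmp/data.csv")
model = Artifact("model", "/tmp/model.bin")
run.executions.append(Execution("train", inputs=[dataset], outputs=[model]))
```

Linking artifacts to executions is what makes lineage questions answerable, such as which step and which inputs produced a given model.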

Keep Thinking!!!
