To appreciate a tool, we need to understand the why, the how, and the what of it.
Key Notes
Why / Necessity
- Monitoring the model in production
- Training/serving skew - differences in transformations and in handling missing data
- How frequently to refresh the model
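One way to picture the training/serving skew problem above: if training and serving each implement their own preprocessing, missing-data handling can silently diverge. A minimal sketch (function and field names are hypothetical) of the usual fix, sharing one transformation between both paths:

```python
def preprocess(record, *, fill_value=0.0, scale=100.0):
    """Shared transformation: impute a missing 'amount', then scale it."""
    amount = record.get("amount")
    if amount is None:          # consistent missing-data handling
        amount = fill_value
    return {"amount_scaled": amount / scale}

def featurize_training_batch(records):
    """Training side: batch-apply the shared transform."""
    return [preprocess(r) for r in records]

def featurize_serving_request(record):
    """Serving side: the single-record path reuses the same transform."""
    return preprocess(record)

train_features = featurize_training_batch([{"amount": 50.0}, {}])
serve_features = featurize_serving_request({})
# The missing-value case produces identical features in both paths.
assert serve_features == train_features[1]
```

Because both paths call the same function, any change to imputation or scaling applies everywhere at once.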
Production System Components
Kubeflow Platform
Develop, Deploy, Manage
Pipelines, Data Management, Serving (REST endpoint)
Pipeline Component
Commands
Set up the cluster and permissions in a YAML file
Demo with screenshots
Pipelines
- Domain-Specific Language
- Instantiate Components
- Define Dependency between components
- Compile and Deploy Pipeline
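The four steps above can be sketched in plain Python. This is not the real Kubeflow Pipelines SDK, just a conceptual toy: components are instantiated, dependencies declared between them, and "compiling" resolves the graph into an execution order.

```python
from graphlib import TopologicalSorter

class Component:
    def __init__(self, name):
        self.name = name
        self.upstream = []          # components this one depends on

    def after(self, *components):
        """Declare that this component runs after the given ones."""
        self.upstream.extend(components)
        return self

def compile_pipeline(components):
    """'Compile' the component graph into a runnable step order."""
    graph = {c.name: {u.name for u in c.upstream} for c in components}
    return list(TopologicalSorter(graph).static_order())

# Instantiate components and wire up dependencies.
ingest = Component("ingest")
train = Component("train").after(ingest)
evaluate = Component("evaluate").after(train)

order = compile_pipeline([evaluate, train, ingest])
```

In the real SDK the same roles are played by the `dsl` decorators and the compiler, which emit a workflow spec rather than a simple ordering.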
Custom Components
There is a gap between plain ML code and Kubeflow pipeline code, and the learning curve is real. How long does it take to port existing code to this infrastructure? I need to experiment before I can comment. The platform offers many features, but we shouldn't end up rewriting ML code just to turn it into pipeline code.
Notes #2
Codify ML Workflows
Adopt pipeline mindset
Experiment, Reproduce, Share pipeline
Define Pipeline
- A description of the ML workflow
- Runs in containers
- Pipeline definition is decoupled from runtime execution
- Components - one step of the workflow
- Each component is packaged as a Docker image
- One pod per step
- Pipeline SDK
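The "component is a Docker image, one pod per step" idea can be sketched as follows. The field names and image are illustrative, not the real Kubeflow/Argo schema:

```python
from dataclasses import dataclass

@dataclass
class ComponentSpec:
    name: str
    image: str          # the component is packaged as a Docker image
    command: list

def to_pod_spec(step: ComponentSpec) -> dict:
    """One pod per step: map a component onto a minimal pod-like dict."""
    return {
        "kind": "Pod",
        "metadata": {"name": f"{step.name}-pod"},
        "spec": {
            "containers": [
                {"name": step.name, "image": step.image, "command": step.command}
            ]
        },
    }

train_step = ComponentSpec(
    name="train",
    image="gcr.io/example/train:latest",   # hypothetical image
    command=["python", "train.py"],
)
pod = to_pod_spec(train_step)
```

Because each step is its own container image, steps can use different dependencies, and the orchestrator schedules each one as an isolated pod.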
More Reads
SDK Summary pointers
KALE (Kubeflow Automated pipeLines Engine) is a project that aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows.
An Argo workflow executor is a process that conforms to a specific interface that allows Argo to perform certain actions like monitoring pod logs, collecting artifacts, managing container lifecycles, etc.
Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports hyperparameter tuning, early stopping, and neural architecture search (NAS). Learn more about AutoML at fast.ai, Google Cloud, Microsoft Azure, or Amazon SageMaker.
Katib is agnostic to machine learning (ML) frameworks.
Ray Train, an easy-to-use library for distributed deep learning.
Dask is a flexible library for parallel computing in Python.
The ML Metadata (MLMD) library by Google is an integral part of TensorFlow Extended (TFX) and is also available as a stand-alone application.
The most important entities created and stored by MLMD are:
- Artifacts that are generated by the pipeline steps (e.g., the trained model).
- Metadata about the executions (e.g., the step itself).
- Metadata about the context (e.g., the whole pipeline).
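The three MLMD entity kinds listed above can be modeled conceptually with plain dataclasses. This mirrors the ideas only; it is not the real `ml-metadata` API, and the names and URI are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:            # e.g. the trained model
    name: str
    uri: str

@dataclass
class Execution:           # e.g. the pipeline step itself
    step_name: str
    outputs: list = field(default_factory=list)

@dataclass
class Context:             # e.g. the whole pipeline run
    pipeline_name: str
    executions: list = field(default_factory=list)

# Record one step producing one artifact within a pipeline context.
model = Artifact(name="model", uri="s3://bucket/model.pkl")  # hypothetical URI
train_exec = Execution(step_name="train", outputs=[model])
run = Context(pipeline_name="demo-pipeline", executions=[train_exec])
```

Linking artifacts to executions, and executions to a context, is what lets MLMD answer lineage questions like "which run produced this model?".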
Keep Thinking!!!