[Day 146] MLOps zoomcamp module 2 homework + some more prep for Microsoft x NVIDIA's hackathon
Hello :)
Today is Day 146!
A quick summary of today:
- did 2nd homework from MLOps zoomcamp on experiment tracking
- covered some material on devops on Azure
MLOps zoomcamp Module 2 homework on experiment tracking
The homework link is here. The homework followed the outline below:
- Install mlflow
- Preprocess NYC taxi data
- Train a model with autolog
- Launch a tracking server locally
- Tune model hyperparams
- Promote the best model
My code for it is on my github.
The model used was a random forest regressor, with RMSE as the main evaluation metric. The homework focused more on running mlflow commands than on the preprocessing/training (which were done in Module 1 of the course).
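As a quick reminder of what the metric measures, here is a minimal sketch of RMSE in plain Python. The numbers are made up for illustration and are not from the taxi data, and in the homework itself the metric would come from a library rather than a hand-written function.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical trip durations in minutes (made up for illustration).
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
score = rmse(actual, predicted)
```

Lower is better, and because of the squaring a few large misses hurt the score more than many small ones.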
Below is a graphical comparison of some of the models I ran
I had to set up logging and the mlflow db for a sample run
Run hyperparam optimization
And finally choosing and registering the best model
For a while I had an issue with the params passed to the model: they were missing because they were initially not logged in the hyperparam optimization code. Then I kept registering the *worst* performing model because I had [0] at the end of the best-run lookup; once I changed it to [-1], it was all good. Below are the top 5 runs, and the best of the 5 is the one that gets registered.
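The [0] vs [-1] mix-up comes down to sort order. Below is a small sketch in plain Python, with hypothetical run records standing in for what a run search would return. I am assuming here that my list ended up ordered with the best run last, which would explain why [-1] fixed it.

```python
# Hypothetical run records; the ids and RMSE values are made up.
runs = [
    {"run_id": "a1", "rmse": 5.42},
    {"run_id": "b2", "rmse": 5.31},
    {"run_id": "c3", "rmse": 5.67},
]

# Sorted ascending, the lowest (best) RMSE sits at index 0.
ascending = sorted(runs, key=lambda r: r["rmse"])

# Sorted descending, [0] is the worst run and the best one is at [-1].
descending = sorted(runs, key=lambda r: r["rmse"], reverse=True)

# Picking the minimum explicitly does not depend on the order at all.
best_by_min = min(runs, key=lambda r: r["rmse"])
```

With MLflow itself, `MlflowClient.search_runs` accepts an `order_by` argument such as `["metrics.rmse ASC"]`, so stating the order explicitly makes it clear which end of the list holds the best run.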
2nd prep course for Microsoft x NVIDIA hackathon in Microsoft's office in Seoul
It was a bit long so I covered just the bits I thought were new, and below are some notes I took along the way.
Azure Boards: tools for agile planning, work item tracking, visualization, and reporting
Azure Pipelines: a CI/CD platform that is language, platform, and cloud-agnostic, supporting containers and Kubernetes
Azure Repos: cloud-hosted private Git repositories with two types of version control: distributed (Git) and centralized (Team Foundation Version Control, TFVC). Features include
- Free private Git repositories
- Support for any Git client
- Web hooks and API integration
- Semantic code search
- Git code reviews with threaded discussion
- Continuous integration for each change
- Branch policies to ensure code quality
- Integration with favorite tools and editors
Azure Artifacts: Integrated package management supporting Maven, npm, Python, and NuGet packages from public or private sources
Azure Test Plans: An integrated solution for planned and exploratory testing
Workflows: define automation processes in GitHub Actions, detailing the trigger events and the jobs to run. Workflows are written in YAML and live in the GitHub repository under .github/workflows.
GitHub runners: compute resources that execute GitHub Actions workflows, performing build, test, and deployment tasks directly within GitHub repositories
Continuous integration with actions example
- On: specifies the event that triggers the workflow, here a code push
- Jobs: there's a single job called build
- Strategy: used to specify the Node.js version
- Steps: check out the code and set up dotnet
- Run: builds the code
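To picture how those keys nest, here is a rough sketch of the structure as a Python dict, which is roughly what the YAML would parse into. The keys mirror the workflow keywords from the list above, but the values (runner label, versions, action tags, build command) are placeholders of my own and not the original example.

```python
# Illustrative only: keys mirror the workflow keywords, values are placeholders.
workflow = {
    "on": "push",  # the event that triggers the workflow
    "jobs": {
        "build": {  # the single job
            "runs-on": "ubuntu-latest",
            "strategy": {"matrix": {"node-version": ["18.x"]}},
            "steps": [
                {"uses": "actions/checkout@v4"},      # check out the code
                {"uses": "actions/setup-dotnet@v4"},  # set up dotnet
                {"run": "dotnet build"},              # build the code
            ],
        }
    },
}

def shell_commands(wf):
    """Collect every `run` command, job by job, in step order."""
    return [
        step["run"]
        for job in wf["jobs"].values()
        for step in job["steps"]
        if "run" in step
    ]

commands = shell_commands(workflow)
```

Steps with `uses` call a reusable action, while steps with `run` execute a shell command on the runner, which is why only the last step shows up in `commands`.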
Next, Azure Pipelines is a robust service for creating cross-platform CI/CD workflows, integrating with your preferred Git provider and deploying to major cloud services. Key features
- End-to-end flow of value: Focus on delivering continuous value from concept to customer, avoiding siloed processes
- Automation and reliability: Establish a repeatable, reliable process for software delivery
- Quality verification: Each pipeline stage verifies feature quality to prevent user-facing errors
- Team feedback and visibility: Ensure all team members have feedback and visibility into the delivery process
- Frequent small changes: Promote frequent, smaller changes for better flow and optimization
- Continuous improvement: Continuously monitor and resolve obstacles to improve pipeline efficiency
Pipeline stages:
- Build automation and Continuous Integration: Automatic building and integrating of changes
- Test automation: Automated testing for functionality and quality
- Deployment automation: Automated deployment to various environments
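The build, test, deploy order can be sketched as a tiny fail-fast runner in Python. This is only a toy model of the idea that a later stage should not run when an earlier one fails; the function and stage names are hypothetical and have nothing to do with the actual Azure Pipelines API.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order and stop at the first failure.

    Returns the names of the stages that completed and the name of the
    failing stage, or None if every stage passed.
    """
    completed = []
    for name, stage in stages:
        if not stage():
            return completed, name
        completed.append(name)
    return completed, None

# Each stage is a callable returning True on success (simulated here).
stages = [
    ("build", lambda: True),
    ("test", lambda: False),  # simulate a failing test stage
    ("deploy", lambda: True),
]
completed, failed = run_pipeline(stages)
```

Here the deploy stage never runs because the test stage failed, which is the whole point of putting automated tests between build and deployment.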
Orchestration:
- Release and pipeline orchestration: Manage and control the entire pipeline, providing top-level insights
- Value stream mapping: Identify and improve inefficiencies
- Infrastructure efficiency: Efficient infrastructure is crucial for an effective pipeline
Key terms in Azure pipelines
- Agent: software that runs a build or deployment job
- Artifact: collection of files or packages published by a build, used for distribution or deployment
- Build: one execution of a pipeline, collecting logs and test results
- Continuous Delivery (CD): process of building, testing, and deploying code to various stages, ensuring quality through multiple stages and constant monitoring
- Continuous Integration (CI): practice of automating the testing and building of code to catch issues early. CI systems produce artifacts used in CD pipelines for automatic deployments
- Deployment target: destination for application deployment, such as a virtual machine, container, or web app
- Job: a set of steps run on an agent within a build, representing an execution boundary
- Pipeline: script defining the CI/CD process, composed of tasks for testing, building, and deploying
- Release: execution of a release pipeline, including deployments to multiple stages
- Stage: primary divisions in a pipeline, such as 'build', 'test', and 'deploy'
- Task: building block of a pipeline, performing specific jobs like building, testing, or deploying
- Trigger: configuration that specifies when the pipeline should run, such as on code push or schedule
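To keep the terms straight, it helped me to think of them as a nested structure: a pipeline has a trigger and stages, a stage has jobs, and a job has tasks. Below is a simplified sketch with Python dataclasses. The class and field names are my own and only model the vocabulary; real pipelines are defined in YAML and have many more options.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str  # building block, e.g. one build or test step

@dataclass
class Job:
    name: str
    tasks: List[Task] = field(default_factory=list)  # run together on one agent

@dataclass
class Stage:
    name: str
    jobs: List[Job] = field(default_factory=list)

@dataclass
class Pipeline:
    trigger_branches: List[str]  # simplified trigger: pushes to these branches
    stages: List[Stage] = field(default_factory=list)

    def should_run(self, pushed_branch: str) -> bool:
        """Return True if a push to this branch triggers the pipeline."""
        return pushed_branch in self.trigger_branches

    def task_names(self) -> List[str]:
        """Flatten stage -> job -> task into an ordered list of task names."""
        return [t.name for s in self.stages for j in s.jobs for t in j.tasks]

# Hypothetical pipeline; all names are made up for illustration.
pipeline = Pipeline(
    trigger_branches=["main"],
    stages=[
        Stage("build", [Job("compile", [Task("restore"), Task("build")])]),
        Stage("test", [Job("unit-tests", [Task("run-tests")])]),
        Stage("deploy", [Job("release", [Task("publish-artifact")])]),
    ],
)
```

For example, `pipeline.should_run("main")` is true while a push to any other branch is ignored, and `pipeline.task_names()` lists the tasks in the order the stages would reach them.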
That is all for today!
See you tomorrow :)