Welcome to AWS MLOps Python package’s documentation!

This package contains the classes for managing your MLOps solution on AWS services.

Getting started

The AWS MLOps package is implemented as a framework to help you deploy what is necessary to manage your models.

The goal of this package is to let you keep your focus on your prep and test code, using AWS SageMaker, AWS Step Functions and AWS Lambda.

It is part of the educational repositories for learning how to write standard code and common uses of TDD and CI/CD.

Prerequisites

You can use the Serverless framework for deploying the AWS services: if you want to follow the guide below, you have to install npm (Node Package Manager) first.

If you want to use another AWS tool, you can look at the repository aws-tool-comparison before implementing your version.

Installation

The package is not self-contained, so you have to download it from GitHub and install the requirements before deploying the example on AWS:

git clone https://github.com/bilardi/aws-mlops
cd aws-mlops/
npm install
export AWS_PROFILE=your-account
export STAGE=studio
bash example/deploy.sh

Or, if you want to use this package in your code, you can install it with python3-pip:

pip3 install aws_mlops
python3
>>> import aws_mlops
>>> help(aws_mlops)

Read the documentation on readthedocs for

  • Usage

  • Development

Change Log

See CHANGELOG.md for details.

License

This package is released under the MIT license. See LICENSE for details.

Usage

AWS Step Functions manages your ML cycle of data processing, modeling and testing, or of prediction with your best model. You can find all the Python scripts that you have to prepare for your MLOps solution in the folder example:

  • config.py, it contains everything you want to configure for using the aws_mlops library

  • definitions.py, it contains everything you want to configure for creating your step functions

  • processing.py, you can probably copy it as is, without modifying it

  • Dockerfile, it contains what you need for running the script named processing.py

  • prep_with_pandas.py, it contains your code for data processing and it is loaded by processing.py (a minimal sketch follows this list)

  • test_with_pandas.py, it contains your code for testing and it is also loaded by processing.py
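
For example, a minimal data preparation script could look like the sketch below; the column name and the prepare entry point are only illustrative assumptions, the real interface expected by processing.py is the one in the example folder.

# prep_with_pandas.py - hypothetical sketch of a data preparation step
import pandas as pd

def prepare(raw_csv_path, output_csv_path):
    """Read the raw data, clean it and save the result for the next step."""
    df = pd.read_csv(raw_csv_path)
    df = df.dropna()                            # drop incomplete rows
    df["amount"] = df["amount"].astype(float)   # hypothetical column cast
    df.to_csv(output_csv_path, index=False)     # the processing step uploads the output to S3

if __name__ == '__main__':
    prepare('raw.csv', 'prepared.csv')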

You can also find the ipynb files that are useful for preparing your Python scripts:

  • prep_with_pandas.ipynb for prep_with_pandas.py

  • test_with_pandas.ipynb for test_with_pandas.py

And there are some bash scripts for creating your CI/CD system:

  • test.sh for testing the Python library and the step functions deployment

  • test_docker.sh for testing the scripts you can call from processing.py

  • build_image.sh for building your image and saving it on AWS ECR

  • deploy.sh for deploying the step functions and lambda of your infrastructure

Example

You need an infrastructure with a process for

  • preparing the raw data for your training with AWS SageMaker Autopilot

  • running Autopilot

  • inference with your best model and your test data

  • testing and saving the prediction data, metrics and attributes used

When you know the initial hyperparameters that you can use, you can set up config.py (a hypothetical sketch follows these lists) for

  • preparing the raw data for your training

  • training and tuning your model

  • inference with your best model and your test data

  • testing and saving the prediction data, metrics and attributes used

And the last two steps also have to be usable on their own for

  • inference with your best model and your new data

  • saving the prediction data, metrics and attributes used
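
As a sketch only, assuming placeholder names, a config.py of this kind could collect those settings in one place; the real keys are the ones used by the scripts in the example folder.

# config.py - hypothetical sketch, every name below is a placeholder
BUCKET = 'my-mlops-bucket'            # hypothetical S3 bucket for raw, prepared and prediction data
TARGET_COLUMN = 'label'               # hypothetical name of the column to predict
INITIAL_HYPERPARAMETERS = {           # hypothetical starting point for training and tuning
    'max_depth': 5,
    'eta': 0.2,
    'num_round': 100,
}
INSTANCE_TYPE = 'ml.m5.xlarge'        # hypothetical instance type for processing and training jobs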

When you have prepared the Python scripts listed above, you have to

  • commit your changes and push them to your repo

  • proceed with the deployment commands described in the Development section, paragraph Deploy on AWS

When you have deployed the infrastructure, you can use example/mlops.ipynb to call the whole cycle or only a specific piece.
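
The notebook is the reference, but as a sketch, assuming a hypothetical state machine name and input, a single execution can also be started with boto3:

# start_execution.py - hypothetical sketch for starting the state machine with boto3
import json
import boto3

sfn = boto3.client('stepfunctions')
response = sfn.start_execution(
    stateMachineArn='arn:aws:states:eu-west-1:123456789012:stateMachine:mlops-studio',  # hypothetical ARN
    input=json.dumps({'commit': 'abc1234', 'stage': 'studio'})                          # hypothetical input
)
print(response['executionArn'])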

The secret is to version everything: the data, the code and the model that you use for defining that prediction. It is important to version any change for the analysis step.

If you need to improve your configuration or your scripts, the best way is the following:

  • commit any change to the Python scripts listed above, so the S3 key will be different for each commit

  • if you change only the raw/new data, the S3 key will be different by datetime, but you can also fix it for your testing

  • if you have to test your change, deploy an infrastructure for your branch, so the S3 key will be different from production

Development

There can be many development environments: you can organize a CI/CD system with your favorite software. The primary feature of your CI/CD is having a complete environment for

  • development, one for each developer, for implementing features and running unit tests

  • staging, for running unit and integration tests, to check everything before a release

  • production

If you want to use AWS CDK and AWS CodePipeline, you can look at these repositories before implementing your version.

When you add data management to your CD cycle, you have to add data versioning:

  • the system implemented in the folder named example provides an S3 key built from branch, environment, commit and datetime (a sketch follows below)

  • so you can have a complete environment for each combination of them

This is why it is important to commit any change for the analysis step.
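
A minimal sketch of how such a key could be composed, assuming a hypothetical order of the parts, is the following; the real key is built by the code in the example folder.

# s3_key.py - hypothetical sketch of a versioned S3 key
from datetime import datetime, timezone

def build_key(branch, environment, commit, filename):
    """Compose an S3 key so that every combination of branch, environment, commit and datetime is kept apart."""
    stamp = datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')
    return f'{branch}/{environment}/{commit}/{stamp}/{filename}'

print(build_key('main', 'studio', 'abc1234', 'prepared.csv'))
# main/studio/abc1234/20240101120000/prepared.csv (example output, the timestamp will differ)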

Run tests

cd aws-mlops/
npm install
pip3 install --upgrade -r example/requirements.txt
python3 -m unittest discover -v
# even with functional and infrastructure tests
export AWS_PROFILE=your-account
bash example/test.sh
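
As a reference for the kind of test that unittest discover picks up, the minimal sketch below assumes a hypothetical tests folder and a pandas cleaning rule like the one in the prep sketch above:

# tests/test_prep.py - hypothetical sketch of a unit test found by python3 -m unittest discover
import unittest
import pandas as pd

class TestPrep(unittest.TestCase):
    def test_dropna_removes_incomplete_rows(self):
        df = pd.DataFrame({'amount': [1.0, None, 3.0]})
        self.assertEqual(len(df.dropna()), 2)

if __name__ == '__main__':
    unittest.main()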

Improve your Python scripts for processing with Jupyter

cd aws-mlops/
docker run --rm -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -e AWS_PROFILE=your-account -v $HOME/.aws/credentials:/home/jovyan/.aws/credentials:ro -v "$PWD":/home/jovyan/ jupyter/datascience-notebook

You can find two ipynb files in the folder named example: they can help you to improve your code for the processing steps.

Test your python scripts

If you have never pushed an image to your repository, run the commands of the Deploy on AWS paragraph before running the docker script.

cd aws-mlops/
export AWS_PROFILE=your-account
export STAGE=development
bash example/test_docker.sh # and with bash example/test.sh for all

Deploy on AWS

cd aws-mlops/
export AWS_PROFILE=your-account
export STAGE=development
bash example/deploy.sh

Remove on AWS

The stack has the tags necessary for being deleted automatically if you use aws-saving. Otherwise you can run the commands below to remove with Serverless only the environment that you want to delete:

cd aws-mlops/
export AWS_PROFILE=your-account
SLS_DEBUG=* sls remove --stage development
