Usage

AWS Step Functions manages your ML cycle: data processing, modeling, and testing or prediction with your best model. All the Python scripts you have to prepare for your MLOps solution are in the example folder:

  • config.py contains everything you want to configure for using the aws_mlops library

  • definitions.py contains everything you want to configure for creating your step functions

  • processing.py can usually be copied as-is; it should not need any modification

  • Dockerfile contains what you need to run processing.py

  • prep_with_pandas.py contains your data processing code and is loaded by processing.py (see the sketch after this list)

  • test_with_pandas.py contains your testing code and is also loaded by processing.py
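
As a hint of what prep_with_pandas.py might look like, here is a minimal sketch assuming processing.py imports the module and calls a transform function on a pandas DataFrame; the function name and the columns are illustrative, not an interface required by aws_mlops:

    import pandas as pd

    def transform(df: pd.DataFrame) -> pd.DataFrame:
        """Illustrative preparation step: clean and encode the raw data."""
        df = df.dropna(subset=['target'])                   # drop rows without a label
        df['created_at'] = pd.to_datetime(df['created_at'])
        df['day_of_week'] = df['created_at'].dt.dayofweek   # simple feature engineering
        return df.drop(columns=['created_at'])

    if __name__ == '__main__':
        # quick local check, independent of processing.py
        transform(pd.read_csv('raw.csv')).to_csv('prepared.csv', index=False)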

You can also find the ipynb notebooks that are useful for preparing your Python scripts:

  • prep_with_pandas.ipynb for prep_with_pandas.py

  • test_with_pandas.ipynb for test_with_pandas.py

There are also some bash scripts for creating your CI/CD system:

  • test.sh for testing the Python library and the step functions deployment

  • test_docker.sh for testing the scripts you can call from processing.py

  • build_image.sh for building your Docker image and pushing it to AWS ECR

  • deploy.sh for deploying the step functions and Lambda functions of your infrastructure

Example

Suppose you need an infrastructure with a process for

  • preparing the raw data for training with AWS SageMaker Autopilot

  • running Autopilot (a launch sketch with boto3 follows this list)

  • running inference with your best model on your test data

  • testing and saving the prediction data, metrics and attributes used
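
As a rough sketch of the Autopilot step only, such a job could be launched with boto3 as below; in the deployed solution it is driven by the step functions definition, and the job name, bucket, target column and IAM role are placeholders:

    import boto3

    sagemaker = boto3.client('sagemaker')

    # Placeholders: use your own bucket, target column and IAM role
    sagemaker.create_auto_ml_job(
        AutoMLJobName='my-autopilot-job',
        InputDataConfig=[{
            'DataSource': {'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://my-bucket/prepared/train.csv'}},
            'TargetAttributeName': 'target',
        }],
        OutputDataConfig={'S3OutputPath': 's3://my-bucket/autopilot-output/'},
        RoleArn='arn:aws:iam::123456789012:role/my-sagemaker-role',
    )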

Once you know the initial hyperparameters you can use, you can set up config.py (a sketch follows this list) and run a process for

  • preparing the raw data for your training

  • training and tuning your model

  • running inference with your best model on your test data

  • testing and saving the prediction data, metrics and attributes used
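
Purely as an illustration, the initial hyperparameters could be declared in config.py like this; the names below are assumptions for the sketch, not the schema required by aws_mlops, so check the example folder for the actual keys:

    # Hypothetical fragment of config.py with XGBoost-style hyperparameters
    hyperparameters = {
        'max_depth': 6,
        'eta': 0.2,
        'objective': 'binary:logistic',
        'num_round': 200,
    }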

The last two steps must also be usable for

  • running inference with your best model on your new data (a batch transform sketch follows this list)

  • saving the prediction data, metrics and attributes used
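
One possible shape of that inference step is a SageMaker batch transform job, sketched here with boto3; the model name, bucket and instance type are placeholders, and in the deployed infrastructure the call is made by the step functions rather than by hand:

    import boto3

    sagemaker = boto3.client('sagemaker')

    # Placeholders: the model built from your best candidate, the prefix
    # of your new data and the output location for the predictions
    sagemaker.create_transform_job(
        TransformJobName='my-batch-inference',
        ModelName='my-best-model',
        TransformInput={
            'DataSource': {'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://my-bucket/new-data/'}},
            'ContentType': 'text/csv',
        },
        TransformOutput={'S3OutputPath': 's3://my-bucket/predictions/'},
        TransformResources={'InstanceType': 'ml.m5.large', 'InstanceCount': 1},
    )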

Once you have prepared the Python scripts listed above, you have to

  • commit your changes and push them to your repo

  • follow the deployment commands described in the Development section, paragraph Deploy on AWS

Once the infrastructure is deployed, you can use example/mlops.ipynb to run the whole cycle or only a specific piece.
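
The notebook essentially starts executions of the deployed state machine; a minimal equivalent with plain boto3 could look like this, where the state machine ARN is a placeholder and the input document is only an example, since how a specific piece is selected depends on your definitions.py:

    import json
    import boto3

    stepfunctions = boto3.client('stepfunctions')

    # Placeholder ARN; the shape of the input depends on your definitions.py
    response = stepfunctions.start_execution(
        stateMachineArn='arn:aws:states:eu-west-1:123456789012:stateMachine:mlops',
        input=json.dumps({'step': 'processing'}),
    )
    print(response['executionArn'])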

The key is to version everything: the data, the code and the model you use to produce a prediction. Versioning every change is important for the analysis step.

If you need to improve your configuration or your scripts, the best approach is the following (an S3 key sketch comes after this list)

  • commit any change to the Python scripts listed above, so that the S3 key differs by commit

  • if you change only the raw/new data, the S3 key differs by datetime, but you can also keep it fixed for your testing

  • if you have to test your change, deploy an infrastructure for your branch, so that the S3 key differs from production
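
One way to picture the resulting versioning is an S3 key built from branch, commit and run datetime; the exact layout used by aws_mlops may differ, so treat this prefix format as an assumption:

    from datetime import datetime, timezone

    def build_s3_key(branch: str, commit: str, filename: str) -> str:
        """Illustrative layout: one prefix per branch, commit and run datetime."""
        run = datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')
        return f'{branch}/{commit}/{run}/{filename}'

    # e.g. 'my-feature/3f2a1bc/20240101120000/prepared.csv'
    print(build_s3_key('my-feature', '3f2a1bc', 'prepared.csv'))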