01/22/2019 Data.

fahr is a proof-of-concept Python CLI for training machine learning models remotely. fahr does all of the work of packaging your model training script, shipping it to the cloud (via AWS SageMaker or Kaggle Kernels), and reading the outputs back to local disk for you.

fahr grew out of my experience trying to use AWS SageMaker (prior to its late 2019 SageMaker Studio facelift) and finding it to be an incredibly frustrating developer experience. Curious if I could make something better, I built most of fahr over the course of three coffee-fueled days in January 2019.

You need two files: a requirements.txt file specifying code dependencies (conda.yml is also supported), and a .py file defining the model training code. You call fahr fit to launch the training job, then fahr fetch to get the model outputs.
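To make the second file concrete, here is a minimal sketch of what a training script could look like. Everything in it is an assumption for illustration: the names fit_line and train, the toy data, and the choice to write model.json into the working directory are hypothetical, and fahr's actual conventions for where outputs should be written may differ. It uses only the standard library, so the matching requirements.txt would be empty.

```python
import json
from pathlib import Path


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def train(output_dir="."):
    """Fit the toy model and save its parameters as a JSON artifact."""
    # Hypothetical toy data; a real script would load a dataset here.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    slope, intercept = fit_line(xs, ys)
    # Assumption: the training output is a file written next to the script.
    out_path = Path(output_dir) / "model.json"
    out_path.write_text(json.dumps({"slope": slope, "intercept": intercept}))
    return out_path


if __name__ == "__main__":
    train()
```

The point is only that the script is ordinary Python that runs top to bottom and leaves a model artifact on disk; nothing in it needs to know about SageMaker or Docker.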

Under the hood, fahr builds the code and model files into a Docker image locally (using the Docker Python SDK), pushes that image to an AWS ECR registry in the cloud (using the AWS Python SDK), and then orchestrates a custom SageMaker training run that uses this image as its container. Training outputs are downloaded from an S3 bucket that acts as the data sink.
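The last step of that pipeline can be sketched in a few lines. This is not fahr's actual code: the helper names ecr_image_uri and training_job_request, the default instance type, the volume size, and the runtime limit are all assumptions made for illustration. The sketch only assembles the request body that SageMaker's CreateTrainingJob API expects, so it runs without AWS credentials.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the URI of an image in a private ECR registry."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def training_job_request(job_name, image_uri, role_arn, output_s3_uri,
                         instance_type="ml.m5.large",
                         max_runtime_seconds=3600):
    """Assemble a CreateTrainingJob request for a custom container.

    Hypothetical helper: the defaults are illustrative, not fahr's.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # The custom image pushed to ECR in the previous step.
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # SageMaker uploads the job's output under this S3 prefix,
        # which is the "data sink" the outputs are later fetched from.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


# Placeholder account, role, and bucket values.
image = ecr_image_uri("123456789012", "us-east-1", "my-model", "v1")
request = training_job_request(
    job_name="my-model-run-1",
    image_uri=image,
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    output_s3_uri="s3://example-bucket/output/",
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as boto3.client("sagemaker").create_training_job(**request). Keeping the request construction in a pure function makes it easy to unit test without touching the cloud.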

This was a satisfying project to work on because it was an opportunity to write idiomatic Python cloud orchestration code with no deadline pressure and no hacks. Take a look at the code to see the result.

— Aleksey