Data Science Projects: What Comes After Jupyter Notebooks?

Carolina Dias
7 min read · Nov 1, 2021


Making a simple API with Flask and deploying it to Heroku

Image by Emile Perron on Unsplash

Disclaimer: This post is based on the course Machine Learning Zoomcamp by Alexey Grigorev. You can check it out here (I highly recommend it), and you can still sign up for it as of today.

Introduction

When beginning to learn data science, we often start by learning how to use Jupyter Notebooks: how to analyze data in them, how to train models, and so on.

But in a production environment, this is only a small part of the work needed to make a model useful. And to learn the skills needed to deploy a model capable of making predictions “in the real world", we have to start somewhere.

And we can start here, learning how to save our model and deploy it as an API that accepts a POST request and returns a response with the prediction result. Understanding what an API is, what a request is, and what the HTTP verbs are is a great start to our journey. For this I recommend the (very short) free e-book “The Little Book on REST Services".

With that in mind, let’s start our project.

Preliminaries

This assumes you have Python 3 installed (3.8, to be precise). This was produced on a Linux system, so it should work the same way on macOS. Can’t say the same for Windows 😝

We’ll use Pipenv to manage our environment and dependencies. We can install it using:

pip3 install pipenv

Our project will live in the “project-insurance-forecast" directory. To initialize a new environment for this project we use

pipenv install

This will create two files, Pipfile and Pipfile.lock, and the environment will use the default Python version of our system.

To activate the environment

pipenv shell

So now that our environment is set up we can install new packages by using

pipenv install package-name

If you cloned the repository for this project, all libraries and dependencies will be installed when we initialize the environment. A step-by-step guide can be found in the project’s README.md.

The Jupyter Step

For our analysis we’ll use Jupyter Lab and libraries such as pandas, numpy, matplotlib, seaborn and scikit-learn.

Since this post is more focused on the later stages of the project, we’ll go only briefly over what was done in the notebook; the full version can be found here, in the main notebook.

Our data is the Medical Cost Insurance Forecast from Kaggle. This is a simple dataset in which we want to predict the value of the insurance charges for a person, so it is a regression problem.

A sample of the data we will be using

After an exploratory analysis of the data and checking that everything is in order, we begin testing some machine learning models and tuning the ones that seem most promising.

In the end we arrive at a Random Forest Regressor model. Not surprising :D

So what do we do now with our chosen model?

First of all, we need a way to make our model available so we can just plug in the values and receive a prediction. We’ll save it as a .bin file, but other extensions, such as .pkl or .joblib, can be used as well.

After testing and tuning some models with GridSearchCV, we arrived at the following as the best option:

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(bootstrap=True, max_features='auto', min_samples_split=50, n_estimators=20, random_state=42847)

To save it to a file called “model.bin” we do:
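A minimal sketch using pickle, assuming the fitted model object is named model as above:

import pickle

# Serialize the trained model so the API can load it later.
with open('model.bin', 'wb') as f_out:
    pickle.dump(model, f_out)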

After this we’ll have a “model.bin” in our current working directory ready to be used whenever we want.

What now?

We have some options as to what to do now. Locally, we can build a Docker image and run it: this will help to further isolate our coding environment, and we can even publish it to Docker Hub.

Or we can deploy it in an API, using Flask or FastAPI, for example. We’ll do this using Flask for now (in the future we’ll see how to do it in FastAPI).

Finally, we can make a simple app and share it using Streamlit. It is more visually pleasing this way.

For our Flask API, we’ll deploy it on Heroku. We could use Docker for this as well, but we’ll see how we can do it by deploying our .py file directly using git.

First, we need to build our Flask app with our previously saved model. This can be accomplished with the following steps, adapted for our dataset, in a file called predict.py (a sketch combining all three steps follows the list):

1. First, we import the libraries that will be used.

2. Next, we open our previously saved model and initialize a Flask app, which we’ll name “forecast”.

3. Finally, we add the code in the route “/predict”, create a function called “predict” to clean the data that we will receive, and run the app at http://localhost:9696/predict.

The cleaning part is needed to make sure the data going into our model has the same format as the data it was trained on.
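Putting the three steps together, a minimal sketch of predict.py could look like the one below. The feature names and the encoding in clean_data are assumptions based on the dataset’s columns; the real cleaning must mirror whatever preprocessing was done in the training notebook.

import pickle

import pandas as pd
from flask import Flask, request, jsonify

# 1. Load the previously saved model.
with open('model.bin', 'rb') as f_in:
    model = pickle.load(f_in)

# 2. Initialize the Flask app.
app = Flask('forecast')

REGIONS = ['northeast', 'northwest', 'southeast', 'southwest']

def clean_data(client):
    # Arrange the incoming JSON into the feature layout the model expects.
    # This encoding is an illustrative assumption.
    row = {
        'age': client['age'],
        'bmi': client['bmi'],
        'children': client['children'],
        'sex': 1 if client['sex'] == 'male' else 0,
        'smoker': 1 if client['smoker'] == 'yes' else 0,
    }
    for region in REGIONS:
        row[f'region_{region}'] = 1 if client['region'] == region else 0
    return pd.DataFrame([row])

# 3. The "/predict" route: clean the input, run the model, return the result.
@app.route('/predict', methods=['POST'])
def predict():
    client = request.get_json()
    X = clean_data(client)
    charges = float(model.predict(X)[0])
    return jsonify({'charges': charges})

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=9696)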

To run our Flask app we can use Python directly:

python predict.py

Or use it with gunicorn (recommended):

gunicorn --bind 0.0.0.0:9696 predict:app

It should now be running at http://localhost:9696/predict .

To send a POST request to the app, we can make a mini script to test it, namely “make_requests.py”, with the following code:
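A minimal sketch of make_requests.py; the field names and example values below are assumptions, and they just need to match what predict.py expects:

import requests

url = 'http://localhost:9696/predict'

# Example client; change these values to get different predictions.
client = {
    'age': 35,
    'sex': 'female',
    'bmi': 26.5,
    'children': 1,
    'smoker': 'no',
    'region': 'southeast',
}

response = requests.post(url, json=client)
print(response.json())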

If everything is up and running correctly, we’ll receive a response with the value of the predicted charges for this particular example. We can change the example values to whatever we want to get different results.

Example of the expected output

This mini API now only works locally on our machine, but we would also like to deploy it so it can run from anywhere. This is where Heroku comes into play.

Deploying our API using Heroku

As seen in its own description, “Heroku is a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud.”

After making an account at https://signup.heroku.com/, we can go ahead and install the command line interface for Heroku by using:

sudo snap install --classic heroku

This will help us in deploying our model to the cloud. Still in the terminal, type heroku login to give the CLI access to your account.

In our current working directory, let’s create a file called “Procfile”, with no extension. This is needed to configure the Heroku app. In this file we only need to add the following line:

web: gunicorn predict:app

Now, let’s create the app and give it a name. In this example we’ll be using “insurance-forecast” as the name. Running this inside our repository also adds a git remote called “heroku”:

heroku create insurance-forecast

Finally, after committing our changes as usual using git, we can deploy the app by using:

git push heroku main

If everything went as planned, our API is now running at https://insurance-forecast.herokuapp.com/predict. We can now make a POST request to this URL in the same way as above and expect the same results:
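For example, the only change needed in make_requests.py is the URL; a sketch reusing the same example client:

import requests

# Same request as before, now aimed at the deployed Heroku app.
url = 'https://insurance-forecast.herokuapp.com/predict'
client = {
    'age': 35,
    'sex': 'female',
    'bmi': 26.5,
    'children': 1,
    'smoker': 'no',
    'region': 'southeast',
}
print(requests.post(url, json=client).json())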

Now our mini API is deployed to the cloud 😀

You could even have a landing page for your project:
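One way to do this is to add a root route to predict.py that returns a bit of HTML; a minimal sketch (the page content here is just an assumption):

# Added to predict.py, below the line that creates the Flask app.
@app.route('/')
def index():
    # A simple landing page shown when someone visits the root URL.
    return (
        '<h1>Insurance Forecast API</h1>'
        '<p>Send a POST request to /predict to get a prediction.</p>'
    )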

Bonus: Streamlit App

Streamlit is an open-source Python library that helps us build simple web apps for our projects by abstracting away much of the code needed to do so.

Here in our example, we’ll make a web app that asks the user for some information and predicts the price of the insurance based on that.

The beginning of our “streamlit_app.py” is pretty similar to what we have in the Flask app.

The rest of the code uses the Streamlit library to build the parts of our app; most of it is text filling in the app’s information.
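A condensed sketch of what streamlit_app.py might look like; the widget labels, default values, and the cleaning logic (which must match predict.py) are assumptions:

import pickle

import pandas as pd
import streamlit as st

# Load the model, just like in the Flask app.
with open('model.bin', 'rb') as f_in:
    model = pickle.load(f_in)

st.title('Medical Insurance Forecast')
st.write('Fill in the fields below to estimate the insurance charges.')

# Collect user input with Streamlit widgets.
age = st.number_input('Age', min_value=18, max_value=100, value=35)
sex = st.selectbox('Sex', ['female', 'male'])
bmi = st.number_input('BMI', min_value=10.0, max_value=60.0, value=26.5)
children = st.number_input('Children', min_value=0, max_value=10, value=0)
smoker = st.selectbox('Smoker', ['no', 'yes'])
region = st.selectbox('Region', ['northeast', 'northwest', 'southeast', 'southwest'])

if st.button('Predict'):
    # The same cleaning as in predict.py must be applied here.
    row = {
        'age': age,
        'bmi': bmi,
        'children': children,
        'sex': 1 if sex == 'male' else 0,
        'smoker': 1 if smoker == 'yes' else 0,
    }
    for r in ['northeast', 'northwest', 'southeast', 'southwest']:
        row[f'region_{r}'] = 1 if region == r else 0
    charges = float(model.predict(pd.DataFrame([row]))[0])
    st.success(f'Predicted charges: ${charges:,.2f}')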

To run it locally, we can do:

streamlit run streamlit_app.py

And access the given URL.

We can also deploy this to run anywhere. Unfortunately, we cannot deploy it within the same Heroku app, which is a shame. We can create a new Heroku app for it or use Streamlit Sharing, which gives us the option to deploy 3 Streamlit apps for free.

After making an account and waiting for the green light on our access, we can deploy our code by clicking on “New App”. We’ll see the following:

Creating a new app in Streamlit Sharing

Simply paste the URL of the GitHub repository where the .py file is hosted. Tip: copy and paste the URL of the file itself, not just the repo, and all the information will be filled in automatically.

By clicking on “Deploy!” and waiting a little bit, our app will be deployed and accessible to anyone with the URL.

Conclusion

While this is a simple project, it can easily be adapted into a much more complex API, and the same principles still apply. Other cloud options can also be used for the deployment, making this a very flexible way of putting your projects out there!

Full project:

Any tips or suggestions? Feel free to contact me!
