Streamlit your Data Science application into AWS
To see a more advanced, real-world application developed following these guidelines, please consider visiting this repository.
Have you ever seen those fancy Data Science web apps that include interactive plots, Machine Learning inference in real time and custom user inputs? Well, I have, and I always wondered how they were built. I mean, I'm sure a whole team of front-end engineers, data engineers and data scientists could create something like that, but I was always looking for a simpler solution. And then I found Streamlit.
What is Streamlit?
Streamlit is a Python framework that lets you deploy Data Science applications in the form of Python scripts, handling the back end and most of the front-end nuances for you. It's a great tool for Data Scientists who want to share their work with the world but don't want to spend too much time on web development.
It also includes a Cloud service to host your applications, which is great for a personal project or a small team. But what if you want to deploy your application to a bigger audience or you are concerned about privacy and security issues? Well, you can do that too, but you’ll need to use AWS, which we will talk about later.
How does it work?
Streamlit is a Python framework, so you’ll need to install it in your environment. You can do that by running the following command:
pip install streamlit
If we run the following command:
streamlit hello
we will see a demo web application running on our localhost, with zero code written! Sure enough, our custom Data Science applications will not be as easy to create, but the key elements are already there.
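To give an idea of how little code a basic app requires, a minimal script of our own (app.py is just an example file name) could look like this:
# app.py - a minimal Streamlit script
import streamlit as st

st.title("Hello, Streamlit!")
name = st.text_input("What is your name?")
if name:
    st.write(f"Nice to meet you, {name}!")
Running it with streamlit run app.py serves it on localhost just like the demo.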
Creating a Streamlit application
There are several commands we can use to build a Streamlit application, the most important of which are explained in the Cheat Sheet. As we progress building our app, I'll leave some comments in the code to explain what is going on:
import pandas as pd
import streamlit as st
import plotly.express as px
# Sets the browser tab title and a wide page layout
st.set_page_config(page_title="My dummy Streamlit App", layout="wide")
# Creates an h1 title
st.title("My first Streamlit application")
# Creates a DataFrame
df = pd.DataFrame({
'Seller': ["A", "B", "C", "D"],
'Sales': [5, 6, 7, 8]
})
# Creates a sidebar with a multiselect widget to filter by seller
sellers_to_filter = st.sidebar.multiselect(
    "Select sellers to filter by",
    options=list(df["Seller"].unique()),
    default=list(df["Seller"].unique()),
)
filtered_df = df[df["Seller"].isin(sellers_to_filter)]
# Creates 2 columns for plots
col1, col2 = st.columns(2)
# Creates a bar chart
col1.plotly_chart(px.bar(filtered_df, x="Seller", y="Sales"))
# Creates a pie chart
col2.plotly_chart(px.pie(filtered_df, values="Sales", names="Seller").update_layout(showlegend=False))
# Displays the DataFrame as a table
st.table(filtered_df)
While we iterate on our soon-to-be-perfect web app, we can use the following command to see the changes in real time:
streamlit run app.py
Containerizing our application
Now that our app is complete and we are happy with the results of this first iteration, we can start to think about how to deploy it. As we mentioned before, we could use Streamlit Cloud to host our application, but we will use AWS to have more control over our application and to be able to scale it in the future.
Although we could deploy our application as a script directly on an EC2 instance, it's advisable to containerize it. This way, we can easily deploy it in any environment, scale up to Kubernetes if needed, or collaborate more easily with other teams.
To containerize our application, we will use Docker. We will install Docker and create a Dockerfile
with the following content:
FROM python:3.9
EXPOSE 8501
WORKDIR /app
RUN apt-get update && apt-get install -y \
build-essential \
software-properties-common \
git
COPY ./requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip && pip3 install -r requirements.txt
# Copy the application code into the image
COPY . /app
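The Dockerfile above installs our dependencies from a requirements.txt file, which we haven't shown yet. For our example app it could be as simple as the following (pin the versions you actually use):
streamlit
pandas
plotly
boto3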
And we will also create a docker-compose.yml
file with the following content to manage our image and prepare it to include further services if needed:
version: "3.9"
services:
my-app:
build:
context: .
dockerfile: ./Dockerfile
volumes:
- $HOME/.aws/config:/root/.aws/config
- $HOME/.aws/credentials:/root/.aws/credentials
command: streamlit run app.py --server.port=8501 --server.address=0.0.0.0
restart: always
ports:
- 8501:8501
We can test our application locally using the following command from the root of our project:
docker-compose up
This should expose port 8501 on our localhost, where we can see our application running.
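If you prefer the terminal, a quick check like the following should show Streamlit answering on that port:
curl -I http://localhost:8501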
Deploying our application
AWS Configuration
In order to deploy our application on AWS, we first need an AWS account. If you don't have one, you can create one here. Please consider that we will only be using the free tier offered by Amazon (more info) to avoid extra costs with our examples. To create an alert in case you surpass the free tier usage, go to Account Settings > Budgets > Create Budget and select the Zero spend budget template:
Then we can create a Group and an IAM user following the instructions here, downloading the access key and the secret key. We will need these credentials to connect to our AWS account programmatically.
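Once you have both keys, you can store them locally by running aws configure, which interactively asks for the access key, the secret key, a default region and an output format, and writes them to ~/.aws/config and ~/.aws/credentials (the same files that our docker-compose.yml mounts into the container later on). The values below are illustrative:
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: eu-west-1
# Default output format [None]: json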
Once you have your account, you can create an EC2 instance. We recommend using t2.micro with Amazon Linux for testing due to the low cost, but you can use a bigger instance if you want to scale up your application.
In the Security Group configuration, we need to expose the port that our application is going to use, which in this case is 8501. We also need to add a rule allowing SSH access, so we can connect to the instance later. At this step, it's also important to create a key pair, which we will use to authenticate when connecting to the instance via SSH.
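If you prefer scripting this step, the same inbound rules can be added with the AWS CLI along these lines (the security group ID and CIDR values are placeholders; ideally restrict SSH to your own IP):
# Allow SSH access
aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 22 --cidr <your-ip>/32
# Allow traffic to Streamlit's default port
aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 8501 --cidr 0.0.0.0/0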
Once our EC2 instance is configured, go ahead and start it from the AWS console so we can properly use it.
Connecting to the EC2 instance
Now that our EC2 instance is running, we can connect to it using SSH. We can do that by running the following command:
ssh -i <path-to-key-pair> ec2-user@<public-ip>
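If SSH complains that the key file permissions are too open, restrict them first:
chmod 400 <path-to-key-pair>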
Installing Docker in EC2
Once we are connected to our EC2 instance, we need to install Docker. We can do that by running the following commands:
sudo yum update -y
sudo amazon-linux-extras install docker
sudo service docker start
sudo usermod -a -G docker ec2-user
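Since we will be using Docker Compose and it is not bundled with Amazon Linux, we also need to install it on the instance. A common approach is to download the standalone binary from the official releases page (the version below is only an example; check for the latest one). Also remember to log out and back in so that the docker group change takes effect:
sudo curl -L "https://github.com/docker/compose/releases/download/v2.24.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version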
And now we just need to download the code from our repository and build the image:
git clone <git-repository-url>
cd <project-name>
docker-compose up
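To keep the app running after we close the SSH session, it is usually worth starting it in detached mode instead:
docker-compose up -d --build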
Adding security to our application
Streamlit can be integrated with SSO providers such as Okta, which is the recommended option for production environments. However, it also offers different alternatives for use cases without SSO.
The first alternative is to use the .streamlit/secrets.toml file to store our credentials and later check the contents of this file in our application. An example of this integration, following the code in the Streamlit guidelines, could be the following:
# .streamlit/secrets.toml
[passwords]
# Follow the rule: username = "password"
alice_foo = "streamlit123"
bob_bar = "mycrazypw"
# app.py
import streamlit as st

def check_password():
    """Returns `True` if the user entered a correct password."""

    def password_entered():
        """Checks whether a password entered by the user is correct."""
        if (
            st.session_state["username"] in st.secrets["passwords"]
            and st.session_state["password"]
            == st.secrets["passwords"][st.session_state["username"]]
        ):
            st.session_state["password_correct"] = True
            del st.session_state["password"]  # don't store username + password
            del st.session_state["username"]
        else:
            st.session_state["password_correct"] = False

    if "password_correct" not in st.session_state:
        # First run, show inputs for username + password.
        st.text_input("Username", on_change=password_entered, key="username")
        st.text_input(
            "Password", type="password", on_change=password_entered, key="password"
        )
        return False
    elif not st.session_state["password_correct"]:
        # Password not correct, show inputs + error.
        st.text_input("Username", on_change=password_entered, key="username")
        st.text_input(
            "Password", type="password", on_change=password_entered, key="password"
        )
        st.error("😕 User not known or password incorrect")
        return False
    else:
        # Password correct.
        return True

if check_password():
    # Here comes the main content of our application
    st.write("Welcome to the app!")
Even though we can use this approach, it’s not recommended to store our credentials in plain text. We can use AWS Secrets Manager to store our credentials and retrieve them in our application. This way, we can avoid storing our credentials in plain text and we can also use the same credentials for different applications. This approach is a little bit more complex, but I find it to be much more secure.
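For reference, a secret mirroring the username/password structure from our secrets.toml example can be registered with the AWS CLI along these lines (the secret name, region and values are placeholders):
aws secretsmanager create-secret \
    --name my-streamlit-app/passwords \
    --region <region-name> \
    --secret-string '{"alice_foo": "streamlit123", "bob_bar": "mycrazypw"}'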
After registering our secret in AWS Secrets Manager, we can retrieve and store it in our application session state by using code like the following:
import ast

import boto3
import botocore.exceptions
import streamlit as st

def state_password_dict(secret_name: str, region_name: str):
    """Fetches the secret and stores it in the Streamlit session state."""
    session = boto3.session.Session()
    client = session.client(service_name="secretsmanager", region_name=region_name)
    try:
        get_secret_value_response = client.get_secret_value(SecretId=secret_name)
        secret_string = get_secret_value_response["SecretString"]
        secret_dict = ast.literal_eval(secret_string)
        st.session_state["secret_dict"] = secret_dict
    except botocore.exceptions.ClientError:
        # If the secret is not found, fall back to an empty dict
        st.session_state["secret_dict"] = {}
Later on, instead of checking the credentials given by the user against those in st.secrets, we can check them against the credentials stored in our session state via st.session_state["secret_dict"].
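For instance, assuming state_password_dict has already been called with your secret's name and region, the inner password_entered function from the previous example could be adapted roughly like this:
def password_entered():
    """Checks the entered credentials against the secret stored in session state."""
    passwords = st.session_state.get("secret_dict", {})
    if (
        st.session_state["username"] in passwords
        and st.session_state["password"] == passwords[st.session_state["username"]]
    ):
        st.session_state["password_correct"] = True
        del st.session_state["password"]  # don't keep credentials around
        del st.session_state["username"]
    else:
        st.session_state["password_correct"] = False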
Consider that in order to use this code, the AWS config and credentials files need to be present on our EC2 instance and made available to our Docker container. You can find more information about this in the AWS documentation. Go ahead and check our docker-compose.yml file to see how we did it.
Final remarks
Even though this guide was more focused on AWS, I hope it has illustrated how easily an ML application can be deployed in the cloud. No matter which service or framework you use, at the end of the day the core concepts, such as containerization and deployment, are very similar regardless of our vendor choices.
It's true that Streamlit provides us with a friendly framework to create our applications, but it's also true that we need to be aware of the security and privacy issues we face when deploying our code in the cloud. Some DevOps knowledge always comes in handy when dealing with these issues, along with a good understanding of the cloud services we are using.
I hope this guide has helped you understand how to deploy your ML application in the cloud, and I hope to see you soon in the next post!