Creating a Kubernetes Cluster via Terraform

Infrastructure should be reproducible, just like every other component in the software world. In traditional infrastructure and platform engineering, most configuration is done manually by following guidelines and documentation. Every change to the platform then needs to be documented in order to keep track of the current state, and it is a common pitfall for engineers to forget to update that documentation. I love documentation, but keeping history is much easier with a version control system such as git! Defining your infrastructure as code makes things far more transparent: the knowledge is not only in somebody's head but also in a place where your whole team can rerun everything when things are down. Then I met Terraform.

In this blog post we will be creating a Kubernetes Cluster on Google Cloud Platform using Terraform.

Terraform is a tool to “Use Infrastructure as Code to provision and manage any cloud, infrastructure, or service”. It is a life-saving tool that lets you manage most of your services. I love it, and I hope you will like it too!

Okay, let's get our hands dirty. First things first: open your favorite IDE. Mine is PyCharm, but I am going to use Visual Studio Code for this tutorial. I am creating a new folder called “terraform-kubernetes” and opening it in VS Code. I will also install a Terraform plugin in VS Code that provides linting and formatting, which is very helpful.

Next Step: Install Terraform

If you use a Mac, install it via brew:

brew install terraform

I have an Ubuntu machine, so I installed it following the steps below:

  1. Install unzip
    sudo apt-get install unzip
  2. Choose the version that you want to install. You can find the change log in its GitHub repo.
  3. At this point in time I downloaded the latest version from Terraform Downloads.
  4. Unzip it:
    unzip terraform_0.12.24_linux_amd64.zip
  5. Install it:
    sudo install terraform /usr/local/bin/
  6. Let’s see if it is working
    terraform

Woohoo! You just installed Terraform. We are ready to go!

The next step is creating a Google Cloud project. We will need the ID of that project.

For this tutorial we are going to use Kubernetes on GCP. Go ahead and enable Kubernetes API for your project!

You can enable it from the following link; make sure you replace the project_id with yours:

https://console.cloud.google.com/apis/api/container.googleapis.com/overview?project=project_id

For example this is the URL for my project:

https://console.cloud.google.com/apis/api/container.googleapis.com/overview?project=terraform-selin

Now we are ready to start writing our first Terraform configuration. Let’s go back to Visual Studio Code and create a file called main.tf.

We need to tell Terraform which provider we are going to use. A provider understands the API interactions for a given platform. In this case we are going to use Google Cloud Platform, so the provider name will be google. This main.tf file is where we connect our Terraform project to the Google Cloud project.

The file should look something like this:

provider "google" {
    project = <project>
    region = <region>
    zone = <zone>
}

We could pass the values directly in this file, but I don’t like that so much. Variables label values with descriptive names, which makes the code much easier to read. Let’s create a file called variables.tf in the root directory of the project.

variables.tf

variable "project_id" {
  description = "My Google Cloud project id"
  default     = "terraform-selin"
}

variable "region" {
  description = "Europe West region (NL)"
  default     = "europe-west4"
}

variable "zone" {
  description = "Europe West zone (NL)"
  default     = "europe-west4-a"
}

Update the default value of the “project_id” variable with your own project id. You can find the project id on your Google Cloud console dashboard.

I am going to go ahead and update the values on my main.tf file with reading from variables.tf file.

main.tf


provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}

We want to install Kubernetes (Google Kubernetes Engine), so I will create a separate directory for it to keep the repository clean and readable. Let’s create a folder called kubernetes:

mkdir kubernetes

Under this directory I’ll describe what kind of Kubernetes cluster I want to create, using the google_container_cluster resource. Let’s create two files, main-k8s.tf and variables.tf, under the kubernetes folder.

Your folder structure should look like this:

Folder structure

In the variables I’ll define the machine type and the region for the cluster. I am using “n1-standard-1”, which is one of the smallest machine types available at this point in time.

kubernetes/variables.tf

variable "region" {
  description = "Europe West region (NL)"
  default     = "europe-west4"
}

variable "machine-type" {
  description = "The default type of machine"
  default     = "n1-standard-1"
}

Now we will write down the configuration for the google_container_cluster resource. Give your Kubernetes cluster a unique name; location holds the region information. The google_container_node_pool resource manages a node pool within the GKE cluster.

kubernetes/main-k8s.tf

resource "google_container_cluster" "my_gke_cluster" {
  name     = "selin-k8s-terraform"
  location = var.region

  # We manage our own node pool below, so remove the default one.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "preemptible_nodes" {
 name       = "selin-node-pool"
 location   = var.region
 cluster    = google_container_cluster.my_gke_cluster.name
 node_count = 1

 node_config {
   preemptible  = true
   machine_type = var.machine-type

   metadata = {
     disable-legacy-endpoints = "true"
   }

   oauth_scopes = [
     "https://www.googleapis.com/auth/logging.write",
     "https://www.googleapis.com/auth/monitoring",
   ]
 }
}

We need to log in to Google Cloud in order to communicate with the project. I am going to use my own credentials. Execute the command below:

gcloud auth application-default login

We need to initialize a Terraform working directory. Execute the command:

terraform init

It is time to execute my favorite Terraform command: terraform plan! This command creates an execution plan, so we can review all the changes that will take place before applying them. We can also save a plan to a file with the -out parameter.

terraform plan -out kubernetes-plan

What message did you see? Did it say “No changes. Infrastructure is up-to-date.”? That is because Terraform only reads the .tf files in the working directory, and our main.tf contains just the provider configuration. We need to add our Kubernetes module here.

So, I am adding the piece of code below to my main.tf file

module "kubernetes_cluster" {
  source = "./kubernetes"
}

We execute the init command again so that Terraform installs the modules required by this configuration:

terraform init

Make sure all the configurations are valid using the command below:

terraform validate

Let’s execute our plan again

terraform plan -out kubernetes-plan

We should see the message “This plan was saved to: kubernetes-plan”. If you see an error message instead, go back to your configuration and check what is wrong.
It is time to apply all the changes that are planned.

terraform apply

Type yes and hit Enter to approve the change.

You did it! It is going to take a couple of minutes to get all the services up and running.

Let’s look at the visual graph of Terraform resources using the graph command. The output is in DOT format; you can render it to an image with Graphviz, e.g. terraform graph | dot -Tsvg > graph.svg.

terraform graph

If you want to destroy the resources that were created, use the command below:

terraform destroy

Here you can find the code on GitHub.

I hope you enjoyed this post. If you have any questions or remarks, leave a comment! Stay tuned!

Continuous Integration with Docker

Maintenance is one of the indisputable costs in software development projects. The code base is updated with multiple commits day after day, and we cannot guarantee that the code still behaves the same as it did before the latest commit. If an issue introduced two weeks ago is only discovered today, it may be too late to fix it, or the cost of fixing it may be much higher than if it had been caught right after it was introduced. Finding issues as fast as possible is what proper testing in Continuous Integration (CI) is for. CI is a development practice which states that the code should be tested and verified with each commit from every developer on the project.

In this article, I am going to implement a Continuous Integration pipeline using Containerization. By packaging the application together with its dependencies, containerization helps smooth delivery of the software in an isolated environment. In this post we will use Docker for containerization and Gitlab-CI for Continuous Integration.

First, I am going to build a Fibonacci calculator using Flask. I am creating a function which calculates the nth Fibonacci number. Additionally, I created an endpoint which allows the user to GET the index page or POST a form. Here is what app.py looks like:
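The original listing was an image, so here is a minimal sketch of what such an app.py could look like. The function and route names (fib, index) and the form field name (n) are my assumptions, not necessarily the exact original code:

```python
from flask import Flask, request

app = Flask(__name__)


def fib(n):
    """Return the nth Fibonacci number (fib(1) == fib(2) == 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


@app.route("/", methods=["GET", "POST"])
def index():
    # POST: read the index n from the submitted form and return fib(n).
    if request.method == "POST":
        n = int(request.form.get("n", 1))
        return str(fib(n))
    # GET: serve a tiny form so the user can submit an index.
    return '<form method="POST">n: <input name="n"><input type="submit"></form>'


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```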

The project includes a requirements.txt file with all the dependencies required to run the project.
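The file itself was not shown in the post; for this app it would contain at least something like the following (the version pins are illustrative, not the author's exact ones):

```text
flask==1.1.2
pytest==5.4.1
```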

Testing is one of the crucial parts of a software engineering project, to make sure everything is healthy and works fine. It is time to write unit tests! I am using pytest for my unit tests. Naming conventions matter to pytest: test files need to start with ‘test_’, so I called my file test_app.py. The clean way pytest handles fixtures is the reason why I chose it.

@pytest.mark.parametrize is a decorator in pytest which we can use to run the same test with multiple sets of parameters. I need to define what the input and the expected output are, like below.

The assert keyword checks that the result produced by the application matches the expected value.
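The test listing was also an image in the original; a sketch of such a test_app.py might look like the following. The fib function is inlined here so the snippet is self-contained; in the real project it would be imported from app.py, and the parameter pairs are my illustrative choices:

```python
import pytest


# In the real project this would be: from app import fib
def fib(n):
    """Return the nth Fibonacci number (fib(1) == fib(2) == 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Run the same test body with three input/output pairs.
@pytest.mark.parametrize("n, expected", [(1, 1), (5, 5), (10, 55)])
def test_fib(n, expected):
    assert fib(n) == expected
```

Running pytest from the project root picks up any file matching test_*.py and reports each parametrized pair as a separate test case.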

To execute the unit tests, we run the pytest command in the project root folder.

In the output we can see that the same scenario is executed with 3 different input/output pairs.

It is time to dockerize the application!

Docker is a containerization platform which packages the application code together with its dependencies, so the application can run anywhere, independently of the environment. I need to tell Docker what to install, where the starting point of my application is located, and which environment variables the app requires. To pass all this information to Docker, we need to create a Dockerfile. A Dockerfile contains the recipe of your application.

FROM: the base image

RUN: run shell commands

COPY: copy files from a source to a target

WORKDIR: set the given directory as the current directory

ENV: set an environment variable

CMD: the command that starts the application

My Dockerfile is below:
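The Dockerfile listing was an image in the original; a minimal version consistent with the steps described (install dependencies, set the start point, expose the app on port 5000) might be the following. The file layout (requirements.txt and app.py at the project root) is an assumption:

```dockerfile
FROM python:3.6

WORKDIR /app

# Copy and install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the rest of the application code.
COPY . .

# Tell Flask where the application lives.
ENV FLASK_APP=app.py

# Start the application (app.run listens on 0.0.0.0:5000).
CMD ["python", "app.py"]
```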

Now, I can build my docker image from the Dockerfile to see if it works. Go to the project directory.

cd dockerci

docker build -t calculator .

To list all available docker images:

docker images or docker image ls

I see my image is built successfully; now I can run a container to start the application.

docker run -d -p 5000:5000 calculator

To list all docker containers:

docker ps

Perfect, I see my docker container is created and running successfully.

Note: If you cannot see your docker container when you do docker ps, run docker ps -a to see containers that are not running. Then you can docker logs <CONTAINER_ID> to investigate what is going on.

My container is running successfully, so now I can go to http://localhost:5000 to see my application and play with it.

Now that Docker is set up, it is time to jump into CI. I will use GitLab CI because it is very easy to integrate with your application. There are two ways to run GitLab CI: first, you can use GitLab's shared runners to build and test your application, which means you basically don’t need anything but a pipeline definition; it will be executed on GitLab's servers. Alternatively, you can set up a GitLab Runner on your own server, and the pipeline will be executed there. For this project I will use GitLab's shared runners, so I don’t need to set up a runner myself.

Let’s create the .gitlab-ci.yml file.

We can run multiple stages in order using the stages keyword, like below:
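The snippet was an image in the original; the stages declaration in .gitlab-ci.yml would look like this:

```yaml
stages:
  - unittest
  - build
  - push
```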

Here we have 3 stages: unittest, build and push. First the unit test stage is executed, then the build stage, and finally the push stage. If the unit test stage fails, the pipeline stops and the next stages are not executed.

To do this, I need to write down which commands should be executed. The first stage is the unit test stage, defined with stage: unittest; the script section executes its commands in order. To be able to run pip, I start from the base image image: python:3.6.

Together with the build and push stages, my .gitlab-ci.yml file looks like this:
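The full file was shown as an image; a sketch consistent with the stages above might look like the following. The use of Docker-in-Docker and GitLab's built-in registry variables (CI_REGISTRY_IMAGE, CI_REGISTRY_USER, etc.) is my assumption, not necessarily the author's exact setup:

```yaml
stages:
  - unittest
  - build
  - push

unittest:
  stage: unittest
  image: python:3.6
  script:
    - pip install -r requirements.txt
    - pytest

build:
  stage: build
  image: docker:stable
  services:
    - docker:dind
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .

push:
  stage: push
  image: docker:stable
  services:
    - docker:dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
```

Note that on shared runners each job starts on a fresh machine, so an image built in one job is not automatically available in the next; in practice the build and push steps are often combined into a single job, or the image is pushed at the end of the build job.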

Now my pipeline is ready!

Whenever I push a new change, this pipeline is automatically triggered by GitLab CI. I can clearly see which steps are currently being executed and which are already done.

My pipeline is successfully executed and the status is passed!

Here is my YouTube channel, where I explain all the steps in detail: https://www.youtube.com/playlist?list=PLSt3GXI7vWE9iCnPgkIdPnmina9EEoGb4

Here is the GitLab project link: https://gitlab.com/selin-gungor/astrea/tree/master

Thanks for reading. See you in my next post! 🙂

Selin Gungor