
SPARCRequest with Kubernetes (by OHSU)


GitHub repo: https://github.com/OCTRI/sparc-request-kubernetes

 

This reference project can be used to inform how your group could run SPARCRequest in a Kubernetes environment. The following is representative of how OCTRI chose to implement SPARCRequest.

Considerations for running SPARCRequest in Kubernetes

SPARCRequest Customization

By using a second image we are able to modify our instance of SPARCRequest without needing to maintain a separate branch in source control.

See the Images section below for a description of how we build and manage our SPARCRequest instance.

Deployments

We have two Deployment workloads defined: one for the main application and a second for the delayed job that performs background tasks like sending emails.

Main Deployment

The main SPARCRequest application, a single-process Puma web server running the Rails app, generally runs well inside Kubernetes. As seen in the deployment manifest, we specify the resources, configuration, livenessProbe, initContainers, and any volumes required for persistence.

The initContainers are used to perform database migrations and asset compilation and must run to completion before the main workload starts.

The volumes persist data such as attachments. We use an NFS PersistentVolume to preserve these files beyond the lifespan of a pod.

The livenessProbe lets the kubelet detect when the container is unresponsive so that it can be restarted. This ensures that if the application fails for any reason, it will be restarted without intervention.
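The sketch below ties those pieces together into a minimal Deployment manifest. It is illustrative only: the names, image, port, probe path, mount path, and resource requests are assumptions, and the real values live in the deployment.yaml in the repo.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sparc
  template:
    metadata:
      labels:
        app: sparc
    spec:
      # Run to completion before the main container starts (migrations, asset compilation)
      initContainers:
        - name: migrate
          image: example.edu/sparc_request_custom:3.9.0
          command: ["rails", "db:migrate"]
      containers:
        - name: sparc
          image: example.edu/sparc_request_custom:3.9.0
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
          # Restart the container if the application stops responding
          livenessProbe:
            httpGet:
              path: /
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 15
          # Keep uploaded attachments on shared storage
          volumeMounts:
            - name: attachments
              mountPath: /app/public/system
      volumes:
        - name: attachments
          persistentVolumeClaim:
            claimName: sparc-attachments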

Delayed Job

The challenging part of running SPARCRequest in Kubernetes is the Delayed Job worker, which is a separate process from the main SPARCRequest application. To accommodate this process, we run it as its own Deployment with the following command:

rails jobs:work

This can be seen in the deployment_delayed_job manifest. There is no reason it could not be a container in the same pod. We chose to run it as a separate pod to assist with troubleshooting and to isolate the main application in case the delayed job runs amok. This also means that we can restart it independently of the main application.
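A minimal sketch of that Deployment follows, with assumed names and image; see the deployment_delayed_job manifest in the repo for the real definition.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparc-delayed-job
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sparc-delayed-job
  template:
    metadata:
      labels:
        app: sparc-delayed-job
    spec:
      containers:
        - name: delayed-job
          image: example.edu/sparc_request_custom:3.9.0
          # Run the Delayed Job worker instead of the Puma web server
          command: ["rails", "jobs:work"]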

Configuration

Much of SPARCRequest's configuration is performed via a number of files in the config directory. To minimize modification of these files, we converted them to rely on environment variables; see the database.yml file as an example. By externalizing the configuration, we can use the same code in any environment, updating only the configuration as needed.
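For instance, a database.yml that reads its settings from the environment looks roughly like this; the variable names here are illustrative, not necessarily the ones our configuration uses.

# config/database.yml (illustrative excerpt)
production:
  adapter: mysql2
  host: <%= ENV['DATABASE_HOST'] %>
  database: <%= ENV['DATABASE_NAME'] %>
  username: <%= ENV['DATABASE_USERNAME'] %>
  password: <%= ENV['DATABASE_PASSWORD'] %>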

Then, by using a ConfigMap and Secrets, we can deploy the same image to any of our environments (dev, stage, prod) and have it configured appropriately. The Deployment and other resources can then reference the same configuration; see the deployment.yaml for an example of how the configuration is referenced.
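For example, a container in the Deployment can pull its entire environment from a ConfigMap and a Secret; the resource names below are assumptions.

# Excerpt from a pod spec
containers:
  - name: sparc
    image: example.edu/sparc_request_custom:3.9.0
    envFrom:
      - configMapRef:
          name: sparc-config
      - secretRef:
          name: sparc-secrets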

Scheduled Tasks

RMID / IRB Service

MUSC operates another application that, among other things, synchronizes the IRB records associated with Protocols and Projects in SPARCRequest. This was an essential feature for OHSU, needed to ensure that work performed for projects stays in compliance with IRB status. To that end, we built an API-compatible application that allows SPARCRequest to retrieve IRB information, and we run a daily scheduled task to keep all IRB records in SPARCRequest up to date. The cron_irb.yaml is an example of running a scheduled task to retrieve the IRB records.
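A minimal sketch of such a CronJob is below; the schedule, names, image, and the task it invokes are assumptions, so refer to cron_irb.yaml in the repo for the real definition.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sparc-irb-sync
spec:
  schedule: "0 2 * * *"    # once a day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: irb-sync
              image: example.edu/sparc_request_custom:3.9.0
              # Hypothetical task name; the real command is in cron_irb.yaml
              command: ["rails", "irb:sync"]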

Accessing the application in the cluster

Once SPARCRequest is running in the cluster, it needs to be made available outside the cluster so users can access it. This is done with the Service resource, which tells the cluster that the sparc deployment should be exposed and which ports the application is accessed on.
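A sketch of such a Service, assuming the labels and container port used in the Deployment sketch above:

apiVersion: v1
kind: Service
metadata:
  name: sparc
spec:
  selector:
    app: sparc
  ports:
    - port: 80          # port the Service listens on
      targetPort: 3000  # port the Puma container listens on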

The cluster will also need an Ingress resource defined, which is responsible for receiving requests from outside the cluster and routing them to the appropriate Service. That is out of scope for this project, but there are many resources available for setting it up.

Images

We chose a two-image approach in order to maintain a clean separation between standard SPARCRequest and our customized version. This allows us to isolate our changes from the main code for greater portability.

SPARCRequest base image

The first step is to build the base image, which is a container image built directly from a SPARCRequest tag. We use Docker to build our images.

cd base-image
docker build --rm -t example.edu/sparc_request_base --pull .

If you want to target another version of SPARCRequest you can pass a build argument.

export SPARC_VERSION=3.9.0
cd base-image
docker build --rm --build-arg SPARC_VERSION=${SPARC_VERSION} -t example.edu/sparc_request_base:${SPARC_VERSION} --pull .

We recommend tagging your image with the version of SPARCRequest to make it clear which version you are building with.

Organization specific image

The customized image extends the base image with organization-specific changes, such as environment, locale, and extension files layered on top of the standard SPARCRequest version.
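Conceptually, the customization is a Dockerfile that starts FROM the base image and copies the overridden files on top; the paths below are placeholders rather than our actual layout.

# Illustrative Dockerfile for the organization-specific image
FROM example.edu/sparc_request_base:3.9.0
# Overlay organization-specific environment, locale, and extension files
COPY config/ ./config/
COPY app/ ./app/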

Running SPARCRequest

We recommend having your Ingress and PersistentVolume resources set up to your satisfaction before attempting to run SPARCRequest in the cluster.
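As an illustration of the NFS-backed persistence mentioned earlier, a PersistentVolume and a matching claim might look like the following; the server, path, capacity, and names are placeholders for your own storage.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: sparc-attachments-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.edu
    path: /exports/sparc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sparc-attachments
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""    # bind to the statically provisioned volume above
  resources:
    requests:
      storage: 10Gi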

Once you have your images built, you can deploy them to the cluster using the kubectl command line tool. The first time you deploy, apply the resources in an order that ensures the deployments have everything they need: configuration and storage first, then the workloads.
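One possible sequence, assuming manifest file names like configmap.yaml, secrets.yaml, pvc.yaml, and service.yaml alongside the deployment.yaml, deployment_delayed_job.yaml, and cron_irb.yaml discussed above:

# Configuration and storage first, so the workloads can reference them
kubectl apply -f configmap.yaml
kubectl apply -f secrets.yaml
kubectl apply -f pvc.yaml
# Then the workloads
kubectl apply -f deployment.yaml
kubectl apply -f deployment_delayed_job.yaml
kubectl apply -f cron_irb.yaml
# Finally, expose the application
kubectl apply -f service.yaml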

You can monitor the rollout with kubectl to see the deployment, watch the pods as they start up, and review the logs.
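For example, assuming the Deployment is named sparc:

# Watch the rollout of the deployment
kubectl rollout status deployment/sparc
# Watch the pods as they start up
kubectl get pods --watch
# Follow the application logs
kubectl logs -f deployment/sparc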

 
