Instaclustr Open Service Broker — A Complete End-to-End Example

A complete end-to-end example that illustrates most of the important concepts required to use the Open Service Broker (OSB).

Instaclustr
14 min read · Sep 10, 2019

Instaclustr has recently launched the Instaclustr Service Broker, an implementation of the Open Service Broker (OSB) API for Instaclustr managed services (Apache Cassandra, Spark, Zeppelin, and Kafka).

Over a series of blogs I plan to try it out using the following “bottom-up” approach:

  • get a complete end-to-end Kubernetes workflow working to test and demonstrate the steps required to use the OSB (this blog)
  • use the resulting example workflow to automate our Anomalia Machina application, and finally,
  • try out a Kubernetes-native CI/CD tool (such as Spinnaker) to demonstrate using it for a more complex and realistic scenario.

In this blog, I take the new Instaclustr Open Service Broker for a test drive and build a simple Kubernetes workflow to:

  • provision Cassandra and Kafka clusters, and
  • deploy an application that can connect to the clusters and use them, and cleans up afterwards.

It’s about the simplest complete pipeline possible, illustrates most of the important concepts required to use the Open Service Broker, and is a good starting point to extend for more complex use cases.

The Instaclustr Open Service Broker Provisioning API

“The Open Service Broker API project allows developers, ISVs, and SaaS vendors a single, simple, and elegant way to deliver services to applications running within cloud native platforms. To build a Service Broker, you must implement the required endpoints as defined in the API specification.”

In Kubernetes, “The Service Catalog project is responsible for integrating Service Brokers to the Kubernetes ecosystem.” Thus, all of the interactions with the Service Broker actually go via the Catalog.

The OSB is a specification that supports a lifecycle for securely provisioning and using services, with steps for service discovery, creation, use (providing the information needed to connect, binding to a service, unbinding), and deletion. This sounds a bit like the classic SOA lifecycle, in which the service provider publishes services to a registry, and the consumer then finds a service (discovery), binds to it, and uses it:

[Figure: SOA collaboration diagram]

Here is a similar diagram for the OSB (main provisioning steps only):

[Figure: OSB collaboration diagram]

The actual steps from the service consumer’s perspective are as follows (some additional steps are required to get things ready before these; see Preconditions below):

  1. (catalog) Listing the managed services and Service Plans available from a service broker, enabling the consumer to select a plan (see the svcat sketch after this list).
  2. Provisioning a new instance of the managed service.
  3. Binding to the managed service, which returns the connection information and credentials.
  4. Mapping the connection credentials into the application, enabling the consumer to connect to the service and use it.
  5. Unbinding from the managed service.
  6. Deprovisioning the managed service instance.
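
In Kubernetes, step 1 can be done from the command line with the Service Catalog CLI. A minimal sketch, assuming svcat is installed and the ClusterServiceBroker from the Instaclustr support page has already been registered:

# Step 1 (catalog): list the service classes offered by registered brokers.
# The Instaclustr broker exposes "instaclustr-managed-service", which is the
# class name used in the ServiceInstance YAML files later in this post.
svcat get classes

# List the available plans for those classes (e.g. "new-cluster").
svcat get plans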

In Kubernetes, all the interactions between the consumer and the service broker occur via the Service Catalog. I’ve attempted to show the operations required to provision and deprovision clusters in the following diagrams:

[Figure: Provisioning steps]

[Figure: Deprovisioning steps]

The diagrams are (approximately) UML sequence diagrams, and show the provisioning and deprovisioning workflow steps being initiated from the left-hand side (with the exception of the “0” steps, which are initiated by the service provider and the application). In our example, the steps are just a sequence of kubectl commands running in a shell script. Everything from the workflow to the Catalog (inclusive) is running in a Kubernetes cluster.

The workflow interacts with the Catalog, which in turn interacts with the service provider which provisions and deprovisions the clusters. Time runs from top to bottom, and as a result of some operations other Kubernetes resources are created, used and destroyed (including instances, bindings, and applications). These have more limited lifetimes and must be created, used and deleted in the correct order. With these diagrams in mind let’s jump into the example.

Goal & Preconditions — Run a Cassandra/Kafka application in a Kubernetes cluster

We assume that the goal is to run a Cassandra and/or Kafka application in a Kubernetes cluster, with provisioning of the clusters and orchestration of the application controlled (at least initially) by a series of Kubernetes commands (kubectl).

We also assume that the Kubernetes cluster and the application will be running in the same cloud provider and region as the Instaclustr-provisioned Cassandra and Kafka clusters. In order to demonstrate a complete end-to-end workflow (including provisioning and deprovisioning) we also assume a CI/CD type of use case, where the application will run for some short period of time and then terminate (or be terminated). For example, load or soak testing prior to pushing the application to production deployment.

Before proceeding, please ensure that you are familiar with, or have installed, the following:

  1. You have an Instaclustr account, and access to the Instaclustr Username and Provisioning API Key (the key is available from the Console->Account->API Keys).
  2. You are familiar with the Instaclustr Provisioning API support documentation, which provides JSON examples of the payloads required to provision clusters. The Instaclustr Service Broker uses the same fields and values, but typically in YAML format.
  3. You have decided on data centres and node details. The available data centres and node sizes are also documented in the Provisioning API documentation, but the console “create cluster” page contains all of the current options. Pricing information is available from the Instaclustr Console.
  4. You have a Kubernetes cluster running (e.g. using AWS EKS, our experience with AWS EKS, and probably a quicker way using eksctl).
  5. You have followed the steps 1–3.1 (1 — Install Service Catalog; 2 — setup a secret for the Provisioning API authentication; and 3.1 — Create a ClusterServiceBroker) in our “Using Instaclustr with Kubernetes” support page.
  6. You have some familiarity with the Open Service Broker and Kubernetes Service Catalog (introduction and main site) documentation.
  7. If you have a real application that you want to deploy and run with the provisioned clusters, then it will need to be modified to obtain connection details (e.g. cluster IP addresses, username/password) from environment variables, so that it will be ready for the “Deploy application with mapped binding” workflow step (a minimal entrypoint sketch follows this list).
  8. You know what sort of connectivity you want between the Kubernetes cluster (and the application deployed on it) and the Instaclustr Cassandra and/or Kafka clusters running in your chosen cloud provider, and you have set this up, tested it, and have all the relevant information to hand. In general (and for AWS), the connectivity options are VPC peering, public IP addresses added to the cluster firewalls, or Custom VPC (which uses “run in your own account”, where the cluster is provisioned in the same account/VPC as the customer’s). Given current limitations of the Instaclustr Open Service Broker (you can’t change the cluster after provisioning), only the firewall and Custom VPC options are available, as both of these can be requested at provisioning time. For simplicity in this demonstration example, we assume that you know the public IP address of the Kubernetes worker node (a single instance large enough to run all the application Pods). If the worker node is terminated and started again, you’ll have to find the public IP address again and update the YAML files. For production, some alternatives include adding a NAT gateway to your Kubernetes cluster for egress (with an Elastic IP address), or using public IPs in the Kubernetes cluster VPC (which allows for Elastic IP addresses).
  9. Your use case doesn’t require modifications to the clusters once they have been provisioned (e.g. resizing, changes to firewall rules, etc). The current Instaclustr Open Service Broker implementation doesn’t allow for changes after creation. If your use case requires changes after cluster creation (e.g. dynamic resizing for Apache Cassandra), you can use our provisioning API instead of, or in conjunction with, the Instaclustr OSB (but watch out you don’t get the OSB into a state that is out of synchronisation with the actual cluster, see Orphans below).
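
Regarding precondition 7, here is a minimal container entrypoint sketch showing how an application could pick up the mapped binding values. The environment variable names match the Deployment mapping used later in this post; the jar name and command-line flags are placeholders, not from our actual application:

#!/bin/sh
# Entrypoint sketch: fail fast if the mapped binding values are missing,
# then pass them to the (placeholder) application.
: "${CassandraPublicIPS:?CassandraPublicIPS not set - was the binding mapped?}"
: "${CassandraUsername:?CassandraUsername not set}"
: "${CassandraPassword:?CassandraPassword not set}"

exec java -jar my-app.jar \
  --cassandra.contact-points="${CassandraPublicIPS}" \
  --cassandra.username="${CassandraUsername}" \
  --cassandra.password="${CassandraPassword}"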

Kubectl Provisioning Workflow Steps

Step 3: Provision and Wait

The following kubectl steps are for the provisioning workflow step 3 (and wait):

kubectl apply -f ClusterServiceBroker.yaml

kubectl create -f CassandraServiceInstance.yaml
kubectl create -f KafkaServiceInstance.yaml

kubectl wait --for=condition=Ready --timeout=30m \
  ServiceInstance/my-cassandra-instance \
  ServiceInstance/my-kafka-instance

The above steps ensure that the ClusterServiceBroker is running (using our current setup instructions it doesn’t appear to be persistent across worker node restarts), start the provisioning of the Cassandra and Kafka ServiceInstances, and wait for both clusters to reach the “Ready” state (which can take some time, including up to 20 minutes extra after the clusters themselves are ready for the Catalog state to be updated correctly). Note that there are potential provisioning errors that can result in the clusters never becoming Ready, so we’ve included a timeout of 30 minutes so that we don’t wait forever. Ideally we’d check for these errors in real time and give up sooner (I haven’t worked out how to do this with vanilla kubectl yet, as the kubectl wait command can only wait for one condition). Also, we don’t do anything special if the timeout occurs (the rest of the steps will simply fail). The kubectl commands are relatively straightforward; the complexity is really in the ServiceInstance YAML files below (note that we’ve used the default namespace throughout, as it’s more complicated to keep track of multiple namespaces).
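
Since kubectl wait can only watch one condition, a provisioning failure is only noticed when the 30-minute timeout expires. One possible workaround (a sketch only, not exhaustively tested against every failure mode) is a small polling loop that checks both the Ready and Failed conditions via jsonpath:

# Poll a ServiceInstance until it is Ready, giving up early if the Catalog
# reports a Failed condition. Adjust the instance names and timing as needed.
wait_for_instance() {
  local instance="$1"
  for i in $(seq 1 180); do   # roughly 30 minutes at 10-second intervals
    ready=$(kubectl get serviceinstance "$instance" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
    failed=$(kubectl get serviceinstance "$instance" \
      -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
    if [ "$ready" = "True" ]; then
      echo "$instance is Ready"; return 0
    fi
    if [ "$failed" = "True" ]; then
      echo "$instance reported a Failed condition" >&2; return 1
    fi
    sleep 10
  done
  echo "Timed out waiting for $instance" >&2
  return 1
}

wait_for_instance my-cassandra-instance && wait_for_instance my-kafka-instance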

CassandraServiceInstance.yaml

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: my-cassandra-instance
  namespace: default
spec:
  clusterServiceClassExternalName: instaclustr-managed-service
  clusterServicePlanExternalName: new-cluster
  authInfo:
    basic:
      secretRef:
        namespace: default
        name: my-secret
  parameters:
    clusterName: Paul-Cassandra-Cluster
    bundles:
      - bundle: APACHE_CASSANDRA
        version: apache-cassandra-3.11.4.ic1
        options:
          authnAuthz: true
          clientEncryption: false
          usePrivateBroadcastRPCAddress: false
          luceneEnabled: false
          backupEnabled: false
    provider:
      name: AWS_VPC
    nodeSize: m4l-250
    dataCentre: US_EAST_1
    clusterNetwork: 192.168.0.0/18
    rackAllocation:
      numberOfRacks: 3
      nodesPerRack: 1
    firewallRules:
      - network: 184.72.91.238/32
        rules:
          - type: CASSANDRA

KafkaServiceInstance.yaml

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: my-kafka-instance
  namespace: default
spec:
  clusterServiceClassExternalName: instaclustr-managed-service
  clusterServicePlanExternalName: new-cluster
  authInfo:
    basic:
      secretRef:
        namespace: default
        name: my-secret
  parameters:
    clusterName: Paul-Kafka-Cluster
    bundles:
      - bundle: KAFKA
        version: apache-kafka:2.1.1
        options:
          clientEncryption: false
          numberPartitions: 30
          autoCreateTopics: true
          deleteTopics: true
    provider:
      name: AWS_VPC
    nodeSize: t2.small-20-gp2
    dataCentre: US_EAST_1
    clusterNetwork: 192.168.0.0/18
    rackAllocation:
      numberOfRacks: 3
      nodesPerRack: 1
    firewallRules:
      - network: 184.72.91.238/32
        rules:
          - type: KAFKA

You’ll need to change some of the parameters for your use case and environment (e.g. cluster name, provider, node sizes, data center, firewall rules, etc). You can check that the clusters were provisioned correctly with:

kubectl describe ServiceInstances

You may notice that sometimes there are minor differences between what you requested and what you got. This is because some combinations are not supported, and some settings are compulsory (e.g. Kafka clusters can only be created with user authentication turned on).
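
For a more compact, Catalog-centric view of each instance (class, plan, and status), the Service Catalog CLI can also be used (assuming svcat is installed):

# Summarise each instance as the Catalog records it.
svcat describe instance my-cassandra-instance
svcat describe instance my-kafka-instance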

Step 4: Bind

The following kubectl steps are for provisioning workflow step 4 (with some diagnostics added):

kubectl create -f CassandraServiceBinding.yaml
kubectl create -f KafkaServiceBinding.yaml

kubectl wait --for=condition=Ready \
  ServiceBinding/my-cassandra-binding \
  ServiceBinding/my-kafka-binding

svcat describe binding my-cassandra-binding
svcat describe binding my-kafka-binding

kubectl get secret my-cassandra-binding -o yaml | grep node-public-address \
  | awk -F ":" '{print $2}' | base64 --decode

kubectl get secret my-kafka-binding -o yaml | grep node-public-address \
  | awk -F ":" '{print $2}' | base64 --decode

The above steps create the Cassandra and Kafka ServiceBindings. These bindings contain all the information that an application needs to connect to the clusters: at least the public and/or private IP addresses, and possibly a username/password. Note that each Binding has an associated Secret; you can list the bindings with:

kubectl get ServiceBindings
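
As an alternative to the grep/awk pipeline above, a jsonpath query can pull a single key out of a binding’s Secret more robustly. A sketch, assuming the key is node-public-addresses (the key referenced by the Deployment later in this post):

# Extract and decode one key from each binding's Secret.
kubectl get secret my-cassandra-binding \
  -o jsonpath='{.data.node-public-addresses}' | base64 --decode; echo
kubectl get secret my-kafka-binding \
  -o jsonpath='{.data.node-public-addresses}' | base64 --decode; echo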

Example ServiceBinding yaml files (which set up the relationships between Bindings and Kubernetes secrets) are as follows.

CassandraServiceBinding.yaml

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  name: my-cassandra-binding
  namespace: default
spec:
  instanceRef:
    name: my-cassandra-instance
  authInfo:
    basic:
      secretRef:
        namespace: default
        name: my-cassandra-secret

KafkaServiceBinding.yaml

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  name: my-kafka-binding
  namespace: default
spec:
  instanceRef:
    name: my-kafka-instance
  authInfo:
    basic:
      secretRef:
        namespace: default
        name: my-kafka-secret

Step 5 (and Step 1 of deprovisioning): Deploy application with mapped bindings and wait until finished.

A single step is required to deploy an application (which we assume has already been built and pushed to a Docker repository, see instructions here) with mapped bindings:

kubectl apply -f k8_env_test.yaml

Well, that was easy. But where do the mappings occur? They are in the Deployment YAML file, as follows (see the env: section):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: env-test-deployment
  namespace: default
  labels:
    app: env-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: env-test
  template:
    metadata:
      labels:
        app: env-test
    spec:
      containers:
        - name: env-test
          image: brebs/env-test:latest
          env:
            - name: "CassandraPublicIPS"
              valueFrom:
                secretKeyRef:
                  name: my-cassandra-binding
                  key: node-public-addresses
            - name: "KafkaPublicIPS"
              valueFrom:
                secretKeyRef:
                  name: my-kafka-binding
                  key: node-public-addresses
            - name: "CassandraUsername"
              valueFrom:
                secretKeyRef:
                  name: my-cassandra-binding
                  key: username
            - name: "CassandraPassword"
              valueFrom:
                secretKeyRef:
                  name: my-cassandra-binding
                  key: password
            - name: "KafkaUsername"
              valueFrom:
                secretKeyRef:
                  name: my-kafka-binding
                  key: username
            - name: "KafkaPassword"
              valueFrom:
                secretKeyRef:
                  name: my-kafka-binding
                  key: password
          ports:
            - containerPort: 80
            - containerPort: 1235
          resources:
            requests:
              cpu: "500m"

When this Deployment is created, the specified binding names/values are copied into the named environment variables before the Java program is started. The problem with a Deployment is that it isn’t expected to terminate. In order to demonstrate the complete OSB lifecycle, including deprovisioning, we actually need to use a Kubernetes Job. Jobs either run to completion or can be terminated after a fixed period of time, and they can run in one or many Pods. Here’s an example YAML Job file for a single-Pod job that computes pi for a few seconds and then terminates:

pi.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  backoffLimit: 5
  template:
    spec:
      containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

It’s run as follows. Note that the kubectl wait is Step 1 in the deprovisioning workflow.

kubectl create -f pi.yaml
kubectl wait --for=condition=Complete jobs/pi
kubectl delete -f pi.yaml

Note that you have to explicitly delete the job as it (sensibly) hangs around after completion in case you want to check the logs.

Here’s a slightly different yaml Job file that runs for longer, but terminates after 10s:

pi_timeout.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 10
  template:
    spec:
      containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(200000)"]
      restartPolicy: Never

It’s run like this:

kubectl create -f pi_timeout.yaml
kubectl wait --for=condition=Failed jobs/pi-with-timeout
kubectl delete -f pi_timeout.yaml

This seems like a slightly odd way of running a Job, as it will “succeed” by failing! Presumably there may be other conditions which cause a Failed job state, so ideally we should also check that the reason was DeadlineExceeded before deciding what to do next.
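
A sketch of that extra check, reading the reason off the Job’s Failed condition with jsonpath:

# Confirm the Job "failed" for the expected reason (the active deadline)
# rather than because its Pods kept crashing (e.g. BackoffLimitExceeded).
reason=$(kubectl get job pi-with-timeout \
  -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}')

if [ "$reason" = "DeadlineExceeded" ]; then
  echo "Job stopped by activeDeadlineSeconds, as intended"
else
  echo "Job failed for an unexpected reason: ${reason:-unknown}" >&2
fi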

Here’s an example YAML Job file that runs a Java program (a Cassandra client that connects using the public IP addresses and user credentials from the binding Secret) in 10 Pods for up to 30 minutes, and then terminates it:

cassandra.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: cassandra-job
  labels:
    app: cassandra-client
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 1800
  parallelism: 10
  template:
    metadata:
      labels:
        app: cassandra-client
    spec:
      containers:
        - name: cassandra-client
          image: user/cassandra-client:latest
          env:
            - name: "CassandraPublicIPS"
              valueFrom:
                secretKeyRef:
                  name: my-cassandra-binding
                  key: node-public-addresses
            - name: "CassandraUsername"
              valueFrom:
                secretKeyRef:
                  name: my-cassandra-binding
                  key: username
            - name: "CassandraPassword"
              valueFrom:
                secretKeyRef:
                  name: my-cassandra-binding
                  key: password
          ports:
            - containerPort: 80
            - containerPort: 1234
          resources:
            requests:
              cpu: "500m"
      restartPolicy: Never

It is run like this:

kubectl create -f cassandra.yaml
kubectl wait --for=condition=Failed Job/cassandra-job
kubectl delete -f cassandra.yaml

Kubectl Deprovisioning Workflow Steps

Note that the Deprovisioning workflow — Step 1, Application completion or termination — has been covered above. The rest of the steps are actually pretty straightforward, but there are two ways of doing them: either using svcat (which operates on the instances) or kubectl (which operates on the resources), as follows:

svcat unbind my-kafka-instance
svcat unbind my-cassandra-instance
svcat deprovision my-cassandra-instance
svcat deprovision my-kafka-instance

And kubectl as follows:

kubectl delete ServiceBinding/my-kafka-binding
kubectl delete ServiceBinding/my-cassandra-binding
kubectl delete ServiceInstance/my-kafka-instance
kubectl delete ServiceInstance/my-cassandra-instance

Complete workflow

Here are all the main steps for the complete end-to-end workflow in one place:

kubectl apply -f ClusterServiceBroker.yaml
kubectl create -f CassandraServiceInstance.yaml
kubectl create -f KafkaServiceInstance.yaml
kubectl wait --for=condition=Ready --timeout=30m \
  ServiceInstance/my-cassandra-instance ServiceInstance/my-kafka-instance
kubectl create -f CassandraServiceBinding.yaml
kubectl create -f KafkaServiceBinding.yaml
kubectl wait --for=condition=Ready \
  ServiceBinding/my-cassandra-binding ServiceBinding/my-kafka-binding
kubectl create -f cassandra.yaml
kubectl wait --for=condition=Failed Job/cassandra-job
kubectl delete -f cassandra.yaml
kubectl delete ServiceBinding/my-kafka-binding
kubectl delete ServiceBinding/my-cassandra-binding
kubectl delete ServiceInstance/my-kafka-instance
kubectl delete ServiceInstance/my-cassandra-instance
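
To make this sequence a little more robust when run unattended, the commands can be wrapped in a shell script that stops on the first error and always attempts clean-up. A sketch only, assuming the YAML files from this post are in the working directory:

#!/bin/bash
# End-to-end workflow sketch. set -e aborts on the first failed command;
# the EXIT trap tries to clean up whatever has been created so far.
set -euo pipefail

cleanup() {
  kubectl delete -f cassandra.yaml --ignore-not-found
  kubectl delete ServiceBinding my-kafka-binding my-cassandra-binding --ignore-not-found
  kubectl delete ServiceInstance my-kafka-instance my-cassandra-instance --ignore-not-found
}
trap cleanup EXIT

kubectl apply -f ClusterServiceBroker.yaml
kubectl create -f CassandraServiceInstance.yaml
kubectl create -f KafkaServiceInstance.yaml
kubectl wait --for=condition=Ready --timeout=30m \
  ServiceInstance/my-cassandra-instance ServiceInstance/my-kafka-instance

kubectl create -f CassandraServiceBinding.yaml
kubectl create -f KafkaServiceBinding.yaml
kubectl wait --for=condition=Ready \
  ServiceBinding/my-cassandra-binding ServiceBinding/my-kafka-binding

kubectl create -f cassandra.yaml
# The Job's activeDeadlineSeconds is 1800, so allow a little longer than that.
kubectl wait --for=condition=Failed --timeout=35m Job/cassandra-job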

Two useful tricks

There were two issues that I encountered multiple times during this exercise: errors in the YAML files, and service instance Orphans. Luckily I found these two simple workarounds!

YAML file problems

It’s important to construct the YAML files correctly (both format and content), otherwise you’ll get strange errors. One way of doing this is to use an online tool to convert from the Instaclustr Provisioning API JSON examples to YAML, and then copy/paste the YAML into the YAML files in the correct location.
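
If you would rather not paste cluster payloads into an online converter, the same conversion can be done locally. A sketch, assuming Python with the PyYAML package is installed, and using cluster-payload.json as a placeholder file name:

# Convert an Instaclustr Provisioning API JSON example into YAML, ready to
# paste under the "parameters:" section of a ServiceInstance file.
python -c 'import sys, json, yaml; print(yaml.safe_dump(json.load(sys.stdin), default_flow_style=False))' \
  < cluster-payload.json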

Orphans

Because the Open Service Broker acts as a stateful broker between the Catalog and the Service Provider, it’s possible (and likely) that it eventually gets into an odd state, for example as a result of an unrecoverable provisioning error, or if a cluster is deleted using the Instaclustr console or Provisioning API (i.e. clusters that could not be provisioned, or that were provisioned but have since been deleted). This can result in service instances (and bindings) being in a state where they can’t be used, but also can’t be deleted, so you can’t create new ones with the same names. These are called orphans. If you are 100% sure the instance has no running cluster associated with it, then it is possible to force deletion using the “--abandon” option on the svcat command as follows:

svcat deprovision --abandon my-kafka-instance
This action is not reversible and may cause you to be charged for the broker resources that are abandoned. If you have any bindings for this instance, please delete them manually with svcat unbind --abandon --name bindingName
Are you sure? [y|n]:
y

Conclusions

We’ve successfully created a complete demonstration example for the Instaclustr Open Service Broker, using only the Kubernetes command line (kubectl) to run it. However, running the workflow this way has a few limitations. We were able to use the kubectl wait command to ensure that some of the steps were ready before proceeding to the next step, but we weren’t able to detect and respond to error conditions, which could delay the detection of problems and cause issues with the subsequent steps and correct clean-up. Also, for a CI/CD use case, how do you know if the application passed the tests? For our Anomalia Machina application we instrumented it with a combination of Prometheus (for metrics) and OpenTracing and Jaeger (for tracing). We need a way to specify a pass/fail condition, get and analyse the metrics and/or traces, and then make a decision (perhaps based on some absolute pass/fail condition, or a comparison to previous or current baseline results). In another blog, we’ll (hopefully!) have a look at one way of addressing both of these aspects, probably using Spinnaker.
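
As a preview of what such a pass/fail gate might look like, here is a sketch that queries the Prometheus HTTP API for a single value and compares it against a threshold. The Prometheus address, the metric name, and the threshold are hypothetical placeholders (and jq is assumed to be installed), not details of our actual setup:

# Fetch one metric value from Prometheus and fail the pipeline step if it
# exceeds an agreed baseline. URL, query, and threshold are placeholders.
value=$(curl -s 'http://prometheus.example.com:9090/api/v1/query?query=avg(request_latency_seconds)' \
  | jq -r '.data.result[0].value[1]')

threshold="0.5"
if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v <= t) }'; then
  echo "PASS: latency ${value}s <= ${threshold}s"
else
  echo "FAIL: latency ${value}s > ${threshold}s" >&2
  exit 1
fi

Automating exactly this kind of decision is where a tool like Spinnaker should help, which is where the next blog will pick up.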
