Kubernetes Guides

Portworx Tutorial: Demonstrate HA Cassandra Stateful Application

Portworx is a popular persistent storage solution for Kubernetes and Docker. It is a clustered block storage solution that provides a cloud-native storage layer from which containerized stateful applications can programmatically consume block, file, and object storage services directly through the scheduler.

With Portworx, you can manage any database or stateful service on any infrastructure using any container scheduler. You get a single data management layer for all of your stateful services, no matter where they run.

In this post, we will learn how to deploy Cassandra to Kubernetes and use Portworx Volumes to provide HA capability:

  1. Install and configure Portworx
  2. Use a Portworx StorageClass to create PVCs with replicated data
  3. Deploy Cassandra with a simple YAML file using this storage class
  4. Validate data persistence by deleting the Cassandra pod

First, we will deploy Cassandra in a StatefulSet with a single node (replicas=1) to show the basics of node failover. We will create sample data, force Cassandra to flush the data to disk, and then failover the Cassandra pod and show how it comes back up with its data intact. Then, we’re going to show how we can scale the cluster to 3 nodes and dynamically create volumes for each.

Step #1: Validate Kubernetes

Use kubectl get nodes to check if the Kubernetes nodes are ready.

Image – Kubernetes nodes are ready
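For reference, the commands for this check (output will vary by cluster):

kubectl get nodes
kubectl get nodes -o wide   # also shows internal IPs and container runtime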

Step #2: Install Portworx

Portworx requires at least three nodes in the cluster with dedicated storage for its use. It then carves out virtual volumes from these storage pools. In this example, we use a 20GB block device that exists on each node.

Image – Choose the device to install Portworx
Image – Install Portworx

In the install command above, note the following (a representative sketch of the command follows this list):

  • c=px-demo specifies the cluster name
  • b=true specifies to use internal etcd
  • kbVer=${VER} specifies the Kubernetes version
  • s=/dev/vdb specifies the block device to use
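Putting those flags together, the install typically boils down to generating a spec from the Portworx spec generator and applying it. The exact URL and version segment below are assumptions; the query parameters mirror the flags listed above:

# Derive the Kubernetes server version for the kbver parameter
VER=$(kubectl version --short | awk '/Server Version/{print $3}' | sed 's/^v//')
# Generate and apply the Portworx spec (URL/version segment are illustrative)
kubectl apply -f "https://install.portworx.com/2.6?kbver=${VER}&c=px-demo&b=true&s=%2Fdev%2Fvdb"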

Use kubectl get pods -n kube-system -l name=portworx -o wide to check if the Portworx pods are ready and their status is Running.

Image – Portworx pods are ready

You can also check the cluster status using the pxctl command.
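pxctl ships inside each Portworx pod, so one way to run it is via kubectl exec; the pod lookup below is illustrative:

PX_POD=$(kubectl get pods -n kube-system -l name=portworx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system ${PX_POD} -- /opt/pwx/bin/pxctl status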

Now that the Portworx cluster is ready, we can proceed to the next step.

Step #3: Create StorageClass

StorageClass provides a way to describe the “classes” of storage. Various classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators.

Storage classes differ according to the needs of the application. For our scenario, we define a storage class with a replication factor of 2 to accelerate Cassandra node recovery, and we also define a group name for the Cassandra volumes so that we can take 3DSnapshots.

Image – Cassandra StorageClass
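A StorageClass matching that description might look like the sketch below. The name px-storageclass is what the StatefulSet will reference later; the group name cassandra_vg is illustrative:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-storageclass
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"              # two replicas of each volume for faster node recovery
  group: "cassandra_vg"  # volume group, used for taking 3DSnapshots
  # fg: "true"           # for production: force the group's volumes onto separate nodes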

Refer to the Portworx documentation for a full list of supported parameters for Portworx volumes.

Create the storage class using the kubectl create command.

Image – Create the storage class
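Assuming the manifest above is saved as px-storageclass.yaml (the filename is our choice):

kubectl create -f px-storageclass.yaml
kubectl get storageclass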

In production environments, you would also add the fg=true parameter to your StorageClass to ensure that Portworx places each Cassandra volume and its replicas on separate nodes, so that in case of a node failure we never fail over to a node where Cassandra is already running. To enable this feature with a group of 3 volumes and 2 replicas each, you need a minimum of 6 worker nodes.

With the StorageClass ready, let's deploy Cassandra on the cluster.

Step #4: Deploy Cassandra

In this step, we are going to deploy Cassandra using a StatefulSet, starting with a single replica that we will later scale to 3 nodes. A StatefulSet manages stateful applications by maintaining a sticky identity for each of its pods; Kubernetes preserves this persistent identifier across any rescheduling.

Create the below Cassandra StatefulSet, which provisions Portworx PVCs through the storage class created in the earlier step.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 1
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      # Use the stork scheduler to enable more efficient placement of the pods
      schedulerName: stork
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v14
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
        env:
        - name: MAX_HEAP_SIZE
          value: 512M
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.default.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "K8Demo"
        - name: CASSANDRA_DC
          value: "DC1-K8Demo"
        - name: CASSANDRA_RACK
          value: "Rack1-K8Demo"
        - name: CASSANDRA_AUTO_BOOTSTRAP
          value: "false"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        # These volume mounts are persistent. They are like inline claims,
        # but not exactly because the names need to match exactly one of
        # the stateful pod volumes.
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  # These are converted to volume claims by the controller
  # and mounted at the paths mentioned above.
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      storageClassName: px-storageclass
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: cqlsh
spec:
  containers:
  - name: cqlsh
    image: mikewright/cqlsh
    command:
    - sh
    - -c
    - "exec tail -f /dev/null"

Create the StatefulSet using the kubectl create command.

Image – Create a Cassandra StatefulSet
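Assuming the manifest is saved as cassandra-statefulset.yaml (an illustrative filename):

kubectl create -f cassandra-statefulset.yaml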

Use the kubectl get pods command to validate that the pod is READY.

Image – Validate if pods are ready

As an optional step, you can use the pxctl command line to inspect the underlying volumes of the Cassandra pod that we have created.

Image – Inspect volume using pxctl
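One way to run the inspection; the PVC name follows the StatefulSet convention <claim-template-name>-<pod-name>:

VOL=$(kubectl get pvc cassandra-data-cassandra-0 -o jsonpath='{.spec.volumeName}')
PX_POD=$(kubectl get pods -n kube-system -l name=portworx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system ${PX_POD} -- /opt/pwx/bin/pxctl volume inspect ${VOL}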

From the output, note the following:

  • State indicates that the volume is attached and shows the node on which it is attached; this is the node where the Kubernetes pod is running.
  • HA shows the number of configured replicas for this volume.
  • Labels show the name of the PVC for this volume.
  • Replica sets on nodes shows the Portworx nodes on which the volume is replicated.

Now that we have Cassandra ready, we can create a sample database and populate some data.

Step #5: Create a Cassandra Database

Initialize a sample database on our Cassandra instance using CQL commands.

Image – Connect to CQL Shell session
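Using the cqlsh utility pod from the manifest above, a session can be opened like this (the host name follows the headless-service DNS convention):

kubectl exec -it cqlsh -- cqlsh cassandra-0.cassandra.default.svc.cluster.local 9042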

The next step is to create a keyspace with a replication factor of 3 and insert some sample data:

Image – Create a keyspace and insert sample data
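The keyspace and table below are illustrative; any small schema will do for this test:

CREATE KEYSPACE demodb WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
USE demodb;
CREATE TABLE emp (emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint, emp_phone varint);
INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES (100, 'Tom', 'Cochin', 989345783, 50000);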

Once the data is inserted, verify that the rows have been created.

Image – Select rows from the keyspace
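With the illustrative schema above, the check is a simple SELECT:

SELECT * FROM demodb.emp;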

Now that we have the records created, we can check whether failover works properly. But first, we have to flush the in-memory data to disk (using the nodetool flush command) so that when Cassandra starts on another node, it has access to the data that was just written. Cassandra keeps recent writes in memory and, by default, only flushes them to disk after 10 minutes.

Image – Flush data to disk
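nodetool is available inside the Cassandra container, so the flush can be run with kubectl exec:

kubectl exec cassandra-0 -- nodetool flush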

Step #6: Delete Cassandra Instance

Let us simulate a failure by cordoning the node where Cassandra is running and then deleting the Cassandra pod. The pod will then be rescheduled so that it lands on one of the nodes that has a replica of the data.

Image – Delete Cassandra instance
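A sketch of the failover simulation; the node lookup is illustrative:

NODE=$(kubectl get pods cassandra-0 -o jsonpath='{.spec.nodeName}')
kubectl cordon ${NODE}
kubectl delete pod cassandra-0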

Once the Cassandra pod is deleted, Kubernetes will start creating a new Cassandra pod on another node. Use kubectl get pods to verify; when the pod comes back up, it will be in the Running and READY (1/1) state.

Image – Verify replacement pod starts running

Also, we have to uncordon the node before the next step.

Image – Uncordon node
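Using the same NODE variable from the cordon step:

kubectl uncordon ${NODE}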

With the new Cassandra pod running, let's check whether the database we previously created is still intact.

Step #7: Verify data is still available

Let’s start a CQL Shell session and validate if the data is available.

Image – Verify if data is still available
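With the illustrative keyspace from Step #5, the check can be run in one shot using cqlsh -e:

kubectl exec -it cqlsh -- cqlsh cassandra-0.cassandra.default.svc.cluster.local 9042 -e "SELECT * FROM demodb.emp;"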

Congrats! We still have our data, and we survived the node failure too!

Step #8: Scale the cluster

We will scale our Cassandra StatefulSet to 3 replicas using the kubectl scale command.

Image – Scale the cluster
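A sketch of the commands:

kubectl scale statefulset cassandra --replicas=3
kubectl get pods -l app=cassandra -w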

You can watch the pods getting added:

Image – Cluster scaled

It will take a minute or two for all three Cassandra nodes to come online and discover each other.
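Once all three pods are READY, you can also confirm that the Cassandra ring has formed using nodetool status; all three nodes should report UN (Up/Normal):

kubectl exec cassandra-0 -- nodetool status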

