Instaclustr Node Patching — finding a Kernel bug with the AWS EC2 Xen Hypervisor

How does the Instaclustr OS patching cycle work?

  • Step 1: Perform thorough internal testing of various node infrastructures, including extensive performance testing
  • Step 2: Upgrade a small portion of our non-production SLA tier clusters (developer size nodes)
  • Step 3: Upgrade all remaining non-production SLA tier clusters
  • Step 4: Make this Operating System version the default for all new nodes
  • Step 5: Upgrade all production SLA tier clusters
  • Step 6: Perform final analysis of the fleet to ensure that all applicable nodes are running the correct OS version
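
The cluster-facing steps above (2, 3, 5 and 6) can be sketched as a simple wave plan. This is only an illustration of the ordering, not Instaclustr's actual tooling: the `Cluster` record, its field values, the `plan_waves` and `outdated` helpers, and the `canary_fraction` parameter are all hypothetical names introduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # All fields are hypothetical; the real fleet model is not described in the post.
    name: str
    sla_tier: str    # assumed values: "non-production" or "production"
    node_size: str   # assumed values: "developer" or anything else
    os_version: str


def plan_waves(clusters, canary_fraction=0.1):
    """Return three upgrade waves: canary, remaining non-production, production."""
    non_prod = [c for c in clusters if c.sla_tier == "non-production"]
    prod = [c for c in clusters if c.sla_tier == "production"]
    dev = [c for c in non_prod if c.node_size == "developer"]

    # Step 2: a small portion of non-production clusters, drawn from developer-size nodes.
    wanted = max(1, math.ceil(len(non_prod) * canary_fraction))
    canary = dev[:min(len(dev), wanted)]
    canary_names = {c.name for c in canary}

    # Step 3: every other non-production cluster.
    rest = [c for c in non_prod if c.name not in canary_names]

    # Step 5: production clusters go last.
    return [canary, rest, prod]


def outdated(clusters, target_version):
    """Step 6: names of clusters that are not yet on the target OS version."""
    return [c.name for c in clusters if c.os_version != target_version]
```

Steps 1 and 4 (internal testing and switching the default image) are deliberately left out, since they do not touch existing clusters.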

Issues found in testing

[    6.902955] ena: The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd 8), autopolling mode is OFF
[    6.914601] ena: Failed to submit get_feature command 12 error: -62
[    6.921175] ena 0000:00:03.0: Cannot init indirect table
[    6.927008] ena 0000:00:03.0: Cannot init RSS rc: -62
[    6.947883] ena: probe of 0000:00:03.0 failed with error -62
[   65.783000] nvme nvme0: I/O 15 QID 0 timeout, completion polled
/sys/hypervisor/version/{major,minor,extra}.
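
As a minimal sketch of how the hypervisor version could be read from those sysfs files on a Xen guest: the helper name `read_xen_version` and the way the three parts are joined are assumptions made here, not something taken from the post. The function returns `None` when the files are absent, which is what one would expect on a host that does not expose a Xen sysfs tree.

```python
from pathlib import Path
from typing import Optional


def read_xen_version(base: Path = Path("/sys/hypervisor/version")) -> Optional[str]:
    """Join the major, minor and extra files into one version string.

    Returns None if any of the files cannot be read, for example on a
    host that does not expose /sys/hypervisor/version.
    """
    try:
        major, minor, extra = (
            (base / name).read_text().strip()
            for name in ("major", "minor", "extra")
        )
    except OSError:
        return None
    # Assumption: "extra" carries its own leading separator, so it is
    # appended without adding another dot.
    return f"{major}.{minor}{extra}"


if __name__ == "__main__":
    print(read_xen_version())
```

Collecting this string from every node alongside the kernel version is one way to correlate boot failures with a particular hypervisor release.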

Recommendations from AWS

Instaclustr Actions

Next Steps

Instaclustr

Managed platform for open source technologies including Apache Cassandra, Apache Kafka, Apache ZooKeeper, Redis, Elasticsearch and PostgreSQL